(La)Tex and non-ascii character sets

Posted Feb 27, 2007 22:30 UTC (Tue) by boog (subscriber, #30882)
In reply to: "Dependent on" ??? by eklitzke
Parent article: Mitchell Baker and the Firefox Paradox (Inc)

The comment that the ".tex file needs to be ascii" is incorrect. The "inputenc" package can translate many character sets to ascii for you, so the .tex file can be in the native encoding. There is even the ucs package for translating native utf-8 unicode.

There is still at least one problem for non-ascii character sets, though: bibtex (the bibliography tool) requires the .bib bibliography databases to be in standard tex (\"u for u-umlaut, etc.). I wonder whether there might be a workaround whereby inputenc or ucs would be run on the .bib file before bibtex processes it.
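One way to approximate that workaround outside of tex is to preprocess the .bib file with a small script. The following is only a sketch in Python, not something inputenc or ucs provides: the function name to_tex_escapes and both lookup tables are made up for illustration, and the tables cover only a handful of accents. Characters the tables do not know are passed through unchanged.

```python
import unicodedata

# Combining marks mapped to TeX accent macros.
# Illustrative subset only (an assumption, not a complete table).
ACCENTS = {
    "\u0308": '"',  # diaeresis / umlaut
    "\u0301": "'",  # acute
    "\u0300": "`",  # grave
    "\u0302": "^",  # circumflex
    "\u0303": "~",  # tilde
}

# Letters without a canonical decomposition need explicit entries.
SPECIALS = {
    "\u00df": r"{\ss}",  # sharp s
    "\u00f8": r"{\o}",   # o with stroke
    "\u00d8": r"{\O}",
    "\u00e6": r"{\ae}",  # ae ligature
    "\u00c6": r"{\AE}",
}


def to_tex_escapes(text):
    """Replace accented characters with brace-wrapped TeX accent macros."""
    # Compose first so "u" + combining diaeresis is seen as one character.
    text = unicodedata.normalize("NFC", text)
    out = []
    for ch in text:
        if ord(ch) < 128:
            out.append(ch)  # plain ascii is copied as-is
            continue
        if ch in SPECIALS:
            out.append(SPECIALS[ch])
            continue
        decomposed = unicodedata.normalize("NFD", ch)
        if (
            len(decomposed) == 2
            and ord(decomposed[0]) < 128
            and decomposed[1] in ACCENTS
        ):
            base, mark = decomposed
            out.append("{\\" + ACCENTS[mark] + base + "}")
        else:
            out.append(ch)  # unknown character: leave it untouched
    return "".join(out)


if __name__ == "__main__":
    print(to_tex_escapes("author = {M\u00fcller, J\u00fcrgen}"))
```

The script would be run on a copy of the .bib file before bibtex, so the original database can stay in its native encoding. Wrapping each macro in braces keeps the accented letter together as a single unit in the output.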

Although I don't use them, other systems built upon tex, such as ConTeXt, might be considered to have a more "modern" syntax, if you don't like (la)tex.


(La)Tex and non-ascii character sets

Posted Feb 28, 2007 16:49 UTC (Wed) by khim (subscriber, #9252)

Bibtex works just fine with UTF-8 AFAIK, just not the ancient version included in tetex.

(La)Tex and non-ascii character sets

Posted Feb 28, 2007 21:39 UTC (Wed) by boog (subscriber, #30882)

I would be grateful if you could point me to the utf8-capable version. I have looked quite hard, and I still have not found anything better than bibtex8, which can use 8-bit encodings but not utf8. Are you sure you are not confusing the two? See, for instance, p. 7 of http://www.tug.org/pracjourn/2006-4/fenn/fenn.pdf.

(La)Tex and non-ascii character sets

Posted Mar 2, 2007 16:04 UTC (Fri) by jschrod (subscriber, #1646)

One small addition: inputenc does not "translate to ASCII", but to LICR, the LaTeX Internal Character Representation. That is, inputenc transforms the input into internal tokens, none of which are ASCII.

And while the PP is very obviously wrong in the assertion that LaTeX is ASCII only, and has been for more than 10 years, it is true that working in a UTF-8 environment with TeX is not really well supported. Being able to type UTF-8 characters in one's TeX files is only part of the game; UTF-8 support in BibTeX, index sorting, and other auxiliary tools is really missing for most end users.

It's a sad state, but then, we're simply too few developers over here in TeX-land.

Joachim

Copyright © 2013, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds