Riddled with errors (Addendum)

Posted Feb 9, 2006 4:50 UTC (Thu) by dractyl (subscriber, #26334)
In reply to: Riddled with errors by dractyl
Parent article: Setting up international character support (Linux.com)

Well now that I don't have a splitting headache, I should correct some minor points, just for the sake of completeness.

1. As was already pointed out, UTF-8 encodes all of ASCII in 1 byte per character, not all of ISO 8859-1. It should be noted, however, that Unicode's first 256 code points are identical to ISO 8859-1's; they are just encoded differently, with everything above U+007F taking two bytes in UTF-8.
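To make point 1 concrete, here is a small Python sketch (the sample characters and the helper name `describe` are just my picks for illustration) that prints the code point and the encoded bytes of a few characters in both ISO 8859-1 and UTF-8. The ASCII characters come out as one byte either way, while the accented ones keep the same code point but need two bytes in UTF-8.

```python
# Compare how the same characters are encoded in ISO 8859-1 and UTF-8.
# "describe" and the sample list are illustrative choices, not from the article.
samples = ["A", "~", "Ò", "ü", "ç"]


def describe(ch):
    """Return (code point, ISO 8859-1 bytes, UTF-8 bytes) for one character."""
    return ord(ch), ch.encode("latin-1"), ch.encode("utf-8")


for ch in samples:
    cp, latin1, utf8 = describe(ch)
    print(
        f"U+{cp:04X} {ch}: "
        f"latin-1={latin1.hex()} ({len(latin1)} byte), "
        f"utf-8={utf8.hex()} ({len(utf8)} byte(s))"
    )

# Every ISO 8859-1 byte value decodes to the Unicode code point of the same number.
same_numbering = all(bytes([i]).decode("latin-1") == chr(i) for i in range(256))
print("ISO 8859-1 byte values match Unicode code points:", same_numbering)
```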

2. CODE2000 is *not* free but shareware. In fact, I just paid my shareware fee last night when I discovered my mistake. You can find the font at:

http://home.att.net/~jameskass/index.htm

3. In my response, I gave the author a hard time about not pointing out any fonts, which is not quite accurate. The author did point out the Gentium Unicode font, but that font has only 1500 or so glyphs, which makes it suitable only for the Latin and Greek (and soon Cyrillic) alphabets. It is nowhere near Unicode-complete and is not suitable for non-Western users.

4. I stated I never came across "UTF-8" in the X config files. This is true as far as the keymaps are concerned, but it does show up in the locales, which affect input via the Compose file; that file defines which sequences of keys you can type to get a given character. As usual in Unix, there is more than one way to do it. For example, if you want to type an Ò, you can do it in any of these ways:

<dead_grave> <O> : "Ò" U00D2 # LATIN CAPITAL LETTER O WITH GRAVE
<Multi_key> <grave> <O> : "Ò" U00D2 # LATIN CAPITAL LETTER O WITH GRAVE
<combining_grave> <O> : "Ò" U00D2 # LATIN CAPITAL LETTER O WITH GRAVE

I used <dead_grave> <O> because I use dead_keys rather than a Compose key (known here as Multi_key). In my keymap I have <dead_grave> set to AltGr+` where AltGr is mapped to the right alt key.
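If you want to poke at entries like the three above, here is a rough Python sketch that splits a Compose line into its key sequence, the resulting string, and the code point. The regular expression and the function name `parse_compose_line` are my own invention for illustration, and it only understands the simple `<key> ... : "X" Uxxxx` form shown here; real Compose files also contain includes, escapes, and keysym names that this does not handle.

```python
import re

# The three entries quoted above, copied verbatim.
COMPOSE_LINES = """\
<dead_grave> <O> : "Ò" U00D2 # LATIN CAPITAL LETTER O WITH GRAVE
<Multi_key> <grave> <O> : "Ò" U00D2 # LATIN CAPITAL LETTER O WITH GRAVE
<combining_grave> <O> : "Ò" U00D2 # LATIN CAPITAL LETTER O WITH GRAVE
"""

# Hypothetical pattern: one or more <keysym> tokens, a colon, a quoted
# result string, then a Uxxxx code point. Anything after that is ignored.
ENTRY = re.compile(
    r'^(?P<keys>(?:<\w+>\s*)+):\s*"(?P<text>[^"]*)"\s+U(?P<cp>[0-9A-Fa-f]{4,6})'
)


def parse_compose_line(line):
    """Return (key names, result string, code point), or None if no match."""
    match = ENTRY.match(line.strip())
    if match is None:
        return None
    keys = re.findall(r"<(\w+)>", match.group("keys"))
    return keys, match.group("text"), int(match.group("cp"), 16)


entries = [e for e in map(parse_compose_line, COMPOSE_LINES.splitlines()) if e]
for keys, text, cp in entries:
    # The quoted string and the Uxxxx value should name the same character.
    assert text == chr(cp)
    print(" + ".join(keys), "->", text, f"(U+{cp:04X})")
```

All three sequences end up at the same character; only the keys you press differ.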

In any case, your locale can affect which sequence of keys you need to type to get a specific character. For example, tt_RU.TATAR-CYR, tt_RU.KOI8-C, and tt_RU.UTF-8 all have different compose sequences from each other. Thankfully, all UTF-8 locales use the same compose sequences except pt_BR.UTF-8, which changes the compose sequence for a few characters.

Once again, this does not actually have anything to do with UTF-8, or even Unicode for that matter. These compose combinations are based on custom and the X11R6 standard rather than on Unicode.

5. I stated deadkey keymaps were the best way to input characters such as ü or ç. The Compose key or any special keys you may have on your keyboard are good too. It's really up to personal choice. The point I was making was that this isn't a UTF-8 issue.

6. I stated that you get a square if your font is failing you. This is true, but you may also get a small black diamond with a question mark in it or indeed nothing at all if it's a very bad font.



