
Asian Governments Start to Speak the Same Language on Linux Implementations (LinuxInsider)

LinuxInsider.com reports that several Asian countries are sharing research about conversion to Linux. "According to Japanese officials, the purpose of talks between Japan, South Korea and China is to share research findings, reduce the amount of money spent on Windows licensing and maintenance fees, and promote the use of Linux in the private sector. The main goal is to come up with a Linux standard that will support Asian languages -- which have many more characters than Western alphabets. In the Chinese language, for example, there are literally thousands of characters."

How about using an existing standard?

Posted May 13, 2004 16:49 UTC (Thu) by Ross (subscriber, #4065) [Link]

UTF-8 anyone?

The more difficult questions deal with input rather than storage.

How about using an existing standard?

Posted May 13, 2004 16:55 UTC (Thu) by bradfitz (subscriber, #4378) [Link]

Even if UTF-8 (thus Unicode) were perfect, the larger problem is making sure all apps play well together, passing around Unicode data to each other.

I'm by no means an expert on this, but I understand there are a lot of complaints about Unicode's Han Unification. Some links:

http://en.wikipedia.org/wiki/Han_unification
http://www.cs.uu.nl/~otfried/Mule/unihan.html

How about using an existing standard?

Posted May 13, 2004 17:47 UTC (Thu) by proski (subscriber, #104) [Link]

Note that Latin "i", Greek "i" and Ukrainian "i" have separate codes 0x0069, 0x03b9 and 0x0456 respectively. Despite having common roots and a common look, those characters belong to different alphabets, which must have been the reason for using different codes. It's very disappointing that the Unicode designers didn't apply the same standards to Asian languages. In fact, some characters that share the same code are actually written differently in Chinese and Japanese. It's hard to advocate something that has such big flaws.
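To make the point about separate codes concrete, here is a minimal Python sketch (an illustration, not part of the original comment). It looks up the three letters with the standard `unicodedata` module; the Greek entry uses U+03B9, the unaccented small iota, and the `describe` helper is a name made up for this example.

```python
import unicodedata

# Three look-alike letters from three scripts, each with its own code point.
letters = {
    "Latin": "\u0069",
    "Greek": "\u03b9",     # unaccented small iota
    "Cyrillic": "\u0456",  # the Ukrainian/Belarusian "i"
}

def describe(ch):
    """Return the code point and official Unicode name of one character."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"

for script, ch in letters.items():
    print(script, describe(ch))
```

The three names come out as a Latin, a Greek and a Cyrillic letter, which is the per-script separation the comment describes.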

How about using an existing standard?

Posted May 13, 2004 18:04 UTC (Thu) by Ross (subscriber, #4065) [Link]

That's very interesting information! I had the apparently incorrect
impression that they applied these standards to every language and that
the problem was more often having too many symbols instead of too few.

How about using an existing standard?

Posted May 13, 2004 18:28 UTC (Thu) by JoeBuck (subscriber, #2330) [Link]

Scientific papers in English commonly use Greek letters as symbols, and the Greek iota is treated as a different character than the Roman i in equations. It makes sense, therefore, to have separate codes for English and Greek letters. Ukrainian I don't know enough about. However, the original goal of Unicode was to come up with a 16-bit encoding for all characters (though this is no longer required); given that goal, it's simply not possible to assign unique codes if compromises aren't made for Asian languages.

However, ISO 10646 is a superset of the original Unicode; it allows for going up to 31 bits, though the first 65534 characters have a "preferred" status. The extra bits could be used if it is really necessary to distinguish among Asian character sets.
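The 16-bit limit can be seen in a short Python sketch (an illustration only; the two sample ideographs and the helper names are chosen for this example). One ideograph fits in a single 16-bit unit, while one assigned beyond the first 65,536 code points needs more room in both UTF-16 and UTF-8.

```python
# One CJK ideograph inside the first 65,536 code points, and one beyond them.
bmp_char = "\u4e2d"        # U+4E2D, fits in 16 bits
beyond_bmp = "\U00020000"  # U+20000, in CJK Unified Ideographs Extension B

def utf16_units(ch):
    """Number of 16-bit code units needed to encode ch in UTF-16."""
    return len(ch.encode("utf-16-be")) // 2

def utf8_bytes(ch):
    """Number of bytes needed to encode ch in UTF-8."""
    return len(ch.encode("utf-8"))

for ch in (bmp_char, beyond_bmp):
    print(f"U+{ord(ch):04X}", utf16_units(ch), "UTF-16 units,", utf8_bytes(ch), "UTF-8 bytes")
```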

How about using an existing standard?

Posted May 13, 2004 18:28 UTC (Thu) by piman (subscriber, #8957) [Link]

> It's hard to advocate something that has such big flaws.

So instead we advocate... ShiftJIS? EUC-JP? HZ? Big5? EUC-TW? All of these are even worse. :(

How about using an existing standard?

Posted May 20, 2004 17:22 UTC (Thu) by dvdeug (subscriber, #10998) [Link]

Greek 'i' looks like ι, which looks distinctly different on my screen, and is used in a contrastive manner in math texts. Yes, Ukrainian І and і look like I and i, but one letter does not an alphabet make. Russian/Ukrainian N ('Н') may look like a Latin H, but Russian n ('н') doesn't look like a Latin h. It's simply not reasonable to unify the characters.
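A small Python sketch of the case-pair argument (illustrative only; `lower_pair` is a name invented here): the capitals look alike, but each lowercases to a letter in its own script.

```python
latin_h = "H"          # U+0048 LATIN CAPITAL LETTER H
cyrillic_en = "\u041d"  # U+041D CYRILLIC CAPITAL LETTER EN, looks like H

def lower_pair(ch):
    """Return the character together with its lowercase form."""
    return ch, ch.lower()

for ch in (latin_h, cyrillic_en):
    upper, lower = lower_pair(ch)
    print(f"U+{ord(upper):04X} -> U+{ord(lower):04X}")
```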

Another rule about unifying was that all the legacy character sets, including the old Japanese ones, had to have a complete map into Unicode. Any character they distinguished had to be distinguished in Unicode. And JIS X 0208 encodes the Greek, Latin and Russian characters separately.
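The round-trip rule can be sketched in Python, assuming the standard `euc_jp` codec as a stand-in for JIS X 0208 (the codec choice and the `round_trip` helper are assumptions of this example, not something from the comment). Three A-shaped capitals from different scripts get distinct legacy byte sequences and survive a trip through the legacy encoding and back.

```python
# A-shaped capitals from three scripts: fullwidth Latin, Greek, Cyrillic.
look_alikes = ["\uff21", "\u0391", "\u0410"]

def round_trip(ch, codec="euc_jp"):
    """Encode ch to the legacy codec, then decode it back to Unicode."""
    encoded = ch.encode(codec)
    return encoded, encoded.decode(codec)

# The legacy byte sequence for each letter.
encodings = [round_trip(ch)[0] for ch in look_alikes]

for ch, raw in zip(look_alikes, encodings):
    print(f"U+{ord(ch):04X}", raw.hex())
```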

And it's not like CJK was the only thing unified. Serbian n and Russian n were unified despite the fact that they look different when italicized. Similar unification rules were applied to the entire system.

As for CJK, it's not exactly cut and dried. The Koreans, for instance, tend to complain that it wasn't unified enough, quite a few of the Chinese are happy with it, and the Japanese are often quite happy using Unicode programs as long as it doesn't say Unicode on the side. Unicode provides at least as much distinction as the legacy standards.

Finally, there is a Unicode solution: use language tags provided in the standard to distinguish between Japanese, Chinese and Korean. (Equivalently but more widely supported, use language tags in XML to distinguish between languages.) Many programs already support intelligent font selection when given language information.
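As a rough sketch of the XML approach, the Python below reads `xml:lang` from a tiny document using the standard `xml.etree.ElementTree` module. The sample document, the `tagged_runs` helper and the font names are all hypothetical; U+76F4 is used because it is an often-cited unified character whose preferred shape differs between Japanese and Chinese typography.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:lang under the predefined XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# The same code point, U+76F4, tagged once as Japanese and once as Chinese.
doc = """<doc>
  <p xml:lang="ja">\u76f4</p>
  <p xml:lang="zh-Hans">\u76f4</p>
</doc>"""

# Hypothetical font names; a real renderer would consult its own font list.
FONT_FOR_LANG = {"ja": "ExampleMincho", "zh-Hans": "ExampleSong"}

def tagged_runs(xml_text):
    """Return (language tag, text) pairs for every <p> element."""
    root = ET.fromstring(xml_text)
    return [(p.get(XML_LANG), p.text) for p in root.iter("p")]

for lang, text in tagged_runs(doc):
    print(lang, f"U+{ord(text):04X}", FONT_FOR_LANG.get(lang, "default"))
```

The code point is identical in both runs; only the language tag differs, and that tag is what a renderer would use to choose a glyph style.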

As someone else pointed out, there's no comparable system. The only serious contender to Unicode is Tron. Unfortunately real information about Tron is only available in Japanese. From my reading of the English information, I get the impression that they're missing a lot of characters that Unicode has steadily added for Cherokee, Lakota, Middle High German, and mathematics, among others, and that they fall into the CJK habit of treating characters as pure glyphs and fail to provide the data to identify character identity. If Tron is only usable for CJK, it can't replace Unicode.

How about using an existing standard?

Posted May 13, 2004 18:27 UTC (Thu) by piman (subscriber, #8957) [Link]

While that's true, Unicode is still loads better than anything else, and it's what stands a good chance of getting supported. It's unfortunate that Unicode got caught up in the political/linguistic debate about the kanji/hanzi, but the short of it is that it does support considerably more characters than the alternatives, and it's actually implemented in programs used today.

How about using an existing standard?

Posted May 13, 2004 17:25 UTC (Thu) by freethinker (guest, #4397) [Link]

Indeed. I'm really surprised the Chinese haven't cobbled up some kind of alphabet or syllabary, if only for computer input purposes. Puts them at a big disadvantage vs Japan (kana) and Korea (hangul).

How about using an existing standard?

Posted May 14, 2004 1:02 UTC (Fri) by jzhao (guest, #2865) [Link]

It used to be a big obstacle for Chinese people to use computers. Nowadays with so many efficient Chinese input methods, Chinese people seem to be doing just fine. The explosion of Chinese online communities proves it.

How about using an existing standard?

Posted May 14, 2004 1:22 UTC (Fri) by freethinker (guest, #4397) [Link]

So, how do they do it? Not voice, I should think.

How about using an existing standard?

Posted May 14, 2004 3:22 UTC (Fri) by proski (subscriber, #104) [Link]

http://en.wikipedia.org/wiki/Chinese_input_methods_for_computer

Asian Governments Start to Speak the Same Language on Linux Implementations (LinuxInsider)

Posted May 14, 2004 8:30 UTC (Fri) by janpla (guest, #11093) [Link]

I can't imagine that anything as basic as supporting Asian languages is the issue here. The version of Red Hat I use is fully capable of displaying Chinese, Japanese, Korean, Thai, Arabic, etc., and it seems to have full support for at least the Chinese input methods.

This looks more like an exercise to get rid of the dependency on Microsoft products and take the lead in computer development. And to be quite honest, I think they could easily do so - we in the West have got tired somehow. We are no longer a 'powerhouse of creativity', or perhaps we never were.

Copyright © 2004, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds