
Time to internationalize the kernel?

One of the latest bright ideas to go around on the linux-kernel mailing list is that the messages printed by the kernel should be presented in the local language. After all, the rest of the system can be localized, but the kernel remains firmly English-only. Wouldn't it be better to complete the job?

There are a number of approaches one could take to this sort of problem. One would be to have the various printk() strings available to the kernel in all supported languages, with the correct one selected at run time. One need only look at what that approach would do to the size of the kernel to reject it outright. Trying to support a compile-time language option seems impractical at best.

And besides, Linus has been quite clear on what he thinks of in-kernel localization support:

The answer is: go ahead and do it, but don't do it in the kernel. Do it in klogd or similar.

So would-be translators are forced to look at user-space solutions. Riley Williams posted one possible approach: add a unique message number to each message printed by the kernel. Format strings passed to printk() are already expected to begin with a string like "<2>", which provides the log level of the message. Why not put in, instead, something like "<2.12345>"? User-space translation code could then use the message number to index into a file of localized messages.
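As a rough illustration of the user-space half of that idea, the sketch below parses a "<level.number>" prefix and looks the number up in a catalog. The catalog contents, the function name, and the exact prefix grammar are assumptions made for the example; nothing like this exists in klogd. It also handles only constant messages: arguments already formatted into the text by printk() are ignored.

```python
import re

# Assumed prefix grammar from the proposal: "<level.number>", e.g. "<2.12345>".
PREFIX = re.compile(r"^<(\d)\.(\d+)>")

# Hypothetical catalog mapping message numbers to localized text.
# The entry below is a made-up placeholder, not a real kernel message.
CATALOG_FR = {
    12345: "message traduit (exemple)",
}


def translate(line, catalog):
    """Return (level, text), using the catalog entry when one exists."""
    match = PREFIX.match(line)
    if match is None:
        # No numbered prefix: pass the line through untouched.
        return None, line
    level = int(match.group(1))
    number = int(match.group(2))
    original = line[match.end():]
    # Fall back to the original English text for unknown numbers.
    return level, catalog.get(number, original)
```

Falling back to the English text for unknown numbers means a stale catalog degrades gracefully rather than hiding messages.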

The devil, of course, is in the details. In the 2.5.67 kernel, there are almost 52,000 details (in the form of printk() statements). It is hard to imagine anybody having the patience to go through and assign unique message numbers to each of those statements. It's even harder to conceive of anybody being willing to translate that many messages into even a single other language. They do not make the most exciting reading material, especially since all the really good profanity is restricted to code comments. There are very few prospective translators with an itch that requires scratching that strongly.

Now try to imagine that whole structure of message numbers and translations surviving more than about two minor kernel releases. Each new message would require a new number; just administering the number space would take quite a bit of somebody's time. Translations would have to keep up with changes to messages. Bear in mind that the 2.5.67 patch, alone, affected 824 printk() statements. The 2.4.20 patch, amazingly, affected more than 6,000. This system would be entirely unmaintainable.

So in-kernel support for internationalization is unlikely in any form. Whether it can be done entirely externally is another question; Linus suggests trying to translate the messages directly from text. That, probably, is a way of saying that it will not happen at all. But one never knows...



Time to internationalize the kernel?

Posted Apr 17, 2003 5:10 UTC (Thu) by komarek (guest, #7295) [Link] (2 responses)

Seems like Linus' suggestion works just fine, using AltaVista's Babblefish:

  Grain de tux janv. 26 de 22:28:01: Syslogd 1,4,1: relancement.
  Grain de tux janv. 26 de 22:28:03: klogd 1,4,1, = de source de notation/proc/kmsg commencé.
  Grain de tux janv. 26 de 22:28:03: l'inspection/boot/System.map-2.4.19-gentoo-r10
  Grain de tux janv. 26 de 22:28:03: Chargé 17634 de symboles/boot/System.map-2.4.19-gentoo-r10.
  Grain de tux janv. 26 de 22:28:03: Version 2,4,19 de grain d'allumette de symboles.
  Grain de tux janv. 26 de 22:28:03: Chargé 205 symboles de 12 modules.

Okay, I'll admit I had to massage some of that by hand. But it sure looks like perfect French to me. Well, perfect French for a Monty Python sketch. Here's an interesting translation. Anyone care to guess which kernel message produced it?

  Module de gestion de périphérique inscrit de bloc chargé

I have this sad picture in my head of a young French sysadmin, who hated English classes au lycee, trying to debug system problems by running syslog output through Babblefish. -Paul Komarek

Time to internationalize the kernel?

Posted Apr 24, 2003 9:35 UTC (Thu) by durandal (guest, #10860) [Link] (1 responses)

Hello, I am French and when I read this message, I laugh :) Automatic translators are not reliable, these kernel messages don't mean anything. The word "kernel" itself is translated into "grain", but in French we say "noyau" (the word for the hard part of a fruit, like the cherry, I don't know if "kernel" means exactly the same thing in English, my dictionary isn't very clear). There are also a lot of grammatical errors... (surely like in this comment btw ;) To give you an idea, I translated my kernel messages in French and then in English using Google :
Apr 22 10:03:25 Anduril syslogd 1.4.1#10:  restarting.  
Apr 22 10:03:25 Core of Anduril:  klogd 1.4.1#10, source of journalizing =/proc/kmsg started.
Apr 22 10:03:25 Core of Anduril:  Examination of/boot/System.map-2.4.18 
Apr 22 10:03:25 Anduril usbmgr[192 ]:  starting 0.4.8 
Apr 22 10:03:25 Core of Anduril:  15211 symbols charged from /boot/System.map-2.4.18.  
Apr 22 10:03:25 Core of Anduril:  The symbols correspond to version 2.4.18 of the core.  
Apr 22 10:03:25 Core of Anduril:  114 symbols charged starting from 9 modules.  
Your last French sentence is not clear too, I couldn't translate it :), but it might end with "block device module loaded".

Time to internationalize the kernel?

Posted Oct 27, 2003 0:48 UTC (Mon) by komarek (guest, #7295) [Link]

The part of a fruit which can be used to grow new fruit is a seed. A cherry seed is also called a "pit". A peach seed is a "stone". Clear yet? ;-)

We only use "kernel" to refer to the hard part of a grain, as far as I know.

It's been so long since I posted that comment that I'm not sure what the source of the message was. But I think your translation sounds reasonable. I wonder if it might have been something about journaling, though?

-Paul Komarek

Time to internationalize the kernel?

Posted Apr 17, 2003 10:54 UTC (Thu) by NAR (subscriber, #1313) [Link] (1 responses)

Why not put in, instead, something like "<2.12345>"?

Assigning a number to an error message sounds like a good idea: in some systems when the user got an error message like this, he went to the bookshelf, took the appropriate book and looked up the error message by its number.

But with Linux, one usually doesn't go to the bookshelf, but puts the text of the error message into Google's search field to get information about the error. And this method works quite well. But imagine that I get an error message from the kernel in, let's say, Hungarian. How many hits would I get from Google for the Hungarian message? Not many, and far fewer than if I had searched for the original English message in the first place. And there are some even smaller languages out there...

Bye,NAR

Time to internationalize the kernel?

Posted Apr 23, 2003 12:14 UTC (Wed) by mwilck (subscriber, #1966) [Link]

> How many hits would I get from Google for the Hungarian message?
> Not much

Exactly. However this applies just as well to messages from other parts of the system, not the kernel alone. Actually, it applies to anything where language is used for human-machine interaction. Think of menu items: If you run a non-English environment, you'll hardly be able to submit a meaningful bug report because you can't tell which GUI elements you clicked on.

From this perspective, I18n was a bad idea right from the start. Unfortunately, it's impossible to turn back the wheel ... but please, don't put it into the kernel.

Time to internationalize the kernel?

Posted Apr 18, 2003 15:20 UTC (Fri) by wolfrider (guest, #3105) [Link]

--This whole idea of "internationalization" is fairly recent anyway. Back in the Good Old Days, the priests^H^H^H^H^H^H^Hprogrammers had to learn the COMPUTER's language to get by! Ones and zeroes, baybee!

--Seriously, I don't see why it should be internationalized. To do it right, you would have to have a very large consortium of developers maintaining the stuff full time. If you want to use Linux, learn English - you could always fork the kernel code for your country, but that's a huge undertaking. KISS.

Why use translation?

Posted Apr 18, 2003 17:05 UTC (Fri) by melauer (guest, #2438) [Link] (1 responses)

Isn't the text of a kernel message, or the constant part of it (i.e. minus any variables such as device names), just as unique as a number assigned to the message would be? If so, then perhaps the text of the kernel message itself could be the index to a translation database, just like the index number suggested by Mr. Williams. You would need more than a basic pattern matcher to match those kernel messages which include device names, path names, or any other variables, but that should be doable.
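One way to picture that matcher: turn each printf-style format string into a regular expression, capture the variable parts, and substitute them into a localized template. The sketch below is only an illustration under simplifying assumptions. It handles just %d, %u, %s and %x, the function names are made up, and the French template is an example rather than a vetted translation.

```python
import re

# Simplified handling of a few printf-style conversions. Real printk()
# formats have more variants (widths, flags, %p extensions) not covered here.
CONVERSIONS = {
    "%d": r"(-?\d+)",
    "%u": r"(\d+)",
    "%s": r"(.+?)",
    "%x": r"([0-9a-fA-F]+)",
}

SPEC = re.compile(r"%[dusx]")


def format_to_regex(fmt):
    """Compile a format string into a regex with one group per conversion."""
    parts = []
    pos = 0
    for spec in SPEC.finditer(fmt):
        # Literal text between conversions must match exactly.
        parts.append(re.escape(fmt[pos:spec.start()]))
        parts.append(CONVERSIONS[spec.group(0)])
        pos = spec.end()
    parts.append(re.escape(fmt[pos:]))
    return re.compile("^" + "".join(parts) + "$")


def translate(message, table):
    """table holds (compiled regex, template using {0}, {1}, ...) pairs."""
    for pattern, template in table:
        match = pattern.match(message)
        if match:
            return template.format(*match.groups())
    # Unknown messages are passed through in English.
    return message


# Example table with a single illustrative entry.
TABLE = [
    (format_to_regex("Loaded %d symbols from %d modules."),
     "{0} symboles chargés depuis {1} modules."),
]
```

A linear scan over tens of thousands of patterns would be slow; a real tool would presumably index the patterns by their constant prefix, but that is beyond this sketch.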

The only problem with this method that I can think of would be if the text of kernel messages changed too much, causing the error message database to need constant updating. Even then, any given distro should be able to create an error message database for their official kernels. Once the database is created for one kernel, they only need to update it for each new kernel release, and these occur infrequently.

Why use translation?

Posted Apr 18, 2003 17:36 UTC (Fri) by melauer (guest, #2438) [Link]

I hate to reply to myself, but there's something I should have mentioned.

Although the article mentions massive numbers of kernel error messages being "affected" in recent kernels, I'm not convinced that this means that the text of kernel messages changes regularly. Even if "affected" means that text was changed, remember that there was a lot of patching related to cleaning up spelling/grammatical errors in the kernel recently (this sort of thing probably won't happen again anytime soon). I'll have a look at those patches when I have the time to see what exactly went on. In the meantime, I'm still not convinced that kernel error messages change so much from patch to patch.

Time to internationalize the kernel?

Posted Apr 19, 2003 4:45 UTC (Sat) by dizzl (guest, #5521) [Link]

I agree with the editor's comment that maintaining a number space for all
messages is at best difficult and very time consuming.

Thus another approach might be to separate namespaces so there is less
chance of clashes between message numbers. For instance, printk() can
encode the filename (__FILE__) into the message number, automatically
separating all files into separate namespaces. A message id could then
become something like "<2:kernel/sched.c:3>".

This could mean a big increase in the kernel binary however, unless the
English texts are also moved out of the kernel itself and into a lookup by klogd.

Time to internationalize the kernel?

Posted Apr 20, 2003 20:14 UTC (Sun) by hthoma (subscriber, #4743) [Link]

Please don't drive i18n too far. I am German and I already
hate that I get "Speicherzugriffsfehler" instead of
"segmentation fault" with the standard setup. Anyone who
knows his computer and Linux well enough to understand
a kernel message knows English well enough anyway.

Herbert.

Time to internationalize the kernel?

Posted Apr 28, 2003 18:52 UTC (Mon) by lpq (guest, #9526) [Link]

Why not "<2>[drivers/hotplug/acpiphp_core.c:34] "?

Numbers on a per-file name basis, inserted by an automated pre-processor
before a final kernel is released?

After the first release, the pre-processor would likely need two passes:
one to find the highest previously used number, and one to assign new,
higher numbers to any newly added messages without changing existing
message numbers. This would also allow manual assignment of a message
number to "override" auto-assignment.

I thought about abbreviating the pathname components, but better safe
than sorry(?).

-l


Copyright © 2003, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds