
Using Unicode in Linux (NewsForge)

NewsForge converts a Linux system to use Unicode. "First of all, check whether you're already using a Unicode locale. The command locale prints out the values of environmental variables that influence the locale settings. A complete description of their meanings is available in locale man pages. Usually, locale names consist of a lowercase language code followed by an underscore and an uppercase country code (e.g. en_US for U.S. English). Unicode locale names that use UTF-8 encoding additionally end with ".UTF-8". If such names are present in the output of locale, you are already using a Unicode locale."
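The check described in the excerpt can be sketched in Python. This is only an illustration and not taken from the NewsForge article: the helper names `is_utf8_locale` and `effective_ctype_locale` are hypothetical, and the code inspects locale name strings only, assuming the usual `language_COUNTRY.codeset` form with an optional `@modifier`.

```python
import os

def is_utf8_locale(name: str) -> bool:
    """Return True if a locale name such as 'en_US.UTF-8' names a UTF-8 codeset."""
    # Drop an optional modifier such as '@euro'.
    base = name.split("@", 1)[0]
    if "." not in base:
        return False  # no codeset given, e.g. 'C' or 'en_US'
    codeset = base.split(".", 1)[1]
    # Both 'UTF-8' and 'utf8' spellings are seen in practice; normalise first.
    return codeset.replace("-", "").lower() == "utf8"

def effective_ctype_locale(env=os.environ) -> str:
    """Resolve the character-handling locale: LC_ALL, then LC_CTYPE, then LANG."""
    for var in ("LC_ALL", "LC_CTYPE", "LANG"):
        value = env.get(var)
        if value:
            return value
    return "C"  # nothing set: the default 'C' locale applies

name = effective_ctype_locale()
print(name, "is a UTF-8 locale" if is_utf8_locale(name) else "is not a UTF-8 locale")
```

The lookup order matters: a stray `LC_ALL=C` overrides a UTF-8 `LANG`, which is easy to miss when reading the output of locale by eye.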

Using Unicode in Linux (NewsForge)

Posted Nov 2, 2004 19:17 UTC (Tue) by jwb (subscriber, #15467) [Link]

Good info. Since I switched to UTF-8 I've noticed a lot of shellutils/textutils/coreutils are quite a lot slower. Grep is especially bad. I'll frequently set the locale back to "C" when I need to grep or sort something large.
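The workaround of dropping back to the "C" locale for one command can be sketched as follows. This is a minimal illustration, not a benchmark: the helper names `c_locale_env` and `run_in_c_locale` are made up for this example, and whether it actually speeds up a given grep or sort depends on the tool and its version.

```python
import os
import subprocess

def c_locale_env(base=None) -> dict:
    """Copy an environment and force the byte-oriented 'C' locale via LC_ALL."""
    env = dict(os.environ if base is None else base)
    env["LC_ALL"] = "C"  # LC_ALL overrides LANG and every other LC_* variable
    return env

def run_in_c_locale(cmd, data: bytes = b"") -> bytes:
    """Run cmd with LC_ALL=C, feed it data on stdin, and return its stdout."""
    result = subprocess.run(cmd, input=data, env=c_locale_env(),
                            stdout=subprocess.PIPE, check=True)
    return result.stdout
```

For example, `run_in_c_locale(["sort"], data)` sorts by raw byte values, so the result can be ordered differently from what the UTF-8 locale would produce. Only the child process is affected; the calling shell or script keeps its own locale.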

But the REALLY annoying thing is my emacs keybindings in zsh+xterm are broken. Alt-f and Alt-b insert the æ and â characters, respectively, instead of moving forward or backward one word. This problem only exists in xterm, not on the console.

Is there any way to have Alt-* emacs bindings as well as UTF-8 locale and Unicode display?

Using Unicode in Linux (NewsForge)

Posted Nov 2, 2004 21:30 UTC (Tue) by iabervon (subscriber, #722) [Link]

You can tell xterm that you expect output from programs to be in UTF-8 regardless of locale with "-u8 +lc".

The alt issue is that "alt-f" is normally signalled with an "e6" byte, which is normally distinctive as not a character because it has the high bit set. However, with unicode support, you might actually type a character which is an "e6", and zsh thinks you did. The right solution is really to have ANSI escape codes for alt keys, using the same solution that works for, e.g., the arrow keys. Of course, ANSI escape sequences are a mess anyway, but at least they are a separate space as well as extensible.

Using Unicode in Linux (NewsForge)

Posted Nov 2, 2004 21:51 UTC (Tue) by jwb (subscriber, #15467) [Link]

That's the sort of thing I did not expect with UTF-8. In UTF-8 there's no valid character that is represented by 0xE6. That could be the first byte of a multibyte character, but it is invalid by itself. When I type Alt-f I get the ISO-8859-1 character at that codepoint, "LATIN SMALL LETTER AE". When I type Alt-f in a UTF-8 xterm, I'm getting the two-byte sequence 0xC3 0xA6. Which is sort of annoying. I'm sure there's a quite reasonable interaction of the keyboard driver, the kernel, XFree86, Dick Nixon, and the Communist Party, but I am unable to see it clearly from here.
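The byte-level claims here are easy to check with Python's built-in codecs. A small sketch follows; the function names are hypothetical, and the lead-byte ranges are the ones from the current UTF-8 definition.

```python
def decodes_as_utf8(data: bytes) -> bool:
    """Return True if data is a complete, valid UTF-8 byte string."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False

def utf8_sequence_length(lead: int) -> int:
    """Sequence length announced by a lead byte; 0 for continuation or invalid bytes."""
    if lead < 0x80:
        return 1
    if 0xC2 <= lead <= 0xDF:
        return 2
    if 0xE0 <= lead <= 0xEF:
        return 3
    if 0xF0 <= lead <= 0xF4:
        return 4
    return 0

print(decodes_as_utf8(b"\xe6"))          # False: a lone 0xE6 is incomplete
print(utf8_sequence_length(0xE6))        # 3: it starts a three-byte sequence
print(decodes_as_utf8(b"\xe6\x97\xa5"))  # True: a complete three-byte character
print("\u00e6".encode("utf-8"))          # b'\xc3\xa6': LATIN SMALL LETTER AE
```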

When I type Alt-f in a Gnome Terminal, the File menu pops open, which is even worse ;)

Using Unicode in Linux (NewsForge)

Posted Nov 2, 2004 22:54 UTC (Tue) by iabervon (subscriber, #722) [Link]

Well, you actually get a unicode 0x000000E6, which is also the ISO-8859-1 0xE6, encoded in UTF-8 as 0xC3 0xA6. Obviously, xterm shouldn't send an 0xE6 to a program expecting UTF-8, since that wouldn't be a complete item. So it is deciding to send an alt-0x66 (seven bit ASCII 0x66, in particular) as 0x66+0x80=0xE6, and then uses the locale to determine how to encode a 0xE6 and gets 0xC3 0xA6.

The right answer, as someone else pointed out, is 0x1b 0x66, AKA "ESC f", which is already essentially the same except that it doesn't use non-ASCII characters that get complex meanings.
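The two ways of transmitting Alt-f discussed above can be written out as a short sketch. This models only the arithmetic described in this thread, not xterm's actual source; the function names are invented for the example.

```python
ESC = 0x1B

def alt_as_meta_bit(key: str, encoding: str = "utf-8") -> bytes:
    """Eight-bit style: set the high bit, then encode the resulting code point."""
    code = ord(key)
    if code >= 0x80:
        raise ValueError("key must be a 7-bit ASCII character")
    return chr(code | 0x80).encode(encoding)

def alt_as_escape(key: str) -> bytes:
    """Escape-prefix style: send ESC followed by the unmodified ASCII key."""
    code = ord(key)
    if code >= 0x80:
        raise ValueError("key must be a 7-bit ASCII character")
    return bytes([ESC, code])

print(alt_as_meta_bit("f", "iso-8859-1"))  # b'\xe6'
print(alt_as_meta_bit("f"))                # b'\xc3\xa6'
print(alt_as_escape("f"))                  # b'\x1bf'
```

The escape-prefix form stays within 7-bit ASCII, so it reads the same in any locale; the meta-bit form changes with the encoding, which is what turns Alt-f into æ.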

Using Unicode in Linux (NewsForge)

Posted Nov 3, 2004 8:09 UTC (Wed) by djao (subscriber, #4263) [Link]

When I type Alt-f in a Gnome Terminal, the File menu pops open, which is even worse ;)

This one is easy to fix. On the menubar: Edit - Keybindings - Disable all menu mnemonics. If you have no menubar, right-click on the terminal and pick Show menubar.

Now, if only it were so easy to disable menu hotkeys in the rest of GNOME, then we'd be making progress....

Using Unicode in Linux (NewsForge)

Posted Nov 2, 2004 22:32 UTC (Tue) by foom (subscriber, #14868) [Link]

With reasonable terminals, "Alt-f" is already sent as ESC f, and therefore doesn't have that issue.

Using Unicode in Linux (NewsForge)

Posted Nov 3, 2004 14:35 UTC (Wed) by jmshh (guest, #8257) [Link]

I always considered that a design bug. This makes it difficult to differentiate between Alt-f and Esc f at the receiving side. The only possibility left for the program controlled by some terminal is by timing. As soon as network delays come into play, all bets are off. Also, not every computer is a PC, so not every keyboard has an Esc key.
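The timing heuristic, and the way it breaks, can be shown with a small sketch. The 50 ms threshold and the `(timestamp, byte)` event format are assumptions made for this illustration; real programs pick their own timeouts.

```python
ESC = 0x1B

def classify_keys(events, timeout=0.05):
    """Label a list of (timestamp_seconds, byte) input events.

    An ESC followed by another byte within `timeout` is taken to be an Alt
    combination; an ESC with nothing arriving in time is a bare Esc keypress.
    """
    labels = []
    i = 0
    while i < len(events):
        stamp, byte = events[i]
        has_next = i + 1 < len(events)
        if byte == ESC and has_next and events[i + 1][0] - stamp <= timeout:
            labels.append("Alt-" + chr(events[i + 1][1]))
            i += 2
        elif byte == ESC:
            labels.append("Esc")
            i += 1
        else:
            labels.append(chr(byte))
            i += 1
    return labels

# Both bytes arrive together: read as one Alt combination.
print(classify_keys([(0.000, ESC), (0.001, ord("f"))]))  # ['Alt-f']
# Same keystroke, but the second byte is delayed in transit.
print(classify_keys([(0.000, ESC), (0.200, ord("f"))]))  # ['Esc', 'f']
```

The second call is the failure mode described above: nothing in the byte stream itself separates the two cases, only the arrival times.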

Using Unicode in Linux (NewsForge)

Posted Nov 3, 2004 22:04 UTC (Wed) by iabervon (subscriber, #722) [Link]

Your two points are contradictory. Almost every keyboard has at least one of ESC and Alt/Meta, but not all have both. That means that you shouldn't be treating Alt-f and ESC f differently, because there are people who can't type one or the other. Programs shouldn't distinguish between a π typed on a keyboard with a π key and a π typed by using a compose-character method; the same is true of Alt-f.

Of course, some programs want to distinguish between an escape sequence, and the user just hitting ESC by itself. These will have problems. But they already have problems with, for example, arrow keys.

ALT character encodings

Posted Nov 2, 2004 22:43 UTC (Tue) by Per_Bothner (subscriber, #7375) [Link]

We need a specification for terminal emulators as to which keystrokes transmit which key sequences. The specification should work with UTF-8, so no escape sequences with the high bit set.

There seems to be a consensus finally for DEL vs BS vs ctrl-H (I don't remember where I saw it). Most applications also handle arrow keys (though there are two different encodings), and page-up/-down. Is there a consensus for function keys?

The Alt key is tricky, because it is also used for "local" operations. So even if there were a standard for encoding Alt-f it is possible it might get interpreted by the terminal emulator or window manager. This is a problem of conflicting user interface standards: Does ctrl-V do a paste, or scroll (as in Emacs)? And even if you agree it does a paste, should the paste be performed locally (by the terminal emulator) or remotely (by the program running in the terminal emulator)? Ideally, the answer to the latter shouldn't matter, as both applications should use the same clipboard - which is a little tricky with traditional tty-based applications.

ALT character encodings

Posted Nov 2, 2004 23:09 UTC (Tue) by iabervon (subscriber, #722) [Link]

Actually, looking at ANSI escape sequences, plus the fact that holding alt is considered equivalent to first typing ESC for terminals (which I'd forgotten) should give a reasonable solution for the keys. In fact, if Alt-f opens the file menu, typing ESC f instead should still make the shell go forward a word (or whatever the receiving program does with that input), since the terminal emulator is actually getting keyboard scan codes and can tell what you're really doing without escape sequence encoding.

As for the user interface standards, it is impossible to have a single clipboard between the terminal emulator and the program running in it; I often have the same program running in multiple terminal emulators on different machines, using screen. I find it useful that I can copy some text in emacs in the screen session on my login server while connected from work, log out of work, go home, log in, connect to the screen session, and paste the text. For that matter, I sometimes log into the screen session from multiple places at once, and I don't want my clipboard transferred from desktop 1 via my server to desktop 2 whenever I copy something on desktop 1. So it matters very much whether I'm copying something from the desktop into the terminal emulator or whether I'm copying something inside the terminal emulator.

ALT character encodings

Posted Nov 4, 2004 3:15 UTC (Thu) by Per_Bothner (subscriber, #7375) [Link]

typing ESC f instead should still make the shell go forward a word (or whatever the receiving program does with that input)

It's not an ideal solution, though, as typing ESC f isn't quite as convenient as typing Alt-f.

I find it useful that I can copy some text in emacs in the screen session on my login server while connected from work, log out of work, go home, log in, connect to the screen session, and paste the text.

That seems rather esoteric. Few people are going to try that. (That doesn't mean it wouldn't be nice to have.) But more important is being able to copy to the clipboard from one application, and paste into the terminal emulator. Note I can type ctrl-Y in a remote Emacs X window, and have it paste from the clipboard on my local screen. It's not unreasonable to want the same for a remote Emacs running in a terminal window - though it would take some enhancement to the terminal emulator.

It is still possible to get the effect you want: if you log out of your "X-enhanced screen", it can copy the current clipboard contents to the screen session. This would be somewhat similar to the proposed "clipboard manager", except that this clipboard manager would be local to screen.

I've been dreaming of an enhanced terminal emulator, including the functionality of the abandoned XMLterm, my Emacs terminal-mode and "rlfe" front-end, but I don't see any signs of it happening, I'm sorry to say.

ALT character encodings

Posted Nov 4, 2004 6:39 UTC (Thu) by iabervon (subscriber, #722) [Link]

It's not an ideal solution, though, as typing ESC f isn't quite as convenient as typing Alt-f.

Personally, I don't like drop-down menus, and I really think they're inappropriate for terminal emulators. Particularly "File", since they don't do anything with files.

That seems rather esoteric.

I do unusual things, certainly. On the other hand, once you have the right model in your head, it works just like you'd think. It's a bit like carrying around a book and reading it in sequence regardless of where you happen to be, or having a laptop that you can pull out in different places, except that you don't have to carry it around.

Personally, I copy from one window to another with the mouse, and expect the keyboard to act internally to the program, although obviously emacs doesn't actually work entirely that way, and there's no particular reason it should. Of course, xterm and programs in them do behave the way I expect. I don't think I've ever wanted to have a remote program pull the clipboard state from the local session, or expected that to work; sending the local clipboard through the terminal to the remote program seems more natural. I've actually been more likely to expect to be able to copy and paste into a window out of real life (obviously, this didn't work).

ALT character encodings

Posted Nov 7, 2004 16:08 UTC (Sun) by pimlott (subscriber, #1535) [Link]

I don't think I've ever wanted to have a remote program pull the clipboard state from the local session, or expected that to work

I do this all the time with vim. Even in terminal mode, you can paste from the clipboard by typing "*p. Of course, it needs an X connection to do this, usually the one forwarded over ssh. This is not ideal, because I use a screen-like program (dtach), so that when I reattach from a different system, vim still tries to use the old X connection. So for me, a terminal-based clipboard protocol would be a big win. (One could imagine an alternate hack: put $DISPLAY into a file .display, and have vim poll that file and reinitialize its X connection when it changes. But a terminal-based protocol would be more likely to be supported by non-X-aware applications.)
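The ".display" polling hack could look roughly like the sketch below. It is purely hypothetical: vim has no such feature as far as this thread says, the class name `DisplayWatcher` is invented, and the file is assumed to hold a single $DISPLAY value written by whatever reattaches the session.

```python
class DisplayWatcher:
    """Watch a file that holds the current $DISPLAY value (hypothetical helper)."""

    def __init__(self, path: str):
        self.path = path
        self.current = None  # last value seen, None until the first successful poll

    def poll(self):
        """Return the new display string if it changed since the last poll, else None."""
        try:
            with open(self.path, encoding="ascii") as handle:
                value = handle.read().strip()
        except OSError:
            return None  # file missing or unreadable: keep the old connection
        if value and value != self.current:
            self.current = value
            return value
        return None
```

A program would call `poll()` periodically and reopen its X connection whenever a value comes back. That still requires every application to be X-aware, which is the limitation a terminal-based protocol would avoid.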

sending the local clipboard through the terminal to the remote program seems more natural

Do you mean a simple paste into the terminal? This is even less ideal, because usually you want the program at the other end to accept the pasted text uninterpreted, but there's no way for it to know it's getting a paste. (Well, vim has paste mode for this purpose, but you have to turn it on and off.)

I've actually been more likely to expect to be able to copy and paste into a window out of real life (obviously, this didn't work).

:-(

Using Unicode in Linux (NewsForge)

Posted Nov 2, 2004 22:56 UTC (Tue) by bartoldeman (subscriber, #4205) [Link]

Try

xterm*EightBitInput: false

in ~/.Xresources
then restart X or run xrdb -merge ~/.Xresources.
That way alt-x is received as Esc x (try "cat -v", you should see ^[x)
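For anyone without a terminal at hand, the notation that "cat -v" prints can be approximated in a few lines. This is a sketch of the conventional ^X and M-X rendering written for this example, not GNU cat's source; the function name `cat_v` is made up.

```python
def cat_v(data: bytes) -> str:
    """Render bytes the way "cat -v" conventionally shows them."""
    parts = []
    for byte in data:
        if byte in (0x09, 0x0A):  # TAB and LF pass through unchanged
            parts.append(chr(byte))
            continue
        prefix = ""
        if byte >= 0x80:          # high bit set: "M-" followed by the low 7 bits
            prefix = "M-"
            byte -= 0x80
        if byte < 0x20:           # control character: caret notation
            parts.append(prefix + "^" + chr(byte + 0x40))
        elif byte == 0x7F:        # DEL
            parts.append(prefix + "^?")
        else:
            parts.append(prefix + chr(byte))
    return "".join(parts)

print(cat_v(b"\x1bx"))      # ^[x    : Alt-x sent as ESC x
print(cat_v(b"\xc3\xa6"))   # M-CM-& : the UTF-8 encoding of U+00E6
print(cat_v(b"\xe6"))       # M-f    : the 8-bit "meta f"
```

Seeing M-CM-& instead of ^[f after pressing Alt-f is a sign that the resource setting has not taken effect.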

Alternatively you can set up a separate meta key (e.g. the key with the logo next to the alt key), and use that key instead of Alt where programs refer to M-something. xterm may need "MetaSendsEscape" then.

Better conversion tools

Posted Nov 4, 2004 11:51 UTC (Thu) by mjr (subscriber, #6979) [Link]

The article missed some valuable conversion tools; convmv is a smart tool for converting filenames between different encodings, and GNU recode is often more handy than iconv due to its capability to convert a file in-place (negating the need for the user to take care of temporary files etc).
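The temporary-file bookkeeping that in-place conversion hides from the user can be sketched in Python. This only illustrates the idea and is not how recode is implemented; the function name `recode_in_place` and the default encodings are assumptions for the example.

```python
import os
import shutil
import tempfile

def recode_in_place(path, src="iso-8859-1", dst="utf-8"):
    """Convert a text file from encoding src to dst, replacing the original."""
    with open(path, "rb") as handle:
        # Decode first: if the input is not valid src, this raises
        # UnicodeDecodeError and the original file is left untouched.
        text = handle.read().decode(src)
    directory = os.path.dirname(os.path.abspath(path))
    # Write the temporary file next to the original so the final rename
    # stays on the same filesystem.
    fd, tmp = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as out:
            out.write(text.encode(dst))
        shutil.copymode(path, tmp)  # keep the original permission bits
        os.replace(tmp, path)       # swap the converted file into place
    except BaseException:
        os.unlink(tmp)              # clean up the temporary file on any failure
        raise
```

Writing to a temporary file and renaming it means an interrupted conversion leaves either the old file or the new one, never a half-written mix.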

Copyright © 2004, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds