> Fixing non-ASCII code presents several additional and absolutely unnecessary problems.
Readability matters, and having keywords in crazy places for reasons that most C++ programmers don't understand doesn't help readability at all.
> I don't know if you ever had a pleasure of editing code written in an unfamiliar script, but I had. Cut&pasting identifiers can only get you so far before you start crying.
You must be really desperate if you need to come up with crazy use cases like that. I never needed to do what you described, nor did 99.9% of developers. Optimising a language for something like that would be insane.
Posted Mar 30, 2013 12:36 UTC (Sat) by khim (subscriber, #9252)
> I never needed to do what you described, nor did 99.9% of developers.
Really? Let's exclude the developers who have never worked on international teams and never had to deal with foreign scripts. Do you still claim that 99% of those who remain like Unicode?
Most style guides forbid anything besides US ASCII even in languages where no such limitation exists, and for good reason. You only need to edit one piece of Java code that has Japanese-specific parts with names in kanji and Arabic-specific parts with names in an abjad to understand why Unicode in programming languages is a bad idea (tm). If foreign names are at least transliterated you can type them (even if you cannot always pronounce them), but just to tell apart names written in an unfamiliar script like an abjad you need some training.
Even without kanji or abjad it's easy to create extremely hard-to-edit pieces of code. Here is an example:
public class HelloWorld {
    public static void main(String[] args) {
        for (int ⅰ=1;ⅰ<=10;ⅰ++) {
            for (int і=1;і<=10;і++)
                System.out.printf("%4d", і*і);
            System.out.println();
        }
    }
}
It's relatively easy to spot the error (depending on the font in your editor, of course), but if you try to fix it… you'll probably need to use copy-paste. And if your piece of code contains not just “ⅰ” and “і” but also a straightforward “i” and a less straightforward “i”, plus “ⁱ” and “ᵢ”… then you are in trouble. For the Java compiler all six are quite different, but for the programmer some of them may be hard to distinguish and some are impossible to distinguish at all (depending on the font).
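About the only reliable way to find out what you are actually looking at is to stop trusting your eyes and ask for the code points. A quick sketch along these lines (the class name is arbitrary, and you paste whichever look-alikes you are fighting into the string literal) at least tells you that the inner loop variable above is the Cyrillic letter:

public class WhichI {
    public static void main(String[] args) {
        // The suspects: plain ASCII "i", then the two loop variables from the
        // example above (Cyrillic і and the Roman numeral ⅰ).
        String suspects = "iіⅰ";
        suspects.codePoints().forEach(cp ->
                System.out.printf("U+%04X %s%n", cp, Character.getName(cp)));
    }
}

That prints U+0069, U+0456 and U+2170 together with their Unicode names, which is more than most fonts will ever tell you.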
US ASCII also has a couple of places where confusion is possible (think "O" vs "0" and "1" vs "l"), but these are well known and fonts are often specifically designed to distinguish them. With Unicode confusion is inevitable: I have yet to see a font where “і” differs from “i”, and often even “ⅰ” is indistinguishable from “і” or “i”.
The whole "let's argue about the taste of oysters with those who actually ate them" business just makes me sick. Sorry, but I've worked with programs which use Unicode and I most definitely don't want to repeat the experience. Unicode in comments is fine (even if you cannot change them cleanly you can always replace them with some approximation, and if the code deals with the nuances of a foreign language you often need a bit of Unicode to explain what is going on), but once Unicode reaches identifiers it becomes a disaster, and I shudder to even think about the level of mayhem when it reaches the syntax of the language itself.
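Enforcing the ASCII-only rule does not need anything fancy either. A rough sketch of such a check (the class name and the file-as-argument convention are just placeholders) that simply reports every non-ASCII character, so a human can decide whether it sits in a comment or in an identifier:

import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;

public class NonAsciiReport {
    public static void main(String[] args) throws Exception {
        // Read the source file named on the command line and flag every
        // character outside US ASCII, with its position and Unicode name.
        List<String> lines = Files.readAllLines(Paths.get(args[0]), StandardCharsets.UTF_8);
        for (int i = 0; i < lines.size(); i++) {
            String line = lines.get(i);
            for (int j = 0; j < line.length(); ) {
                int cp = line.codePointAt(j);
                if (cp > 0x7F) {
                    System.out.printf("%s:%d:%d: U+%04X %s%n",
                            args[0], i + 1, j + 1, cp, Character.getName(cp));
                }
                j += Character.charCount(cp);
            }
        }
    }
}

Hook something like that into a pre-commit check and the argument about fonts becomes moot.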
Still no fixed template syntax
Posted Mar 31, 2013 13:57 UTC (Sun) by nix (subscriber, #2304)
> I have yet to see a font where “і” differs from “i”
FWIW, here, the former (CYRILLIC SMALL LETTER BYELORUSSIAN-UKRAINIAN I, according to uniname) is differently hinted, so it appears slightly different (with more hinting blur) if the patented FreeType bytecode interpreter is turned on. (In my web browser, ｉ FULLWIDTH LATIN SMALL LETTER I is invisible; in my terminals, which use a different font, it is visible, but ⅰ SMALL ROMAN NUMERAL ONE and ⁱ SUPERSCRIPT LATIN SMALL LETTER I are square boxes. IMNSHO, anyone who uses any of these when programming is a maniac. Even using them in literal strings or translated output is questionable: font coverage for these letters is just too poor.)