Malcolm: Prevent Trojan Source attacks with GCC 12

Posted Jan 14, 2022 13:34 UTC (Fri) by rahulsundaram (subscriber, #21946)
In reply to: Malcolm: Prevent Trojan Source attacks with GCC 12 by gspr
Parent article: Malcolm: Prevent Trojan Source attacks with GCC 12

> But maybe other people need them?

That’s why he wanted it as an option.

Posted Jan 14, 2022 14:14 UTC (Fri) by wtarreau (subscriber, #51152)

> > But maybe other people need them?

> That’s why he wanted it as an option.

That's it.

In general, in computer languages, the intersection of what everyone can deal with is ASCII. The rest causes trouble for *some* participants. Sure, within a company or a bunch of buddies from the same school or country you can write in your own language and not care about the trouble caused to anyone else trying to participate in your project. But when you start having to deal with characters that do not exist on your keyboard, the same one that you're using to write "main()", "#include" or "const unsigned", it starts to become annoying.

I'm really amazed that many people talk a lot about inclusivity these days while at the same time we seem to be doing everything possible to complicate participation in worldwide projects with eccentricities like this. I'm not a native English speaker myself, yet I make the effort of writing all my comments in this language, my docs as well, naming variables and functions this way, etc., hoping that they're accessible to others. Sometimes I make mistakes in the naming and it takes me a lot of effort to find the most suitable names. So be it, I'm doing my best. But I long ago stopped writing in my native language (French), using accents or even other non-ASCII characters that I used to find convenient to refer to paragraphs etc., just because it was a pain for others to deal with (e.g. having to find another occurrence in the file and copy-paste it everywhere it is needed is not respectful of others).

Thus I would indeed like to have an option to make sure these extremely rare and most often accidental practices disappear from code I'm in charge of, without having to be rude to contributors. It's much better for them to see a warning during "make" than to have someone ask them to write something differently in a comment.
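To make the kind of check described above concrete, here is a minimal standalone sketch in Python. It is not the GCC option under discussion and not anyone's actual tooling; the function names `find_non_ascii` and `format_warning` are hypothetical, and the message wording only imitates the usual "file:line:col: warning:" shape that compilers print.

```python
def find_non_ascii(text):
    """Return (line_number, column, character) for each non-ASCII character.

    Lines are split on "\n" only, so the numbering matches what an editor
    shows even when the text contains unusual Unicode line separators.
    """
    hits = []
    for line_no, line in enumerate(text.split("\n"), start=1):
        for col, ch in enumerate(line, start=1):
            if ord(ch) > 127:  # anything outside the 7-bit ASCII range
                hits.append((line_no, col, ch))
    return hits


def format_warning(path, hit):
    """Format one hit as a compiler-style warning line (hypothetical wording)."""
    line_no, col, ch = hit
    return f"{path}:{line_no}:{col}: warning: non-ASCII character U+{ord(ch):04X}"
```

A build step could read each source file, call `find_non_ascii` on its contents, and print `format_warning` for every hit, so that contributors see the message during "make" rather than hearing it from a reviewer. Note that this flags every non-ASCII character, accented names in comments included, which is exactly the trade-off being argued over in this thread.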

Posted Jan 14, 2022 15:06 UTC (Fri) by gspr (guest, #91542)

> In general, in computer languages, the intersection of what everyone can deal with is ASCII.

But the intersection of what everyone can deal with and what is necessary for everyone is probably empty. In that case, settling for ASCII is rather arbitrary.

> The rest causes trouble for *some* participants.

So does ASCII!

> Sure, within a company or a bunch of buddies from the same school or country you can write in your own language and not care about the trouble caused to anyone else trying to participate in your project. But when you start having to deal with characters that do not exist on your keyboard, the same one that you're using to write "main()", "#include" or "const unsigned", it starts to become annoying.

Annoying… for you, yes. The person whose name is not ASCII-safe might see the situation differently. (This is not a personal gripe; my name is ASCII-safe and I almost exclusively write and code in English)

> I'm really amazed that many people talk a lot about inclusivity these days while at the same time we seem to be doing everything possible to complicate participation in worldwide projects with eccentricities like this.

Eccentricities like what?

> I'm not a native English speaker myself, yet I make the effort of writing all my comments in this language, my docs as well, naming variables and functions this way, etc., hoping that they're accessible to others.

Well, that's great. I do, too. But I find that using non-ASCII symbols, especially in comments, to describe mathematically motivated code is extremely useful and clarifying.

> Sometimes I make mistakes in the naming and it takes me a lot of effort to find the most suitable names. So be it, I'm doing my best. But I long ago stopped writing in my native language (French), using accents or even other non-ASCII characters that I used to find convenient to refer to paragraphs etc., just because it was a pain for others to deal with (e.g. having to find another occurrence in the file and copy-paste it everywhere it is needed is not respectful of others).

OK, so you chose to forgo your native language for the sake of what's convenient for you. You may disagree with people who don't want to forgo theirs, but it's a bit weird to write them off.

Posted Jan 15, 2022 10:51 UTC (Sat) by tialaramex (subscriber, #21167)

As an example of an eccentricity, ASCII has _case_, a really weird feature where some of the symbols are available in two varieties with almost but not quite the same meaning. But it only has case for its set of twenty-six Latin letters, not for the digits, for example. Digits can have case too; we just didn't bother mapping that and it fell out of use. Case is so rarely used, let alone needed, for the digits that Unicode didn't bother distinguishing it either. But case was preserved for the Latin letters despite this.

On the other hand, ASCII lacks proper quotation marks, having chosen to go with typewriter-style "straight" quotes to save space, and it can't spell some English words in the conventional way because it lacks accented letters. It is an odd duck. Like C, it was probably a good choice in the decade when I was born, but is not The Right Thing today.

Posted Jan 15, 2022 11:25 UTC (Sat) by mpr22 (subscriber, #60784)

> it can't spell some English words in the conventional way because it lacks accented letters.

I suspect most native English speakers probably never spelled naïve, coöperate, and fiancé(e) with the accented letters (even 35 years ago when I was in primary school and we all still had to use pen(cil) and paper for ~100% of schoolwork) anyway :)

Posted Jan 15, 2022 12:25 UTC (Sat) by mathstuf (subscriber, #69389)

The "coö-" spellings are certainly out of favor (except at the New Yorker). I still use "naïve" and "fiancé". I think "café" is probably among the words that keep the accent most often, IME. Also, "Pokémon" is common enough in certain circles (though probably copy/pasted).

Posted Jan 15, 2022 12:37 UTC (Sat) by mpr22 (subscriber, #60784)

Plenty of native English speakers spell that trademarked foreign proper noun without the acute, for a variety of different reasons. (can't input acute accents on their HID; don't care about diacritical marks at all; managed to reprogram their brains to assume (augmented) Latin vowels for any word that isn't obviously English or spelled-by-an-English-person so don't need the accent to pronounce it correctly; are weird half-purist weebs (if they were really purist they'd spell it in katakana); ...)

(In Pokemon fandom you'll even find people who deliberately de-capitalize it on the grounds that in-universe it's a humdrum ordinary word.)

Posted Jan 17, 2022 1:18 UTC (Mon) by NYKevin (subscriber, #129325)

> (In Pokemon fandom you'll even find people who deliberately de-capitalize it on the grounds that in-universe it's a humdrum ordinary word.)

Yeah, but *everybody* does that if the title is a common noun in-universe. Mass Effect fans will write "the mass effect" (and I believe the official in-game codex uses this style as well), Portal fans refer to "the portal gun" (its official name is "the Aperture Science Handheld Portal Device," and so some fans call it the "ASHPD"), and I have never heard of anyone capitalizing "hobbit" except in the actual title of Tolkien's book. This is completely standard English.

Posted Jan 17, 2022 5:41 UTC (Mon) by NYKevin (subscriber, #129325)

It's not just scripts, either. English (and many other western European languages, to be fair) has this really weird feature called "tense," where you have to indicate whether something happened in the past or the not-past, for every single sentence that you write. This is grammatically required; future constructions, for example, can be written in several different ways (using modal "will", using the "[be] going to" construction, just specifying a time as in "Tomorrow, I go to the store", etc.), but every single one of those constructions absolutely *must* be in the not-past tense, and every single sentence that actually takes place in the past must be in the past tense (can't write "*Yesterday, I go to the store"). There are plenty of languages that just don't require a tense, so you don't have to describe when everything happens if you don't feel that it is relevant.

Posted Jan 14, 2022 16:00 UTC (Fri) by marcH (subscriber, #57642)

Right, like when banning words like "dummy" or "blacklist" without realizing how _American_ English the inclusivity effort is. Very ironic.

American English is the lingua franca of computing; that ship has sailed. Pretending it's not is just making things more difficult.

Another recent irony is "pronouncing names correctly". Then there is the ignorant, patronizing talk of "first" and "last" names instead of "given" or "family" names. But more importantly, it assumes that American speakers are capable of making sounds not in their language, which is obviously not true. They're not even capable of pronouncing most European names that look like English ones, and it's not something most adults (in any country) can easily change. That's why many Chinese people take English "nicknames" at work, simply because they know tonal languages are extremely difficult to adjust to and the important thing is the ability to communicate.
- Paying attention and pronouncing people's names _as they desire_: yes of course, that's a very basic respect.
- Pronouncing names "correctly": of course not, we can't do that.