Exposing Trojan Source exploits in Emacs
For those just tuning in, one of the Trojan Source vulnerabilities takes advantage of the control codes built into Unicode for the handling of bidirectional text. While this article is written in a left-to-right language, many languages read in the opposite direction, and Unicode-displaying applications must be prepared to deal with that. Sometimes, those applications need some help to know the direction to use when rendering a particular piece of text. Unicode provides control codes to reverse the current direction for this purpose; unfortunately, clever use of those codes can cause program text to appear differently in an editor (or browser or other viewing application) than it appears to the compiler. That can be used to sneak malicious code past even an attentive reviewer.
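To make the mechanism concrete, here is a deliberately simplified Python model of how a viewer might draw a single left-to-right line that contains a RIGHT-TO-LEFT OVERRIDE (U+202E) closed by a POP DIRECTIONAL FORMATTING (U+202C) character. It is a sketch, not the Unicode Bidirectional Algorithm. The function name `toy_display` is hypothetical. Only non-nested RLO spans are modeled, other directional controls are simply dropped, and details such as the mirroring of brackets are ignored.

```python
import unicodedata

RLO = "\u202e"  # RIGHT-TO-LEFT OVERRIDE
PDF = "\u202c"  # POP DIRECTIONAL FORMATTING
# Bidi classes of the explicit directional controls.
CONTROL_CLASSES = {"LRE", "RLE", "LRO", "RLO", "PDF", "LRI", "RLI", "FSI", "PDI"}


def toy_display(line: str) -> str:
    """Roughly approximate how a left-to-right line is drawn on screen.

    Only non-nested RLO ... PDF spans are modeled: every character in such
    a span is forced right-to-left, so the span is drawn reversed. Other
    directional controls are dropped, and bracket mirroring is ignored.
    """
    out = []
    span = None  # characters of the currently open RLO span, if any
    for ch in line:
        if ch == RLO and span is None:
            span = []
        elif ch == PDF and span is not None:
            out.extend(reversed(span))
            span = None
        elif unicodedata.bidirectional(ch) in CONTROL_CLASSES:
            continue  # other controls are invisible and not modeled here
        elif span is not None:
            span.append(ch)
        else:
            out.append(ch)
    if span is not None:
        # An override still open at the end of the line ends there.
        out.extend(reversed(span))
    return "".join(out)


logical = "user = " + RLO + "nimda" + PDF
print(logical.encode("unicode_escape").decode())  # the stored (compiled) order
print(toy_display(logical))  # user = admin
```

The compiler reads the characters in stored order, `user = nimda`, while the model shows what a reader would see on screen, `user = admin`. That gap between logical and displayed order is the whole trick.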
One part of the problem is applications that show code containing overrides in a way that is correct (from a Unicode-text point of view), but which is incorrect in terms of what will actually be compiled. So an obvious solution is to change how applications display such text. It is thus not surprising that a conversation sprang up on the Emacs development list to figure out what the Emacs editor should do.
Emacs maintainer Eli Zaretskii was quick to point out that this problem was not as new as some people seem to think. A variant of it had been discussed on that list back in 2014; at that time, the concern was malicious URLs but the basic technique was the same. Zaretskii explained that, in response, he had added some defenses to Emacs:
As result of these discussions, I implemented a special function, bidi-find-overridden-directionality, which is part of Emacs since version 25.1, released 5 years ago. (Don't rush to invoke that function with the code samples mentioned above: it won't catch them.) My expectation, and the reason why I bothered to write that function, was that given the interest and the long discussion, the function will immediately be used in some URL-related code in Emacs. That didn't happen, and the function is collecting dust in Emacs ever since.
He would be, he said, less than fully enthusiastic about launching into another defensive effort without some sort of assurance that this work would actually find some users.
Others, meanwhile, were thinking about ways to make it clear that there is funny business going on in code containing directional overrides. Daniel Brooks posted an approach using the existing Emacs whitespace-mode, with some extra configuration to mark directional overrides as special types of white space. Gregory Heytings posted a patch with a similar intent that worked by adding a new display table. Stefan Kangas suggested having the Emacs byte compiler raise errors whenever the problematic control codes appear in Emacs Lisp code unless a special flag is set.
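The "mark every control character" approaches can be sketched in a few lines. This Python version is illustrative only and is not how whitespace-mode, Heytings's display table, or the Emacs byte compiler actually work. The names `reveal`, `check_source`, `allow_bidi`, and `BidiControlError` are hypothetical. `reveal` makes every explicit bidi control visible as a `<U+XXXX>` marker. `check_source` rejects any text containing one unless the caller opts out, roughly the policy Kangas suggested.

```python
import unicodedata

# Bidi classes of the explicit embedding, override, and isolate controls
# (U+202A..U+202E and U+2066..U+2069).
CONTROL_CLASSES = {"LRE", "RLE", "LRO", "RLO", "PDF", "LRI", "RLI", "FSI", "PDI"}


class BidiControlError(ValueError):
    """Raised when source text contains an explicit bidi control."""


def is_bidi_control(ch: str) -> bool:
    return unicodedata.bidirectional(ch) in CONTROL_CLASSES


def reveal(text: str) -> str:
    """Replace every bidi control with a visible <U+XXXX> marker."""
    return "".join(f"<U+{ord(ch):04X}>" if is_bidi_control(ch) else ch for ch in text)


def check_source(text: str, allow_bidi: bool = False) -> None:
    """Reject text containing any bidi control unless explicitly allowed."""
    if allow_bidi:
        return
    for lineno, line in enumerate(text.splitlines(), start=1):
        for col, ch in enumerate(line, start=1):  # 1-based column
            if is_bidi_control(ch):
                raise BidiControlError(
                    f"line {lineno}, column {col}: U+{ord(ch):04X} "
                    f"{unicodedata.name(ch)}"
                )
```

For example, `reveal("a\u202eb")` returns `"a<U+202E>b"`. Note that both functions treat a legitimate Hebrew document exactly like a malicious one, because they look only at the presence of the characters.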
Zaretskii was not particularly impressed with any of these approaches. Simply marking the control characters, he said, is just creating "visual noise" that will make reading text harder, and is addressing the wrong problem: "The mere presence of these characters is NOT the root cause. These characters are legitimate and helpful when used as intended". He referenced TUTORIAL.he, the Emacs tutorial translated to Hebrew, which uses overrides for Emacs commands. This version of the file in a GitHub copy of the Emacs repository now helpfully marks the lines containing those overrides (as another Trojan Source defense). Zaretskii's point is that adding warnings to this kind of usage, which is not malicious, is a distraction that trains users to ignore the warnings wherever they appear.
So what should Emacs do? Zaretskii continued:
The challenge, therefore, is not to make these characters stand out wherever they happen, because that would flag also their legitimate uses for no good reason. The challenge is to flag only those suspicious or malicious uses of these characters. And that cannot be done by just changing the visual appearance of those characters, because their legitimate uses are by far more frequent than their malicious uses. To flag only the suspicious cases, the code which does that needs to examine the details of the text whose directionality was overridden and detect those cases where such overriding is suspicious. For example, when a character with a strong left-to-right directionality has its directionality overridden to behave like right-to-left character, that is highly suspicious, because it makes no sense to do that in 99.99% of valid use cases.
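Here is a simplified Python sketch of the single rule Zaretskii gives as an example: report every character of strong left-to-right type (bidi class `L`) whose innermost explicit context is a RIGHT-TO-LEFT OVERRIDE. The function name `suspicious_overrides` is hypothetical. The sketch treats one line as one paragraph and tracks embeddings, overrides, and isolates only to find the innermost context. It also ignores the real algorithm's depth limits. It is not Emacs's implementation, and it would miss tricks that do not place strong left-to-right characters directly under an override.

```python
import unicodedata

EMBEDDINGS = {"\u202a", "\u202b", "\u202d", "\u202e"}  # LRE, RLE, LRO, RLO
ISOLATES = {"\u2066", "\u2067", "\u2068"}  # LRI, RLI, FSI
RLO, PDF, PDI = "\u202e", "\u202c", "\u2069"


def suspicious_overrides(line: str) -> list[int]:
    """Return indices of strong left-to-right characters forced to RTL.

    Simplified model: the line is treated as one paragraph, and explicit
    controls are tracked only to know each character's innermost context.
    """
    stack = []  # open embeddings, overrides, and isolates, innermost last
    flagged = []
    for i, ch in enumerate(line):
        if ch in EMBEDDINGS or ch in ISOLATES:
            stack.append(ch)
        elif ch == PDF:
            # PDF closes the innermost embedding or override, never an isolate.
            if stack and stack[-1] in EMBEDDINGS:
                stack.pop()
        elif ch == PDI:
            # PDI closes the innermost isolate and everything opened inside it.
            if any(c in ISOLATES for c in stack):
                while stack.pop() not in ISOLATES:
                    pass
        elif stack and stack[-1] == RLO and unicodedata.bidirectional(ch) == "L":
            # Strong L text displayed as right-to-left: the case Zaretskii
            # calls highly suspicious.
            flagged.append(i)
    return flagged
```

Hebrew letters (class `R`) and digits (class `EN`) under an override are not reported. So `suspicious_overrides("\u202e\u05d0\u05d1\u202c")` returns an empty list, while `suspicious_overrides("abc\u202edef\u202cghi")` returns `[4, 5, 6]`, the positions of `d`, `e`, and `f`.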
Zaretskii quickly committed a patch to the Emacs repository implementing this heuristic. Your editor decided to give it a try, starting with this example of malicious code posted by Brooks:
(defun main ()
(let ((is_admin nil))
; begin admins only(when is_admin
(print "You are an admin.")) ; end admins only(
)
This code contains overrides that cause the when test to be commented out; the effect of the overrides can be seen in the browser by slowly highlighting over the code with the mouse. This code, displayed in a normal Emacs buffer with font-lock turned on, looks like this:
It is worth noting that the suspicious nature of the code is already reasonably clear from the syntax coloring; the when test is colored as a comment. Zaretskii's patch is meant to make this problem stand out even more: when the new highlight-confusing-reorderings command is run on it, the code now looks like this:
That should certainly be enough to cause even a casual reader to wonder what is going on with that code. As intended, this new command does not highlight the legitimate overrides in TUTORIAL.he; amusingly, though, it did find two places in that file where the overrides were used incorrectly.
Heytings didn't like this solution: "When security is at stake, I very much prefer too many false positives to missing one danger". He also pointed out a case that Zaretskii's code failed to catch, citing it as proof that highlighting only the malicious uses is not feasible. (That case did not survive its encounter with the email archives; it can be seen on this page.) Zaretskii responded that users who don't care about false positives can highlight all of the relevant control characters in Emacs now; he also applied a fix to detect the case that Heytings found. At that point, Heytings made it clear that he thought his point had been missed and gave up on the discussion.
So now the discussion would appear to be over; Emacs has a mechanism to make suspicious use of Unicode directional overrides easy to see. It may be a while before users benefit from that work, though. It is not at all clear that this change will be backported to current Emacs releases, so it may only be found in development builds for some time. There is also nothing that uses it by default; the highlighting will only happen if the user explicitly asks for it. To make this functionality more available, developers will need to incorporate it into the major modes used with various programming languages.
This fix, assuming it is shown to work over time, is only directly relevant to the small subset of developers who live their lives within Emacs. The approach taken, though, could prove to be useful beyond Emacs. Just waving a red flag at something that might be suspicious is usually not the best solution for security problems, especially if most of the instances that are flagged are legitimate. After a while, we all grow weary of looking past those flags and simply stop seeing them. If it is possible to just shine a light on uses that truly merit a closer look, though, then we might just gain a little security from it.