Malcolm: The state of static analysis in the GCC 12 compiler
Some other languages, such as Perl, can track input and flag any variable that should not be trusted because it was read from an outside source such as a web form. Flagging variables in this manner is called tainting. After a program runs the variable through a check, the variable can be untainted, a process called sanitization. Our GCC analyzer's taint mode is activated by -fanalyzer-checker=taint (which should be specified in addition to -fanalyzer). Taint mode attempts to track attacker-controlled values entering the program and to warn if they are used without sanitization.
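As a rough illustration of the pattern described above, here is a minimal C sketch. The table and the function names (lookup_unchecked, lookup_checked, lookup_from_stream) are invented for this example. Treating fread() as the point where an outside value enters the program is an assumption made for illustration, and whether a given GCC release actually reports the unchecked variant has not been verified here.

```c
#include <stdio.h>
#include <stddef.h>

#define TABLE_LEN 16

static const int table[TABLE_LEN] = {
    0, 1, 4, 9, 16, 25, 36, 49, 64, 81, 100, 121, 144, 169, 196, 225
};

/* Uses the raw value directly as an array index. If raw_idx came from an
   outside source, this is the kind of use a taint checker aims to flag.
   Calling it with raw_idx >= TABLE_LEN is undefined behavior. */
int lookup_unchecked(size_t raw_idx)
{
    return table[raw_idx];
}

/* Range-checks ("sanitizes") the value before using it as an index.
   Returns -1 for an out-of-range index. */
int lookup_checked(size_t raw_idx)
{
    if (raw_idx >= TABLE_LEN)
        return -1;
    return table[raw_idx];
}

/* Hypothetical entry point: the index arrives from a stream, so it is
   attacker-controlled until lookup_checked() has validated it. */
int lookup_from_stream(FILE *in)
{
    size_t raw_idx;

    if (fread(&raw_idx, sizeof raw_idx, 1, in) != 1)
        return -1;
    return lookup_checked(raw_idx);
}
```

Going by the options quoted above, a file like this would be analyzed with something along the lines of `gcc -fanalyzer -fanalyzer-checker=taint -c lookup.c` (the file name is a placeholder).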
Posted Apr 12, 2022 14:03 UTC (Tue)
by IanKelling (subscriber, #89418)
[Link] (11 responses)
Posted Apr 12, 2022 14:12 UTC (Tue)
by dave_malcolm (subscriber, #15013)
[Link] (1 responses)
FWIW, I've written up a much more barebones version of the material for the GCC 12 release notes here:
Posted Apr 12, 2022 14:44 UTC (Tue)
by IanKelling (subscriber, #89418)
[Link]
Posted Apr 12, 2022 14:17 UTC (Tue)
by Paf (subscriber, #91811)
[Link] (8 responses)
Why is there this religious objection to the idea that *someone else’s* proprietary code might execute on your CPU while you’re looking at their website? But heck, even granting that, do you also disable CSS? HTML5 + CSS can and is used to program quite complex web apps and they’re not any more open source than your average blob of JavaScript.
Posted Apr 12, 2022 14:41 UTC (Tue)
by IanKelling (subscriber, #89418)
[Link] (1 responses)
It is not religious. The link I posted explains the objection in its first two sentences:
In the free software community, the idea that [any nonfree program mistreats its users]( https://www.gnu.org/philosophy/free-software-even-more-im... ) is familiar. Some of us defend our freedom by rejecting all proprietary software on our computers.
> HTML5 + CSS can and is used to program quite complex web apps
I'm not aware of that happening. Can you point to an example?
Posted Apr 12, 2022 15:59 UTC (Tue)
by excors (subscriber, #95769)
[Link]
As a slightly silly example, there's a JS-free playable version of Minesweeper at https://codepen.io/bali_balo/pen/BLJONZ . (It uses some server-side scripting to generate the HTML and CSS code, and it looks like the clickable squares are <label>s linked to checkboxes to store the state of your clicks, then the rest is using CSS selectors to render the scene based on that state. That seems easily generalisable to many kinds of interactive applications.)
Posted Apr 12, 2022 15:47 UTC (Tue)
by brunowolff (guest, #71160)
[Link]
Posted Apr 12, 2022 17:13 UTC (Tue)
by flussence (guest, #85566)
[Link]
I do, but that's due to the modern web brain-rot of having megabytes of custom fonts and icons loaded from third-party sites, and what those sites *do* with all that metadata they harvest and correlate. A well-engineered website would not need multiple CDNs for something as ostensibly simple as theming a blog.
Posted Apr 13, 2022 6:42 UTC (Wed)
by oldtomas (guest, #72579)
[Link]
Uh, oh. Emotions run high on that one, eh? Seems the problem is on its way to being solved, without much noise. Don't worry :-)
Back to the topic: thanks for that awesome compiler. Looking forward!
Posted Apr 13, 2022 7:21 UTC (Wed)
by mpr22 (subscriber, #60784)
[Link] (2 responses)
It's perfectly reasonable to object to having more of other people's proprietary code running inside your security boundaries than is necessary, and JavaScript engines are a well-known fount of security vulnerabilities, so it makes sense to minimize the amount of unvetted material they're exposed to.
(And I can fairly straightforwardly construct that argument without even subscribing to the underlying position, so I find it very hard to believe that you don't understand.)
Posted Apr 13, 2022 8:03 UTC (Wed)
by mjg59 (subscriber, #23239)
[Link] (1 responses)
Posted Apr 16, 2022 7:24 UTC (Sat)
by oldtomas (guest, #72579)
[Link]
In general, free is better than non-free; those are two dimensions which aren't totally independent, but also not totally dependent, no?
Posted Apr 12, 2022 18:44 UTC (Tue)
by bartoc (guest, #124262)
[Link] (6 responses)
Posted Apr 12, 2022 19:38 UTC (Tue)
by rgmoore (✭ supporter ✭, #75)
[Link] (5 responses)
It seems to me that the whole point of having a taint tracker is to avoid taint from growing out of control. Best practice is to sanitize potentially unsafe input immediately and work only with the sanitized form. If the taint tracker blows up because tainted input is winding up all over the place, you need to rewrite your code. Once you have stuff pretty well under control, the tracker will warn you if you missed something.
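One way to read "sanitize immediately and work only with the sanitized form" in C is to give the checked value its own type, so that internal functions cannot be handed the raw input by accident. The sketch below is only an illustration: checked_len, name_len_from_raw, bytes_needed and the MAX_NAME_LEN limit are all made up for this example, and C enforces the "only the validator creates one" rule by convention rather than by the type system.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_NAME_LEN 64   /* illustrative limit, not from any real API */

/* A length that has already passed validation. By convention only
   name_len_from_raw() fills one in, so holding a checked_len means
   the range check has happened. */
typedef struct {
    size_t value;
} checked_len;

/* Validate the raw, untrusted length at the boundary.
   Returns false and leaves *out untouched if the value is out of range. */
bool name_len_from_raw(long raw, checked_len *out)
{
    if (raw <= 0 || raw > MAX_NAME_LEN)
        return false;
    out->value = (size_t)raw;
    return true;
}

/* Internal code accepts only the checked form, never the raw long. */
size_t bytes_needed(checked_len len)
{
    return len.value + 1;   /* room for a terminating NUL */
}
```

With this shape, a tainted value that spreads through the program shows up as a raw long being passed around, which is easy to spot in review even without a tracker.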
Posted Apr 13, 2022 14:07 UTC (Wed)
by dave_malcolm (subscriber, #15013)
[Link] (4 responses)
Thanks - I think your comment is hinting at a missing feature here, in that my code is looking at:
...but maybe I need some kind of:
I'm not quite sure what that would look like, though. Maybe an attribute on functions? But then nearly everything would be labelled with that, which wouldn't be desirable. I'm working on LTO support for the analyzer, though that has its own scaling issues... Or maybe an:
(my knowledge of the kernel internals is largely from reading LWN over the years, rather than actually hacking on it, so help/advice from kernel developers would be most welcome!)
Posted Apr 13, 2022 18:20 UTC (Wed)
by rgmoore (✭ supporter ✭, #75)
[Link] (3 responses)
That wasn't my intention at all. I'm coming at this from the POV of a user. My gut feeling is that an "everything should be sanitized by now" attribute is exactly what we don't want. The whole point of this kind of analyzer is to actually check this stuff, not to make any kind of assumption. The only assumption you're really forced to make is to accept the programmer's claims when they say their sanitizing steps work. It would be great if you could actually figure that out, but it seems like something beyond the scope of any taint checker I've heard of.
My point was more about the idea that the taint will rapidly spread beyond control, i.e. a rapid taint explosion. If that happens, it's most likely a sign of a pervasive problem with failure to sanitize inputs. In that case, the programmer needs to sit down and seriously rethink their code.
Posted Apr 13, 2022 20:56 UTC (Wed)
by dave_malcolm (subscriber, #15013)
[Link] (1 responses)
Right now, the taint part of the analyzer is relatively new and untested, and so a rapid taint explosion (ugh!) is also likely to be due to bugs in my analysis code, rather than in the code being analyzed. I hope to improve that for GCC 13.
Posted Apr 13, 2022 22:08 UTC (Wed)
by rgmoore (✭ supporter ✭, #75)
[Link]
No apology needed. Back and forth is how you straighten out misunderstandings, which can just as easily be because the writer expressed themselves poorly as because the reader took something wrong.
I had honestly never given much thought to how you would go about testing code for this kind of analyzer. It seems like it would be even more challenging to debug than ordinary code. Writing good test cases is obviously a major challenge!
Posted Apr 14, 2022 2:54 UTC (Thu)
by NYKevin (subscriber, #129325)
[Link]
That is probably impossible in the general case.
Suppose, for example, you're letting the user upload a file. The HTTP headers say the file is some size, so you have to allocate a buffer whose size depends on the number the user gave you, with reasonable sanity checks for things like "don't try to allocate 10+ GiB of RAM," "don't try to allocate negative numbers," etc. Then the number you pass to the allocator was, at some point, tainted, and it needs to not be tainted by the time it reaches malloc(). But it's not really possible for static analysis to understand what exactly counts as a "reasonable" sanity check (e.g. maybe on your system, 10 GiB of RAM is no big deal and you could easily go up to 100, but on my system even 1 GiB would be a problem).
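The upload case above might look something like the following C sketch. The function name alloc_upload_buffer and its parameters are hypothetical. The point it tries to show is that the upper bound is a policy value supplied by the caller (max_upload), which is exactly the part a static analyzer cannot judge as "reasonable" on its own.

```c
#include <stdlib.h>
#include <stddef.h>

/* Allocate a buffer for an upload whose size was claimed by the client.
   claimed_size is untrusted; max_upload is the local policy limit
   (1 GiB on one system, 100 GiB on another).
   On success returns the buffer and stores its size in *out_size.
   On failure returns NULL and leaves *out_size untouched. */
void *alloc_upload_buffer(long long claimed_size, size_t max_upload,
                          size_t *out_size)
{
    void *buf;

    /* Reject zero and negative sizes. */
    if (claimed_size <= 0)
        return NULL;

    /* Reject anything above the caller's policy limit. */
    if ((unsigned long long)claimed_size > max_upload)
        return NULL;

    /* The value is now within [1, max_upload], so it fits in size_t. */
    buf = malloc((size_t)claimed_size);
    if (buf != NULL)
        *out_size = (size_t)claimed_size;
    return buf;
}
```

A caller would pick max_upload from configuration, for example `alloc_upload_buffer(header_len, cfg_max, &n)`, where header_len and cfg_max are again placeholder names.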
>
> I'm not aware of that happening. Can you point to an example?
For that page the restriction was silly. Disabling style for the page made it display reasonably, so it doesn't appear that JavaScript support should have been tested for.
(a) sources of taint (e.g. ioctls, copy_from_user),
(b) propagation of taint, and
(c) trusting sinks to which taint must not reach (e.g. allocation sizes, array indices, etc., as noted by the warnings I listed in the article)
(d) "everything should have been sanitized by now" attribute?
__analyzer_assert_sanitized (void *);
intrinsic or somesuch?
...but maybe I need some kind of:
(d) "everything should have been sanitized by now" attribute?