Malcolm: The state of static analysis in the GCC 12 compiler

David Malcolm has posted an update on the state of static analysis in GCC 12.

Some other languages, such as Perl, can track input and flag any variable that should not be trusted because it was read from an outside source such as a web form. Flagging variables in this manner is called tainting. After a program runs the variable through a check, the variable can be untainted, a process called sanitization.

Our GCC analyzer's taint mode is activated by -fanalyzer-checker=taint (which should be specified in addition to -fanalyzer). Taint mode attempts to track attacker-controlled values entering the program and to warn if they are used without sanitization.
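As an illustration of what taint mode looks for, here is a minimal sketch. It assumes, to the best of my understanding, that the checker treats bytes read with fread() as attacker-controlled and that -Wanalyzer-tainted-array-index is the warning for an unchecked tainted index. The names lookup_from_stream, table and TABLE_SIZE are hypothetical. It would be compiled with something like gcc -fanalyzer -fanalyzer-checker=taint -c lookup.c, where lookup.c is also a hypothetical file name.

```c
#include <stddef.h>
#include <stdio.h>

#define TABLE_SIZE 16   /* hypothetical table size */

static int table[TABLE_SIZE];

/* Read an index from an untrusted stream and look it up in table[].
   Returns 0 on success, -1 on a short read or an out-of-range index. */
int lookup_from_stream(FILE *f, int *out)
{
    size_t idx;

    /* Source (assumed): data read by fread() starts out tainted. */
    if (fread(&idx, sizeof idx, 1, f) != 1)
        return -1;

    /* Sanitization: idx is unsigned, so an upper-bound check is enough. */
    if (idx >= TABLE_SIZE)
        return -1;

    /* Sink: array index.  Without the check above, this is the kind of
       access -Wanalyzer-tainted-array-index is meant to report. */
    *out = table[idx];
    return 0;
}
```

Deleting the bounds check leaves a program that still compiles and runs on well-behaved input, which is exactly the case a static taint checker is meant to catch.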

Posted Apr 12, 2022 14:03 UTC (Tue) by IanKelling (subscriber, #89418)

With JavaScript disabled, the page only displays "Sorry, you need to enable JavaScript to visit this website.", but if you switch to Firefox's reader view, it displays fine.

https://www.gnu.org/philosophy/javascript-trap.en.html

Posted Apr 12, 2022 14:12 UTC (Tue) by dave_malcolm (subscriber, #15013)

Thanks; I've raised this with the hosting team.

FWIW, I've written up a much more barebones version of the material for the GCC 12 release notes here:

https://gcc.gnu.org/gcc-12/changes.html#analyzer

Posted Apr 12, 2022 14:44 UTC (Tue) by IanKelling (subscriber, #89418)

Thank you! And thank you for doing this awesome work on GCC. I enjoyed the article.

Posted Apr 12, 2022 14:17 UTC (Tue) by Paf (subscriber, #91811)

There is *so much* non-free software in basically every piece of hardware out there and all across the web. *Any* code on the web may be non-free.

Why is there this religious objection to the idea that *someone else’s* proprietary code might execute on your CPU while you’re looking at their website? But heck, even granting that, do you also disable CSS? HTML5 + CSS can and is used to program quite complex web apps and they’re not any more open source than your average blob of JavaScript.

Posted Apr 12, 2022 14:41 UTC (Tue) by IanKelling (subscriber, #89418)

> Why is there this religious objection

It is not religious. The link I posted explains the objection in its first two sentences:

> In the free software community, the idea that any nonfree program mistreats its users ( https://www.gnu.org/philosophy/free-software-even-more-im... ) is familiar. Some of us defend our freedom by rejecting all proprietary software on our computers.

> HTML5 + CSS can and is used to program quite complex web apps

I'm not aware of that happening. Can you point to an example?

Posted Apr 12, 2022 15:59 UTC (Tue) by excors (subscriber, #95769)

>> HTML5 + CSS can and is used to program quite complex web apps
>
> I'm not aware of that happening. Can you point to an example?

As a slightly silly example, there's a JS-free playable version of Minesweeper at https://codepen.io/bali_balo/pen/BLJONZ . (It uses some server-side scripting to generate the HTML and CSS code, and it looks like the clickable squares are <label>s linked to checkboxes to store the state of your clicks, then the rest is using CSS selectors to render the scene based on that state. That seems easily generalisable to many kinds of interactive applications.)

Posted Apr 12, 2022 15:47 UTC (Tue) by brunowolff (guest, #71160)

I have JavaScript disabled by default because it is a security problem.
For that page the restriction was silly. Disabling styles for the page made it display reasonably, so the page never needed to check for JavaScript support in the first place.

Posted Apr 12, 2022 17:13 UTC (Tue) by flussence (guest, #85566)

> But heck, even granting that, do you also disable CSS?

I do, but that's due to the modern web brain-rot of having megabytes of custom fonts and icons loaded from third-party sites, and what those sites *do* with all that metadata they harvest and correlate. A well-engineered website would not need multiple CDNs for something as ostensibly simple as theming a blog.

Posted Apr 13, 2022 6:42 UTC (Wed) by oldtomas (guest, #72579)

"*so much*" "*any*" "religious" "*someone else's*"

Uh, oh. Emotions run high on that one, eh? Seems the problem is on its way to being solved, without much noise. Don't worry :-)

Back to the topic: thanks for that awesome compiler. Looking forward!

Posted Apr 13, 2022 7:21 UTC (Wed) by mpr22 (subscriber, #60784)

> Why is there this religious objection to the idea that *someone else’s* proprietary code might execute on your CPU while you’re looking at their website?

It's perfectly reasonable to object to having more of other people's proprietary code running inside your security boundaries than is necessary, and JavaScript engines are a well-known fount of security vulnerabilities, so it makes sense to minimize the amount of unvetted material they're exposed to.

(And I can fairly straightforwardly construct that argument without even subscribing to the underlying position, so I find it very hard to believe that you don't understand.)

Posted Apr 13, 2022 8:03 UTC (Wed) by mjg59 (subscriber, #23239)

If you don't inspect the code before you run it, is it really better to run malicious Javascript that's under a free license than it is to run malicious Javascript that's under a non-free license?

Posted Apr 16, 2022 7:24 UTC (Sat) by oldtomas (guest, #72579)

I assume your "really better" is meant from a strict security standpoint.

In general, free is better than non-free; those are two dimensions which aren't totally independent, but also not totally dependent, no?

Posted Apr 12, 2022 18:44 UTC (Tue) by bartoc (guest, #124262)

I really hope this taint mode is useful, but such tracking tends to hit "rapid taint explosions" (how appetizing) pretty quickly. GCC has lately been pretty good about providing clever warnings and static analysis, so I'm looking forward to trying this out.

Posted Apr 12, 2022 19:38 UTC (Tue) by rgmoore (✭ supporter ✭, #75)

It seems to me that the whole point of having a taint tracker is to avoid taint from growing out of control. Best practice is to sanitize potentially unsafe input immediately and work only with the sanitized form. If the taint tracker blows up because tainted input is winding up all over the place, you need to rewrite your code. Once you have stuff pretty well under control, the tracker will warn you if you missed something.
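The "sanitize immediately and work only with the sanitized form" practice can be sketched in C by giving validated values their own type, so interior code never accepts raw input. This is a coding convention, not something the GCC analyzer requires; checked_index, check_index and table_get are hypothetical names. C cannot stop a caller from building the struct directly, so the type documents intent rather than enforcing it.

```c
#include <stddef.h>

/* Meant to be produced only by check_index(); C cannot enforce that,
   so this is a convention rather than a guarantee. */
typedef struct {
    size_t value;
} checked_index;

/* Validate untrusted input once, at the boundary.
   Returns 0 and fills *out on success, -1 if raw is out of range. */
int check_index(long raw, size_t limit, checked_index *out)
{
    if (raw < 0 || (unsigned long)raw >= limit)
        return -1;
    out->value = (size_t)raw;
    return 0;
}

/* Interior code takes only the checked type, never the raw value. */
int table_get(const int *table, checked_index i)
{
    return table[i.value];
}
```

With this shape, tainted data is confined to the handful of boundary functions, which is where a taint checker's warnings would then concentrate.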

Posted Apr 13, 2022 14:07 UTC (Wed) by dave_malcolm (subscriber, #15013)

[author of the article here]

Thanks - I think your comment is hinting at a missing feature here, in that my code is looking at:
(a) sources of taint (e.g. ioctls, copy_from_user),
(b) propagation of taint, and
(c) sinks that tainted values must not reach (e.g. allocation sizes, array indices, etc., as noted by the warnings I listed in the article)

...but maybe I need some kind of:
(d) "everything should have been sanitized by now" attribute?

I'm not quite sure what that would look like, though. Maybe an attribute on functions? But then nearly everything would be labelled with that, which wouldn't be desirable. I'm working on LTO support for the analyzer, though that has its own scaling issues... Or maybe an:
__analyzer_assert_sanitized (void *);
intrinsic or somesuch?

(my knowledge of the kernel internals is largely from reading LWN over the years, rather than actually hacking on it, so help/advice from kernel developers would be most welcome!)
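The three pieces listed in that comment, (a) sources, (b) propagation and (c) sinks, can be illustrated with a kernel-style entry point in user space. This sketch assumes GCC 12's __attribute__((tainted_args)), which as I understand it tells taint mode to treat a function's arguments, and the buffers they point to, as attacker-controlled; it is guarded so other compilers ignore it. struct request, handle_request and MAX_PAYLOAD are hypothetical.

```c
#include <stdlib.h>
#include <string.h>

/* tainted_args is a GCC 12 addition; hide it from other compilers. */
#if defined(__GNUC__) && !defined(__clang__) && __GNUC__ >= 12
#define TAINTED_ARGS __attribute__((tainted_args))
#else
#define TAINTED_ARGS
#endif

#define MAX_PAYLOAD 4096u   /* hypothetical policy limit */

struct request {
    unsigned int len;           /* client-controlled length */
    const unsigned char *data;  /* at least len bytes */
};

/* (a) Source: the attribute marks *req as attacker-controlled,
   much like a syscall argument. */
TAINTED_ARGS
int handle_request(const struct request *req, unsigned char **out)
{
    /* Sanitization: bound the tainted length before any sink uses it.
       Removing this is the kind of bug -Wanalyzer-tainted-allocation-size
       is intended to flag. */
    if (req->len > MAX_PAYLOAD)
        return -1;

    /* (b) Propagation: n is derived from req->len. */
    size_t n = (size_t)req->len + 1;

    /* (c) Sink: allocation size. */
    unsigned char *buf = malloc(n);
    if (buf == NULL)
        return -1;
    memcpy(buf, req->data, req->len);
    buf[req->len] = '\0';
    *out = buf;
    return 0;
}
```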

Posted Apr 13, 2022 18:20 UTC (Wed) by rgmoore (✭ supporter ✭, #75)

> ...but maybe I need some kind of: (d) "everything should have been sanitized by now" attribute?

That wasn't my intention at all. I'm coming at this from the POV of a user. My gut feeling is that an "everything should be sanitized by now" attribute is exactly what we don't want. The whole point of this kind of analyzer is to actually check this stuff, not to make any kind of assumption. The only assumption you're really forced to make is to accept the programmer's claim that their sanitizing steps work. It would be great if you could actually figure that out, but it seems like something beyond the scope of any taint checker I've heard of.

My point was more about the idea that the taint will rapidly spread beyond control, i.e. a rapid taint explosion. If that happens, it's most likely a sign of a pervasive problem with failure to sanitize inputs. In that case, the programmer needs to sit down and seriously rethink their code.

Posted Apr 13, 2022 20:56 UTC (Wed) by dave_malcolm (subscriber, #15013)

Indeed, sorry if I mischaracterized your comment.

Right now, the taint part of the analyzer is relatively new and untested, and so a rapid taint explosion (ugh!) is also likely to be due to bugs in my analysis code, rather than in the code being analyzed. I hope to improve that for GCC 13.

Posted Apr 13, 2022 22:08 UTC (Wed) by rgmoore (✭ supporter ✭, #75)

No apology needed. Back and forth is how you straighten out misunderstandings, which can just as easily come from the writer expressing themselves poorly as from the reader taking something the wrong way.

I had honestly never given much thought to how you would go about testing code for this kind of analyzer. It seems like it would be even more challenging to debug than ordinary code. Writing good test cases is obviously a major challenge!

Posted Apr 14, 2022 2:54 UTC (Thu) by NYKevin (subscriber, #129325)

> It would be great if you could actually figure that out, but it seems like something beyond the scope of any taint checker I've heard of.

That is probably impossible in the general case.

Suppose, for example, you're letting the user upload a file. The HTTP headers say the file is some size, so you have to allocate a buffer whose size depends on the number the user gave you, with reasonable sanity checks for things like "don't try to allocate 10+ GiB of RAM," "don't try to allocate negative numbers," etc. Then the number you pass to the allocator was, at some point, tainted, and it needs to not be tainted by the time it reaches malloc(). But it's not really possible for static analysis to understand what exactly counts as a "reasonable" sanity check (e.g. maybe on your system, 10 GiB of RAM is no big deal and you could easily go up to 100, but on my system even 1 GiB would be a problem).
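The upload scenario can be sketched to show why "reasonable" is a policy decision rather than something an analyzer can infer: the limit is just a parameter supplied by whoever deploys the code. alloc_for_upload and max_bytes are hypothetical names, and the string is assumed to be the raw Content-Length header value.

```c
#include <ctype.h>
#include <errno.h>
#include <stdlib.h>

/* Parse a client-supplied length and allocate that many bytes.
   max_bytes is the deployment-specific limit.
   Returns NULL (and leaves *out_len untouched) if the value is
   malformed, zero, or over the limit. */
void *alloc_for_upload(const char *len_str, size_t max_bytes, size_t *out_len)
{
    char *end;
    unsigned long long n;

    /* Require a leading digit: strtoull() would otherwise accept "-5"
       and silently wrap it to a huge unsigned value. */
    if (len_str == NULL || !isdigit((unsigned char)len_str[0]))
        return NULL;

    errno = 0;
    n = strtoull(len_str, &end, 10);
    if (errno == ERANGE || *end != '\0')
        return NULL;

    /* Policy check: the analyzer can see that a bound exists,
       but not whether max_bytes is sensible for this machine. */
    if (n == 0 || n > max_bytes)
        return NULL;

    *out_len = (size_t)n;
    return malloc((size_t)n);
}
```

The same function is "sanitized" whether max_bytes is 1 GiB or 100 GiB; only the operator can say which is appropriate.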


Copyright © 2022, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds