Probable e1000e corruption culprit found (and 2.6.27.1 released)

Posted Oct 17, 2008 0:16 UTC (Fri) by paragw (subscriber, #45306)
In reply to: Probable e1000e corruption culprit found (and 2.6.27.1 released) by nevets
Parent article: Probable e1000e corruption culprit found (and 2.6.27.1 released)

Also, am I right in saying that this bug was present on x86_64 as well, and that the only difference was that it scribbled on memory that was not io-remapped? It is surprising, though, that the bug had no consequences on 64-bit.


Probable e1000e corruption culprit found (and 2.6.27.1 released)

Posted Oct 17, 2008 0:43 UTC (Fri) by nevets (subscriber, #11875)

Actually no, not really.

The code did a cmpxchg, which looks at the contents of the address and, if they match the value the code expects to find there, writes the new data. Otherwise, it does not write the new data (I did recently learn that cmpxchg will always write something: either the same data it read, on failure, or the new data if the read data matches).
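The compare-and-exchange behaviour described above can be sketched in user-space C. This is only an illustration built on the C11 `atomic_compare_exchange_strong` call; it is not the kernel's cmpxchg and not the ftrace code, and the helper name `patch_if_expected` is made up for the example.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper (not kernel code): store new_val into *site only if
 * *site currently holds expected. Returns true when the swap took place,
 * false when the contents did not match and the value was left unchanged.
 */
static bool patch_if_expected(_Atomic uint32_t *site, uint32_t expected,
                              uint32_t new_val)
{
    assert(site != NULL);
    /* On a mismatch, C11 copies the value actually found into the local
     * 'expected'; we only report success or failure to the caller. */
    return atomic_compare_exchange_strong(site, &expected, new_val);
}
```

A caller that passes the wrong expected value gets `false` back and the word keeps its old contents; with the right expected value the word is replaced and the call returns `true`.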

The code that did this cmpxchg also had fault protection on, to catch writes to non-writable memory. Since the cmpxchg compares 32 bits, there is still a 1 in 4 billion chance that the address points into kernel data that happens to hold the same bytes as a call to mcount, in which case it would update that data.

NOTE!!! I cannot stress this enough. The new ftrace code, which is queued for 28, goes to considerable pains to prevent this from happening.

1) All functions labeled with __init (the code that gets removed at boot up) are also labeled with notrace (to prevent this code from being placed in the ftrace update tables).

2) Modules now call an ftrace_release function when they unload, which lets ftrace remove all of the module's functions it knows about from its update tables.

3) On x86 (and soon on PPC) we record the mcount call sites at compile time. At early boot up (before SMP starts), these call sites are modified into nops. Only when ftrace is enabled and starts tracing is the kstop_machine needed to update the call sites. But this time, because of the above two points, the changes are made only to locations in the kernel that we know are text.

4) We do not use cmpxchg anymore. We use __copy_from/to_user to read and write the data. We read it, compare it, then write it. If any of these steps fail, you will see a big nasty warning.
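The read, compare, then write sequence from point 4 could look roughly like the user-space sketch below. It is not the ftrace implementation: `struct text_region`, `region_read`, `region_write` and `patch_site` are hypothetical names, and a simple bounds check stands in for the faults that __copy_from/to_user would catch in the kernel.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum patch_status { PATCH_OK = 0, PATCH_FAULT, PATCH_MISMATCH };

/* Hypothetical stand-in for a range of memory known to be kernel text. */
struct text_region {
    uint8_t *base;
    size_t len;
};

/* Fallible read: fails instead of touching memory outside the region. */
static int region_read(const struct text_region *r, size_t off,
                       void *dst, size_t n)
{
    if (off > r->len || n > r->len - off)
        return -1;
    memcpy(dst, r->base + off, n);
    return 0;
}

/* Fallible write with the same bounds check as region_read(). */
static int region_write(struct text_region *r, size_t off,
                        const void *src, size_t n)
{
    if (off > r->len || n > r->len - off)
        return -1;
    memcpy(r->base + off, src, n);
    return 0;
}

/* Read the site, compare it with what we expect, and only then write.
 * Any failing step is reported so the caller can warn loudly. */
static enum patch_status patch_site(struct text_region *r, size_t off,
                                    const uint8_t *expected,
                                    const uint8_t *replacement, size_t n)
{
    uint8_t current[16];

    assert(r != NULL && expected != NULL && replacement != NULL);
    if (n > sizeof(current))
        return PATCH_FAULT;
    if (region_read(r, off, current, n) != 0)
        return PATCH_FAULT;
    if (memcmp(current, expected, n) != 0)
        return PATCH_MISMATCH;
    if (region_write(r, off, replacement, n) != 0)
        return PATCH_FAULT;
    return PATCH_OK;
}
```

Unlike the cmpxchg approach, nothing is written unless both the read and the comparison succeed, and the caller can tell a fault apart from unexpected contents.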

Again, if we didn't think these changes were so intrusive, we would have pushed them into 27.
