Not logged in
Log in now
Create an account
Subscribe to LWN
LWN.net Weekly Edition for May 23, 2013
An "enum" for Python 3
An unexpected perf feature
LWN.net Weekly Edition for May 16, 2013
A look at the PyPy 2.0 release
is HWPOISON a hardware level feature or a OS level feature?
if it's a hardware level feature (which is what I understood from the original article) then it wouldn't necessarily cause a machine check error ever.
if this is instead a difference in how the OS responds to a memory error I just completely misunderstood what's happening.
Posted Sep 1, 2009 23:59 UTC (Tue) by jzbiciak (✭ supporter ✭, #5246)
A machine check error (whether delivered as an exception or an interrupt--the new MCA does both depending on the error type) is a message from the hardware to the software. In the most recent Intel architectures, they support a notion of "recoverable machine check," wherein the hardware tells the OS that no CPU state was corrupted when it noticed the problem. If you look at that PDF I linked, there are a number of status bits (including AR--Action Required) that indicate the severity of the error. There's a separate table in Intel's PDF that suggests the possible OS responses to a particular error.
Once the hardware delivers the message to the OS (via a machine check), the OS is then free to deal with the machine check however it pleases. For "Action Optional" machine checks that can happen asynchronously to program execution (such as due to scrubbing), the OS can queue up a handler to go deal with the affected page, either by poisoning it or unmapping it or what-have-you. That's the stuff Andi Kleen and co.'s patch does.
Copyright © 2013, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds