A gnarly 2.6.19 file corruption bug
[Posted December 20, 2006 by corbet]
When Linus
released 2.6.19,
he expressed a certain degree of confidence about its quality:
It's one of those rare "perfect" kernels. So if it doesn't happen
to compile with your config (or it does compile, but then does
unspeakable acts of perversion with your pet dachshund), you can
rest easy knowing that it's all your own d*mn fault, and you should
just fix your evil ways.
While this kernel may have lived up to expectations in a number of ways, it
would appear that somebody's evil ways have messed things up - and
dachshunds would be well advised to keep a low profile. It seems that
this kernel can corrupt ext3 filesystems - behavior which was not in the
original set of design goals.
The good news (for users) is that the bug is hard to trigger, and that most
access patterns work just fine. The bulk of the trouble seems to come with
a certain Bittorrent client, which has an unusual access pattern at best.
On occasion, parts of a page will end up being written as zeroes, through
to the end of the page. Please do not expect your editor to explain why
this is happening; it seems that nobody really understands that yet. The
solution, however, may involve some relatively serious low-level memory
management surgery.
The apparent origin of the problem is a change in how dirty pages are
tracked in the kernel. Prior to 2.6.19, this information lived in the page
tables; the 2.6.19 kernel, however, moves some of this information into the
page structure. This change enables better tracking of dirty
pages in the system, which is a good thing, but it could also be bringing
some old bugs out to play.
Not all of those bugs are necessarily in the kernel; at one point, Linus
went off and wrote a demonstration program
showing how a buggy program would work with older kernels but get
surprising results in 2.6.19. What it comes down to is that if a program
maps a file into memory, it cannot put data into that memory beyond the
current length of the file and expect that data to make it to disk. It was
a nice demonstration, but this behavioral change does not appear to be
behind the problem reports.
Confusion surrounding the propagation and management of the page dirty bits
is at the top of the suspect list, as of this writing. Nobody seems to be
able to point at anything specific, however, beyond the fact that the code
appears to be rather badly messed up. Says
Linus:
A lot of this is actually historical cruft. Some of it may even be
code that was never supposed to work, but because we maintained
_other_ dirty bits in the PTE's, and never touched them before, we
never even realized that the code that played with PG_dirty was
totally insane.
So the approach being taken by Linus is to
rework the dirty page accounting code into something a little more
reasonable. To that end, test_clear_page_dirty() is no more,
having been pronounced "insane" by Linus. Instead, the new code tries for
a better defined sense of when the dirty bit on a page can be cleared; it
comes down to either (1) the page is being written to backing store,
or (2) the page is no longer relevant (when a file is truncated, for
example). In typical fashion, Linus fixed enough to make his own
configuration work, leaving the rest as an exercise for the reader.
He makes no claims that this rework will have solved the problem, only that
it makes the code more sane than it was before. As of this writing, there
have been no responses from the people who are able to reproduce this
problem. If the problem goes away - and the developers can convince
themselves that it has not just been papered over - then some version of
this fix will likely need to be prepared for a 2.6.19 update. Then, maybe,
the dachshunds can come out of hiding.
(
Log in to post comments)