LWN: Comments on "A nasty file corruption bug - fixed"

Couple of clarifications

Nick — Sat, 13 Jan 2007 06:11:44 +0000

The article is quite good, but there may have been one thing unclear or not exactly right (to me, at least).

Firstly, there was no bug in 2.6.18 or earlier. Two bugs were introduced with the dirty shared mmap accounting patches: one was that pte dirty information would be thrown away, the other was removal of some vital lock coverage that exposed a race.

Secondly, the actual problem was not I/O being started before set_page_dirty() was called. As other people have noted, the buffers are marked clean _before_ the I/O starts, and set_page_dirty() will redirty all buffers, including the ones currently under I/O.

The main problem was very simple: ptes were getting their dirty bits cleared without that dirtiness being transferred to the page. This *appeared* to be safe, because it only happened when we wanted to clean the dirty information, just before starting page writeback. But if the filesystem had previously cleaned some buffers, many filesystems will not write them out again during this page writeback.

Data is lost when the memory represented by one of these "clean" buffers has actually been modified via this pte.
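
To make that sequence concrete, here is a toy Python model. It is a simplification, not kernel code: the Buffer and Page classes and their methods are hypothetical, set_page_dirty() here only stands in for the kernel function of that name, and it assumes a filesystem that writes back only buffers still marked dirty. The transfer flag compares dropping the pte dirty bit with moving it into the buffers first.

```python
class Buffer:
    """One block-sized piece of a page (loosely, a buffer_head)."""

    def __init__(self):
        self.dirty = False
        self.memory = 0  # contents in the page cache
        self.disk = 0    # contents on disk


class Page:
    def __init__(self, nbuffers=4):
        self.buffers = [Buffer() for _ in range(nbuffers)]
        self.pte_dirty = False  # dirty bit kept only in the page table entry

    def set_page_dirty(self):
        # Stand-in for set_page_dirty(): mark every buffer of the page dirty.
        for b in self.buffers:
            b.dirty = True

    def user_write(self, index, value):
        # A store through a shared mapping updates memory and sets the
        # pte dirty bit; the buffers are not told about it.
        self.buffers[index].memory = value
        self.pte_dirty = True

    def clean_ptes(self, transfer):
        # Clear the pte dirty bit before writeback. The buggy path simply
        # drops it; the safe path moves it into the page's buffers first.
        if self.pte_dirty and transfer:
            self.set_page_dirty()
        self.pte_dirty = False

    def writeback(self):
        # Assumed filesystem behaviour: only buffers still marked dirty are written.
        for b in self.buffers:
            if b.dirty:
                b.disk = b.memory
                b.dirty = False


def lost_data_scenario(transfer):
    page = Page()
    page.set_page_dirty()
    page.writeback()        # the filesystem has already written and cleaned every buffer
    page.user_write(2, 42)  # modify buffer 2 through the mapping: only the pte is dirty
    page.clean_ptes(transfer)
    page.writeback()
    return page.buffers[2].memory, page.buffers[2].disk


print(lost_data_scenario(transfer=False))  # (42, 0): the modification never reaches disk
print(lost_data_scenario(transfer=True))   # (42, 42)
```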

Before the page dirty tracking patches went in, the writeout of such a buffer would also be skipped (because the dirty state is only in the pte, and is not known to the pagecache). The difference is that the dirty info in the pte does not get chucked away: it gets transferred to the page (and its buffers) either by msync, or when that memory is unmapped (munmap or exit).
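
The older behaviour can be sketched with the same kind of toy model (hypothetical Python, not kernel code). The one simplification beyond the text is that munmap, exit and msync are collapsed into a single unmap_or_msync() step that hands the pte dirty bit over to the page.

```python
class Buffer:
    def __init__(self):
        self.dirty = False
        self.memory = 0  # contents in the page cache
        self.disk = 0    # contents on disk


class Page:
    def __init__(self, nbuffers=4):
        self.buffers = [Buffer() for _ in range(nbuffers)]
        self.pte_dirty = False  # dirty bit kept only in the page table entry

    def user_write(self, index, value):
        # Store through the mapping: only the pte records the change.
        self.buffers[index].memory = value
        self.pte_dirty = True

    def writeback(self):
        # Writes only buffers the pagecache knows are dirty.
        for b in self.buffers:
            if b.dirty:
                b.disk = b.memory
                b.dirty = False

    def unmap_or_msync(self):
        # Behaviour before the dirty tracking patches: the pte dirty bit is
        # never thrown away; on msync or unmap it is transferred to the page,
        # which redirties all of its buffers.
        if self.pte_dirty:
            self.pte_dirty = False
            for b in self.buffers:
                b.dirty = True


def old_kernel_scenario():
    page = Page()
    page.user_write(2, 42)
    page.writeback()        # skipped: the dirtiness lives only in the pte
    after_first = page.buffers[2].disk
    page.unmap_or_msync()   # dirty bit handed over to the page
    page.writeback()
    return after_first, page.buffers[2].disk


print(old_kernel_scenario())  # (0, 42): written late, but not lost
```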

Was that even slightly understandable or helpful? :)

There is still a race ?

jzbiciak — Mon, 08 Jan 2007 13:43:46 +0000

I interpreted the comment to mean "don't let the fact there's a tiny race here stop you from trying out this intermediate, incomplete patch. I know how to fix the race, and presumably anything in its final form would do so."

There is still a race ?

i3839 — Sat, 06 Jan 2007 17:11:01 +0000

I believe I read that this race, if it happened, would cause a writeout to happen twice instead of only once. It wouldn't cause a writeout to be dropped, so this race can't cause corruption.

There is still a race ?

Lovechild — Sat, 06 Jan 2007 04:07:47 +0000

The following is my take; since I'm a total novice compared to actual kernel hackers, I might be wrong.

If it's a strictly theoretical race and the fix means overhead, it's often left with a comment saying 'here be dragons', so that if someone actually manages to hit it with a valid test case, it can be fixed then. There's no need to endure overhead here and there for things that happen only in theory; it all adds up, you know. Adding code also tends to introduce bugs in subtle ways, and adds to the complexity of reading and working with the codebase.

A nasty file corruption bug - fixed

riel — Fri, 05 Jan 2007 20:28:36 +0000

You are correct. Dirty bits are cleared when I/O is started, so the application can dirty the page again while the disk I/O happens, without the kernel forgetting that the page was dirtied again.
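
Why the clearing happens at I/O start can be shown with a toy model of a single buffer (hypothetical Python, not kernel code). The clear_on_start flag chooses when the dirty bit is cleared; in both cases a store arrives during the I/O and marks the buffer dirty again, the way set_page_dirty() redirties buffers that are under I/O.

```python
class Buffer:
    def __init__(self):
        self.dirty = False
        self.memory = 0  # contents in the page cache
        self.disk = 0    # contents on disk


def write_during_io(clear_on_start):
    buf = Buffer()
    buf.memory, buf.dirty = 1, True

    # Writeback starts: the data going to disk is snapshotted.
    if clear_on_start:
        buf.dirty = False
    in_flight = buf.memory

    # The application modifies the page while the I/O is in flight,
    # and the buffer is marked dirty again.
    buf.memory, buf.dirty = 2, True

    # The I/O completes with the old data.
    buf.disk = in_flight
    if not clear_on_start:
        buf.dirty = False  # forgets the redirty that happened during the I/O

    # A later writeback pass only writes dirty buffers.
    if buf.dirty:
        buf.disk, buf.dirty = buf.memory, False
    return buf.disk


print(write_during_io(clear_on_start=True))   # 2: the newer data reaches disk
print(write_during_io(clear_on_start=False))  # 1: the write made during I/O is lost
```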

There is still a race ?

mikov — Wed, 03 Jan 2007 05:01:57 +0000

Linus says that "it still has a tiny tiny race (see the comment), but I
bet nobody can really hit it in real life anyway, and I know several ways
to fix it, so I'm not really _that_ worried about it."

This worries me a bit. Things that are never supposed to happen usually
happen first :-) Are they planning to fix that race?

A nasty file corruption bug - fixed

dlang — Tue, 02 Jan 2007 23:02:40 +0000

At some point you still need a pointer to each disk block of data, and that is what the bh is supposed to be used for (per Linus).

There are several good reasons for not just going in and changing all filesystems to not use them for flushing:

1. Not everyone agrees with Linus (Andrew M, for example).

2. It would be a very invasive set of changes to the filesystems, which would introduce its own risk of new bugs.

3. Many people who agree with Linus that bhs should not be used for flushing are also not sure exactly what should be done to eliminate this (and how much of the new code should be filesystem-neutral and how much should be specific to each filesystem).

A nasty file corruption bug - fixed

flewellyn — Tue, 02 Jan 2007 21:29:50 +0000

I might be naive in asking this, but why are buffer-heads still used at all? Obviously, filesystems were meant to transition away from using them for flushing, so what are they still used for?

Also, I might again be naive in asking, but why not patch all filesystems to not use them for flushing, if doing so is incorrect?

A nasty file corruption bug - fixed

iabervon — Tue, 02 Jan 2007 18:34:53 +0000

But I don't think that's actually true. If the I/O on the block is active, it has already cleared the bh's dirty bit (because the rule is that you clear dirty bits when you decide to write out data, not when you finish, to plug exactly the race you're talking about), and therefore set_page_dirty() will set it and things will be okay. I think this was Linus's second-to-last theory (something was cleaning a buffer after it sent the data to the disk), but it turned out not to be the problem.

The issue is if the page gets written out after set_page_dirty() but before the last write to the page, because the VM didn't redirty buffers in dirty pages when more writes came in. After getting the concurrent dirtying case correct, it essentially missed the case of writes to a clean part of a dirty page.
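
A toy model of that gap (hypothetical Python, not kernel code). It assumes, as dirty tracking does, that a store through a clean, write-protected pte faults and calls set_page_dirty(), while stores through an already-dirty pte take no fault at all. fs_write_buffer() is a made-up stand-in for the filesystem writing and cleaning one buffer on its own, and clean_for_writeback() models the buggy path that drops the pte dirty bit.

```python
class Buffer:
    def __init__(self):
        self.dirty = False
        self.memory = 0  # contents in the page cache
        self.disk = 0    # contents on disk


class Page:
    def __init__(self, nbuffers=4):
        self.buffers = [Buffer() for _ in range(nbuffers)]
        self.pte_dirty = False
        self.faults = 0

    def set_page_dirty(self):
        for b in self.buffers:
            b.dirty = True

    def user_write(self, index, value):
        if not self.pte_dirty:
            # Write fault on a clean pte: mark it dirty and dirty the page.
            self.faults += 1
            self.pte_dirty = True
            self.set_page_dirty()
        # Later stores through an already-dirty pte take no fault.
        self.buffers[index].memory = value

    def fs_write_buffer(self, index):
        # The filesystem writes out and cleans a single buffer.
        b = self.buffers[index]
        if b.dirty:
            b.disk, b.dirty = b.memory, False

    def clean_for_writeback(self):
        # Buggy path: the pte dirty bit is dropped, then only dirty
        # buffers are written.
        self.pte_dirty = False
        for b in self.buffers:
            if b.dirty:
                b.disk, b.dirty = b.memory, False


def clean_part_of_dirty_page():
    page = Page()
    page.user_write(0, 7)       # first write: fault, page and all buffers dirty
    page.fs_write_buffer(1)     # filesystem cleans buffer 1 early
    page.user_write(1, 99)      # no fault: buffer 1 stays clean
    page.clean_for_writeback()  # buffer 1 is skipped
    b0, b1 = page.buffers[0], page.buffers[1]
    return page.faults, b0.disk, b1.memory, b1.disk


print(clean_part_of_dirty_page())  # (1, 7, 99, 0)
```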

A nasty file corruption bug - fixed

kay — Tue, 02 Jan 2007 09:37:04 +0000

The article may be a little confusing about this, but it states clearly:

If the set_page_dirty() call comes while the I/O on the block is active, the filesystem will not notice the fact that the block's data may have changed after it was written

Kay

A nasty file corruption bug - fixed

rganesan — Tue, 02 Jan 2007 05:51:36 +0000

I agree with this comment that the article does not tell the full story. In particular, I don't think the statement "When the I/O is complete, the filesystem clears the dirty flag in the bh." is correct. I believe the filesystem clears the dirty flag in the bh when the I/O is started.

A nasty file corruption bug - fixed

iabervon — Tue, 02 Jan 2007 04:51:08 +0000

This, of course, leaves out three-quarters of the story, in which quite a number of people, including Linus, found a number of things which were confusing or actual bugs, but weren't actually the real issue. There was quite a bit of argument about whether dirty bits on pages or page tables were getting lost in complicated situations in the VM (including Linus finding something that probably was a bug, and probably would cause the right sort of corruption, but fixing it didn't solve the problem), but it turned out not to be the issue at all.

I'm not sure I actually completely follow what was going on, but I think it's a bit more subtle than the article concludes. If the PTE is already dirty, further writes don't lead to set_pte_dirty() being called. But the buffer heads may be cleaned by the filesystem after the PTE is initially marked dirty and before later writes. Then, when the page is finally done, the buffer heads are already marked clean, so they're skipped. Linus finally found that, when the bug triggered, the kernel was deciding to write out the page, at a point where there was no activity, and then doing nothing because all of the buffer heads were clean.

(Linus had previously thought the issue was that, somewhere, a dirty bit was getting cleared when I/O was completed rather than when I/O started. If you clear the dirty bit when I/O is completed, you'd lose any writes which happen during I/O. But he couldn't find anywhere this was happening, because the real issue was different.)

A nasty file corruption bug - fixed

dlang — Mon, 01 Jan 2007 23:05:01 +0000

It seems that this bug has been in the kernel since at least the 2.5 timeframe; the change in 2.6.19 just made it far easier to hit.

A nasty file corruption bug - fixed

ber — Mon, 01 Jan 2007 21:29:29 +0000

With Cyrus imapd, especially within the Kolab Server, we saw file corruptions which could be related to mmap problems. They occur rarely enough that we do not have a test case. Details at kolab/issue840. So I would welcome patches for older kernels and reliable information on how long this bug has been in there.

A nasty file corruption bug - fixed

bgoglin — Mon, 01 Jan 2007 21:23:29 +0000

Nobody noticed previously because the bug was hidden. But some changes in 2.6.19 (dirty page balancing, causing writeback to happen earlier) revealed the bug, making it occur much more frequently. Everybody using 2.6.19 should probably downgrade to an earlier kernel or use an upcoming 2.6.19.2 with the fix.

A nasty file corruption bug - fixed

arjan — Mon, 01 Jan 2007 10:08:12 +0000

People noticed... in hindsight. Suddenly a series of db4 reports show up, with people saying they saw this regularly and that it has now gone away with the fix...

A nasty file corruption bug - fixed

bronson — Mon, 01 Jan 2007 09:57:15 +0000

On the other hand, because nobody noticed this bug for four years, I don't think another week or two will cause anyone much trouble.

A nasty file corruption bug - fixed

arjan — Mon, 01 Jan 2007 08:26:52 +0000

I hope the distros will provide updates quickly since this seems to affect all 2.6 kernel versions out there...