Two more
Posted Mar 31, 2009 7:03 UTC (Tue) by man_ls (guest, #15091) In reply to: From ext3 to ext4: An Interview with Theodore Ts'o (Linux Magazine) by bojan
Parent article: From ext3 to ext4: An Interview with Theodore Ts'o (Linux Magazine)
Thanks for an excellent summary. Let me explain two more possible consequences:
- Mr Ts'o shows considerable arrogance in saying that virtually every application on the planet is "badly written" (including GNU fileutils, meaning the most frequently used OS tools such as mv). He also seems unaware of what we might call "Hot topics in filesystem design", such as: "POSIX is not the bible of reliability; it was never supposed to be" or "Users dislike empty files".
- This dangerous combination of arrogance and ignorance is leading Mr Ts'o to quickly damage ext4's reputation and place it next to XFS in users' minds, and we all know how hard it is to reverse that kind of reputation. This may leave Linux users for many years to come between a rock and a hard place when it comes to filesystem performance: use the obsolete and slow ext3, or suffer the consequences of repeated slow fsync() calls in the much-needed ext4.
Posted Mar 31, 2009 7:52 UTC (Tue)
by rahulsundaram (subscriber, #21946)
[Link] (9 responses)
We have many, many arrogant people in the Free software world, and key parts of any Linux system depend on their code. If you can find any technical incompetence that results in issues unfixed, it might be worth considering but I don't see you pointing out any such issues.
Posted Mar 31, 2009 19:38 UTC (Tue)
by man_ls (guest, #15091)
[Link] (8 responses)
Posted Mar 31, 2009 21:02 UTC (Tue)
by rahulsundaram (subscriber, #21946)
[Link] (7 responses)
Posted Mar 31, 2009 21:40 UTC (Tue)
by man_ls (guest, #15091)
[Link] (5 responses)
Posted Mar 31, 2009 21:50 UTC (Tue)
by rahulsundaram (subscriber, #21946)
[Link] (4 responses)
The very same blog post that describes the problems also mentions that fixes have already been queued. Technically, I don't know what more you could ask for. To be clear, there are other potential issues present but the ones you are talking about were fixed even before the blog post was written.
Posted Mar 31, 2009 22:47 UTC (Tue)
by man_ls (guest, #15091)
[Link] (3 responses)
There are few black and white issues, but a filesystem developer saying that corrupting user data is fine would seem to qualify. Later committing a fix to "work around" the problem while a hundred thousand developers fix their code is hardly enough. Technically, I am not even sure a public flogging would be enough.
And now, ladies and gentlemen, with your kind permission I will just call Ts'o a nazi in a half-assed invocation of Godwin's law to jump out of this discussion and go to sleep.
Posted Mar 31, 2009 23:52 UTC (Tue)
by rahulsundaram (subscriber, #21946)
[Link] (2 responses)
Posted Apr 1, 2009 0:04 UTC (Wed)
by bojan (subscriber, #14302)
[Link] (1 responses)
Actually, you don't even have to look at other file systems. ext3 in writeback mode is sufficient, because metadata can go to disk before data. You may end up with garbage in your files after a crash.
Posted Apr 1, 2009 6:52 UTC (Wed)
by man_ls (guest, #15091)
[Link]
Posted Apr 3, 2009 14:03 UTC (Fri)
by anton (subscriber, #25547)
[Link]
Posted Mar 31, 2009 9:38 UTC (Tue)
by regala (guest, #15745)
[Link] (4 responses)
Posted Mar 31, 2009 9:47 UTC (Tue)
by regala (guest, #15745)
[Link] (3 responses)
Posted Mar 31, 2009 18:21 UTC (Tue)
by man_ls (guest, #15091)
[Link] (2 responses)
That reminds me of the old joke. A reckless driver on the highway is listening to the radio: "Attention, attention, there is a crazy man driving against the traffic on the highway", and he says: "One? All of 'em!"
Posted Mar 31, 2009 19:32 UTC (Tue)
by nix (subscriber, #2304)
[Link] (1 responses)
I wonder if you've been using the same Internet I have, really.
Posted Mar 31, 2009 21:37 UTC (Tue)
by man_ls (guest, #15091)
[Link]
Posted Mar 31, 2009 14:02 UTC (Tue)
by clugstj (subscriber, #4020)
[Link] (8 responses)
Posted Mar 31, 2009 15:40 UTC (Tue)
by sbergman27 (guest, #10767)
[Link] (7 responses)
There is competence, and there is judgment. And the two are distinct. I think that it is his judgment on this matter that is in question. I've been waiting for Linus to speak on the matter. I would be very interested in his view of this matter. Of course, the distros have the final say as to what are the effective defaults, even down to the patches they choose to apply. And *savvy* users have the ultimate decision as to the configuration of their systems. Unsavvy users, of course, are stuck with what they get.
Posted Mar 31, 2009 19:18 UTC (Tue)
by sbergman27 (guest, #10767)
[Link]
========================
Isn't that the same fix? ext4 just defaults to the crappy "writeback"
We might as well go back to ext2 then. If your data gets written out long
Linus
=======================
Posted Mar 31, 2009 19:18 UTC (Tue)
by man_ls (guest, #15091)
[Link] (5 responses)
We might as well go back to ext2 then. If your data gets written out long
after the metadata hit the disk, you are going to hit all kinds of bad
issues if the machine ever goes down.
And expecting every app to do fsync() is also crazy talk, especially with
the major filesystems _sucking_ so bad at it (it's actually a lot more
realistic with ext2 than it is with ext3).
So look for a middle ground. Not this crazy militant "user apps must do
fsync()" crap. Because that is simply not a realistic scenario.
And ext3 with "data=writeback" does the same, no?
Both of which are - as far as I can tell - total braindamage. At least
with ext3 it's not the _default_ mode.
Posted Mar 31, 2009 19:39 UTC (Tue)
by oak (guest, #2786)
[Link]
If /dev/null writes aren't zero-copy, it's journaled too!
The window for data retrieval is (infinitely) small though.
Posted Mar 31, 2009 22:37 UTC (Tue)
by bojan (subscriber, #14302)
[Link] (3 responses)
Major filesystems being "ext3 in ordered mode only", of course. The rest could be just fine with fsync(), as we can see above from his ext2 comment. And as Ted pointed out, ext4 doesn't have a big penalty on fsync(), because it doesn't have to flush out MBs of stuff that are unrelated to this particular fsync(), every time this system call is used.
Just as Linus says that ext4 is brain damaged for doing delayed allocation by default, so can it be claimed that ext3 is brain damaged for locking up people's machines for a few seconds on a perfectly reasonable system call: fsync(). We have seen this from the FF fiasco. In fact, when Linux says that having an interactive application do fsync() is impossible, he must mean on ext3 in ordered mode, because that's what FF complaints were about. As Alan Cox and Ted pointed out, one can already do fsync() in another thread and be fully interactive.
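For illustration, here is a minimal sketch of the "fsync() in another thread" idea, assuming a plain worker thread is acceptable for the application. The function name `save_in_background` is hypothetical and not taken from any project discussed in this thread.

```python
import os
import tempfile
import threading


def save_in_background(path, data):
    """Write data and fsync() it on a worker thread; return the thread."""
    def worker():
        with open(path, "wb") as f:
            f.write(data)
            f.flush()              # push Python's buffer to the kernel
            os.fsync(f.fileno())   # the slow part: wait for stable storage

    t = threading.Thread(target=worker)
    t.start()
    return t


if __name__ == "__main__":
    target = os.path.join(tempfile.mkdtemp(), "state.bin")
    t = save_in_background(target, b"hello")
    # The caller (for example a UI loop) is free to keep running here.
    t.join()
```

The caller only blocks if and when it chooses to join the thread, so a slow fsync() does not have to freeze the interface.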
As for configuration files of KDE (which is where the problem started), the library can trivially back up these files on startup and _never_ use fsync() after that. Other problems should probably be solved by a proper system call that does guarantee ordering (I think Ted provisionally called it fbarrier() or something). Then we'd have a real guarantee of the behaviour, instead of relying on the whims of implementations.
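A rough sketch of the backup-on-startup idea, under the assumption that a zero-length or missing file is the failure to recover from. The `load_config` name and the `.bak` suffix are made up for this example and are not KDE's actual mechanism.

```python
import os
import shutil


def load_config(path):
    """Return the config bytes, keeping a backup copy made at startup."""
    backup = path + ".bak"
    usable = os.path.exists(path) and os.path.getsize(path) > 0
    if usable:
        # Config looks sane: refresh the backup for the next startup.
        shutil.copyfile(path, backup)
    elif os.path.exists(backup):
        # Missing or zero-length config: fall back to the last good copy.
        shutil.copyfile(backup, path)
    else:
        return b""
    with open(path, "rb") as f:
        return f.read()
```

Note that the backup itself is not fsync()ed here; the sketch relies on the backup having reached the disk long before any later crash, which is an assumption rather than a guarantee.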
Claiming that rename() always did "data before metadata" commits is ahistorical. So, the crazy talk ain't that crazy after all. We just got caught with our pants down.
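For reference, the pattern being argued about looks roughly like this when the application does not rely on the filesystem to order data before the rename. This is a sketch with a hypothetical helper name, `atomic_write`, not code from any application mentioned here.

```python
import os
import tempfile


def atomic_write(path, data):
    """Replace path with data: write a temp file, fsync() it, then rename."""
    dirname = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=dirname)  # same directory, same filesystem
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())  # data reaches disk before the rename
        os.replace(tmp, path)     # atomic rename over the old file
    except BaseException:
        try:
            os.unlink(tmp)
        except OSError:
            pass
        raise
```

Dropping the os.fsync() line gives the variant that most applications actually use, which is the one that can leave a zero-length file behind after a crash when data is written out later than metadata.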
Surely, Linus is "tha man" when it comes to Linux and what he says will eventually go. But, removing any criticism from what he says is just arse licking, IMNSHO.
Posted Mar 31, 2009 22:43 UTC (Tue)
by bojan (subscriber, #14302)
[Link] (1 responses)
Gee, he should have called it something else. It is impossible to get the man's name right after having "Linux" :-)
Posted Apr 12, 2009 7:59 UTC (Sun)
by Duncan (guest, #6647)
[Link]
Actually, "he" (Linus) did call it something else, "Freeix". It was
(Just google freeix linux for more. "I'm feeling lucky" does it for me.)
Duncan
Posted Apr 2, 2009 12:01 UTC (Thu)
by renox (guest, #23785)
[Link]
OR the other possibility is to use a FS which does the operations in order, which simplifies application programming a lot.
Posted Mar 31, 2009 19:26 UTC (Tue)
by nix (subscriber, #2304)
[Link]
Posted Mar 31, 2009 19:28 UTC (Tue)
by nix (subscriber, #2304)
[Link] (1 responses)
Posted Apr 3, 2009 6:52 UTC (Fri)
by efexis (guest, #26355)
[Link]
Two more
You are right, "commit rights" was meant in a purely rhetorical sense. Saying "Linus should not pull nor even cherry-pick from Mr Ts'o any more" just doesn't carry the same strength.
Two more
If you can find any technical incompetence that results in issues unfixed, it might be worth considering but I don't see you pointing out any such issues.
Sorry, I don't buy that. Technical competence to me is not just about leaving no issues unfixed; it includes the ability to see the consequences of your actions. When a guy makes a change and suggests that thousands compensate for it for no good reason, that is a pretty good sign of incompetence. As sbergman27 pointed out below (and as he quoted a few jiffies before I did), Linus did choose the word "incompetent".
Two more
Just for one reason: because Mr Ts'o never admitted to being wrong. In Catholic terms, what good is reparation without repentance? Or, how can you ever learn from your mistakes if you don't admit them in the first place?
Workarounds
What other filesystems are you talking about? On ext2 and other filesystems without a journal, sure, users know the risks and live with them. But applications seem to work fine on most other journaling filesystems: ext3, reiserfs, hfs+, zfs, even xfs was fixed years ago. Cygwin on ntfs works fine.
Workarounds
Writeback mode? FAT?!? Please leave your (metaphorical) commit rights in the reception on your way out. Both of you.
Workarounds
Two more
Ted suggested that it was an application usage problem but
added hacks to work around the issues anyway.
It's a question of trust. Do I trust my data to a file system whose
developer has the attitude that Ted Ts'o has? Not if I have an
alternative.
Two more
who's been contributing since September 1991?
Two more
What I'd like you to do is to think about what you said. I don't think anyone can say Ted was ever arrogant in these dreadful flame threads around Launchpad, Ubuntu and here on LWN. He's been quite understanding, never calling anybody names while being insulted by an angry mob.
Would you please stop? He's not arrogant, you are. Even considering that Linus might start to mistrust his judgement is ridiculous.
Have you ever had anyone say that your code is "badly written" because he understood a spec in a rather peculiar manner? That amply qualifies as an insult to me. Given that most people in the world understand the spec differently, it's not bad for arrogance either.
Two more
kernel development (in fact, in free software development, period). I may
sometimes disagree with what he says, but he's *always* worth listening
to, and always well reasoned.
nix, I highly value your opinion, and Mr Ts'o can be a patron saint of the arts, but he has behaved like a jerk in this issue. Just look at his own E pur si muove:
Good people behaving badly
This will cause a significant performance hit, but apparently some Ubuntu users are happy using proprietary Nvidia drivers, even if it means that when they are done playing World of Goo, quitting the game causes the system to hang and they must hard-reset the system. For those users, it may be that nodelalloc is the right solution for now; personally, I would consider that kind of system instability to be completely unacceptable, but I guess gamers have very different priorities than I do.
I probably got too carried away with the discussion (and my own indignation). Probably he did not mean to insult anyone, and he did express himself with manners. But this tirade is not well reasoned; it has a lot of holes and is in general a lot of rubbish. More's the pity if he is such a worthy individual as you say.
Two more
It is his competence that matters.
"""
Two more
On Tue, 24 Mar 2009, Theodore Tso wrote:
>
> Try ext4, I think you'll like it. :-)
>
> Failing that, data=writeback for single-user machines is probably your
> best bet.
behavior, which is insane.
Sure, it makes things _much_ smoother, since now the actual data is no
longer in the critical path for any journal writes, but anybody who thinks
that's a solution is just incompetent.
after the metadata hit the disk, you are going to hit all kinds of bad
issues if the machine ever goes down.
I had the impression that Linus had already spoken against data loss, and he has indeed:
Where competence meets judgment
Sure, it makes things _much_ smoother, since now the actual data is no
longer in the critical path for any journal writes, but anybody who thinks
that's a solution is just incompetent.
Gods, how I enjoyed that quote. And:
But I also think that the "we write meta-data synchronously, but then the
actual data shows up at some random later time" is just crazy talk. That's
simply insane. It _guarantees_ that there will be huge windows of times
where data simply will be lost if something bad happens.
And:
Doesn't at least ext4 default to the _insane_ model of "data is less
important than meta-data, and it doesn't get journalled"?
Linus is tha man.
Speed doesn't matter if you cannot trust it
cat >/dev/null
Where competence meets judgment
above "he should have called it something else" was simply a figure of
speech, but maybe the below will be new to the newbies at least.
Linus' colleague that put it up on the ftp-site that put it in a
directory he named "linux", and so history was made.
Judgments must take users into account
Which means that, whatever the FS, if you must use fsync to get the correct behaviour, then to avoid showing a freeze to the user you must go to the dreaded multi-threaded world.
Sure, the FS can provide a (Linux-specific) write barrier, but it's very likely that nobody will use it.
There may be a small performance cost; somehow I doubt that users will care.
Two more
coreutils now (merged with what used to be sh-utils and textutils).
Two more
loss: the remaining instances don't seem major to me (extending existing
files, for instance, is much rarer than writing out new ones).
"extending existing
files, for instance, is much rarer than writing out new ones"
Two more
My system, apache and database replay log directories would disagree on that one.