From ext3 to ext4: An Interview with Theodore Ts'o (Linux Magazine)
Posted Oct 21, 2016 13:54 UTC (Fri) by damnyoulinux (guest, #111878)
Parent article: From ext3 to ext4: An Interview with Theodore Ts'o (Linux Magazine)
I inherited a system where someone had put ext4 on several hundred workstations. I believe this may have happened automatically during a distribution upgrade; however, even the new installs that followed used ext4 with the same options as the upgraded systems. These systems are used in many ways, including a fair few database applications. Semi-regularly, perhaps once or twice a day, I would have to attend to file corruption issues on them, which usually required purging and reimporting datasets. The workstations were not treated too delicately, and this was outside of my control. For example, they would very often be turned off at the switch at the end of the day. In my case, power-loss safety was essential.
Considering anything other than simple hardware faults (cosmic rays, loose cables, etc.) was delayed because the file system itself was never corrupted. As the number of workstations doubled, then quadrupled and so on, it became harder to blame hardware alone. One widely used application in particular would often have failures, and it made no sense, because it was chosen specifically for using a power-loss-safe means of saving data. It basically relied on move being atomic. My first thought was that perhaps they lied, but after deducing that this application wrote more data than the others that had corruption, and that the corruption was fairly proportionate to write load, I started to consider the file system or storage media. These were not non-standard applications but common ones used by millions of people throughout the world, so you would expect them to be reasonably resilient and power safe, especially if they claim to be (to be fair, though, they are normally run on more stable servers). In some cases empty files would be common.

The thing about the file system is that it was relying on defaults that had been established as acceptably safe with ext3, where they didn't produce such a high rate of errors; ext4 had the same settings. Some guides today still specify those settings if you're vague with your search on things like safe mount options, and people assume they will still be safe. I didn't want to go down the rabbit hole of issues with storage media, so I focused on the file system and found out about data=ordered not being safe.

At face value, everything on the system looked fine. If you search for rename and it being atomic, you will find lots of reinforcement for it. Anyone who writes C and is familiar with the rename function will believe that it's supposed to be atomic. It's an operation that seems like it could easily be atomic. It's also very useful as a poor man's safe file update.
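For anyone who hasn't seen the "poor man's safe file update", here is a minimal sketch of the pattern in Python, whose os.replace wraps the same rename call on POSIX systems. The function name atomic_write and the ".tmp" suffix are just illustrative choices, and I'm assuming a Linux/POSIX system. The rename only makes the name switch atomic; it says nothing about whether the new file's data has reached the disk, so a common form of the pattern calls fsync on the temporary file first and on the directory afterwards. I don't know whether the application in question did this.

```python
import os


def atomic_write(path, data):
    """Replace `path` with `data` (bytes) via write-to-temp, fsync, rename.

    The aim is that after a crash the name `path` refers to either the
    complete old contents or the complete new contents.
    """
    directory = os.path.dirname(os.path.abspath(path))
    # Hypothetical naming scheme; the temp file must live on the same
    # filesystem as the target, otherwise the rename cannot be atomic.
    tmp_path = path + ".tmp"

    with open(tmp_path, "wb") as tmp:
        tmp.write(data)
        tmp.flush()             # push Python's buffer to the kernel
        os.fsync(tmp.fileno())  # ask the kernel to push the data to disk

    # Atomically point `path` at the new file (rename(2) on POSIX).
    os.replace(tmp_path, path)

    # Flush the directory entry as well, so the rename itself is durable.
    dir_fd = os.open(directory, os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```

Calling atomic_write("settings.db", b"new contents") would then replace the file in one step. Leaving out the first fsync is the variant that can plausibly leave an empty file behind after a power cut when the filesystem delays allocating data blocks.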
When everything looked power-loss safe, though, and this application was relying on a rename operation, I started to question my belief that rename is always atomic. That is how I eventually found myself here.
Unfortunately, even though the culprit has been found, at least for me, I think the problem still exists on some levels. There are probably a lot of people out there who still have the ticking time bomb of bad mount options, as well as many who have had to restore a backup and don't really know why. With the information out there, people today may have a very hard time avoiding this mistake without extensive effort. When setting up a system, it's not necessarily obvious how much damage mount options that were once safe can do.
More of the problem also comes from trying to find the right information. It's like following a trail of breadcrumbs through a labyrinth just to get mount options that are reasonably power safe and that you can understand well. The file system is something that is sacred. It deserves a lot of attention, so in an ideal world such things should be far more forthcoming, well presented and delivered by an authoritative expert source. People expect reliable behaviour and have come to depend on it. You would expect things such as the man pages enumerating the options to say in big bold letters that this issue exists. If you think how important databases and backups are, this is easily just as important.

The trail I followed started with the claim that you need "data=journal" because it disables delayed allocation. Few places explain, though, that you can't just change that. A Google search for the option nodelalloc returns a first page seemingly made up entirely of comments. You need to change the options in your bootloader or tune the filesystem with journal_data as a default mount option. It's also hard to find out things such as how data=ordered compares with nodelalloc. Can they be used together? Going deeper, the solution now seems to be that you only need nodelalloc. So why data=journal? What about the results that say to use data=writeback? How do these things compare on performance and data integrity? You also learn that there are other journal safety options no one uses because of a bug a while back. Are they safe now? Is no one using them because they've filtered them out as an option since the original bug? Do I have to understand the filesystem fully, read the implementation, run my own benchmarks, run my own tests and so on to be able to set options that I know are safe, or safe in the right way, and that give the best bang for my buck on performance? What's the final conclusion on this topic and the best solution for the problem?
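None of this answers which options are right, but a first step is at least seeing what a machine is running with. Below is a rough sketch, assuming a Linux system that exposes the usual /proc/mounts table; the function names ext4_mount_options and summarize are made up for illustration. One caveat I'm fairly sure of but haven't verified on every kernel: options left at their defaults may not be listed at all, so "not listed" here does not prove a feature is off or on.

```python
def ext4_mount_options(mounts_text):
    """Map each ext4 mount point to the set of options listed for it.

    `mounts_text` is text in /proc/mounts format:
    device, mount point, fs type, comma-separated options, dump, pass.
    """
    result = {}
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4 or fields[2] != "ext4":
            continue  # skip malformed lines and non-ext4 filesystems
        result[fields[1]] = set(fields[3].split(","))
    return result


def summarize(options):
    """Pick out the two settings discussed above from one option set."""
    data_mode = next(
        (opt.split("=", 1)[1] for opt in options if opt.startswith("data=")),
        "not listed",  # possibly the default; see the caveat in the text
    )
    return {"data": data_mode, "nodelalloc": "nodelalloc" in options}


def report(path="/proc/mounts"):
    """Print a one-line summary for every ext4 mount found in `path`."""
    with open(path) as mounts:
        for mount_point, options in ext4_mount_options(mounts.read()).items():
            print(mount_point, summarize(options))
```

Running report() across a fleet would at least show which machines still carry the old options, even if it can't say which combination is the safe one.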
If you don't have such a busy schedule, this kind of thing might not be as frustrating to you as it is to me. The corruption is recoverable, as backups are taken appropriately. It still becomes very time consuming, however, to have to keep restoring them at a relatively high frequency. It betrays Linux's track record of being a solid system for data storage applications. While it's also part of the tradition that "Linux is hard", I don't think it should be this hard for something as fundamentally crucial as your data, or to get certain guarantees and reliable, consistent information.