Flash storage topics
At the 2018 Linux Storage, Filesystem, and Memory-Management Summit (LSFMM), Jaegeuk Kim described some current issues for flash storage, especially with regard to Android. Kim is the F2FS developer and maintainer, and the filesystem-track session was ostensibly about that filesystem. In the end, though, the talk did not focus on F2FS and instead ranged over a number of problem areas for Android flash storage.
He started by noting that Universal Flash Storage (UFS) devices have high read/write speeds, but can also have high latency for some operations. For example, ext4 will issue a discard command but a UFS device might take ten seconds to process it. That leads the user to think that Android is broken, he said.
![Jaegeuk Kim](https://static.lwn.net/images/2018/lsf-kim-sm.jpg)
UFS devices have a "huge garbage-collection overhead". When garbage collection is needed, the performance of even sequential writes drops way down. That needs to be avoided, so UFS must be periodically given some time to do its garbage collection. But power is a more important consideration, so hibernating the device is prioritized, which does not leave much time for the device to do its garbage collection.
Amir Goldstein suggested doing garbage collection when the device is charging; he thought that should provide a reasonable solution. Kim said that Android currently declares a ten-minute idle time at 2am that is used to defragment the filesystem. It could perhaps also be used for garbage collection.
The solution to the discard performance problem should be fairly straightforward, he said. A kernel thread (kthread) could be added to issue discards asynchronously during idle time. Candidate blocks could be added to a list that would be processed by the kthread. There is a race condition, however: if a block on the list gets reallocated and rewritten before the kthread reaches it, the deferred discard would destroy the new data.
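A minimal Python model of the deferred-discard idea follows. It assumes a single set of freed block numbers that an idle-time worker drains. The class and method names (`PendingDiscards`, `queue`, `on_allocate`, `drain`) are hypothetical and are not F2FS or ext4 code. The model shows one way to close the reallocation race: the allocator cancels a pending discard before reusing a block, and the drain holds the same lock while issuing discards, so no block can be reused mid-batch.

```python
import threading
from typing import Callable, Set


class PendingDiscards:
    """Hypothetical deferred-discard list, drained by an idle-time worker."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._pending: Set[int] = set()

    def queue(self, block: int) -> None:
        # Called when a block is freed: record it instead of discarding now.
        with self._lock:
            self._pending.add(block)

    def on_allocate(self, block: int) -> None:
        # The race: a freed block may be reused before the worker reaches it.
        # A late discard would destroy the new data, so cancel it here.
        with self._lock:
            self._pending.discard(block)

    def drain(self, issue_discard: Callable[[int, int], None]) -> int:
        """Issue discards for all pending blocks, merging adjacent blocks
        into (start, length) ranges; return the number of commands sent."""
        # Holding the lock for the whole batch keeps on_allocate() from
        # reusing a block between "taken off the list" and "discarded".
        with self._lock:
            blocks = sorted(self._pending)
            self._pending.clear()
            commands = 0
            i = 0
            while i < len(blocks):
                j = i
                while j + 1 < len(blocks) and blocks[j + 1] == blocks[j] + 1:
                    j += 1
                issue_discard(blocks[i], j - i + 1)
                commands += 1
                i = j + 1
            return commands
```

Holding the lock across a slow discard trades allocation latency for safety. A real implementation might instead mark in-flight blocks so that the allocator skips them.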
Different UFS devices have different latencies for their cache-flush commands. Some vendors' devices have low latency but others have ten-second latencies for a single cache-flush command. Given that, it makes sense to batch cache-flush commands.
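The following small simulation shows why batching helps. It assumes a device that takes a fixed time per cache flush, and a host that, like a flush-merging thread, uses a single flush to cover every request that has arrived by the time that flush starts. Requests that arrive while a flush is in flight must wait for the next one, because their writes may not have reached the device cache when the earlier flush was issued. The function name and parameters are illustrative only.

```python
from typing import List, Tuple


def simulate_flushes(arrivals: List[float], flush_latency: float) -> Tuple[List[float], int]:
    """Return (completion time of each request, in sorted arrival order,
    and the number of device flushes issued).

    A flush starts as soon as the device is free and a request is waiting;
    it satisfies every request that arrived at or before its start time.
    """
    arrivals = sorted(arrivals)
    completions: List[float] = []
    flushes = 0
    device_free = float("-inf")  # time the device finishes its current flush
    i = 0
    while i < len(arrivals):
        start = max(arrivals[i], device_free)
        # Every request already waiting at `start` shares this one flush.
        j = i
        while j < len(arrivals) and arrivals[j] <= start:
            j += 1
        finish = start + flush_latency
        completions.extend([finish] * (j - i))
        flushes += 1
        device_free = finish
        i = j
    return completions, flushes
```

Consider a ten-second flush with requests arriving at 0, 1, 2, 3, and 12 seconds. `simulate_flushes([0, 1, 2, 3, 12], 10)` needs three device flushes rather than five, and the requests complete at 10, 20, 20, 20, and 30 seconds.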
Filesystem encryption is mandatory for Android. It is present in ext4 and has also been added to F2FS. There is some hardware encryption code from Qualcomm that cannot be pushed upstream, however. Ted Ts'o said that it is "horrible code" that only works for ext4 ecryptfs or F2FS; no one has had time to clean it up for the mainline.
Kim would like to see the garbage collection on the device side get optimized. He would like to add a customized interface that can be called when it is time to do garbage collection. If the system can detect idle time, it can then initiate the garbage-collection process.
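Here is one way the idle-time trigger could be expressed. The sketch assumes the host knows when the last I/O happened and whether the phone is charging. It follows Goldstein's suggestion of doing garbage collection while charging, and otherwise lets hibernation win, since power is the higher priority. The names and the threshold are hypothetical, and `start_gc` is only a placeholder for the customized interface Kim would like to add.

```python
from typing import Callable


def idle_action(now: float, last_io: float, charging: bool,
                idle_threshold: float = 5.0) -> str:
    """Pick what to do with the storage device right now (hypothetical policy)."""
    if now - last_io < idle_threshold:
        return "active"      # recent I/O: the device is busy, do nothing
    if charging:
        return "gc"          # idle and on external power: let the device collect garbage
    return "hibernate"       # idle on battery: power saving takes priority


def tick(now: float, last_io: float, charging: bool,
         start_gc: Callable[[], None], hibernate: Callable[[], None]) -> str:
    """Apply the policy once; the callbacks stand in for device commands."""
    action = idle_action(now, last_io, charging)
    if action == "gc":
        start_gc()           # placeholder for a device-side GC interface
    elif action == "hibernate":
        hibernate()
    return action
```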
SQLite performance is another problem area. SQLite uses fsync() to ensure its data has gotten to storage. By default it uses a journal, so writes to the database end up requiring two writes and two fsync() calls (first for the journal and then to the final location). Two fsync() operations can be expensive and are not needed for F2FS because it is a copy-on-write filesystem. A feature has been added to SQLite to avoid one write and one fsync() by using F2FS atomic writes.
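The sketch below contrasts the two write paths on Linux. The ioctl numbers are the F2FS atomic-write requests that SQLite's batch-atomic-write support uses: `_IO(0xf5, 1)` to start, `_IO(0xf5, 2)` to commit, and `_IO(0xf5, 5)` to abort. The fallback is a simplified model of the journal path described above, where the data goes to a `-journal` file first and then to its final location. It is not SQLite's actual rollback-journal format, crash recovery is omitted, and the function name `write_page` is hypothetical.

```python
import fcntl
import os

# F2FS ioctl numbers: on Linux, _IO(type, nr) encodes to (type << 8) | nr.
F2FS_IOCTL_MAGIC = 0xF5
F2FS_IOC_START_ATOMIC_WRITE = (F2FS_IOCTL_MAGIC << 8) | 1
F2FS_IOC_COMMIT_ATOMIC_WRITE = (F2FS_IOCTL_MAGIC << 8) | 2
F2FS_IOC_ABORT_VOLATILE_WRITE = (F2FS_IOCTL_MAGIC << 8) | 5


def write_page(path: str, offset: int, data: bytes) -> str:
    """Write `data` at `offset` durably; return "f2fs" or "journal"."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        try:
            fcntl.ioctl(fd, F2FS_IOC_START_ATOMIC_WRITE)
        except OSError:
            pass  # not F2FS, or atomic writes unsupported: use the journal path
        else:
            try:
                os.pwrite(fd, data, offset)                    # the only data write
                fcntl.ioctl(fd, F2FS_IOC_COMMIT_ATOMIC_WRITE)  # filesystem commits and syncs it
            except OSError:
                fcntl.ioctl(fd, F2FS_IOC_ABORT_VOLATILE_WRITE)  # drop uncommitted data
                raise
            return "f2fs"

        journal = path + "-journal"
        jfd = os.open(journal, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
        try:
            os.write(jfd, offset.to_bytes(8, "little") + data)  # write 1: journal record
            os.fsync(jfd)                                       # fsync 1
        finally:
            os.close(jfd)
        os.pwrite(fd, data, offset)  # write 2: final location
        os.fsync(fd)                 # fsync 2
        os.unlink(journal)           # the journal is no longer needed
        return "journal"
    finally:
        os.close(fd)
```

On a filesystem other than F2FS, the start ioctl fails and the function takes the two-write, two-fsync() path. On F2FS with atomic-write support, a single write is committed by the filesystem instead.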
In order to reduce the latency of fsync() calls, he is looking at write barriers. He researched them and found that they had been removed long ago. Kent Overstreet said they were removed due to unclear semantics, especially for stacked filesystems. In that case, the stack would have to provide order guarantees for the BIOs all the way down the stack, which would be difficult to do and would defeat the purpose of some of the layers. Beyond that, it is impossible to test to make sure that has been done correctly.
But Kim said that the Android case would not involve device-mapper or other stacking; he is just trying to avoid the cache-flush command. Jan Kara suggested a new storage command, like "issue barrier", that would cause any I/O issued before the barrier to complete before any new I/O.
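The hypothetical model below makes the proposed semantics concrete; no such command is assumed to exist in the block layer or in UFS. Between barriers, the device may complete writes in any order it likes. Here it sorts them by LBA as a stand-in for its internal scheduling, but it never moves a write across a barrier. Unlike a cache flush, nothing in this model waits for data to become durable, because only ordering is promised.

```python
from typing import Dict, List, Sequence, Union

BARRIER = "BARRIER"
Op = Union[int, str]  # an int is a write to that LBA; BARRIER is an ordering point


def device_completion_order(ops: Sequence[Op]) -> List[int]:
    """Complete writes in LBA order within each epoch, never across a barrier."""
    order: List[int] = []
    epoch: List[int] = []
    for op in ops:
        if op == BARRIER:
            order.extend(sorted(epoch))  # everything before the barrier finishes first
            epoch = []
        else:
            epoch.append(op)
    order.extend(sorted(epoch))
    return order


def respects_barriers(ops: Sequence[Op], order: Sequence[int]) -> bool:
    """Check that no write completes before a write from an earlier epoch.
    Assumes each LBA is written at most once in `ops`."""
    epoch_of: Dict[int, int] = {}
    epoch = 0
    for op in ops:
        if op == BARRIER:
            epoch += 1
        else:
            epoch_of[op] = epoch
    epochs = [epoch_of[lba] for lba in order]
    return epochs == sorted(epochs)
```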
Index entries for this article | |
---|---|
Kernel | Filesystems/Flash |
Conference | Storage, Filesystem, and Memory-Management Summit/2018 |
Posted Jun 6, 2018 22:58 UTC (Wed)
by Tobu (subscriber, #24111)
Posted Jun 7, 2018 17:55 UTC (Thu)
by drh (guest, #65025)
On the other hand, many applications don't care so much about losing a little work during a power outage as long as everything comes back up in a sane state.
Posted Jun 7, 2018 19:15 UTC (Thu)
by andresfreund (subscriber, #69562)
Posted Jun 7, 2018 20:52 UTC (Thu)
by zlynx (guest, #2285)
It doesn't happen very often anyway. Between laptops with batteries and desktops with UPS most of my data loss comes from kernel bugs these days.
Speaking of UPS, I never understood why someone would balk at spending $100 to protect a $1,000 computer. Bad power can cause nasty issues.
Posted Jun 7, 2018 22:10 UTC (Thu)
by k8to (guest, #15413)
At an organizational level, it's another thing to organize and do procurement and asset management for. That is basically the same dynamic playing out at a different scale, but here the paperwork and human hours for the $1,000 computer are sometimes larger than $1,000 (and, relatedly, the paperwork for the UPS could be similar to that for the computer).
It probably still makes sense in the long run, though maybe not if you go thoroughly in the cattle direction.
Posted Jun 8, 2018 15:18 UTC (Fri)
by nix (subscriber, #2304)
But UPSes are *also* a source of unreliability. If you have only a couple of power flickers a decade (as the UK used to until it privatized a lot of its electricity network and started skimping on maintenance), a UPS *worsens* reliability rather than improving it.
Posted Jun 8, 2018 15:33 UTC (Fri)
by zlynx (guest, #2285)
In my personal experience, while I have had UPS batteries go bad, which just happens every so many years, I haven't had a UPS actually fail. I have had more computer power supplies fail.
And as for utility power, I don't understand how the UK could possibly be as reliable as you say. Perhaps I'm used to more spread out rural areas of Colorado, but with power lines on poles combined with extremely high winds (tornadoes sometimes) and lightning, and heavy wet snow, well, power is just going to go out every now and then. I really don't see how maintenance could help.
Posted Jun 8, 2018 15:50 UTC (Fri)
by karkhaz (subscriber, #99844)
Posted Jun 8, 2018 16:42 UTC (Fri)
by excors (subscriber, #95769)
Posted Jun 14, 2018 17:30 UTC (Thu)
by Wol (subscriber, #4433)
After heavy rain, there was a landslip at a chalk pit that took out the local substation.
At work, some robbers tried to blow their way into a bank vault, but in the process took out a major electricity supply cable.
Some thieves tried to steal a copper power line (250KVA, I think) and took out a small town.
Things like brownouts are pretty much unknown.
So yes, in Britain MOST people MOST of the time never experience a problem. The only people who will see any need for a UPS are people who live near an industrial area where their neighbours are dirtying the supply. Outside of that, supply is both good and reliable, and short outages are almost unknown. If there's a problem, it's either with the house supply itself or, as in the cases I've mentioned above, a major but localised problem. Of those three, the first was rectified the quickest, in one day. The second took a week, while the third left many homes without power for days ...
Cheers,
Posted Jun 9, 2018 10:53 UTC (Sat)
by anselm (subscriber, #2796)
The UK Met Office would beg to disagree on tornadoes:
Given that overhead power lines are fairly common at least in rural areas of the UK, it would not be in the least surprising that storms (including hurricane remnants and tornadoes) caused occasional power outages.
Posted Jun 9, 2018 12:39 UTC (Sat)
by mpr22 (subscriber, #60784)
Posted Jun 9, 2018 13:44 UTC (Sat)
by karkhaz (subscriber, #99844)
One thing that occurs to me, though, is that the places cited for "highest number of tornadoes recorded per area" tend to have very high population densities (Bangladesh is mentioned often, and the Netherlands and UK have the highest and third-highest densities of all the non-tiny European nations). This may be the reason that the reported numbers are so high for these countries: people are a lot more likely to see a tornado, even one too weak to cause significant damage, than in rural Tornado Alley. This is compounded by the fact that tornadoes are difficult to observe directly using radar, so tornado reports mostly come from people who have seen the tornado first-hand. And higher population densities mean more infrastructure that could suffer noticeable damage, e.g. power lines, train tracks, etc.
Also I wonder if people being acclimatized to huge tornadoes in Tornado Alley leads to people reporting less of the smaller ones: a relatively benign tornado that would have somebody in the UK scrambling to phone the Met Office might just be ignored by their cousin over the pond.
Posted Jun 12, 2018 6:14 UTC (Tue)
by k8to (guest, #15413)
In the United States northeast growing up, we had a number of minor twisters that no one thought to label "tornado". If it only uprooted 30 trees or so, it was "just a twister".
Posted Jun 12, 2018 6:34 UTC (Tue)
by mpr22 (subscriber, #60784)
Posted Jun 8, 2018 17:47 UTC (Fri)
by nix (subscriber, #2304)
Long-distance high-voltage cross-country lines and lines leaving rural power stations *are* up in the air, but it takes a hell of a lot of wind to knock down one of *those* monster pylons (again, it more or less never happens unless a tree falls on the lines, and they are usually routed away from trees for exactly that reason). It does happen, but because these are high-voltage lines there is almost always fallback from elsewhere in the grid if one is hit, unless there has been a *major* storm and many of them have been taken out at once. (Again, this affects major conurbations essentially never: distant rural Scotland, or single small towns that might have only one or two incoming lines, sure, but nothing larger.)
I am unhappy with the current state of the UK power network. I've had two flickers in the last decade, and checking my supplier's logs I see one five-minute outage! This is awful: in the decades of the 1990s and 2000s I had none at all. It's hard to justify a UPS with reliability like that. (Note: in the same time window we had a half-*day* water outage, when a farmer drove a plough through a 6in supply pipe and de-watered half the town...)
Posted Jun 28, 2018 11:32 UTC (Thu)
by jospoortvliet (guest, #33164)
Posted Jul 7, 2018 19:53 UTC (Sat)
by nix (subscriber, #2304)
Posted Jun 14, 2018 15:36 UTC (Thu)
by Wol (subscriber, #4433)
At present, (a) user space cannot reason about the state of the disk, and (b) a "sync" is effectively a denial-of-service attack on all the other users of the system (and yourself).
If I can guarantee that certain writes hit the disk in a fixed order, then I can reason about the state of the disk, and write a robust app.
The problem is that POSIX explicitly only applies to a properly functioning system - it explicitly disclaims all liability if the system malfunctions, and things like databases need to be able to reason about a malfunctioning system.
Cheers,
Posted Jun 8, 2018 5:17 UTC (Fri)
by marcH (subscriber, #57642)
There's some empirical evidence that cheap eMMCs found in (you guessed it) cheap Android phones wear out quickly, sometimes even before the phone stops receiving security fixes.
On the other hand, there's an incredible number of Android sites and communities rooting and reviewing every single Android device under the sun and running all sorts of benchmarks on them. Just no... storage endurance test ever? Why?
It doesn't sound like a major feat to run some storage test designed to "break" flash storage as fast as possible, thanks to smartly configured write amplification[*] and what not, and to measure how many cycles it takes before the memory dies. Or is it hard, and if so, why? OK, that would cost a sample device, but the number of clicks for the corresponding review should hopefully offset that.
[*] https://lwn.net/Articles/428584/ Optimizing Linux with cheap flash drives Arnd Bergmann
> For example, ext4 will issue a discard command but a UFS device might take ten seconds to process it
LOL. Hey, why would you have half-decent latency requirements for components aimed at a market of purely interactive products! I doubt any eMMC was that bad, I mean at least not any brand-new eMMC.
Posted Jun 8, 2018 9:04 UTC (Fri)
by excors (subscriber, #95769)
Maybe some device manufacturers measure and optimise their IO, allowing themselves to choose a cheaper chip with lower endurance because they have confidence that it will be sufficient, whereas others don't care and have a higher-endurance chip that wears out quicker because they're constantly spamming it with log files and unnecessary caches and some process is calling sync() every 30ms. Simply comparing the raw endurance would give misleading results as to which device is better, and reviewing devices with misleading benchmarks is harmful since it forces manufacturers to optimise for those benchmarks rather than for users.
I suspect it's also hard to get meaningful measurements from a single device, because of the random nature of the failures. You might need to test a large number to get an accurate MTBF, and it seems impractical and a bit silly to buy a large number of phones just to test a chip that costs a few dollars.
Posted Jun 8, 2018 14:06 UTC (Fri)
by marcH (subscriber, #57642)
This is already the nature of almost the entire industry except in this case. Yet I don't think anyone would like a benchmark-free world. The answer is rather better and more varied benchmark(s) that are harder to cheat. Considering the relative simplicity of storage interfaces (compared to say... GPUs!) designing such an endurance benchmark that models real-world usage quite reliably doesn't seem crazy. In fact isn't there some endurance benchmark already for less disposable storage products?
> You might need to test a large number to get an accurate MTBF, and it seems impractical and a bit silly to buy a large number of phones just to test a chip that costs a few dollars.
Fair enough. Then maybe the answer should be something like this:
Posted Jun 8, 2018 15:33 UTC (Fri)
by excors (subscriber, #95769)
Measuring the endurance of a particular flash chip doesn't sound like it should be too difficult; just do a load of writes until you see IO failures or data loss, and maybe do something to see how effective any wear-levelling is, and compare against the vendor's endurance guarantees to make sure they're not lying. But if you want to know how that affects the lifetime of a phone, you need to know the behaviour of the software on that phone, and you need to know what memory chip it uses (which is non-trivial since a single model of phone might use parts from multiple vendors at once, for supply chain diversification, and change parts over time to reduce cost), and that's not something a typical phone review site could feasibly do. CPU/GPU benchmarks are much easier since the relevant software is provided by the benchmark itself, and the hardware is usually consistent across a phone model (or if some are different then it's probably a whole different SoC and is very obvious), so measurements on a test device are likely to match customer devices.
To get realistic data about large populations, I guess you'd need access to automatically-uploaded error logs or customer support records to see how many users have encountered storage errors. That would be nice, but seems unlikely to happen.
Posted Jun 8, 2018 18:20 UTC (Fri)
by marcH (subscriber, #57642)
Basic benchmark design problem, not specific to storage or endurance.
> and you need to know what memory chip it uses
Not a problem specific to storage or endurance: https://www.google.com/search?q=iphone+intel+modem
> CPU/GPU benchmarks are much easier since the relevant software is provided by the benchmark itself
Interfaces to GPU are orders of magnitude more complex than storage interfaces; one of the reasons cheating GPU benchmarks is universal: https://www.google.com/search?q=game+benchmark+cheating
> The problem isn't necessarily that people would cheat, it's that the marketing people would tell the engineers to spend effort legitimately...
We know where "legitimately" often ends up with (at least) GPUs and car emissions. You can take for granted that some actors will always go "beyond legitimate"; again, nothing specific to flash storage or endurance.
> I guess you'd need access to automatically-uploaded error logs or customer support records to see how many users have encountered storage errors. That would be nice, but seems unlikely to happen.
How do we know it's not happening already? (biggest lie on the Internet: "I agree")
Posted Jun 8, 2018 18:44 UTC (Fri)
by excors (subscriber, #95769)
Error logs certainly get uploaded already, on some devices - they're very useful for identifying and prioritising common bugs, quickly detecting regressions when rolling out OTAs, etc. What I mean is unlikely is that the companies with that information would ever release it publicly.
Posted Jun 8, 2018 9:55 UTC (Fri)
by awilfox (guest, #124923)
What about night owls like me, that are regularly using the phone at that time? If the phone isn't idle then, does it just wait for the next idle time? Searched ddg but nobody seems to have heard of Android doing defragmenting every 02:00...
Posted Jun 18, 2018 12:40 UTC (Mon)
by nelzas (subscriber, #4427)
That "issue barrier" command would be a perfect fit for some databases.
Ensuring a given order is a lot faster than ensuring everything has been persisted to disk, and can be a sufficient guarantee in distributed systems.
Wol
Neither tornadoes nor hurricanes form here
Around 30 tornadoes a year are reported in the UK. These are typically small and short-lived, but can cause structural damage if they pass over built-up areas.
Hurricanes in a literal sense don't occur in the UK a lot, but sometimes the “tail end” of a hurricane can end up in Britain as a destructive storm, like Hurricane Ophelia in October, 2017.
The UK has the most tornadoes per year of any country in Europe, and more tornadoes per square kilometre per year than any country in the world except the Netherlands. (I think it might even have more tornadoes per square kilometre per year than the region of the USA known as "Tornado Alley".)
Wikipedia tells me that to qualify as a tornado, a weather phenomenon must involve a rotating wind column, reaching from ground level to the base of the overhead clouds, with surface wind speeds in excess of 40 mph (64 km/h).
Wol
Flash storage endurance
https://www.bunniestudios.com/blog/?p=3554 On Hacking MicroSD Cards
https://ai.google/research/pubs/pub32774 "Failure Trends in a Large Disk Drive Population"
Maybe it's happening somewhere already.
https://fosdem.org/2018/schedule/event/apitrace/
Yet no one suggests that we stop benchmarking GPUs.
>
> How do we know it's not happening already? (biggest lie on the Internet: "I agree")
[slightly off-topic] 2AM?
searching for defrag in phone settings doesn't give me results...