Flash storage endurance
Posted Jun 8, 2018 9:04 UTC (Fri) by excors (subscriber, #95769)
In reply to: Flash storage endurance by marcH
Parent article: Flash storage topics
Maybe some device manufacturers measure and optimise their IO, allowing themselves to choose a cheaper chip with lower endurance because they have confidence that it will be sufficient, whereas others don't care and have a higher-endurance chip that wears out quicker because they're constantly spamming it with log files and unnecessary caches and some process is calling sync() every 30ms. Simply comparing the raw endurance would give misleading results as to which device is better, and reviewing devices with misleading benchmarks is harmful since it forces manufacturers to optimise for those benchmarks rather than for users.
I suspect it's also hard to get meaningful measurements from a single device, because of the random nature of the failures. You might need to test a large number to get an accurate MTBF, and it seems impractical and a bit silly to buy a large number of phones just to test a chip that costs a few dollars.
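To put rough numbers on the sample-size point, here is a minimal simulation sketch. It assumes device lifetimes are exponentially distributed, which is a simplification chosen only for illustration (real flash wear-out is unlikely to follow it exactly), and the function name and the MTBF value are hypothetical.

```python
import random
import statistics


def mtbf_estimate_spread(true_mtbf, n_devices, trials=2000, seed=0):
    """Relative standard deviation of MTBF estimates made from n_devices samples.

    Assumption: lifetimes are exponentially distributed with mean true_mtbf.
    Each trial "buys" n_devices devices, runs them to failure, and averages
    their lifetimes to get one MTBF estimate.
    """
    rng = random.Random(seed)
    estimates = []
    for _ in range(trials):
        lifetimes = [rng.expovariate(1.0 / true_mtbf) for _ in range(n_devices)]
        estimates.append(statistics.fmean(lifetimes))
    # Spread of the estimates, expressed as a fraction of the true MTBF.
    return statistics.pstdev(estimates) / true_mtbf


if __name__ == "__main__":
    for n in (1, 10, 100):
        print(n, round(mtbf_estimate_spread(1000.0, n), 2))
```

Under this assumption the spread shrinks roughly as 1/sqrt(n): a single device gives an estimate that is off by about 100% of the true value, and it takes on the order of a hundred devices to get within about 10%.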
Posted Jun 8, 2018 14:06 UTC (Fri)
by marcH (subscriber, #57642)
This is already the nature of almost the entire industry except in this case. Yet I don't think anyone would like a benchmark-free world. The answer is rather better and more varied benchmark(s) that are harder to cheat. Considering the relative simplicity of storage interfaces (compared to say... GPUs!) designing such an endurance benchmark that models real-world usage quite reliably doesn't seem crazy. In fact isn't there some endurance benchmark already for less disposable storage products?
> You might need to test a large number to get an accurate MTBF, and it seems impractical and a bit silly to buy a large number of phones just to test a chip that costs a few dollars.
Fair enough. Then maybe the answer should be something like this:
Posted Jun 8, 2018 15:33 UTC (Fri)
by excors (subscriber, #95769)
Measuring the endurance of a particular flash chip doesn't sound like it should be too difficult; just do a load of writes until you see IO failures or data loss, and maybe do something to see how effective any wear-levelling is, and compare against the vendor's endurance guarantees to make sure they're not lying. But if you want to know how that affects the lifetime of a phone, you need to know the behaviour of the software on that phone, and you need to know what memory chip it uses (which is non-trivial since a single model of phone might use parts from multiple vendors at once, for supply chain diversification, and change parts over time to reduce cost), and that's not something a typical phone review site could feasibly do. CPU/GPU benchmarks are much easier since the relevant software is provided by the benchmark itself, and the hardware is usually consistent across a phone model (or if some are different then it's probably a whole different SoC and is very obvious), so measurements on a test device are likely to match customer devices.
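The "write until you see IO failures or data loss" loop could be sketched roughly as follows. This is only an illustration under stated assumptions: the function name, file path and sizes are hypothetical, the cycle count is capped so the sketch is harmless to run, and reading back through the filesystem will usually be served from the page cache rather than the flash itself, so a real endurance test would need to bypass the cache or target the raw device.

```python
import hashlib
import os
import tempfile


def write_verify_cycles(path, block_size=4096, blocks=16, max_cycles=5):
    """Overwrite a file with fresh random data, sync it, and read it back.

    Returns (cycles_completed, bytes_written) and stops at the first IO error
    or checksum mismatch. max_cycles caps the run; a real wear-out test would
    loop until failure.
    """
    total = 0
    for cycle in range(max_cycles):
        data = os.urandom(block_size * blocks)
        expected = hashlib.sha256(data).digest()
        try:
            with open(path, "wb") as f:
                f.write(data)
                f.flush()
                os.fsync(f.fileno())  # ask the OS to push the data to storage
            with open(path, "rb") as f:
                actual = hashlib.sha256(f.read()).digest()
        except OSError:
            return cycle, total  # IO failure
        if actual != expected:
            return cycle, total  # data loss or corruption
        total += len(data)
    return max_cycles, total


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        print(write_verify_cycles(os.path.join(d, "wear.bin")))
```

Dividing the bytes written at the point of failure by the device capacity gives a crude figure to compare against the vendor's rated endurance, though that comparison only means something if the writes really reached the flash.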
To get realistic data about large populations, I guess you'd need access to automatically-uploaded error logs or customer support records to see how many users have encountered storage errors. That would be nice, but seems unlikely to happen.
Posted Jun 8, 2018 18:20 UTC (Fri)
by marcH (subscriber, #57642)
Basic benchmark design problem, not specific to storage or endurance.
> and you need to know what memory chip it uses
Not a problem specific to storage or endurance: https://www.google.com/search?q=iphone+intel+modem
> CPU/GPU benchmarks are much easier since the relevant software is provided by the benchmark itself
Interfaces to GPU are orders of magnitude more complex than storage interfaces; one of the reasons cheating GPU benchmarks is universal: https://www.google.com/search?q=game+benchmark+cheating
> The problem isn't necessarily that people would cheat, it's that the marketing people would tell the engineers to spend effort legitimately...
We know how "legitimately" often ends up, at least with GPUs and car emissions. You can take for granted that some actors will always go "beyond legitimate"; again, nothing specific to flash storage or endurance.
> I guess you'd need access to automatically-uploaded error logs or customer support records to see how many users have encountered storage errors. That would be nice, but seems unlikely to happen.
How do we know it's not happening already? (biggest lie on the Internet: "I agree")
Posted Jun 8, 2018 18:44 UTC (Fri)
by excors (subscriber, #95769)
Error logs certainly get uploaded already, on some devices. They're very useful for identifying and prioritising common bugs, quickly detecting regressions when rolling out OTAs, etc. What I mean is that it's unlikely the companies with that information would ever release it publicly.
Flash storage endurance
https://ai.google/research/pubs/pub32774 "Failure Trends in a Large Disk Drive Population"
Maybe it's happening somewhere already.
Flash storage endurance
https://fosdem.org/2018/schedule/event/apitrace/
Yet no one suggests that we stop benchmarking GPUs.
Flash storage endurance
> How do we know it's not happening already? (biggest lie on the Internet: "I agree")