Optimizing Linux with cheap flash drives
Posted Feb 18, 2011 22:36 UTC (Fri) by zlynx (guest, #2285)
In reply to: Optimizing Linux with cheap flash drives by ewen
Parent article: Optimizing Linux with cheap flash drives
The problem is going to remain with CF and SD cards and memory sticks. These small devices cannot include the powerful processing chips and/or 64 MB DRAM that some of the SSDs require for their magic.
I think these small devices would be much better off creating a new bypass mode in the IDE/SCSI/USB protocols that exposes the actual flash to the host system. Expose the real block size, block error status, and erase, read and write commands. It might also need commands to access special metadata areas like block maps, the last written block, etc. Then we could run real flash filesystems on them.
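As a rough illustration, such a pass-through command set might look something like the C declarations below. Every name and field here is invented for the sake of the sketch; nothing like it exists in any current ATA/SCSI/USB standard.

    /* Hypothetical raw-flash pass-through interface (illustrative only). */
    #include <stdint.h>

    enum raw_flash_op {
        RAWFL_GET_GEOMETRY,   /* report page size, erase-block size, block count */
        RAWFL_READ_PAGE,      /* read one NAND page plus its spare/metadata area */
        RAWFL_WRITE_PAGE,     /* program one page within an already-erased block */
        RAWFL_ERASE_BLOCK,    /* erase one whole erase block */
        RAWFL_BLOCK_STATUS,   /* query bad-block / wear status of a block */
    };

    struct raw_flash_geometry {
        uint32_t page_size;        /* bytes per page, e.g. 4096 */
        uint32_t pages_per_block;  /* e.g. 128, giving a 512 KiB erase block */
        uint32_t block_count;      /* total erase blocks on the medium */
        uint32_t oob_size;         /* spare bytes per page for metadata/ECC */
    };

Given something like this, a real flash filesystem (UBIFS, say) could manage the medium directly instead of second-guessing the card's hidden translation layer.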
Posted Feb 18, 2011 23:35 UTC (Fri) by bronson (subscriber, #4806)
I think everyone will agree with this. But, short of waving a magic wand (i.e. Microsoft or Intel write specs), I don't see any way of making this happen. It's a monster chicken-and-egg problem: OSes can't add support for devices that don't exist, and vendors won't bother implementing a raw interface until OSes can use it.
Posted Feb 19, 2011 0:00 UTC (Sat) by saffroy (guest, #43999)
Posted Feb 19, 2011 12:40 UTC (Sat) by willy (subscriber, #9762)
It's probably not even possible for Linux mainline to keep up with that frequency, let alone the enterprise distros or the embedded distros (I was recently asked "So what changed in the USB system between 2.6.10 and 2.6.37?"). And then there's the question about what to do for other OSes.
It's not just a question of suboptimal performance if you use the wrong algorithms for a given piece of flash; there are real problems of data loss and of the flash wearing out. No flash manufacturer wants to be burdened with massive in-warranty returns because some random dude decided to change an '8' to a '16' in an OS that tens of millions of machines ended up running.
So yes, as Arnd says, the industry is looking to abstract away the difference between NAND chips and run the algorithms down on the NAND controller. I'm doing my best to help in the NVMHCI working group; see http://www.bswd.com/FMS10/FMS10-Huffman-Onufryk.pdf for a presentation given last August.
(I work for Intel, but these are just my opinions).
Posted Feb 19, 2011 19:06 UTC (Sat) by ewen (subscriber, #4772)
Ewen
Posted Feb 19, 2011 21:03 UTC (Sat) by arnd (subscriber, #8866)
The way that the SD card association deals with the problem is to declare all file systems other than FAT32 (with 32KB clusters) unsupported.

However, at the low end that I looked at, most drives get everything wrong to start with: there is too little RAM and processing power to do the reordering that would be needed for ideal NAND access patterns, and the drives only do dynamic wear leveling, if any, so they break down more quickly than necessary.
What we'd instead need for these devices is indeed a way to be smarter in the host about what it's doing. The block discard a.k.a. trim logic is one example of this that sometimes works already, but is not really enough to work with dumb controllers. What I'd like to see is an abstraction on segment level, using commands like "open this segment for sequential writes", "garbage-collect this segment now", "report status of currently open segments", "how often has this segment been erased?".
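To make that concrete, here is a sketch of how such a segment-level interface might look from the host side, again as C declarations. All names and numbers are invented for illustration; no such commands exist in any current storage standard.

    /* Hypothetical segment-level flash interface, modelled on the
     * commands listed above (illustrative only). */
    #include <stdint.h>

    enum seg_op {
        SEG_OPEN_SEQUENTIAL,   /* open this segment for sequential writes */
        SEG_GARBAGE_COLLECT,   /* garbage-collect this segment now */
        SEG_REPORT_OPEN,       /* report status of currently open segments */
        SEG_ERASE_COUNT,       /* how often has this segment been erased? */
    };

    struct seg_status {
        uint32_t segment;       /* index of the segment (group of erase blocks) */
        uint32_t write_offset;  /* next writable position within the segment */
        uint32_t erase_count;   /* lifetime erase cycles, for wear leveling */
        uint8_t  is_open;       /* nonzero if open for sequential writes */
    };

A log-structured filesystem could then keep its segment writes aligned with what the controller handles cheaply, and make wear-leveling decisions from real data instead of guesses.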
Posted Feb 19, 2011 22:40 UTC (Sat) by willy (subscriber, #9762)
The API you're suggesting makes a ton of sense for the low end devices. I don't think there's a whelk's chance in a supernova of it coming to anything, though. You'd need the SD association to require it, and I can't see it being in the interest of any of their members. When the reaction to "hey, your cards suck for this other filesystem" is "your filesystem is wrong", I can't see them being enthusiastic about something this radical.
I do see that Intel are members. I'll try to find out internally if something like this could fly.
Posted Nov 21, 2016 8:57 UTC (Mon) by Hi-Angel (guest, #110915)
Posted Feb 20, 2011 3:25 UTC (Sun) by Oddscurity (guest, #46851)
Or would that be the wrong conclusion?
Not that it's all I took away from this great article, but I'm wondering what I can do in the meantime to optimise my use of such devices.
Posted Feb 20, 2011 4:07 UTC (Sun) by ewen (subscriber, #4772)
It appears that if you want to run ext3 on a cheap flash drive, you pretty much have to assume that it's going to be slower than advertised (possibly MUCH slower, especially for writes), and that there's a very real risk of wearing out some areas of the flash faster than might be expected. It's probably okay for a mostly-read workload if you ensure that you turn off atime completely (otherwise every read is also a write!), but not ideal for something with regular writes.
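For example, an /etc/fstab entry along these lines (device and mount point are just placeholders):

    /dev/sdb1  /mnt/flash  ext3  noatime  0  0

noatime stops reads from dirtying inodes, so a read-mostly workload really does stay read-mostly.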
If it's an option for your use case, then sticking with the original FAT file system -- and using it natively -- is probably the least bad option. Certainly that's what I do with all my portable drives that see any kind of regular updates. (It also has the benefit that I don't have to worry about drivers for the file system on any system I might plug it into.)
Ewen
Posted Feb 20, 2011 14:11 UTC (Sun) by Oddscurity (guest, #46851)
I may as well switch to FAT32 for some of the use cases; the others are dominated by reads, so they can stay on ext3.
Posted Nov 21, 2016 9:10 UTC (Mon) by Hi-Angel (guest, #110915)
I'm wondering, btw, why the article didn't have a chapter about finding out those sizes from the original FS. Last time I searched (half a year ago), I only found people using timing attacks on the stick for that kind of thing, though reading the information off the FS the stick shipped with, right after buying it, would be far simpler. I'll check it out; perhaps someone mentioned it below in the comments.
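(For what it's worth, the timing approach is automated by Arnd's flashbench tool, which the parent article describes; an invocation along these lines guesses the erase block size by looking for jumps in access time at aligned boundaries - the device name is just an example:)

    flashbench -a /dev/mmcblk0 --blocksize=1024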
Posted Feb 21, 2011 14:41 UTC (Mon) by marcH (subscriber, #57642)
Agreed, and any way out of this situation would require (at least) a transition phase where some devices support both modes, letting the operating system choose.
Is such a "dual-mode" technically feasible?
Posted Feb 22, 2011 14:44 UTC (Tue) by etienne (guest, #25256)
http://onfi.org/specifications/
There is even a connector for FLASH looking like the SDRAM connector.
Posted Apr 27, 2011 20:44 UTC (Wed) by frr (guest, #74556)
Since 2006 or 2007, there have been several revisions of the ONFI interface standard: 1.0, 1.1, 2.0, 2.1, 2.2, 2.3, and recently 3.0. The most visible differences are in transfer rate.

The "NAND connector" spec from 2008 is a separate paper - not an integral part of the main standard document. The NAND Connector paper refers to the ONFI 1.0 and 2.0 standards documents. But - have you ever seen a motherboard or controller board with an ONFI socket? I haven't. In the meantime, there's ONFI 3.0 - it postulates some changes to the set of electrical signals, for the sake of PCB simplification - but there's no update to the "NAND connector" paper. To me that hints that the NAND connector is a dead end - a historical branch of evolution that has proved fruitless... Please correct me if I'm wrong there, as I'd love to be :-)
ONFI 3.0 does refer to an LGA-style socket (maybe two flavours thereof), apart from a couple of standard BGA footprints, which would possibly allow for field-replaceable/upgradeable chip packages, similar to today's CPUs. Note that the 3.0 spec doesn't contain a single occurrence of the word "connector" :-)
As far as I'm concerned, for most practical purposes, ONFI remains a Flash chip-level interface standard. It seems ONFI is inside the current Intel SSD's - it's the interface between the flash chips and the multi-channel target-mode SATA Flash controller. The multiple channels are ONFI channels. The SATA Flash controller comprises the SSD's disk-like interface to the outside world, and does all the "Flash housekeeping" in a hidden way.
Note that there's an FAQ at the ONFI web site, claiming that "No, ONFI is not another card standard."
From a different angle, note that the ONFI electrical-level interface (set of signals, framing, traffic protocol) is different from the native busses you can typically see in today's computers, such as FSB/QPI/PCI-e/PCI/LPC/ISA/DDR123_RAM. ONFI is not "seamless" or "inherent" to today's PC's: you have nowhere to attach that bus to, such that you'd have the Flash memory e.g. linear-mapped into the host system's memory space - which doesn't look like a good idea anyway, considering the Flash capacities and the CPU cores' address bus width (no it's not a full 64 bits - it's more like 32, 36 or maybe slightly more with the Xeons).

Getting a "NAND connector" slot in your PC is not just a matter of the bus and connector and some passive PCB routing to some existing chipset platform. You'd need a "bridge" or "bus interface", most likely from PCI-e to ONFI (less likely straight from the root complex / memory hub). For several practical purposes, the hypothetical PCI interface would likely use a MMIO window + paged access to the ONFI address space, or possibly SG-DMA for optimum performance. I could imagine a simple interface using a general-purpose "PCI slave bridge" with DMA capabilities, similar to those currently made by PLX Corp. - except that those cannot do DDR, the transfer rates are too low, the FIFO buffers are perhaps too small for a full NAND Flash page and the bridges can't do SG-DMA...

The initiative would IMO have to come from chipset makers (read: Intel) who could integrate an ONFI port in the south bridge. I haven't found a single hint of any initiative in that vein. There are even no stand-alone chips implementing a dedicated PCI-to-ONFI "dumb bridge". Google reveals some "ONFI silicon IP cores" from a couple of fabless silicon design companies - those could be used as the ONFI part of such a bridge, if some silicon maker should decide to go that way, or maybe some are "synthesizable" in a modern FPGA.
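To picture the "MMIO window + paged access" variant: a host driver for such a hypothetical bridge might look roughly like this in C. Every register name and offset below is invented; as noted above, no such bridge chip actually exists.

    /* Hypothetical PCI-to-ONFI bridge with a paged MMIO window
     * (illustrative only - all offsets and names are made up). */
    #include <stdint.h>

    #define BRIDGE_PAGE_SELECT  0x0000  /* which 4 KiB flash page the window shows */
    #define BRIDGE_WINDOW_BASE  0x1000  /* 4 KiB data window into the selected page */

    static volatile uint8_t *bridge_bar;  /* the bridge's memory-mapped PCI BAR */

    /* Read one byte from a linear flash address through the paged window. */
    static uint8_t flash_read_byte(uint64_t addr)
    {
        /* Point the window at the 4 KiB page containing addr... */
        *(volatile uint32_t *)(bridge_bar + BRIDGE_PAGE_SELECT) =
                (uint32_t)(addr >> 12);
        /* ...then read the byte at its offset within the window. */
        return bridge_bar[BRIDGE_WINDOW_BASE + (addr & 0xfff)];
    }

SG-DMA would replace the byte-at-a-time window with descriptor lists, but the addressing problem - mapping a huge flash space through a small host aperture - stays the same.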
As for the basic idea, which is to "present raw NAND chips to the host system and let the host OS do the Flash housekeeping in software, with full knowledge of the gory details": clearly ONFI isn't going that way. And quite possibly, it's actually heading in precisely the opposite direction :-) There is a tendency to hide some of the gory details even at the chip interface level. On the ONFI Specs page you can find another "stand-alone paper" specifying "Block Abstracted NAND", as an enhancement to the basic ONFI 2.1 standards document. The paper is also referred back to by the ONFI 3.0 standard (where it lists BA NAND opcodes). Looks like an "optional LBA access mechanism to NAND Flash" (does this correlate with the moment SanDisk got a seat at the ONFI table, by any chance?) And in the ONFI 3.0 spec, you can find a chapter on "EZ NAND", which is to hide some of the gory details of ECC handling (at the chip interface level).
Ahh well...
Posted Feb 24, 2011 21:35 UTC (Thu) by ajb (subscriber, #9694)
Posted Feb 24, 2011 22:27 UTC (Thu) by zlynx (guest, #2285)
It could run from userspace with the right interface class. Or from the kernel if someone wrote a simplified Java interpreter or maybe a module compiler.
I suppose instead of Java it could be written in the VM language that ACPI uses (AML); kernels already have interpreters for that.
It could be fairly nifty.
Posted Feb 18, 2011 23:54 UTC (Fri) by ewen (subscriber, #4772)
As you say, the (physically) smaller devices (CF/SD/etc), especially at the low-cost end of the market, are always going to be constrained by available processing resources. So maybe some sort of "direct media control" API is the optimal answer at the low end, especially if we can avoid the worst of the "fakeraid" situation (one-OS-version-only binary blobs). (There's also a higher risk of bricking the drive if you're moving, e.g., an SD card back and forth between something doing its own low-level access and something using the higher-level API and firmware access. But, as with the NTFS driver, presumably enough of the details will eventually be right that people can trust it. And embedded devices with, e.g., internal SD can mostly ignore that risk.)
Ewen
Posted Feb 19, 2011 9:03 UTC (Sat) by roblucid (guest, #48964)
They may be slow (especially on writes), but for their intended purpose they're quick enough. How is there ever going to be momentum to create a market standard to convenience the small minority "enthusiast" market segment who want to "hack" around with the hardware?
The SSD drive manufacturers see performant block I/O support for NTFS, ext3/4, XFS and eventually btrfs as a way to add value and differentiate their products. Any alternative to the ATA interface requires widespread software and hardware support; perhaps it would have happened if flash had the kind of marketing hoopla and attention that CPU architecture receives. The fact is, MS tried the ReadyBoost feature in Vista to let flash drives serve as fast virtual memory, and it flopped horribly in practice: the drives weren't quick enough, and memory prices fell fast enough to just throw RAM at the paging problem.
Now perhaps there has been an opportunity in smartphones, with embedded manufacturers wanting to avoid MS patent taxes on use of FAT; but again, when I read around the reviews, no one talks about filesystem performance - they're getting excited by multi-core, to resolve the latency issues that show up as "UI sluggishness". It seems again that either the performance of the flash drive is good enough, or they've mitigated the issues in products and that becomes part of the competitive advantage.
Without incumbents needing it, who's going to develop "direct media control"?