Optimizing Linux with cheap flash drives
Posted Feb 18, 2011 22:36 UTC (Fri) by zlynx (guest, #2285)
In reply to: Optimizing Linux with cheap flash drives by ewen
Parent article: Optimizing Linux with cheap flash drives
The problem is going to remain with CF and SD cards and memory sticks. These small devices cannot include the powerful processing chips and/or 64 MB DRAM that some of the SSDs require for their magic.
I think these small devices would be much better off creating a new bypass mode in the IDE/SCSI/USB protocols that exposes the actual flash to the host system. Expose the real block size, block error status, and erase, read and write commands. It might also need commands to access special metadata areas like block maps, the last written block, etc. Then we could run real flash filesystems on them.
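As a rough illustration, such a pass-through command set might look something like the C declarations below. Every name and field here is invented for the sake of the sketch; nothing like it exists in any current ATA/SCSI/USB standard.

    /* Hypothetical raw-flash pass-through interface (illustrative only). */
    #include <stdint.h>

    enum raw_flash_op {
        RAWFL_GET_GEOMETRY,   /* report page size, erase-block size, block count */
        RAWFL_READ_PAGE,      /* read one NAND page plus its spare/metadata area */
        RAWFL_WRITE_PAGE,     /* program one page within an already-erased block */
        RAWFL_ERASE_BLOCK,    /* erase one whole erase block */
        RAWFL_BLOCK_STATUS,   /* query bad-block / wear status of a block */
    };

    struct raw_flash_geometry {
        uint32_t page_size;        /* bytes per page, e.g. 4096 */
        uint32_t pages_per_block;  /* e.g. 128, giving a 512 KiB erase block */
        uint32_t block_count;      /* total erase blocks on the medium */
        uint32_t oob_size;         /* spare bytes per page for metadata/ECC */
    };

Given something like this, a real flash filesystem (UBIFS, say) could manage the medium directly instead of second-guessing the card's hidden translation layer.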
Posted Feb 18, 2011 23:35 UTC (Fri) by bronson (subscriber, #4806)
I think everyone will agree with this. But, short of waving a magic wand (i.e. Microsoft or Intel write specs), I don't see any way of making this happen. It's a monster chicken-and-egg problem: OSes can't add support for devices that don't exist, and vendors won't bother implementing a raw interface until OSes can use it.
Posted Feb 19, 2011 0:00 UTC (Sat) by saffroy (guest, #43999)
Posted Feb 19, 2011 12:40 UTC (Sat) by willy (subscriber, #9762)
It's probably not even possible for Linux mainline to keep up with that frequency, let alone the enterprise distros or the embedded distros (I was recently asked "So what changed in the USB system between 2.6.10 and 2.6.37?"). And then there's the question about what to do for other OSes.
It's not just a question of suboptimal performance if you use the wrong algorithms for a given piece of flash; there are real problems of data loss and of the flash wearing out. No flash manufacturer wants to be burdened with massive in-warranty returns because some random dude decided to change an '8' to a '16' in an OS that tens of millions of machines ended up running.
So yes, as Arnd says, the industry is looking to abstract away the difference between NAND chips and run the algorithms down on the NAND controller. I'm doing my best to help in the NVMHCI working group; see http://www.bswd.com/FMS10/FMS10-Huffman-Onufryk.pdf for a presentation given last August.
(I work for Intel, but these are just my opinions).
Posted Feb 19, 2011 19:06 UTC (Sat) by ewen (subscriber, #4772)
Ewen
Posted Feb 19, 2011 21:03 UTC (Sat) by arnd (subscriber, #8866)
The way that the SD card association deals with the problem is to declare all file systems other than FAT32 (with 32KB clusters) unsupported.

However, at the low end that I looked at, most drives get everything wrong to start with: there is too little RAM and processing power to do the reordering that would be needed for ideal NAND access patterns, and the drives only do dynamic wear leveling, if any, so they break down more quickly than necessary.
What we'd instead need for these devices is indeed a way to be smarter in the host about what it's doing. The block discard a.k.a. trim logic is one example of this that sometimes works already, but is not really enough to work with dumb controllers. What I'd like to see is an abstraction on segment level, using commands like "open this segment for sequential writes", "garbage-collect this segment now", "report status of currently open segments", "how often has this segment been erased?".
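To make that concrete, here is a sketch of how such a segment-level interface might look from the host side, again as C declarations. All names and numbers are invented for illustration; no such commands exist in any current storage standard.

    /* Hypothetical segment-level flash interface, modelled on the
     * commands listed above (illustrative only). */
    #include <stdint.h>

    enum seg_op {
        SEG_OPEN_SEQUENTIAL,   /* open this segment for sequential writes */
        SEG_GARBAGE_COLLECT,   /* garbage-collect this segment now */
        SEG_REPORT_OPEN,       /* report status of currently open segments */
        SEG_ERASE_COUNT,       /* how often has this segment been erased? */
    };

    struct seg_status {
        uint32_t segment;       /* index of the segment (group of erase blocks) */
        uint32_t write_offset;  /* next writable position within the segment */
        uint32_t erase_count;   /* lifetime erase cycles, for wear leveling */
        uint8_t  is_open;       /* nonzero if open for sequential writes */
    };

A log-structured filesystem could then keep its segment writes aligned with what the controller handles cheaply, and make wear-leveling decisions from real data instead of guesses.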
Posted Feb 19, 2011 22:40 UTC (Sat) by willy (subscriber, #9762)
The API you're suggesting makes a ton of sense for the low end devices. I don't think there's a whelk's chance in a supernova of it coming to anything, though. You'd need the SD association to require it, and I can't see it being in the interest of any of their members. When the reaction to "hey, your cards suck for this other filesystem" is "your filesystem is wrong", I can't see them being enthusiastic about something this radical.
I do see that Intel are members. I'll try to find out internally if something like this could fly.
Posted Nov 21, 2016 8:57 UTC (Mon) by Hi-Angel (guest, #110915)
Posted Feb 20, 2011 3:25 UTC (Sun) by Oddscurity (guest, #46851)
Or would that be the wrong conclusion?
Not that it's all I took away from this great article, but I'm wondering what I can do in the meantime to optimise my use of such devices.
Posted Feb 20, 2011 4:07 UTC (Sun) by ewen (subscriber, #4772)
It appears that if you want to run ext3 on a cheap flash drive, you pretty much have to assume that it's going to be slower than advertised (possibly MUCH slower, especially for writes), and that there's a very real risk of wearing out some areas of the flash faster than might be expected. It's probably okay for a mostly-read workload if you ensure that you turn off atime completely (otherwise every read is also a write!), but not ideal for something with regular writes.
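For example, an /etc/fstab entry along these lines (device and mount point are just placeholders):

    /dev/sdb1  /mnt/flash  ext3  noatime  0  0

noatime stops reads from dirtying inodes, so a read-mostly workload really does stay read-mostly.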
If it's an option for your use case, then sticking with the original FAT file system -- and using it natively -- is probably the least bad option. Certainly that's what I do with all my portable drives that see any kind of regular updates. (It also has the benefit that I don't have to worry about drivers for the file system on any system I might plug it into.)
Ewen
Posted Feb 20, 2011 14:11 UTC (Sun) by Oddscurity (guest, #46851)
I may as well switch to FAT32 for some of the use cases; the others are dominated by reads, so they can stay on ext3.
Posted Nov 21, 2016 9:10 UTC (Mon) by Hi-Angel (guest, #110915)
I'm wondering, btw, why the article didn't have a chapter about finding out those sizes from the original FS. Last time I searched (half a year ago), I only found people using timing attacks on the stick for that kind of thing, though reading the information off the FS the stick shipped with, right after buying it, would be far simpler. I'll check it out; perhaps someone mentioned it below in the comments.
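(For what it's worth, the timing approach is automated by Arnd's flashbench tool, which the parent article describes; an invocation along these lines guesses the erase block size by looking for jumps in access time at aligned boundaries - the device name is just an example:)

    flashbench -a /dev/mmcblk0 --blocksize=1024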
Posted Feb 21, 2011 14:41 UTC (Mon) by marcH (subscriber, #57642)
Agreed, and any way out of this situation would require (at least) a transition phase where some devices support both modes, letting the operating system choose.
Is such a "dual-mode" technically feasible?
Posted Feb 22, 2011 14:44 UTC (Tue) by etienne (guest, #25256)
http://onfi.org/specifications/
There is even a connector for FLASH looking like the SDRAM connector.
Posted Apr 27, 2011 20:44 UTC (Wed) by frr (guest, #74556)
Since 2006 or 2007, there have been several revisions of the ONFI interface standard: 1.0, 1.1, 2.0, 2.1, 2.2, 2.3, and recently 3.0. The most visible differences are in transfer rate.

The "NAND connector" spec from 2008 is a separate paper - not an integral part of the main standard document. The NAND Connector paper refers to the ONFI 1.0 and 2.0 standards documents. But - have you ever seen a motherboard or controller board with an ONFI socket? I haven't. In the meantime, there's ONFI 3.0 - it postulates some changes to the set of electrical signals, for the sake of PCB simplification - but there's no update to the "NAND connector" paper. To me that hints that the NAND connector is a dead end - a historical branch of evolution that has proved fruitless... Please correct me if I'm wrong there, as I'd love to be :-)
ONFI 3.0 does refer to an LGA-style socket (maybe two flavours thereof), apart from a couple of standard BGA footprints, which would possibly allow for field-replaceable/upgradeable chip packages, similar to today's CPUs. Note that the 3.0 spec doesn't contain a single occurrence of the word "connector" :-)
As far as I'm concerned, for most practical purposes, ONFI remains a Flash chip-level interface standard. It seems ONFI is inside the current Intel SSD's - it's the interface between the flash chips and the multi-channel target-mode SATA Flash controller. The multiple channels are ONFI channels. The SATA Flash controller comprises the SSD's disk-like interface to the outside world, and does all the "Flash housekeeping" in a hidden way.
Note that there's an FAQ at the ONFI web site, claiming that "No, ONFI is not another card standard."
From a different angle, note that the ONFI electrical-level interface (set of signals, framing, traffic protocol) is different from the native busses you can typically see in today's computers, such as FSB/QPI/PCI-e/PCI/LPC/ISA/DDR123_RAM. ONFI is not "seamless" or "inherent" to today's PC's: you have nowhere to attach that bus to, such that you'd have the Flash memory e.g. linear-mapped into the host system's memory space - which doesn't look like a good idea anyway, considering the Flash capacities and the CPU cores' address bus width (no it's not a full 64 bits - it's more like 32, 36 or maybe slightly more with the Xeons).

Getting a "NAND connector" slot in your PC is not just a matter of the bus and connector and some passive PCB routing to some existing chipset platform. You'd need a "bridge" or "bus interface", most likely from PCI-e to ONFI (less likely straight from the root complex / memory hub). For several practical purposes, the hypothetical PCI interface would likely use a MMIO window + paged access to the ONFI address space, or possibly SG-DMA for optimum performance. I could imagine a simple interface using a general-purpose "PCI slave bridge" with DMA capabilities, similar to those currently made by PLX Corp. - except that those cannot do DDR, the transfer rates are too low, the FIFO buffers are perhaps too small for a full NAND Flash page and the bridges can't do SG-DMA...

The initiative would IMO have to come from chipset makers (read: Intel) who could integrate an ONFI port in the south bridge. I haven't found a single hint of any initiative in that vein. There are even no stand-alone chips implementing a dedicated PCI-to-ONFI "dumb bridge". Google reveals some "ONFI silicon IP cores" from a couple of fabless silicon design companies - those could be used as the ONFI part of such a bridge, if some silicon maker should decide to go that way, or maybe some are "synthesizable" in a modern FPGA.
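To picture the "MMIO window + paged access" variant: a host driver for such a hypothetical bridge might look roughly like this in C. Every register name and offset below is invented; as noted above, no such bridge chip actually exists.

    /* Hypothetical PCI-to-ONFI bridge with a paged MMIO window
     * (illustrative only - all offsets and names are made up). */
    #include <stdint.h>

    #define BRIDGE_PAGE_SELECT  0x0000  /* which 4 KiB flash page the window shows */
    #define BRIDGE_WINDOW_BASE  0x1000  /* 4 KiB data window into the selected page */

    static volatile uint8_t *bridge_bar;  /* the bridge's memory-mapped PCI BAR */

    /* Read one byte from a linear flash address through the paged window. */
    static uint8_t flash_read_byte(uint64_t addr)
    {
        /* Point the window at the 4 KiB page containing addr... */
        *(volatile uint32_t *)(bridge_bar + BRIDGE_PAGE_SELECT) =
                (uint32_t)(addr >> 12);
        /* ...then read the byte at its offset within the window. */
        return bridge_bar[BRIDGE_WINDOW_BASE + (addr & 0xfff)];
    }

SG-DMA would replace the byte-at-a-time window with descriptor lists, but the addressing problem - mapping a huge flash space through a small host aperture - stays the same.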
As for the basic idea, which is to "present raw NAND chips to the host system and let the host OS do the Flash housekeeping in software, with full knowledge of the gory details": clearly ONFI isn't going that way. And quite possibly, it's actually heading in precisely the opposite direction :-) There is a tendency to hide some of the gory details even at the chip interface level. On the ONFI Specs page you can find another "stand-alone paper" specifying "Block Abstracted NAND", as an enhancement to the basic ONFI 2.1 standards document. The paper is also referred back to by the ONFI 3.0 standard (where it lists BA NAND opcodes). Looks like an "optional LBA access mechanism to NAND Flash" (does this correlate with the moment SanDisk got a seat at the ONFI table, by any chance?) And in the ONFI 3.0 spec, you can find a chapter on "EZ NAND", which is to hide some of the gory details of ECC handling (at the chip interface level).
Ahh well...
Posted Feb 24, 2011 21:35 UTC (Thu) by ajb (subscriber, #9694)
Posted Feb 24, 2011 22:27 UTC (Thu) by zlynx (guest, #2285)
It could run from userspace with the right interface class. Or from the kernel if someone wrote a simplified Java interpreter or maybe a module compiler.
I suppose instead of Java it could be written in the VM language that ACPI uses (AML); kernels already have interpreters for that.
It could be fairly nifty.
Posted Feb 18, 2011 23:54 UTC (Fri) by ewen (subscriber, #4772)
As you say, the (physically) smaller devices (CF/SD/etc), especially at the low-cost end of the market, are always going to be constrained by available processing resources. So maybe some sort of "direct media control" API is the optimal answer at the low end, especially if we can avoid the worst of the "fakeraid" situation (one-OS-version-only binary blobs). (There's also a higher risk of bricking the drive if you're moving, e.g., an SD card back and forth between something doing its own low-level access and something using the higher-level API and firmware access. But, as with the NTFS driver, presumably enough of the details will eventually be right that people can trust it. And embedded devices with, e.g., internal SD can mostly ignore that risk.)
Ewen
Posted Feb 19, 2011 9:03 UTC (Sat) by roblucid (guest, #48964)
They may be slow (especially on writes), but for their intended purpose they're quick enough. How is there ever going to be momentum to create a market standard to convenience the small minority "enthusiast" market segment who want to "hack" around with the hardware?
The SSD drive manufacturers see performant block I/O support for NTFS, ext3/4, XFS and eventually btrfs as a way to add value and differentiate their products. Any alternative to the ATA interface requires widespread software and hardware support; perhaps it would have happened if flash had the kind of marketing hoopla and attention that CPU architecture receives. The fact is, MS tried the ReadyBoost feature in Vista to let flash drives serve as fast virtual memory, and it flopped horribly in practice: the drives weren't quick enough, and memory prices fell fast enough to just throw RAM at the paging problem.
Now perhaps there has been an opportunity in smartphones, with embedded manufacturers wanting to avoid MS patent taxes on use of FAT; but again, when I read around the reviews, no one talks about filesystem performance - they're getting excited by multi-core, to resolve the latency issues that show up as "UI sluggishness". It seems again that either the performance of the flash drive is good enough, or they've mitigated the issues in products and that becomes part of the competitive advantage.
Without incumbents needing it, who's going to develop "direct media control"?