Future storage technologies and Linux
The International Disk Drive Equipment and Materials Association works to standardize disk drives and the many components found therein. While some may say that disk drives - rotating storage - are on their way out, the fact of the matter is that the industry shipped a record 650 million drives last year and is on track to ship one billion drives in 2015. This is an industry which is not going away anytime soon. Who drives this industry? There are, Michael said, four companies which control the direction the disk drive industry takes: Dell, Microsoft, HP, and EMC. Three of those companies ship Linux, but Linux is not represented in the industry's planning at all.
One might be surprised by what's found inside contemporary drives. There is typically a multi-core ARM processor similar to those found in cellphones, and up to 1GB of RAM. That ARM processor is capable, but it still has a lot of dedicated hardware help; special circuitry handles error-correcting code generation and checking, protocol implementation, buffer management, sequential I/O detection, and more. Disk drives are small computers running fairly capable operating systems of their own.
The programming interface to disk drives has not really changed in almost two decades: drives still offer a randomly-accessible array of 512-byte blocks addressable by logical block address. The biggest problem on the software side is trying to move the heads as little as possible. The hardware has advanced greatly over these years, but it is still stuck with "an archaic programming architecture." That architecture is going to have to change in the coming years, though.
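The flat block model described above comes down to simple arithmetic. The following is a minimal sketch, assuming 512-byte logical blocks; the function names are illustrative only and do not correspond to any real driver interface.

```python
SECTOR_SIZE = 512  # bytes per logical block in the traditional interface


def lba_to_byte_offset(lba, sector_size=SECTOR_SIZE):
    """Return the byte offset at which logical block `lba` begins."""
    if lba < 0:
        raise ValueError("LBA must be non-negative")
    return lba * sector_size


def byte_range_to_lbas(offset, length, sector_size=SECTOR_SIZE):
    """Return the range of logical blocks touched by a byte range."""
    if offset < 0 or length <= 0:
        raise ValueError("offset must be >= 0 and length must be > 0")
    first = offset // sector_size
    last = (offset + length - 1) // sector_size
    return range(first, last + 1)
```

A 24-byte access starting at byte 500 straddles two blocks, so the drive sees requests for both, even though far less than one block of data is involved.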
The first significant change has been deemed the "advanced format" - a fancy term for 4K sector drives. Christoph Hellwig asked for the opportunity to chat with the marketing person who came up with that name; the rest of us can only hope that the conversation will be held on a public list so we can all watch. The motivation behind the switch to 4K sectors is greater error-correcting code (ECC) efficiency. By using ECC to protect larger sectors, manufacturers can gain something like a 20% increase in capacity.
The developer who has taken the lead in making 4K-sector disks work with Linux is Martin Petersen; he complained that he has found it to be almost impossible to work on new technologies with the industry. Prototypes from manufacturers worked fine with Linux, but the first production drives to show up failed outright. Even with his "800-pound Oracle hat" on, he has a hard time getting a response to problems. "Welcome," Michael responded, "to the hard drive business." More seriously, he said that there needs to be a "Linux certified" program for hardware, which probably needs to be driven by Red Hat to be taken seriously in the industry. Others agreed with this idea, adding that, for this program to be truly effective, vendors like Dell and HP would have to start requiring Linux certification from their suppliers.
4K-sector drives bring a number of interesting challenges beyond the increased sector size. Windows 2000 systems will not properly align partitions by default, so some manufacturers have created off-by-one-alignment drives to compensate. Others have stuck with normal alignment, and it's not always easy to tell the two types of drive apart. Meanwhile, in response to requests from Microsoft and Dell, manufacturers are also starting to ship native 4K drives which do not emulate 512-byte sectors at all. So there is a wide variety of hardware to try to deal with. There is an evaluation kit available for developers who would like to work with the newer drives.
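The alignment problem can be illustrated with a small check. This is a sketch under stated assumptions: the drive emulates 512-byte logical sectors on 4096-byte physical sectors, and an "off-by-one" drive is modeled as an alignment offset of 3584 bytes (seven logical sectors), chosen so that the traditional partition start at logical block 63 lands on a physical boundary. The function name and the offset value are illustrative, not taken from any vendor documentation.

```python
LOGICAL_SIZE = 512    # bytes per emulated logical sector
PHYSICAL_SIZE = 4096  # bytes per physical sector


def is_aligned(start_lba, alignment_offset=0,
               logical=LOGICAL_SIZE, physical=PHYSICAL_SIZE):
    """Report whether a partition starting at `start_lba` begins on a
    physical sector boundary.

    `alignment_offset` is the byte offset of the first naturally aligned
    position from the start of the device (0 for a normally aligned drive).
    """
    start_byte = start_lba * logical
    return (start_byte - alignment_offset) % physical == 0


# Normally aligned drive: the old start at LBA 63 is misaligned,
# while a start at LBA 2048 (1MB) is fine.
print(is_aligned(63), is_aligned(2048))

# Assumed off-by-one drive: the situation is reversed.
print(is_aligned(63, alignment_offset=3584),
      is_aligned(2048, alignment_offset=3584))
```

On Linux, the values needed for such a check are normally exposed through sysfs attributes such as `/sys/block/<dev>/alignment_offset` and `/sys/block/<dev>/queue/physical_block_size`, though a drive that misreports its geometry will defeat any check of this kind.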
The next step would appear to be "hybrid drives" which combine rotating storage and flash in the same package. The first generation of these drives did not do very well in the market; evidently Windows took over control of the flash portion of the drive, defeating its original purpose, so no real performance benefit was seen. There is a second generation coming which may do better; they have more flash storage (anywhere from 8GB to 64GB) and do not allow the operating system to mess with it, so they should perform well.
Ted Ts'o expressed concerns that these drives may be optimized for filesystems like VFAT or NTFS; such optimizations tend not to work well when other filesystems are used. Michael replied that this is part of the bigger problem: Linux filesystems are not visible to the manufacturers. Given a reason to support ext4 or btrfs the vendors would do so; it is, after all, relatively easy for the drive to look at the partition table and figure out what kinds of filesystem(s) it is dealing with. But the vendors have no idea of what demand may exist for which specific Linux filesystems, so support is not forthcoming.
A little further in the future is "shingled magnetic recording" (SMR). This technology eliminates the normal guard space between adjacent tracks on the disk, yielding another 20% increase in capacity. Unfortunately, those guard tracks exist for a reason: they allow one track to be written without corrupting the adjacent track. So an SMR drive cannot just rewrite one track; it must rewrite all of the tracks in a shingled range. What that means, Michael said, is that large sequential writes "should have reasonable performance," while small, random writes could perform poorly indeed.
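The cost difference between sequential and random writes can be made concrete with a toy model. The sketch below assumes that any update within a shingled range (called a "band" here, a term not used in the talk) forces the whole band to be rewritten; real drives may behave differently, and the band size of 16 tracks is purely illustrative.

```python
def tracks_written(band_size, updated_tracks):
    """Physical tracks written to update the given tracks of one band.

    Assumption: touching any track means rewriting the entire band.
    """
    updated = set(updated_tracks)
    if any(t < 0 or t >= band_size for t in updated):
        raise ValueError("track index outside the band")
    return band_size if updated else 0


def write_amplification(band_size, updated_tracks):
    """Ratio of tracks physically written to tracks logically updated."""
    updated = set(updated_tracks)
    if not updated:
        raise ValueError("no tracks updated")
    return tracks_written(band_size, updated) / len(updated)


# A large sequential write covering the whole band costs nothing extra.
print(write_amplification(16, range(16)))  # 1.0

# A small random write touching one track rewrites all sixteen.
print(write_amplification(16, [3]))        # 16.0
```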
The industry is still trying to figure out how to make SMR work well. One possibility would be to create separate shingled and non-shingled regions on the drive. All writes would initially go to a non-shingled region, then be rewritten into a shingled region in the background. That would necessitate the addition of a mapping table to find the real location of each block. That idea caused some concerns in the audience; how can I/O patterns be optimized if the connection between the logical block address and the location on the disk is gone?
The answer seems to be that, as the drive rewrites the data, it will put it into something resembling its natural order and defragment it. That whole process depends on the drive having enough idle time to do the rewriting work; it was said that most drives are idle over 90% of the time, so that should not be a problem. Cloud computing and virtualization might make that harder; their whole purpose is to maximize hardware utilization, after all. But the drive vendors seem to think that it will work out.
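The staging-and-remapping scheme described above can be sketched as a toy model. Everything here is hypothetical: the class and method names are invented for illustration, the two regions are plain dictionaries keyed by logical block address, and "natural order" is taken to mean ascending block address.

```python
class StagedDrive:
    """Toy model of a drive with a non-shingled staging region and a
    shingled region, joined by a mapping from logical block to location."""

    def __init__(self):
        self.staging = {}   # lba -> data, recently written blocks
        self.shingled = {}  # lba -> data, kept in physical (insertion) order

    def write(self, lba, data):
        # All writes land in the non-shingled region first.
        self.staging[lba] = data

    def read(self, lba):
        # The staging copy, if any, is always the newest.
        if lba in self.staging:
            return self.staging[lba]
        return self.shingled.get(lba)

    def region_of(self, lba):
        """Mapping-table lookup: where does this block currently live?"""
        if lba in self.staging:
            return "staging"
        if lba in self.shingled:
            return "shingled"
        return None

    def destage(self):
        """Background work done while idle: merge staged blocks into the
        shingled region, laid out in ascending block order."""
        merged = {**self.shingled, **self.staging}
        self.shingled = {lba: merged[lba] for lba in sorted(merged)}
        self.staging = {}
```

In this model, blocks written in the order 7, 2 end up laid out as 2, 7 once `destage()` has run, which is the sense in which the rewrite also defragments. If the drive never gets idle time, the staging region simply keeps growing.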
Michael presented four different options for the programming interface to SMR drives. The first was traditional hard drive emulation with remapping as described above; such drives will work with all systems, but they may have performance problems. Another possibility is "large block SMR": a drive which does all transfers in large blocks - 32MB at a time, for example. Such drives would not be suitable for all purposes, but they might work well in digital video recorders or backup applications. Option three is "emulation with hints," allowing the operating system to indicate which blocks should be stored together on the physical media. Finally, there is the full object storage approach where the drive knows about logical objects (files) and tries to store them contiguously.
How well will these drives work with Linux? It is hard to say; there is currently no Linux representation on the SMR technical committee. These drives are headed for market in 2012 or 2013, so now is the time to try to influence their development. The committee is said to be relatively open, with open mailing lists, no non-disclosure agreements, and no oppressive patent-licensing requirements, so it shouldn't be hard for Linux developers to participate.
Beyond SMR, there is the concept of non-volatile RAM (NV-RAM). An NV-RAM device is an array of traditional dynamic RAM combined with an equally-sized flash array and a board full of capacitors. It operates as normal RAM but, when the power fails, the charge in the capacitors is used to copy the RAM data over to flash; that data is restored when the power comes back. High-end storage systems have used NV-RAM for a while, but it is now being turned into a commodity product aimed at the larger market.
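The save-and-restore cycle can be modeled in a few lines. This is only a sketch of the behavior described above, not of any real device; the class and method names are hypothetical, and the capacitors are represented solely by the fact that `power_fail()` gets to finish its copy.

```python
class NvRam:
    """Toy model of an NV-RAM device: DRAM backed by equally sized flash."""

    def __init__(self, size):
        self.ram = bytearray(size)    # volatile working memory
        self.flash = bytearray(size)  # non-volatile backing store
        self.powered = True

    def _check(self, offset, length):
        if not self.powered:
            raise RuntimeError("device is powered off")
        if offset < 0 or length < 0 or offset + length > len(self.ram):
            raise ValueError("access outside the device")

    def write(self, offset, data):
        self._check(offset, len(data))
        self.ram[offset:offset + len(data)] = data

    def read(self, offset, length):
        self._check(offset, length)
        return bytes(self.ram[offset:offset + length])

    def power_fail(self):
        # Capacitor charge keeps the device alive long enough to copy
        # the DRAM contents into flash; the DRAM contents are then lost.
        self.flash[:] = self.ram
        self.ram = bytearray(len(self.ram))
        self.powered = False

    def power_restore(self):
        # The saved image is copied back before the device is usable again.
        self.ram[:] = self.flash
        self.powered = True
```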
NV-RAM devices currently come in three forms. The first looks like a traditional disk drive, the second is a PCI-Express card with a special block driver, and the third is "NV-DIMM," which goes directly onto the system's memory bus. NV-DIMM has a lot of potential, but is also the hardest to support; it requires, for example, a BIOS which understands the device, will not trash its contents with a power-on memory test, and which does not interleave cache lines across NV-DIMM devices and regular memory. So it is not something which can just be dropped into any system.
Looking further ahead, true non-volatile memory is coming around 2015. How will we interface to it, Michael asked, and how will we ensure that the architecture is right? Dell and Microsoft asked for native 4K-sector drives and got them. What, he asked, does the Linux community want? He recommended that the kernel community form a five-person committee to talk to the hard disk drive industry. There should also be a list of developers who should get hardware samples. And, importantly, we should have a well-formed opinion of what we want. Given those, the industry might just start listening to the Linux community; that could only be a good thing.

| Index entries for this article | |
|---|---|
| Kernel | Block layer |
| Kernel | Solid-state storage devices |
| Conference | Storage, Filesystem, and Memory-Management Summit/2011 |
