By Jonathan Corbet
April 6, 2011
The opening plenary session at the second day of the 2011 Linux Filesystem,
Storage, and Memory Management workshop was led by Michael Cornwall, the
global director for technology standards at
IDEMA, a standards organization for disk
drive manufacturers. Thirteen years ago, while working for a hardware
manufacturer, Michael had a hard time finding the right person to talk to in the
Linux community to get support for his company's hardware. Years later,
that problem still exists; there is no easy way for the hardware industry
to work with the Linux community, with the result that Linux has far
less influence than it should. His talk covered the changes that are
coming in the storage industry and how the Linux community can get involved
to make things work better.
The International Disk Drive Equipment and Materials Association works to
standardize disk drives and the many components found therein. While some
may say that disk drives - rotating storage - are on their way out, the
fact of the matter is that the industry shipped a record 650 million
drives last year and is on track to ship one billion drives in 2015. This
is an industry which is not going away anytime soon.
Who drives this industry? There are, he said, four companies which control
the direction the disk drive industry takes: Dell, Microsoft, HP, and EMC.
Three of those companies ship Linux, but Linux is not represented in the
industry's planning at all.
One might be surprised by what's found inside contemporary drives. There
is typically a multi-core ARM processor similar to those found in
cellphones, and up to 1GB of RAM. That ARM processor is capable, but it
still has a lot of dedicated hardware help; special circuitry handles
error-correcting code generation and checking, protocol implementation,
buffer management, sequential I/O detection, and more. Disk drives are
small computers running fairly capable operating systems of their own.
The programming interface to disk drives has not really changed in
almost two decades: drives still offer a randomly-accessible array of 512-byte
blocks, each addressable by a logical block address. The biggest challenge on
the software side remains moving the heads as little as possible. The
hardware has advanced greatly over these years, but it is still stuck with
"an archaic programming architecture." That architecture is going to have
to change in the coming years, though.
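The block interface described above is simple enough to capture in a few lines; this sketch (the helper name and constants are this example's, not any standard API) just shows the flat-array-of-blocks model that has persisted for two decades:

```python
# Classic LBA interface: the drive appears as a flat array of fixed-size
# logical blocks, regardless of the hardware behind it.
SECTOR_SIZE = 512  # bytes per logical block

def byte_offset_to_lba(offset):
    """Map a byte offset on the device to (logical block, offset in block)."""
    return offset // SECTOR_SIZE, offset % SECTOR_SIZE

print(byte_offset_to_lba(0))          # (0, 0)
print(byte_offset_to_lba(1_000_000))  # (1953, 64)
```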
The first significant change has been deemed the "advanced format" - a
fancy term for 4K sector drives. Christoph
Hellwig asked for the opportunity to chat with the marketing person who
came up with that name; the rest of us can only hope that the conversation
will be held on a public list so we can all watch. The motivation behind
the switch to 4K sectors is greater error-correcting code (ECC)
efficiency. By using ECC to protect larger sectors, manufacturers can gain
something like a 20% increase in capacity.
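The arithmetic behind that gain is straightforward: ECC and framing overhead is closer to a fixed cost per sector, so fewer, larger sectors waste less of the platter. The overhead figures below are assumed round numbers chosen for illustration, not vendor data:

```python
# Illustrative arithmetic only; per-sector overhead (ECC, sync, gap) is an
# assumed round figure, not a real drive's numbers.
OVERHEAD = 128  # bytes of overhead per sector, assumed the same either way

eff_512 = 512 / (512 + OVERHEAD)    # fraction of the track holding user data
eff_4k = 4096 / (4096 + OVERHEAD)

gain = eff_4k / eff_512 - 1
print(f"512-byte format efficiency: {eff_512:.1%}")
print(f"4K format efficiency:       {eff_4k:.1%}")
print(f"capacity gain:              {gain:.1%}")   # roughly the quoted ~20%
```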
The developer who has taken the lead in making 4K-sector disks work
with Linux is Martin Petersen; he complained that he has found it to be
almost impossible to work on new technologies with the industry.
Prototypes from manufacturers worked fine with Linux, but the first
production drives to show up failed outright. Even with his "800-pound
Oracle hat" on, he has a hard time getting a response to problems.
"Welcome," Michael responded, "to the hard drive business." More
seriously, he said that there needs to be a "Linux certified" program for
hardware, which probably needs to be driven by Red Hat to be taken
seriously in the industry. Others agreed with this idea, adding that, for
this program to be truly effective, vendors like Dell and HP would have to
start requiring Linux certification from their suppliers.
4K-sector drives bring a number of interesting challenges beyond the
increased sector size. Windows XP systems will not properly align
partitions by default, so some manufacturers have created
off-by-one-alignment drives to compensate. Others have stuck with normal
alignment, and it's not always easy to tell the two types of drive apart.
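On a 512-byte-emulating 4K drive, "proper alignment" just means that a partition's starting logical block falls on a physical-sector boundary; a quick sketch of the check (the helper name is invented for this example):

```python
# On a 4K drive emulating 512-byte sectors, a partition is aligned when its
# starting LBA is a multiple of 8 (8 x 512 = 4096). Helper name is invented.
def is_4k_aligned(start_lba, logical_sector=512, physical_sector=4096):
    sectors_per_physical = physical_sector // logical_sector
    return start_lba % sectors_per_physical == 0

print(is_4k_aligned(63))    # the classic sector-63 partition start: misaligned
print(is_4k_aligned(2048))  # a modern 1MiB-aligned start: aligned
```

A misaligned partition turns every filesystem-level 4K write into a read-modify-write of two physical sectors, which is why the emulation trickery exists at all.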
Meanwhile, in response to requests from Microsoft and Dell, manufacturers
are also starting to ship native 4K drives which do not emulate 512-byte
sectors at all. So there is a wide variety of hardware to try to deal
with. There is an evaluation kit
available for developers who would like to work with the newer drives.
The next step would appear to be "hybrid drives" which combine rotating
storage and flash in the same package. The first generation of these
drives did not do very well in the market; evidently Windows took over
control of the flash portion of the drive, defeating its original purpose,
so no real performance benefit was seen. There is a second generation
coming which may do better; they have more flash storage (anywhere from 8GB
to 64GB) and do not allow the operating system to mess with it, so they
should perform well.
Ted Ts'o expressed concerns that these drives may be optimized for
filesystems like VFAT or NTFS; such optimizations tend not to work well
when other filesystems are used. Michael replied that this is part of the
bigger problem: Linux filesystems are not visible to the manufacturers.
Given a reason to support ext4 or btrfs, the vendors would do so; it is,
after all, relatively easy for the drive to look at the partition table and
figure out what kinds of filesystem(s) it is dealing with. But the vendors
have no idea of what demand may exist for which specific Linux filesystems,
so support is not forthcoming.
A little further in the future is "shingled magnetic recording" (SMR). This
technology eliminates the normal guard space between adjacent tracks on the
disk, yielding another 20% increase in capacity. Unfortunately, those
guard tracks exist for a reason: they allow one track to be written without
corrupting the adjacent track. So an SMR drive cannot just rewrite one
track; it must rewrite all of the tracks in a shingled range. What that
means, Michael said, is that large sequential writes "should have
reasonable performance," while small, random writes could perform poorly
indeed.
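A toy model makes the asymmetry concrete. Assume (as a simplification, not a real drive's firmware) that modifying a track forces every later track in the same shingled band to be rewritten, since each track overlaps the next:

```python
# Toy write-amplification model for an SMR band; band size is an assumption.
TRACKS_PER_BAND = 64

def tracks_rewritten(first_modified_track):
    # Shingling overlaps each track with the one after it, so touching
    # track N obliges the drive to rewrite tracks N through the band's end.
    return TRACKS_PER_BAND - first_modified_track

# A small random write near the start of a band costs nearly the whole
# band; a sequential append at the end costs only itself.
print(tracks_rewritten(1))   # 63 tracks rewritten for one modified track
print(tracks_rewritten(63))  # 1 track
```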
The industry is still trying to figure out how to make SMR work well. One
possibility would be to create separate shingled and non-shingled regions
on the drive. All writes would initially go to a non-shingled region, then
be rewritten into a shingled region in the background. That would
necessitate the addition of a mapping table to find the real location of
each block. That idea caused some concerns in the audience; how can I/O
patterns be optimized if the connection between the logical block address
and the location on the disk is gone?
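The indirection involved can be sketched in a few lines; the class and method names here are invented for illustration and correspond to no real firmware interface:

```python
# Minimal sketch of the mapping table a staged-write SMR drive would need:
# logical block addresses point at wherever the data currently lives.
class RemappingDrive:
    def __init__(self):
        self.mapping = {}      # LBA -> current physical location
        self.next_staging = 0  # next free slot in the non-shingled region

    def write(self, lba, data):
        # All incoming writes land in the non-shingled staging area first.
        self.mapping[lba] = ("staging", self.next_staging)
        self.next_staging += 1

    def migrate(self, lba, band, offset):
        # Background rewrite into a shingled band, in natural LBA order.
        self.mapping[lba] = ("shingled", band, offset)

d = RemappingDrive()
d.write(100, b"x")
print(d.mapping[100])             # ('staging', 0)
d.migrate(100, band=3, offset=7)
print(d.mapping[100])             # ('shingled', 3, 7)
```

The audience's concern follows directly: once this table exists, the host's logical block numbers say nothing about physical placement, so elevator-style I/O scheduling loses its footing.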
The answer seems to be that, as the drive rewrites the data, it will put it
into something resembling its natural order and defragment it. That whole
process depends on the drive having enough idle time to do the rewriting
work; it was said that most drives are idle over 90% of the time, so that
should not be a problem. Cloud computing and virtualization might make
that harder; their whole purpose is to maximize hardware utilization, after
all. But the drive vendors seem to think that it will work out.
Michael presented four different options for the programming interface to
SMR drives. The first was traditional hard drive emulation with remapping
as described above; such drives
will work with all systems, but they may have performance problems.
Another possibility is "large block SMR": a drive which does all transfers
in large blocks - 32MB at a time, for example. Such drives would not be
suitable for all purposes, but they might work well in digital video
recorders or backup applications. Option three is "emulation with hints,"
allowing the operating system to indicate which blocks should be stored
together on the physical media. Finally, there is the full object storage approach where the drive knows
about logical objects (files) and tries to store them contiguously.
How well will these drives work with Linux? It is hard to say; there is
currently no Linux representation on the SMR technical committee. These
drives are headed for market in 2012 or 2013, so now is the time to try to
influence their development. The committee is said to be relatively open,
with open mailing lists, no non-disclosure agreements, and no oppressive
patent-licensing requirements, so it shouldn't be hard for Linux developers
to participate.
Beyond SMR, there is the concept of non-volatile RAM (NV-RAM). An NV-RAM
device is an array of traditional dynamic RAM combined with an
equally-sized flash array and a board full of capacitors. It operates as
normal RAM but, when the power fails, the charge in the capacitors is used
to copy the RAM data over to flash; that data is restored when the power
comes back. High-end storage systems have used NV-RAM for a while, but it
is now being turned into a commodity product aimed at the larger market.
NV-RAM devices currently come in three forms. The first looks like a
traditional disk drive, the second is a PCI-Express card with a special
block driver, and the third is "NV-DIMM," which goes directly onto the
system's memory bus. NV-DIMM has a lot of potential, but is also the
hardest to support; it requires, for example, a BIOS which understands the
device, will not trash its contents with a power-on memory test, and does
not interleave cache lines across NV-DIMM devices and regular memory.
So it is not something which can just be dropped into any system.
Looking further ahead, true non-volatile memory is coming around 2015. How
will we interface to it, Michael asked, and how will we ensure that the
architecture is right? Dell and Microsoft asked for native 4K-sector
drives and got them. What, he asked, does the Linux community want? He
recommended that the kernel community form a five-person committee to talk
to the hard disk drive industry. There should also be a list of developers
who should get hardware samples. And, importantly, we should have a
well-formed opinion of what we want. Given those, the industry might just
start listening to the Linux community; that could only be a good thing.