LSFMM: Storage data integrity
In a combined Storage and Filesystem track session at the 2013 LSFMM Summit, Darrick Wong kicked off a discussion of storage data integrity protection. He had given a talk on the same topic at the 2011 Linux Plumbers Conference, so he wanted to update the assembled developers on what had changed. There are standards for attaching metadata to data as it is written to or read from storage devices; that metadata can be used to check the data's integrity. The SCSI data integrity field (DIF) can hold a CRC to detect data corruption as well as a 32-bit block number that can catch misplaced writes.
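As a rough illustration of how a CRC plus a block number catches both kinds of error, here is a minimal sketch. It assumes the commonly described T10 layout of an 8-byte tuple per 512-byte block (16-bit guard CRC using polynomial 0x8BB7, 16-bit application tag, 32-bit reference tag holding the low bits of the block number). The helper names `make_tuple` and `check_tuple` are hypothetical and do not correspond to any kernel interface.

```python
import struct

BLOCK_SIZE = 512
# Assumed layout: guard tag (CRC), application tag, reference tag; big-endian.
DIF_TUPLE = struct.Struct(">HHI")


def crc16_t10dif(data: bytes) -> int:
    """Bitwise CRC-16 with polynomial 0x8BB7, initial value 0, no reflection."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            crc = ((crc << 1) ^ 0x8BB7 if crc & 0x8000 else crc << 1) & 0xFFFF
    return crc


def make_tuple(block: bytes, lba: int, app_tag: int = 0) -> bytes:
    """Build the 8 bytes of protection information for one block."""
    return DIF_TUPLE.pack(crc16_t10dif(block), app_tag, lba & 0xFFFFFFFF)


def check_tuple(block: bytes, lba: int, pi: bytes) -> str:
    """Compare a block and its expected block number against the tuple."""
    guard, _app_tag, ref = DIF_TUPLE.unpack(pi)
    if guard != crc16_t10dif(block):
        return "guard mismatch (data corrupted)"
    if ref != lba & 0xFFFFFFFF:
        return "reference mismatch (misplaced write)"
    return "ok"
```

A block read back from a different block number than the one recorded in the tuple fails the reference check even though its CRC is still valid, which is how a misplaced write would show up.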
There are applications that want to use DIF, but there is currently no user-space interface for doing so. One way to get there might be to use space in the kernel asynchronous I/O (KAIO) interface to add a pointer to "something else" that holds the protection information. Another option is to "pull a Windows trick" and include length and version information, so that the kernel knows which version of the API is being used and thus how much data to copy from user space, Wong said.
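The length-and-version idea can be sketched as follows. This is only a simulation of the pattern in user-space Python: the field names, the two layouts, and `parse_request` are all hypothetical, and nothing here reflects an interface that was actually proposed. The receiver reads the declared size first, copies only that much, and treats fields missing from an older caller as zero.

```python
import struct

HEADER = struct.Struct("<II")      # hypothetical: total size, API version
V1_BODY = struct.Struct("<QQ")     # hypothetical: data_ptr, data_len
V2_BODY = struct.Struct("<QQQQ")   # V1 fields plus prot_ptr, prot_len


def parse_request(buf: bytes) -> dict:
    """Decode a request whose first field says how many bytes are valid."""
    if len(buf) < HEADER.size:
        raise ValueError("buffer too small for header")
    size, version = HEADER.unpack_from(buf)
    if size > len(buf) or size < HEADER.size + V1_BODY.size:
        raise ValueError("declared size is out of range")
    body = buf[HEADER.size:size]
    if len(body) > V2_BODY.size:
        # A newer caller: accept only if the fields we do not know are unset.
        if any(body[V2_BODY.size:]):
            raise ValueError("unknown trailing fields are set")
        body = body[:V2_BODY.size]
    # An older caller: missing fields read as zero.
    body = body.ljust(V2_BODY.size, b"\0")
    data_ptr, data_len, prot_ptr, prot_len = V2_BODY.unpack(body)
    return {
        "version": version,
        "data_ptr": data_ptr,
        "data_len": data_len,
        "prot_ptr": prot_ptr,
        "prot_len": prot_len,
    }
```

With this shape, an old binary that passes only the V1 fields keeps working, while a new one can attach a pointer to protection information without a new system call.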
Another idea that comes up repeatedly is to use Joel Becker's batched I/O interface, called sys_dio, Wong said. That interface, which provides a way to attach integrity information to I/O operations, began as something Becker customized for Oracle's use case; he would like to make it more generic. It is a nicer interface that is "purely asynchronous" for direct I/O (i.e. O_DIRECT); Becker put out an RFC for sys_dio two years ago. Martin Petersen used it to pass the protection information in and out of the kernel, but neither Becker nor Petersen has yet had time to finish it up.
With his "database hat on", Petersen said that both solutions (KAIO and sys_dio) would be useful. He went on to describe how applications use the protection information to widen the window over which the data is protected. The application can query the block layer to find out what kind of CRC to generate. Those applications (typically database systems like Oracle and MySQL) already have a block-oriented view, so they calculate the proper CRC to send with the data. If the data arrives and does not pass the CRC test, the application may still be able to recreate it, which is why such applications are interested in integrity handling. The kernel can do the calculation for other applications, Petersen said.
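One way an application might make that query on Linux is through sysfs. The sketch below assumes the block layer's `integrity/format` attribute under `/sys/block/<disk>/` and profile names such as `T10-DIF-TYPE1-CRC` and `T10-DIF-TYPE1-IP`; the functions `integrity_format` and `guard_type`, and the mapping from name suffix to checksum kind, are illustrative assumptions rather than an established API.

```python
from pathlib import Path
from typing import Optional


def integrity_format(disk: str, sysfs_root: str = "/sys/block") -> Optional[str]:
    """Return the integrity profile name for a disk, or None if unavailable."""
    path = Path(sysfs_root) / disk / "integrity" / "format"
    try:
        text = path.read_text().strip()
    except OSError:
        return None
    return text or None


def guard_type(fmt: Optional[str]) -> Optional[str]:
    """Guess the guard checksum from the profile name (assumed naming scheme)."""
    if fmt is None:
        return None
    if fmt.endswith("-CRC"):
        return "crc"
    if fmt.endswith("-IP"):
        return "ip-checksum"
    return None
```

An application could call `guard_type(integrity_format("sda"))` at startup and fall back to letting the kernel generate the protection information when the result is `None`.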
The SCSI T10 DIF is only concerned with protection on the path between the host adapter (HBA) and the storage device, but Petersen authored the data integrity extensions (DIX) to add end-to-end data integrity by including the operating system and applications. He is looking into making DIX look "less blocky" on the host, so that it could calculate a CRC on a list of scatter/gather I/O operations, then pass it to the HBA, which could then write it in the proper block format.
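To make the "less blocky" idea concrete, the following sketch computes one guard value per 512-byte interval over a list of buffers whose boundaries do not line up with block boundaries. It assumes the CRC-16 with polynomial 0x8BB7 and carries the running CRC across segment boundaries; `guards_for_segments` is a hypothetical name, and a real implementation would live in the kernel or the HBA rather than in Python.

```python
from typing import Iterable, List

INTERVAL = 512  # assumed protection interval in bytes


def crc16_t10dif(data: bytes, crc: int = 0) -> int:
    """CRC-16 (polynomial 0x8BB7) that can continue from an earlier state."""
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            crc = ((crc << 1) ^ 0x8BB7 if crc & 0x8000 else crc << 1) & 0xFFFF
    return crc


def guards_for_segments(segments: Iterable[bytes]) -> List[int]:
    """Return one guard CRC per interval for arbitrarily split segments."""
    guards: List[int] = []
    crc, filled = 0, 0
    for seg in segments:
        pos = 0
        while pos < len(seg):
            # Consume up to the end of the current interval or segment.
            take = min(INTERVAL - filled, len(seg) - pos)
            crc = crc16_t10dif(seg[pos:pos + take], crc)
            filled += take
            pos += take
            if filled == INTERVAL:
                guards.append(crc)
                crc, filled = 0, 0
    if filled:
        raise ValueError("total length is not a multiple of the interval")
    return guards
```

The result is the same as computing the CRC over each contiguous 512-byte block, so the HBA could still lay the values out in the per-block format on the wire.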
But Becker was not convinced that applications need to be shielded from dealing with blocks directly. Any application that cares about end-to-end integrity will also care about the blocks on disk. Petersen would like sys_dio to not preclude byte-oriented uses, though. Wong said that it is easier to use KAIO for that case, however.
In the end, the storage and filesystem developers agreed to look carefully at what Wong plans to post to the lists over the coming months, with an eye toward resolving these issues.
Index entries for this article | |
---|---|
Kernel | Data integrity |
Conference | Storage, Filesystem, and Memory-Management Summit/2013 |
Posted Apr 25, 2013 20:45 UTC (Thu) by mkp (subscriber, #45897)
sys_dio was drafted based on the experiences Joel and I gathered developing oracleasm. sys_dio is an attempt to make the interface generic and not tied to the Database or ASM.