Hannes Reinecke led two sessions at this year's Linux Storage, Filesystem, and Memory Management (LSFMM) Summit that were concerned with errors in the block layer. The first largely focused on errors that come out of partition scanners, while the second looked at a fundamental rework of the SCSI error-handling path.
Reinecke has added more detailed I/O error codes that are meant to help diagnose problems. One problem he ran into was an EMC driver that returned ENOSPC when it hit the end of the disk during a partition scan. He would rather see ENXIO there, which is what the seven kernel partition scanners (and one in user space) return for the end-of-disk condition, so he has remapped that error to ENXIO in the SCSI code. Otherwise, the thin-provisioning code gets confused, since it expects ENOSPC only when it hits its limit.
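The remapping described above can be illustrated with a small sketch. It is not Reinecke's patch: the function name `remap_end_of_disk` and the `past_end_of_device` flag are hypothetical, and the real change lives in the kernel's SCSI code. The sketch only shows the idea that ENOSPC is rewritten to ENXIO when the cause is a read past the end of the device, so that ENOSPC keeps its single meaning for thin provisioning.

```python
import errno


def remap_end_of_disk(err: int, past_end_of_device: bool) -> int:
    """Return the error code to pass up the stack.

    Hypothetical helper: `past_end_of_device` stands in for whatever
    information the driver has about why the request failed.
    """
    if err == errno.ENOSPC and past_end_of_device:
        # End-of-disk condition: report ENXIO, as the partition scanners expect.
        return errno.ENXIO
    # Everything else, including a genuine thin-provisioning ENOSPC, is unchanged.
    return err
```

With this shape, `remap_end_of_disk(errno.ENOSPC, False)` still yields ENOSPC, which is the case the thin-provisioning code depends on.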
Al Viro was concerned that the remapped error code would make it all the way out to user space and confuse various tools. But Reinecke assured him that the remapped errors stop at the block layer. Being able to distinguish between actual I/O errors and the end-of-disk condition will also allow the partition scanner to stop probing in the presence of I/O errors, he said.
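A rough sketch of how a scanner might use that distinction follows. The function `scan_partitions` and the probe callables are assumptions made for illustration, not the kernel's partition code: a probe that fails with ENXIO has simply read past the end of the disk, so the next format is tried, while any other error is treated as a real I/O error and ends the scan.

```python
import errno


def scan_partitions(probes):
    """Try each (name, probe) pair in order.

    Returns (name, False) for the first probe that recognizes a table,
    (None, True) if probing was aborted by a real I/O error, and
    (None, False) if no probe matched. All names here are hypothetical.
    """
    for name, probe in probes:
        try:
            if probe():
                return name, False
        except OSError as exc:
            if exc.errno == errno.ENXIO:
                # End of disk: this format does not fit, try the next one.
                continue
            # Genuine I/O error: further probing would only hit it again.
            return None, True
    return None, False
```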
In another session, on day two, Reinecke presented a proposal for recovering from SCSI errors at various levels (LUN, target, bus, ...). Resets at some of those levels make no sense for certain kinds of errors, he said. If the target is unreachable, for example, trying to reset the LUN, target, or bus is pointless; instead, a transport reset should be tried and, if that fails, a host reset would be next. This would be the path taken when a command either times out or returns an error.
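The escalation logic can be sketched as a plan that depends on the kind of error. The names `escalation_plan` and `recover` are hypothetical, and the ladder used when the target is reachable (LUN, target, bus, host) is an assumption made for the example; the article only spells out the unreachable-target case.

```python
def escalation_plan(target_reachable: bool) -> list[str]:
    """Return the reset levels to try, narrowest first."""
    if not target_reachable:
        # LUN, target, and bus resets are pointless if the target cannot
        # be reached: go to the transport, then the host.
        return ["transport", "host"]
    # Assumed ladder for the reachable case (not specified in the article).
    return ["lun", "target", "bus", "host"]


def recover(plan, try_reset):
    """Walk the plan, stopping at the first reset that succeeds.

    `try_reset` is a caller-supplied callable taking a level name and
    returning True on success. Returns the level that worked, or None.
    """
    for level in plan:
        if try_reset(level):
            return level
    return None
```

Stopping at the first successful level keeps the reset as narrow as possible.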
There were lots of complaints from those in attendance about resetting more than is absolutely necessary. That disrupts traffic to other LUNs when a single LUN or target has an error, even though the other LUNs are handling I/O just fine. Part of the problem, according to Reinecke, is that the LUN reset command does not time out.
But Roland Dreier noted that one missed I/O can cause a whole storage array to get reset, which can take a minute or more to clear. In addition, once the error handler has been entered, all I/O to the host in question is stopped. In some large fabrics, one dropped packet can lead to no I/O for quite some time, he said. Reinecke disputed that a dropped frame would have that effect, since commands are retried, but agreed that a more serious error could.
Complicating things further, of course, is that storage vendors all do different things for different errors. The recovery process for one vendor may or may not be the same as what is needed for another. In the end, it seemed like there was agreement that Reinecke's changes would make things better than what we have now, which is obviously a step in the right direction.
[ Thanks to the Linux Foundation for travel support to attend LSFMM. ]
Copyright © 2014, Eklektix, Inc.
This article may be redistributed under the terms of the Creative Commons CC BY-SA 4.0 license