Hard drive protection

[Posted October 12, 2005 by corbet]

One of the many features which will be shipped with the 2.6.14 kernel will be a driver for the "hard drive active protection system" found in some ThinkPad laptops. This system provides a set of sensors, and, in particular, an accelerometer which can report on the position of the laptop - and how quickly that position is changing. There are a number of applications of such a device - such as a version of neverball played by tipping the laptop. The real purpose, however, is to enable the system to react to a fall and attempt to protect the hard drive.

The next step in the implementation of that purpose is the hard drive protection patch recently posted by Jon Escombe. This patch adds two new callbacks to the block request queue which drivers can provide:

    typedef int (issue_protect_fn) (request_queue_t *);
    typedef int (issue_unprotect_fn) (request_queue_t *);

If the driver provides these functions, the request queue, as seen in sysfs, will contain a new protect attribute. If a value is written to that attribute, the block system will interpret it as an integer number of seconds. The issue_protect_fn() will be called, and the request queue will be plugged for the indicated number of seconds. When that time expires, issue_unprotect_fn() will be called and the queue will be restarted.
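From user space, triggering protection would come down to writing a number of seconds to that attribute. The following is a minimal sketch under stated assumptions: the helper name write_protect_seconds() is hypothetical, and the attribute path is left to the caller because the exact location (something like /sys/block/hda/queue/protect) is a guess rather than something taken from the patch.

```c
#include <stdio.h>

/*
 * Write a protection timeout, in seconds, to a queue's "protect"
 * attribute. The helper name is hypothetical, and attr_path is supplied
 * by the caller: a path such as /sys/block/hda/queue/protect is an
 * assumption, not something confirmed by the patch.
 * Returns 0 on success, -1 on failure.
 */
int write_protect_seconds(const char *attr_path, int seconds)
{
    FILE *f;
    int ok;

    if (seconds < 0)
        return -1;

    f = fopen(attr_path, "w");
    if (f == NULL)
        return -1;

    ok = fprintf(f, "%d\n", seconds) > 0;

    /* fclose() flushes the buffer; a failed flush means the write was lost. */
    if (fclose(f) != 0)
        ok = 0;

    return ok ? 0 : -1;
}
```

A daemon would call this once per protected drive, for example write_protect_seconds(path, 8), and would rely on the block layer to restart the queue once the timeout expires.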

The theory of operation here is that a user-space daemon will be monitoring the status of the system, as reported by the accelerometer. Should this daemon note that the laptop has begun to accelerate, it will quickly write a value to the protect attribute for each drive in the system. The drivers will respond by parking the disk heads and, in any other possible way, telling the drive to crawl into its shell and prepare for impact. Once the event has transpired, the shattered remains of the laptop can attempt to resume normal operation.

The idea seems reasonable, but block maintainer Jens Axboe has turned down the patch for now. Says Jens:

We have far too many queue hooks already, adding two more for a relatively obscure use such as this one is not a good idea.

The number of request queue callbacks is indeed large. Some of them have little to do with drivers (there's one which is called whenever disk activity happens, for example; it can be used to flash a keyboard LED in the absence of a hardware disk activity light), but others, such as the ones discussed here, are direct requests to the underlying block driver. The use of callbacks seems a little redundant in this situation, given that the request queue is, fundamentally, a mechanism for conveying commands to block drivers. The right solution might thus be to use the request queue to carry commands beyond those requesting the movement of blocks to and from the drive.

To an extent, the request queue is already used this way. Packet commands, ATA task file commands, and power management commands can be fed to drivers through the queue. In each case, the flags field of struct request is used to indicate that something special is being requested. The use of flags in this way is getting a little unwieldy, however, leading to the consideration of a new approach.

That approach, as seen in a patch held by Jens, is to add a new field (cmd_type) to struct request which indicates the type of command embodied by each request. Currently-anticipated types include packet commands, sense requests, power management commands, flush requests, driver-specific special requests, and Linux-specific, generic requests. Oh, and the occasional request to move a disk block in one direction or the other. The addition of cmd_type turns struct request into a generic carrier of commands to a disk drive.
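The idea can be illustrated with a small, self-contained sketch. Every name below (sketch_cmd_type, sketch_request, sketch_handler_for) is hypothetical and chosen for illustration; the real patch may name and order its types differently, and the real struct request carries far more state.

```c
#include <stddef.h>

/* Hypothetical command types, mirroring the categories listed above. */
enum sketch_cmd_type {
    SKETCH_CMD_FS,       /* ordinary block transfer */
    SKETCH_CMD_PACKET,   /* packet command */
    SKETCH_CMD_SENSE,    /* sense request */
    SKETCH_CMD_PM,       /* power management command */
    SKETCH_CMD_FLUSH,    /* flush request */
    SKETCH_CMD_SPECIAL,  /* driver-specific special request */
    SKETCH_CMD_LINUX     /* Linux-specific, generic request */
};

/* A drastically reduced stand-in for struct request. */
struct sketch_request {
    enum sketch_cmd_type cmd_type;
    unsigned long sector;   /* meaningful only for SKETCH_CMD_FS */
};

/* A driver examines one field instead of testing a collection of flag bits. */
const char *sketch_handler_for(const struct sketch_request *rq)
{
    switch (rq->cmd_type) {
    case SKETCH_CMD_FS:      return "transfer blocks";
    case SKETCH_CMD_PACKET:  return "send packet command";
    case SKETCH_CMD_SENSE:   return "fetch sense data";
    case SKETCH_CMD_PM:      return "change power state";
    case SKETCH_CMD_FLUSH:   return "flush write cache";
    case SKETCH_CMD_SPECIAL: return "driver-specific action";
    case SKETCH_CMD_LINUX:   return "generic Linux command";
    }
    return "unknown";
}
```

The point of the design is the single switch: a new kind of command means a new enumerator rather than another flag bit or another queue callback.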

With this mechanism in place, the "brace yourself, we're falling!" message becomes just another Linux-specific block request type. When such an event happens, the kernel need only place one of those messages on the queue - preferably at the head of the queue - and call the driver's request() function. The driver can then prepare the drive for the coming catastrophe and plug the queue itself. No additional callbacks required.
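A toy model may make the sequence clearer. This is a user-space sketch, not kernel code: the queue, the request structure, and the function names are all hypothetical, and the "driver" merely sets a flag where a real one would park the heads.

```c
#include <stddef.h>

enum sketch_cmd { SKETCH_CMD_RW, SKETCH_CMD_PROTECT };

struct sketch_req {
    enum sketch_cmd cmd;
    struct sketch_req *next;
};

struct sketch_queue {
    struct sketch_req *head;
    struct sketch_req *tail;
    int plugged;   /* nonzero once the driver stops taking requests */
    int rw_done;   /* number of ordinary requests serviced */
};

/* Normal I/O is appended at the tail. */
void queue_add_tail(struct sketch_queue *q, struct sketch_req *rq)
{
    rq->next = NULL;
    if (q->tail)
        q->tail->next = rq;
    else
        q->head = rq;
    q->tail = rq;
}

/* The protect message jumps ahead of everything already queued. */
void queue_add_head(struct sketch_queue *q, struct sketch_req *rq)
{
    rq->next = q->head;
    q->head = rq;
    if (!q->tail)
        q->tail = rq;
}

/* Stand-in for the driver's request() function. */
void sketch_request_fn(struct sketch_queue *q)
{
    while (!q->plugged && q->head) {
        struct sketch_req *rq = q->head;

        q->head = rq->next;
        if (!q->head)
            q->tail = NULL;

        if (rq->cmd == SKETCH_CMD_PROTECT)
            q->plugged = 1;   /* a real driver would park the heads here */
        else
            q->rw_done++;
    }
}
```

With two ordinary requests queued and a protect message then added at the head, one call to sketch_request_fn() plugs the queue before either ordinary request is serviced; they stay queued until something clears the plugged flag.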

This approach does involve some significant changes to the block layer, however, and would include a driver API change. So it is not likely to take a quick path into the kernel. The hard drive protection mechanism, which will require the new API, thus looks likely to wait in line for a while yet.

Hard drive protection

Posted Oct 13, 2005 5:04 UTC (Thu) by jwb (guest, #15467) [Link] (7 responses)

Why is this level of software support needed? If the designers had any sense of decency, the accelerometer would be an input to the hard disk's voice coil driver. The heads would park without any help from some shell script.

I cannot even imagine the proposed implementation working. Imagine the accelerometer firing an interrupt, which the kernel eventually gets around to servicing in a few hundred microseconds. Afterwards the userspace daemon gets scheduled. Of course it has to be paged in, which takes a bit. After a few seconds the daemon tries to park the heads, which are already smashed into the platters.

I can even imagine a situation where this system, by paging in, causes a previously idled and parked hard disk to spin up and run, just in time for the impact.

Hard drive protection

Posted Oct 13, 2005 6:55 UTC (Thu) by lienha_r (guest, #29121) [Link] (3 responses)

The problem is that you need quite a few heuristics in order to determine if the laptop really is falling: the sensors generate a lot of noise and it is not trivial to determine what is exactly happening. Imagine you're in a train, which is a typical situation where the laptop often receives little bumps. Would you like the disk head to park every 2 seconds?

For the paging problem, a simple solution is to mlock() the user-space program (that problem was also raised on the LKML) and to nice() it to a negative value.
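As a rough sketch of that suggestion, assuming a Linux or other POSIX system: the helper name pin_and_prioritize() is made up for illustration, and mlockall() is used here as the whole-process variant of mlock(). Both steps normally require privilege (or a sufficient RLIMIT_MEMLOCK), so an unprivileged caller should expect -1.

```c
#include <errno.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Lock the calling process into RAM and adjust its nice value.
 * The function name is hypothetical. Returns 0 on success, -1 on failure.
 */
int pin_and_prioritize(int nice_increment)
{
    /* Lock current and future pages so the daemon never waits on page-in. */
    if (mlockall(MCL_CURRENT | MCL_FUTURE) != 0)
        return -1;

    /*
     * nice() may legitimately return -1 as the new nice value, so errno
     * must be cleared first and checked afterwards to detect failure.
     */
    errno = 0;
    if (nice(nice_increment) == -1 && errno != 0)
        return -1;

    return 0;
}
```

A daemon would call something like pin_and_prioritize(-10) once at startup, before entering its sensor-polling loop.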

Hard drive protection

Posted Oct 13, 2005 12:12 UTC (Thu) by smitty_one_each (subscriber, #28989) [Link]

Your point is a good one, but if I were, say, Fujitsu, I'd be looking to develop a minimalist, embedded piece of hardware in the drive itself. Call it "Project Codpiece".
It would be a great way to differentiate a hard drive product in the market.
Once the idea catches on, motherboard makers can put such a device on their products and integrate it with the BIOS, so that you get an interrupt to park the hard drive and secure the power supply as well.

Hard drive protection

Posted Oct 15, 2005 5:40 UTC (Sat) by dvdeug (guest, #10998) [Link] (1 responses)

There may be quite a few heuristics, but probably not more than a few k of code. I doubt a userspace heuristic could do a lot better than something built into the hard drive.

As for the train situation, a disk head parking every 2 seconds wouldn't completely kill performance, and I don't see any reason why a userspace daemon would be better at handling this situation than an embedded chip. In fact, an embedded chip could probably freeze and unfreeze the disk faster than a userspace daemon could, making even pathological situations run better with an embedded chip instead of a userspace daemon.

Hard drive protection

Posted Oct 31, 2005 4:48 UTC (Mon) by syzygylwn (guest, #33471) [Link]

1) Yes, train (in my case boat) movements are very hard to differentiate from falling.

2) Yes, even heads parking every few seconds does severely impact performance.

3) The IBM Windows drivers have this problem.

I have a ThinkPad T43. It has this active protection system. I took a month-long sailing trip which my laptop joined me on. As you might imagine, a sailboat very rarely gets large shocks; usually it's more of a slow roll. The laptop (even on the lowest sensitivity setting) would park its heads during all but the calmest conditions. The real problem is that these sensors have to detect the notebook falling off of a table and speeding towards the floor, before it hits, with enough time to park the heads for impact. This is no small trick. It works well in most other situations I've put it in, but then again I was most likely to drop it on the boat, which is when I had to shut it off to get any work done.

Hard drive protection

Posted Oct 13, 2005 16:52 UTC (Thu) by ballombe (subscriber, #9523) [Link] (2 responses)

Why is this level of software support needed?
This is needed by the Windows driver, which has to display a popup:

             Falling laptop wizard

      Exclusive "hard drive active protection system" technology
      is detecting that your laptop is falling.

    Do you want Windows to stop the hard disks?
       [YES]       [NO]          [HELP]

Hard drive protection

Posted Oct 21, 2005 11:44 UTC (Fri) by tushar@mwti.net (guest, #30914) [Link] (1 responses)

So you expect to press Yes while it falls from the table? If you can press Yes, then why not just hold it properly?

Hard drive protection

Posted Oct 28, 2005 6:10 UTC (Fri) by turpie (guest, #5219) [Link]

If you're confused as to how that popup dialog operates try reading this article for more information.

Powerbooks as well

Posted Oct 13, 2005 9:22 UTC (Thu) by wingo (guest, #26929) [Link]

FWIW recent PowerBooks also have accelerometers with Linux drivers, although fewer people are hacking on them:

http://www.popies.net/ams/

Hard drive protection

Posted Oct 14, 2005 17:08 UTC (Fri) by giraffedata (guest, #1954) [Link]

This function is totally out of place in the block layer. The matter of an impact destroying a device has nothing to do conceptually with storing and retrieving blocks. It is conceptually possible not only for a block device to involve no impact-sensitive hardware, but for impact-sensitive hardware to involve no block device.

This needs to go well below the request queue, down nearer the level where things like physical shock are understood.

It seems to me that a device driver that can use warning of an impending shock to protect its device should just register with a totally independent service to be notified. Assuming it's a disk device implementing a Linux block device, then when the driver gets notified, it parks the heads and delays sending any new requests to the device for a while; to the block layer, it just looks like a long-running request.
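A minimal user-space sketch of such an independent notification service might look like the following. All of the names (shock_service, shock_register, shock_notify, sketch_disk) are hypothetical and exist only to illustrate the registration idea; nothing here reflects an existing kernel interface.

```c
#include <stddef.h>

#define SHOCK_MAX_LISTENERS 8

typedef void (*shock_callback)(void *ctx);

/* The independent service: just a fixed-size list of interested parties. */
struct shock_service {
    shock_callback cb[SHOCK_MAX_LISTENERS];
    void *ctx[SHOCK_MAX_LISTENERS];
    size_t count;
};

/* Register a listener; returns 0 on success, -1 if full or cb is NULL. */
int shock_register(struct shock_service *s, shock_callback cb, void *ctx)
{
    if (cb == NULL || s->count >= SHOCK_MAX_LISTENERS)
        return -1;
    s->cb[s->count] = cb;
    s->ctx[s->count] = ctx;
    s->count++;
    return 0;
}

/* Called by whatever understands physical shock; knows nothing of blocks. */
void shock_notify(struct shock_service *s)
{
    size_t i;

    for (i = 0; i < s->count; i++)
        s->cb[i](s->ctx[i]);
}

/* Example listener: a disk driver that parks and holds back new requests. */
struct sketch_disk {
    int heads_parked;
    int holding_requests;
};

void sketch_disk_on_shock(void *ctx)
{
    struct sketch_disk *d = ctx;

    d->heads_parked = 1;
    d->holding_requests = 1;
}
```

Each driver that cares registers its own callback, so a single warning reaches every registered device, and devices that are not block devices at all can listen in the same way.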

Hard drive protection

Posted Oct 14, 2005 21:08 UTC (Fri) by zblaxell (subscriber, #26385) [Link]

I can see the bug reports now:

Bugzilla #121314: hdapsd failed to park heads last week

Description:
Last week my laptop fell off of the table due to its power cable becoming entangled with the resident feline. According to the debugging log, the specific sequence of accelerations as the laptop bounced off of first the cat, then the table leg, then the floor, then down the stairs, was analyzed by the Bayesian classifier and determined to be 95% likely to be "harmless accelerometer noise" and the hard disk heads were not parked. Unfortunately, when the laptop hit the bottom step and slid across the floor to the wall, the final impact caused the disk heads to slam into the journal area of the filesystem, destroying both completely. Fortunately I recorded the incident with several high-speed video cameras that I just happen to have distributed throughout my house, and I was able to precisely measure the distances travelled by the laptop using the black-and-white square tiles on the floors and walls, which are all exactly 27.6cm on a side.
Now that I've replaced the hard disk and restored my backups, I can now get online, open this bug and attach the video files for some open-source guru to analyze.

I too am wondering what the hard disk manufacturers are thinking. There is at least one (sometimes three or more) embedded CPU inside a hard disk which could process the accelerometer data using code and algorithms written by people who actually have the equipment to test and debug them. If there's room for the sensor device in the drive controller board and chipset, there's room for the code and data to process its output too.

What's next? Do we start moving the ECC, DSP, or servo control functions out of the drive and into the host CPU? Modern host CPUs are certainly capable of doing the data processing (at least the parts that can tolerate high latency), it would save a few cents per disk to reduce the specs of the embedded controller a little, and users who buy the cheapest disk available at any given time wouldn't notice. XT hard disks used to do even the highly latency-sensitive operations (remember sector interleaving?) in the host CPU, so it's not a new idea. Heck, last time I checked, the Microsoft family of operating systems still supports the sector interleave parameter for formatting disks...

IMHO subsystems like this (including other things like fan controllers and CPU temperature sensors) should not depend on the host CPU at all. The host CPU is busy doing other things, thank you very much, and may not always be able to respond to a whiny little sensor nobody has ever heard of, even if that sensor is warning of imminent destruction. This is why good laptops have embedded controllers with their own CPU, memory, and a sane policy in firmware. It's always nice to be able to control these things from the host OS, but a sane system design never requires it.

What happens to one of these disks if the host CPU's OS locks up, and the frustrated user throws the laptop at a nearby wall? ;-)

Hard drive protection

Posted Oct 16, 2005 11:11 UTC (Sun) by pengo (guest, #7787) [Link]

Will one falling hard drive be able to trigger, say, secondary hard drives in your laptop to park their heads too?

Hard drive protection

Posted Oct 16, 2005 23:48 UTC (Sun) by njhurst (guest, #6022) [Link] (2 responses)

It seems to me that parking the heads when extra acceleration is detected is a bit too late. Wouldn't it be better to park the heads if more than x seconds of near 0G occurs? The x corresponds to the amount of time required for the laptop to fall a dangerous distance - about 0.3s for a 0.5m fall by my calculations - minus the time to park the heads (hopefully a lot less than 0.3 seconds).
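For reference, the 0.3s figure can be checked with the constant-acceleration formula, assuming the laptop starts at rest, air resistance is negligible, and g is taken as 9.81 m/s^2 (these assumptions are added here, not stated in the comment):

```latex
h = \tfrac{1}{2} g t^{2}
\quad\Longrightarrow\quad
t = \sqrt{\frac{2h}{g}}
  = \sqrt{\frac{2 \times 0.5\,\mathrm{m}}{9.81\,\mathrm{m/s^{2}}}}
  \approx 0.32\,\mathrm{s}
```

Under the same assumptions the laptop would be moving at roughly g t, or about 3.1 m/s, when it lands.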

Hard drive protection

Posted Oct 28, 2005 6:44 UTC (Fri) by turpie (guest, #5219) [Link] (1 responses)

Not quite.
When the laptop is on your lap it would be at 1G. When you drop it, it would accelerate as it fell to the ground. It would be accelerating before it reached 0G.

Hard drive protection

Posted Jan 4, 2006 8:40 UTC (Wed) by thedave (guest, #34932) [Link]

I'm gonna have to go ahead and disagree with you there.

While the laptop is on your lap, its accelerometer will be sensing an acceleration of 1G up (away from Earth).

The moment the support of your lap is removed, its new acceleration will be 0G + (air resistance away from Earth) ~= 0G, as air resistance is inconsequential at low speeds.

But, for all intents and purposes the measured acceleration while falling will be zero, until it encounters something to change it.
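Under that reading, a fall could be detected by watching for a sustained run of near-zero readings. The sketch below is only an illustration: the structure, the function name, and the threshold and sample-count values used with it are all assumptions, and real sensor noise would call for more careful filtering.

```c
/*
 * Free-fall detector sketch. Readings are in units of g on three axes.
 * A fall is reported once the magnitude has stayed below threshold_g
 * for samples_needed consecutive samples. All names are hypothetical.
 */
struct freefall_detector {
    double threshold_g;     /* e.g. 0.3: "near 0G" */
    int samples_needed;     /* consecutive low samples required */
    int run;                /* current run of low samples */
};

/* Feed one sample; returns 1 if a fall is currently detected, else 0. */
int freefall_update(struct freefall_detector *d,
                    double ax, double ay, double az)
{
    /* Compare squared magnitudes to avoid needing sqrt(). */
    double mag2 = ax * ax + ay * ay + az * az;
    double thr2 = d->threshold_g * d->threshold_g;

    if (mag2 < thr2)
        d->run++;
    else
        d->run = 0;   /* any normal reading resets the run */

    return d->run >= d->samples_needed;
}
```

A laptop resting on a lap reads about 1G and keeps resetting the run; only several consecutive near-zero samples trip the detector, which is the "x seconds of near 0G" test suggested earlier in the thread.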