libata new EH document

From:		Tejun Heo <htejun@gmail.com>
To:		Jeff Garzik <jgarzik@pobox.com>, albertcc@tw.ibm.com
Subject:		[RFC] libata new EH document
Date:		Mon, 29 Aug 2005 15:11:24 +0900
Cc:		linux-ide@vger.kernel.org
 Hello, Jeff, Albert & ATA developers.

 This is the last of the recent document series on libata EH - SCSI
EH, ATA exceptions, libata EH and, this one, libata new EH.

 This document discusses how to implement the new advanced EH.  It
also describes some proposed mechanisms in detail.  I'm aware that
things are vague without actual code, but I still think this document
alone can at least help the discussion.  Once some consensus is
reached regarding the general design, I'll follow up with patches.

 Jeff, a lot of this is from my previous new EH/NCQ patchset, but
quite a bit has changed (for the better, I hope).

 Thanks.


libata new EH
======================================

 As discussed in the previous libata EH doc, the current libata EH
needs some improvements.  This document discusses goals of new libata
EH and how to reach them.  Please read SCSI EH, ATA exceptions and
libata EH documents first.

TABLE OF CONTENTS

[1] Goals & design choices
    [1-1] Use SCSI hostt->eh_strategy_handler()
    [1-2] Unified error path in an EH thread
    [1-3] Synchronization
    [1-4] Clean mechanism to hand off qc's to EH
    [1-5] Separate EH qc
    [1-6] SCSI/libata separation
[2] Designs
    [2-1] Handoff of failed qc's
    [2-2] Timed out scmd's and qc's
    [2-3] Summary of [2-1] and [2-2]
    [2-4] EH processing & completion
[3] Ideas
    [3-1] Using EH for non-error exceptions and dynamic reconfiguration
    [3-2] Using EH for host_set level exclusion
[4] Implementation plan


[1] Goals & design choices

 The final goal is implementing advanced error handling as described
in ATA exceptions document including NCQ EH, dynamic transport
reconfiguration and non-error exception handling for power management
and hot plugging.

 The following are subgoals and design choices for reaching the final
goal.


[1-1] Use SCSI hostt->eh_strategy_handler()

    We have two other alternatives here - one is using fine-grained
    SCSI EH callbacks and the other is implementing separate EH for
    libata.

    Using fine-grained SCSI EH callbacks is possible, but they have
    too many SCSI/SPI assumptions built in - ATA error handling can
    be quite different from SCSI error handling.  Also, as described
    in the SCSI EH doc, SCSI EH issues several SCSI commands for
    recovery.  They can be translated, but recovery through
    translation is a bit creepy, IMHO.

    The second option - private EH implementation - is attractive in
    that it will be better integrated into libata.  However,
    implementing a full EH when a generic framework is already in place
    doesn't make a lot of people happy.  And, I think integration
    problems can be worked around without too much trouble.

    The basic semantics of eh_strategy_handler() are

    - Full context EH.

    - After EH is started, all normal command processing is suspended
      until EH is complete.

    - Once EH is determined to be necessary, active commands are
      drained by suppressing all command issuing and waiting for
      in-flight commands.  When EH is finally entered, all active
      commands are failed commands.

    IMO, the above semantics are fairly fundamental to block device
    error handling, so whatever framework libata migrates to in the
    future, assuming them shouldn't hurt too much.


[1-2] Unified error path in an EH thread

    Currently EH is scattered around several places including the
    interrupt handler and polling tasks.  This is problematic for the
    following reasons.

    a. Full EH context is required for error handling.

       Advanced recovery usually involves resetting, command issuing
       and other blocking operations.

    b. Simple errors may trigger complex error handling behavior.

       For example, when an ABRT error occurs, reporting to upper
       layer is sufficient for most cases; however, repeated ABRT
       errors for known-to-be-supported commands might indicate too
       high transmission speed.  In such cases, full EH context is
       required to perform error handling.

    c. Scattered complex EH is difficult to implement and maintain.

       EH logic can be somewhat complex, and scattering it makes it
       harder to implement and maintain.  To make matters worse,
       libata low level drivers are allowed to override callbacks
       where part of the EH logic may reside.


[1-3] Synchronization

    A simple & concrete qc synchronization model to make sure that EH
    and any other processing don't occur concurrently is needed.


[1-4] Clean mechanism to hand off qc's to EH

    For EH to handle errors and timeouts, letting EH deal with and
    complete both errored and timed out qc's is good for simplicity
    and consistency.  To achieve this, we need a mechanism to hand off
    a qc to EH.

    Currently, libata EH has a similar mechanism to hand off a failed
    ATAPI qc to EH.  As described in the libata EH doc, such a qc is
    half-completed and used as a placeholder until EH kicks in and
    handles it.

    This half-completion isn't very clean semantically and requires
    calling split internal completion routines directly.  Also, as
    such qc's are not explicitly marked as failed, not-very-intuitive
    stuff has to be done to keep spurious interrupts or other events
    from messing with them after an error has occurred.


[1-5] Separate EH qc

    EH needs to issue qc's for recovery.  There can be several ways to
    allocate EH qc.

    a. reserve one extra qc for internal/EH commands
    b. reserve one of normal qc's
    c. use failed qc
    d. complete failed qc first and reuse it

    The preferred choice is #a for the following reasons.

    - Allowing only one concurrent internal command is okay as long as
      proper allocation mechanism is implemented or only one user is
      guaranteed.

    - EH commands are restricted to non-NCQ commands, so reserving an
      extra qc won't break qc to tag mapping.

    - #b is impossible for non-NCQ devices because only one qc is
      available.

    - #c requires dancing with qc's internals.  No real nerd likes
      dancing.

    - It may be necessary to issue commands to determine whether to
      finish or retry a qc, so #d is out.
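
    Option #a could look roughly like the following.  This is only a
    userspace sketch to illustrate the idea, not libata code: the
    sketch_* names, the 32-tag queue size and the slot layout are all
    assumptions made for the example.  The point is that the reserved
    slot sits outside the range handed out to normal commands, so the
    qc to tag mapping is not disturbed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumed sizes: 32 tags for normal commands plus one extra slot
 * reserved for internal/EH commands (option #a). */
#define SKETCH_MAX_QUEUE	32
#define SKETCH_EH_TAG		SKETCH_MAX_QUEUE

struct sketch_qc {
	unsigned int tag;
	bool in_use;
};

struct sketch_port {
	struct sketch_qc qc[SKETCH_MAX_QUEUE + 1];
};

/* Allocate a qc for normal command processing.  Never hands out the
 * reserved slot, so normal tags stay in 0..SKETCH_MAX_QUEUE-1. */
static struct sketch_qc *sketch_qc_new(struct sketch_port *ap)
{
	for (unsigned int i = 0; i < SKETCH_MAX_QUEUE; i++) {
		if (!ap->qc[i].in_use) {
			ap->qc[i].in_use = true;
			ap->qc[i].tag = i;
			return &ap->qc[i];
		}
	}
	return NULL;
}

/* Allocate the single internal/EH qc; NULL if it is already taken. */
static struct sketch_qc *sketch_qc_new_internal(struct sketch_port *ap)
{
	struct sketch_qc *qc = &ap->qc[SKETCH_EH_TAG];

	if (qc->in_use)
		return NULL;
	qc->in_use = true;
	qc->tag = SKETCH_EH_TAG;
	return qc;
}

/* Release a qc obtained from either allocator. */
static void sketch_qc_free(struct sketch_qc *qc)
{
	assert(qc->in_use);
	qc->in_use = false;
}
```

    With this layout EH can still get its qc when every normal tag is
    busy, and a second internal allocation fails until the first one
    is freed.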


[1-6] SCSI/libata separation

    Internal libata EH logic implementation should be free from SCSI
    considerations.  All glue work should be localized to the EH
    frontend and, once in the actual error handling, EH should only
    deal with qc's.


[2] Designs

 This section proposes detailed design of several important mechanisms
to help discussion and verification.


[2-1] Handoff of failed qc's

 As described above, when normal command processing determines that a
qc has failed, that qc has to be handed off to EH without being lost.

 A new qc flag ATA_QCFLAG_ERROR is defined to mark qc's which have
failed and ata_qc_error() is defined to be used by command processing
to mark failed qc and schedule EH.  ata_qc_error() has to be called
under the same condition as ata_qc_complete() - under host_lock - and
performs the following.

 1. First check if the command is already marked with
    ATA_QCFLAG_ERROR.  If so, this isn't the first error completion
    attempt, just return.

 2. Mark the qc with ATA_QCFLAG_ERROR.

 3. As, currently, SCSI command issuing is not atomic with respect to
    SHOST_RECOVERY flag, we need a separate atomic mechanism to plug
    command issuing.  Per-port flag ATA_FLAG_ERROR is set here to
    prevent further command issuing.

 4. Corresponding scmd's result code is set to
    SAM_STAT_CHECK_CONDITION and qc->scsidone() callback is called
    directly.  As we haven't filled sense data,
    scsi_decide_disposition() will return FAILED and SCSI EH will be
    scheduled.  Note that as we directly call qc->scsidone(), qc is
    left intact.

 After the above function completes, the following conditions are true.

 a. The qc has ATA_QCFLAG_ERROR set and no further normal qc
    processing will happen for the command.

 b. No new qc will be issued for the port.

 c. EH is scheduled.

 d. Corresponding scmd and qc are left alone until EH processes them.

 Note that to achieve the above behavior, other places need to be
modified too.  e.g. ata_qc_complete() needs to ignore failed qc's and
the command issuing path needs to fail issuing if ATA_FLAG_ERROR is
set.
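
 To make the four steps and the two follow-up changes concrete, here
is a small userspace model.  It is a sketch under assumptions, not a
patch: the flag names come from this document but their bit values
are made up, the sketch_* structs and functions are hypothetical, the
host_lock is not modeled, and the scsidone stand-in only imitates the
midlayer deciding FAILED for CHECK CONDITION without sense data.

```c
#include <assert.h>
#include <stdbool.h>

/* Flag names are from the text; the bit values are assumptions. */
#define ATA_QCFLAG_ERROR		(1u << 0)	/* qc handed off to EH */
#define ATA_FLAG_ERROR			(1u << 0)	/* port plugged for EH */
#define SAM_STAT_CHECK_CONDITION	0x02

struct sketch_scmd {
	int result;
	bool eh_scheduled;
};

struct sketch_port {
	unsigned int flags;
};

struct sketch_qc {
	unsigned int flags;
	struct sketch_port *ap;
	struct sketch_scmd *scmd;
	void (*scsidone)(struct sketch_scmd *);
};

/* Stand-in for the midlayer: CHECK CONDITION with no sense data is
 * treated as FAILED, which schedules SCSI EH. */
static void sketch_scsidone(struct sketch_scmd *scmd)
{
	if (scmd->result == SAM_STAT_CHECK_CONDITION)
		scmd->eh_scheduled = true;
}

/* Model of ata_qc_error().  Would be called under host_lock. */
static void sketch_qc_error(struct sketch_qc *qc)
{
	assert(qc->ap && qc->scmd && qc->scsidone);

	if (qc->flags & ATA_QCFLAG_ERROR)		/* step 1 */
		return;
	qc->flags |= ATA_QCFLAG_ERROR;			/* step 2 */
	qc->ap->flags |= ATA_FLAG_ERROR;		/* step 3 */
	qc->scmd->result = SAM_STAT_CHECK_CONDITION;	/* step 4 */
	qc->scsidone(qc->scmd);				/* qc left intact */
}

/* Issue path: refuse new commands while the port is plugged. */
static int sketch_qc_issue(struct sketch_port *ap)
{
	return (ap->flags & ATA_FLAG_ERROR) ? -1 : 0;
}

/* Normal completion path: ignore qc's already owned by EH.
 * Returns true if the qc was actually completed. */
static bool sketch_qc_complete(struct sketch_qc *qc)
{
	if (qc->flags & ATA_QCFLAG_ERROR)
		return false;
	qc->scmd->result = 0;
	return true;
}
```

 In this model a second sketch_qc_error() call on the same qc does
nothing, and both issuing and normal completion are refused after the
first one, which matches conditions a through d above.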


[2-2] Timed out scmd's and qc's

 Because libata keeps a separate command list in the form of the qc
array, SCSI and libata can disagree about which commands have timed
out.  Consider the following scenario.

 1. A scmd is issued.

 2. Corresponding qc is allocated, initialized & issued.

 3. SCSI timeout occurs & EH scheduled.

 4. The qc completes successfully.  Because timer already has expired,
    scsi_done() will return without doing anything.

 5. EH starts.

 In above case, we have a timed out scmd but the corresponding qc has
already completed and been deallocated, and this is the only case
where a failed or timed out scmd doesn't have its corresponding qc.
Note that if the qc failed in step #4, ata_qc_error() would have been
called, the qc tagged with ATA_QCFLAG_ERROR and EH would take steps in
[2-1].

 This can be easily worked around by scanning scmds on shost->eh_cmd_q
and completing scmds which don't have corresponding qc's with success
code.  This way, internal libata EH can be insulated from SCSI details
and deal only with qc's.

 qc's which are determined to have timed out are marked with
ATA_QCFLAG_ERROR | ATA_QCFLAG_TIMEDOUT.  Note that all of the above
should happen atomically as we don't wanna race with the interrupt
handler or polling tasks.
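
 The scan could be modeled as below.  Again this is a userspace sketch
with made-up pieces: the sketch_* types, the plain array standing in
for shost->eh_cmd_q and the flag bit values are assumptions, and the
locking needed to make the scan atomic is only mentioned in a comment.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Flag names are from the text; the bit values are assumptions. */
#define ATA_QCFLAG_ERROR	(1u << 0)
#define ATA_QCFLAG_TIMEDOUT	(1u << 1)
#define ATA_FLAG_ERROR		(1u << 0)

struct sketch_qc {
	unsigned int flags;
};

struct sketch_scmd {
	struct sketch_qc *qc;	/* NULL if the qc completed and was freed */
	int result;
	bool finished;
};

struct sketch_port {
	unsigned int flags;
};

/* Walk the scmds queued for EH.  Would run under the host_set lock so
 * that the interrupt handler and polling tasks cannot race with it.
 * Returns the number of qc's left for libata EH to handle. */
static size_t sketch_eh_scan(struct sketch_port *ap,
			     struct sketch_scmd *eh_cmd_q, size_t n)
{
	size_t nr_failed = 0;

	assert(ap != NULL);

	for (size_t i = 0; i < n; i++) {
		struct sketch_scmd *scmd = &eh_cmd_q[i];

		if (!scmd->qc) {
			/* qc completed after the SCSI timer expired:
			 * finish the scmd with success code. */
			scmd->result = 0;
			scmd->finished = true;
			continue;
		}
		if (!(scmd->qc->flags & ATA_QCFLAG_ERROR)) {
			/* Still in flight, so this is a real timeout. */
			scmd->qc->flags |= ATA_QCFLAG_ERROR |
					   ATA_QCFLAG_TIMEDOUT;
			ap->flags |= ATA_FLAG_ERROR;
		}
		nr_failed++;
	}
	return nr_failed;
}
```

 After the scan, every scmd still pending has a qc with
ATA_QCFLAG_ERROR set, so the rest of EH never needs to look at scmds.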


[2-3] Summary of [2-1] and [2-2]

 - All failed qc's will have ATA_QCFLAG_ERROR set.

 - All timed out qc's will have ATA_QCFLAG_ERROR and
   ATA_QCFLAG_TIMEDOUT set.

 - Whenever ATA_QCFLAG_ERROR bit is set, ATA_FLAG_ERROR should also be
   set.

 - All of above three should be done while holding host_set lock.

 - ata_qc_complete() and ata_qc_error() should not perform any
   operation on qc's which have ATA_QCFLAG_ERROR set.

 - No non-internal commands should be allowed on ports which have
   ATA_FLAG_ERROR set.


[2-4] EH processing & completion

 Once entered, EH can issue internal qc's for recovery.  Note that we
need to implement separate mechanisms for error handling and timeout
of internal qc's as we can't call into EH recursively.

 For errors, just reporting failure should be enough and this can be
easily implemented by calling ata_qc_complete() from ata_qc_error()
for internal commands.

 A separate timer should be used for internal commands.  When this
timer expires, the best we can do is complete the qc with failed
status.

 EH code should be prepared to take appropriate actions to handle both
errors and timeouts such as resetting device on timeout.

 After necessary steps are taken for recovery and disposition is
determined for each failed qc, EH should retry or finish each failed
qc.  As noted in SCSI EH doc, eh_strategy_handler() should call either
scsi_finish_command() or scsi_queue_insert().  Because the failed qc's
are still active, overriding their ->scsidone callbacks appropriately
and performing ata_qc_complete() on those will do the job.  Note that
ATA_QCFLAG_ERROR checking should be bypassed when finishing off failed
qc's from EH.

 After all failed qc's are taken care of, libata EH should make sure
that all integrity constraints described in the SCSI EH doc are met
and clear ATA_FLAG_ERROR on the port.  Returning from
hostt->eh_strategy_handler() will make SCSI midlayer resume normal
processing.
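
 The tail end of EH might be sketched as follows.  This is a userspace
model only: sketch_eh_finish() and sketch_eh_retry() are stand-ins for
scsi_finish_command() and scsi_queue_insert(), the disposition enum
and all other sketch_* names are hypothetical, and the flag bit values
are assumptions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Flag names are from the text; the bit values are assumptions. */
#define ATA_QCFLAG_ERROR	(1u << 0)
#define ATA_FLAG_ERROR		(1u << 0)

enum sketch_disp { SKETCH_FINISH, SKETCH_RETRY };

struct sketch_scmd {
	bool finished;
	bool requeued;
};

struct sketch_qc {
	unsigned int flags;
	struct sketch_scmd *scmd;
	void (*scsidone)(struct sketch_scmd *);
};

struct sketch_port {
	unsigned int flags;
};

/* Stand-ins for scsi_finish_command() and scsi_queue_insert(). */
static void sketch_eh_finish(struct sketch_scmd *scmd)
{
	scmd->finished = true;
}

static void sketch_eh_retry(struct sketch_scmd *scmd)
{
	scmd->requeued = true;
}

/* EH-only completion.  Unlike the normal completion path it does not
 * bail out on ATA_QCFLAG_ERROR; it expects the flag to be set. */
static void sketch_eh_qc_complete(struct sketch_qc *qc, enum sketch_disp disp)
{
	assert(qc->flags & ATA_QCFLAG_ERROR);

	/* Override ->scsidone according to the chosen disposition. */
	qc->scsidone = (disp == SKETCH_RETRY) ? sketch_eh_retry
					      : sketch_eh_finish;
	qc->scsidone(qc->scmd);
	qc->flags = 0;			/* qc is released */
}

/* Finish or retry every failed qc, then unplug the port. */
static void sketch_eh_done(struct sketch_port *ap, struct sketch_qc **failed,
			   const enum sketch_disp *disp, size_t n)
{
	for (size_t i = 0; i < n; i++)
		sketch_eh_qc_complete(failed[i], disp[i]);
	ap->flags &= ~ATA_FLAG_ERROR;
}
```

 Clearing ATA_FLAG_ERROR last keeps the issue path plugged until every
failed qc has been disposed of.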


[3] Ideas

[3-1] Using EH for non-error exceptions and dynamic reconfiguration

 Non-error exceptions like hot plugging and dynamic reconfigurations
such as lowering transfer speed are best handled inside EH, but
currently there is no way to invoke EH without failed or timed out
scmds.  IMHO, a mechanism to allow EH invocation without a failed scmd
should be simple to implement in SCSI midlayer and can solve this
issue nicely.


[3-2] Using EH for host_set level exclusion

 Some EH / configuration actions require host_set level exclusion.
This also can be solved by adding the mechanism described in [3-1].
Before starting such an operation, EH's can be invoked on all other
ports.  After all ports are safely parked inside EH, the operation can
be performed.  After the operation is complete, other ports can be
released from EH.
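
 As a rough illustration of the park/operate/release sequence, here is
a single-threaded userspace model.  Everything in it is hypothetical:
the sketch_* names, the four-port limit and the in_eh flag, which
stands in for "this port's EH thread has drained the port and is
waiting".  A real version would have to block until each port has
actually entered EH.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_MAX_PORTS	4	/* assumed limit for the example */

struct sketch_port {
	bool in_eh;	/* true while the port is parked inside EH */
};

struct sketch_host_set {
	struct sketch_port ports[SKETCH_MAX_PORTS];
	size_t n_ports;
};

/* Example operation: record how many ports are parked while it runs. */
static int sketch_count_parked(struct sketch_host_set *hs, void *arg)
{
	size_t *count = arg;

	*count = 0;
	for (size_t i = 0; i < hs->n_ports; i++)
		if (hs->ports[i].in_eh)
			(*count)++;
	return 0;
}

/* Park every port except 'owner' inside EH, run 'op' with host_set
 * level exclusion, then release the parked ports. */
static int sketch_host_set_exclusive(struct sketch_host_set *hs, size_t owner,
				     int (*op)(struct sketch_host_set *, void *),
				     void *arg)
{
	int rc;

	assert(owner < hs->n_ports);

	for (size_t i = 0; i < hs->n_ports; i++)
		if (i != owner)
			hs->ports[i].in_eh = true;	/* invoke EH, no failed scmd */

	rc = op(hs, arg);

	for (size_t i = 0; i < hs->n_ports; i++)
		if (i != owner)
			hs->ports[i].in_eh = false;	/* release from EH */
	return rc;
}
```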


[4] Implementation plan

 Implementation of new EH can be separated into two stages.  The first
is implementing the EH framework, i.e. qc handoff, EH invocation, qc
completion in EH and resuming normal operation.  The second is
implementing the actual error handling logic according to the ATA
exceptions doc.

 After completing the first step, the current error handling logic can
be moved onto the new framework.  As this won't change libata's
behavior viewed from controllers and devices, we only have to verify
the framework itself and can continue to use the current logic until
the second part is complete.

 Getting the error handling logic right will take some time, for
testing if not for development, and some high-on-wishlist features -
NCQ and hot plugging - are delayed due to EH.  So, once the new EH
framework is complete, fitting those in first and implementing unified
EH logic later might be a good idea.  NCQ can be easily integrated
once the framework is in place, but I'm not sure about hotplugging.