| From: | Bart Van Assche <bvanassche-AT-acm.org> |
| To: | "Martin K . Petersen" <martin.petersen-AT-oracle.com> |
| Subject: | [PATCH v6 00/28] Optimize the hot path in the UFS driver |
| Date: | Tue, 14 Oct 2025 13:15:42 -0700 |
| Message-ID: | <20251014201707.3396650-1-bvanassche@acm.org> |
| Cc: | linux-scsi-AT-vger.kernel.org, Bart Van Assche <bvanassche-AT-acm.org> |
Hi Martin,
This patch series optimizes the hot path of the UFS driver by making
struct scsi_cmnd and struct ufshcd_lrb adjacent. Making these two data
structures adjacent is realized as follows:
@@ -9040,6 +9046,7 @@ static const struct scsi_host_template ufshcd_driver_template = {
 	.name			= UFSHCD,
 	.proc_name		= UFSHCD,
 	.map_queues		= ufshcd_map_queues,
+	.cmd_size		= sizeof(struct ufshcd_lrb),
 	.init_cmd_priv		= ufshcd_init_cmd_priv,
 	.queuecommand		= ufshcd_queuecommand,
 	.mq_poll		= ufshcd_poll,
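
To illustrate what setting .cmd_size buys, here is a minimal user-space
sketch of the layout. It is not the kernel code: struct demo_cmnd,
struct demo_lrb and all demo_*() helpers are hypothetical stand-ins for
struct scsi_cmnd, struct ufshcd_lrb and the SCSI core allocation path. The
only kernel behavior it relies on is that scsi_cmd_priv() returns the
address immediately after the command structure.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins; the real kernel structures are much larger. */
struct demo_cmnd {		/* plays the role of struct scsi_cmnd */
	int tag;
};

struct demo_lrb {		/* plays the role of struct ufshcd_lrb */
	int task_state;
};

/* Allocate one command followed by cmd_size bytes of private data. */
static struct demo_cmnd *demo_alloc_cmd(size_t cmd_size)
{
	return calloc(1, sizeof(struct demo_cmnd) + cmd_size);
}

/* The private data starts right behind the command structure. */
static void *demo_cmd_priv(struct demo_cmnd *cmd)
{
	return cmd + 1;
}

/* Returns 1 if the private data directly follows the command, else 0. */
static int demo_is_adjacent(void)
{
	struct demo_cmnd *cmd = demo_alloc_cmd(sizeof(struct demo_lrb));
	struct demo_lrb *lrb;
	int adjacent;

	assert(cmd != NULL);
	lrb = demo_cmd_priv(cmd);
	lrb->task_state = 1;	/* no lookup table, only pointer arithmetic */
	adjacent = (char *)lrb == (char *)cmd + sizeof(*cmd);
	free(cmd);
	return adjacent;
}
```

With this layout, converting between the command and its driver-private
data needs no array indexed by tag, which is the indirection this series
removes from the hot path.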
The following changes had to be made prior to making these two data
structures adjacent:
* Add support for driver-internal and reserved commands in the SCSI core.
* Instead of making the reserved command slot (hba->reserved_slot)
invisible to the SCSI core, let the SCSI core allocate a reserved command.
* Remove all UFS data structure members that are no longer needed
because struct scsi_cmnd and struct ufshcd_lrb are now adjacent.
* Call ufshcd_init_lrb() from inside ufshcd_queuecommand() instead of
calling this function before I/O starts. This is necessary because
ufshcd_memory_alloc() allocates fewer instances than the block layer
allocates requests. See also the following code in the block layer
core:
	if (blk_mq_init_request(set, hctx->fq->flush_rq, hctx_idx,
				hctx->numa_node))
Although the UFS driver could be modified such that ufshcd_init_lrb()
is called from ufshcd_init_cmd_priv(), realizing this would require
moving the memory allocations that happen from inside
ufshcd_memory_alloc() into ufshcd_init_cmd_priv(). That would make
this patch series even larger. Although ufshcd_init_lrb() is called for each
command, the benefits of reduced indirection and better cache efficiency
outweigh the small overhead of per-command lrb initialization.
* ufshcd_add_scsi_host() is now called before any device management
commands are submitted. This change is necessary because this patch
series makes device management command allocation happen when the SCSI
host is allocated.
* Allocate as many command slots as the host controller supports. Decrease
host->cmds_per_lun if necessary once it is known whether the UFS device
supports fewer command slots than the host controller.
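
The last item above can be sketched as follows. This is a simplified
user-space model under stated assumptions, not the driver code:
struct demo_host and both demo_*() functions are hypothetical, and only the
field names can_queue and cmds_per_lun are borrowed from struct Scsi_Host.

```c
#include <assert.h>

/* Hypothetical model of the two queue depth limits of a SCSI host. */
struct demo_host {
	int can_queue;		/* command slots allocated up front */
	int cmds_per_lun;	/* per-LUN limit, may shrink later */
};

/* At host allocation time only the controller limit is known. */
static void demo_host_alloc(struct demo_host *h, int hc_slots)
{
	assert(hc_slots > 0);
	h->can_queue = hc_slots;
	h->cmds_per_lun = hc_slots;
}

/* Once the device limit is known, lower the per-LUN limit if needed. */
static void demo_device_probed(struct demo_host *h, int dev_slots)
{
	if (dev_slots < h->cmds_per_lun)
		h->cmds_per_lun = dev_slots;
}
```

The number of allocated slots never changes after allocation; only the
limit on how many of them may be in use per LUN is lowered.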
Changes compared to v5:
- Removed the "|| sht->queue_reserved_command" test from
scsi_add_host_with_dma().
- Removed "WARN_ON_ONCE" from "WARN_ON_ONCE(!sdev->budget_map.map)" in
scsi_change_queue_depth().
- Removed "if (WARN_ON_ONCE(!sdev->budget_map.map)) return -EINVAL;" from
scsi_realloc_sdev_budget_map().
- Simplified and improved the scsi_debug abort implementation.
- Removed the scsi_device_is_pseudo_dev() declaration from
drivers/scsi/scsi_priv.h.
- Fixed ufshcd_get_hba_mac(): "Failed to get mac" is no longer reported if the
function succeeds.
- In the UFS driver, set .nr_reserved_cmds in the SCSI host template instead of
in ufshcd_init().
Changes compared to v4:
- Dropped the scsi_execute_cmd() changes.
- Restored patch "scsi: core: Add scsi_{get,put}_internal_cmd() helpers".
- Switched back from scsi_execute_cmd() to blk_execute_rq() for submitting
device management commands in the UFS driver.
- As suggested by John Garry, modified the scsi_debug patch such that aborting
a SCSI command happens by submitting a reserved command.
Changes compared to v3:
- Fixed a spelling error in patch 1 and left out a superfluous if-statement.
- Left out scsi_host_template.alloc_pseudo_sdev. A pseudo SCSI device is now
allocated if either nr_reserved_cmds > 0 or .queue_reserved_command has been
set.
- Left out the 'pseudo_sdev' local variable from scsi_forget_host().
- Removed a backwards jump from scsi_get_pseudo_dev().
- Included a bug fix for synchronous scanning.
- Skip scsi_track_queue_full() and scsi_handle_queue_ramp_up() for pseudo SCSI
devices.
- Extended the scsi_execute_cmd() functionality.
- Use scsi_execute_cmd() for submitting reserved commands instead of
blk_execute_rq().
- Dropped the patch that introduces scsi_get_internal_cmd() and
scsi_put_internal_cmd().
Changes compared to v2:
- Removed scsi_host_update_can_queue() and also the UFS driver refactoring
patches that were introduced to support this call.
- Added .queue_reserved_command(). Added ufshcd_queue_reserved_command().
- Removed a BUG_ON() statement from ufshcd_get_dev_mgmt_cmd().
- Modified and renamed ufshcd_mcq_decide_queue_depth().
Changes compared to v1:
- Left out the kernel patches related to support for const SCSI command
arguments.
- Added SCSI core patches for allocating a pseudo SCSI device and reserved
command support.
- Added several kernel patches to switch the UFS driver from a hardcoded
reserved slot to calling scsi_get_internal_cmd().
- Enable .alloc_pseudo_sdev in the scsi_debug driver.
Bart Van Assche (24):
scsi: core: Move two statements
scsi: core: Make the budget map optional
scsi_debug: Abort SCSI commands via an internal command
ufs: core: Move an assignment in ufshcd_mcq_process_cqe()
ufs: core: Change the type of one ufshcd_add_cmd_upiu_trace() argument
ufs: core: Only call ufshcd_add_command_trace() for SCSI commands
ufs: core: Change the type of one ufshcd_add_command_trace() argument
ufs: core: Change the type of one ufshcd_send_command() argument
ufs: core: Only call ufshcd_should_inform_monitor() for SCSI commands
ufs: core: Change the monitor function argument types
ufs: core: Rework ufshcd_mcq_compl_pending_transfer()
ufs: core: Rework ufshcd_eh_device_reset_handler()
ufs: core: Rework the SCSI host queue depth calculation code
ufs: core: Allocate the SCSI host earlier
ufs: core: Call ufshcd_init_lrb() later
ufs: core: Use hba->reserved_slot
ufs: core: Make the reserved slot a reserved request
ufs: core: Do not clear driver-private command data
ufs: core: Optimize the hot path
ufs: core: Pass a SCSI pointer instead of an LRB pointer
ufs: core: Remove the ufshcd_lrb task_tag member
ufs: core: Make blk_mq_tagset_busy_iter() skip reserved requests
ufs: core: Move code out of ufshcd_wait_for_dev_cmd()
ufs: core: Switch to scsi_get_internal_cmd()
Hannes Reinecke (3):
scsi: core: Support allocating reserved commands
scsi: core: Support allocating a pseudo SCSI device
scsi: core: Add scsi_{get,put}_internal_cmd() helpers
John Garry (1):
scsi: core: Introduce .queue_reserved_command()
drivers/scsi/hosts.c | 15 +
drivers/scsi/scsi.c | 12 +-
drivers/scsi/scsi_debug.c | 113 ++++-
drivers/scsi/scsi_error.c | 3 +
drivers/scsi/scsi_lib.c | 145 +++++-
drivers/scsi/scsi_priv.h | 1 +
drivers/scsi/scsi_scan.c | 74 ++-
drivers/scsi/scsi_sysfs.c | 5 +-
drivers/ufs/core/ufs-mcq.c | 56 +--
drivers/ufs/core/ufshcd-crypto.h | 18 +-
drivers/ufs/core/ufshcd-priv.h | 20 +-
drivers/ufs/core/ufshcd.c | 802 ++++++++++++++++---------------
include/scsi/scsi_device.h | 23 +
include/scsi/scsi_host.h | 33 +-
include/ufs/ufshcd.h | 12 -
15 files changed, 859 insertions(+), 473 deletions(-)