blk: honor isolcpus configuration
From: Daniel Wagner <wagi-AT-kernel.org>
To: Jens Axboe <axboe-AT-kernel.dk>, Keith Busch <kbusch-AT-kernel.org>, Christoph Hellwig <hch-AT-lst.de>, Sagi Grimberg <sagi-AT-grimberg.me>, "Michael S. Tsirkin" <mst-AT-redhat.com>
Subject: [PATCH v8 00/12] blk: honor isolcpus configuration
Date: Fri, 05 Sep 2025 16:59:46 +0200
Message-ID: <20250905-isolcpus-io-queues-v8-0-885984c5daca@kernel.org>
Cc: Aaron Tomlin <atomlin-AT-atomlin.com>, "Martin K. Petersen" <martin.petersen-AT-oracle.com>, Thomas Gleixner <tglx-AT-linutronix.de>, Costa Shulyupin <costa.shul-AT-redhat.com>, Juri Lelli <juri.lelli-AT-redhat.com>, Valentin Schneider <vschneid-AT-redhat.com>, Waiman Long <llong-AT-redhat.com>, Ming Lei <ming.lei-AT-redhat.com>, Frederic Weisbecker <frederic-AT-kernel.org>, Mel Gorman <mgorman-AT-suse.de>, Hannes Reinecke <hare-AT-suse.de>, Mathieu Desnoyers <mathieu.desnoyers-AT-efficios.com>, linux-kernel-AT-vger.kernel.org, linux-block-AT-vger.kernel.org, linux-nvme-AT-lists.infradead.org, megaraidlinux.pdl-AT-broadcom.com, linux-scsi-AT-vger.kernel.org, storagedev-AT-microchip.com, virtualization-AT-lists.linux.dev, GR-QLogic-Storage-Upstream-AT-marvell.com, Daniel Wagner <wagi-AT-kernel.org>
The main changes in this version are:

 - merged the mapping algorithm into the existing code
 - dropped a bunch of the SCSI driver updates

With the merging of the isolcpus-aware mapping code, there is a change
in how the resulting CPU-hctx mapping looks on systems with identical
CPUs (non-hyperthreaded CPUs). My understanding is that it shouldn't
matter, but the devil is in the details.

  Package L#0
    NUMANode L#0 (P#0 3255MB)
    L3 L#0 (16MB)
      L2 L#0 (512KB) + L1d L#0 (64KB) + L1i L#0 (64KB) + Core L#0 + PU L#0 (P#0)
      L2 L#1 (512KB) + L1d L#1 (64KB) + L1i L#1 (64KB) + Core L#1 + PU L#1 (P#1)
      L2 L#2 (512KB) + L1d L#2 (64KB) + L1i L#2 (64KB) + Core L#2 + PU L#2 (P#2)
      L2 L#3 (512KB) + L1d L#3 (64KB) + L1i L#3 (64KB) + Core L#3 + PU L#3 (P#3)
      L2 L#4 (512KB) + L1d L#4 (64KB) + L1i L#4 (64KB) + Core L#4 + PU L#4 (P#4)
      L2 L#5 (512KB) + L1d L#5 (64KB) + L1i L#5 (64KB) + Core L#5 + PU L#5 (P#5)
      L2 L#6 (512KB) + L1d L#6 (64KB) + L1i L#6 (64KB) + Core L#6 + PU L#6 (P#6)
      L2 L#7 (512KB) + L1d L#7 (64KB) + L1i L#7 (64KB) + Core L#7 + PU L#7 (P#7)

  base version:

    queue mapping for /dev/nvme0n1
        hctx0: default 0 8
        hctx1: default 1 9
        hctx2: default 2 10
        hctx3: default 3 11
        hctx4: default 4 12
        hctx5: default 5 13
        hctx6: default 6 14
        hctx7: default 7 15

  patched:

    queue mapping for /dev/nvme0n1
        hctx0: default 0 1
        hctx1: default 2 3
        hctx2: default 4 5
        hctx3: default 6 7
        hctx4: default 8 9
        hctx5: default 10 11
        hctx6: default 12 13
        hctx7: default 14 15

  Package L#0 + L3 L#0 (16MB)
    L2 L#0 (512KB) + L1d L#0 (64KB) + L1i L#0 (64KB) + Core L#0
      PU L#0 (P#0)
      PU L#1 (P#1)
    L2 L#1 (512KB) + L1d L#1 (64KB) + L1i L#1 (64KB) + Core L#1
      PU L#2 (P#2)
      PU L#3 (P#3)
    L2 L#2 (512KB) + L1d L#2 (64KB) + L1i L#2 (64KB) + Core L#2
      PU L#4 (P#4)
      PU L#5 (P#5)
    L2 L#3 (512KB) + L1d L#3 (64KB) + L1i L#3 (64KB) + Core L#3
      PU L#6 (P#6)
      PU L#7 (P#7)
  Package L#1 + L3 L#1 (16MB)
    L2 L#4 (512KB) + L1d L#4 (64KB) + L1i L#4 (64KB) + Core L#4
      PU L#8 (P#8)
      PU L#9 (P#9)
    L2 L#5 (512KB) + L1d L#5 (64KB) + L1i L#5 (64KB) + Core L#5
      PU L#10 (P#10)
      PU L#11 (P#11)
    L2 L#6 (512KB) + L1d L#6 (64KB) + L1i L#6 (64KB) + Core L#6
      PU L#12 (P#12)
      PU L#13 (P#13)
    L2 L#7 (512KB) + L1d L#7 (64KB) + L1i L#7 (64KB) + Core L#7
      PU L#14 (P#14)
      PU L#15 (P#15)

  base and patched:

    queue mapping for /dev/nvme0n1
        hctx0: default 0 1
        hctx1: default 2 3
        hctx2: default 4 5
        hctx3: default 6 7
        hctx4: default 8 9
        hctx5: default 10 11
        hctx6: default 12 13
        hctx7: default 14 15

As mentioned, I've decided to update only the SCSI drivers which are
already using pci_alloc_irq_vectors_affinity with PCI_IRQ_AFFINITY.
These drivers use the automatic IRQ affinity management code, which is
the pre-condition for isolcpus to work. Also missing are the FC drivers
which support nvme-fabrics (lpfc, qla2xxx); the nvme-fabrics code needs
to be touched first. I've got the patches for this, but let's first get
the main change into shape. After that, I can start updating the
drivers one by one. I think this reduces the risk of regressions
significantly.
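For reference, a minimal sketch of the driver-side pattern mentioned
above: a PCI driver opting into managed IRQ affinity via
pci_alloc_irq_vectors_affinity() with PCI_IRQ_AFFINITY. This is
illustrative only; the helper name example_setup_irqs() and the vector
counts are invented and nothing here is taken from the series itself,
only from the long-standing PCI/IRQ affinity API.

  #include <linux/pci.h>
  #include <linux/interrupt.h>

  /*
   * Sketch only: the usual way a PCI storage driver requests managed
   * IRQ affinity. Drivers following this pattern get the automatic
   * affinity spreading that the isolcpus handling hooks into.
   */
  static int example_setup_irqs(struct pci_dev *pdev, unsigned int nr_io_queues)
  {
          struct irq_affinity affd = {
                  .pre_vectors = 1,  /* e.g. one admin vector, excluded from spreading */
          };
          int nr_vecs;

          /* PCI_IRQ_AFFINITY enables the managed affinity spreading */
          nr_vecs = pci_alloc_irq_vectors_affinity(pdev, 2, nr_io_queues + 1,
                                                   PCI_IRQ_MSIX | PCI_IRQ_AFFINITY,
                                                   &affd);
          if (nr_vecs < 0)
                  return nr_vecs;

          /* the remaining nr_vecs - 1 I/O vectors are spread across the CPUs */
          return nr_vecs;
  }

With such a driver bound, booting with something along the lines of
isolcpus=io_queue,2-7 (syntax per the documentation patch in this
series; the CPU list is just an example) should keep the I/O queue
interrupts and hctx mappings on the housekeeping CPUs.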
Signed-off-by: Daniel Wagner <wagi@kernel.org>
---
Changes in v8:
- added 524f5eea4bbe ("lib/group_cpus: remove !SMP code")
- merged the new logic into the existing function, avoiding special casing
- group_mask_cpus_evenly:
  - s/group_masks_cpus_evenly/group_mask_cpus_evenly/
  - updated the comment on group_mask_cpus_evenly
  - renamed the argument from cpu_mask to mask
- aacraid: added missing num queue calculation (new patch)
- only update SCSI drivers which support PCI_IRQ_AFFINITY and do not
  support nvme-fabrics
- don't use __free for cpumask_var_t, it seems incompatible
- updated the doc to highlight the CPU offlining limitation
- collected tags
- Link to v7: https://patch.msgid.link/20250702-isolcpus-io-queues-v7-0...

Changes in v7:
- sent out the first part of the series:
  https://lore.kernel.org/all/20250617-isolcpus-queue-count...
- added command line documentation
- added validation code, so that the resulting mapping is operational
- rewrote the isolcpus mapping code so it takes active hctxs into account
- added blk_mq_map_hk_irq_queues, which uses the mask from irq_get_affinity
- refactored blk_mq_map_hk_queues so the caller tests for HK_TYPE_MANAGED_IRQ
- Link to v6: https://patch.msgid.link/20250424-isolcpus-io-queues-v6-0...

Changes in v6:
- added the io_queue isolcpus type back
- prevent offlining a housekeeping CPU if an isolated CPU is still
  present, instead of just warning
- Link to v5: https://lore.kernel.org/r/20250110-isolcpus-io-queues-v5-...

Changes in v5:
- rebased on latest for-6.14/block
- updated the documentation on managed_irq
- updated the commit message of "blk-mq: issue warning when offlining
  hctx with online isolcpus"
- split input/output parameter in "lib/group_cpus: let group_cpu_evenly
  return number of groups"
- dropped "sched/isolation: document HK_TYPE housekeeping option"
- Link to v4: https://lore.kernel.org/r/20241217-isolcpus-io-queues-v4-...

Changes in v4:
- added "blk-mq: issue warning when offlining hctx with online isolcpus"
- fixed the check in group_cpus_evenly: the if condition needs to use
  housekeeping_enabled() and not cpumask_weight(housekeeping_mask),
  because the latter always returns a valid mask (see the sketch after
  the changelog)
- dropped the fixes tag from "lib/group_cpus.c: honor housekeeping
  config when grouping CPUs"
- fixed an overlong line in "scsi: use block layer helpers to calculate
  num of queues"
- dropped "sched/isolation: Add io_queue housekeeping option", just
  document the housekeeping enum hk_type
- added "lib/group_cpus: let group_cpu_evenly return number of groups"
- collected tags
- split the series into a preparation series:
  https://lore.kernel.org/linux-nvme/20241202-refactor-blk-...
- Link to v3: https://lore.kernel.org/r/20240806-isolcpus-io-queues-v3-...

Changes in v3:
- lifted a couple of patches from
  https://lore.kernel.org/all/20210709081005.421340-1-ming....
  "virtio: add APIs for retrieving vq affinity"
  "blk-mq: introduce blk_mq_dev_map_queues"
- replaced all users of blk_mq_[pci|virtio]_map_queues with
  blk_mq_dev_map_queues
- updated/extended the number-of-queues calculation helpers
- added the isolcpus=io_queue CPU-hctx mapping function
- documented enum hk_type and isolcpus=io_queue
- added "scsi: pm8001: do not overwrite PCI queue mapping"
- Link to v2: https://lore.kernel.org/r/20240627-isolcpus-io-queues-v2-...

Changes in v2:
- updated the documentation
- split the blk/nvme-pci patch
- dropped HK_TYPE_IO_QUEUE, use HK_TYPE_MANAGED_IRQ
- Link to v1: https://lore.kernel.org/r/20240621-isolcpus-io-queues-v1-...
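To illustrate the housekeeping_enabled() point from the v4 item above,
here is a minimal sketch assuming only the stock housekeeping API; the
helper name example_grouping_mask() is invented and the code is not
taken from the series.

  #include <linux/cpumask.h>
  #include <linux/sched/isolation.h>

  static const struct cpumask *example_grouping_mask(void)
  {
          /*
           * cpumask_weight(housekeeping_cpumask(HK_TYPE_MANAGED_IRQ)) is
           * always non-zero, because housekeeping_cpumask() falls back to
           * cpu_possible_mask when no isolation is configured, so it cannot
           * be used to detect an active isolcpus setup.
           */
          if (housekeeping_enabled(HK_TYPE_MANAGED_IRQ))
                  return housekeeping_cpumask(HK_TYPE_MANAGED_IRQ);

          return cpu_possible_mask;
  }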
---
Daniel Wagner (12):
      scsi: aacraid: use block layer helpers to calculate num of queues
      lib/group_cpus: remove dead !SMP code
      lib/group_cpus: Add group_mask_cpus_evenly()
      genirq/affinity: Add cpumask to struct irq_affinity
      blk-mq: add blk_mq_{online|possible}_queue_affinity
      nvme-pci: use block layer helpers to constrain queue affinity
      scsi: Use block layer helpers to constrain queue affinity
      virtio: blk/scsi: use block layer helpers to constrain queue affinity
      isolation: Introduce io_queue isolcpus type
      blk-mq: use hk cpus only when isolcpus=io_queue is enabled
      blk-mq: prevent offlining hk CPUs with associated online isolated CPUs
      docs: add io_queue flag to isolcpus

 Documentation/admin-guide/kernel-parameters.txt |  22 ++-
 block/blk-mq-cpumap.c                           | 201 +++++++++++++++++++++---
 block/blk-mq.c                                  |  42 +++++
 drivers/block/virtio_blk.c                      |   4 +-
 drivers/nvme/host/pci.c                         |   1 +
 drivers/scsi/aacraid/comminit.c                 |   3 +-
 drivers/scsi/hisi_sas/hisi_sas_v3_hw.c          |   1 +
 drivers/scsi/megaraid/megaraid_sas_base.c       |   5 +-
 drivers/scsi/mpi3mr/mpi3mr_fw.c                 |   6 +-
 drivers/scsi/mpt3sas/mpt3sas_base.c             |   5 +-
 drivers/scsi/pm8001/pm8001_init.c               |   1 +
 drivers/scsi/virtio_scsi.c                      |   5 +-
 include/linux/blk-mq.h                          |   2 +
 include/linux/group_cpus.h                      |   3 +
 include/linux/interrupt.h                       |  16 +-
 include/linux/sched/isolation.h                 |   1 +
 kernel/irq/affinity.c                           |  12 +-
 kernel/sched/isolation.c                        |   7 +
 lib/group_cpus.c                                |  63 ++++++--
 19 files changed, 353 insertions(+), 47 deletions(-)
---
base-commit: b320789d6883cc00ac78ce83bccbfe7ed58afcf0
change-id: 20240620-isolcpus-io-queues-1a88eb47ff8b

Best regards,
-- 
Daniel Wagner <wagi@kernel.org>