From: Alexandre Courbot <acourbot-AT-nvidia.com>
To: Danilo Krummrich <dakr-AT-kernel.org>, Alice Ryhl <aliceryhl-AT-google.com>, David Airlie <airlied-AT-gmail.com>, Simona Vetter <simona-AT-ffwll.ch>, Bjorn Helgaas <bhelgaas-AT-google.com>, Krzysztof Wilczyński <kwilczynski-AT-kernel.org>, Miguel Ojeda <ojeda-AT-kernel.org>, Gary Guo <gary-AT-garyguo.net>, Björn Roy Baron <bjorn3_gh-AT-protonmail.com>, Benno Lossin <lossin-AT-kernel.org>, Andreas Hindborg <a.hindborg-AT-kernel.org>, Trevor Gross <tmgross-AT-umich.edu>, Boqun Feng <boqun-AT-kernel.org>
Subject: [PATCH v3 0/6] gpu: nova-core: run unload sequence upon unbinding
Date: Wed, 22 Apr 2026 22:40:50 +0900
Message-ID: <20260422-nova-unload-v3-0-1d2c81bd3ced@nvidia.com>
Cc: John Hubbard <jhubbard-AT-nvidia.com>, Alistair Popple <apopple-AT-nvidia.com>, Joel Fernandes <joelagnelf-AT-nvidia.com>, Timur Tabi <ttabi-AT-nvidia.com>, Eliot Courtney <ecourtney-AT-nvidia.com>, dri-devel-AT-lists.freedesktop.org, linux-kernel-AT-vger.kernel.org, rust-for-linux-AT-vger.kernel.org, Alexandre Courbot <acourbot-AT-nvidia.com>
Currently the GSP is left running and the WPR2 memory region untouched
when the driver is unbound. This is obviously not ideal for at least two
reasons:
- Probing requires setting up the WPR2 region, which cannot be done if
there is already one in place. Hence the current requirement to reset
the GPU (using e.g. `echo 1 >/sys/bus/pci/devices/.../reset`) before
the driver can be probed again after removal.
- The running GSP may still attempt to access shared memory regions
which the kernel might recycle.
On top of that, there is a nasty bug in the Blackwell VBIOS that
sometimes borks the GPU upon PCI reset, requiring a reboot. So relying
on the PCI reset to unload/reload Nova is really not practical here.
This series does what is needed to leave the GPU in a clean state after
unbind, for all currently supported GPUs. Blackwell support is trivial
and will be added alongside the Blackwell series [1] if this can be
merged first.
The first patch adds a `warn_on_err` utility macro to the kernel crate,
as warning on failures is useful in the driver unbind path. I can drop
it if it is not deemed worth having.
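To illustrate the idea for readers who have not looked at the patch, here is a
minimal userspace sketch of what such a macro could look like. It is an
assumption for illustration only and not the code from this series: the macro
body, the use of `eprintln!` in place of a kernel warning, and the
`stop_firmware`, `release_memory` and `unbind` helpers are all made up.

```rust
/// Hypothetical userspace stand-in for a `warn_on_err`-style macro
/// (NOT the kernel implementation from this series).
/// It evaluates a `Result` expression once, prints a warning with the
/// source location if it is an `Err`, and hands the result back unchanged.
macro_rules! warn_on_err {
    ($expr:expr) => {{
        let result = $expr;
        if let Err(ref e) = result {
            // A kernel version would emit a kernel warning here instead.
            eprintln!("WARNING at {}:{}: {:?}", file!(), line!(), e);
        }
        result
    }};
}

/// Made-up teardown step that can be forced to fail.
fn stop_firmware(fail: bool) -> Result<(), &'static str> {
    if fail {
        Err("firmware did not stop")
    } else {
        Ok(())
    }
}

/// Made-up teardown step that always succeeds.
fn release_memory() -> Result<(), &'static str> {
    Ok(())
}

/// Made-up unbind path: a failing step is warned about, and the
/// remaining steps still run. Returns the number of failed steps.
fn unbind(fail_stop: bool) -> u32 {
    let mut failures = 0;
    if warn_on_err!(stop_firmware(fail_stop)).is_err() {
        failures += 1;
    }
    if warn_on_err!(release_memory()).is_err() {
        failures += 1;
    }
    failures
}
```

In this sketch, calling `unbind(true)` prints one warning for the failed step
and still runs `release_memory()`, which is the kind of behaviour one would
want from a teardown path that should keep going after an error.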
This series applies cleanly on `master` as of today.
[1] https://lore.kernel.org/all/20260411024953.473149-1-jhubb...
Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
---
Changes in v3:
- Disambiguate doccomment for `warn_on_err`.
- Test the correct bit instead of the whole register value to determine
that the GSP has stopped.
- Use an enum instead of a boolean to encode the power level when
shutting down the GSP.
- Add missing newline to `dev_err`.
- Add missing doccomments for new types.
- Use values from bindings instead of magic numbers.
- Remove the redundant `get_gsp_info` function.
- Better document Booter Unloader mailbox sentinel value, and check the
value of mbox0 upon return.
- Link to v2: https://patch.msgid.link/20260421-nova-unload-v2-0-2fe549...
Changes in v2:
- Rebase on top of `master` and remove unneeded/obsolete preparatory patches.
- Tidy up the imports of commands from the `fw` module in the `gsp` module.
- Link to v1: https://patch.msgid.link/20251216-nova-unload-v1-0-6a5d82...
---
Alexandre Courbot (6):
rust: add warn_on_err macro
gpu: nova-core: use warn_on_err macro
gpu: nova-core: remove unneeded get_gsp_info proxy function
gpu: nova-core: do not import firmware commands into GSP command module
gpu: nova-core: send UNLOADING_GUEST_DRIVER GSP command upon unloading
gpu: nova-core: run Booter Unloader and FWSEC-SB upon unbinding
drivers/gpu/nova-core/firmware/booter.rs | 1 -
drivers/gpu/nova-core/firmware/fwsec.rs | 1 -
drivers/gpu/nova-core/gpu.rs | 21 +++--
drivers/gpu/nova-core/gsp/boot.rs | 100 +++++++++++++++++++++-
drivers/gpu/nova-core/gsp/commands.rs | 69 +++++++++++----
drivers/gpu/nova-core/gsp/fw.rs | 4 +
drivers/gpu/nova-core/gsp/fw/commands.rs | 44 ++++++++++
drivers/gpu/nova-core/gsp/fw/r570_144/bindings.rs | 11 +++
drivers/gpu/nova-core/regs.rs | 5 ++
rust/kernel/bug.rs | 10 +++
10 files changed, 241 insertions(+), 25 deletions(-)
---
base-commit: b4e07588e743c989499ca24d49e752c074924a9a
change-id: 20251216-nova-unload-4029b3b76950
Best regards,
--
Alexandre Courbot <acourbot@nvidia.com>