Live Update Orchestrator
From: Pasha Tatashin <pasha.tatashin-AT-soleen.com>
To: pratyush-AT-kernel.org, jasonmiu-AT-google.com, graf-AT-amazon.com, changyuanl-AT-google.com, pasha.tatashin-AT-soleen.com, rppt-AT-kernel.org, dmatlack-AT-google.com, rientjes-AT-google.com, corbet-AT-lwn.net, rdunlap-AT-infradead.org, ilpo.jarvinen-AT-linux.intel.com, kanie-AT-linux.alibaba.com, ojeda-AT-kernel.org, aliceryhl-AT-google.com, masahiroy-AT-kernel.org, akpm-AT-linux-foundation.org, tj-AT-kernel.org, yoann.congal-AT-smile.fr, mmaurer-AT-google.com, roman.gushchin-AT-linux.dev, chenridong-AT-huawei.com, axboe-AT-kernel.dk, mark.rutland-AT-arm.com, jannh-AT-google.com, vincent.guittot-AT-linaro.org, hannes-AT-cmpxchg.org, dan.j.williams-AT-intel.com, david-AT-redhat.com, joel.granados-AT-kernel.org, rostedt-AT-goodmis.org, anna.schumaker-AT-oracle.com, song-AT-kernel.org, zhangguopeng-AT-kylinos.cn, linux-AT-weissschuh.net, linux-kernel-AT-vger.kernel.org, linux-doc-AT-vger.kernel.org, linux-mm-AT-kvack.org, gregkh-AT-linuxfoundation.org, tglx-AT-linutronix.de, mingo-AT-redhat.com, bp-AT-alien8.de, dave.hansen-AT-linux.intel.com, x86-AT-kernel.org, hpa-AT-zytor.com, rafael-AT-kernel.org, dakr-AT-kernel.org, bartosz.golaszewski-AT-linaro.org, cw00.choi-AT-samsung.com, myungjoo.ham-AT-samsung.com, yesanishhere-AT-gmail.com, Jonathan.Cameron-AT-huawei.com, quic_zijuhu-AT-quicinc.com, aleksander.lobakin-AT-intel.com, ira.weiny-AT-intel.com, andriy.shevchenko-AT-linux.intel.com, leon-AT-kernel.org, lukas-AT-wunner.de, bhelgaas-AT-google.com, wagi-AT-kernel.org, djeffery-AT-redhat.com, stuart.w.hayes-AT-gmail.com, ptyadav-AT-amazon.de, lennart-AT-poettering.net, brauner-AT-kernel.org, linux-api-AT-vger.kernel.org, linux-fsdevel-AT-vger.kernel.org, saeedm-AT-nvidia.com, ajayachandra-AT-nvidia.com, jgg-AT-nvidia.com, parav-AT-nvidia.com, leonro-AT-nvidia.com, witu-AT-nvidia.com, hughd-AT-google.com, skhawaja-AT-google.com, chrisl-AT-kernel.org, steven.sistare-AT-oracle.com
Subject: [PATCH v4 00/30] Live Update Orchestrator
Date: Mon, 29 Sep 2025 01:02:51 +0000
Message-ID: <20250929010321.3462457-1-pasha.tatashin@soleen.com>
This series introduces the Live Update Orchestrator (LUO), a kernel
subsystem designed to facilitate live kernel updates. LUO enables
kexec-based reboots with minimal downtime, a critical capability for
cloud environments where hypervisors must be updated without disrupting
running virtual machines. By preserving the state of selected resources,
such as file descriptors and memory, LUO allows workloads to resume
seamlessly in the new kernel.

The git branch for this series can be found at:
https://github.com/googleprodkernel/linux-liveupdate/tree...

The patch series applies against linux-next tag: next-20250926

This series is showcased using memfd preservation; work to preserve
devices is also in progress:
1. IOMMU: https://lore.kernel.org/all/20250928190624.3735830-16-skh...
2. PCI: https://lore.kernel.org/all/20250916-luo-pci-v2-0-c494053...

=======================================================================
Changelog since v3:
(https://lore.kernel.org/all/20250807014442.3829950-1-pash...):

- The main architectural change in this version is the introduction of
  "sessions" to manage the lifecycle of preserved file descriptors.
  In v3, session management was left to a single userspace agent. This
  approach has been revised to improve robustness. Now, each session is
  represented by a file descriptor (obtained via /dev/liveupdate). The
  lifecycle of all preserved resources within a session is tied to this
  FD, ensuring automatic cleanup by the kernel if the controlling
  userspace agent crashes or exits unexpectedly.

- The first three KHO fixes from the previous series have been merged
  into Linus' tree.

- Various bug fixes and refactorings, including correcting memory
  unpreservation logic during a kho_abort() sequence.

- Addressed all comments from reviewers.

- Removed the sysfs interface (/sys/kernel/liveupdate/state); the state
  can now be queried only via the ioctl() API.

=======================================================================

What is Live Update?
====================
Live Update is a kexec-based reboot process where selected kernel
resources (memory, file descriptors, and eventually devices) are kept
operational or have their state preserved across a kernel transition.
For certain resources, DMA and interrupt activity might continue with
minimal interruption during the kernel reboot.

LUO provides a framework for coordinating live updates. It features:

State Machine
=============
Manages the live update process through states: NORMAL, PREPARED,
FROZEN, UPDATED.

Session Management
==================
Userspace creates named sessions (driven by LUOD: Live Update
Orchestrator Daemon, see: https://tinyurl.com/luoddesign), each
represented by a file descriptor. Preserved resources are tied to a
session, and their lifecycle is managed by the session's FD, ensuring
automatic cleanup if the controlling process exits unexpectedly.

Furthermore, sessions can be finished, prepared, and frozen
independently of the global LUO states. This granular control allows a
VMM to serialize and resume specific VMs as soon as their resources are
ready, without having to wait for all VMs to be prepared.

After a reboot, a central live update agent can retrieve a session
handle and pass it to the VMM process, which then restores its own file
descriptors. This ensures that resource allocations, such as cgroup
memory charges, are correctly accounted against the workload's cgroup
instead of the administrative agent's.

KHO Integration
===============
LUO programmatically drives KHO's finalization and abort sequences. (KHO
may soon become completely stateless, which will make KHO's interaction
with LUO even simpler:
https://lore.kernel.org/all/20250917025019.1585041-1-jaso...)

KHO's debugfs interface is now optional, configured via
CONFIG_KEXEC_HANDOVER_DEBUG. LUO preserves its own metadata via KHO's
kho_add_subtree() and kho_preserve_phys() mechanisms.

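The session lifecycle described under Session Management can be pictured
with a small userspace model. This is only a sketch: the model_session
structure and the model_session_*() helpers are hypothetical names made up
for illustration, plain integers stand in for preserved resources, and
nothing here is taken from the series' actual implementation. It shows the
two properties the text relies on: resources belong to one session, and
releasing that session (explicitly or because its owner died) drops them
without touching other sessions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_MAX_RES 8

/* Hypothetical model of a session; not the kernel's data structure. */
struct model_session {
	bool open;
	size_t nr_res;
	int res[MODEL_MAX_RES];	/* stand-ins for preserved resources */
};

/* Models creating a session: it starts open and empty. */
static void model_session_open(struct model_session *s)
{
	s->open = true;
	s->nr_res = 0;
}

/* Tie one resource to the session. Fails on a closed or full session. */
static int model_session_preserve(struct model_session *s, int res)
{
	if (!s->open || s->nr_res == MODEL_MAX_RES)
		return -1;
	s->res[s->nr_res++] = res;
	return 0;
}

/*
 * Models the session FD going away, whether closed explicitly or because
 * the owning process exited. Everything tied to the session is dropped.
 * Returns how many resources were cleaned up.
 */
static size_t model_session_release(struct model_session *s)
{
	size_t dropped = s->nr_res;

	s->nr_res = 0;
	s->open = false;
	return dropped;
}

/* Two independent sessions: releasing one leaves the other untouched. */
static size_t model_session_demo(void)
{
	struct model_session vm_a, vm_b;
	size_t dropped;
	int ret;

	model_session_open(&vm_a);
	model_session_open(&vm_b);

	ret = model_session_preserve(&vm_a, 10);
	ret |= model_session_preserve(&vm_a, 11);
	ret |= model_session_preserve(&vm_b, 20);
	assert(ret == 0);

	dropped = model_session_release(&vm_a);	/* owner of vm_a went away */
	assert(dropped == 2);

	ret = model_session_preserve(&vm_a, 12);	/* session is gone */
	assert(ret == -1);

	return vm_b.nr_res;	/* vm_b still holds its one resource */
}
```

Calling model_session_demo() from a main() function exercises the whole
sequence; the per-session bookkeeping is what lets one VM's state be dropped
or resumed without waiting on the others.
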
Subsystem Participation
=======================
A callback API, liveupdate_register_subsystem(), allows kernel
subsystems (e.g., KVM, IOMMU, VFIO, PCI) to register handlers for LUO
events (PREPARE, FREEZE, FINISH, CANCEL) and persist a u64 payload via
the LUO FDT.

File Descriptor Preservation
============================
An infrastructure (liveupdate_register_file_handler, luo_preserve_file,
luo_retrieve_file) allows specific types of file descriptors (e.g.,
memfd, vfio) to be preserved and restored within a session. Handlers for
specific file types can be registered to manage their preservation,
storing a u64 payload in the LUO FDT.

Userspace Interface
===================
ioctl (/dev/liveupdate): The primary control interface for creating and
retrieving sessions, triggering global LUO state transitions (prepare,
finish, cancel), querying the current LUO state, and managing preserved
file descriptors within a session.

Selftests
=========
Includes kernel-side hooks and an extensive userspace selftest suite to
verify core LUO functionality, including subsystem registration, state
transitions, and complex multi-kexec session lifecycles.

LUO State Machine and Events
============================
NORMAL:   Default operational state.
PREPARED: Initial preparation complete after LIVEUPDATE_PREPARE event.
          Subsystems have saved initial state.
FROZEN:   Final "blackout window" state after LIVEUPDATE_FREEZE event,
          just before kexec. Workloads must be suspended.
UPDATED:  Next kernel has booted via live update, awaiting restoration
          and LIVEUPDATE_FINISH.

Events
LIVEUPDATE_PREPARE: Prepare for reboot, serialize state.
LIVEUPDATE_FREEZE:  Final opportunity to save state before kexec.
LIVEUPDATE_FINISH:  Post-reboot cleanup in the next kernel.
LIVEUPDATE_CANCEL:  Abort prepare or freeze, revert changes.

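The states and events listed above can be read as a transition table. The
following is a minimal userspace sketch of that reading, not the code from
this series: every model_* name is hypothetical, the single notify callback
with a u64 payload is a simplification of subsystem participation, and two
transitions are assumptions because the text does not spell them out (that
FINISH and CANCEL both return to NORMAL). The kexec itself is modelled as a
separate step that turns FROZEN into UPDATED.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative userspace model; none of these names are kernel APIs. */
enum model_state { MODEL_NORMAL, MODEL_PREPARED, MODEL_FROZEN, MODEL_UPDATED };
enum model_event { MODEL_PREPARE, MODEL_FREEZE, MODEL_FINISH, MODEL_CANCEL };

/* model_next[state][event]: the next state, or -1 if the event is illegal.
 * Assumption: FINISH and CANCEL both lead back to NORMAL. */
static const int model_next[4][4] = {
	/*                    PREPARE         FREEZE        FINISH        CANCEL */
	[MODEL_NORMAL]   = { MODEL_PREPARED, -1,           -1,           -1 },
	[MODEL_PREPARED] = { -1,             MODEL_FROZEN, -1,           MODEL_NORMAL },
	[MODEL_FROZEN]   = { -1,             -1,           -1,           MODEL_NORMAL },
	[MODEL_UPDATED]  = { -1,             -1,           MODEL_NORMAL, -1 },
};

/* A participating subsystem: one event callback plus its u64 payload. */
struct model_subsystem {
	int (*notify)(enum model_event ev, uint64_t *payload);
	uint64_t payload;
};

struct model_luo {
	enum model_state state;
	struct model_subsystem *subs;
	size_t nr_subs;
};

/* Apply one event: reject it if illegal in the current state, notify every
 * subsystem, then change state. Returns 0 or -1. Unwinding subsystems that
 * already succeeded when a later one fails is deliberately left out. */
static int model_handle_event(struct model_luo *luo, enum model_event ev)
{
	int next = model_next[luo->state][ev];
	size_t i;

	if (next < 0)
		return -1;
	for (i = 0; i < luo->nr_subs; i++)
		if (luo->subs[i].notify(ev, &luo->subs[i].payload))
			return -1;
	luo->state = (enum model_state)next;
	return 0;
}

/* Stand-in for the kexec itself: FROZEN before, UPDATED after. */
static int model_kexec(struct model_luo *luo)
{
	if (luo->state != MODEL_FROZEN)
		return -1;
	luo->state = MODEL_UPDATED;
	return 0;
}

/* Example subsystem: publishes a payload on PREPARE, clears it afterwards. */
static int model_example_notify(enum model_event ev, uint64_t *payload)
{
	if (ev == MODEL_PREPARE)
		*payload = 0x1234;	/* pretend handle to serialized state */
	else if (ev == MODEL_FINISH || ev == MODEL_CANCEL)
		*payload = 0;
	return 0;
}

/* Walk one full update cycle and return the final state. */
static enum model_state model_demo(void)
{
	struct model_subsystem sub = { model_example_notify, 0 };
	struct model_luo luo = { MODEL_NORMAL, &sub, 1 };
	int ret;

	ret = model_handle_event(&luo, MODEL_FREEZE);	/* illegal from NORMAL */
	assert(ret == -1 && luo.state == MODEL_NORMAL);

	ret = model_handle_event(&luo, MODEL_PREPARE);
	assert(ret == 0 && sub.payload == 0x1234);

	ret = model_handle_event(&luo, MODEL_FREEZE);
	ret |= model_kexec(&luo);
	assert(ret == 0 && luo.state == MODEL_UPDATED);

	ret = model_handle_event(&luo, MODEL_FINISH);
	assert(ret == 0 && sub.payload == 0);
	return luo.state;
}
```

Keeping the legal transitions in one table makes it easy to see which events
are rejected in each state; model_demo() can be called from a main()
function to step through a full PREPARE, FREEZE, kexec, FINISH cycle.
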
Mike Rapoport (Microsoft) (1):
  kho: drop notifiers

Pasha Tatashin (24):
  kho: allow to drive kho from within kernel
  kho: make debugfs interface optional
  kho: add interfaces to unpreserve folios and page ranes
  kho: don't unpreserve memory during abort
  liveupdate: kho: move to kernel/liveupdate
  liveupdate: luo_core: luo_ioctl: Live Update Orchestrator
  liveupdate: luo_core: integrate with KHO
  liveupdate: luo_subsystems: add subsystem registration
  liveupdate: luo_subsystems: implement subsystem callbacks
  liveupdate: luo_session: Add sessions support
  liveupdate: luo_ioctl: add user interface
  liveupdate: luo_file: implement file systems callbacks
  liveupdate: luo_session: Add ioctls for file preservation and state management
  reboot: call liveupdate_reboot() before kexec
  kho: move kho debugfs directory to liveupdate
  liveupdate: add selftests for subsystems un/registration
  selftests/liveupdate: add subsystem/state tests
  docs: add luo documentation
  MAINTAINERS: add liveupdate entry
  selftests/liveupdate: Add multi-kexec session lifecycle test
  selftests/liveupdate: Add multi-file and unreclaimed file test
  selftests/liveupdate: Add multi-session workflow and state interaction test
  selftests/liveupdate: Add test for unreclaimed resource cleanup
  selftests/liveupdate: Add tests for per-session state and cancel cycles

Pratyush Yadav (5):
  mm: shmem: use SHMEM_F_* flags instead of VM_* flags
  mm: shmem: allow freezing inode mapping
  mm: shmem: export some functions to internal.h
  luo: allow preserving memfd
  docs: add documentation for memfd preservation via LUO

 Documentation/core-api/index.rst              |   1 +
 Documentation/core-api/kho/concepts.rst       |   2 +-
 Documentation/core-api/liveupdate.rst         |  64 ++
 Documentation/mm/index.rst                    |   1 +
 Documentation/mm/memfd_preservation.rst       | 138 +++
 Documentation/userspace-api/index.rst         |   1 +
 .../userspace-api/ioctl/ioctl-number.rst      |   2 +
 Documentation/userspace-api/liveupdate.rst    |  25 +
 MAINTAINERS                                   |  18 +-
 include/linux/kexec_handover.h                |  53 +-
 include/linux/liveupdate.h                    | 209 +++++
 include/linux/shmem_fs.h                      |  23 +
 include/uapi/linux/liveupdate.h               | 460 +++++++++
 init/Kconfig                                  |   2 +
 kernel/Kconfig.kexec                          |  15 -
 kernel/Makefile                               |   2 +-
 kernel/liveupdate/Kconfig                     |  72 ++
 kernel/liveupdate/Makefile                    |  14 +
 kernel/{ => liveupdate}/kexec_handover.c      | 507 ++++------
 kernel/liveupdate/kexec_handover_debug.c      | 222 +++++
 kernel/liveupdate/kexec_handover_internal.h   |  45 +
 kernel/liveupdate/luo_core.c                  | 588 ++++++++++++
 kernel/liveupdate/luo_file.c                  | 599 ++++++++++++
 kernel/liveupdate/luo_internal.h              | 114 +++
 kernel/liveupdate/luo_ioctl.c                 | 255 +++++
 kernel/liveupdate/luo_selftests.c             | 345 +++++++
 kernel/liveupdate/luo_selftests.h             |  84 ++
 kernel/liveupdate/luo_session.c               | 887 ++++++++++++++++++
 kernel/liveupdate/luo_subsystems.c            | 452 +++++++++
 kernel/reboot.c                               |   4 +
 mm/Makefile                                   |   1 +
 mm/internal.h                                 |   6 +
 mm/memblock.c                                 |  60 +-
 mm/memfd_luo.c                                | 523 +++++++++++
 mm/shmem.c                                    |  51 +-
 tools/testing/selftests/Makefile              |   1 +
 tools/testing/selftests/liveupdate/.gitignore |   2 +
 tools/testing/selftests/liveupdate/Makefile   |  48 +
 tools/testing/selftests/liveupdate/config     |   6 +
 .../testing/selftests/liveupdate/do_kexec.sh  |   6 +
 .../testing/selftests/liveupdate/liveupdate.c | 404 ++++++++
 .../selftests/liveupdate/luo_multi_file.c     | 119 +++
 .../selftests/liveupdate/luo_multi_kexec.c    | 182 ++++
 .../selftests/liveupdate/luo_multi_session.c  | 155 +++
 .../selftests/liveupdate/luo_test_utils.c     | 241 +++++
 .../selftests/liveupdate/luo_test_utils.h     |  51 +
 .../selftests/liveupdate/luo_unreclaimed.c    | 107 +++
 47 files changed, 6757 insertions(+), 410 deletions(-)
 create mode 100644 Documentation/core-api/liveupdate.rst
 create mode 100644 Documentation/mm/memfd_preservation.rst
 create mode 100644 Documentation/userspace-api/liveupdate.rst
 create mode 100644 include/linux/liveupdate.h
 create mode 100644 include/uapi/linux/liveupdate.h
 create mode 100644 kernel/liveupdate/Kconfig
 create mode 100644 kernel/liveupdate/Makefile
 rename kernel/{ => liveupdate}/kexec_handover.c (80%)
 create mode 100644 kernel/liveupdate/kexec_handover_debug.c
 create mode 100644 kernel/liveupdate/kexec_handover_internal.h
 create mode 100644 kernel/liveupdate/luo_core.c
 create mode 100644 kernel/liveupdate/luo_file.c
 create mode 100644 kernel/liveupdate/luo_internal.h
 create mode 100644 kernel/liveupdate/luo_ioctl.c
 create mode 100644 kernel/liveupdate/luo_selftests.c
 create mode 100644 kernel/liveupdate/luo_selftests.h
 create mode 100644 kernel/liveupdate/luo_session.c
 create mode 100644 kernel/liveupdate/luo_subsystems.c
 create mode 100644 mm/memfd_luo.c
 create mode 100644 tools/testing/selftests/liveupdate/.gitignore
 create mode 100644 tools/testing/selftests/liveupdate/Makefile
 create mode 100644 tools/testing/selftests/liveupdate/config
 create mode 100755 tools/testing/selftests/liveupdate/do_kexec.sh
 create mode 100644 tools/testing/selftests/liveupdate/liveupdate.c
 create mode 100644 tools/testing/selftests/liveupdate/luo_multi_file.c
 create mode 100644 tools/testing/selftests/liveupdate/luo_multi_kexec.c
 create mode 100644 tools/testing/selftests/liveupdate/luo_multi_session.c
 create mode 100644 tools/testing/selftests/liveupdate/luo_test_utils.c
 create mode 100644 tools/testing/selftests/liveupdate/luo_test_utils.h
 create mode 100644 tools/testing/selftests/liveupdate/luo_unreclaimed.c

--
2.51.0.536.g15c5d4f767-goog