|
|
Subscribe / Log in / New account

The rseq() manual page

This manual page was formatted from the librseq repository in August 2025.

rseq(2) System Calls Manual rseq(2)
NAME 
 rseq <-> restartable sequences system call

LIBRARY 
 Standard C library (libc, <->lc)

SYNOPSIS 
 #include <linux/rseq.h> /* Definition of RSEQ_* constants */
 #include <sys/syscall.h> /* Definition of SYS_* constants */
 #include <unistd.h>
 int syscall(SYS_rseq, struct rseq *rseq, uint32_t rseq_len, int flags, uint32_t sig);
 Note: glibc provides no wrapper for rseq(), necessitating the use of syscall(2).

DESCRIPTION 

 The rseq() ABI accelerates specific user-space operations by registering a
 per-thread data structure shared between kernel and user-space. This data
 structure can be read from or written to by user-space to skip
 otherwise expensive system calls.

 A restartable sequence is a sequence of instructions guaranteed to be
 executed atomically with respect to other threads and signal handlers on
 the current CPU. If its execution does not complete atomically, the kernel
 changes the execution flow by jumping to an abort handler defined by
 user-space for that restartable sequence.

 Using restartable sequences requires to register a rseq() ABI per-thread
 data structure (struct rseq) through the rseq() system call. Only one
 rseq() ABI can be registered per thread, so user-space libraries and
 appli<hy> cations must follow a user-space ABI defining how to share
 this resource. The ABI defining how to share this resource between
 applications and libraries is defined by the C library. Allocation of the
 per-thread rseq() ABI and its registration to the kernel is handled by
 glibc since version 2.35.

 The rseq() ABI per-thread data structure contains a rseq_cs field which
 points to the currently executing critical section. For each thread, a
 single rseq critical section can run at any given point. Each critical
 sec<hy> tion need to be implemented in assembly.

 The rseq() ABI accelerates user-space operations on per-cpu data by
 defining a shared data structure ABI between each user-space thread and
 the kernel.  It allows user-space to perform update operations on per-cpu
 data without requiring heavy-weight atomic operations.

 The term CPU used in this documentation refers to a hardware execution
 context. For instance, each CPU number returned by sched_getcpu() is a
 CPU. The current CPU means the CPU on which the registered thread is
 running.

 Restartable sequences are atomic with respect to preemption (making it
 atomic with respect to other threads running on the same CPU), as well as
 signal delivery (user-space execution contexts nested over the same
 thread). They either complete atomically with respect to preemption on the
 current CPU and signal deliv<hy> ery, or they are aborted.

 Restartable sequences are suited for update operations on per-cpu data.
 Restartable sequences can be used on data structures shared between
 threads within a process, and on data structures shared between threads
 across different processes.

 Some examples of operations that can be accelerated or improved by this ABI:
 - Memory allocator per-cpu free-lists,
 - Querying the current CPU number,
 - Incrementing per-CPU counters,
 - Modifying data protected by per-CPU spinlocks,
 - Inserting/removing elements in per-CPU linked-lists,
 - Writing/reading per-CPU ring buffers content.
 - Accurately reading performance monitoring unit counters with respect to
   thread migration. 

 Restartable sequences must not perform system calls. Doing so may result
 in termination of the process by a segmentation fault.

 The rseq argument is a pointer to the thread-local struct rseq to be
 shared between kernel and user-space.  The structure struct rseq is an
 extensible structure. Additional feature fields can be added in future
 kernel versions. Its layout is as follows:

 Structure alignment

 This structure is aligned on either 32-byte boundary, or on the alignment
 value returned by getauxval() invoked with AT_RSEQ_ALIGN if the structure
 size differs from 32 bytes.

 Structure size

 This structure size needs to be at least 32 bytes. It can be either 32
 bytes, or it needs to be large enough to hold the result of getauxval()
 invoked with AT_RSEQ_FEATURE_SIZE. Its size is passed as parameter to the
 rseq() system call.

 #include <linux/rseq.h>

 struct rseq {
  __u32 cpu_id_start;
  __u32 cpu_id;
  union {
    /* ... */
  } rseq_cs;
  __u32 flags;
  __u32 node_id;
  __u32 mm_cid;
 } __attribute__((aligned(32)));
 Fields
 
 cpu_id_start

 Always-updated value of the CPU number on which the registered thread is
 running. Its value is guaranteed to always be a possible CPU number, even
 when rseq() is not registered. Its value should always be confirmed by
 reading the cpu_id field before user-space performs any side-effect
 (e.g. storing to memory).

 This field is always guaranteed to hold a valid CPU number in the range [
 0 .. nr_possible_cpus - 1 ]. It can therefore be loaded by user-space and
 used as an offset in per-cpu data structures without having to check
 whether its value is within the valid bounds compared to the number of
 possible CPUs in the system.

 Initialized by user-space to a possible CPU number (e.g., 0), updated by
 the kernel for threads registered with rseq().

 For user-space applications executed on a kernel without rseq() support,
 the cpu_id_start field stays initialized at 0, which is indeed a valid CPU
 number. It is therefore valid to use it as an offset in per-cpu data
 structures, and only validate whether it's actually the current CPU number
 by comparing it with the cpu_id field within the rseq critical section.

 If the kernel does not provide rseq() support, that cpu_id field stays
 initialized at -1, so the comparison always fails, as intended.

 This field should only be read by the thread which registered this data
 structure. Aligned on 32-bit. It is up to user space to implement a
 fall-back mechanism for scenarios where rseq() is not available.

 cpu_id

 Always-updated value of the CPU number on which the registered thread is
 running. Initialized by user-space to -1, updated by the kernel for
 threads registered with rseq().  This field should only be read by the
 thread which registered this data structure. Aligned on 32-bit.

 rseq_cs

 The rseq_cs field is a pointer to a struct rseq_cs. Is is NULL when no
 rseq assembly block critical section is active for the registered
 thread. Setting it to point to a critical section descriptor (struct
 rseq_cs) marks the beginning of the critical section.

 Initialized by user-space to NULL.  Updated by user-space, which sets the
 address of the currently active rseq_cs at the beginning of assembly
 instruction sequence block, and set to NULL by the kernel when it restarts
 an assembly instruction sequence block, as well as when the kernel detects
 that it is preempting or delivering a signal outside of the range targeted
 by the rseq_cs. Also needs to be set to NULL by user-space before
 reclaiming memory that contains the targeted struct rseq_cs.

 Read and set by the kernel.  This field should only be updated by the
 thread which registered this data structure.  Aligned on 64-bit.

 flags
 Flags indicating the restart behavior for the registered thread. This is
 mainly used for debugging purposes. Can be a combination of: 
 RSEQ_CS_FLAG_NO_RESTART_ON_PREEMPT
 Inhibit instruction sequence block restart on preemption for this thread. This
 flag is deprecated since Linux 6.1.
 RSEQ_CS_FLAG_NO_RESTART_ON_SIGNAL
 Inhibit instruction sequence block restart on signal delivery for this
 thread. This flag is deprecated since Linux 6.1. 
 RSEQ_CS_FLAG_NO_RESTART_ON_MIGRATE
 Inhibit instruction sequence block restart on migration for this
 thread. This flag is deprecated since Linux 6.1.
 Initialized by user-space, used by the kernel.

 node_id
 Always-updated value of the current NUMA node ID.
 Initialized by user-space to 0.

 Updated by the kernel. Read by user-space with single-copy atomicity
 semantics. This field should only be read by the thread which registered
 this data structure. Aligned on 32-bit.

 mm_cid

 Contains the current thread's concurrency ID (allocated uniquely within a
 memory map).  Updated by the kernel. Read by user-space with single-copy
 atomicity semantics. This field should only be read by the thread which
 registered this data structure. Aligned on 32-bit.

 This concurrency ID is within the possible cpus range, and is temporarily
 (and uniquely) assigned while threads are actively running within a memory
 map. If a memory map has fewer threads than cores, or is limited to run on
 few cores concurrently through sched affinity or cgroup cpusets, the
 concurrency IDs will be values close to 0, thus allowing efficient use of
 user-space memory for per-cpu data structures.

 The layout of struct rseq_cs version 0 is as follows:

 Structure alignment
 This structure is aligned on 32-byte boundary.

 Structure size
 This structure has a fixed size of 32 bytes.

 #include <linux/rseq.h>
 struct rseq_cs {
  __u32 version;
  __u32 flags;
  __u64 start_ip;
  __u64 post_commit_offset;
  __u64 abort_ip;
 } __attribute__((aligned(32)));

Fields
 
 version
 Version of this structure. Should be initialized to 0.

 flags
 
 Flags indicating the restart behavior of this structure. Can be a combination of:
 RSEQ_CS_FLAG_NO_RESTART_ON_PREEMPT
 Inhibit instruction sequence block restart on preemption for this critical
 section.  This flag is deprecated since Linux 6.1.
 RSEQ_CS_FLAG_NO_RESTART_ON_SIGNAL
 Inhibit instruction sequence block restart on signal delivery for this
 critical section. This flag is deprecated since Linux 6.1. 
 RSEQ_CS_FLAG_NO_RESTART_ON_MIGRATE
 Inhibit instruction sequence block restart on migration for this critical
 section. This flag is deprecated since Linux 6.1.

 start_ip
 Instruction pointer address of the first instruction of the sequence of
 consecutive assembly instructions. 

 post_commit_offset

 Offset (from start_ip address) of the address after the last instruction of
 the sequence of consecutive assembly instructions.

 abort_ip

 Instruction pointer address where to move the execution flow in case of
 abort of the sequence of consecutive assembly instructions.

 The rseq_len argument is the size of the struct rseq to register.

 The flags argument is 0 for registration, and RSEQ_FLAG_UNREGISTER for
 unregistration.

 The sig argument is the 32-bit signature to be expected before the abort
 handler code.

 A single library per process should keep the struct rseq in a per-thread
 data structure. The cpu_id field should be initialized to -1, and the
 cpu_id_start field should be initialized to a possible CPU value
 (typically 0).

 Each thread is responsible for registering and unregistering its struct
 rseq. No more than one struct rseq address can be registered per thread at
 a given time.

 Reclaim of struct rseq object's memory must only be done after either an
 explicit rseq unregistration is performed or after the thread exits.

 In a typical usage scenario, the thread registering the struct rseq will
 be performing loads and stores from/to that structure. It is however also
 allowed to read that structure from other threads. The struct rseq field
 updates performed by the kernel provide relaxed atomicity semantics
 (atomic store, without memory ordering), which guarantee that other
 threads performing relaxed atomic reads (atomic load, without memory
 ordering) of the cpu number fields will always observe a consistent value.

RETURN VALUE 

 A return value of 0 indicates success. On error, <->1 is returned,
 and errno is set appropriately. 

ERRORS 

 EINVAL
 Either flags contains an invalid value, or rseq contains an address which
 is not appropriately aligned, or rseq_len contains an incorrect size, or
 restartable sequence is already registered for this  thread and at least
 one argument differs from the active registration. 

 ENOSYS
 The rseq() system call is not implemented by this kernel.

 EFAULT
 rseq is an invalid address.

 EBUSY
 Restartable sequence is already registered for this thread and all
 arguments are the same as the active registration. 

 EPERM
 The sig argument on unregistration does not match the signature received
 on registration.

VERSIONS 
 The rseq() system call was added in Linux 4.18.

STANDARDS 
 rseq() is Linux-specific.

SEE ALSO 
 sched_getcpu(3), membarrier(2), getauxval(3)
Linux man-pages (unreleased) (date) 5


to post comments


Copyright © 2025, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds