Removing syscall() from OpenBSD
I hope I am forcing attack coders into using increasingly more complicated methods. Same time, it means fewer methods are available. Other methods make exploitation more fragile. This is pushing success rates into "low-percent statistical" success. If we teach more software stacks to "fail hard, don't try to recover", that is an improvement in security.
From: Theo de Raadt <deraadt-AT-openbsd.org>
To: tech-AT-cvs.openbsd.org
Subject: Removing syscall(2) from libc and kernel
Date: Fri, 27 Oct 2023 08:45:41 -0600
Message-ID: <69276.1698417941@cvs.openbsd.org>
Piece by piece, I've been trying to remove the easiest of the terminal-actions that exploit code uses (i.e. getting to execve, or performing other system calls, etc).

I recognize we can never completely remove all mechanisms they use. However, I hope I am forcing attack coders into using increasingly more complicated methods. Same time, it means fewer methods are available. Other methods make exploitation more fragile. This is pushing success rates into "low-percent statistical" success. If we teach more software stacks to "fail hard, don't try to recover", that is an improvement in security.

I already made it difficult to call execve() directly in a few ways. The kernel must be entered via the exact syscall instruction, inside the libc syscall stub. Immediately before that syscall instruction, the SYS_execve number is loaded into a register. On some architectures, the PLT-reachable stub performs a retguard check, which can be triggered by a few methods. Stack pivots are also mostly prevented because of other checks. It is not possible to enter via the SYS_syscall (syscall register = 0) case either.

Attack code can try to perform other system calls, to create filesystem damage or network communication. They could still load other syscall numbers and jump to a found syscall instruction, if they are able to cheat the retguard epilogue (It is a bit unfortunate that libc syscall stubs tend to use the same save register, but at least the compare offset is chosen randomly at compile time). Or, they could know where all the system calls are from a pre-read libc, which requires them to be on the machine before performing an online or offline attack (libc is randomly relinked, but still readable in the filesystem). It's difficult to discover code-locations online only, because most architectures also have xonly code now. Some methods can use PLT entries (which also vary based upon random relink), but I've not seen much methodology using PLT entry + offset.

Anyways, every one of these things I mention, and the ones I don't mention, tend to be more difficult than the previous methods. I'm trying to remove simple methods, and force attackers into more and more complex methods. I promise that I will circle back and damage the more complex methods in the future.

So in this next step, I'm going to take away the ability to perform syscall #0 (SYS_syscall), with the first argument being the real system call.

This library interface, and all the pieces below it, will be going away:

    https://man.openbsd.org/syscall.2

There's going to be some fallout which takes time to fix, especially in the "go" ecosystem.

Snapshots for some architectures now contain kernel diffs which reject syscall(2). The symbol still remains in libc.

I'm including a piece of this diff.

Index: sys/arch/alpha/alpha/trap.c
===================================================================
RCS file: /cvs/src/sys/arch/alpha/alpha/trap.c,v
diff -u -p -u -r1.108 trap.c
--- sys/arch/alpha/alpha/trap.c	8 Mar 2023 04:43:07 -0000	1.108
+++ sys/arch/alpha/alpha/trap.c	27 Oct 2023 03:26:49 -0000
@@ -497,17 +497,15 @@ dopanic:
  * a3, and v0 from the frame before returning to the user process.
  */
 void
-syscall(code, framep)
-	u_int64_t code;
-	struct trapframe *framep;
+syscall(u_int64_t code, struct trapframe *framep)
 {
-	const struct sysent *callp;
+	const struct sysent *callp = sysent;
 	struct proc *p;
-	int error, indirect = -1;
+	int error;
 	u_int64_t opc;
 	u_long rval[2];
 	u_long args[10];	/* XXX */
-	u_int hidden, nargs;
+	u_int nargs;

 	atomic_add_int(&uvmexp.syscalls, 1);
 	p = curproc;
@@ -515,24 +513,11 @@ syscall(code, framep)
 	framep->tf_regs[FRAME_SP] = alpha_pal_rdusp();
 	opc = framep->tf_regs[FRAME_PC] - 4;

-	switch(code) {
-	case SYS_syscall:
-		indirect = code;
-		code = framep->tf_regs[FRAME_A0];
-		hidden = 1;
-		break;
-	default:
-		hidden = 0;
-	}
-
-	error = 0;
-	callp = sysent;
-	if (code >= SYS_MAXSYSCALL)
-		callp += SYS_syscall;
-	else
-		callp += code;
+	if (code <= 0 || code >= SYS_MAXSYSCALL)
+		goto bad;
+	callp += code;

-	nargs = callp->sy_narg + hidden;
+	nargs = callp->sy_narg;
 	switch (nargs) {
 	default:
 		if (nargs > 10)		/* XXX */
@@ -559,7 +544,7 @@ syscall(code, framep)
 	rval[0] = 0;
 	rval[1] = 0;

-	error = mi_syscall(p, code, indirect, callp, args + hidden, rval);
+	error = mi_syscall(p, code, callp, args, rval);

 	switch (error) {
 	case 0:
Index: sys/arch/amd64/amd64/locore.S
===================================================================
RCS file: /cvs/src/sys/arch/amd64/amd64/locore.S,v
diff -u -p -u -r1.141 locore.S
--- sys/arch/amd64/amd64/locore.S	24 Oct 2023 13:20:09 -0000	1.141
+++ sys/arch/amd64/amd64/locore.S	27 Oct 2023 03:26:49 -0000
@@ -508,6 +508,7 @@ ENTRY(savectx)
 	lfence
 END(savectx)

+// XXX this should not behave like a nop
 IDTVEC(syscall32)
 	sysret		/* go away please */
 END(Xsyscall32)
Index: sys/arch/amd64/amd64/trap.c
===================================================================
RCS file: /cvs/src/sys/arch/amd64/amd64/trap.c,v
diff -u -p -u -r1.101 trap.c
--- sys/arch/amd64/amd64/trap.c	5 Jul 2023 12:58:55 -0000	1.101
+++ sys/arch/amd64/amd64/trap.c	27 Oct 2023 03:26:49 -0000
@@ -553,7 +553,7 @@ syscall(struct trapframe *frame)
 	caddr_t params;
 	const struct sysent *callp;
 	struct proc *p;
-	int error, indirect = -1;
+	int error = ENOSYS;
 	size_t argsize, argoff;
 	register_t code, args[9], rval[2], *argp;

@@ -570,26 +570,9 @@ syscall(struct trapframe *frame)
 	argp = &args[0];
 	argoff = 0;

-	switch (code) {
-	case SYS_syscall:
-		/*
-		 * Code is first argument, followed by actual args.
-		 */
-		indirect = code;
-		code = frame->tf_rdi;
-		argp = &args[1];
-		argoff = 1;
-		break;
-	default:
-		break;
-	}
-
-	callp = sysent;
-	if (code < 0 || code >= SYS_MAXSYSCALL)
-		callp += SYS_syscall;
-	else
-		callp += code;
-
+	if (code <= 0 || code >= SYS_MAXSYSCALL)
+		goto bad;
+	callp = sysent + code;
 	argsize = (callp->sy_argsize >> 3) + argoff;
 	if (argsize) {
 		switch (MIN(argsize, 6)) {
@@ -620,7 +603,7 @@ syscall(struct trapframe *frame)
 	rval[0] = 0;
 	rval[1] = 0;

-	error = mi_syscall(p, code, indirect, callp, argp, rval);
+	error = mi_syscall(p, code, callp, argp, rval);

 	switch (error) {
 	case 0:
Index: sys/arch/arm/arm/syscall.c
===================================================================
RCS file: /cvs/src/sys/arch/arm/arm/syscall.c,v
diff -u -p -u -r1.26 syscall.c
--- sys/arch/arm/arm/syscall.c	11 Feb 2023 23:07:26 -0000	1.26
+++ sys/arch/arm/arm/syscall.c	27 Oct 2023 03:26:49 -0000
@@ -93,8 +93,8 @@ void
 swi_handler(trapframe_t *frame)
 {
 	struct proc *p = curproc;
-	const struct sysent *callp;
-	int code, error, indirect = -1;
+	const struct sysent *callp = sysent;
+	int code, error;
 	u_int nap = 4, nargs;
 	register_t *ap, *args, copyargs[MAXARGS], rval[2];

@@ -103,32 +103,19 @@ swi_handler(trapframe_t *frame)
 	/* Before enabling interrupts, save FPU state */
 	vfp_save();

-	/* Re-enable interrupts if they were enabled previously */
-	if (__predict_true((frame->tf_spsr & PSR_I) == 0))
-		enable_interrupts(PSR_I);
+	enable_interrupts(PSR_I);

 	p->p_addr->u_pcb.pcb_tf = frame;

 	/* Skip over speculation-blocking barrier. */
 	frame->tf_pc += 8;

-	code = frame->tf_r12;
-
 	ap = &frame->tf_r0;

-	switch (code) {
-	case SYS_syscall:
-		indirect = code;
-		code = *ap++;
-		nap--;
-		break;
-	}
-
-	callp = sysent;
-	if (code < 0 || code >= SYS_MAXSYSCALL)
-		callp += SYS_syscall;
-	else
-		callp += code;
+	code = frame->tf_r12;
+	if (code <= 0 || code >= SYS_MAXSYSCALL)
+		goto bad;
+	callp += code;

 	nargs = callp->sy_argsize / sizeof(register_t);
 	if (nargs <= nap) {
@@ -145,27 +132,23 @@ swi_handler(trapframe_t *frame)
 	rval[0] = 0;
 	rval[1] = frame->tf_r1;

-	error = mi_syscall(p, code, indirect, callp, args, rval);
+	error = mi_syscall(p, code, callp, args, rval);

 	switch (error) {
 	case 0:
 		frame->tf_r0 = rval[0];
 		frame->tf_r1 = rval[1];
-
 		frame->tf_spsr &= ~PSR_C;	/* carry bit */
 		break;
-
 	case ERESTART:
 		/*
 		 * Reconstruct the pc to point at the swi.
 		 */
 		frame->tf_pc -= 12;
 		break;
-
 	case EJUSTRETURN:
 		/* nothing to do */
 		break;
-
 	default:
 	bad:
 		frame->tf_r0 = error;
Index: sys/arch/arm64/arm64/syscall.c
===================================================================
RCS file: /cvs/src/sys/arch/arm64/arm64/syscall.c,v
diff -u -p -u -r1.14 syscall.c
--- sys/arch/arm64/arm64/syscall.c	13 Apr 2023 02:19:04 -0000	1.14
+++ sys/arch/arm64/arm64/syscall.c	27 Oct 2023 03:26:49 -0000
@@ -33,7 +33,7 @@ svc_handler(trapframe_t *frame)
 {
 	struct proc *p = curproc;
 	const struct sysent *callp;
-	int code, error, indirect = -1;
+	int code, error = ENOSYS;
 	u_int nap = 8, nargs;
 	register_t *ap, *args, copyargs[MAXARGS], rval[2];

@@ -50,19 +50,9 @@ svc_handler(trapframe_t *frame)

 	ap = &frame->tf_x[0];

-	switch (code) {
-	case SYS_syscall:
-		indirect = code;
-		code = *ap++;
-		nap--;
-		break;
-	}
-
-	callp = sysent;
-	if (code < 0 || code >= SYS_MAXSYSCALL)
-		callp += SYS_syscall;
-	else
-		callp += code;
+	if (code <= 0 || code >= SYS_MAXSYSCALL)
+		goto bad;
+	callp = sysent + code;

 	nargs = callp->sy_argsize / sizeof(register_t);
 	if (nargs <= nap) {
@@ -79,25 +69,22 @@ svc_handler(trapframe_t *frame)
 	rval[0] = 0;
 	rval[1] = 0;

-	error = mi_syscall(p, code, indirect, callp, args, rval);
+	error = mi_syscall(p, code, callp, args, rval);

 	switch (error) {
 	case 0:
 		frame->tf_x[0] = rval[0];
 		frame->tf_spsr &= ~PSR_C;	/* carry bit */
 		break;
-
 	case ERESTART:
 		/*
 		 * Reconstruct the pc to point at the svc.
 		 */
 		frame->tf_elr -= 12;
 		break;
-
 	case EJUSTRETURN:
 		/* nothing to do */
 		break;
-
 	default:
 	bad:
 		frame->tf_x[0] = error;
Index: sys/arch/hppa/hppa/trap.c
===================================================================
RCS file: /cvs/src/sys/arch/hppa/hppa/trap.c,v
diff -u -p -u -r1.161 trap.c
--- sys/arch/hppa/hppa/trap.c	11 Feb 2023 23:07:26 -0000	1.161
+++ sys/arch/hppa/hppa/trap.c	27 Oct 2023 03:26:49 -0000
@@ -764,8 +764,8 @@ void
 syscall(struct trapframe *frame)
 {
 	struct proc *p = curproc;
-	const struct sysent *callp;
-	int retq, code, argsize, argoff, error, indirect = -1;
+	const struct sysent *callp = sysent;
+	int code, argsize, argoff, error;
 	register_t args[8], rval[2];
 #ifdef DIAGNOSTIC
 	int oldcpl = curcpu()->ci_cpl;
@@ -778,29 +778,16 @@ syscall(struct trapframe *frame)

 	p->p_md.md_regs = frame;

-	argoff = 4; retq = 0;
-	switch (code = frame->tf_t1) {
-	case SYS_syscall:
-		indirect = code;
-		code = frame->tf_arg0;
-		args[0] = frame->tf_arg1;
-		args[1] = frame->tf_arg2;
-		args[2] = frame->tf_arg3;
-		argoff = 3;
-		break;
-	default:
-		args[0] = frame->tf_arg0;
-		args[1] = frame->tf_arg1;
-		args[2] = frame->tf_arg2;
-		args[3] = frame->tf_arg3;
-		break;
-	}
-
-	callp = sysent;
-	if (code < 0 || code >= SYS_MAXSYSCALL)
-		callp += SYS_syscall;
-	else
-		callp += code;
+	argoff = 4;
+	code = frame->tf_t1;
+	args[0] = frame->tf_arg0;
+	args[1] = frame->tf_arg1;
+	args[2] = frame->tf_arg2;
+	args[3] = frame->tf_arg3;
+
+	if (code <= 0 || code >= SYS_MAXSYSCALL)
+		goto bad;
+	callp += code;

 	if ((argsize = callp->sy_argsize)) {
 		register_t *s, *e, t;
@@ -830,7 +817,7 @@ syscall(struct trapframe *frame)
 		 */
 		i = 0;
 		switch (code) {
-		case SYS_lseek:		retq = 0;
+		case SYS_lseek:
 		case SYS_truncate:
 		case SYS_ftruncate:	i = 2;	break;
 		case SYS_preadv:
@@ -851,12 +838,12 @@ syscall(struct trapframe *frame)
 	rval[0] = 0;
 	rval[1] = frame->tf_ret1;

-	error = mi_syscall(p, code, indirect, callp, args, rval);
+	error = mi_syscall(p, code, callp, args, rval);

 	switch (error) {
 	case 0:
 		frame->tf_ret0 = rval[0];
-		frame->tf_ret1 = rval[!retq];
+		frame->tf_ret1 = rval[1];
 		frame->tf_t1 = 0;
 		break;
 	case ERESTART:
@@ -872,7 +859,7 @@ syscall(struct trapframe *frame)
 		break;
 	}

-	ast(p);
+	ast(p);		// XXX why?

 	mi_syscall_return(p, code, error, rval);

Index: sys/arch/i386/i386/trap.c
===================================================================
RCS file: /cvs/src/sys/arch/i386/i386/trap.c,v
diff -u -p -u -r1.162 trap.c
--- sys/arch/i386/i386/trap.c	16 Apr 2023 06:43:49 -0000	1.162
+++ sys/arch/i386/i386/trap.c	27 Oct 2023 03:26:49 -0000
@@ -516,9 +516,9 @@ void
 syscall(struct trapframe *frame)
 {
 	caddr_t params;
-	const struct sysent *callp;
-	struct proc *p;
-	int error, indirect = -1;
+	const struct sysent *callp = sysent;
+	struct proc *p = curproc;
+	int error;
 	register_t code, args[8], rval[2];
 #ifdef DIAGNOSTIC
 	int ocpl = lapic_tpr;
@@ -540,38 +540,22 @@ syscall(struct trapframe *frame)
 	}
 #endif

-	p = curproc;
 	p->p_md.md_regs = frame;
-	code = frame->tf_eax;
-
-	params = (caddr_t)frame->tf_esp + sizeof(int);

-	switch (code) {
-	case SYS_syscall:
-		/*
-		 * Code is first argument, followed by actual args.
-		 */
-		indirect = code;
-		copyin(params, &code, sizeof(int));
-		params += sizeof(int);
-		break;
-	default:
-		break;
-	}
+	code = frame->tf_eax;
+	if (code <= 0 || code >= SYS_MAXSYSCALL)
+		goto bad;
+	callp += code;

-	callp = sysent;
-	if (code < 0 || code >= SYS_MAXSYSCALL)
-		callp += SYS_syscall;
-	else
-		callp += code;
 	argsize = callp->sy_argsize;
+	params = (caddr_t)frame->tf_esp + sizeof(int);
 	if (argsize && (error = copyin(params, args, argsize)))
 		goto bad;

 	rval[0] = 0;
 	rval[1] = frame->tf_edx;

-	error = mi_syscall(p, code, indirect, callp, args, rval);
+	error = mi_syscall(p, code, callp, args, rval);

 	switch (error) {
 	case 0:
Index: sys/arch/m88k/m88k/trap.c
===================================================================
RCS file: /cvs/src/sys/arch/m88k/m88k/trap.c,v
diff -u -p -u -r1.128 trap.c
--- sys/arch/m88k/m88k/trap.c	2 Aug 2023 06:14:46 -0000	1.128
+++ sys/arch/m88k/m88k/trap.c	27 Oct 2023 03:26:49 -0000
@@ -1153,9 +1153,9 @@ void
 m88100_syscall(register_t code, struct trapframe *tf)
 {
 	int i, nap;
-	const struct sysent *callp;
+	const struct sysent *callp = sysent;
 	struct proc *p = curproc;
-	int error, indirect = -1;
+	int error;
 	register_t args[8] __aligned(8);
 	register_t rval[2] __aligned(8);
 	register_t *ap;
@@ -1172,19 +1172,9 @@ m88100_syscall(register_t code, struct t
 	ap = &tf->tf_r[2];
 	nap = 8; /* r2-r9 */

-	switch (code) {
-	case SYS_syscall:
-		indirect = code;
-		code = *ap++;
-		nap--;
-		break;
-	}
-
-	callp = sysent;
-	if (code < 0 || code >= SYS_MAXSYSCALL)
-		callp += SYS_syscall;
-	else
-		callp += code;
+	if (code <= 0 || code >= SYS_MAXSYSCALL)
+		goto bad;
+	callp += code;

 	i = callp->sy_argsize / sizeof(register_t);
 	if (i > sizeof(args) / sizeof(register_t))
@@ -1200,7 +1190,7 @@ m88100_syscall(register_t code, struct t
 	rval[0] = 0;
 	rval[1] = tf->tf_r[3];

-	error = mi_syscall(p, code, indirect, callp, args, rval);
+	error = mi_syscall(p, code, callp, args, rval);

 	/*
 	 * system call will look like:
@@ -1266,7 +1256,7 @@ void
 m88110_syscall(register_t code, struct trapframe *tf)
 {
 	int i, nap;
-	const struct sysent *callp;
+	const struct sysent *callp = sysent;
 	struct proc *p = curproc;
 	int error;
 	register_t args[8] __aligned(8);
@@ -1285,17 +1275,8 @@ m88110_syscall(register_t code, struct t
 	ap = &tf->tf_r[2];
 	nap = 8;	/* r2-r9 */

-	switch (code) {
-	case SYS_syscall:
-		code = *ap++;
-		nap--;
-		break;
-	}
-
-	callp = sysent;
-	if (code < 0 || code >= SYS_MAXSYSCALL)
-		callp += SYS_syscall;
-	else
+	// XXX out of range stays on syscall0, which we assume is enosys
+	if (code >= 0 || code <= SYS_MAXSYSCALL)
 		callp += code;

 	i = callp->sy_argsize / sizeof(register_t);
Index: sys/arch/mips64/mips64/trap.c
===================================================================
RCS file: /cvs/src/sys/arch/mips64/mips64/trap.c,v
diff -u -p -u -r1.167 trap.c
--- sys/arch/mips64/mips64/trap.c	26 Apr 2023 16:53:59 -0000	1.167
+++ sys/arch/mips64/mips64/trap.c	27 Oct 2023 03:26:49 -0000
@@ -396,14 +396,12 @@ fault_common_no_miss:
 	case T_SYSCALL+T_USER:
 	    {
 		struct trapframe *locr0 = p->p_md.md_regs;
-		const struct sysent *callp;
-		unsigned int code, indirect = -1;
+		const struct sysent *callp = sysent;
+		unsigned int code;
 		register_t tpc;
 		uint32_t branch = 0;
 		int error, numarg;
-		struct args {
-			register_t i[8];
-		} args;
+		register_t args[8];
 		register_t rval[2];

 		atomic_inc_int(&uvmexp.syscalls);
@@ -422,51 +420,22 @@ fault_common_no_miss:
 			    trapframe->pc, 0, branch);
 		} else
 			locr0->pc += 4;
-		callp = sysent;
 		code = locr0->v0;
-		switch (code) {
-		case SYS_syscall:
-			/*
-			 * Code is first argument, followed by actual args.
-			 */
-			indirect = code;
-			code = locr0->a0;
-			if (code >= SYS_MAXSYSCALL)
-				callp += SYS_syscall;
-			else
-				callp += code;
-			numarg = callp->sy_argsize / sizeof(register_t);
-			args.i[0] = locr0->a1;
-			args.i[1] = locr0->a2;
-			args.i[2] = locr0->a3;
-			if (numarg > 3) {
-				args.i[3] = locr0->a4;
-				args.i[4] = locr0->a5;
-				args.i[5] = locr0->a6;
-				args.i[6] = locr0->a7;
-				if (numarg > 7)
-					if ((error = copyin((void *)locr0->sp,
-					    &args.i[7], sizeof(register_t))))
-						goto bad;
-			}
-			break;
-		default:
-			if (code >= SYS_MAXSYSCALL)
-				callp += SYS_syscall;
-			else
-				callp += code;
-
-			numarg = callp->sy_narg;
-			args.i[0] = locr0->a0;
-			args.i[1] = locr0->a1;
-			args.i[2] = locr0->a2;
-			args.i[3] = locr0->a3;
-			if (numarg > 4) {
-				args.i[4] = locr0->a4;
-				args.i[5] = locr0->a5;
-				args.i[6] = locr0->a6;
-				args.i[7] = locr0->a7;
-			}
+
+		if (code <= 0 || code >= SYS_MAXSYSCALL)
+			goto bad;
+		callp += code;
+
+		numarg = callp->sy_narg;
+		args[0] = locr0->a0;
+		args[1] = locr0->a1;
+		args[2] = locr0->a2;
+		args[3] = locr0->a3;
+		if (numarg > 4) {
+			args[4] = locr0->a4;
+			args[5] = locr0->a5;
+			args[6] = locr0->a6;
+			args[7] = locr0->a7;
 		}

 		rval[0] = 0;
@@ -477,29 +446,24 @@ fault_common_no_miss:
 		    TRAPSIZE : trppos[ci->ci_cpuid]) - 1].code = code;
 #endif

-		error = mi_syscall(p, code, indirect, callp, args.i, rval);
+		error = mi_syscall(p, code, callp, args, rval);

 		switch (error) {
 		case 0:
 			locr0->v0 = rval[0];
 			locr0->a3 = 0;
 			break;
-
 		case ERESTART:
 			locr0->pc = tpc;
 			break;
-
 		case EJUSTRETURN:
 			break;	/* nothing to do */
-
 		default:
-
 		bad:
 			locr0->v0 = error;
 			locr0->a3 = 1;
 		}

 		mi_syscall_return(p, code, error, rval);
-
 		return;
 	    }
Index: sys/arch/powerpc/powerpc/trap.c
===================================================================
RCS file: /cvs/src/sys/arch/powerpc/powerpc/trap.c,v
diff -u -p -u -r1.131 trap.c
--- sys/arch/powerpc/powerpc/trap.c	11 Feb 2023 23:07:27 -0000	1.131
+++ sys/arch/powerpc/powerpc/trap.c	27 Oct 2023 03:26:49 -0000
@@ -239,11 +239,11 @@ trap(struct trapframe *frame)
 	struct vm_map *map;
 	vaddr_t va;
 	int access_type;
-	const struct sysent *callp;
+	const struct sysent *callp = sysent;
 	size_t argsize;
 	register_t code, error;
 	register_t *params, rval[2], args[10];
-	int n, indirect = -1;
+	int n;

 	if (frame->srr1 & PSL_PR) {
 		type |= EXC_USER;
@@ -360,27 +360,13 @@ trap(struct trapframe *frame)
 	case EXC_SC|EXC_USER:
 		uvmexp.syscalls++;

-		code = frame->fixreg[0];
 		params = frame->fixreg + FIRSTARG;

-		switch (code) {
-		case SYS_syscall:
-			/*
-			 * code is first argument,
-			 * followed by actual args.
-			 */
-			indirect = code;
-			code = *params++;
-			break;
-		default:
-			break;
-		}
+		code = frame->fixreg[0];
+		if (code <= 0 || code >= SYS_MAXSYSCALL)
+			goto bad;
+		callp += code;

-		callp = sysent;
-		if (code < 0 || code >= SYS_MAXSYSCALL)
-			callp += SYS_syscall;
-		else
-			callp += code;
 		argsize = callp->sy_argsize;
 		n = NARGREG - (params - (frame->fixreg + FIRSTARG));
 		if (argsize > n * sizeof(register_t)) {
@@ -395,7 +381,7 @@ trap(struct trapframe *frame)
 		rval[0] = 0;
 		rval[1] = frame->fixreg[FIRSTARG + 1];

-		error = mi_syscall(p, code, indirect, callp, params, rval);
+		error = mi_syscall(p, code, callp, params, rval);

 		switch (error) {
 		case 0:
Index: sys/arch/powerpc64/powerpc64/syscall.c
===================================================================
RCS file: /cvs/src/sys/arch/powerpc64/powerpc64/syscall.c,v
diff -u -p -u -r1.11 syscall.c
--- sys/arch/powerpc64/powerpc64/syscall.c	11 Feb 2023 23:07:27 -0000	1.11
+++ sys/arch/powerpc64/powerpc64/syscall.c	27 Oct 2023 03:26:49 -0000
@@ -30,27 +30,17 @@ void
 syscall(struct trapframe *frame)
 {
 	struct proc *p = curproc;
-	const struct sysent *callp;
-	int code, error, indirect = -1;
+	const struct sysent *callp = sysent;
+	int code, error;
 	int nap = 8, nargs;
 	register_t *ap, *args, copyargs[MAXARGS], rval[2];

-	code = frame->fixreg[0];
 	ap = &frame->fixreg[3];
+	code = frame->fixreg[0];
+	if (code <= 0 || code >= SYS_MAXSYSCALL)
+		goto bad;
+	callp += code;

-	switch (code) {
-	case SYS_syscall:
-		indirect = code;
-		code = *ap++;
-		nap--;
-		break;
-	}
-
-	callp = sysent;
-	if (code < 0 || code >= SYS_MAXSYSCALL)
-		callp += SYS_syscall;
-	else
-		callp += code;
 	nargs = callp->sy_argsize / sizeof(register_t);
 	if (nargs <= nap) {
 		args = ap;
@@ -66,7 +56,7 @@ syscall(struct trapframe *frame)
 	rval[0] = 0;
 	rval[1] = 0;

-	error = mi_syscall(p, code, indirect, callp, args, rval);
+	error = mi_syscall(p, code, callp, args, rval);

 	switch (error) {
 	case 0:
@@ -74,15 +64,12 @@ syscall(struct trapframe *frame)
 		frame->fixreg[3] = rval[0];
 		frame->cr &= ~0x10000000;
 		break;
-
 	case ERESTART:
 		frame->srr0 -= 4;
 		break;
-
 	case EJUSTRETURN:
 		/* nothing to do */
 		break;
-
 	default:
 	bad:
 		frame->fixreg[0] = error;
Index: sys/arch/riscv64/riscv64/syscall.c
===================================================================
RCS file: /cvs/src/sys/arch/riscv64/riscv64/syscall.c,v
diff -u -p -u -r1.16 syscall.c
--- sys/arch/riscv64/riscv64/syscall.c	13 Apr 2023 02:19:05 -0000	1.16
+++ sys/arch/riscv64/riscv64/syscall.c	27 Oct 2023 03:26:49 -0000
@@ -39,33 +39,20 @@ void
 svc_handler(trapframe_t *frame)
 {
 	struct proc *p = curproc;
-	const struct sysent *callp;
-	int code, error, indirect = -1;
+	const struct sysent *callp = sysent;
+	int code, error;
 	u_int nap = 8, nargs;
 	register_t *ap, *args, copyargs[MAXARGS], rval[2];

 	uvmexp.syscalls++;

-	/* Re-enable interrupts if they were enabled previously */
-	if (__predict_true(frame->tf_scause & EXCP_INTR))
-		intr_enable();
-
 	ap = &frame->tf_a[0];
 	code = frame->tf_t[0];

-	switch (code) {
-	case SYS_syscall:
-		indirect = code;
-		code = *ap++;
-		nap--;
-		break;
-	}
+	if (code <= 0 || code >= SYS_MAXSYSCALL)
+		goto bad;
+	callp += code;

-	callp = sysent;
-	if (code < 0 || code >= SYS_MAXSYSCALL)
-		callp += SYS_syscall;
-	else
-		callp += code;
 	nargs = callp->sy_argsize / sizeof(register_t);
 	if (nargs <= nap) {
 		args = ap;
@@ -81,21 +68,18 @@ svc_handler(trapframe_t *frame)
 	rval[0] = 0;
 	rval[1] = 0;

-	error = mi_syscall(p, code, indirect, callp, args, rval);
+	error = mi_syscall(p, code, callp, args, rval);

 	switch (error) {
 	case 0:
 		frame->tf_a[0] = rval[0];
 		frame->tf_t[0] = 0;		/* syscall succeeded */
 		break;
-
 	case ERESTART:
 		frame->tf_sepc -= 4;		/* prev instruction */
 		break;
-
 	case EJUSTRETURN:
 		break;
-
 	default:
 	bad:
 		frame->tf_a[0] = error;
Index: sys/arch/sh/sh/trap.c
===================================================================
RCS file: /cvs/src/sys/arch/sh/sh/trap.c,v
diff -u -p -u -r1.54 trap.c
--- sys/arch/sh/sh/trap.c	11 Feb 2023 23:07:27 -0000	1.54
+++ sys/arch/sh/sh/trap.c	27 Oct 2023 03:26:49 -0000
@@ -516,44 +516,20 @@ syscall(struct proc *p, struct trapframe
 {
 	caddr_t params;
 	const struct sysent *callp;
-	int error, opc, indirect = -1;
-	int argoff, argsize;
+	int error, opc;
+	int argsize;
 	register_t code, args[8], rval[2];

 	uvmexp.syscalls++;

 	opc = tf->tf_spc;
 	code = tf->tf_r0;
-
 	params = (caddr_t)tf->tf_r15;

-	switch (code) {
-	case SYS_syscall:
-		/*
-		 * Code is first argument, followed by actual args.
-		 */
-		indirect = code;
-		code = tf->tf_r4;
-		argoff = 1;
-		break;
-	default:
-		argoff = 0;
-		break;
-	}
-
-	callp = sysent;
-	if (code < 0 || code >= SYS_MAXSYSCALL)
-		callp += SYS_syscall;
-	else
-		callp += code;
+	if (code <= 0 || code >= SYS_MAXSYSCALL)
+		goto bad;
+	callp = sysent + code;
 	argsize = callp->sy_argsize;
-#ifdef DIAGNOSTIC
-	if (argsize > sizeof args) {
-		callp += SYS_syscall - code;
-		argsize = callp->sy_argsize;
-	}
-#endif
-
 	if (argsize) {
 		register_t *ap;
 		int off_t_arg;
@@ -570,19 +546,16 @@ syscall(struct proc *p, struct trapframe
 		}

 		ap = args;
-		switch (argoff) {
-		case 0:	*ap++ = tf->tf_r4; argsize -= sizeof(int);
-		case 1:	*ap++ = tf->tf_r5; argsize -= sizeof(int);
-		case 2:	*ap++ = tf->tf_r6; argsize -= sizeof(int);
-			/*
-			 * off_t args aren't split between register
-			 * and stack, but rather r7 is skipped and
-			 * the entire off_t is on the stack.
-			 */
-			if (argoff + off_t_arg == 3)
-				break;
+		*ap++ = tf->tf_r4; argsize -= sizeof(int);
+		*ap++ = tf->tf_r5; argsize -= sizeof(int);
+		*ap++ = tf->tf_r6; argsize -= sizeof(int);
+		/*
+		 * off_t args aren't split between register
+		 * and stack, but rather r7 is skipped and
+		 * the entire off_t is on the stack.
+		 */
+		if (off_t_arg != 3) {
 			*ap++ = tf->tf_r7; argsize -= sizeof(int);
-			break;
 		}

 		if (argsize > 0) {
@@ -594,7 +567,7 @@ syscall(struct proc *p, struct trapframe
 	rval[0] = 0;
 	rval[1] = tf->tf_r1;

-	error = mi_syscall(p, code, indirect, callp, args, rval);
+	error = mi_syscall(p, code, callp, args, rval);

 	switch (error) {
 	case 0:
Index: sys/arch/sparc64/sparc64/trap.c
===================================================================
RCS file: /cvs/src/sys/arch/sparc64/sparc64/trap.c,v
diff -u -p -u -r1.115 trap.c
--- sys/arch/sparc64/sparc64/trap.c	11 Feb 2023 23:07:28 -0000	1.115
+++ sys/arch/sparc64/sparc64/trap.c	27 Oct 2023 03:26:49 -0000
@@ -1109,9 +1109,10 @@ syscall(struct trapframe *tf, register_t
 	int64_t *ap;
 	const struct sysent *callp;
 	struct proc *p = curproc;
-	int error, new, indirect = -1;
+	int error = ENOSYS, new;
 	register_t args[8];
 	register_t rval[2];
+	register_t *argp;

 	if ((tf->tf_out[6] & 1) == 0)
 		sigexit(p, SIGILL);
@@ -1137,44 +1138,31 @@ syscall(struct trapframe *tf, register_t
 	ap = &tf->tf_out[0];
 	nap = 6;

-	switch (code) {
-	case SYS_syscall:
-		indirect = code;
-		code = *ap++;
-		nap--;
-		break;
-	}
-
-	callp = sysent;
-	if (code < 0 || code >= SYS_MAXSYSCALL)
-		callp += SYS_syscall;
-	else {
-		register_t *argp;
-
-		callp += code;
-		i = callp->sy_narg; /* Why divide? */
-		if (i > nap) {	/* usually false */
-			if (i > 8)
-				panic("syscall nargs");
-			/* Read the whole block in */
-			if ((error = copyin((caddr_t)tf->tf_out[6]
-			    + BIAS + offsetof(struct frame, fr_argx),
-			    &args[nap], (i - nap) * sizeof(register_t))))
-				goto bad;
-			i = nap;
-		}
-		/*
-		 * It should be faster to do <= 6 longword copies than
-		 * to call bcopy
-		 */
-		for (argp = args; i--;)
-			*argp++ = *ap++;
+	if (code <= 0 || code >= SYS_MAXSYSCALL)
+		goto bad;
+	callp = sysent + code;
+	i = callp->sy_narg; /* Why divide? */
+	if (i > nap) {	/* usually false */
+		if (i > 8)
+			panic("syscall nargs");
+		/* Read the whole block in */
+		if ((error = copyin((caddr_t)tf->tf_out[6]
+		    + BIAS + offsetof(struct frame, fr_argx),
+		    &args[nap], (i - nap) * sizeof(register_t))))
+			goto bad;
+		i = nap;
 	}
+	/*
+	 * It should be faster to do <= 6 longword copies than
+	 * to call bcopy
+	 */
+	for (argp = args; i--;)
+		*argp++ = *ap++;

 	rval[0] = 0;
 	rval[1] = 0;

-	error = mi_syscall(p, code, indirect, callp, args, rval);
+	error = mi_syscall(p, code, callp, args, rval);

 	switch (error) {
 		vaddr_t dest;
Index: sys/kern/kern_ktrace.c
===================================================================
RCS file: /cvs/src/sys/kern/kern_ktrace.c,v
diff -u -p -u -r1.112 kern_ktrace.c
--- sys/kern/kern_ktrace.c	11 May 2023 09:51:33 -0000	1.112
+++ sys/kern/kern_ktrace.c	27 Oct 2023 03:26:49 -0000
@@ -160,7 +160,7 @@ ktrsyscall(struct proc *p, register_t co
 	u_int nargs = 0;
 	int i;

-	if ((code & KTRC_CODE_MASK) == SYS_sysctl) {
+	if (code == SYS_sysctl) {
 		/*
 		 * The sysctl encoding stores the mib[]
 		 * array because it is interesting.
Index: sys/sys/ktrace.h
===================================================================
RCS file: /cvs/src/sys/sys/ktrace.h,v
diff -u -p -u -r1.46 ktrace.h
--- sys/sys/ktrace.h	23 Feb 2023 01:33:20 -0000	1.46
+++ sys/sys/ktrace.h	27 Oct 2023 03:26:49 -0000
@@ -76,8 +76,6 @@ struct ktr_header {
 #define KTR_SYSCALL	1
 struct ktr_syscall {
 	int	ktr_code;		/* syscall number */
-#define KTRC_CODE_MASK			0x0000ffff
-#define KTRC_CODE_SYSCALL		0x20000000
 	int	ktr_argsize;		/* size of arguments */
 	/*
 	 * followed by ktr_argsize/sizeof(register_t) "register_t"s
Index: sys/sys/syscall_mi.h
===================================================================
RCS file: /cvs/src/sys/sys/syscall_mi.h,v
diff -u -p -u -r1.28 syscall_mi.h
--- sys/sys/syscall_mi.h	11 Feb 2023 23:07:23 -0000	1.28
+++ sys/sys/syscall_mi.h	27 Oct 2023 03:26:49 -0000
@@ -51,8 +51,8 @@
  * The MD setup for a system call has been done; here's the MI part.
  */
 static inline int
-mi_syscall(struct proc *p, register_t code, int indirect,
-    const struct sysent *callp, register_t *argp, register_t retval[2])
+mi_syscall(struct proc *p, register_t code, const struct sysent *callp,
+    register_t *argp, register_t retval[2])
 {
 	uint64_t tval;
 	int lock = !(callp->sy_flags & SY_NOLOCK);
@@ -73,15 +73,8 @@ mi_syscall(struct proc *p, register_t co
 #ifdef KTRACE
 	if (KTRPOINT(p, KTR_SYSCALL)) {
 		/* convert to mask, then include with code */
-		switch (indirect) {
-		case SYS_syscall:
-			indirect = KTRC_CODE_SYSCALL;
-			break;
-		default:
-			indirect = 0;
-		}
 		KERNEL_LOCK();
-		ktrsyscall(p, code | indirect, callp->sy_argsize, argp);
+		ktrsyscall(p, code, callp->sy_argsize, argp);
 		KERNEL_UNLOCK();
 	}
 #endif
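The recurring change in the diff above is a range check that refuses syscall number 0 and anything out of range before the table is indexed. A minimal user-space sketch of that pattern follows. It is only an illustration: the table, the handler functions, SKETCH_MAXSYSCALL and dispatch_sketch() are hypothetical stand-ins, not the kernel's real sysent definitions.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for a syscall handler; not the kernel's real type. */
typedef int (*sketch_call_t)(long arg);

struct sysent_sketch {
	sketch_call_t sy_call;
};

static int sketch_double(long arg) { return (int)(arg * 2); }
static int sketch_negate(long arg) { return (int)-arg; }

#define SKETCH_MAXSYSCALL 3

/* Slot 0 plays the role of the old indirect SYS_syscall entry. */
static const struct sysent_sketch sysent_sketch[SKETCH_MAXSYSCALL] = {
	{ NULL },
	{ sketch_double },
	{ sketch_negate },
};

/*
 * Mirrors the shape of the new check in the diff:
 *	if (code <= 0 || code >= SYS_MAXSYSCALL) goto bad;
 * Number 0 and out-of-range numbers fail with ENOSYS and the table
 * is never indexed for them.
 */
int
dispatch_sketch(long code, long arg, int *rval)
{
	if (code <= 0 || code >= SKETCH_MAXSYSCALL)
		return ENOSYS;
	*rval = sysent_sketch[code].sy_call(arg);
	return 0;
}
```

Calling dispatch_sketch(0, ...) in this sketch fails the same way an out-of-range number does, which is the behaviour the patch gives the old indirect entry.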
Posted Oct 27, 2023 15:55 UTC (Fri)
by iustin (subscriber, #102433)
[Link] (21 responses)
Oooh, wait, this was an _indirect_ syscall. Yes, I can definitely see that being a vector for abuse. I wonder what the use cases for it were…
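For readers who have not met the interface, here is a minimal sketch of what an indirect call through syscall(2) looks like next to the ordinary libc wrapper. It assumes a Linux/glibc-style system where syscall() and SYS_getpid are still available; the two function names are made up for the illustration, and on an OpenBSD system with this change applied the indirect form is the part that goes away.

```c
#define _GNU_SOURCE
#include <sys/syscall.h>	/* SYS_getpid */
#include <sys/types.h>		/* pid_t */
#include <unistd.h>		/* syscall(), getpid() */

/* Indirect form: the syscall number is just the first argument. */
pid_t
pid_via_indirect(void)
{
	return (pid_t)syscall(SYS_getpid);
}

/* Direct form: the dedicated libc stub for this one system call. */
pid_t
pid_via_wrapper(void)
{
	return getpid();
}
```

Both functions return the same process ID; the difference is that the indirect form lets the caller pick any syscall number at run time, which is the property the email objects to.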
Posted Oct 27, 2023 16:10 UTC (Fri)
by corbet (editor, #1)
[Link] (2 responses)
Posted Oct 27, 2023 17:16 UTC (Fri)
by rahulsundaram (subscriber, #21946)
[Link]
Yes. In Go, if you want to get even the owner/group info of a file, you have to add OS-specific conditions and use the syscall interface.
Posted Oct 27, 2023 21:35 UTC (Fri)
by jrtc27 (subscriber, #107748)
[Link]
Posted Oct 27, 2023 17:10 UTC (Fri)
by farnz (subscriber, #17727)
[Link] (17 responses)
One of the hardening tricks OpenBSD is working on is control flow analysis at syscall entry; if you enter the kernel from the "wrong" place, then clearly something fishy is happening, and you should be stopped.
As a side-effect, this breaks making direct syscalls, because you'll be making those syscalls from the "wrong" place.
Posted Oct 27, 2023 17:48 UTC (Fri)
by gutschke (subscriber, #27910)
[Link] (16 responses)
I realize that OpenBSD has a lot more control over its ecosystem and can thus make these types of changes to the official API. But if Linux ever contemplated making similar choices, there would have to be some way to register syscall call sites that live outside of libc. Expect a lot of existing applications to break until patched.
Posted Oct 27, 2023 18:19 UTC (Fri)
by NYKevin (subscriber, #129325)
[Link] (10 responses)
Posted Oct 28, 2023 5:31 UTC (Sat)
by wtarreau (subscriber, #51152)
[Link] (9 responses)
Posted Oct 28, 2023 19:04 UTC (Sat)
by khim (subscriber, #9252)
[Link] (3 responses)
Windows also doesn't provide one-canonical-libc. But it still doesn't support stable syscalls.
Posted Oct 29, 2023 5:13 UTC (Sun)
by NYKevin (subscriber, #129325)
[Link] (2 responses)
Posted Oct 30, 2023 14:35 UTC (Mon)
by zwol (guest, #126152)
[Link] (1 responses)
Yeah, the built-in Windows component that performs "the C library"'s function of communicating directly with the kernel is NTDLL.DLL. It's instructive to read up on what-all NTDLL.DLL provides, because it started out, back in the days of NT 4.0, as a "just the system calls" shim under KERNEL32.DLL, like someone was saying we should have over in Unix land. But look at the actual file "c:\windows\system32\ntdll.dll" in a current-generation Windows. It's bigger than libc.so.6. If you dump out its symbol table, it provides a whole lot of functionality besides the system calls.

This is for practical engineering reasons. NTDLL.DLL is the only piece of code that is guaranteed to be loaded into every user space process, and that means it's the logical place to put functionality that every process needs, no matter what — such as the dynamic loader. Well, it turns out that a dynamic loader is a big Twinkie with a lot of dependencies, because it needs a heap allocator, it needs to be able to report errors, and it needs to be tied so intimately into the thread library that it's easiest to make it be the thread library. All by themselves those things add up to most of a conventional "language runtime". And then once you have that, inevitably it suffers feature creep.

I have occasionally thought about what a minimal "just the syscalls" shared library, that all the post-C languages could agree on, would look like, and, well, I think the only plausible way for it to be truly "minimal" and "language-agnostic" is if we somehow move the dynamic loader out of process. That would be worth doing for various security-related reasons as well, but it needs a bunch of new kernel API (so it can manipulate an arbitrary process's memory map without becoming that process's debugger), and it might be unacceptably slow, and it adds, um, challenges to system boot.
(It might be interesting to think about a different split, though, in which we divide all of ISO C + POSIX functionality into "everyone needs this" and "only programs written in C want this", accepting that much of the "everyone needs" pile will retain a certain C flavor to it. I don't know if it would be worth doing, I would have to take the time to make the lists.)
Posted Nov 1, 2023 17:32 UTC (Wed)
by simcop2387 (subscriber, #101710)
[Link]
Posted Oct 28, 2023 21:21 UTC (Sat)
by Wol (subscriber, #4433)
[Link] (4 responses)
Then patch glibc's makefile so that "if linux then include linux-libc"?
Cheers,
Posted Oct 30, 2023 14:36 UTC (Mon)
by zwol (guest, #126152)
[Link] (3 responses)
Posted Oct 30, 2023 14:59 UTC (Mon)
by mathstuf (subscriber, #69389)
[Link] (2 responses)
Posted Oct 30, 2023 19:35 UTC (Mon)
by zwol (guest, #126152)
[Link] (1 responses)
Posted Oct 30, 2023 19:41 UTC (Mon)
by mathstuf (subscriber, #69389)
[Link]
Posted Oct 27, 2023 18:33 UTC (Fri)
by smurf (subscriber, #17840)
[Link] (3 responses)
At minimum such restrictions must be enabled with a separate syscall, or some bits in the ELF header, or whatever.
Posted Oct 30, 2023 11:21 UTC (Mon)
by mgedmin (subscriber, #34497)
[Link] (2 responses)
Posted Oct 30, 2023 12:12 UTC (Mon)
by joib (subscriber, #8541)
[Link] (1 responses)
I vaguely recall one of the *BSD projects working on a BSD-licensed clone of git, but my google-fu is failing me and I can't find any information about that project. (FreeBSD seems to have switched to using git for their development, using the 'official' GPL-licensed git.)
Posted Oct 30, 2023 14:22 UTC (Mon)
by adamnew123456 (subscriber, #136057)
[Link]
Which underscores the GP's point even further - OpenBSD would sooner write its own Git than depend on the official one. Not that it's *totally* unjustified given their security goals, but that's a far boundary to draw for "the base OS".
Posted Oct 29, 2023 0:15 UTC (Sun)
by wahern (subscriber, #37304)
[Link]
OpenBSD added pinsyscall(2) for this (https://man.openbsd.org/pinsyscall), though it only permits a single call site, the initial pin is immutable, and it currently only knows about execve(2) (see https://github.com/openbsd/src/blob/47565b7/sys/uvm/uvm_m...).
Posted Oct 27, 2023 18:15 UTC (Fri)
by rywang014 (subscriber, #167182)
[Link] (10 responses)
Posted Oct 27, 2023 18:24 UTC (Fri)
by NYKevin (subscriber, #129325)
[Link] (3 responses)
Posted Oct 27, 2023 18:37 UTC (Fri)
by farnz (subscriber, #17727)
[Link] (1 responses)
This is a syscall, now called syscall, that used to be called indir back in V4 UNIX (and possibly earlier), and that OpenBSD inherited through its BSD legacy.
The idea is that on platforms where a syscall's parameters are stored inline after the syscall instruction (rather than being passed in registers, as on modern platforms), you can assemble a system call at runtime without having to write into the text segment or execute from the data segment - you call indir from the text segment, with parameters pointing at the place in the data segment where you've assembled your actual system call parameters.
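To make the indirection concrete, here is a toy user-space model in C. It is not kernel code and not the historical interface: the names `toy_call`, `toy_dispatch`, `TOY_INDIR`, `TOY_ADD` and `TOY_NEG` are all invented for illustration. Call number 0 plays the role of indir: its first argument is the address of a second call record, which can be filled in at run time in writable memory while the record that names indir stays fixed.

```c
#include <stdint.h>

/* Toy model only: every name here is hypothetical. */
enum { TOY_INDIR = 0, TOY_ADD = 1, TOY_NEG = 2 };

struct toy_call {
    int      number;   /* which "system call" to run */
    intptr_t args[2];  /* its parameters */
};

/* Dispatch one call record; TOY_INDIR forwards to a record stored elsewhere. */
static long toy_dispatch(const struct toy_call *c)
{
    switch (c->number) {
    case TOY_INDIR: {
        /* args[0] is the address of the real call, assembled at run time. */
        const struct toy_call *target = (const struct toy_call *)c->args[0];
        if (target == 0 || target->number == TOY_INDIR)
            return -1; /* refuse null targets and nested indirection */
        return toy_dispatch(target);
    }
    case TOY_ADD:
        return (long)(c->args[0] + c->args[1]);
    case TOY_NEG:
        return (long)(-c->args[0]);
    default:
        return -1; /* unknown call number */
    }
}
```

In this model a caller would keep a constant record `{ TOY_INDIR, { address of built, 0 } }` alongside the code, fill in `built` in the data segment, and dispatch the constant record. The same shape is what makes the real thing attractive to attackers: one fixed entry point can reach any call number.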
Posted Oct 28, 2023 20:30 UTC (Sat)
by dezgeg (subscriber, #92243)
[Link]
Posted Oct 27, 2023 18:38 UTC (Fri)
by smurf (subscriber, #17840)
[Link]
Posted Oct 27, 2023 18:24 UTC (Fri)
by lindi (subscriber, #53135)
[Link]
Posted Oct 27, 2023 18:34 UTC (Fri)
by adobriyan (subscriber, #30858)
[Link] (3 responses)
Posted Oct 27, 2023 20:37 UTC (Fri)
by Karellen (subscriber, #67644)
[Link] (2 responses)
Posted Oct 27, 2023 20:47 UTC (Fri)
by adobriyan (subscriber, #30858)
[Link] (1 responses)
Posted Oct 27, 2023 20:54 UTC (Fri)
by NYKevin (subscriber, #129325)
[Link]
Posted Oct 27, 2023 21:24 UTC (Fri)
by ibukanov (subscriber, #3942)
[Link]
Removing syscall() from OpenBSD
In Linux, syscall() is the way to gain access to system calls that do not yet have support in the C library. I believe that the Go runtime uses it as well for system-call access.
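As a minimal sketch of that Linux usage (assuming a Linux system with glibc or a compatible C library), the helper below reaches gettid through the generic `syscall()` entry point. The helper name `raw_gettid` is made up for this example; `syscall()` and `SYS_gettid` are the standard Linux interfaces. gettid is a convenient illustration because glibc shipped no wrapper for it for many years.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: invoke gettid by number via syscall(). Linux only. */
static pid_t raw_gettid(void)
{
    /* syscall() takes the call number followed by that call's arguments;
       gettid takes none and returns the calling thread's ID. */
    return (pid_t)syscall(SYS_gettid);
}
```

Called from a process's initial thread, `raw_gettid()` returns the same value as `getpid()`. This is exactly the kind of "load any syscall number and go" entry point that the OpenBSD change removes.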
Uses for syscall()
Wol
Game of Trees