
Removing syscall() from OpenBSD

For a view into the OpenBSD approach to security, see this message from Theo de Raadt, where he describes a plan to remove the syscall() system call (which allows the invocation of any available system call by providing its number) from the kernel. The purpose, of course, is to make it harder for an attacker to invoke an arbitrary system call, even if they are able to run some code on the target system.
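To make the interface concrete, here is a minimal sketch of what an indirect call through syscall() looks like from user space. It assumes a Linux/glibc system, where syscall() and SYS_getpid come from <unistd.h> and <sys/syscall.h>; the helper name indirect_getpid is made up for this illustration. On OpenBSD, this style of call is what the change described here is meant to take away.

```c
#define _GNU_SOURCE
#include <sys/syscall.h>
#include <unistd.h>

/*
 * Hypothetical helper: ask for getpid by number rather than by name.
 * The system call is chosen by the integer passed to syscall(), which
 * is exactly the property an attacker finds convenient.
 */
static long
indirect_getpid(void)
{
	return syscall(SYS_getpid);
}
```

Calling indirect_getpid() should return the same value as the ordinary getpid() wrapper, since both end up in the same system call.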

I hope I am forcing attack coders into using increasingly more complicated methods. Same time, it means fewer methods are available. Other methods make exploitation more fragile. This is pushing success rates into "low-percent statistical" success. If we teach more software stacks to "fail hard, don't try to recover", that is an improvement in security.


From:  Theo de Raadt <deraadt-AT-openbsd.org>
To:  tech-AT-cvs.openbsd.org
Subject:  Removing syscall(2) from libc and kernel
Date:  Fri, 27 Oct 2023 08:45:41 -0600
Message-ID:  <69276.1698417941@cvs.openbsd.org>

Piece by piece, I've been trying to remove the easiest of the
terminal-actions that exploit code uses (ie. getting to execve, or performing
other system calls, etc).

I recognize we can never completely remove all mechanisms they
use. However, I hope I am forcing attack coders into using increasingly
more complicated methods. Same time, it means fewer methods are
available.  Other methods make exploitation more fragile.  This is
pushing success rates into "low-percent statistical" success. If we
teach more software stacks to "fail hard, don't try to recover", that is
an improvement in security.

I already made it difficult to call execve() directly in a few ways.
The kernel must be entered via the exact syscall instruction, inside the
libc syscall stub.  Immediately before that syscall instruction, the
SYS_execve number is loaded into a register.  On some
architectures, the PLT-reachable stub performs a retguard check, which
can be triggered by a few methods.  Stack pivots are also mostly
prevented because of other checks.  It is not possible to enter via
the SYS_syscall (syscall register = 0) case either.
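
The entry-point restriction described above can be pictured with a toy model. This is only a sketch under stated assumptions, not the OpenBSD implementation: the structure sketch_proc, its execve_pin field, the constant SKETCH_SYS_execve and the function sketch_check_entry are all hypothetical names, and a real kernel may well kill the process rather than return an error.

```c
#include <errno.h>
#include <stdint.h>

#define SKETCH_SYS_execve 59	/* illustrative number only */

/* Hypothetical per-process record of the one permitted entry point. */
struct sketch_proc {
	uintptr_t execve_pin;	/* address of the execve syscall instruction */
};

/*
 * Toy check: execve is honoured only when the trap came from the
 * recorded address inside the libc stub.  Returns 0 when the entry is
 * acceptable and EPERM otherwise; other system calls are not checked
 * in this simplified model.
 */
static int
sketch_check_entry(const struct sketch_proc *p, long code, uintptr_t trap_pc)
{
	if (code == SKETCH_SYS_execve && trap_pc != p->execve_pin)
		return EPERM;
	return 0;
}
```

In this model, an attacker who jumps to some other syscall instruction with the execve number loaded fails the comparison, which is the property the paragraph above relies on.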

Attack code can try to perform other system calls, to create
filesystem damage or network communication.  They could still load other
syscall numbers and jump to a found syscall instruction, if they are
able to cheat the retguard epilogue (It is a bit unfortunate that libc
syscall stubs tend to use the same save register, but at least the
compare offset is chosen random at compile time).  Or, they could know
where all the system calls are from a pre-read libc, which requires them
to be on the machine before performing an online or offline attack (libc
is random relinked, but still readable in the filesystem).  It's
difficult to discover code-locations online only, because most
architectures also have xonly code now.  Some methods can use PLT
entries (which also vary based upon random relink), but I've not seen
much methodology using PLT entry + offset.

Anyways, every one of these things I mention, and the ones I don't mention,
tend to be more difficult than the previous methods.  I'm trying to remove
simple methods, and force attackers into more and more complex methods.
I promise that I will circle back and damage the more complex methods in
the future.


So in this next step, I'm going to take away the ability to perform syscall #0
(SYS_syscall), with the first argument being the real system call.

This library interface, and all the pieces below it, will be going away:

    https://man.openbsd.org/syscall.2

There's going to be some fallout which takes time to fix, especially in the
"go" ecosystem.

Snapshots for some architectures now contain kernel diffs which reject
syscall(2).  The symbol still remains in libc.

I'm including a piece of this diff.




Index: sys/arch/alpha/alpha/trap.c
===================================================================
RCS file: /cvs/src/sys/arch/alpha/alpha/trap.c,v
diff -u -p -u -r1.108 trap.c
--- sys/arch/alpha/alpha/trap.c	8 Mar 2023 04:43:07 -0000	1.108
+++ sys/arch/alpha/alpha/trap.c	27 Oct 2023 03:26:49 -0000
@@ -497,17 +497,15 @@ dopanic:
  * a3, and v0 from the frame before returning to the user process.
  */
 void
-syscall(code, framep)
-	u_int64_t code;
-	struct trapframe *framep;
+syscall(u_int64_t code, struct trapframe *framep)
 {
-	const struct sysent *callp;
+	const struct sysent *callp = sysent;
 	struct proc *p;
-	int error, indirect = -1;
+	int error;
 	u_int64_t opc;
 	u_long rval[2];
 	u_long args[10];					/* XXX */
-	u_int hidden, nargs;
+	u_int nargs;
 
 	atomic_add_int(&uvmexp.syscalls, 1);
 	p = curproc;
@@ -515,24 +513,11 @@ syscall(code, framep)
 	framep->tf_regs[FRAME_SP] = alpha_pal_rdusp();
 	opc = framep->tf_regs[FRAME_PC] - 4;
 
-	switch(code) {
-	case SYS_syscall:
-		indirect = code;
-		code = framep->tf_regs[FRAME_A0];
-		hidden = 1;
-		break;
-	default:
-		hidden = 0;
-	}
-
-	error = 0;
-	callp = sysent;
-	if (code >= SYS_MAXSYSCALL)
-		callp += SYS_syscall;
-	else
-		callp += code;
+	if (code <= 0 || code >= SYS_MAXSYSCALL)
+		goto bad;
+	callp += code;
 
-	nargs = callp->sy_narg + hidden;
+	nargs = callp->sy_narg;
 	switch (nargs) {
 	default:
 		if (nargs > 10)		/* XXX */
@@ -559,7 +544,7 @@ syscall(code, framep)
 	rval[0] = 0;
 	rval[1] = 0;
 
-	error = mi_syscall(p, code, indirect, callp, args + hidden, rval);
+	error = mi_syscall(p, code, callp, args, rval);
 
 	switch (error) {
 	case 0:
Index: sys/arch/amd64/amd64/locore.S
===================================================================
RCS file: /cvs/src/sys/arch/amd64/amd64/locore.S,v
diff -u -p -u -r1.141 locore.S
--- sys/arch/amd64/amd64/locore.S	24 Oct 2023 13:20:09 -0000	1.141
+++ sys/arch/amd64/amd64/locore.S	27 Oct 2023 03:26:49 -0000
@@ -508,6 +508,7 @@ ENTRY(savectx)
 	lfence
 END(savectx)
 
+// XXX this should not behave like a nop
 IDTVEC(syscall32)
 	sysret		/* go away please */
 END(Xsyscall32)
Index: sys/arch/amd64/amd64/trap.c
===================================================================
RCS file: /cvs/src/sys/arch/amd64/amd64/trap.c,v
diff -u -p -u -r1.101 trap.c
--- sys/arch/amd64/amd64/trap.c	5 Jul 2023 12:58:55 -0000	1.101
+++ sys/arch/amd64/amd64/trap.c	27 Oct 2023 03:26:49 -0000
@@ -553,7 +553,7 @@ syscall(struct trapframe *frame)
 	caddr_t params;
 	const struct sysent *callp;
 	struct proc *p;
-	int error, indirect = -1;
+	int error = ENOSYS;
 	size_t argsize, argoff;
 	register_t code, args[9], rval[2], *argp;
 
@@ -570,26 +570,9 @@ syscall(struct trapframe *frame)
 	argp = &args[0];
 	argoff = 0;
 
-	switch (code) {
-	case SYS_syscall:
-		/*
-		 * Code is first argument, followed by actual args.
-		 */
-		indirect = code;
-		code = frame->tf_rdi;
-		argp = &args[1];
-		argoff = 1;
-		break;
-	default:
-		break;
-	}
-
-	callp = sysent;
-	if (code < 0 || code >= SYS_MAXSYSCALL)
-		callp += SYS_syscall;
-	else
-		callp += code;
-
+	if (code <= 0 || code >= SYS_MAXSYSCALL)
+		goto bad;
+	callp = sysent + code;
 	argsize = (callp->sy_argsize >> 3) + argoff;
 	if (argsize) {
 		switch (MIN(argsize, 6)) {
@@ -620,7 +603,7 @@ syscall(struct trapframe *frame)
 	rval[0] = 0;
 	rval[1] = 0;
 
-	error = mi_syscall(p, code, indirect, callp, argp, rval);
+	error = mi_syscall(p, code, callp, argp, rval);
 
 	switch (error) {
 	case 0:
Index: sys/arch/arm/arm/syscall.c
===================================================================
RCS file: /cvs/src/sys/arch/arm/arm/syscall.c,v
diff -u -p -u -r1.26 syscall.c
--- sys/arch/arm/arm/syscall.c	11 Feb 2023 23:07:26 -0000	1.26
+++ sys/arch/arm/arm/syscall.c	27 Oct 2023 03:26:49 -0000
@@ -93,8 +93,8 @@ void
 swi_handler(trapframe_t *frame)
 {
 	struct proc *p = curproc;
-	const struct sysent *callp;
-	int code, error, indirect = -1;
+	const struct sysent *callp = sysent;
+	int code, error;
 	u_int nap = 4, nargs;
 	register_t *ap, *args, copyargs[MAXARGS], rval[2];
 
@@ -103,32 +103,19 @@ swi_handler(trapframe_t *frame)
 	/* Before enabling interrupts, save FPU state */
 	vfp_save();
 
-	/* Re-enable interrupts if they were enabled previously */
-	if (__predict_true((frame->tf_spsr & PSR_I) == 0))
-		enable_interrupts(PSR_I);
+	enable_interrupts(PSR_I);
 
 	p->p_addr->u_pcb.pcb_tf = frame;
 
 	/* Skip over speculation-blocking barrier. */
 	frame->tf_pc += 8;
 
-	code = frame->tf_r12;
-
 	ap = &frame->tf_r0;
 
-	switch (code) {	
-	case SYS_syscall:
-		indirect = code;
-		code = *ap++;
-		nap--;
-		break;
-	}
-
-	callp = sysent;
-	if (code < 0 || code >= SYS_MAXSYSCALL)
-		callp += SYS_syscall;
-	else
-		callp += code;
+	code = frame->tf_r12;
+	if (code <= 0 || code >= SYS_MAXSYSCALL)
+		goto bad;
+	callp += code;
 
 	nargs = callp->sy_argsize / sizeof(register_t);
 	if (nargs <= nap) {
@@ -145,27 +132,23 @@ swi_handler(trapframe_t *frame)
 	rval[0] = 0;
 	rval[1] = frame->tf_r1;
 
-	error = mi_syscall(p, code, indirect, callp, args, rval);
+	error = mi_syscall(p, code, callp, args, rval);
 
 	switch (error) {
 	case 0:
 		frame->tf_r0 = rval[0];
 		frame->tf_r1 = rval[1];
-
 		frame->tf_spsr &= ~PSR_C;	/* carry bit */
 		break;
-
 	case ERESTART:
 		/*
 		 * Reconstruct the pc to point at the swi.
 		 */
 		frame->tf_pc -= 12;
 		break;
-
 	case EJUSTRETURN:
 		/* nothing to do */
 		break;
-
 	default:
 	bad:
 		frame->tf_r0 = error;
Index: sys/arch/arm64/arm64/syscall.c
===================================================================
RCS file: /cvs/src/sys/arch/arm64/arm64/syscall.c,v
diff -u -p -u -r1.14 syscall.c
--- sys/arch/arm64/arm64/syscall.c	13 Apr 2023 02:19:04 -0000	1.14
+++ sys/arch/arm64/arm64/syscall.c	27 Oct 2023 03:26:49 -0000
@@ -33,7 +33,7 @@ svc_handler(trapframe_t *frame)
 {
 	struct proc *p = curproc;
 	const struct sysent *callp;
-	int code, error, indirect = -1;
+	int code, error = ENOSYS;
 	u_int nap = 8, nargs;
 	register_t *ap, *args, copyargs[MAXARGS], rval[2];
 
@@ -50,19 +50,9 @@ svc_handler(trapframe_t *frame)
 
 	ap = &frame->tf_x[0];
 
-	switch (code) {	
-	case SYS_syscall:
-		indirect = code;
-		code = *ap++;
-		nap--;
-		break;
-	}
-
-	callp = sysent;
-	if (code < 0 || code >= SYS_MAXSYSCALL)
-		callp += SYS_syscall;
-	else
-		callp += code;
+	if (code <= 0 || code >= SYS_MAXSYSCALL)
+		goto bad;
+	callp = sysent + code;
 
 	nargs = callp->sy_argsize / sizeof(register_t);
 	if (nargs <= nap) {
@@ -79,25 +69,22 @@ svc_handler(trapframe_t *frame)
 	rval[0] = 0;
 	rval[1] = 0;
 
-	error = mi_syscall(p, code, indirect, callp, args, rval);
+	error = mi_syscall(p, code, callp, args, rval);
 
 	switch (error) {
 	case 0:
 		frame->tf_x[0] = rval[0];
 		frame->tf_spsr &= ~PSR_C;	/* carry bit */
 		break;
-
 	case ERESTART:
 		/*
 		 * Reconstruct the pc to point at the svc.
 		 */
 		frame->tf_elr -= 12;
 		break;
-
 	case EJUSTRETURN:
 		/* nothing to do */
 		break;
-
 	default:
 	bad:
 		frame->tf_x[0] = error;
Index: sys/arch/hppa/hppa/trap.c
===================================================================
RCS file: /cvs/src/sys/arch/hppa/hppa/trap.c,v
diff -u -p -u -r1.161 trap.c
--- sys/arch/hppa/hppa/trap.c	11 Feb 2023 23:07:26 -0000	1.161
+++ sys/arch/hppa/hppa/trap.c	27 Oct 2023 03:26:49 -0000
@@ -764,8 +764,8 @@ void
 syscall(struct trapframe *frame)
 {
 	struct proc *p = curproc;
-	const struct sysent *callp;
-	int retq, code, argsize, argoff, error, indirect = -1;
+	const struct sysent *callp = sysent;
+	int code, argsize, argoff, error;
 	register_t args[8], rval[2];
 #ifdef DIAGNOSTIC
 	int oldcpl = curcpu()->ci_cpl;
@@ -778,29 +778,16 @@ syscall(struct trapframe *frame)
 
 	p->p_md.md_regs = frame;
 
-	argoff = 4; retq = 0;
-	switch (code = frame->tf_t1) {
-	case SYS_syscall:
-		indirect = code;
-		code = frame->tf_arg0;
-		args[0] = frame->tf_arg1;
-		args[1] = frame->tf_arg2;
-		args[2] = frame->tf_arg3;
-		argoff = 3;
-		break;
-	default:
-		args[0] = frame->tf_arg0;
-		args[1] = frame->tf_arg1;
-		args[2] = frame->tf_arg2;
-		args[3] = frame->tf_arg3;
-		break;
-	}
-
-	callp = sysent;
-	if (code < 0 || code >= SYS_MAXSYSCALL)
-		callp += SYS_syscall;
-	else
-		callp += code;
+	argoff = 4;
+	code = frame->tf_t1;
+	args[0] = frame->tf_arg0;
+	args[1] = frame->tf_arg1;
+	args[2] = frame->tf_arg2;
+	args[3] = frame->tf_arg3;
+
+	if (code <= 0 || code >= SYS_MAXSYSCALL)
+		goto bad;
+	callp += code;
 
 	if ((argsize = callp->sy_argsize)) {
 		register_t *s, *e, t;
@@ -830,7 +817,7 @@ syscall(struct trapframe *frame)
 		 */
 		i = 0;
 		switch (code) {
-		case SYS_lseek:		retq = 0;
+		case SYS_lseek:
 		case SYS_truncate:
 		case SYS_ftruncate:	i = 2;	break;
 		case SYS_preadv:
@@ -851,12 +838,12 @@ syscall(struct trapframe *frame)
 	rval[0] = 0;
 	rval[1] = frame->tf_ret1;
 
-	error = mi_syscall(p, code, indirect, callp, args, rval);
+	error = mi_syscall(p, code, callp, args, rval);
 
 	switch (error) {
 	case 0:
 		frame->tf_ret0 = rval[0];
-		frame->tf_ret1 = rval[!retq];
+		frame->tf_ret1 = rval[1];
 		frame->tf_t1 = 0;
 		break;
 	case ERESTART:
@@ -872,7 +859,7 @@ syscall(struct trapframe *frame)
 		break;
 	}
 
-	ast(p);
+	ast(p);		// XXX why?
 
 	mi_syscall_return(p, code, error, rval);
 
Index: sys/arch/i386/i386/trap.c
===================================================================
RCS file: /cvs/src/sys/arch/i386/i386/trap.c,v
diff -u -p -u -r1.162 trap.c
--- sys/arch/i386/i386/trap.c	16 Apr 2023 06:43:49 -0000	1.162
+++ sys/arch/i386/i386/trap.c	27 Oct 2023 03:26:49 -0000
@@ -516,9 +516,9 @@ void
 syscall(struct trapframe *frame)
 {
 	caddr_t params;
-	const struct sysent *callp;
-	struct proc *p;
-	int error, indirect = -1;
+	const struct sysent *callp = sysent;
+	struct proc *p = curproc;
+	int error;
 	register_t code, args[8], rval[2];
 #ifdef DIAGNOSTIC
 	int ocpl = lapic_tpr;
@@ -540,38 +540,22 @@ syscall(struct trapframe *frame)
 	}
 #endif
 
-	p = curproc;
 	p->p_md.md_regs = frame;
-	code = frame->tf_eax;
-
-	params = (caddr_t)frame->tf_esp + sizeof(int);
 
-	switch (code) {
-	case SYS_syscall:
-		/*
-		 * Code is first argument, followed by actual args.
-		 */
-		indirect = code;
-		copyin(params, &code, sizeof(int));
-		params += sizeof(int);
-		break;
-	default:
-		break;
-	}
+	code = frame->tf_eax;
+	if (code <= 0 || code >= SYS_MAXSYSCALL)
+		goto bad;
+	callp += code;
 
-	callp = sysent;
-	if (code < 0 || code >= SYS_MAXSYSCALL)
-		callp += SYS_syscall;
-	else
-		callp += code;
 	argsize = callp->sy_argsize;
+	params = (caddr_t)frame->tf_esp + sizeof(int);
 	if (argsize && (error = copyin(params, args, argsize)))
 		goto bad;
 
 	rval[0] = 0;
 	rval[1] = frame->tf_edx;
 
-	error = mi_syscall(p, code, indirect, callp, args, rval);
+	error = mi_syscall(p, code, callp, args, rval);
 
 	switch (error) {
 	case 0:
Index: sys/arch/m88k/m88k/trap.c
===================================================================
RCS file: /cvs/src/sys/arch/m88k/m88k/trap.c,v
diff -u -p -u -r1.128 trap.c
--- sys/arch/m88k/m88k/trap.c	2 Aug 2023 06:14:46 -0000	1.128
+++ sys/arch/m88k/m88k/trap.c	27 Oct 2023 03:26:49 -0000
@@ -1153,9 +1153,9 @@ void
 m88100_syscall(register_t code, struct trapframe *tf)
 {
 	int i, nap;
-	const struct sysent *callp;
+	const struct sysent *callp = sysent;
 	struct proc *p = curproc;
-	int error, indirect = -1;
+	int error;
 	register_t args[8] __aligned(8);
 	register_t rval[2] __aligned(8);
 	register_t *ap;
@@ -1172,19 +1172,9 @@ m88100_syscall(register_t code, struct t
 	ap = &tf->tf_r[2];
 	nap = 8; /* r2-r9 */
 
-	switch (code) {
-	case SYS_syscall:
-		indirect = code;
-		code = *ap++;
-		nap--;
-		break;
-	}
-
-	callp = sysent;
-	if (code < 0 || code >= SYS_MAXSYSCALL)
-		callp += SYS_syscall;
-	else
-		callp += code;
+	if (code <= 0 || code >= SYS_MAXSYSCALL)
+		goto bad;
+	callp += code;
 
 	i = callp->sy_argsize / sizeof(register_t);
 	if (i > sizeof(args) / sizeof(register_t))
@@ -1200,7 +1190,7 @@ m88100_syscall(register_t code, struct t
 	rval[0] = 0;
 	rval[1] = tf->tf_r[3];
 
-	error = mi_syscall(p, code, indirect, callp, args, rval);
+	error = mi_syscall(p, code, callp, args, rval);
 
 	/*
 	 * system call will look like:
@@ -1266,7 +1256,7 @@ void
 m88110_syscall(register_t code, struct trapframe *tf)
 {
 	int i, nap;
-	const struct sysent *callp;
+	const struct sysent *callp = sysent;
 	struct proc *p = curproc;
 	int error;
 	register_t args[8] __aligned(8);
@@ -1285,17 +1275,8 @@ m88110_syscall(register_t code, struct t
 	ap = &tf->tf_r[2];
 	nap = 8;	/* r2-r9 */
 
-	switch (code) {
-	case SYS_syscall:
-		code = *ap++;
-		nap--;
-		break;
-	}
-
-	callp = sysent;
-	if (code < 0 || code >= SYS_MAXSYSCALL)
-		callp += SYS_syscall;
-	else
+	// XXX out of range stays on syscall0, which we assume is enosys
+	if (code >= 0 || code <= SYS_MAXSYSCALL)
 		callp += code;
 
 	i = callp->sy_argsize / sizeof(register_t);
Index: sys/arch/mips64/mips64/trap.c
===================================================================
RCS file: /cvs/src/sys/arch/mips64/mips64/trap.c,v
diff -u -p -u -r1.167 trap.c
--- sys/arch/mips64/mips64/trap.c	26 Apr 2023 16:53:59 -0000	1.167
+++ sys/arch/mips64/mips64/trap.c	27 Oct 2023 03:26:49 -0000
@@ -396,14 +396,12 @@ fault_common_no_miss:
 	case T_SYSCALL+T_USER:
 	    {
 		struct trapframe *locr0 = p->p_md.md_regs;
-		const struct sysent *callp;
-		unsigned int code, indirect = -1;
+		const struct sysent *callp = sysent;
+		unsigned int code;
 		register_t tpc;
 		uint32_t branch = 0;
 		int error, numarg;
-		struct args {
-			register_t i[8];
-		} args;
+		register_t args[8];
 		register_t rval[2];
 
 		atomic_inc_int(&uvmexp.syscalls);
@@ -422,51 +420,22 @@ fault_common_no_miss:
 			    trapframe->pc, 0, branch);
 		} else
 			locr0->pc += 4;
-		callp = sysent;
 		code = locr0->v0;
-		switch (code) {
-		case SYS_syscall:
-			/*
-			 * Code is first argument, followed by actual args.
-			 */
-			indirect = code;
-			code = locr0->a0;
-			if (code >= SYS_MAXSYSCALL)
-				callp += SYS_syscall;
-			else
-				callp += code;
-			numarg = callp->sy_argsize / sizeof(register_t);
-			args.i[0] = locr0->a1;
-			args.i[1] = locr0->a2;
-			args.i[2] = locr0->a3;
-			if (numarg > 3) {
-				args.i[3] = locr0->a4;
-				args.i[4] = locr0->a5;
-				args.i[5] = locr0->a6;
-				args.i[6] = locr0->a7;
-				if (numarg > 7)
-					if ((error = copyin((void *)locr0->sp,
-					    &args.i[7], sizeof(register_t))))
-						goto bad;
-			}
-			break;
-		default:
-			if (code >= SYS_MAXSYSCALL)
-				callp += SYS_syscall;
-			else
-				callp += code;
-
-			numarg = callp->sy_narg;
-			args.i[0] = locr0->a0;
-			args.i[1] = locr0->a1;
-			args.i[2] = locr0->a2;
-			args.i[3] = locr0->a3;
-			if (numarg > 4) {
-				args.i[4] = locr0->a4;
-				args.i[5] = locr0->a5;
-				args.i[6] = locr0->a6;
-				args.i[7] = locr0->a7;
-			}
+
+		if (code <= 0 || code >= SYS_MAXSYSCALL)
+			goto bad;
+		callp += code;
+
+		numarg = callp->sy_narg;
+		args[0] = locr0->a0;
+		args[1] = locr0->a1;
+		args[2] = locr0->a2;
+		args[3] = locr0->a3;
+		if (numarg > 4) {
+			args[4] = locr0->a4;
+			args[5] = locr0->a5;
+			args[6] = locr0->a6;
+			args[7] = locr0->a7;
 		}
 
 		rval[0] = 0;
@@ -477,29 +446,24 @@ fault_common_no_miss:
 		    TRAPSIZE : trppos[ci->ci_cpuid]) - 1].code = code;
 #endif
 
-		error = mi_syscall(p, code, indirect, callp, args.i, rval);
+		error = mi_syscall(p, code, callp, args, rval);
 
 		switch (error) {
 		case 0:
 			locr0->v0 = rval[0];
 			locr0->a3 = 0;
 			break;
-
 		case ERESTART:
 			locr0->pc = tpc;
 			break;
-
 		case EJUSTRETURN:
 			break;	/* nothing to do */
-
 		default:
-		bad:
 			locr0->v0 = error;
 			locr0->a3 = 1;
 		}
 
 		mi_syscall_return(p, code, error, rval);
-
 		return;
 	    }
 
Index: sys/arch/powerpc/powerpc/trap.c
===================================================================
RCS file: /cvs/src/sys/arch/powerpc/powerpc/trap.c,v
diff -u -p -u -r1.131 trap.c
--- sys/arch/powerpc/powerpc/trap.c	11 Feb 2023 23:07:27 -0000	1.131
+++ sys/arch/powerpc/powerpc/trap.c	27 Oct 2023 03:26:49 -0000
@@ -239,11 +239,11 @@ trap(struct trapframe *frame)
 	struct vm_map *map;
 	vaddr_t va;
 	int access_type;
-	const struct sysent *callp;
+	const struct sysent *callp = sysent;
 	size_t argsize;
 	register_t code, error;
 	register_t *params, rval[2], args[10];
-	int n, indirect = -1;
+	int n;
 
 	if (frame->srr1 & PSL_PR) {
 		type |= EXC_USER;
@@ -360,27 +360,13 @@ trap(struct trapframe *frame)
 	case EXC_SC|EXC_USER:
 		uvmexp.syscalls++;
 
-		code = frame->fixreg[0];
 		params = frame->fixreg + FIRSTARG;
 
-		switch (code) {
-		case SYS_syscall:
-			/*
-			 * code is first argument,
-			 * followed by actual args.
-			 */
-			indirect = code;
-			code = *params++;
-			break;
-		default:
-			break;
-		}
+		code = frame->fixreg[0];
+	        if (code <= 0 || code >= SYS_MAXSYSCALL)
+			goto bad;
+                callp += code;
 
-		callp = sysent;
-		if (code < 0 || code >= SYS_MAXSYSCALL)
-			callp += SYS_syscall;
-		else
-			callp += code;
 		argsize = callp->sy_argsize;
 		n = NARGREG - (params - (frame->fixreg + FIRSTARG));
 		if (argsize > n * sizeof(register_t)) {
@@ -395,7 +381,7 @@ trap(struct trapframe *frame)
 		rval[0] = 0;
 		rval[1] = frame->fixreg[FIRSTARG + 1];
 
-		error = mi_syscall(p, code, indirect, callp, params, rval);
+		error = mi_syscall(p, code, callp, params, rval);
 
 		switch (error) {
 		case 0:
Index: sys/arch/powerpc64/powerpc64/syscall.c
===================================================================
RCS file: /cvs/src/sys/arch/powerpc64/powerpc64/syscall.c,v
diff -u -p -u -r1.11 syscall.c
--- sys/arch/powerpc64/powerpc64/syscall.c	11 Feb 2023 23:07:27 -0000	1.11
+++ sys/arch/powerpc64/powerpc64/syscall.c	27 Oct 2023 03:26:49 -0000
@@ -30,27 +30,17 @@ void
 syscall(struct trapframe *frame)
 {
 	struct proc *p = curproc;
-	const struct sysent *callp;
-	int code, error, indirect = -1;
+	const struct sysent *callp = sysent;
+	int code, error;
 	int nap = 8, nargs;
 	register_t *ap, *args, copyargs[MAXARGS], rval[2];
 
-	code = frame->fixreg[0];
 	ap = &frame->fixreg[3];
+	code = frame->fixreg[0];
+	if (code <= 0 || code >= SYS_MAXSYSCALL)
+		goto bad;
+	callp += code;
 
-	switch (code) {
-	case SYS_syscall:
-		indirect = code;
-		code = *ap++;
-		nap--;
-		break;
-	}
-
-	callp = sysent;
-	if (code < 0 || code >= SYS_MAXSYSCALL)
-		callp += SYS_syscall;
-	else
-		callp += code;
 	nargs = callp->sy_argsize / sizeof(register_t);
 	if (nargs <= nap) {
 		args = ap;
@@ -66,7 +56,7 @@ syscall(struct trapframe *frame)
 	rval[0] = 0;
 	rval[1] = 0;
 
-	error = mi_syscall(p, code, indirect, callp, args, rval);
+	error = mi_syscall(p, code, callp, args, rval);
 
 	switch (error) {
 	case 0:
@@ -74,15 +64,12 @@ syscall(struct trapframe *frame)
 		frame->fixreg[3] = rval[0];
 		frame->cr &= ~0x10000000;
 		break;
-
 	case ERESTART:
 		frame->srr0 -= 4;
 		break;
-
 	case EJUSTRETURN:
 		/* nothing to do */
 		break;
-
 	default:
 	bad:
 		frame->fixreg[0] = error;
Index: sys/arch/riscv64/riscv64/syscall.c
===================================================================
RCS file: /cvs/src/sys/arch/riscv64/riscv64/syscall.c,v
diff -u -p -u -r1.16 syscall.c
--- sys/arch/riscv64/riscv64/syscall.c	13 Apr 2023 02:19:05 -0000	1.16
+++ sys/arch/riscv64/riscv64/syscall.c	27 Oct 2023 03:26:49 -0000
@@ -39,33 +39,20 @@ void
 svc_handler(trapframe_t *frame)
 {
 	struct proc *p = curproc;
-	const struct sysent *callp;
-	int code, error, indirect = -1;
+	const struct sysent *callp = sysent;
+	int code, error;
 	u_int nap = 8, nargs;
 	register_t *ap, *args, copyargs[MAXARGS], rval[2];
 
 	uvmexp.syscalls++;
 
-	/* Re-enable interrupts if they were enabled previously */
-	if (__predict_true(frame->tf_scause & EXCP_INTR))
-		intr_enable();
-
 	ap = &frame->tf_a[0];
 	code = frame->tf_t[0];
 
-	switch (code) {
-	case SYS_syscall:
-		indirect = code;
-		code = *ap++;
-		nap--;
-		break;
-	}
+	if (code <= 0 || code >= SYS_MAXSYSCALL)
+		goto bad;
+	callp += code;
 
-	callp = sysent;
-	if (code < 0 || code >= SYS_MAXSYSCALL)
-		callp += SYS_syscall;
-	else
-		callp += code;
 	nargs = callp->sy_argsize / sizeof(register_t);
 	if (nargs <= nap) {
 		args = ap;
@@ -81,21 +68,18 @@ svc_handler(trapframe_t *frame)
 	rval[0] = 0;
 	rval[1] = 0;
 
-	error = mi_syscall(p, code, indirect, callp, args, rval);
+	error = mi_syscall(p, code, callp, args, rval);
 
 	switch (error) {
 	case 0:
 		frame->tf_a[0] = rval[0];
 		frame->tf_t[0] = 0;		/* syscall succeeded */
 		break;
-
 	case ERESTART:
 		frame->tf_sepc -= 4;		/* prev instruction */
 		break;
-
 	case EJUSTRETURN:
 		break;
-
 	default:
 	bad:
 		frame->tf_a[0] = error;
Index: sys/arch/sh/sh/trap.c
===================================================================
RCS file: /cvs/src/sys/arch/sh/sh/trap.c,v
diff -u -p -u -r1.54 trap.c
--- sys/arch/sh/sh/trap.c	11 Feb 2023 23:07:27 -0000	1.54
+++ sys/arch/sh/sh/trap.c	27 Oct 2023 03:26:49 -0000
@@ -516,44 +516,20 @@ syscall(struct proc *p, struct trapframe
 {
 	caddr_t params;
 	const struct sysent *callp;
-	int error, opc, indirect = -1;
-	int argoff, argsize;
+	int error, opc;
+	int argsize;
 	register_t code, args[8], rval[2];
 
 	uvmexp.syscalls++;
 
 	opc = tf->tf_spc;
 	code = tf->tf_r0;
-
 	params = (caddr_t)tf->tf_r15;
 
-	switch (code) {
-	case SYS_syscall:
-		/*
-		 * Code is first argument, followed by actual args.
-		 */
-		indirect = code;
-	        code = tf->tf_r4;
-		argoff = 1;
-		break;
-	default:
-		argoff = 0;
-		break;
-	}
-
-	callp = sysent;
-	if (code < 0 || code >= SYS_MAXSYSCALL)
-		callp += SYS_syscall;
-	else
-		callp += code;
+	if (code <= 0 || code >= SYS_MAXSYSCALL)
+		goto bad;
+	callp = sysent + code;
 	argsize = callp->sy_argsize;
-#ifdef DIAGNOSTIC
-	if (argsize > sizeof args) {
-		callp += SYS_syscall - code;
-		argsize = callp->sy_argsize;
-	}
-#endif
-
 	if (argsize) {
 		register_t *ap;
 		int off_t_arg;
@@ -570,19 +546,16 @@ syscall(struct proc *p, struct trapframe
 		}
 
 		ap = args;
-		switch (argoff) {
-		case 0:	*ap++ = tf->tf_r4; argsize -= sizeof(int);
-		case 1:	*ap++ = tf->tf_r5; argsize -= sizeof(int);
-		case 2: *ap++ = tf->tf_r6; argsize -= sizeof(int);
-			/*
-			 * off_t args aren't split between register
-			 * and stack, but rather r7 is skipped and
-			 * the entire off_t is on the stack.
-			 */
-			if (argoff + off_t_arg == 3)
-				break;
+		*ap++ = tf->tf_r4; argsize -= sizeof(int);
+		*ap++ = tf->tf_r5; argsize -= sizeof(int);
+		*ap++ = tf->tf_r6; argsize -= sizeof(int);
+		/*
+		 * off_t args aren't split between register
+		 * and stack, but rather r7 is skipped and
+		 * the entire off_t is on the stack.
+		 */
+		if (off_t_arg != 3) {
 			*ap++ = tf->tf_r7; argsize -= sizeof(int);
-			break;
 		}
 
 		if (argsize > 0) {
@@ -594,7 +567,7 @@ syscall(struct proc *p, struct trapframe
 	rval[0] = 0;
 	rval[1] = tf->tf_r1;
 
-	error = mi_syscall(p, code, indirect, callp, args, rval);
+	error = mi_syscall(p, code, callp, args, rval);
 
 	switch (error) {
 	case 0:
Index: sys/arch/sparc64/sparc64/trap.c
===================================================================
RCS file: /cvs/src/sys/arch/sparc64/sparc64/trap.c,v
diff -u -p -u -r1.115 trap.c
--- sys/arch/sparc64/sparc64/trap.c	11 Feb 2023 23:07:28 -0000	1.115
+++ sys/arch/sparc64/sparc64/trap.c	27 Oct 2023 03:26:49 -0000
@@ -1109,9 +1109,10 @@ syscall(struct trapframe *tf, register_t
 	int64_t *ap;
 	const struct sysent *callp;
 	struct proc *p = curproc;
-	int error, new, indirect = -1;
+	int error = ENOSYS, new;
 	register_t args[8];
 	register_t rval[2];
+	register_t *argp;
 
 	if ((tf->tf_out[6] & 1) == 0)
 		sigexit(p, SIGILL);
@@ -1137,44 +1138,31 @@ syscall(struct trapframe *tf, register_t
 	ap = &tf->tf_out[0];
 	nap = 6;
 
-	switch (code) {
-	case SYS_syscall:
-		indirect = code;
-		code = *ap++;
-		nap--;
-		break;
-	}
-
-	callp = sysent;
-	if (code < 0 || code >= SYS_MAXSYSCALL)
-		callp += SYS_syscall;
-	else {
-		register_t *argp;
-
-		callp += code;
-		i = callp->sy_narg; /* Why divide? */
-		if (i > nap) {	/* usually false */
-			if (i > 8)
-				panic("syscall nargs");
-			/* Read the whole block in */
-			if ((error = copyin((caddr_t)tf->tf_out[6]
-			    + BIAS + offsetof(struct frame, fr_argx),
-			    &args[nap], (i - nap) * sizeof(register_t))))
-				goto bad;
-			i = nap;
-		}
-		/*
-		 * It should be faster to do <= 6 longword copies than
-		 * to call bcopy
-		 */
-		for (argp = args; i--;)
-			*argp++ = *ap++;
+	if (code <= 0 || code >= SYS_MAXSYSCALL)
+		goto bad;
+	callp = sysent + code;
+	i = callp->sy_narg; /* Why divide? */
+	if (i > nap) {	/* usually false */
+		if (i > 8)
+			panic("syscall nargs");
+		/* Read the whole block in */
+		if ((error = copyin((caddr_t)tf->tf_out[6]
+		    + BIAS + offsetof(struct frame, fr_argx),
+		    &args[nap], (i - nap) * sizeof(register_t))))
+			goto bad;
+		i = nap;
 	}
+	/*
+	 * It should be faster to do <= 6 longword copies than
+	 * to call bcopy
+	 */
+	for (argp = args; i--;)
+		*argp++ = *ap++;
 
 	rval[0] = 0;
 	rval[1] = 0;
 
-	error = mi_syscall(p, code, indirect, callp, args, rval);
+	error = mi_syscall(p, code, callp, args, rval);
 
 	switch (error) {
 		vaddr_t dest;
Index: sys/kern/kern_ktrace.c
===================================================================
RCS file: /cvs/src/sys/kern/kern_ktrace.c,v
diff -u -p -u -r1.112 kern_ktrace.c
--- sys/kern/kern_ktrace.c	11 May 2023 09:51:33 -0000	1.112
+++ sys/kern/kern_ktrace.c	27 Oct 2023 03:26:49 -0000
@@ -160,7 +160,7 @@ ktrsyscall(struct proc *p, register_t co
 	u_int nargs = 0;
 	int i;
 
-	if ((code & KTRC_CODE_MASK) == SYS_sysctl) {
+	if (code == SYS_sysctl) {
 		/*
 		 * The sysctl encoding stores the mib[]
 		 * array because it is interesting.
Index: sys/sys/ktrace.h
===================================================================
RCS file: /cvs/src/sys/sys/ktrace.h,v
diff -u -p -u -r1.46 ktrace.h
--- sys/sys/ktrace.h	23 Feb 2023 01:33:20 -0000	1.46
+++ sys/sys/ktrace.h	27 Oct 2023 03:26:49 -0000
@@ -76,8 +76,6 @@ struct ktr_header {
 #define KTR_SYSCALL	1
 struct ktr_syscall {
 	int	ktr_code;		/* syscall number */
-#define KTRC_CODE_MASK			0x0000ffff
-#define KTRC_CODE_SYSCALL		0x20000000
 	int	ktr_argsize;		/* size of arguments */
 	/*
 	 * followed by ktr_argsize/sizeof(register_t) "register_t"s
Index: sys/sys/syscall_mi.h
===================================================================
RCS file: /cvs/src/sys/sys/syscall_mi.h,v
diff -u -p -u -r1.28 syscall_mi.h
--- sys/sys/syscall_mi.h	11 Feb 2023 23:07:23 -0000	1.28
+++ sys/sys/syscall_mi.h	27 Oct 2023 03:26:49 -0000
@@ -51,8 +51,8 @@
  * The MD setup for a system call has been done; here's the MI part.
  */
 static inline int
-mi_syscall(struct proc *p, register_t code, int indirect,
-    const struct sysent *callp, register_t *argp, register_t retval[2])
+mi_syscall(struct proc *p, register_t code, const struct sysent *callp,
+    register_t *argp, register_t retval[2])
 {
 	uint64_t tval;
 	int lock = !(callp->sy_flags & SY_NOLOCK);
@@ -73,15 +73,8 @@ mi_syscall(struct proc *p, register_t co
 #ifdef KTRACE
 	if (KTRPOINT(p, KTR_SYSCALL)) {
 		/* convert to mask, then include with code */
-		switch (indirect) {
-		case SYS_syscall:
-			indirect = KTRC_CODE_SYSCALL;
-			break;
-		default:
-			indirect = 0;
-		}
 		KERNEL_LOCK();
-		ktrsyscall(p, code | indirect, callp->sy_argsize, argp);
+		ktrsyscall(p, code, callp->sy_argsize, argp);
 		KERNEL_UNLOCK();
 	}
 #endif
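
The pattern that the diff repeats for each architecture can be summarized in a small stand-alone model. It is a sketch only: the table sketch_table, the constants prefixed SKETCH_ and the two dispatch functions are invented for illustration and use a made-up numbering, not the real sysent table.

```c
#include <errno.h>

#define SKETCH_SYS_syscall	0	/* the indirect entry being removed */
#define SKETCH_SYS_getpid	1	/* hypothetical numbering */
#define SKETCH_SYS_MAX		2

typedef int (*sketch_call_t)(const long *args, long *retval);

static int
sketch_nosys(const long *args, long *retval)
{
	(void)args; (void)retval;
	return ENOSYS;
}

static int
sketch_getpid(const long *args, long *retval)
{
	(void)args;
	*retval = 42;		/* stand-in for a real pid */
	return 0;
}

static const sketch_call_t sketch_table[SKETCH_SYS_MAX] = {
	sketch_nosys,		/* slot 0 */
	sketch_getpid,
};

/* Before: code 0 means "the real number is the first argument". */
static int
dispatch_old(long code, const long *args, long *retval)
{
	if (code == SKETCH_SYS_syscall) {
		code = args[0];
		args++;
	}
	if (code < 0 || code >= SKETCH_SYS_MAX)
		code = SKETCH_SYS_syscall;	/* falls back to slot 0 */
	return sketch_table[code](args, retval);
}

/* After: code 0 and out-of-range codes are rejected up front. */
static int
dispatch_new(long code, const long *args, long *retval)
{
	if (code <= 0 || code >= SKETCH_SYS_MAX)
		return ENOSYS;
	return sketch_table[code](args, retval);
}
```

With the old dispatcher, passing code 0 and the getpid number as the first argument reaches sketch_getpid; with the new one the same request is refused, while a direct request for the getpid number still works.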



Removing syscall() from OpenBSD

Posted Oct 27, 2023 15:55 UTC (Fri) by iustin (subscriber, #102433) [Link] (21 responses)

I haven't written generic code that does direct syscalls (skipping libc) in 10+ years, so maybe it's just me - but at some level, there still will be a generic syscall interface, no?

Oooh, wait, this was an _indirect_ syscall. Yes, I can definitely see that being a vector for abuse. I wonder what the use cases for it were…

Uses for syscall()

Posted Oct 27, 2023 16:10 UTC (Fri) by corbet (editor, #1) [Link] (2 responses)

In Linux, syscall() is the way to gain access to system calls that do not yet have support in the C library. I believe that the Go runtime uses it as well for system-call access.

Uses for syscall()

Posted Oct 27, 2023 17:16 UTC (Fri) by rahulsundaram (subscriber, #21946) [Link]

> In Linux, syscall() is the way to gain access to system calls that do not yet have support in the C library. I believe that the Go runtime uses it as well for system-call access.

Yes. In Go, if you want to get even the owner/group info of a file, you will have to add OS-specific conditions and use the syscall interface.

Uses for syscall()

Posted Oct 27, 2023 21:35 UTC (Fri) by jrtc27 (subscriber, #107748) [Link]

Notably though, Linux does syscall(2) entirely in userspace, whereas the BSDs have an actual syscall(2) syscall in the kernel instead. This has advantages and disadvantages. One could still do things the Linux way on OpenBSD, subject to restrictions about "where was that syscall issued from?" questions that may make the kernel not honour the request.

Removing syscall() from OpenBSD

Posted Oct 27, 2023 17:10 UTC (Fri) by farnz (subscriber, #17727) [Link] (17 responses)

One of the hardening tricks OpenBSD is working on is control flow analysis at syscall entry; if you enter the kernel from the "wrong" place, then clearly something fishy is happening, and you should be stopped.

As a side-effect, this breaks making direct syscalls, because you'll be making those syscalls from the "wrong" place.
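
A toy model of that check, purely for illustration: the structure, the function name, and the idea of a single [start, end) range are assumptions made for this sketch, not OpenBSD's kernel code. It only shows the shape of the question "did this kernel entry come from an approved address range?".

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical description of the one region allowed to enter the kernel. */
struct syscall_region {
    uintptr_t start;   /* first address inside the region */
    uintptr_t end;     /* first address past the region (exclusive) */
};

/*
 * Return true when the program counter of the syscall instruction lies
 * inside the approved region, false when it comes from anywhere else.
 */
static bool syscall_origin_ok(const struct syscall_region *r, uintptr_t pc)
{
    return pc >= r->start && pc < r->end;
}
```

With a region covering 0x1000 up to (but not including) 0x2000, a pc of 0x1800 passes and a pc of 0x2000 is refused. A direct syscall made from application code corresponds to the refused case.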

Removing syscall() from OpenBSD

Posted Oct 27, 2023 17:48 UTC (Fri) by gutschke (subscriber, #27910) [Link] (16 responses)

In general, this is wonderful and I am all for this type of hardening. In practice, libc has a very particular abstraction that it presents to its callers. It is a lot more restrictive than what the kernel is capable of doing. 99% of the time, that's exactly what you want. But there always will be that remaining 1% where you have to side-step libc for one reason or another.

I realize that OpenBSD has a lot more control over its ecosystem and can thus make these types of changes to the official API. But if Linux ever contemplated making similar choices, there would have to be some way to register syscall call sites that live outside of libc. Expect a lot of existing applications to break until patched.

Removing syscall() from OpenBSD

Posted Oct 27, 2023 18:19 UTC (Fri) by NYKevin (subscriber, #129325) [Link] (10 responses)

Linux is the weirdo here. The vast majority of operating systems do not officially support "direct" syscalls. If you make a "direct" syscall on almost any other operating system, it can break at any time and the vendor will either (Apple) tell you that it's your own fault or (Microsoft) curse your name and shim your app to emulate a call to the interface it should've been using in the first place (at a considerable speed penalty, of course).

Removing syscall() from OpenBSD

Posted Oct 28, 2023 5:31 UTC (Sat) by wtarreau (subscriber, #51152) [Link] (9 responses)

Linux is significantly different from many other OSes in that the libc is not provided by the same people who make the kernel, so both projects live their own lives and have to offer a wide compatibility with each other. In other OSes it's easier to offer full OS support in the shipped libc, hence removing any justification for not using the shipped libc.

Removing syscall() from OpenBSD

Posted Oct 28, 2023 19:04 UTC (Sat) by khim (subscriber, #9252) [Link] (3 responses)

Windows also doesn't provide one-canonical-libc. But it still doesn't support stable syscalls.

Removing syscall() from OpenBSD

Posted Oct 29, 2023 5:13 UTC (Sun) by NYKevin (subscriber, #129325) [Link] (2 responses)

Eh, semantics. They provide Windows.h and all of the other nonsense, and that's their equivalent to libc. The fact that it looks nothing like ISO/POSIX libc is certainly annoying for people who want to write cross-platform code, but when we're talking about the OS-userspace boundary, it's filling the same role as libc.

Removing syscall() from OpenBSD

Posted Oct 30, 2023 14:35 UTC (Mon) by zwol (guest, #126152) [Link] (1 responses)

Yeah, the built-in Windows component that performs "the C library"'s function of communicating directly with the kernel is NTDLL.DLL.

It's instructive to read up on what-all NTDLL.DLL provides, because it started out, back in the days of NT 4.0, as a "just the system calls" shim under KERNEL32.DLL, like someone was saying we should have over in Unix land. But look at the actual file "c:\windows\system32\ntdll.dll" in a current-generation Windows. It's bigger than libc.so.6. If you dump out its symbol table, it provides a whole lot of functionality besides the system calls.

This is for practical engineering reasons. NTDLL.DLL is the only piece of code that is guaranteed to be loaded into every user space process, and that means it's the logical place to put functionality that every process needs, no matter what — such as the dynamic loader. Well, it turns out that a dynamic loader is a big Twinkie with a lot of dependencies, because it needs a heap allocator, it needs to be able to report errors, and it needs to be tied so intimately into the thread library that it's easiest to make it be the thread library. All by themselves those things add up to most of a conventional "language runtime". And then once you have that, inevitably it suffers feature creep.

I have occasionally thought about what a minimal "just the syscalls" shared library, that all the post-C languages could agree on, would look like, and, well, I think the only plausible way for it to be truly "minimal" and "language-agnostic" is if we somehow move the dynamic loader out of process. That would be worth doing for various security-related reasons as well, but it needs a bunch of new kernel API (so it can manipulate an arbitrary process's memory map without becoming that process's debugger), and it might be unacceptably slow, and it adds, um, challenges to system boot.

(It might be interesting to think about a different split, though, in which we divide all of ISO C + POSIX functionality into "everyone needs this" and "only programs written in C want this", accepting that much of the "everyone needs" pile will retain a certain C flavor to it. I don't know if it would be worth doing, I would have to take the time to make the lists.)

Removing syscall() from OpenBSD

Posted Nov 1, 2023 17:32 UTC (Wed) by simcop2387 (subscriber, #101710) [Link]

It's not explicitly targeting that topic, but you might take a look at Redox-OS, since the whole gimmick there is that it's written in Rust and with no C (in theory, not sure if it's completely that way). They provide a C library for compatibility, but it's not used as the way to interact with the kernel.

https://www.redox-os.org/

Removing syscall() from OpenBSD

Posted Oct 28, 2023 21:21 UTC (Sat) by Wol (subscriber, #4433) [Link] (4 responses)

I know it might be a bit messy, but is there any reason Linux doesn't have a mini-libc, which just contains all the stuff that should be in glibc and isn't?

Then patch glibc's makefile so that "if Linux then include linux-libc"?

Cheers,
Wol

Removing syscall() from OpenBSD

Posted Oct 30, 2023 14:36 UTC (Mon) by zwol (guest, #126152) [Link] (3 responses)

For the record, I fully intend to shove all the missing system calls into glibc Real Soon Now.

Removing syscall() from OpenBSD

Posted Oct 30, 2023 14:59 UTC (Mon) by mathstuf (subscriber, #69389) [Link] (2 responses)

Would this obviate the need for things like `libkeyutils` and `liburing` (though ISTR there being multiples of these)?

Removing syscall() from OpenBSD

Posted Oct 30, 2023 19:35 UTC (Mon) by zwol (guest, #126152) [Link] (1 responses)

Not if they provide functionality beyond just bare system call wrappers. I don't know anything about the specific libraries you mention, though.

Removing syscall() from OpenBSD

Posted Oct 30, 2023 19:41 UTC (Mon) by mathstuf (subscriber, #69389) [Link]

While I don't know about the io_uring libraries, I don't think there's much to libkeyutils' userspace. Some auto-allocating variants, a recursive search function, error descriptions. I guess my main worry is about conflicting symbol names. glibc will likely use symbol versioning, so maybe there's not much of a worry about runtime, just compiler conflicts if the headers meet each other inside a single TU.

Removing syscall() from OpenBSD

Posted Oct 27, 2023 18:33 UTC (Fri) by smurf (subscriber, #17840) [Link] (3 responses)

Expect a heap of deservedly-pointed feedback from Linus (and others) for any idea that requires old applications to be "patched" if they want to continue working.

At minimum such restrictions must be enabled with a separate syscall, or some bits in the ELF header, or whatever.

Removing syscall() from OpenBSD

Posted Oct 30, 2023 11:21 UTC (Mon) by mgedmin (subscriber, #34497) [Link] (2 responses)

It's OpenBSD, they have all the userspace applications in one giant CSV repository (or have they managed to migrate to Subversion yet?), and they don't care about ABI compatibility. It's not the first time.

Removing syscall() from OpenBSD

Posted Oct 30, 2023 12:12 UTC (Mon) by joib (subscriber, #8541) [Link] (1 responses)

CVS, not CSV, presumably.

I vaguely recall one of the *BSD projects working on a BSD-licensed clone of git, but my google-fu is failing me and I can't find any information about that project. (FreeBSD seems to have switched to using git for their development, using the 'official' GPL-licensed git.)

Game of Trees

Posted Oct 30, 2023 14:22 UTC (Mon) by adamnew123456 (subscriber, #136057) [Link]

You're thinking of GOT: https://gameoftrees.org/goals.html

Which underscores the GP's point even further - OpenBSD would sooner write its own Git than depend on the official one. Not that it's *totally* unjustified given their security goals, but that's a far boundary to draw for "the base OS".

Removing syscall() from OpenBSD

Posted Oct 29, 2023 0:15 UTC (Sun) by wahern (subscriber, #37304) [Link]

> there would have to be some way to register syscall call sites that live outside of libc

OpenBSD added pinsyscall(2) for this (https://man.openbsd.org/pinsyscall), though it only permits a single call site, the initial pin is immutable, and it currently only knows about execve(2) (see https://github.com/openbsd/src/blob/47565b7/sys/uvm/uvm_m...).

Removing syscall() from OpenBSD

Posted Oct 27, 2023 18:15 UTC (Fri) by rywang014 (subscriber, #167182) [Link] (10 responses)

Why do we have a syscall() in the first place, since user space can always write a wrapper (assembly) function that sets the syscall number and then executes the `syscall` instruction?
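
What such a wrapper could look like, as a minimal sketch assuming Linux on x86-64 with GCC or Clang inline assembly. The helper name raw_syscall0 is hypothetical, and other platforms fall back to libc's syscall() so that the code still builds. On OpenBSD, a call site like this one lives outside libc, so per the discussion above the kernel may refuse it.

```c
#include <sys/syscall.h>   /* SYS_* system call numbers */
#include <unistd.h>        /* syscall(), getpid() */

/* Hypothetical helper: issue a system call that takes no arguments. */
static long raw_syscall0(long nr)
{
#if defined(__x86_64__) && defined(__linux__)
    long ret;
    /*
     * Linux x86-64 convention: the call number goes in rax and the result
     * comes back in rax. The syscall instruction itself overwrites rcx and
     * r11, so both are listed as clobbered.
     */
    __asm__ volatile ("syscall"
                      : "=a"(ret)
                      : "a"(nr)
                      : "rcx", "r11", "memory");
    return ret;
#else
    /* Not x86-64 Linux: let libc enter the kernel instead. */
    return syscall(nr);
#endif
}
```

For example, raw_syscall0(SYS_getpid) returns the same value as getpid().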

Removing syscall() from OpenBSD

Posted Oct 27, 2023 18:24 UTC (Fri) by NYKevin (subscriber, #129325) [Link] (3 responses)

I'm not sure how OpenBSD does it, but on Linux, syscall() is literally just a userspace library function that does exactly what you describe. The OpenBSD man page is rather terse, but I would be pretty shocked if it worked all that differently from Linux.

Removing syscall() from OpenBSD

Posted Oct 27, 2023 18:37 UTC (Fri) by farnz (subscriber, #17727) [Link] (1 responses)

This is a real system call, now called syscall, that was called indir back in V4 UNIX (and possibly earlier) and that OpenBSD inherited through its BSD legacy.

The idea is that on platforms where a syscall's parameters are stored inline after the syscall instruction (rather than being passed in registers, as on modern platforms), you can assemble a system call at runtime without having to write into the text segment or execute from the data segment - you call indir from the text segment, with parameters pointing at the place in the data segment where you've assembled your actual system call parameters.

Removing syscall() from OpenBSD

Posted Oct 28, 2023 20:30 UTC (Sat) by dezgeg (subscriber, #92243) [Link]

Finally a good rationale why this needs (or well, has been needed in the past) to be in the kernel. Thanks.

Removing syscall() from OpenBSD

Posted Oct 27, 2023 18:38 UTC (Fri) by smurf (subscriber, #17840) [Link]

Well, on OpenBSD we can expect this function to not work at all as of their next release.

Removing syscall() from OpenBSD

Posted Oct 27, 2023 18:24 UTC (Fri) by lindi (subscriber, #53135) [Link]

Not allowing a mapping to be both writable and executable is a standard security feature nowadays.

Removing syscall() from OpenBSD

Posted Oct 27, 2023 18:34 UTC (Fri) by adobriyan (subscriber, #30858) [Link] (3 responses)

syscall(3) is how to do system calls in a portable way.

Removing syscall() from OpenBSD

Posted Oct 27, 2023 20:37 UTC (Fri) by Karellen (subscriber, #67644) [Link] (2 responses)

syscall(2) ?

Removing syscall() from OpenBSD

Posted Oct 27, 2023 20:47 UTC (Fri) by adobriyan (subscriber, #30858) [Link] (1 responses)

Sure. But syscall() is not a kernel system call, so what is it doing in section 2?

Removing syscall() from OpenBSD

Posted Oct 27, 2023 20:54 UTC (Fri) by NYKevin (subscriber, #129325) [Link]

Because all of the other libc syscall wrappers are in section 2. syscall(2) is just "a wrapper for any syscall that doesn't have a wrapper."

Removing syscall() from OpenBSD

Posted Oct 27, 2023 21:24 UTC (Fri) by ibukanov (subscriber, #3942) [Link]

OpenBSD only allows calling a syscall from an address where libc is loaded. So, to allow runtime-assembled calls, a separate meta-call has to be provided by libc.


Copyright © 2023, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds