
Emulating Windows system calls, take 2


Posted Jul 21, 2020 13:28 UTC (Tue) by Funcan (guest, #44209)
In reply to: Emulating Windows system calls, take 2 by tnemeth
Parent article: Emulating Windows system calls, take 2

I'd say that they're implementing 'personalities' the same way they did containers: provide the facilities known to be needed right now, and let anybody build a 'product' around them, adding new features as needed. This allows multiple parallel efforts at producing a 'product', and iteration, rather than trying to design everything in advance, which history suggests works badly.



Emulating Windows system calls, take 2

Posted Jul 23, 2020 13:53 UTC (Thu) by nix (subscriber, #2304) [Link] (2 responses)

Threads were added the same way, and the result was eventually a best-in-class threading implementation which does not fill the kernel with complexity related to one specific threading model. Seems like a good approach to me too (though if you'd asked me back in the year 2000 when LinuxThreads was the only threading implementation out there, I might have said something different!)

Emulating Windows system calls, take 2

Posted Jul 23, 2020 21:02 UTC (Thu) by BenHutchings (subscriber, #37955) [Link] (1 responses)

This is not what happened with threads. The kernel APIs that supported LinuxThreads were not sufficient to build a compliant POSIX threads implementation. The New POSIX Threads Library (NPTL) that eventually replaced LinuxThreads required additional kernel APIs.

Emulating Windows system calls, take 2

Posted Jul 24, 2020 22:08 UTC (Fri) by nix (subscriber, #2304) [Link]

It did -- but this wasn't done the way (say) Solaris did it, by piling the entire threading abstraction into the kernel and complicating the bejeezus out of everything. Instead we got away with a few relatively simple and noninvasive things: thread groups, a bit of tweaking to signal handling (which was painful, sure, but much less painful than almost everything else around signal handling already was and is), futexes, and some nice souping up to PID allocation so you could have silly numbers of them.

(And a good few of those abstractions were, I vaguely recall, there *already*, and used by LinuxThreads: tgids, for instance. I'm sure futexes were new for NPTL though.)

Emulating Windows system calls, take 2

Posted Aug 11, 2020 15:50 UTC (Tue) by Ericson2314 (guest, #139248) [Link] (6 responses)

I think there should be both. This is a *dynamic* way to control the meaning of syscalls; personalities are a *static* way.

In general, it's not good to force the use of dynamic solutions to static needs, even though the dynamic one is strictly more expressive. It's good to be static where possible because it better conveys intent, is simpler to reason about, especially w.r.t. security, and can be better optimized.

I think Linux should have this and personalities. For example, there could be a native personality, a Windows personality (via a kernel module, let's say), and a way to disjoint-union (sum) personalities via name-spacing (e.g. a tag bit somewhere in memory or registers). Then Wine can use this in combination with the native+Windows union personality, and the trampoline just needs to set the Windows-or-Linux bit, letting the kernel do the rest.
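The tagged-union idea above can be modelled in a few lines. This is only a sketch of the concept, assuming a one-bit tag and two tiny handler tables; every name in it (`TAG_LINUX`, `TAG_WINDOWS`, `dispatch`, the handler functions) is hypothetical, the syscall numbers are placeholders, and none of it corresponds to an existing kernel interface.

```python
import errno

# Hypothetical tag values selecting which personality interprets a syscall.
TAG_LINUX = 0
TAG_WINDOWS = 1


def linux_write(fd, data):
    # Stand-in for a native handler; it only reports what it would do.
    return ("linux", "write", fd, len(data))


def nt_write_file(handle, data):
    # Stand-in for a Windows-personality handler.
    return ("windows", "NtWriteFile", handle, len(data))


# One table per personality. The numbers are placeholders for illustration.
LINUX_TABLE = {1: linux_write}
WINDOWS_TABLE = {1: nt_write_file}

# The disjoint union: the tag selects the table, so the same syscall
# number can mean different things without any ambiguity.
UNION_PERSONALITY = {TAG_LINUX: LINUX_TABLE, TAG_WINDOWS: WINDOWS_TABLE}


def dispatch(tag, number, *args):
    """Look up the handler for (tag, number) and call it."""
    handler = UNION_PERSONALITY[tag].get(number)
    if handler is None:
        return -errno.ENOSYS  # unknown syscall in this personality
    return handler(*args)
```

In this model the trampoline's only job is to pick the tag before issuing the call, e.g. `dispatch(TAG_WINDOWS, 1, 7, b"abc")`; the lookup itself is static.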

I separately want personalities to revive Capsicum/CloudABI on Linux. And the dynamic mechanism alone is a dubious way to do something as security-critical as that.

Emulating Windows system calls, take 2

Posted Aug 11, 2020 18:15 UTC (Tue) by farnz (subscriber, #17727) [Link] (1 responses)

Note that personalities already exist in Linux, as a kernel-side concept.

It looks like the only non-Linux personality that hasn't bitrotted to the point of removal is https://elixir.bootlin.com/linux/latest/source/arch/alpha/kernel/osf_sys.c

Emulating Windows system calls, take 2

Posted Aug 11, 2020 19:50 UTC (Tue) by Ericson2314 (guest, #139248) [Link]

Thanks for letting me know!

Emulating Windows system calls, take 2

Posted Aug 11, 2020 19:06 UTC (Tue) by Cyberax (✭ supporter ✭, #52523) [Link] (3 responses)

The problem with classic personalities is that they are very shallow. They can be used to adapt syscalls, but little else.

Fully emulating Windows requires much more infrastructure. Windows has completely different synchronization primitives, a different IO stack (IOCP!), and a different thread and process model.

You need a LOT of code to emulate all of this. You can put it into the kernel (and there were projects that basically put Wine into kernel mode), but the chances of it getting merged are nil.

Emulating Windows system calls, take 2

Posted Aug 11, 2020 19:50 UTC (Tue) by Ericson2314 (guest, #139248) [Link] (1 responses)

Yes, absolutely agreed. Being able to do "deep" stuff would basically require "composition over configuration", so an exokernel, to rein in the complexity. Great idea, but Linux absolutely is not that.

Emulating Windows system calls, take 2

Posted Aug 11, 2020 19:51 UTC (Tue) by Ericson2314 (guest, #139248) [Link]

That's why I brought up Capsicum / CloudABI. It's about the most interesting thing that can be done that is "shallow".

Emulating Windows system calls, take 2

Posted Jan 2, 2021 10:50 UTC (Sat) by ras (subscriber, #33059) [Link]

This is a very late comment I guess, but that's mostly because I still don't know what is being proposed.

> The problem with classic personalities is that they are very shallow. They can be used to adapt syscalls, but little else.

I presume that means the design of the personality API forces you to place the code in the kernel. "The code" is something using Linux to emulate the foreign syscall. I can tell you with reasonable authority[0] that's true.

Surprisingly, the code doesn't care where it lives. In fact it mostly doesn't change, even when the code is a kernel module moving to user space. I know that because I've done it [1].

I can also tell you that, from the perspective of the maintainer of said code, user space is the nicer place to be. Far, far, far nicer. To paint the picture, the kernel code I maintained has to do syscalls in some way. Why syscalls? Because they are the only interface in the kernel that's stable, and life's far too short to spend it migrating out-of-tree code to each kernel version. But getting to the syscall table was mission bloody impossible. The mechanism I inherited was unwinding the kernel stack, disassembling return addresses until you found the code that dispatched to you, then extracting the pointer to the syscall table from the previous instruction. (Sorry if I wrecked your meal.) This is of course not easy to port between kernel versions either, but at least you are porting just one obnoxious thing.

Turns out there is only one real challenge to moving it to user space. That is getting the kernel to deliver the syscall to some user space trampoline address. That ends up being one of two challenges. The first kind is when the syscall mechanism is what the Linux kernel uses. In that case it becomes 'redirect any syscall outside of the trampoline range to the trampoline'. This case is covered in the article.
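The 'redirect any syscall outside of the trampoline range' rule amounts to one address comparison. The sketch below models only that decision; it is not the kernel patch's code, and `TrampolineRange` and `route_syscall` are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrampolineRange:
    begin: int  # first address inside the trampoline
    end: int    # one past the last address inside the trampoline

    def contains(self, address: int) -> bool:
        return self.begin <= address < self.end


def route_syscall(caller_address: int, trampoline: TrampolineRange) -> str:
    """Decide who handles a syscall issued from caller_address.

    Calls made from inside the trampoline go to the kernel as usual;
    calls made from anywhere else are bounced to the trampoline.
    """
    if trampoline.contains(caller_address):
        return "kernel"
    return "trampoline"
```

For example, with a trampoline at `TrampolineRange(0x1000, 0x2000)`, a call from `0x1800` is passed through to the kernel and a call from `0x400000` is redirected.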

The 2nd kind is when the syscall mechanism is not what the kernel uses. A software interrupt is the obvious way, but the x86 / amd64 designers in particular have displayed commendable imaginative flair in this area. However, in every case I've seen the mechanism is illegal to do in user space, so either the kernel handles it, or it's SIGSEGV. So all you have to do is arrange for that mechanism (whatever it is) to boomerang to an address in the trampoline, with minimal overhead.

Handling the second case doesn't seem hard: a standardised API like syscall_trap(SYSCALL_TRAP_INT80, trampoline_trap, trampoline_begin, trampoline_end, flags), a kernel module for each SYSCALL_TRAP_??? mechanism. Job done.
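To make the shape of that proposal concrete, here is a toy user-space model of it. `syscall_trap` and `SYSCALL_TRAP_INT80` are the names from the comment above and do not exist in Linux; `TrapRegistry`, `load_module`, `deliver`, the error codes, and the rule that calls from inside the trampoline range are left alone are all assumptions made for this sketch.

```python
import errno

# Placeholder identifier for the "int 0x80" mechanism; the value is arbitrary.
SYSCALL_TRAP_INT80 = 0x80


class TrapRegistry:
    """Toy model: one 'module' per trap mechanism, one trap per mechanism."""

    def __init__(self):
        self._modules = set()  # mechanisms whose handler module is loaded
        self._traps = {}       # mechanism -> (trap, begin, end, flags)

    def load_module(self, mechanism):
        # Stands in for loading the kernel module for this mechanism.
        self._modules.add(mechanism)

    def syscall_trap(self, mechanism, trampoline_trap,
                     trampoline_begin, trampoline_end, flags=0):
        if mechanism not in self._modules:
            return -errno.ENOSYS  # assumed error: no module for mechanism
        if trampoline_begin >= trampoline_end:
            return -errno.EINVAL  # assumed error: empty or inverted range
        self._traps[mechanism] = (trampoline_trap, trampoline_begin,
                                  trampoline_end, flags)
        return 0

    def deliver(self, mechanism, caller_address):
        """Where a trapped instruction ends up in this model."""
        trap = self._traps.get(mechanism)
        if trap is None:
            return "SIGSEGV"  # nothing registered: the process faults
        trampoline_trap, begin, end, _flags = trap
        if begin <= caller_address < end:
            return "kernel"   # assumed: the trampoline itself is exempt
        return trampoline_trap  # boomerang to the registered address
```

A caller would load the mechanism, register once with `syscall_trap(SYSCALL_TRAP_INT80, trap_address, begin, end)`, and from then on every trapped instruction outside the range lands at `trap_address`.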

How the proposed mechanism does this is a bit of a mystery. I guess I should look at the code.

[0] http://ibcs64.sourceforge.net/
[1] http://ibcs-us.sourceforge.net/


Copyright © 2025, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds