Emulating Windows system calls, take 2

Posted Jul 17, 2020 16:46 UTC (Fri) by tnemeth (subscriber, #37648)
Parent article: Emulating Windows system calls, take 2

Are kernel developers knowingly avoiding the implementation of a skin / personality mechanism?

Emulating Windows system calls, take 2

Posted Jul 17, 2020 17:55 UTC (Fri) by NYKevin (subscriber, #129325) [Link] (6 responses)

Arguably, this *is* (part of) a skin / personality system, in much the same way as FUSE is (part of) a filesystem.

Emulating Windows system calls, take 2

Posted Jul 17, 2020 18:34 UTC (Fri) by tnemeth (subscriber, #37648) [Link] (5 responses)

Without a clear design it only looks like a kludge / dirty hack for just one single use case. Xenomai has this kind of feature (native, posix, ...). It could be a nice place to start looking for a similar architecture and do things "right" (IMHO: I'm not sure if I'm right here myself, not knowing the details :) ).

Even if I'm not a particular fan of being able to run what I think is a bunch of ugly Windows applications, I do recognize that some (like my children) would like to play Windows games and I would much prefer running them under Linux than under Windows itself. I'm clearly afraid of security implications of such a mechanism.

But if we go there, then why not add syscall support for other OSes like MacOS, Haiku, whateverOS...

So it is a, not-full-fledged, hiding its name, part of a skin mechanism. Sorry :) I like clear views, clear APIs... and grumbling a lot.

Emulating Windows system calls, take 2

Posted Jul 17, 2020 19:38 UTC (Fri) by krisman (subscriber, #102057) [Link] (4 responses)

> Without a clear design it only looks like a kludge / dirty hack for
> just one single use case. Xenomai has this kind of feature (native,
> posix, ...). It could be a nice place to start looking for a similar
> architecture and do things "right" (IMHO: I'm not sure if I'm right
> here myself, not knowing the details :) ).
>
> Even if I'm not a particular fan of being able to run what I think is a
> bunch of ugly Windows applications, I do recognize that some (like my
> children) would like to play Windows games and I would much prefer
> running them under Linux than under Windows itself. I'm clearly afraid
> of security implications of such a mechanism.

Can you forward any specific security concerns that are not already mitigated to the original thread? We definitely want to have a good look at any security implications of this feature. I expect that most issues would be mitigated by making it unable to cross fork/exec boundaries.

> But if we go there, then why not add syscall support for other OSes like
> MacOS, Haiku, whateverOS...

The goal is to provide an infrastructure for emulation in userspace. This means exactly that we don't need to go adding support for whateverOS in the kernel. :)

> So it is a, not-full-fledged, hiding its name, part of a skin
> mechanism. Sorry :) I like clear views, clear APIs... and grumbling a
> lot.

We try to solve most emulation issues in userspace, unless it really needs to be in the kernel (i.e. the stuff in personality(2)). I'd say to not expect a generic skin interface beyond specific features to solve pain points for userspace emulation.

But, saying we are hiding it is not fair. In fact, I called it a personality mechanism in my first submission, but we dropped that name to avoid confusion.
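
For readers who have not met personality(2), here is a minimal sketch of querying the current persona from user space. It assumes a Linux system whose C library exports personality(); the helper name current_persona is made up for this illustration, and the constants are copied from the man page and the personality header. It only reads the existing kernel-side setting and is unrelated to the new mechanism.

```python
import ctypes

# Documented in personality(2): passing 0xffffffff returns the current
# persona without changing it.
QUERY_PERSONA = 0xFFFFFFFF
PER_LINUX = 0x0000  # default Linux execution domain
PER_MASK = 0x00FF   # low byte selects the execution domain


def current_persona():
    """Return the calling process's persona as reported by personality(2).

    Hypothetical helper for illustration; assumes Linux and a libc that
    exports personality().
    """
    libc = ctypes.CDLL(None, use_errno=True)
    libc.personality.argtypes = [ctypes.c_ulong]
    libc.personality.restype = ctypes.c_int
    result = libc.personality(QUERY_PERSONA)
    if result == -1:
        raise OSError(ctypes.get_errno(), "personality(2) failed")
    return result


if __name__ == "__main__":
    persona = current_persona()
    print("execution domain:", persona & PER_MASK)
    print("flag bits:", hex(persona & ~PER_MASK))
```

On an ordinary Linux process the execution domain is usually PER_LINUX, with the upper bits carrying flags such as the one that disables address-space randomization.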

Emulating Windows system calls, take 2

Posted Jul 17, 2020 21:49 UTC (Fri) by tnemeth (subscriber, #37648) [Link] (2 responses)

> > I'm clearly afraid of security implications of such a mechanism.
>
> Can you forward any specific security concerns that are not already mitigated to the original
> thread? We definitely want to have a good look at any security implications of this feature. I
> expect that most issues would be mitigated by making it unable to cross fork/exec boundaries.

Of course not. This is just a /personal fear/. I have a profound distrust of anything that runs
under Windows. Not that I blindly trust software that runs on Linux, but I've seen so much
malware hidden in documents and images, and so many vulnerabilities even in Windows software made by
"security" teams (the last time I was affected was with Cisco Webex), that I can imagine
some ways will be explored to gain access to a Linux system through faulty Windows
programs.

> The goal is to provide an infrastructure for emulation in userspace. This means exactly that we
> don't need to go adding support for whateverOS in the kernel. :)

It's, indeed, better out of the kernel. So a fuse-like API for personalities :)

> But, saying we are hiding it is not fair. In fact, I called it a personality mechanism in my first
> submission, but we dropped that name to avoid confusion.

I'm sorry, I didn't mean to be unfair. I missed that point (I do not follow LKML anymore).

Thank you for clearing my mind :)

Emulating Windows system calls, take 2

Posted Jul 18, 2020 17:45 UTC (Sat) by smcv (subscriber, #53363) [Link]

> I have a profound distrust of anything that runs under Windows. Not that I blindly trust software that runs on Linux, but I've seen so much malware hidden in documents and images, and so many vulnerabilities even in Windows software made by "security" teams

Wine is already not a security mechanism. If you want to run Windows software with lower privilege than your normal login account, you'll need to run Wine in a less-privileged environment using container namespaces, LSMs and/or seccomp (for example a Flatpak, Snap or Docker container), as a separate uid, or in a virtual machine.

Emulating Windows system calls, take 2

Posted Jul 20, 2020 20:03 UTC (Mon) by plugwash (subscriber, #29694) [Link]

> It's, indeed, better out of the kernel. So a fuse-like API for personalities :)

This leads to the question of whether a "personality" implemented in userland should live in the same process as the foreign code or in a separate process.

There are pros to both approaches.

Pros of same process:

* The performance cost of switching context between processes is avoided.
* The emulation code can easily access data belonging to the foreign code through pointers.
* For a foreign platform (like Windows) where the "normal" interface is defined as a library ABI, not a syscall ABI, most calls don't have to go through the emulation process at all.

Pros of separate process:

* The foreign code cannot deliberately or accidentally mess with the emulation code.
* The foreign code can use the address space however it needs (Wine has to use some fairly dirty tricks to allow non-relocatable Windows binaries to be loaded in the required location).
* There is no need for a special mechanism to switch back and forth between regular syscall mode and foreign syscall mode.
* The system could potentially be used for security sandboxing as well as foreign code support.

Emulating Windows system calls, take 2

Posted Jul 23, 2020 13:51 UTC (Thu) by nix (subscriber, #2304) [Link]

> Can you forward any specific security concerns that are not already mitigated to the original thread? We definitely want to have a good look at any security implications of this feature. I expect that most issues would be mitigated by making it unable to cross fork/exec boundaries.

Quite. This has no more security implications than a signal handler (to which it is very similar), so as long as it takes the same approach (reset signal handlers on address space reset at exec() time) we should be fine. Sure, if you use this mechanism *wrong* all sorts of things can happen, but the same is true of signal handlers and indeed of general bugs in code. The implications are no worse, except that making a mistake here is more likely to be obvious because it will probably break the program's use of lots of syscalls at once :)
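
The signal-handler behaviour being used as the yardstick here can be checked from user space. This is a minimal sketch, assuming a POSIX system and a call from the main thread; the function name is made up for the illustration, and it demonstrates ordinary signal handlers only, not the proposed mechanism.

```python
import signal
import subprocess
import sys


def child_sees_default_handler():
    """Install a SIGUSR1 handler, then ask an exec'd child what it inherited.

    Hypothetical helper; must be called from the main thread because
    signal.signal() requires it.
    """
    previous = signal.signal(signal.SIGUSR1, lambda signum, frame: None)
    try:
        # The child is a fresh interpreter started via fork + exec.
        probe = (
            "import signal;"
            "print(signal.getsignal(signal.SIGUSR1) == signal.SIG_DFL)"
        )
        out = subprocess.run(
            [sys.executable, "-c", probe],
            capture_output=True,
            text=True,
            check=True,
        ).stdout.strip()
        return out == "True"
    finally:
        # Put the parent's original disposition back, if Python knows it.
        if previous is not None:
            signal.signal(signal.SIGUSR1, previous)


if __name__ == "__main__":
    print("handler reset across exec:", child_sees_default_handler())
```

The parent's handler lives in the parent's address space, so the kernel resets caught signals to their default disposition at exec() time; a dispatch mechanism that is dropped at the same point would leave a freshly exec'd program in the same clean state.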

Emulating Windows system calls, take 2

Posted Jul 21, 2020 13:28 UTC (Tue) by Funcan (subscriber, #44209) [Link] (10 responses)

I'd say that they're implementing 'personalities' the same way they did containers - provide the facilities known to be needed right now, and let anybody build a 'product' around them - adding new features as needed. This allows multiple parallel efforts at producing a 'product' and iteration, rather than trying to design everything in advance, which history suggests works badly.

Emulating Windows system calls, take 2

Posted Jul 23, 2020 13:53 UTC (Thu) by nix (subscriber, #2304) [Link] (2 responses)

Threads were added the same way, and the result was eventually a best-in-class threading implementation which does not fill the kernel with complexity related to one specific threading model. Seems like a good approach to me too (though if you'd asked me back in the year 2000 when LinuxThreads was the only threading implementation out there, I might have said something different!)

Emulating Windows system calls, take 2

Posted Jul 23, 2020 21:02 UTC (Thu) by BenHutchings (subscriber, #37955) [Link] (1 responses)

This is not what happened with threads. The kernel APIs that supported LinuxThreads were not sufficient to build a compliant POSIX threads implementation. The New POSIX Threads Library (NPTL) that eventually replaced LinuxThreads required additional kernel APIs.

Emulating Windows system calls, take 2

Posted Jul 24, 2020 22:08 UTC (Fri) by nix (subscriber, #2304) [Link]

It did -- but this wasn't done the way (say) Solaris did it, by piling the entire threading abstraction into the kernel and complicating the bejeezus out of everything. Instead we got away with a few relatively simple and noninvasive things: thread groups, a bit of tweaking to signal handling (which was painful, sure, but much less painful than almost everything else around signal handling already was and is), futexes, and some nice souping up to PID allocation so you could have silly numbers of them.

(And a good few of those abstractions were, I vaguely recall, there *already*, and used by LinuxThreads: tgids, for instance. I'm sure futexes were new for NPTL though.)

Emulating Windows system calls, take 2

Posted Aug 11, 2020 15:50 UTC (Tue) by Ericson2314 (guest, #139248) [Link] (6 responses)

I think there should be both. This is a *dynamic* way to control the meaning of syscalls; personalities are a *static* way.

In general, it's not good to force the use of dynamic solutions to static needs, even though the dynamic one is strictly more expressive. It's good to be static where possible because it better conveys intent, is simpler to reason about, especially w.r.t. security, and can be better optimized.

I think Linux should have this and personalities. For example, there could be a native personality, a Windows personality (via a kernel module, let's say), and a way to disjoint-union (sum) personalities via name-spacing (e.g. a tag bit) somewhere in memory or registers. Then Wine can use this in combination with the native+Windows union personality, and the trampoline just needs to set the Windows-or-Linux bit, letting the kernel do the rest.
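
To make the tag-bit idea concrete, here is a toy model in Python. Everything in it is hypothetical: the table layout, the handler names and the syscall numbers are invented for the illustration, and nothing like this exists in Linux today. The only point is that two static tables can reuse the same numbers once a tag selects between them.

```python
import errno

# Hypothetical tag values; a real design might keep this bit in a register
# or in a byte of user memory.
TAG_NATIVE = 0
TAG_WINDOWS = 1


def native_write(fd, data):
    return ("native", "write", fd, len(data))


def windows_write_file(handle, data):
    return ("windows", "NtWriteFile", handle, len(data))


# Each personality is a static table from syscall number to handler.
# The numbers are illustrative and deliberately collide.
NATIVE_TABLE = {1: native_write}
WINDOWS_TABLE = {1: windows_write_file}

# Disjoint union: keys are (tag, number), so colliding numbers stay distinct.
UNION_TABLE = {}
UNION_TABLE.update({(TAG_NATIVE, nr): fn for nr, fn in NATIVE_TABLE.items()})
UNION_TABLE.update({(TAG_WINDOWS, nr): fn for nr, fn in WINDOWS_TABLE.items()})


def dispatch(tag, nr, *args):
    """Look up a handler by (tag, number); unknown pairs return -ENOSYS."""
    handler = UNION_TABLE.get((tag, nr))
    if handler is None:
        return -errno.ENOSYS
    return handler(*args)


if __name__ == "__main__":
    print(dispatch(TAG_NATIVE, 1, 3, b"hi"))
    print(dispatch(TAG_WINDOWS, 1, 7, b"abc"))
```

Because the union table is fixed up front, the set of reachable handlers can be inspected before any foreign code runs, which is the "static" property argued for above.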

I separately want personalities to revive Capsicum/CloudABI on Linux. And the dynamic mechanism alone is a dubious way to do something as security-critical as that.

Emulating Windows system calls, take 2

Posted Aug 11, 2020 18:15 UTC (Tue) by farnz (subscriber, #17727) [Link] (1 responses)

Note that personalities already exist in Linux, as a kernel-side concept.

It looks like the only non-Linux personality that hasn't bitrotted to the point of removal is https://elixir.bootlin.com/linux/latest/source/arch/alpha/kernel/osf_sys.c

Emulating Windows system calls, take 2

Posted Aug 11, 2020 19:50 UTC (Tue) by Ericson2314 (guest, #139248) [Link]

Thanks for letting me know!

Emulating Windows system calls, take 2

Posted Aug 11, 2020 19:06 UTC (Tue) by Cyberax (✭ supporter ✭, #52523) [Link] (3 responses)

The problem with classic personalities is that they are very shallow. They can be used to adapt syscalls, but little else.

Fully emulating Windows requires much more infrastructure. Windows has completely different synchronization primitives, different IO stack (IOCP!), and a different thread and process model.

You need a LOT of code to emulate all of this. You can put it into the kernel (and there were projects that basically put Wine into kernel mode), but the chances of it getting merged are nil.

Emulating Windows system calls, take 2

Posted Aug 11, 2020 19:50 UTC (Tue) by Ericson2314 (guest, #139248) [Link] (1 responses)

Yes, absolutely agreed. Being able to do "deep" stuff would basically require "composition over configuration", so an exokernel, to rein in the complexity. Great idea, but Linux absolutely is not that.

Emulating Windows system calls, take 2

Posted Aug 11, 2020 19:51 UTC (Tue) by Ericson2314 (guest, #139248) [Link]

That's why I brought up Capsicum / CloudABI. It's about the most interesting thing that can be done that is "shallow".

Emulating Windows system calls, take 2

Posted Jan 2, 2021 10:50 UTC (Sat) by ras (subscriber, #33059) [Link]

This is a very late comment I guess, but that's mostly because I still don't know what is being proposed.

> The problem with classic personalities is that they are very shallow. They can be used to adapt syscalls, but little else.

I presume that means the design of the personality API forces you to place the code in the kernel. "The code" is something using Linux to emulate the foreign syscall. I can tell you with reasonable authority[0] that's true.

Surprisingly, the code doesn't care where it lives. In fact it mostly doesn't change, even if it was a kernel module and is moving to user space. I know that because I've done it [1].

I can also tell you from the maintainer of the said code's perspective, user space is the nicer place to be. Far, far, far nicer. To paint the picture, the kernel code I maintained has to do syscalls in some way. Why syscalls? Because they are the only interface in the kernel that's stable, and life's far too short to spend it migrating out-of-tree code to each kernel version. But getting to the syscall table was mission bloody impossible. The mechanism I inherited was unwinding the kernel stack, disassembling return addresses until you found the code that dispatched to you, then extracting the pointer to the syscall table from the previous instruction. (Sorry if I wrecked your meal.) This is of course not easy to port between kernel versions either, but at least you are porting just one obnoxious thing.

Turns out there is only one real challenge to moving it to user space. That is getting the kernel to deliver the syscall to some user space trampoline address. That ends up being one of two challenges. The first kind is when the syscall mechanism is what the Linux kernel uses. In that case it becomes 'redirect any syscall outside of the trampoline range to the trampoline'. This case is covered in the article.
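
The 'redirect any syscall outside of the trampoline range' rule for the first kind can be modelled in a few lines. This is a toy sketch of the decision only, with made-up names and addresses; it is not taken from the patch set and says nothing about how the kernel actually delivers the redirect.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrampolineRange:
    """Hypothetical description of where the emulator's own code lives."""

    begin: int  # first address inside the trampoline
    end: int    # one past the last address inside the trampoline

    def contains(self, address):
        return self.begin <= address < self.end


def route_syscall(instruction_pointer, trampoline):
    """Decide where a syscall issued at instruction_pointer should go."""
    if trampoline.contains(instruction_pointer):
        # The emulator's own syscalls must reach the real kernel,
        # otherwise it could never do any work on behalf of the guest.
        return "kernel"
    # Anything else is foreign code and is bounced to the emulator.
    return "trampoline"


if __name__ == "__main__":
    tramp = TrampolineRange(begin=0x7000_0000, end=0x7000_1000)
    print(route_syscall(0x7000_0010, tramp))
    print(route_syscall(0x0040_1234, tramp))
```

The half-open range keeps the check to two comparisons per syscall, which matters because it sits on the path of every system call the process makes.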

The 2nd kind is when the syscall mechanism is not what the kernel uses. A software interrupt is the obvious way, but the x86 / amd64 designers in particular have displayed commendable imaginative flair in this area. However, in every case I've seen the mechanism is illegal to do in user space, so either the kernel handles it, or it's SIGSEGV. So all you have to do is arrange for that mechanism (whatever it is) to boomerang to an address in the trampoline, with minimal overhead.

Handling the second case doesn't seem hard: a standardised API like syscall_trap(SYSCALL_TRAP_INT80, trampoline_trap, trampoline_begin, trampoline_end, flags), a kernel module for each SYSCALL_TRAP_??? mechanism. Job done.

How the proposed mechanism does this is a bit of a mystery. I guess I should look at the code.

[0] http://ibcs64.sourceforge.net/
[1] http://ibcs-us.sourceforge.net/