November 20, 2006
This article was contributed by John Richard Moser
The IT industry and the open source community both currently enjoy a
healthy want for security, a growing passion that has brought about new
security tools and even some new programming languages. It isn't always easy
to get all of these things working together; virtual machines such as Mono, for example, have
difficulty with the memory space policies enforced by PaX or SELinux. Some
implementations of the CLI virtual machine may not function at all under
these security protections; running without them leaves the process exposed
to flaws in native code called from C# programs or from the virtual machine
itself.
The C# programming language is gaining popularity, and has been used to
write programs such as Beagle, F-Spot, and Banshee. It is also a supported
language for development in the GNOME environment. C# has strong type
checking, array bounds checking, detection of attempts to use uninitialized
variables, and automatic
garbage collection, making it both type-safe and memory-safe; these
aspects make it an attractive language for developers who want to sidestep
manual memory management and just get their programs working.
C# programs are typically compiled to Common Intermediate Language (CIL), a
bytecode language designed to be run inside a virtual machine implementing
the Common
Language Infrastructure (CLI). Bytecode languages are similar to
machine-level instructions, except they're not hosted on a physical CPU;
effectively they are CPU architectures that are only run on emulators.
Another familiar example of this is the Java platform, the typical target of
the Java programming language.
The most naive approach to bytecode execution is to use an interpreter.
Interpreters read each instruction in the program as executed; determine what
the instruction is; and then modify the state of the virtual machine as
needed, changing memory values or the program execution point. Interpreters
execute dozens of native instructions each time they process a single
bytecode instruction, so programs run very slowly; all but the simplest are
irritatingly sluggish.
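As a rough illustration of that fetch, decode, and update cycle, here is a minimal sketch of an interpreter loop for a toy stack machine. The opcodes and the `interpret` function are invented for this example and are not real CIL; the point is only that every bytecode instruction costs a trip through the dispatch chain.

```python
# Toy stack-machine opcodes, invented for illustration (not real CIL).
PUSH, ADD, MUL, JUMP_IF_ZERO, HALT = range(5)


def interpret(program):
    """Run a list of (opcode, operand) pairs and return the final stack."""
    stack = []
    pc = 0  # the program execution point
    while True:
        opcode, operand = program[pc]  # fetch the next instruction
        pc += 1
        # Decode the instruction, then modify the virtual machine state.
        if opcode == PUSH:
            stack.append(operand)
        elif opcode == ADD:
            right, left = stack.pop(), stack.pop()
            stack.append(left + right)
        elif opcode == MUL:
            right, left = stack.pop(), stack.pop()
            stack.append(left * right)
        elif opcode == JUMP_IF_ZERO:
            if stack.pop() == 0:
                pc = operand
        elif opcode == HALT:
            return stack
        else:
            raise ValueError(f"unknown opcode {opcode}")


# (2 + 3) * 4
program = [(PUSH, 2), (PUSH, 3), (ADD, None), (PUSH, 4), (MUL, None), (HALT, None)]
result = interpret(program)
```

Even this tiny loop performs a tuple unpack, several comparisons, and list operations for each bytecode instruction, which is the overhead described above.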
Virtual machines often use a technique called Just-in-Time compilation
(JIT) to improve performance. Rather than interpret, JIT compilers generate
equivalent native code from the bytecode they encounter; in essence, they
translate each part of the program to native code the first time it is run.
Because of this, the continuous interpreter cost becomes a series of short
one-time compilation costs, which in most cases go unnoticed.
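The "translate once, run many times" idea can be sketched by analogy. In the example below, generated Python source stands in for native code, and the opcodes and the `translate` helper are assumptions made up for the illustration; a real JIT emits machine instructions into memory instead.

```python
# Toy opcodes, invented for illustration (not real CIL).
PUSH, LOAD_X, ADD, MUL = range(4)


def translate(program):
    """Translate straight-line bytecode once into a callable taking x."""
    exprs = []  # translation-time stack of expression strings
    for opcode, operand in program:
        if opcode == PUSH:
            exprs.append(repr(operand))
        elif opcode == LOAD_X:
            exprs.append("x")
        elif opcode in (ADD, MUL):
            right, left = exprs.pop(), exprs.pop()
            symbol = "+" if opcode == ADD else "*"
            exprs.append(f"({left} {symbol} {right})")
        else:
            raise ValueError(f"unknown opcode {opcode}")
    # One-time cost: build and compile the equivalent function.
    source = f"def compiled(x):\n    return {exprs.pop()}\n"
    namespace = {}
    exec(compile(source, "<translated>", "exec"), namespace)
    return namespace["compiled"]


# x * x + 1, translated a single time and then called directly.
square_plus_one = translate(
    [(LOAD_X, None), (LOAD_X, None), (MUL, None), (PUSH, 1), (ADD, None)]
)
```

After translation, each call to `square_plus_one` runs without any per-instruction dispatch, which mirrors how the interpreter cost turns into a one-time compilation cost.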
The first time I wrote for LWN, I authored a short article on security-improving technologies that
could be deployed now. Since then, these and other technologies have
become more prevalent; ProPolice is part of gcc, and some of the concepts behind PaX and grsecurity are now integrated into products
such as Exec Shield and SELinux. SELinux has policy elements
that can be applied to almost exactly mimic the behavior of
mprotect() under PaX.
Briefly put, both PaX and SELinux supply a set of protections that prevent
programs from executing any memory that could have ever been directly altered
by the program itself. A typical exploit technique is to use a flaw in a
program to cause it to execute an area of memory an attacker loaded with
code; with these restrictions, this attack is no longer possible. Attackers
are then forced to resort to executing existing code out of order, which is a
blind shot at a moving target due to address space
randomization.
These protections are highly significant; however, they interfere in an
unfortunate way with programs that run on Just-in-Time (JIT) compilers such
as the one used in Mono. The JIT needs to write code into memory and then
execute it, but the security system won't allow code generated at runtime
to run. Since the interpreter is far too slow to be useful, the only
real option is to disable the security mechanisms that interfere with the
JIT.
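One way to see the conflict on a given Unix system is to ask the kernel for the kind of memory a JIT wants: an anonymous region that is writable and executable at the same time. The sketch below only probes whether such a request is granted; the function name is made up for this example, and how a denial is reported may differ between protection systems.

```python
import mmap


def can_map_writable_executable(length=mmap.PAGESIZE):
    """Return True if an anonymous read/write/execute mapping is granted."""
    try:
        region = mmap.mmap(
            -1,  # no backing file: anonymous memory
            length,
            flags=mmap.MAP_PRIVATE | mmap.MAP_ANONYMOUS,
            prot=mmap.PROT_READ | mmap.PROT_WRITE | mmap.PROT_EXEC,
        )
    except OSError:
        # The request was refused, for example by a memory protection policy.
        return False
    region.close()
    return True


jit_memory_allowed = can_map_writable_executable()
```

A `True` result only means the request succeeded on this system; it says nothing about which policy, if any, is in force.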
The Common
Language Infrastructure (CLI) allows for managed code to access unmanaged
code; in other words, C# code can call plain old C libraries, making the
program as a whole vulnerable to flaws that can't exist in C#. The
implementation of the virtual machine is also a factor: Mono implements Web
browser features using Mozilla's Gecko rendering engine, and Java
implementations can, for example, use libpng bindings to supply PNG image
handling rather than a full managed rewrite.
Listed below are a couple of popular Mono applications (C# and other CLI
applications that run on Mono) that use native libraries, along with some of
those libraries that have had significant security holes allowing remote
code execution.
- Banshee, a music
player that handles a variety of formats.
- F-Spot, a personal photo
management application for GNOME.
- libpng is used in F-Spot as well, for much more
than just album covers.
- zlib is also used in F-Spot.
- libxml2 was the subject of CVE-2004-0989
and CVE-2004-0110.
With this potential for vulnerability, it would be attractive to find a
solution for executing Mono without using the JIT. To execute CLI
applications without a JIT, Mono would have to provide a method of executing
assemblies without rewriting them into native code at runtime. This method
would have to function both for typical CIL code and for dynamic
assembly. Dynamic assembly is used to generate CIL bytecode at runtime,
which is then executed by Mono with the help of the JIT. The Cecil debugger,
IronPython, and the IKVM Java runtime are examples of programs that use
dynamic assembly to execute whole programs.
The most naive method would be to switch back to the interpreter.
Unfortunately we've already established that the interpreter is extremely
slow, requiring dozens of cycles to complete even the simplest addition or
variable assignment. Even if the interpreter didn't have such prohibitive
performance issues, it's not really
supported anywhere the JIT works, and isn't actively maintained.
Another possibility is to use the Ahead-of-Time (AOT) compiler to run Mono
programs. The AOT compiler translates Mono assemblies to native code and
stores them as shared libraries. AOT modules can be cached, verified, and
updated as needed. This allows Mono to dlopen() the generated code and
execute it like any other library. This not only eliminates runtime code
generation, but also increases code sharing between applications, reducing
overall system memory usage. Unfortunately, dynamic assembly doesn't work
with AOT, because dynamically generated code cannot be cached and verified
later.
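Loading precompiled code through dlopen() involves no runtime code generation at all; the pages come straight from a file on disk. The sketch below shows that path using Python's ctypes, with the system C library standing in for an AOT module. That substitution is an assumption for the sake of a runnable example, since the name and entry points of a real AOT module depend on the assembly it was built from.

```python
import ctypes
import ctypes.util


def load_native(name, fallback):
    """dlopen() a shared library by short name, or by a fallback soname."""
    # find_library() may return None on minimal systems, hence the fallback.
    path = ctypes.util.find_library(name) or fallback
    return ctypes.CDLL(path)


# The C library is only a stand-in for a cached, precompiled module.
libc = load_native("c", "libc.so.6")
libc.abs.argtypes = [ctypes.c_int]
libc.abs.restype = ctypes.c_int
value = libc.abs(-5)
```

Because the library is mapped from a file, the same read-only code pages can be shared by every process that loads it.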
Ulrich Drepper described a method of double-mapping a file, in which the
same memory is available in two different places under two different
permission sets. The file is created, opened, and unlinked so no other
program can alter it; then mmap() is used to make two shared mappings, one
writable and one executable. This would work, but it
would also increase disk access and use more of the task's virtual address
space. It would also still allow a very obscure, unlikely, but possible
method for directly introducing code into a program's address space and
executing it successfully.
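The steps of the double-mapping technique can be sketched as follows. This is a minimal illustration for Unix systems, not Drepper's own code: the `double_map` name and its parameters are invented here, the bytes written are arbitrary and never executed, and the executable mapping will be refused if the temporary directory is mounted noexec or a policy forbids it.

```python
import mmap
import os
import tempfile


def double_map(length=mmap.PAGESIZE, directory=None):
    """Return (writable, executable) mappings of one unlinked file."""
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        os.unlink(path)           # no name is left for other programs to open
        os.ftruncate(fd, length)  # give the file pages that can be mapped
        writable = mmap.mmap(
            fd, length, flags=mmap.MAP_SHARED,
            prot=mmap.PROT_READ | mmap.PROT_WRITE,
        )
        executable = mmap.mmap(
            fd, length, flags=mmap.MAP_SHARED,
            prot=mmap.PROT_READ | mmap.PROT_EXEC,
        )
    finally:
        os.close(fd)              # the mappings keep the file alive
    return writable, executable


writable, executable = double_map()
writable[:4] = b"\xde\xad\xbe\xef"  # arbitrary bytes, never executed here
visible = executable[:4]            # same bytes, seen through the other view
```

No single mapping is ever both writable and executable, yet anything stored through the first view is immediately visible through the second.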
Currently there doesn't seem to be an obviously good way to get Mono
to run without runtime code generation. The interpreter is too slow; AOT
doesn't cover dynamic assembly; and Drepper's method of double-mapping a file
creates more disk access. Hybrid methods such as AOT with double-mapping for
dynamic assemblies are also possible, reducing the severity of some of the
drawbacks. By combining these methods, varying degrees of immunity to remote
code execution are afforded with corresponding cost trade-offs.
It is interesting to note that double-mapping a file would prevent policy
from being used to restrict the program to mapping only system libraries and
a global AOT cache. Apart from the unlikely special case with double-mapping,
enhanced memory protections will guarantee that an attacker cannot directly
introduce code into a running program; however, attacks that use
return-to-libc chains can still create, mmap(), and execute a file.
To prevent this, one could restrict executable file-backed mappings to
directories only the system administrator can write to, such as system
libraries and a global AOT cache; of course, this would break
double-mapping.
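The rule itself is simple to state in code. The sketch below is a userspace illustration only; real enforcement would live in kernel policy, and the directory list (including the AOT cache path) and the function name are hypothetical.

```python
import os

# Hypothetical trusted locations; a real policy would define its own list.
TRUSTED_DIRS = ("/lib", "/usr/lib", "/var/cache/mono-aot")


def may_map_executable(path, trusted_dirs=TRUSTED_DIRS):
    """Return True if path resolves to a location inside a trusted directory."""
    real = os.path.realpath(path)  # resolve symlinks and ".." components
    for trusted in trusted_dirs:
        root = os.path.realpath(trusted)
        if os.path.commonpath([real, root]) == root:
            return True
    return False


allowed = may_map_executable("/usr/lib/libexample.so")
denied = may_map_executable("/tmp/attacker-supplied.so")
```

A file created by an attacker in a world-writable directory fails the check, and so would the temporary file that double-mapping relies on.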
I cannot predict the implications of these facts for trusted systems and
the applications of C# and Mono in high-security environments. For my own
purposes, I would prohibit the use of Mono programs in environments with
strong security requirements. From my perspective, manually auditing all
native code in both the Mono virtual machine and the native libraries used
by Mono applications is costly and error-prone, and simply does not supply
enough value; it is much easier to utilize protections against classes of
vulnerabilities than to prove that applications do not need said protections.
Your mileage may vary.