
Remote imports for Python?

By Jake Edge
August 30, 2017

Importing a module into a Python program is a pretty invasive operation; it directly runs code in the current process that has access to anything the process can reach. So it is not wildly surprising that a suggestion to add a way to directly import modules from remote sites was met with considerable doubt—if not something approaching hostility. It turns out that the person suggesting the change was not unaware of the security implications of the idea, but thought it had other redeeming qualities; others in the discussion were less sanguine.

In his first post to the python-ideas mailing list, security researcher John Torakis proposed imports via HTTP and HTTPS be added as a core Python feature. He also filed an enhancement request; both the post and the bug pointed to his httpimport repository on GitHub that has a prototype implementation. The justification he cited was always likely to set off some alarm bells:

My proposal is that this module can become a core Python feature, providing a way to load modules even from Github.com repositories, without the need to "git clone - setup.py install" them.

Other languages, like golang, provide this functionality from their early days (day one?). Python development can be greatly improved if a "try before pip installing" mechanism gets in place, as it will add a lot to the REPL [read-eval-print loop] nature of the testing/experimenting process.

Chris Angelico vehemently opposed the feature, at least for core Python: "This is a security bug magnet; can you imagine trying to ensure that malicious code is not executed, in an arbitrary execution context?" If the feature is explicitly enabled (via, say, pip), it is much less worrisome, Angelico said. The idea of allowing imports over regular HTTP is one that should be dropped, he said; even HTTPS imports would require being "absolutely sure that your certificate chains are 100% dependable".

Oleg Broytman also opposed the idea, suggesting that it would require a Python Enhancement Proposal (PEP), instead of simply filing an enhancement request, to truly be considered. He also noted that there is a difference for Go's remote imports: those happen at compile time for Go, while they would be done at runtime for Python.

The README for Torakis's httpimport repository mentions it being used as a "staging protocol for covertutils backdoors". Torakis's covertutils project is described as a "framework for backdoor programming"; that also got Angelico's attention:

But I'm not entirely sure I want to support this. You're explicitly talking about using this with the creation of backdoors... in what, exactly? What are you actually getting at here?

Torakis responded to those complaints (in a posting with non-standard quoting). The backdoor work is evidently part of his day job and httpimport could be useful for that work, especially for rapid prototyping, testing, and debugging purposes. He did agree that HTTP imports are dangerous, but noted that doing so locally (i.e. only to localhost or perhaps trusted local systems) could be useful for testing. He also objected to the complaint about HTTPS, however, noting that certificate checks would be done to eliminate the man-in-the-middle threat. In another posting, he put it this way: "if you can't trust your certificate store now, and you are afraid of Remote code execution through HTTPS, stop using pip altogether".
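The distinction drawn in the thread (HTTPS anywhere, plain HTTP only for local testing) can be written down as a small URL check. The following is only a sketch of such a policy, not code from httpimport; the function name `is_allowed_import_url` and the `LOCAL_HOSTS` set are assumptions made for illustration.

```python
from urllib.parse import urlsplit

# Hypothetical policy: hosts for which unencrypted HTTP is tolerated.
LOCAL_HOSTS = {"localhost", "127.0.0.1", "::1"}


def is_allowed_import_url(url):
    """Return True if the URL is acceptable under this illustrative policy."""
    parts = urlsplit(url)
    host = parts.hostname or ""  # lowercased, IPv6 brackets stripped
    if parts.scheme == "https":
        # Certificate verification itself is left to the HTTP client.
        return bool(host)
    if parts.scheme == "http":
        # Plain HTTP only for the local machine.
        return host in LOCAL_HOSTS
    return False
```

A check like this says nothing about whether the code behind an accepted URL is trustworthy; it only narrows where the man-in-the-middle risk can arise.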

While indicating that httpimport would make an excellent addition to the Python Package Index (PyPI), Paul Moore pointed out another flaw with the idea of making the feature a core part of the language: it removes the ability for an organization's security team to disallow it. On the other hand, the team could restrict the ability to install modules from PyPI or simply blacklist certain entries such as httpimport. He said:

[...] whereas with a core module, it's there, like it or not, and *all* Python code has to be audited on the assumption that it might be used. I could easily imagine cases where the httpimport module was allowed on development machines and CI servers, but forbidden on production (and pre-production) systems. That option simply isn't available if the feature is in the core.

This is not the first time this idea has come up, Guido van Rossum noted; it was first proposed (and rejected) in 1995 or so—before HTTPS was invented, he said. He may be misremembering either the date or the status of HTTPS, as Wikipedia gives 1994 for its creation, though it surely was not in widespread use by 1995. As with others in the thread, Van Rossum was happy to see a third-party httpimport available to those who need it, but it is too much of a security concern to ever consider adding to the standard library.

Torakis said that he was two years old at the time of that decision and that "times have changed". He said that he is willing to make changes to make httpimport acceptable for the standard library. It would make working with Python much easier for him and others:

I'm talking about the need to rapidly test public code. I insist that testing code available on Github (or other repos), without the venv/clone/install hassle is a major improvement in my (and most sec researchers' I know) Python workflow. It makes REPL prototyping million times smoother. We all have created small scripts that auto load modules from URLs anyway.
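For readers who have not seen one, a script of the kind Torakis mentions might look roughly like the sketch below. It is not the httpimport implementation; the function names are hypothetical, and only standard-library calls (`urllib.request.urlopen`, `types.ModuleType`, `compile`, `exec`) are used.

```python
import sys
import types
import urllib.request


def fetch_source(url):
    """Download module source as text (hypothetical helper)."""
    with urllib.request.urlopen(url) as response:
        return response.read().decode("utf-8")


def import_from_source(name, source, origin="<remote>"):
    """Create a module object and run the given source in its namespace."""
    module = types.ModuleType(name)
    module.__file__ = origin
    # This runs arbitrary code with the full privileges of the process.
    exec(compile(source, origin, "exec"), module.__dict__)
    sys.modules[name] = module
    return module


def import_from_url(name, url):
    """Fetch source from a URL and load it as a module named `name`."""
    return import_from_source(name, fetch_source(url), origin=url)
```

The `exec()` call is the entire security question in miniature: whatever the server returns runs immediately, which is why the thread focused on who controls the URL and the connection.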

There was no support for adding it as a core feature in the thread, though. A simple "pip install httpimport" is all that would be needed to get access to the feature (once Torakis gets it added to PyPI, anyway). So some thread participants wondered why it was so imperative that it become part of the standard library. As Stephen J. Turnbull put it: "It's an attractive nuisance unless you're a security person, and then pip is not a big deal." Echoing Moore to some extent, Nick Coghlan added another concern:

[...] donning my commercial redistributor hat: it already bothers some of our (and our customers') security folks that we ship package installation tools that access unfiltered third party package repositories by default (e.g. pip defaulting to querying PyPI).

As a result, I'm pretty sure that even if upstream said "httpimport is in the Python standard library now!", we'd get explicit requests asking us to take it out of our redistributed version and make it at most an optional install (similar to what we do with IDLE and Tcl/Tk support in general).

There is at least one popular scripting language that makes this feature available as part of the language: PHP. Originally, directives like include and require could contain a URL; code would be retrieved from that URL, then executed. Over time, the wisdom of that choice has been questioned; these days, two configuration options (allow_url_fopen and allow_url_include) govern the behavior and remote inclusion is disallowed by default. While many scripting-style languages have ways to accomplish remote inclusion, it is considered something of a trap for the unwary, thus not elevated to be a top-level language feature.

Something that bears noting from the discussion is that installing code over HTTPS from PyPI is no more (or less) dangerous than doing so from GitHub—at least from a man-in-the-middle perspective. There is still the danger of using code from a malicious GitHub repository but, in truth, the same problem exists for PyPI. There is no active vetting of either GitHub repositories or packages uploaded to PyPI. HTTPS can ensure that you are connecting to the server holding the proper key, but it cannot protect you from asking for the wrong thing from that server.

As Torakis's age post indicates, there may be something of a generation gap surrounding this issue. The GitHub-centric, rapid-fire development style meets the grizzled graybeards who still sport some of the scars of security issues past. It is certainly true that it is easy enough to add remote imports to a program (via httpimport or something hand-rolled), but the idea is that programmers will have reached a certain level of understanding when they get to that point—hopefully enough to recognize the dangers of doing so. In any case, by not making it a top-level, supported feature, abuse of it is not the responsibility of the Python core team. Avoiding that kind of "attractive nuisance" (and the bugs it can spawn) is another lesson that the Python graybeards have learned along the way.




Remote imports for Python?

Posted Aug 30, 2017 2:57 UTC (Wed) by bferrell (subscriber, #624) [Link]

I'm guessing the lesson from "deprecated" javascript module was lost on this person.

Remote imports for Python?

Posted Aug 30, 2017 3:39 UTC (Wed) by luto (guest, #39314) [Link] (9 responses)

I'm surprised no one has suggested doing this but making it mandatory for the import call to include a hash of the module.
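A hash-pinned import along these lines could be sketched as follows. This is an illustration only: `import_pinned` is a hypothetical name, SHA-256 is an assumed choice of hash, and the source bytes are passed in directly so that the fetching step stays out of the picture.

```python
import hashlib
import types


def import_pinned(name, source_bytes, expected_sha256):
    """Execute source_bytes as module `name` only if its SHA-256 matches."""
    actual = hashlib.sha256(source_bytes).hexdigest()
    if actual != expected_sha256.lower():
        # Refuse to run anything whose content differs from the pinned hash.
        raise ImportError(f"hash mismatch for {name}: got {actual}")
    module = types.ModuleType(name)
    exec(compile(source_bytes, f"<pinned {name}>", "exec"), module.__dict__)
    return module
```

With the hash written into the calling code, a changed or tampered file fails to load instead of running, regardless of how it was transported.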

Remote imports for Python?

Posted Aug 30, 2017 5:04 UTC (Wed) by josh (subscriber, #17465) [Link] (3 responses)

Likewise, that would be quite helpful.

Or the hash of a public key matching the private key the module is signed with.

Remote imports for Python?

Posted Aug 30, 2017 6:53 UTC (Wed) by mokki (subscriber, #33200) [Link] (1 responses)

Java has tried to support this for 20+ years with signed remote packages and a sandbox.

99% of security bugs reported against java in the last 5 years have been about remote code escaping the sandbox.

Why would python want that security circus? It just makes the language seem bad when actually 99.9% of code never uses the remote execute feature and thus does not even enable the sandbox.

Enabling this in python without sandbox would be security nightmare and supporting a sandbox is known to be security nightmare.

Remote imports for Python?

Posted Aug 31, 2017 19:32 UTC (Thu) by k8to (guest, #15413) [Link]

Python is definitely incapable of meaningfully sandboxing.

That aside, the proposal for supporting it with a hash seems sort of vaguely OK, but I don't see the point. If you know the content you want to run ahead of time, why do you need to load it dynamically? I expect the major use pattern at that point will be people who write some code to generate the hash dynamically and then httpimport it, or in other words, the path of laziness.

Remote imports for Python?

Posted Sep 1, 2017 13:52 UTC (Fri) by syops (guest, #115198) [Link]

I do find myself wishing Subresource Integrity could be generalized to work with any HTTP GET. As it stands, SRI provides the resource in the src or href attribute, and the hash (or signature?) in a separate attribute. But that doesn't help with any arbitrary simple http application or library. Requests, pip and our favorite curl | sudo bash aren't designed to checksum the data they're fetching. I'd like to think this could be solved in a future implementation of http (maybe rolling hashes could be used to avoid the pitfall of the client having to stage and hash a very large download before writing it to disk), but I'm a dreamer.

Then again, there's always IPFS. As I understand it, with IPFS, the hash is the address of the content.

Remote imports for Python?

Posted Aug 30, 2017 7:22 UTC (Wed) by niner (subscriber, #26151) [Link] (4 responses)

From the article I get the impression that this feature is meant for really rapid prototyping or testing. The httpimport module's documentation talks about testing pull requests. Nowhere is mentioned that this should come anywhere near production code. Having to get a checksum of some remote module first negates the speed benefits. And people who think about security at all will know not to use httpimport in production code.

That said, it _is_ a good idea when downloading remote code is indeed wanted. Some Perl 6 modules for example are just bindings for native libraries. To simplify installation on Windows, their installation may involve downloading DLLs and it's common to use checksums to secure that.

Remote imports for Python?

Posted Aug 30, 2017 16:41 UTC (Wed) by jezuch (subscriber, #52988) [Link] (1 responses)

> Nowhere is mentioned that this should come anywhere near production code.

You're right, but that doesn't mean that it won't. I guess a Daily WTF article about some "quick hack" like this that ended up in production ("just temporarily, promise!"), won't be long coming :)

Remote imports for Python?

Posted Sep 10, 2017 5:32 UTC (Sun) by Garak (guest, #99377) [Link]

DWA, my new TLA of the day, thx. I think the DWA angle is interestingly core here. The modern world being a smaller place has had this evolution of putting oddities, and perhaps a shrinking amount of scattered human suffering under a metaphoric microscope. The sorts of news stories that make the feeds as well as the mainstream news are a lot like this. And unfortunately, the sensationalism outweighing a more rationally weighted perception seems to be the rule in this day and age. (Of course a lot more was more easily hidden from the public in the past, not claiming things overall were obviously better historically).

What I'm thinking in response to your comment is that such arguments of trying to protect people from shooting themselves in the foot don't seem that persuasive to me. I see a vast diverse ecosystem of developers, and I'm personally not bothered by the DWA factor. The news industry won't let the DWA factor fade, they'll just add another lens to that microscope and give people the entertaining idiot story to laugh at or be scared of for a moment.

Of course the example that comes to mind above is banning swimming pools to save children because of course some parents who choose to have pools will fail, and kids will die. Somehow, for reasons I couldn't map out in a thesis, I'm more worried about the swimming pools than the dangers being talked about here if the feature gets added. And I've been able to swim for 3 decades now. I mean seriously, there will always be ways for software developers and businesses that rely on them to metaphorically shoot themselves in the foot. I don't see this as significantly affecting that in the long term. But OTOH, the obvious FOSS answer is - Fork Python if you care enough. Let the fittest thrive the most in the ecosystem. Somehow I doubt anyone cares about this feature that much. But it made for some sensationalist reaction commentary with academic entertainment value.

Remote imports for Python?

Posted Sep 1, 2017 0:53 UTC (Fri) by ThinkRob (guest, #64513) [Link] (1 responses)

> From the article I get the impression that this feature is meant for really rapid prototyping or testing.

The road to hell is paved with good intentions.

There is no doubt in my mind that as soon as this feature hits mainline, a whole bunch of shops with "Agile" in their job descriptions are gonna go "Sweet, we don't need a build system anymore!" and run with it.

Remember: this is the reality in which something like 20% or whatever of npm modules broke when 'leftpad' went away. Good engineering discipline is in precious short supply nowadays...

Remote imports for Python?

Posted Sep 1, 2017 1:12 UTC (Fri) by anselm (subscriber, #2796) [Link]

> Good engineering discipline is in precious short supply nowadays...

It seems to me that a software system whose installation procedure is based on piping the output of curl into a local root shell forfeits any claim to “good engineering discipline” right there.

Remote imports for Python?

Posted Aug 30, 2017 3:53 UTC (Wed) by smckay (guest, #103253) [Link] (9 responses)

I find it difficult to take the guy at face value. Compared to the alternatives, the value of remote code loading as a language feature is mostly making exploits easier. Everyone who actually wants this terrible misfeature can google "Python import from URL" and be good to go in 2 minutes.

Remote imports for Python?

Posted Aug 30, 2017 16:29 UTC (Wed) by drag (guest, #31333) [Link] (8 responses)

It's not really any different than how people get software now. It just cuts out steps.

Currently if you use pip it goes like this:

https server --> pip install foo --> import foo

If you use distro supplied packages it is like this:

https server ---> distro build package ---> https server ---> yum install python-foo --> import foo

In a real sense it's not a whole lot different than just going:

https server ---> import foo

In every case the original source of trust is the https server. Even the deb or rpm package signatures don't confirm that the original code wasn't pulled from a compromised source. Sometimes distros can audit the code, but that only happens for a minority of packaged software.

If security is really the priority then the real way to do it is to have the original developers sign a tarball or package of the source code prior to ever being posted to anything touching the internet and then establish a chain of trust all the way through to the end user. The easiest way to do that may just be to eliminate as many layers between end users and developers as possible. This doesn't prevent a third party from auditing the source code and giving their official blessing to specific revisions.

Remote imports for Python?

Posted Aug 30, 2017 16:49 UTC (Wed) by adam820 (subscriber, #101353) [Link] (4 responses)

So maybe not an import-from-URL, but more of an "auto-pip", where everything is hosted and vetted? If you stick an import in the code, and then try to run it but it doesn't have it, it just searches for it and auto-downloads it?

Remote imports for Python?

Posted Aug 30, 2017 17:11 UTC (Wed) by epa (subscriber, #39769) [Link] (1 responses)

It searches for it, makes an RPM spec file, builds and installs it, and submits the source package for inclusion in Fedora?

Remote imports for Python?

Posted Aug 31, 2017 15:57 UTC (Thu) by gioele (subscriber, #61675) [Link]

> It searches for it, makes an RPM spec file, builds and installs it, and submits the source package for inclusion in Fedora?

It is not that automated, but the Debian packages for many Rubygems are created, built and semi-automatically updated in a similar way, using gem2deb and gemwatch.

Remote imports for Python?

Posted Aug 31, 2017 1:20 UTC (Thu) by drag (guest, #31333) [Link] (1 responses)

> So maybe not an import-from-URL, but more of an "auto-pip", where everything is hosted and vetted?

I suppose so.

I like how Debian does it with apt-get. You have a signed list of packages with their checksums and locations and mirrors. You could mirror the file local to the list or on any server really. Then you don't have to really care where or how they are stored because you have their checksums in a secure manner. Https vs http vs ftp vs nfs mount or whatever... doesn't matter.
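The transport-agnostic idea can be sketched in a few lines. The manifest format below (one `<sha256>  <name>` pair per line) is a made-up simplification, not Debian's actual index format, and verifying the signature on the manifest itself is assumed to happen elsewhere.

```python
import hashlib


def parse_manifest(text):
    """Parse lines of '<sha256>  <name>' into a {name: digest} dict."""
    entries = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        digest, name = line.split(None, 1)
        entries[name] = digest.lower()
    return entries


def verify_package(name, data, manifest):
    """Return True only if `data` matches the digest listed for `name`."""
    expected = manifest.get(name)
    return expected is not None and hashlib.sha256(data).hexdigest() == expected
```

Once the manifest is trusted, `verify_package()` gives the same answer whether the bytes arrived over HTTPS, HTTP, FTP, or an NFS mount.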

Nothing revolutionary or weird or remarkable or even that much different. The difference between this sort of setup versus distro packages is that it would be OS agnostic since it would be largely source code based. The same list of packages would be for OS X vs Linux vs Windows or whatever.

Of course the crappy part is that some packages would require all libraries being present for compiling. I know that is relatively easy for Linux, but I don't know what that is like to set up for OS X or Windows.

Remote imports for Python?

Posted Aug 31, 2017 3:12 UTC (Thu) by smckay (guest, #103253) [Link]

Homebrew is pretty dang good these days. It's basically Portage on OSX except the default build configuration has binaries available.

Remote imports for Python?

Posted Sep 1, 2017 10:07 UTC (Fri) by amarao (guest, #87073) [Link]

There is a huge difference between 'direct from http' and apt-get (yum) install.

1. When we compile anything from external git, we clone it into our git.
2. Normally all code comes with tests, and those tests are executed at build time (in CI).
3. Packages are artifacts; they are stored and used every time in pristine form to rebuild the working environment on each run.

If you replace the whole build->publish->install cycle, you will have flaky production. What if there was a hiccup in the connectivity during installation? What if the author pushed a new version between deploying two servers with the same dependencies (but with their subdependencies unpinned)?

Best practice for DevOps:

1. Everything can be rebuilt and deployed automatically.
2. And we have its sources.
3. On our premises.
4. Even the build system and jobs can be rebuilt from git.
5. Every deploy can be repeated in a precise manner.
6. Everything committed is covered by tests.

Remote imports for Python?

Posted Sep 2, 2017 11:45 UTC (Sat) by robert_s (subscriber, #42402) [Link]

The "distro build package" step you mention here is actually more like: distro build package, run tests, make sure it works properly with all the other components of the distro.

And most importantly, if a package author goes AWOL or starts adding things that might not be in the users' interest, the distro has the ability to patch the code or even transparently switch to a different upstream.

Remote imports for Python?

Posted Sep 3, 2017 7:13 UTC (Sun) by johan (guest, #112044) [Link]

> It's not really any different than how people get software now. It just cuts out steps.

It could potentially cut pip out of the loop, yes, but how hard is a "pip install" anyway?
We are not talking about simplifying something that is currently complex; we are talking about shaving off a few minutes at most from their workflow.
So why should we need yet another way to do the same thing in a slightly less secure manner?

Personally I see this httpimport as a great library, but I don't see much reason to add it to the python standard library.
There are probably more people worried about the security of python than people who want this feature, so adding it to the python standard library doesn't make sense.

Remote imports for Python?

Posted Aug 31, 2017 7:17 UTC (Thu) by jwilk (subscriber, #63328) [Link]

> [HTTPS] surely was not in widespread use by 1995.

Indeed. HTTPS support was added to stdlib in Python 2.0, released in 2000.

Source: https://docs.python.org/3/whatsnew/2.0.html#module-changes

Remote imports for Python?

Posted Aug 31, 2017 8:44 UTC (Thu) by lamby (subscriber, #42621) [Link]

"age post" ?

Remote imports for Python?

Posted Aug 31, 2017 8:45 UTC (Thu) by Sesse (subscriber, #53779) [Link] (1 responses)

So what happens when you want to move from e.g. Gitlab to Github (or your URL becomes inaccessible for any other reason), but there's 200k users out there with your module URL hard-coded in their imports?

/* Steinar */

Remote imports for Python?

Posted Sep 2, 2017 17:34 UTC (Sat) by sasha (guest, #16070) [Link]

Here in Russia, the government authorities can block access to some IPs. At some point they blocked access to GitHub (I do not remember the reason: hate speech or substances). It broke a lot of things for large corporations, so GitHub was immediately unblocked. The programmers explained that they were following "the world best practices".

The possibility to replace a curl script with direct python import changes nothing here.

Remote imports for Python?

Posted Aug 31, 2017 8:52 UTC (Thu) by dunlapg (guest, #57764) [Link]

> As Torakis's age post indicates, there may be something of a generation gap surrounding this issue. The GitHub-centric, rapid-fire development style meets the grizzled graybeards who still sport some of the scars of security issues past.

I think it's worth noting that the world the actual greybeards grew up in was a world of open ports and unsecured SMTP relays -- a world where you typed your password in the clear into telnet and ftp. In other words, a world just as trusting and impatient as the "curl http://somecode.com/install.sh | sudo sh" crowd. The difference is not one of culture or personality, it's one of experience.

Remote imports for Python?

Posted Aug 31, 2017 12:09 UTC (Thu) by bernat (subscriber, #51658) [Link] (1 responses)

Go doesn't have this feature either. Package names look like a URL but are just names. "go get" will interpret them as URLs (with a builtin set of rules), fetch them and put them in GOPATH for the compiler to find them.

Remote imports for Python?

Posted Aug 31, 2017 19:38 UTC (Thu) by lsl (subscriber, #86508) [Link]

Right, although it's even more layered than that.

To the compiler, import paths are opaque strings used for identifying a given module/package.

The 'go' build tool (roughly equivalent to what you would use CMake or whatever for in a C-based project) interprets them as file system references inside GOPATH and uses them to figure out what source files to pass to the compiler.

Only 'go get' tries to infer a remote location from them. What 'go get' does is let you say "clone this repo without making me figure out whether it uses Git, Mercurial or SVN and put it where the build system will pick it up". It's a convenience wrapper around VCS tools and is inherently targeted at developers, not users. You don't want to use it at deploy time.

Thus, Torakis' Go analogy is a bogus one.

Remote imports for Python?

Posted Aug 31, 2017 16:44 UTC (Thu) by cesarb (subscriber, #6266) [Link]

I don't know if it has already been mentioned on the thread, but there's another problem with remote imports: reliability. A local import can work even offline; a remote import depends on both the remote server and the network being up every time the process is started.

Remote imports for Python?

Posted Sep 1, 2017 13:20 UTC (Fri) by bandrami (guest, #94229) [Link]

from leftpad import unpredictability


Copyright © 2017, Eklektix, Inc.
This article may be redistributed under the terms of the Creative Commons CC BY-SA 4.0 license
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds