Remote imports for Python?
Importing a module into a Python program is a pretty invasive operation; it directly runs code in the current process that has access to anything the process can reach. So it is not wildly surprising that a suggestion to add a way to directly import modules from remote sites was met with considerable doubt—if not something approaching hostility. It turns out that the person suggesting the change was not unaware of the security implications of the idea, but thought it had other redeeming qualities; others in the discussion were less sanguine.
In his first post to the python-ideas mailing list, security researcher John Torakis proposed that imports via HTTP and HTTPS be added as a core Python feature. He also filed an enhancement request; both the post and the bug pointed to his httpimport repository on GitHub, which has a prototype implementation. The justification he cited was always likely to set off some alarm bells:
Other languages, like golang, provide this functionality from their early days (day one?). Python development can be greatly improved if a "try before pip installing" mechanism gets in place, as it will add a lot to the REPL [read-eval-print loop] nature of the testing/experimenting process.
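Mechanically, what Torakis proposed needs no new interpreter machinery; Python's documented import-hook protocol (PEP 302/451) already allows it. The sketch below is a hypothetical, minimal finder/loader (the HttpFinder name and all details are illustrative, not httpimport's actual code) that resolves a top-level module as `<base_url>/<name>.py`, which shows both how easy the feature is to build as a library and why it is alarming as a default:

```python
import importlib.abc
import importlib.util
import urllib.request


class HttpFinder(importlib.abc.MetaPathFinder, importlib.abc.Loader):
    """Illustrative sketch: resolve top-level imports as <base_url>/<name>.py."""

    def __init__(self, base_url):
        self.base_url = base_url.rstrip("/")
        self._sources = {}

    def find_spec(self, fullname, path=None, target=None):
        if "." in fullname:
            return None          # keep the sketch to top-level modules
        url = "%s/%s.py" % (self.base_url, fullname)
        try:
            with urllib.request.urlopen(url) as resp:
                self._sources[fullname] = resp.read().decode("utf-8")
        except OSError:
            return None          # not found here; let the next finder try
        return importlib.util.spec_from_loader(fullname, self, origin=url)

    def create_module(self, spec):
        return None              # default module creation is fine

    def exec_module(self, module):
        # Compile against the URL so tracebacks show where the code came from.
        source = self._sources[module.__name__]
        exec(compile(source, module.__spec__.origin, "exec"), module.__dict__)
```

Appending an instance to `sys.meta_path` makes a plain `import foo` fetch and execute `<base_url>/foo.py` in the current process, which is exactly the behavior the rest of the thread worries about.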
Chris Angelico vehemently opposed the feature, at least for core Python: "This is a security bug magnet; can you imagine trying to ensure that malicious code is not executed, in an arbitrary execution context?" If the feature is explicitly enabled (via, say, pip), it is much less worrisome, Angelico said. The idea of allowing imports over regular HTTP is one that should be dropped, he said; even HTTPS imports would require being "absolutely sure that your certificate chains are 100% dependable".
Oleg Broytman also opposed the idea, suggesting that it would require a Python Enhancement Proposal (PEP), instead of simply filing an enhancement request, to truly be considered. He also noted that there is a difference for Go's remote imports: those happen at compile time for Go, while they would be done at runtime for Python.
The README for Torakis's httpimport repository mentions it being used as a "staging protocol for covertutils backdoors". Torakis's covertutils project is described as a "framework for backdoor programming"; that also got Angelico's attention:
Torakis responded to those complaints (in a posting with non-standard quoting). The backdoor work is evidently part of his day job and httpimport could be useful for that work, especially for rapid prototyping, testing, and debugging purposes. He did agree that HTTP imports are dangerous, but noted that doing so locally (i.e. only to localhost or perhaps trusted local systems) could be useful for testing. He also objected to the complaint about HTTPS, however, noting that certificate checks would be done to eliminate the man-in-the-middle threat. In another posting, he put it this way: "if you can't trust your certificate store now, and you are afraid of Remote code execution through HTTPS, stop using pip altogether".
While indicating that httpimport would make an excellent addition to the Python Package Index (PyPI), Paul Moore pointed out another flaw with the idea of making the feature a core part of the language: it removes the ability for an organization's security team to disallow it. On the other hand, the team could restrict the ability to install modules from PyPI or simply blacklist certain entries such as httpimport. He said:
This is not the first time this idea has come up, Guido van Rossum noted; it was first proposed (and rejected) in 1995 or so—before HTTPS was invented, he said. He may be misremembering either the date or the status of HTTPS, as Wikipedia gives 1994 for its creation, though it surely was not in widespread use by 1995. As with others in the thread, Van Rossum was happy to see a third-party httpimport available to those who need it, but it is too much of a security concern to ever consider adding to the standard library.
Torakis said that he was two years old at the time of that decision and that "times have changed". He said that he is willing to make changes so that httpimport would be acceptable for the standard library; it would make working with Python much easier for him and others:
There was no support for adding it as a core feature in the thread, though. A simple "pip install httpimport" is all that would be needed to get access to the feature (once Torakis gets it added to PyPI, anyway), so some thread participants wondered why it was so imperative that it become part of the standard library. As Stephen J. Turnbull put it: "It's an attractive nuisance unless you're a security person, and then pip is not a big deal." Echoing Moore to some extent, Nick Coghlan added another concern:
As a result, I'm pretty sure that even if upstream said "httpimport is in the Python standard library now!", we'd get explicit requests asking us to take it out of our redistributed version and make it at most an optional install (similar to what we do with IDLE and Tcl/Tk support in general).
There is at least one popular scripting language that makes this feature available as part of the language: PHP. Originally, directives like include and require could contain a URL; code would be retrieved from that URL, then executed. Over time, the wisdom of that choice has been questioned; these days, two configuration options (allow_url_fopen and allow_url_include) govern the behavior and remote inclusion is disallowed by default. While many scripting-style languages have ways to accomplish remote inclusion, it is considered something of a trap for the unwary, thus not elevated to be a top-level language feature.
Something that bears noting from the discussion is that installing code over HTTPS from PyPI is no more (or less) dangerous than doing so from GitHub—at least from a man-in-the-middle perspective. There is still the danger of using code from a malicious GitHub repository but, in truth, the same problem exists for PyPI. There is no active vetting of either GitHub repositories or packages uploaded to PyPI. HTTPS can ensure that you are connecting to the server holding the proper key, but it cannot protect you from asking for the wrong thing from that server.
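One mitigation raised in the comments below is pinning a cryptographic hash of the expected content, in the spirit of Subresource Integrity or pip's hash-checking mode (--require-hashes). A minimal sketch (the fetch_pinned helper is hypothetical, not part of any tool discussed here):

```python
import hashlib
import urllib.request


def fetch_pinned(url, expected_sha256):
    """Download url and refuse to return the bytes unless their SHA-256
    digest matches expected_sha256 (a hex string obtained out of band).

    This defends against content substitution on or behind the server,
    which HTTPS alone does not; it helps not at all if the pinned hash
    itself came from the same untrusted channel.
    """
    with urllib.request.urlopen(url) as resp:
        data = resp.read()
    digest = hashlib.sha256(data).hexdigest()
    if digest != expected_sha256:
        raise ValueError("digest mismatch for %s: got %s" % (url, digest))
    return data
```

The pin moves the trust decision from "whoever controls the server right now" to "whoever published the hash", which is the property the comparisons to Go and pip keep circling around.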
As Torakis's age post indicates, there may be something of a generation gap surrounding this issue. The GitHub-centric, rapid-fire development style meets the grizzled graybeards who still sport some of the scars of security issues past. It is certainly true that it is easy enough to add remote imports to a program (via httpimport or something hand-rolled), but the idea is that programmers will have reached a certain level of understanding when they get to that point, hopefully enough to recognize the dangers of doing so. In any case, by not making it a top-level, supported feature, abuse of it is not the responsibility of the Python core team. Avoiding that kind of "attractive nuisance" (and the bugs it can spawn) is another lesson that the Python graybeards have learned along the way.
Index entries for this article: Security, Python
Posted Aug 30, 2017 2:57 UTC (Wed)
by bferrell (subscriber, #624)
[Link]
Posted Aug 30, 2017 3:39 UTC (Wed)
by luto (guest, #39314)
[Link] (9 responses)
Posted Aug 30, 2017 5:04 UTC (Wed)
by josh (subscriber, #17465)
[Link] (3 responses)
Or the hash of a public key matching the private key the module is signed with.
Posted Aug 30, 2017 6:53 UTC (Wed)
by mokki (subscriber, #33200)
[Link] (1 response)
99% of security bugs reported against Java in the last five years have been about remote code escaping the sandbox.
Why would Python want that security circus? It just makes the language seem bad when actually 99.9% of code never uses the remote-execution feature and thus does not even enable the sandbox.
Enabling this in Python without a sandbox would be a security nightmare, and supporting a sandbox is known to be a security nightmare.
Posted Aug 31, 2017 19:32 UTC (Thu)
by k8to (guest, #15413)
[Link]
That aside, the proposal for supporting it with a hash seems sort of vaguely OK, but I don't see the point. If you know the content you want to run ahead of time, why do you need to load it dynamically? I expect the major use pattern at that point will be people who write some code to generate the hash dynamically and then httpimport it, or in other words, the path of laziness.
Posted Sep 1, 2017 13:52 UTC (Fri)
by syops (guest, #115198)
[Link]
I do find myself wishing Subresource Integrity could be generalized to work with any HTTP GET. As it stands, SRI provides the resource in the src or href attribute, and the hash (or signature?) in a separate attribute. But that doesn't help with any arbitrary simple HTTP application or library. Requests, pip, and our favorite "curl | sudo bash" aren't designed to checksum the data they're fetching. I'd like to think this could be solved in a future implementation of HTTP (maybe rolling hashes could be used to avoid the pitfall of the client having to stage and hash a very large download before writing it to disk), but I'm a dreamer. Then again, there's always IPFS. As I understand it, with IPFS, the hash is the address of the content.
Posted Aug 30, 2017 7:22 UTC (Wed)
by niner (subscriber, #26151)
[Link] (4 responses)
That said, it _is_ a good idea when downloading remote code is indeed wanted. Some Perl 6 modules, for example, are just bindings for native libraries. To simplify installation on Windows, their installation may involve downloading DLLs, and it's common to use checksums to secure that step.
Posted Aug 30, 2017 16:41 UTC (Wed)
by jezuch (subscriber, #52988)
[Link] (1 response)
You're right, but that doesn't mean that it won't happen. I guess a Daily WTF article about some "quick hack" like this that ended up in production ("just temporarily, promise!") won't be long in coming :)
Posted Sep 10, 2017 5:32 UTC (Sun)
by Garak (guest, #99377)
[Link]
What I'm thinking in response to your comment is that such arguments about trying to protect people from shooting themselves in the foot don't seem that persuasive to me. I see a vast, diverse ecosystem of developers, and I'm personally not bothered by the DWA factor. The news industry won't let the DWA factor fade; they'll just add another lens to that microscope and give people an entertaining idiot story to laugh at or be scared of for a moment.
Of course the example that comes to mind above is banning swimming pools to save children, because of course some parents who choose to have pools will fail, and kids will die. Somehow, for reasons I couldn't map out in a thesis, I'm more worried about the swimming pools than about the dangers being talked about here if the feature gets added. And I've been able to swim for three decades now. I mean seriously, there will always be ways for software developers and the businesses that rely on them to metaphorically shoot themselves in the foot. I don't see this as significantly affecting that in the long term. But OTOH, the obvious FOSS answer is: fork Python if you care enough. Let the fittest thrive in the ecosystem. Somehow I doubt anyone cares about this feature that much. But it made for some sensationalist reaction commentary with academic entertainment value.
Posted Sep 1, 2017 0:53 UTC (Fri)
by ThinkRob (guest, #64513)
[Link] (1 response)
The road to hell is paved with good intentions.
There is no doubt in my mind that as soon as this feature hits mainline, a whole bunch of shops with "Agile" in their job descriptions are gonna go "Sweet, we don't need a build system anymore!" and run with it.
Remember: this is the reality in which something like 20% or whatever of npm modules broke when 'leftpad' went away. Good engineering discipline is in precious short supply nowadays...
Posted Sep 1, 2017 1:12 UTC (Fri)
by anselm (subscriber, #2796)
[Link]
It seems to me that a software system whose installation procedure is based on piping the output of curl into a local root shell forfeits any claim to “good engineering discipline” right there.
Posted Aug 30, 2017 3:53 UTC (Wed)
by smckay (guest, #103253)
[Link] (9 responses)
Posted Aug 30, 2017 16:29 UTC (Wed)
by drag (guest, #31333)
[Link] (8 responses)
Currently if you use pip it goes like this:
https server --> pip install foo --> import foo
If you use distro-supplied packages it is like this:
https server --> distro build package --> https server --> yum install python-foo --> import foo
In a real sense it's not a whole lot different than just going:
https server --> import foo
In every case the original source of trust is the https server. Even the deb or rpm package signatures don't confirm that the original code wasn't pulled from a compromised source. Sometimes distros can audit the code, but that only happens for a minority of packaged software.
If security is really the priority, then the real way to do it is to have the original developers sign a tarball or package of the source code before it is ever posted to anything touching the internet, and then establish a chain of trust all the way through to the end user. The easiest way to do that may just be to eliminate as many layers between end users and developers as possible. This doesn't prevent a third party from auditing the source code and giving their official blessing to specific revisions.
Posted Aug 30, 2017 16:49 UTC (Wed)
by adam820 (subscriber, #101353)
[Link] (4 responses)
Posted Aug 30, 2017 17:11 UTC (Wed)
by epa (subscriber, #39769)
[Link] (1 responses)
Posted Aug 31, 2017 15:57 UTC (Thu)
by gioele (subscriber, #61675)
[Link]
It is not that automated, but the Debian packages for many Rubygems are created, built and semi-automatically updated in a similar way, using gem2deb and gemwatch.
Posted Aug 31, 2017 1:20 UTC (Thu)
by drag (guest, #31333)
[Link] (1 responses)
I suppose so.
I like how Debian does it with apt-get. You have a signed list of packages with their checksums, locations, and mirrors. You could mirror the files next to the list or on any server, really. Then you don't have to care where or how they are stored, because you have their checksums in a secure manner. HTTPS vs. HTTP vs. FTP vs. an NFS mount or whatever... it doesn't matter.
Nothing revolutionary or weird or remarkable, or even that much different. The difference between this sort of setup and distro packages is that it would be OS-agnostic, since it would be largely source-code based. The same list of packages would work for OS X vs. Linux vs. Windows or whatever.
Of course the crappy part is that some packages would require all libraries to be present for compiling. I know that is relatively easy for Linux, but I don't know what that is like to set up for OS X or Windows.
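The signed-list scheme described above boils down to verifying fetched files against a checksum manifest whose own integrity is established separately (apt signs its Release/Packages files, which carry the per-package hashes). A rough sketch of just the verification step, assuming a SHA256SUMS-style manifest (the verify_manifest helper is illustrative, not apt's code):

```python
import hashlib
from pathlib import Path


def verify_manifest(manifest_text, root):
    """Check files under root against 'hexdigest  name' lines.

    Returns the names that failed (missing file or digest mismatch).
    Assumes the manifest itself arrived over a trusted, signed channel;
    without that, the checksums prove nothing.
    """
    bad = []
    for line in manifest_text.splitlines():
        if not line.strip():
            continue
        digest, name = line.split(None, 1)
        name = name.strip()
        path = Path(root) / name
        if (not path.is_file()
                or hashlib.sha256(path.read_bytes()).hexdigest() != digest):
            bad.append(name)
    return bad
```

As the comment says, once the manifest is authenticated it no longer matters whether the files themselves travel over HTTPS, HTTP, FTP, or an NFS mount.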
Posted Aug 31, 2017 3:12 UTC (Thu)
by smckay (guest, #103253)
[Link]
Posted Sep 1, 2017 10:07 UTC (Fri)
by amarao (guest, #87073)
[Link]
1. When we compile anything from an external git, we clone it into our git.
2. And we have its sources.
3. On our premises.
4. Even the build system and jobs can be rebuilt from git.
5. Every deploy can be repeated in a precise manner.
6. Everything committed is covered by tests.
If you replace the whole build->publish->install cycle, you will have flaky production. What if there was a hiccup in the connectivity during installation? What if the author pushed a new version between deploying two servers with the same dependencies (but their subdependencies unpinned)?
Best practice for DevOps:
1. Everything can be rebuilt and deployed automatically.
2. Normally all code comes with tests, and those tests are executed at build time (in CI).
3. Packages are artifacts; they are stored and used every time in a pristine manner to rebuild the working environment on each run.
Posted Sep 2, 2017 11:45 UTC (Sat)
by robert_s (subscriber, #42402)
[Link]
And most importantly, if a package author goes AWOL or starts adding things that might not be in the users' interest, the distro has the ability to patch the code or even transparently switch to a different upstream.
Posted Sep 3, 2017 7:13 UTC (Sun)
by johan (guest, #112044)
[Link]
It could potentially cut out pip from the loop, yes, but how hard is a "pip install" anyway?
We are not talking about simplifying something that is currently complex; we are talking about shaving off a few minutes at most from their workflow.
So why should we need yet another way to do the same thing in a slightly less secure manner?
There are probably more people afraid for the security of Python than there are people who want this feature, so adding it to the Python standard library doesn't make sense.
Personally I see this httpimport as a great library, but I don't see much reason to add it to the Python standard library.
Posted Aug 31, 2017 7:17 UTC (Thu)
by jwilk (subscriber, #63328)
[Link]
Indeed. HTTPS support was added to stdlib in Python 2.0, released in 2000.
Source: https://docs.python.org/3/whatsnew/2.0.html#module-changes
Posted Aug 31, 2017 8:44 UTC (Thu)
by lamby (subscriber, #42621)
[Link]
Posted Aug 31, 2017 8:45 UTC (Thu)
by Sesse (subscriber, #53779)
[Link] (1 response)
/* Steinar */
Posted Sep 2, 2017 17:34 UTC (Sat)
by sasha (guest, #16070)
[Link]
The possibility of replacing a curl script with a direct Python import changes nothing here.
Posted Aug 31, 2017 8:52 UTC (Thu)
by dunlapg (guest, #57764)
[Link]
I think it's worth noting that the world the actual greybeards grew up in was a world of open ports and unsecured SMTP relays -- a world where you typed your password in the clear into telnet and ftp. In other words, a world just as trusting and impatient as the "curl http://somecode.com/install.sh | sudo sh" crowd. The difference is not one of culture or personality; it's one of experience.
Posted Aug 31, 2017 12:09 UTC (Thu)
by bernat (subscriber, #51658)
[Link] (1 responses)
Posted Aug 31, 2017 19:38 UTC (Thu)
by lsl (subscriber, #86508)
[Link]
To the compiler, import paths are opaque strings used for identifying a given module/package.
The 'go' build tool (roughly equivalent to what you would use CMake or whatever for in a C-based project) interprets them as file system references inside GOPATH and uses them to figure out what source files to pass to the compiler.
Only 'go get' tries to infer a remote location from them. What 'go get' does is let you say "clone this repo without making me figure out whether it uses Git, Mercurial or SVN and put it where the build system will pick it up". It's a convenience wrapper around VCS tools and is inherently targeted at developers, not users. You don't want to use it at deploy time.
Thus, Torakis's Go analogy is a bogus one.
Posted Aug 31, 2017 16:44 UTC (Thu)
by cesarb (subscriber, #6266)
[Link]
Posted Sep 1, 2017 13:20 UTC (Fri)
by bandrami (guest, #94229)
[Link]