LWN: Comments on "Malicious software libraries found in PyPI" https://lwn.net/Articles/733853/ This is a special feed containing comments posted to the individual LWN article titled "Malicious software libraries found in PyPI". Fri, 05 Sep 2025 11:50:21 +0000 lwn@lwn.net Malicious software libraries found in PyPI https://lwn.net/Articles/734434/ mathstuf <div class="FormattedComment"> I've looked at Canopy before and hit the problem that their modules which require C/C++ can't be shared. For example, h5py ships a copy of HDF5 without headers. This means that, to avoid conflicting with it from your own HDF5-using C/C++ code, you need to mangle your HDF5 symbols. It also means that if you require a vendor MPI and an HDF5 with MPI support, their h5py isn't useful. Has this been addressed?<br> </div> Thu, 21 Sep 2017 11:14:29 +0000 Malicious software libraries found in PyPI https://lwn.net/Articles/734404/ arvidma <div class="FormattedComment"> They do, but they are way too far behind on updates and way too limited in package selection. <br> <p> I did get a suggestion a few comments ago about Enthought's repository. That seems to be pretty close to what I'm looking for. I've done a bit of googling in the past for something like that, but never came across them before. Or perhaps I just disregarded them because their marketing is so focused on "science". It is far from complete and would require major changes to our workflows, but might be possible to live with for daily development work.
Prices for lower-tier offerings are very reasonable too, though the price of the Enterprise version, which is what I would want, is unspecified.<br> <p> Since "everyone" is already using PyPI/DevPI and Wheels, it would be much nicer to have a commercial index that was plug-and-play with existing infrastructure and workflows.<br> </div> Thu, 21 Sep 2017 05:54:05 +0000 Malicious software libraries found in PyPI https://lwn.net/Articles/734328/ smcv <div class="FormattedComment"> <font class="QuotedText">&gt; It surprises me that there is no commercial service that provides an audited subset of PyPI.</font><br> <p> Some Linux distros are commercial, and most Linux distros (commercial or otherwise) contain a curated (or even audited) subset of PyPI (and also of non-Python equivalents like CPAN, CTAN, ELPA).<br> </div> Wed, 20 Sep 2017 16:15:37 +0000 Malicious software libraries found in PyPI https://lwn.net/Articles/734157/ arvidma <div class="FormattedComment"> The pypi index in DevPI works like a transparent proxy. The first request for a package triggers a fetch from upstream PyPI. The nice thing about that is that you get a zero-effort, up-to-date local copy of only those packages that you use.<br> <p> You can operate it offline as well, of course, by doing manual or scripted uploads to a local custom index. Perhaps there are filters as well. I haven't dug any deeper into the config than was necessary to solve the problems in front of me.<br> </div> Tue, 19 Sep 2017 18:11:26 +0000 Malicious software libraries found in PyPI https://lwn.net/Articles/734147/ drag <div class="FormattedComment"> Right, but the 'remote import' is just one of a hundred thousand ways things can go wrong.
And this is something that should be easy for python package management people to address with a pretty simple and low-overhead approval process.<br> <p> Why they don't do that already is a bit of a mystery, but it is something that should be doable without a whole lot of sacrifice. <br> </div> Tue, 19 Sep 2017 15:22:09 +0000 Malicious software libraries found in PyPI https://lwn.net/Articles/734065/ clump <blockquote> How pretty much every distro works is the same regardless of what distribution you are talking.</blockquote> Please do not spread misinformation. Enterprise distros in particular spend a lot of time auditing and hardening source code. Mon, 18 Sep 2017 23:56:36 +0000 Malicious software libraries found in PyPI https://lwn.net/Articles/734062/ pboddie <div class="FormattedComment"> But can you filter packages with DevPi? (I guess you must be able to because surely no-one really wants to privately mirror PyPI in its entirety.) That would at least let you stay within the realm of packages whose authors you mostly trust, with random new dependencies being excluded and flagged for approval/auditing.<br> <p> Auditing every update to every package is too much effort unless, as I noted, you can get the revenue to pay people to do it. And traditional distributions tend to cultivate some kind of relationship between package maintainers and upstream maintainers, even though much is made of antagonism between these parties, which potentially means getting lots of people involved. Just like traditional distributions do, of course.<br> </div> Mon, 18 Sep 2017 23:32:20 +0000 Malicious software libraries found in PyPI https://lwn.net/Articles/734050/ pboddie <div class="FormattedComment"> I'm not a Debian developer, just someone who packaged something.
But my point was that there is a difference between getting the software packaged in the first place and maintaining the package once it exists in Debian. Even if the latter degrades to the level of passive uploading, there will have been a reasonable effort to look at the software in the first place, including an assessment of whether a maintainer really wants anything to do with it.<br> <p> So, that guy whose "remote imports" package was discussed recently would presumably struggle to find anyone wanting to package that, largely because no-one really wants to be on the spot when people use it as a huge backdoor on lots of people's systems. Meanwhile, getting such stuff into PyPI and pip-installed everywhere is relatively trivial.<br> </div> Mon, 18 Sep 2017 21:10:21 +0000 Malicious software libraries found in PyPI https://lwn.net/Articles/734047/ drag <div class="FormattedComment"> How much of your experience as a Debian developer involves actually auditing the source code of what you package? <br> <p> Licensing and copyright are necessary things, but they really don't have much practical impact on the security or safety of the software.
<br> </div> Mon, 18 Sep 2017 20:37:09 +0000 Malicious software libraries found in PyPI https://lwn.net/Articles/734037/ pboddie <blockquote>How pretty much every distro works is the same regardless of what distribution you are talking.<br/> For the vast majority of packages they simply pull from upstream, add the package metadata, compile, and upload.</blockquote> <p>In terms of actually getting the software into the distro, neither of these statements corresponds to my own experience.</p> <p>First of all, when packaging something for Debian, I found that the person packaging the same code for Fedora seemed to be finished already while I, the author, and various Debian developers were still thrashing out the different copyright and licensing details of the files that would, as essential functionality, be going into packages for both distros. Maybe I missed all the copyright metadata for the Fedora package, or maybe Fedora isn't as strict as Debian.</p> <p>And while it may be true that uploaders get into a routine after a while, and that things may slip through the net at that point, getting the software packaged in the first place is absolutely not a simple matter of sticking a label on and uploading it. If it were, distros would accumulate software much more readily and there wouldn't be all those people claiming that distros only provide "ancient" software.</p> Mon, 18 Sep 2017 18:15:57 +0000 Malicious software libraries found in PyPI https://lwn.net/Articles/734034/ arvidma <div class="FormattedComment"> Local mirroring via DevPi is a no-brainer, especially if you use a CI system and have a bit of commit traffic. In most environments it speeds things up enormously, in addition to providing some protection against upstream outages etc. And it is so nice using private indices for internal components that are shared among different projects!
I really do love my DevPI server.<br> <p> It is, however, much too big an undertaking to cherry-pick and audit every single update to every n-th degree dependency, so in reality you end up not adding any kind of security with the local repository.<br> </div> Mon, 18 Sep 2017 18:05:45 +0000 Malicious software libraries found in PyPI https://lwn.net/Articles/734023/ arvidma In my experience, it is easy to get management to pay the cost of tools and services that bring tangible value to a development team, even if that cost is quite substantial. On the other hand, it can be quite hard to get management to spend <i>more</i> money than strictly necessary, even if that amount of extra money is not very substantial. Mon, 18 Sep 2017 17:55:44 +0000 Malicious software libraries found in PyPI https://lwn.net/Articles/734020/ drag <div class="FormattedComment"> How pretty much every distro works is the same regardless of what distribution you are talking.<br> <p> For the vast majority of packages they simply pull from upstream, add the package metadata, compile, and upload. <br> <p> The amount of testing that goes on is going to be extremely minimal. If the package maintainer uses their own packages they may provide some testing to make sure it works on their own stuff, but that is about it. They shove it into some testing version of the distribution for an arbitrary amount of time to let it 'bake' and then copy it over to the more stable releases or channels or whatever they want to call it. <br> <p> If problems crop up for users, bugs are filed, and then fixes are made if the package maintainer has the time and inclination to do so. <br> <p> And that is pretty much it.
There are high-profile packages like the kernel or firefox or openssl that distributions put a lot of time and effort into, but for every one of those there are going to be dozens of other packages that are just pushed out without much of a second thought. <br> <p> That's not an indictment, or calling the maintainers lazy, or saying they do nothing or that the process distributions go through is worthless. It just is what it is. It's reality, due to the limited amount of time people are able to dedicate to this sort of thing. <br> <p> <p> Distributions perform some important 'weeding out', but if people depended on distributions for all their software, a huge percentage of open source software would never get used. If you have to wait for software to be mature before it's packaged and people are allowed to use it, then there can never be any new software written.<br> <p> <p> <font class="QuotedText">&gt; Yes. The developer always wants the latest and greatest version. Sometimes the user of his software disagrees.</font><br> <p> An ideal package management solution should allow people to install the version they want if they request it. <br> </div> Mon, 18 Sep 2017 15:56:12 +0000 Malicious software libraries found in PyPI https://lwn.net/Articles/734017/ farnz <p>I think there are a couple of good reasons why a distro package is, on average, less likely to be backdoored than a pypi package: <ol> <li>When the distro packager tries to package something that uses (say) <tt>urllib</tt> instead of <tt>urllib3</tt> that's already packaged by someone else, they're likely to spot the issue and ask why you're not using the existing package. This raises the bar for an attacker - they now have to have a convincing reason to use their package instead of the one they cloned from; otherwise the question is likely to result in them being spotted.
<li>The bar for becoming a distro packager tends to be higher than the bar for putting code up on pypi and similar package lists (for good reasons in both cases). This again raises the bar - the attacker has to do more than typo-squat a commonly used package to get their first package into the distro. Thus, if you use distro-packaged stuff, you're more likely to be pointed at the commonly used library, not the typo-squat. </ol> <p>Note that both of these are quantitative increases in difficulty, not panaceas - it's entirely possible that a distro package is backdoored, too; it's just that the bar to getting there is higher (more likely to be a TLA, less likely to be a bored kid). Mon, 18 Sep 2017 15:49:02 +0000 Malicious software libraries found in PyPI https://lwn.net/Articles/733961/ diegor <div class="FormattedComment"> <font class="QuotedText">&gt;A random Debian packager (look, someone uses Debian Developer, but in reality is a Debian Packager...) touches the source code of a package like it is its creator, implying a good </font><br> <font class="QuotedText">&gt;amount of arrogance, in the form of *I'm the Debian maintainer! I'm better than its authors*.</font><br> <p> Maybe we are going OT, but you should not presume to know what people think. Of course it was a mistake, but, for example, you're implying that the DD never tried to discuss the change with the authors, and that is not true.<br> <p> But I don't want to turn this thread into "I don't like Debian, let's rant". I prefer to discuss the merits of using pypi vs distribution packages, where the distribution can be *any* distribution.<br> <p> <font class="QuotedText">&gt;A more sound process is the one taken by rolling-release distros, which simply say: the authors of software "Foo" have released version 1.1.0.
Since the authors know their software, the distros have to ship this version.</font><br> <p> Of course, if you need or want a rolling-release distro, you should go for it. <br> <p> But it's not always so simple. Sometimes software works fine with old versions of libraries but not with the bleeding-edge version. Sometimes APIs change. And if you use a rolling-release distro, you find out that some software sometimes stops working for this reason.<br> <p> Are you willing to take the small risk? Go for it. A lot of people use Arch and are happy.<br> Others prefer a more conservative approach. Nobody is doing it wrong. It's just that different people have different needs.<br> <p> <font class="QuotedText">&gt; you, the developer, that knows the software like no-one else, really care about the latest version, not some version released 1 year ago.</font><br> <p> Yes. The developer always wants the latest and greatest version. Sometimes the user of his software disagrees. And usually the solution is to have many versions of the same library embedded in different software, which solves one problem and introduces another: usually nobody cares to fix the embedded copies. <br> <p> In my experience, I've seen custom software maintained for more than 10 years, with embedded libraries that are 10 years old. Go figure.<br> <p> <p> <p> </div> Mon, 18 Sep 2017 14:47:53 +0000 Malicious software libraries found in PyPI https://lwn.net/Articles/733960/ diegor <div class="FormattedComment"> <font class="QuotedText">&gt;&gt; How many intrinsically malicious people are likely to shoulder that burden of extended bad faith?</font><br> <font class="QuotedText">&gt;Quite a few, given the existence of three-letter agencies around the world. Fortunately, the payoff for subverting some random Debian packages is not that great. </font><br> <p> Yes. They don't need it.
They just put their code on pypi with no effort, and people are very happy to include it in their projects.<br> </div> Mon, 18 Sep 2017 14:11:53 +0000 Malicious software libraries found in PyPI https://lwn.net/Articles/733953/ pboddie <div class="FormattedComment"> ActiveState used to provide a Python package repository, not related to PyPI, for the users of their own Python distribution. Maybe Enthought does something similar for their own products.<br> <p> One might imagine the more resourceful users of packages from PyPI at least mirroring the service locally, just to avoid connectivity and availability issues, but that requires infrastructure effort that some companies are trying to avoid as they outsource just about everything else. So I think there could be demand, but then the operator might want to curate the packages offered (to avoid what the article references), and then it becomes a question of balancing costs and revenue. And persuading people that the benefits are worth it, of course.<br> </div> Mon, 18 Sep 2017 10:40:07 +0000 Malicious software libraries found in PyPI https://lwn.net/Articles/733945/ kronat <div class="FormattedComment"> <font class="QuotedText">&gt; I don't know how many upstream packagers are also distro packagers, but I suspect not many.</font><br> <p> There are a lot of distros... It's a huge amount of work doing both the development and the package maintenance.<br> <p> <font class="QuotedText">&gt; The point is that public repositories have no QA besides the author. </font><br> <p> Which, to me, is a plus. The root cause of the need for all these package providers is the long release cycle of some distros. A random Debian packager (look, someone uses Debian Developer, but in reality is a Debian Packager...) touches the source code of a package like it is its creator, implying a good amount of arrogance, in the form of *I'm the Debian maintainer!
I'm better than its authors*. This has led to catastrophes (remember the random number generator issue?) and to the increased presence of third-party packagers, as we are discussing, because developers are not, generally, fine with old versions of the software. Your software release uses the latest version of your dependencies, and if you don't have -stable branches of your software (as is the case for the majority of OSS), you, the developer, that knows the software like no-one else, really care about the latest version, not some version released 1 year ago.<br> <p> A more sound process is the one taken by rolling-release distros, which simply say: the authors of software "Foo" have released version 1.1.0. Since the authors know their software, the distros have to ship this version. They are not arrogant enough to say 'ok, let's backport bug fixes to 0.8.0, leaving out new features'...<br> <p> </div> Mon, 18 Sep 2017 09:03:02 +0000 Malicious software libraries found in PyPI https://lwn.net/Articles/733937/ Cyberax <div class="FormattedComment"> <font class="QuotedText">&gt; The process to become a Debian developer, on the other hand, is onerous, requiring demonstration of good faith efforts to master Debian policy through an extended mentorship. </font><br> The consequence is that very few native PIP packages are present in Debian or other distros. <br> <p> <font class="QuotedText">&gt; How many intrinsically malicious people are likely to shoulder that burden of extended bad faith? </font><br> Quite a few, given the existence of three-letter agencies around the world. Fortunately, the payoff for subverting some random Debian packages is not that great. <br> </div> Mon, 18 Sep 2017 02:00:54 +0000 Malicious software libraries found in PyPI https://lwn.net/Articles/733936/ njs <div class="FormattedComment"> To some extent this is what the standard library ends up being.
Apparently there are quite a few places that forbid the use of third-party packages entirely, or have some onerous approval procedure you have to go through before you can use them, and this creates pressure to move popular packages into the standard library (even though this is almost entirely a bad thing for everyone else).<br> <p> I'm dubious that anyone would pay money for this service, though. PyPI is barely even surviving at the current level of support, and businesses that rely on it don't seem to care about that.<br> </div> Mon, 18 Sep 2017 00:12:48 +0000 Malicious software libraries found in PyPI https://lwn.net/Articles/733927/ arvidma <div class="FormattedComment"> It surprises me that there is no commercial service that provides an audited subset of PyPI. I think most businesses that rely on Python would be happy to pay serious money for that.<br> <p> </div> Sun, 17 Sep 2017 20:45:19 +0000 Malicious software libraries found in PyPI https://lwn.net/Articles/733926/ donbarry <div class="FormattedComment"> Unbacked, perhaps, but a very reasonable assertion. Many if not most contributors to perl, python, ruby, and javascript repositories are fly-by-night, unknown individuals. That most of the code contributed is in good faith is a testament to the general well-meaning character of individuals. But there is no barrier to code contribution, which also means no barrier to malicious code contribution.<br> <p> The process to become a Debian developer, on the other hand, is onerous, requiring demonstration of good faith efforts to master Debian policy through an extended mentorship. How many intrinsically malicious people are likely to shoulder that burden of extended bad faith?
<br> <p> <p> </div> Sun, 17 Sep 2017 20:24:04 +0000 Malicious software libraries found in PyPI https://lwn.net/Articles/733921/ rweikusat2 <div class="FormattedComment"> There's no reason to assume that "random Debian developers" are inherently more trustworthy than "other random developers". And "the distribution packager will at least ..." is an unbacked assertion.<br> <p> </div> Sun, 17 Sep 2017 17:41:44 +0000 Malicious software libraries found in PyPI https://lwn.net/Articles/733907/ diegor <div class="FormattedComment"> I don't know how many upstream packagers are also distro packagers, but I suspect not many.<br> <p> While anybody can upload a new package to pypi or a similar repository, to upload a new package to a distribution you need to become a contributor, which usually requires some effort.<br> <p> Sometimes the distro packager becomes the upstream, because the package is no longer maintained upstream and some fixes are required.<br> <p> The point is that public repositories have no QA besides the author. <br> </div> Sun, 17 Sep 2017 10:08:41 +0000 Malicious software libraries found in PyPI https://lwn.net/Articles/733905/ karkhaz <div class="FormattedComment"> I've been curious about this for a while. If nobody knows the answer to this I'll try to write some automatic trawl to figure it out myself, but:<br> <p> how often is the upstream packager the same as the distro packager?<br> <p> There are a lot of people who advocate _never_ using pip, npm, gem etc. and prefer to use distro-provided packages only. I understand the argument as follows: any random developer can push something useful to pip, and then a few weeks later update it to contain a back door, with no oversight at all.
Whereas the distribution packager is going to do at least a cursory review of the diff before packaging and distributing the update.<br> <p> But there are surely a bunch of upstream developers who are also Debian Developers for the sole purpose of packaging their software? If I get some time, I might try parsing the Debian package database. My idea is to get the maintainer from the database, and if the "external link" field points to a popular package repository (pip, npm, etc.), then download and scrape the webpage that the link points to in order to find who the original author is, and compare...<br> </div> Sun, 17 Sep 2017 09:40:23 +0000 Malicious software libraries found in PyPI https://lwn.net/Articles/733904/ diegor <div class="FormattedComment"> Correction: this is why you should not simply run: pip install $package .<br> <p> But it's a common problem. Public repositories like PyPI and similar don't check the code, and probably never will. And there is no policy about identical or similar module names.<br> <p> At the moment, the only solution I see is to prefer modules installed from your preferred distribution.
At least you have some basic checks that stop the simpler attacks.<br> <p> <p> </div> Sun, 17 Sep 2017 09:16:31 +0000 Malicious software libraries found in PyPI https://lwn.net/Articles/733880/ flussence <div class="FormattedComment"> Given the timing, it seems like this is the tail end of a 90-day responsible disclosure policy.<br> </div> Sat, 16 Sep 2017 19:32:52 +0000 Malicious software libraries found in PyPI https://lwn.net/Articles/733882/ lucifargundam <div class="FormattedComment"> This is why you don't copy/paste code.<br> </div> Sat, 16 Sep 2017 19:27:02 +0000 Malicious software libraries found in PyPI https://lwn.net/Articles/733863/ aklaver No offense to the National Security Authority of Slovakia, but this is old news: http://incolumitas.com/data/thesis.pdf https://github.com/pypa/pypi-legacy/issues/379 https://mail.python.org/pipermail/distutils-sig/2017-June/030592.html Sat, 16 Sep 2017 04:09:50 +0000
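Two ideas from this thread fit into one small pre-install check. The first is pboddie's suggestion of filtering a private index so that new dependencies are excluded and flagged for approval. The second is farnz's point that a typo-squatted name (say, "urllib" where "urllib3" was meant) is exactly what a reviewer should catch. The Python below is a minimal sketch under stated assumptions. It is not a feature of PyPI, pip, or DevPI: the APPROVED allowlist, the function names, and the 0.8 similarity cutoff are all hypothetical. It normalizes project names the way PEP 503 does and accepts names that are already on the allowlist. It flags names that closely resemble an approved name as possible typo-squats, and it holds everything else for manual review.

```python
import difflib
import re

# Hypothetical allowlist of projects that someone has already reviewed.
APPROVED = {"requests", "urllib3", "numpy", "django", "h5py"}


def normalize(name: str) -> str:
    """Normalize a project name as PEP 503 does: lowercase, runs of -_. become '-'."""
    return re.sub(r"[-_.]+", "-", name).lower()


def review_requirements(requested, approved=APPROVED, cutoff=0.8):
    """Sort requested project names into ok / suspicious / needs_review buckets.

    cutoff is an arbitrary similarity threshold for difflib (0.0 to 1.0).
    """
    approved_norm = sorted(normalize(a) for a in approved)
    result = {"ok": [], "suspicious": [], "needs_review": []}
    for name in requested:
        norm = normalize(name)
        if norm in approved_norm:
            # Exact match (after normalization) with an approved project.
            result["ok"].append(name)
            continue
        # Close to, but not identical with, an approved name: possible typo-squat.
        close = difflib.get_close_matches(norm, approved_norm, n=1, cutoff=cutoff)
        if close:
            result["suspicious"].append((name, close[0]))
        else:
            # Unknown project: keep it out until someone audits it.
            result["needs_review"].append(name)
    return result


if __name__ == "__main__":
    report = review_requirements(["requests", "urllib", "flask", "URLLib3"])
    for bucket, names in report.items():
        print(bucket, names)
```

The standard-library difflib module keeps the sketch free of dependencies. It only screens names, though. A malicious update to a package that is already on the allowlist, the scenario karkhaz describes, would pass this check unchanged.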