PyTorch and the PyPI supply chain
The PyTorch compromise that happened right at the end of 2022 was rather ugly, but its impact was not widespread—seemingly, at least. The incident does highlight some of the perils of relying on an external "supply chain" for the components that are used to build one's software. It also would appear to be another case of "security researchers" run amok, though perhaps that part of the story is only meant to cover the tracks—or ass—of the perpetrator.
Beyond that, the incident shows that the Python Package Index (PyPI) and the pip package installer act in ways that arguably assisted the compromise. That clearly comes as a surprise to many, though those behaviors are well-known and well-established in the Python Packaging Authority (PyPA) community. There is, at minimum, a need for education on that topic.
Compromise
People (or continuous-integration bots) who installed the nightly build of the PyTorch machine-learning framework using pip between December 25 and 30 got an unwelcome surprise. Along with PyTorch, a dependent module was installed that contained a malicious binary, which was triggered when that module was imported into a PyTorch-using code base. That binary gathers system information (e.g. name servers, host names) and the contents of various interesting files (e.g. /etc/passwd, $HOME/.ssh/*, the first 1,000 files in $HOME), then uploads that information to an attacker-controlled server using encrypted DNS queries.
In order to build PyTorch, multiple dependencies of various sorts are required. Some are regular PyPI packages that should be downloaded from that repository, while others are PyTorch-specific packages that should come from the PyTorch nightly repository. A single pip command is used to install from both PyPI and the PyTorch nightly repository, which is given on the command line, but pip does not distinguish between the two repositories; it treats them both as equal possibilities for fulfilling the need for a given package.
If there is a dependency on, say, torchtriton from some other part of PyTorch and there is a package by that name available on PyPI, pip can choose to install it instead of the one with the same name in the PyTorch repository. That is exactly what happened, of course; an attacker registered the torchtriton PyPI package and uploaded a version of that code that functioned the same as the original—except that it added the malicious payload that is executed when it is imported. It is unknown exactly how many sites were actually affected, but the malicious torchtriton package was downloaded from PyPI around 2,800 times, according to a lengthy analysis of the compromise by Tzachi Zorn.
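For illustration, the affected nightly installs used a command along these lines (the exact package list and index URL vary by platform and GPU configuration, so treat this as a sketch rather than the precise command):

```
# Install the PyTorch nightly build, pulling PyTorch-specific packages from
# the project's own index *in addition to* PyPI (illustrative; the exact
# form varies by platform):
pip3 install --pre torch torchvision torchaudio \
    --extra-index-url https://download.pytorch.org/whl/nightly/cpu

# pip treats the two indexes as interchangeable: if a dependency such as
# torchtriton is available from both, the candidate with the "best" version
# wins, no matter which index serves it.
```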
Once the PyTorch project was alerted to the malware at PyPI on December 30, it took immediate steps to fix the problem. The torchtriton package name was removed as a dependency from PyTorch and replaced with pytorch-triton; a placeholder project called pytorch-triton was registered at PyPI so that the problem could not recur. In addition, PyTorch nightly builds that referred to torchtriton as a dependency were removed from the repositories so that any cached versions of the malicious package would not be picked up inadvertently. The PyPI administrators were also alerted and they promptly removed the malicious package. On December 31, the project put out the advisory linked above.
The analyses by Zorn and by Ax Sharma at BleepingComputer describe efforts by the perpetrator of the attack to explain their actions. At first, the domain used for the DNS lookups that exfiltrated the information put up a short text message [archive link] claiming that the information was gathered simply to identify the companies affected so that they could be alerted. Another, longer message with similar claims, including that all of the data gathered by the malicious payload had been deleted, was apparently sent to various outlets and can be seen in those articles. It is pretty much impossible to verify one way or the other; it could be truthful and heartfelt—or it could simply be damage control.
Dependency confusion
The type of problem being exploited here is called "dependency confusion"; the technique was popularized by Alex Birsan in 2021, but the pip bug report linked above makes it clear that the problem was known in that community back in 2020. When the ‑‑extra‑index‑url option for pip is used, it consults that index and adds all of the packages it provides to its internal list. When it comes time to install a package, pip chooses the one with the highest version (or highest version that satisfies any version constraints that were specified) regardless of which repository it comes from.
PEP 440 ("Version
Identification and Dependency Specification") governs how pip
chooses which version to install. One might think pinning a dependency to a
specific version would be sufficient, but, as Dustin Ingram pointed
out in a recent discussion, that is not true. pip and other
installers "prefer wheels with more specific tags over less specific
tags
". That makes it relatively easy for an attacker to shadow even a
version-pinned dependency.
As Ingram noted in another message, the way to truly pin a dependency is by specifying the hash values of the binary artifacts to be installed, as described in the pip documentation.
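A minimal sketch of what that looks like, using a hypothetical package name and a placeholder digest:

```
# requirements.txt -- pin the version *and* the expected artifact digest
# (the package name and hash value below are placeholders):
somepackage==1.2.3 \
    --hash=sha256:0123456789abcdef0123456789abcdef0123456789abcdef0123456789abcdef
```

Installing with `pip install --require-hashes -r requirements.txt` (hash-checking mode is also enabled automatically once any hash appears in the file) makes pip refuse any artifact whose digest does not match, no matter which index served it.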
That thread is interesting in other ways, however. It starts with a request for help in convincing the security administrators at a company to unblock PyPI. Kirk Graham ran into a problem at his company, which had wholesale blocked the repository "because there were '29 malwared malicious modules' at the site". Those modules had long been removed from PyPI, but the reputation for unreliability lingered on. Brett Cannon pointed out that there are lots of other places where malicious code can sometimes be obtained:
My first question would be whether they block every project index out there (e.g., npm, crates.io, etc.), as they all have the same problem? Or what about GitHub? I mean where does the line get drawn for protecting you from potentially malicious code?

My follow-up is how do they expect you to use any open source Python code? If so, how are you supposed to get that code? Straight from the repositories? I mean I know lots of large companies that ban pulling directly from code indexes like PyPI, but then these are large companies with dedicated teams to get the source, store it internally, do their own builds of wheels, etc. If you block access to using what the projects provide you have to be up for doing all the work they provide in getting you those files.
Several in the thread pointed to various services and tools for managing dependencies of open-source components, which might help solve the problem at the company. Graham was clearly frustrated with the situation and his company, but once he found out about PyTorch, he changed his tune to a certain extent:
Over the holidays there was malicious code added to PyTorch module on PyPi. That makes me think our Security Director is right. If there isn't better security from PyPi and GitHub those sites will be blocked by more and more companies. Open Source needs to be more secure. /sigh
That is not an entirely accurate picture of what happened, which was pointed out in the thread, but the larger point still stands. To outsiders it looks like PyTorch itself was compromised on PyPI, when what actually happened is more nuanced than that.
The pip bug report came up in the thread as well. Reading through that report makes it clear that the problem does not lend itself to a simple or straightforward fix. The root of the problem is that people do not understand that using the PyPI repository is not without risks and they fail to fully evaluate what those risks are—and what they mean for their software supply chain. As Paul Moore put it when the bug was resurrected after the Birsan posting in 2021: "But I do think that we're trying to apply a technology solution to a people problem here, and that never goes well :-("
Much of what Moore and other PyPA developers have to say in the report is worth reading for those interested in the problem. So far, the most straightforward "solution" is to remove the ‑‑extra‑index‑url option entirely, but that has its own set of problems, as Moore noted:
There really is no "good" way of securing ‑‑extra‑index‑url if you look at it that way. Allow pip to look in 2 locations and you have to accept that all of your packages are now being served as securely as the least secure index. And the evidence of the "dependency confusion" article is that people simply aren't aware of that reality. So what the pip developers need to decide is whether our responsibility ends with having documented how multiple indexes work, or whether we should view the ability to have multiple indexes as an "attractive nuisance" and remove it to ensure that people aren't tempted to use it in an insecure manner.

The clamour of voices arguing "this is a security flaw", plus the sheer stress on the maintainers that would be involved in arguing that this isn't our problem, suggests that we should remove the feature. But there's no doubt that it would penalise people who use the ability correctly - and it feels wrong to be penalising those people for the sake of the group who didn't properly assess the risks.
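In the meantime, one practical workaround is to give pip only a single index to consult, typically an internal proxy that mirrors PyPI alongside any private packages, so that there is only one namespace in which names get resolved. A sketch (the proxy URL is hypothetical):

```
# One index, one namespace: pip never has to choose between two sources
# for the same package name (https://pypi.internal.example/ is hypothetical):
pip install --index-url https://pypi.internal.example/simple/ torch

# The same can be made the default for all installs via pip.conf:
#   [global]
#   index-url = https://pypi.internal.example/simple/
```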
The bug report thread was brought to life again after the PyTorch mess, naturally. Moore describes some concrete steps that could be taken to address the problem, but it still requires someone (or some organization) willing to take on that work, make a proposal, and push it through to completion. So far there has been a lot of talk about the problem, but little in the way of action to fix it.
It really should come as no surprise that grabbing random code from the internet sometimes results in less than ideal outcomes. The flipside of that is that, usually, "sometimes" is extremely rare, which in some ways leads directly to the "attractive nuisance" argument. These kinds of problems are not new and are seemingly not going away anytime soon either. Each time we have an event like this PyTorch compromise, it gives open-source software another black eye, which is perhaps not entirely fair, but also not entirely surprising.
| Index entries for this article | |
|---|---|
| Security | Python |
| Security | Supply chain |
| Python | Packaging |
| Python | Security |
Posted Jan 12, 2023 1:13 UTC (Thu)
by koh (subscriber, #101482)
[Link] (24 responses)
Granted, I've been using Gentoo for quite a while, so to me it feels natural to say, e.g., '>=sys-libs/glibc-2.32::gentoo' in order to give the constraints:
- package 'sys-libs/glibc'
- version larger or equal to 2.32
- repository called 'gentoo' (locally)
In a non-centralized setting with "‑‑extra‑index‑url" there is no 'local' name/reference to a repository, but that shouldn't be a problem - at least on the technical level. The URLs are still managed centrally (for most of the internet for most of the time - that's another can of worms, though).
I keep coming back to the question why every language needs their own package manager with the usual set of problems to (a) discover and (b) solve in incompatible manners...
Posted Jan 12, 2023 2:47 UTC (Thu)
by mathstuf (subscriber, #69389)
[Link] (8 responses)
Because the alternative is waiting for distros to repackage a hundred thousand random project repos? Add in Nix, vcpkg, chocolatey, HomeBrew, etc.
Let's say I'm working on a project. I discover that I can split a new library out of it. What do I do? I make a repo (or directory; many language package managers don't care that much) and publish it. Users can upgrade to this version just fine today. If I need to wait for…something to happen elsewhere, my tool is stuck in out-of-date versions until someone picks up the ball and adds this new package.
Sure, you could say "just use what is in your distro", but that ignores reality. People want new compilers, new development tools, etc. These end up pulling in the same things the distro wants to provide, but in newer, incompatible versions. What are you to do? Uproot your distro when Debian turns out to be too slow?
I'm all for splitting out dependencies and using system copies when possible, but I can't link my development processes with Debian (or Arch, Fedora, etc.) release cycles. I've got work to do, you know? Far better to let developers pick their own distro sandbox they like playing in and letting them do development on top of it in a convenient manner.
Posted Jan 12, 2023 3:09 UTC (Thu)
by bferrell (subscriber, #624)
[Link] (2 responses)
This kewl new app will ONLY use the cutting edge version of the language... But the dev has no clue to document this.
It's beginning to make RPM dependency hell look simple
Posted Jan 12, 2023 11:44 UTC (Thu)
by kleptog (subscriber, #1183)
[Link] (1 responses)
No problem, create a patch, send it to the developer, they merge it, push a new minor release to PyPI and you can get on with it. In my experience, a month or two is the usual turnaround time. This fits in the release cycle, we just pause the ticket till the upstream release. Telling us to "use the version shipped by the distribution" is equivalent to saying "work around this bug for the next year or two". And it's not just one bug, it's several over several different packages. Eventually, tracking which workarounds are waiting for which upstream release becomes a significant amount of work.
Besides, workarounds are annoying, this is open source, we should be fixing the upstream packages, not working around the issues elsewhere.
I know projects with the strict rule that all packages must be installed from Debian. And it *almost* works. If the packages are missing features you can simply tell the product owner it's not possible yet. But there are always a few packages for which the Debian release is simply buggy, but in such a corner case, affecting basically only you, that it's not going to be updated there (because upstream has fixed it in a new version, and Debian isn't going to bump the version). So you end up making an exception for just that handful of packages (basically using py2deb). And hope it doesn't get too many.
The step from there to "just pull everything from PyPI with version/hash pinning" is very, very small.
Posted Jan 12, 2023 13:53 UTC (Thu)
by farnz (subscriber, #17727)
[Link]
And don't forget that predicting what a distro will have when you release your new version is hard. If you're going to release in April 2023, and something you depend upon has a necessary update released in December 2022, is that update going to be in the current stable Debian release when you make your release?
If you guess wrong, you end up in one of two sub-optimal situations:
Bundling from a vendor source neatly sidesteps this - if Bookworm has the dependency version you need, then unbundling can be done then, while if it doesn't, no problem, you've got the vendored code. And then language repositories like PyPI make it simpler, because they're already working in terms of a dependency tree, not copied code, so you can look and go "aha, when I build the Debian package, I can unbundle libfoo, since Debian has the right version of libfoo already".
Posted Jan 12, 2023 15:37 UTC (Thu)
by rgmoore (✭ supporter ✭, #75)
[Link] (4 responses)
I don't know if waiting for the distributions to package everything is the right solution. I'm inclined to believe developers who say it's unlikely to work for them, if only because of the problem you highlight of there being too many libraries to package. I also believe, though, that the current solution of grabbing whatever is out there, trusting it's fine, and then acting surprised by supply chain attacks is also not working. It's just resulting in occasional spectacular failures rather than regular, boring unavailability of bug fixes and product enhancements.
What is needed is a system that provides some kind of real quality control, so developers can have confidence that the libraries they're using are what it says on the tin. This has the unfortunate side effect of slowing everything down for the QC step, but the alternative is occasionally getting pwned when attackers finally decide your system is worth attacking. Pretending everything is fine in an attempt to go as fast as possible is demonstrably not working.
Posted Jan 12, 2023 17:22 UTC (Thu)
by kleptog (subscriber, #1183)
[Link] (1 responses)
For example, it would be possible for someone to write a bot that checked if the contents of the wheels distributed by PyPI match the source in the indicated repository. The problem is there is nowhere to place this information in a way that is of any use. Or, it would be nice to straight up reject any package that has existed for less than 3 months. Or being able to namespace dependencies to ensure they come from the right repository.
Python is here paying for the early decision that no packaging/repository standard would be made and that the community would be left to create one organically. It's biting back hard now. More recent languages did not repeat that mistake.
PS. Don't talk to me about solutions like Nexus which try to solve the problem on the client-side but don't really have any extra information to work with and so just end up adding an extra layer of frustration. Until the necessary information is available in machine readable form no client-side tooling can help.
Posted Jan 13, 2023 4:01 UTC (Fri)
by pabs (subscriber, #43278)
[Link]
https://github.com/crev-dev/
https://reproducible-builds.org/
https://bootstrappable.org/
Posted Jan 18, 2023 1:15 UTC (Wed)
by hazmat (subscriber, #668)
[Link] (1 responses)
Posted Jan 18, 2023 15:34 UTC (Wed)
by hazmat (subscriber, #668)
[Link]
Posted Jan 12, 2023 3:00 UTC (Thu)
by mpr22 (subscriber, #60784)
[Link] (5 responses)
I am less than half joking when I say "blame Perl".
Posted Jan 12, 2023 3:14 UTC (Thu)
by bferrell (subscriber, #624)
[Link] (1 responses)
A few years back a VERY common module got re-written and made major changes to the behavior of the code... With no documentation. They just thought it was a "good idea (tm)".
Post hasty, that got changed and while the new behavior WAS a good idea and kept, it became a "turn it on with a variable if you want it" vs "here, let me shove this down your throat".
Posted Jan 12, 2023 14:36 UTC (Thu)
by smoogen (subscriber, #97)
[Link]
Posted Jan 12, 2023 15:03 UTC (Thu)
by rgmoore (✭ supporter ✭, #75)
[Link]
In Perl's defense, the modern distribution system didn't exist yet when CPAN started. CPAN was built at more or less the same time as modern distribution packaging systems, so waiting for packages to go up on the distribution wasn't a serious option. Even if people had been willing to wait a few years for that system to develop, nobody knew that it was going to develop, or even that Linux was going to win the Unix wars, so some kind of homebrew packaging system was necessary.
Posted Jan 12, 2023 17:01 UTC (Thu)
by Sesse (subscriber, #53779)
[Link] (1 responses)
Posted Jan 22, 2023 9:42 UTC (Sun)
by oldtomas (guest, #72579)
[Link]
I think Perl's growth happened at a time where "fitting in an environment" was the obvious thing to do. One data point? POD has as one of its main targets man pages.
Python (re- [1]) started a trend which I'll call "language monotheism", where each language had (or thought it had) to fight for absolute dominance. I think this might be something for computer sociologists to study some day.
[1] Not the first round, mind you. Older people might remember C vs Pascal, quiche eaters and that. Of course, nowadays, in the era of overabundance, survival and money are more at stake than back then.
Posted Jan 12, 2023 3:05 UTC (Thu)
by flussence (guest, #85566)
[Link]
Posted Jan 12, 2023 8:45 UTC (Thu)
by ms (subscriber, #41272)
[Link] (4 responses)
I think that certainly helps. But there have also been lots of examples of submitting PRs that get malicious code in to repos; along with social engineering to take over code repos; and e.g. established chrome extensions being sold to a new owner and then malicious code gets injected. In these cases, the name of the repository hasn't changed.
Another thing that helps is getting away from this mantra of "always fetch the latest version that satisfies your semver constraints". If you take the Go approach of _minimal_ version rather than maximal, then the blast radius is much reduced: it is no longer sufficient to release a new compromised version - that on its own will not get picked up. You would also have to modify the deps of a repo that imports that, and of that, and so on, all the way up to the top.
https://research.swtch.com/vgo-mvs (the section on "Upgrade Timing" is most relevant here).
I'm certainly not claiming Go is the only language to do this; it is simply the one with which I'm most familiar.
What I absolutely detest is the attitude that "this is a people problem, we shouldn't try to solve it with technical means". Correct - you won't be able to _solve_ it. But that's not the point. The point is to reduce the probability of these farcical messes from occurring. And there is plenty of prior art out there that helps. Refusing to learn from that is just sticking your head in the sand.
Posted Jan 12, 2023 12:35 UTC (Thu)
by khim (subscriber, #9252)
[Link]
Sounds like cultivation of Log4Shells instead of “dependency confusion”. But yeah, that definitely fits well into the “simple non-solutions” scheme that Go practices.
Posted Jan 14, 2023 19:42 UTC (Sat)
by KJ7RRV (subscriber, #153595)
[Link] (2 responses)
Posted Jan 15, 2023 11:14 UTC (Sun)
by farnz (subscriber, #17727)
[Link] (1 responses)
There cannot be a way to specify that an update is a security update without losing any gains from the "minimal version" route; there is no way to distinguish "malicious actor flags version with back door as security update" from "good actor flags version removing back door as security update".
As with so many things, it all comes down to trust. If you trust upstream to release good updates, you want to take their latest code. If you don't trust upstream, you should be locking exact versions, and reviewing every new release upstream manually before you bring it in (which, in turn, has to be your top priority in case the fixes are security relevant to your code).
Posted Jan 15, 2023 12:18 UTC (Sun)
by ms (subscriber, #41272)
[Link]
Both of these are relevant:
https://go.dev/blog/vuln
https://go.dev/blog/supply-chain
Posted Jan 12, 2023 12:19 UTC (Thu)
by khim (subscriber, #9252)
[Link] (2 responses)
Because Gentoo is not macOS or Windows, basically. Newbies to programming would, inevitably, use one of those two. And if your language doesn't support them well, then its chances of being used in place of a more popular alternative are almost nil. And if you have something that works for beginners… people continue to use it for other things, because why not?
Posted Jan 12, 2023 20:14 UTC (Thu)
by Wol (subscriber, #4433)
[Link] (1 responses)
However, I wouldn't recommend gentoo to newbies ... (unless, of course, they want to do things the hard way :-)
Cheers,
Wol
Posted Jan 12, 2023 21:19 UTC (Thu)
by khim (subscriber, #9252)
[Link]
The problem with distros is not technical, ultimately, it's social. Most distro makers drank too much FSF kool aid and now believe others want to create sources for others to use. Nothing could be further from the truth! Neither users nor developers are interested in the software for software's sake. Their goal is to produce a binary and to either give it away or use it. That's why the disconnect is so deep. In a world where creation of software source is the goal, you have to support various versions of dependencies (because this increases usability of your sources) and then, on top of that, you can afford “curated repos”, then, on top of that, you can provide “long term support” and all these other things. In a world where software source only exists because it's not very convenient to write in machine code directly… the situation is radically different: developers assume that they would decide what dependencies they would use and what targets they would support, and users decide they would decide what version of the application they would download and use. Given the insane disconnect between expectations it's no wonder no one is happy. Gentoo, NixOS and other such distros support that mode, but they make the assumption that this desire to control everything goes to the core… but most developers and users don't go that far: they are happy to use the OS that the hardware maker gives them, or too scared to replace the OS that the hardware maker gives to them; they want to control things on top of that. Maybe if Gentoo or Nix supported macOS and Windows this would have been an acceptable compromise, but alas, they don't do that (at least they don't make it easy enough for a newbie to use), thus we have no alternative to per-language package managers.
Posted Jan 12, 2023 6:55 UTC (Thu)
by bof (subscriber, #110741)
[Link] (10 responses)
All these language package repo things run wide open to everyone uploading stuff, with the obvious downsides. So why aren't there trusted language "distros" with trusted groups of maintainers curating that into trustable, separate repos meant for the "consumers" out there? And why the frell does everybody consuming the packaging accept that as God-given (adding in a snide remark about the Dino distros)?
Posted Jan 12, 2023 7:12 UTC (Thu)
by maniax (subscriber, #4509)
[Link] (3 responses)
And this is not only a security question. Stuff "out there" is usually too bleeding edge to be reliable enough, and just fetching "the latest and greatest" is bound to break stuff.
Posted Jan 12, 2023 8:28 UTC (Thu)
by taladar (subscriber, #68407)
[Link] (2 responses)
Stability in a changing world is an illusion or in many cases even a deception sold to the gullible companies who desire it but don't understand how fast the world really moves in terms of software compatibility with the rest of the world (both in terms of protocols, data formats,... and in terms of legal and regulatory frameworks,...) and security issues.
Posted Jan 12, 2023 8:58 UTC (Thu)
by ms (subscriber, #41272)
[Link]
I think everything really does just boil down to "you just have to trust other people". Yep, checksums, and version numbers, and all that goodness is great for verifying things don't change that you don't want to. I wouldn't want to be without that. But when I'm looking for a library to solve a particular problem, I look at the number of stars and forks, the rate of commits and who they're from, and the issue tracker, and that's my starting point for establishing trust. And I think it's a good thing: a society where the default behaviour is not to trust, not to give the benefit of the doubt, not to assume good, is not worth having.
Posted Jan 12, 2023 10:12 UTC (Thu)
by ballombe (subscriber, #9523)
[Link]
The code does not run in a vacuum. Distributions are much more familiar with the environment where the code will run, and most distribution developers are also part of upstream. They also tend to have a more user-aligned view than upstream. User-hostile upstreams do exist.
Posted Jan 12, 2023 8:22 UTC (Thu)
by LtWorf (subscriber, #124958)
[Link]
And since it generally doesn't end up in malware being downloaded, the current system is good enough… until the next malware happens.
Posted Jan 12, 2023 12:45 UTC (Thu)
by khim (subscriber, #9252)
[Link]
Because there are no “consumers”? Developers want two things which, obviously, can not be satisfied simultaneously: Distributions solve problem #2 well but entirely fail to handle #1. Language repos and AppStores solve #1 well, but suck at #2. Since half a loaf is better than no loaf developers stick to what solves one problem and can half-ass the 2nd one rather than use something that fails entirely to solve half of the problem.
Posted Jan 12, 2023 13:24 UTC (Thu)
by mathstuf (subscriber, #69389)
[Link]
Haskell has Stackage.
Posted Jan 12, 2023 15:05 UTC (Thu)
by kpfleming (subscriber, #23250)
[Link] (1 responses)
Said 'consumers' will need to be willing to compensate the people who do this work; it's definitely not an effort which can be funded with volunteer time (and we can already see how that works in other areas).
Posted Jan 13, 2023 6:02 UTC (Fri)
by bof (subscriber, #110741)
[Link]
Absolutely. Where you have distributions now with significant parts of the important packages somewhat current in their latest and/or rolling releases, they are backed by enough manpower to have dedicated paid people take care of a certain subject area. And they have built their base of enterprise customer subscriptions to fund that in a sustainable fashion.
Seeing Python at the top of the yearly language popularity lists, I feel that something like that should work in the dynamic languages field, too.
So, Conda, right? Is it the only "player" right now doing something like that?
Posted Jan 12, 2023 22:06 UTC (Thu)
by Cyberax (✭ supporter ✭, #52523)
[Link]
Because that's a lot of work. There are companies that are selling this as a service, but they all kinda suck.
Posted Jan 12, 2023 11:12 UTC (Thu)
by summentier (guest, #100638)
[Link] (1 responses)
But trying to coerce setuptools to do what you want it to do is not fun. Its code is an undocumented mess, its abstractions are leaky and incoherent, and its architecture is like a Jenga tower resting on top of a pile of Mikado sticks. Look at the nontrivial setup scripts bundled with projects such as numpy or tensorflow: they resemble ancient incantations much more than actual code.
So I do not envy pip's job. But much of what ails setuptools also seems to have infected pip: its documentation is ... terse, to say the very least, its code isn't great either, and it does like to act and fail in ever-surprising ways. Moreover, coming from Rust or Julia, it is very hard to be satisfied with the hodgepodge of virtualenvs one has to set up in case of dependency conflicts. So, respectfully, it seems in character that pip does something sub-optimal and I think a doc fix is not going to fix those deep structural issues. (Anaconda, while certainly well-intentioned, tends to make everything worse, particularly on supercomputers.)
I understand that pip is in a tight spot now with respect to backwards compatibility.
Hopefully new projects (such as poetry) will improve this, I have to say, rather sorry state.
Posted Jan 12, 2023 17:10 UTC (Thu)
by mathstuf (subscriber, #69389)
[Link]
I think at one point, NumPy's additions to setuptools were on the same order of size as setuptools itself. SciPy probably didn't make things any easier.
I will agree about the undocumented mess 100% though. Figuring out what could go into some fields (globs, symlink traversal, etc.) involved tracing the value(s) through the code to where they hit some active API that actually used them. The duck typing helps with being able to get things done by abusing things like `../` traversal to grab things, but really hinders with making anyone aware of what is possible (and what of that is actually intended).
Posted Jan 14, 2023 9:29 UTC (Sat)
by cyperpunks (subscriber, #39406)
[Link] (5 responses)
Maybe a path forward is to split PyPI into a curated/blessed part and a free-for-all section? The blessed part will move somewhat slower, but much faster than PSL.
Posted Jan 14, 2023 11:11 UTC (Sat)
by amacater (subscriber, #790)
[Link]
And yes - I'm frankly amazed how many language / package distribution mechanisms for various operating systems have effectively reimplemented apt poorly.
Posted Jan 16, 2023 0:24 UTC (Mon)
by rgmoore (✭ supporter ✭, #75)
[Link] (3 responses)
Maybe this is unfair, but the impression I've gotten from the discussions on here is that anything less than full speed ahead will upset a lot of developers. At the very least, each developer has their own idea about how much delay for quality control is acceptable, and any delay at all will upset some people. Whatever choice you make will leave some people unhappy that there's too much delay and others unhappy that there isn't enough QC.
Posted Jan 16, 2023 0:58 UTC (Mon)
by Wol (subscriber, #4433)
[Link]
Sod full speed ahead. Sod quality control. Just take a step back. Think about what you're doing. INVEST TIME IN DESIGN. Then you *won't* *need* so much quality control. Then "full speed ahead" will feel like a tortoise (and won't do a Torrey Canyon). Then you'll end up with twice the quality in half the time.
The problem is that, without someone who has the power to knock heads together, having a sensible design discussion can be incredibly difficult. It just takes a couple of people who think their needs are the greatest, and are determined make their voice heard over everyone else, and things will implode.
Cheers,
Wol
Posted Jan 16, 2023 10:05 UTC (Mon)
by kleptog (subscriber, #1183)
[Link] (1 responses)
I read that here a lot too, but I've yet to meet such a developer in real life. Sure, you have junior developers that wonder what the point is. When they've spent a week trying to untangle dependencies to get the buildbot to pass again they suddenly appreciate the virtue of pinning versions.
Untangling package dependencies to find a working combination is one of the least interesting jobs there is.
Posted Jan 16, 2023 11:37 UTC (Mon)
by farnz (subscriber, #17727)
[Link]
The thing that appears as "full speed ahead" is not that all developers want to be on the latest version of everything, but that the combined effect of all developers wanting their pet dependency to be on the latest version (which adds a feature they need, or a bugfix that affects their product's security) is "full speed ahead".
Basically, anything other than "we only accept dependencies in the oldest distribution release in extended support" (RHEL6, for example) ends up looking like "full speed ahead" in discussions, because no matter how carefully you consider your update plans, there will be someone who perceives your decision to update a minimum supported dependency version as "moving too fast".
Posted Jan 26, 2023 17:27 UTC (Thu)
by irvingleonard (guest, #156786)
[Link] (1 responses)
1. You could use your private index by disabling PYPI altogether and provide every possible dependency. It would work for your package but break every other one out there.
2. You could embrace PYPI and not use a private index. This might not be feasible for political (or technical?) reasons.
3. You could use them both at the same time, which is what they ended up doing.
Now, the problem is that if you use PYPI you have to play by its rules. Package names are an asset on PYPI: the first one that claims it will own it. They obviously didn't read that memo and got bitten by it. The "solution" was as simple as publishing a dummy package on PYPI with a very low version number for every "private" package that only lived in their private index. That dummy package could be a simple readme with the instructions on how to reach the private index, and with that they would have prevented the hijacking.
Am I wrong in this analysis?
Posted Sep 11, 2023 19:49 UTC (Mon)
by snnn (guest, #155862)
[Link]
1. Their packages are huge. One file could be 1-2 GB. But PyPI is free. PyPI cannot be so generous as to provide that much free storage for every PyPI project.
2. You may build PyTorch with different build configs, for example different CUDA versions. The PyTorch community wants to keep all of them under the same name: pytorch. Otherwise it would be harder for other packages to set up a dependency on PyTorch. Therefore, PyTorch chose two different approaches: 1. put all of them in the same index and use local versions to distinguish them; 2. put each of them into a different index. However, both approaches are not supported by PyPI.
This problem is very general. Almost all machine learning packages with GPU acceleration capabilities need to deal with this. I believe every non-casual user should set up their own private pypi index. Even if the original problem is fixed, as long as you have multiple indexes, you are still at risk. You may think of the problem in a different way: how much can I trust Facebook's pypi index servers? What if someone puts a fake "wheel" package in PyTorch's PyPI index? Don't think no Facebook employee's account can be hacked if you still remember that last year Nvidia lost their GPG key.