
So much wasted energy

Posted Sep 8, 2025 22:53 UTC (Mon) by Cyberax (✭ supporter ✭, #52523)
In reply to: So much wasted energy by zyga
Parent article: npm debug and chalk packages compromised (Aikido)

There aren't that many good solutions for intermediate proxies. Things like JFrog Artifactory have licenses that cost five digits per year for a single server.

Then there's the matter of integrity. If you download packages directly from NPM, you can be pretty sure that you're getting the actual packages (verified via TLS), so you won't get supply-chain attacked by a poisoned proxy.



So much wasted energy

Posted Sep 9, 2025 9:03 UTC (Tue) by kleptog (subscriber, #1183)

Well, the web community has fought hard against any concept of a "trusted HTTPS caching proxy". Technically such a thing would be possible, allowing clients to keep the transport security of HTTPS while also allowing a third party to cache certain responses transparently. Instead we have the super-wasteful "download everything from the source every time" model.

Debian package distribution shows it can be done in a non-transparent way. Anybody can set up a Debian mirror without asking Debian for permission.

It's a bit late now. Even if we could get enough people to agree that an HTTPS caching proxy was a good idea and figure out an implementation, it would be a decade at least before there was enough support to make it work.

So much wasted energy

Posted Sep 9, 2025 14:16 UTC (Tue) by muase (subscriber, #178466)

Is it that much more wasteful though?

Whether I connect to server X or to third-party server Y for the download doesn't necessarily make a difference in itself – the total number of connections and the amount of transmitted data stay the same. Localization and server efficiency matter, but that is probably already quite good with NPM. I don't know their infrastructure, but I cannot imagine that they have a single cluster in someone's basement serving all of this across the world – I bet they are using one or more of the big CDNs, which effectively boils down to pretty efficient and rather localized cache servers.

It would be interesting to do the numbers here, because a custom "trusted HTTPS caching proxy" is also a piece of infrastructure that needs resources and energy. I'm not sure how realistic it is in practice to set up something that's local enough to be more efficient than a highly optimized CDN, which after all can make use of bigger scaling effects and better resource sharing. Maybe if it sits in your local network?

Tbh I think the only obvious improvement would be to increase local caching where it's not done already; browsers do that (with caveats), and build pipelines can use a pre-populated npm cache.

So much wasted time

Posted Sep 10, 2025 7:40 UTC (Wed) by kleptog (subscriber, #1183)

It is wasteful in that most precious resource: time. We added a local cache to our build infra because it significantly reduced the time to test our patches. The difference between a build that takes one minute and one that takes five minutes is huge.

It pains me every time I see a GitLab or ADO build with steps that take 10 seconds (or longer!) to download a container image and start it, just to run a process that completes in 2 seconds.

So much wasted time

Posted Sep 10, 2025 13:21 UTC (Wed) by mathstuf (subscriber, #69389)

Oh, how I wish for even 5-minute builds :)

But yes, we have gitlab-runner just accumulate Docker detritus throughout the week. Compilation cache is distributed (Linux) or machine-local (macOS, Windows, and clang-tidy caches on Linux). At the end of the week, we do a `docker system prune --all --force --volumes` to avoid out-of-disk issues. The nightly scheduled pipelines then end up pulling down the primary images to help with the coming week (not all machines will get all images, but it's not zero).

Other places that seem expensive as well, and are not necessarily project-specific:

- cache creation and extraction
- archive creation and extraction
- cloning strategies

There are some settings for compression level that may help if the resulting archives don't explode size limits.

So much wasted energy

Posted Sep 9, 2025 18:36 UTC (Tue) by Cyberax (✭ supporter ✭, #52523)

NPM also has quite a bit of legacy. There's an "integrity" field now in the package-lock file that can store the hash of the dependency, but it's not yet universal. Once it spreads a bit more, local caching can be made safe.

In comparison, Golang readily supports dumb caches because it stores the hashes of all the dependencies. If a cache tampers with a dependency, the download will fail.
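As a rough sketch of what that check amounts to (hypothetical file name and lockfile value; npm's integrity field uses the SRI format, i.e. "sha512-" followed by the base64-encoded digest):

```python
import base64
import hashlib

def sri_sha512(path):
    """Compute an SRI-style integrity string ("sha512-<base64 digest>") for a file."""
    h = hashlib.sha512()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 16), b""):
            h.update(chunk)
    return "sha512-" + base64.b64encode(h.digest()).decode()

# Hypothetical values: a tarball served by a local cache, and the pinned
# integrity value copied out of package-lock.json.
cached_tarball = "cache/some-package-1.2.3.tgz"
locked_integrity = "sha512-..."  # placeholder for the real lockfile value

if sri_sha512(cached_tarball) != locked_integrity:
    raise SystemExit("integrity mismatch: do not trust the cached copy")
```

Go's go.sum plays the same role: every module version is pinned to a hash, so any proxy or cache that serves different bytes is caught at download time.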

So much wasted energy

Posted Sep 9, 2025 11:28 UTC (Tue) by Wol (subscriber, #4433)

> Then there's the matter of integrity. If you download packages directly from NPM, you can be pretty sure that you're getting the actual packages (verified via TLS), so you won't get supply-chain attacked by a poisoned proxy.

Nobody's mentioned cryptographic manifests. Okay, you have the problem that every build needs a new version number (unless you can allocate multiple crypto-sums to a single file), but the master site has a manifest of crypto-hashes for all packages on the site - which the mirrors are NOT supposed to supply - and once you have the master manifest you can download the package from any Tom, Dick or Harry site and check it against the manifest before you trust it.

I forgot - make it a signed manifest - and then why can't you download even the manifest from any mirror? Isn't that the whole point of signatures: to prove who the original author was, and that it hasn't been modified since? If someone's gained the ability to forge a signed manifest, you've probably got bigger problems ...
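For a concrete feel of it, a minimal sketch along those lines (hypothetical paths and key, a simple "hexdigest filename" manifest format, and Ed25519 via the pyca/cryptography library standing in for whatever signing scheme the master site would actually use):

```python
import hashlib
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PublicKey

def verify_package(manifest_path, sig_path, publisher_key, pkg_path, pkg_name):
    """Check a package fetched from any mirror against a signed master manifest."""
    manifest = open(manifest_path, "rb").read()

    # 1. The manifest must carry a valid signature from the master site;
    #    verify() raises InvalidSignature if it was forged or tampered with.
    Ed25519PublicKey.from_public_bytes(publisher_key).verify(
        open(sig_path, "rb").read(), manifest
    )

    # 2. Find the expected hash for this package in the manifest
    #    (assumed format: one "sha256hex  filename" entry per line).
    expected = None
    for line in manifest.decode().splitlines():
        if not line.strip():
            continue
        digest, name = line.split(None, 1)
        if name == pkg_name:
            expected = digest
            break
    if expected is None:
        raise ValueError(f"{pkg_name} is not listed in the manifest")

    # 3. The package body, whichever Tom, Dick or Harry mirror it came from,
    #    must hash to the value the master site signed.
    actual = hashlib.sha256(open(pkg_path, "rb").read()).hexdigest()
    if actual != expected:
        raise ValueError("hash mismatch: do not trust this mirror's copy")
```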

Cheers,
Wol

So much wasted energy

Posted Sep 9, 2025 12:25 UTC (Tue) by anselm (subscriber, #2796)

This is essentially what Debian does. Debian has a large infrastructure of third-party mirrors, and anyone is free to run their own (public or private) mirror.

Mirror integrity

Posted Sep 9, 2025 14:10 UTC (Tue) by farnz (subscriber, #17727)

Integrity shouldn't depend on going back to the original source, if implemented properly. HTML5 has Subresource Integrity (also usable in NPM package-lock.json), Cargo.lock has integrity hashes, and of course Debian solved this particular problem for Debian packages years ago.

You need to get the lockfiles from a trusted location, and you need to get the hash into the lockfile from a trusted location, but beyond that, the problem is one of mirroring the "right" content, rather than integrity.

Mirror integrity

Posted Sep 9, 2025 16:48 UTC (Tue) by josh (subscriber, #17465)

Subresource Integrity is interesting, but it unfortunately fails open: browsers that don't understand it will still load the resource without checking integrity. It would be nice if all browsers that supported Subresource Integrity also supported the Integrity-Policy header (and an equivalent in HTML, perhaps using http-equiv).

Also, in an ideal world, it'd be nice to do cross-domain caching based on hash, so that if you've already downloaded a file with a given cryptographically strong hash, you don't have to re-download it from another site.

Mirror integrity

Posted Sep 9, 2025 17:33 UTC (Tue) by farnz (subscriber, #17727)

You could, however, build on top of SRI and say that for anything that builds on it, if the SRI check fails, you re-fetch from the origin directly. For example, you could say that if you also include an integrity-hash-cache attribute, then the cached copy can be shared with anything that has a matching SRI hash and the integrity-hash-cache option set - allowing a copy of something fetched from (say) ajax.googleapis.com to satisfy your copy of jQuery served from my-host.example.com without ever fetching it from my-host.example.com, while still fetching from my-host.example.com if Google's copy has diverged from yours (say because Google has hotfixed a bug that you depend upon).

That avoids the fail-open issue, because you're now saying "you can do this thing that you could not normally do, but only if the integrity verification works". If integrity verification fails, you fall back to fetching from the origin directly.
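A minimal sketch of that flow (hypothetical cache layout; the shareable flag stands in for the proposed integrity-hash-cache opt-in):

```python
import base64
import hashlib
import os
import urllib.request

CACHE_DIR = "sri-cache"  # hypothetical: cached bodies are keyed by their SRI hash

def sri_sha384(body):
    return "sha384-" + base64.b64encode(hashlib.sha384(body).digest()).decode()

def fetch_with_sri(url, integrity, shareable):
    """Serve from the hash-keyed cache when the opt-in allows it; on a miss or
    any mismatch, fall back to fetching from the origin and verifying that."""
    cache_path = os.path.join(CACHE_DIR, integrity.replace("/", "_"))

    if shareable and os.path.exists(cache_path):
        body = open(cache_path, "rb").read()
        if sri_sha384(body) == integrity:
            return body  # cache hit: the origin is never contacted

    # Cache miss, or the cached copy no longer matches: go to the origin.
    body = urllib.request.urlopen(url).read()
    if sri_sha384(body) != integrity:
        raise ValueError("origin content does not match the pinned SRI hash")
    if shareable:
        os.makedirs(CACHE_DIR, exist_ok=True)
        with open(cache_path, "wb") as f:
            f.write(body)
    return body
```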

Mirror integrity

Posted Sep 10, 2025 6:53 UTC (Wed) by taladar (subscriber, #68407)

> Also, in an ideal world, it'd be nice to do cross-domain caching based on hash, so that if you've already downloaded a file with a given cryptographically strong hash, you don't have to re-download it from another site.

That sounds like it would just make hash collision attacks a lot easier (not finding the collision, but exploiting one once it has been found).

Mirror integrity

Posted Sep 10, 2025 8:37 UTC (Wed) by edeloget (subscriber, #88392)

Are there any known hash collisions for any modern cryptographic hash function (starting with the SHA-2 family)?

SHA-2 is not known to be weak

Posted Sep 10, 2025 16:48 UTC (Wed) by dkg (subscriber, #55359)

No, there are no known hash collisions in any member of the SHA-2 or SHA-3 families today.

So much wasted energy

Posted Sep 9, 2025 15:51 UTC (Tue) by Karellen (subscriber, #67644)

Why don't tools keep git clones of the repos?

Any update should only need to download the changes since the last time you grabbed one - especially if you're only tracking `main` - and checking out by commit id should be pretty secure. Also, git is totally capable of working from a clone of the repos you care about that is "local" to your organisation/subnet (updated once every few hours or so), with all your client tools cloning from that, and you can still be confident that if you've got a specific git commit, it's the one you meant to get.

Every client downloading a whole new tarball, all from the upstream hosting provider, every single time, is absolutely bonkers.
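A minimal sketch of that local-mirror arrangement, driving git from a script (hypothetical paths and URL; pinning a commit id is what ties the checkout to the content you meant to get, regardless of which mirror served the objects):

```python
import os
import subprocess

MIRROR = "/srv/git-mirrors/project.git"                    # hypothetical local mirror
UPSTREAM = "https://git.example.com/upstream/project.git"  # hypothetical upstream URL

def run(*args):
    subprocess.run(args, check=True)

def refresh_mirror():
    """Create the bare mirror once, then only fetch what changed upstream
    (run this from cron every few hours)."""
    if not os.path.exists(MIRROR):
        run("git", "clone", "--mirror", UPSTREAM, MIRROR)
    else:
        run("git", "-C", MIRROR, "remote", "update", "--prune")

def checkout_pinned(workdir, commit):
    """Client tools clone from the nearby mirror and check out a specific commit."""
    run("git", "clone", MIRROR, workdir)
    run("git", "-C", workdir, "checkout", "--detach", commit)
```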

So much wasted energy

Posted Sep 9, 2025 16:09 UTC (Tue) by dskoll (subscriber, #1630)

> Every client downloading a whole new tarball, all from the upstream hosting provider, every single time, is absolutely bonkers.

It absolutely is. But so many build systems, especially in the embedded world, just externalize costs and hammer the upstream provider. They just don't care.

For a package I wrote and maintain (RP-PPPoE) I had to put the download link behind a form that asks you to verify you're human before letting you download. Something like 95% of the traffic to my server was downloads of the same tarball, over and over and over again.

