Removing support for DeltaRPMs in Fedora
Way back in 2009, we looked at the presto plugin for yum, which added support for DeltaRPMs to Fedora. That package format allows just the binary differences (i.e. the delta) between an installed RPM and its update to be transmitted, which saves network bandwidth; the receiving system then creates the new RPM from those two pieces before installing it. Support for DeltaRPMs was eventually added to the distribution by default, though the feature has never really lived up to expectations, or hopes. Now, it would seem that Fedora is ready to, in the words of project leader Matthew Miller, "give DeltaRPMs a sad, fond farewell".
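To make the delta idea concrete, here is a minimal sketch in Python. It is not the deltarpm algorithm or file format: the `make_delta` and `apply_delta` helpers and the copy/data instruction list are hypothetical, and the standard library's `difflib` stands in for a real binary-diff tool. It only illustrates that the new file can be rebuilt from the installed one plus a small set of differences.

```python
from difflib import SequenceMatcher


def make_delta(old: bytes, new: bytes) -> list:
    """Describe `new` as copies out of `old` plus the literal bytes `old` lacks."""
    delta = []
    matcher = SequenceMatcher(None, old, new, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            # These bytes already exist on the receiving system.
            delta.append(("copy", i1, i2))
        elif tag in ("replace", "insert"):
            # Only these bytes have to travel over the network.
            delta.append(("data", new[j1:j2]))
        # "delete" needs no entry: those old bytes are simply not copied.
    return delta


def apply_delta(old: bytes, delta: list) -> bytes:
    """Rebuild the new file from the installed one and the delta."""
    out = bytearray()
    for op in delta:
        if op[0] == "copy":
            out += old[op[1]:op[2]]
        else:
            out += op[1]
    return bytes(out)


def literal_bytes(delta: list) -> int:
    """Count the bytes that the delta carries itself."""
    return sum(len(op[1]) for op in delta if op[0] == "data")


old = b"header v1.0\n" + bytes(range(256)) * 2
new = b"header v1.1\n" + bytes(range(256)) * 2 + b"new symbol\n"
delta = make_delta(old, new)
rebuilt = apply_delta(old, delta)
```

The rebuild step is the CPU and disk I/O cost that the rest of the discussion weighs against the bandwidth saved.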
Miller raised the question of retiring DeltaRPMs in a February 21 post to the Fedora devel mailing list. He pointed to a five-year-old open bug report that described problems with retaining the .drpm files for packages due to the way the Pungi distribution composer works. Miller also noted that a thread from 2021 discussing "deltarpm usefulness?" did not come to any firm decision.
The problem is that the DeltaRPMs are created but only get synced to the mirrors as part of the update composed on the day the DeltaRPMs were created; the next day, a new distribution update gets composed, without using the previous DeltaRPMs, and that gets pushed to the mirrors. The net effect, as Jonathan Dieter pointed out in the bug report, is that the DeltaRPMs are only available for a day: "That means that the only way to take full advantage of deltarpms in Fedora is to update every single day." Doing things that way "has very little end-user value", Miller said.
There are some benefits to using DeltaRPMs, especially for those on slow or metered connections, but generally bandwidth consumption has faded as a major problem for most people at this point. Meanwhile, as Kevin Fenzi pointed out in the bug report, there are still costs:
Sure, you save BW [bandwidth], but you spend more in CPU and disk I/O to reassemble the rpms on every client machine. You also make updates pushes slower and use more resources.
Beyond that, there are some technical hurdles in the way of better DeltaRPM retention on the Fedora mirrors. The problem has never risen to the top of anyone's priority list, which is unfortunate, but now probably is not the time to address that, Miller said. Instead, Fedora has other technologies: "We have ostree and various container-delta approaches. We should focus on those [...]".
There were few complaints in the thread about that conclusion—generally, the opposite, in fact. Stephen Smoogen wondered when the axe should fall; should DeltaRPMs be removed for Fedora 39 and beyond, after a particular date, or, perhaps, discontinued for the upcoming Fedora 38? Miller said that normally he would be suggesting starting with Fedora 39, since the deadline for changes to Fedora 38 (which is due in April) has passed, but the DeltaRPMs have largely not been available anyway. He asked the infrastructure and release-engineering folks what the ramifications of simply shutting off the creation of DeltaRPMs would be. Kevin Fenzi replied that it was an easy change to make (and to back out if need be), without much in the way of risks.
Both Gary Buhrmaster and Dennis Gilmore posted their anecdotal experiences with the availability (and savings) from DeltaRPMs. The feature did not really provide much for either of them; Buhrmaster put it this way:
While occasionally I have seen a small decrease in the size of the files transferred (which certainly can benefit some people some of the time), the total elapsed time of the transaction has always ended up being higher as the recreation of the original rpm exceeds the time that it would have taken me to just download the full new rpm (with an admittedly reasonably high speed network provider in my environment).
The savings versus cost of DeltaRPMs is not entirely easy to work out; Ben Cotton thought the tradeoff may be particularly problematic for those with lower-powered systems. "And since delta RPMs trade bandwidth for CPU, it probably makes things worse for folks in developing countries." He did wonder if the feature should only be turned off by default on Fedora installs, while the distribution continued building the .drpm files; that would "still allow people to opt-in to it" and the distribution to measure its use. But Miller said that, even with the measurement, it would still be difficult to judge whether the benefit outweighs the cost.
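The tradeoff can be framed as simple arithmetic: the delta path wins only when the download time it saves exceeds the time spent rebuilding the RPM. The following sketch uses made-up numbers (a 50MB package, a 10MB delta, and 20 seconds of rebuild time are all assumptions, not measurements from the thread) to show where the break-even point falls.

```python
def full_time(full_mb: float, bandwidth_mbit_s: float) -> float:
    """Seconds to download the complete RPM."""
    return full_mb * 8 / bandwidth_mbit_s


def delta_time(delta_mb: float, bandwidth_mbit_s: float, rebuild_s: float) -> float:
    """Seconds to download the delta and then rebuild the RPM locally."""
    return delta_mb * 8 / bandwidth_mbit_s + rebuild_s


def break_even_mbit_s(full_mb: float, delta_mb: float, rebuild_s: float) -> float:
    """Bandwidth above which fetching the full RPM is the faster option."""
    return (full_mb - delta_mb) * 8 / rebuild_s


# Hypothetical figures, chosen only for illustration.
FULL_MB, DELTA_MB, REBUILD_S = 50, 10, 20

threshold = break_even_mbit_s(FULL_MB, DELTA_MB, REBUILD_S)
fast_link = (full_time(FULL_MB, 100), delta_time(DELTA_MB, 100, REBUILD_S))
slow_link = (full_time(FULL_MB, 0.256), delta_time(DELTA_MB, 0.256, REBUILD_S))
```

Under these assumptions the full download is faster on any link quicker than 16Mbit/s, while the delta is much faster on a 256Kbit/s link. A slower CPU raises the rebuild time and so lowers that threshold, which is the effect Cotton was worried about.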
Several times in the recent thread, Demi Marie Obenour advocated removing DeltaRPM support in order to reduce the attack surface of the distribution. She is mostly concerned with security holes in the program that reassembles the RPM from the delta, as she described in a March 2022 post:
This assumes that deltarpm (the program) does not contain any security flaws of its own, which could allow for code execution while the deltarpm is being applied. This is a bad assumption: a cursory audit I did found that it is not designed with untrusted input in mind. The code is also quite hard to follow, which makes auditing it quite difficult. Finally, it exposes decompression libraries to untrusted input before signature verification, and it itself has at the very least several areas where a bad deltarpm could cause it to allocate gigabytes of RAM.
In the earlier thread that Miller had pointed to, there was some confusion about the integrity and authenticity checking for DeltaRPMs. Marek Marczykowski-Górecki had proposed disabling DeltaRPMs in that earlier thread as well, but thought another reason to do so was security-related because DeltaRPMs are "processed before checking the package signature, which exposes rather big attack surface [...]". But Fenzi described the process for assembling and checking a DeltaRPM, which is just as secure as the checking for a regular RPM, he said. Obenour's concerns about the reassembly process are still valid, of course, but:
drpms work by downloading the delta, then using it + the version you have installed to recreate the signed rpm (just like you downloaded the full signed update) and then the gpg signature is checked of that full rpm, just like one you downloaded. If the drpm is tampered with it won't reassemble and it will fall back to the full signed rpm.
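The flow Fenzi describes (rebuild, verify, fall back) can be sketched as follows. This is an illustration only: a SHA-256 digest stands in for the GPG signature check that the real tools perform on the rebuilt RPM, and the `rebuild_from_delta` and `get_update` helpers and the copy/data delta format are hypothetical.

```python
import hashlib


def rebuild_from_delta(installed: bytes, delta: list) -> bytes:
    """Toy reassembly: copy ranges from the installed data, splice in new bytes."""
    out = bytearray()
    for op in delta:
        if op[0] == "copy":
            out += installed[op[1]:op[2]]
        elif op[0] == "data":
            out += op[1]
        else:
            raise ValueError("unknown delta instruction")
    return bytes(out)


def get_update(installed, delta, trusted_sha256, download_full):
    """Return (package_bytes, source); fall back to the full download if needed."""
    try:
        # Note that the delta is parsed here before anything has been verified.
        candidate = rebuild_from_delta(installed, delta)
    except ValueError:
        candidate = None
    # Stand-in for the signature check on the rebuilt package.
    if candidate is not None and hashlib.sha256(candidate).hexdigest() == trusted_sha256:
        return candidate, "delta"
    full = download_full()
    if hashlib.sha256(full).hexdigest() != trusted_sha256:
        raise ValueError("full package failed verification")
    return full, "full"


installed = b"pkg-1.0 payload"
new = b"pkg-1.1 payload"
trusted = hashlib.sha256(new).hexdigest()
good = [("copy", 0, 6), ("data", b"1"), ("copy", 7, 15)]
tampered = [("copy", 0, 6), ("data", b"X"), ("copy", 7, 15)]

ok_result = get_update(installed, good, trusted, lambda: new)
bad_result = get_update(installed, tampered, trusted, lambda: new)
```

A tampered delta produces bytes that fail the check, so the full package is used instead. The sketch also shows where Obenour's concern sits: the reassembly code runs on unverified input before the check can happen.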
The security concerns seem like they could be addressed, as could the build-process issues that drastically reduce the availability of DeltaRPMs, but there is no large impetus for the feature these days. The tradeoff is potentially valuable in some scenarios, but the cost has been deemed too high by many. Given that users have been largely living without even the limited benefits that the feature provides—for a fair number of years and Fedora releases—the time has come to let it go. That will not happen without some action by the Fedora Engineering Steering Committee (FESCo), but with essentially zero opposition to removing DeltaRPMs, agreement seems clear.
Posted Mar 8, 2023 18:37 UTC (Wed)
by jhoblitt (subscriber, #77733)
[Link] (3 responses)
Posted Mar 8, 2023 20:17 UTC (Wed)
by Sesse (subscriber, #53779)
[Link] (2 responses)
Posted Mar 8, 2023 23:21 UTC (Wed)
by josh (subscriber, #17465)
[Link] (1 responses)
Posted Mar 9, 2023 8:54 UTC (Thu)
by Sesse (subscriber, #53779)
[Link]
Posted Mar 8, 2023 19:13 UTC (Wed)
by hikingpete (guest, #139538)
[Link]
Posted Mar 8, 2023 20:44 UTC (Wed)
by Cyberax (✭ supporter ✭, #52523)
[Link] (10 responses)
Actually, why is everything related to RPM or DEB packages so slow? Some time ago I tried https://distr1.org/ and I was amazed by package installation speed (which is kinda its goal). APK in Alpine is another really fast package manager.
Posted Mar 8, 2023 21:08 UTC (Wed)
by bartoc (guest, #124262)
[Link] (1 responses)
Posted Mar 9, 2023 12:00 UTC (Thu)
by LtWorf (subscriber, #124958)
[Link]
Posted Mar 8, 2023 21:14 UTC (Wed)
by jreiser (subscriber, #11027)
[Link] (4 responses)
Posted Mar 9, 2023 1:12 UTC (Thu)
by rgmoore (✭ supporter ✭, #75)
[Link] (1 responses)
To put it another way, the designers chose to prioritize simplicity and maintainability over performance. It might be possible to rewrite the code in a better performing language than Python, cache the timezone data, and so forth, but that would come at the cost of making the code more complex and thus harder to maintain. This might be a worthwhile tradeoff, but it would be hard to be sure until someone has actually written an alternative and seen how fast it goes.
Posted Mar 9, 2023 1:46 UTC (Thu)
by rahulsundaram (subscriber, #21946)
[Link]
RPM is largely written in C and parts of it are now using Rust.
https://fedoraproject.org/wiki/Changes/RpmSequoia
If the reference to Python is about yum/dnf, then dnf5 is written in C++ (with Python bindings) and should improve speed (some numbers in the change proposal)
Posted Mar 10, 2023 22:24 UTC (Fri)
by WolfWings (subscriber, #56790)
[Link] (1 responses)
Posted Mar 11, 2023 0:10 UTC (Sat)
by mbunkus (subscriber, #87248)
[Link]
Posted Mar 9, 2023 9:02 UTC (Thu)
by Sesse (subscriber, #53779)
[Link]
1. It is very conservative in fsync; if e.g. the power goes out, the package manager needs to be able to rollback to a safe state.
#2 makes #1 _much_ worse. I'm fairly certain that if you switched to something like SQLite and redesigned the rollback system around that (not literally reusing database transactions for filesystem transactions, but sticking your intents into the database and then persisting them all to disk in a single commit), you could get package installation down to <100 ms on an SSD, cold-cache.
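As a rough illustration of the "intents in a database" idea, here is a sketch using Python's standard `sqlite3` module. It is not how dpkg or rpm work today; the `intents` table, its columns, and the helper names are all hypothetical. The point is only that many planned file operations can be recorded in one transaction and therefore one commit, and that unfinished work can be found again after a crash.

```python
import sqlite3


def open_journal(path: str = ":memory:") -> sqlite3.Connection:
    db = sqlite3.connect(path)
    # Ask for write-ahead logging (only takes effect for on-disk databases).
    db.execute("PRAGMA journal_mode=WAL")
    db.execute(
        "CREATE TABLE IF NOT EXISTS intents ("
        " id INTEGER PRIMARY KEY,"
        " package TEXT NOT NULL,"
        " path TEXT NOT NULL,"
        " done INTEGER NOT NULL DEFAULT 0)"
    )
    return db


def record_intents(db: sqlite3.Connection, package: str, paths: list) -> None:
    """Persist every planned file operation for a package in a single commit."""
    with db:  # one transaction, committed when the block exits
        db.executemany(
            "INSERT INTO intents (package, path) VALUES (?, ?)",
            [(package, p) for p in paths],
        )


def mark_done(db: sqlite3.Connection, path: str) -> None:
    with db:
        db.execute("UPDATE intents SET done = 1 WHERE path = ?", (path,))


def pending(db: sqlite3.Connection) -> list:
    """Work that would have to be finished or rolled back after a crash."""
    rows = db.execute("SELECT path FROM intents WHERE done = 0 ORDER BY id")
    return [row[0] for row in rows]


db = open_journal()
record_intents(db, "foo", ["/usr/bin/foo", "/usr/share/man/man1/foo.1"])
mark_done(db, "/usr/bin/foo")
```

Whether this would reach the speed suggested above is untested here; the sketch only shows the shape of the design.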
Posted Mar 13, 2023 3:33 UTC (Mon)
by cozzyd (guest, #110972)
[Link] (1 responses)
Posted Mar 13, 2023 11:20 UTC (Mon)
by timon (subscriber, #152974)
[Link]
# rm -r /var/lib/man-db
From now on, apt/dpkg will tell me something like:
> Processing triggers for man-db (2.10.2-1) ...
You might also want to clean up after man-db now, removing the actual cache in /var/cache/man, disabling or deleting cronjobs and systemd timers, ...
Posted Mar 9, 2023 5:26 UTC (Thu)
by pabs (subscriber, #43278)
[Link] (13 responses)
Posted Mar 9, 2023 6:52 UTC (Thu)
by epa (subscriber, #39769)
[Link] (12 responses)
Posted Mar 9, 2023 8:11 UTC (Thu)
by anton (subscriber, #25547)
[Link] (9 responses)
According to the article, Ben Cotton thinks that CPU is more of a problem in developing countries than bandwidth. I really doubt that. Bandwidth can be a problem even in not-so-central areas in developed countries and can differ by several orders of magnitude, while the CPU performance for single-threaded latency-sensitive code differs only by about an order of magnitude, even for older and weaker CPUs.
AFAIK git uses diffs over the wire (and only over the wire), and I think that's still a good approach.
Posted Mar 9, 2023 11:07 UTC (Thu)
by hailfinger (subscriber, #76962)
[Link] (7 responses)
And for some larger organizations (universities, internet providers, ...) it makes sense to provide their own mirror for often-downloaded software to reduce the amount of external bandwidth needed. Faster downloading for their own users is a nice side effect.
Posted Mar 9, 2023 14:02 UTC (Thu)
by pabs (subscriber, #43278)
[Link] (6 responses)
Posted Mar 9, 2023 20:16 UTC (Thu)
by smoogen (subscriber, #97)
[Link] (5 responses)
Working with one of the CDNs, we tried a method to keep more data longer in the CDN but ran over what they were wanting to give us in a week or so. I believe currently we only use the CDNs in specific targeted regions and solutions (aka if you have an Amazon cloud instance you will go through their CDN).
The problem is increased by the number of delta-rpms we need to keep. We basically have to keep one per package as far back as we want to go. Many packages get updates daily, but people in low-bandwidth areas are also usually only updating anywhere from once a week to once a month. That means we need to keep at least a month's worth of every package around so that the delta between A-one-month-ago.rpm and A-today.rpm can be calculated. Each one of those takes anywhere from a few seconds to a minute to calculate. We normally update a couple hundred to a thousand packages a day. Doing just the previous delta means the deltas can be calculated in an hour or so for a compose to get out the door. We would need to extend that by about 7 to 30 times to make it really useful for people who need delta rpms.
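A back-of-the-envelope version of that estimate, as a sketch: the specific inputs (720 updated packages and 5 seconds per delta) are assumptions picked from within the ranges given above, and the model assumes one delta per updated package for each day of history kept.

```python
def delta_build_hours(packages_per_day: int, seconds_per_delta: float,
                      history_days: int = 1) -> float:
    """Hours of delta generation per compose under the stated assumptions."""
    return packages_per_day * seconds_per_delta * history_days / 3600


previous_only = delta_build_hours(720, 5)        # delta against yesterday only
one_week = delta_build_hours(720, 5, history_days=7)
one_month = delta_build_hours(720, 5, history_days=30)
```

With these inputs a compose goes from about one hour of delta generation to 7 hours for a week of history and 30 hours for a month, which would not fit in a daily compose.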
Posted Mar 9, 2023 23:40 UTC (Thu)
by excors (subscriber, #95769)
[Link]
Probably not really relevant to Fedora but for general interest in this topic: Windows has tried a few different approaches for efficient updates, described in https://devblogs.microsoft.com/oldnewthing/20200213-00/?p... (and earlier posts linked from there).
"Express" updates sound closest to DeltaRPMs: they generate a patch from every previous release to the current release, and do that every time there's a new release, so you end up generating O(n^2) patches after n releases. That's quite expensive on the server side and not great for caches.
"Quality" updates start with a single baseline release. For every new release, they generate a patch from the baseline to the new release, plus a reverse patch from the new release back to the baseline. Then a client running any arbitrary version can apply the reverse patch for their version, followed by the forwards patch for the latest version. That means you end up with O(n) patches in total. The process is not as bandwidth-efficient as express updates, but it sounds like it simplifies a lot of the update infrastructure.
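The difference in growth can be counted directly. This sketch follows the two descriptions above and nothing more; the function names are hypothetical, and `releases` counts every release including the baseline.

```python
def express_patch_count(releases: int) -> int:
    """Each new release gets a patch from every earlier release."""
    total = 0
    for current in range(1, releases + 1):
        total += current - 1  # one patch per earlier release
    return total


def quality_patch_count(releases: int) -> int:
    """Each release after the baseline gets one forward and one reverse patch."""
    return 2 * max(releases - 1, 0)


express_10 = express_patch_count(10)
quality_10 = quality_patch_count(10)
express_100 = express_patch_count(100)
quality_100 = quality_patch_count(100)
```

After 10 releases the counts are 45 versus 18; after 100 they are 4950 versus 198, which is the O(n^2) versus O(n) gap.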
Posted Mar 12, 2023 11:22 UTC (Sun)
by rwmj (subscriber, #5474)
[Link] (3 responses)
Posted Mar 12, 2023 12:40 UTC (Sun)
by pabs (subscriber, #43278)
[Link] (2 responses)
Posted Mar 12, 2023 12:45 UTC (Sun)
by rwmj (subscriber, #5474)
[Link]
Posted Mar 13, 2023 13:39 UTC (Mon)
by paulj (subscriber, #341)
[Link]
Posted Mar 9, 2023 13:13 UTC (Thu)
by epa (subscriber, #39769)
[Link]
Posted Mar 9, 2023 8:13 UTC (Thu)
by pabs (subscriber, #43278)
[Link]
Posted Mar 11, 2023 7:06 UTC (Sat)
by gdt (subscriber, #6284)
[Link]
Sites like Google, Facebook and Netflix build out content distribution networks to solve these issues as well as to spread load. Attempts to build a free CDN have not been nearly as successful as mirrors. My guess is that's because mirrors have more ability for the server owner to apply policy; for example, a mirror might limit itself to Linux software, or to retrogaming, etc. A Linux distribution could use a commercial CDN, but for a network which wants more rapid access to FOSS software, that general-purpose CDN is even more problematic (eg, that network could install, say, an Akamai cache but still not be assured that FOSS software would be cached on-shore).
Posted Mar 9, 2023 10:02 UTC (Thu)
by gtirloni (subscriber, #85631)
[Link]
It was pretty cool when I was on a 256Kbps connection, but now that everyone I know has >100Mbps, it's much less useful.
So long, and thanks for all the fish, DeltaRPM :wave:
Posted Mar 9, 2023 10:36 UTC (Thu)
by NRArnot (subscriber, #3033)
[Link] (1 responses)
They might do better to provide a simple "black box" method for locally caching rpms site-wide, so that folks with two or more similar installations would only have to download an rpm once on their broadband. How about a live USB rpm-cacher that one could just plug into any old PC with a decent amount of space to spare on its hard disk, and a one-line mod to the .repo files?
The last time I had to upgrade an entire site (about 20 systems) from one Fedora release to another, I hacked together a special-purpose Squid proxy. But it had a few gremlins, and I had to disable it rather than leave it in place for day-to-day upgrades. It wasn't worth my time to work out what I was doing wrong and it had served its main purpose (of removing limited broadband bandwidth as an hours-at-work factor).
Posted Mar 9, 2023 14:38 UTC (Thu)
by rahulsundaram (subscriber, #21946)
[Link]
You might be looking for https://dnf-plugins-core.readthedocs.io/en/latest/local.html
quick-fedora-mirror at https://fedoraproject.org/wiki/Infrastructure/Mirroring#M... if you want something more full fledged.
Posted Mar 9, 2023 13:40 UTC (Thu)
by eru (subscriber, #2753)
[Link] (2 responses)
Posted Mar 17, 2023 14:54 UTC (Fri)
by plugwash (subscriber, #29694)
[Link] (1 responses)
Posted Mar 17, 2023 16:27 UTC (Fri)
by rahulsundaram (subscriber, #21946)
[Link]
Unit of maintenance and binary but yeah this is true for Fedora and likely every distribution that builds from source.
Posted Mar 9, 2023 18:10 UTC (Thu)
by flussence (guest, #85566)
[Link] (3 responses)
But I'd also like to point out the experience of trying to do so in Linux utterly sucks. Even *Windows* just does p2p update distribution without manual setup.
We have nearly ubiquitous mDNS, everything is cryptographically signed, and the status quo is that if you want to do something as simple as (e.g.) getting a few Debian/Ubuntu boxes to share .deb files locally, you've got to ssh into every box individually to configure apt-cacher-ng by hand. I imagine Fedora's situation isn't much nicer, or else they would've done this sooner. My own Gentoo+NFS setup is functional but the less said about its specifics the better.
Posted Mar 9, 2023 19:25 UTC (Thu)
by cesarb (subscriber, #6266)
[Link] (2 responses)
There's a privacy risk in doing the same for a Linux distribution. On Windows, you could assume that everyone who is on the same Windows version has the same set of Windows components. On a Linux distribution, the set of installed packages is usually distinct enough that it could even be used to fingerprint a single device. And that's just fingerprinting; the presence or absence of a package could be something you might not want to announce to anyone who happens to be on the same local network as your laptop (for instance, you might not want your peers to know that you have installed tor-browser, or emacs).
Of course, for desktop computers which never leave a trusted local network, that's not much of an issue, so it would be great if enabling a "p2p update distribution" mode were as simple as setting a configuration flag or installing a single package (with no further manual configuration needed).
Posted Mar 9, 2023 20:54 UTC (Thu)
by Cyberax (✭ supporter ✭, #52523)
[Link] (1 responses)
What is the scenario? A hotel network setting up a rogue mDNS-based mirror?
I guess it's possible, but what would be the purpose? They already can use your MAC address to identify your machine.
Posted Mar 9, 2023 21:36 UTC (Thu)
by jreiser (subscriber, #11027)
[Link]
Posted Mar 10, 2023 3:25 UTC (Fri)
by pabs (subscriber, #43278)
[Link]
http://debdelta.debian.net/
Slowness of RPM package updates
2. It has a home-grown database and transaction system based on flat files, which means there can be no WAL or similar techniques to batch states, and every large action involves dozens of files. (For instance, when you start up dpkg -i, it reads a couple thousand of these flat files into RAM.)
> Not building database; man-db/auto-update is not 'true'.
It seems to me that content delivery networks (CDNs) are the (mostly) user-transparent modern version of mirrors, and are extensively used by companies with lots of traffic for the same files. It's interesting that they have not replaced mirrors in those areas where mirrors are established; I guess the CDNs take more money. As with mirrors, sometimes (rarely) there are problems. Concerning proxy servers, they went out of favour a long time ago (even before https became common) for some reason; in a way CDNs deliver the same service, but the server side arranges them rather than the client side.
What I have wondered for a long time, though, is why many updates seem to update RPMs that probably have had no other change than the version number. For example, any Wine update reinstalls a ton of font files. I guess it is a simplification for the project maintainers: it is easier to label all components as Vn+1, even though many are the same as in Vn or even Vn-1, than to keep track of the real changes.
MAC address is not a UUID
https://debdeltas.debian.net/
https://wiki.debian.org/Teams/Dpkg/Spec/DeltaDebs