
Improving .deb

May 28, 2019

This article was contributed by Alexander E. Patrakov

Debian Linux and its family of derivatives (such as Ubuntu) are partly characterized by their use of .deb as the packaging format. Packages in this format are produced not only by the distributions themselves, but also by independent software vendors. The last major change of the format internals happened back in 1995. However, a discussion of possible changes was recently started on the debian-devel mailing list by Adam Borowski.

As documented in the deb(5) manual page, modern Debian packages are ar archives containing three members in a particular order. The first file is named debian-binary and has the format version number, currently "2.0", as one line of text. The second archive member is control.tar.xz, containing the package metadata files and scripts that are executed before and after package installation or removal. Then comes the data.tar.xz file, the archive with the actual files installed by the package. For both the control and data archives, gzip, not xz, was used for compression historically and is still a valid option. The Debian tool for dealing with package files, dpkg, has gained support for other decompressors over time. At present, xz is the most popular one both for Debian and Ubuntu.
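
To make the member layout concrete, here is a minimal Python sketch (not dpkg's code; the file name is a placeholder) that lists the members of a .deb by reading the ar(5) headers by hand:

def list_deb_members(path):
    with open(path, "rb") as f:
        assert f.read(8) == b"!<arch>\n"       # global ar magic
        while True:
            header = f.read(60)                # fixed-size member header
            if len(header) < 60:
                break
            name = header[0:16].decode("ascii").rstrip()
            # The size is an ASCII string of at most ten decimal digits:
            # the field that caps a member at 9,999,999,999 bytes.
            size = int(header[48:58])
            print(name, size)
            f.seek(size + (size % 2), 1)       # data is padded to an even length

list_deb_members("example.deb")

Run on a typical package, this prints debian-binary, control.tar.xz, and data.tar.xz with their sizes, in that order.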

The choice to use ar as the outer archive format might seem strange. After all, the only other modern application of this format is for static libraries (they are ar archives with object code files inside), and the de facto standard for archives in the Unix world is tar, not ar. The reason for this historical decision is, according to Ian Jackson, that "handwriting a decoder for ar was much simpler than for tar".

Before 1995, a different format, not based on ar, was used for Debian packages. It was, instead, a concatenation of two ASCII lines (format version and the length of the metadata archive) and two gzip compressed tar archives, one with metadata, similar to the modern control.tar.gz, and one with files, just like data.tar.gz. Even though old-format packages are not in active use now, modern dpkg can still create and install them.
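
Parsing the old format is almost as simple as its description; here is a rough sketch, assuming the layout documented in the deb-old(5) manual page (the file name is a placeholder):

def split_old_deb(path):
    with open(path, "rb") as f:
        version = f.readline().strip()         # first ASCII line: format version
        control_len = int(f.readline())        # second line: metadata archive length
        control_tar_gz = f.read(control_len)   # metadata, like modern control.tar.gz
        data_tar_gz = f.read()                 # the rest of the file: the data archive
    return version, control_tar_gz, data_tar_gz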

What prompted Borowski to start a discussion about changing the internals of the package format amounts to a few possible improvements that can easily be implemented. For example, while the xz compressor yields the smallest package size, switching to zstd for compression would improve the unpacking time by a factor of eight while still beating the venerable gz in terms of compression ratio. As Borowski suggested:

Thus, even though we'd want to stick with xz for the official archive, speed gains from zstd are so massive that it's tempting to add support for it, at least for non-official uses, possibly also for common Build-Depends.

To be fair, this is not the first time developers have proposed zstd compression support for inclusion into Debian's dpkg. Also, Ubuntu 18.04 ships with zstd support already enabled in its version of dpkg.
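
For readers who want to see the speed difference on their own data, a rough benchmark sketch follows. It assumes the third-party zstandard Python package, picks an arbitrary compression level of 19, and uses a placeholder input file; none of this reproduces Borowski's exact setup.

import lzma
import time
import zstandard

def time_decompress(label, decompress, blob, runs=5):
    start = time.perf_counter()
    for _ in range(runs):
        decompress(blob)
    print(label, (time.perf_counter() - start) / runs, "seconds per run")

raw = open("data.tar", "rb").read()            # placeholder input file
xz_blob = lzma.compress(raw)                   # xz at the default preset
zstd_blob = zstandard.ZstdCompressor(level=19).compress(raw)
time_decompress("xz", lzma.decompress, xz_blob)
time_decompress("zstd", zstandard.ZstdDecompressor().decompress, zstd_blob)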

Beyond recommending adding support for a new compressor, Borowski suggested returning to the old format. The reason was that ar archives (and thus modern deb packages) store the size of their members as a string of no more than ten decimal digits. This means that data.tar.xz can be at most 9,999,999,999 bytes long, or approximately 9.3GiB. While there are no packages of this size in the Debian archive (the largest package is flightgear-data-base, taking "only" 1,178,833,172 bytes), this limitation is indeed a problem for some communities producing unofficial packages, as confirmed by Sam Hartman. The old format does not have a fixed-size length field and thus does not have such a limitation. In addition, in the benchmarks performed by Borowski, even in the apples-to-apples comparison using the gzip compressor for both format versions, the old format was slightly faster to decompress.

Jackson, as the developer who introduced the currently used format, responded that Borowski's suggestion is "an interesting proposal". He acknowledged that the size limitation is indeed a problem and explained the rationale behind the current format. Namely, the old format was not easy to extract without dpkg (e.g. on non-Debian systems) and was not easily extensible. A short discussion thereafter confirmed that people do routinely extract .deb files on "foreign" Linux distributions by hand and perceive this ability as an important property of the .deb package format. Extensibility, on the other hand, in practice amounted to the addition of new decompressors and new fields in files that are in the control tarball. All of that could be done with the old format just as well.

However, switching away from the current "ar with tar files inside" format does not necessarily mean returning to the old format. And that's exactly the objection raised by Ansgar Burchardt. He mentioned the use case of extracting only a few data files (such as the Debian changelog, or a pristine copy of the configuration files), which is currently slow. This operation is slow not only because of a slow decompressor, but also because, in order to get to a file in the middle of a compressed tar archive, one has to decompress and discard everything before it. In other words, fixing this slowness would require switching away from a "compressed tar" format for the data archive to something that supports random access. According to Burchardt, if the Debian project were to introduce one incompatible change to the package format anyway, it would also be a chance to move away from tar, or to tack on other improvements that require incompatible changes. Jackson, however, expressed disagreement with the idea of bundling several incompatible changes together.

Borowski measured the overhead of switching to a seekable archive format by compressing each file in the /usr directory and the Linux kernel source individually and comparing the total size of the compressed files with the size of a traditional compressed tar.xz archive. As it turns out, individually compressed files, which are needed for a seekable archive, took 1.8x more space, thus making the proposal too expensive. Burchardt suggested retesting with the 7z archiver, because it can do something in between compressing files individually and compressing the whole archive. Namely, to get a file from the middle of the archive, one needs to decompress everything not from the very beginning, but only from the beginning of a so-called "solid block" containing the file in question. The "solid block" size is tunable. Still, even with 16MiB solid blocks, according to Borowski's measurement, "the space loss is massive" (1.2x). This experiment convinced Burchardt that switching to a format that allows random access is just not worth it.
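
The shape of this experiment is easy to replicate. Below is a sketch under simplifying assumptions: xz at its default preset stands in for whatever settings Borowski used, the tree must fit in memory, and the directory argument is a placeholder.

import io
import lzma
import os
import tarfile

def individual_vs_solid(root):
    individual = 0
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for dirpath, _, names in os.walk(root):
            for name in names:
                path = os.path.join(dirpath, name)
                if not os.path.isfile(path):
                    continue
                with open(path, "rb") as f:
                    data = f.read()
                individual += len(lzma.compress(data))   # each file compressed alone
                tar.add(path)
    solid = len(lzma.compress(buf.getvalue()))           # one solid compressed tar
    print("overhead: %.2fx" % (individual / solid))

individual_vs_solid("/usr/share/doc")          # placeholder tree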

The idea of replacing ar with uncompressed tar as the outer archive format has also been proposed. This would eliminate the package-size limitation while keeping the advantage that Debian packages can be examined and unpacked by low-level shell tools. This is actually the same as the opkg format used by embedded Linux distributions.

Guillem Jover (the maintainer of dpkg) acknowledged the problems with both old and current .deb package formats and, after examining possible alternatives, concluded that the proposal to switch the outer archive format to tar is "the most straightforward and simple of the options". He promised to present a diff to the .deb format documentation and to start adding support in dpkg version 1.20.x. However, Borowski objected to any "archive in archive" format design and especially did not like uncompressed tar as the outer archive, because it wastes bytes on so-called "blocks" that are only relevant for tape drives. Also, optional features of the tar archive format, such as sparse file support, would unnecessarily complicate the implementation.

Jackson suggested that it is possible to support only a strict subset of the tar format, without the problematic features. He noted that it is already the case for the usage of ar as the outer archive format, "to the point that it is awkward to *create* a legal .deb with a normal ar utility". He also brought up his old idea on how to deal with the data.tar.xz size limit: just split it into multiple files and store them in the ar archive as extra members. This proposal has the advantage that it is still compatible with third-party tools and amounts to absolutely no change if the existing package size limit is not hit.

At this point, the discussion had accumulated quite a large number of conflicting proposals and opinions. With the issue being so contentious, Jover retracted his promise to work on changing the format documentation. The thread died off without any conclusions or action items. Still, at this time no official Debian packages come close to the limitations of the current .deb format, so no urgent action is needed. And, if someone needs to unofficially package something really big, they can do it right now, thanks to Borowski's observation that the old format is still supported.



Improving .deb

Posted May 28, 2019 18:24 UTC (Tue) by smoogen (subscriber, #97) [Link] (11 responses)

Alternatives that could be looked at for changing the outer container too:
* Ogg. It is a container format which could be used and has a lot of code for it in many different modes. This would allow for the greater portability that Debian strives for.
* zip. Also cross-platform, and the default download type from, say, git(hub/lab), where you could download the deb directly. Make some changes, and a zip2exe which installs Debian on a Windows box could be possible.
* cpio. No, no, there are bridges too far.

Improving .deb

Posted May 28, 2019 19:03 UTC (Tue) by mbunkus (subscriber, #87248) [Link] (9 responses)

I hope your mentioning Ogg is a joke. Ogg is a horrible container format, even for the purpose it was actually designed for (audio/video encapsulation with synchronization, suitable for streaming at low bitrates). It doesn't have a central directory of its content. It doesn't have an index. Its time synchronization mechanism requires knowledge of the codec on the (de)muxer level, not only on the en-/decoder level. Changing anything inside the container requires rewriting the whole file.

But the worst of all is its horrible overhead. The whole stream is split into Ogg packets (one audio or video frame in one packet) — so far so good. But Ogg packets are again spread across one or more Ogg pages, adding more overhead for no real benefit (the reason is to have points to sync to in streaming scenarios as each Ogg page starts with a well known byte sequence, but that's completely irrelevant for file storage).

The most ridiculous thing of it all is how the size of Ogg packets is encoded. The algorithm is:

size = 0
do {
byte = read_next_byte()
size += byte
} while (size == 255)

So in order to encode the size of an arbitrary file, let's say /bin/bash at 1,166,912 bytes, the size will be encoded in 4,577 bytes. Encoding the size of the example from the article, 1,178,833,172 bytes, would require a whopping 4.5 MB.

Apart from that, Ogg has no provisions for storing things a file container might need (ownership, permissions, ACLs, extended attributes, whatever else) while containing stuff you don't need for file storage (granulepos values used for inter-stream time synchronization, serial numbers). You'd need to invent all kinds of proprietary metadata formats. I'm certain all of the existing code handling Ogg files has been written with handling audio/video streams in mind, meaning there's zero cross-platform/cross-application gain.

At that point you're basically inventing a whole new container format.

There are tons of container formats out there tailored to file storage. You've already mentioned two of them. Here are a couple of others: tar, 7z. Neither of them is perfect, to be sure, but at least some of them (e.g. tar with certain extensions, as created by the BSDs' tar command) can deal with ACLs and extended attributes, and several either offer different compression algorithms (7z, zip) or are completely agnostic to them (cpio, tar) and are therefore extensible. They're much, much better suited to file storage than Ogg is.

OK, this has gotten much too long and ranty. I always get somewhat frustrated and emotional when talking about Ogg. I hope you didn't take any of this personally; it certainly wasn't meant that way.

Improving .deb

Posted May 28, 2019 19:27 UTC (Tue) by ncm (guest, #165) [Link] (1 responses)

Yes, the Ogg idea is a joke. But is Matrioshka?

Improving .deb

Posted May 28, 2019 19:53 UTC (Tue) by mbunkus (subscriber, #87248) [Link]

It's actually called Matroska, but even though it's not a joke, it isn't any better at being a container for files either. It avoids a lot of pitfalls of the Ogg container and contains a lot more features, but it's still designed for storing audio/video/subtitle streams suitable for synchronized playback.

The fundamental difference between A/V containers & file containers is how multiple streams/tracks/files are laid out. In a file container, all files are laid out one after the other. Accessing the content of one file is ideally as simple as seeking to its start position and doing one long read operation.

In an A/V container, on the other hand, you place those parts of each stream/track close together that need to be played together. All of the data is tightly interleaved by their timestamps. This is done in order not to have to seek forward and backward all the time, which is especially atrocious for transports with high latency (e.g. optical discs or online streaming). In the Good Old Days™ there were a lot of AVI files (and I even have a couple of MP4 files) where track content was laid out like in a file container (first all the video data, then all the audio data), and playing such a file from a CD-ROM was nigh impossible.

Improving .deb

Posted May 28, 2019 22:19 UTC (Tue) by lmartelli (subscriber, #11755) [Link] (5 responses)

You probably meant : while (byte != 255)

Improving .deb

Posted May 29, 2019 6:58 UTC (Wed) by mbunkus (subscriber, #87248) [Link] (3 responses)

No, I meant what I wrote. The size 1024 would be stored as 255, 255, 255, 255, 4. The size of 255 would be stored as 255, 0. See the examples in the documentation at https://www.xiph.org/ogg/doc/framing.html in the section "The encapsulation process".

Improving .deb

Posted May 29, 2019 12:57 UTC (Wed) by jezuch (subscriber, #52988) [Link] (2 responses)

Nope, still doesn't make sense :) The way you wrote it the loop would do at most two iterations, unless the bytes read were one 255 and a bunch of zeros. Which is not what the format specifies.

Did you mean: while (byte == 255); ?

(Oh the perils of commenting on a forum full of programmers ;) )

Improving .deb

Posted May 29, 2019 13:04 UTC (Wed) by mbunkus (subscriber, #87248) [Link] (1 responses)

Well that was stupid… of course I meant "while (byte == 255)". Sorry :D

Improving .deb

Posted May 29, 2019 14:49 UTC (Wed) by gevaerts (subscriber, #21521) [Link]

We really need support for comment unit tests here on lwn!
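
In that spirit, a small runnable sketch of the corrected logic, with the examples from this thread as its "unit tests":

def decode_lacing(lacing_bytes):
    size = 0
    for b in lacing_bytes:
        size += b
        if b != 255:           # i.e. the corrected "while (byte == 255)"
            break
    return size

assert decode_lacing([255, 255, 255, 255, 4]) == 1024   # example from above
assert decode_lacing([255, 0]) == 255                   # 255 needs a trailing 0
assert decode_lacing([100]) == 100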

Improving .deb

Posted May 29, 2019 7:47 UTC (Wed) by weberm (guest, #131630) [Link]

Probably not. I was wondering about the condition as well. But consider how this supposedly encodes a size of 256: as one byte saying 255, and one saying 1. So the condition does make sense: Keep reading more single uint8_t's adding up to 255 bytes to the size while the read uint8_t contains its maximum value. (disclaimer: I don't know the container format, just extrapolating the grandparent comment).

Improving .deb

Posted May 30, 2019 18:44 UTC (Thu) by smoogen (subscriber, #97) [Link]

Yes, it was a joke... an old one: that Debian would use Ogg containers way before it would ever use cpio, and vice versa, that RPM would use Ogg before it would use tar.

Improving .deb

Posted May 29, 2019 20:32 UTC (Wed) by notriddle (subscriber, #130608) [Link]

Ogg?! I'd rather use a giant bencode dictionary; at least it doesn't have a bunch of superfluous features for A/V sync and stuff.
d7:control10:TARCONTENT4:data10:TARCONTENTe

Improving .deb

Posted May 28, 2019 18:48 UTC (Tue) by logang (subscriber, #127618) [Link] (10 responses)

The size limit seems like a feature to me.

I don't expect I'd want to install any application whose compressed size is greater than 9GiB. What would be in such a monstrosity? It'd be enough to fit an entire typical debian desktop install in a single package, and then some.

Improving .deb

Posted May 28, 2019 19:15 UTC (Tue) by excors (subscriber, #95769) [Link]

Modern games can easily reach 100GB, particularly if they're designed for 4K screens with lots of high-res textures and FMVs. (Of course you wouldn't want to install anything that large from a software distribution system that's so antiquated it doesn't support delta updates, but maybe one day they'll fix that too.)

Improving .deb

Posted May 28, 2019 19:19 UTC (Tue) by pizza (subscriber, #46) [Link] (4 responses)

The full offline installer tarball of the Xilinx Vivado suite is over 20GB.

Believe me, nobody "wants" to install Vivado. :-)

Improving .deb

Posted May 28, 2019 19:30 UTC (Tue) by logang (subscriber, #127618) [Link] (3 responses)

I second not "wanting" to install Vivado.

I haven't touched Vivado in a long time but the Xilinx tools used to contain an entire Java runtime, QT, perl, etc, etc. (I hope you don't care about security updates on all this code.) The real problem is the proprietary software bundles that don't use shared libraries and need to have the kitchen sink included.

Also, speaking to Vivado specifically, I think it would also be very sensible to split it up into a main deb plus one deb per FPGA family (Spartan, Virtex, Artix, Kintex, etc.). That way users can choose what they want and they don't necessarily need to waste so much disk space.

So back to my main point, having the file size limit can force developers to do sensible things like solve the problems above.

Improving .deb

Posted May 29, 2019 17:41 UTC (Wed) by thoughtpolice (subscriber, #87455) [Link]

> So back to my main point, having the file size limit can force developers to do sensible things like solve the problems above.

If that were the case and how it worked in reality, they would have already fixed it. But they haven't: rather than getting a .deb package that can easily be installed like any other, they ship tarball installers with horrid self-extraction programs. In fact they probably wouldn't do it anyway even with that problem fixed, because they want one binary blob they ship to every platform, which is why they vendor everything under the sun. They have no interest in supporting multiple package formats. You're making a categorical error in thinking their goals (ship highly expensive, niche, proprietary software to users in controlled environments with support contracts) are the same as yours ("nice" Linux system integration). In fact the only people who suffer under this setup are the people who *do* want to be "nice" about Linux system integration, and ship .deb files of large programs/packages (for reasons that may not entirely be under their control). Not Xilinx. And even if it weren't a moot point, single device families (UltraScale+ IIRC) consume more space than is already allowed by a single .deb anyway, so there you go.

If you think a company like Xilinx that makes billions of dollars a year is going to change all of this because Debian has an artificial technical restriction on the size of .deb files: they won't, you're just not that important. They do not care about what Linux distro developers/users want or think is "ideal", and they have far more money and time than you do, so they can make it work. They have more money to burn than you have time to implement artificial technical restrictions to try and "force them" to behave. I don't know why people persist in believing this kind of trivial, easily-countered approach works: the entire premise relies on the faulty assumption that the dynamics of power are in your favor. They are not.

Improving .deb

Posted May 30, 2019 6:00 UTC (Thu) by Tov (subscriber, #61080) [Link]

Huge proprietary applications should not use distribution package formats like .deb or .rpm anyway. More often than not they pollute the filesystem with their own idea of absolute directory structures, included libraries, broken dependencies and impossible uninstallation.

That is exactly what we have appimage/snap/flatpak for!

Improving .deb

Posted May 30, 2019 15:41 UTC (Thu) by imMute (guest, #96323) [Link]

As someone who maintains debian packages (for our company, not for Debian itself), uses Xilinx toolchains, AND has read a tiny bit about Flatpak/snap/etc, I'm really torn.

As a pro-distribution guy, I like the benefits that package maintainers bring. On the other hand, I love that the Xilinx toolchain is entirely self contained: no need to mess around with figuring out which dependencies are needed (because god help them if they actually document those things). I totally see why Flatpak (et al.) are rapidly gaining popularity.

Improving .deb

Posted May 28, 2019 20:12 UTC (Tue) by Cyberax (✭ supporter ✭, #52523) [Link]

We used .debs to distribute scientific software that has HUGE lookup tables. The archive was about 8GB in size.

Improving .deb

Posted May 28, 2019 21:47 UTC (Tue) by flussence (guest, #85566) [Link] (1 responses)

I'd say ar having *one* way to store file sizes is a feature too.

tar's default is a null-terminated string of 11 octal digits (8GiB limit!), some implementations store 12 non-null digits (64GiB), the GNU-proprietary extended format sets the MSB bit to 1 and uses the other 95 bits as a binary-encoded number (32ZiB), and PAX instead defines a size xattr that accepts arbitrary-precision integers. None of the large size formats have universal support.

That's just one facet of it. It's an awful format in modern times, kept alive entirely through inertia. The same could be said about a lot of container/compression formats.
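
For what it's worth, the first two limits quoted above are easy to verify with a trivial sketch (each octal digit carries three bits):

assert 8**11 == 8 * 2**30      # 11 octal digits: 8,589,934,592 bytes = 8 GiB
assert 8**12 == 64 * 2**30     # 12 digits, no terminator: 64 GiB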

Improving .deb

Posted May 31, 2019 7:11 UTC (Fri) by nim-nim (subscriber, #34454) [Link]

Well, PAX tar (POSIX.1-2001) was specified nearly 20 years ago, is supported by tools (even if GNU tar does not use it as its default write format) and as far as I know has all the legacy limitations lifted.

Are there still reasons not to use it?

Improving .deb

Posted May 29, 2019 4:42 UTC (Wed) by dvdeug (guest, #10998) [Link]

The article mentioned that the Debian archive for flightgear-data-base is one gigabyte. This data package for a flight simulator only includes terrain for San Francisco; most people dynamically download the terrain from online, but you can download all ninety gigabytes of terrain and store it on disk.

Improving .deb

Posted May 28, 2019 21:10 UTC (Tue) by jhoblitt (subscriber, #77733) [Link] (22 responses)

Is there a technical argument for why it makes more sense to massively refactor Deb instead of just switching to rpms?

Improving .deb

Posted May 28, 2019 21:27 UTC (Tue) by compenguy (guest, #25359) [Link] (4 responses)

> Is there a technical argument for why it makes more sense to massively refactor Deb instead of just switching to rpms?

The database format of rpms is painful to deal with, and really rather impractical for some of the use cases described in the article, such as extracting the installer contents for manual installation on an unsupported system.

Also, most of the Linux installation professionals I know hate rpm with a passion and would much rather work with deb packages, for a host of reasons not directly related to the file format itself. One of the really big ones is that the state management for package install/upgrade/uninstall is more robust and intuitive with deb. I will say, though, that on the deb side of things, I miss rpm's autoreq/autoprov system. Deb's tooling doesn't let you provide/require a SONAME; rather, the tooling will look at known packages and use the name of the package that installs that lib as the dependency.

Most of the rest is kind of a grey area of just having different design patterns for solving different kinds of installation problems.

Improving .deb

Posted May 30, 2019 15:50 UTC (Thu) by imMute (guest, #96323) [Link] (1 responses)

>Deb's tooling doesn't let you provide/require a SONAME, rather the tooling will look at known packages and use the name of the package that installs that lib as the dependency.

I wonder if this wouldn't be possible in debs with some creative use of Provides and Requires. A package containing a library that "provides" some SONAME could have a "Provides: SONAME-libfoo.so.2" on it. Packages that need that SONAME could add "Requires: SONAME-libfoo.so.2". Specific versioning would be tricky, since you can't know the exact versioning a providing package uses. I'm thinking epoch versions might throw a wrench in there... Also that the SONAME "version" number and the package version number (even just the "upstream" part) aren't always numerically the same.

Since everyone should already be using dh_makeshlibs / dh_shlibdeps, this might not even be too hard to prototype...

Improving .deb

Posted May 31, 2019 14:58 UTC (Fri) by patrakov (subscriber, #97174) [Link]

In this particular case, there is a policy on how to name packages providing shared libraries, and a lintian check that enforces it. E.g., under this policy, a package that provides libncurses.so.6 must be named libncurses6. So this is a case of convention over configuration, but the end result is the same.

Improving .deb

Posted Jun 6, 2019 9:38 UTC (Thu) by hensema (guest, #980) [Link]

> The database format of rpms is painful to deal with, and really rather impractical for some of the use cases described in the article, such as extracting the installer contents for manual installation on an unsupported system.

RPM uses cpio as its archive format. Just as arcane as ar, but it does enable you to extract an rpm on foreign systems.

> Also, most of the linux installation professionals I know hate rpm with a passion and would much rather work with deb packages, for a host of reasons not directly relating to the file format itself.

So what you're saying is that it's better to invent a new format to avoid hurting the feelings of those "professionals"?

Let's face it: Debian can still be Debian even if they switch their underlying package format to RPM. Or any other vaguely modern package format.

Refactoring .deb is a good thing, but it does make sense to shop around for existing solutions that work, are mature and maintained.

Improving .deb

Posted Jun 10, 2019 6:33 UTC (Mon) by ceplm (subscriber, #41334) [Link]

> The state management for package install/upgrade/uninstall is more robust and intuitive for deb being one of the really big ones.

You know that's like a twenty-years-out-of-date complaint, right? And the only meaning of the word "intuitive" is "what I am used to, and I hate anything changing", right?

Improving .deb

Posted May 28, 2019 22:08 UTC (Tue) by amacater (subscriber, #790) [Link] (16 responses)

Uncompressing a Debian package by hand is not hard to do: in the spirit of "stupid things I've done that I can share" - I once managed to remove most of /usr and reinstate it by unpacking .debs until it worked :)

You can't do that with rpms - and anyway, "whose version of rpm"? I'm old enough to remember when Red Hat broke rpm such that you couldn't install updates, when Mandriva introduced a "newer" version of rpm that was a fork by an erstwhile maintainer, and that OpenSUSE rpms don't work well with anyone else's.

Debian's strict policy on packaging and upgrades is what makes seamless upgrade from say, Debian 7 to Debian 9 remotely possible: if you're _really_ lucky, you might just be able to upgrade CentOS 6.8 to 7 or 7.6 to 8 but the rpm world is a reinstall to fix every problem.

Debian and Ubuntu share very similar package formats: Ubuntu developers do things differently at times with versions of gcc or whatever, so you can't mix Debian and Ubuntu packages freely, but you can easily use the source to rebuild them.

Disclaimer: I'm a Debian developer but a CentOS and Red Hat sysadmin advising engineers in my day job.
I've been using both for ~24 years.

If you _really_ need more than 9GB in a single .deb, chances are that you're doing it wrong even now.

Improving .deb

Posted May 29, 2019 0:26 UTC (Wed) by kfox1111 (subscriber, #51633) [Link] (11 responses)

Sure you can extract just /usr; see the rpm2cpio tool.

I tried some weird experiments many years ago with the alien package converter. Installed Red Hat using the Debian installer (and, I think, the Storm Linux installer...). Also installed Debian using anaconda. There isn't a huge difference between rpms and debs at the end of the day.

It's what you put in them that counts. :)

Improving .deb

Posted May 29, 2019 0:48 UTC (Wed) by rahulsundaram (subscriber, #21946) [Link]

> You can't do that with rpms

You sure can. I have done that. rpm2cpio and convenience scripts like rpmls (part of rpmdevtools) make this really easy.

> if you're _really_ lucky, you might just be able to upgrade CentOS 6.8 to 7 or 7.6 to 8 but the rpm world is a reinstall to fix every problem.

Certainly hasn't been the case for years. Many RPM distributions support a straightforward upgrade path.

The complications in enterprise distributions have not much to do with RPM the package format, or even the tooling (cf. things like dnf system upgrade), but with the fact that these distributions have very long lifecycles (10 to 15 years) and tend to run many third-party applications (including proprietary ones) that are brittle in the face of OS upgrades. The answer to that has been VMs and containers.

Minor variations in RPM (nearly all distributions have folded back into using RPM4, which has a very active and healthy upstream project now) don't matter as much. There are packaging differences because, unlike Debian and Debian derivatives like Ubuntu, the RPM world has a broad number of distributions which aren't all derived from Red Hat (e.g. openSUSE), and even in such cases the divergences have steadily gone away with time. The number of patches that, say, Fedora, Mandriva, or openSUSE carry against the RPM package itself is pretty low at this point. Even macros have consolidated considerably.

> but you can easily use the source to rebuild them readily.

You certainly can do that with RPM pretty quickly, and I have done that for dozens and dozens of packages. Lots of packages in Fedora do it for supporting EPEL, and even more do it for things like openbuildservice.

All of these sound like issues that are outdated at this point.

Improving .deb

Posted May 29, 2019 17:48 UTC (Wed) by wahern (subscriber, #37304) [Link] (9 responses)

> There isn't a huge difference between rpms and debs at the end of the day.

That's not at all fair to Debian packages. You can make do with RPM and the RPM ecosystem (Yum, DNF), but it's still a pockmarked hellscape. Here's a good jumping-off point for the low-level sins of RPM specifically: https://xyrillian.de/thoughts/posts/argh-pm.html

rpm pains (orig: Improving .deb)

Posted May 29, 2019 18:53 UTC (Wed) by domo (guest, #14031) [Link]

I'd also vote down the rpm 4 format...

In the early 2010s I spent one month trying to figure out the rpm source code in order to make it work elsewhere. While doing that I got some knowledge of the format, and then got a good enough replacement made using perl(1). Search for 'rrpmbuild' for a code reference.

The format is quite complicated for a human observer...

Whatever the format is (IMO extending ar(5) is not the best option; old tools cannot understand it anyway, so something better could be devised), it should be simple enough that everyone can easily write their own tools (or help extend the existing ones).

Best would be some new "extensible Linux package format" (with a sane format, no XML, etc.) which could be adopted by all distributions. The format would have an extensible package metadata format, and then an extensible file (metadata, including file contents) format.

Even I could devise such a format, it's just that an implementation done by a random programmer would not be adopted very often...

Improving .deb

Posted May 30, 2019 11:08 UTC (Thu) by jond (subscriber, #37669) [Link] (1 responses)

That was a great (terrifying) read, thanks.

I'm now wondering how much rpm-ostree might side-step this madness, if at all.

Improving .deb

Posted Jun 4, 2019 3:56 UTC (Tue) by rahulsundaram (subscriber, #21946) [Link]

> I'm now wondering how much rpm-ostree might side-step this madness, if at all.

I am not sure I follow what you are wondering about here. The internal low-level implementation details of the RPM format obviously don't affect end users. What matters to end users is functionality like library and file based dependencies, or boolean dependencies, or weak dependencies, etc.

ostree-based systems don't use RPM at all, and therefore dependencies don't really matter all that much on these systems for end users. What you get is an OS that is constructed and pushed to end users as a single "immutable" image, and everything else is supposed to be running in containers of some sort. rpm-ostree provides some level of compatibility with traditional RPM packages, but the more you use them, the more you move away from the advantages that an immutable base image provides. Instead, the recommended path is to use a wrapper like Fedora toolbox, within which you can just install plain rpm packages.

Improving .deb

Posted May 31, 2019 1:15 UTC (Fri) by bojan (subscriber, #14302) [Link] (4 responses)

Well, my Fedora 30 box must be on fire then, I'm sure. After all, I've been upgrading it since Red Hat Linux 6.0 (no, not RHEL 6 - the actual RHL 6.0), with at one point cross-grading to x86_64 from i386/i686 (totally not recommended, not supported and yet quite doable). And yet, it all still works. But, I'll delete it all right now, because it obviously is broken beyond repair. Oh, wait...

So, RPM has been tinkered with since its inception and now has a whole lot of baggage caused by various design errors, improvements, folks finding ways to bend it in new and useful ways. Shocking stuff. Who would have thought that a package format that is 22 years old would be like that. :-)

More planned, BTW: https://fedoraproject.org/wiki/Changes/Switch_RPMs_to_zst...

Improving .deb

Posted May 31, 2019 2:18 UTC (Fri) by wahern (subscriber, #37304) [Link] (1 responses)

Yes, RPM is manifestly serviceable. But my experience comes not as a user but as a packager. It's been over a year since I last had to package using RPM and have mostly tried to put it all behind me. Suffice it to say that the only thing the RPM format and RPM tooling has going for it is that from a distance it looks enticing--simpler, cleaner, more convenient. It's none of those things when you get up close, though. And don't get me started on Yum/DNF....

> Who would have thought that a package format that is 22 years old would be like that. :-)

Debian package users! The Debian package format is old and wrinkly, but it has aged incredibly well in terms of forethought and capabilities. The tooling is more complex but that's because the ecosystem is layered. Many of the biggest headaches in the land of Yum and RPM (sections, macros, file contents, dependencies, building, ...) are insurmountable and force everybody and everything to accommodate the limitations. (Ignorance is bliss, though!) For every headache one can identify in the land of .debs and Apt there are *both* dirty hacks and clean changes in approach that resolve them; rarely are you stopped in your tracks with the realization you simply cannot accomplish something functionally.

IMO the Debian packaging ecosystem continues to evolve and improve. There are improvements to the RPM ecosystem, but they asymptotically move RPM toward a wall.

Detailing all the issues here would be impractical (and I don't have the memory for it, only the scars), but if you have time, carefully go through the history of the development of Modularity (you may need to use the Wayback Machine to see how the project specifications changed) and you'll see how Red Hat had to backtrack and literally reinvent Modularity late in the RHEL8 development cycle after they realized they couldn't surmount various limitations of RPM, particularly with regard to build-time and run-time dependency management. I remember a co-worker raving about how awesome it would be and me being incredulous that they could pull it off, and lo and behold, it turned out that they couldn't.

Improving .deb

Posted May 31, 2019 4:20 UTC (Fri) by bojan (subscriber, #14302) [Link]

I use rpm as both a user and a packager. Are there issues? Sure, sometimes. Do things generally work? Yep.

So, I have no idea why folks go on these long rants to point out how everything Debian has an almost saint-like quality and everything else is pure junk. The fact is that both systems are in widespread use and they work, each with their own limitations.

Improving .deb

Posted May 31, 2019 8:05 UTC (Fri) by amacater (subscriber, #790) [Link] (1 responses)

Take a CentOS machine with no third party proprietary RPMS but maybe some EPEL (built by Fedora for RHEL/CentOS) installed, updated to be running CentOS 5.10 - that ran out of support in 2017 - so only two years out of support. Try to upgrade it to CentOS 6 - pretty much guaranteed you'll fail - there's no upgrade path. If by some lucky chance you can hand craft it: move to CentOS 6.8 and do the same to move it to 7 - that's two current distributions. Now move it to 8. Four distributions with a long term lifecycle - three of them concurrent and fully supported - but no upgrade path through them.

If you keep your Fedora fully maintained then you'll be upgrading every 12 months or so and will lose support for your version at best every 18 months.

Now take a neglected Debian 7 - some two years out of support. Move it to 8, which is on long term support. Move it to 9. [In a month or so, you can move it to 10 when Buster comes out, maybe.] That includes the sysvinit-systemd transfer, which needs a reboot. That takes you from kernel 3.10 - 4.4 seamlessly, and 4.19 next month. Oh, and for fun, do this with no network access. You might do this with CentOS: you _can_ do this with DVD images and Debian :) [And yes, it's an "uphill, both ways in the snow" kind of story - but it's real, and there are lots of machines out there that are "only" two years out of support and have to be maintained and upgraded without data loss.]

Improving .deb

Posted May 31, 2019 8:49 UTC (Fri) by bojan (subscriber, #14302) [Link]

I actually upgrade my Fedora systems every 6 months. The point being, if the distro decides to have a policy of upgrades, then it's possible regardless of the underlying packaging.

Red Hat decided they didn't want to support upgrades from RHEL 5 to 6, but 6 to 7 (for some products) and from 7 to 8 is possible:

https://access.redhat.com/solutions/637583
https://access.redhat.com/documentation/en-us/red_hat_ent...

Improving .deb

Posted May 31, 2019 8:30 UTC (Fri) by nim-nim (subscriber, #34454) [Link]

The article ponders multi-gig deb limitations, and you point to a page that complains a few bytes are wasted in rpm metadata. Really?

You have all the tools to manipulate rpm files under Linux, you can even open them in generic non-Linux archiving tools like 7zip and it will *just* *work* (yes you will lose rpm-specific metadata. Just like you will lose iso-specific metadata when treating isos like a giant archive. If you absolutely refuse to use native rpm tools just uncompress the source rpm, the whole package is described in a human-readable spec file, you don't absolutely need to read the binary transformation of this same info).

The rpm installation/update process has a mind-numbing amount of entry points, with very specific (and weird) ordering, but the average packager does not have to think about them. When you *do* need to think about them, because the software being packaged has special needs, you’re happy to have them available (or, like pretty much everyone, you decide it’s all too complex, and try to do your own better simpler thing, and months later, when you've exhausted all the weird corner cases required by your software, and actually understand the problem space, you switch to native rpm-provided facilities, because now you actually understand why they need to behave the way they do. Of course some people are too lazy to actually fix all corner cases, or too proud to admit they were wrong, so they will push garbage that does not make use of the tech capabilities, and complain rpm is awful). It's the same difference between an init tech with barebones facilities, that requires you to write giant custom scripts to work (SysV init), and something with built-in capabilities, that requires knowing the manual to call the built-in capabilities correctly (systemd).

And the rest is just the packaging policies and rules of each distro, which are not the same, so anyone looking at the packages done by other distros will be lost and unhappy, and only people who mistake their habits for natural laws will seriously complain about it (yes, Debian packaging is weird and crufty too when looked at by outsiders). And two rpm distros won't do things the same way because they don't have the same opinions, and neither would two deb distros.

The rpm format is actually nice enough that many distributions adopted it and do their own different thing with it. And yes, it also provides automation facilities in the form of macros, so you don't have to do it all by hand, and distros with different opinions and objectives will automate things differently; what's the problem with that? It's like complaining that no two Firefox users install the same extensions, and that it's too hard to understand why two Firefoxes do not behave the exact same way.

Improving .deb

Posted Jun 3, 2019 22:22 UTC (Mon) by logang (subscriber, #127618) [Link]

Second this: I have a couple machines that have seen every release since Lenny (5.0 to 9.0) without any reinstall, and many that have been around since Stretch. The fact that Debian makes this painless is a *huge* win, seeing as reinstalling is a lot more work and would probably require a new machine to be swapped in. Every two years I just upgrade every machine, and it typically takes less than an hour per machine. If I had to reinstall, I probably wouldn't upgrade anything, which is obviously wrong.

Improving .deb

Posted Jun 6, 2019 8:55 UTC (Thu) by Wol (subscriber, #4433) [Link] (2 responses)

> You can't do that with rpms - and anyway "whose version of rpm" - I'm old enough to remember when Red Hat broke rpm such that you couldn't install updates, when Mandriva introduced a "newer" version of rpm that was a fork by an erstwhile maintainer, that OpenSUSE rpms don't work well with anyone else's.

> Debian's strict policy on packaging and upgrades is what makes seamless upgrade from say, Debian 7 to Debian 9 remotely possible: if you're _really_ lucky, you might just be able to upgrade CentOS 6.8 to 7 or 7.6 to 8 but the rpm world is a reinstall to fix every problem.

What you miss here is that (afaik) all distros that use deb are DERIVATIVES of Debian, so they inherited Debian's packaging rules.

OpenSUSE (at least its parent) PREDATES rpm, heck iirc it even predates Red Hat, so while it adopted the rpm program and file format, it already had its own, completely different, packaging rules.

Things are a lot better on that front now, I believe ...

(SuSE began as a Slackware derivative, then was derived from some other obscure distro, then became its own master.)

Cheers,
Wol

Improving .deb

Posted Jun 6, 2019 19:42 UTC (Thu) by amacater (subscriber, #790) [Link] (1 responses)

Yes, sorry, I didn't mention Jurix and so on that became SUSE. Every distribution I've seen adopt RPM - back to and including Linux-FT version 2 - has botched it. Red Hat botched it a couple of times where updates became impossible.

Deb "just works" but only because Debian puts a whole lot of policy in place and developers are constrained to work so that packages co-install, don't overwrite libraries from other packages and so on. It's hard nosed packaging policy that makes it work. [A colleague says "CentOS just downloads it's easy - Debian's too big!" but that's because Debian includes the world and its source ]

Improving .deb

Posted Jun 6, 2019 20:50 UTC (Thu) by rahulsundaram (subscriber, #21946) [Link]

> It's hard nosed packaging policy that makes it work.

I agree with that view. That has nothing to do with the format of the archive. It is at a much higher level.

>"CentOS just downloads it's easy - Debian's too big!"

Not sure what that means. Net installation works just fine in either.

Improving .deb

Posted May 28, 2019 23:25 UTC (Tue) by oliwarner (subscriber, #81320) [Link] (1 responses)

These conversations seem to focus far too much on the minutiae of compression while completely ignoring that various distributions (even those using debs) are all stomping off in radical new directions for features like user sandboxing, true static dependencies, and granular permissions.

Debs will need to gain more than an 8x speedup to survive the next generation of distributions.

Improving .deb

Posted May 29, 2019 1:39 UTC (Wed) by interalia (subscriber, #26615) [Link]

I think it's fair to try to solve one relatively simple problem in the existing format, without trying to replace it. Yes, they could also solve this by trying to solve four big problems at once, but I think it would be 4^3 = 64x more difficult to get consensus in Debian on those four problems and their solutions, let alone implementing and using that solution.

Debian's long-term direction and future is a worthy discussion but should be a totally separate one. We shouldn't let perfect be the enemy of a good minor improvement.

why confine the discussion to Linux-based systems?

Posted May 29, 2019 2:45 UTC (Wed) by gus3 (guest, #61103) [Link] (3 responses)

Has anyone studied package management in OpenIndiana, Haiku, ReactOS, Windows? What can we learn from their package management systems?

why confine the discussion to Linux-based systems?

Posted May 29, 2019 10:52 UTC (Wed) by hei8483j (guest, #124709) [Link] (2 responses)

Don't even mention Windows Installer. The whole Windows experience is so complicated in comparison with a Linux system. In Linux, the main effort is creating good build scripts. In Windows, you are always writing custom actions to supplement the installer itself, its dependencies, and runtimes. You would think it easy to install an SQL server or a .NET runtime. Not so. If only there were a distro repository for Windows and you could add dependencies to MSI packages. And don't get me started with the in-built database in MSI.

why confine the discussion to Linux-based systems?

Posted May 31, 2019 0:11 UTC (Fri) by compenguy (guest, #25359) [Link] (1 responses)

> And don't get me started with the in-built database in MSI.

What on earth do you think rpms are? Each rpm is a berkeley db hierarchical database.

> The whole Windows experience is so complicated in comparison with a Linux system. In Linux, the main effort is creating good build scripts. In Windows, you are always writing custom actions to supplement the installer itself, its dependencies and runtimes.

Actually, there are a lot of technical parallels between MSI design and execution and rpm design and execution. If you look at the order that rpm scripts are run during upgrade, it's a really mind-bending process and feels really unnatural. But it is, in fact, very very efficient with disk writes/erases especially when not all the files in the package might be changing. An MSI installer with a late-scheduled RemoveExistingProducts executes actions in a sequence _very_ similar to rpms.

In fact, if PowerShell, Wix, and Burn had been invented about a decade prior, the MSI installer development experience would have looked a good bit more like rpm than it currently does.

As it is, though, Microsoft is trying not to invest anything in MSI in order to push their app store distribution model. Apple deprecated their pkg installation system probably almost a decade ago, again "because appstore", but they still can't manage to kill it - it's just too useful (although the pkg system is pretty scary in its own right).

why confine the discussion to Linux-based systems?

Posted Jun 3, 2019 23:26 UTC (Mon) by brouhaha (subscriber, #1698) [Link]

> What on earth do you think rpms are? Each rpm is a berkeley db hierarchical database.

The system-wide RPM database is a berkeley db.

An individual RPM file is just an RPM header prepended to a cpio archive.
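
The "header prepended" layout is easy to observe from the outside: every .rpm starts with a 96-byte "lead" carrying a fixed four-byte magic. A minimal sketch (the file name is a placeholder):

def looks_like_rpm(path):
    with open(path, "rb") as f:
        return f.read(4) == bytes([0xED, 0xAB, 0xEE, 0xDB])   # RPM lead magic

print(looks_like_rpm("example.rpm"))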

Improving .deb

Posted May 29, 2019 9:50 UTC (Wed) by SiB (subscriber, #4048) [Link]

> Before 1995, a different format, not based on ar, was used for Debian packages. It was, instead, a concatenation of two ASCII lines (format version and the length of the metadata archive) and two gzip compressed tar archives, one with metadata, similar to the modern control.tar.gz, and one with files, just like data.tar.gz. Even though old-format packages are not in active use now, modern dpkg can still create and install them.

Problem solved? Big packages can use the old format, no changes required?

ar limits

Posted May 29, 2019 9:56 UTC (Wed) by geert (subscriber, #98403) [Link] (2 responses)

Are people using ar to create static libraries running into the size limit?
Do they have a solution planned?

ar limits

Posted May 29, 2019 10:27 UTC (Wed) by gb (subscriber, #58328) [Link]

Yeah, it seems that what's actually necessary is to fix the ar format: something like setting the first byte to 0 and using the remaining 9 bytes as a binary size. That would give us 2 zettabytes, with the possibility to extend later.

ar limits

Posted May 30, 2019 13:01 UTC (Thu) by mort (guest, #132348) [Link]

I once tried to package up all of Chromium, with debug symbols, as an ar archive for static linking. I definitely hit size limits.

Improving .deb

Posted May 29, 2019 15:07 UTC (Wed) by jezuch (subscriber, #52988) [Link] (3 responses)

Seems to me someone is obsessed with benchmarks wrt decompression speed. Isn't installing a package I/O bound anyway? Or FS-bound, because the installer has to fsync often to provide some guarantees that a power failure during the operation does not leave the system in a horribly corrupted state. Even on my SSD any bigger upgrade takes much longer than a straightforward decompress.

It may make somewhat more sense for source packages, though, but those are not .debs.

Improving .deb

Posted May 30, 2019 17:22 UTC (Thu) by imMute (guest, #96323) [Link] (2 responses)

All package building within Debian's buildd network is done using chroots (typically via a tool like pbuilder, sbuild, etc). I've found that you can get pretty big speedups by doing all that on a tmpfs. So, yes, disk IO is a big bottleneck for *users* of deb packages. But *builders* of those debs could benefit from speed increases elsewhere.

For example, pbuilder itself uses compressed tarballs to store an image of the rootfs at rest. Each time you want to build a package, that tarball has to be uncompressed and extracted. I've found that you can use cowbuilder instead (I'm not sure exactly how the "instead" happens - git-buildpackage does it automagically for me) which keeps everything uncompressed/untarred and uses COW filesystems to copy the "pristine" rootfs for each build. It's incredibly fast to get the chroot ready to use, and then I find myself waiting 2-3 minutes for apt to install all my dependencies (obtained from a cache on the same ramfs; *not* from a mirror).

Improving .deb

Posted Jun 3, 2019 10:44 UTC (Mon) by jezuch (subscriber, #52988) [Link] (1 responses)

Right, I haven't considered package builders. I haven't actually used pbuilder much - it's too cumbersome and, as you notice, slow. Nowadays I do my builds in a chroot on a writable btrfs snapshot. I used to build on tmpfs but my SSD is fast enough and I needed the RAM for something else (gcc -flto can eat a lot of memory!). I wonder if Debian is considering using something similar on their build farms? (I really have no clue what's the current state there.)

But anyway, xz is relatively fast at decompression as all LZ77 type algorithms are. Is it not fast enough? On the other hand bandwidth is cheap these days so... :)

Improving .deb

Posted Jun 4, 2019 3:03 UTC (Tue) by pabs (subscriber, #43278) [Link]

The current state is that Debian uses sbuild and tarballs as the base of the build chroot.

Improving .deb

Posted May 29, 2019 18:42 UTC (Wed) by wahern (subscriber, #37304) [Link] (1 responses)

A big problem with Zip, IMO, is that the metadata for archived files is stored twice--in an index and as a header to each file. Which one do you use and trust? This creates a dilemma for metadata parsers and especially security scanners.

If you don't mind the uncleanliness and potential security issues of such redundant metadata, one can create an index for tar files, including compressed tar files. I've experimented with this (for both tar and tar+gzip), though nothing releasable. The upside is that adding an index could be done in a backward compatible manner--just another object in the outer archive that could be ignored.

Improving .deb

Posted Jun 6, 2019 21:36 UTC (Thu) by dfsmith (guest, #20302) [Link]

Wouldn't you trust both or neither? If they match, yay! If not, the zip is corrupted (and shouldn't have passed the signature check).

Improving .deb

Posted May 29, 2019 20:16 UTC (Wed) by mathstuf (subscriber, #69389) [Link]

> After all, the only other modern application of this format is for static libraries (they are ar archives with object code files inside)

Even Windows static libraries are `ar` format (the linkable part of shared libraries are too, they just have some metadata which says "runtime load this .dll file").

Improving .deb

Posted May 29, 2019 22:48 UTC (Wed) by unixbhaskar (guest, #44758) [Link]

The proposals sound bloody good. But do not make changes for the sake of change. And, importantly, they should not break existing stuff, or all hell might break loose.

I think, if not months then a few years down the line, surely there will be some change in this respect. No point in making a hasty decision.

Improving .deb

Posted May 30, 2019 9:08 UTC (Thu) by bokr (guest, #58369) [Link]

Seems like, in a discussion of packaging, the GNU Guix package features deserve a look?

Their base long-term dependency is the stability of the Linux kernel ABI, as I understand it, so old apps using old libraries can coexist with newer ones, as long as the kernel supports all the syscalls involved (which Linus is pretty good at enforcing as the kernel evolves).

https://www.gnu.org/software/guix/manual/en/html_node/Fea...

Improving .deb

Posted May 30, 2019 15:05 UTC (Thu) by eru (subscriber, #2753) [Link]

When comparing compressed archives that compress each file separately with compressing an uncompressed archive as a whole, it seems to me the kernel source is the best case for the latter: almost all files are C sources belonging to the same software project, so they have lots of mutual similarity for the compressor to chew on, and C code also compresses well. By contrast, a deb or rpm usually contains mostly binaries and other binary data files, which are less similar and less compressible. I am sure the size difference between the two approaches would be much smaller if real binary packages were used as the test material.

tar for outer wrapper - wasted space

Posted May 31, 2019 23:40 UTC (Fri) by brouhaha (subscriber, #1698) [Link]

Using tar for the outer wrapper is not going to waste much space for block padding, because the outer wrapper only contains three files. Tar defaults to 512B internal blocks and 10KiB I/O blocks. If I understand the modern tar format (ustar) correctly, the waste from partial blocks at the ends of the three files will be at most 1533 bytes (three partial 512-byte blocks of 1 byte each, leaving 511 bytes wasted in each partial block). There will also be waste to round the entire archive to a 10KiB length, so that will be at most 10239 bytes, for a maximum total padding of 11,772 bytes. Assuming that the average file size mod 512 is uniformly distributed (which admittedly might not be a valid assumption), the average waste for the entire outer tarball wrapper should be around 6.5 KiB. There is also a 512B header for each file, which could perhaps be considered largely wasted space, in which case the average waste for the entire outer tarball wrapper should be around 8 KiB.

This seems like a problem only if the format changes include having a significant number of files nested directly in the outer wrapper, rather than in an inner compressed archive.
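
The worst-case arithmetic above, restated as a trivial sketch:

members = 3                                   # debian-binary, control, data
partial_block_waste = members * (512 - 1)     # 3 * 511 = 1533 bytes
record_round_up = 10 * 1024 - 1               # at most 10239 bytes of trailing padding
print(partial_block_waste + record_round_up)  # 11772 bytes of maximum padding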

Improving .deb

Posted Jun 6, 2019 3:55 UTC (Thu) by brunowolff (guest, #71160) [Link] (2 responses)

Could squashfs-tools be used for this? That would provide random access. The dictionary is shared to at least some extent across files, so the size cost might not be that high. On linux you can mount the file system, though older kernels won't work with newer compression methods. But you can also extract the files without having to mount the file system.

Improving .deb

Posted Jun 26, 2019 2:50 UTC (Wed) by fest3er (guest, #60379) [Link] (1 responses)

I was just thinking something like this: use a FS that performs compression. Sounds like SquashFS might work nicely, if it achieves an acceptable level of compression. Ideally, there'd be little need to pre-decompress the pkg; just loop-mount it and use rsync to install the files.

As I understand, uncompressing certain .xz archives (perhaps large archives) *can* require a lot of RAM.

Haiku have created what sounds like a novel approach to packages. For most user packages, there's no need to unpack and install files. As I understand, the pkg file is simply put where it belongs; once there, its contents become available to the system. To remove the pkg, delete the pkg file. I've no idea how they made it work (perhaps some form of FS union).

Improving .deb

Posted Jun 26, 2019 10:32 UTC (Wed) by excors (subscriber, #95769) [Link]

> As I understand, uncompressing certain .xz archives (perhaps large archives) *can* require a lot of RAM

I don't believe it depends on the archive size, just on the dictionary size that was chosen when compressing, because the decompressor has to construct that dictionary in RAM. The man page says the default compression mode ("xz -6") uses an 8MB dictionary, and the most expensive preset ("xz -9") uses 64MB, though with custom settings it can support up to 1.5GB. Compression takes roughly 10x more RAM.

(For comparison, zlib(/gzip/etc) uses a 32KB dictionary by default, which is partly why modern algorithms can perform so much better.)

