
Stapelberg: distri: a Linux distribution to research fast package management

Michael Stapelberg has announced the first release of "distri", a distribution focused on simplifying and accelerating package management. "distri’s package manager is extremely fast. Its main bottleneck is typically the network link, even at high speed links (I tested with a 100 Gbps link). Its speed comes largely from an architecture which allows the package manager to do less work."


Stapelberg: distri: a Linux distribution to research fast package management

Posted Aug 19, 2019 14:48 UTC (Mon) by smoogen (subscriber, #97) [Link] (2 responses)

I wish him (and everyone else helping) luck in this.

I have seen multiple package managers over the years, all starting off focused on speed and showing they can be faster than anything in production. Then all the corner cases ('what do you mean this package does these 4 things in an immutable directory or it crashes??? why would anyone do that?') and the 'well, I need this for my problem or I can't use this' requests start adding complexity. That said, putting the ideas together for a specific use case, rather than as a 'this format is for everyone, everywhere, every time' design, looks good.

Stapelberg: distri: a Linux distribution to research fast package management

Posted Aug 19, 2019 17:55 UTC (Mon) by mstapelberg (guest, #66308) [Link]

Thanks!

Yeah, starting off quick and simple before encountering the actual real-world use-cases is common indeed :)

With distri, I had that in the back of my mind. Today, distri has 425 packages and I have been using it on my laptop as the only operating system for the past few months.

This is not to say that there won’t ever be people running into use-cases for which they’d need to extend distri, of course. I’m just saying that for me, and how I use computers, distri’s approach seems to work! :)

Stapelberg: distri: a Linux distribution to research fast package management

Posted Oct 28, 2019 22:10 UTC (Mon) by cchemparathy (guest, #74571) [Link]

Check out Nix. It pretty much does what you want, and exists with a non-trivial package set.

Stapelberg: distri: a Linux distribution to research fast package management

Posted Aug 19, 2019 15:57 UTC (Mon) by imgx64 (guest, #78590) [Link]

Interesting idea. Reminds me of GoboLinux and a little bit of OSTree.

Stapelberg: distri: a Linux distribution to research fast package management

Posted Aug 19, 2019 16:01 UTC (Mon) by kloczek (guest, #6391) [Link] (2 responses)

This is reinventing the wheel.
IPS was introduced more than a decade ago and is still used on Solaris.

Stapelberg: distri: a Linux distribution to research fast package management

Posted Aug 19, 2019 18:54 UTC (Mon) by nwildner (guest, #133890) [Link]

Just face it: the year is 2019 and pretty much all package/filesystem-related innovations have been achieved, and people tend to create new features on Linux distros that are based on those.

"This isn't innovation" is such a childish argument today. Get over it.

Stapelberg: distri: a Linux distribution to research fast package management

Posted Aug 20, 2019 22:42 UTC (Tue) by flussence (guest, #85566) [Link]

I'm sure Oracle is crying all the way to the bank at the thought that nobody wants to play with their ball.

FUSE?

Posted Aug 19, 2019 16:33 UTC (Mon) by arcivanov (subscriber, #126509) [Link] (11 responses)

> In distri, these locations are called exchange directories and are provided via FUSE in /ro.

This sounds like a terrible, terrible idea, considering how slow FUSE is.

FUSE?

Posted Aug 19, 2019 17:40 UTC (Mon) by mstapelberg (guest, #66308) [Link] (10 responses)

This is a proof-of-concept, and FUSE allowed much quicker iteration times than writing my own kernel module :)

In my day-to-day, I don’t actually notice a perceivable slow-down, so the current FUSE implementation is quick enough to demonstrate what I wanted to show.

But yeah, moving the file system into the kernel would certainly be a step on the productionization checklist for this approach!

There is also an interesting paper which talks about moving only the hot paths of a FUSE file system into the kernel using eBPF: https://www.usenix.org/system/files/atc19-bijlani.pdf

FUSE?

Posted Aug 19, 2019 18:31 UTC (Mon) by kloczek (guest, #6391) [Link] (9 responses)

> This is a proof-of-concept, and FUSE allowed much quicker iteration times than writing my own kernel module :)

This sounds even more terrifying.

FUSE?

Posted Aug 19, 2019 20:40 UTC (Mon) by Paf (subscriber, #91811) [Link] (8 responses)

... why? Loading and unloading a kernel module can be painful, the environment is less forgiving, and if you screw up, oops there goes the system so now you have to reboot. It's all really annoying for starting out with brand new stuff and just hacking away.

Speaking as someone who works as a file system dev for my day job, prototyping stuff in FUSE is common (almost standard) practice. It's a great idea.

And if distri is already getting great speed, which it sounds like it is, then clearly FUSE isn't the bottleneck, so maybe it's best to just keep using it.

FUSE?

Posted Aug 19, 2019 21:32 UTC (Mon) by kloczek (guest, #6391) [Link] (7 responses)

Package management should not be about loading and unloading kernel modules.

FUSE?

Posted Aug 19, 2019 21:39 UTC (Mon) by mstapelberg (guest, #66308) [Link]

It isn’t. But developing a package manager (or any software, really) benefits from quick iteration times.

FUSE?

Posted Aug 20, 2019 3:26 UTC (Tue) by Paf (subscriber, #91811) [Link] (5 responses)

It’s ... not about that? Though development *might require that*, since fast iteration requires reloading code. Additionally, that’s an argument *for* using FUSE, not against using it?

FUSE is usually easier to develop within and intrinsically safer in several ways. If it’s fast enough that it’s not the bottleneck, then *great*.

FUSE?

Posted Aug 20, 2019 6:13 UTC (Tue) by kloczek (guest, #6391) [Link] (4 responses)

So why do some package management aspects need to be done in kernel space?

FUSE?

Posted Aug 20, 2019 7:07 UTC (Tue) by edomaur (subscriber, #14520) [Link] (3 responses)

These are *filesystem* aspects, in fact, not package management per se. And the kind of directory fusion evoked by this project could very well benefit other projects: for example, Docker-like systems, read-only application deployment, and immutable systems.

FUSE?

Posted Aug 20, 2019 21:12 UTC (Tue) by kloczek (guest, #6391) [Link]

Why must package management be part of the file system?

FUSE?

Posted Aug 20, 2019 21:14 UTC (Tue) by kloczek (guest, #6391) [Link] (1 responses)

Ah, so it is a Docker problem, not a package management one.
Good to know.

FUSE?

Posted Aug 28, 2019 12:11 UTC (Wed) by nix (subscriber, #2304) [Link]

It's not a "problem", it's a useful new filesystem feature, which, like most filesystem features, has multiple potential use cases. It's no more "package management" or a "docker problem" than overlayfs is a "live CD problem".

I can easily see you in the 1960s complaining that nobody needs these newfangled "file" thingies and the need for them over proper hardwired partitions laid out at disk format time is a problem with these newfangled programs that want such ridiculous fripperies.

Stapelberg: distri: a Linux distribution to research fast package management

Posted Aug 19, 2019 17:38 UTC (Mon) by q3cpma (subscriber, #120859) [Link] (6 responses)

A long time ago, I was also obsessed with package manager speed, but after switching to Gentoo, I realized that features and correctness really are more important.
I'm personally way more interested in the quickly evolving Ravenports.

Stapelberg: distri: a Linux distribution to research fast package management

Posted Aug 19, 2019 17:51 UTC (Mon) by mstapelberg (guest, #66308) [Link] (4 responses)

Correctness is vital of course. We have different priorities regarding speed vs features, it seems :)

Thanks for bringing up ravenports, haven’t heard of it before.

Taking a quick look at its buildsheets, they look like Makefiles and allow arbitrary commands at build time (e.g. https://raw.githubusercontent.com/jrmarino/Ravenports/mas...).

I think this is a mistake when reaching a certain number of ports and/or contributors. Restricting the possibilities allows for easier large-scale changes and other tasks that benefit from machine-readable packages.

In distri, package build instructions are declarative¹. See https://github.com/distr1/distri/blob/master/pkgs/irssi/b... for an example and https://repo.distr1.org/distri/jackherer/docs/building.html for the documentation. I want to publish a blog post going into more detail about this approach, too.

¹ Of course there is an escape hatch in the form of writing custom build instructions, but very few packages need that. Defaults matter: reducing a change from “audit these 700 packages” to “change the builder and audit the 7 oddball packages” drastically helps :)
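
To make the "change the builder and audit the oddball packages" point concrete, here is a minimal Python sketch. It is not distri's actual build-file schema (distri uses text protobuf); the `Package` fields, the `BUILDERS` table, and the command strings are all hypothetical and are never executed here.

```python
from dataclasses import dataclass, field

# Hypothetical shared build recipes, keyed by builder name.
# Changing one entry changes the build of every package that uses it.
BUILDERS = {
    "c": ["./configure --prefix=$PREFIX", "make", "make install"],
    "cmake": ["cmake -DCMAKE_INSTALL_PREFIX=$PREFIX .", "make", "make install"],
}


@dataclass
class Package:
    name: str
    builder: str
    # Escape hatch: custom steps override the builder's defaults.
    custom_steps: list[str] = field(default_factory=list)


def build_steps(pkg: Package) -> list[str]:
    """Return the custom steps if present, otherwise the builder defaults."""
    return pkg.custom_steps or BUILDERS[pkg.builder]


def needs_audit(pkgs: list[Package]) -> list[str]:
    """Names of packages that bypass the shared builder and need a manual look."""
    return [p.name for p in pkgs if p.custom_steps]
```

With this shape, a large-scale change means editing one `BUILDERS` entry and then reviewing only what `needs_audit` returns.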

Stapelberg: distri: a Linux distribution to research fast package management

Posted Aug 19, 2019 18:10 UTC (Mon) by rahulsundaram (subscriber, #21946) [Link] (1 responses)

Have you considered using a more well known declarative format like say TOML for your package definition?

Stapelberg: distri: a Linux distribution to research fast package management

Posted Aug 19, 2019 21:15 UTC (Mon) by mstapelberg (guest, #66308) [Link]

Yeah, but I’m much more fluent in protobuf. When I’ve used TOML (both as a user and as a developer), it was okay, but not distinctly better than the text protobuf I’m using here.

One neat advantage is that text proto can easily be converted into binary proto, which I’m using to quickly and efficiently transfer metadata about all packages in a given repo.

In the end, the choice of declarative language dialect doesn’t really matter too much, as long as it’s somewhat natural for users to write :)

Stapelberg: distri: a Linux distribution to research fast package management

Posted Aug 20, 2019 17:28 UTC (Tue) by q3cpma (subscriber, #120859) [Link] (1 responses)

Well, while I'm against fully declarative syntaxes, using them or something like Gentoo's eclasses is indeed a must to factorize what is, as you said, essentially almost always the same thing (autoconf/cmake -> make/ninja).

What I meant by "correctness" is that USE flags + the dependency atom syntax allow for a finer granularity and more detailed description of dependencies. You can for example describe that ogg123 with the flac feature needs libflac with the ogg feature. Coupled with the handling of multiple versions of each package in the tree, this gives you probably the most complete way to handle the problem.
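
The ogg123/libflac example can be modelled roughly as follows. This is only an illustrative Python sketch of a feature-conditional dependency check, not Portage's implementation or its atom syntax; the rule tuple layout and the `unmet` helper are assumptions made for the example.

```python
# A rule (pkg, flag, dep_pkg, dep_flag) reads:
# "if pkg is built with flag, then dep_pkg must be built with dep_flag".
Rule = tuple[str, str, str, str]


def unmet(enabled: dict[str, set[str]], rules: list[Rule]) -> list[Rule]:
    """Return the rules whose condition holds but whose requirement does not."""
    return [
        (pkg, flag, dep, dep_flag)
        for pkg, flag, dep, dep_flag in rules
        if flag in enabled.get(pkg, set())
        and dep_flag not in enabled.get(dep, set())
    ]


RULES = [("ogg123", "flac", "libflac", "ogg")]
```

For instance, `unmet({"ogg123": {"flac"}, "libflac": set()}, RULES)` reports the rule as violated, while enabling `ogg` on `libflac` makes the list empty.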

For me, this is really what sets portage apart; the emerge tool itself shows its age and rust (especially by needing a lot of third party tools like eix or gentoolkit), pkgcore might become a better choice.

Stapelberg: distri: a Linux distribution to research fast package management

Posted Aug 20, 2019 21:04 UTC (Tue) by mstapelberg (guest, #66308) [Link]

I have some reservations against optional dependencies: https://michael.stapelberg.ch/posts/2019-05-23-optional-d... :)

Stapelberg: distri: a Linux distribution to research fast package management

Posted Aug 20, 2019 22:55 UTC (Tue) by flussence (guest, #85566) [Link]

It should be pointed out that Gentoo didn't care about correctness until third-party package managers came along that did. Portage itself is an unmaintainable disaster to this day, and the distro's QA relies on people with ad-hoc tinderbox scripts and pkgcore. The whole thing is going to come crashing down one day unless a concerted effort is made to pay down the technical debt.

Stapelberg: distri: a Linux distribution to research fast package management

Posted Aug 19, 2019 19:09 UTC (Mon) by Cyberax (✭ supporter ✭, #52523) [Link] (4 responses)

Nice!

Can you use overlayfs to combine packages instead of your FUSE filesystem?

Stapelberg: distri: a Linux distribution to research fast package management

Posted Aug 19, 2019 19:46 UTC (Mon) by compenguy (guest, #25359) [Link] (2 responses)

overlayfs would probably be a good step up from fuse on several axes. I haven't checked if overlayfs allows this, but you wouldn't even need to define an "upper", just chain up the "lowers" for the per-package exchange directory.

Not sure what the performance implications are, exactly, but it'd probably scale with the number of packages participating in the "exchange".

Stapelberg: distri: a Linux distribution to research fast package management

Posted Aug 19, 2019 21:16 UTC (Mon) by mstapelberg (guest, #66308) [Link] (1 responses)

I actually had used overlayfs and kernel SquashFS mounts before starting the FUSE file system, but it turns out that it’s really tricky to modify overlayfs mounts at runtime, and kernel mounts in general scale really poorly.

Setting up a build environment namespace took multiple seconds with the kernel mount solution and is done sub-second with the FUSE file system, which can lazily load images and is just more flexible in general.

Stapelberg: distri: a Linux distribution to research fast package management

Posted Aug 20, 2019 7:54 UTC (Tue) by mm7323 (subscriber, #87386) [Link]

There's also unionfs.

Either way FUSE is pretty nice and simple and performance isn't _that_ terrible. It seems like a good approach to research and test concepts, and distri looks very interesting.

Thank you for taking the time to write it up.

Stapelberg: distri: a Linux distribution to research fast package management

Posted Sep 4, 2019 7:12 UTC (Wed) by holgerschurig (guest, #6714) [Link]

This is without any tests, just a strong assumption.

OverlayFS (or UnionFS) aren't good for this. If any user would "cd /usr/share/man", then you couldn't add "/ro/just-recently-installed_08.15/man" to it. Because you cannot change the union while one is using it. This would mean that for package updates you'd need to go into some single-user mode. In some scenarios this would be okay, but generally this would just add a layer of inconvenience.

Stapelberg: distri: a Linux distribution to research fast package management

Posted Aug 19, 2019 19:50 UTC (Mon) by compenguy (guest, #25359) [Link] (3 responses)

A couple of thoughts:

1. Have you done any benchmarking?

I imagine it would be relatively straightforward to "flatten out" the filesystem (copy all the files directly into place rather than mounted squashfses and "exchanges") to benchmark the overhead of this system.

2. I'm kind of curious what the boot process is like - is a flat initrd required in order to prep all the package and exchange mounts on the root filesystem first?

3. Also, what are the implications of in-place upgrade of a package while the system is running? I imagine the squashfs cannot be unmounted until all open filehandles to it have been closed, which could make things difficult, but maybe that can be circumvented by bind-mounting the updated package *over* the path for the existing one?

Stapelberg: distri: a Linux distribution to research fast package management

Posted Aug 19, 2019 21:11 UTC (Mon) by nybble41 (subscriber, #55106) [Link]

> I imagine the squashfs cannot be unmounted until all open filehandles to it have been closed

You can unmount a filesystem while there are open filehandles using the MNT_DETACH umount(2) flag (or umount --lazy from the command line). That disconnects the mount point but keeps the filesystem around until all the handles are closed.

Stapelberg: distri: a Linux distribution to research fast package management

Posted Aug 19, 2019 21:26 UTC (Mon) by mstapelberg (guest, #66308) [Link] (1 responses)

> 1. Have you done any benchmarking?

The most recent thing I have tweaked is how binaries locate their shared libraries, and how they are started. We used to use a big rpath (I think NixOS still does?), but putting symlinks into a lib/ subdirectory per package and pointing rpath to that directory is much faster (thanks to e.g. ld.so’s caching).

Overall, performance is good enough for day-to-day use-cases.

> I imagine it would be relatively straightforward to "flatten out" the filesystem (copy all the files directly into place rather than mounted squashfses and "exchanges") to benchmark the overhead of this system.

Indeed. In fact, this happens when we generate an initrd with dracut.

> 2. I'm kind of curious what the boot process is like - is a flat initrd required in order to prep all the package and exchange mounts on the root filesystem first?

An initrd is not required with distri unless you’re using an encrypted file system.

init is defined in https://github.com/distr1/distri/blob/master/cmd/distri/i.... As you can see, pid 1 mounts the FUSE file system at /ro, then execs systemd and the boot proceeds normally.

> 3. Also, what are the implications of in-place upgrade of a package while the system is running? I imagine the squashfs cannot be unmounted until all open filehandles to it have been closed, which could make things difficult, but maybe that can be circumvented by bind-mounting the updated package *over* the path for the existing one?

Packages aren’t ever updated in that sense, only new versions are being added to the package store. E.g., zsh-amd64-5.6.2-5 and zsh-amd64-5.6.3-6 can be installed at the same time.

The /bin exchange directory will point to the binary contained in the package with the highest distri revision number (6 > 5 in the example above), so new processes will use the new version.

To get rid of old packages, “distri gc” deletes the image from the package store, so at the next reboot it won’t be available anymore.
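
The exchange-directory selection rule described above can be sketched in a few lines of Python. This is an illustration rather than distri's actual code; it assumes package names follow the `name-arch-version-revision` shape of the example, and `distri_revision` and `pick_winner` are hypothetical helpers.

```python
def distri_revision(package: str) -> int:
    """Return the trailing distri revision of a name like 'zsh-amd64-5.6.3-6'."""
    # Assumption: the revision is the integer after the last hyphen.
    return int(package.rsplit("-", 1)[1])


def pick_winner(candidates: list[str]) -> str:
    """Pick the package whose distri revision number is highest."""
    return max(candidates, key=distri_revision)
```

Given `["zsh-amd64-5.6.2-5", "zsh-amd64-5.6.3-6"]`, `pick_winner` selects `zsh-amd64-5.6.3-6`, matching the 6 > 5 comparison in the text.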

Stapelberg: distri: a Linux distribution to research fast package management

Posted Aug 23, 2019 8:56 UTC (Fri) by mgedmin (guest, #34497) [Link]

Does this mean a reboot is required if I'm out of disk space and want to free some of it by uninstalling packages?

Can this limitation be lifted?

Stapelberg: distri: a Linux distribution to research fast package management

Posted Aug 19, 2019 20:45 UTC (Mon) by roc (subscriber, #30627) [Link] (1 responses)

dnf would be much faster for me if it just fetched the metadata updates in parallel instead of sequentially. Watching it inch through ten repos one by one is just painful.
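
As a rough illustration of fetching repository metadata in parallel rather than one repo at a time, here is a small Python sketch. It does not use or represent dnf's API; `fetch_all` and the `fetch` callback are hypothetical, and the actual download function is left to the caller.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def fetch_all(
    repos: list[str],
    fetch: Callable[[str], bytes],
    workers: int = 8,
) -> dict[str, bytes]:
    """Run fetch(repo) for every repo concurrently and collect the results."""
    # Threads are enough here because the work is network-bound, not CPU-bound.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map yields results in the same order as the input list.
        return dict(zip(repos, pool.map(fetch, repos)))
```

With ten repositories and a slow link per repo, the total wait approaches that of the slowest single fetch instead of the sum of all of them.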

Regarding triggers etc ... couldn't you install packages in parallel, even with pre/post-install scripts, as long as you ensure a package's triggers run in the right order relative to the triggers of its dependencies?
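
One way to picture the ordering constraint in that question is a topological sort over the dependency graph, where every batch contains packages whose dependencies have already finished. The Python sketch below uses the standard library's `graphlib`; the package names and the `trigger_batches` helper are made up for illustration, and it says nothing about how dnf or dpkg actually schedule scripts.

```python
from graphlib import TopologicalSorter


def trigger_batches(deps: dict[str, set[str]]) -> list[list[str]]:
    """Group packages into batches; packages within a batch may run in parallel.

    deps maps each package to the set of packages it depends on.
    """
    sorter = TopologicalSorter(deps)
    sorter.prepare()  # raises graphlib.CycleError on circular dependencies
    batches = []
    while sorter.is_active():
        # Everything returned here has all of its dependencies completed.
        ready = sorted(sorter.get_ready())
        batches.append(ready)
        sorter.done(*ready)
    return batches
```

For a graph where `app` depends on `libfoo` and `libbar`, and both depend on `libc`, the batches come out as `libc`, then the two libraries together, then `app`.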

Stapelberg: distri: a Linux distribution to research fast package management

Posted Aug 19, 2019 21:19 UTC (Mon) by mstapelberg (guest, #66308) [Link]

Agreed: there definitely are low-hanging fruit in the bigger package managers with regards to parallel downloading :)

Regarding hooks/triggers: this depends on the specific semantics of the package manager. I think in Debian you have so many extension points (check out https://wiki.debian.org/MaintainerScripts#Upgrading) that you can’t easily change the order in which things are done.

In distri, I want to demonstrate that even the extreme point of view of getting rid of hooks/triggers altogether is viable, so other distributions should feel encouraged to restrict their offering in order to optimize.

Stapelberg: distri: a Linux distribution to research fast package management

Posted Aug 20, 2019 1:59 UTC (Tue) by unixbhaskar (guest, #44758) [Link] (4 responses)

Well, I have been using several distros for ages, and the most disappointing of them all is NixOS. If an OS does not allow me to configure, make and make install EASILY, I am not going to use that crap. Period.

I believe distri will let an ordinary user with seriously limited technical ability do that without much ado.

Michael, thanks for taking these steps. Package managers are a pain in the arse... most of them, if not all. I believe that is what you are looking at most in distri.

I would like to give it a spin and am hoping it will play well with the other OSes on the disk ...

I am wildly hoping it allows installation without much fuss (you know that is the tipping point: if it does not work with ease at the first go, then however clever or good your work is, it will be dumped in the dust). I have seen really pathetic processes in others... they survive because they are backed by big shops... heck

Last but not least, please, for heaven's sake, keep the documentation in one place and make it explicit. That's where most people will be hanging out initially.

Good luck.

Stapelberg: distri: a Linux distribution to research fast package management

Posted Aug 20, 2019 5:59 UTC (Tue) by mstapelberg (guest, #66308) [Link] (3 responses)

I appreciate your enthusiasm for the project! I definitely agree with all the points you listed :)

> I would like to give it a spin and am hoping it will play well with the other OSes on the disk ...

It should play as well as any other Linux distribution: distri uses GRUB, which uses os-prober to find other OS.

That said, please be very careful and be sure to have a backup!

> I am wildly hoping it allows installation without much fuss (you know that is the tipping point: if it does not work with ease at the first go, then however clever or good your work is, it will be dumped in the dust). I have seen really pathetic processes in others... they survive because they are backed by big shops... heck

Note that distri does not have an installer right now. The idea is that you try it out as a live system only currently.

I have some ideas around how an installer should work, but we’ll see if I get around to implementing that :)

Stapelberg: distri: a Linux distribution to research fast package management

Posted Aug 20, 2019 6:28 UTC (Tue) by zdzichu (subscriber, #17118) [Link] (1 responses)

Too bad you didn't choose something implementing BootLoaderSpec, which explicitly supports multiple operating systems.

Stapelberg: distri: a Linux distribution to research fast package management

Posted Aug 20, 2019 20:58 UTC (Tue) by mstapelberg (guest, #66308) [Link]

We can always change course :)

System boot hasn’t been a priority so far.

Stapelberg: distri: a Linux distribution to research fast package management

Posted Aug 20, 2019 7:23 UTC (Tue) by unixbhaskar (guest, #44758) [Link]

"It should play as well as any other Linux distribution: distri uses GRUB, which uses os-prober to find other OS."

I ditched GRUB many moons ago and run everything from the UEFI shell or some sort of (dead) gummiboot... because that is fast and minimal. I run at least 7-8 distros side by side (all of them on physical partitions), and that has worked well for me for a long time.

I think I just need an entry there to run distri... if not, please let us know. That can save some time.

The Linux kernel path, the initrd path (if there is one) and the partition value... that's all that would be needed.

Stapelberg: distri: a Linux distribution to research fast package management

Posted Aug 20, 2019 7:13 UTC (Tue) by edomaur (subscriber, #14520) [Link] (2 responses)

Michael, do you plan to set up "secure supply chain" tooling too? I'm involved in a project with an in-toto and Uptane backend, and it would be really helpful to somehow have that baked into the package management system from the start.

https://uptane.github.io/
https://in-toto.github.io/

Stapelberg: distri: a Linux distribution to research fast package management

Posted Aug 20, 2019 21:00 UTC (Tue) by mstapelberg (guest, #66308) [Link] (1 responses)

Thanks for the links! That isn’t really my area of expertise, so I haven’t done anything in that regard, and I don’t see that changing in the foreseeable future :)

Stapelberg: distri: a Linux distribution to research fast package management

Posted Aug 22, 2019 9:03 UTC (Thu) by edomaur (subscriber, #14520) [Link]

In fact, the idea is to sign a new manifest at each step of the packaging process, so the source of the package can be confirmed. If I remember correctly, Debian is using that scheme.

Stapelberg: distri: a Linux distribution to research fast package management

Posted Aug 20, 2019 8:51 UTC (Tue) by Sesse (subscriber, #53779) [Link] (8 responses)

Having a bar is always good. With some luck, this will spur interest in the more traditional package managers (although most of my upgrades run in nightly cron jobs these days, so installation speed isn't something that really bugs me all that much).

I really wish dpkg would stop fsync-ing each and every file it unpacks… Do it in the background, and if the system crashes, have some sort of redo log job that can run on startup to make everything fine again. Or something. :-)

Stapelberg: distri: a Linux distribution to research fast package management

Posted Aug 20, 2019 17:12 UTC (Tue) by kreijack (guest, #43513) [Link] (6 responses)

> I really wish dpkg would stop fsync-ing each and every file it unpacks…

As a BTRFS user (where dpkg is *very* slow on a spinning disk), I suffered from this, even if I can understand the reasons. I solved it using libeatmydata (which I had to patch, otherwise some syncs were not avoided).

However, on the few filesystems capable of taking a snapshot (like btrfs, zfs or a filesystem layered on top of dm-snapshot), a better solution would be to take a snapshot before upgrading; after the upgrade, the package manager could delete the snapshot if all went fine, or perform a rollback in case of a problem.

I created a script which does that, but it was far from perfect.

Anyway, most of the BTRFS problems disappeared when I switched to an SSD (even though BTRFS is still slow for these operations).

> Do it in the background, and if the system crashes, have some
> sort of redo log job that can run on startup to make everything fine again. Or something. :-)

Stapelberg: distri: a Linux distribution to research fast package management

Posted Aug 23, 2019 9:00 UTC (Fri) by mgedmin (guest, #34497) [Link] (3 responses)

Every time I hear people propose filesystem snapshots for rolling back failed package upgrades I wonder how that would work.

What if I'm editing a text file in ~/src/myproject/ while the package manager is installing something in the background (and fails)? Will rolling back to the snapshot delete my changes? Assuming ~/src is on the same filesystem (which is pretty standard nowadays; I don't know of any distros that separate /usr from /home by default).

Stapelberg: distri: a Linux distribution to research fast package management

Posted Aug 23, 2019 10:15 UTC (Fri) by farnz (subscriber, #17727) [Link]

Speaking specifically to btrfs snapshotting, and not to other filesystems, you'd create subvolumes for /home etc, which are a cross between a directory (which is how they appear in the directory tree) and a sub-filesystem (which is how btrfs-aware tools treat them). You can then snapshot / without snapshotting /home - do the upgrade in the snapshot, and if it succeeds, roll forward to the new, updated snapshot.

Thus, no more half-upgraded state, and no changes to the contents of /home.

Stapelberg: distri: a Linux distribution to research fast package management

Posted Aug 23, 2019 22:34 UTC (Fri) by Lurchi (guest, #38509) [Link] (1 responses)

It works very well in (open)SUSE. $HOME is either a separate partition or a separate subvolume. Each subvolume is (can be) snapshotted independently, so you can rollback each one individually.

Stapelberg: distri: a Linux distribution to research fast package management

Posted Aug 24, 2019 0:38 UTC (Sat) by nybble41 (subscriber, #55106) [Link]

What if you're editing an unrelated file in /etc when the installation fails? Lots of packages need to install default configuration files into /etc and yet it's also an area which is frequently modified outside the package manager. This also applies to parts of /var.

Creating snapshots before installation makes sense, but I'm dubious about automated rollback as the primary means of recovering from failures.

Stapelberg: distri: a Linux distribution to research fast package management

Posted Sep 4, 2019 7:18 UTC (Wed) by holgerschurig (guest, #6714) [Link] (1 responses)

apt install eatmydata
eatmydata apt install foo bar baz

And yeah, I wish I could just specify the fsync usage in /etc/dpkg/... or /etc/apt/...

Stapelberg: distri: a Linux distribution to research fast package management

Posted Sep 4, 2019 14:04 UTC (Wed) by nix (subscriber, #2304) [Link]

Something similar to chattr +s/+D, only the opposite. Sounds like it should be fairly easy to implement...

Stapelberg: distri: a Linux distribution to research fast package management

Posted Aug 21, 2019 8:50 UTC (Wed) by nilsmeyer (guest, #122604) [Link]

> Having a bar is always good. With some luck, this will spur interest in the more traditional package managers (although most of my upgrades run in nightly cron jobs these days, so installation speed isn't something that really bugs me all that much).

When using VMs I generally just switch out the entire root fs. The process of building that FS could also hugely benefit from a speed-up, as would building containers, because in this case a mistake during the process usually means starting over.

> I really wish dpkg would stop fsync-ing each and every file it unpacks… Do it in the background, and if the system crashes, have some sort of redo log job that can run on startup to make everything fine again. Or something. :-)

There is the force-unsafe-io option (or eatmydata). I find the idea of just staging the whole change in another file tree and then swapping things out on disk interesting. When I used ZFS as my root FS I just used snapshots so I could revert to the "before upgrade" state.
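
The "stage in another tree, then swap" idea can be illustrated with an atomic symlink switch. This is a minimal Python sketch for POSIX systems, not something dpkg or any other package manager is claimed to do; `activate`, `demo_swap`, and the `current` link name are hypothetical.

```python
import os
import tempfile


def activate(tree: str, current_link: str) -> None:
    """Point current_link at tree, replacing any previous link atomically."""
    tmp = current_link + ".new"
    if os.path.lexists(tmp):
        os.remove(tmp)  # clean up a leftover from an interrupted run
    os.symlink(tree, tmp)
    # rename(2) replaces the old symlink in a single step on POSIX, so readers
    # see either the old tree or the new one, never a half-updated state.
    os.replace(tmp, current_link)


def demo_swap() -> str:
    """Stage two trees in a temporary directory and switch between them."""
    with tempfile.TemporaryDirectory() as root:
        old, new = os.path.join(root, "v1"), os.path.join(root, "v2")
        os.mkdir(old)
        os.mkdir(new)
        link = os.path.join(root, "current")
        activate(old, link)
        activate(new, link)
        return os.path.basename(os.readlink(link))
```

Rolling back is the same operation pointed at the previous tree, which is why the old tree is kept around until the new one is known to work.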

There is probably also some low-hanging fruit in the area of archive formats and compression; the ubiquitous tar really isn't the most suitable for the task, performance-wise.

Stapelberg: distri: a Linux distribution to research fast package management

Posted Aug 21, 2019 12:55 UTC (Wed) by shatsky (guest, #133925) [Link] (1 responses)

> distri uses SquashFS images, a comparatively simple file system image format that I happen to be familiar with from my work on the gokrazy Raspberry Pi 3 Go platform.

Are you sure cutting install times by unpack phase justifies adding level of access indirection which will impact runtime performance? People usually install software once, then use it for a time.

> A nice side effect of using read-only image files is that applications are immutable and can hence not be broken by accidental (or malicious!) modification.
> E.g., all files provided by package zsh-amd64-5.6.2-3 are available under /ro/zsh-amd64-5.6.2-3

Sounds like NixOS without deps closure immutability. NixOS remounts store readonly and uses hashes in store paths to guarantee immutability not just for individual packages, but for packages together with all their deps (which are still separate packages and are shared between depending packages when it's safe).
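
The idea of a path hash that covers a package together with its dependencies can be sketched as below. This is a deliberately simplified Python illustration and not Nix's real store-path algorithm; `store_hash`, its inputs, and the 32-character truncation are assumptions made for the example.

```python
import hashlib


def store_hash(name: str, content: bytes, dep_hashes: list[str]) -> str:
    """Hash a package's name, its content, and the hashes of its dependencies."""
    h = hashlib.sha256()
    h.update(name.encode())
    h.update(b"\0")
    # Hash the content to a fixed-length digest first so field boundaries stay clear.
    h.update(hashlib.sha256(content).digest())
    # Sort so the result does not depend on the order dependencies are listed in.
    for dep in sorted(dep_hashes):
        h.update(b"\0")
        h.update(dep.encode())
    return h.hexdigest()[:32]
```

Because each dependency's hash feeds into the dependent's hash, changing a library changes the hash of everything built on top of it, which is what makes the whole closure immutable rather than just the individual package.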

> In distri, these locations are called exchange directories and are provided via FUSE in /ro

NixOS does this with symlinks. Its system-wide "exchange directories" are stored in special packages which are on the top of deps hierarchy and are available via /run/current-system/sw.

Stapelberg: distri: a Linux distribution to research fast package management

Posted Aug 26, 2019 7:36 UTC (Mon) by mstapelberg (guest, #66308) [Link]

> Are you sure cutting install times by unpack phase justifies adding level of access indirection which will impact runtime performance? People usually install software once, then use it for a time.

Yes, and that’s what I want to demonstrate with this project. The runtime overhead is minimal, and the installation speed-up significant.

> Sounds like NixOS without deps closure immutability. NixOS remounts store readonly and uses hashes in store paths to guarantee immutability not just for individual packages, but for packages together with all their deps (which are still separate packages and are shared between depending packages when it's safe).

distri provides the same guarantees.

> NixOS does this with symlinks. Its system-wide "exchange directories" are stored in special packages which are on the top of deps hierarchy and are available via /run/current-system/sw.

Yep. Creating these symlink farms can take seconds, hence distri provides them on demand (which is a bit quicker).

Stapelberg: distri: a Linux distribution to research fast package management

Posted Aug 21, 2019 13:08 UTC (Wed) by luto (subscriber, #39314) [Link] (1 responses)

> Because all packages are co-installable thanks to separate hierarchies, there are no conflicts at the package store level, and no dependency resolution (an optimization problem requiring SAT solving) is required at all.
> In exchange directories, we resolve conflicts by selecting the package with the highest monotonically increasing distri revision number.

I don’t see how this will work in a real (multiple major changes, multiple packagers) situation. Here are some examples:

A new version of one package obsoletes a different. How does the obsolete package get removed? Does it still work if there are multiple sources of packages?

Two packages provide the same exchange file. The one with the lower revision numbers gets a bug fix release. Now the winner is switched.

Stapelberg: distri: a Linux distribution to research fast package management

Posted Aug 26, 2019 7:41 UTC (Mon) by mstapelberg (guest, #66308) [Link]

> A new version of one package obsoletes a different. How does the obsolete package get removed?

The obsolete package will be removed once it’s no longer referenced by any other installed packages. In terms of declaring the roots (packages which users explicitly want to install), I could imagine having a hand-curated list, but I haven’t thought about this a whole lot from the UX perspective. Perhaps there’s a good solution :)
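
The removal rule sketched in that answer amounts to reachability from a set of roots. Here is a minimal Python illustration under that reading; it is not how "distri gc" is implemented, and the `garbage` helper and the package names in the usage note are hypothetical.

```python
def garbage(installed: dict[str, set[str]], roots: set[str]) -> set[str]:
    """Return installed packages not reachable from any root.

    installed maps each package to the packages it references.
    """
    reachable: set[str] = set()
    stack = list(roots)
    while stack:
        pkg = stack.pop()
        if pkg in reachable or pkg not in installed:
            continue
        reachable.add(pkg)
        stack.extend(installed[pkg])
    return set(installed) - reachable
```

With `zsh-5` as the only root and `zsh-5` referencing `glibc-2`, an unreferenced `oldlib-1` is the only package reported as removable.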

> Does it still work if there are multiple sources of packages?

Not sure what you’re getting at? Can you rephrase your question?

> Two packages provide the same exchange file. The one with the lower revision numbers gets a bug fix release. Now the winner is switched.

Global exchange directories (as opposed to per-package exchange directories) are meant to be used where there is loose API coupling, so I’m not entirely sure what the problem is with the winner switching? Other packages will still find and use precisely the dependencies that they were built with, so if we’re talking about e.g. LDAP DNS resolution (a glibc plugin located at runtime), picking up the bug fix seems like the correct path of action to me? :)

Stapelberg: distri: a Linux distribution to research fast package management

Posted Aug 28, 2019 12:19 UTC (Wed) by nix (subscriber, #2304) [Link]

This looks very interesting, but I'm concerned about the maintainability of everything that would require package-install scripting crossing >1 package having to be folded into the package manager directly (I agree that single-package stuff should basically always be expressible via the archive content itself, without scripting).

Wouldn't it be nicer to have a composable/pluggable package manager, so that packages could contribute plugins at install time that add such functionality as needed? It seems to me that this would give the flexibility of conventional hook scripts *and* the (admittedly considerable) advantages of your hookless approach, without requiring package manager changes just because a couple of tightly coupled packages want to do related things programmatically at install time.


Copyright © 2019, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds