Fedora and distributed source packages

By Jonathan Corbet
July 16, 2008
Fedora's new version of RPM, announced on July 9, has hit the Rawhide repositories; after inspiring some initial cries of pain, it would appear to be settling in well. It is good to see activity on Red Hat's version of RPM after a long period where nothing much was happening. In the process of bringing this new code to Rawhide, the RPM developers have also inspired some interesting side discussions on topics like whether such a major change should have gone through the official "features" process first. But the most extended (and arguably most interesting) discussion came from an unexpected direction.

Doug Ledford is known in kernel development circles, but, being an RHEL engineer, he has not been seen much in the Fedora camp. He joined the RPM discussion with a feature request of his own: he would like a set of tags which would facilitate the location of a package's source code in a distributed version control system (DVCS). So these tags would indicate which DVCS is in use (git, mercurial, etc.), where the repository is to be found, the tag corresponding to the source code for a specific version of a package, etc. And, Doug let it be known, it would be nice if he could have those tags soon; tomorrow would be nice, but before the Fedora 10 release in particular.
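
As a rough illustration of what such tags would need to carry, the sketch below models them as a small Python record. The key names (`VcsType`, `VcsUrl`, `VcsRevision`), the `Key: value` syntax, and the example URL are all hypothetical, invented for this sketch; they are not taken from RPM or from Doug's request, which only lists the kinds of information wanted.

```python
from dataclasses import dataclass

# Hypothetical header keys invented for this sketch; not taken from RPM.
REQUIRED_KEYS = ("VcsType", "VcsUrl", "VcsRevision")


@dataclass(frozen=True)
class DvcsSource:
    vcs_type: str   # which DVCS is in use, e.g. "git" or "hg"
    repo_url: str   # where the repository is to be found
    revision: str   # tag corresponding to this package version


def parse_dvcs_tags(header_text: str) -> DvcsSource:
    """Parse 'Key: value' lines into a DvcsSource, ignoring unrelated keys."""
    fields = {}
    for line in header_text.splitlines():
        # Split on the first colon only, so URLs keep their own colons.
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    missing = [k for k in REQUIRED_KEYS if k not in fields]
    if missing:
        raise ValueError(f"missing tags: {', '.join(missing)}")
    return DvcsSource(fields["VcsType"], fields["VcsUrl"], fields["VcsRevision"])
```

Three small fields are enough to pin down a specific source state, which is why the rest of the discussion treats the location information as lightweight metadata.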

Once this information exists for a package, interesting things can be done. For example, source RPM packages could become much smaller; rather than containing a tarball and a set of distributor-applied patches, each could just hold the DVCS information. An "installation" of that package would then just go to the source repository and check out the sources from there. If the source repository is managed carefully, it could help the cooperation between Fedora and the upstream projects; patches could be pushed and pulled between repositories with ease. This kind of mechanism could also make it easier for the Fedora project to distribute "spins" created by outsiders by reducing the resources required to make the associated source code available. See this lengthy pitch from Doug for more discussion of the advantages of the distributed source package approach.
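
To make the "installation" step concrete, here is a minimal sketch of how a tool might turn that information into a checkout command. The function name and its parameters are assumptions made for illustration; nothing here reflects how RPM or the Fedora build system actually behaves. The function only builds the argument list and does not run anything.

```python
def checkout_command(vcs_type: str, repo_url: str, revision: str, dest: str) -> list[str]:
    """Build (but do not run) the command that fetches one tagged revision."""
    if vcs_type == "git":
        # --branch also accepts a tag name and checks it out after cloning.
        return ["git", "clone", "--branch", revision, repo_url, dest]
    if vcs_type == "hg":
        # --updaterev selects the revision, tag, or branch to check out.
        return ["hg", "clone", "--updaterev", revision, repo_url, dest]
    raise ValueError(f"unsupported DVCS: {vcs_type}")
```

The resulting list could be handed to `subprocess.run(cmd, check=True)`. Supporting each additional DVCS means another branch with its own semantics, which hints at why a fractured DVCS landscape makes a uniform rollout hard.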

Of course, there are some obstacles too. Not all projects are using a DVCS, so integration with those projects would be more difficult. Quite a few projects have material in their repositories which, for legal reasons, cannot be distributed by Fedora. Finding a way to excise that material without breaking the connections between repositories could be challenging. The tarballs distributed by many upstream projects - which are the starting place for Fedora packages now - often contain changes which are not reflected in their source repositories. Those changes can include the removal of non-distributable material, or simply generating the configure file.
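
The mismatch between release tarballs and repositories can be made visible with a simple comparison. The sketch below is an illustration only: the function name is hypothetical, and it assumes the tarball wraps its contents in a single top-level directory (such as `foo-1.0/`), which is common but not guaranteed.

```python
import io
import tarfile


def tarball_only_files(tar_bytes: bytes, repo_files: set[str]) -> set[str]:
    """Return files present in the release tarball but absent from the checkout.

    Assumes the tarball wraps everything in one top-level directory, so the
    first path component is stripped before comparing.
    """
    tar_files = set()
    with tarfile.open(fileobj=io.BytesIO(tar_bytes), mode="r:*") as tar:
        for member in tar.getmembers():
            if member.isfile():
                _, _, relative = member.name.partition("/")
                if relative:
                    tar_files.add(relative)
    return tar_files - repo_files
```

A generated `configure` script would typically show up in the result. Swapping the operands of the set difference would instead list repository files left out of the tarball, such as non-distributable material.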

These challenges are real, and some of them will take a fair amount of work to resolve. But it seems clear that things eventually need to go in this direction. Tighter integration between projects and distributors can only help the whole free software ecosystem work in a more efficient manner. Tarballs reflect a form of frozen state which is entirely divorced from the code's history - and from its future. Or, as Doug put it:

It's all about the repo. A tarball is something you hand off to poor saps that haven't joined the 21st century, all the while snickering at their inability to get with the times. It is nothing more than a middle man step that interferes with efficiency of operation and that should be cut out of the loop.

A source package format that can maintain its connections wherever it goes can only make the whole system work better. So it is good that the Fedora folks (including those beyond Doug who have been thinking about this issue for a while now) are working on this problem.

There was, however, an interesting omission from the discussion; as far as your editor can tell, nobody ever mentioned the work being done by the vcs-pkg project, which is aimed toward this goal:

Our goal is to integrate version control with distro package maintenance. We want to recognise all involved in the process, from upstream, the package maintainers of the various distributions, their security and release teams, and power users, who aren't afraid to fix their own bugs, and give maximum flexibility to them.

This group is mostly Debian-based, but its members are making a concerted effort to create solutions which are independent of any given distribution (or DVCS). It can only make sense for Fedora to work with this project - or at least have a look at what vcs-pkg is doing and come up with a good excuse why a different solution has to be invented for Fedora.

The integration of distributed version control and packaging can only reach its full potential if, among other things, it facilitates cooperation between distributors and their upstream providers, their users, and, importantly, other distributors. If each distributor brews up its own solution (again), they'll have a hard time sharing their work with each other. Few upstream projects will have the patience to integrate with several disparate distributor systems, so that integration will be much less likely to happen. All of this can be avoided, though, if the distributors decide now to work toward some common standards for the use of distributed version control in packaging.


Fedora and distributed source packages

Posted Jul 17, 2008 2:01 UTC (Thu) by me@jasonclinton.com (subscriber, #52701) [Link]

Once again, Red Hat employees fail to look outside the company to see if anyone else is working
on something before having a giant brain-storming session and deciding they just invented
something new.

Chalk this one up to yet another Red Hat NIH.

Still, if they manage to reimplement the wheel in the same way that others have already
invented it, it only serves to benefit upstream more. Good to see progress--that's something,
I guess.

Fedora and distributed source packages

Posted Jul 17, 2008 3:00 UTC (Thu) by rahulsundaram (subscriber, #21946) [Link]

Once again, you waste no time on checking facts before attacking Red Hat. Go ahead and search
through vcs-pkg mailing list archives to witness participation of Red Hat employees or refer
to http://vcs-pkg.org/people/ where Jesse Keating who is a Red Hat employee and Fedora release
engineer is listed among others as people interested in this effort. 

It is true, however, that the vcs-pkg effort is fairly unknown and very few distributions seem
to be participating, and it is quite possible that Doug Ledford (a kernel engineer working on
the RHEL side of things, as the article noted), who initiated the discussion, is not aware of it.
This isn't an organizational initiative as you seem to be implying.

Fedora and distributed source packages

Posted Jul 17, 2008 3:08 UTC (Thu) by me@jasonclinton.com (subscriber, #52701) [Link]

RH's cult of NIH is emphatically *not* the result of some PHB somewhere in the system making
it so as you seem to think that I think. Indeed, it's much more subtle.

Having met and discussed this issue with various RH employees, it's quite clear that there are
simply too many people on RH's payroll who either:

a) go home at the end of the day and *never* turn on a computer and so they are never exposed
to any community other than their co-workers or

b) go home at the end of the day and use a Mac.

It's not a development policy problem. It's a hiring issue.

Fedora and distributed source packages

Posted Jul 17, 2008 3:35 UTC (Thu) by rahulsundaram (subscriber, #21946) [Link]

Quite assuming assertions. Maybe you should be the HR instead of the troll then ;-)

Fedora and distributed source packages

Posted Jul 17, 2008 3:42 UTC (Thu) by me@jasonclinton.com (subscriber, #52701) [Link]

Unfortunately, I can't just submit a patch to RH's HR process or I would.

Fedora and distributed source packages

Posted Jul 17, 2008 6:28 UTC (Thu) by dledford (guest, #52938) [Link]

I will assure you that this is not a case of NIH syndrome.  While I can't speak for the apt
using distros, what I'm working towards needs specific support from not only the Fedora build
system, but also from RPM itself.  After talking with both the rpm4 and rpm5 people, I can
tell you that neither of them had these things on their roadmaps.  So, while the people at
vcs-pkg.org are working on some interesting stuff, it simply isn't the same as what I'm trying to
do as they haven't even approached the RPM maintainers for the changes I asked for (or any
variant thereof).

Oh, and it's 2:30am my time...and I don't use a mac ;-)

Fedora and distributed source packages

Posted Jul 17, 2008 8:57 UTC (Thu) by walters (subscriber, #7396) [Link]

Jason, I don't know where you're getting this from; there is very little basis in fact to your
claims.

Fedora and distributed source packages

Posted Jul 17, 2008 15:36 UTC (Thu) by ofeeley (guest, #36105) [Link]

That doesn't seem entirely fair. Part of what seems to have happened is that the Fedora
process by which large changes (such as a new version of RPM) are communicated to the rest of
the community got bypassed. This Feature Process is being refined so that it's easier for
developers to determine whether or not they should be alerting the rest of the development
community to significant changes. See Fedora Weekly News#134 "New RPM Sparks Exploded Source
Debate"[1].

And in this case the Red Hat developer distinctly _did_ establish communication (very
publicly) with those driving the clean-up and re-basing of RPM from the mess in which it was
left. All is not sweetness and light, but on the other hand there are some promising and
exciting ideas being discussed.

1. http://fedoraproject.org/wiki/FWN/LatestIssue#New_RPM_Spa...

A self contained RPM is important

Posted Jul 17, 2008 4:29 UTC (Thu) by ctg (subscriber, #3459) [Link]

The idea of pulling down the source at build time worries me.

At the moment the source RPM provides a "matter of record" for that package: what it is, and
how to build it.  The source RPM will be around in 1, 3, or 10 years time (should you wish to
archive it - or the CD/DVD).  But will the online repository?

It might, but at the moment only one party is needed to maintain the source RPM; with this
sort of scheme many will be needed - ISPs, hosting environments, repositories, project
maintainers, etc.

A self contained RPM is important

Posted Jul 17, 2008 6:06 UTC (Thu) by dledford (guest, #52938) [Link]

Whether pulling down source in a tarball or in a dvcs clone, either way you are pulling down
source at the time of the build.  And keep in mind that the changes I'm working towards mainly
apply to what could be termed as "well behaved" dvcs systems (hg, git, etc.).  In these
systems, even if the upstream repo were to go away entirely, Fedora (or any other distributor)
could keep their own clone of the upstream repo around for as long as they wish, and from that
single repo you could get *all* of the "matter of record" snapshots for a package instead of
needing an srpm per "matter of record" snapshot.  This becomes especially important when you
consider something like the kernel that will never go away.  It's likely that we would throw
away old kernel srpms long before we would purge old kernel versions from a dvcs repo.

A self contained RPM is important

Posted Jul 20, 2008 19:54 UTC (Sun) by salimma (subscriber, #34460) [Link]

Fedora (and I'd assume RHEL too) pulls source tarballs from its own cache. Supporting DVCS
systems would save a lot of space, since a new release would be accommodated by storing the
diffs from the previous release, rather than an entire new copy.

A self contained RPM is important

Posted Jul 17, 2008 6:09 UTC (Thu) by Cato (subscriber, #7643) [Link]

Good point - I think the distro would need to clone any upstream DVCSs using this scheme for
the support lifetime of any given distro release.

Even if the DVCS for a project only goes down temporarily it could be annoying.  Also, what
happens when the project switches DVCS - you'd have to release a new package version to cover
that too.

A self contained RPM is important

Posted Jul 17, 2008 6:33 UTC (Thu) by dledford (guest, #52938) [Link]

Any distro like Fedora already has to copy and retain the upstream sources.  Today we do that
with srpms and tarballs.  We actually keep around tarballs that upstream may have long since
thrown away. Switching to keeping sources in a dvcs setup does not change the requirement to
keep sources, it only changes the format.  So regardless of the change, you can expect Fedora
to always keep its own copy of upstream sources around so that it can always satisfy its open
source license requirements.

A self contained RPM is important

Posted Jul 17, 2008 8:27 UTC (Thu) by nim-nim (subscriber, #34454) [Link]

This won't happen. For many legal and practical reasons the code a package is built from will
be archived either in tar.bz2 or vcs form both Fedora-side and on the source disks the
distribution releases.

So vcs-based packages may result in many things, but certainly not in the source size
reduction this article mentions.

A self contained RPM is important

Posted Jul 17, 2008 11:42 UTC (Thu) by epa (subscriber, #39769) [Link]

I expect you can still build a traditional source package at the same time you build the
binary package.  If you really wanted, it could make an old-style tarball and include that;
alternatively it could include the whole DVCS repository/working copy with all the history.
Either way you have the traditional big lump of source code you can use to recreate the package
and distribute to fulfil your GPL obligations.

WARNING!!! Potential GPL violations ahead

Posted Jul 17, 2008 11:34 UTC (Thu) by faramir (subscriber, #2327) [Link]

An earlier message may have subtly alluded to this problem, but I want to make it explicit.

The GPL REQUIRES that if you distribute binaries, you MUST distribute the source code from
which those binaries were created.  You can't unilaterally claim that someone else is
fulfilling your responsibilities.

There are already too many small distributions (usually respins) which make vague statements
about being based on Fedora/Ubuntu/Debian/etc.  Often they don't even attempt to specify a
particular version, let alone actually host the source themselves.  This proposal sounds
like it might be useful from a technical point of view (and would admittedly clear up the
vague references to source code).  However, I believe it would make such GPL violations even
more common.

I also believe that loosening the GPL requirements to allow this would be a bad idea.  Given
the extent to which some companies already find it convenient (if not a direct part of their
business plan) to make it hard to regenerate binaries from GPLed source, I see any climate
which encourages people to not host their own sources as likely to make it even harder to
actually get source to the binaries I'm running let alone being in a position to use them.

WARNING!!! Potential GPL violations ahead

Posted Jul 17, 2008 14:18 UTC (Thu) by erwbgy (subscriber, #4104) [Link]

From http://www.gnu.org/licenses/gpl.html:

6. Conveying Non-Source Forms. You may convey a covered work in object code form under the terms of sections 4 and 5, provided that you also convey the machine-readable Corresponding Source under the terms of this License, in one of these ways:
...
d) Convey the object code by offering access from a designated place (gratis or for a charge), and offer equivalent access to the Corresponding Source in the same way through the same place at no further charge. You need not require recipients to copy the Corresponding Source along with the object code. If the place to copy the object code is a network server, the Corresponding Source may be on a different server (operated by you or a third party) that supports equivalent copying facilities, provided you maintain clear directions next to the object code saying where to find the Corresponding Source. Regardless of what server hosts the Corresponding Source, you remain obligated to ensure that it is available for as long as needed to satisfy these requirements.

WARNING!!! Potential GPL violations ahead

Posted Jul 17, 2008 15:02 UTC (Thu) by iabervon (subscriber, #722) [Link]

But you're stuck in a 20th-century idea of "source". These days, a tarball of program code
hardly counts as "source", as the GPL defines it ("the preferred form of the work for making
modifications to it"). If the project is using a DVCS, the developers almost certainly benefit
from it, and prefer to have the history available when making modifications to the work.
Therefore the tarballs are just a less-processed set of object code that's less difficult to
modify than the binary, and may not be sufficient to fulfill the requirements of the GPL.
(Yes, I'm kidding, at least for now. The FSF would probably dispute the claim that you can't
turn off a SVN server for a GPL project for which you don't have sole copyright without
offering a full-database download for 3 years first because each update from SVN is an object
code distribution from the full-repository-content source.)

In any case, the equivalent of a SRPM would really be the clone of the DVCS that contains the
tag on the version to use to build that binary. The location information is really a small
amount of metadata (barely more than the package name, really).

WARNING!!! Potential GPL violations ahead

Posted Jul 17, 2008 18:19 UTC (Thu) by jspaleta (subscriber, #50639) [Link]

Speaking as a Fedora Project Board member.....

To be clear, whatever Fedora does, we'll be doing it in a way that meets and exceeds our
source distribution requirements as we understand them currently.

As a project we have a compelling interest in making sure downstream consumers and our own
contributors have the ability to easily replicate exactly how we build things so they can then
provide us with patches..which we can then provide to upstream.  I have a hard time seeing how
we can continue to do that in a way that doesn't exceed the minimum source distribution
requirements set forth by any copyright license that we currently allow for software in the
repository. 

Cloning will have to be involved, there's really no doubt about that.

And hopefully whatever we do ends up making it easier for us (or any rpm based distro) to
drive bug fixes back into the upstream projects.  If expanding what rpm thinks of as
"source" to better match how upstream projects internally handle "source" helps us do a better
job of driving bugfixes and enhancements into the upstream codebase.. then we should do it.
Of course, we can't know for sure if the changes being discussed currently will actually make
things easier..or just more complicated.  The best we can hope for is that we come to an
agreement as to the potential value and whether it's a smart bet (even if it's not a safe bet).
We must continue to make smart bets on where to re-invest in technology upgrades which aid in
the overall sustainability of the entire open source development ecosystem.

The real-world dvcs landscape is very fractured so there's very little chance that the changes
being discussed when introduced into rpm will instantly allow us to replace how Fedora's
buildsystem operates. If the changes go into rpm to make it possible to support dvcs as a
"source" then from a Fedora policy perspective we will most likely implement usage of that
capability in stages, supplementing the tarball/patches approach not outright replacing it.
Different dvcs concepts work differently and I doubt we could comfortably just flip the switch
for all of them at the same time in our build system.

-jef



Source packages are useful

Posted Jul 18, 2008 10:45 UTC (Fri) by NAR (subscriber, #1313) [Link]

Actually there's one more useful thing in having the sources in the distribution repository -
if I need some software for e.g. Solaris, I usually don't go looking for it on its website,
but go for a Debian mirror and get the .tar.gz from there. It's faster, especially if the
software in question has some dependencies, because those packages are also on the Debian
mirror and most likely they are of the right version.

Fedora and distributed source packages

Posted Jul 24, 2008 21:01 UTC (Thu) by dkite (guest, #4577) [Link]

I'm curious about this solution. What does it solve?

Is there such a large disconnect between distributions and upstream projects?

Maybe that is the case with Fedora. Others?

Derek

Copyright © 2008, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds