The git tree is the source code

Posted Nov 10, 2012 12:04 UTC (Sat) by epa (subscriber, #39769)
Parent article: Introducing RedPatch (Ksplice Blog)

The GPL requires that you distribute source in "the preferred form of the work for making modifications to it". If the workflow of programmers is heavily based on git, then the preferred form of the work is the git tree and that is what the GPL requires Red Hat (and others) to distribute. Flattening it into a tarball is no more acceptable than running it through a source mangler to strip all the comments. In both cases you no longer have the work in the form you prefer for modifying it.

Certainly, when Red Hat receive the code from Linus and pals it is in the form of a git tree, and if they distribute their changed version in a different form the onus is on them to show that this is now the "preferred" form for making modifications.


The git tree is the source code

Posted Nov 10, 2012 12:41 UTC (Sat) by branden (subscriber, #7029) [Link]

That's what I said about two years ago. The few people at Red Hat who could be troubled to respond to my argument at all basically told me to go pound sand.

It will be interesting to see if the definition of "preferred form for modifying the work" now changes for the sake of whacking Oracle with a stick.

Which, to be fair, is not without its appeal...

The git tree is the source code

Posted Nov 10, 2012 14:03 UTC (Sat) by khim (subscriber, #9252) [Link]

> It will be interesting to see if the definition of "preferred form for modifying the work" now changes for the sake of whacking Oracle with a stick.

Not at all. “Closed internal repo with only occasional tarball drops for outside consumption” is how the FSF developed its flagship products (Emacs, GCC, glibc) for years, starting from the very beginning. And they created the GPL, remember?

P.S. You may say that the FSF eventually opened their repos, but remember that this was a political decision, not a copyright-dictated one.

The git tree is the source code

Posted Nov 10, 2012 19:51 UTC (Sat) by donbarry (guest, #10485) [Link]

The "preferred form for modifying the work" has changed with time as the state of the art has developed. In the 1980s and 1990s, there was sccs, rcs, and eventually cvs. Most development was done by small teams; the growth of larger free software communities, and of a social environment backed by effective source management tools, was still in the future. Did the FSF lag behind some other organizations in moving to these tools? Yes and no. Though some of their projects, which are often quite "loosely" within the FSF umbrella, moved faster than others, the FSF also promoted early DVCS efforts like GNU Arch, and made sure that alternatives to the early proprietary social programming sites like SourceForge (which was briefly free software) were available.

We owe them our gratitude. Remember that their resources are often orders of magnitude smaller than those of even the free software companies.

The git tree is the source code

Posted Nov 15, 2012 13:36 UTC (Thu) by vonbrand (subscriber, #4458) [Link]

I can get Linus' kernel source as a tarball, unpack it and make a git repo for my own use, and thus have a setup that makes my development more productive. I might even throw in some other, closed source tools for my own pleasure. That doesn't mean anybody is entitled to said tools' results. What the GPL demands is the source code to study and modify, not random gossip on who wrote what line and changed what else. In the same way, it doesn't require anybody to ship the editor, compiler and whatnot the developers require for their work.

The git tree is the source code

Posted Nov 10, 2012 13:01 UTC (Sat) by paulj (subscriber, #341) [Link]

If no kernel copyright holder objects, then Red Hat have no problem, regardless of the merit of this argument (I think it's pretty clear that Red Hat's preferred form for working on their kernel package is NOT a single, unified tarball). So a kernel hacker would have to notify them to desist and be prepared to sue them if they do not. How likely is that?

The git tree is the source code

Posted Nov 10, 2012 13:18 UTC (Sat) by dowdle (subscriber, #659) [Link]

I'm sure a handful of Oracle employees have some code in the kernel that they hold copyright on. I definitely favor Red Hat here, but I'm just saying. At the very least, think btrfs.

What is weird to me... is that btrfs is the "zfs for Linux" but it so happens that Oracle also owns zfs. They could change (or dual license) zfs for Linux if they wanted to, but no... they'd prefer to have one product that is ok with the GPL (btrfs) and one that is not (zfs).

The git tree is the source code

Posted Nov 10, 2012 13:59 UTC (Sat) by khim (subscriber, #9252) [Link]

> If the workflow of programmers is heavily based on git, then the preferred form of the work is the git tree and that is what the GPL requires Red Hat (and others) to distribute.

Many developers don't use git (Linus himself used just a series of tarballs for years). And since in court you'll be confronted by the FSF, who wrote the GPL in the first place (it developed gcc, glibc and many other things in exactly this fashion for years; the only difference was that they used CVS and not git back then)… I wish you luck; you'll need it.

The git tree is the source code

Posted Nov 10, 2012 16:48 UTC (Sat) by epa (subscriber, #39769) [Link]

The FSF is not relevant here - as the copyright holders of the code they wrote, they could not get in trouble for violating the GPL no matter what they did. I think that if it ever did get to court, showing "the preferred form for making modifications to the program" would be pretty straightforward.

Counsel: And so during your time at Red Hat you normally worked on changes to the software using the git tree?

Witness 1: That's right.

Counsel: But why didn't you just use a tarball instead?

Witness 1: That would make my job more difficult, because...

Counsel: So you would say that you preferred to operate on the git source tree rather than a tarball?

Witness 1: Yes.

The git tree is the source code

Posted Nov 11, 2012 2:31 UTC (Sun) by mathstuf (subscriber, #69389) [Link]

How far down the rabbit hole do you want to go? I prefer to modify the source I work on with Vim using a directory setup which I have Vim and zsh configured to work with easily via custom scripts, aliases, and commands, at least 2 monitors if possible, and so on. How do I enforce that those I distribute to receive this "preferred form for making modifications to the program"?

The line has to be drawn somewhere. IANAL, but I'd think it'd be close to the form which implies the fewest "tools" given a "clean system" and is such that a "reasonable modification" could be made. For example, a standard tarball requires tar, one of the decompression binaries, and an editor (which could likely be assumed since modifications need to be made). These "tools" should probably be generally accessible (though not necessarily free), but that's another definition that would need to be pinned down.

The git tree is the source code

Posted Nov 11, 2012 18:27 UTC (Sun) by khim (subscriber, #9252) [Link]

> The FSF is not relevant here - as the copyright holders of the code they wrote, they could not get in trouble for violating the GPL no matter what they did.

The FSF set the standard. Others (for example Cygnus) also worked in this fashion.

> Counsel: And so during your time at Red Hat you normally worked on changes to the software using the git tree?

> Witness 1: That's right.

> Counsel: But why didn't you just use a tarball instead?

> Witness 1: That would make my job more difficult, because...

> Counsel: So you would say that you preferred to operate on the git source tree rather than a tarball?

> Witness 1: Yes.

And now for the cross-examination:

Counsel: Was your repo a replica of the official tree?

Witness 1: No.

Counsel: You imported the official tarball and applied local patches on top of that to make the repository smaller?

Witness 1: That's right.

Counsel: And you have not used the full replica of the official repo, because...

Witness 1: It would be too large and unwieldy, and it was easier to start with a tarball.

As I've said: I wish you luck; you'll need it. I'm not saying it's impossible to prove that git is "the preferred form for making modifications to the program", but it'll be quite hard. You'll need to collect statistics and prove that most developers use a replica of the official git tree (I'm not sure if that's even true at all, and to use this argument in court you need to prove that it's true without any shadow of doubt), then you'll need to somehow explain why kernel developers themselves don't think a git repo is essential, etc.

Feel free to do that if you have too many millions of dollars to burn.

The git tree is the source code

Posted Nov 11, 2012 19:03 UTC (Sun) by Jonno (subscriber, #49613) [Link]

> to use this argument in court you need to prove that it's true *without any shadow of doubt*

Actually, that would be a civil case, so you only have to prove it *on the balance of probabilities* (i.e. that it is more likely than not), but that is still quite a hurdle.

And BTW, even criminal cases only require *beyond reasonable doubt*, not beyond any doubt, as that would generally be impossible...

The git tree is the source code

Posted Nov 11, 2012 21:22 UTC (Sun) by epa (subscriber, #39769) [Link]

OK, I wasn't aware that developers working on the Linux kernel preferred to start with the tarball rather than fork from Linus's git tree or the git tree of some other kernel developer. If that is the case, then it's fair to say that they are distributing the code in the same form they received it, which is also the preferred form for making further changes.

The git tree is the source code

Posted Nov 12, 2012 15:59 UTC (Mon) by khim (subscriber, #9252) [Link]

It may be surprising, but yes, not all developers use Git. That's a well-known fact. And the ones who use it do not always use a replica of the official repo.

It's hard to find solid numbers, of course.

The git tree is the source code

Posted Nov 11, 2012 23:06 UTC (Sun) by viro (subscriber, #7872) [Link]

> Counsel: Was your repo a replica of the official tree?

> Witness 1: No.

Perjury is generally frowned upon... I do *not* think that "preferred form" argument would fly, since the general practice is what it is, but your "defence" is a load of horse manure. Anybody who starts a tree by importing a tarball instead of doing git clone is an idiot; "too large and unwieldy" being what, ~500Mb for full history? _And_ that's ~500Mb that only has to be present in a local clone of mainline tree; setting alternates to point to it eliminates that from yours, so unless you seriously propose a workflow that consists of git show or git format-patch on another box + mail or scp to the box where you work on your tree for each backport... *shudder*
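A minimal sketch of the alternates setup mentioned above, not a description of anyone's actual tree. It assumes `git` is installed; the `mainline` and `distro` repository names and the toy empty commit are hypothetical stand-ins for a local mirror of mainline and a downstream working tree. `git clone --reference` records the mirror's object store in `.git/objects/info/alternates`, so the new clone can borrow objects rather than store its own copies.

```shell
set -e
work=$(mktemp -d)
cd "$work"

# Hypothetical stand-in for a local mirror of the mainline tree.
git init -q mainline
git -C mainline -c user.name=demo -c user.email=demo@example.com \
    commit -q --allow-empty -m "mainline history"

# Clone while borrowing objects from the local mirror; git writes the
# mirror's object directory into the new clone's alternates file.
git clone -q --reference "$work/mainline" "file://$work/mainline" distro

# Show where the clone will look for objects it does not store itself.
cat distro/.git/objects/info/alternates
```

The printed path should point into the `mainline` repository's object directory; the exact path depends on where `mktemp` places the scratch directory.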

In any case, I don't think that "RH kernel team does not consist of hopeless idiots" can be considered confidential information, so... No, rhel6 kernel git tree is *not* set up in such a cretinous way.

The git tree is the source code

Posted Nov 12, 2012 12:03 UTC (Mon) by nix (subscriber, #2304) [Link]

Quite. I suppose it might make sense, if you were doing development on a really space-constrained embedded box or something -- but I hope you're never running a compile on that sort of machine, or you'll be waiting for hours and hours.

(Alternates aren't ideal -- one fetch in the wrong place and oops you suddenly have heaps of redundant objects since you need to pull in the other place too and you probably don't have them set as alternates of each other. You need to be quite rigid in your fetching habits if this is really to save you a lot of space. But, heck, I do that sometimes and it just vanishes in the oceans of space on my multiple-year-old obsolete small disks. Even the *Chromium* source tree vanishes in the oceans of space on those disks. Disks you can actually buy nowadays will be even larger. This argument will never fly.)

The git tree is the source code

Posted Nov 12, 2012 16:05 UTC (Mon) by khim (subscriber, #9252) [Link]

It makes sense when the kernel is only part of the package. "~500Mb for full history" may not seem like much, but take 500Mb for the kernel, 600Mb for gcc, 100Mb for binutils, etc. and you soon arrive at gigabytes.

And if you throw away the history of other projects to save space, then why would you want to give the kernel special treatment?

The git tree is the source code

Posted Nov 12, 2012 17:13 UTC (Mon) by peter.todd (subscriber, #63121) [Link]

Git offers a method to do a git checkout without getting the full history, known as a shallow clone. It's fast and allows you to get the rest of the history later if you need to merge the tree. Secondly, if you haven't been importing binary files into your repository, even years' worth of changes tend to compress very effectively, to the point where the space taken up by old revisions is a small multiple of the most recent revision.
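As a rough illustration of the shallow clone described above (a sketch on a throwaway repository, assuming `git` is installed; the `upstream` and `shallow` names and the three empty commits are hypothetical), `git clone --depth 1` fetches only the newest commit and `git fetch --unshallow` later retrieves the rest of the history:

```shell
set -e
work=$(mktemp -d)
cd "$work"

# Build a toy upstream repository with three commits.
git init -q upstream
for n in 1 2 3; do
    git -C upstream -c user.name=demo -c user.email=demo@example.com \
        commit -q --allow-empty -m "change $n"
done

# Shallow clone: only the most recent commit is transferred.
git clone -q --depth 1 "file://$work/upstream" shallow
shallow_count=$(git -C shallow rev-list --count HEAD)

# Later, fetch the remaining history into the same clone.
git -C shallow fetch -q --unshallow
full_count=$(git -C shallow rev-list --count HEAD)

echo "shallow: $shallow_count commit(s), after --unshallow: $full_count"
```

The `file://` URL matters here: cloning from a plain local path ignores `--depth`, so the sketch uses the URL form to get a genuinely shallow clone.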

I find it hard to believe that arguing tarballs are the preferred form of source code for a project otherwise developed with git would pass the balance of probabilities test in a civil case. After all, as mentioned elsewhere the other side just needs to ask your developers what they use to work with the source code. Even if they reply with a different revision control system, that indicates that you should be publishing your changes from it instead.

Having said that, if internally you *actually* don't use *any* revision control system, then yeah, maybe for you the preferred form really is tarballs.

The git tree is the source code

Posted Nov 12, 2012 19:34 UTC (Mon) by tzafrir (subscriber, #11501) [Link]

A shallow clone removes all the history from a project. I suspect you'll find it hard to believe that arguing shallow clones are the preferred form of source code for a project otherwise developed with git would pass the balance of probabilities test in a civil case.

The git tree is the source code

Posted Nov 12, 2012 21:43 UTC (Mon) by viro (subscriber, #7872) [Link]

FWIW, I don't believe that such a requirement would fly in a court; at least not for quite a few years. That said, size arguments are BS:
$ du -s .
652664 .
$ du -s .git
118828 .git
and that - on a tree with alternates pointing to straight mirror of kernel.org linux-2.6.git (which obviously doesn't need to be distributed). Note that unpacked kernel source eats about 4.5 times more than everything in .git. IOW, it's really noise. Granted, that's unpacked (i.e. working tree, not package being distributed), but packed (tar.bz2) will give only ~2 times increase compared to that of source without history - more than 1.2, but not very much more.
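The "about 4.5 times" figure can be checked from the two `du` numbers above. This sketch assumes that `du -s` reported 1 KiB blocks (the GNU default) and that the first figure includes the `.git` directory; the variable names are illustrative only.

```shell
total_kb=652664   # du -s .     (working tree plus .git)
git_kb=118828     # du -s .git  (history, with alternates to mainline)

# Size of the unpacked source alone, excluding .git.
tree_kb=$((total_kb - git_kb))

# Ratio of unpacked source to everything stored in .git.
ratio=$(awk -v t="$tree_kb" -v g="$git_kb" 'BEGIN { printf "%.2f", t / g }')

echo "source without .git: ${tree_kb} KB, ratio to .git: ${ratio}"
```

Under those assumptions the unpacked source comes to 533836 KB, roughly 4.49 times the size of `.git`.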

_Legally_ git doesn't qualify as "preferred source", but for all practical purposes it is strongly preferred as far as kernel work is concerned. gcc is a different story - they prefer suckversion and _that_ is a space hog, indeed. binutils... IIRC, they also use svn these days.

I don't know how to express that in a license without running into insane corners, like "you must never rebase / cherry-pick / fold incremental fixes". On the other hand, there's the patently obnoxious behaviour several groups used to demonstrate - once a year or so they ran diff between the mainline and whatever had been in their CVS tree and posted megabytes of non-differentiated garbage to e.g. l-k, usually reverting a bunch of fixes in the process. Hell knows... TBH, I doubt that a license is the right tool here...

The git tree is the source code

Posted Nov 13, 2012 12:12 UTC (Tue) by jwakely (subscriber, #60262) [Link]

> gcc is a different story - they prefer suckversion

It's true the repo lives in subversion, but some (many?) of us use git-svn to work with it. I certainly don't prefer subversion and can't remember the last time I used subversion to commit anything.

The git tree is the source code

Posted Nov 10, 2012 14:00 UTC (Sat) by Otus (guest, #67685) [Link]

> If the workflow of programmers is heavily based on git, then the preferred
> form of the work is the git tree and that is what the GPL requires Red Hat
> (and others) to distribute.

The problem I (IANAL) see with that argument is that the GPL allows you to
keep undistributed versions private. Those intermediate versions in the git
tree were not distributed, and could in theory even include upcoming
features that were later taken out and still count as trade secrets.

The git tree is the source code

Posted Nov 10, 2012 14:14 UTC (Sat) by pbonzini (subscriber, #60935) [Link]

The keyword here is *modification*. You do not need patches to modify the Red Hat tree.

You need patches to analyze a process (the creation of the Red Hat kernel) and replicate the same changes to *another* program (the Oracle kernel).

I see it the other way round: the actual preferred form of modification is flattened source code. To modify something you work on a checkout, not on a source tarball + some patches. "Source code + patches" is only acceptable because it is easily flattened.

Think of it. Suppose you have a bug in a Fedora package and you found that there is a fix upstream for it, for example via a Bugzilla search. Roughly speaking, Fedora packages are distributed as a tarball and a possibly empty set of patches.

The first things you do in order to test the fix are "fedpkg prep" to flatten the Fedora package and "git clone" to fetch the upstream change. You do not need the patches that build up the Fedora package (which is what you're modifying), but you need the patches that build up upstream (which is what you're analyzing).

The git tree is the source code

Posted Nov 10, 2012 16:50 UTC (Sat) by epa (subscriber, #39769) [Link]

> I see it the other way round: the actual preferred form of modification is flattened source code.

There's some merit in that argument. Even when people do fetch a git tree to modify the code, they don't usually look at the history of past patches but just work on their change based on the state of the code as it is now.

The git tree is the source code

Posted Nov 10, 2012 21:48 UTC (Sat) by apoelstra (subscriber, #75205) [Link]

>There's some merit in that argument. Even when people do fetch a git tree to modify the code, they don't usually look at the history of past patches but just work on their change based on the state of the code as it is now.

When I make changes to code, it's usually because

(a) I am a project developer, and I need git to keep track of all the work I'm doing, or
(b) I am not a project developer, so I'm just doing a small bugfix. Then I'd like 'git log' or 'git blame' to find out where the bug came from.

The git tree is the source code

Posted Nov 15, 2012 12:08 UTC (Thu) by pbonzini (subscriber, #60935) [Link]

> (a) I am a project developer, and I need git to keep track of all the work I'm doing, or

Just use quilt, you do not need the history.

(Devil's advocate, of course---nowadays I'm spoiled by the ease of doing "git init", but I was doing the above a lot in CVS/svn days).

> (b) I am not a project developer, so I'm just doing a small bugfix. Then I'd like 'git log' or 'git blame' to find out where the bug came from.

Nice to have, but definitely not a requirement for modification.

(Not devil's advocate here, I rarely do "git clone" if I'm just fixing something for myself. I just work on top of the Fedora package).

The git tree is the source code

Posted Nov 11, 2012 11:34 UTC (Sun) by cyanit (guest, #86671) [Link]

Well, some kinds of modifications like removing or altering a specific set of changes are much better done on the git tree, and the original work (Linux) is also provided as a git tree.

It seems to me they might very well be in violation of the GPL.

Copyright © 2013, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds