From: Alexandre Oliva <lxoliva-AT-fsfla.org>
To: git-AT-vger.kernel.org
Subject: rebase parents, or tracking upstream but removing non-distributable bits
Date: Thu, 30 Dec 2010 15:54:29 -0200
Say the git repository of a project I use (with changes) in another
project I work on contains portions that I oughtn't distribute. Say,
portions that are illegal, immoral or too risky in my jurisdiction:
patented stuff that lawyers say I should not distribute in any way,
unauthorized or otherwise copyright-infringing bits, text or pictures
that are offensive or even illegal to publish, i.e., stuff that I must
not be caught distributing and that, ideally, I could arrange to not
If you guessed that my primary reason for wanting this is the non-Free
Software in the Linux git repository, you got it right :-) Anyhow,
regardless of your opinion as to my stance in this matter, I hope you'll
agree that the scenarios above are relevant and worth supporting. Heck,
even a business that decides to remove all traces of a feature that was
planned for a certain release, but was then pushed back to a later
release, could benefit from this.
Note that simply reverting/removing these bits from the head of a branch
wouldn't be enough: since the repository carries the entire history,
pushing the head of the branch to my public repository would amount to
publishing the bits I must not publish.
I need to be able to maintain and publish a modified repository that
filters out the unwanted portions, while still being able to pull
changes from the upstream repository. Desirable, but not strictly
necessary, is the possibility of letting upstream pull my improvements
without bringing in the changes I made to remove the bits I'm not
supposed to distribute.
Given this problem statement, I started looking for solutions that
didn't require modifying git.
I first looked into rewriting history, removing the unwanted bits and
replaying subsequent changes, but quickly discarded it, for it would
make my local repository incompatible with upstream both ways: I
wouldn't be able to pull from it; upstream wouldn't be able to pull to
it; third parties would run into ugly situations trying to carry patches
from either one to the other.
Now, it looks like I might be able to pull from upstream if I manually
maintained a graft file naming each upstream commit as an additional
parent of the corresponding local rebased commit that brought it into my
rewritten tree. Workable, maybe, but this wouldn't help third parties
that use my public repository.
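To make the graft idea concrete, here is a minimal Python sketch that generates grafts-file lines from two mappings the maintainer would have to keep by hand. `graft_lines`, `real_parents` and `upstream_for` are hypothetical names, and the commit ids are placeholders rather than real 40-hex-digit object names.

```python
# Hypothetical helper: emit lines for .git/info/grafts that add each
# upstream commit as an extra parent of the local rebased commit that
# brought it in. A grafts line is "<commit> <parent> <parent> ...", and it
# replaces the commit's whole parent list, so the real parents are restated.

def graft_lines(real_parents, upstream_for):
    """real_parents: local commit id -> list of its actual parent ids.
    upstream_for: local rebased commit id -> matching upstream commit id."""
    lines = []
    for local, upstream in sorted(upstream_for.items()):
        parents = list(real_parents.get(local, []))
        if upstream not in parents:
            parents.append(upstream)  # the extra, upstream-side parent
        lines.append(" ".join([local] + parents))
    return lines


if __name__ == "__main__":
    # Placeholder ids; real grafts use full commit ids.
    real = {"c2": ["c1"], "c3": ["c2"]}
    upstream = {"c2": "u2", "c3": "u3"}
    print("\n".join(graft_lines(real, upstream)))
```

Because a grafts line overrides the parents recorded in the commit, forgetting to restate them would hide the local history from git's view, which is part of what makes keeping such a file by hand fragile.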
Besides, I'm concerned that pushing from the local repository (with the
graft file) to the public repository would end up publishing the changes
I'm not supposed to distribute, because they'd be taken as parents of
the local commits.
Are there any other ways to support the desired features with git as-is?
AFAICT, there aren't, so I've been thinking of how to add such support.
I suppose the simplest way to accomplish this is to introduce the notion
of a "weak parent": one that is taken into account when checking
whether a commit is already present in a branch being merged or rebased
into, but that is not transmitted over pushes, not retained over purges,
and not complained about when missing.
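The proposed semantics can be modelled with a single graph walk whose only switch is whether weak parents are followed. In this Python sketch (hypothetical names throughout; it does not use git's real data structures or APIs), a merge or rebase asks whether a commit is already contained and follows weak parents too, while a push follows strong parents only, so the filtered-out upstream commit never leaves the repository.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Commit:
    # Hypothetical model: strong parents are ordinary history; weak parents
    # only record "this commit stands in for that one".
    parents: list = field(default_factory=list)
    weak_parents: list = field(default_factory=list)


def _reachable(graph, tip, follow_weak):
    """Breadth-first walk from tip over a dict of id -> Commit."""
    seen, queue = set(), deque([tip])
    while queue:
        cid = queue.popleft()
        if cid in seen:
            continue
        seen.add(cid)
        commit = graph.get(cid)
        if commit is None:
            # Not present locally (e.g. a weak parent that was never
            # fetched or was purged): record the id, don't complain.
            continue
        queue.extend(commit.parents)
        if follow_weak:
            queue.extend(commit.weak_parents)
    return seen


def contains(graph, tip, cid):
    """Merge/rebase question: is cid already in the branch at tip?"""
    return cid in _reachable(graph, tip, follow_weak=True)


def objects_to_push(graph, tip):
    """Push question: only strong history that exists locally."""
    return {c for c in _reachable(graph, tip, follow_weak=False) if c in graph}


if __name__ == "__main__":
    graph = {
        "U1": Commit(),
        "U2": Commit(parents=["U1"]),  # upstream commit with unwanted bits
        "L2": Commit(parents=["U1"], weak_parents=["U2"]),  # filtered U2
    }
    print(contains(graph, "L2", "U2"))           # True
    print(sorted(objects_to_push(graph, "L2")))  # ['L2', 'U1']
```

Here `L2` counts as already containing `U2` for merge purposes, yet pushing `L2` transmits only `L2` and `U1`.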
I'm under the impression that this would not only work, but also make
rebasing in general (especially the hard case) far less problematic, for
git would be able to relate a rebased commit with an original commit.
Now, assuming I'm correct in this assessment, there are two questions
- how to represent this?
I thought of changing the commit object format so as to somehow mark the
weak parents, say, with an additional character on the same line:
parent f00ba5... W
or an alternate header:
or even an additional line:
As for backward compatibility, only the last form seems to stand any
chance of being parsed properly, and then only if the weak notes are
added at the end of the commit object.
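For the first representation, a reader only needs to look at what follows the hash on each parent line. The Python sketch below (a hypothetical `parse_parents`, operating on a commit object's text with placeholder ids) separates strong from weak parents and treats anything else after the hash as an error. It says nothing about how git's own parser would handle such lines.

```python
def parse_parents(raw_commit):
    """Split the header of a commit object (given as text) into strong and
    weak parents, assuming the hypothetical 'parent <id> W' marking."""
    strong, weak = [], []
    header = raw_commit.split("\n\n", 1)[0]  # header ends at first blank line
    for line in header.split("\n"):
        fields = line.split(" ")
        if fields[0] != "parent":
            continue
        if len(fields) == 2:
            strong.append(fields[1])
        elif len(fields) == 3 and fields[2] == "W":
            weak.append(fields[1])
        else:
            raise ValueError("malformed parent line: " + line)
    return strong, weak


if __name__ == "__main__":
    # Placeholder ids and identity instead of real object names.
    raw = (
        "tree 1111\n"
        "parent aaaa\n"
        "parent f00ba5 W\n"
        "author A U Thor <author@example.org> 0 +0000\n"
        "committer A U Thor <author@example.org> 0 +0000\n"
        "\n"
        "Rebased commit message\n"
    )
    print(parse_parents(raw))  # (['aaaa'], ['f00ba5'])
```

The alternate-header and extra-line forms would each need their own keyword, which is left open here.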
Another possibility is to create another kind of object that names an
original commit and its rebased counterpart and that, like a tag object,
would be (optionally?) transmitted when the (rebased) commit it names is
transmitted. This could be more interesting, in that it might enable
all traces of a rebase to be eventually removed. A (named?) object that
names multiple such pairs of commits might make even more sense to this
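A side table of (original, rebased) pairs can answer the same "is this upstream commit already here?" question without touching the commits themselves. In this Python sketch the pair object is just a dict, and `missing_upstream` is a hypothetical name; it returns the upstream commits that a `git pull --rebase`-style update would still need to replay.

```python
def missing_upstream(upstream_commits, local_commits, rebase_map):
    """Upstream commits not yet in the local branch.

    upstream_commits: upstream ids, oldest first.
    local_commits: ids reachable from the local branch tip.
    rebase_map: hypothetical side object, original id -> rebased id.
    """
    local = set(local_commits)
    return [
        c for c in upstream_commits
        # present directly, or through its recorded rebased counterpart
        if c not in local and rebase_map.get(c) not in local
    ]


if __name__ == "__main__":
    print(missing_upstream(["U1", "U2", "U3"], {"U1", "L2"}, {"U2": "L2"}))
    # ['U3']
```

Since the pairs live outside the commits, discarding the map later would drop the record of the rebase without rewriting any commit.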
Am I on the right track? Any thoughts, preferences, suggestions,
concerns, recommendations, advice, pointers or gotchas to watch out for
before I start implementing any of these possibilities?
I realize that, although this option could make "git pull --rebase" work
to track upstream in the rebased branch, and would enable me to publish
the repository with the rebased branch without the pieces I shouldn't
distribute, I'm not sure this would enable upstream to easily integrate
my changes. Or would it?
Thanks in advance,
I'm not subscribed, but I'm going to look for replies in the archives.
That said, I'd appreciate it if you'd explicitly copy me on any follow-ups.
(Mail-Followup-To: set accordingly)
Last but not least: Happy GNU Year! :-)
Alexandre Oliva, freedom fighter http://FSFLA.org/~lxoliva/
You must be the change you wish to see in the world. -- Gandhi
Be Free! -- http://FSFLA.org/ FSF Latin America board member
Free Software Evangelist Red Hat Brazil Compiler Engineer