From: Alexandre Oliva <lxoliva-AT-fsfla.org>
Subject: rebase parents, or tracking upstream but removing non-distributable bits
Date: Thu, 30 Dec 2010 15:54:29 -0200

Say the git repository of a project I use (with changes) in another project I work on contains portions that I oughtn't distribute. Say, portions that are illegal, immoral or too risky in my jurisdiction: patented stuff that lawyers say I should not distribute in any way, unauthorized or otherwise copyright-infringing bits, text or pictures that are offensive or even illegal to publish, i.e., stuff that I must not be caught distributing and that, ideally, I could arrange to not even possess.

If you guessed that my primary reason to want this is the non-Free Software in the Linux git repository, you got it right :-)

Anyhow, regardless of your opinion as to my stance in this matter, I hope you'll agree that the scenarios above are relevant and desirable. Heck, even a business that decides to remove all traces of a feature that was planned for a certain release, but that is pushed back to a later release, could benefit from this.

Note that simply reverting/removing these bits from the head of a branch wouldn't be enough: since the repository carries the entire history, pushing the head of the branch to my public repository would amount to publishing the bits I must not publish.

I need to be able to maintain and publish a modified repository that filters out the unwanted portions, but still be able to pull changes from the upstream repository.

Desirable, but not strictly necessary, is the possibility of letting upstream pull my improvements, without bringing in the changes I made to remove the bits I'm not supposed to distribute.

Given this problem statement, I started looking for solutions that didn't require modifying git.
I first looked into rewriting history, removing the unwanted bits and replaying subsequent changes, but quickly discarded it, for it would make my local repository incompatible with upstream both ways: I wouldn't be able to pull from upstream; upstream wouldn't be able to pull from mine; third parties would run into ugly situations trying to carry patches from either one to the other.

Now, it looks like I might be able to pull from upstream if I manually maintained a graft file naming each upstream commit as an additional parent of the corresponding local rebase commit that brought it into my rewritten tree. Workable, maybe, but this wouldn't help third parties that used my public repository. Besides, I'm concerned that pushing from the local repository (with the graft file) to the public repository would end up publishing the changes I'm not supposed to distribute, because they'd be taken as parents of the local commits.

Are there any other ways to support the desired features with git as-is? AFAICT, there aren't, so I've been thinking of how to introduce this.

I suppose the simplest way to accomplish this is to introduce the notion of a "weak parent": one that is taken into account for purposes of checking whether a commit is present in a branch being merged or rebased into, but that is not transmitted over pushes, is not retained over purges, and is not complained about when missing.

I'm under the impression that this would not only work, but also make rebasing in general (especially the hard case) far less problematic, for git would be able to relate a rebased commit with an original commit.

Now, assuming I'm correct in this assessment, there are two questions that remain:

- how to represent this?

I thought of changing the commit object format so as to somehow mark the weak parents, say, with an additional character on the same line:

  parent f00ba5... W

an alternate header:

  wparent f00ba5...

or even an additional line:

  parent f00ba5...
  ...
  weak f00ba5...
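
To make the last form a bit more concrete, here is a minimal Python sketch of how a reader of commit objects might treat such lines. It is only an illustration under stated assumptions: the `weak` header is hypothetical and is not understood by git, the sketch assumes the `weak` lines sit in the header section (before the blank line that precedes the message), and the names `parse_commit`, `is_integrated` and `ids_to_transmit` as well as the all-ones/all-twos commit ids are made up for the example.

```python
from dataclasses import dataclass, field


@dataclass
class CommitHeaders:
    tree: str = ""
    parents: list = field(default_factory=list)       # ordinary parents
    weak_parents: list = field(default_factory=list)  # hypothetical "weak" lines
    message: str = ""


def parse_commit(text):
    """Parse the text of a commit object, as printed by `git cat-file -p`."""
    head, _, message = text.partition("\n\n")
    commit = CommitHeaders(message=message)
    for line in head.splitlines():
        key, _, value = line.partition(" ")
        if key == "tree":
            commit.tree = value
        elif key == "parent":
            commit.parents.append(value)
        elif key == "weak":  # hypothetical header, not part of git
            commit.weak_parents.append(value)
        # author, committer and other headers are ignored in this sketch
    return commit


def is_integrated(commit_id, branch_commits):
    """True if commit_id is in the branch itself, or if some commit in the
    branch names it as a weak parent (i.e. is a rebased copy of it)."""
    for cid, commit in branch_commits.items():
        if cid == commit_id or commit_id in commit.weak_parents:
            return True
    return False


def ids_to_transmit(commit):
    """Only ordinary parents would be followed on push; weak ones are not."""
    return list(commit.parents)


# Made-up sample: a rebased commit whose original is recorded as a weak parent.
SAMPLE = """tree 1111111111111111111111111111111111111111
parent 2222222222222222222222222222222222222222
weak 3333333333333333333333333333333333333333
author A U Thor <author@example.com> 1293731669 -0200
committer A U Thor <author@example.com> 1293731669 -0200

Rebased commit with the non-distributable bits removed
"""
```

With this sample, `parse_commit(SAMPLE)` records one ordinary parent and one weak parent; `is_integrated` treats the weak parent's id as already present in a branch containing the rebased commit, while `ids_to_transmit` leaves it out of what a push would follow.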

For some backward compatibility, it looks like only the last form would so much as stand a chance of being properly parsed, if the weak notes are added at the end of the object.

Another possibility is to create another kind of object, one that names an original and a rebased commit and that, like a tag object, would be (optionally?) transmitted when the (rebased) commit it names is transmitted. This could be more interesting, in that it might enable all traces of a rebase to be eventually removed. A (named?) object that names multiple such pairs of commits might make even more sense to this end.

Am I on the right track? Any thoughts, preferences, suggestions, concerns, recommendations, advice, pointers or gotchas to watch out for before I start implementing any of these possibilities?

Although this option could make "git pull --rebase" work to track upstream in the rebased branch, and would enable me to publish the repository with the rebased branch without the pieces I shouldn't distribute, I'm not sure it would enable upstream to easily integrate my changes. Or would it?

Thanks in advance,

I'm not subscribed, but I'm going to look for replies in the archives. That said, I'd appreciate it if you'd explicitly copy me on any follow-ups. (Mail-Followup-To: set accordingly)

Last but not least: Happy GNU Year! :-)

-- 
Alexandre Oliva, freedom fighter    http://FSFLA.org/~lxoliva/
You must be the change you wish to see in the world. -- Gandhi
Be Free! -- http://FSFLA.org/   FSF Latin America board member
Free Software Evangelist      Red Hat Brazil Compiler Engineer