Posted Oct 13, 2010 12:25 UTC (Wed) by mingo
In reply to: Merge Commits
Parent article: Lessons from PostgreSQL's Git transition
I do not see how a linear tree with serialized features is more difficult to review. AFAIK the only information lost is the concurrent progress of different features... how is this harmful?
I pull quite a few trees from sub-maintainers and I generally find non-rebased trees easier to review, for multiple reasons:
- The timeline is visible. Was the feature done on a single day? Done over several days, weeks or months? Which bit took the most time?
- Bugs are visible, which gives me, the maintainer, a way to see the natural stability of a feature (and its natural problem points), helping me judge whether to merge something or not. If I see a tree that has been rebased on the day it got sent to me, I lose this kind of information.
- Progression of the feature is more visible: it usually starts with a 'baby feature' commit, then matures from there.
- I can pull something that I know is old enough to be reasonably stable, judging by the timestamps. With a rebased tree you never really know. It might be fine - or not.
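The timestamp reasoning above can be sketched as a small heuristic. It relies on Git recording two dates per commit: by default, a rebase keeps each commit's author date but rewrites its committer date. A series whose author dates span weeks but whose committer dates all fall within the last few hours was therefore most likely just rebased. This is a minimal sketch, not a real tool: `summarize_history` is a hypothetical helper, and the one-day thresholds are arbitrary assumptions. It expects `(author_ts, committer_ts)` Unix timestamps, such as those printed by `git log --format='%at %ct'`.

```python
from datetime import datetime, timezone

SECONDS_PER_DAY = 86400

def summarize_history(commits):
    """Hypothetical helper: commits is a list of (author_ts, committer_ts)
    Unix timestamps. Reports how long the series took to develop and
    whether it looks recently rebased."""
    author_ts = [a for a, _ in commits]
    committer_ts = [c for _, c in commits]
    authored_span_days = (max(author_ts) - min(author_ts)) / SECONDS_PER_DAY
    committed_span_days = (max(committer_ts) - min(committer_ts)) / SECONDS_PER_DAY
    # By default a rebase rewrites committer dates but keeps author dates,
    # so a long authored span squeezed into a short committed span suggests
    # the whole series was rewritten recently (arbitrary 1-day threshold).
    looks_rebased = authored_span_days >= 1 and committed_span_days < 1
    newest_commit = datetime.fromtimestamp(max(committer_ts), tz=timezone.utc)
    return {
        "authored_span_days": authored_span_days,
        "committed_span_days": committed_span_days,
        "looks_rebased": looks_rebased,
        "newest_commit": newest_commit,
    }
```

This is only a heuristic: `git commit --amend` and cherry-picks also reset committer dates, and an author can backdate commits. Still, it shows which bit of information a same-day rebase throws away.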
So as long as a tree is maintained in an orderly fashion (i.e. it does not have messy changelogs and messy bugs and messy merges, etc.) a tree with true history is much more valuable to maintainers (and future reviewers/bugfinders) than a rebased tree.
Trees with true history tend to follow development practices more closely, so they tend to be more fine-grained. Those kinds of trees are easier to bisect - even though they will also contain 'broken' commits with live bugs in them.
True history is also easier to debug and bisect, etc.
Please explain that too. An example maybe?
Another problem we saw with rebases in the Linux kernel was trees rebased onto some new base, either triggering new, not-thought-of-before interactions or ending up on a buggy new base.
On the other hand, real-history trees tend to be based on something stable that works fine for a group of developers for a longer period of time. That's a pretty good practical guarantee.
To put it in a different way: true-history trees tend to be done 'defensively', with every commit having a real meaning and having some real testing - because this is the tree that the developers worked on for a long time. (I don't pull from people who don't have clean maintenance practices.)
A rebased/linearized tree tends to be done with the knowledge that the end-result tested out fine - and often the intermediate steps are not reliable or suffer bit-rot due to the rebase. We've had many problems with that in the Linux kernel. A rebase is a 'risks every single commit' kind of global operation, with associated global risks.
So, in theory you are right, a linear tree can be better than a true-history tree - simply because any problem of a true-history tree can be eliminated via a rebase.
In practice, though, after years of experience with rebased trees in the Linux kernel context, I find them markedly worse.
YMMV. With a small enough project you can use just about any workflow with Git and not feel any pain really. But if you want your project to grow (and I'm sure most of us want to see PostgreSQL grow) then sooner or later 'Git best practices' need to be considered.
In my opinion, the two best Git workflows in existence are those of the Git project itself and the Linux kernel.