Merge Commits

Posted Oct 12, 2010 19:40 UTC (Tue) by mingo (subscriber, #31122)
In reply to: Merge Commits by daglwn
Parent article: Lessons from PostgreSQL's Git transition

> ... We still allow any developers (and committers) to use whatever parts of git they want as they develop, but for commits going into the main tree, we are making a number of restrictions ... We will not allow merge commits ... We will not use the author field in git to tag it with the patches original author ... we will require that author and committer are always set to the same thing, and we will then credit the author(s) (along with the reviewer(s)) in the commit message ...

Ouch, indeed this looks like a broken Git workflow.

The 'cannot review patches' argument appears to be a bad excuse - trees with merge commits are just as easy to review as linear trees. (In fact often they are easier to review as they show the natural progress of a feature instead of some artificial after-the-fact representation of it. True history is also easier to debug and bisect, etc.)

From this list alone it appears to me that someone is trying to keep central control/power, and got surprised during the Git conversion that a distributed SCM works against that.

They won't enjoy the full power of Git unless they start handling their contributors as equals and allow them to become sub-maintainers - with merge commits, true history, etc.



Merge Commits

Posted Oct 12, 2010 19:48 UTC (Tue) by dmk (subscriber, #50141)

Give them time. They first have to de-cvs themselves...

Merge Commits

Posted Oct 12, 2010 19:52 UTC (Tue) by corbet (editor, #1)

That's my impression too, after having questioned this on their mailing list a couple of months or so ago. I think it's mostly a matter of not wanting to change too many things at once. The tool change is now done; one assumes that the workflow changes will come in their own time.

Merge Commits

Posted Oct 13, 2010 2:54 UTC (Wed) by njs (guest, #40338)

Heck, I *like* it when my RDBMS developers are super-conservative and risk-averse...

Merge Commits

Posted Oct 13, 2010 6:46 UTC (Wed) by mingo (subscriber, #31122)

I have no problem with RDBMS developers being conservative and progressing slowly and meticulously.

We should be careful to not base that kind of healthy conservatism on misunderstandings though - and the merge arguments seem to stem from misunderstandings of Git workflows.

In any case i'd like to congratulate the PostgreSQL project for making the difficult transition to Git - i don't think they will regret it! :-)

Merge Commits

Posted Oct 15, 2010 7:22 UTC (Fri) by dark (guest, #8483)

No, no, an essential part of conservatism is the assumption that new things are not yet fully understood. If they (as a project) have misunderstandings about Git workflows then that is an excellent reason to stick with their current workflows for now.

Merge Commits

Posted Oct 15, 2010 7:52 UTC (Fri) by mingo (subscriber, #31122)

> No, no, an essential part of conservatism is the assumption that new things are not yet fully understood. If they (as a project) have misunderstandings about Git workflows then that is an excellent reason to stick with their current workflows for now.

Saying that "we are sticking with our existing workflow because we don't understand the Git workflow yet" is of course fine and is a valid approach, but that is not what they did: instead they explicitly claimed things about the Git workflow that are simply not true, and justified their steps with those (incorrect) assumptions.

Claiming/believing things that are not true is obviously not a productive element of 'conservatism'.

Merge Commits

Posted Oct 13, 2010 2:32 UTC (Wed) by yarikoptic (subscriber, #36795)

Yeap... and then some day they will discover

git diff branch1...branch2
and
git log branch1..branch2

and why those two are of the "same kind", whereas

git diff branch1..branch2
git log branch1..branch2

are not quite brothers ;-)
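
To make the difference concrete, here is a minimal sketch in a throwaway repository. The branch names follow the commands above; the file names, commit messages and the temporary directory are assumptions made up for illustration.

```shell
#!/bin/sh
# Throwaway repository; all file names and commit messages are illustrative.
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q .
git config user.email "demo@example.com"
git config user.name "Demo"

echo base > base.txt
git add base.txt
git commit -q -m "base"

# Both branches fork from the same base commit.
git checkout -q -b branch1
git checkout -q -b branch2

echo two > two.txt
git add two.txt
git commit -q -m "work on branch2"

git checkout -q branch1
echo one > one.txt
git add one.txt
git commit -q -m "work on branch1"

# log A..B: commits reachable from branch2 but not from branch1.
log_subjects=$(git log --format=%s branch1..branch2)

# diff A...B: from the merge base to branch2, i.e. only branch2's own work.
three_dot=$(git diff --name-only branch1...branch2)

# diff A..B: between the two branch tips, so branch1's work shows up as well.
two_dot=$(git diff --name-only branch1..branch2)

echo "git log  branch1..branch2 : $log_subjects"
echo "git diff branch1...branch2:" $three_dot
echo "git diff branch1..branch2 :" $two_dot
```

In this sketch the three-dot diff touches only two.txt, which matches the single commit that the two-dot log reports, while the two-dot diff also lists one.txt because it compares the branch tips directly.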

Merge Commits

Posted Oct 12, 2010 20:30 UTC (Tue) by jberkus (subscriber, #55561)

As I said elsewhere in the article, the goal was to NOT change the current workflow for patch approval. Keep in mind that the PostgreSQL project has a smaller ecosystem than Linux; there are only 26 committers and around 100 active major contributors, so the centralization is not considered a problem.

More importantly, several committers did not even try Git until the migration happened. Before we can even consider changing workflows, all contributors will need to be comfortable with Git. That's at least 6 months off, which really means the 9.2 development cycle *at the soonest*.

The changes to the Commitfests don't need to be dramatic; in a lot of ways, linking to a git snapshot would be much easier than the current e-mail-and-link-to-archive method. However, when you have people who have 14 years of experience reviewing context-diff patches for the project, they're not going to adjust quickly to another method. And there's no reason to make them adjust quickly, either.

Merge Commits

Posted Oct 12, 2010 20:50 UTC (Tue) by daglwn (guest, #65432)

> And there's no reason to make them adjust quickly, either.

Oh, but there is. As others have pointed out, a lot of tools rely on various git conventions. It's easy enough to create a context diff from git. That seems like a separate issue from how the merge is actually done. I don't see any reason not to use git's merge power to make life so much easier.
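
As a rough sketch of the context-diff point: git's own diff output is unified, but git difftool can hand each changed file pair to the system diff in context format. The repository, file name and contents below are made up for illustration, and the sketch assumes the installed diff(1) supports the -c option.

```shell
#!/bin/sh
# Throwaway repository; notes.txt and its contents are illustrative.
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q .
git config user.email "demo@example.com"
git config user.name "Demo"

printf 'alpha\nbeta\ngamma\n' > notes.txt
git add notes.txt
git commit -q -m "first version"

printf 'alpha\nBETA\ngamma\n' > notes.txt
git commit -q -a -m "second version"

# --extcmd runs the given command on each pair of old/new files;
# --no-prompt skips the per-file confirmation. "diff -c" produces
# the classic context format instead of a unified diff.
context_diff=$(git difftool --no-prompt --extcmd='diff -c' HEAD~1 HEAD)
printf '%s\n' "$context_diff"
```

The file headers in the output name temporary copies rather than the tracked path, so this is meant for reading a change, not necessarily for producing a patch to apply.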

Merge Commits

Posted Oct 13, 2010 11:55 UTC (Wed) by marcH (subscriber, #57642)

> As I said elsewhere in the article, the goal was to NOT change the current workflow for patch approval.

Considering all the problems you have been through, this sounds more than reasonable. One thing at a time. Moreover, time is on your side now.

Thanks for a great article.

Merge Commits

Posted Oct 13, 2010 11:59 UTC (Wed) by marcH (subscriber, #57642)

> trees with merge commits are just as easy to review as linear trees. (In fact often they are easier to review as they show the natural progress of a feature instead of some artificial after-the-fact representation of it.

I do not see how a linear tree with serialized features is more difficult to review. AFAIK the only information lost is the concurrent progress of different features... how is this harmful?

> True history is also easier to debug and bisect, etc.

Please explain that too. An example maybe?

Merge Commits

Posted Oct 13, 2010 12:25 UTC (Wed) by mingo (subscriber, #31122)

> I do not see how a linear tree with serialized features is more difficult to review. AFAIK the only information lost is the concurrent progress of different features... how is this harmful?

I pull quite a few trees from sub-maintainers and I generally find non-rebased trees easier to review, for multiple reasons:

- The timeline is visible. Was the feature done on a single day? Done over several days, weeks or months? Which bit took the most time?
- Bugs are visible and give me, the maintainer, a way to see the natural stability of a feature (and its natural problem points) - helping me judge whether to merge something or not. If i see a tree that has been rebased on the day it got sent to me i lose this kind of info.
- Progression of the feature is more visible: it usually starts with a 'baby feature' commit, then progresses towards maturity.
- I can pull something that i know is old enough and is reasonably stable - looking at the timestamps. With a rebased tree you never really know. It might be fine - or not.

So as long as a tree is maintained in an orderly fashion (i.e. it does not have messy changelogs and messy bugs and messy merges, etc.) a tree with true history is much more valuable to maintainers (and future reviewers/bugfinders) than a rebased tree.

> > True history is also easier to debug and bisect, etc.
> Please explain that too. An example maybe?

Trees with true history tend to follow development practices more closely, so they tend to be more fine-grained. Those kinds of trees are easier to bisect - even though they will also have 'broken' commits in them, with live bugs.
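
A minimal sketch of what bisecting a fine-grained history looks like, using a throwaway repository. The eight commits, the bug.marker file that stands in for a regression, and the test command are all hypothetical.

```shell
#!/bin/sh
# Throwaway repository with eight small commits; the fifth one
# introduces a marker file that stands in for a real bug.
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q .
git config user.email "demo@example.com"
git config user.name "Demo"

i=1
while [ "$i" -le 8 ]; do
    echo "step $i" >> history.txt
    if [ "$i" -eq 5 ]; then
        echo "regression" > bug.marker
    fi
    git add -A
    git commit -q -m "step $i"
    if [ "$i" -eq 1 ]; then good=$(git rev-parse HEAD); fi
    if [ "$i" -eq 5 ]; then culprit=$(git rev-parse HEAD); fi
    i=$((i + 1))
done

# Mark the newest commit as bad and the first commit as good, then let
# git drive the search: the command exits 0 for good, non-zero for bad.
git bisect start HEAD "$good" >/dev/null
git bisect run sh -c 'test ! -f bug.marker' >/dev/null

# While the bisection is active, refs/bisect/bad names the first bad commit.
found=$(git rev-parse refs/bisect/bad)
found_subject=$(git log -1 --format=%s "$found")
git bisect reset >/dev/null 2>&1

echo "first bad commit: $found_subject"
```

The smaller each commit is, the smaller the change that bisection finally points at, which is the practical benefit being described here.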

Also, another problem we saw with rebases in the Linux kernel was trees being rebased to some new base - triggering new, not-thought-of-before interactions - or being based on a buggy new base.

On the other hand, real-history trees tend to be based on something stable that works fine for a group of developers for a longer period of time. That's a pretty good practical guarantee.

To put it in a different way: true-history trees tend to be done 'defensively', with every commit having a real meaning and having some real testing - because this is the tree that the developers worked on for a long time. (I don't pull from people who don't have clean maintenance practices)

A rebased/linearized tree tends to be done with the knowledge that the end-result tested out fine - and often the intermediate steps are not reliable or suffer bit-rot due to the rebase. We've had many problems with that in the Linux kernel. A rebase is a 'risks every single commit' kind of global operation, with associated global risks.

So, in theory you are right, a linear tree can be better than a true-history tree - simply because any problem of a true-history tree can be eliminated via a rebase.

In practice though, after years of experience with them in the Linux kernel context, they are markedly worse.

YMMV. With a small enough project you can use just about any workflow with Git and not feel any pain really. But if you want your project to grow (and i'm sure most of us want to see PostgreSQL grow) then sooner or later 'Git best practices' need to be considered.

IMO the two best Git workflows in existence are those of the Git project itself and the Linux kernel.


Copyright © 2017, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds