Finding a patch's kernel version with git
Posted Jun 17, 2010 16:44 UTC (Thu) by dgm (subscriber, #49227)
Posted Jun 17, 2010 18:01 UTC (Thu) by iabervon (subscriber, #722)
In this particular case, however, I suspect that, if they'd been using a centralized system, it still would have looked like a fix was needed for earlier versions, but that impression would have been correct. A decentralized system makes it feasible to use a process which prevents bugs from affecting releases that one would expect them to affect. Even aside from getting the analysis right, it's hard to say that, when it looks like a bug affects several releases, you'd rather it actually affect all of them than turn out to have been excluded from them.
Posted Jun 18, 2010 9:53 UTC (Fri) by marcH (subscriber, #57642)
> Could you provide an example?
238 release_4 tag
345 bugfix commit
525 release_5 tag
From the above it is obvious that release 5 is fixed while release 4 is not.
> To the extent that, with a centralized system, developers can't do any development that would make the naive method wrong, that's true.
> A decentralized system makes it feasible to use a process which prevents bugs from affecting releases that one would expect them to affect. Even aside from getting the analysis right, it's hard to say that, when it looks like a bug affects several releases, you'd rather it actually affect all of them than turn out to have been excluded from them.
Posted Jun 18, 2010 11:11 UTC (Fri) by farnz (guest, #17727)
Except that your "obvious" is wrong.
Release 4 clearly can't contain commit 345, which occurred after release 4; but, thanks to branching, release 5 may also not contain commit 345. Consider the following timeline:
238 | TRUNK                 | release_4 tag
239 | TRUNK                 | Create release_4_maintenance branch
345 | release_4_maintenance | bugfix commit
525 | TRUNK                 | release_5 tag
It should be obvious that release 5 is not fixed, while some maintenance release of release 4 (say 4.1) is fixed. The apparent order in the commit ID has confused things, as while commit 345 happened before release 5 was tagged, it happened in a branch. I'm assuming here that thanks to human error, the developer forgot to ensure that the fix was present on TRUNK as well as the release_4_maintenance branch - this could happen if he's fixing it in a hurry because R4.1 is due out and this bug is important to a customer.
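The timeline above can be sketched in a few lines of Python. This is only an illustrative model, not how SVN works internally: the `Branch` class and both helper functions are hypothetical names, and merges are deliberately left out, so the forgotten forward port stays forgotten. It contrasts comparing bare revision numbers with asking which branch a revision lives on.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Branch:
    name: str
    parent: Optional["Branch"] = None  # branch this one was copied from
    branched_at: Optional[int] = None  # repository revision of that copy


def naive_contains(fix_rev: int, tag_rev: int) -> bool:
    """Compare bare revision numbers, ignoring branches entirely."""
    return fix_rev <= tag_rev


def branch_aware_contains(fix_rev: int, fix_branch: Branch,
                          tag_rev: int, tag_branch: Branch) -> bool:
    """Walk from the tagged branch back through its branch points.

    A revision counts as contained only if it was committed on the tagged
    branch, or on an ancestor branch no later than the branch point.
    Merges are not modelled in this sketch.
    """
    branch: Optional[Branch] = tag_branch
    limit = tag_rev
    while branch is not None:
        if branch.name == fix_branch.name:
            return fix_rev <= limit
        if branch.parent is None or branch.branched_at is None:
            return False
        limit = min(limit, branch.branched_at)
        branch = branch.parent
    return False


# The four-line timeline from the comment above.
trunk = Branch("TRUNK")
maint = Branch("release_4_maintenance", parent=trunk, branched_at=239)

naive = naive_contains(345, 525)                       # bugfix vs. release_5 tag
aware = branch_aware_contains(345, maint, 525, trunk)  # same question, with branches
```

Under these assumptions the naive comparison says release 5 contains the fix, while the branch-aware check says it does not, which is exactly the disagreement described in the comment.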
Posted Jun 18, 2010 13:26 UTC (Fri) by marcH (subscriber, #57642)
Posted Jun 18, 2010 13:34 UTC (Fri) by farnz (guest, #17727)
No, I read it, but I thought you were completely and utterly out of touch with reality. Everyone I know who uses source control has multiple branches; if nothing else, you branch when you tag so that you have somewhere to work on maintenance fixes separately from your feature improvements. All that the difficulty in merging that SVN and CVS impose (but other centralised systems don't - this isn't a natural advantage of DVCSes) does is ensure that I'm more likely to make the sort of human error that I mention in my post.
Clearly, you did not bother to read my post; you just saw a counterexample to your point, and decided to play hurt.
Posted Jun 20, 2010 7:23 UTC (Sun) by chad.netzer (✭ supporter ✭, #4257)
Posted Jun 20, 2010 21:00 UTC (Sun) by marcH (subscriber, #57642)
Please demonstrate it in more detail, thanks. Warning: you're not allowed completely artificial commands or workflows that a regular SVN user would never use in practice.
Posted Jun 21, 2010 8:11 UTC (Mon) by chad.netzer (✭ supporter ✭, #4257)
Any system of development with feature branches, release branches, and a trunk can allow this situation to occur, since not all features need be merged into trunk before a release. farnz's example was adequate; rev numbers themselves aren't an indicator of when and where a bug might have been merged into trunk. That's why nearly all modern systems have history visualization.
You were discussing "centralized systems", and made the claim "by making branching and merging expensive, you prevent yourself from going into a complicated situation where you cannot easily track commits any more." However, I was discussing SVN, and one of the features of SVN is that branches are cheap:
So, does your claim apply to SVN? If so, perhaps you can suggest a change to the example SVN workflow on Wikipedia, demonstrating "many branches", so that it can be improved.
Posted Jun 21, 2010 16:18 UTC (Mon) by marcH (subscriber, #57642)
This is about the implementation while I am talking about the workflow.
Please have a look at one of Linus' rants about centralized systems where he typically complains about the inconvenience of branching *and merging* in such systems.
Posted Jun 21, 2010 18:40 UTC (Mon) by chad.netzer (✭ supporter ✭, #4257)
If you mean that the common workflow in centralized systems does not allow long lived branches, and certainly not across releases, well I suppose I can accept that, but it is more a matter of discipline than central vs. distributed. Basically it means keeping non-mergeable feature branches out of the repository, or always rebasing them after a new release, so that branch rev ids are contained strictly between two releases. Neither option seems pleasant.
Posted Jun 20, 2010 20:56 UTC (Sun) by marcH (subscriber, #57642)
In a centralized system no one ever considers such a timeline which mixes different branches. Why would you shoot yourself in the foot like this? It would not make any sense.
The unfortunate side-effect of a decentralized system is that you cannot avoid considering such timelines, because of the incredible ease of branching and merging that you have there.
Posted Jun 20, 2010 21:35 UTC (Sun) by farnz (guest, #17727)
Consider the following arrangement (which isn't uncommon); development takes place on the main branch (trunk in SVN terminology). When a release is made, the release manager tags it with a tag like "release-4-0-0", and they also create a "maint-4-0" branch for bugfixes only. Development of new features continues apace on trunk, leading towards a 4.1 release (tag "release-4-1-0", branch "maint-4-1").
During a deployment of 4.0.0, a bug report comes in; as the sustaining engineer responsible for 4.0, I have a copy of the maint-4-0 branch checked out, as well as a copy of trunk. It's late on a Friday, but as the bug reporter is important, I diagnose the bug, discovering that it's an obscure interaction between something very specific in their configuration and our software. Because it's an obscure interaction, it's unlikely to be seen by anyone else, and automated testing won't catch it. However, it's a simple bug to fix, so I code the fix, check it in, and send the reporter a hotfix to test.
I duly note in the bug tracker that I've checked in a fix, SVN revision 155234. Process tells me that I need to forward port the fix to trunk as well as check it in to maintenance, but I'm now very tired, and don't trust myself to forward port it properly, so I leave it, fully intending to come back and do the rest of the job later.
Life intervenes, and I forget. The only formal record of the problem says it was fixed in revision 155234; my colleagues note that trunk is now at 156974, and assume that the fix is in trunk. Because we got ahead of plan on feature development, 4.1.0 is ready before we've accumulated enough minor bugs to make a 4.0.1 release worthwhile; we can tell from the revision number that my fix is after 4.0 was released, but before 4.1 was ready. The resulting assumption upsets the original bug reporter, because they saw a 4.0 release that failed, a quick fix that I said was checked in, and a 4.1 release that fails again in the same way.
And before you claim this is unrealistic, that no-one uses SVN like this, please be aware that I've not worked anywhere in the last 10 years that uses centralised VCS (not just SVN, but commercial VCS as well), and doesn't use it like this. Most large projects have a need for branches - if nothing else, one for development, one for maintenance, so that users can pick up just bug fixes to the latest release without picking up half-baked new features too.
Posted Jun 21, 2010 8:24 UTC (Mon) by marcH (subscriber, #57642)
This is the unrealistic part. The above assumes that you *and* your colleagues have never heard about branching in their entire life.
Everyone using SVN knows well that a revision number is meaningless without a branch. In the real (and centralized) world, you duly note in the bug tracker in which revision *and branch* you committed the fix. Even if you are clueless and forget the branch information, then your less incompetent colleagues will notice your mistake and find this information instantly in just one simple "svn log -r 155234" command: much easier than in a decentralized, branching-fest system.
Posted Jun 21, 2010 8:52 UTC (Mon) by farnz (guest, #17727)
Actually, this is based on real events - it does happen like this in the real world, because I've been there to experience it. My colleague noted the revision number in bugtracker, expecting it to be obvious and implicit that he was working on the maint branch, because he was the sustaining engineer for that version (as well as working on features for the new version). He forgot (human error, it happens) to forward port it, or to flag that he hadn't done so.
When the release manager looked to see if all bugs were fixed, they assumed (because this is true for most bugs in our bugtracker, and because we'd decided relatively early in the process to skip 4.0.1, so very little work had been done on maint - it was just bad luck that this was one of the first "in-the-field" bugs for that release) that the fixed-in revision was for trunk, and thus assumed that it was fixed in trunk but not maint.
Had we done a 4.0.1 release, this would have been caught, as the release manager would have noticed then that he had a bug fixed in maint but not on trunk, or in trunk but not maint (either way round, not acceptable). Had we not decided relatively early to skip 4.0.1, it would have been obvious that we had bugs from 4.0.0 which only had one fixed in revision, so that we couldn't have fixed them on both trunk and maint. However, most bugs from 4.0.0 were just fixed on trunk, because we knew we weren't going to do a 4.0.1.
Had the revision number not been so "obviously" ordered, we'd have not made the mistake either; CVS style revision numbers would have protected us, because it would have been clear that the fix was on a branch only, and DVCS-style commit hashes would have protected us, because we'd have to ask the tools to tell us which branches contained the fix. Indeed, had we not had "obviously" ordered revision numbers, the release manager would have used the tools we'd built on top of SVN to confirm that bug fixes were in trunk as well as maint - it was the combination of very few fixes actually made to maint, and the hassle of running the tool when it was "obvious" from revision numbers that the fix was in the release we were about to make that resulted in this bit of human error.
Of course, this takes us a long way from your claim that this is an advantage of CVCS - you had said that all I had to do was look at revision numbers; I've given you a counterexample from experience where the ordering of revision numbers resulted in human confusion, and you're saying that it couldn't have happened like that, because people don't make that sort of mistake.
Posted Jun 21, 2010 11:46 UTC (Mon) by marcH (subscriber, #57642)
Yes, all you have to do is to look at revision numbers *in the same branch* (otherwise the order is obviously meaningless). SVN makes this very easy whereas git does not (because branches are so much more flexible in git). That's all really.
The rest is just about the day you overlooked SVN branches, not very relevant to this discussion.
As long as you use "branchless" SVN revision numbers + make wild guesses about these numbers then you will hit the same problem again and again. Simply stop making guesses; how difficult is that? By the way your painfully long explanation above demonstrates that this type of mistake is not common, even for your team.
Posted Jun 21, 2010 12:11 UTC (Mon) by farnz (guest, #17727)
The problem is that because revision numbers are not per-branch in most CVCSes (CVS is one exception), it's far too easy to accidentally compare revision numbers in different branches, without realising that that's what you've done. You're never going to stop people throwing around branchless revision numbers (whether they be SVN repository versions, or git SHA tags), because they're a convenient shorthand and happen to work just fine most of the time.
Further, because it's "obvious" that r176594 came after r176593, it's all too easy to assume that whatever changed in r176593 is also changed in r176594. This isn't a safe assumption in a world with branches, but in a CVCS world, you normally get away without explicitly stating which branch a revision is on. Same applies in the DVCS world, but because there's no "obvious" order to DVCS commit IDs, people don't make assumptions when they see a bare revision number; they'll check one way or another.
The painfully long explanation is simply because you refused to believe that people did this sort of thing when I presented it without explanation. Now I've added an explanation, you're complaining that it's unrealistic because I've had to tell you in great detail just how it happens, since you didn't believe that it did happen at all unless you had it explained in great detail?!? Oh, and your initial example didn't include branch names as part of the revision ID, either; if you really expect that developers don't use bare revision IDs, but use the combination of branch name and revision ID, why didn't your example include them?
Posted Jun 21, 2010 16:12 UTC (Mon) by marcH (subscriber, #57642)
If you cannot see the contradiction above then there is not much that can be done.
> The painfully long explanation is simply because you refused to believe that people did this sort of thing when I presented it without explanation.
The only reason your explanation is long is because it has to enumerate an unlikely long chain of mistakes, oversights and clueless SVN users. Or are you seriously saying such branch confusion happens often at your place? And yet you are still using branchless numbers? Hard to believe.
I asked for more detail because your first "explanation" was just: "thanks to human error, the developer forgot to ensure that the fix was present on TRUNK". This did not explain anything.
> and your initial example didn't include branch names as part of the revision ID, either;[...] why didn't your example include them?
Because you were supposed to read the entire post, not just the beginning of it. OK: this second post of mine sucked. Believe it or not, but I am glad you pushed me into clarifying the whole thing.
Posted Jun 21, 2010 16:43 UTC (Mon) by bronson (subscriber, #4806)
Obviously he sees the contradiction -- he pointed it out. "You're never going to stop people throwing around branchless revision numbers" means that until 'svn st' includes the branch name so people will copy-n-paste both into an email in one swipe, svn users will tend to use branchless revision numbers even when they know better.
You both are doing a great job of slinging insults and quote mining, making very little forward progress on what is ultimately an absurdly simple concept.
Posted Jun 21, 2010 15:48 UTC (Mon) by anselm (subscriber, #2796)
The underlying problem here is that SVN, being conceptually a souped-up CVS, basically deals in »snapshots of trees«. A branch in SVN is essentially just another subdirectory inside the repository (alongside »trunk«), and snapshots of the whole repository tree get numbered consecutively no matter where any changes happen.
DVCSes like git, on the other hand, normally work in terms of »changesets«, i.e., atomic collections of patches to files in the repository. A branch is just a sequence of changesets that hasn't yet been applied to the main repository (the »trunk« if you will).
The advantage of the changeset-based approach is that it is a lot easier to tell whether a repository contains a given change(set), where with SVN the system needs to try to infer this after the fact from looking at the files in question – which is why DVCSes are generally much better at merging than SVN. If the previous poster had used a DVCS instead of SVN, it would have been very obvious that the 4.0.1 bug fix had not been reintroduced to the mainline (i.e., the changeset incorporating the fix had not been merged), where SVN, on the other hand, didn't really help with this. The idea of numbering revisions consecutively even if that doesn't mean anything in the context of a branched repository is not one of the high points of SVN's design.
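To make the "does this repository contain a given changeset" question concrete, here is a minimal Python sketch. It assumes history is stored as a mapping from each commit to its parents; the commit names and the `is_ancestor` helper are made up for illustration. It is roughly the question that `git merge-base --is-ancestor` answers, though this is not git's implementation.

```python
def is_ancestor(parents, commit, tip):
    """Return True if `commit` is reachable from `tip` by following parents."""
    seen = set()
    stack = [tip]
    while stack:
        cur = stack.pop()
        if cur == commit:
            return True
        if cur in seen:
            continue
        seen.add(cur)
        stack.extend(parents.get(cur, ()))
    return False


# Hypothetical history: the fix lives only on the maintenance branch.
history = {
    "release-4.0": [],
    "maint-fix": ["release-4.0"],   # bug fix committed on maint-4-0
    "feature": ["release-4.0"],     # new development on the mainline
    "release-4.1": ["feature"],
}

in_4_1 = is_ancestor(history, "maint-fix", "release-4.1")

# After merging the maintenance branch into the mainline, the answer changes.
merged = dict(history)
merged["merge"] = ["release-4.1", "maint-fix"]
in_merge = is_ancestor(merged, "maint-fix", "merge")
```

With this toy history the fix is not an ancestor of release-4.1 but is an ancestor of the later merge commit, and no numeric ordering of identifiers is involved in either answer.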
Posted Jun 18, 2010 16:40 UTC (Fri) by iabervon (subscriber, #722)
2.6.33+5 Commit that introduces a bug
2.6.35-rc1 Merge the buggy commit to mainline
In a centralized system, it would be obvious whether the bugfix would have to be backported to the 2.6.34 release; with a decentralized system, it requires inspecting the commit graph. However, with a centralized system, the bugfix would have to be backported to the 2.6.34 release, because the bug was in work that was committed before the 2.6.34 release; with a decentralized system, merging new development could be put off until after the release without making it harder to do the development, so the bug was not in the 2.6.34 release.
The unstated assumption in many comparisons is that, in a centralized system, the item labeled "2.6.33+5" above would have been where the item labeled "2.6.35-rc1" is, but this is unrealistic because it would require the developer to either wait until that time to write the code, or leave the change uncommitted and wait until "2.6.35-rc1" to do any other work. With the order of events fixed, it becomes clear that the choice is between thinking a release had a bug and being right, and thinking a release had a bug and being wrong.
As a general rule, bugfixes get considered for backporting at the point where they are merged into the mainline, and it's obvious at that point that no release contains them yet (you can't tell from the bugfix commit whether it missed a release, but you can tell from the pull request). The interesting question is when the bug was introduced, since that determines what backports are needed, and these backports go into patch release branches which have linear histories. When considering whether a fix is in 2.6.33.N and looking at the commit that fixes it for 2.6.33.*, you can just look at the commit dates.
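The difference between "committed before the release" and "contained in the release" can be sketched as follows. Everything here is assumed for illustration: the commit names, the dates, and the shape of the graph are invented to match the scenario described above (a buggy commit written shortly after 2.6.33 but merged only for 2.6.35-rc1), and `reachable` and `releases_containing` are hypothetical helpers.

```python
from datetime import date

# Hypothetical history: each commit maps to its parents.
PARENTS = {
    "v2.6.33": [],
    "buggy": ["v2.6.33"],                 # written on a topic branch
    "mainline-work": ["v2.6.33"],
    "v2.6.34": ["mainline-work"],
    "v2.6.35-rc1": ["v2.6.34", "buggy"],  # merge of the topic branch
}

# Invented dates, chosen only so the buggy commit predates the 2.6.34 tag.
DATES = {
    "v2.6.33": date(2010, 2, 24),
    "buggy": date(2010, 3, 1),
    "mainline-work": date(2010, 4, 1),
    "v2.6.34": date(2010, 5, 16),
    "v2.6.35-rc1": date(2010, 5, 30),
}


def reachable(parents, tip):
    """Return every commit reachable from `tip`, including `tip` itself."""
    seen = set()
    stack = [tip]
    while stack:
        cur = stack.pop()
        if cur in seen:
            continue
        seen.add(cur)
        stack.extend(parents[cur])
    return seen


def releases_containing(parents, tags, commit):
    """List the tags whose history includes `commit`."""
    return [tag for tag in tags if commit in reachable(parents, tag)]


tags = ["v2.6.33", "v2.6.34", "v2.6.35-rc1"]
older_than_2_6_34 = DATES["buggy"] < DATES["v2.6.34"]
affected = releases_containing(PARENTS, tags, "buggy")
```

In this sketch the buggy commit is dated before the 2.6.34 tag, yet only 2.6.35-rc1 contains it. On a linear patch-release branch the two orderings coincide, which is why looking at dates is enough there.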
Posted Jun 21, 2010 9:36 UTC (Mon) by chad.netzer (✭ supporter ✭, #4257)
How is this true? Not all feature branches in a centralized system need be merged before a release, and so it seems to not be a matter of centralized vs. distributed. Bazaar supports both centralized (ie. non-local commit, merging, etc.) and full DVCS semantics (local branching/merging), and I'm having a hard time seeing how your example would not require graph inspection in either mode, since the commit was written before a release, but merged afterwards according to the comment.
If you simply require all branches in a centralized system to be merged before a release, with non-merged branches to be re-written/rebased off the latest release, then that's one thing. But that can be enforced in a distributed system as well (though it sounds like a painful policy, more painful than whatever it intends to fix).
Posted Jun 21, 2010 16:12 UTC (Mon) by iabervon (subscriber, #722)
So, if you understand a system to be "decentralized" if the software includes the necessary features for not having a central repository, even if it is used with a central repository in actual deployment, then the systems still considered "centralized" essentially require all branches to be rebased around a release just because merging them is otherwise too hard to get right.
Posted Jun 21, 2010 19:45 UTC (Mon) by chad.netzer (✭ supporter ✭, #4257)
Here is an informative workflow example, demonstrating marcH's "strict ordering" principle:
It works in this case because trunk has no other branches than stable, and stable is (obviously) never merged back into trunk. That same workflow works even with a distributed system (by rebasing before pushing to the main repo, for example). It blends history among developers, however, which can be a disadvantage at times, especially with many developers and features in active development.
Posted Jun 29, 2010 14:58 UTC (Tue) by marcH (subscriber, #57642)
Agreed I guess, except you conveniently omit the huge influence that a centralized system has on branching policies (the "workflow").
> Certainly the "centralized repo" tools that I'm familiar with have gained better branch and merge support over the years,
As long as a system is centralized, you will neither be able to permanently erase branches nor to re-organize the past in any way. And you will need some human-level protocol to avoid collisions. These, and probably other reasons I am missing, will always have a chilling effect on branching. Why do you think git-svn is so popular?
I am pretty sure that the main reason why most decentralized systems were invented is because their authors wanted to increase the branching and merging freedom.
My point here is that the tool has a huge influence on the way it is used. Sure you can often hammer down a nail with a screwdriver, but do people do that routinely?
Copyright © 2013, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds