Ten-year timeline part 6: almost to the present

Posted Feb 28, 2008 7:55 UTC (Thu) by jamesh (guest, #1159)
In reply to: Ten-year timeline part 6: almost to the present by nix
Parent article: Ten-year timeline part 6: almost to the present

> and a whole bunch of really nasty ones that had bedeviled other VC
> systems for ages just ceased to exist, like rename tracking (what?
> you need to track renames? why not just search for similar content
> when you pack? that way you can merge stuff that's similar whether
> or not it originated in a rename.)

The primary reason for wanting to track renames in a version control system is not storage
efficiency: it is to correctly merge a branch even if files have been moved around.

Git's answer to the merge problem is to infer the renames based on the file content, which
works quite well except for a few cases where it doesn't (some cases with renamed directories
and added files come to mind).
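To make the idea of inferring renames from content concrete, here is a minimal sketch.  It is
not git's actual algorithm: the function name infer_renames, the 0.5 threshold, and the use of
Python's difflib as the similarity measure are all assumptions chosen for illustration.  Trees
are modelled as plain dicts mapping path to file content.

```python
from difflib import SequenceMatcher


def infer_renames(old_tree, new_tree, threshold=0.5):
    """Pair each deleted path with the most similar added path.

    old_tree / new_tree: dict mapping path -> file content (str).
    threshold: hypothetical cut-off below which two files are not
    considered the same file under a new name.
    """
    deleted = sorted(p for p in old_tree if p not in new_tree)
    added = sorted(p for p in new_tree if p not in old_tree)

    # Score every deleted/added pair by content similarity.
    candidates = []
    for old_path in deleted:
        for new_path in added:
            score = SequenceMatcher(
                None, old_tree[old_path], new_tree[new_path]
            ).ratio()
            if score >= threshold:
                candidates.append((score, old_path, new_path))

    # Greedily accept the best-scoring pairs, using each path at most once.
    renames = {}
    used = set()
    for score, old_path, new_path in sorted(candidates, reverse=True):
        if old_path not in renames and new_path not in used:
            renames[old_path] = new_path
            used.add(new_path)
    return renames


old = {
    "util.py": "def add(a, b):\n    return a + b\n",
    "README": "hello\n",
}
new = {
    "lib/math.py": "def add(a, b):\n    return a + b  # sum\n",
    "README": "hello\n",
    "NEWS": "release notes\n",
}
renames = infer_renames(old, new)
print(renames)
```

Note that nothing here depends on anyone having told the VCS about the move, which is both the
appeal and the source of the corner cases: the answer is only as good as the heuristic.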

In contrast, VCSes that do track renames generally use only the rename information they
captured when performing a merge.  This can break down in cases where a file has been split in
two, since they'll often try to apply the changes to just one of the halves.
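A rough sketch of what merging with recorded renames amounts to, and why the split-file case
goes wrong.  The names remap_changes, renames_on_ours and changes_on_theirs are hypothetical,
and a "change" is reduced to a descriptive string; no real VCS is being quoted here.

```python
def remap_changes(changes, recorded_renames):
    """Redirect edits made against old paths to the recorded new paths.

    changes: dict mapping path -> description of the edit on the other branch.
    recorded_renames: dict mapping old path -> new path, as captured by the
    VCS when the user performed the rename.
    """
    return {
        recorded_renames.get(path, path): patch
        for path, patch in changes.items()
    }


# Our branch moved parser.c; their branch edited it under the old name.
renames_on_ours = {"parser.c": "src/parser.c"}
changes_on_theirs = {"parser.c": "fix off-by-one", "main.c": "add --help"}

remapped = remap_changes(changes_on_theirs, renames_on_ours)
print(remapped)
```

The mapping is one old path to one new path.  If parser.c had instead been split into two
files, a record like this can name only one destination, so every incoming edit lands there.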

One thing to note is that the data model of a VCS that tracks renames does not preclude
performing git-style content-based merging, while the reverse is not true.  Perhaps the ideal
merge algorithm will turn out to be a combination of tracking renames and content-based
heuristics.

As both approaches have their faults, I'd prefer to keep my data in a form that tracks the
renames.  5 to 10 years down the track, if it turns out that content-based merges are the
state of the art, I can always throw that information away.


Ten-year timeline part 6: almost to the present

Posted Feb 28, 2008 10:23 UTC (Thu) by nix (subscriber, #2304)

It's true about the merging stuff, I quite forgot about that thanks to using VCs that had
largely solved this problem for so long now :)


Tracking renames explicitly works fine until your users forget to tell the VCS as well as the
filesystem about the rename. Then it breaks. I find that users forget to do this all the damn
time.

I suspect manual rename tracking (as opposed to content-detection inference at some stage)
will work properly only when the FS and VCS are merged, and look at ClearCase for an example
of how ugly *that* can get. (Also, manual rename tracking intrinsically can't handle the case
you mentioned of 'oh that file got cut in half', let alone more complex cases.)

Ten-year timeline part 6: almost to the present

Posted Feb 28, 2008 20:18 UTC (Thu) by njs (guest, #40338)

It's easy for the VCS to detect when the user has forgotten to record a rename -- there's an
inconsistency between the VCS's idea of what the tree should look like and what the tree
actually looks like on disk.  (This is true both when the VCS thinks a rename should have
occurred but none did, and vice versa.)  Most VCSes already refuse to allow a commit when
there's a "missing file" situation like this.  So just add some code to that routine that says
"such and such files are missing -- hmm, but it looks like very similar files are in an
'unknown' state over there.  (Do you want me to record the following renames to fix things
up?)/(Do you want me to rename the following files on disk to fix things up?): a -> b, c ->
d".
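The check described above could look something like the following sketch.  Everything in it is
an assumption for illustration: the function names suggest_renames and format_prompt, the 0.8
threshold, and difflib as the similarity measure.  The VCS's idea of the tree and the state on
disk are both modelled as dicts mapping path to content.

```python
from difflib import SequenceMatcher


def suggest_renames(tracked, on_disk, threshold=0.8):
    """Propose renames for files the VCS expects but cannot find.

    tracked: dict path -> last committed content (the VCS's view).
    on_disk: dict path -> current content (what is actually there).
    """
    missing = sorted(set(tracked) - set(on_disk))   # known to VCS, gone from disk
    unknown = sorted(set(on_disk) - set(tracked))   # on disk, never added

    suggestions = []
    for old_path in missing:
        best_score, best_path = 0.0, None
        for new_path in unknown:
            score = SequenceMatcher(
                None, tracked[old_path], on_disk[new_path]
            ).ratio()
            if score > best_score:
                best_score, best_path = score, new_path
        if best_path is not None and best_score >= threshold:
            suggestions.append((old_path, best_path))
            unknown.remove(best_path)  # each unknown file explains one rename
    return suggestions


def format_prompt(suggestions):
    """Build the question to put to the user; nothing is recorded without a yes."""
    pairs = ", ".join(f"{old} -> {new}" for old, new in suggestions)
    return f"Do you want me to record the following renames to fix things up?: {pairs}"


tracked = {"a": "alpha\nbeta\ngamma\n", "c": "one\ntwo\nthree\n", "keep": "x\n"}
on_disk = {"b": "alpha\nbeta\ngamma\n", "d": "one\ntwo\nthree\nfour\n", "keep": "x\n"}

suggestions = suggest_renames(tracked, on_disk)
print(format_prompt(suggestions))
```

The heuristic only produces a suggestion; the rename that ends up in history is still one the
user explicitly confirmed.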

This combines the ease-of-use of the automatic content tracking systems with the
predictability of explicit rename tracking.  It still doesn't help with files getting cut in
half, sure.  (Handling that case in an explicit and predictable manner looks hopeless,
though, which is why I personally would prefer that such heuristics only be used as part of a
manual merge with a human checking the results.  YMMV.)

Hooking into the FS doesn't really help, and probably hurts, because a VCS "rename" is at a
higher semantic level than a FS "rename".  It's perfectly common for people to, say, copy a
file, edit the copy, and then replace/delete the original.  rename(2) is a way to move bits
around, "myvcs rename" is a way to record intentions.


Copyright © 2017, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds