


Posted Jun 25, 2013 15:50 UTC (Tue) by nye (guest, #51576)
In reply to: mercurial? by marcH
Parent article: Subversion 1.8.0 released

>Your "recent example" link does not look related, did you copy the wrong URL?

I see what khim means:

That thread describes an example where a file has been mostly rewritten, and the new file is almost identical to an unrelated file elsewhere in the tree, because they are both so short that the license header constitutes the majority of the file.

Since git tracks the state of trees, there's no problem when you're using git natively. To send the changes to svn, though, git has to generate a diff. The file has changed so much, and it is so similar to an existing file, that the diff git generates by default describes a copy of the other file followed by an edit of the non-license part. That is clearly a failure of git's move/copy detection heuristics, and it will produce nonsensical change history when imported into svn.
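To make the failure concrete, here is a rough sketch of how a size-normalised similarity score gets fooled by a shared header. This is an assumption-laden approximation, not git's actual algorithm: git's diffcore compares hashed chunks of content, while this counts whole lines. The license text and file contents are hypothetical; the 50% threshold matches git's documented default for `-M`/`-C`.

```python
from collections import Counter

# Hypothetical 20-line license header shared by every file in the tree.
LICENSE = [f"# license line {i}" for i in range(20)]


def similarity(src: list[str], dst: list[str]) -> float:
    """Rough line-based similarity: shared lines / size of the larger file.

    Loosely modelled on git's rename/copy score (common content divided by
    the larger file's size), but counting whole lines, not hashed chunks.
    """
    common = sum((Counter(src) & Counter(dst)).values())
    return common / max(len(src), len(dst), 1)


# Two unrelated short files that share nothing except the license header.
other_file = LICENSE + ["def helper():", "    return 1"]
rewritten = LICENSE + ["class Widget:", "    pass", "    # new body"]

THRESHOLD = 0.5  # git's default similarity threshold for -M/-C is 50%
score = similarity(other_file, rewritten)
print(f"{score:.2f}", score >= THRESHOLD)  # prints: 0.87 True
```

Twenty of the 23 lines match, so the unrelated file clears the threshold comfortably and looks like a copy source.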

That said, it's arguably not too major a failure, because if you're looking at the diffs you're generating, you can see the problem immediately and ask git to do the right thing by setting the heuristic's similarity parameter to something more appropriate. I don't know whether git-svn makes that easy to notice in the standard workflow, though, as I don't use it.

There are some fairly dodgy heuristics in git that can cause this kind of thing. While we're looking at them, I might suggest that the similarity threshold could be a) dynamically adjusted according to the length of the file, and b) altered to weight lines more heavily the further down the file they are. The reasoning is that it's typically the beginning of a file (for code, anyway) that carries a load of common guff that confuses the similarity detection.
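A minimal sketch of what those two suggestions might look like, under stated assumptions: the quadratic position weighting, the threshold formula and the constant `k` are all hypothetical choices made for illustration, and nothing like this exists in git today. `weighted_similarity` and `dynamic_threshold` are invented names.

```python
from collections import Counter


def weighted_similarity(src: list[str], dst: list[str]) -> float:
    """Line i of dst (0-based) carries weight (i + 1) ** 2 (an arbitrary
    choice), so matches near the end of the file count for more than a
    shared header at the top."""
    available = Counter(src)
    matched = total = 0.0
    for i, line in enumerate(dst):
        weight = (i + 1) ** 2
        total += weight
        if available[line] > 0:  # consume one matching line from src
            available[line] -= 1
            matched += weight
    return matched / total if total else 0.0


def dynamic_threshold(n_lines: int, base: float = 0.5, k: int = 20) -> float:
    """Hypothetical length-dependent threshold: close to 1.0 for tiny files,
    falling towards `base` as files grow. `k` is a made-up tuning constant."""
    return base + (1 - base) * k / (n_lines + k)


# Same hypothetical files as before: unrelated apart from the license header.
LICENSE = [f"# license line {i}" for i in range(20)]
other_file = LICENSE + ["def helper():", "    return 1"]
rewritten = LICENSE + ["class Widget:", "    pass", "    # new body"]

score = weighted_similarity(other_file, rewritten)
threshold = dynamic_threshold(len(rewritten))
print(f"{score:.2f} vs {threshold:.2f}: copy detected = {score >= threshold}")
# prints: 0.66 vs 0.73: copy detected = False
```

A genuine rename of the same 23-line file with only its last line edited still scores about 0.88 under this weighting, so it would still be detected. Whether these particular weights behave sensibly on a real tree is an open question.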



Posted Jun 27, 2013 22:12 UTC (Thu) by marcH (subscriber, #57642) [Link]

In summary, for this accident to actually happen, it would take:
- an idiot not taking a single look at the commits he pushes, plus
- a lawyer forcing him to add a copyright header longer than the rest of the file.

Clearly the type of thing that happens everyday at the git-svn office.

Once again and more seriously: every single limitation of git-svn can quickly be worked around by temporarily (and rarely) switching back to the plain SVN client.

Copyright © 2017, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds