


Posted May 13, 2011 22:59 UTC (Fri) by martinfick (subscriber, #4455)
In reply to: DVCS-autosync by martinfick
Parent article: DVCS-autosync

Really, there are many ways with git to rewrite your history, and that is the main point that I was trying to make. And rewriting your history every now and then to keep it reasonably sized is surely better than just having no history, isn't it?



Posted May 13, 2011 23:10 UTC (Fri) by mathstuf (subscriber, #69389) [Link]

You also have to prune the remote repository. If the old blobs are in the packs, they will still get downloaded on clone even if they're unreferenced. The same goes for pushing: blobs reachable from commits in the reflog are considered "referenced", so they will linger as well. To remove these blobs, a combination of "git repack", "git fsck --unreachable HEAD", and "git prune" would be necessary. I don't know of one tool/command that prunes *all* blobs not reachable from a set of commits.
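A rough sketch of that combination, written as a small Python wrapper. The function names and the dry_run flag are made up for illustration; the git subcommands and options themselves are standard. It assumes the history has already been rewritten and that you run it in every repository that should shrink, including the remote one. Running "git fsck --unreachable" beforehand is an optional way to see what would be dropped.

```python
import subprocess


def prune_commands():
    """Return the git invocations, in order, as argv lists."""
    return [
        # Expire all reflog entries so old commits stop counting as referenced.
        ["git", "reflog", "expire", "--expire=now", "--all"],
        # Repack reachable objects into a single pack and delete the old packs.
        ["git", "repack", "-a", "-d"],
        # Delete unreachable loose objects regardless of their age.
        ["git", "prune", "--expire=now"],
    ]


def run_prune(repo_path, dry_run=True):
    """Print the commands (dry_run) or execute them inside repo_path."""
    for argv in prune_commands():
        if dry_run:
            print(" ".join(argv))
        else:
            subprocess.run(argv, cwd=repo_path, check=True)
```

Calling run_prune(".") only prints the three commands; pass dry_run=False to actually run them, which is destructive and cannot be undone.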


Posted May 14, 2011 9:17 UTC (Sat) by rmayr (subscriber, #16880) [Link]

[dvcs-autosync author here...]

That is what I will be trying to do for those files not managed by git-annex (not that git-annex support is done yet...). However, I don't know yet what the best way to rewrite the history will be and am happy to take any suggestions at this point. filter-branch is something I have used before for splitting repositories, but there may be better ways to reduce history. From a user point of view, the use cases that should be supported by the trimmed history are IMHO: a) "Give me the state of this directory at a point in time up to X days/weeks/months ago" and b) "I want this file as it was Y revisions ago". Everything that does not fit in either a) or b) with user-configured values for X and Y can be pruned.
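To make a) and b) concrete, here is a minimal sketch of the selection rule only, not of the history rewrite itself. It assumes the revisions of a file or directory are available as a list of timestamps sorted newest first; revisions_to_keep, max_age and min_revisions are hypothetical names standing in for the user-configured X and Y.

```python
from datetime import datetime, timedelta


def revisions_to_keep(timestamps, now, max_age, min_revisions):
    """Return the indexes of revisions that must survive trimming.

    timestamps must be sorted newest first.
    """
    keep = []
    for index, stamp in enumerate(timestamps):
        recent_enough = now - stamp <= max_age  # use case a): within X
        among_last_n = index < min_revisions    # use case b): last Y revisions
        if recent_enough or among_last_n:
            keep.append(index)
    return keep
```

Anything whose index is not returned fits neither a) nor b) and is a candidate for pruning.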


Posted May 14, 2011 20:53 UTC (Sat) by dlang (subscriber, #313) [Link]

when trimming the history, I can see a couple of desired approaches

1. I want to keep all revisions up to X days/revisions ago, but I don't care much about anything before that.

2. I know that this is a good version, I want to make sure I can always see this version, but I don't care about all the other versions I haven't tagged.

3. I want to save one version per day/week/?? but not _every_ version (i.e. get me back into the approximate area, but don't spend the resources to save every version)

I can also see combinations of these

for example I want a snapshot each day, but every revision for the last week
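that combined example could look roughly like the sketch below. it only decides which revisions to keep; thin_history and full_window are made-up names, and the input is assumed to be a list of commit timestamps sorted newest first.

```python
from datetime import datetime, timedelta


def thin_history(timestamps, now, full_window=timedelta(days=7)):
    """Keep every revision inside full_window and, for older history,
    only the newest revision of each calendar day.

    timestamps must be sorted newest first; the result keeps that order.
    """
    kept = []
    days_seen = set()
    for stamp in timestamps:
        if now - stamp <= full_window:
            # Recent history: keep every revision.
            kept.append(stamp)
        elif stamp.date() not in days_seen:
            # Older history: the first one seen per day is that day's newest.
            days_seen.add(stamp.date())
            kept.append(stamp)
    return kept
```

approach 2 (always keep tagged versions) could be layered on top by adding the tagged revisions to the result unconditionally.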

given how efficient the git compression can be, I would suggest having a dry-run mode that will tell you how much space you are saving by throwing away the history. for binary files it may be quite a bit, but for other files you may be surprised at how little it saves.
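one way such a dry run might be approximated: apply the trimming to a throwaway clone, run "git count-objects -v" in both the original and the trimmed clone, and compare the numbers. the sketch below only does the comparison; the function names are hypothetical, and it assumes the default output of that command, where "size" and "size-pack" are reported in KiB.

```python
def parse_count_objects(output):
    """Parse the 'key: value' lines of `git count-objects -v` into a dict."""
    stats = {}
    for line in output.splitlines():
        key, sep, value = line.partition(":")
        if sep and value.strip().isdigit():
            stats[key.strip()] = int(value)
    return stats


def on_disk_kib(stats):
    """Loose objects plus packs, in KiB."""
    return stats.get("size", 0) + stats.get("size-pack", 0)


def estimated_saving_kib(before_output, after_output):
    """Difference in object storage between the original and the trimmed clone."""
    before = on_disk_kib(parse_count_objects(before_output))
    after = on_disk_kib(parse_count_objects(after_output))
    return before - after
```

the trimmed clone has to be repacked and pruned before measuring, otherwise the old objects are still counted.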


Posted May 17, 2011 21:15 UTC (Tue) by zooko (guest, #2589) [Link]

(Tahoe-LAFS hacker here)

There are hooks to integrate git-annex with Tahoe-LAFS, and that might make it easier to do the sort of pruning you are talking about here.

Also, send me email at and I'll hook you up with a Tahoe-LAFS storage service that you can use for testing.

Copyright © 2017, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds