DVCS-autosync

Posted May 13, 2011 22:46 UTC (Fri) by dlang (subscriber, #313)
In reply to: DVCS-autosync by martinfick
Parent article: DVCS-autosync

rebasing doesn't solve the problem; you actually want to trim the history (a 'shallow' clone in git terms, but then making that the real repository and throwing away its history)

I think there is a command along the lines of git filter-branch that can walk one repository and create a new one with less history than the old one had. This could be used to trim the old history, but when you do so, you will have to re-clone the (now smaller) repository to all your machines.



DVCS-autosync

Posted May 13, 2011 22:56 UTC (Fri) by martinfick (subscriber, #4455) [Link]

I think that you are assuming that you always keep alternate/old branches around. Why couldn't you simply rebase onto a new smaller branch and fetch that branch on all your copies, then drop the old master branch?

DVCS-autosync

Posted May 13, 2011 22:59 UTC (Fri) by martinfick (subscriber, #4455) [Link]

Really, there are many ways to rewrite your history with git, and that is the main point I was trying to make. And surely rewriting your history every now and then to keep it reasonably sized is better than having no history at all, isn't it?

DVCS-autosync

Posted May 13, 2011 23:10 UTC (Fri) by mathstuf (subscriber, #69389) [Link]

You also have to prune the remote repository. If the old blobs are in the packs, they will still get downloaded when cloned even if they're unreferenced. Same with pushing: blobs reachable from commits in the reflog are considered "referenced", so blobs will linger from that as well. To remove these blobs, a combination of "git repack", "git fsck --unreachable HEAD", and "git prune" would be necessary. I don't know of one tool/command that prunes *all* blobs not reachable from a set of commits.
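A minimal sketch of one such pruning sequence, assuming you really do want every unreachable object gone immediately: expire the reflog first so that old commits stop counting as referenced, then let git gc repack and prune. The helper names (prune_commands, prune_repo) are made up for illustration; the git subcommands and flags are the standard ones. Objects still reachable from other refs (tags, remote-tracking branches, backup refs left by a history rewrite) will survive this.

```python
import subprocess

def prune_commands():
    """Return the git invocations, in order, that drop unreachable objects."""
    return [
        # Expire every reflog entry so old commits are no longer "referenced".
        ["git", "reflog", "expire", "--expire=now", "--all"],
        # Repack and delete all unreachable objects right away.
        ["git", "gc", "--prune=now"],
    ]

def prune_repo(repo_path):
    """Run the pruning sequence inside repo_path (destructive, no undo)."""
    for cmd in prune_commands():
        subprocess.run(cmd, cwd=repo_path, check=True)
```

This would have to be run on the remote repository as well as on each clone, since the remote's packs are what a fresh clone downloads.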

DVCS-autosync

Posted May 14, 2011 9:17 UTC (Sat) by rmayr (subscriber, #16880) [Link]

[dvcs-autosync author here...]

That is what I will be trying to do for those files not managed by git-annex (not that git-annex support is done yet...). However, I don't know yet what the best way to rewrite the history will be and am happy to take any suggestions at this point. filter-branch is something I have used before for splitting repositories, but there may be better ways to reduce history. From a user point of view, the use cases that should be supported by the trimmed history are IMHO: a) "Give me the state of this directory at a point in time up to X days/weeks/months ago" and b) "I want this file as it was Y revisions ago". Everything that does not fit in either a) or b) with user-configured values for X and Y can be pruned.

DVCS-autosync

Posted May 14, 2011 20:53 UTC (Sat) by dlang (subscriber, #313) [Link]

when trimming the history, I can see a couple of desired approaches

1. I want to keep all revisions up to X days/revisions ago, but I don't care much about anything before that.

2. I know that this is a good version, I want to make sure I can always see this version, but I don't care about all the other versions I haven't tagged.

3. I want to save one version per day/week/?? but not _every_ version (i.e. get me back into the approximate area, but don't spend the resources to save every version)

I can also see combinations of these

for example I want a snapshot each day, but every revision for the last week
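The combination above (every revision for the last week, one snapshot per day before that, plus anything explicitly tagged) could be sketched roughly as follows. This is only an illustration of the selection policy, not anything dvcs-autosync does; select_revisions and its parameters are hypothetical names, and the revisions are plain (id, timestamp) pairs rather than real git objects.

```python
from datetime import datetime, timedelta

def select_revisions(revisions, now, full_days=7, tagged=frozenset()):
    """Pick the revision ids to keep.

    revisions: iterable of (rev_id, datetime) pairs, in any order.
    Keeps every revision from the last `full_days` days, every tagged
    revision, and the newest revision of each older calendar day.
    """
    cutoff = now - timedelta(days=full_days)
    keep = set()
    newest_per_day = {}  # date -> (timestamp, rev_id)
    for rev_id, when in revisions:
        if when >= cutoff:
            keep.add(rev_id)  # recent: keep every revision
            continue
        if rev_id in tagged:
            keep.add(rev_id)  # explicitly marked as a good version
        day = when.date()
        if day not in newest_per_day or when > newest_per_day[day][0]:
            newest_per_day[day] = (when, rev_id)
    # One snapshot per older day: the last revision made that day.
    keep.update(rev_id for _, rev_id in newest_per_day.values())
    return keep
```

Everything not in the returned set would be a candidate for dropping when the history is rewritten.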

given how efficient the git compression can be, I would suggest having a dry-run mode that will tell you how much space you are saving by throwing away the history. for binary files it may be quite a bit, but for other files you may be surprised at how little it saves.
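One way such a dry run might work, as a sketch: build the trimmed history in a scratch copy, run "git count-objects -v" in both the original and the scratch copy (after a gc in each, so everything is packed), and compare the "size-pack" values, which git reports in KiB. The function names below are hypothetical; only the output format of count-objects is taken from git itself.

```python
def pack_size_kib(count_objects_output):
    """Extract 'size-pack' (in KiB) from `git count-objects -v` output."""
    for line in count_objects_output.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "size-pack":
            return int(value.strip())
    raise ValueError("no size-pack line found")

def savings_report(before_output, after_output):
    """Return (KiB saved, percent saved) between two count-objects outputs."""
    before = pack_size_kib(before_output)
    after = pack_size_kib(after_output)
    saved = before - after
    percent = 100.0 * saved / before if before else 0.0
    return saved, percent
```

Loose objects are ignored here, which is why both repositories should be packed before measuring.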

DVCS-autosync

Posted May 17, 2011 21:15 UTC (Tue) by zooko (guest, #2589) [Link]

(Tahoe-LAFS hacker here)

There are hooks to integrate git-annex with Tahoe-LAFS, and that might make it easier to do the sort of pruning you are talking about here.

http://tahoe-lafs.org/trac/tahoe-lafs/wiki/RelatedProjects

Also, send me email at zooko@zooko.com and I'll hook you up with a Tahoe-LAFS storage service that you can use for testing.

DVCS-autosync

Posted May 13, 2011 23:02 UTC (Fri) by dlang (subscriber, #313) [Link]

switching your syncing from one branch to another would seem to be at least as problematic as re-generating your master branch.

in addition, the history you want to keep around is usually the more recent stuff, rebasing the latest version onto old history seems to be losing the most valuable revisions.

DVCS-autosync

Posted May 13, 2011 23:21 UTC (Fri) by martinfick (subscriber, #4455) [Link]

> switching your syncing from one branch to another would seem to be at least as problematic as re-generating your master branch.

I don't know how DVCS-autosync uses git, but there is nothing magical about "master" in git, in fact I rarely even care about it. All branches are equal with git, and all repos are equal. There is no need to even clone repos, simply init a new repo and fetch into it the branch you care about. git is a DVCS, it is not subversion.
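To make the "init a new repo and fetch" idea concrete, here is a rough sketch that builds the three git commands involved. The helper names, the optional depth parameter, and the choice to check the result out from FETCH_HEAD are illustrative assumptions, not a description of how dvcs-autosync works.

```python
import subprocess

def fetch_branch_commands(source, branch, depth=None):
    """Build the git commands that pull one branch into a fresh repository."""
    fetch = ["git", "fetch"]
    if depth is not None:
        # Optionally limit how much history comes along.
        fetch.append(f"--depth={depth}")
    fetch += [source, branch]
    return [
        ["git", "init"],
        fetch,
        # Create a local branch at whatever was just fetched.
        ["git", "checkout", "-b", branch, "FETCH_HEAD"],
    ]

def fetch_branch(target_dir, source, branch, depth=None):
    """Run the commands in target_dir, which must already exist."""
    for cmd in fetch_branch_commands(source, branch, depth):
        subprocess.run(cmd, cwd=target_dir, check=True)
```

Only the named branch and the objects it needs are transferred, so an old, larger branch in the source repository is never downloaded.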

> in addition, the history you want to keep around is usually the more recent stuff, rebasing the latest version onto old history seems to be losing the most valuable revisions.

I fail to see how that is an argument against keeping history. The alternative, no history at all, is surely no better than the wrong history, is it?

But who said anything about specifically cleaning up old history? Clean up any history you want. Go ahead, trim it down to a single checkin, only the latest, if you want. You will still be no worse off than having only one copy, and you will have the ability to add history whenever you want.

DVCS-autosync

Posted May 14, 2011 9:21 UTC (Sat) by rmayr (subscriber, #16880) [Link]

> I don't know how DVCS-autosync uses git, but there is nothing magical about "master" in git, in fact I rarely even care about it. All branches are equal with git, and all repos are equal. There is no need to even clone repos, simply init a new repo and fetch into it the branch you care about. git is a DVCS, it is not subversion.

dvcs-autosync doesn't care about it at all. Whatever the checked out branch is, it will work on. Dieter and I have already been discussing that explicit support for branches may be good to have in future versions for accessing (and possibly modifying) older revisions. But right now, it's completely transparent. You can even have two clones of the repository (in different directories) pointing to different branches and have two instances of dvcs-autosync working on those branches concurrently.


Copyright © 2017, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds