Re: rfc: why are we still distributing the portage tree via rsync?

From:		Rich Freeman <rich0-AT-gentoo.org>
To:		gentoo-dev <gentoo-dev-AT-lists.gentoo.org>
Subject:		Re: rfc: why are we still distributing the portage tree via rsync?
Date:		Tue, 3 Jul 2018 11:38:03 -0400
Message-ID:		<CAGfcS_kzzo4gkGKrORnuRw6hxvPJzwy6-2OL5A-7Rdgki0Un0w@mail.gmail.com>

On Tue, Jul 3, 2018 at 11:22 AM William Hubbs <williamh@gentoo.org> wrote:
>
> Mostly because of the recent "trustless infrastructure" thread, I am
> wondering why we are still distributing the portage tree primarily
> via rsync instead of git?
>
> Can someone educate me on that, and is it worth considering moving away
> from rsync distribution?
>

Here are the pros/cons that I've seen come up in the past:

1.  emerge-webrsync is probably more secure at the moment, because
emerge --sync with git leaves the tree in a corrupt state if
verification fails.  That seems like something that could be fixed,
and which should be fixed regardless (presumably somebody just has to
do the work; I can't imagine the Portage team would turn away
patches).

2.  git seems to be more efficient for frequent syncing, while rsync
seems to be more efficient for infrequent syncing.  I'd guess the
crossover is somewhere between one and a few weeks, but I don't have
data to support that.

3.  We have more rsync mirrors, though given the option of using
mirrors like GitHub I don't see why this matters (as long as we
actually secure distribution).

4.  By default git tends to accumulate history, which can eat up disk
space.  I imagine this could be trimmed automatically if users wanted,
though during a sync it would still need to fetch every commit
between the last-fetched and next-fetched state, which means downloading
files that may since have been removed or changed.
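One way the verification problem in point 1 could be addressed is to sync into a staging directory and expose the new tree only once it verifies.  This is a hypothetical sketch, not how Portage currently works: it assumes the live repository path is a symlink, and `sync_then_swap`, `fetch` and `verify` are invented names standing in for the real transport (git or rsync) and the signature check.  Because a fresh symlink is renamed over the old one, anything reading the tree sees either the previous tree or the fully verified new one, never a half-updated state.

```python
import os
import shutil
import tempfile
from pathlib import Path
from typing import Callable


def sync_then_swap(live_link: Path,
                   fetch: Callable[[Path], None],
                   verify: Callable[[Path], bool]) -> bool:
    """Sync into a staging directory and publish it only if it verifies.

    Hypothetical sketch: live_link is assumed to be a symlink (or not to
    exist yet); fetch() and verify() stand in for the real transport and
    signature check. Returns True if the new tree was published.
    """
    parent = live_link.parent
    # Stage next to the live link so the new tree is on the same filesystem.
    staging = Path(tempfile.mkdtemp(prefix="tree-", dir=parent))
    try:
        fetch(staging)
        ok = verify(staging)
    except Exception:
        shutil.rmtree(staging, ignore_errors=True)
        raise
    if not ok:
        # Verification failed: discard the staged copy, keep the old tree.
        shutil.rmtree(staging, ignore_errors=True)
        return False

    old_target = live_link.resolve() if live_link.is_symlink() else None
    # Point a temporary symlink at the verified tree, then rename it over
    # the live link; on POSIX the rename replaces the link atomically.
    tmp_link = parent / (live_link.name + ".new")
    if tmp_link.is_symlink():
        tmp_link.unlink()  # leftover from an interrupted earlier run
    tmp_link.symlink_to(staging)
    os.replace(tmp_link, live_link)

    # The previous tree is no longer reachable through the live link.
    if old_target is not None:
        shutil.rmtree(old_target, ignore_errors=True)
    return True
```

A sync whose verify() returns False simply returns False and leaves the previously verified tree in place, which is the behaviour point 1 asks for.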

Personally I stick with git.  I want the history anyway, and since I
sync frequently it involves WAY less disk IO and seems to be very
network-efficient as well.
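The efficiency difference in points 2 and 4 can be illustrated with a toy model (an illustration under simplifying assumptions, not a measurement): a full, non-shallow git fetch transfers every file version introduced by the fetched commits, even versions later overwritten or deleted, while rsync transfers only files whose final contents differ.  The model counts file-content objects only and ignores git's tree and commit objects, delta compression, and rsync's per-sync cost of walking the whole tree's file list, which is what makes frequent rsync syncs comparatively expensive.  The function names and ebuild paths are hypothetical.

```python
from typing import Dict, List, Optional

Tree = Dict[str, str]                 # path -> file contents
Commit = Dict[str, Optional[str]]     # path -> new contents, None = deleted


def git_blobs_fetched(old_tree: Tree, commits: List[Commit]) -> int:
    """Distinct new file contents a full git fetch would have to send.

    Git objects are content-addressed, so contents already present locally
    are not re-sent, but every intermediate version is.
    """
    have = set(old_tree.values())
    new_blobs = set()
    for commit in commits:
        for content in commit.values():
            if content is not None and content not in have:
                new_blobs.add(content)
    return len(new_blobs)


def rsync_files_transferred(old_tree: Tree, commits: List[Commit]) -> int:
    """Files rsync would send: only the net difference between old and new."""
    new_tree = dict(old_tree)
    for commit in commits:
        for path, content in commit.items():
            if content is None:
                new_tree.pop(path, None)
            else:
                new_tree[path] = content
    # Deletions (rsync --delete) remove files locally without sending data.
    return sum(1 for path, content in new_tree.items()
               if old_tree.get(path) != content)


old = {"app-misc/foo/foo-1.ebuild": "v1"}
week = [
    {"app-misc/foo/foo-1.ebuild": "v2"},
    {"app-misc/foo/foo-1.ebuild": "v3"},
    {"app-misc/bar/bar-1.ebuild": "tmp"},   # added...
    {"app-misc/bar/bar-1.ebuild": None},    # ...then removed
]
print(git_blobs_fetched(old, week))        # 3
print(rsync_files_transferred(old, week))  # 1
```

Syncing after each commit instead would make git fetch one small change at a time, while rsync would still compare the entire tree on every run.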

-- 
Rich