
Support large repositories!

Posted Apr 4, 2010 12:04 UTC (Sun) by RCL (guest, #63264)
In reply to: Support large repositories! by fdr
Parent article: A proposed Subversion vision and roadmap

I can't imagine what other [convenient] mechanisms could be used for data.
In gamedev, data is a first-class citizen, and data commits (from 60+
artists) are usually much more frequent than code commits (from 10+
coders), yet the two are interdependent (data depends on the editor, and
the game depends on the data), so they should be versioned together...

And by the way, being "binary" is not the deciding factor. Some
intermediate data formats may be textual (id Software even tried that for
production formats, but it failed miserably on consoles). Things don't get
much better when you are dealing with gigabytes of text files.

The basic problem seems to be the time needed to detect changes. Perforce
relies on explicit help from the user by requiring that you "p4 open" a
file before editing it (more or less enforced by keeping files read-only),
but that makes "p4 diff" blazingly fast. SVN tries to detect changes by
itself, and while that is convenient, it slows things down.
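
For readers who have not used Perforce, the cycle looks roughly like this
(a sketch; the file name is made up):

    p4 edit textures/hero.tga   # declare the edit; the file becomes writable
    ... modify the file ...
    p4 diff textures/hero.tga   # fast: only files opened for edit are examined
    p4 submit                   # send the changelist to the server

Because the server already knows which files are open, "p4 diff" never has
to scan the whole tree.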

Git seems to be ideologically incompatible with the very idea of a
workflow where code is a tiny fragment of the overall versioned data.
DVCSes get everything wrong here: local history (which would require
several TBs) is not needed, the ability to lock a file is missing, and
detecting changes by scanning the files is detrimental.

There are some ways in which a DVCS might be more convenient (but only
for coders, who are a tiny part of the team); that's why Perforce
introduced the "shelving" concept, which seems to bring the main advantage
of a DVCS into a traditional VCS. Perhaps Subversion should do the same...
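
For what it's worth, shelving boils down to two commands (a sketch; the
changelist number is made up):

    p4 shelve -c 1234     # park the opened files on the server without submitting
    p4 unshelve -s 1234   # restore the shelved files into a workspace later

That gives you cheap work-in-progress snapshots, which is much of what
coders want from a DVCS, without giving up the central server.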



Support large repositories!

Posted Apr 5, 2010 11:01 UTC (Mon) by CaddyCorner (guest, #64998) [Link]

It would be nice if large binary formats used e.g. tar as a part of their spec. This obviously doesn't address the current situation.
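
To illustrate the idea: if a big production format were specified as, say, a tar archive of its members, generic tools could pull out individual members for member-wise diffing (hypothetical file and member names):

    tar -tf model.pack          # list the members of the container
    tar -xOf model.pack mesh0   # stream one member to stdout for diffing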

Perhaps something that does address the present situation is simply treating files whose modification date has changed as changed, and running a diff index in the background. If it were possible to transparently wrap the binary blob in a paged format, then modification dates could be tracked per block/page. All of this comes at the cost of the VCS trying not to inject itself into the workflow; possibly the injection could be made optional.
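
A minimal sketch of the modification-date scan, using find(1) (the .vcs bookkeeping paths are made up):

    find . -type f -newer .vcs/last-scan > .vcs/dirty-list   # files touched since the last scan
    touch .vcs/last-scan                                     # reset the reference timestamp

A background job could then restrict its diff index to the files on the dirty list, with the paged wrapper needed only where per-block granularity matters.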

Support large repositories!

Posted Apr 5, 2010 11:31 UTC (Mon) by marcH (subscriber, #57642) [Link]

> In gamedev data is a first-class citizen, and data commits (from 60+ artists) are usually much more frequent than code commits (from 10+ coders)

Now I feel glad not to be a second-class citizen in game development. I would hate to have to deal with tens of gigabytes every time I want to test a one-line fix in isolation (just because nobody cares to modularize those tens of gigabytes...)

> Git seems to be ideologically incompatible with the very idea of workflow, where code is a tiny fragment of overall versioned data.

Yes, because it is designed and optimized for the opposite use case.

> DVCSes get all the things wrong here: local history (which would require several TBs) is not needed here, ability to lock the file is missing, ability to detect changes by scanning the files is detrimental.

These are all desired features for a distributed, "text-optimized" VCS.

Thanks for sharing your experience with versioning binaries. It lets you highlight, better than anyone else could, how optimizing for binaries is different from, and incompatible with, optimizing for text.

Support large repositories!

Posted Apr 8, 2010 16:33 UTC (Thu) by Spudd86 (subscriber, #51683) [Link]

What do you mean, gigabytes in one text file? If you are, you're doing the text file part wrong...

