Support large repositories!
Posted Apr 4, 2010 0:28 UTC (Sun) by RCL (guest, #63264)
In reply to: Support large repositories! by marcH
Parent article: A proposed Subversion vision and roadmap
Your code should always be synchronized to a particular state of the data it works with. That's so common that I'm surprised you are asking "why". Here are a few reasons:
1) Using bytecode-compiled scripts which rely on the particular layout and sizeof of structures used by the C/C++ code, and which are usually embedded in the data (maps, actors) they operate on.
2) Using binary formats which are tied to the particular code that loads/uses them. For example, if you are making a console game, you don't have the resources for hierarchically structured XML-like formats: they require additional memory to load and/or cannot be read with a single I/O operation, which is a showstopper if you are streaming directly from DVD. Your only realistic option is to load large binary chunks, change a few pointers here and there, and that's it (see the sketch below).
3) Having the ability to branch/tag a particular state of the whole game so it can be later ported/patched/otherwise updated independently of ongoing development...
etc etc etc
Basically there are as many reasons to have your binary data versioned as there are reasons to have your plaintext data versioned.
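To make point (2) above concrete, here is a minimal sketch of the "one big read, then patch a few pointers" loading style. The MeshHeader/MeshAsset layout and the magic value are made up for illustration; a real engine's formats come out of its own build tools:

    #include <cstdint>
    #include <cstdio>
    #include <vector>

    // Hypothetical on-disk header; the engine's build tools would generate this.
    struct MeshHeader {
        uint32_t magic;          // arbitrary tag written by the build tools
        uint32_t vertexCount;
        uint32_t indexCount;
        uint64_t vertexOffset;   // byte offset from the start of the blob
        uint64_t indexOffset;    // byte offset from the start of the blob
    };

    struct MeshAsset {
        std::vector<char> blob;          // the single contiguous allocation
        MeshHeader* header   = nullptr;
        float*      vertices = nullptr;  // patched to point inside the blob
        uint32_t*   indices  = nullptr;  // patched to point inside the blob
    };

    // One read for the whole file, then two pointer fixups. No field-by-field
    // parsing, no intermediate representation, no extra allocations.
    bool loadMesh(const char* path, MeshAsset& out) {
        std::FILE* f = std::fopen(path, "rb");
        if (!f) return false;
        std::fseek(f, 0, SEEK_END);
        long size = std::ftell(f);
        std::fseek(f, 0, SEEK_SET);
        if (size < static_cast<long>(sizeof(MeshHeader))) { std::fclose(f); return false; }
        out.blob.resize(static_cast<size_t>(size));
        size_t got = std::fread(out.blob.data(), 1, out.blob.size(), f);
        std::fclose(f);
        if (got != out.blob.size()) return false;

        char* base  = out.blob.data();
        out.header  = reinterpret_cast<MeshHeader*>(base);
        if (out.header->magic != 0x48534D45u) return false;  // made-up magic value
        out.vertices = reinterpret_cast<float*>(base + out.header->vertexOffset);
        out.indices  = reinterpret_cast<uint32_t*>(base + out.header->indexOffset);
        return true;
    }

    int main(int argc, char** argv) {
        MeshAsset mesh;
        if (argc < 2 || !loadMesh(argv[1], mesh)) {
            std::fprintf(stderr, "usage: loadmesh <cooked mesh file>\n");
            return 1;
        }
        std::fprintf(stdout, "%u vertices, %u indices\n",
                     mesh.header->vertexCount, mesh.header->indexCount);
        return 0;
    }

Any change to the struct layout on the code side silently invalidates every cooked asset, which is exactly why the code and the data end up needing to be versioned together.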
Posted Apr 4, 2010 7:25 UTC (Sun) by fdr (guest, #57064)
I do think there's some reason for improvement here, but I must admit: often the large resources are not terribly coupled to code change (ex: texture tweaking), and I really, really like the fact that the "log" and "diff" operators are local and blazing fast for textual artifacts. In the ideal world I could have all my binary blobs, too, however....

I think some UI slickness could be of big help here. It would also be nice if it could be done using some other mechanism, as for managing third-party code/dependencies. At the same time, I do imagine that the right kind of simple script could fetch the resources needed, at least as conveniently as a P4/SVN install, while not losing access to git's advantages when manipulating textual artifacts.
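As a rough illustration of that "simple script" idea, the working tree could carry only a small text manifest (which logs and diffs nicely in git), while the heavy blobs are fetched out of band. The manifest format and the notion of a content id below are invented for the example:

    #include <cstdint>
    #include <filesystem>
    #include <fstream>
    #include <iostream>
    #include <sstream>
    #include <string>

    namespace fs = std::filesystem;

    int main(int argc, char** argv) {
        // Each manifest line: <relative path> <size in bytes> <content id>
        const char* manifestPath = argc > 1 ? argv[1] : "assets.manifest";
        std::ifstream manifest(manifestPath);
        if (!manifest) {
            std::cerr << "no manifest: " << manifestPath << "\n";
            return 1;
        }

        std::string line;
        while (std::getline(manifest, line)) {
            std::istringstream fields(line);
            std::string path, contentId;
            std::uintmax_t size = 0;
            if (!(fields >> path >> size >> contentId)) continue;  // skip malformed lines

            // Up to date if the file is present with the expected size; anything
            // else would be pulled from the (hypothetical) asset store by content id.
            bool fresh = fs::exists(path) && fs::file_size(path) == size;
            if (!fresh)
                std::cout << "fetch " << contentId << " -> " << path << "\n";
        }
        return 0;
    }

Only the manifest shows up in "log" and "diff"; the blobs themselves never enter the repository.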
Posted Apr 4, 2010 8:18 UTC (Sun) by ikm (guest, #493)
Some mechanism other than a VCS (version control system)? Maybe, like, a hammer, or a shovel? But not a VCS. Definitely not. VCSes are not for versioning data, no, no. Shovels are for that. Dig a hole, stash your data, mark the place with a rock. And you're done.
Posted Apr 5, 2010 21:40 UTC (Mon) by fdr (guest, #57064)
No need to lean so heavily on the expansion of a TLA to make biting remarks. It's somewhat silly.

The needs are simply different for textual artifacts and larger blobs. Maybe one could call the DVCSs as they exist now incomplete, but I would say the same for the centralized systems, where I feel my ability to interrogate textual history is agonizingly slow (and, of course, requires network access).
Posted Apr 4, 2010 12:04 UTC (Sun) by RCL (guest, #63264)
In gamedev, data is a first-class citizen, and data commits (from 60+ artists) are usually much more frequent than code commits (from 10+ coders), yet they are interdependent (data depends on the editor and the game depends on data), so they should be versioned together...

And by the way, being "binary" is not the deciding factor. Some intermediate data formats may be textual (id Software even tried that for production formats, but it failed miserably on consoles). Things don't get much better if you deal with gigabytes of text files.

The basic problem seems to be the time needed to detect a change. Perforce relies on explicit help from the user by requiring that you "p4 open" a file before editing it (kind of enforced by setting files to read-only), but that makes "p4 diff" blazingly fast. SVN tries to guess by itself, and while convenient, that slows things down.

Git seems to be ideologically incompatible with the very idea of workflow, where code is a tiny fragment of overall versioned data. DVCSes get all the things wrong here: local history (which would require several TBs) is not needed here, ability to lock the file is missing, ability to detect changes by scanning the files is detrimental.

There are some ways where a DVCS might be more convenient (but only for coders, who are a tiny part of the team); that's why Perforce introduced the "shelving" concept, which seems to bring the main advantage of a DVCS into a traditional VCS. Perhaps Subversion should do the same...
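A small sketch of why the two change-detection models scale so differently. The owner-write bit stands in for Perforce's "opened for edit" state (unopened files are kept read-only), while the second function shows the cost a content scan cannot avoid; the paths are illustrative:

    #include <cstdint>
    #include <filesystem>
    #include <iostream>
    #include <vector>

    namespace fs = std::filesystem;

    // Cheap: only metadata is touched, so the cost grows with the number of files.
    std::vector<fs::path> filesOpenedForEdit(const fs::path& root) {
        std::vector<fs::path> opened;
        for (const auto& entry : fs::recursive_directory_iterator(root)) {
            if (!entry.is_regular_file()) continue;
            auto perms = entry.status().permissions();
            if ((perms & fs::perms::owner_write) != fs::perms::none)
                opened.push_back(entry.path());
        }
        return opened;
    }

    // Expensive: every byte has to be read (or hashed) before the tool can say
    // anything, which is what hurts with gigabytes of binaries in the tree.
    std::uintmax_t bytesAContentScanMustRead(const fs::path& root) {
        std::uintmax_t total = 0;
        for (const auto& entry : fs::recursive_directory_iterator(root))
            if (entry.is_regular_file())
                total += entry.file_size();
        return total;
    }

    int main() {
        const fs::path root = ".";  // pretend this is a multi-gigabyte working copy
        std::cout << filesOpenedForEdit(root).size() << " writable files (edit candidates)\n";
        std::cout << bytesAContentScanMustRead(root) << " bytes a full content scan would touch\n";
    }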
Posted Apr 5, 2010 11:01 UTC (Mon) by CaddyCorner (guest, #64998)
Perhaps something that does address the present situation is simply treating files whose modification date has changed as changed, and running a diff index in the background. If it were possible to transparently wrap the binary blob in a paged format, then modification dates could be taken on each block/page. All of this seems to come at the cost of the VCS trying not to inject itself into the workflow; possibly the injection could be optional.
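A rough sketch of that per-page idea: hash the blob in fixed-size pages and compare against the previous index, so only the changed pages need attention. The 64 KiB page size is arbitrary and FNV-1a is used only to keep the example self-contained; a real tool would persist the index and use a stronger hash:

    #include <cstdint>
    #include <fstream>
    #include <iostream>
    #include <string>
    #include <vector>

    constexpr size_t kPageSize = 64 * 1024;

    static uint64_t fnv1a(const char* data, size_t len) {
        uint64_t h = 1469598103934665603ull;
        for (size_t i = 0; i < len; ++i) {
            h ^= static_cast<unsigned char>(data[i]);
            h *= 1099511628211ull;
        }
        return h;
    }

    // One hash per 64 KiB page of the file.
    std::vector<uint64_t> pageHashes(const std::string& path) {
        std::ifstream in(path, std::ios::binary);
        std::vector<uint64_t> hashes;
        std::vector<char> page(kPageSize);
        while (in) {
            in.read(page.data(), static_cast<std::streamsize>(page.size()));
            std::streamsize got = in.gcount();
            if (got <= 0) break;
            hashes.push_back(fnv1a(page.data(), static_cast<size_t>(got)));
        }
        return hashes;
    }

    int main(int argc, char** argv) {
        if (argc < 2) { std::cerr << "usage: pagediff <file>\n"; return 1; }
        std::vector<uint64_t> previous;            // in reality, loaded from an index file
        std::vector<uint64_t> current = pageHashes(argv[1]);
        for (size_t i = 0; i < current.size(); ++i) {
            bool changed = i >= previous.size() || previous[i] != current[i];
            if (changed)
                std::cout << "page " << i << " changed\n";
        }
        return 0;
    }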
Posted Apr 5, 2010 11:31 UTC (Mon) by marcH (subscriber, #57642)
Now I feel glad not to be a second-class citizen in game development. I would hate to have to deal with tens of gigabytes every time I want to test a one-line fix in isolation (just because no one cares to modularize these tens of gigabytes...)
> Git seems to be ideologically incompatible with the very idea of workflow, where code is a tiny fragment of overall versioned data.
Yes, because it is designed and optimized for the opposite use case.
> DVCSes get all the things wrong here: local history (which would require several TBs) is not needed here, ability to lock the file is missing, ability to detect changes by scanning the files is detrimental.
These are all desired features for a distributed, "text-optimized" VCS.

Thanks for sharing your experience with versioning binaries. It lets you highlight, better than anyone else, how optimizing for binaries is different from and incompatible with optimizing for text.
Posted Apr 8, 2010 16:33 UTC (Thu) by Spudd86 (subscriber, #51683)
Posted Apr 5, 2010 22:18 UTC (Mon) by marcH (subscriber, #57642)
That's so common that only Perforce seems to handle large binaries well?
I am afraid what is actually common is to generate large binaries from source.
Posted Apr 6, 2010 1:19 UTC (Tue) by bronson (subscriber, #4806)
Besides, when it takes six hours for an optimized compile (this was the 90s), or when the dev tools cost $25,000/seat, then hell yes you check binaries into revision control. Right next to the source code.
Posted Apr 6, 2010 9:09 UTC (Tue) by marcH (subscriber, #57642)
As a matter of fact, I work daily with binaries that I cannot compile myself. Hell no, they are not checked in right next to the source code; that would make revision control operations unnecessarily slow to a crawl.
Posted Apr 6, 2010 18:52 UTC (Tue) by avik (guest, #704)
Posted Apr 7, 2010 23:19 UTC (Wed) by cmccabe (guest, #60281)
> repository. This way the first person to compile takes the hit, the rest
> reuse the generated binaries, and the source control doesn't need to be
> aware of it.

Amen to that.

> someone commit rights it becomes an honor system no matter what software
> you're using.

Checking in large blobs of "mystery meat" on the honor system just leads to chaos.
Posted Apr 8, 2010 23:13 UTC (Thu) by bronson (subscriber, #4806)
I notice you guys are ignoring my main points about audio and video files, and cross compilers that cost a lot of dough per seat. OK, fine, let's restrict this discussion to just native compiling. Even in this specialized case, anyone who's kept a distributed ccache up and running might be skeptical of Avi's advice.
Executables are WAY more backward compatible than object files. If you can ensure that everyone is running the exact same minor version of gcc and libraries, ccache would probably work. In most dev shops, where there's a crazy mix of personal favorite Linux distros plus a bunch of custom-compiled shared libs, I'm pretty sure trying to keep everyone on ccache will cost you a lot more time than it saves. (spoken from my bitter experience of trying to do this in 2006).

Different strokes, right? You'll use whichever technique is best for your shop. That might be ccache, custom scripts pulling binaries off fileservers, or just checking them right into source control. Each one has its place.
Posted Apr 30, 2010 18:44 UTC (Fri) by cmccabe (guest, #60281)
When you check in _code_, a skilled coder can look at your change and figure out what it is doing. When you check in a _binary_, there is no obvious way to figure out how it differs from the binary that was previously there. Sure, you could disassemble it and run a detailed analysis, but realistically, that's not going to happen. Hence, it's "mystery meat."
> I notice you guys are ignoring my main points about audio and
> video files
No, I totally agree with your points regarding audio and video. I hope that git will be extended to support working with these large files more effectively.
> Executables are WAY more backward compatible than object files. If
> you can ensure that everyone is running the exact same minor version
> of gcc and libraries, ccache would probably work. In most dev shops,
> where there's a crazy mix of personal favorite Linux distros plus
> a bunch of custom-compiled shared libs, I'm pretty sure trying to
> keep everyone on ccache will cost you a lot more time than it saves.
> (spoken from my bitter experience of trying to do this in 2006).
You are doing it wrong. Set up a chroot environment with the proper libraries and compiler. Look up "cross compiling with gcc."
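A hedged sketch of the chroot suggestion: run the compiler inside a build root that pins the exact toolchain and libraries, so the objects do not depend on the host distro. The build-root path and file names below are made up, this needs root to run, and the plain chroot command or a proper cross-compilation setup achieves the same thing:

    #include <sys/wait.h>
    #include <unistd.h>
    #include <cstdio>

    int main() {
        // Hypothetical directory tree containing the agreed-upon gcc and libraries.
        const char* buildRoot = "/opt/buildroots/team-gcc";

        pid_t child = fork();
        if (child == 0) {
            // Enter the pinned environment, then compile exactly as usual.
            if (chroot(buildRoot) != 0 || chdir("/") != 0) {
                std::perror("chroot");
                _exit(1);
            }
            execl("/usr/bin/gcc", "gcc", "-O2", "-c", "src/foo.c", "-o", "obj/foo.o",
                  static_cast<char*>(nullptr));
            std::perror("execl");   // only reached if exec failed
            _exit(1);
        }

        int status = 0;
        waitpid(child, &status, 0);
        return WIFEXITED(status) ? WEXITSTATUS(status) : 1;
    }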
Posted Apr 6, 2010 16:53 UTC (Tue) by Spudd86 (subscriber, #51683)
Why are you putting the bytecode in the repository? If it's coupled to changes in the C/C++ source, it should be built at the same time as all your native code...