Hamano: GitTogether 2011

Git maintainer Junio C. Hamano reports on GitTogether 2011 on the Google Open Source blog. A two-day "unconference" event was held at Google's Mountain View headquarters to discuss various Git features, including: "Support for large blobs that would not fit in the memory has been always lacking in Git. There recently has been a lot of work in the native support (e.g. storing them straight to the object store without having to read and hold the whole thing in core, checking out from the object store to the working tree without having to hold the whole thing in core, etc.). There are a few third-party tools and approaches with their own pros-and-cons, but it was generally agreed that adding a split-object encoding like Avery Pennarun's "bup" tools uses would be the right way to help support object transfer between repositories to advance the native support of large objects in Git further."
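The "split-object encoding" mentioned in the quote can be illustrated with a minimal sketch of content-defined chunking. This is not bup's actual algorithm: the rolling sum, the `WINDOW` size and the `MASK` value are assumptions chosen for brevity, and a real tool would use a stronger rolling checksum and also bound the chunk sizes. The idea it shows is that chunk boundaries depend only on nearby bytes, so an insertion near the start of a large blob leaves most later chunks byte-identical and therefore shareable between revisions and repositories.

```python
WINDOW = 64            # bytes covered by the rolling sum (assumed value)
MASK = (1 << 11) - 1   # cut when the low 11 bits of the sum are all set (assumed)


def split_chunks(data: bytes):
    """Yield consecutive chunks of data, cutting at content-defined boundaries."""
    start = 0
    rolling = 0
    for i, byte in enumerate(data):
        rolling += byte
        if i >= WINDOW:
            rolling -= data[i - WINDOW]   # drop the byte that left the window
        if (rolling & MASK) == MASK:      # decision depends on the window only
            yield data[start:i + 1]
            start = i + 1
    if start < len(data):
        yield data[start:]                # trailing chunk without a boundary
```

Storing each chunk under its own hash means two versions of a blob that differ by a small edit share almost all of their chunks, which is what makes transfer between repositories cheap.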

Hamano: GitTogether 2011

Posted Dec 10, 2011 1:22 UTC (Sat) by nelhage (subscriber, #59579) [Link]

I see some mention of submodules getting better. Does anyone have a link for a good synopsis of recent changes to git submodule support? I really want submodules to be the right answer for many different problems that I have, but I've found them to be unusably unwieldy in almost every case I've tried to use them. I remain optimistic they'll get there eventually, however...

Hamano: GitTogether 2011

Posted Dec 10, 2011 3:00 UTC (Sat) by joey (subscriber, #328) [Link]

I was at the GitTogether this year.

The submodule improvements have not AFAIK landed yet. One feature that's being considered is a "floating" submodule that tracks the head of a remote branch, rather than staying pinned as submodules do now. The rest of it, and the parts that have been worked on so far, as far as I remember, mostly consisted of making various commands recurse into submodules. (There might have also been some plans to consolidate the submodules' .git directories in the toplevel .git?)

Hamano: GitTogether 2011

Posted Dec 10, 2011 5:53 UTC (Sat) by leif81 (guest, #75132) [Link]

More about floating submodules. If it interests you and you want to see it happen, post to the list!

http://comments.gmane.org/gmane.comp.version-control.git/...

Hamano: GitTogether 2011

Posted Dec 11, 2011 9:13 UTC (Sun) by rvfh (subscriber, #31018) [Link]

> (...) the parts that have been worked on so far, as far as I remember, mostly consisted of making various commands recurse into submodules.

Isn't this the first obvious thing we would expect from submodules? That and automatic merging of course ;-)

Hamano: GitTogether 2011

Posted Dec 13, 2011 1:47 UTC (Tue) by bronson (subscriber, #4806) [Link]

That and keeping all objects in the root .git so switching branches doesn't blow away unpushed work on the submodule.

(admittedly, it's git clean -df that actually blows the work away, but the fault lies in the current implementation of submodules, not git clean or the careless developer)

Preventing clean -df from blowing away unpushed work in submodules

Posted Dec 13, 2011 2:54 UTC (Tue) by jrn (subscriber, #64214) [Link]

Hamano: GitTogether 2011

Posted Dec 13, 2011 15:43 UTC (Tue) by daglwn (subscriber, #65432) [Link]

I would really like to see git-subtree mainstreamed. It is extremely useful, fairly intuitive and behaves just as I thought submodules should.

Large binary blobs.

Posted Dec 10, 2011 7:25 UTC (Sat) by tshow (subscriber, #6411) [Link]

I would be really, *really* happy if git were to improve in its handling of binary blobs. We're using git for game development, and while it's wonderful in almost every respect, the handling of binary blobs isn't what I'd like it to be.

Unfortunately, games tend to have piles of binary blobs. Textures, sound, models, that kind of thing. Usually orders of magnitude larger than the code base in size.

Of course, the occasional papercut from git is almost refreshing after the constant maiming that perforce and subversion visited on us. I find in particular that git is the tool I was always imagining when trying to merge subversion or perforce branches and thinking "this can't possibly be the best we can do...".

Large binary blobs.

Posted Dec 10, 2011 13:42 UTC (Sat) by hmh (subscriber, #3838) [Link]

Until VLBB (very large binary blob) support lands in git, or if you want to use git to track stuff that really doesn't benefit much from delta compression anyway, you can use git-annex.

http://git-annex.branchable.com/

Large binary blobs.

Posted Dec 11, 2011 19:08 UTC (Sun) by tshow (subscriber, #6411) [Link]

Looks interesting, but unfortunately we have to be able to run on multiple platforms (linux, bsd, osx, windows), and I don't want to have to set Haskell up on all of those for each machine. Maybe I'll look at media-git; it seems to be using ruby at least, which we're using extensively for tools and support code already.

Large binary blobs.

Posted Dec 12, 2011 8:58 UTC (Mon) by Tobu (subscriber, #24111) [Link]

You don't need to install Haskell anywhere but on the machine that builds git-annex. There is no windows port of git-annex at the moment though.

Large binary blobs.

Posted Dec 12, 2011 15:39 UTC (Mon) by tshow (subscriber, #6411) [Link]

Unfortunately, that's a deal killer. The only reason I run windows (or OSX, for that matter) at all is for game development, because some platforms simply require it. I tried running windows on a virtual machine and letting Linux manage the underlying filesystem (so I could do everything in Linux except compile/run), but at the time I tried it the game devkit reliably bluescreened the windows session as soon as the game binary tried to upload.

The game company said "we don't support that, not going to fix", and VMWare and Parallels couldn't do anything for us because we couldn't legally give them any access to the devkit.

I can't tell you how much I'm hoping Raspberry Pi takes off. Or something like it.

Large binary blobs.

Posted Dec 12, 2011 19:47 UTC (Mon) by joey (subscriber, #328) [Link]

It's an open question how much longer windows support will matter. :) However, there's nothing in git-annex's design that prohibits it being used on windows, as long as git on windows can check out a repository that contains symlinks, translating them to the native windows equivalent if necessary. I believe that there's a Cygwin git for windows that can do that.

There's a lot of POSIX in git-annex's unix implementation, but its data model is just symlinks and plain text data files, so someone who really wanted to could write a batch script or c# clone of git-annex on windows.

Alternatively, if someone can manage to build a ghc that targets cygwin on windows, git-annex could just be built and work with few changes. I know it's been done before, but it's "unsupported" and I've never found such a ghc binary.

Alternatively alternatively, someone who is a native windows programmer could sit down and replace all the current POSIX calls in git-annex with native windows equivalents.

It's not that easy...

Posted Dec 12, 2011 22:42 UTC (Mon) by khim (subscriber, #9252) [Link]

> However, there's nothing in git-annex's design that prohibits it being used on windows, as long as git on windows can check out a repository that contains symlinks, translating them to the native windows equivalent if necessary. I believe that there's a Cygwin git for windows that can do that.

It's not easy. Microsoft implemented symlinks in Vista, but, as if to make sure people would not use them, added two obstacles:
1. You must know in advance whether your symlink points to a directory or to a file.
2. You need an administrative privilege (complete with UAC flash and everything) to create symlinks (specifically, you need SeCreateSymbolicLinkPrivilege).
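The first obstacle is visible even in portable code. As a minimal sketch, Python's `os.symlink` takes a `target_is_directory` argument that matters on Windows and is ignored elsewhere; the helper name `make_symlink` is hypothetical. On Windows the call may still fail with `OSError` if the process lacks the privilege described in the second point.

```python
import os


def make_symlink(target: str, link_path: str) -> None:
    """Create link_path pointing at target, stating up front what target is.

    target should be absolute (or relative to the current directory), because
    it is inspected here before the link is created.
    """
    # On Windows the link type (file or directory) is fixed at creation time,
    # so the caller has to know it in advance. POSIX systems ignore the flag.
    is_dir = os.path.isdir(target)
    os.symlink(target, link_path, target_is_directory=is_dir)
```

A tool that checks out symlinks from a repository has a harder time than this helper, since the target may not exist yet when the link is created.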

Cygwin does not use native Windows symlinks for these reasons (it can read them but will never create them). It uses files with the "SYSTEM" attribute set to do the trick. This works in Cygwin, but the price is horrible: file access is veeery slow in Cygwin. And of course native Windows programs cannot see these symlinks at all (at best they see a file with the "SYSTEM" attribute).

Most Windows users use msysgit because it is many times faster than Cygwin's git, and then symlinks are not an option at all.

Large binary blobs.

Posted Dec 11, 2011 9:33 UTC (Sun) by Cyberax (✭ supporter ✭, #52523) [Link]

I have friends in gamedev. One pretty large game tools company uses git for source but puts ALL binary resources into a separate directory side-by-side with git source. During the build process binary resources are hard-linked into the resulting compiled distribution.

Apparently, designers really really like locking which is kinda hard to do with git.

Large binary blobs.

Posted Dec 11, 2011 19:06 UTC (Sun) by tshow (subscriber, #6411) [Link]

A lot of game companies used "visual source safe" for revision control for a long time, and as a result have very peculiar beliefs about proper revision control practice.

For that matter, the company I started at used MKS RCS, which was mandatory locking. If you "checked out" a file, it got locked in your name. There's a whole process of unlearning you have to go through to move to a system without locks. Aside from anything else, the whole problem of merging (which git has made nearly painless) made lockless revision control look like chthonic madness if you've only ever dealt with locking systems.

Combine that with the usual managerial problems. One place I worked, we wanted revision control, and we wanted to use perforce (it was the late 90s). Management did some backroom voodoo and decided that what we should get was StarTeam, because it was Java and had Integrated Email! You can imagine how that worked out. Another place I worked, a dotcom, had reached the point where Visual Source Safe was unusable; it would corrupt its database within 15 minutes of going live, reboot, spend three hours reconstructing its database, repeat. They replaced it with Harvest, because (according to the head of IT) "We're an Oracle shop, and it uses Oracle.". So I said, "Wait, WTF do we have that runs on Oracle?".

"Harvest!"

Most of us rolled our eyes, but there was one developer who was thrilled. He went running off to convert things. Ten minutes and much cursing later, he came back. I don't remember the exact numbers, but he said it was something like 22 mouse clicks per file to check in a changeset.

So one of the developers who ran a few internal servers put Subversion on them, and we quietly used that instead. Every once in a while, somebody would take a tar.gz snapshot of the head revision and check it into Harvest. IT never noticed, as far as I know.

As an unrelated aside, the guy who ran the servers also had one of the best random number generators I've seen. A lava lamp with a cheap webcam pointed at it. It was generating entropy for an MMO action game.

These days most of the people I've talked to are using either Subversion or Perforce, though we're evangelizing git as we can. We used Subversion for a long time, but git is just plain better in every way. We actually also used TLA for a year or so and Perforce before that. TLA was excellent at the time, but had a couple of gaping holes we couldn't live with.

From a game development point of view, one of the important things is that there's a lot of data, and a lot of the code is very data-driven by necessity. Which means it's quite possible if you have things set up wrong that a junior artist can check in some art that breaks the build. This can be a great test of your team's patience if you're running Perforce with the art directories set to "retain head revision only" and you have an important deadline in seven hours.

Ultimately (ramble mode off...) game development really wants to happen in a single repository. The code and binary assets are interdependent. You can work that way with git as it stands (and we do...), but it would be nice if git were to have more support for that model.

Large binary blobs.

Posted Dec 12, 2011 1:39 UTC (Mon) by ebiederm (subscriber, #35028) [Link]

To get away from locking someone would need to implement merging for your binary blobs, or a process that ensures only one person works on a binary blob at a time.

I wonder if anyone is working on merge algorithms for anything besides text files.

Large binary blobs.

Posted Dec 12, 2011 2:37 UTC (Mon) by tshow (subscriber, #6411) [Link]

IIRC AlienBrain (proprietary) has some of this. I've seen a few image diff tools and the like. The main problem (especially in games) is that there are so many *kinds* of data; textures in various formats, audio in various formats, potentially movies, 3D models (there's a whole mess in itself...), byte compiled scripts, game-specific data, font metadata...

The diff tool needs to understand what it's diffing. The best a revision control tool like git can do for arbitrary binaries is to expose a mechanism for associating specific diff tools with specific files (or types, but that's its own ball of hair).

Since games are still very performance oriented (and will likely remain so for a long time), game developers often tend to try to bake everything into formats that require minimal runtime processing. So, for instance, the Wii version of a game uses assets baked in formats the Wii hardware understands, so it's just a matter of loading them and throwing them at the hardware without having to swizzle, align or endian swap or anything.

Large binary blobs.

Posted Dec 12, 2011 8:33 UTC (Mon) by ldo (subscriber, #40946) [Link]

tshow:

Since games are still very performance oriented (and will likely remain so for a long time), game developers often tend to try to bake everything into formats that require minimal runtime processing.

Can’t the baking be done as an automatic part of the build process? Then the repo only needs to store the inputs, not the baked results. And there’s one less lot of things to get out of sync and produce bad builds.

Large binary blobs.

Posted Dec 12, 2011 9:02 UTC (Mon) by Tobu (subscriber, #24111) [Link]

Distributing pre-built stuff close to the VCS could save time. Slave branches (that autobuild whenever their master changes), with the option of not actually storing the built data on git (but using something like git-annex), would be pretty good for that.

Large binary blobs.

Posted Dec 12, 2011 15:01 UTC (Mon) by cortana (subscriber, #24596) [Link]

Then builds would take too long, require you to install too many tools on the developers' machines, and you run the risk of two builds not producing precisely the same output (depending on exactly what the build is doing).

All insane reasons, but they crop up, nonetheless.

Large binary blobs.

Posted Dec 12, 2011 15:32 UTC (Mon) by tshow (subscriber, #6411) [Link]

> Can’t the baking be done as an automatic part of the build process? Then the repo only needs to store the inputs, not the baked results. And there’s one less lot of things to get out of sync and produce bad builds.

You can do that; that's certainly what we do on Linux most of the time. I prefer doing that when I can. You would not believe how many games have lost their source assets; several companies I worked at simply could not rebuild any game (shipping or not) that had been out for more than a year or so, because all the source assets were erased to make space and only the baked assets remain. So I definitely favor build-time conversion, and even when I can't do build-time conversion I keep the raw assets in the tree.

There are a couple of problems with build-time conversion that crop up, however:

One problem is processing time. We've never given in to the C++ koolaid here, so on a fast machine a full clean build of a typical game (engine, assets, game source) is typically under two minutes. We use a somewhat handrolled build process to make it happen that fast, but that's build time on a single machine (ie: no dist-cc, no ccache).

Some assets, however, are expensive to produce from source. As a general example, pre-baking lighting on a complex model (think: game level) can be quite expensive. As a more specific example, one platform we worked on recently required a lot (about 3 hours) of CD-quality music in a proprietary (undocumented) format, and the conversion tool supplied was source-unavailable and hideously slow. Converting the music from .wav files to the proprietary format took about 40 minutes on the fastest machine we had. Oddly, it didn't take much longer on slower machines; we strongly suspected it was doing I/O stupidly. Probably reading one sample at a time and jumping all over the files. But regardless, without hacking the format and writing our own converter there was nothing we could do to fix it.

In these cases, we split the build so that the cheap stuff can be rebuilt every time and the expensive stuff gets checked in. We tried caching, but it still meant when the big files were changing regularly that any pull potentially meant your next build would take too long. And ultimately, it still had us storing large binaries in the repository either way; the source files and the baked files were similar sizes, with the baked files being slightly smaller. So we threw our hands up and just checked in the baked files, so only one person had to eat the long build.

Another problem is that some platforms (particularly the ones that are very IDE-driven) are hostile to integrated builds. XCode, for example (spit, spit), does things under the hood that are apparently not available from the command line. When you build for iThings, there are several steps in the process including plist compilation and signing the binary that xcode handles with built-in tools that have no command-line equivalent. There's also no way on XCode to launch/debug a binary on the device from the command line. So you wind up stuck using the IDE, at least as a "build/run" button.

XCode has (sort of) support for "pre-build steps", but it's the usual problem with such things; the IDE isn't expecting feedback from the script, so it may or may not recognize when the asset build fails, and IDEs are also really flaky about code generated in "pre-build steps". We have a few tools that generate code; the simplest example is a tool that converts a yaml file describing (say) characters to a C file containing structure definitions and enumerations. With a standard unified build, the asset build can drop that in the code directory and when the code goes to build it just happens. With IDE-based systems, sometimes the dependency tree is calculated when you push the "build" button, with obvious consequences.

Some IDEs are even stingier. I remember asking a CodeWarrior tech what I needed to do in order to automate my asset build (this would have been around 2001), and got back a response that started "Ok, uh... you really want to do that? Why? Are you sure you couldn't do it some other way? Uh... ok. I think... Ok, you'll need to put your build phase in a DLL...".

At one point on that project the team got so frustrated with CodeWarrior that they took a CodeWarrior box and manual out to the parking lot, burned it, took pictures as it burned, and mailed them without comment to everyone at CodeWarrior they had email addresses for. Apparently that's a good way of getting a lot of free stuff out of a company.

At any rate, some of the platforms you run into in game development are simply hostile to sane build processes, so you wind up baking some or all of your assets in a separate build. At which point, that's generally what you want to check in.

Large binary blobs.

Posted Dec 13, 2011 21:59 UTC (Tue) by elanthis (guest, #6227) [Link]

Merging of binary blobs just doesn't make sense, though. Think of merging two PNGs: what would the merged changes to two overlapping blocks of pixels look like when you're working with zlib-compressed data? It is the same reason that line-oriented diff/merge tools totally barf on non-pretty-printed XML: the changes are detectable when you work with the native data structure, but not with the encoded byte stream in which that structure is stored on disk.

What you need in an ideal world is the ability to specify different merge tools for different file types and then to provide image merges and such for common formats (including recursive merges for compressed files and even for "file system in a file" container formats like archive files). Of course all those toolkit/engine/application specific binary formats need a custom merge tool as well, which is a huge pain in the butt.

In the end, it's just way way way easier to use locking on resources that can't reasonably be merged.

There's also the issue of frequently changing large files, which don't play well with git anyway. If an artist is updating a 30MB rigged, animated, textured model over and over and wants to checkpoint/commit each major change, that git repo explodes in size, and every clone or pull is equally massive since it has to download a complete copy of every revision of every file. Great for source, horrific abomination for most types of content.

Mercurial is addressing this with recent versions that support a look aside file store for large files, so the repo just knows which file id to use for each commit, only pulls the latest version, and just refuses to do diffs on those files (which you can't do anyway, so no loss there).
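The look-aside idea can be sketched in a few lines. This is not Mercurial's actual on-disk format or API; the function names `stash` and `materialize` and the use of SHA-256 are assumptions made for illustration. The repository would commit only the short id returned by `stash`, while the bulky content lives in a side store and is fetched only for the revision actually checked out.

```python
import hashlib
import os
import shutil


def stash(path: str, store_dir: str) -> str:
    """Copy a large file into the side store and return the id to commit."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Hash in 1 MiB blocks so the whole file is never held in memory.
        for block in iter(lambda: f.read(1 << 20), b""):
            digest.update(block)
    file_id = digest.hexdigest()
    os.makedirs(store_dir, exist_ok=True)
    dest = os.path.join(store_dir, file_id)
    if not os.path.exists(dest):          # identical content is stored once
        shutil.copyfile(path, dest)
    return file_id


def materialize(file_id: str, store_dir: str, out_path: str) -> None:
    """Recreate the working-tree file for one revision from its id."""
    shutil.copyfile(os.path.join(store_dir, file_id), out_path)
```

Because history holds only ids, a clone can skip every old version of the 30MB model and download just the one it needs.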

Large binary blobs.

Posted Dec 13, 2011 22:30 UTC (Tue) by dlang (✭ supporter ✭, #313) [Link]

> What you need in an ideal world is the ability to specify different merge tools for different file types and then to provide image merges and such for common formats (including recursive merges for compressed files and even for "file system in a file" container formats like archive files).

git provides you the ability to specify tools for diff, patch, and merge of different file types (along with very flexible ways to define 'file types', not just file extensions)

there is a lack of good tools for doing the diff patch and merge for most formats, but if you can find the tools, git can use them appropriately.
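As a sketch of what such a tool might look like, here is a three-way merge that works on the parsed data structure rather than on lines, with JSON standing in for a structured asset format. The driver name `jsonkeys` and the script name `jsonmerge.py` are hypothetical. The hookup uses git's custom merge driver mechanism: a `.gitattributes` line such as `*.json merge=jsonkeys`, and a config entry such as `git config merge.jsonkeys.driver "python3 jsonmerge.py %O %A %B"`, where git substitutes the ancestor, current and other versions and expects the result to be left in the `%A` file with a zero exit status on success.

```python
import json

_MISSING = object()   # marks a key that is absent on one side


def merge3(base: dict, ours: dict, theirs: dict):
    """Three-way merge at top-level key granularity.

    Returns (merged, conflicts), where conflicts lists keys that both sides
    changed in different ways.
    """
    merged, conflicts = {}, []
    for key in sorted(set(base) | set(ours) | set(theirs)):
        b, o, t = (d.get(key, _MISSING) for d in (base, ours, theirs))
        if o == t:            # both sides agree (including both deleted)
            pick = o
        elif o == b:          # only their side changed
            pick = t
        elif t == b:          # only our side changed
            pick = o
        else:                 # both changed differently
            conflicts.append(key)
            pick = o
        if pick is not _MISSING:
            merged[key] = pick
    return merged, conflicts


def run_driver(base_path: str, ours_path: str, theirs_path: str) -> int:
    """Merge three JSON files; write the result to ours_path and return 0,
    or return 1 without writing if any key conflicts."""
    docs = []
    for path in (base_path, ours_path, theirs_path):
        with open(path, encoding="utf-8") as f:
            docs.append(json.load(f))
    merged, conflicts = merge3(*docs)
    if conflicts:
        return 1
    with open(ours_path, "w", encoding="utf-8") as f:
        json.dump(merged, f, indent=2, sort_keys=True)
        f.write("\n")
    return 0
```

A small wrapper would pass `sys.argv[1:4]` to `run_driver` and exit with its return value. Formats like textures or rigged models would need far more than this, which is the "lack of good tools" problem.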

Using git to store large binary blobs (LibreOffice bibisect)

Posted Dec 10, 2011 11:28 UTC (Sat) by mjw (subscriber, #16740) [Link]

I was recently surprised at how well git is actually already handling large binary blobs. LibreOffice has a repository of binary builds for testers:

What is bibisect? And what is it doing in my office?
http://sweetshark.livejournal.com/7683.html

"53 complete Linux 64-bit office installs [...] at 450MB each, that would be ~22GB total, however, it is only 749MB total download size, that is less than 15MB per installation. And one does not need to install them in parallel as one can switch through all of them with a quick "git checkout source-hash-XXXXXX" -- one switch costs <1 second."
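The figures in the quote can be checked with a few lines of arithmetic. The inputs are taken directly from the quoted text; the variable names are only for illustration.

```python
installs = 53
size_per_install_mb = 450      # uncompressed size of one office install
download_mb = 749              # total size of the git repository download

naive_total_mb = installs * size_per_install_mb   # 23850 MB if stored separately
per_install_mb = download_mb / installs            # cost of each install in git
ratio = naive_total_mb / download_mb               # overall space saving factor

print(f"separate copies: {naive_total_mb} MB")
print(f"per install in git: {per_install_mb:.1f} MB")
print(f"saving factor: {ratio:.0f}x")
```

The saving comes from the builds being near-duplicates of one another, which is the case delta compression handles well; unrelated large assets would not shrink this way.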

binaries in git

Posted Dec 10, 2011 16:40 UTC (Sat) by tialaramex (subscriber, #21167) [Link]

I am reminded of Johnson's comment about a dog walking on its hind legs (ignoring its social context, let's not start an argument about what Johnson may or may not have thought about women or religion)

We too have one git repository full of binaries (candidate builds of our custom in-house application), against my better judgement but in the knowledge that building something custom which solves the same problems for us would mean more effort taken off something else.

It's not fast, and it's not elegant, but it's faster than I thought it might be, and less crude than I expected. If some day an upgrade (plus maybe a few one-shot commands to set wheels in motion) makes it magically faster or smoother, that would be very welcome.

Using git to store large binary blobs (LibreOffice bibisect)

Posted Dec 12, 2011 13:07 UTC (Mon) by cesarb (subscriber, #6266) [Link]

> I was recently surprised at how well git is actually already handling large binary blobs.

These blobs are probably not very large. The main problem with large blobs AFAIK is that git likes to have whole blobs in memory; if each blob in that repository is for instance 45MB (guessing a number here, it is probably less than that), and you have several gigabytes of RAM and address space, git can deal with it fine.

It is when your blobs grow to hundreds or thousands of megabytes that git starts having problems.
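The alternative to holding a whole blob in memory is to stream it. As a minimal sketch, the object name git assigns to a blob (in a SHA-1 repository) is the hash of a short header followed by the content, so it can be computed in fixed-size pieces given the size up front. The function name `git_blob_id` and the 1 MiB chunk size are assumptions; this only illustrates the streaming pattern, not how git itself is implemented.

```python
import hashlib
import io


def git_blob_id(stream, size: int, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-1 object name of a blob without loading it whole."""
    h = hashlib.sha1()
    h.update(b"blob %d\0" % size)      # object header: type, size, NUL
    while True:
        block = stream.read(chunk_size)
        if not block:
            break
        h.update(block)                # memory use stays at one chunk
    return h.hexdigest()


print(git_blob_id(io.BytesIO(b"hello\n"), 6))
```

The result matches what `git hash-object` reports for the same content, and peak memory stays at one chunk regardless of whether the blob is 45MB or several gigabytes.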

Using git to store large binary blobs (LibreOffice bibisect)

Posted Dec 12, 2011 19:38 UTC (Mon) by joey (subscriber, #328) [Link]

That's one of the problems. I expect git will continue to get better at not eating memory with large blobs.

Another problem is git-gc --auto not being any fun at all when you have large blobs. This can already be worked around by forcing large blobs to stay in loose objects, and I expect git will continue to improve support here too until the current workarounds are not needed.

Yet another problem is that requiring double the disk space, for .git/ and the working tree, can become overly expensive when using git with large blobs on, e.g., a netbook. I don't think this is being addressed in git yet.

To me the real killer issue is that, once you're storing large quantities of data in git (think terabytes), you probably don't want to have a copy of all your data in each cloned .git repository. You may have more data in git than fits on any one drive or system. You need the ability to manage and track what data is stored where, and the ability to transfer it around between repositories. You need to be able to archive data away when you're probably done with it, and not continue to have it clutter up repositories. You need to ensure that your requirements for the durability of the data are met while doing all this. Git has few facilities to help with this if you've added massive amounts of large data to a git repository. (Shallow clones and git subtree are stabs toward it, but not suitable solutions.)

git-annex is written especially to handle that last case, comprehensively and well, while also avoiding whatever scalability problems git has with large data, now or in the future.

Hamano: GitTogether 2011

Posted Dec 10, 2011 19:19 UTC (Sat) by jengelh (subscriber, #33263) [Link]

What worries me is the amount of memory git clone uses. Cloning the linux kernel makes the git processes on the server side go beyond 400M RSS according to top, even if there is almost no reason to repack:

<server>$ git count-objects -v
count: 153
size: 1188
in-pack: 3152753
packs: 13
size-pack: 1371339
prune-packable: 24
garbage: 0
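For readers unfamiliar with this output, a short sketch that parses it and converts the sizes may help. The text is copied from the listing above; the function name `parse_count_objects` is hypothetical. The `size` and `size-pack` fields are reported by git in KiB.

```python
COUNT_OBJECTS_OUTPUT = """\
count: 153
size: 1188
in-pack: 3152753
packs: 13
size-pack: 1371339
prune-packable: 24
garbage: 0
"""


def parse_count_objects(text: str) -> dict:
    """Turn 'key: value' lines from `git count-objects -v` into a dict of ints."""
    stats = {}
    for line in text.splitlines():
        key, _, value = line.partition(":")
        if value.strip():
            stats[key.strip()] = int(value)
    return stats


stats = parse_count_objects(COUNT_OBJECTS_OUTPUT)
pack_gib = stats["size-pack"] / (1024 * 1024)    # KiB -> GiB
print(f"{stats['in-pack']} packed objects in {stats['packs']} packs, "
      f"{pack_gib:.2f} GiB on disk; {stats['count']} loose objects")
```

Read this way, the repository is almost entirely packed already (about 1.3 GiB in 13 packs, with only 153 loose objects), which is the point being made about there being little left to repack.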

Hamano: GitTogether 2011

Posted Dec 11, 2011 5:26 UTC (Sun) by xxiao (subscriber, #9631) [Link]

I had to push 1.4G of source to the server and it never completed; I had to make it smaller and push it in multiple steps instead. So yes, git could deal with memory better.

As for binary blobs, format-patch just can't really deal with them, especially when the blob is large.

Hamano: GitTogether 2011

Posted Dec 11, 2011 9:16 UTC (Sun) by rvfh (subscriber, #31018) [Link]

OTOH, one could argue that Git was designed to handle text files, not binary files, and that binary files should probably be handled in a completely different way (FTP/HTTP server with backup comes to mind.)

Hamano: GitTogether 2011

Posted Dec 11, 2011 13:39 UTC (Sun) by karath (subscriber, #19025) [Link]

Git was designed to meet the needs of 1 person: Linus Torvalds. His need was to manage the Linux kernel development process, which does not include multi-gigabyte "assets".

It appears that Linus has relatively good taste (somewhat like Steve Jobs) in that he built the tool that he wanted to use and now many others want to use that tool in ways he never imagined. Unlike Steve Jobs' designs, the design of Git does not enforce specific implementations. It appears that people are actively working on making git better at handling such "assets".

regards,
Charles

Hamano: GitTogether 2011

Posted Dec 12, 2011 7:08 UTC (Mon) by sitaram (subscriber, #5959) [Link]

In my experience there are very few (types of) projects that need *large* binary blobs. Emphasis on "large".

Binary blobs are plenty -- a lot of documentation, images, etc., also gets checked in, but in all the projects at $DAYJOB I have seen/helped, no one has complained about the *size* being the issue.

Mergeability is. Lack of locking for such files is. (But locking is inherently difficult in a DVCS).

Hamano: GitTogether 2011

Posted Dec 12, 2011 8:34 UTC (Mon) by ldo (subscriber, #40946) [Link]

sitaram:

Lack of locking for such files is [the issue].

If you need locking, you’re doing it wrong.

Hamano: GitTogether 2011

Posted Dec 12, 2011 9:05 UTC (Mon) by sitaram (subscriber, #5959) [Link]

A lot of people, like it or not, use openoffice or (gasp!) MS Office for docs and those files cannot be merged using any automatic merge I am aware of.

The only way is out-of-(git)-band communication among team members to prevent Alice from touching a file Bob is working on -- something that is *not* required for text files.

Locking *is* the right answer for these cases. Except locking is fundamentally incompatible with *D*VCS.

Hamano: GitTogether 2011

Posted Dec 12, 2011 19:21 UTC (Mon) by Cyberax (✭ supporter ✭, #52523) [Link]

Uhm, they can be merged using MS Office itself (it has a 'merge' mode!).

Hamano: GitTogether 2011

Posted Dec 13, 2011 1:56 UTC (Tue) by bronson (subscriber, #4806) [Link]

It seems unlikely that Git could use that as a merge strategy.

Hamano: GitTogether 2011

Posted Dec 13, 2011 2:00 UTC (Tue) by Cyberax (✭ supporter ✭, #52523) [Link]

Why not? git can just leave both conflicting revisions of files to be merged, so you can start MS Word to resolve the conflict.

I remember doing this for SVN many years ago. It even had the simplest forms of automatic merge (like merging TOCs).

Hamano: GitTogether 2011

Posted Dec 13, 2011 2:11 UTC (Tue) by emk (guest, #1128) [Link]

MS Word has really extensive scripting APIs, and you can easily specify things like "turn on Track Changes, show metadata changes and comments, save the result as PDF". IIRC, you can do this using either Visual Basic for Applications or Word's .NET automation APIs. It's not very hard, either: if you can write Bash scripts, you can definitely figure it out.

And sure enough, some quick Googling reveals an external Subversion merge tool for Word that uses VBA:

http://stackoverflow.com/questions/2491954/merge-microsof...

Excel also offers some pretty extensive scripting APIs. But the rest of MS Office is much weaker, with virtually no scripting APIs at all.

Hamano: GitTogether 2011

Posted Dec 13, 2011 4:02 UTC (Tue) by sitaram (subscriber, #5959) [Link]

I see where I went wrong. I should have said "for example...".

I never imagined LWN of all places would get into discussions of scripting MS Office.

Hamano: GitTogether 2011

Posted Dec 13, 2011 13:44 UTC (Tue) by nix (subscriber, #2304) [Link]

More relevant than Office are things like 3D modellers. So you have two divergent 3D models. Can you merge them at all? If your authoring tool can't already do it you'll probably have to write a merge mode for said tool from scratch. (Good luck if it's closed-source.)

Hamano: GitTogether 2011

Posted Dec 15, 2011 15:31 UTC (Thu) by tshow (subscriber, #6411) [Link]

> (Good luck if it's closed-source.)

Well, actually, that's not as bad as it could be. All the big modelling packages have extensive scripting and plugin support; they pretty much have to in order to be competitive. So even the completely closed ones let you root around in their guts enough to do things like write model merge tools.

The real problem there is that merging differing models is going to be a highly interactive process, and probably a nightmarish one for everyone involved. User A has modified the specular component of this material and flagged these triangles as two-sided, while user B has deleted these vertices, moved those, and added a bump map...

If things were in completely different components (ie: user A modified the animation, user B modified the texture mapping) you could reasonably just have an "accept all changes" option, but otherwise you're potentially in a world of deep madness. Especially given that 3D models aren't as linear as text, so you can't exploit nearby context to the same extent.

Hamano: GitTogether 2011

Posted Dec 13, 2011 15:39 UTC (Tue) by cortana (subscriber, #24596) [Link]

Speaking as an OO.org user here--I don't know if my comments apply to MS Office as well--the 'merge' mode is too hard to use. I never managed to get my users to try it. To this day they don't know how to handle a conflict--they just check in the repo in its conflicted state (with common base, theirs and ours versions in 3 separate files) and then complain when later they think Subversion 'ate' their work...

Hamano: GitTogether 2011

Posted Dec 13, 2011 19:36 UTC (Tue) by louie (subscriber, #3285) [Link]

Office 2003's merge tool is not great, but better than OOo's, and Office 2010's much better. So it's really an OOo problem. (But there is no equivalent for merging, say, powerpoint, or excel, or... so on. Needless to say I think a github for docs, that made this stuff easier and more transparent, would be a godsend for collaborative document authors.)

Hamano: GitTogether 2011

Posted Dec 13, 2011 8:59 UTC (Tue) by renox (subscriber, #23785) [Link]

> If you need locking, you’re doing it wrong.

Like many definitive statements, this one is wrong, of course.
It all depends on the relative costs of merging vs waiting (and enforcing that someone locks files only for a short period of time). Sometimes merging can create costly errors, so I wouldn't say that locking is wrong all of the time, just most of the time.

Hamano: GitTogether 2011

Posted Dec 11, 2011 16:00 UTC (Sun) by xxiao (subscriber, #9631) [Link]

the problem is that these days we have some binary blobs embedded in the whole build source tree...

Copyright © 2011, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds