Meta's Sapling source-code management system

Meta's Sapling source-code management system

Posted Nov 16, 2022 16:15 UTC (Wed) by eplanit (guest, #121769)
Parent article: Meta's Sapling source-code management system

I'm not so impressed by the results of Meta's "ambition" so far. Maybe it's a good VCS, but it seems to come from quite a self-aggrandizing attitude: our problems are so unique, and solvable by nobody else except ourselves.



Meta's Sapling source-code management system

Posted Nov 16, 2022 16:27 UTC (Wed) by mgk (guest, #74833) [Link]

... you beat me to it. +1

Meta's Sapling source-code management system

Posted Nov 16, 2022 17:12 UTC (Wed) by Sesse (subscriber, #53779) [Link] (1 responses)

Large monorepos have been solved many times before, but I don't know if anyone else has published their solutions?

I'm not really sure what the client alone is good for, though. I assume there's no public server?

Meta's Sapling source-code management system

Posted Nov 16, 2022 17:18 UTC (Wed) by geofft (subscriber, #59789) [Link]

The client is capable of cloning Git repos - that's the example that they show in the blog post. I think the blog post is a little confusing because they've rolled up a whole bunch of good ideas into a single system (which is understandable: that's the system they were using internally). My reading of the blog post is that you can see some of the good ideas, like the way the CLI shows history and handles stacked changes, with Sapling pointed at a Git repo, but other good ideas, like whatever they've done to address large monorepos, require pointing it at a Sapling server.

The blog post says "You can now try its various features using Sapling’s built-in Git support to clone any of your existing repositories." and "Many of our scale features require using a Sapling-specific server and are therefore unavailable in our initial client release."

Note that in addition to the Sapling CLI, they're also releasing ReviewStack, an alternative user interface for reviewing GitHub pull requests. The code appears to be in the Sapling repo, and there's also a public instance of it at https://reviewstack.dev .

Meta's Sapling source-code management system

Posted Nov 16, 2022 17:13 UTC (Wed) by geofft (subscriber, #59789) [Link] (13 responses)

They're not terribly unique in reaching that conclusion, though, nor are they unique in trying to solve their own VCS scaling problems internally. Google has their own deeply proprietary VCS. Microsoft has a highly-customized version of Git, most of which they've recently upstreamed. My own employer, which is much smaller than either, has custom tooling (which we've open-sourced, but to my knowledge nobody else has adopted) because until about two years ago, thanks to work by GitHub (Microsoft), GitLab, and others, pure upstream Git wasn't even in the running.

I'm hoping the end result of this is something similar to what Microsoft did with Scalar - the good ideas from it get merged into upstream Git, instead of it becoming yet another standalone VCS.

This is also a pathway that Meta itself is familiar with: Instagram released their internal CPython fork Cinder https://github.com/facebookincubator/cinder not because they want people to use Cinder itself but because they want it as a public base of discussion to upstream the good ideas into actual CPython, so they can eventually drop the fork.

Meta's Sapling source-code management system

Posted Nov 17, 2022 3:56 UTC (Thu) by bartoc (guest, #124262) [Link] (1 responses)

(I work for Microsoft)

Yeah, the up-streamed stuff from gvfs/scalar is also pretty vastly improved over what was in the old fork. sparse index/worktree/clones are way better than gvfs because you don't need to worry about some random program enumerating the git repo (including getting file sizes) and causing gvfs to download everything. I had both TortoiseGit and WinDirStat do this, it's quite annoying.

Something not in upstream git that I would love to see is a way to automatically symlink/junction git submodules (the ones in .git/modules) to some central area. Scalar (the one from git-for-windows) does seem to _somehow_ do this, I think using the alternates mechanism and a shim clone/fetch command, but it's not super clean. I would be happy just having modules that point to exactly the same initial remote share a common location; it would at least make it harder to end up with 10 different copies of LLVM's repo on my machine.

Oh, another pretty easy win (on linux, at least, but perhaps on windows and mac with a compatibility shim) would be teaching git-checkout-index to use copy_file_range when available. The first "chunk" of the (unpacked) git object doesn't match the first "chunk" of the checked-out file so it's kinda filesystem specific if this works. And ofc on windows even if you have a compat shim almost nobody can use it because ReFS/btrfs/zfs are not widely used (I think those are all the CoW filesystems with windows implementations).
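For readers who have not used it, here is a minimal sketch of the copy_file_range pattern mentioned above, applied to two ordinary files. It is not how git works or would integrate it; the function name copy_with_fallback and the fallback policy are assumptions made for illustration. Whether the kernel actually shares extents depends on the filesystem, so the sketch falls back to a plain userspace copy when the call is missing or fails.

```python
import os
import shutil


def copy_with_fallback(src_path: str, dst_path: str) -> str:
    """Copy src_path to dst_path and return the name of the method used.

    Hypothetical helper: tries os.copy_file_range (Linux, Python 3.8+) and
    falls back to shutil.copyfileobj if it is unavailable or raises OSError.
    """
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        if hasattr(os, "copy_file_range"):
            try:
                remaining = os.fstat(src.fileno()).st_size
                while remaining > 0:
                    # The kernel may copy fewer bytes than requested, so loop.
                    copied = os.copy_file_range(src.fileno(), dst.fileno(), remaining)
                    if copied == 0:
                        break  # source ended earlier than its reported size
                    remaining -= copied
                return "copy_file_range"
            except OSError:
                # Unsupported here (e.g. ENOSYS, EXDEV): start over from scratch.
                src.seek(0)
                dst.seek(0)
                dst.truncate()
        shutil.copyfileobj(src, dst)
        return "fallback"
```

On a CoW filesystem the first path may complete without duplicating data blocks; elsewhere it is still a valid in-kernel copy, just without the space savings.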

Meta's Sapling source-code management system

Posted Nov 19, 2022 4:00 UTC (Sat) by sionescu (subscriber, #59410) [Link]

> sparse index/worktree/clones are way better than gvfs because you don't need to worry about some random program enumerating the git repo (including getting file sizes) and causing gvfs to download everything. I had both TortoiseGit and WinDirStat do this, it's quite annoying.

That would be throwing the baby out with the bathwater. Having a single total view of the repo has so many benefits that if some devtool can't cope with the size, then it's time to blacklist or fix it.

Meta's Sapling source-code management system

Posted Nov 17, 2022 10:43 UTC (Thu) by nysan (guest, #81015) [Link] (10 responses)

In the end, all monorepos end up into what clearcase used to be.
A big configspec to select different versions from different subdirectories of the monorepo. And then comes a corporate merger, and now you have two monorepos. :-O

Compare the above with a google-repo XML file in a git repo, describing multiple sub-git-repos.

It's essentially the same thing. Monorepo is just way worse.

Meta's Sapling source-code management system

Posted Nov 17, 2022 12:20 UTC (Thu) by khim (subscriber, #9252) [Link] (7 responses)

> In the end, all monorepos end up into what clearcase used to be. A big configspec to select different versions from different subdirectories of the monorepo.

How much time does this “end” need? AFAIK Google's hasn't devolved into that.

The trick is simple: don't provide means to combine two versions. Period. If you need two versions of some third-party code for some reason then you just create two directories. That's how the Python2 vs Python3 difference was handled in the Linux distros for years, too.

> And then comes a corporate merger, and now you have two monorepos. :-O

Why is that a problem? As long as you don't start weird automerger schemes and just treat code from another repo as “third party” and import code in the appropriate fashion everything works. Google does that with abseil AFAIK.

> Compare above with an google-repo XML file in a git repo, describing multiple sub-git-repos.

repo exists because Google needed something similar to a monorepo but open-sourced. I deal with it at my $DAYJOB. It kinda-sorta works but is just so flaky, cumbersome and unreliable compared to a normal monorepo.

Meta's Sapling source-code management system

Posted Nov 17, 2022 12:51 UTC (Thu) by nysan (guest, #81015) [Link] (6 responses)

"The trick is simple: don't provide means to combine two versions. Period."

OK, so you have 40 ppl working on feature X, and 40 ppl working on feature Y.
In the end, you need to integrate and test X and Y together, since they are dependent.
Merging X first, and Y second does compile, but can't be integration-tested.

How would you do this if you only allow a single CM version in the monorepo?

Meta's Sapling source-code management system

Posted Nov 17, 2022 14:07 UTC (Thu) by pkolloch (subscriber, #21709) [Link]

Incremental integration with feature flags.
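A minimal sketch of what that can look like, assuming a hypothetical in-process FeatureFlags store and made-up flag names (real systems usually load flags from configuration or a flag service). Both features land on trunk in small commits but stay off by default, so X and Y can be integration-tested together by enabling both flags in a test environment, without a long-lived feature branch.

```python
from dataclasses import dataclass, field


@dataclass
class FeatureFlags:
    """Hypothetical flag store: a feature is on only if its name is listed."""
    enabled: set[str] = field(default_factory=set)

    def is_on(self, name: str) -> bool:
        return name in self.enabled


def checkout_total(prices: list[float], flags: FeatureFlags) -> float:
    """Toy function touched by two teams; each change is guarded by a flag."""
    total = sum(prices)
    # Feature X (illustrative): 10% bulk discount for three or more items.
    if flags.is_on("feature_x_bulk_discount") and len(prices) >= 3:
        total *= 0.9
    # Feature Y (illustrative): flat shipping fee added after any discount.
    if flags.is_on("feature_y_shipping"):
        total += 5.0
    return round(total, 2)


# Production keeps both flags off; an integration test turns both on.
production = FeatureFlags()
integration = FeatureFlags({"feature_x_bulk_discount", "feature_y_shipping"})
```

With the flags off, trunk behaves exactly as before, which is what lets 80 people commit to it while the features are still incomplete.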

Meta's Sapling source-code management system

Posted Nov 17, 2022 14:36 UTC (Thu) by khim (subscriber, #9252) [Link] (4 responses)

> Merging X first, and Y second does compile, but can't be integration-tested.

If that's a monorepo then there is no merging. Individual commits are merged, of course (you cannot have a few thousand people working on the same code and not have some conflicts) but features are never implemented in branches.

Branches are for bugfixes.

Android couldn't follow that model 100% because of organisational issues, but it tries.

> OK, so you have 40 ppl working on feature X, and 40 ppl working on feature Y.

That just means that 80 people are committing to the trunk, what's the problem?

Meta's Sapling source-code management system

Posted Nov 17, 2022 22:06 UTC (Thu) by NYKevin (subscriber, #129325) [Link] (3 responses)

> Branches are for bugfixes.

Not even. At least in my experience, branches are for releases. You branch at a point where the build is green, do any cherrypicks that you need, cut a release on the branch, and that's it. No merging. The branch just gets abandoned (maybe we GC it eventually?).

The corollary to this: If your code does not run at HEAD, that's your problem. You cannot make your own private branch where you use some ancient version of libfoo that nobody else is willing to support. When libfoo updates, everyone is expected to update with it, or else your code stops building (and, eventually, stops running in production). Depending on the size and reasonableness of the breakage, the people who maintain (and/or vendor) libfoo will probably be expected to help you transition to the new version, or even to do it for you, but you can't just say "we like the old version better" and expect that to end the discussion.

The corollary to the corollary: You really want to have good test coverage, because the libfoo maintainers can't be reasonably expected to find the breakage if the tests all pass (or if there are no tests).

Meta's Sapling source-code management system

Posted Nov 17, 2022 22:25 UTC (Thu) by khim (subscriber, #9252) [Link]

This is similar to a crater run, I guess.

Only a crater run just ensures that the compiler can be updated (and not other libraries), while in a monorepo everything is supposed to work like that (but you can also update all the clients, which is the whole reason it's a monorepo).

Meta's Sapling source-code management system

Posted Nov 29, 2022 3:37 UTC (Tue) by brooksmoses (guest, #88422) [Link]

Yup; in my experience the "branches are for bugfixes" comes up when you need to fast-track a very specific bugfix, and so you cherrypick it onto the release branch of the existing release and then make a new release from that branch.

Meta's Sapling source-code management system

Posted Nov 29, 2022 13:59 UTC (Tue) by mathstuf (subscriber, #69389) [Link]

Hmm. We use topic branches for *all* development (there are a few exceptions; mainly automatic development version number bumps, but nothing manual). Branches for releases are `-s ours` merged into more recent branches (this preserves an "all history is reachable from HEAD" property and means we can trivially resurrect any old branch for maintenance as needed). But we also have strict vendoring rules and mangle everything to avoid conflicts with anything that could be loaded in the same process (such is life when you make SDK-like things, not end-user products).

Meta's Sapling source-code management system

Posted Nov 17, 2022 22:52 UTC (Thu) by bartoc (guest, #124262) [Link] (1 responses)

I broadly agree (also, note that if you have a multi-repo scheme like this you can use git namespaces to keep the separate heads, branches, and tags while using the same object store and thus possibly deduplicating more things).

However, I think repo is .... not that good. It's mostly submodules plus some features that are almost always not a good idea. I don't think this is really true of repo given its age and Gerrit integration, but a lot of these tools feel like someone reading that submodules were problematic somewhere and just reinventing them without really understanding them. Basically all the criticisms of submodules have easy solutions or are just misunderstandings about how git works.

Meta's Sapling source-code management system

Posted Nov 18, 2022 1:09 UTC (Fri) by mathstuf (subscriber, #69389) [Link]

Eh. We use submodules and I still don't like them. I don't think there are *better* solutions that are as easy to use when they are updated "often enough" (subtree extraction/merging works for "infrequent" updates). The lack of easy sharing between worktrees and local forks is painful as well when coupled with poor support for shallow cloning the things. My biggest gripe is `git archive` just punting instead of doing anything useful. `git-archive-all` is better, but still doesn't handle the corner cases that we end up hitting (custom attributes are only supported at the top level; we can't export-ignore either because we need to query for other attributes).

The solutions exist, but are spread out, not easy to stitch together, or just end up being custom code.


Copyright © 2025, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds