Meta's Sapling source-code management system
Sapling began 10 years ago as an initiative to make our monorepo scale in the face of tremendous growth. Public source control systems were not, and still are not, capable of handling repositories of this size. Breaking up the repository was also out of the question, as it would mean losing monorepo’s benefits, such as simplified dependency management and the ability to make broad changes quickly. Instead, we decided to go all in and make our source control system scale.
Starting as an extension to the Mercurial open source project, it rapidly grew into a system of its own with new storage formats, wire protocols, algorithms, and behaviors. Our ambitions grew along with it, and we began thinking about how we could improve not only the scale but also the actual experience of using source control.
At this point, only the client side of the system has been released; the company "hopes to" release the rest later.
Posted Nov 16, 2022 15:26 UTC (Wed)
by rjek (subscriber, #94501)
[Link] (1 responses)
Posted Nov 17, 2022 15:05 UTC (Thu)
by ldearquer (guest, #137451)
[Link]
Posted Nov 16, 2022 16:15 UTC (Wed)
by eplanit (guest, #121769)
[Link] (17 responses)
Posted Nov 16, 2022 16:27 UTC (Wed)
by mgk (guest, #74833)
[Link]
Posted Nov 16, 2022 17:12 UTC (Wed)
by Sesse (subscriber, #53779)
[Link] (1 responses)
I'm not really sure what the client alone is good for, though. I assume there's no public server?
Posted Nov 16, 2022 17:18 UTC (Wed)
by geofft (subscriber, #59789)
[Link]
The blog post says "You can now try its various features using Sapling’s built-in Git support to clone any of your existing repositories." and "Many of our scale features require using a Sapling-specific server and are therefore unavailable in our initial client release."
Note that in addition to the Sapling CLI, they're also releasing ReviewStack, an alternative user interface for reviewing GitHub pull requests. The code appears to be in the Sapling repo, and there's also a public instance of it at https://reviewstack.dev .
Posted Nov 16, 2022 17:13 UTC (Wed)
by geofft (subscriber, #59789)
[Link] (13 responses)
I'm hoping the end result of this is something similar to what Microsoft did with Scalar - the good ideas from it get merged into upstream Git, instead of it becoming yet another standalone VCS.
This is also a pathway that Meta themselves is familiar with: Instagram released their internal CPython fork Cinder https://github.com/facebookincubator/cinder not because they want people to use Cinder itself but because they want it as a public base of discussion to upstream the good ideas into actual CPython, so they can eventually drop the fork.
Posted Nov 17, 2022 3:56 UTC (Thu)
by bartoc (guest, #124262)
[Link] (1 responses)
Yeah, the up-streamed stuff from gvfs/scalar is also pretty vastly improved over what was in the old fork. sparse index/worktree/clones are way better than gvfs because you don't need to worry about some random program enumerating the git repo (including getting file sizes) and causing gvfs to download everything. I had both TortoiseGit and WinDirStat do this, it's quite annoying.
Something not in git upstream that I would love to see is a way to automatically symlink/junction git submodules (the ones in .git/modules) to some central area, scalar (the one from git-for-windows) does seem to _somehow_ do this, I think using the alternates mechanism and a shim clone/fetch command but it's not super clean. I would be happy just getting modules pointing to exactly the same initial remote pointing somewhere common, it would at least make it harder to end up with 10 different copies of LLVM's repo on my machine.
Oh, another pretty easy win (on linux, at least, but perhaps on windows and mac with a compatibility shim) would be teaching git-checkout-index to use copy_file_range when available. The first "chunk" of the (unpacked) git object doesn't match the first "chunk" of the checked-out file so it's kinda filesystem specific if this works. And ofc on windows even if you have a compat shim almost nobody can use it because ReFS/btrfs/zfs are not widely used (I think those are all the CoW filesystems with windows implementations).
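For readers who have not used the call, here is a minimal Python sketch of the "use copy_file_range when available" pattern. It is not Git's code: the function name `fast_copy` is made up for illustration, and it copies one plain file to another, sidestepping the problem noted above that a stored git object and the checked-out file do not line up byte for byte. It only shows the in-kernel copy with a user-space fallback.

```python
import os
import shutil


def fast_copy(src_path: str, dst_path: str) -> str:
    """Copy src_path to dst_path, preferring an in-kernel copy.

    Returns "copy_file_range" or "fallback" so the caller can see which
    path was taken. `fast_copy` is a hypothetical helper, not a git API.
    """
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        remaining = os.fstat(src.fileno()).st_size
        # os.copy_file_range only exists on Linux builds of Python 3.8+.
        if hasattr(os, "copy_file_range"):
            try:
                while remaining > 0:
                    copied = os.copy_file_range(
                        src.fileno(), dst.fileno(), remaining
                    )
                    if copied == 0:  # source ended early
                        break
                    remaining -= copied
                if remaining == 0:
                    return "copy_file_range"
            except OSError:
                # Old kernel, cross-device copy, or unsupported filesystem.
                pass
        # Fallback: start over and copy through user space.
        src.seek(0)
        dst.seek(0)
        dst.truncate()
        shutil.copyfileobj(src, dst)
        return "fallback"
```

Whether the kernel turns this into a reflink or a real data copy depends on the filesystem, which is the "kinda filesystem specific" part.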
Posted Nov 19, 2022 4:00 UTC (Sat)
by sionescu (subscriber, #59410)
[Link]
That would be throwing the baby out with the bathwater. Having a single total view of the repo has so many benefits that if some devtool can't cope with the size, then it's time to blacklist or fix it.
Posted Nov 17, 2022 10:43 UTC (Thu)
by nysan (guest, #81015)
[Link] (10 responses)
Compare above with a google-repo XML file in a git repo, describing multiple sub-git-repos.
It's essentially the same thing. Monorepo is just way worse.
Posted Nov 17, 2022 12:20 UTC (Thu)
by khim (subscriber, #9252)
[Link] (7 responses)
How much time does this “end” need? AFAIK Google's hasn't devolved into that. The trick is simple: don't provide means to combine two versions. Period. If you need two versions of some third-party code for some reason then you just create two directories. That's how the Python2 vs Python3 difference was handled in the Linux distros for years, too. Why is that a problem? As long as you don't start weird automerger schemes and just treat code from another repo as “third party” and import code in the appropriate fashion everything works. Google does that with abseil AFAIK.
Posted Nov 17, 2022 12:51 UTC (Thu)
by nysan (guest, #81015)
[Link] (6 responses)
OK, so you have 40 ppl working on feature X, and 40 ppl working on feature Y.
How would you do this, in case you only allow a single CM version in the monorepo ?
Posted Nov 17, 2022 14:07 UTC (Thu)
by pkolloch (subscriber, #21709)
[Link]
Posted Nov 17, 2022 14:36 UTC (Thu)
by khim (subscriber, #9252)
[Link] (4 responses)
If that's a monorepo then there is no merging. Individual commits are merged, of course (you cannot have a few thousand people working on the same code and not have some conflicts) but features are never implemented in branches. Branches are for bugfixes. Android couldn't follow that model 100% because of organisational issues, but it tries. That just means that 80 people are committing to the trunk, what's the problem?
Posted Nov 17, 2022 22:06 UTC (Thu)
by NYKevin (subscriber, #129325)
[Link] (3 responses)
Not even. At least in my experience, branches are for releases. You branch at a point where the build is green, do any cherrypicks that you need, cut a release on the branch, and that's it. No merging. The branch just gets abandoned (maybe we GC it eventually?).
The corollary to this: If your code does not run at HEAD, that's your problem. You cannot make your own private branch where you use some ancient version of libfoo that nobody else is willing to support. When libfoo updates, everyone is expected to update with it, or else your code stops building (and, eventually, stops running in production). Depending on the size and reasonableness of the breakage, the people who maintain (and/or vendor) libfoo will probably be expected to help you transition to the new version, or even to do it for you, but you can't just say "we like the old version better" and expect that to end the discussion.
The corollary to the corollary: You really want to have good test coverage, because the libfoo maintainers can't be reasonably expected to find the breakage if the tests all pass (or if there are no tests).
Posted Nov 17, 2022 22:25 UTC (Thu)
by khim (subscriber, #9252)
[Link]
This is similar to a crater run, I guess. Only a crater run ensures that the compiler can be updated (and not other libraries) while in a monorepo everything is supposed to work like that (but you can also update all the clients, which is the whole reason it's a monorepo).
Posted Nov 29, 2022 3:37 UTC (Tue)
by brooksmoses (guest, #88422)
[Link]
Posted Nov 29, 2022 13:59 UTC (Tue)
by mathstuf (subscriber, #69389)
[Link]
Posted Nov 17, 2022 22:52 UTC (Thu)
by bartoc (guest, #124262)
[Link] (1 responses)
However, I think repo is .... not that good. It's mostly submodules plus some features that are almost always not a good idea. I don't think this is really true of repo given its age and Gerrit integration, but a lot of these tools feel like someone reading that submodules were problematic somewhere and just reinventing them without really understanding them. Basically all the criticisms of submodules have easy solutions or are just misunderstandings about how git works.
Posted Nov 18, 2022 1:09 UTC (Fri)
by mathstuf (subscriber, #69389)
[Link]
The solutions exist, but are spread out, not easy to stitch together, or just end up being custom code.
Posted Nov 16, 2022 17:17 UTC (Wed)
by q_q_p_p (guest, #131113)
[Link] (2 responses)
Posted Nov 16, 2022 17:47 UTC (Wed)
by Sesse (subscriber, #53779)
[Link] (1 responses)
Posted Nov 16, 2022 18:08 UTC (Wed)
by q_q_p_p (guest, #131113)
[Link]
Posted Nov 16, 2022 18:20 UTC (Wed)
by IanKelling (subscriber, #89418)
[Link] (7 responses)
https://engineering.fb.com/2022/11/15/open-source/sapling...
"I’d also like to thank the Mercurial open source community for all their collaboration and inspiration" but this is also an announcement that their contributions are no longer welcome under GPL as they were before. Unless I'm missing something, that seems like a rather backhanded thank you.
Posted Nov 17, 2022 10:42 UTC (Thu)
by paulj (subscriber, #341)
[Link] (6 responses)
The internal history of this, as I understood it (a couple of years out of date, and not near any team responsible - just as a user), is that FB started with mercurial. They then had to heavily customise hg to make it work at the ever greater scales they had internally with more and more code and developers working on it. Until they had effectively completely rewritten the back-end to use a Facebook specific, distributed object store - that's the "eden" bit in the source code I think (maybe simplified / pared-down for external use, I don't know). The front-end hg tools I think were heavily modified too. Sl however is a from scratch rewrite. I think it started out as a wrapper around the hg tools, but grew into a standalone front-end. There was also an effort to reimplement the back-end in Rust, along with front-end tooling for that - I think that's the "Mononoke" bit in the code, IIRC.
Last I remember, I /think/ the developer workflow still had some odd cases where you needed to use the hg commands, but for nearly all stuff you could use sl for your daily work-flow.
I presume that progressed to the stage where the completely rewritten Facebook^WMeta front-end + backend, sl / mononoke, can do everything itself, and is feature complete - and hence this can be released as "sapling" (retro-fitted name).
Posted Nov 17, 2022 10:47 UTC (Thu)
by paulj (subscriber, #341)
[Link] (1 responses)
I'd be a bit sceptical of using Facebook stuff outside of FB. There are internal brownie points for releasing stuff as open-source perhaps, but there are few to none for taking the time to maintain open-source stuff. Also, the internal culture is to build everything from a mono-repo, and have no concern for backward compatibilities (other than the non-atomic roll outs of binaries/artifacts from a build from said mono-repo). So I'd hate to depend on FB code outside of FB. In particular, the FB C++ library (folly) maintainers explicitly are hostile to attempts to make life easier for maintaining code out of FBCode that depends on their stuff.
Posted Nov 29, 2022 12:25 UTC (Tue)
by scientes (guest, #83068)
[Link]
Except ZSTD.
Posted Nov 17, 2022 15:28 UTC (Thu)
by IanKelling (subscriber, #89418)
[Link] (3 responses)
I downloaded the repo. It is a copy of the mercurial repo from 2005 onward until it forks.
Posted Nov 17, 2022 16:45 UTC (Thu)
by paulj (subscriber, #341)
[Link] (2 responses)
I'm looking at https://github.com/facebook/sapling and - at a high-level anyway - I don't see anything from mercurial in the head/tip, nor in the history. I know internally that Mononoke was a from-scratch rewrite. And the Eden object store and SCM backend isn't from mercurial either.
I could be confused, but can you be more explicit about what part of Sapling started out as mercurial code?
Posted Nov 18, 2022 13:52 UTC (Fri)
by IanKelling (subscriber, #89418)
[Link] (1 responses)
Posted Nov 18, 2022 14:15 UTC (Fri)
by paulj (subscriber, #341)
[Link]
The mercurial code is there at: https://github.com/facebook/sapling/tree/main/eden/scm/ed... - maybe in other places.
Posted Nov 16, 2022 19:24 UTC (Wed)
by sdalley (subscriber, #18550)
[Link] (15 responses)
Posted Nov 17, 2022 0:05 UTC (Thu)
by NYKevin (subscriber, #129325)
[Link] (14 responses)
In general, Git's attitude seems to be that high-level concepts like commits and rebases should be understood directly in terms of their low-level on-disk representations as trees, refs, etc. Git's on-disk representation uses a staging area, so therefore you have a staging area as part of the UI. Mercurial does not do this. In Mercurial, the on-disk representation is considered an implementation detail, subject to revision at any time, and you are expected to understand commits as primitive objects. A staging area would be redundant to what Mercurial calls a "secret commit" (i.e. a commit that you don't intend to push, and that the tooling will prevent you from pushing accidentally), so Mercurial does not supply a staging area, even though there is some denormalization under the hood. This is a relatively small difference of opinion, but an important one.
Posted Nov 17, 2022 5:16 UTC (Thu)
by jthill (subscriber, #56558)
[Link] (2 responses)
Git's not abstract, it's concrete, that's true; it uses abstractions to help understand and describe what's possible, not to limit it. Pretending that that's some sort of abstract principle rather than for concrete benefit is rather spectacularly missing the point.
Posted Nov 17, 2022 17:54 UTC (Thu)
by NYKevin (subscriber, #129325)
[Link] (1 responses)
Ironically, this sentence is too abstract, and I have no idea what you are talking about.
Posted Nov 11, 2024 18:58 UTC (Mon)
by jthill (subscriber, #56558)
[Link]
Okay, this might be such an extreme necro it qualifies as actually weird but trying to find words for a reply to this has been niggling at my hindbrain all this time. That acknowledged,
Git is: a dag of snapshots plus annotated tags in the object db; local refs; and an index for tracking work on (often constructing new) snapshots. That's it. Everything else, everything else, is in whatever's-useful-in-your-work territory.
There's software design that starts with some perceived ideal/need and jumps straight to abstractions which are then explained and implemented, this is the root of the "implementation details don't matter" view of software, the "abstract principle first" sort of design that views any behavior not covered by the abstraction as aberrant, egregious.
Then there's software design that starts with basically a data structure and asks "what use can be made of it", it might start out as a design for a perceived need but abstractions are just ways of talking about the effects you can get.
What I'm saying is: Git's the second kind. Anything you can do with a dag of snapshots and re-hanging local labels, you can do with Git. The people who want definitive and elegant abstractions tend to express distaste for this, they'll call Git's UI a leaky abstraction and get more pejorative from there. And I think that's where they're entirely missing the point. Git's a tool, a data structure plus commands to work with it. The Git interface uses abstractions to talk about the useful things you can do, not to define what's proper.
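To make "a dag of snapshots plus re-hanging local labels" concrete, here is a toy Python sketch. It is an illustration under loud assumptions: `ToyRepo` and its methods are invented names, the hashing is over a JSON dump rather than Git's real object encoding, and trees are stored inline instead of as separate objects. It only models the three pieces named above: an object db, refs, and an index.

```python
import hashlib
import json
from dataclasses import dataclass, field


@dataclass
class ToyRepo:
    """Toy model: object db + refs + index. Not Git's on-disk format."""
    objects: dict = field(default_factory=dict)  # object id -> snapshot record
    refs: dict = field(default_factory=dict)     # label -> object id
    index: dict = field(default_factory=dict)    # path -> content of next snapshot

    def stage(self, path: str, content: str) -> None:
        # Work on the next snapshot happens in the index.
        self.index[path] = content

    def commit(self, ref: str, message: str) -> str:
        parent = self.refs.get(ref)
        record = {
            "tree": dict(self.index),            # full snapshot, not a diff
            "parents": [parent] if parent else [],
            "message": message,
        }
        # Content-addressed: the id is a hash of the record itself.
        oid = hashlib.sha1(
            json.dumps(record, sort_keys=True).encode()
        ).hexdigest()
        self.objects[oid] = record
        self.refs[ref] = oid                     # re-hang the label
        return oid

    def log(self, ref: str) -> list:
        # Walk first parents from the label back to the root.
        messages = []
        oid = self.refs.get(ref)
        while oid:
            record = self.objects[oid]
            messages.append(record["message"])
            oid = record["parents"][0] if record["parents"] else None
        return messages


repo = ToyRepo()
repo.stage("README", "hello")
first = repo.commit("main", "first")
repo.stage("README", "hello again")
repo.commit("main", "second")
repo.refs["topic"] = first  # any label can point at any snapshot
```

In this model "branching" is nothing more than assigning into `refs`, which is the sense in which the commands are conveniences over the data structure rather than the definition of what is allowed.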
Posted Nov 19, 2022 15:01 UTC (Sat)
by kleptog (subscriber, #1183)
[Link] (5 responses)
The staging area is useful precisely *because it is not a commit*. You can add chunks, remove chunks, edit chunks in preparation for commit and only at the last moment do you actually make the commit. When making significant changes, it's not always immediately apparent which parts go where and having a separate staging area helps managing this.
I'm not sure I could go back to a VCS without a staging area. Secret commits seem like a straitjacket in comparison.
Posted Nov 20, 2022 0:56 UTC (Sun)
by NYKevin (subscriber, #129325)
[Link] (4 responses)
> The staging area is useful precisely *because it is not a commit*. You can add chunks, remove chunks, edit chunks in preparation for commit and only at the last moment do you actually make the commit.
You can do all of those things with a secret commit, too. hg commit -i will happily prompt you for the precise chunks you want, let you edit them, etc, in exactly the same way as git add -p. The only difference is the terminology.
Posted Nov 20, 2022 1:14 UTC (Sun)
by NYKevin (subscriber, #129325)
[Link] (1 responses)
When the staging area is empty:
* git commit does nothing, so it has no equivalent.
When the staging area is nonempty:
* git commit is equivalent to hg phase -d . (last argument is a dot and is the hg equivalent of HEAD)
Bonus feature: You can stack multiple staging areas on top of each other, by using commit instead of amend. Git can't do that without using something like stash, which requires you to fiddle with an entirely different set of commands.
Posted Nov 20, 2022 1:15 UTC (Sun)
by NYKevin (subscriber, #129325)
[Link]
Rather, hg commit --secret -i, assuming you still want to work on it some more.
Posted Nov 21, 2022 6:30 UTC (Mon)
by roc (subscriber, #30627)
[Link] (1 responses)
There is simply no reason for the staging area to exist. If it was more fungible than a commit, that would be an indication to make commits more fungible, not to introduce an entirely new concept.
Posted Dec 6, 2022 15:54 UTC (Tue)
by mathstuf (subscriber, #69389)
[Link]
Posted Nov 19, 2022 15:21 UTC (Sat)
by Wol (subscriber, #4433)
[Link] (4 responses)
That has a MAJOR benefit. If your understanding of the abstraction is different from mine, there is no "source of truth" to put us right. With git, you just point to the on-disk structure and say "There!".
There is another MAJOR benefit. While I can't speak to the stats, higher Mathematics requires the ability of abstract thought. Somewhere I came across the "fact", that people acquire this ability about age 14, and maybe *less than half* the population EVER acquire it. In other words, you have to be above average to understand how Mercurial works? Even worse, there's no source of truth to tell you whether you're right?
(If you remember that long screed about Pick and Relational, we have exactly the same thing - Pick may be abstract but it is heavily defined in how it maps to disk structures. Relational is defined in mathematical tuples and how it works is "ignore that man behind the curtain". Things are so much easier to understand when they map to real-world concepts you can build on.)
Cheers,
Posted Nov 20, 2022 0:58 UTC (Sun)
by NYKevin (subscriber, #129325)
[Link] (3 responses)
The program's behavior is the source of truth. The abstraction is what it is, no more and no less.
Posted Nov 20, 2022 8:40 UTC (Sun)
by Wol (subscriber, #4433)
[Link] (2 responses)
Cheers,
Posted Nov 20, 2022 15:49 UTC (Sun)
by kleptog (subscriber, #1183)
[Link] (1 responses)
Basically, SQL is a language with many dialects. No large application can ignore the characteristics of the specific implementation they're using.
Posted Nov 21, 2022 8:34 UTC (Mon)
by Wol (subscriber, #4433)
[Link]
This. Because Pick *expects* you to know the characteristics of the database, the reality is that they're all very similar. We did a major port between two different implementations once, and the bulk of the work was just *tweaking* the DataBASIC so it compiled on the new system. (That plus QA, of course.)
Now I'm working on yet another different implementation, I'm not noticing any real differences. The biggest, off the top of my head, is the lack of the SEQUENTIAL file type (a table optimised for sequential numeric keys). I guess the standard dynamic hash has improved ...
Cheers,
Posted Nov 17, 2022 10:34 UTC (Thu)
by nysan (guest, #81015)
[Link]
Posted Nov 19, 2022 18:57 UTC (Sat)
by ahornby (subscriber, #3366)
[Link]
Posted Nov 22, 2022 13:23 UTC (Tue)
by zdzichu (subscriber, #17118)
[Link]
A big configspec to select different versions from different subdirectories of the monorepo. And then comes a corporate merger, and now you have two monorepos. :-O
> In the end, all monorepos end up into what clearcase used to be. A big configspec to select different versions from different subdirectories of the monorepo.
repo exists because Google needed something similar to a monorepo but open-sourced. I deal with it at my $DAYJOB. It kinda-sorta works but is just so flaky, cumbersome and unreliable compared to a normal monorepo.
In the end, you need to integrate and test X and Y together, since they are dependent.
Merging X first, and Y second does compile, but can't be integration-tested.
Incremental integration with feature flags.
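A minimal Python sketch of what that can look like for the X/Y case discussed above. Everything here is hypothetical (the flag names, `handle_request`, the arithmetic); the point is only that both features land on trunk switched off, and an integration test can switch both on together before either is enabled for users.

```python
# Hypothetical flag store; real systems usually read this from config.
FLAGS = {"feature_x": False, "feature_y": False}


def is_enabled(name: str, flags: dict = FLAGS) -> bool:
    # Unknown flags default to off, so half-merged code stays dark.
    return flags.get(name, False)


def handle_request(payload: int, flags: dict = FLAGS) -> dict:
    result = {"value": payload}
    if is_enabled("feature_x", flags):
        # Feature X, merged incrementally behind its flag.
        result["x"] = payload * 2
    if is_enabled("feature_y", flags) and "x" in result:
        # Feature Y depends on X's output; it only runs when both are on.
        result["y"] = result["x"] + 1
    return result


# Trunk behaviour with both flags off is unchanged:
default_result = handle_request(3)
# An integration test can exercise X and Y together:
integrated_result = handle_request(3, {"feature_x": True, "feature_y": True})
```

With this shape, the 80 people commit small pieces to trunk continuously, and "integrating X and Y" becomes flipping flags in a test environment rather than merging two long-lived branches.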
> Merging X first, and Y second does compile, but can't be integration-tested.
Sapling source-code management system - no staging area?
* git commit -a is equivalent to hg commit
* git add [file] is equivalent to hg commit --secret [file]
* git add -p is equivalent to hg commit -i
* git reset --mixed does nothing, so it has no equivalent.
* If a file is newly created or deleted, you have to run hg add/remove on it. hg forget will stop tracking a file without deleting it. This also applies to the nonempty case.
* git commit -a is equivalent to hg phase -d . && hg amend (commands can be run in either order)
* git add [file] is equivalent to hg amend [file]
* git add -p is equivalent to hg amend -i
* git reset --mixed is equivalent to hg uncommit --no-keep
* Since the staging area has a description like any other commit, you might want to change it. hg amend -e will change the description, but also does a regular amend; you can pass additional arguments to tell it not to include any files in the amend, or make an alias for that if you need to do it frequently.
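The correspondence in this comment can also be written down as a lookup table. The Python sketch below simply transcribes the commands listed in the comment; nothing in it runs git or hg, and the names `GIT_TO_HG` and `hg_equivalent` are made up. The empty-case entry for `git add -p` uses `hg commit --secret -i`, following the author's own follow-up correction.

```python
# (staging_area_is_empty, git command) -> hg equivalent from the comment,
# or None where the comment says the git command does nothing.
GIT_TO_HG = {
    (True, "git commit"): None,
    (True, "git commit -a"): "hg commit",
    (True, "git add [file]"): "hg commit --secret [file]",
    (True, "git add -p"): "hg commit --secret -i",  # per the follow-up
    (True, "git reset --mixed"): None,
    (False, "git commit"): "hg phase -d .",
    (False, "git commit -a"): "hg phase -d . && hg amend",
    (False, "git add [file]"): "hg amend [file]",
    (False, "git add -p"): "hg amend -i",
    (False, "git reset --mixed"): "hg uncommit --no-keep",
}


def hg_equivalent(git_command: str, staging_empty: bool):
    """Return the hg command the comment pairs with git_command.

    Raises KeyError for commands the comment does not cover.
    """
    return GIT_TO_HG[(staging_empty, git_command)]
```

Read this way, the "staging area" is just the topmost secret commit: the first `git add` creates it, later ones amend it, and `git commit` publishes it by changing its phase.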
Wol
Wol
Wol
No commit signing
There's an open issue #218 asking for that.