
Bazel 1.0 released

Google has announced version 1.0 of its Bazel build system. "A growing list of Bazel users attests to the widespread demand for scalable, reproducible, and multi-lingual builds. Bazel helps Google be more open too: several large Google open source projects, such as Angular and TensorFlow, use Bazel. Users have reported 3x test time reductions and 10x faster build speeds after switching to Bazel."


Bazel 1.0 released

Posted Oct 17, 2019 18:16 UTC (Thu) by suckfish (guest, #69919) [Link] (14 responses)

How does Bazel handle the #include-dependencies of C/C++ code?

Looking at the docs, I got the impression that you need to explicitly list the #includes in the Bazel build files.

Automated support for tracking #includes is pretty much an absolute requirement for incremental builds of any sizable C/C++ project, but I don't see any mention of this in the examples. Am I missing something?

Bazel 1.0 released

Posted Oct 17, 2019 19:32 UTC (Thu) by ch33zer (subscriber, #128505) [Link] (13 responses)

If you #include a file in a .cc or .h, then you must add a dependency on a Bazel rule exporting that file, usually a cc_library rule whose hdrs field lists the header you include. See here for more:

https://docs.bazel.build/versions/master/tutorial/cpp.htm...
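
For what it's worth, the shape of that in a BUILD file is roughly this (target and file names are made up, untested):

    cc_library(
        name = "foo",
        srcs = ["foo.cc"],
        hdrs = ["foo.h"],    # headers that dependent targets may #include
    )

    cc_binary(
        name = "main",
        srcs = ["main.cc"],  # main.cc does #include "foo.h"
        deps = [":foo"],     # this dep is what makes foo.h visible to main.cc
    )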

Bazel 1.0 released

Posted Oct 17, 2019 21:17 UTC (Thu) by suckfish (guest, #69919) [Link] (12 responses)

I'm looking at a moderately sized (500 kloc) C++ codebase; there are about 10,000 #includes spread over 1,500 or so source files.
To use Bazel, I would need to maintain an entry for each of those 10,000 #includes in the build files, with no automated support from Bazel?
I don't see how using Bazel is practical for any non-trivial C/C++ build; automated tracking of the #include graph by the build system is essential.

(For comparison, autotools uses a combination of makefile rules and compiler flags to extract the #include graph automatically. Other build systems like CMake and Meson do this too, I believe.)

Bazel 1.0 released

Posted Oct 17, 2019 21:38 UTC (Thu) by nix (subscriber, #2304) [Link] (3 responses)

I tried to use Bazel once and soon decided it was utterly impractical for almost anything: not just because of the #include thing (though that was a big part of it) but also because it was incredibly slow, dependent on a JVM, horrendously RAM-hungry, and appeared to assume a world where every one of your dependencies gets downloaded and rebuilt every time you want to cite it (which in a free software world you *never* want to do: only proprietary vendors who think Chromium's third_party/ tree is just great would imagine this was a good thing). I'm sure it's good for some uses, but from the perspective of this longtime GNU make/autotools/meson user, it was the only build system I have ever tried to use more aggravating than scons.

The collection of hilarious war stories that is https://archive.fosdem.org/2018/schedule/event/how_to_mak... is worth a look. I saw this before I tried Bazel: as a sometime packager, I've run into a lot of his complaints myself and said the same things -- and everything he said about Bazel made me go WAT?! Why would anyone *do* that?! (And then I forgot all about it and tried Bazel anyway. This was probably stupid of me.)

Perhaps I am biased: maybe it *is* good if you are trying to build a mixture of Java, Go and C++ in a massively distributed system to produce a single statically linked monster binary (and as long as those are the only three languages you're interested in). But frankly the lack of automatic include tracking makes me think that only the most disciplined of codebases can possibly work even then, unless you write code yourself to autogenerate Bazel's intended-to-be-manually-written lists of #includes. (But if you're doing that, you might as well use Meson, which does that for you and is quite capable of building Go and Rust code with a bit of manual hacking, and probably Java too: I suspect it's easier to extend than Bazel, too, but I'll admit I'm biased in favour of Python and strongly against a mass of Java and C++ for build system tooling.)

Bazel: probably a great build system, if you are Google, but so finely tuned to its highly unusual requirements that most other people should probably treat it mostly as a cautionary tale.

Bazel 1.0 released

Posted Oct 18, 2019 2:43 UTC (Fri) by suckfish (guest, #69919) [Link]

The web devs I work with use Bazel and love it. But it does seem to be designed for quite different workflows from what C/C++ devs are used to.

Bazel 1.0 released

Posted Oct 19, 2019 17:55 UTC (Sat) by ssmith32 (subscriber, #72404) [Link] (1 response)

As pointed out below (and in your comment), it makes sense in the context of Google:

https://m-cacm.acm.org/magazines/2016/7/204032-why-google...

And, as you said, outside of that... Probably not so much.

But the same could be said for some of the decisions made around Go...

And Google has always been bad at documenting its products.

>maybe it *is* good if you are trying to build a mixture of Java, Go and C++ in a massively distributed system to produce a single statically linked monster binary (and as long as those are the only three languages you're interested in)

Based on my limited knowledge, this is a quite concise & elegant description of how Google develops ;)

Bazel 1.0 released

Posted Oct 19, 2019 18:31 UTC (Sat) by ssmith32 (subscriber, #72404) [Link]

Reading through the "who uses bazel" link:

https://github.com/bazelbuild/bazel/wiki/Bazel-Users

Was very illuminating. You're correct in all your assumptions, but I found it surprising how many companies have monorepos.

Other than that, there were the usual suspects:

A lot of Google projects to inflate the numbers.

People using the Google build service.

People realizing that the hipster version of ivy (Gradle) has many of the same issues.

People acknowledging that sbt is a horror show (I *never worried about slow build tools, was always like "whatever" if the build was slow... Until.... I... Started..... Working........ On............ Scala............. Projects. The pain is real).

There are also just some amusing comments.

Wix talking about how they hope to one day migrate to a monorepo.

Redfin:
"Running npm install under Bazel proved troublesome, so we reimplemented it completely in Python! This worked very well, but we had to abandon it: although it worked well in CI on Linux, it was prohibitively slow on macOS."
"In addition to being more performant on developer laptops, it also worked well on Linux."

The subtle implication of that last statement made me cringe (I did finally cave at my current company, and have temporarily given up on getting a linux dev environment. Temporarily. Someday, I will have proper keyboard shortcuts, actual full screen, and a sane shell again. I usually get one eventually, and shock people with things like containers that don't require VMs to run them, tooling that works really well with the deploy environment). The funniest situation was at my last company. I worked next to a guy who used his Mac as a VM environment to run a bunch of terminal windows into a Linux VM. As someone who talked his way into a Linux desktop dev environment (it's what all the SREs did), it just made me laugh. I mean, just use Linux, at that point!?

The rest of the comments just left me amused.

It looks like building complex software is in the same state as technical interviews: it's a mess everywhere, and the people who think they've found "the solution" have just become desensitized to whole classes of pain.

Bazel 1.0 released

Posted Oct 18, 2019 7:53 UTC (Fri) by pharm (guest, #22305) [Link] (6 responses)

It’s not quite that bad: you can define objects in Bazel which have a set of headers associated with them (e.g., a C++ library), and every time you assert a dependency on that library you automatically get all of the headers associated with it imported into the build.

Where Bazel really scores is a) hermetic, repeatable builds, which in turn enable b) caching. Lots & lots of caching.

This means that (if you’re Google) you can run a massive build farm that parallelises your build and simultaneously takes advantage of any previously built artifacts that your current build depends on, down to the individual compiled object code level.
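
Even if you’re not running a build farm, you can turn some of that caching on yourself; a minimal .bazelrc sketch (the cache path and endpoint below are made up - check the flag docs):

    # Reuse previously built action outputs via a local on-disk cache.
    build --disk_cache=/var/cache/bazel

    # Or point every developer and CI machine at a shared remote cache.
    build --remote_cache=grpc://cache.example.com:9092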

The second way in which Bazel scores (if you’re Google and are using a massive monorepo) is that it only pulls in the build instructions that your current project depends on. Someone has broken the build system for some part of the code you don’t happen to depend on right now? Your build still works, even if there are errors in Bazel files elsewhere. This obviously really matters when you’re as large as Google & using a monorepo, as otherwise you end up serialising your development behind whoever happens to be working on the build system at a given time.

If you’re not Google? Well, Bazel’s advantages are less obvious. The hermetic builds are still nice, but you’re probably not going to gain as much from the rest of it, and it’s a build tool that insists you do everything its way - you can use it to call out to other build systems, but it really wants to be in control. It doesn’t play well with package managers, nor with passing control back and forth between different build systems. The documentation is often opaque and parts of the internals are downright obtuse. Google can cope, because it has direct access to the developers who understand the system from the inside out, but non-Google developers are left to flounder around the moment they need something outside of Bazel’s happy path. If you’re lucky, someone has already done the work for you and stuck it up on GitHub somewhere, but you can’t rely on that.

(All the above is from spending 18 months working on a 3-4 developer greenfield project where the build system was Bazel.)

Bazel 1.0 released

Posted Oct 18, 2019 9:36 UTC (Fri) by farnz (subscriber, #17727) [Link] (5 responses)

One more thing you missed - Google can maintain a build farm that keeps the cache hot by running in a loop building the latest code. Thus, when you rebase your small change onto the latest code, instead of needing to build everyone else's changes as well as your own, you just need to build your own changes.

This, in turn, makes developing on trunk far nicer; while there's a huge churn rate overall, you only pay the costs of your own churn.

FWIW, at a deep technical level, the monorepo/multirepo discussion is about which tools need to support your project; a multirepo biases you towards needing better support for dependency management (packaging Debian-style, for example), while a monorepo biases you towards tools that don't have O(repo size) behaviour. Other than that bias, the only difference between the two is that a monorepo permits atomic commits across multiple projects, where a multirepo requires the engineer trying to do such a commit (e.g. fixing an internal API design error everywhere) to co-ordinate commits to each repo. Hence why the Linux kernel is a monorepo - it's easier to co-ordinate fixing a driver bug in a single commit than to try to commit to N repos.

Bazel 1.0 released

Posted Oct 19, 2019 5:44 UTC (Sat) by gps (subscriber, #45638) [Link]

Google's build cache loop is a combination of tens of thousands of engineers distributed around the globe and a similar number of continuous integration pipelines.

Bazel 1.0 released

Posted Oct 19, 2019 18:59 UTC (Sat) by ssmith32 (subscriber, #72404) [Link] (3 responses)

Managing varying levels of commit access in a monorepo can be difficult as well.
As is letting several teams happily coexist with slightly different dev processes (and, yes, tools).
And since languages also often dictate tools, you're also limiting what languages can be used, once you start down the monorepo path (since, as you said, it's tied to tooling).

>Other than that bias, the only difference between the two is that a monorepo permits atomic commits across...multirepo requires the engineer trying to do such a commit

That seems incorrect.

1) there are other differences

2) In a multirepo, shared code should be in a shared, versioned artifact, and, yes, when you release a new, incompatible change, you communicate it and slowly or quickly roll out the use of the updated artifact across all the repos. But why would you want to roll out such a change without communicating with the code owners?
Even in the kernel, I imagine the maintainers of the code where the API usage changes need to be made aware somehow, even if it's just approving their part of your massive atomic commit?
Why would you want them all to have to read through your massive atomic commit to find the part they care about?
What's so valuable about a massive single commit vs lots of smaller ones?
You're just as likely to miss a change in code in the one commit as you are to miss a single commit in a chain of them, so making sure it's all-or-nothing doesn't apply - testing is what finds mistakes there.

Versioned, shared libraries are a thing for a reason..
If you're in the kernel, yes, there are just practical limitations on using different versions of a shared library.. but other than that, I just don't see it..

Bazel 1.0 released

Posted Oct 21, 2019 0:46 UTC (Mon) by khim (subscriber, #9252) [Link] (1 responses)

> And since languages also often dictate tools, you're also limiting what languages can be used, once you start down the monorepo path (since, as you said, it's tied to tooling).

Which is, of course, a huge advantage. Because every time you introduce a weird language into the mix, you make a particular component quite a pain point: only a small subset of potential developers can touch it or even understand what goes on there.

Now, of course, if you have a large group of people who all know a particular language (it doesn't matter which one: Raku, Rust or anything else), you could extend Bazel.

But the fact that it's not easy to do is considered a good thing.

> Even in the kernel, I imagine the maintainers of the code where the api usage changes need to be made aware somehow, even if it's just approving their part of your massive atomic commit?

Do developers of GTK+ get ACKs from all the developers of GTK-based applications? Or libstdc++ users? No? Then why would a monorepo change that?

True - usually one of the developers of the affected component ACKs the changes… but if they're not available or just unresponsive, this doesn't stop the changes… someone with a wider scope (Linus in the worst case) just accepts them - and that's it.

> Why would you want them all to have to read through your massive atomic commit to find the part they care about?

They would just look at the parts they own, obviously. Why would the fact that the change affects other parts of the repo ever be relevant to them?

> What's so valuable about a massive single commit vs lots of smaller ones?

The fact that you don't have to somehow provide two APIs: old and new. It's especially valuable in the kernel because when locking rules change it's often impossible to maintain both the old and new APIs simultaneously. But it's a nice property to have in userspace, too.

> You're just as likely to miss a change in code in the one commit as you are to miss a single commit in a chain of them, so making sure it's all-or-nothing doesn't apply - testing is what finds mistakes there

That's why you try to change the API in a way that makes old code uncompilable. Then you can find breakages with the compiler. Yes, it's not always possible - but even if it's possible in 90% of cases (and in my experience it's actually more often than that), this greatly reduces the amount of testing needed.

> Versioned, shared libraries are a thing for a reason..

Sure, they allow you to save some disk space and RAM. Not an issue at all for the majority of today's systems. Worse, unless you are actually using the library in dozens of projects (and the majority of libraries are not used in dozens of projects), the end result may well be an increase in wasted disk and memory - because of the need to support all the revisions of the API that have ever been exposed by the component.

In practice the end result is often a combination of the worst sides of both worlds: you have to pick one particular version of a given library for your project because the versions are not compatible, and each two-week release introduces new incompatibilities - yet you still can't have nice atomic commits.

Bazel 1.0 released

Posted Oct 21, 2019 15:51 UTC (Mon) by NYKevin (subscriber, #129325) [Link]

> Sure, they allow you to save some disk space and RAM.

Between block-layer deduplication and COW, I'm not entirely convinced this is even true any more (except for very small stuff, maybe).

Bazel 1.0 released

Posted Oct 21, 2019 9:35 UTC (Mon) by farnz (subscriber, #17727) [Link]

  1. What other differences? Apart from those related to implementation details such as ACLs, there are no other fundamental differences in what is and is not possible in a monorepo or multirepo world.
  2. In a monorepo, shared code can also be in a shared, versioned artifact. None of the process you describe is impossible in a monorepo, but you can do the big atomic commits without waiting for all maintainers if you need to. And the point of all-or-nothing is that the sorts of changes I'm thinking of are ones where a failure to apply the change results in a tool indicating that you've failed (a linter, a compiler, something), so that you can trivially confirm that the change went to everywhere it needed to be.

And a monorepo does not prevent you from working as-if you were in multirepo land if that is appropriate to the project. It's just that the borders between repos are now fluid - you can "move" code around and treat it as external to your project or internal to your project as the technical requirements dictate; it's a lot harder to move a chunk of one repo into its own repo for sharing when that matters.

Shared, versioned libraries don't solve the problem either - if I depend on the API of libfoo.2.so, and I depend on libbar.5.so which needs the API of libfoo.1.so, but I need to share libfoo data structures between my code and libbar's code, there's no way to do that; I have to port libbar to libfoo.2.so, or I have to use libfoo.1.so. This restriction is independent of how you store your code, and applies just as much in a monorepo as in a multirepo.

Bazel 1.0 released

Posted Oct 18, 2019 13:07 UTC (Fri) by excors (subscriber, #95769) [Link]

Based on https://docs.bazel.build/versions/master/be/c-cpp.html#cc... it seems the idea is that when you declare a library, "hdrs" is the library's public API header files, and "srcs" is the source files *and* the internal header files.

You don't need to declare precisely which source files depend on which header files, you just list them all in srcs/hdrs. When building the library, it might do an incremental build as an optimisation (and I think it does, using "gcc -MD" to find dependencies) but that's an implementation detail.

When another package depends on that library, it must only directly include header files that were listed in "hdrs". (Except apparently that isn't checked, due to implementation difficulties, so you can directly include the internal header files from an imported library's "srcs" too. But you shouldn't.)

If you have a project with 1500 source files, you might want to split it into several packages. You need to explicitly specify the dependencies between those packages, and explicitly specify their public API headers, so the packages are built in the right order and with the right files imported into the build environment. But you don't need to specify dependencies between files within a package, that's all automatic.
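
If that reading is correct, a split project would look roughly like this (package and file names are invented, untested):

    # base/BUILD
    cc_library(
        name = "base",
        srcs = [
            "util.cc",
            "util_internal.h",  # internal header: listed in srcs, not hdrs
        ],
        hdrs = ["util.h"],      # public API header that dependants may #include
        visibility = ["//visibility:public"],
    )

    # app/BUILD
    cc_binary(
        name = "app",
        srcs = ["main.cc"],     # may #include "base/util.h" but not util_internal.h
        deps = ["//base"],
    )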

But I don't know if what I'm saying is actually correct. The documentation is not very helpful when all its examples of C++ packages have a single source file.

