The costs of continuous integration
By most accounts, the freedesktop.org (fd.o) GitLab instance has been a roaring success; lots of projects are using it, including Mesa, Linux kernel graphics drivers, NetworkManager, PipeWire, and many others. In addition, a great deal of continuous-integration (CI) testing is being done on a variety of projects under the fd.o umbrella. That success has come at a price, however. A recent message from the X.Org Foundation, which merged with fd.o in 2019, has made it clear that the current situation is untenable from a financial perspective. Given its current resources, X.Org cannot continue covering those costs beyond another few months.
X.Org board member Daniel Vetter posted a message to multiple mailing lists on February 27. In it, he noted that the GitLab instance has become quite popular, including the CI integration, which is good news. But there is some bad news to go with it:
The expense growth is mainly from "storing and serving build artifacts and images to outside CI runners sponsored by various companies". Beyond that, all of that growth means that a system administrator is needed to maintain the infrastructure, so "X.org is therefore also looking for admin sponsorship, at least medium term". Without more sponsors for the costs of the CI piece, it looks like those services would need to be turned off in May or June, he said. The board is working on finding additional sponsorship money, but that takes time, so it wanted to get the word out.
That set off a discussion of the problem and some possible solutions. Matt Turner was concerned that the bandwidth expense had not been adequately considered when the decision was made to self-host the GitLab instance. "Perhaps people building the CI would make different decisions about its structure if they knew it was going to wipe out the bank account." He wondered if the tradeoff was worth the cost:
Daniel Stone, one of the administrators for the fd.o infrastructure (who gave a talk on the organization's history at the 2018 X.Org Developers Conference), filled in some of the numbers for the costs involved. He said that the bill for January was based on 17.9TB of network egress (mostly copying CI images to the test-running systems) and 16TB of storage for "CI artifacts, container images, file uploads, [and] Git LFS". That totaled to almost $4,000, so the $75,000 projection takes into account further growth. In a follow-up message, he detailed the growth as well:
Stone also noted that the Google Cloud Platform (where the GitLab instance is hosted) does not provide all of the platform types needed for running the CI system. For example, Arm-based DragonBoards are needed, so some copying to external testing systems will be required. Using the cloud services means that bandwidth is metered, which is not necessarily true in other hosting setups, such as virtual private servers, as Jan Engelhardt pointed out. That would require more system administration costs, however, which Stone thinks would now make sense:
Dave Airlie argued that the CI infrastructure should be shut down "until we work out what a sustainable system would look like within the budget we have". He thought that it would be difficult to attract sponsors to effectively pay Google and suggested that it would make more sense for Google to cut out the middleman: "Having google sponsor the credits costs google substantially less than having any other company give us money to do it."
Vetter said that Google has provided $30,000 in hosting credits over the last year, but that money "simply ran out _much_ faster than anyone planned for". In addition, there are plenty of other ways that companies can sponsor the CI system:
The lack of any oversight of what gets run in the CI system and which projects are responsible for it is part of the problem, Airlie said. "You can't have a system in place that lets CI users burn [large] sums of money without authorisation, and that is what we have now." Vetter more or less agreed, but said that the speed of the growth caught the board by surprise, "so we're a bit behind on the controlling aspect". There is an effort to be able to track the costs by project, which will make it easier to account for where the money is going, and to take action if needed.
As part of the reassessment process, Kristian Høgsberg wanted to make sure that the "tremendous success" of the system was recognized. "Between gitlab and the CI, our workflow has improved and code quality has gone up." He said that it would have been hard to anticipate the growth:
Reducing costs
The conversation soon turned toward how to reduce the cost in ways that would not really impact the overall benefit that the system is providing. There may be some low-hanging fruit in terms of which kinds of changes actually need testing on all of the different hardware. As Erik Faye-Lund put it:
[...] We could also do stuff like reducing the amount of tests we run on each commit, and punt some testing to a per-weekend test-run or [something] like that. We don't *need* to know about every problem up front, just the stuff that's about to be released, really. The other stuff is just nice to have. If it's too expensive, I would say drop it.
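For illustration only (this is not fd.o's actual configuration; the job names and the test script are invented), GitLab's rules: keyword can express exactly that kind of split, with merge requests getting a quick subset and a scheduled pipeline getting the expensive full run:

    # Hypothetical .gitlab-ci.yml fragment: quick tests for every merge
    # request, the full (expensive) suite only in scheduled pipelines.
    quick-test:
      stage: test
      script:
        - ./run-tests.sh --quick           # placeholder test runner
      rules:
        - if: '$CI_PIPELINE_SOURCE == "merge_request_event"'

    full-test:
      stage: test
      script:
        - ./run-tests.sh --full            # placeholder for the whole suite
      rules:
        - if: '$CI_PIPELINE_SOURCE == "schedule"'

A pipeline schedule defined in the project settings would then trigger the full run at whatever cadence (weekly, say) the project decides it can afford.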
There were other suggestions along those lines, as well as discussion of how to use GitLab features to reduce some of the "waste" in the amount of CI testing that is being done. It is useful to look at all of that, but Jason Ekstrand cautioned against getting too carried away:
He continued by noting that more data will help guide the process, but he is worried about the effect on the development process of reducing the amount of CI testing:
[...] I'm fairly hopeful that, once we understand better what the costs are (or even with just the new data we have), we can bring it down to reasonable and/or come up with money to pay for it in fairly short order.
Vetter is also worried that it could be somewhat difficult to figure out what tests are needed for a given change, which could result in missing out on important test runs:
In any case, the community is now aware of the problem and is pitching in to start figuring out how best to attack it. Presumably some will also be working with their companies to see if they can contribute as well. Any of the solutions are likely to take some amount of effort both for developers using the infrastructure and for the administrators of the system.
GitLab's new open-source program manager, Nuritzi Sanchez, also chimed in; the company is interested in ensuring that community efforts like fd.o are supported beyond just the migration help that was already provided, she said. "We’ll be exploring ways for GitLab to help make sure there isn’t a gap in coverage during the time that freedesktop looks for sponsors."
While it may have come as a bit of a shock to some in the community, the announcement would seem to have served its purpose. The community now has been informed and can start working on the problem from various directions. Given the (too) runaway success of the CI infrastructure, one suspects that a sustainable model will be found before too long—probably well ahead of the (northern hemisphere) summer cutoff date.
Posted Mar 5, 2020 0:27 UTC (Thu)
by JohnVonNeumann (guest, #131609)
[Link] (19 responses)
I'm also not suggesting that you remove these tasks from the build, but if we visualise it:
Write code ---> ||| lint ---> test ---> ----> build ----> deploy
Posted Mar 5, 2020 0:37 UTC (Thu)
by roc (subscriber, #30627)
[Link] (6 responses)
Posted Mar 5, 2020 1:53 UTC (Thu)
by clopez (guest, #66009)
[Link]
Posted Mar 5, 2020 3:16 UTC (Thu)
by JohnVonNeumann (guest, #131609)
[Link] (4 responses)
Whilst I also get what you're saying about the tests, the simple fact is that most places are not firefox, and people should therefore stop acting as if their test suites are so large that they can't run them locally.
Posted Mar 5, 2020 3:28 UTC (Thu)
by gus3 (guest, #61103)
[Link] (1 responses)
But are you absolutely, positively certain that your work doesn't touch any other areas?
If the system integrator says otherwise, you'll need to run *all* the CI build tests.
Posted Mar 5, 2020 4:58 UTC (Thu)
by JohnVonNeumann (guest, #131609)
[Link]
Posted Mar 5, 2020 19:57 UTC (Thu)
by iabervon (subscriber, #722)
[Link] (1 responses)
Posted Mar 6, 2020 15:16 UTC (Fri)
by marcH (subscriber, #57642)
[Link]
Several times in my career I have had surreal discussions with sometimes quite senior developers where I would ask for tracking of the unfortunately large lists of "known failing" tests, to immediately detect and report regressions in passing tests and to measure progress. Very basic validation stuff, yet the answer was: "no, let's fix all the tests instead", which would of course never happen.
Posted Mar 5, 2020 1:20 UTC (Thu)
by anholt (guest, #52292)
[Link]
Posted Mar 5, 2020 7:07 UTC (Thu)
by marcH (subscriber, #57642)
[Link] (3 responses)
Here's an idea for basic stuff that anyone can run locally: provide the PASS/FAIL status ASAP but log the warnings and error messages to a file. Wait a couple more minutes before publishing the log somewhere (pretend it's for some vague technical reason, like the logs growing really big, or make something up). I bet the vast majority of people who are told "FAIL, run this command to know why or wait X minutes" will not wait and will run the command.
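A minimal sketch of that nudge, assuming a generic shell-based CI step (the test command and the upload helper are placeholders, not anything from the fd.o pipelines):

    #!/bin/sh
    # Hypothetical CI step: report PASS/FAIL immediately, but delay the
    # full log so that re-running the command locally is the faster path.
    status=0
    ./run-tests.sh > test.log 2>&1 || status=$?
    if [ "$status" -eq 0 ]; then
        echo "PASS"
    else
        echo "FAIL: reproduce locally with ./run-tests.sh (full log in ~5 minutes)"
        sleep 300                          # artificial delay before publishing the log
    fi
    ci-upload-artifact test.log            # placeholder for the CI system's upload step
    exit "$status"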
Posted Mar 5, 2020 8:32 UTC (Thu)
by gfernandes (subscriber, #119910)
[Link]
Posted Mar 6, 2020 9:28 UTC (Fri)
by geert (subscriber, #98403)
[Link] (1 responses)
Posted Mar 6, 2020 15:22 UTC (Fri)
by marcH (subscriber, #57642)
[Link]
A simple, more visible gap between developers who care a little bit about their productivity and those who don't really? Approved by HR immediately! Not sure I'm really against it either.
Posted Mar 5, 2020 9:07 UTC (Thu)
by rwmj (subscriber, #5474)
[Link] (1 responses)
Posted Mar 5, 2020 10:19 UTC (Thu)
by farnz (subscriber, #17727)
[Link]
That's not too far off what VS Code Remote Development lets you do. Granted, it's an Electron app, not a web site, but that's nearly the same thing.
The benefit is that instead of having to have a powerful laptop, you can simply choose one that's nice to work on, and then do all the compiler running (gcc etc) on a nice fast machine in a datacentre somewhere.
Posted Mar 5, 2020 16:16 UTC (Thu)
by daniels (subscriber, #16193)
[Link]
No, I'd be pretty impressed if we'd managed to burn so much money running clang-format.
GStreamer runs an extensive conformance suite (gst-validate) over its changes, as well as builds on several distributions and architectures. Mesa runs builds on several architectures (x86, x86-64, armv7, armv8, s390x, ppc), a couple of build systems (Meson for Linux, SCons for Windows), and then runs tests such as the OpenGL (ES) conformance test suite and Piglit across the software rasteriser and a couple of different hardware platforms (Panfrost for newer Arm Mali GPUs, Lima for older Mali GPUs, Freedreno for Qualcomm GPUs).
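As a rough, hypothetical sketch of what such a build matrix looks like in GitLab CI configuration (simplified, and not Mesa's actual .gitlab-ci.yml), each architecture or build-system combination is simply another job pointed at a different container image or runner tag:

    # Hypothetical build-matrix fragment; images, tags, and options are
    # illustrative only.
    build-meson-x86_64:
      image: debian:buster
      script:
        - meson build/ && ninja -C build/

    build-meson-armv8:
      image: arm64v8/debian:buster
      tags: [aarch64]                      # assumes a runner registered with this tag
      script:
        - meson build/ && ninja -C build/

    build-scons-windows:
      tags: [windows]                      # assumes a Windows runner is available
      script:
        - scons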
Posted Mar 6, 2020 5:14 UTC (Fri)
by mathstuf (subscriber, #69389)
[Link]
The issues we found are that the set of tools available on the client side are either not easily available everywhere (e.g., clang-format on Windows), or have silly version-specific behaviors (also clang-format). Plus, `git commit -n` exists, so you're stuck doing server-side enforcement *anyways*.
We're migrating to gitlab-ci at work and we do have a robot that does those kinds of checks (fairly cheaply; it's a 2-core VM and only really gets unhappy if you toss MRs with thousands of novel commits in them at it). There's another ~3x perf benefit lying around that I haven't gotten to because it hasn't been necessary (and having to plumb libgit2 through everything rather than just exposing "here's a way to run an arbitrary git command" is more painful than I'd like it to be). Hooking things up so that the robot blocks testing if some subset of checks aren't passing (e.g., formatting is bad vs. some WIP marker detection) doesn't seem to be *too* hard (though it may require manual intervention of some kind on the first stab).
It works with any Gitlab instance (Github too for those curious, but it's an app there, so on the deprecation route; getting it to work in an action is on the TODO list) if folks are interested (fully FOSS, just bad "marketing" since I've had it on my list to blog about it for…a long time now).
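A much simpler stand-in for that kind of robot (purely illustrative, and not the tool described above) is a CI job that reformats the tree and fails when anything changes, with the clang-format version pinned by the container image:

    # Hypothetical GitLab CI job enforcing formatting server-side.
    check-format:
      stage: lint
      image: debian:buster
      script:
        - apt-get update && apt-get install -y --no-install-recommends clang-format git
        - clang-format -i $(git ls-files '*.c' '*.h')
        - git diff --exit-code             # a non-empty diff means badly formatted code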
Posted Mar 6, 2020 9:32 UTC (Fri)
by geert (subscriber, #98403)
[Link]
Posted Mar 6, 2020 16:23 UTC (Fri)
by marcH (subscriber, #57642)
[Link]
How "simple" is that stuff really? The typical vicious circle is:
Ideally, anyone can copy/paste the command straight from the CI log[*] and at worst get a "program not found" fixable with a simple apt/dnf install. A test setup should not be less automated than a build setup - ideally they should be treated as one.
[*] and githooks just a couple lines long, personal preference.
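For illustration, such a hook really can be a couple of lines; here is a hypothetical .git/hooks/pre-push that runs the same quick-check command the CI log might show (the command itself is a stand-in):

    #!/bin/sh
    # Hypothetical pre-push hook: run the same quick checks the CI would,
    # and refuse the push if they fail (exec propagates the exit status).
    exec ./run-tests.sh --quick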
Posted Mar 12, 2020 15:50 UTC (Thu)
by immibis (subscriber, #105511)
[Link]
This. When you have a physical server running CI (or a handful of servers with different hardware configurations that you want to test), you *want* to run it as much as possible because otherwise it's doing nothing. It's not until you hit your capacity limit that you have to find the most economical level of testing.
Pay-by-the-hour/gigabyte cloud hosting just makes it harder to know what your limits are. I have no doubt that most of these projects could get by with a much smaller rate of CI builds on many fewer platforms. But the power of CI is that it helps you find obscure bugs quickly, and then it wouldn't do that any more.
Imagine this scenario: everyone tests their own code, and the CI test runs only once a day on each platform. You would have a one-day turnaround on obscure platform bugs, but the cost would be way way way lower.
Posted Mar 5, 2020 7:01 UTC (Thu)
by amonakov (guest, #117106)
[Link] (1 responses)
> I don't think we need to worry so much about the cost of CI that we need to micro-optimize to to get the minimal number of CI runs. We especially shouldn't if it begins to impact coffee quality, ...
Posted Mar 5, 2020 17:45 UTC (Thu)
by NYKevin (subscriber, #129325)
[Link]
Posted Mar 5, 2020 7:13 UTC (Thu)
by marcH (subscriber, #57642)
[Link] (4 responses)
From a security perspective, the ability for basically anyone to run arbitrary code at ease never ceases to amaze me. I know: containers etc. but still.
Posted Mar 5, 2020 10:43 UTC (Thu)
by pkern (subscriber, #32883)
[Link] (2 responses)
Posted Mar 5, 2020 17:48 UTC (Thu)
by NYKevin (subscriber, #129325)
[Link] (1 responses)
Posted Mar 6, 2020 7:16 UTC (Fri)
by pkern (subscriber, #32883)
[Link]
Posted Mar 5, 2020 11:52 UTC (Thu)
by blackwood (guest, #44174)
[Link]
Posted Mar 5, 2020 15:24 UTC (Thu)
by martin.langhoff (subscriber, #61417)
[Link]
Gitlab and others have the ability to run different test suites for the PRs and on a configurable "cron" schedule, so you run the smoketests on every PR, and the "full" test suite on a daily basis on the master branch. If the full test suite breaks, you can easily bisect to the commit or PR that broke it.
If data transfer of VM/Docker images and other assets is dominating the bill, a local image repo or just a cache should wipe out 99% of that cost.
All tractable issues, IMO. And CI is a net win.
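One concrete knob along those lines is the runner-side image pull policy. A hypothetical gitlab-runner config.toml fragment (not the fd.o runners' actual settings) that reuses images already present on the host instead of pulling them over the network for every job:

    # Hypothetical config.toml fragment for a gitlab-runner host.
    [[runners]]
      name     = "local-runner"
      executor = "docker"
      [runners.docker]
        image       = "debian:buster"
        pull_policy = "if-not-present"     # only pull when the image isn't already cached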
Posted Mar 5, 2020 17:40 UTC (Thu)
by smitty_one_each (subscriber, #28989)
[Link] (3 responses)
"You mean: two to three seconds," said the U.S. DoD.
It kills me that open source is so beggared. I enjoy being an FSF member, because we have to feed these good tools.
Posted Mar 6, 2020 1:04 UTC (Fri)
by gerdesj (subscriber, #5446)
[Link] (2 responses)
> "You mean: two to three seconds," said the U.S. DoD.
> It kills me that open source is so beggared. I enjoy being an FSF member, because we have to feed these good tools.
It is slightly annoying to me as a tax payer that the likes of DoD 'n' co could be so wasteful that you think it should be routine to look down on sums like that because they can be consumed in seconds.
Here in the UK we laugh at how much it costs for a football (soccer) player to take a shit on a contract at something like £200,000 p/w. Allow five mins in the bog: 200,000*52/365/24/12 = £99. Now that is obviously value for money but I suspect that a right thinking tax payer's largesse consumer would slap a £5000 per turd testing and initial deployment fee on top. Obviously there will be turd maintenance fees for 30 years which will be forecast scaled (inappropriately) and billed for 50 years, with an overrun of extra charges cascaded for another 20 years (because: "nonsense").
That's a very simplified (and only slightly jaded) view of military spending by ... some military lot.
Posted Mar 6, 2020 1:16 UTC (Fri)
by smitty_one_each (subscriber, #28989)
[Link] (1 responses)
What I've gleaned is that all human organizations of size (> Dunbar's Number) tend toward the Tower of Babel. Inevitably. Because #people.
Posted Mar 6, 2020 1:34 UTC (Fri)
by gerdesj (subscriber, #5446)
[Link]
Well yes they do for obvious reasons. I'll play straight man here. Dunbar's number is a measure of your direct contacts. Remember that each person has their own circle and it is subtly different from all the others. Changes in language, accent, dialect etc will propagate gradually across these circles of influence. Give it time and modern English, Dutch and German are very different languages, despite having fairly similar roots.
Posted Mar 5, 2020 17:59 UTC (Thu)
by flussence (guest, #85566)
[Link] (7 responses)
Posted Mar 5, 2020 18:34 UTC (Thu)
by Cyberax (✭ supporter ✭, #52523)
[Link] (2 responses)
Posted Mar 6, 2020 7:51 UTC (Fri)
by smurf (subscriber, #17840)
[Link]
Yes I know that economy of scale wouldn't work if *everybody* was only willing to pay their marginal cost, but that's why hosting commercial code should be (and often is, cf. Github) priced differently than free software.
Posted Mar 6, 2020 8:29 UTC (Fri)
by fw (subscriber, #26023)
[Link]
The branded public cloud overcharges a lot for network traffic. It's quite ridiculous. You don't even have to roll out your own network to avoid these costs. Just switching to an off-brand cloud helps a lot.
Posted Mar 10, 2020 12:34 UTC (Tue)
by judas_iscariote (guest, #47386)
[Link] (2 responses)
Posted Mar 10, 2020 12:56 UTC (Tue)
by pizza (subscriber, #46)
[Link] (1 responses)
For my group, keeping everything in-house is both faster _and_ cheaper.
Posted Mar 16, 2020 22:24 UTC (Mon)
by nix (subscriber, #2304)
[Link]
Posted Mar 12, 2020 15:53 UTC (Thu)
by immibis (subscriber, #105511)
[Link]
Posted Mar 5, 2020 19:00 UTC (Thu)
by rahvin (guest, #16953)
[Link] (3 responses)
Posted Mar 5, 2020 19:20 UTC (Thu)
by kaesaecracker (subscriber, #126447)
[Link] (1 responses)
Posted Mar 6, 2020 16:25 UTC (Fri)
by marcH (subscriber, #57642)
[Link]
Cache these too and transfer with rsync? It performs miracles when the binaries are not compressed.
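As a hypothetical example of that kind of transfer (host name and paths are made up), an uncompressed image in a runner's cache can be updated in place, so rsync's delta algorithm only sends the blocks that changed since the last run:

    # Hypothetical: refresh a cached, uncompressed CI image on a runner.
    rsync --archive --inplace --partial \
        ci-image.raw runner.example.org:/var/cache/ci-images/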
Posted Mar 5, 2020 23:16 UTC (Thu)
by daniels (subscriber, #16193)
[Link]
Posted Mar 6, 2020 10:10 UTC (Fri)
by Seirdy (guest, #137326)
[Link] (1 responses)
Posted Mar 6, 2020 14:13 UTC (Fri)
by mathstuf (subscriber, #69389)
[Link]
Posted Mar 9, 2020 16:56 UTC (Mon)
by wookey (guest, #5501)
[Link] (1 responses)
This probably pales into insignificance in comparison to the madness that is bitcoin, but we should consider how much CI we are doing and where we are doing it and whether the emissions are worth the benefit we get. Being wasteful is not excused by doing it on behalf of free software. '10 years to halve emissions' might include being a bit more responsible/targeted with our CI.
Posted Mar 12, 2020 15:54 UTC (Thu)
by immibis (subscriber, #105511)
[Link]
Posted Mar 10, 2020 6:18 UTC (Tue)
by pkolloch (subscriber, #21709)
[Link]
The costs of continuous integration
Write code ---> lint ---> test ---> |||----> build ----> deploy
The ||| indicates where we do our hand-off. IMO most failures tend to occur at the start of the pipeline, the stuff that can easily be picked up client side. Then you don't waste countless build minutes and cash.
The costs of continuous integration
My generation relies on local tools to catch such errors.
The next generation just pushes everything to the CI.
The costs of continuous integration
When the maintainer wants to write about code, but the maintainer's brain is yearning for its daily dose:
The costs of continuous coffee consumption
Maybe a dumb question
[...] while running quicker tests on the self-hosted instance? Expensive tests can be run on the local instance for new tags/versions, but for every little commit it should be okay to run on an external instance.
The costs of continuous integration
I see a lot of CI pipelines that build and rerun everything nowadays, even though saner build systems like nix or bazel will cache appropriately. You still have to structure your build dependencies a bit carefully but then you can often avoid a lot of work.
nix caches by transitive input hashes right now, so bazel caching by immediate inputs is often more precise; e.g., the reformatting mentioned in other comments would cause the directly affected build units to rebuild, but their output would be the same (given a reproducible compiler), so the rest is skipped.
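For reference, pointing a Bazel build at a cache takes only a couple of lines of .bazelrc; this is a hypothetical example, and the cache endpoint is made up:

    # Hypothetical .bazelrc: reuse action outputs across CI runs.
    build --disk_cache=/var/cache/bazel                    # local on-runner cache
    build --remote_cache=grpcs://bazel-cache.example.com   # shared cache service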