The costs of continuous integration
By most accounts, the freedesktop.org (fd.o) GitLab instance has been a roaring success; lots of projects are using it, including Mesa, Linux kernel graphics drivers, NetworkManager, PipeWire, and many others. In addition, a great deal of continuous-integration (CI) testing is being done on a variety of projects under the fd.o umbrella. That success has come at a price, however. A recent message from the X.Org Foundation, which merged with fd.o in 2019, has made it clear that the current situation is untenable from a financial perspective. Given its current resources, X.Org cannot continue covering those costs beyond another few months.
X.Org board member Daniel Vetter posted a message to multiple mailing lists on February 27. In it, he noted that the GitLab instance has become quite popular, including the CI integration, which is good news. But there is some bad news to go with it:
The expense growth is mainly from "storing and serving build artifacts and images to outside CI runners sponsored by various companies". Beyond that, all of that growth means that a system administrator is needed to maintain the infrastructure, so "X.org is therefore also looking for admin sponsorship, at least medium term". Without more sponsors for the costs of the CI piece, it looks like those services would need to be turned off in May or June, he said. The board is working on finding additional sponsorship money, but that takes time, so it wanted to get the word out.
That set off a discussion of the problem and some possible solutions. Matt Turner was concerned that the bandwidth expense had not been adequately considered when the decision was made to self-host the GitLab instance. "Perhaps people building the CI would make different decisions about its structure if they knew it was going to wipe out the bank account." He wondered if the tradeoff was worth the cost:
Daniel Stone, one of the administrators for the fd.o infrastructure (who gave a talk on the organization's history at the 2018 X.Org Developers Conference), filled in some of the numbers for the costs involved. He said that the bill for January was based on 17.9TB of network egress (mostly copying CI images to the test-running systems) and 16TB of storage for "CI artifacts, container images, file uploads, [and] Git LFS". That totaled to almost $4,000, so the $75,000 projection takes into account further growth. In a follow-up message, he detailed the growth as well:
Stone also noted that the Google Cloud Platform (where the GitLab instance is hosted) does not provide all of the platform types needed for running the CI system. For example, Arm-based DragonBoards are needed, so some copying to external testing systems will be required. Using the cloud services means that bandwidth is metered, which is not necessarily true in other hosting setups, such as virtual private servers, as Jan Engelhardt pointed out. That would require more system administration costs, however, which Stone thinks would now make sense:
Dave Airlie argued that the CI infrastructure should be shut down "until we work out what a sustainable system would look like within the budget we have". He thought that it would be difficult to attract sponsors to effectively pay Google and suggested that it would make more sense for Google to cut out the middleman: "Having google sponsor the credits costs google substantially less than having any other company give us money to do it."
Vetter said that Google has provided $30,000 in hosting credits over the last year, but that money "simply ran out _much_ faster than anyone planned for". In addition, there are plenty of other ways that companies can sponsor the CI system:
The lack of any oversight of what gets run in the CI system and which projects are responsible for it is part of the problem, Airlie said. "You can't have a system in place that lets CI users burn [large] sums of money without authorisation, and that is what we have now." Vetter more or less agreed, but said that the speed of the growth caught the board by surprise, "so we're a bit behind on the controlling aspect". There is an effort to be able to track the costs by project, which will make it easier to account for where the money is going, and to take action if needed.
As part of the reassessment process, Kristian Høgsberg wanted to make sure that the "tremendous success" of the system was recognized. "Between gitlab and the CI, our workflow has improved and code quality has gone up." He said that it would have been hard to anticipate the growth:
Reducing costs
The conversation soon turned toward how to reduce the cost in ways that would not really impact the overall benefit that the system is providing. There may be some low-hanging fruit in terms of which kinds of changes actually need testing on all of the different hardware. As Erik Faye-Lund put it:
[...] We could also do stuff like reducing the amount of tests we run on each commit, and punt some testing to a per-weekend test-run or [something] like that. We don't *need* to know about every problem up front, just the stuff that's about to be released, really. The other stuff is just nice to have. If it's too expensive, I would say drop it.
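For illustration only (this is not fd.o's actual configuration; the job names and the test script are invented), GitLab's rules: keyword can express exactly that kind of split, with merge requests getting a quick subset and a scheduled pipeline getting the expensive full run:

    # Hypothetical .gitlab-ci.yml fragment: quick tests for every merge
    # request, the full (expensive) suite only in scheduled pipelines.
    quick-test:
      stage: test
      script:
        - ./run-tests.sh --quick           # placeholder test runner
      rules:
        - if: '$CI_PIPELINE_SOURCE == "merge_request_event"'

    full-test:
      stage: test
      script:
        - ./run-tests.sh --full            # placeholder for the whole suite
      rules:
        - if: '$CI_PIPELINE_SOURCE == "schedule"'

A pipeline schedule defined in the project settings would then trigger the full run at whatever cadence (weekly, say) the project decides it can afford.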
There were other suggestions along those lines, as well as discussion of how to use GitLab features to reduce some of the "waste" in the amount of CI testing that is being done. It is useful to look at all of that, but Jason Ekstrand cautioned against getting too carried away:
He continued by noting that more data will help guide the process, but he is worried about the effect on the development process of reducing the amount of CI testing:
[...] I'm fairly hopeful that, once we understand better what the costs are (or even with just the new data we have), we can bring it down to reasonable and/or come up with money to pay for it in fairly short order.
Vetter is also worried that it could be somewhat difficult to figure out what tests are needed for a given change, which could result in missing out on important test runs:
In any case, the community is now aware of the problem and is pitching in to start figuring out how best to attack it. Presumably some will also be working with their companies to see if they can contribute as well. Any of the solutions are likely to take some amount of effort both for developers using the infrastructure and for the administrators of the system.
GitLab's new open-source program manager, Nuritzi Sanchez, also chimed in; the company is interested in ensuring that community efforts like fd.o are supported beyond just the migration help that was already provided, she said. "We’ll be exploring ways for GitLab to help make sure there isn’t a gap in coverage during the time that freedesktop looks for sponsors."
While it may have come as a bit of a shock to some in the community, the announcement would seem to have served its purpose. The community now has been informed and can start working on the problem from various directions. Given the (too) runaway success of the CI infrastructure, one suspects that a sustainable model will be found before too long—probably well ahead of the (northern hemisphere) summer cutoff date.
Posted Mar 5, 2020 0:27 UTC (Thu)
by JohnVonNeumann (guest, #131609)
[Link] (19 responses)
I'm also not suggesting that you remove these tasks from the build, but if we visualise it:
Write code ---> ||| lint ---> test ---> ----> build ----> deploy
Posted Mar 5, 2020 0:37 UTC (Thu)
by roc (subscriber, #30627)
[Link] (6 responses)
Posted Mar 5, 2020 1:53 UTC (Thu)
by clopez (guest, #66009)
[Link]
Posted Mar 5, 2020 3:16 UTC (Thu)
by JohnVonNeumann (guest, #131609)
[Link] (4 responses)
Whilst I also get what you're saying about the tests, the simple fact is that most places are not firefox, and people should therefore stop acting as if their test suites are so large that they can't run them locally.
Posted Mar 5, 2020 3:28 UTC (Thu)
by gus3 (guest, #61103)
[Link] (1 responses)
But are you absolutely, positively certain that your work doesn't touch any other areas?
If the system integrator says otherwise, you'll need to run *all* the CI build tests.
Posted Mar 5, 2020 4:58 UTC (Thu)
by JohnVonNeumann (guest, #131609)
[Link]
Posted Mar 5, 2020 19:57 UTC (Thu)
by iabervon (subscriber, #722)
[Link] (1 responses)
Posted Mar 6, 2020 15:16 UTC (Fri)
by marcH (subscriber, #57642)
[Link]
Several times in my career I have had surreal discussions with sometimes quite senior developers where I would ask for tracking of the unfortunately large lists of "known failing" tests, to immediately detect and report regressions in passing tests and to measure progress. Very basic validation stuff, yet the answer was: "no, let's fix all the tests instead", which would of course never happen.
Posted Mar 5, 2020 1:20 UTC (Thu)
by anholt (guest, #52292)
[Link]
Posted Mar 5, 2020 7:07 UTC (Thu)
by marcH (subscriber, #57642)
[Link] (3 responses)
Here's an idea for basic stuff that anyone can run locally: provide the PASS/FAIL status ASAP but log the warnings and error messages to a file. Wait a couple more minutes before publishing the log somewhere (pretend it's for some vague technical reason, like the logs growing really big, or make something up). I bet the vast majority of people who are told "FAIL, run this command to know why or wait X minutes" will not wait and will run the command.
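A minimal sketch of that nudge, assuming a generic shell-based CI step (the test command and the upload helper are placeholders, not anything from the fd.o pipelines):

    #!/bin/sh
    # Hypothetical CI step: report PASS/FAIL immediately, but delay the
    # full log so that re-running the command locally is the faster path.
    status=0
    ./run-tests.sh > test.log 2>&1 || status=$?
    if [ "$status" -eq 0 ]; then
        echo "PASS"
    else
        echo "FAIL: reproduce locally with ./run-tests.sh (full log in ~5 minutes)"
        sleep 300                          # artificial delay before publishing the log
    fi
    ci-upload-artifact test.log            # placeholder for the CI system's upload step
    exit "$status"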
Posted Mar 5, 2020 8:32 UTC (Thu)
by gfernandes (subscriber, #119910)
[Link]
Posted Mar 6, 2020 9:28 UTC (Fri)
by geert (subscriber, #98403)
[Link] (1 responses)
Posted Mar 6, 2020 15:22 UTC (Fri)
by marcH (subscriber, #57642)
[Link]
A simple, more visible gap between developers who care a little bit about their productivity and those who don't really? Approved by HR immediately! Not sure I'm really against it either.
Posted Mar 5, 2020 9:07 UTC (Thu)
by rwmj (subscriber, #5474)
[Link] (1 responses)
Posted Mar 5, 2020 10:19 UTC (Thu)
by farnz (subscriber, #17727)
[Link]
That's not too far off what VS Code Remote Development lets you do. Granted, it's an Electron app, not a web site, but that's nearly the same thing.
The benefit is that instead of having to have a powerful laptop, you can simply choose one that's nice to work on, and then do all the compiler running (gcc etc) on a nice fast machine in a datacentre somewhere.
Posted Mar 5, 2020 16:16 UTC (Thu)
by daniels (subscriber, #16193)
[Link]
No, I'd be pretty impressed if we'd managed to burn so much money running clang-format.
GStreamer runs an extensive conformance suite (gst-validate) over its changes, as well as builds on several distributions and architectures. Mesa runs builds on several architectures (x86, x86-64, armv7, armv8, s390x, ppc), a couple of build systems (Meson for Linux, SCons for Windows), and then runs tests such as the OpenGL (ES) conformance test suite and Piglit across the software rasteriser and a couple of different hardware platforms (Panfrost for newer Arm Mali GPUs, Lima for older Mali GPUs, Freedreno for Qualcomm GPUs).
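As a rough, hypothetical sketch of what such a build matrix looks like in GitLab CI configuration (simplified, and not Mesa's actual .gitlab-ci.yml), each architecture or build-system combination is simply another job pointed at a different container image or runner tag:

    # Hypothetical build-matrix fragment; images, tags, and options are
    # illustrative only.
    build-meson-x86_64:
      image: debian:buster
      script:
        - meson build/ && ninja -C build/

    build-meson-armv8:
      image: arm64v8/debian:buster
      tags: [aarch64]                      # assumes a runner registered with this tag
      script:
        - meson build/ && ninja -C build/

    build-scons-windows:
      tags: [windows]                      # assumes a Windows runner is available
      script:
        - scons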
Posted Mar 6, 2020 5:14 UTC (Fri)
by mathstuf (subscriber, #69389)
[Link]
The issues we found are that the set of tools available on the client side are either not easily available everywhere (e.g., clang-format on Windows), or have silly version-specific behaviors (also clang-format). Plus, `git commit -n` exists, so you're stuck doing server-side enforcement *anyways*.
We're migrating to gitlab-ci at work and we do have a robot that does those kinds of checks (fairly cheaply; it's a 2-core VM and only really gets unhappy if you toss MRs with thousands of novel commits in them at it). There's another ~3x perf benefit lying around that I haven't gotten to because it hasn't been necessary (and having to plumb libgit2 through everything rather than just exposing "here's a way to run an arbitrary git command" is more painful than I'd like it to be). Hooking things up so that the robot blocks testing if some subset of checks aren't passing (e.g., formatting is bad vs. some WIP marker detection) doesn't seem to be *too* hard (though it may require manual intervention of some kind on the first stab).
It works with any Gitlab instance (Github too for those curious, but it's an app there, so on the deprecation route; getting it to work in an action is on the TODO list) if folks are interested (fully FOSS, just bad "marketing" since I've had it on my list to blog about it for…a long time now).
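A much simpler stand-in for that kind of robot (purely illustrative, and not the tool described above) is a CI job that reformats the tree and fails when anything changes, with the clang-format version pinned by the container image:

    # Hypothetical GitLab CI job enforcing formatting server-side.
    check-format:
      stage: lint
      image: debian:buster
      script:
        - apt-get update && apt-get install -y --no-install-recommends clang-format git
        - clang-format -i $(git ls-files '*.c' '*.h')
        - git diff --exit-code             # a non-empty diff means badly formatted code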
Posted Mar 6, 2020 9:32 UTC (Fri)
by geert (subscriber, #98403)
[Link]
Posted Mar 6, 2020 16:23 UTC (Fri)
by marcH (subscriber, #57642)
[Link]
How "simple" is that stuff really? The typical vicious circle is:
Ideally, anyone can copy/paste the command straight from the CI log[*] and at worst get a "program not found" fixable with a simple apt/dnf install. A test setup should not be less automated than a build setup - ideally they should be treated as one.
[*] and githooks just a couple lines long, personal preference.
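For illustration, such a hook really can be a couple of lines; here is a hypothetical .git/hooks/pre-push that runs the same quick-check command the CI log might show (the command itself is a stand-in):

    #!/bin/sh
    # Hypothetical pre-push hook: run the same quick checks the CI would,
    # and refuse the push if they fail (exec propagates the exit status).
    exec ./run-tests.sh --quick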
Posted Mar 12, 2020 15:50 UTC (Thu)
by immibis (subscriber, #105511)
[Link]
This. When you have a physical server running CI (or a handful of servers with different hardware configurations that you want to test), you *want* to run it as much as possible because otherwise it's doing nothing. It's not until you hit your capacity limit that you have to find the most economical level of testing.
Pay-by-the-hour/gigabyte cloud hosting just makes it harder to know what your limits are. I have no doubt that most of these projects could get by with a much smaller rate of CI builds on many fewer platforms. But the power of CI is that it helps you find obscure bugs quickly, and then it wouldn't do that any more.
Imagine this scenario: everyone tests their own code, and the CI test runs only once a day on each platform. You would have a one-day turnaround on obscure platform bugs, but the cost would be way way way lower.
Posted Mar 5, 2020 7:01 UTC (Thu)
by amonakov (guest, #117106)
[Link] (1 responses)
> I don't think we need to worry so much about the cost of CI that we need to micro-optimize to to get the minimal number of CI runs. We especially shouldn't if it begins to impact coffee quality, ...
Posted Mar 5, 2020 17:45 UTC (Thu)
by NYKevin (subscriber, #129325)
[Link]
Posted Mar 5, 2020 7:13 UTC (Thu)
by marcH (subscriber, #57642)
[Link] (4 responses)
From a security perspective, the ability for basically anyone to run arbitrary code at ease never ceases to amaze me. I know: containers etc. but still.
Posted Mar 5, 2020 10:43 UTC (Thu)
by pkern (subscriber, #32883)
[Link] (2 responses)
Posted Mar 5, 2020 17:48 UTC (Thu)
by NYKevin (subscriber, #129325)
[Link] (1 responses)
Posted Mar 6, 2020 7:16 UTC (Fri)
by pkern (subscriber, #32883)
[Link]
Posted Mar 5, 2020 11:52 UTC (Thu)
by blackwood (guest, #44174)
[Link]
Posted Mar 5, 2020 15:24 UTC (Thu)
by martin.langhoff (subscriber, #61417)
[Link]
Gitlab and others have the ability to run different test suites for the PRs and on a configurable "cron" schedule, so you run the smoketests on every PR, and the "full" test suite on a daily basis on the master branch. If the full test suite breaks, you can easily bisect to the commit or PR that broke it.
If data transfer of VM/Docker images and other assets is dominating the bill, a local image repo or just a cache should wipe out 99% of that cost.
All tractable issues, IMO. And CI is a net win.
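One concrete knob along those lines is the runner-side image pull policy. A hypothetical gitlab-runner config.toml fragment (not the fd.o runners' actual settings) that reuses images already present on the host instead of pulling them over the network for every job:

    # Hypothetical config.toml fragment for a gitlab-runner host.
    [[runners]]
      name     = "local-runner"
      executor = "docker"
      [runners.docker]
        image       = "debian:buster"
        pull_policy = "if-not-present"     # only pull when the image isn't already cached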
Posted Mar 5, 2020 17:40 UTC (Thu)
by smitty_one_each (subscriber, #28989)
[Link] (3 responses)
"You mean: two to three seconds," said the U.S. DoD.
It kills me that open source is so beggared. I enjoy being an FSF member, because we have to feed these good tools.
Posted Mar 6, 2020 1:04 UTC (Fri)
by gerdesj (subscriber, #5446)
[Link] (2 responses)
> "You mean: two to three seconds," said the U.S. DoD.
> It kills me that open source is so beggared. I enjoy being an FSF member, because we have to feed these good tools.
It is slightly annoying to me as a tax payer that the likes of DoD 'n' co could be so wasteful that you think it should be routine to look down on sums like that because they can be consumed in seconds.
Here in the UK we laugh at how much it costs for a football (soccer) player to take a shit on a contract at something like £200,000 p/w. Allow five mins in the bog: 200,000*52/365/24/12 = £99. Now that is obviously value for money but I suspect that a right thinking tax payer's largesse consumer would slap a £5000 per turd testing and initial deployment fee on top. Obviously there will be turd maintenance fees for 30 years which will be forecast scaled (inappropriately) and billed for 50 years, with an overrun of extra charges cascaded for another 20 years (because: "nonsense").
That's a very simplified (and only slightly jaded) view of military spending by ... some military lot.
Posted Mar 6, 2020 1:16 UTC (Fri)
by smitty_one_each (subscriber, #28989)
[Link] (1 responses)
What I've gleaned is that all human organizations of size (> Dunbar's Number) tend toward the Tower of Babel. Inevitably. Because #people.
Posted Mar 6, 2020 1:34 UTC (Fri)
by gerdesj (subscriber, #5446)
[Link]
Well yes they do for obvious reasons. I'll play straight man here. Dunbar's number is a measure of your direct contacts. Remember that each person has their own circle and it is subtly different from all the others. Changes in language, accent, dialect etc will propagate gradually across these circles of influence. Give it time and modern English, Dutch and German are very different languages, despite having fairly similar roots.
Posted Mar 5, 2020 17:59 UTC (Thu)
by flussence (guest, #85566)
[Link] (7 responses)
Posted Mar 5, 2020 18:34 UTC (Thu)
by Cyberax (✭ supporter ✭, #52523)
[Link] (2 responses)
Posted Mar 6, 2020 7:51 UTC (Fri)
by smurf (subscriber, #17840)
[Link]
Yes I know that economy of scale wouldn't work if *everybody* was only willing to pay their marginal cost, but that's why hosting commercial code should be (and often is, cf. Github) priced differently than free software.
Posted Mar 6, 2020 8:29 UTC (Fri)
by fw (subscriber, #26023)
[Link]
The branded public cloud overcharges a lot for network traffic. It's quite ridiculous. You don't even have to roll out your own network to avoid these costs. Just switching to an off-brand cloud helps a lot.
Posted Mar 10, 2020 12:34 UTC (Tue)
by judas_iscariote (guest, #47386)
[Link] (2 responses)
Posted Mar 10, 2020 12:56 UTC (Tue)
by pizza (subscriber, #46)
[Link] (1 responses)
For my group, keeping everything in-house is both faster _and_ cheaper.
Posted Mar 16, 2020 22:24 UTC (Mon)
by nix (subscriber, #2304)
[Link]
Posted Mar 12, 2020 15:53 UTC (Thu)
by immibis (subscriber, #105511)
[Link]
Posted Mar 5, 2020 19:00 UTC (Thu)
by rahvin (guest, #16953)
[Link] (3 responses)
Posted Mar 5, 2020 19:20 UTC (Thu)
by kaesaecracker (subscriber, #126447)
[Link] (1 responses)
Posted Mar 6, 2020 16:25 UTC (Fri)
by marcH (subscriber, #57642)
[Link]
Cache these too and transfer with rsync? It performs miracles when the binaries are not compressed.
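As a hypothetical example of that kind of transfer (host name and paths are made up), an uncompressed image in a runner's cache can be updated in place, so rsync's delta algorithm only sends the blocks that changed since the last run:

    # Hypothetical: refresh a cached, uncompressed CI image on a runner.
    rsync --archive --inplace --partial \
        ci-image.raw runner.example.org:/var/cache/ci-images/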
Posted Mar 5, 2020 23:16 UTC (Thu)
by daniels (subscriber, #16193)
[Link]
Posted Mar 6, 2020 10:10 UTC (Fri)
by Seirdy (guest, #137326)
[Link] (1 responses)
Posted Mar 6, 2020 14:13 UTC (Fri)
by mathstuf (subscriber, #69389)
[Link]
Posted Mar 9, 2020 16:56 UTC (Mon)
by wookey (guest, #5501)
[Link] (1 responses)
This probably pales into insignificance in comparison to the madness that is bitcoin, but we should consider how much CI we are doing and where we are doing it and whether the emissions are worth the benefit we get. Being wasteful is not excused by doing it on behalf of free software. '10 years to halve emissions' might include being a bit more responsible/targeted with our CI.
Posted Mar 12, 2020 15:54 UTC (Thu)
by immibis (subscriber, #105511)
[Link]
Posted Mar 10, 2020 6:18 UTC (Tue)
by pkolloch (subscriber, #21709)
[Link]
The costs of continuous integration
Write code ---> lint ---> test ---> |||----> build ----> deploy
The ||| indicates where we do our hand-off. IMO most failures tend to occur at the start of the pipeline, the stuff that can easily be picked up client side. Then you don't waste countless build minutes and cash.
The costs of continuous integration
My generation relies on local tools to catch such errors.
The next generation just pushes everything to the CI.
The costs of continuous integration
When the maintainer wants to write about code, but the maintainer's brain is yearning for its daily dose:
The costs of continuous coffee consumption
Maybe a dumb question
[...] while running quicker tests on the self-hosted instance? Expensive tests can be run on the local instance for new tags/versions, but for every little commit it should be okay to run on an external instance.
The costs of continuous integration
I see a lot of CI pipelines that build and rerun everything nowadays, even though saner build systems like nix or bazel will cache appropriately. You still have to structure your build dependencies a bit carefully but then you can often avoid a lot of work.
nix caches by transitive input hashes right now, so bazel caching by immediate inputs is often more precise; e.g., the reformatting mentioned in other comments would cause the directly affected build units to rebuild, but their output would be the same (given a reproducible compiler), so the rest is skipped.
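For reference, pointing a Bazel build at a cache takes only a couple of lines of .bazelrc; this is a hypothetical example, and the cache endpoint is made up:

    # Hypothetical .bazelrc: reuse action outputs across CI runs.
    build --disk_cache=/var/cache/bazel                    # local on-runner cache
    build --remote_cache=grpcs://bazel-cache.example.com   # shared cache service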