
Moving Google toward the mainline

Posted Oct 8, 2021 4:59 UTC (Fri) by zblaxell (subscriber, #26385)
In reply to: Moving Google toward the mainline by bfields
Parent article: Moving Google toward the mainline

One thing we learn very quickly when we try to use software to implement a production service is that there are two kinds of software: software we tested, and software we haven't tested yet. We can say a lot of things about the first kind, like it definitely works for some things and definitely doesn't work for other things, and various behaviors have changed in good and bad ways between versions. We can define boundaries around what we do and don't know about code behavior, and we can assess deployment risks based on empirical data.

All we can say about the second kind of software is that it hasn't been tested on our workload, so we don't know any of those things. Sure, it might have been tested by some group of domain-expert people, and certified by another group of accredited generalist people, and a third group of people with a lot of reddit upvotes swears it's awesome, and some robots didn't notice any of the more common problems--in fact, we'd insist on most or all of that before we bother downloading it for a test build. None of that fancy pedigree matters if we throw our production app on it, and it immediately falls over because we're doing something nobody else does. If we're providing a production service on a commercial scale with the software, it's highly likely we're doing something nobody else does. Even if others start doing what we do, we'd write some new code and be doing something different again. Maybe we're doing something wrong, and our tests (and only our tests) will make the problem visible.

The QA gatekeeper in front of the production server farm has one job: keep the server farm producing at least whatever it is producing now. They can keep running the kernel they already have, so they have no incentive to take risks that might jeopardize that. The gatekeeper will not accept broad assurances that plenty of testing has been done--they'll need to be *convinced* to upgrade, with evidence of monotonic improvement in the new versions, or of dire and irreparable problems arising in the old ones. "Personally tested by the maintainer and a team of leading experts in the subsystem" is an excellent start, but we'll run our own test suite on it before we call it "tested."

At every node in the integration graph, from developer's laptop to integration tree to merge window to release, LTS, and production deployment, someone is doing testing and deciding whether the code they pulled as input to their node is good enough to push to the output of their node (or in the case of testing robots, snooping on the edges between nodes and advising the node owners). Every node must consider its inputs "effectively untested," or the integration graph doesn't work. That's the whole point of having an integration graph: to combine diverse and isolated pools of domain expertise into a comprehensive testing workflow.
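
That rule is easy to state mechanically. Here's a minimal sketch in Python--every name in it (Node, integrate, the toy test suites) is hypothetical and models the workflow, not any real kernel tooling. Each node runs only its own suite against the candidate and ignores whatever verdict upstream nodes reached:

    # A node in the integration graph: it treats its input as
    # "effectively untested" and pushes it onward only if its OWN
    # test suite passes. (All names here are hypothetical.)
    class Node:
        def __init__(self, name, test_suite):
            self.name = name
            self.test_suite = test_suite  # tests for THIS node's workload

        def integrate(self, tree):
            # Upstream claims about `tree` carry no weight here.
            if tree is None:
                return None
            if all(test(tree) for test in self.test_suite):
                return tree   # good enough to push to our output
            return None       # hold the line; keep what we have

    # Toy suites: each node checks something only it cares about.
    laptop  = Node("developer laptop", [lambda t: "kernel" in t])
    subsys  = Node("subsystem tree",   [lambda t: not t.endswith("-rc0")])
    prod_qa = Node("production QA",    [lambda t: len(t) < 64])

    tree = "candidate-kernel-5.15"
    for node in (laptop, subsys, prod_qa):
        tree = node.integrate(tree)
        print(node.name, "->", "push" if tree else "reject")

The point of the sketch is the absence of any shortcut: no node's integrate() consults another node's result, which is exactly why the combined graph covers more than any single pool of expertise could.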



