
LLVM ist a mess

Posted Mar 17, 2024 9:59 UTC (Sun) by khim (subscriber, #9252)
In reply to: LLVM ist a mess by Curan
Parent article: Cranelift code generation comes to Rust

> if you need to break core concepts this often, you probably made a lot of mistakes in the past

Nope. The Linux kernel breaks internal interfaces pretty often, too. The only problem with LLVM is that it's advertised as a separate project while in reality it's only one half of many projects.

If all these projects had lived in one repo and people had changed everything in sync, it wouldn't even have been visible.

> LLVM ist just not a stable platform you can develop against

That's the core issue: it was never designed as such. Clang/LLVM developers even explicitly said that you shouldn't try to use it as a stable platform. But lots of companies wanted a stable compiler platform, and they decreed that LLVM is it, against the developers' wishes and insistence.

> a lot of LLVM feels like it is a test environment to try out new things for the compiler space

Which is precisely what LLVM was designed for. Just open Wikipedia and read: "LLVM was originally developed as a research infrastructure to investigate dynamic compilation techniques for static and dynamic programming languages." From what you are saying, LLVM works and acts the way it was designed to work and act, so why is that an issue?

> but then it shouldn't be the basis of anything else

Build a “better basis for anything else”; isn't that the right solution? Maybe as an LLVM fork, or written from scratch.

I was told in no uncertain terms, in a somewhat tangentially related discussion just over there, that you have zero right to complain since LLVM is free.

> The one thing I'll never understand is how so many parts of the Khronos/Mesa ecosystem (and others, including Rust and WebAssembly) can depend on such an unstable platform.

License. Writing compilers is a hard and time-consuming process. Thus there are, realistically, only two choices: LLVM and gcc (via libgccjit). And pointy-haired bosses out there don't like the GPL, so LLVM was chosen. Initially they even mandated the use of bitcode, which produced many stillborn projects (PNaCl, RenderScript and bitcode iOS apps, to name a few). After they realized that the developers weren't joking, and that they couldn't force them to do what they had never promised to do, bitcode use was abandoned; but since no replacement was available, LLVM use continued.



LLVM ist a mess

Posted Mar 17, 2024 16:03 UTC (Sun) by jem (subscriber, #24231) [Link] (1 responses)

>LLVM was originally developed as a research infrastructure to investigate dynamic compilation techniques for static and dynamic programming languages.

Note the word "originally". That was 21 years ago, and the sentence does not imply it still is nothing more than a research project. On the official LLVM web site we can read "LLVM began as a research project[...] Since then LLVM has grown to be an umbrella project consisting of a number of subprojects, many of which are being used in production by a wide variety of commercial and open source projects[...]"

LLVM ist a mess

Posted Mar 17, 2024 16:20 UTC (Sun) by khim (subscriber, #9252) [Link]

Beyond a certain size it's incredibly hard to change the nature of a project. Like Windows 11, which includes certain design decisions that can be traced back to decisions made more than half a century ago when TOPS-10 was designed, many things in LLVM are still in the shape needed for a research project.

LLVM ist a mess

Posted Mar 17, 2024 16:32 UTC (Sun) by tialaramex (subscriber, #21167) [Link] (3 responses)

In particular though LLVM's IR has a lot of places where either they say "We do X" but actually "Oh, we just assumed C++ even though that's not what we wrote" or they just don't explain and when you ask "Um, I guess it's whatever C++ requires". Part of this leaks from Clang and has been significantly corrected by the competing requirements from the Rust compiler team but part of it is clearly more intrinsic to LLVM as a project - they have people who think in C++ when they're supposed to be thinking in terms of LLVM's IR. There's a sloppiness that I'd expect from C++ people and I think is less prevalent for Cranelift.

I think "We assumed C++" is a problem for a research project too. Lots of interesting new work from the last few decades can't happen if you're just "assuming C++" everywhere, what you get out is "Oh well, apparently it's impossible to do better than C++" because you've assumed that's all that's possible.

LLVM ist a mess

Posted Mar 17, 2024 17:49 UTC (Sun) by farnz (subscriber, #17727) [Link] (2 responses)

And even when they didn't assume C++, there have often been bugs that boil down to "Clang doesn't use this much, therefore it's not routinely tested and there are a lot of lurking bugs"; see the fun Rust has had trying to use noalias on references, where because the matching Clang feature (the C99 restrict type qualifier) is rarely used correctly, miscompilations by LLVM traceable to noalias in LLVM IR kept blocking Rust from using it for Rust references (which definitionally can't alias each other).
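
For readers who have not met the C99 feature mentioned above, here is a minimal sketch of what restrict promises to the compiler. The function names are made up for illustration; the point is only that the second version lets the optimizer assume the two pointers never overlap, which is the same kind of assumption noalias expresses in LLVM IR.

```c
#include <stddef.h>

/* Hypothetical helper: dst and src may overlap here, so after the first
   store to dst[i] the compiler has to assume *src may have changed and
   must load it again. */
void add_twice(int *dst, const int *src, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        dst[i] += *src;
        dst[i] += *src;   /* reload needed: dst[i] might be *src */
    }
}

/* Same body, but the caller now promises that dst and src do not overlap.
   The compiler may load *src once and reuse the value. Calling this with
   overlapping pointers is undefined behaviour. */
void add_twice_restrict(int *restrict dst, const int *restrict src, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        dst[i] += *src;
        dst[i] += *src;
    }
}
```

With non-overlapping arguments both functions give the same result; only the first one has a defined result when dst and src point at the same object, which is why an optimizer bug in this area can stay hidden as long as few callers use restrict.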

LLVM ist a mess

Posted Mar 17, 2024 19:09 UTC (Sun) by Wol (subscriber, #4433) [Link] (1 responses)

Is LLVM written in C++? Should they re-write it in Rust?

:-))

Cheers,
Wol

LLVM ist a mess

Posted Mar 17, 2024 19:53 UTC (Sun) by farnz (subscriber, #17727) [Link]

Implementation language isn't at the root of this; the underlying issue is that LLVM IR's semantics aren't (yet) formally defined, but instead rely on informal reasoning throughout LLVM. As a consequence, it's intractable to verify that LLVM actually implements the claimed semantics, and it's not reasonable to write test cases that validate the LLVM IR semantics are met in edge cases, since we don't actually know what the edge cases are.

There's efforts afoot to fully define LLVM IR semantics formally, and one of the biggest outputs those efforts are having (at the moment) is finding bugs in existing LLVM functionality, where existing LLVM code assumes opposing meanings (that both fit the informally defined semantics) for the same LLVM IR construct in different places.

LLVM ist a mess

Posted Mar 17, 2024 17:06 UTC (Sun) by willy (subscriber, #9762) [Link] (18 responses)

Writing a compiler is not a hard problem. Evidence: https://bellard.org/tcc/ (and many other compilers).

What is hard is creating a thriving project that has many people who are dedicated to finding & fixing the glass jaws. There's also a question of how much optimisation you really need; TCC takes that to an extreme, but maybe it's a useful extreme.

LLVM ist a mess

Posted Mar 17, 2024 19:56 UTC (Sun) by roc (subscriber, #30627) [Link] (6 responses)

"Writing a compiler is not a hard problem" is too ambiguous to be useful. The gulf between a minimal C compiler (TCC) and an optimizing cross-platform C++ compiler with all the bells and whistles (good error messages, sanitizers, etc etc etc etc) is so vast you're not talking about the same thing at all.

LLVM ist a mess

Posted Mar 17, 2024 20:25 UTC (Sun) by tialaramex (subscriber, #21167) [Link] (5 responses)

One compiler application which feels intuitively useful to me (but I'm not a language designer) would be to have a _non-optimising_ compiler which can translate from a suitable language to dependable constant time machine code for some N architectures where N > 1

The purpose would be to wean ourselves off machine code for writing core cryptographic libraries. It would be nice if the sort of people who enter NIST competitions could write this rather than C but it's not crucial.

In this application we actually don't want ordinary optimisation, so I suspect some (many?) optimisation strategies are invalid and it may be faster to begin from almost nothing.
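
As a rough illustration of the property being asked for, the sketch below contrasts an early-exit comparison with one that is constant-time at the source level. The function names are invented for this example, and nothing here guarantees constant-time machine code: an optimizing compiler is free to rewrite either loop, which is the gap such a dedicated compiler would have to close.

```c
#include <stddef.h>
#include <stdint.h>

/* Early-exit comparison: the loop stops at the first mismatch, so the
   running time depends on how many leading bytes match. */
int leaky_equal(const uint8_t *a, const uint8_t *b, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (a[i] != b[i])
            return 0;
    }
    return 1;
}

/* Source-level constant-time comparison: always reads all n bytes and has
   no data-dependent branch in the C source. Whether the emitted machine
   code keeps that property is up to the compiler and the CPU. */
int ct_equal(const uint8_t *a, const uint8_t *b, size_t n)
{
    unsigned int diff = 0;
    for (size_t i = 0; i < n; i++)
        diff |= (unsigned int)(a[i] ^ b[i]);
    /* diff is in 0..255, so (diff - 1) >> 8 has its low bit set
       only when diff == 0 (the subtraction wraps around). */
    return (int)(((diff - 1u) >> 8) & 1u);
}
```

Both functions return 1 for equal buffers and 0 otherwise; they differ only in what an observer could learn from timing.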

LLVM ist a mess

Posted Mar 17, 2024 22:07 UTC (Sun) by khim (subscriber, #9252) [Link] (4 responses)

> One compiler application which feels intuitively useful to me (but I'm not a language designer) would be to have a _non-optimising_ compiler which can translate from a suitable language to dependable constant time machine code for some N architectures where N > 1

You do realize that for modern CPUs “architecture”, here, would include not just the CPU vendor, but also the stepping, the microcode version, etc.? One trivial example: when Intel implemented the BMI instructions in 2013 they had a nice, constant execution time, but four years later AMD turned them into a nice “let's leak all your data for everyone to see” version, and every microcode update (on both AMD and Intel) may do the same to any instruction in order to patch some other vulnerability.

> In this application we actually don't want ordinary optimisation, so I suspect some (many?) optimisation strategies are invalid and it may be faster to begin from almost nothing.

Before you can even begin attempting something like this, you would need to define what you want in the end. Given that, with enough samples, you can even distinguish between xor %eax,%eax and mov $1,%eax (they differ in their effect on flags, and one is 2 bytes while the other is 5 bytes), you would first need to define some metric that says whether timings are “sufficiently similar” or not.

The whole thing looks like an incredible waste of manpower: instead of trying to achieve something that can't realistically be achieved on modern CPUs, we should ensure that non-ephemeral keys are generated on a dedicated core. Adding a tiny ARM core (Cell-style) would be much easier and more robust than attempting to create such a compiler.

Constant-time cryptography

Posted Mar 18, 2024 7:05 UTC (Mon) by DemiMarie (subscriber, #164188) [Link] (1 responses)

This is not the first time I have seen this suggestion. It is also completely non-viable in practice. The security core will be much slower than the other cores, which will ruin performance. One can avoid that by using a hardware accelerator instead of a slow core, but then one needs to (a) patch all of the existing applications and libraries to use the accelerator and (b) deal with the fact that hardware accelerators, especially for symmetric cryptography, require an asynchronous API to get good performance. That requires application changes, not just library ones.

Hardware crypto engines are nice, but they are not at all a substitute for constant time guarantees for software operations.

Constant-time cryptography

Posted Mar 18, 2024 8:55 UTC (Mon) by khim (subscriber, #9252) [Link]

> Hardware crypto engines are nice, but they are not at all a substitute for constant time guarantees for software operations.

Oh, sure. Hardware works. “Constant time guarantees” are snake oil you may lucratively sell. Completely different products with different properties and target audiences.

> That requires application changes, not just library ones.

So you can't even change apps, yet somehow pretend that they are not leaking your precious key in some way other than operations running at different speeds depending on the source?

Your keys are not leaking (or maybe they are leaking and you just don't know it) because nobody targets you. It's as simple as that.

LLVM ist a mess

Posted Mar 18, 2024 9:01 UTC (Mon) by pm215 (subscriber, #98099) [Link] (1 responses)

Modern CPUs, at least for Intel and Arm, have an architecturally defined data independent timing mode that you can enable in a status register bit when you want to execute this kind of crypto code, and which then guarantees that execution timing of a specified subset of instructions is not dependent on the data they are operating on. So I think the situation is not so bleak as you suggest: there's now a defined set of "stay within these boundaries and things won't change in future designs or microcode updates" rules.

LLVM ist a mess

Posted Mar 18, 2024 9:08 UTC (Mon) by khim (subscriber, #9252) [Link]

> Modern CPUs, at least for Intel and Arm, have an architecturally defined data independent timing mode that you can enable in a status register bit when you want to execute this kind of crypto code, and which then guarantees that execution timing of a specified subset of instructions is not dependent on the data they are operating on.

They would still depend on the alignment of your data and code, on the speculative behaviour of code executed before and after you call that “well crafted” code, and so on.

Just look at the continuous struggle to guarantee that SGX is useful for something, with yet another vulnerability revealed less than a week ago.

Ultimately the solution will be the same as with memory safety in C: the solution that was obvious on day one will be applied… but only after everything else has been tried unsuccessfully.

LLVM ist a mess

Posted Mar 18, 2024 15:18 UTC (Mon) by paulj (subscriber, #341) [Link] (10 responses)

"writing a [blah] is not a hard problem" -> links to something written by one of the most prodigious, genius authors of Free Software. Hmm... ;)

LLVM ist a mess

Posted Mar 18, 2024 16:28 UTC (Mon) by willy (subscriber, #9762) [Link] (5 responses)

You've completely missed the point of that comment, but to refute the uninteresting part you're quibbling with:

https://student.cs.uwaterloo.ca/~cs444/

Team of four students builds a compiler in three months.

LLVM ist a mess

Posted Mar 18, 2024 18:15 UTC (Mon) by paulj (subscriber, #341) [Link] (4 responses)

It was mostly humour. I too, like most students who've done a CS degree, have had to write some kind of compiler for a class assignment (a toy front-end with type-checker for a subset of a typed JS-ish language). I've written compilers for simple DSLs - in AWK even ;). A compiler itself is not /that/ hard - completely agreed.

I was just pointing out the humour in the irony of making that point via an example written by an author who has a (prodigious) habit of solving difficult problems. ;)

I agree though that, even if a basic compiler is simple, there is a /lot/ more to making a _good_ C/C++ compiler.

LLVM ist a mess

Posted Mar 18, 2024 23:48 UTC (Mon) by NYKevin (subscriber, #129325) [Link] (1 responses)

> I too, like most students who've done a CS degree, have had to write some kind of compiler for a class assignment (a toy front-end with type-checker for a subset of a typed JS-ish language).

Fun fact: If your professor is sufficiently insane, it is possible that you will end up having to write an interpreter for the (untyped) lambda calculus. So count yourself lucky that you got a language that actually looked vaguely modern.

OTOH, I must admit that the lambda calculus is much, *much* easier to implement than most real languages. It only has 2½ rules, or 1½ if you use De Bruijn indexing. But I would've liked to do a real language, or at least something resembling a real language. I often feel that the most difficult courses were the only ones that actually taught me anything useful.

LLVM ist a mess

Posted Mar 20, 2024 20:14 UTC (Wed) by ringerc (subscriber, #3071) [Link]

You can always make the course more interesting.

I had a comp sci course on concurrency proofs and theory. The tool they used for it sucked so I updated it from the ancient RH4 target it required and replaced the build system. Then fixed some bugs and memory issues. Improved the error messages and generally made the tool nicer to use.

LLVM ist a mess

Posted Mar 19, 2024 4:42 UTC (Tue) by buck (subscriber, #55985) [Link] (1 responses)

> I was just pointing out the humour in the irony of making that point via an example written by an author who has a (prodigious) habit of solving difficult problems. ;)

Well, I'm with you: a compiler written by Fabrice Bellard is not your run-of-the-mill hobby project.

But, I'm really more just pointing out that this still blows me away:

https://bellard.org/jslinux/index.html

which i think saves this comment from being (rightly) criticized for being OT, since it's more Linux-y than the article, and this is LWN after all, dang it.

Well, to get this right back on topic, i can actually just point out that JSLinux features tcc and gcc but not clang:

localhost:~# cat readme.txt
Some tests:

- Compile hello.c with gcc (or tcc):

gcc hello.c -o hello
./hello

- Run QuickJS:

qjs hello.js

- Run python:

python3 bench.py
localhost:~#

b/c the performance win:

[`time gcc -o hello -c hello.c -O0` output elided to spare my old laptop's feelings]

LLVM ist a mess

Posted Mar 19, 2024 4:47 UTC (Tue) by buck (subscriber, #55985) [Link]

Sorry. Let me stand corrected by myself:

Quoth https://bellard.org/jslinux/news.html:

2020-07-05:

Added the Alpine Linux distribution. Many packages are included such as gcc, Clang, Python 2.7 and 3.8, Node.js, Ruby, PHP, ... The more adventurous (and patient) people can also try to run Wine or Firefox.
Added SSE2 support to the x86 emulator
Added dynamic resizing of the terminal

LLVM ist a mess

Posted Mar 18, 2024 18:02 UTC (Mon) by farnz (subscriber, #17727) [Link]

I've written more than one compiler, and I would not count myself as a Fabrice Bellard level developer. You should be able to write a simple optimizing C compiler following a book like this in about 3 months full-time effort if you're a competent developer (less if you're willing to reuse tools like gas and ld rather than doing everything yourself).

LLVM ist a mess

Posted Mar 18, 2024 19:57 UTC (Mon) by Cyberax (✭ supporter ✭, #52523) [Link] (2 responses)

Writing a "toy C" compiler had been a course project in my university. It was C with most of its functionality, except things that are baffling, like syntactic ambiguity between function pointers and definitions, dangling else, etc.

It really is not a hard problem. Annoying and somewhat long, but not hard.

LLVM ist a mess

Posted Mar 19, 2024 5:40 UTC (Tue) by adobriyan (subscriber, #30858) [Link] (1 responses)

And you'll learn operator precedence table for FREE!

LLVM ist a mess

Posted Mar 19, 2024 5:56 UTC (Tue) by Cyberax (✭ supporter ✭, #52523) [Link]

I actually promptly forgot it. I always just spam the code with parentheses to disambiguate anything that is more complex than 2*2+1.
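
To make the "toy compiler plus precedence table" point concrete, here is a small sketch of an expression compiler that uses precedence climbing. It is not taken from any course or book mentioned in this thread: the stack-machine instruction set, the struct layout and the function names are all invented for illustration, and it only handles non-negative integers, +, -, * and parentheses.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical stack-machine instruction set, invented for this sketch. */
enum op { OP_PUSH, OP_ADD, OP_SUB, OP_MUL };
struct insn { enum op op; long value; };

struct compiler {
    const char *src;     /* remaining input */
    struct insn *out;    /* output buffer */
    size_t len, cap;     /* instructions emitted / buffer capacity */
    int error;           /* set on syntax error or full buffer */
};

static void emit(struct compiler *c, enum op op, long value)
{
    if (c->len == c->cap) { c->error = 1; return; }
    c->out[c->len].op = op;
    c->out[c->len].value = value;
    c->len++;
}

static void skip_ws(struct compiler *c)
{
    while (isspace((unsigned char)*c->src)) c->src++;
}

static void expr(struct compiler *c, int min_prec);

/* primary := number | '(' expr ')' */
static void primary(struct compiler *c)
{
    skip_ws(c);
    if (*c->src == '(') {
        c->src++;
        expr(c, 1);
        skip_ws(c);
        if (*c->src == ')') c->src++; else c->error = 1;
    } else if (isdigit((unsigned char)*c->src)) {
        long v = 0;
        while (isdigit((unsigned char)*c->src))
            v = v * 10 + (*c->src++ - '0');
        emit(c, OP_PUSH, v);
    } else {
        c->error = 1;
    }
}

/* The entire operator precedence table of this toy language. */
static int prec(char ch)
{
    switch (ch) {
    case '+': case '-': return 1;
    case '*':           return 2;
    default:            return 0;   /* not a binary operator */
    }
}

/* Precedence climbing for left-associative binary operators. */
static void expr(struct compiler *c, int min_prec)
{
    primary(c);
    for (;;) {
        skip_ws(c);
        char ch = *c->src;
        int p = prec(ch);
        if (p == 0 || p < min_prec || c->error) return;
        c->src++;
        expr(c, p + 1);   /* right operand binds tighter than ch */
        emit(c, ch == '+' ? OP_ADD : ch == '-' ? OP_SUB : OP_MUL, 0);
    }
}

/* Returns the number of instructions written to out, or -1 on error. */
long compile(const char *src, struct insn *out, size_t cap)
{
    struct compiler c = { src, out, 0, cap, 0 };
    expr(&c, 1);
    skip_ws(&c);
    if (c.error || *c.src != '\0') return -1;
    return (long)c.len;
}

/* Runs code produced by a successful compile(); assumes n <= 64. */
long run(const struct insn *code, long n)
{
    long stack[64];
    size_t sp = 0;
    for (long i = 0; i < n; i++) {
        if (code[i].op == OP_PUSH) { stack[sp++] = code[i].value; continue; }
        long b = stack[--sp];
        long a = stack[--sp];
        long r = 0;
        switch (code[i].op) {
        case OP_ADD: r = a + b; break;
        case OP_SUB: r = a - b; break;
        case OP_MUL: r = a * b; break;
        default: break;
        }
        stack[sp++] = r;
    }
    return stack[0];
}
```

Compiling "2*2+1" into a 64-entry buffer yields five instructions (push, push, multiply, push, add), and running them gives 5. Everything a real C compiler adds on top of this, from types to optimization to diagnostics, is where the "annoying and somewhat long" part comes in.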

LLVM ist a mess

Posted Mar 20, 2024 22:44 UTC (Wed) by Curan (subscriber, #66186) [Link] (1 responses)

>> if you need to break core concepts this often, you probably made a lot of mistakes in the past
>
> Nope. The Linux kernel breaks internal interfaces pretty often, too. The only problem with LLVM is that it's advertised as a separate project while in reality it's only one half of many projects.
>
> If all these projects had lived in one repo and people had changed everything in sync, it wouldn't even have been visible.

I do not care about internal interfaces. I do care about what they break in their official interfaces, especially in their C interface (which gets worse every release, it seems, and forces you to use the even worse C++ interface).

And offering a libllvm/libclang means you have to take some responsibility. At least make the minor versions work across the board all the time. I still get some failures there when some piece of software bundles its own LLVM and the system has a different one.

>> a lot of LLVM feels like it is a test environment to try out new things for the compiler space
>
> Which is precisely what LLVM was designed for. Just open Wikipedia and read: "LLVM was originally developed as a research infrastructure to investigate dynamic compilation techniques for static and dynamic programming languages." From what you are saying, LLVM works and acts the way it was designed to work and act, so why is that an issue?

First of all – as others pointed out too – that is not what LLVM claims about itself these days. (NB: LLVM/clang is the standard compiler for eg. Mac OS via XCode.) So either there needs to be a big fat warning at the top of all of LLVM that says "don't use me for production, I am a test environment" or LLVM needs to play ball.

>> but then it shouldn't be the basis of anything else
>
> Build a “better basis for anything else”; isn't that the right solution? Maybe as an LLVM fork, or written from scratch.
>
> I was told in no uncertain terms, in a somewhat tangentially related discussion just over there, that you have zero right to complain since LLVM is free.

I have no issue with somebody attempting to build something better (no matter the language or the licensing model). What I do have an issue with is my stuff breaking because of some library. The glibc makes sure my oldest programs still work (even though there will be not much of a chance to get a new version of them for me). I want that commitment from LLVM. I do not care what they do internally. But their interfaces need to be stable enough.

LLVM ist a mess

Posted Mar 21, 2024 10:06 UTC (Thu) by khim (subscriber, #9252) [Link]

> At least make the minor versions work across the board all the time.

Why should they do that, and was such a promise ever made on their web site?

> I still get some failures there when some piece of software bundles its own LLVM and the system has a different one.

Yes, LLVM is designed around the idea that it would be bundled with a frontend. If you have other ideas then it's your responsibility to support them.

> NB: LLVM/clang is the standard compiler for eg. Mac OS via XCode.

Yes. And that pair have stable outer interfaces AFAIK.

> So either there needs to be a big fat warning at the top of all of LLVM that says "don't use me for production, I am a test environment" or LLVM needs to play ball.

It does do that if you use it according to its design.

For a long time LLVM wasn't even designed to be used as a shared library, but at some point Apple decided to change that, and they did. Now it's easier to embed LLVM into external projects as a shared library, but there are still no promises beyond that.

If you want something more than that, then it's your responsibility to offer such a solution.

> I want that commitment from LLVM. I do not care what they do internally. But their interfaces need to be stable enough.

But who would do the work to ensure that? That's a non-trivial amount of work and AFAIK no one ever volunteered.


Copyright © 2026, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds