LLVM ist a mess

Posted Mar 17, 2024 17:06 UTC (Sun) by willy (subscriber, #9762)
In reply to: LLVM ist a mess by khim
Parent article: Cranelift code generation comes to Rust

Writing a compiler is not a hard problem. Evidence: https://bellard.org/tcc/ (and many other compilers).

What is hard is creating a thriving project that has many people who are dedicated to finding & fixing the glass jaws. There's also a question of how much optimisation you really need; TCC takes that to an extreme, but maybe it's a useful extreme.



LLVM ist a mess

Posted Mar 17, 2024 19:56 UTC (Sun) by roc (subscriber, #30627) [Link] (6 responses)

"Writing a compiler is not a hard problem" is too ambiguous to be useful. The gulf between a minimal C compiler (TCC) and an optimizing cross-platform C++ compiler with all the bells and whistles (good error messages, sanitizers, etc etc etc etc) is so vast you're not talking about the same thing at all.

LLVM ist a mess

Posted Mar 17, 2024 20:25 UTC (Sun) by tialaramex (subscriber, #21167) [Link] (5 responses)

One compiler application which feels intuitively useful to me (but I'm not a language designer) would be to have a _non-optimising_ compiler which can translate from a suitable language to dependable constant time machine code for some N architectures where N > 1

The purpose would be to wean ourselves off machine code for writing core cryptographic libraries. It would be nice if the sort of people who enter NIST competitions could write this rather than C but it's not crucial.

In this application we actually don't want ordinary optimisation, so I suspect some (many?) optimisation strategies are invalid and it may be faster to begin from almost nothing.
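
To make the "ordinary optimisation is unwanted" point concrete, here is a minimal sketch of a branchy select next to a branch-free one, modelled on 32-bit words. The function names are made up for illustration, and Python itself gives no timing guarantees at all; the sketch only shows that the two forms compute the same value, which is exactly why an optimiser feels free to turn one into the other.

```python
MASK32 = 0xFFFFFFFF  # model a 32-bit machine word (assumption for this sketch)


def select_branchy(cond: int, a: int, b: int) -> int:
    """Return a if cond is 1, else b, using a data-dependent branch."""
    return a if cond else b


def select_masked(cond: int, a: int, b: int) -> int:
    """Same result without a branch; cond must be exactly 0 or 1."""
    mask = (-cond) & MASK32  # 0x00000000 when cond == 0, 0xFFFFFFFF when cond == 1
    return (a & mask) | (b & ~mask & MASK32)


def equal_masked(x: bytes, y: bytes) -> bool:
    """Compare equal-length byte strings without an early exit."""
    if len(x) != len(y):
        raise ValueError("lengths must match")
    diff = 0
    for p, q in zip(x, y):
        diff |= p ^ q  # accumulate differences, never break out of the loop
    return diff == 0
```

A compiler aimed at this use case would have to promise that `select_masked` stays a mask operation and that the loop in `equal_masked` never grows an early exit, which is a promise general-purpose optimisers do not make.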

LLVM ist a mess

Posted Mar 17, 2024 22:07 UTC (Sun) by khim (subscriber, #9252) [Link] (4 responses)

> One compiler application which feels intuitively useful to me (but I'm not a language designer) would be to have a _non-optimising_ compiler which can translate from a suitable language to dependable constant time machine code for some N architectures where N > 1

You do realize that for modern CPUs “architecture”, here, would include not just the CPU vendor, but the stepping, the microcode version, etc.? One trivial example: when Intel implemented the BMI instructions in 2013 they had nice, constant execution time, but four years later AMD turned them into a nice “let's leak all your data for everyone to see” version, and every microcode update (on both AMD and Intel) may do the same to any instruction in order to patch some other vulnerability.

> In this application we actually don't want ordinary optimisation, so I suspect some (many?) optimisation strategies are invalid and it may be faster to begin from almost nothing.

Before you can even begin attempting something like this you would need to define what you want in the end. Given enough samples you may even distinguish between xor %eax,%eax and mov $1,%eax (they affect flags differently, and one is 2 bytes while the other is 5 bytes), so first you would need to define some metric which says whether timings are “sufficiently similar” or not.
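
One possible shape for such a metric, as a rough sketch only: collect timing samples for two classes of input and compare them with Welch's t statistic. The function names and the cutoff of 4.5 are assumptions made for this illustration (the cutoff is borrowed from common leakage-testing practice), not something any particular tool or CPU vendor guarantees.

```python
import math
import statistics


def welch_t(xs, ys):
    """Welch's t statistic for two independent samples of timings."""
    mean_x, mean_y = statistics.fmean(xs), statistics.fmean(ys)
    var_x, var_y = statistics.variance(xs), statistics.variance(ys)
    denom = math.sqrt(var_x / len(xs) + var_y / len(ys))
    if denom == 0.0:
        # Both samples are perfectly constant: equal means are indistinguishable,
        # different means are trivially distinguishable.
        return 0.0 if mean_x == mean_y else math.inf
    return (mean_x - mean_y) / denom


def sufficiently_similar(xs, ys, threshold=4.5):
    """True if the two timing samples cannot be told apart at this threshold."""
    return abs(welch_t(xs, ys)) < threshold
```

Note that the answer depends on the number of samples: with enough of them even a tiny mean difference pushes |t| past any fixed threshold, so the sample budget has to be part of the definition as well.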

The whole thing looks like an incredible waste of manpower: instead of trying to achieve something that cannot realistically be achieved on modern CPUs, we should ensure that non-ephemeral keys are generated on a dedicated core. Adding a tiny ARM core (Cell-style) would be much easier and more robust than attempts to create such a compiler.

Constant-time cryptography

Posted Mar 18, 2024 7:05 UTC (Mon) by DemiMarie (subscriber, #164188) [Link] (1 responses)

This is not the first time I have seen this suggestion. It is also completely non-viable in practice. The security core will be much slower than the other cores, which will ruin performance. One can avoid that by using a hardware accelerator instead of a slow core, but then one needs to (a) patch all of the existing applications and libraries to use the accelerator and (b) deal with the fact that hardware accelerators, especially for symmetric cryptography, require an asynchronous API to get good performance. That requires application changes, not just library ones.

Hardware crypto engines are nice, but they are not at all a substitute for constant time guarantees for software operations.

Constant-time cryptography

Posted Mar 18, 2024 8:55 UTC (Mon) by khim (subscriber, #9252) [Link]

> Hardware crypto engines are nice, but they are not at all a substitute for constant time guarantees for software operations.

Oh, sure. Hardware works. “Constant time guarantees” are snake oil you may lucratively sell. Completely different products with different properties and target audiences.

> That requires application changes, not just library ones.

So you can't even change apps, yet somehow pretend that they are not leaking your precious key in some way other than operations running at different speeds depending on the source data?

Your keys are not leaking (or maybe they are leaking but you just don't know that) because nobody targets you. It's as simple as that.

LLVM ist a mess

Posted Mar 18, 2024 9:01 UTC (Mon) by pm215 (subscriber, #98099) [Link] (1 responses)

Modern CPUs, at least for Intel and Arm, have an architecturally defined data independent timing mode that you can enable in a status register bit when you want to execute this kind of crypto code, and which then guarantees that execution timing of a specified subset of instructions is not dependent on the data they are operating on. So I think the situation is not so bleak as you suggest: there's now a defined set of "stay within these boundaries and things won't change in future designs or microcode updates" rules.

LLVM ist a mess

Posted Mar 18, 2024 9:08 UTC (Mon) by khim (subscriber, #9252) [Link]

> Modern CPUs, at least for Intel and Arm, have an architecturally defined data independent timing mode that you can enable in a status register bit when you want to execute this kind of crypto code, and which then guarantees that execution timing of a specified subset of instructions is not dependent on the data they are operating on.

They would still depend on the alignment of your data and code, on the speculative properties of the code executed before and after you call that “well crafted” code, and so on.

Just look at the continuous struggle to guarantee that SGX is useful for something, with another vulnerability revealed less than a week ago.

Ultimately the solution will be the same as with memory safety in C: the solution that was obvious on day one will be applied… but only after everything else has been tried unsuccessfully.

LLVM ist a mess

Posted Mar 18, 2024 15:18 UTC (Mon) by paulj (subscriber, #341) [Link] (10 responses)

"writing a [blah] is not a hard problem" -> links to something written by one of the most prodigious, genius authors of Free Software. Hmm... ;)

LLVM ist a mess

Posted Mar 18, 2024 16:28 UTC (Mon) by willy (subscriber, #9762) [Link] (5 responses)

You've completely missed the point of that comment, but to refute the uninteresting part you're quibbling with:

https://student.cs.uwaterloo.ca/~cs444/

Team of four students builds a compiler in three months.

LLVM ist a mess

Posted Mar 18, 2024 18:15 UTC (Mon) by paulj (subscriber, #341) [Link] (4 responses)

It was mostly humour. I too, like most students who've done a CS degree, have had to write some kind of compiler for a class assignment (a toy front-end with type-checker for a subset of a typed JS-ish language). I've written compilers for simple DSLs - in AWK even ;). A compiler itself is not /that/ hard - completely agreed.

I was just pointing out the humour in the irony of making that point via an example written by an author who has a (prodigious) habit of solving difficult problems. ;)

I agree though that, even if a basic compiler is simple, there is a /lot/ more to making a _good_ C/C++ compiler.

LLVM ist a mess

Posted Mar 18, 2024 23:48 UTC (Mon) by NYKevin (subscriber, #129325) [Link] (1 responses)

> I too, like most students who've done a CS degree, have had to write some kind of compiler for a class assignment (a toy front-end with type-checker for a subset of a typed JS-ish language).

Fun fact: If your professor is sufficiently insane, it is possible that you will end up having to write an interpreter for the (untyped) lambda calculus. So count yourself lucky that you got a language that actually looked vaguely modern.

OTOH, I must admit that the lambda calculus is much, *much* easier to implement than most real languages. It only has 2½ rules, or 1½ if you use De Bruijn indexing. But I would've liked to do a real language, or at least something resembling a real language. I often feel that the most difficult courses were the only ones that actually taught me anything useful.
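
For a sense of how little machinery is involved, here is a rough sketch of a normal-order reducer for the untyped lambda calculus using De Bruijn indices. The tuple encoding (`int` for a variable, `("lam", body)`, `("app", f, a)`) and all function names are choices made for this illustration, not taken from any course or library.

```python
def shift(t, d, cutoff=0):
    """Add d to every free variable (index >= cutoff) in t."""
    if isinstance(t, int):
        return t + d if t >= cutoff else t
    if t[0] == "lam":
        return ("lam", shift(t[1], d, cutoff + 1))
    return ("app", shift(t[1], d, cutoff), shift(t[2], d, cutoff))


def subst(t, j, s):
    """Replace variable j in t with the term s."""
    if isinstance(t, int):
        return s if t == j else t
    if t[0] == "lam":
        # Under a binder the target index and the free variables of s move up by one.
        return ("lam", subst(t[1], j + 1, shift(s, 1)))
    return ("app", subst(t[1], j, s), subst(t[2], j, s))


def beta(body, arg):
    """Beta-reduce (lam body) applied to arg."""
    return shift(subst(body, 0, shift(arg, 1)), -1)


def step(t):
    """Perform one normal-order reduction step, or return None if t is in normal form."""
    if isinstance(t, int):
        return None
    if t[0] == "lam":
        r = step(t[1])
        return None if r is None else ("lam", r)
    f, a = t[1], t[2]
    if not isinstance(f, int) and f[0] == "lam":
        return beta(f[1], a)
    r = step(f)
    if r is not None:
        return ("app", r, a)
    r = step(a)
    return None if r is None else ("app", f, r)


def normalize(t, fuel=1000):
    """Reduce t to normal form, giving up after `fuel` steps."""
    for _ in range(fuel):
        r = step(t)
        if r is None:
            return t
        t = r
    raise RuntimeError("no normal form found within the step budget")
```

With Church numerals encoded this way, applying the successor `("lam", ("lam", ("lam", ("app", 1, ("app", ("app", 2, 1), 0)))))` to two normalizes to three. The `fuel` limit exists because terms such as `(λx. x x)(λx. x x)` never reach a normal form.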

LLVM ist a mess

Posted Mar 20, 2024 20:14 UTC (Wed) by ringerc (subscriber, #3071) [Link]

You can always make the course more interesting.

I had a comp sci course on concurrency proofs and theory. The tool they used for it sucked so I updated it from the ancient RH4 target it required and replaced the build system. Then fixed some bugs and memory issues. Improved the error messages and generally made the tool nicer to use.

LLVM ist a mess

Posted Mar 19, 2024 4:42 UTC (Tue) by buck (subscriber, #55985) [Link] (1 responses)

> I was just pointing out the humour in the irony of making that point via an example written by an author who has a (prodigious) habit of solving difficult problems. ;)

Well, I'm with you: a compiler written by Fabrice Bellard is not your run-of-the-mill hobby project.

But, I'm really more just pointing out that this still blows me away:

https://bellard.org/jslinux/index.html

which I think saves this comment from being (rightly) criticized for being OT, since it's more Linux-y than the article, and this is LWN after all, dang it.

Well, to get this right back on topic, I can actually just point out that JSLinux features tcc and gcc but not clang:

localhost:~# cat readme.txt
Some tests:

- Compile hello.c with gcc (or tcc):

gcc hello.c -o hello
./hello

- Run QuickJS:

qjs hello.js

- Run python:

python3 bench.py
localhost:~#

b/c the performance win:

[`time gcc -o hello -c hello.c -O0` output elided to spare my old laptop's feelings]

LLVM ist a mess

Posted Mar 19, 2024 4:47 UTC (Tue) by buck (subscriber, #55985) [Link]

Sorry. Let me stand corrected by myself:

Quoth https://bellard.org/jslinux/news.html:

2020-07-05:

Added the Alpine Linux distribution. Many packages are included such as gcc, Clang, Python 2.7 and 3.8, Node.js, Ruby, PHP, ... The more adventurous (and patient) people can also try to run Wine or Firefox.
Added SSE2 support to the x86 emulator
Added dynamic resizing of the terminal

LLVM ist a mess

Posted Mar 18, 2024 18:02 UTC (Mon) by farnz (subscriber, #17727) [Link]

I've written more than one compiler, and I would not count myself as a Fabrice Bellard level developer. You should be able to write a simple optimizing C compiler following a book like this in about 3 months full-time effort if you're a competent developer (less if you're willing to reuse tools like gas and ld rather than doing everything yourself).

LLVM ist a mess

Posted Mar 18, 2024 19:57 UTC (Mon) by Cyberax (✭ supporter ✭, #52523) [Link] (2 responses)

Writing a "toy C" compiler was a course project at my university. It was C with most of its functionality, except things that are baffling, like syntactic ambiguity between function pointers and definitions, dangling else, etc.

It really is not a hard problem. Annoying and somewhat long, but not hard.

LLVM ist a mess

Posted Mar 19, 2024 5:40 UTC (Tue) by adobriyan (subscriber, #30858) [Link] (1 responses)

And you'll learn operator precedence table for FREE!

LLVM ist a mess

Posted Mar 19, 2024 5:56 UTC (Tue) by Cyberax (✭ supporter ✭, #52523) [Link]

I actually promptly forgot it. I always just spam the code with braces to disambiguate anything that is more complex than 2*2+1.
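
For what it's worth, the parser side of that table is small. Below is a sketch of precedence climbing for four binary operators; the two-level table, the tuple AST, and the function names are all assumptions made for this illustration and cover only a sliver of C's real grammar.

```python
import operator
import re

# Binding power of each binary operator; higher binds tighter.
PRECEDENCE = {"+": 1, "-": 1, "*": 2, "/": 2}
OPS = {"+": operator.add, "-": operator.sub,
       "*": operator.mul, "/": operator.floordiv}


def tokenize(src):
    """Split into integer literals, operators and parentheses (other characters are skipped)."""
    return re.findall(r"\d+|[-+*/()]", src)


def parse_atom(tokens):
    """Parse an integer literal or a parenthesised expression."""
    tok = tokens.pop(0)
    if tok == "(":
        inner = parse(tokens)
        tokens.pop(0)  # discard the closing ")"
        return inner
    return int(tok)


def parse(tokens, min_prec=1):
    """Precedence climbing: consume tokens from the front, return a nested tuple AST."""
    lhs = parse_atom(tokens)
    while tokens and tokens[0] in PRECEDENCE and PRECEDENCE[tokens[0]] >= min_prec:
        op = tokens.pop(0)
        # Requiring strictly higher precedence on the right makes operators left-associative.
        rhs = parse(tokens, PRECEDENCE[op] + 1)
        lhs = (op, lhs, rhs)
    return lhs


def evaluate(ast):
    """Evaluate the tuple AST with integer arithmetic."""
    if isinstance(ast, int):
        return ast
    op, left, right = ast
    return OPS[op](evaluate(left), evaluate(right))
```

Here `parse(tokenize("2*2+1"))` yields `("+", ("*", 2, 2), 1)`, so the grouping the extra brackets would have spelled out falls out of the table.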

