LWN: Comments on "Continued attacks on HTTP/2" https://lwn.net/Articles/968600/ This is a special feed containing comments posted to the individual LWN article titled "Continued attacks on HTTP/2". Continued attacks on HTTP/2 https://lwn.net/Articles/970285/ https://lwn.net/Articles/970285/ kleptog <div class="FormattedComment"> <span class="QuotedText">&gt; Fortunately there's room for everyone, those who prefer code that is highly readable and flexible</span><br> <p> Well, that code is readable in the sense that you see what the code does. However, I spent a good five minutes looking at it to see if I could prove to myself it actually does what the comment claims it does. I'm still not really sure, but I guess it must if everyone else claims it's fine. Using memcpy() won't fix that. There's this niggling worry that the borrowing during the subtractions opens a hole somewhere. There's probably an appropriate way to look at it that does make sense.<br> <p> But it is an interesting question: what are the chances the compiler will see that code, figure that if it unrolls the loop a few times it can use SSE instructions to make it even faster, and those *do* require alignment. It probably won't happen for the same reason I can't prove to myself the code works now.<br> <p> Would it make a difference if you used unsigned int instead? At least then subtraction underflow is well defined. I don't know.<br> </div> Thu, 18 Apr 2024 11:39:22 +0000 Continued attacks on HTTP/2 https://lwn.net/Articles/970214/ https://lwn.net/Articles/970214/ foom <div class="FormattedComment"> <span class="QuotedText">&gt; In this specific case, what would be wrong would be to use memcpy(), because you'd just give up the ifdef and always use that, and doing that on a strict-alignment CPU would cost a lot.</span><br> <p> Why would you delete the ifdef just because you fixed the UB?<br> <p> There's nothing wrong with keeping an ifdef to choose between two different correct implementations, one of which has better performance on certain architectures...<br> </div> Wed, 17 Apr 2024 19:30:49 +0000 Continued attacks on HTTP/2 https://lwn.net/Articles/970207/ https://lwn.net/Articles/970207/ farnz <p>That ship sailed in the 1980s (before Standard C existed), when C was ported to Cray systems. It's been a problem for a long time; it would be less of a problem if C90 had been "the language as a macro assembler", and C99 had changed the model, but the issue remains that Standard C has never worked this way, since by the time we standardised C, this wasn't the model used by several significant compiler vendors. <p>And the underlying tension is that people who want C compilers to work as macro assemblers with knobs on aren't willing to switch to C compilers that work that way; if people used TinyCC instead of GCC and Clang, for example, they'd get what they want. They'd just have to admit that C the way they think of it is not C the way MSVC, GCC or Clang implements it, but instead C the way TinyCC and similar projects implement it. Wed, 17 Apr 2024 17:07:41 +0000 Continued attacks on HTTP/2 https://lwn.net/Articles/970204/ https://lwn.net/Articles/970204/ Wol <div class="FormattedComment"> <span class="QuotedText">&gt; No, it's not.
And one of the reasons that C is such a mess is that people who treat C like a macro assembler assume that everyone else thinks about code the way they do, and make sweeping false statements about the ways other people think.</span><br> <p> Thing is, that's the way Kernighan and Ritchie thought. So everyone who thinks that way, thinks the way the designers of C thought.<br> <p> Maybe times should move on. But if you're going to rewrite the entire philosophy of the language, DON'T. It will inevitably lead to the kind of religious mess we've now got! It's worse than The Judean People's Front vs. the People's Front of Judea!<br> <p> Cheers,<br> Wol<br> </div> Wed, 17 Apr 2024 16:22:14 +0000 Continued attacks on HTTP/2 https://lwn.net/Articles/970194/ https://lwn.net/Articles/970194/ foom <div class="FormattedComment"> <span class="QuotedText">&gt; under MS-DOS, you'd know that type-based alignment was just not a thing back then, it was perfectly normal *not* to align types. All holes were filled. Yet C did already exist.</span><br> <p> An implementation may set the alignment of every type to 1, if it desired. In such a case, you'd never need to worry about misaligned pointers.<br> <p> <span class="QuotedText">&gt; hypothetical problems</span><br> <p> Okay, here's a non-hypothetical case of this cheating causing problems. "Modern" 32bit Arm hardware accepts unaligned pointers for 1, 2, and 4-byte load/store operations. People saw this hardware specification, and broke the rules in their C code. That code worked great in practice despite breaking the rules. Yay.<br> <p> But, on that same hardware, 8-byte load/store operations require 4-byte-aligned pointers. And at some point, the compiler learned how to coalesce two sequential known-4-byte-aligned 4-byte loads into a single 8-byte load. This is good for performance. And this broke real code which broke the rules and was accessing misaligned "int*".<br> <p> Compiler writers of course say: Your fault. We followed the spec, those accesses were already prohibited, what do you want? Don't break the rules if you want your program to work.<br> <p> Users say: But I followed the unwritten spec of "I'm allowed to cheat the rules and do whatever the heck I want, as long as it seems to work right now." And besides, who can even understand all the rules? There are too many and they're too complex to understand.<br> </div> Wed, 17 Apr 2024 16:16:23 +0000 Continued attacks on HTTP/2 https://lwn.net/Articles/970188/ https://lwn.net/Articles/970188/ wtarreau <div class="FormattedComment"> I think we'll never agree anyway. You seem to consider that the compiler always knows best, and I'm among those who spend 20% of their time fighting the massive de-optimization caused by those compilers that think they know what you're trying to do. And I'll always disagree with the memcpy() hack, because it's a hack. The compiler is free to call an external function for this (and there's a reason memcpy() exists as a function) and you have no control over what is done. I can assure you that on quite a bunch of compilers I've seen real calls, which just defeated the whole purpose of the access. All that you explain is fine for a GUI where nanoseconds do not exist; it's just totally unrealistic for low-level programming. It has nothing to do with being "old-timers" or whatever. Just trying to pretend people introduce bugs by doing wrong things while in practice they're just doing what the language offers as a standard to support what the architecture supports is nonsense.
It would be fine if the code was not protected etc., but when the case is specifically handled for the supported platforms, the language obviously supports this since it's a well-known and much used case.<br> <p> In this specific case, what would be wrong would be to use memcpy(), because you'd just give up the ifdef and always use that, and doing that on a strict-alignment CPU would cost a lot. In this case you precisely want to work one byte at a time and certainly not memcpy()!<br> <p> As you called me names ("old-timers") I could equally say that this new way of thinking comes from junkies, but I won't; you'll probably have plenty of time left in your life to learn about all the nice stuff computers can do when they're not forced to burn cycles, coal or gas just to make the code look nice.<br> <p> Fortunately there's room for everyone, those who prefer code that is highly readable and flexible and supports fast evolutions and those who prefer to work in areas that do not stand such fast changes because time is important. What's sure is that those are rarely the same people. Different preferences etc. come into play. But both are needed to make the computers and systems we're using today.<br> </div> Wed, 17 Apr 2024 14:59:55 +0000 Continued attacks on HTTP/2 https://lwn.net/Articles/970189/ https://lwn.net/Articles/970189/ mb <div class="FormattedComment"> Well said.<br> </div> Wed, 17 Apr 2024 14:52:05 +0000 Continued attacks on HTTP/2 https://lwn.net/Articles/970170/ https://lwn.net/Articles/970170/ farnz <blockquote> "This thinking" is just what everyone educated about processors does when they care about performance. CPU vendors go to great lengths to make sure that these extremely common patterns work efficiently because they're in critical paths, to the point that old CPUs which didn't support them are now dead (armv5, mips, sparc). </blockquote> <p>No, it's not. And one of the reasons that C is such a mess is that people who treat C like a macro assembler assume that everyone else thinks about code the way they do, and make sweeping false statements about the ways other people think. <p>Everything else in your comment flows from "if the compiler acts like a macro assembler, then this is all true, and nothing else can possibly be true as a result"; however, a well-written language specification and compiler is perfectly capable of detecting that you're doing a <tt>memcpy</tt> of 4 unaligned bytes into a 32 bit unsigned integer, and converting that into a load instruction. <p>The compiler should also then notice that your operations have exactly the same effect regardless of endianness, and will therefore not bother with byteswapping, since the effect of byteswapping is a no-op, and will optimize still further on that basis. <p>You'll notice that this is a very different style of performance reasoning, since the compiler is now part of your reasoning. But it is what most people educated about processors have done when I've been hand-optimizing code with them; they've stuck within the defined semantics of the language, known what optimizations the compiler can do, and cross-checked assembly output against what they want, to be confident that they're getting what they want. <p>And you'll note that I didn't talk about MS-DOS; I talked about <em>all</em> old compilers; compilers of that era simply didn't do any complex analysis of the codebase to optimize it, and <em>would</em> pessimise the code that a modern expert on processor performance would write.
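<p>To make the memcpy point above concrete, here is a minimal sketch of the kind of accessor I mean (my own illustration, not HAProxy's code; the name read_u32() is made up). With optimizations on, current GCC and Clang typically compile it to a single 32-bit load on targets that tolerate unaligned access, and to byte loads on strict-alignment targets, with no undefined behaviour in either case:
<pre>
#include &lt;stdint.h&gt;
#include &lt;string.h&gt;

/* Read a 32-bit value (host byte order) from a possibly unaligned buffer.
 * No pointer is dereferenced at the wrong alignment, so there is no UB;
 * the compiler is free to emit whatever load the target supports.
 */
static inline uint32_t read_u32(const void *p)
{
        uint32_t v;
        memcpy(&amp;v, p, sizeof(v));   /* usually folded into one load */
        return v;
}
</pre>
If the bytes are in a fixed wire order you add the byte-order conversion on top of this, and, as above, the compiler can often see when that conversion is a no-op and drop it.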
<p>I'm very annoyed that there's a bunch of old-timers who have a culture of denying everything that's improved since the 1970s, trying to invent hypothetical reasons why compilers can't do the stuff that they do day-in, day-out, just for the sake of denigrating the facilities offered by hardware that others make use of after reading the specs. It would be better if they talked about stuff they know and that is useful to others, rather than decrying change because it's made things different to the ancient era. Wed, 17 Apr 2024 13:03:33 +0000 Continued attacks on HTTP/2 https://lwn.net/Articles/970136/ https://lwn.net/Articles/970136/ wtarreau <div class="FormattedComment"> "This thinking" is just what everyone educated about processors does when they care about performance. CPU vendors go to great lengths to make sure that these extremely common patterns work efficiently because they're in critical paths, to the point that old CPUs which didn't support them are now dead (armv5, mips, sparc).<br> <p> One would be completely crazy or ignorant to ruin the performance of their program by doing one byte at a time on a machine which purposely dedicates silicon to address such common patterns. Look at compressors, hashing functions, etc. Everyone uses this. A quick grep in the kernel shows me 8828 occurrences. Originally "undefined behavior" meant "not portable, will depend on the underlying hardware", just like the problem with signed addition overflow, shifts by more than the size of the word, etc. It's only recently that within the clang vs gcc battle it was found funny to purposely break programs relying on trustable and reliable implementations after carefully detecting them.<br> <p> And even for the unaligned access I'm not even sure it's UB, I seem to remember it was mentioned as implementation specific, because that's just a constraint that's 100% hardware-specific, and should thus be relevant to the psABI only. And if you had programmed under MS-DOS, you'd know that type-based alignment was just not a thing back then, it was perfectly normal *not* to align types. All holes were filled. Yet C did already exist. It was only the 386 that brought this alignment flag whose purpose was not much understood back then and that nobody used. In any case the compiler has no reason to insert specific code to make your aligned accesses work and detect the unaligned ones and make them fail. Such casts exist because they're both needed and useful.<br> <p> Actually, I'm a bit annoyed by that new culture consisting of denying everything that exists, trying to invent hypothetical problems that would require tremendous efforts to implement, just for the sake of denigrating the facilities offered by hardware that others naturally make use of after reading the specs. But it makes people talk and comment, that's great already. It would be better if they would talk about stuff they know and that is useful to others, of course.<br> </div> Wed, 17 Apr 2024 12:37:24 +0000 Continued attacks on HTTP/2 https://lwn.net/Articles/970110/ https://lwn.net/Articles/970110/ farnz <p>There's a significant bit of history behind this sort of thinking, though. It's only relatively recently (late 1990s) that C compilers became more sophisticated than the combination of a macro assembler with peephole optimization and register allocation.
<p>If you learnt C "back in the day", then you almost certainly built a mental model of how C works based on this compilation model: each piece of C code turns into a predictable sequence of instructions (so <tt>c = a + b</tt> always turns into the sequence "load a into virtual register for a, load b into virtual register for b, set virtual register for c to the sum of virtual register for a plus virtual register for b, store virtual register for c to c's memory location"), then the compiler goes through and removes surplus instructions (e.g. if a and b are already loaded into registers, no need to reload), then it does register allocation to turn all the virtual registers into real registers with spilling to the stack as needed. <p>That's not the modern compilation model at all, and as a result intuition about C that dates from that era is often wrong. Wed, 17 Apr 2024 09:21:32 +0000 Continued attacks on HTTP/2 https://lwn.net/Articles/970081/ https://lwn.net/Articles/970081/ foom <div class="FormattedComment"> It is cheating.<br> <p> You have an ifdef testing for architectures that have unaligned memory access instructions at the machine level. But it is undefined behavior to dereference misaligned pointers at the C-language level, even so. No compiler I'm aware of has made any guarantee that it'll work, even on your list of architectures.<br> <p> Yes, empirically this code generates a working program on common compilers for those architectures today, despite the intentional bug. But that does NOT make the code correct. It may well stop working with any compiler upgrade, because you are breaking the rules.<br> <p> But I don't mean to pick on just this code: this sort of thinking is ubiquitous in the C culture. And it's a serious cultural problem.<br> <p> (And, btw, there's not a good reason to break the rules here: a memcpy from the buffer into a local int could generate the exact same machine instructions, without the UB.)<br> </div> Tue, 16 Apr 2024 20:13:51 +0000 Continued attacks on HTTP/2 https://lwn.net/Articles/970073/ https://lwn.net/Articles/970073/ farnz <p>If I understand correctly, it breaks the C rules for type punning. Note that I'm not completely clear on the rules myself (I used to do C++, I now do Rust, both of which have different rules to C), but my understanding is that in standard C, you can cast any type to <tt>char *</tt> and dereference, but you cannot cast <tt>char *</tt> to anything and dereference it. You can do this via a <tt>union</tt>, and (at least GCC and Clang) compilers have <tt>-fno-strict-aliasing</tt> to change the rules such that this is OK (and I've not looked at your build system to know if you unconditionally set <tt>-fno-strict-aliasing</tt>). Tue, 16 Apr 2024 17:41:04 +0000 Continued attacks on HTTP/2 https://lwn.net/Articles/970068/ https://lwn.net/Articles/970068/ wtarreau <div class="FormattedComment"> Not at all, it's neither cheating nor a mistake. It's perfectly defined thanks to the ifdef above it, which guarantees that we *do* support unaligned accesses on this arch.
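Schematically the arrangement looks like this (a simplified sketch of the idea only; the macro name and helper are illustrative, not the actual haproxy source):<br>
<pre>
#include &lt;stdint.h&gt;

/* UNALIGNED_LE_OK is a made-up name: the build would define it only on
 * little-endian platforms known to tolerate unaligned loads.
 */
#if defined(UNALIGNED_LE_OK)
static inline uint32_t read_le32(const void *p)
{
        /* direct access: the platform guarantee is what makes this OK */
        return *(const uint32_t *)p;
}
#else
static inline uint32_t read_le32(const void *p)
{
        /* strict-alignment platforms: one byte at a time */
        const uint8_t *b = (const uint8_t *)p;
        return (uint32_t)b[0] | ((uint32_t)b[1] &lt;&lt; 8) |
               ((uint32_t)b[2] &lt;&lt; 16) | ((uint32_t)b[3] &lt;&lt; 24);
}
#endif
</pre>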
Rest assured that such code runs fine on sparc, mips, riscv64, armbe/le/64, x86 of course, and used to on PARISC, Alpha and VAX, though I haven't tested it there for at least 10 years so I wouldn't promise anything on that front :-)<br> <p> There are other places where we need to read possibly unaligned data (in protocol handling, essentially) and it's done reliably instead.<br> </div> Tue, 16 Apr 2024 16:48:43 +0000 Continued attacks on HTTP/2 https://lwn.net/Articles/970069/ https://lwn.net/Articles/970069/ adobriyan <div class="FormattedComment"> manual shift is probably unnecessary too<br> <p> <span class="QuotedText">&gt; if (likely((unsigned char)(*ptr - 33) &lt;= 93)) { /* 33 to 126 included */</span><br> </div> Tue, 16 Apr 2024 16:44:37 +0000 Continued attacks on HTTP/2 https://lwn.net/Articles/970000/ https://lwn.net/Articles/970000/ foom <div class="FormattedComment"> A few lines down there's a nice example of a performance hack which introduces UB in the code.<br> <p> <a href="https://github.com/haproxy/haproxy/blob/50d8c187423d6b7e9b1083e05370885f6d12e844/src/h1.c#L578">https://github.com/haproxy/haproxy/blob/50d8c187423d6b7e9...</a><br> <p> That's a great illustration of the impossibility of remembering all the rules you must follow in C to avoid undefined behavior, and/or the widespread culture in the C/C++ development community of believing it's okay to "cheat" those rules. "Surely it is harmless to cheat, if you know what your CPU architecture guarantees?", even though the compiler makes no such guarantee.<br> <p> In this case, building with -fsanitize=undefined (or more specifically -fsanitize=alignment) could catch this intentional mistake.<br> </div> Tue, 16 Apr 2024 14:56:03 +0000 Continued attacks on HTTP/2 https://lwn.net/Articles/970034/ https://lwn.net/Articles/970034/ wtarreau <div class="FormattedComment"> 100% agreed.<br> <p> But there's also another factor which does not help, and I'm sure many of us have known such people: the lower-level the language you practise, the more expert you look to some people. And there are a lot of people using C and ASM who regularly brag, saying "ah, you didn't know this? I genuinely thought everyone did". That hurts a lot because it puts a barrier in the transmission of knowledge on how to do better. In addition, with the C spec not being freely available (OK, the latest draft is, but that's not a clean approach), you figure that a lot of corner cases have long been ignored by a lot of practitioners (by far the vast majority, starting from teachers at school). For example I learned that signed ints were not supposed to wrap only after 25 years of routinely using them that way, when the GCC folks suddenly decided to abuse that UB. Having done ASM long before C, what a shock it was to me to discover that the compiler was no longer compatible with the hardware and that the language supported this decision! Other communities are probably more welcoming to newbies and do not try to impose their tricks saying "look, that's how experienced people do it". As such I think that some languages develop a culture but that there's still little culture around C or ASM, outside of a few important projects using these languages like our preferred operating system.<br> </div> Tue, 16 Apr 2024 14:08:48 +0000 Continued attacks on HTTP/2 https://lwn.net/Articles/970035/ https://lwn.net/Articles/970035/ Wol <div class="FormattedComment"> <span class="QuotedText">&gt; &gt; I think a lot of it is more philosophy of the programmer than the actual language.
However, a performance philosophy is more frequently found in C programmers. There is a cultural element, which correlates to languages.</span><br> <p> I dunno about C. I have the same philosophy and I started with FORTRAN. I think it's partly age, and partly field of programming.<br> <p> My first computer that I bought was a Jupiter Ace. 3 KILObytes of ram. My contemporaries had ZX80s, which I think had only 1 KB. The first computer I worked on was a Pr1me 25/30, which had 256KB for 20 users. That's the age thing - we didn't have resource to squander.<br> <p> And one job I always mention on my CV is, I was asked to automate a job and told I had 6 weeks to do it - they needed the results for a customer. 4 weeks in, I'd completed the basic program, then estimated the *run* time to complete the job (given sole use of the mini available to me) as "about 5 weeks". Two days of hard work optimising the program to speed it up, and I handed it over to the team who were actually running the job for the customer. We made the deadline (although, immediately on handing the program over, I went sick and said "ring me if you need me". Had a lovely week off :-) And that's the field of programming - if you're short of resource for the job in hand (still true for many microcontrollers?) you don't have resource to squander.<br> <p> Cheers,<br> Wol<br> </div> Tue, 16 Apr 2024 14:08:39 +0000 Continued attacks on HTTP/2 https://lwn.net/Articles/970001/ https://lwn.net/Articles/970001/ wtarreau <div class="FormattedComment"> <span class="QuotedText">&gt; One of the quickest implementations I know of (though, I haven't tested Willy's version ;) ) is LSQUIC. C.</span><br> <p> Haven't benchmarked it, but we pulled 260 Gbps out of a server with ours a year ago; it didn't seem bad at the time, but we should improve that soon ;-)<br> <p> <span class="QuotedText">&gt; I think a lot of it is more philosophy of the programmer than the actual language. However, a performance philosophy is more frequently found in C programmers. There is a cultural element, which correlates to languages.</span><br> <p> I totally agree with this. The number of times I've heard "you're spending your time for a millisecond?" to which I replied "but if this millisecond is wasted 1000 times a second, we're spending our whole lives in it". And it's true that the culture varies with languages; in fact, languages tend to attract people sharing the same culture and/or main concerns (performance, safety, ease of writing code, abundance of libraries, wide community, etc.).<br> <p> </div> Tue, 16 Apr 2024 13:55:59 +0000 Continued attacks on HTTP/2 https://lwn.net/Articles/969992/ https://lwn.net/Articles/969992/ farnz <p>It is entirely possible to have sufficient discipline (as an individual) to not use unsafe code without realising it, and to have a comment in place that acknowledges every use of a partially defined operation (which is the issue with unsafe code - there are operations in unsafe code that are partially defined) and justifies how you're sticking to just the defined subset of the operation. <p>However, this takes discipline to resist the temptation to do something partially defined that works in testing; and the big lesson of the last 200 years is that there are only two cases where people can resist the temptation: <ol> <li>Liability for latent faults lies with an individual, who is thus incentivized to ensure that there are no latent faults.
This is how civil engineering <em>today</em> handles it - and it took us decades to get the process for this liability handling correct such that either a structure has sign-off from a qualified individual who takes the blame if there are latent faults, or the appropriate people get penalised for starting construction without sign-off. <li>It's simple and easy to verify that no latent faults exist; this is a capability introduced from mathematics into engineering, where proving something is hard, but verifying an existing proof is simple. This is also used in civil engineering - actually proving that a structure built with certain materials will stand up is hard, but it's trivial to confirm that the proof that it will stand up is valid under the assumptions we care about. </ol> <p>And that's where modern languages help; we know from mathematics that it's possible to construct a system with a small set of axioms, and to then have proofs of correctness that are easy to verify (even if some are hard to construct). Modern languages apply this knowledge to cases where we know that, as a profession, we make problematic mistakes frequently, and thus make it easier to catch cases where the human programmer has made a mistake. Tue, 16 Apr 2024 11:27:04 +0000 Continued attacks on HTTP/2 https://lwn.net/Articles/969995/ https://lwn.net/Articles/969995/ farnz <p>Exactly, and the direction of travel from hand-writing machine code, through assembly, then macro assemblers, and into BLISS, C, Rust and other languages is to move from "this works just fine, it's just hard to verify" to "this is easy to verify, since the machine helps a lot, and the human can easily do the rest". <p>But it's not that Rust makes it possible to avoid mistakes that you cannot avoid in C; it's that Rust makes it easier to avoid certain mistakes than C, just as C makes it easier to avoid certain mistakes than assembly does, and assembly is easier to get right than hand-crafted machine code. Tue, 16 Apr 2024 11:22:39 +0000 Continued attacks on HTTP/2 https://lwn.net/Articles/969994/ https://lwn.net/Articles/969994/ paulj <div class="FormattedComment"> Have you ever considered that in some cases there might be code that is open-sourced by a giant tech company, that potentially has _deliberately_ bad performance, because if a competitor of theirs uses that code and ends up with bad performance, that doesn't hurt said giant tech company?<br> <p> Even if not outright deliberate, the giant tech company at least has no motivation to make the open sourced code perform well.<br> <p> As an example, the Google QUIC code-base - from Chromium - has _terrible_ performance server-side. It's kind of the reference implementation for QUIC. And I'm pretty sure it's /not/ what GOOG are using internally on their servers. Based on presentations they've given on how they've improved server performance, they've clearly worked on it and not released the improvements. <br> <p> It's in C++.<br> <p> One of the quickest implementations I know of (though, I haven't tested Willy's version ;) ) is LSQUIC. C.<br> <p> I think a lot of it is more philosophy of the programmer than the actual language. However, a performance philosophy is more frequently found in C programmers.
There is a cultural element, which correlates to languages.<br> </div> Tue, 16 Apr 2024 11:19:39 +0000 Continued attacks on HTTP/2 https://lwn.net/Articles/969993/ https://lwn.net/Articles/969993/ paulj <div class="FormattedComment"> <span class="QuotedText">&gt; the majority of developers will be tempted to move a metric by 1% at the risk of unsafety in a C codebase.</span><br> <p> And then those developers will write features ignoring the checking abstraction that makes the parsers safe (at least from overflows), and submit patches with the typical, hairy, dangerous, C parser-directly-twiddling-memory code. And they'll get annoyed when the maintainer objects and asks them to rewrite using the safe abstraction that has protected the project well for years.<br> <p> Sigh.<br> </div> Tue, 16 Apr 2024 11:11:46 +0000 Continued attacks on HTTP/2 https://lwn.net/Articles/969990/ https://lwn.net/Articles/969990/ paulj <div class="FormattedComment"> This was kind of the experience with the "wise" C network code I maintained. The parsers - as per other comments - were structured to use a simple, checking abstraction layer to read/write atoms. If the higher-level parser made a logical mistake and issued an out of bounds read or write, the checking layer would abort().<br> <p> This solved the problem of buffer overflows. However, we would still generally have a DoS security bug from the process ending itself. Obviously, an abort() and possible DoS is still /way/ better than an overflow and RCE security bug, but also still not ideal. <br> <p> The next challenge was to make the parsers logically robust. Memory safe languages do not solve this.<br> <p> Explicit state machines, in languages (programming or DSLs) that can verify error events are always handled (e.g., by proceeding to some common error state), can help. And even C these days can trivially ensure that all events are handled in a state machine. Requires programmer discipline though.<br> <p> It's worth noting in this story (IIUC) that the implementations in "memory safe" languages were vulnerable to the bug, and the implementation in the "unsafe" language was not. ;)<br> </div> Tue, 16 Apr 2024 11:05:30 +0000 Continued attacks on HTTP/2 https://lwn.net/Articles/969988/ https://lwn.net/Articles/969988/ farnz <p>Nope, you misinterpreted what I was trying to say. <p>Keep in mind that things being unsafe is only a problem if you fail to maintain sufficient discipline - and that individuals often can maintain that discipline, even when a wider group can't. Tue, 16 Apr 2024 10:40:54 +0000 Continued attacks on HTTP/2 https://lwn.net/Articles/969986/ https://lwn.net/Articles/969986/ Wol <div class="FormattedComment"> <span class="QuotedText">&gt; What we do know is that population-wide, we have insufficient programmers capable of upholding the required level of discipline to use C or C++ safely even under pressure to deliver results. It's entirely consistent to say that wtarreau and paulj are capable of sustaining this level of discipline in a C codebase, while also saying that the majority of developers will be tempted to move a metric by 1% at the risk of unsafety in a C codebase.</span><br> <p> The point of "unsafe" is the programmer has to EXPLICITLY opt in to it. The problem with C, and assembler, and languages like that, is that the programmer can use unsafe without even realising it.<br> <p> That's also my point about state tables. EVERY combination should be EXPLICITLY acknowledged.
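Something like this toy sketch is what I mean (made-up states and events, not from any real project) - every (state, event) pair either has an explicit transition or explicitly drops into the error state:<br>
<pre>
enum state { ST_IDLE, ST_HEADER, ST_BODY, ST_ERROR };
enum event { EV_OPEN, EV_DATA, EV_CLOSE };

/* Return the next state; anything not explicitly listed is an error. */
static enum state next_state(enum state s, enum event e)
{
        switch (s) {
        case ST_IDLE:   return (e == EV_OPEN)  ? ST_HEADER : ST_ERROR;
        case ST_HEADER: return (e == EV_DATA)  ? ST_BODY
                             : (e == EV_CLOSE) ? ST_IDLE   : ST_ERROR;
        case ST_BODY:   return (e == EV_CLOSE) ? ST_IDLE   : ST_ERROR;
        default:        return ST_ERROR; /* unknown state: acknowledged too */
        }
}
</pre>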
The problem is that here the programmer has to actively opt in to safe behaviour. Very few do. I certainly try, but doubt I succeed. But if somebody reported a bug against my code and said "this is a state you haven't addressed" I'd certainly acknowledge it as a bug - whether I have time to fix it or not. It might get filed as a Round Tuit, but it would almost certainly be commented, in the code, as "this needs dealing with".<br> <p> Cheers,<br> Wol<br> </div> Tue, 16 Apr 2024 10:10:37 +0000 Continued attacks on HTTP/2 https://lwn.net/Articles/969983/ https://lwn.net/Articles/969983/ mb <div class="FormattedComment"> Nope, you misinterpreted what I was trying to say.<br> Keep in mind that in C everything is unsafe.<br> </div> Tue, 16 Apr 2024 09:52:33 +0000 Continued attacks on HTTP/2 https://lwn.net/Articles/969982/ https://lwn.net/Articles/969982/ wtarreau <div class="FormattedComment"> <span class="QuotedText">&gt; If you've decided that actually the diminishing returns do matter then we're precisely back to: hand-writing the epoll loop probably isn't the best way to expend those limited resources for almost anybody, and you've got a singular anecdote where it worked out OK for you, which is good news for you of course.</span><br> <p> Often that's how you can prioritize your long-term todo-list. But sometimes having to rely on 3rd-party blocks (e.g. libev+ssl) gets you closer to your goal faster, then blocks you in a corner because the day you say "enough is enough, I'm going to rewrite that part now", you have an immense amount of work to do to convert all the existing stuff to the new preferred approach. Instead, when you started small (possibly even with select() or poll()) and progressively grew your system based on production feedback, it's easier to do more frequent baby steps in the right direction, and to adjust that direction based on feedback.<br> <p> Typically one lib I don't see myself replacing with a home-grown one is openssl, and god, it's the one causing me the most grief! But for the rest, I'm glad I haven't remained stuck in generic implementations.<br> </div> Tue, 16 Apr 2024 09:21:57 +0000 Continued attacks on HTTP/2 https://lwn.net/Articles/969981/ https://lwn.net/Articles/969981/ wtarreau <div class="FormattedComment"> The handlers were simple and all inlined in the switch/case so that was easy. Nowadays the code looks like this:<br> <a href="https://github.com/haproxy/haproxy/blob/master/src/h1.c#L503">https://github.com/haproxy/haproxy/blob/master/src/h1.c#L503</a><br> <p> It does a little bit more than it used to but it remains quite maintainable (it has never been an issue to adapt protocol processing there). The few macros like EAT_AND_JUMP_OR_RETURN() perform the boundary checks and decide to continue or stop here so that most of the checks are hidden from the visible code.<br> </div> Tue, 16 Apr 2024 09:11:46 +0000 Continued attacks on HTTP/2 https://lwn.net/Articles/969979/ https://lwn.net/Articles/969979/ farnz <p>It does work - if it didn't work, then Rust would fail because people would just use <tt>unsafe</tt> without thinking. Rust lowers the needed level of personal discipline by making it clearer when you're "cheating", because you have to write <tt>unsafe</tt> to indicate that you're going into the high-discipline subset, but otherwise it's the same as C.
<p>So, by your assertions, I should expect to see a large amount of Rust code with undisciplined uses of <tt>unsafe</tt>, since all developers cheat on the required programming discipline that way. In practice, I don't see this when I look at <a href="https://crates.io/">crates.io</a>, which leads me to think that your assertion that no programmer is capable of remaining disciplined in the face of temptation is false. <p>What we do know is that population-wide, we have insufficient programmers capable of upholding the required level of discipline to use C or C++ safely even under pressure to deliver results. It's entirely consistent to say that <a href="https://lwn.net/Articles/969810/">wtarreau</a> and <a href="https://lwn.net/Articles/969872/">paulj</a> are capable of sustaining this level of discipline in a C codebase, while also saying that the majority of developers will be tempted to move a metric by 1% at the risk of unsafety in a C codebase. Tue, 16 Apr 2024 09:08:57 +0000 Continued attacks on HTTP/2 https://lwn.net/Articles/969978/ https://lwn.net/Articles/969978/ paulj <div class="FormattedComment"> Nice. <br> <p> With direct jumps between the different handlers? I'd be curious how you structured that to be readable. Functions conforming to some "interface" function pointer are a common way to give handlers some structure. But I don't know of any reasonable way to avoid the indirect jump.<br> </div> Tue, 16 Apr 2024 08:52:47 +0000 Continued attacks on HTTP/2 https://lwn.net/Articles/969976/ https://lwn.net/Articles/969976/ tialaramex <div class="FormattedComment"> No, no, "invested effort for diminishing returns" was exactly the point I tried to make above and you contradicted it, insisting that instead you can afford to invest in absolutely *everything* because it shows such massive returns - the thing I'd never seen, and, from the sounds of it, still haven't.<br> <p> The "fully-controllable base" doesn't really mean anything in software; others too could make such a choice to use things or not use them as they wish. Swap to OpenBSD or Windows, write everything in Pascal or in PowerPC machine code, implement it on an FPGA or custom design silicon, whatever. The skill set matters of course: you struggle with all the non-alphabetic symbols in Rust, whereas, say, I never really "got" indefinite integrals, and probably neither of us is an EDA wizard. But for a corporate entity they can hire in skills to fill those gaps.<br> <p> If you've decided that actually the diminishing returns do matter then we're precisely back to: hand-writing the epoll loop probably isn't the best way to expend those limited resources for almost anybody, and you've got a singular anecdote where it worked out OK for you, which is good news for you of course.<br> </div> Tue, 16 Apr 2024 08:50:02 +0000 Continued attacks on HTTP/2 https://lwn.net/Articles/969975/ https://lwn.net/Articles/969975/ paulj <div class="FormattedComment"> And just to illustrate how. Consider how many times in professional publications (inc. LWN) you have read articles of the following form:<br> <p> 1. Here's a shiny new language, which can fix the regularly occurring problem of buffer overflows in network-exposed software, leading to serious or catastrophic security issues, which has been a long-standing problem. Here are the shiny features which can prevent such issues, and make programmers more productive. The language is almost stable, and more and more people are using it, and we're seeing more serious software being written in it.
The more it's used the better! Shiny!<br> <p> 2. Buffer overflows in network-exposed software, leading to serious or catastrophic security issues, are a long-standing problem in our profession. The language of 1 or 2 (or ...) has a lot of promise in systematically solving this problem. However, existing languages are certain to remain in use until language X is stable and widely used. Further, it will take a long time before much software in existing languages is rewritten in language 1. This article covers simple techniques that can be used to make network-exposed parsers at least secure against buffer overflows, which every programmer writing network code in existing languages should know. It includes (pointers to) proven, simple, library code that can be used.<br> <p> Think how many times over the last few *decades* you have seen articles of the first form, and how often of the latter form. I am sure you have seen the first form many times. I will wager it is rare you have seen the latter; indeed, possibly never. Yet, a _serious_ profession of engineering would _ensure_ (via the conscientiousness of the engineers practicing it) that articles of the latter form were regularly printed, to drum it into other engineers. <br> <p> Worse, there is a culture of even putting down ad-hoc comments making the points of form 2 (and I'm _not_ trying to make a dig at commenters here!) with "Sure, yeah, just write bug-free software!", usually with something like "Seriously, shiny language is the only way." And yes, there is a lot of *truth* to the idea that $SHINY_LANGUAGE is the way forward for the future. However, that is on a _long_-term basis. In the here and now, techniques that apply to current languages, and current software, and do /not/ need to wait for shiny language to mature, stabilise and be widely deployed, understood and with a body of software to build on (library code, learning from, etc.), are _required_ to solve problems _until then_. Frowning on attempts to distribute that knowledge has _not_ helped this engineering profession avoid many security bugs over the last decade+ while we have waited for (e.g.) Rust.<br> <p> A serious engineering profession would be able to _both_ look forward to the next-generation shiny stuff, _and_ be able to disseminate critical information about best-practice techniques for _existing_ widely-used tooling. It's not an either-or! :)<br> <p> Fellow engineers, let's do better. :)<br> </div> Tue, 16 Apr 2024 08:47:10 +0000 Continued attacks on HTTP/2 https://lwn.net/Articles/969966/ https://lwn.net/Articles/969966/ wtarreau <div class="FormattedComment"> I generally agree with the points you make. Having written some parsers using an FSM in a switch/case loop with direct gotos between all the cases (hidden in macros), I found they ended up as some of the most auditable, reliable and unchanged code over two decades; they could trivially be improved to support evolving protocols, and their performance remained unbeaten by a large margin compared to many other approaches I've seen used.
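In skeleton form the pattern looks something like this - a condensed, self-contained illustration only (the EAT_AND_JUMP_OR_RETURN body below is my own reconstruction of the idea, not the actual haproxy macro, and the toy grammar is made up):<br>
<pre>
#include &lt;stddef.h&gt;

enum p_state { ST_TOKEN, ST_SPACES, ST_DONE, ST_ERROR };

/* Bound check hidden in the macro: consume one byte, then either jump
 * straight to the next state's label or return to wait for more input.
 */
#define EAT_AND_JUMP_OR_RETURN(next_lbl, next_st) do {  \
        if (++ptr == end) {                             \
                *state = (next_st);                     \
                return (size_t)(ptr - start);           \
        }                                               \
        goto next_lbl;                                  \
} while (0)

/* Incrementally parse "TOKEN SP*"; returns how many bytes were consumed. */
static size_t parse_fragment(const char *buf, size_t len, enum p_state *state)
{
        const char *start = buf, *ptr = buf, *end = buf + len;

        if (ptr == end)
                return 0;

        switch (*state) {
        case ST_TOKEN:
        st_token:
                if (*ptr == ' ')
                        EAT_AND_JUMP_OR_RETURN(st_spaces, ST_SPACES);
                if (*ptr &lt; 33 || *ptr &gt; 126)    /* printable chars only */
                        goto st_error;
                EAT_AND_JUMP_OR_RETURN(st_token, ST_TOKEN);
        case ST_SPACES:
        st_spaces:
                if (*ptr == ' ')
                        EAT_AND_JUMP_OR_RETURN(st_spaces, ST_SPACES);
                *state = ST_DONE;
                return (size_t)(ptr - start);
        st_error:
        default:
                *state = ST_ERROR;      /* ST_DONE or junk ends up here too */
                return (size_t)(ptr - start);
        }
}
</pre>
The point is that every transition goes through the macro, so the bound check cannot be forgotten, and the direct gotos keep the hot path free of indirect calls.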
It takes some time, but not that much; it mostly requires a less common approach to the problem and a particular mindset, of course.<br> <p> One reason that it's rarely used is probably that before it's complete, it does nothing, and it's only testable once complete, contrary to stuffing strncmp() everywhere to get a progressive (but asymptotic) approach to the problem.<br> <p> </div> Tue, 16 Apr 2024 03:34:37 +0000 Continued attacks on HTTP/2 https://lwn.net/Articles/969965/ https://lwn.net/Articles/969965/ wtarreau <div class="FormattedComment"> It's great that you had this opportunity. The first time a person taught me about the ability to overflow a buffer and execute code 30 years ago, I almost laughed, and said "you'd be lucky if that would surprisingly work", and he told me "it works more often than you think". That's when I started experimenting with it, and found it so hard to achieve on sparc (due to switched register banks) that I wrote a generic exploitation tool for this and finally managed to get root on some systems :-) I just felt sad that it was so ignored by teachers themselves.<br> </div> Tue, 16 Apr 2024 03:23:14 +0000 Continued attacks on HTTP/2 https://lwn.net/Articles/969964/ https://lwn.net/Articles/969964/ wtarreau <div class="FormattedComment"> <span class="QuotedText">&gt; if somebody decides that for this much money they'll hand write the machine code to scrape a few extra cycles you can't reach from C and perform *better* than you.</span><br> <p> Absolutely, it's a matter of invested effort for diminishing returns. But if we can improve our own design ourselves, others can also do better. It's just that having a fully-controllable base offers way more optimization opportunities than being forced to rely on something already done and coming pre-made as a lib. In the distant past, when threads were not a thing and we had to squeeze every CPU cycle, I even remember reimplementing some of the critical syscalls in asm to bypass glibc's overhead, then doing them using the VDSO, then experimenting with KML (kernel-mode linux) which was an interesting trick reducing the cost of syscalls. All of these things used to provide measurable savings, and that's the reward of having full control over your code.<br> </div> Tue, 16 Apr 2024 03:19:36 +0000 Continued attacks on HTTP/2 https://lwn.net/Articles/969945/ https://lwn.net/Articles/969945/ tialaramex <div class="FormattedComment"> <span class="QuotedText">&gt; the savings are huge, especially when it allows them to halve the number of servers compared to the closest competitor</span><br> <p> But that's a very *specific* improvement. So now it's not just "it's better" which I don't deny, but that specifically the choice to do this is buying you a 50% cost saving, which is huge. By Amdahl's law this means *at least half* of the resource was apparently wasted by whatever it is the closest competitor is doing that you're not. In your hypothetical that's... using language sugar rather than hand writing a C epoll state machine. Does that feel plausible? That the language sugar cost the same as the actual work?<br> <p> My guess is the reality is that the competitor just isn't very good, which is great for your ego, but might promise a future battering if you can't "literally save them millions per year" over a competitor who shows up some day with a cheaper product without hand writing C epoll state machines.
Or indeed if somebody decides that for this much money they'll hand write the machine code to scrape a few extra cycles you can't reach from C and perform *better* than you.<br> </div> Mon, 15 Apr 2024 21:06:02 +0000 Continued attacks on HTTP/2 https://lwn.net/Articles/969937/ https://lwn.net/Articles/969937/ Wol <div class="FormattedComment"> <span class="QuotedText">&gt; But that would cost far more time than we have, so it won't happen, some cards won't get done this month, or this year, or ever.</span><br> <p> You're asking the wrong question (or rather, it sounds like your bosses are). I overheard some of our bot engineers making the same mistake. "It costs several hundred pounds per bot. Over all our bots that's millions of quid. It's too expensive, it won't happen". WHAT'S THE PAYBACK TIME?<br> <p> Who cares if it costs a couple hundred quid a time. If it saves a couple of hundred quid over a year or so, you find the money! (Or you SHOULD.)<br> <p> I'm not very good at putting a cost figure on the work I'm trying to do. I justify it in time - if I can make the downstream work easier for our hard-pressed planners, and enable them to do more work (we spend FAR too much time watching the spinning hourglass ...) then I can justify it that way. Time is money :-)<br> <p> Bosses should be asking "is this going to pay for itself", and if the answer is yes, you find a way of doing it. (Yes I know ... :-)<br> <p> Cheers,<br> Wol<br> </div> Mon, 15 Apr 2024 20:39:01 +0000 Continued attacks on HTTP/2 https://lwn.net/Articles/969935/ https://lwn.net/Articles/969935/ epa <div class="FormattedComment"> Even if it did reliably segfault and crash, that's still not a good choice for a format parser run in-process as part of a larger application. Heck, even a command line tool would be considered buggy if it crashed on bad input instead of giving a helpful error. <br> </div> Mon, 15 Apr 2024 19:52:28 +0000 Continued attacks on HTTP/2 https://lwn.net/Articles/969924/ https://lwn.net/Articles/969924/ mb <div class="FormattedComment"> <span class="QuotedText">&gt;personal discipline</span><br> <p> Yeah. We have been told that for decades.<br> "Just do it right."<br> "Adjust your personal discipline."<br> "Educate yourself before writing code."<br> <p> But it doesn't work. Otherwise we would not still have massive numbers of memory corruption bugs in C programs.<br> <p> Rust takes the required "personal discipline" part and throws it right back onto the developer, if she does not behave.<br> Bad programs won't compile or will panic.<br> <p> Forcing the developer to explicitly mark code with "I know what I'm doing, I educated myself" (a.k.a. unsafe) is really the only thing that has actually reduced memory corruption bugs in a systems programming language, so far.<br> <p> So, if you *really* know what you are doing, feel free to use the unchecked variants. You just need to mark it with "unsafe".<br> It's fine.<br> </div> Mon, 15 Apr 2024 17:41:59 +0000 Continued attacks on HTTP/2 https://lwn.net/Articles/969916/ https://lwn.net/Articles/969916/ farnz <blockquote> Yep, simply don't write bugs and you'll end up having completely bug free code. It's that simple. <p> Except that it apparently isn't. The reality proves that.
</blockquote> <p>No; what reality proves is that most of us can't dependably do the "don't write bugs" part of that, and that the temptation of 1% better on a metric is enough that you will almost always find someone who succumbs to temptation to move the metric, even at the risk of introducing serious bugs. <p>For example, if you design your code such that the parser attempts to consume as many bytes from the buffer as it can, returning a request for I/O to grow the buffer if it can't produce another item, then you get a <a href="https://docs.rs/tokio-util/latest/tokio_util/codec/trait.Decoder.html#tymethod.decode">great pattern for building reliable network parsers in any language</a>, and the only thing that makes it easier in newer languages is that they've got more expressive type systems. <p>But it's always tempting to do some "pre-parsing" in the I/O layer: for example, checking a length field to see if the buffer is big enough to parse another item. And then, once you're doing that, and eliding the call to the "full" parser when it will clearly fail, it becomes tempting to inline "simple" parsing in the I/O function, which will perform slightly better on some benchmarks (not those compiled with both PGO and LTO, but not everyone takes care to first get the build setup right and then benchmark). <p>And, of course, it also gets tempting to do just a little I/O in the parser - why not see if there's one more byte buffered if that's all you need to complete your parse? If one byte is OK, why not 2 or 3? Or maybe a few more? <p>That's where tooling comes in really handy - if you use <tt>tokio_util::codec</tt> to write your parser, it's now really hard to make either of those mistakes without forking <tt>tokio_util</tt>, and it's thus something that's less likely to happen than if you wrote both parts as separate modules in the same project. But there's nothing about this that "needs" Rust; you can do it all in assembly language, C, FORTRAN 77, or any other language you care to name - it's "just" a tradeoff between the personal discipline of everyone on the project and tool enforcement. Mon, 15 Apr 2024 17:12:05 +0000
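<p>To illustrate the split farnz describes in C terms, here is a rough sketch of the decode side of that pattern (the names, return codes and line-based framing are invented for the example; it is not any project's real API). The parser only ever looks at the bytes it is given and never does I/O itself; the I/O loop reads, calls decode(), and compacts the buffer:
<pre>
#include &lt;stddef.h&gt;

enum decode_status { DEC_ITEM, DEC_NEED_MORE, DEC_ERROR };

struct item {
        const char *start;
        size_t len;
};

/* Try to decode one item from [buf, buf+len). Either produce an item and
 * report how many bytes were consumed, or ask the caller for more data.
 */
static enum decode_status decode(const char *buf, size_t len,
                                 struct item *out, size_t *consumed)
{
        size_t i;

        for (i = 0; i &lt; len; i++) {
                if (buf[i] == '\n') {   /* toy framing: one line per item */
                        out-&gt;start = buf;
                        out-&gt;len = i;
                        *consumed = i + 1;
                        return DEC_ITEM;
                }
        }
        /* cap how much we are willing to buffer before giving up */
        return (len &gt; 8192) ? DEC_ERROR : DEC_NEED_MORE;
}
</pre>
Keeping decode() on its own side of that interface is what removes the temptation to sneak I/O into the parser or parsing into the I/O loop.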