C was a great low-level language - for the PDP-11
Posted Feb 15, 2021 9:46 UTC (Mon) by anton (subscriber, #25547)
In reply to: C was a great low-level language - for the PDP-11 by sdalley
Parent article: Python cryptography, Rust, and Gentoo
The referenced article is not particularly good, just a hodgepodge of pet peeves.
As for the complexity of gcc and clang/LLVM, it is an indication that they have too much budget and want to justify it by producing good benchmark results (at the cost of worse usability). Admittedly they are also doing things that help usability, but they could do that without doing the other nonsense.
As for flat memory and caches (and, mentioned in the paper, cache coherency protocols), that is indeed hardware architecture for speeding up existing software written for a simple memory model, plus being able to run processes with large memory needs. Hardware architects needed a long time to get here, and tried to throw the complexity over to programmers the whole time (and are still doing it, with weak memory consistency): Instead of caches, they wanted us to manage fast memory by software, with the most recent instance being the SPEs of the Cell Broadband Engine (used in the PlayStation 3). Instead of somewhat consistent shared memory, they would rather have given us distributed memory, with software managing the transfer of data from remote to local memory before processing (supercomputers still have this). All this would make general-purpose programming so much harder that the alternatives with more complex hardware won out. So the architectures provide at least single-threaded programs with a "flat" memory model, and a language that reflects that memory model with, e.g., address arithmetic is a sensible low-level language for that (but note that C as understood by the gcc and clang maintainers is not such a language).
Segments are what I first thought of when you mentioned "flat memory". This has been pretty much eliminated as an architectural (mis)feature (and where it is present, it has not been used for a while); having it in an architecture costs extra hardware, and costs extra in software. As to how a low-level language would look that supports it, look at the C standard; it includes many restrictions that cater for these kinds of architectures; and these days the gcc and clang maintainers use these restrictions as justification for miscompiling programs on architectures with flat memory.
As for register renaming (vs. "fixed registers"), Intel has spent billions on IA-64 aka Itanium based on the idea that compilers could rename "fixed registers" and reorder instructions better than the hardware can. In the end it turned out that the hardware with register renaming performs better for most software. The IA-64 approach would also have required more complex compilers to perform well, and the Itanium CPUs are also quite power-hungry even without a register renamer.
Vectors as first-class objects: Look at APL, J, or FP, although I would not call these languages low-level. Still, Backus was not pleased with architecture and programming languages and proposed FP as an alternative programming model. But despite Backus' standing and his high-profile presentation of his critique and alternative, FP/FL have not seen mainstream success nor taken the functional programming community by storm.
On a completely different track, you can look at GNU C's vector extensions, which are pretty low-level.
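To illustrate what those extensions look like, here is a minimal sketch. The type name v4si and the helper functions mul_add and sum_lanes are made up for this example; the vector_size attribute, element-wise arithmetic operators, and subscripting on vector types are the parts provided by GNU C (gcc and clang).

```c
#include <stdint.h>

/* GNU C extension: a vector of four 32-bit ints, 16 bytes wide.
   The name v4si is only a convention chosen for this sketch. */
typedef int32_t v4si __attribute__((vector_size(16)));

/* Element-wise a*b + c. The compiler is free to map this to SIMD
   instructions where the target has them, or to scalar code otherwise. */
v4si mul_add(v4si a, v4si b, v4si c)
{
    return a * b + c;
}

/* Sum the four lanes; GNU C allows ordinary subscripting on vectors. */
int32_t sum_lanes(v4si v)
{
    return v[0] + v[1] + v[2] + v[3];
}
```

The vector is a value that can be passed, returned, and combined with the usual operators, but the lane count and element type are fixed in the type, which is what keeps this close to the hardware.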
As for threads, we have seen SMT in mainstream CPUs since 2002 and multi-core CPUs in the mainstream since 2005. The low-level approaches to that have been pthreads and the C++ memory model, but they are hard to program with. By contrast, Unix pipes (a high-level concept) let me use multiple cores or hardware threads without particular effort (but typically only for rather limited amounts of parallelism).
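To give a feel for the explicit bookkeeping that the pthreads approach involves even in a trivial case, here is a minimal sketch that sums an array on two threads. The names struct slice, sum_slice, and parallel_sum are made up for this example; only pthread_create and pthread_join come from POSIX. It assumes a POSIX system and may need -pthread when compiling.

```c
#include <pthread.h>
#include <stddef.h>

/* Work description and result slot for one thread (hypothetical helper type). */
struct slice {
    const int *data;
    size_t len;
    long sum;
};

/* Thread entry point: sums its slice into s->sum. */
void *sum_slice(void *arg)
{
    struct slice *s = arg;
    s->sum = 0;
    for (size_t i = 0; i < s->len; i++)
        s->sum += s->data[i];
    return NULL;
}

/* Sums data[0..len) using one extra thread.
   Returns 0 on success, -1 if the thread could not be created or joined. */
int parallel_sum(const int *data, size_t len, long *out)
{
    struct slice lo = { data, len / 2, 0 };
    struct slice hi = { data + len / 2, len - len / 2, 0 };
    pthread_t t;

    if (pthread_create(&t, NULL, sum_slice, &hi) != 0)
        return -1;
    sum_slice(&lo);                 /* the calling thread handles the lower half */
    if (pthread_join(t, NULL) != 0) /* join also makes hi.sum visible to us */
        return -1;

    *out = lo.sum + hi.sum;
    return 0;
}
```

Here the two threads never touch the same data and the join is the only synchronization point, so the memory-ordering questions stay out of sight; they show up as soon as threads share mutable state while running.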
Occam is a programming language for programming distributed-memory multiprocessors (but even on shared-memory machines, each thread could get its private memory, limiting the memory ordering headaches to the implementation of communications). I think that one other thing that the transputers and Occam did right was to make thread creation, destruction and communications very cheap, so finding the right granularity of parallel processing was not as critical as on current mainstream stuff. Still, I don't see these aspects of Occam being picked up in the mainstream, so maybe they are not as important as I think.
Overall, the problem of making good use of many threads with little burden on the programmers is still unsolved, and that's why architectures with lots of slow threads have not found mainstream success.
