
Cro: Maintain it With Zig

This blog post by Loris Cro makes the claim that the Zig language is the solution to a lot of low-level programming problems:

Freeing the art of systems programming from the grips of C/C++ cruft is the only way to push for real change in our industry, but rewriting everything is not the answer. In the Zig project we’re making the C/C++ ecosystem more fun and productive. Today we have a compiler, a linker and a build system, and soon we’ll also have a package manager, making Zig a complete toolchain that can fetch dependencies and build C/C++/Zig projects from any target, for any target.

(LWN looked at Zig last year).



Cro: Maintain it With Zig

Posted Sep 9, 2021 23:59 UTC (Thu) by bartoc (guest, #124262) [Link] (1 responses)

I kinda like zig, but when I looked at it I became extremely concerned about just how long it was willing to defer semantic analysis of generic functions (of which there are many, due to Zig's error handling mechanism, among other things). This plays out a lot like how MSVC used to parse template function bodies (it basically didn't).

Maybe something about how the language is put together mitigates this, but it really did concern me.

Their error handling mechanism also relies on the compiler being able to assign unique IDs to errors that may well differ between compiles. I'm not sure how well this scales either. Who knows, maybe it's fine, but maybe they are just kicking the can down the road on these hard problems.

Cro: Maintain it With Zig

Posted Sep 10, 2021 9:13 UTC (Fri) by ldearquer (guest, #137451) [Link]

>> Their error handling mechanism also relies on the compiler being able to assign unique IDs to errors that may well differ between compiles. I'm not sure how well this scales either. Who knows, maybe it's fine, but maybe they are just kicking the can down the road on these hard problems.

I think this is a concern for them too. When speaking about inferred error sets (which is not the same as auto code assignment), this comes up:

<<
When a function has an inferred error set, that function becomes generic and thus it becomes trickier to do certain things with it, such as obtain a function pointer, or have an error set that is consistent across different build targets. Additionally, inferred error sets are incompatible with recursion.

In these situations, it is recommended to use an explicit error set. You can generally start with an empty error set and let compile errors guide you toward completing the set.

These limitations may be overcome in a future version of Zig.
>>

What I am not sure about is if "explicit error set"s allow you to assign the error codes by hand. I would assume you can, since error sets are just special enums.

Other than that, I think it's great to make the compiler aware of returned errors. In general, Zig looks very promising to me, and I would like to start using it in embedded work as soon as time permits (I normally have to work with 32kB/64kB chips, and things like Zig "comptime" may help with some optimizations better than endless macros).

Cro: Maintain it With Zig

Posted Sep 10, 2021 0:36 UTC (Fri) by HelloWorld (guest, #56129) [Link] (17 responses)

The article claims that C++ is moving forward too slowly. Well, that's just ridiculous. In fact it's hard to think of any language that has changed faster and more radically than C++ over the last 10 years. There are now variadic templates, auto, uniform initialization, lambdas, modules, concepts, rvalue references, constexpr, consteval, simplified for loops, coroutines, enum classes, class template argument deduction, initializer lists and a bunch of others... And there's no sign of this slowing down any time soon.

Cro: Maintain it With Zig

Posted Sep 10, 2021 5:35 UTC (Fri) by tlamp (subscriber, #108540) [Link] (16 responses)

He talks about cruft, and your statement seems to underline his worries a bit: it sounds as though the rate at which cruft gets added is not exactly small.

I'm not saying that adding new features is bad, nor do I have the in-depth C++ experience to actually judge its ecosystem; that's just how I read it. From the outside it seems that each C++ version gets many features bolted on. It's great to have that much power, but I could imagine that it takes quite some discipline and all the more refactoring: some contributors surely want to use all that new shiny stuff, but normally one also wants to avoid creating a big ball of mud.

Cro: Maintain it With Zig

Posted Sep 10, 2021 11:12 UTC (Fri) by excors (subscriber, #95769) [Link] (15 responses)

I think a lot of the recent C++ changes aren't about making it more powerful, they're making it easier to use the power that the language already had. Like you can now write "std::lock_guard lock(some_mutex);" instead of "std::lock_guard<std::mutex> lock(some_mutex);" (thanks to class template argument deduction) - not a big change, but it makes the code a bit cleaner. Or more substantially, with features like "if constexpr" you can do metaprogramming (i.e. code that's executed at compile-time, and its inputs/outputs can be both values and types) in a procedural style that's quite similar to regular C++, whereas previously you had to write everything in a weird recursive functional style with horrible SFINAE tricks. And there's a lot of language cleanups so that code which always seemed natural to write in C++ but previously generated obscure compiler errors, now compiles correctly and does what you'd expect.
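
The two features named above can be sketched in a few lines of C++17. This is only an illustration: `counter_mutex`, `counter`, `increment` and `describe` are hypothetical names invented for the example, not anything from the article or the comment.

```cpp
#include <mutex>
#include <string>
#include <type_traits>

// Hypothetical shared state, used only for this illustration.
std::mutex counter_mutex;
int counter = 0;

int increment() {
    // C++17 class template argument deduction: the compiler deduces
    // std::lock_guard<std::mutex> from the constructor argument.
    std::lock_guard lock(counter_mutex);
    return ++counter;
}

// Compile-time branching on a type, written like an ordinary if statement.
// Only the branch that matches T is instantiated.
template <typename T>
std::string describe(const T&) {
    if constexpr (std::is_integral_v<T>) {
        return "integral";
    } else if constexpr (std::is_floating_point_v<T>) {
        return "floating point";
    } else {
        return "something else";
    }
}
```

Before C++17, the `describe` dispatch would typically have needed overloads selected via `std::enable_if` or tag dispatch.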

The language specification is getting more complicated, but programs written in the language can now be simpler and less crufty, which seems like a good tradeoff.

Cro: Maintain it With Zig

Posted Sep 10, 2021 15:47 UTC (Fri) by rahulsundaram (subscriber, #21946) [Link] (14 responses)

> The language specification is getting more complicated, but programs written in the language can now be simpler and less crufty, which seems like a good tradeoff.

The language itself gains multiple ways of doing things (the old old way, the old way, the new way), and since you can't drop compatibility with such a large number of users, any programmer is likely going to have to learn all the different methods. This is something organizations have tried to tackle by limiting themselves to a subset, but that subset is different depending on which codebase you are looking at. With a long history, I am not sure that is an easy problem to solve.

Cro: Maintain it With Zig

Posted Sep 11, 2021 0:34 UTC (Sat) by roc (subscriber, #30627) [Link] (13 responses)

It's even worse than that. When you import code from one project to another the C++ subsets used are likely to be different.

Cro: Maintain it With Zig

Posted Sep 12, 2021 0:13 UTC (Sun) by tialaramex (subscriber, #21167) [Link] (12 responses)

Here's (C++ committee member) Nicolai Josuttis on the conflicting C++ language style guides:

https://www.youtube.com/watch?v=WRQ1xqYBKgc

Particularly egregious is the fact that the "Core Guidelines" recommend against East Const, on the basis that even though clearly East Const is better, and could be automatically checked, it's not popular and so you should just learn the more complicated unintuitive West Const rules instead. It's not clear what the purpose of such a guide even is when it defers to popularity so easily.

Cro: Maintain it With Zig

Posted Sep 12, 2021 2:28 UTC (Sun) by HelloWorld (guest, #56129) [Link] (9 responses)

Frankly that talk only demonstrates that many style guides are simply retarded.

A particularly idiotic example is the rule from MISRA that every switch statement must have a default branch. The compiler can and will check exhaustiveness when switching over an “enum class” type, so the rule doesn't achieve anything useful here. But when you add a new enumerator to the enum class, your compiler will now no longer be able to warn you about a missed case, because those are already handled by the default branch! So this rule actively harms programmers by depriving them of a useful language feature.
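
A minimal sketch of the effect described above, assuming a hypothetical `Color` enum and GCC's or Clang's `-Wswitch` diagnostic. The first function forgets `Blue`, but the default branch hides that from the compiler; the second has no default branch, so a forgotten enumerator would be reported.

```cpp
#include <string_view>

enum class Color { Red, Green, Blue };  // hypothetical example type

// Default branch present: Blue is not handled explicitly, yet -Wswitch
// stays silent because the default label covers it.
constexpr std::string_view name_with_default(Color c) {
    switch (c) {
        case Color::Red:   return "red";
        case Color::Green: return "green";
        default:           return "other";
    }
}

// No default branch: removing any case below would draw a -Wswitch
// warning naming the unhandled enumerator.
constexpr std::string_view name_without_default(Color c) {
    switch (c) {
        case Color::Red:   return "red";
        case Color::Green: return "green";
        case Color::Blue:  return "blue";
    }
    return "other";  // reached only for values that match no enumerator
}
```

The trailing `return` in the second function is one way to stay well-defined for values outside the enumerator list without giving up the warning.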

And when it comes to MISRA, that's just the tip of the iceberg.

switch

Posted Sep 12, 2021 8:08 UTC (Sun) by tialaramex (subscriber, #21167) [Link] (2 responses)

Still a language mis-feature :D

In Rust, match (the closest feature to switch) must be exhaustive. You may choose to either write a default case *or* cover every possible value; doing both leaves the default arm unreachable, which the compiler flags (as a warning by default), for the same reason matching '7' twice in some digit matching code would be flagged.

If the library you're using knows they might want to add more values to the enumeration they can declare it to be #[non_exhaustive] which signals to the compiler that the former scenario (cover every case explicitly) isn't enough after all and you must always supply a default. This way when the next library upgrade adds another value your default match covers that. USFederalHoliday should likely be #[non_exhaustive] but you don't need a default case for CalendarMonth or DayOfWeek.

If they choose not to write #[non_exhaustive] but then they do add a value to the enumeration anyway this is a backwards incompatible change and your code won't compile until it's adjusted to cope with the new value.

As a result the desired effect of the MISRA rule is always in place in Rust, while the dangerous behaviour is not possible, a guideline is unnecessary.

switch

Posted Sep 12, 2021 11:23 UTC (Sun) by HelloWorld (guest, #56129) [Link] (1 responses)

In safety critical applications, which after all typically run in some sort of embedded system, I don't think you even need something like #[non_exhaustive]. When a new enumerator is added, you really should take another look and not just hope that your old default clause still makes sense. Binary compatibility is not that much of an issue in embedded systems because you don't usually upgrade shared libraries independently.

Besides, this is purely a tooling problem. If MISRA wants to enforce exhaustiveness, they can do so. But apparently their tool vendors are just too lazy and they prefer forcing people to write dead code instead.

switch

Posted Sep 20, 2021 10:42 UTC (Mon) by tialaramex (subscriber, #21167) [Link]

Exhaustiveness in C is really hard because the enumerated type is just a funny way to spell an integer as explained in the other sub-thread.

Obviously rustc turns your simple enumerated type into an integer in the machine code too, but this happens in an IR at a point where you can no longer touch it, so the only rule needed to avoid setting yourself on fire is "No unsafe code", i.e. write #![forbid(unsafe_code)] and you're done.

Cro: Maintain it With Zig

Posted Sep 12, 2021 11:34 UTC (Sun) by excors (subscriber, #95769) [Link] (5 responses)

> The compiler can and will check exhaustiveness when switching over an “enum class” type, so the rule doesn't achieve anything useful here.

I think that's incorrect, because it's legal to cast an integer to an "enum class" type even if it's not one of the declared enumerators. Then it wouldn't match any of your 'exhaustive' cases and you need to handle it with a default case (or intentionally rely on the default default behaviour of falling off the bottom of the switch).

(This surprised me when I discovered it recently.)

Specifically, according to C++17 every enum has an 'underlying type'. For "enum E : T {}" and "enum class E : T {}", it is 'fixed' as T. For "enum class E {}", it is fixed as int. The 'values of the enumeration' are the values of the underlying type, e.g. for "enum class E {}" it's all values of type int.

For "enum E {}", the underlying type is not fixed and is implementation-defined. The values of the enumeration are (basically) from 0 up to the smallest 2^N-1 that will fit all the defined enumerators, which may be a smaller range than the underlying type.

With a fixed underlying type, casting an integral value to the enumeration type will convert it to the underlying type first, by the usual integer rules. That means it will always be one of the values of the enumeration, so the cast is always allowed.

With a non-fixed underlying type, casting is only allowed if the integral value is within the range of the enumeration values. E.g. if you have "enum E { one=1, six=6 };" then the range is 0 to 2^N-1 with N=3, so (E)2 and (E)7 are permitted but (E)8 is undefined behaviour. Clang's UndefinedBehaviorSanitizer helpfully detects that: "runtime error: load of value 8, which is not a valid value for type 'E'".

(That restriction is specifically for casting - the standard says "It does not preclude an expression of enumeration type from having a value that falls outside this range". I guess something like "std::underlying_type_t<E> n = 8; E e; memcpy(&e, &n, sizeof(e));" might be a legal way to generate such a value, but I'm not familiar enough with the rules to be certain.)

So I think about the only situation where you can exhaustively switch on an enum without a default case, is when it's "enum E : uint8_t" / "enum class E : uint8_t" and you define enumerators for every value from 0 to 255. In all other cases, for both "enum" and "enum class", it's perfectly legal to have values of the enumeration type that are not one of the enumerators. You need to either do some global analysis of your program (which is outside the scope of C++) to prove you never generate such values, or write code that's locally safe by handling the default case in every switch.
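
The rules above for enumerations with a fixed underlying type can be checked with a short C++17 sketch. The enum and function names are hypothetical and exist only for this illustration.

```cpp
#include <cstdint>
#include <type_traits>

enum class Scoped { A, B };           // underlying type fixed as int
enum Fixed : std::uint8_t { X, Y };   // underlying type fixed as uint8_t

static_assert(std::is_same_v<std::underlying_type_t<Scoped>, int>);
static_assert(std::is_same_v<std::underlying_type_t<Fixed>, std::uint8_t>);

// Well-defined: 1000 is a value of int, hence a value of Scoped,
// even though no enumerator has that value.
constexpr Scoped odd = static_cast<Scoped>(1000);

// Returns true only when s equals one of the declared enumerators.
constexpr bool is_named(Scoped s) {
    switch (s) {
        case Scoped::A:
        case Scoped::B:
            return true;
    }
    return false;
}

// The integer is converted to the underlying type (uint8_t) first,
// so the result wraps modulo 256 before it becomes a Fixed value.
inline int wrap_to_fixed(int n) {
    return static_cast<int>(static_cast<Fixed>(n));
}
```

Here `is_named(odd)` is false although `odd` is a perfectly valid `Scoped` value, which is the gap a switch over the enumerators alone leaves open.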

This does make the compiler's "warning: enumeration value '...' not handled in switch" warnings quite silly, because if you forget to handle one enumerator but have a default case you won't get that warning, and if you remove the default case (to enable the warning) and handle every enumerator (to fix the bug revealed by that warning) then the warning goes away even though you've just added billions of unhandled enumeration values. In the latter case, at least GCC will sometimes still warn you that "control reaches end of non-void function" despite you handling every declared enumerator - Clang suppresses that warning and silently generates code that will trigger undefined behaviour at runtime when given a valid enumeration value that doesn't match any of the supposedly-exhaustive cases.

Cro: Maintain it With Zig

Posted Sep 12, 2021 14:45 UTC (Sun) by HelloWorld (guest, #56129) [Link] (4 responses)

> I think that's incorrect, because it's legal to cast an integer to an "enum class" type even if it's not one of the declared enumerators.
The problem here is the cast, not the lack of a default clause. Why doesn't MISRA forbid that? That would actually make sense...

Besides, what useful thing could you possibly do in such a default clause? Because after all, the whole point of an enum type is that it can only hold one of a number of enumerated values. Therefore, when you encounter a value that isn't among them, your program is already in a state that the developers didn't foresee, and hence couldn't possibly know how to rectify.

Cro: Maintain it With Zig

Posted Sep 12, 2021 17:06 UTC (Sun) by excors (subscriber, #95769) [Link] (3 responses)

> The problem here is the cast, not the lack of a default clause. Why doesn't MISRA forbid that? That would actually make sense...

Hmm, it looks like MISRA C++:2008 already forbids that: "Rule 7-2-1: An expression with enum underlying type shall only have values corresponding to the enumerators of the enumeration". (That's based on C++03 and scoped enumerations are a C++11 feature, so it's talking about unscoped enumerations here.)

In that case, I think it would be feasible to have exhaustive switches over enums. But since it's different to the standard C++ rules, you'd need to suppress the compiler's "control reaches end of non-void function" warnings (and suppressing warnings seems generally dodgy when you care about safety), then add a static analysis tool to check the new rules. I don't know much about MISRA but I guess they didn't want to rely on tools that didn't exist yet.

> Because after all, the whole point of an enum type is that it can only hold one of a number of enumerated values. Therefore, when you encounter a value that isn't among them, your program is already in a state that the developers didn't foresee, and hence couldn't possibly know how to rectify.

According to the definition of C++, the point of an enum type is that it's basically an integer where some of the values have names. The developer is responsible for foreseeing states where an enum value doesn't match any name and deciding how to handle it, because those are well-defined states.

With unscoped enums, it seems common and widely accepted to use an enum type to contain a set of flags, so you'll bitwise-or two enumerators and get an enum value that doesn't equal any named enumerator.

With scoped enums, storing a combination of flags is allowed but is very awkward (because you need static_casts everywhere) and I think any sensible style guide would advise against it. But even then, it seems quite reasonable to e.g. define a struct with a scoped enum field and read it from disk or from a network socket or decode it from JSON/protobuf/etc, and it could have an arbitrary integer value that doesn't match any enumerator. Maybe you have some validation layer that rejects such messages as soon as possible, but the language doesn't give you any tools to help implement that (e.g. there's no reflection to let you find all the enumerator values) and standard static analysis tools won't help (because non-enumerator values don't violate type safety and aren't undefined behaviour), so there's a risk that non-enumerator values will leak into the rest of your program. To be safe, you should handle those values everywhere.
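
A validation layer of the kind mentioned above has to be written by hand, since the language offers no way to enumerate the enumerators. The sketch below assumes a hypothetical `MessageKind` type read from a one-byte wire field; the names and values are made up for the example.

```cpp
#include <cstdint>
#include <optional>

// Hypothetical wire-format field; values chosen only for illustration.
enum class MessageKind : std::uint8_t { Ping = 1, Data = 2, Close = 3 };

// Converts a raw byte to MessageKind, rejecting anything that is not a
// declared enumerator. The enumerator list is repeated by hand here.
constexpr std::optional<MessageKind> parse_kind(std::uint8_t raw) {
    switch (static_cast<MessageKind>(raw)) {
        case MessageKind::Ping:
        case MessageKind::Data:
        case MessageKind::Close:
            return static_cast<MessageKind>(raw);
    }
    return std::nullopt;  // not a known enumerator
}
```

Because the switch has no default branch, adding an enumerator to `MessageKind` without updating `parse_kind` would typically draw a compiler warning, which partly makes up for the hand-maintained list.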

Cro: Maintain it With Zig

Posted Sep 12, 2021 21:51 UTC (Sun) by HelloWorld (guest, #56129) [Link] (2 responses)

Actually the C++20 standard says this in Chapter 7.6.1.9, paragraph 10:
> A value of integral or enumeration type can be explicitly converted to a complete enumeration type. If the enumeration type has a fixed underlying type, the value is first converted to that type by integral conversion, if necessary, and then to the enumeration type. If the enumeration type does not have a fixed underlying type, the value is unchanged if the original value is within the range of the enumeration values (9.7.1), and otherwise, the behavior is undefined.
So it seems to me that it's impossible to create a value of an enum type other than the enumerators without previously invoking undefined behaviour (unless the enumeration type has a fixed underlying type). But I wonder if I'm misreading the standard here, because modern compilers should be able to exploit this, and yet they don't seem to. Something like this...
enum class Foo {
        Bar
};

auto f(Foo f) -> int {
        switch(f) {
                case Foo::Bar: return 42;
                default: return 23;
        }
}
... should just be compiled to mov eax, 42; ret according to my reading of the standard, but that's not what I get:
        test    edi, edi
        mov     ecx, 42
        mov     eax, 23
        cmove   eax, ecx
        ret
So I'm probably missing something here.

And you're right about the compiler warnings, that's a problem. But clang doesn't issue that diagnostic in such cases, and I think that's a good thing.

> With unscoped enums, it seems common and widely accepted to use an enum type to contain a set of flags, so you'll bitwise-or two enumerators and get an enum value that doesn't equal any named enumerator.

If you want a set of bits, I think std::bitset is the way to go.

> With scoped enums, storing a combination of flags is allowed but is very awkward (because you need static_casts everywhere) and I think any sensible style guide would advise against it. But even then, it seems quite reasonable to e.g. define a struct with a scoped enum field and read it from disk or from a network socket or decode it from JSON/protobuf/etc, and it could have an arbitrary integer value that doesn't match any enumerator. Maybe you have some validation layer that rejects such messages as soon as possible, but the language doesn't give you any tools to help implement that (e.g. there's no reflection to let you find all the enumerator values) and standard static analysis tools won't help (because non-enumerator values don't violate type safety and aren't undefined behaviour), so there's a risk that non-enumerator values will leak into the rest of your program. To be safe, you should handle those values everywhere.

Again, what are you going to do about it when you encounter a value other than the enumerators? That just means you had a bug in the part of your program that is supposed to validate the inputs, and now the program is in a state never expected or intended by the developer, so they couldn't possibly know what the correct way forward is.

Well, unless they actually do expect values other than the enumerators, in which case they should ask themselves why they're using an enum type in the first place. Anyway, the whole enum situation in C++ is a bit of a mess. I personally think that when you start thinking about the underlying representation of an enum type, you're probably operating at the wrong level of abstraction and should be using something other than an enum type, but that's not how the language is defined, apparently.

Cro: Maintain it With Zig

Posted Sep 12, 2021 22:19 UTC (Sun) by excors (subscriber, #95769) [Link] (1 responses)

> So it seems to me that it's impossible to create a value of an enum type other than the enumerators without previously invoking undefined behaviour (unless the enumeration type has a fixed underlying type).

"enum class Foo" is a scoped enumeration so it does have a fixed underlying type (defaulting to int), per C++20 9.7.1.5:

> Each enumeration defines a type that is different from all other types. Each enumeration also has an underlying type. The underlying type can be explicitly specified using an enum-base. For a scoped enumeration type, the underlying type is int if it is not explicitly specified. In both of these cases, the underlying type is said to be fixed.

The undefined behaviour only applies to an unscoped enum with no explicitly specified underlying type. And in that case "the range of the enumeration values" is not just the list of declared enumerators, it's a power-of-two-aligned range that includes the list of enumerators, per C++20 9.7.1.8:

> For an enumeration whose underlying type is fixed, the values of the enumeration are the values of the underlying type. Otherwise, the values of the enumeration are the values representable by a hypothetical integer type with minimal width M such that all enumerators can be represented. [...] It is possible to define an enumeration that has values not defined by any of its enumerators

(C++17 has a much more verbose definition but I think it has the same effect.)

> Again, what are you going to do about it when you encounter a value other than the enumerators?

If that case can only be triggered by a bug in your code, you could do the equivalent of assert(0), i.e. crash the process and let some other system (init, kernel, hardware watchdog, etc) recover cleanly - same as any other case where you detect a bug. That's safer than e.g. falling off the bottom of a non-void function and returning some garbage (which could be a security vulnerability).
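
The "equivalent of assert(0)" approach could look roughly like the following. It is a sketch under assumptions: `Mode` and `timeout_ms` are hypothetical, and `std::abort` stands in for whatever crash-and-recover mechanism the surrounding system provides.

```cpp
#include <cstdio>
#include <cstdlib>

enum class Mode { Idle, Running };  // hypothetical example type

int timeout_ms(Mode m) {
    switch (m) {
        case Mode::Idle:    return 1000;
        case Mode::Running: return 50;
        default:
            // Only reachable through a bug elsewhere. Fail loudly instead
            // of falling off the end of a non-void function.
            std::fputs("timeout_ms: invalid Mode value\n", stderr);
            std::abort();
    }
}
```

Unlike `assert`, the `std::abort` call is not compiled out when `NDEBUG` is defined, so the check also applies in release builds.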

> Anyway, the whole enum situation in C++ is a bit of a mess.

I can't disagree with that :-)

Cro: Maintain it With Zig

Posted Sep 13, 2021 1:10 UTC (Mon) by HelloWorld (guest, #56129) [Link]

I see, thanks for pointing out the relevant sections of the standard. That does clear things up. I still think that enforcing a default clause is a bad idea because it prevents the compiler from issuing a warning when you miss an enumerator. Giving that up for an assert that is only ever going to do something if you've already messed up just doesn't seem like a good tradeoff. Especially given that crashing the process might not always be viable. There are situations where you need to keep going under all circumstances.

East const vs West const

Posted Sep 13, 2021 14:47 UTC (Mon) by dskoll (subscriber, #1630) [Link] (1 responses)

OMG! I had no idea these styles had names! I'm a strong proponent of East const, which makes way more sense to me than West const. Now I feel like I'm part of a community instead of a lone programmer tilting against windmills. :)

East const vs West const

Posted Sep 16, 2021 8:25 UTC (Thu) by ncm (guest, #165) [Link]

West is favored by people who wish they were coding C; east by people glad to be coding C++.

East "*" is generally worse. It is favored by compiler writers and people who glory in being a PITA, and practically nobody else.

But it is not hard to read any of them. Or, even, mixed on the same page; it just looks untidy. There are worse sins. Clang-format fixes everything without fuss.

Cro: Maintain it With Zig

Posted Sep 10, 2021 1:12 UTC (Fri) by atai (subscriber, #10977) [Link] (2 responses)

>Freeing the art of systems programming from the grips of C/C++ cruft is the only way to push for real change in our industry, but rewriting everything is not the answer. In the Zig project we’re making the C/C++ ecosystem more fun and productive.

so we need to escape from C/C++, and then we don't.

Since we don't, this is a no-way way (as escaping is the "only way" per the first sentence)

Cro: Maintain it With Zig

Posted Sep 10, 2021 2:23 UTC (Fri) by Paf (subscriber, #91811) [Link] (1 responses)

It might help if you read more than just these lines. Zig is deeply compatible with these ecosystems, which is how it can - in theory - both make them more fun and productive and aid in escaping from those languages over time. Gradually migrate.

Cro: Maintain it With Zig

Posted Sep 10, 2021 9:01 UTC (Fri) by scientes (guest, #83068) [Link]

You got it.

The initial goal of Zig was to be a better C. I left the project because I felt that it had diverged from those goals, and that Andrew is driven by his ego and thus over-extends himself (although his abilities are quite impressive). However, it succeeded at many of those initial goals.

One reason it has been so successful is that it is not unwilling to throw off baggage, like a dependence on libc. The standard library is a major pain to anyone that tries to use Rust without a POSIX clone.

Cro: Maintain it With Zig

Posted Sep 10, 2021 9:48 UTC (Fri) by moltonel (subscriber, #45207) [Link] (6 responses)

This is talking about Zig as a build system for C/C++, not about Zig as a language. We've seen a lot of those before; what could possibly go wrong? Getting this "foot in the door" as a build system, to help progressively migrate a C/C++ project to a better language, could work, or it could be that people who appreciate the build system are not the same people who appreciate the language.

The article couldn't help jabbing at Rust (talking about the "RIIR mantra" is a gross mischaracterization at this stage), rightfully pointing out that we won't get rid of our C/C++ giants any time soon. But Rust makes by and large the same promise of a good build system, a better language, and the possibility of progressive evolution without outright replacing the C heritage.

As nice as Zig looks, to me it doesn't seem compelling enough to dive into. If I'm working with a legacy project, changing its build system is hopefully not on my todo list, but I have many options beside Zig if it is. If I'm working on a new project, Rust seems to have more to offer.

Cro: Maintain it With Zig

Posted Sep 10, 2021 14:36 UTC (Fri) by mario-campos (subscriber, #152845) [Link] (5 responses)

>If I'm working with a legacy project, changing its build system is hopefully not on my todo list, but I have many options beside Zig if it is.

I agree: changing a build system on a legacy project is hardly ever a priority. That being said, I think the author mentions it not for the build system itself, as if it were some gift to the world of build systems, but to inform the reader that one can transition gradually to Zig by using Zig as a drop-in replacement for `cc` and other tools.

>If I'm working on a new project, Rust seems to have more to offer.

I am both excited and confounded by all of the languages in the systems-programming space. Just to name a few: Ada, D, C, C++, Go (if you squint), Rust, Zig, Nim, etc. They all seem to have their unique niches, although, as an outsider to these languages, I'm not truly sure what they are and I don't know if or how Rust stacks up to Zig. Or Nim. Or Ada.

Cro: Maintain it With Zig

Posted Sep 10, 2021 22:20 UTC (Fri) by khim (subscriber, #9252) [Link]

It's not about different niches but more about different attitudes. E.g. Rust is the only language without tracing GC which considers the ability to abuse an API to crash the program a CVE event. Sadly it doesn't consider DoS worthy of the same treatment, but most other systems languages just say “you are holding it wrong” to such bug reports (Ada considers it OK if you can abuse heap-allocating procedures, but when dynamic memory is not used it applies the same standard).

Cro: Maintain it With Zig

Posted Sep 11, 2021 0:49 UTC (Sat) by dvdeug (subscriber, #10998) [Link] (2 responses)

For Ada, Ada's old. It doesn't have a lot of the fancy features new systems have, and its system of generics is the very first generic system in any major programming language, the direct ancestor of C++ templates. It is, in many ways, an advanced object-orientated Pascal. (It was the first internationally standardized object-orientated language, in its 1995 iteration.)

For upsides, for one, it's been around for 40 years, and there's a group working on the next ISO standard version. Like Fortran, it will be around for a few more decades. It's got more support than D or Nim or Zig, and unlike Go or Rust, I know it's going to be there for a while. (Look at Perl and Adobe Flash for other examples of shifting sands.) The military and aviation support will keep it alive, and AdaCore is dedicated fairly solidly to supporting the open source community.

For the other, it's a lot more powerful than C and a lot less hairy than C++. It's a fully object orientated language with generics, concurrency and Unicode support. The last update added a lot of support for invariants and the next is working on a lot of more convenient parallel processing constructs. SPARK is a formally provable subset, and there's subsets for running without dynamic memory allocation and for hard real-time systems.

I went hard into Ada after fighting with pre-ISO C++, and I sometimes group it in with my early years in BASIC, as something to be remembered more fondly than used, but SPARK is easier to use than Coq, and for a powerful language at the lowest level of the system, personally, Rust may be the only sane alternative.

Cro: Maintain it With Zig

Posted Sep 11, 2021 2:46 UTC (Sat) by Cyberax (✭ supporter ✭, #52523) [Link] (1 responses)

I tried to look into Ada once and I haven't been able to understand how it deals with pointers and heap data. Are there any guides for that?

Cro: Maintain it With Zig

Posted Sep 11, 2021 3:24 UTC (Sat) by dvdeug (subscriber, #10998) [Link]

I don't know about guides. It just basically calls pointers access variables, and initializes them with "new", and you have to make a free function with the generic package Ada.Unchecked_Deallocation, so it's little more sophisticated than C on this. (The clunky free function name is apparently because, in 1980, they assumed everyone would be using garbage collection in the future, so deallocating memory needed to be marked. In practice, except for a couple of short-lived JVM or .NET ports, nobody has ever used GC with Ada.) There's object finalization, so you can hide it behind objects that will automatically deallocate the memory when leaving scope, and storage pools, so you can control where memory for certain types gets allocated from, but by and large, it's like C, with all the potential use-after-free errors.

Cro: Maintain it With Zig

Posted Sep 12, 2021 1:59 UTC (Sun) by tialaramex (subscriber, #21167) [Link]

There does seem to have been a resurgence of interest in this topic (systems programming languages) accompanying actual developments.

Rust in particular stands out for its focus on safety and concurrency (and so, safe concurrency). Once upon a time concurrent programming was a weird niche case, and few needed to care about it. Then for a while it was more mainstream but still only something a few very low-level components needed, like a multi-processor operating system, but today it is likely that choosing not to do concurrent programming means choosing suboptimal solutions for an increasing range of problems.

Rust's type system allows it to assure you that your (safe) Rust programs exhibit Sequential Consistency. This means you can reason about the working of your concurrent program in terms of things happening in some order, e.g. if thread #1 did A, then B, then C, and thread #2 observes that C happened, it stands to reason that A has already happened, since that was before C. Modern CPUs do not actually provide Sequential Consistency: raw machine code for thread #1 that does A, then B, then C will _not_ reliably result in a thread #2 which checks that C happened also observing that A happened, so this really is some work for the compiler. In languages like C or C++ the compiler promises to deliver Sequential Consistency *if* your program is free from data races (unsynchronized concurrent accesses to the same value, at least one of which is a write), but if you fail, the program not only doesn't have Sequential Consistency, its behaviour is Undefined. Rust sidesteps this by making such races impossible in the language design. If you try to write a data race your program won't compile.
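
To make the A, B, C scenario concrete, here is a minimal sketch in safe Rust. The names (observe_after_c, a, b, c) are made up for illustration, and the ordering guarantee in this sketch comes from explicitly asking for Ordering::SeqCst on atomics; sharing plain mutable integers between the two threads instead would be rejected by the compiler.

```rust
use std::sync::atomic::{AtomicBool, AtomicU32, Ordering};
use std::sync::Arc;
use std::thread;

/// Thread #1 performs A, B, then C. Thread #2 waits until it observes C
/// and then reads A and B. Returns the values of A and B seen by thread #2.
/// (Hypothetical helper, for illustration only.)
fn observe_after_c() -> (u32, u32) {
    let a = Arc::new(AtomicU32::new(0));
    let b = Arc::new(AtomicU32::new(0));
    let c = Arc::new(AtomicBool::new(false));

    let writer = {
        let (a, b, c) = (Arc::clone(&a), Arc::clone(&b), Arc::clone(&c));
        thread::spawn(move || {
            a.store(1, Ordering::SeqCst); // A
            b.store(2, Ordering::SeqCst); // B
            c.store(true, Ordering::SeqCst); // C
        })
    };

    let reader = {
        let (a, b, c) = (Arc::clone(&a), Arc::clone(&b), Arc::clone(&c));
        thread::spawn(move || {
            // Spin until C has been observed.
            while !c.load(Ordering::SeqCst) {
                std::hint::spin_loop();
            }
            // C was observed, so A and B must already be visible.
            (a.load(Ordering::SeqCst), b.load(Ordering::SeqCst))
        })
    };

    writer.join().unwrap();
    reader.join().unwrap()
}

fn main() {
    let (a, b) = observe_after_c();
    println!("thread #2 saw A = {}, B = {}", a, b);
}
```

With weaker orderings such as Ordering::Relaxed the program would still be free of data races, but the "C implies A" reasoning would no longer be guaranteed.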

I think C++ stands out as the most friendly to would-be consultants. A consultancy career in C++ looks safe even if you started today, there's a lot of C++ out there and it's fantastically complicated, even if interest in the language for new projects wanes over the coming decades, I can't see an expert in C++ struggling to find consultancy gigs in 2050. And if C++ continues to go from strength to strength, that's just more opportunity for consulting. As a programmer having so many different ways to initialize a variable, plus the knowledge that some (but not all) variables in the language can be declared uninitialized, is annoying. But as a consultant that means even "variable initialization" becomes a topic heading, you can have policies, you can talk about existing practices, you can write an exercise, tonnes of fun.

As we see in the article, Zig has a remarkable cross compilation story. There are plenty of other small things to like about Zig, but I'd say the one that stands out is how enthusiastic Zig is about solving your build problem. If this dominates the work you don't like on a hobby project, that might be enough reason to adopt Zig even if nothing else about it appeals to you.

I suspect Go's niche is exactly what Google specified, although perhaps less than they had hoped. New hires can write a Go program that has reasonably good performance, and it probably either does what they expected or they've got a reasonable chance to fix the bugs and make it work. Go doesn't want you to be too clever, in fact it specifically cautions against this. But, it turns out that although it's not as dangerous as C or C++ it's still not actually safe, and although it's not as slow as Java or Python it's still not actually fast, and although it's easier to learn than Rust or Haskell it's still not actually easy. Trying to be the best would have involved being clever, and they didn't want to. If you can only ever learn one programming language, you could do much worse than Go. But if you can learn two, there's no reason I can see why either of them should be Go.

Cro: Maintain it With Zig

Posted Sep 10, 2021 17:16 UTC (Fri) by jd (guest, #26381) [Link]

It's certainly worth noting, but then you've members of the family like D that already have a lot of cruft removed, and dialects of C like CompCert's where you can verify that both the source and binary match what you expect. XKCD-ing the standards by adding one more needs to contribute some quality that none of the others has. Otherwise, it would be better to place effort into identifying the standard closest to the desired end result and work from there, so you've a dialect (just as CILK was) and not a new language.

Fetch dependencies? Aaaaaah!

Posted Sep 11, 2021 17:32 UTC (Sat) by mirabilos (subscriber, #84359) [Link] (16 responses)

Anything that wants to fetch dependencies makes me step back at first. This is usually… almost always, even, a problem.

Fetch dependencies? Aaaaaah!

Posted Sep 12, 2021 1:56 UTC (Sun) by HelloWorld (guest, #56129) [Link] (15 responses)

Build tools download dependencies because that's the only way to get stuff to work more or less consistently across different developers' machines. Like it or not, but that's just the way the world works these days, and it's time that the C/C++ community catch up to 2005 when it comes to that.

And actually, people increasingly just specify their entire development environment as a container image, see e.g. Gitpod. That makes most “it works on my machine” issues just go away. You run a container and that's it, you're done, no need to fiddle around with your setup.

But hey, I guess if you like messing around with m4 and pkg-config and all that, that's cool too.

Fetch dependencies? Aaaaaah!

Posted Sep 13, 2021 0:22 UTC (Mon) by flussence (guest, #85566) [Link] (14 responses)

No thanks.

Modern development practices should not be merely gentrifying the experience of downloading multi-gigabyte mystery meat zip files from random xda-developers threads, or jumping through the same miserable hoops as setting up a semi-proprietary embedded environment in 2005. Don't throw overflowing trashbags of mystery binaries and other people's vendored code with private modifications over the wall and pretend it's open. It's not okay when the likes of Google do it, and you certainly aren't as important.

Also, "pkg-config"? Update your FUD, most people switched to pkgconf years ago.

Fetch dependencies? Aaaaaah!

Posted Sep 13, 2021 1:30 UTC (Mon) by HelloWorld (guest, #56129) [Link] (13 responses)

Wow, you clearly have no idea what you're talking about. Build systems don't download “multi-gigabyte mystery meat zip files from random xda-developers threads”, they fetch packages from well-known package repositories such as Maven Central, and they certainly don't encourage vendoring – it's way more convenient not to bundle dependencies and just have the build tool download them.
And jumping through hoops? How is it jumping through hoops when the build tool just does everything for you? Jumping through hoops is when you need to chase down every single library yourself because your build system isn't capable of doing what it should: take care of the boring stuff in a reliable, reproducible way.

Besides, do you think people make modifications to their libraries just for the fun of it? No, they do it because they need those changes for their program to work right. Most developers aren't idiots looking for ways to make their life harder, you know...

> Also, "pkg-config"? Update your FUD, most people switched to pkgconf years ago.
Yeah, like that makes any kind of meaningful difference.

Fetch dependencies? Aaaaaah!

Posted Sep 13, 2021 2:01 UTC (Mon) by mirabilos (subscriber, #84359) [Link] (12 responses)

Right, Maven Central, now that you mention it as a prime example, is a cesspit. Sourceless crap, sometimes not even built with Maven or rebuildable at all. You get source JARs only if you’re lucky, and often *cough*Lombok*cough* they don’t correspond to what’s shipped or only partially correspond to it, and they’re clearly not Complete Corresponding Source. Licence metadata in POM files, if it exists at all, is usually outdated or sometimes even so plainly wrong it was never right in the first place, so in $dayjob, where I often have to do licence analysis/compliance for Java™ stuff, I have to look at every single file…

… not that others are better in any way. Look at the npm package for jQuery once. Then consider that jQuery bundles other libraries in its… compilate. Also look at how they ship an older jQuery binary as part of the testsuite runner. Continue screaming.

No, I know why I fully subscribe to the Debian schema of building everything themselves. I learnt in MirPorts, ages ago, that you *always* have to regenerate e.g. configure scripts, period, for example.

Fetch dependencies? Aaaaaah!

Posted Sep 13, 2021 5:07 UTC (Mon) by Cyberax (✭ supporter ✭, #52523) [Link] (9 responses)

Maven is the first widely used build system that downloads dependencies automatically (well, maybe apart from CPAN), so it's showing its age.

Most of the more modern systems don't have this problem. They provide complete bit-for-bit identical build environments via lockfiles and it's also common to only provide source code, not prebuilt binary artifacts.

In particular, jQuery NPM package doesn't require any third-party downloadable libraries: https://www.npmjs.com/package/jquery

Fetch dependencies? Aaaaaah!

Posted Sep 13, 2021 12:11 UTC (Mon) by mirabilos (subscriber, #84359) [Link] (8 responses)

Haha, jQuery.

It’s still there: https://github.com/jquery/jquery/blob/main/test/data/jque...

Also, includes sizzle.js, not stating the version, and not including that at compile time (or, better, at runtime!), no, it’s bundled, and its licence possibly not even honoured.

I also found “interesting” things in npm but not yet the 5 MiB coffee cup… I did report a few short of 100(!) licence violations to SwaggerUI though, which to fix required them to rethink their entire build system.

This MUST be thought of BEFOREHAND. npm makes it easy to do the wrong thing, even more so than Maven. Debian makes it easy to do the right thing.

And, wtf, do you really say that Arch Linux DOESN’T disable network access during package compilation‽

Fetch dependencies? Aaaaaah!

Posted Sep 13, 2021 17:36 UTC (Mon) by Cyberax (✭ supporter ✭, #52523) [Link]

> Also, includes sizzle.js, not stating the version, and not including that at compile time (or, better, at runtime!), no, it’s bundled, and its licence possibly not even honoured.

Sizzle.js is copyrighted by jQuery and started as a part of jQuery. Anything more serious?

Fetch dependencies? Aaaaaah!

Posted Sep 13, 2021 18:39 UTC (Mon) by karkhaz (subscriber, #99844) [Link] (6 responses)

> And, wtf, do you really say that Arch Linux DOESN’T disable network access during package compilation‽

That's correct, the Arch build system downloads the source as part of the build. There's no centralized storage of package sources. See for example the package for glibc [1], the PKGBUILD file includes a "sources" array with the glibc source tarball. The package build system first downloads that and verifies the checksums, then runs the prepare() function (to apply arch-specific patches), etc.

A few years ago I tried building every Arch Linux package from source, and didn't get terribly far with that, but reported this list [2] of packages for which the URL in the PKGBUILD was incorrect or broken (they've since all been fixed).

[1] https://github.com/archlinux/svntogit-packages/tree/packa...
[2] https://archlinux.org/todo/packages-with-missing-sources/

Fetch dependencies? Aaaaaah!

Posted Sep 13, 2021 18:56 UTC (Mon) by mirabilos (subscriber, #84359) [Link] (5 responses)

No, not that.

Downloading the source you actually want to build (if you verify it) is fine.

But during the actual build process, network access shouldn’t be available. I personally added this to pbuilder in Debian for Linux, using unshare(1). I don’t currently have a solution for nōn-Linux yet.
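
As a rough illustration of the idea (not the actual pbuilder change), a build driver can wrap the build step in unshare(1) with a fresh network namespace, so that only a down loopback interface exists while compiling. The sketch below only constructs the command; offline_build_command is a made-up name, and actually running it assumes a Linux host with util-linux installed and enough privilege to create a network namespace.

```rust
use std::process::Command;

/// Builds (but does not run) a command that would execute `build_cmd`
/// inside a new, empty network namespace via unshare(1).
/// Hypothetical helper: running the result needs Linux, util-linux and
/// sufficient privilege (or user namespaces) to create the namespace.
fn offline_build_command(build_cmd: &[&str]) -> Command {
    let mut cmd = Command::new("unshare");
    // --net: new network namespace; "--" ends unshare's own options.
    cmd.arg("--net").arg("--").args(build_cmd);
    cmd
}

fn main() {
    let cmd = offline_build_command(&["make", "-j4"]);
    // Only print the command line; call cmd.status() to actually run it.
    println!("{:?}", cmd);
}
```

Fetching and verifying the sources would happen before this step, outside the namespace.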

Fetch dependencies? Aaaaaah!

Posted Sep 13, 2021 19:14 UTC (Mon) by karkhaz (subscriber, #99844) [Link] (4 responses)

I see. Well, that exists too. I can't remember details but I'm almost certain that LibreOffice does some bizarre thing where it fetches its various language packs from the server during the build.

Indeed you can see in the build() function of the PKGBUILD [1] that the autogen.sh script takes a "--with-gdrive-client-id" flag, which implies that it's pulling something out of Google Drive, and a few lines down it writes a touchfile called "src.downloaded". This is obviously a script written by upstream. So no idea how Debian deals with this. I think I've seen other awful stuff like this, but LibreOffice's build stood out in my memory as being exceptionally painful.

[1] https://github.com/archlinux/svntogit-packages/blob/packa...

Fetch dependencies? Aaaaaah!

Posted Sep 13, 2021 20:47 UTC (Mon) by Wol (subscriber, #4433) [Link]

How long ago was that?

Bear in mind, that one of the major differences between OpenOffice and LibreOffice in the early years was LO paid off a hell of a lot of technical debt, including a *massive* rewrite of the build system.

Which is why LO can now release new versions every few months, and I can build it at regular intervals on gentoo, while OpenOffice last I heard took 14 months to release a bodged fix for a major security problem because nobody could get the build system to build ...

Cheers,
Wol

Fetch dependencies? Aaaaaah!

Posted Sep 13, 2021 22:41 UTC (Mon) by nix (subscriber, #2304) [Link] (2 responses)

The gdrive client ID thing is because LibreOffice itself can interact with Google Drive: this configure option lets you bake a client ID to do that access into the binary. It doesn't access Google Drive during the build. It's not *that* crazy. :)

It does download a fairly big pile of tarballs: you can do this before the final make, but after configure runs, via 'make fetch': the tarballs can be kept between runs. This just means that your build system needs to be able to split configure and make apart and run them separately, and run things that allow network access in between: it doesn't mean that you need to allow network access during the configure or make phases.

Fetch dependencies? Aaaaaah!

Posted Sep 13, 2021 22:56 UTC (Mon) by mirabilos (subscriber, #84359) [Link]

Almost.

Fetch them beforehand, by using package system means.

Backdoored configure scripts are a thing.

Fetch dependencies? Aaaaaah!

Posted Sep 14, 2021 11:07 UTC (Tue) by karkhaz (subscriber, #99844) [Link]

Thanks for these details. I did remember that LibreOffice failed to build when I blocked its network access but didn't have time to dig into it too deeply now, good to know that it doesn't download from GDrive for the build :)

However, I think the point still remains. Sure, there's a difference between the configure and build stage from an upstream perspective, but all of that is part of the "build" stage for a package manager. In particular, the Arch build system has a means to automatically check the hashes of all of the package sources that are declared in the "sources" array, before doing anything else. Obtaining source files later in the build process breaks reproducible builds and all the guarantees you get from that. Maybe the autogen script checks those hashes itself, and maybe it doesn't, but sidechaining the OS's own validation mechanisms doesn't really bode well.

Fetch dependencies? Aaaaaah!

Posted Sep 13, 2021 5:51 UTC (Mon) by LtWorf (subscriber, #124958) [Link]

Hehehe, npm packages.

Some weird stuff I found inside js packages:

* .c and .h files (why??)
* .py files
* A library made by a tiny js file, coming with HTML documentation, and a 5MiB example image of a coffee cup.
* windows exe files
* fonts
* configuration files for every possible IDE and editor

I too do not trust languages that autodownload stuff. Plus the build agents are harder to secure.

Fetch dependencies? Aaaaaah!

Posted Sep 13, 2021 10:32 UTC (Mon) by scientes (guest, #83068) [Link]

> I know why I fully subscribe to the Debian schema of building everything themselves.

I talked to some Arch developers about how with Debian you could download the entire source archive and build the whole thing from source offline, and they were like "that is a niche feature.".......

While some git integration would be nice, not leaving Debian anytime soon.

Cro: Maintain it With Zig

Posted Sep 15, 2021 0:12 UTC (Wed) by net_benji (subscriber, #75195) [Link]

Last month the CoRecursive podcast had an interview with the creator of Zig:
https://corecursive.com/067-zig-with-andrew-kelley/
I thought it was good and inspiring. He talks about the Zig language but also about how he quit his job to work on Zig full-time.

Cro: Maintain it With Zig

Posted Sep 16, 2021 19:45 UTC (Thu) by ejr (subscriber, #51652) [Link] (2 responses)

So how long until someone writes a preprocessor and calls it... Zag?

Cro: Maintain it With Zig - Make your time

Posted Sep 16, 2021 21:27 UTC (Thu) by amacater (subscriber, #790) [Link] (1 responses)

There can, of course, be only one response.

Move 'ZIG'. For great justice.

Cro: Maintain it With Zig - Make your time

Posted Sep 16, 2021 21:56 UTC (Thu) by ejr (subscriber, #51652) [Link]

And just to be clear, I don't think either of us want to take away from the work not only in defining a language but also in implementing it.

In some ways, the response of "But the name..." is perfect. I'm undecided on the language. I've seen too many variations on a theme. I've never bet on the right ones. So my taste doesn't matter.


Copyright © 2021, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds