Real Rust type system usefulness [LWN.net]

Real Rust type system usefulness

Posted Nov 6, 2024 12:20 UTC (Wed) by matthias (subscriber, #94967) [Link] (7 responses)

> It would be nicer if constraints encoding was first-class language construct and not a side-effect of a type system,

Using the type system has one big advantage. You can encode arbitrary constraints. The constraints are spelled out in the documentation and are enforced by the module that implements the type. The enforcing is done by ensuring that only valid values can be constructed.

If you add first-class features, then you have a syntax for describing constraints. But this syntax limits the nature of constraints that can be encoded. Such a syntax would still be nice for types of constraints that are commonly used. But the type system has to be there to allow to encode everything that is not covered by the syntax for constraint specification.

Rust already allows to specify constraints directly in the language. If I write (bool,bool), then rust enforces that the two bytes of memory may only contain values 0 and 1. Specifying tuples, structs, and enums naturally gives constraints on the represented values. It just happens that until now, nobody has added the feature of defining lock orders as a first-class feature of the language. One could add such a feature, but then the next person comes and wants to encode special properties for MMU handling in the type system ;) So there will always be constraints that one wants to express, but which are not first-class features. Which types of contraints should be first-class features is of course open for debate.

Real Rust type system usefulness

Posted Nov 6, 2024 12:36 UTC (Wed) by daroc (editor, #160859) [Link] (6 responses)

There are some systems that let you write arbitrary pre- and post- conditions for functions. They don't suffer from the problem you're pointing out, because you can write your own custom constraints.

However, the line between systems like this and type systems can actually be a bit blurry. Languages with full dependent types, such as Idris, incorporate those constraints directly into the type system. Other languages (Ada SPARK, I think?) keep them separate.

Ultimately, like many other problems in programming language design, how you integrate features for making programs more correct into the rest of the language is a complex tradeoff between consistency, explainability, and what people expect from the language. Whether integrating these things into Rust's type system is ultimately the best approach for the language or not, I cannot say.

Real Rust type system usefulness

Posted Nov 6, 2024 16:37 UTC (Wed) by tialaramex (subscriber, #21167) [Link]

There's clearly a spectrum between writing a function call which works max(4, 10) and nonsense like maxxx("Lots", 10, 1.6)

Almost any language will reject attempts to call a function which simply doesn't exist, many at compile time.

Fewer but still most will reject a call with the wrong number of parameters

Fewer again, but still a great many, have type constraints, "Lots" is a string, that's probably not OK

And then some will observe finer distinctions, "Lots" is not only a string, it's not even a string like "15" which can be coerced into a number, it's something else. This is rarely something available at compile time and when it is available it's usually magic, not general purpose.

Rust's compiler, faced with a call to std::str::from_utf8([/* Some values */]) for example, will cheerfully look at a constant array you're passing, decide whether it's not actually UTF-8 encoded text, and warn (which can be suppressed if appropriate) that as a result this call always errors. But that's magic, if we make our own type with different rules the same doesn't happen.

Shifting left is valuable. Finding out that C++ operator precedence means your logging doesn't have the desired effect... by reading the useless results in a customer bug report is bad. Finding out when the CI testing the potential new release builds fails six hours after you wrote the code is better but still inconvenient. Finding out five minutes after writing the code that now the unit test cases broke locally is enormously better but not perfect. Finding out because a red squiggly line appeared in the editor when you wrote the logging code is extremely good. Only "It just fucking works" is finally the ideal and unlike the other options that's a language design change and doesn't help with your next bug.

Real Rust type system usefulness

Posted Nov 6, 2024 22:36 UTC (Wed) by Cyberax (✭ supporter ✭, #52523) [Link] (4 responses)

> There are some systems that let you write arbitrary pre- and post- conditions for functions.

But they typically are enforced only at runtime, with limited compile-time support.

A good example is Eiffel language.

Real Rust type system usefulness

Posted Nov 6, 2024 22:51 UTC (Wed) by daroc (editor, #160859) [Link] (3 responses)

Certainly, there are systems that enforce pre- and post- conditions at run time. But they're not so overwhelmingly common that I would call them "typical". Idris [1], LiquidHaskell [2], and Ada SPARK [3] all check their conditions at compile time.

[1]: https://www.idris-lang.org/
[2]: https://ucsd-progsys.github.io/liquidhaskell/
[3]: https://www.adacore.com/about-spark

Real Rust type system usefulness

Posted Nov 6, 2024 23:06 UTC (Wed) by Cyberax (✭ supporter ✭, #52523) [Link] (2 responses)

I'm not familiar with Idris and Haskell, but SPARK can't represent this kind of invariants at compile time.

It has something similar ("Type Invariant") that can be checked at compile time, but it's limited with the type itself. It can't express an invariant "there exist a path between this type and the root of the lock hierarchy".

Real Rust type system usefulness

Posted Nov 6, 2024 23:19 UTC (Wed) by daroc (editor, #160859) [Link]

Ah, alright. I haven't used SPARK, so I must have misunderstood. I thought that SPARK could put function preconditions through a SAT solver at compile time.

I have used the other two, though, and can confirm that they operate entirely at compile time.

Real Rust type system usefulness

Posted Nov 11, 2024 16:30 UTC (Mon) by zmower (subscriber, #3005) [Link]

I don't see why you would need more than the type itself as Ada does not have multiple inheritance. There's always a path to the root!

Real Rust type system usefulness

Posted Nov 6, 2024 12:40 UTC (Wed) by LtWorf (subscriber, #124958) [Link]

Ada exists, but I think it's not very used these days.

Real Rust type system usefulness

Posted Nov 6, 2024 16:36 UTC (Wed) by NYKevin (subscriber, #129325) [Link] (4 responses)

There has been talk of first-class pattern types, which would be a significant step in this direction: https://github.com/rust-lang/keyword-generics-initiative/...

I have no idea if it's still being worked on or not, but there's no obvious indication that it has been abandoned.

Pattern Types.

Posted Nov 6, 2024 22:31 UTC (Wed) by tialaramex (subscriber, #21167) [Link]

Some day I want Pattern types so that I can make BalancedI8, BalancedI16 and so on which are types exactly like the signed integer types i8, i16 etc. but without their lop-sided most negative value.

Without this value, these types have a niche, meaning Option<BalancedI8> is a single byte, Option<BalancedI32> is only four bytes, and so on. Pattern Types unlock niches for ordinary stable Rust by third parties.

This delivers the same functionality as making these most negative values sentinels, but with the much improved ergonomics of Option. I think there are a lot of variables in software for which -10, +10 and crucially *zero* (which would be forbidden in the similarly sized but stdlib-blessed NonZeroI8 type) are reasonable values but -128 is not.

Today Rust's niches can be made at will by its own standard library (hence NonZeroI8 for example, but also OwnedFd and numerous others) but for us lay folk writing our own software this feature is permanently unstable compiler only magic (you can of course invoke this magic at your own risk, that's what my unstable "nook" crate does, but it's specifically marked as never-to-be-stabilised). The only lay person accessible new niches are from the sum types (enum). If I make an enum with say two hundred variants that contain no data, Rust will use two hundred of the 256 bit patterns for a byte to represent that enum and, it will mark all the remaining 56 patterns as a niche.

This is how the current era of CompactString works. The type has 24 bytes, the last byte is an enum, with 128 values representing ASCII characters, then 64 values representing non-ASCII last bytes of valid 24 byte UTF-8 strings, then lengths zero through to 23 inclusive signifying that the string didn't use this last byte because it was shorter, then one value signifying the strings was too big and lives on the heap instead and finally one more for static text. This leaves a large niche, so Option<CompactString> is a 24 byte data structure which can be None, or Some(string) where the string only needs a heap allocation for text longer than 24 UTF-8 code units. Fancy.

Real Rust type system usefulness

Posted Nov 8, 2024 0:42 UTC (Fri) by aszs (subscriber, #50252) [Link] (2 responses)

From lurking in the rust-lang Zulip it appears that it is progressing but it's a pretty invasive change to the type system so it's blocked on completion of the new Chalk solver and will thus take a while. To tie this into earlier comments, Rust patterns are essentially liquid types like liquid Haskell but with the patterns constrained to set operations to avoid having to require a SAT solver -- there's not much appetite to make Rust compile times NP Complete!

Real Rust type system usefulness

Posted Nov 8, 2024 16:53 UTC (Fri) by khim (subscriber, #9252) [Link] (1 responses)

> there's not much appetite to make Rust compile times NP Complete!

Um. Rust compile times are already NP-hard… would promotion to NP-competeness do that much difference?

Real Rust type system usefulness

Posted Nov 11, 2024 17:22 UTC (Mon) by NYKevin (subscriber, #129325) [Link]

If you want to start nitpicking, that blog post does not describe an instance of NP-hardness. It describes an instance of co-NP-hardness (i.e. you can only quickly verify that the compiler correctly checked for pattern refutability if the compilation fails and gives you an error identifying the refutation, so the problem being solved is the complement of SAT rather than SAT itself), which is not provably equivalent to NP-hardness (and is generally believed to be non-equivalent).

But anyway, I think the broader objection to this argument looks roughly like the following:

* It is not typical to match patterns of the size and complexity of the patterns shown in that blog post. They are a pathological edge case that nobody uses in practice.
* Refutable patterns (if let, match with a _ case, etc.) only need to be checked at runtime (and for structural correctness), which is also fairly straightforward unification logic.
* If you want to start allowing refutable pattern matching at compile time, things get significantly harder, because now every type has to carry around information about which patterns it is known to match, and you have to use that information to more precisely infer which other patterns are refutable, or else it's not doing much good (what is the point of knowing that my type matches 1..5 if I can't write separate match arms for 1, 2, 3..5 without a catchall _ arm?). That means you have to match patterns against each other, rather than merely matching objects or (simple/non-pattern) types against patterns. The unification logic gets a lot more complicated in that case, and it is far more plausible that it has an accidental complexity explosion in some edge case that users might really run into.