
A taste of Rust

April 17, 2013

This article was contributed by Neil Brown

Rust, the new programming language being developed by the Mozilla project, has a number of interesting features. One that stands out is the focus on safety. There are clear attempts to increase the range of errors that the compiler can detect and prevent, and thereby reduce the number of errors that end up in production code.

There are two general ways that a language design can help increase safety: it can make it easier to write good code and it can make it harder to write bad code. The former approach has seen much success over the years as structured programming, richer type systems, and improved encapsulation have made it easier for a programmer to express their goals using high-level concepts. The latter approach has not met with quite such wide success, largely because it requires removing functionality that easily leads to bad code. The problem there, of course, is that potentially dangerous functionality can also be extremely useful. The classic example is the goto statement, which has long been known to be an easy source of errors, but which most major procedural languages still include as they would quickly hit difficulties without it (though Rust, like Python, has avoided adding goto so far).

Nevertheless it is this second approach that Rust appears to focus on. A number of features that are present in languages like C, and some that are still present in more safety-conscious languages like Java, are disallowed or seriously restricted in Rust. To compensate, Rust endeavors to be a sufficiently rich language that those features are not necessary. One way to look at this focus is that it endeavors to produce compile-time errors where other languages would generate runtime errors (or core dumps). This not only reduces costs at runtime, but should decrease the number of support calls from customers.

A large part of this richness is embodied in the type system. Rust has many of the same sorts of types that other languages do (structures, arrays, pointers, etc.), but they are often, as we shall see, somewhat narrower or more restricted than is common. Coupled with this are various ways to broaden a type, in a controlled way, so that its shape fits more closely to what the programmer really wants to say. And for those cases where Rust turns out not to be sufficiently rich, there is a loophole.

Rigidly defined areas of doubt and uncertainty

In a move that strongly echoes the "System" module in Modula-2 (which provides access to so-called "low-level" functionality), Rust allows functions and code sections to be declared as "unsafe". Many of the restrictions that serve to make Rust a safer language do not apply inside "unsafe" code.
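
As a minimal sketch of the syntax (the raw-pointer type and this particular function are illustrative, not taken from the language documentation):

    fn read_raw(p: *int) -> int {
        unsafe {
            *p    // dereferencing a raw pointer is only allowed in unsafe code
        }
    }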

This could be viewed as an admission of failure to provide a truly safe language; however, that view would be overly simplistic. In the first place, Gödel's incompleteness theorem seems to suggest that any programming language which is rich enough to say all that we might want to say is also so rich that it cannot possibly be considered to be "safe". But on a more practical level, a moment's thought will show that code inside the language compiler itself is deeply "unsafe" in the context of the program that it generates. A bug there could easily cause incorrect behavior without any hope of a compiler being able to detect it. So complete safety is impossible.

Given that the reality of "unsafe" code is unavoidable, it is not unreasonable to allow some of it to appear in the target program, or in libraries written in the target language, instead of all of it being in the compiler. This choice to have the language be safe by default allows most code to be safe while providing clearly defined areas where unchecked code is allowed. This, at the very least, makes it easier to identify which areas of code need particular attention in code reviews.

If we look at the Rust compiler, which itself is written in Rust, we find that of the 111 source code files (i.e. the src/librustc directory tree, which excludes the library and runtime support), 33 of them contain the word "unsafe". 30% of the code being potentially unsafe may seem like a lot, even for a compiler, but 70% being guaranteed not to contain certain classes of errors certainly sounds like a good thing.

Safer defaults

As we turn to look at the particular ways in which Rust encourages safe code, the first is not so much a restriction as a smaller scale version of the choice to default to increased safety.

While many languages have keywords like "const" (C, Go) or "final" (Java) to indicate that a name, once bound, will always refer to the same value, Rust takes the reverse approach. Bindings are immutable by default and require the keyword "mut" to make them mutable, thus:

    let pi = 3.1415926535;
will never allow the value of pi to change, while:
    let mut radius = 1.496e11;
will allow radius to be altered later. The choice of mut, which encourages the vowel to sound short, in contrast to the long vowel in "mutable", may be unfortunate, but encouraging the programmer to think before allowing something to change is probably very wise.

For some data structures, the "mutable" setting is inherited through references and structures to member elements, so that if an immutable reference is held to certain objects, the compiler will ensure that the object as a whole will remain unchanged.
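
A small sketch of that inheritance, assuming a hypothetical Counter structure:

    struct Counter { count: int }

    let c = Counter { count: 0 };
    // c.count = 1;    // compile-time error: c is an immutable binding

    let mut m = Counter { count: 0 };
    m.count = 1;       // allowed: the field inherits mutability from "mut"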

Pointers are never NULL

Pointer errors are a common class of errors in code written in C and similar languages, and many successors have taken steps to control these by limiting or forbidding pointer arithmetic. Rust goes a step further and forbids NULL (or nil) pointers as well, so that any attempt to dereference a pointer will result in finding a valid object. The only time that a variable or field does not contain a valid pointer is when the compiler knows that it does not, so any attempted dereference will result in a compile-time error.

There are of course many cases where "valid pointer or NULL" is a very useful type, such as in a linked list or binary tree data structure. For these cases, the Rust core library provides the "Option" parameterized type.

If T is any type, then Option<T> is a variant record (sometimes called a "tagged union" but called an "enum" in Rust), which contains a value of type T associated with the tag "Some", or it contains just the tag "None". Rust provides a "match" statement, which is a generalization of "case" or "typecase" from other languages, that can be used to test the tag and access the contained value if it is present.

Option is defined as:

    pub enum Option<T> {
        None,
        Some(T),
    }
so:
    struct element {
        value: int,
        next: Option<~element>,
    }
could be used to make a linked list of integers. The tilde in front of element indicates a pointer to the structure, similar to an asterisk in C. There are different types of pointers which will be discussed later — tilde (~) introduces an "owned" pointer while the at sign (@) introduces a "managed" pointer.
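
For illustration, both kinds can be created directly (a sketch in the Rust 0.6 syntax used throughout this article):

    let owned: ~int = ~10;      // owned: freed when its single reference is dropped
    let managed: @int = @10;    // managed: garbage collected within the task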

Thus nullable pointers are quite possible in Rust, but the default is the safer option of a pointer that can never be NULL; when a nullable pointer is used, it must always be explicitly tested for NULL before use. For example:

    match next {
        Some(e) => io::println(fmt!("%d", e.value)),
        None    => io::println("No value here"),
    }
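
Using the element declaration above, a two-element list might be built like this (an illustrative sketch, with the list ending in None):

    let list = element {
        value: 1,
        next: Some(~element { value: 2, next: None }),
    };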

In contrast to this, array (or, as Rust calls them, "vector") references do not require the compiler to know that the index is in range before the element is accessed. It would presumably be possible to use the type system to ensure that indexes are in range in many cases, and to require explicit tests for cases where the type system cannot be sure — similar to the approach taken for pointers.

Thus, while it is impossible to get a runtime error for a NULL pointer dereference, it is quite possible to get a runtime error for an array bounds error. It is not clear whether this is a deliberate omission, or whether it is something that might be changed later.
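
For example, this sketch (hypothetical code, relying on the runtime bounds checking described above) compiles but fails when run:

    let v = ~[1, 2, 3];
    let i = 5u;
    io::println(fmt!("%d", v[i]));    // task failure at runtime: index out of bounds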

Parameterized Generics

Unsurprisingly, Rust does not permit "void" pointers or downcasts (casts from a more general type, like void, to a more specific type). The use case that most commonly justifies these — the writing of generic data structures — is supported using parameterized types and parameterized functions. These are quite similar to generic classes in Java and templates in C++. Generics in Rust have a simple syntax: any non-scalar type, and any function or method, can have type parameters.

A toy example might be:

    struct Pair<t1,t2> {
        first: t1,
        second: t2,
    }

    fn first<t1:Copy,t2:Copy>(p: Pair<t1,t2>) -> t1 {
        return copy p.first;
    }
We can then declare a variable:
    let foo = Pair { first:1, second:'2'};
noting that the type of foo is not given explicitly, but instead is inferred from the initial value. The declaration could read:
    let foo:Pair<int, char> = Pair { first:1, second:'2'};
but that would be needlessly verbose. We can then call
    first(foo);
and the compiler will know that this returns an int. In fact it will return a copy of an int, not a reference to one.
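
The same generic function works for any instantiation; continuing the example (an illustrative sketch):

    let bar = Pair { first: 3.14, second: 2 };
    let f = first(bar);    // the compiler infers that f is a float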

Rust allows all type declarations (struct, array, enum), functions, and traits (somewhat like "interfaces" or "virtual classes") to be parameterized by types. This should mean that the compiler always knows enough about the type of every value to avoid the need for downcasts.

Keep one's memories in order

The final safety measure from Rust that we will examine relates to memory allocation and particularly how that interacts with concurrency.

Like many modern languages, Rust does not allow the programmer to explicitly release (or "free" or "dispose" of) allocated memory. Rather, the language and runtime support take care of that. Two mechanisms are provided which complement the inevitable "stack" allocation of local variables. Both of these mechanisms explicitly relate to the multiprocessing model that is based on "tasks", which are somewhat like "goroutines" in Go, or Green threads. They are lightweight and can be cooperatively scheduled within one operating system thread, or across a small pool of such threads.

The first mechanism provides "managed" allocations which are subject to garbage collection. These will be destroyed (if a destructor is defined) and freed sometime after the last active reference is dropped. Managed allocations are allocated on a per-task heap and they can only ever be accessed by code running in that task. When a task exits, all allocations in that task's heap are guaranteed to have been freed. The current implementation uses reference counting to manage these allocations, though there appears to be an intention to change to a mark/sweep scheme later.

The second mechanism involves using the type system to ensure that certain values only ever have one (strong) reference. When that single reference is dropped, the object is freed. This is somewhat reminiscent of the "Once upon a type" optimization described by David Turner et al. for functional languages. However, that system automatically inferred which values only ever had a single reference, whereas Rust requires that this "once" or "owned" (as Rust calls it) annotation be explicit.

These single-reference allocations are made on a common heap and can be passed between tasks. As only one reference is allowed, it is not possible for two different tasks to both access one of these owned objects, so concurrent access, and the races it so often involves, are impossible.
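
A short sketch of what single ownership means in practice, assuming Rust 0.6's move-on-assignment semantics for owned values:

    let a = ~[1, 2, 3];    // an owned vector: exactly one strong reference
    let b = a;             // ownership moves from a to b
    // io::println(fmt!("%?", a));    // compile-time error: a has been moved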

May I borrow your reference?

Owned values can have extra references provided they are "borrowed" references. Borrowed references are quite weak and can only be held in the context of some specific strong reference. As soon as the primary reference is dropped or goes out of scope, any references borrowed from it become invalid, and it is a compile-time error if they are used at all.

It is often convenient to pass borrowed references to functions and, more significantly, to return these references from functions. This is particularly useful as each of owned, managed, and on-stack references can equally be borrowed, so passing a borrowed reference can work independently of the style of allocation.

The passing of borrowed references is easily managed by annotating the type of the function parameter to say that the reference is a borrowed reference. The compiler will then ensure nothing is done with it that is not safe.
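
A sketch adapted from the borrowed pointers tutorial referenced below; one function accepts borrowed references to stack, managed, and owned values alike:

    struct Point { x: float, y: float }

    fn compute_distance(p1: &Point, p2: &Point) -> float {
        let dx = p1.x - p2.x;
        let dy = p1.y - p2.y;
        float::sqrt(dx * dx + dy * dy)
    }

    let on_stack = Point { x: 3.0, y: 4.0 };
    let managed = @Point { x: 5.0, y: 1.0 };
    let owned = ~Point { x: 7.0, y: 9.0 };

    let d1 = compute_distance(&on_stack, managed);    // @ is borrowed automatically
    let d2 = compute_distance(managed, owned);        // ~ likewise
    io::println(fmt!("%f %f", d1, d2));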

Returning borrowed references is a little trickier, as the compiler needs to know not only that the returned reference is borrowed, but also something about the lifetime of the reference. Otherwise it cannot ensure that the returned reference isn't used after the primary reference has been dropped.

This lifetime information is passed around using the same mechanism as type-parameters for functions. A function can be declared with one or more lifetime parameters (introduced by a single quote: '), and the various arguments and return values can then be tagged with these lifetimes. Rust uses type inference on the arguments to determine the actual values of these parameters and thus the lifetime of the returned value. For example, the function choose_one declared as:

    fn choose_one<'r>(first: &'r Foo, second: &'r Foo) -> &'r Foo
has one lifetime parameter (r) which applies to both arguments (first and second), and to the result. The ampersand (&) here indicates a borrowed reference.
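
For illustration, a possible body for such a function (the Foo type and its weight field are hypothetical):

    struct Foo { weight: int }

    fn choose_one<'r>(first: &'r Foo, second: &'r Foo) -> &'r Foo {
        if first.weight >= second.weight { first } else { second }
    }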

When it is called, the lifetime parameter is implicit, not explicit:

    let baz = choose_one(bar, bat);
The compiler knows that the lifetime 'r must be compatible with the lifetimes of bar and bat, so it effectively computes the intersection of those. The result is associated with baz as its lifetime; any attempt to use baz beyond that lifetime will cause a compile-time error.

Put another way, Rust uses the lifetime parameter information to infer the lifetime of the function's result, based on the function's arguments. This is similar to the inference of the type of foo in our Pair example earlier. A more thorough explanation of these lifetimes can be found in the "Borrowed Pointers Tutorial" referenced below.

This extra annotation may feel a little clumsy, but it seems a relatively small price to pay for the benefit of being able to give the compiler enough information that it can verify all references are valid.

Time for that loophole

Allowing objects to be allocated on the stack, in a per-task heap, or in a common heap provided there is only ever one (strong) reference ensures that there will be no errors due to races with multiple tasks accessing the same data. But the price seems a little high: it appears to mean that there can be no sharing of data between tasks at all.

The key to solving this conundrum is the "unsafe" code regions that were mentioned earlier. Unsafe code can allocate "raw" memory which has no particular affiliation or management, and can cast "raw" pointers into any other sort of pointer.

This allows, for example, the creation of a "queue" from which two owned references could be created: one of a type that allows references to be added to the queue, and one which allows references to be removed. The implementation of this queue would require a modest amount of "unsafe" code, but any client of the queue could be entirely safe. The end points might be passed around from task to task, stored in some data structure, and used to move other owned references between tasks. When the last reference is dropped, the "queue" implementation will be told and can respond appropriately.

The standard library for Rust implements just such a data structure, which it calls a "pipe". Other more complex data structures could clearly be implemented to allow greater levels of sharing, while making sure the interface is composed only of owned and managed references, and thus is safe from unplanned concurrent access and from dangling pointer errors.
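
A sketch of what using such a stream looks like, following the 0.6-era tutorials (the names comm::stream, send, and recv are from the then-current library and may change):

    let (port, chan) = comm::stream();
    do spawn {
        chan.send(~[1, 2, 3]);    // the owned vector moves into the new task
    }
    let v = port.recv();          // ...and arrives here, still singly owned
    io::println(fmt!("%?", v));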

Is safety worth the cost?

Without writing a large body of code in Rust it is impossible to get a good feel for how effective the various safety features of the language really are. Having a general ideological preference for strong typing, I expect I would enjoy the extra expressiveness and would find the extra syntax it requires to be a small price to pay. The greatest irritation would probably come from finding the need to write too much (or even any) unsafe code, or from having to give up all safety when wanting to sacrifice any.

Of course, if that led to practical ideas for making the language able to make richer assertions about various data, or richer assertions about the level of safety, thus enlarging the scope for safe code and decreasing the need for unsafe code, then the irritation might well result in some useful pearls.

After all, Rust is only at release "0.6" and is not considered to be "finished" yet.

References

Rust tutorial
Borrowed pointers tutorial
Rust Reference Manual
Blog post about Rust memory management
Another blog post about Rust


A taste of Rust

Posted Apr 17, 2013 22:46 UTC (Wed) by mikefletcher (subscriber, #60058) [Link]

I poked around and the references to unsafe mostly seemed to be calls into LLVM. There were also a few false positives related to implementing a compiler that has a keyword called "unsafe".

looks like Haskell

Posted Apr 17, 2013 23:07 UTC (Wed) by JoeBuck (subscriber, #2330) [Link] (19 responses)

Or rather, an attempt to take ideas from Haskell without requiring the programmer to do everything in a functional way.

looks like Haskell

Posted Apr 17, 2013 23:47 UTC (Wed) by ncm (guest, #165) [Link] (4 responses)

It takes ideas from many languages. A type-safe Erlang, a scrubbed-up C++, a not-stupidly-weak Go, a fast Haskell, a fast ML, a history-sanitized Lisp; it's a real hodgepodge, and that's no criticism. Equally important are those it doesn't take anything from. One of very few languages with a real destructor and a pledge not to rely on GC in the standard library, it could turn out to be useful, someday, for writing programs that heretofore could only have been in C++.

looks like Haskell

Posted Apr 18, 2013 5:52 UTC (Thu) by nevyn (guest, #33129) [Link]

> One of very few languages with a real destructor and a pledge not to rely on GC in the standard library

So the "managed" allocated data using reference counts and thus. having RAII like properties is guaranteed to continue to exist?

looks like Haskell

Posted Apr 19, 2013 8:21 UTC (Fri) by k8to (guest, #15413) [Link] (1 responses)

Nitpick: ML is already fast, if you use the ocaml implementation. It's already competitive with C++ or C, and has been for like 8 years.

looks like Haskell

Posted Apr 24, 2013 20:36 UTC (Wed) by Tuna-Fish (guest, #61751) [Link]

There are many different kinds of speed. OCaml is very fast in *throughput*, but the differences in memory management strategies guarantee that Rust should always be able to promise lower worst-case latency. In fact, Rust seems to sacrifice quite a bit of throughput to provide those better latency guarantees (among other things, the language promotes copying in many places where OCaml would be fine sharing), so I would expect OCaml to win most throughput benchmarks. However, if what you want is to guarantee that no task can take longer than 8ms to complete (e.g. you want smooth 120Hz frame rates in a game), Rust should be better.

It's a tradeoff. Fundamentally, if a language or system wants to be really good on one metric, it has to sacrifice some of the other. OCaml and Rust choose different points, and there is nothing wrong with the choice of either. However, this does mean that there are some tasks where Rust is better suited.

looks like Haskell

Posted Apr 22, 2013 1:33 UTC (Mon) by zooko (guest, #2589) [Link]

I think that's why it is called "Rust"! Because it was originally envisioned to be made up entirely of hoary old ideas that had already been proven out in other languages.

looks like Haskell

Posted Apr 17, 2013 23:55 UTC (Wed) by Cyberax (✭ supporter ✭, #52523) [Link] (1 responses)

Not really Haskell. It's more like Erlang - Rust emphasizes thread-local data with explicit object passing between threads.

Looks like something good

Posted Apr 18, 2013 3:47 UTC (Thu) by ofranja (guest, #11084) [Link]

It looks like OCaml, enhanced with Erlang concurrency model, and Cyclone type tagging for different memory management models.

Nice to hear about the language, it seems that they did a good research for gluing good things together, instead of implementing just more of the same old stuff. I'll keep an eye on it.

looks like Haskell

Posted Apr 17, 2013 23:57 UTC (Wed) by mathstuf (subscriber, #69389) [Link] (11 responses)

Yeah, I'd like Haskell to become a popular language rather than Rust if only because Haskell already has a large pool of libraries and a community. However, if people are that averse to Haskell's more mathy take on things (such as using terms from category theory directly instead of naming them with more "friendly" terms) and to handling laziness, I think Rust might be good enough.

looks like Haskell

Posted Apr 18, 2013 0:56 UTC (Thu) by droundy (subscriber, #4559) [Link] (10 responses)

The problem with Haskell that strongly distinguishes it from rust is that Haskell is designed in a way that makes it very hard to write fast code, and very hard to write a compiler that generates fast code. Laziness is a cool idea, but has huge implications for efficiency. Rust looks like it should be possible to have a compiler giving C-like efficiency (with a caveat for array bounds checking).

I love Haskell, but so far as I can tell, it's not playing the same game that rust is, which is creating a fast, safe, low-overhead language.

looks like Haskell

Posted Apr 18, 2013 1:58 UTC (Thu) by mathstuf (subscriber, #69389) [Link] (2 responses)

I think one thing that I'd likely miss with Rust is Haskell's 'do' notation. Having a sequence of operations fail if any of them fail is nice to have. The biggest problem with the laziness that I've found is that I/O is lazy. I've been bitten a number of times forgetting a strictness marker on results of file reading and such.

That said, there is a lot of research going into making GHC generate good and fast code. There are certainly some very nice examples of GHC producing very fast code from fairly idiomatic Haskell. However, once you get past that cliff, things tend to get pretty hairy quickly.

looks like Haskell

Posted Apr 18, 2013 10:15 UTC (Thu) by Auders (guest, #53318) [Link] (1 responses)

Lazy I/O is on the way out, with iteratee I/O (pipes, conduit...) replacing it.

looks like Haskell

Posted Apr 19, 2013 17:36 UTC (Fri) by b7j0c (guest, #27559) [Link]

right, but like String -> Text, laziness is another major legacy problem for haskell. how much of hackage has been built with the older thinking? how much of hackage is now merely replicated attempts to get mindshare for the latest Iteratee variant?

you'll never purge Strings or laziness (or bang annotations etc) from hackage, most of those packages aren't that well maintained. rust is the reboot haskell needs.

looks like Haskell

Posted Apr 18, 2013 2:46 UTC (Thu) by rsidd (subscriber, #2582) [Link] (6 responses)

There are also important algorithms (eg dynamic programming, any kind of linear algebra) that are ill-suited to a purely functional language like Haskell. It can be done but hoops need to be jumped through and the results are (to me) unreadable. Sometimes it is clearer and more efficient to have a mutable array or matrix where operations are done in-place. And recursion instead of looping is a nice idea until you try to make it tail-recursive for efficiency and find it is ugly or just impossible. I prefer the approach of OCaml and (apparently) Rust where one is allowed to declare variables mutable, write for-loops, etc.

looks like Haskell

Posted Apr 19, 2013 9:24 UTC (Fri) by peter-b (subscriber, #66996) [Link] (4 responses)

Given that every loop has an isomorphic tail recursive form, I suspect that you are running up against "ugly" much more often than "impossible"!

looks like Haskell

Posted Apr 19, 2013 9:39 UTC (Fri) by rsidd (subscriber, #2582) [Link] (3 responses)

Read "impossible" as "impossible for me to get my mind around" :) I'm thinking of nested loops in particular. Also things like the Fibonacci function where a naive implementation

fib 0 = 0
fib 1 = 1
fib n = fib (n-1) + fib (n-2)

is not just non-tail-recursive but incredibly inefficient. And every "functional" efficient version that I've seen is just too hard to understand (for me) by reading the code. And that's for a trivial beginner-level example.

looks like Haskell

Posted Apr 19, 2013 9:54 UTC (Fri) by neilbrown (subscriber, #359) [Link] (2 responses)

How about something like:

fib 0 = (1,1)
fib n = (a+b, a) where (a,b) = fib (n-1)

That's reasonably efficient - not sure how easy it is to make it tail recursive.

looks like Haskell

Posted Apr 19, 2013 10:19 UTC (Fri) by rsidd (subscriber, #2582) [Link]

Yes, thanks, that's efficient though I'm not sure how easy it is to make it tail-recursive either. When it's more complicated stuff like matrix multiplication or Needleman-Wunsch, it's just too hard for me -- and, I suspect, for most "normal" people. Haskell has clearly proved that it is very useful for some people and some tasks, but I suspect that's the maximum for pure functional programming.

looks like Haskell

Posted Apr 21, 2013 15:12 UTC (Sun) by ballombe (subscriber, #9523) [Link]

Converting your code to tail-recursion give:

fib n = fibr n 1 1 where
    fibr 0 a b = (a,b)
    fibr n a b = fibr (n-1) (a+b) a

(though there are much faster algorithms to compute this)

looks like Haskell

Posted Apr 19, 2013 17:38 UTC (Fri) by b7j0c (guest, #27559) [Link]

graphs have also been a problem for pure functional languages

Ada should be mentioned when speaking of programming languages on safety

Posted Apr 18, 2013 0:13 UTC (Thu) by lonely_bear (subscriber, #2726) [Link] (24 responses)

Speaking of programming languages focusing on safety, I think Ada may fit the bill better. More information about Ada can be found at http://www.adaic.org/.

Ada should be mentioned when speaking of programming languages on safety

Posted Apr 18, 2013 0:23 UTC (Thu) by Cyberax (✭ supporter ✭, #52523) [Link] (15 responses)

Please, stop peddling Ada. It's NOT any safer than other languages with boundary-checked arrays in the RealWorld(tm).

It has no protection on the type system level against null pointers, no dependent types (alas, Rust dropped them as well), no modern concurrency support (compartmentalization of data), memory handling support is not that good, etc.

Ada should be mentioned when speaking of programming languages on safety

Posted Apr 18, 2013 1:00 UTC (Thu) by gmaxwell (guest, #30048) [Link]

I was a bit disappointed that rust removed the typestate stuff... There is a _lot_ about program correctness beyond memory (/race) safety, and a lot of it can be captured with a sufficiently powerful type system— but the existing systems that do this are not very mortal-compatible and Rust's typestate looked pretty manageable.

Though I've been told that it's possible to accomplish many of the useful checks in rust without typestate, and I suppose it might be strengthened again in later versions once the behavior and appearance of real rust programs and their bugs is better understood.

Ada should be mentioned when speaking of programming languages on safety

Posted Apr 18, 2013 4:59 UTC (Thu) by lonely_bear (subscriber, #2726) [Link] (3 responses)

Would you elaborate more on "no modern concurrency support (compartmentalization of data)"? Are you referring to something like http://en.wikibooks.org/wiki/Ada_Programming/Tasking#Prot...

I don't think boundary-checked arrays alone could be considered the whole of Ada's safe-programming story.

Ada should be mentioned when speaking of programming languages on safety

Posted Apr 18, 2013 5:13 UTC (Thu) by Cyberax (✭ supporter ✭, #52523) [Link] (2 responses)

No, Ada concurrency is just a thin wrapper over classic threading with some nice synchronization primitives.

Ada should be mentioned when speaking of programming languages on safety

Posted Apr 18, 2013 5:31 UTC (Thu) by lonely_bear (subscriber, #2726) [Link] (1 responses)

Enlighten me: what are you referring to by "modern concurrency support (compartmentalization of data)"?

Ada should be mentioned when speaking of programming languages on safety

Posted Apr 18, 2013 5:36 UTC (Thu) by Cyberax (✭ supporter ✭, #52523) [Link]

Guarantees on the level of the type system that the data you are working on can't be changed by another thread.

Erlang guarantees this by making deep local copies of all the objects that are passed between threads, for example.

Ada should be mentioned when speaking of programming languages on safety

Posted Apr 18, 2013 21:19 UTC (Thu) by andreasb (guest, #80258) [Link] (9 responses)

> It has no protection on the type system level against null pointers

What does that mean? Does declaring an access type "not null" not suffice?

Ada should be mentioned when speaking of programming languages on safety

Posted Apr 18, 2013 22:12 UTC (Thu) by Cyberax (✭ supporter ✭, #52523) [Link] (8 responses)

Making sure you can't pass a pointer that _can_ be null.

Ada should be mentioned when speaking of programming languages on safety

Posted Apr 18, 2013 23:20 UTC (Thu) by andreasb (guest, #80258) [Link] (6 responses)

The only way to pass a pointer that _can_ be null to a formal parameter of an access type with null exclusion is by passing a literal null. In that case an exception would be raised at the call site which would cause the subprogram call to not actually happen.

So I don't see the problem. Is there a code example demonstrating what you mean?

Ada should be mentioned when speaking of programming languages on safety

Posted Apr 19, 2013 8:25 UTC (Fri) by k8to (guest, #15413) [Link] (5 responses)

Who catches this? the runtime or the compiler?

If the compiler, then how would you deal with it at runtime when it happens anyway? If the runtime, then what does the checker do when it happens?

The Rust choice of forcing you to write code that checks seems quite sensible.

Severe underestimation of dev laziness

Posted Apr 19, 2013 13:34 UTC (Fri) by man_ls (guest, #15091) [Link] (3 responses)

Yes, Java also forces you to check for (checked) exceptions. We all know how well it has resulted: if I got a penny every time I have seen this in production code:
try {
  ...
} catch (Exception e) {}
I might have bought myself a few pints.

Severe underestimation of dev laziness

Posted Apr 19, 2013 19:43 UTC (Fri) by k8to (guest, #15413) [Link] (2 responses)

It's a good point.

But the Rust example makes it seem that you can't say "i don't care if it's null", but rather have to write null-case code independent from access-case code. The null-case code could be degenerate, but the access-case code still can't access the null.

Severe underestimation of dev laziness

Posted Apr 20, 2013 8:37 UTC (Sat) by man_ls (guest, #15091) [Link] (1 responses)

That is what you might think about checked exceptions: at least the regular code does not have to deal with error conditions. It is however a bad policy: the empty catch() block can mask important error conditions and continue silently as if the process was successful, instead of failing.

With Java, null pointer exceptions are not checked (i.e. they don't force you to check for them with a try...catch), but at least that prevents the regular code from having to deal with the null pointer.

My preferred practice now is to leave exceptions as they are and catch them all only at the root level, logging them carefully. This is why I abandoned checked exceptions except in a very few cases. Catching all exceptions (including unchecked) and logging them also deals nicely with null pointers, without having to write extra code at every point — which is what Rust forces you to do.

Severe underestimation of dev laziness

Posted Apr 23, 2013 19:53 UTC (Tue) by mathstuf (subscriber, #69389) [Link]

> without having to write extra code at every point — which is what Rust forces you to do.

Rust allows you to say "this value cannot be null" with fewer words than saying "might be null" would take. Hopefully developers would prefer the former to the latter (I certainly do), which means that the checks don't even have to be written. If Rust had do-syntax from Haskell, checking results in a chain of nullable calls wouldn't even be necessary. I assume some common library will probably implement ">>=" and ">>" in Rust, which I'd use in a heartbeat rather than check for None manually everywhere.

Ada should be mentioned when speaking of programming languages on safety

Posted Apr 19, 2013 14:04 UTC (Fri) by andreasb (guest, #80258) [Link]

> Who catches this? the runtime or the compiler?

Out of range values are checked at runtime per the language specification and will raise a Constraint_Error exception when the check fails.

That said, a friendly compiler such as GNAT will print a warning when it can determine that something will unconditionally raise an exception at runtime.

> The Rust choice of forcing you to write code that checks seems quite sensible.

Ada's access types with null exclusion work exactly like that, as far as I can ascertain, only they just create warnings at compile time. Plus you get to have access types with valid null values if you so choose.

Attempting to dereference a null access will net you a Constraint_Error exception anyway, so a null exclusion just helps with pushing that exception back to the actual cause.

Ada should be mentioned when speaking of programming languages on safety

Posted Apr 19, 2013 15:01 UTC (Fri) by HiggsMechanism (guest, #45665) [Link]

procedure P (X : not null SomeAccessType);

Not just Ada...

Posted Apr 25, 2013 12:21 UTC (Thu) by phred14 (guest, #60633) [Link] (7 responses)

There was a strong bout of "safe programming languages" 30 or 40 years ago, and it seems that Ada is the only barely viable survivor, with FreePascal also maintaining a minor presence. Now it looks like there are emerging efforts to get to a "safe programming language" again.

Which begs the question... Would it be easier to "make C safe" or to "make Modula-2 or Modula-3 usable"?

I focus on Modula-2/Modula-3 because others have mentioned that Ada has certain deficiencies which ought to be fixed. But Ada is rather rigidly standardized, and though there is a process to eventually change it, that takes time and makes experimentation difficult. Modula-2/Modula3 could make a better test-bed or starting point, recognizing that those would need work, as well. They're also less complex to begin with.

Even the usability argument against the old "safe languages" may have some issues, because while as defined they may not have been usable, most vendors back in the day had significant extensions. Back in the 1980s I used JPI Modula-2 to write a TSR (MSDOS Terminate-and-Stay-Resident, for those who are not fossils like me) that implemented a reentrant interrupt handler for low-level communications. I also interfaced to a C module for (at the time) high-performance graphics, and both read and wrote legacy binary data files. It could be done, and it wasn't that hard.

Not just Ada...

Posted Apr 25, 2013 14:55 UTC (Thu) by viro (subscriber, #7872) [Link] (1 responses)

And how safe does it remain after you add those extensions?

Not just Ada...

Posted Apr 25, 2013 18:44 UTC (Thu) by phred14 (guest, #60633) [Link]

At least part of the idea has always been to mark those extensions as unsafe. You can't always be safe, but you can at least identify and minimize the times when you're not. You have a better idea of where the dragons are likely hiding.

Admittedly once you're messing with the stack frame, which I was with the interrupt handler, you've kind of unlocked the doors for the dragons, too. But I only played with the stack frame in that one example, knew it was a very risky thing, and tried to use extra care.

Not just Ada...

Posted Apr 25, 2013 17:13 UTC (Thu) by Cyberax (✭ supporter ✭, #52523) [Link] (4 responses)

The elephant in the room here is garbage collection. It's impossible to make a real and usable safe language without a garbage collector of some sort. You might try to get away with reference counting and region inference (like in Ceylon) but it soon becomes too problematic.

Not just Ada...

Posted Apr 25, 2013 18:31 UTC (Thu) by rahulsundaram (subscriber, #21946) [Link]

Did you mean Ceylon or Cyclone?

Not just Ada...

Posted Apr 26, 2013 14:55 UTC (Fri) by ibukanov (subscriber, #3942) [Link] (2 responses)

Rust's model is to use reference-counted objects for a thread-shared heap and separate GC heaps for each thread. As long as one keeps the number of thread-local objects small, the GC pauses should be on the scale of microseconds, not milliseconds, as with typical global GC solutions.

Not just Ada...

Posted Apr 26, 2013 22:20 UTC (Fri) by neilbrown (subscriber, #359) [Link] (1 responses)

This is not quite how I understand the documentation.

The thread-local heap is managed, though they currently use reference counting rather than mark/sweep garbage collection. That might change.

The thread-shared heap is not reference counted. The compiler tracks references, of which there may only be one primary reference and possibly several borrowed references that can only exist within the scope of the primary reference. When the primary reference is dropped, the object is freed. No counting.

Not just Ada...

Posted Apr 26, 2013 23:18 UTC (Fri) by ibukanov (subscriber, #3942) [Link]

> The thread-shared heap is not reference counted.

You are right, the ownership is fully tracked with the compiler.

> The thread-local heap is managed, though they currently use reference counting rather than mark/sweep garbage collection.

It is reference counting plus a cycle collector. For practical purposes it is just a form of GC with different trade-offs compared with mark-and-sweep. In Rust the reference counting allowed for a simpler compiler implementation, as one does not need to track the native register and stack roots. Plus, for non-cyclic cases the garbage is collected faster. But when the cycle collector runs, it runs much slower than a mark-and-sweep collector due to a significantly more complex algorithm.

A taste of Rust

Posted Apr 18, 2013 0:56 UTC (Thu) by dps (guest, #5725) [Link] (3 responses)

While Rust seems interesting, how does anybody not blessed with a Rust compiler actually build one?

In C, I often implement the policy that pointers have an owner who must either free them or pass them on to somebody else, which avoids memory leaks without the costs of things like reference counts or garbage collection. A language that gives me that for free sounds attractive.

I definitely don't want to exclude reference counts or garbage collection from the toolbox for situations where objects having a single owner is impractical. Some sorts of graph manipulation look like prime candidates for the justified use of reference counting.

A taste of Rust

Posted Apr 18, 2013 1:23 UTC (Thu) by Cyberax (✭ supporter ✭, #52523) [Link]

You can have refcounted objects in Rust, especially for wrapped external resources.

A taste of Rust

Posted Apr 18, 2013 3:09 UTC (Thu) by neilbrown (subscriber, #359) [Link] (1 responses)

> While Rust seems interesting, how does anybody not blessed with a Rust compiler actually build one?

Much the same way that someone who doesn't have a C compiler can still compile gcc.

The tutorial linked at the end of the article contains instructions that worked for me.

A taste of Rust

Posted Apr 18, 2013 20:43 UTC (Thu) by tjc (guest, #137) [Link]

> The tutorial linked at the end of the article contains instructions that worked for me.

Worked for me too, eventually. I was getting

error: can't find crate for 'core'

when I tried to compile anything, which was caused by having my umask set to 077 during the build/install process. I changed the permissions on /usr/local/lib/rustc and a few other things to get it working.

A taste of Rust

Posted Apr 18, 2013 4:15 UTC (Thu) by xyzzy (guest, #1984) [Link]

Pointer borrowing sounds neat, and has some relation to ideas of "object ownership" which have been an area of research in the academic software engineering community for a while. Glad to see it's getting some airing in a language that Mozilla are using. My Masters thesis was on an object ownership system that included something akin to pointer borrowing :-)

A taste of Rust

Posted Apr 18, 2013 5:52 UTC (Thu) by ssmith32 (subscriber, #72404) [Link] (4 responses)

It's a nit, but I don't think Gödel's Incompleteness Theorem applies. You don't have to be able to construct a set of axioms that prove every mathematical truth in order to have a language that lets programmers say whatever they want to say.

A taste of Rust

Posted Apr 18, 2013 9:55 UTC (Thu) by k3ninho (subscriber, #50375) [Link] (1 responses)

The portion of Gödel's Incompleteness Theorem used here is the 'breaking the bounds of the box you specified' part. The incompleteness of two books for library cataloguing, one listing the books which do refer to themselves and one listing the books which don't refer to themselves, necessitates an incomplete catalogue and is a tidy example of Gödel's Incompleteness Theorem.

K3n.

A taste of Rust

Posted Apr 18, 2013 10:51 UTC (Thu) by tsmithe (guest, #57598) [Link]

No, that's not an example of Gödelian incompleteness; it's just an example of diagonalisation. Gödel's first theorem says specifically that no countably axiomatisable theory can be both consistent and complete; and, indeed, the proof of the theorem is a diagonal argument like that of your books. There are many fundamental results in mathematics -- such as Cantor's, or Church's -- along similar lines, and though related, they are all different results.

A taste of Rust

Posted Apr 18, 2013 13:26 UTC (Thu) by branden (guest, #7029) [Link]

I think you're mis-stating the results of Goedel's Incompleteness Theorems (there are two).

Rewording things very loosely (perhaps enough that I'll get myself into trouble), the GITs say that any semantic system sufficiently expressive to prove statements about simple arithmetic is necessarily at least either incomplete (meaning that some statements cannot be proven at all within the system) or inconsistent (meaning that some statements can be proven both true and false using only the basic axioms and other statements proven true within the system).

It is, of course, possible to have a system that is both incomplete *and* inconsistent.

So, basically, it's worse than you think:

"You don't have to be able to construct a set of axioms that prove every mathematical truth in order to have a language that lets programmers say whatever they want to say."

If you let programmers say whatever they want to say, you're guaranteed to have a language that permits them to express undefinable things. (Having just seen Peter Sewell's talk, "The C Language: It's not What You Think It Is!" at LFCS earlier this week, I have the strong impression that real-world programming languages will dash themselves upon the rocks of undefined semantics and/or machine state in multifarious ways long before reaching any sort of limitations on coherent expressiveness imposed by Goedel.)

Furthermore, I'm not sure I took part of your meaning correctly, but: an axiomatic system that directly asserted every mathematical truth would 1) not require logic or proofs, 2) consist of an uncountably huge number of axioms, surpassing all storage and computability requirements; and 3) per GITs, would be inconsistent anyway. (I think.)

N.B., I have not actually played with GITs' application to programming languages in any formal or academic sense, and I will happily yield the floor to someone who has.

A taste of Rust

Posted Apr 18, 2013 21:29 UTC (Thu) by tx (guest, #81224) [Link]

As others have said, I think the statement you make doesn't quite follow from the GIT; that being said, I think you're right that GIT doesn't apply.

"This could be viewed as an admission of failure to provide a truly safe language, however that view would be overly simplistic. In the first place, Gödel's incompleteness theorem seems to suggest that any programming language which is rich enough to say all that we might want to say is also so rich that it cannot possibly be considered to be "safe". "

I think of GIT w.r.t. programming languages more in the light of things like Turing completeness and the halting problem (where self-reference can lead to undecidability). "unsafe" seems a more practical than theoretical issue relating to functional purity.

Hm, now that I type this, though, it does occur to me that one real-world usage of null pointer casting could be to subvert an otherwise consistent type system, which would be necessary to allow a programmer to say whatever (s)he wants. So I guess I'm disagreeing with myself :)

Integer overflow

Posted Apr 18, 2013 6:15 UTC (Thu) by epa (subscriber, #39769) [Link] (11 responses)

Many security holes and latent bugs come from C's relaxed treatment of integer overflow. Does Rust do anything safer?

Integer overflow

Posted Apr 18, 2013 10:18 UTC (Thu) by ibukanov (subscriber, #3942) [Link] (10 responses)

Rust does not do any better than C regarding numeric overflows. However, overflow bugs are particularly bad in C since they often translate into array-out-of-bounds accesses. Rust does not allow that.

But I agree that it would be nice to have a language that, if it does not deal with the overflow itself, at least allows access to the overflow flags that (all?) CPUs set. That way one could at least implement efficient overflow detection. I am puzzled that such a feature is not present in any language that I am aware of.

Integer overflow

Posted Apr 18, 2013 10:29 UTC (Thu) by epa (subscriber, #39769) [Link] (2 responses)

The restricted integers supported by Ada are so simple and obvious that it's amazing so few other languages have adopted them.

Integer overflow

Posted Apr 18, 2013 10:46 UTC (Thu) by ibukanov (subscriber, #3942) [Link]

Right, I forgot that Ada has allowed, since its introduction, declaring integers of arbitrary range, with the compiler adding runtime checks to throw an exception on out-of-range operations. Do modern Ada compilers optimize those range checks using the CPU's overflow control flags when the range is a full machine word?

Integer overflow

Posted Apr 30, 2013 6:20 UTC (Tue) by FranTaylor (guest, #80190) [Link]

You can do the same thing in any language that supports operator overloading. Write your own setters and you can restrict any type in any way you want.

Integer overflow

Posted Apr 19, 2013 8:28 UTC (Fri) by k8to (guest, #15413) [Link] (6 responses)

You mean in a machine-representation oriented language? Python for example just overflows integers to non-machine representations, which seems quite reasonable.

Integer overflow

Posted Apr 19, 2013 9:12 UTC (Fri) by ibukanov (subscriber, #3942) [Link] (2 responses)

> Python for example just overflows integers to non-machine representations,

I doubt the Python interpreter uses assembly code to detect if a*b overflows via checking the overflow control flag of a CPU. Rather I suppose it uses one of the tricks that C code must use for that. Those are slow compared with the assembly.

It is puzzling to me that modern C/C++ compilers do not offer compiler intrinsics to calculate a+b, a*b, etc. together with the overflow status, especially given how trivial it is to exploit many overflow bugs and how non-trivial it is to write proper overflow detection without hurting performance.

Integer overflow

Posted Apr 19, 2013 9:20 UTC (Fri) by k8to (guest, #15413) [Link] (1 responses)

Maybe so, but it definitely satisfies the "deals with overflows itself" part of the query.

As the language programmer you really don't have any reason to care. It's not like the python program will run fast enough that the overflow check being optimized would ever matter.

Integer overflow

Posted Apr 19, 2013 12:18 UTC (Fri) by ibukanov (subscriber, #3942) [Link]

> It's not like the python program will run fast enough that the overflow check being optimized would ever matter.

I suppose Python has to do overflow checks not only when doing arithmetic. For example, in SpiderMonkey, the JavaScript engine in Firefox, the code implementing the a[index] operation had a path with an overflow check, and that did affect benchmark performance.

Integer overflow

Posted Apr 30, 2013 9:11 UTC (Tue) by etienne (guest, #25256) [Link] (2 responses)

> Python for example just overflows integers

I wonder if a language exists which separates "data" and "counter" variables, and associates every "data" variable with a unit, given that counters do not tend to overflow.
When a "data" variable overflows, it would then change the unit to something more appropriate.
For instance, if a variable contains "300 millimetres" and I multiply it by 10^9, the resulting variable would not contain 0x45D964B800 millimetres but 300000 kilometres.
Obviously it can be bad in some rare cases, but it would also have a lot of advantages: you could compare "data" variables in different compatible units (miles and kilometres, degrees Fahrenheit and Celsius, joules and calories...), it could deduce the unit of a calculation (divide miles by hours and compare it to meters per second), and it could flag errors (comparing kilowatt-hours to amperes)...

Integer overflow

Posted Apr 30, 2013 13:43 UTC (Tue) by mathstuf (subscriber, #69389) [Link]

The second part has been implemented (see Boost.Units). I don't believe that the first part has been in any of the unit libraries I've seen (they are mostly compile-time constructs and just reduce to doubles).

Integer overflow

Posted May 1, 2013 15:03 UTC (Wed) by dark (guest, #8483) [Link]

The first part is basically what floating point math is :)

A taste of Rust

Posted Apr 18, 2013 6:19 UTC (Thu) by dvdeug (guest, #10998) [Link] (2 responses)

I'm not sure why you say "30% of the code being potentially unsafe may seem like a lot, even for a compiler"; a compiler needs no unsafe code. It's a big function translating from a text file to binary; there are a handful of things that could be unsafe in a real-world compiler, but a large part of the compiler being unsafe means that much or most of any real-world program will end up being unsafe.

A taste of Rust

Posted Apr 18, 2013 8:24 UTC (Thu) by Tobu (subscriber, #24111) [Link]

First comment on the article.

A taste of Rust

Posted Apr 30, 2013 6:33 UTC (Tue) by FranTaylor (guest, #80190) [Link]

Even the longest journeys start with a single step.

The compilers code being "unsafe"

Posted Apr 18, 2013 16:53 UTC (Thu) by ebiederm (subscriber, #35028) [Link]

In a language that is sufficiently expressive it is possible to write a compiler that will not introduce bugs. For the constructive proof see the compcert C compiler http://compcert.inria.fr/.

Without dependent types Rust does not appear to be a language sufficiently expressive to write a compiler that will not introduce bugs. Rust seems to be seeking a different tradeoff between expressiveness and simplicity.

Fix the programming languages we have.

Posted Apr 22, 2013 14:44 UTC (Mon) by mpalamara (guest, #69530) [Link] (2 responses)

I don't need another programming or scripting language. I need to fix the ones I have.

Fix the programming languages we have.

Posted Apr 22, 2013 15:05 UTC (Mon) by rahulsundaram (subscriber, #21946) [Link]

Well, a new programming language is certainly one type of "fix". Not everything can be retrofitted into existing languages due to design differences and backward compatibility.

Fix the programming languages we have.

Posted Apr 30, 2013 6:31 UTC (Tue) by FranTaylor (guest, #80190) [Link]

That's rather akin to patching up your old VW bus for a trip to the space station.

A taste of Rust

Posted Apr 26, 2013 8:46 UTC (Fri) by VITTUIX-MAN (guest, #82895) [Link] (1 responses)

"valid pointer or NULL" as in:

#define VALID 0^1^0

?

A taste of Rust

Posted Apr 27, 2013 20:04 UTC (Sat) by mathstuf (subscriber, #69389) [Link]

This is one of the main drawbacks of having your sentinel value(s) not be a discriminated union with your actual values. If it's not the sentinel, the code can only assume that it is valid. I suppose that in C, you could do some heuristics to check for plausible heap and stack pointers, but then if you see a pointer from some other allocator, you're going to reject perfectly valid pointers. Not to mention it being very much not cross-platform.

A taste of Rust

Posted Apr 29, 2013 1:51 UTC (Mon) by cocoOS (guest, #90656) [Link] (1 responses)

Any plan for a full-support IDE? It's a pain in the ass coding a statically typed lang in a text editor...

I really like the language; the algebraic data types and the pattern matching look great. But the syntax for lambdas is really ugly. Why did they choose the Ruby lambda syntax instead of a clearer and more universal syntax like ((x : type) => ...//code here//)?

A taste of Rust

Posted Apr 30, 2013 6:43 UTC (Tue) by FranTaylor (guest, #80190) [Link]

It comes with an emacs mode.

A taste of Rust

Posted Apr 30, 2013 6:39 UTC (Tue) by FranTaylor (guest, #80190) [Link] (1 responses)

Yet another attempt to emulate a tagged architecture.

The computing world would be a much different place if the Intel iAPX 432 project had been successful.

A taste of Rust

Posted Apr 30, 2013 7:24 UTC (Tue) by mpr22 (subscriber, #60784) [Link]

A world where the 432 was designed by sane people instead of architecture astronauts would have been quite interesting, yes.


Copyright © 2013, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds