July 3, 2013
This article was contributed by Neil Brown
Once upon a time, a new programming language could be interesting because
of some new mechanism for structured flow control. An if statement
that could guard a collection of statements would be so much easier
than one which just guarded a goto. Or a for statement which
took control of the loop variable could simplify matrix multiplication
significantly. An illuminating insight into this earlier age can be
found in Knuth's "Structured
Programming with go to statements" [PDF].
Many of the issues that seemed important in 1974 seem very dated
today, but some are still fresh and relevant.
The work of these early pioneers has left us with five basic forms
that appear to be common to most if not all procedural languages: two
conditional constructs, if and switch/case; two looping
constructs, while and for; and one encapsulation construct: the
function or procedure.
While interesting new control flow is unlikely to be a headline item
on a newly developed language these days, each language must embody
concrete choices concerning these structures and it is quite clear that,
while there is similarity, we are far from uniformity. Exploring how
a language handles control flow can provide interesting insights
into the philosophy behind the language. In this article, we will
continue our explorations of Go and Rust by looking at various
control-flow structures, but particularly focusing on the "for" loop.
The background of for loops
The for loop first appeared in programming languages as an easy way to
step through a fixed list of values. We can see this in Fortran, which
used the word do rather than for (here 10 is the label of the
statement after the loop):
do 10 i = 1, 100, 2
and in Algol58:
for i := 1(2)100 do
Algol60 adds some syntactic sugar
for i := 1 step 2 until 100 do
while Pascal dropped the step clause so you would need:
for j := 0 to 49 do
and then set i := j * 2 + 1 inside the loop.
The Algol60 for loop was actually quite rich, as can be seen by the
examples here. It is a richness that probably seems
excessive by today's standards.
In C, which came a decade later, several of the ideas in Algol were
generalized and simplified to encapsulate all the interesting
possibilities in just three expressions: initializer, test, and step,
thus:
for (i = 1; i < 100; i += 2)
As the three expressions can be almost arbitrarily complex, very rich
looping constructs can be created from this simple form. The effect
is that the head of the for forms a coroutine that is executed in
concert with the body of the for loop. Control alternates between
one and the other, so that together they achieve the desired result.
The coroutine nature of the for loop's head is made particularly
obvious by the many (over 150) for_each macros that appear in the
Linux kernel. With these the code for one routine is physically quite
separate from the other, emphasizing the separate roles of the two
pieces of code. An example of such a for_each macro, from
include/linux/radix-tree.h is
#define radix_tree_for_each_slot(slot, root, iter, start) \
    for (slot = radix_tree_iter_init(iter, start) ; \
         slot || (slot = radix_tree_next_chunk(root, iter, 0)) ; \
         slot = radix_tree_next_slot(slot, iter, 0))
This example is interesting for a couple of reasons.
First, the middle expression, the loop-continue condition, is not
simply a condition, but contains an assignment and is sometimes used to
find the next value. This makes it clear that the three expressions are not
simply expressions with fixed purposes, but rather three separate entry
points into a coroutine.
Secondly, it contains two variables that change throughout the loop:
slot and iter. The slot variable is
the regular loop variable that any for
loop would have, while iter contains extra state for tracking the
path through the list and is largely of internal interest.
While it is primarily internal, it needs to be visible externally, and,
in fact, needs to be declared externally. The for
statement has some properties of a coroutine, but cannot
define local variables for use throughout the loop.
So we see in the C for loop, particularly when combined with other
features of C such as the rich expressions and the macro preprocessor,
a very powerful, though not completely satisfactory, for loop
mechanism, one that will serve as a basis for examining others.
Go for — broke or beautiful?
The for loop in Go comes in three different forms — not quite the
range of Algol60, but seemingly more than C. One form is superficially very similar to that in C: the
parentheses are not required, and the loop body must be a
"block" rather than a simple statement. But these are syntactic
differences which don't affect expressiveness. The earlier
iterative example looks much the same in Go as in C:
for i := 1; i < 100 ; i += 2 { .... }
The parallel ends there, however. Simple for loops will look much the
same, but complex for loops will have to look quite different. This is
partly because Go has no macro preprocessor and partly because Go
expressions are not as rich as C expressions. While the C for loop
simply contains three expressions, the Go for loop contains a "simple
statement", an "expression", and another "simple statement", where
"simple statement" specifically includes assignments and
increments/decrements.
Were we to try a literal translation of the radix tree for_each loop
into Go, we would have mixed success. Go allows the declaration of local
variables inside a for loop head, so there would be no need to declare
slot and iter separately. However, as the condition in a Go for
statement cannot contain assignments, we find a complete literal translation
is impossible. Of course measuring a language by how literal translations from another
language fare is far from reasonable — we may not be using the best
tool for the job and, as already noted, there are other forms of the
for loop in Go.
The second form is really a reduced version of the first, with the two
simple statements missing and, thus, their semicolons discarded:
for i < 100 { ... }
That form is essentially what many other languages would call a while loop.
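As a minimal sketch of these first two forms, the same sum over the odd numbers below a limit can be written either way. The function names here are ours, chosen only for illustration.

```go
package main

import "fmt"

// OddSumThreePart adds the odd numbers below limit using the three-part form.
func OddSumThreePart(limit int) int {
	sum := 0
	for i := 1; i < limit; i += 2 {
		sum += i
	}
	return sum
}

// OddSumCondition does the same with the condition-only form, which is
// what other languages would spell as a while loop.
func OddSumCondition(limit int) int {
	sum, i := 0, 1
	for i < limit {
		sum += i
		i += 2
	}
	return sum
}

func main() {
	fmt.Println(OddSumThreePart(100), OddSumCondition(100)) // prints "2500 2500"
}
```

The only real difference is where the loop variable is declared and stepped: in the head for the first form, in the surrounding code and the body for the second.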
This leaves the final form — the for/range loop.
for x := range expression { ... }
will iterate through members of the result of the expression in various
ways depending on the type of the result. This makes explicit a
difference from the for loops in the earlier languages. For
Fortran, ALGOL, and Pascal, the for loop dealt with sequences of
numbers, or possibly "enumerated constants" which are very number-like.
As we have seen, C can work with arbitrary values and the Go range
clause makes it clear that this loop is for much more than just numbers.
The value can be an array, a slice (part of an array), a string (of
Unicode characters), a map (also known as a "hash",
"associative array", or "dictionary" in other languages), or a
"channel" (used for communication between goroutines). In the first four cases, the for loop steps through the
components of the value in a fairly obvious way. Channels are
a bit different and will be examined shortly. As range does
not work with user-defined types at all, we cannot
translate our "radix_tree" loop directly into for/range and so must
look elsewhere.
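To make the built-in cases concrete, here is a small sketch of range over a slice, a string, and a map. The helper names are illustrative assumptions, not anything taken from the Go distribution.

```go
package main

import "fmt"

// SumSlice ranges over a slice; range yields an index and a value.
func SumSlice(values []int) int {
	total := 0
	for _, v := range values {
		total += v
	}
	return total
}

// CountRunes ranges over a string; range yields Unicode code points
// rather than bytes, so a multi-byte character is counted once.
func CountRunes(s string) int {
	n := 0
	for range s {
		n++
	}
	return n
}

// SumMap ranges over a map; range yields keys and values in an
// unspecified order, which does not matter for a sum.
func SumMap(m map[string]int) int {
	total := 0
	for _, v := range m {
		total += v
	}
	return total
}

func main() {
	fmt.Println(SumSlice([]int{1, 2, 3}))                    // prints "6"
	fmt.Println(CountRunes("h\u00e9llo"), len("h\u00e9llo")) // prints "5 6"
	fmt.Println(SumMap(map[string]int{"a": 1, "b": 2}))      // prints "3"
}
```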
A reasonable place to look might be some existing body of Go code to
see how such things are done. Though the Go compiler is not written
in Go, the Go language source distribution includes many tests,
libraries, examples, and tools written in Go, with a total of 2418 .go
source files, all of which were presumably written by people quite familiar
with the language. Altogether, there are over 7000 for loops to consider.
Of these, 1200 are of the while loop form, nearly 2800 are for/range
loops, and the remaining 3000 are in the three-part form, the vast majority
of which have a numeric loop variable (demonstrating that the numeric
loops of yesteryear are very much alive and well). So there are not a
lot of examples of iterating user-defined data structures — a fact which
itself might be significant.
One example of interest is in
src/pkg/container/list/list_test.go:
for e := l.Front(); e != nil; e = e.Next() {
le := e.Value.(int)
....
This example is not vastly unlike the for_each macros we saw written
in C. The syntax is clearly different, but the idea of having a very
simple "head" on the for loop, with the actual code for the
coroutine being off in a different file, is represented quite clearly.
The for loop fragment given could easily be for almost any data
structure. If there was a desire to keep the value (le above) more
distinct from the iterator (e above), a construct like:
for slot, iter, ok := l.Front(); ok; slot, ok = iter.Next() {
could return a sequence of slots using an iterator much like the
radix_tree_for_each_slot loop we saw earlier. This construct is
really quite elegant and extremely general.
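A sketch of how such a slot/iterator pair might be provided follows. The intIter type and the Front and Next functions are hypothetical, written only to show that the loop head compiles and behaves as described; they are not part of container/list or any other Go package.

```go
package main

import "fmt"

// intIter is a hypothetical iterator over a slice of ints.
type intIter struct {
	items []int
	pos   int
}

// Front returns the first slot, an iterator positioned on it, and
// whether a first slot exists at all.
func Front(items []int) (int, *intIter, bool) {
	it := &intIter{items: items}
	if len(items) == 0 {
		return 0, it, false
	}
	return items[0], it, true
}

// Next advances the iterator and returns the next slot, or false
// when the sequence is exhausted.
func (it *intIter) Next() (int, bool) {
	it.pos++
	if it.pos >= len(it.items) {
		return 0, false
	}
	return it.items[it.pos], true
}

// Collect walks the sequence with the three-part for loop, keeping
// the value (slot) distinct from the iteration state (iter).
func Collect(items []int) []int {
	var out []int
	for slot, iter, ok := Front(items); ok; slot, ok = iter.Next() {
		out = append(out, slot)
	}
	return out
}

func main() {
	fmt.Println(Collect([]int{3, 1, 4})) // prints "[3 1 4]"
}
```

Unlike the C macro, the iterator state is declared in the loop head itself and is not visible after the loop.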
Another interesting example occurs in various files in
src/pkg/net,
such as src/pkg/net/hosts.go and takes the form:
for line, ok := file.readLine(); ok; line, ok = file.readLine() {
This is very similar to the Front/Next example, except that Front
and Next are identical. This could be considered to violate the DRY
principle: Don't Repeat Yourself.
In C, this sort of loop is regularly written as:
while ((line = fgets(buf, sizeof(buf), file)) != NULL) {
but that cannot be used in Go, as expressions do not include assignments.
This issue of expressions excluding assignments has clearly not gone
unnoticed by the Go designers. If we look at the if and
switch statements, we see that, while they can be given a
simple expression, they can also be given a simple statement as well,
such as:
if i := strings.Index(s, ":"); i >= 0 {
which includes both an assignment and a test. This would work quite
nicely for the readLine loop:
while line, ok := file.readLine(); ok {
except that Go does not provide a while loop, only a for loop. Though the for loop does include two simple statements, neither is
executed at a convenient place to make this loop work as expected. So
if we are to remove the repetition of the readLine call, we must look
elsewhere.
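For reference, the if form with a leading simple statement can be seen working in a few lines. The Host function is an illustrative assumption of ours; strings.Index is the standard library call used in the fragment above.

```go
package main

import (
	"fmt"
	"strings"
)

// Host returns the part of s before the first colon, or s unchanged
// when there is no colon. The variable i is scoped to the if statement.
func Host(s string) string {
	if i := strings.Index(s, ":"); i >= 0 {
		return s[:i]
	}
	return s
}

func main() {
	fmt.Println(Host("example.org:80")) // prints "example.org"
	fmt.Println(Host("localhost"))      // prints "localhost"
}
```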
One possibility is to explore the fact that while expressions do not
include assignments, they do include function calls and functions can
include assignments. Go supports function literals. This means that the body of a function
can be given anywhere the name of a function can be used. The body of
a function may be assigned to a variable, or it may be called in
place. Further, the function so defined can access any variables that
are in the same scope as the function. So:
for line := "";
    func() (ok bool) {
        line, ok = file.readLine()
        return ok
    }(); {
is a for loop in the three-part form which behaves much the same as
the example above from hosts.go but without repetition.
The "initialize" part of the for loop (line := "")
declares a new variable, line which is initialized to the empty
string (it syntactically needs to be initialized to something, though
the value won't be used).
The "condition" part of the loop is an immediate call to a function
literal which calls file.readLine(), returns the ok part of the
result and has a side effect of assigning the line part of the
result to the line variable.
The = form of assignment is needed in the function, rather than the
:= form, so that it does not declare a new line variable, which is
local to the function, but instead uses the one local to the for
loop.
The "next" part of the loop is empty, and appears between the second
; and the {.
While this does remove the unfortunate repetition of the readLine
call, the cure turns out to be much worse than the disease, as the loop
is close to unreadable. While function literals certainly have their
place, this is not that place.
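For readers who want to try the function-literal loop, here is a self-contained sketch. The lineSource type is a hypothetical stand-in for the net package's unexported file type; only the shape of readLine (a line plus an ok flag) is taken from the fragments above.

```go
package main

import "fmt"

// lineSource is a hypothetical in-memory replacement for the file
// type whose readLine method is used in the net package.
type lineSource struct {
	lines []string
	next  int
}

// readLine returns the next line and true, or "" and false at the end.
func (s *lineSource) readLine() (string, bool) {
	if s.next >= len(s.lines) {
		return "", false
	}
	line := s.lines[s.next]
	s.next++
	return line, true
}

// ReadAll gathers every line, using a function literal as the loop
// condition so that readLine is written only once.
func ReadAll(file *lineSource) []string {
	var out []string
	for line := ""; func() (ok bool) {
		line, ok = file.readLine() // "=" assigns the loop's line variable
		return ok
	}(); {
		out = append(out, line)
	}
	return out
}

func main() {
	src := &lineSource{lines: []string{"first", "second"}}
	fmt.Println(ReadAll(src)) // prints "[first second]"
}
```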
This leaves one more possibility to explore — it is time to examine that
"range channel" construct hinted at earlier.
Channels
Concurrency and multiple threads (known as goroutines) are deeply
embedded in Go, and the preferred mechanism for communicating between
goroutines is the "channel". A channel is somewhat like a Unix
pipe. It conceptually has two ends, and data written to one end can be
read from the other. While a pipe can only pass characters or strings
of characters, a channel can pass any type known to Go, including
other channels.
for i := range my_channel {
will repeatedly assign to i each value received from my_channel
and then run the body of the for loop. This is a lot like our
readLine example — if only we could make lines appear on a channel.
And, of course, we can.
func lines (file *file) (<- chan string) {
    ch := make(chan string)
    go func () {
        for {
            line, ok := file.readLine()
            if !ok { break }
            ch <- line
        }
        close(ch)
    }()
    return ch
}
This lines function creates a channel (the make function) and
starts a goroutine (the function literal after the go keyword) that
sends lines back over the channel. This could be called as:
for line := range lines(file) {
which will very cleanly iterate over all the lines in the file with
no violation of the DRY principle.
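A runnable variant of this idea is sketched below. It assumes bufio.Scanner over an io.Reader in place of the net package's unexported readLine, and the Lines and Collect names are ours.

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"strings"
)

// Lines starts a goroutine that sends each line read from r on the
// returned channel and closes the channel at end of input.
func Lines(r io.Reader) <-chan string {
	ch := make(chan string)
	go func() {
		defer close(ch)
		sc := bufio.NewScanner(r)
		for sc.Scan() {
			ch <- sc.Text()
		}
	}()
	return ch
}

// Collect drains the channel with for/range; the loop ends when the
// producing goroutine closes the channel.
func Collect(text string) []string {
	var out []string
	for line := range Lines(strings.NewReader(text)) {
		out = append(out, line)
	}
	return out
}

func main() {
	fmt.Println(Collect("one\ntwo\nthree\n")) // prints "[one two three]"
}
```

This sketch only covers the case where the loop runs to completion; it makes no attempt to handle an early exit from the loop.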
However, further examination shows that this isn't really ideal. It
certainly works in the simple case, but problems arise when you
break or return out of the for loop. When you do that, the
channel is not destroyed and the goroutine remains in existence trying
to write to it, though no one will ever read it again.
Go has built-in garbage collection that will reclaim unreferenced
memory, but not unreferenced goroutines.
In order to clean up properly here, we would need to close the channel
after breaking out of the for loop. Strangely, only the write end of a
channel can be closed and, since the return value of our lines function is
currently the read end (<- chan string), we need to change it to
return the double-ended channel. We also need to declare a variable
to hold the channel:
func lines (file *file) (chan string) {
    ch := make(chan string)
    go func () {
        for {
            line, ok := file.readLine()
            if !ok { break }
            ch <- line
        }
        close(ch)
    }()
    return ch
}
...
c := lines(file)
defer close(c)
for line := range c { ... }
Now we have a for loop that iterates over lines in a file, but that we
can break out of without leaking channels or goroutines. However, it
isn't really elegant any more. Needing to return both ends of the channel,
needing to declare a separate variable to hold that channel, and the
explicit defer close are all warts which tarnish the elegant:
for line := range lines(file)
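One caveat is worth noting: in Go, sending on a closed channel and closing an already-closed channel both cause a run-time panic, so closing the data channel from the receiving side is risky while the goroutine may still send on it or close it. A pattern often used instead is a second "done" channel that only the consumer closes. The sketch below is one possible arrangement under that assumption, with hypothetical names and a slice standing in for the file; it needs still more machinery, which if anything reinforces the point about lost elegance.

```go
package main

import "fmt"

// Lines sends each element of src on the returned channel until src
// is exhausted or done is closed. Only this goroutine closes ch.
func Lines(src []string, done <-chan struct{}) <-chan string {
	ch := make(chan string)
	go func() {
		defer close(ch)
		for _, line := range src {
			select {
			case ch <- line:
			case <-done:
				return // the consumer has gone away, so stop sending
			}
		}
	}()
	return ch
}

// FirstN breaks out of the loop early; closing done on return lets
// the producing goroutine exit instead of blocking forever.
func FirstN(src []string, n int) []string {
	done := make(chan struct{})
	defer close(done)
	var out []string
	for line := range Lines(src, done) {
		if len(out) >= n {
			break
		}
		out = append(out, line)
	}
	return out
}

func main() {
	fmt.Println(FirstN([]string{"a", "b", "c", "d"}, 2)) // prints "[a b]"
}
```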
The conclusion is that despite the repetition, the form used in the
net package of:
for line, ok := file.readLine(); ok; line, ok = file.readLine() {
does seem to be the best way to implement the task. All of the
alternatives fall short.
From loops to philosophy
It is in that last observation that part of the philosophy of Go seems
to show itself. While Go offers a lot of functionality, it often
seems quite restrictive in how this functionality is accessed. This is reminiscent of the 13th aphorism from the Zen of Python:
There should be one — and preferably only one — obvious way to do it.
We see this restrictiveness in for loops where the range syntax is
only available for built-in types, and where the first/next structure is
really the only way to do other for loops, even if it involves
repeating yourself.
We can see a similar pattern with inter-goroutine communication, where
channels have a privileged status. There are several language
facilities that only work with raw channels much like for/range only
works with internal data types. Send (ch <- v), receive (v = <-ch), and the
select statement (which is a bit like switch but chooses
which of several blocking operations is ready to run) are completely
unavailable to user-defined types.
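As a brief illustration of these channel-only facilities, the sketch below uses send, receive, and select together. FirstReady is a name of our own choosing.

```go
package main

import "fmt"

// FirstReady blocks until one of the two channels can deliver a
// value and returns that value.
func FirstReady(a, b <-chan string) string {
	select {
	case v := <-a: // receive from a
		return v
	case v := <-b: // receive from b
		return v
	}
}

func main() {
	a := make(chan string, 1)
	b := make(chan string, 1)
	b <- "from b" // send; the buffer of one means this does not block
	fmt.Println(FirstReady(a, b)) // prints "from b"
}
```

Since only b holds a value here, the select has exactly one case that can proceed.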
Where Python provides a default implementation for "maps", but allows
a class to provide an independent implementation using the same
syntax, Go provides a built-in "map" data type and permits no
substitutes. The Go FAQ makes it clear that this is a conscious
decision and not an oversight:
We believe that Go's implementation of maps is strong enough that
it will serve for the vast majority of uses.
This is probably why we found so few examples of iterating user-defined
data structures in the Go code — maps are used instead.
Finally, even the syntax has an element of restrictiveness. We saw
this briefly in a previous article where the handling
of semicolons imposes certain style choices on the programmer. We can
see it also in the go fmt command, which will reformat the code
in a .go file to follow a particular standard. While this is not
imposed on programmers, the language designers recommend the
use of go fmt to ensure that code follows the one true layout.
This philosophy certainly has a lot to recommend it. By removing
options from the programmer, the language removes the need to make
choices and so frees the programmer to focus on the actual
functionality that they need. It is a philosophy that also imposes heavy requirements on the language and
support environment. If there is only one way to do something, then
that one way had better work extremely well. Given the vibrant
community that has been built up around Go, and the strong emphasis on
performance shown in the recent release of Go 1.1, it seems likely
that Go does live up to this requirement.
Rusty loops
Turning to Rust we see a very different style of for loop.
The example loop we started with which iterates over odd values from 1
to 99 would look like:
for uint::range_step(1, 100, 2) |i| { ... }
Here the:
|i| { ... }
piece is a function literal, similar to those we saw when exploring Go,
though with a very different syntax and a different name. Rust, like many other
languages, calls it a lambda expression. It consists of a list of formal
parameters between vertical bars, and a statement block.
The
uint::range_step(1, 100, 2)
is a reference to a function called range_step in the
uint module. The uint::range_step()
function actually takes 4 arguments: start, stop, step, and function. The behavior of range_step() is to call
function, repeatedly passing values from start up to the
stop, incrementing by step each time. Consequently our for loop
could be realized simply by:
uint::range_step(1, 100, 2, |i| {
...
})
There are two problems with this. A minor point is that the syntax is
arguably less pleasing than the first version. More importantly,
constructs like break and continue don't have any meaning inside a
function literal, so they could not affect the flow of this second loop.
The for statement addresses both of these. It provides syntax for
writing the function literal outside the normal list of function
parameters and it gives meaning to break, loop (the Rust
equivalent of continue), and return.
By convention, the function in the head of for should stop looping when
the function argument that it calls returns false. The for statement
uses this by effectively translating break to return false and
loop to return true. If any return statement appears in the body of the for loop, it is also
translated to something that will "do the right thing".
This seems like a fairly complex set of transformations, but the end
result is extremely flexible. It allows a very clear separation of the
two coroutines that make up a for loop, with the head routine having
the full power of a regular function that is able to declare local variables
and to communicate in arbitrary ways with the body routine.
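The protocol itself, a head function that keeps calling the body until the body returns false, is not specific to Rust. Purely as an illustration, and not as a description of how Rust implements it, the same convention can be sketched in Go with ordinary function values. RangeStep and OddsUpTo are hypothetical names, and step is assumed to be greater than zero.

```go
package main

import "fmt"

// RangeStep calls body with start, start+step, ... while the value is
// below stop, and stops early if body returns false.
// It assumes step > 0.
func RangeStep(start, stop, step uint, body func(uint) bool) {
	for i := start; i < stop; i += step {
		if !body(i) {
			return
		}
	}
}

// OddsUpTo collects the odd numbers from 1 to max, stopping the
// iteration by returning false from the body.
func OddsUpTo(max uint) []uint {
	var out []uint
	RangeStep(1, 100, 2, func(i uint) bool {
		if i > max {
			return false // plays the role of break
		}
		out = append(out, i)
		return true // plays the role of loop (continue)
	})
	return out
}

func main() {
	fmt.Println(OddsUpTo(7)) // prints "[1 3 5 7]"
}
```

What Go lacks is the syntactic translation: the programmer must write the return true and return false by hand, whereas the Rust for statement described above generates them from break and loop.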
Both the "iterate over all the lines in a file" loop which we struggled
with in Go, and the radix tree loop from the Linux kernel, would be
trivial to implement as an iterator routine in Rust. The first of
these would look like:
pub fn every_line(f: @io::Reader, it: &fn(&str) -> bool) {
    while !f.eof() {
        let line = f.read_line();
        if !it(line) { break }
    }
}
and could be called as:
let f = io::file_reader(&Path("/etc/motd")).get();
for every_line(f) |line| {
    io::println(fmt!("Line is %s", line));
}
This power to write elegant iterators is not without its
cost. While Rust allows an arbitrary function to provide the head of
the for loop, it also requires the head of the for
loop to be some function. The simple initialize, test, increment
form of C and Go cannot be used.
If we go back and look at the nearly 3000 for loops in the Go source
code that use a numeric loop variable, we find that the vast majority
of them could be implemented using uint::range_step() or even the simple
uint::range(). But not all. Some examples include:
for ; i > 0; i /= 10 {
for (mid = (bot+top)/2; mid < top; mid = (bot+top)/2) {
for n := 1; n <= 256; n *= 2 {
for rate := 0.05; rate < 10; rate *= 2 {
for parent := ".."; ; parent = "../" + parent {
(the last one does not have a numeric variable of course, but is still
a useful example).
Several of these could be supported by adding a very small number of
extra iterators to the standard library; the rest could just
as easily be implemented with a while loop. So this limitation
doesn't really limit Rust a significant amount.
A Rusty philosophy?
We see, in the for loops of Rust, a very different philosophy to that
of Go. While Go forces you into a particular mold, Rust lets you
build your own mold with enormous freedom. You could even modify the
exact behavior of break inside your for loops if that seems like a
useful thing to do.
This freedom and flexibility extends to other parts of Rust too. In
last month's article, we
saw that Rust does not draw a distinction between
expressions and statements, so it allows if and match constructs (the
latter being similar to switch) deep inside expressions, whereas Go
does not permit such things.
Rust goes even further with a rich macro language that can
declare which syntactic elements (e.g. identifier, expression, type)
may replace each macro parameter, and can repeat the body of the macro
if the parameter is a list. This leaning towards extreme flexibility seems to pervade Rust and is
reminiscent of the Perl programming motto: There is more than one way to
do it.
Summary
There will always be a tension in language design between allowing
the programmer freedom of expression and guiding the programmer
toward clarity of expression. In a previous article, we saw
how the type system of Rust prefers clarity over freedom. Go is not
such a stickler, and is satisfied with run-time type checks in places
where Rust would insist on compile-time checks. Here, when we look at the structuring of statements and expressions, we
find Rust prefers freedom while Go seems more focused on clarity by
eliminating unnecessary flexibility.
Which of these is to be preferred is almost certainly a very personal
choice. Some people rebel against a constraining environment,
others relish the focus it allows them. Both provide room for
creativity and productivity. Go and Rust provide very different
points in the spectrum of possibilities and it is good to have that
choice ... except that it does mean that you have to choose.