
An introduction to the Julia language, part 2

September 4, 2018

This article was contributed by Lee Phillips

Part 1 of this series introduced the Julia project's goals and development process, along with the language syntax, including the basics of control flow, data types, and, in more detail, how to work with arrays. This part describes user-defined functions and the central concept of multiple dispatch; it also surveys Julia's module and package system, covers some syntax features, shows how to make plots, and briefly dips into macros and distributed computing.

Multiple dispatch

Many high-level languages come with a built-in opinion about how you should organize your code. You are free to ignore these opinions, but that involves going against the grain and failing to take full advantage of the language's features. For example, Python is a class-based object-oriented language, where code tends to be organized around classes that inherit from other classes; the Lisp family encourages the programmer to use macros and small, composable functions to create a "domain specific language" suitable to the problem at hand; APL programmers express their problems as operations on entire arrays and are loath to write loops. Julia is organized around a principle different from these. To learn what that is, we first need to learn how functions are created.

Define functions using the function keyword:

    julia> function pdiff(a, b)
	       if b > a
		   return b - a
	       else
		   return 0
	       end
	   end
    pdiff (generic function with 1 method)

Notice the slightly odd message returned by the read-eval-print loop (REPL) after it digests the function definition. If you keep the REPL open and type in a new function definition, such as:

    function pdiff(a::String, b::String)
	if length(b) > length(a)
	    return b[length(a)+1:end]
	else
	    return ""
	end
    end

Then you will get a slightly different message back:

    pdiff (generic function with 2 methods)

In our second definition, we've specified the types of the arguments: both must be Strings. In the original definition, the types are left unspecified. We now have two versions, called "methods", of the generic function pdiff(). When we invoke pdiff(), the compiler will use the method most specific to the types of the arguments passed. If both arguments are strings, that will be the second version:

    julia> pdiff(3, 9)
    6

    julia> pdiff("xx", "abcdef")
    "cdef"

The compiler will always examine the types of all the arguments passed to a function and choose the method definition that is most specific to those types. This behavior is so important to Julia's design that its creators consider it the central organizing principle of the language. It's called "multiple dispatch", referring to the fact that the types of all the arguments determine the method called, not merely the first. Most object-oriented languages use single dispatch instead, selecting the method based only on the class of the object on which it is invoked.

Multiple dispatch brings function definition in Julia closer to mathematical thinking, where the definition of a function or operator depends on the types of all of its arguments, rather than just some of them. It also gives the programmer a way to organize the ideas in their code; it is the way Julia itself (which is mainly written in Julia) is organized internally. Multiple dispatch allows using the same name for related operations, each with its own implementation under the hood.

For example, you might write a distance() function that calculates the absolute value of the difference between two numbers. You could use the same function name for a method that operates on two one-dimensional arrays, interpreted as vectors, and returns the Pythagorean distance between them. Later, you could add more methods to calculate the distance between two manifolds, two nodes on a network, or two samples of text, using whatever definitions are useful for your particular problem. These more complex cases would be enabled by Julia's type system, where user-defined types can be elaborate collections of other types and are treated the same as fundamental types by the compiler. By thoughtfully combining your own data types with the multiple dispatch mechanism, you can program using high-level concepts natural to your problem domain, without sacrificing efficiency.
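
A rough sketch of how that first step might look (these distance() methods are illustrative, not part of any library):

    julia> distance(a::Number, b::Number) = abs(b - a)
    distance (generic function with 1 method)

    julia> distance(a::Vector, b::Vector) = sqrt(sum((b .- a).^2))
    distance (generic function with 2 methods)

    julia> distance(3, 7.5)
    4.5

    julia> distance([0, 0], [3, 4])
    5.0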

As mentioned above, you don't need to know much about Julia's type system to use the language productively. However, you should understand multiple dispatch and how to specify types in function signatures to create different methods. The methods() function, typed into the REPL, will list all the methods defined under a particular name and their type signatures. This works for operators as well, which are defined as functions: try typing methods(*) to see the list of 376 versions of the multiplication operator; this is how "*" can be used to multiply integers and floats, concatenate strings, perform matrix multiplication, and serve any other purpose related to the concept of multiplication.
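
After the two pdiff() definitions above, for instance, the listing looks something like this (the exact format and ordering vary between Julia versions):

    julia> methods(pdiff)
    # 2 methods for generic function "pdiff":
    [1] pdiff(a::String, b::String) in Main at REPL[2]:2
    [2] pdiff(a, b) in Main at REPL[1]:2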

Modules and packages

In my opinion, Julia's package and module system is a major selling point for the language, so it's worthwhile to go into it in a little detail.

If you've maintained software projects of any complexity that depend on external libraries, you've probably run into some version of "dependency hell". Upgrading the language or any external library on the system, or installing the project on a different machine, may cause the program to stop working and lead to hours of unproductive work tracking down incompatibilities and bugs. Python addresses this problem through virtual environments and package managers, which, until recently, were third-party projects. Another approach is Docker, which isolates each application and its dependencies in a container with its own filesystem and network interfaces; that is overkill for most users.

Julia solves the dependency problem through several mechanisms built into the language. A project, which is a directory tree with code and other assets, can include a manifest that details all of the external resources used by the code, including the version numbers. All these requirements can be automatically put in place whenever needed.
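
For example, a freshly obtained copy of such a project can have its exact environment recreated with two commands from the Pkg standard library, which is introduced below (a minimal sketch, assuming the project's manifest is in the current directory):

    julia> using Pkg

    julia> Pkg.activate(".")     # use this project's own environment

    julia> Pkg.instantiate()     # install every dependency recorded in the manifest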

Code can load external packages, which are projects designed to export code resources. Functions for export/import are stored in modules, which are another type of named code block within the package. Unlike many other languages, there is no relationship between module names and filenames: a module can be split among many files and a file can contain many modules. A package can contain any number of modules, but must have at least one if it wants to make functions available for importing.

If a package is not already installed on your system, you can get it by calling Pkg.add("packagename"). This will download the necessary files from the official registry and install them. To use the functions from a module included with the package, type using modulename. This will import the functions marked for export in the module and precompile them. Their bare names will then be available in the local namespace. If, instead, you issue the statement import modulename, you will then use its exported functions under names like modulename.function(). Pkg itself is part of the standard library; before you use it you need to say using Pkg.
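
For example, installing and loading the Plots package used later in this article looks like this (output omitted):

    julia> using Pkg

    julia> Pkg.add("Plots")      # download and install from the registry

    julia> using Plots           # exported names, such as plot(), are now available directly

    julia> import Plots          # alternatively, call them as Plots.plot()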

The REPL has a special mode for manipulating packages, which you enter by typing "]". In Pkg mode, you can simply type, for example, add packagename.
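
In package mode, the prompt changes; assuming Julia 1.0, a short session might look like this:

    (v1.0) pkg> add Plots

    (v1.0) pkg> status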

The Julia community has created over 1900 packages. You can explore the official registry, divided into categories, at the Julia Observer site. These projects run the gamut from mature and polished to unfinished experiments and cover a wide variety of fields. Unfortunately, many of these packages do not work at the moment and will not work until they are upgraded to conform to the current language version. As the Julia team mentioned in email, there is a wide array of packages that have been developed for Julia, many of which are far outside the mainstream numerical/scientific application domain. These include a web framework, code for scripting Minecraft, Sudoku-as-a-service, a music manipulation library, and more.

Other features for the numericist

Julia is replete with features that make the life of the numerical scientist more convenient. It includes all the usual math functions; special functions (Bessel, Hankel, Airy, etc.) are provided by an external package; statistical functions are part of the standard library; and so on.

Julia can work with complex numbers, using im for the imaginary unit. Note that sqrt() raises an error for negative real numbers; to get a complex result, the argument must itself be complex, as shown here:

    sqrt(Complex(-4)) == 0.0 + 2.0im
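
Passing a negative real number instead raises an error (the exact message may differ slightly between versions):

    julia> sqrt(-4)
    ERROR: DomainError with -4.0:
    sqrt will only return a complex result if called with a complex argument. Try sqrt(Complex(x)).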

Julia allows writing nested loops without explicitly nested blocks, enabling more concise, less deeply indented code that more closely resembles the way summations are written on paper:

    for i = 1:2, j = 3:4
	println((i, j))
    end

This block produces the output

    (1, 3)
    (1, 4)
    (2, 3)
    (2, 4)

Another piece of syntactic convenience is the arrow operator, which allows the chaining of functions from left to right without a lot of parentheses:

    f(x) |> g |> h == h(g(f(x)))
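
For instance:

    julia> 100 |> sqrt |> log
    2.302585092994046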

Technical computing often involves some type of plotting of the results. The Julia community has embraced a plotting "metapackage" called "Plots". The idea behind it is to present a unified, powerful plotting language to the user that is independent of the actual backend. There are multiple backends for the package, including Matplotlib, the Plotly JavaScript plotting library, the LaTeX drawing system PGF/TikZ, and more. There is also a particularly nice backend that draws plots right on the console, called UnicodePlots. Before using Plots for the first time, you may need to add the package with Pkg.add("Plots"); most of the backends are in separate packages as well.

One of the aims of Plots is to be intuitive and "smart", with the ability to figure out the plot that you want. Whether or not it has achieved this lofty goal, it is easy to use, and the ability to switch plotting backends without changing your code is a nice feature. The backend is selected in an interactive session by calling a function whose name is the backend's name in all lowercase. Before plotting, we must also import the Plots module with a using command:

    julia> using Plots

    julia> plotly()
    Plots.PlotlyBackend()

    julia> plot(sin, 0, 2π)

After entering this at the default REPL prompt, your web browser will open a new tab containing the plot, which has some interactive controls for scaling and panning that appear upon hovering:

[Julia with Plotly]

Here is how to draw the same plot with a different backend:
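
A minimal sketch, assuming the UnicodePlots package has already been added:

    julia> unicodeplots()
    Plots.UnicodePlotsBackend()

    julia> plot(sin, 0, 2π)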

[Julia Unicode plot]

There are some rough edges at the moment: my attempt at using the Matplotlib backend "PyPlot" led to a segmentation fault with the REPL and failed to produce a plot from the Jupyter console, leaving the console in an unusable state. Also, the first plot that you make in a session using a particular backend takes a good long while to appear, but subsequent plots are fairly quick.

If you want to use a plotting system other than the ones that interface with Plots, there are many options. For example, there is an interface to gnuplot and a pure Julia graphics package called Gadfly.

Macros

Julia has extensive support for metaprogramming, including full, Lisp-like macros. In fact, Julia itself is quite Lisp-like under the hood. Expressions like 1 + 2 + 3 are represented internally in a form close to Lisp's s-expressions, and the programmer can work with this internal representation directly. Try typing +(1, 2, 3) at the REPL prompt.
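
Doing that, and then poking at the quoted form of the same expression, shows the structure (head and args are fields of Julia's Expr type):

    julia> +(1, 2, 3)
    6

    julia> ex = :(1 + 2 + 3)
    :(1 + 2 + 3)

    julia> ex.head, ex.args
    (:call, Any[:+, 1, 2, 3])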

I won't go into macros in detail in this article, but you should know that they give you the power to use the language to rewrite its own syntax and to create new language features. If you want Julia to have, say, a kind of control flow not yet provided in the language, perhaps an until keyword, you can create it yourself with a macro, as sketched below. This is a level of power most commonly seen in the Lisp family of languages, though there are other languages with robust macro support.
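
As a rough sketch of what such a macro could look like (this is an illustration, not a built-in Julia construct), here is an @until macro that runs its body repeatedly until a condition becomes true, followed by a small test in a let block:

    macro until(condition, body)
        # Build an expression that runs the body, then checks the condition;
        # esc() keeps the caller's variables visible inside the generated code.
        quote
            while true
                $(esc(body))
                if $(esc(condition))
                    break
                end
            end
        end
    end

    julia> let i = 0
               @until i == 3 begin
                   i += 1
                   println(i)
               end
           end
    1
    2
    3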

In Julia, you define a macro with a code block using the keyword macro and invoke it with the syntax @macroname(). Here is a very simple and practically useless example, to give you a general idea of how the machinery works:

    macro dblefun(f, x)
	return :( $f($f($x)) )
    end

This block defines a macro called dblefun that takes a function f and composes it once with itself, applying the resulting doubled function to the second argument x. The :( ... ) syntax defines an "expression object", which is used to "quote" what's inside the parentheses without evaluating it. Inside an expression object, the "$" is used to interpolate a value, in this case from the arguments supplied to the macro. We can verify that this macro works as intended from the REPL:

    julia> @dblefun(sin, π/2)
    0.8414709848078965

    julia> sin(sin(π/2))
    0.8414709848078965

The power of macros comes from the ability to use the entire language to manipulate expressions inside the macro definition, which allows Julia to rewrite its own syntax.

Distributed computing

As befits a language designed for high-performance numeric programming, Julia has provisions for parallel and distributed computing. You don't need to reach for third-party libraries for this, as the capability is built into the language and its standard library. I'll try to provide a bird's-eye overview of the landscape. The official documentation covers this area in considerable detail.

Julia provides keywords and functions that allow the programmer to define coroutines, which are functions that communicate with each other through "channels". You can tell your functions to send data through a channel, or to wait for data to appear on a channel. Functions are automatically suspended and resumed as data on the channel becomes available. The coroutine mechanism is ideal for situations where the execution time of a function is unpredictable; for example, using coroutines your program can do something else while waiting for data from the internet.
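
A minimal sketch of the mechanism (the channel size and the squaring producer are arbitrary choices for illustration):

    # A coroutine fills the channel while the main program consumes it.
    c = Channel{Int}(4)        # a channel that can buffer up to four integers

    @async begin               # start the producer as a coroutine
        for i in 1:4
            put!(c, i^2)       # send a value through the channel
        end
        close(c)               # signal that no more data is coming
    end

    for v in c                 # this loop waits for values as they arrive
        println(v)             # prints 1, 4, 9, 16
    end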

Julia also makes it simple to perform parallel computation, either on multiple local compute cores or on networked machines. The same code can be used in both situations; whether multiple cores, multiple machines, or a combination of both are used just depends on the arguments used when invoking Julia. If you start the REPL with julia -p N then it will be started with N worker processes, typically set equal to the number of CPU cores on your machine. This argument also automatically loads the Distributed module, which makes the parallel processing commands and macros available.

One of these is a parallel map() command, called pmap(). The regular map() command works as in other languages in which it appears; map(f, a) applies the function f() over the array a, returning an array of results of the same shape as a. The parallel version is pmap(f, a), which automatically distributes the work over the available worker processes. There are a handful of other commands and macros, such as @distributed and @spawn, which ship out work to available processing resources for parallel execution, and fetch(), which gathers the results.
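
A minimal sketch, assuming Julia was started with julia -p 4 (slow_square is just a stand-in for work worth doing in parallel):

    julia> @everywhere slow_square(x) = (sleep(1); x^2)

    julia> pmap(slow_square, 1:8)
    8-element Array{Int64,1}:
      1
      4
      9
     16
     25
     36
     49
     64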

If you invoke Julia with the argument --machine-file filename rather than -p N, then all of the parallel code will run on the networked machines listed in filename (using all the CPU cores on each machine) transparently, assuming that key-based SSH logins have been set up on each machine. The machine file can be as simple as a list of hostnames. Of course, the machines can be nodes on a supercomputing cluster, or a set of heterogeneous servers around the world. Using this mechanism, without changing any of the code, a program can be run on a wide variety of parallel computing environments.

Parting words

Julia goes a long way toward solving the "two-language problem", since it is quick to develop in, while producing fast, native code. Its drawbacks are that it is not well suited to system scripting, because of a somewhat slow startup time; occasional sluggish response at the REPL prompt while the JIT compiler does its thing; an ecosystem that, while growing rapidly, can not yet compete with, for example, Python's; and an at-times idiosyncratic syntax that is not to everyone's taste.

Considering its youth, and the well-established alternatives available, Julia has seen an impressive degree of adoption by a wide variety of users. As the Julia team pointed out, Julia is already in use at more than 700 universities and has become part of the curriculum at many of them. A few years ago I speculated that Julia would eventually supplant Fortran as the language of choice for large-scale simulations and other demanding numerical applications. With the release of version 1.0 and Julia's rapidly increasing adoption, I'm feeling pretty sanguine about my prediction.



An introduction to the Julia language, part 2

Posted Sep 5, 2018 7:18 UTC (Wed) by rsidd (subscriber, #2582) [Link] (8 responses)

Unlike much of Julia, the "chaining" of functions (which I hadn't yet encountered)
f(x) |> g |> h == h(g(f(x)))
seems non-intuitive and contrary to mathematical notation to me. Usually "function composition" is written left to right, ie

h ◦ g ◦ f(x) ≡ h(g(f(x)))

and Haskell, for example, follows that with dot-composition -- you would write

h . g . f (x)
in Haskell.

An introduction to the Julia language, part 2

Posted Sep 5, 2018 7:33 UTC (Wed) by karkhaz (subscriber, #99844) [Link]

Ocaml also uses the left-associative |> notation. It's a nice homage to the POSIX shell pipe, i.e. piping output from one function into another. But it's certainly strange to see that notation in a language aimed more at scientists and mathematicians, who probably won't have seen a shell pipe anyway, and where they're likely to be more familiar with right-associative composition.

An introduction to the Julia language, part 2

Posted Sep 5, 2018 12:08 UTC (Wed) by bpearlmutter (subscriber, #14693) [Link] (1 responses)

In Haskell, >>= goes the other way.

An introduction to the Julia language, part 2

Posted Sep 17, 2018 15:04 UTC (Mon) by nybble41 (subscriber, #55106) [Link]

A closer parallel for right-to-left function composition would be (>>>) from Control.Category rather than monadic bind (>>=):

(f >>> g >>> h) x == h (g (f x))

However, this is not quite the same as the Julia code, which was using flipped function application (Haskell: Data.Function.&), not composition:

f x & g & h == h (g (f x))

The Julia |> operator matches the syntax for flipped function application in F#.

An introduction to the Julia language, part 2

Posted Sep 5, 2018 13:52 UTC (Wed) by leephillips (subscriber, #100450) [Link]

It works the same way as the threading macro in Clojure: https://clojure.org/guides/threading_macros (where, because it's a lisp, you only need to write the operator once).

The idea is more that you are sending data through a pipeline, rather than composing functions (although, of course, it's the same thing).


An introduction to the Julia language, part 2

Posted Sep 5, 2018 14:01 UTC (Wed) by leephillips (subscriber, #100450) [Link]

By the way, a function composition operator was recently added to Julia:

(f ∘ g)(x) == f(g(x))

You can get it by typing \circ<tab> in the REPL if you want.

An introduction to the Julia language, part 2

Posted Sep 5, 2018 15:06 UTC (Wed) by epithumia (subscriber, #23370) [Link] (2 responses)

I think this is one of those things where there isn't true consistency of notation. For example, in undergrad algebra I learned function composition in the order that Julia does it. Wikipedia even has a section about the notational disconnect: https://en.wikipedia.org/wiki/Function_composition#Altern...

An introduction to the Julia language, part 2

Posted Sep 8, 2018 19:44 UTC (Sat) by jrn (subscriber, #64214) [Link] (1 responses)

Just curious: what textbook did you use for undergrad algebra? E.g. was it Topics in Algebra by Herstein?

An introduction to the Julia language, part 2

Posted Oct 23, 2018 23:36 UTC (Tue) by epithumia (subscriber, #23370) [Link]

I took two years of undergrad algebra. We did use Herstein's text, and Hungerford's, and some random other things that I can't even remember, plus a ream of hand-typed notes from one professor. (And this was the 90s, so typewriters were getting pretty rare by then. He did not use an electric typewriter.) Given that this professor was eclectic enough to refuse to acknowledge daylight savings time, we got to see all sorts of interesting notational issues.

An introduction to the Julia language, part 2

Posted Sep 5, 2018 19:12 UTC (Wed) by mtaht (subscriber, #11087) [Link]

I sat down to learn a bit of julia last night due to this article. It's pretty straightforward...

but it's not clear to me how well the threading model works - the (possibly out of date) doc claims it's a single cpu only as yet....

An introduction to the Julia language, part 2

Posted Sep 8, 2018 0:44 UTC (Sat) by quietbritishjim (subscriber, #114117) [Link] (1 responses)

The multiple dispatch ability sounds like overloaded functions in C++. The example given would be like overloading these two functions:

template <class T>
auto pdiff(T a, T b) -> decltype(b - a);
std::string pdiff(std::string a, std::string b);

An introduction to the Julia language, part 2

Posted Sep 8, 2018 2:45 UTC (Sat) by Cyberax (✭ supporter ✭, #52523) [Link]

Except in Julia it works in runtime.

Multiple dispatch

Posted Sep 17, 2018 10:16 UTC (Mon) by oldtomas (guest, #72579) [Link]

To counter academic amnesia a bit, I'd like to point out that multiple dispatch is in CLOS, the standard object system in Common Lisp (at least Guile Scheme inherits this). Sometime around 1990.

Also, R has multiple dispatch since its "S4" object system, in the whereabouts of 1998.


Copyright © 2018, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds