An introduction to the Julia language, part 2
Part 1 of this series introduced the Julia project's goals and development process, along with the language syntax, including the basics of control flow, data types, and, in more detail, how to work with arrays. This part describes user-defined functions and the central concept of multiple dispatch. It also surveys Julia's module and package system, covers some syntax features, shows how to make plots, and briefly dips into macros and distributed computing.
Multiple dispatch
Many high-level languages come with a built-in opinion about how you should organize your code. You are free to ignore these opinions, but that involves going against the grain and failing to take full advantage of the language's features. For example, Python is a class-based object-oriented language, where code tends to be organized around classes that inherit from other classes; the Lisp family encourages the programmer to use macros and small, composable functions to create a "domain specific language" suitable to the problem at hand; APL programmers express their problems as operations on entire arrays and are loath to write loops. Julia is organized around a principle different from these. To learn what that is, we first need to learn how functions are created.
Define functions using the function keyword:
    julia> function pdiff(a, b)
               if b > a
                   return b - a
               else
                   return 0
               end
           end
    pdiff (generic function with 1 method)
Notice the slightly odd message returned by the read-eval-print loop (REPL) after it digests the function definition. If you keep the REPL open and type in a new function definition, such as:
    function pdiff(a::String, b::String)
        if length(b) > length(a)
            return b[length(a)+1:end]
        else
            return ""
        end
    end
Then you will get a slightly different message back:
pdiff (generic function with 2 methods)
In our second definition, we've specified the types of the arguments: both of them must be a String. In the original definition, the types are left unspecified. We now have two versions (or "methods") of the generic function called pdiff(). When we invoke the pdiff() function, the compiler will use the method most specific to the types of the arguments passed. If they are both strings, this will be the second version:
    julia> pdiff(3, 9)
    6
    julia> pdiff("xx", "abcdef")
    "cdef"
The compiler will always examine the types of all the arguments passed to a function and choose the method definition that is most specific to those types. This behavior is so important to Julia's design that its creators consider it the central organizing principle of the language. It's called "multiple dispatch", referring to the fact that the types of all the arguments determine the method called, not merely the first. Other object-oriented languages dispatch based on other attributes, such as the class of an object being referenced.
Multiple dispatch brings function definition in Julia closer to mathematical thinking, where the definition of a function or operator depends on the types of all of its arguments, rather than just some of them. It also gives the programmer a way to organize the ideas in their code; it is the way Julia itself (which is mainly written in Julia) is organized internally. Multiple dispatch allows using the same name for related operations, each with its own implementation under the hood.
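To make the "all of the arguments" point concrete, here is a minimal sketch. The function name combine() and its four methods are invented for illustration and appear nowhere in Julia itself; the point is only that changing the type of the second argument changes which method runs.

```julia
# Hypothetical generic function: the method chosen depends on BOTH argument types.
combine(a::Number, b::Number)  = a + b     # numeric addition
combine(a::String, b::String)  = a * b     # string concatenation
combine(a::String, b::Integer) = a ^ b     # repeat the string b times
combine(a, b) = (a, b)                     # fallback for any other pair of types

r1 = combine(1, 2.5)        # (Number, Number)
r2 = combine("ab", "cd")    # (String, String)
r3 = combine("ab", 3)       # (String, Integer): the second argument selects this method
r4 = combine(3, "ab")       # no specific method matches, so the fallback is used
```

Swapping the order of the arguments in the last two calls is enough to land on a different method, which a language dispatching on the first argument alone could not express as directly.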
For example, you might write a distance() function that calculates the absolute value of the difference between two numbers. You could use the same function name to operate on two one-dimensional arrays, interpreted as vectors, returning the Pythagorean distance between them. Later, you could add more methods to calculate the distance between two manifolds, two nodes on a network, or two samples of text, using whatever definitions are useful for your particular problem. These more complex cases would be enabled by Julia's type system, where user-defined types can be elaborate collections of other types and are treated the same as fundamental types by the compiler. By thoughtfully combining your own data types with the multiple dispatch mechanism, you can program using high-level concepts natural to your problem domain, without sacrificing efficiency.
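A sketch of how such a distance() function might start out follows. The GridNode type and the "grid steps" definition of distance between nodes are assumptions made up for this example; a real network would likely call for something more elaborate.

```julia
# Distance between two numbers: absolute value of the difference.
distance(a::Number, b::Number) = abs(a - b)

# Distance between two one-dimensional arrays, interpreted as vectors.
function distance(a::AbstractVector, b::AbstractVector)
    length(a) == length(b) || throw(DimensionMismatch("vectors must have equal length"))
    return sqrt(sum(abs2, a .- b))
end

# Hypothetical user-defined type: a node placed on a rectangular grid.
struct GridNode
    row::Int
    col::Int
end

# One possible definition for nodes: the number of grid steps between them.
distance(a::GridNode, b::GridNode) = abs(a.row - b.row) + abs(a.col - b.col)

d1 = distance(3, 7.5)                           # numbers
d2 = distance([0, 0], [3, 4])                   # vectors
d3 = distance(GridNode(1, 1), GridNode(4, 5))   # user-defined type
```

Code that calls distance() does not need to know which of these methods it will get; adding a method for a new type later requires no change to existing callers.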
As mentioned above, you don't need to know much about Julia's type system to use the language productively. However, you should understand multiple dispatch and how to specify types in function signatures to create different methods. The methods() function, typed into the REPL, will list all the methods defined under a particular name and their type signatures. This works for operators as well, which are defined as functions: try typing methods(*) to see the list of 376 versions of the multiplication operator. This is how "*" can be used to multiply integers and floats, concatenate strings, perform matrix multiplication, and serve any other purpose that is conceptually related to multiplication.
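The same inspection can be done from code rather than at the REPL. In this sketch halve() is a made-up function used only to have something small to inspect; the count for "*" is deliberately not hard-coded, since it changes from one Julia version to the next.

```julia
# A small generic function with two methods (hypothetical name).
halve(x::Integer) = x ÷ 2
halve(x::AbstractFloat) = x / 2

ms = methods(halve)          # the methods and their type signatures
n  = length(ms)              # how many methods exist under this name

# Operators are ordinary generic functions, so they can be inspected the same way.
nmul = length(methods(*))

# which() reports the method that dispatch would select for the given argument types.
m = which(halve, (Float64,))
```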
Modules and packages
In my opinion, Julia's package and module system is a major selling point for the language, so it's worthwhile to go into it in a little detail.
If you've maintained software projects of any complexity that depend on external libraries, you've probably run into some version of "dependency hell". Upgrading the language or any external library on the system, or installing the project on a different machine, may cause the program to stop working and lead to hours of unproductive work tracking down incompatibilities and bugs. Python addresses this problem through virtual environments and package managers, which, until recently, were third-party projects. Another approach is Docker, which isolates an application together with its entire userspace in a container, each with its own filesystem and network interfaces; this is overkill for most users.
Julia solves the dependency problem through several mechanisms built into the language. A project, which is a directory tree with code and other assets, can include a manifest that details all of the external resources used by the code, including the version numbers. All these requirements can be automatically put in place whenever needed.
Code can load external packages, which are projects designed to export code resources. Functions for export/import are stored in modules, which are another type of named code block within the package. Unlike many other languages, there is no relationship between module names and filenames: a module can be split among many files and a file can contain many modules. A package can contain any number of modules, but must have at least one if it wants to make functions available for importing.
If a package is not already installed on your system, you can get it by calling Pkg.add("packagename"). This will download the necessary files from the official registry and install them. To use the functions from a module included with the package, type using modulename. This will import the functions marked for export in the module and precompile them. Their bare names will then be available in the local namespace. If, instead, you issue the statement import modulename, you will then use its exported functions under names like modulename.function(). Pkg itself is part of the standard library; before you use it you need to say using Pkg.
The REPL has a special mode for manipulating packages, which you enter by typing "]". In Pkg mode, you can simply type, for example, add packagename.
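The difference between using and import can be tried without installing anything. The sketch below defines a small module inline (Shapes and its functions are hypothetical names) and also loads Statistics, which ships with Julia; installing a registry package with Pkg.add() is left out because it needs network access.

```julia
# A hypothetical module; in a real package it would live in the package's source files.
module Shapes

export area              # names marked for export become available after `using`

area(r) = π * r^2
perimeter(r) = 2π * r    # not exported: reachable only with the module prefix

end

using .Shapes            # the leading dot refers to a module defined in the current scope
a = area(1.0)            # exported name, usable bare
p = Shapes.perimeter(1.0)

import Statistics        # `import` brings in only the module name itself
m = Statistics.mean([1, 2, 3])
```

With import, every call carries the module prefix, which keeps the local namespace clean at the cost of some extra typing.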
The Julia community has created over 1900 packages. You can explore the official registry, divided into categories, at the Julia Observer site. These projects range from mature and polished to unfinished experiments and cover a wide variety of fields. Unfortunately, many of these packages do not work at the moment and will not work until they are upgraded to conform to the current language version. As the Julia team mentioned in email, there is a wide array of packages that have been developed for Julia, many of which are far outside the mainstream numerical/scientific application domain. These include a web framework, code for scripting Minecraft, Sudoku-as-a-service, a music manipulation library, and more.
Other features for the numericist
Julia is replete with features that make the life of the numerical scientist more convenient. It includes all the usual math functions; special functions (Bessel, Hankel, Airy, etc.) are provided by an external library, statistical functions are part of the standard library, and so on.
Julia can work with complex numbers, using im for the imaginary unit. Note that sqrt() is not defined for negative numbers unless they are complex, as shown here:
    sqrt(Complex(-4)) == 0.0 + 2.0im
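A slightly longer sketch of the same behavior, including what happens when the argument is a plain negative real number (the variable names are arbitrary):

```julia
z = sqrt(Complex(-4))        # complex input, complex result
w = (1 + 2im) * (3 - 1im)    # ordinary arithmetic works on complex values
r = abs(3 + 4im)             # modulus of a complex number

# For a real argument, sqrt() throws a DomainError rather than returning a complex value.
threw = try
    sqrt(-4)
    false
catch err
    err isa DomainError
end
```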
Julia allows writing nested loops without explicitly nested blocks, enabling more concise, less deeply indented code that more closely resembles the way summations are written on paper:
    for i = 1:2, j = 3:4
        println((i, j))
    end
This block produces the output:
    (1, 3)
    (1, 4)
    (2, 3)
    (2, 4)
Another piece of syntactic convenience is the pipe operator, |>, which allows the chaining of functions from left to right without a lot of parentheses:
    f(x) |> g |> h == h(g(f(x)))
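Here is the same idea with concrete functions; double() and increment() are throwaway names defined only for this sketch. The last line uses the related composition operator ∘ for comparison, which builds a new function instead of passing a value along.

```julia
double(x) = 2x
increment(x) = x + 1

piped  = 5 |> double |> increment |> string    # reads left to right
nested = string(increment(double(5)))          # the same calls, nested

# Related but distinct: ∘ composes functions, applied right to left.
composed = (string ∘ increment ∘ double)(5)
```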
Technical computing often involves some type of plotting of the results. The Julia community has embraced a plotting "metapackage" called "Plots". The idea behind it is to present a unified, powerful plotting language to the user that is independent of the actual backend. There are multiple backends for the package, including ones for Matplotlib, the Plotly JavaScript plotting library, the LaTeX drawing system PGF/TikZ, and more. There is also a particularly nice backend that draws plots right on the console, called UnicodePlots. Before using Plots for the first time, you may need to add the package with Pkg.add("Plots"); most of the backends are in separate packages, as well.
One of the aims of Plots is to be intuitive and "smart", with the ability to figure out the plot that you want. Whether or not it has achieved this lofty goal, it is easy to use, and the ability to switch plotting backends without changing your code is a nice feature. The backend is selected in an interactive session by calling a function made of its name transformed to all lowercase. Before plotting, we must also import the Plots module with a using command:
    julia> using Plots
    julia> plotly()
    Plots.PlotlyBackend()
    julia> plot(sin, 0, 2π)
After you enter this at the default REPL prompt, your web browser will open a new tab containing the plot, which has some interactive controls for scaling and panning that appear upon hovering.
Here is how to draw the same plot with a different backend:
There are some rough edges at the moment: my attempt at using the Matplotlib backend "PyPlot" led to a segmentation fault with the REPL and failed to produce a plot from the Jupyter console, leaving the console in an unusable state. Also, the first plot that you make in a session using a particular backend takes a good long while to appear, but subsequent plots are fairly quick.
If you want to use a plotting system other than the ones that interface with Plots, there are many options. For example, there is an interface to gnuplot and a pure Julia graphics package called Gadfly.
Macros
Julia has extensive support for metaprogramming, including full, Lisp-like macros. In fact, Julia itself is quite Lisp-like under the hood. Expressions like 1 + 2 + 3 are represented internally in a way closer to s-expressions, and the user can use this internal representation. Try typing +(1, 2, 3) at the REPL prompt.
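The internal representation can be examined directly by quoting an expression instead of evaluating it. This is a small sketch of that; the variable names are arbitrary.

```julia
ex = :(1 + 2 + 3)        # quote the expression instead of evaluating it

head = ex.head           # the kind of expression; here, a function call
args = ex.args           # the function name followed by its arguments

value  = eval(ex)        # evaluate the quoted expression
direct = +(1, 2, 3)      # call the operator in prefix form
```

The head-plus-arguments layout is what makes the comparison with s-expressions apt: the infix form is only surface syntax for a call to the function named +.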
I won't go into macros in detail in this article, but you should know that they give you the power to use the language to rewrite its own syntax and to create new language features. If you want Julia to have, say, a kind of control flow not yet provided in the language, perhaps an until keyword, you can create it yourself with a macro. This is a level of power most commonly seen in the Lisp family of languages, though there are other languages with robust macro support.
In Julia, you define a macro with a code block using the keyword macro and invoke it with the syntax @macroname(). Here is a very simple and practically useless example, to give you a general idea of how the machinery works:
    macro dblefun(f, x)
        return :( $f($f($x)) )
    end
This block defines a macro called dblefun, which takes a function f and composes it once with itself, applying the resulting doubled function to the second argument x. The :( ...) syntax defines an "expression object", which is used to "quote" what's inside the parentheses without evaluating it. Inside an expression object, the "$" is used to interpolate a value, in this case from the arguments supplied to the macro. We can verify that this macro works as intended from the REPL:
    julia> @dblefun(sin, π/2)
    0.8414709848078965
    julia> sin(sin(π/2))
    0.8414709848078965
The power of macros comes from the ability to use the entire language to manipulate expressions inside the macro definition, which allows Julia to rewrite its own syntax.
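As a rough sketch of the until idea mentioned earlier, the macro below rewrites its arguments into an ordinary while loop with the condition negated. The names @until and count_up() are hypothetical, and this is one possible way to write it rather than an established idiom; esc() is used so that the condition and body refer to the caller's variables.

```julia
# Hypothetical macro: repeat `body` until `cond` becomes true.
macro until(cond, body)
    quote
        while !($(esc(cond)))    # loop while the condition is still false
            $(esc(body))
        end
    end
end

# The macro is used inside a function so that `n` is an ordinary local variable.
function count_up(limit)
    n = 0
    @until n >= limit begin
        n += 1
    end
    return n
end

result = count_up(5)
```

The macro never runs the loop itself; it only returns the rewritten expression, which the compiler then treats as if it had been typed in place of the macro call.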
Distributed computing
As befits a language designed for high-performance numeric programming, Julia has provisions for parallel and distributed computing. You don't need to reach for third-party libraries for this, as the capability is built into the language or uses the standard library. I'll try to provide a bird's-eye overview of the landscape. The official documentation covers this area in considerable detail.
Julia provides keywords and functions that allow the programmer to define coroutines, which are functions that communicate with each other through "channels". You can tell your functions to send data through a channel, or to wait for data to appear on a channel. Functions are automatically suspended and resumed as data on the channel becomes available. The coroutine mechanism is ideal for situations where the execution time of a function is unpredictable; for example, using coroutines your program can do something else while waiting for data from the internet.
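A minimal sketch of a producer communicating through a channel follows. The function name producer() and the values it sends are made up for the example; Channel(), put!(), and take!() are the standard Base functions.

```julia
# Producer: puts values on the channel; it is suspended until someone takes each one.
function producer(ch::Channel)
    for i in 1:3
        put!(ch, i^2)
    end
end

# Channel(f) runs f as a task bound to the channel and closes the channel when f returns.
ch = Channel(producer)

first_value = take!(ch)      # waits until the producer has supplied a value
remaining   = collect(ch)    # drains the channel until it is closed
```

Because the channel here is unbuffered, the producer and the consumer take turns: each put!() waits for a matching take!(), which is the automatic suspend-and-resume behavior described above.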
Julia also makes it simple to perform parallel computation, either on multiple local compute cores or on networked machines. The same code can be used in both situations; whether multiple cores, multiple machines, or a combination of both are used just depends on the arguments used when invoking Julia. If you start the REPL with julia -p N then it will be started with N worker processes; N should be set equal to the number of CPU threads on your machine. This argument also automatically loads a module that makes the parallel processing commands and macros available.
One of these is a parallel map() command, called pmap(). The regular map() command works as in other languages in which it appears; map(f, a) applies the function f() over the array a, returning an array of results of the same shape as a. The parallel version is pmap(f, a), which automatically distributes the work over the available worker processes. There are a handful of other commands and macros, such as @distributed() and @spawn(), which ship out work to available processing resources for parallel execution, and fetch(), which gathers the results.
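The following sketch shows these pieces together in a script rather than a REPL started with -p. It assumes a Julia recent enough (1.3 or later) to accept @spawnat :any, the newer spelling recommended in place of @spawn, and it adds two local workers itself if none were requested on the command line.

```julia
using Distributed

# Add two local worker processes if none were requested with `julia -p N`.
nprocs() == 1 && addprocs(2)

# pmap applies the function to each element, spreading the calls over the workers.
squares = pmap(x -> x^2, 1:5)

# @distributed splits the loop range among the workers; (+) combines the partial results.
total = @distributed (+) for i in 1:100
    i^2
end

# @spawnat :any runs an expression on some available worker; fetch() waits for the result.
future = @spawnat :any sum(1:10)
answer = fetch(future)
```

For work items this small the communication overhead outweighs any gain; the pattern pays off when each call to the mapped function takes a substantial amount of time.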
If you invoke Julia with the argument --machine-file filename rather than -p N, then all of the parallel code will run on the networked machines listed in filename (using all the CPU cores on each machine) transparently, assuming that key-based SSH logins have been set up on each machine. The machine file can be as simple as a list of hostnames. Of course, the machines can be nodes on a supercomputing cluster, or a set of heterogeneous servers around the world. Using this mechanism, without changing any of the code, a program can be run on a wide variety of parallel computing environments.
Parting words
Julia goes a long way toward solving the "two-language problem", since it is quick to develop in, while producing fast, native code. Its drawbacks are a somewhat slow startup time, which makes it poorly suited to system scripting; occasional sluggish response at the REPL prompt while the JIT compiler does its thing; an ecosystem that, while growing rapidly, cannot yet compete with, for example, Python's; and an at-times idiosyncratic syntax that is not to everyone's taste.
Considering its youth, and the well-established alternatives available, Julia has seen an impressive degree of adoption by a wide variety of users. As the Julia team pointed out, Julia is already in use at more than 700 universities and has become part of the curriculum at many of them. A few years ago I speculated that Julia would eventually supplant Fortran as the language of choice for large-scale simulations and other demanding numerical applications. With the release of version 1.0 and Julia's rapidly increasing adoption, I'm feeling pretty sanguine about my prediction.
Index entries for this article | |
---|---|
GuestArticles | Phillips, Lee |
Posted Sep 5, 2018 7:18 UTC (Wed)
by rsidd (subscriber, #2582)
h ◦ g ◦ f(x) ≡ h(g(f(x)))
and Haskell, for example, follows that with dot-composition -- you would write
Posted Sep 5, 2018 7:33 UTC (Wed)
by karkhaz (subscriber, #99844)
Posted Sep 5, 2018 12:08 UTC (Wed)
by bpearlmutter (subscriber, #14693)
Posted Sep 17, 2018 15:04 UTC (Mon)
by nybble41 (subscriber, #55106)
(f >>> g >>> h) x == h (g (f x))
However, this is not quite the same as the Julia code, which was using flipped function application (Haskell: Data.Function.&), not composition:
f x & g & h == h (g (f x))
The Julia |> operator matches the syntax for flipped function application in F#.
Posted Sep 5, 2018 13:52 UTC (Wed)
by leephillips (subscriber, #100450)
The idea is more that you are sending data through a pipeline, rather than composing functions (although, of course, it's the same thing).
Posted Sep 5, 2018 14:01 UTC (Wed)
by leephillips (subscriber, #100450)
(f ∘ g)(x) == f(g(x))
You can get it by typing \circ<tab> in the REPL if you want.
Posted Sep 5, 2018 15:06 UTC (Wed)
by epithumia (subscriber, #23370)
Posted Sep 8, 2018 19:44 UTC (Sat)
by jrn (subscriber, #64214)
Posted Oct 23, 2018 23:36 UTC (Tue)
by epithumia (subscriber, #23370)
Posted Sep 5, 2018 19:12 UTC (Wed)
by mtaht (subscriber, #11087)
but it's not clear to me how well the threading model works - the (possibly out of date) doc claims it's a single cpu only as yet....
Posted Sep 8, 2018 0:44 UTC (Sat)
by quietbritishjim (subscriber, #114117)
template <class T>
Posted Sep 8, 2018 2:45 UTC (Sat)
by Cyberax (✭ supporter ✭, #52523)
Posted Sep 17, 2018 10:16 UTC (Mon)
by oldtomas (guest, #72579)
Also, R has multiple dispatch since its "S4" object system, in the whereabouts of 1998.
Unlike much of Julia, the "chaining" of functions (which I hadn't yet encountered)
f(x) |> g |> h == h(g(f(x)))
seems non-intuitive and contrary to mathematical notation to me. Usually "function composition" is written left to right, ie
h . g . f (x)
in Haskell.
decltype(b-a) pdiff(T a, T b);
string pdiff(string a, string b);