
Julia 1.6 addresses latency issues

May 25, 2021

This article was contributed by Lee Phillips

On March 24, version 1.6.0 of the Julia programming language was released. It is the first "feature release" since 1.0 came out in 2018; in the terminology of the Julia development process, that means a release organized around a specific set of features rather than the project's regular schedule of timed releases. The new release significantly reduces the "time to first plot", a common source of dissatisfaction for newcomers to the language, by parallelizing pre-compilation, downloading packages more efficiently, and reducing the frequency of just-in-time (JIT) re-compilation at run time.

The detailed list of new features, added functions, improvements to existing functions, and so on can be found in the release notes. The focus of this article will be the changes that affect all users of Julia, rather than those that only apply to certain packages or usage patterns.

The perennial complaint

Despite Julia's success as a vehicle for high-performance computing, there has been one persistent complaint on the part of those who try it out for the first time. This complaint is so common that it is referred to by a standard tag in discussions about Julia performance: the "time to first plot" problem. It refers to the fact that while typing julia at the terminal would result in the read-eval-print loop (REPL) prompt appearing instantly, one had to endure several minutes of loading and pre-compiling before plotting something for the first time. Although subsequent plots are speedy, this was still a pain point for many users who are familiar with one or more of Python, Octave, Matlab, gnuplot, etc., where the "first plot" appears without a noticeable delay.

The reasons behind the wait for the first plot are, perhaps paradoxically, also the reasons for Julia's ability to achieve the execution speed of C and Fortran while also providing the interactive experience of an interpreted language. But Julia does its compilation step as part of an interactive session, which is where the wait comes in.

It is worth noting that the membership of the "Petaflop club", those languages that have reached a performance of one petaflops (10^15 floating-point operations per second) on real-world problems, consists of Fortran, C, C++, and Julia. The precocious Julia was admitted to this group before it had reached version 1.0, with an astronomical calculation using over a million threads on a Cray XC40. All of the languages in this club are ahead-of-time (AOT) compiled, except Julia. They achieve their performance as the result of efficient machine code created by optimizing compilers, a process that can take significant time.

Julia is different. The journey from source code to machine instructions has multiple stages. The first stage (after downloading), which happens when a package is installed, is pre-compilation, where parsed and serialized representations of some functions and data structures are stored on disk. This is what can cause delays of several minutes when installing or upgrading a package, and adds to the time to first plot.

The other major stage that can lead to a noticeable lag is the just-in-time (JIT) compilation phase. The Julia JIT compiler is different from tracing JIT compilers such as LuaJIT or PyPy, which trace the execution of code and create highly optimized versions of time-consuming functions or loops. The Julia version is sometimes called a "just-barely-ahead-of-time" compiler. Rather than trace execution, it performs a static analysis of the code. This happens at run time and can cause pauses in execution when, for example, functions are invoked for the first time.

This type of JIT compiler is a consequence of Julia's type system and its organization around multiple dispatch. Each generic function in the source can have hundreds of methods associated with it. The compiler does not generate machine code for every possible method, but only for those actually used, which is generally not known until run time. This is why there is a delay the first time plot() is invoked in the REPL, but not the second time. It is also why invoking plot() after importing other packages sometimes incurs additional latency: the new functions may cause method invalidations, requiring methods that were not needed previously to be compiled.
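The first-call effect is easy to see in the REPL with any small function (a toy sketch, not from the article; f is an arbitrary name):

```julia
# A toy example: the first call to a function with a given argument
# type triggers compilation for that type; later calls with the same
# type reuse the cached machine code.
f(x) = sum(x .^ 2)

@time f(rand(1000))   # first call: dominated by JIT compilation
@time f(rand(1000))   # second call: far faster, no compilation
```

Calling f with a different argument type, say a vector of integers, triggers another round of compilation for the new method instance.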

For those used to compiled languages, such as Fortran, this was never a cause for concern. We are used to having to compile our programs before running them. But since Julia has been widely described as something like a better Python, many people curious enough to look into it only had experience with interpreted languages. Julia offers the user a REPL like any other interpreted language, and then expects you to wait a minute after you ask it to do something. For many, this was jarring; thus, the perennial complaint.

Timing Julia 1.6

The new version of Julia makes good progress toward eliminating some of the latency that led to these complaints. It significantly reduces pre-compilation time, is faster at downloading package files, and reduces the frequency of run-time JIT compilation delays. Due to the factors discussed above, latency in Julia can never be completely eliminated, but the current version creates a noticeably snappier interactive experience. Pre-compilation times in 1.6 are reduced mainly through the use of all available cores to carry out concurrent compilation of modules. Therefore, the more cores available, the greater the reduction in pre-compilation latency (to a point, of course).

The 1.6 release was followed five weeks later by a "patch release" tagged v1.6.1. This minor release adds no new features, but fixes a few bugs and applies several optimizations, in addition to some subtle enhancements, for example in the display of stacktraces.

Below are the results of some timing trials comparing version 1.6.1 with versions 1.5.0 and 1.0.5; the latter is the current long-term support (LTS) release, from September 2019. The next LTS release may be one of the 1.6 series, or it may be deferred to the 1.7 series; that decision will be made later. The machine on which I carried out these experiments has an Intel Core i5 processor with two physical and four logical cores, and 4GB of RAM. The table below compares the timings for the three Julia versions. I measured all times with a stopwatch, as this was the most reliable way to get actual wall-clock times, which are the times that affect the user experience.

Julia version | install + plot | download time   | pre-compile | start + plot | Plots version
1.0.5         | 417s           | 103s (5.1 MB/s) | 315s        | 30.0s        | 1.4.3
1.5.0         | 599s           | 230s (3.1 MB/s) | 367s        | 38.5s        | 1.15.0
1.6.1         | 306s           | 123s (5.1 MB/s) | 170s        | 19.2s        | 1.15.0

The first column shows the Julia version. The second column, "install + plot", shows the total time from entering the command:

    using Pkg; Pkg.add("Plots"); using Plots; plot(rand(100))
until seeing the plot. It was timed on a fresh install of the given Julia version, with no other packages installed. (The initial using Pkg is nearly instantaneous and was not included in the timings.)

The next two columns separate the times into downloading and pre-compiling times. The numbers in these columns do not always add exactly to the numbers in the second column, for several reasons. I had to depend on the display in the REPL to discern when different phases were in progress; the way this information is displayed varies greatly among the three versions in the table. In addition, there are delays during which it is not clear what is happening; the REPL indicates neither downloading nor pre-compiling. These delays are presumably JIT passes, and account for much of the 13-second discrepancy in the last row.

Download time measurements are necessarily subject to variations in my network conditions at the time, but I minimized this by averaging two installs for each version, interleaving the downloads among versions, and conducting all the timings within a span of three hours. I wiped out my ~/.julia directory before each of these timing runs, so each version started with a blank slate.

The "start + plot" column shows the amount of time it takes for subsequent plots to appear; after restarting the REPL, the following command is timed:

    using Plots; plot(rand(100))
Once the first plot has been performed in a given REPL session, additional simple plots will be nearly instantaneous. The other times that are shown are for actions that do not occur frequently; packages are installed once, and not again until a new version is desired. The "start + plot" times, in contrast, affect the startup time of any script that uses Plots, and influence the responsiveness of interactive work when the REPL needs to be restarted repeatedly. As the table shows, this time is cut about in half in the new version.

Version differences

The times across different Julia versions cannot be compared meaningfully without taking account of a few other facts. As seen in the final column, the version of the package fetched with the add command is different for version 1.0.5. This earlier version of Plots is simpler and smaller. Fetching Plots downloads 523MB for Julia 1.0.5, 716MB for 1.5.0, and 629MB for the current version. Although the version of the Plots package is the same for the two recent Julia versions, the size of the payload is significantly different; while Plots is the same, the versions of some packages in its dependency graph may have changed. The more recent versions of Julia also download a larger number of smaller files, which may account for the observed faster average download speed of 1.0.5 compared with 1.5.0.

The significant improvement in download speed over version 1.5.0 is presumably due to an improved system for file transfers. In previous versions, when fetching a package with the add command, the Julia packaging system would search for a binary installed on the system that was capable of downloading files from URLs: curl, wget, etc. This approach was fragile, inconsistent, and slow, since each download launched a new process that had to connect and negotiate TLS anew.

The new system uses libcurl for all downloads. It works through Downloads.jl, a new package in the standard library that exposes two functions, one for downloading files and the other for making HTTP requests. Now all of the files that make up a package installation, which can be a large number for a complex package, are downloaded in-process over connections that can be reused.
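The file-downloading entry point looks like this (a minimal sketch; a local file:// URL is used here so the example does not depend on the network, but real use would pass an https URL):

```julia
using Downloads

# Write a small local file, then fetch it back through libcurl via a
# file:// URL; Downloads.download(url, output) returns the output path.
src = tempname()
write(src, "hello")
dest = Downloads.download("file://" * src, tempname())
println(read(dest, String))   # prints "hello"

# The second entry point, Downloads.request, serves for making HTTP
# requests and returns a response with status and headers.
```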

Watching the CPU meters during the compilation phase confirms that the new version uses all available cores, and that previous versions use one core only. It also shows that the concurrent compilation in the new version seems to be confined to simultaneous compilation of different packages; when there was one package left, Plots itself, the work to complete that job stayed on one core. This suggests a possible opportunity for further speedups, by concurrently compiling functions within packages.

The use of concurrent compilation means that the speedup here will depend on the number of cores available and also, to some extent, on how the package is organized. Here is a report of compiling and loading the DifferentialEquations package, which I explored in a recent article, that shows version 1.6 taking about one minute to compile and load, while version 1.5 took eight minutes.

The reduction in the "start + plot" times can be attributed to improvements in the scheme for method invalidation that reduce the frequency of JIT compilation. These improvements make package loading through the using command faster, and reduce latency overall.

Sysimages

One may wonder: if packages need to be pre-compiled, how is it possible for the REPL to start up instantly? After all, when that prompt appears, a slew of functions are already available, including the large and complex REPL system itself, along with its subsystems for documentation, package management, and more.

This is possible because all of that, along with other commonly used functions, is already compiled and included in a "sysimage": the binary that is loaded when you type julia. For a while now, the ability to create custom sysimages has been exposed to the user. So someone who routinely uses the Plots package along with, say, the Statistics package, could create a custom sysimage that already has these things pre-compiled. It will start up instantly, with all of the plotting and statistics functions ready to use.

A drawback to using sysimages is that the package versions compiled into them are, of course, frozen. Upgrading any of the ingredients of a sysimage means manually creating a new one. For those who want to learn how to do this, the PackageCompiler.jl package contains all of the functions required for creating sysimages; it comes with a manual complete with practical examples. A sysimage is not only useful for eliminating latency; it is the way to distribute Julia executable programs to those without the Julia runtime installed.
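Building such a sysimage is a short script (a minimal sketch, assuming PackageCompiler.jl, Plots, and Statistics are installed in the active project; the output filename is arbitrary):

```julia
# Bake Plots and Statistics into a custom sysimage; this is a lengthy
# one-time build step, since both packages are fully compiled.
using PackageCompiler

create_sysimage([:Plots, :Statistics];
                sysimage_path="custom_sysimage.so")
```

Starting Julia afterward with julia --sysimage custom_sysimage.so loads both packages with no pre-compilation or using delay.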

Final words

Since my last article here about Julia six months ago, the language has continued to increase its footprint in the world of high-performance and scientific computing. It has also undergone significant development, both in the language itself and in the ecosystem of packages for numerical, mathematical, and scientific applications.

Installation of Julia is simple. The latest releases can be found on the download page, as tarfiles for various operating systems and architectures. Simply download the appropriate file and unpack it, then create a link to the Julia binary somewhere in the executable path. After I expanded the tarfile for 1.6.1, I found that the installation occupied 393MB. Julia is also commonly found in the package managers of Linux distributions, but the versions there may be older than desired, depending on the distribution; for example, the version packaged with Debian 10 is 1.0.3, but Arch Linux is completely up to date.

Julia appears to have become established as one of a small handful of languages that are serious contenders for large-scale number-crunching projects. I'd like to highlight two interesting recent research initiatives with Julia at the core.

CliMA, the Climate Modeling Alliance, consists of researchers from Caltech, MIT, the Naval Postgraduate School, and NASA's Jet Propulsion Laboratory, who have joined to create an Earth-system simulation integrating data from multiple sources, including satellites and ground sensors. All of the code will be open source; simulation results and predictions will also be available to the public.

The system is based on a collection of Julia packages that are developed on GitHub. These range from the general-purpose fluid dynamics solver Oceananigans that I played with in this article, to VizCLIMA, which is a specialized tool for visualizing the results of CliMA simulations. One exciting thing about this type of open-source science is that anyone can check out current versions of any of these packages and calculate with them, as well as contribute improvements and bug fixes.

Julia Computing, a company established to create products and offer consulting to support the use of the language, has teamed with quantum-computing startup QuEra on a DARPA contract to apply machine learning to microelectronic system design. This project uses Julia code to train a reduced model of an electronic circuit that can be, potentially, several orders of magnitude faster to run than a detailed simulation, while remaining accurate. QuEra's goal is to acquire tools to design the control electronics for their quantum computers, a task for which they have found existing circuit-simulation technology inadequate.

The success of Julia in the scientific-computing sphere is an important development at the intersection of science, engineering, and free software (in the free speech sense). Until the advent of Julia, the only programming language with comparable influence and ubiquity in the science world was Fortran.

While there are capable free compilers for Fortran, large-scale computations using the language are usually performed using proprietary compilers created by chip manufacturers such as Intel. This is because these compilers squeeze the best performance out of particular CPUs. In contrast, the LLVM compiler used by Julia is a free-software project. Also, there is a strong tradition within the community of scientists using Julia to expose the code used in research to scrutiny, usually on GitHub. These are healthy developments for science, where, in the interests of transparency and reproducibility, ideally there should be no black boxes.

With this new release, Julia is easier than ever to get started with for those interested in exploring whether it might be suitable for their work or their play. On the timescale defined by Fortran, it's still a young language. And as Fortran has continuously evolved since the 1950s in response to the needs of its users, Julia will undoubtedly evolve as well, in directions that are unpredictable. It already occupies a unique position as a language that is both friendly to use as an interactive calculator and capable of running the most demanding number-crunching applications. Even for those not involved with scientific computing, I recommend looking into Julia as an example of interesting language and system design.


Index entries for this article
GuestArticles: Phillips, Lee



Julia 1.6 addresses latency issues

Posted May 25, 2021 21:04 UTC (Tue) by leephillips (subscriber, #100450) [Link]

Hello, readers. This article is being discussed on Hacker News today: https://news.ycombinator.com/item?id=27280174

A reader there was confused, with some justification, about the description of the releases in the first couple of sentences. I should have explained the non-obvious terminology used in the Julia development process. v.1.6 is the first release since 1.0 that was not on the regular schedule of "timed" releases, but organized around a specific set of features instead; hence, a "feature release". This is explained in the link from the first sentence, but those details should have been brought into the article for clarity.

Main issues

Posted May 25, 2021 22:06 UTC (Tue) by summentier (subscriber, #100638) [Link]

I like Julia, and really feel it is coming together nicely. The speed improvements in 1.6 certainly help.

My main complaints are what I consider design "inaccuracies".

For example, the iterator interface is based around an iterate method, which takes the previous state as an optional parameter and returns a tagged union: either a pair (value, state) or a special "end tag" (nothing). This breaks type stability, thereby creating artificial problems for the compiler to optimize away and confusing its own analysis tools like @code_warntype.

The fact that you need two different methods, depending on whether you are at the beginning or in the middle of the iteration, and that you always need to check the union, makes writing custom iterators a complete mess. Just look at Julia's Iterators standard library, which is filled with ugly hacks and workarounds.
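For reference, the protocol the comment describes looks like this for a minimal custom iterator (a toy sketch; Countdown is an invented type, and giving the state a default value collapses the begin/continue cases into one method definition):

```julia
# A toy iterator: iterate returns either a (value, state) pair or
# nothing, the "end tag" the comment refers to.
struct Countdown
    start::Int
end

Base.iterate(c::Countdown, n = c.start) = n > 0 ? (n, n - 1) : nothing
Base.length(c::Countdown) = max(c.start, 0)
Base.eltype(::Type{Countdown}) = Int

collect(Countdown(3))   # returns [3, 2, 1]
```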

C++, not exactly known for its ease of writing iterators, beats this hands down: you have a type-stable iterator state moving between a known begin() and end(), and you check for iterator completion by simply comparing with end(). In the rare case where the end state is not known, you can still use a tagged union for the state.

The fact that immutability is such a big part of the language, but there is no way to denote an array as immutable, is also a problem; IIRC the devs want to fix this in 2.0.

Julia 1.6 addresses latency issues

Posted May 26, 2021 9:50 UTC (Wed) by bustervill (guest, #85383) [Link] (2 responses)

"The Julia JIT compiler is different from the typical sort found in, for example, LuaJIT."

This was either a poor choice of words or poorly chosen example. The brainchild of Mike Pall is anything but typical.

Aside from the now obsolete "Tamarin", can you name any other tracing JIT compiler?

[Tamarin](https://en.wikipedia.org/wiki/Tamarin_(software))

Julia 1.6 addresses latency issues

Posted May 26, 2021 12:44 UTC (Wed) by dave_malcolm (subscriber, #15013) [Link]

> Aside from the now obsolete "Tamarin", can you name any other tracing JIT compiler?
PyPy's JIT compiler traces execution through hot loops, effectively inlining both the functions that are called and the implementation of the object types that are in use:
https://rpython.readthedocs.io/en/latest/jit/index.html

Julia 1.6 addresses latency issues

Posted May 26, 2021 13:05 UTC (Wed) by leephillips (subscriber, #100450) [Link]

Good point; not the best phrasing, as I suggest that, typically, JIT compilers do tracing. There are others, though. To combine the themes of Python, speed, and JIT, there is PyPy, a tracing JIT compiler for Python. Probably not as generally fast as LuaJIT, though.

Julia 1.6 addresses latency issues

Posted May 27, 2021 13:11 UTC (Thu) by flussence (guest, #85566) [Link] (1 responses)

Twenty seconds to REPL on a primed precomp cache? And to think I'd been annoyed about Raku's startup time…

Julia 1.6 addresses latency issues

Posted May 27, 2021 14:18 UTC (Thu) by leephillips (subscriber, #100450) [Link]

The REPL starts up instantly.

Julia 1.6 addresses latency issues

Posted Jun 3, 2021 7:43 UTC (Thu) by callegar (guest, #16148) [Link]

When trying Julia at version 1.5, I remember that I was not really that negatively impressed by the long install+plot time (~10 minutes in the chart), as I realized that this was a one-off cost. What I really disliked was the subsequent start+plot time (still ~40 seconds, almost a minute in the chart), which makes interactivity poor in certain education contexts. It is enough for a student to exit the REPL by mistake to leave him or her lagging many minutes behind in the restart. I now see that this time is being halved, which looks promising, and I wonder whether more optimizations in that area are on the way.


Copyright © 2021, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds