Julia 1.9 brings more speed and convenience
Version 1.9 of Julia, an open-source programming language popular in scientific computing, was released in early May. There are a number of interesting new features this time around, including more work addressing startup-time complaints and several improvements to the package system. Beyond that, there are a few interesting features from the Julia 1.8 release to catch up on.
Julia is a general-purpose programming language which is just-in-time (JIT) compiled by the LLVM compiler. Since its public release in 2012, it has rapidly been adopted for scientific research, due to execution speed similar to Fortran combined with the convenience of REPL-based development. Julia has an expressive syntax as well as a high degree of composability of library code.
Julia 1.8
Our last detailed description of a Julia release looked at version 1.7, so we have a little catching up to do. Version 1.8 brought some changes that merit at least a quick summary. The one new feature in that release that had the greatest potential effect on writing Julia programs was the appearance of typed globals.
It's possible to write an unnecessarily slow program in any language, and Julia is no exception. The key to creating efficient programs in Julia is to write type-stable code. One of the conditions for a type-stable program is that the compiler can infer the return type of any function just from the types of its arguments in a call. In practice this is no great burden on the programmer, who only needs to keep in mind a few simple and generally intuitive guidelines. One of the first guidelines that the Julia programmer absorbs is never to use non-constant global variables (i.e. those not declared with const).
It's easy to see why a non-constant global can create type instability: since the type of the variable could potentially change from an assignment elsewhere in the program, any function that uses it may have a return type that cannot be inferred. Version 1.8 allows declaring the type of a global variable, so that its value can be changed, but not its type. Programmers who want to use global variables can now do so without a performance penalty.
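A minimal sketch of how a typed global is declared and used (the names here are purely illustrative):

counter::Int = 0          # a global whose type is fixed to Int

function bump!()
    global counter += 1   # the compiler knows counter is always an Int
    return counter
end

bump!()                   # returns 1
# counter = "ten"         # error: the type of counter cannot be changed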
Another 1.8 change involves Julia's structs, which are similar to structs in C; they are the basic mechanism for creating user-defined types. Before version 1.8, they came in two distinct species: mutable and immutable structs. Once an instance of the latter is created, the values of its fields cannot be changed. A plain struct declaration creates an immutable struct, while a mutable struct declaration is for the mutable variety. Julia version 1.8 introduced the ability to mix the characteristics of both kinds of struct, by declaring some of its fields to be const:
mutable struct MS
    a::Int
    const b::Int
end

thing = MS(17, 43)
This creates a mutable struct called MS and a variable thing of that type. The value of the first field can be changed, but not the second: thing.b = 18 is an error.
That covers the changes that are most relevant to daily use of the language. Most of the other significant new features in version 1.8 are related to performance, profiling, and the package system.
"Time to first x"
Precompilation delays when loading packages, which increase the latency until a loaded function can be used, are a common source of complaints from new Julia users. This is called the "time to first x" problem, often in the form of "time to first plot", referring to the delay between importing a package and seeing the result returned by one of its functions. The nature of Julia's type system and its "just ahead of time" compilation make some of this latency inevitable; it's a small price to pay for the ability to program with user-defined types, dynamic dispatch, macros, and the rest of Julia's toolbox, while still ending up with fast code operating on primitive types.
Nevertheless, any improvement in the interactive experience is welcome. Our article measuring the decreased latency in Julia version 1.6 noted the distinct improvement over previous releases. Work on various strategies for making Julia more responsive has continued, leading to a big improvement in version 1.9. See this section of the release highlights for tables and graphs of loading times for a few large, popular packages.
A big portion of this improvement is due to the arrival of native code caching. Now, after initial precompilation, the resulting machine code is cached. The consequences are longer initial precompilation times and more space needed for the cached files, but greatly reduced time to first x in subsequent sessions. Package authors can also ship their code with cached, precompiled routines. Julia users can now start a REPL, load Plots, and see their first graph in a couple of seconds. Users can also create personal "startup packages" with a set of dependent packages and compiled, cached routines relevant to their typical workflows.
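Package authors typically generate those cached routines by running a representative workload during precompilation; one common approach uses the PrecompileTools package (a separate package, not part of Julia itself). A rough sketch, with placeholder names (MyPackage, heavy_function):

module MyPackage

using PrecompileTools: @setup_workload, @compile_workload

heavy_function(x) = sum(abs2, x)   # stand-in for real package functionality

@setup_workload begin
    data = rand(100)               # setup code runs at precompile time
    @compile_workload begin
        heavy_function(data)       # calls made here are compiled and cached
    end
end

end # module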
More options when adding packages
The package system now provides more flexibility when adding and upgrading packages. In the REPL's package mode, the add Package command, without any options, will install the latest compatible version of the package called Package into the active environment. This may require dependencies to be upgraded to maintain a consistent environment, which in turn will require sometimes lengthy precompilation.
Julia 1.9 provides an option to prefer, when adding a package to the active environment, versions that are already installed on the machine. The package system is aware of everything installed across all of the environments, so it can share resources. For example, if several projects all use the same version of the Plots package, it is not downloaded and compiled for each project, but only once; all the projects use the same files. This means many projects can be created without worrying about using up additional disk space.
To use this option, enter:
pkg> add --preserve=tiered_installed Package
The system will try to use installed versions first, and only download new ones if required to satisfy dependencies. There is also an option that forbids the system from installing new versions even when they would be required, making it give up instead:
pkg> add --preserve=installed Package
I find this option useful to save time when I need a function from a package that I know I've installed and don't care about getting the latest version.
Tools for package authors
Packages developed for other people to use should avoid having unnecessary dependent packages. Dragging along unneeded dependencies obligates users to install them when they load the package and increases the chances of conflicts. At times, when developing a package, a programmer may notice that a dependency has been added to its manifest, but not understand why it would be needed. A new package system command, why, has been added in version 1.9:
(RBCOceananigans) pkg> why AbstractFFTs
  Oceananigans → CUDA → AbstractFFTs
  Oceananigans → CUDAKernels → CUDA → AbstractFFTs
  Oceananigans → CubedSphere → FFTW → AbstractFFTs
  Oceananigans → FFTW → AbstractFFTs
  Oceananigans → PencilFFTs → AbstractFFTs
  Oceananigans → PencilFFTs → FFTW → AbstractFFTs
In this REPL excerpt I'm in package mode with the "RBCOceananigans" environment activated; this is a project I'm working on that uses the Oceananigans fluid dynamics package. I noticed from precompilation messages that my package uses AbstractFFTs, but I didn't know why that was needed. The why command tells me that AbstractFFTs is used by some other packages that Oceananigans needs.
I can't do anything about that, since Oceananigans is essential to the project. However, if the dependency were a heavy or troublesome module that was only needed because I had included an inessential package, the why command would help me ferret this out. Perhaps I could extract what I was using from the guilty package instead of including the whole thing, and thus snip that branch of the dependency tree.
I'm impressed by Julia's package system, but didn't fully appreciate an inherent flaw until I contemplated the advantages brought by the new package extensions feature. Glancing again at the output of the why command above, we see that Oceananigans depends, for example, on CUDA. This is a package for working on graphical processing units (GPUs), which can be useful for accelerating fluid simulations. It's great that Oceananigans has support for GPUs, but if the user doesn't plan on using one (or one from the right manufacturer), it's dead weight. CUDA is a big package that takes several minutes to download and precompile. This time is added to the already significant time for Oceananigans itself and its other dependencies.
The package-extension mechanism allows the developer to segregate the parts of the (in this case) Oceananigans code that actually require CUDA into a separate module. The extension module is just source code, shipped with the main Oceananigans module code (and any other extensions). CUDA is removed from the dependency list, so when Oceananigans is installed, it does not come along for the ride. Those who really want to put their calculations on a GPU would install CUDA, which would then trigger the loading of the extension module that uses it. Extension loading can also be triggered by the installation of a specified set of packages into the environment.
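On the package author's side, the mechanism is driven by the package's Project.toml, which lists CUDA under a [weakdeps] section and maps an extension-module name to it under [extensions]; the extension itself is an ordinary module kept in the package's ext/ directory. A rough sketch of such a module, with hypothetical names (MyPackage, accelerate):

# ext/MyPackageCUDAExt.jl -- loaded automatically only when CUDA is installed
module MyPackageCUDAExt

using MyPackage   # the parent package
using CUDA        # the weak dependency that triggers loading this extension

# GPU-specific methods are added to functions the parent package already defines
function MyPackage.accelerate(a::CUDA.CuArray)
    # GPU implementation would go here
end

end # module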
This gives users more control over what packages are installed and avoids loading module code that will never be used. These benefits require package authors to reorganize their libraries into main and extension modules, which will take some time, but is already happening with some large and popular packages. To see what extensions are available for the installed packages, the package status command has a new flag:
(RBCOceananigans) pkg> status -e
Project RBCOceananigans v0.1.0
Status [...]
  [91a5bcdd] Plots v1.38.12
              ├─ FileIOExt [FileIO]
              ├─ UnitfulExt [Unitful]
              ├─ GeometryBasicsExt [GeometryBasics]
              ├─ IJuliaExt [IJulia]
              └─ ImageInTerminalExt [ImageInTerminal]
The -e flag means that I only want to see information about extensions. Here, I've learned that, of the packages installed in my project, only Plots comes with extensions. The output shows the names of the available module extensions and, in square brackets, which packages they depend on. Plots will load faster for me if I do not use some of its optional features, because I can avoid installing and precompiling unneeded packages.
Numbered REPL prompt
A new option in the REPL changes its familiar prompt to numbered red and green input and output prompts, in affectionate imitation of IPython. The figure below shows the result of activating the numbered prompt.
The feature is turned on by calling REPL.numbered_prompt!() from the REPL package. As the figure above shows, previous returned results are available by indexing the Out vector using the displayed prompt numbers. The special REPL modes and their prompts are unchanged, so the package, help, and shell modes are unaffected by activating the numbered prompt. A convenient feature of the Julia REPL is that the user can paste text copied from the shell history or any other source, and any prompts or returned results that are part of the pasted content are automatically stripped away, leaving only the actual input commands. This feature still works with numbered prompts turned on: In and Out prompts are deleted from pasted code.
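A short session with the numbered prompt turned on looks roughly like this:

julia> using REPL

julia> REPL.numbered_prompt!()

In [1]: 2 + 2
Out[1]: 4

In [2]: Out[1] * 10
Out[2]: 40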
Interactive tasks
The new interactive tasks feature in Julia is my favorite from the new release, because it enables a convenient mode of working through the REPL or in other interactive contexts. Starting in version 1.9 Julia can be started with the flag -t m[,n] to create two "threadpools", a normal one with m threads and, optionally, an interactive one with n threads. See my concurrency in Julia article for an introduction to the use of threads and tasks in the language.
After invoking Julia with this flag, tasks started with the Threads.@spawn macro will be confined to the "normal" threads. They're started on one of the m threads and may not migrate to any of the "interactive" threads. Tasks launched with Threads.@spawn :interactive are assigned to one of the interactive threads and given priority in scheduling. To prevent them from starving the other threads, these interactive tasks should be written so that they yield frequently. Since a common application for interactive tasks is for user interaction, yielding will occur as a matter of course, since waiting for input causes an automatic yield.
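As a sketch of how the two pools are used (assuming Julia was started with something like julia -t 2,1):

# Compute-heavy work stays on the default threadpool
worker = Threads.@spawn begin
    s = 0.0
    for i in 1:10^9
        s += sin(i)
    end
    s
end

# A task on the interactive threadpool; sleep() yields, so it stays
# responsive without starving the default threads
monitor = Threads.@spawn :interactive begin
    for i in 1:5
        println("still responsive after $i seconds")
        sleep(1)
    end
end

wait(monitor)
println("sum = ", fetch(worker))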
To test this new capability, I launched the REPL on a two-core machine, first with julia -t 2. Then I spawned a half-dozen compute-heavy tasks that yielded infrequently. As expected, the response in the REPL became sluggish, taking anywhere from one to several seconds to return a prompt after I pressed Enter. I repeated the experiment after invoking Julia with julia -t 1,1. After spawning the same half-dozen CPU-hungry tasks, the CPU meters showed 100% utilization, but on only one thread. This time the REPL, which was provided with an interactive thread, continued to respond instantly; the churning of the other tasks had no effect on it.
The ability to launch computations "in the background" and continue development work undisturbed is particularly convenient. Other uses that suggest themselves include web applications, GUI programs, and anything else that must respond to the user in real time.
Other improvements
Aside from the major improvements described above, the new release brings a handful of other useful new features. We can now give Julia a "heap size hint" on startup, defining a limit above which the garbage collector will try harder to reclaim memory. The hint is passed through a flag:
$ julia --heap-size-hint=2G
This example suggests a limit of two gigabytes; as the name implies, it's a hint rather than a hard cap.
The default sorting algorithm has been replaced by a faster one. In a simple test of sorting arrays of 10⁸ native floats and ints, I found that version 1.9 was twice as fast as version 1.8, but also allocated twice as much memory to carry out the sort.
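A minimal version of that kind of measurement (the exact numbers will vary by machine and Julia version):

x = rand(10^8);    # 10^8 random Float64 values
@time sort(x);     # prints elapsed time and memory allocated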
For some time Julia has provided both a @fastmath macro and a --math-mode=fast startup flag to turn on the fastmath LLVM option. These perform floating-point arithmetic more quickly, but less predictably and accurately. Fortran and C compilers have similar options. The macro marks individual blocks for the fastmath treatment, while the flag applies it to the whole program. In the current version the startup flag has been disabled (it's accepted but doesn't do anything, so deployment scripts need not be changed), due to unacceptably inaccurate results from some functions. Those with the courage to use fastmath must decide where it should be applied.
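Here is a sketch of applying the macro to a single expression, leaving the rest of the program's arithmetic untouched (the function is just an illustration):

function sumsq(v)
    s = zero(eltype(v))
    for x in v
        @fastmath s += x * x   # only this expression gets the fastmath treatment
    end
    return s
end

sumsq(rand(1000))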
Until now, standard-library packages could not be upgraded separately from the Julia version. The new release experiments with upgradable standard-library packages, which are distributed with Julia (so they can be used without installation, as now), but are otherwise treated just as "normal" packages. As of now, only the DelimitedFiles package, a small library for reading and writing arrays as delimited text files, is getting this treatment, but, if the experiment succeeds, upgradable standard-library packages may become the norm.
This one is rather esoteric (I haven't tried it), but seems to be important to some people: starting with version 1.9, Julia libraries can be called from multi-threaded C++ programs. This article shows how the feature can be used. Beyond that, support for the half-precision (Float16) floating-point hardware present on the Apple M-series computers has been added for those who want the enhanced performance and can tolerate the loss of precision.
Conclusion
Since version 1.0, all of the changes to the Julia language have been backward-compatible, so any programs that worked with the first public release will work with this latest version, aside from possible package incompatibilities. On a personal note, I am preparing a code repository for a book that I began writing when Julia was at version 1.6. I'm packaging and testing all of the programs in the book with the current Julia release, which means that Julia's package system updates any dependencies to their latest compatible versions. I'm over halfway through the book, and, so far, all of the programs have worked without modification, with the exception of one specific problem with a plotting backend that's already fixed. This was a pretty good test of the robustness of the package system and the backward compatibility of the language; the result is, for me, relieved surprise.
Rumor has it that version 1.10 is less than a year away, and that it will bring more improvement in package-loading times. Among other features coming in the next release are new and improved mathematical functions in the standard library (for example, fourthroot(), also called ∜); enhanced formatted printing; more useful display of matrices with Rational elements; easier-to-read stack traces; better function dispatch for signatures with Union types; the ability to choose how many threads will be used by the garbage collector; and optional display of per-package precompilation timings. If all of that holds true, it will be another steady, incremental improvement in the language and its ecosystem.
Posted Jun 1, 2023 19:54 UTC (Thu) by fraetor (subscriber, #161147)
I do a lot of work on scientific code in Python, with Fortran or C++ doing the really computationally heavy bits. Would there be a trade-off in using Julia for this kind of use, or does Julia just have the advantage of being 30 years newer?
Posted Jun 2, 2023 3:41 UTC (Fri) by leephillips (subscriber, #100450)
For new projects, there's no advantage in using Fortran now over Julia, and the latter provides a multitude of benefits over Fortran, from interactive development to high level syntax to the ease of composing functions and datatypes from a myriad of public libraries.
Although calling Python from Julia is straightforward, if your project depends heavily on existing Python packages without good equivalents in Julia and does not demand high levels of performance, it may be more convenient to stick with Python.
Your strategy of programming in Python and writing the numerically intensive bits in Fortran or C++ leads to the “two language problem”. For most projects these hybrid programs can be replaced with a shorter, faster, and easier to maintain Julia program (unless, as noted above, you really need some Python package).
Posted Jun 2, 2023 21:59 UTC (Fri) by bartoc (guest, #124262)
Also, Julia has a garbage collector, which you might not want, and which can be aggravating if you're integrating with another language that has a garbage collector. But it won't take up any time if you don't allocate dynamically anyway, and Fortran's memory-management facilities aren't stellar either. Personally, for most scientific code I have found the GC convenient, and it doesn't really cause any problems.
Posted Jun 19, 2023 8:07 UTC (Mon) by motiejus (subscriber, #92837)
The post was originally written in 2016, but has an update from 2022. Worth a read and especially following the hyperlinks.