Digging into Julia's package system
We recently looked at some of the changes and new features arriving with the upcoming version 1.7 release of the Julia programming language. The package system provided by the language makes it easier to explore new language versions, while still preserving multiple versions of various parts of the ecosystem. This flexible system takes care of dependency management, both for writing exploratory code in the REPL and for developing projects or libraries.
The package system
Julia's package system allows users to experiment with new versions of the language or specific packages without having to worry about conflicts or dependency issues. At the moment, I have four versions of Julia on my computer, and have been using and testing them all, with an assortment of packages under active development, and with my own evolving projects, with no difficulties. This is not due to any particular foresight on my part. The package system largely works automatically, resolving the dependency graph by installing and pre-compiling modules as needed.
In an article from a year ago, I traced Julia's popularity in the sciences in part to its unique ability to allow users to combine features from multiple third-party modules. This is enabled by Julia's type system, its use of multiple dispatch, and its optimizing just-ahead-of-time compiler. But it is made convenient by the package system, which removes the pain from managing a diverse library of software; it is similar to the way that Git revolutionized painless branching and merging, which encouraged programmers to be more willing to experiment with features.
After starting the REPL by invoking the Julia binary on the command line, the user may need functionality not provided in the standard library. The using command imports new functions, variables, and data types into the global namespace. For example, if I want to plot something, I need to execute using Plots, after which I can call plot(sin) and see a picture of the sine function.
The plot() function is one of dozens of names exported by the Plots package. A package is simply code (in a module) alongside some other information including its author, version, and a list of other packages that it depends on. A module is an ordinary Julia program containing statements listing the items that it wants to export, with everything wrapped in a module definition that provides its name. We will look deeper into modules later in the article.
The Plots package is under active development. The version that I import needs to be compatible with the version of Julia that I'm running in the REPL. If I start another REPL by running a different version of Julia, I may need a different version of Plots; and that version of Plots may need different versions of the packages that it depends on. Keeping track of these dependencies, and installing the correct versions of packages to resolve them, is the job of the package manager. It's not an add-on system, but an integral part of Julia.
The package system maintains a tree of which versions of packages to use with various versions of Julia by setting up a different environment for each version. These environments are simply directories named after the language versions, containing two files with information about that version's dependencies. Everything is stored in the user's .julia directory.
When the user starts up the REPL, it's in one of these default environments: the one that matches the version of Julia running in the REPL. The two files used for tracking dependencies are Project.toml and Manifest.toml; the file extension stands for "Tom's Obvious, Minimal Language" designed by Tom Preston-Werner. The files are maintained automatically and the user never needs to look at them.
Package manipulations are most conveniently carried out in the REPL "package mode", which is entered by pressing the ] key. The prompt changes to indicate the mode and current environment. After starting version 1.7rc1 and entering package mode, my prompt is (@v1.7) pkg>. The part within the parentheses indicates the environment; environments beginning with "@" are default environments determined by the Julia version. The package manager doesn't distinguish between a release candidate and the released Julia version, but does note the actual current version in Manifest.toml.
Adding a package uses the add command in package mode; for example, add Plots. This kicks off the process of downloading the files from an official repository on GitHub and pre-compiling the code. The package manager will not needlessly duplicate files, saving disk space by sharing resources when possible. The packages installed directly by the user are listed in the Project.toml file, while Manifest.toml records the entire dependency graph of the environment.
After adding Plots to my fresh install of version 1.7rc1, my Project.toml contained the single line for that package, and my Manifest.toml contained 815 lines: Plots pulls in many other packages.
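To make the division of labor between the two files concrete, here is a minimal sketch that parses a Project.toml like the one described above, using the TOML standard library (available since Julia 1.6). The file contents are written out by hand for illustration, and the UUID is the one I believe the registry assigns to Plots; treat both as assumptions rather than a copy of a real file.

```julia
using TOML  # standard library since Julia 1.6

# Hand-written stand-in for the Project.toml of a default environment
# after `add Plots`; the UUID is illustrative.
project_text = """
[deps]
Plots = "91a5bcdd-55d7-5caf-9e0b-520d859cae80"
"""

project = TOML.parse(project_text)

# Project.toml lists only the packages added directly by the user;
# the full dependency graph lives in Manifest.toml.
direct_deps = sort(collect(keys(project["deps"])))
println(direct_deps)
```

The same call to TOML.parsefile() on a real Manifest.toml would return the much larger table that records every transitive dependency.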
The command update <Packagename> searches for new versions and installs them, along, of course, with any updated dependencies. The rm command deletes a package from the list of dependencies in the Project.toml file. It undoes an add command. If other packages in the active environment depend on the package, it will persist in the Manifest.toml file.
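The package-mode commands also have function-call equivalents in the Pkg standard library, which is handy in scripts. The following is a sketch rather than a recipe: it switches to a throwaway environment so that the default one is left alone, and the helper name add_update_remove() is my own invention for illustration. Calling the helper requires network access.

```julia
using Pkg

# Work in a throwaway environment so the default one is left untouched.
Pkg.activate(temp=true)

# Function-call equivalents of the package-mode commands `add`,
# `update`, and `rm`. (Hypothetical helper; calling it needs network access.)
function add_update_remove(name::AbstractString)
    Pkg.add(name)     # pkg> add <name>
    Pkg.update(name)  # pkg> update <name>
    Pkg.rm(name)      # pkg> rm <name>
end

# pkg> status: lists the direct dependencies, of which there are none yet.
Pkg.status()
```

Running add_update_remove("Plots") would then perform the add, update, and rm steps described above against the temporary environment.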
The rm command does not actually delete anything from the filesystem. An automatic garbage-collection process runs, if needed, when package commands are used. It reclaims disk space by purging packages that no other installed package depends on and that haven't been used for over 30 days. If one needs disk space right away, the garbage collector can be called manually and supplied with a time frame other than the 30-day default.
With this system I can effortlessly switch among my installed Julia versions, using any mixture of packages with any of them. Julia's package manager keeps each default environment logically separate, while avoiding file duplication.
Local packages
The package manager can also work with locally developed code and include it in the dependency management system alongside external packages. To see how this works we need to understand more precisely what packages and modules are.
A Julia module is a section of code beginning with the line module <Modulename> and ending with the keyword that terminates all blocks: end. Its purpose is to define a global namespace from which functions and variables can be imported by other programs. Here is a file defining a module called M1:
    module M1
    export plusone
    plusone(x) = x + 1
    a = 17
    end
The export commands list the names that can be used without namespace qualification after this module is imported. Julia doesn't hide anything, though: the caller can access the value of a in this module with M1.a.
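The difference between exported and unexported names can be tried out directly. In this sketch the M1 module is defined inline in the session, rather than loaded from a file, so it is imported with a leading dot; that detail is an artifact of the example, not something the package system requires.

```julia
# The M1 module from the text, defined inline for this sketch
# instead of being loaded from a file.
module M1
export plusone
plusone(x) = x + 1
a = 17
end

# The leading dot means "the module M1 defined in the current scope",
# as opposed to an installed package named M1.
using .M1

println(plusone(1))  # exported, so no qualification is needed
println(M1.a)        # not exported, but still reachable when qualified
```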
An include statement pastes in the file directly, but for portable code reuse we would prefer to be able to simply say using M1 in the REPL or from another program file. For this to work, the module needs to be turned into a package.
A package is a collection of three things: a Project.toml file containing some metadata, such as the identity of the package author; a src directory; and, inside that, a program file like the example above that defines a module. The file should be named after the module; so for the above it should be named M1.jl.
If the M1.jl program uses functions from other packages, it will have dependencies. We can get the package manager to track these dependencies for us, just as it does for our default REPL environments. This is the purpose of the activate command: activate <path> tells the package manager to apply all subsequent commands to the project whose Project.toml file is in path. After activating M1 in this way, we could then issue the command add Plots; it will add Plots as a dependency in M1's Project.toml file and also build a Manifest.toml file to record the dependency graph. If the latest version of Plots has not been downloaded, it will take care of that, too.
A convenient way to begin developing a Julia project is to say (in this example) generate M1 in the REPL package mode. A complete package directory with a skeleton program file defining the M1 module will be created.
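The generate and activate steps can also be driven from ordinary Julia code through the Pkg API. This is a sketch under the assumption that a scratch directory is an acceptable home for the package; the variable name pkg_path is mine.

```julia
using Pkg

# Create the skeleton in a scratch directory rather than the current one.
pkg_path = joinpath(mktempdir(), "M1")
Pkg.generate(pkg_path)  # pkg> generate M1

# The pieces that make up a package: Project.toml, src/, and src/M1.jl.
println(isfile(joinpath(pkg_path, "Project.toml")))
println(isfile(joinpath(pkg_path, "src", "M1.jl")))

# Apply subsequent package commands to the new project.
Pkg.activate(pkg_path)  # pkg> activate <path>
println(Pkg.project().name)
```

With the project active, a following Pkg.add("Plots") would record the dependency in this package's Project.toml rather than in the default environment.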
A colleague can install your package (by cloning it from a Git repository or simply copying the files) and then tell the package manager to recreate its environment. This is done by using the activate command to switch to it; the instantiate command is then used to consult the project's Manifest.toml file and install everything in the dependency graph. The code can now be run in the exact environment in which it was developed.
Tracking local packages
Say we'd like to use our local package in our REPL sessions while keeping track of it as it develops with the package manager, just as we do for external packages such as Plots. If we ask the package manager to add it with add <path>, we get an error:
ERROR: Did not find a git repository at `<path>`
It appears that we can add dependencies to our package, but we're not yet allowed to add our package as a dependency to anything else.
Just as the public Julia ecosystem lives on GitHub, local projects must live in Git repositories if they are to fully participate in the package system. Initializing a Git repository for the package and making an initial commit is all that's required to satisfy Julia's package system. When tracking local projects, the package manager doesn't look at the file tree, but at the Git repository. An add or update command in package mode checks out the files from the local repository and caches them in the .julia/packages directory. By default this tracks the tip of the master branch, but it is also possible to track other branches.
The package manager tracks commits not through the commit hash, however, but through the tree hash. Many Git users are unaware of this hash, because it's rarely needed for anything. The tree hash encodes the actual contents of all the tracked files in the commit; it can be displayed with git log --pretty=raw. Julia's package manager uses this rather than the commit hash because it's more reliable: through rebasing or other Git operations it's possible to break the connection between the commit hash and the file state that it's supposed to represent.
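The distinction between the two hashes is easy to see in a scratch repository. The sketch below assumes that a git executable is on the PATH; it shells out to Git from Julia, and the helper git(), the throwaway identity, and the file contents are all made up for the example. It rewrites a commit message with --amend and compares the hashes before and after.

```julia
# Sketch: assumes a `git` executable is on the PATH.
repo = mktempdir()

# Throwaway identity so the commits do not depend on the user's Git config.
ident = ["-c", "user.name=Example", "-c", "user.email=example@example.invalid"]

# Hypothetical helper: run git inside the scratch repository.
git(args...) = readchomp(`git -C $repo $ident $args`)

write(joinpath(repo, "M1.jl"), "module M1\nend\n")
git("init", "-q")
git("add", "M1.jl")
git("commit", "-q", "-m", "first wording")
commit1 = git("log", "-1", "--pretty=format:%H")  # commit hash
tree1 = git("log", "-1", "--pretty=format:%T")    # tree hash

# Rewrite the commit message only; the tracked files stay the same.
git("commit", "-q", "--amend", "-m", "second wording")
commit2 = git("log", "-1", "--pretty=format:%H")
tree2 = git("log", "-1", "--pretty=format:%T")

println(commit1 == commit2)  # the commit hash changes with the message
println(tree1 == tree2)      # the tree hash depends only on file contents
```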
Contributing to public packages: a case study
For a while I've thought that the Plots package needed a particular feature. This morning I cloned the project to my computer, added the feature, made a pull request on GitHub, made a change suggested by one of the maintainers, and got it approved. The entire elapsed time for this process was about five hours. In this section I'll describe two more package system commands that make it easier to hack on public packages.
In the REPL, I entered package mode, then executed develop Plots. This command, which can be shortened to dev, clones the named package's Git repository to the user's machine in the directory .julia/dev/<PackageName>. Since Plots is a big package with many source files, this took about two minutes.
This command also alters the environment so that using Plots imports from the version under development, rather than the official version. The command free Plots returns to using the official version. One can switch back and forth between these two incarnations of the package freely, as subsequent dev commands won't download anything, but simply switch back to the development version.
I entered the development directory and created a branch for my feature with the git checkout -b command. The package manager doesn't require this; it's happy to let you mangle the master branch. But I had plans to ask that my feature be merged into master, and needed to create a branch for it. Packages under develop are loaded from the file tree, not from the Git repository.
Then I wanted to edit the function to add my feature. But where is it? Plots has 37 files in its src tree. Because of multiple dispatch, each function can have dozens of methods associated with it, all with the same name. This makes finding a particular method in the source difficult to accomplish with simple grep commands.
The @edit macro comes to the rescue in just this situation. Supplied with a function call, it opens the user's default editor, right in the REPL, to the method definition that the compiler would select for that particular call. For example, there are 224 methods for the + function. One of these adds together a date and a time. If I want to edit, or look at, the method for that, I enter:
@edit +(<date_var>, <time_var>)
The two arguments are any variables with the given data types. Vim (in my case) opens in the REPL with the cursor on the correct method definition. I used this macro to edit the plotting function that I wanted to enhance, and tested that it worked as desired. After that, I made my pull request.
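A non-interactive relative of @edit is the @which macro, which reports the selected method instead of opening an editor; that makes it convenient for a quick check of what @edit would show. The sketch below uses it on the date-plus-time example; the variable names d and t are arbitrary.

```julia
using Dates, InteractiveUtils  # InteractiveUtils is preloaded in the REPL

d = Date(2021, 10, 13)
t = Time(16, 35)

# `@which` reports the method that dispatch selects for this call,
# which is the same method that `@edit` would open in an editor.
m = @which d + t
println(m)

# How many methods `+` has depends on the Julia version and on which
# packages are loaded.
println(length(methods(+)))
```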
Although they don't make actual programming any easier, the dev and free commands, along with the @edit macro, remove some of the barriers that stand between the possibly intimidated programmer and a large codebase.
The final essential tool is a package called Revise.jl, which indeed is listed under the "Essential Tools" heading on the Julia home page, along with the debugger and profiler. With this package imported into the REPL, every time the programmer makes an edit to code under development, the new versions of any changed functions are immediately made active. No manual re-importing nor restarting of the REPL is required, saving time and making the development experience more interactive and fluid.
The culture around the Julia development community is a big factor in encouraging new contributors. Many of the contributors to Julia packages are domain experts in areas of science or mathematics, rather than professional programmers; they are interested in improving the tools that they use in their research. Repository maintainers are welcoming and helpful, rather than dismissive of imperfect code. This generous attitude is evident in the discussions attached to pull requests on GitHub.
Modern software development gains much of its power by building solutions on top of imported code, but with this power often comes headaches when a change in some library breaks something that was working yesterday. Julia's package system eliminates most of these headaches.
In a follow-up article, we'll provide an overview of parallel and concurrent computing in Julia. It will look at multithreading, heterogeneous computing, facilities for using GPUs, and task-based parallelism, including an important improvement in multithreading behavior coming in version 1.7.
Index entries for this article | |
---|---|
GuestArticles | Phillips, Lee |
Posted Oct 13, 2021 16:35 UTC (Wed)
by willy (subscriber, #9762)
[Link] (7 responses)
Posted Oct 13, 2021 16:47 UTC (Wed)
by leephillips (subscriber, #100450)
[Link] (5 responses)
Posted Oct 13, 2021 17:14 UTC (Wed)
by leephillips (subscriber, #100450)
[Link] (4 responses)
Although Julia package development happens almost entirely on GitHub, the process is more decentralized than with npm, as contributors maintain their own forks as part of the GitHub pull request workflow. So it would not be so simple for one executive of a company to decide to pull down a package.
Every version of every package is identified by a unique UUID within the Manifest and Project files (an implementation detail I did not go into in the article). So switching a dependency on a particular version of a package means changing this identifier in the Manifests of the affected packages. It seems this part of the problem is much more tractable than the situation on npm.
Finally, it’s far less likely that a Julia programmer would create a dependency on a package that does what you can do in one line in Julia. I haven’t come across any public packages that are as trivial as leftpad.
Posted Oct 14, 2021 11:14 UTC (Thu)
by azumanga (subscriber, #90158)
[Link] (3 responses)
Saying every package which has some dependency could "switch to a new version" doesn't feel helpful, you could do that in npm too if you like. I'm not really clear why it would be easier for Julia than it would be for Javascript.
I'm surprised Julia didn't do what Rust did -- there packages in the "package repository" are stored centrally, and unless there is a very serious issue released packages can never be removed. You can disable versions (by 'yanking' them), but users can still get those versions by specifying exact version number.
Posted Oct 14, 2021 12:32 UTC (Thu)
by Wol (subscriber, #4433)
[Link] (1 responses)
What do you mean by "centrally"? If you mean "on the net somewhere", what happens if that (for various meanings of "that") goes down?
Or is that repository mirrored (should you so choose) on your machine, so you can ALWAYS re-install that package if you need? iiuc gentoo downloads everything, and while I've deliberately configured my system to forget it, I think it's easy enough to change that so it keeps it ...
Cheers,
Wol
Posted Oct 17, 2021 7:26 UTC (Sun)
by roc (subscriber, #30627)
[Link]
Posted Oct 14, 2021 12:56 UTC (Thu)
by leephillips (subscriber, #100450)
[Link]
Posted Oct 28, 2021 19:31 UTC (Thu)
by StefanKarpinski (guest, #155008)
[Link]
Posted Oct 15, 2021 2:04 UTC (Fri)
by droundy (subscriber, #4559)
[Link]
Posted Oct 23, 2021 22:12 UTC (Sat)
by garrison (subscriber, #39220)
[Link] (1 responses)
One thing that might interest LWN readers is the great effort the Julia community has put into making packages that Just Work, even if they depend on code built in another language. Long ago, when a user would install a package, Julia would detect which Linux distribution it is running on, then run the suitable sudo apt-get install command (or equivalent) so that its dependencies would be available. This turned out to be brittle -- too much subtle breakage here and there, not to mention the requirement of root access. Because of these issues, Julia 1.3 introduced an artifacts system that allows binary dependencies to be cross-compiled and distributed. The Yggdrasil repository contains recipes for cross compiling these binary dependencies -- the procedure is much like packaging a library for any other Linux distribution. While I was somewhat skeptical when this approach was introduced (their motivation sounds like a classic case of NIH to me), I must admit that it works remarkably well in practice. Years later, I rely on this system daily and it works flawlessly.
Posted Oct 23, 2021 22:18 UTC (Sat)
by garrison (subscriber, #39220)
[Link]
Posted Nov 24, 2021 8:15 UTC (Wed)
by Lawless-M (guest, #155377)
[Link]
You can use any Git Clone method available to you, for instance I pull from my company's privately hosted BitBucket repo.
@v1.6] add ssh://git@mirror.bitbucket.intranet:7999/~myname/code.git
You can also use file paths - even windows ones!
@v1.6] add \\windowsUNC\packages\code.jl
With Distributed you can even install packages on remote machines
julia> @everywhere using Pkg
julia> @everywhere Pkg.add(url="ssh://git@mirror.bitbucket.intranet:7999/~myname/code.git")
Private repos