
Digging into Julia's package system

October 13, 2021

This article was contributed by Lee Phillips

We recently looked at some of the changes and new features arriving with the upcoming version 1.7 release of the Julia programming language. The package system provided by the language makes it easier to explore new language versions, while still preserving multiple versions of various parts of the ecosystem. This flexible system takes care of dependency management, both for writing exploratory code in the REPL and for developing projects or libraries.

The package system

Julia's package system allows users to experiment with new versions of the language or specific packages without having to worry about conflicts or dependency issues. At the moment, I have four versions of Julia on my computer, and have been using and testing them all, with an assortment of packages under active development, and with my own evolving projects, with no difficulties. This is not due to any particular foresight on my part. The package system largely works automatically, resolving the dependency graph by installing and pre-compiling modules as needed.

In an article from a year ago, I traced Julia's popularity in the sciences in part to its unique ability to allow users to combine features from multiple third-party modules. This is enabled by Julia's type system, its use of multiple dispatch, and its optimizing just-ahead-of-time compiler. But it is made convenient by the package system, which removes the pain from managing a diverse library of software, much as Git made branching and merging painless and so encouraged programmers to experiment more freely with features.

After starting the REPL by invoking the Julia binary on the command line, the user may need functionality not provided in the standard library. The using command imports new functions, variables, and data types into the global namespace. For example, if I want to plot something, I need to execute using Plots, after which I can call plot(sin) and see a picture of the sine function.

The plot() function is one of dozens of names exported by the Plots package. A package is simply code (in a module) alongside some other information including its author, version, and a list of other packages that it depends on. A module is an ordinary Julia program containing statements listing the items that it wants to export, with everything wrapped in a module definition that provides its name. We will look deeper into modules later in the article.

The Plots package is under active development. The version that I import needs to be compatible with the version of Julia that I'm running in the REPL. If I start another REPL by running a different version of Julia, I may need a different version of Plots; and that version of Plots may need different versions of the packages that it depends on. Keeping track of these dependencies, and installing the correct versions of packages to resolve them, is the job of the package manager. It's not an add-on system, but an integral part of Julia.

The package system maintains a tree of which versions of packages to use with various versions of Julia by setting up a different environment for each version. These environments are simply directories named after the language versions, containing two files with information about that version's dependencies. Everything is stored in the user's .julia directory.

When the user starts up the REPL, it's in one of these default environments: the one that matches the version of Julia running in the REPL. The two files used for tracking dependencies are Project.toml and Manifest.toml; the file extension stands for "Tom's Obvious, Minimal Language" designed by Tom Preston-Werner. The files are maintained automatically and the user never needs to look at them.
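As a rough sketch of where these files live, the following prints the location of the default environment for the running Julia version. It assumes the usual layout, in which the first entry of DEPOT_PATH is the user's .julia directory; installations that set a custom depot path will show something different.

```julia
# The first entry of DEPOT_PATH is normally the user's ~/.julia directory.
depot = first(DEPOT_PATH)

# Default environments are named after the major.minor version of Julia.
env_name = "v$(VERSION.major).$(VERSION.minor)"
default_env = joinpath(depot, "environments", env_name)

println("default environment: ", default_env)
println("Project.toml present: ", isfile(joinpath(default_env, "Project.toml")))

# The project file of whatever environment is active right now
# (this can be `nothing` if no environment is active).
println("active project: ", Base.active_project())
```

On a fresh install the Project.toml file may not exist until the first package is added.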

Package manipulations are most conveniently carried out in the REPL "package mode", which is entered by pressing the ] key. The prompt changes to indicate the mode and current environment. After starting version 1.7rc1 and entering package mode, my prompt is (@v1.7) pkg>. The part within the parentheses indicates the environment; environments beginning with "@" are default environments determined by the Julia version. The package manager doesn't distinguish between a release candidate and the released Julia version, but does note the actual current version in Manifest.toml.

Adding a package uses the add command in package mode; for example, add Plots. This kicks off the process of downloading the files from an official repository on GitHub and pre-compiling the code. The package manager will not needlessly duplicate files, saving disk space by sharing resources when possible. The packages installed directly by the user are listed in the Project.toml file, while Manifest.toml records the entire dependency graph of the environment.

After adding Plots to my fresh install of version 1.7rc1, my Project.toml contained the single line for that package, and my Manifest.toml contained 815 lines: Plots pulls in many other packages.
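To give a feel for what that single line looks like, here is a small sketch that parses a Project.toml of this shape with the TOML standard library (available from Julia 1.6). The file contents are written inline for illustration, and the UUID shown should be treated as an example value rather than copied from a real environment.

```julia
using TOML   # standard library since Julia 1.6

# Illustrative contents of a Project.toml after `add Plots`.
# The UUID is an example value; check your own file for the real one.
project_text = """
[deps]
Plots = "91a5bcdd-55d7-5caf-9e0b-520d859cae80"
"""

project = TOML.parse(project_text)

# The [deps] table maps each directly added package to its UUID.
direct_deps = sort(collect(keys(project["deps"])))
println(direct_deps)
```

The Manifest.toml file can be read the same way, but its layout differs between Julia versions, so it is left out of this sketch.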

The command update <Packagename> searches for new versions and installs them, along, of course, with any updated dependencies. The rm command deletes a package from the list of dependencies in the Project.toml file. It undoes an add command. If other packages in the active environment depend on the package, it will persist in the Manifest.toml file.
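The same operations are available as ordinary functions in the Pkg standard library, which is convenient in scripts. The sketch below wraps them in a helper whose name is made up for this example; it is only defined here, not called, because running it needs network access and switches the session to a temporary environment.

```julia
using Pkg

# Hypothetical helper (not part of Pkg): walks through add, update, and rm
# in a throwaway environment so the default environment is left alone.
function try_plots_in_scratch_env()
    Pkg.activate(; temp=true)   # fresh, temporary environment
    Pkg.add("Plots")            # like `add Plots` in package mode
    Pkg.status()                # lists the direct dependencies
    Pkg.update("Plots")         # like `update Plots`
    Pkg.rm("Plots")             # like `rm Plots`; files stay on disk
end
```

Calling try_plots_in_scratch_env() in a REPL should leave the default environment's Project.toml untouched.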

The rm command does not actually delete anything from the filesystem. An automatic garbage-collection process runs, if needed, when package commands are used. It reclaims disk space by purging packages that no other installed package depends on and that haven't been used for over 30 days. If one needs disk space right away, the garbage collector can be called manually and supplied with a time frame other than the 30-day default.
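A manual collection with a different time frame might look like the following sketch. It relies on the collect_delay keyword of Pkg.gc; the helper name is invented for this example, and it is only defined here rather than run, since running it deletes files from the depot.

```julia
using Pkg
using Dates   # provides the Day period type

# Hypothetical helper: collect anything that has been orphaned
# for at least `days` days.
collect_orphans(days::Integer) = Pkg.gc(; collect_delay=Day(days))
```

For example, collect_orphans(0) would ask the garbage collector to reclaim everything that is currently orphaned, regardless of age.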

With this system I can effortlessly switch among my installed Julia versions, using any mixture of packages with any of them. Julia's package manager keeps each default environment logically separate, while avoiding file duplication.

Local packages

The package manager can also work with locally developed code and include it in the dependency management system alongside external packages. To see how this works we need to understand more precisely what packages and modules are.

A Julia module is a section of code beginning with the line module <Modulename> and ending with the keyword that terminates all blocks: end. Its purpose is to define a global namespace from which functions and variables can be imported by other programs. Here is a file defining a module called M1:

    module M1

    export plusone       # usable unqualified after the module is imported
    plusone(x) = x + 1
    a = 17               # not exported, but still reachable as M1.a

    end

The export statement lists the names that can be used without namespace qualification after this module is imported. Julia doesn't hide anything, though: the caller can access the value of a in this module with M1.a.
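As a minimal sketch of how this looks from the caller's side, the module can be defined inline and loaded with a relative using. The leading dot in using .M1 is needed only because the module here is defined in the current session rather than installed as a package.

```julia
# Define the module from the article directly in the current session.
module M1

export plusone
plusone(x) = x + 1
a = 17

end

# The leading dot means "a module defined here", not an installed package.
using .M1

println(plusone(1))   # exported: usable without qualification
println(M1.a)         # not exported: needs the module prefix
```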

An include statement pastes in the file directly, but for portable code reuse we would prefer to be able to simply say using M1 in the REPL or from another program file. For this to work, the module needs to be turned into a package.

A package is a collection of three things: a Project.toml file containing some metadata, such as the identity of the package author; a src directory; and, inside that, a program file like the example above that defines a module. The file should be named after the module; so for the above it should be named M1.jl.

If the M1.jl program uses functions from other packages, it will have dependencies. We can get the package manager to track these dependencies for us, just as it does for our default REPL environments. This is the purpose of the activate command: activate <path> tells the package manager to apply all subsequent commands to the project whose Project.toml file is in path. After activating M1 in this way, we could then issue the command add Plots; this adds Plots as a dependency to M1's Project.toml file and also builds a Manifest.toml file to record the dependency graph. If the latest version of Plots has not been downloaded, the package manager will take care of that, too.

A convenient way to begin developing a Julia project is to say (in this example) generate M1 in the REPL package mode. A complete package directory with a skeleton program file defining the M1 module will be created.
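The functional form of this command is Pkg.generate. The sketch below creates the skeleton in a temporary directory (an arbitrary choice for this example, so that nothing is left behind in the working directory) and checks that the pieces described above are present.

```julia
using Pkg

# Generate the package skeleton inside a scratch directory.
workdir = mktempdir()
pkgdir = joinpath(workdir, "M1")
Pkg.generate(pkgdir)

# The three parts of a package: Project.toml, src/, and src/M1.jl.
println(isfile(joinpath(pkgdir, "Project.toml")))
println(isdir(joinpath(pkgdir, "src")))
println(isfile(joinpath(pkgdir, "src", "M1.jl")))

# The generated metadata (name, UUID, authors, version).
print(read(joinpath(pkgdir, "Project.toml"), String))
```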

A colleague can install your package (by cloning it from a Git repository or simply copying the files) and then tell the package manager to recreate its environment. This is done by using the activate command to switch to it; the instantiate command is then used to consult the project's Manifest.toml file and install everything in the dependency graph. The code can now be run in the exact environment in which it was developed.
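In script form, the colleague's two steps could be sketched as follows. The helper name and its argument are invented for this example; the function is only defined here, since running it downloads whatever the manifest lists.

```julia
using Pkg

# Hypothetical helper: recreate the environment recorded in a project
# directory that contains Project.toml (and ideally Manifest.toml).
function reproduce_environment(project_dir::AbstractString)
    Pkg.activate(project_dir)   # like `activate <path>` in package mode
    Pkg.instantiate()           # install everything in the dependency graph
    Pkg.status()                # show what ended up in the environment
end
```

A call such as reproduce_environment("path/to/M1") would then leave the session in that project's environment.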

Tracking local packages

Say we'd like to use our local package in our REPL sessions while keeping track of it as it develops with the package manager, just as we do for external packages such as Plots. If we ask the package manager to add it with add <path>, we get an error:

    ERROR: Did not find a git repository at `<path>` 

It appears that we can add dependencies to our package, but we're not yet allowed to add our package as a dependency to anything else.

Just as the public Julia ecosystem lives on GitHub, local projects must live in Git repositories if they are to fully participate in the package system. Initializing a Git repository for the package and making an initial commit is all that's required to satisfy Julia's package system. When tracking local projects, the package manager doesn't look at the file tree, but at the Git repository. An add or update command in package mode checks out the files from the local repository and caches them in the .julia/packages directory. By default this tracks the tip of the master branch, but it is also possible to track other branches.

The package manager tracks commits not through the commit hash, however, but through the tree hash. Many Git users are unaware of this hash, because it's rarely needed for anything. The tree hash encodes the actual contents of all the tracked files in the commit; it can be displayed with git log --pretty=raw. Julia's package manager uses this rather than the commit hash because it's more reliable: through rebasing or other Git operations it's possible to break the connection between the commit hash and the file state that it's supposed to represent.
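The difference between the two hashes can be seen with a small experiment, sketched here by driving the git command-line tool from Julia. It assumes git is installed; the repository, file contents, and identity settings are throwaway values made up for the demonstration. Rewording a commit changes its commit hash, while the tree hash stays the same because the files did not change.

```julia
# Scratch repository; everything here is illustrative.
repo = mktempdir()

# Small wrapper that runs git in the scratch repository and returns its output.
git(args...) = readchomp(Cmd(
    `git -c user.name=demo -c user.email=demo@example.invalid -c commit.gpgsign=false $(collect(args))`;
    dir=repo))

write(joinpath(repo, "M1.jl"), "module M1 end\n")
git("init", "--quiet")
git("add", "M1.jl")
git("commit", "--quiet", "-m", "first message")

commit1 = git("rev-parse", "HEAD")
tree1 = git("rev-parse", "HEAD^{tree}")

# Rewrite the commit message without touching any file.
git("commit", "--quiet", "--amend", "-m", "reworded message")

commit2 = git("rev-parse", "HEAD")
tree2 = git("rev-parse", "HEAD^{tree}")

println("commit hash changed: ", commit1 != commit2)
println("tree hash unchanged: ", tree1 == tree2)
```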

Contributing to public packages: a case study

For a while I've thought that the Plots package needed a particular feature. This morning I cloned the project to my computer, added the feature, made a pull request on GitHub, made a change suggested by one of the maintainers, and got it approved. The entire elapsed time for this process was about five hours. In this section I'll describe two more package system commands that make it easier to hack on public packages.

In the REPL, I entered package mode, then executed develop Plots. This command, which can be shortened to dev, clones the named package's Git repository to the user's machine in the directory .julia/dev/<PackageName>. Since Plots is a big package with many source files, this took about two minutes.

This command also alters the environment so that using Plots imports from the version under development, rather than the official version. The command free Plots returns to using the official version. One can switch back and forth between these two incarnations of the package freely, as subsequent dev commands won't download anything, but simply switch back to the development version.
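For scripted use, the two commands correspond to Pkg.develop and Pkg.free. The helper names below are invented for this sketch, and the functions are only defined here because calling them clones repositories and modifies the active environment.

```julia
using Pkg

# Hypothetical helpers mirroring the package-mode commands.
use_dev_checkout(pkg::AbstractString) = Pkg.develop(pkg)   # like `dev Plots`
use_registered(pkg::AbstractString) = Pkg.free(pkg)        # like `free Plots`
```

In a REPL, use_dev_checkout("Plots") followed later by use_registered("Plots") would reproduce the switch described above.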

I entered the development directory and created a branch for my feature with the git checkout -b command. The package manager doesn't require this; it's happy to let you mangle the master branch. But I had plans to ask that my feature be merged into master, and needed to create a branch for it. Packages under develop are loaded from the file tree, not from the Git repository.

Then I wanted to edit the function to add my feature. But where is it? Plots has 37 files in its src tree. Because of multiple dispatch, each function can have dozens of methods associated with it, all with the same name. This makes finding a particular method in the source difficult to accomplish with simple grep commands.

The @edit macro comes to the rescue in just this situation. Supplied with a function call, it opens the user's default editor, right in the REPL, to the method definition that the compiler would select for that particular call. For example, there are 224 methods for the + function. One of these adds together a date and a time. If I want to edit, or look at, the method for that, I enter:

    @edit +(<date_var>, <time_var>)

The two arguments are any variables with the given data types. Vim (in my case) opens in the REPL with the cursor on the correct method definition. I used this macro to edit the plotting function that I wanted to enhance, and tested that it worked as desired. After that, I made my pull request.
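A concrete version of that call might look like the sketch below, with the variable names d and t chosen for this example. It uses the related @which macro, which reports the selected method without opening an editor, so it also works outside an interactive session.

```julia
using Dates             # Date, Time, DateTime
using InteractiveUtils  # @which and @edit (loaded automatically in the REPL)

d = Date(2021, 10, 13)
t = Time(12, 30)

# @which reports the method that would be called for this particular call.
m = @which d + t
println(m)

# Adding a Date and a Time yields a DateTime.
println(d + t)

# In an interactive session, this opens the editor at that method:
# @edit d + t
```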

Although they don't make actual programming any easier, the dev and free commands, along with the @edit macro, remove some of the barriers that stand between the possibly intimidated programmer and a large codebase.

The final essential tool is a package called Revise.jl, which indeed is listed under the "Essential Tools" heading on the Julia home page, along with the debugger and profiler. With this package imported into the REPL, every time the programmer makes an edit to code under development, the new versions of any changed functions are immediately made active. Neither manual re-importing nor restarting of the REPL is required, saving time and making the development experience more interactive and fluid.

The culture around the Julia development community is a big factor in encouraging new contributors. Many of the contributors to Julia packages are domain experts in areas of science or mathematics, rather than professional programmers; they are interested in improving the tools that they use in their research. Repository maintainers are welcoming and helpful, rather than dismissive of imperfect code. This generous attitude is evident in the discussions attached to pull requests on GitHub.

Modern software development gains much of its power by building solutions on top of imported code, but with this power often come headaches when a change in some library breaks something that was working yesterday. Julia's package system eliminates most of these headaches.

In a follow-up article, we'll provide an overview of parallel and concurrent computing in Julia. It will look at multithreading, heterogeneous computing, facilities for using GPUs, and task-based parallelism, including an important improvement in multithreading behavior coming in version 1.7.




Digging into Julia's package system

Posted Oct 13, 2021 16:35 UTC (Wed) by willy (subscriber, #9762) [Link] (7 responses)

Could you touch on how Julia's packaging system and/or development community prevents an incident like leftpad?

Digging into Julia's package system

Posted Oct 13, 2021 16:47 UTC (Wed) by leephillips (subscriber, #100450) [Link] (5 responses)

The incident involves the sole maintainer of a package removing it from a repository. In that case Julia will continue to use the version stored on your machine, and when it checks for updates it won’t find anything. Someone in possession of the source, which would be anyone who had `dev`ed it, could recreate the GitHub repository, if the license allowed it. If it were a package with other contributors, there would be other forks of the project on GitHub, and the registry could be changed to point to one of these, I suppose.

Digging into Julia's package system

Posted Oct 13, 2021 17:14 UTC (Wed) by leephillips (subscriber, #100450) [Link] (4 responses)

After reading Nathan Willis’ article about this incident at https://lwn.net/Articles/681410/, I have a few more observations.

Although Julia package development happens almost entirely on GitHub, the process is more decentralized than with npm, as contributors maintain their own forks as part of the GitHub pull request workflow. So one executive of a company deciding to pull down a package would not be so simple.

Every version of every package is identified by a unique UUID within the Manifest and Project files (an implementation detail I did not go into in the article). So switching a dependency on a particular version of a package means changing this identifier in the Manifests of the affected packages. It seems this part of the problem is much more tractable than the situation on npm.

Finally, it’s far less likely that a Julia programmer would create a dependency on a package that does what you can do in one line in Julia. I haven’t come across any public packages that are as trivial as leftpad.

Digging into Julia's package system

Posted Oct 14, 2021 11:14 UTC (Thu) by azumanga (subscriber, #90158) [Link] (3 responses)

To be honest, that sounds as bad as javascript!

Saying every package which has some dependency could "switch to a new version" doesn't feel helpful, you could do that in npm too if you like. I'm not really clear why it would be easier for Julia than it would be for Javascript.

I'm surprised Julia didn't do what Rust did -- there packages in the "package repository" are stored centrally, and unless there is a very serious issue released packages can never be removed. You can disable versions (by 'yanking' them), but users can still get those versions by specifying exact version number.

Digging into Julia's package system

Posted Oct 14, 2021 12:32 UTC (Thu) by Wol (subscriber, #4433) [Link] (1 responses)

> I'm surprised Julia didn't do what Rust did -- there packages in the "package repository" are stored centrally, and unless there is a very serious issue released packages can never be removed.

What do you mean by "centrally"? If you mean "on the net somewhere", what happens if that (for various meanings of "that") goes down?

Or is that repository mirrored (should you so choose) on your machine, so you can ALWAYS re-install that package if you need? iiuc gentoo downloads everything, and while I've deliberately configured my system to forget it, I think it's easy enough to change that so it keeps it ...

Cheers,
Wol

Digging into Julia's package system

Posted Oct 17, 2021 7:26 UTC (Sun) by roc (subscriber, #30627) [Link]

crates.io packages are stored in S3 and cached locally. S3 isn't really going to go down for technical reasons. Hopefully someone has a copy of the archive in case those S3 resources get deleted.

Digging into Julia's package system

Posted Oct 14, 2021 12:56 UTC (Thu) by leephillips (subscriber, #100450) [Link]

I’m afraid I don’t understand your comment. I didn’t say “switch to a new version” anywhere.

Digging into Julia's package system

Posted Oct 28, 2021 19:31 UTC (Thu) by StefanKarpinski (guest, #155008) [Link]

The Julia client by default fetches all packages from a package server at https://pkg.julialang.org, which implements a simple HTTP protocol for serving content-addressed immutable tarballs of package code. These source tarballs are stored persistently in S3 and replicated across a global network of servers that keep them forever. So even if GitHub were to disappear off the face of the earth, every registered package version would continue to be available to install via package servers. So no, Julia cannot get left-padded.

Digging into Julia's package system

Posted Oct 15, 2021 2:04 UTC (Fri) by droundy (subscriber, #4559) [Link]

I wish these Julia articles would give more comparison to other languages. Rather than trying to describe in detail how things are done in Julia, it would be much easier to read if it explained how it was similar and different from other languages. Python and Rust come to mind as systems with packaging systems that would provide productive comparisons.

Digging into Julia's package system

Posted Oct 23, 2021 22:12 UTC (Sat) by garrison (subscriber, #39220) [Link] (1 responses)

One thing that might interest LWN readers is the great effort the Julia community has put into making packages that Just Work, even if they depend on code built in another language. Long ago, when a user would install a package, Julia would detect which Linux distribution it is running on, then run the suitable sudo apt-get install command (or equivalent) so that its dependencies would be available. This turned out to be brittle -- too much subtle breakage here and there, not to mention the requirement of root access.

Because of these issues, Julia 1.3 introduced an artifacts system that allows binary dependencies to be cross-compiled and distributed. The Yggdrasil repository contains recipes for cross compiling these binary dependencies -- the procedure is much like packaging a library for any other Linux distribution. While I was somewhat skeptical when this approach was introduced (their motivation sounds like a classic case of NIH to me), I must admit that it works remarkably well in practice. Years later, I rely on this system daily and it works flawlessly.

Digging into Julia's package system

Posted Oct 23, 2021 22:18 UTC (Sat) by garrison (subscriber, #39220) [Link]

Sorry, my link to the "motivation" should have been to https://github.com/JuliaPackaging/BinaryBuilder.jl#philos...

Private repos

Posted Nov 24, 2021 8:15 UTC (Wed) by Lawless-M (guest, #155377) [Link]

One thing not mentioned is the ability to install via Private repos

You can use any Git Clone method available to you, for instance I pull from my company's privately hosted BitBucket repo.

@v1.6] add ssh://git@mirror.bitbucket.intranet:7999/~myname/code.git

You can also use file paths - even windows ones!

@v1.6] add \\windowsUNC\packages\code.jl

With Distributed you can even install packages on remote machines

julia> @everywhere Pkg.add(url="ssh://git@mirror.bitbucket.intranet:7999/~myname/code.git")


Copyright © 2021, Eklektix, Inc.