
Jupyter: notebooks for education and collaboration

February 6, 2018

This article was contributed by Lee Phillips

The popular interpreted language Python shares a mode of interaction with many other languages, from Lisp to APL to Julia: the REPL (read-eval-print loop) allows users to experiment with and explore their code, while maintaining a workspace of global variables and functions. This is in contrast with languages such as Fortran and C, which must be compiled and run as complete programs (a mode of operation available to the REPL-enabled languages as well). But using a REPL is a solitary task; one can write a program to share based on one's explorations, but the REPL session itself is not easily shareable. So REPLs have become more sophisticated over time, evolving into shareable notebooks such as those offered by IPython and its more recent descendant, Jupyter. Here we look at Jupyter: its history, notebooks, and how it enables better collaboration in languages well beyond its Python roots.

History

Python also has an enhanced REPL called IPython, which adds a host of abilities similar to what is available in an integrated development environment, such as a help system and tab completion. IPython offers some special features well-suited to Python's popularity as a language for scientific computation and analysis, such as smooth integration with matplotlib and other graphics packages and parallel computation. The architecture of IPython allowed it to be used as an interface for other interpreted languages besides Python, by writing a "kernel" to communicate between the host language and the interactive shell.

Early in its development, IPython gained an alternative interface called the "IPython Notebook." This was a web-browser based interface similar to the pioneering notebook interface used in the commercial symbolic mathematics program Mathematica. The IPython Notebook inherited IPython's ability to interact with other languages besides Python, and began to be used with R, Ruby, and others.

Beginning a few years ago, the notebook component of the IPython Notebook was split from the main IPython project to stand on its own. The new project is called Jupyter (the name is a loose acronym derived from "Julia", "Python", and "R", some of the first languages that the technology supported). The software, actively developed on GitHub with over 300 contributors, is organized under the umbrella Project Jupyter. The project is committed to remaining free and open-source; it uses the three-clause BSD license. Jupyter has a vibrant and enthusiastic user and developer community, with over 700 participants attending an annual conference. It survives through donations from users and the direct financial support of a handful of organizations and corporations.

Project Jupyter is organized around a "steering committee" of 15 members, including Fernando Pérez and Brian Granger, the original leaders of the IPython Notebook project, and a diverse group employed by a wide variety of academic and corporate institutions. Pérez and Granger describe the project's aims and philosophy, and some additional aspects of its history, in this overview of "The state of Jupyter".

The Jupyter interface offers several advantages over the traditional terminal-based REPLs. When you use Jupyter, your input and the program's output are automatically preserved on the page, forming a "notebook" that you can return to later and share with colleagues or students. You can interleave markdown-formatted, explanatory text between your interactions with the program, turning the notebook into something worthy of the name, or into the outline of a paper or textbook. Program output is not limited to text, but can include graphics, which can be embedded into the page. The pedagogical possibilities of the Jupyter notebook are expanded further by the ability to include widgets, such as sliders, for interacting with programs.

Those with experience with some past attempts at this kind of interface may suspect that Jupyter is clumsy and slow, but this is not at all the case. Interacting with Jupyter is a speedy, smooth, and responsive experience, which provides tab completion and syntax highlighting.

Jupyter kernels are usually written in the host language, but, for languages that can be controlled from Python, another option is to write the kernel in Python. Since the system uses a web browser, starting Jupyter also starts up a local web server to handle communication between the browser and the kernel.

Because of its origins, Jupyter is installed with the Python kernel, and that is the only one required for its operation. The popularity of the Jupyter concept has led developers to continue to create kernels for a wide variety of languages besides Python, including C, Clojure, Julia, R, Fortran, and many others. Some of these (a partial list can be found here) are young projects in early stages of development, while others are quite mature. One interesting thing to notice here is that Jupyter can be used not only for interpreted languages such as Python and Ruby, but also for compiled languages such as C and Fortran, providing a vaguely REPL-like experience for developing in these languages. On a personal note, after I belatedly discovered that there is a gnuplot kernel available for Jupyter, it immediately became an indispensable part of my workflow for putting together a book about gnuplot.

The Jupyter project has become quite popular, especially with teachers and researchers using Python in data science, computational physics, and several other fields. Its reputation has grown as a convenient solution to the longstanding problems of sharing code and algorithms with co-researchers, or providing extra documentation on computational methods that was too detailed to be included in a published paper. A curated list of example Jupyter notebooks from its user community, indexed by topic and purpose, is available.

There are other programs, both free and non-free, that employ a notebook interface similar to IPython Notebook and Jupyter. Perhaps the most well-known commercial offering is Mathematica, which shares Jupyter's cell structure (see below) and embedded graphics; the free software statistical platform R has the R Notebook; the Python-based mathematical software package Sage featured its own notebook interface, but it has now been superseded by Jupyter; and there are several others. But Jupyter is unique in that it is designed to serve as a general notebook that can work with any programming language, or several at the same time — as long as the appropriate kernels are available.

Using Jupyter

It is possible to try Jupyter out without installing it; we'll explain how to do so in the next section. If you plan to use the software regularly in your work, however, you'll want to install it locally. There are two main strategies for setting up your system with Python, Jupyter, and the other libraries that you will probably want to use. The simple, one-stop-shopping approach is to download the Anaconda distribution, a large archive that contains Python, Jupyter, the scientific toolkit contained in SciPy, and more. This may be the most convenient approach to installation for Windows and Mac users, and is favored by many Linux users as well. I have worked with inexperienced users who were able to start working with Python on the Jupyter Notebook, on both Linux and MacOS, with very little fuss, by following the Anaconda route.

Others may prefer to retain more control, or may not require the entire Anaconda distribution since they have equivalent components, such as Python, already installed. And for users of other operating systems besides the big three, Anaconda is not available. These users will follow the second strategy. To install the software without using a monolithic distribution such as Anaconda, first make sure that Python (Python 3, unless you have a compelling case to use Python 2) is installed, and install pip, the Python package manager. The "pip3 install" command can then be used to install Jupyter, SciPy, and anything else from the Python universe. Jupyter may also be available from your distribution's package manager, but the version installed through pip is more likely to be up to date. This method allows you to experiment with a much smaller install of Python and Jupyter, and to install other components, such as SciPy, down the road, depending on your needs.

Once you've confirmed that Python is working on the command line, you can start up a Jupyter Notebook by executing the command "jupyter notebook" at the terminal. This will launch a local web server (that is only accessible to localhost by default), open a tab in your preferred web browser connected to the local server, and display a list of the files in your current directory. This web page will serve as a hub from which to open existing notebooks or start new ones, and to display information about running notebooks and shut them down if needed. The page will feature some tabs and menus as well; the most important when starting out is the "New" menu, shown selected in the figure below. Here you will see a list of your installed kernels. Python will always be available; on my machine, I've also installed the gnuplot kernel.

[Jupyter menu]

To create a new Python notebook, select the "Python 3" item, and a new tab or browser window will open, ready to accept input. The Jupyter notebook is organized into input and output "cells". You can type any legal Python code into the input cells, with the benefit of syntax highlighting and tab completion. Typing shift-return in an input cell evaluates the contents of the cell and displays the output in an output cell immediately beneath it. This output can consist of text, graphics, and more; the way it is displayed is partly controlled by global settings established in %-prefixed "magic" commands. The next figure shows an example of one of the magic commands, %matplotlib inline, that causes matplotlib plots to be embedded into the notebook.

[Notebook example]

Other magic commands do such things as define aliases for other commands, change aspects of the interface such as prompt colors and precision for printing numbers, manipulate environment variables, load code, log the session, search for objects in the current namespace, run system commands, time execution, and much more. These commands are specific to the kernel in use; kernels other than Python may provide a different set, or none at all. In addition to these, there are "cell magics," which alter the behavior of the notebook for a single cell. For example, if you begin a cell with %%markdown, the contents of that cell will be interpreted as markdown-formatted text, rather than Python code. If you have more than one kernel installed, you can send the contents of a particular cell to a different kernel by mentioning it in a cell magic, making the Jupyter notebook a multi-lingual computing environment. To find out about the other magic commands available, evaluate the command %magic.
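A line magic is shorthand for a call on the running IPython shell object, so it can also be invoked from ordinary Python code. The following is a minimal sketch, assuming the Python (IPython) kernel; the helper name run_magic is made up for illustration, while get_ipython() and run_line_magic() are the IPython calls it relies on.

```python
def run_magic(name, argument=""):
    """Run the line magic %name from ordinary Python code.

    Returns True if the magic ran, or False when no IPython shell is
    active (for example, when this is run as a plain script).
    """
    try:
        from IPython import get_ipython
    except ImportError:
        return False          # IPython is not installed at all
    shell = get_ipython()
    if shell is None:
        return False          # IPython is installed but not running
    shell.run_line_magic(name, argument)
    return True


# Inside a notebook this has the same effect as typing "%precision 3":
# floating-point results are displayed with three decimal places.
precision_set = run_magic("precision", "3")
```

Outside of a notebook the helper simply reports that nothing happened, which keeps shared code usable both as a script and in a cell.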

Cells can be freely added, deleted and rearranged, gradually transforming a series of explorations into a narrative that teaches a lesson or recounts the development of a research program. As you work, your notebook is auto-saved at frequent intervals. You can also save a checkpoint at any time; the system places a list of your checkpoints in a menu, allowing you to revert the notebook to a previous state if you make a mess of it.

[Input example]

In my testing, Python programs run in the Jupyter notebook were able to handle anything that "normally" executed Python can handle, from file input and output to Turtle graphics to accepting input from the user, as shown in the figure to the right. The only significant difference from using the normal REPL is that the entire contents of a cell are sent to the interpreter upon pressing shift-return, rather than one line at a time.

The fact that everything seems to just work, as in the figure, where console input has been seamlessly translated into HTML and JavaScript, is an impressive achievement for which the Jupyter developers are to be congratulated. The occasional problem that I did encounter with the Python or other interpreter seeming to get stuck in an inconsistent state is quickly solved by selecting the menu command for restarting the kernel. There is another command that restarts the kernel and runs all the cells in the notebook, which is convenient for renumbering the input and output cells after a typical session of trial, error, and cell rearrangement; this also ensures that the code in all the cells still works, which makes sense to do before sharing your notebook. You will also need to restart the kernel if you want to reset any global state held by the interpreter.
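The reason a restart matters can be imitated in plain Python. The sketch below is only a model of a kernel, not how Jupyter is implemented: each "cell" is a string executed in one shared namespace, and the helper run_cells is a made-up name for illustration.

```python
def run_cells(cells, namespace=None):
    """Execute cell sources in order in one shared namespace, as a kernel does."""
    ns = {} if namespace is None else namespace
    for source in cells:
        exec(source, ns)
    return ns


# An interactive session: three cells are run in this order.
session = run_cells([
    "scale = 3",
    "def area(r):\n    return scale * r * r",
    "result = area(2)",
])

# The first cell is then deleted from the notebook, but the live kernel
# still remembers `scale`, so further work keeps succeeding.
remaining_cells = [
    "def area(r):\n    return scale * r * r",
    "result = area(2)",
]
run_cells(["result = area(5)"], session)

# "Restart and run all" amounts to a fresh namespace plus the cells that
# are still on the page; now the missing definition is exposed.
try:
    run_cells(remaining_cells)
    fresh_run_error = None
except NameError as err:
    fresh_run_error = str(err)
```

Here the old session still computes a result, while the fresh run stops with a NameError for scale. That is the kind of breakage that restarting and running all cells catches before a notebook is shared.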

If your interest is in using the notebook for education, you probably want to learn about ipywidgets. This is a library of interactive controls, implemented in JavaScript, that allow you, with very little extra code, to add a graphical interface to the Python code in the notebook. The widgets provide a way for the user to set function parameters and have the function re-evaluated and its output immediately displayed. The following short video shows the import necessary to use the widget library (it's in a separate cell, along with the numpy and pylab imports, as a reminder of the global state of the interpreter). The definition of a simple routine that plots the sine function of frequency n is in a separate cell. The single line of code in the next cell creates the widget and hooks it into the function. It has two arguments: the name of the function and the name of the function argument that we wish to control. If, as here, we set this to a tuple with three elements, these will become the minimum and maximum values of the argument and the step size.
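In code, the arrangement described above might look roughly like the following sketch. It is not a transcription of the video: the function names are invented for illustration, the range (1, 10, 1) is an assumed choice, and the plotting and widget imports are placed inside the functions so that the block loads even where matplotlib and ipywidgets are not installed.

```python
import math


def sine_curve(n, points=200):
    """Sample sin(n * x) at evenly spaced x values in [0, 2*pi]."""
    xs = [2 * math.pi * i / (points - 1) for i in range(points)]
    ys = [math.sin(n * x) for x in xs]
    return xs, ys


def plot_sine(n=1):
    """Plot a sine curve of frequency n; this is the function the slider drives."""
    import matplotlib.pyplot as plt   # lazy import: requires matplotlib
    xs, ys = sine_curve(n)
    plt.plot(xs, ys)
    plt.title("sin({} x)".format(n))
    plt.show()


def attach_slider():
    """Create the slider; only meaningful inside a running notebook."""
    from ipywidgets import interact   # lazy import: requires ipywidgets
    # The tuple is read as (minimum, maximum, step) for the argument n.
    return interact(plot_sine, n=(1, 10, 1))
```

Calling attach_slider() in a notebook cell should display a slider for n and redraw the plot each time it moves.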

This is the simplest example, showing how you can create a graphical interface to any function with a single line of code. The documentation in the link above introduces the other widgets available, for both input and display of data, facilities for laying them out into a more fully-realized graphical interface, and explains how to create your own custom widgets.

Sharing notebooks

Eventually, you may want to share your Jupyter notebook with colleagues or students. There are several strategies for accomplishing this; which one to choose depends largely on your audience.

The native, auto-saved form of the notebook, which is in JSON format, can be opened and used by anyone with Jupyter installed (as well as any additional kernels or libraries that the notebook depends on, of course). This file is self-contained, with images included as inline data. If you're sharing among a group of colleagues who are also using Jupyter, there is little reason to do anything more elaborate; the textual form of the notebook file even makes collaboration using Git or another version control system practical.
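Because the file is plain JSON, it can be inspected with nothing more than the standard library. The sketch below assumes version 4 of the notebook format and builds a tiny two-cell notebook in memory instead of reading a real file; the cell contents and the helper code_sources are invented for illustration.

```python
import json

# A minimal notebook: one markdown cell and one code cell.
notebook = {
    "nbformat": 4,
    "nbformat_minor": 2,
    "metadata": {},
    "cells": [
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": ["# Sine demo\n", "Some explanatory text."],
        },
        {
            "cell_type": "code",
            "execution_count": None,
            "metadata": {},
            "outputs": [],
            "source": ["print('hello')"],
        },
    ],
}


def code_sources(nb):
    """Return the source text of each code cell, in notebook order."""
    sources = []
    for cell in nb["cells"]:
        if cell["cell_type"] == "code":
            src = cell["source"]
            # The source may be stored as one string or as a list of lines.
            sources.append(src if isinstance(src, str) else "".join(src))
    return sources


text = json.dumps(notebook, indent=1)   # what would be saved as a .ipynb file
reloaded = json.loads(text)
```

Code output and embedded images are stored in the same file, under each code cell's "outputs" list. This is convenient for sharing, but it also means that re-running a cell can change the file.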

If you need to provide a version of a notebook for reading only, with no interaction or modification required, you can export it. Simply by selecting a command from a menu, you can create a nicely formatted version of your notebook as HTML, PDF, or other formats. The result contains images and syntax-highlighted code, but omits any interactive widgets and their output.

This is the route to take in order to turn a notebook into an article or book. What really makes this possible is Jupyter's ability to insert markdown-formatted cells anywhere in the notebook. When these are "executed", they turn into the formatted, HTML version, but can be edited at will. Thus you can easily interleave formatted text with code and output to create a finished narrative.

For more control, such as the use of specialized templates, or for automating the conversion, the command "jupyter nbconvert" comes with the Jupyter installation. This command accepts arguments for output format and template; for some formats, such as LaTeX or PDF, you will need to have other software installed (pandoc for translating to LaTeX and some other formats, and a LaTeX installation for creating PDFs, for example).
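For automation, the command can be driven from a short Python script. This is a sketch only: the helper names are made up, and it assumes that the jupyter executable is on the PATH. The --to, --template, and --execute options are real nbconvert flags; the last one runs the notebook before converting it.

```python
import subprocess


def nbconvert_command(notebook, to="html", template=None, execute=False):
    """Build the argument list for a "jupyter nbconvert" invocation."""
    cmd = ["jupyter", "nbconvert", "--to", to]
    if template is not None:
        cmd += ["--template", template]
    if execute:
        cmd.append("--execute")   # run all cells before converting
    cmd.append(notebook)
    return cmd


def convert(notebook, **options):
    """Run the conversion; raises CalledProcessError if nbconvert fails."""
    subprocess.run(nbconvert_command(notebook, **options), check=True)
```

For example, convert("widgetExamples.ipynb", to="script") should write the notebook's code cells out as an ordinary Python file, and to="pdf" would go through LaTeX as described above.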

The nbviewer project provides Jupyter's nbconvert facility as a web service. This is used by GitHub for displaying Jupyter notebooks; it converts the native notebook, on demand, into HTML. There can be significant delay when attempting to view a notebook served through nbviewer, but its use means that only the native format need be saved, which is useful for sites such as GitHub, where the user may want to select which commit of a notebook to view.

Exporting has the advantage that your reader can view your notebook without needing to install Jupyter (or anything else); only a web browser is required. In a fairly recent development, it is now possible to offer a fully interactive experience of your notebook while still requiring nothing besides a browser from the reader. The technology is called JupyterHub; it launches multiple instances of the Jupyter web server in response to requests from the public web, extending the capabilities of the local web server that runs on your machine in response to the "jupyter notebook" command.

An organization called Binder has adapted this technology to generously offer the public the ability to interact with Jupyter notebooks that are stored in GitHub repositories. To make your notebook available for interactive sharing, you merely need to place it in a repository along with a simple text file that lists the packages that your code imports. To use a notebook that is on GitHub, simply go to the Binder web site, enter the URL of the repository, and click on the orange "launch" button. A new browser window will open with the names of the files in the repository; click on the notebook of interest to launch it.

I've placed a simple notebook in a repository at https://github.com/leephillips/JupyterExperiments. It demonstrates the widget slider shown in the video above. If you launch it through Binder, you will see a list of the three files in the repository. Clicking on the name of the notebook, widgetExamples.ipynb, will launch it in a new window. Click on each cell and execute it with shift-enter to begin. You can make changes, add cells, and experiment at will, but if you want to preserve a copy of your work, you need to visit the "File" menu to download a copy to your machine. If you download as a "Notebook", you can open the file locally if you have Jupyter installed.

The other two files are a short README and the requirements.txt file, which tells Binder what the prerequisites are for running the notebook. This file can be more elaborate than my simple example, with version requirements and other information if needed.
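The launch form on the Binder site leads to a URL that can also be assembled by hand and handed to readers directly. The sketch below assumes the public mybinder.org service and its "v2/gh" address scheme with an optional "filepath" query parameter; the function name is invented, and the branch name "master" in the usage note is an assumption about the repository, not something stated in this article.

```python
from urllib.parse import quote


def binder_url(owner, repo, ref, notebook=None):
    """Build a mybinder.org launch URL for a GitHub repository.

    ref is a branch, tag, or commit; notebook is an optional path of a
    notebook to open once the environment has started.
    """
    url = "https://mybinder.org/v2/gh/{}/{}/{}".format(
        quote(owner), quote(repo), quote(ref, safe="")
    )
    if notebook:
        url += "?filepath=" + quote(notebook)
    return url
```

If the repository's branch is indeed called master, then binder_url("leephillips", "JupyterExperiments", "master", "widgetExamples.ipynb") would give a link that opens the widget notebook without a visit to the launch form.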

The most recent release of the Jupyter Notebook is version 5.3.0, announced on January 17. While development of the mainline Notebook continues, and will for the foreseeable future, work is well underway on the "next generation" interface for Project Jupyter, called JupyterLab. It is still in the "early preview" stage, but JupyterLab has already gained some enthusiastic early adopters. It retains the Jupyter Notebook web interface, but enhances it with improvements such as the ability to use multiple tabs, collapse cells, and edit markdown in side-by-side windows. JupyterLab also allows the user to interact with 3D objects and with large tables of data. As the interface is designed to be extended through the use of a public API, third-party developers will be free to create novel interfaces into the world of interactive computing and data analysis.

In just a few years, the Jupyter notebook has generated a great deal of enthusiasm and enjoyed widespread adoption. It is closely associated with SciPy, a free-software ecosystem for scientific and numerical calculation using Python, with which it has helped to accelerate the growth of a new culture of openness in the sciences. This is due, in part, to the ease with which Jupyter allows the researcher to create a living notebook describing methods and algorithms, and to easily share this document in a variety of forms. Jupyter provides a vibrant example of how a technology can transcend its origins as a mere tool, and have a powerful effect on the community that adopts it.





Jupyter: notebooks for education and collaboration

Posted Feb 6, 2018 21:30 UTC (Tue) by hnoronhaf (guest, #109244)

I use Jupyter notebooks in college classes for annotations and even for essays. It's great because it supports LaTeX and other markup while still letting me write code and export the result to many formats.

Jupyter: notebooks for education and collaboration

Posted Feb 7, 2018 5:46 UTC (Wed) by leephillips (subscriber, #100450)

I think I would have enjoyed using this technology back in my school days.

Not just for education and collaboration

Posted Feb 7, 2018 7:45 UTC (Wed) by osma (subscriber, #6912)

Thank you for this article. I already use Jupyter occasionally (and used IPython Notebook earlier), but it taught me many new things that can be done.

The article emphasizes education and collaboration, and while those are surely important, I've found Jupyter convenient for other reasons: it allows experimenting on large data sets without having to wait for them to load every time you start your script. RDF files loaded with rdflib, for example, tend to be like this; loading a non-trivial RDF file can take several minutes. If you write an old-fashioned Python script to process a large file, it's frustrating to wait for it to load the data only to see it crash because of a trivial bug after loading. With Jupyter I can approach the problem one step at a time, checking intermediate results and preserving the already loaded data structures. This is of course also possible with the Python REPL, but then it's much more difficult to keep track of what you've done and save the code into a script for later use. With Jupyter, once I'm confident that the process works, I can easily turn the notebook into a regular Python script.

Jupyter's packaging in Fedora

Posted Feb 8, 2018 12:19 UTC (Thu) by gdt (subscriber, #6284)

Jupyter installation for Fedora 27 is simply:

$ sudo dnf install notebook

The Fedora ❤ Python website has a good summary of Fedora's packaging for popular science Python packages.

Jupyter's packaging in Fedora

Posted Feb 8, 2018 12:36 UTC (Thu) by gerdesj (subscriber, #5446)

$ sudo pacman -S jupyter
$ jupyter-notebook

Jupyter: notebooks for education and collaboration

Posted Feb 8, 2018 19:32 UTC (Thu) by marcel.oliver (subscriber, #5441)

First: yes, the notebook is a great tool, especially the markdown-formatted text between cells. I have used it, mostly happily, for presentations and teaching. However, there are a few warts, some a matter of principle and some specific to the implementation, that one needs to be aware of.
  • ipynbs are difficult to keep under version control. There are some hacks to ease the problem, but it's probably never going to be as easy and natural as with plain .py files.
  • ipynbs have implicit state. I have run into funny errors when debugging students' code in a lab that were ultimately due to the student having defined something unexpected and then deleted the offending code. So kernel restarts are a regular debugging necessity, but then it's important to keep the order of cells clean to be able to "run all", which too often fails so that it's hard to get back to the point where the problem occurred. It also happens too frequently that the notebook output looks OK, but when the Teaching Assistant does a "run all" on a fresh kernel to verify the work, it fails and nobody has any idea of how to get it back. For this reason I have disallowed project submissions as ipynbs in "serious" scientific computing classes. (That said, it's still very useful in situations with students without strong programming skills.)
  • Last time I looked, plots could not be resized interactively, which is a major drawback as compared to the Mathematica notebook. There are other limitations, too, but this one feels the most limiting in actual use.
  • Jupyter does not support mouse-button copy-paste, which I personally find extremely annoying (but students and colleagues who grew up on other operating systems don't seem to mind...).
  • And last, startup profiles were removed from both ipython and jupyter, which is also annoying as I would like to set up shell aliases or script files to start a fully operational 'pylab' or 'sympy' session from the shell. So the inability to set up these and possibly more sophisticated session types automatically seems to take away some of the advantage of the interactive nature of the notebook.
But overall, I do find Jupyter very useful. Some of these issues come down to choosing the right tool for the job and others will hopefully be addressed in future Jupyter development.

Jupyter: notebooks for education and collaboration

Posted Feb 9, 2018 21:15 UTC (Fri) by efiring (guest, #4543)

Regarding your third item: If you use "%matplotlib notebook" instead of "%matplotlib inline" your embedded plots will be fully interactive--you can resize, zoom, etc.--until you close the figure either programmatically or via the "stop interaction" button in the upper right-hand corner. Alternatively, you can use "%matplotlib qt" (for example, assuming you have PyQt5 installed) to generate each plot in its own independent window, outside the browser. This provides more speed and flexibility while developing the notebook, but saving the plots as embedded in the notebook or in an export requires restarting the kernel and re-running with "notebook" or "inline" in place of "qt" in that initial "magic" directive.

Jupyter: notebooks for education and collaboration

Posted Aug 30, 2018 9:29 UTC (Thu) by dsblank (guest, #126925)

The 2018 JupyterCon had an education track (https://blog.jupyter.org/synopsis-jupytercon-2018-education-track-6f9b3f4d8dd9). Note that development in Jupyter is continuous, so things change all the time. For example, the new JupyterLab was just released.


Copyright © 2018, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds