JupyterLab 4.0: a development environment for education and research
JupyterLab is a web-based development environment widely used by data scientists, engineers, and educators for data visualization, data analysis, prototyping, and interactive learning materials. The Jupyter community has recently announced the release of JupyterLab 4.0, introducing many new features and performance improvements to enhance its capabilities both in research and educational settings.
JupyterLab's umbrella project, Jupyter, focuses on creating free and open-source software for interactive computing across all programming languages, using the three-clause BSD license for all of its projects. Jupyter evolved from IPython, which is an interactive shell for Python that later added support for other interpreted languages. Jupyter's core concept is the computational notebook: a shareable document that combines computer code, plain language descriptions, data tables, visualizations, and even interactive controls like sliders for changing parameters.
LWN looked at JupyterLab's first beta release in 2018, but it has made quite a bit of progress since then. JupyterLab is a full-fledged web-based development environment for creating these computational notebooks, which are organized into input and output cells. Users type Python code (or code in any of the other more than 40 supported programming languages) into an input cell. After the user presses shift-return, the code is evaluated and the output is displayed in an output cell below it.
Not only is this an excellent pedagogical tool for teachers, but JupyterLab is also indispensable for researchers when experimenting or building prototypes (see the screen shot below for an experiment analyzing signals from a brain-computer interface). Its popularity among data scientists is no coincidence: rather than re-running a Python script that processes a large data set every time the script is modified, JupyterLab allows users to iteratively develop a notebook with data loading in one cell and processing steps in other cells. Users can then fix bugs and add functionality to the processing cells and re-run only those cells without having to reload the data. Of course, this is also possible with the Python REPL, but without the clear cell-based approach.
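The load-once, process-many-times workflow described above can be sketched as two notebook cells. This is only an illustration: load_data() and process() are hypothetical stand-ins, and the sleep merely imitates an expensive load.

```python
# Cell 1: load the data once (the expensive step).
import time


def load_data():
    """Hypothetical stand-in for reading a large data set."""
    time.sleep(0.1)  # pretend this takes a long time
    return list(range(1_000))


data = load_data()

# Cell 2: a processing step; edit and re-run only this cell as needed.
def process(values):
    """Hypothetical processing step: the mean of the even values."""
    evens = [v for v in values if v % 2 == 0]
    return sum(evens) / len(evens)


result = process(data)
print(result)
```

After changing process(), only the second cell has to be executed again; the data variable from the first cell stays in the kernel's memory.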
Using JupyterLab
JupyterLab can be installed as a Python package (jupyterlab) through PyPI and conda-forge. The 4.0 release is also available from the official repositories in the upcoming Fedora 39 (package jupyterlab), in openSUSE Tumbleweed (package python-jupyterlab), and Arch Linux (package jupyterlab). Debian and Ubuntu don't have the package in their official repositories. After installation, JupyterLab is started by typing jupyter lab at the command line. This opens a new tab or window in the default web browser, displaying the development environment.
At the left of the JupyterLab window, a file browser displays the contents of the current directory, while the panel on the right shows the "launcher". This is where the developer creates new notebooks, text files, or Markdown files. Additionally, the developer can open a console with a Python REPL or a shell from the launcher. Opening a new launcher adds a tab by default, but by dragging the tab's title bar, it can be converted into a new panel and arranged horizontally or vertically. That can be used for simultaneously viewing multiple notebooks or other files, such as CSV files with data used in the notebook's analysis (see the screen shot below with an analysis of the frequency of egg laying by my chickens).
Code editor and performance improvements
As the changelog for JupyterLab 4.0 reveals, the latest release includes significant improvements. The code editor that JupyterLab uses for its cells, CodeMirror, has been updated to version 6. Its most notable enhancement is better customization capabilities. For example, in previous versions, users had to modify settings separately for each type of cell, the file editor, and the console editor. Now, they can change their settings for all of these in one location and override some settings for specific cell types, such as hiding line numbers only for Markdown cells.
The new CodeMirror version also loads large notebooks more quickly. This is just one of the performance improvements featured in the latest JupyterLab release. For example, the upgrade from MathJax version 2 to 3 improves rendering times for mathematical equations. Other optimizations have been made to the CSS rules for JupyterLab's web interface to improve browser performance when many HTML elements are present on a page. JupyterLab 4.0 also introduces notebook windowing: when setting this feature in the Notebook part of the Settings panel to "full", JupyterLab only renders the parts of a notebook that are currently visible in the browser window; the other parts are only rendered when scrolling makes them visible. However, this setting is not enabled by default, as it may have side effects if some cell outputs are displaying HTML iframe elements. All of these performance improvements should be noticeable when working with larger notebooks.
Extensible architecture
JupyterLab is designed as an extensible environment. JupyterLab extensions customize or enhance JupyterLab, for example by providing a new theme, a file viewer or editor for specific file types, or a renderer for specific types of output cells in notebooks.
JupyterLab 3 introduced the ability to install extensions as Python packages via pip. Now JupyterLab 4.0 builds on this feature, making extensions more discoverable from its web interface. The Extension Manager (accessible in the left sidebar by clicking on the jigsaw icon) by default displays extensions from PyPI, at least for Python packages that have the Trove classifier "Framework :: Jupyter :: JupyterLab :: Extensions :: Prebuilt" in their package metadata. However, there is no check to ensure that the extension is compatible with the current JupyterLab version. Consequently, it is possible that an extension found in the Extension Manager will fail to run.
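To make the classifier filter concrete, here is a minimal sketch that looks for the same Trove classifier among the packages installed in the current Python environment. It is not how the Extension Manager itself queries PyPI; the function names are hypothetical, and only the standard library's importlib.metadata is used.

```python
from importlib import metadata

# The Trove classifier quoted in the text above.
PREBUILT = "Framework :: Jupyter :: JupyterLab :: Extensions :: Prebuilt"


def has_prebuilt_classifier(classifiers):
    """Return True if the list of classifiers marks a prebuilt extension."""
    return PREBUILT in classifiers


def installed_prebuilt_extensions():
    """List locally installed distributions that carry the classifier."""
    found = []
    for dist in metadata.distributions():
        name = dist.metadata["Name"]
        classifiers = dist.metadata.get_all("Classifier") or []
        if name and has_prebuilt_classifier(classifiers):
            found.append(name)
    return sorted(found)
```

Calling installed_prebuilt_extensions() in an environment with JupyterLab extensions installed would list their package names. As with the Extension Manager, a match says nothing about compatibility with the running JupyterLab version.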
Extension developers need to be aware of numerous breaking changes in the API from JupyterLab 3.x to 4.0 and have to modify affected extensions accordingly. The extension system and packaging have also been changed in JupyterLab 4.0. Fortunately, there is an upgrade script that helps with this migration by creating the necessary files for packaging the extension and updating dependencies to package versions compatible with JupyterLab 4.0. The extension tutorial has also been updated.
Real-time collaboration
JupyterLab is not only a tool for individual development; recent versions have also improved the possibilities for real-time collaboration. This feature allows multiple developers to work on the same notebook simultaneously. When editing the same document, users can see the cursors from other users in the editor, and a side panel displays all connected users. Under the hood, this is based on Yjs, which is a JavaScript framework for shared editing.
Since not all users require real-time collaboration, JupyterLab 4.0 has separated this feature into a distinct package, jupyterlab-collaboration. Just like the main JupyterLab package, this can be installed from PyPI or conda-forge, and it can also be installed from the Extension Manager. After this, JupyterLab needs to be started with the --collaborative option to enable the collaborative mode. Moreover, JupyterLab's web server only listens for local connections by default, while collaborators will obviously need remote access. There are options to allow the server to listen for remote connections (e.g. --ip=0.0.0.0); when it starts up it will display a URL for remote access to the notebook, though this feature is not well documented (except in the help text). TLS can also be enabled by adding options for the certificate and key file on the command line.
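The options mentioned above can be combined into a single command line. The sketch below only assembles and prints that command; it does not start a server. The --collaborative and --ip options come from the text, while --certfile and --keyfile are assumed here to be the names of the TLS options and should be verified against the output of jupyter lab --help. The helper function and the file names are hypothetical.

```python
import shlex


def jupyter_lab_command(collaborative=True, ip=None, certfile=None, keyfile=None):
    """Build the argument list for a collaborative JupyterLab session."""
    argv = ["jupyter", "lab"]
    if collaborative:
        argv.append("--collaborative")
    if ip is not None:
        # 0.0.0.0 makes the server listen on all network interfaces.
        argv.append(f"--ip={ip}")
    if certfile and keyfile:
        # Assumed option names for enabling TLS; check the help text.
        argv += [f"--certfile={certfile}", f"--keyfile={keyfile}"]
    return argv


# Hypothetical certificate and key file names.
cmd = jupyter_lab_command(ip="0.0.0.0", certfile="mycert.pem", keyfile="mykey.key")
print(shlex.join(cmd))
```

The resulting list could be handed to subprocess.run() to launch the server from a script.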
Some flaws of Jupyter notebooks still remain
While Jupyter notebooks offer a convenient way to develop Python programs interactively, they have some drawbacks compared to traditional Python development, and JupyterLab 4.0 doesn't change this. To begin with, the cell-based approach can lead to less structured and more linear code. Additionally, notebooks are designed to be standalone, whereas in conventional Python development, functions and classes are often organized into separate modules according to their purpose. Although it's possible to write traditional Python modules and import them into a notebook, this forces the developer to switch between two modes of programming. Consequently, many developers simply adhere to the linear, cell-based structure of their notebook and copy and paste code they want to reuse from another notebook. This approach does not promote code reuse and modularity, making it challenging to maintain and understand Jupyter notebooks, especially in larger projects.
Another point of concern is that notebook cells can be executed out of order. This is convenient while iteratively developing code because it allows going back to a previously executed cell, altering its code to fix a bug, and executing it again before proceeding with another cell. However, this can lead to unexpected results and can make it difficult to understand the flow of the code if the developer forgets to re-run other cells that depend on the changed cell. This contrasts with traditional Python development, which involves executing code sequentially in a file or in the REPL, providing a clearer code flow.
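The stale-state problem can be reproduced outside a notebook with a small simulation. In this sketch, a shared dictionary plays the role of the kernel's namespace and each string is a hypothetical cell; it is a simplified model, not how Jupyter kernels are implemented.

```python
# A shared namespace stands in for the notebook kernel's state.
namespace = {}

# Three hypothetical cells; cell 2 depends on cell 1, cell 3 on cell 2.
cells = {
    1: "scale = 2",
    2: "scaled = [x * scale for x in [1, 2, 3]]",
    3: "total = sum(scaled)",
}


def run(number):
    """Execute one cell in the shared namespace."""
    exec(cells[number], namespace)


# Run everything in order once.
for number in (1, 2, 3):
    run(number)
first_total = namespace["total"]

# Edit cell 1, then re-run cells 1 and 3 but forget cell 2.
cells[1] = "scale = 10"
run(1)
run(3)
stale_total = namespace["total"]  # still based on the old scale

# Re-running the forgotten cell brings the state up to date.
run(2)
run(3)
fresh_total = namespace["total"]

print(first_total, stale_total, fresh_total)
```

The stale total equals the first one even though scale has changed, which is exactly the kind of silent inconsistency that restarting the kernel and running all cells from top to bottom avoids.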
When storing Jupyter notebooks in version control, another problem emerges. A notebook is saved in JSON format, combining its code and output (text as well as images) in a single file. This makes it difficult to track changes with version-control systems like Git. In particular, with non-deterministic tasks, such as those in machine learning, merely running the notebook can change the output cells significantly, producing a large diff even if the code cells remain unchanged. Fortunately, a tool like nbdime can be used to display only the relevant differences in Jupyter notebooks. Another option is to always clear the output cells of a notebook before committing changes to Git.
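Clearing outputs amounts to a small transformation of the notebook's JSON. The following is a minimal sketch using only the standard library and assuming the version 4 notebook format, in which code cells carry an "outputs" list and an "execution_count"; the sample notebook and the clear_outputs() helper are hypothetical.

```python
import copy
import json


def clear_outputs(notebook):
    """Return a copy of a notebook dict with all code-cell outputs removed."""
    cleaned = copy.deepcopy(notebook)
    for cell in cleaned.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["outputs"] = []
            cell["execution_count"] = None
    return cleaned


# A tiny hand-written notebook for illustration.
notebook = {
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": ["# Title"]},
        {
            "cell_type": "code",
            "execution_count": 3,
            "metadata": {},
            "outputs": [
                {"output_type": "stream", "name": "stdout", "text": ["0.731\n"]}
            ],
            "source": ["print(random.random())"],
        },
    ],
    "metadata": {},
    "nbformat": 4,
    "nbformat_minor": 5,
}

cleaned = clear_outputs(notebook)
print(json.dumps(cleaned, indent=1))
```

With the outputs gone, re-running the notebook no longer changes the committed file unless the code or Markdown cells change. In practice, an existing tool such as nbconvert's --clear-output option is likely a better choice than a hand-rolled script.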
Conclusion
JupyterLab 4.0 represents a significant step forward in the development of interactive notebooks for education and research. With its enhanced code editor, performance improvements, new Extension Manager, and expanded real-time collaboration capabilities, JupyterLab continues to be an invaluable tool for data scientists, engineers, and educators. Although the notebook concept still comes with its limitations, it's clear that JupyterLab offers a big productivity boost for many who are working with data. Give it a try for your next project.
| Index entries for this article | |
|---|---|
| GuestArticles | Vervloesem, Koen |
