
JupyterLab 4.0: a development environment for education and research

June 28, 2023

This article was contributed by Koen Vervloesem

JupyterLab is a web-based development environment widely used by data scientists, engineers, and educators for data visualization, data analysis, prototyping, and interactive learning materials. The Jupyter community has recently announced the release of JupyterLab 4.0, introducing lots of new features and performance improvements to enhance its capabilities both in research and educational settings.

JupyterLab's umbrella project, Jupyter, focuses on creating free and open-source software for interactive computing across all programming languages, using the three-clause BSD license for all of its projects. Jupyter evolved from IPython, which is an interactive shell for Python that later added support for other interpreted languages. Jupyter's core concept is the computational notebook: a shareable document that combines computer code, plain language descriptions, data tables, visualizations, and even interactive controls like sliders for changing parameters.

LWN looked at JupyterLab's first beta release in 2018, but it has made quite a bit of progress since then. JupyterLab is a full-fledged web-based development environment to create these computational notebooks, which are organized into input and output cells. Users type Python code (or code in any of the other more than 40 supported programming languages) into an input cell. After the user presses shift-return, the code is evaluated and the output is displayed in an output cell below it.

Not only is this an excellent pedagogical tool for teachers, but JupyterLab is also indispensable for researchers when experimenting or building prototypes (see the screen shot below for an experiment analyzing signals from a brain-computer interface). Its popularity among data scientists is no coincidence: rather than re-running a Python script that processes a large data set every time the script is modified, JupyterLab allows users to iteratively develop a notebook with data loading in one cell and processing steps in other cells. Users can then fix bugs and add functionality to the processing cells and re-run only those cells without having to reload the data. Of course, this is also possible with the Python REPL, but without the clear cell-based approach.

[JupyterLab example]
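
The load-once, reprocess-many-times workflow can be sketched in plain Python. This is only an illustration: the comments mark where the cell boundaries would fall in a notebook, and load_data() and summarize() are hypothetical stand-ins for a real data loader and a real processing step.

```python
# Cell 1: expensive data loading, executed once per session.
load_count = 0

def load_data():
    """Hypothetical stand-in for reading a large data set."""
    global load_count
    load_count += 1
    return list(range(1_000_000))

data = load_data()

# Cell 2, first attempt: the processing step has a bug (it sums
# the values instead of averaging them).
def summarize(values):
    return sum(values)

result = summarize(data)

# Cell 2, edited and re-run: only this cell is executed again, so
# the data loaded by cell 1 is reused rather than reloaded.
def summarize(values):
    return sum(values) / len(values)

result = summarize(data)
print(result, load_count)
```

The counter stays at one because the loading cell was never executed a second time; that is the time saving described above.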

Using JupyterLab

JupyterLab can be installed as a Python package (jupyterlab) through PyPI and conda-forge. The 4.0 release is also available from the official repositories in the upcoming Fedora 39 (package jupyterlab), in openSUSE Tumbleweed (package python-jupyterlab), and Arch Linux (package jupyterlab). Debian and Ubuntu don't have the package in their official repositories. After installation, JupyterLab is started by typing jupyter lab at the command line. This opens a new tab or window in the default web browser, displaying the development environment.

At the left of the JupyterLab window, a file browser displays the contents of the current directory, while the panel on the right shows the "launcher". This is where the developer creates new notebooks, text files, or Markdown files. Additionally, the developer can open a console with a Python REPL or a shell from the launcher. Opening a new launcher adds a tab by default, but by dragging the tab's title bar, it can be converted into a new panel and arranged horizontally or vertically. That can be used for simultaneously viewing multiple notebooks or other files, such as CSV files with data used in the notebook's analysis (see the screen shot below with an analysis of the frequency of egg laying by my chickens).

[Multiple panels]

Code editor and performance improvements

As the changelog for JupyterLab 4.0 reveals, the latest release includes significant improvements. The code editor that JupyterLab uses for its cells, CodeMirror, has been updated to version 6. Its most notable enhancement is better customization capabilities. For example, in previous versions, users had to modify settings separately for each type of cell, the file editor, and the console editor. Now, they can change their settings for all of these in one location and override some settings for specific cell types, such as hiding line numbers only for Markdown cells.

The new CodeMirror version also loads large notebooks more quickly. This is just one of the performance improvements featured in the latest JupyterLab release. For example, the upgrade from MathJax version 2 to 3 improves rendering times for mathematical equations. Other optimizations have been made to the CSS rules for JupyterLab's web interface to improve browser performance when many HTML elements are present on a page. JupyterLab 4.0 also introduces notebook windowing: when setting this feature in the Notebook part of the Settings panel to "full", JupyterLab only renders the parts of a notebook that are currently visible in the browser window; the other parts are only rendered when scrolling makes them visible. However, this setting is not enabled by default, as it may have side effects if some cell outputs are displaying HTML iframe elements. All of these performance improvements should be noticeable when working with larger notebooks.

Extensible architecture

JupyterLab is designed as an extensible environment. JupyterLab extensions customize or enhance JupyterLab, for example by providing a new theme, a file viewer or editor for specific file types, or a renderer for specific types of output cells in notebooks.

JupyterLab 3 introduced the ability to install extensions as Python packages via pip. Now JupyterLab 4.0 builds on this feature, making extensions more discoverable from its web interface. The Extension Manager (accessible in the left sidebar by clicking on the jigsaw icon) by default displays extensions from PyPI, at least for Python packages that have the Trove classifier "Framework :: Jupyter :: JupyterLab :: Extensions :: Prebuilt" in their package metadata. However, there is no check to ensure that the extension is compatible with the current JupyterLab version. Consequently, it is possible that an extension found in the Extension Manager will fail to run.
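
To see what that classifier looks like in practice, the following sketch checks whether a package's metadata carries it. It uses the standard library's importlib.metadata to inspect locally installed packages; it is not how the Extension Manager queries PyPI, and the helper names and the package name are made up for illustration.

```python
from importlib.metadata import PackageNotFoundError, metadata

PREBUILT_CLASSIFIER = (
    "Framework :: Jupyter :: JupyterLab :: Extensions :: Prebuilt"
)

def installed_classifiers(dist_name):
    """Return the Trove classifiers of a locally installed distribution,
    or an empty list if the distribution is not installed."""
    try:
        return metadata(dist_name).get_all("Classifier") or []
    except PackageNotFoundError:
        return []

def is_prebuilt_extension(classifiers):
    """Report whether the prebuilt-extension classifier is present."""
    return PREBUILT_CLASSIFIER in classifiers

# Hand-written classifier list, as it might appear in package metadata.
sample_classifiers = [
    "Framework :: Jupyter",
    "Framework :: Jupyter :: JupyterLab",
    PREBUILT_CLASSIFIER,
]
print(is_prebuilt_extension(sample_classifiers))

# "hypothetical-extension-name" is a placeholder, not a real package.
print(is_prebuilt_extension(installed_classifiers("hypothetical-extension-name")))
```

Note that the classifier says nothing about which JupyterLab versions an extension supports, which is why an extension listed in the Extension Manager can still fail to run.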

Extension developers need to be aware of numerous breaking changes in the API from JupyterLab 3.x to 4.0 and have to modify affected extensions accordingly. The extension system and packaging have also been changed in JupyterLab 4.0. Fortunately, there is an upgrade script that helps with this migration by creating the necessary files for packaging the extension and updating dependencies to package versions compatible with JupyterLab 4.0. The extension tutorial has also been updated.

Real-time collaboration

JupyterLab is not only a tool for individual development; recent versions have also improved the possibilities for real-time collaboration. This feature allows multiple developers to work on the same notebook simultaneously. When editing the same document, users can see the cursors from other users in the editor, and a side panel displays all connected users. Under the hood, this is based on Yjs, which is a JavaScript framework for shared editing.

Since not all users require real-time collaboration, JupyterLab 4.0 has separated this feature into a distinct package, jupyterlab-collaboration. Just like the main JupyterLab package, this can be installed from PyPI or conda-forge, and it can also be installed from the Extension Manager. After this, JupyterLab needs to be started with the --collaborative option to enable the collaborative mode. Moreover, JupyterLab's web server only listens for local connections by default, while collaborators will obviously need remote access. There are options to allow the server to listen for remote connections (e.g. --ip=0.0.0.0); when it starts up it will display a URL for remote access to the notebook, though this feature is not well documented (except in the help text). TLS can also be enabled by adding options for the certificate and key file on the command line.

Some flaws of Jupyter notebooks still remain

While Jupyter notebooks offer a convenient way to develop Python programs interactively, they have some drawbacks compared to traditional Python development, and JupyterLab 4.0 doesn't change this. To begin with, the cell-based approach can lead to less structured and more linear code. Additionally, notebooks are designed to be standalone, whereas in conventional Python development, functions and classes are often organized into separate modules according to their purpose. Although it's possible to write traditional Python modules and import them into a notebook, this forces the developer to switch between two modes of programming. Consequently, many developers simply adhere to the linear, cell-based structure of their notebook and copy and paste code they want to reuse from another notebook. This approach does not promote code reuse and modularity, making it challenging to maintain and understand Jupyter notebooks, especially in larger projects.

Another point of concern is that notebook cells can be executed out of order. This is convenient while iteratively developing code because it allows going back to a previously executed cell, altering its code to fix a bug, and executing it again before proceeding with another cell. However, this can lead to unexpected results and can make it difficult to understand the flow of the code if the developer forgets to re-run other cells that depend on the changed cells. This contrasts with traditional Python development, which involves executing code sequentially in a file or in the REPL, providing a clearer code flow.
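
The stale-state problem can be reproduced without a notebook. In this sketch, a dictionary stands in for the kernel's shared namespace and each string stands in for one cell; the run_cell() helper is an assumption made for the example, not part of Jupyter.

```python
# The dictionary plays the role of the kernel's global namespace.
kernel = {}

def run_cell(source):
    """Execute one 'cell' in the shared namespace."""
    exec(source, kernel)

# First pass: cells 1, 2, and 3 executed in order.
run_cell("rate = 0.5")                      # cell 1
run_cell("total = 200 * rate")              # cell 2
run_cell("report = f'total is {total}'")    # cell 3

# The developer edits cell 1 and re-runs cells 1 and 3,
# but forgets cell 2, so 'total' still holds the old value.
run_cell("rate = 0.25")                     # cell 1, edited
run_cell("report = f'total is {total}'")    # cell 3
stale_report = kernel["report"]

# "Restart and run all": clear the state, execute everything in order.
kernel.clear()
for source in (
    "rate = 0.25",
    "total = 200 * rate",
    "report = f'total is {total}'",
):
    run_cell(source)
fresh_report = kernel["report"]

print(stale_report)   # total is 100.0
print(fresh_report)   # total is 50.0
```

The two reports differ even though the visible code in all three cells is identical at the end, which is exactly what makes this kind of bug hard to spot.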

When storing Jupyter notebooks in version control, another problem emerges. A notebook is saved in JSON format, combining its code and output (text as well as images) in a single file. This makes it difficult to track changes with version-control systems like Git. In particular, with non-deterministic tasks, such as those in machine learning, merely running the notebook can result in significant changes in the output cells, resulting in a large diff even if the code cells remain unchanged. Fortunately, a tool like nbdime can be used to display only the relevant differences in Jupyter notebooks. Another option is to always clear the output cells of a notebook before committing changes to Git.
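
As a rough illustration of the second option, the sketch below clears the outputs from a notebook using only the standard library. It assumes the version 4 notebook layout, where each code cell carries "outputs" and "execution_count" fields; the strip_outputs() helper and the tiny example notebook are made up for this illustration.

```python
import json

def strip_outputs(notebook):
    """Return a copy of a notebook dictionary with the outputs and
    execution counts of its code cells cleared."""
    cleaned = json.loads(json.dumps(notebook))  # deep copy via JSON
    for cell in cleaned.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["outputs"] = []
            cell["execution_count"] = None
    return cleaned

# A minimal, hand-written notebook with one Markdown and one code cell.
example = {
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": ["# Title"]},
        {
            "cell_type": "code",
            "execution_count": 3,
            "metadata": {},
            "outputs": [
                {"output_type": "stream", "name": "stdout",
                 "text": ["0.8731\n"]},
            ],
            "source": ["print(score)"],
        },
    ],
    "metadata": {},
    "nbformat": 4,
    "nbformat_minor": 5,
}

cleaned = strip_outputs(example)
print(json.dumps(cleaned, indent=1))
```

With the outputs gone, a diff between two commits only shows changes to the code and Markdown cells, at the cost of no longer storing the results alongside the code.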

Conclusion

JupyterLab 4.0 represents a significant step forward in the development of interactive notebooks for education and research. With its enhanced code editor, performance improvements, new Extension Manager, and expanded real-time collaboration capabilities, JupyterLab continues to be an invaluable tool for data scientists, engineers, and educators. Although the notebook concept still comes with its limitations, it's clear that JupyterLab offers a big productivity boost for many who are working with data. Give it a try for your next project.





Out of order execution

Posted Jun 28, 2023 17:19 UTC (Wed) by SLi (subscriber, #53131) [Link] (9 responses)

That out of order execution is the reason why I, as a data scientist, don't use these notebooks a lot. I may be opinionated, but it's horrible.

It's not even like it wouldn't be possible to do this right (though I grant it's likely nontrivial with a language like Python). In my ideal world, notebooks would behave a lot like any modern spreadsheet. Cells have (implicit) inputs and outputs, and when you update a cell, everything that depends on it either gets automatically updated—which is not necessarily a great idea if you have huge datasets and non-deterministic behavior (it might make sense to do this at a sub-cell granularity)—or at least get marked out of date.

Cell order absolutely should matter. Inserting a cell that changes a variable value should affect any cells below it that refer to that variable, and (generally) only those.

Out of order execution

Posted Jun 28, 2023 19:04 UTC (Wed) by summentier (subscriber, #100638) [Link] (2 responses)

You may want to take a look at Pluto, which comes fairly close to what you are describing, albeit only works with Julia. A Pluto notebook is presented to you as a sequence of cells, but internally it is a directed acyclic graph, where the cells are the vertices and dependencies are the edges. Changing one cell triggers recomputation of all dependent cells along the graph, regardless of the order.

Having seen my students flounder around with the "hidden state" and out-of-order execution problems, this to me seemed like a godsend: with Jupyter, we have to impress onto the students that if something looks weird, first try to restart the kernel and execute everything. Even nbgrader, the auto-grading engine we use to grade submissions, does this before handing in the notebook such that students don't submit non-working code by mistake.

However, having worked with Pluto for a while, I have to say that the beautiful concept does not translate that well to the real world: (1) I find I often have heavy calculation in one of the cells, and I don't always want to recompute even if I change one of its dependencies; (2) one of the big advantages of out-of-order execution is that I can watch an iterative procedure converge by, e.g., repeatedly executing a sequence of cells; (3) reordering cells works reasonably well for functions, where you can "hide details" down the file, but I find it is a poor match for my mental model when you are describing a sequence of steps; (4) Pluto must be extremely restrictive with global state, which I find does get in the way of experimentation.

Out of order execution

Posted Jun 29, 2023 1:13 UTC (Thu) by rsidd (subscriber, #2582) [Link]

I haven't even tried pluto precisely because of these worries. I use jupyter+julia heavily (I never "got" jupyterlab but maybe I will give 4.0 a try). Jupyter is both a testing ground for new module functions, and a place to import and run the module. My system is to load the module on the top of the notebook, do testing and development (currently it's a clustering program and I have multiple notebooks open doing different benchmarks), and as and when I write a new function that works and is needed in the module, I put it there. I can also edit existing functions in the module and Revise.jl gets the notebook to "do the right thing" magically.

Out of order execution

Posted Jun 29, 2023 14:03 UTC (Thu) by ballombe (subscriber, #9523) [Link]

Indeed, pluto system should use versioning of both states and results, so that you could browse the full history of what was computed and get back/forward as needed.
(see each cell as a file in a git repository and do 'git commit' each time a result is computed)

Out of order execution

Posted Jun 29, 2023 9:44 UTC (Thu) by spacefrogg (subscriber, #119608) [Link] (5 responses)

I use emacs, org-mode and org-babel for this. It can use the concept of dependent cells forming a graph and has a nice general concept of defining which cells to update when.

It is also language agnostic, so you can use output of one language as input to a different one. No need to premeditate which language to use in a notebook.

Out of order execution

Posted Jun 30, 2023 0:58 UTC (Fri) by intelfx (subscriber, #130118) [Link] (4 responses)

> I use emacs, org-mode and org-babel for this. It can use the concept of dependent cells forming a graph and has a nice general concept of defining which cells to update when.
>
> It is also language agnostic, so you can use output of one language as input to a different one. No need to premeditate which language to use in a notebook.

Sounds like this mechanism ought to be limited to feeding text output from cells into other cells? I guess that's something, but it’s absolutely inadequate for most non-trivial uses of Jupyter notebooks.

Out of order execution

Posted Jun 30, 2023 5:00 UTC (Fri) by rsidd (subscriber, #2582) [Link] (1 responses)

I was wondering this too. A jupyter notebook runs a single language kernel (python, julia, whatever), and that is a feature not a bug. You can have a global state, function definitions, etc, not just piping outputs to inputs. I see that babel has "session-based evaluation" for Python and some other languages which presumably allows the same thing.

The other thing is that emacs org-mode may be very useful to some people but will never take over the world: it's just too non-standard and non-intuitive if you haven't already lived much of your life inside emacs. (I use emacs but purely as a code editor. Most people younger than me don't use emacs.) Jupyter has arguably already taken over from proprietary platforms like mathematica and matlab. That's a win. Cf Economics Nobel laureate Paul Romer's article from 2018.

Out of order execution

Posted Jun 30, 2023 5:34 UTC (Fri) by SLi (subscriber, #53131) [Link]

This made me think of PowerShell. I'm not too familiar with it, being a Linux nerd, but from what I've seen and tried out (it's open source and works on Linux too), it feels like it does something right, allowing piping richer content than just text. Maybe something like that could be combined with a spreadsheet style auto update philosophy.

Out of order execution

Posted Jun 30, 2023 13:59 UTC (Fri) by spacefrogg (subscriber, #119608) [Link] (1 responses)

Strong words for somebody who didn't even take a minute to take a look at it. None of this is true. You can, of course, share code between snippets. Let all your code run in a single session, such that later snippets can implicitly use results from previous ones. Just like Jupyter. So, when staying within a single language it runs as a single program. But you can also pass values to and from other languages. Sure, it's not a complete byte-level interface, but it is much more flexible than Jupyter will ever be.

Out of order execution

Posted Jun 30, 2023 14:12 UTC (Fri) by intelfx (subscriber, #130118) [Link]

> Let all your code run in a single session, such that later snippets can implicitly use results from previous ones. Just like Jupyter. So, when staying within a single language it runs as a single program.

I can only assume the whole "concept of dependent cells" goes out of the window once you start using implicit state? Because if so, then it's no better than Jupyter.

(At best, I guess you could define dependencies manually (and maintain them alongside the code), which is nothing but one more place to inevitably mess up. I'm not sold, not in the slightest.)

JupyterLab 4.0: a development environment for education and research

Posted Jun 29, 2023 13:46 UTC (Thu) by pj (subscriber, #4506) [Link] (1 responses)

Glad to see a new release! Sad they haven't addressed anything from Joel Grus' "I Don't Like Notebooks" talk from Jupytercon 2018(!).

JupyterLab 4.0: a development environment for education and research

Posted Jun 29, 2023 18:20 UTC (Thu) by bluss (guest, #47454) [Link]

I think autocompletion has improved since 2018, there are plugins for linting, and there are ideas floating around for managing dependencies (but they are unfortunately not catching on, from what I know). Some of these issues would not be solved by the project but maybe by other people in the ecosystem.

He also focuses a lot on four code cells being executed out of order. That's not great, but it's a very minor issue in any of my notebook usage. If I can't use the restart and run all button repeatedly, I have a problem and I fix it.

JupyterLab 4.0: a development environment for education and research

Posted Jun 30, 2023 3:12 UTC (Fri) by azumanga (subscriber, #90158) [Link]

Recently I've been trying to use JupyterLab, and I've been finding the extensibility incredibly annoying to work with.

In Jupyter notebooks, it's easy to output some HTML and Javascript, making it very easy to make interactive graphics. JupyterLab has locked everything down, meaning it really wants you to write an installable python package, which means now I need to make a full python distributed pip package just to draw a clickable grid.

Even worse, as discussed here, they keep breaking the API every release.

After trying Jupyter Lab for a while, I'm going to stick with Jupyter notebooks, which is the format best supported anyway (for example, in vscode).

JupyterLab 4.0: a development environment for education and research

Posted Jul 6, 2023 11:03 UTC (Thu) by callegar (guest, #16148) [Link]

Most of the negative comments that I have read seem to be related to the very concept of a computational notebook interface (that owes most of its success to Mathematica going back to the end of the 1980s, and that is a great form of Literate Programming) with its pros and cons (including the out of order execution), rather than to the specific Jupyter lab implementation.

What seems to be very specific to Jupyter and more susceptible to criticism in my opinion is that Jupyter lab, now at version 4.0, seems to be still in a "young" phase of development, with a much stronger emphasis on adding features than on fixing issues. As a consequence, the project has a very large number (>2k) of bugs open, some of which are very old and some outstanding (e.g. the fact that printing notebooks from the browser is severely broken, and has been so almost forever, which is really weird given that we talk of literate programming). Very poor svg management is another notable issue, as is the poor management of embedded images for markdown cells.

In general the ability to store files embedded in the notebooks to be used as datasets by the code or by markdown cells (e.g. for images) would be useful to enable self contained notebooks. The ability to associate a notebook with "requirements" metadata would also be interesting.

Another thing specific to Jupyter lab that seems worth mentioning is that other implementations of the Notebook interface have means to group cells (e.g. in notebook sections) and to associate specific context (scoping rules) to the sections.

JupyterLab 4.0: a development environment for education and research

Posted Jul 23, 2023 16:01 UTC (Sun) by jondo (guest, #69852) [Link]

Late to the game, I want to mention that the Spyder IDE offers to mark and execute "cells" within a normal Python script [1], thus providing a way to work as exploratively there.

[1] https://docs.spyder-ide.org/5/panes/editor.html#code-cells


Copyright © 2023, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds