
LWN.net Weekly Edition for January 25, 2024

Welcome to the LWN.net Weekly Edition for January 25, 2024

This edition contains the following feature content:

  • Python, packaging, and pip—again: another round in the discussion of pip's role in the Python packaging ecosystem.
  • Microdot: a web framework for microcontrollers: a Flask-inspired framework that also runs under MicroPython.
  • mseal() gets closer: the memory-sealing system call nears the mainline in a simplified form.
  • The rest of the 6.8 merge window: the changes pulled during the second half of the 6.8 merge window.
  • Improved code generation in the CPython JIT: a micro-operation optimizer brings constant folding and more to the new JIT.
  • Jujutsu: a new, Git-compatible version control system: a simpler, faster take on distributed version control.

This week's edition also includes these inner pages:

  • Brief items: Brief news items from throughout the community.
  • Announcements: Newsletters, conferences, security updates, patches, and more.

Please enjoy this week's edition, and, as always, thank you for supporting LWN.net.

Comments (none posted)

Python, packaging, and pip—again

By Jake Edge
January 24, 2024

Python packaging discussions often seem to just go around and around, ending up where they started and recapitulating many of the points that have come up before. A recent discussion revolved, as such discussions often do, around the pip package installer. The central role that pip occupies has both good points and bad. There is a clear need for something that can install from the Python Package Index (PyPI) immediately after Python itself is installed. Whether additional features, including project management, should come "inside the box" as well is much less clear—not unlike the question of which project-management "style" should be chosen.

Early in 2023, we tried to cover a wide-ranging discussion regarding Python packaging, its tools, the governance of the Python Packaging Authority (PyPA), and more. The "authority" part of the name, which originally was meant as something of a joke, is not entirely accurate; there are efforts underway to update (or replace) PEP 609 ("Python Packaging Authority (PyPA) Governance") with a new "packaging council" and a clearer mandate. Meanwhile, there has been some progress in the packaging world since our article series, but it seems likely that none of the participants are completely happy with its extent. There is still a huge amount to do.

There are some PEPs that are being discussed and worked on in that area, including PEP 735 ("Dependency Groups in pyproject.toml"), which is authored by Stephen Rosen and sponsored by Brett Cannon; it will be accepted or rejected by the PEP delegate, Paul Moore. It specifies a way to store package dependencies in a pyproject.toml file that is analogous to the requirements.txt file that pip uses today. The TOML file is meant to be used by various other tools, such as IDEs and program launchers, in ways that go well beyond what the current format can provide. In addition, the requirements.txt file format is not standardized, and trying to standardize it now would raise lots of backward-compatibility concerns.
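
As a rough sketch of what the PEP envisions, a tool other than pip might read a dependency group along these lines; the [dependency-groups] table name follows the draft PEP, and the "test" group is a made-up example:

    # A minimal sketch, assuming a pyproject.toml that contains something like:
    #   [dependency-groups]
    #   test = ["pytest", "coverage"]
    import tomllib  # standard library since Python 3.11

    with open("pyproject.toml", "rb") as f:
        project = tomllib.load(f)

    # Look up the (hypothetical) "test" group; a real tool would then resolve
    # and install these requirements itself, or hand them off to an installer.
    test_requirements = project.get("dependency-groups", {}).get("test", [])
    print(test_requirements)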

It is against that backdrop, and apparently under the assumption that the PEP will be accepted, that Luca Baggi asked if pip would be changed to add dependencies to pyproject.toml. Moore, who is also a pip maintainer, noted that pip "isn't a project manager, it's an installer"; there are other tools for that job. There was a bit of discussion about what it might look like to add the feature, but "sinoroc" agreed with Moore that it is not in pip's scope: "pip does not deal with modifying this kind of data, pip is only a consumer of such data".

Pip maintainer Pradyun Gedam said that he would like to see pip expand into some "additional parts of the workflow" but that there simply are not enough developers to handle the existing load; "if we can't keep up with existing feature [maintenance], it gets worse when we add more features". That led Damian Shaw to suggest that it may be time to "consider bundling a tool with the CPython installer that does support this kind of package/environment manager workflow", though he recognized there would be a high bar for any project to cross to get bundled that way.

Moore cautioned against extending pip further, noting that many of the existing workflow tools use pip under the hood, so any changes would need to take that into account. Beyond that, for a new workflow tool to get adopted, a new PEP, along the lines of PEP 453 ("Explicit bootstrapping of pip in Python installations"), would need to be written and get approved. Things have changed in the ten years since that PEP, so it makes sense to consider a new path, but:

[...] I doubt there's a realistic possibility of anyone ([packaging] council, PyPA or community) being able to come to a decision on which workflow tool is going to be blessed as the "official answer". We've had way too many unproductive discussions on this in the recent past for me to think there's anything even remotely like a consensus.

Sinoroc suggested that pipx, which automatically installs command-line applications from PyPI into virtual environments and adds them to the user's path, might be a better choice for bootstrapping these days. Moore pointed out that pipx uses pip, so it does not remove the need for pip as part of the initial Python install. Both pip and pipx are experimentally available as standalone zipapp applications, which might mean that Python could stop shipping pip, but even then it is still "a non-trivial process getting from 'install Python' to 'ready to go'", he said.

When pip was added to the install, it was done to provide a means for users to install the workflow tool of their choice (as well as other packages of interest); it was not meant to be the be-all and end-all, Moore said. But Shaw thinks that users see it differently: "With pip bundled with the official installer and pypi.org displaying pip install package-name on every package page, the impression is that pip is the blessed official tool to use for managing 3rd party packages, not merely a simple way to bootstrap to your favorite tool of choice." If there is sufficient interest from both CPython developers and those of a project-management tool, he said, it may be worth looking at bundling such a tool to supplant pip.

But some of the difficulties that pip struggles under, such as needing to adhere to the CPython release cycle and to vendor all of its dependencies, would also affect any other tool that gets shipped. It is hard to see the developers of other tools being willing to do so, Moore said. In addition, even if there were candidates, as a core developer he does not "have the appetite to get CPython sucked into the 'which tool is best' controversies this would involve". Perhaps unfortunately, though, making that choice is exactly what last year's Python user survey showed was most desired, Brendan Barnwell said. Not making a choice is:

[...] ultimately incompatible with the goal of solving the fragmentation problem of Python packaging tooling. As long as the tools that are bundled with Python leave out large chunks of functionality that people want, while the ones that aren't bundled compete and none is clearly endorsed by Python, users will feel confused and irritated. It doesn't exactly have to be choosing the "best" one but I think a choice does need to be communicated about which tool(s) are recommended.

But core developer Steve Dower concurred with Moore; he also does not want to get pulled into trying to make that controversial choice. Furthermore, he pointed out that the core developers are not particularly sympathetic to packaging concerns:

And I've raised packaging-like questions with the broader core developer group before - the responses tend to range between "what's packaging" to "I don't care about packaging" to "they can do whatever they want". I'm afraid the most support from the core team comes from those of us who participate in this category, which is very few of us.

That division is not healthy, Moore noted, especially given that the responsibility for ensuring access to PyPI should be shared between the PyPA and core-development team. Since that is not happening, solutions to some of the problems that users complain about cannot come about:

Other solutions are possible. Making zipapps work better, and shipping a pip (or pipx) zipapp with core Python might be an option, for example. But only if the core devs take a more active interest in the deployment side of the developer experience.

That's pretty much where things stand; there was a bit more discussion, which continues as of this writing, about pip and its central—privileged—role in the ecosystem. That is, of course, much like many of the other, interminable discussions that are ongoing in the packaging category of the Python discussion forum. Incremental progress is seemingly being made, but the main problem identified by the user survey—and the huge number of complaints before that—remains. It is not at all clear what, if anything, will break the logjam.

Comments (40 posted)

Microdot: a web framework for microcontrollers

By Jake Edge
January 23, 2024

There are many different Python web frameworks, from nano-frameworks all the way up to the full-stack variety. One that recently caught my eye is Microdot, the "impossibly small web framework for Python and MicroPython"; since it targets MicroPython, it is plausible for running the user interface of an "internet of things" (IoT) device, for example. Beyond that, it is Flask-inspired, which should make it reasonably familiar to many potential web developers.

Microdot was created by Miguel Grinberg, who also created the well-known Flask Mega-Tutorial that has served as the introduction to the Flask web framework for many people. While Flask is considered to be a microframework, it still requires a full CPython environment to run; another Python microframework described alongside Flask in a 2019 LWN article, Bottle, has similar needs. Microdot came about because Grinberg wanted a framework for MicroPython and did not find anything usable back in 2019.

MicroPython is a cut-down version of Python that is suitable for microcontrollers; LWN looked at version 1.20 in May 2023. There is a lengthy list of differences between MicroPython and CPython, but there is enough overlap that Python—and web—programmers will feel right at home with Microdot. Code can be moved back and forth, because Microdot can also run under CPython, though, of course, MicroPython runs on regular CPUs as well.

As shown in Grinberg's announcement blog post, a canonical "hello world" web application looks fairly straightforward:

    from microdot import Microdot

    app = Microdot()

    @app.get('/')
    async def index(request):
        return {'hello': 'world'}

    app.run(debug=True)

The @app.get() decorator describes the URL to match and the index() function returns data that is used as the response. In this case, returning a dictionary causes Microdot to format the response as JSON; a status code or custom headers can be returned as the second and third elements of a multi-value return.
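
For instance, a handler can return a status code and extra headers along with the body; this sketch (with a made-up route and header) follows the multi-value convention described above:

    from microdot import Microdot

    app = Microdot()

    # Hypothetical route: the dictionary becomes a JSON body, 404 is the
    # status code, and the third element supplies a custom header.
    @app.get('/missing')
    async def missing(request):
        return {'error': 'not found'}, 404, {'X-Example': 'demo'}

    app.run(debug=True)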

One thing that does stand out in the example is the "async def" in the definition of index(). While Microdot 1.0, which was released in August 2022, had a synchronous base with an asyncio extension, Microdot 2.0, which was released in December, is designed to be fully asynchronous. Regular def can still be used with Microdot 2.0, but web-server responsiveness will suffer on microcontrollers, which generally lack threading support. Instead, inside a handler defined with async def, processing that requires I/O should be handled using await, rather than synchronously waiting for the operation to return control. Grinberg gave a talk on asynchronous Python at PyCon 2017 that may be of interest to readers who are not familiar with the topic.
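
As a rough illustration of the difference, a handler that needs to wait should yield to the event loop rather than block it; the route and the one-second delay here are arbitrary:

    import asyncio
    from microdot import Microdot

    app = Microdot()

    @app.get('/slow')
    async def slow(request):
        # A blocking time.sleep(1) here would stall every other request on a
        # single-threaded MicroPython board; awaiting lets the event loop
        # keep serving them in the meantime.
        await asyncio.sleep(1)
        return {'status': 'done'}

    app.run(debug=True)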

While Microdot is inspired by Flask, it is not beholden to it. In particular, Grinberg said that he liked the Flask Blueprints for modularizing web applications, but thought the Flask mechanism was too complicated. So Microdot has the concept of "mounting" sub-applications at different points in the URL hierarchy. As the documentation shows, a customers_app and an orders_app can be created in two separate files, which get imported into the main application source file in order to create the full application with something like:

    from microdot import Microdot
    # Hypothetical modules, each defining a Microdot sub-application:
    from customers import customers_app
    from orders import orders_app

    def create_app():
        app = Microdot()
        app.mount(customers_app, url_prefix='/customers')
        app.mount(orders_app, url_prefix='/orders')
        return app

That allows the functionality for customers and orders to be written as if it were at the top level of the hierarchy; the sub-applications can then be mounted where they should actually appear. Any reorganization would not require changes to the sub-applications, just a different mount point, as long as relative URLs are used within the sub-application code.

For rendering its pages, Microdot supports the familiar Jinja templating engine, but only for CPython. For MicroPython, utemplate can be used and it will work on CPython too; it implements a simplified version of the Jinja syntax. Other extensions provide WebSocket, secure user session, server-sent event (SSE), and other functionality.

In its most minimal configuration, Microdot clocks in at a little over 700 lines of code (plus comments and blank lines) in a single file; the full configuration is a bit over double that in 11 files. Grinberg contrasts that with Flask and FastAPI, which both come in around ten times the number of lines of code when including their main dependencies (Werkzeug and Starlette, respectively). Meanwhile, the third-party dependencies for Microdot are minimal, as one might guess:

The single-file minimal version does not require any dependencies at all, and it even includes its own web server. Some of the optional features do require third-party dependencies, and this is comparable to the larger frameworks. Like Flask, you will need to add a templating library (Jinja or uTemplate) to use templates. For user sessions, Microdot relies on PyJWT, while Flask uses its own Itsdangerous package.

Deploying a web application on MicroPython uses the built-in web server, but CPython can use Asynchronous Server Gateway Interface (ASGI) servers or those that support its predecessor of sorts: Web Server Gateway Interface (WSGI). TLS is also supported so serving can be done over HTTPS. There are several example programs using each server type, and both template engines, linked from the documentation. While WSGI is a synchronous protocol, Microdot itself runs in an asyncio event loop, so async def and await should still be used.

Overall, Microdot looks like a nice option for new projects for microcontrollers. Grinberg specifically does not recommend that people drop their current CPython framework in favor of Microdot, unless they are unhappy with it or need to really eke out the memory savings that Microdot can provide. There are now multiple MicroPython web frameworks and tools listed on the Awesome MicroPython web site, so there are lots more options than what Grinberg found five years ago. A simple web application makes for an almost ideal interface to a headless, network-connected device, so finding ways to make them easier to develop is welcome. From that perspective, Microdot is worth a look.

Comments (1 posted)

mseal() gets closer

By Jonathan Corbet
January 19, 2024
The proposed mseal() system call stirred up some controversy when it was first posted in October 2023. Since then, it has been evolving in a quieter fashion, and seems to have reached a point where the relevant commenters are willing to accept it. Should mseal() be merged in a future development cycle, it will look rather different than it did at the outset.

As a reminder, mseal() was created as a way of preventing changes to portions of the virtual address space. It is meant to thwart attacks that depend on changing memory that is supposed to be read-only or otherwise messing with a process's idea of how its memory is laid out. An attacker who can change memory permissions or mappings may, for example, be able to circumvent control-flow-integrity protections. By using mseal(), a process can prevent changes of that type from being made. The initial user is expected to be the Chrome browser, where it will be used to further harden the program against memory-based attacks.

mseal(), as proposed in October, had this prototype:

    int mseal(void *addr, size_t len, unsigned int types, unsigned int flags);

The types parameter, which allowed the caller to fine-tune the changes that mseal() would prohibit, was one of the more controversial features, with a number of people questioning why anything other than an outright ban on changes would be useful. Even so, version 2 (posted shortly after the first version) and version 3 (posted in mid-December) retained that parameter. In response to the latter posting, Linus Torvalds reiterated his dislike of that aspect of the API, and asked: "I want to know why we don't just do the BSD immutable thing, and why we need this multi-level sealing thing".

Chrome developer Stephen Röttger answered that Chrome needed the ability to allow madvise(MADV_DONTNEED) in specific places where the region could otherwise be sealed. This operation was forbidden in sealed memory because it is essentially a mapping change; it discards the underlying memory, which (for anonymous pages) will be refilled with zeroes if it is accessed again. It is useful for (for example) discarding unneeded cached data, but it also has the potential to create surprises. In the Chrome case, the type argument was used to allow MADV_DONTNEED on writable anonymous memory — memory that the process has the ability to write directly even when it is sealed. Torvalds replied that the proper solution was for mseal() to only allow MADV_DONTNEED if the mapping in question is writable. Indeed, he thought that restriction might make sense even in the absence of sealing.

As a result of that discussion, version 4, posted in early January, implemented the new semantics with regard to MADV_DONTNEED. This version also finally dropped the types parameter; memory is now either sealed or not. Torvalds was satisfied by the changes; he declared that "this seems all reasonable to me now" and withdrew from the discussion. The fifth version brought only small changes, suggesting that the major concerns have been addressed; Kees Cook noted that "this code is looking to land". Since then, version 6 was posted with a few more small changes.

If mseal() is merged in this form, its prototype will be:

    int mseal(void *addr, size_t len, unsigned long flags);

The addr and len describe the range of memory to be sealed; the flags argument is currently unused and must be zero. It will only be available on 64-bit systems. This documentation patch contains more information about its use.
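
Purely as an illustration, and not something that works on any released kernel, a direct invocation from user space might eventually look like the sketch below; the system-call number is a placeholder assumption and there is no C-library wrapper yet:

    # A hedged sketch: call the proposed mseal() via syscall(2) using ctypes.
    import ctypes
    import mmap

    libc = ctypes.CDLL(None, use_errno=True)
    SYS_mseal = 462                 # placeholder; check the kernel's syscall table

    region = mmap.mmap(-1, mmap.PAGESIZE)      # an anonymous mapping to seal
    addr = ctypes.addressof(ctypes.c_char.from_buffer(region))

    # The flags argument must be zero in the current proposal.
    if libc.syscall(SYS_mseal, ctypes.c_void_p(addr), mmap.PAGESIZE, 0) != 0:
        print("mseal() failed:", ctypes.get_errno())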

So this story may have run its course, but there is still one aspect of it that has been somewhat swept under the rug. OpenBSD has a similar system call, mimmutable(), that has been in place since 2022. It, too, prevents modifications to a specific range of the address space. Over the course of the conversation, simply implementing mimmutable() for Linux has been suggested a number of times. Jeff Xu, the developer of mseal(), has always shrugged off that suggestion, to the point that Theo de Raadt, the creator of mimmutable(), suggested that "maybe this proposal should be using the name chromesyscall() instead". It seems that implementing mimmutable() for Linux has never been seriously considered.

As mseal() has gotten simpler, though, the features that differentiated it from mimmutable() have melted away, to the point that they do almost the same thing. About the only difference is that mimmutable() allows downgrading permissions (setting memory read-only even if it has been sealed), while mseal() does not; OpenBSD may yet remove that feature, though, further reducing the difference in semantics between the two system calls. Given that, it may be worth asking, one more time, why Linux doesn't just adopt the existing interface and add mimmutable(). It is not a question that has been directly addressed.

Possible answers do exist. mseal() carries the flags parameter that long experience says is a good idea, even if the immediate need for it is not apparent. It may also be that the use of this system call will always be so specialized and low-level that any code using it will need to be system-specific in any case, in which case there may be little value in using the same name. Finally, adding an mimmutable() wrapper around mseal() in the C library would be an almost trivial undertaking if it were deemed worthwhile.

If and when mseal() is merged, it will initially only benefit the Chrome browser (and its small band of users). As the mseal() cover letter points out, though, Röttger is working on adding support to the GNU C Library so that most programs would be able to run with a fair amount of sealing automatically applied. That would greatly increase the use of this new system call, and the ability to use it in the C library would increase confidence that the API is correct. That seems likely to truly seal the deal.

Comments (20 posted)

The rest of the 6.8 merge window

By Jonathan Corbet
January 22, 2024
Linus Torvalds was able to release 6.8-rc1 and close the 6.8 merge window on time despite losing power to his home for most of a week. He noted that this merge window is "maybe a bit smaller than usual", but 12,239 non-merge changesets found their way into the mainline, so it's not that small. About 8,000 of those changes were merged since the first-half summary was written; the second half saw a lot of device-driver updates, but there were other interesting changes as well.

Some of the most significant changes pulled in the latter half of the 6.8 merge window are:

Architecture-specific

Core kernel

  • It is now possible to change the size of tracing sub-buffers used for the reporting of trace events to user space; see this documentation commit for more information.
  • One new "feature" — the scheduler performance regression encountered by Torvalds early in the merge window — has been removed with this fix.

Filesystems and block I/O

  • The MD_LINEAR, MD_MULTIPATH, and MD_FAULTY multiple-device (MD) targets have been deprecated since the 5.14 release in 2021; they have now been removed.

Hardware support

  • Clock: Qualcomm SC8280XP camera clock controllers, Qualcomm SM8650 global clock controllers, Qualcomm SM8650 TCSR clock controllers, Qualcomm SM8650 display clock controllers, Qualcomm SM8650 GPU clock controllers, Qualcomm QDU1000/QRU1000 ECPRI clock controllers, Qualcomm X1E80100 global clock controllers, MediaTek MT7988 clock controllers, Nuvoton MA35D1 realtime clocks, TI TPS6594 realtime clocks, and Analog Devices MAX31335 automotive realtime clocks.
  • GPIO and pin control: Realtek DHC GPIO controllers, Nuvoton BMC NPCM7xx/NPCM8xx SGPIO controllers, Qualcomm SM8550 LPASS LPI pin controllers, Qualcomm SM8650, SM4450 and X1E80100 pin controllers, TI TPS6594 PMIC GPIO controllers, and Intel Meteor Point pin controllers.
  • Graphics: Imagination Technologies PowerVR (Series 6 and later) and IMG GPUs, Synaptics R63353-based panels, and Ilitek ILI9805-based panels. Also merged is the Intel "Xe" driver for GPUs starting with the Tiger Lake generation. It is not enabled by default anywhere but that will change in some future kernel development cycle.
  • Hardware monitoring: Monolithic Power Systems MP5990 hot-swap controllers, Monolithic Power Systems mp2856/mp2857 modulation controllers, Analog Devices LTC4286 and LTC4287 hot-swap controllers, and Gigabyte Waterforce X240/X280/X360 AIO CPU coolers.
  • Industrial I/O: Maxim max34408/max344089 analog-to-digital converters, Bosch BMI323 I2C and SPI controllers, Microchip MCP9600 thermocouple EMF converters, Vishay VEML6075 UVA and UVB light sensors, Intersil ISL76682 light sensors, Melexis MLX90635 contactless infrared sensors, Honeywell HSC/SSC TruStability pressure sensors, Lite-On LTR-390UV-01 ambient light and UV sensors, Aosong AGS02MA TVOC sensors, Microchip MCP4801/02/11/12/21/22 digital-to-analog converters, and Analog Devices AD7091R8 analog-to-digital converters.
  • LED: Allwinner A100 RGB LED controllers and Maxim 5970 indication LEDs.
  • Media: Starfive camera subsystems, Chips&Media Wave codecs, GalaxyCore GC2145 and GC0308 sensors, THine THP7312 image signal processors, STMicroelectronics STM32 memory interface pixel processors, Techwell TW9900 video decoders, Allied Vision ALVIUM MIPI CSI-2 cameras, and OmniVision OV64A40 sensors.
  • Miscellaneous: Apple SoC mailboxes, Qualcomm PMIC PDCharger ULOG providers, Microchip MCP2200 HID USB-to-GPIO bridges, Nintendo NSO controllers, AWS EC2 Nitro security modules, Intel visual sensing controllers, AMD AXI 1-wire bus host interfaces, Qualcomm SM8650, SM6115 and X1E80100 interconnects, MPS MP3309C backlight controllers, Adafruit Mini I2C gamepads, and Loongson LS2X APB DMA controllers.
  • Sound: Qualcomm X1E80100 audio subsystems and Qualcomm WCD939x USBSS analog audio switches.

Miscellaneous

  • The perf tool has gained support for data-type profiling. Some more details, along with information on the usual large pile of other perf changes, can be found in this merge message.

Security-related

  • See this blog post from Paul Moore covering changes to the kernel's security subsystem in detail.
  • The AppArmor security module has switched its policy-hash verification from the SHA-1 hash to SHA-256.
  • The task of removing the strlcpy() API from the kernel is now complete.

Virtualization and containers

  • The guest-first memory feature for KVM has been merged. Guest-first memory can be allocated for and mapped into KVM guests, but is inaccessible to the host, making it suited to confidential-computing applications. There is also a new ioctl() call where the expected attributes for guest memory (including a lack of mapping in the host) can be specified. This changelog has some more information.
  • KVM on arm64 systems has gained support for 52-bit (LPA2) physical addresses.
  • KVM on x86 can now be built without Hyper-V emulation support, reducing the size of the resulting kernel.

Internal kernel changes

If all goes according to plan (which it pretty much always does), the 6.8 kernel will be released on March 10 or 17. Between now and then, though, there will certainly be a lot of bugs to find and fix.

Comments (52 posted)

Improved code generation in the CPython JIT

By Daroc Alden
January 18, 2024

As previously reported, Python 3.13 is set to include a copy-and-patch just-in-time (JIT) compiler that transforms Python bytecode into machine code for execution. Ken Jin from the Faster CPython project has been working on taking the JIT further by adding support for a peephole optimizer that rewrites the JIT's intermediate representation to introduce constant folding, type specialization, and other optimizations. Those techniques should provide significant benefits for the performance of many different types of code running on CPython.

Background

[Code transformations]

Currently, Python code is transformed in a few ways before execution. After being parsed, the code is transformed into a high-level bytecode. Since PEP 659 (Specializing Adaptive Interpreter) was adopted in version 3.11, Python has used a specializing bytecode interpreter. This work introduces "adaptive" instructions which, when run, replace themselves with an instruction specialized to the type of its arguments. Having inline type information can help eliminate overhead from Python's dynamic dispatch by caching the necessary functions in the bytecode directly.
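
A quick way to see specialization in action (assuming CPython 3.11 or later) is to disassemble a function in adaptive mode after it has run enough times to warm up:

    import dis

    def add(a, b):
        return a + b

    for _ in range(1000):         # warm up so BINARY_OP can specialize
        add(1, 2)

    # With adaptive=True, dis shows the specialized forms, for example a
    # BINARY_OP_ADD_INT where the generic BINARY_OP used to be.
    dis.dis(add, adaptive=True)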

The JIT introduced in version 3.13 takes Python's existing bytecode and transforms it into machine code. It does this in two steps, first breaking each instruction into micro-operations, and then translating each micro-operation into machine code. This two-pass approach is intended to allow Python to continue working in an environment where JIT compilation is not supported by permitting the micro-operations to be interpreted directly by a micro-operation interpreter (called the Tier 2 interpreter in the design documentation). This also serves as a specification against which to test the JIT to make sure it remains correct. The new optimizer operates on the micro-operations before they are interpreted or transformed into machine code.

In summary, Python code now passes through several phases as the interpreter warms up. It is parsed and compiled to bytecode, and then adaptive instructions specialize the bytecode. Next, the interpreter identifies hot spots and converts them to micro-operations. These pass through any available optimizers before being compiled to machine code by the JIT or executed directly by a specialized interpreter.

Optimizations

The micro-operation optimizer works using abstract interpretation. Unlike normal interpretation, where every value is known at the time the code is run, in abstract interpretation some of the information is known and some is represented by an opaque token standing in for information that will only be learned later, at the actual runtime of the program. The trace of the abstract interpreter's execution is used to generate the sequence of micro-operations to be compiled or executed. The optimizer takes the known information in the bytecode (provided by specialization) and simulates the operation of each instruction. When all of the inputs to an instruction are fully known, such as when adding constant values together, the result is also fully known and can be carried into further instructions without recording an instruction to be emitted in the output of the optimizer. When only some of the inputs are known, the optimizer can still sometimes emit an instruction which is specialized to that input, allowing bytecode specialization to propagate. When none of the inputs are known, the instruction is emitted to be executed at runtime.

The core operation of the optimizer — taking known values and combining them to reduce the number of operations needed at runtime — is known as constant folding, which is a common optimization in other languages. Python actually already performs constant folding on constants present in the source code. The benefit of doing constant propagation again for code about to be JIT compiled is that the optimizer can take advantage of information discovered by adaptive instructions that was not available to the bytecode compiler.
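
A toy version of the idea, which is only a sketch and not CPython's actual implementation, might track an abstract stack of known values and emit real operations only when their inputs are unknown:

    # Constant folding via abstract interpretation over a made-up
    # stack-machine instruction list; UNKNOWN stands in for runtime values.
    UNKNOWN = object()

    def fold(instructions):
        stack, out = [], []       # stack holds (value, index of its load in out)
        for op, arg in instructions:
            if op in ("LOAD_CONST", "LOAD_NAME"):
                value = arg if op == "LOAD_CONST" else UNKNOWN
                out.append([op, arg])
                stack.append((value, len(out) - 1))
            elif op == "ADD":
                (b, bi), (a, ai) = stack.pop(), stack.pop()
                if a is not UNKNOWN and b is not UNKNOWN:
                    # Both inputs known: fold them into a single constant
                    # load and drop the now-unneeded second load.
                    out[ai] = ["LOAD_CONST", a + b]
                    out[bi] = None
                    stack.append((a + b, ai))
                else:
                    out.append(["ADD", None])
                    stack.append((UNKNOWN, len(out) - 1))
        return [insn for insn in out if insn is not None]

    print(fold([("LOAD_CONST", 2), ("LOAD_CONST", 3), ("ADD", None),
                ("LOAD_NAME", "x"), ("ADD", None)]))
    # [['LOAD_CONST', 5], ['LOAD_NAME', 'x'], ['ADD', None]]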

Because Python is a dynamically typed language, the type information that adaptive instructions capture can be important to propagate via constant folding because of the impact on constant guards. Python bytecode frequently contains guard conditions that check whether an assumption, such as whether two operands have compatible types, is true — raising an exception or bailing to specialized handling code when it isn't. Guards are also used to ensure that specialized instructions remain applicable, and de-specialize the instruction if not. However, the bytecode compiler cannot always tell ahead of time whether two guards are redundant, such as when the type of one variable is the same as another variable, and therefore only one of them requires a check. The micro-operation optimizer is capable of noticing that checks depend on already-known information and eliding them.
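
As a hypothetical illustration (not an example from the discussion), consider a function in which the same variable's type is guarded more than once; within a single optimized trace, the second check adds nothing once the first has passed:

    def scale(x, y, factor):
        # In specialized bytecode, each float multiply guards the type of
        # "factor"; within one trace, the second guard on factor is implied
        # by the first and could be elided by the micro-operation optimizer.
        return (x * factor, y * factor)

    print(scale(1.0, 2.0, 0.5))   # (0.5, 1.0)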

Another especially useful optimization is "true" inlining of Python functions. Prior to Python 3.11, every call of a Python function required calling a recursive C function inside the interpreter, meaning that no inlining could occur. Python 3.11 introduced an inlining-like optimization to remove the overhead from the C function call by using "fake frames" to record function-call information on the Python stack, and convert Python-to-Python function calls into a simple jump inside the bytecode interpreter. This eliminates some but not all of the unnecessary overhead in making function calls.

The micro-operation optimizer is intended to eliminate the overhead from creating and destroying fake frames on the Python stack, allowing small functions to actually be inlined with no call overhead. This allows the locals from a called function to be placed in the stack frame of their parent. This optimization is expected to have a particularly dramatic effect on the overhead of creating new objects. Currently, creating an object in Python requires pushing and popping at least two frames to call the __init__() function of the object in question. By applying true inlining, the optimizer can ensure that simple object initializers that only fill slots in the object can be reduced to a series of stores directly following the memory allocation.
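
The kind of initializer that stands to benefit is the simple, slot-filling variety; in a sketch like this (a hypothetical class, not one from the discussion), true inlining could reduce Point(1, 2) to an allocation followed by two stores, with no frames pushed:

    class Point:
        __slots__ = ("x", "y")

        def __init__(self, x, y):
            # No control flow, just stores into the object's slots: an
            # ideal candidate for being inlined into the caller's frame.
            self.x = x
            self.y = y

    p = Point(1, 2)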

Other projects

This new work all integrates into the existing CPython implementation using the high-level optimizer API (introduced in version 3.12) that is intended to allow new optimizers to be developed as plugins and called by the interpreter on hot spots. These optimizers operate on "superblocks" of micro-operations. Like the more usual basic block representation, a superblock has exactly one entry point. Unlike a basic block, however, a superblock may have several exits, usually guards or places where an exception may be thrown.

While the high-level optimizer API may eventually be adopted by other projects, developers of Pyston and Cinder — two performance-oriented Python forks — said that the API was not sufficient for their use case, because it does not provide the JIT with enough control over what performance information is collected, and when functions are optimized. Currently, the new CPython JIT is the only client of the API.

Future work

While the new optimizer has not landed in Python's main branch yet, it is mostly working, and is expected to land soon once the tests and documentation have been updated. The new optimizer is one of several ongoing improvements to the new JIT. And improvements to the JIT are not the end of the Faster CPython project's plans. The availability of the JIT interpreter unblocks additional work, including improvements to the calling convention for Python code. Mark Shannon and Michael Droettboom said in one of the project's planning documents that the recent improvements in the core interpreter code have highlighted garbage collection as the next major performance blocker. Discussion on how to improve Python's garbage collector and many other pending performance improvements is ongoing on the Faster CPython ideas issue tracker.

Comments (7 posted)

Jujutsu: a new, Git-compatible version control system

By Daroc Alden
January 19, 2024

Jujutsu is a Git-compatible distributed version control system originally started as a hobby project by Martin von Zweigbergk in 2019. It is intended to be a simpler, more performant Git replacement. Jujutsu boasts a radically simplified user interface and integrates ideas from patch-based version control systems for a novel take on resolving merge conflicts. It is written in Rust and available under an Apache 2.0 license.

Unlike some other projects that build on top of Git — such as Gitless or Magit — Jujutsu is designed with eventual independence from Git in mind. Jujutsu's own code is written in Rust, but it links libgit2 (a C implementation of core Git features) to interact with Git repositories. Jujutsu can either use a Git repository as its storage backend, or use its own custom storage backend. The native backend is not yet ready for serious use. The project's README states: "The backend exists mainly to make sure that it's possible to eventually add functionality that cannot easily be added to the Git backend". Von Zweigbergk, who is now paid by Google to work on the project, plans to extend Jujutsu with a backend for Google's internal cloud-based storage (as shown in the slides of his 2022 Git Merge talk). Jujutsu is designed to fetch information from its storage backend lazily, specifically to support large monorepos like the one used at Google.

For now, Jujutsu seeks to be usable as a Git replacement for simple, everyday workflows. The core contributors use it to work on the Jujutsu repository without significant problems. However, the project still lacks support for some of Git's more esoteric features, including submodules, partial clones, shallow clones, multiple work trees, sparse checkouts, signed commits, and Git Large File Storage (LFS).

Another feature that may be hard to do without is support for hooks: external scripts that can be called at different points during Git's operations. Git used to have many components written as shell scripts, making it easy to add hooks to be called for various operations. Even though many core components have been rewritten in C, Git has maintained backward compatibility around when hooks will be invoked. Git contributor Elijah Newren cited this as one of the difficulties with improving the performance of Git's rebase operations.

There are plans to add support for a subset of hooks to Jujutsu, but this is more difficult than it might appear. Unlike Git, Jujutsu aims to have all operations work directly against the underlying storage, with any necessary changes rendered to the filesystem afterward. This choice allows Jujutsu to work with the same commands in a bare repository, and to perform more efficient in-memory transformations without touching the filesystem, but makes it challenging to support hooks. Since hooks are external programs, they usually expect to interact with a repository through the filesystem, which is impossible when operations are performed directly in memory. This lack also means that Jujutsu has no configurable merge strategies.

Jujutsu's from-scratch design with performance in mind means that it can complete rebase operations significantly faster than Git can. Jujutsu's superior performance has driven Elijah Newren and Christian Couder to propose a new "git replay" command that makes rebase operations faster. It does this partly by adopting reasonable defaults (such as making the equivalent of "--reapply-cherry-picks" the default), and partly by avoiding walking the commit graph as much as "git rebase" does. Walking the commit graph to obtain extra information that would reduce the number of required merges was a reasonable performance optimization when "git rebase" was first introduced, but now other performance improvements to merging have made the tradeoff not worth it for most situations. While "git replay" offers a significant improvement over "git rebase", it still lags behind Jujutsu's rebase operation.

Features

Jujutsu differs from Git in several other ways. The most striking is perhaps the removal of the index. In Jujutsu, the working tree is represented directly by a real commit. Editing that commit can be done directly by editing the files on disk, with no need to stage or unstage them. Running a Jujutsu command will copy any changes from the filesystem into the working commit before taking other actions. To finalize the commit and stop editing it, the user simply creates a new working commit on top of the old one with "jj new". Since the working copy is a commit like any other, Jujutsu doesn't need any equivalent of commands that modify the index, such as "git add" or "git stash".

Most commands in Jujutsu affect the working copy by default, but can perform the same operations on other commits by specifying their revision. For example, "jj describe" is used to alter the commit message of a commit. If invoked without any other arguments, it alters the commit message of the working copy. If invoked with another revision, it alters the commit message of a historical commit and transparently rebases everything that depends on it. Other commands, such as "jj move", which moves a diff between commits, work on the same principle.
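
A short sketch of that workflow might look like the following; the -m flag comes from Jujutsu's documented command-line interface rather than from this article, and the commit message is made up:

    jj describe -m "Teach the frobnicator to frob"   # set the working commit's message
    jj new                                           # start a fresh working commit on top
    jj log                                           # the described commit now has a new child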

Automatically rebasing commits like this, while convenient, creates its own set of problems, especially when rebasing commits that have already been sent to other developers or a centralized server. To ameliorate the many conflicts that implicit rebases create, Jujutsu uses an idea from patch-based version control systems such as Darcs and Pijul. These systems allow one to commit conflicts and resolve them later. This does not involve committing the textual conflict markers, but rather a representation of the conflicting trees. Then those conflicts can be resolved in later commits.

Von Zweigbergk gave an example of how this works. Consider a Git history that looks like this:

               o---*---o topic
              /
     o---o---o---*---o---o main

The two commits marked with asterisks have conflicting changes. The author of this code would like to merge the main branch into the topic branch, resolve the merge conflicts, and then later rebase the topic branch onto the main branch to eliminate unnecessary merges. With Git, the author could use "git rerere" to remember the conflict resolution and replay it later while rebasing. With Jujutsu, they could simply rebase on top of the main branch and fix the conflict, resulting in this history:

                                   o---*---o---+ topic
                                  /
     o---o---o---*---o---o---o---o   main

Now if the author inspects this history with "jj log", they will see that the commit marked with an asterisk on the topic branch is marked as having a conflict, as is the commit after it. The commit with a plus is the one that resolves the conflict. Later, if they want to clean up their history, they could move the conflict resolution into the original problematic commit and obtain a clean linear history like so:

    jj move --from <plus commit> --to <star commit>

By storing conflicts in this way, Jujutsu can ensure that merges and rebases always "succeed" — they might just leave some conflicted trees that can be dealt with afterward. Conflicted trees can be committed, checked out, and edited like ordinary trees. Conflicted files are rendered with textual conflict markers like the ones that Git adds after a failed merge, but are represented internally as a series of diffs. One use case for which the approach of storing conflicts might not work as well is bisecting, since conflicted commits may not build correctly. There are plans to add bisection support, but the implementation is not yet finalized. One advantage of representing conflicts in this way is that Jujutsu doesn't need an equivalent of "git rebase --continue". Every rebase and every merge completes in one command, without stopping in the middle to ask for assistance, as Git is prone to do.

This permits Jujutsu's final headline feature: the operation log. Like Git's reflog, the operation log tracks previous states of the repository. Unlike the reflog, the operation log treats rebases or other complex operations as single atomic entries. The operation log then powers the "jj undo" command that can undo the effects of any other Jujutsu command.

This combination of fast, atomic, pervasive rebases provides a different vision of how to manage a repository. Whether Jujutsu's user interface is ultimately an improvement remains to be seen, although the radical simplicity of its design is promising.

Trying Jujutsu

While Jujutsu has not yet been packaged for most distributions (the exception being NixOS), readers interested in trying it can download a precompiled version or compile the source. Cloning an existing Git repository for use with Jujutsu is done with "jj git clone". Cloning Git repositories with many refs can be slow, and the Jujutsu documentation warns that hybrid repositories that use Git and Jujutsu together may see Jujutsu commands run slowly because of the need to check Git refs for changes between commands.

The command-line interface is fairly similar to Git, except for the omission of commands to manipulate the index. One difference to be aware of is that "jj log" only shows local commits by default. To show the full history of the repository, one uses:

    jj log -r 'all()'

The Jujutsu documentation has a thorough introduction and a comparison between Git commands and Jujutsu commands.

Conclusion

Jujutsu has come a long way in only a few years. It is already usable for working on projects that don't need more complex Git features, and offers a more consistent user interface, better performance for some operations, and an interesting approach to conflict resolution. With ongoing support from Google, it seems likely that Jujutsu will continue to see active development.

At the same time, Jujutsu lacks some of the features that make Git flexibly adaptable to different use cases, such as hooks or submodules. Importing many refs from Git remains slow, and there are still some rough edges around getting a repository initially set up. Whether Jujutsu will come to be used outside of Google depends on whether its simplified interface wins out over its reduced applicability to uncommon workflows.

Comments (123 posted)

Page editor: Jonathan Corbet

Inside this week's LWN.net Weekly Edition

  • Briefs: Linux 6.8-rc1; Slowroll misunderstandings; Firefox 122; Sourcehut outage; Vizio ruling; Dave Mills RIP; Quotes; ...
  • Announcements: Newsletters, conferences, security updates, patches, and more.

Copyright © 2024, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds