
LWN.net Weekly Edition for January 18, 2024

Welcome to the LWN.net Weekly Edition for January 18, 2024

This edition contains the following feature content:
  • Growing pains for typing in Python: the new typing council and tensions over the direction of Python's type system.
  • The kernel "closure" API: an introduction to a reference-counting primitive now available in the kernel's lib directory.
  • The first half of the 6.8 merge window: the first 4,282 changesets merged for the next kernel release.
  • Rust and C filesystem APIs: how closely should Rust abstractions follow the kernel's existing C object model?

This week's edition also includes these inner pages:

  • Brief items: Brief news items from throughout the community.
  • Announcements: Newsletters, conferences, security updates, patches, and more.

Please enjoy this week's edition, and, as always, thank you for supporting LWN.net.

Comments (none posted)

Growing pains for typing in Python

By Jake Edge
January 17, 2024

Python's static-typing feature has come a long way since it was introduced in 2014. Adding type information to functions has always been—and will remain—optional, but typing is still somewhat contentious. There are multiple kinds of consumers of the information, each with their own needs and wishes, as well as users of the feature with expectations of their own. That has led to the formation of a Python typing council to govern the type system for the language, though, as might be guessed, there are still grumblings from various quarters.
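Annotations are optional in the most literal sense: the interpreter records them as metadata but never enforces them, so annotated and unannotated functions behave identically at run time. A minimal sketch:

```python
def greet(name):
    return f"hello, {name}"

def greet_typed(name: str) -> str:
    return f"hello, {name}"

# Both behave identically at run time.
print(greet("world"))          # hello, world
print(greet_typed("world"))    # hello, world

# The interpreter merely records the annotations:
print(greet_typed.__annotations__)

# No error at run time; only a static type checker would flag this call:
print(greet_typed(42))         # hello, 42
```

That gap between what the interpreter does and what the checkers do is precisely where the disagreements described below arise.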

Divisive

A long, recently rekindled, thread about typing and its future was kicked off back in September on the Python Discourse forum with a lengthy post from Michael Hall. He noted that the topic is "more than a little divisive in Python" because there are people pulling in different directions. There are some, like himself, who want the type system to be more strict, others who are concerned that the tools and community are making the feature less optional than advertised, and there are also discussions on putting extra non-type information into the type system. Part of the problem is that there is "no clear answer to the question 'Who is this for?'".

Hall suggested a series of steps, starting with formalizing the type system, making a reference document for it, and specifying what the type system's purpose and goals are. Jelle Zijlstra, who has been doing a lot of work on Python typing, replied that he would have a proposal soon "that will address some (though not all) of your points, creating a path towards a specification". He noted that Hall is focused on "soundness" for the type system, but that is not necessarily a goal—in fact, it may be difficult to achieve with the current tools:

You clearly care deeply about making the type system sound, but for many type checkers full soundness has not historically been a goal. And for type checkers like mypy and pyright that are used by numerous projects, any changes now can have backward compatibility considerations, so even if the maintainers of these type checkers agree with soundness as a goal (they may not!), it will take them a long time to get there.

What followed was some further discussion, also at length, about what was meant by "soundness"—rigor and consistency to a first approximation—which Hall thinks is essential, while others see it differently. Pyright developer Eric Traut noted that Python is a "big tent" programming language with diverse needs from its users and that the same is true for static typing in the language. In addition, the notions of type safety and soundness are not so black-and-white—there is a lot of gray. He thought that it would be difficult to find a single answer to the question of "purpose", but had a suggestion on how to approach it:

If we're going to attempt to codify the purpose of the type system, I suggest that we do so in terms of user value. An answer like "the purpose of a type system is to allow a type checker to enforce type safety" is hard to evaluate because it begs the question "what exactly does that mean?" and "why should a user care?". If we frame the answer in terms of user value, I think it will help us find common ground.

David Foster, who is a contributor to Python typing, gave his thoughts on the question of the feature's targets, noting that long-lived, large, and collaborative projects are likely to benefit most. In his view, the goal should be to catch common error types and to provide "reliable machine-verified documentation that cannot get out of date"; that all should be driven by patterns in real-world Python programs. He agreed with Traut that soundness is not clear-cut, but also pointed out that he believes unsound typing is still useful.
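The gray areas are easy to hit in practice. For example, Any is an intentional escape hatch in the type system: once a value is typed as Any, a checker will accept operations on it that can fail at run time, so even fully annotated code is not sound. A small illustration (not taken from the thread):

```python
from typing import Any, List

def first_upper(words: List[str]) -> str:
    return words[0].upper()

data: Any = [1, 2, 3]   # Any silences the type checker

# A checker accepts this call, because Any is compatible with List[str],
# but it raises AttributeError at run time: ints have no .upper().
try:
    first_upper(data)
except AttributeError as exc:
    print("unsound:", exc)
```

Deliberate escape hatches like this are part of why "soundness" is a goal some checker developers are reluctant to sign up for.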

Longtime Python developer Skip Montanaro is getting increasingly worried about what he sees going on with typing. It looks complex and the benefits marginal, he said, but typing is creeping in everywhere:

My understanding was that type annotations were always supposed to be optional. Technically, I suppose they still are. However, as tooling shifts to demand more and more metadata be specified through annotations or open source projects require type annotations in all submissions, I fear it has become optional in name only. I think at some point a fence will need to be built around type annotations to prevent them from infecting the entire ecosystem.

While there is no fence, there is typeshed, mypy maintainer Shantanu Jain said, which contains stubs for libraries that do not have their own type annotations; it should help reduce pressure on library developers coming from users who want them to add type information. Another problem is that it is difficult for developers to write annotations that work with all of the different type checkers that are available, as Foster, Hall, and others mentioned; type documentation that is type-checker-independent is largely non-existent.

Montanaro said that he is not only concerned about library authors being badgered about adding type information; the existing documentation mechanism, docstrings, is being ignored in favor of type information. "My understanding is that Visual Studio no longer uses that info, but relies only on type annotations when prompting for completions." Python packaging developer Paul Moore had an example of that:

As a user of VS Code, the autocompletion and informational pop-ups are often very nice. But they can be utterly terrible when type annotations are complicated. As an example, the popup for divmod says:
    (__x: SupportsDivMod[_T_contra@divmod, _T_co@divmod], \
     __y: _T_contra@divmod, /) -> _T_co@divmod

While that information may be easy to get (from type annotations) it's frankly worse than useless as actual documentation. In contrast, help(divmod) gives:

    divmod(x, y, /)
    Return the tuple (x//y, x%y). Invariant: div*y + mod == x.

I'm not entirely sure what conclusions to draw from this. But it very much feels like the pressure to "use type annotations" has resulted in a badly degraded user experience.

He agreed with Montanaro that a fence was needed, but thought it should be "a social one, rather than a technical one"; Moore would like to see it become more acceptable for libraries not to include type annotations and for the tools to treat that code as perfectly valid. In addition, as his example showed, annotations are not necessarily human-readable so tools should "promote non-typing solutions that are readable". To that end, Jain created a document with an "anti-pitch" for type annotations, which was eventually added to the typing repository (and documentation) as "Reasons to avoid static type checking".

In yet another lengthy message, Brendan Barnwell agreed with Moore and Montanaro about the path of typing in Python; the focus should be on user experience and not "making programming harder for humans in order to make it easier for IDEs". He concluded by noting that there are a large number of typing PEPs that have been accepted; "I do wonder whether we really needed (or still need) that level of elaboration for an entirely optional language feature with no effect on runtime behavior."

After that, there was a fair amount of back-and-forth discussion, largely between Moore and Hall, on soundness, correctness, and what is possible (and not) within typing theory. Guido van Rossum popped in on the conversation late; once he mostly got up to speed, he observed that the goal of Hall's original post, as described in the subject ("A more useful and less divisive future for typing?"), may not be in the cards:

Going back to the subject line of this thread, it feels like whatever the benefits of a set-theoretic type system, "less divisive" does not appear particularly likely at this point. :(

Typing council

As that conversation was mostly winding down, Zijlstra announced the first draft of PEP 729 ("Typing governance process"), which proposes the formation of a typing council "that would help shepherd the Python type system by creating a conformance test suite and specification". It was co-authored by Jain and came out of a September discussion on typing governance that was proceeding in parallel with the thread started by Hall.

The PEP was favorably received in the thread and feedback was provided, some of which was reflected in the PEP. It proposed a five-member council; at the end of October, Zijlstra added the initial members: himself, Van Rossum, Jain, Traut, and pytype maintainer Rebecca Chen. The PEP was submitted to the steering council, which accepted it on November 20.

There have been lots of discussions in the Python typing forum of various aspects of typing from before and since the formation of the council. Beyond that, GitHub issues are being used to propose changes, which are being reflected in the specifications, conformance tests, and documentation in the typing repository.

User experience

Hall returned to the original thread in January with an "example of tools being placed above users". His argument was that the removal of a feature from PEP 646 ("Variadic Generics") was done because implementation would be difficult in the Pyre type checker, which might mean it would also be hard for others. To an outside observer, the description of the feature removed ("support for type variable tuples in `Union`") would seem to bolster the case about complexities in the type system that others (e.g. Moore) were pointing out; that description is not particularly clear to those not deeply immersed in the typing world. Hall said that there was evidence (from the mypy_primer regression tester) that projects had actually shown interest in using the feature.

Traut disagreed, saying that the PEP authors did not have a use case in mind for the feature when it was added; it does not make sense to add functionality without a use case. Beyond that, implementation complexity is something that needs to be taken into account:

From the time that the first draft PEP 646 was presented for general review, it took almost three years for mypy to add support for it. Pyre still hasn't added full support, and pytype hasn't even started to add support for it. New type system features that are too complex to implement in type checkers do not help the community. If you are involved in drafting a typing PEP, please take that into consideration.

But Hall said that the use cases for the feature were apparent. While it is understandable that type-checker developers need to consider the implementation difficulty, that puts users in a bind. "Things go from being typed as Any or untyped because the type system can't handle it yet, to people rewriting perfectly good code when it turns into 'and it never will be supported'." Elizabeth King had a similar complaint:

You incrementally support more and that's supposed to be a good thing right? In reality, that doesn't work for specifications. You get ossification as implementations block each other on [compatibility]. How many things are in the typeshed with a comment about a specific type checker not handling something correctly? Special casing that turns into a bigger problem down the road. This becomes self-fulfilling if even one implementation does this if implementations are allowed to be a blocker for the specification of something that makes sense.

At some point if typing is going to keep up with demands from users, it needs to stop being implementations that dictate what the type system supports, and only a matter of if a type checker is compliant with a specification made to support real use.

Moore said that a missing piece is a commitment to "prioritise user experience over implementation convenience" as an explicit goal of the typing council. Zijlstra thought that a blanket statement about priorities was not reasonable, "because a spec that is never implemented cannot help users". Meanwhile, he believes that the process worked in this case, since there were "no known use cases at the time"; he suggested that now that there are use cases, anyone who wants to re-add that feature should "go through the process for updating the typing spec".

There is a clear disagreement about the question of use cases and when they were known—or should have been known—with regard to this feature. Hall strongly disagreed with Zijlstra; in fact, he thinks the lack of recognition of the need for the feature is the biggest part of the problem, in general, for typing. There is a chicken-and-egg problem of sorts with typing features: without support, things that need the feature are not given types, so there are no clear use cases. That calls for careful consideration:

The type system is being bolted onto the language, and the gradual nature means people need to look at not only the things they are making now, but how this may interact with other things and things that haven't been typable so far.

Tensions

That is where things were left, but the thread clearly highlights the tension within the typing community. There are various things contributing to that, but the crux of the matter was summed up in that first message: there are too many constituencies, pulling in too many different directions. There is an enormous amount of Python code out there in the world without a single type annotation—and that is not likely to change substantially, probably for a long time. Meanwhile, there is a loud and likely growing group of Python developers who believe that "all code must have typing information"—to the extent that there have been proposals that would make typing implicitly, or even explicitly, required. Those were quickly shot down, and always will be, given that PEP 484 ("Type Hints"), which is the starting point for Python typing, clearly states:

It should also be emphasized that Python will remain a dynamically typed language, and the authors have no desire to ever make type hints mandatory, even by convention.

It is into this mini-maelstrom that the typing council steps; it has a big job ahead of it. Bolting on (or, more politely, gradually adopting) a typing system was always going to be difficult, but the existence of the current set of tools may actually be making it harder. Those type checkers and integrated development environments (IDEs) have to ensure that they continue to function and to report the same kinds of problems and results, even as the specification changes out from under them. Choices made nearly ten years ago may not have aged well, but those tools are somewhat stuck at this point. Whether the council can help cut through this Gordian Knot remains to be seen.

The discussion highlighted another thing that is something of a hallmark of the Python world: cordiality even in the face of strong disagreement. Though public conversations have generally improved over the last decade or so, one does not have to look too far to find rancor and flames on the internet. But Python conversations are almost always collegial—and have been for decades. For example, several took the opportunity of the thread to thank the typing developers and to recognize that the feature does have uses (and users), even if they were not particularly inclined toward using typing themselves. It all helps reinforce Brett Cannon's adage that he came for the language, but stayed for the community. It also bodes well for finding reasonable solutions and compromises for typing—and more.

Comments (25 posted)

The kernel "closure" API

By Jonathan Corbet
January 11, 2024
The data structure known as a "closure" first found its way into the mainline kernel with the addition of bcache in the 3.10 development cycle. With the advent of bcachefs in 6.7, though, it acquired a second user and was moved to the kernel's lib directory, making it available to other kernel users as well. The documentation of closures in the source is better than that of many things in the kernel, but there is still room for a gentler introduction.

As include/linux/closure.h notes: "Closure is perhaps the most overused and abused term in computer science, but since I've been unable to come up with anything better you're stuck with it again". In the kernel sense, a closure can be thought of as a reference count tracking some number of things that need to happen, along with some synchronization features and a hierarchical organization.

To start working with closures, one should allocate a structure of type struct closure and initialize it with:

    #include <linux/closure.h>

    void closure_init(struct closure *cl, struct closure *parent);

Where cl is the closure to be initialized, and parent is used to create a parent relationship, which will be described below. On return from this call, the caller owns a reference to the closure that must eventually be given back.

A closure's reference count can be manipulated with:

    void closure_get(struct closure *cl);
    void closure_put(struct closure *cl);

Closures have a few mildly quirky rules, one of which is that only references obtained with closure_get() can be released with closure_put(); the initial reference obtained from closure_init() is special and must be handled differently.

There are a couple of ways of managing that initial reference; to understand them, it's worth keeping in mind what closures are for. Essentially, they allow a thread running in the kernel to place one or more operations (a set of I/O requests, for example) in motion and then wait for all of those operations to complete. To do so, that thread will initialize its closure, then start those other operations, each of which will involve calling closure_get() to obtain a reference to the closure. As each operation completes, a closure_put() call is made. When the closure's reference count drops to one, all of those operations are complete and the next step, whatever it is, can be taken.

It is up to the creator of the closure to arrange for that next step once the closure has dropped back to just the initial reference. One option for doing that is for the initiating thread to simply wait until the reference count drops by calling both of:

    bool closure_wait(struct closure_waitlist *list, struct closure *cl);
    void closure_sync(struct closure *cl);

The caller should allocate list separately. Another rule of closures is that they can only be on one wait list at a time; if the given cl is already on a list, closure_wait() will immediately return false. Otherwise it will place the closure on the given list. A call to closure_sync() will then block the current thread until the reference count drops to one.

If the initiating thread does not want to wait synchronously for the closure to complete, the alternative is to arrange for a sort of callback when the reference count drops to one:

    typedef void (closure_fn) (struct closure *);
    void continue_at(struct closure *cl, closure_fn *callback,
                     struct workqueue_struct *wq);

This call will arrange for callback() to be called when the last closure_put() call is made — the point where only the initial reference to the closure remains. If wq is non-NULL, it specifies the workqueue to be used to make this call; otherwise the call will be made directly from closure_put(). The call to continue_at() drops the caller's reference to cl (which, remember, is the initial reference created when the closure was set up), so the caller should not touch it further; indeed, the rules for closures say that the caller should return immediately after the call.

The way to destroy a closure is to call continue_at() with a NULL callback() pointer; that is the signal that the closure is done. The macro closure_return() is defined as a shorthand for this call:

    #define closure_return(_cl)	continue_at((_cl), NULL, NULL)

There is also a variant, closure_return_with_destructor(), that takes a second closure_fn() pointer indicating a function to call when all references have been dropped and the closure is finished.

As noted above, closures can be initialized with a parent pointer; this allows the caller to set up a hierarchy of dependent events. When a closure is initialized, it will take a reference (with closure_get()) on the parent if one is specified; as a result, the parent will continue to exist for as long as the new closure does. When a closure is finished with the special continue_at() call, the reference to the parent will be dropped with closure_put(). This mechanism ensures that the parent closure will not complete until all of its child closures have finished.

Needless to say, there are other complications in the closure API as well, but the above covers the core of it. As of this writing, only bcache and bcachefs use closures. In the past, there have been occasional vague objections to the closure abstraction, but those have not prevented its use so far. Whether its usage will grow will depend entirely on whether other developers find it useful.

Comments (9 posted)

The first half of the 6.8 merge window

By Jonathan Corbet
January 12, 2024
The 6.8 merge window has gotten off to a relatively slow start; reasons for that include a significant scheduler performance regression that Linus Torvalds stumbled into and has spent time tracking down. Even so, 4,282 non-merge changesets have found their way into the mainline repository for the 6.8 release as of this writing. These commits have brought a number of significant changes and new features.

Some of the more interesting changes merged so far include:

Core kernel

  • The deadline servers mechanism has been added as a way to prevent the starvation of normal tasks when realtime tasks are using all available CPU time.
  • The zswap subsystem has gained the ability to force cold pages out to (real) swap when memory gets tight. This commit includes some documentation on how to opt into or out of this feature.

    There is also a new zswap mode that disables writing back to swap entirely; see this commit for details.

  • The DAMON memory-management facility now supports an auto-tuning mechanism; see this changelog for more information.
  • The new TRANSPARENT_HUGEPAGE_NEVER configuration option causes the use of transparent huge pages to be disabled by default.
  • Transparent huge pages can now be allocated in multiple sizes below the normal huge-page size. See this commit for some documentation on how to control this feature.
  • The new UFFDIO_MOVE operation for userfaultfd() allows pages to be moved within a virtual address space; see this commit for details.
  • The "KSM advisor" feature allows for automated tuning of the kernel samepage merging subsystem; see this commit and this documentation patch for details.
  • The BPF verifier has seen a considerable amount of work that should result in successful verification of a wider range of correct programs.

Filesystems and block I/O

  • The kernel is now able to prevent direct writes to block devices that contain mounted filesystems. This feature, controlled by the BLK_DEV_WRITE_MOUNTED configuration option, is disabled by default but seems likely to be enabled by distributors if it is shown to not break existing workloads. Writes to devices containing mounted Btrfs filesystems remain unrestricted in any case for now, pending the merging of some support patches into that filesystem. (See this article for some background on this change).
  • The listmount() and statmount() system calls have been merged; they allow user space to obtain detailed information about mounted filesystems. See this changelog for more information.
  • The XFS filesystem continues to accumulate changes adding support for the eventual online-repair feature.
  • The SMB filesystem has gained the ability to create block and character special files.
  • Bcachefs now has a partial (but functional) online filesystem check and repair mechanism.

Hardware support

  • Miscellaneous: DesignWare PCIe performance-monitoring units, Intel IAA compression accelerators, Intel QAT_420xx crypto accelerators, and Lantiq PEF2256 (FALC56) pin controllers.
  • Networking: Lantiq PEF2256 (FALC56) framers and Texas Instruments DP83TG720 Ethernet 1000Base-T1 PHYs. Also: a number of ancient wireless drivers (atmel, hostap, zd1201, orinoco, ray_cs, wl3501, rndis_wlan, and libertas 16-bit PCMCIA) have been removed.

Miscellaneous

  • Rust support has been added for the creation of network PHY drivers. This work includes a set of abstractions making the driver API available and a reference driver for Asix PHYs. This is the first user-visible Rust code added to the kernel, though it duplicates the functionality of an existing driver and thus does not add new features — yet.

Networking

  • There has been a fair amount of low-level work to reorganize a number of core networking data structures for better cache efficiency. This may seem like a small change but, as the networking pull request noted: "This improves TCP performances with many concurrent connections up to 40%".
  • The bpfilter subsystem was meant to be a way of writing firewall rules using BPF; it was first merged for the 4.18 kernel in 2018, but never got to a point where it was usable and has seen little development in recent years. The bpfilter code has now been removed, though development is said to continue in an external repository. The associated "usermode blob" mechanism (which was transformed into "usermode driver" in 2020) remains in the kernel, though there are no users for it.

Security-related

  • There are three new system calls - lsm_list_modules(), lsm_get_self_attr(), and lsm_set_self_attr() - for working with Linux security modules. See Documentation/userspace-api/lsm.rst for details.
  • The BPF token mechanism, which allows fine-grained delegation of BPF-related permissions, was initially merged into the networking tree for inclusion in 6.8. That code ran into trouble, though, when Torvalds realized that it was still treating file descriptor zero as being special; suffice to say he was not pleased. So this code was reverted for repairs; discussions are still underway and it will not be ready for this kernel release.

Internal kernel changes

  • The scope-based resource management mechanism feature has gained some new guards for conditional locks (as obtained with mutex_trylock() and the like). See this commit for a bit more information.
  • As expected, the venerable SLAB memory allocator has been removed, leaving SLUB as the only object-level allocator in the kernel. According to the merge message: "Removing the choice of allocators has already allowed to simplify and optimize the code wiring up the kmalloc APIs to the SLUB implementation".
  • The MAX_ORDER macro is no more; see this article for the whole story.
  • The kernel now builds with -Wmissing-prototypes (which generates warnings for calls to functions that have not had a prototype declared for them) on all architectures.

The 6.8 merge window can be expected to remain open through January 21. Tune back in once it has closed for a summary of the remaining changes merged for the next kernel release.

Comments (10 posted)

Rust and C filesystem APIs

By Jonathan Corbet
January 15, 2024
As the Rust-for-Linux project advances, the kernel is gradually accumulating abstraction layers that enable Rust code to interface with the existing C code. As the discussion around the set of filesystem abstractions posted by Wedson Almeida Filho in December shows, though, there is some tension between two approaches to the design of those abstractions. The approach favored by most of the kernel's C programmers looks set to win out, but this is a discussion that is likely to return as the use of Rust in the kernel grows.

If a Rust developer wants to implement a filesystem using the posted abstractions, their job will be to put together an implementation that looks like this example taken from the cover letter:

    impl FileSystem for MyFS {
        fn super_params(sb: &NewSuperBlock<Self>) -> Result<SuperParams<Self::Data>>;
        fn init_root(sb: &SuperBlock<Self>) -> Result<ARef<INode<Self>>>;
        fn read_dir(inode: &INode<Self>, emitter: &mut DirEmitter) -> Result;
        fn lookup(parent: &INode<Self>, name: &[u8]) -> Result<ARef<INode<Self>>>;
        fn read_folio(inode: &INode<Self>, folio: LockedFolio<'_>) -> Result;
    }

The functions defined here perform the tasks that the kernel might ask of a filesystem implementation: read_dir() to read directory contents, for example, or lookup() to look up a file name within a directory. All of these operations are defined as part of a single trait called FileSystem.

This organization differs from how the API is defined for C code, where file and filesystem-related operations are spread out across a wide range of object types. A filesystem as a whole is defined by struct super_block, which has a set of associated operations in struct super_operations. But filesystems implement a number of other object types and related operations, including inodes (inode, inode_operations), directory entries (dentry, dentry_operations), files (file, file_operations), and address spaces (address_space, address_space_operations).

As an example of how the object model works on the C side, consider that lookup() is an inode operation, iterate_shared() (used to implement the read_dir() function defined in the Rust trait) is a file operation, and read_folio() is an address-space operation.

Matthew Wilcox had a couple of questions about the proposed abstractions, starting with the inode parameter to a number of the operations. In the kernel's C code, those functions take a struct inode pointer, which is quickly converted into a filesystem-specific structure pointer. There is little in the way of type safety here; a function cannot know that it was actually passed a pointer to the right sort of inode. In Rust, it seems, it should be possible to do better.

Almeida answered that this interface does, indeed, do better. The type of the inode parameter is &INode<Self>, which ties the actual type of that parameter to the filesystem type; it is not possible to pass the wrong type of inode to those functions without running into compilation errors.

Wilcox's other question proved harder to answer, though. The file operation used (in C code) to read a directory is:

    int (*iterate_shared) (struct file *, struct dir_context *);

The equivalent in the Rust code (read_dir(), above) takes an inode reference as a parameter rather than a struct file pointer. Wilcox pointed out that, while "toy filesystems" get away with just the information stored in the inode, others need the information in the file structure. Not keeping that structure in the interface thus seems a bit strange. Almeida answered that the filesystems that have been implemented in Rust, to date, do not need anything from struct file; he added: "Passing a `file` to `read_dir` would require us to introduce an unnecessary abstraction that no one uses, which we've been told not to do". But, he said, the interface could be changed if and when it becomes necessary.

Wilcox responded in fairly strong terms:

Then we shouldn't merge any of this, or even send it out for review again until there is at least one non-toy filesystems implemented. Either stick to the object orientation we've already defined (ie separate aops, iops, fops, ... with substantially similar arguments) or propose changes to the ones we have in C. Dealing only with toy filesystems is leading you to bad architecture.

Almeida was not pleased by this message; he asked: "Are [you] saying that Rust cannot have different APIs with the same performance characteristics as C's, unless we also fix the C apis?" Wilcox replied that the kernel's object model exists for a reason, and that the Rust side should not change that model without a strong justification. Al Viro added that the existing set of objects and operations needed to be treated "as externally given"; they can be changed with good reason, he said, but no such reason exists here.

Kent Overstreet, instead, argued that the Rust abstractions are a way to design a cleaner interface, and that this interface should not need to match the C API. Cleaning up the latter is "a giant hassle" due to the need to change all existing filesystems at the same time, while creating something better in Rust is relatively easy.

So instead, it would seem easier to me to do the cleaner version on the Rust side, and then once we know what that looks like, maybe we update the C version to match - or maybe we light it all on fire and continue with rewriting everything in Rust.

Meanwhile, Almeida complained that passing a file structure into read_dir() when nothing uses it is just the sort of thing the Rust developers have been advised to avoid. Those developers have long been contending with the problem of merging abstractions so that they can be used without being able to merge the users at the same time. Wilcox answered that the advice had been misunderstood; the Rust developers have been asked not to merge abstractions for which there are no users, not to change the interfaces for the abstractions they are merging. Greg Kroah-Hartman concurred, saying that the abstractions should be suitable for all filesystems, not just those that have been implemented now. Dave Chinner said that this problem is exactly why he has been suggesting that the Rust developers reimplement ext2, since that filesystem, while being relatively simple, uses most of the core filesystem API.

Eventually, Almeida gave in, and said that he would make a new version of the abstractions with separate file, inode, and address-space traits; read_dir() will be updated to take a File<T> reference instead. Wilcox agreed that this approach seemed like the right way forward.

So this particular discussion appears to have come to a resolution. But implementing kernel functionality in Rust is sure to provide innumerable opportunities to create new interfaces that are cleaner and safer than those that have evolved over decades in the kernel's C code. Sometimes those APIs will demonstrate misunderstandings about why the C code evolved the way it did; sometimes they will truly be better. But, either way, a Rust API that differs significantly from the C API will make maintenance and future development harder, so there will continue to be strong resistance to the idea of creating APIs on the Rust side that differ from what is done on the C side.

One answer, as was also discussed at the 2023 Maintainers Summit, is to evolve the C code to match the better interfaces being developed for Rust. The idea makes some sense, but it is also asking Rust developers to do large amounts of work — in C, which is just the thing they are trying to get away from. Changing core kernel APIs, updating all users of those APIs, and obtaining acceptance for the changes will not be a task for the faint of heart. Such a policy would undoubtedly impede the development of better interfaces on the Rust side; the result would be more maintainability, but that comes at a real cost.

What seems likely to happen at some point was alluded to by Overstreet above: "light it all on fire and continue with rewriting everything in Rust". There is no problem with API divergence if the API used by everybody is in the Rust code. Your editor's predictive powers are severely limited, but a couple of things seem likely to happen: there will be proposals to replace some core code with Rust implementations at some point, and the resistance to doing so will be fierce. Even in this discussion, David Howells made it clear that he didn't want to see Rust anywhere near the core kernel.

That, though, is a discussion for a future time; Rust will have to prove itself at the edges of the kernel first. But once the camel's nose (or the crab's) is in the tent, the rest seems likely to want to follow. Stay tuned; it is going to be interesting.

Comments (16 posted)

Julia v1.10: Performance, a new parser, and more

January 17, 2024

This article was contributed by Lee Phillips

The new year arrived bearing a new version of Julia, a general-purpose, open-source programming language with a focus on high-performance scientific computing. Some of Julia's unusual features are Lisp-inspired metaprogramming, the ability to examine compiled representations of code in the REPL or in a "reactive notebook", an advanced type and dispatch system, and a sophisticated, built-in package manager. Version 1.10 brings big increases in speed and developer convenience, especially improvements in code precompilation and loading times. It also features a new parser written in Julia.

Time to first x

The time needed to import packages has been a common annoyance since the first public release of Julia. This is sometimes called the "time to first plot", or, more generally, the "time to first x", where "x" is the output of some function when called for the first time from a freshly loaded package. Nearly every major release of Julia has improved compilation and loading times.

The latest release decreases these delays to the point where the strategy of creating custom system images with pre-loaded packages is largely no longer needed, as Chris Rackauckas pointed out. This strategy used the PackageCompiler module to make a custom Julia binary with one's commonly-used packages baked in, so it could start up instantly, ready to plot or perform other tasks without delays. The section "Creating binaries" below discusses this further.

I borrowed an example from that Hacker News discussion, and timed a simple "using Plots; plot(sin)" using Julia v1.10 on a low-powered laptop. It reported a wall-clock time of 2.16 seconds; in that time, it needed to load the Julia runtime, import the Plots package, and create a simple plot. Note that this experiment does not use a fresh Julia install. Plots has already been downloaded and precompiled (along with its dependencies), which can take some time, but that's a one-time cost. To get a sense of the improvements in v1.10, I repeated the same timed command with the two previous versions of Julia that I still have installed. Julia v1.9 takes 3.6 seconds, and v1.8 takes 18.1 seconds.

The Julia developers employed several strategies to achieve this latest improvement in code loading. The previous major release introduced native code caching and package extensions for weak dependencies, as well as the ability for package authors to ship precompiled, native binary code with their software. Some of the recent improvement in loading time is due to package developers taking greater advantage of these two mechanisms; as these practices spread through the community we'll see further decreases in latencies even without improvements to Julia itself.

The latest release corrects an oversight in the generation of code caches: previously, if a user was compiling code with more than one instance of Julia (a situation that can arise in parallel computing environments), a race condition might ensue, with different Julia processes writing to the same cache files simultaneously. Now this problem has been eliminated using lock files.
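The lock-file technique is a general one; as a rough sketch (in Python, not Julia's actual implementation), one process at a time can be granted the right to write a shared cache file:

```python
import os
import tempfile
import time

def write_cache_atomically(cache_path, data, lock_timeout=5.0):
    """Serialize cache writes across processes using an exclusive lock file.

    os.O_CREAT | os.O_EXCL fails if the lock file already exists, so only
    one process at a time can hold the lock; the others wait and retry.
    """
    lock_path = cache_path + ".lock"
    deadline = time.monotonic() + lock_timeout
    while True:
        try:
            fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
            break  # lock acquired
        except FileExistsError:
            if time.monotonic() > deadline:
                raise TimeoutError("could not acquire cache lock")
            time.sleep(0.05)
    try:
        # Write to a temporary file, then rename: readers never see a
        # partially written cache file.
        tmp = cache_path + ".tmp"
        with open(tmp, "wb") as f:
            f.write(data)
        os.replace(tmp, cache_path)
    finally:
        os.close(fd)
        os.unlink(lock_path)

cache = os.path.join(tempfile.mkdtemp(), "pkg.ji")
write_cache_atomically(cache, b"precompiled code")
print(open(cache, "rb").read())  # b'precompiled code'
```

The rename-into-place step also means that a crashed writer leaves, at worst, a stale lock file behind, never a corrupted cache.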

Part of the reduction in package load time is the result of parallelizing the initial stage of precompilation. The precompilation that happens upon installation of packages has been done in parallel for a long time, but the later stage, when the user imports functions into a program or environment, was a serial process until v1.10. It is this stage of precompilation that affects responsiveness in daily use, so speeding this up has a greater impact when working in the REPL or running programs.

Julia ships with a version of LLVM containing patches to fix bugs, mainly in numerical methods. The upgrade to LLVM 15 that came with v1.10 also had an impact on Julia's responsiveness, as the new compiler version has better performance.

For users who'd like to keep a closer eye on package precompilation times in the REPL, the command Pkg.precompile(timing=true) will precompile any packages in your environment that require it, and supply a per-package report of compilation times.

Language, libraries, and runtime

In addition to faster precompilation, v1.10 brings improved runtime performance and some developer conveniences.

From v1.0 until now, the Julia parser has been implemented in Femtolisp, a Scheme dialect that its creator, Jeff Bezanson (one of the inventors of Julia), describes in its README as "an attempt to write the fastest lisp interpreter I could in under 1000 lines of C".

With v1.10, the Femtolisp parser has been replaced with a parser written in Julia. The new parser brings three advantages: it is faster, it produces more useful syntax-error messages, and it provides better source-code mapping, which associates locations in compiled code to their corresponding lines in the source. That last improvement also leads to better error messages and makes it possible to write more sophisticated debuggers and linters.

The next figure illustrates the improved error reporting in v1.10:

[Syntax error]

As can be seen, the exact position of the error is indicated. The same error committed using v1.9 produces a somewhat less friendly message, mentioning the extra space but not pointing out its location visually.

Femtolisp still performs code lowering for the language. In the latest version, as in earlier iterations, starting an interactive session with the --lisp flag takes you to the Femtolisp REPL.

Stack traces have also been made more concise and useful by omitting unnecessary information, addressing a frequent complaint from Julia programmers.

Julia has always used a tracing garbage collector. Application programmers have never had to know anything about this, although, as an optimization, it is often possible to avoid garbage collection entirely by taking care not to allocate memory dynamically. Nevertheless, for some algorithms that strategy won't work.

Run-time performance in v1.10 was improved by parallelizing the mark phase of the garbage collector, achieving nearly linear speedup as a function of the number of threads allocated to garbage collection. Julia can be started with any number of worker threads, set with the -t flag. The number of threads used for garbage collection is half the number of worker threads, by default, but the user can set this with the new --gcthreads flag.
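Conceptually, the mark phase is just a graph traversal from a set of root objects, and the traversal order does not matter, which is what makes it parallelizable: the worklist can be split among threads that share a marked set. A minimal sequential sketch in Python (illustrative only; it bears no resemblance to Julia's actual collector):

```python
def mark(roots, references):
    """Mark phase of a tracing GC: find every object reachable from the roots.

    `references` maps each object to the objects it refers to.  A parallel
    collector divides the worklist among threads that share the marked set;
    since any traversal order gives the same answer, the work splits cleanly,
    allowing near-linear speedup.
    """
    marked = set()
    worklist = list(roots)
    while worklist:
        obj = worklist.pop()
        if obj in marked:
            continue
        marked.add(obj)
        worklist.extend(references.get(obj, ()))
    return marked

heap = {"a": ["b", "c"], "b": ["d"], "c": [], "d": ["a"], "e": ["f"], "f": []}
print(sorted(mark(["a"], heap)))  # ['a', 'b', 'c', 'd']; 'e' and 'f' are garbage
```

Everything left unmarked after the traversal is unreachable and can be reclaimed in the sweep phase.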

The use of Unicode characters has received some tweaks and refinements, as usual in any Julia release. There are two new fancy arrows (⥺ and ⥷) available to be used as names for binary operators. Physicists will be relieved to learn that two similar glyphs for hbar are now treated as identical. The glyph ∜ computes the fourth root, unsurprisingly, using a new function in Base.Math called fourthroot(). Its implementer was motivated by the desire not to "let a perfectly good Unicode symbol go to waste", and points out that fourthroot() is faster than what some programmers might turn to: x^(1/4).

Speaking of new functions, there is a generous handful in the new release. The tanpi(x) function calculates tan(xπ) more accurately for large arguments. For example, noting that tan(π/4) = 1 exactly, and the tangent has a period of π:

    julia> tanpi(1/4 + 2e10)
    0.9999999999999999

    julia> tan((1/4 + 2e10)*π)
    0.999995497014194
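
The accuracy gain comes from reducing the argument before the multiplication by π, rather than after. A rough sketch of the idea in Python (not Julia's implementation, which handles signs and edge cases far more carefully):

```python
import math

def tanpi_sketch(x):
    """Compute tan(pi*x) by reducing x modulo the period first.

    tan(pi*x) has period 1 in x, so x can be reduced to its fractional part
    exactly in floating point *before* the inexact multiplication by pi.
    Multiplying the full-sized x by pi first loses the low-order bits that
    determine where the argument falls within the period.
    """
    return math.tan(math.pi * math.fmod(x, 1.0))

naive = math.tan((0.25 + 2e10) * math.pi)
reduced = tanpi_sketch(0.25 + 2e10)
print(naive, reduced)  # the reduced version is much closer to 1.0
```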

Some new memory-related functions, memmove(), memset(), and memcpy(), were added to Julia's interface to the C standard library, to aid in interacting with C library routines.

The function in Base for calculating binomial coefficients, binomial(x, n) (x choose n), now accepts non-integer x, applying the standard definition for extended binomial coefficients; n must still be an integer.
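The extended definition replaces the factorial in the numerator with a falling factorial, which is defined for any x. A quick sketch of the formula (in Python rather than Julia):

```python
from math import factorial

def general_binomial(x, n):
    """Extended binomial coefficient: x*(x-1)*...*(x-n+1) / n!

    x may be any real number; n must be a non-negative integer, matching
    the restriction described above.
    """
    if n < 0:
        raise ValueError("n must be a non-negative integer")
    num = 1.0
    for k in range(n):
        num *= x - k   # falling factorial: x, x-1, ..., x-n+1
    return num / factorial(n)

print(general_binomial(5, 2))    # 10.0, agreeing with the integer case
print(general_binomial(0.5, 2))  # -0.125: (1/2)(-1/2)/2!
```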

Julia has a printstyled() function that writes messages to the screen with optional effects including colors, underline, reverse video, and others; whether the user sees the effects depends on the capabilities of the terminal emulator in use. Julia v1.10 adds an italic option.

A View in Julia is a section of an array (called the parent), with which it shares memory. The function parent() returns the View's parent array. A SubString applies the same idea to strings: it's a piece of a parent string that doesn't create a new object. Previously parent() only worked with array Views. Now the function has been extended to work with SubString types:

    julia> s = "abcdefgh"
    "abcdefgh"

    julia> ss = SubString(s, 6:8)
    "fgh"

    julia> ss[1]
    'f': ASCII/Unicode U+0066 (category Ll: Letter, lowercase)

    julia> parent(ss)
    "abcdefgh"

The startswith() function determines whether a string starts with a given character or substring. This has now been extended to work with I/O streams (for example, files). It looks at the stream not from the beginning, but from the current read position, as shown in this example:

    julia> maybePNG = open("fig1.png")
    IOStream(<file fig1.png>)

    julia> seek(maybePNG, 1);

    julia> startswith(maybePNG, "PNG")
    true

    julia> position(maybePNG)
    1

This example checks for the magic number one byte into the file to determine if it might be a PNG image. The startswith() function peeks at the required extent of the stream without changing the read position, as shown in the last line of the example.
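The same peek-without-consuming check is easy to write in other languages; here is a rough Python equivalent of the example above:

```python
import io

def stream_startswith(stream, prefix):
    """Check whether `stream` starts with `prefix` at the current read
    position, restoring that position afterward."""
    pos = stream.tell()
    try:
        return stream.read(len(prefix)) == prefix
    finally:
        stream.seek(pos)

# A PNG file begins with the eight bytes \x89PNG\r\n\x1a\n; the ASCII
# letters "PNG" start one byte in.
png_header = io.BytesIO(b"\x89PNG\r\n\x1a\n")
png_header.seek(1)
print(stream_startswith(png_header, b"PNG"))  # True
print(png_header.tell())                      # 1: position is unchanged
```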

In Julia rational numbers are defined and displayed with a double slash, such as 3//4. In the latest release, when printing an array, rational numbers are displayed as integers if they have an integral value (if they can be reduced to having a denominator of one). Here's an example:

    julia> TM = Matrix{Rational}(undef, 5, 5);

    julia> for i ∈ 1:5, j ∈ 1:5
              if j<i
                  TM[i, j] = i//j
              else
                  TM[i, j] = 0//1
              end
           end

    julia> TM
    5×5 Matrix{Rational}:
    0   0     0     0    0
    2   0     0     0    0
    3  3//2   0     0    0
    4   2    4//3   0    0
    5  5//2  5//3  5//4  0

This constructs a lower triangular matrix whose structure is easy to see at a glance. Here is the way previous versions of Julia display the same array:

    0//1  0//1  0//1  0//1  0//1
    2//1  0//1  0//1  0//1  0//1
    3//1  3//2  0//1  0//1  0//1
    4//1  2//1  4//3  0//1  0//1
    5//1  5//2  5//3  5//4  0//1

This style of displaying rational numbers only applies to members of arrays; outside of that context, there is no change.

The new release comes with a few technical refinements to the linear algebra library, which is part of the standard library. One new addition is the hermitianpart() function, which efficiently calculates the Hermitian part of a square matrix.
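The Hermitian part of a square matrix A is (A + Aᴴ)/2, where Aᴴ is the conjugate transpose; the result is always Hermitian. A plain-Python sketch of the computation (Julia's hermitianpart() also wraps the result in its Hermitian matrix type):

```python
def hermitian_part(A):
    """Return (A + A^H) / 2 for a square matrix given as nested lists
    of complex numbers, where A^H is the conjugate transpose."""
    n = len(A)
    return [[(A[i][j] + A[j][i].conjugate()) / 2 for j in range(n)]
            for i in range(n)]

A = [[1 + 2j, 3 + 0j],
     [1 + 0j, 4 - 1j]]
H = hermitian_part(A)
# The diagonal of the result is real, and H equals its own
# conjugate transpose.
print(H)
```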

Creating binaries

The normal way to use Julia is to download and install the runtime and to work either in an interactive environment, such as the REPL or a Pluto notebook, or to run program files from the command line, using the julia shell command. In both styles, the compiler is available to generate new native machine code whenever it encounters a function call with a combination of argument types that has not previously been compiled. This allows making full use of multiple dispatch and user-defined types without sacrificing performance, aside from the necessary compile times when code for a new method needs to be generated.

There are other usage scenarios, however, that are not satisfied by the standard mechanisms. We may want to eliminate as much compiler delay as possible, for example if a Julia program is called repeatedly in a shell script. Even the small compilation penalties still present in v1.10 could be undesirable. Or we may want to give our program to someone who, for some odd reason, does not have the Julia runtime installed. In this case, they would need a standalone binary version of our program.

There are two main packages that provide utilities for generating various types of compiled Julia programs. StaticCompiler is a package that can make small, statically compiled binaries from sources written in a severely limited subset of Julia. It's an experimental package intended for adventurous programmers. StaticCompiler is the basis of ongoing experiments in compiling Julia to WebAssembly for running in web browsers. The past year has seen significant progress in that area.

A more generally useful tool is the package PackageCompiler. It can compile programs written in unconstrained Julia into either sysimages or distributable, standalone binaries. The former target is used in order to have a Julia environment with a set of packages baked in that starts up instantly. This was a great help in earlier versions of Julia, when precompilation times interfered with using Julia programs as routine utilities (a script that took half a minute to plot a small bit of data was inconvenient). Now that precompilation and load times are so much smaller, this use of PackageCompiler is almost obsolete.

But it is still the standard utility for creating binaries for distribution or for use in environments where the Julia runtime is not installed. While still under active development, it's a mature tool. Until recently, the main complaint has been the enormous size of the compiled programs: PackageCompiler would turn a "hello world" program into a 900MB monstrosity. This is because everything was included; not only the entire Julia runtime, but everything in the standard library. For example, the BLAS routines were included even if the program didn't do any linear algebra.

In the past year, the size of a compiled "hello world" program has come down to 50MB, making the PackageCompiler approach far more practical. This progress results from stripping out unneeded parts of the runtime and unused libraries. This work is ongoing, and is a strong focus of developer activity. Bezanson gave an informative talk about the history of this progress.

Learning resources

Julia's official manual is compendious and accurate, but its organization can make it difficult to find what you need, especially if you're a beginner. Fortunately Julia has been in use long enough now that a constellation of articles and books has grown around it, written by authors from a wide variety of backgrounds and aimed at readers from beginning programmers to working scientists experienced with computation. Several books have recently appeared by authors whose Julia articles over the past several years I've found illuminating; for example, Bogumił Kamiński, Erik Engheim, and Logan Kilpatrick. Their books, as well as my volume, are collected into a list maintained on the official Julia site.

Another good resource for learning more about Julia and the projects taking advantage of it are the talks from JuliaCon on YouTube.

Finally, once you're exploring the language and ecosystem in earnest, the Julia Discourse discussion platform will become a valuable resource. Julia developers, package authors, and experienced users participate actively; the community is welcoming, patient, and helpful.

Conclusion

Between the improvements in precompilation and loading times, and the progress in making small binaries, two major and perennial complaints, of beginners and seasoned Julia users alike, have been addressed. Work on both these areas continues and is likely to see more improvement in the near future, especially in the area of binary generation. StaticCompiler and related WebAssembly tools will make it easier to write web applications in Julia for direct execution in the browser; it is already possible, but may become more convenient over the next few years.

Comments (8 posted)

Please welcome Daroc Alden

When, at the beginning of November, we posted an open position at LWN, we were only so hopeful; experience has shown that finding writers who are both capable of and interested in writing our sort of material is a challenging task. This time, though, hope was justified: we got a surprising number of applications from highly qualified applicants. The hardest part of the task has, instead, been narrowing down the choice to a hiring decision.

We are pleased to announce that Daroc Alden has just joined LWN's staff. Daroc is a programmer from New England, where they live with their spouse and their cat. They graduated with a Master's degree in Computer Science from the University of New Hampshire. In their spare time, they enjoy fiction writing and musicals. They are especially interested in programming language theory and implementation.

Daroc will be taking on some of the load of keeping LWN interesting while helping us to expand our content mix in the areas that our readers are interested in. Please give them your support as they come up to speed within our operation. We are looking forward to having Daroc as part of a reinforced and more energetic LWN going forward.

Comments (24 posted)

Page editor: Jonathan Corbet


Copyright © 2024, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds