Leading items

Welcome to the LWN.net Weekly Edition for December 22, 2022

This edition contains the following feature content:

Wrapping up 2022: another year has come and gone.
6.2 Merge window, part 1: what the first 9,200 changesets for 6.2 brought into the mainline kernel.
Enabling non-executable memfds: closing a longstanding security problem with the memfd mechanism.
The intersection of shadow stacks and CRIU: shadow stacks can thwart attackers, but they make life harder for checkpoint/restart systems as well.
Beyond microblogging with ActivityPub: a survey of some of the other ActivityPub-enabled applications.

This week's edition also includes these inner pages:

Brief items: Brief news items from throughout the community.
Announcements: Newsletters, conferences, security updates, patches, and more.

Note that this is the final LWN.net Weekly Edition for 2022; as is our standard practice, we'll be taking the final week of the year off to prepare for the year that is to come. Best wishes for the new year to all LWN readers; the weekly edition will return on January 5.

Please enjoy this week's edition, and, as always, thank you for supporting LWN.net.

Wrapping up 2022

By Jonathan Corbet
December 21, 2022

Yet another year is coming to a close; that can only mean that the time has come to indulge in a longstanding LWN tradition: looking back at the predictions we made in January and giving them the mocking that they richly deserve. Read on to see how those predictions went, what was missed, and a look back at the year in general.

What was predicted

Our first prediction was that, in 2022, awareness of the need to support free-software maintainers would grow. This is a hard one to judge; certainly there have been no developments to show that any such awareness is being acted upon in any significant way. Perhaps efforts like the posting of the Linux kernel contribution maturity model will help to raise awareness of this problem in the future but, for now, the community is still far short of the resources it needs to properly maintain the code that the world depends on.

It is also not clear that the predicted increase in willingness to pay for free-software products has materialized. Monetizing free-software work remains at the core of disagreements like the current dispute at The Document Foundation. The ominous economic winds that have been blowing in the last year have not helped that cause either.

Have the browser wars returned as expected? If so, there has been little in the way of results seen in the market share of the major browsers. Chrome remains unchallenged as the dominant browser, and it's not clear what anybody might be doing to change that.

The prediction that the use of centralized proprietary services would create contention was a relatively obvious one. One of the many concerns expressed around the GNU Tools Infrastructure initiative, for example, was the use of proprietary services. The Software Freedom Conservancy called for free-software projects to leave GitHub this year. While your editor certainly didn't predict the events around Twitter, all of that drama only serves to highlight the hazards of depending on somebody else's platform.

The 6.0 kernel was released this year as was predicted, though the "most likely release date" (early December) turned out to be when 6.1 came out instead. The kernel development cycle is relatively easy to predict, but guessing whether Linus Torvalds will elect to create a 5.20 release or go straight from 5.19 to 6.0 is beyond your editor's skills. Rust support was also merged, as expected.

On the other hand, Python has not yet lost its global interpreter lock (GIL). Much of the work toward that goal, in the form of performance improvements to make up for the cost of removing the global lock, did make it into the 3.11 release as part of the broader CPython performance push. Whether the Python community will finish the job is unclear; developers there may be happy to take the performance work while leaving the global lock (which is not a problem for many applications) in place.

The prediction that GNU projects would continue to assert independence from the Free Software Foundation (FSF) can be deemed to have been on-target. For example, the binutils project adopted the developer's certificate of origin in October, dropping the requirement that code be signed over to the FSF. The GNU Tools Infrastructure work, also, moves a number of projects a little further away from the FSF. On the other hand, Emacs remains firmly under Richard Stallman's control and doesn't appear to be looking to change that.

Machine learning certainly came to the attention of the free-software community with, for example, the filing of a class-action suit against GitHub for its Copilot offering. Whether machine learning played a greater role in free-software development, as was predicted, is another question, though. These applications appear to be firmly rooted in the proprietary world, and their training is expensive enough to make them inaccessible to much of the community.

Finally, your editor predicted that Linux might lose some embedded market share to systems like Fuchsia in 2022. It's not really clear that this has happened to any real extent. Instead, we've seen Google announce yet another embedded system to compete with the ones it already has.

What was not predicted

What was missed in our 2022 predictions? The increasing discord at The Document Foundation was one. The sources of trouble have been in place for a while and it should have been clear enough that the underlying disagreements driving the conflict had not (and have not) been resolved.

It's probably fair to say that almost nobody predicted that longtime kernel maintainer Andrew Morton would start using Git to manage patches going into the mainline.

The decision to base the Android version of the Thunderbird email client on the K9 app certainly came as a surprise here, even though it had evidently been in the works for some time. Thunderbird and K9 both have a lot of users; hopefully the combination of the two will make both camps happier.

Predicting that Rust support would land in the Linux kernel was not a sure bet; that is a major change for an old software project. But your editor wasn't able to see that, while this was happening, others would already be writing useful kernel modules in Rust that many users are going to end up wanting. The Apple-silicon GPU driver probably tops the list of interesting kernel features that are only available in Rust form, but the in-kernel 9P filesystem support may end up being popular as well.

Rust in the kernel is currently seen as experimental — something that can be removed if it doesn't appear to be working out. That situation will change as soon as some of these kernel modules are merged; it will no longer be possible to remove Rust without taking away functionality and breaking systems. Expect some interesting discussions when developers start pushing these modules for the mainline.

Finally, your editor explicitly refused to try to predict anything related to COVID. In the end, this year saw a number of successful in-person conferences being held, and the fact that many participants seem to have brought home COVID along with their T-shirts doesn't seem to be slowing things down. Our community needs to gather occasionally; that is the lubricant that makes our far-flung cooperation work the rest of the year. While we may never return to pre-pandemic levels of conferencing — it turns out there is also value in not traveling quite so much — the in-person conference seems to be back to stay.

Goodbye to 2022

Overall, it was another successful year for the free-software community; few would have predicted anything else. Not even the combination of an ongoing pandemic, economic uncertainty, and war in Europe would appear to have slowed things down much — so far. Our communities remain strong and our software only gets better over time.

The community was made poorer, though, by the loss of many of our members this year, including Lorinda Cherry, Marina Zhurakhinskaya, Pedro Francisco, Peter Eckersley, Sven Guckes, Tom Lord, and Wolfgang Denk. Their contributions and their presence will be much missed.

At LWN, we wrote and published 252 feature articles in 2022, and published 28 more from guest authors. We were able to cover presentations at 13 conferences over the course of the year. We raised our prices for the first time since 2010 this year, and we are deeply grateful that you stuck with us; we have slightly more subscribers than we did one year ago. LWN lives on the support of its readers, and we appreciate every one of you.

One thing that we have not been able to do is to hire more writers for LWN. It seems that the people who are both willing and able to create the sort of articles that LWN readers expect are rare and hard to find — even more so than we expected. The need has only become more acute; LWN is hard to sustain at its current staffing level. The position description is still out there; we would love to hear from anybody who thinks they might be interested in filling it.

Meanwhile, we wish the best for all of our readers for the remainder of the holiday season and the beginning of the new year. Thanks to all of you for your support, your comments, and for being a part of this community — we couldn't do it without you.

6.2 Merge window, part 1

By Jonathan Corbet
December 15, 2022

Once upon a time, Linus Torvalds would try to set a pace of about 1,000 changesets pulled into the mainline each day during the early part of the merge window. For 6.2, though, the situation is different; no fewer than 9,278 non-merge changesets were pulled during the first two days. Needless to say, these commits affect the kernel in numerous ways, even though there are fewer fundamental changes than were seen in 6.1.

The most significant changes merged for 6.2 so far include:

Architecture-specific

The arm64 architecture can now enable or disable software-implemented shadow stacks at boot time; this is done by patching in the relevant instructions where necessary. This change allows a single kernel to work efficiently on systems both with pointer authentication (where shadow stacks don't really add much) and without.
The Intel "asynchronous exit notification" mechanism is now supported; this allows code in SGX enclaves to detect single-step attacks.
There is a new set of operations allowing a hypervisor to support requests from Intel TDX guests; this documentation commit has some more information.
There is a new sysctl knob to control how x86 systems respond to processes executing split locks; see this commit for an overview and this article for the background.

BPF

BPF programs have increased access to control-group local storage; see this documentation commit for details.
BPF programs can now define types, allocate objects, and create their own data structures; this merge message gives an overview.
It is now possible for BPF code to access and store task_struct objects; see this commit for an overview.

Core kernel

It is now possible to move a process into a new time namespace when it calls exec(). Among other things, this allows a process to execute the vfork()+exec() sequence after unsharing its time namespace, which does not work in current kernels.
More Rust infrastructure code has been merged; see this article for details.

Filesystems and block I/O

Squashfs filesystems can now be mounted with the threads= option to control how parallel decompression is done; see this commit for details.
Squashfs can also now handle ID-mapped mounts.
The kernel's handling of POSIX access-control lists has been massively reworked. There should be no user-visible changes. This merge commit contains a detailed overview of what was done.
The fscrypt mechanism can now make use of the SM4 encryption algorithm though, as detailed in this merge message, the fscrypt maintainer recommends against its use.
The reliability of the much-maligned Btrfs RAID5 and RAID6 implementation has been improved; this merge message describes the changes that were made. There have also been more performance improvements merged for Btrfs.
The kernel can now be built without NFSv2 support; this is the next step toward removing that support entirely.
Permissions checks for access to NVMe devices have changed; operations that read or write a given device will now succeed if the writing process has the appropriate access in the permission bits on the device special file. Previously, CAP_SYS_ADMIN was required for such operations.
The packet CD/DVD driver, deprecated in 2016, has finally been removed.

Hardware support

Clock: MStar CPUPLL clocks, Ingenic JZ4755 CGU clocks, MediaTek FHCTL hardware controller clocks, Qualcomm SC8280XP and SM6375 display clock controllers, and Qualcomm SM8550 global clock controllers.
GPIO and pin control: Qualcomm SDM670 pin controllers, Loongson-2 SoC pin controllers, and Intel Moorefield pin controllers.
Graphics: Open Firmware display controllers, Renesas RZ/G2L MIPI DSI encoders, Jadard JD9365DA-H3 WXGA DSI panels, and NewVision NV3051D DSI panels.
Hardware monitoring: Ampere Altra SMpro hardware monitors and OneXPlayer EC fan controllers.
Input: Hynitron cst3xx touchscreens, Cypress TrueTouch Gen5 touchscreens, and Himax hx83112b touchscreens.
Media: OmniVision OV08X40 and OV4689 sensors, STmicroelectronics VGXY61 sensors, Toshiba TC358746 parallel-CSI2 bridges, Allwinner A31 image signal processors, Microchip image sensor controllers, Renesas RZ/G2L MIPI CSI-2 receivers, and Renesas RZ/G2L camera data receiving units.
Miscellaneous: ARM CoreSight performance monitoring units, Amlogic DDR bandwidth performance monitors, Loongson-2 SoC global utilities register blocks, Dell WMI-based platform sensors, ChromeOS human-presence sensors, Apple CPU-frequency controllers, ARM SCMI powercap controllers, Richtek RT6190 4-Switch BuckBoost controllers, MediaTek MT6357 power-management ICs, and Sunplus SP7021 MMC controllers.
Networking: Realtek 8852BE PCI wireless network adapters, Motorcomm yt8521 gigabit Ethernet PHYs, Renesas R-Car S4-8 Ethernet switches, MediaTek MT7996 wireless interfaces, NVIDIA Tegra multi-gigabit Ethernet controllers, Realtek 8821CU, 8822BU, 8822CU and 8723DU USB wireless network adapters, and Broadcom BCM4377/4378/4387 Bluetooth interfaces.
Sound: Realtek RT1318 codecs.
SPI: Microchip pci1xxxx PCIe switches, Socionext F_OSPI SPI flash controllers, and Nuvoton WPCM450 flash interface units.
Also: the kernel has a new framework for the management of compute-acceleration devices. There are no actual devices using that framework in 6.2; that may change for 6.3. Meanwhile, this documentation commit gives an overview of the new subsystem.

Miscellaneous

The new rv tool can be used to control the operation of the runtime verification subsystem. See this documentation commit for details.
The HTML version of the kernel documentation is now built with the Sphinx "alabaster" theme by default.

Networking

The IPv6 stack has gained support for "protective load balancing", described as:

PLB (Protective Load Balancing) is a host based mechanism for load balancing across switch links. It leverages congestion signals(e.g. ECN) from transport layer to randomly change the path of the connection experiencing congestion.

This paper has more details.

Security-related

The RANDOM_TRUST_BOOTLOADER and RANDOM_TRUST_CPU configuration options have been removed; the only way to set those parameters now is with a command-line option. See this commit for more information.
The Landlock security module can now control file truncation operations. This documentation commit has some more information.

Internal kernel changes

The read-copy-update (RCU) subsystem has a new "lazy" mode (controlled by the RCU_LAZY configuration option). When this mode is active, the handling of RCU callbacks will be delayed so that those callbacks can be run in larger batches. On lightly loaded systems, the result can be a 5-10% power savings. For callbacks that can't wait, there is a new call_rcu_hurry() function. This commit has the details.
As described in this article, the char type will now default to unsigned on all architectures.
The SLOB slab allocator, which was designed for small-memory systems, has been deprecated and will likely be removed in a future release. Any remaining users are encouraged to move to SLUB, as the other allocator (SLAB) will eventually be targeted as well. To help on smaller systems, there is a new SLUB_TINY configuration option that reduces the SLUB allocator's memory requirements.
Support for message-signaled interrupts (MSIs) has been massively reworked to deal with years of technical debt and upcoming technologies. This merge commit describes the situation in great detail.
There have been changes to the timer subsystem as well. del_timer() and del_timer_sync() have been renamed to timer_delete() and timer_delete_sync() respectively. There are new functions, timer_shutdown() and timer_shutdown_sync(), which are meant to ease the task of cleaning up timers that might be rearmed during that process; once they are called, any attempts to rearm the timer will be ignored.
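
The char signedness change listed above matters because the same byte reads as a different number depending on whether char is signed. As a small illustration (in Python rather than kernel C, using the standard struct module to mimic the two C interpretations), a byte with its high bit set looks like this:

```python
import struct

raw = b"\xff"  # a single byte with the high bit set

# "b" decodes the byte as a C signed char, "B" as a C unsigned char.
as_signed = struct.unpack("b", raw)[0]     # -1
as_unsigned = struct.unpack("B", raw)[0]   # 255


def looks_negative(value: int) -> bool:
    """The kind of 'c < 0' test whose result depends on char signedness."""
    return value < 0


print(as_signed, looks_negative(as_signed))      # -1 True
print(as_unsigned, looks_negative(as_unsigned))  # 255 False
```

Code that compares a plain char against a negative value, or that relies on sign extension when widening it, is the sort of thing that can behave differently once the type is unsigned everywhere.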

If the usual two-week schedule is followed, the 6.2 merge window can be expected to end on December 25. Given the significance of that date and a number of warnings from Torvalds, though, it would not be at all surprising if this merge window ended up being shorter than usual. Whatever happens, LWN will follow up with a summary of the changes that were pulled once 6.2-rc1 has been released.

Enabling non-executable memfds

By Jonathan Corbet
December 19, 2022

The memfd interface is a bit of a strange and Linux-specific beast; it was initially created to support the secure passing of data between cooperating processes on a single system. It has since gained other roles, but it may still come as a surprise to some to learn that memory regions created for memfds, unlike almost any other data area, have the execute permission bit set. That can facilitate attacks; this patch set from Jeff Xu proposes an addition to the memfd API to close that hole.

A memfd is created with a call to memfd_create(), which will return a file descriptor referring to the region. That descriptor can be treated as an ordinary file, in that it can be written to or read from; it can also be mapped into a process's address space. Normally the first step will be to call ftruncate() to set the size of the region; after that it can be populated with data and passed to another process. One interesting characteristic of memfds is that they can be "sealed" with a call to fcntl(), an operation that disallows any further changes to the stored data. Sealing allows a recipient to know that the contents of a memfd will not change in unexpected ways in the middle of an operation.
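
The create, size, populate, and seal sequence described above can be sketched from user space. The example below is a minimal illustration using the wrappers that Python exposes on Linux (os.memfd_create() and the fcntl sealing constants, available since Python 3.8); the function name and the "demo" label are made up for this sketch, and it simply returns None where memfds are not available.

```python
import fcntl
import os


def demo_sealed_memfd(payload: bytes = b"hello from a memfd"):
    """Create a memfd, fill it, seal it, and check that later writes fail.

    Returns None if this platform or Python build lacks memfd support.
    """
    if not hasattr(os, "memfd_create") or not hasattr(fcntl, "F_ADD_SEALS"):
        return None
    try:
        # MFD_ALLOW_SEALING must be requested at creation time; without it
        # the kernel refuses later attempts to add seals.
        fd = os.memfd_create("demo", os.MFD_CLOEXEC | os.MFD_ALLOW_SEALING)
    except OSError:
        return None

    try:
        os.ftruncate(fd, len(payload))  # set the size of the region
        os.pwrite(fd, payload, 0)       # populate it with data

        # From here on, forbid resizing and writing.
        seals = fcntl.F_SEAL_SHRINK | fcntl.F_SEAL_GROW | fcntl.F_SEAL_WRITE
        fcntl.fcntl(fd, fcntl.F_ADD_SEALS, seals)

        try:
            os.pwrite(fd, b"X", 0)
            write_blocked = False
        except OSError:
            write_blocked = True

        return {
            "contents": os.pread(fd, len(payload), 0),
            "write_blocked": write_blocked,
            "seals": fcntl.fcntl(fd, fcntl.F_GET_SEALS),
        }
    finally:
        os.close(fd)


if __name__ == "__main__":
    print(demo_sealed_memfd())
```

A recipient of the descriptor could make the same F_GET_SEALS query before trusting the contents, which is the guarantee that sealing is meant to provide.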

As it happens, the virtual file that underlies a memfd is created with the execute permission bits set; that allows the memory itself to be mapped as executable. The result is a combination of permissions — both write and execute permission enabled — that developers in both the kernel and user space are increasingly going out of their way to avoid. A memory area that is both writable and executable gives attackers a relatively easy way to inject their own code into a target process. And, indeed, Xu notes in the patch cover letter that memfd areas have been used in just that way to attack ChromeOS systems.

One might be tempted to respond by just removing the execute permission from the underlying memfd file unconditionally. But at this point that would be an ABI change, and there is at least one known (legitimate) user of executable memfds. The runc container runtime uses an executable memfd to load the image of the container it is about to run; that feature was added in response to another vulnerability in 2019. So the ability to have an executable memfd must remain.

Executable memfds do not necessarily have to be the default, though, and processes can definitely be given the ability to make a non-executable memfd. Xu's patch set thus modifies the memfd API in that direction by adding a pair of new flags for memfd_create():

MFD_EXEC explicitly asks memfd_create() to create a memfd with execute permission set. That simply reinforces the current default, but the default can be changed as described below.
MFD_NOEXEC_SEAL, instead, creates a memfd without execute permission, and applies a seal that prevents that setting from ever being changed. A memfd created with this flag will thus never be executable no matter how hard a user-space attacker might try.

There is a new fcntl() operation, F_SEAL_EXEC, that seals the execute permission and prevents it from being changed thereafter. As with all sealing operations, this change cannot be undone afterward.

The patches also add a new sysctl knob, called vm.memfd_noexec, that is local to the current PID namespace; it controls what the kernel does when the affected process creates a memfd without specifying either of the two new flags. Setting that knob to zero causes memfd_create() to behave as if MFD_EXEC were set — the current behavior. Setting it to one, instead, causes MFD_NOEXEC_SEAL to be set, essentially turning off execute permission by default. A value of two will cause any memfd_create() call that does not explicitly provide MFD_NOEXEC_SEAL to fail, disabling executable memfds entirely. The default, naturally, must be zero to avoid breaking any existing applications.
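
The interaction between the two flags and the sysctl knob may be easier to follow as a decision function. What follows is only a toy model of the behavior as described above, not kernel code; the flag values are placeholders, the function name is invented, and the rejection of a call that sets both flags is an assumption that the patch description here does not spell out.

```python
from enum import IntFlag


class MemfdFlag(IntFlag):
    # Placeholder values for illustration; not the kernel's constants.
    NONE = 0
    EXEC = 1
    NOEXEC_SEAL = 2


def resolve_memfd_flags(requested: MemfdFlag, memfd_noexec: int) -> MemfdFlag:
    """Model which behavior a memfd_create() call ends up with."""
    if memfd_noexec not in (0, 1, 2):
        raise ValueError("vm.memfd_noexec is described as taking 0, 1, or 2")

    wants_exec = MemfdFlag.EXEC in requested
    wants_noexec = MemfdFlag.NOEXEC_SEAL in requested

    # Assumption: asking for both behaviors at once is an error.
    if wants_exec and wants_noexec:
        raise ValueError("MFD_EXEC and MFD_NOEXEC_SEAL are contradictory")

    # Setting 2: any call that does not explicitly ask for a
    # non-executable memfd fails.
    if memfd_noexec == 2 and not wants_noexec:
        raise PermissionError("executable memfds are disabled")

    # An explicit choice by the caller wins over the default.
    if wants_exec or wants_noexec:
        return requested

    # No flag given: the sysctl picks the default.
    return MemfdFlag.EXEC if memfd_noexec == 0 else MemfdFlag.NOEXEC_SEAL


if __name__ == "__main__":
    for level in (0, 1):
        print(level, resolve_memfd_flags(MemfdFlag.NONE, level).name)
```

Under this model, an application that passes one of the flags explicitly sees no change in behavior when an administrator moves the knob from zero to one; only callers that leave the choice to the kernel are affected.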

The new code emits a warning to the kernel log if neither of the two flags is set when a memfd is created, in the hope of causing applications to be updated. Peter Xu observed that this could fill the system log with a lot of warnings; after some discussion, it was agreed to emit the warning only once per boot. As a result, it could take several boot cycles to discover all of the applications that need to be fixed on a given system, but that was deemed preferable to unlimited logging.

Finally, Paul Moore has questioned the addition of a security-module hook for memfd_create() since there is no corresponding change to a security module to actually use that hook. As a result, it's possible that the hook might be taken out until somebody wants to write a policy for this system call. Otherwise, the patch series appears to be ready for merging.

The intersection of shadow stacks and CRIU

December 16, 2022

This article was contributed by Mike Rapoport

Shadow stacks are one of the methods employed to enforce control-flow integrity and thwart attackers; they are a mechanism for fine-grained, backward-edge protection. Most of the time, applications are not even aware that shadow stacks are in use. As is so often the case, though, life gets more complicated when the Checkpoint/Restore in Userspace (CRIU) mechanism is in use. Not breaking CRIU turns out to be one of the big challenges facing developers working to get user-space shadow-stack support into the kernel.

The idea behind shadow stacks is simple: in addition to the normal program stack (which holds return addresses, local variables, and more) there is a special memory area, called the "shadow stack", that stores only return addresses. Whenever a CALL instruction is executed, the return address is pushed onto both the normal and the shadow stacks. When, later, a function ends with a RET instruction, the return address that's popped from the normal stack is compared to that on the shadow stack. If they match, the execution continues; if they don't, a violation of control-flow integrity has just been detected.
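
The mechanism can be modeled in a few lines. The sketch below is a toy simulation in Python, with invented class and method names; it tracks only return addresses and is not meant to reflect how any real CPU implements the feature.

```python
class ControlFlowViolation(Exception):
    """Raised when the two stacks disagree about a return address."""


class ToyCPU:
    def __init__(self):
        self.stack = []         # normal stack (would also hold local data)
        self.shadow_stack = []  # shadow stack: return addresses only

    def call(self, return_address: int) -> None:
        # A CALL pushes the return address onto both stacks.
        self.stack.append(return_address)
        self.shadow_stack.append(return_address)

    def ret(self) -> int:
        # A RET pops both copies and compares them.
        address = self.stack.pop()
        expected = self.shadow_stack.pop()
        if address != expected:
            raise ControlFlowViolation(
                f"return to {address:#x}, shadow stack says {expected:#x}"
            )
        return address


if __name__ == "__main__":
    cpu = ToyCPU()
    cpu.call(0x401000)
    cpu.stack[-1] = 0xDEADBEEF  # simulate an attacker overwriting the stack
    try:
        cpu.ret()
    except ControlFlowViolation as exc:
        print("violation detected:", exc)
```

The attack in the usage example succeeds in corrupting the normal stack, but the attacker has no corresponding way to write to the shadow copy, so the mismatch is caught at return time.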

Recent x86 processor models implement shadow stacks in hardware, meaning that no instrumentation is required for a program to get the protection that shadow stacks provide and that the cost of using a shadow stack is negligible. Once the feature is enabled, the CPU takes care of pushing and popping the return address on the shadow stack and comparing the return addresses. If the return addresses do not match, the CPU generates a control protection exception. To support shadow stacks, the x86 architecture has been extended with a model-specific register (MSR) that controls the use of the shadow stack and its features. There are also shadow-stack pointer MSRs (one for each possible privilege level) and a set of instructions for manipulating shadow-stack contents.

The discussion about how the kernel should support shadow stacks for user space started a long time ago, but it has still not concluded. One of the difficulties in enabling this feature is the possibility that some applications will be broken by shadow stacks because they use non-standard ways to change their control flow. The list of problematic applications includes GDB, various JIT engines, and, of course, CRIU.

CRIU and shadow stacks

CRIU is known for its intimate relations with the kernel and its use of obscure kernel interfaces. Among other things, CRIU has to intervene in the control flow of the tasks to be checkpointed in order to extract the information that cannot be obtained by other means (such as from the /proc file system). When restoring a saved process, CRIU has to be able to recreate its state as it was at checkpoint time, so if the process had a shadow stack enabled, that shadow stack has to be restored exactly as it was before the checkpoint.

To checkpoint (or "dump" in CRIU jargon) a process, CRIU injects a blob with parasite code into the target to get parts of the process state that are not visible from the outside or which can only be obtained slowly and painfully. To inject the parasite, CRIU stops the task with ptrace(), finds a free area in the task's memory layout, puts the parasite code there, and makes the task jump into that code. So far, there are no conflicts with shadow-stack enforcement because, after the parasite starts running, the CALL and RET instructions within the parasite are properly paired.

There is, however, a problem when the parasite's job is done and the normal process execution should be resumed. CRIU uses the sigreturn() system call, which is normally only invoked at the end of a signal handler, to "cure" the task of the parasite and resume its normal execution. This operation could be done with ptrace(), but sigreturn() reduces the synchronization complexity between CRIU and the parasite and, more importantly, allows the task to continue even if CRIU itself fails.

The implementation of sigreturn() in the kernel takes special measures to ensure that its usage does not violate shadow-stack integrity. Whenever the kernel needs to deliver a signal to a process, it sets up the return frame that will be used when the signal handler concludes; it also pushes some data to the shadow stack and then verifies the integrity of that data when sigreturn() is called. Since CRIU uses sigreturn() directly, without any signal being delivered to the process that is being dumped, it has to tweak the shadow-stack contents to match the state expected by the kernel. The modification of the shadow-stack pointer is done using a couple of ptrace() calls that are part of the latest API proposed for shadow-stack enablement; the shadow-stack contents can already be adjusted using existing ptrace() calls. This shadow-stack modification is performed early during parasite injection in order to preserve the ability to resume normal task execution if anything goes wrong.
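
To see why a bare sigreturn() call is a problem, consider the following toy model. Everything in it is a stand-in: the token value, the function names, and the list-based shadow stack are hypothetical, and the real data that the kernel pushes and checks is not described here.

```python
SIGFRAME_TOKEN = "sigframe-token"  # hypothetical stand-in for the kernel's data


class ShadowStackMismatch(Exception):
    pass


class Task:
    def __init__(self):
        # Two return addresses already on the task's shadow stack.
        self.shadow_stack = [0x401000, 0x401234]


def kernel_deliver_signal(task: Task) -> None:
    """On signal delivery, the kernel leaves a marker on the shadow stack."""
    task.shadow_stack.append(SIGFRAME_TOKEN)


def kernel_sigreturn(task: Task) -> None:
    """sigreturn() only succeeds if that marker is where it is expected."""
    if not task.shadow_stack or task.shadow_stack[-1] != SIGFRAME_TOKEN:
        raise ShadowStackMismatch("no signal frame on the shadow stack")
    task.shadow_stack.pop()


def criu_prepare_for_sigreturn(task: Task) -> None:
    """No signal was delivered, so the expected state is set up by hand
    (done from outside the task, via ptrace(), in the real implementation)."""
    task.shadow_stack.append(SIGFRAME_TOKEN)


if __name__ == "__main__":
    task = Task()
    try:
        kernel_sigreturn(task)  # fails: nothing was ever delivered
    except ShadowStackMismatch as exc:
        print("rejected:", exc)
    criu_prepare_for_sigreturn(task)
    kernel_sigreturn(task)      # now accepted
    print([hex(a) for a in task.shadow_stack])
```

The ordinary signal path pairs kernel_deliver_signal() with kernel_sigreturn(); CRIU has to supply the first half of that pair itself.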

Once parasite injection and removal are handled, dumping a process with a shadow stack enabled is simple. The only difference from a "normal" dump is the need to save the shadow-stack enable/disable state and the shadow-stack pointer, both of which can be easily done with the ptrace() calls. The shadow-stack memory area is saved exactly as any other anonymous memory and does not require any special care.

CRIU restore

Restoring a process with a shadow stack is slightly more involved than dumping. When CRIU restores a process tree, it creates all of the tasks and threads found in the checkpoint and then modifies them so that their state exactly matches the state that was saved at dump time. After the state of each thread is restored, CRIU sets up a sigreturn() frame for each thread, cleans up leftovers of the original CRIU process, and calls sigreturn() to restart the execution of the restored tasks. In order to restore the shadow stack, CRIU needs to be able to map the shadow-stack memory at exactly the same address as it was before the dump. CRIU also needs a way to efficiently populate the contents of the shadow stack with the saved data and the ability to set the shadow-stack control bits and pointer. Additionally, the kernel API lets the C library and program loader lock various shadow-stack features; CRIU must thus be able to ensure that these feature locks are kept after a restore.

Since shadow-stack memory is somewhat special, the virtual memory area for it should be created with the proposed map_shadow_stack() system call (described in this article) rather than with mmap(). Shadow-stack memory is read-only and it cannot be remapped. Based on the feedback from the CRIU developers, the latest version of the kernel patches that enable shadow stacks for user space allows passing a desired address to the map_shadow_stack() system call. This allows CRIU to map the shadow stack of the restored processes exactly where it was before the dump.

As a result of the way CRIU recreates the process's memory layout and restores its memory contents, mapping shadow-stack memory requires some additional care beyond having it at the correct address. To avoid conflicts between the memory layouts of CRIU and the restored process, CRIU reserves enough virtual memory to hold all of the restored process's memory areas, partially populates that memory, and then uses mremap() to map chunks of the reserved area to the appropriate addresses; it then finishes restoring the memory contents. The remapping happens late in the restore process and, since the shadow-stack memory cannot be remapped, it has to be created after the memory layout is nearly finalized; otherwise map_shadow_stack() could clobber an existing mapping.

Once the shadow stack has been put into the correct place, CRIU switches the shadow-stack pointer to it using the x86 RSTORSSP and SAVEPREVSSP instructions. At this point, the shadow stack can be populated with the WRUSS instruction. After restoring the saved shadow-stack data, CRIU uses WRUSS again to set up a frame for sigreturn() that will later resume normal execution of the restored tasks.

Restoring the shadow-stack contents could also be done with ptrace(), but user-space stacks can grow quite deep and there may be a lot of threads, so restoring shadow-stack contents that way would involve complex synchronization between the CRIU control process and the tasks being restored. Additionally, filling memory with ptrace() is terribly slow. Although WRUSS is not as efficient as memcpy(), it is still much faster than ptrace(). Before WRUSS can be used, though, it must be enabled in the shadow-stack control register, where it is disabled by default. CRIU can enable WRUSS before restoring the shadow-stack memory with an arch_prctl() call that allows manipulating bits in the shadow-stack control MSR, and switch it back off before letting the restored tasks run.

The last task that CRIU has to take care of is the locking of the shadow-stack features. The GNU C Library (glibc) will enable shadow stacks for a process if it finds certain bits in the ELF header of the running program, and disables the feature if these bits are absent. Once the shadow stack is enabled or disabled, glibc locks its state with an arch_prctl() call. The same call allows locking the state of WRUSS enablement but, at the moment, glibc does not use it. The feature locks are inherited across a clone() call so, if CRIU runs with shadow stacks enabled, it cannot restore a process that has shadow stacks disabled and similarly, if CRIU starts without the shadow stack, it has no way to enable it after clone()ing the restored tasks. To resolve this problem, the proposed kernel API introduces another arch_prctl() call that will unlock the shadow-stack features. This call is only available via ptrace(), so an attacker won't be able to disable shadow stack from within a process. With this arch_prctl() call, CRIU can control the shadow-stack feature locks for the clone()ed tasks and then reset them to the final, secure state after the shadow stack is restored.
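
The locking rules are easier to follow as a small state machine. The following is a toy model only; the class, its methods, and the via_ptrace parameter are invented for this sketch and do not correspond to the names used in the proposed arch_prctl() interface.

```python
from dataclasses import dataclass, replace


@dataclass
class ShstkState:
    enabled: bool = False
    locked: bool = False

    def set_enabled(self, value: bool) -> None:
        if self.locked:
            raise PermissionError("shadow-stack state is locked")
        self.enabled = value

    def lock(self) -> None:
        self.locked = True

    def unlock(self, via_ptrace: bool) -> None:
        # Unlocking is only possible from a tracer, never from the task itself.
        if not via_ptrace:
            raise PermissionError("unlock is only available via ptrace()")
        self.locked = False


def clone_task(parent: ShstkState) -> ShstkState:
    """The child of clone() inherits both the feature state and the lock."""
    return replace(parent)


def restore_without_shadow_stack(child: ShstkState) -> None:
    """What the CRIU control process does for a task dumped without one."""
    child.unlock(via_ptrace=True)
    child.set_enabled(False)
    child.lock()  # return to the final, locked state


if __name__ == "__main__":
    criu = ShstkState(enabled=True, locked=True)  # as left by the C library
    child = clone_task(criu)
    restore_without_shadow_stack(child)
    print(criu, child)
```

Without the unlock step, the child in the usage example would be stuck with whatever state CRIU itself happened to start with.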

Conclusions

Shadow stacks on the x86 architecture provide efficient protection against return-oriented programming (ROP) and similar attacks, but their use necessitates updates to certain applications. Hopefully, CRIU's experience with shadow stacks will be useful to other projects that need to address shadow-stack compatibility issues. Enabling shadow-stack support in CRIU revealed several gaps in the earlier versions of the proposed kernel APIs, and the initial implementation of shadow-stack support in CRIU relied on API extensions that were not included in the original kernel patches. The latest version of those patches has incorporated feedback from the CRIU developers and has all the necessary knobs to support checkpoint and restore of applications with shadow stacks.


Beyond microblogging with ActivityPub

December 20, 2022

This article was contributed by Jordan Webb

ActivityPub-enabled microblogs are gaining popularity as a replacement for Twitter, but ActivityPub is for more than just microblogging. Many other popular services also have open-source alternatives that speak ActivityPub. Proprietary services operated by commercial interests usually deliberately limit interoperability, but users of any ActivityPub-enabled service should be able to communicate with each other, even if they are using different services. This promise of interoperability is often limited in practice, though; while ActivityPub specifies how multiple types of content can be published, the kinds of content that can be displayed or interacted with vary from project to project.

The ActivityPub protocol describes how servers can exchange Activity Streams. Microblogs mostly emit activities related to status updates (which are called "Notes" in ActivityPub parlance), but there are many other types of objects that can be described in these streams. ActivityPub projects that aren't microblogs mostly specialize in publishing activities related to one or more of these other types of objects; instead of notes, they publish pages, images, or videos. All types of objects are allowed to contain some common fields, including a name and a URL; software that doesn't understand a particular type of object may fall back to using these fields to display a link to the object on its original server instead, or may simply choose not to display the object at all.
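The fallback behavior described above might look something like the following sketch. It assumes that objects arrive as already-parsed JSON dictionaries and that "url" is a plain string (the Activity Streams vocabulary also allows richer forms); the render() function and the set of understood types are hypothetical and not taken from any particular project.

```python
from typing import Optional

# Object types this hypothetical client knows how to display inline.
UNDERSTOOD_TYPES = {"Note"}


def render(obj: dict) -> Optional[str]:
    """Return text to show for an object, or None to skip it entirely."""
    kind = obj.get("type")
    if kind in UNDERSTOOD_TYPES and "content" in obj:
        return obj["content"]
    # Unknown type: fall back to the common fields and show a link
    # to the object on its original server.
    url = obj.get("url")
    if url:
        label = obj.get("name") or kind or "object"
        return f"{label}: {url}"
    # Nothing usable; choose not to display the object at all.
    return None
```

A "Video" object carrying a name and a URL would thus be shown as a titled link, while a client that supports only notes would otherwise have nothing to show for it.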

Unless otherwise noted, all of the projects mentioned in this article are released under the terms of the AGPL 3.0.

WriteFreely

WriteFreely is devoted to long-form blogging; there are no character limits on posts to be found here. It presents a simple and distraction-free interface for writing. Posts must be authored in Markdown or HTML; there is no WYSIWYG editor included.

Like most blogging software, WriteFreely can be used to publish posts via HTML and RSS, but it also supports federation via ActivityPub. If the author of a blog chooses to enable federation, users on ActivityPub-enabled services will be able to follow it. Mastodon users will only see post titles, a short summary, and a link to the post, as seen in a video demonstrating the feature. The Hometown fork of Mastodon allows users to read full posts from software like WriteFreely without having to leave their feed.

WriteFreely's participation in the ActivityPub ecosystem seems to be write-only; users of other software can reply to posts from WriteFreely, but it's not clear that WriteFreely does anything with the replies. It does not provide a way to follow other blogs or ActivityPub users; for that, an account on some other service is needed.

The project is written in Go with its source available on GitHub. WriteFreely is distributed as a single binary. It stores its data in a SQLite database by default, but can optionally be configured to use MySQL. Instructions for running it in a container are available for development purposes, but the documentation says that there is no official way to run the container in production as of yet. There is an image on Docker Hub, but it isn't up-to-date. The image was last updated on June 30th, 2021, whereas WriteFreely's most recent release was on November 11th of this year.

WriteFreely's commit history goes back to 2016; LWN looked at it in 2019. A hosted version is available at write.as, which offers both free and paid plans. Development is sponsored by Musing Studio, which operates write.as and other related services.

Lemmy

Lemmy is a link aggregator and discussion forum. People who hang out on Reddit or Hacker News will find Lemmy's interface to be familiar. Each site running Lemmy can play host to a number of communities. Each community contains a list of posts, which can be sorted several different ways. Users can participate in threaded discussions attached to each post, and upvote posts or individual comments.

To users of other ActivityPub software, each Lemmy community appears to be a user; posts and comments from Lemmy appear as notes that have been reposted by the community user. Users on other types of servers can follow Lemmy communities and participate in discussions by replying to existing posts and comments, but they may not be able to create new top-level posts of their own, unless the software that they're using also supports the concept of discussion groups. Lemmy users can subscribe to communities on their own and other servers, but cannot follow individual accounts, although they can exchange direct messages with them.
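In Activity Streams terms, a repost is an "Announce" activity that wraps another object. The sketch below builds such an activity as a plain dictionary to show the general shape; the announce() helper and the example.org identifiers are invented for illustration, and the messages that Lemmy actually sends may differ in their details.

```python
def announce(community_id: str, obj: dict) -> dict:
    """Wrap an object in an Announce activity sent by a community actor."""
    return {
        "@context": "https://www.w3.org/ns/activitystreams",
        "type": "Announce",
        "actor": community_id,   # the community, acting like a user
        "object": obj,           # the post or comment being reposted
    }


# Hypothetical comment written by a member of the community.
comment = {
    "type": "Note",
    "attributedTo": "https://example.org/u/alice",
    "content": "An interesting link, thanks for posting.",
}

activity = announce("https://example.org/c/linux", comment)
```

Followers of the community actor receive the Announce, which is why the comment shows up in their feeds as something the community reposted rather than as a post from its author.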

Lemmy is split into two parts; source for both can be found in the LemmyNet organization on GitHub. The backend ("lemmy") is written in Rust, while the user interface ("lemmy-ui") is written in TypeScript. The UI is built using the Inferno framework and is an isomorphic application; this means that the same JavaScript used to build the user interface on the client-side can also be run on the server to pre-render a page, which allows it to be indexed by search engines. Because of this, the UI must be run in Node.js as a separate application alongside the backend.

Docker Compose seems to be the preferred method of deploying Lemmy; an Ansible playbook is also available, but it just automates the recommended setup with Docker Compose. In addition to the backend and the UI, Lemmy requires a few additional services. It stores its data in a PostgreSQL database, and an HTTP proxy is required to route requests to the backend or to the UI server depending on what is requested; an example configuration for NGINX is provided. The example configurations also include an instance of pict-rs, which is a small API server for hosting images and other media. Lemmy's documentation indicates that this is optional, but, without it, users will be unable to upload media with their posts.

Development of Lemmy began in 2019 and is funded by donations and sponsorships. Its creators, Dessalines and Nutomic, are also the primary contributors to both the backend and the frontend.

Pixelfed

Pixelfed is designed for sharing images; it is an alternative to Instagram. Users can create posts by uploading photos or videos, comment on each other's posts, and exchange direct messages with each other. Public posts on Pixelfed must contain at least one photo or video; text-only posts aren't allowed, though text-only comments and direct messages are. Users can also create collections of their posts, and "stories", which are collections of posts that only appear on a user's profile for a limited amount of time.

Users of other ActivityPub software can follow Pixelfed users, comment on their posts, and exchange direct messages with them, but collections and stories don't seem to federate to anything but other instances of Pixelfed. Users on Pixelfed can follow users on other types of servers, but they will only see posts that include media; a Pixelfed user following a Mastodon user will see the updates with pictures and video that the Mastodon user posts, but will not see any updates that only contain text.
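The media-only behavior can be pictured as a simple filter over incoming posts. The sketch below is not Pixelfed's code (Pixelfed is written in PHP); it assumes that a post is a parsed JSON dictionary whose "attachment" field, when present, is a list of objects with a "mediaType", and the has_media() name is made up for this example.

```python
def has_media(post: dict) -> bool:
    """True if the post carries at least one image or video attachment."""
    attachments = post.get("attachment", [])
    return any(
        att.get("mediaType", "").startswith(("image/", "video/"))
        for att in attachments
    )


incoming = [
    {"type": "Note", "content": "Just text"},
    {
        "type": "Note",
        "content": "Sunset",
        "attachment": [{"type": "Document", "mediaType": "image/jpeg"}],
    },
]

# Only posts with photos or video would reach the follower's feed.
feed = [post for post in incoming if has_media(post)]
```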

Pixelfed is written in PHP and its source is available on GitHub. The installation instructions assume that the user already has a web server that can run PHP applications. There are no official container images, but the repository contains some example Dockerfiles and a docker-compose.yml. In addition to a PHP-capable web server, Pixelfed needs a process to handle background jobs, a Redis server, and either a PostgreSQL or MySQL database.

The first commit to Pixelfed's repository was made in 2018; the project is primarily developed by its creator, Daniel Supernault. It is funded by donations and a sponsorship from the NLnet Foundation.

PeerTube

PeerTube is similar to YouTube; it allows users to create channels, upload and share videos to them, and to host live streams. PeerTube can use peer-to-peer technologies to reduce the amount of bandwidth needed by the server hosting the videos; the first versions of PeerTube used WebTorrent to deliver video, and later versions have adapted the HTTP Live Streaming protocol to run over WebRTC. This allows viewers' browsers to obtain parts of the video from other viewers; PeerTube also supports an instance redundancy system that allows PeerTube servers to back up and help serve each other's videos.

Users of most other ActivityPub software can follow users or channels on PeerTube and comment on videos, and PeerTube users can reply to them. As of this writing, Pixelfed seems to be incompatible with PeerTube; Pixelfed users can follow users and channels on PeerTube instances, but Pixelfed will not display any posts from PeerTube servers. Users of PeerTube do not appear to be able to follow users on other types of servers; accounts from servers running something other than PeerTube can appear in the PeerTube user interface, but PeerTube will not find any channels or videos associated with them and will not let a user subscribe to them.

PeerTube is written in TypeScript. It stores its data in a PostgreSQL database, and also requires an instance of Redis. Installation instructions are available for running PeerTube as a system service and for running PeerTube with Docker Compose. The application itself is relatively lightweight, but transcoding uploaded videos is CPU-intensive. Serving videos also takes up a lot of bandwidth, and storing them takes up a lot of disk space. Some configurations require PeerTube to create and store multiple copies of each video; one copy is needed for each streaming format and quality level that the server operator wishes to support.
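The multiplication of stored copies is easy to underestimate. As a rough sketch, assuming that one copy is kept for every combination of streaming format and quality level, the number of files per uploaded video can be enumerated as follows; the format and quality labels are placeholders rather than PeerTube configuration values.

```python
from itertools import product


def copies_to_store(formats, qualities):
    """List every (format, quality) pair that needs its own stored copy."""
    return list(product(formats, qualities))


plan = copies_to_store(["format-a", "format-b"], ["480p", "720p", "1080p"])
print(len(plan))  # 6 copies for a single uploaded video
```

Each additional format or quality level multiplies the transcoding work and the disk space needed for every video on the server.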

Development of PeerTube is sponsored by Framasoft, a French non-profit organization, and goes back to 2015. Most Framasoft projects are hosted on its GitLab server, but development of PeerTube occurs on GitHub.

Others

Because of the breadth of the ActivityPub ecosystem, it is impossible to cover it all in a single article, but here are some other notable projects:

Bookwyrm is a social reading application similar to Goodreads, which allows users to review and rate books as well as to maintain lists of books they want to read and have read. Users on other types of servers can follow Bookwyrm users to see their reading activity and exchange direct messages with them. Bookwyrm users can also follow users on other types of servers, and will see their activities, even if they are not book-related. It is written in Python; source is available on GitHub under the Anti-Capitalist Software License, which is not a free-software license since it forbids most use by commercial entities except for certain types of worker-owned collectives.
Lotide is another link aggregator and discussion forum, similar to Lemmy; it is also written in Rust. Lotide appears to be fully interoperable with Lemmy; each will recognize communities on the other type of server. Sources are available on its SourceHut home page.
Mobilizon is a tool for organizing events. Users can create events, search for them on their own or on other Mobilizon servers, comment on them, and RSVP to them. Users of other ActivityPub software can follow Mobilizon accounts, receive notifications of events, and comment on them. Like PeerTube, development of Mobilizon is sponsored by Framasoft. It is written in Elixir and sources are available on Framagit, the Framasoft GitLab instance.
Funkwhale is a social music application. It allows users to upload their music libraries, listen to them, and share them with other users. Libraries can only be shared with other people on Funkwhale, but users can also create channels and upload audio files to them; users of other ActivityPub software can follow these channels and receive each audio file as a post. Funkwhale's backend is written in Python, while its user interface is written in TypeScript. Source is available on the project's own GitLab server.
Owncast is a self-hosted livestreaming platform written in Go. Users of other ActivityPub software can follow Owncast servers to receive stream notifications and posts from Owncast users. Source is available on GitHub under the MIT license.

Conclusion

Though all of these projects are based on ActivityPub, in many cases their interoperability is limited, either by design or by necessity. WriteFreely can publish but offers little interactivity; Pixelfed chooses to ignore posts without images or video; and PeerTube doesn't allow its users to subscribe to anything but PeerTube channels. Meanwhile, Lemmy and other more specialized projects have interaction models that don't map cleanly to the semantics of other applications.

People who want to be able to fully participate in multiple types of communities will need multiple accounts; all of these projects may speak the same protocol, but there is no single project that supports every type of interaction that is possible within the ecosystem. A person who wants to be able to casually follow everything possible from a single account may be happy to have their home on Mastodon or one of its alternatives, though; almost all of the content shared by users of other ActivityPub software can be followed and viewed by users of microblogging software, even if they may not be able to fully interact with it.

While it is sometimes limited, any degree of interoperability is better than what is offered by most commercial services. It's impossible to follow an Instagram account from Twitter or a YouTube channel from Facebook. The current big players in social media (with the notable exception of Tumblr) are unlikely to hop on the interoperability train any time soon, but their assorted efforts to extract greater amounts of revenue from their services have caused their relationships with their users to become increasingly sour. Services based on ActivityPub offer many of the same attractions that can be found within commercial walled gardens, without the walls.


Page editor: Jonathan Corbet