
LWN.net Weekly Edition for August 14, 2025

Welcome to the LWN.net Weekly Edition for August 14, 2025

This edition contains the following feature content:

  • Indico: event management using Python
  • Arch shares its wiki strategy with Debian
  • StarDict sends X11 clipboard to remote servers
  • Treating Python's debugging woes
  • On the use of LLM assistants for kernel development

This week's edition also includes these inner pages:

  • Brief items: Brief news items from throughout the community.
  • Announcements: Newsletters, conferences, security updates, patches, and more.

Please enjoy this week's edition, and, as always, thank you for supporting LWN.net.

Comments (none posted)

Indico: event management using Python

By Jake Edge
August 13, 2025

EuroPython

The Indico event-management tool has been in development at CERN for two decades at this point. The MIT-licensed web application helps organize conferences, meetings, workshops, and so on; it runs on Python and uses the Flask web framework. Two software engineers on the project, Dominic Hollis and Tomas Roun, came to EuroPython 2025 in Prague to talk about Indico, its history, and some metrics about its community. There is a bit of a connection between Indico and the conference: in 2006 and 2007, the tool was used to manage EuroPython.

CERN

Since they work in a rather interesting place, Roun started with an overview of CERN. He said that attendees may be familiar with CERN from pop culture; for example, it appears in the opening scenes of the movie Angels & Demons, where the bad guy tries to steal antimatter from the organization. That is not completely unrealistic, he said, showing a picture of the antimatter factory at CERN; he and Hollis do not work all that far from the factory on the campus.

[Tomas Roun]

CERN is "the largest particle physics lab in the world". There are thousands of people working on site, keeping the facilities running, including both of them. Thousands of scientists also come to CERN each year to run their experiments.

One of the main things that happens at CERN is experiments with particle collisions. "We take particles, for example protons, we make them go very very fast and then we smash them together and see what happens." Those collisions take place in the Large Hadron Collider (LHC), which is the largest particle accelerator and collider in the world; it is also the largest machine ever built. The LHC has four hangar-sized detectors that are effectively large, high-speed cameras to record the collisions so that they can be reconstructed by the scientists.

Probably the most famous discovery made using the LHC was of the Higgs boson in 2012. It was a missing particle in the standard model of particle physics, which had been theorized to exist since the 1960s. Beyond physics, the World Wide Web was invented at CERN by Tim Berners-Lee in 1989.

"CERN also supports open source and open science, in general." All of the research done there is "open and available to anybody". In addition, the organization releases a lot of the tools it develops internally, such as Indico, as open source; he mentioned the ROOT data-analysis framework and Zenodo data repository as examples. CERN also contributes to various open-source projects, such as OpenStack and Python. At CERN, Python is used for everything, Roun said: "web applications, like Indico, but also desktop applications, machine learning, data analysis, and just random scripts that bolt everything together". More information can be found on CERN's open-source program office page and the list of projects in the organization's GitHub repository.

Indico

With that, he turned over the podium to Hollis, who described Indico as a Swiss Army knife for managing meetings, conferences, and more. Its core is developed at CERN, but there are also lots of contributions from organizations like the United Nations (UN), Max-Planck Institute for Physics, and others.

"It is one of the more popular event-management systems that you probably never heard about." There are more than 300 instances installed worldwide that the project knows about. Those servers are handling around 350,000 users. The initial adopters were research-oriented, but more recently there has been adoption by diplomatic users (indico.un.org) and in the tech industry, such as by Canonical, GNOME, and the Linux Plumbers Conference (LPC). LWN looked at Indico back in 2018 in conjunction with its adoption by LPC.

[Dominic Hollis]

Beyond Python and Flask, Indico uses SQLAlchemy for access to its PostgreSQL database. It uses React in its web interface. There is more to the stack, Hollis said, which people can investigate in the project's GitHub repository.

The project has been under development for more than 20 years at this point. It started as a PHP application called CDSAgenda back in 1999, which used MySQL as its database; it was a "typical LAMP stack", which was common in those days. It was developed and maintained by a small team at CERN for use by the organization.

In the early 2000s, the EU funded development of a "catch-all" event-management system, which CERN built as Integrated Digital Conference or InDiCo. It was written in Python, using mod_python and mod_wsgi for the Apache web server, and the Zope Object Database (ZODB) for storage. It also "featured a very limited, home-brewed templating engine". After a few years of development work, the first event was hosted, the Computing in High Energy Physics (CHEP) conference, which became known by project members as "event zero". EuroPython 2006 was hosted by CERN and was run on InDiCo, as was EuroPython 2007 in Lithuania.

The name was changed to "Indico" in 2010 or so and the project changed in other ways. Over the next few years, it adopted Flask and then started using Redis for Flask sessions to reduce the pressure on ZODB. SQLAlchemy and PostgreSQL replaced ZODB in 2014 or so, though, for a while, they ran in parallel. Users were being stored in PostgreSQL and events in ZODB, which was "a bit of a mess". The home-grown templating engine was replaced with Jinja. Between 2014 and 2017, nearly all of the Python code base, some 200,000 lines, was rewritten or refactored to remove CamelCase, make it conform to PEP 8 ("Style Guide for Python Code"), and perform other cleanups. That journey was described on a poster that Hollis showed.

He quickly went through some of the highlights since 2017, including adding React and a new room-booking interface in 2019. Around 2020, the transition from Python 2 to Python 3 was made. Modernizing OAuth handling and adding support for API access via personal-access tokens, along with some cleanup, was done in 2021. The advent of the GDPR prompted some additional privacy features in 2022; that year also brought better document generation using WeasyPrint for things like receipts, statements of attendance, and so on.

There have been some projects that were spinoffs from the Indico work, starting with Flask-Multipass. It allows simultaneously using multiple authentication mechanisms with Flask. It is used in Indico, but is packaged separately so it can be used by any Flask application. It supports many different authentication mechanisms, like OAuth, LDAP, and SAML, as well as lesser-used ones like Shibboleth.

Making it easier to use Flask routes from JavaScript is the focus of the js-flask-urls library. It helps glue together a JavaScript-based frontend to a Flask backend without having to hard-code the Flask URLs. He also noted that the Indico command-line interface can be used to manage various aspects of the instance. "indico shell" is helpful for administrators, but it can also be used when developing Indico features or plugins.

Metrics

Roun returned to the microphone to report on some of the metrics that have been gathered from the project's Git repository. There is a Linux Foundation project called CHAOSS, "which gives you all sorts of different metrics you can actually run on your open-source project to figure out how you are doing". The Indico project is interested in a number of different measures, including the code evolution in terms of size and languages, amount of legacy code, number of contributors, its bus factor, and the work-life balance for its contributors.

[Indico graph]

His first graph (seen above from the talk slides) showed the code base size and languages used over the years since 2009, which is when the project started using Git. One interesting thing is that the size of the project is about the same now as it was in 2009, 325,000 lines of code or so, though "we've shipped probably hundreds of features in Indico" over that time, he said. That means the project is much more efficient in what it is able to achieve per line of code, "which I think is quite cool". Currently, the project is roughly half Python, with React, JavaScript, CSS, and Jinja templates making up the rest of it.

Since Indico is over 20 years old, it is going to have some legacy code, Roun said; the project would like to be able to measure the amount of that code. Another useful measure is which modules are not being touched much or at all, which may indicate a lack of knowledge about how they work and concern about breaking things when changes are made. He showed a histogram of the age of all of the lines of code in Indico, which measures how long a line stays unmodified in the code base.

The average lifetime is around six years. That's a reasonable number, he said, "it means we aren't actually rewriting stuff all the time, so we do have some stable and well-tested code", but without "too much legacy code, it's still manageable". He noted that there is a chunk of code that has not been touched in 16 years, but it is something that he is currently rewriting, so that piece of legacy code will largely be gone soon.

His next graphs were of contributors, the first showing steady growth in the total number of contributors to somewhere close to 200. It shows that, even with a large and somewhat complicated code base, "people are able to get familiar with it and contribute". The other graph was a breakdown by year of existing and new contributors, which showed new contributors in the low-to-mid teens and 40-50 total contributors each year over the last five years or so. The team at CERN is five or six people, so there are quite a few others who are also working on the tool, both individuals and from organizations using Indico. He noted that the UN has made quite a few contributions recently.

There are lots of ways beyond code to contribute to a project, he said, including documentation, design, artwork, community building, or spreading the word. Another, that Indico has "been doing quite well" with, is translations. The tool is available in more than 15 languages and there are a few more that are being worked on. Except for French, all of the translations are being done by volunteers. It is a large job, as there are more than 6,000 phrases that need to be translated for the project.

He showed a few other graphs, then wrapped up the presentation. Indico has been through a lot of changes over the last 20 years, Roun said. "We've changed technologies, frameworks, databases, even programming languages, and, so, who knows where we're going to be in five, ten, 15 years, but for sure Indico is going to be around and kicking."

Q&A

There was a good bit of time left for questions, starting with one about why Transifex was chosen for managing translations over other options. Roun said that the choice was made long ago, before Weblate "was as good as it is now". There have been thoughts about switching to Weblate, but there are many translation volunteers who are used to Transifex, so switching would "probably hurt more people than it would help"; in addition, the project is happy overall with Transifex.

Another question regarded any regrets that they had about technology choices that were made along the way. Hollis quickly stepped up to answer: "This is a personal opinion, but React and the JavaScript ecosystem", though it was not clear that he was being entirely serious. He continued that "there are always mini-regrets when you change tech", because there is always something new that others are jumping onto. The project tries to follow standards of various sorts, but those standards change over time.

From the audience, Indico lead developer and project manager Adrian Mönnich said that the choice of ZODB in the early days might be a candidate for something that was regretted. That choice was made well before his time. ZODB made it more difficult to change the underlying objects being used in Indico, since they were stored in what some called a "glorified pickle store". The idea was that ZODB made it easy to prototype Indico in the early going and, on balance, its use allowed the project to make a lot of progress quickly, so maybe it is not truly a regret, he said.

Another question was about how the installations were tracked and how communication within the community was done. Hollis said that the forum at talk.getindico.io was a general-purpose discussion site "for all things Indico". There is also a Matrix channel. Indico instance administrators can choose to enable telemetry, which is just some simple information that allows the project to count the number of servers and the Indico version they are running.

The priority given to proposed features was up next; is it based on CERN's needs or other criteria? Roun said that originally it was all based on what CERN needed, but that now "the core of Indico is for everybody, there is no CERN-specific stuff in it". CERN has some plugins for functionality it needs, but the rest of the code is meant to take into account the needs of all Indico users.

[I would like to thank the Linux Foundation, LWN's travel sponsor, for travel assistance to Prague for EuroPython.]

Comments (3 posted)

Arch shares its wiki strategy with Debian

By Joe Brockmeier
August 12, 2025

DebConf

The Arch Linux project is especially well-known in the Linux community for two things: its rolling-release model and the quality of the documentation in the ArchWiki. No matter which Linux distribution one uses, the odds are that eventually the ArchWiki's documentation will prove useful. The Debian project recognized this and has sought to improve its own documentation game by inviting ArchWiki maintainers Jakub Klinkovský and Vladimir Lavallade to DebConf25 in Brest, France, to speak about how Arch manages its wiki. The talk has already borne fruit with the launch of an effort to revamp the Debian wiki.

[Jakub Klinkovský]

Klinkovský and Lavallade were introduced by Debian developer Thomas Lange, who said that he had the idea to invite the pair to DebConf. Klinkovský said that he had been a maintainer of the wiki since about 2014, and that he is also a package maintainer for Arch Linux. He added that he contributes to many other projects "wherever I can". For his part, Lavallade said that he has contributed to the wiki since 2021, but he had only recently joined the maintenance team: "I know just enough to be dangerous."

Lavallade said that the talk was a good opportunity to cross-pollinate with another distribution, and to do some self-reflection on how the wiki team operates. They would explain how the wiki is run using the SWOT analysis format, with a focus on the content and how the maintenance team keeps the quality of pages as high as it can. "SWOT", for those who have been fortunate enough not to have encountered the acronym through corporate meetings, is short for "strengths, weaknesses, opportunities, and threats". SWOT analysis is generally used for decision-making processes to help analyze the current state and identify what an organization needs to improve.

ArchWiki:About

The ArchWiki was established in 2004; the project originally used PhpWiki as its backend—but Klinkovský said that it was quickly migrated to MediaWiki, which is still in use today. The wiki maintenance and translation teams were established "about 2010". The maintenance team is responsible for the contribution guidelines, style conventions, organization, and anything else that contributors need to know.

Today, the wiki has more than 4,000 topic pages; it has close to 30,000 pages if one counts talk pages, redirects, and help pages. "We are still quite a small wiki compared to Wikipedia", Klinkovský said.

He displayed a slide, part of which is shown below, with graphs showing the number of edits and active users per month. The full set of slides is available online as well.

[ArchWiki today slide]

Since 2006, the wiki has had more than 840,000 edits by more than 86,000 editors; the project is averaging more than 2,000 edits by about 300 active contributors each month. Klinkovský noted that this "used to be quite a larger number".

Strengths

Lavallade had a short list of the "best user-facing qualities" of the ArchWiki, which are the project's strengths. The first was "comprehensive content and a very large coverage of various topics". He said this included not just how to run Arch Linux, but how to run important software on the distribution.

The next was having high-quality and up-to-date content. Given that Arch is a rolling-release distribution, he said, every page has to be updated to reflect the latest package provided with the distribution. That is only possible thanks to "a very involved community"; he noted that most of the edits on the ArchWiki were made by contributors outside the maintenance team.

All of that brought him to the last strength he wanted to discuss: its reach beyond the Arch community. He pulled up a slide that included a quote from Edward Snowden, which said:

Is it just me, or have search results become absolute garbage for basically every site? It's nearly impossible to discover useful information these days (outside the ArchWiki).

Contribution and content guidelines

[Vladimir Lavallade]

The contribution guidelines and processes have a lot to do with the quality of the content on the wiki. Contributors, he said, have to follow three fundamental rules. The first is that they must use the edit summary to explain what has been done and why. The second rule is that contributors should not make complex edits all at once. As much as possible, Lavallade said, contributors should do "some kind of atomic editing" where each change is independent of the other ones. He did not go into specifics on this during the talk, but the guidelines have examples of best practices. The third rule is that major changes or rewrites should be announced on a topic's talk page to give others who are watching the topic a chance to weigh in.

The team also has three major content guidelines that Lavallade highlighted. One that is likely familiar to anyone contributing to technical documentation is the don't repeat yourself (DRY) principle. A topic should only exist in one place, rather than being repeated on multiple pages. He also said that the ArchWiki employed a "simple, but not stupid" approach to the documentation. This means that the documentation should be simple to read and maintain, but not offer too much hand-holding. Users also need to be willing to learn; they may need to read through more than one page to find the information they need to do something.

The final guideline is that everything is Arch-centric. Content on the site may be useful for users running different Linux distributions, and contributions are welcome that may apply to other distributions, but "something that will not work on Arch as-is is not something we will be hosting on our site". That, he said, allowed the maintenance team to be focused on the content Arch provides and helps to keep maintenance more manageable.

Maintenance

Speaking of maintenance, Klinkovský said, the project has tools and templates to help make life easier for contributors. A reviewer might apply an accuracy template, for instance, which adds the page to a list of all content that has been flagged as possibly inaccurate. The templates are usually used and acted on by people, but the project also has bots that can add some templates (such as dead link) and even fix some problems.

The review process is an important part of maintenance, he said. Everyone can participate in review, not just the maintainers of the wiki. He explained that it was not possible for the maintenance team to review everything, so much of the review is done by people interested in specific topics who watch pages to see when changes are made. If people spot errors, they are empowered to fix them on their own, or to use the templates to flag them for others to address. Maintainers are there, he said, "to make some authoritative decisions when needed, and mediate disputes if they came up".

Klinkovský referred to watching and reviewing content on the wiki as "patrolling", and said there were some basic rules that should be followed, starting with "assume good faith". Most people do something because they think it is right; the maintainers rarely see outright vandalism on the wiki.

The second rule, he said, is "when in doubt, discuss changes with others before making a hasty decision". If a change must be reverted, then a reviewer should always explain why it was reverted. This gives the original contributor a chance to come back and fix the problem or address it in a different way. Lastly, Klinkovský said, they wanted to avoid edit wars: "the worst thing that can happen on a wiki is a few people just reverting their changes after each other".

Preventing edit wars and encouraging contributions was, Lavallade said, part of the broader topic of community management. The team tries to encourage contributors to not only make one change, but to learn the guidelines and keep contributing—and then help teach others the guidelines.

Arch has support forums, such as IRC, and when people ask for help there they are pointed to the wiki "because there is always the solution on the ArchWiki". In the rare event that the wiki does not have the solution, he said, "we gently point them to where the page with the content needs to be" and invite the user to add it even if it's not perfect the first time. That helps to reinforce the idea that the wiki is a collaborative work that everyone should feel welcome to add to.

Weaknesses

Lavallade said that the contribution model also illustrated one of ArchWiki's weaknesses: there is a lot to learn about contributing to the wiki, and newcomers can get tripped up. For example, he said that the DRY principle was difficult for new contributors. Or a newcomer might add a large chunk of content into a page that should be broken up into several pages.

The MediaWiki markup language is another hurdle for new contributors. He called the markup "antiquated", and acknowledged that the style conventions for the ArchWiki are not obvious either. It can take a lot of reading, cross-referencing, and back-and-forth discussions for a new contributor to make a content contribution correctly.

MediaWiki has a lot of strengths, Klinkovský said; it is battle-proven software, it is the de facto standard platform for wikis, and it has a nice API that can be used for integration with external applications such as bots. But MediaWiki is a weakness as well, he said. The platform is primarily developed for Wikipedia, and its developers are from Wikipedia. "Sometimes their decisions don't suit us", he said, and there was little way to make things exactly as the ArchWiki maintenance team might want.

The primary weakness, though, was that its markup language is "very weird and hard to understand both for humans and machines". In 2025, most people know and write Markdown daily, but MediaWiki markup is different. It is weird and fragile; changing a single token can completely break a page. It is also, he said, difficult to write a proper or robust parser for the language. This is particularly true because there is no formal specification of the language, just the reference implementation in the form of MediaWiki. That can change at any time: "so even if you had a perfect parser one day, it might not work the same or perfectly the next day".

Since ArchWiki is developed by volunteer contributors, its content is essentially driven by popularity; people generally only edit the content that they have an interest in. Klinkovský said that this was not a weakness, necessarily, but it was related to some weaknesses. For example, some pages were edited frequently while others were not touched for years due to lack of interest. To a reader, it is not obvious whether page content is stale or recently updated.

There is also no perfect way to ensure that content makes its way to the wiki. He noted that people might solve their problem in a discussion on Arch's forums, but that the solution might never end up on the wiki.

Opportunities and threats

Klinkovský said that they had also identified several areas of opportunity—such as community involvement and support tools for editors—where the ArchWiki's work could be improved.

Lavallade said that one example of community involvement would be to work with derivatives from Arch Linux, such as SteamOS or Arch ports to CPU architectures other than x86-64. Currently, Arch is only supported on x86-64, he noted, but the project has passed an RFC to expand the number of architectures that would be supported.

Right now, the project has two tools for editors to use to make their work a bit easier: wiki-scripts and Wiki Monkey. Klinkovský explained that wiki-scripts was a collection of Python scripts used to automate common actions, such as checking if links actually work. Wiki Monkey is an interactive JavaScript tool that runs in the web browser, he said, and can help contributors improve content. For example, it has plugins to expand contractions, fix headers, convert HTML <code> tags into proper MediaWiki markup, and more.
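
As a rough illustration of what such editor tooling looks like (this is not the project's actual wiki-scripts code, and the page title used is just an example), a short Python script can use the standard MediaWiki API to pull the external links from a page and flag the ones that no longer respond:

    # Minimal sketch of a dead-link checker against the MediaWiki API;
    # wiki-scripts does considerably more, this only shows the shape of it.
    import requests

    API = "https://wiki.archlinux.org/api.php"

    def external_links(title):
        params = {"action": "query", "prop": "extlinks", "titles": title,
                  "ellimit": "max", "format": "json"}
        pages = requests.get(API, params=params, timeout=30).json()["query"]["pages"]
        for page in pages.values():
            for link in page.get("extlinks", []):
                # the default API format returns each URL under the "*" key
                yield link.get("url") or link.get("*")

    for url in external_links("Installation guide"):
        try:
            status = requests.head(url, allow_redirects=True, timeout=10).status_code
        except requests.RequestException:
            status = None
        if status != 200:
            print(f"possible dead link ({status}): {url}")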

There is much more that could be added or improved, he said, like linting software for grammar issues. The team might also consider incorporating machine learning or AI techniques into the editor workflow, "but this needs to be done carefully so we don't cause more trouble than we have right now". The trouble the team has with AI right now will probably sound familiar to anyone running an open-source project today; specifically, AI-generated content that is not up to par and scraper bots.

People have already tried contributing to ArchWiki using AI, but Klinkovský pointed out that "current models are obviously not trained on our style guidelines, because the content does not fit". Using AI for problem solving also prevents people from fully understanding a solution or how things work. That may be a problem for the whole of society, he said, not just ArchWiki.

The scraper bot problem is a more immediate concern, in that the project had to put the wiki behind Anubis in the early part of the year for about two months. Currently they do not need to use it, Klinkovský said, but they have it on standby if the bots come back. "So this is still a threat and we cannot consider it solved."

Another, non-technical, threat that the project faces is burnout. Lavallade said that contributor burnout is a real problem, and that people who have stayed a long while "usually start with a good, strong string of changes, but they end up tapering their amount of contributions". Everyone, he said, ends up running out of steam at some point. Because of that, there is a need to keep bringing in new contributors to replace people who have moved on.

Questions

One member of the audience wanted to know if there was a dedicated chat room for the wiki to discuss changes coming in. Lavallade said that there is an #archlinux-wiki room on Libera.Chat, and anyone is welcome there. However, the team frequently redirects conversations about changes to the talk pages on the wiki to ensure that everyone interested in a topic can discuss the change.

Steve McIntyre had two questions. He was curious about how many maintainers the ArchWiki had and what kind of hardware or setup was on the backend of the wiki: "is this like, one virtual machine, or a cluster?" Klinkovský said that there were about 30 to 50 maintainers at the moment. As far as the setup, he said he was not on Arch's DevOps team and didn't know all the details, but he did know it was just one virtual machine "in the cloud".

Another person wanted to know if the team would choose MediaWiki again if they were building the wiki today. Klinkovský did not quite answer directly, but he said that if a project does not like the markup language used by MediaWiki then it should look to a solution that uses Markdown. But, if a project needs all of the other features MediaWiki has, "like plugins or the API for writing bots and so on", then MediaWiki is the best from all of the wiki software available.

One audience member pointed out that the chart seemed to show a spike in activity beginning with COVID and a steady decline since. They asked if the team had noticed that, and what they were doing about it. Klinkovský said that they had not looked at that problem as a whole team, or discussed what they could do about it. He said that if Arch added new architectures or accepted contributions from Arch-derivative distributions, it might reverse the trend.

Lange closed the session by saying that he thought it was funny that the presenters had said they wanted ArchWiki to be Arch-centric: "I think you failed, because a lot of other people are reading your really great, big wiki".

Debian embraces MediaWiki

The session seems to have been a success in that it has helped to inspire the Debian project to revamp its own wiki. Immediately after the ArchWiki presentation, there was a Debian wiki BoF where it was decided to use MediaWiki. Debian currently uses the MoinMoin 1.9 branch, which depends on Python 2.7.

Since DebConf25, members of the wiki team have worked with Debian's system administrators team to put up wiki2025.debian.org to eventually replace the current wiki. They have also created a new debian-wiki mailing list and decided to change the content licensing policy for material contributed to the wiki. Changes submitted to the wiki after July 24 are now licensed under the Creative Commons Attribution-ShareAlike 4.0 license unless otherwise noted.

If Debian can sustain the activity that has gone into the wiki revamp since DebConf25, its wiki might give the ArchWiki project a run for its money. In that case, given that ArchWiki has proven such a good resource for Linux users regardless of distribution, everybody will win.

[Thanks to the Linux Foundation, LWN's travel sponsor, for funding my travel to Brest for DebConf25.]

Comments (16 posted)

StarDict sends X11 clipboard to remote servers

By Daroc Alden
August 11, 2025

StarDict is a GPLv3-licensed cross-platform dictionary application. It includes dictionaries for a number of languages, and has a rich plugin ecosystem. It also has a glaring security problem: while running on X11, using Debian's default configuration, it will send a user's text selections over unencrypted HTTP to two remote servers.

On August 4, Vincent Lefevre reported the problem to the oss-security mailing list and to Debian's bug tracker. He identified it while testing his setup before the upcoming Debian 13 ("trixie") release. Installing StarDict will also install the stardict-plugin package by default, because the former recommends the latter. The plugins package contains a set of commonly used StarDict plugins, including a plugin for YouDao, a Chinese search engine that supplies Chinese-to-English translations. The plugin also contacts a second online Chinese dictionary, dict.cn.

This would normally not be much cause for concern; of course a dictionary program will include code to talk to dictionary-providing web sites. But one of StarDict's features, which is also enabled by default, is its "scan" functionality: it will watch the user's text selections (i.e. text highlighted with the mouse), and automatically provide translations as a pop-up. Taken together, the two features result in any selected text being sent to both servers. This only occurs while StarDict is open, but the application is designed to be left open in the background in case the user needs a quick reference while reading.
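
To get a sense of how little X11 does to prevent this, consider the following sketch (an approximation of the scan mechanism, not StarDict's actual code), which assumes the xclip utility is installed; any unprivileged X11 client can poll the PRIMARY selection and see whatever text the user highlights in any other application:

    # Any X11 client can read the PRIMARY selection, which holds the text
    # most recently highlighted in any application; StarDict's scan feature
    # does the equivalent and then sends the text to its dictionary backends.
    import subprocess
    import time

    last = None
    while True:
        result = subprocess.run(["xclip", "-o", "-selection", "primary"],
                                capture_output=True, text=True)
        if result.returncode == 0 and result.stdout and result.stdout != last:
            last = result.stdout
            print(f"selected text: {last!r}")
        time.sleep(1)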

StarDict on Wayland doesn't have this problem, because Wayland prevents applications from being able to capture text from other applications by default. That does mean that it breaks StarDict's scan feature, though.

Xiao Sheng Wen, the Debian package maintainer for StarDict, didn't see a problem with the behavior, noting that if a user doesn't want to use the scan functionality or the YouDao plugin, both can be disabled. Lefevre wasn't satisfied with that, saying:

But this is not the whole point. Features with privacy concerns should never be enabled by default (unless the feature is the only purpose of the package, and such a package would never be installed automatically — and even in such a case, there should be a big warning first).

In response, Xiao pointed out that the package description can be read by any user who chooses to install the software, and it does mention the scan feature. That said, I noted during my investigation that the description of stardict-plugin did not mention that the YouDao plugin uses an online service instead of an offline dictionary. Xiao suggested splitting the networked dictionary plugins into a separate package, but was "not sure whether it's very necessary to do so".

It is worth noting that the scan feature, while obviously a problem in this context, is one of the reasons that a user might choose to use StarDict over an alternative. Reading foreign-language media is often easier when words can be sought in a dictionary with as little fuss as possible. From that perspective, it makes sense that Xiao might not view the feature as problematic.

Any user who did read the description of the package, and who knew what the YouDao plugin would do, might nevertheless expect the resulting communication to at least be encrypted. But the plugin actually reaches out to its backend servers — dict.youdao.com and dict.cn — over unsecured HTTP. So, not only are these servers sent any text the user selects, but anyone who can view traffic anywhere along its path can see the same thing.

This is not even the first time that StarDict has sent user selections to the internet; the same kind of problem was reported by Pavel Machek in 2009 and again by "niekt0" in 2015. The 2009 bug was solved by patching the application's default configuration to disable networked dictionaries. That appears to have worked for a time, but the YouDao plugin, which was added in 2016, does not respect the configuration option. The 2015 problem was not fixed until August 6 of this year (although the package was removed from Debian for unrelated reasons for a few months from 2020 to 2021). That fix just removed the stardict_dictdotcn.so plugin, which also sent translation requests to dict.cn and was later subsumed by the YouDao plugin, from the package. In fairness to Xiao, he was not the StarDict maintainer in 2015 — that was Andrew Lee — but Xiao had known about the 2015 bug since at least 2021, even if he didn't consider it a priority.

According to Debian's package popularity contest statistics, only 178 people have StarDict installed, down from around a thousand between 2009 and 2015. That obviously doesn't capture people who have configured their Debian system not to participate in the statistics collection, but it does suggest there were a number of people who might have been broadcasting their text selections to the internet for several years. Given that people copy and paste passwords from their password managers, or select the text of sensitive emails and documents during the course of editing, that should be a significant cause for concern.

Debian is a large distribution, containing tens of thousands of packages. Moreover, because of its commitment to stability, a decent fraction of these are older software with delayed or sporadic updates. The reality is that Linus's law ("given enough eyeballs, all bugs are shallow") only holds up if people are looking — and if, once they have looked, and have reported things, the people who have taken up maintenance of the software actually agree that there is a problem.

Part of the justification for moving to Wayland over X11 is to make security vulnerabilities relating to one application spying on another more difficult to introduce. That obviously has to be balanced against the cost of adapting to a new way of doing things, but it's not hard to see why so many people are eager to make Wayland work. Maybe, in the future, StarDict's default behavior would have had little to no impact. Or maybe StarDict would have started asking for special permissions to let it work on Wayland, and users would have accepted those defaults the same way they currently do.

Either way, the existence of serious security problems that can be found, diagnosed, reported, and still remain unfixed is cause for concern. Linux has long enjoyed a reputation for security; maintaining that reputation depends on the developers, maintainers, and users of open-source software caring enough to fix security problems when they arise.

Comments (38 posted)

Treating Python's debugging woes

By Jake Edge
August 8, 2025

EuroPython

Debugging in Python is not like it is for some other languages, as there is no way to attach a debugger to a running program to try to diagnose its ills. Pablo Galindo Salgado noticed that when he started programming in Python ten years ago or so; it bugged him enough that he helped fill the hole. The results will be delivered in October with Python 3.14. At EuroPython 2025, he gave a characteristically fast-paced and humorous look at debugging and what will soon be possible for Python debugging—while comparing it all to medical diagnosis.

When he started with Python, he came from the compiled-language (C, C++, and Fortran) world, where you can attach a debugger like GDB to a running program. That would allow stopping the execution, poking around to see what the program is doing, then letting it continue to execute. Python has the pdb debugger, but when he asked around about why it could not attach to running programs like GDB does, people said "Python does not work like this". Ten years later, "now I am here to tell you 'yeah, it actually works like this'", he said with a laugh.

Medicine

He showed a picture of a magnetic-resonance-imaging (MRI) machine, noting that it was an amazing piece of equipment that can be used to look inside a person to see how well they are working. He is a physicist, so he needed to study how they work; they use magnetic fields, "which are just light". So they are technically using light to "get a precise map of what is wrong with you without actually even cutting you open". MRI machines produce enormous magnetic fields, much larger than Earth's; it is simply amazing that humans can "produce magnetic fields that are 60,000 times the magnetic field of a planet". We use them "to charge you a lot of money and try to find out what is wrong with you".

[Cutting the Stone]

Unfortunately, the Python debugging experience is not like an MRI machine, it is instead more like the Hieronymus Bosch painting Cutting the Stone (seen at left). The painting depicts a hapless patient having their skull opened up by a "doctor" in medieval times. Galindo Salgado renamed the painting: "Two senior engineers and a manager debugging a live application". The manager was easily spotted as the one with the book on their head "because he looks like he is helping, but he is not", he said with a laugh.

He asked his manager about using the joke; the manager admitted it was pretty funny. They looked into the background of the painting and found out that Bosch had actually intended that figure in a similar way; it is meant to represent the church, which is trying to help but doesn't know how. Jokes aside, the Python debugging experience is "kinda bad", Galindo Salgado said.

Debugging a complicated application may require restarting many different components after adding some debugging output and hoping that the problem happens again. It would be like killing a sick patient, resurrecting them on the exam table, and hoping they get the same sickness so it can be observed. "You laugh because it sounds stupid, because it is stupid." With Python 3.14, things will be much better, and not just for pdb; "this is going to open a new field of tools that can do very cool things with the Python interpreter".

Debugging

Attaching pdb to a running Python program is not possible, prior to Python 3.14, but one can attach a native debugger, like GDB, to the running interpreter. On Linux, ptrace() is used by GDB to attach to a process; macOS and Windows have similar facilities. The target process stops executing once the debugging process attaches to it, so the debugger can use other ptrace() calls to retrieve information (e.g. register contents) about the target. From the register values, the debugger can determine where the program is executing, what the values of local variables are, and so on.

[Pablo Galindo Salgado]

Beyond that, other ptrace() calls can be used to examine and modify the memory of the program. That includes the registers, so the debugger can, in principle, change the instruction pointer to execute a different function. At that point, the ptrace() continue opcode (PTRACE_CONT) can be used to cause the program to start executing again. "And the whole thing explodes."

It explodes for various reasons, but the basic problem is that the running interpreter is not prepared for that kind of manipulation. For example, malloc() has a lock, so if the interpreter was trying to allocate memory when it was interrupted, a subsequent call to malloc() will deadlock. In Python there are other locks ("actually we have a big one", he said with a grin) that are similarly affected. "Python is fundamentally unsynchronized with this mechanism", so GDB cannot be used to debug that way.

The Memray memory profiler that he works on does attach to running Python programs, but it has a long list of functions that it sets breakpoints for. When a breakpoint in, say, malloc(), is triggered, it may be safe to manipulate the program. "The list is like 80 functions long." The technique is fragile, since missing one or more functions may result in an explosion, which is particularly bad when working in production. "All of this is horrible, it's just disgusting." What is needed is for the interpreter to be able to tell the debugger "now it is safe to attach".

One could imagine a debugger that sends bytecode to the target program to debug at the Python level. But GDB and other native debuggers do not speak bytecode; "these two processes are basically speaking two different languages". One of them uses and recognizes the C language calls in the interpreter, while the other uses Python in the target program, so some other technique is required.

It turns out, Galindo Salgado said, that various profilers use the process_vm_readv() system call to access the memory of a target process. It allows reading memory, based on an address and length, without even stopping the target process. After learning about that call, he started thinking about the movie Inception, and wondered if you can put something into memory as well. He found the process_vm_writev() call, which allows just that; being able to write is "where the fun starts". With those calls (and their equivalents for other operating systems), an interface for safely debugging running Python programs can be developed.
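
Those system calls are easy to reach even from Python; the ctypes sketch below (Linux-only, and assuming the caller is allowed to trace the target, for instance as the same user with a permissive ptrace scope) reads a block of another process's memory the way a profiler might, and process_vm_writev() can be wrapped in the same fashion for writes:

    # Minimal ctypes sketch of process_vm_readv() -- not CPython's own code.
    # Reads `size` bytes at address `addr` in the process identified by `pid`.
    import ctypes
    import ctypes.util

    libc = ctypes.CDLL(ctypes.util.find_library("c"), use_errno=True)
    libc.process_vm_readv.restype = ctypes.c_ssize_t

    class iovec(ctypes.Structure):
        _fields_ = [("iov_base", ctypes.c_void_p), ("iov_len", ctypes.c_size_t)]

    def read_remote(pid, addr, size):
        buf = ctypes.create_string_buffer(size)
        local = iovec(ctypes.cast(buf, ctypes.c_void_p), size)
        remote = iovec(ctypes.c_void_p(addr), size)
        nread = libc.process_vm_readv(pid, ctypes.byref(local), 1,
                                      ctypes.byref(remote), 1, 0)
        if nread < 0:
            raise OSError(ctypes.get_errno(), "process_vm_readv failed")
        return buf.raw[:nread]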

PEP

He authored PEP 768 ("Safe external debugger interface for CPython") with two of his Bloomberg colleagues: Matt Wozniski and Ivona Stojanovic. It is a complicated PEP, Galindo Salgado said, because it covers security implications and "all sorts of different boring things, unless you are into security, in which case I'm very sorry for you".

The first step is for the debugger to call a new function in the sys module called remote_exec(); it takes a process ID and the file name of a script to be run by the target process. The target Python program needs to find a safe point when it can run the script, or else the result will be the explosions he mentioned earlier.

The interpreter main loop, which steps through the bytecode of the program as it executes, has an "eval breaker" that is checked periodically to handle certain events, such as signals. A ctrl-c is not immediately processed by CPython because the interpreter cannot just be interrupted anywhere; the CPython core developers also found out, painfully, that the garbage collector needs to be restricted to only running when the eval breaker is checked, he said. That makes for a safe place to run the script that gets passed to remote_exec().

Starting with 3.14, all CPython processes will have a new array to hold the script name that remote_exec() will run. The eval breaker will check a flag to see if it should execute the file and, if so, it will run the code. The trick is that the remote_exec() call, which is running in the debugging Python process, needs to be able to find the array in the memory of the target Python process so that the script name can be copied there.

The key to that is finding the PyRuntime structure "that contains all the information about the entire interpreter that is running"; it also includes information about any subinterpreters that are active. Since CPython 3.11, a symbol has been placed into the binary as an ELF section (_PyRuntime) that contains the offset of the structure from the start of the binary. He suggested using "readelf -h" on the binary to see it, but it is not present in the Python 3.13 binary on my Fedora 42 system.

That offset is not sufficient to find the structure, however, due to address-space-layout randomization (ASLR), which changes the address where the CPython interpreter gets loaded each time it is run. Figuring out that address is different for each platform (though Windows does not do ASLR for reasons unknown to him, he said); for Linux, the /proc/PID/maps file gives all of the information needed (see our report from a PyCon US talk that included information about processing the file from Python). From that, the address of the interpreter binary can be extracted; adding the offset of the structure results in the address of the target process's PyRuntime structure.
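
On Linux, recovering that randomized base address takes only a few lines; the sketch below (assuming the target was started from a binary with "python" in its path) follows the approach described above, and the _PyRuntime offset read from the ELF file can then be added to the returned address:

    # Sketch: find the load address of the target's python executable, despite
    # ASLR, by scanning /proc/PID/maps; real tools also check the file-offset
    # column rather than simply taking the first matching mapping.
    def python_base_address(pid):
        with open(f"/proc/{pid}/maps") as maps:
            for line in maps:
                fields = line.split()
                # e.g. "55d1c2a00000-55d1c2a2b000 r--p 00000000 fd:01 131 /usr/bin/python3.14"
                if len(fields) >= 6 and "python" in fields[-1]:
                    return int(fields[0].split("-")[0], 16), fields[-1]
        raise RuntimeError(f"no python mapping found for PID {pid}")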

At that point, process_vm_readv() can be used to look at the structure; the array for the script name is not contained there, but there is a _Py_DebugOffsets structure that contains information that will allow the debugging Python process to correctly access objects in the memory of the target program. A new _debugger_support structure has been added to _Py_DebugOffsets with an offset to the actual array where the script name can be copied; it is an offset from a thread-specific structure because each thread can be debugged separately. He quickly went through the path needed to find the interpreter and thread state; from there, the place to write the script name can be found.

He showed some code from the interpreter for the eval breaker, with the code for the new remote-debugger script added. Once the script name has been written to the proper location, a flag is set in the structure, which is checked by the eval breaker. If it is set, the audit system is consulted (PySys_Audit()) and, if execution is allowed, PyRun_AnyFile() is called on the open file. It took a lot to get all of that working, but it has been done at this point, so users can simply run:

    sys.remote_exec(1234, '/tmp/script.py')

He gave a quick demo of running a Python program in one window and, in another, doing:

    $ python -m pdb -p PID

That stopped the other program and gave him a "(Pdb) " prompt, where he could get a Python stack backtrace with "bt", step through with the next command ("n"), and so on. "Awesome, it only took a year of work", he said with a laugh, to applause for the demo.

He noted that Mark Shannon likes to call the feature "remote execution as a service", so there are ways to disable it. "There are a bunch of increasingly nuclear options to deactivate this", Galindo Salgado said. Though he wanted to call it PYTHON_NO_FUN_MODE, the PYTHON_DISABLE_REMOTE_DEBUG environment variable can be set to a value in order to disable the feature for any Python started in that environment. A more targeted approach is to start Python with the "-X disable-remote-debug" flag. Sites that do not want the feature available at all can configure the Python build using the "--without-remote-debug" option. He joked that it was a boring option; "people will not invite you to parties and things like that".

Future

The remote_exec() call is meant to provide a building block for debuggers and profilers in the future. While it could be used for some kind of interprocess-communication (IPC) or remote-procedure-call (RPC) mechanism, he warned against using it that way. Beyond debuggers and profilers, though, he wanted to show some examples of "the tiny tools that you can do". Something he has learned as a core developer is that building blocks are a great way to add features, because "people are kind of weird" and will find interesting and unexpected ways to use them. For example, there are various uses of remote_exec() for introspection tasks in the standard library for Python 3.14.

If a web application is having a problem, but the logging level is not showing enough to diagnose it, for example, remote_exec() can be used to change the logging level while it is running—and change it back once the problem is found. The application could have a diagnostic report of some sort that can be triggered via the remote-debugging interface. He showed a web server that dumped information about its active connections.
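
As a concrete (and purely hypothetical) version of that logging example, the injected script runs inside the target interpreter and therefore sees the application's own loggers; running sys.remote_exec(pid, "/tmp/raise_log_level.py") from another Python process would be enough to flip the level:

    # /tmp/raise_log_level.py -- hypothetical script for sys.remote_exec().
    # It executes inside the target process, so logging.getLogger() returns
    # the application's own root logger, not the debugger's.
    import logging

    logging.getLogger().setLevel(logging.DEBUG)
    logging.getLogger("remote-debug").warning("log level temporarily raised to DEBUG")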

Another application might be for memory-allocation debugging. Memory profilers exist, he said, but they normally need to observe allocations as they happen; if there is a program that is already using too much memory, "bad luck, because the profiler has not seen anything". But with the remote debugging, the garbage collector can be queried for all of the live objects in the program. He showed sample reports of the object types with the most allocated objects and of the objects with a size larger than 10MB. Trying to interpret that kind of data with a native debugger like GDB is effectively impossible, he said.
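
A hypothetical script along the following lines, injected with remote_exec(), is enough to produce that kind of report from a process that was never instrumented; it writes to a file because standard output belongs to the target program:

    # /tmp/live_objects.py -- hypothetical introspection script for sys.remote_exec().
    # Count the live objects the garbage collector tracks, grouped by type.
    import gc
    from collections import Counter

    counts = Counter(type(obj).__name__ for obj in gc.get_objects())
    with open("/tmp/live_objects.txt", "w") as report:
        for type_name, count in counts.most_common(25):
            report.write(f"{count:>10}  {type_name}\n")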

Another example showed all of the modules that were loaded by the program. It could be extended to show the module versions as well. For a company that runs lots of Python programs, a script could poll all of the programs to gather the full picture of module use throughout production. And so on.
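
The module report is the simplest of these; a sketch like the following (again hypothetical, and again writing to a file) lists every imported module along with a version, when the module advertises one:

    # /tmp/loaded_modules.py -- hypothetical script for sys.remote_exec().
    # Dump the names of all imported modules, plus __version__ where present.
    import sys

    with open("/tmp/loaded_modules.txt", "w") as report:
        for name, module in sorted(sys.modules.items()):
            version = getattr(module, "__version__", "")
            report.write(f"{name} {version}".rstrip() + "\n")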

In conclusion

He likes to think of the feature as "the Python MRI moment" because "it is the technology that unblocks inspection". As with an MRI machine, the feature does not actually diagnose anything—a doctor or programmer is needed to interpret the output. Unlike an MRI, remote_exec() allows changing things within the patient, which "maybe is not a great idea, but maybe you know what you are doing".

It will allow other Python debuggers, such as the one in Visual Studio Code (VSCode), to drop "their megacode that injects crazy things" into Python in favor of using remote_exec(). He is excited to see that, but is also interested in what kinds of tools users come up with. Python is the most popular programming language right now and has been cutting into heads for too long; it is time to use the MRI machine instead, he concluded.

The first question was about attaching multiple times to the same program, which Galindo Salgado said can be done, but sequentially. He noted that the script is running in the context of the program, so it can import modules if needed, but only if they are already installed. If they are not installed, "you could shell out to pip maybe ... whoops", he said with a worried laugh.

Another question was about forward compatibility: would a Python 3.14 program be able to call remote_exec() for a target program running on 3.15? Is the method for finding PyRuntime and the other pieces of the interpreter state version-specific? Galindo Salgado said that the protocol is "technically forward-compatible", but currently the same major version of Python needs to be used on both sides. That may change eventually, but there are a lot of installed Pythons out there that will not work, so the idea is to avoid user confusion. The protocol itself could be used to implement a debugger in Rust, if desired; all of the information needed to interpret the Python objects is provided in _Py_DebugOffsets.

The subinterpreter support in the new feature was the next question. There is no real support for subinterpreters yet, Galindo Salgado said. The remote_exec() in the standard library selects the main subinterpreter currently, but the protocol makes it possible for other implementations to choose different subinterpreters. The protocol allows stepping through the list of interpreters to pick the one to target; that may eventually be added for remote_exec() as well.

[I would like to thank the Linux Foundation, LWN's travel sponsor, for travel assistance to Prague for EuroPython.]

Comments (5 posted)

On the use of LLM assistants for kernel development

By Jonathan Corbet
August 7, 2025
By some appearances, at least, the kernel community has been relatively insulated from the onslaught of AI-driven software-development tools. There has not been a flood of vibe-coded memory-management patches — yet. But kernel development is, in the end, software development, and these tools threaten to change many aspects of how software development is done. In a world where companies are actively pushing their developers to use these tools, it is not surprising that the topic is increasingly prominent in kernel circles as well. There are currently a number of ongoing discussions about how tools based on large language models (LLMs) fit into the kernel-development community.

Arguably, the current round of debate began with this article on a presentation by Sasha Levin at the Open Source Summit North America in June; his use of an LLM to generate a kernel patch came as a surprise to some developers, including the maintainer who accepted that patch. Since then, David Alan Gilbert has posted a patch proposing requirements for the disclosure of LLM use in kernel development. Levin has posted a series of his own focused on providing configurations for coding assistants and guidelines for their use. Both of these submissions have provoked discussions ranging beyond their relatively narrow objectives.

Gilbert suggested the use of a new patch tag, Generated-by, to identify a tool that was used to create a kernel patch; that tag would be expected not just for LLM-generated patches, but also patches from long-accepted tools like Coccinelle. Levin, instead, suggests using the existing Co-developed-by tag, but takes pains to point out that an LLM should not add the Signed-off-by tag that normally is required alongside Co-developed-by. Either way, the suggestion is the addition of information to the tags section of any patch that was generated by an LLM-based tool.

A step back

While much of the discussion jumped directly into the details of these patches, some developers clearly feel that there is a more fundamental question to answer first: does the kernel community want to accept LLM-developed patches at all? Vlastimil Babka responded that Levin's patch set was "premature", and that there was a need to set the rules for humans to follow before trying to properly configure LLMs:

So without such policy first, I fear just merging this alone would send the message that the kernel is now officially accepting contributions done with coding assistants, and those assistants will do the right things based on these configuration files, and the developers using the assistants don't need to concern themselves with anything more, as it's all covered by the configuration.

Lorenzo Stoakes said that "an official kernel AI policy document" is needed first, and suggested that it would be best discussed at the Maintainers Summit (to be held in December). He agreed with Babka that merging the patches in the absence of such a policy would be equivalent to a public statement that LLM-generated patches are welcome in the kernel community.

A number of developers expressed concerns that these tools will be used to generate patches that are not understood by their submitters and which may contain more than the usual number of subtle bugs. David Hildenbrand worried that he would end up dealing with contributors who simply submit his questions to the tool that generated the patch in the first place, since they are unable to explain the code on their own. He also pointed out the policy adopted by the QEMU project, which essentially bans LLM-generated contributions in that project. Al Viro described LLM-based tools as "a force multiplier" for the numerous developers who have, for years, been submitting machine-generated patches that they don't understand.

Mark Brown, instead, suggested that these tools will be used regardless of the kernel policy:

I'm also concerned about submitters just silently using this stuff anyway regardless of what we say, from that point of view there's something to be said for encouraging people to be open and honest about it so it can be taken into consideration when looking at the changes that get sent.

Levin's point of view is that the current policy for the kernel is that "we accept agent generated contributions without any requirements beyond what applies to regular humans"; his objective is to work out what those extra requirements should be. It should also be noted that some developers clearly feel that these tools are helpful; Kees Cook, for example, argued against any sort of ban, saying it would be "not useful, realistic, nor enforceable". Elsewhere, he has commented that "the tools are finally getting interesting".

Disclosure

If the kernel project were to ban LLM-generated code, then the rest of the discussion would be moot, but that would appear to be an unlikely outcome. If one assumes that there will be (more) LLM-generated code entering the kernel, a number of questions come up, starting with disclosure of tool use. Both Gilbert and Levin propose the addition of patch tags to document this use. A couple of developers disagreed with that idea, though; Konstantin Ryabitsev said that this information belongs in the cover letter of a patch series, rather than in the tags. That is how code generated by tools is described now, and he did not see a reason to change that practice. Jakub Kicinski argued that the information about tools was "only relevant during the review", so putting it into patch changelogs at all "is just free advertising" for the tools in question.

The consensus view, though, would appear to be in favor of including tool information in the patch itself. Cook, who initially favored keeping tool information out of the tags, later acknowledged that it would be useful should the need arise to track down all of the patches created by a specific tool. Steve Rostedt said that this information could be useful to find patterns of bugs introduced by a specific tool. Laurent Pinchart noted that formalized patch tags would be useful for tracking down any copyright-related problems as well. Gilbert commented that disclosure "lets the people who worry keep track of what our mechanical overlords are doing".

If one takes the position that tool use must be disclosed, the next question is inevitably: where should the line be drawn? Levin asked whether the use of a code-completion tool requires disclosure, for example. Others have mentioned using compiler diagnostics to find problems or the use of language-sensitive editors. There is clearly a point where requiring disclosure makes no sense, but there does not, yet, appear to be a consensus on where that point is. One possible rule might be this one suggested by Rostedt: "if AI creates any algorithm for you then it must be disclosed".

Meanwhile, Levin's first attempt to disclose LLM usage with a Co-developed-by tag drew an amused response from Andrew Morton, who seemingly had not been following this conversation. Hildenbrand responded that a new tag, such as Assisted-by, would be more appropriate; Ryabitsev has also made that suggestion.

Copyright and responsibility

The copyright status of LLM-generated code is of concern to many developers; if LLM-generated code ends up being subject to somebody's copyright claim, accepting it into the kernel could set the project up for a future SCO-lawsuit scenario. This, of course, is an issue that goes far beyond the kernel community and will likely take years of court battles worldwide to work out. Meanwhile, though, maintainers will be asked to accept LLM-generated patches, and will have to make decisions long before the legal processes have run their course.

Levin pointed to the generative-AI guidance from the Linux Foundation, saying that it is the policy that the kernel community is implicitly following now. In short, this guidance suggests that developers should ensure that the tool itself does not place restrictions on the code it generates, and that said code does not incorporate any pre-existing, copyrighted material. Levin suggested using this document as a starting point for judging the copyright status of submissions, but that guidance is only so helpful.

Michal Hocko asked how maintainers can be expected to know whether the conditions suggested in that "quite vague" guidance have been met. Levin's answer reflects a theme that came up a few times in the discussion: that is what the Signed-off-by tag applied by the patch submitter is for. By applying that tag, the submitter is indicating that the patch is a legitimate contribution to the kernel. As with any other patch, a contributor needs to be sure they are on solid ground before adding that tag.

That reasoning extends beyond just copyright status to responsibility for the patch at all levels. Rostedt suggested documenting that a signoff is also an indication that the submitter understands the code and can fix problems with it. Viro said that, for any patch regardless of origin, "there must be somebody able to handle active questioning" about it. Levin added that "AI doesn't send patches on its own - humans do", so it is the human behind the patch who will ultimately be responsible for its contents.

The reasoning makes some sense, but may not be entirely comforting to nervous maintainers. The people submitting LLM-generated patches are not likely to be in a better position to judge the copyright status of that work than maintainers are. Meanwhile, maintainers have had to deal with patches from contributors who clearly do not understand what they are doing for many years; documenting that those contributors must understand the output from coding tools seems unlikely to slow down that flood. Hildenbrand expressed his concern this way: "We cannot keep complaining about maintainer overload and, at the same time, encourage people to bombard us with even more of that stuff". Based on what has been seen in other areas, it would not be surprising to see an order-of-magnitude increase in the flow of low-quality patches; indeed, Greg Kroah-Hartman said that it is already happening.

More discussion

The end result is that the question of how to incorporate LLM-based development tools into the kernel project's workflow is likely to feature prominently in community discussions for some time. While these tools may bring benefits, including finding patterns that are difficult for humans to see and the patient generation of test code, they also have the potential to bring copyright problems, bugs, and added maintainer stress. The pressure to use these tools is not going away, and even the eventual popping of the current AI bubble seems unlikely to change that.

Within a few milliseconds of the posting of the call for topics for the 2025 Maintainers Summit, there were two separate proposals (from Stoakes and Jiri Kosina) on the issue of AI-based tools in the kernel workflow; they have sparked discussions that will surely have progressed significantly by the time this article is published. One does not, it seems, need an LLM to generate vast amounts of text. This conversation is, in other words, just beginning.

Comments (45 posted)

The rest of the 6.17 merge window

By Jonathan Corbet
August 11, 2025
The 6.17-rc1 prepatch was released by Linus Torvalds on August 10; the 6.17 merge window is now closed. There were 11,404 non-merge changesets pulled into the mainline this time around, a little over 7,000 of which came in after the first-half merge-window summary was written. As one would expect, quite a few changes and new features were included in that work.

Some of the most significant changes pulled into the mainline during the second half of the 6.17 merge window are:

Architecture-specific

  • Support for BPF has been improved for the LoongArch architecture, which can now handle dynamic code modification, BPF trampolines, and struct ops programs.
  • S390 systems now support the swapping and migration of transparent huge pages.

Core kernel

  • The BPF subsystem now exports a set of standard (but read-only) string operations for BPF programs; there is no documentation, but the functions can be found in this commit. A rough sketch of how one of these functions might be called appears after this list.
  • BPF programs now have standard output and error streams that can be used to communicate back to user space; see this commit for details.
  • The new DAMON_STAT kernel module provides simplified monitoring of memory-management activity in the system; see this changelog and Documentation/admin-guide/mm/damon/stat.rst for more information.
  • It is now possible to control the aggressiveness of the proactive-reclaim machinery on a per-NUMA-node basis, allowing some nodes to more actively evict pages than others. See this changelog for details.
  • The extensible scheduler class now has support for control-group-based bandwidth control (specifically the cpu.max parameter described in this document).
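
As mentioned above, here is a rough, untested sketch of how a BPF program might call one of the new string kfuncs; the extern prototype is an assumption based only on the function's name, since the authoritative declaration comes from the kernel itself (or a vmlinux.h generated from its BTF):

    // SPDX-License-Identifier: GPL-2.0
    /* Hypothetical example: compare the current task's comm against a
     * constant string using the (assumed) bpf_strcmp() kfunc. */
    #include "vmlinux.h"
    #include <bpf/bpf_helpers.h>

    /* Assumed prototype for one of the new 6.17 string kfuncs. */
    extern int bpf_strcmp(const char *s1, const char *s2) __weak __ksym;

    char LICENSE[] SEC("license") = "GPL";

    SEC("tracepoint/syscalls/sys_enter_execve")
    int check_comm(void *ctx)
    {
            char comm[16];

            bpf_get_current_comm(comm, sizeof(comm));
            if (bpf_strcmp(comm, "bash") == 0)
                    bpf_printk("execve() called from a bash process");
            return 0;
    }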

Hardware support

  • Clock: Renesas RZ/T2H clocks, SpacemiT reset controllers, Qualcomm SM6350 video clock controllers, Qualcomm SC8180X camera clock controllers, multiple Qualcomm QCS615 clock controllers, and multiple Qualcomm Milos clock controllers.
  • GPIO and pin control: ESWIN EIC7700 pin-control units, Qualcomm Milos pin controllers, STMicroelectronics STM32 hardware debug port pin controllers, and MediaTek MT8189 pin controllers.
  • Graphics: Renesas R69328 720x1280 DSI video mode panels, Himax HX83112B-based DSI panels, and Intel Discrete Graphics non-volatile memory.
  • Miscellaneous: Qualcomm M31 eUSB2 PHYs, Sophgo CV1800/SG2000 series SoC DMA multiplexers, Sophgo DesignWare PCIe controllers (host mode), Renesas I3C controllers, Broadcom BCM74110 mailboxes, and Aspeed AST2700 mailboxes.
  • Networking: Qualcomm IPQ5018 Internal PHYs, Airoha AN7583 MDIO bus controllers, Broadcom 50/100/200/400/800 gigabit Ethernet cards, Microchip Azurite DPLL/PTP/SyncE devices, and Realtek 8851BU and 8852BU USB wireless network (Wi-Fi 6) adapters.

Miscellaneous

  • The runtime verification subsystem has gained support for linear temporal logic monitors; this commit provides documentation. There is a new monitor, rtapp, that looks for common problems in realtime applications; see this commit for documentation. Finally, the new nrp, sssw, and opid monitors are available for internal scheduler testing.
  • The automatic mounting of the tracefs virtual filesystem on /sys/kernel/debug/tracing has been deprecated; scripts should be using /sys/kernel/tracing instead. The current plan is to remove that automatic mount in 2030.
  • See this merge message for a summary of the numerous changes to the perf tool in 6.17.
  • There is a new option to reserve space for kernel crash dumps from the contiguous memory allocator, making that memory available for use by the kernel prior to a crash. See this documentation patch for details.

Networking

  • Support for RFC 6675 loss detection has been removed from the kernel. This algorithm has long been considered obsolete and has not been used by default since 2018. At this point, it seems that everybody is using RACK-TLP instead.
  • The power sourcing equipment (PSE) implementation has gained support for configurable budget-evaluation strategies, which are "utilized by PSE controllers to determine which ports to turn off first in scenarios such as power budget exceedance". See this changelog for some more information.
  • Support for gateway routing has been added to the Management Component Transport Protocol (MCTP) subsystem; see this merge message for an overview.
  • The new SO_INQ option for AF_UNIX sockets mirrors TCP_INQ; it will cause a control message to be placed on the socket indicating how much data is available to be read there. Similarly, SIOCINQ has been added for the VSOCK address family.
  • The TCP implementation has traditionally been forgiving about accepting data beyond the advertised receive window; that comes to an end in 6.17, which enforces the window limit more strictly.
  • Multipath TCP now supports the TCP_MAXSEG socket option to set the maximum size of outgoing segments.
  • Support for the DualPI2 (RFC 9332) active-queue-management algorithm has been added; see this commit for an overview of the Linux implementation.
  • The new force_forwarding sysctl knob allows the administrator to enable forwarding on specific IPv6 interfaces.

Security-related

  • The AppArmor security module has gained the ability to control access to AF_UNIX sockets. See this commit changelog for an overview of how it works.

Virtualization and containers

Internal kernel changes

  • Memory managed by the networking subsystem's page pool is now referred to using struct netmem_desc rather than struct page. This work is part of the ongoing process of moving from struct page to descriptors specific to the way each folio is being used.
  • The deferred unwind infrastructure — a needed precursor for the work to add SFrame-based stack unwinding — has been merged. The SFrame work itself will seemingly wait for another development cycle.
  • Rust support has been added for the warn_on!() macro, delayed workqueue items, a UserPtr type for user-space pointers, and more; see this merge message for a list.
  • The gconfig kernel-configuration tool has been migrated to the GTK 3 toolkit.
  • There were 171 exported symbols removed, and 523 added this time around; see this page for the full list. Developers also removed seven kfuncs (scx_bpf_consume(), scx_bpf_dispatch(), scx_bpf_dispatch_from_dsq(), scx_bpf_dispatch_from_dsq_set_slice(), scx_bpf_dispatch_from_dsq_set_vtime(), scx_bpf_dispatch_vtime(), and scx_bpf_dispatch_vtime_from_dsq()) and added 15 others (including bpf_arena_reserve_pages(), bpf_cgroup_read_xattr(), bpf_strchr(), bpf_strchrnul(), bpf_strcmp(), bpf_strcspn(), bpf_stream_vprintk(), bpf_strlen(), bpf_strnchr(), bpf_strnlen(), bpf_strnstr(), bpf_strrchr(), bpf_strspn(), and bpf_strstr()).

Notably absent from the 6.17 merge window was any action on the bcachefs pull request. In the contentious conversation that has followed, bcachefs developer Kent Overstreet said: "I just got an email from Linus saying 'we're now talking about git rm -rf in 6.18'", but no further information is available publicly.

The 6.17 kernel now goes into the stabilization phase, with the most likely date for the final release being September 28.

Comments (none posted)

Possible paths for signing BPF programs

By Daroc Alden
August 12, 2025

BPF programs are loaded directly into the kernel. Even though the verifier protects the kernel from certain kinds of misbehavior in BPF programs, some people are still justifiably concerned about adding unsigned code to their kernel. A fully correct BPF program can still be used to expose sensitive data, for example. To remedy this, Blaise Boscaccy and KP Singh have both shared patch sets that add ways to verify cryptographic signatures of BPF programs, allowing users to configure their kernels to load only pre-approved BPF programs. This work follows on from the discussion at the Linux Storage, Filesystem, Memory-Management, and BPF Summit (LSFMM+BPF) in April and Boscaccy's earlier proposal of a Linux Security Module (LSM) to accomplish the same goal. There are still some fundamental disagreements over the best approach to signing BPF programs, however.

The kernel can already check signatures on loadable kernel modules; what makes BPF programs more difficult? The main culprit is "compile once — run everywhere" (CO-RE) relocations. BPF programs need to access internal kernel data structures, but those data structures can be different between kernel versions, architectures, or configurations. In the same way that normal ELF relocations modify a program to account for the run-time memory layout of its libraries, CO-RE relocations modify a BPF program to account for differences between kernel versions.
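
To make that concrete, here is a minimal, illustrative sketch (not taken from either patch set) of a CO-RE-relocatable program written against libbpf's conventions; it assumes a vmlinux.h generated from the running kernel's BTF. The BPF_CORE_READ() macro does not hard-code the offset of the pid field; it records a relocation that is resolved at load time, so the instructions that reach the verifier are not the same bytes that sat on disk:

    // SPDX-License-Identifier: GPL-2.0
    /* Illustrative only: the offset of task_struct->pid is resolved against
     * the running kernel's BTF when the program is loaded, not at build time. */
    #include "vmlinux.h"
    #include <bpf/bpf_helpers.h>
    #include <bpf/bpf_core_read.h>

    char LICENSE[] SEC("license") = "GPL";

    SEC("tp_btf/sched_process_exec")
    int handle_exec(u64 *ctx)
    {
            struct task_struct *task = bpf_get_current_task_btf();
            pid_t pid = BPF_CORE_READ(task, pid);   /* CO-RE relocation */

            bpf_printk("exec in pid %d", pid);
            return 0;
    }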

Not all BPF programs require CO-RE relocations; for those that do, the relocations can be performed by the kernel on request from user space or by a separate BPF loader program called a "light skeleton" (a common approach that can also set up the needed BPF maps for a program). CO-RE relocations used to be performed entirely in user space, but a 2021 patch set from Alexei Starovoitov added support into the kernel as an explicit prerequisite to BPF signing. A customized light skeleton is generated for a particular BPF program at build time, and can therefore perform program-specific setup. This means that the final version of the BPF program that is seen by the verifier is not the same as the version that sits on disk, which makes verifying a signature on the program challenging. A related problem is a BPF program's associated maps; several kinds of BPF maps can affect the behavior of a running program, so they essentially need to be treated as part of the payload to be signed.

Boscaccy's previous attempt at the problem was an LSM that he presented at LSFMM+BPF. In follow-up discussion on the mailing list, several BPF developers disagreed with that approach. Starovoitov summarized their objections, saying that the proposed mechanism was insecure, and flew in the face of the planning that the BPF community has been doing to enable signed BPF programs. LSM maintainer Paul Moore asked for more details of how exactly the BPF maintainers wanted it to work, saying that he was sure Boscaccy would take that feedback into account. Singh provided an explanation of the design Starovoitov had referenced.

In response, Boscaccy put together a new patch set that integrates BPF signature verification into the loading process rather than handling it with an LSM. The patch set supports two different ways to sign a BPF program: signing the program and its maps together, or signing just the light skeleton, and letting it check the signature on the actual BPF program. The latter mode of operation was "suggested or mandated" during the previous discussion by Singh (a maintainer of BPF security code in the kernel), Boscaccy said.

Singh prefers that approach because it imposes fewer hard requirements on future designs for BPF loaders. He was also somewhat dismissive of Boscaccy's interactions with other contributors, saying that they "leave a lot to be desired". Verifying the signature on the light skeleton, which itself checks an embedded hash against the BPF program being loaded, is in theory just as secure as verifying the whole thing. But practically, Boscaccy said, it brings the code for the loader into the set of software that needs to be trusted for the whole scheme to be secure. "Your proposal increases the attack surface with no observable benefit."

Usually light skeletons are generated when the BPF program is built, using bpftool or libbpf, which are developed as part of the kernel project. Therefore, it might seem that trusting the code in a light skeleton is no different than trusting the code in the kernel. As Singh points out, users already have to trust their build environments in order to sign the binaries that come out of them. In his view, the loader-based checking should be the only option — and is what the BPF developers have been planning on and working toward. That hasn't satisfied Boscaccy, however, who feels that it's important to be able to handle the entire verification within the kernel itself.

There is also the issue of how program verification interacts with LSMs; if the kernel is responsible for all verification, it can provide information to an LSM on whether a verification succeeded or failed. If the loader program is responsible for the verification, it is harder to pass that information to the LSM, since it is not available when the bpf() system call that loads the light skeleton is made.

Singh wasn't happy with Boscaccy's patch set, however, and ended up submitting his own, which only supports the loader-based approach. He did note that anyone who did not wish to rely on light skeletons could potentially write their own loader program, but thought that some means of supporting signing via loaders was absolutely necessary for BPF, and therefore the only method needed:

Given that many use-cases (e.g. Cilium) generate trusted BPF programs, trusted loaders are an inevitability and a requirement for signing support, [...] entrusting loader programs will be a fundamental requirement for [a] security policy.

James Bottomley said that the difference in size and complexity between the two approaches was quite small, and encouraged Singh to add support for Boscaccy's preferred option. Singh still didn't agree it was necessary or desirable to do so, and the second version of his patch set also doesn't include it. Boscaccy indicated that he intends to submit some follow-up work (presumably adding his preferred approach) if Singh's patch set is accepted.

Another feature of Singh's patch set is the introduction of exclusive maps. This provides a mechanism for users to indicate that certain BPF maps should exclusively be shared with the BPF program that matches a certain hash. This allows access to sensitive data to be restricted to the program that is intended to access it.

Neither patch set has been accepted at the time of writing, but it seems likely that Singh's version will be. Luckily, both approaches can support signatures for simple BPF programs that do not require CO-RE relocations (such as light skeletons), so signing those programs should become an option in the not-too-distant future. Support for more complicated programs is likely to follow, one way or another.

Comments (7 posted)

Page editor: Jonathan Corbet

Inside this week's LWN.net Weekly Edition

  • Briefs: CalyxOS; ACME on NGINX; Debian 13; LVFS sustainability; Go 1.25; Radicle 1.3.0; Rust 1.89; Syncthing 2.0; Quotes; ...
  • Announcements: Newsletters, conferences, security updates, patches, and more.
Next page: Brief items>>

Copyright © 2025, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds