Leading items
Welcome to the LWN.net Weekly Edition for March 23, 2023
This edition contains the following feature content:
- Jumping the licensing shark: Bradley Kuhn explores where copyleft licensing went wrong and what might be done to fix it.
- Hopes and promises for open-source voice assistants: voice control need not involve centralized, proprietary systems.
- Zero-copy I/O for ublk, three different ways: the community debates how to best support zero-copy I/O for user-space block drivers.
- Generic iterators for BPF: another way to support loops in BPF programs.
- Reducing direct-map fragmentation with __GFP_UNMAPPED: the quest to keep the kernel's direct memory map from being fragmented continues.
This week's edition also includes these inner pages:
- Brief items: Brief news items from throughout the community.
- Announcements: Newsletters, conferences, security updates, patches, and more.
Please enjoy this week's edition, and, as always, thank you for supporting LWN.net.
Jumping the licensing shark
The concept of copyleft is compelling in a lot of ways, at least for those who want to promote software freedom in the world. Bradley Kuhn is certainly one of those people and has long been working on various aspects of copyleft licensing and compliance, along with software freedom. He came to Everything Open 2023 to talk about copyleft, some of its history—and flaws—and to look toward the future of copyleft.
Kuhn began by saying that he spends much of his time these days thinking about the enforcement of GPLv2 and LGPLv2.1; "it turns out that those are the most widely used copyleft licenses in the world", thus they are the most frequently violated. It is sometimes painful to be looking at license text written in 1991 and 1993 as we move through 2023, but that is what he has to do. Outside of work, though, he has time to think about what sort of copyleft license he would draft if he were to do so. He was just out of high school when GPLv2 was released, so he did not participate in that process at all.
He turns 0x32 this year and, as an elder statesperson of FOSS, he has a duty to help figure out "where, exactly, we went wrong in how we got to where we are with copyleft", he said. In his view, copyleft exists to "achieve one very important principle"; it is working toward a world where everyone "has a fundamental right to examine, change, and install modified versions of their software onto the devices that they have in front of them".
Many of those in the audience who have Linux-based devices will realize that they cannot do that with many of those devices, which is a violation of GPLv2, he said. That is something that he and his colleagues at Software Freedom Conservancy (SFC) have been working on, but that is not what he wanted to focus on in the talk. Instead, he wanted to look at how we arrived at the current state of copyleft and if there is anything that can be done to improve the situation going forward.
History
When free and open-source software got its start, there were basically just three licenses that were in widespread use: BSD, X11, and the GPL. While the BSD license appeared a bit earlier, at this point we can say they all emerged at roughly the same time, certainly in "the same era of computing", he said. After its proto-versions (such as the Emacs public license), the GPL became the standard for copyleft releases, while the X11 license—essentially the same as the MIT license—generally was the choice for permissively released code. Until late in the 1990s, the (4-clause) BSD license had the "problematic advertising clause", which reduced its applicability.
Copyleft is primarily a strategy, Kuhn said, "it is not a principle unto itself", but that is often misunderstood by its fans. Copyleft is a tool to be used to reach the goal he described earlier about users having the source and the ability to modify and install it on their devices. "That is a principle." Since it is simply a tool, we should not treat any copyleft license or even the idea of copyleft as sacrosanct; if it is failing to get us to that goal, we should reconsider it.
The BSD advertising clause was well-known at the time that copyleft was being created, but nothing like that was added to the early versions of the GPL—though it does start to appear later. It is an "ego clause" in his view—a way for developers to get credit for their "amazing" work—which is rather comical to those who have been in the industry for long enough. "No one has ever written any good software ever in the history of software and it is a constant process to make it better." It makes no real sense to be "so arrogantly proud of our software", he said.
Because copyleft was focused on the principle behind it, at least in its early days, it did not incorporate anything like the advertising clause. Copyleft was aimed at helping the entire public, not simply the developers behind the software who might want credit. "It was designed to help people who use software."
Proliferation
It turns out that it is difficult to write good open-source and free-software licenses, but that has not stopped people from trying. In the early days of open source, everyone wanted to write their own license, he said. There were "dozens of organization-specific licenses", all of which were approved by the Open Source Initiative (OSI); sadly some of those licenses are still in use. He noted that the Postfix mail-transfer agent, which is the one he uses, is released under the IBM Public License. He does not think there is anyone at IBM who still remembers why it was drafted—or how—and what it had that some other already-existing license was missing.
Around 2004 or 2005, "almost everyone, in complete agreement, decided that there were too many open-source licenses". The only people opposed to the license non-proliferation efforts that came about were "basically a small group of lawyers". There was an opportunity to make a name for yourself as an open-source lawyer by creating a license and getting it approved by OSI.
Being a lawyer is a fairly boring job, Kuhn said; "open-source licensing was the most exciting thing to happen in the field of software licensing in history". So lawyers got excited by this prospect and created many different licenses—of highly variable quality. Some of those licenses still appear on the OSI list, though others have been deprecated at this point.
He has spent a lot of time worrying about the problem of approved, but low-quality, licenses, many of them copyleft or quasi-copyleft, over his career. As recently as a few months ago, he was complaining on the OSI mailing list about a proposed open-source license that was tied to the laws of Germany, though the proposer has not been forthcoming about why such a license is actually needed. While he still will jump in to try to prevent proliferation from time to time, these days he sees the problem as more of an annoyance than a disaster.
Copyleft abuse
One of the interesting things he has seen with all of the license proliferation is the idea of "badgeware licenses". These are licenses that require something like the BSD advertising clause, but the dangerous thing is that the advertising requirement is linked to the copyleft provisions of the license. "If you don't do my badgeware the way I want you to do it, you lose your right to distribute the software."
There are licenses that purport to be copyleft licenses that have a badgeware clause. So, if you remove the logo of the company who released the software, you lose the right to distribute it—or perhaps even to use it as a web service of some kind. If this practice is confined to that particular license (or a small set of them), perhaps it is not that big of a problem, he said. The bigger problem is that companies are finding ways to use the established copyleft licenses in abusive ways.
Kuhn said that he has spoken at numerous editions of linux.conf.au (LCA), which is the predecessor to Everything Open, on the "danger and horror of proprietary relicensing"; he quickly summarized the problem in his talk. The idea is that a company runs a project where it requires all contributors to grant it the copyright or to sign a contributor license agreement (CLA) so that the company can offer the code under any license that it chooses. The code is then offered to the public under a copyleft license, typically either the GPL or the Affero GPL (AGPL).
Once people start using the code, though, the company spends its time finding users who "make minor mistakes with their copyleft compliance" and, effectively, shakes them down to pay for a proprietary license. "Pretty nice business you have here, it would be a shame if you were violating our license." This business model was originally pioneered by Aladdin, which was the company behind the Ghostscript PostScript interpreter. There have been multiple lawsuits by the company (and its successors in interest) trying to get users to buy its proprietary license for supposedly not complying with the GPL.
There are certain areas of computing, open-source NoSQL databases for example, where this business model is more or less standard. MongoDB, Neo4j, and others release their code under copyleft terms, many using the "copyleft licenses that we know and love, but they are being used in this nefarious way". This is a huge problem, he said, and one that "we should have anticipated in license drafting"; there is a solution for that now, though it is not part of the GPLv3 or AGPL.
MongoDB was not happy with that model, he said; the company felt it did not go far enough, so it created its own license, the Server Side Public License (SSPL). The company said that it wanted to make a better copyleft license, but "as it turns out, no one can actually comply" with the SSPL; meanwhile the company was pitching that "we think that will get the most software freedom in the world". The SSPL requires that those offering its covered code as a service also release all of the other code on the system under the SSPL—something that is completely impossible for anyone deploying on Linux (or, really, any other operating system).
Kuhn does not think anyone took MongoDB's claim that it was advancing the cause of copyleft for everyone's benefit seriously, but it is worrisome that the SSPL was seriously considered as a possible OSI-approved license until he pointed out its inherent flaw. He sees this incident as the culmination of what has been going on in the copyleft world for quite some time now; companies have "co-opted the message of copyleft, they have drafted on the idea that copyleft is some sort of sacrosanct principle" to try to make people believe what they are doing is "the right thing". What they are really doing is "misusing a tool".
Applying copyleft further
Another concerning thing is that people have started to want to apply copyleft to areas beyond just the realm of software freedom. A number of social-justice advocates have become excited about using copyleft to further the aims of social justice, for example. This came about when it was discovered that the US Immigration and Customs Enforcement (ICE) agency was using a variety of open-source programs, so people thought that software licenses should be used to prohibit these kinds of bad actors. While he agrees that ICE "is a very bad actor", copyleft licenses have always been focused on the issues of software freedom.
This has led to license proliferation of a different sort. He has tried to explain to the social-justice advocates that he does not think copyleft will work to achieve their aims, in a practical sense, even if some kind of agreement could be found on the wording for licenses of that sort. But these two sets of efforts, from companies in search of a business model and people looking to right injustices, have teamed up to cross boundaries that were long established norms in the copyleft community.
Over the years, that community has made plenty of mistakes, one of which was not really focusing on adoption, both of the licenses and of the software covered by them. "There was a lot of rhetoric in the free-software advocacy world that 'better the software be under a copyleft license and never used, than be under a non-copyleft license and be widely adopted in proprietary software'", Kuhn said. But that misses the point of how copyleft became successful; Linux is the most successful copyleft project because companies did not want to live without it. Unless the software gets adopted widely, it is not going to have a big impact on the software freedom of users.
Copyleft advocacy has not done the work "to convince developers that they want to create software [...] under copyleft", thus there is this weird belief that non-copyleft licenses are for adoption, while copyleft is for software freedom. "There is no software freedom unless you have adoption."
Drafting copyleft licenses
A problem that is apparent to him, now, in hindsight, is that the approach that has been used to draft new copyleft licenses is flawed. That can be seen in the work on GPLv3: it was targeted at large software businesses (and their lawyers). That led to a bunch of complicated licensing terms being added to the GPLv3 family of licenses, which has been a barrier to adoption of the license.
He did not participate in the drafting of GPLv3 as much as he would have liked, but in talking to those who were more heavily involved, he learned that there were serious lobbying efforts from large companies going on during the process. He believes that those who were drafting the license were ill-equipped to handle these aggressive lobbying efforts, which came easily to the companies because they lobby governments and each other all the time. "The inexperience of the GPLv3 drafting team led to some really bad decisions", he said.
For example, he does not understand why the GPLv3 ended up with a badgeware clause; it is effectively an additional restriction that can be placed on the distribution to require preservation of "specified reasonable legal notices or author attributions" (from clause 7b). The badgeware companies, with SugarCRM being the biggest proponent that he is aware of, wanted much more in that clause; "they feel like they got a horrible compromise". But the confusion about the language of that clause has led to various kinds of abuse.
His second example of where the GPLv3 process went awry was about the patent language. At the time of the license drafting, there was "this weird deal done by Novell and Microsoft" around patents with regard to Linux. He believes that the deal was clearly a violation of GPLv2, but it ended up adding a bunch of complicated legalese about patents to GPLv3 (section 11), which includes a date ("prior to 28 March 2007") that effectively grandfathers in the Novell/Microsoft agreement. "Who do you think lobbied to put that date in there?"
So, in an effort to get the GPLv3 out the door, two paragraphs of complicated language were included in it to preclude others from following in Microsoft's footsteps—but still allow the agreement that had already been made. That language is simply wasted space at this point, since the GPLv2 already had much simpler language precluding it—and it was already clear that the Linux kernel would not be switching to GPLv3 in any case. The only purpose it serves now is to confuse users who might adopt GPLv3 except that "it's so long and annoying and I don't know what all these words mean".
In the past, licenses have been managed and shepherded by license stewards, but he thinks that era has passed. He used to work for a license steward (the Free Software Foundation), but does not anymore; he does not want to see his current employer become a license steward either. He believes that "license stewardship has not worked for us". It raises the barrier to entry for new licenses and allows the elite to control what the licenses say. He makes that statement with full recognition of his membership in the "licensing elite", but he does not want to be in control of what the next copyleft license says. On the other hand, he would like to contribute to a new license on an equal footing with everyone else.
He thinks that people have mostly chosen licenses based on their stewards, rather than on reading the text of the license. Back in 1995, he read the text of the GPLv2 and could mostly work out what it was about; it was written to be read by developers first and lawyers second. GPLv3 was drafted by lawyers and he doubts that any developer really wants to read it. So people fall back on simply trusting the steward, which he does not think works well either.
The shark
Kuhn riffed a bit around the title of the talk: "Did FOSS Licenses Jump the Shark?" He put up a slide with artwork of Fonzie jumping the shark (from the 1970s TV show Happy Days) and asked: "Do we have to cancel the whole thing because it's over and just live with the licenses we have now?" The idea of "jumping the shark" refers to the decline in quality (and popularity) of the show after a cliff-hanger at the end of season 5 in 1977. It is a phrase that denotes something that has run its course and, perhaps, is simply moving forward on inertia after said shark-jumping.
Interestingly, though, season 5 was the peak of popularity for the show and it did decline a bit the next year, but not by all that much. The real decline came after season 7 when the star of the show left. "Wasn't Ron Howard leaving really jumping the shark, not Fonzie literally jumping the shark?", he asked with a laugh—maybe jumping the shark is not actually so bad. He showed another image of Henry Winkler, the actor who played Fonzie, jumping a shark in a different context, which is a kind of cultural reinterpretation or reinvention.
"I feel like we should just reinvent copyleft", Kuhn said. Sometimes he wonders if the whole idea of using licenses to promote software freedom should be abandoned. But there are some hard facts that make it difficult to take another path. The copyright regime restricts software freedom by default; that is true worldwide due to the Berne Convention. "So we are stuck with software being governed by copyright."
So copyleft licenses have to be a component of the struggle for software freedom. They are necessary, even if not sufficient, to achieve the aims of free software. He does not believe that software freedom will ever be mandated in legislation or written into the UN Declaration of Human Rights, which are other paths that might be chosen.
Relaunching copyleft-next
Kuhn has decided to help relaunch the copyleft-next project that was started by Richard Fontana a little over ten years ago. The project is going to try to write a copyleft license in an open and transparent way; "write a free-software license the way we write free software". That is really how it should be done.
The license should be written "slowly, carefully, deliberately, [and] don't rush to address the issues of the day". The license text should be short and readable by non-lawyers, something he thinks that GPLv2 accomplished and GPLv3 did not. The project uses the "Harvey Birdman Rule" (HBR) to try to ensure that lobbying efforts are thwarted by ensuring that all of the discussions are done openly and in public.
He would like to try to rekindle a world where hobbyists write free software. So he wants to work on a license in a "hobbyist way", rather than in a top-down fashion like a license steward might employ. While license language is not code, it does have some similarities, so he wanted to encourage developers to get involved.
The project is just getting restarted now. "Even if it fails, we need to experiment at this point", Kuhn said. GPLv3 itself is more than 15 years old and no one is looking at alternatives. The idea is to have a community-owned license, so there will be no steward other than the community as a whole. Fontana does not want to be the dictator for life, and, even though SFC will be funding some of Kuhn's work on the project, that organization will not be the steward either.
In an email, Kuhn said that he plans to put his slides in a permanent location soon, which we will link for interested readers. At some point soon, a video of the talk should appear in the Everything Open YouTube channel as well. [Update: The slides are now available.]
[Thanks to LWN subscribers for supporting my travel to Melbourne for Everything Open.]
Hopes and promises for open-source voice assistants
At the end of 2022, Paulus Schoutsen declared 2023 "the year of voice" for Home Assistant, the popular open-source home-automation project that he founded nine years ago. The project's goal this year is to let users control their home with voice commands in their own language, using offline processing instead of sending data to the cloud. Offline voice control has been the holy grail of open-source home-automation systems for years. Several projects have tried and failed. But with Rhasspy's developer Mike Hansen spearheading Home Assistant's voice efforts, this time things could be different.
Science fiction shows and movies have sold us on the idea of spaceships and homes we can talk to. In recent years, voice control at home has become possible thanks to the so-called "smart speakers" from Google, Amazon, and Apple. However, there's nothing smart about these devices: their intelligence is almost completely in the cloud, where the user's voice recordings are processed and translated into sentences and meaning.
This is a complex and computationally intensive task, and these companies make us believe that their services are required to be able to use voice control. Of course this comes with downsides: users don't have any control over what's happening with their voice recordings, which is a big privacy risk. But, fundamentally, the problem lies even deeper. It just makes no sense for users to have their voices make a long detour through the internet just to turn on a light in the same room.
The challenges of offline voice control
Luckily, there have been some projects working on offline voice control for years now, some of them partially or even fully open-source. Of course, a voice assistant running on a home server will not have the same performance as those general-purpose smart speakers making use of servers in the cloud. However, it's possible to have a reasonably working voice-control system, even on a Raspberry Pi, if its purpose is limited to some specific domain, for example opening and closing blinds, turning on and off lights, asking what time it is, or whether the door is closed.
A voice-control software stack consists of a lot of parts. It all starts with wake-word detection: the voice assistant listens to an audio stream from a microphone and activates when it recognizes a wake word or phrase, such as "Hey Rhasspy". After activation, the microphone records audio until it detects that the user has stopped talking.
After that, a speech-to-text module transcribes what the user said into text, such as "What's the temperature outside?". This text is processed by an intent parser, which figures out what the user means: the intent. The result is then processed by an intent handler, which reads the temperature from a sensor at home or gets it using a web API, then returns text like "It's 20 degrees outside". A text-to-speech module then converts this text into audio using a synthesized voice, which is played on the speaker to reply to the user's request.
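As a rough illustration of that flow, here is a minimal sketch in C; the stage functions are hypothetical stand-ins (none of these names come from Rhasspy or any other project), and a real system would stream audio and run each stage as a separate service.

    #include <stdio.h>
    #include <stdbool.h>

    /* Hypothetical stand-ins for the real services. */
    static bool wake_word_detected(void) { return true; }  /* heard "Hey Rhasspy" */
    static const char *speech_to_text(void) { return "What's the temperature outside?"; }
    static const char *parse_intent(const char *text) { return "GetTemperature"; }
    static const char *handle_intent(const char *intent) { return "It's 20 degrees outside"; }
    static void text_to_speech(const char *reply) { printf("[speaker] %s\n", reply); }

    int main(void)
    {
        if (!wake_word_detected())                   /* 1: wait for the wake word */
            return 0;
        const char *text = speech_to_text();         /* 2: transcribe the request */
        const char *intent = parse_intent(text);     /* 3: map the text to an intent */
        const char *reply = handle_intent(intent);   /* 4: act on it and build a reply */
        text_to_speech(reply);                       /* 5: speak the reply */
        return 0;
    }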
It's a challenge to bring all of these parts together to create a working and user-friendly voice-control system. It is even more difficult if the system needs to be completely open source and work offline. This article takes a look at some projects that were promising in the past, but failed. It also examines the currently most promising one: Rhasspy as part of Home Assistant.
Broken promises
A few years ago, Snips was a promising startup in the voice domain. The French company was founded in 2013 by three applied mathematicians with a mission to put artificial intelligence (AI) into every device, while respecting privacy. In 2016 they decided to focus on creating a voice assistant that processes the user's voice recordings offline, thereby offering privacy by design. The Snips voice assistant was able to run on a Raspberry Pi; much of the software was open source, under the permissive Apache 2.0 license.
Snips managed to attract a small but dedicated community of people that created Snips skills, which are intent-handler scripts (often written in Python) that react to intents recognized by Snips. I was one of those people, believing that Snips would help us reach the vision of an open-source offline voice assistant. There was also a Home Assistant integration to trigger actions based on the user's voice commands.
Snips had (and still has) a FAQ on its web site that explained that the proprietary part of their software would become open source "soon", and that's why I decided to become active in the community. However, that FAQ item kept saying "soon" for a long time. In early 2019, I interviewed the Snips CTO for a magazine and asked about the company's plans to open the rest of their source code. His answers were vague and didn't give me much confidence that it would actually happen. That's when I decided to throw in the towel and leave the Snips community. My feeling turned out to be correct: at the end of 2019, Snips was acquired by Sonos. The company's web-based Snips Console that was needed to train the voice assistant was shut down a few months later.
Death by patents
Another promising project was Mycroft. The company started around the same time as the Snips voice assistant. The developers worked on free software for most parts of the voice stack and released their software under the GPL license in May 2016 (and relicensed under Apache 2.0 in October 2017). The company created its own smart-speaker hardware, the Mark 1 and later the Mark II. Both devices were funded through successful Kickstarter campaigns, although the delivery of the Mark II was severely delayed after problems with the company's hardware partner.
While Snips had the problem that essential parts of its voice stack weren't open source, Mycroft had another problem: by default audio was sent to Google's speech-to-text services for speech recognition. Mycroft acted as a proxy, though, so Google only saw requests coming from Mycroft's servers, not from individual users—though, of course, the Mycroft servers still saw the original requests. Enterprising users could always swap the cloud-based speech-to-text service for an open source offline solution such as Mozilla DeepSpeech or Kaldi.
Mycroft started to have some real problems in 2020 when Voice Tech Corporation filed a patent-infringement lawsuit. Eventually, all of Voice Tech's claims were invalidated, but earlier this year Mycroft CEO Michael Lewis published "The End of the Campaign" on Kickstarter with some bad news: the company didn't have the funds to continue meaningful operations. The future looked bleak:
Since starting here in early 2020 I’ve had to make some of the toughest decisions I’ve ever faced, and none more so than at the end of last year. At the end of November, just after the Mark II entered production, I was faced with the reality that I had to lay off most of the Mycroft staff. At present, our staff is two developers, one customer service agent and one attorney. Moreover, without immediate new investment, we will have to cease development by the end of the month. [...] So what went wrong? The single most expensive item that I could not predict was our ongoing litigation against the non-practicing patent entity that has never stopped trying to destroy us. If we had that million dollars we would be in a very different state right now.
Lewis also posted to the company blog at the end of January, with much of the same text as the Kickstarter article, but ending with: "There is much more to be said and many other topics that I will cover in future posts over the coming days." However, there has been no update to the blog at the time of this writing.
So is Mycroft dead now? It may live on in OpenVoiceOS, which started as a Linux-based operating system to run Mycroft. Over time, it forked Mycroft's core to add functionality that wasn't accepted upstream. A few weeks ago, the developers announced a plan to become a non-profit association under Dutch law and started a GoFundMe campaign to raise money to support their initiative. The OpenVoiceOS developers have been active in the Mycroft community for years; they have kept their fork compatible with Mycroft's skills, so a lot of users will probably move to OpenVoiceOS.
Rhasspy
Another project, and the one that I started contributing to after leaving the Snips community, is the MIT-licensed Rhasspy. It's an open-source, modular set of voice-assistant services that can function completely disconnected from the internet and it works well with home-automation software. Moreover, it's not limited to English, but supports many human languages.
Rhasspy went through a couple of architectural rewrites, and has benefited a lot from the shutdown of Snips. Around 2019-2020, there was an influx of people from the Snips community searching for an alternative voice assistant. Mike Hansen, Rhasspy's developer, saw an opportunity and broke the monolithic Rhasspy Python application into multiple services. The result was Rhasspy 2.5, with services communicating over Message Queuing Telemetry Transport (MQTT) using an extended version of the Hermes protocol from Snips. This modular approach allowed plugging in different implementations for wake-word detection, speech-to-text, intent recognition, and text-to-speech.
The use of the Hermes protocol allowed contributors to write Rhasspy skills in any programming language that can speak MQTT and JSON. For example, I wrote a helper library to create voice apps for Rhasspy in Python. Hansen also created a Rhasspy add-on for Home Assistant. Another popular project in the community is ESP32-Rhasspy-Satellite, which lets users run a Rhasspy "satellite" on an ESP32 microcontroller board for audio input and output that is then streamed over MQTT to and from the Raspberry Pi or other computer running Rhasspy's core services.
However, when Hansen joined Mycroft in November 2021, it looked like another open-source voice-assistant project might die a slow death. Mycroft hired Hansen to help get its Mark II smart speaker over the finish line; he explained on the Rhasspy forum that he would change Mycroft's dependence on a cloud server. He also said that he wouldn't abandon Rhasspy, and that he would like to merge both communities at some point.
Understandably, Rhasspy's development slowed down after this. But a year later, Hansen made another surprise announcement: he would join Nabu Casa, the company that is developing Home Assistant, as Voice Engineering Lead to work on Rhasspy full time.
The year of voice
Rhasspy's revival as part of Home Assistant means that it's getting another architectural rewrite. Hansen is now working on Rhasspy 3, which he calls "a very early developer preview". The main goals are still the same: it works completely offline, has broad language support, and is completely customizable. There's a tutorial on how to set up Rhasspy 3, but many of the manual steps will be replaced with something more user-friendly in the future.
Instead of using MQTT, Rhasspy 3 now has all its services communicating using the new Wyoming protocol; the services communicate using standard input and output, which lowers the barrier for programs to talk to Rhasspy. Essentially, the Wyoming protocol is JSON Lines (JSONL) with an optional binary payload.
So, if Rhasspy needs to send chunks of audio data to a speech-to-text program, it sends a single line of JSON with a type field telling the receiving program it's sending an audio chunk, an optional data field with parameters like the sample rate and the number of channels, as well as a payload length. After this, it sends the audio chunk of the given length. Of course, existing programs don't use this Wyoming protocol, but small Python programs can be written as adapters. For example, a speech-to-text program that accepts raw audio on stdin and sends the recognized text to stdout can be used with the asr_adapter_raw2text.py adapter.
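As a sketch of what one such event might look like on the wire, the fragment below writes a single audio-chunk event: one line of JSON followed by the raw bytes of the chunk. The key names ("type", "data", "payload_length") and the event-type string are assumptions based on the description above, not taken from the Rhasspy 3 source.

    #include <stdio.h>

    /* Emit one Wyoming-style event: a JSON header line, then the binary payload. */
    static void send_audio_chunk(FILE *out, const unsigned char *samples, size_t len)
    {
        fprintf(out,
                "{\"type\": \"audio-chunk\", "
                "\"data\": {\"rate\": 16000, \"width\": 2, \"channels\": 1}, "
                "\"payload_length\": %zu}\n", len);
        fwrite(samples, 1, len, out);    /* exactly payload_length bytes follow */
        fflush(out);
    }

    int main(void)
    {
        unsigned char silence[320] = { 0 };    /* 10ms of 16-bit mono audio at 16kHz */

        send_audio_chunk(stdout, silence, sizeof(silence));
        return 0;
    }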
Voice commands at home
Rhasspy 3 is focused on Home Assistant's use case: voice commands to control devices. For intent handling, Rhasspy 3 uses the Assist feature introduced in Home Assistant 2023.2. Intent recognition is powered by HassIL. This matches the user's input against sentence templates. For example, a template such as:
(turn | switch) on [the] {area} lights
matches inputs like "turn on kitchen lights" and "switch on the kitchen lights".
Home Assistant supports a small list of built-in intents. For example, the HassTurnOn intent turns on a device. If the previous template to turn on the kitchen lights is defined as a template for this HassTurnOn intent, Assist's intent handler will turn on the kitchen lights if the user's Home Assistant configuration has lights defined and assigned to the kitchen area.
The Home Assistant Intents project has the goal to create sentence templates for all possible home-automation intents in every possible language, released under the CC-BY-4.0 license. At the moment, 156 people have contributed sentences for 52 languages. The project has started with support for six intents to keep the work manageable; the goal is to increase this number slowly.
Conclusion
An offline, open-source voice assistant in one's own language is important for a lot of people. However, the technical challenges of building one are rather high. Snips and Mycroft were able to attract a community, but failed to build a successful business. Rhasspy was quite successful among the small crowd of people who like to tinker and build their own voice assistant around the flexible services that the project offered, but the core was developed mainly by one person and the project wasn't backed financially. The good news is that Rhasspy will be tightly integrated with Home Assistant, which is one of the most active projects on GitHub, and its development is funded by Nabu Casa. So there's hope that we will finally reach those science-fiction dreams. We will be able to control our homes with a user-friendly voice assistant that is both privacy-respecting and made from open-source software.
Zero-copy I/O for ublk, three different ways
The ublk subsystem enables the creation of user-space block drivers that communicate with the kernel using io_uring. Drivers implemented this way show some promise with regard to performance, but there is a bottleneck in the way: copying data between the kernel and the user-space driver's address space. It is thus not surprising that there is interest in implementing zero-copy I/O for ublk. The mailing lists have recently seen three different proposals for how this could be done.
1: Use BPF
There are few problems in the kernel, it seems, that cannot be addressed by throwing some BPF into the mix, and zero-copy ublk I/O would appear to be no exception. This patch set from Xiaoguang Wang adds a new program type (BPF_PROG_TYPE_UBLK) that can be loaded by ublk drivers and subsequently registered with one or more specific ublk devices. Once that happens, I/O requests generated by the kernel will be passed to that program rather than being sent to the user-space driver for execution. There is a new BPF helper function (not a kfunc, for unclear reasons) called bpf_ublk_queue_sqe() that allows BPF programs to add requests to the ring; this helper can be used to queue the I/O operations that fulfill the original block request.
There are a few advantages to handling these requests entirely in the kernel, starting with the ability to eliminate round trips with the user-space daemon. The biggest win, though, is likely to come from the fact that the BPF program has access to the buffers provided by the kernel and can use them directly for whatever I/O is needed to satisfy each request, eliminating a copy of that data. Block drivers can move quite a bit of data, so the advantage of avoiding copies should be clear. That said, this patch (like all the others discussed here) lacks benchmark results showing the performance improvement it enables.
2: Fused operations
Ming Lei, the author of the original ublk patches, has a rather different approach. Like ublk itself, this work is minimally documented and difficult to read, so this description is the result of a reverse-engineering effort and may well be wrong in some respects.
Operations in an io_uring ring are usually entirely separate from each other. There is a way to link them so that one operation must complete before the next can be dispatched, but otherwise each operation is distinct. Lei's patch set provides a rather tighter link between operations by adding the concept of "fused" operations — two operations that are tied together and which can share resources between them.
When a user-space ublk driver is running, it will receive commands from the kernel, via the ring, with instructions like "read N blocks from device D at offset O". With Lei's series applied, the driver will have the option to turn that operation into a fused command that is placed back into the ring for execution in the kernel. A fused command is two io_uring commands that are tied together; they must be submitted as a single unit. The "master" command (Lei's terminology) is of type IORING_OP_FUSED_CMD; it contains enough information for the ublk subsystem to connect the command to a request sent to the user-space driver. The "slave" command, instead, performs the actual I/O needed to satisfy that request.
As with the BPF solution, the key here is that the slave command has access to the buffer associated with the master; in this case, the slave command can access the kernel-space buffers associated with the original block I/O request. Once again, that allows the I/O to be performed without copying the data to or from the user-space driver. Once the slave command completes, the user-space driver can signal completion of the original block I/O request to the kernel in the usual way.
The fused-command functionality is a special-purpose beast; it will not work in any sort of general case. The subsystem receiving the fused command must have special support for it and, specifically, it must be able to locate the kernel-space buffer for the slave command and make the connection with a call to the new function io_fused_cmd_provide_kbuf() before the slave can execute. It is a fair amount of change to the io_uring subsystem, and it is not entirely clear that any other subsystem would be able to make use of it.
3: Use splice()
In the discussion after version 2 of Lei's patch set was posted, Pavel Begunkov observed that "it all looks a bit complicated and intrusive". He thought that it might be possible to, instead, reuse the mechanisms for the splice() system call. The io_uring "registered buffer" feature would be used to facilitate zero-copy operation. Shortly thereafter, he posted a preliminary, proof-of-concept implementation; it showed how this approach could work but was not complete.
Lei had a number of questions about this approach, mostly focused on how the buffer management works. It is not clear how well the splice() approach would work if I/O needs to be performed on a given buffer more than once — for example, when writing to a mirrored block device. The questions kept coming, and Begunkov has not (as of this writing) posted a complete version of the patch. It seems likely that the splice() approach will not go much further, though surprises can always happen.
Wang, meanwhile, has said that the fused-command approach seems like "the right direction to support ublk zero copy".
As was noted in the original ublk article, one of the key practical problems that has impeded the microkernel approach to operating-system design is the cost of communication between the components. Ublk has managed to reduce that cost considerably, but there is more to be gained if the cost of copying data between the kernel and user space can be eliminated. So chances are good that developers will continue to work on this problem until some sort of workable solution has been found.
Generic iterators for BPF
BPF programs destined to be loaded into the kernel are generally written in C but, increasingly, the environment in which those programs run differs significantly from the C environment. The BPF virtual machine and associated verifier make a growing set of checks in an attempt to make BPF code safe to run. The proposed addition of an iterator mechanism to BPF highlights the kind of features that are being added — as well as the constraints placed on programmers by BPF.

One of the many checks performed by the BPF verifier at program-load time is to convince itself that the program will terminate within a reasonable period of time, a process that involves simulating the program's execution. This constraint has made supporting loops in BPF programs challenging since the beginning; it has only been possible to use loops since the 5.3 release. Even with that addition, convincing the verifier that a loop will terminate can be a challenge; this annoyance has led to, among other things, the addition of features like bpf_loop(), which puts the looping logic for some simple cases into the kernel's C code.
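For reference, here is a rough sketch of what a bpf_loop() call looks like on the BPF side: the loop body becomes a callback, and the kernel, rather than the BPF program, runs the loop. This is an illustrative fragment, assuming the usual libbpf headers; the tracepoint it attaches to is an arbitrary choice.

    #include <linux/bpf.h>
    #include <bpf/bpf_helpers.h>

    struct sum_ctx {
        long sum;
    };

    /* Called once per iteration; return 0 to continue or 1 to stop early. */
    static long add_index(__u32 index, void *data)
    {
        struct sum_ctx *ctx = data;

        ctx->sum += index;
        return 0;
    }

    SEC("tp/syscalls/sys_enter_getpid")
    int sum_with_bpf_loop(void *unused)
    {
        struct sum_ctx ctx = { .sum = 0 };

        bpf_loop(10, add_index, &ctx, 0);    /* run the callback ten times */
        bpf_printk("sum = %ld", ctx.sum);
        return 0;
    }

    char LICENSE[] SEC("license") = "GPL";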
Not all problems are readily addressable by a simple function like bpf_loop(), though. Many loops in BPF programs are simply iterating through a set of objects, and BPF developers would like easier ways to do that. While numerous languages have some sort of built-in notion of iteration over a set, C does not. As noted above, though, BPF is not really C; this patch set from Andrii Nakryiko reiterates (so to speak) that point by adding an iteration mechanism to the BPF virtual machine.
In languages that support the concept of iteration with a specific type, there is usually a set of methods to implement for a new iterator type; they can be thought of as "start iteration", "next item", and "finish iteration". The proposed BPF mechanism follows that same pattern. Code to support iteration must be written (in real C) in the kernel, and it must supply four things, starting with a structure type to represent the iterator itself; the size of this structure must be a multiple of eight bytes. The iterator structure will have a name like bpf_iter_foo, and will contain whatever data the iterator needs to maintain its state.
The "new" function (or "constructor") must be called bpf_iter_foo_new(). Its first parameter will be a structure of the iterator type (which must be declared and instantiated in the BPF program); it can take an arbitrary number of other parameters. This function should initialize the iterator and return either zero or a negative error code; if initialization fails, the iterator must still be set up properly so that subsequent calls do the right thing.
The "next item" function is bpf_iter_foo_next(); it accepts the iterator as its only argument and returns a pointer to the next element (in whatever type the iterator supports). Even an iterator that just returns an integer must return a pointer to that integer. Returning a null pointer indicates that iteration is complete — or that some sort of error has occurred.
The bpf_iter_foo_destroy() function (the "destructor") takes a pointer to the iterator structure as its only argument and returns void; it completes iteration and performs any needed cleanup.
All of these functions must be declared as kfuncs with some flags indicating their special roles. The constructor must be marked as KF_ITER_NEW, the next function as KF_ITER_NEXT|KF_ITER_NULL, and the destructor as KF_ITER_DESTROY.
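Pulled together, the kernel-side half of an iterator might look roughly like the sketch below. It is modeled on the "numbers" iterator from the patch set, but the field names, the simplified logic, and the registration boilerplate are illustrative rather than copied from it.

    #include <linux/bpf.h>
    #include <linux/btf.h>
    #include <linux/btf_ids.h>

    /* Iterator state visible to BPF; its size must be a multiple of eight bytes. */
    struct bpf_iter_num {
        int cur;    /* next value to produce */
        int end;    /* iteration stops when cur reaches end */
        int val;    /* value most recently handed to the program */
        int pad;
    };

    /* Constructor: return zero or a negative error code. */
    __bpf_kfunc int bpf_iter_num_new(struct bpf_iter_num *it, int start, int end)
    {
        it->cur = start;
        it->end = end;
        return 0;
    }

    /* "Next" function: return a pointer to the next element, or NULL when done. */
    __bpf_kfunc int *bpf_iter_num_next(struct bpf_iter_num *it)
    {
        if (it->cur >= it->end)
            return NULL;
        it->val = it->cur++;
        return &it->val;
    }

    /* Destructor: release any resources; this iterator has none. */
    __bpf_kfunc void bpf_iter_num_destroy(struct bpf_iter_num *it)
    {
    }

    /* Register the kfuncs with the flags that declare their iterator roles. */
    BTF_SET8_START(bpf_iter_num_kfuncs)
    BTF_ID_FLAGS(func, bpf_iter_num_new, KF_ITER_NEW)
    BTF_ID_FLAGS(func, bpf_iter_num_next, KF_ITER_NEXT | KF_ITER_NULL)
    BTF_ID_FLAGS(func, bpf_iter_num_destroy, KF_ITER_DESTROY)
    BTF_SET8_END(bpf_iter_num_kfuncs)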
With this infrastructure in place, the verifier can perform a number of checks on an iterator, starting with the requirement that the constructor must be called before any other operations. Calls to the next function will be checked to ensure that the program is looking for the null return that indicates the end of iteration. The verifier ensures that the destructor is called at the end, and that the iterator is not accessed thereafter. It also uses the type information to ensure that a given iterator type is only passed to a set of functions that is declared to deal with that type.
The BPF subsystem also has some requirements on the C code implementing iterators, including the rule that the next function must return null after a reasonable number of calls. Since the verifier cannot know how many times an iterator-driven loop might run, its ability to enforce limits on the number of instructions executed by a BPF program is reduced; iterators have to help by not letting a program run indefinitely.
The patch series adds a mechanism enforcing the naming of the iterator type (it must start with bpf_iter_) and of the associated functions, which must be constructed by appending _new(), _next(), or _destroy() to the iterator type name. The arguments and return type of each function are also checked; if a check fails, the registration of the functions will fail.
One nice feature of this implementation is that iterators are, as far as the verifier is concerned, completely self-describing. Specifically, that means that there is no need to change the verifier itself to add new iterator types in the future, as long as they conform to this pattern.
As an example of how all this works, the series includes a sample "numbers" iterator that simply steps through a series of integers. The usage example on the BPF side looks like:
    struct bpf_iter_num it;
    int *v;

    bpf_iter_num_new(&it, 2, 5);
    while ((v = bpf_iter_num_next(&it))) {
        bpf_printk("X = %d", *v);
    }
    bpf_iter_num_destroy(&it);
This code will execute the body of the loop with *v holding values from two to four, inclusive.
Iterating through a count in this way is not hugely exciting, of course; that can already be done with bpf_loop() or, in the case like the above with constant bounds, by just coding a for loop. One expects that there are rather more advanced use cases in mind for this functionality, in the extensible scheduler class perhaps, but that has not been spelled out in the patch posting. Those can be expected to appear after the series is merged; given the apparent lack of opposition at this point, that seems likely to happen soon.
Reducing direct-map fragmentation with __GFP_UNMAPPED
The kernel's direct map makes all of a system's physical memory available to the kernel within its address space — on 64-bit systems, at least. This seemingly simple feature has proved to be hard to maintain, in the face of the requirements faced by current systems, while keeping good performance. The latest attempt to address this issue is this patch set from Mike Rapoport adding more direct-map awareness to the kernel's page allocator.
Direct-map fragmentation
Over the course of a system's operation, the kernel will likely end up having to access almost every page of memory; if nothing else, it will need to load executable text and clear anonymous pages before giving them to user-space processes. The direct map is clearly useful for this work, as can be seen by the difficulties caused by systems that lack enough address space to hold a complete direct map. For much of its operation (including most memory accesses internally), the kernel simply uses direct-map addresses rather than a separate map for kernel space.
As a result, efficient access to the direct map is important; the way the direct map is managed has a significant effect on how efficient that access is. To understand the problem, a quick refresher on how page tables work may help. While page tables can seem like a simple linear array mapping page-frame numbers to physical pages, that would not be workable in practice; instead, page tables are implemented as a sparse hierarchy. Here is a simplistic diagram (first used in this 2013 article) of how virtual addresses are interpreted:
This diagram shows four levels of page tables: the page global directory (PGD), page upper directory (PUD), page middle directory (PMD), and the page-table entries (PTE). Current systems can add a fifth level, the P4D, between the PGD and the PUD. Virtual-address translation involves stepping through each level of the hierarchy; if the relevant data is not in the processor caches, this process can take a long time. To improve performance, processors have a translation lookaside buffer (TLB) that caches the result of a small number of recent translations. If an address is found in the TLB, the page-table walk can be avoided; improving the TLB hit rate can thus significantly increase the performance of the system.
One way to improve TLB usage is to use huge pages. A huge page has a special entry in one of the higher-level directories (the PMD or the PUD) saying that translation stops there. A PMD-level huge page will be (on most architectures) 2MB in size; a single PMD huge page can replace 512 PTE-level ("base") pages, all of which can be accessed through a single TLB entry. A (typically) 1GB PUD-level huge page expands the reach of a TLB entry even further.
The kernel's direct map is created using huge pages to reduce the TLB usage of kernel-space code, with measurable results. There is a problem, though: a huge page is managed by a single entry in the appropriate page directory, meaning that the same access permissions apply to the whole page. If the kernel needs to change the permissions for some base pages within a huge page, it must first break that huge page up into smaller pages, with a corresponding loss in access performance.
Increasingly, kernel developers are finding themselves needing to change direct-map permissions. Various sorts of address-space isolation mechanisms, for example, might remove some pages from the direct map entirely to prevent unwanted access. The increasingly stringent prohibition on pages that are both writable and executable means that, if the kernel needs to load executable code into its address space, it must split up any huge pages holding the target memory so that write permission can be removed and execute permission added; this happens when kernel modules and BPF programs are loaded, for example.
Breaking up one huge page to load a module or BPF program, or to isolate some memory, is not a huge problem. As the system runs, though, this can happen repeatedly, fragmenting the direct map over time. On systems where, for example, BPF programs are frequently loaded, the result can be a badly fragmented direct map and equally bad performance. This problem has led to a number of efforts, such as the BPF program allocator, intended to minimize the effect on the direct map.
Improving the page allocator
Rapoport's patch addresses this problem by adding yet another allocation flag called __GFP_UNMAPPED. When kernel code allocates one or more pages using this flag, they will be removed from the direct map before being returned to the caller. The value that is added is not just in the direct-map removal, though, but also in the cache that the page allocator maintains for __GFP_UNMAPPED allocations.
When the first such allocation request is made, the allocator will remove a PMD-sized huge page from the direct map, use a portion of it to satisfy the request, and hang onto the rest to satisfy future requests. Unmapped pages that are freed will be retained in that cache as well. The effect will be to focus these special requests on a single region of memory, avoiding the fragmentation of the direct map as a whole. There is also the inevitable shrinker that can be called when memory is tight; that will cause the release of pages in the __GFP_UNMAPPED cache back to the kernel for use elsewhere.
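A minimal sketch of how kernel code might use the flag is shown below; __GFP_UNMAPPED exists only in the proposed patch set, so this is illustrative and will not build against a mainline kernel.

    #include <linux/gfp.h>
    #include <linux/vmalloc.h>

    /*
     * Allocate one page that is absent from the direct map and give it a
     * new kernel mapping instead (roughly what module_alloc() needs).
     */
    static void *alloc_unmapped_page(void)
    {
        struct page *page;
        void *addr;

        page = alloc_pages(GFP_KERNEL | __GFP_UNMAPPED, 0);
        if (!page)
            return NULL;

        /*
         * The page can no longer be reached through the direct map, so
         * map it explicitly before touching it.
         */
        addr = vmap(&page, 1, VM_MAP, PAGE_KERNEL);
        if (!addr)
            __free_pages(page, 0);
        return addr;
    }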
The patch set includes two uses of the new facility. One of those is in the x86 implementation of module_alloc(), which allocates space for loadable kernel modules. The other is in the implementation of memfd_secret(), which removes the allocated space from the direct map entirely, making it inaccessible to the kernel.
There are no benchmark results included with the patch set, so it is not really possible at this point to quantify just how much it can improve the performance of the system. The performance effects will be heavily workload-dependent in any case. But the problem being solved is well understood, and the effects of direct-map fragmentation have been measured in the past. So it seems clear that some sort of solution will need to be merged at some point. Whether this latest attempt is that solution remains to be seen; that may be a question for the upcoming LSFMM/BPF conference to address.
Page editor: Jonathan Corbet