Leading items
Welcome to the LWN.net Weekly Edition for March 9, 2023
This edition contains the following feature content:
- Removing support for DeltaRPMs in Fedora: an attempt at more bandwidth-friendly updates comes to an end.
- The SCO lawsuit, 20 years later: even after two decades, the effects of this suit are being felt in our community.
- Kernel time APIs for Rust: bringing Rust into the kernel means more than just replicating existing internal APIs.
- The rest of the 6.3 merge window: more changes for the next kernel release.
- BTHome: An open standard for broadcasting sensor data: an attempt to bring some order to the protocols used by Bluetooth Low Energy devices.
This week's edition also includes these inner pages:
- Brief items: Brief news items from throughout the community.
- Announcements: Newsletters, conferences, security updates, patches, and more.
Please enjoy this week's edition, and, as always, thank you for supporting LWN.net.
Removing support for DeltaRPMs in Fedora
Way back in 2009, we looked at the presto plugin for yum, which added support for DeltaRPMs to Fedora. That package format allows just the binary differences (i.e. the delta) between an installed RPM and its update to be transmitted, which saves network bandwidth; the receiving system then creates the new RPM from those two pieces before installing it. Support for DeltaRPMs was eventually added to the distribution by default, though the feature has never really lived up to expectations—and hopes. Now, it would seem that Fedora is ready to, in the words of project leader Matthew Miller, "give DeltaRPMs a sad, fond farewell".
Miller raised the question of retiring DeltaRPMs in a February 21 post to the Fedora devel mailing list. He pointed to a five-year-old open bug report that described problems with retaining the .drpm files for packages due to the way the Pungi distribution composer works. Miller also noted that a thread from 2021 discussing "deltarpm usefulness?" did not come to any firm decision.
The problem is that the DeltaRPMs are created but only get synced to the mirrors as part of the update composed on the day the DeltaRPMs were created; the next day, a new distribution update gets composed, without using the previous DeltaRPMs, and that gets pushed to the mirrors. The net effect, as Jonathan Dieter pointed out in the bug report, is that the DeltaRPMs are only available for a day; "That means that the only way to take full advantage of deltarpms in Fedora is to update every single day." Doing things that way "has very little end-user value", Miller said.
There are some benefits to using DeltaRPMs, especially for those on slow or metered connections, but generally bandwidth consumption has faded as a major problem for most people at this point. Meanwhile, as Kevin Fenzi pointed out in the bug report, there are still costs:
Sure, you save BW [bandwidth], but you spend more in CPU and disk I/O to reassemble the rpms on every client machine. You also make updates pushes slower and use more resources.
Beyond that, there are some technical hurdles in the way of better DeltaRPM retention on the Fedora mirrors. The problem has never risen to the top of anyone's priority list, which is unfortunate, but now probably is not the time to address that, Miller said. Instead, Fedora has other technologies: "We have ostree and various container-delta approaches. We should focus on those [...]".
There were few complaints in the thread about that conclusion—generally, the opposite, in fact. Stephen Smoogen wondered when the axe should fall; should DeltaRPMs be removed for Fedora 39 and beyond, after a particular date, or, perhaps, discontinued for the upcoming Fedora 38? Miller said that normally he would be suggesting starting with Fedora 39, since the deadline for changes to Fedora 38 (which is due in April) has passed, but the DeltaRPMs have largely not been available anyway. He asked the infrastructure and release-engineering folks what the ramifications of simply shutting off the creation of DeltaRPMs would be. Kevin Fenzi replied that it was an easy change to make (and to back out if need be), without much in the way of risks.
Both Gary Buhrmaster and Dennis Gilmore posted their anecdotal experiences with the availability (and savings) from DeltaRPMs. The feature did not really provide much for either of them; Buhrmaster put it this way:
While occasionally I have seen a small decrease in the size of the files transferred (which certainly can benefit some people some of the time), the total elapsed time of the transaction has always ended up being higher as the recreation of the original rpm exceeds the time that it would have taken me to just download the full new rpm (with an admittedly reasonably high speed network provider in my environment).
The savings versus cost of DeltaRPMs is not entirely easy to work out; Ben Cotton thought the tradeoff may be particularly problematic for those with lower-powered systems. "And since delta RPMs trade bandwidth for CPU, it probably makes things worse for folks in developing countries." He did wonder if the feature should only be turned off by default on Fedora installs, while the distribution continued building the .drpm files; that would "still allow people to opt-in to it" and the distribution to measure its use. But Miller said that, even with the measurement, it would still be difficult to judge whether the benefit outweighs the cost.
Several times in the recent thread, Demi Marie Obenour advocated removing DeltaRPM support in order to reduce the attack surface of the distribution. She is mostly concerned with security holes in the program that reassembles the RPM from the delta, as she described in a March 2022 post:
This assumes that deltarpm (the program) does not contain any security flaws of its own, which could allow for code execution while the deltarpm is being applied. This is a bad assumption: a cursory audit I did found that it is not designed with untrusted input in mind. The code is also quite hard to follow, which makes auditing it quite difficult. Finally, it exposes decompression libraries to untrusted input before signature verification, and it itself has at the very least several areas where a bad deltarpm could cause it to allocate gigabytes of RAM.
In the earlier thread that Miller had pointed to, there was some confusion about the integrity and authenticity checking for DeltaRPMs. Marek Marczykowski-Górecki had proposed disabling DeltaRPMs in that earlier thread as well, but thought another reason to do so was security-related because DeltaRPMs are "processed before checking the package signature, which exposes rather big attack surface [...]".
But Fenzi described the process for assembling and checking a DeltaRPM, which is just as secure as the checking for a regular RPM, he said. Obenour's concerns about the reassembly process are still valid, of course, but:
drpms work by downloading the delta, then using it + the version you have installed to recreate the signed rpm (just like you downloaded the full signed update) and then the gpg signature is checked of that full rpm, just like one you downloaded. If the drpm is tampered with it won't reassemble and it will fall back to the full signed rpm.
The security concerns seem like they could be addressed, as could the build-process issues that drastically reduce the availability of DeltaRPMs, but there is no large impetus for the feature these days. The tradeoff is potentially valuable in some scenarios, but the cost has been deemed too high by many. Given that users have been largely living without even the limited benefits that the feature provides—for a fair number of years and Fedora releases—the time has come to let it go. That will not happen without some action by the Fedora Engineering Steering Committee (FESCo), but with essentially zero opposition to removing DeltaRPMs, agreement seems clear.
The SCO lawsuit, 20 years later
On March 7, 2003, a struggling company called The SCO Group filed a lawsuit against IBM, claiming that the success of Linux was the result of a theft of SCO's technology. Two decades later, it is easy to look back on that incident as a somewhat humorous side-story in the development of Linux. At the time, though, it shook our community to its foundations. It is hard to overestimate how much the community we find ourselves in now was shaped by a ridiculous lawsuit 20 years ago.

SCO claimed to be the owner of the Unix operating system which, it said, was the power behind the "global technology economy"; the company sold a proprietary Unix system that ran on x86 hardware. By that point, of course, the heyday of proprietary Unix was already well in the past, and SCO's offerings were not doing particularly well. The reason for that, SCO reasoned, was the growth of Linux — which was true to a point, though Windows had been pushing a lot of Unix systems aside for years. But Linux, SCO said, couldn't possibly have reached a point where it threatened Unix on its own:
Prior to IBM's involvement, Linux was the software equivalent of a bicycle. UNIX was the software equivalent of a luxury car. To make Linux of necessary quality for use by enterprise customers, it must be re-designed so that Linux also becomes the software equivalent of a luxury car. This re-design is not technologically feasible or even possible at the enterprise level without (1) a high degree of design coordination, (2) access to expensive and sophisticated design and testing equipment; (3) access to UNIX code, methods and concepts; (4) UNIX architectural experience; and (5) a very significant financial investment.
It was the claim of access to Unix code that was the most threatening allegation for the Linux community. SCO made it clear that, in its opinion, Linux was stolen property: "It is not possible for Linux to rapidly reach UNIX performance standards for complete enterprise functionality without the misappropriation of UNIX code, methods or concepts". To rectify this "misappropriation", SCO was asking for a judgment of at least $1 billion, later increased to $5 billion. As the suit dragged on, SCO also started suing Linux users as it tried to collect a tax for use of the system.
Send in the clowns
Though this has never been proven, it was widely assumed at the time that SCO's real objective was to prod IBM into acquiring the company. That would have solved SCO's ongoing business problems and IBM, for rather less than the amount demanded in court, could have made an annoying problem go away and also laid claim to the ownership of Unix — and, thus, Linux. To SCO's management, it may well have seemed like a good idea at the time.
IBM, though, refused to play that game; the company had invested heavily in Linux in its early days and was uninterested in allowing any sort of intellectual-property taint to attach to that effort. So the company, instead, directed its not inconsiderable legal resources to squashing this attack. But notably, so did the development community as a whole, as did much of the rest of the technology industry.
Over the course of the following years — far too many years — SCO's case fell to pieces. The "misappropriated" technology wasn't there. Due to what must be one of the worst-written contracts in technology-industry history, it turned out that SCO didn't even own the Unix copyrights it was suing over. The level of buffoonery was high from the beginning and got worse; the company lost at every turn and eventually collapsed into bankruptcy.
At a talk some years ago, your editor got a good laugh by saying that, in the SCO case, we had the good luck to be sued by idiots. SCO created a great deal of fear, uncertainty, and doubt in the industry, but a smarter attack could have been a lot worse. Even as it was, this was a period when SCO was making waves by threatening to sue any company using Linux — at a time when our foothold was rather less well established than it is now. Microsoft, which had not yet learned to love Linux, funded SCO and loudly bought licenses from the company. Magazines like Forbes were warning the "Linux-loving crunchies in the open-source movement" that they "should wake up". SCO was suggesting a license fee of $1,399 — per-CPU — to run Linux.
All of this was a campaign to create a maximal level of fear around Linux and, as a result, to put pressure on IBM to settle. It certainly succeeded to an extent. Such an effort, in less incompetent hands, could easily have damaged Linux badly. As it went, SCO, despite its best efforts, instead succeeded in improving the position of Linux — in development, legal, and economic terms — considerably.
The enduring effects
Consider the charge of directly-copied source code — one that SCO CEO Darl McBride loudly made in May of that year. At the time, there was not a lot of oversight applied to code going into the kernel; the project had only just begun using BitKeeper as its first version-control system, after all, and the maintainer hierarchy that has served the project so well was in its infancy. It seemed almost inevitable that, among the millions of lines of code poured into the kernel from an unknown number of sources, some would be found to have been copied from Unix; source for various Unix distributions was not hard to come by in those days. Richard Stallman allowed that: "In a community of over half a million developers, we can hardly expect that there will never be plagiarism". The real question seemed to be just how bad the damage would turn out to be.
The world waited for McBride to actually show all this copied code — sometimes said to be "millions of lines" — that he had found. The actual code turned out to be a snippet in the ia64 architecture subsystem. It undoubtedly shouldn't have been there; interestingly, it had already been removed by the time SCO fingered it. Beyond that, SCO's claims touched on code — read-copy-update and the Berkeley packet filter, for example — that could not possibly have come from anything it owned. When SCO was asked to put up its evidence, all that came out was a bunch of handwaving.
The important part is this, though: SCO's efforts and those it inspired put the Linux kernel code under the sort of microscope that few projects ever see. There were allegedly billions of dollars at stake, after all. But despite that incentive and all of the resources poured into inspecting the kernel source, nobody ever found all that copied code; it simply did not exist. SCO managed to prove the cleanliness of the kernel's pedigree in a far more convincing way than anybody else could have. Nobody now questions the legitimacy of the kernel's source code.
Another thing that is no longer questioned is the need for the free-software community to have lawyers on its side. It is not enough to be right; we have to be able to prove that we are right and deter potential attackers. The SCO lawsuit brought about a substantial increase in the legal resources available to the community, both within companies and in projects and related organizations. Anybody who hopes to extract rents from the free-software community now will face a strong, united, and capable defense.
A related change is the improved procedures that the community has adopted; just because SCO proved that the kernel's code was clean doesn't mean it will always be. The adoption of the developer's certificate of origin for kernel code is one obvious example; its purpose was to avoid the next SCO case. As Linus Torvalds said at the time:
People have been pretty good (understatement of the year) at debunking those claims, but the fact is that part of that debunking involved searching kernel mailing list archives from 1992 etc. Not much fun.

For example, in the case of "ctype.h", what made it so clear that it was original work was the horrible bugs it contained originally, and since we obviously don't do bugs any more (right?), we should probably plan on having other ways to document the origin of the code.
So, to avoid these kinds of issues ten years from now, I'm suggesting that we put in more of a process to explicitly document not only where a patch comes from (which we do actually already document pretty well in the changelogs), but the path it came through.
Many other projects have adopted similar procedures, most of which have the happy result of documenting the provenance of code without imposing heavy bureaucracy on the process. Efforts like SPDX are also partially motivated by the desire to avoid another SCO.
Long live the long-hair smellies
Perhaps the most significant outcome of this whole episode, though, is what it revealed about our community. If SCO wanted to scare developers and users into fleeing Linux, it certainly failed. While IBM waged a devastating campaign in the courts, it often seemed like many of the battles were won in the wider community; it turns out that, when thousands of developers and users join a fight against a common enemy, they can do amazing things.
Developers from across the community (occasionally referred to within SCO as the "long-hair smellies") put their time into debunking SCO's code-ownership claims, to great effect. Resources like Groklaw marshaled information for the defense and, just as importantly, informed the community about what was at stake and how the system works. Encouraged by this work, users stuck with Linux and refused to pay SCO's licensing demands. It was not just IBM's lawyers and money that won this fight; it was a widespread community that had built something special and had no intention of letting a failing company steal it.
Twenty years later, it is fair to say that Linux is doing a little better than The SCO Group. Its swaggering leader, who thought to make his fortune by taxing Linux, filed for personal bankruptcy in 2020. We survived a focused and determined attack that would have brought an end to many other enterprises, regardless of the injustice involved. But the SCO attack should never be forgotten, both because of the ways that it changed our community and because, despite our much stronger position now, it could happen again. The Linux community has created a vast amount of wealth, whether measured in code or in actual money; when that happens, there will always be those who wish to steal some of that wealth. Hopefully we will be as lucky when the time comes to fend off the next one.
Kernel time APIs for Rust
While the 6.3 kernel has gained more support for the Rust language, it remains true that there is little that can be done in Rust beyond the creation of a "hello world" module. That functionality was already available in C, of course, with a level of safety similar to what Rust can provide. Interest is growing, though, in merging actually useful modules written in Rust; that will require some more capable infrastructure than is currently present. A recent discussion on the handling of time values in Rust demonstrates the challenges — and opportunities — inherent in this effort.

Asahi Lina, who is implementing a graphics driver for Apple hardware in Rust, has posted a number of pieces of Rust infrastructure, including a module for timekeeping functions. The timekeeping module itself is just a beginning, weighing in at all of 25 lines; it looks like this:
    // SPDX-License-Identifier: GPL-2.0

    //! Timekeeping functions.
    //!
    //! C header: [`include/linux/ktime.h`](../../../../include/linux/ktime.h)
    //! C header: [`include/linux/timekeeping.h`](../../../../include/linux/timekeeping.h)

    use crate::bindings;
    use core::time::Duration;

    /// Returns the kernel time elapsed since boot, excluding time spent
    /// sleeping, as a [`Duration`].
    pub fn ktime_get() -> Duration {
        // SAFETY: Function has no side effects and no inputs.
        Duration::from_nanos(unsafe { bindings::ktime_get() }.try_into().unwrap())
    }

    /// Returns the kernel time elapsed since boot, including time spent
    /// sleeping, as a [`Duration`].
    pub fn ktime_get_boottime() -> Duration {
        Duration::from_nanos(
            // SAFETY: Function has no side effects and no variable inputs.
            unsafe { bindings::ktime_get_with_offset(bindings::tk_offsets_TK_OFFS_BOOT) }
                .try_into()
                .unwrap(),
        )
    }
This module expresses two kernel functions — ktime_get() and ktime_get_boottime() — as Rust equivalents that return values as the Rust Duration type. In C, these functions both return a ktime_t, which is a signed, 64-bit value reflecting a time in nanoseconds. The origin of that time — what real-world date and time is represented by a ktime_t of zero — varies depending on which clock is being queried. In the case of ktime_get_boottime(), for example, the returned value represents the time that has passed since the system booted.
Kernel times are thus, at their core, a delta value; they are a count of nanoseconds since some beginning-of-the-universe event. The proposed Rust implementation followed that lead in its use of the Duration type. But Thomas Gleixner, who is responsible for much of the kernel's timekeeping code, questioned this approach. Since both of the functions are meant, each in its own way, to represent an absolute point in time, he suggested that the Rust functions should return an absolute-time type; he suggested either Instant or SystemTime. Both represent absolute time values; Instant is monotonic (it will never go backward) while SystemTime is not.
That approach will not work well in the kernel, though, for a couple of reasons. The first, as pointed out by Boqun Feng, is that those two types are defined in the Rust standard library ("std"), which, just like the C standard library, is not available in the kernel. So, at best, those two types would have to be reimplemented for kernel use. But the other problem is that the kernel supports a multitude of clocks; the list can be found in the clock_gettime() man page. Each clock has a reason for existing, and each behaves a little differently. A type like Instant is defined to use exactly one clock, but kernel code will need access to several of them.
Gleixner was not really concerned about the exact types used, but he did call for different types to be used for absolute times (timestamps) and delta times (or intervals). The kernel does not currently have such a distinction in its types for times, but libraries for many languages do make that distinction. Given that Rust is being brought into the kernel in the hope of making it easy to write safer code, it makes sense to use Rust's type system to prevent developers from, for example, trying to add two absolute-time values together.
Indeed, as Lina pointed out, type safety in the Rust interface should even go one step further. Subtracting one absolute time from another will yield a delta time — but that delta time will only make sense if the two absolute times came from the same clock. So the type system should prevent the incorrect mixing of times from different clocks.
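To make that concrete, here is a minimal sketch (the types and names are hypothetical illustrations for this article, not code from the patches under discussion) of how a zero-sized, per-clock marker type can let the compiler reject cross-clock arithmetic:

    use core::marker::PhantomData;
    use core::ops::Sub;
    use core::time::Duration;

    // Hypothetical zero-sized marker types, one per kernel clock.
    pub struct Monotonic;
    pub struct BootTime;

    // An absolute time, tagged at compile time with its source clock.
    pub struct Instant<C> {
        nanos: u64,
        _clock: PhantomData<C>,
    }

    impl<C> Instant<C> {
        pub fn from_nanos(nanos: u64) -> Self {
            Instant { nanos, _clock: PhantomData }
        }
    }

    // Subtraction is defined only between instants from the same clock,
    // and it yields a delta (Duration) rather than another absolute time.
    impl<C> Sub for Instant<C> {
        type Output = Duration;
        fn sub(self, rhs: Self) -> Duration {
            Duration::from_nanos(self.nanos - rhs.nanos)
        }
    }

    fn main() {
        let t1 = Instant::<Monotonic>::from_nanos(1_000_000);
        let t2 = Instant::<Monotonic>::from_nanos(3_000_000);
        let _delta: Duration = t2 - t1;    // fine: same clock
        let boot = Instant::<BootTime>::from_nanos(0);
        // let _bad = t1 - boot;           // compile error: mismatched clocks
        // let _bad = t1 + t2;             // compile error: no Add for Instant
        let _ = boot;
    }

With this shape, subtracting two Monotonic instants yields a Duration, while mixing clocks, or adding two absolute times, simply fails to compile; that is exactly the class of mistake the type system is being asked to rule out.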
What about delta times? Gleixner initially suggested that time deltas could be independent of any clock; a time delta obtained by subtracting one CLOCK_BOOTTIME value from another would be the same type as a delta calculated as a difference of CLOCK_TAI values. Heghedus Razvan agreed with this idea and posted a sample implementation; Gary Guo then polished that idea into a "more Rusty" implementation. Miguel Ojeda, though, suggested that delta times, too, could be tied to a specific clock. Gleixner was not entirely convinced that this distinction was needed, but agreed that there might be value in it, especially when dealing with timers. Kernel timers, too, can be based on specific clocks, so it might make sense to ensure that any time deltas used with those timers are generated with the same clock, he said.
Feng suggested proceeding with an implementation using Duration for all time delta values and something resembling Instant, but with clock-specific typing, for absolute times. Lina agreed, and seems ready to proceed in this direction, starting with the example posted by Razvan. A new patch will, presumably, be forthcoming.
It seems likely that we will see this sort of conversation happening repeatedly as more Rust infrastructure heads toward the mainline. It is certainly possible to reproduce something like the existing kernel APIs in Rust, and doing so would make the resulting Rust code look more familiar to current kernel developers. But that approach also risks losing much of the hoped-for benefit that is driving the Rust experiment in the first place. Doing this experiment properly will require showing how Rust can lead to the creation of better, safer APIs than what the kernel has now. So a lot of APIs are going to have to be redesigned from the beginning; they can benefit from years of experience in the kernel community, but will have to leave many of that community's conventions behind. It seems like a project that will keep people busy for some time.
The rest of the 6.3 merge window
Linus Torvalds released 6.3-rc1 and closed the 6.3 merge window as expected on March 5. By that time, 12,717 non-merge commits (and 848 merges) had found their way into the mainline kernel; nearly 7,000 of those commits came in after the first-half merge-window summary was written. The second half of the 6.3 merge window was thus a busy time, with quite a bit of new functionality landing in the mainline.

Some of the most significant changes merged during this time are:
Architecture-specific
- RISC-V kernels can use the "ZBB" bit-manipulation extension, when present, to accelerate string operations.
- User-mode Linux (on x86-64 systems) now supports code written in Rust.
- LoongArch has gained support for kernel relocation, kernel address-space layout randomization, hardware breakpoints and watchpoints, and kprobes.
Core kernel
- It is now possible to create a non-executable memfd and prevent execute permission from being enabled thereafter.
- The DAMOS memory-management interface has a new "filters" option; there is some documentation in this commit.
- The new PR_SET_MDWE prctl() operation will cause any attempts to enable both write and execute permissions on memory in the target process to be denied; see this commit message for more information.
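A process opts into that last restriction with a single prctl() call. Here is a minimal user-space sketch in Rust using the libc crate; the constant values are hand-copied, since libc may not export them yet, and should be verified against include/uapi/linux/prctl.h:

    // Constants from include/uapi/linux/prctl.h in 6.3 (hand-copied; verify).
    const PR_SET_MDWE: libc::c_int = 65;
    const PR_MDWE_REFUSE_EXEC_GAIN: libc::c_ulong = 1;

    fn main() -> std::io::Result<()> {
        // After this call succeeds, any attempt to enable both write and
        // execute permissions on memory in this process will be denied;
        // the setting is one-way and cannot be cleared.
        let ret = unsafe { libc::prctl(PR_SET_MDWE, PR_MDWE_REFUSE_EXEC_GAIN, 0, 0, 0) };
        if ret != 0 {
            return Err(std::io::Error::last_os_error());
        }
        println!("write-xor-execute is now enforced for this process");
        Ok(())
    }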
Filesystems and block I/O
- The NFS filesystem (both the client and server sides) has gained support for AES-SHA2-based encryption.
- The filesystems in user space (FUSE) subsystem has a new request extension mechanism that can be used to put additional information onto a request. The first use is to add supplementary groups to a filesystem request.
- Christian Brauner has been added as a co-maintainer of the virtual filesystem (VFS) layer.
Hardware support
- Clock: MediaTek MT7981 basic clocks, Qualcomm SM6350 camera clock controllers, Qualcomm SM8550 TCSR clock controllers, Qualcomm SM8550 display clock controllers, Qualcomm SA8775 and QDU1000/QRU1000 global clock controllers, and NXP BBNSM realtime clocks.
- Graphics: Orise Technology ota5601a RGB/SPI panels, Visionox VTDR6130 1080x2400 AMOLED DSI panels, HIMAX HX8394 MIPI-DSI LCD panels, Intel "versatile processing unit" inference accelerators, and AUO A030JTN01 320x480 3.0" panels. Also, several ancient drivers (i810, mga, r128, savage, sis, tdfx, and via) have been removed; they were all marked as being obsolete in 2016.
- Industrial I/O: TI TMAG5273 low-power linear 3D Hall-effect sensors, TI LMP92064 and ADS7924 analog-to-digital converters, Maxim MAX5522 digital-to-analog converters, and NXP IMX93 analog-to-digital converters.
- Media: OmniVision OV8858 image sensors and Sony IMX296 and IMX415 sensors.
- Miscellaneous: Unisoc UFS SCSI host controllers, Intel MAX 10 board management controllers with PMCI, Kinetic KTZ8866 backlight controllers, Microchip 8250-based serial ports, Ultrasoc system memory buffers, CoreSight trace, profiling, and diagnostics monitors, Qualcomm SDM670 and SA8775P interconnects, Freescale i.MX6/7/8 PCIe controllers in endpoint mode, Richtek RT9471 and RT9467 battery chargers, Loongson LS2X I2C adapters, GXP I2C Interfaces, Xilinx DMA/Bridge subsystem DMA engines, StarFive JH71XX power-management units, Renesas RZ/V2M external power sequence controllers, Allwinner D1 PPU power domain controllers, MediaTek SoC regulator couplers, Qualcomm ramp controllers, and Qualcomm PMIC GLINK interfaces.
- USB: Renesas RZ/N1 USB function controllers, Renesas USB3.1 DRD controllers, Qualcomm SNPS eUSB2 PHYs, and Qualcomm SNPS eUSB2 repeaters.
- Also: there is a new pata_parport driver that can manage IDE drives connected by way of a parallel port. With this driver, much of the work is done by the ATA layer, and the old PARIDE drivers have been removed. In this new world, it is no longer possible to put both drives and a printer on the port; this commit has some more information. (Ask your parents if you're unfamiliar with parallel ports and IDE drives.)
Miscellaneous
- The Nolibc library has gained support for the s390 architecture and the Arm Thumb1 instruction set.
- The new hwnoise tool can measure timing jitter caused by hardware; this commit contains a man page describing its operation.
- The perf tool has seen a number of improvements; this merge message has details.
- The kernel can now include a built-in Dhrystone benchmark test.
Virtualization and containers
- KVM on x86 has gained the ability to support Hyper-V extended hypercalls. These calls are implemented by passing them through to user space on the host side.
- Also on x86, KVM has made it easier to restrict the performance-monitoring-unit (PMU) events that are available to the guest; this commit has more information.
Internal kernel changes
- The internal __GFP_ATOMIC memory-allocation flag has been removed. Almost nobody should notice the change.
- The (default) V=0 make option has been removed. The V=1 (verbose mode) and V=2 (show the reason why a target was rebuilt) options can now be selected together as V=12.
- There has been a minor change to the developer's certificate of origin to make it clear that patches submitted using a nickname are acceptable.
- The internal representation of capabilities has been changed to a simple u64 value. The previous (array) representation had been designed to allow for more capabilities in the future, but as Torvalds noted in the changelog, "the last thing we want to do is to extend the capability set any more".
- Support for building with the Intel ICC compiler has been removed. It has seemingly been broken for some time and nobody noticed.
The next seven or eight weeks will be spent stabilizing this new code in preparation for the 6.3 release, which should happen on April 23 or 30.
BTHome: An open standard for broadcasting sensor data
Many wireless sensors broadcast their data using Bluetooth Low Energy (BLE). Their data is easy to receive, but decoding it can be a challenge. Each manufacturer uses its own format, often tied to its own mobile apps. Integrating all of these sensors into a home-automation system requires a lot of custom decoders, which are generally developed by reverse-engineering the protocols. The goal of the BTHome project is to change this: it offers a standardized format for sensors to broadcast their measurements using BLE. BTHome is supported by the Home Assistant home-automation software and by a few open-firmware and open-hardware projects.
The chances are high that the manufacturer of a BLE device requires the use of a smartphone app to remotely view its data. But, technically, there's no need to use the app. The device advertises its name and some data; anyone with a BLE receiver in the neighborhood is able to pick up those BLE advertisements. What those apps do is to convert the raw data to information such as a temperature or humidity value using a protocol decoder for the proprietary data format.
Reverse-engineering the protocol used by these BLE sensors is often not that difficult: just pick up some BLE advertisements, look at the decoded information in the app or on the display if the device has one, and try to figure out the mapping between the data and the decoded information. Most of these protocols aren't encrypted or obfuscated in any way. Once the protocol has been figured out, a decoder can be developed and the app isn't needed anymore.
However, for all of those sensors to be supported in home-automation systems, these systems have to add decoders for all of those proprietary formats. For example, Home Assistant has decoders for BLE devices made by Xiaomi, Govee, Mopeka, ThermoPro, Inkbird, and more. Every time one of these manufacturers launches a new device or changes its protocol, the decoder has to be adapted, not only in Home Assistant, but also in many other home-automation projects.
One decoder to parse them all
One can't hope to change the way that commercial BLE device manufacturers transmit their data, but a few open-source do-it-yourself projects for BLE sensors have started to use their own formats too. So Ernst Klamer, who had been working on Bluetooth support in Home Assistant for a long time, created his own standardized data format for devices to send sensor data over BLE. This was added to Home Assistant 2022.9 and launched as a new open standard for sending sensor data: BTHome.
Last year, Raphael Baron's soil-moisture sensor b-parasite and pvvx's ATC_MiThermometer firmware (for various BLE temperature sensors made by Xiaomi) adopted the BTHome format. So these two devices don't need a custom protocol decoder anymore in Home Assistant, just the BTHome integration. Any other devices that implement BLE advertisements using the BTHome format are also automatically recognized by Home Assistant.
It's important to know BTHome's scope. It only defines a format for BLE advertisements, which means it is for devices that simply broadcast their data. Advertisements are the fundamental way that BLE devices communicate; they are also used in the discovery of these devices. Another BLE communication mechanism uses a connection: one device connects to the other one and accesses the latter's services to read and write data. BTHome does not use these BLE connections.
As a result, BTHome is for one-to-many unidirectional communication: a device broadcasts its sensor data and many other devices can pick up that data. There's no way to return data to a BTHome sensor or to control the device by sending commands. So to light an LED or turn on a switch, something other than BTHome is needed.
Flexible format
An advertising packet in BLE is a sequence of one or more advertising data structures. Each of these starts with a length byte specifying the number of bytes that follow, then a byte describing the type of the data structure, and then the data itself. For example, there's a "Flags" type indicating the capabilities of the device; there are also "Shortened Local Name" or "Complete Local Name" types to advertise the device's name.
Another type of advertising data structure is the "Service Data with 16-bit UUID", which starts with a 16-bit UUID designating the type of data or the company advertising the data; there are other types of Service Data with UUIDs of different lengths. These UUIDs are assigned by the Bluetooth Special Interest Group that oversees the Bluetooth standards. After the UUID, there is data that follows a format defined by either the BLE standard or the company, depending on the UUID.
A BTHome device broadcasts its sensor measurements as part of the Service Data. Allterco Robotics, the manufacturer of Shelly devices for home automation, has sponsored the UUID 0xFCD2 for BTHome. It's free to use for everyone with a license that restricts the use of the UUID to implementations of the BTHome protocol.
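Putting those pieces together, a receiver walks the advertising data structures and pulls out the Service Data that carries BTHome's UUID. Here is a minimal sketch in Rust (a hypothetical helper written for this article, not taken from any of the projects mentioned; 0x16 is the Bluetooth-assigned AD type for 16-bit-UUID Service Data):

    /// AD type for "Service Data - 16-bit UUID".
    const AD_TYPE_SERVICE_DATA_16: u8 = 0x16;
    /// BTHome's allocated 16-bit UUID (transmitted little-endian).
    const BTHOME_UUID: u16 = 0xFCD2;

    /// Walk the advertising payload's [length, type, data...] structures
    /// and return the BTHome service data (after the UUID), if present.
    fn find_bthome_service_data(adv: &[u8]) -> Option<&[u8]> {
        let mut rest = adv;
        while let [len, tail @ ..] = rest {
            let len = *len as usize;
            if len == 0 || len > tail.len() {
                return None; // malformed advertisement
            }
            let (body, next) = tail.split_at(len);
            // body[0] is the AD type; the rest of body is the AD data.
            if body[0] == AD_TYPE_SERVICE_DATA_16 && body.len() >= 3 {
                // The 16-bit UUID is transmitted little-endian.
                if u16::from_le_bytes([body[1], body[2]]) == BTHOME_UUID {
                    return Some(&body[3..]);
                }
            }
            rest = next;
        }
        None
    }

Whatever this returns is the BTHome measurement payload described below.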
The data format after the UUID bytes is described in detail on BTHome's web site. It supports dozens of sensor types (including temperature, humidity, speed, and voltage), binary sensor types (which can be true or false, such as door open or closed, or power on or off), and events (button presses, double-presses, long-presses, and others).
The payload always starts with an object ID designating the sensor type and then the data. One packet of BTHome Service Data can include multiple measurements of the same or different types. Because each object ID corresponds to data with a known, fixed length (1 to 4 bytes), a parser just has to read the object ID, the known number of bytes for the data, then the next object ID, the corresponding data, and so on, until the end of the Service Data. An example is shown in the diagram below.
The specification has some extra requirements that enable compatibility with future versions. Object IDs need to occur in numerical order (from low to high) in a BTHome advertisement. If a new version of BTHome adds a new sensor type, its object ID will be a higher number than the ones supported in earlier versions. And if a BTHome parser encounters an unknown object ID, it has to stop parsing the advertisement. Thanks to the ordering of the object IDs, the supported sensor data is still parsed, because it comes before the unknown object ID.
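Those rules (fixed per-ID lengths, low-to-high ordering, and stop on unknown) make a parser almost mechanical. The following sketch is, again, hypothetical code for illustration; it knows only three object IDs from the specification, where a real parser would carry the full table from bthome.io:

    /// Data lengths for a few BTHome object IDs.
    fn object_len(id: u8) -> Option<usize> {
        match id {
            0x01 => Some(1), // battery level, uint8, percent
            0x02 => Some(2), // temperature, sint16, factor 0.01
            0x03 => Some(2), // humidity, uint16, factor 0.01
            _ => None,       // unknown ID: stop parsing, per the spec
        }
    }

    /// Parse (object ID, raw data) pairs out of BTHome service data.
    /// Unknown IDs end the walk; thanks to the low-to-high ordering
    /// rule, everything this version understands has already been seen.
    fn parse_objects(mut data: &[u8]) -> Vec<(u8, &[u8])> {
        let mut out = Vec::new();
        while let [id, rest @ ..] = data {
            let Some(len) = object_len(*id) else { break };
            if rest.len() < len {
                break; // truncated packet
            }
            let (value, next) = rest.split_at(len);
            out.push((*id, value));
            data = next;
        }
        out
    }

The raw values are little-endian; a temperature object's two bytes, for instance, would be decoded as i16::from_le_bytes([value[0], value[1]]) and multiplied by the specification's 0.01 scale factor to get degrees Celsius.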
Projects using BTHome
Apart from the already mentioned b-parasite and ATC_MiThermometer projects, there are not many devices that currently implement the BTHome standard. But there are some that can be used as an example for other implementations in various programming languages.
A simple but interesting example is the BTHome Electricity Meter for the Puck.js device. It is a Bluetooth beacon with lots of sensors and I/O pins that is running the Espruino firmware, which allows the microcontroller to be programmed using JavaScript. The BTHome Electricity Meter code reads total energy usage (in kWh) and live power consumption (in W) from an electricity meter with an impulse light, using a light-dependent resistor connected to one of the Puck.js's I/O pins. The JavaScript code is easy to follow as an example of how to encode the data in BTHome advertisements. For other JavaScript projects, Ron Šmeral's bthome.js is an interesting library that could be used to build on.
BTHome Weather Station is an ESP-IDF project for the ESP32, a popular microcontroller with WiFi and Bluetooth built in. The project's code reads the temperature, humidity, and pressure from a BME280 sensor connected over I²C to the microcontroller board, and advertises the sensor measurements over BLE using BTHome. The same code can be used for a lot of other types of sensors. The bthome-bme680 project has borrowed this code to advertise measurements from the BME680 gas sensor, also on an ESP32 board. And Ole Sæther, working for Nordic Semiconductor, has created an implementation using the Zephyr real-time operating system that works on his employer's nRF52840 BLE SoC: bthome_nrf_template.
However, when researching BTHome implementations, make sure that they're not using the BTHome legacy (v1) format, as this should not be used for new projects. For example, the ESP32_BTHome project advertising a temperature and counter with the NimBLE-Arduino library on an ESP32 microcontroller board is still using the legacy format, as is the bthome_nrf52840 project for Nordic Semiconductor's nRF52840 SoC.
While the previous examples are for devices that advertise BTHome payloads, for parsing there are even fewer options. The canonical parser implementation is Home Assistant's bthome-ble Python library. This can be used to decode BLE advertisements in other Python programs and to decode BTHome payloads. Ardelean Călin Petru has done some initial work on a Rust implementation: bthome-rs. As an exercise in understanding the BTHome format, I created a BTHome format description in Kaitai Struct. Because Kaitai Struct is also a parser generator, this description can be turned into a BTHome parser for the 11 programming languages it supports.
Encrypted advertisements
By default, anyone in the neighborhood of a BTHome device can pick up its sensor measurements and decode them. For more security-conscious applications, the BTHome format also supports encrypted advertisements. This works using a pre-shared key that is set up in the device and then also configured in the receiver (such as in Home Assistant's BTHome integration). The sensor data in the advertisements cannot be decrypted without the key. Encryption is done using the Advanced Encryption Standard (AES) algorithm in CCM mode.
None of the example projects seem to implement encryption, so unfortunately there's no clear example to build on. The BTHome Weather Station project mentions encryption under Future Work, but explains that the documentation is pretty scarce and that there's only one example Python script, which is contained in the bthome-ble repository. It encodes some test data with temperature and humidity values, encrypts the payload, shows the encryption algorithm's parameters, decrypts the result, and decodes the data, showing the same temperature and humidity values.
Integrating BTHome sensors in Home Assistant
As a test for how easy it would be to implement the BTHome format, I created a BTHome project in CircuitPython that advertises movement detected by the inertial sensor built into the Seeed Studio XIAO nRF52840 Sense board. The BTHome format specification was quite clear and, with CircuitPython's low-level BLE module _bleio, it was easy to translate the sensor data to the corresponding advertising payload.
After adding the BTHome integration, my Home Assistant installation immediately recognized the Seeed Studio XIAO nRF52840 Sense board. When the board gets moved, its status is shown in Home Assistant's dashboard. So BTHome delivers on its promise of out-of-the-box support for do-it-yourself BLE sensors using the format for their advertisements.
Building an ecosystem around an open standard
The BTHome format is still quite young and, so far at least, there is no large ecosystem around it. However, with two popular device projects adopting it, there's hope that it will get some visibility. Instead of everyone inventing their own custom data format, having a flexible format specification such as BTHome for sensor measurements can help a lot of projects.
For device makers, it means out-of-the-box support in Home Assistant and potentially other home-automation controllers. The BTHome project is also quite responsive to suggestions for adding new sensor types. For developers of home-automation controllers or BLE apps, the format means support for a lot of different device types. Seeing a random BTHome sensor automatically recognized in Home Assistant provides an example for other home-automation projects to aspire to.
Page editor: Jonathan Corbet