LWN.net Weekly Edition for June 6, 2019
Welcome to the LWN.net Weekly Edition for June 6, 2019
This edition contains the following feature content:
- Fun with LEDs and CircuitPython: a PyCon talk on the joys of hardware-level programming in Python.
- Seeking consensus on dh: the Debian project debates mandating the use of a package-building helper.
- A ring buffer for epoll: a proposed scalability improvement for epoll that inevitably adds another ring buffer to the kernel API.
- Yet another try for fs-verity: a reworked file-protection scheme that appears to have addressed the biggest concerns raised by its predecessors.
- How many kernel test frameworks?: is the kernel getting too many separate test frameworks?
- SIGnals from KubeCon: a pair of KubeCon sessions describing the work of two Kubernetes special interest groups.
This week's edition also includes these inner pages:
- Brief items: Brief news items from throughout the community.
- Announcements: Newsletters, conferences, security updates, patches, and more.
Please enjoy this week's edition, and, as always, thank you for supporting LWN.net.
Fun with LEDs and CircuitPython
Nina Zakharenko has been programming for a long time; when she was young she thought that "the idea that I could trick computers into doing what I tell them was pretty awesome". But as she joined the workforce, her opportunities for "creative coding" faded away; she regained some of that working with open source, but tinkering with hardware is what let her creativity "truly explode". It has taken her years to get back what she learned long ago, she said, and her keynote at PyCon 2019 was meant to show attendees the kinds of things that can be built with Python—starting with something that attendees would find in their swag bag.
As part of her shift in thinking, she realized that "software doesn't have to be serious"; it can be used to make art, for example. But she also realized that hardware doesn't need to be serious either, putting up a clip from a YouTube video of "The Breakfast Machine" created by "an incredible maker", Simone Giertz. She showed pictures of some of her own projects (which can be seen in her Speaker Deck slides), such as an Arduino-based iridescent LED headdress.
While hardware is a recent interest of hers, she has long been "wary of the ephemeral nature of software"; you can't hold it in your hands the way you can with a hardware device. Beyond that, she "wanted to make things that fit with my aesthetic". She pointed out that she has pink hair, pink glittery nails, and some glittery shoes that she showed off to the audience (seen in the photo). Hardware lets her imagine something that fits that aesthetic, build it, and wear it.
She described herself as a software engineer, who has worked for over a decade at some large companies "that you may have heard of", such as HBO, MeetUp, and Reddit. She is now at Microsoft as a senior cloud-developer advocate; her goal there is to make Visual Studio Code (VSCode) and Azure "a pleasure to use for Python developers". This was her seventh PyCon, Zakharenko said, and it is a "dream come true" to be on stage at PyCon "sharing what I am passionate about with all of you".
Python on hardware
She had already mentioned Arduino code, which is written in a C++ variant, that she had used for some of her projects. But C++ is hard to learn and error-prone, she said. We know from experience that Python is the opposite; it is easy to learn and beginner friendly. "We want to be able to program devices with Python". Luckily, she said, in each attendee's swag bag, they should find a Circuit Playground Express device. "I am going to show you how to program it."
She then did a live demo of the device, using a camera to display the device itself while typing Python at it in another window. One of the nice things about the device is how easy it is to get started writing programs for it. Hooking it up to a computer via USB provides power to it, a mountable storage device where programs can be copied to run on it, and a serial console. The console is where program output can be seen and the read-eval-print loop (REPL) can be accessed. She started by copying a five-line program to the mounted device as code.py; the device immediately ran the code resulting in the ten multi-color LEDs all being set to red. The program looked like the following:
from adafruit_circuitplayground.express import cpx
RED = (255, 0, 0)
cpx.pixels.brightness = 0.1
while True:
    cpx.pixels.fill(RED)
With a small change to the code, she reloaded it and all the LEDs were set to green. "Easy peasy." As might be guessed, the change she made was to add a line below the RED definition:
GREEN = (0, 255, 0)
She also changed the fill() call to use GREEN.
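Putting those two changes together, the green version of the program, reconstructed from her description, would look roughly like this:
from adafruit_circuitplayground.express import cpx

RED = (255, 0, 0)
GREEN = (0, 255, 0)

cpx.pixels.brightness = 0.1
while True:
    cpx.pixels.fill(GREEN)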
There are other devices that run Python, she said, including the Raspberry Pi Zero W, BBC micro:bit, and the Adafruit M0 and M4. Adafruit is also behind the Circuit Playground Express; she is a big fan of the company, which produces low-cost, well-documented products for makers.
MicroPython (which LWN looked at back in 2015) is a variant of Python 3 that targets microcontrollers; it is compact enough to fit in 256KB of code space and run with 16KB of RAM. CircuitPython is an education-friendly fork of MicroPython created by Adafruit; it is what runs on the Playground Express device. Importantly, she said, both are open source.
Circuit Playground Express
The Circuit Playground Express is a $25 device that is meant to be a learning platform. Depending on where people get their devices, it may not have CircuitPython preinstalled, but that is not a problem for PyCon attendees as those in the swag bags are Python-ready. PEP 206 describes Python's "batteries included" philosophy, which means that Python has a rich standard library right out of the box (though it should be noted that there are some current efforts to potentially rethink the idea, which arose in a session at the 2018 Python Language Summit).
While the Playground Express device doesn't actually come with batteries, Zakharenko said with a grin, the philosophy behind it is the same. It comes with "everything you need to get started programming hardware on one board". She gave a bit of a tour of the board by pointing to various peripherals that it has. There are the ten NeoPixel multi-color LEDs, two buttons, a slide switch, a speaker and microphone, light and temperature sensors, and an accelerometer, all of which are accessible from CircuitPython.
The best feature of the board is that there is no soldering required, "which means it is great for kids, but it is also great for clumsy adults". There are connection pads that can be used with alligator clips or even sewn into a garment with conductive thread. Some of the pads are capacitive touch pads that will report True if they are touched (such as by a finger or some object connected with an alligator clip).
Programming the colors of the LEDs is straightforward, as long as you understand some basics of the color wheel, she said. Each LED takes a tuple of three values, from zero to 255, that represents the amount of red, green, and blue to display. Using that, you can display over 16 million colors on each of the ten NeoPixels. The LEDs are made up of three LED elements, one for each color; the eye mixes the colors that are produced by the elements to make a single color. She showed the operation of a NeoPixel under a microscope to illustrate how it worked.
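As a quick sketch of how that mixing works (continuing from the earlier example; the indexed-pixel assignment is part of the Adafruit library, though it was not shown in the talk):
YELLOW = (255, 255, 0)          # red and green elements lit together
cpx.pixels.fill((0, 0, 255))    # all ten NeoPixels pure blue
cpx.pixels[0] = YELLOW          # set just the first NeoPixel to yellow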
It should be noted that not all micro-USB cables support both charging and data, she said, so it is important to use one that does both. The USB "drive" that shows up when the device is plugged in needs to be mounted. Then the code.py (or main.py) file in the root directory can be edited (or copied over) and the code will be reloaded and run whenever that file changes. She stepped through her original example, explaining each line; the import statement gets the Adafruit library that provides easy access to the hardware. She noted that the pixel brightness is high by default, so setting cpx.pixels.brightness to a lower value (0.1 in the example) is helpful. The infinite loop is there because the fill() function simply lights the LEDs once; for persistence they need to be refreshed continuously.
Demos
She then moved on to a more complicated example that used the buttons to control which color would display on the pixels. She loaded the code from a file on her laptop; the relevant section looked like:
if cpx.button_a:
    cpx.pixels.fill(RED)
elif cpx.button_b:
    cpx.pixels.fill(BLUE)
else:
    cpx.pixels.fill(0)
She demonstrated that code working (it reloaded as soon as she saved it from her editor), then in a bit of a live-coding exercise added a definition for GREEN and some code to use that color if both buttons were pressed.
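A minimal sketch of that addition (a fragment, like the snippet above, with RED and BLUE already defined) might look like:
GREEN = (0, 255, 0)

if cpx.button_a and cpx.button_b:
    cpx.pixels.fill(GREEN)
elif cpx.button_a:
    cpx.pixels.fill(RED)
elif cpx.button_b:
    cpx.pixels.fill(BLUE)
else:
    cpx.pixels.fill(0)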
She continued in that vein for a bit, making a version that cycled through the colors based on button presses, which had a simple button debounce mechanism by using time.sleep(0.2). She also added yellow, cyan, and purple (or magenta) as colors.
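Her exact code was not shown in full, but a rough sketch of a color-cycling loop with that sleep-based debounce might look like this:
import time
from adafruit_circuitplayground.express import cpx

COLORS = [
    (255, 0, 0),    # red
    (0, 255, 0),    # green
    (0, 0, 255),    # blue
    (255, 255, 0),  # yellow
    (0, 255, 255),  # cyan
    (255, 0, 255),  # purple/magenta
]

cpx.pixels.brightness = 0.1
current = 0
while True:
    if cpx.button_a:
        current = (current + 1) % len(COLORS)
        time.sleep(0.2)     # crude debounce, as in the talk
    cpx.pixels.fill(COLORS[current])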
Any editor can be used, but she cautioned that some editors do not fully write out the file when a save is done, which can cause problems. She uses VSCode, since it is the editor she already uses for Python after she switched from Emacs a few months back (though she still has Emacs keybindings in VSCode). She also pointed to Mu, which is a simple Python editor for beginners.
In order to see any program output, including tracebacks if there are errors in the Python code, you need to connect to the serial console. The easiest way to do so is to use Mu's built-in serial console. A single button in the interface will provide a second pane in its window with the console output. More advanced users may want to use Screen or another terminal program instead. Typing control-c in the terminal will send a keyboard interrupt to the running program and give access to the REPL; as might be expected, control-d will return to running the existing program.
The Playground Express can be powered via USB, but it also has a two-pin JST connector for battery power. It needs 3-5V, so a battery pack that holds three AAA batteries can be used for portable power. In addition, advanced users can consider a lithium polymer (LiPo) battery, which is small, light, and energy dense, but also somewhat fragile; without proper care, it can explode or catch on fire.
She showed another demo using code that had come from another Playground Express device that she had. It had two modes, which could be chosen based on the position of the slide switch. In both modes it individually changed the LED colors in a kind of spinning rainbow, but it would also act as a capacitive touch "piano" in the second mode. Examining the code will show it accessing some of the other peripherals and printing out readings from various sensors.
For those who want to "program" without typing code, she recommended Microsoft MakeCode. It is a block-based programming environment. There is an Adafruit MakeCode site that allows one to use the programming environment without any hardware at all. There is an emulator that can be used to interact with the "device".
Diversity
She gave some statistics based on a survey that the BBC did after it distributed micro:bit devices to schoolchildren in the UK. It found that 90% of students thought that the device showed them that anyone can code; 86% thought that the micro:bit made computer science more interesting. In a statistic that hit close to home, Zakharenko said, 70% more girls said that they would take up computer science as a subject option after using the micro:bit; that brought a round of applause from the audience.
A study called "Unlocking the Clubhouse" suggested that traditional computing culture is a boy's club that is unfriendly to women; it concluded that we need to find ways to unlock this clubhouse to make it accessible. Another study, "LilyPad in the wild", suggested building new clubhouses; rather than trying to fit new people into existing cultures, "it may be more constructive to spark and support new cultures". The idea is to make room for more diverse interests and passions; "diversity does not stop at gender".
She would like to see the "Python on hardware" community encourage everyone to participate, regardless of their gender, ethnicity, socioeconomic status, age, or any of a number of other factors. The overall hardware community is "really great" about sharing what it does with others, providing step-by-step guides to help build a wild assortment of different things. Python on hardware is really just getting started at this point, she said.
She quoted Madison Maxey, the founder of smart-fabric technology maker Loomia, who said: "You can be a creator and artist, not just a consumer". She also mentioned Angela Sheehan, who created infinity mirror heart heels "and a lot more". Zakharenko said that she really looked up to Sheehan as she follows the same passion, "making things that fit her aesthetic". That is the nature of hardware, Zakharenko said, it allows for instantaneous results in the real world, which helps make technology more accessible. She showed one of her favorite pictures: two girls with huge, beaming smiles who were wearing the light-up masks they built as part of a Made By Girls summer camp program. E-textiles, sewable electronics, papercraft, and hardware focused on education all help expose technology in new and unexpected ways, she said.
She noted that eagle-eyed attendees may have noticed that she is wearing one of her projects, pyearrings, which are LED earrings powered by Python. They are made with Adafruit Gemma M0 boards and two different LED rings, along with a LiPo battery. She turned them on to loud applause. They can be seen in the photo above.
In closing, she quoted Lawrence Durrell: "Our inventions mirror our secret wishes." That quote really drives it home, she said: "we want to build things that reflect us". She said that anyone in the room could start on that process using the Circuit Playground Express that they got from the conference sponsors; she is not an expert, just a hobbyist, but you don't need to be an expert.
"I implore you to learn, make things, create, teach others, share your knowledge; it is a fantastic cycle that we can all participate in." She suggested possibly donating the board to someone from a group that is underrepresented in technology and showing them how to get started with it. She recommended using the (almost) unused "#PythonHardware" hashtag to share results and ideas; the tag had only been used once by the time of the keynote, by Raymond Hettinger after getting a preview of Zakharenko's talk.
The device is compelling and fun; after playing around with it for a bit after the conference, it is clear that anyone with an interest could get started with it quickly—especially with a jump start from someone in the community.
A YouTube video of the talk is available.
[I would like to thank LWN's travel sponsor, the Linux Foundation, for travel assistance to Cleveland for PyCon.]
Seeking consensus on dh
Debian takes an almost completely "hands off" approach to the decisions that Debian developers (DDs) can make in regard to the packaging and maintenance of their packages. That leads to maximal freedom for DDs, but impacts the project in other ways, some of which may be less than entirely desirable. New Debian project leader (DPL) Sam Hartman started a conversation about potential changes to the Debian packaging requirements back in mid-May. In something of a departure from the Debian tradition of nearly endless discussion without reaching a conclusion (and, possibly, punting the decision to the technical committee or a vote in a general resolution), Hartman has instead tried to guide the discussion toward reaching some kind of rough consensus.
The question revolves around an adjunct to the debhelper tool that is used to build many Debian packages. The additional tool is a "command sequencer" for debhelper commands; it is called dh. Debhelper has commands that get invoked from the rules file that is used to build a .deb from the source code and other files that are part of a Debian package. By default, dh steps through a sequence of debhelper commands that should suffice to build many types of packages; if some of the steps need overrides or changes, that can be handled as well. In effect, dh encapsulates the standard way to build a Debian package using debhelper.
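For reference, the canonical minimal debian/rules file that hands the entire build over to dh looks like the following; packages that need to deviate from the default sequence add override_dh_<command> targets on top of it:
#!/usr/bin/make -f
%:
	dh $@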
But not all packages use dh, so Hartman asked whether the distribution wanted to require, or at least recommend, the use of dh. In that posting to debian-devel, he noted that some have said that a package not using dh has a "package smell", which is an indication that the maintainers should consider fixing it. His question might ultimately boil down to "whether maintainers should be expected to apply well-written patches to convert a package to using dh".
Hartman noted that there will likely always be exceptions for some packages where using dh does not make sense. He also summarized some of the reasons for and against the idea. Using "dh makes packaging simpler", he said, but it is more than just that. It also makes it easier for other maintainers to understand and help out with package maintenance. That can make a real difference when there are tree-wide efforts, like hardening or reproducible builds, but it also would tend to draw DDs together as a team. The main argument against a change toward pushing dh is that it would introduce bugs in package building simply due to conversion mistakes. Beyond that:
Holger Levsen thought that the Debian Policy Manual should be changed to have dh as a "should" requirement, except in two specific areas: those packages that already use the Common Debian Build System (CDBS) and those packages that debhelper is dependent upon. Eventually, the "should" could become a "must". The Haskell ecosystem uses features of CDBS (which is seemingly not documented anywhere) that are not available in debhelper, so it may not be a good candidate for moving to dh, at least at this point. There has been some work to add a dh_haskell command to debhelper, Sean Whitton said, but it has stalled. Marc Dequènes pointed out that CDBS may be on its way toward retirement at this point, which should be kept in mind. Others agreed with Levsen's ideas, though some would like to see debhelper eventually take over for CDBS, which fits well if CDBS is on its way out.
But Marco d'Itri wondered why he would convert his debhelper-using packages to dh: "I use debhelper in all of my packages but I have never switched to dh: why should I bother? 'Everybody is doing this' is not much of an argument." Simon McVittie had a lengthy response that described some of the extras that dh provides. Essentially, it will help those who modify the packages ("perhaps your future self") by allowing them to ignore various semi-tedious details that have to be worked out without it. In addition, those details can change over time, which will not necessarily be reflected in packages that only use debhelper.
There was also discussion of whether it was appropriate to use the non-maintainer upload (NMU) process to change a package to using dh. In general, it was not seen as a reasonable way to switch a package to dh. As Scott Kitterman put it: "A likely bug inducing drive-by NMU is not helpful." He thinks that new packages should, in general, be built using dh, but is less convinced that existing packages will truly benefit—and the likely result will be lots of tricky bugs:
For really complex packages, they are going to be hard for someone not familiar with the package to modify regardless of dh/debhelper. My guess is that if we try and push this direction the effort will mostly be spent where there is the least potential for gain and the most risk of regressions.
For improvement of existing packages, I think there are better things to expend resources on.
Whitton largely agreed with Kitterman about the distinction between new and existing packages. Ian Jackson also thought it made sense, though he did have an anecdotal data point about his efforts to convert the Xen package, which worked out well, overall. For policy, though, he didn't think converting should be mandated:
As the conversation started winding down a bit, Hartman said that he planned to try to summarize where consensus was found and not found in the discussion. That resulted in his consensus call post on May 25. He used the consensus process in RFC 7282 as a model, though, of course, it "has no force here"; it does, however, have some useful thoughts, he said:
He asked that people make comments by June 16 and to clearly distinguish where they thought his summary was not accurate versus comments meant to help establish a different consensus. His summary laid out some of the issues that recurred in the earlier thread: that dh makes cross-archive changes easier, that converting to dh could be seen as churn that made other people's efforts harder, especially if it wasn't tested well, and that Debian has "valued respecting the maintainer's preference above most other factors". Pushing for more uniformity is perhaps a step away from that last item.
He also summarized the areas where he thought rough consensus had been reached. One is that, except in certain circumstances, using dh is the right thing for a maintainer to do. Those certain circumstances were outlined (e.g. the Haskell ecosystem), though he called them "exceptional circumstances", which Jackson thought sent a slightly wrong message. Jackson suggested "unusual circumstances" since that does not necessarily imply rarity, saying that the consensus he heard was "that dh should be used unless there is 'some reasonable reason' not to". Hartman liked the "some reasonable reason" wording and plans to try to work that in.
There was also clear consensus on not using the NMU process to convert packages. The area where consensus is more murky is around whether not using dh should be considered a bug that can be filed against a package. It is clearly not a release-critical (RC) bug, he said, nor would it be a bug if one of the exceptions (or "some reasonable reason") applies. Hartman's best guess is that there may be consensus that not using dh would be considered a normal bug, but that mass-filing bugs is not the way forward either.
While there has been a fair amount of discussion in response, it mostly veered away from either of the two types of responses that Hartman was seeking. That probably indicates that his summary is generally accurate and that participants in the debian-devel mailing list are sanguine about the consensus reached. Kitterman disputed the idea that not using dh was a bug of any sort, but thought that it could be turned into one by changing Debian policy. There is still more than a week to run on the consensus call, but if things stand as they are, Hartman plans to "talk to various people who are impacted about next steps", which presumably includes the policy editors.
To an outside observer, but one who has looked in on Debian discussions a few times over the years, the process has gone much more smoothly than it often does. Jackson called it "an awesome way to conduct this discussion/decisionmaking/whatever". Perhaps this particular set of topics is not as controversial and heat inducing as some, but patiently working through the issues and trying to find common ground where it exists does seem like an improvement. It will be interesting to see where it all goes from here.
A ring buffer for epoll
The set of system calls known collectively as epoll was designed to make polling for I/O events more scalable. To that end, it minimizes the amount of setup that must be done for each system call and returns multiple events so that the number of calls can also be minimized. But that turns out to still not be scalable enough for some users. The response to this problem, in the form of this patch series from Roman Penyaev, takes a familiar form: add yet another ring-buffer interface to the kernel.

The poll() and select() system calls can be used to wait until at least one of a set of file descriptors is ready for I/O. Each call, though, requires the kernel to set up an internal data structure so that it can be notified when any given descriptor changes state. Epoll gets around this by separating the setup and waiting phases, and keeping the internal data structure around for as long as it is needed.
An application starts by calling epoll_create1() to create a file descriptor to use with the subsequent steps. That call, incidentally, supersedes epoll_create(); it replaces an unused argument with a flags parameter. Then epoll_ctl() is used to add individual file descriptors to the set monitored by epoll. Finally, a call to epoll_wait() will block until at least one of the file descriptors of interest has something to report. This interface is a bit more work to use than poll(), but it makes a big difference for applications that are monitoring huge numbers of file descriptors.
That said, it would seem that there is still room for doing things better. Even though epoll is more efficient than its predecessors, an application still has to make a system call to get the next set of file descriptors that are ready for I/O. On a busy system, where there is almost always something that is needing attention, it would be more efficient if there were a way to get new events without calling into the kernel. That is where Penyaev's patch set comes in; it creates a ring buffer shared between the application and the kernel that can be used to transmit events as they happen.
epoll_create() — the third time is the charm
The first step for an application that wishes to use this mechanism is to tell the kernel that polling will be used and how big the ring buffer should be. Of course, epoll_create1() does not have a parameter that can be used for the size information, so it is necessary to add epoll_create2():
int epoll_create2(int flags, size_t size);
There is a new flag, EPOLL_USERPOLL, that tells the kernel to use a ring buffer to communicate events; the size parameter says how many entries the ring buffer should hold. This size will be rounded up to the next power of two; the result sets an upper bound on the number of file descriptors that this epoll instance will be able to monitor. A maximum of 65,536 entries is enforced by the current patch set.
File descriptors are then added to the polling set in the usual way with epoll_ctl(). There are some restrictions that apply here, though, since some modes of operation are not compatible with user-space polling. In particular, every file descriptor must request edge-triggered behavior with the EPOLLET flag. Only one event will be added to the ring buffer when a file descriptor signals readiness; continually adding events for level-triggered behavior clearly would not work well. The EPOLLWAKEUP flag (which can be used to prevent system suspend while specific events are being processed) does not work in this mode; EPOLLEXCLUSIVE is also not supported.
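As a rough illustration of that setup sequence under the proposed API (EPOLL_USERPOLL and epoll_create2() exist only in the patch set, so the flag and syscall number here come from it, and there is no glibc wrapper):
#include <sys/epoll.h>
#include <sys/syscall.h>
#include <unistd.h>

int setup_userpoll(int fd_to_watch)
{
    /* epoll_create2() must be invoked via syscall(); __NR_epoll_create2
       and EPOLL_USERPOLL are defined by the patch set, not mainline. */
    int epfd = syscall(__NR_epoll_create2, EPOLL_USERPOLL, 128);
    if (epfd < 0)
        return -1;

    struct epoll_event ev = {
        .events = EPOLLIN | EPOLLET,    /* edge-triggered is mandatory */
        .data = { .fd = fd_to_watch },
    };
    if (epoll_ctl(epfd, EPOLL_CTL_ADD, fd_to_watch, &ev) < 0) {
        close(epfd);
        return -1;
    }
    return epfd;
}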
Two or three separate mmap() calls are required to map the ring buffer into user space. The first one should have an offset of zero and a length of one page; it will yield a page containing this structure:
struct epoll_uheader {
    u32 magic;          /* epoll user header magic */
    u32 header_length;  /* length of the header + items */
    u32 index_length;   /* length of the index ring, always pow2 */
    u32 max_items_nr;   /* max number of items */
    u32 head;           /* updated by userland */
    u32 tail;           /* updated by kernel */
    struct epoll_uitem items[];
};
The header_length field, somewhat confusingly, contains the length of both the epoll_uheader structure and the items array. As seen in this example program, the intended use pattern appears to be that the application will map the header structure, get the real length, unmap the just-mapped page, then remap it using header_length to get the full items array.
One might expect that items is the ring buffer, but there is a layer of indirection used here. Getting at the actual ring buffer requires calling mmap() another time with header_length as the offset and the index_length header field as the length. The result will be an array of integer indexes into the items array that functions as the real ring buffer.
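A sketch of that mapping sequence, assuming 32-bit index entries and read-write mappings as in the patch set's example program (epfd is the descriptor returned by epoll_create2()), looks something like this:
#include <sys/mman.h>
#include <unistd.h>

struct epoll_uheader *header;
unsigned int *index;
long page_size = sysconf(_SC_PAGESIZE);

/* map a single page first, just to learn the real header length */
header = mmap(NULL, page_size, PROT_READ | PROT_WRITE, MAP_SHARED, epfd, 0);
unsigned int header_length = header->header_length;
munmap(header, page_size);

/* remap the full header, which includes the items[] array */
header = mmap(NULL, header_length, PROT_READ | PROT_WRITE, MAP_SHARED,
              epfd, 0);

/* the index ring is mapped at offset header_length */
index = mmap(NULL, header->index_length, PROT_READ | PROT_WRITE, MAP_SHARED,
             epfd, header_length);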
The actual items used to indicate events are represented by this structure:
struct epoll_uitem {
    __poll_t ready_events;
    __poll_t events;
    __u64 data;
};
Here, events appears to be the set of events that was requested when epoll_ctl() was called, and ready_events is the set of events that has actually happened. The data field comes through directly from the epoll_ctl() call that added this file descriptor.
Whenever the head and tail fields differ, there is at least one event to be consumed from the ring buffer. To consume an event, the application should read the entry from the index array at head; this read should be performed in a loop until a non-zero value is found there. The loop, evidently, is required to wait, if necessary, until the kernel's write to that entry is visible. The value read is an index into the items array — almost. It is actually the index plus one. The data should be copied from the entry and ready_events set to zero; then the head index should be incremented.
So, in a cleaned up form, code that reads from the ring buffer will look something like this:
while (header->tail == header->head)
    ; /* Wait for an event to appear */

while (index[header->head] == 0)
    ; /* Wait for event to really appear */

item = header->items + index[header->head] - 1;
data = item->data;
item->ready_events = 0; /* Mark event consumed */
header->head++;
In practice, this code is likely to be using C atomic operations rather than direct reads and writes, and head must be incremented in a circular fashion. But hopefully the idea is clear.
Busy-waiting on an empty ring buffer is obviously not ideal. Should the application find itself with nothing to do, it can still call epoll_wait() to block until something happens. This call will only succeed, though, if the events array is passed as NULL, and maxevents is set to zero; in other words, epoll_wait() will block, but it will not, itself, return any events to the caller. It will, though, helpfully return ESTALE to indicate that there are events available in the ring buffer.
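In other words, the idle-time fallback might look roughly like this (assuming that the ESTALE return shows up through errno in the usual way, which is a guess based on the description above):
#include <errno.h>

/* nothing in the ring; sleep in the kernel until something arrives */
if (epoll_wait(epfd, NULL, 0, -1) < 0 && errno == ESTALE) {
    /* events are waiting in the ring buffer; consume them as above */
}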
This patch set is in its third revision, and there appears to be little opposition to its inclusion at this point. The work has not yet found its way into linux-next, but it still seems plausible that it could be deemed ready for the 5.3 merge window.
Some closing grumbles
Figuring out the above interface required a substantial amount of reverse engineering of the code. This is a rather complex new API, but it is almost entirely undocumented; that will make it hard to use, but the lack of documentation also makes it hard to review the API in the first place. It is doubtful that anybody beyond the author has written any code to use this API at this point. Whether the development community will fully understand this API before committing to it is far from clear.
Perhaps the saddest thing, though, is that this will be yet another of many ring-buffer interfaces in the kernel. Others include perf events, ftrace, io_uring, AF_XDP and, doubtless, others that don't come immediately to mind. Each of these interfaces has been created from scratch and must be understood (and consumers implemented) separately by user-space developers. Wouldn't it have been nice if the kernel had defined a set of standards for ring buffers shared with user space rather than creating something new every time? One cannot blame the current patch set for this failing; that ship sailed some time ago. But it does illustrate a shortcoming in how Linux kernel APIs are designed; they seem doomed to never fit into a coherent and consistent whole.
Yet another try for fs-verity
The fs‑verity mechanism has its origins in the Android project; its purpose is to make individual files read-only and enable the kernel to detect any modifications that might have been made, even if those changes happen offline. Previous fs‑verity implementations have run into criticism in the development community, and none have been merged. A new version of the patch set was posted on May 23; it features a changed user-space API and may have a better chance of getting into the mainline.

Fs‑verity works by associating a set of hashes with a file; the hash values can be used to check that the contents of the file have not been changed. In current implementations, the hashes are stored in a Merkle tree, which allows for quick verification when the file is accessed. The tree itself is hashed and signed, so modifications to the hash values can also be detected (and access to the file blocked). The intended use case is to protect critical Android packages even when an attacker is able to make changes to the local storage device.
Previous versions of the fs‑verity patches ran aground over objections to how the API worked. To protect a file, user space would need to generate and sign a Merkle tree, then append that tree to the file itself, aligned to the beginning of a filesystem block. After an ioctl() call, the kernel would hide the tree, making the file appear to be shorter than it really was, while using the tree to verify the file's contents. This mechanism was seen as being incompatible with how some filesystems manage space at the end of files; developers also complained that it exposed too much about how fs‑verity was implemented internally. In the end, an attempt to merge this code for 5.0 was not acted upon, and fs‑verity remained outside of the mainline.
The new patch set addresses these concerns by moving the generation of the Merkle tree into the kernel and hiding the details of where this tree is stored. To enable fs‑verity protection for a file, a user-space application starts by opening the file in question. Despite the fact that this operation changes the file (by adding the protection and making the file read-only), this file descriptor must be opened for read access only. Then, the new FS_IOC_ENABLE_VERITY ioctl() command is invoked on this file; the application passes in a structure that looks like this:
struct fsverity_enable_arg {
    __u32 version;
    __u32 hash_algorithm;
    __u32 block_size;
    __u32 salt_size;
    __u64 salt_ptr;
    __u32 sig_size;
    __u32 __reserved1;
    __u64 sig_ptr;
    __u64 __reserved2[11];
};
The version field must be set to one; it is there to allow different fs‑verity implementations in the future. Similarly, the reserved fields must all be set to zero. hash_algorithm tells the kernel which algorithm to use for hashing the file's blocks; the only supported values at the moment are FS_VERITY_HASH_ALG_SHA256 and FS_VERITY_HASH_ALG_SHA512. The block size for the hash is set in block_size; it must match the filesystem block size. If salt_size and salt_ptr are set, they provide a "salt" value that is prepended to each block prior to hashing. A digital signature for the hash of the file can optionally be added using sig_ptr and sig_size; more on that shortly.
This ioctl() call will read through the entire file, generating the Merkle tree and storing it wherever the filesystem thinks is best. If the file is large, this operation can take some time; it can be interrupted with a fatal signal, leaving the file unchanged. Enabling fs‑verity will fail if there are any open, write-enabled file descriptors for the target file.
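A minimal sketch of enabling protection on a file, assuming a 4KB filesystem block size and a UAPI header shipped with the patch set, might look like this:
#include <fcntl.h>
#include <string.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/fsverity.h>     /* assumed to come with the patch set */

int enable_verity(const char *path)
{
    int fd = open(path, O_RDONLY);   /* must be opened read-only */
    if (fd < 0)
        return -1;

    struct fsverity_enable_arg arg;
    memset(&arg, 0, sizeof(arg));    /* reserved fields must be zero */
    arg.version = 1;
    arg.hash_algorithm = FS_VERITY_HASH_ALG_SHA256;
    arg.block_size = 4096;           /* must match the filesystem block size */

    int ret = ioctl(fd, FS_IOC_ENABLE_VERITY, &arg);
    close(fd);
    return ret;
}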
After the operation succeeds, the file will be in the fs‑verity mode. Opens for write access will fail, even if the file's permission bits would otherwise allow writing. Some metadata can still be changed, though, and the file can be renamed or deleted. Any attempt to read from the file will fail (with EIO) if the data of interest does not match the stored hash. If user space is counting on fs‑verity protection, though, it should, after opening the file, verify that this protection is present with the FS_IOC_MEASURE_VERITY ioctl() call, which takes a pointer to this structure:
struct fsverity_digest {
    __u16 digest_algorithm;
    __u16 digest_size; /* input/output */
    __u8 digest[];
};
If the file is protected with fs‑verity, this structure will be filled in with summary hash information.
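A sketch of that check, where the 64-byte digest buffer (enough for SHA-512) is an assumption and fd is an open descriptor on the protected file:
#include <stdio.h>
#include <stdlib.h>
#include <sys/ioctl.h>
#include <linux/fsverity.h>     /* assumed to come with the patch set */

struct fsverity_digest *d = malloc(sizeof(*d) + 64);

d->digest_size = 64;    /* tell the kernel how much room is available */
if (ioctl(fd, FS_IOC_MEASURE_VERITY, d) == 0)
    printf("algorithm %u, digest is %u bytes\n",
           d->digest_algorithm, d->digest_size);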
User space can use that information to verify that the digest data matches expectations; without that test, an attacker could substitute a new file with hostile contents and a matching Merkle tree. Alternatively, this digest can be signed and the kernel will verify that it matches at access time. What must actually be signed is this structure:
struct fsverity_signed_digest {
    char magic[8]; /* must be "FSVerity" */
    __le16 digest_algorithm;
    __le16 digest_size;
    __u8 digest[];
};
The digest information can be obtained from the kernel using the FS_IOC_MEASURE_VERITY ioctl() described just above. So one way to add a signature to an fs‑verity file would be to create the file once, enable fs‑verity on the file without a signature, obtain the digest information, then create and enable the file a second time with the signature data. In practice, files to be protected this way (such as Android package files) will probably be shipped with the associated signature data, so this two-step process will not be necessary on the target systems.
The final piece for signature verification is the provision of a public key to verify against. The fs‑verity subsystem creates a new keyring (called .fs‑verity); a suitably privileged user can add certificates to this keyring for use in file verification. The signing key, of course, should not be on the target system at all; assuming that the attacker cannot obtain that key by other means, verification against the public key should provide assurance that the file has not been modified.
The ext4 and F2FS filesystems are supported in the current patch set. See the extensive documentation file provided for the patch set for a lot more details on how it all works. Some kernel features are added without sufficient documentation; fs‑verity does not look like it will be one of those.
Previous versions of the patch set have generated a lot of (sometimes heated) discussion. This time, the response has been silence, prompting Eric Biggers (the author of this work) to ask if anybody has any comments. Unless somebody shows up with objections, the logical conclusion is that the biggest concerns have been addressed and that fs‑verity may be on track for merging into the 5.3 kernel.
How many kernel test frameworks?
The kernel self-test framework (kselftest) has been a part of the kernel for some time now; a relatively recent proposal for a kernel unit-testing framework, called KUnit, has left some wondering why both exist. In a lengthy discussion thread about KUnit, the justification for adding another testing framework to the kernel was debated. While there are different use cases for kselftest and KUnit, there was concern about fragmenting the kernel-testing landscape.
In early May, Brendan Higgins posted v2 of the KUnit patch set with an eye toward getting it into Linux 5.2. That was deemed a bit of an overaggressive schedule by Greg Kroah-Hartman and Shuah Khan given that the merge window would be opening a week later or so. But Khan did agree that the patches could come in via her kselftest tree. There were some technical objections to some of the patches, which is no surprise, but overall the patches were met with approval—and some Reviewed-by tags.
There were some sticking points, however. Several, including Kroah-Hartman and Logan Gunthorpe, complained about the reliance on user-mode Linux (UML) to run the tests. Higgins said that he had "mostly fixed that". The KUnit tests will now run on any architecture, though the Python wrapper scripts are still expecting to run the tests in UML. He said that he should probably document that, which is something that he has subsequently done.
A more overarching concern was raised by Frank Rowand. From his understanding, using UML is meant to "avoid booting a kernel on real hardware or in a virtual machine", he said, but he does not really see that as anything other than "a matter of semantics"; running Linux via UML is simply a different form of virtualization. Furthermore:
I would guess that some developers will focus on just one of the two test environments (and some will focus on both), splitting the development resources instead of pooling them on a common infrastructure.
Khan replied that she sees kselftest and KUnit as complementary. Kselftest is "a collection of user-space tests with a few kernel test modules back-ending the tests in some cases", while KUnit provides a framework for in-kernel testing. Rowand was not particularly swayed by that argument, however. He sees that there is (or could be) an almost complete overlap between the two.
Unlike some other developers, Ted Ts'o actually finds the use of UML to be beneficial. He described some unit tests that are under development for ext4; they will test certain features of ext4 in isolation from any other part of the kernel, which is where he sees the value in KUnit. The framework provided with kselftest targets running tests from user space, which requires booting a real kernel, while KUnit is simpler and faster to use:
Frameworks
Part of the difference of opinion may hinge on the definition of "framework" to a certain extent. Ts'o stridently argued that kselftest is not providing an in-kernel testing framework, but Rowand just as vehemently disagreed with that. Rowand pointed to the use of kernel modules in kselftest and noted that those modules can be built into a UML kernel. Ts'o did not think that added up to a framework since "each of the in-kernel code has to create their own in-kernel test infrastructure". Rowand sees that differently: "The kselftest in-kernel tests follow a common pattern. As such, there is a framework."
To Ts'o, that doesn't really equate to a framework, though perhaps the situation could change down the road:
In addition, Ts'o said that kselftest expects to have a working user-space environment:
Rowand disagreed:
No userspace environment needed. So exactly the same overhead as KUnit when invoked in that manner.
Ts'o is not convinced by that. He noted that the kselftest documentation is missing any mention of this kind of test. There are tests that run before init is started, but they aren't part of the kselftest framework:
Overlaps
There may be overlaps in the functionality of KUnit and kselftest, however. Knut Omang, who is part of the Kernel Test Framework project—another unit-testing project for the kernel that is not upstream—pointed out that there are two types of tests that are being conflated a bit in the discussion. One is an isolated test of a particular subsystem that is meant to be run rapidly and repeatedly by developers of that subsystem. The other is meant to test interactions between more than one subsystem and might be run as part of a regression test suite or in a continuous-integration effort, though it would be used by developers as well. The unit tests being developed for ext4 would fall into the first category, while xfstests would fall into the latter.
Omang said that the two could potentially be combined into a single tool, with common configuration files, test reporting, and so on. That is what KTF is trying to do, he said. But Ts'o is skeptical that a single test framework is the way forward. There are already multiple frameworks out there, he said, including xfstests, blktests, kselftest, and so on. Omang also suggested that UML was still muddying the waters in terms of single-subsystem unit tests:
But Ts'o sees things differently:
Gunthorpe saw some potential overlap as well. He made a distinction in test styles that was somewhat similar to Omang's. He noted that there are not many users of the kselftest_harness.h interface at this point, so it might make sense to look at unifying the areas that overlap sooner rather than later.
Looking at the selftests tree in the repo, we already have similar items to what Kunit is adding as I described in point (2) above. kselftest_harness.h contains macros like EXPECT_* and ASSERT_* with very similar intentions to the new KUNIT_EXECPT_* and KUNIT_ASSERT_* macros.
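For readers unfamiliar with that header, a minimal kselftest_harness.h test looks something like the following sketch; the relative include path is just an assumption about where the test file lives in the selftests tree:
#include "../kselftest_harness.h"

TEST(example_addition)
{
	int sum = 2 + 2;

	ASSERT_EQ(4, sum);	/* abort this test case on failure */
	EXPECT_NE(5, sum);	/* record a failure but keep running */
}

TEST_HARNESS_MAIN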
Ts'o is not opposed to unifying the tests in whatever way makes sense, but said that kselftest_harness.h needs to be reworked before in-kernel tests can use it. Gunthorpe seemed to change his mind some when he replied that perhaps the amount of work to unify the two use cases was not worth it:
Ultimately, what Rowand seems to be after is a better justification for KUnit and why it is, and needs to be, different from kselftest, in the patch series itself. "I was looking for a fuller, better explanation than was given in patch 0 of how KUnit provides something that is different than what kselftest provides for creating unit tests for kernel code." Higgins asked for specific suggestions on where the documentation of KUnit was lacking. Rowand replied that in-patch justification is what he, as a code reviewer, was looking for:
But Gunthorpe did not agree; "in my opinion, Brendan has provided over and above the information required to justify Kunit's inclusion". The difference of opinion about whether kselftest provides any kind of in-kernel framework appears to be the crux of the standoff. Gunthorpe believes that the in-kernel kselftest code should probably be changed to use KUnit, once it gets merged, which he was strongly in favor of.
As the discussion was trailing off, Higgins posted v3 of the patch set on May 13, followed closely by an update to v4 a day later. Both addressed the technical comments on the v2 code and also added the documentation about running on architectures other than UML. There have been relatively few comments and no major complaints about those postings. One might guess that KUnit is on its way into the mainline, probably for 5.3.
SIGnals from KubeCon
The basic organizational construct within the Kubernetes project is a set of Special Interest Groups (SIGs), each of which represents a different area of responsibility within the project. Introductions to what the various SIGs do, as well as more detailed sessions, were a core part of KubeCon + CloudNativeCon Europe 2019, as the different groups explained what they're doing now and their plans for the future. Two sessions, in particular, covered the work of the Release and Architecture SIGs, both of which have a key role in driving the project forward.
SIG Release
Among the core Kubernetes SIGs is SIG Release, which is responsible for the release-engineering aspects of the project. Tim Pepper, open-source engineer at VMware's Open Source Technology Center, and Stephen Augustus, senior cloud-native architect at VMware, ran a session that outlined some of the current issues and challenges for release engineering in Kubernetes. Pepper and Augustus currently serve as co-chairs for SIG Release, alongside Caleb Miles from Google.
Among the SIG's primary goals is to generate releases on a reliable schedule, which involves partnership with the other Kubernetes SIGs to make sure the work is all integrated properly. From a release-engineering perspective, SIG Release is also tasked with providing guidance and tooling to facilitate the generation of automated releases. Pepper admitted that full automation is still a distant goal, though the direction is to automate more human tasks and processes within the release workflow. "In many ways we don't do a lot of the work that people maybe presume that we do, they think that the release team for example does a lot of work, but it's much more partnering with the SIGs and empowering them to identify what they can get done on the timeline that's associated with the release and integrate their work," Pepper said.
One of the key areas of intersection is with SIG PM (Project Management), which handles some of the tracking for large features, referred to as "enhancements", within the Kubernetes development cycle. Enhancement issues are not simple bug fixes; rather, they involve changes that typically take multiple release cycles to complete and mature. Augustus explained that, within the SIG Release team, there are different roles for members; one of them is about tracking and working with SIG PM on enhancements. Enhancement leads are responsible for collating lists of open features within Kubernetes and deciding which ones will land in a given release.
SIG Release handles continuous-integration (CI) signals on the Kubernetes master branch, making sure that the test infrastructure is stable and working. These signals include various test results that could indicate a potential stability issue within a release. The group is also responsible for managing operations around branching and moving content in preparation for a release. When it comes to the actual release, SIG Release helps to put together the release notes and associated documentation.
The move to have all release engineering done transparently via SIG Release (and the open-source Kubernetes project in general) is part of the continuing evolution of Kubernetes away from being just a Google project. Kubernetes is celebrating its fifth birthday in 2019, and the project became the founding effort of the Linux Foundation's Cloud Native Computing Foundation (CNCF) in 2015. Though Kubernetes has been an open project for five years, it was only in August 2018 that Google announced that it was moving Kubernetes development infrastructure over to the CNCF. "One of the cool things that we've done over the last year, is turn this (release engineering process) from sort of anecdote and lore especially held within the minds of a set of Googlers, into something that's really being run by the community," Pepper said.
Licensing and long-term support
One of the newer subprojects within SIG Release has to do with licensing. Pepper said that licensing shouldn't really be a challenge, since Kubernetes itself is licensed under the Apache 2.0 license. Apparently, however, there are some non-Apache licenses in the Kubernetes code as well. Pepper said that there isn't a clear uniformity of Apache-2.0 licensed files in the code base. "This is open source, we have to do things right and we need to make sure that we're in compliance with whatever all these other licenses are," Pepper said.
Given the scale of the Kubernetes project, there is a clear need for automated tooling to help identify the various licenses and make sure there are no license conflicts. The Linux Foundation is helping out with its FOSSology tool, which outputs a list of all licenses used in a code base into a spreadsheet. Augustus said that the Kubernetes project will be making use of a tool called FOSSA to help identify licenses in an automated manner when code is submitted to Kubernetes.
Another issue addressed in this session was long-term support for Kubernetes, which is released on a quarterly cadence; each release receives nine months of support from the open-source project for bug and security fixes. Pepper said that there has been a question in the community about whether there is a need for a longer support term for releases, as some users require more time for upgrades.
The question of how Kubernetes might handle a long term support release is being talked about in a working group within the project dubbed WG-LTS. "In Kubernetes, working groups don't own code, they're trying to figure out and propose a solution to a problem," Pepper said. "So this working group is trying to drive conversations around what are the actual end-user needs and then, with those cataloged, what are our options to match them."
SIG Architecture
While SIG Release deals with release engineering, SIG Architecture is about understanding, defining, and evolving the architecture of Kubernetes. Timothy St. Clair, senior staff engineer at VMware and Kubernetes Steering Committee member, ran a session where he outlined the current state of Kubernetes architecture. The scope of SIG Architecture is to maintain and evolve the design principles of Kubernetes as well as providing a consistent body of expertise that is necessary to ensure architectural consistency over time.
SIG Architecture handles tasks like the API review process as well as conformance-test review and management. "API reviews affect everybody, so if you have a bad API, that's a contract that you're going to have to support for several iterations if you make a mistake, so the API review process affects everybody," St. Clair said.
SIG Architecture also provides input into the ongoing evolution of Kubernetes features via the Kubernetes Enhancement Proposals (KEPs) process. A KEP is a document that defines the goals and capabilities of a new or updated feature within Kubernetes. St. Clair explained that SIG Architecture provides a stopgap review for KEPs that cut across different boundary lines in an effort to help retain the integrity and stability of Kubernetes' architecture overall. "The promise of Kubernetes, the whole purpose at a fundamental level is to be the abstraction layer that you can write across different providers and different clouds," St. Clair said. "If that abstraction layer doesn't hold true and we develop balkanization, the promise of Kubernetes is lost."
Overall, the direction that St. Clair hopes that Kubernetes will take from a core architecture perspective is to be less iterative on foundational elements. He'd instead like to see API extension mechanisms and extension policies that retain core stability while enabling extensibility by developers. Currently, stability is a work in progress across different aspects of the Kubernetes project and in particular for dot-zero releases (i.e. 1.14.0). St. Clair asked the audience how many people use a dot-zero Kubernetes release in production and a single hand went up. "If you use a dot-zero in production, you are far braver than 90% of the people I know," St. Clair said.
According to St. Clair, a Kubernetes dot-zero release is a good development release, but isn't really suited for production usage. He said that after each dot-zero release there can be hundreds of pull requests that come in. "All of a sudden it gets in the wild, then we find all the things that we're missing," St. Clair said.
Across both the SIG Release and SIG Architecture sessions, a unifying theme was that of community engagement and a call for more participation as the way forward to build a better, more inclusive Kubernetes project for everyone.
YouTube videos of the SIG Release and SIG Architecture sessions are available.
Page editor: Jonathan Corbet
Inside this week's LWN.net Weekly Edition
- Briefs: Firefox tracking protection; eBPF & XDP; CockroachDB relicensed; Quotes; ...
- Announcements: Newsletters; events; security updates; kernel patches; ...
