LWN.net Weekly Edition for October 13, 2016
A tale of two conferences
Your editor just spent two weeks in Europe attending two technical conferences. Numerous articles from specific sessions appear in this week's edition, with a few more yet to come. But there is value in pondering for a moment on the nature of the two events themselves; both have some things to say about where our community is going.
LinuxCon Europe
LinuxCon Europe was held in Berlin this year, colocated with ContainerCon and who knows what else (it's worth noting that these conferences will soon be rebranded into the "Open Source Summit"). It was a large event, with well over 1,000 attendees. As is generally the case with Linux Foundation events, it was impeccably organized and generally busy, with nearly a dozen tracks running simultaneously.
Early LinuxCon events featured a fair amount of low-level technical content, and there was a strong focus on kernel-related topics. Kernel developers were well represented in the audience and tended to receive a fair amount of attention. At LinuxCon Europe 2016, instead, there was no kernel panel, relatively little kernel-related content, and it seems fair to say that most of the attendees didn't know who the few kernel developers present at the event were. Their attention was elsewhere.
Eleven of the talks at this event featured "orchestration" in their titles; 39 mentioned containers, 30 mentioned Docker, and there were nine on Kubernetes. It seems pretty clear that containers are where a lot of the action is at the moment. Perhaps this is a way of saying that, to a great extent, the problems at the lower levels of the system have been solved, so the interesting things to work on are higher up the stack. Or, at least, there is perceived to be more money higher up the stack.
Meanwhile, 17 talks mentioned the kernel, so all is not lost for those of us who are more drawn to lower-level code. One high point was the "Outreachy internship report", where Shraddha Barke, Ioana Ciornei, Cristina Moraru, Ksenija Stanojević, and Janani Ravichandran presented the work they had done during their internships. It was an exercise in optimistic and youthful development energy in general. But your editor also realized, while sitting in the room, that he had never before been in the presence of that many female kernel developers at the same time. That is a sad reflection of the state of the kernel development community, but also a hopeful sign that, maybe, things can get better.
All told, LinuxCon was an intensive and interesting event and a worthwhile snapshot of where at least a part of our community is heading. But, while one is in the middle of the crowds, the overtly commercial keynotes, the show floor, etc., it's sometimes hard not to miss the kind of event we used to have. Linux conferences were once less slick and more focused on the code. What we have in LinuxCon is good, but there is more to our community than what is on offer there.
Kernel Recipes
The week prior to LinuxCon, your editor was fortunate enough to attend Kernel Recipes in Paris. This is the fifth year that this event has been run, but the first time that LWN has been able to be there. This event is a breath of fresh air for anybody who finds the LinuxCon scene to be a bit overwhelming at times.
Kernel Recipes is, at its heart, a three-day gathering for developers to sit down and talk about kernel-related topics. While LinuxCon had a dozen tracks, Kernel Recipes features exactly one. Everybody is in the same room, and, from what your editor saw, all the attendees made a point of being there for the entire event. The conference has a limit of 100 attendees — a limit that was hit in less than two days this year. Each session was a discussion, with wide participation throughout the room. The overall level of engagement was high.
It seems certain that new and interesting work will be inspired by the discussions that happened here. So in that sense, if no other, Kernel Recipes must be seen to be a successful event. The keys to this kind of success would appear to be keeping the size small and a relentless focus on bringing in high-quality talks. Your editor, who sometimes needed prodding to confirm and prepare for his presence there, can attest to the relentless part.
The "small" criterion can be a bit of a problem since it, naturally, limits the number of people who can participate in this kind of event. The Linux Plumbers Conference (now just a few weeks away) is always trying to find the right balance between size and quality of the event, and there, too, tickets tend to sell out quickly. The nice thing about an event like Kernel Recipes, though, is that it ought to be reproducible in other parts of the world. We have a ready supply of good speakers and interesting things to talk about in our community, and it doesn't take that many speakers to make an event like this work.
In the end, it was a privilege to be able to attend both events. Your editor's only regret was being unable to stay in Berlin for the Embedded Linux Conference Europe the following week. Conferences are an opportunity to get a sense for what is happening in our community and to renew one's enthusiasm and energy; both LinuxCon and Kernel Recipes succeeded on all of those fronts. A diverse community needs a diverse range of events; happily, that is just what was in store in Europe during these weeks.
An introduction to color spaces
The Kernel Recipes conference is, unsurprisingly, focused on kernel-related topics, but one of the potentially most useful talks given there was only marginally about the kernel. Applications that deal with the acquisition or display of video data must be aware of color spaces, but few developers really understand what color spaces are or how they work. Media subsystem maintainer Hans Verkuil sought to improve this situation with an overview of the color-space abstraction.
His slides started with the v4l2_pix_format structure, which describes the pixel format of the data returned by a video capture device (such as a webcam). The colorspace member of that structure, in particular, identifies the color space in which the pixel data is expressed. Developers of applications (and drivers) for media devices must specify an appropriate color space, but few developers, Verkuil said, understand what that field really means.
Color, he said, can be thought of as a signal consisting of light at one or more specific frequencies and powers. The human eye, though, does not detect all of those frequencies directly. Instead, it has three types of "cones" with sensitivities centered around three specific frequencies — nominally red, green, and blue. Light at a specific frequency and power will generate a certain level of signal from each type of cone; the three-value tuple that results is how color is signaled to the brain. As it happens, there is an infinite set of frequency/power distributions that can result in the same three values. Reproducing a color, as far as the brain is concerned, is just a matter of reproducing a specific color tuple. Photographs and video displays take full advantage of that fact; the colors they produce are not the original colors, but they are able to fool the eye into seeing the original colors.
Color spaces
When dealing with visual data, we need a way to uniquely identify colors; that is where a color space comes in. Back in the 1920s, the CIE (Commission Internationale de L'Éclairage) performed a set of studies mapping wavelengths onto the RGB values that replicate them. Those values became the CIE RGB color space. A simple linear transformation turns CIE RGB into the CIE XYZ color space, which has a couple of practical advantages: it allows all colors to be represented using positive values, and the Y value describes the overall brightness (luminance) of the color. Among other things, it turns the range of possible colors (at a given luminance) into a two-dimensional quantity. All other color spaces are based on CIE XYZ — which was developed in the 1920s from measurements on a pool of 17 people.
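The "two-dimensional quantity" mentioned above is conventionally expressed as a pair of chromaticity coordinates, obtained by dividing out the overall intensity of an XYZ value. What follows is a minimal sketch of that normalization; the struct and function names are invented for illustration and do not come from the talk or from any library.

```c
/* Hypothetical helper types for illustration; not from any library. */
struct xyz { double X, Y, Z; };
struct chromaticity { double x, y; };

/*
 * Project a CIE XYZ color onto the chromaticity plane. Dividing by
 * X + Y + Z discards overall intensity, leaving two numbers that
 * describe the color independently of how bright it is.
 */
static struct chromaticity xyz_to_xy(struct xyz c)
{
    struct chromaticity out = { 0.0, 0.0 };
    double sum = c.X + c.Y + c.Z;

    if (sum > 0.0) {        /* black has no defined chromaticity */
        out.x = c.X / sum;
        out.y = c.Y / sum;
    }
    return out;
}
```

Scaling all three inputs by the same factor leaves the result unchanged, which is the sense in which brightness has been factored out. A D65 white of roughly (0.9505, 1.0, 1.0888) lands near x = 0.3127, y = 0.3290.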
In general, a color space defines three "primaries" that can be thought of
as the red, green, and blue colors, though they don't always correspond to
those colors. Each color space also has a "white point" describing the
maximum output value for each primary. Once upon a time, color spaces
corresponded to the physical properties of the phosphors found in CRT
screens, but that is no longer the case.
Color spaces as described thus far are linear, with values corresponding directly to the light levels of the primaries. The human eye does not respond to light linearly, though; a doubling of the light level does not look twice as bright. If (say) eight bits are used to represent a primary value in a color space, many of the 256 available values will be wasted on tiny differences between the brightest values, while the resolution is too coarse at the dim end of the scale. So colors are often represented in nonlinear color spaces that better match how the eye responds.
If a linear color space has RGB values, then a nonlinear equivalent can be obtained by applying a "transfer function" yielding a new set of values, called R'G'B'. The primes should be used for nonlinear color spaces, but almost everybody leaves them out, with the result that nobody ever knows which kind of color space is being talked about. The transfer function is often called a "gamma function", Verkuil said, but that is not quite correct. The screen will typically apply an inverse transfer function to color values to get the actual intensities to display; needless to say, the transfer function and its inverse need to match or colors will not be displayed correctly. OpenGL programmers need to be aware that textures use linear RGB values by default, not nonlinear R'G'B'.
Video applications often deal with colors in the Y'CbCr (or YUV) "color space", but it is not actually a separate color space. Y'CbCr is derived directly from R'G'B' via a matrix multiplication; it is simply a different representation for the same color space. Even so, it seems that a color space can define its own matrix (or even more than one) for this transformation.
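The matrix multiplication in question can be written in terms of two luma coefficients, which is also where the per-color-space variation comes from. The sketch below uses the commonly published coefficients for the BT.601 encoding (the one normally paired with SMPTE 170M) and for Rec. 709; the type and function names are made up for this example.

```c
/* Luma coefficients defining a Y'CbCr encoding; Kg is 1 - Kr - Kb. */
struct ycbcr_coeffs { double kr, kb; };

static const struct ycbcr_coeffs BT601 = { 0.299,  0.114  };
static const struct ycbcr_coeffs BT709 = { 0.2126, 0.0722 };

struct ycbcr { double y, cb, cr; };

/*
 * Convert nonlinear R'G'B' (each 0.0 to 1.0) to Y'CbCr. Y' comes out
 * in 0.0 to 1.0, while Cb and Cr come out in -0.5 to 0.5.
 */
static struct ycbcr rgb_to_ycbcr(struct ycbcr_coeffs k,
                                 double r, double g, double b)
{
    struct ycbcr out;
    double kg = 1.0 - k.kr - k.kb;

    out.y  = k.kr * r + kg * g + k.kb * b;      /* weighted sum: luma */
    out.cb = 0.5 * (b - out.y) / (1.0 - k.kb);  /* scaled blue difference */
    out.cr = 0.5 * (r - out.y) / (1.0 - k.kr);  /* scaled red difference */
    return out;
}
```

The same pixel yields different numbers under the two encodings: pure green has a Y' of 0.587 with the BT.601 coefficients but 0.7152 with the Rec. 709 ones. Decoding with the wrong set therefore shifts colors visibly.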
Verkuil went quickly through some of the more prominent standards in this area.
- The Rec. 709 color space is for high-definition television; it is, he said, "nicely done."
- The best-known color space, perhaps, is sRGB, which is typically used for computer graphics. It has the same chromaticities (primaries) as Rec. 709, but the transfer function is different.
- SMPTE 170M is the color space for standard-definition TV; it has the same transfer function as Rec. 709, but the chromaticities are different.
- BT.2020 is for ultra-high-definition television with at least ten bits for each color component. There are two separate Y'CbCr encodings defined for this color space.
Once the color space and encoding have been figured out, there is one more complication in the form of limited-range encoding. Normally, eight-bit R'G'B' color values use the full 0..255 range, but colors in the Y'CbCr encoding are compressed to fit in the narrower 16..235 range. Limited-range R'G'B' does exist in the wild, though, as does full-range Y'CbCr. The limited-range encoding is a holdover from the old analog television days, when the margin at either end was needed to handle errors. Everything is digital now, but we are still stuck with limited range encoding in a number of situations. Some transports use the out-of-range values as sentinel values for in-band signaling.
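The difference between the two quantization schemes comes down to a scale and an offset. The following sketch handles eight-bit luma or R'G'B' values only; in the usual eight-bit limited-range scheme the Cb and Cr components span 16..240 instead and are not covered here. All of the function names are invented for this example.

```c
#include <stdint.h>

/* Clamp a normalized value to the 0.0 to 1.0 interval. */
static double clamp01(double v)
{
    return v < 0.0 ? 0.0 : (v > 1.0 ? 1.0 : v);
}

/* Quantize a normalized value into the full 0..255 range. */
static uint8_t quantize_full(double v)
{
    return (uint8_t)(255.0 * clamp01(v) + 0.5);
}

/* Quantize a normalized value into the limited 16..235 range. */
static uint8_t quantize_limited(double v)
{
    return (uint8_t)(16.0 + 219.0 * clamp01(v) + 0.5);
}

/*
 * Expand a limited-range code to full range. Codes outside 16..235 are
 * clamped, so anything in the margins becomes pure black or pure white.
 */
static uint8_t limited_to_full(uint8_t code)
{
    int v = code;

    if (v < 16)
        v = 16;
    if (v > 235)
        v = 235;
    return (uint8_t)(((v - 16) * 255 + 109) / 219);  /* round to nearest */
}
```

Feeding genuinely full-range data through limited_to_full() shows what a mismatch costs: the codes 245 and 255, which are distinct near-white shades in a full-range signal, both come out as 255.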
After spending a lot of time gaining a better understanding of color spaces, Verkuil added some additional fields to the v4l2_pix_format structure:
    __u32 ycbcr_enc;     /* enum v4l2_ycbcr_encoding */
    __u32 quantization;  /* enum v4l2_quantization */
    __u32 xfer_func;     /* enum v4l2_xfer_func */
These fields describe which Y'CbCr encoding is in use (if any), whether full-range or limited-range quantization is in use, and which transfer function has been applied. Now it is possible for user space to learn everything it needs to handle color spaces correctly — but user-space developers still ignore it all, he said. There is one exception to that, actually: the GStreamer developers have worked hard to get their color-space handling right.
When things go wrong
What happens if you don't put in that effort and don't get things right? There are a number of problems that can result, and it's not all the developers' fault. The names for the color spaces are confusing (CIE XYZ is not the same as CIE xyz or CIE Yxy), conversion matrices can be buggy, and, of course, there is the full range of exciting surprises that originate in hardware implementations.
One thing that often goes wrong is a confusion between SMPTE 170M and Rec. 709. These two color spaces have different primaries, leading to slight color differences. Those differences are indeed slight, though, to the point that only an expert is likely to notice them; most developers can safely ignore this particular difference. It is slightly visible on LCD screens, but completely disappears when projectors are in use.
Things go a bit further amiss when the Rec. 709 and sRGB transfer functions are confused. The differences here are more noticeable, especially toward the black end of the scale. Using the wrong Y'CbCr encoding is quite a bit more obvious; that's something that customers will notice. Using limited-range quantization when full-range is expected (or the reverse) is also quite evident. This one tends to manifest itself when somebody is displaying an Excel spreadsheet; the slight color difference between adjacent rows will vanish if a full-range signal is interpreted as limited-range.
The slides from the presentation give examples of the visible differences resulting from the above problems. The media subsystem documentation has information for developers wanting to learn more about using color spaces with video acquisition devices.
[Your editor would like to thank Kernel Recipes for supporting his travel to the event.]
OpenSSL after Heartbleed
Rich Salz and Tim Hudson started off their LinuxCon Europe 2016 talk by stating that April 3, 2014 shall forever be known as the "re-key the Internet date." That, of course, was the day that the Heartbleed vulnerability in the OpenSSL library was disclosed. A lot has happened with OpenSSL since that day, to the point that, Salz said, this should be the last talk he gives that ever mentions that particular vulnerability. In the last two years, the project has recovered from Heartbleed and is now more vital than ever before.
Ever since Heartbleed, every severe vulnerability has had to feature its
own catchy name, web site, and logo. It is good, in that it has forced
security researchers to develop some artistic sense. Seriously, though,
short names for vulnerabilities are useful. Saying that a system "is
susceptible to Heartbleed" makes sense to almost everybody. That is much
less true if one talks about "susceptibility to CVE-whatever" instead.
Heartbleed was also the first general defect that made it onto the front
pages of the mainstream press. Salz suggested that it may end up
outlasting the Kardashians.
At its core, Heartbleed was a simple bug, a missing buffer-length check. That resulted in the disclosure of some arbitrary memory that could be used to extract session and key information from a system running OpenSSL. This bug had been in place for three years when it was disclosed; nobody had managed to see it. The OpenSSL developers missed it, as did external security reviewers and code analysis tools (though, Salz noted, those tools all caught it three days after its disclosure). It was an ordinary bug, not as bad as some of its contemporaries ("goto fail" for example), but it got a lot of attention, perhaps because it, unlike many bugs, affected both server and client systems.
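The class of bug is easy to illustrate. The sketch below is a hypothetical echo handler, not OpenSSL's code: a peer sends a payload along with a length field that it fills in itself, and the handler copies that many bytes into its reply. The first comparison is the sort of check whose absence lets a peer read memory beyond what it actually sent.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical echo handler for illustration only. "actual_len" is the
 * number of payload bytes really received; "claimed_len" is the length
 * the peer says it sent. Returns the number of bytes written to the
 * reply buffer, or 0 if the request is rejected.
 */
static size_t echo_payload(const unsigned char *payload, size_t actual_len,
                           size_t claimed_len,
                           unsigned char *reply, size_t reply_cap)
{
    /* Never trust the claimed length beyond what actually arrived. */
    if (claimed_len > actual_len)
        return 0;

    /* Nor write more than the reply buffer can hold. */
    if (claimed_len > reply_cap)
        return 0;

    memcpy(reply, payload, claimed_len);
    return claimed_len;
}
```

Without the first check, a four-byte payload accompanied by a claimed length of 500 would cause the memcpy() call to read 496 bytes of whatever happened to follow the payload in memory.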
OpenSSL asleep
At the time of Heartbleed, the OpenSSL project had gone into a nearly moribund state. There were no policies for dealing with issues, and releases were not announced. There was no recognition that users might have to respond to releases containing security fixes. The source was complex and arcane; it "made procmail look pretty." There were almost no comments; indeed, there are still almost no comments, though it is starting to get better. The code was hard to maintain and hard to contribute to, especially for developers in the US due to crypto export issues. There were two overworked developers doing almost all of the work and making little money in the process; donations to the project were less than $2,000/year.
How did things get to this point? The project had put almost no time into building its community. It used mailing lists based on an old Majordomo server; they were unsearchable among other problems. Maintainership was static, with a great deal of wariness around the idea of letting anybody else in. The existing developers were strongly driven by the need to obtain consulting money, which caused them to focus on FIPS 140 certification. The project had no ability to make, much less adhere to, plans. The developers had fallen into a mindset that the easiest way to avoid breaking things was to not change them in the first place, so changes of any type were discouraged.
The advent of Heartbleed led immediately to a lot of negative feedback and hard questions. How can the project be trusted after such a vulnerability? When will the next one hit? These are, Salz said, good questions to ask.
After Heartbleed
One of the first after-effects of the Heartbleed disclosure was a wider recognition of the existence of critical, but underfunded projects. In response, the Linux Foundation launched the Core Infrastructure Initiative (CII) as a way of getting resources to those projects. That has helped OpenSSL, among others, to get back into a healthy state.
Before April 2014, OpenSSL had two primary developers, both of whom were volunteers, and no decision-making process. As of December of that year, the project had 15 members, two of whom are paid full-time by CII and two others who are paid from donations made directly to the project. There is a set of "very formal" decision processes in place now.
The group had its first in-person meeting that year in Düsseldorf, allowing
them to get to know each other — a critical part of making a group
functional. They drafted a set of major policies, covering release
strategies, security, coding style, and more. The project has become much
more transparent, in contrast to the insularity that prevailed before. Now,
Salz said, only the OpenSSL data structures are opaque. Much of the
project's work runs on GitHub now; email traffic on the lists has increased
and has also become much more useful.
In 2016 (so far), the project has seen 3,246 commits. It has put out one major release, 15 bug-fixing releases, and responded to 29 CVE numbers. In general, it shows a lot more activity from many more people and has a far better sense of community. Developers have gotten serious about closing issues reported against the project which, Salz said, is an important step toward building a community. If issues go without responses, users will eventually stop reporting them. The goal is to respond to all reports within four days; the actual response time remains about three times that number, but it's getting closer.
The support policies for OpenSSL releases are now well defined. The 1.1.0 release will receive support through September 30, 2018, while 1.0.2, the project's first long-term supported release, will be maintained through the end of 2019. Support for 1.0.1 is limited to security fixes now, and will end entirely at the end of 2016. All older versions are unsupported now. It is important, they said, to drop support for older releases, giving the project the resources to move forward.
Along with more development, the project has more people who are actively looking for problems. There is more fuzz testing going on, and the static analysis tools have been updated; indeed, there is more automated testing in general. A recently added tool will modify the code to invert the sense of the conditions in if statements to see whether the test suite catches the resulting bug; if it doesn't, there is a coverage gap in the test suite. All code must be formally reviewed before being committed.
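The condition-inverting technique can be shown in miniature. The sketch below is not the OpenSSL tool; it hand-writes the "mutant" that such a tool would generate and runs two made-up test suites against it. Every name here is hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

typedef bool (*fits_fn)(size_t len, size_t cap);

/* The original code under test. */
static bool fits_original(size_t len, size_t cap)
{
    if (len <= cap)
        return true;
    return false;
}

/* The mutant: identical, except the sense of the condition is inverted. */
static bool fits_mutant(size_t len, size_t cap)
{
    if (!(len <= cap))
        return true;
    return false;
}

/* A suite that exercises both outcomes; returns true if all checks pass. */
static bool thorough_suite(fits_fn f)
{
    return f(4, 16) && !f(32, 16);
}

/* A suite that never looks at the function's result. */
static bool empty_suite(fits_fn f)
{
    (void)f;
    return true;
}
```

Here thorough_suite() passes for fits_original() and fails for fits_mutant(), so the mutant is caught. The empty suite passes for both, which is exactly the coverage gap that this kind of tool is meant to expose.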
Future plans and lessons learned
There is now a posted project roadmap to work from. Currently, some 1,100 forks of the project exist on GitHub. At the top of the list for future development is support for TLS 1.3. The project plans to move to the Apache2 license, though they couldn't say just when that will happen. More testing is in the plans.
Then there is the ongoing issue of FIPS 140 certification. This certification is mandatory for most vendors selling software to the United States government; meeting this need essentially funded the project for five years. The validation process is, they said, a time-consuming and irritating thing to have to go through; it takes a couple of years and is expensive. It is an ongoing process as well, in order to keep up with changes in the software; FIPS 140 certification is not something that can be done once and forgotten about.
The OpenSSL FIPS 2.0 certification module applies to the 1.0.x release series. There is a new FIPS module coming, supported by SafeLogic; it will work with the 1.1.0 release. In general, the project is working to make the FIPS-related changes less intrusive in the code base as a whole.
What was learned from the whole Heartbleed experience? No project can rely on any one individual to continually perform superhuman feats; that will always lead to disappointment in the end. Code reviewers have to actually look at the code; they can't just hope that the community will somehow perform a proper review. A project also cannot rely just on tools; they will never do the full job. Proper code review takes a lot of time, and it needs to be done by experienced developers.
Needless to say, OpenSSL is looking for contributors. Beyond contributing patches, interested developers can test the pre-releases, report bugs, and help to close bugs. The presenters concluded by saying that they would like users to get in touch, especially those who are distributing OpenSSL further downstream.
Page editor: Jonathan Corbet
