Leading items
Welcome to the LWN.net Weekly Edition for August 20, 2020
This edition contains the following feature content:
- Exploring LibreOffice 7.0: a look at some of the changes in this major LibreOffice release.
- PHP Debugging using Xdebug: better debugging for PHP code.
- Theoretical vs. practical cryptography in the kernel: a disagreement over which approach to cryptography really makes a system more secure.
- 5.9 Merge window, part 2: the rest of the changes merged for the 5.9 kernel.
- Searching code with Sourcegraph: a tool for finding things in a large code base.
- Voxel plotting with gnuplot 5.4: a closer look at a headline gnuplot 5.4 feature.
This week's edition also includes these inner pages:
- Brief items: Brief news items from throughout the community.
- Announcements: Newsletters, conferences, security updates, patches, and more.
Please enjoy this week's edition, and, as always, thank you for supporting LWN.net.
Exploring LibreOffice 7.0
The Document Foundation (TDF) has announced the release of LibreOffice 7.0. This major release is a significant upgrade from version 6.4.6, focusing on interoperability with Microsoft Office, general performance, and support for OpenDocument Format (ODF) version 1.3. A complete list of new features and bug fixes can be found in the release notes.
When talking about the latest LibreOffice release, one must also talk about ODF, the default format for LibreOffice documents. ODF version 1.3, which was approved as an OASIS Committee specification back in December 2019, offers several improvements to the format that LibreOffice can now take advantage of. For the security-conscious, document encryption using OpenPGP (PGP) is a welcome addition. Further, while LibreOffice has supported digital signatures via X.509 certificates in past releases, PGP keys can now be used to sign documents in LibreOffice 7.0. The notes on ODF in the release clarify the compatibility between LibreOffice versions:
Recent versions of LibreOffice should have no issues consuming "ODF 1.3 Extended" files. The only known exception is OpenPGP/GPG encrypted ODF 1.3 documents, which can be imported only since LibreOffice 6.4.5. If compatibility with old and no longer maintained ODF consumers (such as OpenOffice.org, Apache OpenOffice or LibreOffice 3.x) is required, and conformance to the ODF standard is no concern, the version "ODF 1.2 Extended (compatibility mode)" continues to be available.
To put these latest features to the test, two systems were used: Ubuntu 20.04 and macOS 10.15.5. In this experiment, a document was created in LibreOffice Writer on the Ubuntu machine, which was then opened on the macOS machine to verify the cryptography. For all tests, my LWN PGP key was used.
Initially, the most complicated aspect of getting digital signatures and encryption via PGP working was importing the keys themselves. LibreOffice relies on what it calls the "certificate manager" of the operating system, and each system offers unique challenges. In this case, there is an outstanding bug in LibreOffice from 2018 (version 6.0.2.1) for macOS that made things unnecessarily difficult. In Ubuntu, importing my keys via gpg required logging out in order for them to be accessible by Seahorse (bug report).
Using the available PGP keys in LibreOffice 7.0 to sign or encrypt documents is straightforward. Adding a digital signature to a document is a matter of choosing "Digital Signatures" from the menu (bringing up a list of current signers of the document), then clicking "Sign Document" to add another. Signing a document automatically saves it, and any further modification invalidates the signatures. An example of an ODT document signed by me is available for download, and below is a screenshot of that document:
To encrypt an ODT document using PGP is even more straightforward; checking the "Encrypt with GPG key" box when using "Save As" brings up an interface to select the key. Attempting to open an encrypted document without having the private key available causes LibreOffice to oddly prompt for a decryption password (one would hope that no password exists); with the private key available, the document is decrypted transparently.
The LibreOffice 7.0 release includes considerable improvements for interoperability with the Microsoft Office software suite as well. For Microsoft Word, LibreOffice now supports saving in the native 2013/2016/2019 MS-DOCX format instead of the previous limitation of only the 2007 compatibility-mode format. Compatibility mode was never an ideal solution for interoperability with Microsoft Word, as Microsoft itself has acknowledged: "this mode is intended to ensure users of different versions of Microsoft Office can continue working together and documents created with older versions of Office won't look any different when they're opened in future versions of Office." TDF further explained the move by saying: "this mainly benefits Word users - where documents can use more features and Word's bug fixes since DOCX 1.0 can be applied." The release notes go on to acknowledge that this change adversely impacts users working with Microsoft Word 2010, and encourage those users to switch to LibreOffice.
For LibreOffice Calc, a number of improvements are included in the release. Two new functions for generating non-volatile (generated once per cell) random numbers, RAND.NV() and RANDBETWEEN.NV(), were added. For functions that allow the use of regular expressions, Calc now correctly handles case-insensitivity flags. The notes on this issue indicate that "the default case-sensitivity of the functions is not changed [...] the sensitivity only changes after first (?-i) in the regular expression." In another Microsoft Office compatibility improvement, exporting spreadsheets to Microsoft Excel with sheet names that are more than 31 characters is now supported.
Python macros to automate tasks will no longer be able to use CPython 2.7; the project has upgraded to CPython 3 exclusively. This change may cause problems for some macros; users should expect to address any scripting compatibility breaks between Python 2.7 and 3 when upgrading LibreOffice.
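The breaks involved are mostly the familiar Python-2-versus-3 differences. As a purely illustrative sketch (this is plain Python, not code using the LibreOffice macro API), a macro helper like the following would need its Python 2 idioms updated when moving to 7.0:
def summarize(count, total):
    # Under earlier releases this could be written for Python 2.7:
    #     print "processed %d of %d rows" % (count, total)
    #     ratio = count / total        # integer (truncating) division
    # With the CPython 3 interpreter in LibreOffice 7.0, the same logic becomes:
    print("processed %d of %d rows" % (count, total))
    ratio = count / total              # true division in Python 3; use
    return ratio                       # count // total to keep truncation

print(summarize(3, 4))   # 0.75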
With LibreOffice 7.0 released, the community is looking ahead to the 7.1 release. The wiki for TDF provides the release notes for 7.1, updated as new features and bug fixes are addressed. The two notable improvements listed are an experimental Outline mode for LibreOffice Writer and the ability to add visible signatures to existing PDF files in the LibreOffice Draw application. Writer's Outline mode facilitates making document outlines by providing a way to "fold all text from the current heading to next heading".
Readers who are interested in trying out or upgrading to version 7.0 can download binaries for Linux, macOS, and Windows platforms. The project also provides Flatpak, Snap, and AppImage builds. To get started with LibreOffice, extensive documentation of the project is available. As stated in the announcement, TDF "does not provide any technical support for users"; community support exists on the project's mailing lists and the Ask LibreOffice web site.
A major thrust of the LibreOffice 7.0 release appears to be directed toward making it more compatible with (and more viable as a replacement to) Microsoft Office. With Microsoft shifting its attention to online offerings (Microsoft 365), TDF appears to realize that the idea of being forced into paying an eternal subscription fee for an office-productivity suite of software isn't ideal for anyone but Microsoft. After all, some users may not want to depend on an entirely online office-productivity suite. It is likely TDF hopes to attract those users who need or want their productivity software installed locally.
For those users looking for an open-source online-office solution, TDF is also working on LibreOffice Online concurrently with the offline version. LibreOffice Online is currently "suitable for home users", but TDF "is keen to avoid situations where an unsuitable version is deployed at scale". To underscore this position, LibreOffice Online displays a prominent "not supported" warning if more than ten concurrent documents and/or more than 20 concurrent connections are active.
For readers unfamiliar with the specifics of the project, LibreOffice is primarily released under the Mozilla Public License v2.0. Its community consists of a wide range of corporate and individual contributors; the announcement for 7.0 stated: "74% of commits are from developers employed by companies sitting in the Advisory Board, such as Collabora, Red Hat, and CIB, plus several other organizations, and 26% are from individual volunteers."
The project regularly announces security advisories and appears to fix serious vulnerabilities promptly.
LibreOffice 7.0 is a major improvement in a number of key ways, and LibreOffice Online will likely continue to mature into a scalable solution. For current 6.4 users, the project will provide "some months of back-ported fixes."
In all, thanks to the work of TDF, free Microsoft Office-compatible alternatives (online and offline) continue to make significant progress.
PHP Debugging using Xdebug
While PHP does not come with a full toolkit for debugging and profiling, an open-source project has existed almost as long as PHP to provide both: Xdebug. Created and maintained by PHP core developer Derick Rethans, it offers remote debugging, stack traces, profiling, and more. It is a project that anyone doing PHP development would benefit from using.
The Xdebug GitHub page claims over 60 contributors to the project. Its most recent release, version 2.9.6, was made on May 29. The Xdebug compatibility page provides a table describing which versions of Xdebug are still supported, including which Xdebug version should be used for the PHP version in question. Presently, version 2.9 supports PHP 7.4 (with security support for PHP 7.1 and above). Xdebug uses its own open-source license based on the PHP project's permissive license.
Xdebug visual features
Simply installing the Xdebug extension offers a number of niceties as compared to unmodified PHP. Take, for example, this simple PHP script that causes an E_NOTICE error (because $foo is an undefined variable):
<?php
// Settings just to make sure all errors are displayed
error_reporting(E_ALL);
ini_set('display_errors', true);
// Turn off Xdebug (if enabled by default)
xdebug_disable();
echo "<h1>Without Xdebug</h1>";
echo "<br/>The value is: " . $foo . "<br/>";
xdebug_enable();
echo "<h1>With Xdebug</h1>";
echo "<br/>The value is: " . $foo . "<br/>";
Here is the output in the browser when this script is executed, both with and without Xdebug enabled:
Xdebug provides an enhancement to the var_dump() function, which is commonly used in development to dump the contents of a variable to the browser. Unmodified PHP will simply output the text without any HTML formatting (so it generally needs to be wrapped using the <pre> tag), whereas Xdebug provides an easier-to-read output:
Xdebug step-debugging
Xdebug isn't limited to HTML formatting of PHP output; it also offers a fully featured remote debugger. With Xdebug, PHP scripts running on a remote server can be debugged by communicating using the DBGp debugging protocol. DBGp was authored by Shane Caraveo and Rethans; it provides a common means to communicate remote-debugging commands for any language (for example, here is a Python implementation of DBGp). An open protocol means that plenty of clients implement DBGp, making it likely that Xdebug is supported in a developer's favorite editor. This article will be using Vdebug, a DBGp-compatible Vim plugin, to debug the PHP script. For reference, here is the script to be debugged:
<?php
$total = 10;
echo "<h1> First " . ($total + 2) ." Fibonacci numbers</h1>";
$a = 0;
$b = 1;
$vals = [];
echo "$a, $b, ";
for($i = 0; $i < $total; $i++) {
    $d = $b + $a;
    $vals[] = $d;
    $a = $b;
    $b = $d;
}
echo implode(', ', $vals);
When activated, Xdebug will make a connection to a DBGp client specified in php.ini. By default, that host will be localhost, but it can be changed as necessary using the xdebug.remote_host configuration setting (the port can be set using xdebug.remote_port). To activate the debugger for a given PHP script, multiple options are available. First, start a debugging session in the client (F5 in Vdebug), followed by activating the Xdebug debugger on the server. With the client listening, activating Xdebug for a request in our situation means appending XDEBUG_SESSION_START=<session_id> to the URL as an HTTP GET parameter (session_id is arbitrary):
http://localhost/step_debug.php?XDEBUG_SESSION_START=mysession
This request, which causes Xdebug to open a connection to the DBGp client specified in the xdebug.remote_host setting, triggers the Vdebug debugging interface on connection:
Using Vdebug, the script can be stepped through (F2 for step over, F3 to step into), breakpoints can be set using :Breakpoint, and inline PHP code can be evaluated using :Vdebugeval. See the Vdebug documentation for a complete listing of available commands.
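For reference, here is a minimal php.ini sketch for the Xdebug 2.x step-debugging setup described above. The values shown are the defaults, and xdebug.remote_enable (not mentioned above) must also be switched on before Xdebug will attempt any DBGp connection; adjust the extension path and addresses for the system in question.
zend_extension=xdebug.so
; Allow Xdebug to connect out to a DBGp client at all
xdebug.remote_enable=1
; Where the DBGp client (Vdebug, in this article) is listening
xdebug.remote_host=localhost
xdebug.remote_port=9000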
Profiling of PHP scripts
Along with debugging, Xdebug provides a mechanism to profile a PHP script's execution time by generating a Cachegrind-compatible file that can be visualized using the open-source KCachegrind application. To enable profiling of PHP scripts, we will need to set the xdebug.profiler_enable setting in the php.ini file to true, and provide a directory where Xdebug should put the generated profile files using xdebug.profiler_output_dir. Note that using the profiler can consume an extensive amount of disk space — over 500MB for complex applications. To only use the profiler for specific requests, you can instead set xdebug.profiler_enable_trigger to true and add XDEBUG_PROFILE as an HTTP variable to the request to be profiled:
http://localhost/step_debugger.php?XDEBUG_PROFILE=1
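For example, the relevant php.ini settings might look like this, with the trigger-based approach enabled (the output directory is an arbitrary choice):
; Only profile requests that carry the XDEBUG_PROFILE trigger
xdebug.profiler_enable=0
xdebug.profiler_enable_trigger=1
; Directory for the generated profile files (example path)
xdebug.profiler_output_dir=/tmp/xdebug-profiles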
Profiling the simple PHP scripts from our previous example wouldn't produce much useful information. To give a better demonstration of Xdebug's profiler capabilities, we installed a new stub Laravel framework project via Composer using the following command:
$ composer create-project laravel/laravel profile-demo
This (significantly more complex) stub project was then profiled by visiting it in our browser and providing the XDEBUG_PROFILE HTTP variable. Here is the Cachegrind profile of that request, as viewed using KCachegrind:
Other useful functionality
Along with debugger and profiler tooling, Xdebug provides a collection of additional PHP functions. Some of these functions, like xdebug_break(), are meant to be used to apply logic to features like debugger breakpoints. Others, like xdebug_call_class(), provide more detailed information on the call stack, which is handy in augmenting logs for analysis. The PHPUnit project takes advantage of Xdebug's code-coverage-analysis functions when available, enabling useful reports to measure how effective a code base's unit tests are.
The functions provided by Xdebug to developers allow for tight integration between Xdebug and the application. In the example below, we take our Fibonacci sequence code from above, introduce a non-fatal error, and add a few Xdebug functions to improve the way errors are displayed:
<?php
xdebug_start_error_collection(); // Capture non-fatal errors
$total = 10;
echo "<h1> First " . ($total + 2) . " Fibonacci numbers</h1>";
// Let's remove this, which will cause an E_NOTICE error
//$a = 0;
$b = 1;
$vals = [];
echo "$a, $b, ";
for($i = 0; $i < $total; $i++) {
    $d = $b + $a;
    $vals[] = $d;
    $a = $b;
    $b = $d;
}
echo implode(', ', $vals);
xdebug_stop_error_collection();
$errors = xdebug_get_collected_errors();
if(!empty($errors)) {
echo "<hr/>";
echo "<h1>Non-Fatal Errors</h1>";
foreach($errors as $error) {
echo $error;
}
}
The above example produces an E_NOTICE error, which is normally displayed as soon as it is generated. In our case, that would be right in the middle of our script's output. In a more complicated script that generates complex HTML, it's possible for such a PHP error not even to be rendered in the browser (e.g. if it happened inside a <script> tag). Xdebug allows us to avoid this sort of problem by providing a means of first capturing these errors using xdebug_start_error_collection(), and then displaying them at a more opportune moment (e.g. after the output is finished) by using xdebug_stop_error_collection() and xdebug_get_collected_errors(). Here is the output of the script above, implementing these functions to defer errors until after our program has finished:
In closing
This article has provided an introduction to the most commonly used features of Xdebug. There is more to it, however, including features like garbage collection statistics, function traces, and DBGp proxies. Xdebug ably fills a gap by providing robust debugging and profiling that is missing from the PHP standard distribution.
Theoretical vs. practical cryptography in the kernel
Shortly before the release of the 5.8 kernel, a brief patch to a pseudo-random-number generator (PRNG) used by the networking stack was quietly applied to the kernel. As is the norm for such things, the changelog gave no indication that a security vulnerability had been fixed, but that turns out indeed to be the case. The resulting controversy had little to do with the original vulnerability, though, and everything to do with how cryptographic security is managed in the kernel. Figuring prominently in the discussion was the question of whether theoretical security can undermine security in the real world.
Port numbers assigned to network sockets are not an especially secure item — they are only 16 bits, after all. That said, there is value in keeping them from being predictable; an attacker who can guess which port number will be assigned next can interfere with communications and, in the worst case, inject malicious data. Back in March, Amit Klein reported a port-guessing vulnerability to the kernel's security team; properly exploited, this vulnerability could be used to inject malicious answers to DNS queries, as one example.
The source of the problem comes down to how the kernel selects port numbers, which should be chosen randomly so as to not be guessable by an attacker. The kernel is able to generate random numbers that, as far as anybody knows, are not predictable, but doing so takes time — more time than the network stack is willing to wait. So, instead, the networking code calls prandom_u32(), which is a much simpler PRNG; it is effectively a linear-feedback shift register. That makes it fast, but unsuited to cryptographic operations; its output is a relatively simple function of its state, so anybody who can figure out what its internal state is can predict its output going forward. Klein, it seems, was able to do exactly that by observing the port numbers assigned by the kernel.
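To see why that matters, here is a deliberately simplified Python sketch of an LFSR-style generator. It is not the kernel's actual prandom implementation (which combines several such registers); the feedback taps and the fact that the raw state is returned directly are choices made purely for illustration. The point is only that, once the internal state is known, every future output follows from it:
MASK = 0xFFFFFFFF
TAPS = 0x80200003          # arbitrary feedback taps, chosen only for this demo

def lfsr_step(state):
    """Advance the toy register one step; return (output, new state)."""
    out = state & MASK
    lsb = state & 1
    state >>= 1
    if lsb:
        state ^= TAPS
    return out, state & MASK

# The "victim" draws a few pseudo-random values...
state = 0xDEADBEEF
outputs = []
for _ in range(4):
    out, state = lfsr_step(state)
    outputs.append(out)

# ...and an observer who recovered the state behind the first value can
# reproduce all of the following ones exactly.
attacker_state = outputs[0]
predicted = []
for _ in range(4):
    out, attacker_state = lfsr_step(attacker_state)
    predicted.append(out)

assert predicted == outputs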
Since the problem results from an attacker having worked out the internal state of prandom_u32(), an obvious way to address the problem is to change that state, preferably before the attacker learns what it is. The patch that was applied (written by Willy Tarreau) does this by mixing some random data into the pool used by prandom_u32() on every interrupt. This reseeding is also done when process accounting is performed, just to ensure that it happens even if no interrupts are coming in. An attacker can figure out the internal state just as easily as before but, the reasoning goes, by the time that has been done and the information can be used, said state has changed.
The complaints
On August 4, Marc Plumb posted a request that Tarreau's patch be immediately reverted and rethought. His concerns came down to this: "it allows an attacker to determine the state and [from] there [the] output of /dev/random. I sincerely hope that this was not the intended goal :)". Since /dev/random is the kernel's "real" random-number generator, compromising its state could lead to far worse consequences than guessable port numbers; that was indeed not anybody's intended goal.
Perturbing the state of prandom_u32() doesn't seem like something that could wreck the security of an unrelated random-number generator, so it is worth looking into the specifics of the complaint. Those were best explained by George Spelvin (a pseudonym associated with a long-time occasional kernel developer). In short: an attacker might be able to work out the value of the additional random bits, and those same bits are fed into the kernel's main entropy pool.
As Spelvin explained, the motivation for adding noise to the PRNG's state is a fear that an attacker may know the current contents of that state. An attacker with that knowledge can, obviously, predict the future "random" values that will be generated by the PRNG. If you inject k bits of random data into the PRNG, the attacker will lose that knowledge — to an extent. But, if k is sufficiently small, this attacker can simply try the PRNG with all 2^k possibilities and see which one corresponds to the actual observed output. Once that has been done, the attacker has determined the value of those k bits and regained knowledge of the PRNG's internal state.
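Continuing the toy sketch from above: the eight-bit "reseed" and the idea of mixing it in with a simple XOR are again assumptions made only for illustration (the kernel's actual mixing of fast-pool bits is different), but they show how cheap the search is when k is small:
def lfsr_step(state):            # the same toy register as in the earlier sketch
    out = state & 0xFFFFFFFF
    lsb = state & 1
    state >>= 1
    if lsb:
        state ^= 0x80200003
    return out, state & 0xFFFFFFFF

k = 8
known_state = 0x12345678         # state the attacker had already worked out
secret_bits = 0xA5               # the k freshly injected random bits
true_state = known_state ^ secret_bits   # toy "reseeding": XOR them in

# The victim emits two more values after the reseed...
observed, s = [], true_state
for _ in range(2):
    out, s = lfsr_step(s)
    observed.append(out)

# ...and the attacker walks through all 2**k candidate reseeds until one
# reproduces what was observed, recovering the new state in the process.
for guess in range(2 ** k):
    s, trial = known_state ^ guess, []
    for _ in range(2):
        out, s = lfsr_step(s)
        trial.append(out)
    if trial == observed:
        print("recovered reseed bits:", hex(guess))   # prints 0xa5
        break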
That might be bad enough, suggesting that the fix that was applied may be ineffective at actually closing the vulnerability. The other part of the complaint, though, is that the 32 bits applied to the PRNG come from the "fast pool", which can be thought of as a sort of queue of random bits waiting to be fed into /dev/random. These bits are not removed from the fast pool; they are still, eventually, used to add entropy to the primary random-number generator. This, it is argued, gives the attacker insight into the kernel's real random numbers and might compromise the random-number generator entirely.
There is general agreement in the kernel community that such a compromise is highly unlikely to lead to anything good. That is where the agreement stops, though.
If nothing else, Spelvin argued, the injection of random data into the PRNG must be done in a way that does not allow the guessing of that data. In practice, that means making k sufficiently large that a brute-force attack to calculate k bits becomes impractical; this "catastrophic reseeding" makes the use of that data safer. Even better, of course, would be to replace the simple PRNG with something that does not so readily expose its internal state.
Responses
Tarreau responded that the assumption that the attacker knows the PRNG state is not warranted; the purpose of adding the noise was to make guessing that state much more difficult. If the time required to calculate the PRNG's internal state is longer than the time between perturbations, there should never be a point where the attacker truly knows that state. If that is true, then concerns about exposing the random bits added to the PRNG go away.
He also questioned the practicality of any attack on /dev/random even if those bits are exposed.
Linus Torvalds's response was rather more direct and categorical:
Because I don't think they exist. And I think it's actively dangerous to continue on the path of thinking that stable and "serious" algorithms that can be analyzed are better than the one that end up depending on random hardware details.
More generally, he said, the insistence on theoretical security has often had the effect of making the kernel less secure, and he had a number of examples to cite. The random-number generator traditionally blocked in the absence of sufficient entropy to initialize it, causing users to avoid it entirely; this issue was only addressed in late 2019. The insistence that only "analyzable" entropy sources could be used to feed the random-number generator has led to systems with no sources of entropy at all, resulting in vast numbers of embedded devices having the same "random" state. And more; the whole message merits a read.
This same problem, he added later, delayed the resolution of the port-number-guessing problem.
The addition of noise, he said, can also be useful in the face of a Meltdown-style vulnerability. In that case, an attacker may well be able to read out the state of the random-number generator, but the regular mixing of noise should limit the usefulness of that information. In general, adding noise is the opposite of the "analyzability" that (he argues) crypto people want, and that is a good thing. It is better, he said, if the internal state of any random-number generator is not susceptible to analysis. Torvalds was quite clear that he would entertain no patches addressing theoretical problems.
Where to from here
It seems evident that the patch in question will not be reverted anytime soon; that would require the demonstration of a practical attack, and it is far from clear that anybody can do that. So, for now, port numbers generated by Linux should be a bit harder to guess and, presumably, the random pool as a whole remains uncompromised.
That said, nobody felt the need to defend the simple PRNG that underlies prandom_u32() now; the door is clearly open to replacing it outright if a suitable alternative can be found. That alternative would have to be something that does not make guessing its internal state so easy, but it would also have to retain the performance characteristics of the current implementation. Torvalds has proclaimed that he will not accept a replacement that slows things down.
Spelvin has proposed a replacement described as "a homebrew cryptographic PRNG based on the SipHash round function, which is in turn seeded with 128 bits of strong random key"; Tarreau played with it some and discussions over how to improve it have been ongoing. Tarreau also said that he is working on a replacement based on the Middle Square Weyl Sequence random-number generator [PDF], but no patches have been posted yet.
Meanwhile, Ted Ts'o, the maintainer of the kernel's random-number generator, has expressed worries that perhaps prandom_u32() is being used in places where it shouldn't be:
Replicating that grep suggests that there are nearly 500 call sites to be worried about — and that doesn't count any BPF programs in the wild using bpf_get_prandom_u32(). Replacing prandom_u32() might help to mitigate such concerns, but it will not eliminate them. The whole point of prandom_u32() is that performance takes priority over security, so if it is being used in places that require a proper random-number generator, there may indeed be a need for some changes in the near future.
Overall, random-number generation remains a surprisingly difficult problem. It's hard to predict which issues will occupy kernel developers in, say, 2030, but chances are good that random numbers will be one of them. Meanwhile, Linux has always been a bit of a battleground between theoretically superior solutions and those that are deemed to work in real-world situations; that, too, is unlikely to change anytime soon.
5.9 Merge window, part 2
By the time Linus Torvalds released 5.9-rc1 and closed the merge window for this cycle, 12,866 non-merge changesets had been pulled into the mainline repository. Nearly 9,000 of those came in after the first 5.9 merge-window summary was written. Clearly the kernel-development community remains busy. Much of what was merged takes the form of cleanups and restructuring, as always, but there was also a substantial set of new features.
Architecture-specific
- The xtensa architecture has gained support for the audit and seccomp mechanisms.
- The csky architecture has also gained seccomp support.
- The RISC-V architecture now has support for a number of features, including code-coverage tracking with kcov, the kmemleak tester, stack protection, jump labels, and tickless operation.
- PowerPC has gained a queued spinlock implementation that provides "significantly improved" performance in highly contended situations.
- The arm and arm64 architectures now use the schedutil CPU-frequency governor by default.
Core kernel
- The proactive compaction patches have been merged. They perform memory compaction in the background, hopefully increasing the supply of large pages available to the kernel.
Filesystems and block I/O
- The SCSI subsystem can now make use of encryption hardware on UFS controllers to implement inline encryption.
- The device mapper's dm-crypt target now has options to avoid the use of workqueues for cryptographic processing. Not using workqueues can improve latency; it is also required to properly support zoned block devices (devices with regions that must be written sequentially) with dm-crypt.
- The NFSv4.2 client has gained support for extended attributes.
Hardware support
- Clock: Broadcom BCM2711 DVP clock controllers, Qualcomm IPQ APSS clock controllers, Qualcomm MSM8996 CPU clock controllers, and Qualcomm SM8150 and SM8250 graphics clock controllers.
- Graphics: Ingenic image processing units and Xilinx ZynqMP DisplayPort DMA engines and controllers.
- Industrial I/O: InvenSense ICM-426xx motion trackers and Sensirion SCD30 carbon-dioxide sensors.
- Miscellaneous: multi-color LEDs in a general way (see this commit for documentation), Turris Omnia LED controllers, Microchip timer counter capture devices, Qualcomm inline crypto engines, TI J721E PCIe platform host controllers, Xilinx Versal CPM host bridges, TI BQ2515X battery chargers, Mediatek MT6779 pin controllers, TI C66x and C71x DSP remote processor subsystems, and Khadas system control microcontroller interfaces.
- Networking: Vitesse Seville VSC9953 switches and Solarflare EF100 Ethernet cards.
- Sound: Maxim integrated MAX98373 speaker amplifiers and NVIDIA Tegra audio processing engines.
- Video4Linux: Xilinx CSI-2 Rx subsystems, Chrontel CH7322 CEC controllers, Mediatek DW9768 lens voice coils, Maxim MAX9286 GMSL deserializers, and IMI RDACM20 cameras.
- It's also worth noting that the "speakup" console speech driver, which has lived in the staging tree since the 2.6.37 kernel, has finally graduated out of staging.
Networking
- "BPF iterators" have been added for TCP and UDP sockets; these allow a BPF program to work through the list of open sockets and extract whatever information is of interest. Sample programs for TCP and UDP are included.
- The new BPF_PROG_TYPE_SK_LOOKUP BPF program type will run when the kernel is looking for an open socket for an incoming connection. The program can then decide which socket should receive the connection. This mechanism has been added as a way to "bind" a socket to a range of addresses or port numbers in a simple way.
- The parallel redundancy protocol is now supported.
Virtualization and containers
- The 32-bit PV guest mode has been removed from the Xen hypervisor; any remaining users (there are expected to be few) can use the better-supported "PVH" mode instead.
Internal kernel changes
- The way that priorities are assigned to kernel threads has been significantly reworked. The new API brings a lot more consistency in how realtime priorities are assigned across the kernel.
- The initrd code can no longer cope with a disk image stored on multiple floppies. Christoph Hellwig noted: "No one should be using floppies for booting these days. (famous last words..)".
- Kernel modules that import symbols from proprietary modules will themselves be marked as tainted, eliminating their ability to access GPL-only symbols in the rest of the kernel. This change, along with its motivation, is explained in this article from July.
Now the development community will take seven or eight weeks to stabilize this release, with a final 5.9 release expected in the first half of October.
Searching code with Sourcegraph
Sourcegraph is a tool for searching and navigating around large code bases. The tool has various search methods, including regular-expression search and "structural search", a relatively new technique that is language-aware. The open-source core of the tool comes with code search, go-to-definition, and other "code intelligence" features, which provide ways for developers to make sense of multi-repository code bases. Sourcegraph's code-searching tools can show documentation for functions and methods on mouse hover and allow developers to quickly jump to definitions or to find all references to a particular identifier.
The Sourcegraph server is mostly written in Go, with the core released under the Apache License 2.0; various "enterprise" extensions are available under a proprietary license. The company behind Sourcegraph releases a new version of the tool every month, with the latest release (3.18) improving C++ support and the 3.17 release featuring faster and more accurate code search as well as support for AND and OR search operators.
Code search
The primary feature of Sourcegraph is the ability to search code across one or more repositories. Results usually come back in a second or two, even when searching hundreds of repositories. The default query style is literal search, which will match a search string like "foo bar" exactly, including the quotes. Clicking the .* icon on the right-hand side of the search bar switches to regular-expression search, and either of those search modes supports case-sensitive matching (by clicking the Aa icon).
The [] icon switches to "structural search", a search syntax created by Rijnard van Tonder (who works at Sourcegraph) for his Comby project. Structural searches are language-aware, and handle nested expressions and multi-line statements better than regular expressions. Structural search queries are often used to find potential bugs or code simplifications; for example, a query for the following:
fmt.Sprintf(":[str]")That will find places where a developer can eliminate a fmt.Sprintf() call when it has a single argument that is just a string literal.
The documentation has an architecture diagram that shows the various processes a Sourcegraph installation runs. There is also a more detailed description of the "life of a search query". The front end starts by looking for a repo: filter in the query to decide which repositories need to be searched. The server stores its list of repositories in a PostgreSQL database, along with most other Sourcegraph metadata; Git repositories are cloned and stored in the filesystem normally.
Next, the server determines which repositories are indexed (for a specific revision if specified in the search query) and which are not: both indexing the repositories and indexed searches are handled by zoekt, which is a trigram-based code-search library written in Go. (Those curious about using trigrams for searching code may be interested in Go technical lead Russ Cox's article about it).
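Zoekt's actual on-disk index is considerably more involved, but the basic trigram technique can be sketched in a few lines of Python: record which files contain each three-character substring, intersect the posting lists for a query's trigrams, and then confirm the surviving candidates with a direct scan. (The file contents below are made up for the example.)
from collections import defaultdict

def trigrams(text):
    """All overlapping three-character substrings of text."""
    return {text[i:i + 3] for i in range(len(text) - 2)}

def build_index(files):
    """Map each trigram to the set of file names containing it."""
    index = defaultdict(set)
    for name, content in files.items():
        for tri in trigrams(content):
            index[tri].add(name)
    return index

def search(index, files, query):
    """Return the files containing the literal query string."""
    candidates = None
    for tri in trigrams(query):
        postings = index.get(tri, set())
        candidates = postings if candidates is None else candidates & postings
    # Containing all the trigrams only makes a file a candidate; the query
    # itself still has to be confirmed with an ordinary substring search.
    return [name for name in sorted(candidates or set()) if query in files[name]]

files = {
    "main.go": 'func main() { fmt.Println("hello") }',
    "util.go": "func helper() int { return 42 }",
}
index = build_index(files)
print(search(index, files, "Println"))   # ['main.go']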
Repository revisions that are not indexed are handled by a separate "searcher" process (which is horizontally-scalable via Kubernetes). It fetches a zip archive of the repository from Sourcegraph's server (i.e. gitserver) and iterates through the files in it, matching using Go's regexp package for regular-expression searches or the Comby library for structural searches. By default, only a repository's default branch is indexed, but Sourcegraph 3.18 added the ability to index non-default branches.
Code intelligence
The second main feature of Sourcegraph is what the company calls "code intelligence": the ability to navigate to the definition of the variable or function under the cursor, or to find all references to it. By default, these features use "search-based heuristics, rather than parsing the code into an AST [abstract syntax tree]", but the heuristics seem to be quite accurate in the tests I ran. The tool found definitions in C, Python, and Go without a problem, and even found dynamically-assigned definitions in Python (such as being able to go to the definition of the assigned and re-assigned scandir_python name in my scandir project).
More recently, Sourcegraph has implemented a more precise code-search feature (which uses language-specific parse trees rather than search heuristics) using Microsoft's Language Server Index Format (LSIF), a JSON-based file format that is used to store data extracted by indexers for language tooling. Sourcegraph has written or maintains LSIF indexers for several languages, including Go, C/C++, and Python (all MIT-licensed). Currently, LSIF support in Sourcegraph is opt-in, and according to the documentation: "It provides fast and precise code intelligence but needs to be periodically generated and uploaded to your Sourcegraph instance." Sourcegraph's recommendation is to generate and upload LSIF data on every commit, but developers can also set up a periodic job to index less frequently.
Code intelligence queries are broken down into three types: hover queries (which retrieve the documentation associated with a symbol to display as "hover text"), go-to-definition queries, and find-references queries. Precise LSIF information is used if it is available, otherwise Sourcegraph falls back to returning "fuzzy" results based on a combination of Ctags and searching.
Open source?
Sourcegraph's licensing is open core, but the delivery is somewhat unusual: all of the source, including the proprietary code, is in a single public repository, but the code under the enterprise/ and web/src/enterprise/ directories is subject to the Sourcegraph Enterprise license, and the rest of the code is under the Apache license. The pre-built Docker images provided by Sourcegraph include the enterprise code "to provide a smooth upgrade path to Sourcegraph Enterprise", but the repository provides a build script that builds a fully open-source image. The enterprise code includes a check to disallow more than ten users, but that won't be included in an open-source build. Overall, building and installing the open-source version is not well documented and its setup script may be missing some steps — it definitely feels like a second-class citizen.
Sourcegraph (the company) runs a hosted version of the system that allows anyone to search "top" public repositories from various code hosts. It is unclear how "top" is defined, or exactly what repositories are indexed in this hosted version, but this version provides a good demonstration of the features available. The company's pricing page lists the features that are restricted to the enterprise version, including: the Campaigns multi-repository refactoring tool, support for multiple code hosts, custom branding, live training sessions, and more.
Setup
Installing the pre-built Sourcegraph images was quick using the docker-compose method, as shown in its installation documentation. It took a couple of minutes to get it up and running, and a few more minutes to configure it. I was running it on my local machine, so I used an ngrok tunnel to (temporarily) provide an internet-facing domain with https support (it didn't need this to run, but certain features work better if it is provided). The even quicker single-command Docker installation method also worked fine, but I decided to try out the docker-compose option: it seems slightly more realistic, as it's recommended for small and medium production deployments and not just local testing. For larger, highly-available deployments, Sourcegraph recommends deploying on a Kubernetes cluster.
Very little configuration was required to set things up: creating an admin user, and pointing the system at a code host (in my case, I needed to create a GitHub access token to allow Sourcegraph to access my public and private repositories on GitHub). As soon as the access token was added, Sourcegraph started cloning and indexing the repositories. A couple of minutes later, they were ready to search. The system is optimized for self-hosting; presumably the company wants to make it easy for developers to set it up for a small number of test users (and then ask them to start paying when they go above ten users).
One of the "features" that may give some people pause is what Sourcegraph calls "pings"; by default, the tool sends a POST request to https://sourcegraph.com/.api/updates approximately every 30 minutes "to help our product and customer teams". This "critical telemetry" includes the "email address of the initial site installer" and the "total count of existing user accounts", presumably so the company can try to contact an installer about paying for its enterprise offering when the ten-user threshold is reached. It can only be turned off by modifying the source code (the ping code is in the open-source core, so someone could comment out this line to get rid of it). By default, the system also sends aggregated usage information for some product features, but this can be turned off by setting the DisableNonCriticalTelemetry configuration variable. To its credit, Sourcegraph is up-front about its "ping philosophy", and clearly states that it never sends source code, filenames, or specific search queries.
Browser and editor integrations
In addition to the search server and web UI, Sourcegraph provides browser extensions for Chrome and Firefox that enable its features to be used when browsing on hosts like GitHub and GitLab. For example, when reviewing a pull request on GitHub, a developer with the Sourcegraph extension installed can quickly go to a definition, find all references, or see the implementations of a given interface. As of June 2019, GitHub has a similar feature, which uses its semantic library, though the Sourcegraph browser extension seems to be more capable (for example, it finds struct fields, and not just functions and methods). The Sourcegraph browser extension tries to keep a developer on github.com if it can, but for certain links and definitions it goes to the Sourcegraph instance's URL.
Sourcegraph also provides editor integrations for four popular editors (Visual Studio Code, Atom, IntelliJ, and Sublime Text). These plugins allow the developer to open the current file in Sourcegraph, or search the selected text using Sourcegraph (the plugins open the results in a browser). The browser extensions and editor plugins fit with one of Sourcegraph's principles: "We eventually want to be a platform that ties together all of the tools developers use".
In conclusion
The development of Sourcegraph is fairly open as well, with tracking issues for the upcoming 3.19 and 3.20 releases, as well as a work-in-progress roadmap. Along with many improvements planned for the core (search and code intelligence), such as "OpenGrok parity", it looks like the company is working on its cloud offering, and that the Campaigns feature will see significant improvements.
Sourcegraph looks to be a well-designed system that is useful, especially for large code bases and big development teams. In fact, the documentation implies that the tool might not be the right fit for small teams: "Sourcegraph is more useful to developers working with larger code bases or teams (15+ developers)." Some may also be put off by the poorly-supported open-source build and the phone-home "pings"; however, it does look like some folks have persisted with the open-source version and have gotten it working.
Voxel plotting with gnuplot 5.4
In this followup to our coverage of the release of gnuplot 5.4, we look more deeply at one of the new features: voxel plots. We only briefly touched on these plots in that article, but they are the most conspicuous addition in this release of the free-software graphing tool. Voxel plotting provides multiple ways to visualize 3D data, so it is worth looking at this new plot type in more detail.
Voxel grids
Previously, we introduced the simple physical system of an electric dipole, with two equal but opposite charges placed on the z axis at 0.25 and 0.75. It should be instructive to stick with this one problem, and explore different ways to visualize the 3D potential field produced by the charges.
Before diving in to voxel plotting proper, we need to understand that the familiar splot command has new 3D powers. splot used to stand for "surface plot", but can now do much more. For example, we can use it to plot the positions of our two charges like this:
set xrange [0:1]; set yrange [0:1]; set zrange [0:1]
set view 65,40
set xyplane at -0.1
set border 4095
unset key
$charges << EOD
0 0 0.75 1
0 0 0.25 -1
EOD
splot $charges using 1:2:3:4 with points \
    pointtype 7 pointsize 5 linecolor palette
This script produces this picture:
The first five lines of the script set the ranges of the display bounding box, the angle of view, the position of the bottom plane, and the borders to surround the box on all sides (and, finally, turn off the plot key). The next line, beginning with $charges, defines a "data block" consisting of the following two lines. Each line contains x, y, and z coordinates and, in the fourth column, the magnitude of the charge. The final command, broken over two lines, plots the two charges using their positions, extracted with the using 1:2:3 piece, and the charge value from the fourth column, extracted with the :4. This value is used to decide which colors the plotted points should be, by mapping the value onto the color palette, which is what the "linecolor palette" tells gnuplot to do. The other clauses set the pointsize to be five character widths and the pointtype to a circle (7).
Next, we will make a graph of the 3D structure of the potential field around these two charges. For this, we turn to the voxel grid. Just as a 2D image, such as a photograph, is a rectangular array of pixels, data in 3D can be represented as a 3D rectangular array of voxels, or volume pixels. Each voxel has x, y, and z coordinates, and a numerical value attached to it, so the voxel grid can represent a function of three variables, f(x, y, z). Note that this is completely new in gnuplot 5.4; previously, 3D plotting was confined to the plotting of surfaces or other representations of functions of two variables.
The way that the voxel commands in gnuplot are structured fits naturally with concepts from physics, which is why we've chosen a physical system for our example. In classical physics one has sources and the fields that they give rise to. The source could be a mass, such as a planet, creating a gravitational field; or, as in our case, a charged particle, giving rise to an electrical field. The field from a collection of sources is the sum of the fields from each individual source. The sources in our examples will be the two charges depicted above. The field created by adding the potentials from the two charges will be calculated and stored on the voxel grid.
Voxel plotting proceeds in two stages. First, after the voxel grid is initialized, it is filled with values calculated from any number of sources. This stage attaches the numbers to the locations on the voxel grid; it defines the f(x, y, z). In the second stage, we use the various styles of the splot command to create visualizations of this 3D function. Thus the calculation that establishes the f(x, y, z) need only be done once; the data is permanently stored in the voxel grid. Then we can make any number of plots of different types, from different angles, and so on. This two-stage process may seem cumbersome, but, since the initial filling of the voxel grid can be computationally intensive, it is actually more efficient to be able to do this just once.
There can be any number of sources. In our example, there are just two: the positive and negative charges. They need not correspond to voxel locations, and there can be more sources than voxel grid elements. The collection of sources, in other words, is independent of the configuration of the voxel grid.
The new command set vgrid establishes the voxel grid. It requires a name beginning with a dollar sign and a size; an example of a complete command would be:
set vgrid $v size 25
which defines a 25 × 25 × 25 grid initialized with 0 at each voxel. Voxel grids only come in the cubic variety. This command also sets the voxel grid $v to be the currently active grid, so that certain subsequent commands will automatically operate on it.
The command to fill the voxel grid with values from the sources is vfill. A complete vfill command has this form:
vfill source using x:y:z:radius:value
where source is a file or data block that must have at least three columns; in our example, that's the data block called $charges. Each line of source will become a source of data for the voxel grid. The first three colon-separated fields after the using keyword refer to the three spatial coordinates; if we put 1:2:3 here then they will be taken from the first three respective columns in source. The radius is required and indicates that the source values are to be applied on the voxel grid within a sphere with the indicated radius for each point in the source. The value indicates what to add to the voxel grid points inside that radius for each source point. This can be a constant number, some function of the coordinates, values taken from other optional columns in source, or anything else.
For example, say there was a file of 3D data, called "f", stored on disk; it has a 100 × 100 × 100 array, with x, y, z coordinates in the first three columns, ranging from 0 to 1 along each dimension, and the data value in the fourth column. This data could be assigned to a voxel grid directly with the following commands:
set xrange [0 : 1]; set yrange [0 : 1]; set zrange [0 : 1]
set vxrange [0 : 1]; set vyrange [0 : 1]; set vzrange [0 : 1]
set vgrid $v size 100
vfill "f" using 1:2:3:(0):4
Here we have set the voxel grid ranges (vxrange, etc.) to be the same as the coordinate ranges in the data, and they have the same dimensions, so the data will correspond exactly to the voxel grid. In vfill we use a radius of 0 so that each point in the "source" will only affect the voxel that it lies on top of. In a using clause, bare numbers refer to columns in the data, but parentheses are used to delimit arithmetic expressions, so (0) is just the number zero.
For our dipole example, we want the voxel data to come from the value of the potential, which you may remember looks like charge/r, where r is the distance from the charge (we are not worrying about units here). But that calculation goes infinite when r is zero, so we need to handle that case, which we can do by using gnuplot's ternary operator to define the potential function as:
pot(r) = r > 0 ? 1/r : 10**6
which avoids the infinity by substituting an arbitrary large number.
We will set the ranges of the bounding box and the voxel grid to go from 0 to 1, and in the vfill command we will use a radius large enough to include the entire box, because the range of the potential field is infinite:
vfill $charges using 1:2:3:(2):($4*pot(VoxelDistance))
Here we have used gnuplot's convenient VoxelDistance function, which computes r for us. In the parenthesized expressions in using clauses, source columns are referred to with dollar signs; so here we are extracting the charge values from the fourth column in the data. After executing the vfill, we will be able to plot the voxel grid in various ways.
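Concretely, given the two entries in the $charges data block, the value that this vfill accumulates at each voxel (x, y, z) is (still ignoring units and constants, and with the pot() cutoff taking over at points essentially on top of a charge):
pot_total(x, y, z) = 1/sqrt(x^2 + y^2 + (z - 0.75)^2) - 1/sqrt(x^2 + y^2 + (z - 0.25)^2)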
It would make sense to use a color palette for representing the potential field that clearly distinguishes between positive and negative values, and that blends into the white background for values close to zero. We can define a palette with these properties with the command:
set palette defined (0 "red", 0.5 "white", 1 "blue")
This maps the entire range of values stored in the voxels to the range zero to one, then uses those mapped values to determine which color to choose from the palette. So this will map the smallest (most negative) voxel values to pure red, blending into white in the middle of the range, where the potential is zero, and grading to pure blue at the maximum. We have already used this palette to plot the two charges in our first illustration.
Now the command
splot $v with points above -10**6 pointtype 7 pointsize 0.4 linecolor palette
will plot the voxel grid $v with a style that places a point at each voxel location. The above clause plots all those points with values above the given number; as there is no built-in way to just plot everything, we are obligated to choose a number smaller than any that exists on the grid. The pointtype 7 means closed circles, and the "linecolor palette" tells gnuplot to color the points using the palette that we defined above.
This plot suffers from a kind of moiré pattern, due to viewing periodic arrays of points superimposed on each other. This artifact is not only annoying, but can give a false impression about the nature of the data that the visualization is trying to clarify. Fortunately, the problem can be eliminated by applying jitter. The jitter command appeared first in gnuplot 5.2, and was described here in an earlier article. It applies small random displacements to plotted points; this application of noise is just what is needed to break up the alignments that cause the moiré interference in voxel plots.
If we insert the following command before the splot command in the aforementioned script:
set jitter spread 0.3
we get the following output:
There are many possible variations of the point-plotting of voxel grids. The opacities, colors, types, and sizes of the plotted points can all be made to depend on the field values in various ways. This can quickly become complicated, but we have one example in the Appendix.
Projection onto surfaces
Rather than trying to visualize the whole volume all at once with a mass of points, a popular alternative technique for 3D visualization is to cut the volume with a plane and plot the projection of the field on it. This is done using the "++" synthetic data source, which will generate coordinate pairs over the bounding box range, with steps based on the samples and isolines settings. It is explained, tersely, on pp. 112–113 of the gnuplot manual [2.1MB PDF]. The command:
splot "++" using (0.2):1:2
will draw a plane at x = 0.2, parallel to the y-z plane.
We want to color this plane according to the values of the voxel grid elements that it intersects. To do this, we add a fourth field to the using clause that takes the values from the voxels:
splot "++" using (0.2):1:2:(voxel(0.2, $1, $2)) with pm3d
With the same preliminaries as before, and after making the settings mentioned above, the command produces this plot:
To get an overview of the 3D structure of the field, we need more than one slice. By looping through a series of splot commands, we can lay down a set of slices. If we make the surfaces transparent, we can create a clear visualization of the field. To do this, we'll replace the single splot command above with this:
set style fill transparent solid 0.4
splot for [j=0:9] "++" using (j/10.):1:2:(voxel(j/10., $1, $2)) with pm3d
The solid 0.4 sets the opacity of the surfaces; the smaller the number, the more transparent they are. The splot loop replaces the constant x-value we used for the single slice with 10 values from 0 to 9 (thus generating slices at each x, 0.0 to 0.9).
While probably not necessary for this example, the visualization of complex 3D structures can sometimes be aided by animating some aspect of the plot: the angle of view, the range of values plotted, or the position of a cross-section. We can easily adapt the previous command to save a series of plots on disk, using the block iteration syntax introduced in gnuplot 4.6:
do for [j=0:99] {
    set out "slfr" . gprintf('%02g', j) . '.png'
    splot "++" using (j/99.):1:2:(voxel(j/99., $1, $2)) with pm3d
}
This uses the same splot command as in the immediately preceding example, but with 100 slices rather than 10. Before each splot command we set the output file name to something that contains the slice number using two digits, so that the names will be sorted correctly for post-processing. We construct the file name string with gnuplot's gprintf() function, which is similar to the familiar sprintf() function from C (and which is also available within gnuplot), but a little simpler. After running this, we will be in possession of 100 files named slfr00.png to slfr99.png.
There are many programs that can stitch these up into a movie. For these purposes I use ImageMagick. If you have this installed, the command is simply:
convert slfr*.png slfr.gif
And the result is:
Isosurfaces
The splot command for visualizing voxel grids has one more trick up its sleeve: isosurfaces. This is a surface, or a set of surfaces, that show where the voxel grid has certain values. In other words, it is the 3D analogue of contour lines for functions of two variables. And, as with contour lines, they may need to be interpolated between grid values.
The following command draws the isosurfaces showing where the voxel grid has the values 1 or -1. Gnuplot colors the "top" and "bottom" of the surfaces differently by default. The rest of the setup is the same as before, so we need not repeat it—but the depthorder command is necessary for the rendering to be correct; it ensures that pieces of the surface "closer" to the viewer are drawn over those farther away.
set pm3d depthorder
splot $v with isosurface level 1, $v with isosurface level -1
That results in the following graph:
We can also apply transparency and iteration to isosurface plotting, to create a set of nested isosurfaces that amount to a 3D contour plot, perhaps the most useful visualization for this particular problem:
set style fill transparent solid 0.4
splot for [j=-100:100:1] $v with isosurface level j
Here we see all of the isosurfaces for the whole number values from -100 to 100:
Visualization of 3D data is inherently complicated. Admittedly, some of the commands and concepts that gnuplot exposes for voxel plotting can take a while to understand; that was certainly the case for me. The official manual is quite terse; most of what can be done with voxel grids I discovered through experimentation and brief excursions through the source code. But one reason that these commands seem arcane is their flexibility; once they are mastered they enable almost anything that can be imagined. For example, the technique described here for cutting the space with a plane can be immediately extended to the projection of voxel data onto curved surfaces, such as a cylinder embedded within the potential field. In other words, gnuplot may sometimes demand a bit much from the user, but it offers a lot in return.
Page editor: Jonathan Corbet