
LWN.net Weekly Edition for January 9, 2020

Welcome to the LWN.net Weekly Edition for January 9, 2020

This edition contains the following feature content:

  • Some median Python NaNsense: what should the statistics module do with floating-point not-a-number values?
  • Toward a conclusion for Python dictionary "addition": PEP 584 nears a decision.
  • A medley of performance-related BPF patches: the BPF dispatcher, memory-mappable maps, and batched map operations.
  • Removing the Linux /dev/random blocking pool: the end of a longtime kernel feature.
  • The trouble with IPv6 extension headers: a proposal to formalize extension-header processing raises larger questions.

This week's edition also includes these inner pages:

  • Brief items: Brief news items from throughout the community.
  • Announcements: Newsletters, conferences, security updates, patches, and more.

Please enjoy this week's edition, and, as always, thank you for supporting LWN.net.

Comments (none posted)

Some median Python NaNsense

By Jonathan Corbet
January 3, 2020
Anybody who has ever taken a numerical analysis course understands that floating-point arithmetic on computers is a messy affair. Even so, it is easy to underestimate just how messy things can be. This topic came to the fore in an initially unrelated python-ideas mailing-list thread: what should the Python statistics module do with floating-point values that are explicitly not numbers?

Kemal Diri doubtless did not mean to start a massive thread with this request to add a built-in function to the language to calculate the average of the values in a list. That request was quickly dismissed, but the developers moved on to discuss the seemingly strange behavior of the statistics module's median() function when presented with floating-point not-a-number values.

What's not-a-number?

Python integers are not tied to any particular machine representation and can take on values of arbitrary magnitude. The float type, instead, is restricted to what the underlying CPU implements; on almost all current hardware, floating-point numbers are described by IEEE 754. That format looks like this:

[Figure: the IEEE 754 double-precision format, with a sign bit, an 11-bit exponent, and a 52-bit mantissa]

The interpretation of these numbers is relatively straightforward — except for the quirks. Since this is essentially a sign/magnitude format, it is capable of representing both positive and negative values of zero; both are treated as zero during mathematical operations and (most) comparisons, but they are distinct numbers. The exponent is a biased value that makes comparisons easier. And, importantly for this discussion, an exponent value that is all ones is interpreted in special ways.

In particular, if the exponent is all ones and the mantissa is all zeroes, then the value represents infinity (usually written as "inf"). The sign bit matters, so there can be both positive and negative inf values. Any other value of the mantissa, instead, creates a special value called "not-a-number" or NaN. These values can be created by the hardware when an operation results in a value that cannot be represented — arithmetic overflow, for example. NaN values are also often used in scientific code to represent missing data, though that was apparently not a part of the original intent for NaNs.

There is more. With 52 bits to play with (plus the sign bit), it is obviously possible to create a lot of unique NaN values. Some code uses the mantissa bits to record additional information. But the most significant bit of the mantissa is special: if that bit is set, the result is a "quiet NaN"; if it is clear, instead, the value is a "signaling NaN". Quiet NaNs will quietly propagate through computations; adding a number to a quiet NaN yields another quiet NaN as a result. Operations on signaling NaNs are supposed to generate an immediate error.

(And, yes, some processors invert the meaning of the "quiet" bit, but that's more pain than we need to get into here.)
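
These bit patterns can be inspected from Python itself; on typical IEEE 754 hardware, CPython's float('nan') comes out as a quiet NaN. A quick sketch using the struct module:

    import math
    import struct

    def bits(x):
        # Raw IEEE 754 bit pattern of a Python float (a C double), as hex.
        return struct.pack('>d', x).hex()

    print(bits(math.inf))      # 7ff0000000000000: exponent all ones, mantissa zero
    print(bits(-math.inf))     # fff0000000000000: same, with the sign bit set
    print(bits(float('nan')))  # 7ff8000000000000: the "quiet" mantissa bit is set

    nan = float('nan')
    print(bits(nan + 1.0))     # still a quiet NaN: it propagates through arithmetic
    print(nan == nan)          # False: a NaN is not equal even to itself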

The problem with median()

The diversion of the initial thread came about when David Mertz pointed out that median() can yield some interesting results when the input data includes NaNs. To summarize his examples:

   >>> import statistics
   >>>
   >>> NaN = float('nan')
   >>> statistics.median([NaN, 1, 2, 3, 4])
   2
   >>> statistics.median([1, 2, 3, 4, NaN])
   3

This strikes Mertz as a nonsensical result: the return value from a function like median() should not depend on the order of the data passed to it, but that is exactly what happens if the input data contains NaN values.

There was no immediate consensus on whether this behavior is indeed a problem or not. Richard Damon asserted: "Getting garbage answers for garbage input isn't THAT unreasonable". Steven D'Aprano, who added the statistics module to the standard library, agreed: "this is a case of garbage in, garbage out". In both cases, the basis of the argument is that median() is documented to require orderable input values, and that is not what is provided in the example above.

Naturally, others see things differently. It turns out that IEEE 754 defines a "total order" that can be used to order all floating-point values. In this order, it turns out that signaling NaNs are bigger than infinity (in both the positive and negative directions), and quiet NaNs are even bigger than that. So some felt that this total order should be used in calculating the median of a list of floating-point values. Brendan Barnwell went a bit further by arguing that NaN values should be treated like other numbers: "The things that computers work with are floats, and NaN is a float, so in any relevant sense it is a number". D'Aprano disagreed strongly:

If something doesn't quack like a duck, doesn't swim like a duck, and doesn't walk like a duck, and is explicitly called Not A Duck, would we insist that it's actually a duck?

Despite insisting that median() is within its rights to return strange results in this situation, D'Aprano also acknowledged that silently returning those results might not be the best course of action. All that is needed is to decide what that best course would actually be.

What should median() do?

Much of the discussion was thus understandably focused on what the correct behavior for median() (and for other functions in the statistics module that have similar issues) should be. There were two core concerns here: what the proper behavior is, and the performance cost of implementing a different solution.

The cost is the easier concern to get a handle on. median() currently works by sorting the supplied values into an ordered list, then taking the value in the middle. Dealing with NaN values would require adding extra tests to the sort, slowing it down. D'Aprano figured that this change would slow median() down by a factor of four to eight. Christopher Barker did some tests and measured slowdowns ranging from two to ten times, depending on how the checking for NaN values is done. (And, for the curious, testing for NaN is not as easy as one might hope; see the example below.) Some of this slowdown could be addressed by switching to a smarter way of calculating the median.
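
Python offers more than one way to detect a NaN, each with its own traps; note that the statistics module also accepts Decimal values, which bring NaNs of their own:

    import math
    from decimal import Decimal

    nan = float('nan')
    print(math.isnan(nan))        # True
    print(nan != nan)             # True: the classic self-inequality test

    # Decimal values have their own NaNs, with their own test:
    dnan = Decimal('NaN')
    print(dnan.is_nan())          # True

    # Containment tests check identity before equality, so naive scans mislead:
    print(nan in [nan])           # True (same object)
    print(float('nan') in [nan])  # False (a different NaN object)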

Once those little problems have been dealt with, there is still the issue of what median() should actually do. A number of alternatives surfaced during the discussion:

  • Keep the current behavior, which works well enough for most users (who may never encounter a NaN value in their work) and doesn't hurt performance. Users who want specialized NaN handling could use a library like NumPy instead.
  • Simply ignore NaNs.
  • Raise an exception if the input data contains NaNs. This applies to quiet NaNs; if a signaling NaN is encountered, D'Aprano said, an exception should always be raised.
  • Return NaN if the input data contains NaNs.
  • Move the NaN values to one end of the list.
  • Sort the list using total order (which would move NaNs to both ends of the list, depending on their sign bits).
  • Probably one or two others as well.

D'Aprano quickly came to the conclusion that the community is unlikely to come to a consensus on what the proper handling for NaN values is. So he has proposed adding a keyword argument (called nan_policy) to median() that would allow the user to pick from a subset of the options above. The default behavior would be to raise an exception.
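
That code has yet to be written; here is a rough sketch of how the proposed policies might behave (the wrapper name and policy strings are invented for illustration, not the eventual API):

    import math
    import statistics

    def median_with_policy(data, nan_policy='raise'):
        # Hypothetical illustration of the proposed nan_policy behaviors.
        def is_nan(x):
            return isinstance(x, float) and math.isnan(x)
        values = list(data)
        if any(is_nan(x) for x in values):
            if nan_policy == 'raise':           # the proposed default
                raise ValueError('input contains NaN values')
            elif nan_policy == 'return-nan':
                return float('nan')
            elif nan_policy == 'ignore':
                values = [x for x in values if not is_nan(x)]
            else:
                raise ValueError(f'unknown nan_policy: {nan_policy!r}')
        return statistics.median(values)

    print(median_with_policy([1, 2, 3, 4, float('nan')], nan_policy='ignore'))  # 2.5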

This proposal appears to have brought the discussion to an orderly close; nobody seems to strongly object to handling the problem in this way. All that is left is to actually write the code to implement this behavior. After that, one would hope, not-a-number handling in the statistics module would be not-a-problem.

Comments (33 posted)

Toward a conclusion for Python dictionary "addition"

By Jake Edge
January 8, 2020

One of Guido van Rossum's last items of business as he finished his term on the inaugural steering council for Python was to review the Python Enhancement Proposal (PEP) that proposes new update and union operators for dictionaries. He would still seem to be in favor of the idea, but the decision will be up to the newly elected steering council and whoever it chooses as the PEP-deciding delegate (i.e. BDFL-Delegate). Van Rossum provided some feedback on the PEP and, inevitably, the question of how to spell the operator returned, but the path toward getting a decision on it is now pretty clear.

PEP 584 ("Add + and += operators to the built-in dict class") has been in the works since last March, but the idea has been around for a lot longer than that. LWN covered a discussion back in March 2015, though it had come up well before that as well. It is a seemingly "obvious" language enhancement, at least for proponents, that would simply create an operator for dictionaries to either update them in-place or to easily create a combination of two dictionaries:

>>> d = {'spam': 1, 'eggs': 2, 'cheese': 3}
>>> e = {'cheese': 'cheddar', 'aardvark': 'Ethel'}
>>> d + e
{'spam': 1, 'eggs': 2, 'cheese': 'cheddar', 'aardvark': 'Ethel'}
>>> e + d
{'cheese': 3, 'aardvark': 'Ethel', 'spam': 1, 'eggs': 2}

>>> d += e
>>> d
{'spam': 1, 'eggs': 2, 'cheese': 'cheddar', 'aardvark': 'Ethel'}

As can be seen, the operation would not be commutative as the value for any shared keys will come from the second (i.e. right-hand) operand, which makes it order-dependent. There are some who do not see the operators as desirable features for the language, but the most vigorous discussion over the last year or so has been about its spelling, with a strong preference for using | and |= among participants in those threads—including Van Rossum.
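
Part of the opposition's argument is that current Python already provides spellings for both operations; for comparison, using the dictionaries from the example above:

    d = {'spam': 1, 'eggs': 2, 'cheese': 3}
    e = {'cheese': 'cheddar', 'aardvark': 'Ethel'}

    # Creating a merged copy today: the right-hand operand wins for shared keys.
    merged = {**d, **e}
    print(merged)   # {'spam': 1, 'eggs': 2, 'cheese': 'cheddar', 'aardvark': 'Ethel'}

    # The in-place equivalent of the proposed d += e (or d |= e):
    copy = dict(d)
    copy.update(e)
    print(copy == merged)   # True

    # The | spelling would mirror set union, where the operator already exists:
    print({'spam', 'eggs'} | {'eggs', 'cheese'})   # {'spam', 'eggs', 'cheese'} (in some order)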

At the beginning of December, Van Rossum posted his review of the PEP to the python-ideas mailing list. He encouraged the authors (Brandt Bucher and Steven D'Aprano) to request a BDFL-Delegate for the PEP from the steering council, noting that he would not be on the council after the end of the year. D'Aprano indicated that he would be doing so. Apparently that happened, because, tucked away in the notes from the November and December steering council meetings was a mention that a BDFL-Delegate had been assigned—none other than Van Rossum himself.

In his review, he comes down strongly in favor of | and |= and had some other minor suggestions. He said: "All in all I would recommend to the SC to go forward with this proposal, targeting Python 3.9, assuming the operators are changed to | and |=, and the PEP is brought more in line with the PEP editing guidelines from PEP 1 and PEP 12." Given that, and that he is the decision maker for the PEP, it would seem to be smooth sailing for its acceptance.

That did not stop some from voicing objections to the PEP as a whole or the spelling of the operator in particular, of course, though the discussion was collegial as is so often the case in the Python world. Van Rossum thought that | might be harder for newcomers, but was not particularly concerned about that: "I don't think beginners should be taught these operators as a major tool in their toolbox". But Ryan Gonzalez thought that beginners might actually find that spelling easier because of its congruence to the Python set union operator.

Serhiy Storchaka is not a fan of the PEP in general, but believes that | is a better choice than +. He thinks there are already other ways to accomplish the same things that the operators would provide and that their use may be error-prone. He also had a performance concern, but Brett Cannon pointed out that it might only exist for CPython; PyPy and other Pythons might not have the same performance characteristics. Furthermore:

To me this PEP is entirely a question of whether the operators will increase developer productivity and not some way to do dict merging faster, and so performance questions should stay out of it unless it's somehow slower than dict.update().

Marco Sulla made the argument that using | is illogical because sets also support a set difference operation using -, while the PEP does not propose that operator for dictionaries (though it should be noted that a previous incarnation of the PEP did have "subtraction", but it was not well-received and was dropped). Andrew Barnert felt that "illogical" was not the right reason to choose one spelling over the other:

It’s logical to spell the union of two dicts the same way you spell the union of two sets; it’s also logical to spell the concatenation of two dicts the same way you spell the concatenation of two lists. The question is which one is a more useful analogy, or which one is less potentially confusing, not which one you can come up with a convoluted way of declaring illogical if you really try.

Sulla continued by saying that since list and string subtraction make no sense, that it is an unfair comparison. But Chris Angelico pointed out that's not necessarily the case either, since that operation does make sense in some contexts. While he doesn't necessarily think Python should add support for those use cases, "I do ask people to be a little more respectful to the notion that these operations are meaningful". What followed was a bit of a digression into mathematics and the meaning of various operations, much of which had little to do with Python.

There were two offshoots of the discussion. "Random832" suggested a generic way to add an operator specific to a module: all code in the module could use the operator but it would not bubble out from there. Cannon thought it could be quite confusing to programmers who did not realize the operator was redefined. "And debugging this wouldn't be fun either." Storchaka brought up some performance concerns, which could perhaps be worked around, but the general reaction to Random832's idea was negative.

Jonathan Fine thought that since the proposed | operator gives preference to the right operand ("merge-right" in his terminology), there was a need for a merge-left operation. He called it gapfill(), which was a puzzling name choice to some; it would only add values for keys in the right-hand operand that were not present in the left-hand one. While the use case of, say, filling in defaults to a dictionary that held command-line options is reasonable, there are a number of other ways to do that (as is also true for |, however). Fine did not propose that an operator be added but did note that some other Python operations could be seen to give preference to the left-hand operand, which might make the merge-right | operator confusing. There was not a lot of reaction to the idea, but it doesn't look to be going anywhere for now.
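
For the curious, merge-left is already expressible in current Python as well; two of the existing spellings (the variable names here are invented):

    options = {'volume': 11}
    defaults = {'volume': 5, 'balance': 0}

    # Merge-left with unpacking: keys already in options are preserved.
    filled = {**defaults, **options}
    print(filled)    # {'volume': 11, 'balance': 0}

    # The same thing done in place with setdefault():
    for key, value in defaults.items():
        options.setdefault(key, value)
    print(options)   # {'volume': 11, 'balance': 0}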

D'Aprano plans to update the PEP based on the feedback from Van Rossum and others. It presumably also needs to run the gauntlet of the python-dev mailing list before Van Rossum can decide its fate. There is still plenty of time for all of that before the Python 3.9 release, even though the project adopted a 12-month release cycle a few months back. Python 3.9 is due in early October; it's a pretty good bet that | and |= for dictionaries will make the cut. Even if they do not, though, one of the goals was to put the subject to rest once and for all; a rejected PEP would serve as a place to point those who ask about dictionary "addition" in the future.

Comments (7 posted)

A medley of performance-related BPF patches

By Jonathan Corbet
January 2, 2020
One of the advantages of the in-kernel BPF virtual machine is that it is fast. BPF programs are just-in-time compiled and run directly by the CPU, so there is no interpreter overhead. For many of the intended use cases, though, "fast" can never be quite fast enough. It is thus unsurprising that there are currently a number of patch sets under development that are intended to speed up one aspect or another of using BPF in the system. A few, in particular, seem about ready to hit the mainline.

The BPF dispatcher

BPF programs cannot run until they are "attached" to a specific call point. Tracing programs are attached to tracepoints, while networking express data path (XDP) programs are attached to a specific network device. In general, more than one program can be attached at any given location. When it comes time to run attached programs, the kernel will work through a linked list and invoke each program in turn.

Actually executing a compiled BPF program is done with an indirect jump. Such jumps were never entirely fast, but in the age of speculative-execution vulnerabilities those jumps have been turned into retpolines — a construct that defeats a number of Spectre attacks, but which also turns indirect jumps into something that is far slower than they were before. For cases where BPF programs are invoked frequently, such as for every incoming network packet, that extra overhead hurts.

There have been a number of efforts aimed at reducing the retpoline performance penalty in various parts of the kernel. The BPF dispatcher patch set is Björn Töpel's approach to the problem for BPF programs, and for the XDP use case in particular. It maintains a machine-code trampoline containing a direct jump instruction for every attached BPF program; this trampoline must be regenerated whenever a program is added to or removed from the list. When the time comes to call a BPF program, the trampoline is invoked with the address of the program of interest; it then executes a binary search to find the direct-jump instruction corresponding to that program. The jump is then executed, causing the desired program to be run.
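
The real trampoline is generated machine code, but its lookup logic can be modeled in a few lines of Python; this is a conceptual sketch only, with invented names:

    import bisect

    addresses = []    # sorted "addresses" of attached programs
    programs = {}     # address -> program; both rebuilt on attach/detach

    def attach(addr, prog):
        programs[addr] = prog
        bisect.insort(addresses, addr)

    def dispatch(addr, packet):
        # Binary search standing in for the trampoline's compare/branch tree,
        # ending in a direct jump to the matching program.
        i = bisect.bisect_left(addresses, addr)
        if i < len(addresses) and addresses[i] == addr:
            return programs[addr](packet)
        raise LookupError('program not attached')

    attach(0x1000, lambda pkt: ('XDP_PASS', pkt))
    print(dispatch(0x1000, b'packet'))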

That may seem like a lot of overhead to replace an indirect call, but it is still faster than using a retpoline — by a factor of about three, according to the performance result posted with the patch series. In fact, indirect jumps are so expensive that the dispatcher is competitive even in the absence of retpolines, so it is enabled whether retpolines are in use or not. This code is in its fifth revision and seems likely to make its way into the mainline before too long.

Memory-mappable maps

BPF maps are the way that BPF programs store persistent data; they come in a number of varieties but are essentially associative arrays that can be shared with other BPF programs or with user space. Access to maps from within BPF programs is done by way of special helper functions; since everything happens within the kernel, this access is relatively fast. Getting at a BPF map from user space, instead, must be done with the bpf() system call, which provides operations like BPF_MAP_LOOKUP_ELEM and BPF_MAP_UPDATE_ELEM.

If one simply needs to read out the results at the end of a tracing run, calling bpf() is unlikely to be a problem. In the case of user-space programs that run for a long time and access a lot of data in BPF maps, though, the system-call overhead may well prove to be too much. Much of the time, the key to good performance is avoiding system calls as much as possible; making a call into the system for each item of data exchanged with a BPF program runs counter to that principle. Andrii Nakryiko has a partial solution to this problem in the form of memory-mappable BPF maps. It allows a user-space process to map a BPF array map (one that is indexed with simple integers) directly into its address space; thereafter, data in BPF maps can be accessed directly, with no need for system calls at all.

There are some limitations in the current patch set; only array maps can be mapped in this way, and maps containing spinlocks cannot be mapped (which makes sense, since user space will be unable to participate in the locking protocol anyway). Maps must be created with the BPF_F_MMAPABLE attribute (which causes them to be laid out differently in memory) to be mappable. This patch set has been applied to the BPF repository and can be expected to show up in the 5.6 kernel.
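
As a rough sketch of what user space gains, the create-and-map sequence can be driven from Python with ctypes; the constants and the x86-64 syscall number below are assumptions taken from 5.6-era UAPI headers, with error handling kept minimal:

    import ctypes
    import mmap

    SYS_bpf = 321              # assumption: x86-64 syscall number for bpf()
    BPF_MAP_CREATE = 0
    BPF_MAP_TYPE_ARRAY = 2
    BPF_F_MMAPABLE = 0x400     # assumption: flag value from 5.6-era headers

    class MapCreateAttr(ctypes.Structure):
        # The leading fields of union bpf_attr used by BPF_MAP_CREATE.
        _fields_ = [('map_type',    ctypes.c_uint32),
                    ('key_size',    ctypes.c_uint32),
                    ('value_size',  ctypes.c_uint32),
                    ('max_entries', ctypes.c_uint32),
                    ('map_flags',   ctypes.c_uint32)]

    libc = ctypes.CDLL(None, use_errno=True)
    attr = MapCreateAttr(BPF_MAP_TYPE_ARRAY, 4, 8, 256, BPF_F_MMAPABLE)
    fd = libc.syscall(SYS_bpf, BPF_MAP_CREATE, ctypes.byref(attr),
                      ctypes.sizeof(attr))
    if fd < 0:
        raise OSError(ctypes.get_errno(), 'bpf(BPF_MAP_CREATE) failed')

    # Once mapped, element zero can be read without further system calls:
    buf = mmap.mmap(fd, 256 * 8)              # max_entries * value_size
    value0 = int.from_bytes(buf[:8], 'little')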

Batched map operations

Memory-mapping BPF maps is one way of avoiding the bpf() system call but, as seen above, it has some limitations. A different approach to reducing system calls can be seen in the batched operations patch set from Brian Vazquez. System calls are still required to access BPF map elements, but it becomes possible to access multiple elements with a single system call.

In particular, the patch set introduces four new map-related commands for the bpf() system call: BPF_MAP_LOOKUP_BATCH, BPF_MAP_LOOKUP_AND_DELETE_BATCH, BPF_MAP_UPDATE_BATCH, and BPF_MAP_DELETE_BATCH. These commands require the following structure to be passed in the bpf() call:

    struct { /* struct used by BPF_MAP_*_BATCH commands */
        __aligned_u64   in_batch;
        __aligned_u64   out_batch;
        __aligned_u64   keys;
        __aligned_u64   values;
        __u32           count;
        __u32           map_fd;
        __u64           elem_flags;
        __u64           flags;
    } batch;

For lookup operations (which, despite their name, are intended to read through a map's entries rather than look up specific entries), keys points to an array able to hold count keys; values is an array for count values. The kernel will pass through the map, storing the keys and associated values for a maximum of that many elements, and setting count to the number actually returned. Setting in_batch to NULL starts the lookup at the beginning of the map; the out_batch value can be used for subsequent calls to pick up where the previous call left off, thus allowing traversal of the entire map.

Update and delete operations expect keys to contain the keys for the map elements to be affected. Updates also use values for the new values to be associated with keys.
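
From user space, the structure maps naturally onto a ctypes mirror; the following sketch of a batched lookup leaves the command number as a parameter, since the final values were not yet set when the patches were posted:

    import ctypes

    class BatchAttr(ctypes.Structure):
        # Mirror of the batch struct shown above; __aligned_u64 pointer
        # fields are passed as integer addresses.
        _fields_ = [('in_batch',   ctypes.c_uint64),
                    ('out_batch',  ctypes.c_uint64),
                    ('keys',       ctypes.c_uint64),
                    ('values',     ctypes.c_uint64),
                    ('count',      ctypes.c_uint32),
                    ('map_fd',     ctypes.c_uint32),
                    ('elem_flags', ctypes.c_uint64),
                    ('flags',      ctypes.c_uint64)]

    def lookup_batch(libc, sys_bpf, cmd, map_fd, key_size, value_size, count):
        keys = (ctypes.c_ubyte * (key_size * count))()
        values = (ctypes.c_ubyte * (value_size * count))()
        token = (ctypes.c_ubyte * key_size)()    # resume point for the next call
        attr = BatchAttr(in_batch=0,             # NULL: start at the beginning
                         out_batch=ctypes.addressof(token),
                         keys=ctypes.addressof(keys),
                         values=ctypes.addressof(values),
                         count=count, map_fd=map_fd)
        rc = libc.syscall(sys_bpf, cmd, ctypes.byref(attr), ctypes.sizeof(attr))
        # On return, attr.count holds the number of elements actually copied.
        return rc, attr.count, bytes(keys), bytes(values)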

The batch operations do not eliminate system calls for access to map elements, but they can reduce those calls considerably; one call can affect 100 (or more) elements at a time rather than just one element. The batch operations do have some significant advantages over memory-mapping; for example, they can be used for any map type, not just array maps. It is also possible to perform operations (like deletion) that cannot be done with memory-mapping.

There is thus a place for both approaches. This patch set is in its third revision, having picked up a number of reviews and acks along the way, so it, too, seems likely to be merged in the near future.

Comments (6 posted)

Removing the Linux /dev/random blocking pool

By Jake Edge
January 7, 2020

The random-number generation facilities in the kernel have been reworked somewhat over the past few months—but problems in that subsystem have been addressed over an even longer time frame. The most recent changes were made to stop the getrandom() system call from blocking for long periods of time at system boot, but the underlying cause was the behavior of the blocking random pool. A recent patch set would remove that pool; it appears to be headed for the mainline kernel.

Andy Lutomirski posted version 3 of the patch set toward the end of December. It makes "two major semantic changes to Linux's random APIs". It adds a new GRND_INSECURE flag to the getrandom() system call (though Lutomirski refers to it as getentropy(), which is implemented in glibc using getrandom() with fixed flags); that flag would cause the call to always return the amount of data requested, but with no guarantee that the data is random. The kernel would just make its best effort to give the best random data it has at that point in time. "Calling it 'INSECURE' is probably the best we can do to discourage using this API for things that need security."

The patches also remove the blocking pool. The kernel currently maintains two pools of random data, one that corresponds to /dev/random and another for /dev/urandom, as described in this 2015 article. The blocking pool is the one for /dev/random; reads to that device will block (thus the name) until "enough" entropy has been gathered from the system to satisfy the request. Further reads from that file will also block if there is insufficient entropy in the pool.

Removing the blocking pool means that reads from /dev/random behave like getrandom() with a flags value of zero (and turns the GRND_RANDOM flag into a noop). Once the cryptographic random-number generator (CRNG) has been initialized, reads from /dev/random and calls to getrandom(..., 0) will not block and will return the requested amount of random data. Lutomirski said:

I believe that Linux's blocking pool has outlived its usefulness. Linux's CRNG generates output that is good enough to use even for key generation. The blocking pool is not stronger in any material way, and keeping it around requires a lot of infrastructure of dubious value.

The changes were made with an eye toward ensuring that existing programs are not really affected; in fact, the problems with long waits for things like generating GnuPG keys will get better.

This series should not break any existing programs. /dev/urandom is unchanged. /dev/random will still block just after booting, but it will block less than it used to. getentropy() with existing flags will return output that is, for practical purposes, just as strong as before.
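
Those semantics can be observed from Python, whose os module wraps the system call directly (the GRND_INSECURE value below is an assumption taken from the later merged code; the os module did not expose the flag):

    import os

    # With flags=0, this blocks until the CRNG is initialized, then always
    # succeeds: the behavior /dev/random reads would adopt under this patch set.
    key = os.getrandom(32)
    print(len(key), key.hex())

    # GRND_NONBLOCK asks for an error instead of blocking early in boot:
    try:
        os.getrandom(32, os.GRND_NONBLOCK)
    except BlockingIOError:
        print('CRNG not yet initialized')

    # The proposed flag: never block, never guarantee randomness.
    GRND_INSECURE = 0x0004   # assumption: value from the merged kernel UAPI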

Lutomirski noted that there is still the open question of whether the kernel should provide so-called "true random numbers", which is, to a certain extent, what the blocking pool was meant to do. He can only see one reason to do so: "compliance with government standards". He suggested that if the kernel were to provide that, it should be done through an entirely different interface—or be punted to user space by providing a way for it to extract raw event samples that could be used to create such a blocking pool.

Stephan Müller suggested that his Linux random-number generator (LRNG) patch set (now up to version 26) might be a way to provide true random numbers for applications that need them. The LRNG is "fully compliant to SP800-90B requirements", which makes it a solution to the governmental-standards problem. Matthew Garrett objected to the term "true random data", noting that the devices being sampled could, in principle, be modeled accurately enough to make them predictable: "We're not sampling quantum events here." Müller said that the term comes from the German AIS 31 standard to describe a random-number generator that only produces output "at an equal rate as the underlying noise source produces entropy".

Beyond the terminology, though, having a blocking pool as is proposed by the LRNG patches will just lead to various problems, at least if it is available without privilege, Lutomirski said:

This doesn’t solve the problem. If two different users run stupid programs like gnupg, they will starve each other.

As I see it, there are two major problems with /dev/random right now: it’s prone to DoS (i.e. starvation, malicious or otherwise), and, because no privilege is required, it’s prone to misuse. Gnupg is misuse, full stop. If we add a new unprivileged interface, gnupg and similar programs will use it, and we lose all over again.

Müller noted that the addition of getrandom() will now allow GnuPG to use that interface since it will provide the needed guarantee that the pool has been initialized. From discussions with GnuPG maintainer Werner Koch, Müller believes that guarantee is the only reason GnuPG currently reads directly from /dev/random. But if there is an unprivileged interface that is subject to denial of service (like /dev/random today), it will be misused by some applications, Lutomirski asserted.

Theodore Y. Ts'o, who is the maintainer of the Linux random-number subsystem, appears to have changed his mind along the way about the need for a blocking pool. He said that removing that pool would effectively get rid of the idea that Linux has a true random-number generator (TRNG), which "is not insane; this is what the *BSD's have always done". He, too, is concerned that providing a TRNG mechanism will just serve as an attractant for application developers. He also thinks that it is not really possible to guarantee a TRNG in the kernel, given all of the different types of hardware supported by Linux. Even making the facility only available to root will not solve the problem:

Application programmers would give instructions requiring that their application be installed as root to be more secure, "because that way you can get access the _really_ good random numbers".

Müller asked if Ts'o was giving up on the blocking pool implementation that he had added long ago. Ts'o agreed that he was; he is planning to take the patches from Lutomirski and is pretty strongly opposed to adding a blocking interface back into the kernel.

The kernel can't offer up any guarantees about whether or not the noise source has been appropriately characterized. All say, a GPG or OpenSSL developer can do is get the vague sense that TRUERANDOM is "better" and of course, they want the best security, so of *course* they are going to try to use it. At which point it will block, and when some other clever user (maybe a distro release engineer) puts it into an init script, then systems will stop working and users will complain to Linus.

For cryptographers and others who really need a TRNG, Ts'o is also in favor of providing them a way to collect their own entropy in user space to use as they see fit. Entropy collection is not something that the kernel can reliably do on all of the different hardware that it supports, nor can it estimate the amount of entropy provided by the different sources, he said.

The kernel shouldn't be mixing various noise sources together, and it certainly shouldn't be trying to claim that it knows how many bits of entropy that it gets when [it] is trying to play some jitter entropy game on a stupid-simple CPU architecture for IOT/Embedded user cases where everything is synchronized off of a single master oscillator, and there is no CPU instruction reordering or register renaming, etc., etc.

You can talk about providing tools that try to make these estimations --- but these sorts of things would have to be done on each user's hardware, and for most distro users, it's just not practical.

So if it's just for cryptographers, then let it all be done in userspace, and let's not make it easy for GPG, OpenSSL, etc., to all say, "We want TrueRandom(tm); we won't settle for less". We can talk about how do we provide the interfaces so that those cryptographers can get the information they need so they can get access to the raw noise sources, separated out and named, and with possibly some way that the noise source can authenticate itself to the Cryptographer's userspace library/application.

There was a bit of discussion about how that interface might look; there may be security implications for some of the events, for example. Ts'o noted that the keyboard scan codes (i.e. the keys pressed) are mixed into the pool as part of the entropy collection. "Exposing this to userspace, even if it is via a privileged system call, would be... unwise." It does seem possible that other event timings could provide some kind of side-channel information leak as well.

So it would seem that a longtime feature of the Linux random-number subsystem is on its way out. Given the changes that the random-number subsystem has undergone recently, the blocking pool was effectively only causing denial-of-service problems when it was used; there are now better ways to get the best random numbers that the kernel can provide. If a TRNG is still desired for Linux, that lack will need to be addressed in the future, but likely not within the kernel itself.

Comments (23 posted)

The trouble with IPv6 extension headers

By Jonathan Corbet
January 7, 2020
It has taken longer than anybody might have liked, but the IPv6 protocol is slowly displacing IPv4 across the Internet. A quick, highly scientific "grep the access logs" test shows that about 16% of the traffic to LWN.net is currently using IPv6, and many large corporate networks are using IPv6 exclusively internally. This version of the IP protocol was designed to be more flexible than IPv4 in a number of ways; the "extension header" mechanism is one way in which that flexibility is achieved. A proposal to formalize extension-header processing in the kernel's networking stack has led to some concerns, though, about how this feature will be used and what role Linux should play in its development.

In both versions of the IP protocol, the header of each packet contains a collection of information about how the packet is to be handled; at a minimum, it contains source and destination addresses and a higher-level protocol number. In IPv4, the contents of the header are rigidly specified; it is difficult to add new types of information to the header. When IPv6 was designed, extension headers were added as a way to (relatively) easily add new information in the future.

A few extension header types are defined in RFC8200 (which describes IPv6). Two of particular interest are the "Hop-by-Hop" and "Destination" headers; the former is meant to be acted upon by every system that handles the packet, while the latter is only for the destination node's attention. These headers may contain one or more options, each encoded in a type-length-value (TLV) format. RFC8200 only defines a couple of options that insert padding into the header, but there is interest in adding a number of others.

For example, In-situ Operations, Administration, and Maintenance options are meant to allow providers to collect telemetry information on packets passing through their networks. The Path MTU mechanism uses a Hop-by-Hop option to discover the maximum packet size a path can handle. Firewall and Service Tickets (FAST) are a Hop-by-Hop option that documents a packet's right to traverse a network or pass through a firewall. The Segment Routing option allows a packet to contain the path it should take through a network. And so on.
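
The TLV encoding itself is simple enough to walk in a few lines; here is a sketch of parsing the options area of a Hop-by-Hop header, with an invented option type for illustration:

    def parse_hbh_options(payload):
        # 'payload' is the options area of a Hop-by-Hop header: everything
        # after the Next Header and Hdr Ext Len bytes (per RFC 8200).
        options = []
        i = 0
        while i < len(payload):
            opt_type = payload[i]
            if opt_type == 0:           # Pad1: a single padding byte, no length
                i += 1
                continue
            length = payload[i + 1]
            value = payload[i + 2:i + 2 + length]
            if opt_type != 1:           # type 1 is PadN, also just padding
                options.append((opt_type, value))
            i += 2 + length
        return options

    # Two bytes of PadN padding, then a hypothetical option of type 0x3e:
    print(parse_hbh_options(bytes([1, 0, 0x3e, 2, 0xab, 0xcd])))
    # -> [(62, b'\xab\xcd')]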

Tom Herbert has been working on a patch series making a number of changes to how IPv6 extension headers are handled in Linux. It adds infrastructure to allow kernel modules to register their support for specific Hop-by-Hop and Destination options, and makes the creation and parsing of the associated TLVs easy. Specific options may be added to packets or connections by unprivileged users, while others are restricted to privileged users only; either way, the code tries to ensure that the options are well-formed and ordered correctly.

One of the most controversial features is not actually a part of this patch set; Herbert lists it as work for the future. This feature would perform the insertion of new extension headers into packets passing through a system. Header insertion is a violation of RFC8200, but that naturally doesn't stop the purveyors of routers and other middleboxes from doing it anyway. That creates all of the usual problems, including packet transmission failing for reasons that are entirely opaque to a distant sender, proprietary headers leaking onto the public Internet, and more.

Networking maintainer David Miller was less than pleased by the idea of adding header-insertion capabilities to the Linux kernel:

And honestly, this stuff sounds so easy to misuse by governments and other entities. It could also be used to allow ISPs to limit users in very undesirable and unfair ways. And honestly, surveillance and limiting are the most likely uses for such a facility. I can't see it legitimately being promoted as a "security" feature, really.

It is not hard to imagine how injected headers could be used. They could mark "slow lane" packets, for example, or packets that should be forwarded to that mysterious locked room in an Internet service provider's basement. These are not capabilities that Linux developers are generally enthusiastic about supporting; it is thus not surprising that Miller made it clear that he is in no hurry to merge this code into the networking subsystem.

Herbert acknowledged Miller's concerns, but noted that router vendors will engage in abuse regardless of whether Linux supports a specific feature. None of this behavior requires the use of extension headers at all. Adding better extension header support to the kernel, though, might be a way to minimize the scope of these abuses in the future:

This is why Linux is so critical to networking, it is the only open forum where real scrutiny is applied to how protocols are implemented. If the alternatives are given free [rein] to lead then it's very likely we'll end up being stuck with what they do and probably have to follow their lead regardless of how miserable they make the protocols. We've already seen this in segment routing, their attempts to kill IP fragmentation, and all the other examples of protocol ossification that unnecessarily restrict what hosts are allowed to send in the network and hence reduce the utility and security we are able to offer the user.

One way in which Herbert hopes to improve the situation is via a new attribution option that would at least allow network managers to determine the source of an injected extension header that is causing problems. As things stand now, there is no way to know which system may be injecting problematic headers into packets as they pass through. More generally, he hopes that showing how to do things "right" will help to deter the worst hacks. Miller was skeptical about whether this could work; Herbert countered with protocols like QUIC, TLS, and TCP fast open as examples of how Linux developers have been able to steer protocols in a better direction in the past.

That is where the conversation stands as of this writing. How it is resolved matters, though. For all practical purposes, Linux is the reference implementation and the proving ground for the protocols that make up the public Internet. Adoption by Linux ensures that a feature will be available across the net; rejection can doom a feature in the long run. But rejection also abdicates the community's role in the development of new protocols, and Linux, too, can be routed around if the forces driving a feature are strong enough. Whether we want to resist header injection or to try to mitigate its worst abuses from the inside is a question that the networking community will need to find an answer to in the relatively near future.

Comments (4 posted)

Page editor: Jonathan Corbet

Brief items

Security

Security quotes of the week

About a month ago we presented honware at eCrime 2019; [it is] a new honeypot framework that enables the rapid construction of honeypots for a wide range of CPE and IoT devices. The framework automatically processes a standard firmware image (as is commonly provided for updates) and runs the system with a special pre-built Linux kernel without needing custom hardware. It then logs attacker traffic and records which of their actions led to a compromise.

We provide an extensive evaluation and show that our framework is scalable and significantly better than existing emulation strategies in emulating the devices’ firmware applications. We were able to successfully process close to 2000 firmware images across a dozen brands (TP-Link, Netgear, D-Link…) and run them as honeypots. Also, as we use the original firmware images, the honeypots are not susceptible to fingerprinting attacks based on protocol deviations or self-revealing properties [PDF].

Alexander Vetterl (paper [PDF])

In light of recently-published chosen-prefix attacks on SHA1 [1], I caution that it is no longer safe to use Epiphany, or any other WebKitGTK-based browser, or libsoup, or any applications based on libsoup, or any other applications using GLib's networking facilities, in combination with GnuTLS versions older than GnuTLS 3.6. GnuTLS versions prior to 3.6 will accept certificates that use SHA1 signatures. It is now both possible and economically-feasible to forge these signatures. Your secure connections can no longer be trusted to be secure when using these older versions of GnuTLS.
Michael Catanzaro

It's a long (32 pages) but interesting read. The only thing I have a bit of an issue with is the conclusion:
SHA-1 signatures now offers virtually no security in practice

It should really be "SHA-1 signatures where the attacker has two months time and tens of thousands of dollars (there are some cheaper options than $75k) to prepare a forgery offer no security in practice".

Even then, the demonstrated attack relies on the ability to stuff arbitrary garbage data into the signed message (in this case into a JPEG image after the End-of-Image marker), so add:

"... and the ability to stuff arbitrary attacker-chosen data into the signed message..."

to that.

Peter Gutmann on the SHA-1 collision paper

Comments (1 posted)

Kernel development

Kernel release status

The current development kernel is 5.5-rc5, released on January 5. Linus added a note to the release announcement: "One sad piece of news I got this past week was that Bruce Evans has passed away. Bruce wasn't really ever really much directly involved in Linux development - he was active on the BSD side - but he was the developer behind Minix/i386, which was what I used for the original Linux development in the very early days before Linux became self-hosting."

Stable updates: 5.4.8, 4.19.93, 4.14.162, 4.9.208, and 4.4.208 were all released on January 5.

Comments (none posted)

Quote of the week

So you might want to look into not the standard library implementation, but specific locking implementations for your particular needs. Which is admittedly very very annoying indeed. But don't write your own. Find somebody else that wrote one, and spent the decades actually tuning it and making it work.

Because you should never ever think that you're clever enough to write your own locking routines.. Because the likelihood is that you aren't (and by that "you" I very much include myself - we've tweaked all the in-kernel locking over decades, and gone through the simple test-and-set to ticket locks to cacheline-efficient queuing locks, and even people who know what they are doing tend to get it wrong several times).

Linus Torvalds

Comments (1 posted)

Distributions

Distributions quote of the week

In the end, enterprise software upgrades at the rate of whatever the majority of accounting systems need to make payroll and tax filings happen.
Stephen J Smoogen

Comments (none posted)

Development

Firefox 72.0

Firefox 72.0 has been released. In this version, Firefox’s Enhanced Tracking Protection now blocks fingerprinting scripts, and picture-in-picture video is available. See the release notes for the details of these features and other changes.

Comments (4 posted)

Ruby 2.7 released

The announcement of Ruby 2.7 slipped past us during the holiday week; it was released on December 25. It is the most recent release of the Ruby programming language and was more than a year in development. There are quite a few new features, including experimental pattern matching for case statements (more information can be found in these slides), a new compacting garbage collector for the heap, support for separating positional and keyword arguments, and plenty more.

Comments (19 posted)

Development quotes of the week

Why do I care so much about unexpected stacktraces? I do because mat2 is dealing with untrusted fileformats: users will throw all kind of random malformed files at it, and I'm expecting meaningful exceptions that I can catch should something go wrong, not eldrich-like unpredictable monstrosities crawling from the depth of Python's core in a fireworks of traces scaring my beloved users away.
Julien Voisin (Thanks to Paul Wise)

And hidden therein is my actual point: complexity. There has long been a trend in computing of endlessly piling on the abstractions, with no regard for the consequences. The web is an ever growing mess of complexity, with larger and larger blobs of inscrutable JavaScript being shoved down pipes with no regard for the pipe’s size or the bridge toll charged by the end-user’s telecom. Electron apps are so far removed from hardware that their jarring non-native UIs can take seconds to respond and eat up the better part of your RAM to merely show a text editor or chat application. [...]

I use syscalls as an approximation of this complexity. Even for one of the simplest possible programs, there is a huge amount of abstraction and complexity that comes with many approaches to its implementation. If I just print “hello world” in Python, users are going to bring along almost a million lines of code to run it, the fraction of which isn’t dead code is basically a rounding error. This isn’t always a bad thing, but it often is and no one is thinking about it.

Drew DeVault

Comments (4 posted)

Miscellaneous

The Schism at the Heart of the Open-Source Movement (The Atlantic)

It is not all that often that the mainstream press looks at issues in the open-source world, but this article from The Atlantic does just that; it looks at the controversy surrounding GitHub renewing its contract with the US Immigration and Customs Enforcement (ICE) agency and the concerns some have had with their code being used by ICE. "So when news of GitHub’s contract with ICE emerged, its employees weren’t the only ones outraged. Because of the transitive nature of open source, volunteer developers—who host code on the site to share with others—may have unwittingly contributed to the code GitHub furnished for ICE, the agency responsible for enforcing immigration policy. Some were troubled by the idea that their code might in some way be used to help agents detain and deport undocumented migrants. But their outrage—and the backlash to it—reveals existential questions about the very nature of open source."

Comments (52 posted)

Ingebrigtsen: Whatever Happened To news.gmane.org?

Lars Ingebrigtsen provides details on the current status of the Gmane archive server and asks for feedback on whether it is still useful. "Over the past few years, people have asked me what happened to Gmane, and I’ve mostly clasped my hands over my ears and gone 'la la la can’t hear you', because there’s nothing about the story I’m now finally going to tell that I don’t find highly embarrassing. I had hoped I could just continue that way until I die, but perhaps it would be more constructive to actually tell people what’s going on instead of doing an ostrich impression." (Thanks to Giovanni Gherdovich).

Comments (17 posted)

Page editor: Jake Edge

Announcements

Newsletters

Distributions and system administration

Development

Meeting minutes

Miscellaneous

Calls for Presentations

CFP Deadlines: January 9, 2020 to March 9, 2020

The following listing of CFP deadlines is taken from the LWN.net CFP Calendar.

Deadline | Event Dates | Event | Location
January 13 | May 18 - May 20 | [POSTPONED] Percona Live Open Source Database Conference 2020 | Austin, TX, USA
January 15 | August 13 - August 21 | Netdev 0x14 | Virtual
January 19 | May 27 - May 28 | [ONLINE] PGCon 2020 | Ottawa, Canada
January 31 | March 23 - March 25 | Supercomputing Frontiers Europe | Warsaw, Poland
January 31 | June 17 - June 18 | [ONLINE] Open Source Data Center Conference | Berlin, Germany
January 31 | April 16 - April 17 | [CANCELED] DevOps Days Toronto 2020 | Toronto, Canada
February 9 | May 11 - May 13 | [RESCHEDULED] Linux Audio Conference | Bordeaux, France
February 9 | March 29 - April 1 | [POSTPONED] foss-north | Göteborg, Sweden
February 10 | May 11 - May 13 | [ONLINE] Power Management and Scheduling in the Linux Kernel | Pisa, Italy
February 11 | July 6 - July 12 | [VIRTUAL] SciPy 2020 | Austin, TX, USA
February 15 | June 17 - June 18 | [ONLINE] stackconf 2020 | Berlin, Germany
February 15 | March 25 - March 28 | [POSTPONED] MiniDebConf Maceió 2020 | Maceió, Brazil
February 16 | June 29 - July 2 | [VIRTUAL] Open Source Summit/Embedded Linux Conference North America | Austin, TX, USA
February 28 | June 23 - June 25 | [CANCELED] IcingaConf 2020 | Amsterdam, Netherlands
February 28 | May 26 - May 28 | [VIRTUAL] sambaXP 2020 | Berlin, Germany
February 28 | June 23 - June 25 | [ONLINE] Postgres Vision 2020 | Boston, MA, USA
March 6 | July 6 - July 9 | [Virtual] XenProject Developer and Design Summit 2020 | (online)
March 8 | June 25 - June 26 | [CANCELED] Postgres Ibiza 2020 | Ibiza, Spain

If the CFP deadline for your event does not appear here, please tell us about it.

Upcoming Events

Events: January 9, 2020 to March 9, 2020

The following event listing is taken from the LWN.net Calendar.

Date(s) | Event | Location
January 13 | LCA Creative Arts Miniconf | Gold Coast, Australia
January 13 | Open ISA (RISC-V, OpenPOWER, etc) Miniconf at Linux.conf.au | Gold Coast, Australia
January 13 - January 17 | linux.conf.au | Gold Coast, Australia
January 14 | lca Kernel Miniconf | Gold Coast, Australia
January 24 - January 26 | devconf.cz | Brno, Czech Republic
February 1 - February 2 | FOSDEM 2020 | Brussels, Belgium
February 3 | Copyleft Conference | Brussels, Belgium
February 18 - February 20 | PyCon Namibia | Windhoek, Namibia
March 3 | Open Source 101 | Columbia, SC, USA
March 5 - March 6 | Incontro DevOps Italia | Bologna, Italy
March 5 - March 8 | Southern California Linux Expo | Pasadena, CA, USA

If your event does not appear here, please tell us about it.

Security updates

Alert summary January 2, 2020 to January 8, 2020

Dist. ID Release Package Date
Arch Linux ASA-202001-1 firefox 2020-01-08
Debian DSA-4597-1 stable netty 2020-01-03
Debian DLA-2058-1 LTS nss 2020-01-07
Debian DLA-2057-1 LTS pillow 2020-01-06
Debian DSA-4598-1 stable python-django 2020-01-07
Debian DSA-4599-1 stable wordpress 2020-01-08
Fedora FEDORA-2019-5fdceffcb9 F31 chromium 2020-01-04
Fedora FEDORA-2019-7938c21723 F30 cyrus-imapd 2020-01-04
Fedora FEDORA-2019-ad23a4522d F31 cyrus-imapd 2020-01-05
Fedora FEDORA-2019-5898f4f935 F31 dovecot 2020-01-08
Fedora FEDORA-2019-c4177f74f5 F30 drupal7-l10n_update 2020-01-04
Fedora FEDORA-2019-3b1529362e F31 drupal7-l10n_update 2020-01-05
Fedora FEDORA-2019-6abe00cae1 F30 drupal7-webform 2020-01-04
Fedora FEDORA-2019-88d9c30b7d F31 drupal7-webform 2020-01-05
Fedora FEDORA-2019-e039dfaa30 F30 htmldoc 2020-01-04
Fedora FEDORA-2019-e90a7032f2 F31 htmldoc 2020-01-05
Fedora FEDORA-2019-46b6bd2459 F30 libssh 2020-01-03
Fedora FEDORA-2019-79b80b66d9 F30 nethack 2020-01-03
Fedora FEDORA-2019-1090bd0af2 F31 nethack 2020-01-05
Fedora FEDORA-2019-437d94e271 F30 php 2020-01-03
Fedora FEDORA-2019-a54a622670 F31 php 2020-01-05
Fedora FEDORA-2019-11dddb785b F30 samba 2020-01-03
Fedora FEDORA-2019-f4eb2a01d1 F31 singularity 2020-01-05
Fedora FEDORA-2019-2e12bd3a9a F30 xen 2020-01-03
Mageia MGASA-2020-0008 7 advancecomp 2020-01-05
Mageia MGASA-2020-0001 7 apache-commons-compress 2020-01-05
Mageia MGASA-2020-0010 7 cyrus-imapd 2020-01-05
Mageia MGASA-2020-0011 7 cyrus-sasl 2020-01-05
Mageia MGASA-2020-0022 7 dia 2020-01-05
Mageia MGASA-2020-0019 7 freeimage 2020-01-05
Mageia MGASA-2020-0007 7 freeradius 2020-01-05
Mageia MGASA-2020-0013 7 igraph 2020-01-05
Mageia MGASA-2020-0014 7 jhead 2020-01-05
Mageia MGASA-2020-0018 7 jss 2020-01-05
Mageia MGASA-2020-0017 7 libdwarf 2020-01-05
Mageia MGASA-2020-0015 7 libextractor 2020-01-05
Mageia MGASA-2020-0020 7 libxml2 2020-01-05
Mageia MGASA-2020-0021 7 mediawiki 2020-01-05
Mageia MGASA-2020-0016 7 memcached 2020-01-05
Mageia MGASA-2020-0009 7 mozjs60 2020-01-05
Mageia MGASA-2020-0005 7 openconnect 2020-01-05
Mageia MGASA-2020-0026 7 opensc 2020-01-07
Mageia MGASA-2020-0023 7 openssl 2020-01-05
Mageia MGASA-2020-0003 7 putty 2020-01-05
Mageia MGASA-2020-0002 7 python-ecdsa 2020-01-05
Mageia MGASA-2020-0004 7 python-werkzeug 2020-01-05
Mageia MGASA-2020-0024 7 radare2 2020-01-07
Mageia MGASA-2020-0006 7 shadowsocks-libev 2020-01-05
Mageia MGASA-2020-0012 7 upx 2020-01-05
Mageia MGASA-2020-0025 7 varnish 2020-01-07
Oracle ELSA-2019-4273 OL8 container-tools:1.0 2020-01-03
Oracle ELSA-2019-4269 OL8 container-tools:ol8 2020-01-03
Red Hat RHSA-2020:0005-01 EL6 chromium-browser 2020-01-02
Red Hat RHSA-2020:0046-01 EL8 java-1.8.0-ibm 2020-01-07
Red Hat RHSA-2020:0036-01 EL7.5 kernel 2020-01-07
Red Hat RHSA-2020:0027-01 EL7 kpatch-patch 2020-01-06
Red Hat RHSA-2020:0028-01 EL7 kpatch-patch 2020-01-06
Red Hat RHSA-2020:0026-01 EL7.6 kpatch-patch 2020-01-06
Red Hat RHSA-2020:0002-01 RHSC rh-git218-git 2020-01-02
Red Hat RHSA-2020:0057-01 SCL rh-java-common-apache-commons-beanutils 2020-01-08
Slackware SSA:2020-006-01 firefox 2020-01-06
SUSE SUSE-SU-2020:0035-1 SLE15 containerd, docker, docker-runc, golang-github-docker-libnetwork 2020-01-08
SUSE SUSE-SU-2020:0024-1 OS7 OS8 SLE12 SES5 java-1_8_0-ibm 2020-01-07
SUSE SUSE-SU-2020:0001-1 SLE15 java-1_8_0-ibm 2020-01-02
SUSE SUSE-SU-2020:0025-1 SLE12 java-1_8_0-openjdk 2020-01-07
SUSE SUSE-SU-2020:0023-1 libzypp 2020-01-07
SUSE SUSE-SU-2020:0028-1 SLE12 openssl-1_0_0 2020-01-07
SUSE SUSE-SU-2020:0002-1 SLE15 openssl-1_1 2020-01-02
SUSE SUSE-SU-2020:0026-1 SLE12 sysstat 2020-01-07
SUSE SUSE-SU-2020:0029-1 SLE15 tomcat 2020-01-07
SUSE SUSE-SU-2020:0016-1 OS7 OS8 SLE12 SES5 virglrenderer 2020-01-07
SUSE SUSE-SU-2020:0017-1 SLE15 virglrenderer 2020-01-07
Ubuntu USN-4230-1 16.04 18.04 19.04 19.10 clamav 2020-01-08
Ubuntu USN-4226-1 18.04 19.04 linux, linux-aws, linux-aws-5.0, linux-azure, linux-gcp, linux-gke-5.0, linux-kvm, linux-oem-osp1, linux-oracle, linux-oracle-5.0, linux-raspi2 2020-01-06
Ubuntu USN-4228-1 16.04 linux, linux-aws, linux-kvm, linux-raspi2, linux-snapdragon 2020-01-06
Ubuntu USN-4227-2 14.04 linux-azure 2020-01-07
Ubuntu USN-4225-1 18.04 19.10 linux-kvm, linux-oracle, linux-raspi2 2020-01-06
Ubuntu USN-4228-2 14.04 linux-lts-xenial, linux-aws 2020-01-07
Ubuntu USN-4227-1 16.04 18.04 linux-snapdragon 2020-01-06
Full Story (comments: none)

Kernel patches of interest

Kernel releases

Linus Torvalds Linux 5.5-rc5 Jan 05
Greg KH Linux 5.4.8 Jan 05
Greg KH Linux 4.19.93 Jan 05
Greg KH Linux 4.14.162 Jan 05
Greg KH Linux 4.9.208 Jan 05
Greg KH Linux 4.4.208 Jan 05

Architecture-specific

Mark Brown ARMv8.5-RNG support Jan 07
Pratik Rajesh Sampat Introduce Self-Save API for deep stop states Jan 06
Mikhail Zaslonko S390 hardware support for kernel zlib Jan 03
kan.liang@linux.intel.com TopDown metrics support for Icelake Jan 06

Core kernel

Sargun Dhillon Add pidfd_getfd syscall Jan 03
Alexei Starovoitov bpf: Introduce global functions Jan 07

Device drivers

Device-driver infrastructure

Filesystems and block layer

Memory management

Networking

Pablo Neira Ayuso iptables: introduce cache evaluation phase Jan 06
Ido Schimmel net: Add route offload indication Jan 07
Mat Martineau Multipath TCP: Prerequisites Jan 07

Security-related

Lakshmi Ramasubramanian IMA: Deferred measurement of keys Jan 02

Virtualization and containers

Miscellaneous

Theodore Y. Ts'o e2fsprogs v1.45.5 Jan 07

Page editor: Rebecca Sobol


Copyright © 2020, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds