
Leading items

Welcome to the LWN.net Weekly Edition for March 19, 2020

This edition contains the following feature content:

This week's edition also includes these inner pages:

  • Brief items: Brief news items from throughout the community.
  • Announcements: Newsletters, conferences, security updates, patches, and more.

Please enjoy this week's edition, and, as always, thank you for supporting LWN.net.


Bringing encryption restrictions in through the back door

By Jake Edge
March 18, 2020

Legislation recently proposed in the US Senate is ostensibly meant to combat "child sexual abuse material" (CSAM), but it does not actually do much to combat that horrible problem. Its target, instead, is the encryption of user communications, which the legislation—tellingly—never mentions. The Eliminating Abusive and Rampant Neglect of Interactive Technologies Act of 2020, EARN IT for short, is an attempt to force online service providers (e.g. Facebook, Google, etc.) to follow a set of "best practices" determined by a commission, to combat the scourge of CSAM; the composition of that commission makes it clear that end-to-end encryption will not be one of those practices, but companies that do not follow the best practices will lose liability protection for their users' actions. It is, in brief, an attempt to force providers to either abandon true end-to-end encryption or face ruinous lawsuits—all without "seeming" to be about encryption at all.

The bill

The EARN IT bill (i.e. proposed legislation) would set up a 19-member "National Commission on Online Child Sexual Exploitation Prevention":

The purpose of the Commission is to develop recommended best practices that providers of interactive computer services may choose to implement to prevent, reduce, and respond to the online sexual exploitation of children, including the enticement, grooming, sex trafficking, and sexual abuse of children and the proliferation of online child sexual abuse material.

The commission includes three administration officials (the Attorney General, the Secretary of Homeland Security, and the chairman of the Federal Trade Commission), along with 16 other members in several different groups. Four will be from law enforcement or the prosecution of CSAM crimes, four will be either survivors of those crimes or professionals who work with the victims, four from the "interactive computer service" industry, two experienced in constitutional law, consumer protection, or privacy, and two computer scientists experienced in "cryptography, data security, or artificial intelligence". That mention of "cryptography" is as close as the bill gets to talking about encryption.

The commission only requires 14 of its members to agree on the best practices, however, so the computer scientists and consumer-protection specialists could be ignored entirely, for example. Worse than that, though, is that the Attorney General and other administration officials effectively have veto power over the best practices list. Since they will be participating in the formulation of the list, it seems a tad unlikely that it will not be to their liking. Since the current Attorney General (and, really, all of his predecessors no matter which of the two dominant parties is appointing them) is strongly anti-encryption, one would guess that providing a backdoor "for law enforcement" will make the list.

But the consequences of not following these commission-established rules are where the "earn" part comes in. Companies that offer interactive computer services are currently shielded from liability based on the actions of their users via section 230 of the Communications Decency Act (CDA), which came about in 1996. It effectively treats service providers as mere conduits, rather than as publishers; the latter have far more liability for the content they purvey. Under EARN IT, though, service providers would only continue to receive section 230 protection if they follow the practices that the commission "recommends". Thus, they would earn their right to be treated as telecommunications providers, but only if they bow to the best practices, which will certainly curtail true end-to-end encryption for users.

Opposition

Though opponents of EARN IT will be branded as CSAM-enablers, as always, that is not at all what the overwhelmingly vast majority of the opponents are after, of course. It is always the same litany of bad people (e.g. terrorists, abusers of children) that can use encryption to hide their activities, but encryption is used by regular people for their normal activities, which is extremely important to note. The foundation of all financial transactions on the web, for example, is encryption. People would not be able to safely work, bank, shop, and so on from home, while, say, trying to flatten the curve during a pandemic, without encryption. Like it or not—and politicians hate it, if they even believe it—there is no way to have "magic" encryption that works for everything except when law enforcement wants to have a peek.

Over the years, countless cryptographers and security experts have patiently explained that there is no known way, mathematically, to provide a backdoor for the "good guys" without also effectively providing an opening for the "bad guys". Sometimes the definition of "bad guys" differs, of course. There have been numerous instances where rogue law enforcement agencies and individuals have abused various safeguards for their own gain—or even a perceived societal gain. There are also plenty of instances where rogue employees of online providers have accessed information by using backdoors intended to be used only by the authorities.

Weakening encryption makes it less effective for everyone. Lawmakers often seem to forget how much of the government uses the same online services they are targeting; leaving holes for law enforcement also may be leaving holes for attackers, some of whom may be working for the intelligence services of less-than-friendly rivals. Companies and regular folks may be more concerned with interception of their secrets, almost all of which have nothing to do with terrorism, CSAM, or any other illegal activity.

Beyond that, creating a list of best practices may preclude innovations that could actually help combat CSAM. Once the list of best practices has been adopted, it will be slow to change—commissions are simply another name for committees, after all. Providers will be leery of putting their companies at risk by adding features that violate the best practices, even if the result might be that more criminal behavior would be found. It effectively locks providers into what are best practices—at least hopefully, other than encryption restrictions—as they stand today (or, perhaps, in 18 months when the commission is supposed to conclude its work). In a fast-moving environment like the internet of today, that's simply too risky.

As might be guessed, various online privacy advocates, lawyers with a background in internet matters, cryptographers, and others have come out strongly against EARN IT. Perhaps cryptographer Matthew Green put it best:

Since there are no "best practices" in existence, and the techniques for doing this while preserving privacy are completely unknown, the bill creates a government-appointed committee that will tell technology providers what technology they have to use. The specific nature of the committee is byzantine and described within the bill itself. Needless to say, the makeup of the committee, which can include as few as zero data security experts, ensures that end-to-end encryption will almost certainly not be considered a best practice.

So in short: this bill is a backdoor way to allow the government to ban encryption on commercial services. And even more beautifully: it doesn't come out and actually ban the use of encryption, it just makes encryption commercially infeasible for major providers to deploy, ensuring that they'll go bankrupt if they try to disobey this committee's recommendations.

It's the kind of bill you'd come up with if you knew the thing you wanted to do was unconstitutional and highly unpopular, and you basically didn't care.

The Electronic Frontier Foundation (EFF) has, unsurprisingly, come out strongly opposed to EARN IT. Riana Pfefferkorn of the Stanford Law School Center for Internet and Society has been analyzing the implications of the bill since before it was even introduced, and has continued to do so more recently. There is lots of additional analysis out there, much of it linked from the reactions above. EARN IT is extraordinarily bad legislation in multiple dimensions, with far-reaching effects that may run afoul of the First, Fourth, and Fifth Amendments to the US Constitution (i.e. part of the "Bill of Rights").

Section 230 has been, rightly or wrongly, targeted by various "sides" over the last few years, in part because of the disinformation war that was waged on social media sites during the last US Presidential election (and other elections elsewhere). The so-called "techlash"—backlash against the online service providers such as Facebook and Twitter—is providing cover for EARN IT. One hopes that it was simply a coincidence, but it would seem that many Americans have more important, health-related concerns right now, so they may not be paying close attention to attempts to circumvent the secrecy protections they want—and need. Whether it was planned or not, Covid-19 is definitely providing cover of a different sort for EARN IT.

The most galling thing about attacks against encryption is that, whether they understand it or not, legislators and others who push for backdoors are only hurting regular users for the most part. Those who are technically savvy, or are willing to hire people with those talents, can certainly communicate securely without concern for government surveillance. Mathematics exists, much to the chagrin, if not outright bafflement, of politicians and others; those who need or want effectively unbreakable encryption can have it. Those who cannot have it are the regular users, unless it is made available to them by various social-media platforms and the like. And, as mentioned earlier, some of those regular users are the very legislators behind this attack, alongside much of the rest of the government they are part of. It would almost be comical if it were not so disheartening.

EARN IT is a neat piece of work—in a sick sort of way. It trumpets the oft-used "but what about the children?" battle cry in a ploy to misdirect the public from its actual aims. That, sadly, is so often the case with this kind of legislation. This is a bill worth keeping an eye on—and trying to stop, if possible. It is a well-crafted attack, however, and pushes all the right buttons, so it may well pass; at that point, presumably, the fight will move to the court system. The crypto wars have come yet again ... stay tuned.


Dentry negativity

By Jonathan Corbet
March 12, 2020
Back in 2017, Waiman Long posted a patch set placing limits on the number of "negative dentries" stored by the kernel. The better part of three years later, that work continues with, seemingly, no better prospects for getting into the mainline. It would be understandable, though, if many people out there don't really know what negative dentries are or why kernel developers care about them. That, at least, can be fixed, even if the underlying problem seems to be more difficult.

A "dentry" in the Linux kernel is the in-memory representation of a directory entry; it is a way of remembering the resolution of a given file or directory name without having to search through the filesystem to find it. The dentry cache speeds lookups considerably; keeping dentries for frequently accessed names like /tmp, /dev/null, or /usr/bin/tetris saves a lot of filesystem I/O.

A negative dentry is a little different, though: it is a memory of a filesystem lookup that failed. If a user types "more cowbell" and no file named cowbell exists, the kernel will create a negative dentry recording that fact. Should our hypothetical user, being a stubborn type, repeat that command, the kernel will encounter the negative dentry and reward said user — who is unlikely to be grateful, users are like that — with an even quicker "no such file or directory" error.
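The idea of remembering failures can be illustrated with a toy model. The sketch below is not kernel code; the class name ToyDentryCache, its fields, and the use of a Python set to stand in for the on-disk directory are all assumptions made for illustration.

```python
# Toy model of a name cache that also remembers failed lookups.
# Nothing here mirrors the kernel's real data structures.
class ToyDentryCache:
    def __init__(self, existing_names):
        self.disk = set(existing_names)  # stands in for the on-disk directory
        self.cache = {}                  # name -> True (positive) or False (negative)
        self.disk_searches = 0           # how often we had to "go to disk"

    def lookup(self, name):
        if name in self.cache:
            # Cache hit, whether the remembered answer is "exists" or "does not"
            return self.cache[name]
        self.disk_searches += 1
        found = name in self.disk
        self.cache[name] = found         # a False entry plays the negative dentry
        return found


cache = ToyDentryCache({"tetris"})
first = cache.lookup("cowbell")    # searches the "disk", records the failure
second = cache.lookup("cowbell")   # answered from the negative entry
```

The second lookup of cowbell fails just as the first did, but without another search of the directory.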

Optimized error messages for fat-fingered commands are a nice benefit from negative dentries, but their real value lies elsewhere. It turns out that lookups on nonexistent files happen frequently, and it's often the same files that are being looked for. Shared-library lookups are one example; it can be instructive to type something like this:

    $ strace -eopenat /usr/bin/echo 'Subscribe to LWN'

On your editor's system, the output looks like:

    openat(AT_FDCWD, "/lib64/libc.so.6", O_RDONLY|O_CLOEXEC) = 3
    openat(AT_FDCWD, "/usr/lib/locale/locale-archive", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
    openat(AT_FDCWD, "/usr/share/locale/locale.alias", O_RDONLY|O_CLOEXEC) = 3
    openat(AT_FDCWD, "/usr/lib/locale/en_US.UTF-8/LC_IDENTIFICATION", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
    openat(AT_FDCWD, "/usr/lib/locale/en_US.utf8/LC_IDENTIFICATION", O_RDONLY|O_CLOEXEC) = 3
    [...]

That simple echo command generates 13 failed lookups on a Fedora 31 system; launching oowriter creates 68 of them, and launching gnucash generates 277. For applications like these, optimizing failed lookups can yield a perceptible improvement in startup time. Compilers and language runtimes can also generate a lot of failed lookups; consider, for example, the handling of C #include or Python import statements. A quick "allmodconfig" kernel build run on your editor's system caused 52,799,262 failed lookups; that is worth optimizing.
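Counting failed lookups in saved strace output is easy to script. The following is a minimal sketch that assumes the output format shown above; the function name failed_lookups and the SAMPLE text (copied from the excerpt) are illustrative only.

```python
import re

# The five lines from the excerpt above, as strace printed them
SAMPLE = '''\
openat(AT_FDCWD, "/lib64/libc.so.6", O_RDONLY|O_CLOEXEC) = 3
openat(AT_FDCWD, "/usr/lib/locale/locale-archive", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/usr/share/locale/locale.alias", O_RDONLY|O_CLOEXEC) = 3
openat(AT_FDCWD, "/usr/lib/locale/en_US.UTF-8/LC_IDENTIFICATION", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/usr/lib/locale/en_US.utf8/LC_IDENTIFICATION", O_RDONLY|O_CLOEXEC) = 3
'''

# Matches an openat() line whose result is -1 ENOENT and captures the path
FAILED = re.compile(r'^openat\([^,]+, "([^"]+)".*= -1 ENOENT')


def failed_lookups(strace_output):
    """Return the paths of openat() calls that failed with ENOENT."""
    paths = []
    for line in strace_output.splitlines():
        match = FAILED.match(line)
        if match:
            paths.append(match.group(1))
    return paths


missing = failed_lookups(SAMPLE)
```

For the excerpt, missing holds the two paths that returned ENOENT; running the same function over a full trace gives the totals of the kind quoted above.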

There is one little problem with negative dentries, though: they require memory. All of those failed lookups can generate a lot of negative dentries, to the point that they start to crowd out more useful data. This is not a new problem; LWN reported on a complaint about negative dentries from memory-management developer Andrea Arcangeli — in 2002. For the most part, though, the normal shrinker mechanisms that keep the dentry cache as a whole under control have also sufficed to keep the negative variety from taking over.

Long has been working on the cases where normal shrinking doesn't work, though; he posted a new version of his patch set toward the end of February. As he points out, the number of positive dentries is limited by the number of files in the system, but there is no practical limit to the number of files that don't exist. As an illustration of what this can mean, Eric Sandeen pointed out some code in the NSS library that deliberately tries to open 10,000 nonexistent files — every time it starts up — as a timing exercise. Even without such pathological examples, though, the number of negative dentries has the potential to grow without bound.

Long's patch set adds a new sysctl knob, /proc/sys/fs/dentry-dir-max; if its value is zero (the default), the system's behavior is unchanged. If, instead, it is set to a positive value, the number of negative dentries associated with any given directory will not be allowed to exceed that value. The limit on negative dentries can be no lower than 256 to avoid excessive trimming of dentries. When the time comes to clean up excess dentries, the code tries to pick those that have not been referenced recently, and will reduce the number to 7/8 of the limit. A static key is used to prevent this mechanism from slowing down the system if it is not being used.
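The described policy can be sketched in a few lines. This is a toy model under stated assumptions, not the patch itself: the class ToyDirectory is hypothetical, trimming is assumed to happen as soon as the limit is exceeded, and "not referenced recently" is approximated with a simple least-recently-used order.

```python
from collections import OrderedDict

MIN_LIMIT = 256  # the text says the limit can be no lower than 256


class ToyDirectory:
    def __init__(self, limit):
        # Zero means "no limit", matching the described default
        self.limit = 0 if limit == 0 else max(limit, MIN_LIMIT)
        self.negative = OrderedDict()  # least recently referenced entry first

    def failed_lookup(self, name):
        if name in self.negative:
            self.negative.move_to_end(name)  # referenced again: now most recent
        else:
            self.negative[name] = True
        if self.limit and len(self.negative) > self.limit:
            target = self.limit * 7 // 8     # trim down to 7/8 of the limit
            while len(self.negative) > target:
                self.negative.popitem(last=False)  # drop the oldest entry


directory = ToyDirectory(limit=256)
for i in range(257):
    directory.failed_lookup(f"missing-{i}")
remaining = len(directory.negative)
```

With a limit of 256, the 257th failed lookup triggers a trim to 224 entries, and the oldest names are the ones discarded.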

There seems to be no disagreement with the idea of putting firmer limits on how many negative dentries can exist. The specific solution chosen here, though, is a bit more controversial. Adding new sysctl knobs is always a bit of a hard sell; as Matthew Wilcox put it: "A sysctl is just a way of blaming the sysadmin for us not being very good at programming". In general, such knobs are difficult for administrators to discover in the first place, and even harder for them to set correctly. How should an administrator know what an appropriate number of negative dentries for any given directory should be for their systems and workloads?

Thus, Wilcox and others argued for some sort of dynamic limit calculated (and adjusted) by the kernel itself. Long responded with a suggestion that the administrator could control the total amount of memory used by negative dentries instead of setting a per-directory maximum count; Wilcox didn't care how the mechanism worked internally, but insisted that it had to be self-tuning.

Dave Chinner, instead, wondered about the need for this kind of mechanism at all. He suggested that the offending applications should just be confined to a memory control group; when memory gets tight within the group, the system will reclaim memory inside that group, including negative dentries. There is, he said, already an effective mechanism for limiting the amount of memory used by a specific application, so there should be no need to add another.

Long answered that, while control groups can help, they don't solve the entire problem. Large numbers of negative dentries can impact the performance of the program generating them, even if a control group isolates the rest of the system from the problem. He also pointed out that daemons often run in the root control group, where they cannot be constrained in this manner.

As has happened every time that this patch set has been posted, the discussion wound down without any sort of conclusion on how things should proceed. This patch set seems no closer to the mainline than it was years ago; a search for control over negative dentries in the kernel will return a negative result.


Filesystem-oriented flags: sad, messy and not going away

By Jonathan Corbet
March 16, 2020
Over the last decade, the addition of a "flags" argument to all new system calls, even if no flags are actually needed at the outset, has been widely adopted as a best practice. The result has certainly been greater API extensibility, but we have also seen a proliferation of various types of flags for related system calls. For calls related to files and filesystems, in particular, the available flags have reached a point where some calls will need as many as three arguments for them rather than just one.

One set of filesystem-oriented flags will be familiar to almost anybody who has worked with the Unix system-call API: the O_ flags supported by calls like open(). These flags affect how the call operates in a number of ways; O_CREAT will cause the named file to be created if it does not already exist, O_NOFOLLOW causes the open to fail if the final component in the name is a symbolic link, O_NONBLOCK requests non-blocking operation, and so on. Some of those flags affect the lookup process (O_NOFOLLOW, for example) while others, like O_NONBLOCK, affect how the file descriptor created by the call will behave. All are part of one flag namespace that is recognized by all of the open() family of system calls.
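The effect of two of these flags can be seen from Python, whose os module exposes the same O_ constants. This is a minimal sketch assuming a Linux system (where opening a symbolic link with O_NOFOLLOW fails with ELOOP); the helper name open_nofollow_errno is made up for the example.

```python
import errno
import os
import tempfile


def open_nofollow_errno(path):
    """Open path read-only with O_NOFOLLOW; return 0 on success, else the errno."""
    try:
        fd = os.open(path, os.O_RDONLY | os.O_NOFOLLOW)
    except OSError as err:
        return err.errno
    os.close(fd)
    return 0


with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "target")
    link = os.path.join(tmp, "link")
    # O_CREAT creates the file, since it does not exist yet
    os.close(os.open(target, os.O_WRONLY | os.O_CREAT, 0o600))
    os.symlink(target, link)
    direct = open_nofollow_errno(target)  # a regular file: the open succeeds
    via_link = open_nofollow_errno(link)  # final component is a symlink: it fails
```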

open() is one way to create a new entry in a directory; link() is another. When the time came to add flags to link(), the linkat() system call was born; this system call also follows the other relatively new pattern of accepting a file descriptor for the directory in which the operation is to be performed. linkat() has a separate flag namespace (the "AT_ flags") with flags like AT_SYMLINK_FOLLOW, which is the opposite of O_NOFOLLOW. There is also an AT_SYMLINK_NOFOLLOW that is not recognized by linkat(), but which is understood by calls like fchmodat() and execveat(). There are more AT_ flags, such as AT_NO_AUTOMOUNT, supported by the relatively new statx() system call.

Then there is openat2(), which is coming with the 5.6 kernel. Rather than having a separate argument for flags, this system call requires a pointer to an open_how structure:

    struct open_how {
        __u64 flags;
        __u64 mode;
        __u64 resolve;
    };

Here, flags contains the O_ flags common to the open() family, while resolve contains yet another set of flags (the "RESOLVE_ flags"). These include RESOLVE_BENEATH to limit the lookup to files below the provided directory and RESOLVE_NO_SYMLINKS, which is kind of like O_NOFOLLOW or AT_SYMLINK_NOFOLLOW but different: it blocks symbolic-link traversal at all stages of pathname traversal, rather than just for the final component.
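To make the layout concrete, the structure can be mirrored with Python's ctypes module. This sketch only builds the structure; it stops short of invoking openat2(). The two RESOLVE_ values are taken from the kernel's include/uapi/linux/openat2.h as of 5.6, and the class name OpenHow is an assumption for the example.

```python
import ctypes
import os

# Values from include/uapi/linux/openat2.h (5.6)
RESOLVE_NO_SYMLINKS = 0x04
RESOLVE_BENEATH = 0x08


class OpenHow(ctypes.Structure):
    """Mirror of struct open_how: three 64-bit fields, no padding."""
    _fields_ = [
        ("flags", ctypes.c_uint64),
        ("mode", ctypes.c_uint64),
        ("resolve", ctypes.c_uint64),
    ]


how = OpenHow(
    flags=os.O_RDONLY | os.O_CLOEXEC,               # ordinary O_ flags
    mode=0,                                          # only meaningful when creating
    resolve=RESOLVE_BENEATH | RESOLVE_NO_SYMLINKS,   # the separate RESOLVE_ namespace
)
size = ctypes.sizeof(how)
```

The two flag sets live in different fields, so their numeric values are free to overlap.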

LWN has occasionally covered the ongoing story of the proposed fsinfo() system call, which provides information about mounted filesystems. This new API also includes a structure pointer as one of its parameters:

    struct fsinfo_params {
        __u32 at_flags;
        __u32 flags;
        __u32 request;
        __u32 Nth;
        __u32 Mth;
        __u64 __reserved[3];
    };

Here, at_flags is, as one would expect, a set of AT_ flags, while flags is yet another set of flags specific to this system call. Recently, though, fsinfo() author David Howells noted that he had been told that RESOLVE_ flags should be used in preference to AT_ flags in all new system calls, and asked whether the AT_ flags should be considered deprecated. He followed up with a patch marking the AT_ flags as being deprecated and adding new RESOLVE_ flags to cover behaviors that can currently only be requested by AT_ flags. So, for example, he added RESOLVE_NO_TERMINAL_SYMLINKS (later renamed RESOLVE_NO_TRAILING_SYMLINKS) to request the same semantics as AT_SYMLINK_NOFOLLOW.

Christian Brauner argued in favor of moving to RESOLVE_ flags, noting that some of the semantics that are only available via those flags may be of use in settings beyond openat2(). He did allow, though, that "we might end up causing more confusion for userspace due to yet another set of flags"; others might argue that it's a bit late to worry about that at this point.

Linus Torvalds, though, is not a fan of the plan to deprecate the AT_ flags; he noted that software will continue to use flags like O_NOFOLLOW or AT_SYMLINK_NOFOLLOW, so they can't go away. He added:

And yes, the fact that we then have three different user-visible namespaces (O_xyz flags for open(), AT_xyz flags for linkat(), and now RESOLVE_xyz flags for openat2()) is sad and messy. But it's an inherent messiness from just how the world works. We can't get rid of it.

Adding multiple flags that do the same thing leads to complexity and confusion, he said; one might thus conclude that any such patch is unlikely to make it into the mainline. He later said that, if fsinfo() needs features controlled by both AT_ and RESOLVE_ flags, it should accept both; that, along with the flags specific to that system call, adds up to three different sets of flags for one call. One could reasonably conclude that if, for example, openat2() were to implement a feature controlled by an AT_ flag, it would have to accept a third set of flags as well.

So the situation may indeed be "sad and messy", but it doesn't appear that it will be getting any less messy anytime soon. Perhaps one of the messiest aspects of this API is that there is no type checking for any of these flags fields. Nothing but due care prevents a developer from setting a flag in the wrong field. That one may be hard to correct in a backward-compatible way, even if somebody were to be motivated to do it. It is not the biggest mess to be found in our APIs; we'll continue to muddle on with things as they are.
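What type checking for flags could look like is easiest to show in a language that has it. The sketch below is an illustration in Python, not a proposal for the C API: the three enum classes and the build_open_how() helper are hypothetical, the O_ values come from Python's os module, and the AT_ and RESOLVE_ values are those in the kernel's UAPI headers.

```python
import enum
import os


class OFlag(enum.IntFlag):
    CREAT = os.O_CREAT
    NOFOLLOW = os.O_NOFOLLOW
    NONBLOCK = os.O_NONBLOCK


class AtFlag(enum.IntFlag):
    SYMLINK_NOFOLLOW = 0x100
    SYMLINK_FOLLOW = 0x400
    NO_AUTOMOUNT = 0x800


class ResolveFlag(enum.IntFlag):
    NO_SYMLINKS = 0x04
    BENEATH = 0x08


def build_open_how(flags, resolve):
    """Refuse values that come from the wrong flag namespace."""
    if not isinstance(flags, OFlag):
        raise TypeError("flags must be built from O_ flags")
    if not isinstance(resolve, ResolveFlag):
        raise TypeError("resolve must be built from RESOLVE_ flags")
    return {"flags": int(flags), "mode": 0, "resolve": int(resolve)}


ok = build_open_how(OFlag.NOFOLLOW, ResolveFlag.BENEATH | ResolveFlag.NO_SYMLINKS)
try:
    # An AT_ flag in the resolve field: plausible-looking, but wrong
    build_open_how(OFlag.NOFOLLOW, AtFlag.SYMLINK_NOFOLLOW)
    rejected = False
except TypeError:
    rejected = True
```

Even this check is incomplete: OR-ing members of two different IntFlag classes yields a value of the left-hand class, so a mixed expression can still slip through.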


A QUIC look at HTTP/3

March 13, 2020

This article was contributed by Marta Rybczyńska

The Hypertext Transfer Protocol (HTTP) is a core component of the world-wide web. Over its evolution it has added features, including encryption, but time has revealed its limitations and those of the whole protocol stack. At FOSDEM 2020, Daniel Stenberg delivered a talk about a new version of the protocol called HTTP/3. It is under development and includes some big changes under the hood. There is no more TCP, for example; a new transport protocol called QUIC is expected to improve performance and allow new features.

HTTP/1 and HTTP/2

Each HTTP session requires a TCP connection which, in turn, requires a three-way handshake to set up. Once that is done, "we can send data in a reliable data stream", Stenberg explained. TCP transmits data in the clear, so everyone can read what is transferred; the same thus holds true for the non-encrypted HTTP protocol. However, 80% of requests today use the encrypted version, called Hypertext Transfer Protocol Secure (HTTPS), according to statistics from Mozilla (Firefox users) and Google (Chrome users). "The web is getting more and more encrypted", Stenberg explained. HTTPS uses Transport Layer Security (TLS); it adds security on top of the stack of protocols, which are (in order): IP, TCP, TLS, and HTTP. The cost of TLS is another handshake that increases the latency. In return, we get privacy, security, and "you know you're talking to the right server".

HTTP/1 required clients to establish one new TCP connection per object, meaning that for each request, the browser needed to create a connection, send the request, read the response, then close it. "TCP is very inefficient in the beginning", Stenberg explained; connections transmit data slowly just after being established, then increase the speed until they discover what the link can support. With only one object to fetch before closing the connection, TCP was never getting up to speed. In addition, a typical web page includes many elements, including JavaScript files, images, stylesheets, and so on. Fetching one object at a time is slow, so browser developers responded by creating multiple connections in parallel.
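A little arithmetic shows why short-lived connections never get up to speed. The sketch below is an idealized model, not a description of any real TCP stack: it assumes an initial window of ten segments that doubles every round trip, with no loss, no handshake cost, and no receiver limit. The function name round_trips and the object sizes are invented for the example.

```python
def round_trips(segments, initial_window=10):
    """Round trips needed to send `segments` when the window doubles each RTT."""
    window, sent, rtts = initial_window, 0, 0
    while sent < segments:
        sent += window   # everything in the current window goes out this RTT
        window *= 2      # idealized slow start: double for the next RTT
        rtts += 1
    return rtts


# Ten objects of 30 segments each
fresh_each_time = sum(round_trips(30) for _ in range(10))  # new connection per object
one_connection = round_trips(10 * 30)                      # window keeps growing
```

Under these assumptions, fetching the objects one at a time over fresh connections costs 20 round trips, while sending the same 300 segments over a single connection that keeps its window takes five.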

That created too many connections to be handled by the servers, so typically the number of connections for each client was limited. The browser had to choose which of its few allowed connections to use for the next object; that led to the so-called "head-of-line blocking" problem. Think of a supermarket checkout line; you might choose the one that looks shortest, only to be stuck behind a customer with some sort of complicated problem. A big TCP efficiency improvement was added for HTTP/1.1 in 1997: open TCP connections can be reused for other requests. That improved the slow-start problem, but not the head-of-line blocking issue, which can be made even worse.

HTTP/2 from 2015 uses a single connection per host, allowing TCP to get up to speed. However, the head-of-line blocking problem became even more serious at the TCP connection level. In HTTP/1 the problem was that one longer request could block others waiting for the same connection. In HTTP/2, the single connection carries hundreds of streams. In this case, when we lose one packet, "one hundred streams are waiting for that one single packet", Stenberg said. As a reminder, this is because TCP will retransmit the missing packet only when the network stack figures out that it was lost, and the network stack will only pass the data received after the gap when the missing packet arrives.

The "boxes"

Another trend Stenberg explained is protocol ossification (which LWN looked at in 2018). He explained it in the following way: the Internet is full of "boxes" (they are often called "middleboxes") such as routers and gateways. They were installed at some time and are running software to handle networking protocols as they existed at that time. The problem is that, "they know how the Internet works — at the time they were installed". For those boxes, if a given packet-header field was always zero, it is never going to be anything else. What is worse is that those boxes do not get upgraded. They are "stuck in time", he said. This is different than the servers or the web browsers, which are updated regularly.

The existence of those boxes brings limitations to the development of new versions of the HTTP protocol. An example of this is TCP port 80, which is assigned to unencrypted HTTP/1.1. Currently no browser speaks HTTP/2 in clear text on that port. "One browser tried to do it until they figured out it doesn't work", he said. The middleboxes modified (or blocked) the traffic based on their understanding of HTTP/1, breaking HTTP/2 traffic.

Another idea to improve the protocol was to send data earlier in the TCP connection, a functionality called TCP fast open (or TFO; LWN covered it in 2012). It allowed browsers to send request data in the packets of the TCP handshake itself. Stenberg explained that it took five to seven years until all kernels supported it. Then the browsers tried it ... and it did not work. Middleboxes would just drop the TFO packets. Currently no browser enables TFO by default. A similar story happened with Brotli compression. The middleboxes only know gzip, so they break the connections using Brotli. Currently this compression is used only over HTTPS. He concluded that the introduction of new transport protocols does not work, because "your router at home will only route TCP and UDP".

The definition of HTTP/3

The difficulties with innovation in HTTP were one of the reasons for the creation of the QUIC working group at the IETF in 2016. QUIC is a name, not an acronym, Stenberg highlighted. A number of companies are interested in this development. The work of the IETF group is built on experiments with Google QUIC, a protocol deployed first in 2013 (LWN looked at it that year). The experiments used HTTP requests over UDP, with widely used client and web services. This experiment carried a fair amount of HTTP traffic, and was taken to IETF, where the working group started. Currently the IETF version is significantly different from the Google one: it includes a new transport protocol and application level.

IETF's QUIC fixes the head-of-line blocking issues and allows early data transmission like TCP fast open does. The encryption is built-in; no clear-text version of QUIC exists. HTTP/3 implemented over QUIC includes fewer clear-text messages than HTTP/2.

During the development of QUIC, the group also addressed some other modern challenges. TCP was defined with a connection tied to an IP address. Currently, devices can have multiple addresses and change them when users move around. With TCP, a new connection must be created when the interface address changes. QUIC uses a session identification separate from IP addresses to solve this problem.
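The difference between the two ways of identifying a connection can be shown with a pair of lookup tables. This is a toy model only; the class ToyServer, the addresses (taken from the documentation ranges), and the connection ID value are all made up for illustration.

```python
class ToyServer:
    def __init__(self):
        self.by_address = {}   # TCP-like: keyed by (client address, client port)
        self.by_conn_id = {}   # QUIC-like: keyed by a connection identifier

    def accept(self, address, port, conn_id, state):
        self.by_address[(address, port)] = state
        self.by_conn_id[conn_id] = state

    def find_tcp_like(self, address, port):
        return self.by_address.get((address, port))

    def find_quic_like(self, conn_id):
        return self.by_conn_id.get(conn_id)


server = ToyServer()
server.accept("192.0.2.10", 50000, conn_id="c0ffee", state="session-1")

# The client moves from WiFi to a cellular network and gets a new address
after_move_tcp = server.find_tcp_like("198.51.100.7", 41000)
after_move_quic = server.find_quic_like("c0ffee")
```

After the address change, the address-keyed lookup finds nothing, so a new connection would be needed, while the identifier-keyed lookup still finds the session.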

QUIC uses UDP, but in a limited fashion that is more similar to the use of IP than UDP. The transport layer is in the higher layer of QUIC, above UDP; it adds connections, reliability, flow control, and security. A big difference with TCP is how QUIC handles streams within a connection. QUIC can send multiple streams in a single connection, in either direction. They are all independent, initiated by the server or the client. If a packet is lost, the implementation knows which stream is affected; only that stream will have to wait for a retransmission. The streams are internally reliable and in-order.
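The effect of a single lost packet under the two models can be simulated in a few lines. The sketch is a simplification that ignores retransmission timing entirely; the packet tuples, stream names, and both function names are hypothetical.

```python
# Each packet: (connection-wide sequence number, stream id, per-stream offset)
packets = [
    (0, "a", 0), (1, "b", 0), (2, "c", 0),
    (3, "a", 1), (4, "b", 1), (5, "c", 1),
]
lost = {1}  # packet 1 (stream "b", offset 0) goes missing


def deliverable_single_bytestream(packets, lost):
    """TCP-like: nothing after the first gap is passed to the application."""
    delivered = []
    for seq, stream, offset in packets:
        if seq in lost:
            break  # everything behind the gap waits for the retransmission
        delivered.append((stream, offset))
    return delivered


def deliverable_independent_streams(packets, lost):
    """QUIC-like: each stream is in-order on its own; a gap stalls only that stream."""
    stalled = set()
    delivered = []
    for seq, stream, offset in packets:
        if seq in lost:
            stalled.add(stream)
        elif stream not in stalled:
            delivered.append((stream, offset))
    return delivered


tcp_like = deliverable_single_bytestream(packets, lost)
quic_like = deliverable_independent_streams(packets, lost)
```

In the single-bytestream model only the first packet reaches the application before the gap; with independent streams, "a" and "c" proceed and only "b" waits.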

Applications run on top of QUIC. The protocol definition was started for HTTP; others are expected to follow, DNS for example. The definition of other application protocols is expected to start around when QUIC ships.

HTTP over QUIC is the "same but different", Stenberg said. There will still be the GET command that should be familiar to most readers, but the way the command is transmitted changes. Stenberg explained the history of HTTP: HTTP/1 was in ASCII, HTTP/2 was binary multiplexed, and HTTP/3 is binary over multiplexed QUIC, with TLS 1.3.

HTTP/3 will be faster thanks to the improved handshakes. Early numbers from the experiments showed 70% of connections with no round-trip-time (RTT) delay because the connections had been established previously. The protocol allows early data, so it should improve latency even when a connection does not already exist. The independent streams should also help in low-quality networks. He noted that he could not show numbers for now, as the protocol is not finished yet. However, the expectation is that it will be "a little better to much better".
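A rough estimate shows what the handshake savings could mean. The figures below are assumptions for illustration and not measurements from the talk: one round trip for the TCP handshake, one more for TLS 1.3, one for a fresh QUIC connection (which combines the transport and TLS handshakes), and none when a previous connection can be resumed with early data. The 70% share is borrowed from the early numbers quoted above.

```python
TCP_HANDSHAKE_RTTS = 1      # SYN / SYN-ACK before data can flow
TLS13_HANDSHAKE_RTTS = 1    # a full TLS 1.3 handshake on top of TCP
QUIC_FRESH_RTTS = 1         # transport and TLS handshakes combined
QUIC_RESUMED_RTTS = 0       # early data on a previously seen server


def expected_quic_setup(resumed_share):
    """Average setup round trips when `resumed_share` of connections are resumed."""
    fresh_share = 1.0 - resumed_share
    return resumed_share * QUIC_RESUMED_RTTS + fresh_share * QUIC_FRESH_RTTS


tcp_tls_setup = TCP_HANDSHAKE_RTTS + TLS13_HANDSHAKE_RTTS
quic_setup = expected_quic_setup(0.70)
```

Under these assumptions the average setup cost drops from two round trips to about 0.3; real-world gains depend on how often resumption is actually possible.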

Deployment

HTTPS URLs are everywhere; they cannot be replaced without rewriting the entire web. They imply the use of TCP port 443 with TLS. The migration to HTTP/3 will thus require a connection to a legacy server. If a site supports HTTP/3, it will provide an Alt-Svc header giving the server to connect to. Browsers will check that and make the second connection in the background, or they will just try both protocols at the same time. "There will be a lot of probing", he noted. There will also be support in the domain name system in the form of a new record called HTTPSSVC that will allow the provision of information on the connection parameters. In practice, it will mean asking the DNS first to check if HTTP/3 can be used.
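As a rough illustration of what a client does with that header, here is a minimal sketch of a parser for an Alt-Svc value. The function name parse_alt_svc() and the sample header are assumptions made for this example; the parser is deliberately simplified and assumes that no commas or semicolons appear inside quoted strings.

```python
def parse_alt_svc(value):
    """Parse an Alt-Svc header value into a list of alternatives.

    Simplified sketch: assumes no commas or semicolons inside quotes.
    """
    if value.strip() == "clear":
        return []
    alternatives = []
    for entry in value.split(","):
        parts = [p.strip() for p in entry.split(";")]
        protocol, _, authority = parts[0].partition("=")
        host, _, port = authority.strip('"').rpartition(":")
        params = dict(p.split("=", 1) for p in parts[1:] if "=" in p)
        alternatives.append({
            "protocol": protocol,
            "host": host or None,  # an empty host means "same host as the origin"
            "port": int(port),
            # "ma" is the max-age in seconds; 24 hours is used when it is absent
            "max_age": int(params.get("ma", 86400)),
        })
    return alternatives


# Illustrative header value, not taken from a real server
sample = 'h3-27=":443"; ma=3600, h2="alt.example.com:8443"'
for alternative in parse_alt_svc(sample):
    print(alternative)
```

A client that finds an entry for a protocol it speaks can then attempt the second connection to the advertised host and port in the background.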

There will be a few challenges. One difficulty may be that many companies block UDP by default as a way of blocking distributed denial-of-service attacks. With UDP, 3-7% of connections will fail due to blocking somewhere in the network. Clients need to have fallback algorithms and use them transparently. That leads to another problem: there will be no incentive to unblock UDP because the fallback will be in place.
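The transparent fallback described above might look roughly like the following. This is a sketch under stated assumptions: try_http3 and try_tcp are hypothetical callables supplied by the caller, and real clients use more elaborate strategies, such as racing both protocols.

```python
def fetch_with_fallback(url, try_http3, try_tcp):
    """Try HTTP/3 first; fall back to a TCP-based HTTP version on failure.

    try_http3 and try_tcp are hypothetical callables: each takes a URL and
    either returns a response or raises OSError (TimeoutError is a subclass).
    Returns a (transport, response) pair so the caller can see what was used.
    """
    try:
        return "h3", try_http3(url)
    except OSError:
        # UDP was blocked or timed out somewhere on the path: retry over TCP
        return "tcp", try_tcp(url)


def blocked_udp(url):
    """Stand-in for a QUIC attempt on a network that drops UDP."""
    raise TimeoutError("no response over UDP")


def over_tcp(url):
    """Stand-in for a successful request over TCP."""
    return f"response for {url}"


print(fetch_with_fallback("https://example.com/", blocked_udp, over_tcp))
```

Because the caller gets a response either way, nothing visibly breaks when UDP is blocked, which is exactly why the incentive to unblock it is weak.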

As of today, QUIC stacks are implemented in user space to allow easy testing. "But you need to stick to one library as there are no standard APIs", Stenberg said. There are a dozen implementations right now, in many languages. Interoperability tests happen every month and the current version of the protocol as of March 2020 is draft 27.

HTTP/3 is expected to use two to three times more CPU time than the earlier versions for the same bandwidth consumption. This might delay deployment for a while. One of the reasons is that UDP is not well optimized in Linux, while "we've been polishing TCP for years", he said. Currently UDP is not made for high-volume traffic and there is no hardware offload for QUIC. In addition, performance suffers since there are also quite a few transitions between kernel and user space because the protocol stack is implemented in user space. For now, he doesn't know if QUIC will be moved into the kernel. There are some efforts to do so, but it requires a new implementation of TLS in the kernel.

TLS usage in QUIC is different, so existing offloads will not work. The TLS protocol transmits data using "TLS records"; the records may include one or more TLS messages, and one message may span a record boundary. In the case of TLS over TCP, both records and messages are used. Over QUIC, only messages are sent; records are not needed anymore. This changes the way the TLS libraries are used and the needed APIs.

As the use of the TLS library changes between TCP and QUIC, new APIs are necessary. An OpenSSL pull request adding the QUIC APIs (PR 8797) is still being discussed; this is expected to take a while. Then, when it gets accepted, there will be another delay until it is available in a release and deployed.

Changes to the transport protocol will also force changes in the associated tools. tcpdump is not ready yet, for example. The existing tools that do understand QUIC are Wireshark and the two QUIC-specific tools qlog and qvis. Stenberg is the author of curl, which supports the latest drafts (version 25 at the time of his talk in February 2020), but without the fallback functionality; "fallback is tricky", he says. He summarized that "there is definitely a shortage" of tools and a lot of work to do.

On the browser side, nightly builds of Chrome and Firefox can have HTTP/3 enabled, but those who want to run experiments need to set some specific options. In Firefox nightly builds, the user should go to about:config and change network.http.http3.enabled to true. Chrome Canary (not for Linux) requires specific options when launching: --enable-quic and --quic-version=h3-25 [at the time of the talk, see comments]. On the server side, an NGINX patch exists to use quiche (a library implementing QUIC) for experiments. However, the other big servers, including Apache, IIS, and the official version of NGINX, do not have it yet. There is no support in the Safari browser either.

The date when the protocol will ship is not set yet, as the group prefers to do it right, not fast; he hopes for July 2020. Currently the libraries are in alpha versions; they will ship when the specification is ready. Browsers require updates of the TLS libraries. The deployment is expected to take time. He expects that it will grow more slowly than HTTP/2, but HTTP/3 is there for the long term.

Once the protocol is ready, people are waiting to add new features to QUIC, including multipath (accessing the same site using different network connections), forward error correction, and unreliable and partially reliable streams ("for video people"). Of course, other applications will also appear. QUIC development will move to version 2 after version 1 ships.

Slides [PDF] and a video of the talk are available.


Improving pretty-printing in Python

By Jake Edge
March 18, 2020

The python-ideas mailing list is typically used to discuss new features or enhancements for the language; ideas that gain traction will get turned into Python Enhancement Proposals (PEPs) and eventually make their way to python-dev for wider consideration. Steve Jorgensen recently started a discussion of just that sort; he was looking for a way to add customization to the "pretty-print" module (pprint) so that objects could change the way they are displayed. The subsequent thread went in a few different directions that reflect the nature of the mailing list—and the idea itself.

Jorgensen prefaced his thoughts with a disclaimer of sorts: "This is really an idea for an idea [...]". He suggested adding a "dunder" method to Python objects for pretty-printing purposes. Those methods have names that start and end with double underscores (thus "dunder"); they are used internally by Python for a number of standard tasks (e.g. __init__()). A new one might allow objects to represent themselves differently in Unicode streams:

The informal (`str`) representations of `inf` and `-inf` are "inf" and "-inf", and that seems appropriate as a known-safe value, but if we're writing the representation to a stream, and the stream has a Unicode encoding, then those might prefer to represent themselves as "∞" and "-∞". If there were a dunder method for informal representation to which the destination stream was passed, then the object could decide how to represent itself based on the properties of the stream.
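A minimal sketch of that idea, written as a plain function rather than a dunder method, might look like this. The name informal_repr() is hypothetical; nothing like it exists in Python today, and the check on the stream's encoding is just one possible way for a value to decide.

```python
import io
import math


def informal_repr(value, stream):
    """Hypothetical helper: choose a representation based on the stream."""
    text = str(value)
    if isinstance(value, float) and math.isinf(value):
        fancy = "∞" if value > 0 else "-∞"
        # Streams without a known encoding are treated as ASCII-only
        encoding = getattr(stream, "encoding", None) or "ascii"
        try:
            fancy.encode(encoding)
        except (UnicodeEncodeError, LookupError):
            return text  # the stream cannot carry the symbol; stay safe
        return fancy
    return text


utf8_stream = io.TextIOWrapper(io.BytesIO(), encoding="utf-8")
ascii_stream = io.TextIOWrapper(io.BytesIO(), encoding="ascii")

print(informal_repr(float("inf"), utf8_stream))
print(informal_repr(float("inf"), ascii_stream))
```

The first call can use the infinity symbol because the stream's encoding supports it; the second falls back to the known-safe "inf".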

Beyond that, objects might like to control how they are pretty-printed in the general case. pprint provides some amount of customization, in terms of text width, indentation, and traversal depth, but he is looking for more than that:

It would be nice if there were some method that, if implemented for the object, would be used to allow the object to tell the pretty printer to treat it is a composite with starting text, component objects, and ending text.
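For reference, the customization that pprint already offers (text width, indentation, and traversal depth) can be exercised as in the short sketch below; the sample data is made up for illustration.

```python
import pprint

data = {
    "name": "example",
    "values": list(range(5)),
    "nested": {"a": {"b": {"c": 1}}},
}

# width: the line length the printer tries to stay within before wrapping
print(pprint.pformat(data, width=40))

# indent: the amount of indentation added for each nesting level
print(pprint.pformat(data, indent=4, width=40))

# depth: anything nested deeper than this is replaced with "..."
print(pprint.pformat(data, depth=2))
```

All of these are controlled by the caller; none of them lets an object describe its own structure to the printer, which is the gap Jorgensen is pointing at.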

Guido van Rossum thought the idea had some merit. He suggested that a pprint alternative "that allows classes to have formatting hooks that get passed in some additional information (or perhaps a PrettyPrinter object) that can affect the formatting" might make sense. It would be the type of feature that could be developed as independent modules on the Python Package Index (PyPI), "*except* it would be more effective if there was a standard, rather than several competing such modules, with different APIs for the formatting hooks". He encouraged a discussion on what that API might look like.

Jonathan Fine offered up some potential starting points, at least in terms of design, in the reprlib and json modules in the standard library. Eric V. Smith pointed to the @functools.singledispatch decorator as a potential pattern to use; it allows for overloaded functions based on the type of the first argument.
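To show what the single-dispatch pattern might look like when applied to formatting, here is a small sketch. The function name pretty() and the chosen formats are assumptions for the example, not anything proposed in the thread.

```python
from functools import singledispatch


@singledispatch
def pretty(obj):
    """Fallback used for any type without a registered formatter."""
    return repr(obj)


@pretty.register(float)
def _(obj):
    # Floats: three significant digits
    return f"{obj:.3g}"


@pretty.register(list)
def _(obj):
    # Lists: format each element with whatever formatter its type selects
    return "[" + ", ".join(pretty(item) for item in obj) + "]"


print(pretty([1, 2.0, 3.14159, "x"]))  # prints [1, 2, 3.14, 'x']
```

Formatters registered this way live outside the classes they format, so an application can add or replace one without touching the type itself.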

But the definition of some putative __pretty__() method on objects could be problematic, Barry Scott said. "Pretty" is "in the eye of the beholder", so he is skeptical that objects can define a one-size-fits-all implementation; for example, internationalization and localization might be required. Instead of driving it from the object side, he would rather have something that takes an object "and returns the pretty version depending on the apps demands/config". Stephen J. Turnbull more or less concurred with that:

Allowing objects to decide implicitly how to represent themselves is usually a bad idea, and we shouldn't encourage it. Yes, it's *very* cool that you can do things like "π = math.pi", and with MacroPy you can even do things like substitute "λ" for "lambda". However, if ways are provided to do this automatically depending on encodings and other variable environment state, people *will* put them into public libraries, and clients of those libraries will have to compensate for that. And of course there's the potential for foot-shooting in private libraries.

If an application wants to make such substitutions, I have no objection to that. But "explicit is better than implicit", and those substitutions should be made at the level of application I/O, not the class level IMO. (Yes, I know those "levels" are ill-defined, but that's still an appropriate informal principle, I think.)

Christopher Barker was concerned that adding a new dunder method for pretty-printing, beyond the existing __str__() and __repr__(), might just lead to the need for more than one version of "pretty". He wondered about updating __str__() for standard types, so that the output was "pretty" by default, but recognized that it would likely break many things: "I imagine a LOT of code out there (doctests, who know what else) does in fact expect the str() of builtins not to change -- so this is probably dead in the water." But beyond the code (and documentation) upheaval, it is far from clear what "pretty" means, as Steven D'Aprano pointed out:

Define "pretty". The main reason I don't use the pprint module at the moment is that it formats things like lists into a single long, thin column which is *less* attractive than the unformatted list:
    py> pprint.pprint(list(range(200)))
    [0,
     1,
     2,
     3,
     ...
     198,
     199]

I've inserted the ellipsis for brevity, the real output is 200 rows tall.

When it comes to floats, depending on what I'm doing, I may consider any of these to be "pretty":

  • the minimum number of digits which are sufficient to round trip;
  • the mathematically exact value, which could take a lot of digits;
  • some short number of digits, say, 5, that is "close enough".
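The three notions of a "pretty" float listed above correspond to different standard formatting calls. The following sketch shows one way to get each; the variable names are arbitrary.

```python
from decimal import Decimal

x = 2 / 3

shortest = repr(x)         # fewest digits that still round-trip
exact = str(Decimal(x))    # exact value of the stored binary float
short = format(x, ".5g")   # five significant digits, "close enough"

print(shortest)
print(exact)
print(short)
```

All three are reasonable answers for the same value, which is D'Aprano's point: the right choice depends on what the reader is doing.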

Turnbull agreed with Barker that doctest-based tests would be affected by a change to str() (which calls __str__() if present), but noted that other things would be broken as well, which is something that the project tries to avoid:

Python may be good for developers who are moving fast and breaking things, but that's partly because (despite frequent complaints to the contrary) we don't move fast and break things most of the time.

Beyond the standard library modules, Alex Hall noted two projects on GitHub that may be of interest: PrettyPrinter and pprint++. Jorgensen said that he is looking at those as well as the others suggested in the thread. He is continuing the discussion, but is now thinking that adding dunder methods is not the right approach:

There has been some argument regarding whether objects should say how to present themselves "prettily". I think a case can be made either way, but in either case, it makes sense that it should be easy to override the representation for an object type without subclassing or monkey-patching it. Also, it might make sense not to clutter up the dunder-method space for all kinds of objects with this kind of thing.

Instead, he suggested adding a way for objects to register hooks governing how they want to be represented. It is still in the early going for any pretty-printing improvements; Jorgensen posted his initial message on March 15. Any wrangling over an API is still down the road a bit; a PEP and changes to the language, if any, are further out still. But there does seem to be a contingent that favors a feature of this sort, so it may well work its way into, say, Python 3.10, presumably sometime in 2021.
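No API for such hooks has been proposed yet, but a registration-based approach could be as simple as the sketch below. Every name in it (_hooks, register_pretty(), pretty_with_hooks()) is hypothetical and chosen only for illustration.

```python
from fractions import Fraction

_hooks = {}  # hypothetical registry: type -> formatting function


def register_pretty(cls, func):
    """Register func as the pretty-printer for cls and its subclasses."""
    _hooks[cls] = func


def pretty_with_hooks(obj):
    """Use the first hook found along the object's MRO, else repr()."""
    for cls in type(obj).__mro__:
        if cls in _hooks:
            return _hooks[cls](obj)
    return repr(obj)


# Override the representation of a standard-library type without
# subclassing or monkey-patching it
register_pretty(Fraction, lambda f: f"{f.numerator}/{f.denominator}")

print(pretty_with_hooks(Fraction(3, 4)))  # uses the registered hook
print(pretty_with_hooks(3.5))             # no hook: falls back to repr()
```

A registry like this keeps the dunder-method namespace uncluttered and leaves the decision about what "pretty" means with whoever registers the hook.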


Page editor: Jonathan Corbet


Copyright © 2020, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds