
Leading items

Welcome to the LWN.net Weekly Edition for January 16, 2020

This edition contains the following feature content:

  • The dark side of expertise: a linux.conf.au keynote on how our expertise can lead our decisions astray.
  • Grabbing file descriptors with pidfd_getfd(): a proposed system call for obtaining a copy of an open file descriptor from another process.
  • configfd() and shifting bind mounts: a proposed generalization of the new mount API and the discussion around it.
  • Accelerating netfilter with hardware offload, part 1: an overview of network-card offload capabilities and the kernel subsystems that use them.
  • Poker and FOSS: Bradley M. Kuhn on the history of free-software online poker.

This week's edition also includes these inner pages:

  • Brief items: Brief news items from throughout the community.
  • Announcements: Newsletters, conferences, security updates, patches, and more.

Please enjoy this week's edition, and, as always, thank you for supporting LWN.net.

Comments (none posted)

The dark side of expertise

By Jake Edge
January 15, 2020

LCA

Everyone has expertise in some things, which is normally seen as a good thing to have. But Dr. Sean Brady gave some examples of ways that our expertise can lead us astray, and actually cause us to make worse decisions, in a keynote at the 2020 linux.conf.au. Brady is a forensic engineer who specializes in analyzing engineering failures to try to discover the root causes behind them. The talk gave real-world examples of expertise gone wrong, as well as looking at some of the psychological research that demonstrates the problem. It was an interesting view into the ways that our brains work—and fail to work—in situations where our expertise may be sending our thoughts down the wrong path.

Brady began his talk by going back to 1971 and a project to build a civic center arena in Hartford, Connecticut in the US. The building was meant to hold 10,000 seats; it had a large roof that was a "spiderweb of steel members". That roof would be sitting on four columns; it was to be built on the ground and then lifted into place.

[Dr. Sean Brady]

As it was being built, the contractors doing the construction reported that the roof was sagging while it was still on the ground. The design engineers checked their calculations and proclaimed that they were all correct, so building (and raising) should proceed. Once on the columns, the roof was bending and sagging twice as much as the engineers had specified, but after checking again, the designers said that the calculations were all correct.

Other problems arose during the construction and, each time the contractors would point out some problem where the design and reality did not mesh, the designers would dutifully check their calculations again and proclaim that all was well. After it went up, Hartford residents twice contacted city government about problems they could see with the roof; but the engineers checked the calculations once again and pronounced it all to be fine.

In 1978, the first major snowstorm since the construction resulted in an amount of snow that was only half of the rated capacity of the roof—but the roof caved in. Thankfully, that happened in the middle of the night; only six hours earlier there had been 5,000 people in it for a basketball game.

So, Brady asked, what went wrong here? There were "reasonably smart design engineers" behind the plans, but there were also multiple reports of problems and none of those engineers picked up on what had gone wrong. In fact, there seemed to be a reluctance to even admit that there was a problem of any kind. It is something that is seen in all fields when analyzing the causes of a failure; it turns out that "people are involved". "We have amazingly creative ways to stuff things up."

Expertise

Before returning to finish the story about the arena, Brady switched gears a bit; there are lots of different human factors that one could look at for failures like that, he said, but he would be focusing on the idea of expertise. Humans possess expertise in various areas; expertise is important for us to be able to do our jobs effectively, for example. We also tend to think that more expertise is better and that it reduces the chances of mistakes. By and large, that is correct, but what if sometimes it isn't? "The greatest obstacle to knowledge is not ignorance ... it is the illusion of knowledge", he said, quoting (or paraphrasing) a well-known saying.

[Müller-Lyer optical illusion]

Before digging further in, he wanted to show "how awkward your brain is". He did so with a variant of the Müller-Lyer optical illusion that shows two lines with arrows at the end, one set pointing out and the other pointing in (a version from Wikipedia can be seen on the right). The straight line segments are the same length, which he demonstrated by placing vertical lines on the image, even though that's not what people see. He asked the audience to keep looking at the slide as he took the lines away and restored them; each time the vertical lines were gone, the line with inward-pointing arrows would return to looking longer than the other. "It's like you learned absolutely nothing", he said to laughter. Your brain knows they are the same length, but it cannot make your eye see that.

Mann Gulch

A similar effect can be seen in lots of other areas of human endeavor, he said. He turned to the example of the Mann Gulch forest fire in 1949 in the US state of Montana. A small fire on the south-facing side of a gulch (or valley) near the Missouri River was called in and a team of smokejumpers was dispatched to fight it before it could really get going.

Unfortunately, the weather conditions (abnormally high temperatures, dry air, and wind) turned the small fire into an enormous conflagration in fairly short order. Less than an hour after the smokejumpers had gathered up the equipment dropped from the plane (and found that the radio had not survived the jump due to a malfunctioning parachute), the firefighters were overrun by the fire and most of them perished.

The foreman of the team, Wagner Dodge, followed the generally accepted practices in leading the men to the north-facing slope, which was essentially just covered with waist-high grass, and then down toward the river to get to the flank of the fire. From what they knew, the fire was burning in the heavily timbered slope on the other side of the gulch. As it turned out, the fire had already jumped the gulch and was burning extremely quickly toward them, pushed by 20-40mph winds directly up the gulch into their faces. Once he recognized the problem, Dodge realized that the team needed to head up the steep slope to the top of the ridge and get over to the other side of it, which was complicated by the presence of a cliff at the top that would need to be surmounted or crossed in some fashion.

When they turned back and started up the ridge, the fire was 150 yards away and moving at 3mph; in the next 12-14 minutes it completely overtook the team. The men were carrying heavy packs and equipment so they were only moving at around 1mph on the steep slope. Dodge gave the order for everyone to drop their equipment and packs to speed their way up the slope, but many of the men seemed simply unable to do that, which slowed them down too much.

It took many years to understand what happened, but the fire underwent a transformation, called a "blow up", that made it speed up and intensify. It was burning so hard that a vacuum was being created by the convection, which just served to pull in even more air and intensify it further. It was essentially a "tornado of fire" chasing the men up the slope and, by then, it was moving at around 7mph.

Once Dodge realized that many of them were not going to make it to (and over) the ridge, he had to come up with something. For perhaps the first time in a firefighting situation, he lit a fire in front of them that quickly burned a large patch of ground up and away from the team. His idea was that the main fire would not have any fuel in that area. He ordered the men to join him in that no-fuel zone to hunker down and cover themselves as the fire roared past them, but none could apparently bring themselves to do so. Only the two youngest smokejumpers, who had reached the ridge and miraculously found a way through a crevice in the cliff in zero visibility, survived along with Dodge. Thirteen men died from the fire.

There are two things that Brady wanted to focus on. Why did the men not drop their tools and packs? And why didn't they join Dodge in the burned-out zone? If we can answer those questions, we can understand a lot about how we make decisions under pressure, he said.

Priming

In order to do that, he wanted to talk about a psychology term: priming. The idea is that certain information that your brain takes in "primes" you for a certain course of action. It is generally subconscious and difficult to overcome.

There was a famous experiment done with students at New York University that demonstrates priming. The students were called into a room where they were given a series of lists of five words that they needed to reorder to create four-word sentences, thus discarding one word. The non-control group had a specific set of words that were seeded within their lists; those words were things that normally would be associated with elderly people.

The experimenters then told the students to go up the hallway to a different room where there would be another test. What the students didn't know was that the test was actually in the hallway; the time it took each participant to walk down the hall was measured. It turned out that the students who had been exposed to the "elderly words" walked more slowly down the hall. Attendees might be inclined to call that "absolute crap", Brady suggested, but it is not; it is repeatable and even has a name, "the Florida effect", because Florida was used as one of the words associated with the elderly.

It seems insane, but those words had the effect of priming the participants to act a bit more elderly, he said. So to try to prove that priming is real, he played a different word game with the audience; it is called a "remote associative test". He put up three words on the screen (e.g. blue, knife, cottage) and the audience was to choose another word that went with all three (cheese, in that case). The audience did quite well on a few rounds of that test.

But then Brady changed things up. He said that he would put up three words, each of which was followed by another in parentheses (e.g. dark (light), shot (gun), sun (moon), answer: glasses); he told everyone to not even look at the parenthesized words. When he put the first test up, the silence was eye-opening. The words in parentheses, which no one could stop themselves from reading, of course, would send the brain down the wrong path; it would take a lot of effort to overcome the "negative priming" those words would cause. It is, in fact, almost impossible to do so.

The tests were designed by "evil psychologists" to send your brain down the wrong solution path, he said; once that happens, "you cannot stop it". "We are not nearly as rational as we think we are". If he repeated the test later without the extra negative-priming words, people would be able to come up with the right solution because their brain had time to forget about the earlier path (and the words that caused it). This is the same effect that causes people to find a solution to a problem they have in the shower or on a walk; the negative-priming influence of their work surroundings, which reinforce the non-solution path they have been on, is forgotten, so other solution paths open up.

"So at this point you might say, 'hang on Sean, those are some fancy word games, but I'm a trained professional'", he said to laughter. He suggested that some in the audience might be thinking that their expertise would save them from the effects of negative priming. Some researchers at the University of Pittsburgh wanted to test whether our expertise could prime us in the way that the parenthesized words did. They designed a study to see if they could find out.

They picked a control group, then another group made up of avid baseball fans, and did a remote associative test with both groups. Instead of putting words in parentheses, though, they allowed the baseball fans to prime themselves by using words from common baseball phrases as the first word in the test; that word was deliberately chosen to send them down an incorrect solution path.

For example, they would use "strike", "white", and "medal"; a baseball fan would think of "out", which works for the first two, but not the last, and they would get stuck at that point. Those who don't have baseball expertise will likely end up on the proper solution, which is "gold". As might be guessed, the baseball fans "absolutely crashed and burned" on the test. Interestingly, at the end of the test they were asked if they used their baseball knowledge in the test, but they said: "No, why would I? It had nothing to do with baseball." The expertise was being used subconsciously.

In another test, they warned the baseball fans ahead of time that the test was meant to mess with their heads by using their baseball knowledge against them, so they should not use that knowledge at all. The fans did just as poorly relative to the control group, which showed that the use of expertise is not only subconscious, it is also automatic.

Back to the fire

Brady then circled back to the forest fire; the men in Mann Gulch "can no sooner drop their firefighting expertise than the baseball fans could". They could not drop their physical tools and they could not drop their mental tools that told them they had to get to the ridge. They also could not accept new tools, he said; when Dodge showed them the ash-covered area that the new fire had created, they did not accept it as a new tool, instead they "defaulted to their existing expertise and worked with the tools they had".

There is a name for this, he said, it is called "The Law of the Instrument": "When all you have is a hammer, everything looks like a nail." We are all running around with our respective hammers looking for nails to hit. "We see the world through the prism of our expertise and we cannot stop ourselves from doing so."

After Mann Gulch, firefighters were told that if they got into a situation of that sort, they should drop their tools and run, but that still did not work. There were fatalities where firefighters were close to their safe zones but found with their packs still on and holding their chainsaws. The next step was to properly retrain them by having them run an obstacle course with and without their gear, then showing them how much faster they could run without it. It sounds silly, Brady said, but it worked because it gave them a new tool in their mental toolbox.

The one exception at Mann Gulch, though, is Dodge, who dropped both his physical and mental tools. He came up with a new tool on the spot; "escape fires" became part of the training for firefighters after Mann Gulch. How did that happen? Psychologists have a term for this as well: "creative desperation". When their backs are truly to the wall, some people will recognize that their expertise is not working and will not solve the problem at hand. At that point they drop their tools and see the facts for what they are, which allows them to find a solution that was outside of the path their expertise was leading them down.

Brady then returned all the way to the beginning and the Hartford civic center roof collapse. Even though there were repeated warnings that something was wrong with the design of the roof, the engineers defaulted to their expertise: "Our calculations say it's OK, so it must be OK."

This was the early 1970s, he said, why were these engineers so confident in their calculations? As guessed by many in the audience, the reason for that was "computers". In fact, when they won the bid, they told the city of Hartford that they could save half a million dollars in construction costs "if you buy us this new, whiz-bang thing called a computer". It turned out that the computer worked fine, but it was given the wrong inputs. There was an emotional investment that the engineers had made in the new technology, so it was inconceivable to them that it could be giving them the wrong answers.

He concluded by saying that no matter what field we are in, we will all encounter situations where our expertise is not a perfect fit for the problem at hand. It is important to try to recognize that situation, drop the tools that we are trying to default to, and see the facts for what they are, as Dodge did in Mann Gulch. He ended with a quote from Lao Tzu: "In pursuit of knowledge, every day something is acquired. In pursuit of wisdom, every day something is dropped."

It was an engaging, thought-provoking talk, which is generally the case with keynotes at linux.conf.au. Brady is a good speaker with a nicely crafted talk; there is certainly more that interested readers will find in the YouTube video of his presentation.

[I would like to thank LWN's travel sponsor, the Linux Foundation, for travel assistance to Gold Coast for linux.conf.au.]

Comments (37 posted)

Grabbing file descriptors with pidfd_getfd()

By Jonathan Corbet
January 9, 2020
In response to a growing desire for ways to control groups of processes from user space, the kernel has added a number of mechanisms that allow one process to operate on another. One piece that is currently missing, though, is the ability for a process to snatch a copy of an open file descriptor from another. That gap may soon be filled, though, if the pidfd_getfd() system-call patch set from Sargun Dhillon is merged.

One thing that is possible in current kernels is to open a file that another process also has open; the information needed to do that is in each process's /proc directory. That does not work, though, for file descriptors referring to pipes, sockets, or other objects that do not appear in the filesystem hierarchy. Just as importantly, though, opening a new file in this way creates a new entry in the file table; it is not the entry corresponding to the file descriptor in the process of interest.
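
As a rough sketch of that existing mechanism (not something from the patch set), opening another process's file through /proc might look like the following; the helper name and the lack of error handling are illustrative only:

    #include <fcntl.h>
    #include <stdio.h>
    #include <sys/types.h>

    /* Open a new descriptor for the file that "targetfd" refers to in
       process "pid", by way of /proc.  This creates a new file-table
       entry and, as described above, does not work for sockets and
       other objects outside the filesystem hierarchy. */
    static int open_proc_fd(pid_t pid, int targetfd, int flags)
    {
        char path[64];

        snprintf(path, sizeof(path), "/proc/%d/fd/%d", (int)pid, targetfd);
        return open(path, flags);
    }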

That distinction matters if the objective is to modify that particular file descriptor. One use case mentioned in the patch series is using seccomp to intercept attempts to bind a socket to a privileged port. A privileged supervisor process could, if it so chose, grab the file descriptor for that socket from the target process and actually perform the bind — something the target process would not have the privilege to do on its own. Since the grabbed file descriptor is essentially identical to the original, the bind operation will be visible to the target process as well.

For the sufficiently determined, it is actually possible to extract a file descriptor from another process now. The technique involves using ptrace() to attach to that process, stop it from executing, inject some code that opens a connection to the supervisor process and sends the file descriptor via an SCM_RIGHTS datagram, then run that code. This solution might justly be said to be slightly lacking in elegance. It also requires stopping the target process, which is likely to be unwelcome.

This functionality, without the need to stop the target process, is relatively easy to implement in the kernel, though; a supervisor process would merely need to make a call to:

    int pidfd_getfd(int pidfd, int targetfd, unsigned int flags);

The target process is specified by pidfd (which is, as one might expect, a pidfd, presumably obtained when the process was created). The file descriptor to grab is given by targetfd; if all goes well, the return value will be a local file-descriptor number corresponding to the target process's file. For all to go well, the calling process must have the ability to call ptrace() on the target process.

The flags argument is currently unused and must be zero. There are, evidently, plans to add flags in the future, though. One would cause the file descriptor to be closed in the target process after being copied to the caller, thus truly "stealing" the descriptor from the target. Another would remove any related control-group data from socket file descriptors during the copy operation.
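
There is, naturally, no C-library wrapper for a system call this new, so a user-space caller would presumably invoke it with syscall(). The following is a minimal sketch, assuming kernel headers that define __NR_pidfd_getfd and a pidfd obtained elsewhere (from clone() or pidfd_open(), for example):

    #define _GNU_SOURCE
    #include <sys/syscall.h>
    #include <unistd.h>

    /* Thin wrapper; __NR_pidfd_getfd is only present in headers from a
       kernel carrying this patch set. */
    static int pidfd_getfd(int pidfd, int targetfd, unsigned int flags)
    {
        return syscall(__NR_pidfd_getfd, pidfd, targetfd, flags);
    }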

This patch set has been through an impressive number of versions — and a fair amount of evolution — since it was first posted on December 5. The initial version added a new PTRACE_GETFD command to ptrace(). Version 3 switched to an ioctl() operation on a pidfd instead. In version 5, fifteen days after the initial posting, this functionality moved into a separate system call. The current posting is version 9.

From the beginning there has not been much concern about the goals behind this feature; the comments have mostly focused on the implementation. At this point, Dhillon would appear to have just about exhausted the set of possible implementations — though some might be justified in thinking that a BPF version in the near future is inevitable. Failing that, this new system call may well be on track for the 5.6 or 5.7 merge window.

Comments (15 posted)

configfd() and shifting bind mounts

By Jonathan Corbet
January 10, 2020
The 5.2 kernel saw the addition of an extensive new API for the mounting (and remounting) of filesystems; this article covered an early version of that API. Since then, work in this area has mostly focused on enabling filesystems to support this API fully. James Bottomley has taken a look at this API as part of the job of redesigning his shiftfs filesystem and found it to be incomplete. What has followed is a significant set of changes that promise to simplify the mount API — though it turns out that "simple" is often in the eye of the beholder.

The mount API work replaces the existing, complex mount() system call with a half-dozen or so new system calls. An application would call fsopen() to open a filesystem stored somewhere or fspick() to open an already mounted filesystem. Calls to fsconfig() set various parameters related to the mount; fsmount() is then called to mount a filesystem within the kernel and move_mount() to attach the result to the filesystem hierarchy somewhere. There are a couple more calls to fill in other parts of the interface as well. The intent is for this set of system calls to be able to replace mount() entirely with something that is more flexible, capable, and maintainable.
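
As a point of reference for the discussion below, mounting a tmpfs with those calls might look something like this sketch; it uses raw syscall() invocations (C-library wrappers are not yet available), assumes kernel headers providing the FSOPEN_*, FSCONFIG_*, FSMOUNT_*, and MOVE_MOUNT_* constants, and omits error checking:

    #include <fcntl.h>
    #include <linux/mount.h>
    #include <sys/syscall.h>
    #include <unistd.h>

    /* Create and attach a tmpfs instance at "path" using the new mount API. */
    static int mount_tmpfs(const char *path)
    {
        int fsfd = syscall(__NR_fsopen, "tmpfs", FSOPEN_CLOEXEC);

        syscall(__NR_fsconfig, fsfd, FSCONFIG_SET_STRING, "size", "16M", 0);
        syscall(__NR_fsconfig, fsfd, FSCONFIG_CMD_CREATE, NULL, NULL, 0);

        int mfd = syscall(__NR_fsmount, fsfd, FSMOUNT_CLOEXEC, 0);

        return syscall(__NR_move_mount, mfd, "", AT_FDCWD, path,
                       MOVE_MOUNT_F_EMPTY_PATH);
    }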

Back in November, Bottomley discovered one significant gap with the new API: it is not possible to use it to set up a read-only bind mount. The problem is that bind mounts are special; they do not represent a filesystem directly. Instead, they can be thought of as a view of a filesystem that is mounted elsewhere. There is no superblock associated with a bind mount, which turns out to be a problem where the new API is concerned, since fsconfig() is designed to operate on superblocks. An attempt to call fsconfig() on a bind mount will end up modifying the original mount, which is almost certainly not what the caller had in mind. So there is no way to set the read-only flag for a bind mount.
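
For comparison, with the venerable mount() system call a read-only bind mount is created in two steps: the bind itself, then a remount that applies the read-only flag. A minimal sketch:

    #include <sys/mount.h>

    /* Bind-mount "src" at "dst", then remount the bind read-only; mount()
       requires the read-only flag to be applied in a second step. */
    static int bind_mount_ro(const char *src, const char *dst)
    {
        if (mount(src, dst, NULL, MS_BIND, NULL) < 0)
            return -1;
        return mount(NULL, dst, NULL, MS_REMOUNT | MS_BIND | MS_RDONLY, NULL);
    }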

David Howells, the creator of the new mount API, responded that what is needed is yet another system call, mount_setattr(), which would change attributes of mounts. That would work for the read-only case, Bottomley said, but it falls down when it comes to more complex situations, such as his proposed UID-shifting bind mount. Instead, he said, the file-descriptor-based configuration mechanism provided by fsconfig() is well suited to this job, but it needs to be made more widely applicable. He suggested that this interface be made more generic so that it could be used in both situations (and beyond).

He posted an initial version of this proposed interface in November, and has recently come back with an updated version. It adds two new system calls:

    int configfd_open(const char *name, unsigned int flags, unsigned int op);
    int configfd_action(int fd, unsigned int cmd, const char *key, void *value,
    			int aux);

A call to configfd_open() would open a new file descriptor intended for the configuration of the subsystem identified by name; the usual open() flags would appear in flags, and op defines whether a new configuration instance is to be created or an existing one modified. configfd_action() would then be used to make changes to the returned file descriptor. The fsconfig() system call (along with related parts like fsopen() and fspick()) is reimplemented using the new calls. Bottomley provides an example for mounting a tmpfs filesystem:

    fd = configfd_open("tmpfs", O_CLOEXEC, CONFIGFD_CMD_CREATE);
    configfd_action(fd, CONFIGFD_SET_INT, "mount_attrs", NULL,
		    MOUNT_ATTR_NODEV|MOUNT_ATTR_NOEXEC);
    configfd_action(fd, CONFIGFD_CMD_CREATE, NULL, NULL, 0);
    configfd_action(fd, CONFIGFD_GET_FD, "mountfd", &mfd, O_CLOEXEC);
    move_mount("", mfd, AT_FDCWD, "/mountpoint", MOVE_MOUNT_F_EMPTY_PATH);

The configfd_open() call creates a new tmpfs instance; the first configfd_action() call is then used to set the nodev and noexec mount flags on that instance. The filesystem mount is actually created with another configfd_action() call, and the third such call is used to obtain a file descriptor for the mount that can be used with move_mount() to make the filesystem visible.

With that infrastructure in place, Bottomley is able to reimplement his shiftfs filesystem as a type of bind mount. A shifting bind mount will apply a constant offset to user and group IDs before forwarding operations to the underlying mount; this is useful to safely allow true-root access to an on-disk filesystem from within a user namespace.

Only one developer, Christian Brauner, has responded to this patch series so far; he doesn't like it. It is an excessive collection of abstraction layers, he said, and it creates another set of multiplexing system calls, a design approach that is out of favor these days:

If they are ever going to be used outside of filesystem use-cases (which is doubtful) they will quickly rival prctl(), seccomp(), and ptrace(). That's not a great thing. Especially, since we recently (a few months ago with Linus chiming in too) had long discussions with the conclusion that multiplexing syscalls are discouraged, from a security and api design perspective.

Unsurprisingly, Bottomley disagreed. He argued that there is a common pattern that arises in kernel development: a subsystem that is complicated to configure, but then relatively simple to use. Filesystem mounts are an example of this pattern; the setup is hard, but then they can all be accessed through the same virtual filesystem interfaces. Cryptographic keys and storage devices were also mentioned. It would be better, he said, to figure out a common way of interfacing with these subsystems rather than inventing slightly different interfaces every time. The configuration file descriptor approach may be a good solution for that common way, he said:

I don't disagree that configuration multiplexors are a user space annoyance, but we put up with them because we get a simple and very generic API for the configured object. Given that they're a necessary evil and a widespread pattern, I think examining the question of whether we could cover them all with a single API and what properties it should have is a useful one.

The conversation appears to have stalled out at this point. It is hard to guess how this disagreement will be resolved, but one thing is fairly straightforward to point out: if the configfd approach is deemed unacceptable for the kernel, then somebody needs to come up with a better idea for how the problems addressed by configfd will be solved. Thus far, that better idea has not yet shown up on the mailing lists.

Comments (21 posted)

Accelerating netfilter with hardware offload, part 1

January 14, 2020

This article was contributed by Marta Rybczyńska

Supporting network protocols at high speeds in pure software is getting increasingly difficult, with 25-100Gb/s interfaces available now and 200-400Gb/s starting to show up. Packet processing at 100Gb/s must happen in 200 cycles or less, which does not leave much room for processing at the operating-system level. Fortunately some operations can be performed by hardware, including checksum verification and offloading parts of the packet send and receive paths.

As modern hardware adds more functionality, new options are becoming available. The 5.3 kernel includes a patch set from Pablo Neira Ayuso that added support for offloading some packet filtering with netfilter. This patch set not only adds the offload support, but also performs a refactoring of the existing offload paths in the generic code and the network card drivers. More work came in the following kernel releases. This seems like a good moment to review the recent advancements in offloading in the network stack.

Offloads in network cards

Let us start with a refresher on the functionality provided by network cards. A network packet passes through a number of hardware blocks before it is handled by the kernel's network stack. It is first received by the physical layer (PHY) processor, which deals with the low-level aspects, including the medium (copper or fiber for Ethernet), frequencies, modulation, and so on. Then it is passed to the medium access control (MAC) block, which copies the packet to system memory, writes the packet descriptor into the receive queue, and possibly raises an interrupt. This allows the device driver to start the processing in the network stack.

MAC controllers, however, often include other logic, including specific processors or FPGAs, that can perform tasks far beyond launching DMA transfers. First, the MAC may be able to handle multiple receive queues, which allow packet processing to be spread across different CPUs in the system. It can also recognize groups of packets sharing the same source and destination addresses and ports, called "flows" in this context; different flows can be directed to specific receive queues. This has performance benefits, including better cache usage. More than that, MAC blocks can perform actions on flows, such as redirecting them to another network interface (when there are multiple interfaces in the same MAC), dropping packets in response to a denial-of-service attack, and so on.

The hardware behind that functionality includes two blocks that are important for netfilter offload: a parser and a classifier. The parser extracts fields from packets at line speed; it understands a number of network protocols, so that it can handle the packet at multiple layers. It usually extracts both well-known fields (like addresses and port numbers) and software-specified ones. In the second step the classifier uses the information from the parser to perform actions on the packet.

The hardware implementation of those blocks uses a structure called ternary content-addressable memory (TCAM), a special type of memory that uses three values (0, 1 and X) instead of the typical two (0 and 1). The additional X value means "don't care" and, in a comparison operation, it matches both 0 and 1. A typical parser provides a number of TCAM entries, with each entry associated with another region of memory containing actions to perform. That implementation allows the creation of something like regular expressions for packets; each packet is compared in hardware with the available TCAM entries, yielding the index for any matching entries with the actions to perform.
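
Conceptually, each TCAM entry can be thought of as a value/mask pair, where bits cleared in the mask are the "don't care" bits. The following is a purely illustrative software model of that matching (real hardware compares a key against all entries in parallel, and no driver is written this way):

    #include <stdint.h>

    /* A simplified TCAM entry: "value" holds the bits to match, "mask"
       selects which bits matter (1) and which are "don't care" (0). */
    struct tcam_entry {
        uint32_t value;
        uint32_t mask;
        int action;        /* index of the actions to run on a match */
    };

    /* Return the action of the first matching entry, or -1 if none match. */
    static int tcam_lookup(const struct tcam_entry *table, int nentries,
                           uint32_t key)
    {
        for (int i = 0; i < nentries; i++)
            if ((key & table[i].mask) == (table[i].value & table[i].mask))
                return table[i].action;
        return -1;
    }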

The number of TCAM entries is limited. For example, controllers in Marvell SoCs like Armada 70xx and 80xx have a TCAM with 256 entries (covered in a slide set [PDF] from Maxime Chevallier's talk about adding support for classification offload to a network driver at the 2019 Embedded Linux Conference Europe). In comparison, netfilter configurations often include thousands of rules. Clearly, one of the challenges of configuring a controller like this is to limit the number of rules stored in the TCAM. It is also up to the driver to configure the device-specific actions and different types of classifiers that might be available. The hardware is usually complex, and drivers typically support only a subset of its features.

Offload capabilities in MAC controllers can be more sophisticated than that. They include implementations of offloading for the complete TCP stack, called TCP offload engines. Those are currently not supported by Linux, as the code needed to handle them raised many objections years ago from the network stack maintainers. Instead of supporting TCP offloading, the Linux kernel provides support for specific, mostly stateless offloads.

Interested readers can find the history of the offload development in a paper [PDF] from Jesse Brandeburg and Anjali Singhai Jain, presented at the 2018 Linux Plumbers Conference.

Kernel subsystems with filtering offloads

The core networking subsystem supports a long list of offloads to network devices, including checksumming, scatter/gather processing, segmentation, and more. Readers can view the lists of available and active offload functionality on their machine with:

    ethtool --show-offload <interface>

The lists will be different from one interface to another, depending on the features of the hardware and the associated driver. ethtool also allows configuring those offloads; the manual page describes some of the available features.

The other subsystem making use of hardware offloads is traffic control (configured with the tool of the same name, tc), which allows administrators to set up classification and scheduling of network packets; the tc manual page offers an overview of the available features, in particular the flower classifier. Practical examples of tc use include limiting bandwidth per service or giving priority to certain traffic. Interested readers can find more about tc flower offloads in an article [PDF] by Simon Horman presented at NetDev 2.2 in November 2017.

Up to this point, filtering offloads were possible with both tc and ethtool; the two features were implemented separately in the kernel. That separation also meant duplicated work for the authors of network card drivers, as each offload implementation used different driver callbacks. With netfilter arriving as a third subsystem adding offload functionality, the developers started working on common paths; this required refactoring some of the common code and changing the callbacks to be implemented by the drivers.

Summary

Network packet processing with high-speed interfaces is not an easy task; the number of CPU cycles available per packet is small. Fortunately, the hardware offers offload capabilities that the kernel can use to ease the task. In this article we have provided an overview of how a network card works and some offload basics. This lays the foundation for the second part, which will look into the details of the changes brought by the netfilter offload functionality: in the common code, in what it means for driver authors, and, of course, in how to use the netfilter offloads.

Comments (15 posted)

Poker and FOSS

By Jake Edge
January 15, 2020

LCA

The intersection of games with free and open-source software (FOSS) was the topic of a miniconf on the first day of this year's linux.conf.au, which was held January 13-17 in Gold Coast, Australia. As part of the miniconf, Bradley M. Kuhn gave a talk that was well outside of his normal conference-talk fare: the game of poker and its relationship to FOSS. It turns out that he did some side work on a FOSS-based poker site along the way, which failed by most measures, but there was also an element of success to the project. The time for a successful FOSS poker project likely has passed at this point, but there are some lessons to be learned from the journey.

[Bradley M. Kuhn]

The session began with a bit of a jarring sight; after failing to get his "100% free-software laptop" to talk to the projector and running into some other technical hurdle using Karen Sandler's 98% free-software laptop, he ended up presenting his slides from miniconf organizer Tim Nugent's macOS laptop. Kuhn's laptop only has VGA output, which is likely what the input to the projector also is, he said, but having two VGA-to-HDMI adapters in the path has increasingly become a problem for him at conferences. It is yet another example of the difficulty of using free software these days, which is a topic that he and Sandler would be giving a talk on later in the week.

He began with a disclaimer that his employer, Software Freedom Conservancy, had no opinions on nearly everything in the talk. The organization does share his opposition to proprietary software, but online poker and the industry around it are not in its purview, so all of the opinions he would be giving were strictly his own.

Some background

He introduced poker with a definition: "Poker is a gambling game of strategy played by people for money, using cards". The order of the terms in that definition is important, he said. In online poker, though, the "people" element is weakened because you can't see and directly interact with the other people you are playing with. So, unlike real-life poker, online poker is more about sociology than psychology; serious players track the trends of the player base as a whole, rather than trying to recognize the quirks of a particular person.

That means online poker is "really about money". In order to succeed, one has to develop some weird views of the value of money. Even in games with relatively small stakes, players can win or lose a few thousand dollars in a session; in games with "nosebleed stakes", a player could be up or down by a million dollars in an evening. The game is particularly popular in the US, UK, and Australia, he said; it is played online and in face-to-face games in people's homes or at casinos.

Poker became mainstream in the late 1990s, largely due to the "Late Night Poker" television series in the UK. There are a lot of different kinds of poker games, but the show focused on no-limit Texas hold 'em, which is the most "high drama of poker games" so it was well-suited to television. The show pioneered the use of a hole-card camera, so that viewers could see the two unseen cards each player was dealt. That innovation allowed viewers and commentators to analyze the choices that the players were making; without seeing the hole cards, watching other people play poker is about as interesting as "watching paint dry", Kuhn said.

He did not go into the rules of poker much in the talk; a lot of it is not really germane to his topic. The important things to note are that it is a zero-sum, partial-information game where players are playing against each other and not the house (as they are in most other gambling games). It is a game of skill—better players win more over time—but there is a huge element of chance. In order for the house to make any money (casinos are not charities after all), a small percentage of the bets are kept by the house, which is usually called the "rake".

All of that made poker an ideal candidate for online play. He put up a screen shot of an online poker game from 1999 and noted that all of today's poker sites have a similar look. It features a simple user interface that allows players to quickly and easily see the cards and make their bets. Most online poker players do not want sophisticated graphics and the like.

So poker is relatively easy to write an online system for; there are a few "tricky bits", but in comparison to, say, an online multiplayer role-playing game, there are only minimal timing or network-delay issues to handle. It is completely turn-based and the state of the game is easily maintained on the server side. In addition, the client does not need any secret information, so the ability to cheat by extracting secrets from the data sent back and forth is eliminated—or, at least, it should be. The main problem for these systems is scaling them to accommodate as many tables as there is demand for. Serious players want to play in multiple games at once and the house maximizes its revenue by the number of games it can run.

The "watershed moment" for online poker came in 2003 when Chris Moneymaker—his actual birth name, as has been documented—joined into a "satellite tournament" for the World Series of Poker (WSoP). Moneymaker paid $86 to enter the tournament and ended up winning the $10,000 entry into the main WSoP event in Las Vegas; he won that tournament and received $2.5 million for doing so. That created a huge boom in online poker, Kuhn said.

FOSS poker

It turns out that FOSS was both early and late to the online poker world; it was there first, but did not keep up as the market grew, Kuhn said. The rec.gambling.poker newsgroup spawned IRC poker, which was, naturally, played over Internet Relay Chat (IRC). It was not a real money game, but bragging rights within the newsgroup community were important. It eventually stopped being maintained and was gone by the time the other online poker sites started to arise.

Poker hands are ranked based on the rules of the specific type of game, so a "hand evaluator" is needed to determine which hands are better or worse than other hands. The first major hand evaluator and odds calculator was written in 1994 by Cliff Matthews and, perhaps surprisingly, was released under the GPL. It is a highly optimized, fast implementation with a compact hand representation that makes it quite popular to use where hand evaluation is needed.

The hand evaluator code is used by some television shows to display the win likelihood percentage for each of the hidden hands based on the common cards seen so far. He suspects that most of the online poker games use a fork of the code; the server is the only part of the system that needs a hand evaluator, so the forks do not need to be released. He sometimes wonders how things might have been different if a network-services copyleft license (i.e. Affero GPL) had been available and was used by the hand evaluator when it was created.

In 2003, a French company called Mekensleep tried to create the "ultimate poker game" with a 3D site where player avatars would do whizzy animated chip tricks and the like. It looked like a real poker game, but it "failed miserably". As it turned out, though, the company hired Kuhn's lifelong friend Loïc Dachary as its CTO.

Dachary wanted to hire Kuhn as a consultant to work on a free-software implementation of the online system for Mekensleep. But Kuhn was employed at the Free Software Foundation at the time and said he would work on it on weekends for free. He knew that Dachary was morally opposed to proprietary software, so he could trust that the resulting code would be under the GPL or Affero GPL. Kuhn worked on the project from 2003 until 2005, when it failed.

The reason that it failed, he said, was because no one wants poker to be like a regular video game. Poker is gambling and the goal for its players is to maximize the number of hands per hour that they play; for good players that equates to making more money and for bad players, "they get more gambling". No one wants to sit in front of a single game, no matter how exciting the animations of the players are; most people play at least two or three simultaneous games, while serious players will sometimes play up to 100 games at once. When he played online poker seriously, he would come home from work and start up nine games on a site with a lot of bad players that he had found.

Cheating scandals

FOSS lost a lot of opportunity by not jumping on the online poker bandwagon more seriously, especially once the scandals in that world started to come to light. For example, one system that was used by two different large sites had implemented a "god mode" in the (proprietary) client, where a special password entered into it would enable that person to see everyone else's cards. The transparency of FOSS could help some with that particular problem, though shady operators would certainly still have the ability to cheat in a variety of ways.

It took a while to discover the god-mode cheat. It essentially came down to the person who bought the password from the original authors using the knowledge in such a way that it became obvious they could see the cards. Poker players all over the world combined the hand records from their games and found that a certain player had impossibly low values for a particular poker statistic; even the best players lose roughly 60% of the time when they call the final bet, and they just win enough on the other 40% to more than cover those losses. But the cheating player only lost 2% in those situations, which is effectively impossible unless you know the other players' cards.

In another scandal, Full Tilt Poker, which was co-founded by Chris Ferguson who had worked on IRC poker, co-mingled the money that it held for its customers (i.e. the balance in their accounts) with its operating funds. Its expenses started eating into the player money and it could not pay out the players. The company folded and the founders were charged with running a pyramid scheme.

More than just the game

The FOSS efforts tended to focus on making the game portion of the system better, but that's not really the important part, Kuhn said. Players want the UI to be essentially the same as that of the original sites from 1999 so they can play more and more games, which means there is not really much opportunity for innovation there.

The other piece of the puzzle is the infrastructure software to make the whole site work. That includes tasks like cashier services, player database management, customer relationship management, and collusion detection. It would be easy for, say, three friends to play at the same table and to share information over the phone or in some other way; other players at that table are at an extreme disadvantage, so it is important to be able to detect and prevent that kind of cheating. The proprietary sites do that, but there is no FOSS anti-collusion equivalent.

But there is a free-software implementation for an online poker game, called Pokersource, though the repository is not available yet. Kuhn was only notified that his talk had been accepted around a week beforehand, or else he would have had the repository up before the talk, he said. There are some outdated versions out there, but the current code is on his laptop and will be made available soon. It is an outgrowth of the effort by Mekensleep; after the project failed, Dachary built a consulting company called OutFlop around the existing code. That code does not have all of the infrastructure parts that would be needed to turn it into a real online poker site, however.

OutFlop got a large French social media site as its primary client. The site wanted to run poker using play money and award prizes to the winners. That allowed Dachary to pay a network of around ten contractors for part-time work on a FOSS code base, including Kuhn. For around three years, the contractors worked to improve the fully FOSS software, which is certainly something of a success. Not every FOSS project is going to last for generations like Linux, he said; Pokersource is kind of a small FOSS success story.

The future

"There's no future for free-software online poker at this point", Kuhn said. In 2006, the US passed a law that prevented banks from processing gambling transactions, which made it much harder for players to deposit money and withdraw winnings from the sites. The US Department of Justice warned all of the sites operating in the US to shut down at that time, but only one complied. Then in 2011, on what is called "Black Friday" in poker circles, all of the online poker sites operating in the US were shut down by the government. The pyramid scheme at Full Tilt Poker came to light within weeks after that as players tried to recover their money.

After the shutdown, one site made a deal with the government to take over the debt that Full Tilt had, pay out the players, and pay all of the fines, with the agreement that it could continue operating as individual US states made online poker legal in their jurisdictions. That company is now a juggernaut that would be very difficult for a FOSS project to overcome, even if the investment into the needed infrastructure code was made.

Play money sites are not really a likely possibility either, he thinks. Even though Pokersource could be used to create such a site, there does not seem to be a lot of interest in doing so. For one thing, Zynga put a play money poker game on Facebook early on and pretty much all of the players who might be interested in playing for prizes or bragging rights are playing there. Beyond that, poker players tend to lose interest if there is no money involved, so it would be hard to attract them to running and maintaining such a site.

In answer to a question from Nugent, Kuhn said that he thought there might be room for FOSS games in the wider gaming world. He does not have a lot of knowledge about gaming, but he believes that poker has quirks that make it hard for FOSS to find a way in; it is so rigidly specified, has no room for UI innovation, and is so money-focused that it may just not be a good fit. A new game type or interaction mechanism could perhaps come out of the FOSS world and have a lot of success.

It was an engaging talk, full of anecdotes and tidbits about online poker, that will presumably be available in video form before too long. There was a lot more that he covered in the talk and in the Q&A session; interested readers may want to track the video down when it is available in the LCA 2020 channel on YouTube.

[I would like to thank LWN's travel sponsor, the Linux Foundation, for travel assistance to Gold Coast for linux.conf.au.]

Comments (7 posted)

Page editor: Jonathan Corbet


Copyright © 2020, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds