LWN.net Weekly Edition for June 21, 2018
Welcome to the LWN.net Weekly Edition for June 21, 2018
This edition contains the following feature content:
- Toward a fully reproducible Debian: a talk on the reproducible builds project.
- PEP 572 and decision-making in Python: a look at assignment expressions with an eye toward avoiding the thread explosion that came with the discussion of the feature.
- Getting along in the Python community: finding ways to avoid making the mailing lists an unwelcoming place even in the face of rudeness.
- Mentoring and diversity for Python: a discussion on how to increase the diversity within the Python core development team.
- TCP small queues and WiFi aggregation — a war story: tracking down a performance problem in the networking stack.
- 4.18 Merge window, part 2: the last merges before the 4.18 merge window closed.
This week's edition also includes these inner pages:
- Brief items: Brief news items from throughout the community.
- Announcements: Newsletters, conferences, security updates, patches, and more.
Please enjoy this week's edition, and, as always, thank you for supporting LWN.net.
Toward a fully reproducible Debian
It's been a little over one year since we last covered Debian's reproducible builds project. The effort has not stopped in the interim; progress continues to be made, the message has sharpened up, and word is spreading. Chris Lamb, speaking about this at FLOSS UK in a talk called "You may think you're not a target: a tale of three developers", hinted that the end may be starting to come into sight.

The three developers of the title are part of the sharpened message, each being an example of the problem that reproducible builds aim to solve. Alice, a system administrator who contributes to a Linux distribution, is building her binaries on servers that, unknown to her, have been compromised; her binaries are trojan horses, carrying malicious content into systems that run them. Bob, a privacy-oriented developer, makes a privacy-preserving browser, but is being blackmailed into secretly including vulnerabilities in the binaries he provides. Carol is a free-software user whose laptop is being attacked by an evil maid called Eve; each time Carol shares free software with her friends, it is pre-compromised by Eve. All of these attacks hurt free-software users and the reputation of free software as a whole.
Worse, the mere existence of these
classes of attack is a disincentive to share software. People like Alice
may reason that, if their servers turn out to be compromised, they will
be blamed for the malicious software they have unwittingly distributed,
and that the potential opprobrium may not be justified by the fleeting
gratitude they currently get for their unpaid work. Others who have
servers, skills, and time they might once have volunteered to help build
free software may similarly decline to paint a target upon themselves.
In Lamb's words, they may say "I'm not going to do this free software
lark. I'm going to go for a walk instead". Participation in the
community is reduced, as is the trust we all place in the binaries
we install.
Building everything from sources that one has hand-inspected is a solution to this, but it doesn't scale. Many of us aren't qualified to spot security weaknesses (Lamb's specific example was the one-line patch that throttled Debian's ability to generate random keys, back in 2008), and in any case you still need to get that initial compiler from somewhere. Many users, even those who love free software and wish to use it in preference to proprietary software, will continue to install binaries. I will be one of them.
But if compilation from a given set of sources in a given environment always resulted in binaries that were bit-for-bit identical to each other, having confidence in the integrity of your binaries would be a much easier proposition, since you could compare your own binary copy with those of a suitable number of others. You could ring a friend and compare checksums, or you could perhaps participate in a distributed checksum validation scheme comparable to the old Perspectives system for distributed validation of SSL certificates. Many strategies for increasing confidence would be possible, but only if the build is reproducible. That is what Debian has been striving for, and why.
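As a simple illustration (ours, not from the talk), here is a minimal Python sketch of such a comparison; the package filename and the published digest are hypothetical stand-ins:

import hashlib

def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

# Hypothetical names: the package built locally and a digest published by
# someone else who built the same sources in the same environment.
local = sha256_of("hello_2.10-2_amd64.deb")
published = "..."  # obtained out of band, e.g. from a friend
print("binaries match" if local == published else "MISMATCH: " + local)

The comparison is only meaningful if both parties can expect to produce bit-for-bit identical output, which is exactly what reproducible builds promise.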
There are other advantages to reproducible builds. For developers, they mean that successive generations of a binary should change only in proportion to the source changes that were made between them; the only changes you should see in your binaries are the ones you intended to be there. It helps cut down on unnecessary build dependencies; you can remove a dependency and rebuild, and if the binary hasn't changed, you didn't need that dependency and can get rid of it. In some cases, it can even help find bugs: Lamb referred to a build that had been made non-reproducible by a 15-digit random number that was generated during each build and baked into the resulting binary. It turned out that it was used as an OpenID secret, which meant that everyone running a given build of the software was using the same secret key.
Clearly, reproducible builds are a good thing, but it turns out they aren't trivial. As we reported earlier, many build systems put timestamps inside binaries, which is an obvious problem. But some go further and include user, group, and umask information, and sometimes environment variables, which are also a problem. Build paths are often rolled in, for example in C++ assertions. File ordering can be an issue, because Unix doesn't specify an order in which readdir() and listdir() should return the contents of a directory, so components can get built in an unpredictable order. If these components are packed into a binary in the order they're returned, each build will be different even if no other change has been made.
Similar problems exist with dictionary key ordering: for example, a build that iterates over the keys of a Perl hash will have problems, since these elements are also returned in a variable order. Parallelism in a build process can also make the build order non-deterministic, because different elements can build at different speeds at different times.
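To make the file-ordering and timestamp problems concrete, here is a small Python sketch (ours, not Debian's actual tooling) of the kind of normalization a build step can apply: walk the tree in sorted order instead of trusting readdir() order, and clamp timestamps and ownership before packing files into an archive. It assumes the SOURCE_DATE_EPOCH convention promoted by the reproducible-builds project:

import os
import tarfile

# SOURCE_DATE_EPOCH is a fixed timestamp derived from the source (for
# example, the latest changelog entry) rather than the time of the build.
epoch = int(os.environ.get("SOURCE_DATE_EPOCH", "0"))

def add_tree_deterministically(tar, root):
    """Add a directory tree in sorted order with normalized metadata."""
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()              # os.walk() otherwise follows readdir() order
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            info = tar.gettarinfo(path)
            info.mtime = epoch       # no build-time timestamps
            info.uid = info.gid = 0  # no builder-specific user/group
            info.uname = info.gname = ""
            with open(path, "rb") as f:
                tar.addfile(info, f)

with tarfile.open("output.tar", "w") as tar:
    add_tree_deterministically(tar, "build/")

Run twice on the same tree, a script like this should produce identical archives regardless of the order in which the filesystem happens to list the files.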
Debian's approach to this has been the development of the torture test. Everything is built twice, an A build and a B build, and between the two builds as much as possible is varied. The clock on the B build server is 18 months ahead of the clock on the A server; their hostnames and domain names are different. The reproducible build team developed a FUSE filesystem called disorderfs, which tries to be as non-deterministic as a working filesystem can be. They vary the time zone, locale, UID, GID, and kernel, all to try to determine to a high degree of accuracy whether a given build is reproducible.
When this work started in 2013, 24% of the software in Debian would build reproducibly. As we reported earlier, by 2015 it was up to about 75%, and as of March 2018, said Lamb, 93% of the packages in Debian 10 (buster) built reproducibly on amd64. But, while the proportion has been steadily increasing, the increase hasn't been monotonic. Lamb's graph of reproducibility vs. time since late 2014 showed a couple of big backslides. These, he said, tended to correlate with new variations introduced to the torture test. A sharp drop in reproducibility in late 2016, for example, marked the introduction of variable build paths. The issues that this exposed were dealt with over the next four months; this work included a patch to GCC.
Meanwhile, the idea is spreading; various distribution and build-system projects, including coreboot, Fedora, LEDE, OpenWRT, NetBSD, FreeBSD, Arch Linux, Qubes, F-Droid, NixOS, Guix, and Meson, have all joined the reproducible builds project. Three reproducible build summits have been held, and more are anticipated. Good tools other than disorderfs have come out of the project, such as the .buildinfo file we covered earlier. We also covered diffoscope earlier; it can now interpret about sixty different types of content, from Android APKs to xz-compressed files. As happens with any good tool, people are starting to find other uses for it. Lamb said that he found it particularly helpful for verifying that security patches didn't touch any more of a binary than he expected them to; he also noted its utility in comparing binary blobs such as an old and a new router firmware image.
An honest look at limitations is a good thing. In response to a later question about whether diffoscope could identify "effectively reproducible" builds that differ in only trivial ways, Lamb declined to come up with a definition of "sufficiently reproducible". For him, the reproducibility test is exact binary compatibility; diffoscope is a tool to help diagnose failures to meet that standard, not an opportunity to lower the bar. Reproducible builds themselves are no panacea. They do nothing to help find backdoors or other vulnerabilities in the source; if your Git repository has been compromised, reproducible builds won't save you. Similarly, they do nothing to find or fix programming errors, weak algorithm choices, or "testing" modes in the style of Volkswagen.
Further improvement is possible. User interfaces for handling the installation of software that cannot be built reproducibly could do with a lot of improvement over Debian's current offering. Toolchains continue to need fixes: the GCC patch referred to earlier has not yet been merged upstream, and a whole bunch of OCaml packages aren't reproducible. The advantage of fixing these at the toolchain level is that you fix a given issue across half a million packages at once instead of having to modify each individual package's build to work around it. Lamb noted that help from OCaml and R experts would be particularly valuable right now.
He hopes that the next release of Debian will be 100% reproducible, and noted that progress can be seen at isdebianreproducibleyet.com. He mentioned that at some point a policy change to Debian might be considered, such that software that wouldn't build reproducibly wouldn't be accepted for inclusion. In response to my question, he said that 93% is to his mind too early for such a change, but that once reproducible builds get to 98-99% he'd become much more supportive of it.
In the tricky middle ground of 95-96%, his position would depend on why builds were non-reproducible, as there are a few valid reasons for this to happen. In response to another question, he said that two good reasons for a non-reproducible build were packages that build inside their own virtual machine, such as Emacs, and security packages with signing keys such as secure boot. The former, he thinks, can probably be solved with enough work; the latter can't be fixed, but there are only a couple of them, and an exception could be made.
I'm a big fan of tools that let me manage my own security. I prefer a YubiKey to an RSA token, because I can generate and load my own secrets. This reproducible build work cheers me up because it allows me to manage the security of my vendor-supplied binaries in addition to trusting my vendor's building and signing infrastructure. Hopefully, Debian will reach a point where it mandates reproducible builds and others will soon follow suit.
[Thanks to the Linux Foundation, LWN's travel sponsor, for supporting my travel to the event.]
PEP 572 and decision-making in Python
The "PEP 572 mess" was the topic of a 2018 Python Language Summit session led by benevolent dictator for life (BDFL) Guido van Rossum. PEP 572 seeks to add assignment expressions (or "inline assignments") to the language, but it has seen a prolonged discussion over multiple huge threads on the python-dev mailing list—even after multiple rounds on python-ideas. Those threads were often contentious and were clearly voluminous to the point where many probably just tuned them out. At the summit, Van Rossum gave an overview of the feature proposal, which he seems inclined toward accepting, but he also wanted to discuss how to avoid this kind of thread explosion in the future.
Assignments
Van Rossum said that he would try to summarize what he thinks the controversy is all about, though he cautioned: "maybe we will find that it is something else". The basic idea behind the PEP is to have a way to do assignments in expressions, which will make writing some code constructs easier. C has this, as does Go, but the latter uses some extra syntax that he finds distasteful.
The problem with the C-style of assignments is that it leads to this classic error:
if (x = 0) ...

That is legal syntactically, but is probably not what the programmer wanted since it assigns zero to x (rather than testing it for equality to zero) and never executes the statement after the if. If you don't believe that is a real problem, Van Rossum said, just look at Yoda-style conditions that reverse the order of a condition so that it will cause a syntax error if = is used instead of ==:
if (0 = x) ...
Python solved this problem in a different way. The original Python had a single "=" for both assignment and equality testing, as Tim Peters recently reminded him, but it used a different syntactic distinction to ensure that the C problem could not occur. Python has always "prided itself" on making that mistaken-assignment problem impossible, without having to resort to tricks like Yoda style.
A classic example of where these kinds of assignments would be quite useful is in pattern matching:
m = re.match(p1, line)
if m:
    return m.group(1)
else:
    m = re.match(p2, line)
    if m:
        return m.group(2)
    else:
        m = re.match(p3, line)
        ...

The proposed syntax in the PEP would use a new ":=" operator (which could be read as meaning "becomes"), so that a series of matches like the above could instead be:
if m := re.match(p1, line):
    return m.group(1)
elif m := re.match(p2, line):
    return m.group(2)
elif m := re.match(p3, line):
    ...
Another motivating code pattern is the "loop and a half". It was once common when processing a file line by line, but that case has been addressed by making file objects iterable; however, other non-iterable interfaces still suffer from patterns like:
line = f.readline()
while line:
    ...  # process line
    line = f.readline()

or like this:
while True:
    line = f.readline()
    if not line:
        break
    ...  # process line

Either of those could be replaced with a much clearer and more concise version using an assignment expression:
while line := f.readline():
    ...  # process line

Van Rossum said that he knows he has written loop-and-a-half code and did not get it right at times. The assignment expression makes the intent of the author clear, while the other two make readers of the code work harder to see what is going on.
Another example is with comprehensions (e.g. list, dictionary). Sometimes programmers value the conciseness of the comprehension to the point where they will call an expensive function twice. He has seen that kind of thing, even in the code of good programmers.
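For illustration (this example is ours, not one from the summit; the Python 3.8 ":=" operator is assumed), compare a comprehension that calls an expensive function twice with the equivalent using the proposed operator:

def f(x):
    # stand-in for an expensive computation
    return x * x if x % 2 else None

data = range(10)

# Without assignment expressions: f(x) is evaluated twice per item.
results = [f(x) for x in data if f(x) is not None]

# With PEP 572's ":=", the value is computed once and reused.
results = [y for x in data if (y := f(x)) is not None]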
But Python has done pretty well for 28 years without this functionality. A lot of people have reacted to the idea—in various different ways. Part of what people were looking for was examples from real code, not toy examples that were generated to justify the PEP. Peters and others found realistic examples from their own code where the feature would make the code shorter and, more importantly, clearer, Van Rossum said. All of those examples were too long to fit on his slides, however.
Contentious debate
One of the reasons that the debate has been so contentious, he thinks, is that there are so many different syntactic variations that have been suggested. Here is a partial list of the possibilities discussed in the threads:
NAME := expr
expr -> NAME
NAME <- expr
expr {NAME}
NAME = expr
expr2 where NAME = expr1
let NAME = expr1 in expr2
...

As can be seen, some used new operators, others used keywords, and so on. Van Rossum said that he had tried to push C-style assignment to see how far it would go, but others were pushing their own variants. Beyond that, there were some different options that were discussed, including requiring parentheses, different precedence levels for the operator, allowing targets other than a simple name (e.g. obj.attr or a[i]), and restricting the construct to if, elif, and while.
[Guido van Rossum]
Another contentious issue was the idea of sub-local scopes that got mixed into the PEP early on. The idea is to have implicit scopes that are only active during the execution of a statement; it is potentially useful, but there are some quirks and corner cases. In the end, it got scratched from the PEP.
Overall, the idea has been "incredibly controversial", he said. What one person thinks is more readable, another person thinks is less readable. The benefits are moderate and everyone has their own favorite syntax. Sub-local scope added other oddities and would have required new bytecodes in order to implement it.
The PEP also bundled a fix for another problem, which is a "weird corner case" in comprehensions at class scope. That should be removed from PEP 572 and turned into its own PEP, he said.
A poll was taken on PEP 572, but the additional corner-case fix was part of the PEP. That muddied the poll, since those who wanted the assignment feature but did not want (or weren't sure of) the comprehension fix did not have a way to vote; a new poll will need to be done. The PEP moved to python-dev prematurely, as well.
Python is not a democracy, Van Rossum said. But generally folks agree with his decisions "except when I don't accept [their] favorite change".
Mark Shannon wondered if the attendees of the Python Education Summit might have some thoughts on the feature. Van Rossum acknowledged that some have said the feature makes it harder to teach Python, but he is not really sure of that, in part because he does not know how people learn the language. Nick Coghlan said the problem is trying to describe the difference between = and :=, but Van Rossum suggested that teachers not use := in instructional code. However, he does recognize that sites like Stack Overflow will lead some newbies to copy code in ways that might be confusing or wrong.
Decision-making
The larger issue from this PEP is "how we make decisions", Van Rossum said. There were many long responses in the threads, mostly against the feature. Overall, there was "way too much email". There were many misunderstandings, digressions, explanations, both right and wrong, and so on. Part of the problem is that there is no real way to measure the effectiveness of new language features.
In the end, he had to stop reading the threads so he wouldn't "go insane". Chris Angelico, who is the author of the PEP, could not be at the summit, but Van Rossum suggested that he stop responding in the threads to try to tamp things down. He wondered how to "dig our way out" of situations like this. It got to the point where people were starting new threads in order to try to get the attention of those who had muted older threads with too many messages.
Łukasz Langa suggested that "dictators should dictate"; Van Rossum should perhaps use his role to put a stop to some of that kind of stuff. But if not, then Van Rossum may just have to defer and stop following the threads, as he did. Langa said that he never follows python-ideas for exactly this reason.
Van Rossum said that the PEP had four revisions that were discussed on python-ideas before moving it to python-dev; he "thought we had it right". Langa wondered if there were other PEPs with a similar kind of response. Static typing (also known as "type hints") is one that Van Rossum remembered. Shannon thought that did not have as many negative postings from core developers as PEP 572 has had. Van Rossum agreed that might be the case but did remember a few core developers in opposition to type hints as well.
Victor Stinner suggested that the python-ideas discussion be summarized in the PEP. Van Rossum said that he thought many who responded had not read the discussion section in the PEP at all. He noted that the python-ideas discussion was better than the one on python-dev, even though it too had lots of passionate postings. There are fewer people following python-ideas, Christian Heimes said. Van Rossum wondered if the opposition only got heated up after he got involved; people may not have taken it seriously earlier because they thought he would not go for it.
Ned Deily suggested that the pattern for summit discussions be used to limit how long a discussion goes on; perhaps give five days before a decision will be made. The Tcl project has a much more formal process, where core developers are required to vote on proposals, but he didn't know if the Python project wanted to go down that path. It might make sense to have someone manage the conversation for PEPs, Van Rossum said. He is familiar with the IETF process from the late 1990s, which had some of that. He actually borrowed from the IETF to establish the PEP process.
But Barry Warsaw believes that PEP 572 is an outlier. Since it changes the syntax of the language, people tend to focus on that without understanding the deeper semantic issues. He suggested that perhaps a small group in addition to the PEP author could shepherd these kinds of PEPs. But in the end, people will keep discussing it until a pronouncement is made on the PEP one way or the other.
Van Rossum said that he is generally conflict-averse; he prefers not to just use his BDFL powers to shut down a discussion. Angelico is somewhat new at writing PEPs of this sort; Van Rossum thinks that Angelico probably would not have kept pushing it if he and Peters had not jumped into the discussion. Steve Dower said that perhaps some PEPs could be sent back with a request to get some others to work with the author on it.
Brett Cannon pointed out that the PEP editors are not scrutinizing PEPs super closely before they move to python-dev; it is mostly a matter of making sure there are no huge problems with the text. It might make sense to have a working group that tried to ensure the PEP was in the right state for a quality discussion. Another idea would be to have a senior co-author on PEPs of this nature, Van Rossum said. In addition to being an expert on the subject matter of the PEP, they could use their authority to help steer the conversation.
Getting along in the Python community
In a session with a title that used a common misquote of Rodney King ("can't we all just get along?"), several Python developers wanted to discuss an incident that had recently occurred on the python-dev mailing list. A rude posting to the list led to a thread that got somewhat out of control. Some short tempers among the members of the Python developer community likely escalated things unnecessarily. The incident in question was brought up as something of an object lesson; people should take some time to simmer down before firing off that quick, but perhaps needlessly confrontational, reply.
The post by Ivan Pozdeev was never directly cited in the discussion (though a response in the thread by Steven D'Aprano was put up as a slide). As Guido van Rossum put it, the original poster was "being a jerk". Pozdeev complained about the tkinter module in the standard library being broken for his use case. Beyond that, he claimed that almost no one uses it and that "no-one gives a damn". He suggested that it should be removed from the standard library since it could not be maintained.
Even though Pozdeev was intentionally pushing the buttons of the Python developers, Van Rossum thought the response was a bit over the top. Brett Cannon said that by being jerks in response, the Python developers simply looked bad. Thomas Wouters agreed, saying that folks should not respond in kind; instead, give people the benefit of the doubt even if they are jerks. If you do so, they may learn and adjust their behavior next time, he said.
As D'Aprano noted, Van Rossum was the first to respond, telling the poster to "go punch a bag or something, and then propose something a little more constructive, like adding a warning to the docs". That is where things should have ended, D'Aprano said, but several others responded, which made the overall response less than welcoming; "as a community we haven't lived up to our own standards, as we have piled onto him to express our [righteous] indignation".
The Python code of conduct was raised in the thread, which was premature, several said. Cannon said that mailing list participants need to get better at handling these kinds of things. The problem also occurs within the group; it is not just newbies or outsiders who generate these kinds of responses. It gives a bad impression of the Python community, he said.
Beyond that, though, those kinds of responses can lead to quick burnout, Cannon said. The vast majority of Python contributors are putting personal time into the project, so unwelcoming or unfriendly responses could easily lead to someone just walking away. It is not just the two participants that are affected, Wouters said, as anyone who reads the posting may have a negative response.
It is something that everyone should be aware of, Cannon said; take the higher ground and give people the benefit of the doubt. Perhaps there is a need for a more formal process, however. Van Rossum is not particularly worried about repeat offenders on the mailing lists; if people are repeatedly being jerks, they will be dealt with. But in this case it was a first offense, so it was premature to bring up the code of conduct. "Sitting on your hands is often a good response", he said; give it some time before responding or perhaps don't respond at all if others have already done so. Cannon echoed that: "if you feel heated, wait it out".
Van Lindberg wondered how the Python community sees itself: as friends? a club? a professional society? He is concerned about the idea of "enforcers" who patrol for code of conduct violations. He suggested that a good way to think about it would be to ask if the response you are sending is something you would do in a professional relationship; is it civil and is it the way you would engage with a coworker? He wants to ensure that Python doesn't get to a place where people are reporting on each other to the enforcement authorities.
Canned responses might help blunt the impact in handling some of these kinds of situations, Carol Willing said. If you point someone at a response, rather than responding directly, the mention of the code of conduct may not seem so prominent or unwelcoming. She agreed that when someone is having an emotional response to a post, they should wait to respond. But she also said that the original poster had owned up to trying to game the system to get a reaction; to a certain extent, Pozdeev got what he was looking for.
Mentoring and diversity for Python
A two-part session at the 2018 Python Language Summit tackled the core developer diversity problem from two different angles. Victor Stinner outlined some work he has been doing to mentor new developers on their path toward joining the core development ranks; he has also been trying to document that path. Mariatta Wijaya gave a very personal talk that described the diversity problem while also providing some concrete action items that the project and individuals could take to help make Python more welcoming to minorities.
Mentoring
Stinner said he has been spending time trying to get more people involved in the Python project. There is a bottleneck in the review process; more developers would help with that. In addition, more people involved means that there are more diverse viewpoints, which leads to better review since different reviewers spot different kinds of problems.
[Victor Stinner]
He has been unable to identify a clear path from being a contributor to becoming a core developer. It is an unclear and unwritten process that he is now trying to document. Becoming a core developer is not really a goal in itself, he said; there are lots of ways to contribute without being a core developer.
There are multiple stages along the path: newcomer, contributor, mentoree, and core developer. He added mentoree and tried to write down the requirements for the next stage. He also tried to document the responsibilities of a core developer; for example, if a core developer merges some code, they are responsible for that code for the next ten years. He also formalized the role for bug triage, which can be a step on the road as well.
Stinner recently mentored three contributors over a two-month period. He was surprised that many of the questions he got asked were not related to the code, but were more about the process. Things like who to ask to review code or how to use Git were the kinds of questions he fielded. He also found that "people are shy", they do not want to ask questions, even on the core-mentorship mailing list, which is not publicly archived.
In practice, the topics he needed to teach his mentorees were things like how to write a good pull request, how to add good tests, and adding entries to the NEWS file. The most efficient tool to get more developers is to get more people mentoring, Stinner said. That is why he stopped writing code for a bit to focus on mentoring; he hopes others will do the same.
Kushal Das pointed out that CPython did not get into the Google Summer of Code (GSoC) program because of a lack of good mentors. Stinner said that GSoC takes a bigger time commitment than mentoring; he would generally just need to answer a few emails a week for his mentorees. Das agreed with that, noting that GSoC can sometimes take 10-15 hours per week.
Stinner wondered if his document should become a PEP or be handled in some other way. Nick Coghlan suggested it be turned into a pull request for the Python Developer's Guide and "let that decide whether it should be a PEP". Guido van Rossum said that he could see it as a PEP written from the perspective of mentors, but not just something with a bunch of check boxes.
There is definitely a problem with a checklist approach, Ned Deily said. There are a "fair number of intangibles" that go along with being a core developer: how well they work with the group and if they understand and accept the history of the project, its processes, and so on. Stinner said the document does not have items like "fix five bugs" or "get ten commits accepted". The criteria really depends on the kind of work that is being done and Python has a need for many different types of contributors.
Diversity
Wijaya started out by saying that she had been asked to give this talk; it is not something she would necessarily choose on her own because the topic is a trigger for her. She said she would try to control her emotions but that the presentation might be uncomfortable for attendees.
She had a list of facts about Python development that she said should make people uncomfortable. From February 2017 to April 2018, there were 848 contributors to Python on GitHub—of those, less than ten were women. That is not a percentage, but a raw number. It took a month after the switch to GitHub before a woman contributed.
In the last 12 months of posts to the python-dev mailing list, there are less than five women actively participating. There are more who are watching, she said, but who do not get involved. When she first started participating in the mailing list, she thought she was the only woman. Of the 168 committers for Python, only two are women. Both of those women were added in 2017. "This is wrong", Wijaya said. The project should not accept this situation.
She had some requests for the attendees as well. Please acknowledge that the lack of diversity is a real problem, she said. In addition, it is not her problem, it is a problem for everyone in the Python community. Those who cannot see the problem are actually part of the problem.
Attendees should not expect her to solve the problem for them, or to repeatedly explain it. There are resources available to educate everyone about the problem. "Don't ever tell me you don't see a problem", she admonished.
She listed some educational resources, including using Google ("really!") with keywords like "diversity in tech" or "open source diversity". She also suggested viewing the documentary CODE: Debugging the Gender Gap. In addition, follow women and people of color on Twitter, she said, as well as @betterallies. The project could also get professional advice. Two people that she knows who may be able to help are Sage Sharp and Ashe Dryden. Their jobs are to help organizations do better at diversity.
She made some specific suggestions of action items for the project. A better code of conduct is needed because there is no enforcement information in it. Who will handle complaints and how? Several good examples can be found, including the codes of conduct for Django, Write the Docs, and the PyCascades conference. PyCon 2018 also has a code of conduct with an enforcement manual, which is the first time the conference has added that piece.
She noted that Stinner had said people are shy, but she had a somewhat different take. Public spaces are not always seen as safe spaces for minorities. So, she suggested that Python developers be available privately. Van Rossum has been mentoring Wijaya and others, she said, separate from the core-mentorship mailing list. Don't expect minorities to post there and instead explicitly provide office hours or other ways for those folks to get their questions answered.
Give minorities opportunities to contribute, she said; pay attention when they are creating pull requests and review them. Actively seek out minorities and invite them to participate. Another possibility is to pair up with someone on something you are working on, she said.
Her last piece of advice was to "be a minority". She suggested attending a PyLadies event alone to just listen. A blog post about a man's visit to PyLadies London is worth reading; his worries about being in the minority echo what she and other minorities often feel.
Brett Cannon replayed some of the action items he heard: the code of conduct needs to be more clear and to include an enforcement manual. Thomas Wouters noted that the Python Software Foundation (PSF) is working on an update to its code of conduct that could be useful. Christian Heimes suggested adding photos of the enforcement people as well as multiple different contact options for them.
Das said that some are scared to ask questions in public spaces, so office hours can really help. Video calls can also be useful for those who are willing to do that. For most, text is probably easier, but any private communication mechanism is workable. Wouters said that predictable hours where one is available to talk or answer questions is the most important part. Van Rossum agreed, saying that there are people who want to ask questions but don't want to do so on python-dev with 20,000 readers.
Stinner wondered if a new mailing list to discuss diversity topics would be useful; there was one earlier, but it closed down due to lack of traffic. Coghlan said that a mentoring special interest group (SIG) might be good. Being a good mentor is a skill in its own right, he said.
Brian Curtin suggested that active outreach is a good way to increase diversity. He noted that after a PyCon that had only a single woman speaker, the two PyCons in Santa Clara, California specifically sought out more women speakers by talking to PyLadies groups and others. That was quite successful in increasing the number of women speakers at the conference.
The conferences outside of the US and Europe are a different story, Das said. He has seen PyLadies looked down on at conferences outside those regions "again and again". The women's groups are trying hard to bring in more women to the project, but they get no support from the conferences or attendees, he said.
Heimes said that when developers are asked to speak at conferences they should ask if there is a code of conduct and diversity program. If there isn't, guide them to the PSF for examples. Van Lindberg said that anything funded by the PSF must have a code of conduct, though he acknowledged there are no rules about what must be in it; the PSF does provide examples, however. One attendee said that a conference had added a code of conduct after he asked. The conference is not sponsored by the PSF, but is now talking about adding a diversity program. Potential speakers can use that as leverage to help influence conferences to get on board.
Steve Dower asked about finding people to mentor; he has helped his colleagues at work, but how can others be found? Das said that new people with pull requests who are looking for review are good candidates. Adding some information to the Developer's Guide or as a new PEP would also raise the visibility of mentoring opportunities.
A technical walkthrough of some part of the interpreter or standard library is quite useful to those who are new to the code, Carol Willing said. Making sure that the presentation is recorded for others will help increase its reach. The move to GitHub has been a great step forward in making it easier for new people to get involved. But Python is fighting an uphill battle, she said; talented women can have their pick of open-source projects to join.
Dower suggested that perhaps speaking at a PyLadies meetup to do a review of some part of CPython might be a way to combine two of the suggestions from the session. As part of that, the speaker could make it clear that they are willing to mentor anyone who is interested. Eric Snow noted that simply introducing yourself to everyone you run into at PyCon is a good starting point; telling those with an interest about the core-mentorship list and that you are willing to be a mentor will help get the word out as well.
TCP small queues and WiFi aggregation — a war story
This article describes our findings that connected TCP small queues (TSQ) with the behavior of advanced WiFi protocols and, in the process, solved a throughput regression. The resulting patch is already in the mainline tree, so before continuing, please make sure your kernel is updated. Beyond the fix, it is delightful to travel through history to see how we discovered the problem, how it was tackled, and how it was patched.
The academic life is full of rewards; one of ours was the moment in which three USB WiFi 802.11a/b/g/n dongles arrived. We bought dongles with an Atheros chipset because both the software driver and the firmware are available and modifiable. We were using the ath9k_htc kernel module with the default configuration. We compiled the latest (at the time) available kernel (4.13.8), and then we started the access point to create an 802.11n network to build the core of our future testbed for vehicular communications.
We started some simple tests with ping and iperf to check the connectivity, the distribution of IP addresses, and our custom DNS, which translates the names of our services into IP addresses. The nominal transfer rate of the dongles is 150Mb/s, but what we saw on the screen was disappointing: an upload iperf connection, no matter which options were used, was able to reach only 40Mb/s. Using another operating system as a client, we were able to achieve 90Mb/s, ruling out a problem with the server. Even with the newer kernel release (4.14), we did not see anything in the kernel messages that would have been correlated with a hardware or a driver failure. To stress-test the equipment, we started a UDP transmission at a ludicrous speed. Not so surprisingly, we arrived almost at 100Mb/s. It was clear that the root of the problem was in the TCP module or its interactions with the queueing disciplines, so the journey began.
The next step involved the tc command. We started by listing the default queueing discipline and modifying its parameters. By default, we were using the mq queuing discipline, which instantiated an FQ-CoDel queuing discipline for each outgoing hardware queue. With another driver, such as ath9k, the entire queuing layer is bypassed and a custom version of it, without the possibility of tuning or modifying the queueing discipline, is deployed inside the kernel WiFi subsystem. With the ath9k_htc driver, instead, we still had the chance to play with the queuing discipline type and parameters. We opted for the most basic (but reliable) discipline, pfifo_fast. But nothing changed.
We were using the default CUBIC congestion-control module. Despite the recent hype around BBR, we decided to stick with CUBIC because it has always just worked and never betrayed us (until now, it seems). Just to try it, we switched to BBR, but things got worse than before; the throughput dropped by 50%, never passing the 20Mb/s line. To do all the tests, we employed Flent, which also gives latency results. All the latencies were low; we never exceeded a couple of milliseconds of delay. In our experience, low throughput with low latency indicates a well-known problem: starvation. So the question became: what was limiting the number of segments transmitted by the client?
In 2012, with commit 46d3ceabd8d9, TCP small queues were introduced. Their initial objective was to prevent TCP sockets from queuing more than 128KB of data in the network stack. In 2013, the algorithm was updated to have a dynamic limit. Instead of the fixed value, the limit was defined as either two segments' worth of data or an amount of data that corresponds to a transmission time of 1ms at the current (guessed) transmission rate. The calculation of the transmission rate had been added some months earlier, with the objective of calculating the proper sizing of segments when TCP segmentation offload is in use, along with the introduction of a packet scheduler (FQ) able to spread out the sent segments over an interval.
However, the first reports suggested that the amount of data queued was too low in some subsystems, such as WiFi. The reason behind this was the impossibility, for the WiFi driver, of performing frame aggregation, due to the lack of data in the driver's queue. The aggregation technique combines multiple packets into a single frame to reduce the constant overhead for over-the-air transmission. Preventing aggregation is a sure way to wreck throughput.
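A back-of-the-envelope calculation (ours; the 100Mb/s pacing rate and the ~64KB 802.11n A-MPDU aggregate size are illustrative assumptions) shows the mismatch:

# Why the dynamic TSQ limit can starve WiFi aggregation.
pacing_rate = 100_000_000 // 8     # bytes per second at an assumed 100Mb/s
tsq_limit = pacing_rate >> 10      # ~1ms of data, as tcp_small_queue_check() computes it
ampdu_max = 64 * 1024              # assumed maximum size of one aggregate

print(tsq_limit)                   # 12207: only ~12KB may sit below the socket
print(ampdu_max // tsq_limit)      # 5: a single aggregate could carry ~5 times that

With so little data queued below TCP, the driver never sees enough packets at once to build large aggregates.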
In response, a minimum amount of buffering (128KB) was restored in commit 98e09386c0ef4. One year later, a refactoring patch for segmentation offload sizing introduced a small modification that, as we will see, changed the situation dramatically. The 128KB value was changed from being a lower bound to an upper bound. If the amount of data queued was forced to be less than 128KB, what would happen to the WiFi aggregation?
Fast forwarding to the 4.14 kernel, we started to think about how to tune these thresholds. First of all, the function that decides (even in recent kernels) how much TCP data is allowed to enter the network stack is tcp_small_queue_check():
static bool tcp_small_queue_check(struct sock *sk, const struct sk_buff *skb,
                                  unsigned int factor)
{
    unsigned int limit;

    limit = max(2 * skb->truesize, sk->sk_pacing_rate >> 10);
    limit = min_t(u32, limit,
                  sock_net(sk)->ipv4.sysctl_tcp_limit_output_bytes);
    limit <<= factor;
    /* ... */
}
The limit is calculated as the maximum of two full-size segments and ~1ms of data at the current rate. Then the minimum of this value and the 128KB threshold is used (to be in sync with the kernel history, we must say the default value was raised to 256KB in 2015). We started to wonder what would happen if we reintroduced a lower bound on the amount of data that could be enqueued, so we modified the above function in the most obvious way and obtained the following results:
The first column represents the results using the pre-fix parameters for TSQ (two segments or ~1ms of data at the current rate). In the second, we forced at least 64KB to be queued. As we can see, the throughput increased by 20Mb/s, but so did the delay (even if the latency increase is not as pronounced as the throughput increase). Then we tested the original configuration in which the lower bound was 128KB; the throughput exceeds the 90Mb/s value, with an added latency of 2ms. It is enough to have 128KB of data queued to get the proper aggregation behavior, at least with our hardware. Increasing that value (we plotted up to 10MB) does not improve the throughput in any way, but it does worsen the delay. Even disabling TSQ entirely did not improve the situation. We had found the cause of the problem: a minimum amount of data must be queued to ensure that frame aggregation works.
After the testing phase, we realized that putting back fixed byte values would be the wrong choice because, for slow flows, we would only have increased the latency. But, thanks to the modifications done to support BBR, we do know what the flow's current rate is: why not use it? In fact, in commit 3a9b76fd0db9f, pushed at the end of 2017, the logic of TSQ was extended to allow a device driver to increase the number of milliseconds' worth of data that can be queued. The best value for throughput that worked on all the hardware we tested was between 4 and 8ms of data at the flow rate. So, we shared our results, and some weeks later a patch was accepted. In your latest kernel, thanks to commit 36148c2bbfbe, your WiFi driver can allow TCP to queue enough data to solve the aggregation problem with a negligible impact on latency.
The networking stack is complicated (what is simple in kernel space?). It is certainly not an opaque black box; rather, it is an orchestrated set of layers, each embodying different knowledge, that can sometimes make incompatible choices. As a lesson, we learned that the relationship between latency and throughput is not the same across different technologies, and that aggregation in wireless technologies is more common than we initially thought. Moreover, as a community, we should start thinking about automated tests that can give an idea of the performance impact of a patch under different technologies and in a wide range of contexts, from the 40Gb/s device of a burdened server to the 802.11a/b/g/n USB dongle connected to a Raspberry Pi.
[The authors would like to thank Toke Høiland-Jørgensen for his support and the time he dedicated to the Flent tool, to the WiFi drivers, and to gathering the results from the ath9k and ath10k drivers.]
4.18 Merge window, part 2
By the time that Linus Torvalds released 4.18-rc1 and closed the merge window for this development cycle, 11,594 non-merge changesets had found their way into the mainline kernel repository. Nearly 4,500 of those were pulled after last week's summary was written. Thus, in terms of commit traffic, 4.18 looks to be quite similar to its predecessors. As usual, the entry of significant new features has slowed toward the end of the merge window, but there are still some important changes on the list.
Core kernel
- Asynchronous I/O operations can be submitted with the new IOCB_FLAG_IOPRIO flag to set the I/O priority of individual operations.
- After years of discussion, restartable sequences have finally made it into the mainline kernel, with support on the x86, ARM, and PowerPC architectures. The associated "ops vector" functionality was not merged, and doesn't appear likely to go in anytime soon.
Architecture-specific
- The ARM64 architecture has gained mitigations for the Spectre version 4 vulnerability.
- The IA64 "perfmon" performance-monitoring feature has been marked broken, due to a set of internal problems that have been discovered. Before fixing those problems, the developers want to see if anybody even notices the removal of perfmon; there is a strong suspicion that nobody is actually using it.
- Among the many new systems-on-chip supported in this release, there is incomplete support for the Qualcomm Snapdragon 845, found in high-end mobile devices. Olof Johansson wrote: "It's great to see mainline support for it. So far, you can't do much with it, since a lot of peripherals are not yet in the DTs but driver support for USB, GPU and other pieces are starting to trickle in. This might end up being a well-supported SoC upstream if the momentum keeps up".
Filesystems and block layer
- As was covered recently, support for the Lustre filesystem has been removed from the staging tree.
- The F2FS filesystem has improved discard support, addressing some responsiveness problems experienced in the past.
- The new "writecache" device-mapper target can be used to cache block writes to a persistent-memory or solid-state device. See Documentation/device-mapper/writecache.txt for more information.
Hardware support
- Valve Steam game controllers, Silergy SY8106A regulators, ROHM BD71837 power regulators, Freescale DPAA2 1588 timer modules, Texas Instruments DAC5571 digital-to-analog converters, Qualcomm MSM8998 and SDM845 global clock controllers, Qualcomm SDM845 video clock controllers, Zorro ESP SCSI adapters, RAVE SP backlight controllers, and ORISE Technology OTM3225A backlight controllers.
Internal kernel changes
- There has been an extensive set of changes to make the kernel use
overflow-safe allocation calls whenever possible. Thus, for example,
calls that looked like:
kmalloc(count*size, gfp_flags);
have been changed to:
kmalloc_array(count, size, gfp_flags);
- The configuration options for stack protection have been changed. This is a result of the new configuration language work, which makes it possible to select the strongest available stack protection automatically, but which could have the opposite effect in some existing configurations. See this commit for details.
All of the changes merged for 4.18 add up to a net reduction of nearly 100,000 lines of code. That is only the fourth time in kernel development history that a release has been smaller than its predecessor, and the only time that this has happened for two releases in a row.
The stabilization period for 4.18 has now begun, with the final release expected on August 5 or 12.
Inside this week's LWN.net Weekly Edition
- Briefs: Backdoored images on Docker Hub; 4.17 security things; Fedora CoreOS; Quotes; ...
- Announcements: Newsletters; events; security updates; kernel patches; ...