
Leading items

Welcome to the LWN.net Weekly Edition for March 3, 2022

This edition contains the following feature content:

  • CPython, C standards, and IEEE 754: a GCC regression leads the Python project to settle on which flavor of C to use for building CPython.
  • Moving the kernel to modern C: the kernel may move from the C89 standard to C11 as soon as the 5.18 release.
  • Better visibility into packet-dropping decisions: new instrumentation records why the kernel drops packets.
  • Extending restartable sequences with virtual CPU IDs: a proposed extension gives threads process-private CPU ID numbers.
  • A Debian GR on secret voting—and more: Debian debates making general-resolution votes secret.

This week's edition also includes these inner pages:

  • Brief items: Brief news items from throughout the community.
  • Announcements: Newsletters, conferences, security updates, patches, and more.

Please enjoy this week's edition, and, as always, thank you for supporting LWN.net.

Comments (none posted)

CPython, C standards, and IEEE 754

By Jake Edge
March 2, 2022

Perhaps February was "compiler modernization" month. The Linux kernel recently decided to move to the C11 standard for its code; Python has just undergone a similar process for determining which flavor of C to use for building its CPython reference implementation. A calculation in the CPython interpreter went awry when built with a pre-release version of the upcoming GCC 12; that regression led down a path that ended up with the adoption of C11 for CPython as well.

A bug that was fixed in early February started the ball rolling for Python. Victor Stinner encountered a GCC regression that caused CPython not to get the expected IEEE 754 floating-point NaN (not a number) value in a calculation. An LWN article sheds some light on NaNs (and how they are used in Python) for those who need a bit more background. The calculation was using the HUGE_VAL constant, which is defined as an ISO C constant with a value of positive infinity; the code set the value of the internal Py_NAN constant used by the interpreter to HUGE_VAL*0, which should, indeed, evaluate to a NaN. Multiplying infinity by zero is defined to produce a NaN in IEEE 754.

During his investigation of the problem, Stinner found that instead of the calculation, Python could simply use the NAN constant defined in <math.h>—as long as a C99 version of the header file was used. As part of the bug discussion, Petr Viktorin said that PEP 7 ("Style Guide for C Code") should be updated to reflect the need for the C99 header file. So Stinner duly created a pull request for a change to the PEP, but Guido van Rossum said that a change of that nature should be discussed on the python-dev mailing list.
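Concretely, the two approaches look something like this; PY_NAN_OLD and PY_NAN_NEW are illustrative names (the actual macro is Py_NAN), and the definitions are a sketch rather than the real CPython code:

    #include <math.h>

    /* Old approach (sketch): infinity times zero should evaluate to a
       NaN; this is the calculation that the GCC regression broke. */
    #define PY_NAN_OLD (HUGE_VAL * 0.0)

    /* C99 approach: <math.h> provides NAN directly, as long as IEEE 754
       floating point is in effect. */
    #define PY_NAN_NEW ((double) NAN)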

That led Stinner to post a message to discuss the change on February 7. As it turns out, there are actually two bugs fixed by Stinner that require parts of the C99 math API; bug 45440 reported a problem with the CPython Py_IS_INFINITY() macro; the fix for that also involved using the C99 <math.h>. As Stinner noted, C99 is now 23 years old, and support for it in compilers is widespread; GCC, Clang, and Microsoft Visual C (MSVC) all support the needed features.
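The Py_IS_INFINITY() fix follows the same pattern; here is a sketch of the idea (illustrative names, not the exact CPython definitions):

    #include <math.h>

    /* Pre-C99 style: an arithmetic trick; only an infinity is nonzero
       and unchanged when halved (a NaN fails the comparison). */
    #define PY_IS_INFINITY_OLD(X) ((X) && (X) * 0.5 == (X))

    /* C99 style: <math.h> provides isinf() for exactly this purpose. */
    #define PY_IS_INFINITY_NEW(X) isinf(X)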

Floating point

Mark Dickinson pointed out that the existence of the NAN constant is not required by C99 directly; it is only present if IEEE 754 floating point is enabled as well. He thought that it made sense for CPython to require IEEE 754, but wondered whether Python, the language, should also require it. Stinner said that all modern computers support IEEE 754; even embedded devices without a floating-point unit (FPU) typically support it in software. "Nowadays, outside museums, it's hard to find computers which don't implement IEEE 754."

Stinner was in favor of requiring IEEE 754 for CPython; Gregory P. Smith agreed. Brett Cannon wondered if there was even any ability to test with systems that lacked the support:

Do we have a buildbot that has a CPU or OS that can't handle IEEE-754? What are the chances we will get one? If the answers are "none" and "slim", then it seems reasonable to require NaN and IEEE-754.

Stinner reported that all of the buildbot machines supported IEEE 754, so the path was clear to require it. In terms of the Python language, Christopher Barker said that IEEE 754 support should not be required for all implementations of Python, but that it should be recommended. Steve Dower agreed that leaving it up to Python implementations made sense: "Otherwise, we would prevent _by specification_ using Python as a scripting language for things where floats may not even be relevant." He said that making it a requirement would inhibit adoption: "The more 'it's-only-Python-if-it-has-X' restrictions we have, the less appealing we become."

Which C?

Switching to C99 makes sense if the compilers being used to build CPython support it, Cannon said. Viktorin asked about MSVC's support for all of C99; he did not find any documentation saying that it did, so it might be better to consider C11, which is supported. "Do we need to support a subset like 'C99 except the features that were removed in C11'?" Dower, who works on Python at Microsoft, said that he had not found an answer to the C99 question either:

All the C99 library is supposedly supported, but there are (big?) gaps in the compiler support. Possibly these are features that were removed in C11? I don't know what is on that list.

[...] Personally, I see no reason to "require" C99 as a whole. We have a style guide already, and can list the additional compiler features that we allow along with guidance on updating existing code and ensuring compatibility.

I don't see much risk requiring the C99 standard library, though. It's the compiler features that seem to have less coverage.

Stinner suggested tying the wording to what was supported in MSVC, but H. Vetinari thought a better formulation might be "'C99 without the things that became optional in C11', or perhaps 'C11 without optional features'". That led Viktorin to wonder why C11 would not make a better target: "[...] the main thing keeping us from C99 is MSVC support, and since that compiler apparently skipped C99, should we skip it as well?"

Cannon said that he found a list of optional C11 features, none of which were really needed; if the "C11 without optional features" flavor is widely supported, as it seems to be, "I think that's a fine target to have". Meanwhile, both Inada Naoki and Viktorin were excited about using C11's anonymous union feature in CPython.
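For those who have not encountered the feature, a C11 anonymous union lets the union's members be accessed as though they were members of the enclosing structure. A small standalone example (hypothetical types, not CPython code):

    #include <stdio.h>

    struct value {
        int kind;
        union {                /* C11 anonymous union: no member name */
            long ival;
            double fval;
        };
    };

    int main(void)
    {
        struct value v = { .kind = 1, .ival = 42 };
        v.fval = 2.5;          /* union members are accessed directly */
        printf("%d %g\n", v.kind, v.fval);
        return 0;
    }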

Viktorin also said that in order to keep the CPython public header files compatible with C++, anonymous unions could not be used in them, though Inada said that C++ does support them, "with some reasonable limitations". While CPython aims to be compatible with C++ at the API level, it is hard to completely specify what that means—or even which version of the C standard is supported—as Smith pointed out:

We're likely overspecifying in any document we create about what we require because the only definition any of us are actually capable of making for what we require is "does it compile with this compiler on this platform? If yes, then we appear to support it. can we guarantee that? only with buildbots or other CI [continuous integration]" - We're generally not versed in specific language standards (aside from compiler folks, who is?), and compilers don't comply strictly with all the shapes of those anyways for either practical or hysterical reasons. So no matter what we claim to aspire to, reality is always murkier. A document about requirements is primarily useful to give guidance to what we expect to be aligned with and what is or isn't allowed to be used in new code. Our code itself always has the final say.

The final result was a rather small patch to PEP 7 to say that CPython 3.11 and beyond use C11 without the optional features (and that the public C API should be compatible with C++). In addition, bug 46656 and a February 25 post from Stinner document the changes to the floating-point requirements; interestingly, they do not mention IEEE 754, just a requirement for a floating-point NaN. While it may have seemed like a bit of a yak-shaving exercise along the way, the GCC regression eventually led to a better understanding of which flavor of C is supported for building CPython—along with moving to a flavor from this century. All in all, a good "day's" work.

Comments (8 posted)

Moving the kernel to modern C

By Jonathan Corbet
February 24, 2022
Despite its generally fast-moving nature, the kernel project relies on a number of old tools. While critics like to focus on the community's extensive use of email, a possibly more significant anachronism is the use of the 1989 version of the C language standard for kernel code — a standard that was codified before the kernel project even began over 30 years ago. It is looking like that longstanding practice could be coming to an end as soon as the 5.18 kernel, which can be expected in May of this year.

Linked-list concerns

The discussion started with this patch series from Jakob Koschel, who is trying to prevent speculative-execution vulnerabilities tied to the kernel's linked-list primitives. The kernel makes extensive use of doubly-linked lists defined by struct list_head:

    struct list_head {
        struct list_head *next, *prev;
    };

This structure is normally embedded into some other structure; in this way, linked lists can be made with any structure type of interest. Along with the type, the kernel provides a vast array of functions and macros that can be used to traverse and manipulate linked lists. One of those is list_for_each_entry(), which is a macro masquerading as a sort of control structure. To see how this macro is used, imagine that the kernel included a structure like this:

    struct foo {
        int fooness;
        struct list_head list;
    };

The list member can be used to create a doubly-linked list of foo structures; a separate list_head structure is usually declared as the beginning of such a list; assume we have one called foo_list. Traversing this list is possible with code like:

    struct foo *iterator;

    list_for_each_entry(iterator, &foo_list, list) {
        do_something_with(iterator);
    }
    /* Should not use iterator here */

The list parameter tells the macro what the name of the list_head structure is within the foo structure. This loop will be executed once for each element in the list, with iterator pointing to that element.

Koschel included a patch fixing a bug in the USB subsystem where the iterator passed to this macro was used after the exit from the macro, which is a dangerous thing to do. Depending on what happens within the list, the contents of that iterator could be something surprising, even in the absence of speculative execution. Koschel fixed the problem by reworking the code in question to stop using the iterator after the loop.
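A hypothetical example of the pattern involved (not the actual USB code): the safe version remembers a match in a separate variable rather than touching the iterator after the loop ends:

    struct foo *iterator;
    struct foo *found = NULL;

    list_for_each_entry(iterator, &foo_list, list) {
        if (iterator->fooness == 42) {
            found = iterator;    /* safe: record the match here */
            break;
        }
    }
    /* The buggy pattern tests iterator->fooness here instead; if no
       match was found, iterator is a bogus pointer computed from
       foo_list itself, not a pointer to a real struct foo. */
    if (found)
        do_something_with(found);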

The plot twists

Linus Torvalds didn't much like the patch and didn't see how it related to speculative-execution vulnerabilities. After Koschel explained the situation further, though, Torvalds agreed that "this is just a regular bug, plain and simple" and said it should be fixed independently of the larger series. But then he wandered into the real source of the problem: that the iterator passed to the list-traversal macros must be declared in a scope outside of the loop itself:

The whole reason this kind of non-speculative bug can happen is that we historically didn't have C99-style "declare variables in loops". So list_for_each_entry() - and all the other ones - fundamentally always leaks the last HEAD entry out of the loop, simply because we couldn't declare the iterator variable in the loop itself.

If it were possible to write a list-traversal macro that could declare its own iterator, then that iterator would not be visible outside of the loop and this kind of problem would not arise. But, since the kernel is stuck on the C89 standard, declaring variables within the loop is not possible.
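To illustrate what block-scope declarations enable, a traversal macro could declare the iterator itself, along these lines (a simplified sketch, not the kernel's actual implementation):

    /* Simplified sketch; the real kernel macros are built with
       container_of() and handle more cases. */
    #define list_for_each_entry_scoped(type, pos, head, member)        \
        for (type *pos = list_first_entry(head, type, member);         \
             &pos->member != (head);                                   \
             pos = list_next_entry(pos, member))

With a macro like this, the iterator simply does not exist after the loop, so bugs like the USB one above could not be written in the first place.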

Torvalds said that perhaps the time had come to look to moving to the C99 standard — it is still over 20 years old, but is at least recent enough to allow block-level variable declarations. As he noted, this move hasn't been done in the past "because we had some odd problem with some ancient gcc versions that broke documented initializers". But, in the meantime, the kernel has moved its minimum GCC requirement to version 5.1, so perhaps those bugs are no longer relevant.

Arnd Bergmann, who tends to keep a close eye on cross-architecture compiler issues, agreed that it should be possible for the kernel to move forward. Indeed, he suggested that it would be possible to go as far as the C11 standard (from 2011) while the change was being made, though he wasn't sure that C11 would bring anything new that would be useful to the kernel. It might even be possible to move to C17 or the still-unfinished C2x version of the language. That, however, has a downside in that it "would break gcc-5/6/7 support", and the kernel still supports those versions currently. Raising the minimum GCC version to 8.x would likely be more of a jump than the user community would be willing to accept at this point.

Moving to C11 would not require changing the minimum GCC version, though, and thus might be more readily doable. Torvalds was in favor of that idea: "I really would love to finally move forward on this, considering that it's been brewing for many many years". After Bergmann confirmed that it should be possible to do so, Torvalds declared: "Ok, somebody please remind me, and let's just try this early in the 5.18 merge window". The 5.18 merge window is less than one month away, so this is a change that could happen in the near future.

It is worth keeping in mind, though, that a lot of things can happen between the merge window and the 5.18 release. Moving to a new version of the language standard could reveal any number of surprises in obscure places in the kernel; it would not take many of those to cause the change to be reverted for now. But, if all goes well, the shift to C11 will happen in the next kernel release. Converting all of the users of list_for_each_entry() and variants (of which there are well over 15,000 in the kernel) to a new version that doesn't expose the internal iterator seems likely to take a little longer, though.

Comments (147 posted)

Better visibility into packet-dropping decisions

By Jonathan Corbet
February 25, 2022
Dropped packets are a fact of life in networking; there can be any number of reasons why a packet may not survive the journey to its destination. Indeed, there are so many ways that a packet can meet its demise that it can be hard for an administrator to tell why packets are being dropped. That, in turn, can make life difficult in times when users are complaining about high packet-loss rates. Starting with 5.17, the kernel is getting some improved instrumentation that should shed some light on why the kernel decides to route packets into the bit bucket.

This problem is not new, and neither are attempts to address it. The kernel currently contains a "drop_monitor" functionality that was introduced in the 2.6.30 kernel back in 2009. Over the years, it has gained some functionality but has managed to remain thoroughly and diligently undocumented. This feature appears to support a netlink API that can deliver notifications when packets are dropped. Those notifications include an address within the kernel showing where the decision to drop the packet was made, and can optionally include the dropped packets themselves. User-space code can turn the addresses into function names; desperate administrators can then dig through the kernel source to try to figure out what is going on.

It seems like there should be a better way. As it happens, the beginning of the infrastructure to provide that better way was contributed to 5.17 by Menglong Dong. The internal kernel function that frees the memory holding a packet is kfree_skb(); in 5.17, that function has become:

    void kfree_skb_reason(struct sk_buff *skb, enum skb_drop_reason reason);

The reason argument is new; it is intended to say why the packet passed as skb has reached the end of the line. This information is not actually useful to the kernel, but it has been added to the existing kfree_skb tracepoint, making it available to any program that connects to that tracepoint. Analysis scripts can quickly print out why packets are being dropped; administrators can also attach BPF programs to, for example, create a histogram of reasons for dropped packets.

A new version of kfree_skb() has also been added; it simply calls kfree_skb_reason() with "unspecified" as the reason.
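The wrapper is straightforward; it looks something like this (lightly simplified from the 5.17 code):

    static inline void kfree_skb(struct sk_buff *skb)
    {
        kfree_skb_reason(skb, SKB_DROP_REASON_NOT_SPECIFIED);
    }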

In 5.17, the use of this infrastructure is relatively limited. There are a few TCP-level drop locations that have been instrumented with the new call, including code that drops packets for being smaller than the TCP header size, not being associated with an existing TCP socket, exhibiting checksum failures, or having been explicitly dropped by an add-on socket filter program. The UDP subsystem has also been enhanced to note those same reasons for dropped packets.
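The reasons defined in 5.17 map directly onto the cases just listed; the enumeration (from <linux/skbuff.h>, with descriptive comments added here) looks like:

    enum skb_drop_reason {
        SKB_DROP_REASON_NOT_SPECIFIED,  /* plain kfree_skb() */
        SKB_DROP_REASON_NO_SOCKET,      /* no matching socket */
        SKB_DROP_REASON_PKT_TOO_SMALL,  /* smaller than the TCP header */
        SKB_DROP_REASON_TCP_CSUM,       /* TCP checksum failure */
        SKB_DROP_REASON_SOCKET_FILTER,  /* dropped by a socket filter */
        SKB_DROP_REASON_UDP_CSUM,       /* UDP checksum failure */
        SKB_DROP_REASON_MAX,
    };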

The situation is set to improve considerably in 5.18; patches already in linux-next add a number of new reasons. These document packets that were dropped by the netfilter subsystem, that contain IP-header errors, or that have been identified as spoofed by the reverse-path filter (rp_filter) mechanism. Administrators will be able to see IP packets that have been dropped due to an unsupported higher-level protocol. Reasons have also been added for UDP packets dropped due to the IPsec XFRM policy or a lack of memory within the kernel.

There is yet another set of reason annotations that has been accepted, but which has not yet found its way into linux-next; chances are that these will show up in 5.18 as well. They extend the XFRM-policy annotation to TCP, note packets dropped due to missing or incorrect MD5 hashes (which are evidently still a thing in 2022), as well as those containing invalid TCP flags or sequence numbers outside of the current TCP window. These patches also add new instances of the other reasons noted above; some situations can be detected in multiple places.

While the above set of reasons may seem long, this work could be seen as having just begun. In current linux-next, there are over 2,700 calls to kfree_skb(), compared to 18 to kfree_skb_reason(). That suggests that a lot of packets will still be dropped for unspecified reasons. Still, this work represents a useful step forward, one that should make many of the reasons for packet loss more readily available to system administrators.

The part that remains missing, of course, is the user-space side. The current reason codes are all defined in <linux/skbuff.h>, which is not part of the externally available kernel API. Moving them to a separate file under the uapi directory would make them more accessible to developers. Also helpful, of course, would be to have some documentation for this mechanism and how to use it (and interpret the results), but even your editor, often cited for naive optimism, will not be holding his breath for that to show up.

Meanwhile, though, an important piece of the kernel's network functionality is becoming a little more transparent to users. That should make life easier for system administrators who will be able to spend less time trying to figure out why packets aren't making it through the system. Unfortunately, though, this work offers no help for users who are wondering why their packets are disappearing somewhere in the far reaches of the Internet.

Comments (17 posted)

Extending restartable sequences with virtual CPU IDs

By Jonathan Corbet
February 28, 2022
Restartable sequences, a Linux kernel feature that facilitates the writing of lockless, per-CPU code in user space, have been around for some years, but the feature only just gained support in the GNU C Library this month. Now that this barrier has been crossed, it would seem that the time has come to start adding features. Mathieu Desnoyers has responded to this challenge with a patch set adding an extension mechanism and a new "virtual CPU ID" feature.

See the above-linked article for an overview of how restartable sequences work. As a reminder, any thread using restartable sequences must first make use of the rseq() system call to register a special structure with the kernel. That structure is used to point to the rseq_cs structure describing the current critical section (if any); the kernel also ensures that it contains the ID number of the current CPU whenever the thread is running. Consistent with the pattern used in many relatively recent system calls, rseq() requires the caller to also provide the size of the rseq structure being passed in.

That length parameter exists to support future extensions to the system call. New features will generally require new data, increasing the size of the rseq structure. By looking at the size passed by user space, the kernel can tell which version of the rseq() API the calling process expects. When carefully used, this mechanism allows existing system calls to be extended in a way that preserves compatibility with older programs.

That still leaves an open question for programs that need to discover which API version they are dealing with as a way of knowing which features are available. One possibility is to invoke the system call with the most recent version of the structure and fall back to an earlier version if the call fails. Another is to simply have the kernel say which structure size it is prepared to accept. The rseq() patches take the latter approach, making the maximum accepted structure size available via getauxval().

Having added this extension mechanism, the patch set goes on to add two extensions without actually using it. These extensions add two 32-bit values to struct rseq, which does extend its length. But, due to the way that the structure was defined (with 32-byte alignment), it will already have a 32-byte allocated size, even though the (pre-extension) structure only required 20 bytes. That said, user space will still be able to tell whether the new values are supported by looking at the return value from getauxval(). Since the new value (AT_RSEQ_FEATURE_SIZE) did not exist before this patch set showed up, getauxval() will return zero on older kernels.
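In practice, user space would query the feature size early on; a sketch of that check follows (the AT_RSEQ_FEATURE_SIZE constant comes from the patch series and, like the 28-byte threshold, could change before merging):

    #include <stdio.h>
    #include <sys/auxv.h>

    int main(void)
    {
    #ifdef AT_RSEQ_FEATURE_SIZE
        unsigned long size = getauxval(AT_RSEQ_FEATURE_SIZE);

        if (size >= 28)    /* 20 original bytes plus two new 32-bit fields */
            printf("node_id and vm_vcpu_id are available\n");
        else if (size == 0)
            printf("pre-extension kernel\n");
    #endif
        return 0;
    }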

The first of the new values in struct rseq is called node_id and it contains exactly that: the ID number of the NUMA node on which the current thread is running. This is evidently useful for some memory allocators and, as noted in the patch changelog, supports (in conjunction with the already-present CPU ID) an entirely user-space implementation of getcpu().

The other new value is a bit further off the beaten path: it is called vm_vcpu_id. Like the cpu_id field in the same structure, it contains an integer ID number identifying the CPU on which the thread is running. But, while cpu_id contains the CPU's ID number as known by the kernel (and the rest of the system), vm_vcpu_id has no connection with the actual CPU number; it is a virtual number managed by the kernel in a process-private number space.
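Putting the pieces together, the extended structure looks roughly like this (layout simplified and field names taken from the patch series; the real definition lives in the kernel's UAPI headers):

    #include <linux/types.h>

    struct rseq {
        __u32 cpu_id_start;
        __u32 cpu_id;        /* kernel CPU ID, maintained by the kernel */
        __u64 rseq_cs;       /* pointer to the current critical section */
        __u32 flags;
        /* Fields added by this patch set: */
        __u32 node_id;       /* NUMA node of the current CPU */
        __u32 vm_vcpu_id;    /* process-private virtual CPU ID */
    } __attribute__((aligned(32)));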

This new CPU ID appears to be aimed at the needs of programs running threads on a relatively small number of CPUs in a large system. Remember that rseq() is aimed at helping programs access per-CPU data structures; such structures usually take the form of an array indexed by the current CPU ID number. That array must be large enough to hold an entry for every CPU in the system, and every entry must be properly initialized and maintained.

That is just part of the task of working with per-CPU data structures. But imagine a smallish program, with a mere dozen threads or so, running on a large server with, say, 128 CPUs. Those threads may migrate over those CPUs as they run, or they may be bound to a specific subset of CPUs; either way, that per-CPU data structure must be set up for all 128 CPUs, which is not particularly efficient. It would be much nicer to match the "per-CPU" array size to the size of the program rather than that of the system it happens to be running on.

That is the purpose of the virtual CPU ID number. These numbers are assigned by the kernel when a thread is scheduled onto a (real) CPU; the kernel takes pains to ensure that all concurrently running threads in the same process have different virtual CPU ID numbers. Those numbers are assigned from their own space, though, and are chosen to be close to zero. That leaves the program with fewer possible CPU numbers to deal with while preserving the benefits of working with per-CPU data structures.
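A hypothetical use of the new field: a per-"CPU" counter array sized to the program's expected concurrency rather than to the machine. In real code the access would sit inside an rseq critical section, and, as the discussion below notes, a fallback would be needed for IDs beyond the preallocated range; plain access is shown for brevity:

    #define MAX_THREADS 16                /* sized to the program */

    struct counter {
        long value;
        char pad[56];                     /* avoid false sharing */
    };

    static struct counter counters[MAX_THREADS];

    /* rs points to this thread's registered struct rseq (see above). */
    static void count_event(const struct rseq *rs)
    {
        counters[rs->vm_vcpu_id].value++;
    }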

That does raise an interesting question, though: how does an application developer know what the range of possible virtual-CPU numbers is? When asked, Desnoyers explained:

I would expect the user-space code to use some sensible upper bound as a hint about how many per-vcpu data structure elements to expect (and how many to pre-allocate), but have a "lazy initialization" fall-back in case the vcpu id goes up to the number of configured processors - 1.

One might expect the virtual-CPU ID to be bounded by the number of running threads, but the full story is more complicated than that. Using this feature, thus, will require a bit of additional complexity on the user-space side.

Managing these virtual CPU IDs has a potential downside on the kernel side of the API as well: a certain amount of the work must be done in the scheduler's context-switch path, which is one of the hottest and most performance-critical paths in the kernel. Adding overhead there is not welcome. Desnoyers has duly taken a number of steps to minimize that overhead; they are described in this patch changelog. For example, a context switch between two threads of the same program just moves the virtual CPU ID from the outgoing thread to the incoming one, with no atomic operations required. Single-threaded programs are handled specially, and there is a special cache of virtual CPU IDs attached to each run queue which can be used to avoid atomic operations as well.

Benchmarks included in that changelog show that the performance impact of these changes is small in most cases. Whether that will be enough to get the patches past the scheduler maintainers remains to be seen, though; they have yet to comment on this version of the series. Should this mechanism eventually be merged, though, it will be another tool available to developers looking for the best scalability possible in multithreaded applications.

Comments (18 posted)

A Debian GR on secret voting—and more

By Jake Edge
March 1, 2022

Debian has been working on some "constitutional maintenance" of late; a general resolution (GR) on tweaks to the project's decision-making processes passed at the end of January. As part of the discussion surrounding those changes, the question of secret voting came up; currently, Debian publicly lists every voter for a GR and their ranking of the options. Another GR has been proposed to change that, but the discussion has shown that the definition of "secret" is not exactly the same for everyone. In addition, secret voting is not the only change being proposed.

A bit of history

The proximate cause for the interest in secret ballots is the controversial GR that was proposed and voted on in the early part of 2021; it proposed that the Debian project make a statement regarding Richard Stallman's return to the FSF board of directors. The voters decided that Debian would make no distribution-wide statement about that event, by a fairly close margin, but some people in the discussion were uncomfortable voting on the GR, given that their choices would be fully visible to the internet at large. The worry was that proponents or opponents of all the myriad "sides" in the debate would harass those who voted in the "wrong" way.

Back in November, Sam Hartman asked if the secret ballot question should be handled as part of the in-progress GR-process tweaking, or if it should be handled as a separate GR after that had run its course. The consensus seemed to agree with Hartman's assessment that it could overcomplicate the ballot, so he decided to defer it. In that message, though, he outlined the changes he would make to the Debian Constitution to change the GR vote to be a secret one. It would, essentially, follow the lead of the elections for the Debian project leader (DPL), which make all of the ballots public, along with the list of voters, but do not provide a mapping from voter to ballot.

The changes he proposed also sparked some discussion around the role of the project secretary. Hartman's changes said: "Votes are cast in a manner suitable to the Secretary." That removed the "by email" phrase because there might be systems for anonymous voting that do not use email. But, as Carsten Leonhardt pointed out, the manner of voting "needs to also be suitable to the voters".

Secretarial overrides

So, when Hartman returned to the subject after the vote on the earlier GR, in late January, he started with the question of how to handle disagreements between Debian developers (DDs) and the secretary. The constitution currently does not have a mechanism for the developers to override a decision of the secretary. Leonhardt noted one possible area for disagreement, voting systems, but Hartman wanted to solve the more general problem, so he proposed a change that would allow developers to override the secretary with a simple majority vote.

There are, however, two situations where a simple majority does not make sense, he said. The secretary decides which votes require a 3:1 supermajority, so overriding that kind of decision should also require a 3:1 vote. In addition, the secretary determines the outcome of all elections (including GRs) in Debian, some of which might require a supermajority, so it makes sense to require a 3:1 vote to override a vote outcome, as well, he said.

Don Armstrong pointed to clause 4.1.7 (in the section on the powers of developers) as a way to override the secretary: "In case of a disagreement between the project leader and the incumbent secretary, appoint a new secretary." But Hartman was concerned that 4.1.7 was somewhat ambiguous because of the "disagreement" language, so he said that he planned to address that as well in his secret-voting proposal.

Secret?

Jean-Philippe Mengual said that votes in Debian should be secret by default, unless a majority of developers voted to change that for a particular GR. But Holger Levsen wondered what "secret" meant in this context. Hartman started a new thread with a lengthy reply: "TL;DR: I'm proposing that the way we handle DPL elections today is a good start for what secret means."

He outlined the mechanism for project leader elections: in addition to the public ballots (with no mapping to the actual voter), voters get a cryptographic hash that they can use to show that their vote was included in the totals. He listed two possible attacks against that system, both of which could be detected if voters (and those who can vote but did not) verify the election results. If the actual ballots are not retained, and the secretary, who oversees the process, is trustworthy: "I think those attacks are acceptable residual risk from a security standpoint".

Russ Allbery said that he was not quite as sure that voluntary verification was sufficient, however:

I'm a bit concerned that any scheme that doesn't build the cryptographic verification into the process and instead relies on people going out of their way to do verification is not going to be widely verified, and therefore it does create new risk if some future iteration of Debian has a less trustworthy secretary than we do today. To be clear, this is not a new risk; we're already living with this risk for DPL elections and maybe this should be within my risk tolerance. But it's not as clearly within my risk tolerance as it is within Sam's.

There are other risks that come with the current DPL voting process, Bill Allombert said. A developer could be forced to reveal the secret code returned by the devotee vote system, which would expose their vote. In addition, a group of voters on one side of an issue could work together to show that everyone else voted a particular way. Either of those would break the anonymity of the voting.

Hartman acknowledged those weaknesses, but noted that the problems already exist for DPL elections; he wondered if GR votes were categorically different. Allombert replied that the DPL votes are secret "as a courtesy to the candidates". He is not in favor of secret GR ballots, though he does recognize the problem that arose from the Stallman GR. He feels that GRs of that nature should be avoided:

[...] the Debian project is not the collection of opinions of its members since the members only agreed to fulfill the social contract when acting on behalf of Debian and not in general, and that their opinions outside of this is a private matter that must not be [probed], and that even the [aggregate] result of the vote is already leaking information that Debian project has no purpose to gather and publish.

As he refined his proposal, Hartman posted two more versions of the wording for discussion purposes, one on February 13 and another on February 20. As might be guessed, since this is Debian, discussion ensued. Armstrong asked about "specific examples of where someone wasn't able to vote their true preference because the vote was public". He noted that he perhaps sees his role differently with regard to voting than some do:

My personal reasoning is that I see my role as a voting project member as more of a stewardship role where I'm trying to decide what is best for the project, rather than what is best for me personally, and I want to be seen as being a good steward for the project. I also think the large number of voters masks the impact of a single individual vote. [But maybe this is a personal safety issue? Perhaps people should be able to optionally mask their identity when voting? Not sure.]

Hartman said that his country has secret ballots to elect its representatives, but that those representatives are expected to vote publicly in order to be held accountable by those who will possibly re-elect them. But Debian developers are not elected representatives; will they make better choices for the project, he asked, if they have to worry about how their vote will be perceived by others, possibly years down the road?

He also said that he and others were subjected to harassment because they sponsored or supported certain ballot options on the Stallman GR; he pointed to several possible scenarios where developers might not vote their conscience (or at all) because of concern about the reactions from others (e.g. employers or projects). He has not really looked at the individual votes in the past and was uncomfortable doing so, but he did note a response from Philip Hands that contained a valid use for the list of votes made. Hands said:

I have used the results of votes in the past to start conversations with people that I disagree with in some issue in order to better understand how they came to the other view. One can generally find someone on the other side of the argument who you already know and respect, which makes it much harder to dismiss them as an idiot. I'd miss that in a properly secret ballot.

But Hartman said that sponsors and public supporters of various ballot options should "be sufficient that it will not be difficult to find someone who can explain an alternate position even if we hide who voted for what". Karsten Merker replied that he was in favor of secret ballots, in part because the topics of GRs have shifted over time from "either technical or purely Debian-internal organizational issues" to questions like Stallman's return to the FSF board, which are "about a highly explosive public political debate completely external to Debian where there was IMHO absolutely no reason for Debian as a project to become involved at all". Public voting leaves developers without good choices:

Forcing this GR on the developers left all developers with only two choices: either to not vote at all and as a result have a highly explosive political statement that they potentially don't agree with (or even actively disagree with) published in their name, or take part in the vote and be forced to have their political views on the matter made public, political views which they otherwise wouldn't have made public and whose publication could easily have negative effects for them given the political climate around the whole matter - a climate where people in both "camps" had been sharpening their pitchforks and where having one's personal views on the matter published (regardless of which "side" one voted for) might well have negative consequences for one's further professional career.

Making the vote secret does not solve the problem of potentially having project statements made that a developer does not agree with, but it would allow developers to vote against them without subjecting themselves to various repercussions. In the threads, there were others who reported discomfort with voting publicly on GRs in the past, and reported that they had heard privately from developers who did not vote because of that.

Given that it is GRs on political positions that seem to be the most fraught, Hands suggested that perhaps the project could come to a "consensus not to even attempt these sorts of position statements in future, since all they do is highlight divisions". It is not surprising that those kinds of GRs are divisive:

Given that we generally want DDs to be drawn from as diverse a population as possible, we should expect our views on pretty-much any subject other than Free Software to represent the full spectrum of opinion, so drawing an arbitrary line somewhere and then getting the project to divide on which side we should stand as a group is not likely to give a useful result, but will give people reasons to be upset with one another.

While Allbery generally agreed with the sentiment, he did not really think the project could avoid divisive GRs in the future; he is concerned about moving forward without a way to mitigate the effects of those kinds of votes. Meanwhile, he gave some examples of where the problem might occur again:

I find it hard to escape the conclusion that we're going to have some vote in the future that will pose similar risks. Examples of lines of discussion that I think the project cannot (and should not) entirely avoid but that could lead to such a problem include Debconf venue selection, anything related to the project code of conduct including whether we should have one, and membership actions and their potential overrides under 4.1.3. I'll also point out that even technical issues have become heavily polarized and have led to at least borderline [harassment] based on publicly stated positions (see systemd).

General resolution

Though Hartman had earlier expressed some pessimism about his changes gaining enough support to consider moving forward with a GR, that changed over the course of the discussion. On February 23, he proposed the GR, which would make five specific changes to the constitution: switch to secret ballots, not require elections to be conducted by email, clarify that developers can replace the secretary, provide a way for developers to override the secretary, and codify that elections need to provide a means for voters to verify their votes were included and permit independent verification of the outcome. The proposal immediately attracted a half-dozen "seconds", which is sufficient for it to make its way to voters.

Before that, though, amendments and other ballot options can be proposed. Armstrong wanted to ensure that email was still an option: "e-mail should continue to be an option for casting votes even while alternative methods of casting ballots might also be allowed". Hartman was not opposed to the overall idea, but did not want to require a 3:1 supermajority to change the constitution in order to switch away from email to some other system. Armstrong agreed with part of that, but did not want to completely eliminate the email option without a 3:1 vote:

I don't want it to take a 3:1 majority to add additional methods (web based, I'm presuming), but I think not allowing a signed (and/or encrypted) emailed ballot to count should require a 3:1 majority. [The former potentially allows more valid voters to vote, the latter potentially reduces who can vote.]

Hartman recommended a ballot option for email in that case. Bundling all five of the changes that Hartman proposed was seen as a possible problem by Judit Foglszinger and others. Foglszinger proposed a ballot option that just encompassed two of the changes: secret ballots and the codification of voter and outcome verification. The email change, secretary removal provision, and secretarial override change would be dropped: "So it's the proposed GR minus the changes not directly related to introducing secret votes."

Once again, Hartman expressed concern about needing to change the constitution in order to adopt a new voting system, but Scott Kitterman did not see that as a real problem: "So far the rate of change in voting systems leads me to believe this is a quite manageable burden." Beyond that, Bdale Garbee worried that changing the system without a GR could lead to a less well-understood voting system:

Requiring a GR to change the mechanism seems like a completely reasonable way to both vet the proposed change, and ensure a maximum number of potential voters understand what's changing.

Martin Michlmayr said that the two changes on replacing or overriding the secretary seem unrelated to hiding voters' identities, but make up most of the changes in the text. If those two are directly important to secret voting, that should be made clearer, he said; otherwise, they should not all be bundled together. Hartman replied that the constitution already gives the secretary broad powers over elections, but there is currently no clear recourse if developers disagree with the secretary's decision. So he would like to see all of the changes adopted, but he is not comfortable adding secret voting without putting in some kind of recourse for an unpopular decision by the secretary.

That raises the specter of a combinatorial explosion of options, though it is perhaps not as bad as all possible combinations of the five changes. The two secretary curbs seem to go together, as do the two vote changes; coupling those pairs with the email change might reduce the ballot to combinations of just three options—along with "further discussion", or "none of the above" as it is now known, of course.

That's where things stand at the moment, though the conversation is ongoing. The full ballot will shake out relatively soon, as there will be two or three weeks of discussion and ballot additions, starting February 23. One of the tweaks made in the recent decision-process GR has firmed up the timing of the discussion period, so it will be three weeks long unless the DPL reduces it by a week, which seems relatively unlikely. After that, a two-week voting period will follow. All of that should play out over the next month or so. It will be interesting to see where it leads.

Comments (23 posted)

Page editor: Jonathan Corbet


Copyright © 2022, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds