LWN.net Weekly Edition for February 13, 2014
Debian decides on systemd—for now
Technical Committee (TC) chair Bdale Garbee has made his casting vote in favor of systemd as the default init system for Debian "jessie" (8.0). That would seem to put an end to a contentious whipsaw of multiple calls for votes (CFVs) on competing proposals, CFVs ending because the wording was sub-optimal, and other bits of drama, at least for the TC. It is hard to imagine that we have seen the end of this controversy for Debian as a whole, but it is a concrete step toward resolution. There are a number of developments to report since we last looked at the tussle back at the end of January.
Another CFV
On February 5, Ian Jackson called for a vote on his ballot that included language about "loose" vs. "tight" coupling for whatever default init system was chosen. Essentially, "loose coupling" means that no other package can depend on a particular init system running as PID 1, while "tight coupling" would allow such dependencies. Jackson is concerned about GNOME and other packages that depend on systemd for "proper" functioning and would like to see Debian reject that approach. His ballot would also choose the default init system; the various combinations made for ten separate options (each of the four init systems with each coupling possibility, plus the general resolution and further discussion options).
Jackson's CFV came about after several revisions on the tech-ctte mailing list, as well as an IRC meeting to discuss wording. But he eventually withdrew support for the proposal by voting for further discussion (FD) first, after complaints from Steve Langasek and Debian project secretary Kurt Roeckx. Both of them were concerned about wording issues, so Jackson rolled up his sleeves and went back to work.
Another IRC meeting was scheduled and Jackson posted a timetable for how he thought the voting process should proceed. He also made it abundantly clear that any CFV posted without allowing him to amend it to tie it to the coupling question would be, in his eyes, "a clear breach of process".
YA CFV
But posting a CFV is exactly what Garbee did on February 8. From that message, it is clear that Garbee saw no value in complicating the ballot by tying it to the coupling question:
While it was confrontational to call for a vote (and thus preclude amendments) in the face of Jackson's stated plans and warning, it's not clear what else Garbee could have done. There was obviously no consensus on the committee about the contents of the ballot and Jackson made it quite clear that he would offer amendments to a minimal ballot if offered the chance, so Garbee had to circumvent that to put a minimal question before the TC. As TC member Russ Allbery put it:
Garbee acknowledged that the members could override his choice by voting FD, and that Jackson almost certainly would, "but I sincerely hope the rest of you will not do that and instead vote this ballot to a useful conclusion". The question was what the default init system for Debian Linux (and not the Debian kFreeBSD or Hurd ports) should be for the jessie release. It also included language that would allow the project to override the TC's choice via a general resolution (GR) with a simple majority vote (rather than the usual 2:1 majority required for overriding the TC).
Decision
The vote came down much as expected, with a 4:4 split between systemd and Upstart proponents. Anthony Towns analyzed the votes and declared a tie between systemd and Upstart, which left it up to the chairman to decide by using the casting vote. Garbee did just that, voting for systemd, which makes it the Debian Linux default init for jessie. At least for now, since the prospect of a GR to decide is being bandied about.
There is another concern that Garbee expressed in his CFV: whether the TC has the "jurisdiction" to decide on the coupling question at all. Opinions vary, but coupling was certainly not part of the original question that was asked of the TC.
The coupling question is a bit murky from a Debian Constitution standpoint, as Roeckx is concerned that a binding decision from the TC (rather than just an advisory one) would run counter to the committee's mandate. In section 6.3.5 of the Constitution, the committee is restricted from "engag[ing] in design of new proposals and policies" and should instead be "choosing from or adopting compromises between solutions and decisions which have been proposed and reasonably thoroughly discussed elsewhere". Deciding on loose vs. tight coupling might violate either or both, though Roeckx has not offered a formal opinion on that question. Jackson took strong exception to Roeckx's concerns and claimed that the Project Secretary should not be able to overrule the TC:
I think all of these things are very dangerous territory for the Secretary. The Secretary should avoid getting involved in the substance of these kind of subjective disputes about what is and is not sufficiently ripe, or what is or isn't detailed design, or what is or isn't sufficient consultation.
Roeckx did not disagree about the Secretary's powers, but he did see the possibility that a dispute might arise. He would have to make a decision at that point, which would presumably be handled by a GR if other Debian Developers (DDs) agreed. Mostly, he would like to avoid the problem altogether:
Aftermath
Garbee's CFV clearly made Jackson extremely angry. As Garbee predicted, Jackson voted FD above all other choices (and ranked the init choices in the following order: sysvinit, OpenRC, Upstart, systemd), but he also let his anger get away from him. He quickly called for three separate votes on various earlier proposals, as well as a call for the TC to ask Garbee to resign as its chair. That was quickly followed up with a note that he would be stepping away from the keyboard and taking a break from TC business for a few days.
As Allbery noted in the message linked above and another, both of which were in response to a call for Jackson to be removed from the TC, it is not surprising that Jackson got so angry at what he sees as a "betrayal". Allbery argued that Jackson retreated into the details of the process because the decision was so divisive:
He goes on to show how "real world" legislative bodies suffer from some of the same troubles. The US Senate has been fighting about its procedures for some years, and a unilateral change to those traditions led to some of the same kind of rancor we see here. But Allbery brought up another important point: the tradition of consensus ballot-building (and, in general, largely consensus-based decisions) in the TC meant that the corner cases when things got messy had never been tested. This particular incident has shown a number of those corner cases, any or all of which might lead to GRs to fix them.
One of those corner cases was alluded to by Garbee in his latest CFV: that the Condorcet voting method, coupled with Debian's vote resolution procedure, could produce results that were, at best, unexpected. As Allbery said, the procedure fails the "later no harm" criterion. In a nutshell, that means that a participant can cause their preferred outcome to fail by ranking less-preferred outcomes (e.g. an Upstart proponent could cause systemd to win by ranking it second to Upstart).
Sébastien Villemot posted an analysis showing one possible surprising outcome: "In particular, if the ballot had not included the options about coupling, then systemd would have won because of the casting vote of the chairman."
It is not exactly clear why Jackson is so insistent that the two questions be tied together. He said that the coupling question was the most important in his mind and that voting for the default init first would make it difficult for him to decide where to rank systemd, since he is trying to avoid the systemd-tight-coupling pairing. But there is nothing stopping him from assuming that tight coupling wins and voting down systemd accordingly. In the final analysis, there never seemed to be enough votes for loose coupling to win, either on its own or in a ballot with the default init question, which is part of why some saw Jackson's actions as largely just delaying the inevitable.
There are ongoing discussions between Langasek and Allbery to reach some kind of accord on the coupling question (or at least a ballot that may be agreeable). TC member Keith Packard is optimistic that enough progress is being made in that discussion that it may be resolvable without the TC having to make a decision at all:
§6.3.6 says the TC should only be applied when other efforts have failed. Frankly, it's pretty darn hard to see the current discussions as 'failed' given how much progress has been made.
Next steps
Jackson reappeared on the mailing list on February 12. He posted a draft of the coupling question (along with the wording to allow a majority override via GR) that he plans to put up for a vote on February 14 (with amendments as suggested by other TC members in the interim). He also issued an ultimatum: he will propose and/or sponsor a GR on the init question (presumably the default init system and the coupling question) under a few different scenarios. If his proposal results in a vote of FD or if anyone calls an immediate CFV on a related issue without giving him the opportunity to amend it, he will immediately propose a GR. Beyond that: "If the TC's conclusion on the coupling question is IMO not sufficiently robust I will probably canvass opinions before deciding whether to propose a GR."
The GR question is, of course, the big one moving forward. It's hard to imagine that six DDs—the number required to put a GR to a vote—won't come together to try to push the decision in one direction or the other, so the only real question may be what form it will take. Will there be a push to switch the default to Upstart? Or, as some think more likely, a move back to the status quo of sysvinit? Or will Jackson sponsor a GR? The possibility of multiple conflicting GRs also exists. There are, undoubtedly, wagers being made about various possible outcomes, but for now the default for Debian in the jessie release for Linux will be systemd. Stay tuned for further updates ...
New features in Python 3.4
Python major releases come at roughly 18-month intervals, so the announcement of the first release candidate for Python 3.4 is pretty much right on schedule—Python 3.3 was released in September 2012. There are no changes to the core language in this release, but there are changes to the implementation and libraries; we will look at some of those changes here.
Python development proceeds in a generally orderly fashion. Python Enhancement Proposals (PEPs) are proposed, discussed, and pronounced upon by BDFL Guido van Rossum (or his designee for that PEP, the BDFL-Delegate). If they are accepted, they get added into the next release of the language. So, for Python 3.4, there is a list of PEPs that were incorporated into the release.
There are, for example, new modules that have been added to the standard library. Unlike a number of other "scripting" languages, Python comes "batteries included" with a large and diverse set of utility modules. A new enum type has been added to bind symbolic names to constant values. We looked at the discussions around the new type back in May 2013. It will allow code like the following:
    from enum import Enum

    class Color(Enum):
        red = 1
        blue = 2
        green = 3
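For illustration (this part is not from the article's example), enumeration members can then be retrieved by attribute, by value, or by name:

    >>> Color.red
    <Color.red: 1>
    >>> Color(2)
    <Color.blue: 2>
    >>> Color['green'].value
    3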
Another addition is the statistics module. It provides a basic set of statistical functions, such as mean, median, mode, variance, standard deviation, and so on. It is not meant to be an advanced statistical function library; for that, users should turn to NumPy or some other tool.
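As a quick, hypothetical taste of the new module (the data is arbitrary):

    from statistics import mean, median, stdev

    samples = [2.75, 1.75, 1.25, 0.25, 0.5, 1.25, 3.5]
    print(mean(samples))     # arithmetic mean
    print(median(samples))   # middle value
    print(stdev(samples))    # sample standard deviation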
Van Rossum's latest project (beyond just shepherding Python development, of course) is the asyncio module for supporting asynchronous I/O in the standard library. There have long been solutions outside of the standard library for doing asynchronous I/O (e.g. Twisted, Tornado, gevent, ...). Van Rossum looked at the existing frameworks and came up with his own. It adopts some features from Twisted (Transports and Protocols), but has a way for those who don't like callbacks (Van Rossum doesn't) to use a combination of Futures and yield from to create coroutines. It is complicated to understand (he called it "brain-exploding" at PyCon 2013), but is eagerly anticipated in some circles.
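A minimal sketch of the programming model, assuming the 3.4-era API (asyncio.coroutine plus yield from rather than the callbacks Van Rossum dislikes), might look like this:

    import asyncio

    @asyncio.coroutine
    def greet(delay, message):
        # yield from suspends this coroutine without blocking the event loop
        yield from asyncio.sleep(delay)
        print(message)

    loop = asyncio.get_event_loop()
    loop.run_until_complete(asyncio.wait([greet(1, "hello"), greet(2, "world")]))
    loop.close()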
Developers are likely to appreciate the addition of a tracemalloc module. Valgrind and other tools can report on memory allocation problems, but those reports are based on the C code in the interpreter rather than the Python code that triggered the problems. Using tracemalloc will allow developers to see a Python traceback from the location where objects were allocated. For ease of use, it can be enabled via an environment variable or command-line flag.
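A rough sketch of how it might be used (the list comprehension is just a stand-in for a real allocation site):

    import tracemalloc

    tracemalloc.start()                      # same effect as PYTHONTRACEMALLOC=1 or -X tracemalloc
    data = [dict(value=i) for i in range(100000)]
    snapshot = tracemalloc.take_snapshot()
    for stat in snapshot.statistics('lineno')[:3]:
        print(stat)                          # file, line, and how much memory was allocated there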
The final new module added for 3.4 is an object-oriented path-handling module, which is based on the existing Python Package Index (PyPI) pathlib module and uses the same name. It provides a higher-level abstraction than the os.path module does and will make a lot of path-handling code easier to write. It comes with a rich set of operations to handle matching, globbing, various kinds of path decomposition, and so on.
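A brief, hypothetical example of the object-oriented style (the paths are arbitrary):

    from pathlib import Path

    etc = Path('/etc')
    for conf in etc.glob('*.conf'):          # globbing returns Path objects
        print(conf.name, conf.suffix)
    print(Path('/usr') / 'local' / 'bin')    # the / operator joins path components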
The standard functools module gets some additional functionality to support "single dispatch" functions. These are functions that have different implementations based on the type of a single argument (thus "single" dispatch). That will allow separate functions to implement the functionality for each different type:
    from functools import singledispatch

    @singledispatch
    def fun(arg, optarg=False):
        print(arg)

    @fun.register(int)
    def _(arg, optarg=False):
        print(arg + 10)

    @fun.register(list)
    def _(arg, optarg=False):
        for i in arg:
            print(i)

The singledispatch decorator indicates that fun() will have different implementations based on the type of arg. The register() calls then set up each function, so calling fun() with different argument types would look something like this:
    >>> fun(10)
    20
    >>> fun([ 'a', 'list' ])
    a
    list
    >>> fun(30.9)
    30.9
One of the headline features of 3.4 will surely be the inclusion of the pip installer bundled with the language. Python has long lacked a standard installer for external modules, so the addition of pip will be welcomed by many. Nick Coghlan, one of the PEP's authors, spoke about Python packaging at linux.conf.au earlier this year.
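Under the hood, the bundling is handled by the new ensurepip module described in PEP 453; a minimal sketch of invoking it programmatically (it is more commonly run as "python -m ensurepip"):

    import ensurepip

    # Install the bundled copy of pip into the current environment
    ensurepip.bootstrap()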
The Python import system got an upgrade as well. The ModuleSpec type has been added to hold a number of attributes about modules (name, location, loader, etc.) that can be used when extending the import system.
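For example, assuming the 3.4 importlib.util API, a module's spec can be examined without importing the module:

    import importlib.util

    spec = importlib.util.find_spec("json")
    print(spec.name)      # "json"
    print(spec.origin)    # path to the file that would be loaded
    print(spec.loader)    # the loader that would be used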
The hash algorithm used for strings (and bytes) has been changed to use SipHash on most systems to eliminate a problem with hash collisions. Those hashes are used for dictionary keys; a large number of collisions can result in a major performance decrease and, eventually, a denial of service. We looked at the switch back in November.
A new pickle protocol has been added (protocol number 4). It will be more space-efficient, support large objects requiring 64-bit lengths, handle sets and frozensets, and more.
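A small illustration (the payload is arbitrary) of selecting the new protocol explicitly:

    import pickle

    payload = {frozenset({'a', 'b'}): list(range(5))}
    blob = pickle.dumps(payload, protocol=4)   # protocol 4 is new in 3.4; older Pythons cannot read it
    print(pickle.loads(blob))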
Another smaller enhancement makes file descriptors non-inheritable by child processes. It will use the O_CLOEXEC flag to the open() system call on POSIX systems to ensure that children don't inherit the parent's descriptors, which can cause a number of race conditions and security holes. "We are aware of the code breakage this is likely to cause, and doing it anyway for the good of mankind."
There are also a number of new features in the CPython code (which is what implements the standard Python interpreter). An improvement to the garbage collector will allow objects with finalizers (e.g. a __del__() method) to be reclaimed even if they are part of a reference cycle. There is also a new API for memory allocators. Under some circumstances, like embedding the interpreter or running it on low-memory devices, a different memory allocator is desirable; a new PEP adds a stable API to do so.
In addition, Python 3.4 release manager Larry Hastings has added the Argument Clinic, which is a domain-specific language (DSL) to simplify the argument processing of "built-ins" (language functions that are implemented in CPython). It is a pre-processor that is run on suitably annotated C files and adds C code right after the annotation (which lives in a comment). An Argument Clinic Derby was held recently to add the annotations to much of the C source of the interpreter.
There are also lots of bug fixes that went into the release, of course. Those are detailed in the Changelog. If the current release schedule holds, we can expect the final Python 3.4 release on March 16.
Security
Enigmail vs Thunderbird vs line-wrapping
There are plenty of security buffs who lament that it may be too late for PGP encryption to ever become common practice for email among the general public, but many of them continue to believe that PGP signatures on email still have a fighting chance. After all, the signature adds its value without making the message unreadable to those recipients who lack the proper software support. Yet "proper software support" is a tricky level to achieve, as users of Mozilla Thunderbird have known for a while. A longstanding problem in the way Thunderbird interacts with the PGP add-on Enigmail triggers false signature-mismatch warnings in certain situations (not all of which are under the user's control), illustrating yet again how difficult implementing security in the email realm can be.
In a recent blog entry about encouraging GnuPG usage among Debian Developers, Jo Shields wrote about the problem, telling readers to avoid Enigmail entirely:
Such a claim might sound shocking, considering that Enigmail is one of the most popular Thunderbird add-ons and Thunderbird one of the most popular desktop email applications. Surely, if there was such a major bug, it would have gotten fixed quickly. But many others have pointed out the same problem over the course of several years—at least since 2007, and as recently as last year.
Essentially, the trouble happens when Enigmail attaches an inline PGP signature to an email in Thunderbird's HTML message composer. The HTML composer is a different component than the plain-text composer, and it performs some "clean up" on the message body after the user hits send. That is an obvious recipe for trouble, since it occurs after the signature was computed over the message. Any alterations, including those that are invisible to the user (such as white-space changes or replacing special characters with HTML character codes) will alter the hash value of the message, which is the element of the signature that is encrypted by the sender's private key.
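The effect is easy to reproduce outside of any mail client. This toy sketch (using a bare hash rather than a real PGP signature) shows how a wrapping change that a reader would barely notice yields a different digest, which is all it takes for signature verification to fail:

    import hashlib

    original = "See https://example.com/" + "a" * 80 + " for details"
    rewrapped = original[:79] + "\n" + original[79:]    # what a composer might do after signing

    print(hashlib.sha256(original.encode()).hexdigest())
    print(hashlib.sha256(rewrapped.encode()).hexdigest())  # different digest, so the signature no longer verifies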
In this case, the alteration that happens to the message body is automatic line-wrapping. Thunderbird's line-wrapping for HTML messages breaks lines that exceed 79 characters (or whatever the value of the mailnews.wraplength preference is set to), so not every message is affected. In an attempt to avert this trouble, Enigmail performs its own line-wrapping on the message body just before generating the signature, at mailnews.wraplength - 2.
Nevertheless, there are invariably some situations when a single "word" is longer than 77 characters; the simplest example is a lengthy URL. In these situations, the automatic line-wrapping Thunderbird performs after Enigmail has processed the message splits the long line at the mailnews.wraplength point when it is sent, so the signature no longer validates when the email recipient's PGP client checks it. Changing Thunderbird's line-wrapping behavior is not simple either; it requires changing several preferences. As Enigmail lead developer Patrick Brunschwig said in a 2009 comment thread (comment #10), "The problem behind it is that Mozilla is too clever -- it re-wraps the message after Enigmail has signed it, even though Enigmail already applied line wrapping with the same methods as HTML." Since Thunderbird provides a constrained API for extensions, there is nothing Enigmail can do. Thus, he continued, "the only solutions I have are: either use PGP/MIME or write plaintext messages."
Unfortunately, while support for inline PGP signatures is fairly widespread, support for PGP/MIME (which in essence makes the signature a message attachment) is less common—particularly with proprietary email clients. In addition, Thunderbird's default behavior is to compose replies in the same format as the original email; one can force it to reply to an HTML email with plain text by holding down the "Shift" key when punching the "Reply" button or by altering each account's composition settings, but both options seem like an unnecessary hassle. After all, as quite a few bug reporters have noted in the various bug reports about this topic, it is at the very least odd that Thunderbird auto-line-wraps HTML messages but does not do the same to plain-text messages. It would seem like HTML content could be sent as-is, leaving the receiver's email client to render the message in however many columns are available.
Plain-text emails are not problem-free either, however. Thunderbird's default is to send plain text in the format=flowed (RFC 2646) format, which can lose leading spaces; Enigmail tries to compensate for this by transforming leading spaces to "~". Moreover, Enigmail also dash-escapes plain text (as required by the OpenPGP specification), which regularly causes problems for people emailing software patches with signatures.
One way to look at the whole mess is that the root of the problem is the existence of two ways to include a PGP signature in a message (inline and through PGP/MIME), two code paths to compose email in Thunderbird (plain text and HTML), three programs that process the message between the user hitting "send" and the email leaving the machine (GnuPG, Enigmail, and Thunderbird), and multiple preferences that affect line-wrapping. There is certainly no shortage of opportunities for finger-pointing, but considering all of the variables involved, an equally defensible conclusion is that digital email signatures—despite their relatively small size on screen—ultimately cannot be simplified down to point-and-click ease.
Brief items
Security quotes of the week
New vulnerabilities
chromium: multiple vulnerabilities
Package(s): chromium-browser-stable
CVE #(s): CVE-2013-6641 CVE-2013-6643 CVE-2013-6644 CVE-2013-6645 CVE-2013-6646 CVE-2013-6649 CVE-2013-6650
Created: February 10, 2014
Updated: March 10, 2014
Description: From the CVE entries:
Use-after-free vulnerability in the FormAssociatedElement::formRemovedFromTree function in core/html/FormAssociatedElement.cpp in Blink, as used in Google Chrome before 32.0.1700.76 on Windows and before 32.0.1700.77 on Mac OS X and Linux, allows remote attackers to cause a denial of service or possibly have unspecified other impact by leveraging improper handling of the past names map of a FORM element. (CVE-2013-6641)

The OneClickSigninBubbleView::WindowClosing function in browser/ui/views/sync/one_click_signin_bubble_view.cc in Google Chrome before 32.0.1700.76 on Windows and before 32.0.1700.77 on Mac OS X and Linux allows attackers to trigger a sync with an arbitrary Google account by leveraging improper handling of the closing of an untrusted signin confirm dialog. (CVE-2013-6643)

Multiple unspecified vulnerabilities in Google Chrome before 32.0.1700.76 on Windows and before 32.0.1700.77 on Mac OS X and Linux allow attackers to cause a denial of service or possibly have other impact via unknown vectors. (CVE-2013-6644)

Use-after-free vulnerability in the OnWindowRemovingFromRootWindow function in content/browser/web_contents/web_contents_view_aura.cc in Google Chrome before 32.0.1700.76 on Windows and before 32.0.1700.77 on Mac OS X and Linux allows user-assisted remote attackers to cause a denial of service or possibly have unspecified other impact via vectors involving certain print-preview and tab-switch actions that interact with a speech input element. (CVE-2013-6645)

Use-after-free vulnerability in the Web Workers implementation in Google Chrome before 32.0.1700.76 on Windows and before 32.0.1700.77 on Mac OS X and Linux allows remote attackers to cause a denial of service or possibly have unspecified other impact via vectors related to the shutting down of a worker process. (CVE-2013-6646)

Use-after-free vulnerability in the RenderSVGImage::paint function in core/rendering/svg/RenderSVGImage.cpp in Blink, as used in Google Chrome before 32.0.1700.102, allows remote attackers to cause a denial of service or possibly have unspecified other impact via vectors involving a zero-size SVG image. (CVE-2013-6649)

The StoreBuffer::ExemptPopularPages function in store-buffer.cc in Google V8 before 3.22.24.16, as used in Google Chrome before 32.0.1700.102, allows remote attackers to cause a denial of service (memory corruption) or possibly have unspecified other impact via vectors that trigger incorrect handling of "popular pages." (CVE-2013-6650)
chrony: distributed denial of service via amplification
Package(s): chrony
CVE #(s): CVE-2014-0021
Created: February 6, 2014
Updated: February 20, 2014
Description: From the Red Hat bugzilla entry:
Miroslav Lichvar from Red Hat reports that the cmdmon protocol implemented in chrony was found to be vulnerable to DDoS attacks using traffic amplification. By default, commands are allowed only from localhost, but it's possible to configure chronyd to allow commands from any address. This could allow a remote attacker to cause a DoS, which could cause excessive resource usage.
fwsnort: code execution
Package(s): fwsnort
CVE #(s): CVE-2014-0039
Created: February 12, 2014
Updated: February 12, 2014
Description: From the CVE entry:
Untrusted search path vulnerability in fwsnort before 1.6.4, when not running as root, allows local users to execute arbitrary code via a Trojan horse fwsnort.conf in the current working directory.
icedtea-web: denial of service
Package(s): icedtea-web
CVE #(s):
Created: February 7, 2014
Updated: February 12, 2014
Description: From the Fedora advisory: Multiple applets on one page cause deadlock. The Red Hat bug report contains additional information: This is deadlock caused by multiple applets with shared classloader.
icedtea-web: insecure temporary file use
Package(s): icedtea-web
CVE #(s): CVE-2013-6493
Created: February 7, 2014
Updated: March 6, 2014
Description: From the Fedora advisory: insecure temporary file use flaw in LiveConnect implementation.
ikiwiki: javascript code injection
Package(s): ikiwiki
CVE #(s):
Created: February 10, 2014
Updated: February 12, 2014
Description: From the Red Hat bugzilla:
It was found that the osm plugin for ikiwiki uses htmlscrubber (if enabled) to sanitize some parameters. Even when it is enabled, it was found that it still does not correctly escape some fields. In particular, the "name" parameter is included verbatim, breaking involuntarily javascript when the name contains a single quote/apostrophe ('). Due to this, javascript code injection might become trivial.
kernel: denial of service
Package(s): kernel
CVE #(s): CVE-2013-6431
Created: February 7, 2014
Updated: February 12, 2014
Description: From the CVE entry:
The fib6_add function in net/ipv6/ip6_fib.c in the Linux kernel before 3.11.5 does not properly implement error-code encoding, which allows local users to cause a denial of service (NULL pointer dereference and system crash) by leveraging the CAP_NET_ADMIN capability for an IPv6 SIOCADDRT ioctl call.
kernel: denial of service
Package(s): kernel
CVE #(s): CVE-2013-6432
Created: February 7, 2014
Updated: February 12, 2014
Description: From the CVE entry:
The ping_recvmsg function in net/ipv4/ping.c in the Linux kernel before 3.12.4 does not properly interact with read system calls on ping sockets, which allows local users to cause a denial of service (NULL pointer dereference and system crash) by leveraging unspecified privileges to execute a crafted application.
kernel: multiple vulnerabilities
Package(s): kernel
CVE #(s): CVE-2014-1438 CVE-2014-1446 CVE-2014-1690
Created: February 10, 2014
Updated: February 12, 2014
Description: From the CVE entries:
The restore_fpu_checking function in arch/x86/include/asm/fpu-internal.h in the Linux kernel before 3.12.8 on the AMD K7 and K8 platforms does not clear pending exceptions before proceeding to an EMMS instruction, which allows local users to cause a denial of service (task kill) or possibly gain privileges via a crafted application. (CVE-2014-1438)

The yam_ioctl function in drivers/net/hamradio/yam.c in the Linux kernel before 3.12.8 does not initialize a certain structure member, which allows local users to obtain sensitive information from kernel memory by leveraging the CAP_NET_ADMIN capability for an SIOCYAMGCFG ioctl call. (CVE-2014-1446)

Linux kernel built with the NetFilter Connection Tracking(NF_CONNTRACK) support for IRC protocol(NF_NAT_IRC), is vulnerable to an information leakage flaw. It could occur when communicating over direct client-to-client IRC connection(/dcc) via a NAT-ed network. Kernel attempts to mangle IRC TCP packet's content, wherein an uninitialised 'buffer' object is copied to a socket buffer and sent over to the other end of a connection. (CVE-2014-1690)
libav: multiple vulnerabilities
Package(s): libav
CVE #(s): CVE-2011-3944 CVE-2013-0845 CVE-2013-0846 CVE-2013-0849 CVE-2013-0865 CVE-2013-7010 CVE-2013-7014 CVE-2013-7015
Created: February 6, 2014
Updated: February 12, 2014
Description: From the Debian advisory:
Several security issues have been corrected in multiple demuxers and decoders of the libav multimedia library. The IDs mentioned above are just a portion of the security issues fixed in this update. A full list of the changes is available at http://git.libav.org/?p=libav.git;a=blob;f=Changelog;hb=r...
libcommons-fileupload-java: denial of service
Package(s): libcommons-fileupload-java
CVE #(s): CVE-2014-0050
Created: February 10, 2014
Updated: April 18, 2014
Description: From the Debian advisory:
It was discovered that the Apache Commons FileUpload package for Java could enter an infinite loop while processing a multipart request with a crafted Content-Type, resulting in a denial-of-service condition.
libspring-java: multiple vulnerabilities
Package(s): libspring-java
CVE #(s): CVE-2013-6429 CVE-2013-6430
Created: February 10, 2014
Updated: February 12, 2014
Description: From the Debian advisory:
It was discovered by the Spring development team that the fix for the XML External Entity (XXE) Injection (CVE-2013-4152) in the Spring Framework was incomplete. Spring MVC's SourceHttpMessageConverter also processed user provided XML and neither disabled XML external entities nor provided an option to disable them. SourceHttpMessageConverter has been modified to provide an option to control the processing of XML external entities and that processing is now disabled by default.

In addition Jon Passki discovered a possible XSS vulnerability: The JavaScriptUtils.javaScriptEscape() method did not escape all characters that are sensitive within either a JS single quoted string, JS double quoted string, or HTML script data context. In most cases this will result in an unexploitable parse error but in some cases it could result in an XSS vulnerability.
mediawiki: code execution
Package(s): mediawiki
CVE #(s): CVE-2014-1610
Created: February 7, 2014
Updated: February 12, 2014
Description: From the CVE entry:
MediaWiki 1.22.x before 1.22.2, 1.21.x before 1.21.5 and 1.19.x before 1.19.11, when DjVu or PDF file upload support is enabled, allows remote attackers to execute arbitrary commands via shell metacharacters in (1) the page parameter to includes/media/DjVu.php; (2) the w parameter (aka width field) to thumb.php, which is not properly handled by includes/media/PdfHandler_body.php; and possibly unspecified vectors in (3) includes/media/Bitmap.php and (4) includes/media/ImageHandler.php.
mozilla: multiple vulnerabilities
Package(s): firefox, thunderbird, seamonkey
CVE #(s): CVE-2014-1478 CVE-2014-1480 CVE-2014-1483 CVE-2014-1484 CVE-2014-1485 CVE-2014-1489 CVE-2014-1488
Created: February 10, 2014
Updated: January 26, 2015
Description: From the CVE entries:
Multiple unspecified vulnerabilities in the browser engine in Mozilla Firefox before 27.0 and SeaMonkey before 2.24 allow remote attackers to cause a denial of service (memory corruption and application crash) or possibly execute arbitrary code via vectors related to the MPostWriteBarrier class in js/src/jit/MIR.h and stack alignment in js/src/jit/AsmJS.cpp in OdinMonkey, and unknown other vectors. (CVE-2014-1478)

The file-download implementation in Mozilla Firefox before 27.0 and SeaMonkey before 2.24 does not properly restrict the timing of button selections, which allows remote attackers to conduct clickjacking attacks, and trigger unintended launching of a downloaded file, via a crafted web site. (CVE-2014-1480)

Mozilla Firefox before 27.0 and SeaMonkey before 2.24 allow remote attackers to bypass the Same Origin Policy and obtain sensitive information by using an IFRAME element in conjunction with certain timing measurements involving the document.caretPositionFromPoint and document.elementFromPoint functions. (CVE-2014-1483)

Mozilla Firefox before 27.0 on Android 4.2 and earlier creates system-log entries containing profile paths, which allows attackers to obtain sensitive information via a crafted application. (CVE-2014-1484)

The Content Security Policy (CSP) implementation in Mozilla Firefox before 27.0 and SeaMonkey before 2.24 operates on XSLT stylesheets according to style-src directives instead of script-src directives, which might allow remote attackers to execute arbitrary XSLT code by leveraging insufficient style-src restrictions. (CVE-2014-1485)

Mozilla Firefox before 27.0 does not properly restrict access to about:home buttons by script on other pages, which allows user-assisted remote attackers to cause a denial of service (session restore) via a crafted web site. (CVE-2014-1489)

The Web workers implementation in Mozilla Firefox before 27.0 and SeaMonkey before 2.24 allows remote attackers to execute arbitrary code via vectors involving termination of a worker process that has performed a cross-thread object-passing operation in conjunction with use of asm.js. (CVE-2014-1488)
mupdf: denial of service
Package(s): mupdf
CVE #(s): CVE-2014-2013
Created: February 6, 2014
Updated: December 29, 2014
Description: From the Red Hat bugzilla entry:
A stack-based buffer overflow was found in mupdf's xps_parse_color() function. An attacker could create a specially crafted XPS file that, when opened, could cause mupdf or an application using mupdf to crash.
pam_skey: information disclosure
Package(s): pam_skey
CVE #(s): CVE-2013-4285
Created: February 10, 2014
Updated: February 12, 2014
Description: From the Gentoo advisory:
Ulrich Müller reported that a Gentoo patch to PAM S/Key does not remove credentials provided by the user from memory. A local attacker with privileged access could inspect a memory dump to gain access to cleartext credentials provided by users.
parcimonie: information disclosure
Package(s): parcimonie
CVE #(s): CVE-2014-1921
Created: February 12, 2014
Updated: February 12, 2014
Description: From the Debian advisory:
Holger Levsen discovered that parcimonie, a privacy-friendly helper to refresh a GnuPG keyring, is affected by a design problem that undermines the usefulness of this piece of software in the intended threat model. When using parcimonie with a large keyring (1000 public keys or more), it would always sleep exactly ten minutes between two key fetches. This can probably be used by an adversary who can watch enough key fetches to correlate multiple key fetches with each other, which is what parcimonie aims at protecting against. Smaller keyrings are affected to a smaller degree. This problem is slightly mitigated when using a HKP(s) pool as the configured GnuPG keyserver.
socat: denial of service
Package(s): socat
CVE #(s): CVE-2014-0019
Created: February 12, 2014
Updated: February 17, 2014
Description: From the CVE entry:
Stack-based buffer overflow in socat 1.3.0.0 through 1.7.2.2 and 2.0.0-b1 through 2.0.0-b6 allows local users to cause a denial of service (segmentation fault) via a long server name in the PROXY-CONNECT address in the command line.
thunderbird: multiple vulnerabilities
Package(s): thunderbird
CVE #(s): CVE-2014-1490 CVE-2014-1491
Created: February 7, 2014
Updated: February 20, 2014
Description: From the CVE entries:
Race condition in libssl in Mozilla Network Security Services (NSS) before 3.15.4, as used in Mozilla Firefox before 27.0, Firefox ESR 24.x before 24.3, Thunderbird before 24.3, SeaMonkey before 2.24, and other products, allows remote attackers to cause a denial of service (use-after-free) or possibly have unspecified other impact via vectors involving a resumption handshake that triggers incorrect replacement of a session ticket. (CVE-2014-1490)

Mozilla Network Security Services (NSS) before 3.15.4, as used in Mozilla Firefox before 27.0, Firefox ESR 24.x before 24.3, Thunderbird before 24.3, SeaMonkey before 2.24, and other products, does not properly restrict public values in Diffie-Hellman key exchanges, which makes it easier for remote attackers to bypass cryptographic protection mechanisms in ticket handling by leveraging use of a certain value. (CVE-2014-1491)
xen: multiple vulnerabilities
Package(s): xen
CVE #(s): CVE-2014-1891 CVE-2014-1892 CVE-2014-1893 CVE-2014-1894 CVE-2014-1896
Created: February 12, 2014
Updated: February 25, 2014
Description: From the Xen advisories [1, 2, 3]:
[1] The FLASK_{GET,SET}BOOL, FLASK_USER and FLASK_CONTEXT_TO_SID suboperations of the flask hypercall are vulnerable to an integer overflow on the input size. The hypercalls attempt to allocate a buffer which is 1 larger than this size and is therefore vulnerable to integer overflow and an attempt to allocate then access a zero byte buffer. (CVE-2014-1891) Xen 3.3 through 4.1, while not affected by the above overflow, have a different overflow issue on FLASK_{GET,SET}BOOL (CVE-2014-1893) and expose unreasonably large memory allocation to arbitrary guests (CVE-2014-1892). Xen 3.2 (and presumably earlier) exhibit both problems with the overflow issue being present for more than just the suboperations listed above. (CVE-2014-1894 for the subops not covered above.) The FLASK_GETBOOL op is available to all domains. The FLASK_SETBOOL op is only available to domains which are granted access via the Flask policy. However the permissions check is performed only after running the vulnerable code and the vulnerability via this subop is exposed to all domains. The FLASK_USER and FLASK_CONTEXT_TO_SID ops are only available to domains which are granted access via the Flask policy. Attempting to access the result of a zero byte allocation results in a processor fault leading to a denial of service.

[2] The FLASK_AVC_CACHESTAT hypercall, which provides access to per-cpu statistics on the Flask security policy, incorrectly validates the CPU for which statistics are being requested. An attacker can cause the hypervisor to read past the end of an array. This may result in either a host crash, leading to a denial of service, or access to a small and static region of hypervisor memory, leading to an information leak.

[3] libvchan (a library for inter-domain communication) does not correctly handle unusual or malicious contents in the xenstore ring. A malicious guest can exploit this to cause a libvchan-using facility to read or write past the end of the ring. libvchan-using facilities are vulnerable to denial of service and perhaps privilege escalation.
Page editor: Jake Edge
Kernel development
Brief items
Kernel release status
The current development kernel is 3.14-rc2, released on February 9. Linus noted that the patch volume has been light, but worried that kernel developers are lurking in the background waiting to dump more stuff on him. "Because I know kernel developers, and they are sneaky. I suspect Davem (to pick somebody not at random) is giggling to himself, waiting for this release message, planning to send me some big-ass pull request tomorrow."
Stable updates: 3.13.2, 3.12.10, 3.10.29, and 3.4.79 were released on February 6.
The 3.13.3, 3.12.11, 3.10.30, and 3.4.80 updates are in the review process as of this writing. Greg notes that these updates may be a bit more problematic than some:
Test these out well, they have barely survived my systems, and I don't trust them in the slightest to not eat your disks, reap your tasks, and run away laughing as your CPU turns into a space heater.
Assuming the carnage turns out not to be that bad, these updates can be expected on or after February 13.
3.2.55 is also in the review process, with a release expected on or after the 14th.
Quotes of the week
The reason for this is quite simple: the current 31 bit kernel was broken for nearly a year before somebody noticed.
Kernel development news
Controlling device power management
The kernel's power management code works to keep the hardware in the most power-efficient state that is consistent with the current demands on the system. Sometimes, though, overly aggressive power management can interfere with the proper functioning of the system; putting the CPU into a sleep state might wreck ongoing DMA operations, for example. To avoid situations like that, the pm_qos (power management quality of service) mechanism was added to the kernel; using pm_qos, device drivers can describe their requirements to the power-management subsystem. More recently, we have seen a bit of a change in focus in pm_qos, though, as it evolves to handle power management within peripheral devices as well.

A partial step in that direction was taken in the 3.2 development cycle, when per-device constraints were added. Like the original pm_qos subsystem, this mechanism is a way for devices to specify their own quality-of-service needs; it allows a driver to specify a maximum value for how long a powered-down device can wait to get power back when it needs to do something. This value (called DEV_PM_QOS_LATENCY in current kernels) is meant to be used with the power domains feature to determine whether (and how deeply) a particular domain on a system-on-chip could be powered down.
The quest for lower power consumption continues, though, and, as a result, we are seeing more devices that perform their own internal power management based on the access patterns they observe. Memory controllers might put some banks into lower power states if they are not seeing much use, for example; this technology seems to work well enough to take much of the wind out of the sails of the various memory power management patch sets out there. Disk drives can spin themselves down, camera sensors can turn themselves off, and so on. Peripherals do not have as good an idea of future access patterns as the host computer should, but, it turns out, they can often do a good job of guessing based on the recent past.
That said, there will certainly be times when a device will decide to take a nap at an inopportune moment. To help avoid this kind of embarrassing situation, many devices that have internal power management provide a way for the host system to communicate its latency needs to the device. If such a device has been informed by the CPU that it should respond with a latency no greater than, say, 10ms, it will not go into any sleep states that would take longer to come back out of.
Current kernels have no formalized way to control the latency requirements communicated to devices, though. That situation could change as early as the 3.15 development cycle, though, if Rafael Wysocki's latency tolerance device pm_qos type patch set finds its way into the mainline. This work uses much of the existing pm_qos framework, but to a different end: rather than allowing drivers to communicate their requirements to the power management core, this mechanism carries latency requirements back to drivers.
The first step is to rename DEV_PM_QOS_LATENCY, which, it could be argued, has an ambiguous name in the new way of doing things. The new name (DEV_PM_QOS_RESUME_LATENCY) may not be that much clearer to developers reading the code from the outside, but it does make room for the new DEV_PM_QOS_LATENCY_TOLERANCE value. As noted above, this pm_qos type differs from the others in that it communicates a tolerance to a device; it also differs in that it is exposed to user space. Any device that supports this feature will have a new attribute (pm_qos_latency_tolerance_us) in its sysfs power directory. A specific latency value (in µs) can be written to this attribute to indicate that the device must be able to respond in the given period of time. There are two special values as well: "auto", which puts the device into its fully automatic power-management mode, and "any", which does not set any specific constraints, but which tells the hardware not to adjust its latency tolerance values in response to other power-management events (transitions to and from a suspended state, for example).
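From user space, then, setting a constraint is just a write to that sysfs attribute. A hypothetical example (the device path here is invented, and the attribute only appears on hardware and kernels that support the feature):

    # Ask a (hypothetical) device to respond within 100 microseconds
    device = "/sys/devices/pci0000:00/0000:00:1f.2/power/pm_qos_latency_tolerance_us"
    with open(device, "w") as attr:
        attr.write("100")
    # Writing "auto" instead returns the device to fully automatic power management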
Device power management information is stored in struct dev_pm_info which, in turn, is found in struct device. Devices supporting DEV_PM_QOS_LATENCY_TOLERANCE need to provide a new function in that structure:
void (*set_latency_tolerance)(struct device *dev, s32 tolerance);
Whenever the latency tolerance value changes, set_latency_tolerance() will be called with the new value. The special tolerance value PM_QOS_LATENCY_ANY corresponds to the "any" value described above. Otherwise, a negative tolerance value indicates that the device should be put into the fully automatic mode.
In many cases, driver authors will not need to concern themselves with providing this callback, though. Instead, it will be handled at the bus level, perhaps in combination with the firmware. The initial implementation posted by Rafael takes advantage of the "latency tolerance reporting" registers provided via ACPI by some Intel devices; for such devices, the power management implementation exists in the ACPI code and need not be duplicated elsewhere.
The final step is to actually make use of this feature when hardware that supports it is available. Such use seems most likely to show up in mobile systems and other dedicated settings where the software can easily be taught to tweak the latency parameters when the need arises. Writing applications that can tune those parameters on a general-purpose system seems like a harder task. But, even there, when the hardware wants to do the wrong thing, there will be a mechanism to set it straight.
Flags as a system call API design pattern
The renameat2() system call recently proposed by Miklos Szeredi is a fresh reminder of a category of failures in the design of kernel-user-space APIs that has a long history on Linux (and, going even further back, Unix). A closer look at that history yields a lesson that should be kept in mind for all future system calls added to the kernel.
The renameat2() system call is an extension of renameat() which, in turn, is an extension of the ancient rename() system call. All of these system calls perform the same general task: manipulating directory entries to give an existing file a new name on the same filesystem. The original rename() system call took just two arguments: the old pathname and the new pathname. renameat() added two arguments, one associated with each pathname argument. Each of the new arguments can be a file descriptor that refers to a directory: if the corresponding pathname argument is relative, then it is interpreted relative to the associated directory file descriptor, rather than the current working directory (as is done by rename()).
renameat() was one of a raft of thirteen new system calls added to Linux in kernel 2.6.16 to perform various operations on files. The twofold purpose of the directory file descriptor argument is elaborated in the openat(2) manual page:
- to avoid race conditions that could occur with the corresponding traditional system calls if one of the directory components in a (relative) pathname was changed at the same time as the system call, and
- to allow the implementation of per-thread "current working directories" via directory file descriptors.
The next step, renameat2(), extends the functionality of renameat() to support a new use case: atomically swapping two existing pathnames. Although that use case is related to the earlier system calls, it was necessary to define a new system call for one simple reason: renameat() lacked a mechanism for the kernel to support (and the caller to request) variations in its behavior. In other words, it lacked the kind of flags bit-mask argument that is provided by system calls such as clone(), fcntl(), mremap(), and open(), all of which allow a varying number of arguments, depending on the bits specified in the flags argument.
renameat2() implements the new "swap" functionality and adds a new flags argument whose bits can be used to select variations in behavior of the system call. The first of these bits is RENAME_EXCHANGE, which selects the "swap" functionality; without that flag, renameat2() behaves like renameat(). The addition of the flags arguments hopefully forestalls the need to one day create a renameat3() system call to add other new functionality. And indeed, Andy Lutomirski soon observed that another flag could be added: RENAME_NOREPLACE, to prevent a rename operation from overwriting an existing file. Formerly, the only race-free way of preventing an existing file from being clobbered was to use link() (which fails if the target pathname exists) to create the new name, followed by unlink() to remove the old name.
Mistakes repeated
There is, of course, a sense of déjà vu about the renameat2() story, since the reason that the earlier renameat() system call was required was that rename() lacked the extensibility that would have been allowed by a flags argument. Consideration of this example prompts one to ask: "How many times have we made that particular mistake?" The answer turns out to be "quite a few."
One does not need to go far to find some other examples. Returning to the thirteen "directory file descriptor" system calls that were added in Linux 2.6.16, we find that, with no particular rhyme or reason, four of the new system calls (fchownat(), fstatat(), linkat(), and unlinkat()) added a flags argument that was not present in the traditional call, while eight others (faccessat(), fchmodat(), futimesat(), mkdirat(), mknodat(), readlinkat(), renameat(), and symlinkat()) did not. (The remaining call, openat(), retained the flags argument that was already present in open().)
Of the new calls that did not include a flags argument, one, futimesat(), was soon superseded by a new call that did have a flags argument (utimensat(), added in Linux 2.6.22), and renameat() seems poised to suffer the same fate. One is left wondering: would any of the remaining calls also have benefited from the inclusion of a flags argument? Studying this set of functions further, it is soon evident that the answer is "yes", in at least three cases.
The first case is the faccessat() system call. This system call lacks a flags argument, but the GNU C Library (glibc) wrapper function adds one. If bits are specified in that argument, then the wrapper function instead uses the fstatat() system call to determine file access permissions. It seems clear that the lack of a flags argument was realized too late, and the design problem was subsequently papered over in glibc. (The implementer of the "directory file descriptor" system calls was the then glibc maintainer.)
The second case is the fchmodat() system call. Like the faccessat() system call, it lacks a flags argument, but the glibc wrapper adds one. That wrapper function allows for an AT_SYMLINK_NOFOLLOW flag. However, the flag is not currently supported, because the kernel doesn't provide the necessary support. Clearly, the glibc wrapper function was written to allow for the possibility of an fchmodat2() system call in the future.
The third case is the readlinkat() system call. To understand why this system call would have benefited from a flags argument, we need to consider three of the system calls that were added in Linux 2.6.16 that do permit a flags argument—fchownat(), fstatat(), and linkat(). Those system calls added the AT_EMPTY_PATH flag in Linux 2.6.39. If this flag is specified in the call, and the pathname argument is an empty string, then the call instead operates on the open file referred to by the "directory file descriptor" argument (and in this case, that argument can refer to file types other than directories). This allows these system calls to provide functionality analogous to that provided by fchown() and fstat() in the traditional Unix API. (There is no "flink()" in the traditional API.)
Strictly speaking, the AT_EMPTY_PATH functionality could have been supported without the use of a flag: if the pathname argument was an empty string, then these calls could have assumed that they are to operate on the file descriptor argument. However, the requirement to use a flag serves the dual purposes of documenting the programmer's intent and preventing accidents that might occur if the pathname argument was unintentionally specified as an empty string.
The "operate on a file descriptor" functionality also turned out to be useful for readlinkat(), which likewise added that functionality in Linux 2.6.39. However, readlinkat() does not have a flags argument; the call simply operates on the file descriptor if the pathname argument is an empty string, and thus does not have the benefits that the AT_EMPTY_PATH flag confers on the other system calls. Thus readlinkat() is another system call where a flags argument would have been desirable.
In summary, then, of the eight "directory file descriptor" system calls that lacked a flags argument, this lack has turned out to be a mistake in at least five cases.
Of course, Linux developers were not the first to make this kind of design error. Long before Linux appeared, there was wait() without flags and then wait3() with flags. And Linux has gone on to fix some instances of this design error in APIs inherited from Unix, adding, for example, dup3() as a successor to dup2(), and pipe2() as the successor to pipe() (both new system calls added in kernel 2.6.27).
Latter-day missing-flags examples
But, given the lessons of history, we've managed to repeat the mistake far too many times in Linux-specific system calls. As well as the directory file descriptor examples mentioned above, here are some other examples:
Original system call | Successor |
---|---|
epoll_create() (2.6.0) | epoll_create1() (2.6.27) |
eventfd() (2.6.22) | eventfd2() (2.6.27) |
inotify_init() (2.6.13) | inotify_init1() (2.6.27) |
signalfd() (2.6.22) | signalfd4() (2.6.27) |
The realization that certain system calls might need a flags argument sometimes comes in waves, as developers realize that multiple related APIs may need such an argument; one such wave occurred in Linux 2.6.16, when four of the "directory file descriptor" system calls added a flags argument.
As can be seen from the other examples shown just above, another such wave occurred in kernel 2.6.27, when a total of six new system calls were added. All of these new calls, as well as accept4(), which was added for the same reasons in Linux 2.6.28, return new file descriptors. The main reason for the addition of the new calls was to allow the caller the option of requesting that the close-on-exec flag be set on the new file descriptor at the time it is created, rather than in a separate step using the fcntl(F_SETFD) operation. This allows user-space applications to avoid certain race conditions when using the traditional counterparts of these system calls in multithreaded applications. Those races could occur when one thread tried to create a file descriptor and use fcntl(F_SETFD) to set its close-on-exec flag at the same time as another thread happened to perform a fork() plus execve(). (The socket() and socketpair() system calls also added this new functionality in 2.6.27. However, somewhat bizarrely, this was done by jamming bit flags into the high bytes of these calls' socket type argument, rather than creating new system calls with a flags argument.)
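A small sketch of that race, using pipe() and its 2.6.27 successor pipe2() as the example, shows why doing the two steps separately is not safe in a threaded program:

#define _GNU_SOURCE             /* for pipe2() and O_CLOEXEC */
#include <fcntl.h>
#include <unistd.h>

/* Racy: another thread may fork() and execve() between pipe() and fcntl(),
 * leaking the descriptors into the new program.  (Error checking omitted.) */
void make_pipe_racy(int fds[2])
{
    pipe(fds);
    fcntl(fds[0], F_SETFD, FD_CLOEXEC);
    fcntl(fds[1], F_SETFD, FD_CLOEXEC);
}

/* Race-free: the close-on-exec flag is set atomically at creation time. */
void make_pipe_safe(int fds[2])
{
    pipe2(fds, O_CLOEXEC);
}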
Turning to more recent Linux development history, we see that a number of new system calls added since kernel 2.6.28 have all included a flags argument, including fanotify_init(), fanotify_mark(), open_by_handle_at(), and name_to_handle_at(). However, in all of those cases, the flags argument was required at the outset, so no decision about future-proofing this aspect of the API was required.
On the other hand, there have been some misses or near misses for other system calls. The syncfs() system call added in Linux 2.6.39 does not have a flags argument, although one wonders whether some filesystem developer might have taken advantage of such a flag, if it existed, to allow the caller to vary the manner in which a filesystem is synced to disk. And the finit_module() system call added in Linux 3.8 only got a flags argument after some last-minute prompting; once added, the argument proved immediately useful.
The conclusion from this oft-repeated pattern of creating new incarnations of system calls that add a flags argument is that a suitable question to ask during the design of every new system call is: "Is there a reason not to include a flags argument in the API?" Approaching the question from that perspective makes it more likely that developers will default to following the wise example of the process_vm_readv() and process_vm_writev() system calls added in Linux 3.2. The developers of those system calls included a (currently unused) flags argument on the suspicion that it may prove useful in the future. History suggests that they'll one day be proved right.
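For reference, a minimal (self-inspecting) use of process_vm_readv() looks like this; the final argument is the reserved flags word, which must currently be zero:

#define _GNU_SOURCE
#include <stdio.h>
#include <sys/uio.h>
#include <unistd.h>

int main(void)
{
    char src[] = "flags", dst[sizeof(src)] = "";
    struct iovec local  = { .iov_base = dst, .iov_len = sizeof(dst) };
    struct iovec remote = { .iov_base = src, .iov_len = sizeof(src) };

    /* Read from our own address space; the last argument is the currently
     * unused flags word, which must be zero. */
    if (process_vm_readv(getpid(), &local, 1, &remote, 1, 0) == -1)
        perror("process_vm_readv");
    else
        printf("copied: %s\n", dst);
    return 0;
}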
Best practices for a big patch series
The kernel development process features thousands of developers all working together without stepping on each other's toes — very often, anyway. The modularity of the kernel is one of the biggest reasons for the smoothness of the process; developers rarely find themselves trying to work on the same code at the same time. But there are always exceptions, one of which is the large, cross-subsystem patch series. Merging such a series does not have to be a recipe for trouble, especially if a few guidelines are followed; this article offers some suggestions in that direction.
Changing the whole kernel tree using a pattern has become a lot easier in recent years. There is more processing power available, example scripts are out there, and tools like Coccinelle are especially targeted at such tasks. While this is great for wide-ranging work like API changes and bug fixes across all drivers, handling a patch series that spans various subsystems can be a bit cumbersome. Dependencies and responsibilities need to be clear, the granularity (i.e. the number of patches) needs to be appropriate, and relevant information needs to reach all of the people involved. If these conditions are not met, maintainers might miss important details, which means more work for both the submitter and the maintainer. The best practices described below are intended to make submitting such a patch series smooth and to avoid this unnecessary work.
Patch organization
The first question to answer is: in what form should your changes be posted? Here are the most commonly used choices, along with examples of when they were used. There are no strict rules about when to use which approach (and there can't be), so the examples will hopefully give you an idea of what issues to consider and what might be appropriate for your series.
- Changing the whole tree at once: Having one patch changing files tree-wide in one go has the advantage of immediately changing the API (no transition time). Once applied, it is done, ready, and there should be no cruft left over. Because only one maintainer is needed to merge the huge patch, this person can easily handle any dependencies that might exist. The major drawback is a high risk of merge conflicts all over the tree because so many subsystems are touched. This approach was used for renaming INIT_COMPLETION() to reinit_completion().
- Grouping changes per file: Having one patch for every modified file gives each subsystem maintainer freedom regarding when to apply the patches and how to handle merge conflicts, because the patches do not cross subsystems. However, if there are dependencies, this can become a nightmare ("Shall I apply patches 32-53 to my tree now? Do I have to wait until 1-5 are applied? Who does that? Or is there a V2 of the series coming?"). Also, a huge number of patches pollutes the Git history. This choice was used for removing platform_driver_probe() from bus masters like I2C and SPI. It was chosen to provide more fine-grained bisectability in case something went wrong.
- Grouping changes per subdirectory: Having a patch per subdirectory somewhat resembles a patch per subsystem. This is a compromise between the two previous options: there are fewer patches to handle, but each subsystem maintainer is still responsible for applying them and for conflict resolution. When the pinctrl core became able to select the default state for a group of pins, the explicit function call doing that in drivers was removed in this fashion. In another example, a number of drivers did sanity checks of resources before passing them to devm_ioremap_resource(). Because the function does its own checks already, the drivers could be cleaned up a little, one subdirectory at a time. Finally, the notorious UAPI header file split was also handled this way.
- Drop the series: Finally, some tasks are just not suitable for mass conversion. One example is changing device drivers to use the managed resources API (devm_* and friends). There are surely some useful patterns to remove boilerplate code here. Still, not knowing hardware details may lead to subtle errors. Those will probably be noticed for popular drivers, but may introduce regressions for less popular ones. So, those patches should be tested on real hardware before they are applied. If you really want to do a series like this as a service to the community, you should then ask for and collect Tested-by tags. Expect the patches to go in individually, not as a series. Patches that do not get properly tested may never be applied.
Of course, the decision of which form to use should be driven by technical reasons only; patch count statistics, in particular, should not be a concern. As mentioned before, there are no hard rules, but you can assume that changing the whole tree at once is usually frowned upon unless the dependencies require it. Also, try to keep the number of patches low without sacrificing flexibility; that makes changes per subdirectory a good starting point if you are unsure. In any case, say in the cover letter what you think would be best, and be open to discussion, because approaches do vary. For example, I would have preferred it if the removal of the __dev* attributes had been done as one huge patch instead of 358 small ones. As a result, be prepared to convert your series from one form into another.
Note: To automatically create commits per subdirectory with git, the following snippet can be used as a basis. It reads a commit message template specified by $commit_msg_template to create the commit descriptions. There, it replaces the string SEDME with the directory currently being processed.
dirs=$(git status --porcelain --untracked-files=no $startdir | \
       dirname $(awk '/^ M / { print $2 }') | sort -u)

for d in $dirs; do
    git add --update $d/*.[ch]
    sed "s|SEDME|${d//|/\|}|" $commit_msg_template | git commit --quiet -F -
done
An example commit message template might look like this:
SEDME: calling foo() in drivers is obsolete

foo() is called by the core since commit 12345678 ("call foo() in core").
No need for the driver to do it.

Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
The procedure
With any patch series, the good old "release early, release often" rule holds true. Let people know what you are up to. Set up a public repository, push your complete branch there, and update it regularly. If the series is not trivial, send an RFC to collect opinions. For an RFC, it may be appropriate to start by patching only one subsystem rather than the whole tree, or to use a whole-tree patch this one time in order to keep the mail count low. Always send a cover letter and describe your aim, dependencies, public repositories, and other relevant information.
Ask Fengguang Wu to build your branch with his great kbuild test service. When all issues are resolved and there are no objections, send the whole series right away. Again, don't forget a proper cover letter. In case of per-file or per-directory patches, the subsystem maintainers will pick up the individual patches as they see fit. Be prepared for this process to take longer than one development cycle. In that case, rerun your pattern in the next development cycle and post an updated series. Keep at it until done.
If it has been agreed to use the all-at-once approach, there may be a subsystem maintainer willing to pick up your series and take care of needed fixups during the merge window (or maybe you will be asked to do them). If there is no maintainer to pick your series but appropriate Acked-by tags were given, then (and only then) it is up to you to send a pull request to Linus. Shortly after the -rc1 release is a good time for this, though it is best to agree on this timing ahead of time. Make sure you have reapplied your pattern on the relevant -rc1 release so that the patches apply. Ask Stephen Rothwell to pull your branch into linux-next. If all went well, send out the pull request to Linus.
Whom to send the patches to
When you send the series, use git send-email. The linux-kernel mailing list is usually the best --to recipient. Manually add people and lists to CC if they should be interested in the whole series.
For other CCs, get_maintainer.pl from the kernel scripts directory is the tool to use. It supports custom settings via .get_maintainer.conf, which must be placed in the kernel top directory. The option --no-rolestats should be in that file; it suppresses the printing of information about why an email address was added. This extra output may confuse git and is also seen as noise on the mailing lists. The other default options are sane, but the usage of --git-fallback depends on the series you want to send. For per-file changes, it makes sense to activate this feature, because it adds people who actually worked on the modified files. For per-subsystem and whole-tree changes, --no-git-fallback (the default) makes sense, because those changes are mostly interesting for maintainers, so individual developers don't need to be on CC. If they are interested in the series, they will usually read the mailing list of the subsystem and notice your work there.
There is one last tricky bit left: the cover letter. If it has too few CCs, people who receive individual patches might miss it; they are then left wondering what the patches are trying to accomplish. On the other hand, copying the cover letter to everybody who is also on CC of the patches will usually result in rejected emails, because the CC list becomes too large. The rule of thumb here is: add all mailing lists which get patches to the cover letter. Below is a script that does exactly that. It can be used as a --cc-cmd for git send-email. If it detects a cover letter, it runs get_maintainer.pl on all patches, collecting only mailing lists (the --no-m option). If it detects a patch, it simply executes get_maintainer.pl.
#! /bin/bash
#
# cocci_cc - send cover letter to all mailing lists referenced in a patch series
# intended to be used as 'git send-email --cc-cmd=cocci_cc ...'
# done by Wolfram Sang in 2012-14, version 20140204 - WTFPLv2

shopt -s extglob

cd $(git rev-parse --show-toplevel) > /dev/null

name=${1##*/}
num=${name%%-*}

if [ "$num" = "0000" ]; then
    dir=${1%/*}
    for f in $dir/!(0000*).patch; do
        scripts/get_maintainer.pl --no-m $f
    done | sort -u
else
    scripts/get_maintainer.pl $1
fi
Conclusion
Applying patterns to the kernel tree is surely a useful tool. As with any tool, knowledge of when to use it and how to handle it properly needs to be developed. This article is hopefully a useful contribution in that direction. The author hopes to inspire other developers and is open to discussion.
Patches and updates
Kernel trees
Architecture-specific
Core kernel code
Development tools
Device drivers
Filesystems and block I/O
Memory management
Security-related
Virtualization and containers
Page editor: Jonathan Corbet
Distributions
Fedora wrestles with the auto-closing of bugs
Bug tracking is a perennial source of pain for distributions and other free software projects. Bug reports vary widely in quality, there is never enough developer time to even verify all of the bugs, and some bugs may get fixed in the normal course of development, entirely separate from the bug report. But bug lists that just keep growing are not terribly useful, so there is typically some kind of automatic closing of bugs that is done. The criteria for auto-closing is often contentious; finding a balance between a ballooning bug database and not irritating reporters and other users can be difficult—as a recent fedora-devel mailing list thread shows.
In some ideal world, bug reporting would go something like this:
- Someone reports a bug, with lots of detailed information.
- The bug gets triaged and then assigned to the proper developer.
- The developer asks for any more information needed to help reproduce and/or fix the bug.
- The reporter responds promptly with the needed information.
- The bug gets fixed and closed.
That's where auto-closing comes into play. For Fedora, a new release of the distribution triggers the "end of life" (EOL) of the N-2 release (i.e. the release of Fedora 20 meant that Fedora 18 went EOL shortly thereafter). The project has a policy that it won't carry bug entries that only apply to EOL releases, so four weeks before the release goes EOL, a message is posted to the bug to that effect (example). If the bug is not updated to change the release it applies to, it will be closed.
In the example bug, though, David Strauss does note that the bug still affects Fedora 19, but he didn't change the release version it applies to—because he couldn't. He did not report the bug, nor does he have the privileges needed to edit all bugs, so he just posted a comment to the bugzilla entry. Thus he was a bit irritated to find that the bug had been closed:
There were suggestions to change the version on the bug (he can't), to join the Fedora BugZappers in order to get the privileges needed (that group is defunct), and to file an upstream bug with X.org (as it is a driver bug). Some showed a distinct lack of sympathy for Strauss's complaint but, as Colin Macdonald put it, bugs getting automatically closed does not show Fedora as "a welcoming community to newcomers".
However, as Adam Williamson pointed out, no one has come up with a better solution yet. Not closing bugs that get comments after the EOL warning risks leaving bugs open if someone simply reports that the problem has gone away in a newer release, he said. Several thought that reports of non-working systems would be much more likely than those reporting that the bug had gone away (since those whose systems are working tend to be less attentive to bugs that got fixed).
There was some confusion in the thread, but Williamson and others clarified that the reporter can change the version (and thus leave the bug open). In the case of Strauss's bug, the reporter had stopped posting on the bug, which leaves it to the developer responsible (typically package maintainers who tend to have a lot on their plates) to notice any comments and update the bug accordingly.
Though he called it "a radical plan", Williamson suggested one possible fix: allowing anyone to reopen closed bugs. Michael Catanzaro had a different fix: just allow anyone to change the version affected by the bug. That would eliminate the problem that Strauss ran into. Matthew Miller had a more elaborate plan that would leave bugs open for one more release cycle to give users more time to respond before they get the message that their bug has been closed.
Williamson is not sure there is a real problem, partly because there are too many open bugs already, so adding more (by not auto-closing them) isn't helpful. If the bug didn't get the attention it needed, leaving it open doesn't make it any more likely that it will. But Miller noted that it is a complaint he hears regularly at conferences. When people take the time to file a bug, they are clearly hoping for more than just an eventual "closed due to EOL" message.
On the other hand, Alexander Kurtakov described the problems that package maintainers have with open bugs:
That's a common problem with bugs. And one that's not likely to be solved anytime soon.
For the EOL issues, there is a Fedora wiki entry that covers the tasks, but it still assigns handling bugzilla entries to the disbanded BugZappers group. Miller filed a bug that eventually found its way to a Fedora Engineering Steering Committee ticket. The latter resulted in some changes to the wording of the EOL message, but there may be more changes coming.
The discussion makes it clear that few are completely satisfied with the current scheme. Fedora Program Manager Jaroslav Reznik has gathered some statistics to help quantify the number of bugs closed due to EOL releases and it is substantial: roughly 6000-8000 for each release Fedora 14-18 (with Fedora 15 omitted due to lack of data).
Ultimately, the problem is rooted in the fact that many bugs get little or no attention after they are filed. If there were enough package maintainer and/or upstream developer time to address the bugs, there would be few left to auto-close. The upstream projects are typically better equipped to handle bugs in their code, which has led some to call for filing all bugs upstream.
Bug handling is often tricky and contentious. Reporters generally believe their bugs are high quality and should get immediate attention, while maintainers often find a different picture—if they have time to find anything at all. Like other projects, Fedora is trying to feel its way through something of a minefield.
Brief items
Distribution quotes of the week
Debian 7.4 released
The Debian Project has released the fourth update to its stable distribution, Debian 7 "wheezy". As usual, this update provides corrections for security problems, along with a few adjustments for serious problems. "Please note that this update does not constitute a new version of Debian 7 but only updates some of the packages included."
Ubuntu 12.04.4 LTS released
Ubuntu has released an updated version of its 12.04 long term support (LTS) distribution for the Desktop, Server, Cloud, and Core products: 12.04.4 LTS. In addition, Kubuntu 12.04.4 LTS, Edubuntu 12.04.4 LTS, Xubuntu 12.04.4 LTS, Mythbuntu 12.04.4 LTS, and Ubuntu Studio 12.04.4 LTS have also been released. "As with 12.04.3, 12.04.4 contains an updated kernel and X stack for new installations on x86 architectures. As usual, this point release includes many updates, and updated installation media has been provided so that fewer updates will need to be downloaded after installation. These include security updates and corrections for other high-impact bugs, with a focus on maintaining stability and compatibility with Ubuntu 12.04 LTS."
Distribution News
Debian GNU/Linux
Another Debian init system vote called
Debian technical committee chair Bdale Garbee has posted a surprise call for votes on a simplified resolution that would decide the default init system for the upcoming "jessie" release on Linux — and nothing else. "The fundamental problem is that I remain as convinced now as I was when I posted my last CFV that conflating multiple questions in a single ballot is a bad idea. Our voting system works exceptionally well when we're trying to choose between multiple alternatives for a single question. But as others have observed, trying to mix questions into a matrix of alternatives in a single ballot really complicates the process." The ballot is quite similar to his first attempt, but it includes the language allowing the decision to be overridden by a simple majority on a general resolution.
Assuming a sufficient number of members vote something other than "further discussion," this move might actually bring this chapter of this story (which, alas, may have a fair while to go still) to a close.
Russ Allbery's perspective on the Debian technical committee impasse
LWN has backed off on moment-to-moment coverage of events in Debian's technical committee because it seems that a moment of relative calm is called for. But a note posted by committee member Russ Allbery on the situation is worth reading in its entirety, despite the fact that it's rather long. "In short, you can certainly disagree with the relative weights of the various features or drawbacks of any of the init systems. But I think at the point at which one goes beyond 'I disagree' to 'and therefore you must be biased,' one has lost the plot. This is a hard decision with a lot of subjective judgement, and reasonable people can arrive at opposite conclusions."
The Debian technical committee vote concludes
All of the votes are in on the simplified ballot to choose the default init system for the Debian "jessie" release (on Linux). The Condorcet process left systemd and upstart tied with four votes each; committee chair Bdale Garbee has now used his casting vote in favor of systemd. That ends one chapter of the debate, though the chances of this decision being reviewed via a general resolution seem high.
Debian Project Secretary appointment
Debian Project Leader Lucas Nussbaum has announced that Kurt Roeckx will serve as Project Secretary for another year. Neil McGovern will continue to serve as Assistant Secretary.
Fedora
Election Results: Fedora Board, FESCo, and FAmSCo
The elections for the Fedora Engineering Steering Committee (FESCo) and the Fedora Ambassadors Steering Committee (FAmSCo) have concluded. There was no election for the Fedora Project Board as there were only two candidates for the two open seats.
Ubuntu family
Newsletters and articles of interest
Distribution newsletters
- Debian Project News (February 10)
- DistroWatch Weekly, Issue 545 (February 10)
- Tails report (January)
- Ubuntu Weekly Newsletter, Issue 354 (February 9)
Page editor: Rebecca Sobol
Development
Systemd programming part 2: activation and language issues
This is the second half of a pair of articles looking at systemd as a programming language for the specification and management of system services. Part 1 was concerned with modularity issues and how services can be configured. This part continues with a look at the various ways to control the activation of services before getting into an overall look at issues with systemd's language. While systemd has been extensively discussed as an init system, there is value in regarding it from a language point of view as well.
Activation options
With systemd, simply installing a unit file in the appropriate directory does not mean it is automatically active. This is much the same model as SysVinit uses, but it is a contrast to udev and upstart, which treat files as active the moment they are installed. There are subtle differences between systemd and SysVinit, though, which gave me my first hint that just continuing to use the SysVinit scripts for nfs-utils wasn't going to work, even though systemd provides some degree of support for these scripts.
With SysVinit, scripts are installed in /etc/init.d and then linked to /etc/rcN.d, or possibly /etc/init.d/rcN.d, for some value of N. This linking is often performed by the insserv utility, which will examine the header of each script and choose appropriate names for the links, so that the scripts are all run in the correct order. If a script has a "Required-Start" header referring to some other script and that other script has not been enabled (i.e. not linked into some directory), then insserv will complain and identify the missing dependency. It is then a simple matter to rerun insserv listing the extra dependency.
As SysVinit has relatively few scripts with few dependencies, this form of dependency handling does not cause an issue. With systemd, where nfs-utils alone has 14 unit files, having to explicitly enable all of them could get cumbersome. So where insserv treats a dependency as "this must already be enabled", systemd normally treats a dependency as "start this whether it is explicitly enabled or not" (though systemd has a rich language for these dependencies which we will get to in due course).
When systemd reads a SysVinit script, though, it takes a slightly different and more conservative approach to interpreting the dependency headers. It correctly treats Required-Start as implying an ordering (using the After systemd directive), but does not encode the "that service must already be enabled" meaning as it has no equivalent concept. A more liberal approach would translate Required-Start to Requires, causing the named script to be run even if not enabled. That might often be correct, but could be seen as reading more in to the dependency than is intended.
This different default behavior can create different expectations. When insserv nfs reports:
insserv: FATAL: service portmap has to be enabled to use service nfs
the system administrator will naturally run insserv portmap and move on. However, when "systemctl enable nfs" works, but a subsequent "systemctl start nfs" fails because rpcbind (which is the new name for portmap) isn't running, the administrator's response is likely to be less forgiving. For complete compatibility with SysVinit, systemd would need a dependency mechanism which makes it impossible to enable something unless some other requirement were already enabled, but that doesn't really fit into systemd's model.
With that little diversion out of the way, we should look at how units can be activated in systemd. Systemd is often described as using dependencies for unit activation and, while there is certainly truth in that, it is far from the full story. For the full (or, at least, fuller) story we will start with the "mdadm" package which provides services to assemble, monitor, and manage MD RAID arrays. One particular service is provided by running
mdadm --monitor
as a daemon. This daemon will watch for any device failures or other interesting events and respond to them, possibly by sending email to the administrator, or possibly by finding a "spare" device on some other array and moving it across (thus allowing spare sharing between arrays). This daemon should be running whenever any md array is active, but otherwise it is completely unnecessary. This requirement is achieved by creating a systemd unit file to run mdadm --monitor and using the SYSTEMD_WANTS setting in a udev rules file.
Udev (which is a whole different story when it comes to language design) is notified when any device comes online and can perform numerous tests and take actions. One of those actions is to set an environment variable (SYSTEMD_WANTS) to the name of some unit. When systemd subsequently receives events from udev, it will receive the full environment with them. Systemd interprets SYSTEMD_WANTS by adding a Wants directive to the .device unit (possibly newly created) corresponding to the device in the event. So a udev rules file which detects a new MD array and sets SYSTEMD_WANTS=mdmonitor.service will cause mdadm --monitor to run at exactly the correct time.
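A simplified, hypothetical rule of that kind might look like the line below; the rules actually shipped with mdadm are more elaborate, so this is only meant to show the shape of the mechanism:

# Illustrative only: when an MD block device appears, tell systemd that the
# corresponding .device unit wants the monitor service.
SUBSYSTEM=="block", KERNEL=="md*", ENV{SYSTEMD_WANTS}="mdmonitor.service"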
Note that there is no need to explicitly enable this service. As udev rules are always enabled and the udev rule directly requests the systemd service, it just happens. No activation needed. This signaling from udev to systemd is an event much like the events talked about in the context of upstart. While systemd may not only use events for activation, it certainly does use them — and not only events from udev.
When an NFSv2 or NFSv3 filesystem is mounted, then, unless network file locking has been explicitly disabled, the rpc.statd process must be running to ensure proper lock recovery after a reboot. Rather than have this process always running, /sbin/mount.nfs (which is used to mount all NFS filesystems) will check that rpc.statd is running and, if not, will start it. When systemd is being used it is best to do this by running:
systemctl start rpc-statd.service
which, again, is much like an event.
When mount.nfs checks to see if rpc.statd is running, it attempts to contact it via a network socket. It is well known that systemd supports socket-based activation, so it would be ideal to use that to activate rpc.statd. However, rpc.statd, like most ONC-RPC services, does not use a well-known port number, but instead chooses an arbitrary port and registers it with rpcbind. So systemd would need to do this too: bind an arbitrary port, register that port with rpcbind, and start rpc.statd when traffic arrives on that socket. This is certainly possible; SunOS used to ship with a version of inetd which did precisely this. Solaris still does. Whether it is worth adding this functionality to systemd for the two or maybe three services that would use it is hard to say.
The remainder of the daemons that make up the complete NFS service are not triggered by events and so must be explicitly activated by being tied to specific well-known activation point "targets", which are not unlike SysVinit run levels. Even there, the distinction between the systemd approach and the use of events in upstart is not as obvious as one might expect.
As we shall see, a dependency relationship is created between nfs-server.target and multi-user.target so that when multi-user.target is started, nfs-server.target is started too. As upstart jobs broadcast a "starting" signal when they are starting, and other jobs can register to, for example, "start on starting multi-user", the net effect is, to some degree at least, similar.
There is a key difference here, though, and it isn't really about events or dependencies, but about causality. In upstart, a job can declare "start on" to identify which event it should start on. So each job declares the events which cause it to run. Systemd, despite its rich dependency language, has no equivalent to "start on", an omission that appears to be deliberate. Instead, each event — the starting or stopping of a unit — declares which jobs (or units) need to be running. The dependency language is exactly reversed. With upstart, each job knows what causes it to start. With systemd, each job knows what it causes to start.
While systemd has no equivalent to "start on", it has something related that we must study to understand how the remaining nfs-utils daemons are started. This is represented by the "WantedBy" and "RequiredBy" directives, which are quite different from the "Requires" and "Wants" etc. dependency directives. "WantedBy" plays no role in determining when to start (or stop) any service. Instead, it is an instruction on how to enable a specific unit. The directive:
WantedBy=some-unit.target
means "the best way to enable this unit is to tell some-unit.target that it Wants us." It is possible to tell any unit that it wants another unit, either by creating a drop-in file as described in part 1 to add a "Wants" directive, or by creating some special symbolic links that systemd interprets in a similar way to drop-ins. The easiest way, though, is to run:
systemctl enable servicename
This command responds to the WantedBy directive in servicename.service by creating the special symlink so that some-unit.target thinks it wants servicename.
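As a rough sketch of what happens behind the scenes (unit file locations vary by distribution), enabling a unit whose [Install] section says "WantedBy=multi-user.target" creates a symbolic link along these lines:

# approximately what "systemctl enable servicename" does for
# WantedBy=multi-user.target; the source path varies by distribution
ln -s /usr/lib/systemd/system/servicename.service \
      /etc/systemd/system/multi-user.target.wants/servicename.service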
So in our collection of nfs-utils unit files, a few of them have WantedBy directives so they can be properly enabled. The rest get activated by "Wants" or "Requires" lines in those main files. Two of the unit files fit this pattern perfectly. nfs-server.target and nfs-client.target are WantedBy=multi-user.target or remote-fs.target, and then they Want or Require other units. The other two target unit files are a bit different, and to understand them we need to revisit the problem of configuration.
One of the purposes of the current configuration setup for nfs-utils in openSUSE is to optionally start the daemons which support using Kerberos to secure the NFS traffic. If you trust your local network, then Kerberos security is pointless and it is a waste to even run the daemons. However, if you want to use NFS over an untrusted network, then running rpc.gssd and rpc.svcgssd really is a must ("gss" here stands for "Generic Security Services"; while they were designed to be generic, the practical reality is that they only support Kerberos).
So we have the situation that nfs-server.target wants rpc-svcgssd.service, but only if network security is wanted, and this latter choice is configured by an environment variable. This is a requirement that systemd really cannot manage. It has a number of Condition directives to disable units in various cases, but none of them can be controlled using an environment variable. This suggests that either the sysadmin or the configuration tool (and possibly both) will need to use some other mechanism. The most obvious mechanism is systemctl and particularly:
systemctl enable rpc-svcgssd
to enable a service if it is off by default, or:
systemctl mask rpc-svcgssd
to mask (disable) a service that is otherwise on by default.
There is a complication though: in the pre-existing configuration that I was trying to mirror with systemd units, there are two services, rpc.gssd and rpc.svcgssd, that are both controlled by a single configuration item NFS_SECURITY_GSS. These need to be started in different circumstances: rpc.gssd is required if the NFS server is started or an NFS filesystem is mounted, while rpc.svcgssd is only required if the server is started. So we cannot simply have an nfs-secure.target which needs both of them and can be manually enabled. Systemd is powerful enough to make this set of requirements achievable, though it does seem to be a bit of a stretch.
The draft unit-file collection contains an nfs-secure.target unit which can be enabled or disabled with systemctl, but it doesn't actually start anything itself. Instead it is used to enable other units. The two related units (rpc-gssd.service and rpc-svcgssd.service) now gain the directive:
Requisite=nfs-secure.target
This means that those services want nfs-secure, and if it isn't already running, they fail. This has exactly the desired effect. After "systemctl enable nfs-secure.target" the GSS daemons will be started when required; after "systemctl disable nfs-secure.target" they will not.
Having four different targets which can selectively be enabled (the fourth being a target similar to nfs-secure.target but which enables the blkmapd daemon to support part of the "pNFS" extension; not needed by most sites) might seem like it is providing too much choice. Here again, systemd comes to the rescue with an easy mechanism for distribution packagers to take some of the things I made optional and make them always enabled. A distribution is encouraged to provide one or more "preset" files listing systemd units that should be automatically enabled whenever they are installed. So if a distribution was focused on high levels of security, it could include:
enable nfs-secure.target
in the preset file. This would ensure that, if nfs-utils were ever installed, the security option would be activated by default. This feature encourages upstream unit file developers to be generous in the number of units that require enabling, being confident that, while it provides flexibility to those who need it, it need not impose a cost on those who don't.
In summary, systemd provides a pleasing range of events and dependencies (many of which we have not covered here) which can be used to start units. It is unfortunate, however, that enabling or disabling of specific units is not at all responsive to the environment files that systemd is quite capable of reading. The choice to not automatically activate any installed unit file is probably a good one, although it is an odd contrast to udev, which is included in the same source package.
Language issues
Having a general interest in programming language design, I couldn't help looking beyond the immediate question of "can I say what I need to say" to the more subjective questions of elegance, uniformity, and familiarity. Is it easy to write good code and hard to write bad code for systemd?
One issue that struck me as odd, though there is some room for justification, is the existence of section headings such as [Unit] or [Service] or [Install]. These don't really carry any information, as a directive allowed in one section is not allowed in any other, so we always know what a directive means without reference to its section. A justification could be that these headings help ensure well-structured unit files and thus help us catch errors more easily.
If that is the case, then it is a little surprising that the concern for error detection doesn't lead to unit files with syntax errors immediately failing so they will be easily noticed. The reasoning here is probably that an imperfectly functioning system is better than one that doesn't boot at all. That is hard to argue against, though, as a programmer, I still prefer errors to fail very loudly — I make too many of them.
More significant than that is the syntax for conditionals. The directive:
ConditionPathExists=/etc/krb5.keytab
will cause activation of the unit to only be attempted if the given file exists. You can easily test for two different files, both of which must exist, with:
ConditionPathExists=/etc/krb5.keytab
ConditionPathExists=/etc/exports
or even for a disjunction or 'or' condition:
ConditionPathExists=|/etc/mdadm.conf
ConditionPathExists=|/etc/mdadm/mdadm.conf
If you want to get more complicated, you can negate conditions (with a leading !) and have the conjunction of a number of tests together with the disjunction of some other tests. For example:
A and B and (C or D or E)
To achieve this, the A and B conditions are unadorned, while C, D and E each have a '|' prefix. However, you cannot have multiple disjunctions like
(A or B) and (C or D)
Now, I'm not sure I would ever want a conjunction of multiple disjunctions, but it seems worth asking why the traditional infix notation was not used. From reading around, it is clear that the systemd developers want to keep the syntax "simple" so that the unit files can be read by any library that understands the "ini" file format. While this is a laudable goal, it isn't clear that the need for some unspecified program to read the unit files should outweigh the need for humans to read and understand them.
It also doesn't help that while the above rules cover most conditions, they don't cover them all. As already noted, "Requisite" is largely just a condition which might be spelled "ConditionUnitStarted", but isn't; it also doesn't allow the '|' or '!' modifiers.
The final inelegance comes from the ad hoc collection of directives which guide how state changes in one unit should affect state changes in another unit. Firstly, there are 13 directives that identify other units and carry various implications about the relationship — sometimes overlapping, sometimes completely orthogonal:
Requires, RequiresOverridable, Requisite, RequisiteOverridable, Wants, BindsTo, PartOf, Conflicts, Before, After, OnFailure, PropagateReloadsTo, ReloadPropagateFrom
Of these, Before and After are inverses and do not overlap with any others, while Requires, Requisite, and Wants specify dependencies with details that differ on at least two dimensions (active vs passive and hard vs soft). Several, such as PartOf, PropagateReloadsTo, ReloadPropagateFrom, and OnFailure, are not dependencies, but instead specify how an event in one unit should cause an action in another.
Together with these, there are some flags which fine-tune how a unit responds in various circumstances. These include:
StopWhenUnneeded, RefuseManualStart, RefuseManualStop
While there is clearly a lot of expressive power here, there is also a lot of ad-hoc detail which makes systemd harder to learn. I get a strong feeling that there is some underlying model that is trying to get out. If the model were fully understood, the language could be tuned to expose it, and the programmer would more easily select exactly the functionality that is required.
Some small part of the model is that the relationship between two units can be:
- ordered: Before or After or neither;
- active, where one unit will start another, or passive, where it won't;
- dependent, which can take one of three values: no dependence, must have started, or must still be running; and
- overridable, which indicates whether an external request has extra force, or no force at all.
Given that systemd already uses special characters to modify meaning in some directives, it is not hard to come up with a set of characters which could allow all these details, and possibly more, to be specified with a single "DependsOn" directive.
These only really cover the starting of units, and even there, they aren't complete. The model must also include the stopping of units as well as "restart" and "reload" events. Creating a simple model which covers all these aspects without leading to an overly verbose language would be a real boon. Using special characters for all of the different details that would turn up may well cause us to run out of punctuation, but maybe there is some other way to describe the wide range of connections more elegantly.
A pragmatic assessment
While it did feel like a challenge to pull together all the ideas needed to craft a clean and coherent set of unit files, the good news is that (except for one foolish cut-and-paste error) the collection of unit files I created did exactly what I expected on the first attempt. For a program of only 168 lines this might not seem like a big achievement, but, as noted, the fact that only 168 lines were needed is a big part of the value.
These 14 unit files are certainly not the end of the story for nfs-utils. I'm still learning some of the finer details and will doubtless refine these unit files a few times before they go live. Maybe the most valuable reflection is that I'm far more confident that this program will do the right thing than I ever could be of the shell scripts used for SysVinit. Speaking as a programmer: a language that allows me to say what I want succinctly and gives me confidence that it will work is a good thing. The various aesthetic issues are minor compared to that.
Brief items
Quotes of the week
Docker 0.8 released
Version 0.8 of the Docker container-creation system has been announced. This release brings some changes to the development process: "First, this is the first Docker release where features take the backseat to quality: dozens and dozens of bugfixes, performance boosts, stability improvements, code cleanups, extra documentation and improved code coverage – that’s the primary feature in Docker 0.8." The project will also be doing time-based monthly releases going forward. There are still some new features, including a Btrfs storage driver and Mac OS support; see the changelog for details.
GDB 7.7 released
Version 7.7 of the GDB debugger is out. It features improved Python scripting support, a number of new commands, support for a few new targets, and more.
Glibc 2.19
Version 2.19 of the GNU C library has been released. It includes a lot of bug fixes, support for a number of new locales, better SystemTap support, and more; see this article for more information. The EGLIBC 2.19 release is also in the works, with an important note:
It appears that changes in glibc development have made this fork unnecessary going forward.
Mozilla announces "Firefox Accounts"
The Mozilla Blog has an announcement about "Firefox Accounts," a new service being rolled out by the browser vendor. The post describes this venture "as a safe and easy way for you to create an account that enables you to sign in and take your Firefox with you anywhere. With Firefox Accounts, we can better integrate services into your Web experience". The announcement does not shed light on what services will be involved, other than the fact that it will incorporate the existing Firefox Sync. The service is testable on Mozilla's Aurora pre-release builds of Firefox.
Apache SpamAssassin 3.4.0 available
The SpamAssassin 3.4.0 release is out. "This is a major release. It introduces over two years of bug fixes and features since the release of SpamAssassin 3.3.2 on June 16, 2011." Changes include use of the Redis backend for Bayesian data storage, native IPv6 support, and, of course, lots of rule changes.
Newsletters and articles
Development newsletters from the past week
- What's cooking in git.git (February 5)
- What's cooking in git.git (February 7)
- LLVM Weekly (February 10)
- OCaml Weekly News (February 11)
- Perl Weekly (February 10)
- PostgreSQL Weekly News (February 9)
- Python Weekly (February 6)
- Ruby Weekly (February 6)
- This Week in Rust (February 9)
- Tor Weekly News (February 12)
Jones: The EFI System Partition and the Default Boot Behavior
On his blog, Peter Jones writes about the boot process for UEFI, looking at the requirements for EFI System Partitions (ESPs), how the BootOrder variable is used, falling back to removable media, and more. It may be more than you wanted to know about UEFI booting. "There’s nothing truly special about an ESP. It isn’t an ESP because of the GPT GUID and label, nor because of the file system type. Those are how the firmware identifies a partition, and the file system it contains, as candidates to treat as the ESP, when it really needs to find one. The only factor in determining if a partition is the ESP is this: is the firmware attempting to use it as the ESP? At the same time, the requirements for the ESP give us latitude; we know that we can use UEFI’s APIs to find correctly constructed FAT file systems, but there’s no need for those to be the ESP. In fact, even when we create multiple partitions with the ESP’s GUID and label, there’s no requirement that the firmware looks at more than one of them if it needs to find the ESP, and there’s no guarantee as to which one it will pick, either."
Mozilla To Sell Ads In Firefox Web Browser (AdvertisingAge)
AdvertisingAge is reporting that Mozilla will be selling ads in Firefox. In particular, the "New Tab" page that normally has nine of the most frequently visited sites shown will, for new users, show ads and "pre-packaged content" in the new feature called "Directory Tiles". The Mozilla blog gives a bit more detail: "Some of these tile placements will be from the Mozilla ecosystem, some will be popular websites in a given geographic location, and some will be sponsored content from hand-picked partners to help support Mozilla’s pursuit of our mission. The sponsored tiles will be clearly labeled as such, while still leading to content we think users will enjoy. We are excited about Directory Tiles because it has inherent value to our users, it aligns with our vision of a better Internet through trust and transparency, and it helps Mozilla become more diversified and sustainable as a project."
Page editor: Nathan Willis
Announcements
Articles of interest
Top 10 legal issues for free software of 2013 (opensource.com)
Opensource.com covers some legal issues faced in 2013. Topics include Android patent litigation, license compliance, forks, enforcement, GitHub's license selection policy, good news in the patent wars, FOSS in government and in the private sector, contributor agreements, and collaborations. "On June 14, 2013, the district court of Hamburg found that Fantec violated the obligation in the GPLv2 to provide to its customers the "complete corresponding source code" of the software. Fantec objected that it had been assured by its Chinese supplier that the source code received from the supplier was complete. And Fantec claimed that they had investigated options with third parties for source code analysis and had been informed that such reviews were quite expensive and not completely reliable. The court rejected these excuses."
SFC Fiscal Year 2012 report
The Software Freedom Conservancy has published its annual report for the fiscal year ending February 28, 2013.
New Books
New from No Starch Press: "Learn to Program with Scratch"
No Starch Press has released "Learn to Program with Scratch", by Majed Marji.
Calls for Presentations
CFP Deadlines: February 13, 2014 to April 14, 2014
The following listing of CFP deadlines is taken from the LWN.net CFP Calendar.
Deadline | Event Dates | Event | Location |
---|---|---|---|
February 14 | May 12-May 16 | OpenStack Summit | Atlanta, GA, USA |
February 27 | August 20-August 22 | USENIX Security '14 | San Diego, CA, USA |
March 10 | June 9-June 10 | Erlang User Conference 2014 | Stockholm, Sweden |
March 14 | May 20-May 22 | LinuxCon Japan | Tokyo, Japan |
March 14 | July 1-July 2 | Automotive Linux Summit | Tokyo, Japan |
March 14 | May 23-May 25 | FUDCon APAC 2014 | Beijing, China |
March 16 | May 20-May 21 | PyCon Sweden | Stockholm, Sweden |
March 17 | June 13-June 15 | State of the Map EU 2014 | Karlsruhe, Germany |
March 21 | April 26-April 27 | LinuxFest Northwest 2014 | Bellingham, WA, USA |
March 31 | July 18-July 20 | GNU Tools Cauldron 2014 | Cambridge, England, UK |
March 31 | September 15-September 19 | GNU Radio Conference | Washington, DC, USA |
March 31 | June 2-June 4 | Tizen Developer Conference 2014 | San Francisco, CA, USA |
March 31 | April 25-April 28 | openSUSE Conference 2014 | Dubrovnik, Croatia |
April 3 | August 6-August 9 | Flock | Prague, Czech Republic |
April 4 | June 24-June 27 | Open Source Bridge | Portland, OR, USA |
April 5 | June 13-June 14 | Texas Linux Fest 2014 | Austin, TX, USA |
April 7 | June 9-June 10 | DockerCon | San Francisco, CA, USA |
If the CFP deadline for your event does not appear here, please tell us about it.
Upcoming Events
Lawrence Lessig to speak at SCALE 12X
Lawrence Lessig will be speaking at the Southern California Linux Expo on February 21, 2014. "Lessig is the director of the Edmond J. Safra Center for Ethics at Harvard University. He is also a founding member of Creative Commons and a founder of Rootstrikers, and is a former board member of the Free Software Foundation, Software Freedom Law Center and the Electronic Frontier Foundation." SCALE will take place February 21-23 in Los Angeles, CA.
Events: February 13, 2014 to April 14, 2014
The following event listing is taken from the LWN.net Calendar.
Date(s) | Event | Location |
---|---|---|
February 14-February 16 | Linux Vacation / Eastern Europe Winter 2014 | Minsk, Belarus |
February 21-February 23 | conf.kde.in 2014 | Gandhinagar, India |
February 21-February 23 | Southern California Linux Expo | Los Angeles, CA, USA |
February 25 | Open Source Software and Government | McLean, VA, USA |
February 28-March 2 | FOSSASIA 2014 | Phnom Penh, Cambodia |
March 3-March 7 | Linaro Connect Asia | Macao, China |
March 6-March 7 | Erlang SF Factory Bay Area 2014 | San Francisco, CA, USA |
March 15-March 16 | Chemnitz Linux Days 2014 | Chemnitz, Germany |
March 15-March 16 | Women MiniDebConf Barcelona 2014 | Barcelona, Spain |
March 18-March 20 | FLOSS UK 'DEVOPS' | Brighton, England, UK |
March 20 | Nordic PostgreSQL Day 2014 | Stockholm, Sweden |
March 21 | Bacula Users & Partners Conference | Berlin, Germany |
March 22-March 23 | LibrePlanet 2014 | Cambridge, MA, USA |
March 22 | Linux Info Tag | Augsburg, Germany |
March 24-March 25 | Linux Storage Filesystem & MM Summit | Napa Valley, CA, USA |
March 24 | Free Software Foundation's seminar on GPL Enforcement and Legal Ethics | Boston, MA, USA |
March 26-March 28 | Collaboration Summit | Napa Valley, CA, USA |
March 26-March 28 | 16. Deutscher Perl-Workshop 2014 | Hannover, Germany |
March 29 | Hong Kong Open Source Conference 2014 | Hong Kong, Hong Kong |
March 31-April 4 | FreeDesktop Summit | Nuremberg, Germany |
April 2-April 4 | Networked Systems Design and Implementation | Seattle, WA, USA |
April 2-April 5 | Libre Graphics Meeting 2014 | Leipzig, Germany |
April 3 | Open Source, Open Standards | London, UK |
April 7-April 8 | 4th European LLVM Conference 2014 | Edinburgh, Scotland, UK |
April 7-April 9 | ApacheCon 2014 | Denver, CO, USA |
April 8-April 10 | Open Source Data Center Conference | Berlin, Germany |
April 8-April 10 | Lustre User Group Conference | Miami, FL, USA |
April 11-April 13 | PyCon 2014 | Montreal, Canada |
April 11 | Puppet Camp Berlin | Berlin, Germany |
April 12-April 13 | State of the Map US 2014 | Washington, DC, USA |
If your event does not appear here, please tell us about it.
Page editor: Rebecca Sobol