In early 2012, Mozilla released Collusion, an experimental extension for Firefox that recorded third-party web tracking and rendered its information in a graph-like visualization. Since then, the add-on has undergone further development, and Mozilla announced its first major update on October 25. In a move that will come as no shock to Mozilla watchers, the project has been renamed, but it also now offers additional ways to dig into the tracking data that the extension collects, and users can now choose to contribute their tracking statistics to a data survey Mozilla is conducting on web privacy.
The rebranded extension is now called Lightbeam, and is compatible with Firefox 18 and newer. Lightbeam works by recording third-party HTTP requests in the pages visited with the browser, noting requests that match a list of web tracking services and advertisers. The list itself originally came from privacychoice.org. Each tracker encountered is saved in a local record that notes which domain linked to the tracker, whether cookies were set by the request, whether the request was made via HTTP or HTTPS, whether the user has actually visited the tracker domain in question, and a few other details about the connection.
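The mechanics are straightforward to sketch. The following is a minimal illustration (not Lightbeam's actual code, which is a JavaScript browser extension) of how a request can be classified as third-party and matched against a tracker-domain list; the domains and record fields here are hypothetical stand-ins for the real list and storage format:

```python
from urllib.parse import urlparse

# Hypothetical tracker domains; the real list originally came from
# privacychoice.org and is far longer.
TRACKERS = {"doubleclick.net", "scorecardresearch.com"}

def is_known_tracker(request_url, tracker_domains=TRACKERS):
    """True if the request host is a known tracker or a subdomain of one."""
    host = urlparse(request_url).hostname or ""
    # Matches "ads.doubleclick.net" as well as "doubleclick.net" itself.
    return any(host == d or host.endswith("." + d) for d in tracker_domains)

def log_third_party(page_url, request_url, tracker_domains=TRACKERS):
    """Build a record like the one Lightbeam keeps for each request."""
    page_host = urlparse(page_url).hostname
    req_host = urlparse(request_url).hostname
    if page_host == req_host:
        return None  # first-party request; nothing to record
    return {
        "visited": page_host,          # the site the user went to
        "third_party": req_host,       # the site that came along for the ride
        "is_tracker": is_known_tracker(request_url, tracker_domains),
        "secure": urlparse(request_url).scheme == "https",
    }
```

A page load of `https://example.com/` that triggers a request to `ads.doubleclick.net` would produce a record flagged as a tracker connection, while a first-party request produces none.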
The first visualization of this data (and the only option in the Collusion-branded release) was a connected graph, with edges connecting the domains that a user has intentionally visited to all of the tracker domains that come along for the ride. Not only does that visualization immediately show which sites are the worst third-party tracking offenders, but it also reveals how many independent sites are sending data back to the same tracking service. This is an educational experience; most users are aware that large web service companies like Google and Facebook do web tracking, but seeing the full list of third-party, business-to-business web-tracking sites without household names is another thing altogether.
The graph is updated in real time, which makes for interesting (and arguably creepy) viewing in a separate window while one goes about the day's surfing. For example, over the past 48 hours, Lightbeam indicates that I have visited 94 sites, which have in turn collectively reached out to 177 third-party sites. Not all of those third-party sites are trackers, of course: those that are known to be tracking services are rendered as blue-outlined nodes, while others are rendered with white outlines. The graph shows visited sites as circles and third-party sites as triangles.
Interested users should note, though, that Lightbeam is still on the buggy side; the project's GitHub issue tracker indicates that quite a few users are encountering compatibility problems with other extensions—particularly those that also deal with third-party web tracking or privacy protection. There also appear to be several Firefox privacy preference settings that break Lightbeam. Ironically, of course, the users most interested in Lightbeam are likely to also be the most interested in tweaking privacy settings and in installing other web-tracking countermeasures, so these unresolved issues greatly impact Lightbeam's usefulness.
When it does work, the graph visualization allows one to focus on each individual site, shuffling the graph connections around so that the site of interest is in the center. Clicking on a node also opens a side pane showing the server location and the list of third-party sites that have been contacted. While the graph is revealing from a big-picture perspective, it is not necessarily the easiest way to study the data in depth. Luckily, Lightbeam offers two other views on the same data: the "clock" provides a time-based look at the visits and third-party connections made, and there is also a straightforward list.

Users can also save the extension's current data locally, or reset it to begin a new capture session. That capability would be most useful to do something like compare the effect that changing the "Do Not Track" preference has on third-party trackers, but in my tests enabling the "Do Not Track" setting was one of the preferences that stopped Lightbeam from working altogether.
Better visualization features are nice, but arguably the biggest change in the revamped extension is that Lightbeam users can opt in to send their data to Mozilla. That might sound a tad oxymoronic, but Mozilla insists that the data collected from users will be anonymized and will only be published in aggregate. For instance, only the domain and subdomain of visited sites are recorded, not the path component of URLs, so in most cases usernames and other personally identifiable data will not be included.
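Stripping a URL down to its host is the kind of anonymization being described; a minimal sketch (an assumption about the approach, not Mozilla's actual code) shows how the identifying parts of a URL fall away:

```python
from urllib.parse import urlparse

def anonymize(url):
    """Keep only the host of a visited URL.

    The path, query string, and fragment (where usernames, tokens, and
    other identifying details usually appear) are all dropped; so is any
    user:password component before the host.
    """
    return urlparse(url).hostname
```

For example, `https://alice:secret@mail.example.com/inbox/alice?msg=42` reduces to just `mail.example.com`, which is the level of detail the survey records.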
In September 2012, Mozilla's David Ascher wrote about the data-collection effort, saying that it would look at the "different kinds of uses of shared third-party HTTP requests," so that users can better understand what types of request are in their interest and which are not. He also said that Mozilla intended to use the data to work with site publishers. The blog post announcing the October Lightbeam release followed up on this idea, albeit with few specifics. The post says only that "Once the open data set has time to mature, we’ll continue to explore how publishers can benefit from additional insights into the interaction of third parties on their sites."
There may indeed be a lot about third-party tracking (both commercial tracking services and tracking that is performed surreptitiously) that site owners are generally in the dark about. Still, it would be nice to have more detail available about exactly what Mozilla's data-collection effort will look like before opting in to it. In the meantime, though, Lightbeam definitely does "pull back the curtain" (as the blog post puts it) on web tracking for individual users. Hopefully, as Mozilla pursues a broader effort, the results will be enlightening for users of the web in general—most of whom know that "web tracking" exists in some form, but for whom its full extent and methods remain a mystery.
Wolfram Sang is worried that the number of kernel maintainers is not scaling with the number of patches flowing into the mainline. He has collected some statistics to quantify the problem and he reported those findings at the 2013 Embedded Linux Conference Europe. There is not an imminent collapse in the cards, according to his data, but he does show that the problem is already present and he forecasts that it will only get worse.
Sang's slides [PDF] had numerous graphs (some of which are reproduced here). The first simply showed the number of patches that went into each kernel from 3.0 to 3.10. As one would guess, the trend is an increase in the number of patches. Companies are working more with the upstream kernel than they have in the past, he said, which is great, but leads to more patches.
There are also, unsurprisingly, more contributors. Using the tags in Git commits, Sang counted the number of authors and contrasted that with the number of "reviewers" (using the "Committer" tag) in his second graph. Over the 3.0–3.10 period, the number of authors rose by around 200, which is, coincidentally, roughly the (largely static) number of reviewers. That means that the gap between authors and reviewers is getting larger over time, which is a basic outline of the scaling problem, he said. If we want to maintain Linux with the quality we have come to expect, it is a problem we need to pay attention to.
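Numbers like these can be pulled straight out of the repository metadata. A small sketch (my own reconstruction of the method, not Sang's actual script) counts distinct authors and committers from `git log` output such as `git log v3.0..v3.1 --no-merges --pretty='%ae%x09%ce'`:

```python
def author_committer_gap(log_lines):
    """Count distinct patch authors and committers.

    log_lines: lines of 'author-email<TAB>committer-email', as produced by
    e.g.  git log v3.0..v3.1 --no-merges --pretty='%ae%x09%ce'
    """
    authors, committers = set(), set()
    for line in log_lines:
        author_email, committer_email = line.rstrip("\n").split("\t")
        authors.add(author_email)
        committers.add(committer_email)
    return len(authors), len(committers)
```

Run per release range, the first number grows release after release while the second stays roughly flat, which is the widening gap Sang's second graph shows.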
His statistics are based on accepted patches and don't include superseded patches or bogus patches that might require a lengthy explanation to the author. There is also a fair amount of education that maintainers often do for new developers. All of that takes additional time beyond what a raw number of patches will show, he said. While his graphs start at 3.0, he does not want to give the impression that the maintainer workload was in a good state at that time: "it was challenging already" and it is getting worse.
Trond Myklebust gave the best definition of a maintainer that Sang knows of. According to that definition, the job is made of five separate roles: software architect, software developer, patch reviewer, patch committer, and software maintainer. It is not easy to rip out any of those tasks to distribute them to other people. The "number one rule is to get more maintainers who like to do all of these jobs at once".
The right maintainer will enjoy all of those jobs, which makes good candidates fairly rare. Sang suggested that developers remember that maintainers have all those roles when interacting with them. He doesn't mean that developers should obey maintainers all the time, he said, but they should keep in mind that the maintainer may be wearing their architect hat so they may be looking beyond the direct problem the developer is trying to solve.
The "Reviewed-By" and "Tested-By" tags are quite helpful to him as a maintainer because they indicate that the patch is useful to others beyond just its author. That led him to look at the stats for those tags. He plotted the number of reviewers and testers who were not also committers to try to gauge that pool. That graph appears above at right, and shows that there are around 200 reviewers and 200 testers for each kernel cycle as well. The trend is much like that of maintainers, so there is still an increasing gap. The reviewers and testers "are doing great work", but more of them are needed as well.
Using a diagram of the different pieces in a typical ARM system on chip (SoC), Sang showed that there are many different subsystems that go into a kernel for a particular SoC. He wanted to look at those subsystems to see how well they are functioning in terms of how quickly patches are being merged. He also wanted to compare the i2c subsystem that he maintains to see how it measured up.
Using the "AuthorDate" and "CommitDate" tags on patches from 3.0 to 3.10, he measured the latency of patches for several different subsystems (the combined graph is shown at right). That metric can be inaccurate as a true measure of latency if there are a lot of merge commits (as the CommitDate may not reflect when the maintainer added the patch), but that was not really a problem for the subsystems he looked at, he said.
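The latency metric itself is just the difference between those two timestamps. A sketch of the computation (an illustration of the method, not Sang's script; the data would come from something like `git log --no-merges --pretty='%at %ct'`, which emits both dates as Unix timestamps):

```python
from datetime import timedelta

def merge_latencies(commits):
    """Per-patch latency in seconds.

    commits: iterable of (author_ts, commit_ts) Unix-timestamp pairs,
    e.g. parsed from  git log --no-merges --pretty='%at %ct'
    """
    return [commit_ts - author_ts for author_ts, commit_ts in commits]

def fraction_within(latencies, days):
    """Fraction of patches merged within the given number of days."""
    limit = timedelta(days=days).total_seconds()
    merged = sum(1 for latency in latencies if latency <= limit)
    return merged / len(latencies)
```

With this, a statement like "85% of patches merged within 28 days" is simply `fraction_within(latencies, 28)` over a subsystem's commits for a release range.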
He started with the drivers/net/ethernet subsystem, which had some 5000 patches over the kernel releases measured. It has a fairly low latency, with 85% of the patches being merged within 28 days. This is what developers want, he said, a prompt response to their patches. In fact, 70% of patches were merged within one week for that subsystem.
Looking at the mtd subsystem shows a different story. Sang was careful to point out that he was not "bashing" any subsystem or maintainer as they all do "great work" and do what they can to maintain their trees. After 28 days, mtd had merged just over 50% of its patches. Those might be more complicated patches so they take more review time, he said. That is difficult to measure. After one three-month kernel cycle, about 80% of the patches were merged.
Someone from the audience spoke up to say that many of the network driver patches get merged without much review because there is no dedicated maintainer. That makes the latency lower, but things get merged with lots of bugs. Sang said that is one way to deal with a lack of reviewers: if the network driver patch looks "halfway reasonable" and no one complains, it will often just be merged. If anyone is unhappy with that state of affairs, they should volunteer to help, another audience member suggested. Sang agreed, and said that taking on the maintenance of a single driver is a good way to learn.
His subsystem is somewhere in between net/ethernet and mtd. That is "not bad", he said, for someone doing the maintenance in their spare time. But for a vendor trying to get SoC support upstream, it may not be quick enough.
The dream, he said, would be for all subsystems to be more like the Ethernet drivers without accepting junk patches. His belief is that over time the latency in most subsystems is getting worse. In fact, his "weather forecast" is that we will see more and more problems over time, either with increased latency or questionable patches going into the trees.
So, what can be done to help out maintainers? For users (by which he means people building kernels for customers, not necessarily developers or regular users), he recommends giving more feedback to subsystem maintainers. Commenting on patches, testing them, and describing any problems found will help; add a Tested-By tag if you have done that, as well. If there is no reaction to a patch on the mailing list and seemingly no interest, it makes his job of deciding whether the patch is worthwhile difficult. If you are using a patch that hasn't been merged, consider resending it, but check first to see if there are open issues from when the patch was posted. Sometimes there are simple style changes needed that can be easily fixed.
For developers, he recommends trying to get the patch right the first time and thus reducing the number of superseded patches. Not knowing the subsystem and making mistakes that way is reasonable, but sloppy patches are not. In addition, if you know a patch is a suboptimal solution to the problem, be honest about it. Don't try to sell him something that you know is bad. Sometimes a lesser solution is good enough, but a straight explanation should accompany it.
He also recommends that developers take part in the patch QA process by reviewing other patches. In fact, he said, you should also review your own patches as if they came from someone else—it is surprising what can be found that way. Taking part in the mailing list discussions, especially those that are about the architecture of the subsystem, is important as well. It is difficult to determine which way to go, at times, without people stating their opinions.
Maintainers should not necessarily work harder, Sang said, as most are working hard already. It is important to watch out for burnout as no one wins if that happens. SoC vendors are "constantly pressing the fast-forward button" by releasing hardware faster and faster, so you may reach a point where you simply can't keep up. That may be time to look for a co-maintainer.
Having the right tools is an important part of being a maintainer. There is no "ready-made toolbox" that is handed out to new maintainers, but if you talk to other maintainers, they may have useful tools. Keyboard shortcuts, Git hooks for doing auto-testing, tools to handle and send out email, and so on are all time savers. "Pay attention to the boring and repetitive tasks" and try to automate them.
Organizations like the Linux Foundation (LF), Linaro, SoC makers, and others have a role to play as well. If they already have developers, it is important to allow those developers to review patches and otherwise participate in kernel QA. That will improve the developers' skills, which will help the organization, and it will improve the kernel too.
It is important to educate new kernel developers internally about the basics of kernel submissions. He is much more lenient with someone he knows is working on their own than with those working at companies where multiple people already know about submitting patches and could have passed the knowledge on.
Increasing the number of maintainers would help as well. It might be easier for people to take on maintainer responsibilities if it were part or all of their job to do so. Sang believes that being a maintainer should ideally be a full-time paid position, but that is often not the case. He does it on his own time, as do others, and some do it as part, but usually not all, of their job. A neutral party like the LF might be desirable as the employer of (more) maintainers, but other organizations or companies could also help out. In his mind, it is the single most important step that could be taken to improve the kernel maintainer situation.
He went back to the SoC diagram he showed early on, but this time colored the different subsystems based on whether the maintainer was being paid to do that work. Red meant that the maintainer was doing it in their spare time, and there were quite a few subsystems in that state. That is somewhat risky for SoC vendors trying to get their code upstream. Ideally, most or all of the diagram would be green (maintainer paid to do it) or yellow (part of the maintenance time is paid for). Sang ended by saying that having full-time maintainers was really something whose time had come and he is optimistic that more of that will be happening soon.
[I would like to thank the Linux Foundation for travel assistance to Edinburgh for the Embedded Linux Conference Europe.]
The third Automotive Linux Summit (ALS) was held in Edinburgh, Scotland, concurrently with the Embedded Linux Conference and Kernel Summit. As has been the case with the previous ALS events, there was a lot of talk from automakers about developing Linux-based in-vehicle infotainment (IVI) and embedded control systems—talk that, for the most part, dealt with efforts that are still several years away from reaching the showroom floor. Such is simply the nature of the car business; new car models require a multi-year development cycle for numerous reasons. But what was more interesting in the schedule of the Edinburgh event was tracking the progress of the many sub-projects on which a Linux-based IVI system depends. Over the past two years, the community has seen the topics move steadily from what an IVI system should include to active projects with working code.
The leading example of this progress comes from GENIVI, the car-industry alliance working on a Linux-based IVI middleware layer. In 2012, GENIVI launched its first open source projects: an audio routing manager, a graphics layer manager, and a diagnostic logging tool. Since that time, the stable of GENIVI projects has grown to fifteen.
GENIVI's Philippe Gicquel delivered one of the keynote talks on the first day of ALS, starting off with a description of how GENIVI sees itself fitting into the expanding community of Linux-driven automotive software projects. The alliance's target, he said, is the non-differentiating components of the IVI stack: those bits on which car makers and tier-one suppliers would rather not compete head-to-head, since they are largely invisible to consumers. In particular, GENIVI is pursuing a specific set of domains: multimedia and graphics, connectivity with consumer electronics devices (e.g., phones), location-based services, and integrating Linux with existing automotive standards (such as diagnostics). Each domain has its own working group at GENIVI; the groups are collectively coordinated by GENIVI's System Architecture and Baseline Integration teams.
Gicquel then briefly discussed the active software projects. GENIVI's standard practice is to work upstream, he said, and it contributes to Wayland and other projects, but it does maintain its own IVI-specific branches when a full merge upstream is not possible. For example, he said, the IVI Layer Manager project (one of the first three projects) maintains a patch set that makes Wayland GENIVI-compliant, since its use case is not important to 95% of Wayland systems.
The newer GENIVI projects include several that build on upstream work, such as AF_BUS, which is a latency-reducing optimization of D-Bus. Node Startup Controller (NSC) and Node State Manager (NSM) are components used to manage application lifecycles in the vehicle. NSC extends systemd to handle rapid startup and shutdown of applications and system services (where often the maximum allowable time is a legal requirement). NSM provides a state machine to track applications' lifecycles.
Other projects are original. Persistence Management is a library to handle storage of data that needs to persist across reboots and, like NSC, is designed to cope with the rapid system shutdowns expected in automotive environments. IPC CommonAPI C++ is a set of C++ language bindings that abstracts several inter-process communication APIs into one (the "common API"). Currently D-Bus, SOME/IP, and Controller Area Network (CAN) bus are the IPC mechanisms targeted, but support for others may be added later. Smart Device Link (SDL) is a framework for smartphone applications to run remote user interfaces on a car's dash head unit.
There are also several developer tools in GENIVI's projects. YAMACIA is a plugin for the Eclipse IDE that supports the IPC CommonAPI and the Franca interface definition language used by the CommonAPI, while LXCBench is an analysis tool that benchmarks the performance of applications run in Linux containers.
Finally, there are several "proof of concept" (POC) projects that implement basic functionality to demonstrate APIs or features that GENIVI expects vehicle OEMs to replace. The browser POC is a Qt-and-WebKit based browser component. Similarly, the Tuner Station Manager demonstrates the use of the radio tuner API, and the Point-Of-Interest Service is a demonstration of the location-based services group's point-of-interest (POI) API. Web API Vehicle is an HTML5 application interface toolkit designed as a proof-of-concept to show application developers how to access various W3C web APIs in a vehicle.
Gicquel estimated that there were about 75 active contributors to the various GENIVI projects at present, more than 20 code repositories, and that they had written 500,000 lines of code so far. The alliance recently appointed longtime GENIVI developer Jeremiah Foster as the developer community manager, and is looking to open up even more of its development processes in the coming months. There are mailing lists for every project, he said, but the alliance is still in the process of "moving from silos to a community."
How AGL, GENIVI, and Tizen IVI fit together (or don't) was certainly a source of confusion in years past, but as each of the projects has released more code, the picture has become clearer. Intel's Brett Branch presented a session on the first day of ALS that dealt with that very question (among others); as he explained, GENIVI provides a middleware layer, but vehicle OEMs will use GENIVI code in conjunction with a Linux base system to build a final product—and will likely include other components as well, such as AUTOSAR-compliant components for safety-critical systems.
Indeed, while GENIVI now offers two distinct baseline Linux distributions built on different platforms (an x86-based system built on Baserock and an ARM system built with Yocto), Tizen IVI rolls out a single release that includes components from GENIVI projects, code from elsewhere, and code written within the Tizen project itself.
At ALS 2013, there were a variety of talks about Tizen IVI components. Intel's Ossama Othman discussed the project's work implementing several new standards for mobile device integration. The driving force, he said, is consumers' desire to seamlessly move between their smartphone screen and their IVI unit for common applications like music playback or mapping. Othman's primary focus is implementing the display integration, where, he said, there are three main standards: SDL, MirrorLink, and Miracast.
MirrorLink and Miracast both work by cloning the smartphone's display to the IVI head unit, but they differ considerably otherwise. First, MirrorLink is a closed specification created by the Car Connectivity Consortium (CCC), while Miracast is an open standard created by the WiFi Alliance. Second, MirrorLink explicitly handles sending input events from the IVI head unit (e.g., gestures and touch input) back to the smartphone, but Miracast is a one-way, display-only standard. Miracast applications are thus left on their own to send data from the head unit back to the mobile device through some other means. Third, MirrorLink requires a USB connection between the device and head unit, whereas Miracast (as one might expect) is wireless—although it consumes a lot of bandwidth. Miracast also mandates use of the non-free H.264 video codec, which makes a "clean" free software implementation impossible. But MirrorLink requires every MirrorLink-compatible application to go through a certification process.
But car makers might be willing to foot the bill for H.264 licenses or MirrorLink certification, so Tizen IVI explores both. At the moment, Othman has implemented as much of Miracast as he can; many of the necessary pieces are out there already (such as support for WiFi Direct in the kernel and in wpa_supplicant), but for others he is still in the process of persuading people to release the necessary code under an open license. The Tizen IVI releases do ship with hardware support for H.264 playback, he said, and they have support for some WiFi Direct hardware.
The MirrorLink situation is far more bleak; there appear to be no open source implementations at all, but Othman said he is still looking. He also noted that it was unclear to him whether there were ways to get access to MirrorLink documentation without paying for membership in the CCC.
The third standard, SDL, is the most feasible to implement. The open source code (which is hosted at GENIVI) was a donation from Ford, but Othman said it was difficult to integrate. Ford has not participated in the project since making the initial code contribution, and the existing code is "packaged weirdly," making it a pain to work with. For example, it includes hardcoded links to libraries not included in Tizen and to specific executables (like Google Chrome), it statically links all of the libraries it uses, and it generates a number of symbol conflicts and other errors when compiled. Othman said he has not been able to figure out which version of g++ Ford used to compile it internally, but that it is clearly an out-of-date one. Nevertheless, he was able to patch it, and it is integrated in the latest Tizen releases.
The good news is that SDL is the most functional of the three; it implements a full remote display (not simply cloning the device screen), handles input events, and allows application developers to implement separate interfaces for the device and IVI screens.
Patrick Ohly gave a related presentation (which Othman encouraged his audience to attend) about the work he has done in Tizen IVI to integrate the synchronization of personal information management (PIM) data, such as calendars and contacts databases, between smartphones and IVI head units.
The Tizen IVI PIM stack is based on GNOME tools: Evolution Data Server, Folks, and SyncEvolution. This is a different stack than the one used in the Tizen smartphone releases, which does not address synchronization. Adapting the GNOME stack to Tizen IVI did require several changes, he said, such as storing all data in an SQLite database, and he has added several other useful tools, such as libphonenumber, an Android library for normalizing phone numbers. SyncEvolution supports a configurable set of address books on different devices; it can thus present a unified address book for all phones synced to the IVI unit, while keeping them synchronized separately. He is currently working on CardDAV and CalDAV support (including support for Google's PIM services), as well as support for the Bluetooth Phone Book Access Profile.
To the smartphone development community, address book synchronization may sound like old news. But it and the other projects on display at ALS are significant because they reveal how much progress has been made in recent months. At the 2012 ALS, the biggest news was the availability of the AGL Demonstrator, a demo IVI system sporting many UI mock-ups that would, today, be filled by real applications. Indeed, between the Tizen IVI milestone releases and GENIVI's base systems, there were three Linux-based IVI systems available this year.
There were still carmakers on stage discussing IVI systems that they had built but that were not yet available, but the tenor of the discussion has changed. In his keynote address, Jaguar Land Rover's Matt Jones commented that being in the IVI business meant that the company was expected to pony up membership fees for a wide assortment of industry alliances: AUTOSAR, CEA, DLNA, Bluetooth SIG, ERTICO, and many more. Together the membership fees add up to $800,000 or more, he said, but through its involvement with GENIVI and Tizen he has seen a far better return on the money that the company has put into open source projects. He ended that discussion by inviting anyone with an automotive open source project to come talk to him about it, since that is what he would prefer to invest his budget in. It will be interesting to see just how much bigger the playing field is at the next ALS, whether from Jones's investment or elsewhere.
[The author would like to thank the Linux Foundation for travel assistance to Edinburgh for ALS.]
The final day of LinuxCon Europe had some of the only content that was focused on the largely European audience at the conference. Mikko Hypponen, chief research officer at F-Secure, gave a talk about living in a surveillance state, with an unmistakable slant toward Europe and the rest of the world outside of the US. There is an imbalance in the surveillance being done, not just the imbalance of governments vs. the people, but also that of the US vs. the rest of the world.
Hypponen started with a little personal history. He is from Finland, "where it was snowing on Saturday", and started programming at 13, because he is a Finn and that is "what we do", he said with a chuckle. In 1991, when he was a bit older, he reverse-engineered boot-sector viruses, which was his introduction to the security world.
Over the last few years, we have started realizing that "data is cheap", he said. We no longer have to decide what to keep and what to discard; we can just keep it all forever. It is the "biggest shift" in our thinking that has happened in that time frame, and it has enabled lots of great things. It also has enabled the storage of surveillance data for, essentially, ever.
What we are seeing today is "wholesale blanket surveillance", with the US National Security Agency (NSA) capturing who we talk to, what we search for, who we email with, and on and on. The laws in the US give the NSA the right to do that for "foreigners", which means 96% of the planet, Hypponen said. Everyone in the world uses US-based services "all the time"; from the cloud to web mail and beyond, all of the most popular services are US-based.
To store all of that information, the NSA is building its "infamous" data center in Utah. He could give the estimates for the amount of data it will hold, but thought it would work better with an analogy that can be more easily visualized. Think of the "largest IKEA you have ever seen", and the NSA's new data center is five times that size. Now think about the number of hard disks you can put into one of those IKEAs, he said.
We are more honest with the internet than we are with friends and family, he said. That means we give away a lot of information about ourselves when we use the internet. To illustrate that, his slide showed search autocompletes for various partial phrases such as "should I tell my girlfriend ...".
According to Hypponen, some surveillance is reasonable. For a school shooter, drug lord, or member of a terrorist cell, for example, surveillance should be allowed and the authorities should have the technical means to carry it out. But first, there must be suspicion of the person in question and proper legal papers must be filed.
That is not what is going on today. Instead, everyone is being surveilled, including many who are known to be innocent. While you may not worry about the current government misusing that information, the government could change at any time. Show me your search history, he said, and I can find something illegal or embarrassing easily.
Various people will say that we already knew about this surveillance, that it's nothing new. "Don't listen to them", Hypponen said. We may have suspected this was going on, but now we have the facts. The leaks from Edward Snowden are nearly unique because they are "top secret" documents, which almost never leak. They are bigger than anything WikiLeaks has released or the leaks by Private Chelsea Manning, neither of which contained any top-secret information. For example, we did not know that the NSA was subverting cryptographic algorithms—making us all less secure so its job is easier—until the Snowden releases.
Another "defense" is that "all countries spy", but that is something of a red herring. There is a clear imbalance because of the popularity and prevalence of US-based services. Think of the number of Swedish government officials and business leaders who use US-based services or an operating system that comes from the US. Every single one does so every day, he said. Now think of the US equivalents who use Sweden-based services or operating systems: none. That is the imbalance.
There is also the argument made that this is a tool in the "war on terror". It is not, he said. There is an effort being made to find terrorists, but there is much more going on than that. The NSA is monitoring communications at the United Nations (UN) and European Union (EU) headquarters, but he doubts it is looking for terrorists there.
There are terrorists on the planet, Hypponen said, and we should fight them, but are terrorists truly an existential threat? Are we willing to do anything to stop them? Are we willing to throw away the US Constitution and Bill of Rights, the Universal Declaration of Human Rights, and freedom of the press to fight terrorism?
Another argument made is that "I have nothing to hide". If that's true, he said, he wants to know because that means he cannot trust you with his secrets. But it is a pervasive argument. For example, he posted a tweet about the PRISM program back in June, which was immediately greeted by "If you have nothing to hide, why does it matter? Sending naked pictures or something???". His response was that it was none of their business, and that it should be none of the government's business either. Think of what the Nixon administration would have done with the information generated from today's surveillance activities, he suggested.
In Finland in the 1970s, it was a crime to be gay, he said. With today's surveillance activities, it would have been easy to round up all of the gay people and put them in jail. Had that happened, it is likely that being gay would still be a crime in Finland today.
Hypponen quoted Dilma Rousseff, President of Brazil, who was making a complaint about the US surveillance regime at the UN: "In the absence of the right to privacy, there can be no true freedom of expression and opinion, and therefore no effective democracy." He also noted that Marcus Ranum, chief security officer at Tenable Network Security, has called the internet "a colony for the US". Hypponen said that those outside the US should note its colonization and start thinking of that country as their "masters".
Something else that we have learned through the Snowden leaks is the "three hop rule". When a target is identified for further analysis, it is not just those who the person is talking to that get looked at, but those who those people talk to, and one more hop beyond. That makes for an extremely wide net. Using the "#friendofafriendofafriend" hashtag, he also tweeted about that: "I'm scared of some of the people I'm three hops away. Actually, make that one hop."
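The "three hop rule" fans out quickly. A minimal back-of-the-envelope sketch, assuming an illustrative 100 contacts per person per hop (the figure is an assumption for illustration, not one from the talk):

```python
# Sketch of how the "three hop rule" widens the net: each hop multiplies
# the number of people swept in. contacts_per_person is an illustrative
# assumption, not a figure cited in the talk.
contacts_per_person = 100

reach = {hops: contacts_per_person ** hops for hops in (1, 2, 3)}
for hops, n in reach.items():
    print(f"{hops} hop(s): up to {n:,} people")
```

Even with modest per-hop numbers, three hops from a single target can touch on the order of a million people.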
There is a slide from the Snowden trove that lists dates when PRISM access was gained for various providers (like Facebook, Google, Microsoft, Apple, and so on). All of the providers deny giving that access, yet the slide contents have never been denied by the US government. Hypponen thinks we may finally have an explanation for the conflicting stories. More recent disclosures have shown an "Operation Socialist" that describes some "elite hacking units" of the NSA and its UK equivalent, GCHQ.
The slides describe an effort by GCHQ to attack a Belgian telecom company for surveillance purposes. What is particularly galling is how casually this kind of attack is treated in the slides, which come with "cheesy" clip art (a stylized "success", for example). There is no mention of team building in a bar, but Hypponen is sure that happened as well. So maybe those dates correspond to when those companies were, sadly, compromised by their own government; that would explain the denials in the face of the "dates of access" slide, he said.
There are a lot of people who are blaming Snowden, he said, which is a bit like blaming Al Gore for global warming. It is interesting to note how little support Snowden has gotten from the rest of the world, and Europe in particular. Hypponen asked the audience to imagine that Snowden had been Chinese and had leaked the same story. Imagine the uproar it would have caused if the Chinese government had charged him with treason—or an allied government destroyed the hard disks of a newspaper as the UK did at The Guardian. We haven't done a very good job of protecting Snowden, he said.
The internet turned out to be a perfect tool for surveillance, unfortunately, he said. Other countries should avoid using US-based services and operating systems to avoid the surveillance that seems to come with them. It is difficult to do, but the alternative is worse. He put up the famous (fake) picture of the home of George Orwell, author of 1984, with a closed-circuit TV camera in front of it, noting that "we do have a solution" to loud applause. He continued: "In many ways, Orwell was an optimist."
Hypponen ended his talk with a suggestion. Everyone should be using open source software, which mitigates much of this threat. If every single country were to participate in the creation of open source alternatives to the US-based services that are so prevalent, they would help avoid the surveillance problem—while lifting the rest of us up as well.
The talk seemed to be quite well-received by the largely European audience that it was clearly targeting. Unfortunately for those who were not present, video is not available, evidently due to an audio problem. For those who were there, though, Hypponen gave a rousing talk that certainly proved thought-provoking—exactly the kind of keynote talk one would hope for.
[I would like to thank the Linux Foundation for travel assistance to Edinburgh for LinuxCon.]
|Created:||October 25, 2013||Updated:||October 30, 2013|
|Description:||From the Ubuntu advisory:
Martin Carpenter discovered that Apport set incorrect permissions on core dump files generated by setuid binaries. A local attacker could possibly use this issue to obtain privileged information.
|Package(s):||bugzilla||CVE #(s):||CVE-2013-1734 CVE-2013-1742 CVE-2013-1743|
|Created:||October 29, 2013||Updated:||October 30, 2013|
|Description:||From the Red Hat bugzilla:
Class: Cross-Site Request Forgery
Versions: 2.16rc1 to 4.0.10, 4.1.1 to 4.2.6, 4.3.1 to 4.4
Fixed In: 4.0.11, 4.2.7, 4.4.1
Description: When an attachment is edited, a token is generated to validate changes made by the user. Using a crafted URL, an attacker could force the token to be recreated, allowing him to bypass the token check and abuse a user to commit changes on his behalf.
References: https://bugzilla.mozilla.org/show_bug.cgi?id=913904
CVE Number: CVE-2013-1734

Class: Cross-Site Scripting
Versions: 2.17.1 to 4.0.10, 4.1.1 to 4.2.6, 4.3.1 to 4.4
Fixed In: 4.0.11, 4.2.7, 4.4.1
Description: Some parameters passed to editflagtypes.cgi were not correctly filtered in the HTML page, which could lead to XSS.
References: https://bugzilla.mozilla.org/show_bug.cgi?id=924802
CVE Number: CVE-2013-1742

Class: Cross-Site Scripting
Versions: 4.1.1 to 4.2.6, 4.3.1 to 4.4
Fixed In: 4.2.7, 4.4.1
Description: Due to an incomplete fix for CVE-2012-4189, some incorrectly filtered field values in tabular reports could lead to XSS.
References: https://bugzilla.mozilla.org/show_bug.cgi?id=924932
CVE Number: CVE-2013-1743
|Package(s):||chromium-browser||CVE #(s):||CVE-2013-2925 CVE-2013-2926 CVE-2013-2927 CVE-2013-2928|
|Created:||October 28, 2013||Updated:||November 19, 2013|
|Description:||From the CVE entries:
Use-after-free vulnerability in core/xml/XMLHttpRequest.cpp in Blink, as used in Google Chrome before 30.0.1599.101, allows remote attackers to cause a denial of service or possibly have unspecified other impact via vectors that trigger multiple conflicting uses of the same XMLHttpRequest object. (CVE-2013-2925)
Use-after-free vulnerability in the IndentOutdentCommand::tryIndentingAsListItem function in core/editing/IndentOutdentCommand.cpp in Blink, as used in Google Chrome before 30.0.1599.101, allows user-assisted remote attackers to cause a denial of service or possibly have unspecified other impact via vectors related to list elements. (CVE-2013-2926)
Use-after-free vulnerability in the HTMLFormElement::prepareForSubmission function in core/html/HTMLFormElement.cpp in Blink, as used in Google Chrome before 30.0.1599.101, allows remote attackers to cause a denial of service or possibly have unspecified other impact via vectors related to submission for FORM elements. (CVE-2013-2927)
Multiple unspecified vulnerabilities in Google Chrome before 30.0.1599.101 allow attackers to cause a denial of service or possibly have other impact via unknown vectors. (CVE-2013-2928)
|Created:||October 28, 2013||Updated:||November 18, 2013|
|Description:||From the Mageia advisory:
Inconsistent delays in authorization failures could be used to disclose the existence of valid user accounts in dropbear before 2013.59
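The dropbear flaw is a classic timing oracle: failures for unknown users returned faster than failures for known users. A minimal sketch of the usual mitigation, doing the same amount of work on both paths; the function names, user table, and hashing scheme here are illustrative assumptions, not dropbear's actual code:

```python
# Sketch of avoiding a username-disclosure timing oracle: always hash and
# compare, even for unknown users, so valid and invalid usernames take a
# comparable code path. All names here are illustrative.
import hashlib
import hmac

USERS = {"alice": hashlib.sha256(b"correct horse").hexdigest()}
DUMMY_HASH = hashlib.sha256(b"dummy").hexdigest()  # compared for unknown users

def check_login(username: str, password: str) -> bool:
    # Unknown users get a dummy hash, so the comparison still runs.
    expected = USERS.get(username, DUMMY_HASH)
    supplied = hashlib.sha256(password.encode()).hexdigest()
    ok = hmac.compare_digest(expected, supplied)  # constant-time comparison
    return ok and username in USERS

print(check_login("alice", "correct horse"))  # True
print(check_login("mallory", "guess"))        # False
```

Real password storage would use a salted, slow KDF; the point here is only that success and failure should be indistinguishable by timing.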
|Package(s):||firefox, thunderbird, seamonkey||CVE #(s):||CVE-2013-5590 CVE-2013-5595 CVE-2013-5597 CVE-2013-5599 CVE-2013-5600 CVE-2013-5601 CVE-2013-5602 CVE-2013-5604|
|Created:||October 30, 2013||Updated:||December 10, 2013|
|Description:||From the Red Hat advisory:
Several flaws were found in the processing of malformed web content. A web page containing malicious content could cause Firefox to terminate unexpectedly or, potentially, execute arbitrary code with the privileges of the user running Firefox. (CVE-2013-5590, CVE-2013-5597, CVE-2013-5599, CVE-2013-5600, CVE-2013-5601, CVE-2013-5602)
A flaw was found in the way Firefox handled certain Extensible Stylesheet Language Transformations (XSLT) files. An attacker could combine this flaw with other vulnerabilities to execute arbitrary code with the privileges of the user running Firefox. (CVE-2013-5604)
|Package(s):||firefox||CVE #(s):||CVE-2013-5591 CVE-2013-5592 CVE-2013-5593 CVE-2013-5596 CVE-2013-5598 CVE-2013-5603|
|Created:||October 30, 2013||Updated:||December 1, 2013|
|Description:||From the Ubuntu advisory:
Multiple memory safety issues were discovered in Firefox. If a user were tricked into opening a specially crafted page, an attacker could possibly exploit these to cause a denial of service via application crash, or potentially execute arbitrary code with the privileges of the user invoking Firefox. (CVE-2013-5591, CVE-2013-5592)
Jordi Chancel discovered that HTML select elements could display arbitrary content. An attacker could potentially exploit this to conduct URL spoofing or clickjacking attacks. (CVE-2013-5593)
Ezra Pool discovered a crash on extremely large pages. An attacker could potentially exploit this to execute arbitrary code with the privileges of the user invoking Firefox. (CVE-2013-5596)
Cody Crews discovered a way to append an iframe into an embedded PDF object displayed with PDF.js. An attacker could potentially exploit this to read local files, leading to information disclosure. (CVE-2013-5598)
Abhishek Arya discovered a use-after-free when interacting with HTML document templates. An attacker could potentially exploit this to cause a denial of service via application crash or execute arbitrary code with the privileges of the user invoking Firefox. (CVE-2013-5603)
|Created:||October 24, 2013||Updated:||November 19, 2013|
|Description:||From the Ubuntu advisory:
Stuart McLaren discovered that Glance did not properly enforce the 'download_image' policy for cached images. An authenticated user could exploit this to obtain sensitive information in an image protected by this setting.
|Created:||October 29, 2013||Updated:||December 1, 2013|
|Description:||From the Red Hat bugzilla:
Upstream GnuTLS versions 3.1.15 and 3.2.5 correct a buffer overflow in the dane_query_tlsa() function used to parse DANE (DNS-based Authentication of Named Entities) DNS records. The function parses the DNS server's reply into a dane_query_st / dane_query_t struct, which can hold up to 4 entries, but it failed to check this limit and allowed parsing more than 4 entries from the reply, resulting in a buffer overflow.
An application using DANE protocol to verify certificates could crash or, possibly, execute arbitrary code when parsing a response from a malicious DNS server.
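The bug is a missing bounds check: the reply may carry any number of TLSA records, but the destination structure holds only four. A minimal sketch of the check the fix adds; the function name, record format, and MAX_ENTRIES constant are illustrative stand-ins, not GnuTLS's actual API:

```python
# Illustrative sketch of the missing bounds check: stop filling the
# fixed-size result once it is full, instead of writing past the end as
# the vulnerable code did. Names here are stand-ins, not GnuTLS's API.
MAX_ENTRIES = 4  # the struct has room for four entries

def parse_tlsa_reply(records: list) -> list:
    entries = []
    for rec in records:
        if len(entries) >= MAX_ENTRIES:
            break  # the vulnerable code kept writing past this point
        entries.append(rec)
    return entries

print(parse_tlsa_reply([b"rec"] * 10))  # only the first four survive
```

In C, the equivalent unchecked loop writes attacker-controlled data past the end of the struct, which is what makes the crash or code execution possible.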
|Created:||October 24, 2013||Updated:||November 19, 2013|
|Description:||From the CVE entry:
OpenStack Identity (Keystone) Folsom, Grizzly 2013.1.3 and earlier, and Havana before havana-3 does not properly revoke user tokens when a tenant is disabled, which allows remote authenticated users to retain access via the token.
|Created:||October 28, 2013||Updated:||December 4, 2013|
|Description:||From the Red Hat bugzilla:
It was found that guestfish, which enables shell scripting and command line access to libguestfs, insecurely created the temporary directory used to store the network socket when started in server mode (using the "--listen" option). If guestfish were run with the "--listen" option, a local attacker could use this flaw to intercept and modify other users' guestfish commands, allowing them to perform arbitrary guestfish actions (such as modifying virtual machines) with the privileges of a different user, or use this flaw to obtain authentication credentials.
|Created:||October 29, 2013||Updated:||December 17, 2013|
|Description:||From the CVE entry:
The HTTP server in Node.js 0.10.x before 0.10.21 and 0.8.x before 0.8.26 allows remote attackers to cause a denial of service (memory and CPU consumption) by sending a large number of pipelined requests without reading the response.
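The attack works because HTTP/1.1 allows pipelining: a client can cram many requests into one write and simply never read the responses, so they pile up in the server's memory. A minimal sketch of the traffic shape only (it builds the byte payload and opens no connection; the host name and request count are illustrative):

```python
# Sketch of the DoS traffic shape: many pipelined HTTP/1.1 requests sent
# in a single write, with the client never reading the responses. This
# only constructs the payload; host and count are illustrative.
def pipelined_payload(host: str, count: int) -> bytes:
    request = (
        "GET / HTTP/1.1\r\n"
        f"Host: {host}\r\n"
        "\r\n"
    ).encode()
    return request * count  # one buffer carrying `count` requests

payload = pipelined_payload("victim.example", 10_000)
print(len(payload))  # a single TCP write carrying 10,000 requests
```

The fix in Node.js was to stop reading from the socket (apply backpressure) while responses are still queued, bounding the memory a single connection can consume.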
|Package(s):||mediawiki||CVE #(s):||CVE-2013-1816 CVE-2013-1817 CVE-2013-1818 CVE-2013-4304 CVE-2013-4305 CVE-2013-4306 CVE-2013-4307 CVE-2013-4308|
|Created:||October 29, 2013||Updated:||October 30, 2013|
|Description:||From the Gentoo advisory:
Multiple vulnerabilities have been discovered in MediaWiki.
A remote attacker may be able to execute arbitrary code, perform man-in-the-middle attacks, obtain sensitive information or perform cross-site scripting attacks.
|Created:||October 25, 2013||Updated:||November 4, 2013|
|Description:||From the CVE entry:
Unspecified vulnerability in Oracle MySQL Server 5.5.x through 5.5.32 and 5.6.x through 5.6.12 allows remote authenticated users to affect confidentiality and integrity via unknown vectors related to Replication.
|Created:||October 24, 2013||Updated:||October 30, 2013|
|Description:||From the CVE entry:
The "create an instance" API in OpenStack Compute (Nova) Folsom, Grizzly, and Havana does not properly enforce the os-flavor-access:is_public property, which allows remote authenticated users to boot arbitrary flavors by guessing the flavor id. NOTE: this issue is due to an incomplete fix for CVE-2013-2256.
|Created:||October 28, 2013||Updated:||November 21, 2013|
|Description:||From the CVE entry:
The make include files in NetBSD before 1.6.2, as used in pmake 1.111 and other products, allow local users to overwrite arbitrary files via a symlink attack on a /tmp/_depend##### temporary file, related to (1) bsd.lib.mk and (2) bsd.prog.mk.
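The attack depends on the temporary file name being predictable: the attacker pre-creates a symlink at the expected path, and the victim's write follows it to an arbitrary file. A minimal sketch of the defense, using O_EXCL so creation fails instead of following a planted link; the paths and file names are illustrative:

```python
# Sketch of the symlink attack and the O_EXCL defense: an attacker plants
# a symlink at a predictable name; opening with O_CREAT|O_EXCL refuses to
# reuse the existing path instead of following the link. Paths are
# illustrative.
import os
import tempfile

workdir = tempfile.mkdtemp()
victim = os.path.join(workdir, "_depend12345")     # predictable name
target = os.path.join(workdir, "important-file")

open(target, "w").close()
os.symlink(target, victim)      # the attacker plants a symlink first

try:
    fd = os.open(victim, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
    os.close(fd)
    created = True
except FileExistsError:
    created = False             # safe: creation failed rather than
                                # writing through the attacker's link
print(created)  # False
```

Using mkstemp() (which combines an unpredictable name with O_EXCL) avoids the problem entirely, which is essentially what the NetBSD fix does.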
|Package(s):||python-djblets||CVE #(s):||CVE-2013-4409 CVE-2013-4410 CVE-2013-4411|
|Created:||October 29, 2013||Updated:||October 30, 2013|
|Description:||From the Red Hat bugzilla:
[CVE-2013-4409]: Occasionally objects would be transmitted as a repr() of an object instead of a JSON serialization. In order to restore this representation to python code while parsing the JSON, Djblets would use the eval() routine to execute them, leading to a risk of executing arbitrary code in the Review Board process.
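The danger is that eval() executes whatever the string contains, while a real JSON parser only builds data. A minimal sketch of the contrast (the payload is illustrative; this is not Djblets' actual code):

```python
# Sketch of the unsafe pattern and its fix. Deserializing with eval()
# executes attacker-influenced input; json.loads() only constructs data.
import json

payload = '{"user": "alice", "id": 42}'

data = json.loads(payload)   # safe: parses the data, runs nothing
print(data["user"])          # alice

# The vulnerable pattern looked roughly like eval(payload) on strings
# produced by repr(). On attacker-influenced input, eval() can run
# arbitrary expressions, e.g. '__import__("os").system("...")'.
```

The Djblets fix was to stop producing repr() output in the first place, so the eval() fallback was never needed.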
[CVE-2013-4410]: Certain functions within Review Board's REST API were not properly validating authorization decisions against access-control lists.
If the attacker was aware of specific database table IDs, it was possible to gain access to restricted data. This vulnerability does not lead directly to a compromise of a machine or denial of service, but may expose sensitive information such as details about other embargoed security issues or confidential intellectual property.
[CVE-2013-4411]: A flaw in the Review Board dashboard URL-processing logic makes it possible for a user to construct a URL that would reveal review requests for review groups to which the user does not belong.
This flaw is only of particular risk to those deployments relying on review groups to restrict access to private reviews, such as those that may contain confidential intellectual property or provide information about embargoed security issues.
|Created:||October 28, 2013||Updated:||September 26, 2014|
|Description:||From the Mageia advisory:
It was found that in python-oauth2, an application for authorization flows for web applications, the nonce value generated isn't sufficiently random. While doing bulk operations the nonce might be repeated, so there is a chance of predictability. This could allow MITM attackers to conduct replay attacks.
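Since OAuth nonces exist precisely to defeat replay, they must be unpredictable and collision-free even in bulk. A minimal sketch of the usual fix pattern, drawing from the OS CSPRNG via the stdlib secrets module (this is the general remedy, not python-oauth2's actual patched code):

```python
# Sketch of unpredictable nonce generation: 128 bits from the operating
# system's CSPRNG per nonce, so bulk generation does not repeat or become
# predictable. This shows the fix pattern, not python-oauth2's code.
import secrets

def make_nonce() -> str:
    return secrets.token_hex(16)  # 16 random bytes -> 32 hex characters

nonces = {make_nonce() for _ in range(10_000)}
print(len(nonces))  # 10000 -- no repeats even when generated in bulk
```

With 128 bits of entropy per nonce, the chance of any collision among ten thousand draws is negligible, and an attacker observing past nonces learns nothing about future ones.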
|Created:||October 30, 2013||Updated:||May 18, 2015|
|Description:||From the Red Hat advisory:
A stack-based buffer overflow flaw was found in the way the reds_handle_ticket() function in the spice-server library handled decryption of ticket data provided by the client. A remote user able to initiate a SPICE connection to an application acting as a SPICE server could use this flaw to crash the application.
|Created:||October 28, 2013||Updated:||March 14, 2014|
|Description:||From the Debian advisory:
It was discovered that roundcube, a skinnable AJAX-based webmail solution for IMAP servers, does not properly sanitize the _session parameter in steps/utils/save_pref.inc when saving preferences. The vulnerability can be exploited to overwrite configuration settings, subsequently allowing arbitrary file access, manipulated SQL queries, and even code execution.
|Created:||October 28, 2013||Updated:||October 30, 2013|
|Description:||From the Red Hat bugzilla:
Saltstack, a client/server configuration system, was found to allow any minion to masquerade as any other agent when making requests to the master, which could permit a compromised server to request data belonging to another server, leading to a potential information leak.
|Created:||October 28, 2013||Updated:||November 21, 2013|
|Description:||From Vincent Danen's comment to the Red Hat bug report:
To summarize, scipy.weave will use /tmp/[username] as persistent storage (cache), but it does not check whether or not this directory already exists, does not check whether it is a directory or a symlink, and also does not verify permissions or ownership, which could allow someone to place code in this directory that would be executed as the user running scipy.weave.
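The missing step is validating a pre-existing directory before trusting it. A minimal POSIX sketch of the checks scipy.weave skipped; the function name and fallback behavior are illustrative, not the actual upstream fix:

```python
# Sketch of the checks scipy.weave skipped: before trusting a cache
# directory at a predictable path, verify it is a real directory (not a
# symlink), owned by us, and not writable by group/other. Otherwise fall
# back to a fresh private directory. POSIX-only, illustrative names.
import os
import stat
import tempfile

def safe_cache_dir(path: str) -> str:
    try:
        st = os.lstat(path)  # lstat: do not follow a planted symlink
        if (stat.S_ISDIR(st.st_mode)
                and st.st_uid == os.getuid()
                and not st.st_mode & (stat.S_IWGRP | stat.S_IWOTH)):
            return path      # directory exists and is safe to reuse
    except FileNotFoundError:
        pass
    return tempfile.mkdtemp(prefix="weave-cache-")  # private mode-0700 dir

cache = safe_cache_dir(os.path.join(tempfile.gettempdir(), "no-such-cache-xyz"))
print(os.path.isdir(cache))  # True
```

mkdtemp() creates the directory with mode 0700 and an unpredictable name, which closes both the pre-planted-directory and symlink variants of the attack.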
|Package(s):||tptest||CVE #(s):||CVE-2009-0650 CVE-2009-0659|
|Created:||October 28, 2013||Updated:||October 30, 2013|
|Description:||From the CVE entries:
Stack-based buffer overflow in the GetStatsFromLine function in TPTEST 3.1.7 and earlier, and possibly 5.02, allows remote attackers to cause a denial of service (application crash) and possibly execute arbitrary code via a STATS line with a long pwd field. NOTE: some of these details are obtained from third party information. (CVE-2009-0650)
Stack-based buffer overflow in the GetStatsFromLine function in TPTEST 3.1.7 allows remote attackers to have an unknown impact via a STATS line with a long email field. NOTE: the provenance of this information is unknown; the details are obtained solely from third party information. (CVE-2009-0659)
|Created:||October 28, 2013||Updated:||March 18, 2014|
|Description:||From the Gentoo advisory:
The setgid wrapper x2gosqlitewrapper.c does not hardcode the internal path to x2gosqlitewrapper.pl, allowing a remote attacker to change that path.
A remote attacker may be able to execute arbitrary code with the privileges of the user running the server process.
Page editor: Jake Edge
Brief items

3.12-rc7 was released on October 27. Linus says: "The KS week is over, and thus the seventh - and likely the last - rc for 3.12 is out, and I'm back on the normal Sunday schedule." He also warned that upcoming travel is likely to turn the 3.13 merge window into a relatively messy affair.
Stable updates: 3.2.52 was released on October 27.
I think we in the kernel should say "our defaults should be what everybody sane can use, and they should work fine on average". With "policy in user space" being for crazy people that do really odd things and can really spare the time to tune for their particular issue.
Kernel development news
Several minisummits were held on the first day. Naturally, your editor was only able to attend one of them:
Other minisummits held that day include:
The second day of the summit was attended by 70 or so invited developers. Topics covered this day include:
A larger group met for the final day of the kernel summit. Among the topics discussed there were:
[Your editor would like to thank the Linux Foundation for assistance with his travel to Edinburgh to attend the Kernel Summit].
Peter started by saying that we have a clear "don't break user space" policy. For the most part, living up to that policy is relatively straightforward; one avoids making incompatible changes to system calls and things continue to work. We are also getting better with other kernel interfaces like sysfs and /proc. But there was, he said, an interesting corner case last year: the GRUB2 bootloader was making a habit of looking at the kernel configuration files during installation for the setup of its menus. The restructuring of some internal kernel code broke GRUB2. At this point, Linus jumped in to claim that the kernel's configuration files do not constitute a part of the kernel's ABI. When somebody does something that stupid, he said, one really cannot blame the kernel.
Peter moved on to another problem, one he himself introduced some sixteen years ago. The automounter ABI had issues that kept it from working with a 32-bit user space on a 64-bit kernel. A new type of pipe had to be introduced to fix this problem; it was, he said, an example of how far we are willing to go to avoid breaking applications.
What about, he asked, cases where we need to shift to a new ABI altogether? Changes to the pseudo terminal (pty) interface are needed to get ptys to work from within containers; it's still not clear how to handle the master device in such situations. The control group interface is in flux, and there have been some disagreements with the systemd folks over who "owns" the container hierarchy as a whole. When it was suggested that systemd "wants to take over" control groups, Linus was quick to state that no such thing was going to happen. James Bottomley jumped in to note that the issue had been discussed and that a mutually acceptable solution was at hand.
Another ABI issue is the current limitation, built into the Linux virtual filesystem layer, that no single I/O operation can transfer more than 2GB of data. As systems and memory sizes get larger, that limit may eventually hurt, he said, but Linus said that this limit would not be lifted. We are, he said, better than OS X, which causes overly large I/O requests to fail outright; Linux, instead, just transfers the maximum allowed amount of data. There are huge security implications to allowing larger I/O operations, to the point that there is no excuse for removing the limit. A whole lot of potential problems will simply be avoided if filesystem code just never sees 64-bit I/O request sizes. And, he said, if you try to do a 4GB write, "you're a moron." Such requests will not be any faster, there is just no reason to do it.
In general, Linus said, he is fundamentally opposed to doing anything that might break user space; he was not sure why the topic was being discussed at all. The old issue of tracepoints came up, and Linus said that, if we break something with a tracepoint change, that is a problem and we will fix it. Greg Kroah-Hartman pointed out that some subsystem maintainers — himself and Al Viro, for example — are refusing to add tracepoints because they are afraid of being committed to supporting them forever. Others thought that this policy was excessively cautious, noting that actual problems with tracepoint ABI compatibility have been few and far between. No-tracepoints policies, Ingo Molnar said, are simply not justified.
What about changes to printk() calls that break scripts that grep through the system logs? Linus answered that printk() is not special, and that problems there will be fixed as well. Masami Hiramatsu suggested that the sort of string-oriented data found in the logs is relatively easy to work with, and changes are easy to adapt to, but that hints that, perhaps, users are just coping with problems there rather than complaining about them. It would be interesting to see what would happen if a user were to actually complain about broken scripts resulting from a printk() change. Linus closed things off by complaining that the kernel developers have spent far more time worrying about this problem than they ever have dealing with actual issues.
Miklos stepped up to ask more specifically: where is the boundary that sets the kernel ABI? Some parts of the operating system live in the kernel, while others can be found in user-space libraries and utilities. Sometimes things move; rename() was once implemented in the C library, but is now a system call provided by the kernel. NFS servers have been found on both sides of the divide, graphics drivers have been moving into the kernel, sound drivers have moved in both directions, and filesystems can be found on both sides.
Miklos may have been hoping for some sort of discussion of whether the interface between the kernel and some of these low-level components could be considered to be internal and not part of the kernel's ABI, but things didn't go in that direction. Instead, the discussion wandered a bit, covering whether parts of NetworkManager should be in the kernel (no, they would just have to call out to user space for authentication and such), drivers that require closed user-space components (still considered unwelcome), and the implementation of protocols like MTP, which, evidently, has more stuff in user space than should really be there.
[Next: Outreach Program for Women]
In the first round, the kernel applicants mostly sent patches for the staging tree; the result was 93 patches in the 3.12 kernel and a tutorial on how to participate in the program. It was, Sarah said, a successful beginning.
For those who want to help, the project is always in need of mentors for participants. Also needed are people who can hang out on the OPW IRC channel and answer basic questions as they come up, and "patient people" who can do pre-posting review of patches.
Dave Airlie asked how OPW projects were picked. The answer is that it starts with the mentors, who generally have specific areas in which they are comfortable helping new developers. After that, it's up to the applicants to suggest specific projects they would like to work on. In general, the most successful projects seem to be those that do not require a lot of subsystem-specific knowledge.
What is expected of mentors? Sarah said that the way she worked was to have weekly phone meetings with her intern, and that she spent three or four hours per week looking at patches, responding to questions, etc. All told, she said, it is a commitment of about eight hours per week.
Returning to how the kernel's first OPW experience went: for the most part, the participants "did pretty well." A couple of them have not yet completed their projects, but they are still working on them. At least six of them are looking for work in the kernel area. It was "pretty successful" overall. James Bottomley asked whether any of the interns are thinking about turning around and taking a turn as mentors; Sarah answered that some of them are helping out on the IRC channel, but none of them are ready to be full mentors yet.
Linus raised a general concern he had which, he said, had little to do with OPW specifically; it's something he has seen in other groups. There were, he said, a lot of trivial one-line patches from the OPW participants; the same fix applied to ten files showed up as ten separate patches. It makes the numbers look good, he said, but is not necessarily helpful otherwise. He is worried about people gaming the system to look good by having a lot of commits; Arnd Bergmann agreed that splitting things into too many patches tends to impede review.
From there, the conversation became a bit more unfocused. Ted Ts'o suggested that mentoring could be a good recruiting tool for companies that are forever struggling to hire enough kernel developers. There were complaints that three months is too short a period for an intern to really dig deeply into the kernel; it is, it was suggested, driven a little too much by the American university schedule. Dave added that universities in other parts of the world will often place students into this kind of project for longer periods. Mauro Carvalho Chehab suggested that it would be good to place a greater emphasis on places like Africa and South America where we have few developers now; Sarah agreed that interns can come from anywhere, we simply are not advertising well enough in those areas.
Ted asked what the limiting factor for the program was; funding seems to be the biggest issue. That will be especially true during the next cycle, when the Linux Foundation, which funded several interns the first time around, will not be able to participate. There is a need for more employers to kick in to support interns; the cost for one participant is $5,750. Information on how to sponsor OPW interns has been posted on the KernelNewbies site.
[Next: Control Groups]
Tejun started by reminding the group that the multiple hierarchy feature of cgroups, whereby processes can be placed in multiple, entirely different hierarchies, is going away. The unified hierarchy work is not entirely usable yet, though, because it requires that all controllers be enabled for the full hierarchy. Some controllers still are not hierarchical at all; they are being fixed over time. The behavior of controllers is being made more uniform as well.
One big change that has been decided upon recently is to make cgroup controllers work on a per-process basis; currently they apply per-thread instead. Among other things, that means that threads belonging to the same process can be placed in different control groups, leading to various headaches. Of all the controllers only the CPU controller has any business working with individual threads. For that case, some sort of special interface will be introduced that will, among other things, allow processes to set CPU policies for their own threads.
That interface, evidently, might be implemented with yet another special-purpose virtual filesystem. There was some concern about how the cgroup subsystem may be adding features that, essentially, constitute new system calls without review; there were also concerns about how the filesystem-based interface suffers from race conditions. Peter Zijlstra worried about how the new per-thread interface might look, saying that there were a lot of vague details that still need to be worked out. Linus wondered if it was really true that only the CPU controller needs to look at individual threads; some server users, he said have wanted per-thread control for other resources as well.
Linus also warned that it might not be possible to remove the old cgroup interface for at least ten years; as long as somebody is using it, it will need to be supported. Tejun seemed unworried about preserving the old interface for as long as it is needed. Part of Tejun's equanimity may come from a feeling that it will not actually be necessary to keep the old interface for that long; he said that even Google, which has complained about the unified hierarchy plans in the past, has admitted that it can probably make that move. So he doesn't see people needing the old interface for a long time.
In general, he said, the biggest use for multiple hierarchies has been to work around problems in non-hierarchical controllers; once those problems are fixed, there will be less need for that feature. But he still agrees that it will need to be maintained for some years, even though removal of multiple hierarchy support would simplify things a lot. Linus pointed out that, even if nobody is using multiple hierarchies currently, new kernels will still need to work on old distributions for a long time. Current users can be fixed, he said, but Fedora 16 cannot.
Hugh Dickins worried that, if the old interface is maintained, new users may emerge in the coming years. Should some sort of warning be added to tell those users to shift to the new ABI? James Bottomley said, to general agreement, that deprecation warnings just don't work; distributions just patch them out to avoid worrying their users. Tejun noted that new features will only be supported in the new ABI; that, hopefully, will provide sufficient incentive to use it. Hugh asked what would happen if somebody submitted a patch extending the old ABI; Tejun said that the bar for acceptance would be quite high in that case.
From the discussion, it was clear that numerous details are still in need of being worked out. Paul Turner said that there is a desire for a notification interface for cgroup hierarchy changes. That, he said, would allow a top-level controller to watch and, perhaps, intervene; he doesn't like that idea, since Google wants to be able to delegate subtrees to other processes. In general, there seems to be a lack of clarity about who will be in charge of the cgroup hierarchy as a whole; the systemd project has plans in that area, but that creates difficulties when, for example, a distribution is run from within a container. Evidently some sort of accord is in the works there, but there are other interesting questions, such as what happens when the new and old interfaces are used at the same time.
All told, there is a fair amount to be decided still. Meanwhile, Tejun said, the next concrete step is to fix the locking, which is currently too strongly tied to the internal locking of the virtual filesystem layer. After that is done, it should be possible to post a prototype showing how the new scheme will work. That posting may happen by the end of the year.
[Next: Linux-next and -stable].
Linux-next maintainer Stephen Rothwell attended the Kernel Summit as the last stop in a long journey away from the office — and from maintenance of the linux-next tree. That tree continued to function in his absence, the first time it has ever done so. Stephen's session covered the current state of this tree and how things could maybe be made to work a little better.
He started by thanking Thierry Reding and Mark Brown for keeping linux-next going during his break. Linux-next is a full-time job for him, so it has been hard to hand off in the past, but the substitute maintainership appears to have worked this time.
Stephen routinely looks at the code that flows into the mainline during merge windows to see how much of it had previously appeared in linux-next. For recent kernels, that figure has been approaching 90%; it would probably be difficult, he said, to do better than that. What might be improved a bit, though, is how long that code is in linux-next. In general, only 72% of the code that appears in linux-next is there one week before the merge window opens; that figure drops to less than 60% two weeks prior. So he tends to be busy in the last couple weeks of the development cycle, dealing with lots of merge conflicts and build failures. It leads to some long days. It would, he allowed, be nice if more of that code got into linux-next a bit earlier.
There are, he said, 181 active trees merged into linux-next every day. Some of those are second-level trees that, eventually, reach mainline by way of another subsystem maintainer's tree.
The worst problem he encounters is whole-tree changes that affect files across the kernel. Those create a lot of little messes that he must then try to clean up. It would be nice, he said, to find a better way to do things like API changes that require changes all over. Linus added that the reformatting of files to fix coding-style issues before applying small changes is "really nasty," leading to "conflicts from hell." It would be better to separate things like white-space changes from real work by a full release — or to not do the trivial changes at all. Ben Herrenschmidt suggested that white-space changes could be done at the end of a series, making them easy to drop, but Linus said that doesn't help, that the conflicts still come about. The best way to do white-space changes if they must be done, he said, is to do them to otherwise quiescent code.
Stephen said that he still sees more rebasing of trees than he would like; rebasing should be avoided whenever possible. Grant Likely asked about one of the common cases for rebasing: the addition of tags like Acked-by to patches that already appear in linux-next. Linus said that this is a bit of a gray area but that, in general, if those tags do not show up in a timely manner, it's usually best not to bother with them. Stephen added that the addition of tags to patches in a published git tree may be an indication that the tree has been published prematurely; such trees should not be fed into linux-next until they are deemed ready for merging.
James Bottomley pointed out that developers often publish git trees to get attention from Fengguang Wu's automated build-and-test system. But such trees do not need to be fed into linux-next to get that kind of testing; they can be put out on a separate testing branch. Ingo Molnar said that some tags, like Tested-by, can arrive fairly late; we really do not want to drop them and fail to credit our testers. Ted Ts'o added that employers often count things like Reviewed-by tags and that it is important to get them in.
Linus's response was that timeliness matters; a too-late review is essentially useless anyway. That said, he also said that some people do take the "no-rebase" policy a little bit too far. There are times when rebasing a tree can be justified, especially if other developers are not basing their own trees on the rebased tree. For example, when a patch turns out to be a bad idea, it can be better to make it disappear from the history and "pretend that all crap just didn't happen." Ben added that he asks his submitters to base their trees on the mainline; then he can rebase his tree and they will not be adversely affected by it.
Dave Jones complained about bugs which are claimed to be "fixed in linux-next," but those fixes then sit in linux-next for months until the next development cycle comes around. The right response to that problem, Andrew Morton said, was to "steal them" and forward them directly to Linus for immediate merging.
In general, Linus said, he has been using the linux-next tree a lot more than he used to because it is working a lot better. It has become a good indication of what will be coming in the next merge window. In general, the group agreed that linux-next is a valuable resource that has done a lot to make the development process work more smoothly.
Stable tree maintainer Greg Kroah-Hartman gave a quick update on the state of the various stable trees he runs. One of the biggest changes in stable tree maintenance, he said, is that he is starting to delay the inclusion of patches until they have appeared in one of Linus's -rc kernels; that delay will typically be one or two weeks. There have been a few incidents recently where "stable" patches have caused regressions; Greg hopes that, by inserting a small delay, he can flush out the problematic patches before shipping them in a stable release.
Among the other problems that Greg is trying to address is maintainers who never mark any patches for the stable tree. "We need to fix that," he said. There is also the problem Dave mentioned, where stable fixes live in linux-next for months; "don't do that." If a patch is tagged for stable, he said, that means it should go out soon, not languish for months. James said that sometimes he will hold stable fixes because he needs people to test them; he does not have every piece of SCSI hardware, and so cannot verify that every patch works as advertised.
Greg went on to say that it would be nice to have some way to automate the task of figuring out how far back a given patch needs to be backported. To that end, some developers are proposing the addition of a "Fixes:" tag to bug-fix patches. That tag would include the SHA hash of the commit that caused the original problem, along with that patch's subject line. Including the hash of the bad commit is better than just putting an initial version; it helps maintainers of non-mainline trees figure out if the fix applies to their version of the kernel or not.
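As an illustration of the proposed convention, such a trailer can be generated by git itself. The sketch below builds a throwaway repository (the commit subject and identities are purely illustrative, not from the discussion) and emits a "Fixes:" line for a hypothetical bad commit:

```shell
# Build a throwaway repository with one "bad" commit (purely illustrative).
set -e
cd "$(mktemp -d)"
git init -q
git -c user.name=demo -c user.email=demo@example.com \
    commit -q --allow-empty -m "driver: introduce a subtle bug"
bad=$(git rev-parse HEAD)
# Emit the proposed "Fixes:" trailer: abbreviated hash plus subject line.
git log -1 --abbrev=12 --format='Fixes: %h ("%s")' "$bad"
```

The resulting line would be pasted into the fix's commit message, giving stable maintainers enough information to decide how far back the fix should be backported.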
Linus jumped in to say that he would like everybody to run this command in their repositories:
git config core.abbrev 12
That causes git to abbreviate commit hashes to 12 characters. The default of seven characters is too small to prevent occasional hash collisions in the kernel; it was, he said, a big mistake made early in git's history. He also noted that he spends a lot of time fixing up hashes in patches, many of which are "clearly bogus." Most of the problems, most likely, are caused by the rebasing of trees.
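The effect of that setting is easy to see. The sketch below creates a throwaway repository and shows that, once `core.abbrev` is set, abbreviated hashes come out at 12 hex digits rather than git's 7-digit default:

```shell
# Throwaway repository to demonstrate the core.abbrev setting.
set -e
cd "$(mktemp -d)"
git init -q
git -c user.name=demo -c user.email=demo@example.com \
    commit -q --allow-empty -m "initial commit"
git config core.abbrev 12
# rev-parse --short honors core.abbrev, so this prints a 12-digit hash.
git rev-parse --short HEAD
```

Note that git will still lengthen an abbreviation beyond the configured value when needed to keep it unique within the repository; the setting is a minimum, not a cap.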
James asked: what should be done about patches that should have been marked for stable but, for whatever reason, did not get tagged? The answer was to send the relevant mainline git IDs to email@example.com; the rest will be taken care of.
Arnd Bergmann asked whether there had been complaints about the dropping of support for the 3.0.x series. Greg answered that people are mostly happy and not complaining. One person is considering picking up 3.0 maintenance, but Greg does not think that will happen.
Other problems that Greg mentioned included people reporting that things have broken in stable when the real problem is in the mainline. Those reports usually come, he said, from people who are not testing Linus's -rc releases. He also mentioned "certain distributors" who are not good about sending in the fixes they apply to their own kernels. Those fixes tend to be especially useful, since they were applied in response to a real problem that somebody encountered. If anybody wants to help out the stable process, he said, digging through distributor kernels for these fixes would be a useful thing to do.
As the session wound to a close, Greg was asked about what he does with patches that do not apply to older kernels. His response is that he will bounce them back to the maintainer for backporting. Subsystem maintainers, he said, need not worry about patches being tweaked on their way into -stable.
Dave's "Trinity" fuzz-testing tool has been around for some time, but the pace of development has increased in the last year or two. Dave introduced himself as the guy who "has broken lots of people's stuff" and who plans to continue doing so; Trinity, he said, is getting better and growing in scope. From the beginning, Trinity has tried to perform intelligent system-call fuzz testing by avoiding calls that will obviously get an EINVAL error from the kernel. So, for example, system calls expecting a file descriptor will get a file descriptor rather than a random number. Work is continuing in that direction; the idea is to get Trinity to do things that real programs would do.
One of the targets for the future is to add more subsystem-specific testing. There will also be more use of features like control groups. Among other things, these additional tests will require that Trinity be run as root — something that has been discouraged until now. He wants the ability to fuzz things that only root can do, he said, expressing confidence that there will be "all kinds of horrors" waiting to be found.
Dave was asked about using the fault injection framework for testing; he responded that, every time he tries, he feels like he is the first to use it. "Things blow up everywhere." Dave Airlie asked about fuzz-testing in 32-bit mode on a 64-bit kernel; the answer was that this mode was broken for a while, but should work now. When asked about testing user namespaces, Dave noted that a lot of problems have been found in that area. Trinity does not run within them now; it would be nice if somebody would submit a wrapper to make that work.
Ted Ts'o remarked on the difficulty of finding the real cause of a lot of Trinity-caused crashes. Quite a few of them, he suspects, are really the result of memory corruption left behind by a previous test; the place where the crash actually happens may have nothing to do with the real problem. Dave agreed that reproducibility is a problem. There is a lot that changes between runs, even after recent work that is careful to save random seeds so that the random number sequence used will be the same. It is, he said, "the number-one thing that sucks" about Trinity, but fixing it has proved to be far harder than he thought it would be.
Fengguang Wu has 63 Reported-by credits in the 3.12 kernel — over 12% of the total. These bug reports are the result of the extensive testing setup that he has been building; he ran a session at the Kernel Summit to describe his work.
Essentially, Fengguang's system works by pulling and merging a large number of git trees, building the resulting kernel, then booting it. There are a number of tests that are then run, looking for bugs and performance regressions. When a problem comes up, Fengguang's (large) systems can run up to 1000 KVM instances to quickly bisect the history and determine which patch caused the problem. The result is an automated email message, of which he sends about ten each day. Fengguang noted that a lot of developers send apologetic emails in response, but, he said, "it's a robot, you don't have to reply." Linus jibed that most of that mail was probably an automated "thank you" script run by Greg Kroah-Hartman.
Of the problems reported by Fengguang's system, about 10% are build errors, 20% are build warnings and documentation issues, 60% are generated by the sparse utility, and 10% come from static checkers like smatch and Coccinelle. The number of error reports going out has been dropping over time, he said; it seems that more developers are running their own tests before making their code public.
There were various questions, starting with: which compiler does he use? Fengguang said that it's gcc from the Debian "sid" distribution. Are any branches excluded from testing? Those which hold only ancient commits or which are based on old upstream releases are not tested; any branch that has "experimental" in its name will also not be tested. Otherwise, once Fengguang's system finds your repository, no branch will go untested. How does he find trees to test? Mostly from mailing lists and git logs; as Ted put it, "you can run, but you can't hide."
One of the more recent changes is the running of performance tests. These tests are time consuming, though; Fengguang would like more tests that can run quickly. The best performance tests, he said, have a --runtime flag to control how long they run; that leads to predictable behavior on both fast and slow systems. He also noted that both the size of the kernel and the time required to boot are increasing over time.
The session ended with general agreement in the room that this work is helpful and welcome.
[Next: Saying "no"].
Olof Johansson claimed that he and Arnd Bergmann have gotten pretty good at pushing crap out of the ARM tree. There are a lot of new subsystems with energetic new maintainers, which helps. The jury is still out, he said, but things are running well at this point.
Ted Ts'o asked for specific examples of code that should not have been merged. Christoph Hellwig responded that, once upon a time, it used to be hard to merge patches that bloated core kernel data structures; now everything just keeps growing. He also pointed to the various filesystem notification subsystems in the kernel, leading to a threat from Linus: he will kick anybody who tries to submit another notification scheme. Linus did allow that we are letting a lot of stuff through; he relies on the top-level maintainers to push back and that isn't always happening, despite the fact that he asks maintainers to say "no" more often.
Ted said that, ten years ago, patches went through a lot more review by developers other than the maintainer of the affected subsystem, but we just can't scale to that kind of review anymore. So maintainers are not paying much attention to what is happening outside of their own subsystems, and a lot of people have tuned out of the linux-kernel mailing list entirely. Ingo Molnar suggested that we are seeing some of the natural results of a distributed development model; the social structure, too, is more distributed. It was also pointed out that heavy criticism is frowned upon more than it used to be.
Christoph went on to mention the problems with O_TMPFILE, which he described as "a trainwreck." There are still critical bugs being fixed with that code. In general, he said, it often seems that code going into the kernel has been rushed and hasn't had time for proper review or testing. Dave Jones added that, sometimes, maintainers abuse Linus's trust and merge code that really is not ready to go in.
Is the model of having a single maintainer for a subsystem still appropriate? Christoph noted that having multiple maintainers seems to help to ensure adequate review of patches. Perhaps group maintenance should become more of the rule than the exception. Tejun Heo responded that we just don't have the manpower to have multiple maintainers for a lot of subsystems. He cited the workqueue code, which, he said, really needs somebody else with a deep understanding of what is going on there but which, for years, was looked at by nobody but him.
Andrew Morton said that he often struggles with the question of whether patches should be merged at all. He pointed to the "zstuff" — transcendent memory and related in-memory compression technologies; he kept pushing back against the pressure to merge that code, hoping for an explanation of why we needed it. In the end, a couple of distributors picked it up; that tipped the balance and made it hard to keep that code out. Peter Zijlstra described this as a sort of side-channel attack: distributors will carry almost anything if their users bother them enough. So, by harassing distributors enough, developers can get controversial code merged into the kernel.
What about patches that cause performance regressions? The sense in the room was that things have gotten a little bit better in that area. Some distributors are running more performance tests; this includes Mel Gorman's work at SUSE and the tests run by Martin Petersen at Oracle. Red Hat, too, has been pushed to run more tests, but the people involved apparently feel a little ignored at the moment.
Andrew came back in to repeat that he could really use more help in deciding which features should or shouldn't be merged. While being wary of the idea that every problem can be solved with a new mailing list, he thought that perhaps some sort of "graybeards@" list might be helpful for this kind of question. Linus agreed, suggesting that it should be a non-optional list and that all Kernel Summit attendees should be subscribed. If anybody complains loudly, they could be removed, and the community would know that they aren't interested in core issues and don't want to attend any more summits.
There was some agreement on the creation of this list, but it was also agreed that, somehow, the volume would need to be kept down. Linus suggested that it should not allow messages that are copied to any other list. There may also be a rule that no patches are posted to the list; anything posted there would include links to discussions elsewhere. Ted closed out the session by saying that he would go ahead and create the list and the members could work out the rules from there.
Konstantin Ryabitsev is one of the kernel.org administrators hired after the Linux Foundation took over responsibility for that site. One of the many services he is charged with managing is the bugzilla system. He came to the Kernel Summit to ask whether the community was happy with the service and how the kernel's bug tracker could be made more useful.
Rafael Wysocki responded that his group is using it to track ACPI and power management bugs; it works well and they are happy with it. Ted Ts'o said that he tends to use specific bugs as a sort of miniature mailing-list archive. He often forgets to close out the bugs when things are resolved; as a result, he's not particularly interested in getting status report messages out of the system.
There was a request for better ways of tracking patches posted in response to tracked bugs. One developer complained that he would like to be able to assign bugs to others, but the system won't allow that. It seems that the "edit bugs" permission is required; Konstantin said he is willing to give that permission to anybody who requests it, but it was also suggested that perhaps that permission should just be given to anybody who has a kernel.org account.
Russell King asked how he could get out of receiving mail for bugs in subsystems that he no longer maintains; Konstantin's response was "procmail." More seriously, though, it's just a matter of sending him a note and such issues will be taken care of.
Dave Jones noted that bugzilla is the second-worst bug tracker available, with the absolute worst being "everything else." He went on to say that Fedora developers are interested in using upstream bug trackers more and that, in particular, Fedora's kernel developers would like to track bugs in the kernel.org bugzilla. Beyond that, though, they would like to feed bugs from Fedora's "Automatic Bug Reporting Tool" directly into the system. Anybody who has ever had to wade through a morass of ABRT bugs in the Fedora tracker might be forgiven for being worried about this idea, but, Dave said, things have gotten much better and there aren't a whole lot of duplicate bugs anymore. So the assembled group was reasonably accepting of this idea, seeing it as a replacement for the much-lamented kerneloops.org service if nothing else.
Linus wanted the group to know that the 3.13 merge window is likely to be "chaos." He has a bunch of travel coming up, including an appearance at the Korea Linux Forum. So he will be traveling during the merge window, sometimes in places without any sort of reasonable network access.
Ted asked whether it would just make sense to delay the release of 3.12 until Linus is back. Linus responded that he could do that if people wanted, but that it really wouldn't make much difference in the end. There's going to be a period where he just isn't looking at pull requests, but, he emphasized, maintainers should still get their pull requests in early. Greg added that, during this time, urgent stable fixes could, contrary to normal policy, get into the stable tree prior to being merged into the mainline; he won't hold them while Linus is off the net.
Dave Airlie asked: who out there is pining for kerneloops.org? Anybody who would like to see better tracking of the problems being encountered by users might want to take a look at retrace.fedoraproject.org. It has plots of reports, information on specific problems, and more. It's a useful resource, he said, but he still wishes the kernel project had somebody tracking regressions.
Dave Jones talked about his work digging through the reports generated by Coverity's static analysis tool. During the 3.11 merge window, he said, the merging of the staging tree alone added 435 new issues to the database. He's been working on cleaning out the noise, but there are still over 5000 open issues there. A lot of them are false positives or failures on the tool's part to understand kernel idioms; that noise makes it hard to see the real bugs. He's trying to clean things up over time.
He has been given the permission to allow others to see the problem list; interested developers are encouraged to contact him. He is also able to see who else is looking at the problems. A few of them are kernel developers, he said, but most of the people looking at problem reports have no commits in the tree. Instead, they work for government agencies, defense contractors, and commercial exploit companies. The list "gave him the creeps," he said. Should those people be kicked out of the system? It might be possible, but they could then just buy their own Coverity license or run a tool like smatch instead, so there would be little value in doing so.
Wireless network drivers are the leading source of problem reports, followed by misc drivers, SCSI drivers, and network drivers as a whole. In general, he said, the "dark areas of the kernel" are the ones needing the most attention.
Mathieu Desnoyers asked whether people working on their own static analysis tools should be looking at Coverity's results. That was acknowledged to be a gray area. Coverity does not appear to be a litigious company, but people working in that area might still want to steer clear of Coverity's results.
Ted Ts'o closed out the day by asking: were developers happy with how the kernel summit went, and what would they like to see changed? The participants seemed generally happy, but there were a couple of complaints that the control group discussion was boring to many. That follows a longstanding Kernel Summit trend: highly technical subjects are generally considered to be better addressed in a more specialized setting.
Much of the discussion was related to colocation with other events. From a show of hands, only about 20% of the Kernel Summit attendees found their way over to LinuxCon, which was happening that same week. The separation of the venues (LinuxCon was a ten-minute walk away) certainly didn't help in that regard. In general, there were grumbles about how there was too much going on, or that the "cloud stuff" (CloudOpen was also running at the same time) is taking over.
Should the Kernel Summit be run as a standalone event? The consensus seemed to be that an entirely standalone summit doesn't work, but that, with enough minisummits, it might be able to go on its own. Should the summit be colocated with the Linux Plumbers Conference, and possibly away from everything else? One problem with that is that, like the summit, LPC is a high-intensity event; the two together make for a long week. Still, a straw poll indicated that most of the participants favored putting those two conferences together.
Eventually the discussion wound down. The group headed off for the group photo, followed by a nice dinner and the TAB election.
[Next: Minisummit reports].
Grant Likely reported from the two-day ARM minisummit held in Edinburgh. This gathering, he said, was "mostly boring"; for the most part, it was "normal engineering stuff". Grant said that it was nice not to have "big news items" to have to deal with. Notes are promised, but have not been posted as of this writing.
One of the items discussed was the status of the "single zImage" work, which aims to create a single binary kernel that can boot on a wide variety of ARM hardware. Work is progressing in this area, with support being added for a number of ARM processor types. For the curious, there is an online spreadsheet showing the current status of many ARM-based chipsets.
Some time went into the problem of systems with non-discoverable topologies; this is an especially vexing issue in the server area. There was some talk of trying to push the problem into the firmware, but the simple fact is that it is not possible to get the firmware to hide the issue on all systems.
As anybody who has been unlucky enough to be subscribed to the relevant mailing lists knows, the big issue at the 2013 gathering was the problems with the device tree transition. Grant gave an overview of the discussion as part of his report; more details on the device tree issue came out during a separate session later in the day.
The big problem with device trees is their status as part of the kernel's ABI. As an ABI element, device tree bindings should not change in incompatible ways, but that constraint creates a problem: as the developers learn more about the problem, they need to be able to evolve the device tree mechanism to match. That has led to a situation where driver development has been stalled; the need to create perfect, future-proof device tree bindings has caused work to be hung up in the review process. The number of new bindings is large, while the number of capable reviewers is small. The result is a logjam that is slowing development as a whole.
There is a plan to resolve some of those issues which was discussed later in the day. In this session, though, Grant raised the question: might device trees be a failed experiment? Should the kernel maybe switch to something else? The alternatives are few, however. The "board file" scheme used in the past has proved to not scale and is an impediment to the single zImage goal. ACPI has its own problems in the ARM space, even if it were to become universally available. One might contemplate the possibility of something completely new, but there are no proposals on the table now. It seems that we are stuck with device trees for now.
So the ARM developers plan to focus on making things work better in that area. That means that much of the work in the coming year will be aimed at improving processes rather than inventing interesting new technologies.
[Next: Git tree maintenance].
H. Peter Anvin is one of the maintainers of the "tip" tree, which takes its name from the first names of the group that manages it: Thomas Gleixner, Ingo Molnar, and Peter. This tree was started in 2007; it was initially focused on the x86 architecture tree, but has since expanded into other, mostly core-kernel areas. They made a lot of mistakes early on, Peter said, that caused Linus to "go very Finnish" on them, but things are working smoothly now.
There are three types of branches maintained in the tip tree. "Topic branches" contain patches that are intended to be pushed during the next merge window. "Urgent branches" contain bug fixes that need to go in before the merge window, while "queue branches" hold patches that will be pushed in some merge window after the next one. So, as of this writing, when the 3.12 development cycle is nearing its end, topic branches will hold changes for 3.13, while queue branches hold changes for 3.14 or later.
All of these branches are periodically integrated into the tip "master" branch; Peter described master as their version of linux-next. This merge is done by hand, usually by Ingo, who then feeds the result to his extensive testing setup.
Other tip practices include tip-bot, a program which sends out notifications when patches are added to a tip branch. Those notifications used to only go to the patch author, but they have since been expanded to include the linux-kernel list as well. Patches in tip routinely include a "Link:" tag pointing to the relevant mailing list discussion. There is a status board in the works, based on Fengguang Wu's testing setup.
Olof Johansson talked about the management of the arm-soc tree, which was started by Arnd Bergmann in July, 2011. Olof joined that effort later that year; more recently, Arnd has been on paternity leave, so Kevin Hilman has joined the team to help keep things going. This tree, which is focused on system-on-chip support for the ARM architecture, is run with no master branch. Instead, there is a large set of branches, mostly with a "next/" prefix for patches in a number of categories, including cleanups, non-urgent fixes, SoC support additions, board support, device tree changes, and driver changes. All of these branches are merged into a for-next branch which is then fed into linux-next.
All of these branches lead to a lot of merges — about 150 of them for each kernel development cycle. Olof said that newcomers tend to have a bit of a rough start as they figure out how the arm-soc tree works, but, after a while, things tend to run smoothly.
Olof mentioned a few "pain points" that the arm-soc maintainers have to live with. At the top of his list was the time period around when Linus releases -rc6; that's when a whole lot of new code comes in. It gets hard to pick a reasonable time to cut things off for the upcoming merge window. Having two levels of trees tends to add latency to the system, which doesn't help. There is also an ongoing stream of merge conflicts, both within arm-soc and with linux-next, and troubles with dependencies on external trees that get rebased by their maintainers.
Repeating a common lament, Olof said that the arm-soc maintainers are unable to keep up with the traffic on the ARM mailing lists. So they depend on the submaintainers to review patches and keep inappropriate changes out.
Arnd closed the session with a quick discussion of the process of moving most device drivers out of the ARM tree and into the regular kernel drivers tree. This work has caused a lot of merge conflicts, he said. But he expressed a hope that, once all the drivers are gone, there will be little need for a separate arm-soc tree and they will be able to stop maintaining it.
[Next: Scalability techniques]
Paul McKenney started with a discussion of memory barriers — processor instructions used to ensure that memory operations are carried out in a specific order. Normally, Paul said, memory barriers cannot be used by themselves; instead, they must be used in pairs. So, for example, a typical memory barrier usage would follow a scheme like this:
Paul noted that memory barriers can be expensive; might there be something cheaper? That may not be possible in an absolute sense, but there is a mechanism by which the cost can be shifted to one side of the operation: read-copy-update (RCU). RCU splits time into "grace periods"; any critical section that begins before the start of a grace period is guaranteed to have completed by the end of that grace period. Code that is concerned with concurrency can use RCU's synchronization calls to wait for a grace period to complete in the knowledge that all changes done within that grace period will be globally visible at the end.
Doing things in this way shifts the entire cost to the side making the synchronization call, which is sufficient in many situations. For the cases where it is not, one can use RCU callbacks, but that leads to some other interesting situations. But that was the subject of the next talk.
Josh Triplett took over to try to make the task of creating data structures that function properly with RCU a less-tricky task. The mental model for ordinary locking, he said, is relatively easy for most developers to understand. RCU is harder, with the result that most RCU-protected data structures are "cargo-culted." If the data structure looks something like a linked list, he said, it's pretty easy to figure out what is going on. Otherwise, the process is harder; he described it as "construct an ordering scenario where things go wrong, add a barrier to fix it, repeat, go insane."
There is a simpler way, he said. Developers should forget about trying to get a handle on overlapping operations, possible reordering of operations, etc., and just assume that a reader can run atomically between every pair of writes. That leads to a pair of relatively straightforward rules: if a pair of writes will be encountered by readers in the opposite order in which they are performed, a write memory barrier between them suffices; if readers will encounter the writes in the same order they are performed, the writer must instead wait out a grace period between them.
Those two rules, Josh contends, are all that is ever needed to create safe data structures protected by RCU.
Josh walked the group through a simple linked-list example. Suppose you have a simple singly linked list. If you want to insert a new item into the list without taking any locks, you would start by setting the "next" pointer within the new item to point at its successor-to-be in the list. Once that is done, the list itself can be modified to include the new item by pointing the predecessor's "next" pointer at it.
Any code traversing the list concurrently will either see the new item or it will not, but it will always see a correct list and not go off into the weeds — as long as the two pointer assignments described above are visible in the correct order. To ensure that, one should apply Josh's rules. Since these pointer assignments are done in the opposite order that a reader will use to traverse the list, all that is needed is a memory barrier between the writes and all will be well.
Removing an item from the list reverses the above process. First, the list is modified to route around the item to be taken out: the predecessor's "next" pointer is simply pointed at the doomed item's successor. Once it is certain that no threads are still using the to-be-removed item, its "next" link can be cleared and the item itself can be freed. In this case, the writes are happening in the same order that the reader would use, so it is necessary to use synchronize_rcu() between the two steps to guarantee that the doomed item is truly no longer in use before freeing it. It is also possible, of course, to just use call_rcu() to complete the job and free the item asynchronously after the end of the grace period.
Andi Kleen talked for a while about the use of transactional memory to eliminate the taking of locks in many situations; Andi described this technique in some detail in an LWN article last January. Lock elision, he said, is much simpler to work with than RCU and, if the conditions are right, it can also be faster.
Transactional memory, he said, is functionally the same as having an independent lock on each cache line in memory. It is based on speculative execution within CPUs, something that they have been doing for years; transactional memory just makes that speculation visible. This feature is rolling out on Intel processors now; it will be available throughout the server space within a year. There are a lot of potential uses for transactional memory, but he's restricting his work to lock elision in order to keep the existing kernel programming models.
With regard to which locks should be elided, Andi said that he prefers to just enable it for everything. It can be hard to predict which locks will elide well when the kernel runs. Ben Herrenschmidt complained that in some cases prediction is easy: overly large critical sections will always abort, forcing a fallback to regular locking. Memory-mapped I/O operations will also kill things.
Will Deacon asked whether the lock-elision code took any steps to ensure fairness among lock users. Andi replied that there is no need; lock elision only happens if there is no contention (and, thus, no fairness issue) for the lock. Otherwise things fall back to the regular locking code, which can implement fairness in the usual ways.
Linus said that, sometimes, lock elision can be slower than just taking the lock, but Andi disagreed. The only time when elision would be slower is if the transaction aborts and, in that case, there's contention and somebody would have blocked anyway. Linus pointed out that Intel still has not posted any performance numbers for lock elision within the kernel; he assumes that means that the numbers are bad. Andi did not address the lack of numbers directly, but he did say that elision allows developers to go back to coarser, faster locking.
He concluded by suggesting that, rather than add a bunch of hairy scalability code to the kernel, it might be better to wait a year and just use lock elision.
The final talk of the scalability session was given by Lai Jiangshan, who discussed the "sleepable" variant of the RCU mechanism. Normally, RCU critical sections run in atomic context and cannot sleep, but there are cases where a reader needs to block while holding an RCU read lock. There are also, evidently, situations where a separate RCU domain is useful, or where code is running on an offline CPU that does not take part in the grace-period mechanism.
SRCU was introduced in 2006; Paul McKenney documented it on LWN at that time. It turned out to be too slow, however, requiring a lot of expensive calls and a per-CPU counter wait for every critical section. So SRCU was reworked in 2012 by Paul and Lai. Updates can happen much more quickly now, with no synchronization calls required; it also has a new call_srcu() primitive.
There are about sixty users of SRCU in the 3.11 kernel, the biggest of which is the KVM hypervisor code. Lai provided an overview of the SRCU API, but it went quickly and it's doubtful that many in the audience picked up much of it. Consulting the code and the documentation in the kernel tree would be the best way to start working with the SRCU mechanism.
[Next: Device tree bindings]
Grant Likely and David Woodhouse started by reiterating the problems that led to the adoption of device trees for the ARM architecture in the first place. It comes down to undiscoverable hardware — hardware that does not describe itself to the CPU, and which thus cannot be enumerated automatically. This hardware is not just a problem with embedded ARM systems, Grant said; it is showing up in desktop and server systems too. In many situations, we are seeing the need for a general hardware description mechanism. The problem is coming up with the best way of doing this description while supporting systems that were shipped with intermediate device tree versions.
The solution comes down to a set of process changes, starting with a statement that device tree bindings are, indeed, considered to be stable by default. Once a binding has been included in a kernel release, developers should not break systems that are using that binding. That said, developers should not get hung up on creating perfect bindings now; we still do not know all of the common patterns and will need to make changes as we learn things. That means that bindings can, in fact, change after they have been released in a kernel; the key is to make those changes in the correct way.
Another decision that has been made is that configuration data is allowed within device tree bindings. This has been a controversial area; many developers feel that device trees should describe the hardware and nothing else. Grant made the claim that much configuration data should be considered part of the hardware design; there may be a region of memory intended for use as graphics buffers, for example.
There will be a staging-like mechanism for unstable bindings, but it is expected that this mechanism will almost never be used. The device tree developers will be producing a document describing the recommended best practices and processes around device trees; there will also be a set of validation tools. Much of this work, it is hoped, will be completed within the next year.
The current rule that device tree bindings must be documented will be reinforced. The documentation lives in Documentation/devicetree/bindings in the kernel tree. The device tree maintainers would prefer to see these documents posted as a separate patch within a series so they can find it quickly. Bindings should get an acknowledgment from the device tree maintainers, but there is already too much review work to be done in this area. So, if the device tree maintainers are slow in getting to a patch, subsystem maintainers are considered empowered to merge bindings without an ack. These changes should go through the usual subsystem tree.
The compatibility rules say that new kernels must work with older device trees. If changes are required, they should be put into new properties; the old ones can then be deprecated but not removed. New properties should be optional, so that device trees lacking those properties continue to work. The device tree developers will provide a set of guidelines for the creation of future-proof bindings.
If it becomes absolutely necessary to introduce an incompatible change, Grant said, the first step is that the developer must submit to the mandatory public flogging. After that, if need be, developers should come up with a new "compatible" string and start over, while, of course, still binding against the older string if that is all that is available. DTS files (which hold a complete device tree for a specific system) should contain either the new or the old compatible string, but never both.
If all else fails, it is still permissible to add quirks in the code for specific hardware. If this is done with care, it should not reproduce the old board file problem; such quirks should be relatively rare.
Ben Herrenschmidt worried about the unstable binding mechanism; it is inevitable, he thought, that manufacturers would ship hardware using unstable bindings. David replied that bad manufacturer behavior is not limited to bindings; they ship a lot of strange code as well. But, he said, manufacturers have learned over time that things go a lot easier if they work with upstream-supported code. He didn't think that the unstable binding mechanism would ever be used; it is a "political compromise" that should never need to be employed. Arnd Bergmann added that, should this ever happen, it will not be the end of the world; the kernel community just has to make the consequences of shipping unstable bindings clear. In such cases, users will just have to update the device tree in their hardware before they can install a newer kernel.
What about the reviewer bandwidth problem? The main change in this area, it seems, is that the device tree reviewers will only look at the binding documentation; they will not look at the driver code itself. That is part of why they want the documentation in a separate patch. That means that subsystem maintainers will have to be a bit more involved in ensuring that the code matches the documentation — though there will be some tools that will help in that area as well.
[Next: Checkpoint/restart in user space]
Pavel started with the history of this feature. Early attempts to add checkpoint/restart went with an entirely in-kernel approach. The resulting patch set was large and invasive; it looked like a maintenance burden and never got much acceptance from the broader development community. Eventually, some developers realized that the APIs provided by the kernel were nearly sufficient to allow the creation of a checkpoint/restore mechanism that ran almost entirely in user space. All that was needed was a few additions here and there; as of the 3.11 kernel, all of those additions have been merged and user-space checkpoint/restart works. Live migration is supported as well.
Pavel had some requests for developers designing kernel interfaces in the future. Whenever new resources are added to a process, he asked, please provide a call to query the current state. A classic example is timers; developers added interfaces to create and arm timers, but nothing to query them, so the checkpoint/restart developers had to fill that in. He also requested that any user-visible identifiers exposed by the kernel not be global; instead, they should be per-process identifiers like file descriptors. If identifiers must be global — he gave process IDs as an example — it will be necessary to create a namespace around them so that the same identifiers can be restored with a checkpointed process.
Now that the basic functionality works, some interesting new features are being worked on. One of these checkpoints all processes in the system, but keeps the contents of their memory in place. It then boots into a new kernel with kexec and restores the processes quickly, using the saved memory whenever possible. This, Pavel said, is the path toward a seamless kernel upgrade.
Andrew Morton expressed his amazement that all of this functionality works, especially given that the checkpoint/restore developers added very little in the way of new kernel code. Is there, he asked, anything that doesn't work? Pavel responded that they have tried a lot of stuff, including web servers, command-line utilities, huge high-performance computing applications, and more. Almost everything will checkpoint and restore just fine.
Andrew then refined his question: could you write an application that is not checkpointable? The answer is "yes"; the usual problem is the use of external resources that cannot be checkpointed. For example, Unix-domain sockets where one end is held by a process that is not being checkpointed will block things; syslog can apparently be a problem in this regard. Work is being done to solve this problem for a set of known services; the systemd folks want it, Pavel added. Unknown devices are another problematic resource; there is a library hook that can be used to add support for specific devices if their state can be obtained and restored.
Beyond that, though, this long-sought functionality seems to work at last.
[Next: A kernel.org update].
He gave a tour of the site's architecture, which your editor will not attempt to reproduce here. In general terms, there is an extensive backend system with a set of machines providing specific services and a large storage array; it is protected by a pair of firewall systems. The front end consists of a pair of servers, each of which runs two virtual machines; one of them handles git and dynamic content, while the other serves static content.
The front end systems are currently located in Palo Alto, CA and Portland, OR. One will be added in Seoul sometime around the middle of 2014, and another one in Beijing, which will only serve git trees, "soon." Work is also proceeding on the installation of a front end system in Montreal.
There is an elaborate process for accepting updates from developers and propagating them through the system. This mechanism has been sped up considerably in recent times; code pushed into kernel.org can be generally available in less than a minute. The developers in the session expressed their appreciation of this particular change.
Konstantin was asked about the nearly devastating git repository corruption problem experienced by the KDE project; what was kernel.org doing to avoid a similar issue? It comes down to using the storage array to take frequent snapshots and to keep them for a long period of time. In the end, the git repository is smaller than one might think (about 30GB), so keeping a lot of backups is a reasonable thing to do. There are also frequent git-fsck runs and other tests done to ensure that the repositories are in good shape.
With regard to account management, everybody who wants an account must appear in the kernel's web of trust. That means having a key signed by Linus, Ted Ts'o, or Peter Anvin, or by somebody who has such a key. Anybody who has an entry in the kernel MAINTAINERS file will automatically be approved for an account; anybody else must be explicitly approved by one of a small set of developers.
With regard to security, two-factor authentication is required for administrative access everywhere. All systems are running SELinux in enforcing mode — an idea which caused some in the audience to shudder. System logs are stored to a write-once medium. There is also an extensive alert system that calls out unusual activity; that leads to kernel.org users getting an occasional email asking about their recent activity on the site.
Plans for the next year include faster replication through the mirror network and an updated Bugzilla instance. Further out, there are plans for offsite backups, a git mirror in Europe, a new third-party security review, and the phasing out of the bzip2 compression format.
[Next: Security practices]
LWN covered that talk at the time, so there is no need to repeat that material here. The second half, instead, was a new talk on what a developer should do in response to a security-relevant bug. This talk, he said, was predicated on the assumption that kernel developers had made an ethical choice in favor of fixing flaws; otherwise their response may differ.
So what are the goals when dealing with a security fix? The wish, of course, is to get the fix out to end users as quickly as possible. If time is available, identifying the severity of the issue can be helpful, but that process is also error-prone. If the bug turns out to be serious, it is worthwhile to try to minimize the time that the public is exposed once the bug has been disclosed.
If a developer is unsure about the impact of a given bug, the best thing to do is to simply ask. Help is available in two places: the security@kernel.org list (which consists of a small number of kernel developers) and the linux-distros list, which is made up of representatives from distributors. Mail to the latter list must include the string "[vs]" in the subject line to get past the spam filters. Both lists are private. Members of those lists will attempt to handle serious bugs in a coordinated manner. For less serious issues, the best approach is usually to just take the problems directly to the relevant public list.
When possible, security-related fixes should be tagged for the stable tree; a "Fixes:" tag to identify the commit that introduced the problem is also helpful. If possible, the CVE number assigned to the bug should go into the commit changelog; numbers can be assigned by a number of vendors, or from the oss-security mailing list.
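Put together, the changelog of such a fix might end with trailers along these lines (the subject, commit ID, and CVE number here are made up for illustration):

```
frob: fix out-of-bounds read in frob_ioctl()

[...description of the problem and the fix...]

This is CVE-2013-NNNN.

Fixes: 0123456789ab ("frob: add ioctl support")
Cc: stable@vger.kernel.org
```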
It's worth noting that patience for embargoes is limited in the kernel community. Any problem sent to security@kernel.org can be kept under embargo for a maximum of five days; the limit on linux-distros is ten days. The whole point of the process is to get fixes out to users quickly; developers are sick of long delays in that regard.
For distributors and manufacturers who are concerned about getting security fixes, Kees had a simple piece of advice: don't bother with tracking CVE numbers. Instead, just pull the entire stable tree and ship everything in it. A lot of security problems will never have CVE numbers assigned to them; if you only take patches with CVEs, you'll miss a lot of important fixes.
At the end, Dave Jones jumped in to say that he would very much like to know about security bugs that the Trinity tool did not catch; that will help to refine the tests to catch similar problems in the future. Dan Carpenter expressed a similar wish with regard to the smatch utility. It will probably never be possible to find all security bugs automatically, but any progress in that direction seems like a good thing.
[Next: Lightning talks].
Ted Ts'o started things off with a discussion of the kernel's random number generators. There has been, he noted dryly, a significant increase in interest in the quality of the kernel's random numbers recently. His biggest concern in this area remains embedded devices that create cryptographic keys shortly after they boot; there may be little or no useful entropy in the system at that point. Some fixes have been added recently, including adding more entropy sources and mixing in system-specific information like MAC addresses, but that still may not be enough entropy to do a whole lot of good. That can be especially true on systems where the in-kernel get_cycles() function returns no useful information, a problem which was covered here in September.
MIPS was one of the architectures that had just that problem. Since MIPS chips are used in devices like home routers, this is a real concern. In that case, the developers were able to find a fine-grained counter that, while useless for timekeeping, can be used to add a bit of entropy to the random pool. A new interface has been added to allow architecture code to provide access to such counters. But the best solution, he said, was for vendors to put hardware random number generators on their chips.
Josh Triplett presented a proposal to get rid of the various "defconfig" files found in the kernel tree. These files are supposed to contain a complete, bootable configuration for a given architecture. He would like to move that information into the Kconfig files that define the configuration options themselves. There would be a new syntax saying whether a given option should be enabled by default whenever the default system config was requested by the user.
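No such syntax exists in Kconfig today; purely as an illustration of the shape of the proposal, it amounts to a per-option marker, something like:

```
config EXAMPLE_DRIVER
	bool "Example driver support"
	# hypothetical keyword: also enable this option when the
	# user asks for the default system configuration
	defconfig y
```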
Linus didn't like the idea, though, saying that it would clutter the Kconfig files and still not solve the problem. He also noted that most defconfig files are nearly useless; the x86 one, he said, is essentially a random configuration used by Andi Kleen several years ago. A lot of the relevant configuration settings are architecture-dependent, so it would be necessary to add architecture-specific selectors and such.
The plan at this point is to move further discussion to the mailing list, but, without some changes, this idea probably will not get too far.
Peter Senna talked briefly about the Coccinelle semantic analysis tool which, he said, is finding a few bugs in each kernel development cycle. He would like to add more test cases to the system; interested developers are directed toward coccinellery.org for examples of how to use this tool. (One could also see this LWN article for an introduction to Coccinelle). Dan Carpenter talked briefly about his smatch tool, which is also improving over time. His biggest goal at this point is to provide more user-friendly output; the warnings that come out of smatch now can be rather difficult to interpret.
The final talk was presented by Paul Walmsley; it covered automatic testing of ARM kernels. He is running a testing lab that builds and boots a number of trees, generating reports when things go wrong. Olof Johansson runs an elaborate testing setup; among other things, it performs fuzz testing with Trinity. There is also a 20-board testing array run by Kevin Hilman; he is doing power consumption tests as well.
These testing rigs, Paul said, are catching a lot of bugs, often before the relevant patches get very far. There is also a lot of work required to keep them going, though. Part of that problem may be related to the fact that the bisection of problems must be done manually; work is being done to automate that process.
After that there was a brief discussion of the Kernel Summit itself; some developers complained about communications, saying that they didn't always know about everything that was going on. There was also some discussion of the Linux Foundation Technical Advisory Board election held the night before, which was a somewhat noisy and chaotic affair. Thereafter, the group picture was taken and the developers headed out in search of dinner and beer.
Page editor: Jonathan Corbet
Along with desktop and server editions, the release of Ubuntu 13.10 on October 17 brought the "1.0" version of Ubuntu Touch for the Galaxy Nexus and Nexus 4 smartphones. Many improvements have been made since LWN.net last took a look at Ubuntu Touch this past February. Having used Touch on the Nexus 4 as my regular, everyday phone since the release date, this reviewer is surprised by the system's usability, while disappointed by several gaping holes of missing functionality and other noticeable rough edges. Please note that this review is of the image released on the Ubuntu 13.10 release date of October 17; while unstable images for the 14.04 development cycle are already available for download, this review does not cover them.
Flashing the image is ostensibly no more convoluted than installing a custom Android ROM on a Nexus device. Unfortunately, following the instructions on the Ubuntu website gave me a non-functioning device. Had I scrolled down past the installation steps on that web page, I would have found advice to wipe the /data partition and retry if there are installation problems. Once I finished doing that and added an option to the installation instructions specifying the Nexus 4 as the device to be flashed, I had a working Ubuntu Touch smartphone.
While Canonical does state that Ubuntu Touch is for "developers and industry partners only", and the page with flashing instructions begins with a disclaimer, Mark Shuttleworth claims that Ubuntu Touch has reached "1.0" status. This is accurate, as long as "1.0" here refers to the feature completeness of the core Ubuntu Touch system. Several rough edges do remain.
On first boot, a helpful tutorial explains how to navigate Ubuntu Touch's interface: swiping the left edge reveals the Unity launcher bar, swiping from the right edge brings up the last used application, and swiping from the notification bar reveals system information, depending on which icon the user swipes from. There are no soft keys, which I took great delight in; I've always thought that Android soft keys were a waste of precious screen real estate, and Ubuntu Touch takes full advantage of the Nexus 4's screen.
The home button at the bottom of the launcher bar brings up the Dash's home screen. On this screen, one will find a number of basic applications (including a dialer and messaging) as well as dedicated web apps (including Facebook and Gmail). The Dash also includes an Applications page, with a list of locally-installed applications and suggestions for third-party applications for download; a Videos page, featuring links to online and installed videos; and Music, featuring links to suggested music for purchase and download as well as installed music.
The Dash interface will be familiar to Unity users, particularly to those of Ubuntu Desktop 13.10. The built-in search function by default brings up local and online searches, analogous to the Smart Scopes tool of the 13.10 desktop. This can be disabled.
Overall usability and reliability has improved greatly since February. Gone are the placeholder applications, replaced with enough working applications for the device to be used as an everyday phone. Making and receiving phone calls, SMS, the music player, taking pictures from the front and back cameras, and the WebKit-based browser all work. Video files, at least .mp4 video files, play smoothly. The Weather app is well-designed. Shorts, an RSS reader, is clean; adding new feeds is easy. USB connections to an Ubuntu laptop work, allowing the device's filesystem to be mounted; this allows music and other files to be easily imported to the phone. With Google Maps mobile providing basic directions through the browser (although location detection does not work) and the included Dropping Letters word game being surprisingly compelling, an Ubuntu Touch user can successfully navigate with the device and then keep themselves entertained upon arrival.
On a more technical side, the Mir display server now powers the system; Android's SurfaceFlinger display server is nowhere to be found. That is a remarkable achievement, given that Mir was announced less than a year ago. Linux power users will be delighted to find a Bash shell pre-installed. Not based on BusyBox, this is a full shell that taps into the system's GNU utilities. Yes, grep and GNU nano (the latter's usefulness marred by the absence of a Control button on the keyboard) are available in Ubuntu Touch.
The top utility also works, giving a view of what's running on the device. Among other things, it shows that applications, once started, stay around until they are explicitly closed. While Android kills processes when available RAM gets tight, Ubuntu Touch appears to do no such thing; only closing apps manually brings back relatively smooth performance. This may help explain the sluggishness users encounter when multiple apps are running in the background.
Sluggishness is not the only usability issue here. There is no lockscreen capability to speak of, not even four-digit PIN protection. The notification bar contains multiple drop-down shades, each linked to a particular small icon that must be selected; reaching the right shade on the first try can be difficult. Brightness and Bluetooth settings are not retained across screen blanking and resuming, and the time zone setting is not remembered across a shutdown and restart. The latter is particularly irritating: I found the clock set four hours slow on every boot, requiring manual correction each time.
A cellular signal-strength indicator is a welcome addition to the notification bar, as is an icon for a functioning 3G mobile data connection, which was missing in the February build. Unfortunately, a dropped 3G data connection cannot be reestablished in the same session; a reboot is needed to reconnect, which reacquaints the user with the aforementioned time-zone irritant.
The ability to extend the system with a selection of downloadable open-source third-party apps is a welcome addition to Ubuntu Touch. After setting up an Ubuntu One account on the device, new apps are easy to find and easy to download ... the first time. Unfortunately, quality assurance seems to have been missing on this front. Some of the apps work fine: uTorch does light up the back camera's LED as expected, and the X-Type app, a glorified bookmark to an HTML5 top-down-shooter web game, provides some entertainment. But some of the apps simply do not work. Two drawing applications would launch but not accept drawing input, and the Saucy Bacon recipe application would not launch at all. Worst of all, after rebooting the system, I was unable to download any more apps. I kept receiving error messages about my Ubuntu One credentials, but nothing I tried (deleting and reentering my account information, creating and integrating a brand-new Ubuntu One account) worked. There appears to be no way to sideload Touch apps from an Ubuntu desktop either, so I was completely out of luck. That was very annoying, as I really wanted to play Snake.
Much of the first draft of this review was written in Touch's built-in Notes app. The state of Notes provides an excellent summary of what is excellent and what is disappointing about Ubuntu Touch. Like the other built-in applications, Notes keeps to the system's theme of simplicity, cleanliness, and elegance. As with most other apps, a simple swipe up from the bottom edge reveals the Notes menu. Tapping "Add" brings up a blank yellow note and a keyboard. Sadly, neither auto-completion nor spelling correction worked with the portrait keyboard. Tilting the device to landscape changed the display's orientation, including the keyboard's, but most of the keyboard in landscape mode would not accept input; only the portrait keyboard can be used with the device.
The Notes app experience reflects the rest of the system quite well. Enough functionality is available that Linux power users could accept Ubuntu Touch as the software driving their personal smartphones. But major bugs and gaps in the system are plentiful, which jars somewhat with Canonical's claim that the October 17 image is a "1.0" release.
Ubuntu Touch is a "1.0" to the extent that the UI works relatively smoothly, the pre-installed apps all work roughly as expected, and mobile data connections can be made. That is, it works as a basic smartphone. Everything else is hit-or-miss.
Ubuntu Touch continues to evolve rapidly. When I reflash for the 14.04 release, I anticipate that most if not all of my above criticisms will be addressed. I also look forward to news of the development of the expected desktop convergence capability. Ubuntu Touch shows a lot of promise, and I eagerly await future developments in this space.
Fedora's new working groups were covered here recently. Now most of those groups have named their members; announcements have been posted for the workstation, server, cloud, and base design groups. The workstation group has also posted a draft governance charter for review.
Red Hat Enterprise Linux: "We encourage customers to plan their migration from Red Hat Enterprise Linux 3 to a more recent version of Red Hat Enterprise Linux 5 or 6. As a benefit of the Red Hat subscription model, customers can use their active subscriptions to entitle any system on a currently supported Red Hat Enterprise Linux 5 release or Red Hat Enterprise Linux 6 release."
Newsletters and articles of interest
Page editor: Rebecca Sobol
At the 2013 GStreamer Conference in Edinburgh, Greg Maxwell from Xiph.org spoke about the creation of the Opus audio codec, and how that experience has informed the subsequent development process behind the Daala video codec. Maxwell, who is employed by Mozilla to work on multimedia codecs, highlighted in particular how working entirely in the open gives the project significant advantages over codecs created behind closed doors. Daala is still in flux, he said, but it has the potential to equal the impact that Opus has had on the audio encoding world.
Mozilla's support for Xiph.org's codec development comes as a surprise to some, Maxwell said, but it makes sense in light of Mozilla's concern for the evolution of the web. As multimedia codecs become more important to the web, they become more important to Mozilla. And they are important, he said: codecs are used by all multimedia applications, so every cost associated with them (such as royalties or license fees) is repeated a million-fold. On your person right now, he told the audience, "you carry with you four or five licenses for AAC." Codec licensing amounts to a billion-dollar tax on communication software. In addition, it is used as a weapon between battling competitors, so it affects even people in countries without software patents.
Moreover, codec licensing is inherently discriminatory. Many licenses come with user caps, so that an application below a certain number of users does not need to pay a fee—but many open source projects cannot even afford the cost of counting their users (if they are technically able to do so at all), much less handle a fee imposed suddenly if the limit is exceeded. In addition, he said, simply ignoring codec licensing (as some open source projects do) creates a risk of its own: license fees or lawsuits can appear at any time, and usually only when the codec licensor decides the project is successful enough to become a target.
"So here you have the codec guy here saying that the way to solve all this is with another codec," Maxwell joked. Nevertheless, he said, Xiph does believe it can change the game with its efforts. Creating good codecs is hard, but we don't really need that many. It is "weird competitive pressures" that led to the current state of affairs where there are so many codecs to choose from at once. High-quality free codecs can change that. The success of the internet shows that innovation happens when people don't have to ask permission or forgiveness. History also shows us that the best implementations of the patented codecs are usually the free software ones, he said, and it shows that where a royalty-free format is established, non-free codecs see no adoption. Consider JPEG, he said: there are patent-encumbered, proprietary image formats out there, like MrSID, "but who here has even heard of MrSID?" he asked the audience.
Unfortunately, he said, not everyone cares about the same issues; convincing the broader technology industry to get behind a free codec requires that the codec not just be better in one or two ways, but that it be better in almost every way.
The goal of being better at everything drove the development of the Opus codec. Opus is "designed for the Internet," he said, not just for one specific application. It mixes old and new technologies, but its most interesting feature is its extreme flexibility. Originally, there were two flavors of audio codec, Maxwell said: speech codecs optimized for low delay and limited bandwidth, and music codecs designed for delay-insensitive playback and lots of bandwidth.
But everyone really wants the benefits of all of those features together, and "we can now afford the best of both worlds." Opus represents the merger of work done by Xiph.org, Skype (now owned by Microsoft), Broadcom, and Mozilla, he said. It was developed in the open and published as IETF RFC 6716 in 2012. By tossing out many of the old design decisions and building flexibility into the codec itself, the Opus team created something that can adapt dynamically to any use case. He showed a chart listing the numerous codec choices of 2010: VoIP over the telephone network could use AMR-NB or Speex, wideband VoIP could use AMR-WB or Speex, low-bitrate music streaming could use HE-AAC or Vorbis, low-delay broadcast could use AAC-LD, and so on. In the 2012 chart that followed, Opus fills every slot.
Maxwell briefly talked about the design of Opus itself. It supports bit-rates from 6 to 510 kbps, frequency bands from 8 to 48 kHz, and frame sizes from 2.5ms to 60ms. Just as importantly, however, all of these properties can be dynamically changed within the audio stream itself (unlike older codecs), with very fine control. The codec merges two audio compression schemes: Skype's SILK codec, which was designed for speech, and Xiph.org's CELT, which was designed for low-delay audio. Opus is also "structured so that it is hard to implement wrong," he said. Its acoustic model is actually part of its design, rather than something that has to be considered by the application doing the encoding. The development process was iterative with many public releases, and employed massive automated testing totaling hundreds of thousands of hours. Among other things, Skype was able to test Opus by rolling it into its closed-source application releases, but the Mumble open source chat application was used as a testbed, too.
He then showed the results of several tests, run separately by Google and HydrogenAudio over a wide variety of samples; Opus tested better than all of the others in virtually every case. That quality has meant rapid adoption: Opus is already supported in millions of hardware devices and is mandatory to implement for WebRTC. The codec is available under a royalty-free license, he said, but one with a protective clause: the right to use Opus is revoked if one engages in any Opus-related patent litigation against any Opus user.
Moving on to video, Maxwell then looked briefly at the royalty-free video codecs available. Xiph.org created Theora in 1999/2000, he said, which was good at the time, but "there's only so far you can get by putting rockets on a pig." Google's VP8 is noticeably better than the encumbered H.264 Baseline Profile codec, but even at its release time the industry was raising the bar to H.264's High Profile. VP9 is better than H.264, he said, but it shares the same basic architecture—which is a problem all of its own. Even when a free codec does not infringe on an encumbered codec's patents, he said, using the same architecture makes it vulnerable to FUD, which can impede adoption. VP9 is also a single-vendor product, which makes some people uncomfortable regardless of the license.
"So let's take the strategy we used for Opus and apply it to video," Maxwell said. That means working in public with a recognized standards body that has a strong intellectual property (IP) disclosure policy, questioning the assumptions of older codec designs, and targeting use cases where flexibility is key. The project also decided to optimize for perceptual quality, he said, which differs from efforts like H.264 that measure success with metrics like peak signal-to-noise ratio (PSNR). A metrics-based approach lets companies push to get their own IP included in the standard, he said; companies have a vested interest in getting their patents into the standard so that they are not left out of the royalty payments.
Daala is the resulting codec effort. It is currently very much a pre-release project (in the audience Q&A portion of the session, Maxwell said that the team was aiming for a 2015 release date), but the work has already made significant progress. The proprietary High Efficiency Video Coding (HEVC) codec (the direct successor to H.264) is the primary area of industry interest now; it has already been published as a stand-alone ISO/ITU standard, although it has not yet made it into web standards or other such playing fields. Daala is targeting the as-yet-unfinished generation of codecs that will come after HEVC.
He listed several factors that differentiate Daala from HEVC and the H.264 family of codecs: multisymbol arithmetic coding, lapped transforms, frequency-domain intra-prediction, pyramid vector quantization, Chroma from Luma, and overlapping-block motion compensation (OBMC). Thankfully, he then took the time to explain the differences; first providing a brief look at lossy video compression in general, then describing how Daala takes new approaches at the various steps.
Video compression has four basic stages, he said: prediction, which examines the content to make predictions about the next data frame; transformation, which rearranges the data to make it more compressible; quantization, which computes the difference between the prediction and the data (and also throws out unnecessary detail); and entropy coding, which replaces the quantized error signal with something that can be more efficiently packed.
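As a toy illustration of how those stages fit together, here is a minimal DPCM-style sketch in Python (hypothetical code, not taken from any real codec): each sample is predicted from the previously reconstructed one, the prediction error is quantized, and the resulting small integers are what an entropy coder would then pack. The transform stage is omitted for brevity.

```python
# Toy sketch of the lossy-compression stages on a 1-D signal.
# Illustrative only: real video codecs add a block transform (DCT or
# lapped) before quantization and finish with arithmetic/range coding.

def compress(samples, step=4):
    """Predict, then quantize the prediction error (the lossy step).
    The returned small integers are what an entropy coder would pack."""
    quantized = []
    prediction = 0
    for s in samples:
        residual = s - prediction            # prediction stage
        q = round(residual / step)           # quantization stage (lossy)
        quantized.append(q)
        prediction += q * step               # track the decoder's view
    return quantized

def decompress(quantized, step=4):
    """Invert the pipeline; each output sample is within step/2 of its input."""
    signal = []
    prediction = 0
    for q in quantized:
        prediction += q * step
        signal.append(prediction)
    return signal

data = [10, 12, 13, 40, 41, 43, 42, 8]
rec = decompress(compress(data))
# each reconstructed sample is within step/2 = 2 of the original
```

Predicting from the *reconstructed* signal rather than the original is what keeps the quantization error from accumulating; the decoder always sees exactly what the encoder assumed it would.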
The entropy-coding step is historically done with arithmetic coding, he said, but most of the efficient ways to do arithmetic coding are patented, so the team looked into non-binary multisymbol range coding instead. As it turns out, using non-binary coding has some major benefits, such as the fact that it is inherently parallel, while arithmetic coding is inherently serial (and thus slow on embedded hardware). Simply plugging multisymbol coding into VP8 doubled that codec's performance, he said.
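To make "multisymbol" concrete, here is a toy arithmetic coder in Python that codes one whole symbol from a five-letter alphabet per step, rather than a string of binary decisions. It uses exact rationals for readability, and the frequencies are invented; a production range coder (Daala's included) works in fixed-precision integer arithmetic with renormalization.

```python
from fractions import Fraction

def build_intervals(freqs):
    """Assign each symbol a [low, high) slice of [0, 1) sized by its frequency."""
    total = sum(freqs.values())
    intervals, acc = {}, 0
    for sym, f in freqs.items():
        intervals[sym] = (Fraction(acc, total), Fraction(acc + f, total))
        acc += f
    return intervals

def encode(message, intervals):
    """Narrow [0, 1) once per symbol; any rational inside the final
    interval identifies the whole message."""
    low, high = Fraction(0), Fraction(1)
    for sym in message:
        span = high - low
        s_lo, s_hi = intervals[sym]
        low, high = low + span * s_lo, low + span * s_hi
    return (low + high) / 2

def decode(code, intervals, length):
    """Replay the narrowing: each step recovers one full symbol."""
    out = []
    low, high = Fraction(0), Fraction(1)
    for _ in range(length):
        span = high - low
        target = (code - low) / span
        for sym, (s_lo, s_hi) in intervals.items():
            if s_lo <= target < s_hi:
                out.append(sym)
                low, high = low + span * s_lo, low + span * s_hi
                break
    return "".join(out)

ivals = build_intervals({"a": 5, "b": 2, "r": 2, "c": 1, "d": 1})
code = encode("abracadabra", ivals)
# decode(code, ivals, 11) reproduces "abracadabra"
```

The parallelism argument follows from the structure: each multisymbol step is independent of how the alphabet would decompose into bits, whereas a binary arithmetic coder must serialize one bit decision after another.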
Lapped transforms are Daala's alternative to the discrete cosine transform (DCT) that HEVC and similar codecs rely on in the transformation step. DCTs start by breaking each frame into blocks (such as 8-by-8 pixel blocks), and those blocks result in the blocky visual artifacts seen whenever a movie is overcompressed. Lapped transforms add a pre-filter at the beginning of the process and a matched post-filter at the end, and the blocks of the transform overlap with (but are not identical to) the blocks used by the filters. That eliminates blocky artifacts, outperforms DCT, and even offers better compression than wavelets.
Daala also diverges from the beaten path when it comes to intra-prediction—the step of predicting the contents of a block based on its neighboring blocks, Maxwell said. Typical intra-prediction in DCT codecs uses the one-pixel border of neighboring blocks, which does not work well with textured areas of the frame and more importantly does not work with lapped transforms since they do not have blocks at all. But DCT codecs typically make their block predictions in the un-transformed pixel domain; the Daala team decided to try making predictions in the frequency domain instead, which turned out to be quite successful.
But for this new approach, they had to experiment to figure out what basis functions to use as the predictors. So they ran machine learning experiments on a variety of test images (using different initial conditions), and found an efficient set of functions by seeing where the learning algorithm converged. The results are marginally better than DCT, and could be improved with more work. But another interesting fact, he said, was that the frequency-domain predictors happen to work very well with patterned images (which makes sense when one thinks about it: patterns are periodic textures), where DCT does poorly.
Pyramid vector quantization is a technique that came from the Opus project, he said, and it is still in the experimental stage for Daala. The idea is that in the quantization step, encoding the "energy level" (i.e., magnitude) separately from details produces higher perceptual quality. This was clearly the case with Opus, but the team is still figuring out how to apply it to Daala. At the moment, pyramid vector quantization in Daala has the effect of making lower-quality regions of the frame look grainy (or noisy) rather than blurry. That sounds like a reasonable trade-off, Maxwell said, but more work is needed.
Similarly, the Chroma from Luma and OBMC techniques are also still in heavy development. Chroma from Luma is a way to take advantage of the fact that, in the YUV color space used for video, edges and other image features visible in the luma (brightness) channel almost always correspond to edges in the chroma (color) channels. The idea has been examined in DCT codecs before, but it is not used because it is computationally slow in the pixel domain. In Daala's frequency domain, however, it is quite fast.
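The correlation being exploited is easy to see with a toy pixel-domain fit (illustrative numbers only; Daala's actual Chroma from Luma prediction operates on frequency-domain coefficients): model a block's chroma as alpha * luma + beta and fit the two parameters by least squares, leaving only a small residual to code.

```python
def fit_cfl(luma, chroma):
    """Least-squares fit of chroma ~ alpha * luma + beta over one block."""
    n = len(luma)
    mean_l = sum(luma) / n
    mean_c = sum(chroma) / n
    var_l = sum((l - mean_l) ** 2 for l in luma)
    cov = sum((l - mean_l) * (c - mean_c) for l, c in zip(luma, chroma))
    alpha = cov / var_l
    beta = mean_c - alpha * mean_l
    return alpha, beta

# A made-up block whose color channel tracks the brightness edges
luma = [20, 20, 200, 200, 20, 200, 200, 20]
chroma = [0.5 * l + 16 for l in luma]
alpha, beta = fit_cfl(luma, chroma)
# alpha comes out near 0.5 and beta near 16, so the chroma block can be
# predicted from the already-decoded luma plus two small parameters
```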
OBMC is a way to predict frame contents based on motion in previous frames (also known as inter-prediction). Here again, the traditional method builds on a DCT codec's block structure, which Daala does not have. Instead, the project has been working with the motion-compensation technique used in the Dirac codec, which blends together predictions from several nearby motion vectors. On the plus side, this improves PSNR numbers; on the down side, it has negative side effects like blurring sharp features or introducing ghosting artifacts. It also risks introducing block artifacts, since, unlike Daala's other techniques, it is a block-based operation. To compensate, the project is working on adapting OBMC to variable block sizes; Dirac already does something to that effect, but inefficiently. Daala instead uses an adaptive subdivision technique, subdividing blocks only as needed.
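The blending at the heart of OBMC can be sketched in one dimension (a toy example; real OBMC uses 2-D window functions spanning whole overlapping blocks): two motion-compensated predictions of the same pixels are mixed with weights that ramp smoothly and always sum to one, so no hard seam, and hence no block edge, appears in the output.

```python
def obmc_blend(pred_a, pred_b):
    """Blend two overlapping predictions of the same pixel row.
    The weights ramp linearly and sum to 1 at every position."""
    n = len(pred_a)
    blended = []
    for i in range(n):
        w = i / (n - 1)        # 0.0 at the left edge, 1.0 at the right
        blended.append((1 - w) * pred_a[i] + w * pred_b[i])
    return blended

# Two candidate predictions (from different motion vectors) that disagree
row_a = [100] * 8
row_b = [140] * 8
blended = obmc_blend(row_a, row_b)
# the output ramps smoothly from 100 to 140 instead of jumping at a seam
```

A hard cut between the two predictions would reintroduce exactly the kind of block-boundary discontinuity that Daala's lapped transforms were designed to avoid; the overlapping weights are what prevent it.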
There are still plenty of unfinished pieces, Maxwell said, showing several techniques based on patent-unencumbered research papers that the project would like to follow up on. The area is "far from exhausted," and what Daala needs most is "more engineers and more Moore's Law." This is particularly true because right now the industry is distracted by figuring out what to do with HEVC and VP9. He invited anyone with an interest in the subject to join the project, and invited application developers to get involved, too. Opus benefited significantly by testing in applications, he said, and Daala would benefit as well. In response to another audience question, he added that Daala is not attempting to compete with VP9, but that Xiph.org has heard "murmurs" from Google that if it looks good, Daala might supplant VP10.
The codec development race is an ongoing battle, and perhaps it will never be finished, as multimedia capabilities continue to advance. Nevertheless, it is interesting to watch Xiph.org break out of the traditional DCT-based codec mold and aim for a generation (or more) beyond what everyone else is working on. On the other hand, perhaps codec technology will get good enough that (as with JPEG) additional compression gains cease to matter in a high-bandwidth future. In either case, having a royalty-free codec available will certainly be valuable.
[The author would like to thank the Linux Foundation for travel assistance to Edinburgh for GStreamer Conference 2013.]
Version 4.82 of the Exim mail transfer agent has been released. This update adds several new features, including experimental support for DMARC and PRDR, cutthrough delivery, and enhancements to header generation and authentication.
Version 1.0 of the MELT plugin for GCC has been released. This plugin supports GCC 4.7 and 4.8, and enables users to write GCC extensions in the Lisp-like MELT language. The release notes provide a taste of the plugin's capabilities for those unfamiliar: "MELT can also be used to explore the GCC (mostly middle-end) internal representations notably Gimple-s and Tree-s. With MELT you can for example, with a few arguments to gcc, find all calls to malloc with a constant size greater than 100 (and you need to do that *inside* the compiler, because that size could be constant folded and using sizeof etc....)."
Python 2.6.9 is now available. While this is a security-update-only release, it is noteworthy because it marks the final release of the Python 2.6.x series—and the final release coordinated by Barry Warsaw, who notes: "Over the 19 years I have been involved with Python, I've been honored, challenged, stressed, and immeasurably rewarded by managing several Python releases. I wish I could thank each of you individually for all the support you've generously given me in this role. You deserve most of the credit for all these great releases; I'm just a monkey pushing buttons. :)"

Cisco has announced the planned release of its H.264 video codec under the BSD license. "We plan to open-source our H.264 codec, and to provide it as a binary module that can be downloaded for free from the Internet. Cisco will not pass on our MPEG LA licensing costs for this module, and based on the current licensing environment, this will effectively make H.264 free for use in WebRTC." Mozilla has announced that it will incorporate this binary module into Firefox.
Newsletters and articles
Libre Graphics World takes a look at the recent merger of code to add a long-requested feature to GIMP: the ability to edit standard metadata formats like Exif, XMP, and IPTC. This feature is frequently demanded by professional users, although the article notes other useful effects as well. "Also, Commons Machinery team is quite interested in getting GIMP to support preservation of metadata in compound works of art. They already provided a similar patch for Inkscape regarding SVG. And now that GIMP can read XMP, it's possible to make this happen."
Page editor: Nathan Willis
Articles of interest

A blog post covers the first day of the 2013 Linaro Connect conference, with links to videos for Monday's sessions. (Thanks to Paul Wise.)
Calls for Presentations
|CFP deadline||Event dates||Event||Location|
|November 1||January 6||Sysadmin Miniconf at Linux.conf.au 2014||Perth, Australia|
|November 4||December 10||2013 Workshop on Spacecraft Flight Software||Pasadena, USA|
|November 15||March 18||FLOSS UK 'DEVOPS'||Brighton, England, UK|
|November 22||March 22||LibrePlanet 2014||Cambridge, MA, USA|
|November 24||December 13||SciPy India 2013||Bombay, India|
|December 1||February 7||devconf.cz||Brno, Czech Republic|
|December 1||March 6||Erlang SF Factory Bay Area 2014||San Francisco, CA, USA|
|December 2||January 17||QtDay Italy||Florence, Italy|
|December 3||February 21||conf.kde.in 2014||Gandhinagar, India|
|December 15||February 21||Southern California Linux Expo||Los Angeles, CA, USA|
If the CFP deadline for your event does not appear here, please tell us about it.
|15th Real Time Linux Workshop||Lugano, Switzerland|
|Linaro Connect USA 2013||Santa Clara, CA, USA|
|PostgreSQL Conference Europe 2013||Dublin, Ireland|
|27th Large Installation System Administration Conference||Washington DC, USA|
|OpenStack Summit||Hong Kong, Hong Kong|
|2013 LLVM Developers' Meeting||San Francisco, CA, USA|
|November 8||PGConf.DE 2013||Oberhausen, Germany|
|November 8||CentOS Dojo and Community Day||Madrid, Spain|
|FSCONS 2013||Göteborg, Sweden|
|Mini DebConf Taiwan 2013||Taipei, Taiwan|
|Korea Linux Forum||Seoul, South Korea|
|Mini-DebConf UK||Cambridge, UK|
|Linux Informationstage Oldenburg||Oldenburg, Germany|
|openSUSE Summit 2013||Lake Buena Vista, FL, USA|
|Supercomputing||Denver, CO, USA|
|2013 Linux Symposium||Ottawa, Canada|
|Python Conference Spain 2013||Madrid, Spain|
|November 25||Firebird Tour: Prague||Prague, Czech Republic|
|November 28||Puppet Camp||Munich, Germany|
|OpenPhoenux Hardware and Software Workshop||Munich, Germany|
|December 6||CentOS Dojo||Austin, TX, USA|
|2013 Workshop on Spacecraft Flight Software||Pasadena, USA|
|SciPy India 2013||Bombay, India|
|30th Chaos Communication Congress||Hamburg, Germany|
If your event does not appear here, please tell us about it.
Page editor: Rebecca Sobol
Copyright © 2013, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds