LWN.net Weekly Edition for May 26, 2016

The value of drive-through contributions

By Nathan Willis
May 25, 2016

OSCON

The conventional viewpoint among open-source projects is that drive-through contributors (people who make one pull request, patch, or other contribution and are never seen again) are problematic. At best, one would prefer to lure the contributor back, eventually cultivating them into a regular project participant. At worst, they can be seen as a disruption, taking up developers' time for work that may, ultimately, lead nowhere. At OSCON 2016 in Austin, Texas, however, Vicky "VM" Brasseur from HP Enterprise presented an alternative viewpoint. Drive-through contributors are a good sign of a healthy project, she said, and optimizing the project to meet drive-through contributors' needs benefits contributors of every stripe.

[VM Brasseur]

Brasseur noted at the outset that the desire to capture drive-through contributors and convert them into regulars was a good instinct, but said it was simply out-of-scope for her presentation. Instead, she wanted to explore what motivates drive-through contributors and see how projects can best make use of them. For the purposes of the talk, she explained, a "contribution" included anything that could be kept in version control, be it code, documentation, artwork, or anything else.

There are four major categories of drive-through contributor, she said: self-service contributors, work-related contributors, special-project contributors, and documentation fixers. Self-service contributors are those needing to get the project working for some other, larger purpose; perhaps they fix a bug that affects them, so they submit a patch for that fix and never return. Work-related contributors are similar, except that they are working with the open-source project at their day job, and have no other attachment to it: they will submit patches needed to get the job done, but do not get invested otherwise. Special projects include people enabling obscure hardware ports or supporting peculiar configurations; they care enough to get the project code working for that new scenario, but that is all. And documentation fixers are rather self-evident: they see a missing section or typo, send a fix, and consider the job done.

For projects, the interesting questions are why the contributor shows up at all and why they disappear. Many drive-through contributors tend to show up in the first place just to scratch their own itch (the self-service and special-projects contributor categories in particular). But there are other possibilities, she said: the contributor may have no choice but to use the project, even though they do not care for it, for instance. An example would be a developer who prefers PostgreSQL, but who is working on a MySQL project at their day job. When the job is complete, they will likely return to PostgreSQL. And, in some cases, drive-through contributors submit a patch simply because they like you: they notice something wrong and send a fix because they care about open source.

On the flip side, these contributors depart for a handful of reasons worth reflecting on, too. When the itch is scratched or the work project is finished, they may simply move on. But it is also common for contributors to start with good intentions and simply run out of time; "life happens," she reminded the audience. Drive-through contributors also sometimes move on because they find another project that suits them better. That, too, comes with the territory. Yet there are a few reasons for a drive-through contributor's departure that should worry the project: when contributors find it too difficult to work with the project, when they feel like their contribution was not appreciated, and when some project member treated them poorly.

The latter case is the "asshole problem," she said; it is a common criticism of open-source projects, but it is even more pointed when drive-through contributors are involved. Too many developers treat drive-through contributors with hostility, calling them a "waste of time" or words to that effect. These developers tend to claim that drive-through contributors are time sinks and that they are not "part of the community and never will be."

[Project pyramids]

But that represents a misunderstanding, Brasseur said, of how "community" and "contributors" relate. Often, projects talk about community as the base of a pyramid, with contributors a smaller level above that and "core" developers at the top. In reality, she said, the contributor base (including drive-through contributors) is larger than the project community; it is the foundation of the project pyramid—as illustrated in the diagram to the left. "You can't build a community with no contributors—they come first."

Consequently, she said, an increasing number of drive-through contributors should be seen by the project as a positive sign. Conversely, a project with few drive-through contributions may not be as healthy as it thinks it is. Lots of drive-throughs means that more people are seeing the project, that more people are using the software, and that the process for making a contribution is easy and working well. Therefore, more people will make contributions (of every type) and the community can develop. Increasing the number of drive-through contributions means more bugs are located and fixed, more documentation is written, test coverage is expanded, and releases can be faster. Furthermore, the project's reputation is likely to improve as well, with it being seen as friendlier and more accessible. Thus, while there are a lot of ways to measure "project health," she said, growing the number of drive-through contributions improves almost every metric.

She then turned to providing some advice on how to make drive-through contributions easier. About half of the methods revolved around documentation, she said. Better documentation cuts down on questions by providing potential contributors with the necessary information up front, and it standardizes processes across the project.

In particular, Brasseur recommended writing a "quick start" guide that offers a high-level summary of the contribution process, plus an in-depth "how to contribute" document that addresses the steps in detail: how to format patches, how to submit any contributor agreements required, how merges are approved, and so on. These documents minimize the number of "how do I start" and "what do I do next" questions that project members will have to field. She also recommended documenting the project's communication routes (i.e., who to contact on various topics) and a code of conduct. "Writing one won't kill you," she said about the latter. "It just shows people that you give a damn about them."

A few other documents worth creating include a "who's who" that explains leadership roles, subject-matter experts, and any sub-teams within the project, a "rules and processes" document, a jargon file, and a project roadmap. The "rules and processes" document should explain how someone becomes a core contributor, which can be quite inspirational for new contributors to see, as well as various bylaws and governance structures. The project roadmap helps new contributors by explaining the release schedule, the planned features, and what it will take to get a patch into a particular release.

Beyond documentation, she outlined several other methods for improving the contribution experience. Project members can mentor new contributors by doing code reviews, holding IRC "office hours" to answer questions, and by holding hackfests open to the public (perhaps even hackfests specifically geared toward new contributors). Projects can do things to improve their processes as well, she said. Suggestions include tagging "starter" bugs, providing pre-built containers or virtual-machine images of the development environment, and having a public "service level agreement" (SLA) for contributions. The SLA, she explained, means making a pledge that (for example) "we will look at each patch and respond to it within five days." That encourages newcomers by telling them that their effort will not be overlooked, and it sets expectations.

On a larger scale, she said, projects would be wise to cultivate a culture that values contributions and contributors. They can make sure that all contributions are credited in the release notes and Contributors file, they can "default to assistance" when they encounter a new contributor, and they can place a high value on documentation. "It is much easier to document as-you-go than it is to tackle a long list of documentation all at once."

Projects also need to create and enforce a "no assholes" rule, she said. "There's talk in the world about the unicorn 10X developer," she said. "But I don't care how many X's they have; if they act like an asshole they're bringing everybody down." Fortunately, she said, the majority of the time, people who treat others poorly and with hostility are not doing so intentionally—they only know that the rest of the project lets them continue acting the way they do. Most of the time, telling them why there is a problem and what they are expected to do next time is sufficient.

Finally, Brasseur advised projects to engage in outreach to contributors. They should express gratitude for contributions (including drive-through contributions), recognize each contributor, and follow up after the fact. Follow-up may include asking the contributor how their experience was, if there is anything about the process that could be improved, and (in the case of drive-through contributors) why they left. To be certain, not every drive-through contributor will—or even can—be cultivated into a regular project member. But, Brasseur said in closing, "all of the steps you take to maximize drive-through contributions also lead to a healthier project overall."

The session ended with a few questions from the audience; one person asked for examples of large projects that do a good job at the sort of documentation discussed in the talk. Brasseur replied that OpenStack does well in this regard, as do many Apache projects and the Django project. Another audience member asked how to encourage the drive-through contributors who leave for lack of time to reconsider. Brasseur echoed what someone else in the audience offered as a reply: perhaps the best thing a project can do is feel grateful that the drive-through contributor, even when busy, took some of their time to stop and make a contribution.

Employment agreements for free-software developers

By Nathan Willis
May 25, 2016

OSCON

At OSCON 2016 in Austin, Texas, Karen Sandler of the Software Freedom Conservancy (SFC) spoke about an issue that impacts an ever-growing number of free-software developers: employment agreements. As the number of paid contributors to free-software projects grows, so do the complications: copyright assignment, licensing, patents, and many other issues may be codified in an employment agreement, and a developer who fails to consider the implications of an agreement's conditions may be in for an unpleasant surprise years down the road.

[Karen Sandler]

Sandler kicked off the session by acknowledging that reading through agreements and contracts is boring stuff. "It's such a drag, I know. Legal stuff is boring, and this is boring even for me. But we have to do it." You only get one chance to sign your employment agreement, she said, but even if you only plan to stay for a year, the terms and conditions included can affect you and your ability to work on free software for many years to come. That is because an employment agreement not only establishes the relationship between the employee and the employer, but it establishes how the employee will make their contributions to free-software projects. For developers who care about their contribution to the free-software community, the details of the agreement can be significant.

We live in an age where we are constantly having to agree to more and more terms-and-conditions documents, Sandler said, to the point where no one has sufficient time to read them all. Employment agreements are different, though: you are not a consumer when you sign one, you are an employee. And your agreement is unique to you; it is not a blanket set of terms-and-conditions. Even if you sign a boilerplate agreement identical to everyone else's in the company, that class of "employees" is much smaller than the class of "consumers" or "customers" addressed by a public click-through license.

You therefore have—and should use—the power to negotiate with the company for what matters to you. Immediately after signing, a power shift occurs, but at the end of the hiring and interview process, the job candidate is in the best possible position to ask for changes to the agreement. Far too many developers never ask for any changes to their agreement, Sandler said, either assuming them to be non-negotiable documents or presuming that everyone in the company has the same agreement. Neither assumption is true; companies "ask for the world" up front, because it is in their best interest, but the clauses and conditions in an employment agreement are almost always malleable and should be just as much a part of the negotiation process as compensation.

Thus, she said, you are not "being paranoid" to read through a potential agreement. The best move is to have a lawyer review the agreement, but at the very least, educating yourself about the potential issues can enable you to spot areas of concern.

The first step is to evaluate your priorities, Sandler said. For many free-software developers, those priorities may be the licensing and patenting of code developed as part of the job, but the goal is for the potential hire to determine what is crucial and what is negotiable, then to examine the agreement. It is also vital, she said, to review all documents related to the potential job, because they may interact with each other. Other documents like contributor license agreements (CLAs) or copyright assignment documents may not be part of the employment agreement itself, but can make a big impact.

Provisions

Sandler then walked the audience through a list of key provisions to look out for. First and foremost is probably the licensing of the software created on the job. The assumption for the OSCON crowd is that the employee will be working on free software and, therefore, the agreement will state that the employee's work will be released under a free-software license, but Sandler reminded the audience that this needs to be in writing to be enforceable. A "general understanding" that the job will entail working only on free software is insufficient—what happens, she asked, if a new manager comes in, the original project is canceled, or if the whole company is acquired?

Furthermore, many free-software developers hired on at companies are hired to work on existing projects (perhaps even projects that they already contribute to). So it is important to verify that the licenses described meet the project community's expectations. Companies new to free software may have a misunderstanding about what licenses are acceptable. Some agreements might be drafted by someone who does not even realize that the employee is coming on board to work on free software, and default to an "everything is proprietary" clause.

A related concern is who owns the copyright on the employee's code. In proprietary software shops, this is not an issue, but more and more free-software developers are demanding that they personally retain their copyrights, she said. Contributions to free-software projects are important to building one's reputation and to accumulating a body of work to show other employers. In an era when many people change employers several times while working on the same code base, she said, it can impede a developer's career for an employer to be the copyright holder for some (or all) of the developer's software.

The scope of employment is another major provision to look out for, Sandler said. She referred the audience to a plot line in HBO's sitcom Silicon Valley that hinged on a company's draconian employment agreements claiming the rights to everything the employees create. In many jurisdictions, such clauses are regarded as unenforceable, but free-software developers may have a harder time establishing which jurisdiction they work in. They may work remotely, or even move around regularly—in which case the local laws, in addition to the jurisdiction expressed in the agreement, can come into play.

It is also vital to many free-software developers to establish that they can contribute to outside projects in their free time. "Exclusivity" clauses are therefore problematic. Furthermore, many employees may accept a given salary with the expectation that they will be allowed to consult on the side or do other freelance work; if the agreement claims exclusivity, this could be disallowed.

Along those lines, employment agreements should also be clear on the status of pre-existing code. If an exclusivity clause makes it impossible for the employee to fix bugs on software they have already written, it could be detrimental.

Free-software developers will also want the nature of public communications to be clearly defined. Because contributions are expected to be made in the open, the agreement should clarify when developers speak or post as themselves and when their communication is deemed to speak for the company. If the employer requires that blog posts, tweets, or emails must be approved by the company in advance, Sandler said, it is a good idea to ask what that process is like.

Patents are another serious topic to consider, since so many in the free-software community have an ideological objection to software patents. Sometimes there are hard-to-miss red flags, such as a "patent wall" at the entrance to the building, but the issue can be subtler. It is vital for concerned developers to ensure that the employment agreement is clear about whether they would be required to file for patents on inventions, or would be encouraged to do so through bonuses and promotions. If the company has a patent portfolio, the employee may also want to ask about its patent-licensing policy. Participation in something like the Open Invention Network (OIN) may be a positive sign; she cautioned, though, that OIN's patent pool does not cover every patent and does not preclude other negative outcomes, such as patents being sold to third parties like patent trolls.

Last on the list, Sandler advised looking out for "non-compete" clauses, particularly those that bar "conflicting employment" and could prohibit the employee from working in the field that they consider their key skill set. Employees should push back, noting that an agreement that bars them from working as a software developer later could be ruinous to their career. Here again, companies almost uniformly ask for expansive terms in the agreement, but there is almost always room for bargaining.

Asking questions

It never hurts to ask questions or to ask for changes in an employment agreement, Sandler said. Typically there is room for compromise; she told the audience that she has never worked on an employment agreement negotiation where no changes were made. An audience member asked about non-compete clauses, which are often vaguely worded and broad. Sandler replied that it is worth asking "what is it that you really want to prevent?" It may be that the company is only worried about losing employees to some specific competitor, in which case it might be worth renegotiating the clause to be more specific.

Just as there is no harm in asking the company "am I understanding this clause correctly?," Sandler said, there is no harm in asking for enough time to review the agreement in detail. Even if a company asks for an answer the same day, it will likely provide a few days if the developer indicates a desire to read over the provisions in detail. After all, once a job offer is made, the company has decided that it wants to hire the candidate.

She also told the audience to be sure that they keep track of employment agreements after signing them—even after they have moved on to another job, some provisions could still come into play. There have been developers, she said, who raised questions about copyright assignment long after the work was done, and not being able to produce a copy of the employment agreement as evidence can be a serious problem.

Although it is best to renegotiate the clauses of interest (or to get "riders" attached to the employment agreement), she said, sometimes it might not be possible to have changes made, and the informal agreement between the new hire and a manager about the potential job may be all that there is. In that case, she said, if the manager in question actually has the authority to make decisions about copyright assignment, licensing, and so forth, the best thing for the employee to do is to write down the understandings agreed upon and get a confirmation, in writing or email, from the manager that the two parties are on the same page. It may not have the same weight as a formal contract, but it is better than memory alone.

Moving forward

Employment agreements are still a rarely discussed topic in the free-software world but, as Sandler pointed out, the growth of commercial investments in free-software projects is making them more and more important. She ended the session with two tidbits about where matters may head in the future.

First, SFC is working on creating a set of resources for developers and companies to use when crafting employment agreements. Though not ready for publication yet, the concept is to publish a suite of "standard" clauses covering provisions of interest to the free-software world. They would make it easier for developers to propose the changes that they want in agreements, and could even be useful for companies in the long term. If the standard clauses became popular references, a company could specify that it offers "clauses two, five, six, and nine" unambiguously (a bit like the way Creative Commons licenses have standardized certain copyright clauses).

Finally, Sandler told the attendees that "we can create a culture shift" by actively pushing for the provisions that matter in our employment agreements. If even ten percent of developers asked to retain the copyrights on the software they create, Silicon Valley will take notice, she said. Allowing developers to hold their own copyrights might not become the default position in employment agreements, but companies will recognize that it has value and will begin offering it as a benefit. Software is a competitive business, she said; free-software developers have the ability to influence it by raising the ideological issues they care about when negotiating with new employers.

File-format analysis tools for archivists

May 25, 2016

This article was contributed by Gary McGath

Preserving files for the long term isn't as easy as just putting them on a drive. As xkcd points out, in its subtle way, some other issues are involved. Will the software of the future be able to read the files of today without losing information? If it can, will people be able to tell what those files contain and where they came from?

Digital archives and libraries store files for future generations, just as physical ones store books, photographs, and art; the digital institutions have a similar responsibility for the preservation of electronic documents. In a way, digital data is more problematic, since file formats change more quickly than human languages. On the other hand, effective use of metadata lets a file carry its history with it.

For these reasons, detailed characterization of files is important. The file command just isn't enough, so developers have created a variety of open-source tools to check the quality of documents going into archives. These tools analyze files, reporting those that are outright broken or might cause problems, and showing how forthcoming or reticent the files are about describing themselves. We can break the concerns down into several issues:

  • Exact format identification: Knowing the MIME type isn't enough. The version can make a difference in software compatibility, and formats come in different "profiles," or restrictions of the format for special purposes. For instance, PDF/A is a profile of PDF that requires a file to have certain structural features but no external dependencies. PDF/A is better for archiving (which is what the "A" stands for) than most other PDF files.
  • Format durability: Software that can read any given format fades into obsolescence if there isn't enough interest to keep it updated. Which formats will fare best is a guessing game, but open and widely known formats are a safer bet than proprietary or obscure ones.
  • Strict validation: Many software projects follow Postel's Law: "Be liberal in what you accept and conservative in what you send." Archiving software, though, stands on both sides of the fence. It accepts files in order to give them to an audience that doesn't even exist yet. This means it should be conservative in what it accepts.
  • Metadata extraction: A file with a lot of identifying metadata, such as XMP or Exif, is a better candidate for an archive than one with very little. An archive adds a lot of value if it makes rich, searchable metadata available.

A number of open-source applications address these concerns, some of which we will look at below. Most of them come from software developers in the library and preservation communities. Some focus on a small number of formats in intense detail; others cover lots of formats but generally don't go as deep. Some just identify files, while others pull out metadata.

JHOVE

[JHOVE output]

JHOVE (JSTOR-Harvard Object Validation Environment) is the most demanding and obsessive of the lot. It covers a small number of formats in a nitpicking way, which is useful for making sure that software in the future won't have problems. It examines files exhaustively, analyzing them for validity, identifying versions and profiles, and pulling out lots of metadata. I worked on it for a decade, joining the project at the Harvard University Libraries in 2003, writing the bulk of the code, and continuing to support it after I left Harvard. It's now in the hands of the Open Preservation Foundation, which has just released version 1.14.

JHOVE is written in Java and is available under the GNU LGPL license (v2.1 or later). It includes modules for fifteen formats, including image, audio, text-based, and PDF formats. New in version 1.14 (and not yet listed in the documentation) are PNG, GZIP, and WARC.

Each module does extensive analysis on files, looking for any violations of the specification. A file that conforms to the syntactic requirements is considered "well-formed." If it also meets the semantic requirements, it's "valid." For instance, an XML file is well-formed when its tags are all properly matched and nested, etc., and it's valid when it matches its schema, if any.
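The two-level distinction can be sketched in a few lines of Python. This is only an illustration of the idea, not JHOVE's code: the function name classify_xml and the "required child elements" rule are hypothetical, and the latter is a toy stand-in for checking a document against a real schema.

```python
import xml.etree.ElementTree as ET


def classify_xml(text, required_children):
    """Classify an XML string as not well-formed, well-formed, or valid.

    "Valid" here only means that the root element has every required
    child tag; that is a toy rule standing in for real schema validation.
    """
    try:
        root = ET.fromstring(text)
    except ET.ParseError:
        # Mismatched or badly nested tags fail at the syntactic level.
        return "not well-formed"
    present = {child.tag for child in root}
    if all(tag in present for tag in required_children):
        return "well-formed and valid"
    return "well-formed"
```

A document with a missing closing tag lands in the first category, while one that parses but lacks a required element is well-formed without being valid.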

The fallback format is "Bytestream," which is just a stream of bytes, in other words, any file. In the default configuration, JHOVE applies all of its modules against a file and reports the first one to declare it well-formed and valid. If no other module matches, it reports that the file is a Bytestream. It's also possible to run JHOVE to apply just a single module, for the format that a file is supposed to be. This is useful with defective files, since it will report how they aren't well-formed or valid. That's more helpful than simply declaring them Bytestreams.
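The dispatch behavior described above might look roughly like the following Python sketch. It is not taken from JHOVE: the module labels "XML" and "UTF-8" and the two checker functions are hypothetical, and each checker does far less work than a real module would.

```python
import xml.etree.ElementTree as ET


def is_xml(data):
    """Hypothetical checker: accept data that parses as XML."""
    try:
        ET.fromstring(data)
        return True
    except ET.ParseError:
        return False


def is_utf8(data):
    """Hypothetical checker: accept data that decodes as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# Order matters: the first module to accept the data wins.
MODULES = [("XML", is_xml), ("UTF-8", is_utf8)]


def identify(data, only=None):
    """Try every module in order, or just the one named by `only`."""
    if only is not None:
        check = dict(MODULES)[only]
        return only if check(data) else only + ": not well-formed or not valid"
    for name, check in MODULES:
        if check(data):
            return name
    # Nothing matched, so the data is reported as a plain stream of bytes.
    return "Bytestream"
```

Passing only="XML" for a file that is supposed to be XML yields a report about that format, rather than a fall-through to "Bytestream".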

If a file is valid, JHOVE will report the version of the format, any profiles that it satisfies, and lots of file metadata. The output can be in plain text or XML. The GUI version shows its output as an expandable outline and can save it as text or XML.

To examine a known TIFF file and get output in XML, the command might be:

    jhove -m TIFF-hul -h xml example.tif

Other Java applications can call JHOVE through its API.

JHOVE is strict, but it isn't designed to examine the data streams in a file, only the file's structure. For instance, in an LZW-compressed TIFF file, it will check that all the tags are well-formed, including StripOffsets and StripByteCounts, but it won't check that the actual strips (i.e., the compressed pixel data) are well-formed LZW data. Thus, JHOVE will catch subtle errors, but it won't find all defects.
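To make the structure-versus-data distinction concrete, here is a minimal Python sketch of a purely structural check. It is an illustration written for this article, not JHOVE's implementation; the function names are hypothetical, and it reads only the first image file directory (IFD) and only SHORT or LONG values. It confirms that each strip lies inside the file, but it never looks at the compressed bytes themselves.

```python
import struct

STRIP_OFFSETS, STRIP_BYTE_COUNTS = 273, 279   # TIFF 6.0 tag numbers
TYPE_FORMATS = {3: ("H", 2), 4: ("I", 4)}     # SHORT and LONG field types


def read_strip_table(data):
    """Return (offsets, byte_counts) from the first IFD of TIFF bytes."""
    if len(data) < 8 or data[:2] not in (b"II", b"MM"):
        raise ValueError("missing TIFF byte-order mark")
    endian = "<" if data[:2] == b"II" else ">"
    magic, ifd_offset = struct.unpack(endian + "HI", data[2:8])
    if magic != 42:
        raise ValueError("bad TIFF magic number")
    (entry_count,) = struct.unpack(endian + "H", data[ifd_offset:ifd_offset + 2])
    found = {}
    for i in range(entry_count):
        start = ifd_offset + 2 + 12 * i          # each IFD entry is 12 bytes
        tag, ftype, count = struct.unpack(endian + "HHI", data[start:start + 8])
        if tag not in (STRIP_OFFSETS, STRIP_BYTE_COUNTS):
            continue
        if ftype not in TYPE_FORMATS:
            raise ValueError("unexpected field type for strip tag")
        code, size = TYPE_FORMATS[ftype]
        if count * size <= 4:
            # Small values are stored inline in the entry's last four bytes.
            raw = data[start + 8:start + 8 + count * size]
        else:
            (offset,) = struct.unpack(endian + "I", data[start + 8:start + 12])
            raw = data[offset:offset + count * size]
        found[tag] = struct.unpack(endian + code * count, raw)
    return found.get(STRIP_OFFSETS, ()), found.get(STRIP_BYTE_COUNTS, ())


def strips_in_bounds(data):
    """Structural check only: every strip must fit inside the file."""
    offsets, counts = read_strip_table(data)
    if not offsets or len(offsets) != len(counts):
        return False
    return all(off + n <= len(data) for off, n in zip(offsets, counts))
```

A file can pass this kind of check and still contain strips that fail to decompress, which is exactly the class of defect that a structure-only validator will not report.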

DROID and PRONOM

Archivists often have large batches of files to process and need a big picture of what they have: how many in each format, how many risky files, changes in format usage by year or month, how heavily older format versions are being used, and so on. This is where DROID shows its strength. It's available from the UK National Archives under the three-clause BSD license. Its main purpose is to screen and identify files as they're being ingested into an archive. It works with the National Archives' PRONOM database of formats, identifying files on the basis of their signature or "magic number."

[DROID GUI]

In this regard, it's similar to file, but it performs finer-grained distinctions among formats. For example, within the TIFF format, PRONOM distinguishes the Digital Negative or DNG, which is a universal raw camera format based on TIFF, TIFF-FX for fax images, and Exif files, which are TIFF metadata without an image.

DROID is good at processing large batches of files. Analyzing them involves two steps. First the user "profiles" a set of files, collecting information on them into a single document. From the command line, the user can specify filters telling DROID which files to profile. Unfortunately, the filter language is difficult to figure out, and the documentation isn't as helpful as it might be, but fortunately there's a Google group where people can answer questions. The second step is to generate a report. One command can do both of these. Here's a relatively simple example with a filter that accepts only PDF files and generates a report as a CSV file.

    droid.sh -p "result1.droid" -e "result1.csv" -F "file_ext contains 'pdf'"

Running DROID as a GUI application is easier. In this case, profile creation and report generation are separate steps.

DROID doesn't do much validation or metadata extraction, but it's strong on identifying the format of a file by looking at its signature. This is valuable when processing a large number of files for an archive and weeding out the files that aren't in suitable formats.
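Signature-based identification itself is simple to sketch. The Python below is a minimal illustration, not DROID's algorithm, and the small table is a hand-picked set of well-known magic numbers rather than anything drawn from PRONOM, whose signatures are far more numerous and can match at other positions in a file. The function names are hypothetical.

```python
# A few well-known leading byte sequences and the formats they indicate.
SIGNATURES = [
    (b"\x89PNG\r\n\x1a\n", "PNG"),
    (b"%PDF-", "PDF"),
    (b"GIF87a", "GIF"),
    (b"GIF89a", "GIF"),
    (b"II*\x00", "TIFF (little-endian)"),
    (b"MM\x00*", "TIFF (big-endian)"),
]


def identify_by_signature(data):
    """Return a format name if the data starts with a known signature."""
    for magic, name in SIGNATURES:
        if data.startswith(magic):
            return name
    return None


def identify_file(path):
    """Read only the first few bytes of a file and identify them."""
    with open(path, "rb") as handle:
        return identify_by_signature(handle.read(16))
```

Note that the file's extension plays no part here, which is why this approach can flag files whose names misrepresent their contents.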

ExifTool

Phil Harvey's ExifTool has a different focus. Its specialty is fiddling with metadata and, in spite of its name, it knows about lots of metadata types, not just Exif. It can modify as well as view files, and it's adept at tricks like assigning an author to a group of files or fixing a timestamp that's in the wrong time zone. Its main interest for archivists is its ability to grab and report the metadata in files.

It's aware mostly, but not exclusively, of audio, image, and video formats. It does simple signature-based format identification, along with just enough validation to identify the metadata in a file. ExifTool is available under the Perl license.

It's a versatile piece of software with extensive scripting capabilities. Perl applications can use it through Image::ExifTool. Other code can use its command-line interface as an API, using the -@ and -stay_open options to feed it commands through standard input or an argument file. In addition, a library wraps the command-line interface for use in C++ programs.

ExifTool treats all file properties and metadata as "tags." A command can request specific tags or tag groups. The following command will return a file's type, MIME type, and usual format extension:

    exiftool -filetype -mimetype -filetypeextension sample.png

The output for this would be as follows, assuming it's really a PNG file:

    File Type                       : PNG
    MIME Type                       : image/png
    File Type Extension             : png

A variety of export options are available, including HTML, RDF XML, JSON, and plain text. Output can be sorted, and some tags have formatting options.
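For scripts that consume the plain-text output shown above, a small parser is enough to turn it into a dictionary. This is only a sketch: the function name is made up for the example, and it assumes that each line has the form "Tag Name : value" with no colon in the tag name.

```python
def parse_exiftool_text(output):
    """Parse 'Tag Name : value' lines into a dict (hypothetical helper)."""
    tags = {}
    for line in output.splitlines():
        # Split on the first colon only, since values such as timestamps
        # may contain colons of their own.
        name, sep, value = line.partition(":")
        if sep:
            tags[name.strip()] = value.strip()
    return tags


# The sample output from the article, reproduced as a string.
SAMPLE_OUTPUT = """\
File Type                       : PNG
MIME Type                       : image/png
File Type Extension             : png
"""

if __name__ == "__main__":
    print(parse_exiftool_text(SAMPLE_OUTPUT))
```

For anything beyond quick scripts, the JSON export is likely a more robust choice than parsing the human-readable layout.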

Putting it all together: FITS

What if you want a second opinion on a file? Maybe even a third or fourth?

There are lots of free-software tools for file identification and metadata extraction, and space doesn't allow discussing all of them here. Others include MediaInfo, which extracts metadata from audio and video files, the National Library of New Zealand (NLNZ) Metadata Extraction tool, which specializes in a few archive-friendly formats, and Apache Tika, which extracts metadata from over a thousand formats.

All of these applications report different information, and they don't always agree with each other. Some produce more fine-grained identification than others, and some are fussier than others about whether a file is valid. It's desirable to use more than one tool, in case one of them doesn't handle certain cases well. The Harvard Library's File Information Tool Set (FITS) allows using a dozen different tools together.

FITS originally served as a gatekeeper for Harvard's Digital Repository Service (DRS), and it still does. Other institutions now use it too. I worked only briefly on FITS, but my efforts played a significant role in moving it from a Harvard-only tool to one with a larger user and support community. It is available under the LGPLv3.

DROID, ExifTool, and JHOVE are all parts of the repertoire of FITS. So are Tika, file, MediaInfo, the NLNZ Metadata Extractor, an unsupported but still sometimes useful tool called ffident, and several in-house tools.

For all its complexity, running FITS is fairly simple. Here's the most basic useful command, which processes the given file with all of the different modules:

    fits -i sample.png

Combining all the tools is tricky for several reasons. They're written in different languages; FITS is in Java, and it invokes non-Java software such as ExifTool through the command-line interface. Their output is in a variety of formats and each tool uses its own terminology.

Where the component tools can produce XML, FITS uses XSLT to convert it to "FITS XML," and then consolidates the outputs into a single XML file. Optionally, it will convert FITS XML to metadata schemas that archives and libraries commonly use, such as MIX, TextMD, and AES Audio Object.

Often the tools won't completely agree about the file, and FITS tries to do conflict resolution. The identification section of the FITS XML output lists the tools that identified the file; if they disagree, it will have the attribute status=CONFLICT. Those who just want one answer can select an ordering preference for the tools and set the conflict reporting configuration element to false. The first tool to give an answer wins.
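A consumer of the output can check for conflicts itself. The Python sketch below looks for the status=CONFLICT attribute and then picks an answer using a caller-supplied tool ordering, mimicking the "first tool wins" idea on the client side rather than through FITS configuration. The embedded XML is a hand-written, hypothetical sample; the element and attribute names other than status, and the tool names and versions, are assumptions about the output layout.

```python
import xml.etree.ElementTree as ET

# Hypothetical, hand-written sample; not captured from a real FITS run.
SAMPLE_FITS_XML = """<fits xmlns="http://hul.harvard.edu/ois/xml/ns/fits/fits_output">
  <identification status="CONFLICT">
    <identity format="Portable Network Graphics" mimetype="image/png">
      <tool toolname="Droid" toolversion="6.1.5"/>
      <tool toolname="Exiftool" toolversion="10.00"/>
    </identity>
    <identity format="Unknown Binary" mimetype="application/octet-stream">
      <tool toolname="Jhove" toolversion="1.11"/>
    </identity>
  </identification>
</fits>"""


def local_name(tag):
    """Strip any '{namespace}' prefix from an ElementTree tag."""
    return tag.rsplit("}", 1)[-1]


def identification_report(xml_text, tool_order):
    """Return (conflict_flag, format reported by the first preferred tool)."""
    root = ET.fromstring(xml_text)
    ident = next(el for el in root.iter()
                 if local_name(el.tag) == "identification")
    conflict = ident.get("status") == "CONFLICT"

    # Map each tool name to the format it reported.
    by_tool = {}
    for identity in ident:
        if local_name(identity.tag) != "identity":
            continue
        for tool in identity:
            if local_name(tool.tag) == "tool":
                by_tool[tool.get("toolname")] = identity.get("format")

    preferred = next((by_tool[t] for t in tool_order if t in by_tool), None)
    return conflict, preferred


if __name__ == "__main__":
    print(identification_report(SAMPLE_FITS_XML, ["Droid", "Jhove"]))
```

Matching on local names keeps the sketch independent of the exact namespace URI, at the cost of ignoring namespaces entirely.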

Because FITS incorporates so many tools, each of which has its own development cycle, into a single application, it's a complicated piece of software to manage. Sometimes it has to stay with older versions of tools until the developers can fix FITS to work with the latest version of the tool.

Final thoughts

Identifying formats and characterizing files is a tricky business. Specifications are sometimes ambiguous. Practices that differ from the letter of the spec may become common; for instance, TIFF's requirement for even-byte alignment is deemed archaic. People have different views on how much error, if any, is acceptable. Being too fussy can ban perfectly usable files from archives.

Specialists are passionate about the answers, and there often isn't one clearly correct answer. It's not surprising that different tools with different philosophies compete, and that the best approach can be to combine and compare their outputs.


Next week's edition will be published on June 3

Monday, May 30, is the Memorial Day holiday in the US. Here at LWN, we'll be taking the day off to tune up our gas grills, mow the lawn, drink beer, or whatever else it is that we do when we're not trying to keep up with what the community is doing. As a result, next week's edition will be published one day later than usual, on June 3. We'll be back to the usual schedule the following week.


Page editor: Jonathan Corbet

Security

New browser-fingerprinting techniques

By Jake Edge
May 25, 2016

Web tracking, which is generally used by advertisers to target their ads, is not popular in some circles—particularly with privacy advocates and privacy-conscious users. But it is also fairly pervasive. Originally, tracking was done using browser cookies, but tracking techniques have expanded over the years. A recent web survey has found several new ways that advertisers and tracking companies are fingerprinting browsers so that the users sitting in front of them can be tracked across the web.

The Princeton Web Census is a study done by Steven Englehardt and Arvind Narayanan to look at both cookie-based (stateful) and browser-fingerprint-based (stateless) tracking on the top 1 million web sites. The survey was run in January, making some 90 million requests to those sites using OpenWPM, which is an open-source project to make "it easy to collect data for privacy studies on a scale of thousands to millions of site[s]". In addition, the data gathered by the study is available for others to use.

The output of the study was a 24-page paper [PDF] that covers quite a bit of ground. The study looked at cookie-based tracking, as well as cookie syncing, where advertising/tracking companies share cookie IDs either in headers (e.g. the referrer header) or behind the scenes. There are some rather interesting findings, many of which are summarized on the pages linked above, but perhaps the most interesting findings are the new ways tracking companies are trying to fingerprint browsers.

The idea behind fingerprinting is straightforward: gather enough information about the user's browser and its environment (plugins, fonts, User-Agent header, localization settings, etc.) to uniquely (or nearly uniquely) identify the user. The Panopticlick tool from the Electronic Frontier Foundation (EFF) demonstrates the uniqueness of a user's browser. The current version of the tool uses some additional techniques, including canvas fingerprinting, which draws images into a hidden <canvas> element to measure the rendering differences between different browsers.

In the survey, canvas fingerprinting was found on more than 14,000 sites, where the actual tracking scripts came from roughly 400 different domains. The sites that use canvas fingerprinting (and the domains where the scripts originate) are listed on the web census page, as are those that use the newer fingerprinting methods described below.

Browser developers have taken some steps to avoid revealing high-value information like font lists, so the fingerprinters have made efforts to find workarounds. One that the study found uses the JavaScript measureText() method to provide font information. By attempting to draw a specific text string in a large number of fonts and comparing the width of the result to the width obtained using the default font, the tracking script can figure out which fonts are not present (since those will be drawn in the default font, and thus have the same width). The study calls this "canvas-font fingerprinting" and found it on more than 3,200 sites. One third party (MediaMath) was responsible for most of the scripts found, but five other third parties were also found to be using canvas-font fingerprinting.
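The width-comparison logic can be sketched in a few lines. The Python below is a simulation only: real scripts call measureText() in the browser, while here a made-up measure_width() function and a hypothetical table of "installed" fonts stand in for the rendering engine.

```python
# Hypothetical per-character widths for the fonts "installed" in a simulated
# browser; the names and numbers are invented for this example.
INSTALLED_FONTS = {"default": 10.0, "Arial": 9.5, "Comic Sans MS": 11.25}


def measure_width(text, font):
    """Stand-in for measureText(): unknown fonts fall back to the default."""
    per_char = INSTALLED_FONTS.get(font, INSTALLED_FONTS["default"])
    return per_char * len(text)


def detect_fonts(candidates, probe="mmmmmmmmmmlli"):
    """Return the candidate fonts whose rendered width differs from the default."""
    baseline = measure_width(probe, "default")
    return [font for font in candidates
            if measure_width(probe, font) != baseline]


if __name__ == "__main__":
    print(detect_fonts(["Arial", "Helvetica Neue", "Comic Sans MS"]))
```

A font that happens to render the probe string at exactly the default width would be missed, which is presumably one reason to test many fonts and a carefully chosen string.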

The WebRTC realtime communication feature is another vector for leaking private information that can be used in fingerprinting. In order to facilitate finding the best route between two peers, WebRTC nodes collect information on IP addresses of interest, including those used by local network interfaces (which may well be unroutable NAT addresses from behind a firewall). These addresses are made available to WebRTC, which leads to privacy concerns in its own right, but may also be used for fingerprinting purposes.

The researchers instrumented the WebRTC createDataChannel() and createOffer() API calls, then tried to determine if those calls were made for tracking purposes. In the top 1 million sites, 700 or so delivered scripts that accessed WebRTC, with more than 600 being used for tracking purposes. Furthermore:

The number of confirmed non-tracking uses of unsolicited IP candidate discovery is small, and based on our analysis, none of them is critical to the application. We therefore suggest that WebRTC IP discovery should be private by default, in contrast to the recommendation of a Working Group that recently reviewed the security and privacy concerns.

Another clever "attack" (at least on privacy) uses the Web Audio API to detect differences in the hardware and browser implementation that provide some amount of information about the browser. It is unclear at this point whether there is enough information gleaned from that to provide a fingerprint, but it certainly can be used in conjunction with other techniques.

One of the tracking scripts using the Audio API simply looks for the presence of certain elements of the API (AudioContext and OscillatorNode) to contribute a single bit of information to a broader fingerprint. The other two take the output from the oscillator, do some calculations on it, and produce a hash. The researchers found only roughly 500 occurrences of the simplest technique; the other two total fewer than 60. This new fingerprinting method was found by analyzing known tracking scripts for the use of new APIs.
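The "calculate, then hash" idea can be illustrated with a simulation. In the Python sketch below, a sine wave stands in for the oscillator output, and rounding the samples to 32-bit floats stands in for a different audio implementation; both choices, along with the summing step and the use of SHA-256, are assumptions for the example and are not taken from the scripts the researchers found.

```python
import hashlib
import math
import struct


def as_float32(x):
    """Round a Python float to 32-bit precision (simulated implementation difference)."""
    return struct.unpack("f", struct.pack("f", x))[0]


def oscillator(n, freq=440.0, rate=44100.0):
    """Generate n samples of a sine wave as a stand-in for oscillator output."""
    return [math.sin(2 * math.pi * freq * i / rate) for i in range(n)]


def audio_fingerprint(samples, quantize=lambda x: x):
    """Reduce the samples to one number, then hash its text representation."""
    total = sum(abs(quantize(s)) for s in samples)
    return hashlib.sha256(repr(total).encode("ascii")).hexdigest()


if __name__ == "__main__":
    samples = oscillator(1000)
    print(audio_fingerprint(samples))              # "implementation A"
    print(audio_fingerprint(samples, as_float32))  # "implementation B"
```

The same input yields the same hash on one simulated implementation, while a tiny numerical difference in processing changes the hash completely.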

OpenWPM is Firefox-based, which allowed the researchers to test with certain add-ons that are meant to block tracking scripts, such as Ghostery and EasyList + EasyPrivacy. For the most part, these tools blocked the majority of the more widespread, canvas-based techniques (i.e. canvas and canvas-font) and had less success with the newer fingerprinting methods (i.e. WebRTC and Audio) on sites that use them. For both of these blocking mechanisms, which are blacklist-based, the more prevalent third-party scripts were blocked. That resulted in covering the majority of the sites, but not generally a majority of the scripts, as less-popular scripts that are infrequently used do not get onto the blacklist.

Overall, the paper makes for an engaging look at the user-tracking landscape of the web. It is a reminder that web browsers today have an enormous reach that can be exploited to identify their users. It will be yet another arms race in the digital world, where browser makers and standards groups seek to close or narrow the information leaks (to the extent they can), while advertisers and tracking companies try to find more ways to gather their precious data. But closing the holes is a balancing act and—since vast sums of money are at stake—one suspects that these companies will always find a way to track.


Brief items

Security quotes of the week

We comply with the laws of the countries in which we operate. But if French law applies globally, how long will it be until other countries — perhaps less open and democratic — start demanding that their laws regulating information likewise have global reach? This order could lead to a global race to the bottom, harming access to information that is perfectly lawful to view in one’s own country. For example, this could prevent French citizens from seeing content that is perfectly legal in France. This is not just a hypothetical concern. We have received demands from governments to remove content globally on various grounds — and we have resisted, even if that has sometimes led to the blocking of our services.
Google appeals a French order to globally apply a "right to be forgotten" removal

As such, a very easy way to remove something from the internet is to accuse its creator of infringing copyright. Worse, the potential downside of such a false claim is minimal: the accused would have to first file a counterclaim, proving they own the copyright; then file a private lawsuit, and prove material damage; and then track down the offending party to actually recover any monies granted by the court.
The Guardian on "censorship by copyright"

It really depends on what your threat model is. If [you're] a high value target to someone with a lot of resources, you're essentially screwed.

It can broadcast information via your speakers, and maybe even your microphone. It can encode data in the timing of your packets as they leave your system. It can encode data in it's power consumption, it can encode data in what it sends to the screen, it can send data out via bluetooth or wifi. There are probably more ways, that I didn't think of off the top of my head.

yoo1I on Intel's Management Engine (ME) at Hacker News (Thanks to Martin Atukunda.)


Linux containers vs. VMs: A security comparison (InfoWorld)

Over at InfoWorld, Jim Reno compares the security of virtual machines (VMs) and containers. "Which is more secure?" is a question that is often asked, but the answer, of course, is "it depends". Reno analyzes the attack surface of each to help in choosing between VMs and containers. "Many legacy VM applications treat VMs like bare metal. In other words, they have not adapted their architectures specifically for VMs or for security models not based on perimeter security. They might install many services on the same VM, run the services with root privileges, and have few or no security controls between services. Rearchitecting these applications (or more likely replacing them with newer ones) might use VMs to provide security separation between functional units, rather than simply as a means of managing larger numbers of machines. Containers are well suited for microservices architectures that “string together” large numbers of (typically) small services using standardized APIs. Such services often have a very short lifetime, where a containerized service is started on demand, responds to a request, and is destroyed, or where services are rapidly ramped up and down based on demand. That usage pattern is dependent on the fast instantiation that containers support. From a security perspective it has both benefits and drawbacks."


A report on the CoreOS remote SSH vulnerability

For those who are curious about how the CoreOS remote SSH vulnerability came to be, the company has posted a detailed report. "This misconfiguration was abetted by confirmation bias. The expected outcome of the change to the CoreOS PAM configuration was for users who presented a password present in an authentication database to be successfully authenticated. Because of the pam_permit failure case explained above, this was the observed behavior in testing, so the change was assumed to be correct. No attempt was made to determine whether the observed behavior could be explained in some other way, such as the system allowing any presented password."


Mathewson: Mid-2016 Tor bug retrospective, with lessons for future coding

On the Tor blog, Nick Mathewson reports on an informal survey he did for "severe" bugs in Tor over the last few years. It breaks down the 70 bugs he found into different categories that are correlated with some recommendations for ways to try to avoid them in the future. For example: "Recommendation 5.1: all backward compatibility code should have a timeout date. On several occasions we added backward compatibility code to keep an old version of Tor working, but left it enabled for longer than we needed to. This code has tended not to get the same regular attention it deserves, and has also tended to hold surprising deviations from the specification. We should audit the code that's there today and see what we can remove, and we should never add new code of this kind without adding a ticket and a comment planning to remove it." Many of the recommendations are likely applicable to other projects.


New vulnerabilities

bugzilla: cross-site scripting

Package(s): bugzilla
CVE #(s): CVE-2016-2803
Created: May 20, 2016
Updated: May 31, 2016
Description: From the Arch Linux advisory:

An attacker can craft a malicious summary within a bug report to host malicious javascript code. This code will be served to a user when he or she navigates to the bug's dependency graph.

An attacker is able to submit a malicious bug report and execute arbitrary javascript code in the client's browser by using the bugzilla server as a pivot.

Alerts:
Fedora FEDORA-2016-5bd283c48b bugzilla 2016-05-28
Fedora FEDORA-2016-6cdcddef2c bugzilla 2016-05-28
Mageia MGASA-2016-0201 bugzilla 2016-05-22
Arch Linux ASA-201605-25 bugzilla 2016-05-19


curl: server spoofing

Package(s): curl
CVE #(s): CVE-2016-3739
Created: May 23, 2016
Updated: May 25, 2016
Description: From the CVE entry:

The (1) mbed_connect_step1 function in lib/vtls/mbedtls.c and (2) polarssl_connect_step1 function in lib/vtls/polarssl.c in cURL and libcurl before 7.49.0, when using SSLv3 or making a TLS connection to a URL that uses a numerical IP address, allow remote attackers to spoof servers via an arbitrary valid certificate.

Alerts:
Gentoo 201701-47 curl 2017-01-19
Slackware SSA:2016-141-01 curl 2016-05-20


dhcpcd: code execution

Package(s): dhcpcd
CVE #(s): CVE-2014-7913
Created: May 20, 2016
Updated: June 7, 2016
Description: From the Mageia advisory:

The print_option function in dhcp-common.c in dhcpcd through 6.10.2 misinterprets the return value of the snprintf function, which allows remote DHCP servers to execute arbitrary code or cause a denial of service (memory corruption) via a crafted message (CVE-2014-7913).

Alerts:
Debian-LTS DLA-506-1 dhcpcd5 2016-06-06
Mageia MGASA-2016-0190 dhcpcd 2016-05-20


extplorer: cross-site request forgery

Package(s): extplorer
CVE #(s): CVE-2015-5660
Created: May 23, 2016
Updated: May 25, 2016
Description: From the CVE entry:

Cross-site request forgery (CSRF) vulnerability in eXtplorer before 2.1.8 allows remote attackers to hijack the authentication of arbitrary users for requests that execute PHP code.

Alerts:
Debian-LTS DLA-485-1 extplorer 2016-05-22


gdk-pixbuf2.0: code execution

Package(s): gdk-pixbuf2.0
CVE #(s): CVE-2015-8875
Created: May 20, 2016
Updated: May 25, 2016
Description: From the Mageia advisory:

The gdk-pixbuf2.0 library is vulnerable to overflows in the pixops_composite_nearest(), pixops_composite_color_nearest() and pixops_process() functions in pixops/pixops.c (CVE-2015-8875).

Alerts:
Ubuntu USN-3085-1 gdk-pixbuf 2016-09-21
Debian DSA-3589-1 gdk-pixbuf 2016-05-30
Mageia MGASA-2016-0192 gdk-pixbuf2.0 2016-05-20


graphicsmagick: denial of service

Package(s): graphicsmagick
CVE #(s): CVE-2016-2317 CVE-2016-2318
Created: May 23, 2016
Updated: September 12, 2016
Description: From the Debian LTS advisory:

Vulnerabilities that allow to read or write outside memory bounds (heap, stack) as well as some null-pointer derreferences to cause a denial of service when parsing SVG files.

Alerts:
Arch Linux ASA-201609-6 graphicsmagick 2016-09-09
openSUSE openSUSE-SU-2016:2073-1 GraphicsMagick 2016-08-15
Mageia MGASA-2016-0252 graphicsmagick 2016-07-14
SUSE SUSE-SU-2016:1783-1 GraphicsMagick 2016-07-11
openSUSE openSUSE-SU-2016:1724-1 GraphicsMagick 2016-07-01
Fedora FEDORA-2016-40ccaff4d1 GraphicsMagick 2016-06-19
Fedora FEDORA-2016-7a878ed298 GraphicsMagick 2016-06-19
Debian-LTS DLA-484-1 graphicsmagick 2016-05-21
Debian DSA-3746-1 graphicsmagick 2016-12-24


kernel: two vulnerabilities

Package(s): kernel
CVE #(s): CVE-2016-4569 CVE-2016-4558
Created: May 25, 2016
Updated: May 25, 2016
Description: From the Red Hat bugzilla:

CVE-2016-4569: A vulnerability was found in Linux kernel. There is an information leak in file sound/core/timer.c of the latest mainline Linux kernel, the stack object “tread” has a total size of 32 bytes. It contains a 8-bytes padding, which is not initialized but sent to user via copy_to_user, resulting a kernel leak.

CVE-2016-4558: A flaw was found in the Linux kernel's implementation of BPF in which systems with more than 32GB of physical memory and unlimited RLIMIT_MEMLOCK settings an application can overflow a 32 bit refcount.

Additionally in the same environment, malicious applications can overflow a map refcount on larger memory (1Tb). When the overflow wraps to zero a reference can be held while being free'd. This can lead to a use after free.
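To see how a structure can carry uninitialized padding of the kind described for CVE-2016-4569, the Python sketch below models a 32-byte object with ctypes. The field names, types, and order are assumptions modeled on the "tread" description above for a 64-bit build, not a copy of the kernel header. The simulation fills a buffer with a marker byte to represent stale stack contents, sets every field, and counts the marker bytes that survive.

```python
import ctypes


class Timespec(ctypes.Structure):
    # Assumed 64-bit layout: two 8-byte fields.
    _fields_ = [("tv_sec", ctypes.c_int64), ("tv_nsec", ctypes.c_int64)]


class Tread(ctypes.Structure):
    # Assumed field order, modeled on the "tread" object in the advisory.
    _fields_ = [("event", ctypes.c_int32),
                ("tstamp", Timespec),
                ("val", ctypes.c_uint32)]


def padding_bytes(struct_type):
    """Bytes of the structure that belong to no field."""
    used = sum(ctypes.sizeof(ftype) for _name, ftype in struct_type._fields_)
    return ctypes.sizeof(struct_type) - used


def stale_bytes_after_fill(zero_first):
    """Set every field, then count leftover 0xAA bytes from the 'stack'."""
    buf = bytearray(b"\xAA" * ctypes.sizeof(Tread))  # stale "stack" contents
    if zero_first:
        buf[:] = bytes(len(buf))  # analogous to zeroing the object first
    tread = Tread.from_buffer(buf)
    tread.event = 1
    tread.tstamp.tv_sec = 2
    tread.tstamp.tv_nsec = 3
    tread.val = 4
    return bytes(buf).count(0xAA)


if __name__ == "__main__":
    print(ctypes.sizeof(Tread), padding_bytes(Tread))
    print(stale_bytes_after_fill(zero_first=False))
    print(stale_bytes_after_fill(zero_first=True))
```

On a typical 64-bit platform the structure is 32 bytes with 8 bytes of padding, matching the numbers in the advisory; copying the whole object out without zeroing it first would hand those 8 stale bytes to the recipient.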

Alerts:
Red Hat RHSA-2016:2584-02 kernel-rt 2016-11-03
Red Hat RHSA-2016:2574-02 kernel 2016-11-03
openSUSE openSUSE-SU-2016:2290-1 kernel 2016-09-12
SUSE SUSE-SU-2016:2245-1 kernel 2016-09-06
openSUSE openSUSE-SU-2016:2184-1 kernel 2016-08-29
openSUSE openSUSE-SU-2016:2144-1 kernel 2016-08-24
SUSE SUSE-SU-2016:2105-1 the Linux Kernel 2016-08-19
SUSE SUSE-SU-2016:1985-1 kernel 2016-08-08
SUSE SUSE-SU-2016:1937-1 kernel 2016-08-02
SUSE SUSE-SU-2017:0333-1 kernel 2017-01-30
Ubuntu USN-3021-2 linux-ti-omap4 2016-06-27
Ubuntu USN-3016-3 linux-snapdragon 2016-06-27
Ubuntu USN-3017-2 linux-raspi2 2016-06-27
Ubuntu USN-3016-2 linux-raspi2 2016-06-27
Ubuntu USN-3016-4 linux-lts-xenial 2016-06-27
Ubuntu USN-3017-3 linux-lts-wily 2016-06-27
Ubuntu USN-3020-1 linux-lts-vivid 2016-06-27
Ubuntu USN-3019-1 linux-lts-utopic 2016-06-27
Ubuntu USN-3018-2 linux-lts-trusty 2016-06-27
Ubuntu USN-3021-1 kernel 2016-06-27
Ubuntu USN-3018-1 kernel 2016-06-27
Ubuntu USN-3017-1 kernel 2016-06-27
Ubuntu USN-3016-1 kernel 2016-06-27
SUSE SUSE-SU-2016:1690-1 kernel 2016-06-27
SUSE SUSE-SU-2016:1696-1 kernel 2016-06-28
Debian DSA-3607-1 kernel 2016-06-28
SUSE SUSE-SU-2016:1672-1 the Linux Kernel 2016-06-24
openSUSE openSUSE-SU-2016:1641-1 kernel 2016-06-21
Debian-LTS DLA-516-1 kernel 2016-06-17
Ubuntu USN-3007-1 linux-raspi2 2016-06-10
Ubuntu USN-3005-1 linux-lts-xenial 2016-06-10
Ubuntu USN-3006-1 kernel 2016-06-10
Fedora FEDORA-2016-06f1572324 kernel 2016-06-02
Fedora FEDORA-2016-84fdc82b74 kernel 2016-05-25
Scientific Linux SLSA-2016:2574-2 kernel 2016-12-14
Oracle ELSA-2016-3646 kernel 2.6.39 2016-11-21
Oracle ELSA-2016-3645 kernel 3.8.13 2016-11-21
Oracle ELSA-2016-3644 kernel 4.1.12 2016-11-21


libgd2: denial of service

Package(s): libgd2
CVE #(s): CVE-2015-8874
Created: May 20, 2016
Updated: July 6, 2016
Description: From the Debian-LTS advisory:

It was discovered that there was a stack consumption vulnerability in the libgd2 graphics library which allowed remote attackers to cause a denial of service via a crafted imagefilltoborder call.

Alerts:
Red Hat RHSA-2016:2750-01 rh-php56 2016-11-15
Fedora FEDORA-2016-d126bb1b74 gd 2016-07-18
Mageia MGASA-2016-0242 libgd 2016-07-05
Fedora FEDORA-2016-a4d48d6fd6 gd 2016-06-27
SUSE SUSE-SU-2016:1638-1 php53 2016-06-21
SUSE SUSE-SU-2016:1581-1 php53 2016-06-14
openSUSE openSUSE-SU-2016:1553-1 php5 2016-06-11
openSUSE openSUSE-SU-2016:1524-1 php5 2016-06-08
Ubuntu USN-2987-1 libgd2 2016-05-31
Debian DSA-3587-1 libgd2 2016-05-27
Mageia MGASA-2016-0203 libgd 2016-05-22
Debian-LTS DLA-482-1 libgd2 2016-05-19


libxml2: denial of service

Package(s): libxml2
CVE #(s): CVE-2016-3705
Created: May 20, 2016
Updated: May 25, 2016
Description: From the Mageia advisory:

libxml2 limits the number of recursions an XML document can contain so to protect against the "Billion Laughs" denial-of-service attack. Unfortunately, the underlying counter was not incremented properly in all necessary locations. Therefore, specially crafted XML documents could exhaust all available stack space and crash the XML parser without running into the recursion limit (CVE-2016-3705).

Alerts:
Scientific Linux SLSA-2016:1292-1 libxml2 2016-06-23
Oracle ELSA-2016-1292 libxml2 2016-06-23
CentOS CESA-2016:1292 libxml2 2016-06-23
Red Hat RHSA-2016:1292-01 libxml2 2016-06-23
Gentoo 201701-37 libxml2 2017-01-16
SUSE SUSE-SU-2016:1604-1 libxml2 2016-06-17
openSUSE openSUSE-SU-2016:1594-1 libxml2 2016-06-16
openSUSE openSUSE-SU-2016:1595-1 libxml2 2016-06-16
SUSE SUSE-SU-2016:1538-1 libxml2 2016-06-09
Ubuntu USN-2994-1 libxml2 2016-06-06
Debian-LTS DLA-503-1 libxml2 2016-06-03
Debian DSA-3593-1 libxml2 2016-06-02
openSUSE openSUSE-SU-2016:1446-1 libxml2 2016-05-30
Arch Linux ASA-201605-27 libxml2 2016-05-26
Mageia MGASA-2016-0187 libxml2 2016-05-20


moodle: multiple vulnerabilities

Package(s): moodle
CVE #(s): CVE-2016-3729 CVE-2016-3731 CVE-2016-3732 CVE-2016-3733 CVE-2016-3734
Created: May 19, 2016
Updated: May 25, 2016
Description: From the Mageia advisory:

In Moodle before 2.8.12, users are able to change profile fields that were locked by the administrator (CVE-2016-3729).

In Moodle before 2.8.12, names of hidden forums or discussions could be disclosed as part of the error message on the subscription page (CVE-2016-3731).

In Moodle before 2.8.12, users can view badges of other users without proper permissions (CVE-2016-3732).

In Moodle before 2.8.12, during the course restore, teachers could overwrite the idnumber even without having the capability to change it (CVE-2016-3733).

In Moodle before 2.8.12, possible CSRF in the URL that marks forum posts as read (CVE-2016-3734).

Alerts:
Fedora FEDORA-2016-286bacdbfb moodle 2016-05-21
Mageia MGASA-2016-0180 moodle 2016-05-18


networkmanager: information leak

Package(s): networkmanager
CVE #(s): CVE-2016-0764
Created: May 23, 2016
Updated: December 15, 2016
Description: From the Mageia advisory:

NetworkManager before 1.0.12 is vulnerable to a race condition that could lead to a local information leak.

Alerts:
Oracle ELSA-2016-2581 NetworkManager 2016-11-10
Red Hat RHSA-2016:2581-02 NetworkManager 2016-11-03
Mageia MGASA-2016-0195 networkmanager 2016-05-22
Scientific Linux SLSA-2016:2581-2 NetworkManager 2016-12-14


ose3.1: unauthorized access

Package(s): Red Hat OpenShift Enterprise 3.1
CVE #(s): CVE-2016-3703
Created: May 20, 2016
Updated: May 25, 2016
Description: From the Red Hat advisory:

An origin validation vulnerability was found in OpenShift Enterprise. An attacker could potentially access API credentials stored in a web browser's localStorage if anonymous access was granted to a service/proxy or pod/proxy API for a specific pod, and an authorized access_token was provided in the query parameter. (CVE-2016-3703)

Alerts:
Red Hat RHSA-2016:1094-01 Red Hat OpenShift Enterprise 3.2 2016-05-19
Red Hat RHSA-2016:1095-01 Red Hat OpenShift Enterprise 3.1 2016-05-19


ose3.2: two vulnerabilities

Package(s): Red Hat OpenShift Enterprise 3.2
CVE #(s): CVE-2016-3708 CVE-2016-3738
Created: May 20, 2016
Updated: May 25, 2016
Description: From the Red Hat advisory:

A vulnerability was found in the STI build process in OpenShift Enterprise. Access to STI builds was not properly restricted, allowing an attacker to use STI builds to access the Docker socket and escalate their privileges. (CVE-2016-3738)

A flaw was found in OpenShift Enterprise when multi-tenant SDN is enabled and a build is run within a namespace that would normally be isolated from pods in other namespaces. If an s2i build is run in such an environment the container being built can access network resources on pods that should not be available to it. (CVE-2016-3708)

Alerts:
Red Hat RHSA-2016:1094-01 Red Hat OpenShift Enterprise 3.2 2016-05-19


p7zip: two code execution flaws

Package(s): p7zip
CVE #(s): CVE-2016-2334 CVE-2016-2335
Created: May 19, 2016
Updated: January 11, 2017
Description: From the Arch Linux advisory:

CVE-2016-2334 (arbitrary code execution): An exploitable heap overflow vulnerability exists in the NArchive::NHfs::CHandler::ExtractZlibFile method functionality of 7zip that can lead to arbitrary code execution. Before decompression, ExtractZlibFile method read block size and its offset from file and after that read block data into static size buffer "buf". Because there is no check whether size of block is bigger than size of "buf", malformed size of block exceeding mentioned "buf" size will cause buffer overflow and heap corruption.

CVE-2016-2335 (arbitrary code execution): An out of bound read vulnerability exists in the CInArchive::ReadFileItem method functionality of 7zip for handling UDF files that can lead to denial of service or code execution. Because volumes can have more than one partition map their objects are keep in object vector. To start looking for item, method tries to achieve proper partition object using to this mentioned partition maps object vector and "PartitionRef" field from Long Allocation Descriptor. Lack of checking whether "PartitionRef" field is bigger than available amount of partition map objects cause read out of bounds and can lead in some circumstances to arbitrary code execution.

Alerts:
Fedora FEDORA-2016-430bc0f808 p7zip 2016-08-01
openSUSE openSUSE-SU-2016:1850-1 p7zip 2016-07-22
Fedora FEDORA-2016-bbcb0e4eb4 p7zip 2016-07-20
openSUSE openSUSE-SU-2016:1675-1 p7zip 2016-06-24
Debian-LTS DLA-510-1 p7zip 2016-06-10
Debian DSA-3599-1 p7zip 2016-06-09
Gentoo 201701-27 p7zip 2017-01-11
openSUSE openSUSE-SU-2016:1464-1 p7zip 2016-06-01
Mageia MGASA-2016-0202 p7zip 2016-05-22
Arch Linux ASA-201605-24 p7zip 2016-05-18


php: two vulnerabilities

Package(s): php5, php7.0
CVE #(s): CVE-2016-3078 CVE-2016-3132
Created: May 25, 2016
Updated: May 25, 2016
Description: From the Ubuntu advisory:

Hans Jerry Illikainen discovered that the PHP Zip extension incorrectly handled certain malformed Zip archives. A remote attacker could use this issue to cause PHP to crash, resulting in a denial of service, or possibly execute arbitrary code. This issue only affected Ubuntu 16.04 LTS. (CVE-2016-3078)

It was discovered that PHP incorrectly handled invalid indexes in the SplDoublyLinkedList class. An attacker could use this issue to cause PHP to crash, resulting in a denial of service, or possibly execute arbitrary code. This issue only affected Ubuntu 16.04 LTS. (CVE-2016-3132)

Alerts:
Fedora FEDORA-2016-4f3c77ef90 php-pecl-zip 2016-07-02
Fedora FEDORA-2016-79ac80a0d5 php-pecl-zip 2016-07-02
Ubuntu USN-2984-1 php5, php7.0 2016-05-24


php5: three vulnerabilities

Package(s): php5
CVE #(s): CVE-2016-4342 CVE-2016-4343 CVE-2016-4346
Created: May 19, 2016
Updated: May 25, 2016
Description: From the openSUSE advisory:

CVE-2016-4342: Heap corruption in tar/zip/phar parser (bsc#977991)

CVE-2016-4343: Uninitialized pointer in phar_make_dirstream() (bsc#977992)

CVE-2016-4346: heap overflow in ext/standard/string.c (bsc#977994)

Alerts:
Red Hat RHSA-2016:2750-01 rh-php56 2016-11-15
Debian-LTS DLA-818-1 php5 2017-02-07
SUSE SUSE-SU-2016:1638-1 php53 2016-06-21
SUSE SUSE-SU-2016:1581-1 php53 2016-06-14
openSUSE openSUSE-SU-2016:1524-1 php5 2016-06-08
Debian-LTS DLA-499-1 php5 2016-05-31
Ubuntu USN-2984-1 php5, php7.0 2016-05-24
openSUSE openSUSE-SU-2016:1357-1 php5 2016-05-19


php-symfony: buffer overflow

Package(s): php-symfony
CVE #(s): (none)
Created: May 23, 2016
Updated: May 25, 2016
Description: From the Fedora advisory:

Version 2.7.13 (2016-05-09)

  * security #18733 limited the maximum length of a submitted username (fabpot)
  * bug #18730 [FrameworkBundle] prevent calling get() for service_container service (xabbuh)
  * bug #18709 [DependencyInjection] top-level anonymous services must be public (xabbuh)
  * bug #18692 add Event annotation for KernelEvents (Haehnchen)
  * bug #18246 [DependencyInjection] fix ambiguous services schema (backbone87)

Alerts:
Fedora FEDORA-2016-4ad874e6c2 php-symfony 2016-05-20
Fedora FEDORA-2016-f36247d441 php-symfony 2016-05-21

php-ZendFramework2: insecure ciphertexts

Package(s): php-ZendFramework2
CVE #(s): CVE-2015-7503
Created: May 23, 2016
Updated: June 22, 2016
Description: From the Mageia advisory:

Zend\Crypt\PublicKey\Rsa\PublicKey has a call to openssl_public_encrypt() which uses PHP's default $padding argument, which specifies OPENSSL_PKCS1_PADDING, indicating usage of PKCS1v1.5 padding. This padding has a known vulnerability, the Bleichenbacher's chosen-ciphertext attack, which can be used to decrypt arbitrary ciphertexts.

Alerts:
Fedora FEDORA-2016-03c0ed3127 php-ZendFramework2 2016-06-22
Fedora FEDORA-2016-8952105d59 php-ZendFramework2 2016-06-22
Fedora FEDORA-2016-03c0ed3127 php-zendframework-zendxml 2016-06-22
Fedora FEDORA-2016-8952105d59 php-zendframework-zendxml 2016-06-22
Mageia MGASA-2016-0196 php-ZendFramework2 2016-05-22

wireshark: denial of service

Package(s): wireshark
CVE #(s): CVE-2016-4085
Created: May 23, 2016
Updated: May 25, 2016
Description: From the CVE entry:

Stack-based buffer overflow in epan/dissectors/packet-ncp2222.inc in the NCP dissector in Wireshark 1.12.x before 1.12.11 allows remote attackers to cause a denial of service (application crash) or possibly have unspecified other impact via a long string in a packet.

Alerts:
Debian-LTS DLA-497-1 wireshark 2016-05-31
Debian DSA-3585-1 wireshark 2016-05-22

wordpress: two cross-site scripting vulnerabilities

Package(s): wordpress
CVE #(s): CVE-2016-4566 CVE-2016-4567
Created: May 23, 2016
Updated: May 25, 2016
Description: From the CVE entries:

Cross-site scripting (XSS) vulnerability in plupload.flash.swf in Plupload before 2.1.9, as used in WordPress before 4.5.2, allows remote attackers to inject arbitrary web script or HTML via a Same-Origin Method Execution (SOME) attack. (CVE-2016-4566)

Cross-site scripting (XSS) vulnerability in flash/FlashMediaElement.as in MediaElement.js before 2.21.0, as used in WordPress before 4.5.2, allows remote attackers to inject arbitrary web script or HTML via the query string. (CVE-2016-4567)

Alerts:
Fedora FEDORA-2016-e97a850183 wordpress 2016-05-20
Fedora FEDORA-2016-cf91320535 wordpress 2016-05-21

Page editor: Jake Edge

Kernel development

Brief items

Kernel release status

The 4.7 merge window remains open as of this writing. See the article below for a summary of the important changes that have been merged for the 4.7 release.

Stable updates: 4.5.5, 4.4.11, and 3.14.70 were released on May 18.

A small waitid() change

System-call fuzzing work recently turned up an interesting problem on many Linux systems: depending on which init system is used, a process can create permanent zombie children by cloning a thread, calling ptrace() on it, then exiting. If the init system is not waiting for exiting processes in the right way, that particular thread will go unnoticed and unreaped; unlike other zombies, it will not roam the system searching for a cerebral-matter snack, but it will sit there forever using memory, which is bad enough.

This is, technically, not a kernel bug: a call to waitid() (which is what most init implementations evidently use) is not supposed to wait for children in the absence of the __WALL flag. But it is a denial-of-service vector, changing every init implementation is impractical, and, besides, the waitid() system call, unlike wait4(), does not even accept the __WALL flag. So Oleg Nesterov decided that this problem should be fixed in the kernel, even if the bug could be said to be elsewhere.

So, as of 4.7, a call to waitid() will wait for child processes running under ptrace(). It will also accept the __WALL flag, but that flag will not actually be necessary to prevent this particular zombie invasion. As Oleg noted in the patch changelog, this is an ABI change: "__WCLONE and __WALL no longer have any meaning for debugger. And I can only hope that this won't break something, but at least strace/gdb won't suffer." The change seems unlikely to cause problems further afield, but one never knows; if problems are found during the 4.7 development cycle, this change may have to come back out.
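
For readers who have not used waitid() directly, here is a minimal user-space sketch of the kind of reaping an init-like process performs. The helper name spawn_and_reap() is made up for illustration, and the sketch does not reproduce the ptrace() scenario described above; it only shows the ordinary call that, as of 4.7, will also pick up traced children. The __WALL flag is deliberately left out, since older kernels reject it in waitid().

```c
#define _GNU_SOURCE
#include <assert.h>
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: fork a child that exits with 'code', then reap it
 * with waitid().  Returns the child's exit status, or -1 on any error. */
int spawn_and_reap(int code)
{
	siginfo_t info;
	pid_t pid;

	assert(code >= 0 && code <= 255);
	pid = fork();
	if (pid < 0)
		return -1;
	if (pid == 0)
		_exit(code);	/* child: terminate immediately */

	/* WEXITED asks for children that have terminated. */
	memset(&info, 0, sizeof(info));
	if (waitid(P_PID, (id_t)pid, &info, WEXITED) != 0)
		return -1;
	if (info.si_code != CLD_EXITED)
		return -1;	/* killed by a signal, not a normal exit */
	return info.si_status;
}
```

A real init would normally wait with P_ALL in a loop rather than for one known PID; the single-child form is used here only to keep the example short.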

Kernel development news

4.7 Merge window, part 2

By Jonathan Corbet
May 25, 2016
As of this writing, Linus has pulled almost 9,900 non-merge changesets into the mainline repository for the 4.7 kernel; that includes some 6,500 since last week's summary was written. After the near-record volume of changes that went into 4.6, the community has slowed down a little — but only a little.

Some of the more interesting, user-visible changes pulled this time around include:

  • The tracing subsystem has gained support for histogram triggers, which can perform some types of statistical accumulation in the kernel. This commit contains the documentation additions.

  • The event-filtering code for the tracing subsystem has long been able to follow a specific list of process IDs; in 4.7, there is a new event-fork option that will cause newly-created child processes to be automatically added to the list.

  • The LoadPin security module has been merged. If this module is enabled (not the default), all data loaded into the kernel (modules, firmware, etc.) must come from a single trusted device.

  • The MIPS architecture now supports kernel address-space layout randomization.

  • The PCI Express "downstream port containment" (DPC) feature is now supported. DPC allows the containment of uncorrectable errors in hardware attached via a specific port.

  • There is a new option to randomize the ordering of the free lists in the slab memory allocator; the hope is that more unpredictability will make attacks harder.

  • The out-of-memory detection patch set has been merged. These patches change how the kernel decides that the system is out of memory with an eye toward creating more deterministic and reliable behavior.

  • A process's current umask can now be read from a new field in /proc/PID/status.

  • The "device DAX" mechanism allows persistent memory to be presented as a character device (/dev/dax.X.Y) rather than system memory. This memory can then be accessed (and mapped into user space) without the need to place a filesystem on it.

  • New hardware support includes:

    • Systems and processors: ARM V2M-MPS2 Cortex-M prototyping systems, Oxford Semiconductor OXNAS Family systems-on-chip (SoCs), ASpeed baseboard management controller SoCs, LG Electronics LG1K SoCs, EZchip NPS-based systems, and Loongson-3A R2 MIPS CPUs. See also Arnd Bergmann's description of the new ARM systems for more information, including the fact that the ASpeed submission was evidently motivated by an LWN article.

    • Block: Shingled magnetic recording devices using the Zone ATA command mechanism.

    • Graphics: Analogix ANX78XX video bridges, ARC PGU display controllers, Allwinner A10 display engines, Hisilicon Kirin series frame buffers, and Mediatek MT8173 display subsystems. See also Daniel Vetter's summary for a definitive list of improvements to the Intel graphics drivers in this cycle.

    • Industrial I/O: NXP LPC18xx analog-to-digital and digital-to-analog converter (ADC/DAC) controllers, Analog Devices AD5592R/AD5593R ADC/DACs, Microchip MCP4xxx potentiometers, HOPERF HP206C barometer/altimeters, Maxim DS1803 digital potentiometers, Maxim MAX44000 ambient and infrared proximity sensors, Bosch BMI160 inertial measurement units, ROHM BH1780 ambient light sensors, Vishay VEML6070 UV-A light sensors, HopeRF HP03 digital pressure/temperature sensors, and Aosong AM2315 relative humidity and temperature sensors.

    • Miscellaneous: Samsung Exynos SROM memory controllers, NVIDIA Tegra XUSB pad controllers, NVIDIA Tegra xHCI host controllers, NVIDIA Tegra210 ADMA controllers, Oxford Semiconductor reset controllers, Intersil/Techwell TW686x-based frame grabber cards, Microchip PIC32 serial ports, Microchip PIC32 hardware watchdogs and deadman timers, Intel Broxton digital signal processors, Marvell Armada-8K PCIe controllers, Maxim Semiconductor MAX77620 and MAX20024 power-management ICs, HiSilicon Hi655X power-management ICs, Atmel AT91 SAMA5D2-compatible shutdown controllers, HiSilicon reset controllers, ARM MPS2 UART controllers, CoreSight system trace macrocells, Microchip PIC32 series SPI controllers, and Renesas watchdog timer controllers.

    • Pin control: Intel Baytrail pin controllers, Marvell PXA25x pin controllers, and Broadcom Northstar2 pin controllers.

    • USB: USB Type-C connector system software interfaces and Broadcom Northstar USB 2.0 PHYs.
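
As an illustration of the umask item in the list above, the following sketch reads the new field from user space. It assumes that the field appears on a line starting with "Umask:" followed by an octal value; the function names are invented for this example. On kernels older than 4.7 the field is simply absent, and read_own_umask() reports failure.

```c
#include <assert.h>
#include <stdio.h>

/* Parse one line of /proc/PID/status.  If it is the "Umask:" line, store
 * the octal value in *mask and return 1; otherwise return 0. */
int parse_umask_line(const char *line, unsigned int *mask)
{
	assert(line && mask);
	return sscanf(line, "Umask: %o", mask) == 1;
}

/* Read the calling process's umask from /proc/self/status.
 * Returns 0 on success, -1 if the file or the field is missing. */
int read_own_umask(unsigned int *mask)
{
	char line[256];
	int found = 0;
	FILE *f = fopen("/proc/self/status", "r");

	if (!f)
		return -1;
	while (!found && fgets(line, sizeof(line), f))
		found = parse_umask_line(line, mask);
	fclose(f);
	return found ? 0 : -1;
}
```

The attraction of the /proc field is that it can be read for another process, and without the traditional trick of calling umask() twice to read and then restore one's own value.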

Changes visible to kernel developers include:

  • The "SG pool" code, providing helpers for the allocation of chained scatter/gather lists, has been moved out of the SCSI code and made available to the rest of the kernel. No documentation exists, but the interface can be seen in lib/sg_pool.c.

  • The pin control subsystem now offers devm_pinctrl_register(), allowing drivers to drop a lot of cleanup code.

  • The KASan memory debugging tool will now "quarantine" freed memory, taking it out of use for some time. The idea is that isolating freed memory in this way will improve the detection of use-after-free errors. KASan has also gained the ability to monitor accesses to user-space memory.

  • The multi-order radix tree patches have been merged, allowing the radix tree to track address ranges greater than a single page.

At this point, the patch flow into the mainline has slowed considerably; just about all of the major trees have been pulled. The merge window has a few more days to run, though; come back next week for a closing summary for this development cycle.

In search of the right RGB LED interface

May 25, 2016

This article was contributed by Neil Brown

One of the roles of the Linux kernel is to provide uniform, abstract interfaces to varying hardware. When a new class of hardware comes along, it can take a while to understand what the best interface would be. This has been seen in recent months with the appearance of nonvolatile memory in large quantities leading to disagreements over the semantics of DAX filesystem access and the handling of hardware errors. The same basic question has arisen, though in a much smaller way, over the best handling of RGB LEDs — triplets of LEDs, each of a different color, which together can produce a wide range of colors and intensities.

Linux already has support for monochrome LEDs, including minimal support for identifying the color of each LED: the name of the LED can, and sometimes does, include the English name of the LED's color (locomo:green:mail, for example). The simplest approach to managing RGB LEDs is to treat them as three independent LEDs with related names. User-space tools can then follow simple conventions to find related LEDs and create interesting colors as required.
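
A rough sketch of what such a user-space tool might do is shown below. It assumes that the three LEDs share a device and function name and differ only in the color part of a "device:color:function" name, and that each exposes a brightness attribute under /sys/class/leds. The helper names are hypothetical, and the 0 to 255 value range is an assumption; real code would consult each LED's max_brightness attribute.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Build the sysfs path of the brightness attribute for one LED.
 * Returns 0 on success, -1 if the buffer is too small. */
int led_brightness_path(char *buf, size_t size, const char *device,
			const char *color, const char *function)
{
	int n;

	assert(buf && device && color && function);
	n = snprintf(buf, size, "/sys/class/leds/%s:%s:%s/brightness",
		     device, color, function);
	return (n < 0 || (size_t)n >= size) ? -1 : 0;
}

/* Write one brightness value to an LED's brightness attribute. */
int led_set(const char *path, unsigned int value)
{
	FILE *f = fopen(path, "w");

	if (!f)
		return -1;
	fprintf(f, "%u\n", value);
	return fclose(f) == 0 ? 0 : -1;
}

/* Set a color by updating the three related LEDs one after another. */
int rgb_set(const char *device, const char *function,
	    unsigned int r, unsigned int g, unsigned int b)
{
	static const char *colors[3] = { "red", "green", "blue" };
	unsigned int values[3] = { r, g, b };
	char path[128];
	int i;

	for (i = 0; i < 3; i++) {
		if (led_brightness_path(path, sizeof(path), device,
					colors[i], function) != 0)
			return -1;
		if (led_set(path, values[i]) != 0)
			return -1;
	}
	return 0;
}
```

Note that the three writes are separate operations, so the color passes through intermediate states on the way; that lack of atomicity is part of the motivation for something better.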

There are two reasons to think this may not be the best long-term solution. The first involves integration with the various "triggers" that Linux supports for LEDs. As Jacek Anaszewski from Samsung explains, there are two classes of source information for triggers. One class has the trigger local to the LED, such as "timer" or "oneshot". These triggers are controlled from user space; programming three triggers in concert might be a little clumsy, but it still allows the full functionality to be used.

The other class of source information is from in-kernel events: CPU load, disk drive activity, network device activity, etc. These currently only adjust the brightness or the duty-cycle of the LEDs, but a natural enhancement would be to allow them to adjust color. That would require the kernel to know how specific LEDs work together to produce different colors. A particular example is the heartbeat trigger. On monochrome LEDs this trigger produces a "thump-thump-pause" pattern designed to mimic the human heartbeat with a rate that increases as the load-average on the system increases. Heiner Kallweit has implemented an alternate heartbeat that works with RGB LEDs and uses the color (ranging from green to red) rather than the rate to represent load.

It is easy to imagine other ways that color information could be used to represent such things as acceptable or worrisome activity from various parts of the kernel. Supporting direct connections from those subsystems to a suitable RGB LED may provide a lot of value.

The second reason that the kernel might benefit from an explicit understanding that three LEDs work together is that this understanding is embedded in some hardware. A good example is the LP5523 LED controller [PDF] from TI that can drive up to nine LEDs. This controller is programmable with three separate engines and space to store 96 16-bit instructions. The instructions are general enough to be usable for computing prime numbers. The three engines naturally align with three sets of RGB LEDs, so allowing the kernel interface to represent these triples is likely to make for a better interface. Even when the LEDs are only accessed from user space, it would be helpful if high-level program requests, such as blink rates or brightness transitions, could be described for the three together so they can reliably be synchronized.

As yet there does not seem to be a clear vision for how generic RGB support might work. Kallweit posted some patches back in March but they have some problems. The basic approach is to present the three LEDs as a single LED device that changes all three colors at the same time, so it can be used as though it were a single white LED. The "brightness" value can be given hue and saturation components as well; this allows color to be changed from user space. This triplet of values is encoded in a single sysfs attribute which, as Pavel Machek highlighted, is not generally seen as acceptable.

One argument against this approach is that there are already devices with tri-color LEDs, such as the Nokia N900 and the motion controller for the Sony PlayStation. These currently use three separate LED devices and they need to be able to continue to work the same way when new functionality is added.

Using HSV (Hue Saturation Value) has some appeal as it includes the current brightness as a subset but, for correct mapping to RGB, a "gamma" value needs to be included, and the kernel may not be the best place to be adding that sort of complexity.
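
To give a sense of the complexity involved, here is an integer-only sketch of an HSV-to-RGB conversion followed by a crude gamma step. It is not taken from any posted patch. Hue is assumed to be in degrees (0 to 359) with saturation and value in the range 0 to 255, and the gamma curve is approximated by squaring (a gamma of 2.0), which is a common shortcut rather than a calibrated value for any particular LED.

```c
#include <assert.h>

struct rgb {
	unsigned char r, g, b;
};

/* Convert HSV to RGB using integer arithmetic only. */
struct rgb hsv_to_rgb(unsigned int h, unsigned int s, unsigned int v)
{
	unsigned int region, rem, p, q, t;
	unsigned int r, g, b;
	struct rgb out;

	assert(h < 360 && s <= 255 && v <= 255);
	region = h / 60;		/* which sixth of the color wheel */
	rem = (h % 60) * 255 / 60;	/* position within that sixth */
	p = v * (255 - s) / 255;
	q = v * (255 - s * rem / 255) / 255;
	t = v * (255 - s * (255 - rem) / 255) / 255;

	switch (region) {
	case 0:  r = v; g = t; b = p; break;
	case 1:  r = q; g = v; b = p; break;
	case 2:  r = p; g = v; b = t; break;
	case 3:  r = p; g = q; b = v; break;
	case 4:  r = t; g = p; b = v; break;
	default: r = v; g = p; b = q; break;
	}
	out.r = (unsigned char)r;
	out.g = (unsigned char)g;
	out.b = (unsigned char)b;
	return out;
}

/* Approximate gamma correction (gamma 2.0) for one channel, rounded. */
unsigned char apply_gamma2(unsigned char c)
{
	return (unsigned char)((c * c + 127) / 255);
}
```

Even this simplified version needs a six-way case split and a per-channel correction, which gives some weight to the argument that the conversion belongs in user space.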

After some discussion, Anaszewski came up with a proposal that could make triggers like the color-based load indicator work with individual red, green, and blue LEDs. A single trigger can already apply to multiple LEDs, so the first step would be to assign that colorful "heartbeat" to each of the three LEDs. Then a new sysfs attribute would be used to configure each one to only display the "red", "green", or "blue" component of the signal. While this feels a little clumsy, it would certainly work and is simple to implement and to understand, which are more important considerations.
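
The idea can be sketched in a few lines. In the following user-space model, a hypothetical color trigger produces a single 0xRRGGBB value that fades from green to red as the load rises, and each LED picks out only the component it has been configured to show. The names and the 0 to 100 load scale are assumptions made for the example; the proposal itself did not include code.

```c
#include <assert.h>

/* Which component of the trigger's color a given LED displays. */
enum color_component { COMP_RED, COMP_GREEN, COMP_BLUE };

/* Map a load level (0 = idle, 100 = saturated) to a 0xRRGGBB color that
 * fades from pure green to pure red. */
unsigned int load_to_color(unsigned int load_percent)
{
	unsigned int red, green;

	if (load_percent > 100)
		load_percent = 100;
	red = 255 * load_percent / 100;
	green = 255 - red;
	return (red << 16) | (green << 8);
}

/* Brightness that one LED should show for a given trigger color. */
unsigned int component_value(unsigned int color, enum color_component c)
{
	assert((color >> 24) == 0);	/* only 24 bits of color are used */
	switch (c) {
	case COMP_RED:
		return (color >> 16) & 0xff;
	case COMP_GREEN:
		return (color >> 8) & 0xff;
	default:
		return color & 0xff;
	}
}
```

With this arrangement the trigger needs no knowledge of which LEDs belong together; the grouping is expressed entirely by attaching the same trigger to three LEDs with different component settings.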

This doesn't really address the need to be able to program controllers that expect LEDs to be related rather than completely independent. Machek has some ideas on how that might be approached. There isn't a lot of detail; it essentially involves creating a new "pattern" device in sysfs that represents the capabilities for the engine in the controller. It can be configured and then linked to one or more LED devices. This model seems flexible enough to be able to support both software and hardware pattern generators, but without more details (and code) it is hard to judge it fairly.

Keeping the individual LEDs separate, but allowing them to be combined for pattern generation, seems to be a fairly accurate model of how the hardware works, as there is nothing in the hardware controller that forces the LEDs to be mounted close to each other physically. This match between model and reality bodes well for the design being one that could be successful.

While little details like RGB LEDs might not get as much attention as big-ticket items like massive nonvolatile RAM arrays, they are still quite important as they are exactly the sort of thing we can expect to see more of in the mobile-device space. If we ever want these devices to run mainline kernels, we would do well to work on getting support of these devices into mainline first. Or at least a close second.

A multi-order radix tree

May 24, 2016

This article was contributed by Ross Zwisler

Radix trees have been a part of Linux for quite some time; an LWN article from a decade ago explained the structure and functionality of them. The radix tree is a general-purpose data structure that is used by many different components within the kernel. It provides an efficient way to create a key-value store, where the key is an unsigned long, referred to as the index, and the value is a void *. The radix tree also stores a few bits of additional information for each entry in the form of tags.
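
As a reminder of how an index turns into a path through the tree, the sketch below splits an index into per-level slot numbers. It assumes 64 slots per node (six bits of the index consumed at each level), which is the usual configuration but not the only one; the macro and function names are local to this example rather than the kernel's own.

```c
#include <assert.h>

#define MAP_SHIFT 6			/* assumed: 64 slots per node */
#define MAP_SIZE  (1UL << MAP_SHIFT)
#define MAP_MASK  (MAP_SIZE - 1)

/* Slot chosen at a given level for 'index'; level 0 is the bottom of the
 * tree, where the data pointers live. */
unsigned int slot_at_level(unsigned long index, unsigned int level)
{
	assert(level * MAP_SHIFT < sizeof(index) * 8);
	return (unsigned int)((index >> (level * MAP_SHIFT)) & MAP_MASK);
}

/* Number of levels a tree needs in order to hold 'index'. */
unsigned int levels_needed(unsigned long index)
{
	unsigned int levels = 1;

	while (index >= MAP_SIZE) {
		index >>= MAP_SHIFT;
		levels++;
	}
	return levels;
}
```

For index 4660, for example, a three-level tree is needed, and a lookup follows slot 1 at the top, slot 8 in the middle, and slot 52 at the bottom.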

The most common use of the radix tree is to keep track of a collection of pages. In struct address_space, for example, there is a radix tree called page_tree that tracks the in-memory page-cache pages that are associated with a given inode. The key for page_tree is the page offset (pgoff_t) into the file. For normal files, page_tree will map that key to a void * value which is actually the struct page * for the page-cache page at that file offset. For page-cache pages, the radix-tree tags let us track entries that are dirty and which are marked for writeback.

For inodes that take advantage of the DAX ("direct access") feature, there is no page cache sitting between the user processes and the storage. Hence, for DAX inodes there is no need to keep track of struct page * entries via the page_tree. Instead, for DAX inodes, this same page_tree is used to hold DAX exceptional entries that track the state of the persistent-memory pages used by DAX. On x86_64, these exceptional entries are 64-bit values that store several pieces of information, such as the page size (more on that below), the sector offset within the persistent-memory storage, and some flags. From these exceptional entries, DAX knows which dirty pages need to be flushed from the processor cache when an fsync() is received from user space.

For radix-tree uses where there is a one-to-one mapping between keys and values, such as a page_tree that only tracks PAGE_SIZE page-cache entries and/or DAX entries, this all works perfectly. But what about cases where this one-to-one relationship breaks down?

One example of this breakdown is huge pages. On x86_64, regular pages are 4KiB in size. Linux x86_64 also supports "huge pages" that are 2MiB in size, and the Linux DAX code has explicit support for these 2MiB pages. For the page_tree radix tree, this means that the one-to-one relationship between keys and values may not be sufficient.

There is a desire for a 2MiB page to be tracked as a single entity. There would be a single pointer to the 2MiB worth of data and the tag state would be consistent, so that the kernel can reliably track whether the data is dirty.

Existing solutions

As of kernel 4.5, DAX successfully tracks the value and state of 2MiB pages through the use of DAX exceptional radix-tree entries that reserve a few bits to record whether the DAX entry represents a 4KiB page (RADIX_DAX_PTE) or a 2MiB page (RADIX_DAX_PMD). 2MiB page entries (referred to as "Page Middle Directory" or PMD entries) are always inserted at a 2MiB boundary, so DAX is able to support huge-page entries with the following logic:

    pgoff_t pmd_index = DAX_PMD_INDEX(index);

    entry = radix_tree_lookup(page_tree, pmd_index);

    if (entry && RADIX_DAX_TYPE(entry) == RADIX_DAX_PMD) {
	    /* operate on the 2MiB 'entry' at 'pmd_index' */
    } else {
	    entry = radix_tree_lookup(page_tree, index);
	    /* operate on the 4KiB 'entry' at 'index' */
    }

This has the obvious cost that for every radix-tree operation there has to be an extra lookup for the entry using pmd_index to first check whether the index is covered by a 2MiB page. This solution is correct, but is somewhat costly in the RADIX_DAX_PTE case where we have to do a radix_tree_lookup() at both pmd_index and index. Having special-case lookups at multiple offsets also does not make for the cleanest code. When 1GiB page support is added to the DAX code, this solution begins to look even worse, because there will have to be yet another special-case lookup.
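
The index arithmetic behind that first lookup is simple. The sketch below assumes, as the text implies, that DAX_PMD_INDEX() rounds a page index down to the nearest 2MiB boundary, which is 512 pages when pages are 4KiB; the names used here are stand-ins rather than the kernel's definitions.

```c
#include <assert.h>

#define PAGES_PER_PMD 512UL	/* 2MiB / 4KiB, as on x86_64 */

/* Round a page index down to the start of its 2MiB-aligned range. */
unsigned long pmd_index_of(unsigned long index)
{
	return index & ~(PAGES_PER_PMD - 1);
}

/* Do two page indices fall within the same 2MiB range? */
int same_pmd_range(unsigned long a, unsigned long b)
{
	return pmd_index_of(a) == pmd_index_of(b);
}
```

Every index from 512 through 1023 thus leads to a first lookup at index 512, and only if that lookup fails to find a PMD entry does the second lookup at the original index take place.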

Another possible alternative that would not involve an extra lookup would be to represent the 2MiB entry via 512 redundant entries, each at a unique index. This would have the property that any lookup for the indices in the 2MiB range would return a copy of the data pointer, but it has the downside that the tag tracking is no longer consistent among the 512 entries. This would mean that some of the 512 entries could be tagged as clean and some of them could be tagged as dirty, even though they all had the same data pointer as their value.

There would also be a need to be sure to replicate other operations, such as removal, among all 512 entries. This solution has the additional downside that representing a 2MiB page via 512 individual entries adds many extra nodes to the radix tree.

Multi-order radix-tree techniques

Both of the solutions mentioned thus far for dealing with huge pages have left the radix-tree API unchanged. Ideally, there would be a solution where the one-to-one mapping between keys and values in the radix tree can be broken. That would allow inserting an entry that covers multiple 4KiB-sized indices and have operations on indices in that range, such as lookup, tag modification, and removal, all act on the same radix-tree entry. Recently there have been several patch series (1, 2, and 3) posted to the Linux kernel mailing list that set out to accomplish this goal via a solution that is called "the multi-order radix tree".

The basic idea is to add the ability to insert entries that span 2^X 4KiB-sized page indices. X is referred to as the page's "order". Using this terminology, existing radix trees, in which every entry is associated with a single index, are composed entirely of order-0 entries. An order-2 entry would cover 2^2 = 4 adjacent indices. The 2MiB entries for the DAX huge-page example would be order-9 entries, and so on.
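
The arithmetic can be written down as a short sketch, assuming 4KiB base pages and naturally aligned entries (the DAX PMD entries described earlier are always inserted on a 2MiB boundary). The helper names are invented for illustration.

```c
#include <assert.h>

#define BASE_PAGE_SHIFT 12	/* 4KiB pages */

/* Number of adjacent indices covered by an entry of the given order. */
unsigned long order_to_indices(unsigned int order)
{
	assert(order < sizeof(unsigned long) * 8);
	return 1UL << order;
}

/* Amount of memory, in bytes, described by an entry of the given order. */
unsigned long long order_to_bytes(unsigned int order)
{
	return (unsigned long long)order_to_indices(order) << BASE_PAGE_SHIFT;
}

/* Is 'index' a valid starting point for a naturally aligned entry? */
int index_is_aligned(unsigned long index, unsigned int order)
{
	return (index & (order_to_indices(order) - 1)) == 0;
}
```

By the same arithmetic, the 1GiB pages mentioned earlier would be order-18 entries.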

This new functionality is implemented in the multi-order radix tree using a pair of techniques: sibling entries and elevated entries. The smallest multi-order entry is an order-1 entry that covers two adjacent indices. This is implemented by inserting a special "sibling pointer" for the second index that points back to the actual radix-tree entry.

[sibling pointer]

In this case, lookups, removals, and tag operations on both the base index and on the index for the sibling operate on the same actual entry in a way that is transparent to the user. For orders greater than 1, there can simply be multiple sibling entries that all point back to the actual radix-tree entry:

[multi-order radix 2]

With a multi-level radix tree, there can be up to three different types of pointers. The first are internal pointers, which point from a parent radix-tree node to a child radix-tree node. The second are sibling pointers, which point from one entry in a given radix-tree node to another entry in that same node. The third are the void * data pointers that are stored as part of the key-value store.

The lowest bit of the radix-tree entry, RADIX_TREE_INTERNAL_NODE, is used to distinguish between the void * data pointers and the two types of pointers internal to the radix-tree implementation. Sibling pointers and internal node-to-node pointers are distinguished by looking at the value of the pointer itself. If the pointer points within the same node's slots array, it is a sibling pointer. If not, it points to a child radix-tree node.
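
A user-space model of that classification might look like the following. The structure layout, the INTERNAL_BIT macro (standing in for RADIX_TREE_INTERNAL_NODE), and the function names are simplifications made for this example; the kernel's own helpers differ in detail.

```c
#include <assert.h>
#include <stdint.h>

#define MAP_SIZE     64		/* assumed slots per node */
#define INTERNAL_BIT 1UL	/* stand-in for RADIX_TREE_INTERNAL_NODE */

struct node {
	void *slots[MAP_SIZE];
};

enum entry_kind { ENTRY_DATA, ENTRY_SIBLING, ENTRY_CHILD };

/* Tag a pointer as internal to the tree implementation. */
void *make_internal(void *ptr)
{
	assert(((uintptr_t)ptr & INTERNAL_BIT) == 0);	/* needs alignment */
	return (void *)((uintptr_t)ptr | INTERNAL_BIT);
}

/* Decide what kind of pointer is stored in one slot of 'parent'. */
enum entry_kind classify(const struct node *parent, void *entry)
{
	uintptr_t raw = (uintptr_t)entry;
	uintptr_t first = (uintptr_t)&parent->slots[0];
	uintptr_t end = first + sizeof(parent->slots);

	if (!(raw & INTERNAL_BIT))
		return ENTRY_DATA;	/* user's void * value */
	raw &= ~(uintptr_t)INTERNAL_BIT;
	if (raw >= first && raw < end)
		return ENTRY_SIBLING;	/* points into this node's slots */
	return ENTRY_CHILD;		/* points at another node */
}
```

The scheme depends on data pointers and nodes being at least two-byte aligned, so that the low bit is free to carry the tag.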

If the fan-out of the radix tree happens to match the order of the multi-order entry, it can be represented using an elevated entry that simply lives as a child of one of the intermediate nodes in the tree:

[multi-order radix 3]

Elevated multi-order entries can be the children of intermediate nodes at any level in the tree. Combining these two techniques allows us to have elevated multi-order entries with sibling pointers:

[multi-order radix 4]

Having both sibling entries and elevated entries allows the radix tree to support multi-order entries of any order.
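
How the two techniques combine for a given order can be worked out with a little arithmetic. The sketch below assumes a fan-out of 64 (six index bits per level) and is a model of the placement rule rather than the kernel's insertion code: the entry is elevated by one level for each whole multiple of six in its order, and the remainder determines how many slots (the real entry plus its siblings) it occupies at that level.

```c
#include <assert.h>

#define MAP_SHIFT 6	/* assumed: 64 slots per node */

struct placement {
	unsigned int level;	/* 0 = bottom level of the tree */
	unsigned int slots;	/* real entry plus sibling entries */
};

/* Where does an entry of the given order live, and how wide is it? */
struct placement place_entry(unsigned int order)
{
	struct placement p;

	assert(order < sizeof(unsigned long) * 8);
	p.level = order / MAP_SHIFT;
	p.slots = 1U << (order % MAP_SHIFT);
	return p;
}
```

Under these assumptions, an order-9 DAX entry sits one level above the bottom of the tree and occupies eight slots: the entry itself and seven siblings. An order-6 entry is purely elevated, with no siblings at all.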

Radix-tree API changes

To use this new functionality, the multi-order radix tree has a few small API changes where an order parameter was needed.

    int __radix_tree_create(struct radix_tree_root *root, unsigned long index,
			    unsigned order, struct radix_tree_node **nodep,
			    void ***slotp);

    int __radix_tree_insert(struct radix_tree_root *, unsigned long index,
			    unsigned order, void *);

__radix_tree_create() is only used in one place in mm/filemap.c, and __radix_tree_insert() is a new API added by the multi-order patches. radix_tree_insert(), the old insertion API that is used by all existing code, is now defined to be:

    static inline int radix_tree_insert(struct radix_tree_root *root,
			    unsigned long index, void *entry)
    {
	    return __radix_tree_insert(root, index, 0, entry);
    }

The API for operations such as node lookup, deletion, and tag manipulation remains unchanged. This has allowed the multi-order radix tree to be implemented with very little change to existing radix-tree users.

Integration with DAX 2MiB support

I recently posted a patch that integrates the DAX code with the new multi-order radix-tree code. As can be seen from that patch, the changes needed to move from the old method for supporting 2MiB pages to the new multi-order radix-tree support are quite small.

We now insert an order-9 entry when we need to track the status of a 2MiB huge page. This is done as follows:

    error = __radix_tree_insert(page_tree, index,
                    RADIX_DAX_ORDER(pmd_entry),
                    RADIX_DAX_ENTRY(sector, pmd_entry));

RADIX_DAX_ORDER() gives us an order of 0 for 4KiB pages and an order of 9 for 2MiB pages.

For the rest of the radix-tree operations, like lookup and tag manipulation, there is no longer a need to first check for a 2MiB PMD entry as a special case. It just operates on the radix tree using the index and the radix tree will do the right thing if that index happens to map to a multi-order entry.

One last thing worth noting is that the multi-order radix tree API currently does not define a way for the user to query the order of a given entry. It is not immediately obvious whether this API is actually needed. The DAX code can infer the order of a given entry by looking at its type: RADIX_DAX_PTE or RADIX_DAX_PMD. When multi-order struct page entries start being inserted, their size can most likely be understood by looking at the page flags. However, if an API to query an entry's order proves useful, it could easily be added.

In conclusion

The multi-order radix tree patches have been present in both Andrew Morton's -mm tree and Stephen Rothwell's linux-next tree for several weeks; they were pushed upstream during the 4.7 merge window. The integration between DAX PMD entries and the new multi-order radix-tree code will be merged in 4.8 or later, and will need to take into account the recent DAX page-fault-locking patch series from Jan Kara. The combination of the multi-order radix-tree patches and the locking changes will allow DAX to have locks on a per-page basis, regardless of the size of the page.

DAX will most likely be the first user of the new multi-order capability of the radix tree, but these changes should be interesting to anyone who deals with multiple page sizes. The transparent-huge-page code could probably make use of this new functionality, and it is likely that other users will spring up over time.

Patches and updates

Kernel trees

Greg KH Linux 4.5.5
Greg KH Linux 4.4.11
Sasha Levin Linux 4.1.25
Sasha Levin Linux 3.18.34
Greg KH Linux 3.14.70
Jiri Slaby Linux 3.12.60

Architecture-specific

Build system

Core kernel code

Device drivers

Device driver infrastructure

Documentation

Jani Nikula Documentation/Sphinx

Filesystems and block I/O

Andreas Gruenbacher Xattr inode operation removal

Memory management

Networking

Security-related

Miscellaneous

Page editor: Jonathan Corbet

Distributions

Should distributors disable IPv4-mapped IPv6?

By Jonathan Corbet
May 25, 2016
By all accounts, the Internet's transition to IPv6 has been a slow affair. In recent years, though, perhaps inspired by the exhaustion of the IPv4 address space, IPv6 usage has been on the rise. There is a corresponding interest in ensuring that applications work with both IPv4 and IPv6. But, as a recent discussion on the OpenBSD mailing list has highlighted, a mechanism designed to ease the transition to an IPv6 network may also make the net less secure — and Linux distributions may be configured insecurely by default.

Address mapping

IPv6 may look like IPv4 in many ways, but it is a different protocol with a different address space. Server programs wanting to receive connections using either protocol must thus open separate sockets for the two different address families — AF_INET for IPv4, and AF_INET6 for IPv6. In particular, a program wishing to accept connections to any of a host's interfaces using either protocol will need to create an AF_INET socket bound to the all-zeroes wild-card address (0.0.0.0) and an AF_INET6 socket bound to the IPv6 equivalent (written as "::"). It must then listen for connections on both sockets — or so one would think.

Many years ago, in RFC 3493, the IETF specified a mechanism by which a program could work with either protocol using a single IPv6 socket. With a socket enabled for this behavior, the program need only bind to :: to receive connections to all interfaces with both protocols. When an IPv4 connection is made to the bound port, the source address will be mapped into IPv6 as described in RFC 2373. So, for example, a program using this mode would see an incoming connection from 192.168.1.1 as originating from ::ffff:192.168.1.1 (the mixed notation is how such addresses are ordinarily written). The program can also open connections to IPv4 addresses by mapping them in the same manner.
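
The mapping itself is mechanical, as the following sketch shows: the first 80 bits of the IPv6 address are zero, the next 16 are ones, and the final 32 bits carry the IPv4 address. The function names are invented for this example; the library calls (inet_pton(), inet_ntop(), and the IN6_IS_ADDR_V4MAPPED() macro) are the standard ones.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

/* Build the IPv4-mapped IPv6 form (::ffff:a.b.c.d) of an IPv4 address. */
struct in6_addr map_v4_to_v6(struct in_addr v4)
{
	struct in6_addr v6;

	memset(&v6, 0, sizeof(v6));
	v6.s6_addr[10] = 0xff;
	v6.s6_addr[11] = 0xff;
	memcpy(&v6.s6_addr[12], &v4.s_addr, 4);	/* already network order */
	return v6;
}

/* Turn dotted-quad text into the text form of the mapped address.
 * Returns 0 on success, -1 on bad input or a too-small buffer. */
int mapped_text(const char *dotted, char *out, socklen_t size)
{
	struct in_addr v4;
	struct in6_addr v6;

	assert(dotted && out);
	if (inet_pton(AF_INET, dotted, &v4) != 1)
		return -1;
	v6 = map_v4_to_v6(v4);
	if (!IN6_IS_ADDR_V4MAPPED(&v6))
		return -1;
	return inet_ntop(AF_INET6, &v6, out, size) ? 0 : -1;
}
```

A server that logs or filters on peer addresses can use IN6_IS_ADDR_V4MAPPED() in the same way to notice that a connection arriving on its IPv6 socket is really an IPv4 one.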

The RFC calls for this behavior to be implemented by default, so most systems do so. There are exceptions, though, one of which is OpenBSD; there, programs wishing to work with both protocols can only do so by creating two independent sockets. A program that opens two sockets on Linux, though, will run into trouble: both the IPv4 and the IPv6 socket will try to bind to the IPv4 address(es), so whichever attempt comes second will fail. In other words, a program that binds a socket to a given port on :: will be bound to that port on both the IPv6 :: and the IPv4 0.0.0.0. If it then tries to bind an IPv4 socket to the same port on 0.0.0.0, the operation will fail as the port is already bound.

There is a way around that problem, of course; the program can call setsockopt() to turn on the IPV6_V6ONLY option. A program that opens two sockets and sets IPV6_V6ONLY should be portable across all systems.
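
A minimal sketch of the portable approach follows. The function name open_dual_sockets() is made up for the example, error handling is reduced to a single failure path, and the sockets are only bound, not put into the listening state. If the requested port is zero, the kernel picks a port for the IPv6 socket and the IPv4 socket is then bound to the same one.

```c
#define _GNU_SOURCE
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Bind an IPv6-only socket to [::]:port and a separate IPv4 socket to
 * 0.0.0.0 on the same port.  On success, fds[0] is the IPv6 socket and
 * fds[1] the IPv4 one, and 0 is returned; on failure, -1. */
int open_dual_sockets(unsigned short port, int fds[2])
{
	struct sockaddr_in6 a6;
	struct sockaddr_in a4;
	socklen_t len = sizeof(a6);
	int on = 1;

	assert(fds);
	fds[0] = socket(AF_INET6, SOCK_STREAM, 0);
	fds[1] = socket(AF_INET, SOCK_STREAM, 0);
	if (fds[0] < 0 || fds[1] < 0)
		goto fail;

	/* Set before bind(): keep IPv4 traffic off the IPv6 socket. */
	if (setsockopt(fds[0], IPPROTO_IPV6, IPV6_V6ONLY,
		       &on, sizeof(on)) != 0)
		goto fail;

	memset(&a6, 0, sizeof(a6));
	a6.sin6_family = AF_INET6;
	a6.sin6_addr = in6addr_any;
	a6.sin6_port = htons(port);
	if (bind(fds[0], (struct sockaddr *)&a6, sizeof(a6)) != 0)
		goto fail;
	if (getsockname(fds[0], (struct sockaddr *)&a6, &len) != 0)
		goto fail;

	memset(&a4, 0, sizeof(a4));
	a4.sin_family = AF_INET;
	a4.sin_addr.s_addr = htonl(INADDR_ANY);
	a4.sin_port = a6.sin6_port;	/* same port, network byte order */
	if (bind(fds[1], (struct sockaddr *)&a4, sizeof(a4)) != 0)
		goto fail;
	return 0;

fail:
	if (fds[0] >= 0)
		close(fds[0]);
	if (fds[1] >= 0)
		close(fds[1]);
	fds[0] = fds[1] = -1;
	return -1;
}
```

Without the setsockopt() call, the second bind() is the one that would be expected to fail on a Linux system with address mapping enabled, for the reason described above.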

Readers may be less than thoroughly shocked to learn that not every program out there gets all of this right. One of those, it turns out, is the OpenNTPD implementation of the Network Time Protocol. Brent Cook recently proposed a small patch adding the requisite setsockopt() call to the upstream OpenNTPD source, which lives within OpenBSD itself. That patch does not look likely to be accepted, though, for the most OpenBSD-like of reasons.

Security concerns

As mentioned above, OpenBSD does not support IPv4-mapped IPv6 sockets at all. Even if a program tries to explicitly enable address mapping by setting the IPV6_V6ONLY option to zero, its author will be disappointed; that setting has no effect on OpenBSD systems. The reasoning behind this decision is that this mapping brings some security concerns with it. There are various types of attack surface that it opens up, but it all comes down to the provision of two different ways to reach the same port, each with its own access-control rules.

Any given server system may have set up firewall rules describing the allowed access to the port in question. There may also be mechanisms like TCP wrappers or a BPF-based filter in place, or a router on the net could be doing its own stateful connection filtering. The result is likely to be gaps in firewall protection and the potential for all kinds of confusion resulting from the same IPv4 address being reachable via two different protocols. If the address mapping is done at the edge of the network, the situation gets even more complex; see this draft RFC from 2003 for a description of some other attack scenarios that come about if mapped addresses are transmitted between hosts.

Adapting systems and software to properly handle IPv4-mapped IPv6 addresses can certainly be done. But that adds to the overall complexity of the system, and it's a sure bet that this adaptation has not actually been done anywhere near as widely as it should be. As Theo de Raadt put it:

Sometimes people put a bad idea into an RFC. Later they discover it is impossible to walk the idea back to the garbagebin. The result is concepts so complicated that everyone has to be a fulltime expert, on admin side and coder side.

It is not at all clear how many of these full-time experts are actually out there configuring systems and networks where IPv4-mapped IPv6 addresses are in use.

One might well argue that, while IPv4-mapped IPv6 addresses create security hazards, there should be no harm in changing a program so that it turns off address mapping on systems that implement it. But Theo argues that this should not be done, for a couple of reasons. The first is that there are many broken programs out there, and it will never be possible to fix them all. But the real reason is to put pressure on distributors to turn off address mapping by default. As he put it: "Eventually someone will understand the damage is systematic, and change the system defaults to 'secure by default'."

Address mapping on Linux

On Linux systems, address mapping is controlled by a sysctl knob called net.ipv6.bindv6only; it is set to zero (enabling address mapping) by default. Administrators (or distributors) can turn off mapping by setting this knob to one, but they would be well advised to be sure that their applications all work properly before deploying such a system in production. A quick survey suggests that none of the primary distributors change the default for this knob; Debian changed the default for the "squeeze" release in 2009, but the change broke enough packages (anything involving Java, for example) that it was, after a certain amount of Debian-style discussion, reverted. It would appear that quite a few programs rely on address mapping being enabled by default.
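The current setting can be inspected without any special tools, since Linux exposes sysctl knobs as files under /proc/sys. The Python helper below is a sketch; its name and its path parameter are assumptions made for this example.

```python
from typing import Optional


def read_bindv6only(path: str = "/proc/sys/net/ipv6/bindv6only") -> Optional[int]:
    """Return the value of net.ipv6.bindv6only, or None if it cannot be read.

    Hypothetical helper for illustration. A result of 0 means that new IPv6
    sockets start out with address mapping enabled; 1 means IPv6-only.
    """
    try:
        with open(path, "r", encoding="ascii") as knob:
            return int(knob.read().strip())
    except (OSError, ValueError):
        # Not Linux, no IPv6 support, or unexpected file contents.
        return None


if __name__ == "__main__":
    print("net.ipv6.bindv6only =", read_bindv6only())
```

The knob only sets the default; a program that calls setsockopt() with IPV6_V6ONLY still gets the behavior it asks for on its own sockets.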

OpenBSD has the freedom to break things outside of its core system in the name of "secure by default"; Linux distributors tend to have a harder time getting away with such changes. So those distributors, being generally averse to receiving abuse from their users, are unlikely to change the default of the bindv6only knob anytime soon. The good news is that this functionality has been the default for years and stories of exploits are hard to find. But, as we all know, that provides no guarantees that exploits are not possible.

Brief items

Release for CentOS Linux 6.8

CentOS has released version 6.8 of its enterprise distribution. "There are many fundamental changes in this release, compared with the past CentOS Linux 6 releases, and we highly recommend everyone study the upstream Release Notes as well as the upstream Technical Notes about the changes and how they might impact your installation." See the "Further Reading" section of the CentOS release notes for links to upstream notes.

Stable 1.0.0 release of liveslak

Eric Hameleers has announced the release of liveslak 1.0.0. "The “1.0.0” marker is not the end of its development of course. It means that I consider the project production-ready. It will be used to create Live Editions of Slackware 14.2 (64bit and 32bit) when that is released. There’s still some more ideas for liveslak that I want to implement and those will become available as 1.x releases."

Oracle Linux Release 6 Update 8

Oracle has released Oracle Linux 6.8. There are three available kernel packages: kernel-uek-4.1.12-37.4.1.el6uek for x86-64, kernel-uek-2.6.39-400.278.2.el6uek for i386, and the Red Hat compatible kernel-2.6.32-642.el6 for i386 and x86-64.

UnitedRPMs Project

The UnitedRPMs Project has been launched. Similar to RPMFusion, UnitedRPMs provides software and addons that are not in the main Fedora repositories, such as multimedia codecs and applications. UnitedRPMs seeks to maintain a solution for people running Fedora Rawhide by creating a Copr-like build system for packages with licensing problems.

Distribution News

Debian GNU/Linux

Debian Installer Stretch Alpha 6 release

The 6th alpha release of the Debian Stretch installer is available for testing. Debian Pure Blends can now be enabled directly from the software selection screen.

Newsletters and articles of interest

Distribution newsletters

All About the DC/OS Open Source Project (Linux.com)

Linux.com talks with Keith Chambers about DC/OS. "The DC/OS project is a software platform that’s comprised entirely of open source technologies. It includes some existing technologies like Apache Mesos and Marathon, which were always open source, but also includes newer proprietary components developed by Mesosphere that we’ve donated to the community and which are fully open sourced under an Apache 2.0 license. Features include easy install of DC/OS itself (including all the components), plus push-button, app-store-like installation of complex distributed systems (including Apache Spark, Apache Kafka, Apache Cassandra and more) via our Universe “distributed services app store”. We’re also tightly integrating our popular Marathon container-orchestration technology right into DC/OS, as the default method for managing Docker containers and other long-running services (including traditional non-containerized web applications, as well stateful services such as databases)."

Rebellin Linux Offers Best of Both Gnome Worlds (LinuxInsider)

LinuxInsider reviews Rebellin Linux. "The Rebellin line avoids the pitfalls that befall many Debian GNU/Linux derivatives. It does not maintain a warehouse full of desktop versions. It is neither a minimalistic Linux line nor a distro stuffed with bloat from packages typical users will never need. Instead, it comes well loaded with essential applications, drivers and codecs to provide a very uncomplicated out-of-the-box user experience. The Rebellin Linux project is very beginner-friendly."

Page editor: Rebecca Sobol

Development

Lost user questions and GitHub

By Nathan Willis
May 25, 2016

OSCON

GitHub is currently the dominant hosting service for open-source software projects. As such, the company is in a unique position to observe and report on trends across a wide swath of the open-source community. At OSCON 2016 in Austin, GitHub's Rachel Berry presented a unique look at the lessons learned by GitHub's support team—"unique" because that team catches a surprisingly high number of support questions from users unable to contact the projects themselves. "There are thousands of questions that aren't making it into your issue tracker where they belong," she said. "The question is how you can capture that and turn these people into your user community."

Berry described her position as "technical supportocat." She works on a team of thirteen who answer questions about the GitHub platform, API, Git in general, and essentially anything else not related to user-account management (which has a separate team). GitHub has about 14 million active user accounts, not to mention all of the users without accounts. They generate about 300 questions a day for the team. Over the past seven years, she said, the team has logged more than 27,000 questions about GitHub-hosted projects, and the frequency seems to be on the increase. She reported that more than 4,000 project-related questions had come in since she submitted her OSCON talk proposal a few months ago.

"These are people who contact us instead of contacting you," she said. "These are questions about your projects." Looking at those requests, some common threads emerge. The first is that 57% of the project-related questions come from people without GitHub accounts—a number that she and several in the audience said was surprisingly high. The question-askers break down into three broad groups, she said: the people who cannot figure out how to contact you, the people who have contacted your project but have never heard back, and the people who are either too embarrassed or too intimidated to contact your project.

When people don't know how to contact you

The first category of people is the one with the easiest solution: put contact information everywhere. But even that solution can entail some subtlety. Berry showed some brief, anonymous snippets from the confusing support questions the team encounters in this category, including those that dive right into the particulars of some technical, project-specific detail. But in one of those snippets, Berry highlighted a phrase: "Command prompt window says to report the error to github.com." Clearly, the prompt's instructions were not detailed enough, at least for that user.

The best practice Berry recommends is to have a formal contact page on your project's web site. But she noted that too many project sites feature a "contact us" button that links to the project's issue tracker on GitHub. Any user who is not already familiar with GitHub and issue trackers, though, will find it baffling to suddenly be taken to a new site. Worse still, some projects link their contact button to GitHub's "new issue" form. And, for any user not logged in to a GitHub account, clicking on that link takes them to a generic GitHub account-creation page instead of the intended target. It is hardly surprising, then, that many people landing on such a page see the "contact us" link in the page footer, and send their request to GitHub support.

The right approach is to separate contacting the project from opening an issue. She pointed to several projects that do an excellent job at this task: Bower, for example, provides multiple contact options (StackOverflow, IRC, Twitter, and mailing lists). RethinkDB also provides multiple options, though it loses a few points by calling the relevant page "community," which new users might not look for.

Regardless of what one puts on the project web site, Berry said, the same information should be duplicated in the GitHub repository's README file. That is because the GitHub README is frequently the user's entry point, rather than the web site. Thus, it should point visitors to the information they seek. All-too-common problems on GitHub include projects burying their contact information at the bottom of a multi-page project description README, or having a blank (or nearly blank) README file. GitHub's server logs indicate that most visitors who file support requests spend less than one minute on the repository page before contacting support, she said; projects need to optimize for that and put their contact information where it cannot be missed.

People who never hear back from your project

The second group of support requests comes from people who tried to reach out to the project and never heard back. Clearly, this lack of interaction is not beneficial for the user or the project. In some cases, the user's attempt to contact the developers is foiled by the project owner turning off the GitHub issue-tracking feature entirely. That should almost never be done, Berry said. Perhaps it is a viable approach for mothballed projects but, even then, the fact that the project is not accepting issues should be explained in the README.

Opening an issue and never hearing back is frustrating for the user, of course. But the simple reality is that many developers work on their GitHub projects in their limited free time—she referred the audience to Michael Bromley's recent "Why I haven't fixed your issue yet" blog post. She proposed three strategies to make responding to issues less arduous and, therefore, reduce the amount of lag time experienced by users who open bug reports.

The first is GitHub's issue-template system. This is a new feature of the site, by which projects can create a file named ISSUE_TEMPLATE.md that will automatically be inserted into the new-issue form. Projects can use this to help ensure that bug reports are sufficiently detailed and include as much relevant information as possible. Similarly, the second suggestion is GitHub's saved-replies feature. This allows users to save text snippets that they can then insert into issue discussion threads, to (for example) ask for clarifications, request more information, or acknowledge a new issue.

The third suggestion is to have a dedicated bug-triage team. That is a hard approach for smaller projects, she admitted, because of the time it requires. But she encouraged even small projects to try it, as a part of the "support-driven development" strategy. And most developers begin by doing their own bug triage, she noted, only stopping when the number of issues becomes too large. At that point, she said, it is worth considering that "bug triager" can be an important job for new team members, and an opportunity for mentoring incoming contributors.

People too intimidated to try contact

The final group of people who contact GitHub with support requests about hosted projects is those who are either too embarrassed or too intimidated to contact the project themselves. Here again, Berry showed some anonymous quotes from such support requests. One noted that a developer "checked the code and added a commit, and is asking me to merge his sample code, which is what I don't understand." Another said they were interested in contributing to a project, however "I can't find the process to follow." In both cases, what the project is missing is a set of instructions for the user to follow.

At the very least, she said, projects ought to have a CONTRIBUTING file in their repository that describes the procedures to follow when a bug is found, when the user wants to fix a bug or add a feature, and when the user wants to ask a question. If a file named CONTRIBUTING exists in the repository, GitHub will provide a link to it along with a short message at the top of each new-issue page. These procedures can also be documented on GitHub Pages and project wikis, she said. It can be time-consuming, she agreed, "but I guarantee it has returns for a long time to come."

It is also useful to tag issues with labels like "first-timers only" to encourage new contributors. Berry also recommended an "up-for-grabs" tag, since few things are as discouraging as starting work on a bug only to discover at the end that someone else has fixed it and the time was wasted. There are several initiatives promoting this sort of project-management approach, she said, like up-for-grabs.net and First Timers Only. Both are designed to help new contributors get the hang of the project's preferred workflow.

It is also helpful to give detailed feedback on pull requests, at least to those users who really want it. She noted that it can burn out developers to invest time responding to a pull request and get nothing in reply, as Sindre Sorhus had said on Twitter. Berry suggested creating a GitHub saved reply that could be used to test whether or not the user who made the pull request was interested in doing more. It could include a few quick notes and a detailed next step, then end with "I can help you if you want, but this could be time-consuming. Let me know if you want to proceed." She referred the audience to an issue at the ReactiveUI repository as a good example.

Ultimately, she said, the goal of this sort of interaction is to keep the new contributor coming back for more. That is a big problem space involving many factors—in essence, it is the focus of the entire community-management discipline. But making your project's community the best that it can be is a critical part of making your project the best it can be, she said. "There are people out there trying to participate, just waiting for you to meet them."

Brief items

Quotes of the week

The whole point was to make free software attractive to business by de-emphasising the whole "freedom" part of it. Instead, OSI promised that by making your software open source, you would have better software, that open source was a better development model, leading to cheaper, less buggy software.

The "cheaper model" thing is also still a fairly popular meme nowadays. When you look at free projects in Ohloh.com, one of the lines is how much money it would have cost to build this or that under some model called COCOMO.

I'm not trying to say that OSI is right or wrong about its promises. Some free software really is less buggy than non-free variants. It probably is way cheaper to develop Linux when all of the big companies chip in a few developers here and there to maintain it. All I'm saying is that we have forgotten that with the word "open source", certain promises came attached to it. Some of these promises might even appear to be broken in some cases.

Jordi Gutiérrez Hermoso (Thanks to Paul Wise.)

Documentation gives your project super powers. It is a gift to the people of today, who don't have to answer a bunch of questions, and a gift to the people of tomorrow, who can look at why things were done and how.
— VM Brasseur at OSCON 2016

Roundcube Webmail 1.2.0 released

Version 1.2.0 of the Roundcube web-based email system has been released. The headline feature this time around would appear to be support for encrypted mail with PGP; the encryption can be handled either centrally in the server, or in the browser via the "Mailvelope" browser plugin. A complete list of changes can be found in the changelog.

systemd v230 is available

Version 230 of systemd has been released. Among the changes, DNSSEC is now turned on by default in systemd-resolved, systemd-logind will, by default, terminate user processes that are part of the user session scope unit (session-XX.scope) when the user logs out, and support has been added for the unified control group hierarchy added in kernel 4.5.

GNU make 4.2 released

Version 4.2 of GNU make has been released. Changes noted in the NEWS file include a new variable $(.SHELLSTATUS), "set to the exit status of the last != or $(shell ...) function invoked in this instance of make. This will be "0" if successful or not "0" if not successful.", the ability to query the parallelism in use through MAKEFLAGS (even when the job server is enabled), and more exact reporting of where errors and warnings are encountered.

GitLab 8.8 released with Pipelines and .gitignore templates

GitLab 8.8 has been released with pipeline visualization, .gitignore templates, the GitLab Container Registry, and more. "In this release, we are supercharging GitLab CI. First with Pipelines and now with GitLab Container Registry. GitLab Container Registry is a secure and private registry for Docker images. It isn't just a standalone registry; it's completely integrated with GitLab. In fact, our container registry is actually the first Docker registry that is fully-integrated with git repository management and comes out of the box with GitLab 8.8. So if you've upgraded, you already have it! Our integrated Container Registry requires no additional installation. It allows for easy upload and download of images from GitLab CI. And it's free."

Newsletters and articles

Development newsletters from the past week

Berkus: Changing PostgreSQL Version Numbering

On his blog, Josh Berkus asks about the effects of changing how PostgreSQL numbers its releases. There is talk of moving from an x.y.z scheme to an x.y scheme, where x would increase every year to try to reduce "the need to explain to users that 9.5 to 9.6 is really a major version upgrade requiring downtime". He is wondering what impacts that will have on users, tools, scripts, packaging, and so on. "The problem is the first number, in that we have no clear criteria when to advance it. Historically, we've advanced it because of major milestones in feature development: crash-proofing for 7.0, Windows port for 8.0, and in-core replication for 9.0. However, as PostgreSQL's feature set matures, it has become less and less clear on what milestones would be considered "first digit" releases. The result is arguments about version numbering on the mailing lists every year which waste time and irritate developers."

Repurposing Old Smartphones for Home Automation (Linux.com)

Linux.com has an interview with Dietrich Ayala about using old smartphones for home automation. "Ayala spent a lot of time studying the readouts from sensors, as well as from the phone’s microphone, camera, and, radios, that would enable a remote user to draw conclusions about what was happening at home. This contextual information could then be codified into more useful notifications. With ambient light, for example, if it suddenly goes dark in the daytime, maybe someone is standing over a device, explained Ayala. Feedback from the accelerometer can be analyzed to determine the difference between footsteps, an earthquake, or someone picking up the device. Scripts can use radio APIs to determine if a person moving around is carrying a phone with a potentially revealing Bluetooth signature."

Page editor: Nathan Willis

Announcements

Calls for Presentations

Join us for the first OpenPGP conference

OpenPGP.conf will take place September 8-9 in Cologne, Germany. "OpenPGP.conf is a conference for users and implementers of the OpenPGP protocol, the popular standard for encrypted email communication and protection of data at rest. That protocol is the foundation of encryption software like PGP, GnuPG, Mailvelope, OpenKeyChain, and others." The call for papers ends June 15.

CFP Deadlines: May 26, 2016 to July 25, 2016

The following listing of CFP deadlines is taken from the LWN.net CFP Calendar.

Deadline | Event Dates | Event | Location
May 29 | September 20-23 | PyCon JP 2016 | Tokyo, Japan
May 30 | September 13-16 | PostgresOpen 2016 | Dallas, TX, USA
June 3 | June 24-25 | French Perl Workshop 2016 | Paris, France
June 4 | July 30-31 | PyOhio | Columbus, OH, USA
June 5 | September 26-27 | Open Source Backup Conference | Cologne, Germany
June 5 | September 9-10 | RustConf 2016 | Portland, OR, USA
June 10 | August 25-26 | Linux Security Summit 2016 | Toronto, Canada
June 11 | October 3-5 | OpenMP Conference | Nara, Japan
June 15 | September 8-9 | First OpenPGP conference | Cologne, Germany
June 15 | November 16-17 | Paris Open Source Summit | Paris, France
June 20 | September 9-11 | Kiwi PyCon 2016 | Dunedin, New Zealand
June 22 | September 19-23 | Libre Application Summit | Portland, OR, USA
June 26 | October 11-13 | Embedded Linux Conference Europe | Berlin, Germany
June 30 | November 29-December 2 | Open Source Monitoring Conference | Nürnberg, Germany
July 7 | November 14-16 | PGConfSV 2016 | San Francisco, CA, USA
July 13 | October 25-28 | OpenStack Summit | Barcelona, Spain
July 15 | October 12 | Tracing Summit | Berlin, Germany
July 15 | September 7-9 | LibreOffice Conference | Brno, Czech Republic
July 15 | October 11 | Real-Time Summit 2016 | Berlin, Germany
July 22 | October 7-8 | Ohio LinuxFest 2016 | Columbus, OH, USA
July 24 | September 20-21 | Lustre Administrator and Developer Workshop | Paris, France

If the CFP deadline for your event does not appear here, please tell us about it.

Upcoming Events

SciPy 2016 Conference Tutorials and Talks Announced

SciPy takes place July 11-17 in Austin, TX. "This year's 3 major talk tracks include Python in Data Science, High Performance Computing, and general Scientific Computing. Our six mini-symposia include: Earth and Space Science, Engineering, Medicine and Biology, Case Studies in Industry, Education, and Reproducibility."

EuroPython 2016

EuroPython 2016 will take place July 17-24 in Bilbao, Spain. Keynote speakers Jameson Rollins and Rachel Willmer have been announced. Jameson, a staff scientist in the LIGO project, based at the California Institute of Technology, will present "LIGO - The Dawn of Gravitational Wave Astronomy". Rachel has been working at the "bleeding edge" of technology for 30 years, as programmer, network engineer, manager, and startup founder. Her keynote is titled "30 years of Fun & Profit Through Technology".

There will be a complete PyData track at EuroPython. The PyData track will be part of the EuroPython 2016 conference, so you won’t need to buy extra tickets to join. There will be more than 30 talks, 5 training and 2 poster sessions dedicated to PyData on July 21-22.

Events: May 26, 2016 to July 25, 2016

The following event listing is taken from the LWN.net Calendar.

Date(s) | Event | Location
May 1-June 29 | Open Source Innovation Spring | Paris, France
May 26 | NLUUG - Spring conference 2016 | Bunnik, The Netherlands
May 28-June 5 | PyCon 2016 | Portland, OR, USA
June 1-2 | Apache MesosCon | Denver, CO, USA
June 4-5 | Coliberator 2016 | Bucharest, Romania
June 11-12 | Linuxwochen Linz | Linz, Austria
June 11 | TÜBIX 2016 | Tübingen, Germany
June 14-15 | PyData Paris 2016 | Paris, France
June 19-21 | DockerCon | Seattle, WA, USA
June 20-23 | OPNFV Summit | Berlin, Germany
June 21-25 | Third Julia Conference | Cambridge, MA, USA
June 21-22 | Deutsche OpenStack Tage | Köln, Deutschland
June 21-24 | Open Source Bridge | Portland, OR, USA
June 21-28 | Wikimania | Esino Lario, Italy
June 22-26 | openSUSE Conference 2016 | Nürnberg, Germany
June 22-24 | USENIX Annual Technical Conference | Denver, CO, USA
June 23-July 1 | DebCamp | Cape Town, South Africa
June 24-25 | French Perl Workshop 2016 | Paris, France
June 24 | Swiss PostgreSQL Day | Rapperswil, Switzerland
June 24-25 | devopsdays Silicon Valley 2016 | Mountain View, CA, USA
June 24-25 | Hong Kong Open Source Conference 2016 | Hong Kong, Hong Kong
June 27-July 1 | 12th Netfilter Workshop | Amsterdam, Netherlands
June 27-July 1 | Hack in Paris | Paris, France
July 2-9 | DebConf16 | Cape Town, South Africa
July 8-9 | Texas Linux Fest | Austin, TX, USA
July 11-17 | SciPy 2016 | Austin, TX, USA
July 13-15 | ContainerCon Japan | Tokyo, Japan
July 13-14 | Automotive Linux Summit | Tokyo, Japan
July 13-15 | LinuxCon Japan | Tokyo, Japan
July 14-16 | REST Fest UK 2016 | Edinburgh, UK
July 17-24 | EuroPython 2016 | Bilbao, Spain

If your event does not appear here, please tell us about it.

Page editor: Rebecca Sobol


Copyright © 2016, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds