
LWN.net Weekly Edition for May 7, 2015

FontForge and moving forward

By Nathan Willis
May 6, 2015

LGM 2015

The annual Libre Graphics Meeting conference provides a regular update on the development of a number of free-software applications for visual graphics and design work. At this year's event in Toronto, one of the most talked-about sessions was Dave Crossland's take on the future of FontForge, the free-software font-editing tool.

Despite all of the progress that FontForge has made in the past few years, he said, recent developments have convinced him that the momentum lies elsewhere. In particular, newer projects started by dissatisfied FontForge users are likely to close the feature gap, and the type-design community seems more interested in engaging with those efforts. Perhaps predictably, that assessment of the current state of affairs sparked quite a bit of discussion—including a debate on the relative merits of desktop versus web-based applications.

Crossland began by recounting his personal experience with FontForge. He discovered the project while studying graphic design in college, and subsequently used it during his graduate work in the University of Reading's type-design program. Needless to say, virtually every other student used one of the proprietary, commercial font-design applications instead, and FontForge proved difficult to use in a number of ways.

[Dave Crossland at LGM 2015]

It was after that experience that Crossland got directly involved in FontForge's development. Working with Google's Web Fonts office (which exclusively uses open fonts and relies on an open-source toolchain) he was able to fund some contractors to improve FontForge. He personally underwrote additional development with his share of the proceeds from Crafting Type, a series of type-design workshops that he co-hosted.

The workshops used FontForge, and the development work that they prompted led to important bug fixes and new features (such as collaborative editing). They also led to a significantly improved packaging situation for Windows and Mac users. Nevertheless, even after several hundred students went through the workshops, only a tiny fraction of them stayed with FontForge (in the post-talk discussion, Crossland estimated the number was in the single digits).

FontForge's technical debt was clearly a problem. The application had been initially developed as a one-off project by a developer who later lost interest and left the project. Although it more or less worked, the code was not organized in a way that let new developers get involved; it contained a lot of hard-coded design decisions that were difficult to change, and it relied on a custom widget set that was hard to maintain.

Crossland cited one example for reference: when users asked that a certain text-entry box be made elastic in size rather than fixed-width (so it could expand to fill the available horizontal space), implementing the change took more than three hours. That was far from an ideal situation but, he said, it also shed light on a deeper problem: users would not stick with the project and help make FontForge better, because contributing small design patches or even observational notes was difficult, if not impossible.

Freedom to compete

Meanwhile, a growing collection of new free-software font editors has been cropping up, the vast majority of them implemented as web applications. He cited four in particular: Typism, Fontclod, Prototypo, and Glyphr Studio. These web-based tools are not as full-featured as FontForge, he said, but they are developing at a rapid clip and, even more importantly, they are attracting considerable input and involvement from working type designers.

Thus, when Crossland got involved with the Metapolator project (a whole-font-interpolation program that can be used, for example, to rapidly generate multiple weights from one font), he pushed that team to adopt a similar model: build a web-based application, and solicit input from type designers. That strategy has been successful enough, he said, that he decided he cannot justify making further contributions to FontForge.

The latest round of FontForge development has given the application robust support for importing and exporting the Unified Font Object (UFO) interchange format. Soon, users will be able to create a basic font in FontForge, interpolate its weight and other properties in Metapolator, then perform any additional tweaks in FontForge.

But he expects Prototypo, Glyphr Studio, and other UFO editors to catch up to FontForge's functionality; that and the already existing ecosystem of open-source UFO scripts and utilities (most of which originate from users of other, proprietary font editors) may make FontForge irrelevant. "It seems like a lot of people want a free-software font editor and get so frustrated with FontForge that they leave and start their own," he said in summary. "Maybe we need to work on 'conservation' of FontForge rather than 'restoration' work trying to turn it into a modern editor."

Moreover, the web-based applications have demonstrated an ability to draw end users into the development and design process—something that desktop applications rarely, if ever, achieve. One of the ultimate goals of free software's "four freedoms" is to enable the user to participate in development, Crossland said. The newer, web-based font-development applications can do so in a way that FontForge has never been able to. JavaScript and CSS are easier to understand and tweak than are C and C++.

Feedback

At the end of the session, a number of audience members took issue with what they interpreted as advice from Crossland for developers to stop working on FontForge. For example, one audience member noted that he had looked at Glyphr Studio and found it to be far behind FontForge in terms of its supported features.

Another audience member suggested that if a shortage of contributors was holding FontForge back, then the project should figure out what it needs and run a public crowdfunding campaign to raise support. Crowdfunding is an approach that has been increasingly successful for free-software graphics applications of late, and Crossland had mentioned successful Kickstarter drives by the competing web-based font editors in his talk.

To those points, Crossland responded that the technical debt of FontForge is beyond what a crowdfunding campaign can raise to fix, and that the web-based editors may be behind today, but are catching up quickly. Furthermore, he added, C-based desktop applications still separate users and developers into disjoint classes. Type designers are required to learn CSS and JavaScript as part of their jobs, so they can easily get involved with web-application development. "The ideal cultural impact of software freedom is co-creation. We want a proactive culture of design and development, and I don't think traditional desktop software is the ideal way to create that."

Øyvind "Pippin" Kolås from GIMP then disagreed strongly with the notion that web-development languages could compete with C and C++. CSS and JavaScript user interfaces rely on levels of abstraction to keep things simple, he said; the core development underneath is just as complex in a "web stack" as it is in native code.

It is easy to get started writing something new (regardless of the development platform); but over time, he said, web applications will become just as complicated as native ones—if not worse, given the abstraction layers required. Thus, ditching FontForge in favor of partially completed web applications—just to solicit UI patches from type designers—is throwing the baby out with the bathwater. The community has something that works now, while the alternatives do not do much of anything by comparison.

Crossland replied that he was not trying to make a blanket statement about native development, much less telling anyone to stop working on GIMP, Scribus, or other desktop tools. He conceded Kolås's point about the underlying complexity of applications, regardless of the programming stack. He would also be happy to see others push FontForge development forward, he added; he just cannot justify it for himself. As time on the clock was running out, FontForge developer Frank Trampe cheerfully assured the crowd that he was continuing to contribute to the project, which got a round of applause.

Because the session was the last talk of the day, the discussion carried over into the evening event that followed. Ultimately, most people seemed to come away with a clearer understanding of the distinct points under debate, which had been a bit conflated at the outset. The size and scope of the technical debt in FontForge is one issue; the broader competition between web-development and native application stacks is a separate one. Crossland later posted a FAQ entry on the Metapolator wiki explaining both his concerns and Kolås's in more detail.

The larger question of when, if ever, a project accumulates too much technical debt to remain manageable was one that resonated with a lot of attendees at the event. There were, in fact, several other web-based applications on display in the other sessions, some of which are taking on established free-software desktop application projects. Technical debt and support problems for aging code can plague any project; no one seems to have a silver bullet.

It was also noted in the discussion that web and desktop programming platforms have begun to overlap in a number of ways. GNOME Shell is scripted in JavaScript, for example, while GTK+3 and Qt's QML both rely on CSS-like styling. On the other hand, Mozilla and Google are both exploring approaches (such as asm.js and Native Client) that bring C and C++ to web applications. As hard as it may be to predict where FontForge and Glyphr Studio will be in a year's time, it is clear that the tug-of-war between desktop and web development is far from over.

Comments (9 posted)

A Libre Graphics Meeting showcase summary

By Nathan Willis
May 6, 2015

LGM 2015

Every year, there are a variety of talks at Libre Graphics Meeting that showcase entirely new work. While these new-project sessions frequently highlight still-in-development ideas or code that may not be quite ready for packaging and mass distribution, they are always a fascinating counterpoint to the progress reports from established application projects. At this year's event, the new projects showed off work designed to help crowdsource image collection, to do desktop publishing with HTML, and to transform 3D meshes into 2D images and back again.

The List

Matt Lee, technical lead at Creative Commons, introduced The List in a session on the first day of the event. The List is, in essence, a social-networking system in which users post public requests for images, and other users respond by posting matching images—images they have created from scratch, photographs they have taken, physical artwork they have scanned, and so forth. The project uses a free-software Android app as its interface, with an iOS app to follow shortly.

[Matt Lee at LGM 2015]

The crux of the problem that The List sets out to solve is that everyone needs images to communicate, but few people have the skills to create high-quality imagery (even among people for whom open content is a priority). The main example Lee discussed was contributions to Wikipedia; his talk was (not coincidentally) scheduled right after one from a Wikipedia volunteer who described that project's efforts to generate higher-quality SVG illustrations.

The app was developed with support from the Knight Foundation, and ultimately aims to be useful for a range of purposes, including collecting images for journalists, non-profits, and cultural institutions. The goal is a lofty one; harnessing the collective power of crowds to find obscure or out-of-print cultural works, for example, is arguably a higher calling than filling in missing photographs of buildings on Wikipedia. Creative Commons also plans to use The List app as a means to explain the values behind Creative Commons itself: at first startup, users are given a walkthrough of Creative Commons licenses before they are presented with the categorized list of requested images.

At the end of the talk, Lee outlined several of the challenges facing the project team. Internationalizing the requests for images is a hard problem, he said, relying as it does on human language and cultural context. There is also concern about whether or not the average smartphone camera will produce photos good enough to meet the needs of the people making image requests. The List's backend is not limited to smartphone photos, of course (all of the uploaded images are stored at The Internet Archive, which can handle essentially any file type). A web or application-based interface could easily allow the upload of other image types. The future of The List may include support for these other upload types—among other features, like enabling the device's geographic location to alert the user to nearby image-taking opportunities.

Finally, there is the question of image and request moderation. Balancing the desire for an open community against the Internet's tendency to attract trolls is a difficult challenge, to be sure. But it is one that Creative Commons, Wikipedia, and the Internet Archive have all been managing for several years now.

html2print

On the second day of the conference, Stéphanie Vilayphiou of Open Source Publishing (OSP) presented a talk about html2print, its tool for doing print-ready page layout using HTML and CSS. Printable web documents are not a new idea, of course, but many of the contemporary projects rely heavily on the HTML5 <canvas> element. That makes the resulting output more like a drawing than a text document: individual elements are not accessible through the document object model (DOM), and there is no separation between the code and the design.

[Stéphanie Vilayphiou at LGM 2015]

Html2print relies on some lightweight CSS tools (such as the preprocessor LessCSS). It defines page dimensions (including paper size and margins) as CSS properties, so that they can be adjusted and inspected easily, and allows the creation of re-flowable text elements using CSS Regions. It even allows the user to define CMYK spot colors as separate "inks" that can then be "mixed" in individual elements (e.g., 40% of chartreuse_1 and 15% eggshell_2). That feature is particularly useful for print work, because by default HTML and CSS only support RGB color at 8-bit-per-channel granularity.

OSP likes to use collaborative editors like Etherpad for all of its projects, and html2print is designed to support that. It lets multiple users edit a large project simultaneously. It also supports what she called "responsive print" (a reference to responsive web design). She showed a demonstration using a booklet project: resizing the page-size parameters automatically adjusts other properties (like font size) to match. Large-format, book-like dimensions trigger a lot of text per page, while scaling the page dimensions down to pocket size proportionally scales the text so that its printed size will remain readable.

Vilayphiou explained a few of the challenges that OSP had to tackle along the way, such as implementing a new zoom tool. Relying on the web browser's zoom function was a bad idea, she said, because zooming invariably triggers a re-render operation, which often causes difficult-to-fix artifacts. The zoom tool was more or less a solved problem, she said, but others have proven trickier—such as the inherent instability of the various browser engines. Two years ago, for instance, Chrome worked perfectly, but then the project dropped WebKit for its own Blink engine and discontinued support for CSS Regions.

Vilayphiou closed out the session by noting that html2print does not yet have a proper license attached, which she said was simply an accident. She encouraged anyone with interest in preparing print documents for publication to take a look and offer feedback, although at this stage new users might need to set time aside to learn their way around the tool.

Embedding 3D data in 2D images

[Phil Langley at LGM 2015]

Architect Phil Langley presented what he called a "digital steganography" project in his session on day three. Steganography is often associated with hiding information, he said, but the term originally just referred to embedding one message inside another. The most common example is encoding a string inside the least-significant bits of an RGB image, but Langley showed several other real-world techniques, such as an audio track embedded in a photograph, and concealing one text message within another. He also noted that depth-map images such as those produced by the Microsoft Kinect camera are actually encoding 3D data in a 2D form.

That final example leads into the project that Langley undertook. A course he teaches at the Sheffield School of Architecture involves advanced 3D modeling, and he recently set up a Twitter bot to send out status reports as the software progressed through its calculations. The trouble was that tweets are only 140 characters long, which is hardly enough to explain progress in detail. So the team set out to hack a solution that used a different feature instead. Twitter allows large images to be attached to Tweets, so the team looked for a way to encode their 3D meshes as 2D images.

The initial attempt was fairly straightforward. Each face of a typical 3D model is a triangle, and each vertex of that triangle includes three coordinates (x,y,z). They tried mapping the (x,y,z) triples to RGB values, thus using three pixels to represent each triangle. While this worked, it did not provide much resolution: 8-bit-per-channel graphics allowed just 256 units in each spatial dimension, which is quite coarse.
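
What follows is a minimal sketch of that naive mapping, in C; the normalization against a bounding box and the image layout are assumptions made for illustration, not Langley's actual code.

    /* Naive encoding sketch: scale each (x,y,z) vertex into 0..255 and
     * store it as one RGB pixel, three pixels per triangle. */
    #include <stdint.h>

    struct vec3 { float x, y, z; };

    static uint8_t scale255(float v, float min, float max)
    {
        return (uint8_t)(255.0f * (v - min) / (max - min));
    }

    static void vertex_to_pixel(struct vec3 v, struct vec3 lo, struct vec3 hi,
                                uint8_t rgb[3])
    {
        rgb[0] = scale255(v.x, lo.x, hi.x);   /* x -> red   */
        rgb[1] = scale255(v.y, lo.y, hi.y);   /* y -> green */
        rgb[2] = scale255(v.z, lo.z, hi.z);   /* z -> blue  */
    }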

[A 3D mesh and corresponding 2D PNG]

After further refinement, they settled on a plan that used the 24 bits of an RGB pixel to encode vertex coordinates more efficiently. The 3D model is first sliced into thin horizontal bands; each band has far fewer units in the z direction, so more resolution is available for the x and y coordinates. Furthermore, the resulting image somewhat resembles a 2D projection of the model.

This encoding scheme allowed the class's Twitter bot to generate and send PNG images that could quickly be converted into 3D models. The compression ratio was impressive, too: Langley showed a 6.7KB PNG image that represented a 156.8KB file in the STereoLithography (STL) format.

[Mesh manipulated in GIMP]

But that was not the end of the experimentation. Once the image-based format was working, Langley and the students started experimenting with altering the PNGs in GIMP and other image-editing tools. Some image transformations would have a predictable effect on the resulting 3D model (such as stretching or compressing), while others were more chaotic. There were practical uses to treating the models as 2D images, too—Langley showed a tool that graphed the changes in a model over time by plotting each step's 2D image together in a sequence.

These three sessions might not be described as a representative sample of LGM 2015, consisting, as they do, of such wildly different projects. But they do provide a glimpse into what makes the conference interesting every year. Html2print has clear application for those in the publication-design field, while Langley's 3D-to-2D object transformation is notably more experimental, and The List presents an interesting new take on consumer-grade content creation. Together, they are nothing if not thought-provoking.

Comments (2 posted)

Video editing and free software

By Nathan Willis
May 6, 2015

LGM 2015

Two talks at the 2015 Libre Graphics Meeting in Toronto came from video-editing projects. One was an update from Natron, a relatively young project that deals with video compositing, while the other was a reflection on ten years' worth of development on the general-purpose non-linear editor (NLE) Pitivi. Both are active projects, but they take two markedly different approaches: one aims to support an existing industry standard, while the other must build its core functionality from the ground up.

Natron

Alexandre Gauthier-Foichat gave the first report of the two, describing progress in the Natron visual effects (VFX) compositing application. Gauthier-Foichat began with a brief discussion of the project's background: it is developed at the French Institute for Research in Computer Science and Automation (INRIA), which means "it is funded by people's taxes," and the team consists of developers, a vision scientist, and several visual effects artists, among others.

[Alexandre Gauthier-Foichat]

Natron's 1.0 release was made in December 2014, and the project has attracted considerable attention in recent months—presenting at SIGGRAPH and other conferences. In January, it won a "best innovation" award at the inaugural Paris Images Digital Summit. Gauthier-Foichat gave a preview of what has been happening on the development side—work that is expected to be released as Natron 2.0 by the end of May.

Natron provides a node-based editing environment in which users can visually construct an effects pipeline (by connecting processing nodes on a canvas) that is then used to process video. It implements an industry-standard plugin interface called OpenFX that is supported by a wide variety of other VFX applications. That well-supported standard helps Natron gain acceptance, and not just through compatibility with other applications' plugins. The reality of large productions like television and movies requires small VFX studios to collaborate and exchange data, so many of the studios write their own scripts and tools to be compatible with OpenFX, too.

[Node editing in Natron 2]

For the upcoming 2.0 release, the Natron team has reworked the user interface, in particular making the node-connection window easier to use. The team also spent a lot of time talking to VFX artists, who began bringing feature requests to the project as Natron grew in popularity. The number one request, he said, was for "interaction with 3D," but it took considerable research to narrow that request down into a specific feature set. Natron 2.0 will feature 3D support in the form of interaction with Blender, allowing users to place depth into the arrangement of the images and clips that make up their scene, rather than treating all of the scene elements as flat pieces on the same plane.

It will also add new scripting features. Python scripting (using Python 3) will be supported, but not just for automating tasks. The new version will be the first to feature scriptable nodes—meaning that a video element can be manipulated by custom code in the middle of a processing pipeline. Natron's implementation will adhere to another widely used industry standard, SeExpr.

He showed several examples of SeExpr scripts in action, including an interactive lighting tool that lets users move illumination sources around the screen and immediately see updated results—without having to re-run the rendering process. Pixar offers a similar tool called LPics, he said, but Natron does the same thing in just a few lines of code. Still more work is yet to come, including G'MIC filter support and support for CUDA hardware acceleration.

Pitivi

Jean-François Fortin Tam presented the second of the talks, about the GTK+-based NLE Pitivi, which recently passed its tenth birthday.

[Jean-François Fortin Tam]

Ten years is a noteworthy milestone, Fortin Tam pointed out. Free-software NLE projects have a habit of starting strong and then dying out; he showed a timeline graphic that depicted the rise and fall of many other free-software NLEs over the past decade. NLEs have developed their own form of the classic "pick any two" conundrum, he said: users can choose a subset of "soon," "cheap," and "complete," but they cannot have all three. "You can get something fairly usable, but if you want it in the next decade, you need to pay people to work on it."

This is a problem, he said, because there is ultimately no business model that supports building a free-software NLE. Even a successful Kickstarter campaign can only generate enough funds to pay "a McDonald's salary" for three to six months. The reason is that less than one percent of computer users use Linux, and of those less than one percent use NLEs. As if one percent of one percent wasn't bad enough, he estimated only a fraction of that number would ever donate more than a dollar toward development.

But financing is not the only hurdle. There are "DevOps challenges" facing an NLE project, too. Paid or not, most teams consist of only one to three people, which is nothing compared to the hundreds employed by each proprietary NLE company. Furthermore, he said, the technology making up the development platform keeps changing underneath the project.

He then played a clip from the ending of Rambo III on which Pitivi-related dialog had been superimposed. In Fortin Tam's version of the scene, two protagonists on foot are congratulating each other on having completed a port to GStreamer Editing Services, only to be surprised by the arrival of an ever-increasing horde of armed adversaries in tanks and helicopters—labeled "GTK+3," "GObject Introspection," "GStreamer 1.0," and so on—who demand that they "throw down their old libraries" in surrender. Keeping up with the pace of the desktop application stack, it seems, is not easy for Pitivi's small team. The code has undergone several rewrites in its ten-year history.

Then again, he continued, the video industry changes rapidly, too. Ten years ago, capturing video from external analog and digital sources was critical; today it is ancient history. High-definition has given way to 4K, 8K, and 3D video in rapid succession, and users get mad when their HD movies take more than five minutes to render. "When I was young," he said, "I was happy if it took less than a day." On the plus side, users "no longer have to take out a mortgage" to make a quality movie, and computing power has "gone through the roof."

The big question is how a project like Pitivi can make forward progress under such circumstances. Fortin Tam said that the project's strategy is to focus on enhancing the core, not the user interface. The next release will "finally kill off" GNonLin (the NLE library that originated in the GStreamer project) in favor of a new, integrated editing layer. Design flaws in GNonLin have held back the Pitivi core for years, and the project finally decided that fixing the library was not possible.

Moving forward, he said, Pitivi has to ignore legacy support issues (such as external capture) and focus solely on the modern video-editing workflow. It has to ignore the frequent calls for collaborative-editing features as well, he said: people often say they want them, but surveys of working NLE users find that no one actually uses the collaboration features in other products.

The project also needs to focus on being a storytelling tool, he said, and not try to be every part of the production workflow from caption editor to VFX compositor to render farm. That means adjusting to the notion of being part of a "creative suite" of applications that the users will use, he said—like making sure Pitivi works with Natron. There are still technical features that need addressing, like color management and hardware acceleration, he said. But while the future may hold amazing new features, too ("3D video, holograms, and ponies," he said), focusing on letting users tell their stories is what will make Pitivi relevant for the next ten years.

One topic that Fortin Tam did not directly address in his talk was the influence that Blender has on the various efforts to develop an NLE. Like it or not, Blender has steadily expanded the scope of its built-in toolset, which has had the side effect of stealing away a large number of Linux NLE users. The talk schedule for the week included several sessions led by video artists describing their work; the vast majority use Blender for editing and exporting their output.

Video is a rapidly changing area for software developers. Natron has grown into a popular and stable application in a short amount of time (a bit less than two years). But it also has the advantage of sticking to the widely used OpenFX specification—so it gains a lot of functionality for free, so to speak—and it has a larger potential user base by virtue of being cross-platform. Pitivi does not have either luxury; after ten years and several under-the-hood rewrites, the project may finally have found a solid footing on which to build—but it will still be a challenge.

Comments (11 posted)

Page editor: Jonathan Corbet

Security

SpamAssassin 3.4.1 released

By Jonathan Corbet
May 6, 2015
One occasionally sees articles suggesting that the volume of spam on the net is in decline, but nobody would be foolish enough to argue that the spam problem has gone away. Industrial-strength spam-filtering tools are still a necessity for anybody whose email address is known by more than about two other people. For the minority of us who have not given in and moved to Gmail, SpamAssassin tends to be the spam-filtering tool of choice. In recent years it sometimes seems like the spammers are moving more quickly than the SpamAssassin project, so the announcement of the Apache SpamAssassin 3.4.1 release on April 30 — the first in over a year — is naturally of interest. A version-number bump from 3.4.0 to 3.4.1 would not seem to indicate major changes, but, in truth, the SpamAssassin developers have been busy.

The "auto whitelist" (AWL) feature of SpamAssassin has long been one of that program's more annoying aspects. In theory it tracks the emails from each sender to get an overall sense of whether they are trustworthy; email from a trusted source will get a bonus score, while messages from apparent spammers will be penalized. The sad truth of the matter, in your editor's experience, is that a spammer need only get a small number of messages through to convince the AWL that everything else should be whitelisted. If SpamAssassin's other scoring mechanisms were perfect, this kind of AWL corruption would not be a problem — but then the AWL would not be needed at all. In a world where scoring is imperfect, the AWL often seems to make things worse.

In 3.4.1, the SpamAssassin developers have tried to address some of the problems with the AWL by replacing it with a new mechanism called TxRep. The basic idea remains the same: track each sender's activity and adjust the score of new messages toward the mean of what has been seen in the past. But a number of useful changes have been made in how this tracking is done, starting with an expansion of the set of data that is used. TxRep maintains reputation scores for the sending email address (as did the AWL), but also the sending domain name, the IP addresses of the originating system and the server that transferred the message, and the "HELO" string used by the last server. For any given message, each of these quantities is mixed in with its own (user-configurable, naturally) weight.

Another useful change is that the sa-learn utility (until now used only with the Bayesian filter) can be used to train TxRep, so the same command now works to update both filters. There is a "dilution" mechanism that causes newer messages to have more influence on a sender's score than older ones, making the system more responsive should, say, a spammer repent and start actually sending useful stuff (or should TxRep initially misjudge a sender). TxRep can be used to whitelist (or blacklist) senders or IP addresses outright — something that might be worth doing automatically for the most obvious of spam or for messages that have been explicitly classified by the recipient. There is also a mechanism to automatically whitelist the recipients of outgoing mail — though that could have undesired effects if one is prone to sending irate responses to spammers.
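
To give a rough feel for the averaging idea, the sketch below shows a dilution-weighted running mean in C; it is an illustration only, not SpamAssassin's code, and all of the names and constants are invented.

    /* Illustration of a dilution-weighted running mean: each new message
     * counts for more than the (gradually discounted) history behind the
     * stored reputation. */
    struct reputation {
        double mean;     /* average score seen from this sender so far */
        double weight;   /* how much discounted history stands behind it */
    };

    static double update_reputation(struct reputation *r, double msg_score,
                                    double dilution /* e.g. 0.95 */)
    {
        r->weight = r->weight * dilution + 1.0;         /* old history fades */
        r->mean  += (msg_score - r->mean) / r->weight;  /* pull toward new score */
        return r->mean;     /* used to nudge the score of the next message */
    }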

With these changes, TxRep should be able to avoid some of the worst AWL pitfalls, though the documentation still recommends against turning on auto-learning until SpamAssassin as a whole has been tuned well. But the whole thing still seems to be built around the idea that people can be spammers part of the time and senders of legitimate email at others. Perhaps your editor is an excessively unforgiving character, but it seems like the sender of known spam should not get off lightly with a gradual tweaking of a reputation score; once a spammer, always a spammer. Trust is hard to earn but easy to lose; the TxRep mechanism still doesn't quite reflect that fact.

The PDFInfo module, which has long existed outside of the SpamAssassin mainline, has now been merged; PDFInfo, as its name suggests, looks for spammy PDF attachments. There is one other new module, URILocalBL, which allows the blacklisting of spammy links using a local database.

SpamAssassin 3.4.1 can do a more thorough and careful job of normalizing all messages to the UTF-8 character set before applying rules. That should help to eliminate various tricks using strange character sets to get around the spam-checking rules.

An interesting addition to the Bayesian filter is the ability to hash MIME attachments and use the result as a filter token. If it works well, it should allow the filter to recognize often-repeated spam payloads as a whole. But, as the manual page notes, "not much experience has yet been gathered regarding its usefulness". It seems worth a try, in any case.

Beyond all of this work, of course, is the constant challenge of maintaining the rule base in the face of a changing spammer landscape. Spammers may now be more concerned with getting past Gmail's filters than SpamAssassin, but there are still signs that a subset of spam has been tested against SpamAssassin until the rules are unable to stop it. The Bayesian filter helps with that problem, but so does an ongoing effort to keep those rules current. It is thus unsurprising that a new SpamAssassin release contains a long list of rule changes that should help to keep its effectiveness up — until the spammers work around those as well.

Your editor has often heard the complaint that email is reaching a point of complete uselessness. Such claims overstate the reality — one need only watch how email keeps our development communities going to see that. But email has been under attack for many years, making life harder for both email users and those who are charged with running email systems. It is fair to say that SpamAssassin is one of a small set of tools that has helped email to survive the ongoing spammer onslaught, so it is good to see this tool continuing to evolve.

Comments (6 posted)

Brief items

Security quote of the week

Now, if Mozilla is willing to establish a cadre of bonded and insured site design experts willing to perform https conversions for such sites without charge, and help maintain them forevermore, well, I'd certainly be interested in having that conversation. That's using the carrot approach I mentioned earlier.

On the other hand, plans to try use those sharp sticks and prods to try bully these sites into the https camp like cattle -- well, if you think the world has a mixed view of technologists now, if Mozilla gets its way we'll end up with a positive rating on par with politicians -- if we're lucky.

I very much want to see an Internet where all communications are securely encrypted, but only if it's done the right way, with sites and users treated as valued partners with a full understanding of their resource constraints and sensibilities -- and not as "losers" to be treated with what amounts fundamentally to arrogant contempt.

Lauren Weinstein

Comments (22 posted)

Unboxing Linux/Mumblehard: Muttering spam from your servers (WeLiveSecurity)

WeLiveSecurity reports that ESET researchers have revealed a family of Linux malware that stayed under the radar for more than 5 years. They are calling it Linux/Mumblehard. "There are two components in the Mumblehard malware family: a backdoor and a spamming daemon. They are both written in Perl and feature the same custom packer written in assembly language. The use of assembly language to produce ELF binaries so as to obfuscate the Perl source code shows a level of sophistication higher than average. Monitoring of the botnet suggests that the main purpose of Mumblehard seems to be to send spam messages by sheltering behind the reputation of the legitimate IP addresses of the infected machines."

Comments (15 posted)

Mozilla: Deprecating Non-Secure HTTP

The Mozilla community has declared its intent to phase out "non-secure" (not encrypted with TLS) web access. "Since the goal of this effort is to send a message to the web developer community that they need to be secure, our work here will be most effective if coordinated across the web community. We expect to be making some proposals to the W3C WebAppSec Working Group soon."

Comments (137 posted)

New vulnerabilities

ax25-tools: denial of service

Package(s): ax25-tools     CVE #(s):
Created: April 30, 2015     Updated: May 6, 2015
Description: From the Fedora advisory:

Fixed crash when processing ROSE packets (by rose-fix patch)

Alerts:
Fedora FEDORA-2015-6517 ax25-tools 2015-04-30
Fedora FEDORA-2015-6424 ax25-tools 2015-04-30

Comments (none posted)

clamav: multiple vulnerabilities

Package(s): clamav     CVE #(s): CVE-2015-2170 CVE-2015-2221 CVE-2015-2222 CVE-2015-2668
Created: May 4, 2015     Updated: May 13, 2015
Description: From the Arch Linux advisory:

CVE-2015-2170 (denial of service): A flaw has been found in the UPX decoder with crafted files. During unpacking there are two range checks which are implemented "manually". Those checks lack the detection of overflows which are considered by the CLI_ISCONTAINED() macro.

CVE-2015-2221 (denial of service): Y0da cryptor / protector is a PE file encryptor - the executable file is decrypted on start up. Clamav is able to decrypt such files in order to scan them. As part of the decryptor there is an op code emulator. A special crafted file may contain a jump op code to a position that already has been interpreted - which leads to an endless loop. This leads to an endless loop in clamav itself.

CVE-2015-2222 (denial of service): Petite is a tool for compressing PE files on windows. Clamav is a virus scanning tool which is able to unpack such files during scanning. Once the file has been identified as "petite" compressed before the decompressing process is started it is possible that a specially crafted file tells clamav to read more data than it allocated memory. On glibc it leads to SIGABRT on free() since glibc's malloc() recognizes this.

CVE-2015-2668 (denial of service): A flaw has been discovered that is leading to an infinite loop condition on a crafted "xz" archive file.

Alerts:
Gentoo 201512-08 clamav 2015-12-30
Fedora FEDORA-2015-7378 clamav 2015-05-12
Ubuntu USN-2594-1 clamav 2015-05-05
Mandriva MDVSA-2015:221 clamav 2015-05-04
openSUSE openSUSE-SU-2015:0906-1 clamav 2015-05-19
Mageia MGASA-2015-0190 clamav 2015-05-05
Fedora FEDORA-2015-7334 clamav 2015-05-02
Arch Linux ASA-201505-2 clamav 2015-05-04
Debian-LTS DLA-233-1 clamav 2015-05-28

Comments (1 posted)

curl: information leak

Package(s): curl     CVE #(s): CVE-2015-3153
Created: April 30, 2015     Updated: May 28, 2015
Description: From the Debian advisory:

It was discovered that cURL, an URL transfer library, if configured to use a proxy server with the HTTPS protocol, by default could send to the proxy the same HTTP headers it sends to the destination server, possibly leaking sensitive information.

Alerts:
openSUSE openSUSE-SU-2015:0861-1 curl 2015-05-12
Ubuntu USN-2591-1 curl 2015-04-30
Debian DSA-3240-1 curl 2015-04-29
Arch Linux ASA-201505-20 curl 2015-05-28

Comments (none posted)

DirectFB: two vulnerabilities

Package(s): DirectFB     CVE #(s): CVE-2014-2977 CVE-2014-2978
Created: April 30, 2015     Updated: January 23, 2017
Description: From the CVE entries:

Multiple integer signedness errors in the Dispatch_Write function in proxy/dispatcher/idirectfbsurface_dispatcher.c in DirectFB 1.4.13 allow remote attackers to cause a denial of service (crash) and possibly execute arbitrary code via the Voodoo interface, which triggers a stack-based buffer overflow. (CVE-2014-2977)

The Dispatch_Write function in proxy/dispatcher/idirectfbsurface_dispatcher.c in DirectFB 1.4.4 allows remote attackers to cause a denial of service (crash) and possibly execute arbitrary code via the Voodoo interface, which triggers an out-of-bounds write. (CVE-2014-2978)

Alerts:
Gentoo 201701-55 DirectFB 2017-01-23
SUSE SUSE-SU-2015:0839-1 DirectFB 2015-05-08
Mandriva MDVSA-2015:223 directfb 2015-05-04
Mageia MGASA-2015-0176 directfb 2015-05-03
openSUSE openSUSE-SU-2015:0807-1 DirectFB 2015-04-30

Comments (none posted)

dnsmasq: information disclosure

Package(s): dnsmasq     CVE #(s): CVE-2015-3294
Created: May 5, 2015     Updated: December 18, 2015
Description: From the CVE entry:

The tcp_request function in Dnsmasq before 2.73rc4 does not properly handle the return value of the setup_reply function, which allows remote attackers to read process memory and cause a denial of service (out-of-bounds read and crash) via a malformed DNS request.

Alerts:
Gentoo 201512-01 dnsmasq 2015-12-17
Mageia MGASA-2015-0214 dnsmasq 2015-05-12
openSUSE openSUSE-SU-2015:0857-1 dnsmasq 2015-05-12
Debian DSA-3251-2 dnsmasq 2015-05-07
Debian DSA-3251-1 dnsmasq 2015-05-05
Ubuntu USN-2593-1 dnsmasq 2015-05-04
Debian-LTS DLA-225-1 dnsmasq 2015-05-20

Comments (none posted)

elasticsearch: directory traversal

Package(s): elasticsearch     CVE #(s): CVE-2015-3337
Created: April 30, 2015     Updated: May 6, 2015
Description: From the Debian advisory:

John Heasman discovered that the site plugin handling of the Elasticsearch search engine was susceptible to directory traversal.

Alerts:
Debian DSA-3241-1 elasticsearch 2015-04-29

Comments (none posted)

erlang: man-in-the-middle attack

Package(s): erlang     CVE #(s): CVE-2015-2774
Created: May 6, 2015     Updated: February 22, 2016
Description: From the Mageia advisory:

Erlang's TLS-1.0 implementation failed to check padding bytes, leaving it vulnerable to an issue similar to POODLE.

Alerts:
openSUSE openSUSE-SU-2016:0523-1 erlang 2016-02-20
Fedora FEDORA-2015-12970 erlang 2015-08-18
Fedora FEDORA-2015-12923 erlang 2015-08-18
Mageia MGASA-2015-0192 erlang 2015-05-05

Comments (none posted)

fcgi: denial of service

Package(s): fcgi     CVE #(s): CVE-2012-6687
Created: April 30, 2015     Updated: March 3, 2016
Description: From the Red Hat bugzilla:

A stack-smashing bug for fcgi was reported to Ubuntu and subsequently patched in both Ubuntu and Debian.

According to the bug report, if more than 1024 connections are received, a segfault can occur.

A patch is provided with the bug reports:

https://bugs.launchpad.net/ubuntu/+source/libfcgi/+bug/933417

and the report at debian:

https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=681591

Alerts:
Mageia MGASA-2016-0089 perl-FCGI 2016-03-02
Debian-LTS DLA-431-1 libfcgi-perl 2016-02-25
Debian-LTS DLA-430-1 libfcgi 2016-02-25
Mandriva MDVSA-2015:226 fcgi 2015-05-04
Mageia MGASA-2015-0184 fcgi 2015-05-03
Fedora FEDORA-2015-1790 fcgi 2015-04-30
Fedora FEDORA-2015-1803 fcgi 2015-04-30

Comments (none posted)

FlightGear: unspecified vulnerability

Package(s): FlightGear     CVE #(s):
Created: April 30, 2015     Updated: September 30, 2015
Description: From the Fedora advisory:

This update provides a security fix related to the Nasal scripting language.

From the Debian LTS advisory:

It was discovered that flightgear, a Flight Gear Flight Simulator game, did not perform adequate filesystem validation checks in its fgValidatePath routine.

Alerts:
Debian-LTS DLA-318-1 flightgear 2015-09-29
Fedora FEDORA-2015-6441 FlightGear-data 2015-04-30
Fedora FEDORA-2015-6441 FlightGear 2015-04-30

Comments (none posted)

ikiwiki: cross-site scripting

Package(s): ikiwiki     CVE #(s): CVE-2015-2793
Created: May 4, 2015     Updated: May 6, 2015
Description: From the Red Hat bugzilla:

Cross-site scripting flaw in the handling of the openid_identifier parameter has been fixed in ikiwiki.

Alerts:
Fedora FEDORA-2015-6815 ikiwiki 2015-05-03
Fedora FEDORA-2015-6806 ikiwiki 2015-05-03

Comments (none posted)

libphp-snoopy: command execution

Package(s): libphp-snoopy     CVE #(s): CVE-2014-5008
Created: May 4, 2015     Updated: December 1, 2015
Description: From the Debian advisory:

It was discovered that missing input sanitizing in Snoopy, a PHP class that simulates a web browser may result in the execution of arbitrary commands.

Alerts:
Red Hat RHSA-2017:0211-01 nagios 2017-01-31
Red Hat RHSA-2017:0212-01 nagios 2017-01-31
Red Hat RHSA-2017:0213-01 nagios 2017-01-31
Red Hat RHSA-2017:0214-01 nagios 2017-01-31
Debian-LTS DLA-357-1 libphp-snoopy 2015-11-30
Debian DSA-3248-1 libphp-snoopy 2015-05-02

Comments (none posted)

openstack-glance: denial of service

Package(s): openstack-glance     CVE #(s): CVE-2014-9684 CVE-2015-1881
Created: May 6, 2015     Updated: May 6, 2015
Description: From the CVE entries:

OpenStack Image Registry and Delivery Service (Glance) 2014.2 through 2014.2.2 does not properly remove images, which allows remote authenticated users to cause a denial of service (disk consumption) by creating a large number of images using the task v2 API and then deleting them before the uploads finish, a different vulnerability than CVE-2015-1881. (CVE-2014-9684)

OpenStack Image Registry and Delivery Service (Glance) 2014.2 through 2014.2.2 does not properly remove images, which allows remote authenticated users to cause a denial of service (disk consumption) by creating a large number of images using the task v2 API and then deleting them, a different vulnerability than CVE-2014-9684. (CVE-2015-1881)

Alerts:
Red Hat RHSA-2015:0938-01 openstack-glance 2015-05-06

Comments (none posted)

owncloud: multiple vulnerabilities

Package(s): owncloud     CVE #(s): CVE-2015-3011 CVE-2015-3012 CVE-2015-3013
Created: May 4, 2015     Updated: May 6, 2015
Description: From the Debian advisory:

CVE-2015-3011: Hugh Davenport discovered that the "contacts" application shipped with ownCloud is vulnerable to multiple stored cross-site scripting attacks. This vulnerability is effectively exploitable in any browser.

CVE-2015-3012: Roy Jansen discovered that the "documents" application shipped with ownCloud is vulnerable to multiple stored cross-site scripting attacks. This vulnerability is not exploitable in browsers that support the current CSP standard.

CVE-2015-3013: Lukas Reschke discovered a blacklist bypass vulnerability, allowing authenticated remote attackers to bypass the file blacklist and upload files such as the .htaccess files. An attacker could leverage this bypass by uploading a .htaccess and execute arbitrary PHP code if the /data/ directory is stored inside the web root and a web server that interprets .htaccess files is used. On default Debian installations the data directory is outside of the web root and thus this vulnerability is not exploitable by default.

Alerts:
Debian DSA-3244-1 owncloud 2015-05-02

Comments (none posted)

perl-xml-libxml: information disclosure

Package(s): perl-xml-libxml     CVE #(s): CVE-2015-3451
Created: May 1, 2015     Updated: September 8, 2015
Description: From the Arch Linux advisory:

Unpreserved unset options after a _clone() call (e.g: in load_xml()) leads to not preserved expand_entities. Therefore it leads to a XML-External-Entity Vulnerability.

This vulnerability may lead to the disclosure of confidential data, denial of service, port scanning from the perspective of the machine where the parser is located, and other system impacts.

Alerts:
openSUSE openSUSE-SU-2015:1506-1 perl-XML-LibXML 2015-09-07
Fedora FEDORA-2015-7258 perl-XML-LibXML 2015-05-11
Fedora FEDORA-2015-7115 perl-XML-LibXML 2015-05-10
Mandriva MDVSA-2015:231 perl-XML-LibXML 2015-05-07
Mageia MGASA-2015-0199 perl-XML-LibXML 2015-05-06
Ubuntu USN-2592-1 libxml-libxml-perl 2015-05-04
Debian DSA-3243-1 libxml-libxml-perl 2015-05-01
Debian-LTS DLA-214-1 libxml-libxml-perl 2015-04-30
Arch Linux ASA-201504-32 perl-xml-libxml 2015-04-30

Comments (none posted)

quassel: SQL injection

Package(s): quassel     CVE #(s): CVE-2015-3427
Created: May 1, 2015     Updated: May 6, 2015
Description: From the Mageia advisory:

Quassel is vulnerable to SQL injection through its use of Qt's postgres driver. If the PostgreSQL server is restarted or the connection is lost at any point, other IRC users may be able to trick the Quassel core into executing SQL queries upon reconnection.

Alerts:
Debian DSA-3258-1 quassel 2015-05-12
Mageia MGASA-2015-0175 quassel 2015-04-30
openSUSE openSUSE-SU-2015:0933-1 quassel 2015-05-24

Comments (none posted)

squid: certificate validation bypass

Package(s): squid     CVE #(s): CVE-2015-3455
Created: May 4, 2015     Updated: December 22, 2015
Description: From the Arch Linux advisory:

The flaw allows remote servers to bypass client certificate validation. Some attackers may also be able to use valid certificates for one domain signed by a global Certificate Authority to abuse an unrelated domain. However, the bug is exploitable only if you have configured Squid to perform SSL Bumping with the "client-first" or "bump" mode of operation. Sites that do not use SSL-Bump are not vulnerable.

A remote attacker is able to bypass client certificate validation, as a result malicious server responses can wrongly be presented through the proxy to clients as secure authenticated HTTPS responses.

Alerts:
openSUSE openSUSE-SU-2016:2081-1 squid 2016-08-16
Scientific Linux SLSA-2015:2378-1 squid 2015-12-21
Fedora FEDORA-2016-7b40eb9e29 squid 2016-05-06
Fedora FEDORA-2016-7b40eb9e29 libecap 2016-05-06
Oracle ELSA-2015-2378 squid 2015-11-23
Red Hat RHSA-2015:2378-01 squid 2015-11-19
openSUSE openSUSE-SU-2015:1546-1 squid 2015-09-11
Mandriva MDVSA-2015:230 squid 2015-05-06
Mageia MGASA-2015-0191 squid 2015-05-05
Arch Linux ASA-201505-1 squid 2015-05-02

Comments (none posted)

xen: information leak

Package(s): xen     CVE #(s): CVE-2015-3340
Created: May 4, 2015     Updated: May 6, 2015
Description: From the CVE entry:

Xen 4.2.x through 4.5.x does not initialize certain fields, which allows certain remote service domains to obtain sensitive information from memory via a (1) XEN_DOMCTL_gettscinfo or (2) XEN_SYSCTL_getdomaininfolist request.

Alerts:
Gentoo 201604-03 xen 2016-04-05
Mageia MGASA-2016-0098 xen 2016-03-07
Debian DSA-3414-1 xen 2015-12-09
openSUSE openSUSE-SU-2015:1092-1 xen 2015-06-22
openSUSE openSUSE-SU-2015:0983-1 xen 2015-06-02
Fedora FEDORA-2015-6583 xen 2015-05-02
Fedora FEDORA-2015-6670 xen 2015-05-02
SUSE SUSE-SU-2015:0927-1 Xen 2015-05-22
SUSE SUSE-SU-2015:0940-1 Xen 2015-05-26
SUSE SUSE-SU-2015:0944-1 Xen 2015-05-26
SUSE SUSE-SU-2015:0923-1 xen 2015-05-21

Comments (none posted)

xorg-server: denial of service

Package(s): xorg-server     CVE #(s): CVE-2015-3418
Created: May 4, 2015     Updated: May 6, 2015
Description: From the Debian LTS advisory:

This issue (CVE-2015-3418) is a regression which got introduced by fixing CVE-2014-8092. The above referenced version of xorg-server in Debian squeeze-lts fixes this regression in the following way:

The length checking code validates PutImage height and byte width by making sure that byte-width >= INT32_MAX / height. If height is zero, this generates a divide by zero exception. Allow zero height requests explicitly, bypassing the INT32_MAX check (in dix/dispatch.c).
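
Read as described, the fixed check amounts to something like the following rough sketch; this is not the literal dix/dispatch.c code.

    /* Sketch of the described check: reject oversized requests, but let
     * height == 0 through without performing the division that crashed. */
    #include <stdbool.h>
    #include <stdint.h>

    static bool putimage_length_ok(int32_t byte_width, int32_t height)
    {
        if (height != 0 && byte_width >= INT32_MAX / height)
            return false;   /* too large: the request gets BadLength */
        return true;        /* zero-height requests are allowed through */
    }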

Alerts:
Gentoo 201701-64 xorg-server 2017-01-25
Mageia MGASA-2015-0196 x11-server 2015-05-06
Debian-LTS DLA-120-2 xorg-server 2015-05-04

Comments (none posted)

Page editor: Jake Edge

Kernel development

Brief items

Kernel release status

The current development kernel is 4.1-rc2, released on May 3. "As usual, it's a mixture of driver fixes, arch updates (with s390 really standing out due to that one prng commit), and some filesystem and networking."

Stable updates: none have been released in the last week. The 4.0.2, 3.19.7, 3.14.41, and 3.10.77 updates are in the review process as of this writing. They were originally expected on May 4, but have been held up due to some problems caused by the inclusion of some ill-advised patches.

Comments (none posted)

Quotes of the week

Over the past 10 years the x86 FPU has organically grown into somewhat of a spaghetti monster that few (if any) kernel developers understand and which code few people enjoy to hack.

Many people suggested over the years that it needs a major cleanup, and some time ago I went "what the heck" and started doing it step by step to see where it leads - it cannot be that hard!

Three weeks and 200+ patches later I think I have to admit that I seriously underestimated the magnitude of the project! ;-)

Ingo Molnar embarks on a small cleanup task

PLEASE. We're not programming in Pascal (and thank all Gods for that), so we can have labels that have meaningful names. Also, we're not ashamed of using goto where it makes sense, so we don't need to try to hide the labels by making them look like specks of dirt on our monitor.
Linus Torvalds

Comments (1 posted)

Kernel development news

System call conversion for year 2038

By Jonathan Corbet
May 5, 2015
There are now less than 23 years remaining until that fateful day in January 2038 when signed 32-bit time_t values — used to represent time values in Unix-like systems — run out of bits and overflow. As that date approaches, 32-bit systems can be expected to fail in all kinds of entertaining ways and current LWN readers can look forward to being called out of retirement in a heroic (and lucrative) effort to stave off the approaching apocalypse. Or that would be the case if it weren't for a group of spoilsport developers who are trying to solve the year-2038 problem now and ruin the whole thing. The shape of that effort has come a bit more into focus with the posting by Arnd Bergmann of a new patch set (later updated) showing the expected migration path for time-related system calls.

Current Linux system calls use a number of different data types to represent times, from the simple time_t value through the timeval and timespec structures and others. Each, though, has one thing in common: an integer value counting the number of seconds since the beginning of 1970 (or from the current time in places where a relative time value is needed). On 32-bit systems, that count is a signed 32-bit value; it clearly needs to gain more bits to function in a world where post-2038 dates need to be represented.
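
As a quick illustration of exactly where that signed 32-bit count runs out (this snippet is not from the patch set):

    /* Print the last moment representable in a signed 32-bit time_t.
     * Run it on a system whose own time_t is 64 bits wide, so that
     * gmtime() itself is unaffected by the overflow being shown. */
    #include <stdint.h>
    #include <stdio.h>
    #include <time.h>

    int main(void)
    {
        time_t last = INT32_MAX;               /* 2147483647 seconds after the epoch */
        printf("%s", asctime(gmtime(&last)));  /* Tue Jan 19 03:14:07 2038 */
        return 0;
    }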

Time representations

One possibility is to simply create 64-bit versions of these time-related structures and use them. But if an incompatible change is to be made, it might be worthwhile thinking a bit more broadly; to that end, Thomas Gleixner recently suggested the creation of a new set of (Linux-specific) system calls that would use a signed, 64-bit nanosecond counter instead. This counter would mirror the ktime_t type (defined in <include/linux/ktime.h>) used to represent times within the kernel:

    union ktime {
	s64	tv64;
    };
    typedef union ktime ktime_t;		/* Kill this */

(Incidentally, the "kill this" comment was added by Andrew Morton in 2007; nobody has killed it yet.)

Having user space work with values that mirror those used within the kernel has a certain appeal; a lot of time-conversion operations could be eliminated. But Arnd Bergmann pointed out a number of difficulties with this approach, including the fact that it makes a complicated changeover even more so. The fatal flaw, though, turns up in this survey of time-related system calls posted by Arnd shortly thereafter: system calls that deal with file timestamps need to be able to represent times prior to 1970. They also need to be able to express a wider range of times than is possible with a 64-bit ktime_t. So some variant of time_t must be used with them, at least. (The need to represent times before 1970 also precludes the use of an unsigned value to extend the forward range of a 32-bit time_t value).
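
The arithmetic behind that range objection is simple enough; the snippet below is an illustration, not code from the discussion.

    /* A signed 64-bit count of nanoseconds reaches only about 292 years in
     * either direction from 1970 -- roughly the years 1678 through 2262 --
     * which is too narrow a window for arbitrary file timestamps. */
    #include <stdint.h>
    #include <stdio.h>

    int main(void)
    {
        double years = (double)INT64_MAX / 1e9     /* nanoseconds -> seconds */
                       / (365.25 * 24 * 3600);     /* seconds -> years */
        printf("about +/- %.0f years around 1970\n", years);   /* ~292 */
        return 0;
    }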

So universal use of signed nanosecond time values does not appear to be in the cards, at least not as part of the year-2038 disaster-prevention effort. Still, there is room for some simplification. The current plan is to use the 64-bit version of struct timespec (called, appropriately, struct timespec64 in the kernel, though user space will still see it as simply struct timespec) for almost all time values passed into or out of the kernel. The various system calls that use the other (older) time formats can generally be emulated in user space. So, for example, a call to gettimeofday() (which uses struct timeval) will be turned into a call to clock_gettime() before entry into the kernel. That reduces the number of system calls for which compatibility must be handled in kernel space.
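
A minimal sketch of the kind of user-space emulation being described might look like this, assuming a C library whose clock_gettime() already takes the 64-bit path; the wrapper name is invented.

    /* Emulating gettimeofday() on top of clock_gettime(); error handling
     * is kept minimal for the sake of the example. */
    #include <sys/time.h>
    #include <time.h>

    static int emulated_gettimeofday(struct timeval *tv, void *tz)
    {
        struct timespec ts;

        (void)tz;                          /* the timezone argument is obsolete */
        if (clock_gettime(CLOCK_REALTIME, &ts) != 0)
            return -1;
        tv->tv_sec  = ts.tv_sec;
        tv->tv_usec = ts.tv_nsec / 1000;   /* nanoseconds -> microseconds */
        return 0;
    }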

Thus, in the future, a 32-bit system that is prepared to survive 2038 will use struct timespec64 for all time values exchanged with the kernel. That just leaves the minor problem of how to get there with a minimal amount of application breakage. The current plan can be seen in Arnd's patch set, which includes a number of steps to move the kernel closer to a year-2038-safe mode of operation.

Getting to a year-2038-safe system

The first of those steps is to prepare to support 32-bit applications while moving the kernel's internal time-handling code to 64-bit times in all situations. The internal kernel work has been underway for a while, but the user-space interfaces still need work, starting with the implementation of a set of routines that will convert between 32-bit and 64-bit values at the system-call boundary. The good news is that these routines already exist in the form of the "compatibility" system calls used by 32-bit applications running on a 64-bit kernel. In the future, all kernels will be 64-bit when it comes to time handling, so the compatibility functions are just what is needed (modulo a few spots where other data types must be converted differently). So the patch set causes the compatibility system calls to be built into 32-bit kernels as well as 64-bit kernels. These compatibility functions are ready for use, but will not be wired up until the end of the patch series.

The next step is the conversion of the kernel's native time-handling system calls to use 64-bit values exclusively. This process is done in two broad sub-steps, the first of which is to define a new set of types describing the format of native time values in user space. For example, system calls that currently accept struct timespec as a parameter will be changed to take struct __kernel_timespec instead. By default, the two structures are (nearly) the same, so the change has no effect on the built kernel. If the new CONFIG_COMPAT_TIME configuration symbol is set, though, struct __kernel_timespec will look like struct timespec64 instead.

The various __kernel_ types are used at the system-call boundary, but not much beyond that point. Instead, they are immediately converted to 64-bit types on all machines; on 64-bit machines, obviously, there is little conversion to do. Once each of the time-related system calls is converted in this manner, it will use 64-bit time values internally, even if user space is still dealing in 32-bit time values. Any time values returned to user space are converted back to the __kernel_ form before the system call returns. There is still no change visible to user space, though.
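The fragment below is purely illustrative of the pattern described above — a hypothetical sketch following this article's description, not the actual patch code. It shows a __kernel_timespec whose layout depends on the configuration, plus the sort of widening helper used at the system-call boundary:

    /* Hypothetical sketch, not the actual patch code. */
    typedef long long time64_t;

    struct timespec64 {                  /* kernel-internal 64-bit form */
        time64_t tv_sec;
        long     tv_nsec;
    };

    struct __kernel_timespec {           /* what user space passes in */
    #ifdef CONFIG_COMPAT_TIME
        time64_t tv_sec;                 /* 64-bit seconds, as in timespec64 */
    #else
        long     tv_sec;                 /* legacy time_t on 32-bit ABIs */
    #endif
        long     tv_nsec;
    };

    /* At the system-call boundary, widen to the internal type; on a
       64-bit kernel this is effectively a straight copy. */
    static inline void widen_timespec(struct timespec64 *to,
                                      const struct __kernel_timespec *from)
    {
        to->tv_sec = from->tv_sec;
        to->tv_nsec = from->tv_nsec;
    }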

The final step is to enable the use of 64-bit time values on 32-bit systems without breaking existing 32-bit binaries. There are three things that must all be done together to make that happen:

  • The CONFIG_COMPAT_TIME symbol is set, causing all of the __kernel_ data structures to switch to their 64-bit versions.

  • All of the existing time-related system calls are replaced with the 32-bit compatibility versions. So, for example, on the ARM architecture, clock_gettime() is system call number 263. After this change, applications invoking system call 263 will get compat_sys_clock_gettime() instead. If the compatibility functions have been done correctly, binary applications will not notice the change.

  • The native 64-bit versions of the system calls are given new system call numbers; clock_gettime() becomes system call 388, for example. Thus, only newly compiled code that is prepared to deal with 64-bit time values will see the 64-bit versions of these calls.

And that is about as far as the kernel can take things. Existing 32-bit binaries will call the compatibility versions of the time-related system calls and will continue to work — until 2038 comes around, of course.

That leaves a fair amount of work to be done in user space. In a simplified view of the situation, the C libraries can be changed to use the 64-bit data structures and invoke the new versions of the relevant system calls. Applications can then be recompiled against the new library, perhaps with some user-space fixes required as well; after that, they will no longer participate in the year 2038 debacle. In practice, all of the libraries in a system and all applications may need to be rebuilt together to ensure that they have a coherent idea of how times are represented. The GNU C library uses symbol versioning, so it can be made to work with both time formats simultaneously, but many other libraries lack that flexibility. So converting a full distribution is likely to be an interesting challenge even once the work on the kernel side is complete.

Finishing the job

Even on the kernel side, though, there are a few pieces of the puzzle that have not yet been addressed. One significant problem is ioctl() calls; of the thousands of them supported by the kernel, a few deal in time_t values. They will have to be located and fixed one by one, a process that could take some time. The ext4 filesystem stores timestamps as 32-bit time_t values, though some variants of the on-disk format extend those fields to 34 bits. Ext3 does not support 34-bit timestamps at all, so the solution there is likely to be dropping it entirely in favor of ext4. NFSv3 has a similar problem, and may meet a similar fate; XFS also has some challenges to deal with. The filesystem issues, notably, affect 64-bit systems as well. There are, undoubtedly, many other surprises like this lurking in both the kernel and user space, so the task of making a system ready for 2038 goes well beyond migrating to 64-bit time values in system calls. Still, fixing the system calls is a start.

Once the remaining problems have been addressed, there is a final patch that can be applied. It makes CONFIG_COMPAT_TIME optional, but in a way that leaves the 64-bit paths in place while removing the 32-bit compatibility system calls. If this option is turned off, any binary using the older system calls will fail to run. This is thus a useful setting for testing year-2038 conversions or deploying long-lived systems that must survive past that date. As Arnd put it:

This is meant mostly as a debugging help for now, to let people build a y2038 safe distro, but at some point in the 2030s, we should remove that option and all the compat handling.

Presumably somebody will be paying attention and will remember to carry out this removal twenty years from now (if they are feeling truly inspired, they might just kill ktime_t while they are at it). At that point, they will likely be grateful to the developers who put their time into dealing with this problem before it became an outright emergency. The rest of us, instead, will just have to find some other way to fund our retirement.

(Thanks to Arnd Bergmann for his helpful comments and suggestions on an earlier draft of this article.)

Comments (15 posted)

The OrangeFS distributed filesystem

By Jake Edge
May 6, 2015

Vault 2015

There is no shortage of parallel, distributed filesystems available in Linux today. Each has its strengths and weaknesses, as well as its advocates and use cases. Orange File System (or OrangeFS) is another; it is targeted at providing high I/O performance on systems with up to several thousand multicore storage nodes, but the project is planning to support millions of cores eventually. The OrangeFS client code was proposed for the Linux kernel back in January. Walt Ligon, one of the principals behind the filesystem, gave a talk about OrangeFS at the Vault conference back in March.

At the beginning of the talk, Ligon noted that OrangeFS was similar "in some ways" to GlusterFS, which was the subject of an earlier Vault presentation. But OrangeFS grew out of a research project from 1993 called Parallel Virtual File System (PVFS). That filesystem (now in version 2, called PVFS2) is in use today by various commercial organizations as well as by universities. In 2008, the PVFS project was renamed to OrangeFS as part of changing its focus to a more general filesystem for "big data".

Overview

At its core, OrangeFS has a client-server architecture, most of which runs in user space. All of the code is available under the LGPL. There are multiple ways for client systems to use the PVFS protocol to access data on the servers. That includes libpvfs2 for low-level access, MPI-IO, Filesystem in Userspace (FUSE), web-related mechanisms (e.g. WebDAV, REST), and a Linux virtual filesystem (VFS) client implementation for mounting OrangeFS like any other filesystem in Linux. The latter is what is being proposed for upstream inclusion.

OrangeFS servers handle objects, called dataspaces, that can have both byte-stream and key-value components. The "Trove" subsystem determines how to store those components. Currently, the byte streams are stored as files on the underlying filesystem, while the key-value data is mostly stored in Berkeley DB files, though there is starting to be some use of LMDB.

[File structure diagram] As seen in the diagram at right, files are stored as a collection of objects: a metadata object and one or more distributed file ("Dfile") objects. Those are accessed from directory objects that include a metadata object. Each of those points to various DirData objects, which contain Dirent (directory entry) objects that point to the metadata object of a file.

Instead of blocks, OrangeFS is all about objects and leaves the block mapping to the underlying filesystems. There are no metadata servers, as all servers can handle all kinds of requests. It is possible to configure an OrangeFS filesystem to store its metadata separate from its data using parameters that govern how the objects should be distributed. Files are typically striped across multiple servers to facilitate parallel access.

OrangeFS provides a unified namespace, so that all files are accessible from a single mount point. It has a client protocol that supports lots of parallel clients and servers. That provides "high aggregate throughput", Ligon said.

In the past, users wanted MPI-IO access to files, but that has changed. Now, POSIX access is "what everyone wants to use". They want to be able to write Python scripts to access their data. But the POSIX API "can be a real limiting factor" because it doesn't understand parallel files, striping, and so on.

Another of the goals for OrangeFS is to "enable the future" by being flexible about the underlying technologies it uses. It wants to provide ways to swap in new redundancy, availability, and stability techniques. For example, OrangeFS is designed to allow users to use their own distribution equation, which is used to find and store data. That equation allows the system to determine which servers go with each object.

[Walt Ligon] Another goal is to make OrangeFS grow to "exascale". One way to keep increasing storage is to add more disks to the computer, but that will eventually hit a wall. There is not enough bandwidth and compute power within a system to access all that data with reasonable performance; the solution to that problem is to add more computers into the mix.

That dramatically increases the number of cores accessing the data, but you can only increase the amount of storage per server to a certain point. Just as with the single computer system above, various limits will be hit, so a better solution is to add more servers with more network connections, but that can get costly. In an attempt to build a lower-cost alternative, Ligon has a new project to create, say, 500 storage servers, each using a Raspberry Pi with a disk. It will be much cheaper, but he thinks it will also be faster—though he still needs to prove that.

There are a number of planned OrangeFS attributes that are missing from the discussion so far, he said. For example, with a large enough number of servers and disks, there will be failures every day. Even if there are no failures, systems will need to be taken down to update the operating system or other software, so there is a need for features that provide availability.

Security is a "major issue" that has mostly been dealt with using "chewing gum and string", Ligon said. Data integrity is another important attribute, as the stored data must be periodically checked and repaired. There is also a need for ways to redistribute files and objects for load or space reasons, as well as a need for monitoring and administration tools.

OrangeFS V3

Some of the "core values" for the next major version of OrangeFS (3.0 or V3) are directly targeted at solving those problems. At the top of that list is "parallelism"; the filesystem should allow parallel access to files, directories, and metadata, while providing scalability through adding servers. The filesystem should also recognize that things are going to fail regularly. If a copy goes bad, throw it away and recreate it; if a node fails, simply discard and replace it.

OrangeFS V3 will minimize the dependencies between servers by not sharing state between them. That will allow servers to be added and removed as needed. Avoiding locks is key to providing better performance, which may require relaxing the semantics of some operations. Finally, 3.0 will target flexible site-customization policies for things like object placement, replication, migration, and so forth.

In order to do all of that, OrangeFS will change the PVFS handle that has been used to identify objects. It is a 64-bit value that encodes both the object and the server it lives on. That scheme has a number of limitations. Objects cannot migrate or be replicated and the collection of servers is static. That works well up to around 128 static servers, he said, but it won't work for OrangeFS V3.

The new handles will contain both an object ID and one or more server IDs, both of which will be 128-bit values. The number of server IDs will typically be somewhere between two and four; it is set when the filesystem is created and can change later, though in practice it rarely will. These handles are internal-only, typically stored in metadata objects. By making this change, OrangeFS V3 will be able to do replication and migration.

This will allow all of the filesystem structure to be replicated, as well as the file data. A set of back references is also created, so that maintenance operations can find other copies of the structures. Each of those pieces and copies could be stored on different servers if that was desired. Another possibility is to use "file stuffing", which places the first data object on the same server as its metadata object.

Reads can be done from any server that has a copy of the object, while writes are done to the primary object. Its server then initiates the copy (or copies) needed for replication. The write will only complete and return to the client after a certain number of copies have completed. This is known as the "write confidence" required. For example, if one copy is sent to a much slower archive device, the write could complete after all or some of the non-archive copies have completed.

V3 adds a server ID database, rather than a fixed set of servers. That allows dynamic addition of servers with site-defined attributes (e.g. number, building, rack, etc.). A client doesn't have to know about all the servers, only the set it is using. Servers maintain a partial list of other servers that they tend to work with and there is a server resolution protocol to find others as needed.

The security model is already present in OrangeFS 2.9 (which is the current version of the filesystem). The model is based around capabilities that get returned based on the credentials presented when a metadata object is accessed. That capability is then passed when accessing the data objects. Certificates and public/private key pairs are used to authenticate clients and their credentials.

The final OrangeFS feature that Ligon described was the "parallel background jobs" (PBJs) that are used for maintenance and data integrity. They can be run to check the integrity of the data stored and to repair problems that are found. They can also handle tasks like rebalancing where data is stored to avoid access hotspots and the like.

As he said at the outset, Ligon's talk provided a high-level overview of the filesystem. OrangeFS seems not to be a particularly well-known filesystem, but it is one with some interesting attributes. Beyond just handling large data sets for parallel computation, it is also targeted as a research platform that can be used to test ideas for enhancements or broad restructuring. The kernel patches did not receive any comments, but they are also fairly small (fewer than 10,000 lines of code), so it seems plausible that we will see an OrangeFS client land in the mainline sometime in the future.

[I would like to thank the Linux Foundation for travel support to Boston for Vault.]

Comments (3 posted)

Improving kernel string handling

By Jonathan Corbet
May 6, 2015
The handling and parsing of string data has long been acknowledged as a fertile breeding ground for bugs and security issues; that is doubly true when the C language — whose string model leaves a bit to be desired — is in use. Various attempts have been made to improve C string handling, both in the kernel and in user space, but few think that the problem has been solved. A couple of current projects may improve the situation on the kernel side, though.

String copying

The venerable strcpy() family of functions has long been seen as error-prone and best avoided. In most settings, they are replaced with functions like strncpy() or strlcpy(). The last time your editor wrote about criticisms of strlcpy(), he was treated to a long series of incendiary emails from one of its supporters. So, for the purposes of this article, suffice to say that not all developers are fond of those functions. Even so, the kernel contains implementations of both, and there are over 1,000 call sites for each.

That doesn't mean that there isn't room for improvement, though. Chris Metcalf thinks he has an improvement in the form of the proposed strscpy() API, which provides two new functions:

    ssize_t strscpy(char *dest, const char *src, size_t count);
    ssize_t strscpy_truncate(char *dest, const char *src, size_t count);

As with similar functions, strscpy() will copy a maximum of count bytes from src to dest, but it differs in the details. The return value in this case is the number of bytes copied, unless the source string is longer than count bytes; in that case, the return value will be -E2BIG instead. Another difference is that, in the overflow case, dest will be set to the empty string rather than a truncated version of src.

This behavior is designed to make overflows as obvious as possible and to prevent code from blithely proceeding with a truncated string. When questioned on this behavior, Chris justified it this way:

1. A truncated string with an error return may still cause program errors, even if the caller checks for the error return, if the buffer is later interpreted as a valid string due to some other program error. It's defensive programming.

2. Programmers are fond of ignoring error returns. My experience with truncated strings is that in too many cases, truncation causes program errors down the line. It's better to ensure that no partial string is returned in this case.

In a perfect world, all error returns would be checked, and there would be no need for this, but we definitely don't live in that world :-)

For cases where the code can handle a truncated string, strscpy_truncate() can be used. Its return value convention is the same, but it will fit as much of the string as possible (null-terminated) in dest.
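To make the difference concrete, here is a small user-space model of the semantics described above. The real functions are kernel code; this sketch only mimics the documented behavior, and details such as whether the count includes the terminating NUL may differ in the actual patches.

    #include <errno.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/types.h>

    /* Model of strscpy(): copy src into dest (at most count bytes,
       including the NUL); on overflow return -E2BIG and leave dest
       as an empty string rather than a truncated copy. */
    static ssize_t strscpy_model(char *dest, const char *src, size_t count)
    {
        size_t len = strlen(src);

        if (count == 0)
            return -E2BIG;
        if (len >= count) {
            dest[0] = '\0';              /* no partial string on overflow */
            return -E2BIG;
        }
        memcpy(dest, src, len + 1);
        return (ssize_t)len;             /* bytes copied, excluding the NUL */
    }

    int main(void)
    {
        char buf[8];

        printf("%zd\n", strscpy_model(buf, "short", sizeof(buf)));
        printf("%zd '%s'\n", strscpy_model(buf, "far too long", sizeof(buf)), buf);
        return 0;
    }

A strscpy_truncate() counterpart would differ only in the overflow case, copying as much of the string as fits (still NUL-terminated) before returning the error.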

Integer parsing

The kernel must often turn strings into integer values; parsing the numbers written to sysfs files or found on the kernel command line provides a couple of obvious examples. This parsing can be done with functions like simple_strtoul() (which decodes a string to an unsigned long), but they were marked as being obsolete in 2011. The checkpatch script complains about their use, but there are still about 1,000 call sites in the kernel. Current advice is to use kstrtoul() and the better part of a dozen variants, also added in 2011. There are almost 2,000 uses of these functions in the kernel, but Alexey Dobriyan thinks we can do better.

Alexey has a few complaints about the current APIs. One of the reasons for moving beyond the simple_strto*() functions was that they would silently stop conversion at a non-digit character — "123abc" would be successfully converted to 123. That is the sort of behavior for which PHP is roundly criticized, but, Alexey says, there are times when it is useful. He gives the parsing of device numbers (usually given in the "major:minor" format) as an example. The kstrto*() family cannot easily be used for that kind of parsing, but there are plenty of reasons to not go back to simple_strto*() for that kind of work.

His suggestion is the addition of a new function:

    int parse_integer(const char *s, unsigned int base, <type> *val);

In truth, parse_integer() is not a function; it is instead a rather unsightly macro that arranges to do the right thing for a wide variety of types for val. So, if val is an unsigned short, the decoding will be done on an unsigned basis and the result will be checked to ensure that it does not exceed the range of that type.

A successful decoding will cause the result to be placed in val; the number of characters decoded will come back as the return value. If it is expected that the entire string will be decoded, a quick check to see whether s[return_value] is a null byte can verify that. Otherwise, parsing of the string can continue from the indicated point. If the base is ORed with the undocumented value PARSE_INTEGER_NEWLINE, a final newline character will be skipped over — useful for parsing input to sysfs files. If no characters at all are converted, the return value will be -EINVAL; an overflow will return -ERANGE instead.
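Since parse_integer() is only a proposal, the snippet below merely models the described calling convention in user space to show how the "major:minor" case becomes straightforward; the helper name is invented, and the real macro handles many more types.

    #include <errno.h>
    #include <limits.h>
    #include <stdio.h>
    #include <stdlib.h>

    /* Model of the described convention: return the number of
       characters consumed, -EINVAL if nothing was parsed, or
       -ERANGE if the value does not fit in an unsigned int. */
    static int parse_uint(const char *s, unsigned int base, unsigned int *val)
    {
        char *end;
        unsigned long v;

        errno = 0;
        v = strtoul(s, &end, base);
        if (end == s)
            return -EINVAL;
        if (errno == ERANGE || v > UINT_MAX)
            return -ERANGE;
        *val = (unsigned int)v;
        return (int)(end - s);
    }

    int main(void)
    {
        unsigned int major, minor;
        const char *dev = "8:16";
        int n = parse_uint(dev, 10, &major);

        if (n < 0 || dev[n] != ':')
            return 1;
        if (parse_uint(dev + n + 1, 10, &minor) < 0)
            return 1;
        printf("major=%u minor=%u\n", major, minor);
        return 0;
    }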

Alexey's patch set turns the kstrto*() functions into calls to parse_integer(); it also converts a number of simple_strto*() calls to direct parse_integer() calls. The end result is an apparent simplification of the code and net reduction in lines of code.

Whether either of these patch sets will find its way into the kernel is not entirely clear; kernel developers do not, in general, tend to get too excited about string-parsing functions. In both cases, though, the potential exists for improvements to the massive amounts of parsing code found in the kernel while simultaneously making it simpler. In the end, most developers will find it hard to argue against something like that.

Comments (3 posted)

Patches and updates

Kernel trees

Linus Torvalds Linux 4.1-rc2
Jiri Slaby Linux 3.12.42
Jiri Slaby Linux 3.12.41

Architecture-specific

Core kernel code

Development tools

Device drivers

Device driver infrastructure

Filesystems and block I/O

Memory management

Networking

Security-related

Virtualization and containers

Miscellaneous

Page editor: Jonathan Corbet

Distributions

Packaging QtWebEngine

By Jake Edge
May 6, 2015

There is always some tension between the goals of upstream projects and those of distributions. We see that playing out in a recent discussion of packaging the QtWebEngine component for Debian and Fedora that took place on the kde-core-devel mailing list. The whole intent of QtWebEngine is to use the Chromium browser code base as a tool for other applications, but that brings with it all of the packaging problems—library bundling, mostly—that come with that browser.

Lisandro Damián Nicanor Pérez Meyer raised the issue, noting that the KDE personal information management (KDE PIM) packages now depend on QtWebEngine. That package is difficult for distributions to handle because it depends on Chromium, which in turn depends on various other packages.

If Chromium simply depended on those other packages, there wouldn't be so much of a problem. Instead, however, Chromium bundles modified versions of those packages in such a way that distributions have to take the binaries they are given by the project, make a huge number of distribution-specific patches, or live without. In fact, these problems are why Chromium is not available in Fedora.

The immediate responses from KDE board member Albert Astals Cid were less than entirely helpful. He indicated that he thought the distribution policies were simply out of line. He did note that the maintenance window provided for QtWebEngine by the Qt Project might not meet distributions' requirements, but that was really a problem for the distributions:

Just state that there's no such long [maintenance] time for that package or just install the newer version of Qt. And yes again that probably goes against your rules, but it's your rules, so you can just improve them for everyone's benefit :)

While that response did not realistically address the problem, it does underscore the differences that exist between some upstream projects and distributions. It is not exactly clear how Astals expects KDE to get into the hands of users without distributions—it would seem that he thinks the upstream projects should simply dictate what the distributions ship. It is not an uncommon attitude, at least for larger projects, but it is a bit hard to see it as much of a cooperative stance.

Others on the list saw things a bit differently, however. Luigi Toscano noted that the distributions have their rules for a reason: "The rules are there not because someone wants to have unhappy users; think about the rule about untangling tons of embedded libraries and patching issues." Sune Vuorela envisioned an all-too-plausible scenario where adopting a new version of QtWebEngine led to an enormous cascade of other dependencies that needed updating due to the change.

That led Thomas Lübking to conclude that "the Linux SW stack is crap", but the problem is not so cut and dried. As Matthias Klumpp pointed out, there are lots of different upstream projects with lots of different priorities. The job of a distribution is to navigate that landscape and to produce something sane for its users:

So, in summary: The free software stack is a mess, and distributors align it to produce a working compilation of software. This is simply because there is no central authority coordinating the development of Mesa, Plasma, GNOME, GCC, systemd, etc., so incompatibilities can arise at any time.

Astals had suggested that distributions that choose not to ship certain applications because of dependencies that "broke rules" would simply cause their users to go elsewhere. But Vuorela explained that the problem affects both distributions and KDE equally:

We (as KDE) can either insist saying "if you don't ship this technology we won't deal with you" or we can try to work together with our downstreams to find a way to somehow make everyone somehow happy.

We (as KDE) need to understand that if we don't have our software in distributions, it is just as likely that people will switch applications, as it is that they will switch distributions.

So, all in all, if everyone takes a hard stance, everyone will lose.

Let's find a way so we all can win.

The problem isn't limited to Linux, either. Vadim Zhukov had similar complaints about QtWebEngine for OpenBSD.

While there were some advocating finding solutions, there were few actual solutions to be found in the thread. QtWebKit, which is based on WebKit rather than Chromium, might be an alternative, but it is deprecated in Qt or will be before too long. Trying to avoid hard dependencies on QtWebEngine is another possibility, but makes it hard for KDE applications that want to use the web.

This is a problem we have seen in various guises over the years and will certainly see again. Resolving the tension between the needs of upstream projects and users (as embodied by the distributions they choose) is not easy. Perhaps some radical experiments in the distribution space are indicated.

Comments (126 posted)

Brief items

Distribution quotes of the week

But first, thank you to all who have made such witty suggestions in webby forums. Alas, the “wacky wabbit” and “watery walrus”, while weird enough and wisely whimsical, won’t win the race. The “warty wombat”, while wistfully wonderful, will break all sorts of systems with its wepetition. And the “witchy whippet”, in all its wiry weeness, didn’t make the cut.

Instead, my waggish friends, the winsome W on which we wish will be… the “wily werewolf”.

-- Mark Shuttleworth

Uploading to Debian can *sometimes* inject life into an upstream project but it can also be a case of flogging a dying horse.
-- Neil Williams

Comments (none posted)

OpenBSD 5.7

OpenBSD 5.7 has been released. This version includes improved hardware support, network stack improvements, installer improvements, security and bug fixes, and more. OpenSSH 6.8, LibreSSL, and other packages have also seen improvements and bug fixes.

Comments (none posted)

Distribution News

Debian GNU/Linux

Debian GNU/Hurd 2015 released

Debian GNU/Hurd 2015 has been released. "This is a snapshot of Debian "sid" at the time of the stable Debian "jessie" release (April 2015), so it is mostly based on the same sources. It is not an official Debian release, but it is an official Debian GNU/Hurd port release."

Full Story (comments: none)

Qt4's status and Qt4's webkit removal in Stretch

Lisandro Damián Nicanor Pérez Meyer has a reminder that Qt4 is deprecated and code should be ported to Qt5 during the "Stretch" development period. "We plan to start filing wishlist bugs soon. Once we get most of KDE stuff running with Qt5's webkit we will start raising the severities."

Full Story (comments: none)

Fedora

Fedora 22 Beta for aarch64 and POWER platforms

Fedora 22 beta is available for aarch64 and POWER secondary architectures.

Full Story (comments: none)

Red Hat Enterprise Linux

Red Hat Enterprise Linux 6.7 beta

Red Hat has announced the release of Red Hat Enterprise Linux 6.7 beta. "The beta release of Red Hat Enterprise Linux 6.7 includes a number of new and updated features to help organizations preserve investments in existing infrastructure, bolster security, stability, and systems management/monitoring capabilities, and embrace the latest Linux innovations." See the release notes for details.

Comments (none posted)

Ubuntu family

Wily Werewolf (15.10) now open for development

Ubuntu 15.10 "Wily Werewolf" is open for development. "Very little has changed in the opening base system and toolchain, however do note that due to jessie releasing only a couple of days after vivid, many Debian developers have awoken from hibernation and auto-syncs and merges should be full of new shiny to look at."

Full Story (comments: none)

Ubuntu 10.04 (Lucid Lynx) End of Life

Ubuntu 10.04 LTS reached its end of support on April 30. The supported upgrade path from Ubuntu 10.04 is via Ubuntu 12.04.

Full Story (comments: none)

Newsletters and articles of interest

Distribution newsletters

Comments (none posted)

FreeBSD quarterly status report

The FreeBSD project has released a quarterly status report for the period between January and March 2015. "The first quarter of 2015 was another productive quarter for the FreeBSD project and community. FreeBSD is being used in research projects, and those projects are making their way back into FreeBSD as new and exciting features, bringing improved network performance and security features to the system. Work continues to improve support for more architectures and architecture features, including progress towards the goal of making ARM (32- and 64-bit) a Tier 1 platform in FreeBSD 11. The toolchain is receiving updates, with new versions of clang/LLVM in place, migrations to ELF Tool Chain tools, and updates to the LLDB and gdb debuggers. Work by ports teams and kernel developers is maintaining and improving the state of FreeBSD as a desktop operating system. The pkg team is continuing to make binary packages easier to use and upgrade."

Comments (none posted)

Hoogland: Introducing the Moksha Desktop

Jeff Hoogland has announced that the Bodhi Linux project is forking Enlightenment 17. The new fork is the Moksha Desktop. "On top of the performance issues, E19 did not allow for me personally to have the same workflow I enjoyed under E17 due to features it no longer had. Because of this I had changed to using the E17 on all of my Bodhi 3 computers – even my high end ones. This got me to thinking how many of our existing Bodhi users felt the same way, so I opened a discussion about it on our user forums. I found many felt similar to how I did. So that left only one question: What was to be done about it? After much reflection, I came to the same conclusion others had before me that lead to the creation of the Mate and Trinity desktops – fork it."

Comments (1 posted)

Debian 8: Linux’s most reliable distro (Ars Technica)

Ars Technica has a review of Debian 8 "Jessie". "Debian Stable is designed to be, well, stable. The foundation of Debian is built upon long development cycles and a conservative approach to application updates. So as a general rule, Debian Stable lags behind pretty much every other distro on the market when it comes to package updates. If you want the latest and greatest, Debian Stable simply isn't the distro for you. While Debian 8 may bring a ton of new stuff to Debian, it has almost nothing the rest of the Linux world hasn't been using for, in some cases, years. What's more, many things in Debian 8 are still not going to be the latest available versions."

Comments (none posted)

Ubuntu 15.04 review: Beauty or “boring” is in the eye of the beholder (Ars Technica)

Ars Technica also reviewed Ubuntu 15.04 "Vivid Vervet". "Snow melts and trees blossom, but nothing really says spring around the Ars Orbital HQ like the arrival of a new version of Ubuntu Linux. Right on schedule, Canonical has recently released Ubuntu 15.04, also known as Vivid Vervet. Ubuntu 15.04 arrived in late April and has, judging by other reviews, largely underwhelmed. According to the popular storyline, there's not much new in 15.04. Of course, a slew of changes and unforeseen features in 15.04 could have just as easily earned a negative reaction, probably from the same people calling the actual release boring. The top of the Linux mountain is a lonely, criticism-strewn place."

Comments (6 posted)

When Official Debian Support Ends, Who Will Save You? (Linux Journal)

Linux Journal takes a look at the Debian LTS effort. "The Debian Long Term Support (LTS) project has been providing support for Debian version 6 (Squeeze) and will continue to do so until early next year. LTS announced that it will be supporting later editions too. The project provides security patches and bug fixes for the core components of the Debian system, in addition to the most popular packages. The team would like to expand the range of packages covered, but it will require additional support to make that happen."

Comments (none posted)

Page editor: Rebecca Sobol

Development

Opposition to Python type hints

By Jake Edge
May 6, 2015

Type hints for Python seem to be a feature that is on the fast track. These optional hints provide information that would allow static type-checking. But a recent discussion on the python-dev mailing list shows that not everyone is on-board with the idea of adding the annotations inline—they would rather see the hints be "hidden" in separate files. Others seem opposed to the idea of type hints at all, fearing that the feature will change the character of the language entirely.

We first looked at the type hints back in December, though the discussion started in August 2014. The feature has been championed by Python's benevolent dictator for life (BDFL), Guido van Rossum, who also described the feature in a Python Language Summit session (and PyCon keynote) in April. It is targeted for the Python 3.5 release that is slated for September. The idea of static typing for Python goes back much further, however, at least as far as a December 2004 blog post from Van Rossum.

Self-described "mediocre programmer" Harry Percival posted his reaction to the type hints. He argued that type hints are "ugly" and will make the language harder to understand, especially for beginners. He didn't buy the argument that the type annotations for functions will truly be optional, suggesting that over time there will be a strong uptake of the feature. He didn't suggest doing away with type hints entirely, just that there be a recommendation to put those annotations into stub files.

The idea of stub files is already part of the type hints proposal, largely to support C extensions where there may be no Python code to annotate. There are other situations where they will be needed, including for modules that must support both Python 2 and 3; the syntax used by the annotations is only available in Python 3. Stub files have the same syntax as regular Python files—just containing function signatures with "pass" for the function body—but use a .pyi extension.
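For illustration (the module and function names are invented for the example), an annotated function and the corresponding stub might look like this:

    # mymodule.py -- inline annotations, visible both to a type checker
    # and to runtime introspection via scale.__annotations__
    def scale(value: float, factor: float = 2.0) -> float:
        return value * factor

    # mymodule.pyi -- the stub form: the same signature with "pass" as
    # the body, consulted by the checker in place of inline annotations
    def scale(value: float, factor: float = 2.0) -> float:
        pass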

There are some problems with always using stub files, however. For one thing, the way things are currently implemented, the stub files for a module would only be consulted for code that uses that module. The module itself would not have the benefit of the annotations, so its code could not be type-checked. Also, the annotations from stubs are not accessible for introspection from within Python code as inline annotations are.

Percival's post garnered a fair amount of support. To many, type hints are not particularly "Pythonic" and they will make Python code harder to read. By far the strongest condemnation came from Jack Diederich (who is evidently still smarting from a suggestion Van Rossum made in 2003). Some of his complaints are based on misunderstandings of the plans moving forward, but those misconceptions are shared by a fair number of others in what turned into a monster thread. But the core of his complaint can be found at the end of the message:

If this wasn't in a PEP and it wasn't going to ship in the stdlib very few people would use it. If you told everyone they had to install a different python implementation they wouldn't. This is much worse than that - it is Python4 hidden away inside a PEP.

There are many fine languages that have sophisticated type systems. And many bondage & discipline languages that make you type things three times to make really really sure you meant to type that. If you find those other languages appealing I invite you to go use them instead.

Adding type declarations to a dynamically typed language seems to be the crux of the problem for Diederich and others. It fundamentally changes what Python is. But Gregory P. Smith noted that the PEP is really targeted at creating a common way for projects using types (and type-checking) for Python to annotate code, not at all for a wholesale change in existing code:

One thing I feel is often overlooked in the discussion on this PEP: It is about creating a unified type expression syntax for everyone working on Python typing to centralize around. Regardless of if the PEPs version falls short for some purposes. It allows for sharing work. There are multiple ongoing projects that are trying to make use of type information with Python, this allows them to all speak the same language. (MyPy, MicroPython, Cython, the static analyzer we are trying to create at Google, several others not on the top of my head I'm sure, etc.)

We will not be putting type annotations anywhere in the stdlib or expecting anyone else to maintain them there. That would never happen until tools that are convincing enough in their utility for developers to _want_ to use are available and accepted. That'll be a developer workflow thing we could address with a later PEP. IF it happens at all.

A more practical complaint about the annotations came from Requests developer Cory Benfield, who tried to annotate some parts of that library. For certain parameters, he got incredibly complicated annotations that still didn't fully restrict the parameter types. He summarized the problems, as follows:

I suppose the TLDR of this email is that I think that libraries with 'pythonic' APIs are the least likely to take up this typing system because it will provide the least value to them. Maintaining signatures will be a pain (stub files are necessary), writing clear signatures will be a pain (see above), and writing signatures that are both sufficiently open to be back-compatible and sufficiently restrictive to be able to catch bugs seems like it might be very nearly impossible.

After Chris Angelico suggested that Benfield might "want to just stop caring about the exact type", Benfield pointed to exactly that as part of the problem:

This change doesn't catch any of the bugs anyone was going to hit (no-one is passing a callable to this parameter), and it doesn't catch any of the bugs someone might actually hit (getting the tuple elements in the wrong order, for example). At this point the type signature is worse than useless.

Another complaint, voiced by Arnaud Delobelle, is that type hints are a threat to duck typing. The concern is that if the tools start complaining about interfaces that use duck typing, it will cause programmers to shy away from that longstanding Python type convention. That "could encourage a very significant cultural change in the way Python code is written towards a less flexible mindset". But Van Rossum has not forgotten about duck typing:

For me, PEP 484 is a stepping stone. Among the authors of PEP 484 there was much discussion about duck typing, and mypy even has some limited support for duck typing (I think you can still find it by searching the mypy code for "protocol"). But we ran out of time getting all the details written up and agreed upon, so we decided to punt -- for now. But duck typing still needs to have a way to talk about things like "seek method with this type signature" (something like `def seek(self, offset: int, whence: int=SEEK_SET) -> int`) so the current proposal gets us part of the way there.

The hope is that once 3.5 is out (with PEP 484's typing.py included *provisional* mode) we can start working on the duck typing specification. The alternative would have been to wait until 3.6, but we didn't think that there would be much of an advantage to postponing the more basic type hinting syntax (it would be like refusing to include "import" until you've sorted out packages). During the run of 3.5 we'll hopefully get feedback on where duck typing is most needed and how to specify it -- valuable input that would be much harder to obtain of *no* part of the type hints notation were standardized.

The roughly one and a half to two years that fall between each major Python release were clearly part of Van Rossum's thinking. He chose to implement something that could be done before 3.5 hit its feature freeze, but to keep the feature in a provisional state so that it could still be changed during the 3.6 development cycle. Several things may get addressed during that cycle, including support for duck typing and, possibly, adding new syntax for variable type declarations (instead of the comment-based annotations that are part of PEP 484). That all assumes that the PEP is accepted for 3.5, which seems likely unless Van Rossum has a change of heart.

The concerns about the standard library getting annotations are largely based on misunderstandings—there are no plans afoot to add them en masse either inline or as stubs. In fact, changes of that sort to "modernize" the standard library are frowned upon, Van Rossum said, because they often cause subtle regressions "in dark corners". Adding stub files for the standard library in 3.6 is a possibility, PEP co-author Łukasz Langa said. Additions to the standard library might include inline annotations, if the author is a fan of type hints, but that is going to be a ways out, according to Van Rossum.

The standard library is often seen as a place to look at "good" Python code, but that isn't really the case. It is a shared community code base, as Nick Coghlan described it, but, as Van Rossum pointed out, it is not the epitome of Python coding style:

The stdlib contains a lot of Python code, and you can learn a lot from it, but good coding habits aren't generally something you learn there -- the code is crusty (some of the oldest Python code in existence lives in the stdlib!), often has to bend over backwards to support backward compatibility, and is riddled with performance hacks.

The debate got a bit testy at times but, as is generally the case in the Python world, it never got really out of hand. That community has found a way to moderate its communication, generally by way of fairly gentle reminders about respecting others' views. In the end, any change of this magnitude is going to cause some to be worried about the impacts—as they should. Contrary to some of the less temperate reactions (evidently Twitter was especially bad), this hardly seems like a change that "breaks" the language. It would not be hugely surprising to see it as quite a popular feature in, say, five years, once all the kinks have been worked out.

Comments (17 posted)

Brief items

Quotes of the week

If you have to ask "if my code calls GPL'd code in this complicated way, can I avoid having to be GPL-compatible?" then the answer is pretty much always "you're trying to reduce freedom, stop it".
Lars Wirzenius

This is my favorite piece, 'The Blue Screen of Failure.' It's been shown all over the world.
— Digital artist Antonio Roberts at Libre Graphics Meeting, after his live demo of video "glitch" art crashed the computer running the presentation software.

Comments (1 posted)

Apache SpamAssassin 3.4.1 released

The Apache SpamAssassin 3.4.1 release is out. "Highlights include: Improved automation to help combat spammers that are abusing new top level domains; Tweaks to the SPF support to block more spoofed emails; Increased character set normalization to make rules easier to develop, block more international spam and stop spammers from using alternate character sets to bypass tests; Continued refinement to the native IPv6 support; and Improved Bayesian classification with better debugging and attachment hashing."

Full Story (comments: none)

Jython 2.7.0 released

Version 2.7 of Jython, the Python implementation for Java, has been released. As the version number suggests, this update provides language and runtime compatibility with CPython 2.7.

Full Story (comments: none)

GNU moe 1.7 released

Version 1.7 of the GNU moe console text editor has been released. This update includes a number of fixes for translating text data to and from UTF-8, adds a function for encoding as ASCII, and incorporates several fixes to search/replace functionality.

Full Story (comments: none)

Git v2.4.0 released

Git 2.4.0 has been released. LWN took a look at version 2.4.0 in April. Little seems to have changed since the release-candidate builds, but users are advised to give the release notes a close read nonetheless.

Full Story (comments: none)

PyTables 3.2.0 released

PyTables version 3.2.0 has been released. The update "is the result of more than a year of accumulated patches, but most specially it fixes a couple of nasty problem with indexed queries not returning the correct results in some scenarios." The release notes contain considerably more detail.

Full Story (comments: none)

Git code hosting beta (launchpadblog)

Early support for hosting Git repositories directly on Launchpad has been announced. "This has been by far the single most commonly requested feature from Launchpad code hosting for a long time; we’ve been working hard on it for several months now, and we’re very happy to be able to release it for general use. This is distinct from the facility to import code from Git (and some other systems) into Bazaar that Launchpad has included for many years."

Comments (59 posted)

Newsletters and articles

Development newsletters from the past week

Comments (none posted)

App Container spec gains new support as a community-led effort

CoreOS looks at community adoption of the App Container spec (appc). "In order to ensure the specification remains a community-led effort, the appc project has established a governance policy and elected several new community maintainers unaffiliated with CoreOS: initially, Vincent Batts of Red Hat, Tim Hockins of Google and Charles Aylward of Twitter. This new set of maintainers brings each of their own unique points of view and allows appc to be a true collaborative effort. Two of the initial developers of the spec from CoreOS, Brandon Philips and Jonathan Boulle, remain as maintainers, but now are proud to have the collective help of others to make the spec what it is intended to be: open, well-specified and developed by a community."

Comments (16 posted)

Page editor: Nathan Willis

Announcements

Brief items

International Day Against DRM

This year the International Day Against DRM will be held on May 6. The Free Software Foundation focuses on community with a wide variety of community groups, activist organizations, and businesses all taking part in the ninth International Day Against DRM.

The FSF's DefectiveByDesign campaign looks at how DRM affects people with disabilities. "DRM is especially bad for those of us that face additional hurdles using computers. It's beastly for blind people, who are dependent on an audiobook market heavily laden with DRM."

Comments (42 posted)

Articles of interest

Free Software Supporter - Issue 85, May 2015

The Free Software Foundation's monthly newsletter for May covers the International Day Against DRM, the Licensing and Compliance Lab interviews Matt Lee, DMCA exemption commenting process broken beyond repair, thousands of Spaniards leave Twitter for GNU social, GNU Mailman 3.0 released, GCC 5 release series update, and several other topics.

Full Story (comments: none)

FSFE Newsletter - May 2015

The Free Software Foundation Europe newsletter looks at a secret Free Software action plan by the European Commission, interpretation of law restricting Free Software in Switzerland, Document Freedom Day, and much more.

Full Story (comments: none)

Calls for Presentations

PGConf Silicon Valley CfP Now Open

PGConf Silicon Valley will be held November 17-18 in San Francisco, CA. The call for papers closes June 15.

Full Story (comments: none)

CFP Deadlines: May 7, 2015 to July 6, 2015

The following listing of CFP deadlines is taken from the LWN.net CFP Calendar.

Deadline: Event (event dates), location

May 8: PyCon Australia 2015 (July 31-August 4), Brisbane, Australia
May 15: OpenMP Conference (September 28-30), Aachen, Germany
May 17: PostgresOpen 2015 (September 16-18), Dallas, TX, USA
May 17: Chaos Communication Camp 2015 (August 13-17), Mildenberg (Berlin), Germany
May 23: Free and Open Source Software Conference (August 22-23), Sankt Augustin, Germany
May 23: Wikimedia/MediaWiki European Hackathon (May 23-25), Lyon, France
May 31: PyCon India 2015 (October 2-4), Bangalore, India
June 1: Build Stuff 2015 (November 18-22), Vilnius, Lithuania
June 1: SteelCon (July 3-5), Sheffield, UK
June 5: Linux Security Summit 2015 (August 20-21), Seattle, WA, USA
June 6: Open Source Backup Conference 2015 (September 29-30), Cologne, Germany
June 11: Linux Vacation Eastern Europe 2015 (June 25-28), Grodno, Belarus
June 15: DebConf15 (August 15-22), Heidelberg, Germany
June 15: PostgreSQL Session 7 (September 24), Paris, France
June 15: PGConf Silicon Valley (November 17-18), San Francisco, CA, USA
June 17: LinuxCon Europe (October 5-7), Dublin, Ireland
June 17: Embedded Linux Conference Europe (October 5-7), Dublin, Ireland
June 19: Tracing Summit (August 20), Seattle, WA, USA
June 28: ownCloud Contributor Conference (August 28-September 3), Berlin, Germany
June 28: LDAP Conference 2015 (November 11-13), Edinburgh, UK
June 30: Open Source Monitoring Conference 2015 (November 16-19), Nuremberg, Germany
July 5: Open Source Developers' Conference (October 27-29), Hobart, Tasmania

If the CFP deadline for your event does not appear here, please tell us about it.

Upcoming Events

EuroPython 2015

EuroPython will take place July 20-26 in Bilbao, Spain. You may now vote for talks and apply for financial aid. Talk voting will be open until May 15.

Comments (none posted)

Events: May 7, 2015 to July 6, 2015

The following event listing is taken from the LWN.net Calendar.

Date(s): Event, location

May 6-8: German Perl Workshop 2015, Dresden, Germany
May 7-9: Linuxwochen Wien 2015, Wien, Austria
May 8-10: Open Source Developers' Conference Nordic, Oslo, Norway
May 12-13: PyCon Sweden 2015, Stockholm, Sweden
May 12-14: Protocols Plugfest Europe 2015, Zaragoza, Spain
May 13-15: GeeCON 2015, Cracow, Poland
May 14-15: SREcon15 Europe, Dublin, Ireland
May 16-17: 11th Intl. Conf. on Open Source Systems, Florence, Italy
May 16-17: MiniDebConf Bucharest 2015, Bucharest, Romania
May 18-22: OpenStack Summit, Vancouver, BC, Canada
May 18-20: Croatian Linux User Conference, Zagreb, Croatia
May 19-21: SAMBA eXPerience 2015, Goettingen, Germany
May 20-22: SciPy Latin America 2015, Posadas, Misiones, Argentina
May 21-22: ScilabTEC 2015, Paris, France
May 23-24: Debian/Ubuntu Community Conference Italia - 2015, Milan, Italy
May 23-25: Wikimedia/MediaWiki European Hackathon, Lyon, France
May 30-31: Linuxwochen Linz 2015, Linz, Austria
June 1-2: Automotive Linux Summit, Tokyo, Japan
June 3-5: LinuxCon Japan, Tokyo, Japan
June 3-6: Latin American Akademy, Salvador, Brazil
June 5-7: PyCon APAC 2015, Taipei, Taiwan
June 8-10: Yet Another Perl Conference 2015, Salt Lake City, UT, USA
June 10-13: BSDCan, Ottawa, Canada
June 11-12: infoShare 2015, Gdańsk, Poland
June 12-14: Southeast Linux Fest, Charlotte, NC, USA
June 16-20: PGCon, Ottawa, Canada
June 22-23: DockerCon15, San Francisco, CA, USA
June 23-26: Open Source Bridge, Portland, Oregon, USA
June 23-25: Solid Conference, San Francisco, CA, USA
June 23-26: Red Hat Summit, Boston, MA, USA
June 25-26: Swiss PostgreSQL Conference, Rapperswil, Switzerland
June 25-28: Linux Vacation Eastern Europe 2015, Grodno, Belarus
June 26-27: Hong Kong Open Source Conference 2015, Hong Kong, Hong Kong
June 26-28: FUDCon Pune 2015, Pune, India
July 3-5: SteelCon, Sheffield, UK
July 4-10: Rencontres Mondiales du Logiciel Libre, Beauvais, France

If your event does not appear here, please tell us about it.

Page editor: Rebecca Sobol


Copyright © 2015, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds