
LWN.net Weekly Edition for March 25, 2010

Ubuntu and window controls

By Jake Edge
March 24, 2010

User interface design changes are often contentious, but when the changes are to something fundamental that users are accustomed to, the outcry is even louder. On the flipside, though, good UI design is best done within a small group of dedicated folks, who may—should—be willing to think "outside the box". Innovations often come from setting aside historical precedents, but, if the process is done quietly, and presented suddenly, the shock value alone can be enough to anger users. Ubuntu's recent bug report flamewar shows just how that can play out.

As part of the Ubuntu rebranding effort, there were some obvious changes, like the move from brown to purple, but there were some more subtle changes as well. In the new Gtk theme as presented in that new brand, there were changes to the window controls. Instead of the traditional—for Linux anyway—buttons in the upper right of a window, they had moved to the upper left. But the "close" button stayed in the same relative position, so that it was no longer in the corner, but was to the right of the "maximize" and "minimize" buttons (which had swapped positions).

When the rebranding was announced, most observers focused on the color, logo, and other changes; few noticed, or mentioned, the window control changes—until those changes landed in the first Lucid Lynx (10.04) beta. Once users were faced with actually using the change, they ended up noticing it—and many weren't very happy about it. A bug was filed in Launchpad on March 5, and various arguments broke out in the bug entry about whether it was a bug, whether it was a good change or not, what the bug status should be, and who it should be assigned to. As is often the case when users don't feel that a bug report is being handled correctly, there were many status, importance, and assignment changes, with others resetting the values back to what they were—pretty typical bug tracker gamesmanship.

There were also lots of comments about the change—376 at the time that this was written. There are, of course, ways to revert the behavior, either via a personal package archive (PPA) or the command line:

    $ gconftool-2 --set /apps/metacity/general/button_layout \
                   --type string "menu:minimize,maximize,close"
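The value of that key is a comma-separated list of buttons, with a colon dividing the left side of the title bar from the right. As a rough illustration, the Python sketch below splits such a string into its two sides. The function name is hypothetical, and the second layout string is inferred from the description of the new theme given above rather than copied from the theme itself.

```python
def parse_button_layout(layout):
    """Split a Metacity button_layout string into left and right button lists."""
    left, _, right = layout.partition(":")

    def names(side):
        # Drop empty entries so an empty side becomes an empty list.
        return [name for name in side.split(",") if name]

    return {"left": names(left), "right": names(right)}


# The traditional layout restored by the gconftool-2 command.
traditional = parse_button_layout("menu:minimize,maximize,close")

# Assumed value: inferred from the article's description of the new theme.
lucid_beta = parse_button_layout("maximize,minimize,close:")

print(traditional)
print(lucid_beta)
```

Everything before the colon lands on the left, so an empty right-hand side is what "opens up" that corner of the title bar.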

One of the concerns expressed is that Lucid is a long-term support (LTS) release, so any change will be supported (and lived with) for three years. Another was the way in which the change came about, i.e. with essentially no warning or explanation. "Conscious User" described how it looked:

For the button positioning, however, there was absolutely no official stance from the design team on the reasoning behind it. In a recent Ars Technica article, Ryan Paul states that Ivanka Majic posted explanations in her blog [here]. As I previously stated in this bug report, not only her blog post mentions only the questions and no answers, but clearly states that she does not agree with the design herself.

I doubt that revealing the reasoning would satisfy all users, but at least they would have a base to build arguments on. Right now, a lot of people are *assuming* the reasons and criticizing Canonical based on those assumptions. This is wrong, but there's little else possible when an official statement does not exist.

Mark Shuttleworth did provide something of an "official statement" further down in the bug comments:

The default position of the window controls will remain the left, throughout beta1. We're interested in data which could influence the ultimate decision. There are good reasons both for the change, and against them, and ultimately the position will be decided based on what we want to achieve over time.

Moving everything to the left opens up the space on the right nicely, and I would like to experiment in 10.10 with some innovative options there.

But that explanation wasn't really satisfying to many of the commenters. It didn't really explain why the change was done, other than the somewhat vague statement that it opened up space on the right for unnamed "innovative options". As several commenters noted, leaving the buttons on the right opens up the left side, so it is not clear why moving the controls was needed to support these innovations. Conscious User, among others, asked Shuttleworth for "concrete, non-vague arguments in favor of the left side", but, so far at least, those arguments have not been forthcoming.

In noting that the change "landed without warning" and that there "aren't any good reasons for that", Shuttleworth tried to defuse the situation to some extent. But much of the underlying unhappiness is not just that there was no warning, but that the reasons behind the change are, at best, murky. Not knowing what the innovative plans for the right-hand side are makes the change harder for people to understand and accept, as Adam Williamson points out:

You've said a couple of times that the idea is to free up the right hand corner for Other Stuff You Will Put There Later, which is a valid idea. What I don't get, though, is why you think it makes sense to do the freeing-up before you've got around to inventing the Other Stuff. It gives people all the drawbacks of the re-arranging with none of the benefits of the Cool New Stuff, so it's not that surprising that they wind up belly-aching.

There are some hints, though, that the "Other Stuff" has been invented, or at least discussed, by the design team. That leads some to speculate that there might be Canonical business reasons not to disclose these new ideas. That runs counter to how some people believe that community distributions should be run. There is concern that important distribution decisions are being taken out of the hands of the community. Shuttleworth doesn't completely shy away from that characterization, while noting that there is room for more experts on the decision-making teams:

We have a kernel team, and they make kernel decisions. You don't get to make kernel decisions unless you're in that kernel team. You can file bugs and comment, and engage, but you don't get to second-guess their decisions. We have a security team. They get to make decisions about security. You don't get to see a lot of what they see unless you're on that team. We have processes to help make sure we're doing a good job of delegation, but being an open community is not the same as saying everybody has a say in everything.

This is a difference between Ubuntu and several other community distributions. It may feel less democratic, but it's more meritocratic, and most importantly it means (a) we should have the best people making any given decision, and (b) it's worth investing your time to become the best person to make certain decisions, because you should have that competence recognised and rewarded with the freedom to make hard decisions and not get second-guessed all the time.

But the secrecy, and the way that these decisions have been handled, have led some to wonder whether there is an autocratic element at play. Atel Apsfej wondered about Shuttleworth's credentials: "Who certified him an expert designer? He may be passionate about design but it doesn't automatically make him good at it." Further, Apsfej thinks that the Ubuntu community has the responsibility to push back:

[Who's] in a position to tell him his designs are bad if not the external Ubuntu community? You can't really expect Canonical employees to go toe-to-toe with him when he's made up his mind. That's the problem with organizational structures that are built on cults-of-personality... the lines between what it means to be a meritocracy and an autocracy get a little blurry.

While Apsfej was one of the harsher critics, his points seem to sum up the concerns of quite a few commenting on the thread. There is concern that Shuttleworth is not quite meeting the transparency promises that he has made. As Ubuntu matures, and fixing Bug #1 ("Microsoft has a majority market share") becomes more and more important, is there a need for Shuttleworth and Canonical to take a stronger hand on the reins? Apsfej explains the difference, though in characteristically stark terms:

Ubuntu is utterly and completely Shuttleworth's baby. If he wants to collaborate with the community that has been drawn into the project's promise of transparency..then he should make good on that promise and be transparent and communicate about plans. If he wants to be Steve Jobs 2.0 and wow potential consumers with innovative product offerings born from behind closed doors with no community input then he can be that instead. He just needs to decide be consistent about how he wants to interact with the Ubuntu community. Consumer or collaborators...his choice.

For his part, Shuttleworth does recognize that mistakes were made in how the design team made this change. The change is not fixed in stone, and may be reverted before the final release of Lucid. But he is not concerned about shipping a change like this in an LTS release: "If I'm confident that 10.10, 11.04 and future releases will have the controls on the left, it makes even more sense to do it now (because the LTS will then not look dated compared to newer releases)". He notes the precedent of shipping Firefox 3.0 beta for the 8.04 LTS release, which "caused an uproar but was the right decision given that 2.0 was nearing its end of life at the time".

There are risks to any change, and Shuttleworth is cognizant of those, but he also sees big opportunities:

Look, I understand this is risky. In my judgment, it's worth the risk. Being able to tackle risky things is one of the things that gives us the chance to catch up to the big guys, and beat them. That doesn't mean we should be cavalier, but I'm not going to shy away from an opportunity to do something much better now just because Microsoft did something a particular way 20 years ago.

In the end, though, Shuttleworth is defending how decisions are made in and for Ubuntu, including this one. Because it affects every window that people use, and is thus in their faces many times a day, the level of outrage got particularly high. But that kind of backlash can't stop the decision makers:

Ubuntu is plenty big enough that there is an area where anybody can make themselves an expert, take on responsibility, and lead. But it's also big enough that if we try to make everybody feel like they can weigh in on *every* decision, we'll grind to a halt.

This is a flashpoint, but most decisions are not as contentious as this one. I'm backing this decision because I think it's the right one in the long term. It may be right, it may be wrong, but I have a mandate to take the decision. The same is true of our kernel lead, and our community governance leads. They are fallible (I certainly am) but they are nevertheless empowered to take decisions.

It is unfortunate that, for whatever reason, more details about the future plans for the right-hand side of a window's title bar are not available. One gets the sense that much of the anger and unhappiness that was spewed into the bug report would have been lessened, perhaps greatly, by a better communication of the "Cool New Feature" that may wind up there. Presumably in time we will see what the plans are and can judge at that point whether the secrecy was worth it. For now it seems to have gotten a lot of people up in arms, possibly without a very good reason.

Automated trading using Marketcetera

By Jake Edge
March 24, 2010

While Linux and free software have found a place in various stock exchanges and other financial firms, most of the code that is run on those systems is proprietary. Financial companies are not terribly interested in sharing their secret trading strategies with their competitors. But much of the underlying code—communicating with brokerage systems to buy and sell stocks and other financial instruments—doesn't necessarily need to be secret. Marketcetera, a San Francisco startup, is applying free software principles to its trading platform to allow more organizations, especially smaller, less well-heeled ones, to get access to the same algorithmic trading channels that the larger firms use.

[PhotonGUI]

First announced in January 2009, the Marketcetera Automated Trading Platform (MATP) is a GPLv2-licensed system for handling market data as an input and producing buy/sell orders as its output. In between, "strategy engines" can be used to decide what those orders should be, and which brokers to route them to. There is a sophisticated "signal analysis" module that filters and analyzes the incoming data stream of market-related information (bid, ask, executed trades, etc.) for use by the strategies. An "order routing server" handles bidirectional communication with brokers to submit orders. In addition, orders and their outcomes are stored in a MySQL database for record keeping purposes.

Overall, it is a rather complex Java application with lots of moving parts, many of which require interfacing with external organizations—some of whom are understandably rather picky about who they talk to. MATP uses the Financial Information Exchange (FIX) protocol both for retrieving market data and for sending and receiving order information. That allows users to connect to any FIX-enabled services and brokers, but there is often a "catch": these organizations typically require certification of the FIX connector before allowing orders to be entered into their system.
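For readers who have not seen FIX on the wire, a message is a flat sequence of tag=value fields separated by the SOH (0x01) byte, framed by a body length and a trailing checksum. The Python sketch below builds a heavily simplified "new order" message to show that framing. It is a generic illustration of the protocol, not Marketcetera code; the helper name is hypothetical, and the message omits the session-level fields (sender, target, sequence number, timestamps, order ID) that a real broker would require.

```python
SOH = "\x01"  # FIX field delimiter


def build_fix_message(begin_string, fields):
    """Frame (tag, value) pairs with BeginString, BodyLength, and CheckSum."""
    body = "".join(f"{tag}={value}{SOH}" for tag, value in fields)
    # BodyLength (tag 9) counts the bytes after its own field, up to the checksum.
    head = f"8={begin_string}{SOH}9={len(body.encode('ascii'))}{SOH}"
    # CheckSum (tag 10) is the byte sum of everything before it, modulo 256.
    checksum = sum((head + body).encode("ascii")) % 256
    return f"{head}{body}10={checksum:03d}{SOH}".encode("ascii")


order = build_fix_message("FIX.4.2", [
    (35, "D"),    # MsgType: New Order Single
    (55, "IBM"),  # Symbol
    (54, "1"),    # Side: buy
    (38, "100"),  # OrderQty
    (40, "1"),    # OrdType: market
])

# Show the message with a visible separator in place of SOH.
print(order.replace(b"\x01", b"|").decode("ascii"))
```

The format itself is simple; the certification hurdle described above concerns how a connector behaves over a whole session, which a sketch like this does not attempt to cover.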

Users can go through that—presumably somewhat costly—process themselves, or they can get access to "pre-certified" connectors as part of a Marketcetera support contract. Like many free software businesses, Marketcetera's business model is centered around support and consulting. In addition, there is an element of the "open core" strategy, with "extras" that can be purchased to run on top of the core, including the FIX connectors and various market data adapters.

The core is a rather large chunk of code, however, with some fairly significant functionality. It is not exactly a turn-key system because the needs of each user are so different. The company seems to understand the benefits of free (or open source) software, touting them in its FAQ. Avoiding vendor lock-in is a key piece of that, as Managing Director Roy Agostino describes:

We believe open source accelerates customer adoption of our software. Most firms have a plethora of systems in their shop and they continually run afoul of technology not built on open standards, open APIs, etc. They have additional concerns about a chronic inability to get consulting resources from proprietary vendors to make changes they require along their business timelines. Open source changes those paradigms, and present customers with an alternative that is open, has a much lower price point, and affords them the flexibility to jump right in (or have a trusted consulting resource jump right in) and make the changes they require.

Of course, there is also the community aspect of free software. Marketcetera has a separate domain (marketcetera.org) for its community portal. Unfortunately, much of the information available in the portal requires logging in after registering with the site. It's not exactly clear how that promotes community.

At the portal, logged-in visitors will find documentation, both for users and developers, a support forum, bug/feature tracker, and the like, but also the Marketcetera Open Labs. The labs are a Sourceforge-like site for community members to start new, related projects for "platform extensions, modules, new components, unofficial HOWTOs" and so on. Several projects for things like a CSV market data adapter, chart module, strategy authoring, and a few others have already been started in the labs.

For contributions to the core, Marketcetera has a contributor agreement that is based on MySQL's. It requires transferring the copyright of the code to Marketcetera and receiving a "broad license to re-use and distribute your contribution". It also requires granting a patent license to the company and its users for any patents that are held by the contributor on the contribution. There is a web page that describes some ideas of things to work on along with instructions on the logistics of code contribution.

Installation is fairly straightforward, though not very well integrated with Linux. To start with, you must be logged in to access the software, which comes in a 220MB tar file—containing a single 220MB shell script. In keeping with what seems to be a tradition in Java program installation, the Marketcetera code bundles a Java Runtime Environment (JRE) and MySQL into the installation. The installer gives a minor complaint if it isn't run on Ubuntu 8.04, for which it was tested, but it seemed to install and run just fine on Fedora 12 as there aren't many external dependencies.

There are actually two downloads available, the server components described above and the Photon GUI for order entry and strategy authoring. Photon is based on the Eclipse rich client platform (RCP), so its use will be familiar to anyone who has used Eclipse. It comes as a more traditional, though still 75MB, tarball. The RCP was chosen at least partially because Photon provides a way to develop strategies in Java or Ruby, which can be tested and run from within the GUI.

Photon also has order entry capabilities, so that user-initiated (rather than strategy-initiated) trades can be entered. Market data can be retrieved, filtered, and displayed in a variety of ways. There is a web browser component for looking at financial and other sites. Retrieving information on trades from the database is also possible in Photon. It is essentially a control panel for the entire system.

This is clearly a niche project, but in a rather large industry—at least as measured by revenues. Algorithmic, high-frequency trading is becoming ever more prevalent, so reducing the barriers to entry will allow more, and smaller, firms to get involved. Given the recent financial meltdown, that may not necessarily be a good thing, but keeping these tools in the hands of those who were "too big to fail" didn't work out all that well either.

Resetting PHP 6

By Jonathan Corbet
March 24, 2010

Rightly or wrongly, many in our community see Perl 6 as the definitive example of vaporware. But what about PHP 6? This release was first discussed by the PHP core developers back in 2005. There have been books on the shelves purporting to cover PHP 6 since at least 2008. But, in March 2010, the PHP 6 release is not out - in fact, it is not even close to out. Recent events suggest that PHP 6 will not be released before 2011 - if, indeed, it is released at all.

PHP 6 was, as befits a major release, meant to bring some serious changes to the language. To begin with, the safe_mode feature, which is the whipping boy for PHP security - or the lack thereof - was to be consigned to an unloved oblivion; the "register_globals" feature was to go as well. The proposed traits feature would bring "horizontal reuse" to the language; think of traits as a PHPish answer to multiple inheritance or Java's interfaces. A new 64-bit integer type was planned. PHP was slated to gain goto-like functionality (though the plan was to avoid the scary goto name and add target labels to break instead). Some basic static typing features were under consideration. There was even talk of adding namespaces to the language and making function and class names case-sensitive.

The really big change in PHP 6, though, was the shift to Unicode throughout. Anybody who is running a web site which does not use Unicode is almost certainly wishing that things were otherwise - trust your editor on this one. It is possible to support Unicode to an extent even if the language in use is not aware of Unicode, but it is a painful and error-prone affair; proper Unicode support requires a language which understands Unicode strings. The PHP 6 plan was to support Unicode all the way:

PHP6 will have Unicode support everywhere; in the engine, in extensions, in the API. It's going to be native and complete; no hacks, no external libraries, no language bias. English is just another language, it's not the primary language.

Unicode, however, appears to be the rock upon which the PHP 6 ship ran aground. Despite claims back in 2006 that the development process was "going pretty well," it seems that few people are happy with the state of Unicode support in PHP. Memory usage is high, performance is poor, and broken scripts are common. The project has been struggling for some time to find a solution to this problem.

From your editor's reading of the discussion, the fatal mistake would appear to be the decision to use the UTF-16 encoding, which needs at least two bytes for every character, for all strings within PHP. According to PHP creator Rasmus Lerdorf, this decision was made to ease compatibility with the International Components for Unicode (ICU) library:

Well, the obvious original reason is that ICU uses UTF-16 internally and the logic was that we would be going in and out of ICU to do all the various Unicode operations many more times than we would be interfacing with external things like MySQL or files on disk. You generally only read or write a string once from an external source, but you may perform multiple Unicode operations on that same string so avoiding a conversion for each operation seems logical.

But a lot of strings simply pass through PHP programs; in the end, the conversion turned out to be more expensive and less convenient than had been hoped. Johannes Schlüter describes the problem this way:

By using UTF-16 as default encoding we'd have to convert the script code and all data passed from or to the script (request data, database results, output, ...) from another encoding, usually UTF-8, to UTF-16 or back. The need for conversion doesn't only require CPU time and more memory (a UTF-16 string takes double memory of a UTF-8 string in many cases) but makes the implementation rather complex as we always have to figure out which encoding was the right one for a given situation. From the userspace point of view the implementation brought some backwards compatibility breaks which would require manual review of the code.

These all are pains for a very small gain for many users where many would be happy about a tighter integration of some mbstring-like functionality. This all led to a situation for many contributors not willing to use "trunk" as their main development tree but either develop using the stable 5.2/5.3 trees or refuse to do development at all.
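The memory cost mentioned in that message is easy to see for oneself. The following Python sketch (not PHP, and not taken from the discussion) compares the encoded size of a few sample strings; the sample strings and the helper name are illustrative assumptions.

```python
def encoded_sizes(text):
    """Return the byte length of text in UTF-8 and in UTF-16 (no byte-order mark)."""
    return len(text.encode("utf-8")), len(text.encode("utf-16-le"))


samples = {
    "ASCII": "Hello, world",
    "Accented Latin": "Schlüter",
    "Japanese": "こんにちは",
}

for label, text in samples.items():
    utf8_bytes, utf16_bytes = encoded_sizes(text)
    print(f"{label}: {utf8_bytes} bytes as UTF-8, {utf16_bytes} bytes as UTF-16")
```

ASCII-heavy data, which covers most markup and script source, doubles in size under UTF-16. Text in scripts such as Japanese can actually come out smaller, which is why the quoted message says "in many cases" rather than always.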

The end result of all this is that PHP 6 development eventually stalled. The Unicode problems made a release impossible while blocking other features from showing up in any PHP release at all. Eventually some work was backported to 5.3, but that is always a problematic solution; it brings back memories of the 2.5 kernel development series.

Developer frustration, it seems, grew for some time. Last November, Kalle Sommer Nielsen tried to kickstart the process, saying:

I've been thinking for a while what we should do about PHP6 and its future, because right now it seems like there isn't much future in it.

Things came to a head on March 11, when Jani Taskinen, fed up with being unable to push things forward, (1) committed some disruptive changes to the stable 5.3 branch, and (2) created a new PHP_5_4 branch which looked like it was meant to be a new development tree. That is when Rasmus stepped in:

The real decision is not whether to have a version 5.4 or not, it is all about solving the Unicode problem. The current effort has obviously stalled. We need to figure out how to get development back on track in a way that people can get on board. We knew the Unicode effort was hugely ambitious the way we approached it. There are other ways.

So I think Lukas and others are right, let's move the PHP 6 trunk to a branch since we are still going to need a bunch of code from it and move development to trunk and start exploring lighter and more approachable ways to attack Unicode.

And that is where it stands. The whole development series which was meant to be PHP 6 has been pushed aside to a branch, and development is starting anew based on the 5.3 release. Anything of value in the old PHP 6 branch can be cherry-picked from there as need be, but the process of what is going into the next release is beginning from scratch, and one assumes that proposals will be looked at closely. There are no timelines or plans for the next release at this point; as Rasmus explains, that's not what the project needs now:

We don't need timelines right now. What we need is some hacking time and to bring some fun back into PHP development. It hasn't been fun for quite a while. Once we have a body of new interesting stuff, we can start pondering releases...

So timing and features for the next PHP release are completely unknown at this point. Even the name is unknown; Jani's 5.4 branch has been renamed to THE_5_4_THAT_ISNT_5_4. There has been some concern about all of those PHP 6 books out there; it has been suggested that a release which doesn't conform to expectations for PHP 6 should be called something else - PHP7, even. There's little sympathy for the authors and publishers of those books, but those who bought them may merit a little more care. But that will be a discussion for another day. Meanwhile, the PHP hackers are refocusing on getting things done and having some fun too.

Evi Nemeth (an Ada Lovelace day tribute)

Sometime around 1981, when your editor was an undergraduate at the University of Colorado, he was introduced to the Computer Science department's prized VAX 11/780, which, at that time, was a dual boot system, switching between VMS and an early BSD Unix every afternoon. The chief of the Unix side was Evi Nemeth. The first thing that struck most people about Evi was a general sense of distraction and disorganization; it's only later that one realized that she was one of those smart people who make things happen.

[Evi Nemeth] Evi armed your editor with a dog-eared edition of the K&R C book and access to the Unix source. She encouraged the writing of a fix to the system's memory management code, which tended to let one memory hog take over the system - a bad feature on a multiuser computer. That "fix" has, happily, vanished from living memory, and any backups which exist will no longer be readable. But the pure fun of being able to dig into the operating system code lasts to this day.

These days, Evi lists her office as being "my sailboat, Wonderland, somewhere in the Caribbean." She has a relatively low profile in the Linux community, despite being one of the authors of (and the inspiration behind) the Linux Administration Handbook, but the USENIX crowd knows her well. Her time at CU launched a whole generation of hackers who are in the field for the joy of it, and every one of them thinks back fondly to one of the people who got them started. Well done, Evi; you helped make all this happen.

Page editor: Jonathan Corbet

Security

Should web developers say no to cookie-based authentication?

March 24, 2010

This article was contributed by Nathan Willis

Timothy D. Morgan of Virtual Security Research (VSR) has written a paper proposing a system for web applications to authenticate users that offers significantly better security than the common practice of storing session IDs in cookies. Morgan's proposal is based on the existing (but seldom-used) HTTP Digest Access Authentication method, so it could be implemented with very little effort in existing web browsers and web servers. He argues that digest authentication's lack of popularity is primarily due to inflexible implementations that web developers find inconvenient compared to the simplicity of using cookies, and suggests some changes that could make it more appealing.

The paper, "Weaning the Web off of Session Cookies: Making Digest Authentication Viable," is available as a PDF from the VSR web site. Morgan begins by outlining the established security problems with using cookies to store session IDs on the user's web browser. In a typical scheme, a user authenticates to a web application via a username/password combination entered through an HTML form, after which a session key is sent back to the user's browser and stored in a cookie, the value of which is either generated pseudorandomly or is some value that is encrypted server-side.

Even if the communication channel uses SSL/TLS, however, there are security problems with this approach. The pseudorandom number generator may be predictable or the encryption algorithm insecure, allowing an attacker to spoof the session. Worse, such session IDs can be exposed in URLs, and captured in referrer logs, proxy logs, and browser histories. In addition, many popular web application frameworks set a session cookie before requiring login, then subsequently "upgrade" the session to an authenticated state after redirecting the user to a login page — but using the same cookie. An attacker can hijack such a session simply by visiting the HTTP site before the login takes place and recording the session ID.
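Two of these problems have well-known mitigations within the cookie model: drawing session IDs from a cryptographically strong source, and issuing a fresh ID at login instead of upgrading the pre-login one. The Python sketch below illustrates both. It is a generic illustration rather than anything proposed in Morgan's paper, and the SessionStore class is hypothetical.

```python
import secrets


class SessionStore:
    """Toy in-memory session table keyed by unpredictable IDs."""

    def __init__(self):
        self._sessions = {}  # session ID -> session data

    def create(self):
        # secrets draws from the operating system's CSPRNG.
        session_id = secrets.token_urlsafe(32)
        self._sessions[session_id] = {"user": None}
        return session_id

    def login(self, old_session_id, user):
        """Move the session data to a fresh ID and invalidate the old one."""
        data = self._sessions.pop(old_session_id, {})
        data["user"] = user
        new_session_id = secrets.token_urlsafe(32)
        self._sessions[new_session_id] = data
        return new_session_id

    def user_for(self, session_id):
        return self._sessions.get(session_id, {}).get("user")


store = SessionStore()
anonymous_id = store.create()
authenticated_id = store.login(anonymous_id, "alice")
print(store.user_for(anonymous_id), store.user_for(authenticated_id))
```

An attacker who recorded the pre-login ID is left holding a value that no longer maps to any session.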

Morgan also points out several flaws with cookie-based session IDs at the protocol level. First, a cookie originally sent on HTTPS will also be sent over HTTP on subsequent requests, unless the web developer takes the proactive step of setting the "secure" flag on the cookie, which few application frameworks do. Second, by default stored cookies are available to client-side JavaScript, making them vulnerable to theft via cross-site scripting attacks. Internet Explorer and Firefox both feature an "HttpOnly" cookie flag to block this behavior, but other browsers do not, and few session-management frameworks have adopted it. Failure to automatically time out sessions and poorly-implemented single sign-on (SSO) methods also make many cookie-based session management schemes vulnerable.
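Setting the two flags is usually a matter of a line or two once a framework exposes them. As a minimal sketch, Python's standard http.cookies module can produce a suitable Set-Cookie value; the cookie name, the sample session ID, and the helper function are illustrative assumptions.

```python
from http.cookies import SimpleCookie


def session_cookie_header(session_id):
    """Build a Set-Cookie value carrying the Secure and HttpOnly flags."""
    cookie = SimpleCookie()
    cookie["session"] = session_id
    cookie["session"]["secure"] = True    # only ever send over HTTPS
    cookie["session"]["httponly"] = True  # hide from client-side JavaScript
    cookie["session"]["path"] = "/"
    return cookie["session"].OutputString()


header_value = session_cookie_header("example-session-id")
print("Set-Cookie:", header_value)
```

Both flags are only requests to the browser, so they help only where the browser honors them.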

Digest

Morgan then explains the basics of HTTP Digest Authentication, which was introduced with HTTP 1.1 in RFC 2069, and updated in RFC 2617. Digest authentication is an improvement over HTTP Basic Access Authentication, which merely encodes the username/password combination in base64, a trivially reversible encoding, so the credentials reach the server effectively in plaintext.
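To see how little protection base64 offers, consider this short Python sketch, which builds a Basic Authorization header value and then recovers the credentials from it. The function names and the sample credentials are illustrative.

```python
import base64


def basic_auth_header(username, password):
    """Build the value of an HTTP Basic Authorization header."""
    raw = f"{username}:{password}".encode("utf-8")
    return "Basic " + base64.b64encode(raw).decode("ascii")


def decode_basic_auth(header):
    """Recover the credentials from a Basic header; no secret is needed."""
    _, _, token = header.partition(" ")
    username, _, password = base64.b64decode(token).decode("utf-8").partition(":")
    return username, password


header = basic_auth_header("alice", "s3cret")
print(header)
print(decode_basic_auth(header))
```

Anyone who can observe the request can run the second function, so Basic authentication is only as private as the channel that carries it.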

Digest authentication is significantly more secure, because it computes a cryptographic hash based on the username, password, HTTP authentication realm, a server-provided nonce, the URI requested, the request method and (optionally) the request body. The RFC 2617 revision also includes a nonce count and a client nonce to further protect the integrity of the request.
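As a concrete illustration, the response hash for RFC 2617's "auth" quality-of-protection mode can be sketched in a few lines of Python. The sample inputs are the ones used in the RFC's own worked example; the function names are hypothetical, and the sketch leaves out the "auth-int" variant that also hashes the request body.

```python
import hashlib


def md5_hex(text):
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def digest_response(username, realm, password, method, uri,
                    nonce, nc, cnonce, qop="auth"):
    """Compute the RFC 2617 'response' value for qop=auth."""
    ha1 = md5_hex(f"{username}:{realm}:{password}")  # who is asking
    ha2 = md5_hex(f"{method}:{uri}")                 # what is being asked for
    # The server nonce, nonce count, and client nonce bind the hash to one request.
    return md5_hex(f"{ha1}:{nonce}:{nc}:{cnonce}:{qop}:{ha2}")


response = digest_response(
    username="Mufasa",
    realm="testrealm@host.com",
    password="Circle Of Life",
    method="GET",
    uri="/dir/index.html",
    nonce="dcd98b7102dd2f0e8b11d0f600bfb0c093",
    nc="00000001",
    cnonce="0a4f113b",
)
print(response)
```

The password itself never crosses the wire; the server repeats the same computation from its own copy of the credentials and compares the results.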

The primary reason that digest authentication is not popular with web developers, Morgan says, is that it does not integrate into application and site design. All major web browsers implement HTTP basic and digest authentication in the same way, by launching a generic, modal pop-up window prompting the user for his or her username and password. The pop-up cannot be integrated into page design, nor customized, which makes it unappealing to developers. In addition to that, when using digest authentication there is no established method for the application to log the user out (which is a security risk of its own).

Finally, the current version of HTTP digest authentication specifies the MD5 hash algorithm, which is known to be vulnerable to preimage computation — although using the RFC 2617 mode makes such an attack impractical by incorporating the client-side nonce in its response.

The proposed improvements

Morgan suggests three methods to make digest authentication more accessible — and thus, useful — to web application developers.

The first is to use AJAX to take a username/password combination from an HTML form and generate the HTTP digest authentication request, which preserves the developer's ability to customize and control the login page. Making this method work seamlessly, however, would require a change to the way applications respond if the username/password fails. Most web servers send a 401 error code, which causes today's browsers to automatically open a pop-up window, thus negating the work to integrate the authentication into the page. If the server returned a different error code, however, that problem could be avoided.

More difficult is how to provide a system with which the application can trigger a logout. Morgan observes that when most popular browsers receive the 401 Unauthorized error code, they immediately launch the authentication pop-up window, regardless of whether the user was already logged into the site. But this behavior is not specified in the HTTP standard; if browsers simply checked for existing credentials in the cache, a 401 could be used to trigger a logout in the event that the user is logged in, and prompt for credentials if the user is not logged in.
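The suggested browser behavior can be modeled in a few lines. This is only a sketch of the idea as summarized here, not code from the paper or from any browser; the function name handle_401 and the dictionary used as a credential cache are assumptions made for the example.

```python
def handle_401(credential_cache: dict, realm: str) -> str:
    """Model the proposed reaction to a 401 response for one realm."""
    if realm in credential_cache:
        # The user was logged in: treat the 401 as a logout request.
        del credential_cache[realm]
        return "logged-out"
    # No cached credentials: ask the user to authenticate.
    return "prompt"


cache = {"example-app": ("alice", "secret")}
first = handle_401(cache, "example-app")    # cached credentials are dropped
second = handle_401(cache, "example-app")   # nothing cached, so prompt
print(first, second)
```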

He also suggests another solution, a new HTTP header called "Authentication-Control" that could be used to terminate a session from the web server.

Practical problems

The paper outlines several practical problems that would come with attempting to migrate to digest authentication for session management. To begin with, digest authentication is so rarely used that its implementation in popular web servers and web browsers is immature. Just as bad, today's browsers have weak, bare-bones user interfaces for HTTP authentication. Password managers do not differentiate between credentials stored for HTTP sites and those stored for HTTPS, making man-in-the-middle attacks possible.

The authentication pop-up windows are also sparse in their presentation, offering confusing messages that make phishing attacks possible. Morgan shows example windows from Internet Explorer, Firefox, Safari and Opera, in which the "realm" value sent in the authentication request is used to display an intentionally-misleading message to the user.

Finally, Morgan notes, because application developers have relied on cookie-based session management for several years, they have become accustomed to the application framework handling session management. Switching to digest authentication would mean relying on the web server to manage the session authentication, and that change could meet with resistance.

Reaction

Morgan posted news of the paper to Bugtraq, Full Disclosure, and several other security mailing lists in January. Reaction was mixed; while most agreed with the technical arguments, some thought that the paper did not explain how important web application functionality, namely SSO, could be made to work with digest authentication. The strongest disagreements, however, came from those who argued that the amount of coding and refactoring required to change web application frameworks' authentication systems makes the entire argument moot.

The proposal did receive a friendlier reception in mozilla.dev.security, however, where Mozilla's Dan Veditz pointed out similar work on improving web authentication going on at Mozilla Labs.

Will web developers be weaned off of their cookie addiction? Presumably that hinges on whether the popular application frameworks undertake the task of replacing cookies as a session-tracking tool. Morgan argues in the paper that because cookies are widely used for many general-purpose tasks, asking developers to implement secure authentication correctly on top of them is a losing battle: other concerns, from marketing to SSO, will always be more popular, and get the lion's share of developer attention. But it is interesting to note that an almost-complete, more secure solution already exists in the standards. Perhaps decoupling authentication and cookies would be a quicker process than the naysayers believe.


Brief items

Blaze: The Spy in the Middle

Matt Blaze looks at the business of SSL man-in-the-middle attacks. "A paper published today by Chris Soghoian and Sid Stamm suggests that the threat may be far more practical than previously thought. They found turnkey surveillance products, marketed and sold to law enforcement and intelligence agencies in the US and foreign countries, designed to collect encrypted SSL traffic based on forged 'look-alike' certificates obtained from cooperative certificate authorities. The products, available only to government agencies, appear sophisticated, mature, and mass-produced, suggesting that 'certified man-in-the-middle' web surveillance is at least commonplace and widespread enough to support an active vendor community."


New vulnerabilities

asterisk: denial of service

Package(s): asterisk
CVE #(s): CVE-2010-0441
Created: March 23, 2010
Updated: April 1, 2010
Description: From the CVE entry:

Asterisk Open Source 1.6.0.x before 1.6.0.22, 1.6.1.x before 1.6.1.14, and 1.6.2.x before 1.6.2.2, and Business Edition C.3 before C.3.3.2, allows remote attackers to cause a denial of service (daemon crash) via an SIP T.38 negotiation with an SDP FaxMaxDatagram field that is (1) missing, (2) modified to contain a negative number, or (3) modified to contain a large number.

Alerts:
Fedora FEDORA-2010-3381 2010-03-03
Fedora FEDORA-2010-3724 2010-03-06


curl: denial of service

Package(s): curl
CVE #(s): CVE-2010-0734
Created: March 22, 2010
Updated: March 6, 2012
Description: From the Mandriva advisory:

content_encoding.c in libcurl 7.10.5 through 7.19.7, when zlib is enabled, does not properly restrict the amount of callback data sent to an application that requests automatic decompression, which might allow remote attackers to cause a denial of service (application crash) or have unspecified other impact by sending crafted compressed data to an application that relies on the intended data-length limit.

Alerts:
Ubuntu USN-1158-1 2011-06-24
rPath rPSA-2010-0072-1 2010-10-27
CentOS CESA-2010:0329 2010-04-06
Red Hat RHSA-2010:0329-01 2010-03-30
Red Hat RHSA-2010:0273-05 2010-03-30
Pardus 2010-43 2010-03-29
Debian DSA-2023-1 2010-03-28
Mandriva MDVSA-2010:062 2010-03-19
Gentoo 201203-02 2012-03-05


glpi: cross-site scripting

Package(s): glpi
CVE #(s): (none listed)
Created: March 24, 2010
Updated: March 24, 2010
Description: GLPI suffers from a mysterious cross-site scripting problem in the embedded phpCAS library.
Alerts:
Fedora FEDORA-2010-5106 2010-03-23
Fedora FEDORA-2010-5188 2010-03-23


ikiwiki: cross-site scripting

Package(s): ikiwiki
CVE #(s): (none listed)
Created: March 22, 2010
Updated: March 24, 2010
Description: From the Debian advisory:

Ivan Shmakov discovered that the htmlscrubber component of ikiwiki, a wiki compiler, performs insufficient input sanitization on data:image/svg+xml URIs. As these can contain script code this can be used by an attacker to conduct cross-site scripting attacks.

Alerts:
Debian DSA-2020-1 2010-03-20


krb5: denial of service

Package(s): krb5
CVE #(s): CVE-2010-0628
Created: March 24, 2010
Updated: March 30, 2010
Description: From the Ubuntu advisory: Nalin Dahyabhai, Jan iankko Lieskovsky, and Zbysek Mraz discovered that Kerberos did not correctly handle certain GSS packets. An unauthenticated remote attacker could send specially crafted traffic that would cause services using GSS-API to crash, leading to a denial of service.
Alerts:
SuSE SUSE-SR:2010:007 2010-03-30
Fedora FEDORA-2010-4677 2010-03-16
Ubuntu USN-916-1 2010-03-23


mediawiki: information disclosure

Package(s): mediawiki
CVE #(s): (none listed)
Created: March 24, 2010
Updated: March 24, 2010
Description: Mediawiki suffers from a couple of information disclosure vulnerabilities. Editors are allowed to display external images on web pages, making browser information available on the image server. An insufficient permission check allows restricted image files to be viewed more widely than intended.
Alerts:
Debian DSA-2022-1 2010-03-23


php5: denial of service

Package(s): php5
CVE #(s): CVE-2010-0397
Created: March 19, 2010
Updated: December 2, 2010
Description: From the Debian advisory:

Auke van Slooten discovered that PHP 5, an hypertext preprocessor, crashes (because of a NULL pointer dereference) when processing invalid XML-RPC requests.

Alerts:
CentOS CESA-2010:0919 2010-12-01
CentOS CESA-2010:0919 2010-11-30
Red Hat RHSA-2010:0919-01 2010-11-29
SUSE SUSE-SR:2010:017 2010-09-21
Ubuntu USN-989-1 2010-09-20
openSUSE openSUSE-SU-2010:0599-1 2010-09-10
Fedora FEDORA-2010-11428 2010-07-27
Fedora FEDORA-2010-11481 2010-07-27
Pardus 2010-104 2010-08-09
Mandriva MDVSA-2010:140 2010-07-27
Mandriva MDVSA-2010:139 2010-07-27
SuSE SUSE-SR:2010:012 2010-05-25
SuSE SUSE-SR:2010:013 2010-06-14
Pardus 2010-44 2010-03-29
Mandriva MDVSA-2010:068 2010-03-27
Debian DSA-2018-1 2010-03-18


puppet: temporary file vulnerability

Package(s): puppet
CVE #(s): CVE-2009-3564
Created: March 24, 2010
Updated: March 24, 2010
Description: Puppet contains a temporary file vulnerability which can be exploited by a local user to overwrite arbitrary files.
Alerts:
Ubuntu USN-917-1 2010-03-24
Gentoo 201203-03 2012-03-05


qt: multiple vulnerabilities

Package(s): qt
CVE #(s): CVE-2010-0046 CVE-2010-0049 CVE-2010-0050 CVE-2010-0051 CVE-2010-0651 CVE-2010-0052 CVE-2010-0054 CVE-2010-0047 CVE-2010-0048 CVE-2010-0053
Created: March 23, 2010
Updated: March 27, 2012
Description: From the Fedora advisory:

This update fixes several WebKit security issues:

  • CVE-2010-0046: CSS format() argument memory corruption

  • CVE-2010-0049: Use of free()d line boxes in mixed LTR/RTL text

  • CVE-2010-0050: Crash at HTMLParser after handling misnested style tags

  • CVE-2010-0051 (CVE-2010-0651): Remote information disclosure

  • CVE-2010-0052: Cached page can result in accessing a destroyed HTMLInputElement

  • CVE-2010-0054: Use of stale HTMLImageElement pointer.

Alerts:
Fedora FEDORA-2011-16151 2011-11-19
Mandriva MDVSA-2011:039 2011-03-02
SUSE SUSE-SR:2011:002 2011-01-25
openSUSE openSUSE-SU-2011:0024-1 2011-01-12
Fedora FEDORA-2010-8379 2010-05-11
Fedora FEDORA-2010-8360 2010-05-11
Fedora FEDORA-2010-4524 2010-03-15
Fedora FEDORA-2010-4518 2010-03-15
Fedora FEDORA-2012-3483 2012-03-26


samba: symbolic link vulnerability

Package(s): samba
CVE #(s): CVE-2010-0926
Created: March 24, 2010
Updated: February 21, 2012
Description: From the Ubuntu advisory: It was discovered the Samba handled symlinks in an unexpected way when both "wide links" and "UNIX extensions" were enabled, which is the default. A remote attacker could create symlinks and access arbitrary files from the server.
Alerts:
SUSE SUSE-SR:2010:014 2010-08-02
SuSE SUSE-SR:2010:008 2010-04-07
SuSE SUSE-SR:2010:007 2010-03-30
Ubuntu USN-918-1 2010-03-24
Red Hat RHSA-2012:0313-03 2012-02-21
Oracle ELSA-2012-0313 2012-03-07


spamass-milter: arbitrary shell execution

Package(s): spamass-milter
CVE #(s): (none listed)
Created: March 22, 2010
Updated: March 24, 2010
Description: From the Debian advisory:

It was discovered a missing input sanitization in spamass-milter, a milter used to filter mail through spamassassin. This allows a remote attacker to inject and execute arbitrary shell commands.

Alerts:
Debian DSA-2021-1 2010-03-22


thunderbird: denial of service

Package(s): thunderbird
CVE #(s): CVE-2009-2470
Created: March 18, 2010
Updated: April 23, 2010
Description:

From the Red Hat advisory:

A flaw was found in the way Thunderbird processed SOCKS5 proxy replies. A malicious SOCKS5 server could send a specially-crafted reply that would cause Thunderbird to crash. (CVE-2009-2470)

Alerts:
Mandriva MDVSA-2010:071 2010-04-23
CentOS CESA-2010:0153 2010-03-26
CentOS CESA-2010:0154 2010-03-17
Red Hat RHSA-2010:0153-02 2010-03-17
Red Hat RHSA-2010:0154-02 2010-03-17
Gentoo 201301-01 2013-01-07


thunderbird: arbitrary code execution

Package(s): thunderbird
CVE #(s): CVE-2010-0163
Created: March 18, 2010
Updated: August 17, 2010
Description:

From the Ubuntu advisory:

Ludovic Hirlimann discovered a flaw in the way Thunderbird indexed certain messages with attachments. A remote attacker could send specially crafted content and cause a denial of service or possibly execute arbitrary code with the privileges of the user invoking the program. (CVE-2010-0163)

Alerts:
CentOS CESA-2010:0499 2010-08-16
Mandriva MDVSA-2010:071 2010-04-23
Fedora FEDORA-2010-7100 2010-04-21
CentOS CESA-2010:0499 2010-07-21
Red Hat RHSA-2010:0499-01 2010-06-22
SuSE SUSE-SR:2010:013 2010-06-14
Debian DSA-2025-1 2010-03-31
Ubuntu USN-915-1 2010-03-18
Gentoo 201301-01 2013-01-07


Page editor: Jake Edge

Kernel development

Brief items

Kernel release status

The current development kernel is 2.6.34-rc2, released (without announcement) on March 20. Many changes have gone in since the -rc1 release; see the short-form changelog for an overview, or see the full changelog for all the details.


Quotes of the week

With netlink you can do whatever you like - it is like ioctl but without the guilt.
-- Neil Brown

What you've created is no longer a single project, it is called a distro, and you're being short-sighted and anti-social to think you can garner more support than all of those individual packages you forked. This is why most developers work upstream and let the goodness propagate down from the top like molten sugar of each granular package on a flan where it is collected from the rich custard channel sitting on a distribution plate below before the big hungry mouth of the consumer devours it and incorporates it into their infrastructure.
-- Zachary Amsden (Thanks to Michael S. Tsirkin)

What happens is that hundreds of bug reports land in my inbox and I get to route them to various maintainers, most of whom don't exist, so warnings keep on landing in my inbox. Please send a mailing address for my invoices.

It would be more practical, more successful and quicker to hunt down the miscreants and send them rude emails. Plus it would save you money.

-- Andrew Morton

I guess you are talking to the wrong person as i actually have implemented ls functionality in the kernel, using async IO concepts and extreme threading ;-) It was a bit crazy, but was also the fastest FTP server ever running on this planet.
-- Ingo Molnar


Ceph distributed filesystem merged for 2.6.34

Linus's allegedly shorter-than-usual merge window has seemingly mutated into one of the longest merge windows in recent times. Along with big trees for the Microblaze and Blackfin architectures and the SCSI subsystem, the kernel has just gained the Ceph distributed filesystem, a high-performance filesystem intended to scale into the petabyte range.


SystemTap 1.2 released

SystemTap 1.2 - a dynamic tracing system for kernel and user space - is out. The summary reads: "prototype perf event and hw-breakpoint probing, security fixes, error tolerance script language extensions, optimizations, tapsets, interesting new sample scripts, kernel versions 2.6.9 through 2.6.34-rc." The support for perf events and hardware breakpoints should make a number of tracing tasks easier.


SSL on kernel.org

John "Warthog9" Hawley has announced the availability of SSL encryption (i.e. https) for kernel.org. The kernel bugzilla, wikis, account requests, and the Patchwork patch tracker have all been defaulted to https via an http redirect. In addition, the www, boot, git, and android.git subdomains of kernel.org can use SSL if the user specifies https in the URL. There are no plans to support SSL for mirrors.kernel.org, because "these machines move a large amount of data to a large number of users and it would be difficult, and memory intensive, to provide SSL for this service." Hawley also notes that Thawte donated signed SSL certificates, which "alleviates a large amount of support effort that self-signed certificates would have incurred".


Piecemeal tracepoints?

By Jake Edge
March 24, 2010

On March 23, Jan Kara proposed a patch that would enable tracepoints selectively for different subsystems at build time. His concern was that debugging one particular area using tracepoints would end up "polluting" other kernel paths with tracepoint checks. Allowing tracing for a particular subsystem, without the potential performance degradation from tracepoint tests in other subsystems, is the goal. But various other kernel hackers saw things differently.

Quite a bit of work has gone into making disabled-but-present tracepoints have a very minimal impact on performance. Frederic Weisbecker described it this way: "each tracepoint is a lightweight thing and induce a tiny overhead, probably hard to notice, and this is going to be even more the case after the jmp label optimization patches." There are lots of benefits to having tracepoints be an "all or none" proposition as well. As part of developing tracepoints, Mathieu Desnoyers thought about and rejected the idea:

When I considered if it was worth it to create such a per-tracepoint group compile-time disabling in the first place, I decided not to do it precisely due to the added-value that comes with the availability of system-wide tracepoints. And I think with the static jump patching, we are now at a point where the overhead is stunningly low.

Ted Ts'o sees that "a lot of the value of tracepoints goes away if people are compiling kernels without them and we need to get a special 'tracing kernel' installed before we can debug a problem". Both Ingo Molnar and Steven Rostedt also agreed, making the prospects for this change rather dim. While piecemeal tracepoints seem attractive at first glance, the value of tracepoints comes, at least partially, from having them all available at once. The belief and hope is that they are built into nearly every kernel, so that when problems arise, they are there, ready to be used.


The end for Video4Linux1

By Jonathan Corbet
March 24, 2010

The Video4Linux1 (V4L1) ABI is deprecated, and has been for a long time; it was ostensibly replaced by Video4Linux2 in the 2.5 development series. But, as has been discovered many times, an ABI is a hard thing to get rid of. So the kernel still supports V4L1 applications; indeed, there are still V4L1-only drivers in current kernels. That situation has persisted for a long time, but it may now be coming to an end.

Hans Verkuil has posted a multi-stage proposal for the removal of V4L1 from the kernel. The first phase involves the conversion of the remaining V4L1 drivers - of which there are several - to the newer ABI. Some of those drivers have since been supplanted by GSPCA and may just be deleted outright. All told, this is a bit of much-needed janitorial work.

Phase 2 may be a bit more controversial, though, in that it calls for the removal of the V4L1 compatibility layer in the kernel. This code allows V4L1 applications to work with V4L2 drivers - most of the time. It was an important bit of backward compatibility support, but it has also helped to delay the updating of a number of old V4L1 applications. Given that these applications do still exist (many distributions still ship xawtv, for example), it might be a bit surprising that this layer is slated for removal, perhaps as soon as 2.6.36.

There are problems with the compatibility layer. It cannot provide access to much of the functionality of contemporary hardware and drivers, it cannot always do the right thing in response to application requests, and it has been a long time since anybody had any interest in maintaining this code. So the V4L developers would like to push it out into user space, and into the libv4l1 library in particular. Supporting old applications would then be a matter of a quick edit (replacing ioctl() calls with v4l1_ioctl(), for example) and a rebuild against the library. Some old applications may be pulled into the V4L project, since their original maintainers have almost certainly long since lost interest.

It's not a perfect solution; old, binary applications will cease to work on newer kernels. It is an ABI break, plain and simple, and it is possible that there will be enough of an uproar to prevent this change from happening in the end. But it may also be that nobody really cares about running binary V4L1 applications on new kernels, and that it is truly time for this old interface to pass into history.


Kernel development news

KVM, QEMU, and kernel project management

By Jonathan Corbet
March 23, 2010

The KVM virtualization subsystem is seen as one of the great success stories of contemporary kernel development. KVM came from nowhere into a situation with a number of established players - both free and proprietary - and promptly found a home in the kernel and in the marketing plans of a number of Linux companies. Both the code and its development model are seen as conforming much more closely to the Linux way of doing things than the alternatives; KVM is expected to be the long-term virtualization solution for Linux. So, one might well wonder, why has KVM been the topic of one of the more massive and less pleasant linux-kernel discussions in some time?

Yanmin Zhang was probably not expecting to set off a flame war with the posting of a patch adding a set of KVM-related commands to the "perf" tool. The value of this patch seems obvious: beyond allowing a host to collect performance statistics on a running guest, it enables the profiling of the host/guest combination as a whole. One can imagine that there would be value to being able to see how the two systems interact.

The problem, it seems, is that this feature requires that the host have access to specific information from the running KVM guest: at a minimum, it needs the guest kernel's symbol table. More involved profiling will require access to files in the guest's namespaces. To this end, Ingo Molnar suggested that life would be easier if the host could mount (read-only) all of the filesystems which were active in the guest. It would also be nice, he said elsewhere, if the host could easily enumerate running guests and assign names to them.

The response he got was "no way." Various security issues were raised, despite the fact that the filesystems on the host would not be world-readable, and despite the fact that, in the end, the host has total control over the guest anyway. Certainly there are some interesting questions, especially when frameworks like SELinux are thrown into the mix. But Ingo took that answer as a statement of unwillingness to cooperate with other developers to improve the usability of KVM, especially on developers' desktop systems. What followed was a sometimes acrimonious and often repetitive discussion between Ingo and KVM developer Avi Kivity, with a small group of supporting actors on both sides.

Ingo's position is that any development project, to be successful, must make life easy for users who contribute code. So, he says, the system should be most friendly toward developers who want to run KVM on their desktop. Beyond that, he claims that a stronger desktop orientation is crucial to our long-term success in general:

I.e. the kernel can very much improve quality all across the board by providing a sane default (in the ext3 case) - or, as in the case of perf, by providing a sane 'baseline' tooling. It should do the same for KVM as well.

If we don't do that, Linux will eventually stop mattering on the desktop - and some time after that, it will vanish from the server space as well. Then, may it be a decade down the line, you won't have a KVM hacking job left, and you won't know where all those forces eliminating your project came from.

Avi, needless to say, sees things differently:

It's a fact that virtualization is happening in the data center, not on the desktop. You think a kvm GUI can become a killer application? fine, write one. You don't need any consent from me as kvm maintainer (if patches are needed to kvm that improve the desktop experience, I'll accept them, though they'll have to pass my unreasonable microkernelish filters). If you're right then the desktop kvm GUI will be a huge hit with zillions of developers and people will drop Windows and switch to Linux just to use it.

But my opinion is that it will end up like virtualbox, a nice app that you can use to run Windows-on-Linux, but is not all that useful.

Ingo's argument is not necessarily that users will flock to the platform, though; what seems to be important is attracting developers. A KVM which is easier to work with should inspire developers to work with it, improving its quality further. Anthony Liguori, though, points out that the much nicer desktop experience provided by VirtualBox has not yet brought in a flood of developers to fix its performance problems.

Another thing that Ingo is unhappy with is the slow pace of improvement, especially with regard to the QEMU emulator used to provide a full system environment for guest systems. A big part of the problem, he says, is the separation between KVM and QEMU, despite the fact that they are fairly tightly-coupled components. Ingo claimed that this separation is exactly the sort of problem which brought down Xen, and that the solution is to pull QEMU into the kernel source tree:

If you want to jump to the next level of technological quality you need to fix this attitude and you need to go back to the design roots of KVM. Concentrate on Qemu (as that is the weakest link now), make it a first class member of the KVM repo and simplify your development model by having a single repo.

From Ingo's point of view, such a move makes perfect sense. KVM is the biggest user of the QEMU project which, he says, was dying before KVM came along. Bundling the two components would allow ABI work to be done simultaneously on both sides of the interface, with simultaneous release dates. Kernel and user-space developers would be empowered to improve the code on both sides of the boundary. Bringing perf into the kernel tree, he says, grew the external developer community from one to over 60 in less than one year. Indeed, integration into the kernel tree is the reason why perf has been successful:

If you are interested in the first-hand experience of the people who are doing the perf work then here it is: by far the biggest reason for perf success and perf usability is the integration of the user-space tooling with the kernel-space bits, into a single repository and project.

Clearly, Ingo believes that integrating QEMU into the kernel tree would have similar effects there. Just as clearly, the KVM and QEMU developers disagree. To them, this proposal looks like a plan to fork QEMU development - though, it should be said, KVM already uses a forked version of QEMU. This fork, Avi says, is "definitely hurting." According to Anthony, moving QEMU into the kernel tree would widen that fork:

We lose a huge amount of users and contributors if we put QEMU in the Linux kernel. As I said earlier, a huge number of our contributions come from people not using KVM.

The KVM/QEMU developers are unconvinced that they will get more developers by moving the code into the kernel tree, and they seem frankly amused by the notion that kernel developers might somehow produce a more desktop-oriented KVM. They see the separation of the projects as not being a problem, and wonder where the line would be drawn; Avi suggested that the list of projects which don't belong in the kernel might be shorter in the end. In summary, they see a system which does not appear to be broken - QEMU is said to be improving quickly - and they believe that "fixing" it by merging repositories is not warranted.

Particular exception was taken to Ingo's assertion that a single repository allows for quicker and better development of the ABI between the components. Slower, says Zachary Amsden, tends to be better in these situations:

This is actually a Good Thing (tm). It means you have to get your feature and its interfaces well defined and able to version forwards and backwards independently from each other. And that introduces some complexity and time and testing, but in the end it's what you want. You don't introduce a requirement to have the feature, but take advantage of it if it is there.

Ingo, though, sees things differently based on his experience over time:

It didn't work, trust me - and i've been around long enough to have suffered through the whole 2.5.x misery. Some of our worst ABIs come from that cycle as well... And you can also see the countless examples of carefully drafted, well thought out, committee written computer standards that were honed for years, which are not worth the paper they are written on.

'extra time' and 'extra bureaucratic overhead to think things through' is about the worst thing you can inject into a development process.

As the discussion wound down, it seemed clear that neither side had made much progress in convincing the other of anything. That means that the status quo will prevail; if the KVM maintainers are not interested in making a change, the rest of the community will be hard-put to override them. Such things have happened - the x86 and x86-64 merger is a classic example - but to override a maintainer in that way requires a degree of consensus in the community which does not appear to be present here. Either that, or a decree from Linus - and he has been silent in this debate.

So the end result looks like this:

Please consider 'perf kvm' scrapped indefinitely, due to lack of robust KVM instrumentation features: due to lack of robust+universal vcpu/guest enumeration and due to lack of robust+universal symbol access on the KVM side. It was a really promising feature IMO and i invested two days of arguments into it trying to find a workable solution, but it was not to be.

Whether that's really the end for "perf kvm" remains to be seen; it's a clearly useful feature that may yet find a way to get into the kernel. But this disconnect between the KVM developers and the perf developers is a clear roadblock in the way of getting this sort of feature merged for now.


Using the TRACE_EVENT() macro (Part 1)

March 24, 2010

This article was contributed by Steven Rostedt

Throughout the history of Linux, people have wanted to add static tracepoints (functions that record data at a specific site in the kernel for later retrieval) to the kernel. Those efforts weren't very successful because of the fear that tracepoints would sacrifice performance. Unlike the Ftrace function tracer, a tracepoint can record more than just the function being entered; it can also record local variables of the function. Over time, various strategies for adding tracepoints have been tried, with varying success, and the TRACE_EVENT() macro is the latest way to add kernel tracepoints.

History

Mathieu Desnoyers worked on adding a very low overhead tracer hook called trace markers. Even though the trace markers solved the performance issue by using cleverly crafted macros, the information that the trace marker would record was embedded at the location in the core kernel as a printf format. This upset several core kernel developers as it made the core kernel code look like debug code was left scattered throughout.

In trying to appease the kernel developers, Mathieu came up with tracepoints. The tracepoint included a function call in the kernel code that, when enabled, would call a callback function, passing the parameters of the tracepoint to that function as if the callback function were called with those parameters. This was much better than the trace markers since it allowed the passing of typecast pointers that the callback functions could dereference, as opposed to the marker interface, which required the callback function to parse a string. With the tracepoint, the callback function could efficiently take whatever it needed from the structures.
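The callback arrangement can be pictured with a small user-space model. The sketch below is written in Python purely for illustration and is not how the kernel implements tracepoints; the Tracepoint and Task classes and the record_switch probe are hypothetical names invented for this example.

```python
from dataclasses import dataclass


@dataclass
class Task:
    """Stand-in for a kernel structure handed to a tracepoint."""
    comm: str
    pid: int
    prio: int


class Tracepoint:
    """A named hook that does nothing until a probe is registered."""

    def __init__(self, name):
        self.name = name
        self.probes = []

    def register(self, probe):
        self.probes.append(probe)

    def unregister(self, probe):
        self.probes.remove(probe)

    def __call__(self, *args):
        # With no probes attached, the loop body never runs.
        for probe in self.probes:
            probe(*args)


sched_switch = Tracepoint("sched_switch")
seen = []


def record_switch(prev, nxt):
    # The probe receives typed objects and picks out the fields it wants;
    # there is no format string to parse.
    seen.append((prev.pid, nxt.pid))


sched_switch(Task("bash", 100, 120), Task("make", 200, 120))  # no probe yet
sched_switch.register(record_switch)
sched_switch(Task("bash", 100, 120), Task("make", 200, 120))  # probe runs
print(seen)
```

Only the second call is recorded, since the first one happens before any probe is attached.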

Although this was an improvement over trace markers, it was still too tedious for developers to create a callback for every tracepoint they wanted to add, so that a tracer would output its data. The kernel needed a more automated way to connect a tracer to the tracepoints. That would require automating the creation of the callback and also formatting its data, much like what the trace marker did, but it should be done in the callback, and not at the tracepoint site in the kernel code.

To solve this issue of automating the tracepoints, the TRACE_EVENT() macro was born. Inspired by Tom Zanussi's zedtrace, this macro was specifically made to allow a developer to add tracepoints to their subsystem and have Ftrace automatically be able to trace them. The developer need not understand how Ftrace works; they only need to create their tracepoint using the TRACE_EVENT() macro. In addition, they need to follow some guidelines on how to create a header file, and they gain full access to the Ftrace tracer. Another objective of the design of the TRACE_EVENT() macro was to not couple it to Ftrace or any other tracer. It is agnostic to the tracers that use it, which is apparent now that TRACE_EVENT() is also used by perf, LTTng and SystemTap.

The anatomy of the TRACE_EVENT() macro

Automating tracepoints had various requirements that must be fulfilled:

  • It must create a tracepoint that can be placed in the kernel code.

  • It must create a callback function that can be hooked to this tracepoint.

  • The callback function must be able to record the data passed to it into the tracer ring buffer in the fastest way possible.

  • It must create a function that can parse the data recorded to the ring buffer and translate it to a human readable format that the tracer can display to a user.

To accomplish that, the TRACE_EVENT() macro is broken into six components, which correspond to the parameters of the macro:

   TRACE_EVENT(name, proto, args, struct, assign, print)
  • name - the name of the tracepoint to be created.

  • prototype - the prototype for the tracepoint callbacks

  • args - the arguments that match the prototype.

  • struct - the structure that a tracer could use (but is not required to) to store the data passed into the tracepoint.

  • assign - the C-like way to assign the data to the structure.

  • print - the way to output the structure in human readable ASCII format.

A good example of a tracepoint definition, for sched_switch, can be found in include/trace/events/sched.h. That definition will be used below to describe each of the parts of the TRACE_EVENT() macro.

All parameters except the first one are encapsulated with another macro (TP_PROTO, TP_ARGS, TP_STRUCT__entry, TP_fast_assign and TP_printk). These macros give more control in processing and also allow commas to be used within the TRACE_EVENT() macro.

Name

The first parameter is the name.

   TRACE_EVENT(sched_switch,

This is the name used to call this tracepoint. The actual tracepoint that is used has trace_ prefixed to the name (i.e. trace_sched_switch).

Prototype

The next parameter is the prototype.

    TP_PROTO(struct rq *rq, struct task_struct *prev, struct task_struct *next),

The prototype is written as if you were to declare the tracepoint directly:

    trace_sched_switch(struct rq *rq, struct task_struct *prev,
                       struct task_struct *next);

It is used as the prototype for both the tracepoint added to the kernel code and for the callback function. Remember, a tracepoint calls the callback functions as if the callback functions were being called at the location of the tracepoint.

Arguments

The third parameter is the arguments used by the prototype.

    TP_ARGS(rq, prev, next),

It may seem strange that this is needed, but it is not only required by the TRACE_EVENT() macro, it is also required by the tracepoint infrastructure underneath. The tracepoint code, when activated, will call the callback functions (more than one callback may be assigned to a given tracepoint). The macro that creates the tracepoint must have access to both the prototype and the arguments. Below is an illustration of what a tracepoint macro would need to accomplish this:

    #define TRACE_POINT(name, proto, args) \
       void trace_##name(proto)            \
       {                                   \
               if (trace_##name##_active)  \
                       callback(args);     \
       }
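
The illustration above can be turned into a small user-space sketch that compiles on its own. Everything here is hypothetical and only mirrors the idea: PROTO() and ARGS() stand in for TP_PROTO() and TP_ARGS(), and the enable flag and single callback pointer are simplifications of the real tracepoint infrastructure, which supports more than one callback.

```c
/* Stand-ins for TP_PROTO()/TP_ARGS(): they pass their contents through
 * unchanged, so the commas inside survive as a single macro argument. */
#define PROTO(...) __VA_ARGS__
#define ARGS(...)  __VA_ARGS__

/* Generate an enable flag, a callback pointer and the trace_<name>()
 * wrapper. 'proto' declares the parameters; 'args' forwards them. */
#define TRACE_POINT(name, proto, args)                    \
    static int trace_##name##_active;                     \
    static void (*name##_callback)(proto);                \
    static void trace_##name(proto)                       \
    {                                                     \
        if (trace_##name##_active && name##_callback)     \
            name##_callback(args);                        \
    }

TRACE_POINT(demo_switch, PROTO(int prev_pid, int next_pid),
            ARGS(prev_pid, next_pid))

/* A callback with the same prototype as the tracepoint. */
static int last_prev, last_next, hits;

static void record_switch(int prev_pid, int next_pid)
{
    last_prev = prev_pid;
    last_next = next_pid;
    hits++;
}
```

Calling trace_demo_switch(10, 20) does nothing until trace_demo_switch_active is set and demo_switch_callback points at record_switch; after that, the callback receives exactly the arguments given at the call site.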

Structure

The fourth parameter is a bit more complex.

    TP_STRUCT__entry(
		__array(	char,	prev_comm,	TASK_COMM_LEN	)
		__field(	pid_t,	prev_pid			)
		__field(	int,	prev_prio			)
		__field(	long,	prev_state			)
		__array(	char,	next_comm,	TASK_COMM_LEN	)
		__field(	pid_t,	next_pid			)
		__field(	int,	next_prio			)
    ),

This parameter describes the structure layout of the data that will be stored in the tracer's ring buffer. Each element of the structure is defined by another macro. These macros are used to automate the creation of a structure and are not function-like. Notice that the macros are not separated by any delimiter (no comma nor semicolon).

The macros used by the sched_switch tracepoint are:

  • __field(type, name) - this defines a normal structure element, like int var; where type is int and name is var.

  • __array(type, name, len) - this defines an array item, equivalent to int name[len]; where the type is int, the name of the array is name, and the number of items in the array is len.

There are other element macros that will be described in a later article. The definition from the sched_switch tracepoint would produce a structure that looks like:

    struct {
	      char   prev_comm[TASK_COMM_LEN];
	      pid_t  prev_pid;
	      int    prev_prio;
	      long   prev_state;
	      char   next_comm[TASK_COMM_LEN];
	      pid_t  next_pid;
	      int    next_prio;
    };
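
As a rough user-space illustration of why the element macros need no delimiter, the sketch below defines two hypothetical stand-ins, DEMO_FIELD() and DEMO_ARRAY(), that each expand to a complete member declaration including its semicolon. The real __field() and __array() are processed in more than one way by the tracing headers; this shows only the structure-building step, and the struct name is invented.

```c
#include <stddef.h>
#include <sys/types.h>

#define TASK_COMM_LEN 16    /* length of a task's comm[] in the kernel */

/* Hypothetical stand-ins for __field() and __array(). */
#define DEMO_FIELD(type, name)       type name;
#define DEMO_ARRAY(type, name, len)  type name[len];

struct demo_sched_switch_entry {
    DEMO_ARRAY(char,  prev_comm, TASK_COMM_LEN)
    DEMO_FIELD(pid_t, prev_pid)
    DEMO_FIELD(int,   prev_prio)
    DEMO_FIELD(long,  prev_state)
    DEMO_ARRAY(char,  next_comm, TASK_COMM_LEN)
    DEMO_FIELD(pid_t, next_pid)
    DEMO_FIELD(int,   next_prio)
};
```

After preprocessing, the body of demo_sched_switch_entry is member for member the same as the structure shown above.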

Note that the spacing used in the TP_STRUCT__entry definition breaks the rules outlined by checkpatch.pl. That is done because these macros are not function-like but, instead, are used to define a structure. The spacing follows the rules of structure spacing and not of function spacing, so that the names line up in the structure declaration. Needless to say, checkpatch.pl fails horribly when processing changes to TRACE_EVENT() definitions.

Assignment

The fifth parameter defines the way the data from the parameters is saved to the ring buffer.

    TP_fast_assign(
		memcpy(__entry->next_comm, next->comm, TASK_COMM_LEN);
		__entry->prev_pid	= prev->pid;
		__entry->prev_prio	= prev->prio;
		__entry->prev_state	= prev->state;
		memcpy(__entry->prev_comm, prev->comm, TASK_COMM_LEN);
		__entry->next_pid	= next->pid;
		__entry->next_prio	= next->prio;
    ),

The code within the TP_fast_assign() is normal C code. A special variable __entry represents the pointer to a structure type defined by TP_STRUCT__entry and points directly into the ring buffer. The TP_fast_assign is used to fill all fields created in TP_STRUCT__entry. The variable names of the parameters defined by TP_PROTO and TP_ARGS can then be used to assign the appropriate data into the __entry structure.

Print

The last parameter defines how a printk() can be used to print out the fields from the TP_STRUCT__entry structure.

	TP_printk("prev_comm=%s prev_pid=%d prev_prio=%d prev_state=%s ==> " \
 		  "next_comm=%s next_pid=%d next_prio=%d",
		__entry->prev_comm, __entry->prev_pid, __entry->prev_prio,
		__entry->prev_state ?
		  __print_flags(__entry->prev_state, "|",
				{ 1, "S"} , { 2, "D" }, { 4, "T" }, { 8, "t" },
				{ 16, "Z" }, { 32, "X" }, { 64, "x" },
				{ 128, "W" }) : "R",
		__entry->next_comm, __entry->next_pid, __entry->next_prio)

Once again the variable __entry is used to reference the pointer to the structure that contains the data. The format string is just like any other printf format. The __print_flags() is part of a set of helper functions that come with TRACE_EVENT(), and will be covered in another article. Do not create new tracepoint-specific helpers, because that will confuse user-space tools that know about the TRACE_EVENT() helper macros but will not know how to handle ones created for individual tracepoints.
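
To make the effect of the prev_state expression concrete, here is a simplified user-space sketch of a flag-to-string helper. The name print_flags_sketch() and its signature are invented for this example; it is not the kernel's __print_flags() implementation, and it silently ignores bits that have no name in the table.

```c
#include <stdio.h>

struct flag_name {
    unsigned long mask;
    const char *name;
};

/* Write the name of every set flag into buf, separated by delim. */
static const char *print_flags_sketch(char *buf, size_t len,
                                      unsigned long flags, const char *delim,
                                      const struct flag_name *names, size_t n)
{
    size_t used = 0;

    if (len == 0)
        return buf;
    buf[0] = '\0';
    for (size_t i = 0; i < n; i++) {
        if (!(flags & names[i].mask))
            continue;
        int w = snprintf(buf + used, len - used, "%s%s",
                         used ? delim : "", names[i].name);
        if (w < 0 || (size_t)w >= len - used)
            break;              /* out of room: stop here */
        used += (size_t)w;
    }
    return buf;
}

/* The state table from the sched_switch TP_printk() above. */
static const struct flag_name task_states[] = {
    { 1, "S" }, { 2, "D" }, { 4, "T" }, { 8, "t" },
    { 16, "Z" }, { 32, "X" }, { 64, "x" }, { 128, "W" },
};

/* Mirrors "prev_state ? __print_flags(...) : "R"" from the event. */
static const char *state_string(char *buf, size_t len, long state)
{
    if (!state)
        return "R";
    return print_flags_sketch(buf, len, (unsigned long)state, "|", task_states,
                              sizeof(task_states) / sizeof(task_states[0]));
}
```

With this table a state of 0 is shown as R, a state of 1 as S, and a state with bits 1 and 2 both set as S|D.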

Format file

The sched_switch TRACE_EVENT() macro produces the following format file in /sys/kernel/debug/tracing/events/sched/sched_switch/format:

   name: sched_switch
   ID: 33
   format:
	field:unsigned short common_type;	offset:0;	size:2;
	field:unsigned char common_flags;	offset:2;	size:1;
	field:unsigned char common_preempt_count;	offset:3;	size:1;
	field:int common_pid;	offset:4;	size:4;
	field:int common_lock_depth;	offset:8;	size:4;

	field:char prev_comm[TASK_COMM_LEN];	offset:12;	size:16;
	field:pid_t prev_pid;	offset:28;	size:4;
	field:int prev_prio;	offset:32;	size:4;
	field:long prev_state;	offset:40;	size:8;
	field:char next_comm[TASK_COMM_LEN];	offset:48;	size:16;
	field:pid_t next_pid;	offset:64;	size:4;
	field:int next_prio;	offset:68;	size:4;

   print fmt: "task %s:%d [%d] (%s) ==> %s:%d [%d]", REC->prev_comm, REC->prev_pid,
   REC->prev_prio, REC->prev_state ? __print_flags(REC->prev_state, "|", { 1, "S"} ,
   { 2, "D" }, { 4, "T" }, { 8, "t" }, { 16, "Z" }, { 32, "X" }, { 64, "x" }, { 128,
   "W" }) : "R", REC->next_comm, REC->next_pid, REC->next_prio

Note: Newer kernels may also display a signed entry for each field.

Notice that __entry is replaced with REC in the format file. The first set of fields (common_*) are not from the TRACE_EVENT() macro, but are added to all events by Ftrace, which created this format file; other tracers could add different fields. The format file provides user-space tools the information needed to parse the binary output containing sched_switch entries.
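
A user-space tool could use those offsets and sizes roughly as in the sketch below. The offsets are hard-coded from the format file shown above purely for illustration; they differ between kernels and architectures, so a real tool would read them from the format file at run time. The decoded_switch structure and decode_sched_switch() are invented names, and the code assumes the record was written on a machine with the same byte order.

```c
#include <stdint.h>
#include <string.h>

/* Offsets and sizes copied from the example format file above. */
enum {
    OFF_PREV_COMM  = 12, OFF_PREV_PID  = 28, OFF_PREV_PRIO = 32,
    OFF_PREV_STATE = 40, OFF_NEXT_COMM = 48, OFF_NEXT_PID  = 64,
    OFF_NEXT_PRIO  = 68, RECORD_SIZE   = 72, COMM_SIZE     = 16
};

struct decoded_switch {
    char    prev_comm[COMM_SIZE + 1];   /* +1 guarantees termination */
    int32_t prev_pid, prev_prio;
    int64_t prev_state;
    char    next_comm[COMM_SIZE + 1];
    int32_t next_pid, next_prio;
};

/* Copy each field out of one raw record of RECORD_SIZE bytes. */
static void decode_sched_switch(const unsigned char *rec,
                                struct decoded_switch *out)
{
    memset(out, 0, sizeof(*out));
    memcpy(out->prev_comm,   rec + OFF_PREV_COMM,  COMM_SIZE);
    memcpy(&out->prev_pid,   rec + OFF_PREV_PID,   4);
    memcpy(&out->prev_prio,  rec + OFF_PREV_PRIO,  4);
    memcpy(&out->prev_state, rec + OFF_PREV_STATE, 8);
    memcpy(out->next_comm,   rec + OFF_NEXT_COMM,  COMM_SIZE);
    memcpy(&out->next_pid,   rec + OFF_NEXT_PID,   4);
    memcpy(&out->next_prio,  rec + OFF_NEXT_PRIO,  4);
}
```

Using memcpy() rather than casting the buffer to a structure avoids any dependence on how the reading program's compiler would lay out that structure.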

The header file

The TRACE_EVENT() macro cannot just be placed anywhere in the expectation that it will work with Ftrace or any other tracer. The header file that contains the TRACE_EVENT() macro must follow a certain format. These header files typically are located in the include/trace/events directory but do not need to be. If they are not located in this directory, then other configurations are necessary.

The first line in the TRACE_EVENT() header is not the normal #ifndef _TRACE_SCHED_H, but instead has:

   #undef TRACE_SYSTEM
   #define TRACE_SYSTEM sched

   #if !defined(_TRACE_SCHED_H) || defined(TRACE_HEADER_MULTI_READ)
   #define _TRACE_SCHED_H

This example is for scheduler trace events; other event headers would use something other than sched and _TRACE_SCHED_H. The TRACE_HEADER_MULTI_READ test allows this file to be included more than once; this is important for the processing of the TRACE_EVENT() macro. The TRACE_SYSTEM must also be defined for the file and must be outside the guard of the #if. The TRACE_SYSTEM defines what group the TRACE_EVENT() macros in the file belong to. This is also the directory name that the events will be grouped under in the debugfs tracing/events directory. This grouping is important for Ftrace as it allows the user to enable or disable events by group.

The file then includes any headers required by the contents of the TRACE_EVENT() macro (e.g. #include <linux/sched.h>). The tracepoint.h file is required.

   #include <linux/tracepoint.h>

All the trace events can now be defined with TRACE_EVENT() macros. Please include comments that describe the tracepoint above the TRACE_EVENT() macros. Look at include/trace/events/sched.h as an example. The file ends with:

   #endif /* _TRACE_SCHED_H */

   /* This part must be outside protection */
   #include <trace/define_trace.h>

The define_trace.h is where all the magic lies in creating the tracepoints. The explanation of how this file works will be left to another article. For now, it is sufficient to know that this file must be included at the bottom of the trace header file outside the protection of the #endif.

Using the tracepoint

Defining the tracepoint is meaningless if it is not used anywhere. To use the tracepoint, the trace header must be included, but one C file (and only one) must also define CREATE_TRACE_POINTS before including the trace header. This will cause the define_trace.h to create the necessary functions needed to produce the tracing events. In kernel/sched.c the following is defined:

   #define CREATE_TRACE_POINTS
   #include <trace/events/sched.h>

If another file needs to use tracepoints that were defined in the trace file, then it only needs to include the trace file, and does not need to define CREATE_TRACE_POINTS. Defining it more than once for the same header file will cause linker errors when building. For example, in kernel/fork.c only the header file is included:

   #include <trace/events/sched.h>

Finally, the tracepoint is used in the code just as it was defined in the TRACE_EVENT() macro:

   static inline void
   context_switch(struct rq *rq, struct task_struct *prev,
	          struct task_struct *next)
   {
	   struct mm_struct *mm, *oldmm;

	   prepare_task_switch(rq, prev, next);
	   trace_sched_switch(rq, prev, next);
	   mm = next->mm;
	   oldmm = prev->active_mm;

Coming soon

This article explained all that is needed to create a basic tracepoint within the core kernel code. Part 2 will describe how to consolidate tracepoints to keep the tracing footprint small, along with information about the TP_STRUCT__entry macros and TP_printk helper functions (like __print_flags). Part 3 will look at defining tracepoints outside of the include/trace/events directory (for modules and architecture-specific tracepoints) as well as a look at how the TRACE_EVENT() macro does its magic. Both articles will have a few practical examples of how to use tracepoints. Stay tuned ...

Huge pages part 5: A deeper look at TLBs and costs

March 23, 2010

This article was contributed by Mel Gorman

[Editor's note: this is the fifth and final installment in Mel Gorman's series on the use of huge pages in Linux. Parts 1, 2, 3 and 4 are available for those who have not read them yet. Many thanks to Mel for letting us run this series at LWN.]

This chapter is not necessary to understand how huge pages are used and performance benefits from huge pages are often easiest to measure using an application-specific benchmark. However, there are the rare cases where a deeper understanding of the TLB can be enlightening. In this chapter, a closer look is taken at TLBs and analysing performance from a huge page perspective.

1 TLB Size and Characteristics

First off, it can be useful to know what sort of TLB the system has. On X86 and X86-64, the tool x86info can be used to discover the TLB size.

    $ x86info -c
      ...
      TLB info
       Instruction TLB: 4K pages, 4-way associative, 128 entries.
       Instruction TLB: 4MB pages, fully associative, 2 entries
       Data TLB: 4K pages, 4-way associative, 128 entries.
       Data TLB: 4MB pages, 4-way associative, 8 entries
      ...

On the PPC64 architecture, there is no automatic means of determining the number of TLB slots. PPC64 uses multiple translation-related caches of which the TLB is at the lowest layer. It is safe to assume on older revisions of POWER - such as the PPC970 - that 1024 entries are available. POWER 5+ systems will have 2048 entries and POWER 6 does not use a TLB. On PPC64, the topmost translation layer uses an Effective to Real Address Translation (ERAT) cache. On POWER 6, it supports 4K and 64K entries but typically the default huge page size of 16MB consumes multiple ERAT entries. Hence, the article will focus more on the TLB than on ERAT.

2 Calculating TLB Translation Cost

When deciding whether huge pages will be of benefit, the first step is estimating how much time is being spent translating addresses. This will approximate the upper-boundary of performance gains that can be achieved using huge pages. This requires that the number of TLB misses that occurred is calculated as well as the average cost of a TLB miss.

On much modern hardware, there is a Performance Measurement Unit (PMU) which provides a small number of hardware-based counters. The PMU is programmed to increment when a specific low-level event occurs and interrupt the CPU when a threshold, called the sample period, is reached. In many cases, there will be one low-level event that corresponds to a TLB miss so a reasonable estimate can be made of the number of TLB misses.

On Linux, the PMU can be programmed with oprofile on almost any kernel currently in use, or with perf on recent kernels. Unfortunately, perf is not suitable for the analysis we need in this installment. Perf maps high-level requests, such as cache misses, to suitable low-level events. However it is not currently able to map certain TLB events, such as the number of cycles spent walking a page table. It is technically possible to specify a raw event ID to perf, but figuring out the raw ID is error-prone and tricky to verify. Hence, we will be using oprofile to program the PMU in this installment.

A detailed examination of the hardware specification may yield an estimate for the cost of a TLB miss, but it is time-consuming and documentation is not always sufficient. Broadly speaking, there are three means of estimating the TLB cost in the absence of documentation. The simplest case is where the TLB is software-filled and the operating system is responsible for filling the TLB. Using a profiling tool, the number of times the TLB miss handler was called and the time spent can be recorded. This gives an average cost of the TLB miss but software-filled TLBs are not common in mainstream machines. The second method is to use an analysis program such as Calibrator [manegold04] that guesses characteristics of cache and the TLB. While there are other tools that exist that claim to be more accurate [yotov04a][yotov04b], Calibrator has the advantage of being still available for download and it works very well for X86 and X86-64 architectures. Its use is described below.

Calibrator does not work well on PPC64 as the TLB is the lowest layer, whereas Calibrator measures the cost of an ERAT miss at the highest layer. On PPC64, there is a hardware counter that calculates the number of cycles spent doing page table walks. Hence, when automatic measurement fails, it may be possible to measure the TLB cost using the PMU as described in Section 2.3, below.

Once the number of TLB misses and the average cost of a miss are known, the percentage time spent servicing TLB misses is easily calculated.

2.1 Estimating Number of TLB Misses

Oprofile can be used to estimate the number of TLB misses using the PMU. This article will not go in-depth on how PMUs and oprofile work but, broadly speaking, the PMU counts low-level events such as a TLB miss. To avoid excessive overhead, not every event is recorded: each time the count reaches the sample period, an interrupt is raised and oprofile records the details of that one event. An estimate of the real number of TLB misses that occurred is then

EstimatedTLBMisses = TLBMissesSampled * SamplePeriod

The output below shows an example oprofile session that sampled Data-TLB (DTLB) misses within a benchmark.

  $ opcontrol --setup --event PM_CYC_GRP22:50000 --event PM_DTLB_MISS_GRP22:1000 \
              --vmlinux=/vmlinux
  $ opcontrol --start
  Using 2.6+ OProfile kernel interface.
  Reading module info.
  Using log file /var/lib/oprofile/samples/oprofiled.log
  Daemon started.
  Profiler running.
  $ ./benchmark
  $ opcontrol --stop
  $ opcontrol --dump
  $ opreport
  CPU: ppc64 970MP, speed 2500 MHz (estimated)
  Counted PM_CYC_GRP22 events ((Group 22 pm_pe_bench4) Processor cycles)
          with a unit mask of 0x00 (No unit mask) count 50000
  Counted PM_DTLB_MISS_GRP22 events ((Group 22 pm_pe_bench4) Data TLB misses)
          with a unit mask of 0x00 (No unit mask) count 1000
  PM_CYC_GRP22:5...|PM_DTLB_MISS_G...|
    samples|      %|  samples|      %|
  ------------------------------------
     622512 98.4696      9651 97.8506 benchmark
       4170  0.6596        11  0.1115 libc-2.9.so
       3074  0.4862         1  0.0101 oprofiled
        840  0.1329         4  0.0406 bash
        731  0.1156       181  1.8351 vmlinux-2.6.31-rc5
        572  0.0905        14  0.1419 ld-2.9.so

Note in the figure that 9651 samples were taken and the sample period was 1000. Therefore it is reasonable to assume, using the equation above, that the benchmark incurred 9,651,000 DTLB misses. Analysis of a more complex benchmark would also include misses incurred by libraries.
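
The scaling in the equation above can be written as a small C helper. This is only a sketch: the function name, and the idea of passing the per-binary sample counts as an array, are illustrative assumptions and not part of oprofile.

```c
#include <stddef.h>
#include <stdint.h>

/* Sum the samples attributed to each binary image, then scale by the
 * sample period to estimate how many events really occurred. */
static uint64_t estimated_events(const uint64_t *samples, size_t n,
                                 uint64_t sample_period)
{
    uint64_t total = 0;

    for (size_t i = 0; i < n; i++)
        total += samples[i];
    return total * sample_period;
}
```

Called with the single benchmark row (9651 samples, period 1000) it gives the 9,651,000 figure above; adding the libc-2.9.so and ld-2.9.so rows (11 and 14 samples) raises the estimate to 9,676,000.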

2.2 Estimating TLB Miss Cost using Calibrator

Calibrator should be used on machines where the TLB is the primary cache for translating virtual to physical addresses. This is the case for X86 and X86-64 machines but not for PPC64 where there are additional translation layers. The first step is to set up a working directory and obtain the calibrator tool.

  $ wget http://homepages.cwi.nl/~manegold/Calibrator/v0.9e/calibrator.c
  $ gcc calibrator.c -lm -o calibrator
  calibrator.c:131: warning: conflicting types for built-in function 'round'

The warning is harmless. Note the lack of compiler optimisation options specified which is important so as not to skew the results reported by the tool. Running Calibrator with no parameters gives:

  $ ./calibrator 
  Calibrator v0.9e
  (by Stefan.Manegold@cwi.nl, http://www.cwi.nl/~manegold/)

  ! usage: './calibrator <MHz> <size>[k|M|G] <filename>` !

The CPU MHz parameter is used to estimate the time in nanoseconds a TLB miss costs. The information is not automatically retrieved from /proc/ as the tool was intended to be usable on Windows, but this shell script should discover the MHz value on many Linux installations. size is the size of work array to allocate. It must be sufficiently large that the cache and TLB reach are both exceeded to have any chance of accuracy but in practice much higher values were required. The poorly named parameter filename is the prefix given to the output graphs and gnuplot files.

This page contains a wrapper script around Calibrator that outputs the approximate cost of a TLB miss as well as how many TLB misses must occur to consume a second of system time. An example running the script on an Intel Core Duo T2600 is as follows:

  $ ./run-calibrator.sh
  Running calibrator with size 13631488: 19 cycles 8.80 ns 
  Running calibrator with size 17563648: 19 cycles 8.80 ns matched 1 times
  Running calibrator with size 21495808: 19 cycles 8.80 ns matched 2 times
  Running calibrator with size 25427968: 19 cycles 8.80 ns matched 3 times

  TLB_MISS_LATENCY_TIME=8.80
  TLB_MISS_LATENCY_CYCLES=19
  TLB_MISSES_COST_ONE_SECOND=114052631

In this specific example, the estimated cost of a TLB miss is 19 clock cycles or 8.80ns. It is interesting to note that the cost of an L2 cache miss on the target machine is 210 cycles, making it likely that the hardware is hiding most of the latency cost using pre-fetching or a related technique. Compare the output with the following from an older generation machine based on the AMD Athlon 64 3000+, which has a two-level TLB structure:

  $ ./run-calibrator.sh 
  Running calibrator with size 13631488: 16 cycles 8.18 ns 
  Running calibrator with size 17563648: 19 cycles 9.62 ns 
  Running calibrator with size 21495808: 19 cycles 9.54 ns matched 1 times
  Running calibrator with size 25427968: 19 cycles 9.57 ns matched 2 times
  Running calibrator with size 29360128: 34 cycles 16.96 ns 
  Running calibrator with size 33292288: 34 cycles 16.99 ns matched 1 times
  Running calibrator with size 37224448: 37 cycles 18.17 ns 
  Running calibrator with size 41156608: 37 cycles 18.17 ns matched 1 times
  Running calibrator with size 45088768: 36 cycles 18.16 ns matched 2 times
  Running calibrator with size 49020928: 37 cycles 18.17 ns matched 3 times

  TLB_MISS_LATENCY_TIME=18.17
  TLB_MISS_LATENCY_CYCLES=37
  TLB_MISSES_COST_ONE_SECOND=54297297

While calibrator will give a reasonable estimate of the cost, some manual adjustment may be required based on observation.
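
The TLB_MISSES_COST_ONE_SECOND value appears to be the number of cycles in one second divided by the per-miss cost. The sketch below reproduces that arithmetic; it is an assumption about what the wrapper script computes, not code taken from it, and the 2167 MHz clock used in the check is the CPU speed printed in the tlbmiss_cost.sh trace in Section 3, which is assumed here to come from the same 19-cycle machine.

```c
#include <stdint.h>

/* Number of TLB misses that add up to one second of CPU time,
 * truncated to a whole number of misses. */
static uint64_t tlb_misses_per_second(uint64_t cpu_mhz, uint64_t miss_cycles)
{
    if (miss_cycles == 0)
        return 0;
    return cpu_mhz * 1000000ULL / miss_cycles;
}
```

For a 2167 MHz clock and a 19-cycle miss this returns 114052631, matching the first run above.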

2.3 Estimating TLB Miss Cost using Hardware

When the TLB is not the topmost translation layer, Calibrator is not suitable to measure the cost of a TLB miss. In the specific case of PPC64, Calibrator measures the cost of an ERAT miss but the ERAT does not always support all the huge page sizes. In the event a TLB exists on POWER, it is the lowest level of translation and it supports huge pages. Due to this, measuring the cost of a TLB miss requires help from the PMU.

Two counters are minimally required: one to measure the number of TLB misses and a second to measure the number of cycles spent walking page tables. The exact names of the counters will vary but, for the PPC970MP, the PM_DTLB_MISS_GRP22 counter for TLB misses and the PM_DATA_TABLEWALK_CYC_GRP30 counter for page table walk cycles are suitable.

To use the PMU, a consistent test workload is required that generates a relatively fixed number of TLB misses per run. The simplest workload to use in this case is STREAM. First, download and build stream:

  $ wget http://www.cs.virginia.edu/stream/FTP/Code/stream.c
  $ gcc -O3 -DN=44739240 stream.c -o stream

The value of N is set such that the total working set of the benchmark will be approximately 1GB.
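
The choice of N follows from STREAM operating on three arrays of N double-precision values. A minimal sketch of that arithmetic, with invented function names:

```c
#include <stddef.h>
#include <stdint.h>

/* Bytes touched by STREAM's three arrays of n doubles. */
static uint64_t stream_working_set_bytes(uint64_t n)
{
    return 3ULL * sizeof(double) * n;
}

/* Largest n whose three arrays fit within the given number of bytes. */
static uint64_t stream_n_for_bytes(uint64_t bytes)
{
    return bytes / (3ULL * sizeof(double));
}
```

With 8-byte doubles, N=44739240 works out to 1,073,741,760 bytes, just under 1GB.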

Ideally, the number of DTLB misses and cycles spent walking page tables would be measured at the same time but due to limitations of the PPC970MP, they must be measured in two separate runs. Because of this, it is very important that the cycles be sampled at the same time and it is essential that the samples taken for cycles in each of the two runs are approximately the same. This will require you to scale the sample rate for the DTLB and page table walk events appropriately. Here are two oprofile reports based on running STREAM.

  CPU: ppc64 970MP, speed 2500 MHz (estimated)
  Counted PM_CYC_GRP30 events ((Group 30 pm_isource) Processor cycles)
          with a unit mask of 0x00 (No unit mask) count 50000
  Counted PM_DATA_TABLEWALK_CYC_GRP30 events ((Group 30 pm_isource) Cycles
	  doing data tablewalks) with a unit mask of 0x00 (No unit mask)
	  count 10000
  PM_CYC_GRP30:5...|PM_DATA_TABLEW...|
    samples|      %|  samples|      %|
  ------------------------------------
     604695 97.9322    543702 99.3609 stream

  CPU: ppc64 970MP, speed 2500 MHz (estimated)
  Counted PM_CYC_GRP23 events ((Group 23 pm_hpmcount1) Processor cycles)
          with a unit mask of 0x00 (No unit mask) count 50000
  Counted PM_DTLB_MISS_GRP23 events ((Group 23 pm_hpmcount1) Data TLB misses)
          with a unit mask of 0x00 (No unit mask) count 1000
  PM_CYC_GRP23:5...|PM_DTLB_MISS_G...|
    samples|      %|  samples|      %|
  ------------------------------------
     621541 98.5566      9644 98.0879 stream

The first point to note is that the samples taken for PM_CYC_GRP are approximately the same. This required that the sample period for PM_DATA_TABLEWALK_CYC_GRP30 be 10000 instead of the minimum allowed of 1000. The average cost of a DTLB miss is now trivial to estimate.

    PageTableWalkCycles = CyclesSampled * SamplePeriod
                        = 543702 * 10000

    TLBMisses = TLBMissSampled * SamplePeriod
              = 9644 * 1000

    TLBMissCost = PageTableWalkCycles/TLBMisses
                = 5437020000/9644000
                = ~563 cycles

Here the TLB-miss cost on PPC64 is observed to be much higher than on comparable X86 hardware. However, take into account that the ERAT translation cache hides most of the cost of translating addresses and its miss cost is comparable. This is similar in principle to having two levels of TLB.
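
The miss-cost calculation above is easy to script. The following C sketch uses an invented function name and assumes the two profiles were taken over comparable runs, as the text requires:

```c
#include <stdint.h>

/* Average cycles per TLB miss, from page-table-walk cycle samples and
 * TLB miss samples, each scaled by its own sample period. */
static uint64_t tlb_miss_cost_cycles(uint64_t walk_samples,
                                     uint64_t walk_period,
                                     uint64_t miss_samples,
                                     uint64_t miss_period)
{
    uint64_t walk_cycles = walk_samples * walk_period;
    uint64_t misses = miss_samples * miss_period;

    /* Integer division discards the fractional part. */
    return misses ? walk_cycles / misses : 0;
}
```

With the figures from the two reports (543702 samples at a period of 10000, and 9644 samples at a period of 1000) the function returns 563.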

2.4 Estimating Percentage Time Translating

Once the TLB miss cost estimate is available, estimates for any workload depend on a profile showing cycles spent within the application and the DTLB samples such as the following report.

  CPU: ppc64 970MP, speed 2500 MHz (estimated)
  Counted PM_CYC_GRP22 events ((Group 22 pm_pe_bench4) Processor cycles)
          with a unit mask of 0x00 (No unit mask) count 50000
  Counted PM_DTLB_MISS_GRP22 events ((Group 22 pm_pe_bench4) Data TLB misses)
          with a unit mask of 0x00 (No unit mask) count 1000
  PM_CYC_GRP22:5...|PM_DTLB_MISS_G...|
    samples|      %|  samples|      %|
  ------------------------------------
     156295 95.7408      2425 96.4215 stream

The calculation of the percentage of time spent servicing TLB misses is then as follows:

    CyclesExecuted = CyclesSamples * SampleRateOfCycles
                   = 156295 * 50000
                   = 7814750000 cycles

    TLBMissCycles = TLBMissSamples * SampleRateOfTLBMiss * TLBMissCost
                  = 2425 * 1000 * 563
                  = 1365275000

    PercentageTimeTLBMiss = (TLBMissCycles * 100)/CyclesExecuted
                          = 17.47%

Hence, the best possible performance gain we might expect from using huge pages with this workload is about 17.47%.
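
The same calculation as a C function, for use with any profile of this shape. The function and parameter names are invented for this sketch, and floating point is used so that the percentage keeps its fractional part.

```c
#include <stdint.h>

/* Percentage of executed cycles spent servicing TLB misses. */
static double percent_time_translating(uint64_t cycle_samples,
                                       uint64_t cycle_period,
                                       uint64_t miss_samples,
                                       uint64_t miss_period,
                                       uint64_t miss_cost_cycles)
{
    double cycles_executed = (double)cycle_samples * (double)cycle_period;
    double miss_cycles = (double)miss_samples * (double)miss_period *
                         (double)miss_cost_cycles;

    if (cycles_executed <= 0.0)
        return 0.0;
    return miss_cycles * 100.0 / cycles_executed;
}
```

For example, the base-page profile in Section 2.5 (599073 cycle samples at a period of 50000 and 9492 DTLB samples at a period of 1000) with a miss cost of 563 cycles gives roughly 17.84%.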

2.5 Verifying Accuracy

Once a TLB miss cost has been estimated, it should be validated. The easiest means of doing this is with the STREAM benchmark, modified using this patch to use malloc() and rebuilt. The system must be then minimally configured to use hugepages with the benchmark. The huge page size on PPC64 is 16MB so the following commands will configure the system adequately for the validation. Note that the hugepage pool allocation here represents roughly 1GB of huge pages for the STREAM benchmark.

    $ hugeadm --create-global-mounts
    $ hugeadm --pool-pages-min 16M:1040M
    $ hugeadm --pool-list
        Size  Minimum  Current  Maximum  Default
    16777216       65       65       65        *
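
The pool size reported by hugeadm is simply the requested amount divided by the huge page size. A small sketch of that conversion follows; rounding up is a choice made for this example when sizing a pool by hand, not a statement about how hugeadm itself rounds.

```c
#include <stdint.h>

/* Number of huge pages needed to cover 'bytes', rounding up. */
static uint64_t huge_pages_needed(uint64_t bytes, uint64_t huge_page_size)
{
    if (huge_page_size == 0)
        return 0;
    return (bytes + huge_page_size - 1) / huge_page_size;
}
```

A request of 1040MB with 16MB pages yields the 65 pages listed in the pool.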

We then run STREAM with base pages and profiling to make a prediction on what the hugepage overhead will be.

  $ oprofile_start.sh --sample-cycle-factor 5 --event timer --event dtlb_miss
  [ ... profiler starts ... ]
  $ /usr/bin/time ./stream
  [ ...]
  Function      Rate (MB/s)   Avg time     Min time     Max time
  Copy:        2783.1461       0.2585       0.2572       0.2594
  Scale:       2841.6449       0.2530       0.2519       0.2544
  Add:         3080.5153       0.3499       0.3486       0.3511
  Triad:       3077.4167       0.3498       0.3489       0.3510
  12.10user 1.36system 0:13.69elapsed 98%CPU (0avgtext+0avgdata 0maxresident)k
  0inputs+0outputs (0major+262325minor)pagefaults 0swaps

  $ opcontrol --stop
  $ opreport
  CPU: ppc64 970MP, speed 2500 MHz (estimated)
  Counted PM_CYC_GRP23 events ((Group 23 pm_hpmcount1) Processor cycles)
          with a unit mask of 0x00 (No unit mask) count 50000
  Counted PM_DTLB_MISS_GRP23 events ((Group 23 pm_hpmcount1) Data TLB misses)
          with a unit mask of 0x00 (No unit mask) count 1000
  PM_CYC_GRP23:5...|PM_DTLB_MISS_G...|
    samples|      %|  samples|      %|
  ------------------------------------
     599073 98.2975      9492 97.1844 stream

Using the methods described earlier, it is predicted that 17.84% of time is spent translating addresses. Note that time reported that the benchmark took 13.69 seconds to complete. Now rerun the benchmark using huge pages.

  $ oprofile_start.sh --sample-cycle-factor 5 --event timer --event dtlb_miss
  [ ... profiler starts ... ]
  $ hugectl --heap /usr/bin/time ./stream
  [ ...]
  Function      Rate (MB/s)   Avg time     Min time     Max time
  Copy:        3127.4279       0.2295       0.2289       0.2308
  Scale:       3116.6594       0.2303       0.2297       0.2317
  Add:         3596.7276       0.2988       0.2985       0.2992
  Triad:       3604.6241       0.2982       0.2979       0.2985
  10.92user 0.82system 0:11.95elapsed 98%CPU (0avgtext+0avgdata 0maxresident)k
  0inputs+0outputs (0major+295minor)pagefaults 0swaps

  $ opcontrol --stop
  $ opreport
  CPU: ppc64 970MP, speed 2500 MHz (estimated)
  Counted PM_CYC_GRP23 events ((Group 23 pm_hpmcount1) Processor cycles)
          with a unit mask of 0x00 (No unit mask) count 50000
  Counted PM_DTLB_MISS_GRP23 events ((Group 23 pm_hpmcount1) Data TLB misses)
          with a unit mask of 0x00 (No unit mask) count 1000
  PM_CYC_GRP23:5...|PM_DTLB_MISS_G...|
    samples|      %|  samples|      %|
  ------------------------------------
     538776 98.4168         0       0 stream

DTLB misses are now negligible within the STREAM benchmark, and it completes in 11.95 seconds instead of 13.69, which is about 12% faster. Of the four operations, Copy is now 12.37% faster, Scale is 9.67% faster, Add is 16.75% faster and Triad is 17.13% faster. Hence, the estimate of 563 cycles for DTLB misses on this machine is reasonable.
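
The per-operation figures can be reproduced from the Rate columns of the two STREAM runs, while the whole-benchmark figure is based on elapsed time. The sketch below keeps the two measures apart; both function names are invented for illustration.

```c
/* Throughput gain as a percentage: how much higher the new rate is. */
static double percent_faster(double old_rate, double new_rate)
{
    return (new_rate / old_rate - 1.0) * 100.0;
}

/* Run-time reduction as a percentage of the original run time. */
static double percent_time_saved(double old_seconds, double new_seconds)
{
    return (old_seconds - new_seconds) / old_seconds * 100.0;
}
```

Applied to the Copy rates (2783.1461 and 3127.4279 MB/s), percent_faster() gives about 12.37%; applied to the elapsed times (13.69 and 11.95 seconds), percent_time_saved() gives a little under 13%.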

3 Calculating TLB Miss Cost with libhugetlbfs

The methods described in this section for measuring TLB costs were incorporated into libhugetlbfs as of release 2.7 in a script called tlbmiss_cost.sh, which ships with a manual page. It automatically detects whether calibrator or oprofile should be used to measure the cost of a TLB miss and can optionally download any additional programs needed for the measurement. By default it runs silently; in the following example, where a miss cost of 19 cycles was measured, verbose output is enabled to show the details of its operation.

    $ tlbmiss_cost.sh -v
    TRACE: Beginning TLB measurement using calibrator
    TRACE: Measured CPU Speed: 2167 MHz
    TRACE: Starting Working Set Size (WSS): 13631488 bytes
    TRACE: Required tolerance for match: 3 cycles
    TRACE: Measured TLB Latency 19 cycles within tolerance. Matched 1/3
    TRACE: Measured TLB Latency 19 cycles within tolerance. Matched 2/3
    TRACE: Measured TLB Latency 19 cycles within tolerance. Matched 3/3
    TLB_MISS_COST=19
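
The result can be picked up by other scripts. The following is a minimal sketch, assuming the output format matches the verbose example above; the parsing helper and the conversion to nanoseconds are illustrative and not part of libhugetlbfs.

```python
import re

# Verbose output copied from the example run above (indentation removed).
SAMPLE_OUTPUT = """\
TRACE: Beginning TLB measurement using calibrator
TRACE: Measured CPU Speed: 2167 MHz
TRACE: Starting Working Set Size (WSS): 13631488 bytes
TRACE: Required tolerance for match: 3 cycles
TRACE: Measured TLB Latency 19 cycles within tolerance. Matched 1/3
TRACE: Measured TLB Latency 19 cycles within tolerance. Matched 2/3
TRACE: Measured TLB Latency 19 cycles within tolerance. Matched 3/3
TLB_MISS_COST=19
"""


def parse_tlbmiss_output(text):
    """Return (miss cost in cycles, CPU speed in MHz or None)."""
    cost = re.search(r"^\s*TLB_MISS_COST=(\d+)\s*$", text, re.MULTILINE)
    if cost is None:
        raise ValueError("no TLB_MISS_COST line found")
    # The CPU speed only appears in verbose (-v) output.
    mhz = re.search(r"Measured CPU Speed: (\d+) MHz", text)
    return int(cost.group(1)), (int(mhz.group(1)) if mhz else None)


def cycles_to_ns(cycles, mhz):
    """Convert a cycle count to nanoseconds at the given clock speed."""
    return cycles * 1000.0 / mhz


cost_cycles, cpu_mhz = parse_tlbmiss_output(SAMPLE_OUTPUT)
cost_ns = cycles_to_ns(cost_cycles, cpu_mhz)
print(f"TLB miss cost: {cost_cycles} cycles, about {cost_ns:.2f} ns "
      f"at {cpu_mhz} MHz")
```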

4 Summary

While a deep understanding of the TLB and oprofile is not necessary to take advantage of huge pages, it can be instructive to know more about the TLB and the expected performance benefits before any modifications are made to a system configuration. Using oprofile, reasonably accurate predictions can be made in advance.

Conclusion

While virtual memory is an unparalleled success in engineering terms, it is not totally free. Despite multiple page sizes being available for over a decade, support within Linux was historically tricky to use and avoided by even skilled system administrators. Over the last number of years, effort within the community has brought huge pages to the point where they are relatively painless to configure and use with applications, even to the point of requiring no source level modifications to the applications. Using modern tools, it was shown that performance can be improved with minimal effort and a high degree of reliability.

In the future, there will still be a push for greater transparent support of huge pages, particularly for use with KVM. Patches are currently being developed by Andrea Arcangeli aiming at the goal of greater transparency. This represents a promising ideal but there is little excuse for avoiding huge page usage as they exist today.

Happy Benchmarking.

Bibliography

libhtlb09
Various Authors. libhugetlbfs 2.8 HOWTO. Packaged with the libhugetlbfs source. http://sourceforge.net/projects/libhugetlbfs, 2009.

casep78
Richard P. Case and Andris Padegs. Architecture of the IBM system/370. Commun. ACM, 21(1):73--96, 1978.

denning71
Peter J. Denning. On modeling program behavior. In AFIPS '71 (Fall): Proceedings of the November 16-18, 1971, fall joint computer conference, pages 937--944, New York, NY, USA, 1971. ACM.

denning96
Peter J. Denning. Virtual memory. ACM Comput. Surv., 28(1):213--216, 1996.

gorman09a
Mel Gorman. http://www.itwire.com/content/view/30575/1090/1/0. http://www.csn.ul.ie/~mel/docs/stream-api/, 2009.

henessny90
Hennessy, J. L. and Patterson, D. A. Computer Architecture: A Quantitative Approach. Morgan Kaufmann Publishers, 1990.

manegold04
Stefan Manegold and Peter Boncz. The Calibrator (v0.9e), a Cache-Memory and TLB Calibration Tool. http://homepages.cwi.nl/~manegold/Calibrator/calibrator.shtml, 2004.

mccalpin07
John D. McCalpin. STREAM: Sustainable Memory Bandwidth in High Performance Computers. In a continually updated technical report. http://www.cs.virginia.edu/stream/, 2007.

smith82
Smith, A. J. Cache memories. ACM Computing Surveys, 14(3):473--530, 1982.

yotov04a
Kamen Yotov, Keshav Pingali, and Paul Stodghill. Automatic measurement of memory hierarchy parameters. Technical report, Cornell University, nov 2004.

yotov04b
Kamen Yotov, Keshav Pingali, and Paul Stodghill. X-ray : Automatic measurement of hardware parameters. Technical report, Cornell University, oct 2004.

Comments (3 posted)

Page editor: Jonathan Corbet

Distributions

News and Editorials

Debian Project Leader election 2010

By Rebecca Sobol
March 24, 2010

We are in the campaigning period for this year's Debian Project Leader (DPL) election; voting begins April 2, 2010. Platforms have been posted for each of the four candidates—Stefano Zacchiroli, Wouter Verhelst, Charles Plessy, and Margarita Manterola. The March archive of the debian-vote mailing list is full of questions to the candidates, and their answers. This article will summarize the candidates' answers to some of these questions.

None of the candidates plan on having a second in charge (2IC) or a DPL team this year. All agree that Debian funds should be spent on necessary hardware costs and facilitating meetings. Margarita and Stefano were both in favor of funding marketing efforts, such as booths at conferences. Stefano would also like to see the project be more transparent about the money flowing in and out.

Being a DPL takes time and no candidate is able to be a full time DPL. Stefano's job is FOSS related though, and he would be able to take some time off for DPL duties. He also said he would divert his current Debian activities into DPL activities. Wouter is a consultant with somewhat flexible hours, and plans to devote more of his free time to Debian. Both Charles and Margarita would divert the time they spend on Debian into DPL duties, but would not necessarily be able to commit to additional time. Charles, who currently lives in Japan, said he would not travel to distant timezones.

Debian is a volunteer organization, but some people have found ways to get paid for working on Debian. However, the idea of using Debian funds to pay developers was uniformly rejected. Wouter qualified his response though, citing a model used by FreeBSD. The model starts with finding sponsors to pay for a certain project and using the FreeBSD foundation to collect and hold such money specifically for that project. Then in some cases the foundation may contribute additional funds to that project. It's not something he would actively pursue for Debian, though.

Anyone who has followed Debian mailing lists (or IRC, forums, etc.) for a while knows that sometimes discussions can become very heated. There has been an overall trend toward fewer flames in recent years, but it still could be better. The candidates agree that personal attacks are never acceptable. Margarita and Stefano said they would talk privately to the participants of flamewars and politely ask them to stop. Charles said he would make an effort to prepare neutral summaries to resurrect important discussions where the productive parts were drowned in a sea of flames. There was overall agreement that Debian's culture is changing into a more polite society and the DPL can only encourage the transition and lead by example.

Debian's release cycle is unpredictable, despite the best efforts of many developers. There are technical issues that keep releases from happening, but there are also social issues. The Release Team (RT) is tasked with a difficult job and it is hard to find and keep knowledgeable volunteers. What can a DPL do?

Margarita would like to encourage more release critical bug fixing, but admits there is much more to a release than RC bugs. Beyond that there's not much a DPL can do besides helping with better documentation of what needs to be done and then asking developers for their help. Stefano sees it as a cultural problem that will take time to fix. From his perception the RT often feels that the project is not interested in getting releases done, and that leads to frustrated RT members. As DPL he would prod the RT for periodic status updates and help to communicate that status to the greater development community. The development community needs to become more invested in the release process. Wouter also sees a cultural problem, where the community needs to become more welcoming in order to find and keep its valuable volunteers. Charles would like to reshape the release process, vary the definition of 'core packages' for different architectures, and make it easy to remove non-core packages from 'testing' if they have unfixed RC bugs. That would reduce the work load for the RT. He also thinks that the release process will become more social over time, with more people doing their part of the work.

The Debian community may evolve over time into a culture where releases are predictable, but should they coordinate those releases with Ubuntu? Margarita would like to see a full release every two years, with a small set of core packages updated annually. It would be good if those releases could be coordinated with Ubuntu. Stefano likes the idea of coordinating specific releases together with derivative distributions, when both distributions will benefit from the coordination. He is not convinced that Debian will benefit by trying to conform to Ubuntu's schedule. Wouter is also in favor of coordination in general, if it works out. Charles would like to see a predictable release schedule, but doesn't feel that aligning with Ubuntu is right for Debian. If anything he'd like to see stable releases happen every two years, but in between Ubuntu's Long Term Support (LTS) releases. That way Debian/Ubuntu users could install a recent release with reasonably long support every year rather than every other year. Collaboration is fine when the opportunity presents itself, but Debian should release when ready, not according to someone else's schedule. He doesn't think the current RT is communicating well enough and as DPL he would strongly encourage the RT to give frequent status reports.

The discussion continues on these and other topics. Many interested voters are already following and taking part in the discussions. Other voters are encouraged to follow the discussions on debian-vote. Hopefully this summary will help some people get a feel for the candidates and the issues they face.

Comments (none posted)

New Releases

Fedora Unity Fedora Re-spin 20100303 released

The Fedora Unity Project has announced the release of new ISO Re-Spins of Fedora 12. "These Re-Spin ISOs are based on the officially released Fedora 12 installation media and include all updates released as of March 3, 2010."

Full Story (comments: none)

FreeBSD 7.3-RELEASE Available

The FreeBSD Release Engineering Team has announced the availability of FreeBSD 7.3-RELEASE. "This is the fourth release from the 7-STABLE branch which improves on the functionality of FreeBSD 7.2 and introduces a few new features. There will be one more release from this branch to allow future improvements to be made available in the 7-STABLE branch but at this point most developers are focused on 8-STABLE."

Full Story (comments: none)

openSUSE Education Li-f-e Update

The openSUSE Education team has announced the availability of the updated openSUSE Education Li-f-e DVD ISO. "The Linux for Education (Li-f-e) contains a wide selection of education, development, office, as well as multimedia packs to meet all possible computing needs of students, teachers and parents."

Comments (none posted)

New OpenVZ kernel, new Owl ISOs and OpenVZ container templates

A new OpenVZ kernel, new ISOs and OpenVZ container templates are available for Openwall GNU/Linux (Owl). "We have updated Owl to use OpenVZ's latest kernel from their "rhel5" branch (released on 03/18), with RHEL5 patches further updated from Red Hat's latest stable kernel (released on 03/16) and with some minor changes of our own. Thus, we're ahead of OpenVZ official kernels in terms of security fixes right now, and there have been quite a few of those lately..."

Full Story (comments: none)

Ubuntu 10.04 LTS Beta 1 released

The first beta of Ubuntu 10.04 "Lucid Lynx" is out. 10.04 will be a long-term support release; it also brings a number of new features, many with a social-networking or cloud orientation, and a new "consumer friendly" interface for the netbook edition. More information can be found on the Lucid beta 1 page.

Full Story (comments: 20)

Announce XtreemOS 2.1 Release

The XtreemOS consortium has announced the release of XtreemOS 2.1. This update includes an improved installer, lots of high impact bug fixes, XtreemFS 1.2, XtreemOS MD (Mobile Device), and more.

Full Story (comments: none)

Distribution News

Debian GNU/Linux

Neil Williams: Possibilities for Emdebian Crush

Neil Williams looks at the progress of Emdebian Grip and Crush variants. "Crush 2.0 was abandoned last year when the freeze for Debian Squeeze was still scheduled to start at the end of 2009. Even with the expected delays in the timetable for the Debian release, there never was going to be enough time to get Crush 2.0 released with the resources available. Subsequent Crush releases have always been planned, only the release of Crush 2.0 alongside Debian 6.0 (Squeeze) was abandoned. However, Emdebian Grip has developed nicely and Grip 2.0 is going to be a significant advance over Grip 1.0 - lots more packages, lots of bugs fixed for smoother installations, multistrap support, etc."

Comments (none posted)

Bits from the New Maintainer process

Click below for some information about Debian's New Maintainer (NM) process. Topics include thanks to Wouter Verhelst who recently resigned his Front Desk position, a new Application Manager (AM) tutorial, inactive AMs, and support in the NM process.

Full Story (comments: none)

RFH: DebConf 10 Travel Sponsorship Team

The DebConf 10 Organizers are soliciting volunteers for the DebConf 10 Travel Sponsorship team. "Ideal candidates for the team have both available time and are well connected in the Debian web of trust -- while not essential, high connectivity in the web of trust probably indicates familiarity with a broader range of potential DebConf attendees."

Full Story (comments: none)

DebConf10: register by April 15 for sponsorship consideration

Registration is open for DebConf10, taking place August 1-7, 2010 in New York City. April 15, 2010 is the early registration deadline. "Registrations after that date will not be eligible for sponsored food, accommodation or travel."

Comments (none posted)

Fedora

Paul Frields: FPL future

Fedora project leader (FPL) Paul Frields is looking for a successor. After more than two years and (almost) five Fedora releases, he is ready to move on to "other ways of championing free and open source software at Red Hat". The Fedora Board along with various folks at Red Hat will be part of the search process for a new FPL. "This process will naturally take some time, but I'm glad that the partnership between Red Hat and the rest of the Fedora community allows me to give people an early heads-up about these plans. [...] It's important that Fedora always be able to make opportunities for fresh and energetic leadership that will help take our Project, and the distribution we make, to the next level of achievement." Former FPL Max Spevack also has some thoughts on Frields's tenure and the role of the FPL.

Comments (none posted)

Fedora Board Meeting Recap 2010-03-18

Click below for a recap of the March 18, 2010 meeting of the Fedora Advisory Board. Topics include the default offering and user base.

Full Story (comments: none)

Fedora mini mailing list

A new mailing list, mini, has been created for "discussions relating to Sugar, Moblin and anything else that people think they would like to see in that arena."

Full Story (comments: none)

SUSE Linux and openSUSE

Planet SUSE Status

Planet SUSE readers may have noticed that it has been unavailable recently. This is due to some problems while renewing the domain. An alternative DNS entry for the server under the openSUSE domain has been set up. You can now reach the planet at planet.openSUSE.org.

Full Story (comments: none)

Distribution Newsletters

DistroWatch Weekly, Issue 346

The DistroWatch Weekly for March 22, 2010 is out. "Protecting one's computer against malware in our interconnected, heterogeneous and (largely) anonymous world is a complex task. Luckily, there are free tools that help save plenty of time and effort; this week we'll take a brisk tour of Dr.Web LiveCD, a Linux-based system that offers free tools for system rescue, virus scanning, and data recovery errands. In the news section, Ubuntu stirs emotions over its unexpected placement of window control buttons, CrunchBang Linux announces a switch to Debian base for its upcoming release, Debian prepares for its annual project leader election with a woman on the candidates list, and the deputy head of LiMux explains the difficulties encountered while migrating tens of thousands of Munich's computers to Linux. Also in this issue, the Questions and Answers section provides hope and suggests tools for recovering files that were deleted by accident. Finally, two interesting distributions have been added to the DistroWatch database this week - a FreeBSD-based desktop live CD with GNOME and yet another XP look-a-like, this time from China. Happy reading!"

Comments (none posted)

openSUSE Weekly News/115

This issue of the openSUSE Weekly News covers the release of openSUSE 11.3 Milestone 3, and much more.

Comments (none posted)

Ubuntu Weekly Newsletter #185

The Ubuntu Weekly Newsletter for March 20, 2010 is out. "In this issue we cover: Ubuntu 10.04 LTS Beta 1 released, Ubuntu Global Jam: time is ticking, Call for Community help: Ubuntu.com Website Localization Project, Launchpad's Bug Watch system and other animals, Upgrade Jams - made easy, Server Bug Zapping - eucalyptus and euca2ools, Nominate your favorite Ubuntu Server Papercuts, Full Circle Podcast #2: The Full Circle of Light (Brown), and much, much more!"

Full Story (comments: none)

Interviews

Interview: CrunchBang Creator Explains Switch to Debian Sources (The Red Devil)

Steven Lawson talks with Philip Newborough about the recent alpha release of CrunchBang Linux. "PN: Interestingly, the CrunchBang community is not really confined to people who use CrunchBang. Our community is made up of people who use various different distributions. I think the one common interest that brings us together is our love of experimenting with Linux and having a little fun whilst doing it. I can honestly say that the best thing to come out of the project has been the CrunchBang forums. There are some really talented, friendly and knowledgeable people on the forums and it is a pleasure to be able to go there every day and share ideas."

Comments (none posted)

Page editor: Rebecca Sobol

Development

Not much of an email review

By Jonathan Corbet
March 24, 2010
Back in 2004, your editor grumbled about the state of electronic mail clients, noting that much of the wisdom encoded in the venerable MH system seemed to have been lost. Nearly six years later, it often seems that the situation has not improved much. Contemporary email clients are large, monolithic blobs which may look pretty, but they do not play well with other tools and seem to get in the user's way as often as not. So, when Dave Jones said in 2007:

I'm convinced there's some contest to see who can make the worst graphical mail client for Linux. I'm not sure what the prize is, or who's winning, but the entries so far are horrific.

your editor had no choice but to agree. Since then, the situation does not appear to have improved much.

The Notmuch mail client has been on your editor's radar for some months now. Recently, the opportunity to play with this new tool came by; this review is the result. In short: Notmuch looks like it is coming from the right place, but, as befits a program in such an early stage of development, it has some ground to cover yet.

The core of Notmuch is a command-line client which, as they say, does not much. One starts by running notmuch setup to put some basic information into the configuration file, followed by notmuch new to initialize the mail database. That process involves reading and indexing every message you have (messages are stored one-per-file, so Notmuch deals just fine with MH or Maildir trees). Even though the program cheerily declares that you have "not much mail" at setup time, the indexing process can take quite a while; it also doubles the size of the mail store. This step is not optional, though; indexing is at the core of how Notmuch works.

After setup, one can use the notmuch client to perform searches, modify tags, display messages, and compose replies. In a sense, it looks back to the early MH days, when everything was done at the shell prompt. The Notmuch developers do not really expect that users will use the command-line client directly, though; instead, it is assumed that some sort of user interface will be layered over it. There are a few such interfaces available now, but the interface of choice for the Notmuch developers at the moment would appear to be Emacs.

The Emacs notmuch-mode looks, in many ways, like most other Emacs-based mail clients. There are a few differences, though, starting with the fact that folders are a completely alien concept. This is 2010, and folders have been consigned to a dim, dusty, and hierarchical past; now we do everything with tags. So there's no "inbox" in Notmuch; instead, one sees the results of a search for the "inbox" tag. Something similar to refiling can be done, should one so desire, by attaching different tags to the message, but one gets the sense that's not expected to happen very often. This is the era of search, so a specific view of the mail store is just a search away.

Indeed, searches in Notmuch (powered by Xapian) are very fast and very flexible. The syntax is reasonably straightforward and the results are nearly instantaneous. There is an easy feature for further narrowing the results of a search. It is a powerful and flexible way to deal with large quantities of mail.
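
The tags-as-searches model lends itself to scripting. What follows is a minimal sketch, assuming the notmuch binary is on the PATH and accepts the "search" and "tag" subcommands in the form shown; the helper names and the example sender are hypothetical, and nothing is executed unless run() is called explicitly.

```python
import shlex
import shutil
import subprocess


def search_command(*terms):
    """Build the argv for a notmuch search; extra terms narrow the result."""
    return ["notmuch", "search", *terms]


def tag_command(add=(), remove=(), terms=()):
    """Build the argv that adds and removes tags on matching messages."""
    changes = [f"+{tag}" for tag in add] + [f"-{tag}" for tag in remove]
    if not changes:
        raise ValueError("at least one tag change is required")
    # "--" separates the tag changes from the search terms.
    return ["notmuch", "tag", *changes, "--", *terms]


def run(argv):
    """Run a command and return its output, or None if notmuch is missing."""
    if shutil.which(argv[0]) is None:
        return None
    result = subprocess.run(argv, capture_output=True, text=True, check=False)
    return result.stdout


# The "inbox" is simply a search for the inbox tag.
inbox = search_command("tag:inbox")
# Narrowing the view: the sender is a hypothetical example.
narrowed = search_command("tag:inbox", "from:editor@example.com")
# Roughly what reading a message does: drop its "unread" tag.
mark_read = tag_command(remove=["unread"], terms=["tag:inbox"])

for argv in (inbox, narrowed, mark_read):
    print(shlex.join(argv))
```

Keeping command construction separate from execution lets such a script be checked on a machine that has no mail store at all.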

The process of reading through mail, though, is still in need of some work; the Emacs interface is not, yet, at the level of usability offered by, say, MH-E - and one would ideally set the bar higher than that. There does not appear to be a way to look at the structure of threads in the folder search results view; each thread is collapsed into a single line with a few of the participants listed. Displaying that thread dumps all of the messages into a single Emacs buffer, which can then be paged through. Your editor would rather see the thread structure and individual messages, preferably at the same time.

Working through mail, your editor notes, can be quite slow, with noticeable delays between messages. That would appear to be a result of the removal of the "unread" tag from each message as it is viewed. Tagging operations in general seem to require significant index changes; it may well be that storing mail on a solid-state storage device will be required to get acceptable performance for this kind of operation.

Notmuch doesn't handle composition and sending of mail at all; it defers to the standard Emacs message mode for that. It also doesn't try very hard to display attachments; images, for example, are handed off to external helper programs even though Emacs can display such things inline. When it comes to HTML parts, notmuch-mode does not even try; this, of course, might be seen as a significant advantage.

Is your editor switching to Notmuch? Not yet. Notmuch requires that all mail be stored locally, but your editor likes having that central IMAP server available. There are developers working on tools for synchronizing mail and tags between stores; some folks are even seriously looking at using git as the underlying mail store. There's also no support for multiple email accounts; that is probably trivially fixable by adding a command-line option allowing easy use of multiple configuration files. And the interface remains a little rough.

In the longer term, though, Notmuch could well become your editor's mail tool of choice. The fundamental approach looks right, and the tool-oriented nature of the plumbing should enable the easy scripting of operations on messages. There is an active and growing community of users and contributors; Notmuch has the look of a successful project. This tool looks like it could become a powerful utility indeed in not much time at all.

Comments (18 posted)

Brief items

What's been going on with Ardour?

Paul Davis has an update on Ardour development, which looks at the upcoming 3.0 version, as well as maintenance on 2.x. Ardour is a "digital audio workstation" that runs on Linux and MacOS X. "This work went along with a top-to-bottom revisit of the undo/redo mechanism with the goal of making it scale properly to operations involving large numbers of regions. The results? Operations that were taking an absurd amount of time (40 seconds) to undo can now be undone in less than half a second. The overall responsiveness of undo/redo has now greatly improved." Davis also points to a recent ShotOfJaq podcast on funding models for free software projects that uses Ardour as an example. The comments on the podcast page are worth reading as well.

Comments (none posted)

DWARF Version 4 Released

Version 4 of the DWARF debugging information format specification has been released for public comment. DWARF is used by GCC, GDB, and other free and proprietary toolchains. "Michael Eager, Chair of the DWARF Committee, said 'we have made significant improvements in Version 4 since the previous version was released in 2006. These include improved data compression, better description of optimized code, and support for new language features in C++. Debugging programs can be difficult. Providing the best quality information to programmers can make this easier.'" The DWARF committee is accepting public comments on the spec until May 31. Click below for the full announcement.

Full Story (comments: 2)

GDB 7.1 released

Version 7.1 of the GDB debugger is out. The big changes appear to be multi-program debugging and the ability to work with PIE executables. There are also a couple of newly supported platforms and a number of other enhancements.

Full Story (comments: 8)

Greenlet 0.3 released

Greenlet 0.3 is out. It is a spin-off of Stackless Python, packaged as an extension module for the standard interpreter, but with a twist:

A "greenlet", on the other hand, is a still more primitive notion of micro-thread with no implicit scheduling; coroutines, in other words. This is useful when you want to control exactly when your code runs. You can build custom scheduled micro-threads on top of greenlet; however, it seems that greenlets are useful on their own as a way to make advanced control flow structures.

This release adds some new features, a number of unit tests, and support for Python 3.
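
The idea of micro-threads with no implicit scheduling can be illustrated without the greenlet module. The sketch below uses plain Python generators as a stand-in, which is not the greenlet API: a generator can only hand control back to whoever resumed it, whereas a greenlet can switch away from any call depth. All names here are illustrative.

```python
def ping(log, rounds):
    """A task that records a line, then explicitly gives up control."""
    for i in range(rounds):
        log.append(f"ping {i}")
        yield  # control returns to whoever resumed this task


def pong(log, rounds):
    """A second task with the same shape."""
    for i in range(rounds):
        log.append(f"pong {i}")
        yield


def run_alternating(tasks):
    """Hand-written scheduler: resume each task in turn until all finish."""
    pending = list(tasks)
    while pending:
        for task in list(pending):
            try:
                next(task)  # run the task up to its next yield
            except StopIteration:
                pending.remove(task)


log = []
run_alternating([ping(log, 2), pong(log, 2)])
print(log)
```

Nothing runs until the scheduler resumes it, so the interleaving is decided entirely by the calling code rather than by a runtime.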

Full Story (comments: none)

OpenSSO becomes OpenAM

This entry in the not403 blog discusses OpenSSO, a single sign-on project which Oracle acquired from Sun and has subsequently shut down. "A Norwegian company called ForgeRock has stepped up to give OpenSSO a new home and continue developing OpenSSO under a new name: OpenAM (because of copyright issues with the name). They claim they will continue with Sun's original roadmap for the product, and they have started to make available again all of the express builds, including agents, that were removed from OpenSSO's site, and a new wiki with all the content that once was available at dev.java.net."

Comments (3 posted)

Python 2.6.5 and 3.1.2 released

The Python 2.6.5 and 3.1.2 releases are out. Both releases fix a large number of bugs and are intended for production use.

Comments (none posted)

udisks 1.0.0 released

Udisks is the new name for DeviceKit-disks, a utility for the low-level management of block devices. With the 1.0.0 release, the developers have committed to ABI compatibility through the 1.0.x series; a number of features have been added as well.

Full Story (comments: 6)

Newsletters and articles

Development newsletters from the last week

Comments (none posted)

Claws Mail: Mail with Attitude (Linux Magazine)

Joe 'Zonker' Brockmeier reviews Claws Mail. "Modern mail user agents (MUAs) tend to hide as much complexity from the user as possible. Claws, bless its speedy little heart, doesn't. Claws is extremely configurable, feature-rich through the use of plugins, and can be keyboard-driven to satisfy users who want the speed of text-based mailers like Mutt with a decent GUI."

Comments (3 posted)

Linux Arpeggiators, Part 2 (Linux Journal)

Dave Phillips continues his coverage of Linux arpeggiators. "Part 1 of this series introduced arpeggiators in general and profiled the QMidiArp application. This week we conclude our survey with a look at two more arpeggiators for Linux musicians: Hypercyclic and Arpage."

Comments (none posted)

Luis Villa: Mailing lists are parties. Or they should be.

Luis Villa compares mailing lists and parties on his blog. He is reacting to a blog posting by Máirín Duffy that mocks up a web-based mailing list interface that incorporates feedback for readers and posters. Villa sees the feedback as being essential to reducing "bad conversations" on mailing lists. "First, the similarities. At most parties, like most mailing lists, most people want to have interesting conversations, and they understand the shared social standards and interests of the other people at the party. And at most parties and most mailing lists there are a handful of people are boors who probably don’t want to spoil the party, but who violate those shared norms- some in very mild ways (boring, talking too loud, posting too much), or maybe some less mild (the guy who doesn’t think he’s a racist, but really is.) If you’ve got similar mixes of people, why then do parties usually handle boors well, while mailing lists often fail and flame out?"

Comments (36 posted)

Page editor: Jonathan Corbet

Announcements

Commercial announcements

New Tokyo stock exchange system built on RHEL

Red Hat has sent out a press release stating that the Tokyo Stock Exchange has built its next-generation trading system on Red Hat Enterprise Linux. "The new system aims to deliver a capacity of orders accepted per second ten times larger than that of TSE's previous trading platform. TSE has measured an impressive order response time of two milliseconds and an information distribution time of three milliseconds. In addition, the solution offers the flexibility to accommodate new trading rules, the ability to be scaled with jumps in system demand and trading growth and expanded security and reliability."

Comments (7 posted)

Novell Rejects Elliott Associates' Proposal as Inadequate

Novell has announced that its Board has concluded that the unsolicited, conditional proposal from Elliott Associates, L.P. to acquire the Company for $5.75 per share in cash is inadequate and that it undervalues the Company's franchise and growth prospects. "Novell also announced that its Board of Directors has authorized a thorough review of various alternatives to enhance stockholder value. These alternatives include, but are not limited to, a return of capital to stockholders through a stock repurchase or cash dividend, strategic partnerships and alliances, joint ventures, a recapitalization and a sale of the Company."

Comments (4 posted)

rPath Joins the Linux Foundation

The Linux Foundation has announced that rPath has joined the Foundation. "Smaller budgets have increased demand for solutions that allow IT to take on increasing levels of scale without adding cost or headcount. rPath today offers automation solutions for provisioning and patching a variety of Linux-based systems, including Red Hat Enterprise Linux (RHEL) and SUSE Linux Enterprise, among others. Its Linux Foundation membership will enable it to broaden its community involvement and collaborate with industry and technical leadership."

Comments (none posted)

Legal Announcements

HTC responds to Apple

HTC has finally sent out a press release responding to Apple's patent lawsuit. "HTC disagrees with Apple’s actions and will fully defend itself. HTC strongly advocates intellectual property protection and will continue to respect other innovators and their technologies as we have always done, but we will continue to embrace competition through our own innovation as a healthy way for consumers to get the best mobile experience possible."

Comments (11 posted)

Articles of interest

Open Video Alliance launches Wikipedia video campaign (ars technica)

Ryan Paul covers the launch of the Open Video Alliance. "The Open Video Alliance (OVA), a group that seeks to promote adoption of standards-based open video technologies, has launched a new campaign encouraging users to upload videos to the Wikipedia website. The goals behind this new campaign are to visually enrich the online encyclopedia and promote awareness of the value that open video technologies can bring to the Web."

Comments (9 posted)

Google Summer of Code 2010: Mentoring organisations announced (The H)

The H looks at the list of accepted mentoring organizations for GSoC 2010. "The GSoC contests offer university students stipends to write and develop code for various open source projects. Accepted mentors include the Debian Project and the KDE Project, both of which are already seeking project ideas. AbiWord, FFmpeg, Facebook, the GNU Compiler Collection, the LXDE Foundation, Mozilla and Ubuntu are all among the other accepted organisations."

Comments (2 posted)

New Books

Google Wave: Up and Running and Building Web Reputation Systems--New from O'Reilly

O'Reilly has released two new books, "Google Wave: Up and Running" and "Building Web Reputation Systems".

Full Story (comments: none)

Resources

The full text of the proposed ACTA treaty

The Anti-Counterfeiting Trade Agreement is a treaty being negotiated under strict secrecy; it seeks to impose a whole new set of "intellectual property" rules worldwide. Now, it seems, the full text of the treaty has been leaked; it can be found at swpat.org, where the process of transcribing the PDF file into searchable text is underway.

Comments (3 posted)

LCA videos available

Video recordings from the linux.conf.au 2010 conference are available. LCA2010 was held January 18-23, 2010 at the Wellington Convention Centre in Wellington, New Zealand. (Thanks to Scott Dowdle)

Comments (24 posted)

Transcript: Andrew Tridgell on Patent Defence

Ciaran O'Riordan has made a transcript of Andrew Tridgell's LCA talk on Patent Defence.

Full Story (comments: none)

Videos from the Embedded Linux Conference Europe 2009

Videos of the talks from the 2009 Embedded Linux Conference Europe have been posted by the folks at Free Electrons. There would appear to be far more interesting material available than one can watch in a reasonable period of time.

Comments (1 posted)

Contests and Awards

Free Software Award Winners Announced

The Free Software Foundation (FSF) has announced the winners of the annual free software awards. "The award for the Advancement of Free Software was won by John Gilmore. The award for Project of Social Benefit was won by the Internet Archive. The awards were presented by FSF president and founder Richard M. Stallman."

Full Story (comments: none)

Calls for Presentations

UKUUG - Open Tech 2010

Open Tech 2010 is an informal conference to be held September 11, 2010 in London, UK. "OpenTech is as much about conversations in the bar, as it is sitting in sessions; what topics would you like to be discussed with a range of people? The best way of getting the OpenTech audience to think about the challenges you have is by sharing what they are and solutions you've already found: by offering a talk." The deadline for submissions is June 1, 2010.

Full Story (comments: none)

1st Call For Papers, 17th Annual Tcl/Tk Conference 2010

This year's annual Tcl/Tk Conference (Tcl'2010) will be held October 11-15, 2010 in Chicago/Oakbrook Terrace, Illinois, USA. Abstracts and proposals are due by August 1, 2010.

Full Story (comments: none)

LinuxCon CFP Deadline is March 31

The call for proposals for LinuxCon ends March 31, 2010. LinuxCon North America 2010 takes place August 10-12, 2010 in Boston, MA, with several mini-summits taking place on August 9, 2010. "There will be three different categories for submissions: Developer (kernel, core development, software engineering), Operations (systems administration and management, systems architecture, Linux migration and deployment) and Business (open source governance, enterprise, ecosystem). Each of these groups plays a key role in the Linux community and we want to make sure that they are represented at LinuxCon. While we have a list of suggested topics for proposals, we invite the community to submit any creative and interesting topics that they think might be pertinent to the audience."

Full Story (comments: none)

Upcoming Events

Events: April 1, 2010 to May 31, 2010

The following event listing is taken from the LWN.net Calendar.

Date(s) | Event | Location
March 30-April 1 | Where 2.0 Conference | San Jose, CA, USA
April 9-11 | Spanish DebConf | Coruña, Spain
April 10 | Texas Linux Fest | Austin, TX, USA
April 12-14 | Embedded Linux Conference | San Francisco, CA, USA
April 12-15 | MySQL Conference & Expo 2010 | Santa Clara, CA, USA
April 14-16 | Linux Foundation Collaboration Summit | San Francisco, USA
April 14-16 | Lustre User Group 2010 | Aptos, California, USA
April 16 | Drizzle Developer Day | Santa Clara, CA, United States
April 16-17 | R/Finance 2010 Conference - 2nd Annual | Chicago, IL, US
April 23-25 | FOSS Nigeria 2010 | Kano, Nigeria
April 23-25 | QuahogCon 2010 | Providence, RI, USA
April 24 | Festival Latinoamericano de Instalación de Software Libre | Many, Many
April 24 | Open Knowledge Conference 2010 | London, UK
April 24-25 | OSDC.TW 2010 | Taipei, Taiwan
April 24-25 | BarCamb 3 | Cambridge, UK
April 24-25 | Fosscomm 2010 | Thessaloniki, Greece
April 24-25 | LinuxFest Northwest | Bellingham WA, USA
April 24-26 | First International Workshop on Free/Open Source Software Technologies | Riyadh, Saudi Arabia
April 25-29 | Interop Las Vegas | Las Vegas, NV, USA
April 28-29 | Xen Summit North America at AMD | Sunnyvale, CA, USA
April 29 | Patents and Free and Open Source Software | Boulder, CO, USA
May 1-2 | OggCamp | Liverpool, England
May 1-2 | Devops Down Under | Sydney, Australia
May 1-4 | Linux Audio Conference | Utrecht, NL
May 3-6 | Web 2.0 Expo San Francisco | San Francisco, CA, USA
May 3-7 | SambaXP 2010 | Göttingen, Germany
May 6 | NLUUG spring conference: System Administration | Ede, The Netherlands
May 7-8 | Professional IT Community Conference | New Brunswick, NJ, USA
May 7-9 | Pycon Italy | Firenze, Italy
May 10-14 | Ubuntu Developer Summit | Brussels, Belgium
May 17-21 | Fourth African Conference on FOSS and the Digital Commons | Accra, Ghana
May 18-21 | PostgreSQL Conference for Users and Developers | Ottawa, Ontario, Canada
May 24-25 | Netbook Summit | San Francisco, CA, USA
May 24-26 | DjangoCon Europe | Berlin, Germany
May 24-30 | Plone Symposium East 2010 | State College, PA, USA
May 27-30 | Libre Graphics Meeting | Brussels, Belgium

If your event does not appear here, please tell us about it.

Miscellaneous

Google EMEA conference grants for female computer scientists

Ada Lovelace Day seems an appropriate day to note that Google is offering conference and travel grants for female computer scientists in the EMEA (Europe, the Middle East, and Africa) region. Women can apply for a grant for eligible technical conferences and, if selected, will receive free conference registration and €300 for travel. Applicants are required to have a strong academic background and to be working in or studying Computer Science, Computer Engineering, or a field closely related to that of the conference. (Thanks to Armijn Hemel).

Comments (24 posted)

Page editor: Rebecca Sobol

Copyright © 2010, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds