
LWN.net Weekly Edition for April 7, 2011

Future storage technologies and Linux

By Jonathan Corbet
April 6, 2011
The opening plenary session at the second day of the 2011 Linux Filesystem, Storage, and Memory Management workshop was led by Michael Cornwall, the global director for technology standards at IDEMA, a standards organization for disk drive manufacturers. Thirteen years ago, while working for a hardware manufacturer, Michael had a hard time finding the right person to talk to in the Linux community to get support for his company's hardware. Years later, that problem still exists; there is no easy way for the hardware industry to work with the Linux community, with the result that Linux has far less influence than it should. His talk covered the changes that are coming in the storage industry and how the Linux community can get involved to make things work better.

The International Disk Drive Equipment and Materials Association works to standardize disk drives and the many components found therein. While some may say that disk drives - rotating storage - are on their way out, the fact of the matter is that the industry shipped a record 650 million drives last year and is on track to ship one billion drives in 2015. This is an industry which is not going away anytime soon. Who drives this industry? There are, he said, four companies which control the direction the disk drive industry takes: Dell, Microsoft, HP, and EMC. Three of those companies ship Linux, but Linux is not represented in the industry's planning at all.

One might be surprised by what's found inside contemporary drives. There is typically a multi-core ARM processor similar to those found in cellphones, and up to 1GB of RAM. That ARM processor is capable, but it still has a lot of dedicated hardware help; special circuitry handles error-correcting code generation and checking, protocol implementation, buffer management, sequential I/O detection, and more. Disk drives are small computers running fairly capable operating systems of their own.

The programming interface to disk drives has not really changed in almost two decades: drives still offer a randomly-accessible array of 512-byte blocks addressed by logical block addresses. The biggest problem on the software side is trying to move the heads as little as possible. The hardware has advanced greatly over these years, but it is still stuck with "an archaic programming architecture." That architecture is going to have to change in the coming years, though.

The first significant change has been deemed the "advanced format" - a fancy term for 4K sector drives. Christoph Hellwig asked for the opportunity to chat with the marketing person who came up with that name; the rest of us can only hope that the conversation will be held on a public list so we can all watch. The motivation behind the switch to 4K sectors is greater error-correcting code (ECC) efficiency. By using ECC to protect larger sectors, manufacturers can gain something like a 20% increase in capacity.

The developer who has taken the lead in making 4K-sector disks work with Linux is Martin Petersen; he complained that he has found it to be almost impossible to work on new technologies with the industry. Prototypes from manufacturers worked fine with Linux, but the first production drives to show up failed outright. Even with his "800-pound Oracle hat" on, he has a hard time getting a response to problems. "Welcome," Michael responded, "to the hard drive business." More seriously, he said that there needs to be a "Linux certified" program for hardware, which probably needs to be driven by Red Hat to be taken seriously in the industry. Others agreed with this idea, adding that, for this program to be truly effective, vendors like Dell and HP would have to start requiring Linux certification from their suppliers.

4K-sector drives bring a number of interesting challenges beyond the increased sector size. Windows 2000 systems will not properly align partitions by default, so some manufacturers have created off-by-one-alignment drives to compensate. Others have stuck with normal alignment, and it's not always easy to tell the two types of drive apart. Meanwhile, in response to requests from Microsoft and Dell, manufacturers are also starting to ship native 4K drives which do not emulate 512-byte sectors at all. So there is a wide variety of hardware to try to deal with. There is an evaluation kit available for developers who would like to work with the newer drives.
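
The alignment arithmetic behind all this is easy to illustrate with a few lines of Python (an illustrative sketch, not something presented at the workshop): a partition is only safe on a 512-byte-emulating 4K drive if its starting logical block lands on a physical-sector boundary, which is why the traditional sector-63 partition start is a problem and why an "off-by-one" drive makes it line up.

    # Illustrative only: check whether a partition start is aligned on a
    # drive with 4KB physical sectors exposed as 512-byte logical sectors.
    LOGICAL_BYTES = 512
    PHYSICAL_BYTES = 4096

    def aligned(start_lba, drive_offset_lba=0):
        # drive_offset_lba models an "off-by-one" drive that shifts the
        # medium by one logical sector to compensate for misaligned OSes.
        return ((start_lba + drive_offset_lba) * LOGICAL_BYTES) % PHYSICAL_BYTES == 0

    print(aligned(63))        # legacy partition start at sector 63: False
    print(aligned(63, 1))     # the same partition on an off-by-one drive: True
    print(aligned(2048))      # modern 1MiB-aligned partition start: True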

The next step would appear to be "hybrid drives" which combine rotating storage and flash in the same package. The first generation of these drives did not do very well in the market; evidently Windows took over control of the flash portion of the drive, defeating its original purpose, so no real performance benefit was seen. There is a second generation coming which may do better; they have more flash storage (anywhere from 8GB to 64GB) and do not allow the operating system to mess with it, so they should perform well.

Ted Ts'o expressed concerns that these drives may be optimized for filesystems like VFAT or NTFS; such optimizations tend not to work well when other filesystems are used. Michael replied that this is part of the bigger problem: Linux filesystems are not visible to the manufacturers. Given a reason to support ext4 or btrfs the vendors would do so; it is, after all, relatively easy for the drive to look at the partition table and figure out what kinds of filesystem(s) it is dealing with. But the vendors have no idea of what demand may exist for which specific Linux filesystems, so support is not forthcoming.

A little further in the future is "shingled magnetic recording" (SMR). This technology eliminates the normal guard space between adjacent tracks on the disk, yielding another 20% increase in capacity. Unfortunately, those guard tracks exist for a reason: they allow one track to be written without corrupting the adjacent track. So an SMR drive cannot just rewrite one track; it must rewrite all of the tracks in a shingled range. What that means, Michael said, is that large sequential writes "should have reasonable performance," while small, random writes could perform poorly indeed.

The industry is still trying to figure out how to make SMR work well. One possibility would be to create separate shingled and non-shingled regions on the drive. All writes would initially go to a non-shingled region, then be rewritten into a shingled region in the background. That would necessitate the addition of a mapping table to find the real location of each block. That idea caused some concerns in the audience; how can I/O patterns be optimized if the connection between the logical block address and the location on the disk is gone?

The answer seems to be that, as the drive rewrites the data, it will put it into something resembling its natural order and defragment it. That whole process depends on the drive having enough idle time to do the rewriting work; it was said that most drives are idle over 90% of the time, so that should not be a problem. Cloud computing and virtualization might make that harder; their whole purpose is to maximize hardware utilization, after all. But the drive vendors seem to think that it will work out.
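
To make that scheme concrete, here is a toy model of the indirection described above; it is purely illustrative (the class and method names are invented, and real firmware is far more involved). Random writes land in a non-shingled staging area, a mapping table records where each logical block currently lives, and an idle-time pass folds the staged blocks back into the shingled region in logical-block order.

    # Toy model of an SMR drive's remapping layer; everything here is
    # invented for illustration.
    class ToySMRDrive:
        def __init__(self):
            self.staging = {}    # LBA -> data in the non-shingled scratch area
            self.shingled = {}   # LBA -> data rewritten in LBA order
            self.location = {}   # LBA -> "staging" or "shingled"

        def write(self, lba, data):
            # Small random writes always hit the staging area first.
            self.staging[lba] = data
            self.location[lba] = "staging"

        def read(self, lba):
            if self.location.get(lba) == "staging":
                return self.staging[lba]
            return self.shingled.get(lba)

        def idle_rewrite(self):
            # Background pass, run when the drive is idle: fold staged
            # blocks into the shingled region in LBA order, approximating
            # the "natural" layout and defragmenting as it goes.
            for lba in sorted(self.staging):
                self.shingled[lba] = self.staging[lba]
                self.location[lba] = "shingled"
            self.staging.clear()

    drive = ToySMRDrive()
    for lba in (70, 12, 43):
        drive.write(lba, "data@%d" % lba)   # small random writes
    drive.idle_rewrite()
    print(drive.read(12), drive.location[12])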

Michael presented four different options for the programming interface to SMR drives. The first was traditional hard drive emulation with remapping as described above; such drives will work with all systems, but they may have performance problems. Another possibility is "large block SMR": a drive which does all transfers in large blocks - 32MB at a time, for example. Such drives would not be suitable for all purposes, but they might work well in digital video recorders or backup applications. Option three is "emulation with hints," allowing the operating system to indicate which blocks should be stored together on the physical media. Finally, there is the full object storage approach where the drive knows about logical objects (files) and tries to store them contiguously.

How well will these drives work with Linux? It is hard to say; there is currently no Linux representation on the SMR technical committee. These drives are headed for market in 2012 or 2013, so now is the time to try to influence their development. The committee is said to be relatively open, with open mailing lists, no non-disclosure agreements, and no oppressive patent-licensing requirements, so it shouldn't be hard for Linux developers to participate.

Beyond SMR, there is the concept of non-volatile RAM (NV-RAM). An NV-RAM device is an array of traditional dynamic RAM combined with an equally-sized flash array and a board full of capacitors. It operates as normal RAM but, when the power fails, the charge in the capacitors is used to copy the RAM data over to flash; that data is restored when the power comes back. High-end storage systems have used NV-RAM for a while, but it is now being turned into a commodity product aimed at the larger market.

NV-RAM devices currently come in three forms. The first looks like a traditional disk drive, the second is a PCI-Express card with a special block driver, and the third is "NV-DIMM," which goes directly onto the system's memory bus. NV-DIMM has a lot of potential, but is also the hardest to support; it requires, for example, a BIOS which understands the device, will not trash its contents with a power-on memory test, and which does not interleave cache lines across NV-DIMM devices and regular memory. So it is not something which can just be dropped into any system.

Looking further ahead, true non-volatile memory is coming around 2015. How will we interface to it, Michael asked, and how will we ensure that the architecture is right? Dell and Microsoft asked for native 4K-sector drives and got them. What, he asked, does the Linux community want? He recommended that the kernel community form a five-person committee to talk to the hard disk drive industry. There should also be a list of developers who should get hardware samples. And, importantly, we should have a well-formed opinion of what we want. Given those, the industry might just start listening to the Linux community; that could only be a good thing.

Comments (34 posted)

Camp KDE: Update on Qt open governance

April 6, 2011

This article was contributed by Joe 'Zonker' Brockmeier.

Nokia announced that it would be pursuing an open governance model for Qt in June of 2010 at Akademy. After nearly a year of discussions and preparation, Thiago Macieira provided an update at Camp KDE outlining the governance model that Nokia would be pursuing and the next steps. Though Macieira did not have all of the details, it seems that Qt will be reasonably open for a project that began life as a non-free toolkit.

Macieira, a senior product manager for Qt Software under Nokia, noted early in the talk that this was really another step in a long process of opening up Qt: from its humble beginnings as a non-free toolkit, to an open source license, then the GPL, and finally the addition of LGPLv2.1 and the opening of all development to the public. Once there were only daily snapshots, and before that even less visibility; now, Macieira says, "it's not really news" that Nokia continues to open up Qt development and governance.

Not surprising, perhaps, but still newsworthy and interesting to the KDE developers who depend heavily on Qt and may have been quite worried about its future following what Macieira called "the events of February 11," when Nokia announced its partnership with Microsoft.

Motivation

Why is Nokia doing this? Macieira says it's in Nokia's best interest: Qt is growing "faster than what we can or should grow," a reference to Nokia's Qt R&D team. He continued:

It's in our best interest that others use Qt but don't depend on us doing everything for them. [We] don't want to do everything, can't do everything people want to do with Qt, let people join in and do what they need.

What will be happening now? Macieira says that Qt will be developed using open participation, which is "what Qt developers have always wanted, not what KDE has always wanted, what our engineers have always wanted". He noted that the model that Nokia had chosen for Qt would be more akin to the way the Linux kernel team works, with a distributed approval system, with public discussions and decision-making on mailing lists. He noted that with KDE development "everybody works on what they want to work on" which wouldn't work well for Qt development.

Though KDE is a very visible consumer of Qt, with a long history of working with Nokia (and, before it, Trolltech) on opening Qt, it was not the only project to influence the open governance model. Macieira noted that MeeGo and Qxt have also been influential.

Code will make it into Qt after it's been approved by maintainers and has passed regression testing. Anything that doesn't pass regression testing will be backed out. Historically, says Macieira, they would be reluctant to accept changes in many cases because that meant that Qt developers would be signing up to maintain the changes. No longer — when developers propose changes "it's your responsibility now, if you broke a test, it's your responsibility to go fix it."

As for the actual "governance," Macieira says that Qt will not have an elected board or anything like that — simply a tiered system that starts (at the bottom) with contributors, then approvers, then maintainers who have responsibility for a given module or port (such as Webkit), and finally the "Chief Troll" who will lead the maintainers. The Chief Troll, of course, would be analogous to Linus Torvalds. Who will be the Chief Troll? Macieira wouldn't say, but said they "have an idea" who would be taking the troll role.

Macieira says that the timeline for the announcement is "within the next two months" but it could be sooner. He says that Nokia now needs to contact the people that it has been considering and asking if they're willing to take on the maintainer and Chief Troll roles. "As soon as we get the people to say yes, we'll probably announce", he said.

Filling out the project

Macieira says that the system will be bootstrapped by Nokia and many positions will be filled by people already in the organization, though he also said that the company would ask external contributors to be maintainers as well. He said that maintainers would "naturally appear" from the people contributing to modules, and that there would likely be changes after a shaking out period where maintainers had a chance to establish themselves. Some might decide, for example, that they didn't wish to keep the responsibility.

There will be other roles as well — QA, release management, project management, community management, and so on. This was a bit sketchy, but it seems to follow the model used by many companies that sponsor FOSS development projects. He asked the community to "see how it goes" and "there will be adjustments". He also invited the audience to the upcoming Qt Contributor's Summit to meet other developers and participate in the process.

Though Nokia is committing to a more open governance model, it's worth noting that the company is not turning everything over to the community. The company will continue to hold the trademark, and it will continue to ask developers to sign its contributor agreement so that the company can continue to offer Qt under commercial license as well as under the GPLv3 and LGPLv2.1.

However, Macieira says "there's no ownership here" aside from the trademark. Contributors have to grant Nokia the right to use their code, but the contributor agreement doesn't require them to sign their copyright over to Nokia; contributors retain the copyright. Macieira says that, because Qt is under the LGPL, "anybody can take it elsewhere" if they're unhappy with the way the community is run, the direction of the project, or anything else. However, he says that Nokia "wants to make it so that this community is attractive, and that people can come and work with us. Your needs can be met inside so you don't have to fork."

After the presentation, I asked Macieira to identify the biggest hurdle for Qt open governance. He said there was not a single major issue, but "a lot of small issues" that could derail the project. In particular, he cited the lengthy process of opening Qt:

We've been at this a long time and we're risking losing the interest and participation of key influential people. Without them, we may be unable to convince that this is a legitimate effort and to get the necessary training of people external to Nokia

KDE and Nokia

How does the KDE community feel about this? Cornelius Schumacher, a KDE developer and president of KDE e.V., was at Macieira's presentation. He says that it will "make it easier for us to directly contribute to Qt, and participate in the maintenance of our foundational toolkit." Schumacher also credits Nokia for being very open about the process and inviting the community to participate. Though there are a number of details that need to be worked out, he says that he's optimistic that it will work out well.

That seems to sum up the feeling of most of the audience — Nokia seems to have quite a bit of goodwill in the KDE community and seems to be on the road to a model that will work well for the larger Qt community. Macieira emphasized a number of times that the governance model that was outlined is simply what Nokia thinks will work based on its observation of other well-functioning communities and feedback it has received in the process of moving to an open governance model. Macieira asks the community to work with it and see what works, and what doesn't.

Nokia may not be going quite as far as some community members would wish, but it does seem that the company is making a very good faith effort and satisfying most of the community's concerns. The devil, of course, is in the details — it will be interesting to see who Nokia appoints as maintainers and "Chief Troll," and how many of the decision-makers initially are from outside the Nokia corporate walls. A clearer picture should be available after the Qt Contributor's Summit in Berlin in June.

Comments (5 posted)

TXLF: HeliOS helps schoolkids and challenges developers

April 6, 2011

This article was contributed by Nathan Willis

Ken Starks of the HeliOS Project delivered the keynote talk at the second annual Texas Linux Fest (TXLF) in Austin on Saturday. HeliOS is a not-for-profit initiative that refurbishes computers and gives them to economically-disadvantaged schoolkids in the Austin area — computers running Linux. Starks had words for the audience on the value of putting technology into young hands, as well as a challenge to open source developers to re-think some of their assumptions about users — based on what HeliOS has learned giving away and supporting more than one thousand Linux systems.

How HeliOS works

[Ken Starks]

Starks led off by giving the audience an overview of HeliOS, both its mission and how it operates in practice. It is under the federal non-profit umbrella of Software In the Public Interest (SPI), which supports Debian, Freedesktop.org, and many other projects. The program started in 2005, and since then has given away more than 1200 computers (some desktops, some laptops) to Austin-area children and their families. The families are important in discussing HeliOS's work, Starks said, because the 1200 number only counts the child "receiving" the computer. When siblings, parents, and other family members are included, he estimates that more than 4000 people are using HeliOS's machines.

The hardware itself is donated by area businesses and individuals. But the project does not accept just any old end-of-life machines. The goal is to provide the recipient with a working, useful system, so the project only accepts donations of recent technology. At present, that means desktops with Pentium 4 or Athlon XP processors or newer (at 2GHz and above), 1GB of RAM or more, and at least 40GB of storage. The full list of accepted hardware reveals some additional restrictions that the project must make (it no longer accepts CRT monitors for liability and transportation reasons) as well as predictable pain points, such as 3D-capable graphics cards. Starks has said in the past that roughly one third of all computers donated to HeliOS must have their graphics card replaced in order to be useful on a modern desktop.

Referrals come from a variety of sources, including teachers, social workers, police officers, and even hospitals. Starks and HeliOS volunteers make a visit to the home to get to know the family and scope out the child's situation before making a donation commitment. A family that can afford a high-priced monthly cable bill, he suggested, might get a call back in a few days recommending that they lower their cable package and purchase a reduced-price computer from HeliOS instead. But a computer is always in tow for the first visit, ready for immediate delivery.

Volunteers assemble and repair each PC, then install HeliOS's own custom Linux distribution — currently an Ubuntu remix tailored to include educational software, creative and music applications, and a few games. The team delivers and sets up the computer in the family's home, providing basic training for everyone in the household. They continue to stay involved with the families to provide support as needed. Support for the hardware and the Linux distribution, that is.

Periodically, HeliOS receives a call from a recipient's family member asking for help with a copy of Windows that they installed after erasing Linux from the machine. The child never removes Linux, Starks said, only a parent, and the support call almost always means trouble with viruses, malware, or driver incompatibility. At that point, HeliOS politely refuses to support the Windows OS, but will gladly reinstall Linux. This type of event is a rarity; Starks mentions on his blog that it happened just eight times in 2010, out of 296 Linux computers. It never matters to the kids what OS is on the computer, he said, the kids are simply "jacked" to be finally entering the world of computer ownership.

But Linux is not merely a cost-saving compromise HeliOS uses to make ends meet (although Microsoft did offer the project licenses for Windows XP at a reduced rate of US $50 apiece). The project includes virtual machine software in its distribution, and has a license donated by CodeWeavers to install Crossover Pro for those occasions when a specific Windows application is required, Starks said. The real reason Linux is the operating system of choice is that it allows the children to do more and learn more than they can with a closed, limited, and security-problem-riddled alternative. Our future scientists and engineers are the students learning about technology as children today, he said, and HeliOS wants them to know how Linux and free software can change that future.

What HeliOS can teach the developer community

[Ken Starks]

Over six years of providing Linux computers to area schoolkids (the oldest of whom include five just entering graduate school), Starks said, the project has amassed lots of data on how children and new users use computers, which allows him to give feedback to the developer community that it won't hear otherwise. The open source community creates a lot of islands, he said — KDE island and GNOME island, for example. But the most troubling one is User island and Developer island, between which people only talk through slow and ineffective message-in-a-bottle means. Because open source lacks the inherent profit motivation that pushes proprietary software developers to keep working past the "works for me" point, too many projects reach the "good enough" stage and stop.

Starks explored several examples of the user/developer disconnect, starting with the humorous indecipherable-project-name problem. He listed around a half-dozen applications that HeliOS provides in its installs, but with names he said reinforce the impression that Linux is not only created by geeks, but for geeks: Guayadeque, Kazehakase, Gwibber, Choqok, Pidgin, Compiz, and ZynAddSubFX. The pool of available project names may be getting low, he admitted, but he challenged developers to remember that when they introduce a new user to the system, they are implicitly asking the user to learn a whole new language. When there is no "cognitive pathway" between the name of the application and what it does, learning the new environment is needlessly hard.

He then presented several usability problems that stem from poor defaults, lack of discoverability, and confusing built-in help. In OpenOffice.org Writer, for example, most users simply choose File -> Save, unaware that the default file format is incompatible with Microsoft Word, which starts a day-long firestorm for the user when they email the file to a friend and it is mysteriously unusable to the recipient. The lxBDPlayer media player — in addition to making the awkward-name list — confronts the user with a palette of Unix-heavy terminology such as "mount points" and "paths" even within its GUI.

Time ran short, so Starks skipped over a few slides, but he does blog about many of the same issues, further citing the experience of HeliOS computer families. The message for developers was essentially to rethink the assumptions that they make about the user. For example, it is common to hear the 3D graphics-card requirement of both Ubuntu's Unity and GNOME 3's Shell defended by developers because "most people" have new enough hardware. Starks touched on that issue briefly as well as in a February blog post, and might amend that defense to say "most middle-class people" have new enough hardware. Most users do not have any problem with the application name GIMP, but Starks asks the developers to consider what it is like when he has to introduce the application to a child wearing leg braces. Most developers think their interface is usable, but Starks asks them to try to remember what it was like when they used Linux — or any computer — for the very first time.

Starks concluded his talk by assuring the audience that the example projects he talked about were chosen just to stir up the pot, not to cause any real offense. He poked fun at the Ubuntu Customization Kit's acronym UCK, for example, but said HeliOS is indebted to it for allowing the project to create all of its custom software builds. Indeed, Starks can dial up his "curmudgeonly" persona at will to make a humorous point (as he did many times), but also switch right back into diplomatic mode when he needs to. He ended the talk by thanking the open source community for all of its hard work. "Sure, we give away computers, but without what you do, we give away empty shells", he said.

Starks believes in the mission of the HeliOS project because the next generation will discover and innovate more than the past two generations combined — and they will be able to do it because they will learn about technology using the software created by the community. It is a humbling and exciting future to contemplate, he said, one that if the developer community stops to consider, makes for a far better incentive to innovate than the profit motivation that drives the proprietary competition.

Impact

I am part of the organizing team for TXLF, so I can tell you that among the reasons the team invited Starks to deliver the keynote this year were the opportunity to present a "Linux story" from outside the typical IT closet environment and the major distributions, and Starks's ability to present a challenge to the community. He certainly delivered on both counts. What remains an open question is whether that challenge gets taken seriously, or gets lost in the well-oiled machinery of the release cycle.

After all, most of us have heard the "project name" dilemma before, and yet it remains a persistent problem. Is the fact that HeliOS has hands-on, real-world examples of new users being put off by application names going to prompt any project to re-evaluate its name? Who knows. It is easy to dismiss Starks's stories as anecdotal (and he readily admits that his data is not controlled or scientific), but the project does install around 300 Linux computers per year, in the field.

In the meantime, it is good to know that the project will keep up that work. Starks took time out of his allotment to present volunteer Ron West with the "HeliOS volunteer of the year" award, and to mention some of the work the initiative is currently engaged in. It recently moved into a new building, and has started The Austin Prometheus Project to try to raise funds to provide Internet service to HeliOS kids, 70 percent of whom have no Internet connection. Of course, that statistic flies in the face of yet another assumption the development community makes all the time about always-on connectivity. I suppose the challenges never end.

Comments (13 posted)

Page editor: Jonathan Corbet

Security

Deliberately insecure Linux distributions as practice targets

April 6, 2011

This article was contributed by Koen Vervloesem

There are a lot of penetration testing (aka pentest) tools, but they are not always easy to learn, so you need practice — a lot of practice. Before using these tools against a live environment, you need to set up a test environment, install some services with vulnerabilities, and then try to break into it. Fortunately, pentesters don't have to do all this preparation themselves, as this is a niche that a number of Linux distributions fill. We'll take a look at a few of these deliberately insecure Linux distributions, which can be run on an isolated network or in a virtual machine and targeted with your pentesting tools or exploits. On the attacker's side, you could use a distribution like BackTrack or a pentesting tool like the Metasploit Framework.

Damn Vulnerable Linux and Metasploitable

Probably the most well-known vulnerable Linux distribution is Damn Vulnerable Linux, but at this moment the website has the message "We are working. DVL 2.0 might appear in summer 2011" and there doesn't seem to be a way to download the most recent release, 1.5 (which dates from January 2009), so your author couldn't review DVL. The idea, however, is simple: DVL is shipped as a distribution that is as vulnerable as possible, for learning and research purposes for security pentesters and students. DVL was built by Thorsten Schneider, a security researcher at Bielefeld University in Germany, as a training system that he could use for his university lectures, to teach topics like buffer overflows, SQL injection, and so on.

Another well-known vulnerable Linux distribution is Metasploitable, an Ubuntu 8.04 server install on a VMware image. This install includes a number of vulnerable packages, such as a Tomcat 5.5 servlet container with weak credentials, ssh and telnet accounts with weak passwords, and outdated versions of distcc, tikiwiki, twiki, and MySQL. Metasploitable is meant as a practice target for the Metasploit Framework, but of course you can also use it to test other pentesting tools. Moreover, the virtual disk is non-persistent, so all damage you do to the system while pentesting disappears after a reboot. Metasploitable can easily be installed in VirtualBox: just add the vmdk file as a new virtual hard disk and create a new Linux VM with that disk as the boot disk. Just don't forget to enable IO APIC in the virtual machine.
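
For those who would rather script that setup than click through the VirtualBox GUI, something like the following works. It is only a sketch: the VM name and vmdk path are placeholders, and the memory size is an arbitrary choice.

    # Sketch: create the Metasploitable VM with VirtualBox's VBoxManage tool.
    # The vmdk path is a placeholder; point it at the unpacked image.
    import subprocess

    VM = "Metasploitable"
    VMDK = "/path/to/Metasploitable.vmdk"

    def vbox(*args):
        subprocess.check_call(["VBoxManage"] + list(args))

    vbox("createvm", "--name", VM, "--ostype", "Ubuntu", "--register")
    vbox("modifyvm", VM, "--memory", "512", "--ioapic", "on")  # IO APIC on
    vbox("storagectl", VM, "--name", "IDE", "--add", "ide")
    vbox("storageattach", VM, "--storagectl", "IDE", "--port", "0",
         "--device", "0", "--type", "hdd", "--medium", VMDK)
    vbox("startvm", VM, "--type", "headless")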

LAMPSecurity

An especially interesting vulnerable machine (or rather, a set of virtual machines) is LAMPSecurity. There is a CentOS-based virtual machine that can be used as the attacker's operating system because it comes preloaded with many attack tools, and another CentOS-based virtual machine as the target, named Capture The Flag. Unfortunately, your author couldn't get these images, distributed as VMware images, to boot on VirtualBox. However, the Capture The Flag image comes with a tutorial PDF that demonstrates how to chain together a series of vulnerabilities to completely compromise the target system. The document describes one possible path to root, but of course there are other ways to compromise the target, so after reading the document, users can apply what they have learned to explore the target further.

The tutorial begins with scanning the target with the vulnerability scanner Nikto, which is specialized in testing web servers for interesting files and directories (e.g. a public /phpmyadmin) and vulnerable web server software. It also identifies the version numbers of Apache and PHP, which are useful to search for vulnerabilities that apply. Then the tutorial shows how to use Paros as a web proxy in the browser, so the pentester can intercept requests to the target: all requests and responses are registered and can be investigated in the Paros program to look for vulnerabilities in a web application.

In the next step of the tutorial, the user is guided to identify an SQL injection vulnerability in the target's web site. This section is a particularly interesting introduction to SQL injections, with a step-by-step explanation spelled out in detail, including how to get access to system files. In the last step, the tutorial builds upon this SQL injection with a local privilege escalation to get an interactive root shell for the attacker.
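
The general shape of such a vulnerability is easy to demonstrate outside the LAMPSecurity image. The snippet below is a generic illustration (it is not code from the tutorial or the target) of how unsanitized input rewrites a query, and how a parameterized query avoids the problem.

    # Generic illustration of SQL injection; not code from the LAMPSecurity
    # target or its tutorial.
    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (name TEXT, secret TEXT)")
    conn.execute("INSERT INTO users VALUES ('alice', 'hunter2')")

    user_input = "nobody' OR '1'='1"   # attacker-controlled value

    # Vulnerable: string interpolation lets the OR clause become part of
    # the query, so every row comes back.
    print(conn.execute(
        "SELECT * FROM users WHERE name = '%s'" % user_input).fetchall())

    # Safe: a parameterized query treats the input as a plain value.
    print(conn.execute(
        "SELECT * FROM users WHERE name = ?", (user_input,)).fetchall())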

De-ICE PenTest

The most comprehensive vulnerable distribution project is definitely the De-ICE PenTest Lab, the brainchild of penetration tester Thomas Wilhelm. When he had to learn as much about penetration testing as possible in a short time, he found no usable targets to practice on, so he created his own live CDs: two "Level 1" ISO images and one "Level 2" image. On the attacker's side, Wilhelm recommends BackTrack. Unfortunately, the target machines have a hardcoded IP address, which can conflict with your own network's address range.

Each of the ISO images is meant to be used in a specific real-world scenario: for the first Level 1 image, you are hired by a small company to pentest an old server that has a web-based list of the company's contact information. The scenario for the second Level 1 image is a little tougher: the target system is an FTP server that has been used in the past to maintain customer information but has been sanitized, and you have to show that you can get sensitive information out of the server. In the Level 2 scenario, you should identify any vulnerabilities you can find, and you get the permission to cause damage.

De-ICE PenTest also has a forum, where users can discuss the challenges for the three ISO images and get some help (warning: there are spoilers in the forum). On the wiki, there are also some video walkthroughs. Of course these contain major spoilers, so you probably want to wait for them until you have completed the challenges.

Other projects

There are a lot of other projects. The Virtual Hacking Lab has the same approach as LAMPSecurity: it distributes an ISO image to run on the attacker's side (the security-focused Gentoo derivative live CD Pentoo), and offers some vulnerable images to run as the target machines. For instance, a directory lists quite a few vulnerable distributions. Unfortunately, the project doesn't come with comprehensive documentation.

The OWASP Broken Web Applications Project is, like its name says, focused on vulnerable web applications. OWASP is the Open Web Application Security Project, a community that works to create freely available documentation, methodologies, and tools concerning web application security. The OWASP Broken Web Applications Project is distributed as a virtual machine in a VMware image. It's running outdated, vulnerable versions of some real-life web applications, such as phpBB and WordPress, but also some intentionally vulnerable applications created by OWASP and other projects.

Holynix is an Ubuntu Server install on a VMware image, which also runs on VirtualBox or Qemu. According to the README, the image requires a specific network configuration with a static IP address, which is cumbersome if the required network mask conflicts with your own network. Your author downloaded version 2 and ran it in VirtualBox. The project has a forum with help, including instructions about importing the distribution's image in VMware or VirtualBox. Just don't forget to enable PAE/NX and IO APIC in the virtual machine, or it won't boot.

Practice

If you start digging, you'll easily find a dozen vulnerable Linux distributions to practice on. However, none of them really stands out from the crowd. Many are already old, although that's not a bad thing in this case, as it improves the chances of finding vulnerabilities. A somewhat more painful issue is that many of these distributions require a specific network configuration, which is a barrier to quickly testing them on an arbitrary network. Along the same lines, many of these projects are distributed as VMware images, which are not always easy to run in other hypervisors. Documentation is also an issue with many of these projects: while one could argue that good pentesters should always be able to find their way around a foreign system, a little guidance would make these vulnerable distributions a more efficient way to practice tools and techniques. One thing is sure, though: pentesters who jump through all these hoops will be able to practice their techniques on a lot of different test targets.

Comments (3 posted)

Brief items

X.Org security advisory: root hole via rogue hostname

X.Org has patched a root hole in xrdb, in all versions up to 1.0.8. "By crafting hostnames with shell escape characters, arbitrary commands can be executed in a root environment when a display manager reads in the resource database via xrdb." Hosts that set their hostname via DHCP and/or hosts that allow remote logins via xdmcp are affected. The issue has been fixed in xrdb 1.0.9.
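
The bug belongs to a familiar class; the sketch below is a generic illustration (it is not xrdb's actual code) of how a hostname containing shell metacharacters turns into a command when it is interpolated into a shell string, and how passing it as a plain argument avoids the problem.

    # Generic illustration of the bug class; not xrdb's code.
    import subprocess

    hostname = "evil; id"   # attacker-supplied, e.g. via DHCP or XDMCP

    # Dangerous: the string goes through a shell, so the ";" starts a
    # second command (shown with echo rather than a resource preprocessor).
    subprocess.call("echo HOSTNAME=%s" % hostname, shell=True)

    # Safer: pass the value as a single argument, with no shell involved.
    subprocess.call(["echo", "HOSTNAME=%s" % hostname])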

Full Story (comments: 16)

Laurie: Improving SSL certificate security

On Google's security blog, Ben Laurie looks at some Google initiatives to improve SSL certificate security. One is a certificate catalog that Google gathers as it spiders the internet, which can be queried via DNS (see the post for details). "The second initiative to discuss is the DANE Working Group at the IETF. DANE stands for DNS-based Authentication of Named Entities. In short, the idea is to allow domain operators to publish information about SSL certificates used on their hosts. It should be possible, using DANE DNS records, to specify particular certificates which are valid, or CAs that are allowed to sign certificates for those hosts. So, once more, if a certificate is seen that isn't consistent with the DANE records, it should be treated with suspicion."
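
As a rough idea of what such a record carries (the exact record format was still being worked out in the IETF at the time, so this is only a sketch, and the file path is a placeholder), the association data amounts to a digest of the server's certificate that clients can compare against whatever the server presents.

    # Sketch: compute the sort of certificate association a DANE record
    # could publish - a SHA-256 digest of the certificate's DER encoding.
    # The file path is a placeholder.
    import hashlib

    with open("/path/to/server-cert.der", "rb") as f:
        digest = hashlib.sha256(f.read()).hexdigest()

    print("association data to publish in DNS:", digest)
    # A client that sees a certificate whose digest does not match the
    # published value should treat the connection with suspicion.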

Comments (63 posted)

Linux security summit CFP open

On his blog, James Morris has announced that the call for presentations for the 2011 Linux Security Summit is now open. Proposals will be accepted until May 27, and the summit will be held on September 8 in Santa Rosa, CA in conjunction with the Linux Plumbers Conference. From the summit site: "Brief technical talks in 30 minute slots, including at least 10 minutes of discussion (i.e. the maximum length of the presentation alone is 20 minutes). Papers are encouraged, and slides should be minimal. [...] Presentation abstracts should be approximately 150 words in length."

Comments (none posted)

New vulnerabilities

asterisk: multiple vulnerabilities

Package(s): asterisk   CVE #(s): CVE-2011-1174 CVE-2011-1175
Created: March 31, 2011   Updated: April 27, 2011
Description:

From the Red Hat Bugzilla [1, 2]:

CVE-2011-1174: If manager connections were rapidly opened, sent invalid data, then closed, it could cause Asterisk to exhaust available CPU and memory resources. The Manager Interface is disabled by default. Versions 1.6.2.x and 1.8.x are affected, and 1.6.2.17.1 and 1.8.3.1 have been released to correct this flaw.

CVE-2011-1175: If a remote, unauthenticated, attacker were to rapidly open and close TCP connections to services using the ast_tcptls_* API, they could cause Asterisk to crash after dereferencing a NULL pointer. This flaw affects 1.6.2.x and 1.8.x, and is corrected in 1.6.2.17.1 and 1.8.3.1.

Alerts:
Gentoo 201110-21 asterisk 2011-10-24
Fedora FEDORA-2011-3942 asterisk 2011-03-23
Fedora FEDORA-2011-3945 asterisk 2011-03-23
Debian DSA-2225-1 asterisk 2011-04-25

Comments (1 posted)

cobbler: privilege escalation

Package(s): cobbler   CVE #(s): CVE-2011-1551
Created: April 1, 2011   Updated: April 6, 2011
Description: From the openSUSE advisory:

/var/log/cobbler/ directory was owned by the web service user. Access to this account could potentially be abused to corrupt files during root filesystem operations by the Cobbler daemon.

Alerts:
SUSE SUSE-SR:2011:006 apache2-mod_php5/php5, cobbler, evince, gdm, kdelibs4, otrs, quagga 2011-04-05
openSUSE openSUSE-SU-2011:0277-1 cobbler 2011-04-01

Comments (none posted)

evince: buffer overflow

Package(s): evince   CVE #(s): CVE-2011-0433
Created: April 1, 2011   Updated: January 30, 2012
Description: From the openSUSE advisory:

This update of evince fixes a buffer overflow in linetoken().

Alerts:
Gentoo 201701-57 t1lib 2017-01-24
Mandriva MDVSA-2012:144 tetex 2012-08-28
Scientific Linux SL-tete-20120823 tetex 2012-08-23
Oracle ELSA-2012-1201 tetex 2012-08-23
CentOS CESA-2012:1201 tetex 2012-08-23
Red Hat RHSA-2012:1201-01 tetex 2012-08-23
openSUSE openSUSE-SU-2012:0559-1 t1lib 2012-04-25
Oracle ELSA-2012-0137 texlive 2012-02-15
CentOS CESA-2012:0137 texlive 2012-02-16
Scientific Linux SL-texl-20120215 texlive 2012-02-15
Red Hat RHSA-2012:0137-01 texlive 2012-02-15
CentOS CESA-2012:0062 t1lib 2012-01-30
Fedora FEDORA-2012-0266 t1lib 2012-01-28
Fedora FEDORA-2012-0289 t1lib 2012-01-28
Ubuntu USN-1347-1 evince 2012-01-25
Scientific Linux SL-t1li-20120125 t1lib 2012-01-25
Red Hat RHSA-2012:0062-01 t1lib 2012-01-24
Oracle ELSA-2012-0062 t1lib 2012-01-25
Ubuntu USN-1335-1 t1lib 2012-01-19
Debian DSA-2388-1 t1lib 2012-01-14
Mandriva MDVSA-2012:004 t1lib 2012-01-12
SUSE SUSE-SR:2011:006 apache2-mod_php5/php5, cobbler, evince, gdm, kdelibs4, otrs, quagga 2011-04-05
openSUSE openSUSE-SU-2011:0279-1 evince 2011-04-01

Comments (none posted)

ffmpeg: denial of service

Package(s): ffmpeg   CVE #(s): CVE-2009-4639
Created: April 4, 2011   Updated: July 18, 2011
Description: From the Mandriva advisory:

The av_rescale_rnd function in the AVI demuxer in FFmpeg 0.5 allows remote attackers to cause a denial of service (crash) via a crafted AVI file that triggers a divide-by-zero error.

Alerts:
Gentoo 201310-12 ffmpeg 2013-10-25
Mandriva MDVSA-2011:112 blender 2011-07-18
Mandriva MDVSA-2011:061 ffmpeg 2011-04-01
Mandriva MDVSA-2011:060 ffmpeg 2011-04-01
Mandriva MDVSA-2011:059 ffmpeg 2011-04-01
Mandriva MDVSA-2011:088 mplayer 2011-05-16

Comments (none posted)

ffmpeg: multiple vulnerabilities

Package(s): ffmpeg   CVE #(s): CVE-2010-3908 CVE-2011-0480 CVE-2011-0722 CVE-2011-0723
Created: April 4, 2011   Updated: September 12, 2011
Description: From the Mandriva advisory:

Fix memory corruption in WMV parsing (CVE-2010-3908)

Multiple buffer overflows in vorbis_dec.c in the Vorbis decoder in FFmpeg, as used in Google Chrome before 8.0.552.237 and Chrome OS before 8.0.552.344, allow remote attackers to cause a denial of service (memory corruption and application crash) or possibly have unspecified other impact via a crafted WebM file, related to buffers for (1) the channel floor and (2) the channel residue. (CVE-2011-0480)

Fix heap corruption crashes (CVE-2011-0722)

Fix invalid reads in VC-1 decoding (CVE-2011-0723)

Alerts:
Gentoo 201310-12 ffmpeg 2013-10-25
Debian DSA-2306-1 ffmpeg 2011-09-11
Mandriva MDVSA-2011:114 blender 2011-07-18
Mandriva MDVSA-2011:112 blender 2011-07-18
Ubuntu USN-1104-1 ffmpeg 2011-04-04
Mandriva MDVSA-2011:062 ffmpeg 2011-04-01
Mandriva MDVSA-2011:061 ffmpeg 2011-04-01
Mandriva MDVSA-2011:089 mplayer 2011-05-16

Comments (none posted)

glibc: multiple vulnerabilities

Package(s): glibc   CVE #(s): CVE-2011-0536 CVE-2011-1071 CVE-2011-1095
Created: April 5, 2011   Updated: November 28, 2011
Description: From the Red Hat advisory:

The fix for CVE-2010-3847 introduced a regression in the way the dynamic loader expanded the $ORIGIN dynamic string token specified in the RPATH and RUNPATH entries in the ELF library header. A local attacker could use this flaw to escalate their privileges via a setuid or setgid program using such a library. (CVE-2011-0536)

It was discovered that the glibc fnmatch() function did not properly restrict the use of alloca(). If the function was called on sufficiently large inputs, it could cause an application using fnmatch() to crash or, possibly, execute arbitrary code with the privileges of the application. (CVE-2011-1071)

It was discovered that the locale command did not produce properly escaped output as required by the POSIX specification. If an attacker were able to set the locale environment variables in the environment of a script that performed shell evaluation on the output of the locale command, and that script were run with different privileges than the attacker's, it could execute arbitrary code with the privileges of the script. (CVE-2011-1095)

Alerts:
Gentoo 201312-01 glibc 2013-12-02
Ubuntu USN-1396-1 eglibc, glibc 2012-03-09
Scientific Linux SL-glib-20120214 glibc 2012-02-14
Oracle ELSA-2012-0125 glibc 2012-02-14
CentOS CESA-2012:0125 glibc 2012-02-14
Red Hat RHSA-2012:0125-01 glibc 2012-02-13
Mandriva MDVSA-2011:178 glibc 2011-11-25
Pardus 2011-83 glibc 2011-06-03
CentOS CESA-2011:0412 glibc 2011-04-14
Red Hat RHSA-2011:0413-01 glibc 2011-04-04
Red Hat RHSA-2011:0412-01 glibc 2011-04-04

Comments (none posted)

kdelibs4: man-in-the-middle attack

Package(s): kdelibs4   CVE #(s): CVE-2011-1094
Created: April 4, 2011   Updated: June 21, 2011
Description: From the CVE entry:

kio/kio/tcpslavebase.cpp in KDE KSSL in kdelibs before 4.6.1 does not properly verify that the server hostname matches the domain name of the subject of an X.509 certificate, which allows man-in-the-middle attackers to spoof arbitrary SSL servers via a certificate issued by a legitimate Certification Authority for an IP address, a different vulnerability than CVE-2009-2702.

Alerts:
Gentoo 201406-34 kdelibs 2014-06-30
Pardus 2011-81 dovecot 2011-06-03
Pardus 2011-79 kdelibs 2011-05-11
Ubuntu USN-1110-1 kde4libs 2011-04-14
Mandriva MDVSA-2011:071 kdelibs4 2011-04-08
SUSE SUSE-SR:2011:006 apache2-mod_php5/php5, cobbler, evince, gdm, kdelibs4, otrs, quagga 2011-04-05
Ubuntu USN-1101-1 qt4-x11 2011-04-01
openSUSE openSUSE-SU-2011:0281-1 kdelibs4 2011-04-04
openSUSE openSUSE-SU-2011:0280-1 kdelibs4 2011-04-04
Red Hat RHSA-2011:0464-01 kdelibs 2011-04-21

Comments (none posted)

loggerhead: cross-site scripting

Package(s): loggerhead   CVE #(s): CVE-2011-0728
Created: April 4, 2011   Updated: April 6, 2011
Description: From the CVE entry:

Cross-site scripting (XSS) vulnerability in templatefunctions.py in Loggerhead before 1.18.1 allows remote authenticated users to inject arbitrary web script or HTML via a filename, which is not properly handled in a revision view.

Alerts:
Fedora FEDORA-2011-4107 loggerhead 2011-03-25
Fedora FEDORA-2011-4085 loggerhead 2011-03-25

Comments (none posted)

logrotate: multiple vulnerabilities

Package(s): logrotate   CVE #(s): CVE-2011-1098 CVE-2011-1154 CVE-2011-1155
Created: March 31, 2011   Updated: June 26, 2012
Description:

From the Red Hat advisory:

A shell command injection flaw was found in the way logrotate handled the shred directive. A specially-crafted log file could cause logrotate to execute arbitrary commands with the privileges of the user running logrotate (root, by default). Note: The shred directive is not enabled by default. (CVE-2011-1154)

A race condition flaw was found in the way logrotate applied permissions when creating new log files. In some specific configurations, a local attacker could use this flaw to open new log files before logrotate applies the final permissions, possibly leading to the disclosure of sensitive information. (CVE-2011-1098)

An input sanitization flaw was found in logrotate. A log file with a specially-crafted file name could cause logrotate to abort when attempting to process that file a subsequent time. (CVE-2011-1155)

Alerts:
Gentoo 201206-36 logrotate 2012-06-25
Ubuntu USN-1172-1 logrotate 2011-07-21
Pardus 2011-85 logrotate 2011-06-21
SUSE SUSE-SR:2011:010 postfix, libthunarx-2-0, rdesktop, python, viewvc, kvm, exim, logrotate, dovecot12/dovecot20, pure-ftpd, kdelibs4 2011-05-31
openSUSE openSUSE-SU-2011:0536-1 logrotate 2011-05-25
Fedora FEDORA-2011-3739 logrotate 2011-03-21
Mandriva MDVSA-2011:065 logrotate 2011-04-05
Red Hat RHSA-2011:0407-01 logrotate 2011-03-31

Comments (none posted)

otrs: arbitrary command execution

Package(s): otrs   CVE #(s): CVE-2011-0456
Created: April 1, 2011   Updated: April 6, 2011
Description: From the openSUSE advisory:

Insufficient quoting of shell meta characters in otrs' webscript.pl could allow remote attackers to execute arbitrary commands.

Alerts:
SUSE SUSE-SR:2011:006 apache2-mod_php5/php5, cobbler, evince, gdm, kdelibs4, otrs, quagga 2011-04-05
openSUSE openSUSE-SU-2011:0278-1 otrs 2011-04-01

Comments (none posted)

php-doctrine-Doctrine: SQL injection

Package(s): php-doctrine-Doctrine   CVE #(s): CVE-2011-1522
Created: April 4, 2011   Updated: April 21, 2011
Description: From the Doctrine advisory:

The security hole was found and affects the Doctrine\DBAL\Platforms\AbstractPlatform::modifyLimitQuery() function which does not cast input values for limit and offset to integer and allows malicious SQL to be executed if these parameters are passed into Doctrine 2 directly from request variables without previous cast to integer. Functionality building on top using limit queries in the ORM such as Doctrine\ORM\Query::setFirstResult() and Doctrine\ORM\Query::setMaxResults() are also affected by this security hole.

Alerts:
Fedora FEDORA-2011-4098 php-doctrine-Doctrine 2011-03-25
Debian DSA-2223-1 doctrine 2011-04-20

Comments (none posted)

xmlsec1: remote overwrite of arbitrary files

Package(s): xmlsec1   CVE #(s): CVE-2011-1425
Created: April 4, 2011   Updated: May 5, 2011
Description: From the Mandriva advisory:

xslt.c in XML Security Library (aka xmlsec) before 1.2.17, as used in WebKit and other products, when XSLT is enabled, allows remote attackers to create or overwrite arbitrary files via vectors involving the libxslt output extension and a ds:Transform element during signature verification.

Alerts:
Gentoo 201412-09 racer-bin, fmod, PEAR-Mail, lvm2, gnucash, xine-lib, lastfmplayer, webkit-gtk, shadow, PEAR-PEAR, unixODBC, resource-agents, mrouted, rsync, xmlsec, xrdb, vino, oprofile, syslog-ng, sflowtool, gdm, libsoup, ca-certificates, gitolite, qt-creator 2014-12-11
Debian DSA-2219-1 xmlsec1 2011-04-18
Mandriva MDVSA-2011:063 xmlsec1 2011-04-04
CentOS CESA-2011:0486 xmlsec1 2011-05-05
CentOS CESA-2011:0486 xmlsec1 2011-05-05
Red Hat RHSA-2011:0486-01 xmlsec1 2011-05-04
Pardus 2011-73 xmlsec 2011-05-03

Comments (none posted)

xorg-x11: arbitrary command execution as root

Package(s): xorg-x11   CVE #(s): CVE-2011-0465
Created: April 6, 2011   Updated: June 13, 2011
Description: From the X.Org advisory:

By crafting hostnames with shell escape characters, arbitrary commands can be executed in a root environment when a display manager reads in the resource database via xrdb.

These specially crafted hostnames can occur in two environments:

  • Hosts that set their hostname via DHCP
  • Hosts that allow remote logins via xdmcp
Alerts:
Gentoo 201412-09 racer-bin, fmod, PEAR-Mail, lvm2, gnucash, xine-lib, lastfmplayer, webkit-gtk, shadow, PEAR-PEAR, unixODBC, resource-agents, mrouted, rsync, xmlsec, xrdb, vino, oprofile, syslog-ng, sflowtool, gdm, libsoup, ca-certificates, gitolite, qt-creator 2014-12-11
Fedora FEDORA-2011-4879 xorg-x11-server-utils 2011-04-06
CentOS CESA-2011:0432 xorg-x11 2011-04-19
Fedora FEDORA-2011-4871 xorg-x11-server-utils 2011-04-06
CentOS CESA-2011:0433 xorg-x11-server-utils 2011-04-14
SUSE SUSE-SA:2011:016 xorg-x11 2011-04-13
Slackware SSA:2011-096-01 xrdb 2011-04-12
Red Hat RHSA-2011:0433-01 xorg-x11-server-utils 2011-04-11
Red Hat RHSA-2011:0432-01 xorg-x11 2011-04-11
Debian DSA-2213-1 x11-xserver-utils 2011-04-08
Ubuntu USN-1107-1 x11-xserver-utils 2011-04-06
openSUSE openSUSE-SU-2011:0298-1 xorg-x11 2011-04-06
Mandriva MDVSA-2011:076 xrdb 2011-04-21

Comments (none posted)

Page editor: Jake Edge

Kernel development

Brief items

Kernel release status

The current development kernel is 2.6.39-rc2, released on April 5. "It's been an uncommonly calm -rc2, which should make me really happy, but quite honestly just makes me really suspicious. You guys are up to something, aren't you?" See the long-format changelog for all the details.

Stable updates: the 2.6.35.12 update was released, with a long list of fixes, on March 31.

Comments (none posted)

Quotes of the week

And that concept can be brought to its logical conclusion: i think it's only a matter of time until someone takes the Linux kernel, integrates klibc and a toolchain into it with some good initial userspace and goes wild with that concept, as a single, sane, 100% self-hosting and self-sufficient OSS project, tracking the release schedule of the Linux kernel.
-- Ingo Molnar

The minimal patchset is too minimal for Oren's use and the maximal patchset seems to have run aground on general kernel sentiment. So I guess you either take the minimal patchset and make it less minimal or take the maximal patchset and make it less maximal, ending up with the same thing. How's that for hand-waving useless obviousnesses.
-- Andrew Morton

I've told people this before, and I'll tell it again: when I flame submaintainers, they should try to push the pain down. I'm not really asking those submaintainers to clean up all the stuff they are getting: I'm basically asking people to say "no", or at least push back a lot, and argue with the people who send you code. Tell them what you don't like about the code, and tell them that you can't take it any more.
-- Linus Torvalds

The mobile space is about proprietary drivers.
-- Mark Charlebois, Qualcomm Innovation Center (on stage at the Linux Foundation Collaboration Summit)

Comments (none posted)

Quotes of the week II: in memoriam

As I have seen this tangentially mentioned already a few times publicly, I figured it warranted it's own announcement now.

Linux has lost a great developer with the passing of David Brownell recently and he will be greatly missed.

-- Greg Kroah-Hartman

David made contributions to a large number of areas in the Linux kernel. Even a quick look through MAINTAINERS will show that he worked on USB controllers (OHCI, EHCI, OMAP and others), USB gadgets, USB networking, and SPI. He was influential in the core USB design (the HCD "glue" layer and the scatter-gather library) and the development of Power Management (system sleep and the USB PM implementation). His designs were elegant and his code was always a pleasure to read.

He also was a big help to me personally, assisting in my initial entry to USB core development. And he was the first person I met at the first Linux conference I attended. I too will miss him.

-- Alan Stern

I guess many of us have similar experience with Dave. He also helped me a lot when I first started doing Linux development. I learned a lot from him and will miss him a lot. His teachings, I will always carry with me.
-- Felipe Balbi

Comments (none posted)

Kernel development news

ARM wrestling

By Jonathan Corbet
April 6, 2011
The Linux kernel supports a wide variety of architectures, some of which are more prominent than others. The ARM architecture does not usually draw a lot of attention, but, over the years, it has become one of the most important architectures for Linux. There's now a vast array of embedded devices which run Linux because the kernel runs well on ARM. So when the mailing lists see extended and heated discussions about the quality of the ARM architecture code, it's worth paying attention.

It all started early in the 2.6.39 merge window when Linus objected to one of many pull requests for an ARM subarchitecture. He complained about excessive churn in the architecture, duplicated code, board-specific data encoded in source files, and conflicts between different merge requests. Much of that board-specific data, he says, should be pulled out of the kernel and into the boot loader; others have suggested that device trees could solve much of that problem. Meanwhile, it is impossible to build a kernel which runs on a wide variety of ARM systems, and that, he says, is a problem for the platform as a whole:

Why? Think of the Ubuntu's etc of the world. If you can't make half-way generic install images, you can't have a reasonably generic distribution. And if you don't have that, then what happens to your developer situation? Most sane people won't touch it with a ten-foot pole, because the bother is simply not worth their time.

There actually seems to be a bit of a consensus on what the sources of the problems with the ARM architecture are. The hardware itself varies widely from one chip to the next; each vendor's system-on-chip offerings are inconsistent with each other, and even more so with other vendors' products. According to Nicolas Pitre, the openness of Linux has helped to make ARM successful, but is also part of the problem:

On ARM you have no prepackaged "real" Windows. That let hardware people try things. So they do change the hardware platform all the time to gain some edge. And this is no problem for them because most of the time they have access to the OS source code and they modify it themselves directly. No wonder why Linux is so popular on ARM. I'm sure hardware designers really enjoy this freedom.

So the ARM architecture is a massive collection of "subplatforms." Each of those subplatforms is managed independently, often by different developers, and few of those developers have the time for, or interest in, doing cross-platform architecture work. The result is a lot of code flux, many duplicated drivers, and lots of hacks.

Complicating the situation is the simple fact that the kernel is a victim of its own success. For years developers have been beating on the embedded industry to work upstream and to get its code into the kernel. Now the industry is doing exactly that; the result is a lot of code, not all of which is as nice as we would like. The fact that a lot of embedded vendors seem to have little long-term vision or interest in solving anything but immediate problems makes things worse. The result is code that "works for now," but which is heading toward a long-term maintenance disaster.

How is this problem to be solved? It seems clear that the ARM architecture needs more maintainers who are concerned with cross-platform issues and improving the state of ARM support as a whole. There would appear to be a consensus that ARM maintainer Russell King is doing a good job with the core code, and there are a few people (Nicolas Pitre, Catalin Marinas, Tony Lindgren, etc.) who are trying to bring some order to the subplatform mess, but they seem to be unable to contain the problem. As Nicolas put it:

So we need help! If core kernel people could get off their X86 stool and get down in the ARM mud to help sort out this mess that would be really nice (thanks tglx). Until then all that the few of us can do is to contain the flood and hope for the best, and so far things being as they are have still worked surprisingly well in practice for users....

And we can't count on vendor people doing this work. They are all busy porting the kernel to their next SOC version so they can win the next big Android hardware design, and doing so with our kernel quality standards is already quite a struggle for them.

There are some developers who are willing to provide at least some of that help. The Linaro project could also conceivably take on a role here. But that still leaves open the question of just how the code can be cleaned up. Arnd Bergmann has suggested the radical step of creating a new ARM architecture tree with a new, clean design, then moving support over to it. Eventually the older code would either fade away, or it would only be used to support older hardware. Creating a new architecture tree seems like a big step, but it has been done before - more than once. The x86-64 architecture was essentially a clean start from x86; the two platforms were then eventually merged back together into a much cleaner tree. PowerPC support went through a similar process.

Whether that will happen with ARM remains to be seen; there are other developers who would rather perform incremental cleanups on the existing ARM tree. Either way, the first step will have to be finding developers who are willing to do the work. There is no shortage of developers who are interested in ARM, but fewer of them are willing and able to do high-level architectural work - and to deal with the inevitable resistance to change. As Thomas Gleixner said:

The only problem is to find a person, who is willing to do that, has enough experience, broad shoulders and a strong accepted voice. Not to talk about finding someone who is willing to pay a large enough compensation for pain and suffering.

So there are some challenges to overcome. But there is also a great deal of economic value to the ARM platform, a lot of people working in that area, and a reasonably common understanding of where the problems are. So chances are good that some sort of solution will be found.

Comments (6 posted)

Linux Filesystem, Storage, and Memory Management Summit, Day 1

By Jonathan Corbet
April 5, 2011
It has been a mere eight months since the 2010 Linux Filesystem, Storage, and Memory Management Summit was held in Boston, but that does not mean that there is not much to talk about - or that there has not been time to get a lot done. A number of items discussed at the 2010 event, including writeback improvements, better error handling, transparent huge pages, the I/O bandwidth controller, and the block-layer barrier rework, have been completed or substantially advanced in those eight months. Some other tasks remain undone, but there is hope: Trond Myklebust, in the introductory session of the 2011 summit, said that this might well be the last time that it will be necessary to discuss pNFS - a prospect which caused very little dismay.

The following is a report from the first day of the 2011 meeting, held on April 4. This coverage is necessarily incomplete; when the group split into multiple tracks, your editor followed the memory management discussions.

Writeback

The Summit started with a plenary session to review the writeback problem. Writeback is the process of writing dirty pages back to persistent storage; it has been identified as one of the most significant performance issues with recent kernels. This session, led by Jan Kara, Jens Axboe, and Johannes Weiner, made it clear that a lot of thought is going into the issue, but that a full understanding of the problem has not yet been reached.

One aspect of the problem that is well understood is that there are too many places in the kernel which are submitting writeback I/O. As a result, the different I/O streams conflict with each other and cause suboptimal I/O patterns, even if the individual streams are well organized - which is not always the case. So it would be useful to reduce the number of submission points, preferably to just one.

Eliminating "direct reclaim," where processes which are allocating pages take responsibility for flushing other pages out to disk, is at the top of most lists. Direct reclaim cannot easily create large, well-ordered I/O, is computationally expensive, and leads to excessive lock contention. Some patches have been written in an attempt to improve direct reclaim by, for example, performing "write-around" of targeted pages to create larger contiguous operations, but none have passed muster for inclusion into the mainline.

[Writeback discussion]

There was some talk of improving the I/O scheduler so that it could better merge and organize the I/O stream created by direct reclaim. One problem with that idea is that the request queue length is limited to 128 requests or so, which is not enough to perform reordering when there are multiple streams of heavy I/O. There were suggestions that the I/O scheduler might be considered broken and in need of fixing, but that view did not go very far. The problem with increasing the request queue length is that there would be a corresponding increase in I/O latencies, which is not a desirable result. Christoph Hellwig summed things up by saying that it was a mistake to generate bad I/O patterns in the higher levels of the system and expect the lower layers to fix them up, especially when it's relatively easy to create better I/O patterns in the first place.

Many filesystems already try to improve things by generating larger writes than the kernel asks them to. Each filesystem has its own, specific hacks, though, and there is no generic solution to the problem. One thing that some filesystems could apparently do better has to do with their response to writeback on files where delayed allocation is being done. The kernel will often request writeback on a portion of the delayed allocation range; if the filesystem only completes allocation for the requested range, excessive fragmentation of the file may result. So, in response to writeback requests where the destination blocks have not yet been allocated, the filesystem should always allocate everything it can to create larger contiguous chunks.

There are a couple of patches out there aimed at the goal of eliminating direct reclaim; both are based on the technique of blocking tasks which are dirtying pages until pages can be written back elsewhere in the system. The first of these was written by Jan; he was, he said, aiming at making the code as simple as possible. With this patch, a process which goes over its dirty page limit will be put on a wait queue. Occasionally the system will look at the I/O completions on each device and "distribute" those completions among the processes which are waiting on that device. Once a process has accumulated enough completions, it will be allowed to continue executing. Processes are also resumed if they go below their dirty limit for some other reason.
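
As a rough illustration of that idea, consider the toy model below. It is not Jan's actual patch; the names, the threshold, and the per-period completion count are all invented. It simply shows the shape of the mechanism: waiters on a device are credited with a share of that device's completions, and a waiter is released once it has accumulated enough of them.

    #include <stdbool.h>
    #include <stdio.h>

    #define COMPLETIONS_TO_RESUME 32    /* hypothetical per-process threshold */

    struct waiter {
        int credited;       /* completions credited to this process so far */
        bool resumed;
    };

    /* Runs periodically (every 100ms in the patch as described) for one device. */
    static void distribute_completions(struct waiter *w, int nwaiters,
                                       int completions)
    {
        int active = 0, share, i;

        for (i = 0; i < nwaiters; i++)
            if (!w[i].resumed)
                active++;
        if (!active)
            return;
        share = completions / active;

        for (i = 0; i < nwaiters; i++) {
            if (w[i].resumed)
                continue;
            w[i].credited += share;
            if (w[i].credited >= COMPLETIONS_TO_RESUME)
                w[i].resumed = true;    /* the real code would wake the task here */
        }
    }

    int main(void)
    {
        struct waiter waiters[3] = {{0}};
        int period;

        /* Pretend the device completes 48 writebacks per period. */
        for (period = 1; period <= 3; period++) {
            distribute_completions(waiters, 3, 48);
            printf("after period %d: waiter 0 credited %d, resumed %d\n",
                   period, waiters[0].credited, waiters[0].resumed);
        }
        return 0;
    }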

The task of distributing completions runs every 100ms currently, leading to concerns that this patch could cause 100ms latencies for running processes. That could happen, but writeback problems can cause worse latencies now. Jan was also asked if control groups should be used for this purpose; his response was that he had considered the idea but put it aside because it added too much complexity at this time. There were also worries about properly distributing completions to processes; the code is inexact, but, as Chris Mason put it, getting more consistent results than current kernels is not a particularly hard target to hit.

Evidently this patch set does not yet play entirely well with network filesystems; completions are harder to track and the result ends up being bursty.

The alternative patch comes from Wu Fengguang. The core idea is the same, but it works by attempting to limit the dirtying of pages based on the amount of I/O bandwidth which is available for writeback. The bandwidth calculations are said to be complex to the point that mere memory management hackers have a hard time figuring out how it all works. When all else fails, and the system goes beyond the global dirty limit, processes will simply be put to sleep for 200ms at a time to allow the I/O subsystem to catch up.
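
The shape of that calculation is roughly as follows; this is a hedged sketch only, with fixed example numbers, and the real bandwidth estimation is far more involved than anything shown here.

    #include <stdio.h>

    #define PAGE_SIZE     4096UL
    #define MAX_PAUSE_MS  200       /* the hard 200ms sleep mentioned above */

    static unsigned long pause_ms(unsigned long pages_dirtied,
                                  unsigned long writeback_bytes_per_sec,
                                  int over_global_dirty_limit)
    {
        unsigned long ms;

        if (over_global_dirty_limit)
            return MAX_PAUSE_MS;    /* system-wide limit exceeded: just sleep */

        /* Pause long enough that the dirtied pages could have been written
           out at the estimated writeback bandwidth. */
        ms = pages_dirtied * PAGE_SIZE * 1000 / writeback_bytes_per_sec;
        return ms > MAX_PAUSE_MS ? MAX_PAUSE_MS : ms;
    }

    int main(void)
    {
        /* 64 pages dirtied against a device estimated at 20MB/s of writeback. */
        printf("pause for %lu ms\n", pause_ms(64, 20 * 1024 * 1024, 0));
        printf("pause for %lu ms\n", pause_ms(64, 20 * 1024 * 1024, 1));
        return 0;
    }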

Mike Rubin complained that this patch would lead to unpredictable 200ms latencies any time that 20% of memory (the default global dirty limit) is dirtied. It was agreed that this is unfortunate, but it's no worse than what happens now. There was some talk of making the limit higher, but in the end that would change little; if pages are being dirtied faster than they can be written out, any limit will be hit eventually. Putting the limit too high, though, can lead to livelocks if the system becomes completely starved of memory.

Another problem with this approach is that it's all based on the bandwidth of the underlying block device. If the I/O pattern changes - a process switches from sequential to random I/O, for example - the bandwidth estimates will be wrong and the results will not be optimal. The code also has no way of distinguishing between processes with different I/O patterns, with the result that those doing sequential I/O will be penalized in favor of other processes with worse I/O patterns.

Given that, James Bottomley asked, should we try to limit I/O operations instead of bandwidth? The objection to that idea is that it requires better tracking of the ownership of pages; the control group mechanism can do that tracking, but it brings a level of complexity and overhead that is not pleasing to everybody. It was asserted that control groups are becoming more mandatory all the time, but that view has not yet won over the entire community.

A rough comparison of the two approaches leads to the conclusion that Jan's patch can cause bursts of latency and occasional pauses. Fengguang's patch, instead, has smoother behavior but does not respond well to changing workloads; it also suffers from the complexity of the bandwidth estimation code. Beyond that, from the (limited) measurements which were presented, the two seem to have similar performance characteristics.

What's the next step? Christoph suggested rebasing Fengguang's more complex patch on top of Jan's simpler version, merging the simple patch, and comparing from there. Before that comparison can be done, though, there need to be more benchmarks run on more types of storage devices. Ted Ts'o expressed concerns that the patches are insufficiently tested and might cause regressions on some devices. The community as a whole still lacks a solid idea of what benchmarks best demonstrate the writeback problems, so it's hard to say if the proposed solutions are really doing the job. That said, Ted liked the simpler patch for a reason that had not yet been mentioned: by doing away with direct reclaim, it at least gets rid of the stack space exhaustion problem. Even in the absence of performance benefits, that would be a good reason to merge the code.

But it would be good to measure the patch's performance effects, so better benchmarks are needed. Getting those benchmarks is easier said than done, though; at least some of them need to run on expensive, leading-edge hardware which must be updated every year. There are also many workloads to test, many of which are not easily available to the public. Mike Rubin did make a promise that Google would post at least three different benchmarks in the near future.

There was some sympathy for the idea of merging Jan's patch, but Jan would like to see more testing done first. Some workloads will certainly regress, possibly as a result of performance hacks in specific filesystems. Andrew Morton said that there needs to be a plan for integration with the memory control group subsystem; he would also like a better analysis of what writeback problems are solved by the patch. In the near future, the code will be posted more widely for review and testing.

VFS summary

James Bottomley started off the session on the virtual filesystem layer with a comment that, despite the fact that there were a number of "process issues" surrounding the merging of Nick Piggin's virtual filesystem scalability patches, this session was meant to be about technical issues only. Nick was the leader of the session, since the planned leader, Al Viro, was unable to attend due to health problems. As it turned out, Nick wanted to talk about process issues, so that is where much of the time was spent.

The merging of the scalability work, he said, was not ideal. The patches had been around for a year, but there had not been enough review of them and he knew it at the time. Linus knew it too, but decided to merge that work anyway as a way of forcing others to look at it. This move worked, but at the cost of creating some pain for other VFS developers.

Andrew Morton commented that the merging of the patches wasn't a huge problem; sometimes the only way forward is to "smash something in and fix it up afterward." The real problem, he said, was that Nick disappeared after the patches went in; he wasn't there to support the work. Developers really have to be available after merging that kind of change. If necessary, Andrew said, he is available to lean on a developer's manager if that's what it takes to make the time available.

In any case, the code is in and mostly works. The autofs4 automounter is said to be "still sick," but that may not be entirely a result of the VFS work - a lot of automounter changes went in at the same time.

An open issue is that lockless dentry lookup cannot happen on a directory which has extended attributes or when there is any sort of security module active. Nick acknowledged the problem, but had no immediate solution to offer; it is a matter of future work. Evidently supporting lockless lookup in such situations will require changes within the filesystems as well as at the VFS layer.

A project that Nick seemed more keen to work on was adding per-node lists for data structures like dentries and inodes, enabling them to be reclaimed on a per-node basis. Currently, a memory shortage on one NUMA node can force the eviction of inodes and dentries on all nodes, even though memory may not be in short supply elsewhere in the system. Christoph Hellwig was unimpressed by the idea, suggesting that Nick should try it and he would see how badly it would work. Part of the problem, it seems, is that, for many filesystems, an inode cannot be reclaimed until all associated pages have been flushed to disk. Nick suggested that this was a problem in need of fixing, since it makes memory management harder.

A related problem, according to Christoph, is that there is no coordination when it comes to the reclaim of various data structures. Inodes and dentries are closely tied, for example, but they are not reclaimed together, leading to less-than-optimal results. There were also suggestions that more of the reclaim logic for these data structures should be moved to the slab layer, which already has a fair amount of node-specific knowledge.

Transcendent memory

Dan Magenheimer led a session on his transcendent memory work. The core idea behind transcendent memory - a type of memory which is only addressable on a page basis and which is not directly visible to the kernel - remains the same, but the uses have expanded. Transcendent memory has been tied to virtualization (and to Xen in particular) but, Dan says, he has been pushing it toward more general uses and has not written a line of Xen code in six months.

So where has this work gone? It has resulted in "zcache," a mechanism for in-RAM memory compression which was merged into the staging tree for 2.6.39. There is increasing support for devices meant to extend the amount of available RAM - solid-state storage and phase-change memory devices, for example. The "ramster" module is a sort of peer-to-peer memory mechanism allowing pages to be moved around a cluster; systems which have free memory can host pages for systems which are under memory stress. And yes, transcendent memory can still be used to move RAM into and out of virtual machines.

All of the above can be supported behind the cleancache and frontswap patches, which Linus didn't get around to merging for 2.6.39-rc1. He has not yet said "no," but the chances of a 2.6.39 merge seem to be declining quickly.

Hugh Dickins voiced a concern that frontswap is likely to be filled relatively early in a system's lifetime, and what will end up there is application initialization code which is unlikely to ever be needed again. What is being done to avoid filling the frontswap area with useless stuff? Dan acknowledged that it could be a problem; one solution is to have a daemon which would fetch pages back out of frontswap when memory pressure is light. Hugh worried that, in the long term, the kernel was going to need new data structures to track pages as they are put into hierarchical swap systems and that frontswap, which lacks that tracking, may be a step in the wrong direction.

Andrea Arcangeli said that a feature like zcache could well be useful for guests running under KVM as well. Dan agreed that it would be nice, but that he (an Oracle employee) could not actually do the implementation.

How memory control groups are used

This session was a forum in which representatives of various companies could talk about how they are making use of the memory control group functionality. It turns out that this feature is of interest to a wide range of users.

Ying Han gave the surprising news that Google has a lot of machines to manage, but also a lot of jobs to run. So the company is always trying to improve the utilization of those machines, and memory utilization in particular. Lots of jobs tend to be packed into each machine, but that leads to interference; there simply is not enough isolation between tasks. Traditionally, Google has used a "fake NUMA" system to partition memory between groups of tasks, but that has some problems. Fake NUMA suffers from internal fragmentation, wasting a significant amount of memory. And Google is forced to carry a long list of fake NUMA patches which are not ever likely to make it upstream.

So Google would rather make more use of memory control groups, which are upstream and which, thus, will not be something the company has to maintain on its own indefinitely. Much work has gone upstream, but there are still unmet needs. At the top of the list is better accounting of kernel allocations; the memory controller currently only tracks memory allocations made from user space. There is also a need for "soft limits" which would be more accommodating of bursty application behavior; this topic was discussed in more detail later on.

Hiroyuki Kamezawa of Fujitsu said that his employer deals with two classes of customers in particular: those working in the high-performance computing area, and government. The memory controller is useful in both situations, but it has one big problem: performance is not what they really need it to be. Things get especially bad when control groups begin to push up against the limits. So his work is mostly focused on improving the performance of memory control groups.

Larry Woodman of Red Hat, instead, deals mostly with customers in the financial industry. These customers are running trading systems with tight deadlines, but they also need to run backups at regular intervals during the day. The memory controller allows these backups to be run in a small, constrained group which enables them to make forward progress without impeding the trading traffic.

Michal Hocko of SUSE deals with a different set of customers, working in an area which was not well specified. These customers have large datasets which take some time to compute, and which really need to be kept in RAM if at all possible. Memory control groups allow those customers to protect that memory from reclaim. They work like a sort of "clever mlock()" which keeps important memory around most of the time, but which does not get in the way of memory overcommit.

Pavel Emelyanov of Parallels works with customers operating as virtual hosting Internet service providers. They want to be able to create containers with a bounded amount of RAM to sell to customers; memory control groups enable that bounding. They also protect the system against memory exhaustion denial-of-service attacks, which, it seems, are a real problem in that sector. He would like to see more graceful behavior when memory runs out in a specific control group; it should be signaled as a failure in a fork() or malloc() call rather than a segmentation fault signal at page fault time. He would also like to see better accounting of memory use to help with the provisioning of containers.

David Hansen of IBM talked about customers who are using memory control groups with KVM to keep guests from overrunning their memory allocations. Control groups are nice because they apply limits while still allowing the guests to use their memory as they see fit. One interesting application is in cash registers; these devices, it seems, run Linux with an emulation layer that can run DOS applications. Memory control groups are useful for constraining the memory use of these applications. Without this control, these applications can grow until the OOM killer comes into play; the OOM killer invariably kills the wrong process (from the customer's point of view), leading to the filing of bug reports. The value of the memory controller is not just that it constrains memory use - it also limits the number of bug reports that he has to deal with.

Coly Li, representing Taobao, talked briefly about that company's use of memory control groups. His main wishlist item was the ability to limit memory use based on the device which is providing backing store.

What's next for the memory controller

The session on future directions for the memory controller featured contributors who were both non-native English speakers and quite soft-spoken, so your editor's notes are, unfortunately, incomplete.

One topic which came up repeatedly was the duplication of the least-recently used (LRU) list. The core VM subsystem maintains LRU lists in an attempt to track which pages have gone unused for the longest time and which, thus, are unlikely to be needed in the near future. The memory controller maintains its own LRU list for each control group, leading to a wasteful duplication of effort. There is a strong desire to fix this problem by getting rid of the global LRU list and performing all memory management with per-control-group lists. This topic was to come back later in the day.

Hiroyuki Kamezawa complained that the memory controller currently tracks the sum of RAM and swap usage. There could be value, he said, in splitting swap usage out of the memory controller and managing it separately.

Management of kernel memory came up again. It was agreed that this was a hard problem, but there are reasons to take it on. The first step, though, should be simple accounting of kernel memory usage; the application of limits can come later. Pavel noted that it will never be possible to track all kernel memory usage, though; some allocations can never be easily tied to specific control groups. Memory allocated in interrupt handlers is one example. It also is often undesirable to fail kernel allocations even when a control group is over its limits; the cost would simply be too high. Perhaps it would be better, he said, to focus on specific problem sources. Page tables, it seems, are a data structure which can soak up a lot of memory and a place where applying limits might make sense.

The way shared pages are accounted for was discussed for a bit. Currently, the first control group to touch a page gets charged for it; all subsequent users get the page for free. So if one control group pages in the entire C library, it will find its subsequent memory use limited while everybody else gets a free ride. In practice, though, this behavior does not seem to be a problem; a control group which is carrying too much shared data will see some of it reclaimed, at which point other users will pick up the cost. Over time, the charging for shared pages is distributed throughout the system, so there does not seem to be a need for a more sophisticated mechanism for accounting for them.

Local filesystems in the cloud

Mike Rubin of Google ran a plenary session on the special demands that cloud computing puts on filesystems. Unfortunately, the notes on this talk are also incomplete due to a schedule misunderstanding.

What cloud users need from a filesystem is predictable performance, the ability to share systems, and visibility into how the filesystem works. Visibility seems to be the biggest problem; it is hard, he said, to figure out why even a single machine is running slowly. Trying to track down problems in an environment consisting of thousands of machines is a huge problem.

Part of that problem is just understanding a filesystem's resource requirements. How much memory does an ext4 filesystem really need? That number turns out to be 2MB of RAM for every terabyte of disk space managed - a number which nobody had been able to provide. Just as important is the metadata overhead - how much of a disk's bandwidth will be consumed by filesystem metadata? In the past, Google has been surprised when adding larger disks to a box has caused the whole system to fall over; understanding the filesystem's resource demands is important to prevent such things from happening in the future.

Tracing, he said, is important - he does not know how people ever lived without it. But there need to be better ways of exporting the information; there is a lack of user-space tools which can integrate the data from a large number of systems. Ted Ts'o added that the "blktrace" tool is fine for a single system where root access is available. In a situation where there are hundreds or thousands of machines, and where developers may not have root access on production systems, blktrace does not do the job. There needs to be a way to get detailed, aggregated tracing information - with file name information - without root access.

Mike said that he is happy that Google's storage group has been upstreaming almost everything they have done. But, he said, the group has a "diskmon" tool which still needs to see the light of day. It can create histograms of activity and latencies at all levels of the I/O stack, showing how long each operation took and how much of that time was consumed by metadata. It is all tied to a web dashboard which can highlight problems down to the ID of the process which is having trouble. This tool is useful, but it is not yet complete. What we really need, he said, is to have that kind of visibility designed into kernel subsystems from the outset.

Mike concluded by saying that, in the beginning, his group was nervous about engaging with the development community. Now, though, they feel that the more they do it, the better it gets.

Memory controller targeted reclaim

Back in the memory management track, Ying Han led a session on improving reclaim within the memory controller. When things get tight, she said, the kernel starts reclaiming from the global LRU list, grabbing whatever pages it finds. It would be far better to reclaim pages specifically from the control groups which are causing the problem, limiting the impact on the rest of the system.

One technique Google uses is soft memory limits in the memory controller. Hard limits place an absolute upper bound on the amount of memory any group can use. Soft limits, instead, can be exceeded, but only as long as the system as a whole is not suffering from memory contention. Once memory gets tight at the global level, the soft limits are enforced; that automatically directs reclaim at the groups which are most likely to be causing the global stress.
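
For readers unfamiliar with the two kinds of limits, the difference is visible directly in the memory controller's control files. The sketch below is illustrative only: the cgroup mount point and group name are assumptions that vary from system to system, and the limit values are arbitrary.

    #include <stdio.h>

    /* Assumed location of the group; adjust for the local cgroup mount. */
    #define GROUP "/sys/fs/cgroup/memory/jobs/batch"

    static int write_limit(const char *file, unsigned long bytes)
    {
        char path[256];
        FILE *f;

        snprintf(path, sizeof(path), "%s/%s", GROUP, file);
        f = fopen(path, "w");
        if (!f) {
            perror(path);
            return -1;
        }
        fprintf(f, "%lu\n", bytes);
        return fclose(f);
    }

    int main(void)
    {
        /* Hard limit: never exceeded; enforced by per-group reclaim. */
        write_limit("memory.limit_in_bytes", 2UL << 30);
        /* Soft limit: may be exceeded until global memory gets tight, at
           which point this group becomes a preferred reclaim target. */
        write_limit("memory.soft_limit_in_bytes", 1UL << 30);
        return 0;
    }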

Adding per-group background reclaim, which would slowly clean up pages in the background, would help the situation, she said. But the biggest problem is the global LRU list. Getting rid of that list would eliminate contention on the per-zone LRU lock, which is a problem, but, more importantly, it would improve isolation between groups. Johannes Weiner worried that eliminating the global LRU list would deprive the kernel of its global view of memory, making zone balancing harder; Rik van Riel responded that we are able to balance pages between zones using per-zone LRU lists now; we should, he said, be able to do the same thing with control groups.

The soft limits feature can help with global balancing. There is a problem, though, in that configuring those limits is not an easy task. The proper limits depend on the total load on the system, which can change over time; getting them right will not be easy.

Andrea Arcangeli made the point that whatever is done with the global LRU list cannot be allowed to hurt performance on systems where control groups are configured out. The logic needs to transparently fall back to something resembling the current implementation. In practice, that fallback is likely to take the form of a "global control group" which contains all processes which are not part of any other group. If control groups are not enabled, the global group would be the only one in existence.

Shrinking struct page_cgroup

The system memory map contains one struct page for every page in the system. That's a lot of structures, so it's not surprising that struct page is, perhaps, the most tightly designed structure in the kernel. Every bit has been placed into service, usually in multiple ways. The memory controller has its own per-page information requirements; rather than growing struct page, the memory controller developers created a separate struct page_cgroup instead. That structure looks like this:

    struct page_cgroup {
	unsigned long flags;		/* per-page state bits used by the controller */
	struct mem_cgroup *mem_cgroup;	/* the group currently charged for the page */
	struct page *page;		/* back-pointer to the page being tracked */
	struct list_head lru;		/* links into the control group's own LRU list */
    };

The existence of one of these structures for every page in the system is part of why enabling the memory controller is expensive. But Johannes Weiner thinks that he can reduce that overhead considerably - perhaps to zero.
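
A back-of-the-envelope calculation shows why. The layout below simply mirrors the structure above with generic pointer types; exact sizes vary with the architecture and kernel version.

    #include <stdio.h>

    struct list_head_sketch { void *next, *prev; };

    struct page_cgroup_sketch {
        unsigned long flags;
        void *mem_cgroup;
        void *page;
        struct list_head_sketch lru;
    };

    int main(void)
    {
        /* On a 64-bit system this is 40 bytes for every 4096-byte page -
           roughly 1% of all memory, or about 160MB on a 16GB machine. */
        printf("%zu bytes per page (%.2f%% of RAM)\n",
               sizeof(struct page_cgroup_sketch),
               100.0 * sizeof(struct page_cgroup_sketch) / 4096);
        return 0;
    }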

Like the others, Johannes would like to get rid of the duplicated LRU lists; that would allow the lru field to be removed from this structure. It should also be possible to remove the struct page backpointer by using a single LRU list as well. The struct mem_cgroup pointer, he thinks, is excessive; there will usually be a bunch of pages from a single file used in any given control group. So what is really needed is a separate structure to map from a control group to the address_space structure representing the backing store for a set of pages. Ideally, he would point to that structure (instead of struct address_space) in struct page, but that would require some filesystem API changes.

The final problem is getting rid of the flags field. Some of the flags used in this structure, Johannes thinks, can simply be eliminated. The rest could be moved into struct page, but there is little room for more flags there on 32-bit systems. How that problem will be resolved is not yet entirely clear. One way or the other, though, it seems that most or all of the memory overhead associated with the memory controller can be eliminated with some careful programming.

Memory compaction

Mel Gorman talked briefly about the current state of the memory compaction code, which is charged with the task of moving pages around to create larger, physically-contiguous ranges of free pages. The rate of change in this code, he said, has slowed "somewhat." Initially, the compaction code was relatively primitive; it only had one user (hugetlbfs) to be concerned about. Since then, the lumpy reclaim code has been mostly pushed out of the kernel, and transparent huge pages have greatly increased the demands on the compaction code.

Most of the problems with compaction have been fixed. The last was one in which interrupts could be disabled for long periods - up to about a half second, a situation which Mel described as "bad." He also noted that it was distressing to see how long it took to find the bug, even with tools like ftrace available. There are more interrupt-disabling problems in the kernel, he said, especially in the graphics drivers.

One remaining problem with compaction is that pages are removed from the LRU list while they are being migrated to their new location; then they are put back at the head of the list. As a result, the kernel forgets what it knew about how recently the page has actually been used; pages which should have been reclaimed can live on as a result of compaction. A potential fix, suggested by Minchan Kim, is to remember which pages were on either side of the moved page in the LRU list; after migration, if those two pages are still together on the LRU, it probably makes sense to reinsert the moved page between them. Mel asked for comments on this approach.

Rik van Riel noted that, when transparent huge pages are used, the chances of physically-contiguous pages appearing next to each other in the LRU list are quite high; splitting a huge page will create a whole set of contiguous pages. In that situation, compaction is likely to migrate several contiguous pages together; that would break Minchan's heuristic. So Mel is going to investigate a different approach: putting the destination page into the LRU in the original page's place while migration is underway. There are some issues that need to be resolved - what happens if the destination page falls off the LRU and is reclaimed during migration, for example - but that approach might be workable.

Mel also talked briefly about some experiments he ran writing large trees to slow USB-mounted filesystems. Things have gotten better in this area, but the sad fact is that generating lots of dirty pages which must be written back to a USB stick can still stall the system for a long time. He was surprised to learn that the type of filesystem used on the device makes a big difference; VFAT is very slow, ext3 is better, and ext4 is better yet. What, he asked, is going on?

There was a fair amount of speculation without a lot of hard conclusions. Part of the problem is probably that the filesystem (ext3, in particular) will end up blocking processes which are waiting on buffers until a big journal commit frees some buffers. That can cause writes to a slow device to stall unrelated processes. It seems that there is more going on, though, and the problem is not yet solved.

Per-CPU variables

Christoph Lameter and Tejun Heo discussed per-CPU data. For the most part, the session was a beginner-level introduction to this feature and its reason for existence; see this article if a refresher is needed. There was some talk about future applications of per-CPU variables; Christoph thinks that there is a lot of potential for improving scalability in the VFS layer in particular. Further in the future, it might make sense to confine certain variables to specific CPUs, which would then essentially function as servers for the rest of the kernel; LRU scanning was one function which could maybe be implemented in this way.
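
For those who skipped the refresher, the module-style sketch below shows what per-CPU data looks like in practice; the variable and function names are invented, but the per-CPU primitives are the standard kernel ones. The point is that each CPU updates only its own copy, avoiding locks and cache-line bouncing, at the cost of having to visit every copy when a global total is needed.

    #include <linux/init.h>
    #include <linux/module.h>
    #include <linux/percpu.h>

    static DEFINE_PER_CPU(unsigned long, my_counter);

    static int __init percpu_demo_init(void)
    {
        unsigned long total = 0;
        int cpu;

        /* Bump the counter on the local CPU without taking any lock. */
        this_cpu_inc(my_counter);

        /* Summing requires walking every CPU's copy. */
        for_each_possible_cpu(cpu)
            total += per_cpu(my_counter, cpu);

        pr_info("per-CPU demo: total = %lu\n", total);
        return 0;
    }

    static void __exit percpu_demo_exit(void) { }

    module_init(percpu_demo_init);
    module_exit(percpu_demo_exit);
    MODULE_LICENSE("GPL");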

There was some side talk about the limitations placed on per-CPU variables on 32-bit systems. Those limits exist, but 32-bit systems also create a number of other, more severe limits. It was agreed that the limit to scalability with 32 bits was somewhere between eight and 32 CPUs.

Lightning talks

The final session of the day was a small set of lightning talks. Your editor will plead "incomplete notes" one last time; perhaps the long day and the prospect of beer caused a bit of inattention.

David Howells talked about creating a common infrastructure for the handling of keys in network filesystems. Currently most of these filesystems handle keys for access control, but they all have their own mechanisms. Centralizing this code could simplify a lot of things. He would also like to create a common layer for the mapping of user IDs while he is at it.

David also talked about a scheme for the attachment of attributes to directories at any level of a network filesystem. These attributes would control behavior like caching policies. There were questions as to why the existing extended attribute mechanism could not be used; it came down to a desire to control policy on the client side when root access to the server might not be available.

Matthew Wilcox introduced the "NVM Express" standard to the group. This standard describes the behavior of solid-state devices connected via PCI-Express. The standard was released on March 1; a Linux driver, he noted with some pride, was shipped on the same day. The Windows driver is said to be due within 6-9 months; actual hardware can be expected within about a year.

The standard seems to be reasonably well thought out; it provides for all of the functionality one might expect on these devices. It allows devices to implement multiple "namespaces" - essentially separate logical units covering parts of the available space. There are bits for describing the expected access patterns, and a "this data is already compressed so don't bother trying to compress it yourself" bit. There is a queued "trim" command which, with luck, won't destroy performance when it is used.

How the actual hardware will behave remains to be seen; Matthew disappointed the audience with his failure to have devices to hand out.

Day 2

See this page for reporting from the second day of the summit.

Comments (12 posted)

Linux Filesystem, Storage, and Memory Management Summit, Day 2

By Jonathan Corbet
April 6, 2011
This article covers the second day of the 2011 Linux Filesystem, Storage, and Memory Management Summit, held on April 5, 2011 in San Francisco, California. Those who have not yet seen the first day coverage may want to have a look before continuing here.

The opening plenary session was led by Michael Cornwall, the global director for technology standards at IDEMA, a standards organization for disk drive manufacturers. His talk, which was discussed in a separate article, covered the changes that are coming in the storage industry and how the Linux community can get involved to make things work better.

I/O resource management

The main theme of the memory management track often appeared to be "control groups"; for one session, though, the entire gathering got to share the control group fun as Vivek Goyal, Fernando Cao, and Chad Talbott led a discussion on I/O bandwidth management. There are two I/O bandwidth controllers in the kernel now: the throttling controller (which can limit control groups to an absolute bandwidth value) and the proportional controller (which divides up the available bandwidth between groups according to an administrator-set policy). Vivek was there to talk about the throttling controller, which is in the kernel and working, but which still has a few open issues.

One of those is that the throttling controller does not play entirely well with journaling filesystems. I/O ordering requirements will not allow the journal to be committed before other operations have made it to disk; if some of those other operations have been throttled by the controller, the journal commit stalls and the whole filesystem slows down. Another is that the controller can only manage synchronous writes; writes which have been buffered through the page cache have lost their association with the originating control group and cannot be charged against that group's quota. There are patches to perform throttling of buffered writes, but that is complicated and intrusive work.

Another problem was pointed out by Ted Ts'o: the throttling controller applies bandwidth limits on a per-device basis. If a btrfs filesystem is in use, there may be multiple devices which make up that filesystem. The administrator would almost certainly want limits to apply to the volume group as a whole, but the controller cannot do that now. A related problem is that some users want to be able to apply global limits - limits on the amount of bandwidth used on all devices put together. The throttling controller also does not work with NFS-mounted filesystems; they have no underlying device at all, so there is no place to put a limit.
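
The per-device nature of those limits is visible in the controller's interface: each rule is keyed by a device's major and minor numbers, which is exactly what a multi-device btrfs volume or an NFS mount does not have. A minimal sketch follows; the cgroup mount point, group name, and device numbers are all assumptions.

    #include <stdio.h>

    int main(void)
    {
        /* Assumed mount point and group; adjust for the local setup. */
        const char *path = "/sys/fs/cgroup/blkio/guests/vm1/"
                           "blkio.throttle.read_bps_device";
        FILE *f = fopen(path, "w");

        if (!f) {
            perror(path);
            return 1;
        }
        /* Format is "major:minor bytes-per-second": cap reads from
           device 8:0 at 1MB/s for this group. */
        fprintf(f, "8:0 %d\n", 1024 * 1024);
        return fclose(f);
    }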

Chad Talbott talked about the proportional bandwidth controller; it works well with readers and synchronous writers, but, like the throttling controller, it is unable to deal with asynchronous writes. Fixing that will require putting some control group awareness into the per-block-device flushing threads. The system currently maintains a set of per-device lists containing inodes with dirty pages; those lists need to be further subdivided into per-control-group lists to enable the flusher threads to write out data according to the set policy. This controller also does not yet properly implement hierarchical group scheduling, though there are patches out there to add that functionality.

The following discussion focused mostly on whether the system is accumulating too many control groups. Rather than a lot of per-subsystem controllers, we should really have a cross-subsystem controller mechanism. At this point, though, we have the control groups (and their associated user-space API which cannot be broken) that are in the kernel. So, while some (like James Bottomley) suggested that we should maybe dump the existing control groups in favor of something new which gets it right, that will be a tall order. Beyond that, as Mike Rubin pointed out, we don't really know how control groups should look even now. There has been a lack of "taste and style" people to help design this interface.

Working set estimation

Back in the memory management track, Michel Lespinasse discussed Google's working set estimation code. Google has used this mechanism for some time as a way of optimally placing new jobs in its massive cluster. By getting a good idea of how much memory each job is really using, they can find the machines with the most idle pages and send new work in that direction. Working set estimation, in other words, helps Google to make better decisions on how to overcommit its systems.

The implementation is a simple kernel thread which scans through the physical pages on the system, every two minutes by default. It looks at each page to determine whether it has been touched by user space or not and remembers that state. The whole idea is to try to figure out how many pages could be taken away from the system without causing undue memory pressure on the jobs running there.

The kernel thread works by setting a new "idle" flag on each page which looks like it has not been referenced. That bit is cleared whenever an actual reference happens (as determined by looking at whether the VM subsystem has cleared the "young" bit). Pages which are still marked idle on the next scan are deemed to be unused. The estimation code does not take any action to reclaim those pages; it simply exports statistics on how many unused pages there are through a control group file. The numbers are split up into clean, swap-backed dirty, and file-backed dirty pages. It's then up to code in user space to decide what to do with that information.
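
In outline, the tracking looks like the user-space cartoon below. This is not Google's actual patch: the real code derives references from the page's "young" bit rather than an explicit call, and the flags live in struct page, but the two-pass idle logic is the same.

    #include <stdbool.h>
    #include <stdio.h>

    #define NPAGES 8

    struct fake_page {
        bool young;   /* set when the page is referenced */
        bool idle;    /* set by the scanner, cleared on reference */
    };

    static void reference(struct fake_page *p)
    {
        p->young = true;
        p->idle = false;
    }

    /* One scanner pass; returns how many pages were idle for the whole period. */
    static int scan(struct fake_page *pages, int n)
    {
        int unused = 0;

        for (int i = 0; i < n; i++) {
            if (pages[i].idle)
                unused++;           /* untouched since the previous scan */
            pages[i].young = false;
            pages[i].idle = true;   /* will be cleared by the next reference */
        }
        return unused;
    }

    int main(void)
    {
        struct fake_page pages[NPAGES] = {{0}};

        scan(pages, NPAGES);        /* first pass marks everything idle */
        reference(&pages[0]);       /* only page 0 is touched afterward */
        printf("unused pages: %d\n", scan(pages, NPAGES));   /* prints 7 */
        return 0;
    }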

There were questions about the overhead of the page scanning; Michel said that scanning every two minutes required about 1% of the available CPU time. There were also questions about the daemon's use of two additional page flags; those flags are a limited resource on 32-bit systems. It was suggested that a separate bitmap outside of the page structure could be used. Google runs everything in 64-bit mode, though, so there has been little reason to care about page flag exhaustion so far. Rik van Riel suggested that the feature could simply not be supported on 32-bit systems. He also suggested that the feature might be useful in other contexts; systems running KVM-virtualized guests could use it to control the allocation of memory with the balloon driver, for example.

Virtual machine sizing

Rik then led a discussion on a related topic: allocating the right amount of memory to virtual machines. As with many problems, there are two distinct aspects: policy (figuring out what the right size is for any given virtual machine) and mechanism (actually implementing the policy decisions). There are challenges on both sides.

There are a number of mechanisms available for controlling the memory available to a virtual machine. "Balloon drivers" can be used to allocate memory in guests and make it available to the host; when a guest needs to get smaller, the balloon "inflates," forcing the guest to give up some pages. Page hinting is a mechanism by which the guest can inform the host that certain pages do not contain useful data (for example, they are on the guest's free list). The host can then reclaim memory used for so-hinted pages without the need to write them out to backing store. The host can also simply swap the guest's pages out without involving the guest operating system at all. The KSM mechanism allows the kernel to recover pages which contain duplicated contents. Compression can be used to cram data into a smaller number of pages. Page contents can also simply be moved around between systems or stashed into some sort of transcendent memory scheme.

There seem to be fewer options on the policy side. The working set estimation patches are certainly one possibility. One can control memory usage simply through decisions on the placement of virtual machines. The transcendent memory mechanism also allows the host to make policy decisions on how to allocate its memory between guests.

One interesting possibility raised by Rik was to make the balloon mechanism better. Current balloon drivers tend to force the release of random pages from the guest; that leads to fragmentation in the host, thwarting attempts to use huge pages. A better approach might be to use page hinting, allowing the guest to communicate to the host which pages are free. The balloon driver could then work by increasing the free memory thresholds instead of grabbing pages itself; that would force the guest to keep more pages free. Even better, memory compaction would come into play, so the guest would be driven to free up contiguous ranges of pages. Since those pages are marked free, the host can grab them (hopefully as huge pages) and use them elsewhere. With this approach, there is no need to pass pages directly to the host; the hinting is sufficient.

There are other reasons to avoid the direct allocation of pages in balloon drivers; as Pavel Emelyanov pointed out, that approach can lead to out-of-memory situations in the guest. Andrea Arcangeli stated that, when balloon drivers are in use, the guest must be configured with enough swap space to avoid that kind of problem; otherwise things will not be stable. The policy implemented by current balloon drivers is also entirely determined by the host system; it's not currently possible to let the guest decide when it needs to grow.

There is also a simple problem of communication; the host has no comprehensive view of the memory needs of its guest systems. Fixing that problem will not be easy; any sort of intrusive monitoring of guest memory usage will fail to scale well. And most monitoring tends to fall down when a guest's memory usage pattern changes - which happens frequently.

Few conclusions resulted from this session. There will be a new set of page hinting patches from Rik in the next few weeks; after that, thought can be put into doing ballooning entirely through hinting without having to call back to the host.

Dirty limits and writeback

The memory management track had been able to talk for nearly a full hour without getting into control groups, but that was never meant to last; Greg Thelen brought the subject back during his session on the management of dirty limits within control groups. He made the claim that keeping track of dirty memory within control groups is relatively easy, but then spent the bulk of his session talking about the subtleties involved in that tracking.

The main problem with dirty page tracking is a more general memory controller issue: the first control group to touch a specific page gets charged for it, even if other groups make use of that page later. Dirty page tracking makes that problem worse; if control group "A" dirties a page which is charged to control group "B", it will be B which is charged with the dirty page as well. This behavior seems inherently unfair; it could also perhaps facilitate denial of service attacks if one control group deliberately dirties pages that are charged to another group.

One possible solution might be to change the ownership of a page when it is dirtied - the control group which is writing to the page would then be charged for it thereafter. The problem with that approach is pages which are repeatedly dirtied by multiple groups; that could lead to the page bouncing back and forth. One could try a "charge on first dirty" approach, but Greg was not sure that it's all worth it. He does not expect that there will be a lot of sharing of writable pages between control groups in the real world.

The bigger problem is what to do about control groups which hit their dirty limits. Presumably they will be put to sleep until their dirty page counts go below the limit, but that will only work well if the writeback code makes a point of writing back pages which are associated with those control groups. Greg had three possible ways of making that happen.

The first of those involved creating a new memcg_mapping structure which would take the place of the address_space structure used to describe a particular mapping. Each control group would have one of these structures for every mapping in which it has pages. The writeout code could then find these mappings to find specific pages which need to be written back to disk. This solution would work, but is arguably more complex than is really needed.

An approach which is "a little dumber" would have the system associating control groups with inodes representing pages which have been dirtied by those control groups. When a control group goes over its limit, the system could just queue writeback on the inodes where that group's dirty pages reside. The problem here is that this scheme does not handle sharing of inodes well; it can't put an inode on more than one group's list. One could come up with a many-to-one mechanism allowing the inode to be associated with multiple control groups, but that code does not exist now.

Finally, the simplest approach is to put a pointer to a memory control group into each inode structure. When the writeback code scans through the list of dirty inodes, it could simply skip those which are not associated with control groups that have exceeded their dirty limit. This approach, too, does not do sharing well; it also suffers from the disadvantage that it causes the inode structure to grow.

Few conclusions were reached in this session; it seems clear that this code will need some work yet.

Kernel memory accounting and soft limits

The kernel's memory control group mechanism is concerned with limiting user-space memory use, but kernel memory can matter too. Pavel Emelyanov talked briefly about why kernel memory is important and how it can be tracked and limited. The "why" is easy; processes can easily use significant amounts of kernel memory. That usage can impact the system in general; it can also be a vector for denial of service attacks. For example, filling the directory entry (dentry) cache is just a matter of writing a loop running "mkdir x; cd x". For as long as that loop runs, the entire chain of dentries representing the path to the bottommost directory will be pinned in the cache; as the chain grows, it will fill the cache and prevent anything else from performing path lookups.
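
Expressed as a (deliberately unfriendly - do not run it anywhere you care about) C program, that loop is all it takes; every iteration adds another level that cannot be reclaimed while the process sits at the bottom of the chain.

    #include <stdio.h>
    #include <sys/stat.h>
    #include <unistd.h>

    int main(void)
    {
        /* Each new level pins the whole chain of dentries above it in the
           cache; nothing in that chain can be reclaimed while this process's
           current directory is the bottommost one. */
        for (;;) {
            if (mkdir("x", 0700) != 0 || chdir("x") != 0) {
                perror("mkdir/chdir");
                return 1;
            }
        }
    }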

Tracking every bit of kernel data used by a control group is a difficult job; it also becomes an example of diminishing returns after a while. Much of the problem can be solved by looking at just a few data structures. Pavel's work has focused on three structures in particular: the dentry cache, networking buffers, and page tables. The dentry cache controller is relatively straightforward; it can either be integrated into the memory controller or made into a separate control group of its own.

Tracking network buffers is harder due to the complexities of the TCP protocol. The networking code already does a fair amount of tracking, though, so the right solution here is to integrate with that code to create a separate controller.

Page tables can occupy large amounts of kernel memory; they present some challenges of their own, especially when a control group hits its limit. There are two ways a process can grow its page tables; one is via system calls like fork() or mmap(). If a limit is hit there, the kernel can simply return ENOMEM and let the process respond as it will. The other way, though, is in the page fault handler; there is no way to return a failure status there. The best the controller can do is to send a segmentation fault signal; that usually just results in the unexpected death of the program which incurred the page fault. The only alternative would be to invoke the out-of-memory killer, but that may not even help: the OOM killer is designed to free user-space memory, not kernel memory.
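
The asymmetry between the two paths is easy to see from user space; the sketch below uses an arbitrary 1GB allocation, and where the failure actually lands obviously depends on how the limits are configured.

    #include <errno.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/mman.h>

    int main(void)
    {
        size_t len = 1UL << 30;     /* arbitrary example size */
        char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

        if (p == MAP_FAILED) {
            /* A limit enforced here comes back as a clean, recoverable error. */
            fprintf(stderr, "mmap: %s\n", strerror(errno));
            return 1;
        }
        /* A limit enforced while these pages (and their page tables) are being
           faulted in has no error path back to the application; the kernel can
           only send SIGSEGV or invoke the OOM killer. */
        memset(p, 0, len);
        munmap(p, len);
        return 0;
    }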

Pavel plans to integrate the page table tracking into the memory controller; patches are forthcoming.

Ying Han got a few minutes to discuss the implementation of soft limits in the memory controller. As had been mentioned on the first day, soft limits differ from the existing (hard) limits in that they can be exceeded if the system is not under global memory pressure. Once memory gets tight, the soft limits will be enforced.

That enforcement is currently suboptimal, though. The code maintains a red-black tree in each zone containing the control groups which are over their soft limits, even though some of those groups may not have significant amounts of memory in that specific zone. So the system needs to be taught to be more aware of allocations in each zone.

The response to memory pressure is also not perfect; the code picks the control group which has exceeded its soft limit by the largest amount and beats on it until it goes below the soft limit entirely. It would probably be better to add some fairness to the algorithm and spread the pain among all of the control groups which have gone over their limits. Some sort of round-robin algorithm which would cycle through those groups would probably be a better way to go.

There was clearly more to discuss on this topic, but time ran out and the discussion had to end.

Transparent huge page improvements

Andrea Arcangeli had presented the transparent huge page (THP) patch set at the 2010 Summit and gotten some valuable feedback in return. By the 2011 event, that code had been merged for the 2.6.38 kernel; it still had a number of glitches, but those have since been fixed up. Since then, THP has gained some improved statistics support under /proc; there is also an out-of-tree patch to add some useful information to /proc/vmstat. Some thought has been put into optimizing libraries and applications for THP, but there is rarely any need to do that; applications can make good use of the feature with no changes at all.
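For the rare application that does want to tune for THP, the interface is a single madvise() call on a suitably aligned region. The sketch below is illustrative only; the fallback MADV_HUGEPAGE definition assumes the C library headers have not yet caught up with 2.6.38:

    /*
     * Illustrative only: hint to the kernel that an aligned, anonymous
     * region is a good transparent huge page candidate.  Most applications
     * need none of this; THP works on ordinary anonymous memory as-is.
     */
    #define _GNU_SOURCE
    #include <stdlib.h>
    #include <sys/mman.h>

    #ifndef MADV_HUGEPAGE
    #define MADV_HUGEPAGE 14    /* value used by 2.6.38 on most architectures */
    #endif

    #define HPAGE_SIZE (2UL * 1024 * 1024)

    void *alloc_thp_hinted(size_t size)
    {
        void *p;

        /* Align the region to the 2MB huge page size... */
        if (posix_memalign(&p, HPAGE_SIZE, size) != 0)
            return NULL;
        /* ...and mark it as a THP candidate; failure here is not fatal. */
        madvise(p, size, MADV_HUGEPAGE);
        return p;
    }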

There are a number of future optimizations on Andrea's list, though he made it clear that he does not plan to implement them all himself. The first item, though - adding THP support to the mremap() system call - has been completed. Beyond that, he would like to see the process of splitting huge pages optimized to remove some unneeded TLB flush operations. The migrate_pages() and move_pages() system calls are not yet THP-aware, so they split up any huge pages they are asked to move. Adding a bit of THP awareness to glibc could improve performance slightly.

The big item on the list is THP support for pages in the page cache; currently only anonymous pages are supported. There would be some big benefits beyond another reduction in TLB pressure; huge pages in the page cache would greatly reduce the number of pages which need to be scanned by the reclaim code. It is, however, a huge job which would require changes in all filesystems. Andrea does not seem to be in a hurry to jump into that task. What might happen first is the addition of huge page support to the tmpfs filesystem; that, at least, would allow huge pages to be used in shared memory applications.

Currently THP only works with one size of huge pages - 2MB in most configurations. What about adding support for 1GB pages as well? That seems unlikely to happen anytime soon. Working with those pages would be expensive - a copy-on-write fault on a 1GB page would take a long time to satisfy. The code changes would not be trivial; the buddy allocator cannot handle 1GB pages, and increasing MAX_ORDER (which determines the largest chunk managed by the buddy allocator) would not be easy to do. And, importantly, the benefits would be small to the point that they would be difficult to measure. 2MB pages are enough to gain almost all of the available performance benefits, so supporting larger page sizes is almost certainly not worth the effort. The only situation in which it might happen is if 2MB pages become the basic page size for the rest of the system.

Might a change in the primary page size happen? Not anytime soon. Andrea actually tried it some years ago and ran into a number of problems. Among other things, a larger page size would change a number of system call interfaces in ways which would break applications. Kernel stacks would become far more expensive; their implementation would probably have to change. A lot of memory would be wasted in internal fragmentation. And a lot of code would have to change. One should not expect a page size change to happen in the foreseeable future.

NUMA migration

Non-uniform memory access systems are characterized by the fact that some memory is more expensive to access than the rest. For any given node in the system, memory which is local to that node will be faster than memory found elsewhere in the system. So there is a real advantage to keeping processes and their memory together. Rik van Riel made the claim that this is often not happening. Long-running processes, in particular, can have their memory distributed across the system; that can result in a 20-30% performance loss. He would like to get that performance back.

His suggestion was to give each process a "home node" where it would run if at all possible. The home node differs from CPU affinity in that the scheduler is not required to observe it; processes can be migrated away from their home node if necessary. But, when the scheduler performs load balancing, it would move processes back to their homes whenever possible. Meanwhile, the process's memory allocations would be performed on the home node regardless of where the process is running at the time. The end result should be processes running with local memory most of the time.
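No such home-node support exists in the scheduler yet, but the memory half of the policy can be approximated by hand today. The sketch below is illustrative only; it assumes node 0 as the "home node" and uses the libnuma wrapper for the set_mempolicy() system call (build with -lnuma):

    /*
     * Illustrative only: prefer allocations from node 0 while leaving the
     * scheduler free to run the process anywhere - roughly what a "home
     * node" policy would arrange automatically.
     */
    #include <numaif.h>
    #include <stdio.h>
    #include <stdlib.h>

    int main(void)
    {
        unsigned long nodemask = 1UL << 0;    /* node 0 is the "home node" */

        /* MPOL_PREFERRED falls back to other nodes if node 0 runs out. */
        if (set_mempolicy(MPOL_PREFERRED, &nodemask,
                          sizeof(nodemask) * 8) != 0) {
            perror("set_mempolicy");
            return 1;
        }

        /* Allocations made from here on land on node 0 whenever possible. */
        char *buf = malloc(64UL << 20);
        if (buf == NULL)
            return 1;
        /* ... run the workload ... */
        free(buf);
        return 0;
    }

The difference between this hand-rolled policy and the proposal, of course, is that the home node would be chosen and enforced by the scheduler itself, with no application changes required.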

There are some practical difficulties with this scheme, of course. The system may end up with a mix of processes which all got assigned to the same home node; there may then be no way to keep them all there. It's not clear what should happen if a process creates more threads than can be comfortably run on the home node. There were also concerns about predictability; the "home node" scheme might create wider variability between identical runs of a program. The consensus, though, was that speed beats predictability and that this idea is worth experimenting with.

Stable pages

What happens if a process (or the kernel) modifies the contents of a page in the time between when that page is queued for writing to persistent storage and when the hardware actually performs the write? Normally, the result would be that the newer data is written, and that is not usually a problem. If, however, something depends on the older contents, the result could be problematic. Examples which have come up include checksums used for integrity checking or pages which have been compressed or encrypted. Changing those pages before the I/O completes could result in an I/O operation failure or corrupted data - neither of which is desirable.

The answer to this problem is "stable pages" - a rule that pages which are in flight cannot be changed. Implementing stable pages is relatively easy (with one exception - see below). Pages which are written to persistent storage are already marked read-only by the kernel; if a process tries to write to the page, the kernel will catch the fault, mark the page (once again) dirty, then allow the write to proceed. To implement stable pages, the kernel need only force that process to block until any outstanding I/O operations have completed.
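The natural place for that blocking is the ->page_mkwrite() callback that filesystems supply for writable mappings. The following is a minimal, hypothetical sketch of the idea (not code from any real filesystem), using the 2.6.38-era callback signature:

    /*
     * Hypothetical sketch only: a page_mkwrite() handler that provides
     * stable pages by making the faulting process wait until any writeback
     * of the page has completed before it can be dirtied again.
     */
    #include <linux/mm.h>
    #include <linux/pagemap.h>

    static int example_page_mkwrite(struct vm_area_struct *vma,
                                    struct vm_fault *vmf)
    {
        struct page *page = vmf->page;

        lock_page(page);
        /* Block here for as long as the page is under I/O. */
        wait_on_page_writeback(page);
        /* ... filesystem-specific work to redirty the page would go here ... */
        return VM_FAULT_LOCKED;    /* hand the page back locked */
    }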

The btrfs filesystem implements stable pages now; it needs them for a number of reasons. Other filesystems do not have stable pages, though; XFS and OCFS2 implement them for metadata only, and the rest have no concept of stable pages at all. There has been some resistance to adding stable pages out of fear that performance could suffer; processes which could previously write immediately to pages under I/O would slow down if forced to wait.

The truth of the matter seems to be that most of the performance worries are overblown; in the absence of a deliberate attempt to show problems, the performance degradation is not measurable. There are a few exceptions; applications using the Berkeley database manager seem to be one example. It was agreed that it would be good to have some better measurements of potential performance issues; a tracepoint may be placed to allow developers to see how often processes are actually blocked waiting for pages under I/O.

It turns out that there is one place where implementing stable pages is difficult. The kernel's get_user_pages() function makes a range of user-space pages accessible to the kernel. If write access is requested, the pages are made writable at the time of the call. Some time may pass, though, before the kernel actually writes to those pages; in the meantime, some of them may be placed under I/O. There is currently no way to catch this particular race; it is, as Nick Piggin put it, a real correctness issue.

There was some talk of alternatives to stable pages. One is to use bounce buffers for I/O - essentially copying the page's contents elsewhere and using the copy for the I/O operation. That would be expensive, though, so the idea was not popular. A related approach would be to use copy-on-write: if a process tries to modify a page which is being written, the page would be copied at that time and the process would operate on the copy. This solution may eventually be implemented, but only after stable pages have been shown to be a real performance problem. Meanwhile, stable pages will likely be added to a few other filesystems, possibly controlled by a mount-time option.

Closing sessions

Toward the end of the day, Qian Cai discussed the problem of sustainable testing. There are a number of ways in which our testing is not as good as it could be. Companies all have their own test suites; they duplicate a lot of effort and tend not to collaborate on developing tests or sharing results. There are some public test suites (such as xfstests and the Linux Test Project), but they don't work together and each has its own approach to things. Some tests need specific hardware which may not be generally available. Other tests must be run manually, reducing the frequency with which they are run.

The subsequent discussion ranged over a number of issues without resulting in any real action items. There was some talk of error injection; that was seen as a useful feature, but a hard thing to implement well. It was said that our correctness tests are in reasonably good shape, but that there are fewer stress tests out there. The xfstests suite does some stress testing, but it runs for a relatively short period of time so it cannot catch memory leaks; xfstests is also not very useful for catching data corruption problems.

The biggest problem, though, is one which has been raised a number of times before: we are not very good at catching performance regressions. Ted Ts'o stated that the "dirty secret" is that kernel developers do not normally stress filesystems very much, so they tend not to notice performance problems.

In the final set of lightning talks, Aneesh Kumar and Venkateswararao Jujjuri talked about work which is being done with the 9p filesystem. Your editor has long wondered why people are working on this filesystem, which originally comes from the Plan9 operating system. The answer was revealed here: 9p makes it possible to export filesystems to virtualized guests in a highly efficient way. Improvements to 9p have been aimed at that use case; it now integrates better with the page cache, uses the virtio framework to communicate with guests, can do zero-copy I/O to guests running under QEMU, and supports access control lists. The code for all this is upstream and will be shipping in some distributions shortly.

Amir Goldstein talked about his snapshot code, which now works with the ext4 filesystem. The presentation consisted mostly of benchmark results, almost all of which showed no significant performance costs associated with the snapshot capability. The one exception appears to be the postmark benchmark, which performs a lot of file deletes.

Mike Snitzer went back to the "advanced format" discussion from the morning's session on future technology. "Advanced format" currently means 4k sectors, but might the sector size grow again in the future? How much pain would it take for Linux to support sector sizes which are larger than the processor's page size? Would the page size have to grow too?

The answer to the latter question seems to be "no"; there is no need or desire to expand the system page size to support larger disk sectors. Instead, it would be necessary to change the mapping between pages in memory and sectors on the disk; in many filesystems, this mapping is still done with the buffer head structure. There are some pitfalls, including proper handling of sparse files and efficient handling of page faults, but that is just a matter of programming. It was agreed that it would be nice to do this programming in the core system instead of having each filesystem solve the problems in its own way.

The summit concluded with an agreement that things had gone well, and that the size of the event (just over 70 people) was just about right. The summit, it was said, should be considered mandatory for all maintainers working in this area. It was also agreed that the memory management developers (who have only been included in the summit for the last couple of meetings) should continue to be invited. That seems inevitable for the next summit; the head of the program committee, it was announced, will be memory management hacker Andrea Arcangeli.

Comments (14 posted)

Patches and updates

Kernel trees

Core kernel code

Development tools

Device drivers

Filesystems and block I/O

Memory management

Marek Szyprowski: Contiguous Memory Allocator
Peter Zijlstra: mm: Preemptibility -v10

Miscellaneous

Pekka Enberg: Native Linux KVM tool
Daniel Poelzleithner: ulatencyd 0.5.0

Page editor: Jonathan Corbet

Distributions

Camp KDE: Using Slackware to investigate KDE 4

By Jake Edge
April 6, 2011

The lack of automatic dependency resolution in Slackware's package management system is seen by some as a fatal problem for the distribution. But Vincent Batts came to Camp KDE to point out some advantages of Slackware's laissez-faire attitude toward dependencies when testing a rapidly changing set of packages under development - like, for instance, the early state of KDE 4. The flexibility to easily build packages from the KDE source code, install them as needed, and back them out as problems arose made it easy for him to investigate the changes that came with KDE 4.

[Vincent Batts]

Batts said that when KDE 4.0 was released, he was firmly happy with 3.5, and not yet willing to switch. He was, however, curious about the new KDE world, and wanted to "see what I might be missing". In trying to check it out, he was regularly breaking his systems, which is a Linux user's right, he said, but that breakage led him to using Slackware to make it easier to manage his KDE 4 investigation.

KDE 4 brought a lot of new dependencies, which he found much easier to manage with Slackware than with a packaging system like RPM that does dependency handling. Slackware has a "simplicity in component handling and upgrades" that is lacking in other systems. You can use RPM with the --force option to install packages without their dependencies, or to override other requirements, but that can lead to various problems that are difficult to recover from.

For Slackware, though, there is a "managed set of packages", with most dependencies already available in the base system, and "everything plays nicely together". That sounds like the situation with most distributions, but Batts pointed out some significant differences, as well.

Slackware packages are not much more than a tarball and a shell script. There are no domain-specific languages and no "automagic" in the package management system. As he was building the KDE 4 code, if he ran into a dependency that he didn't have, he got the source, built it in the canonical way, and easily turned it into a Slackware package using the following commands:

    $ make install DESTDIR=`pwd`/tmp        # stage the install under ./tmp
    $ cd tmp
    $ makepkg -l y ../pkg_name.tgz          # build a Slackware package from the staged tree
    $ sudo upgradepkg --reinstall ../pkg_name.tgz

One can also use explodepkg to unbundle the package, fix any problems in it, and then makepkg/upgradepkg to recreate and reinstall the package. In working with the early KDE 4 packages, Batts got "good at rendering systems useless", but was able to fairly easily return the system to a known state. It is one of the benefits of not having a dependency chain, he said, because you can try things out, then back them out as needed.

To do that, Batts recommended keeping a local mirror or DVD of the Slackware release so that any changes can be rolled back on a running system. From the top of the Slackware tree, a simple:

    # upgradepkg --reinstall */*.t?z
will reinstall all of the packages from the release, overwriting any that appear in the core set and have been changed. A "slackpkg clean-system" will then remove any packages that are not part of the core set, essentially resetting the system back to the "just installed" state.

One of the attendees asked what KDE was like to work with as an upstream, and Batts seemed to be pretty happy with the project. He likes that you can get tarballs of each of the major subparts of KDE without having to get a tarball for each individual program. For example, the KDE education tarball has all of the applications that make up that piece (e.g. KStars, Marble), which can then be subdivided into Slackware packages as they are built if that's desirable.

The dependencies have greatly increased with KDE 4, but "it is still pretty manageable", he said. One of the reasons that GNOME was removed from the Slackware core several releases ago was that the dependencies got out of hand: "Try to build GNOME from source, it will be instructive", he said.

It was interesting to hear one of the oft-heard weaknesses of Slackware turned on its head in Batts's presentation. While dependency tracking and handling can be useful—very useful at times—there are times when it can just get in the way. When choosing a distribution to use, it may make sense to look at Slackware, especially for systems that will be undergoing rapid, pervasive changes.

Comments (6 posted)

Brief items

Distribution quotes of the week

See, one of the things I work towards is igniting more hackers — more people curious about and tinkering with the way things work, the way things get made. And for that, we need more folks saying whoa, how did that happen? If a release named "Beefy Miracle" doesn't get the world to do a double-take and say wait, what? — then... I don't know what will.
-- Mel Chua

In general, to face any forthcoming big change, we need to stick more and more to the well known principles of "rough consensus and working code", rather than to fruitless discussions and inertia as defaults. We need more people who dare to propose changes and who are able to show, with working code, that those changes are viable. Otherwise the whole (geek) world will evolve around us, leaving Debian behind.
-- Stefano 'Zack' Zacchiroli

Comments (none posted)

Ubuntu 11.04 Beta 1 (Natty Narwhal) Released.

Ubuntu has announced the availability of the first beta of 11.04. Those interested in trying it out can either upgrade from 10.10 or download the beta. "Codenamed "Natty Narwhal", 11.04 continues Ubuntu's proud tradition of integrating the latest and greatest open source technologies into a high-quality, easy-to-use Linux distribution. [...] Ubuntu 11.04 now combines Ubuntu Desktop Edition and Ubuntu Netbook Edition. This edition introduces the Unity environment as the default desktop. [...] Ubuntu 11.04 Netbook edition will still be produced for the ARM platform, and the team is proud to introduce a Headless edition with 11.04 for ARM."

Full Story (comments: 19)

MeeGo Tablet Developer Preview

MeeGo has released a developer preview for tablets. "This release provides a touch-optimized user interface for MeeGo tablets, introducing the new panels UI concept and including a suite of built-in applications for Web browsing, personal information management and media consumption. This project is a work-in-progress under active development and considered pre-alpha. We welcome your involvement and contributions."

Comments (1 posted)

Distribution News

Ubuntu family

Announcing Ubuntu App Developer Week

Ubuntu App Developer Week will be held on IRC April 11-15, 2011. "Ubuntu App Developer Week is a week of sessions aimed at enabling and inspiring developers to write applications that scratch their itches. Our goal is to give all attendees a taste of the wide variety of tools on the Ubuntu platform that can be used to create awesome applications, and to showcase some applications that have been created and explain how they were put together."

Full Story (comments: none)

Ubuntu Shipit Discontinued

Canonical has announced the end of the ShipIt program. "Technology moves on and as we look at ways to spread Ubuntu further, a CD distribution programme, especially one of that size and delivered in that way, makes less sense. We have been slowly easing back the programme over the last two years to limit the number of CDs per person and the number of times a person could apply for a CD. But for Ubuntu 11.04 you will no longer be able to go to our website and apply for a free CD." CDs will still be available for Ubuntu Local Communities. The company will also be launching a free online trial for Ubuntu.

Comments (3 posted)

Other distributions

CentOS 5.6 release imminent

Dag Wieërs reports that CentOS 5.6 is on its way to a mirror near you. "Next up is CentOS 6.0, hopefully this one is released before RHEL 6.1, since the RHEL 6.1 Beta is already two weeks out. The fact that CentOS 6.0 is already 145 days behind RHEL 6.0 is something the team will have to think about. Leveraging the community by opening up the QA process is a no-brainer to me."

Update: Karanbir Singh has more information on his blog.

Comments (10 posted)

Newsletters and articles of interest

Distribution newsletters

Comments (none posted)

Puppy Linux update brings improvements (The H)

The H takes a look at Puppy 5.2.5. "Based on Ubuntu 10.04 LTS "Lucid Lynx" binary packages, Lucid Puppy 5.2.5 was built using the Woof build system from FEB 28 (earlier releases were built using Woof from NOV 28) and features version 2.6.33.2 of the Linux kernel. Other changes include upgrades and improvements to the built-in applications, such as Bash 4.1.0, an upgrade from Bash 3, as well as Syslinux 4.03, an upgrade from Syslinux 3, and version 1.41.14 of the e2fsprogs filesystem utilities, the latest from Ubuntu 11.04 Natty Narwhal."

Comments (none posted)

Spotlight on Linux: Supergamer Supreme 2.5 (Linux Journal)

Susan Linton looks at the latest release of Supergamer. "The main advantage is games, games, and even more games. Supergamer ships with lots of games already installed and ready to go. No fighting 3D acceleration drivers, no digging up old howtos to get some games to work, and no visiting numerous websites looking for demos of popular commercial games. In other words, convenience is the key word. One can either install the system or not, which may be an advantage especially with shared, public, or family computers."

Comments (none posted)

Debian 6 Squeeze review (Linux User and Developer)

Koen Vervloesem reviews Debian 6.0. "Another hallmark of Debian is that it really deserves its name of "universal operating system". There are official CD and DVD images for various architectures: amd64, armel, i386, ia64, mips, mipsel, powerpc, sparc, and s390. So if you have an old Mac with a PowerPC processor lying around, you can give it a new life with Debian, or if you want to install a full Linux distro on your NAS with an ARM processor, chances are that Debian supports it. The Squeeze release even has for the first time two non-Linux architectures: kfreebsd-i386 and kfreebsd-amd64, which give you a complete Debian system on top of a FreeBSD 8 kernel, which is nice if you want features like the ZFS file system."

Comments (1 posted)

Bodhi Linux: Interview with Jeff Hoogland (TechRepublic)

Jack Wallen talks with Jeff Hoogland, one of the Bodhi developers. "What language do you primarily work in when working on Bodhi Linux? The primary language all the new Bodhi tools (and Enlightenment itself) are written in is C. We have three people on the team working towards this end. C is our preferred language because it is both fast and what the EFLs (Enlightenment Foundation Libraries) are written in. Because Bodhi is based on Ubuntu we inherit their many python applications (for better and worse). Myself and one of our other developers take care of all the python and bash coding that comes up as we edit existing system components." LWN looked at Bodhi in March.

Comments (none posted)

Page editor: Rebecca Sobol

Development

Camp KDE: Geolocation

By Jake Edge
April 6, 2011

At this year's edition of Camp KDE, John Layt reported on his research into the right course for adding geolocation features to KDE. Currently, there is no common API that applications can use to tag objects with their location or to request geolocation data from the system. There are a number of different approaches that existing applications have taken, but a shared infrastructure that allows geolocation data to be gathered and shared between applications and the desktop environment is clearly desirable.

Layt began by explaining that he had more than a passing interest in geolocation, partly because he was born in New Zealand, has lived in Australia and South Africa, and now lives in the UK. Along the way, he has visited 47 countries. He has been studying archeology and how to use Geographical Information Systems (GIS) for that work, which have a clear connection to geolocation. He is best known in the KDE community for his work on calendars, holidays, and printing ("don't ask" about how he got involved in the latter, he said). In 2002, he started writing a virtual globe application called Kartographer, which is a distant ancestor of today's KDE globe application, Marble.

Geolocation basics

At the base level, geolocation consists of using and recording geographic coordinates that correspond to a particular location: latitude, longitude, and altitude will locate a point in 3D space. But there are additional pieces of information that are generally useful, like heading and velocity for navigation applications, as well as the accuracy of the coordinate information. GPS provides data accurate to within 10m or better, while IP address information provides far less accuracy (typically narrowed down to a city at best). Generally, applications are interested in two levels of accuracy that more or less correspond to those measures, which he termed "fine" and "coarse".

There are additional services that should be provided by a geolocation API, such as the current location of the device, as well as historic position information (i.e. where was the object one minute or one hour ago). Converting things like addresses to geographic coordinates (geocoding), and the reverse, is also required. There are multiple possible providers of location information (GPS, IP address, cellular network location, visible WiFi access points, and so on), and there need to be ways to switch between them. An application should be able to specify that it only needs coarse location information and the geolocation subsystem should use the appropriate provider(s).

Geolocation information is used in mapping and navigation applications, but there is more than just map data involved as there are various points of interest that get incorporated: landmarks, places, businesses, and so forth. Obviously, these kinds of applications are huge on mobile devices, but there is no good reason that desktops should not have access to the same kinds of data. Providing a geolocation framework just for mobile devices is an "arbitrary split", and one that shouldn't exist, Layt said. "Why should mobiles have all the fun?"

Geolocation applications

Geolocation can be added to applications beyond the obvious mapping/navigation programs, and Layt mentioned several, including star map applications like KStars (using the Android Sky Map application as an example), weather plasmoids (noting that his laptop was still showing London weather after he arrived in San Francisco), proximity for social and messaging applications, and mapping locations in address books. He said that there are "many many more" applications that could use geolocation if it were more easily available.

He also noted some specific applications that already use geolocation. digiKam can place photos on maps based on coordinates in the metadata or add those coordinates to photos that lack them. It allows for fairly sophisticated searches ("give me all the photos within 100m of a particular point"), by using its own library (libkmap) which is an abstraction over Google maps or Marble.

Layt has some ideas on other possibilities that geolocation could enable. Changing locations could modify the state of the desktop and system, by asking the user if they want to switch timezones or locale when their location changes. Moving from work to home could change Plasma activities, closing down work applications and documents, and opening up those that make sense for home. Those kinds of decisions will require a way to name locations (e.g. "home") and to also allow for some fuzziness in what defines the locations. Obviously, "home" is not defined by a single set of coordinates, and different locations will have different sizes, so there will need to be ways to define and store that information.

HTML 5 has geolocation support that could optionally be used to send location information to web sites for narrowing searches or discovering information based on one's current location. In addition, the semantic desktop (e.g. Nepomuk) could log location-based information if it were easily obtainable.

The ideal solution would have a single API for all platforms, mobile or desktop. It would be lightweight, so that there would be fewer barriers to including it in those platforms. It also would be available at a low level in the stack, so that it can be used by as much of the rest of the stack as possible. The solution would also use free data providers for map data to avoid the restrictions placed by the proprietary data providers (e.g. Google or Bing maps). A solution along those lines would largely free applications from having to implement their own solutions.

Existing choices

There are a number of current solutions that might be adapted for the needs of a KDE platform solution. One is the Plasma DataEngine for geolocation that has been used by Plasma applets, mostly for weather applications so far. It has backends for gpsd or IP-based geolocation but does not provide support for geocoding.

Marble and libmarble have an extensive feature set for geolocation and mapping using free data providers. The library has few dependencies and can be used in a Qt-only mode, so that there are no KDE dependencies. The library itself is around 840K in size, but the data requirements are, unsurprisingly, much higher, around 10M. The biggest problem with using libmarble is that it does not provide a stable interface, as binary and API compatibility are not guaranteed between releases.

Another option is GeoClue, which is a freedesktop.org project for "geoinformation" that has support for geolocation, but not mapping. It is lightweight at 370K and includes multiple backends, choosing which to use based on the accuracy requirements of the query. One big problem, from the perspective of the KDE project at least, is that GeoClue has a dependency on gconf to provide a single configuration setting. Layt said that the GeoClue developers have been reaching out to KDE to suggest that it use the service, but the gconf dependency makes that impossible. Layt and others have been trying to get that changed for some time, but it hasn't happened yet; the projects need to work together to get that solved, he said.

The QtLocation module, which is part of the QtMobility framework, has support for geolocation, mapping, and landmarks. It only does simple tile maps and its landmarks are stored in non-standard ways, but those can be translated into a usable format. It lives low enough in the stack for use by the rest of the system and is supported and available for all mobile platforms. The desktop version is not maintained, so work would have to be done there, though there is a version for MeeGo that could be adapted for Linux desktop use. The biggest problem with QtLocation is that the available backend is for Ovi Maps, which comes with a huge number of restrictions.

Layt said that the Ovi Maps restrictions were much like those placed on the data from other providers like Google and Bing. The data providers "don't want you to do anything terribly useful with the data". There is a laundry list of things that are forbidden with Ovi Maps, such as reusing the data in a way that competes with the Ovi Maps application or altering the data that gets returned from the server. It requires registration to use, may not be available in some countries, and, rather oddly, requires "good manners" from its licensees.

Not so long ago there were indications that Ovi Maps might be opening up to more uses, but Layt didn't seem very optimistic about that given the recent events at Nokia. Basically, he said, KDE needs to use free data from sources like OpenStreetMap.

In summary, there is no current solution that meets the needs of KDE. For now, mobile applications should probably use QtLocation, desktop applications should use libmarble, and Plasma applets should use the DataEngine. For the future there are several possibilities. Layt would like to use libmarble, but the lack of a stable API/ABI makes that difficult. QtLocation is stable enough but would require backends for free data sources. GeoClue is also attractive if the gconf dependency can be removed. He would like to see if a decision on the right approach could be made at the Platform 11 Sprint that will be held in June.

Camp KDE

[Camp KDE participants]

Camp KDE is the yearly gathering of KDE developers in North America, and is meant to help reconnect developers in that region with their more numerous European counterparts. This is the third Camp KDE and the fourth KDE event held in North America. Around 35 people attended the first day's sessions, with many of those attending Camp KDE for the first time. Celeste Lyn Paul, who helped organize the event, noted that attracting "first timers" is exactly why the event exists: to grow the North American community both in size and in the minds of the KDE communities elsewhere in the world.

Comments (9 posted)

Brief items

Coccinelle 0.2.5 released

Version 0.2.5 of the Coccinelle semantic patch tool has been released. New features include support for iterating a patch over a code base and, primarily, beginning support for the C++ language.

Full Story (comments: none)

GNOME 3.0 released

The GNOME 3.0 release announcement press release has gone out. "Today, the GNOME Desktop project released GNOME 3.0, its most significant redesign of the computer experience in nine years. A revolutionary new user interface and new features for developers make this a historic moment for the free and open source desktop." More information and downloads can be had at gnome3.org.

Update: the real release announcement is now available.

Full Story (comments: 89)

GNOME Journal Issue 23 (GNOME 3 edition) is out

The GNOME 3 release has not been announced as of this writing, but a special edition of the GNOME Journal is available in the meantime. There are articles on the history of this release, GNOME 3 fonts, developer interviews, and more; it's all at gnomejournal.org.

Full Story (comments: none)

(Almost) all Mercurial extensions together in one place

Greg Ward has announced the "All Mercurial Extensions" project, which is: "a meta-repository that gathers together all known Mercurial extensions that are not included with the latest version of Mercurial, as listed in the Mercurial wiki." It's meant to make it easier to find extensions, see how they are affected by core Mercurial changes, and serve as a working example of a repository with subrepos.

Full Story (comments: none)

Newsletters and articles

Development newsletters from the last week

Comments (none posted)

Mozilla kills embedding support for Gecko layout engine (The H)

The H describes Mozilla's plans to remove the ability to embed the Gecko layout engine into other programs. This will affect projects like the Galeon browser and other programs that currently embed Gecko. "In a posting to mozilla.dev.embedding, Embedding Module owner Benjamin Smedberg said that Mozilla had been considering the future for embedding Gecko in other applications. He cites the difficulty involved to date, the expected complexity of moving to a multiple process model and the desire to "strongly prioritize" Firefox as the key product of Mozilla. There is a possibility that embedding support could return in the future after Mozilla has moved Firefox to a multi-process model, but the developers are not going to [prioritize] that as a goal in their design work."

Comments (54 posted)

Writing an Interpreter with PyPy, Part 1

Andrew Brown has posted the first part of a tutorial on how to use PyPy to create an interpreter for a new language. "Wouldn't it be nice if you could write your language in an existing high level language like, for example, Python? That sure would be ideal, you'd get all the advantages of a high level language like automatic memory management and rich data types at your disposal. Oh, but an interpreted language interpreting another language would be slow, right? That's twice as much interpreting going on. As you may have guessed, PyPy solves this problem."

Comments (7 posted)

Page editor: Jonathan Corbet

Announcements

Brief items

Google working to build a larger patent pile

Google has announced that it is bidding for Nortel's considerable pile of software patents. "If successful, we hope this portfolio will not only create a disincentive for others to sue Google, but also help us, our partners and the open source community—which is integrally involved in projects like Android and Chrome—continue to innovate. In the absence of meaningful reform, we believe it's the best long-term solution for Google, our users and our partners."

Comments (63 posted)

Mozilla reabsorbing Mozilla Messaging

The Mozilla project has announced that the Mozilla Messaging project (which produces Thunderbird) will cease to be a separate operation. "We intend to combine the two teams to increase our effectiveness. Practically this means we’ll be integrating Mozilla Messaging with Mozilla Labs. David Ascher will lead a new innovation group within Mozilla Labs focused on online communications and social interactions on the Web. After the teams merge into Mozilla Labs we will dissolve Mozilla Messaging. This simplifies our overall structure."

Comments (1 posted)

Articles of interest

NASA Hosts Its First Open Source Summit (eWeek)

eWeek reports on NASA's Open Source Summit, which was held March 29-30 at Moffett Field in Silicon Valley. Speakers included Google's Chris DiBona, David Wheeler of the Institute for Defense Analysis, Mozilla's Pascal Finette, Bob Sutor of IBM, and Red Hat's Brian Stevens. "NASA's use of open source has been restricted in past years due to the International Traffic in Arms Regulations (ITAR) of the U.S. State Department, which apply directly to aerospace equipment. DiBona argued that these restrictions ought to be eased. [...] If NASA's IT group used more open-source software, DiBona said, the help of the community would save time and tax dollars as well as speed up transfer of technology to and from aerospace programs. It also would accelerate NASA's software-procurement practices, he said."

Comments (10 posted)

Professional Quality CAD on Linux with DraftSight (Linux.com)

Nathan Willis reviews the zero-cost-but-proprietary beta release of DraftSight, computer-aided design (CAD) software for Linux. "Although this new app is not open source, it is the first professional-level package available for free on Linux that can read and write the industry-standard .DWG file format. Free software CAD still has a long way to go, but for now DraftSight offers Linux users a rare glimmer of hope."

He goes on to look at some of the alternatives. "As frustrating as it is, those are the options right now for CAD on Linux: non-free software that supports DWG, or free software that doesn't. All this stems back to the need to reverse-engineer the DWG file format itself. An independent group called the Open Design Alliance (ODA) did just that, and created a Linux-compatible library called OpenDWG. Yet as is too often the case, the organization and product do not live up to their names. You must be a paid-up member of the ODA in order to access its software, and you are not allowed to share it with others."

Comments (21 posted)

Capturing SIGCSE conversation: Computer science professors discuss teaching open source (Opensource.com)

Over at Opensource.com, Mel Chua summarizes what was learned about teaching open source at the SIGCSE computer science education conference. "When the completion of a project hinges on many factors outside a student's control, professors need to find different ways of grading. It's unfair to penalize a student for good work that wasn't accepted as a patch simply because an external dependency slipped or an outside developer didn't respond to their email before the semester ended. To address this, Grant Hearn from the University of the Western Cape suggested competency categories rather than hard rubrics--did the student do something related to documentation in the project? Write some form of feature specification? Can the student hand you a chat log with a remote developer from upstream (regardless of the outcome of that conversation)? Figure out learning objectives and turn them into benchmarks that are under the student's control."

Comments (none posted)

Villa: MPL Beta 2- as FAQ

On his blog, Luis Villa discusses the latest beta version of the update to the Mozilla Public License (MPL). In particular, there is an experimental rewrite of the license in FAQ form: "This approach has two advantages. First, it helps you draft and organize things more clearly. Since every paragraph was the answer to a question, things were broken up into what normal human beings would consider more logical units, instead of the giant blocks of text legal documents sometimes sprawl into. Preparing the FAQified version of Beta 2 made us aware of some MPL sections that had this problem, and it helped us reorder and reorganize text as a result- something which you can see in (for example) the new Section 8 of MPL 2 Beta 2, which is part of the old Section 9 broken out so that it makes more sense independently. Because of this, these changes will help every reader of the license, even if we never publish another "FAQified" version."

Comments (2 posted)

New Books

Agile Web Development with Rails, Fourth Edition--New from Pragmatic Bookshelf

Pragmatic Bookshelf has released "Agile Web Development with Rails, Fourth Edition", by Sam Ruby, Dave Thomas, and David Heinemeier Hansson.

Full Story (comments: none)

Resources

FSFE Newsletter - April 2011

The Free Software Foundation Europe April newsletter is out. In this issue: FSFE turned 10, a worldwide celebration of Open Standards, and several other topics.

Full Story (comments: none)

Contests and Awards

Tagesschau.de awarded for the use of Open Standards

The Free Software Foundation Europe (FSFE) has awarded Tagesschau.de for the use of Open Standards at the "Document Freedom Day". "The prize is awarded by the Free Software Foundation Europe (FSFE) and the Foundation for a Free Information Infrastructure e.V. (FFII) for offering the broadcasted shows also in the free video format "Ogg Theora"."

Full Story (comments: none)

Calls for Presentations

Linux Security Summit 2011 - Announcement and CFP

The call for participation is open for the Linux Security Summit which will be held September 8 in Santa Rosa, California, co-located with the Linux Plumbers Conference. There will be brief technical talks and Q&A panel sessions. The deadline for abstracts and panel discussion topics is May 27.

Full Story (comments: none)

Upcoming Events

Events: April 14, 2011 to June 13, 2011

The following event listing is taken from the LWN.net Calendar.

April 11-14: O'Reilly MySQL Conference & Expo (Santa Clara, CA, USA)
April 13-14: 2011 Android Builders Summit (San Francisco, CA, USA)
April 16: Open Source Conference Kansai/Kobe 2011 (Kobe, Japan)
April 25-26: WebKit Contributors Meeting (Cupertino, USA)
April 26-29: OpenStack Conference and Design Summit (Santa Clara, CA, USA)
April 28-29: Puppet Camp EU 2011: Amsterdam (Amsterdam, Netherlands)
April 29: Ottawa IPv6 Summit 2011 (Ottawa, Canada)
April 29-30: Professional IT Community Conference 2011 (New Brunswick, NJ, USA)
April 30-May 1: LinuxFest Northwest (Bellingham, Washington, USA)
May 3-6: Red Hat Summit and JBoss World 2011 (Boston, MA, USA)
May 4-5: ASoC and Embedded ALSA Conference (Edinburgh, United Kingdom)
May 5-7: Linuxwochen Österreich - Wien (Wien, Austria)
May 6-8: Linux Audio Conference 2011 (Maynooth, Ireland)
May 9-11: SambaXP (Göttingen, Germany)
May 9-10: OpenCms Days 2011 Conference and Expo (Cologne, Germany)
May 9-13: Linaro Development Summit (Budapest, Hungary)
May 9-13: Ubuntu Developer Summit (Budapest, Hungary)
May 10-13: Libre Graphics Meeting (Montreal, Canada)
May 10-12: Solutions Linux Open Source 2011 (Paris, France)
May 11-14: LinuxTag - International conference on Free Software and Open Source (Berlin, Germany)
May 12: NLUUG Spring Conference 2011 (ReeHorst, Ede, Netherlands)
May 12-15: Pingwinaria 2011 - Polish Linux User Group Conference (Spala, Poland)
May 12-14: Linuxwochen Österreich - Linz (Linz, Austria)
May 16-19: PGCon - PostgreSQL Conference for Users and Developers (Ottawa, Canada)
May 16-19: RailsConf 2011 (Baltimore, MD, USA)
May 20-21: Linuxwochen Österreich - Eisenstadt (Eisenstadt, Austria)
May 21: UKUUG OpenTech 2011 (London, United Kingdom)
May 23-25: MeeGo Conference San Francisco 2011 (San Francisco, USA)
June 1-3: Workshop Python for High Performance and Scientific Computing (Tsukuba, Japan)
June 1: Informal meeting at IRILL on weaknesses of scripting languages (Paris, France)
June 1-3: LinuxCon Japan 2011 (Yokohama, Japan)
June 3-5: Open Help Conference (Cincinnati, OH, USA)
June 6-10: DjangoCon Europe (Amsterdam, Netherlands)
June 10-12: Southeast LinuxFest (Spartanburg, SC, USA)

If your event does not appear here, please tell us about it.

Audio and Video programs

LAM: Best of 2010 mix - Now online

The *Linux Audio Musicians Best of 2010 mix* is available for your listening pleasure. "Congratulations!!! to the Artists and Bands that made 2010 a bumper year for well produced and unique music from the Linux Audio Community. This mix is the biggest yet with over 100 tracks included and over 12 hours of music to listen to. A hefty selection of guitar music and rock productions are included in this years mix to complement the large range of electronica. The variation of styles and genres is superb and as expected there was lots of off the wall and challenging music produced during the 2010 period also."

Full Story (comments: none)

Page editor: Rebecca Sobol


Copyright © 2011, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds