As GCC nears its 4.4 release, a number of criteria must be met before it can ship. Those requirements, chiefly squashing outstanding regressions, have been met, but things are still stalled. Issues raised with the changes to the runtime library exception have delayed both the release and the creation of a branch that would open the GCC trunk to new development, until those questions are resolved. In the meantime, however, GCC development is hardly standing still; there are numerous interesting ideas floating around for new features.
Changing the runtime library exemption was meant to allow the creation of a plugin API for GCC, so that developers could add additional analysis or processing of a program as it is being transformed by the compiler. The Free Software Foundation has long been leery of allowing such a plugin mechanism because they feared that binary-only GCC plugins of various sorts might be the result. In January, though, the FSF announced that it would change the exemption—which allows proprietary programs to link to the GCC runtime library—in order to exclude code that has been processed by a non-GPL "compilation process". It is a bit of license trickery that will only allow plugins that are GPL-licensed.
Shortly after the new exception was released, there were some seemingly substantive issues raised on the gcc-devel mailing list. Ian Taylor neatly summarized the concerns, which break down into three separate issues:
By and large, the byte code produced as part of a compiler's operation is just an intermediate form that likely shouldn't be considered a "high-level, non-intermediate language", but Java and LLVM are a bit different. In both cases, the byte code is a documented language, somewhat higher level than assembly code, which, at least in the case of LLVM, is sometimes hand-written. For Java, non-GPL compilers are often used, but based on the current exemption language, the byte code from those compilers couldn't be combined with the GCC runtime libraries and distributed as a closed source program. Since LLVM is GPL-compatible, there are currently no issues combining its output with the GCC runtime, but Taylor is using it as another example of byte code being generated by non-GCC tools.
In addition to laying out the issues, Taylor recommends two possible ways forward. One of those is to clarify the difference between a compiler intermediate form and a "high-level, non-intermediate language". The other is to expand the definition of an eligible compilation process to allow any input to GCC that is created by a program not derived from GCC. The former distinction seems difficult to pin down in any way that can't be abused down the road, so the latter might be easier to implement. After all, the GCC developers can determine what kinds of input the compiler is willing to accept.
This may seem like license minutiae to some—and it is—but it is important to get it right. The FSF has chosen to go this route to prevent the—currently theoretical—problem of proprietary GCC plugins, so they need to ensure that they close any holes. As Dave Korn pointed out in another thread, releasing anything using an unclear license could create problems down the road:
Meanwhile, GCC developers have been working on reducing the regressions so that 4.4 can be released. Richard Guenther reported on March 13 that there were no priority 1 (P1) regressions and fewer than 100 regressions overall, which would normally mean that a new branch for 4.4 would be created, with the trunk opening for 4.5 development. But, because of the runtime library exception questions, Richard Stallman asked the GCC Steering Committee (SC) to wait for those to be resolved before branching.
The delay has been met with some unhappiness amongst GCC hackers. Without a 4.4 release branch, interesting new features are still languishing in private developer branches. As Steven Bosscher put it:
What is also being held back, is more than a year of improvements since GCC 4.3.
Bosscher suggested releasing 4.4 with the old exception and fixing the problems in the 4.5 release. While that could work, it would seem that Stallman and the SC are willing to give FSF legal some time to clarify the exception. In the end, though, the point is somewhat moot as there is, as yet, no plugin API available.
As part of the discussion of the new runtime library exception, Sean Callanan sparked a discussion about a plugin API by mentioning some of the plugins his research group had been working on. That led to various thoughts about the API, including a wiki page for the plugin project and one for the API itself. Diego Novillo has also created a branch to contain the plugin work.
The basic plan is to look at the existing plugins—most of which have implemented their own API—to extract requirements for a generalized API. In addition to the plugins mentioned by Callanan, there are others, including Mozilla's Dehydra C++ analysis tool, the Middle End Lisp Translator (MELT), which is a Lisp dialect that allows the creation of analysis and transformation plugins, and the MILEPOST self-optimizing compiler. Once the license issues shake out, it would appear that a plugin API won't be far behind.
There are other new features being discussed for GCC as well. Taylor has put out a proposal to support "split stacks" in GCC. The basic idea is to allow thread stacks to grow and shrink as needed, rather than being statically allocated at a particular size. Currently, applications with enormous numbers of threads must give each one its worst-case stack size, even if that space goes unused for the life of the thread. Split stacks could thus reduce memory usage, allowing more threads to run, and would also relieve programmers of the need to calculate stack sizes for applications with thousands or millions of threads.
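The worst-case reservation that split stacks would eliminate is visible in how threaded programs are written today. This sketch (the helper name is invented for illustration) reserves a fixed stack for a thread with pthread_attr_setstacksize(); the whole reservation is made whether or not the thread ever uses it:

```c
#include <pthread.h>
#include <stddef.h>

/* A worker that, in practice, might use only a few KB of stack. */
static void *worker(void *arg)
{
    return arg;
}

/* Illustrative helper: create a thread with an explicitly reserved,
 * fixed-size stack, wait for it, and return 0 on success. */
int spawn_with_stack(size_t stacksize, void **result)
{
    pthread_attr_t attr;
    pthread_t tid;

    if (pthread_attr_init(&attr) != 0)
        return -1;
    /* The entire worst-case stack is committed up front; with split
     * stacks, a small initial stack could grow on demand instead. */
    if (pthread_attr_setstacksize(&attr, stacksize) != 0)
        return -1;
    if (pthread_create(&tid, &attr, worker, (void *)0x2a) != 0)
        return -1;
    pthread_join(tid, result);
    pthread_attr_destroy(&attr);
    return 0;
}
```

Multiply an 8MB default reservation by a million threads and the appeal of growable stacks is clear.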
Another feature is link-time optimization (LTO), which is much further along than split stacks. Novillo put out a call for testers of the LTO branch in late January. A number of optimizations can be performed when the linker has access to information about all of the compilation units. Currently, the linker only sees the object files being collected into an executable, but LTO would put GCC's internal representation (GIMPLE) into a special section of each object file. Then, at link time (though the work is not actually done by the linker itself), various optimizations based on the state of the whole program could be performed. The kinds of optimizations that can be done are outlined in a paper [PDF] on "Whole Program Optimizations" (WHOPR) written by a number of GCC hackers including Taylor and Novillo.
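As an illustration of what whole-program information buys, consider a trivial pair of translation units (file and function names invented here, shown in one listing so the sketch is self-contained). A conventional linker sees only opaque object code for the callee, so the call cannot be inlined; with GIMPLE carried in the object files, the compiler can finish the job at link time:

```c
/* answer.c: the callee.  Built normally, an ordinary link step sees
 * only its machine code. */
int answer(void)
{
    return 42;
}

/* main.c: the caller.  With something like
 *     gcc -O2 -flto answer.c main.c
 * the GIMPLE stored in each object file lets GCC inline answer()
 * into caller() at link time and fold the result to a constant. */
int caller(void)
{
    return answer() + 1;
}
```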
While it is undoubtedly disappointing to delay GCC 4.4, hopefully the license issues will be worked out soon and the integration of GCC 4.5 can commence. In the interim, work on various features—many more than are described here—is proceeding. The FSF has always had a cautious approach to releases—witness the pace of Emacs—but sooner or later, we will see GCC 4.4, presumably with a licensing change. With luck, six months or so after that will come GCC 4.5 with some of these interesting new features.
The non-profit Media Development Loan Fund (MDLF) released a major upgrade to its online journalism content management system (CMS) Campsite last week. Campsite 3.2 brings a flexible new plugin system and several improvements to search, templating, and content editing. MDLF describes Campsite as a CMS tailor-made for newspaper publishers — many of whom cannot afford expensive commercial products or the IT support required to heavily customize general-purpose CMS packages. Many of MDLF's target organizations are independent media in countries in transition, but the system is used in newsrooms all over the globe.
Campsite is deployed by more than 70 publications, many in Central and Eastern Europe near MDLF's Center for Advanced Media in Prague (CAMP), the office from which Campsite takes its name. But the software is also popular in Latin America, such as at El Periodico in Guatemala City, Guatemala.
MDLF's mission is to support independent journalists and media organizations, so that they are "strong enough to hold governments to account, expose corruption and drive systemic change." Founded in 1996, it provides funding to independent media in 23 countries, made possible through private donations and public grants. MDLF describes tools as the key investments for independent media, including printing presses, radio and television transmitters, and software. Campsite and the other CAMP projects grew out of MDLF's need to provide low-cost, open source software for new media outlets.
The feature set caters to the needs of professional news publications, a relationship that CAMP's Head of Research and Development Douglas Arellanes described as "organic". "Campsite has been around since 2000, and nearly all of its features have come on the basis of real-world implementations."
Arellanes says journalists and editors on deadline have better things to do than worry about CMS management, and that is the key difference that sets Campsite apart. He personally likes WordPress and has great respect for the project, noting that:
Campsite's back-end allows an organization to replicate the newspaper workflow: authors can create and edit stories, submitting them to the editors when ready; editors can alter them, schedule them to run at predetermined times, change their visibility, move them between sections, and ultimately approve their publication. The system also handles administrative tasks like managing subscriptions, tracking article views, and moderating reader comments. A single back-end can also run multiple publications with different rules, schedules, layouts, and subscription lists and policies.
It can handle paid or unpaid subscriptions, supports embedded multimedia in articles, can integrate with the Phorum web forum package, and is multilingual from the ground up. The interface is available in 14 languages, and every individual article can be translated, whether as a one-off special, side-by-side in a single publication, or with entire sections or publications in separate languages.
Lead developer Mugur Rus said Campsite takes security very seriously. The administration interface can run over SSL and uses fine-grained, role-based privileges on all accounts. The system also uses CAPTCHAs on comment forms, logs all events, and can alert system administrators by email notification. Rus says Campsite itself has been cracked only once, and that it relies on the standard security features of its free software platform.
Campsite is written in PHP and is designed to run on Apache servers using MySQL. The manual cites Apache 2.0.x, PHP 5.0, and MySQL 5.0 as the minimum version dependencies, and requires ImageMagick to handle graphics. In addition, you must run PHP as an Apache module, not as CGI, and there is a short list of required PHP directives to set up in the installation's php.ini file. Campsite runs on Linux, FreeBSD, Windows, and Mac OS X servers. No current Linux distributions are known to include Campsite, although from time to time users have shared their own home-brewed packages.
New in version 3.2 are improvements to search functionality, content editing, and site templating. The search improvements include an "advanced" search mode and increased support for non-ASCII languages that were problematic in earlier revisions. The story editor now uses the WYSIWYG TinyMCE component, with which administrators can customize the available markup features by privilege level. The Smarty-based templating system now supports functions, and developers have begun migrating the Campsite administration interface to Yahoo's open source AJAX interface library YUI.
The most significant feature is the debut of a plugin architecture that can extend the functionality of a publication but remain integrated with the core Campsite story content. For example, one of the three default plugins in Campsite 3.2 is a blogging module. Arellanes observed that most newspaper sites are just beginning to implement staff blogs, and that when they do they are typically stand-alone deployments of existing blog systems that sit isolated from the rest of the publication. Using Campsite's blog plugin, however, content is accessible via the same topic tags and search, whether it is a news article or a blog post.
The other two default plugins implement reader polls and a "live" interview system, in which readers submit questions, an administrator approves or rejects them, and the interviewee responds, with the answers posted automatically.
Version 3.2 also features a simpler installation process, inspired by WordPress's five-minute install experience. Users now need only expand the tar archive of the latest Campsite build, put the site contents into the desired folder on their Web host, and follow the step-by-step setup process in the Web installer.
Arellanes said to expect a quick turnaround for the next stable release of Campsite, focusing on cache performance and overall speedups, and implementing a content API that will permit Campsite sites to make their story content available to outside users for aggregation or content mashups.
He also hopes to see more development from third-party coders on plugins using the new plugin API. "The open source model, as it concerns Campsite, has meant that we're really growing in terms of the number of features that are being contributed from non-core developers. We expect this trend to really pick up now that we've got a plugin architecture to work with."
The press release for Campsite 3.2 notes that independent media in developing countries have long operated on limited funds that preclude the expensive CMS solutions preferred by other organizations — the very situation that drives MDLF's software projects. But it also points out that newspapers in the "developed" world are facing a financial crisis of their own. Consequently, an open source CMS like Campsite makes more sense than ever.
As has been well covered (and discussed) elsewhere, the delayed allocation feature found in the ext4 filesystem - and most other contemporary filesystems as well - has some real benefits for system performance. In many cases, delayed allocation can avoid the allocation of space on the physical medium (along with the associated I/O) entirely. For longer-lived data, delayed allocation allows the filesystem to optimize the placement of the data blocks, making subsequent accesses faster. But delayed allocation can, should the system crash, lead to the loss of the data for which space has not yet been allocated. Any filesystem may lose data if the system is unexpectedly yanked out from underneath it, but the changes in ext4 can lead to data loss in situations that, with ext3, appeared to be robust. This change looks much like a regression to many users.
Many electrons have been expended to justify the new, more uncertain ext4 situation. The POSIX specification says that no persistence is guaranteed for data which has not been explicitly sent to the media with the fsync() call. Applications which lose data on ext4 are not using the filesystem correctly and need to be fixed. The real problem is users running proprietary kernel modules which cause their systems to crash in the first place. And so on. All of these statements are true, at least to an extent.
But one might also argue that they are irrelevant.
Your editor recently became aware that The Unix-Haters Handbook [PDF], edited by Simson Garfinkel and others, is available online. To say that this book is an aggravating read is an understatement; much of it reads like childish poking at Unix by people who wish that VMS (or some other proprietary system) had taken over the world. It's full of text like:
But behind the silly rhetoric are some real points that anybody concerned with the value of Unix-like systems should hear. Among them is the "worse is better" notion expressed by Richard Gabriel in 1991 - the year the Linux kernel was born. This charge holds that Unix developers choose implementation simplicity over correctness at the lower levels, even if it leads to application complexity (and lack of robustness) at the higher levels. The ability of a write() system call to succeed partially is given as an example; it forces every write() call to be coded within a loop which retries the operation until the kernel gets around to finishing the job. Developers who skip the loop are left with an application which works most of the time, but which can fail silently in unexpected circumstances. It is far better, these people say, to solve the problem once at the kernel level so that applications can be simpler and more robust.
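The loop in question looks something like the following; full_write() is an illustrative name, not a standard function. Every application that wants a complete write must carry a copy of this logic:

```c
#include <errno.h>
#include <unistd.h>

/* The loop that partial writes force on every careful application:
 * keep calling write() until the whole buffer has been accepted. */
ssize_t full_write(int fd, const char *buf, size_t count)
{
    size_t done = 0;

    while (done < count) {
        ssize_t n = write(fd, buf + done, count - done);
        if (n < 0) {
            if (errno == EINTR)
                continue;       /* interrupted by a signal; retry */
            return -1;          /* a real error */
        }
        done += (size_t)n;      /* n may be less than requested */
    }
    return (ssize_t)done;
}
```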
The ext4 situation can be seen as similar: any application developer who wants to be sure that data has made it to persistent storage must take extra care to inform the kernel that, yes, that data really does matter. Developers who skip that step will have applications which work - almost all the time. One could well argue that, again, the kernel should take the responsibility of ensuring correctness, freeing application developers from the need to worry about it.
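The "extra care" amounts to something like the following write-temporary, fsync(), rename() sequence, shown here as a sketch with invented names; this is the pattern that careless applications omit:

```c
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Durably replace 'path' with 'data': write a temporary file,
 * fsync() it, and only then rename() it over the original. */
int save_file_durably(const char *path, const char *data, size_t len)
{
    char tmp[4096];
    int fd;

    if (snprintf(tmp, sizeof(tmp), "%s.tmp", path) >= (int)sizeof(tmp))
        return -1;              /* path too long */
    fd = open(tmp, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;
    if (write(fd, data, len) != (ssize_t)len || fsync(fd) != 0) {
        close(fd);
        unlink(tmp);
        return -1;
    }
    if (close(fd) != 0) {
        unlink(tmp);
        return -1;
    }
    /* Only now, with the data known to be on disk, is it safe to
     * rename over the old copy; skip the fsync() and a crash under
     * delayed allocation can leave a zero-length file behind. */
    return rename(tmp, path);
}
```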
The ext3 filesystem makes no such guarantees either but, due to the way its features interact, it provides something close to a persistence guarantee in most situations. An ext3 filesystem running under a default configuration will normally lose no more than five seconds worth of work in a crash, and, importantly, it is not prone to the creation of zero-length files in common scenarios. The ext4 filesystem withdrew that implicit guarantee; unpleasant surprises for users followed.
Now the ext4 developers are faced with a choice. They could stand by their changes, claiming that the loss of robustness is justified by increased performance and POSIX compliance. They could say that buggy applications need to be fixed, even if it turns out that very large numbers of applications need fixing. Or, instead, they could conclude that Linux should provide a higher level of reliability, regardless of how diligent any specific application developers might have been and regardless of what the standards say.
It should be said that the first choice is not entirely unreasonable. POSIX forms a sort of contract between user space and the kernel. When the kernel fails to provide POSIX-specified behavior, application developers are the first to complain. So perhaps they should not object when the kernel insists that they, too, live up to their end of the bargain. One could argue that applications which have been written according to the rules should not take a performance hit to make life easier for the rest. Besides, this is free software; it would not take that long to fix up the worst offenders.
[PULL QUOTE: There is a case to be made that this is a situation where the Linux kernel, in the interest of greater robustness throughout the system, should go beyond POSIX. END QUOTE] But fixing this kind of problem is a classic case of whack-a-mole: application developers will continually reintroduce similar bugs. The kernel developers have been very clear that they do not feel bound by POSIX when the standard is seen to make no sense. So POSIX certainly does not compel them to provide a lower level of filesystem data robustness than application developers would like to have. There is a case to be made that this is a situation where the Linux kernel, in the interest of greater robustness throughout the system, should go beyond POSIX.
The good news, of course, is that this has already happened. There is a set of patches queued for 2.6.30 which will provide ext3-like behavior in many of the situations that have created trouble for early ext4 users. Beyond that, the ext4 developers are considering an "allocate-on-commit" mount option which would force the completion of delayed allocations when the associated metadata is committed to disk, thus restoring ext3 semantics almost completely. Chances are good that distributors would enable such an option by default. There would be a performance penalty, but ext4 should still perform better than ext3, and one should not underestimate the performance costs associated with lost data.
In summary: the ext4 developers - like Linux developers in general - do care about their users. They may complain a bit about sloppy application developers, standards compliance, and proprietary kernel modules, but they'll do the right thing in the end.
One should also remember that ext4 is still a very young filesystem; it's not surprising that a few rough edges remain in places. It is unlikely that we have seen the last of them.
As a related issue, it has been suggested that the real problem is with the POSIX API, which does not make the expression of atomicity and durability requirements easy or natural. It is time, some say, to create an extended (or totally new) API which handles these issues better. That may well be true, but this is easier said than done. There are, of course, the difficulties in designing a new API to last for the next few decades; one assumes that we are up to that challenge. But will anybody use it? Consider Linus Torvalds's response to another suggestion for an API extension:
Application developers will be naturally apprehensive about using Linux-only interfaces. It is not clear that designing a new API which will gain acceptance beyond Linux is feasible at this time.
Your editor also points out, hesitantly, that Hans Reiser designed (and implemented) a number of features intended to allow applications to use small files in a robust manner in the reiser4 filesystem. Interest in accepting those features was quite low even before Hans left the scene. There were a lot of reasons for this, including nervousness about a single-filesystem implementation and nervousness about dealing with Hans, but the addition of non-POSIX extensions was problematic in its own right (see this article for coverage of this discussion in 2004).
The real answer is probably not new APIs. It is probably a matter of building our filesystems to provide "good enough" robustness as a default, with much stronger guarantees available to developers who are willing to do the extra coding work. Such changes may come hard to filesystem hackers who have worked to create the fastest filesystem possible. But they will happen anyway; Linux is, in the end, written by and for its users.
Page editor: Jonathan Corbet
Copyright © 2009, Eklektix, Inc.