By Jake Edge
March 18, 2009
As GCC nears its 4.4 release, there are a number of criteria that need to
be met before it can be released. Those
requirements—regressions requiring squashing—have been met, but
things are still stalled. A number of issues were
raised with the changes to the
runtime library
exemption that have caused the
release, and a branch that will allow new development into the GCC tree, to
be delayed until they are resolved. In the
meantime, however, GCC development is hardly standing still; there are
numerous interesting ideas floating around for new features.
Changing the runtime library exemption was meant to allow the creation of a plugin
API for GCC, so that developers could add additional analysis or
processing of a program as it is being transformed by the compiler. The
Free Software Foundation has long been leery of allowing such a plugin
mechanism because they feared that binary-only GCC plugins of various sorts
might be the result. In January, though, the FSF announced that it would
change the exemption—which allows proprietary programs to link to the GCC
runtime library—in order to exclude code that has been processed by
a non-GPL "compilation process". It is a bit of license trickery that will
only allow plugins that are GPL-licensed.
Shortly after the new exception was
released, there were some seemingly substantive issues raised on the
gcc-devel mailing list. Ian Taylor neatly summarized the concerns, which break down into
three separate issues:
- Code that does not use the runtime library and its interfaces at all might
not be interpreted as included in the definition of an "Independent
Module", which would then disallow it from being combined with the GCC
runtime libraries. The code that fell outside of the "Independent Module"
definition would not be affected directly, but combining it with
other, compliant code that did use the runtime library would be
disallowed.
- There are questions about whether Java byte code should be
considered a "high-level, non-intermediate language". It is common to
generate Java byte code using a non-GCC compiler, but then process it with
gcj.
- There is also a hypothetical question about LLVM byte code
and whether it should be considered a "high-level, non-intermediate
language" as well.
Definitions of terms make up the bulk of the runtime library exemption, so
it is clearly important to get them right. The first issue in Taylor's
summary seems like just an oversight—easily remedied—but the
last two are a little more subtle.
By and large, the byte code produced as part of a compiler's operation is
just an intermediate form that likely shouldn't be considered a "high-level,
non-intermediate language", but Java and LLVM are a bit different. In both
cases,
the byte code is a documented language, somewhat higher level than assembly
code, which, at least in the case of LLVM, is sometimes hand-written. For
Java, non-GPL compilers are often used, but based on the current exemption
language, the byte code from those compilers couldn't be combined with the
GCC runtime libraries and
distributed as a closed source program. Since LLVM is GPL-compatible,
there are currently no issues combining its output with the GCC runtime,
but Taylor is using it as another example of byte code being generated by
non-GCC tools.
In addition to laying out the issues, Taylor recommends two possible ways
forward. One of those is to clarify the difference between a compiler
intermediate form and a "high-level, non-intermediate language". The other
is to expand the definition of an eligible compilation process to allow any
input to GCC that is created by a program that is not derived from GCC.
The former
distinction seems difficult to pin down in any way that can't be abused
down the road, so the second might be easier to implement. After all, the
GCC developers can determine what kinds of input the compiler is willing to
accept.
This may seem like license minutiae to some—and it is—but it is
important to get it right. The FSF has chosen to go this route to prevent
the—currently theoretical—problem of proprietary GCC plugins,
so they need to ensure that they close any holes.
As Dave Korn pointed out in another thread, releasing
anything using an unclear license could create problems down the road:
If there's a problem with the current
licence that would open a backdoor to proprietary plugins, and we ever release
the code under that licence, evaders will be able to maintain a fork under the
original licence no matter how we subsequently relicense it.
Meanwhile, GCC developers have been working on reducing the regressions so
that 4.4 can be released. Richard Guenther reported on March 13 that there were no
priority 1 (P1) regressions, and fewer than 100 overall regressions, which
would normally mean that a new branch for 4.4 would be created, with 4.5
development being added to the trunk.
But, because of the runtime library exception
questions, Richard Stallman asked the GCC Steering Committee (SC) to wait
for those to be resolved before branching.
The delay has been met with some unhappiness amongst GCC hackers. Without
a 4.4 release branch, interesting new features are still languishing in private
developer branches. As Steven Bosscher put
it:
But there are interactions
between the branches, and the longer it takes to branch for GCC 4.4,
the more difficult it will be to merge all the branches in for GCC
4.5. So GCC 4.5 is *also* being delayed, not just GCC 4.4.
What is also being held back, is more than a year of improvements since GCC
4.3.
Bosscher suggested releasing with the old exemption for 4.4 and fixing the
problems in the 4.5 release. While that could work, it would seem that
Stallman and the SC are willing to give FSF legal some time to clarify the
exemption. In the end, though, the point is somewhat moot as there is, as
yet, no plugin API available.
As part of the discussion of the new runtime library exception, Sean
Callanan sparked a discussion about a plugin
API by mentioning some of the plugins his research group had been
working on. That led to various thoughts about the API, including a wiki page for the plugin project
and one for the API
itself. Diego Novillo has also created a
branch to contain the plugin work.
The basic plan is to look at the existing plugins—most of which have
implemented their own API—to extract requirements for a generalized
API. In addition to the plugins mentioned by Callanan, there are others,
including Mozilla's Dehydra C++ analysis
tool, the Middle
End Lisp Translator (MELT), which is a Lisp dialect that allows the
creation of analysis and transformation plugins, and the MILEPOST self-optimizing
compiler. Once the license issues shake out, it would appear that a plugin
API won't be far behind.
There are other new features being discussed for GCC as well. Taylor has
put out a proposal to support "split
stacks" in GCC. The basic idea is to allow thread stacks to grow and
shrink as needed, rather than be statically allocated at a particular
size. Currently, applications that have enormous numbers of threads must
give each one the worst-case stack size, even when it might go unused
during the life of that thread. So, this could reduce memory usage, thus
allowing more threads to run, but it would also alleviate the need for
programmers to consider stack size for applications with thousands or
millions of threads.
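To make the current situation concrete, here is a minimal sketch of how a threaded program fixes each thread's stack size up front using the standard POSIX attribute calls. The function name run_threads_with_fixed_stacks() and the do-nothing worker are illustrative assumptions, not anything taken from Taylor's proposal. With, say, 10,000 threads and an 8 MiB worst-case stack apiece, roughly 78 GiB of address space would be reserved whether or not any thread ever uses it.

```c
#include <errno.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical thread body; a real worker would do its job here. */
static void *worker(void *arg)
{
    (void)arg;
    return NULL;
}

/*
 * Start `count` threads, each with a fixed stack of `stack_bytes`, and
 * wait for all of them. Returns 0 on success or an error number.
 * The stack size must be chosen before the thread runs, so it has to
 * cover the worst case even if most threads never need that much.
 */
int run_threads_with_fixed_stacks(size_t count, size_t stack_bytes)
{
    pthread_attr_t attr;
    pthread_t *tids;
    size_t started = 0;
    int err;

    if (count == 0)
        return 0;

    tids = malloc(count * sizeof(*tids));
    if (tids == NULL)
        return ENOMEM;

    err = pthread_attr_init(&attr);
    if (err != 0) {
        free(tids);
        return err;
    }

    /* Every thread created with `attr` gets the same fixed stack size. */
    err = pthread_attr_setstacksize(&attr, stack_bytes);
    while (err == 0 && started < count) {
        err = pthread_create(&tids[started], &attr, worker, NULL);
        if (err == 0)
            started++;
    }

    /* Join whatever was started, even if a later creation failed. */
    for (size_t i = 0; i < started; i++)
        pthread_join(tids[i], NULL);

    pthread_attr_destroy(&attr);
    free(tids);
    return err;
}
```

A split-stack scheme as described above would, presumably, make the stack_bytes guess unnecessary, since each stack could start small and grow on demand.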
Another feature is link-time optimization (LTO), which is much further
along than split stacks. Novillo put out a call for testers of the LTO branch in late
January. There are a number of optimizations that can be performed when
the linker has access to information about all of the compilation units.
Currently, the linker only has access to the object files that are being
collected into an executable, but LTO would put the GCC-internal
representation (GIMPLE) into a special section of the object file. Then,
at link time (but not actually implemented by the linker), various
optimizations based on the state of the whole program could be performed.
The kinds of optimizations that can be done are outlined in a paper [PDF] on "Whole
Program Optimizations" (WHOPR) written by a number of GCC hackers including
Taylor and Novillo.
While it is undoubtedly disappointing to delay GCC 4.4, hopefully the
license issues will be worked out soon and the integration of GCC 4.5 can
commence. In the interim, work on various features—many more
than are described here—is proceeding. The FSF has always had a
cautious approach to releases—witness the pace of Emacs—but
sooner or later, we will see GCC 4.4, presumably with a licensing change.
With luck, six months or so after that will come GCC 4.5 with some of these
interesting new features.
March 18, 2009
This article was contributed by Nathan Willis
The non-profit Media Development Loan
Fund (MDLF) released a major upgrade to its online journalism content
management system (CMS) Campsite last
week. Campsite 3.2 brings a flexible new plugin system and several
improvements to search, templating, and content editing. MDLF describes
Campsite as a CMS tailor-made for newspaper publishers — many of whom
cannot afford expensive commercial products or the IT support required to
heavily customize general-purpose CMS packages. Many of MDLF's target
organizations are independent media in countries in transition, but the
system is used in newsrooms all over the globe.
Campsite is deployed by more than 70 publications, many in Central and
Eastern Europe near MDLF's Center for
Advanced Media in Prague (CAMP), the office from which Campsite takes
its name. But the software is also popular in Latin America, such as at El Periodico in Guatemala City,
Guatemala.
MDLF's mission is to
support independent journalists and media organizations, so that they are
"strong enough to hold governments to account, expose corruption and
drive systemic change." Founded in 1996, it provides funding to
independent media in 23 countries, made possible through private
donations and public grants. MDLF describes tools as the key
investments for independent media, including printing presses, radio and
television transmitters, and software. Campsite and the other CAMP
projects grew out of MDLF's need to provide low-cost, open source
software for new media outlets.
The feature set caters to the needs of professional news publications,
which CAMP's Head of Research and Development Douglas Arellanes described
as an "organic" relationship. "Campsite has been around since 2000,
and nearly all of its features have come on the basis of real-world
implementations."
Arellanes says journalists and editors on deadlines have better things
to do than worry about CMS management, and that is the key difference that
sets Campsite apart. He personally likes WordPress and has great respect
for the project, noting that:
It's really easy to get something
resembling a news site up and running quickly. And that's fine, especially
when those sites are one-person shows, where there's only one person
inputting content and managing the backend. But when you start to get more
people involved, and when you start to have different sections, with each
wanting their own news prioritization, managing that can become much
harder. And that's especially where a CMS like Campsite is
best-used.
Campsite's back-end allows an organization to replicate the newspaper
workflow: authors can create and edit stories, submitting them to the
editors when ready; editors can alter them, schedule them to run at
predetermined times, change their visibility, move them between sections,
and ultimately approve their publication. The system also handles
administrative tasks like managing subscriptions, tracking article views,
and moderating reader comments. A single back-end can also run multiple
publications with different rules, schedules, layouts, and subscription
lists and policies.
It can handle paid or unpaid subscriptions, supports embedded multimedia
in articles, can integrate with the Phorum web forum package, and is
multilingual from the ground up. The interface is available in 14
languages, and every individual article can be translated, whether as a
one-off special, side-by-side in a single publication, or with entire
sections or publications in separate languages.
Lead developer Mugur Rus said Campsite takes security very seriously.
The administration interface can run over SSL and uses fine-grained,
role-based privileges on all accounts. The system also uses CAPTCHA for comment
forms, logs all events, and can use email notification to alert system
administrators. Rus says Campsite itself has only been cracked once, and
that it uses the standard security features of its free-software-based
platform.
A peek at 3.2
Campsite is written in PHP and is designed to run on Apache servers
using MySQL. The manual
cites Apache 2.0.x, PHP 5.0, and MySQL 5.0 as the minimum version
dependencies, and requires ImageMagick to handle graphics. In addition,
you must run PHP as an Apache module, not as CGI, and there is a short list
of required PHP directives to set up in the installation's php.ini file.
Campsite runs on Linux, FreeBSD, Windows, and Mac OS X servers.
No current Linux distributions are known to include Campsite, although from
time to time users have shared their own home-brewed packages.
New in
version 3.2 are improvements to search functionality, content editing,
and site templating. The search improvements include an "advanced" search
mode and increased support for non-ASCII languages that were problematic in
earlier revisions. The story editor now uses the WYSIWYG TinyMCE component, with which
administrators can customize the available markup features by privilege
level. The Smarty-based
templating system now supports functions, and developers have begun
migrating the Campsite administration interface to Yahoo's open source AJAX
interface library YUI.
The most significant feature is the debut of a plugin architecture that
can extend the functionality of a publication but remain integrated with
the core Campsite story content. For example, one of the three default
plugins in Campsite 3.2 is a blogging module. Arellanes observed that most
newspaper sites are just beginning to implement staff blogs, and that when
they do they are typically stand-alone deployments of existing blog systems
that sit isolated from the rest of the publication. Using Campsite's blog
plugin,
however, content is accessible via the same topic tags and search, whether
it is a news article or a blog post.
The other two default plugins implement reader polls and a "live"
interview system in which readers can ask questions, an
administrator can approve or reject them, and the interviewee can then respond
and have the answers posted automatically.
3.2 also uses a simpler installation process, inspired by WordPress's
five-minute install experience. Users now need only to expand the tar
archive of the latest Campsite build, put the site contents into the folder
they desire on their Web host, and follow the step-by-step setup process
through the Web installer.
Moving forward
Arellanes said to expect a quick turnaround for the next stable release
of Campsite, focusing on cache performance and overall speedups, and
implementing a content API that will permit Campsite sites to make their
story content available to outside users for aggregation or content
mashups.
He also hopes to see more development from third-party coders on plugins
using the new plugin API. "The open source model, as it concerns
Campsite, has meant that we're really growing in terms of the number of
features that are being contributed from non-core developers. We expect
this trend to really pick up now that we've got a plugin architecture to
work with."
The press
release for Campsite 3.2 notes that independent media in developing
countries have long operated on limited funds that preclude the expensive
CMS solutions preferred by other organizations — the very situation
that drives MDLF's software projects. But it also points out that
newspapers in the "developed" world are facing a financial crisis of their
own. Consequently, an open source CMS like Campsite makes more sense than
ever.
By Jonathan Corbet
March 17, 2009
One might well think that, at this point, there has been sufficient
discussion of the design decisions built into the ext4 "delayed allocation"
feature and the user-space implications of those decisions. And perhaps
that is true, but there should be room for a summary of the relevant
issues. The key question has little to do with the details of filesystem
design, and a lot to do with the kind of API that the Linux kernel should
present to its user-space processes.
As has been well covered (and discussed) elsewhere, the delayed
allocation feature found in the ext4 filesystem - and most other
contemporary filesystems as well - has some real benefits for system
performance. In many cases, delayed allocation can avoid the allocation
of space on the physical medium (along with the associated I/O) entirely. For longer-lived
data, delayed allocation allows the filesystem to optimize the placement of
the data blocks, making subsequent accesses faster. But delayed allocation
can, should the system crash, lead to the loss of the data for which space
has not yet been allocated. Any filesystem may lose data if the system is
unexpectedly yanked out from underneath it, but the changes in ext4 can
lead to data loss in situations that, with ext3, appeared to be robust.
This change looks much like a regression to many users.
Many electrons have been expended to justify the new, more uncertain ext4
situation. The POSIX specification says that no persistence is guaranteed
for data which has not been explicitly sent to the media with the
fsync() call. Applications which lose data on ext4 are not using
the filesystem correctly and need to be fixed. The real problem is users
running proprietary kernel modules which cause their systems to crash in
the first place. And so on. All of these statements are true, at least to
an extent.
But one might also argue that they are irrelevant.
Your editor recently became aware that The UNIX-HATERS Handbook (edited by Simson Garfinkel and others)
[PDF] is available online. To say that this book is an aggravating
read is an understatement; much of it seems like childish poking at Unix by
somebody who wishes that VMS (or some other proprietary system) had taken
over the world. It's full of text like:
The traditional Unix file system is a grotesque hack that, over the
years, has been enshrined as a "standard" by virtue of its
widespread use. Indeed, after years of indoctrination and
brainwashing, people now accept Unix's flaws as desired
features. It's like a cancer victim's immune system enshrining the
carcinoma cell as ideal because the body is so good at making
them.
But behind the silly rhetoric are some real points that anybody concerned
with the value of Unix-like systems should hear. Among them is the "worse
is better" notion expressed by Richard Gabriel in 1991 - the year the Linux
kernel was born. This charge states that Unix developers will choose
implementation simplicity over correctness at the lower levels, even if it leads to
application complexity (and lack of robustness) at the higher levels. The
ability of a write() system call to succeed partially is given as
an example; it forces every write() call to be coded within a loop
which retries the operation until the kernel gets around to finishing the
job. Developers who cut corners like that are left with an application
which works most of the time, but which can fail silently in unexpected
circumstances. It is far better, these people say, to solve the problem
once at the kernel level so that applications can be simpler and more
robust.
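For readers who have not had to write it, the retry loop in question looks something like the following sketch. The helper name write_all() is an assumption made for illustration; it is not part of any standard library, and real applications may want different handling for non-blocking descriptors.

```c
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/*
 * Write all `len` bytes of `buf` to `fd`, retrying after partial writes
 * and after interruption by a signal. Returns 0 once everything has been
 * written, or -1 if write() fails for any other reason.
 */
int write_all(int fd, const void *buf, size_t len)
{
    const char *p = buf;

    while (len > 0) {
        ssize_t n = write(fd, p, len);

        if (n < 0) {
            if (errno == EINTR)
                continue;       /* interrupted before writing; try again */
            return -1;
        }
        /* write() may have accepted only part of the buffer. */
        p += n;
        len -= (size_t)n;
    }
    return 0;
}
```

Code which calls write() once and ignores the return value will behave identically in nearly every test, which is exactly why the shortcut is so tempting.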
The ext4 situation can be seen as similar: any application developer who
wants to be sure that data has made it to persistent storage must take
extra care to inform the kernel that, yes, that data really does matter.
Developers who skip that step will have applications which work -
almost all the time. One could well argue that, again, the kernel
should take the responsibility of ensuring correctness, freeing
application developers from the need to worry about it.
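What that extra care might look like is sketched below, on the assumption that the application is replacing a file by writing a temporary copy and renaming it into place. The function replace_file_durably() and its parameters are hypothetical names chosen for this example. The important step is the fsync() call before the rename; without it, nothing in POSIX promises that the new contents reach the disk before the new name does.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/*
 * Replace `path` with `len` bytes from `buf`: write a temporary file,
 * force it to stable storage with fsync(), then rename it over the
 * original. Returns 0 on success and -1 on failure, in which case the
 * temporary file is removed.
 */
int replace_file_durably(const char *path, const char *tmp_path,
                         const void *buf, size_t len)
{
    const char *p = buf;
    int fd = open(tmp_path, O_WRONLY | O_CREAT | O_TRUNC, 0644);

    if (fd < 0)
        return -1;

    while (len > 0) {
        ssize_t n = write(fd, p, len);

        if (n < 0) {
            if (errno == EINTR)
                continue;
            goto fail;
        }
        p += n;
        len -= (size_t)n;
    }

    /* Tell the kernel that this data really does matter. */
    if (fsync(fd) != 0)
        goto fail;
    if (close(fd) != 0) {
        unlink(tmp_path);
        return -1;
    }
    /* Only now is it safe to let the new file take over the old name. */
    if (rename(tmp_path, path) != 0) {
        unlink(tmp_path);
        return -1;
    }
    return 0;

fail:
    close(fd);
    unlink(tmp_path);
    return -1;
}
```

Especially careful applications also call fsync() on the containing directory so that the rename itself is durable; that step is left out here for brevity.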
The ext3 filesystem made no such guarantees, but, due to the way its
features interact, ext3 provides something close to a persistence guarantee
in most situations. An ext3 filesystem running under a default
configuration will normally lose no more than five seconds worth of work in
a crash, and, importantly, it is not prone to the creation of zero-length
files in common scenarios. The ext4 filesystem withdrew that implicit
guarantee; unpleasant surprises for users followed.
Now the ext4 developers are faced with a choice. They could stand by their
changes, claiming that the loss of robustness is justified by increased
performance and POSIX compliance. They could say that buggy applications need to be
fixed, even if it turns out that very large numbers of applications need
fixing. Or, instead, they could conclude that Linux should provide a higher level of
reliability, regardless of how diligent any specific application developers might
have been and regardless of what the standards say.
It should be said that the first choice is not entirely unreasonable.
POSIX forms a sort of contract between user space and the kernel. When the
kernel fails to provide POSIX-specified behavior, application developers
are the first to complain. So perhaps they should not object when the
kernel insists that they, too, live up to their end of the bargain.
One could argue that applications which have been written according to the
rules should not take a performance hit to make life easier for the rest.
Besides, this is free software; it would not take that long to fix
up the worst offenders.
But fixing this kind of problem is a classic case of whack-a-mole:
application developers will continually reintroduce similar bugs. The kernel
developers have been very clear that they do not feel bound by POSIX when
the standard is seen to make no sense. So POSIX certainly does not compel
them to provide a lower level of filesystem data robustness than
application developers would like to have. There is a case to be made that
this is a situation where the Linux kernel, in the interest of greater
robustness throughout the system, should go beyond POSIX.
The good news, of course, is that this has already happened. There is a
set of patches queued for 2.6.30 which will provide ext3-like behavior in
many of the situations that have created trouble for early ext4 users.
Beyond that, the ext4 developers are considering an "allocate-on-commit"
mount option which would force the completion of delayed allocations when
the associated metadata is committed to disk, thus restoring ext3 semantics
almost completely. Chances are good that
distributors would enable such an option by default. There would be a
performance penalty, but ext4 should still perform better than ext3, and
one should not underestimate the performance costs associated with lost
data.
In summary: the ext4 developers - like Linux developers in general - do
care about their users. They may complain a bit about sloppy application
developers, standards compliance, and proprietary kernel modules, but
they'll do the right thing in the end.
One should also remember that ext4 is still a very young filesystem; it's not
surprising that a few rough edges remain in places. It is unlikely that we
have seen the last of them.
As a related issue, it has been suggested that the real problem is with the
POSIX API, which does not make the expression of atomicity and durability
requirements easy or natural. It is time, some say, to create an extended
(or totally new) API which handles these issues better. That may well be
true, but this is easier said than done. There are, of course, the
difficulties in designing a new API to last for the next few decades; one
assumes that we are up to that challenge. But will anybody use it?
Consider Linus Torvalds's
response to another suggestion for an API extension:
Over the years, we've done lots of nice "extended functionality"
stuff. Nobody ever uses them. The only thing that gets used is the
standard stuff that everybody else does too.
Application developers will be naturally apprehensive about using
Linux-only interfaces. It is not clear that designing a new API which will
gain acceptance beyond Linux is feasible at this time.
Your editor also points out, hesitantly, that Hans Reiser had designed -
and implemented - all kinds of features designed to allow applications to
use small files in a robust manner for the reiser4 filesystem. Interest in
accepting those features was quite low even before Hans left the scene.
There were a lot of reasons for this, including nervousness about a
single-filesystem implementation and nervousness about dealing with Hans,
but the addition of non-POSIX extensions was problematic in its own right
(see this article for
coverage of this discussion in 2004).
The real answer is probably not new APIs. It is probably a matter of
building our filesystems to provide "good enough" robustness as a default,
with much stronger guarantees available to developers who are willing to do
the extra coding work. Such changes may come hard to filesystem hackers
who have worked to create the fastest filesystem possible. But they will
happen anyway; Linux is, in the end, written by and for its users.
Page editor: Jonathan Corbet