By Jonathan Corbet
September 14, 2011
At the 2011 Linux Plumbers Conference, it was not entirely unusual to hear
complaints that the sessions were not as energetic and discussion-oriented
as they were in previous LPCs.
Chances are good that anybody talking that way did not attend the "Boot and
Init" session, which was an occasion for vigorous - if good-natured -
debate. This article will cover two of the topics discussed there: booting
the system and the init process.
Reworking the boot sequence
According to Harald Hoyer, Linux does not lack for available boot loaders;
indeed, we have far too many of them. These boot loaders are becoming more
complex in unwelcome ways. GRUB and GRUB2, for example, contain
reimplementations of a number of filesystems used in Linux. GRUB
developers work hard to keep up, but they often find themselves one step
behind what is being done in the kernel. GRUB2 has made things worse by
turning its configuration file into a general-purpose scripting language;
that adds a bunch of complexity to the bootstrap process. We also see
battles between distributions (and non-Linux operating systems) over who
controls the master boot record (MBR) on the disk.
Harald had a proposal for improving the situation: rather than add
complexity to boot loaders in an attempt to keep up with the kernel, why
not just boot a simple generic Linux kernel and let it deal with the rest?
His idea is to create a /firstboot directory with a simple
filesystem and populate it with a single Linux kernel and an initramfs
image whose sole purpose is to find the real kernel and boot that.
This kernel will naturally understand Linux filesystems, and it will
support a user space with enough power to run whatever scripts are needed
to find other bootable images on the system. Meanwhile, the initial boot
loader can be made to be as simple as possible and distributions can stop
messing with the MBR.
The idea has some clear appeal, but it was not universally accepted by the
others in the room. To many, it looks like trying to solve the boot
problem by adding an extra level of indirection. In the process, it adds
another kernel bootstrap which, in turn, will make the boot process longer
and, arguably, more likely to fail. It is safe to say that no consensus
was reached in the room; the work will presumably continue and will be
judged on its merits when it is more advanced.
Systemd
The bulk of the time in this session was spent discussing Systemd. Lennart
Poettering talked at length about what has been accomplished in the last
year and where Systemd can be expected to go in the future. Suffice it to say
that, as always, he does not lack ambition or a willingness to
stir things up.
In the last year, Lennart said, we have seen the first release of a
distribution using Systemd by default - Fedora 15. Mandriva has
recently released a Systemd-based version, and more (including openSUSE and
"a couple of others") are in the works. He seemed well pleased with the
adoption of Systemd so far.
Systemd is now able to boot a system without invoking any shells at all.
Under "ideal circumstances" it can get to a running user space less than
one second after startup. Not everybody gets to run under ideal
circumstances; the goal for the rest of us is less than ten seconds. There
are some significant challenges in the way of getting there, though; for
example, just loading the SELinux policy can take a few seconds by itself.
Starting up the logical volume manager (LVM) can also take a while; Lennart
proposes to fix that one by just removing LVM and using the volume
management features in Btrfs instead.
Lennart paused here to make the point that Systemd is now a capable init
system. But that's not where it stops; the plan is for Systemd to be a
platform on which a number of interesting things can be built.
There was a bit of discussion over moving functionality into Systemd. For
example, not everybody is happy with moving the setting of the host name
into the program. Lennart's position is that this task is done with a
single system call; invoking a separate binary for that just isn't
worthwhile. Others disagreed with this assessment. There was similar
disagreement over setting the system clock directly instead of using
hwclock; once again Lennart thinks it is too simple a task to
require a separate program, especially one as filled with legacy cruft as
hwclock is. Scott James Remnant asserted that said cruft is what
makes hwclock actually work for all systems and asked whether
Lennart planned to refuse to support older machines. Lennart's response
was that older hardware is fine; what he is not supporting is older kernels
that lack proper realtime clock drivers.
In general, though, he said that, while Systemd is trying to simplify
common initialization tasks and make them fast, there is nothing preventing
people from using external programs like hwclock if that is what
they want to do. Systemd carefully avoids taking away the ability to use
older tools if that's what's needed. When those tools are not necessary,
though, Systemd aims to be able to completely boot a system with only a
very small number of other packages (such as glibc, D-Bus, and util-linux)
installed. In the process, he hopes to standardize the boot process across
distributions, getting rid of lots of little differences that do not need
to be there.
A moment was spent defending Systemd against the charge of being bloated.
It is not bigger than it needs to be, Lennart said, and embedded developers
can use configuration options to trim it down considerably if there is
functionality that they do not want. Systemd is being picked up by
embedded distributions like Yocto and Ångström.
There are a number of interesting changes coming into Systemd in the near
future. One of those is the elimination of getty processes at
startup time. Instead, Systemd will start a getty on demand if
and when the user switches to a virtual console. The user experience will
be the same, but there will be fewer processes cluttering the system.
All services started by Systemd will have their standard output and error
streams connected to syslog by default. That makes it easier to write
services; it even supports the severity notation used by printk()
in the kernel. The downside is that verbose processes can clog the system
log, but, Lennart said, that should be fixed by shutting those processes
up.
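The severity notation in question is the kernel's "<N>" line prefix, where N runs from 0 (emergency) to 7 (debug). What follows is a minimal sketch of how a log collector might separate such a prefix from a line of service output. The function name and the default level are assumptions made for illustration; this is not Systemd's own code.

```python
import re

# Kernel-style severity prefix: "<N>" with N in 0..7 at the start of a line.
_PREFIX = re.compile(r"^<([0-7])>")

DEFAULT_LEVEL = 6  # assumed default ("info") for lines without a prefix


def split_severity(line, default=DEFAULT_LEVEL):
    """Return (level, message) for one line of service output."""
    match = _PREFIX.match(line)
    if match is None:
        # No recognizable prefix: keep the line intact, use the default level.
        return default, line
    return int(match.group(1)), line[match.end():]
```

A service would then only need to print a line like "<3>disk failed" to have it logged at the error level.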
"Presets" are another upcoming feature. Each distribution has its own
policy regarding whether services should be started by default and where
the exceptions are. Fedora, for example, requires explicit administrator
action to start any service, while Debian tends to assume that, if a
service is installed, it is meant to be run. The preset feature allows the
distributor to encapsulate that policy in a single file outside of the
packages for those services. Spins or derivative distributions can use it
to create a different policy without needing to modify the packages
themselves, and administrators can impose their own policy if they wish.
Further in the future is the idea of using Systemd to manage sessions. The
problems encountered at that level, he said, are quite similar to those
encountered at initialization time. It's really just a matter of starting
a set of programs and keeping track of them. He had hoped to have session
management ready for Fedora 16, but that didn't happen, so the current
target is Fedora 17.
As part of this work, Lennart would really like the kernel to present a
single view of an "application," which can involve any number of processes.
For example, it would be nice to give specific applications access to
certain ports through the firewall. Control groups handle this task
reasonably well, so that is what Systemd is using. He is also trying to
create a unified view of a "session" encompassing its control group,
desktop, login information, PAM credentials, etc.
Specific goals for Fedora 17 include finishing this user session work.
There should also be multi-seat support. Imagine plugging in a USB hub
with keyboard, mouse, audio port, and frame buffer device; Systemd will
pick it up, start a GDM session, and all of it will just work with no configuration
work required at all. This feature will be nice for settings like schools
where one system can easily handle multiple users; he also noted that it
can be highly useful for debugging embedded systems. Once upon a time, all
Unix systems were multi-seat; he is, he said, just bringing back a feature
that was in Unix at the very beginning. One side effect of this work will
be the removal of ConsoleKit.
There was some talk of removing the cron daemon, but that seems
unlikely to happen. What may happen instead is a movement of all the standard
system cron jobs to Systemd, with the result that cron becomes an optional
utility. There was some interesting talk of using wakeup timers to set up
jobs that can actually power up the system to run. Cron
itself is a useful tool with a nice-enough interface, though; there doesn't seem to
be any real reason to replace it. It will probably only be started if
actual configuration files are found.
Finally, there was a bit of talk about Systemd's socket activation
mechanism and security. Evidently "the SELinux folks" (not named in the
discussion) do not like this feature because Systemd represents a third,
uncontrolled process in the connection between client and server. But
Lennart pointed out that
Systemd never reads data from sockets; it simply uses them in the
activation process. And, in any case, Systemd is charged with loading the
SELinux policy in the first place; if it cannot be trusted, the system has
larger problems.
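On the service side, socket activation amounts to inheriting sockets that are already open rather than creating them. The sketch below follows the convention documented for sd_listen_fds(): the LISTEN_PID and LISTEN_FDS environment variables describe the descriptors passed in, which start at descriptor 3. The helper name and its optional parameters are hypothetical, added so the logic can be tried without a running Systemd.

```python
import os

SD_LISTEN_FDS_START = 3  # first inherited descriptor in the sd_listen_fds() convention


def inherited_fds(environ=None, pid=None):
    """Return the descriptor numbers passed in by the service manager.

    An empty list means the process was not socket-activated.
    """
    environ = os.environ if environ is None else environ
    pid = os.getpid() if pid is None else pid
    try:
        # The descriptors are only meant for the process named in LISTEN_PID.
        if int(environ.get("LISTEN_PID", "")) != pid:
            return []
        count = int(environ.get("LISTEN_FDS", ""))
    except ValueError:
        # Missing or malformed variables: treat as "not activated".
        return []
    if count < 0:
        return []
    return list(range(SD_LISTEN_FDS_START, SD_LISTEN_FDS_START + count))
```

The service accepts connections and reads data on those descriptors itself, which fits Lennart's point that Systemd only holds the sockets during activation.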
The overall picture was of a project that is on a roll, gaining features
and users at a fast rate. The Systemd view of the world has not yet won
over everybody, but the opposition seems to be fading. Systemd looks like
the init system of the future (and more) for a lot of high-profile
distributions.
September 14, 2011
This article was contributed by Nathan Willis
Transifex, the web-based
collaborative string translation hub, has rolled out a major update. There
are several new tools and features aimed at developers and translation
teams, but the most fundamental change is that the project is now offering
paid accounts for those who wish to work on closed-source
projects, providing funding that will help further development of the
project.
Development projects can link Transifex to their existing version
control system (VCS), and Transifex will pull in and parse supported file types that contain user-visible strings. On the Transifex site, translators can start language-specific translation efforts, entering translations for each string and (if the project managers allow it) pushing the results back to the original VCS. Project managers can leave their projects relatively free-form, or set up more structured translation teams, with approval required to check changes back in.
We covered Transifex in 2009, and the service has improved considerably since that time. The 1.0 release in 2010 brought the largest change set, adopting an internal storage engine that is agnostic to the upstream VCS used by the project. The Transifex server retrieves files over HTTP, so the upstream files to be translated must be publicly accessible in raw form.
Once files are fetched, the server parses the file, saving the original
version as a template with its initial strings designated as the "source language." Translators are presented with the source language strings, and can enter their translated versions in a web-based editor — which was also new in 1.0. When changes are sent back to the upstream VCS, the Transifex server uses the template as a model, inserting the new strings where appropriate, and modifying the file metadata to indicate new language support and translator identities.
1.1 features
This level of automation requires building support for specific VCSes and new importer/exporter models for each new file type added. All of the major VCSes are supported now, but the number of file formats supported is still growing. The new 1.1 release, which landed in June 2011 — although the public Transifex.net server was not updated at that time — includes several new ones, most notably Freedesktop.org .desktop launcher files and the XML Localization Interchange File Format (XLIFF).
Most of the supported formats are for software development, such as
.po and .pot files for Gettext, .strings files
for Mac OS X and iOS, .resx for Windows, and the various formats
for Android, Java, and Qt applications. Transifex also supports file formats designed for other purposes, however. Support for XHTML, PHP arrays, and YAML enables projects to work on translation of web content, and support for the .srt, .sub, and .sbv media subtitle formats enables video caption translations.
For translators, the biggest new feature of Transifex 1.1 is "translation memory." This is a database of other translations which can be used for reference when editing a new string. Older translations from the specific project are available as a sort of local phrasebook, but the more interesting development is that the translations of other projects are accessible as suggestions, too. That could be particularly helpful when starting out a new project — if there are several possibilities for an uncommon term, it would be useful to see what other projects chose. Transifex presents the translations of similar text culled from among the other hosted projects.
Picky or secretive project managers can disable the cross-project
sharing feature, but still access previously-used suggestions from the
individual project. A primitive version of this feature was available in
Transifex 1.0, but it required making an explicit search query; the
automatic suggestions are simpler to use. A spell-checker is also now
built into the web-based editor, which is especially important because
it auto-saves translation work.
Two new features are available from the project manager side of the interface. First, developers can enter comments on the translatable resources directly within the Transifex web editor. Those comments could include explanatory notes on specific terms, or general instructions for translators.
A bit more interesting is the "pseudo-file" auto-generation feature. Transifex can create translation files in a dummy language, which a developer can then run and use to spot any strings that somehow escaped into the interface without being marked for translation. This type of pseudo-file is called the "Dot language," which substitutes a period for every original character. There is also an option that inserts random characters into the file, which Transifex creator Dimitris Glezos said could prove useful in testing UI layouts.
Putting random characters instead of dots helps the developer see if his strings show up correctly if there are tall characters, by expanding the string by 30% he can see if the UI will show up correctly in languages which tend to use more characters per word than English, etc.
Here's an example:
#: addons/cla/handlers.py:58
msgid "License text"
msgstr "Lïקïcéקénséקé téקéxt"
By "tall characters" Glezos primarily seems to mean accented letters, which are uncommon in English, the source language of most software projects. Extra-long and extra-tall strings have the potential to push interface widgets, menus, and column text out of vertical and horizontal alignment — a problem that can be difficult to test for without changing languages. Of course, a UI that breaks or becomes mis-aligned when the strings are too short is also a possibility, but in most cases the Dot language option would reveal those problems.
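The two pseudo-file ideas can be sketched in a few lines. This is an illustration only, not Transifex's algorithm: the function names, the accent table, and the choice to append the padding at the end of the string (the example above spreads extra characters through the words) are all assumptions.

```python
import math
import random

# Assumed accent table; any "tall" replacement characters would do.
ACCENTED = {"a": "á", "e": "é", "i": "ï", "o": "ö", "u": "ü"}
PADDING_CHARS = "éïöüñ"


def dot_language(text):
    """Substitute a period for every character of the source string."""
    return "." * len(text)


def expand(text, factor=0.3, rng=None):
    """Accent the vowels and lengthen the string by roughly `factor`."""
    rng = rng or random.Random()
    accented = "".join(ACCENTED.get(ch, ch) for ch in text)
    extra = math.ceil(len(text) * factor)
    padding = "".join(rng.choice(PADDING_CHARS) for _ in range(extra))
    return accented + padding
```

Any string that still shows up in plain English after loading the dot_language() output was never marked for translation, while the expand() output stresses the layout with longer and taller text.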
Finally, Transifex has also added an "Explore" interface to the site
itself, which lets visitors browse featured and active projects, in order
to foster community development. Right now, the explore feature highlights
only the most-active and largest projects on the site. If encouraging new
translators to join is part of the goal, hopefully other views will follow
— such as which projects or languages are most in need of help. The
individual project pages currently display this information,
showing the completion-percentage for each
language.
Freemium blend
Transifex was born out of Glezos's efforts to improve translation in the
Fedora project, and the emphasis of the service is still squarely on open
source projects. But 1.1 offers developers the ability to connect private
and proprietary projects to the web service.
There are multiple pricing plans to choose from, which vary in the number of contributors and the number of source words that are allowed in private or proprietary projects. Free accounts can connect to a total of two users and amass 2,000 source words. The 30-Euro-per-month plan ups the numbers to five users and 10,000 words, and the 300-Euro plan to 20 users and 50,000 words. The 300-Euro plan also includes a distinct yourproject.transifex.net subdomain, and priority ticket and telephone support. All of the plans, free and premium, can host an unlimited number of open source projects, with no limits on the number of user accounts associated or the size of the content.
Placing premium-based restrictions on the number of user accounts and the number of words associated with private/proprietary projects seems like an odd choice — after all, one would have to count the total number of words in the strings of a project, which may be hard to predict if it is still in development. But then again, the Transifex server software is available under the GPLv2, so any entity interested in maintaining a large, closed translation effort could simply install the code on its own private server.
In the blog announcement heralding the arrival of 1.1, Transifex noted that in years past, self-hosting has been the project's answer when someone asked about running a private server. It can now offer an alternative, which parent company Indifex indicates will be used to fund further work on the main Transifex code base.
Last words
Compared to the Rosetta online translation editor offered by Launchpad, Transifex appears to be evolving at a faster pace. Rosetta supports a smaller set of file formats (just Gettext and Mozilla .xpi at the moment), although Rosetta has supported translation suggestions based on other projects' strings for several years, and supports maintaining multiple translation efforts for concurrent branches of the same project. Unless I have missed it in the interface, that last feature is not yet supported in Transifex.
It will be interesting to watch how the tiered, freemium price plans affect Transifex's development. Paid users on high-end plans get guaranteed two-day turnaround of support tickets; the crux will be whether that includes feature requests as well as bugs. The Enterprise Edition already supports more file formats and "performance optimizations."
Although Indifex says that revenue from the paid plans will be funneled back into the development of Transifex, we have all seen examples in the past where paid or "enterprise" users either begin to drive the development of new features, or get access to the new features before they are available to the free service or source code repository. Development is already underway for the next Transifex release, of course; given the large number of open source projects that now depend on it, hopefully it will be able to avoid the pitfalls of diverging free/paid interests.
It's hard to believe that it's been almost one year since The Document Foundation (TDF) came
into existence. In that time, the foundation has made significant
progress, Oracle has handed the
OpenOffice.org keys to the Apache Foundation, and the LibreOffice
team has been working hard to improve the suite in the meantime.
OpenOffice.org has, itself, had a long strange trip. The suite began as
a proprietary office suite called StarOffice developed and published by
StarDivision. StarDivision was eventually snapped up by Sun Microsystems,
which was ultimately swallowed by Oracle in 2010. After Oracle took over,
little happened and it was unclear what plans (if any) the software giant
had for OpenOffice.org.
Oracle's inaction, plus
impatience over promises to create a vendor neutral foundation for
OpenOffice.org, led to the decision to fork. Predictably, Oracle was
not pleased and showed
TDF members the door in October, 2010. Louis Suarez-Potts told the
members "your role in the Document Foundation and LibreOffice makes
your role as a representative in the OOo CC untenable and
impossible," and gave them the option of disassociating themselves
from TDF or resigning. Very little else happened with OpenOffice.org in the
meantime until Oracle proposed
OpenOffice.org to Apache as an Incubator project on June 1st.
LibreOffice developers didn't sit on their hands after announcing the
intent to fork. LibreOffice was put on an aggressive time-based release
plan, with two major releases a year. The first stable release (3.3.0)
landed just four months after
the split, with a number
of new features. Development has continued at a fair clip, and the
LibreOffice team continues to push out point releases on a regular
basis. Meanwhile, most if not all Linux distributions have made the
transition from OpenOffice.org to LibreOffice without any major
headaches.
LibreOffice Goals Met?
When LibreOffice launched, longtime OO.org developer Michael Meeks
talked to LWN about the goals for LibreOffice. Meeks said that he wanted
LibreOffice to have an "All Contributions Welcome and Valued"
sign welcoming contributions, to clean up the LibreOffice code, and to "target
tackling many of the problems that have traditionally made it hard to
develop with, such as the arcane and monolithic build system."
In February 2011, the project started
fundraising to set up TDF as a legal entity. It took only
eight days to raise the €50,000 that the foundation sought to
incorporate the legal entity in Germany. More than 2,000 contributors
donated.
At six months, TDF member Florian Effenberger observed
the milestone with a post tallying the project's accomplishments. More
than 6,000 people subscribed to LibreOffice mailing lists, more than 150
new contributors checked in code for LibreOffice, and the project picked up
more than 50 translators as well.
The foundation is holding
its first election, with voting through October 10, to fill
seven board seats and three deputy positions.
How about contributions? A snapshot
of contributors to LibreOffice 3.4.2 shows that about 25% came from
SUSE, about 25% were brought in from OpenOffice.org (attributed to Oracle),
and about 20% from Red Hat. Contributors not affiliated with one of the big
vendors also account for about 25% of the contributions. According to the
post, 3.4.2 received more than 23,000 commits from 300 contributors. This
may not reflect all work on LibreOffice, but it does show a pattern of
heavy contribution.
(Re)-Bootstrapping OpenOffice.org
While LibreOffice continues to churn out releases, the slow work of
transitioning OpenOffice.org to Apache is continuing. The incubator site is up
on Apache.org, and things like the mailing
lists have been put in place. The project has more than
70 committers listed, and commits
have started coming in as well.
However, according to the clutch status page for
Apache Incubator projects the project has not
added any committers since the project was established. The project
also lacks an issue tracker. There are no releases for Apache OpenOffice
— even a beta — though code is
available in Apache's repository. This is not surprising, since much of
the discussion on the list involves trying to successfully build AOO. The
project blog has been relatively
quiet, with only two posts. The first post, in June, announced the
addition of Apache OpenOffice.org to the incubator. The second,
on September 1st, announced an IRC-based developer education event for
building OpenOffice.org on Linux.
The developer
list for the AOO podling has been fairly active — though much of
the recent conversation has been about community
governance problems that need to be solved with regard to moving from
an established project to the Apache structure and new management.
The Future
Apache OpenOffice.org is still putting
together its plans for builds and releases. The plans for the first
Apache release include phasing
out the old binary format for OpenOffice.org but not much in the way of
new features. LibreOffice also
will be doing away with the old binary StarOffice formats in the 4.0
timeframe.
Assuming AOO.org does come online and start pouring out new features,
they may be difficult to
share with LibreOffice according to Meeks. This has been raised
as an issue by Rob Weir on the AOO.org list.
The LibreOffice team recently
had a hackfest in Munich. Some of the concrete features that came out
of that include support for importing Visio format, a feature for editing
headers and footers in Writer, and an initial Gerrit setup for code
review.
The project has also launched
an extension and template repository for LibreOffice and compatible
suites. The sites are in beta testing at the moment, put into place in
cooperation with the Plone community.
In October, the first LibreOffice conference
will take place in Paris. The conference will run from 12 to 15
October, and includes everything from media training for LibreOffice
volunteers to a presentation about LibreOffice Online (LOOL) by Michael
Meeks. Unfortunately, no details are provided regarding the plans Meeks has
for the presentation. Perhaps we'll see a libre competitor to Google Docs
at some point from the LibreOffice folks.
Coming in 3.5
The LibreOffice 3.5.0 release is planned for December. The work-in-progress
release notes indicate some of the features that may appear in
3.5. Currently there's a plan to include two new numbering types for
bullets (Persian words, and Arabic Abjad sequence) in Writer, and to display
non-printable characters at the end of a line if desired.
Calc may increase support to 32,000 sheets thanks to features from
Markus Mohrhard, and users will be able to specify how many sheets are
available in a new Calc document thanks to Albert Thuswaldner. There are also
improvements to line drawing in Chart, and Kohei Yoshida has added some
performance improvements for importing Excel documents.
Miklos Vajna has been improving import for RTF and
DOCX formats, which should land in 3.5 as well. The proposed release
notes also have a few GUI improvements, such as getting
rid of the unused toolbar menus and sorting menus in a natural sort
order (so Heading 10 would follow Heading 9, instead of Heading 1 in
formatting as an example).
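Natural sort order is easy to demonstrate. The following is a minimal sketch of one common way to get it, by comparing runs of digits as numbers; it is not taken from the LibreOffice code, and the function name is an assumption.

```python
import re


def natural_key(name):
    """Sort key that compares runs of digits as numbers, not as text."""
    # re.split() with a capturing group alternates text and digit runs,
    # so the odd-numbered pieces are always the digit runs.
    pieces = re.split(r"(\d+)", name)
    return [int(piece) if index % 2 else piece.lower()
            for index, piece in enumerate(pieces)]


styles = ["Heading 10", "Heading 9", "Heading 1"]
plain = sorted(styles)                      # character-by-character order
natural = sorted(styles, key=natural_key)   # numeric-aware order
```

Here the plain sort places "Heading 10" directly after "Heading 1", while the natural sort places it after "Heading 9".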
One year after the split, LibreOffice looks like a fairly
healthy and viable community. Apache OpenOffice.org may also grow into a
viable project, though it's a bit too early to tell whether it has
legs.
Page editor: Jonathan Corbet
Security
By Jake Edge
September 14, 2011
A longstanding "flaw" (depending on who you talk to) in the Linux security
module (LSM) subsystem is the inability to stack LSMs. That particular
problem came up for discussion at the Linux Security Summit (LSS), which
was held
on September 8 in conjunction with the Linux Plumbers Conference. The
participants were mostly in favor of stacking LSMs; the question
was how to get there.
Allowing administrators to enable more than one LSM at a time has been a recurring problem. Some would like to be able
to mix and match the protections offered by the different security
solutions, but that is not currently possible. In addition, some
specialized security functionality has been proposed at various times, but
typically shunted toward an LSM-based solution. Unfortunately, in most
distributions, the single LSM slot is already occupied by SELinux,
AppArmor, or some other LSM, so separate LSMs with extra protections are of
no use to many administrators. Thus the interest in stacking
(or chaining) LSMs.
Ubuntu security team member Kees Cook and Smack developer Casey Schaufler
led the discussion, with Schaufler presenting a list of directions for LSM
that he jokingly described as "suggestions that we've come up with
and think you should come up with too". The list covered five
items, but all were targeted at problems that stem from having multiple
concurrent (i.e. stacked) LSMs.
Stacking
David Howells proposed a possible solution
for stacking LSMs back in February, but it won't allow two mandatory access
control (MAC) LSMs (e.g. Smack and SELinux) to coexist. Schaufler has
promised another, more
general solution (which he calls "Glass"), but it still doesn't work for
all four LSMs (SELinux, Smack, TOMOYO, and AppArmor) at once, though
"it's really close".
There is a question of why users would want to combine the existing LSMs,
but Schaufler said that there are several combinations that people want to
try. The two that are hardest to get working together (SELinux and Smack)
turn out to be the "only uninteresting combination", he said.
He has heard of users wanting to use Smack and AppArmor (or TOMOYO) at the
same time, as well as folks that want one thing that SELinux can do along
with something else that TOMOYO can do.
As Cook pointed out, though, another likely scenario is that administrators
will want to augment the distribution-provided security framework with
additional restrictions that could come from a specialized LSM. Cook's Yama is one such solution. It restricts
ptrace() and symbolic links in
"sticky" directories in ways that many are in favor of, though it has
not yet made it into the mainline. For
that use case, the idea is to add LSMs like Yama without giving up the
distribution's LSM.
Howells's patches could support fairly simple scenarios like SELinux (or
AppArmor) plus Yama, but Schaufler isn't convinced that restricting the
combinations is the right way forward. There are, however, a bunch of
technical problems that will need to be solved in order to do arbitrary
stacking. Some are fairly straightforward to handle, like how to share the
security context "blob" between multiple LSMs, but others are more difficult.
If there is a stack of LSMs, what happens when one LSM chooses
to deny access? If the other LSMs in the stack are bypassed because of that
denial decision, they may get an incomplete picture of the accesses being
requested. Howells's patch does short-circuit other LSMs that way, but at
the time it was proposed Schaufler was
concerned about LSMs that collect statistical information on accesses that
would factor into subsequent access decisions.
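The difference between the two behaviors can be shown with a toy model. This is only a sketch of the trade-off, not kernel code and not the structure of Howells's patches; the hook functions and the counting module are hypothetical.

```python
class CountingModule:
    """Hypothetical module that allows everything but counts what it sees."""

    def __init__(self):
        self.seen = 0

    def check(self, request):
        self.seen += 1
        return True


def deny_everything(request):
    """Hypothetical module hook that refuses every request."""
    return False


def allowed_short_circuit(hooks, request):
    """Stop at the first denial; later modules never see the request."""
    for hook in hooks:
        if not hook(request):
            return False
    return True


def allowed_call_all(hooks, request):
    """Consult every module, then deny if any of them denied."""
    results = [hook(request) for hook in hooks]
    return all(results)
```

Both functions return the same verdict, but with short-circuiting a statistics-gathering module placed after a denying one never learns that the access was attempted.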
Security IDs
But the biggest problem area is with Security IDs (secids). These
are 32-bit identifiers used by LSMs (currently only SELinux and Smack) to
identify security contexts when callbacks are made from other subsystems
(notably audit and networking) for access decisions. But, if both are
active, the secid space needs to be shared somehow. There are two
ways to do that, Schaufler said, "one isn't good, the other is
painful". Essentially, you could either split the secid
into two 16-bit pieces (the not good choice), one for each LSM (which, of course, opens the
question of what to do for three or more secid-using LSMs), or you
could set up some kind of mapping where each LSM had its own secid
space and those get mapped to a value in a shared space (the painful choice).
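Both options can be sketched briefly. The code below is an illustration of the arithmetic only, not a proposal that was shown at the summit; every name in it is hypothetical, and reserving the shared value 0 is an assumption.

```python
SECID_BITS = 16
SECID_MASK = (1 << SECID_BITS) - 1


def pack_secids(first, second):
    """The "not good" option: two 16-bit secids in one 32-bit value."""
    for value in (first, second):
        if not 0 <= value <= SECID_MASK:
            raise ValueError("secid does not fit in 16 bits")
    return (first << SECID_BITS) | second


def unpack_secids(combined):
    """Recover the two 16-bit halves of a packed secid."""
    return combined >> SECID_BITS, combined & SECID_MASK


class SecidMap:
    """The "painful" option: map per-LSM secids into a shared space."""

    def __init__(self):
        self._shared_by_tuple = {}
        self._tuple_by_shared = {}

    def shared_for(self, per_lsm_secids):
        """Return the shared secid for a tuple of per-LSM secids."""
        key = tuple(per_lsm_secids)
        if key not in self._shared_by_tuple:
            shared = len(self._shared_by_tuple) + 1  # assumed: 0 is reserved
            self._shared_by_tuple[key] = shared
            self._tuple_by_shared[shared] = key
        return self._shared_by_tuple[key]

    def lookup(self, shared):
        """Return the per-LSM secids behind a shared secid."""
        return self._tuple_by_shared[shared]
```

Packing halves each LSM's identifier space and has no room for a third LSM; the map keeps the full 32 bits for each LSM and accepts tuples of any length, at the cost of a table that must be maintained and consulted on every callback.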
Alternatively, Schaufler advocates getting rid of the secids
entirely and using security blob pointers everywhere. There is at least one
major stumbling block to that plan, however, as getting a blob pointer into
the networking code will be somewhere between difficult and impossible.
Networking maintainer David Miller is adamantly opposed to putting such a
pointer into the sk_buff structure, and he NAKed that when it was
proposed earlier. There are some lifecycle management and performance
issues that Miller is concerned about, according to Paul Moore. In fact,
Moore is pretty confident that Miller hasn't changed his mind, as he
offered a "case of your favorite scotch" to anyone who could
convince Miller to add the pointer to sk_buffs.
According to Schaufler, LSM stacking is clearly needed, particularly in the
embedded space. In addition, without the ability to stack LSMs, people are
becoming discouraged from writing new, more specialized LSMs. While
Schaufler believes there are various access restrictions that can't be done
using the existing LSMs, the SELinux folks (Stephen Smalley in particular)
are not so sure. That said, though, Smalley is not opposed to something
that would allow stacking Yama with SELinux (for example). Rather than
trying to get a fully general stacking method into the mainline, Cook
suggested that a "trimmed-down approach", along the lines of
what Howells proposed, be tried instead.
Other multi-LSM wrinkles
There are other things that need to be worked out in any multi-LSM
scenario, including what to do about /proc/PID/attr/current.
According to Schaufler (with the agreement of AppArmor maintainer John
Johansen), the LSMs that came after SELinux made a mistake by reusing the
current file to contain information on the security context of the
process. Because stacking wasn't allowed, there was no real reason not to
reuse that file, but now it could cause problems.
One possibility is to include the name of the LSM in the path somewhere
(e.g. /proc/PID/attr/smack-current), but that isn't a complete
solution because existing user-space programs expect to find
current. Cook suggested that whichever LSM gets loaded first gets
current (in addition to its LSM-specific file). Or "out of
respect for our elders", all LSMs could defer to SELinux for
current, Schaufler said.
That leads to a related problem: determining which LSM is active (or LSMs
are active in a multi-LSM world). Currently, each LSM has its own ad
hoc method for user space to figure out whether it is running. Adding
a /sys/kernel/security/lsm file with the names of any active LSMs
in it would help. Any LSM that is "actively enforcing policy"
(e.g. not SELinux in permissive or disabled modes) would add itself in the
order in which the LSM was loaded.
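If such a file existed, user space could discover the active LSMs with very little code. The sketch below assumes a format the discussion did not specify (names separated by commas or whitespace, in load order); the function names are invented for illustration.

```python
import re
from pathlib import Path

# Path taken from the proposal; the file format is an assumption.
LSM_LIST_PATH = Path("/sys/kernel/security/lsm")


def parse_lsm_list(text):
    """Split a list of LSM names on commas or whitespace, keeping order."""
    return [name for name in re.split(r"[,\s]+", text.strip()) if name]


def active_lsms(path=LSM_LIST_PATH):
    """Return the active LSM names, or an empty list if the file is absent."""
    try:
        return parse_lsm_list(Path(path).read_text())
    except OSError:
        return []
```

A tool like ls -Z could call active_lsms() once and then decide which module's context to display, rather than probing for each LSM in its own ad hoc way.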
Another related problem is the lack of consistency in
/sys/kernel/security that led to the suggestion of an "LSMKit"
(which was greeted with laughter when Cook jokingly suggested it). There
are a number of tools that display security context information
(e.g. ls -Z) that will be confused in a multi-LSM world.
Creating a library that would enumerate the active LSMs and gather up the
relevant context information would simplify those tools, as well as
providing some consistency of the kind of information that gets presented.
Those gathered at the summit seemed favorably disposed toward that idea,
though it is unclear if anyone will actually have the time to work on it.
Schaufler noted that the general agreement about the need for LSM stacking
was new. It is "the first time no one has stood up and said 'This is
an abomination'", he said. But, Smalley said that didn't mean that
he thinks it's a good idea either. Basically, he said that
"arbitrary composition [of security modules] is known to be a bad
thing", but that he recognizes some will still want to be able to do
it. As long as full-fledged security frameworks like SELinux and AppArmor
can live in "their own separate worlds", he is not opposed to
having some way to compose LSMs.
But, Smalley still thinks that there could be a single LSM that is used by
everyone. Getting there is a matter of understanding all of the
requirements that are being solved by various LSMs and incorporating them
all into one. Schaufler is skeptical of that approach, and believes that
it is "beyond us technologically" to fully understand all of
the requirements that are or will be needed. Good solutions tend to come
along periodically, he said, and we should have ways to accommodate them.
Wrapping up
Debian currently only compiles one LSM (SELinux) into its kernel due to the
memory that gets wasted by the unused code for inactive LSMs. Cook brought this
issue up because he would like to see Debian kernels build in more LSMs and
allow users to choose which to activate at boot time. It is a "tiny"
amount of memory, according to Cook, but Debian is unwilling to add any
more LSMs until there is some way to recover the lost memory.
At first, there was concern that the idea was to return to the days where
LSMs were actually kernel modules that could be loaded and unloaded (which
caused innumerable problems when the active LSM was unloaded). But Cook
said all that was really needed was a way to unload all but the active
LSM. As long as this unloading mechanism didn't
touch the active LSM, and the feature itself was optional, no one seemed
to object to it. So it is mostly just a matter of someone finding the
time to write the code.
The fate of Yama was the last thing discussed in the LSM roundtable. The
protections that it offers are valuable; several people in the room
said they would enable it if it were in the mainline (and the stacking
problem were solved). But, seemingly, no matter how Cook structures the
code (in the core or as an LSM), it gets NAKed, partly because it does not
represent a coherent security framework as the existing LSMs do.
Part of the concern is that LSMs would become a "dumping ground" for
various security fixes/enhancements that are not deemed acceptable for core
kernel code. Smalley wanted to ensure that there was not a proliferation
of small, specialized LSMs and would instead like to see Yama become the
LSM for discretionary access control (DAC) enhancements. Any other
proposals for those kinds of changes could be pushed toward Yama, rather
than creating a whole new LSM.
That idea seemed to gain some traction but, unless Yama gets into the
mainline, it's a moot point. There was some discussion of Christoph
Hellwig's NAK that kept Yama out the last time it came up. Smalley and
others are not really convinced that his NAK is valid unless Yama touches
VFS internals (which it doesn't). Cook pointed out that the
ptrace() restrictions can't be done with any of the existing LSMs
and that the symlink restrictions are "provably correct", but
there is no path into the kernel that he's found. At this point, the plan
seems to be to propose Yama again, perhaps as the "enhanced DAC" LSM, and
to try to overcome any NAKs by better explaining the benefits Yama
provides. The clear sense was that a more concerted effort would be made
to get Yama into the mainline in the near future.
[ I would like to thank all LWN subscribers for travel assistance to attend the security
summit. ]
Brief items
Concluding with the massive expansion of surveillance since 9/11, the
report delves into the many ways the government now spies on Americans
without any suspicion of wrongdoing, from warrantless wiretapping to cell
phone location tracking - but with little to show for it. "The reality is
that as governmental surveillance has become easier and less constrained,
security agencies are flooded with junk data, generating thousands of false
leads that distract from real threats," the report says.
-- The American Civil Liberties Union (ACLU) previews its A Call to Courage:
Reclaiming Our Liberties Ten Years After 9/11 report
So much intercepted information is now being collected from "enemies" at
home and abroad that, in order to store it all, the agency [US National
Security Agency] last year began
constructing the ultimate monument to eavesdropping. Rising in a remote
corner of Utah, the agency's gargantuan data storage center will be 1
million square feet, cost nearly $2 billion and likely be capable of
eventually holding more than a yottabyte of data — equal to about a
septillion (1,000,000,000,000,000,000,000,000) pages of text.
-- James Bamford
The Linux Foundation has announced that Linux.com and LinuxFoundation.org accounts have been compromised. It believes the breach is connected to the kernel.org compromise.
"As with any intrusion and as a matter of caution, you should consider the
passwords and SSH keys that you have used on these sites compromised. If you
have reused these passwords on other sites, please change them immediately.
We are currently auditing all systems and will update public statements when
we have more information. [...] We have taken all Linux Foundation servers offline to do complete
re-installs. Linux Foundation services will be put back up as they become
available. We are working around the clock to expedite this process and are
working with authorities in the United States and in Europe to assist with
the investigation."
New vulnerabilities
audacious-plugins: unspecified vulnerability
Package(s): audacious-plugins
CVE #(s): (none listed)
Created: September 12, 2011
Updated: September 19, 2011
Description:
Fedora added a patch to use the system's libmodplug library.
chromium-browser: multiple vulnerabilities
Package(s): chromium-browser
CVE #(s): CVE-2011-2359, CVE-2011-2800, CVE-2011-2818
Created: September 12, 2011
Updated: September 15, 2011
Description:
From the Debian advisory:
CVE-2011-2818: Use-after-free vulnerability in Google Chrome allows remote attackers to cause a denial of service or possibly have unspecified other impact via vectors related to display box rendering.
CVE-2011-2800: Google Chrome before allows remote attackers to obtain potentially sensitive information about client-side redirect targets via a crafted web site.
CVE-2011-2359: Google Chrome does not properly track line boxes during rendering, which
allows remote attackers to cause a denial of service or possibly have
unspecified other impact via unknown vectors that lead to a "stale pointer."
cyrus-imapd: remote code execution
Package(s): cyrus-imapd
CVE #(s): CVE-2011-3208
Created: September 14, 2011
Updated: October 24, 2011
Description:
The cyrus-imapd daemon suffers from a buffer overflow that may be exploitable for code execution by remote attackers.
hplip: symlink attack
Package(s): hplip
CVE #(s): CVE-2011-2722
Created: September 12, 2011
Updated: February 21, 2013
Description:
From the Red Hat bugzilla:
A temporary file handling flaw was reported in prnt/hpijs/hpcupsfax.cpp,
the hplip HP CUPS filter. Because a predictable temporary filename is used
(/tmp/hpcupsfax.out), an attacker could use a symlink attack to overwrite an
arbitrary file with the privileges of the process running the HP CUPS fax
filter.
kernel: multiple vulnerabilities
Package(s): kernel
CVE #(s): CVE-2011-2723, CVE-2011-2928, CVE-2011-3188, CVE-2011-3191
Created: September 9, 2011
Updated: November 28, 2011
Description:
From the Debian advisory:
CVE-2011-2723: Brent Meshier reported an issue in the GRO (generic receive offload) implementation. This can be exploited by remote users to create a denial of service (system crash) in certain network device configurations.
CVE-2011-2928: Timo Warns discovered that insufficient validation of Be filesystem images could lead to local denial of service if a malformed filesystem image is mounted.
CVE-2011-3188: Dan Kaminsky reported a weakness of the sequence number generation in the TCP protocol implementation. This can be used by remote attackers to inject packets into an active session.
CVE-2011-3191: Darren Lavender reported an issue in the Common Internet File System (CIFS). A malicious file server could cause memory corruption leading to a denial of service.
kernel: denial of service
Package(s): kernel
CVE #(s): CVE-2011-2482, CVE-2011-2519
Created: September 9, 2011
Updated: September 14, 2011
Description:
From the Scientific Linux advisory:
A NULL pointer dereference flaw was found in the Linux kernel's Stream
Control Transmission Protocol (SCTP) implementation. A remote attacker
could send a specially-crafted SCTP packet to a target system, resulting in
a denial of service. (CVE-2011-2482, Important)
A flaw was found in the way the Linux kernel's Xen hypervisor
implementation emulated the SAHF instruction. When using a
fully-virtualized guest on a host that does not use hardware assisted
paging (HAP), such as those running CPUs that do not have support for (or
those that have it disabled) Intel Extended Page Tables (EPT) or AMD
Virtualization (AMD-V) Rapid Virtualization Indexing (RVI), a privileged
guest user could trigger this flaw to cause the hypervisor to crash.
(CVE-2011-2519, Moderate)
kernel: denial of service
Package(s): kernel
CVE #(s): CVE-2011-2699
Created: September 14, 2011
Updated: November 28, 2011
Description:
The IPv6 stack was found to be using predictable fragment identification numbers, allowing an attacker to run the server out of memory.
librsvg2: arbitrary code execution
Package(s): librsvg2
CVE #(s): CVE-2011-3146
Created: September 12, 2011
Updated: October 5, 2011
Description:
From the Red Hat bugzilla:
A NULL pointer dereference flaw was reported [1] by Sauli Pahlman in librsvg.
If a program linked to librsvg were to open a crafted SVG file, it could cause
that application to crash or potentially execute arbitrary code.
mantis: local file inclusion/cross-site scripting
Package(s): mantis
CVE #(s): CVE-2011-3357, CVE-2011-3358
Created: September 12, 2011
Updated: November 9, 2012
Description:
From the Debian advisory:
Several vulnerabilities were found in Mantis, a web-based bug
tracking system: Insufficient input validation could result in local
file inclusion and cross-site scripting.
openssl: certification error
Package(s): openssl
CVE #(s): CVE-2011-3207
Created: September 12, 2011
Updated: October 27, 2011
Description:
From the Red Hat bugzilla:
Under certain circumstances OpenSSL's internal certificate verification
routines can incorrectly accept a CRL whose nextUpdate field is in the past.
openssl: key disclosure
Package(s): openssl
CVE #(s): CVE-2011-1945
Created: September 14, 2011
Updated: October 5, 2011
Description:
The openssl ECDHE_ECDSA cipher suite is vulnerable to timing attacks that make it easier to determine private keys.
phpMyAdmin: cross-site scripting
Package(s): phpMyAdmin
CVE #(s): CVE-2011-3181
Created: September 13, 2011
Updated: October 21, 2011
Description:
From the CVE entry:
Multiple cross-site scripting (XSS) vulnerabilities in the Tracking feature in phpMyAdmin 3.3.x before 3.3.10.4 and 3.4.x before 3.4.4 allow remote attackers to inject arbitrary web script or HTML via a (1) table name, (2) column name, or (3) index name.
pure-ftpd: directory traversal
Package(s): pure-ftpd, pure-ftpd-debuginfo
CVE #(s): CVE-2011-3171
Created: September 9, 2011
Updated: September 14, 2011
Description:
From the SUSE advisory:
A local attacker could overwrite local files when the OES
remote server feature of pure-ftpd is enabled due to a
directory traversal.
quassel: denial of service
Package(s): quassel
CVE #(s): (none listed)
Created: September 12, 2011
Updated: September 14, 2011
Description:
From the Ubuntu advisory:
It was discovered that Quassel did not properly handle CTCP requests. A
remote attacker could exploit this to cause a denial of service via
application crash.
Page editor: Jake Edge
Kernel development
Brief items
The current development kernel is 3.1-rc6,
released on September 14. Things
continue to move slowly in the absence of kernel.org, so there aren't that
many changes this time around. "Nothing really stands out. Have at
it, and let us know of any outstanding regressions." The repository
is still hosted at Github, naturally.
Stable updates: no stable updates have been released in the last
week.
In short, spatch files can be used on target directories to
generate patches. spdiff can read a patch file and generate an
spatch file for you. What this means for the backporting world is
if you backport one evolutionary change in the Linux kernel for one
driver you can then backport the same change for *all*
drivers. This is a quantum leap in terms of effort required to
backport.
-- Luis Rodriguez makes backporting easy
I'm not sure derivative works law is quite so clear cut, but then
'provide a clear concise definition of derivative works' appears to
be the legal version of The Goldbach Conjecture.
-- Alan Cox
By Jonathan Corbet
September 13, 2011
The security problems at kernel.org have raised concerns about the kernel
source and other software hosted there. There has been no evidence, so
far, that kernel.org was used to distribute any corrupted software. But
there is another aspect
to this break-in: kernel.org is "down for maintenance" and there is no word
as to when it might come back. As a result, even if no malware was
distributed, the kernel.org crack represents a denial of service attack of
significant proportions.
Linus has released two 3.1-rc versions from a temporary site at Github, but
there's not a lot of work to be found there. Among other
things, the loss of all the repositories hosted on kernel.org means that
there is relatively little for him to pull. Stephen Rothwell, meanwhile,
continues to pull the trees he can reach to create linux-next. He is able
to report integration and build problems, but cannot put the tree where others can reach it.
"Besides, I am having a nice restful time." There have been no
stable tree updates since kernel.org went down.
Alternative trees are beginning to pop up across the net as developers find
other places to host their work for now. If the kernel.org outage
continues for some time, we can expect to see many more of those show up -
though some developers are refusing to set
up alternative repositories.
Most of the substitute trees are described as temporary; it will be
interesting to see how many of them actually move back to kernel.org once
this episode has run its course. Some developers may decide that keeping
their trees elsewhere works better for them.
We may have a distributed source control system, but it has become clear
that the kernel community works with a rather centralized hosting and distribution
infrastructure.
The loss of kernel.org has slowed things enough to make it
clear that the process has a single point of failure built into it.
Whether that is worth fixing is not entirely clear; no code should have
been lost and, if kernel.org were ever to disappear permanently, the
process could be back to full speed on other systems in short order. For
now, though, we're seeing things disrupted in a way few other events have
been able to manage. It's interesting to ponder what would have
happened had the compromise come out during the merge window.
Kernel development news
By Jonathan Corbet
September 13, 2011
Almost every service offered by Google is delivered over the Internet, so
it makes sense that the company would have an interest in improving how the
net performs. The networking session at the 2011 Linux Plumbers Conference
featured presentations from three Google developers, each of whom had a
proposal for a significant implementation change. Between the three, it
seems, there is still a lot of room for improvement in how we do
networking.
Proportional rate reduction
The "congestion window" is a TCP sender's idea of how much data it can have
in flight to the other end before it starts to overload a link in the middle.
Dropped packets are often a sign that the congestion window is too large,
so TCP implementations normally reduce the window significantly when loss
happens. Cutting the congestion window will reduce performance, though; if
the packet loss was a one-time event, that slowdown will be entirely
unnecessary. RFC 3517
describes an algorithm for bringing the connection up to speed quickly
after a lost packet, but, Nandita Dukkipati says, we can do better.
According to Nandita, a large portion of the network sessions involving
Google's servers
experience losses at some point; the ones that do can take 7-10 times
longer to complete. RFC 3517 is part of the problem. This algorithm
responds to a packet loss by immediately cutting the congestion window in
half; that means that the sending system must, if the congestion window had
been full at the time of the loss, wait for ACKs for half of the in-transit
packets before transmitting again. That causes the sender to go silent for
an extended period of time. It works well enough in simple cases (a single
packet lost in a long-lasting flow), but it tends to clog up the works when
dealing with short flows or extended packet losses.
Linux does not use strict RFC 3517 now; it uses, instead, an enhancement
called "rate halving." With this algorithm, the congestion window is not
halved immediately. Once the connection goes into loss recovery, each
incoming ACK (which will typically acknowledge the receipt of two packets
at the other end) will cause the congestion window to be reduced by a
single packet. Over the course of one full set of in-flight packets, the
window will be cut in half, but the sending system will continue to
transmit (at a lower rate) while that reduction is happening. The result
is a smoother flow and reduced latency.
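The difference between the two behaviors shows up clearly in a toy simulation. The sketch below is a deliberately simplified model, not the kernel's implementation: it counts whole packets, assumes every ACK covers two packets, ignores the retransmission of the lost packet itself, and uses function names invented for illustration.

```python
def immediate_halving(cwnd, acks, per_ack=2):
    """Packets sent after each ACK when the window is cut in half at once."""
    target = cwnd // 2
    pipe = cwnd  # packets in flight when the loss is detected
    sent = []
    for _ in range(acks):
        pipe -= per_ack                    # the ACK removes packets from flight
        can_send = max(0, target - pipe)   # silent until pipe drops below target
        pipe += can_send
        sent.append(can_send)
    return sent


def rate_halving(cwnd, acks, per_ack=2):
    """Packets sent after each ACK when the window shrinks by one per ACK."""
    target = cwnd // 2
    pipe = cwnd
    sent = []
    for _ in range(acks):
        pipe -= per_ack
        cwnd = max(target, cwnd - 1)       # gradual reduction toward half
        can_send = max(0, cwnd - pipe)
        pipe += can_send
        sent.append(can_send)
    return sent
```

Starting from a window of 20 packets, the first function sends nothing for five ACKs and then two packets per ACK, while the second sends one packet per ACK throughout; both end with ten packets in flight.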
But rate halving can be improved upon. The ACKs it depends on are
themselves subject to loss; an extended loss can cause significant
reduction of the congestion window and slow recovery. This algorithm also
does not even begin the process of raising the congestion window back to
the highest workable value until the recovery process is complete. So it
can take quite a while to get back up to full speed.
The proportional rate reduction algorithm takes a different approach. The
first step is to calculate an estimate for the amount of data still in
flight, followed by a calculation of what, according to the congestion
control algorithm in use, the congestion window should now be. If the
amount of data in the pipeline is less than the target congestion window,
the system just goes directly into the TCP slow start algorithm to bring
the congestion window back up. Thus, when the connection experiences a
burst of losses, it will start trying to rebuild the congestion window
right away instead of creeping along with a small window for an extended
period.
If, instead, the amount of data in flight is at least as large as the new
congestion window, an algorithm
similar to rate halving is used. The actual reduction is calculated
relative to the new congestion window, though, rather than being a strict
one-half cut. For both large and small losses, the emphasis on using
estimates of the
amount of in-flight data instead of counting ACKs is said to make recovery
go more smoothly and to avoid needless reductions in the congestion window.
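The two branches described above can be sketched as a per-ACK calculation. This follows the general shape of the algorithm as it was later published in RFC 6937, not the code presented at the session; the class name, the packet-based units, and the assumption that the sender always transmits everything it is allowed to are simplifications made here. The boundary case where the in-flight data equals the target is handled as in the RFC.

```python
class PRRState:
    """Per-recovery bookkeeping; all quantities are counted in packets."""

    def __init__(self, ssthresh, recover_fs, mss=1):
        self.ssthresh = ssthresh      # target congestion window after the loss
        self.recover_fs = recover_fs  # data in flight when recovery started
        self.mss = mss
        self.prr_delivered = 0        # data delivered to the receiver so far
        self.prr_out = 0              # data sent so far during recovery

    def on_ack(self, delivered, pipe):
        """Return how many packets may be sent in response to this ACK."""
        self.prr_delivered += delivered
        if pipe > self.ssthresh:
            # Proportional part: send ssthresh/recover_fs packets for each
            # packet delivered (integer ceiling of the running total).
            allowed = -((-self.prr_delivered * self.ssthresh) // self.recover_fs)
            sndcnt = allowed - self.prr_out
        else:
            # Too little in flight: grow back toward the target, slow-start style.
            limit = max(self.prr_delivered - self.prr_out, delivered) + self.mss
            sndcnt = min(self.ssthresh - pipe, limit)
        sndcnt = max(sndcnt, 0)
        self.prr_out += sndcnt  # assumes everything allowed is actually sent
        return sndcnt
```

With a target window of 5 and 10 packets in flight at the start of recovery, the proportional branch releases one packet for every two delivered; when the pipe has drained to 2, the other branch lets the sender put two packets out at once.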
How much better is it? Nandita said that Google has been running
experiments on some of its systems; the result has been a 3-10% reduction
in average latency. Recovery timeouts have been reduced by 5%. This
code is being deployed more widely on Google's servers; it also has been
accepted for merging during the 3.2 development cycle. More information
can be found in this
draft RFC.
TCP fast open
Opening a TCP connection requires a three-packet handshake: a SYN packet
sent by the client, a SYN-ACK response from the server, and a final ACK
from the client. Until the handshake is complete, the link can carry no
data, so the handshake imposes an unavoidable startup latency on every
connection. But what would happen, asked Yuchung Cheng, if one were to
send data with the handshake packets? For simple transactions - an HTTP
GET request followed by the contents of a web page, for example - sending
the relevant data with the handshake packets would eliminate that latency.
The result of this thought is the "TCP fast open" proposal.
RFC 793 (describing TCP)
does allow data to be passed with the handshake packets, with the proviso
that the data not be passed to applications until the handshake completes.
One can consider fudging that last requirement to speed the process of
transmitting data through a TCP connection, but there are some hazards to
be dealt with. An obvious problem is the amplification of SYN flood
attacks, which are bad enough when they only involve the kernel; if each
received SYN packet were to take up application resources as well, the
denial of service possibilities would be significantly worse.
Yuchung described an approach to fast open which is intended to get
around most of the problems. The first step is the creation of a
per-server secret which is hashed with information from each client to
create a per-client cookie. That cookie is sent to the client as a special
option on an ordinary SYN-ACK packet; the client can keep it and use it for
fast opens in the future. The requirement to get a cookie first is a low
bar for the prevention of SYN flood attacks, but it does make things a
little harder. In addition, the server's secret is changed relatively
often, and,
if the server starts to see too many connections, fast open will simply be
disabled until things calm down.
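The cookie scheme can be sketched as follows. The talk, as reported here, did not say which hash is used or what client information goes into it; this sketch substitutes HMAC-SHA256 over the client's IP address and truncates the result to eight bytes, and the CookieJar class is an invented name.

```python
import hashlib
import hmac
import os


class CookieJar:
    """Server-side cookie generation from a rotating secret (illustrative)."""

    COOKIE_LEN = 8  # assumption: truncate the MAC to fit in a TCP option

    def __init__(self, secret=None):
        self.secret = secret if secret is not None else os.urandom(16)

    def rotate(self):
        """Replace the secret; all previously issued cookies become invalid."""
        self.secret = os.urandom(16)

    def cookie_for(self, client_ip):
        mac = hmac.new(self.secret, client_ip.encode(), hashlib.sha256)
        return mac.digest()[: self.COOKIE_LEN]

    def is_valid(self, client_ip, cookie):
        # Constant-time comparison avoids leaking how many bytes matched.
        return hmac.compare_digest(self.cookie_for(client_ip), cookie)
```

Because the cookie is derived rather than stored, the server keeps no per-client state; rotating the secret is all it takes to revoke every outstanding cookie.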
One remaining problem is that about 5% of the systems on the net will drop
SYN packets containing unknown options or data. There is little to be done
in this situation; TCP fast open simply will not work. The client must
thus remember cases where the fast-open SYN packet did not get through and
just use ordinary opens in the future.
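The client-side bookkeeping this implies is small. The following is a minimal sketch of such a cache, with an invented class name and a plain string standing in for the server's address; a real implementation would live in the kernel and would presumably expire its entries.

```python
class FastOpenCache:
    """Client-side memory of which servers accept fast-open SYNs (hypothetical)."""

    def __init__(self):
        self._cookies = {}     # server address -> cookie bytes
        self._blocked = set()  # servers where a fast-open SYN was lost

    def store_cookie(self, server, cookie):
        if server not in self._blocked:
            self._cookies[server] = cookie

    def record_dropped_syn(self, server):
        """Remember that the path to this server eats fast-open SYNs."""
        self._blocked.add(server)
        self._cookies.pop(server, None)

    def cookie_for(self, server):
        """Return a cookie to use, or None to fall back to an ordinary open."""
        if server in self._blocked:
            return None
        return self._cookies.get(server)
```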
Fast open will not happen by default; applications on both ends of the
connection must specifically request it. On the client side, the
sendto() system call is used to request a fast-open connection;
with the new MSG_FAST_OPEN flag, it functions like the combination
of connect() and sendmsg(). On the server side, a
setsockopt() call with the TCP_FAST_OPEN option will
enable fast opens. Either way, applications need not worry about dealing
with the fast-open cookies and such.
In Google's testing, TCP fast open has been seen to improve page load times
by anything between 4% and 40%. This technique works best in situations
where the round trip time is high, naturally; the bigger the latency, the
more value there is in removing it. A patch implementing this feature will
be submitted for inclusion sometime soon.
Briefly: user-space network queues
While the previous two talks were concerned with improving the efficiency
of data transfer over the net, Willem de Bruijn is concerned with network
processing on the local host. In particular, he is working with high-end
hardware: high-speed links, numerous processors, and, importantly, smart
network adapters that can recognize specific flows and direct packets to
connection-specific queues. By the time the kernel gets around to thinking
about a given packet at all, it will already be sorted into the proper
place, waiting for the application to ask for the data.
Actual processing of the packets will happen in the context of the
receiving process as needed. So it all happens in the right context and on
the right CPU; intermediate processing at the software IRQ level will be
avoided. Willem even described a new interface whereby the application
would receive packets directly from the kernel via a shared memory
segment.
In other words, this talk described a variant of the network channels
concept, where packet processing is pushed as close to the application as
possible. There are numerous details to be dealt with, including the usual
hangups for the channels idea: firewall processing and such. The proposed
use of a file in sysfs to pass packets to user space also seems unlikely to
pass review. But this work may eventually reach a point where it is
generally useful; those who are interested can find the patches on the unetq page.
By Jonathan Corbet
September 14, 2011
As Linaro's CTO, David Rusling spends a lot of time observing the
interactions between the ARM architecture and the mainline kernel
development community. In his Linux Plumbers Conference 2011 keynote,
David made the point that ARM's diversity is behind many of the problems
that have made themselves felt in recent years. Much is being done to
align the ARM community with how the kernel works, but the kernel, too, is
going to have to change if it will successfully address the challenges
posed by increasingly diverse hardware.
David started with a brief note to the effect that he dislikes the
"embedded" term. If a system is connected to the Internet, he said, it is
no longer embedded. Now that everything is so connected, it is time to
stop using that term, and time to stop having separate conferences for
embedded developers. It's all just Linux now.
ARM brings diversity
ARM is a relative newcomer to the industry, having been born in 1990 as
part of a joint venture between Acorn, VLSI, and Apple. The innovative
aspect to ARM was its licensing model; rather than being a processor
produced by a single manufacturer, ARM is a processor design that is
licensed to many manufacturers. The overall architecture for systems built
around
ARM is not constrained by that license, so each vendor creates its own
platform to meet its particular needs. The result has been a lot of
creativity and variety in the hardware marketplace, and a great deal of
commercial success. David estimated that each attendee in the room was
carrying about ten ARM processors; they show up in phones (several of them,
not just "the" processor), in disk controllers, in network interfaces,
etc.
Since each vendor can create a new platform (or more than one), there is no
single view of what makes an ARM processor. Developers working with ARM
usually work with a single vendor's platform and tend not to look beyond
that platform. They are also working under incredibly tight deadlines;
four months from product conception to availability on the shelves is not
uncommon. There is a lot of naivety about open source software, its
processes, and the licensing. In this setting, David said, fragmentation
was inevitable. Linaro was formed in response, in an attempt to help
the ARM community work better with the kernel development community; its
prime mission is to bring about some consolidation in the ARM code base.
Beyond that, he said, Linaro seeks to promote collaboration; without
that, the community will be able to achieve very little. Companies working
in the ARM space recognize the need to collaborate, but they are sometimes
less clear on just which problems they should be trying to solve.
Once upon a time, Microsoft was the dominant empire and Linux was the
upstart rebel child. Needless to say, Linux has been successful in many
areas; it is now settling, he said, into a comfortable middle age. But this
has all happened in the context of the PC architecture, which is not
particularly diverse, so Linux, too, is not hugely diverse. It's also
worth noting that, in this environment, hardware does not ship until
Windows runs on it; making Linux work is often something that comes
afterward.
The mobile world is different;
Android, he said, has become the de facto standard mobile Linux
distribution. It has become known for its "fork, rebase, repeat"
development cycle. Android runs on systems with highly integrated graphics
and media processors, and it is developed with an obsession about battery
lifetime. In this world, things have turned around: now the hardware will
not ship until Linux runs on it. Given the time pressures involved, it is
no wonder, he said, that forking happens.
In the near future we are going to see the arrival of ARM-based server
systems; that is going to stir things up again. They will be very
different from existing servers - and from each other; the diversity of the
ARM world will be seen again. There will be a significant long-term impact
on the kernel as a result. For example, scheduling will have to become
much more aware of power management and thermal management issues. Low
power use will always be a concern, even in the server environment.
Problems to solve
Making all of this work is going to require greater collaboration between
the ARM and kernel communities. ARM developers are developing the habits
needed to work with upstream; the situation is much better than it was a
few years ago. But we are going to need a lot more kernel developers with
an ARM background, and they are going to have to get together and talk to
each other more often. Some of that is beginning to happen; Linaro is
trying to help with this process.
A big problem to deal with, he said, was boot architecture: what happens on
the system before the kernel runs. Regardless of architecture, the boot
systems are all broken and all secret; developers hate them. In the end we
have to communicate system information to the kernel; now we are using
features like ACPI or techniques like flattened device trees. We are
seeing new standards (like UEFI) emerging, but, he asked, are we
influencing those standards enough?
Taking things further: will there be a single ARM platform such that one
kernel can run on any system? The answer was "maybe," but, if so, it is
going to take some time. We're currently in a world where we have many
such platforms - OMAP, iMX, etc. - and pulling them together will be hard.
We need to teach ARM developers that not all code they develop belongs in
their platform tree - or in arch/arm at all. The process of
looking for patterns and turning them into generic code must continue. The
ARM community is working toward the goal of creating a generic kernel;
there are lots of interesting challenges to face, but other architectures
have faced them before.
One step in the right direction is the recent creation of the arm-soc tree,
managed by Arnd Bergmann. The goal of this tree is to support Russell King
(the top-level ARM maintainer) and the platform maintainers and to increase
the efficiency of the whole process. The arm-soc tree has become the path
for much of the ARM consolidation work to get into the mainline kernel.
Returning briefly to power management, David noted that ARM-based systems
usually have no fans. The kernel needs a better thermal management
framework to keep the whole thing from melting. And that framework will
have to reach throughout the kernel; the scheduler may, for example, need
to move processes away from an overheating core to allow it to cool down.
Everywhere we look, he said, we need better instrumentation so we have a
better idea of what is happening with the hardware.
More efficient buffer management is a high priority for ARM devices;
copying data uses power and generates heat, so copying needs to be avoided
whenever possible. But existing kernel mechanisms are not always a good
match to the ARM world, where one can encounter a plethora of memory
management units, weakly-ordered memory, and more. There are a lot of
solutions in the works, including CMA, a reworked DMA mapping framework, and more, but
they are not all yet upstream.
In summary, we have some problems to solve. There is an inevitable tension
between product release plans and kernel engineering. Product release
cycles have no space for the "argument time" required to get features into
the mainline kernel. It is, he said, a social engineering problem that we
have to solve. It will certainly involve forking the kernel at times; the
important part is joining back with the mainline afterward. And, he asked,
do we really need to have everything in the kernel? Perhaps, in the case
of "throwaway devices" with short product lives, we don't really need to
have all that code upstream.
If we are going to scale the kernel across the diversity of contemporary
hardware, he said, we will have to maintain a strong focus on making our
code work on all systems. We'll have to continue to address the tensions
between mobile and server Linux, and we'll have to make efforts to cross
the kernel/user-space border and solve problems on both sides. This is a
discussion we will be having for some time, he said; events like the
Linux Plumbers Conference are the ideal place for that discussion.
By Jonathan Corbet
September 13, 2011
Approximately one year after describing bufferbloat to the world and
starting his campaign to remedy the problem, Jim Gettys traveled to the
2011 Linux Plumbers Conference to update the audience on the current state
of affairs. A lot of work is being done to address the bufferbloat
problem, but even more remains to be done.
"Bufferbloat" is the problem of excessive buffering used at all layers of
the network, from applications down to the hardware itself. Large buffers
can create obvious latency problems (try uploading a large file from a home
network while somebody else is playing a fast-paced network game and you'll
be able to measure the latency from the screams of frustration in the other
room), but the real issue is deeper than that. Excessive buffering wrecks
the control loop that enables implementations to maximize throughput
without causing excessive congestion on the net. The experience of the
late 1980s showed how bad a congestion-based collapse of the net can be;
the idea that bufferbloat might bring those days back is frightening to
many.
The initial source of the problem, Jim said, was the myth that dropping
packets is a bad thing to do combined with the fact that it is no longer
possible to buy memory in small amounts. The truth of the matter is that
the timely
dropping of packets is essential; that is how the network signals to
transmitters that they are sending too much data. The problem is
complicated by the use of the bandwidth-delay
product to size buffers. Nobody really knows what either the bandwidth
or the delay is for a typical network connection. Networks vary widely;
wireless networks can be made to vary considerably just by moving across
the room. In this environment, he said, no static buffer size can ever be
correct, but that is exactly what is being used at many levels.
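The spread is easy to see with a little arithmetic. The sketch below computes the bandwidth-delay product for three made-up paths; the rates and round-trip times are assumptions chosen for illustration, not figures from the talk.

```python
def bdp_bytes(bandwidth_bps: float, rtt_s: float) -> float:
    """Bandwidth-delay product in bytes: rate (bits/s) times round-trip time."""
    return bandwidth_bps * rtt_s / 8


# Hypothetical paths; none of these figures come from the talk.
paths = [
    ("slow DSL uplink, nearby server", 1e6, 0.020),
    ("wireless link, distant server", 20e6, 0.200),
    ("gigabit LAN", 1e9, 0.001),
]

for label, rate_bps, rtt_s in paths:
    print(f"{label}: {bdp_bytes(rate_bps, rtt_s) / 1024:.1f} KiB")
```

Even among these three examples the "right" buffer size ranges from a few kilobytes to roughly half a megabyte, and a single wireless connection can move across much of that range as conditions change.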
As a result, things are beginning to break. Protocols that cannot handle
much in the way of delay or loss - DNS, ARP, DHCP, VOIP, or games, for
example - are beginning to suffer. A large proportion of broadband links,
Jim said, are "just busted." The edge of the net is broken, but the
problem is more widespread than that; Jim fears that bloat can be found
everywhere.
If static buffer sizes cannot work, buffers must be sized dynamically. The
RED protocol is meant to do
that sizing, but it suffers from one little problem: it doesn't actually
work. The problem, Jim said, is that the protocol knows about the size of
a given buffer, but it knows nothing about how quickly that buffer is
draining. Even so, it can improve matters in some situations. But
it requires quite a bit of tuning to work right, so a lot of service
providers simply do not bother. Efforts to create an improved version of
RED are underway, but the results are not yet available.
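For readers unfamiliar with it, the core of classic RED (random early detection) can be sketched in a few lines. This is a simplified model written for illustration, not code from any router: the threshold and weight values are arbitrary assumptions, and the count-based correction of the published algorithm is left out.

```python
import random


class RedQueue:
    """Simplified random early detection (RED) drop decision."""

    def __init__(self, min_th=5.0, max_th=15.0, max_p=0.1, weight=0.002):
        self.min_th = min_th    # below this average, never drop
        self.max_th = max_th    # at or above this average, always drop
        self.max_p = max_p      # drop probability just below max_th
        self.weight = weight    # EWMA weight for the average queue length
        self.avg = 0.0

    def drop_probability(self, queue_len: int) -> float:
        # Update the moving average of the queue length (in packets).
        self.avg = (1 - self.weight) * self.avg + self.weight * queue_len
        if self.avg < self.min_th:
            return 0.0
        if self.avg >= self.max_th:
            return 1.0
        # Linear ramp between the two thresholds.
        return self.max_p * (self.avg - self.min_th) / (self.max_th - self.min_th)

    def should_drop(self, queue_len: int) -> bool:
        # Randomized decision based on the current drop probability.
        return random.random() < self.drop_probability(queue_len)
```

Every input here is a queue length. Nothing in the model measures how quickly the queue drains, which is the weakness described above, and the four parameters are the ones that need tuning.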
A real solution to bufferbloat will have to be deployed across the entire
net. There are some things that can be done now; Jim has spent a lot of
time tweaking his home router to squeeze out excessive buffering. The
result, he said, involved throwing away a bit of bandwidth, but the
resulting network is a lot nicer to use. Some of the fixes are fairly
straightforward; Ethernet buffering, for example, should be proportional to
the link speed. Ring buffers used by network adapters should be reviewed
and reduced; he found himself wondering why a typical adapter uses the same
size for the transmit and receive buffers. There is also an extension to
the DOCSIS
standard in the works to allow ISPs to remotely tweak the amount of buffering
employed in cable modems.
A complete solution requires more than that, though. There are a lot of
hidden buffers out there in unexpected places; many of them will be hard to
find. Developers need to start thinking about buffers in terms of time,
not in terms of bytes or packets. And we'll need active queue management
in all devices and hosts; the only problem is that nobody really knows
which queue management algorithm will actually solve the problem. Steve
Hemminger noted that there are no good multi-threaded queue-management
algorithms out there.
CeroWRT
Jim yielded to Dave Täht, who talked about the CeroWRT router
distribution. Dave pointed out that, even when we figure out how to tackle
bufferbloat, we have a small problem: actually getting those fixes to
manufacturers and, eventually, users. A number of popular routers are
currently shipping with 2.6.16 kernels; it is, he said, the classic
embedded Linux problem.
One router distribution that is doing a better job of keeping up with the
mainline is OpenWRT. Appropriately,
CeroWRT is based on OpenWRT; its purpose is to complement
the debloat-testing kernel tree and provide
a platform for real-world testing of bufferbloat fixes. The goals behind
CeroWRT are to always be within a release or two of the mainline kernel,
to provide reproducible results for network testing, and to be reliable enough
for everyday use while being sufficiently experimental to accept new stuff.
There is a lot of new stuff in CeroWRT. It has fixes to the packet
aggregation code used in wireless drivers that can, in its own right, be a
source of latency. The length of the transmit queues used in network
interfaces has been reduced to eight packets - significantly smaller than
the default values, which can be as high as 1000. That change alone is
enough, Dave said, to get quality-of-service processing working properly
and, he thinks, to push the real buffering bottleneck to the receive side
of the equation.
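Converting those queue lengths into time shows why the change matters. The following sketch assumes full-size 1500-byte packets and two hypothetical link rates; neither assumption comes from Dave's talk.

```python
MTU_BYTES = 1500  # assumed full-size Ethernet payload


def queue_delay_ms(packets: int, link_bps: float, packet_bytes: int = MTU_BYTES) -> float:
    """Time for a full transmit queue to drain, in milliseconds."""
    return packets * packet_bytes * 8 / link_bps * 1000


# Compare the 1000-packet figure with the eight-packet queue at two link rates.
for packets in (1000, 8):
    for label, rate in (("2 Mbit/s", 2e6), ("100 Mbit/s", 100e6)):
        print(f"{packets:4d} packets at {label}: {queue_delay_ms(packets, rate):8.2f} ms")
```

Under these assumptions a full 1000-packet queue on the slower link holds about six seconds of traffic, while eight packets amount to 48 milliseconds. That is the sense in which buffers are better thought of in terms of time than of packets.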
CeroWRT runs a tickless kernel, and enables protocol extensions like
explicit congestion notification (ECN), selective acknowledgments (SACK),
and duplicate SACK (DSACK) by default. A number of speedups have also been
applied to the core netfilter code.
CeroWRT also includes a lot of interesting software, including just about
every network testing tool the developers could get their hands on. Six
TCP congestion algorithms are available, with Westwood used by default.
Netem (a network emulator package)
has been put in to allow the simulation of packet loss and delay.
There is a bind9 DNS server with an extra-easy DNSSEC setup. Various mesh
networking protocols are supported. A lot of data collection and tracing
infrastructure has been added from the web10g project, but Dave has not yet
found a real use for the data.
All told, CeroWRT looks like a useful tool for validating work done in the
fight against bufferbloat. It has not yet reached its 1.0 release, though;
there are still some loose ends to tie and some problems to be fixed. For
now, it only works on the Netgear WNDR3700v2 router - chosen for its open
hardware and relatively large amount of flash storage. CeroWRT should be
ready for general use before too long; fixing the bufferbloat problem is
likely to take rather longer.
[Your editor would like to thank LWN's subscribers for supporting his
travel to LPC 2011.]
Patches and updates
Kernel trees
- Thomas Gleixner: 3.0.4-rt13.
(September 12, 2011)
Page editor: Jonathan Corbet
Distributions
September 14, 2011
This article was contributed by Donnie Berkholz
It's been three years since
LWN last covered Gentoo Linux, so
checking in on Gentoo's activities since then seems appropriate. Let's start
with a re-introduction to Gentoo. Gentoo is a source-based distribution
that is
unlike the more common binary distributions because packages are compiled
on your machines rather than remotely on the distribution's
infrastructure. Source-based distributions allow for far more customization
than is possible with binary distributions because you can not only control
which packages are installed, but also which features of a given package are
enabled (and consequently how many dependencies get pulled in).
This leads to compelling advantages for a number of use cases, although
Gentoo isn't suitable for everyone. For example, Gentoo is superb on developer workstations
because you get a proven-working toolchain and all development packages
(headers and so on) by default, as well as good support for building live
packages directly from Git/Subversion/CVS/etc. It also stands out for use in
embedded
or other minimal configurations, systems that need every last drop of
performance (since you can control the compilation flags for every package),
and other places requiring significant customization. Furthermore, Gentoo's
"teach a man to fish" philosophy makes it an excellent distribution for
learning more about how Linux works; even the installation process is
performed by hand, by following an extensive
handbook that explains every concept along the way.
Since mid-2008, Gentoo's made a number of improvements to its packaging
format and release schedule, and its development community has mostly held
steady in terms of both developers and code committed. There haven't been
any drastic changes, but in a mature project with roughly 175 active developers and
around 375 irregular contributors, sheer inertia means you shouldn't expect
many. Let's take a look at where Gentoo stands today, some of the biggest
changes over the past few years, and what looming issues it still faces.
Gentoo by the numbers
The best way to get an idea of where Gentoo's developer community and code
stand today is to check the numbers rather than relying on subjective
opinion. Ohloh helpfully provides such statistics, so we'll rely on it for
this analysis. An overall picture of Gentoo's lines of code since
its origins over a decade ago is shown in the graph at right.
What's immediately visible is a steady growth from 2002-2005, a meteoric
rise in 2006, and another steady growth from 2007 on. We're going to focus
on recent years in this article, since LWN has already analyzed Gentoo's
previous history. There are two main features worth noting in this
graph. First, the slope of the 2002-2005 period is much higher than that of
the 2007-2011 period. All the hype around Gentoo during that time was directly
correlated with a higher rate of development, whereas today's development
rate is relatively slower-paced. Second, from 2007 to 2011, the slope
appears to be gradually trailing off.
The apparent increase at the very end is likely an artifact due to
additional repositories being registered with Ohloh rather than a sudden
increase in code production.
Gentoo's codebase is growing more
slowly, suggesting a drop in the size of our community or its
productivity.
Now, let's take a look at how many contributors Gentoo has
and how that's changed over time, as shown in the graph at right. (Note:
Ohloh defines contributors as people who committed during a 30-day period.)
The data closely matches the first graph, with a peak near 250 developers
around 2006 followed by a steep drop to around 200 in 2008 and then a gradual decline
to 175 today. What could have caused the sharp drop from 2006 to 2008?
Ubuntu was announced much earlier, in late 2004, and it then became the new
"hot" distribution, so the timing is off for this to be the cause.
I suspect
the community-related problems Gentoo battled in this time frame (culminating
in the forced removal of three developers for abusive behavior in early 2008)
demotivated existing contributors and scared away potential new
contributors. Gentoo's gradual decline today suggests that its reputation
and community never fully recovered from that crisis. Although there is a
general perception within the development community
that some important things aren't being done, nobody has previously
pointed out the quantitative drop in contributors or its potential
connection to those issues. My hope is that exposing this gradual but
very real decline will spur efforts to address Gentoo's most visible and
damaging problems.
Now that we've looked at the health of the project in terms of code and
contributors, we'll more closely examine the
specific improvements the project has made. Since it's a distribution,
it's no surprise that those improvements primarily involve packaging.
Updates to the ebuild packaging format
To understand much of Gentoo's progress over the past few years, you'll need
a basic understanding of Gentoo's packages — called ebuilds. They
are essentially bash scripts with a number of helper functions and a
primitive form of inheritance. Ebuilds build packages by stepping through a
series of bash functions called phases. Each phase corresponds to a
conceptual step like unpacking the tarball, patching the code,
configuration, compilation, or installation. The key difference from the RPM
or deb packages used in binary distributions is that ebuilds must allow for
flexibility regarding how the package is built, so they're full of
conditionals about how to configure, build, and install specific features.
Gentoo's packaging format, the ebuild, is explicitly versioned to allow for
improvements to the format using an Ebuild API (EAPI). Unlike most
other distributions, these improvements occur in Gentoo on a fairly regular
basis: roughly once a year there's a new EAPI. In late 2008, Gentoo's
governing council
approved EAPI=2, which contained a significant series of
changes to the ebuild format, of which I'll describe a few of the most
important examples.
First, it added default implementations for ebuild
phases. Previously, we had to re-implement all the default code for a phase
if it required modification at all. For example, to install one additional
documentation file, we had to rewrite the default code that runs make install with a series of Gentoo-specific arguments; now, we could
instead call a function named default to run that code, then just
install the docs.
Second, EAPI=2 provided finer-grained control over
different steps of the build process. It added two new phases specifically
for preparing unpacked code for a build (e.g. applying patches) and
configuring the code (running configure or its equivalent). This allowed for
shorter, more maintainable ebuilds because more of the code for unpacking
and building can fall back to the default implementations. The final
important feature of EAPI=2 is the ability to require that specific features
be built into dependencies (a.k.a. USE dependencies). Gentoo's USE flags
generally correspond to --enable-foo in a configure script or its
equivalent, and packages higher in the stack often require that ones lower
in the stack be built with or without certain features.
The next major improvement in packaging was EAPI=4,
which came with a number of changes, and I will highlight a few of the
most important. First was
a new, very early phase called pkg_pretend to perform checks during
the dependency calculation. This allows developers to perform particular
checks before starting the build process, so an extended build of many
packages won't die in the middle because the correct kernel options aren't
enabled, for example.
Second, EAPI=4 improved error handling by forcing all
ebuild utilities to die on failure by default. This shortens error-handling
code because typically the failure of any ebuild command during the build should
result in failure for the entire package. Third, ebuilds could indicate
whether they had interdependencies among various features they
support. Often, more complex packages will have various features that are
dependent upon other features within the same package; to enable Y, you must
also enable X. A new variable called REQUIRED_USE allowed developers to set
dependencies and also indicate conflicting features.
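The kind of check that this feature makes possible can be modeled in a few lines of Python. The sketch is only an illustration of the idea: it is not Portage's implementation, it does not use the actual REQUIRED_USE syntax, and the flag names belong to an imaginary package.

```python
def check_use_flags(enabled, requires, conflicts):
    """Return a list of human-readable violations for a set of enabled flags.

    requires:  dict mapping a flag to the flags it needs (enable Y -> need X)
    conflicts: iterable of flag groups of which at most one may be enabled
    """
    enabled = set(enabled)
    problems = []
    for flag, needed in requires.items():
        if flag in enabled:
            for dep in sorted(set(needed) - enabled):
                problems.append(f"{flag} requires {dep}")
    for group in conflicts:
        active = sorted(enabled & set(group))
        if len(active) > 1:
            problems.append("conflicting flags: " + ", ".join(active))
    return problems


# Hypothetical flags for an imaginary package.
requires = {"gtk": ["X"]}
conflicts = [("openssl", "gnutls")]

print(check_use_flags({"gtk"}, requires, conflicts))
print(check_use_flags({"gtk", "X", "openssl"}, requires, conflicts))
```

Because the constraints are declared as data rather than buried in shell conditionals, a package manager can evaluate them before any build starts.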
Where did the security updates go?
If you follow Gentoo, you may have noticed the conspicuous lack of any
announcements regarding security updates (a.k.a. GLSAs) since January of
this year, and sparse announcements since late 2009. This doesn't mean that
security updates do not occur; package maintainers, who comprise the
majority of Gentoo developers, continue to quickly add patches and new
releases for security fixes. It does mean, however, that the security team
is heavily undermanned and cannot keep up with the pace of security holes in
a distribution's worth of software. A significant proportion of the people
who care deeply about security updates maintain servers, and often are paid
to do so. If they desire GLSAs, perhaps they could contribute to creating
them; a modest effort from enough of these people could help to revive
these security announcements.
Changes to Gentoo's release strategy
Gentoo follows a rolling-release model, with constant updates to individual
packages showing up hourly, 24 hours a day. Previously, it made releases
semi-annually by taking snapshots of its package database, performing lots
of QA on them, and creating LiveCDs — a process that required intensive
manual effort. Gentoo then moved to a "rolling release" strategy for its
releases by creating weekly automatic builds rather than formal
releases. This was a big win in terms of reducing developer effort but came
with an unexpected loss of PR for Gentoo.
When coupled with the current
lack of a
weekly or monthly newsletter, Gentoo has nearly disappeared from news
sites. It turns out that official releases drive news articles; without a
major reason to write about an open-source project, like a release announcement,
news sites often ignore it. For that reason, as well as users clamoring
for full-featured LiveDVDs with pretty artwork, Gentoo again started
producing DVD releases, with the most recent being 11.2 in August.
What makes a healthy project?
Gentoo seems to have the core aspects right: code and community. But all the
peripheral components necessary to a thriving open-source project, at least
one of this size, have been lacking in recent years. Contributors to the weekly
newsletter have faded away (it went monthly, then "on hiatus"); Gentoo's previously
award-winning documentation has begun to get stale due to a lack of
documentation contributors to update it for recent changes; and the same has
been true for its release engineers and security team.
Major years-long
initiatives like a migration to Git and a redesigned website have largely
stalled, again because the people involved don't have enough time to work on
them. Although some of these aspects have improved very recently (real
releases again, and former documentation lead Sven Vermeulen just returned
to Gentoo), others remain an open question. It seems that the shrinkage in
the developer community has affected some of the most important
contributors, resulting in a major hit to the distribution that it's still
working to recover from.
What's in Gentoo's future?
First, there's the expected: Gentoo will continue to improve upon its ebuild
format with new EAPIs. The Google Summer of Code program has brought some
welcome new blood into the project, with around two-thirds of the roughly
15 internship
students each year becoming Gentoo developers. Work is ongoing to enable
integration with new technologies like systemd, although it's unclear at
this point whether it will replace Gentoo's custom init system (OpenRC) or
become yet another option. Gentoo is about providing choice wherever it
makes sense rather than enforcing its own choices upon its users, so this
same idea of choosing between alternatives also applies to GNOME 3 vs KDE,
and if anyone makes Unity integrate with Gentoo, that will become an option
too.
As is unfortunately far too common in open-source projects, progress can
sometimes be slow due to lack of volunteer time, especially on larger or
complex issues. Some of them have been dragging on for years now, like a
migration of Gentoo's main repository to Git from CVS, which is both a large
and complex issue. To date, sample conversions exist (such as the one Ohloh
uses for statistics), and a scheme was developed in collaboration with
upstream Git developers to individually sign every commit for improved
security. A tracker
bug and mailing
list exist for anyone interested in following (or even better, helping
with) the work on the Git conversion.
Another longstanding issue that's constantly discussed but rarely acted
upon is Gentoo's organizational structure. The current
model is of a seven-member council that is entirely up for re-election every
year. In addition, there is a nonprofit foundation that controls Gentoo's
finances, copyrights and trademarks, and hardware, with its own independent
board of trustees. The members-at-large model of the council (rather than
members being in charge of specific areas) means that no progress can happen
on any global Gentoo issue without a majority vote of the seven members, and
this can take months. A number of ideas have been floated, like shrinking
the council, returning to the previous model of a benevolent dictator, or
installing the foundation trustees as a corporate board that would appoint a
project leader, similar to a CEO. The goal should be whatever allows Gentoo
to make faster progress; my entirely biased opinion is that open-source
projects exist to accomplish a purpose and should focus on that, rather than
attempting to be a democratic government where everyone is equal.
Gentoo's largest problems, however, come down to a single core issue: not
enough people are working in the areas outside of development, perhaps
because they're under-appreciated — security, release engineering,
newsletters, and documentation, to name a few. If Gentoo can focus on
finding and retaining contributors there, perhaps by applying lessons from
its involvement in the Google Summer of Code, it could improve its
reputation and increase its publicity. That could well bring in the
contributors to rejuvenate what has become a somewhat sluggish open-source
project.
Brief items
After that, however, I reckon that I do have a tendency of noticing new,
interesting problems in need(?) of a solution, and I guess I would end up
wildly experimenting new ideas in Debian much like a victorian mad
scientist. Which reminds me that I most definitely need minions! Where can
I find minions?
-- Enrico Zini
The CentOS project has released CentOS 5.7. "CentOS-5.7 is based on
the upstream release EL 5.7 and includes packages from all variants
including Server and Client. All upstream repositories have been combined
into one, to make it easier for end users to work with." Details
can be found in the release notes.
The OpenIndiana project has released OpenIndiana oi_151a, exactly one year
after the release of version oi_147. OpenIndiana is based on illumos, a
community-driven fork of OpenSolaris. "
OpenIndiana oi_151a now includes KVM, the open source Kernel-based Virtual Machine, as a basic virtualization solution along with the QEMU package! This KVM port includes virtualization extensions for Intel VT. Using KVM, a user or system administrator can run multiple virtual machines running unmodified x86_64-based operating system images for Linux, BSD, or Windows images. Each virtual machine has private virtualized hardware: a network card, disk, graphics adapter, etc."
Scientific Linux 5.7 is available for i386 and x86_64 architectures. See
the release notes for details.
Distribution News
Fedora
The Fedora Project is accepting nominations for the Fedora 17 release name
until September 20. Current naming ideas can be found
here.
Newsletters and articles of interest
Ryan Paul examines rumors of MeeGo's demise. "The Linux-based MeeGo mobile
operating system faces an uncertain future amid rumors that Intel plans to
back away from the platform. The troubled open source software project has
failed to gain broad industry support and appears to be slowing down in the
face of weak demand and declining engagement from its backers. Intel
denied the rumors today, saying that it is still "fully committed" to MeeGo
and intends to continue developing the platform while searching for new
partners. Intel's "commitment" doesn't mean much in practice, however,
because the company's development efforts to date have done little to
advance the project. Unless Intel can attract a partner that is better
equipped to produce consumer-facing software, MeeGo doesn't have much of a
future as a discrete mobile platform."
Scott James Remnant has posted
a detailed discussion of problems he sees in the current Ubuntu release process and a proposed improvement: monthly releases. "
My proposal is a radical change to the Ubuntu Release Process, but surprisingly it would take very little technical effort to implement because all the pieces are already there including the work on performing automated functional and verification tests.
I believe it solves the problem of landing unstable features before they're ready, because it almost entirely removes releases as a thing. As a developer you simply work in a PPA until you'll pass review, and land a stable feature that can replace what was there before."
Page editor: Rebecca Sobol
Development
September 14, 2011
This article was contributed by Nathan Willis
File transfer may have been one of the original purposes of the Internet (along with remote login), but all these years later it still isn't simple. There is a wealth of no-cost file-sharing services built on top of Amazon's cloud computing and storage services, but as with most "freemium" business models, they impose usage restrictions, and they are not free software. GNOME's Eduardo Lima unveiled a clever alternative on September 1, an AGPL-licensed project called FileTea that permits direct peer-to-peer file transfers through an HTTP gateway.
Apart from the obvious licensing distinctions, FileTea differs from other web file-sharing services in one important respect. The FileTea server process does not store the transferred file at all. Rather, it links together an HTTP upload of the file from the sender's PC or device and an HTTP download of the file to the recipient. When user A adds a file to the service, the server generates a temporary short URL link that the user can send to his or her friends. Anyone with the URL can start downloading the file, and it will be transferred from user A's computer.
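The store-nothing relay idea can be modeled with a small bounded pipe. The Python sketch below is not FileTea's code (the real server is written in C on top of EventDance) and involves no HTTP at all; the function names, chunk size, and buffer limit are assumptions made for illustration.

```python
import queue
import threading

CHUNK = 4096      # assumed chunk size
_END = object()   # sentinel marking end of stream


def relay(source, sink, max_buffered_chunks=4):
    """Pipe chunks from source (iterable of bytes) to sink (callable),
    holding at most max_buffered_chunks in memory at once."""
    pipe = queue.Queue(maxsize=max_buffered_chunks)

    def upload():
        for chunk in source:
            pipe.put(chunk)  # blocks while the downloader is behind
        pipe.put(_END)

    sender = threading.Thread(target=upload)
    sender.start()
    total = 0
    while True:
        chunk = pipe.get()
        if chunk is _END:
            break
        sink(chunk)
        total += len(chunk)
    sender.join()
    return total


def chunks(data, size=CHUNK):
    """Split a bytes object into fixed-size pieces, as an uploader might."""
    for i in range(0, len(data), size):
        yield data[i:i + size]


payload = bytes(range(256)) * 100
received = bytearray()
print(relay(chunks(payload), received.extend))
```

The relay never holds more than a handful of chunks, whatever the size of the file. The same structure also shows the cost: the upload side can only make progress while a download is consuming data.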
This means that user A must keep the FileTea page open, of course, or else the connection will be broken and any partial transfers aborted. The uploading (or "seeding") user also uses bandwidth for every transfer, unlike with remote-upload services, and the server itself racks up twice the bandwidth charges by virtue of funneling both the incoming and outgoing connection.
On the other hand, no storage is required on the server, and the short URLs do not need to be persistent. Any file type is equally supported, and there is no inherent limit on the size of the individual files that can be exchanged. A standard web browser is the only client-side tool required, and no special plugins are necessary. Right now only HTTP connections are supported, but HTTPS support is in development. Similarly, although at the moment only "one-hop" connections are permitted, the technique can be extended to relay connections between FileTea peers.
The transient nature of file sharing with FileTea can be a big plus. The simple use case is one user slapping up the file, pinging a colleague over email or IM with the URL, and then taking the file down immediately after the download is complete. The process is simpler than sharing a file over BitTorrent, which requires BitTorrent client software and publishing or announcing a URL that third parties could see. Especially in a one-to-one file transfer scenario, FileTea has less overhead.
Makin' tea
The FileTea source code is hosted at Gitorious, but to get a feel for how the system operates there is a demo server running at FileTea.me. The bandwidth is provided gratis by Lima's employer, Igalia, so it would be impolite for the community to run up their bill by using it as if it were a full-fledged product.
The FileTea process itself is a small web server, although you can
configure it to use another, external web server to host the HTML, CSS, and
JavaScript interface. The front-end allows you to "add" multiple files,
each of which gets assigned its own short URL. It would be inaccurate to
call it "uploading" each file, since there is no transfer until someone
attempts a download. The original file name, size, and MIME type are
displayed next to the URL for easy reference. You can also remove files from the currently-shared-file list, and if you navigate away from the page while you have files shared, a friendly pop-up warns you they will disappear if you do so.
The front-end is essentially the same for users downloading a file through one of the short URLs. A second tab labeled "Download" shows the files being transferred, but the "Share files" tab used to add files is there, too. Since all users are anonymous, every visitor can both upload and download.
The web interface is built with jQuery, but the FileTea server is a different beast. It is a compiled executable that depends on EventDance, a peer-to-peer communication C library also written by Lima. EventDance itself uses GLib and GObject. It is designed to provide a transport mechanism for remote "peer" nodes — in FileTea's case, between the client-side page and the server.
But because EventDance treats all peers as equals, it is relatively easy to extend FileTea's file transport to relay connections between FileTea servers and federate file-sharing. EventDance's abstraction layer also makes it easy to add HTTPS support in addition to HTTP. Lima said he is working on both features. Federation is working in a private branch, but the emphasis is on HTTPS.
On the other hand, GLib may not be a common package on web hosting
plans, and many hosting providers have restrictions on what custom code they allow. As a result, testing a publicly-accessible FileTea service is out of the reach of some users.
Security
Without persistent storage and permanent links, there is no need to maintain a database of user accounts. This allows a FileTea server to offer an essentially anonymous file-sharing service: the recipients of a file can observe only the connection between their machine and the server, not the origin of the file. So if the short URL is distributed anonymously, determining the source of the file would require compromising the server or sniffing its connections.
The short URL codes are generated by a hash function. On an extremely busy FileTea server, an attacker could brute-force guess some URLs of shared files, but the server can be configured to generate longer hash strings to make this more difficult (the length can currently be set between 5 and 40 characters, and defaults to 8).
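To get a feel for why longer codes help, the arithmetic can be sketched as follows. The alphabet size is an assumption (the article does not say which characters FileTea uses in its short URLs; 62 corresponds to letters and digits), as is the number of files shared at any one time. The function name is likewise hypothetical.

```python
ALPHABET_SIZE = 62   # assumption: a-z, A-Z, 0-9; not confirmed by the source
ACTIVE_FILES = 1000  # assumption: files shared at one time on a busy server


def expected_guesses(code_length: int,
                     active_files: int = ACTIVE_FILES,
                     alphabet_size: int = ALPHABET_SIZE) -> float:
    """Average number of uniformly random guesses before one hits any active code."""
    keyspace = alphabet_size ** code_length
    return keyspace / active_files


# Compare the minimum, default, and maximum code lengths mentioned in the article.
for length in (5, 8, 40):
    print(f"{length:2d} characters: about {expected_guesses(length):.3g} guesses")
```

Each added character multiplies the attacker's expected work by the alphabet size, while a busier server (more active codes) divides it.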
Due to the anonymizing effects of the FileTea server, passively collecting data on a remote file provider or downloader is virtually impossible. However, because file seeders can observe when their local shared file starts an outgoing transfer, the seeder does have some information about when and how many times a file is accessed. File leechers, for their part, must remain vigilant to disguised malware in the file payload — the FileTea server does not authenticate the contents of transferred files.
Although FileTea does not require user accounts, it does obviously make connections to both the uploader's and downloader's machines, and a server could collect all sorts of data about visitors. On top of that, a server could save the actual file contents — it is up to the user to determine if a particular site is trustworthy.
Stirring up more
Lima described HTTPS support as the next "urgent" feature occupying his time, but there are a few others still to come before FileTea would be advisable for general usage. The first is a way for users to set upload bandwidth limits. Lima said that the server-side bandwidth controls are already in place. "It is a built-in feature of all EventDance connections and services. But I want to add UI controls to allow users ([especially] seeders) to limit their bandwidth, because one could easily run out of outbound bandwidth while someone is sucking files from you."
For now it is also not possible to combine multiple files into a single
download link. Of course, you can always tar or zip them together and
share the archive file, but it is possible to select and upload multiple
files separately with HTTP; the trick would just be to implement it
in FileTea without overly complicating the workflow.
One might speculate that the FileTea HTTP server could employ gzip compression on file transfers (which it does not currently) to save bandwidth, but that might not be of much practical value. Gzip performs best on uncompressed data like HTML and other text-based formats. Files large enough to warrant direct transfer are more likely to be in an already-optimized format (e.g. TIFF images or Vorbis audio), where gzip compression is likely to add bandwidth (not to mention processor) overhead.
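The point about gzip can be checked with a quick experiment using Python's `zlib` module, which implements the same DEFLATE algorithm that gzip uses. This is a rough sketch, not a benchmark: repetitive markup stands in for text-based formats, and pseudo-random bytes stand in for already-compressed media, which is an approximation chosen for illustration.

```python
import random
import zlib

SIZE = 100_000  # bytes per sample

# Repetitive text, similar in spirit to HTML or other markup.
line = b"<p>FileTea relays files between peers.</p>\n"
text_like = (line * (SIZE // len(line) + 1))[:SIZE]

# Pseudo-random bytes, standing in for already-compressed media.
rng = random.Random(0)
compressed_like = bytes(rng.getrandbits(8) for _ in range(SIZE))

for label, data in (("text-like", text_like),
                    ("compressed-like", compressed_like)):
    packed = zlib.compress(data, 6)  # level 6 is zlib's usual default trade-off
    print(f"{label}: {len(data)} -> {len(packed)} bytes")
```

The text-like sample shrinks dramatically, while the high-entropy sample does not shrink in any useful way, so the CPU time spent compressing it is wasted.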
Even today, without HTTPS support, running your own FileTea server is a compelling alternative to the commercial file-sharing services. Your files are not stored "in the cloud" or anywhere else, you can observe when they are downloaded, and you can take them down as soon as they are no longer necessary. You also do not have to register with a third party to get started, and your transfer speed is as fast as your upload connection allows.
In my estimation, HTTPS is a necessary bullet point for many users, but the real moment of truth will be when peer-to-peer relaying hits. That feature has the potential to open up entirely new uses for web-application-based file sharing. With multiple hops, a popular FileTea service could implement load balancing, or a privacy "cloud" akin to Tor. Lima said he is continually getting suggestions from users for "cool features," including end-to-end encryption done in JavaScript. But the core functionality is already useful, provided that you are aware of the security implications, and that not everyone on the Internet hits the Igalia-provided demo server all at once.
Comments (3 posted)
Brief items
No developer ever thinks their change is going to break anything
for anyone. It's the QA Law of What Could Possibly Go Wrong.
-- Adam Williamson
Comments (none posted)
There is a new release of Apache httpd out there. This one corrects
another denial of service vulnerability (it requires both mod_proxy_ajp and
mod_proxy_balancer, so it will affect fewer sites) and adds further fixes
for the range-request denial of service vulnerability.
Full Story (comments: 2)
BlackHole is a block
device for the NBD protocol that performs data de-duplication. The 2.0
release has been announced; new features include support for multiple LUNs,
compression, and encryption.
Comments (none posted)
The MongoDB 2.0 release is out. There are a lot of performance
improvements; this release also offers authentication in sharded clusters,
journaling enabled by default, improved geospatial support, and more. See
the release notes for details.
Comments (none posted)
Version 0.9.0 of the Nemiver graphical debugger is out. "Not only has
the code base been completely ported to GNOME 3, but it comes packed
with new features like the ability to dynamically position the main
elements of the user interface depending on your taste (or screen
layout), support of GDB pretty printers, updated translations, and
many other incremental improvements and bug fixes."
Full Story (comments: none)
Perl 5 maintainer Jesse Vincent has posted a lengthy description of where
he sees the language going in the future. "Perl 5 is not Latin. It is a
living language, still borrowing liberally
from...just about everything. Sometimes we borrow the wrong things.
Sometimes we borrow things and use them
wrong. Sometimes we invent things and later wish we hadn't. Perl has
always been something of a packrat, but we're in danger of hitting the
point of being diagnosed with a pathological hoarding problem. We need to
get better at fixing problems and moving forward without hurting old
code." Anybody with an interest in Perl 5 will likely want to
read the whole thing.
Full Story (comments: 50)
The PostgreSQL 9.1 release is out. New features include synchronous
replication, per-column collations, unlogged tables for temporary data,
security-enhanced PostgreSQL, and more; see this LWN article for an
overview of the 9.1 release and the release notes for details.
Full Story (comments: 54)
Version 1.8.3 of the Tahoe-LAFS filesystem has been released, fixing a
vulnerability that might allow an attacker to delete arbitrary files.
"This vulnerability
does not enable anyone to read file contents without authorization
(confidentiality), nor to change the contents of a file (integrity). How
exploitable this vulnerability is depends upon some details of how you use
Tahoe-LAFS."
Full Story (comments: none)
Veil is a PostgreSQL add-on meant to provide greater control over access
to data. "It provides an API allowing you to control access to data at
the row, or even column, level. Different users will be able to run the
same query and see different results. Other database vendors describe this
as a Virtual Private Database." The 9.1.0 release, described as the
first one that is production-ready, is now available.
Comments (none posted)
Newsletters and articles
Comments (none posted)
Page editor: Jonathan Corbet
Announcements
Brief items
The "Qt Project," the new governance structure for Qt, has announced its
existence. "Since the Open Governance Model discussions started
in July 2010, we have worked closely with the community to restructure the
code base, design the governance structure, prepare the tooling, and define
a contribution model for individuals and companies. And, we are excited to
have a system in place that will be rolled out just five weeks from
now."
Some more information can be found in this supplemental
posting by chief maintainer Lars Knoll. "I want to make it very
clear that the foundation will not steer the project in any way. The
foundation is in place only to cover the costs of hosting and run the
infrastructure. All technical decisions, as well as decisions about the
project direction, will be taken by the community of Contributors,
Approvers and Maintainers. For example this means that people in Nokia
working on Qt will start working with Qt as an upstream project. Everyone
will be using the same infrastructure, including mailing lists and
IRC."
Comments (8 posted)
Articles of interest
LinuxInsider is running an opinionated four-part series on the "GPLv2
death penalty" discussion and the Free Software Foundation's role in it.
"While it is true that section 4
of the GPLv2 license terminates your right to redistribute when you fall
out of compliance, section 6 is equally clear when it states that you get a
valid license from the copyright-holder with each new copy you
receive. Resuming distribution is simply a matter of returning to
compliance and downloading a new copy. It's true that this won't 'fix'
previous compliance problems; depending on their nature, they may have to
be negotiated with the copyright-holders or decided by a court, but the
threat of the ultimate "big stick" -- of never being able to resume
distribution with the new license automatically granted under section 6 --
is an attempt to impose restrictions that neither a plain reading of the
license nor the rules dealing with take-it-or-leave-it contracts
allows."
Comments (53 posted)
Bruce Perens has posted a description of a scheme he came up with to make
copyright assignment policies more acceptable to developers. The article
is long, but the idea is straightforward: "A company can covenant, to
each contributor of a
copyright, to continue to support and maintain a project as Open Source,
for a fixed period after a particular contribution - or to remove the
contribution from their product if they cannot continue to Open Source
their work. In this way, the Open Source developer would be assured of the
continuing labor of paid developers on the project in exchange for his
contribution, and thus the continued improvement of the program that he
uses gratis as a community participant. By making the promise in exchange
for the participation of the entire Open Source community, the company will
have a better idea of the value it is expending and the value it receives
than if it attempted to pay piecemeal for modifications. This covenant is
renewed each time there is a new copyright assignment, in that the
three-year counter starts anew every time the company receives a
contribution from a developer. Thus, developers continue to be encouraged
to contribute their copyrights to the company."
Comments (65 posted)
Upcoming Events
The schedule for the GStreamer Conference 2011 is available. "This
year the GStreamer Conference includes presentations from the whole field
of open source multimedia, not just GStreamer, with the goal for it to be
the premier annual event for all things multimedia on linux and other open
platforms." The conference takes place October 24-25 in Prague,
Czech Republic.
Full Story (comments: none)
The linux.conf.au organizing team has released a draft schedule for the
January 2012 conference. "The Papers Committee worked incredibly
hard to sift through the high quality submissions and select an array of
presentations which are engaging and entertaining while delivering in-depth
info across a variety of topics. There's something for everyone in this
schedule, from those interested in kernel development, open hardware,
community or packaging to the increasingly popular FOSS music and
multimedia. It's going to be amazing! Both the Papers Committee and
Organising Team were very enthusiastic about the high rate of talks
accepted from female speakers - without any acceptance bias being needed in
the selection process. Nearly a quarter of accepted talks will be given by
female presenters - an encouraging sign of progress in the traditionally
male-oriented conference."
Full Story (comments: none)
Events: September 22, 2011 to November 21, 2011
The following event listing is taken from the
LWN.net Calendar.
| Date(s) | Event | Location |
|---|---|---|
| September 19-22 | BruCON 2011 | Brussels, Belgium |
| September 22-25 | Pycon Poland 2011 | Kielce, Poland |
| September 23-24 | Open Source Developers Conference France 2011 | Paris, France |
| September 23-24 | PyCon Argentina 2011 | Buenos Aires, Argentina |
| September 24-25 | PyCon UK 2011 | Coventry, UK |
| September 27-29 | Nagios World Conference North America 2011 | Saint Paul, MN, USA |
| September 27-30 | PostgreSQL Conference West | San Jose, CA, USA |
| September 29-October 1 | Python Brasil [7] | São Paulo, Brazil |
| September 30-October 3 | Fedora Users and Developers Conference: Milan 2011 | Milan, Italy |
| October 1-2 | WineConf 2011 | Minneapolis, MN, USA |
| October 1-2 | Big Android BBQ | Austin, TX, USA |
| October 3-5 | OpenStack "Essex" Design Summit | Boston, MA, USA |
| October 4-9 | PyCon DE | Leipzig, Germany |
| October 6-9 | EuroBSDCon 2011 | Netherlands |
| October 7-9 | Linux Autumn 2011 | Kielce, Poland |
| October 7-10 | Open Source Week 2011 | Malang, Indonesia |
| October 8 | PHP North West Conference | Manchester, UK |
| October 8 | FLOSSUK / UKUUG's 2011 Unconference | Manchester, UK |
| October 8-9 | PyCon Ireland 2011 | Dublin, Ireland |
| October 8-9 | Pittsburgh Perl Workshop 2011 | Pittsburgh, PA, USA |
| October 8-10 | GNOME "Boston" Fall Summit 2011 | Montreal, QC, Canada |
| October 9-11 | Android Open | San Francisco, CA, USA |
| October 11 | PLUG Talk: Rusty Russell | Perth, Australia |
| October 12-15 | LibreOffice Conference | Paris, France |
| October 14 | Workshop Packaging BlankOn | Jakarta, Indonesia |
| October 14-16 | MediaWiki Hackathon New Orleans | New Orleans, Louisiana, USA |
| October 15 | Packaging Debian Class BlankOn | Surabaya, Indonesia |
| October 17-18 | PyCon Finland 2011 | Turku, Finland |
| October 18-21 | PostgreSQL Conference Europe | Amsterdam, The Netherlands |
| October 19-21 | 13th German Perl Workshop | Frankfurt/Main, Germany |
| October 19-21 | Latinoware 2011 | Foz do Iguaçu, Brazil |
| October 20-22 | 13th Real-Time Linux Workshop | Prague, Czech Republic |
| October 21 | PG-Day Denver 2011 | Denver, CO, USA |
| October 21-23 | PHPCon Poland 2011 | Kielce, Poland |
| October 23-25 | Kernel Summit | Prague, Czech Republic |
| October 24-25 | GitTogether 2011 | Mountain View, CA, USA |
| October 24-25 | GStreamer Conference 2011 | Prague, Czech Republic |
| October 24-28 | 18th Annual Tcl/Tk Conference (Tcl'2011) | Manassas, Virginia, USA |
| October 26-28 | Embedded Linux Conference Europe | Prague, Czech Republic |
| October 26-28 | LinuxCon Europe 2011 | Prague, Czech Republic |
| October 28-30 | MiniDebConf Mangalore India | Mangalore, India |
| October 29 | buildroot + crosstool-NG Developers' Day | Prague, Czech Republic |
| October 31-November 4 | Ubuntu Developer Summit | Orlando, FL, USA |
| October 31-November 4 | Linux on ARM: Linaro Connect Q4.11 | Orlando, FL, USA |
| November 1-3 | oVirt Workshop and Initial Code Release | San Jose, CA, USA |
| November 1-8 | 2011 Plone Conference | San Francisco, CA, USA |
| November 4-6 | Fedora Users and Developer's Conference : India 2011 | Pune, India |
| November 4-6 | Mozilla Festival -- Media, Freedom and the Web | London, United Kingdom |
| November 5-6 | Technical Dutch Open Source Event | Eindhoven, The Netherlands |
| November 5-6 | OpenFest 2011 | Sofia, Bulgaria |
| November 7-11 | ApacheCon NA 2011 | Vancouver, Canada |
| November 8-12 | Grace Hopper Celebration of Women in Computing | Portland, Oregon, USA |
| November 10-12 | Clojure/conj 2011 | Raleigh, NC, USA |
| November 11-12 | Zentyal Summit | Zaragoza, Spain |
| November 11-13 | Free Society Conference and Nordic Summit 2011 | Gothenburg, Sweden |
| November 12 | London Perl Workshop 2011 | London, United Kingdom |
| November 12 | Emacsforum 2011 | Copenhagen, Denmark |
| November 14-17 | SC11 | Seattle, WA, USA |
| November 14-18 | Open Source Developers Conference 2011 | Canberra, Australia |
| November 17-18 | LinuxCon Brazil 2011 | São Paulo, Brazil |
| November 18 | LLVM Developers' Meeting | San Jose, CA, USA |
| November 18-20 | Foswiki Camp and General Assembly | Geneva, Switzerland |
| November 19-20 | MediaWiki India Hackathon 2011 - Mumbai | Mumbai, India |
| November 20-22 | Open Source India Days 2011 | Bangalore, India |
If your event does not appear here, please
tell us about it.
Page editor: Rebecca Sobol