LWN.net Weekly Edition for November 19, 2009
Btrfs for Rawhide users?
Your editor stopped using Rawhide (the Fedora development distribution) after things melted down spectacularly back in July. Since then, problems have been scarce, but all that stability on the desktop has proved to be seriously boring. Additionally, running a stable distribution can make it harder to test leading-edge project releases. So your editor has been looking to return to a development distribution on the desktop as soon as time allows and things look safe enough. Rawhide's worst problems are far behind it for now; it might just be safe to go back into the water, though the beginning of the Fedora 13 development cycle could add some excitement. As an added incentive, the Fedora developers are now considering mixing in Btrfs snapshots as an optional feature; use of an experimental filesystem might not seem like the way to improve stability, but Btrfs could, in fact, make life easier for Rawhide testers.

It is worth noting at the outset that Fedora is not, yet, considering using Btrfs in Rawhide by default. What has been proposed, instead, is the implementation of a "system rollback" feature for Rawhide users who are crazy enough to install on Btrfs despite its young and immature state. If this feature works out, it could remove much of the risk of tracking Rawhide and begin the exploration of a new capability which could prove highly useful for Linux users in general in the future.
One of the many features provided by Btrfs is copy-on-write snapshots. At any time, it is possible to freeze an image of the state of the filesystem. Snapshots are cheap - at creation time, their cost is almost zero. As changes are made to the filesystem, copies will be made of modified blocks while the snapshot remains unchanged. One can certainly fill a filesystem through use of the snapshot facility - and filling Btrfs filesystems remains a bit of a hazardous thing to do - but Btrfs will share data between snapshots for as long as possible.
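For the curious, snapshot creation is exposed to user space through a simple ioctl() on the filesystem; the C sketch below shows roughly how a tool like btrfsctl asks for one. It assumes the btrfs ioctl definitions shipped with btrfs-progs (the "ioctl.h" include), and the function name and paths are purely illustrative - this is a sketch of the mechanism, not a recommended tool.

    /* Rough sketch of snapshot creation via the btrfs ioctl interface.
     * Assumes the btrfs_ioctl_vol_args structure and the
     * BTRFS_IOC_SNAP_CREATE constant from the btrfs-progs "ioctl.h"
     * header; paths and names are made up for illustration. */
    #include <fcntl.h>
    #include <string.h>
    #include <sys/ioctl.h>
    #include <unistd.h>
    #include "ioctl.h"   /* btrfs ioctl definitions from btrfs-progs */

    int make_snapshot(const char *subvol, const char *dest_dir,
                      const char *snap_name)
    {
        struct btrfs_ioctl_vol_args args;
        int src, dst, ret;

        src = open(subvol, O_RDONLY);    /* subvolume to be snapshotted */
        dst = open(dest_dir, O_RDONLY);  /* directory to hold the snapshot */
        if (src < 0 || dst < 0)
            return -1;

        memset(&args, 0, sizeof(args));
        args.fd = src;
        strncpy(args.name, snap_name, sizeof(args.name) - 1);

        /* Creation is nearly free: the snapshot shares every data block
         * with the source until one side or the other is modified. */
        ret = ioctl(dst, BTRFS_IOC_SNAP_CREATE, &args);

        close(src);
        close(dst);
        return ret;
    }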
The value of snapshots to system administrators is fairly obvious: a snapshot can be taken immediately prior to an operating system upgrade. Should that upgrade turn out to be less of a step forward than had been hoped, the filesystem can simply be reverted to its pre-upgrade state. The days of digging around for older versions of broken packages - perhaps with the assistance of a rescue disk - should be long gone.
That said, there are a number of details which need to be worked out before snapshots can be made ready even for Rawhide users, much less the wider user community. Perhaps the biggest problem is that Btrfs snapshots cover the entire filesystem, so reverting to an older state will lose all changes made to the filesystem in the meantime. If a system update fails to boot, dumping the update seems like a straightforward choice - there will be no other changes to lose. But going back to a snapshot after the system has been running for a while could lose a fair amount of work, log data, etc. along with the unwelcome changes. One can always cherry-pick changed files after reverting to the snapshot, but that would be a tedious and error-prone process.
There are a lot of user interface details to take care of as well. Tools need to be created to allow administrators to look at existing snapshots, mount them for examination, clean them up, and so on. Btrfs will probably have to be extended with a concept of a user-selectable "default" snapshot for each filesystem. Grub needs some work for boot-time snapshot selection. There is also talk of eventually adding snapshot-browsing support to Nautilus as well.
Snapshots will clearly be a useful feature for Linux in the future. Back in your editor's system administration days, backup tapes were occasionally used to recover from disk disasters, but much more frequently used to help users recover from "fat-finger" incidents. Snapshots are not true backups, but they should certainly be useful as a quick error-recovery mechanism. Your editor is looking forward to the day when his system always supports a series of snapshots allowing the recent state of the filesystem to be recovered.
A snapshot is a heavyweight tool for dealing with system upgrade problems, though. In the longer term, it would make sense to have better rollback support built into the package management system itself. Interestingly, Yum and RPM have had some rollback support in the past, but that feature does not appear to be well maintained now. Providing rollback support at this level is a hard problem, to say the least, but solving that problem would put a powerful tool into the hands of Linux system administrators.
In the absence of this feature, filesystem-level snapshots will have to do; certainly they are a major improvement over what we have now. In the short term, potential users should remain aware that Btrfs is a very young filesystem, and that snapshots may not be a viable recovery mechanism if the filesystem itself gets corrupted. In the longer term, though, there will be a day when we will wonder how we ever used our systems without this feature. The work being done by the Fedora developers is an important step in that direction.
Reducing HTTP latency with SPDY
Google unveiled an experimental open source project in early November aimed at reducing web site load times. SPDY, as it is called, is a modification to HTTP designed to target specific, real-world latency issues without altering GET, POST, or any other request semantics, and without requiring changes to page content or network infrastructure. It does this by implementing request prioritization, stream multiplexing, and header compression. Results from tests on a SPDY-enabled Chrome and a SPDY web server show a reduction in load times of up to 60%.
SPDY is part of Google's "Let's make the web faster" initiative that also includes projects targeting JavaScript speed, performance benchmarking, and analysis tools. Mike Belshe and Roberto Peon announced SPDY on November 11 on both the Chromium and Google Research blogs, noting that "HTTP is an elegantly simple protocol that emerged as a web standard in 1996 after a series of experiments. HTTP has served the web incredibly well. We want to continue building on the web's tradition of experimentation and optimization, to further support the evolution of websites and browsers."
Finding the latency in HTTP
The SPDY white paper details the group's analysis of web latency, beginning with the observation that although page requests and responses rely on both HTTP as the application-layer protocol and TCP as the transport-layer protocol, it would be infeasible to implement changes to TCP. Experimenting on HTTP, on the other hand, requires only a compliant browser and server and can be tested on real network conditions.
The group found four factors to be HTTP's biggest sources of latency. First, relying on a single request per HTTP connection makes inefficient use of the TCP channel and forces browsers to open multiple HTTP connections to send requests, adding overhead. Second, HTTP headers are sent uncompressed, and they make up a significant portion of traffic because of the large number of requests needed to load a single page. Third, redundant headers - such as User-Agent and Host - are retransmitted even though they remain the same for a session. Finally, the client must initiate every request; even when the server knows that related content will be requested, it cannot push that content to the client.
SPDY tackles these weaknesses by multiplexing an unlimited number of concurrent streams over a single TCP connection, by allowing the client to assign priorities to HTTP requests in order to avert channel congestion, and by compacting HTTP request and response headers with gzip compression and omitting the redundant transmission of headers. The SPDY draft specification also includes options for servers to initiate content delivery. The available methods are "server push," in which the server initiates transmission of a resource via an X-Associated-Content header, and "server hint," in which the server only suggests related resources to the client with X-Subresources.
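As a rough illustration of what header compression buys, the short C program below runs a typical, redundancy-heavy request header block through zlib's compress() routine - standing in for the gzip-based compression SPDY applies to headers. The header text, buffer sizes, and resulting numbers are invented for the example; only the general effect is the point.

    /* Toy demonstration of deflating an HTTP request header block with
     * zlib, as a stand-in for SPDY's gzip-based header compression.
     * The header text is an invented but typical example. */
    #include <stdio.h>
    #include <string.h>
    #include <zlib.h>

    int main(void)
    {
        const char *headers =
            "GET /index.html HTTP/1.1\r\n"
            "Host: www.example.com\r\n"
            "User-Agent: Mozilla/5.0 (X11; Linux x86_64) Chrome/3.0\r\n"
            "Accept: text/html,application/xhtml+xml\r\n"
            "Accept-Encoding: gzip,deflate\r\n"
            "Accept-Language: en-US,en\r\n\r\n";
        unsigned char out[1024];
        uLongf outlen = sizeof(out);

        if (compress(out, &outlen, (const Bytef *)headers,
                     strlen(headers)) != Z_OK) {
            fprintf(stderr, "compression failed\n");
            return 1;
        }
        printf("raw headers: %zu bytes, compressed: %lu bytes\n",
               strlen(headers), (unsigned long)outlen);
        return 0;
    }

(Build with the zlib library, e.g. "gcc headers.c -lz".) Over the course of a session, SPDY adds to this by not retransmitting headers that have not changed.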
In addition, SPDY is designed to run on top of SSL, because the team decided it was wiser to build security into its implementation now than to add it later. Also, because SPDY requires agents to support gzip compression for headers, it compresses the HTTP data with gzip too.
The important thing to note is that SPDY's changes affect only the manner in which data is sent over the wire between the client and the server; there are no changes to the existing HTTP protocol that a web page owner would notice. Thus, SPDY is not a replacement for HTTP so much as a set of possible enhancements to it.
Comments on the blog posts indicate that although most readers see the value in header compression and request prioritization, some are skeptical of the need to multiplex HTTP requests over a single TCP connection. Other alternatives have been tried in the past, notably HTTP pipelining and the Stream Control Transmission Protocol (SCTP).
The white paper addresses both. SCTP, it says, is a transport-layer protocol designed to replace TCP, and although it may offer some improvements, it would not fix the problems with HTTP itself, which SPDY attempts to do. Implementing SCTP would also require large changes to client and server networking stacks and web infrastructure. The latter is also true for similar transport-layer solutions like Structured Stream Transport (SST), intermediate-layer solutions like MUX, and HTTP-replacements like Blocks Extensible Exchange Protocol (BEEP).
The problem with pipelining, it says, is that even when multiple requests are pipelined into one HTTP connection, the entire connection remains first-in-first-out, so a lost packet or delay in processing one request results in the delay of every subsequent request in the pipeline. On top of that, HTTP pipelining is difficult for web proxies to implement, and remains disabled by default in most browsers. The fully multiplexed approach taken by SPDY, however, allows multiple HTTP requests and responses to be interleaved in any order, more efficiently filling the TCP channel. A lost packet would still be retransmitted, but other requests could continue to be filled without pausing to wait for it. A request that requires server-side processing would form a bottleneck in an HTTP pipeline, but SPDY can continue to answer requests for static data over the channel while the server works on the slower request.
Implementation and test results
The development team wrote a SPDY web server and added client support in a branch of the Chrome browser, then ran tests serving up "top 100" web site content over simulated DSL and cable home Internet connections. The tests included SSL and non-SSL runs, single-domain and multiple-domain runs, and server push and server hint runs. Page load times dropped in every case, by anywhere from 27.93% to 63.53%.
The team's stated goal is a 50% reduction in load time; the average across all of the published test variations is 48.76%. Though the team calls the initial results promising, it also lists several problems - starting with the lack of well-understood models for real-world packet loss behavior.
SPDY remains an experiment, however, and the team solicits input on a number of open questions, including dealing with the latency introduced by SSL handshakes, recovering from a lost TCP connection, and how best to implement the server-side logic to truly take advantage of server push and server hint. Interested people are encouraged to join the mailing list and download the code.
So far, only the modified Chrome client code is available, and that from the public Subversion repository, not binary downloads. Peon said that the server release is coming soon, and the project page says that the test suite and benchmarking code used in Google's test will be released under an open source license as well.
A 50% reduction in page load times is nothing to sneer at, particularly when all of the gains come from tweaking HTTP's connection and data transfer behavior. Header compression alone gives noticeable savings; the white paper states that it resulted in an "~88% reduction in the size of request headers and an ~85% reduction in the size of response headers."

The future of the web may indeed include new protocols like SCTP and BEEP, but SPDY is already demonstrating that there is plenty of room for improvement without drastically altering the protocol stack.
Notes from the LF End User Summit
To many, the Linux development community appears to be highly open, with access to developers only an email away. To much of the user community, though, the situation looks different, with core developers seemingly as distant and inaccessible as they would be if they were doing proprietary code. Bridging the gap between users and developers is one of the tasks the Linux Foundation has set for itself; the annual End User Summit is intended to help toward that goal.

The End User Summit draws a different crowd than any other event. Well-known Linux developers are present, certainly, but they do not form the majority of the crowd; they are, instead, strongly outnumbered by representatives of banks, insurance companies, and financial firms. Old conference T-shirts are far outnumbered by suits and ties in this crowd. The End User Summit, in other words, caters to enterprise distribution customers and others who are using Linux in high-stakes situations - even a major stock exchange which has based its operation on Gentoo. It makes for an interesting combination of people and a unique set of conversations.
One speaker was Brian Clark from the New York Stock Exchange. NYSE's systems run under high pressure and tight constraints. They process some three billion transactions per day - more than Google does - and those transactions need to execute in less than one millisecond. Customers can switch to competing exchanges instantly and for almost no cost, so if NYSE's systems are not performing, its customers will vanish. A typical trading day involves the processing of 1.5TB of data; some 8 petabytes of data are kept online. And this whole operation runs on Linux.
NYSE is highly concerned with software quality and security; they are subject to thousands of attacks every day. Downtime is to be limited to 90 seconds per year. All told, Linux has worked very well in this setting. NYSE had some requests, though, including the increasingly common desire for a way to move everything except a specific application off of a given core. Brian requested a way to lock a process's memory in place - a functionality which mlock() would appear to have provided for many years. He would also like a non-disruptive way to measure latencies, especially in the network stack.
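For reference, the memory-locking functionality that already exists looks like the sketch below: a process with the CAP_IPC_LOCK capability (or an adequate RLIMIT_MEMLOCK limit) can pin its address space with a single call. This is a minimal illustration, not anything NYSE actually runs.

    /* Minimal sketch of pinning a process's memory with mlockall().
     * Requires CAP_IPC_LOCK or a sufficient RLIMIT_MEMLOCK setting. */
    #include <stdio.h>
    #include <sys/mman.h>

    int main(void)
    {
        /* Lock all current mappings, and any created later, into RAM so
         * that latency-critical code never takes a major page fault. */
        if (mlockall(MCL_CURRENT | MCL_FUTURE) != 0) {
            perror("mlockall");
            return 1;
        }

        /* ... latency-sensitive work would run here ... */

        munlockall();
        return 0;
    }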
In the end, he says, NYSE likes Linux because of the community which stands behind it - an interesting position given NYSE's rather low profile in that community. One place where it was suggested NYSE could help would be to advise the developers on the best placement of tracepoints into the network stack to yield the sort of latency measurements they would like to see.
Al Gillen of IDC is a common presence at this sort of event; he gave a chart-heavy talk on how IDC expects things to go in the server marketplace. The outlook for Linux server shipments would appear to be bright. One interesting tidbit from the talk: Linux server shipments will be growing strongly in the coming years, while Unix will be declining. That means that, in 2013, the Linux market looks likely to reach half the revenue value of the Unix server market. Unix may be suffering, but there's still a lot of money being spent on it.
Anthony Golia of Morgan Stanley discussed the use of Linux there; Morgan Stanley has been using the operating system heavily for several years now, and is running it on tens of thousands of systems. It was, he says, a bit of a rough start, but Morgan Stanley learned that the community "lends itself well to partnership." The company figured out how to send fixes back upstream and has experienced the "warm fuzzy feeling" that comes with getting fixes merged. More recently, they have been finding far fewer bugs and are quite happy with the choice to go with Linux.
Anthony had some requests too, beginning with support for TCP offload engines. What Morgan Stanley really needs, though, is shorter network latencies. Trades are dependent on getting orders in quickly in response to events, and latencies work against that goal. They would like a way to generate long-term statistics of a process's memory use, mostly as a way of knowing whether it's safe to load more work onto a specific server. There was also a request for better coordination between distributors and hardware manufacturers, yielding support for new hardware as soon as that hardware is available.
Jeffrey Birnbaum of the Bank of America led a session on shortcomings he sees with Linux at this time. In particular, Jeffrey anticipates a future dominated by increasing availability of fast CPUs and the growing influence of solid-state storage devices. The world is changing, and he worries that Linux is not changing quickly enough to keep up with it. Technology is improving quickly, he says, and the kernel is holding users back.
Specific problems include latency in the network stack and the ability of networking to make use of large numbers of CPUs. TCP, he says, is not scalable, though it wasn't clear just where the problems lie. One request that was clear was a means by which messages could be sent to multiple destinations with a single system call - something akin to the proposed sendmmsg() system call. He suggested that the time has come to move beyond POSIX interfaces - he is a fan of Ulrich Drepper's event interface proposal - and that the use of protocols like SATA to talk to solid-state storage is a mistake. There was also some discussion about difficulties getting a scalability problem with the epoll_wait() system call fixed.
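As a sketch of what such an interface could look like, the fragment below batches one payload to two destinations in a single call, assuming that the proposed sendmmsg() ends up mirroring the existing recvmmsg() interface; the function, addresses, and payload here are purely hypothetical, and nothing in this sketch is from a shipping kernel.

    /* Hypothetical use of the proposed sendmmsg() call: one payload, two
     * destinations, one trip into the kernel. Assumes an interface that
     * mirrors recvmmsg(). */
    #define _GNU_SOURCE
    #include <string.h>
    #include <netinet/in.h>
    #include <sys/socket.h>
    #include <sys/uio.h>

    int send_to_two(int sock, struct sockaddr_in *dst1,
                    struct sockaddr_in *dst2, const char *buf, size_t len)
    {
        struct iovec iov[2];
        struct mmsghdr msgs[2];

        memset(msgs, 0, sizeof(msgs));

        iov[0].iov_base = (void *)buf;
        iov[0].iov_len = len;
        iov[1] = iov[0];

        msgs[0].msg_hdr.msg_name = dst1;
        msgs[0].msg_hdr.msg_namelen = sizeof(*dst1);
        msgs[0].msg_hdr.msg_iov = &iov[0];
        msgs[0].msg_hdr.msg_iovlen = 1;

        msgs[1].msg_hdr.msg_name = dst2;
        msgs[1].msg_hdr.msg_namelen = sizeof(*dst2);
        msgs[1].msg_hdr.msg_iov = &iov[1];
        msgs[1].msg_hdr.msg_iovlen = 1;

        /* Returns the number of messages sent, replacing two separate
         * sendmsg() calls with one system call. */
        return sendmmsg(sock, msgs, 2, 0);
    }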
Perhaps the clearest point to emerge from this session is that users like Jeffrey need a solid channel to communicate with the development community about their needs and frustrations. One would think that this would be an ideal role for enterprise distribution vendors to fill; indeed, in the following session, Novell's Carlos Montero-Luque described the session as a great advertisement for commercial distributions. But, for whatever reason, those distributions do not appear to be filling that role in this case.
Carlos, along with Red Hat's Brian Stevens, talked about the future as the distributors see it. There was lots of talk on the value of Linux on mainframes, which seems to be of great interest to this user community currently. Interestingly, Brian noted that Red Hat is not entirely sure that the success which has been achieved with Linux can be replicated at other levels; the JBoss development community, for example, is nearly 100% Red Hat employees.
On the subject of unpaid Linux, Brian claimed that these deployments were "fantastic." Anything which grows the overall market can only be good for the participants therein. Carlos had some darker comments about how unpaid Linux is not "free," and that it will always be paid for in some other way.
Tim Golden is a manager at a high-profile American bank; in his talk on "the changing role of enterprise open source," though, he was clear to point out that he was speaking only for himself. This talk started with the relatively early days, when companies like banks saw open source as being far too risky to use. Everybody was afraid of being sued and ending up on the front page of the Wall Street Journal, so outright prohibitions on the use of open source were common.
There were a couple of intermediate steps, including one where managers came to the radical conclusion that the submission of bug fixes did not deprive a company of its Valuable Intellectual Property. During this time, fears about the use of open source faded considerably, and companies increasingly decided that they could tolerate whatever risk remained - at least in "high value" situations.
The current situation is heavily affected by the financial crisis; financial companies have realized that they must find a way to be competitive with far less money. This understanding has helped to usher in the "open source software as a strategy" era, with companies setting up formalized management programs for open source. An interesting thing is happening in some companies as they go through this process, though: executives are figuring out that it's hard to drive open-source projects from the back seat. They are also coming to the conclusion that participation in development projects is not as disruptive as they had once thought.
So now these companies are beginning to dip their toes in the water and look at ways to participate. There are lots of options, ranging from simple cash contributions - which don't create any real linkage with the community - through to investments in companies and "intellectual property contributions." Eventually, says Tim, we'll start to see something that was once unthinkable: development projects being run by end users.
That last statement maybe reveals something about how these companies see free software. To them, projects run by end users are a new, scary, and exotic thing. But your editor would submit that almost every development project of interest is run by end users. The developers who came together to create the Linux kernel weren't working for others. The group that pulled together their patches and released "a patchy" server were planning to deploy that server (now "Apache") themselves. As end users in the financial industry start to run projects aimed at meeting their own needs, some of those projects, at least, should prove equally successful.
There is no need to convince the financial industry that free software can benefit its operation; they have understood that for a few years now. Convincing this industry that contributing to the software it uses makes sense has been somewhat harder. It would appear that this message is starting to be heard, and companies in this industry are beginning to look for ways to reach out to the development community. Events like the End User Summit seem like an ideal way to facilitate communication between the existing development community and its future members; it is a learning experience for everybody involved.
The LWN.net Weekly Edition will be early next week
Thursday, November 26, is the U.S. Thanksgiving holiday. LWN's editors fully intend to spend that holiday eating far too much food; to make that possible, we'll be publishing the Weekly Edition on November 25. LWN will return to its regular schedule the following week.
Page editor: Jonathan Corbet
Inside this week's LWN.net Weekly Edition
- Security: TLS renegotiation vulnerability; New vulnerabilities in asterisk, java, qt, wordpress,...
- Kernel: High-order GFP_ATOMIC allocation trouble; Receive packet steering; SamyGO.
- Distributions: openSUSE 11.2; new releases of Fedora 12, Knoppix 6.2, openSUSE 11.2, Ubuntu Studio 9.10, Vector Linux 6.0 Kde Classic, XtreemOS 2.0; openSUSE board meetings to be public; reviews of Fedora and Ubuntu.
- Development: Officeshots: making ODF truly interoperable, GNOME Zeitgeist overview, notmuch mail client, future of Moonlight, new versions of JACK, PulseAudio, Exim, Midgard2, nginx, Ardour, XCircuit, Wine, Amarok, IcedTea7, Parrot, Urwid, GIT, Mercurial, GNU patch.
- Announcements: Android Dev Phone, Chumby Guts, EFF gets FISA docs, Open Web Foundation Agreement, Sudo patent, Google and Linux, PyPI Poll, PyCon talks, EFF Copyright Watch, planet LAD.