
LWN.net Weekly Edition for April 23, 2009

Faster updates with yum-presto

By Jake Edge
April 22, 2009

Keeping up with an active distribution like Fedora consumes a fair amount of time, but also bandwidth. Depending on how often a yum update is performed, hundreds of megabytes (or even gigabytes) can be required to bring the system up to date. A recent experiment in rawhide uses deltarpms and the yum Presto plugin to significantly reduce the size of the packages that need to be retrieved. The experiment looks to be largely successful, which means that Fedora will likely make the deltarpm files available more widely as part of Fedora 11.

The idea behind deltarpms is not a particularly new one, but the visibility has been raised by the recent Fedora Presto test day. The tools to build deltarpms were originally created by Michael Schröder of SUSE and have been around for a few years. Basically, the tools generate a binary difference (i.e. diff) between the new and old rpm files and create an rpm that just contains the differences (a drpm). Because package changes are typically fairly small and localized, the size difference between the new rpm and the drpm can be quite substantial.
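The general idea of a binary delta can be illustrated with a short sketch. This is not the deltarpm algorithm or its file format; it is a minimal illustration built on Python's standard difflib module, and the function names (make_delta, apply_delta, shipped_bytes) are invented for the example. The delta records which byte ranges of the old data can be copied as-is and which new bytes actually have to be shipped.

```python
from difflib import SequenceMatcher


def make_delta(old: bytes, new: bytes) -> list:
    """Describe `new` as copies from `old` plus literal new bytes."""
    ops = []
    matcher = SequenceMatcher(None, old, new, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            # These bytes already exist in the old data; record the range only.
            ops.append(("copy", i1, i2))
        elif tag in ("replace", "insert"):
            # These bytes are new and must travel with the delta.
            ops.append(("data", new[j1:j2]))
        # "delete" needs no entry: the old bytes are simply never copied.
    return ops


def apply_delta(old: bytes, ops: list) -> bytes:
    """Rebuild the new data from the old data and the delta."""
    out = bytearray()
    for op in ops:
        if op[0] == "copy":
            out += old[op[1]:op[2]]
        else:
            out += op[1]
    return bytes(out)


def shipped_bytes(ops: list) -> int:
    """Count the literal bytes the delta has to carry."""
    return sum(len(op[1]) for op in ops if op[0] == "data")


if __name__ == "__main__":
    payload = bytes(range(256)) * 4
    old = payload + b"release=1" + payload
    new = payload + b"release=2" + payload
    delta = make_delta(old, new)
    print(len(new), shipped_bytes(delta), apply_delta(old, delta) == new)
```

When a change is small and localized, nearly all of the new data is expressed as copies from the old data, which is why a drpm can be so much smaller than the full rpm.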

The deltarpm tools do not require that the old rpm be present on the system when installing; instead, they can reconstruct the state of the old rpm from the installation itself. As long as there is a drpm corresponding to the difference between the version currently installed and the version that needs to be installed, Presto will choose the more bandwidth-efficient package to download. If the deltarpm tools are unable to reconstruct the new rpm from the installed files and drpm (due to a local configuration file change, for example), Presto will fall back to downloading the full rpm of the updated package.
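The selection and fallback behavior described above can be sketched roughly as follows. This is not Presto's code: the Delta class, pick_delta(), and plan_update() are hypothetical names, and the rebuild step is passed in as a callback that simply reports success or failure.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass(frozen=True)
class Delta:
    """Hypothetical description of one drpm in a repository."""
    from_version: str
    to_version: str
    size: int


def pick_delta(installed: str, target: str, deltas: List[Delta]) -> Optional[Delta]:
    """Return the drpm that goes from the installed version to the target, if any."""
    for delta in deltas:
        if delta.from_version == installed and delta.to_version == target:
            return delta
    return None


def plan_update(installed: str, target: str, deltas: List[Delta],
                rebuild: Callable[[Delta], bool]) -> str:
    """Return "delta" if a matching drpm exists and rebuilds cleanly, else "full"."""
    delta = pick_delta(installed, target, deltas)
    if delta is not None and rebuild(delta):
        return "delta"
    # No matching drpm, or the rebuild failed (a locally modified file,
    # for example): fall back to the full rpm.
    return "full"
```

The point of the sketch is only that a drpm is usable when it matches the exact installed version and the rebuild succeeds; in every other case the full package is fetched.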

For rawhide users, trying Presto out is quite simple:

    yum install yum-presto

which will install and enable the Presto plugin. Using it to update rawhide on April 22 would normally have required 68M, but using the drpms available (20 of 21 packages that needed updating) reduced that to 23M for a 66% reduction. There is a substantial pause after the packages have been downloaded while the deltarpm tools rebuild the rpms from drpms; in this case it was something on the order of one to two minutes. For someone at the end of a low-to-medium bandwidth link (or someone who pays by the amount transferred), that tradeoff is likely to be a good one.
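The tradeoff can be put into rough numbers. The sketch below assumes that "M" means megabytes, ignores protocol overhead, and treats the rebuild pause as a fixed cost; the function names are made up for the illustration. It computes the percentage saved and the link speed at which the download time saved equals the time spent rebuilding.

```python
def percent_saved(full_mb: float, delta_mb: float) -> float:
    """Share of the full download avoided by fetching drpms instead."""
    return 100.0 * (full_mb - delta_mb) / full_mb


def break_even_mbit_per_s(full_mb: float, delta_mb: float,
                          rebuild_seconds: float) -> float:
    """Link speed at which download time saved equals rebuild time.

    The full download takes full*8/B seconds and the delta path takes
    delta*8/B + rebuild seconds; setting the two equal and solving for B
    gives the expression below.
    """
    saved_megabits = (full_mb - delta_mb) * 8
    return saved_megabits / rebuild_seconds


if __name__ == "__main__":
    print(round(percent_saved(68, 23)))
    for seconds in (60, 90, 120):
        print(seconds, break_even_mbit_per_s(68, 23, seconds))
```

Under those assumptions, a one-to-two minute rebuild puts the break-even point at a few megabits per second: on slower links the drpm route finishes sooner, while on faster links it still transfers less data but takes longer overall.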

There are still a few infrastructure glitches on the Fedora side. Part of the reason for the test day and publicizing the new feature was to find and fix those problems before Fedora 11 ships. Because of the way the deltarpm tools work—reading both rpms into memory before doing the diff—and how the Fedora infrastructure builds rpms for all architectures in parallel, only packages smaller than 200M are currently turned into drpms. There are also questions about whether it makes sense to build source and debuginfo drpms. Those types of packages are not widely used so spending repository space and build resources on drpm versions may not be warranted. From a user perspective, though, it all works quite smoothly: install a package and get a lot of bandwidth savings.

SUSE has been using drpms for some time, at least since SUSE Linux 9.3 was released in 2005. Users automatically get drpms when using the zypper tool for package updates and drpms are created for all package updates as long as the diff is smaller than the full rpm. For users that would rather get the full rpm when doing updates, drpms can be disabled in /etc/zypp/zypp.conf.
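The two build-side rules mentioned above, Fedora's size cap and SUSE's "only if the diff is smaller" test, amount to simple predicates. The sketch below is purely illustrative: the names are invented, neither distribution's build system is being quoted, and reading "200M" as 200 MiB is an assumption.

```python
# Assumption: the "200M" cap is read as 200 MiB.
FEDORA_MAX_RPM_BYTES = 200 * 1024 * 1024


def small_enough_to_diff(full_rpm_bytes: int,
                         limit: int = FEDORA_MAX_RPM_BYTES) -> bool:
    """Fedora-style rule: only attempt a drpm for packages under the size cap."""
    return full_rpm_bytes < limit


def worth_publishing(full_rpm_bytes: int, drpm_bytes: int) -> bool:
    """SUSE-style rule: keep the drpm only if it is smaller than the full rpm."""
    return drpm_bytes < full_rpm_bytes
```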

Presto development is, unsurprisingly, a Fedora Hosted project with a Trac page and Git repository. It would seem that there has been some collaboration with the openSUSE folks on the drpm format and tools so that yum and zypper will interoperate. Given that both are rpm-based tools, it is good to see the two distributions working together.

One could argue, as some have, that there is too much package churn in Fedora. On the other hand, Fedora users do tend to expect very recent, often bleeding-edge, packages. Since that is unlikely to change, Presto will be very welcome for folks whose bandwidth is limited in some way; those who are unconcerned need not install it. Meanwhile, with less fanfare, SUSE users have been getting those savings for some time.

Oracle: SELECT * FROM Sun

By Jonathan Corbet
April 20, 2009

Despite a steady stream of rumors, IBM did not, in the end, buy Sun Microsystems. But, on April 20, Oracle did. This acquisition could have some interesting implications for the Linux community. Your editor, while not really knowing more than anybody else, suspects that the outcome could be mostly positive. What follows, here, is some wild speculation on where this could all go.

Some months ago, your editor posted a slightly tongue-in-cheek article on a serious topic: what would happen if Sun Microsystems were to undergo a change in management which rendered the company far less friendly toward free software? It now appears that there will, indeed, be a management change. One might well worry what changes we might see in the newly-acquired company's attitude; Oracle is not always seen as the friendliest company in general. But Oracle, while being very much a proprietary software company, does seem to have a supportive approach toward free software. Your editor was reasonably well impressed by the talk given by Oracle "Chief Corporate Architect" Edward Screven at the recent Linux Foundation Collaboration Summit. At some levels of the software stack, at least, Oracle seems genuinely interested in working with and growing the development community.

There are a number of specific topics of interest when speculating on what could happen; your editor will visit a few of them below.

MySQL. This project, of course, can be seen as being in direct competition with Oracle's flagship offering. So, unsurprisingly, a number of people have speculated that Oracle will not encourage its further growth. So, perhaps, Oracle will de-emphasize the project or "return it to the community." But that is not necessarily how things will go.

One should remember that this isn't the first time Oracle has been seen to threaten MySQL through acquisition. Back in 2005, Oracle bought Innobase, the creator of the InnoDB storage engine used by MySQL. The MySQL project wisely branched away from InnoDB, but the fact of the matter is that this code is still free software, and InnoDB releases continue to happen. The sky did not fall after all.

Beyond that, there is the simple matter that MySQL appears to earn money. This acquisition could well be an opportunity for Oracle to gain revenue from customers who, for whatever reason, are not interested in buying Oracle licenses. It broadens the company's database product line and might provide the opportunity to encourage some customers to move toward the more expensive, proprietary offerings.

Most interesting, though, will be to see what happens with the MySQL development community. Oracle still does not have vast amounts of experience running large, community-oriented projects, but it seems to be learning. The MySQL community is not in top condition, currently; it has suffered from Sun's legendary heavy hand, leading to a fair amount of developer unhappiness. There are currently a few active forks out there, raising the possibility that control over the "real" MySQL could move out of Sun's hands altogether. Oracle could, just maybe, woo these developers back into a core MySQL project which was managed in a more community-oriented manner. If that were to happen, it would be hard to conclude that this acquisition was anything but good for MySQL.

Solaris. This operating system is said, in the press release, to be one of the core justifications for the acquisition. Oracle sells a fair number of licenses for deployments on Solaris; it cannot be unhappy with the idea of gaining control over the full platform. The real question here, perhaps, is whether Oracle sees Solaris as a system with a long future ahead of it, or whether Solaris becomes a legacy platform which will be supported for some time, but which will not see a great deal of development.

There have been suggestions for a while that Sun is reconsidering its licensing choices. A GPL-licensed Solaris was not entirely out of the question before the acquisition; quite possibly, those chances have improved now. A relicensed Solaris, preferably combined with some clarity on patent licensing, could make it possible for technologies like ZFS and DTrace to move into Linux. Whether Linux would want them is a separate discussion, though.

There is an alternative, of course: Oracle could decide to promote Solaris as an (incompatibly-licensed) competitor to Linux and reduce its involvement on the Linux side. Your editor, perhaps naively, sees this outcome as unlikely. Oracle has invested heavily enough in Linux to create a real impression of believing in the platform. Oracle has not invested in Solaris (which is also free software, remember) at anything close to the same level. If Oracle were to try to push Solaris as a better alternative to Linux, it would really just be continuing Sun's strategy. Presumably there are people in Oracle smart enough to wonder why Oracle would have any more success with that approach than Sun did.

Btrfs. Edward Screven claimed that Oracle was pursuing Btrfs because it likes the technology better than it likes ZFS. Ownership of ZFS could well put that claim to the test, but there does not appear to be any reason to believe that it was not sincere. The early word from Oracle is that plans for Btrfs have not changed, and that the resources put into that project will not decrease.

Java. The press release states that Java "is the most important software Oracle has ever acquired." Much Oracle-based software is written in Java, so there are clear advantages in having control over that part of the software stack. Increasingly, customers can just go to Oracle and get support for most of the major components they use from a single source. That, presumably, will help make some money for Oracle.

OpenOffice.org. This project looks like a bit of a strange fit in Oracle, which is not really a desktop software company. Still, Oracle may see value in keeping this project going as a way to encourage corporate desktop users away from Microsoft products. With any luck at all, Oracle will work to turn OpenOffice.org into a more community-oriented project. By making participation in OpenOffice.org so hard, Sun has spurned the offers of assistance which have come from around the community. Maybe Oracle will be a bit smarter and will realize that, by opening things up a bit, it can speed the development of OpenOffice.org without really having to invest more into the project. One can always hope.

What it comes down to is that just about anything could happen. It could be that this acquisition is part of a long-term plan by Oracle to acquire just enough of the free software community to neutralize any threats it sees. Now that this hypothetical plan is coming to fruition (lacking, perhaps, just the occasionally-rumored acquisition of Red Hat), Oracle can proceed to move away from Linux, turn things proprietary, and generally prepare itself for the Final Battle. This would not be a good outcome for the Linux community, though we would, as usual, end up stronger once the dust had settled.

Alternatively, Oracle may have understood that truly free software can help to turn its competitors' products into commodities while enabling Oracle to provide a solid offering around its own products. This company, which has already become one of the top Linux kernel contributors, could become the top contributor to free software projects as a whole (a title which Sun has already claimed). If Oracle sustains Sun's projects in a more community-oriented mode, we may well conclude, one year from now, that this acquisition was a good thing indeed.


A look at the MySQL forks

April 22, 2009

This article was contributed by Nathan Willis

Sun's sudden acquisition by Oracle triggered a deluge of speculation about the future of the company's free software projects: Java, OpenOffice, VirtualBox, OpenSolaris, and, most of all, MySQL. Will Oracle kill it? Spin it off? Keep its hands off? In light of this uncertainty, the discussion soon shifted to the trickier question of which branch constitutes "the" MySQL. The project has been forked multiple times, several of them within the past year. Considering that each competitor is led by a heavyweight MySQL developer and has its own goals, how is a humble database administrator supposed to choose?

Patch sets and proto-forks

The seeds of this confusion predate MySQL's acquisition by Sun, when MySQL developers began to lose patience with MySQL AB's governance of the project. Management had announced two branches, "enterprise" and "community," in 2006, but soon began to miss scheduled binary and source releases of the community branch. Worse still, community developers complained that the company was trying to hide the enterprise branch code — changing the release location between iterations.

In 2007, Jeremy Cole of Proven Scaling took matters into his own hands, and set up a public mirror of the official "enterprise" releases as they appeared. Cole does not make changes to the code released by Sun, although Proven Scaling does publicly maintain its own set of patches and tools for MySQL — as do several other database consulting firms and MySQL users, including Google.

Percona

One of those consulting firms is Percona, a web-development consulting business that emphasizes its expertise in MySQL. Percona develops a pluggable storage engine for MySQL called XtraDB. XtraDB is an enhancement to the popular InnoDB engine, designed to work as a drop-in replacement. It scales better on multi-core hardware, uses memory more efficiently, and adds more tunability and metrics.

Percona's MySQL releases do not remove InnoDB to replace it with XtraDB, but do include patches to InnoDB. They also incorporate patches from other sources, including Proven Scaling, Google, and Open Query. Source and binary releases, as well as RPMs for Red Hat Enterprise Linux, are available for MySQL 5.0 and MySQL 5.1.

Percona's patch set is documented on the company's wiki. The patches include changes that add status variables, more configuration parameters, additional I/O settings, and dynamic memory allocation, and that alter mutexes and locks to improve performance on SMP systems.

OurDelta

OurDelta was launched in October of 2008 by former MySQL employee Arjen Lentz (now at Open Query), and describes its mission as providing "enhanced" MySQL builds for common production platforms. Its releases build on Percona's, adding further patches (some from Google and other third parties, some original work) and including additional storage engines.

OurDelta maintains two builds, one stable and one bleeding-edge. All stable releases so far have been for MySQL 5.0, and include the full-text-search-capable Sphinx storage engine. Upcoming work for MySQL 5.1 and MySQL 6.0 will add an enhanced version of InnoDB from Innobase, along with the PBXT and FederatedX storage engines. OurDelta makes source code releases available as tar archives, and runs binary repositories for Red Hat Enterprise Linux and CentOS, Debian, and Ubuntu.

OurDelta also documents its significant patches. In addition to the Percona patch set, OurDelta includes activity monitoring and reporting (per table, index, account, and machine), improved logging, an option to kill idle database connections, the ability to temporarily freeze InnoDB for backup purposes, and improvements to speed up failover.

MariaDB

MySQL founder Michael "Monty" Widenius started his own fork in February of 2009 after leaving Sun. At the time, he said his reason for departing was dissatisfaction with Sun's development and community processes for MySQL, which was not "a true open development environment" that encouraged outside participation.

Widenius's fork is called MariaDB, and the only major change is that it uses the Maria storage engine, which is the focus of development. The rest of the code is regularly synchronized with MySQL releases from Sun, and is intended to be one hundred percent interoperable.

The Maria storage engine is an evolution of MySQL's default MyISAM storage engine, and is designed to duplicate the features found in InnoDB, notably crash recovery and full transactional support. Maria and MariaDB are being developed against MySQL 5.1. Widenius expects the Maria engine to be a standard part of Sun's MySQL 6.0 releases, but intends to keep developing MariaDB even after MySQL 6.0 is stable. So far, the project has released source code packages and generic x86 binaries for Linux.

Widenius maintains a wiki page documenting the advantages of MariaDB over Sun's unmodified MySQL, focusing on the features of the Maria storage engine. Aside from the larger goals of crash-safety and transactional support, he notes that using Maria as a storage engine should speed up complex queries. In addition, MariaDB contains speed improvements, the ability to use a pool of threads to handle queries (rather than one thread per connection), and bugfixes not accepted by Sun.

Drizzle

Drizzle is the most distinctive MySQL fork, perhaps better described as a complete refactoring. Drizzle is the work of Brian Aker, long a preeminent MySQL developer. He announced the project in July of 2008, saying that he disliked many of the changes made to MySQL after version 4.1, and felt that there was a large market of users that did not want them. Despite launching the fork, Aker continues to work in the MySQL group at Sun.

Drizzle cuts the core of MySQL down to the bare minimum, using a microkernel-and-modules approach. The goal is to create a slimmed-down, optimized database targeting web infrastructure and cloud components.

Aker said that Drizzle will question the foundations of database design, and is not intended to be SQL compliant. The FAQ emphasizes a "look forward, not back" philosophy. For example, Drizzle targets modern, multi-core hardware, modern compilers, and modern operating systems. Similarly, the development team is not interested in feature requests or in adding excised MySQL features back in. Thus far, the project has made only source code releases, and has noted that they are not yet stable for production use.

Conclusion

The major Linux distributions all package Sun's "community" version of MySQL. Sun itself provides free downloads of the community edition from the web, evidently having learned a lesson from the 2007 uproar. Sun's official packages are likely to be newer, given the release cycles of most distributions, and to its credit Sun makes binary builds available for a wide variety of processor architectures and distributions, including older releases of those distributions. For most users, such a supported build is usually the best choice. The Percona and OurDelta packages represent the work of in-the-field MySQL consultants, and MariaDB is focused on the Maria engine, but only experienced database administrators are likely to be able to take advantage of the additional features they offer.

Still, it is telling that so much of the work done by the forks centers around the InnoDB storage engine: the patches written by Percona and OurDelta, Percona's replacement engine XtraDB, and MariaDB's replacement engine Maria. InnoDB is GPLv2-licensed, but the copyright is owned by ... Oracle. Oracle acquired InnoDB's creator Innobase in 2005. That acquisition sparked a flurry of concern that the database giant would kill the product, take it proprietary, or somehow use it against MySQL — many of the same nightmare scenarios now speculated about the Sun purchase. It is worth noting that in the intervening years two things have occurred: Oracle has not killed or maimed InnoDB, and the open source community has preemptively created its own innovative solutions, thereby insulating open source users and customers from disaster should Oracle take a step in the wrong direction.

The real question is not which fork is "the" MySQL, but whether the multiple patch sets and forks indicate sickness or health for MySQL as a whole. Excluding Drizzle, all of the projects were started because someone who cared a great deal about the future of MySQL saw something wrong with MySQL's development process (and for its part, Drizzle was spawned by even deeper dissatisfaction with the technical direction of MySQL). Surely that much concern on the part of the community signifies health. There is no telling which forks will prosper and which will fizzle out, but that depends to a large degree on Oracle, and how it governs the project in the future.


Page editor: Jonathan Corbet

Copyright © 2009, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds