Leading items

MySQL 5.1 and development models

By Jonathan Corbet
December 3, 2008

The MySQL development team decided to celebrate the (US) Thanksgiving holiday with the release of MySQL 5.1.30, the first "general availability" (read "production-ready") release in the 5.1 series. There is a lot of good stuff in 5.1.30, including table partitioning, row-based replication, a new plugin API, a built-in job scheduler, and more; see the nutshell summary for more information. It's a celebration point for a long development series; the MySQL developers are to be congratulated for what they have accomplished with this release.

Behind the celebration, though, one can hear the grumbling from unhappy developers and users. This release has been a long time in coming; the first 5.0 GA release was in October, 2005 - just over three years ago. The first 5.1 release candidate (5.1.22) came out in September, 2007; seven more "release candidates," many with major changes, were announced over the following 14 months. So the 5.1 production release came rather later than desired, but some developers feel that it was still too soon; the complaints reached a climax in this lengthy posting from Michael "Monty" Widenius, the original creator of MySQL. His point of view, in short, is that this release has fatal bugs, and that these bugs come from a number of flaws in how MySQL development is managed.

Your editor cannot claim to be an expert on the MySQL development community. But Monty, presumably, is an expert on this community, so his observations have a higher than usual likelihood of reflecting something close to reality. Reading various dissenting posts (example) has done little to make your editor feel otherwise. And, in any case, much of what Monty says rings true when compared against experiences from elsewhere in the free software community. As projects grow, they must occasionally revisit their development models. There is little happening here which is truly unique to MySQL.

Monty asserts:

MySQL 5.1 was declared beta and RC way too early. The reason MySQL 5.1 was declared RC was not because we thought it was close to being GA, but because the MySQL manager in charge *wanted to get more people testing MySQL 5.1*. This didn't however help much, which is proved by the fact that it has taken us 14 months and 7 RC's before we could do the current "GA". This caused problems for developers as MySQL developers have not been able to do any larger changes in the source code since February 2006!

Two things jump out of that statement. One is that MySQL apparently suffers from an inadequate testing community. Needless to say, that is not a problem which is unique to this project; testing is a scarce resource throughout our community. MySQL users who are unhappy with the results of the development process might want to ask themselves if they are doing enough to help with the testing process. Like it or not, testing software and finding bugs is one of the costs of "free" (beer) software. If this testing doesn't happen during the development cycle, it will end up happening with the "stable" releases instead.

The other attention-getter above is the statement that MySQL developers have been unable to make major changes since early 2006. One need only think back to the 2.4 kernel days to see the kind of damage that can result from pent-up "patch pressure." Developers get frustrated, major changes start to find their way into "release candidate" code, and the number of bugs tends to increase. The existence of a separate MySQL 6 development branch helps, perhaps, in reducing patch pressure, but it can also only distract developers from stabilizing current release candidates.

Related to this is another assertion:

Too many new developers without a thorough knowledge of the server have been put on the product trying to fix bugs. This in combined with a failing review process have introduced of a lot new bugs while trying to fix old bugs.

Review would appear to be a big part of the problem in general. It may well be that a failure of review has caused the introduction of new bugs with fixes. But one could argue that the problem is deeper than that: any code which failed to stabilize over fourteen months of release candidates should, almost certainly, never have been merged into the MySQL trunk to begin with. It seems that there are not enough eyeballs being applied to major new features before they go in.

Your editor has resisted the temptation to make comparisons with other relational database manager projects, but there is value in comparing this state of affairs with the review problems faced by PostgreSQL in recent years. An inability to get additions to PostgreSQL properly reviewed resulted in those additions not being merged. That, in turn, leads to delayed releases with fewer than the desired number of features, neither of which is particularly pleasing for users or developers. But, on the other hand, PostgreSQL does not appear to have the same kind of trouble stabilizing its major releases.

Perhaps the key point to take away from all of this, though, is here:

In addition, the MySQL current development model doesn't in practice allow the MySQL community to participate in the development of the MySQL server.

MySQL is very much a corporate-owned, corporate-driven project, and it has been for a long time. Decisions on what to include are made internally; there is little discussion of development decisions on the project's mailing lists. It is hard to find information on how to contribute to the project; some of the available information still tells prospective contributors to use BitKeeper. All code is copyrighted by MySQL (now Sun), which reserves (and uses) a right to distribute that code under proprietary licenses.

All of the above reflects an arrangement which has worked well for years, and which has produced an immensely valuable database manager used by vast numbers of people. But it is not a community project, so development decisions will not necessarily reflect the best interests of the wider user or developer communities. If, as Monty suggests, those decisions are made in ways which favor features and deadlines over quality, there will be little that the community can do about it.


KSM runs into patent trouble

By Jake Edge
December 3, 2008

On the kernel page a few weeks ago, we took a look at KSM, a technique to reduce memory usage by sharing identical pages. Currently proposed for inclusion in the mainline kernel, KSM implements a potentially useful—but not particularly new—mechanism. Unfortunately, before it can be examined on its technical merits, it may run afoul of what is essentially a political problem: software patents.

The basic idea behind KSM is to find memory pages that have the same contents, then arrange for one copy to be shared amongst the various users. The kernel does some of this already for things like shared libraries, but there are numerous ways for identical pages to get created that the kernel does not know about directly, and thus cannot coalesce. Examples include initialized memory (at startup or in caches) from multiple copies of the same program and virtualized guests that are running the same operating system and application programs.
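
For readers who like to see the idea in code, here is a toy sketch in Python of content-based page sharing. It is not KSM's implementation, nor anybody else's: the class ToyMemory, its methods, and the 4096-byte page size are all invented for illustration, and using a hash as a first filter before a full comparison is simply one plausible way to find candidate pages. Copy-on-write is modeled crudely by giving every write a fresh private frame.

```python
import hashlib

PAGE_SIZE = 4096  # assumed page size, for illustration only


class ToyMemory:
    """Toy model: each (owner, page number) maps to one physical frame."""

    def __init__(self):
        self.frames = {}    # frame id -> page contents (bytes)
        self.mappings = {}  # (owner, page_no) -> frame id
        self.next_frame = 0

    def write_page(self, owner, page_no, data):
        # Every write lands in a fresh private frame, so a shared frame is
        # never modified in place. This stands in for copy-on-write.
        data = bytes(data).ljust(PAGE_SIZE, b"\0")[:PAGE_SIZE]
        old = self.mappings.get((owner, page_no))
        self.frames[self.next_frame] = data
        self.mappings[(owner, page_no)] = self.next_frame
        self.next_frame += 1
        self._release_if_unused(old)

    def _release_if_unused(self, frame):
        # Free a frame once no mapping points at it any more.
        if frame is not None and frame not in self.mappings.values():
            del self.frames[frame]

    def merge_identical(self):
        """Remap pages with identical contents onto a single frame.

        Returns the number of mappings that were redirected.
        """
        seen = {}  # digest -> frames already kept with that digest
        merged = 0
        for key, frame in sorted(self.mappings.items()):
            data = self.frames[frame]
            digest = hashlib.sha1(data).digest()
            for kept in seen.setdefault(digest, []):
                # A matching hash is only a hint; compare the full contents.
                if kept != frame and self.frames[kept] == data:
                    self.mappings[key] = kept
                    self._release_if_unused(frame)
                    merged += 1
                    break
            else:
                if frame not in seen[digest]:
                    seen[digest].append(frame)
        return merged


# Two hypothetical guests hold the same page; a third page is unique.
mem = ToyMemory()
mem.write_page("guest-a", 0, b"same kernel text")
mem.write_page("guest-b", 0, b"same kernel text")
mem.write_page("guest-b", 1, b"private data")
mem.merge_identical()
print(len(mem.frames), "frames in use after merging")
```

A real implementation has to cope with pages changing underneath the scanner and with the cost of the scan itself, both of which this sketch ignores.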

Unfortunately, as Dmitri Monakhov points out, the KSM technique appears to be patented by VMware. A patent for "Content-based, transparent sharing of memory units" was filed in July 2001 and granted in September 2004. The abstract seems to clearly cover the ideas behind KSM:

[...] The context, as opposed to merely the addresses or page numbers, of virtual memory pages that [are] accessible to one or more contexts are examined. If two or more context pages are identical, then their memory mappings are changed to point to a single, shared copy of the page in the hardware memory, thereby freeing the memory space taken up by the redundant copies. The shared copy is ten preferable [sic] marked copy-on-write. Sharing is preferably dynamic, whereby the presence of redundant copies of pages is preferably determined by hashing page contents and performing full content comparisons only when two or more pages hash to the same key.

It should be noted that the abstract has no legal bearing; that comes from the claims (always tortuously worded), which can be seen at the link above. In this case, as far as can be determined, the claims and abstract are in close agreement.

The dates above are rather important because there is some "prior art" to consider, namely the mergemem patch first announced in March of 1998. It is substantially the same as the patented idea: it looks for identical "context pages", then changes the memory mappings to point to a single copy-on-write page. This would seem to be a clear example of the idea being implemented well before the patent was filed, so it should invalidate the patent. As with everything surrounding software patents, though, it isn't as easy as that.

In order to invalidate a patent, either a court must rule that way or the patent office must be convinced to re-examine it, then find that the prior art makes it invalid. Both of these methods take time and usually money and lawyers as well. Free software projects may have time, but the other two are typically out of reach. Alan Cox suggests that "perhaps the Linux Foundation and some of the patent busters could take a look at mergemem and re-examination". While that might eventually resolve the problem, it is a multi-year process at best.

The folks behind the KSM project are some of the kvm hackers from Qumranet—which is now part of Red Hat. It is certainly conceivable that VMware might consider kvm a competitor and try to use this patent as a "competitive" weapon. That concern is probably enough to keep KSM out of the mainline until the issue is resolved.

There is a much quicker resolution available, should VMware wish to pursue it. As IBM has done with the RCU patent, VMware could license its patent for use in GPL-licensed code. There is much to be gained by doing that, at least in terms of positive community relations, and there is little to be lost, unless VMware truly believes that the patent will stand up to scrutiny. Both VMware and its parent, EMC, are members of the Linux Foundation, so one could see a role for the foundation in helping to put that kind of agreement together.

The original mergemem idea did not make it into the kernel, but the code is still available for those running Linux 2.2.9. It appears that it was not pushed very hard in the face of some security concerns, which will need to be addressed by KSM as well. Processes could create a page of memory with known contents then, after waiting for the checker process (or kernel thread) to run, see whether memory usage has dropped, which would mean that the page was merged with an existing copy. Based on that information, one can determine if other processes have a page with identical values. It would seem rather difficult to exploit, but clearly does allow some information to leak.
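
The shape of that leak can be shown with a deliberately simplified Python sketch. Everything in it is hypothetical: the functions frames_in_use() and other_process_has() are invented names, and the dictionary of pages stands in for whatever system-wide memory statistics a real attacker would have to watch. It compares memory usage before and after a probe page has been created and the merging pass has run.

```python
def frames_in_use(pages):
    """Physical frames needed if identical pages are stored only once."""
    return len(set(pages.values()))


def other_process_has(pages, guess):
    """Guess whether some other process holds a page equal to `guess`.

    `pages` maps (owner, page number) to page contents and models the
    whole system's memory once the merging pass has finished.
    """
    before = frames_in_use(pages)
    probed = dict(pages)
    probed[("attacker", 0)] = guess  # the page with known contents
    after = frames_in_use(probed)
    # A new page normally costs one frame. If usage did not grow, the
    # probe was merged with an identical page owned by somebody else.
    return after == before


# Hypothetical victim memory, for illustration only.
victim = {("victim", 0): b"secret-A", ("victim", 1): b"something else"}
print(other_process_has(victim, b"secret-A"))
print(other_process_has(victim, b"secret-B"))
```

In practice the attacker would need to guess an entire page exactly and to pick the signal out of ordinary memory churn, which is why the attack looks hard to exploit.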

It will come as no surprise to most LWN readers that software patents are an increasingly dense minefield that can derail free software projects. Unfortunately, it is the kind of problem that has no solution in the technical domain where such projects excel. The political arena is where any solution will have to come from, though there seems to be some hope that judicial opinions (like the Bilski decision) may limit the scope of the damage. It is a problem that we are likely to see more frequently until there is some kind of resolution.


A look at free software in Ecuador

December 3, 2008

This article was contributed by Marco Fioretti

I recently spoke at the Congress on Free Software and Democratization of Knowledge hosted in Quito by the Universidad Politecnica Salesiana of Ecuador. My general report about the conference and Free as in Freedom knowledge in that country is at the P2P Foundation blog; the trip, however, was also an excellent occasion to check out the most interesting Free Software projects currently taking place in Ecuador. It turns out that there is a lot of activity at the Government level to promote Free Software, and interesting news from some cool projects developed locally.

FOSS in the Government

A recent presidential decree mandates that most national Public Administrations migrate entirely to Free Software. Ing. Mario Albuja, head of the Subsecretariat for Information Technology of the Presidency of Ecuador, explained during the congress the reasons and the general guidelines of this initiative. Later on, I was able to get more details in a couple of meetings with the members of his staff. Among the most important efforts under way right now are the studies and tests for a Government digital signature application which runs on GNU/Linux and a unified document management system for 45 central Public Administrations. There is also a field trial of the GPL hospital management software Care2X in the works.

The initial implementation of the digital signature project, which uses Free Software whenever possible, is based on keys and digital certificates stored on SafeNet iKey 2032 USB tokens from Entrust. The first official field test will take place in the coming weeks, when President Correa himself will use one such key to sign a decree. The Certificate Authority infrastructure which will issue keys and certificates is the same one implemented by Banco Central del Ecuador in November 2007.

The software application itself runs inside any browser. A PostgreSQL backend stores all the documents, together with administrative metadata, on a CentOS-based server. The decrees waiting for electronic signature are presented to the user via a simple Apache/PHP front-end. The actual digital signature happens through a Java applet which reads the encrypted key from the USB token thanks to libraries provided by Entrust.

Another big step in the process of freeing Ecuadorian institutions from proprietary software will be the formal ratification of OpenDocument 1.0 by the Ecuadorian Institute of Standards (INEN). Large-scale usage of this format for public documents should take off right after that, around mid-2009.

All the public officials I talked with really believe in the potential of Free Software for a developing country like Ecuador. This only makes more relevant, and worthy of careful consideration, a comment I got from them: there is, they say, no coordination or common vision among the developers of the several FOSS applications they need to deploy. This was no surprise, of course: people at the Subsecretariat understand how FOSS development works. Nevertheless, the fact that there is no unified, local, reliable source for support, with predictable, if not guaranteed, response times, is creating more problems for them than they expected when they began. There may be quite a business opportunity here for local FOSS entrepreneurs.

Talking with hackers

Rafael Bonifaz told me what's new in the Elastix world. In case you have never heard of it, Elastix is a specialized GNU/Linux distribution born and (mostly) developed in Ecuador. Its goal is to solve all the communication problems of organizations of any size. Elastix integrates in one easy-to-administer package all you need to have PBX, VoIP, email, instant messaging, fax and fax/email gateway through Asterisk, Hylafax, Postfix and Openfire for Jabber. You can manage all the PBX functions with a customized version of freepbx. Other tools developed by the Elastix team provide hardware detection, centralized automatic configuration of phones and billing support with a2billing.

Elastix is doing great in Ecuador: RTS and Aerolineas Galapagos (Aerogal), which are respectively one of the most important TV channels and one of the main domestic airlines in Ecuador, are using it. In particular, Aerogal is running its call center on Elastix, which is also being deployed in the Ministry of Public Health.

Rafael, who is the current coordinator of the Elastix Community, is also proud of the fact that Elastix is the only GNU/Linux distribution for communications which has two manuals, totaling about five hundred pages, freely downloadable from the Internet: Elastix Without Tears [PDF] by Ben Sharif and Unified communications with Elastix [PDF] by Edgar Landivar. The second manual is still a beta version, currently available only in Spanish. There is already, however, a new mailing list devoted to coordinating all the translation efforts for this second book.

Also thanks to Rafael, after learning about Elastix I met a local group of Java developers who have very recently begun developing a new, interesting content management system called Melenti. Adrian Cadena, member of the Melenti team, explained to me that he and his partners needed a GPL-licensed, friendly, easy-to-use and fast CMS that could scale well from personal web pages to corporate portals. Another must on their requirement list was ease of integration with enterprise software (Java or not) for ERP, CRM and SAP services. That's why, three months ago, after some unsatisfactory experiences with the popular Joomla CMS, they started writing Melenti.

One of the main features of Melenti should be performance under high loads. Adrian said they are aiming for something able to handle hundreds of thousands of clicks per second, something which Joomla "simply could not handle, when we tried it". Melenti administrators, by contrast, should be able to configure load balancing without problems, thanks to an interface based on JNDI and other tools.

Melenti should run on any JEE infrastructure, from WebSphere to JBoss, BEA, Oracle AS, Tomcat, Jetty and more. According to Adrian, Melenti will also be much simpler to set up and extend than most other GPL software for Content Management. Installation should be as simple as dropping a .war file into your flavor of JEE container and following the steps of the graphical wizard which will pop up. Writing Melenti "gadgets", that is plugins, should also be easier than with Joomla, Drupal, PHP-Nuke and similar products. This is because, says Adrian, "unlike those products, Java has worldwide standards like Spring, JPA, JSF, GWT and so on: new developers can just take a look at the core Melenti API and start writing their own gadgets in no time".

The first releases of Melenti will support basic CMS functions like management of web pages, images and other files. There will also be interfaces for banner rotation, creation of user polls and a Web Services Creator. The latter is a simple wizard to create Web Services from existing Melenti gadgets. The first alpha version of Melenti has just been uploaded to SourceForge. You're obviously welcome to have a look at the code and to participate in the development of Melenti.

Let's now go back to the reason I went to Quito, that is, Free Software and Democratization of Knowledge. Quiliro Ordonez, with one friend and other occasional volunteers, is now implementing in the field a project first announced in 2007: placing Free Software in a school of the community of Quilapungo, south of Quito, which serves about 200 students. Thus far, Quiliro has installed 2 servers and 4 thin clients running gNewSense. He chose this distribution because it is "100% free software, without non-free repositories or blobs in the kernel which promote functionality before anything else, as this would weaken our position for freedom". He's also very happy with TCOS, which made setting up the thin clients a breeze. The school staff will use Proyecto Alba, a modular administration and planning software for schools first developed in Argentina. While gNewSense worked fine out of the box, Quiliro and his partners had to localize Alba to adapt it to the terminology and procedures adopted in Ecuadorian schools.

Eventually, the school in Quilapungo will have about 40 GNU/Linux workstations, but Quiliro doesn't plan to stop there. If all goes well, Quilapungo will be presented as a pilot project in a proposal for Free Software deployment in all public schools in Ecuador. Let's wish Quiliro good luck!


Page editor: Jake Edge