LWN.net Weekly Edition for November 12, 2009
Python moratorium and the future of 2.x
On November 9, Python BDFL ("Benevolent Dictator For Life") Guido van Rossum froze the Python language's syntax and grammar in their current form for at least the upcoming Python 2.7 and 3.2 releases, and possibly for longer still. This move is intended to slow things down, giving the larger Python community a chance to catch up with the latest Python 3.x releases.
The idea of freezing the language was originally proposed by Van Rossum in October on the python-ideas list and discussed on LWN. There are three primary arguments for the freeze, all described in the original proposal:
- Letting alternate implementations and IDEs catch up:
[...] frequent changes to the language cause pain for implementors of alternate implementations (Jython, IronPython, PyPy, and others probably already in the wings) at little or no benefit to the average user [...]
- Encouraging the transition to Python 3.x:
The main goal of the Python development community at this point should be to get widespread acceptance of Python 3000. There is tons of work to be done before we can be comfortable about Python 3.x, mostly in creating solid ports of those 3rd party libraries that must be ported to Py3k before other libraries and applications can be ported.
- Redirecting effort to the standard library and the CPython implementation:
Development in the standard library is valuable and much less likely to be a stumbling block for alternate language implementations. I also want to exclude details of the CPython implementation, including the C API from being completely frozen — for example, if someone came up with (otherwise acceptable) changes to get rid of the [Global Interpreter Lock] I wouldn't object.
The proposal turned into PEP 3003, "Python Language Moratorium", which is more definite about what cannot be changed:
- New built-ins
- Language syntax
The grammar file essentially becomes immutable apart from ambiguity fixes.
- General language semantics
The language operates as-is with only specific exemptions ...
- New __future__ imports
These are explicitly forbidden, as they effectively change the language syntax and/or semantics (albeit using a compiler directive).
Adding a new method to a built-in type will still be open for consideration, as will changing language semantics that turn out to be ambiguous or difficult to implement. Python's C API can be changed in any way that doesn't impose grammar or semantic changes, and the modules in the standard library are still fair game for improvement.
The duration of the freeze is given in the PEP as "a period of at least two years from the release of Python 3.1." Python 3.1 was released on June 27, 2009, so the freeze would extend until at least June 2011. Van Rossum later clarified the duration on python-dev, writing "In particular, the moratorium would include Python 3.2 (to be released 18-24 months after 3.1) but (unless explicitly extended) allow Python 3.3 to once again include language changes."
Most responses to the moratorium idea were favorable, but those who had objections felt those objections very strongly. Steven D'Aprano wrote:
A moratorium turns Python's conservativeness up to 11. If Python already has a reputation for being conservative in the features it accepts — and I think it does — then a moratorium risks giving the impression that Python has become the language of choice for old guys sitting on their porch yelling at the damn kids to get off the lawn. That's a plus for Cobol. I don't think it is a plus for Python.
The 2-to-3 transition
One of the reasons for the moratorium is the developers' increasing concern at the slow speed of the user community's transition away from Python 2.x. The moratorium thread led to a larger discussion of where Python 3.x stands.
Progress on the transition can be roughly measured by looking at the third-party packages available for Python 3.x. Only about 100 of the 8000 packages listed on the Python Package Index claim to be compatible with Python 3, and many significant packages have not yet been ported (Numeric Python, MySQLdb, PyGTK), making it impossible for users to port their in-house code or applications. Few Linux distributions have even packaged a Python 3.x release yet.
For the Python development community, it's tempting to nudge the users toward Python 3 by discouraging them from using Python 2. The Python developers have been dividing their attention between the 2.x and 3.x branches for a few years now, and a significant number of them would like to refocus their attention on a single branch. Given the slow uptake of Python 3, though, it's difficult to know when Python 2 development can stop. The primary suggestions in the recent discussion were:
- Declare Python 2.6 the last 2.x release.
- Declare Python 2.7 the last 2.x release.
- After Python 2.7, continue with a few more releases (2.8, 2.9, etc.).
- Declare the 3.x branch an experimental version, call it dead, and begin back-porting features to the 2.x branch.
Abandoning the 3.x branch had very few supporters. Retroactively declaring 2.6 the final release was also not popular, because people have been continuing to apply and backport improvements on the assumption that there was going to be a 2.7 release.
As Skip Montanaro phrased it:
If you want to accelerate release of 2.7 (fewer alphas, compressed schedule, etc) that's fine, but I don't think you can turn back the clock at this point and decree that 2.7 is dead.
A significant amount of work has already been committed to the 2.7 branch, as can be seen by reading "What's New in Python 2.7" or the more detailed NEWS file. New features include an ordered dictionary type, support for using multiple context managers in a single with statement, more accurate numeric conversions and printing, and several features backported from Python 3.1.
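As a quick illustration (this sketch is mine, not from the article), two of the 2.7 features mentioned above can be tried directly; both are also present in Python 3.1 and later:

```python
from collections import OrderedDict
import os
import tempfile

# An ordered dictionary remembers insertion order, unlike a plain 2.x dict.
d = OrderedDict()
d["first"] = 1
d["second"] = 2
d["third"] = 3
print(list(d.keys()))  # ['first', 'second', 'third']

# Multiple context managers in a single "with" statement; before 2.7 this
# required nested with blocks or contextlib.nested().
tmpdir = tempfile.mkdtemp()
path_a = os.path.join(tmpdir, "a.txt")
path_b = os.path.join(tmpdir, "b.txt")
with open(path_a, "w") as fa, open(path_b, "w") as fb:
    fa.write("hello")
    fb.write("world")
```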
Clearly a 2.7 release will happen, and release manager Benjamin Peterson's draft release schedule projects a 2.7 final release in June 2010. There's no clear consensus on whether to continue making further releases after 2.7. Post-2.7 releases could continue to bring 2.x and 3.x into closer compatibility and improve porting tools such as the 2to3 script, while keeping existing 2.x users happy with bugfixes and a few new features, but this work does cost effort and time. Brett Cannon stated his case for calling an end with 2.7:
Raymond Hettinger argued that imposing an end-of-life is unpleasant for users:
Hettinger is unmoved by the argument that maintaining 2.x takes up a lot of time, arguing that backporting a feature is relatively quick compared to the time required to implement it in the first place. He's also concerned that 3.x still needs more polishing, and concludes:
Assessment
Declaring such a long-term freeze on the language's evolution is a surprising step, and not one that developer groups often choose. Languages defined by an official standard, such as C, C++, or Lisp, are forced to evolve very slowly because of the slow standardization process, but Python is not so minutely specified. D'Aprano makes a good point that the developers are already pretty conservative; most suggestions for language changes are rejected. On the other hand, switching to Python 3.x is a big jump for users and book authors; temporarily halting further evolution may at least give them the sense they're not aiming for a constantly shifting target.
It's probably premature to call the transition to Python 3.x a failure, or even behind schedule. These transitions invariably take a lot of time and proceed slowly. Many Linux distributions have adopted Python for writing their administrative tools, making the interpreter critical to the release process. Distribution maintainers will therefore be very conservative about upgrading the Python version. It's a chicken-and-egg problem; third-party developers who stick to their distribution's packages can't use Python 3 yet, which means they don't port their code to Python 3, which gives distributions little incentive to package it. Eventually the community will switch, but it'll take a few years. The most helpful course for the Python developers is probably to demonstrate and document how applications can be ported to Python 3, as Martin von Löwis has done by experimentally porting Django to Python 3.x, and where possible get the resulting patches accepted by upstream.
It remains to be seen if a volunteer development group's efforts can be successfully redirected by declaring certain lines of development to be unwelcome. Volunteers want to work on tasks that are interesting, or amusing, or relevant to their own projects. The moratorium may lead to a perception that Python development is stalled, and developers may start up DVCS-hosted branches of Python that contain more radical changes, or move on to some other project that's more entertaining.
The nearest parallel might be the code freezes for versions 2.4 and 2.6 of the Linux kernel. The code freeze for Linux 2.4 was declared in December 1999, and 2.5.0 didn't open for new development until November 2001, nearly two years later. The long duration of the freeze led to a lot of pressure to bend the rules to get in one more feature or driver update.
Python's language freeze will be of similar length, and there may be similar pressure to slip in just one little change. However, freezing the language still leaves lots of room to improve the standard library and the CPython implementation, enhance developer tools, and explore other areas not covered by the moratorium. Perhaps these tasks are enough of an outlet for creative energy to keep people interested.
Switching desktops surprisingly painless
Busy days are not uncommon here at LWN, but Tuesdays and Wednesdays are particularly full with writing and editing tasks for the weekly edition. Any kind of computer problem is most unwelcome on such days, whether that is because your ISP has decided to route your packets to Borneo, or because some critical piece of the desktop goes out to lunch. So, trying to wrestle with a crash in Plasma, KDE 4's desktop shell, while under deadline pressure made for a rather stressful few hours.
The point here is not to bash on KDE, or Fedora, which is the distribution I tend to use, as the fault may well be my own. But, in the process of getting back to a working state, I made a surprising (to me, anyway) discovery: switching from KDE to GNOME was completely painless. The desktop wars would lead one to believe that there are such fundamental differences between the two dominant desktops that switching between them would be "entering a world of pain". Not so, at least for this relatively unsophisticated user.
Generally, I try to avoid dipping into the Fedora 10 update stream in the early part of any week, just for stability purposes. I also tend to ignore updates for several weeks at a time, which is part of what made the Plasma crash so surprising—I hadn't updated for roughly two weeks. It may well have been that I didn't restart the desktop after the last update; I often do yum update in some random Konsole and forget about it. So, when Plasma tried to restart, perhaps there was a newer version of it or of libraries it depended on.
Back in January, I wrote an article about the KDE 4 transition and, in it, noted that I had gone through the transition with few problems. In some ways, I was mystified by all of the uproar about KDE 4 as it more or less worked for me. Oddly, I think I now have a better perspective on what folks suffered at that time. If Plasma won't start, you don't really have a desktop; maybe there are various workarounds to switch workspaces or to particular windows (especially iconified windows) using just the keyboard, but I didn't have any time to figure that out.
Over the last year and a half or so—since switching to KDE 4—I had been having infrequent Plasma crashes. Roughly once a month, the desktop would briefly seize up, with the Panel disappearing and windows rearranging themselves, and then it would all come back. Sometimes, the KDE bug reporting tool would come up, but, since I don't run debug versions of the desktop applications, the tool made it clear that the reports weren't wanted. In fact, as I found out yesterday, reporting a bug through the tool requires having a bugs.kde.org login, an exceedingly annoying requirement that many users are unlikely to surmount.
This time, things were different, as Plasma went away and didn't come back. Since I had a lot of state in various workspaces (i.e. virtual desktops) that I didn't want to lose, I poked around on the net for ways to restart Plasma. None of those worked, so I saved off what I could and killed the X server with ctrl-alt-backspace, fully believing that logging in anew would clear up the problem. Not so; new logins would either just return to the login screen (after loading most of KDE based on the progress bar), or get to a screen with KDE bug tool reporting that Plasma had crashed.
So, it seemed like it might be time to upgrade or downgrade KDE to get back to a working state. I didn't have a lot of patience, so I just did:
    yum remove kde*
to get rid of an updates-testing version I had previously installed, followed by:
    yum groupinstall "KDE (K Desktop Environment)"
just using the standard F10 repositories. I was very sure that would fix the problem, but it was not to be. The same behavior was exhibited when trying to start Plasma.
For a moment, I was kind of at a loss, trying to figure out the optimal approach for tracking down the breakage and somehow downgrading to a working version, or deciding to switch to the laptop—ironically running Rawhide without any problems—for the next few days. At that point I realized there was another option: switching over to GNOME.
Now that probably is an obvious choice (in hindsight it certainly is), but when you are focused on a particular desktop, the alternative doesn't even enter your mind right away. Once I decided to do that, I immediately began to dread the amount of configuration and hassle that I expected it to take. I hadn't run GNOME in many years, perhaps as many as ten. I know lots of folks who run GNOME and had no reason to believe it was inferior in any way, but I did think, or led myself to believe, that it was different.
It took roughly five minutes of working with GNOME before I was largely unaware that I had switched. It may be that my use of the desktop is relatively minimal, though it is in use for many hours a day. I tend to have multiple Konsole windows (each with multiple tabs), a browser, the Gimp, claws-mail, and emacs up all the time, scattered around multiple workspaces. The only departures from standard GNOME were to move the menu bar to the bottom of the screen and to make windows auto-raise when the mouse pointer moves into them (something that has to be done for KDE as well).
I imagine I will revert to KDE once I resolve whatever problem Plasma is having (and it may be as simple as yum update in a day or two), but I certainly won't look at the competing desktops for Linux in quite the same way. For any hardcore advocates of one desktop or the other, I would seriously encourage giving the other a try for a few hours some day. You may very well be surprised at how little difference, in terms of actually getting work done, there really is.
There are certainly lessons here for both Fedora and KDE, even though the problem may be partially of my own doing. It is hard to see how removing KDE and reinstalling from the F10 repositories didn't clear up this problem, unless there is some configuration in my home directory that causes it. Perhaps it is something in the Plasmoids (I did fiddle with the World Clock widget an hour or two before the crashes started) or Panel widgets that I have running. It is not easy to tell, nor is it particularly easy to Google for.
Plasma should (obviously) try to be more robust, and provide some better idea of where things might be going wrong—and what to do about it—but, certainly KDE should not be placing barriers between its users and its bug reporting system. I suspect it is an attempt to reduce some kind of bug report spam, but requiring a username/password for some "random" site is something that will stop bug reports from lots of users, including me.
Fedora has been concerned recently about the stability of upgrades. A recent issue with a Thunderbird upgrade, which caused much pain for users, is just one of a number of Fedora upgrade woes. This particular Plasma problem may be another, and, if so, will likely cause the project to focus on the upgrade stability issue even more.
Software has bugs, but desktop environments—much like the kernel—occupy a special place. If those pieces stop working, there is little the user can do to work around that fact. Usually, downgrading the kernel to the previously functioning one is an option, and it may be for desktops as well, but another alternative that free software brings is to switch to a different desktop (or even kernel, FreeBSD anyone?) entirely. That's a freedom worth having.
Nepomuk: sharing application metadata
The Nepomuk project has the potential to unlock data from its originating application so that it can be used by other applications on the desktop. If Nepomuk becomes pervasive, history logs, bookmarks, file metadata, email, instant messages, photo tags, and other metadata will be shared between various desktop applications. Why should music metadata like track length or artist and song title be locked away in an index created and used exclusively by a music playing application?
Consider a download assistant such as kget. The Subversion branch of kget recently gained the ability to store its download history using Nepomuk. kget could already save the transfer history in XML or SQLite. The advantage of using Nepomuk is that other desktop tools can easily see where a file was downloaded from and when; the information is unlocked from just kget. With Nepomuk, other applications don't need to know where the SQLite file is, or find and parse an XML file. All of a sudden, the file manager can let you know where a file came from so you can easily return for newer versions, or a desktop search can reveal all the files downloaded last year from http://example.com.
To allow data to be stored, exchanged, and understood by many applications, Nepomuk uses the same underlying technology that the Semantic Web is designed around. The Semantic Web tries to separate data from presentation in a way that allows both humans and computers to inspect and digest the data. At the base of the Semantic Web is the Resource Description Framework (RDF), which aims to allow metadata to be exchanged in an unambiguous, machine-processing-friendly format.
There are many who dismiss the Semantic Web as an ivory tower pipe dream. Various concerns are cited as reasons that RDF will not be adopted: it takes extra time to generate RDF data, it allows for automated comparisons that will make companies uncomfortable, companies will never agree on which schemas to use, and so on.
Nepomuk and RDF have huge potential on the Free and Open Source Software (FOSS) desktop because application developers have no vested interest in locking their data away, and, due to the nature of free software, one can patch RDF and/or Nepomuk support into projects. The last concern, projects designing their own schemas, is still present for FOSS, but, luckily, schema mismatches are not in and of themselves a show stopper for RDF adoption. By definition, once data is in RDF it can be processed automatically by a computer, so the machine, rather than the human, can always work around schema differences.
RDF tries to capture information in the form of triples. The classic examples are relationships and ownership, for example: "Mary knows Mark" and "dog has tail". To avoid name clashes between things described in RDF, longer URL-style identifiers are used for the three pieces of information. To get back to smaller text strings for these URLs, prefixes are used in the style of XML namespaces. For example, foaf:name could be used for a human name, which expands to the URL http://xmlns.com/foaf/0.1/name. This way, individual things can still be described concisely, but they also have globally understood meaning. A foaf:name is a person's name, whereas a toolshed:name might name a screwdriver.
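The prefix mechanism can be sketched in a few lines of Python. This toy code is mine, not the Nepomuk or Soprano API; the foaf mapping is the real one, while "toolshed" is the made-up prefix from the example above:

```python
# Map short prefixes to their full namespace URLs, XML-namespace style.
PREFIXES = {
    "foaf": "http://xmlns.com/foaf/0.1/",
}

def expand(qname, prefixes=PREFIXES):
    """Expand a prefixed name like "foaf:name" to its full URL."""
    prefix, _, local = qname.partition(":")
    return prefixes[prefix] + local

print(expand("foaf:name"))  # http://xmlns.com/foaf/0.1/name
```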
Below is an example of using Nepomuk from the command line to create and list an RDF file:
$ sopranocmd --backend redland add \
    "<http://onto.libferris.com/things/1234>" "<http://onto.libferris.com/price>" "30"
$ sopranocmd --backend redland add \
    "<http://onto.libferris.com/things/1234>" "<http://onto.libferris.com/title>" \
    "super crazy magical item"
$ sopranocmd --backend redland list
<http://onto.libferris.com/things/1234> <http://onto.libferris.com/price> \
    "30"^^<http://www.w3.org/2001/XMLSchema#int> (empty)
<http://onto.libferris.com/things/1234> <http://onto.libferris.com/title> \
    "super crazy magical item"^^<http://www.w3.org/2001/XMLSchema#string> (empty)
Total results: 2
Execution time: 00:00:00.1
While an RDF repository can be used to just store, update, and query these triples, a schema can also be imposed so that applications know what to expect. For example, that the foaf:homepage is a link to a web site with certain constraints. Examples of constraints include the type of data stored (integer, date, etc), how many times a property can appear (only one homepage), and so on.
The SPARQL query language can be used to join together the triples and select the information of interest. While SPARQL uses familiar SQL keywords, like SELECT, WHERE, ORDER BY, and LIMIT, joining triples is a bit different than in SQL. For example, the query below grabs the price and title for "something". We don't particularly care what the something is, as long as the same something has a title and a price of less than 30.5.
SELECT ?title ?price
WHERE {
    ?x ns:price ?price .
    FILTER (?price < 30.5)
    ?x dc:title ?title .
}
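The join that this query performs can be mimicked with a toy in-memory triple store. This sketch is mine (plain Python, not the Soprano or Nepomuk API), with sample data made up to mirror the sopranocmd example; the point is that the two triple patterns share the variable ?x, so matching rows are joined on their common subject:

```python
# Each triple is (subject, predicate, object).
triples = [
    ("things/1234", "ns:price", 30),
    ("things/1234", "dc:title", "super crazy magical item"),
    ("things/5678", "ns:price", 99),
    ("things/5678", "dc:title", "expensive item"),
]

# Match the pattern "?x ns:price ?price" with FILTER (?price < 30.5) ...
prices = {s: o for (s, p, o) in triples if p == "ns:price" and o < 30.5}
# ... and the pattern "?x dc:title ?title".
titles = {s: o for (s, p, o) in triples if p == "dc:title"}

# Join on the shared subject ?x to produce the (title, price) bindings.
results = [(titles[s], prices[s]) for s in prices if s in titles]
print(results)  # [('super crazy magical item', 30)]
```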
With all this talk of RDF, triples, and ivory towers, one might think that using Nepomuk and RDF will be painful and have an extremely long learning curve. Below are a few examples of using Nepomuk in a KDE application to quell those fears. Nepomuk makes using RDF simple because it provides a code generator that makes native C++ classes to allow interaction with the RDF store:
Nepomuk::File f( "/home/foo/bar.txt" );
f.setAnnotation( "This is just a test file that contains nothing of interest." );
The above is much neater than thinking in terms of the triples shown below, which might be stored to represent it. In this case X will really be a persistent unique identifier for the file, similar to the device number and inode in the kernel. The type, file, etc. will of course be longer URIs in the real RDF store.
X type file
X url "/home/foo/bar.txt"
X annotation "This is just a..."
The above example, which uses setAnnotation(), takes advantage of a schema for annotating and tagging files that comes with Nepomuk itself. The kget program mentioned earlier in the article is a good example of not using a standard schema. In the sources of kget, the transferhistorystore.cpp file manages the XML, SQLite, and Nepomuk representations of download history. At the end of the transferhistorystore.cpp file, there is the following code:
void NepomukStore::saveItem(const TransferHistoryItem &item)
{
    Nepomuk::HistoryItem historyItem(item.source());
    historyItem.setDestination(item.dest());
    historyItem.setSource(item.source());
    historyItem.setState(item.state());
    historyItem.setSize(item.size());
    historyItem.setDateTime(item.dateTime());
}

void NepomukStore::deleteItem(const TransferHistoryItem &item)
{
    Nepomuk::HistoryItem historyItem(item.source());
    historyItem.remove();
    ...
The HistoryItem class is generated by Nepomuk using the custom schema file kget_history.trig, part of which is repeated below. While the schema language that kget_history.trig uses may be unfamiliar, it should still be clear that there is an ndho:HistoryItem which has properties of various types with various restrictions on them, such as a destination property which can appear zero or one times and is a string. Given the below schema file, Nepomuk can generate the C++ class Nepomuk::HistoryItem needed to allow the above C++ code to compile.
<http://nepomuk.kde.org/ontologies/2008/10/06/ndho> {
    ndho:HistoryItem
        a rdfs:Class ;
        rdfs:comment "A kget history item." ;
        rdfs:label "application" ;
        rdfs:subClassOf rdfs:Resource .

    ndho:destination
        a rdf:Property ;
        rdfs:comment "Destination of the download." ;
        rdfs:domain ndho:HistoryItem ;
        rdfs:label "source" ;
        rdfs:range xsd:string ;
        nrl:maxCardinality "1" .
    ...
At the base of the Nepomuk project is the Soprano library and its command-line tools, which depend only on QtCore, making them a useful RDF library for both desktop and mobile platforms. The Nepomuk libraries build on Soprano to make writing KDE applications that use RDF simple. One of the great things about the design of Soprano is that there are multiple backends which can store and query RDF. So there can be a memory-mapped implementation for a mobile device, or a full-blown database server for a LAN, and applications still use the same API.
For a long time Soprano has had two main backends: Redland and Sesame2. The former is a C library for RDF and the latter a Java implementation. Although Sesame2 is written in Java, it can deliver better query performance than Redland. This left KDE 4 in the predicament of requiring Java to achieve good RDF performance. To solve this issue, the new Virtuoso backend was created; it is now approaching stability. As I discovered recently, the main impediment to developing a backend for Soprano is implementing SPARQL.
Adoption remains the major hurdle for Nepomuk and Soprano. With the host of persistence options available, the first things that come to an application developer's mind when wanting to store and retrieve data might be flat files, MySQL, SQLite, Berkeley DB, or some generic relational database library. However, when storing data that might be of interest to other applications, using Nepomuk or Soprano has the potential to unlock an application's data. As can be seen above, the main thing to learn is a bit about the schema language; after that, native C++ objects can be used to interact with Nepomuk from an application.
Page editor: Jonathan Corbet
Inside this week's LWN.net Weekly Edition
- Security: What lessons can be learned from the iPhone worms?; New vulnerabilities in cups, drupal6, java, QtWebKit,...
- Kernel: Supporting transactions in btrfs; Another new ABI for fanotify; A futex overview and update.
- Distributions: Mandriva 2010; Moblin v2.1 released; Fedora's CC license changeover complete.
- Development: ]Project Open[ for Enterprise Project Management, GNOME 3.0 date, Google's new "Go" language, new versions of SQLite, CUPS, ForcePAD, SQL-Ledger, pymos, Firefox, IcedTea6, eGenix pyOpenSSL, ErrorHandler.
- Announcements: FSFE on EU interoperability, Cavium to acquire MontaVista, Samsung's bada smartphone, Bilski case, AU case on BitTorrent, Act on ACTA, CodePlex guidelines, Novell cuts workforce, FSFE grants, Linux-Kongress report, ELC cfp, SCALE cfp, CONFidence, Perl not dead.