November 11, 2009
This article was contributed by Andrew M. Kuchling
On November 9, Python BDFL ("Benevolent Dictator For Life") Guido van
Rossum froze
the Python language's syntax and grammar in their current form for at least the
upcoming Python 2.7 and 3.2 releases, and possibly for longer still.
This move is intended to slow things down, giving the larger Python
community a chance to catch up with the latest Python 3.x releases.
The idea of freezing the language was originally
proposed by Van Rossum in October on the python-ideas list and discussed on LWN. There
are three primary arguments for the freeze, all described in the
original proposal:
- Letting alternate implementations and IDEs catch up:
[...] frequent changes to the
language cause pain for implementors of alternate implementations
(Jython, IronPython, PyPy, and others probably already in the wings)
at little or no benefit to the average user [...]
- Encouraging the transition to Python 3.x:
The main goal of the Python development community at this point should
be to get widespread acceptance of Python 3000. There is tons of work
to be done before we can be comfortable about Python 3.x, mostly in
creating solid ports of those 3rd party libraries that must be ported
to Py3k before other libraries and applications can be ported.
- Redirecting effort to the standard library and the CPython implementation:
Development in the
standard library is valuable and much less likely to be a stumbling
block for alternate language implementations. I also want to exclude
details of the CPython implementation, including the C API from being
completely frozen — for example, if someone came up with (otherwise
acceptable) changes to get rid of the [Global Interpreter Lock]
I wouldn't object.
The proposal turned into PEP 3003, "Python
Language Moratorium", which is more definite about what cannot be
changed:
- New built-ins
- Language syntax
The grammar file essentially becomes immutable apart from ambiguity
fixes.
- General language semantics
The language operates as-is with only specific exemptions ...
- New __future__ imports
These are explicitly forbidden, as they effectively change the language
syntax and/or semantics (albeit using a compiler directive).
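To see why a `__future__` import counts as a language change, consider the minimal sketch below. It uses the `annotations` feature, which was added to Python long after the moratorium (it assumes a Python 3.7 or later interpreter) and is chosen only because its effect is easy to observe. The `SOURCE` string and the `annotations_of` helper are made up for this illustration. The same source text is compiled twice, once with the feature's compiler flag and once without, and the resulting function behaves differently:

```python
import __future__

# The same source text is compiled twice; only the compiler flag differs.
SOURCE = "def f(x: int): pass"

def annotations_of(flags):
    """Compile SOURCE with the given compiler flags and return f's annotations."""
    namespace = {}
    code = compile(SOURCE, "<example>", "exec", flags=flags, dont_inherit=True)
    exec(code, namespace)
    return namespace["f"].__annotations__

# Without the directive, the annotation is the int type object.
plain = annotations_of(0)

# With the directive's flag, the annotation is kept as the string 'int'.
deferred = annotations_of(__future__.annotations.compiler_flag)

print(plain)
print(deferred)
```

Writing `from __future__ import annotations` at the top of a module sets the same flag for that module, which is why the PEP treats such imports like any other change to syntax or semantics.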
Adding a new method to a built-in type will still be open for
consideration, and so is changing language semantics that turn out to be
ambiguous or difficult to implement. Python's C API can be changed in
any way that doesn't impose grammar or semantic changes, and the
modules in the standard library are still fair game for improvement.
The duration of the freeze is given in the PEP as "a period of at
least two years from the release of Python 3.1." Python 3.1 was
released on June 27, 2009, so the freeze would extend until at least
June 2011. Van Rossum later clarified
the duration on python-dev, writing "In particular, the moratorium
would include Python 3.2 (to be released 18-24 months after 3.1) but
(unless explicitly extended) allow Python 3.3 to once again include
language changes."
Most responses to the moratorium idea were favorable, but
those who had objections felt those objections very strongly.
Steven D'Aprano wrote:
A moratorium isn't cost-free. With the back-end free to change,
patches will go stale over 2+ years. People will lose interest or
otherwise move on. Those with good ideas but little patience will be
discouraged. I fully expect that, human nature being as it is, those
proposing a change, good or bad, will be told not to bother wasting
their time, there's a moratorium on at least as often as they'll be
encouraged to bide their time while the moratorium is on.
A moratorium turns Python's conservativeness up to 11. If Python
already has a reputation for being conservative in the features it
accepts — and I think it does — then a moratorium risks giving the
impression that Python has become the language of choice for old guys
sitting on their porch yelling at the damn kids to get off the
lawn. That's a plus for Cobol. I don't think it is a plus for Python.
The 2-to-3 transition
One of the reasons for the moratorium is the developers' increasing
concern at the slow speed of the user community's transition away from
Python 2.x. The moratorium thread led to a larger discussion of
where Python 3.x stands.
Progress on the transition can be roughly measured by looking at
the third-party packages available for Python 3.x. Only about 100 of
the 8000 packages listed on the
Python Package Index claim to be compatible with Python 3, and
many significant packages have not yet been ported (Numeric Python,
MySQLdb, PyGTK), making it impossible for users to port their in-house
code or applications. Few Linux distributions have even packaged a
Python 3.x release yet.
For the Python development community, it's tempting to nudge the
users toward Python 3 by discouraging them from using Python 2.
The Python developers have been dividing their attention between the
2.x and 3.x branches for a few years now, and a significant number of
them would like to refocus their attention on a single branch. Given
the slow uptake of Python 3, though, it's difficult to know when
Python 2 development can stop. The primary suggestions
in the recent discussion were:
- Declare Python 2.6 the last 2.x release.
- Declare Python 2.7 the last 2.x release.
- After Python 2.7, continue with a few more releases (2.8, 2.9, etc.).
- Declare the 3.x branch an
experimental version, call it dead, and begin back-porting features to
the 2.x branch.
Abandoning the 3.x branch had very few supporters. Retroactively
declaring 2.6 the final release was also not popular, because people
have been continuing to apply and backport improvements on the
assumption that there was going to be a 2.7 release.
As Skip Montanaro phrased it:
2.6.0 was released over a year ago and there has been no effort to
suppress bug fix or feature additions to trunk since then. If you
call 2.6 "the end of 2.x" you'll have wasted a year of work on 2.7
with about a month to go before the first 2.7 alpha release.
If you want to accelerate release of 2.7 (fewer alphas, compressed schedule,
etc) that's fine, but I don't think you can turn back the clock at this
point and decree that 2.7 is dead.
A significant amount of work has already been committed to the
2.7 branch, as can be seen by reading "What's New in
Python 2.7" or the more detailed NEWS file.
New features include an ordered dictionary type, support for using
multiple context managers in a single with statement,
more accurate numeric conversions and printing, and several features
backported from Python 3.1.
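As a rough illustration of two of those features, the sketch below uses `collections.OrderedDict` and a single with statement that manages two context managers. The `tracked` helper and the sample keys are invented for this example. The code should run on Python 2.7 as well as on 3.1 and later:

```python
from collections import OrderedDict
from contextlib import contextmanager

events = []

@contextmanager
def tracked(name):
    """Hypothetical context manager that records when it is entered and exited."""
    events.append("enter " + name)
    try:
        yield name
    finally:
        events.append("exit " + name)

# One with statement, two context managers: they are entered left to right
# and exited in the reverse order.
with tracked("a") as first, tracked("b") as second:
    events.append("body")

# An ordered dictionary remembers the order in which keys were inserted.
od = OrderedDict()
od["zebra"] = 1
od["apple"] = 2
od["mango"] = 3
keys = list(od)

print(events)
print(keys)
```

Before 2.7, the with statement had to be nested to get the same effect, and code that needed insertion order had to track it separately from the dictionary.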
Clearly a 2.7 release will happen, and release manager Benjamin Peterson's
draft release
schedule projects a 2.7 final release in June 2010. There's no
clear consensus on whether to continue making further releases after
2.7. Post-2.7 releases could continue to bring 2.x and 3.x into
closer compatibility and improve porting tools
such as the 2to3 script, while keeping
existing 2.x users happy with bugfixes and a few new features,
but this work does cost effort and time.
Brett Cannon stated his case for calling an end with 2.7:
[...] I think a decent number of us no longer want to maintain the 2.x
series. Honestly, if we go past 2.7 I am simply going to stop
backporting features and bug fixes. It's just too much work keeping so
many branches fixed.
Raymond Hettinger argued that imposing an end-of-life is
unpleasant for users:
I do not buy into the several premises that have arisen in this
thread. [First premise:] For 3.x to succeed, something bad has to
happen to 2.x. (which in my book translates to intentionally harming
2.x users, either through neglect or force, in order to bait them into
switching to 3.x).
Hettinger is unmoved by the argument that maintaining 2.x
takes up a lot of time, arguing that backporting a feature is
relatively quick compared to the time required to implement it
in the first place. He's also concerned that 3.x still needs
more polishing, and concludes:
In all these matters, I think the users should get a vote. And that vote
should be cast with their decision to stay with 2.x, or switch to 3.x, or
try to support both.
Assessment
Declaring such a long-term freeze on the language's evolution is a
surprising step, and not one that developer groups often choose.
Languages defined by an official standard, such as C, C++, or Lisp,
are forced to evolve very slowly because of the slow standardization
process, but Python is not so minutely specified. D'Aprano makes a good
point that the developers are already pretty conservative; most
suggestions for language changes are rejected. On the other hand,
switching to Python 3.x is a big jump for users and book authors;
temporarily halting further evolution may at least give them the sense
they're not aiming for a constantly shifting target.
It's probably premature to call the transition to Python 3.x a
failure, or even behind schedule. These transitions invariably take a
lot of time and proceed slowly. Many Linux distributions have adopted
Python for writing their administrative tools, making the interpreter
critical to the release process. Distribution maintainers will
therefore be very conservative about upgrading the Python version.
It's a chicken-and-egg problem; third-party developers who stick to
their distribution's packages can't use Python 3 yet, which means they
don't port their code to Python 3, which gives distributions little
incentive to package it. Eventually the community will switch, but
it'll take a few years. The most helpful course for the Python
developers is probably to demonstrate and document how applications
can be ported to Python 3, as Martin von Löwis has done by
experimentally porting Django to
Python 3.x, and where possible get the resulting patches accepted
by upstream.
It remains to be seen if a volunteer development group's efforts
can be successfully redirected by declaring certain lines of
development to be unwelcome. Volunteers want to work on tasks that
are interesting, or amusing, or relevant to their own projects. The
moratorium may lead to a perception that Python development
is stalled, and developers may start up DVCS-hosted branches of Python
that contain more radical changes, or move on to some other project
that's more entertaining.
The nearest parallel might be the code freezes for versions 2.4
and 2.6 of the Linux kernel. The code freeze for Linux 2.4 was
declared in December 1999, and 2.5.0 didn't open for new development until
November 2001, nearly two years later. The long duration of the freeze led
to a lot of pressure to bend the rules to get in one more feature or driver
update.
Python's code freeze will be of similar length and there may be
similar pressure to slip in just one little change. However,
freezing the language still leaves lots of room to improve the
standard library and the CPython implementation, enhance developer
tools, and explore other areas not covered by the moratorium. Perhaps
these tasks are enough of an outlet for creative energy to keep people
interested.
By Jake Edge
November 11, 2009
Busy days are not uncommon here at LWN, but Tuesdays and Wednesdays are
particularly full with writing and editing tasks for the weekly edition.
Any kind of computer problem is most unwelcome on such days, whether that is
because your ISP has decided to route your packets to Borneo, or because
some critical piece of the desktop goes out to lunch. So, trying to
wrestle with a Plasma—KDE 4's desktop shell—crash while under
deadline pressure made for a rather stressful few hours.
The point here is not to bash on KDE, or Fedora, which is the distribution I
tend to use, as the fault may well be my own. But, in the process of
getting back to a working state, I made a surprising (to me, anyway)
discovery: switching from KDE to GNOME was completely painless. The
desktop wars would lead one to believe that there are such fundamental
differences between the two dominant desktops that switching between them
would be "entering a world of pain". Not so, at least for this relatively
unsophisticated user.
Generally, I try to avoid dipping into the Fedora 10 update stream in the
early part of any week, just for stability purposes. I also tend to ignore
updates for several weeks at a time, which is part of what made the Plasma
crash so surprising—I hadn't updated for roughly two weeks. It may
well have been that I didn't restart the desktop after the last update; I
often run yum update in some random Konsole and forget about it.
So, when Plasma tried to restart, perhaps there was a newer version of it
or libraries it depended on.
Back in January, I wrote an article about the KDE 4
transition and, in it, noted that I had gone through the transition with
few problems. In some ways, I was mystified by all of the uproar about KDE
4 as it more or less worked for me. Oddly, I think I now have a better
perspective on what folks suffered at that time. If Plasma won't start,
you don't really have a desktop; maybe there are various workarounds to
switch workspaces or to particular windows (especially iconified windows)
using just the
keyboard, but I didn't have any time to figure that out.
Over the last year and a half or so—since switching to KDE 4—I
had been having infrequent Plasma crashes. Roughly once a month, the desktop
would briefly seize up, with the Panel disappearing and windows
rearranging themselves, and then it would all come back. Sometimes, the
KDE bug reporting tool would come up, but, since I don't run debug versions
of the desktop applications, the tool made it clear that the reports
weren't wanted. In fact, as I found out yesterday, reporting a bug through
the tool
requires having a bugs.kde.org login, which is an exceedingly
annoying—unlikely to be surmounted by many—requirement.
This time, things were different, as Plasma went away and didn't come
back. Since I had a lot of state in various workspaces (i.e. virtual
desktops) that I didn't want to lose, I poked around on the net for ways to
restart Plasma. None of those worked, so I saved off what I could and
killed the X server with ctrl-alt-backspace, fully believing that logging
in anew would clear up the problem. Not so; new logins would either just
return to the login screen (after loading most of KDE based on the progress
bar), or get to a screen with the KDE bug tool reporting that Plasma had crashed.
So, it seemed like it might be time to upgrade or downgrade KDE to get back
to a working state. I didn't have a lot of patience, so I just did:
yum remove kde*
to get rid of an
updates-testing version I had previously
installed, followed by:
yum groupinstall "KDE (K Desktop Environment)"
just using the standard F10 repositories. I was very sure that would fix
the problem, but it was not to be. The same behavior was exhibited when
trying to start Plasma.
For a moment, I was kind of at a loss, trying to figure out the optimal
approach for tracking down the breakage and somehow downgrading to a
working version, or deciding to switch to the laptop—ironically
running Rawhide without any problems—for the next few days. At that
point I realized there was another option: switching over to GNOME.
Now that probably is an obvious choice, in hindsight it certainly is, but
when you are focused on a particular desktop, the alternative doesn't even
enter your mind right away. Once I decided to do that, I immediately began
to dread the amount of configuration and hassle that I expected it to
take. I hadn't run GNOME in many years, perhaps as many as ten. I know
lots of folks who run GNOME and had no reason to believe it was inferior in
any way, but I did think, or led myself to believe, that it was
different.
It took roughly five minutes of working with GNOME before I was largely
unaware that I had switched. It may be that my use of the desktop is
relatively minimal, though it is in use for many hours a day. I tend to
have multiple Konsole windows (each with multiple tabs), a browser, the
Gimp, claws-mail, and emacs up all the time, scattered around multiple
workspaces. The only departures from standard GNOME were to move the menu
bar to the bottom of the screen and to make windows auto-raise when the
mouse pointer moves into them (something that has to be done for KDE as
well).
I imagine I will return to KDE once I resolve whatever problem Plasma
is having (and it may be as simple as yum update in a day or two),
but I certainly won't look at the competing desktops for Linux in quite the
same way. For any hardcore advocates of one desktop or the other, I would
seriously encourage giving the other a try for a few hours some day. You
may very well be surprised at how little difference, in terms of actually
getting work done, there really is.
There are certainly lessons here for both Fedora and KDE, even though the
problem may be partially of my own doing. It is hard to
see how removing KDE and reinstalling from the F10 repositories didn't
clear up this problem, unless there is some configuration in my home
directory that causes it. Perhaps it is something in the Plasmoids (I did
fiddle with the World Clock widget an hour or two before the crashes
started) or Panel widgets that I have running. It is not easy to tell, nor
is it particularly easy to Google for.
Plasma should (obviously) try to be more robust, and provide some better
idea of where things might be going wrong—and what to do about
it—but, certainly KDE should not be placing barriers between its
users and its bug reporting system. I suspect it is an attempt to reduce
some kind of bug report spam, but requiring a username/password for some
"random" site is something that will stop bug reports from lots of users,
including me.
Fedora has been concerned
recently about the stability of upgrades. A recent issue with a Thunderbird upgrade, which
caused much pain for users, is just one of a number of Fedora upgrade
woes. This particular Plasma problem may be another, and, if so, will
likely cause the project to focus on the upgrade stability issue even more.
Software has bugs, but desktop environments—much like the
kernel—occupy a special place. If those pieces stop working, there
is little the user can do to work around that fact. Usually, downgrading
the kernel to the previously functioning one is an option, and it may be
for desktops as well, but another alternative that free software brings is
to switch to a different desktop (or even kernel, FreeBSD anyone?)
entirely. That's a freedom worth having.
November 11, 2009
This article was contributed by Ben Martin
The Nepomuk project has the
potential to unlock the data from its originating application so that
it can be used by other applications on the desktop. If Nepomuk becomes
pervasive, history logs, bookmarks, file metadata, email,
instant messages, photo tags, or other metadata will be shared between
various desktop applications. Why should music metadata like track length
or artist and song title be locked away in an index created and used
explicitly by a music playing application?
Consider a download assistant such as kget. The subversion branch of
kget recently got the ability to store its download history using
Nepomuk. kget could already save the transfer history in XML or
SQLite. The advantage of using Nepomuk is that other desktop tools can
easily see where a file was downloaded from and when; the information
is unlocked from just kget. With Nepomuk, other applications don't need to know
where the SQLite file is, or find and parse an XML file. All of a
sudden, the file manager can let you know where this file came from so
you can easily return for newer versions, or a desktop search
can reveal all the files downloaded last year from http://example.com.
To allow data to be stored, exchanged, and understood by many
applications, Nepomuk uses the same underlying technology that the Semantic
Web is designed around. The Semantic Web tries to separate the
data from the presentation in a way that allows for both humans and
computers to inspect and digest the data. At the base of the Semantic
Web is the Resource Description Framework (RDF), which aims to allow
metadata to be exchanged in an unambiguous, machine-processing-friendly format.
There are many who dismiss the Semantic Web as an ivory tower pipe
dream. Various concerns are cited as reasons that RDF will not be
adopted: it takes extra time to generate RDF data, it allows for
automated comparisons which will make companies uncomfortable, and
there will be no agreement between companies on which schemas to use, etc.
Nepomuk and RDF have a huge potential on the Free and Open
Source Software (FOSS) desktop because application developers have no
vested interest in locking their data away, and due to the nature of
free software, one can patch RDF and/or Nepomuk support into projects. The
latter problem about projects designing their own schema is still
present for FOSS, but, luckily, schema
mismatches in and of themselves are not a show stopper for RDF
adoption. By definition, once data is in RDF it can be processed
automatically by a computer, so the machine, rather than the human, can
always work
around schema differences.
RDF tries to capture information in the form of triples.
The classic examples are relationships and ownerships, for
example: "Mary knows Mark" and "dog has tail". To avoid name clashes for
things described in RDF, longer URL style identifiers are used for the
three pieces of information. To get back to
smaller text strings for these URLs, prefixes are used in the style of XML
namespaces. For example, foaf:name could be used for a human name
which expands to the URL
http://xmlns.com/foaf/0.1/name. This way,
individual things can still be described concisely, but they should also
have globally
understood meaning. A foaf:name is a person's name, whereas a
toolshed:name might name a screwdriver.
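The prefix mechanism is simple enough to sketch in a few lines. The Python below is only an illustration of the idea, not part of Nepomuk or Soprano: the `PREFIXES` table, the `expand()` helper, the `toolshed` namespace URL, and the example.org person URLs are all assumptions made for the example. Only the FOAF namespace comes from the text above.

```python
# Prefix table in the style of XML namespaces.
PREFIXES = {
    "foaf": "http://xmlns.com/foaf/0.1/",
    "toolshed": "http://example.org/toolshed/",  # hypothetical namespace
}

def expand(name):
    """Expand a short prefix:local name into its full URL."""
    prefix, sep, local = name.partition(":")
    if not sep or prefix not in PREFIXES:
        raise KeyError("unknown prefix in %r" % name)
    return PREFIXES[prefix] + local

# A triple is just (subject, predicate, object), each a globally unique URL.
mary_knows_mark = (
    "http://example.org/people/mary",   # hypothetical subject
    expand("foaf:knows"),
    "http://example.org/people/mark",   # hypothetical object
)

print(expand("foaf:name"))
print(expand("toolshed:name"))
```

Both short names end in "name", but they expand to different URLs, so the two properties cannot be confused with each other.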
Below is an example of using sopranocmd, the command line tool from the
Soprano library that underlies Nepomuk, to create and list an RDF file:
$ sopranocmd --backend redland add \
"<http://onto.libferris.com/things/1234>" "<http://onto.libferris.com/price>" "30"
$ sopranocmd --backend redland add \
"<http://onto.libferris.com/things/1234>" "<http://onto.libferris.com/title>" \
"super crazy magical item"
$ sopranocmd --backend redland list
<http://onto.libferris.com/things/1234> <http://onto.libferris.com/price> \
"30"^^<http://www.w3.org/2001/XMLSchema#int> (empty)
<http://onto.libferris.com/things/1234> <http://onto.libferris.com/title> \
"super crazy magical item"^^<http://www.w3.org/2001/XMLSchema#string> (empty)
Total results: 2
Execution time: 00:00:00.1
While an RDF repository can be used to just store, update, and query
these triples, a schema can also be imposed so that
applications know what to expect. For example, a schema might state that foaf:homepage
is a link to a web site and place certain
constraints on it. Examples of constraints include the type of data
stored (integer, date, etc.), how many times a property can appear
(only one homepage), and so on.
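A schema check of this kind can be pictured as a small validation pass over the triples. The sketch below is an assumption-laden toy, not how Nepomuk or any RDF store enforces schemas: the `CONSTRAINTS` table, the `violations()` function, the `ex:` names, and the sample data are all invented for the illustration.

```python
# Hypothetical constraints: expected Python type and maximum number of
# values per subject for each property.
CONSTRAINTS = {
    "foaf:homepage": {"type": str, "max_count": 1},
    "ex:age": {"type": int, "max_count": 1},
}

def violations(triples, constraints):
    """Return (subject, property, problem) for every constraint that is broken."""
    problems = []
    counts = {}
    for subject, prop, value in triples:
        rule = constraints.get(prop)
        if rule is None:
            continue  # properties without a rule are unconstrained
        if not isinstance(value, rule["type"]):
            problems.append((subject, prop, "wrong type"))
        counts[(subject, prop)] = counts.get((subject, prop), 0) + 1
    for (subject, prop), seen in sorted(counts.items()):
        if seen > constraints[prop]["max_count"]:
            problems.append((subject, prop, "too many values"))
    return problems

data = [
    ("ex:mary", "foaf:homepage", "http://example.org/mary"),
    ("ex:mary", "foaf:homepage", "http://example.org/mary2"),  # second homepage
    ("ex:mark", "ex:age", "thirty"),                            # not an integer
]

found = violations(data, CONSTRAINTS)
for problem in found:
    print(problem)
```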
The SPARQL query
language can be used to join together the triples and select the
information of interest. While SPARQL uses familiar SQL-like keywords such as
SELECT, WHERE, ORDER BY, and LIMIT, joining
triples is a bit different than with SQL. For example, the query below grabs
the price and title for "something". We don't particularly care what
the something is, as long as the same something has a title and a price
of less than 30.5.
SELECT ?title ?price
WHERE { ?x ns:price ?price .
FILTER (?price < 30.5)
?x dc:title ?title . }
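The join happens through the shared variable: both patterns mention ?x, so only triples about the same subject are paired up. As a rough mental model (and not a description of how any SPARQL engine is implemented), the same selection can be written over an in-memory list of triples in Python. The first item below is taken from the sopranocmd example above, the second item is hypothetical, and the libferris price and title properties stand in for the query's ns:price and dc:title.

```python
BASE = "http://onto.libferris.com/"
PRICE = BASE + "price"
TITLE = BASE + "title"
THING = BASE + "things/1234"
OTHER = BASE + "things/5678"  # hypothetical second item

triples = [
    (THING, PRICE, 30),
    (THING, TITLE, "super crazy magical item"),
    (OTHER, PRICE, 45),
    (OTHER, TITLE, "hypothetical pricier item"),
]

# Pattern 1: ?x price ?price, with FILTER (?price < 30.5).
# Pattern 2: ?x title ?title, where ?x must be the same subject.
results = [
    (title, price)
    for (x, p1, price) in triples
    if p1 == PRICE and price < 30.5
    for (y, p2, title) in triples
    if p2 == TITLE and y == x
]

print(results)
```

Only the first item survives the filter, and its title is found by matching on the shared subject rather than by naming a table or a join column.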
With all this talk of RDF, triples, and ivory towers, one might
think that using Nepomuk and RDF will be painful and have an extremely long
learning curve. Below are a few examples of using Nepomuk in a KDE
application to quell those fears. Nepomuk makes using RDF simple
because it provides a code generator that makes native C++ classes to
allow interaction with the RDF store:
Nepomuk::File f( "/home/foo/bar.txt" );
f.setAnnotation( "This is just a test file that contains nothing of interest." );
The above is much neater than thinking in terms of the triples shown below which might
be stored to represent it. In this case X will really be a persistent unique
identifier used to identify the file, similar to the device number and
inode in the kernel. The type, file, etc. will of course be longer URIs in the
real RDF store.
X type file
X url "/home/foo/bar.txt"
X annotation "This is just a..."
The above example which uses setAnnotation() takes advantage of a
schema for annotating and tagging files which comes with Nepomuk
itself. The kget program mentioned earlier in the article is a good
example of not using a standard schema. In the sources of kget,
the transferhistorystore.cpp
file manages the XML, SQLite, and Nepomuk representations of download
history. At the end of the transferhistorystore.cpp file, there is the
following code:
void NepomukStore::saveItem(const TransferHistoryItem &item)
{
Nepomuk::HistoryItem historyItem(item.source());
historyItem.setDestination(item.dest());
historyItem.setSource(item.source());
historyItem.setState(item.state());
historyItem.setSize(item.size());
historyItem.setDateTime(item.dateTime());
}
void NepomukStore::deleteItem(const TransferHistoryItem &item)
{
Nepomuk::HistoryItem historyItem(item.source());
historyItem.remove();
...
The HistoryItem class is generated by Nepomuk using the custom schema
file kget_history.trig,
part of which is repeated below. While the schema language that
kget_history.trig is using may be unfamiliar, it should
still be clear that there is a ndho:HistoryItem which has
properties of various types with various restrictions on them, such as
a destination property which can appear zero or one times and is a
string. Given the schema file below, Nepomuk can generate the C++ class
Nepomuk::HistoryItem needed to allow the above C++ code to compile.
<http://nepomuk.kde.org/ontologies/2008/10/06/ndho> {
ndho:HistoryItem
a rdfs:Class ;
rdfs:comment "A kget history item." ;
rdfs:label "application" ;
rdfs:subClassOf rdfs:Resource .
ndho:destination
a rdf:Property ;
rdfs:comment "Destination of the download." ;
rdfs:domain ndho:HistoryItem ;
rdfs:label "source" ;
rdfs:range xsd:string ;
nrl:maxCardinality "1" .
...
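The idea of turning a schema into a class with typed setters can be sketched in a few lines of Python. This is only an analogy for what a code generator does, not Nepomuk's generator and not its output: the `generate_class()` function is invented here, the property table is hand-written rather than parsed from the .trig file, and the integer type for size is an assumption (the schema excerpt above only shows destination as a string).

```python
def generate_class(class_name, properties):
    """Build a class with one type-checked setXxx() method per property."""
    def __init__(self, uri):
        self.uri = uri
        self.values = {}

    def make_setter(prop, expected_type):
        def setter(self, value):
            if not isinstance(value, expected_type):
                raise TypeError("%s must be %s" % (prop, expected_type.__name__))
            # Storing a single value per property mirrors a maximum
            # cardinality of 1: a new value replaces the old one.
            self.values[prop] = value
        return setter

    namespace = {"__init__": __init__}
    for prop, expected_type in properties.items():
        method_name = "set" + prop[0].upper() + prop[1:]
        namespace[method_name] = make_setter(prop, expected_type)
    return type(class_name, (object,), namespace)

# Hypothetical property table standing in for the parsed schema.
HistoryItem = generate_class("HistoryItem", {"destination": str, "size": int})

item = HistoryItem("http://example.com/file.tar.gz")
item.setDestination("/home/foo/file.tar.gz")
item.setSize(1024)
print(item.values)
```

The calling code never mentions triples; the generated setters are the only place where the schema's types and cardinalities are applied.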
At the base of the Nepomuk project is the Soprano library and command
line tools which depend only on QtCore, making them a useful RDF
library for use on both desktop and mobile platforms. The Nepomuk
libraries build on Soprano to make writing KDE applications using RDF
simple. One of the great things about the design of Soprano is that
there are multiple backends which can store and query RDF. So there can
be a memory mapped implementation for a mobile device, or a full-blown
database server for a LAN, and applications still use the same
API.
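The value of that design is easiest to see in miniature. The sketch below is not Soprano's API; it is a hypothetical Python analogy in which the `TripleStore` interface, the two backends, and the `record_download()` function are all invented for the illustration. One backend keeps triples in a list, the other in an SQLite database (in memory here), and the application code is identical for both.

```python
import sqlite3
from abc import ABC, abstractmethod

class TripleStore(ABC):
    """Hypothetical storage interface shared by every backend."""
    @abstractmethod
    def add(self, subject, predicate, obj):
        pass

    @abstractmethod
    def triples(self):
        pass

class MemoryStore(TripleStore):
    """Keeps triples in a plain Python list."""
    def __init__(self):
        self._triples = []

    def add(self, subject, predicate, obj):
        self._triples.append((subject, predicate, obj))

    def triples(self):
        return list(self._triples)

class SqliteStore(TripleStore):
    """Keeps triples in an SQLite table."""
    def __init__(self, path=":memory:"):
        self._db = sqlite3.connect(path)
        self._db.execute(
            "CREATE TABLE IF NOT EXISTS triples (s TEXT, p TEXT, o TEXT)")

    def add(self, subject, predicate, obj):
        self._db.execute("INSERT INTO triples VALUES (?, ?, ?)",
                         (subject, predicate, obj))
        self._db.commit()

    def triples(self):
        rows = self._db.execute("SELECT s, p, o FROM triples ORDER BY rowid")
        return [tuple(row) for row in rows]

def record_download(store, url, destination):
    """Application code: it only knows about the TripleStore interface."""
    store.add(url, "destination", destination)

for backend in (MemoryStore(), SqliteStore()):
    record_download(backend, "http://example.com/a.iso", "/home/foo/a.iso")
    print(type(backend).__name__, backend.triples())
```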
For a long time Soprano has had two main
backends: Redland and Sesame2. The
former is a C library for RDF and the latter a Java implementation.
Although Sesame2 is written in Java, it can deliver
better query performance than Redland. This left KDE4 in the
predicament that it required Java to achieve good RDF performance. To
solve this issue, the new Virtuoso backend was
created and is getting to the point where it is now stable.
As I discovered recently, the main impediment to developing a backend for
Soprano is implementing SPARQL.
Adoption still remains the major hurdle for Nepomuk and Soprano.
With the host of persistence options available, the first thing that
comes to an application developer's mind when wanting to store and retrieve
data might be flat files, MySQL, SQLite, Berkeley DB, or some
generic relational database library. However, when storing data that might be of interest to
other applications, using Nepomuk or Soprano has the potential to
unlock an application's data. As can be seen above,
the main thing to learn is a bit about the schema language and
then native C++ objects can be used to interact with Nepomuk
from an application.
Page editor: Jonathan Corbet