Python finally offloads some batteries
Python has often been touted as a "batteries included" language because of its rich standard library that provides access to numerous utility modules and is distributed with the language itself. But those libraries need maintenance, of course, and that is provided by the Python core development team. Over the years, it has become clear that some of the modules are not really being maintained any longer and they probably are not really needed by most Python users—either because better alternatives exist or because they address extremely niche use cases. A long-running project to start the removal of those modules has recently been approved.
A 2018 Python Language Summit session was the start of Christian Heimes's quest to unplug some of the old batteries in the standard library. That discussion led to the first draft of PEP 594 ("Removing dead batteries from the standard library") in May 2019. It listed more than two dozen modules, scattered across the standard library, to consider for removal.
The PEP has been floating around in Python space since that time; in general, core developers have been favorably inclined toward the idea, though deciding exactly which modules would be removed was always difficult. The process of removing a module from the standard library starts with deprecation for two release cycles, followed by the actual removal. But the project has struggled with how to handle deprecations in the language over the last few years, as our Python article index entry shows.
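As a rough illustration of what the deprecation phase looks like from the user's side, the sketch below emits a DeprecationWarning from a stand-in function and then collects it. The names legacy_helper() and collect_deprecations() are hypothetical and exist only for this example; nothing here is taken from the PEP or from CPython itself.

```python
import warnings


def legacy_helper():
    """Hypothetical stand-in for an API that is scheduled for removal."""
    warnings.warn(
        "legacy_helper() is deprecated and slated for removal",
        DeprecationWarning,
        stacklevel=2,  # attribute the warning to the caller, not this function
    )
    return 42


def collect_deprecations(func):
    """Call func() and return (result, list of DeprecationWarning messages)."""
    with warnings.catch_warnings(record=True) as caught:
        # The default filters ignore DeprecationWarning outside of __main__,
        # so ask for every warning to be recorded inside this block.
        warnings.simplefilter("always")
        result = func()
    messages = [
        str(w.message)
        for w in caught
        if issubclass(w.category, DeprecationWarning)
    ]
    return result, messages
```

Because those warnings are hidden by default in most contexts, projects that want early notice typically enable them in their test runs, for example with the interpreter's -W option.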
Revival
Discussion of the PEP has occurred sporadically in a thread on the Python discussion forum since Heimes first posted the PEP there in 2019. In early February, the PEP was revived by new co-author Brett Cannon in a new forum post. Cannon said that he expected to propose it for a decision by the steering council (of which he is a member) soon, "barring any major objections that come up in this topic". As can be seen in the update history section of the PEP, the list of modules to be removed has evolved over time.
The current version removes 21 separate modules that are described in the PEP abstract as: "mostly historic data formats (e.g. Commodore and SUN file formats), APIs and operating systems that have been superseded a long time ago (e.g. Mac OS 9), or modules that have security implications and better alternatives (e.g. password and login)."
The full list of modules that would be removed in the PEP is as follows:
| Type | Modules |
|---|---|
| Data encoding | uu (and the associated uu codec) and xdrlib |
| Multimedia | aifc, audioop, chunk, imghdr, ossaudiodev, sndhdr, and sunau |
| Networking | asynchat, asyncore, cgi, cgitb, smtpd, nntplib, and telnetlib |
| OS interface | crypt, nis, and spwd |
| Miscellaneous | msilib and pipes |
Comparing that table with the one in our article on the introduction of the PEP shows that the broad strokes are the same, but the details have changed somewhat. The removals were meant to be largely non-controversial, so if good reasons to keep a module were raised (and the maintenance burden was low), it was retained. The list is also different because some of the modules have already been removed. Modules that were considered for removal along the way, but retained (at least for now), are also described in the PEP along with the reasons behind keeping them. One of the more amusing justifications is the one given for keeping wave, which handles the WAV sound-file format:
According to David Beazley the wave module is easy to teach to kids and can make crazy sounds. Making a computer generate sounds is a powerful and highly motivating exercise for a nine-year-old aspiring developer. It’s a fun battery to keep.
The wave module also provides an example of the kind of work that remains to be done if the modules are removed; wave relies on the audioop module that is being removed:
The module uses one simple function from the audioop module to perform byte swapping between little and big endian formats. Before 24 bit WAV support was added, byte swap used to be implemented with the array module. To remove wave’s dependency on audioop, the byte swap function could be either be moved to another module (e.g. operator) or the array module could gain support for 24-bit (3-byte) arrays.
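To give a sense of how small that dependency is, here is a minimal pure-Python sketch of a byte swap that works for any sample width, including the 3-byte samples of 24-bit WAV data. The byteswap() helper is a hypothetical illustration; it is neither of the two options that the PEP mentions, and it is not presented as the approach the core developers chose.

```python
def byteswap(fragment, width):
    """Reverse the byte order of every `width`-byte sample in `fragment`."""
    if width < 1:
        raise ValueError("width must be a positive number of bytes")
    if len(fragment) % width:
        raise ValueError("fragment is not a whole number of samples")
    swapped = bytearray(len(fragment))
    # Byte k of each swapped sample is byte (width - 1 - k) of the original,
    # so each pass copies one byte position for all samples at once.
    for k in range(width):
        swapped[k::width] = fragment[width - 1 - k::width]
    return bytes(swapped)
```

Swapping twice returns the original data, which makes a convenient sanity check for any replacement.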
A few of the to-be-removed modules were actually deprecated long ago—even more modules had been proposed for deprecation in two now-inactive PEPs—while the bulk of the modules would be deprecated for the upcoming Python 3.11 (due in October) and potentially removed in Python 3.13 (due in October 2024). Three modules, asynchat, asyncore, and smtpd, would be removed in Python 3.12 in 2023. This would be the biggest upheaval in the standard library for quite a long time, if not in its history.
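Projects that want to know whether they are affected could scan their source for imports of the listed modules. The following is a minimal sketch under a few assumptions: the module names are copied from the PEP's removal list as quoted above, the function name find_dead_battery_imports() is made up for this example, and only static import statements are detected (dynamic imports through importlib or __import__() are missed).

```python
import ast

# Module names taken from the PEP 594 removal list quoted in this article.
DEAD_BATTERIES = frozenset({
    "aifc", "asynchat", "asyncore", "audioop", "cgi", "cgitb", "chunk",
    "crypt", "imghdr", "msilib", "nis", "nntplib", "ossaudiodev", "pipes",
    "smtpd", "sndhdr", "spwd", "sunau", "telnetlib", "uu", "xdrlib",
})


def find_dead_battery_imports(source, filename="<string>"):
    """Return sorted (line number, module name) pairs for affected imports."""
    hits = []
    for node in ast.walk(ast.parse(source, filename=filename)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif (isinstance(node, ast.ImportFrom)
              and node.level == 0 and node.module):
            # Absolute "from X import Y"; relative imports are skipped
            # because they refer to the project's own packages.
            names = [node.module]
        else:
            continue
        for name in names:
            top_level = name.split(".")[0]
            if top_level in DEAD_BATTERIES:
                hits.append((node.lineno, top_level))
    return sorted(hits)
```

Running it over each file in a source tree (reading the text and passing it as source) gives a quick inventory of what would need a replacement or a vendored copy.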
The discussion thread on the PEP revival had relatively few comments, mostly corrections or clarifications, but there were a few muted complaints about some of the choices. The PEP does not specify what will happen to the modules after they are removed; the code will obviously still be available, so interested users could create Python Package Index (PyPI) packages or incorporate parts into other projects. "Vendoring" some pieces, by copying the code directly into an affected project (e.g. into the Roundup Issue Tracker), is another possibility.
On February 16, Cannon submitted the PEP to the steering council and on March 11, Gregory P. Smith accepted the PEP on behalf of the council. There were a few suggestions from the council as part of the acceptance, starting with backporting the deprecation notices into the module documentation for earlier—but still active—versions of the language, so that more developers will be aware of the upcoming removals.
In addition, the council asked that care be taken during the alpha and beta parts of the release cycle to ensure that there were no problems being caused. "If it turns out the removal of a module proves to be a problem in practice despite the clear deprecation, deferring the removal of that module should be considered to avoid disruption." We saw just that kind of deferral back in February when deprecated portions of two modules were causing problems for Fedora.
While Smith said that the council expects this kind of mass-removal event to be a one-time thing, it does mean that more ongoing attention should be paid to the contents of the standard library:
Doing a “mass cleanup” of long obsolete modules is a sign that we as a project have been ignoring rather than maintaining parts of the standard library, or not doing so with the diligence being in the standard library implies they deserve. Resolving ongoing discussions around how we define the stdlib for the long term does not block this PEP. It seems worthwhile for us to conduct regular reviews of the contents of the stdlib every few releases so we can avoid accumulating such a large pile of dead batteries, but this is outside the scope of this particular PEP.
urllib too?
At roughly the same time Cannon revived PEP 594, Victor Stinner was proposing the deprecation (and eventual removal) of the urllib module on the python-dev mailing list. As its name would imply, urllib is for handling URLs, but it does quite a bit more than just that. In his lengthy message, Stinner described a number of problems that he sees with the module, including a complicated API, many better alternatives, no support for HTTP/2 or HTTP/3, and a lack of maintenance, with lots of open issues, including some security issues.
There are four different sub-modules for urllib, with urllib.request, for opening URLs, and urllib.parse, for parsing URLs, as the most prominent and widely used of them. Stinner proposed deprecating all four, but, as he recognized, deprecating all, or even parts, of urllib is going to be controversial. It is likely going to be an uphill battle (and require a PEP of its own) as the discussion showed.
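For readers who have not used them directly, the sketch below shows the kind of simple task each of those two sub-modules handles. The helper names describe_url() and build_request() and the example.org URL are invented for illustration; the request object is only constructed here, never sent, so the code does not touch the network.

```python
from urllib.parse import parse_qs, urlsplit
from urllib.request import Request


def describe_url(url):
    """Break a URL into its main components using urllib.parse."""
    parts = urlsplit(url)
    return {
        "scheme": parts.scheme,
        "host": parts.hostname,
        "port": parts.port,
        "path": parts.path,
        "query": parse_qs(parts.query),  # maps each key to a list of values
    }


def build_request(url, user_agent="example-client"):
    """Prepare (but do not send) a GET request with urllib.request."""
    return Request(url, headers={"User-Agent": user_agent})
```

Passing the prepared object to urllib.request.urlopen() would actually perform the fetch; that step is what Moore's "simple situations" remark refers to, and it is also where most of the complexity Stinner describes lives.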
There were a number of objections raised, including Dong-hee Na's concern about the use of urllib in the pip PyPI-package installer. While Stinner thought that perhaps retaining urllib.parse would be sufficient for pip, Damian Shaw disagreed, noting that pip is dependent on parts of urllib.request as well. Beyond that, Shaw said that some of the alternative libraries mentioned by Stinner rely on parts of urllib too.
Paul Moore was strongly against the proposal; he said that while use of urllib.request was not a best practice, it is "still extremely useful for simple situations". Beyond that, the standard library itself uses parts of urllib heavily and dependencies on modules outside the standard library are unsuitable in some domains; he was concerned about pip as well:
[...] pip relies pretty heavily on urllib (parse and request), and pip has a bootstrapping issue, so using 3rd party libraries is non-trivial. Also, of pip's existing vendored dependencies, webencodings, urllib3, requests, pkg_resources, packaging, html5lib, distlib and cachecontrol all import urllib. So this would be *hugely* disruptive to the whole packaging ecosystem (which is under-resourced at the best of times, so this would put a lot of strain on us).
No one really directly refuted Stinner's contentions about the problems with urllib; the complaints and concerns were about removing it without adequate replacement in the standard library itself. Heimes generally agreed that the problems are real:
The urllib package -- and to some degree also the http package -- are constant source of security bugs. The code is old and the parsers for HTTP and URLs don't handle edge cases well. Python core lacks a true maintainer of the code. To be honest, we have to admit defeat and be up front that urllib is not up to the task for this decade. It was designed [and] written during a more friendly, less scary time on the internet.
He also said that if he "had the power and time", he would replace urllib with a simpler HTTP client that used the services provided by the underlying operating systems. For more complex uses, there are other options available in PyPI. Another possibility would be to "reduce the feature set of urllib to core HTTP (no ftp, proxy, HTTP auth)" coupled with a partial rewrite to make the other remaining pieces more standards-compliant and simpler.
Several were in favor of either of those options, though Smith felt that even the simplification options would cause "disruption to the world and loss of trust in Python". There are, it seems, lots of good reasons to keep urllib around, but the question of maintenance remains. No one volunteered to take on urllib and address some of its obvious problems, though Senthil Kumaran said that urllib.parse "is semi-maintained".
Given all of that, users of urllib can be pretty confident it will be around for quite a bit longer, perhaps forever. But the maintenance problem needs to be addressed somehow, given that urllib interacts with the internet and all of the inherent messiness and danger that comes along for the ride. With luck, the regular re-evaluation of the contents of the standard library that the steering council recommended will also find ways to identify and address its maintenance holes. If not, it would seem that there are some ticking time bombs lurking there.
