
Python finally offloads some batteries

By Jake Edge
March 16, 2022

Python has often been touted as a "batteries included" language because of its rich standard library that provides access to numerous utility modules and is distributed with the language itself. But those libraries need maintenance, of course, and that is provided by the Python core development team. Over the years, it has become clear that some of the modules are not really being maintained any longer and they probably are not really needed by most Python users—either because better alternatives exist or because they address extremely niche use cases. A long-running project to start the removal of those modules has recently been approved.

A 2018 Python Language Summit session was the start of Christian Heimes's quest to unplug some of the old batteries in the standard library. That discussion led to the first draft of PEP 594 ("Removing dead batteries from the standard library") in May 2019. It listed more than two dozen modules, scattered across the standard library, to consider for removal.

The PEP has been floating around in Python space since that time; in general, core developers have been favorably inclined toward the idea, though deciding exactly which modules would be removed was always difficult. The process of removing a module from the standard library starts with deprecation for two release cycles, followed by the actual removal. But the project has struggled with how to handle deprecations in the language over the last few years, as our Python article index entry shows.

Revival

Discussion of the PEP had occurred sporadically in a thread on the Python discussion forum ever since Heimes first posted the PEP there in 2019. In early February, the PEP was revived by new co-author Brett Cannon in a new forum post. Cannon said that he expected to propose it for a decision by the steering council (of which he is a member) soon, "barring any major objections that come up in this topic". As can be seen in the update history section of the PEP, the list of modules to be removed has evolved over time.

The current version removes 21 separate modules that are described in the PEP abstract as: "mostly historic data formats (e.g. Commodore and SUN file formats), APIs and operating systems that have been superseded a long time ago (e.g. Mac OS 9), or modules that have security implications and better alternatives (e.g. password and login)." The full list of modules that would be removed in the PEP is as follows:

Type            Modules
Data encoding   uu (and the associated uu codec) and xdrlib
Multimedia      aifc, audioop, chunk, imghdr, ossaudiodev, sndhdr, and sunau
Networking      asynchat, asyncore, cgi, cgitb, smtpd, nntplib, and telnetlib
OS interface    crypt, nis, and spwd
Miscellaneous   msilib and pipes
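Checking whether a project uses any of these modules can be roughly automated. The following is a minimal sketch, not a tool from the PEP: it walks a file's syntax tree with the standard ast module and reports static imports of the 21 names in the table. The PEP594_MODULES set and the find_dead_battery_imports() helper are illustrative names; dynamic imports (importlib, __import__) and uses of the uu codec through the codecs module are not detected.

```python
import ast

# The 21 modules in PEP 594's removal table, as summarized above.
PEP594_MODULES = frozenset({
    "uu", "xdrlib",
    "aifc", "audioop", "chunk", "imghdr", "ossaudiodev", "sndhdr", "sunau",
    "asynchat", "asyncore", "cgi", "cgitb", "smtpd", "nntplib", "telnetlib",
    "crypt", "nis", "spwd",
    "msilib", "pipes",
})


def find_dead_battery_imports(source: str) -> list[tuple[int, str]]:
    """Return sorted (line number, module) pairs for static imports of PEP 594 modules."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.level == 0 and node.module:
            # Relative imports ("from . import uu") refer to local packages; skip them.
            names = [node.module]
        else:
            continue
        for name in names:
            top_level = name.split(".")[0]  # "msilib.schema" -> "msilib"
            if top_level in PEP594_MODULES:
                hits.append((node.lineno, top_level))
    return sorted(hits)  # ast.walk() is breadth-first, so sort for stable output
```

Feeding it a file's contents (for example, open(path).read()) lists each offending import with its line number; anything it reports is a candidate for replacement or vendoring.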

Comparing that table with the one in our article on the introduction of the PEP shows that the broad strokes are the same, but the details have changed somewhat. The removals were meant to be largely non-controversial, so if good reasons to keep a module were raised (and the maintenance burden was low), it was retained. The list is also different because some of the modules have already been removed. Modules that were considered for removal along the way, but retained (at least for now), were also described in the PEP along with the reasons behind keeping them. One of the more amusing reasons for retaining a module is the one given for wave, which handles the WAV sound-file format:

According to David Beazley the wave module is easy to teach to kids and can make crazy sounds. Making a computer generate sounds is a powerful and highly motivating exercise for a nine-year-old aspiring developer. It’s a fun battery to keep.
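The kind of exercise Beazley describes takes only a few lines. This sketch assumes mono, 16-bit samples at 8kHz and uses an illustrative make_tone() helper; it computes a sine wave with math and struct and lets wave add the WAV header:

```python
import io
import math
import struct
import wave


def make_tone(frequency: float = 440.0, seconds: float = 0.5,
              framerate: int = 8000, volume: float = 0.5) -> bytes:
    """Return the bytes of a mono, 16-bit WAV file containing a sine tone."""
    nframes = int(seconds * framerate)
    samples = (
        round(volume * 32767 * math.sin(2 * math.pi * frequency * n / framerate))
        for n in range(nframes)
    )
    # "<h" packs each sample as a little-endian signed 16-bit integer,
    # which is the sample layout 16-bit WAV files use.
    pcm = b"".join(struct.pack("<h", s) for s in samples)

    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)          # mono
        w.setsampwidth(2)          # 2 bytes = 16 bits per sample
        w.setframerate(framerate)
        w.writeframes(pcm)         # wave writes the RIFF/WAVE header for us
    return buf.getvalue()
```

Writing the result to disk with open("tone.wav", "wb").write(make_tone()) produces a file that ordinary audio players should accept; changing frequency inside the loop is where the "crazy sounds" come from.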

The wave module also provides an example of the kind of work that remains to be done if the modules are removed; wave relies on the audioop module, which is being removed:

The module uses one simple function from the audioop module to perform byte swapping between little and big endian formats. Before 24-bit WAV support was added, byte swap used to be implemented with the array module. To remove wave’s dependency on audioop, the byte swap function could either be moved to another module (e.g. operator) or the array module could gain support for 24-bit (3-byte) arrays.
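The byte swap in question reverses the byte order within each fixed-width sample; the PEP's description matches audioop.byteswap(fragment, width). The array module's array.byteswap() only works for its own item types, and it has no 3-byte type, which is why 24-bit samples are the sticking point. One possible pure-Python replacement, sketched here with a hypothetical swap_sample_bytes() name rather than whatever CPython eventually adopted, uses extended slices so that no per-sample loop is needed:

```python
def swap_sample_bytes(data: bytes, width: int) -> bytes:
    """Reverse the byte order of every `width`-byte sample in `data`.

    Works for any sample width, including the 3-byte (24-bit) samples
    that array.array has no type for.
    """
    if width < 1 or len(data) % width:
        raise ValueError("data length must be a multiple of the sample width")
    out = bytearray(len(data))
    for i in range(width):
        # Byte i of every sample moves to position (width - 1 - i).
        out[width - 1 - i::width] = data[i::width]
    return bytes(out)
```

The loop runs once per byte position (three times for 24-bit audio), not once per sample, so the heavy lifting happens inside the slice assignments.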

A few of the to-be-removed modules were actually deprecated long ago (even more modules had been proposed for deprecation in two now-inactive PEPs), while the bulk of the modules would be deprecated in the upcoming Python 3.11 (due in October 2022) and potentially removed in Python 3.13 (due in October 2024). Three modules, asynchat, asyncore, and smtpd, would be removed in Python 3.12 in 2023. This would be the biggest upheaval in the standard library in quite a long time, if not in its history.
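Projects can get an early look at which deprecations affect them by treating DeprecationWarning as an error. A minimal sketch, assuming only the standard library: import the module in a fresh interpreter (so an earlier import in the current process cannot hide the warning) with -W error::DeprecationWarning. The import_status() helper and its return values are illustrative, not an existing API.

```python
import subprocess
import sys


def import_status(module: str) -> str:
    """Classify a module as "ok", "deprecated", "missing", or "error".

    The name is interpolated into code, so only pass trusted module names.
    """
    result = subprocess.run(
        # -I: isolated mode (ignore PYTHON* env vars and the user site dir).
        # -S: skip site-packages processing so third-party .pth hooks
        #     cannot emit unrelated warnings; stdlib modules stay importable.
        [sys.executable, "-I", "-S", "-W", "error::DeprecationWarning",
         "-c", f"import {module}"],
        capture_output=True,
        text=True,
    )
    if result.returncode == 0:
        return "ok"
    if "ModuleNotFoundError" in result.stderr:
        return "missing"
    if "DeprecationWarning" in result.stderr:
        return "deprecated"
    return "error"
```

On Python 3.11 or 3.12, import_status("cgi") should report "deprecated"; on an interpreter where the module has been removed, it reports "missing".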

The discussion thread on the PEP revival had relatively few comments, mostly corrections or clarifications, but there were a few muted complaints about some of the choices. The PEP does not specify what will happen to the modules after they are removed; the code will obviously still be available, so interested users could publish it as packages on the Python Package Index (PyPI) or incorporate parts of it into other projects. "Vendoring" some pieces by copying the code directly into an affected project (e.g. into the Roundup Issue Tracker) is another possibility.
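A project that vendors a removed module usually wants to keep using the standard-library copy while it exists. The common idiom is a try/except ImportError around the import; the helper below packages that idea. It is a sketch: import_first() is an illustrative name, and a vendored path such as "myproject._vendor.telnetlib" would be hypothetical.

```python
import importlib
from types import ModuleType


def import_first(*candidates: str) -> ModuleType:
    """Import and return the first module in `candidates` that can be found."""
    errors = []
    for name in candidates:
        try:
            return importlib.import_module(name)
        except ModuleNotFoundError as exc:
            errors.append(str(exc))  # remember why each candidate failed
    raise ModuleNotFoundError("; ".join(errors) or "no candidates given")
```

A call like telnetlib = import_first("telnetlib", "myproject._vendor.telnetlib") would pick up the standard-library module on older Pythons and fall back to the vendored copy once it is gone.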

On February 16, Cannon submitted the PEP to the steering council and on March 11, Gregory P. Smith accepted the PEP on behalf of the council. There were a few suggestions from the council as part of the acceptance, starting with backporting the deprecation notices into the module documentation for earlier—but still active—versions of the language, so that more developers will be aware of the upcoming removals.

In addition, the council asked that care be taken during the alpha and beta parts of the release cycle to ensure that the removals were not causing problems. "If it turns out the removal of a module proves to be a problem in practice despite the clear deprecation, deferring the removal of that module should be considered to avoid disruption." We saw just that kind of deferral back in February when deprecated portions of two modules were causing problems for Fedora.

While Smith said that the council expects this kind of mass-removal event to be a one-time thing, it does mean that more ongoing attention should be paid to the contents of the standard library:

Doing a “mass cleanup” of long obsolete modules is a sign that we as a project have been ignoring rather than maintaining parts of the standard library, or not doing so with the diligence being in the standard library implies they deserve. Resolving ongoing discussions around how we define the stdlib for the long term does not block this PEP. It seems worthwhile for us to conduct regular reviews of the contents of the stdlib every few releases so we can avoid accumulating such a large pile of dead batteries, but this is outside the scope of this particular PEP.

urllib too?

At roughly the same time Cannon revived PEP 594, Victor Stinner was proposing the deprecation (and eventual removal) of the urllib module on the python-dev mailing list. As its name would imply, urllib is for handling URLs, but it does quite a bit more than just that. In his lengthy message, Stinner described a number of problems that he sees with the module, including a complicated API, many better alternatives, no support for HTTP/2 or HTTP/3, and a lack of maintenance, with lots of open issues, including some security issues.

There are four different sub-modules for urllib, with urllib.request, for opening URLs, and urllib.parse, for parsing URLs, as the most prominent and widely used of them. Stinner proposed deprecating all four, but, as he recognized, deprecating all, or even parts, of urllib is going to be controversial. It is likely going to be an uphill battle (and require a PEP of its own) as the discussion showed.
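For readers who have not used it, urllib.parse is the piece that handles URL strings without touching the network. A short illustration follows; describe_url() is an illustrative helper, while urlsplit(), parse_qs(), urljoin(), and urlencode() are the module's real functions.

```python
from urllib.parse import parse_qs, urlencode, urljoin, urlsplit


def describe_url(url: str) -> dict:
    """Break a URL into the pieces urllib.parse exposes."""
    parts = urlsplit(url)
    return {
        "scheme": parts.scheme,
        "host": parts.hostname,          # lower-cased, without port or credentials
        "port": parts.port,              # an int, or None if absent
        "path": parts.path,
        "query": parse_qs(parts.query),  # values are lists because keys may repeat
        "fragment": parts.fragment,
    }


info = describe_url("https://Example.com:8080/docs/index.html?q=python&q=pep&page=2#top")
# Resolve a relative link against a base URL, as a browser would.
next_page = urljoin("https://example.com/docs/index.html", "../about.html")
# Build a query string from a mapping; by default spaces become "+".
query = urlencode({"q": "dead batteries", "page": 3})
```

Here info["host"] is "example.com", next_page is "https://example.com/about.html", and query is "q=dead+batteries&page=3".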

There were a number of objections raised, including Dong-hee Na's concern about the use of urllib in the pip PyPI-package installer. While Stinner thought that perhaps retaining urllib.parse would be sufficient for pip, Damian Shaw disagreed, noting that pip is dependent on parts of urllib.request as well. Beyond that, Shaw said that some of the alternative libraries mentioned by Stinner rely on parts of urllib too.

Paul Moore was strongly against the proposal; he said that while use of urllib.request was not a best practice, it is "still extremely useful for simple situations". Beyond that, the standard library itself uses parts of urllib heavily and dependencies on modules outside the standard library are unsuitable in some domains; he was concerned about pip as well:

[...] pip relies pretty heavily on urllib (parse and request), and pip has a bootstrapping issue, so using 3rd party libraries is non-trivial. Also, of pip's existing vendored dependencies, webencodings, urllib3, requests, pkg_resources, packaging, html5lib, distlib and cachecontrol all import urllib. So this would be *hugely* disruptive to the whole packaging ecosystem (which is under-resourced at the best of times, so this would put a lot of strain on us).
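The "simple situations" Moore has in mind look something like the following sketch, which fetches a URL with urllib.request.urlopen() and decodes the body. The fetch_text() name is illustrative, and falling back to UTF-8 when no charset is declared is an assumption, not something urllib does on its own. Note that urlopen() follows redirects and raises HTTPError for 4xx/5xx responses by default.

```python
from urllib.request import urlopen


def fetch_text(url: str, timeout: float = 10.0) -> str:
    """Fetch a URL and decode the body using the charset the server declares."""
    with urlopen(url, timeout=timeout) as response:
        # Assumption: treat an undeclared charset as UTF-8.
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset)
```

Because the default opener also handles data: URLs, fetch_text("data:text/plain;charset=utf-8,hello%20world") returns "hello world" without any network access, which makes the function easy to try out.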

No one really directly refuted Stinner's contentions about the problems with urllib; the complaints and concerns were about removing it without adequate replacement in the standard library itself. Heimes generally agreed that the problems are real:

The urllib package -- and to some degree also the http package -- are constant source of security bugs. The code is old and the parsers for HTTP and URLs don't handle edge cases well. Python core lacks a true maintainer of the code. To be honest, we have to admit defeat and be up front that urllib is not up to the task for this decade. It was designed [and] written during a more friendly, less scary time on the internet.

He also said that if he "had the power and time", he would replace urllib with a simpler HTTP client that used the services provided by the underlying operating systems. For more complex uses, there are other options available in PyPI. Another possibility would be to "reduce the feature set of urllib to core HTTP (no ftp, proxy, HTTP auth)" coupled with a partial rewrite to make the other remaining pieces more standards-compliant and simpler.

Several were in favor of either of those options, though Smith felt that even the simplification options would cause "disruption to the world and loss of trust in Python". There are, it seems, lots of good reasons to keep urllib around, but the question of maintenance remains. No one volunteered to take on urllib and address some of its obvious problems, though Senthil Kumaran said that urllib.parse "is semi-maintained".

Given all of that, users of urllib can be pretty confident it will be around for quite a bit longer, perhaps forever. But the maintenance problem needs to be addressed somehow given that urllib interacts with the internet and all of the inherent messiness and danger that comes along for the ride. With luck, part of the frequent re-evaluation of the contents of the standard library that the steering council recommended will also find ways to identify and address the maintenance holes in the standard library. If not, it would seem that there are some ticking time bombs lurking there.


Index entries for this article
Python: Deprecation



Python finally offloads some batteries

Posted Mar 16, 2022 22:06 UTC (Wed) by milesrout (subscriber, #126894) [Link] (26 responses)

Python is core infrastructure not just of many operating systems (Gentoo comes to mind in particular) but of the whole internet. If Python breaks, the world breaks.

The question I have is: why, when all these billion/trillion-dollar companies depend so much on Python, can they not dedicate some resources to its maintenance? Python would be so much better with like.. a handful more developers contributing to it. A handful of full time contributors that could maintain things like urllib would obviate much of the need for these disruptive changes. Python is famously undermanned and has been for quite some time. How many millions or billions of dollars have been made on the back of the use of Python by people that happily run it without even considering contributing back to its development?

Hundreds (thousands?) of developers are employed to contribute to the kernel. But we only need to look at Python and OpenSSL and other pieces of core infrastructure that *don't* receive that support to see that the real reason for this isn't that they are dependent on it. People don't employ Linux kernel developers because they want to support the kernel. They employ them because they NEED code to be written for their particular use cases. These other projects don't receive proper support commensurate with the level to which these companies depend on them because they can rely on the code just sitting there being maintained by part timers.

I think that Python should aggressively remove old crud from its standard library. If that breaks people's code.. well.. tough luck? If you don't want to contribute to its maintenance, then you can't expect it to stick around forever. If you desperately need [insert old cruddy standard library functionality here], maintain it yourself, inside the standard library or out. Get involved. Or don't get involved. Maintain it as a separate package once it's removed. But the inevitable cries of "oh my god Python is changing something and it breaks my code!" from people that have used and abused Python for decades without ever contributing anything back should not just be ignored. They should be condemned for what they are: the cries of ungrateful leeches on the goodwill and hard work of others.

Python finally offloads some batteries

Posted Mar 16, 2022 22:30 UTC (Wed) by pebolle (guest, #35204) [Link] (15 responses)

> But the inevitable cries of "oh my god Python is changing something and it breaks my code!" from people that have used and abused Python for decades without ever contributing anything back should not just be ignored. They should be condemned for what they are: the cries of ungrateful leeches on the goodwill and hard work of others.

Once again it's proven that Poe's law is universal.

Python finally offloads some batteries

Posted Mar 16, 2022 23:31 UTC (Wed) by NYKevin (subscriber, #129325) [Link] (14 responses)

Meh, the bit you quoted is rather melodramatic, but there is a point here: If you contribute exactly zero developer-hours to project X, then you should expect to get exactly zero say in how project X is run. Anything above and beyond that is a gift from the project X contributors, and may be withdrawn at any time.

Python finally offloads some batteries

Posted Mar 16, 2022 23:52 UTC (Wed) by pebolle (guest, #35204) [Link] (8 responses)

"ungrateful leeches on the goodwill and hard work of others" is not rather melodramatic, it is nauseating. Free software does not and should not include an obligation to contribute back. It's perfectly fine to only use free software. That's one of its tenets.

Moreover it is also perfectly fine to criticize a project as a mere user. "You made me rewrite my program!" isn't invalidated by "You should have contributed money, patches or bug reports!". That's basically a truism.

Python finally offloads some batteries

Posted Mar 17, 2022 0:06 UTC (Thu) by anselm (subscriber, #2796) [Link] (3 responses)

It's perfectly fine to only use free software. That's one of its tenets.

Yes, but the whole point of free software is also to give users what they need (from the programming and legal POV) to scratch their own itches; the original developers are under no obligation whatsoever to scratch their users' itches for them for free, indefinitely.

Python finally offloads some batteries

Posted Mar 17, 2022 0:21 UTC (Thu) by pebolle (guest, #35204) [Link] (2 responses)

That's all correct.

But it doesn't justify name-calling users that do not contribute back. Neither does it mean that one shouldn't be able to criticize a free software project without having contributed back.

Python finally offloads some batteries

Posted Mar 17, 2022 0:45 UTC (Thu) by anselm (subscriber, #2796) [Link] (1 responses)

I agree about the name-calling, but when it comes to “criticising a free software project”, there are obvious differences between criticism that is constructive, which should be welcome from anybody, and “criticism” that is basically vociferous complaints by non-contributing users that they're not getting their itches scratched for free, which developers should be free to disregard at will.

(If users can't scratch their own itches, the least they can do, instead of complaining, is learn how to write and submit meaningful and constructive bug reports. If nothing else, this would turn them into contributing users who are actually helping the project.)

Python finally offloads some batteries

Posted Mar 17, 2022 7:37 UTC (Thu) by Wol (subscriber, #4433) [Link]

The other thing they can do - if they are businesses - is to sponsor a developer to scratch their itches for them.

I've just been watching this scenario play out on a kernel mailing list - some users have no clue ... it's tricky, that one came over somewhat as a culture clash ...

Cheers,
Wol

Python finally offloads some batteries

Posted Mar 25, 2022 7:13 UTC (Fri) by oldtomas (guest, #72579) [Link] (3 responses)

If the name describes the situation adequately ("leech", in this case a metaphor), I don't have issues with it.

That's what names have been made for, after all.

Python finally offloads some batteries

Posted Mar 28, 2022 13:23 UTC (Mon) by nye (subscriber, #51576) [Link] (2 responses)

> If the name describes the situation adequately ("leech", in this case a metaphor), I don't have issues with it

A leech is a parasite that latches on to an unwilling host and drains the life out of it to sustain itself. Some metaphor. In fact, the word is specifically used as a particularly emotive term of derision and hatred; it implies that the speaker *utterly despises* the person they're talking about.

Basically, it's a more specific way of describing somebody as a "worthless fucking cunt", or similar. On the offensiveness scale, it's high enough that I can only really think of one word that's higher but that I might occasionally use in close company when senselessly enraged; everything higher than *that* I wouldn't even *think*, let alone say.

Python finally offloads some batteries

Posted Mar 28, 2022 21:21 UTC (Mon) by pebolle (guest, #35204) [Link]

> it implies that the speaker *utterly despises* the person they're talking about.

Exactly. And the infuriating part, for me, is that this metaphor is used for people and organizations doing what I thought was the right thing: using Free Software.

Python finally offloads some batteries

Posted Mar 29, 2022 10:11 UTC (Tue) by kleptog (subscriber, #1183) [Link]

Maybe it's a cultural thing? A leech (in my experience) doesn't tend to harm the host, and they go away by themselves. They are at most temporarily irritating.

However, I just looked it up in the dictionary and it has definitions involving the words "exploit" and "extort" which are much more negative. If that's the meaning you're using then I can see the statement being interpreted very differently.

Python finally offloads some batteries

Posted Mar 17, 2022 3:36 UTC (Thu) by k8to (guest, #15413) [Link] (4 responses)

I mean, in practice I've tried to submit bug fixes to some lesser-maintained core libraries and got told to shove off; no one wanted to review my fixes for this crap library. So I guess I'm a leech because I was told to leech.

Python finally offloads some batteries

Posted Mar 17, 2022 6:26 UTC (Thu) by NYKevin (subscriber, #129325) [Link] (2 responses)

Several years ago, I uploaded a few lines of SQL to GitHub as a Gist. I wrote those lines of SQL while trying to teach myself about a specific domain's typical data model (if you must know, it was double-entry bookkeeping). It's a very simplistic demonstration of "when you strip all the complicated bullshit away, here's what an accounting ledger actually looks like." Every few months, some random drops a comment asking me if I considered use case X or Y. It's a Gist, not a proper GitHub project (no bug tracker, only one file, that file is well under 50 lines of code, and I'm being generous in describing a schema as "code", etc.). It's not *supposed* to support use case X, for any value of X. It's not even supposed to be used, you're really just supposed to look at it to better understand the problem domain (obviously, if someone chooses to run it, I'm not going to try and stop them, but it is very obviously an incomplete product). I can't imagine what happens to people who upload whole-ass codebases they wrote five years ago and never intended to support.

So yes, people will reject incoming contributions, if the project was never intended to be supported in the first place, or if they have decided to stop supporting it, as is their prerogative. Nobody can force me to do a code review if I don't want to (except maybe for my employer, but they pay me for that).

Python finally offloads some batteries

Posted Mar 17, 2022 8:34 UTC (Thu) by LtWorf (subscriber, #124958) [Link] (1 responses)

Python is not 10 lines on a pastebin.

Python finally offloads some batteries

Posted Mar 17, 2022 17:18 UTC (Thu) by NYKevin (subscriber, #129325) [Link]

I never said it was. My point is that, even if it *was* 10 lines on a pastebin, people would still expect support, because people, in general, are unreasonable, and developers get tired of dealing with them.

Python finally offloads some batteries

Posted Mar 24, 2022 20:49 UTC (Thu) by dm93 (guest, #157602) [Link]

This was my experience as well.

I was told not to fix things because "it would clutter up the git history."

Python finally offloads some batteries

Posted Mar 17, 2022 8:43 UTC (Thu) by LtWorf (subscriber, #124958) [Link] (2 responses)

Well, to be fair, Python only suddenly notices it is breaking code if the complainer is $$big$$ enough.

They don't care at all to break a library with only tens of projects using it.

Python finally offloads some batteries

Posted Mar 21, 2022 8:09 UTC (Mon) by nilsmeyer (guest, #122604) [Link] (1 responses)

> They don't care at all to break a library with only tens of projects using it.

Should they?

Python finally offloads some batteries

Posted Apr 25, 2022 7:33 UTC (Mon) by LtWorf (subscriber, #124958) [Link]

In the long run, these sorts of things push people toward other languages.

Python finally offloads some batteries

Posted Mar 17, 2022 8:45 UTC (Thu) by ballombe (subscriber, #9523) [Link] (2 responses)

If you want a car to stay put, how much gasoline are you willing to pump into it?

Python finally offloads some batteries

Posted Mar 18, 2022 16:33 UTC (Fri) by notriddle (subscriber, #130608) [Link]

If you want Python to stay put, don't upgrade. If you don't want your cake to go away, don't eat it.

Python finally offloads some batteries

Posted Mar 22, 2022 10:31 UTC (Tue) by tao (subscriber, #17563) [Link]

Obviously if I wanted a car to *stay put* I would not pump any gasoline into it at all. Refuelling it would just make it easier for it to drive off. :P

Python finally offloads some batteries

Posted Mar 17, 2022 12:08 UTC (Thu) by azumanga (subscriber, #90158) [Link] (3 responses)

I genuinely looked at how I could join the Python 2 team, to keep it going (with minimal future changes). Turns out the people in charge of Python didn't want that -- they explicitly wanted it to stop, even if people and companies were willing to provide future support.

Python finally offloads some batteries

Posted Mar 17, 2022 17:19 UTC (Thu) by NYKevin (subscriber, #129325) [Link]

You have the right to fork (which someone has already exercised, see the Tauthon project). If you don't want to do that, then you were (presumably) expecting to get something out of the PSF that you can't get out of a fork, and the PSF is not obligated to give that something to you if they do not wish to do so.

Python finally offloads some batteries

Posted Mar 18, 2022 8:31 UTC (Fri) by milesrout (subscriber, #126894) [Link] (1 responses)

That pretty much confirms my point, though. There were supposedly tens of thousands of developers out there all dependent on Python 2. They couldn't *possibly* move to Python 3. They insisted Python 2 had to be maintained. A few of them, like you, may have inquired about contributing to maintenance of Python 2.

But where is the big effort to actually do it? I can't see one. There are a few basically-dead projects out there to maintain forks of Python 2.7. I don't think (?) any of them are still going. So despite all the handwringing and complaining, it turns out that the *revealed preference* of those people is actually that it's less work and more desirable for them to just port to Python 3.

It's a lot easier to complain or to inquire about something than to actually *do* it. If it were really important to you to maintain Python 2, you would have done it with a bunch of other people for whom it was important.

Python finally offloads some batteries

Posted Mar 18, 2022 9:26 UTC (Fri) by mpr22 (subscriber, #60784) [Link]

> is actually that it's less work and more desirable for them to just port to Python 3.

Or to not-Python.

PCM

Posted Mar 16, 2022 23:03 UTC (Wed) by tialaramex (subscriber, #21167) [Link] (6 responses)

> According to David Beazley the wave module is easy to teach to kids and can make crazy sounds. Making a computer generate sounds is a powerful and highly motivating exercise for a nine-year-old aspiring developer. It’s a fun battery to keep.

This motivation makes sense, but going via wave (and thus WAV) seems unfortunate.

WAV is *almost* but not quite just a wrapper for PCM data, and PCM data really is a pretty fundamental thing that is likely to be just as important and relevant ten years after the nine-year-old first encounters it, but WAV not so much.

I think, from a glance at wave, that it expects you to think about PCM data the way it would have been stored on a CD in the 1990s, specifically as signed 16-bit integers, but that feels very much like something kids shouldn't need to care about at age nine and is less likely to matter in say 2035 than today. Presenting the same capabilities but with normalisation might work, or, if nine-year-olds are comfortable with decimals, the convention in modern PCM (in software anyway) is the range -1.0 to +1.0 (through 0), where ±1.0 is 0 dB full scale. I don't think there's value in learning that, for some crazy reason, -32768 through 32767 is the range used by some hardware (signed 16-bit), and there's a _very destructive_ tendency in software that does reflect -32768 to 32767 not to provide saturating arithmetic. In the actual world, with actual sounds, 20000 + 20000 = 40000; since that is impossible for our 16-bit hardware, 32767 is the "correct" answer. This correct behaviour for sound is called clipping, but even in Python, with 16-bit integers nearby, I think you might easily wind up getting -25536, which is horribly wrong (and sounds awful).

Avoiding 16-bit integer arithmetic makes it more likely kids are learning (via cool noises) about how PCM actually works rather than things that didn't work how you'd expect on a 1980s computer.

(if you've ever heard a space rocket take off, you've heard actual clipping - even with sensors designed not to "clip" themselves sound really does saturate like this eventually, but at a loudness which would permanently deafen humans so you can't experience it directly close up)
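The difference between wraparound and clipping is easy to demonstrate. Plain Python integers never overflow (and struct.pack("<h", 40000) raises struct.error rather than wrapping), so the sketch below emulates fixed-width 16-bit arithmetic explicitly; the mix_*() helpers are illustrative names, not part of wave or any other module.

```python
INT16_MIN, INT16_MAX = -32768, 32767


def mix_wrapping(a: list[int], b: list[int]) -> list[int]:
    """Add two 16-bit sample streams the way fixed-width integer hardware does."""
    # Two's-complement wraparound: results past one end reappear at the other.
    return [(x + y - INT16_MIN) % 65536 + INT16_MIN for x, y in zip(a, b)]


def mix_saturating(a: list[int], b: list[int]) -> list[int]:
    """Add two 16-bit sample streams, clipping at the limits like real audio."""
    return [max(INT16_MIN, min(INT16_MAX, x + y)) for x, y in zip(a, b)]


def mix_float(a: list[float], b: list[float]) -> list[float]:
    """Mix normalized samples in [-1.0, 1.0], clipping at full scale."""
    return [max(-1.0, min(1.0, x + y)) for x, y in zip(a, b)]
```

With these, 20000 + 20000 comes out as -25536 under wraparound and 32767 under saturation, while the normalized version sidesteps the 16-bit range entirely until the samples are written out.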

PCM

Posted Mar 16, 2022 23:15 UTC (Wed) by milesrout (subscriber, #126894) [Link] (1 responses)

>I think, from a glance at wave, that it expects you to think about PCM data the way it would have been stored on a CD in the 1990s, specifically as signed 16-bit integers, but that feels very much like something kids shouldn't need to care about at age nine and is less likely to matter in say 2035 than today.

I'll preface this by saying that personally I found that all the 'real limitations of hardware' stuff was the most interesting thing I learnt about computers. I wish I'd been able to learn about computers back in the 8-bit or 16-bit eras of computing, rather than learning high level languages on a (relatively, but by today's standards obviously not very) modern computer. From a purely personal perspective it was always what interested me the most. Computer programming was relatively uninteresting to me until I got down to that level of detail. I was never really interested in graphical programming environments, or drawing using turtles or whatever. It got interesting when we got into the low level details. That's when it came alive.

BUT I have to agree that if someone _has_ decided to introduce children to computers using a high level language like Python (which clearly suits some children, even if it didn't suit me), then he has clearly chosen the high level route, and the consistent, congruent, sensible thing to do there is to use a relatively high level representation of sound like PCM. If a teacher wished to talk about limited-range hardware integer representations etc. then he probably wouldn't have chosen Python in the first place.

PCM

Posted Mar 24, 2022 12:19 UTC (Thu) by davidgerard (guest, #100304) [Link]

> BUT I have to agree that if someone _has_ decided to introduce children to computers using a high level language like Python (which clearly suits some children, even if it didn't suit me),

It's taught in UK schools now. Scratch in primary school, Python in high school.

PCM

Posted Mar 17, 2022 9:21 UTC (Thu) by eru (subscriber, #2753) [Link] (3 responses)

WAV is commonly used as a lossless audio format, for example when ripping CDs, and as a lowest common denominator when moving audio data between different programs. So keeping it is well justified.

PCM

Posted Mar 17, 2022 11:49 UTC (Thu) by tialaramex (subscriber, #21167) [Link] (2 responses)

All WAV is doing in these scenarios is providing a few header bytes on some raw PCM data. If you have software to just treat some data as signed 16-bit PCM data and play it as a sound, you'll find your WAV file is exactly that sound, except there's a brief fraction of a second glitch at the start because the first few bytes are a header.

The actual Microsoft WAV format technically does lots more than this, but scarcely anybody bothers implementing any of that stuff, and I think that includes Python's wave. So we should admit that we don't want WAV we just wanted to store PCM data.

If you've ever run into files (or downloads) named .xls but they were actually CSV files, it's a little similar to that. Do you want an XLS reader? Well, that's pretty complicated, I worked on one (for Gnumeric) and it's quite a ride. But if all you need is to read the CSV file you didn't actually want an XLS reader and it's probably misleading to name your CSV parser "xls".

PCM

Posted Mar 17, 2022 12:26 UTC (Thu) by nix (subscriber, #2304) [Link]

> The actual Microsoft WAV format technically does lots more than this, but scarcely anybody bothers implementing any of that stuff, and I think that includes Python's wave. So we should admit that we don't want WAV we just wanted to store PCM data.

Yes, but... huge amounts of software accept and produce WAV files. Much less software accepts or produces raw PCM. It's an interchange format in that respect, like it or not...

PCM

Posted Mar 17, 2022 19:40 UTC (Thu) by mbunkus (subscriber, #87248) [Link]

It actually isn't that easy, as not all PCM data is equal. There are several parameters that a decoder must know in order to play that sound back properly, including:

* sampling frequency (also called sample rate)
* number of channels
* order of channels
* number of bits per sample (also called bit depth)
* signed or unsigned
* endianness, if bits per sample > 8
* PCM format (e.g. A-law, µ-law, linear, etc.)

PCM used on CDs isn't the same as PCM used on Blu-rays, for example.

Not one of those parameters is really optional. Sure, you can specify all of them manually when decoding (or when using tools such as sox to convert from raw PCM data to one of the containerized variants), but the container's whole purpose is to relieve us humans of the need to keep that extra information around somehow. That's what the WAV container does. So do other container formats, but WAV is one of the simplest to use (within certain limits), making it especially suitable for people new to programming or new to audio processing.
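To make the point concrete, here is a minimal sketch showing that the same raw bytes admit very different readings, while a WAV header lets a reader recover the parameters instead of guessing. The helper names wrap_pcm and duration are hypothetical. Note that the basic PCM header written by CPython's wave records channel count, sample width, and sample rate; signedness and byte order follow from the WAV conventions for PCM (8-bit unsigned, wider samples signed, little-endian), and an explicit channel/speaker mapping needs WAVE_FORMAT_EXTENSIBLE, which wave does not write.

```python
import io
import wave

def wrap_pcm(pcm: bytes, channels: int, sampwidth: int, rate: int) -> bytes:
    """Wrap raw PCM frames in a minimal WAV container, in memory."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(channels)
        w.setsampwidth(sampwidth)
        w.setframerate(rate)
        w.writeframes(pcm)
    return buf.getvalue()

def duration(nbytes: int, channels: int, sampwidth: int, rate: int) -> float:
    """Playing time implied by one interpretation of raw PCM bytes."""
    return nbytes / (channels * sampwidth * rate)

# One second of CD-style audio: stereo, 16-bit, 44.1 kHz (silence, all zeros).
raw = bytes(44100 * 2 * 2)

# Without a container, the same bytes admit many readings:
print(duration(len(raw), 2, 2, 44100))  # 1.0 as CD audio
print(duration(len(raw), 1, 1, 8000))   # 22.05 if misread as 8-bit mono at 8 kHz

# With a WAV header, a reader recovers the parameters from the file itself.
with wave.open(io.BytesIO(wrap_pcm(raw, 2, 2, 44100)), "rb") as r:
    p = r.getparams()
print(p.nchannels, p.sampwidth * 8, p.framerate, p.nframes)  # 2 16 44100 44100
```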

Python finally offloads some batteries

Posted Mar 17, 2022 1:10 UTC (Thu) by pabs (subscriber, #43278) [Link] (17 responses)

It seems like the Python standard library is where modules go to die.
It might be better to just ditch the entirety of the standard library.
There could be different groups (like the Python Packaging Authority) defining sets of blessed modules that are modern and well maintained using best practices.

Python finally offloads some batteries

Posted Mar 17, 2022 1:35 UTC (Thu) by jafd (subscriber, #129642) [Link] (4 responses)

Please no.

It’s bad enough in Perl where modules go away from core and back between releases.

Python finally offloads some batteries

Posted Mar 17, 2022 1:37 UTC (Thu) by pabs (subscriber, #43278) [Link] (3 responses)

Perl definitely shouldn't be doing that; my approach would be to leave them out permanently.

Python finally offloads some batteries

Posted Mar 17, 2022 12:28 UTC (Thu) by nix (subscriber, #2304) [Link] (2 responses)

I don't know. Doing that has the advantage that moving things in and out of core is uncontroversial and done routinely, and that things that leave core are *still easily accessible* (to the majority of users who can install stuff as needed), because they're still on CPAN. This in turn is easy because the majority of the standard library is maintained using the same build system as CPAN, so moving things from core to CPAN or dual-lifing them in both is literally a matter of running a couple of commands: almost no source changes are usually needed. In Python, my impression is that migrating a module is quite a big job, so there's a temptation to just drop it and leave it nowhere and its few remaining users out in the cold.

Python finally offloads some batteries

Posted Mar 17, 2022 13:20 UTC (Thu) by Wol (subscriber, #4433) [Link]

Yup. The Python people need to create some fjords for old batteries to pine for.

Cheers,
Wol

Python finally offloads some batteries

Posted Mar 17, 2022 14:25 UTC (Thu) by anselm (subscriber, #2796) [Link]

In Python, my impression is that migrating a module is quite a big job, so there's a temptation to just drop it and leave it nowhere and its few remaining users out in the cold.

The code is there, and making Python packages for PyPI isn't exactly rocket science. The seven people who actually still use ossaudiodev or nntplib can do their own legwork, or else the modules must not have been all that essential after all.

Python finally offloads some batteries

Posted Mar 17, 2022 5:34 UTC (Thu) by garyvdm (subscriber, #82325) [Link] (8 responses)

Respectfully I disagree.

Yes, you have your medium-to-big projects that need a setup.py/requirements.txt/pyproject.toml, and for those projects, doing this would not matter.

But there are just as many small projects. Think single script, where installation = drop the script into /usr/local/bin. This simplicity is valuable, and if you remove all, or a large portion, of the stdlib, you take away the ability to do this.

Python finally offloads some batteries

Posted Mar 17, 2022 7:48 UTC (Thu) by NYKevin (subscriber, #129325) [Link] (7 responses)

I would divide the stdlib into four non-equal groups:

1. Modules that belong in the stdlib, because they are basically language extensions or language services. itertools, dataclasses, enum, importlib, ast, sys, etc. all fall into this category. Compare and contrast the java.lang.* package. In particular, sys is never going anywhere because it literally cannot live anywhere other than the stdlib (at least in the CPython implementation).
2. Modules that belong in the stdlib, because the services they provide are so basic and so slow-moving that there are no real advantages to splitting them out. pathlib, collections, heapq, math, etc.
3. asyncio, which is a whole separate ball of wax all by itself. Probably this needs to stay as part of the stdlib because it's tightly coupled to the async/await syntax. In retrospect, I'm somewhat suspicious of that design decision, but it's far too late to change now regardless of whether it was a good idea at the time.
4. Everything else, which could probably go into PyPI and maybe have some sort of "automatically acquire the latest stable version at installation/upgrade time" logic (which all of the distros would promptly yank out and replace with package manager dependencies). Depending on who you ask, this group *may or may not* contain any or all of the following: old protocols and file formats, current protocols and file formats, higher-level services such as logging, GUI/TUI-related stuff like tkinter and curses, and maybe a few other misc. things that need or want to move faster than the stdlib can reasonably accommodate.

The main sticking point, IMHO, is the boundary between (2) and (4), as well as whether or not (4) should be installed-by-default or require separate installation. The advantage of installed-by-default over keep-in-stdlib is, of course, that the libraries can continue to be actively developed and maintained independently of Python's release schedule. The main disadvantage is that, if nobody actually steps up to maintain the libraries, they may get quite old and outdated...

Python finally offloads some batteries

Posted Mar 17, 2022 8:41 UTC (Thu) by LtWorf (subscriber, #124958) [Link] (1 responses)

That would make Python kind of useless for many thousands of users.

There are a million scripts running on Raspberry Pis, made by people who have no idea how to track dependencies and figure out how to manage them and lock and install and all that crap.

At some point you have to decide if you want to close shop and just let another better maintained language take over the spot.

Python finally offloads some batteries

Posted Mar 17, 2022 17:23 UTC (Thu) by NYKevin (subscriber, #129325) [Link]

Why? I just described a solution in which all of the same modules would still be installed by default, meaning your individual scripts wouldn't need to track anything.

Python finally offloads some batteries

Posted Mar 17, 2022 18:29 UTC (Thu) by logang (subscriber, #127618) [Link] (4 responses)

I strongly disagree.

This would be like the Linux kernel developers deciding they don't have time to maintain large swaths of drivers and just dropping them, then expecting people who need them to get the out-of-tree drivers from GitHub using DKMS, and shifting the burden onto distributions to package all these drivers and deal with the resulting dependency hell. It would be a _giant_ mess. What would really happen is the drivers would be maintained even worse than they are now and become even more broken, and people who need them would be out of luck.

I think the emphasis should be on growing the project and the number of developers, not splintering off poorly maintained code into situations of even worse maintenance. Dropping modules that are obsolete and which nobody really uses is fine (allowing for the option of a maintainer who cares to step up); but dropping useful modules that people depend upon is not.

IMO, the best solution to the urllib issue would be to absorb the requests module into the standard library and bring all its developers with it, using a development model similar to the kernel's, where a subsystem maintainer collects patches and sends them upstream to the core Python maintainers. There may be issues with this, but none that couldn't be worked out in the long run. Yes, the requests developers may need to do more work to ensure backwards compatibility, but they'd get help dealing with underlying core changes that affect their module. The Python core team may have to make process changes to allow for a higher volume of security changes to stable releases. But after the pain, the benefit is a quality url module in the standard library and urllib could be re-implemented as a thin wrapper over it, and/or dropped after a very long deprecation period.

I've written many simple scripts that run on urllib because I have no interest in dealing with the additional dependency. If requests was in the standard library I'd certainly use it universally, but it is not and in many circumstances that disqualifies it. I suspect this is a common practice. The core team have a responsibility not to break such widespread usage with no real alternative for developers.
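For scripts that stay stdlib-only, as described above, a GET request with query parameters and JSON decoding needs only urllib and json. This is a minimal sketch assuming a JSON-returning endpoint; build_request and fetch_json are hypothetical helper names, and the User-Agent string is a placeholder. Unlike requests, urlopen raises urllib.error.HTTPError for 4xx/5xx responses rather than returning them.

```python
import json
import urllib.parse
import urllib.request

def build_request(base_url: str, params: dict) -> urllib.request.Request:
    """Build a GET request with URL-encoded query parameters."""
    query = urllib.parse.urlencode(params)
    url = f"{base_url}?{query}" if query else base_url
    # Placeholder User-Agent; some servers reject urllib's default one.
    return urllib.request.Request(url, headers={"User-Agent": "example-script/0.1"})

def fetch_json(base_url: str, params: dict, timeout: float = 10.0):
    """Fetch a URL and decode the JSON body, honouring the declared charset."""
    req = build_request(base_url, params)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return json.loads(resp.read().decode(charset))
```

Keeping request construction separate from the network call makes the URL-building part easy to check without touching the network.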

Python finally offloads some batteries

Posted Mar 17, 2022 21:17 UTC (Thu) by NYKevin (subscriber, #129325) [Link]

> but dropping useful modules that people depend upon is not.

If nobody is willing to maintain the code, it is de facto unmaintained. You can slap a "maintained" label on it all you like, but that does not cause maintenance work to get done.

> but they'd get help dealing with underlying core changes that affect their module.

I believe we've all learned the hard way that core changes should be backwards-compatible, and so this kind of help should not be required in the vast majority of reasonable cases.

> But after the pain, the benefit is a quality url module in the standard library and urllib could be re-implemented as a thin wrapper over it, and/or dropped after a very long deprecation period.

They certainly can't do that until requests stops depending on urllib.

Python finally offloads some batteries

Posted Mar 18, 2022 8:25 UTC (Fri) by milesrout (subscriber, #126894) [Link] (2 responses)

It's a bit different with Python, because the language itself is - pretty much - stable. They don't go out of their way to break core language interfaces in every version.

Linux developers, on the other hand, have no qualms about changing core interfaces in any old version. They don't exactly *go out of their way to*, especially where it would complicate backporting fixes to older versions. But look at the discussions happening around list iterators. They clearly are willing to change fundamental interfaces quite readily.

This means that out-of-tree modules for the kernel are at a very different level of support (none at all) than third-party modules for Python. The whole *point* of Python is a stable interface against which to write third-party modules! That's what a language *is*!

Python finally offloads some batteries

Posted Mar 18, 2022 15:46 UTC (Fri) by logang (subscriber, #127618) [Link] (1 responses)

That seems like wishful thinking at best. It is not nearly as stable as you think it is. Most Python packages only support a subset of Python versions, for lots of good reasons. If large swaths of the Python library are now in PyPI, they also gain complicated dependencies between them. Maybe the kernel's driver API experiences more churn, but the point remains.

As NYKevin pointed out the requests module depends on urllib (yikes) so if that module gets removed from the standard library then you've broken requests for the latest release of python.

If tons of important modules are ejected, then the core team has to stop removing or deprecating things to avoid the same dependency hell the kernel would have with out-of-tree drivers.

Python finally offloads some batteries

Posted Mar 18, 2022 17:40 UTC (Fri) by NYKevin (subscriber, #129325) [Link]

> Most python packages only support of subset of python reasons for lots of good reasons.

In my experience, this "subset" is usually of the form "version 3.x or later" for some value of x (or, for a handful of very old libraries, "version 2.7.x only"). I don't believe I have seen a whole lot of libraries that set a maximum version, other than the ones which were never ported to 3.

Python finally offloads some batteries

Posted Mar 17, 2022 9:47 UTC (Thu) by ddevault (subscriber, #99589) [Link] (2 responses)

Some batteries is a good thing. The alternative is npm, which is a disaster.

Python finally offloads some batteries

Posted Mar 18, 2022 7:48 UTC (Fri) by LtWorf (subscriber, #124958) [Link] (1 responses)

Well, JS lacks even fairly basic functions, but I agree with you.

The other problem with npm is that somehow they've decided (clearly without ever taking the time to do any measurement) that a library of 1 function is faster than putting a bunch of related functions all into the same library.

I think the original idea was to save time on doing js files downloads… but apparently the person saying this didn't know about all the headers and roundtrips that need to happen before a file can be downloaded.

Python finally offloads some batteries

Posted Mar 18, 2022 17:37 UTC (Fri) by NYKevin (subscriber, #129325) [Link]

> I think the original idea was to save time on doing js files downloads… but apparently the person saying this didn't know about all the headers and roundtrips that need to happen before a file can be downloaded.

I believe this may have been a contributing factor to the invention and adoption of HTTP/2, because now the server can just say upfront "here are all of the libraries you are going to need, when you eventually get around to running the JS."

Python finally offloads some batteries

Posted Mar 17, 2022 10:44 UTC (Thu) by kleptog (subscriber, #1183) [Link] (1 responses)

> The PEP does not specify what will happen to the modules after they are removed; the code will obviously still be available, so interested users could create Python Package Index (PyPI) modules or incorporate parts into other projects.

Why so passive? Just specify that the packages will be available under the same name on PyPI so that anyone using the code can simply add it to their dependencies and continue.

This is related to the issue that it would be nice to be able to upgrade the batteries without having to upgrade the whole of Python.

Python finally offloads some batteries

Posted Mar 17, 2022 22:01 UTC (Thu) by Paf (subscriber, #91811) [Link]

I think it’s passive because they’re saying they’re not going to do it. It would be *allowed*, but they’re not doing it, which seems reasonable enough.

Python finally offloads some batteries

Posted Mar 17, 2022 17:19 UTC (Thu) by mb (subscriber, #50428) [Link] (7 responses)

cgi will be removed? Really?

I bet that's being used in thousands of in-production scripts. Including some of mine.
Why does this have to be removed?
What are the alternatives (that aren't deprecated tomorrow)?
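For anyone wanting to know how many of their own scripts are affected, a quick audit is possible with the ast module. This is a minimal sketch: DEAD_BATTERIES is a partial, illustrative list of PEP 594 modules (check the PEP for the full list), and dead_imports and scan_tree are hypothetical helper names. It only sees static import statements, so importlib.import_module() or __import__() calls are missed, as are scripts without a .py suffix (such as ones dropped into /usr/local/bin).

```python
import ast
from pathlib import Path

# Partial, illustrative list of modules removed by PEP 594.
DEAD_BATTERIES = {"cgi", "cgitb", "nntplib", "ossaudiodev", "telnetlib", "uu"}

def dead_imports(source: str) -> set:
    """Return the dead-battery modules imported by a piece of Python source."""
    found = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            names = [node.module]  # absolute "from X import Y" only
        else:
            continue
        for name in names:
            top = name.split(".")[0]
            if top in DEAD_BATTERIES:
                found.add(top)
    return found

def scan_tree(root: str) -> dict:
    """Map each .py file under root to the dead-battery modules it imports."""
    report = {}
    for path in Path(root).rglob("*.py"):
        try:
            hits = dead_imports(path.read_text(encoding="utf-8"))
        except (SyntaxError, ValueError, UnicodeDecodeError, OSError):
            continue  # skip unreadable files or ones that are not valid Python 3
        if hits:
            report[str(path)] = sorted(hits)
    return report
```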

Python finally offloads some batteries

Posted Mar 17, 2022 23:06 UTC (Thu) by cjwatson (subscriber, #7322) [Link] (5 responses)

I don't know about the rest of cgi, but cgi.FieldStorage is so broken as to be a snare and a delusion, so I'm glad they're removing it. I got involved with maintaining the multipart package when I found that FieldStorage was unusably broken for my purposes and that its design was enough of a ball of wax that it couldn't realistically be fixed without breaking something else.

Python finally offloads some batteries

Posted Mar 18, 2022 7:03 UTC (Fri) by mb (subscriber, #50428) [Link] (1 responses)

>cgi.FieldStorage is so broken as to be a snare and a delusion, so I'm glad they're removing it.

It works just fine for me.
What's broken with it?

I'm not against removing unused or rarely used modules.
But removing widely used modules like cgi is going to cause a major waste of developer time, on the order of hundreds of thousands of hours. That's not OK, and it will hurt Python's reputation. Again.

Python finally offloads some batteries

Posted Mar 18, 2022 18:48 UTC (Fri) by cjwatson (subscriber, #7322) [Link]

https://bugs.python.org/issue27777 was an absolute blocker in my application causing extremely confusing failures (and that only due to quite pedantic tests - we might easily have missed it until it hit production), and caused me weeks of work trying to work around it before I eventually concluded that cgi.FieldStorage was engaged in playing core-wars with other bits of itself and there was no realistic prospect of it ever being fixed, so switched to something different instead. See https://github.com/zopefoundation/zope.publisher/issues/39.

Python finally offloads some batteries

Posted Mar 18, 2022 18:35 UTC (Fri) by edgewood (subscriber, #1123) [Link] (2 responses)

Did you find an alternative? I have a few scripts that are only called occasionally, and aren't performance sensitive, so are fine as CGI scripts, which saves me deployment hassle.

But I do need to parse URL and form parameters, and use FieldStorage for that. I was planning to just copy cgi.py when it went EOL, but if there's a better replacement I'll use it.

Python finally offloads some batteries

Posted Mar 18, 2022 18:49 UTC (Fri) by cjwatson (subscriber, #7322) [Link] (1 responses)

The PEP itself has several suggestions (https://peps.python.org/pep-0594/#cgi).

Python finally offloads some batteries

Posted Mar 27, 2022 13:53 UTC (Sun) by edgewood (subscriber, #1123) [Link]

I'm so sorry that I missed that. I had skimmed the PEP, and had seen that section, but my eyes got pulled to the code sample at the bottom of the section.

I just had a chance to convert a script that used cgi.FieldStorage to use urllib.parse.parse_qs instead, and it only took me about half an hour. It helped that I had a helper method to smooth over some weirdness caused by the interaction of FieldStorage and the structure of my existing HTML, and a lot of the accesses of the FieldStorage values already went through that method. I just changed it to access the parse_qs dictionary instead, changed the handful of sites that accessed FieldStorage values directly to call the helper, and it all worked.
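A conversion along those lines might look like the minimal sketch below. It assumes a CGI environment (QUERY_STRING, REQUEST_METHOD, CONTENT_TYPE, and CONTENT_LENGTH set by the web server, with the request body on standard input) and handles only query strings and application/x-www-form-urlencoded POST bodies; multipart/form-data, as used for file uploads, needs something else. read_form and get_field are hypothetical names, with get_field playing the role of the helper method and behaving much like FieldStorage.getfirst().

```python
import os
import sys
from urllib.parse import parse_qs

def read_form(environ=None, stdin=None) -> dict:
    """Merge URL parameters and a urlencoded POST body into {name: [values]}."""
    environ = os.environ if environ is None else environ
    form = parse_qs(environ.get("QUERY_STRING", ""), keep_blank_values=True)
    ctype = environ.get("CONTENT_TYPE", "")
    if (environ.get("REQUEST_METHOD") == "POST"
            and ctype.startswith("application/x-www-form-urlencoded")):
        stdin = sys.stdin.buffer if stdin is None else stdin
        length = int(environ.get("CONTENT_LENGTH") or 0)
        body = stdin.read(length).decode("utf-8")  # assumes a UTF-8 body
        for name, values in parse_qs(body, keep_blank_values=True).items():
            form.setdefault(name, []).extend(values)
    return form

def get_field(form: dict, name: str, default=None):
    """First value of a field, or default if it is absent."""
    values = form.get(name)
    return values[0] if values else default
```

Passing environ and stdin explicitly is optional in a real CGI script, but it makes the parsing testable outside a web server.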

My previous plan was to vendor cgi.py, so thank you for responding to my question and pointing me to how I could use supported stdlib code!

Python finally offloads some batteries

Posted Mar 18, 2022 22:07 UTC (Fri) by flussence (guest, #85566) [Link]

If *Perl* can successfully get rid of its bundled CGI module, Python has no excuse. Surely 20 years is long enough to figure out what alternatives are worth using and have staying power.

Is this one of those 80/20 problems?

Posted Mar 18, 2022 1:46 UTC (Fri) by smitty_one_each (subscriber, #28989) [Link] (1 responses)

We want stability, but not stagnation.

I'm looking to see python's package management improve.

Maybe doing some battery maintenance will establish precedent to go for further modernization.

Is this one of those 80/20 problems?

Posted Mar 18, 2022 17:21 UTC (Fri) by canatella (subscriber, #6745) [Link]

This:

> I'm looking to see python's package management improve.

As of now there are multiple ways of managing dependencies and installing packages: setuptools, pip, pipx, pipenv, poetry, conda, pdm, twine, you name it (I'm probably mixing stuff up here, but that's part of the problem). There is no common file format for a lock file, so an end user installing with pip or pipx never knows which version of a dependency will be installed. (I know I can generate requirements.txt from the lock files, but I still somehow managed to break it.)
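One stopgap for the "which version actually got installed" problem is to compare what is installed against the pins in a requirements.txt. This is a minimal sketch using importlib.metadata (Python 3.8+); check_pins is a hypothetical helper, it handles only plain name==version lines (extras, markers, hashes, and URLs are skipped), and it compares version strings literally rather than applying PEP 440 normalization, so it is no substitute for a real lock-file tool.

```python
from importlib.metadata import PackageNotFoundError, version

def check_pins(lines, lookup=version):
    """Return (name, pinned, installed) for every pin that is not satisfied.

    installed is None when the distribution is not installed at all.
    """
    problems = []
    for raw in lines:
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if "==" not in line or ";" in line or "[" in line:
            continue  # only plain "name==version" pins are handled
        name, pinned = (part.strip() for part in line.split("==", 1))
        try:
            installed = lookup(name)
        except PackageNotFoundError:
            installed = None
        if installed != pinned:  # literal string comparison, no normalization
            problems.append((name, pinned, installed))
    return problems
```

In practice it would be fed the lines of an open requirements file, e.g. check_pins(open("requirements.txt")).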

I'm okay with moving things out of the stdlib, but first fix the package dependency management mess. Bless one solution, document it, and support it. The packaging.python.org site advises using pipenv but then warns: "By contrast, Pipenv explicitly avoids making the assumption that the application being worked on will support distribution as a pip-installable Python package." Why is that? The officially documented way of building a package doesn't support the recommended way of installing packages? And it also recommends pip to install packages? Guaranteed version conflicts ahead...

Sorry for ranting here, but I can't count how many days I have lost fixing dependency issues in Python. Maybe I'm doing it wrong, but there are so many ways to manage dependencies and delivery in Python, and nothing blessed and clearly documented on packaging.python.org. I'm completely lost.

I never had any problem with Ruby with gem and bundler. And honestly, these days even C++ dependency management is better with Conan than Python's, which says a lot.


Copyright © 2022, Eklektix, Inc.
This article may be redistributed under the terms of the Creative Commons CC BY-SA 4.0 license
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds