
Keeping Python competitive

By Jake Edge
May 31, 2017

Python Language Summit

Victor Stinner sees a need to improve Python performance in order to keep it competitive with other languages. He brought up some ideas for doing that in a 2017 Python Language Summit session. No solid conclusions were reached, but there is a seemingly growing segment of the core developers who are interested in pushing Python's performance much further, possibly breaking the existing C API in the process.

The "idea is to make Python faster", he said, but Python is not as easy to optimize as other languages. For one thing, the C API blocks progress in this area. PyPy has made great progress with its CPyExt API for C extensions, but it still has a few minor compatibility problems. PyPy tried to reimplement the NumPy extension a few years back, so that it would work with PyPy, but that effort failed. NumPy is one of the C extensions to Python that essentially must work for any alternative implementation. But the C API blocks CPython enhancements as well—Gilectomy, for example. It would be nice to find a way to change that, he said.

[Victor Stinner]

A limited stable ABI has been defined for Python, but the full API, which is a superset of the ABI, can change between releases. The ABI is not tested, however; there is no tool to do so, and Stinner said he knows of multiple regressions in it. The standard library is not restricted to using only the stable ABI, so the full API is effectively the default. All of that makes the ABI useless in practice. But Gilectomy needs a somewhat different C API to gain parallelism.

A different C API could perhaps provide a benefit that would act as a carrot for users to switch to using it, but he is not sure what should be offered. It is, in some ways, similar to the Python 2 to 3 transition: changing the API for performance or parallelism may not provide enough incentive for extension authors and users to port their existing code.

The C API is used both by the CPython core and by C extensions. Beyond that, it is used by low-level debuggers as well. But all of the header files for Python reside in the same directory, which makes it hard to determine what is meant to be exposed and what isn't. In the past, there have been some mistakes where things were added to the API unintentionally. It might make sense to break out the headers that are meant to describe the public API into their own directory, he suggested.

Python 3.7 is as fast as Python 2.7 on most benchmarks, but 2.7 was released in 2010. Users are now comparing Python's performance to that of Rust or Go, languages that had only recently been announced in 2010. In his opinion, the Python core developers need to find a way to speed Python up by a factor of two in order for it to continue to be successful.

One way might be just-in-time (JIT) compilation, but various projects have been tried (e.g. Unladen Swallow, Pyston, and Pyjion) and none has been successful, at least yet. PyPy has made Python up to five times faster; "should we drop CPython and promote PyPy?". Many core developers like CPython and the C API, however. But, in his opinion, if Python is to be competitive in today's language mix, the project needs to look at JIT or moving to PyPy.

He had some other ideas to consider. Perhaps a new language could be created that is similar to Python but stricter, somewhat like Hack for PHP. He is not sure that would achieve his 2x goal, though. Ahead-of-time (AoT) compilation using guards that are checked at run time, like Stinner's FAT Python project, might be a way to get JIT-like benefits without the need for a long warmup time. A multi-stage JIT, like the one for JavaScript, might provide the performance boost he is looking for.
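To make the guard idea concrete, here is a toy sketch in pure Python; it illustrates the mechanism only, not FAT Python's actual machinery, and a real implementation would generate the specialized code and guards automatically:

    import builtins

    _POW_AT_SPECIALIZATION_TIME = builtins.pow

    def _square_fast(x):
        # Specialized body: pow(x, 2) has been replaced with x * x.
        return x * x

    def square(x):
        # Guard: check that builtins.pow has not been monkeypatched
        # since the specialized version was generated.
        if builtins.pow is _POW_AT_SPECIALIZATION_TIME:
            return _square_fast(x)
        return builtins.pow(x, 2)  # deoptimized, generic path

If the guard ever fails, the call transparently falls back to the generic path, which is what keeps the optimization semantically safe.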

Brett Cannon (who is one of the developers of Pyjion) noted that JIT projects are generally forks of CPython. That means the JIT developers are always playing catch-up with the mainline and that is hard to do. Pyjion is a C extension, but the other projects were not able to do that; the interfaces that Pyjion uses only went in for Python 3.6. He thought there might be room for consolidating some of the independent JIT work that has gone on, however.

But Mark Shannon pointed out that Pyjion and others are function-based JITs, while PyPy is tracing based. Beyond that, PyPy works, he said. Alex Gaynor, who is a PyPy developer, said that the PyPy project has changed the implementation of Python to make it more JIT friendly; that led to "a huge performance gain". He is skeptical that making small API changes to CPython will result in large performance gains from a JIT.

An attendee suggested Cython, which does AoT compilation, but its types are not Pythonic. He suggested that it might be possible to use the new type hints and Cython to create something more Pythonic. Cython outputs C, so the 2x performance factor seems possible.
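Cython's pure-Python mode already hints at what that could look like. A minimal sketch, assuming Cython is installed (the cython module ships a pure-Python shadow implementation, so the same code also runs under plain CPython):

    import cython

    @cython.locals(total=cython.double, x=cython.double)
    def mean(values):
        # With the type declarations, Cython can compile this loop down
        # to C arithmetic; uncompiled, it runs as ordinary Python.
        total = 0.0
        for x in values:
            total += x
        return total / len(values)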

Another audience member said that while it makes sense to make the ABI smaller, it is already being used, so how is that going to change? It might be possible to stop it from growing, or from growing in certain directions. One way to do that might be to require new C APIs to be implemented in PyPy before they can be merged. That might avoid the "horrible things" that some extensions (e.g. PyQt) have done. Stinner responded, "I did not say I have solutions, I only have problems", to some chuckles around the room.

PyPy has gotten its CPyExt extension API to work better, so NumPy now works for the most part, an attendee said. Remaining problems can be fixed by reusing pieces of the original NumPy rewrite. The long arc is to push more extension writers away from the C API and toward the C Foreign Function Interface (CFFI). But Stinner is still concerned that the problem is bigger than just extensions; the C API is preventing some innovative changes to CPython.

[I would like to thank the Linux Foundation for travel assistance to Portland for the summit.]





Keeping Python competitive

Posted May 31, 2017 20:53 UTC (Wed) by barryascott (subscriber, #80640)

I would love to see a better API than the existing C API to Python.
All my code is typically C++, not C. CFFI is not usable for C++ code, right?
I doubt that CFFI would be able to handle PyQt, which the article mentions,
at the level it achieves today using SIP.

I managed to, mostly, hide the differences between the Python 2 and Python 3
C APIs in PyCXX and would love the challenge of adding support for a
replacement python extension API.

Barry with PyCXX hat on.

Keeping Python competitive

Posted Jun 1, 2017 2:34 UTC (Thu) by fratti (guest, #105722)

As I've had to discover, the awfully long startup times of PyPy make it bad for use in command line applications.

Here's a radical suggestion: instead of trying to work around Python's loose and wild language and API to compete with languages that were actually designed by someone taking these things into account (such as Rust, Go and D), Python could remain in its own cozy niche as a fast to prototype and easy to use language for all sorts of automation.

Keeping Python competitive

Posted Jun 1, 2017 21:56 UTC (Thu) by vstinner (subscriber, #42675)

"Python could remain in its own cozy niche as a fast to prototype"

Haha, nice joke, thank you ;-)

Keeping Python competitive

Posted Jun 1, 2017 9:06 UTC (Thu) by Wummel (guest, #7591)

Can someone explain what "horrible things" the PyQt extension is doing?

Keeping Python competitive

Posted Jun 1, 2017 10:40 UTC (Thu) by mikapfl (subscriber, #84646)

For me (scientific programming), I found the numba JIT [1] for python very interesting. It does not compile all of Python, but it does compile a specific subset which covers almost all of numpy, some parts of scipy, and a good chunk of stdlib math. It is not API-compatible (for example, with regards to floating-point exceptions) but it works remarkably well for my use case: I develop an algorithm in CPython, run it in CPython on some toy examples, and when I actually need the performance, I jit the hot code paths with numba. The good thing is that the interface between jitted code and native code works flawlessly, so if something doesn't work in jitted code, I do it in CPython instead. And I can always compare the results of test runs non-jitted (CPython) and jitted.

I actually compared implementations of an algorithm with rather deeply nested for-loops (lots and lots of complex 2x2 matrices need to be multiplied) in plain CPython, numpy, Cython, numba, C++, and C++ with eigen. CPython and numba are legible, and C++ with eigen is illegible to the point that I couldn't make it produce proper results. The performance of the CPython or plain numpy versions is of course abysmal; the for-loops kill everything. Cython doesn't help much. The numba version is decently fast. The C++ version can be hand-optimized at an algorithmic level (i.e. using 1/exp(x) instead of exp(-x) at certain points, or implementing x² for complex x as its own function) to be about 20% faster than the numba version, but I'll happily take something written in python and jitted with slightly worse performance, as it is much easier to write and read.
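As a rough sketch of what that numba workflow looks like (hypothetical code, assuming numba and NumPy are installed; only the decorator distinguishes it from plain CPython code):

    import numpy as np
    from numba import njit

    @njit  # remove this decorator and the same code runs, slowly, on CPython
    def chain_2x2(mats):
        # Accumulate the product of a chain of complex 2x2 matrices,
        # written out element-wise so the loop compiles to tight native code.
        a00, a01, a10, a11 = 1 + 0j, 0j, 0j, 1 + 0j
        for i in range(mats.shape[0]):
            b00, b01 = mats[i, 0, 0], mats[i, 0, 1]
            b10, b11 = mats[i, 1, 0], mats[i, 1, 1]
            a00, a01, a10, a11 = (a00 * b00 + a01 * b10,
                                  a00 * b01 + a01 * b11,
                                  a10 * b00 + a11 * b10,
                                  a10 * b01 + a11 * b11)
        out = np.empty((2, 2), dtype=np.complex128)
        out[0, 0] = a00
        out[0, 1] = a01
        out[1, 0] = a10
        out[1, 1] = a11
        return out

    mats = np.random.rand(100000, 2, 2) + 1j * np.random.rand(100000, 2, 2)
    result = chain_2x2(mats)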

Maybe this would be a way to go: Extend numba to more modules, such that it becomes more useful outside of the numeric/scientific use case. I don't really know what the limiting things are performance-wise in

[1] http://numba.pydata.org/

Keeping Python competitive

Posted Jun 1, 2017 12:55 UTC (Thu) by ehiggs (subscriber, #90713)

This is a good write-up. Thanks, Jake.

>possibly breaking the existing C API in the process.

Please, no.

> The ABI is not tested, however; there is no tool to do so. Stinner said he knows of multiple regressions in the ABI, for example.

Add CI testing then.

>He had some other ideas to consider. Perhaps a new language could be created that is similar to Python but stricter, somewhat like Hack for PHP.

This is the right direction, imo.

Python can look to its own history of successes and failures to work through this. Allow opt-in systems and let the community decide. Numpy, numba, pandas, sklearn, cffi, cython, theano, and tensorflow are all routes to face-melting performance without changing the CPython runtime. The attempts to replace CPython or make backwards-incompatible changes have been less successful. A low barrier to experimentation (and hence trust) is key.

Therefore, some ideas that should be possible (warning: braindump quality prose):

1. Pull in a decorator like @jit from numba that works for a wider subset of python. Numba is generally for numeric computing so something that works with strings and the like as well would be better. Maybe allow different backends (e.g. @jit(backend=cython)) for more up front cost but tastier performance. Let the programmer tell the runtime what to do instead of coming up with all sorts of whizbang tracing infrastructure. As a user, if @jit gives an error or the performance isn't worth it, I can drop the decorator. And if I, as a user, have asked to eat the up front cost, then I'm happy to eat it.

2. If I understand correctly, a non-trivial amount of time is also spent in interface querying. So add a decorator to seal objects so that no interface querying needs to take place (e.g. @freeze, @bless, @fossil, @repr("C"), etc.; see the sketch after this list). If you need to monkey patch something, maybe add enough RTTI salt to allow re-vanilla-pythonification of the object. But if I'm a user that has sealed a type or an object, I should be ok with the fact that it's now a black box.

3. Find out what people are trying to do that isn't performant enough and see if we can come up with primitives that are written in C that are better performing. This is the same principle where bash is slow, but grep is fast so no one cares because it's just a race-to-C. e.g. csv parsing is pretty quick (faster than Go, in fact[1]). uvloop is fast[2]. Offer non-monkeypatchable primitives if performance is better. Pull in a better datetime based on a 64bit value into the stdlib instead of the ridiculous existing datetime. This won't help people benchmarking their recursive fibonacci numbers but it will help actual applications shred through data.

4. For killing the GIL, come up with a protocol for embedding languages (i.e. the aforementioned strict-similar-language (SSL)) in Python. Maybe calls are asynchronous since the SSL lives outside the GIL. Maybe this protocol can be used to wrap Python 2.7 or Python 4.x and now major version changes also have a migration path as you can take functions and convert them to Python 4 and still call them from Python 3. So this work wouldn't just be for performance but for re-cementing Python's position as a universal glue language.

[1] https://bitbucket.org/ewanhiggs/csv-game
[2] https://magic.io/blog/uvloop-blazing-fast-python-networking/
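For idea 2, here is a toy sketch of what a sealing decorator might look like. The @freeze name and semantics are the commenter's suggestion, not an existing API, and this pure-Python version only demonstrates the behavior, not the speedup a runtime could extract from a fixed object layout:

    def freeze(cls):
        # Hypothetical @freeze: once __init__ has run, instances refuse
        # new attributes, so their shape is effectively fixed.
        original_init = cls.__init__

        def __init__(self, *args, **kwargs):
            original_init(self, *args, **kwargs)
            object.__setattr__(self, "_sealed", True)

        def __setattr__(self, name, value):
            if getattr(self, "_sealed", False) and not hasattr(self, name):
                raise AttributeError("sealed object: cannot add " + name)
            object.__setattr__(self, name, value)

        cls.__init__ = __init__
        cls.__setattr__ = __setattr__
        return cls

    @freeze
    class Point:
        def __init__(self, x, y):
            self.x, self.y = x, y

    p = Point(1, 2)
    p.x = 10  # fine: mutating an existing attribute
    p.z = 3   # AttributeError: the object is sealed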

Keeping Python competitive

Posted Jun 1, 2017 14:54 UTC (Thu) by pboddie (guest, #50784)

> > He had some other ideas to consider. Perhaps a new language could be created that is similar to Python but stricter, somewhat like Hack for PHP.
> This is the right direction, imo.

Here, I agree.

> 1. Pull in a decorator like @jit from numba that works for a wider subset of python.

I think the difficulty here is actually implementing such a thing, but maybe you know something that would make the lives of the implementers of the three just-in-time compilation extensions/implementations, supported to varying (arguably feeble) levels by large corporations and now in varying states of stagnation, somewhat easier.

> 3. Find out what people are trying to do that isn't performant enough and see if we can come up with primitives that are written in C that are better performing.

And then people will be even more tied to CPython and even less able to use alternative implementations like PyPy which can actually do much more with the code if it stays implemented in Python.

Python's main problem here is incoherency. As long as the proposed solutions involve people doing array processing (whilst laying claim to "scientific computing" in general) throwing things at the wall and seeing what sticks, and where the person considering or dabbling in Python is confronted with an alphabet soup of different "speed booster" technologies, there will be a strong tendency for such people to go and use another language instead: one with generally better performance and a familiarisation story with a lot fewer twists, turns and surprises.

Keeping Python competitive

Posted Jun 1, 2017 22:10 UTC (Thu) by ehiggs (subscriber, #90713)

> the three just-in-time compilation extensions/implementations, supported to varying (arguably feeble) levels by large corporations and now in varying states of stagnation, somewhat easier.

Which ones are you talking about? numba (which is actively developed)? What else is there that can be used through a decorator?

> And then people will be even more tied to CPython and even less able to use alternative implementations like PyPy which can actually do much more with the code if it stays implemented in Python.

Doesn't CFFI solve this?

> As long as the proposed solutions involve people doing array processing (whilst laying claim to "scientific computing" in general) throwing things at the wall and seeing what sticks, and where the person considering or dabbling in Python is confronted with an alphabet soup of different "speed booster" technologies, there will be a strong tendency for such people to go and use another language instead: one with generally better performance and a familiarisation story with a lot fewer twists, turns and surprises.

I disagree. The article says "Users are now comparing Python performance to that of Rust or Go". People doing array processing aren't seeing the alphabet soup and turning to Rust or Go. They are coming from R and Matlab and saying "I'm ok with this".

I don't think people using Python are dropping it for Rust since Rust and Python play nicely over FFI (e.g. rust-fst which is used in some genomics research). You can have Rust + Python.

In the end, I think it's web developers and sysadmins turning to Go, and the deployment story is probably a big part of it. Deploying a static ELF is much nicer than whatever we do with Python.

Keeping Python competitive

Posted Jun 2, 2017 11:51 UTC (Fri) by pboddie (guest, #50784)

> > the three just-in-time compilation extensions/implementations, supported to varying (arguably feeble) levels by large corporations and now in varying states of stagnation, somewhat easier.
> Which ones are you talking about? numba (which is actively developed)? What else is there that can be used through a decorator?

The ones Victor mentions in his slides: Unladen Swallow (developed by Google, now abandoned), Pyston (developed by Dropbox, now a side project), Pyjion (developed by Microsoft employees as a side project). I'm talking about general just-in-time compilers here.

> People doing array processing aren't seeing the alphabet soup and turning to Rust or Go. They are coming from R and Matlab and saying "I'm ok with this".

I'm sure they are. However, array processing is not the entirety of "scientific computing", which means that there are plenty of other people who haven't had to use Fortran or Matlab (or various particularly horrid proprietary technologies that apparently linger in certain domains). Many of those people want a coherent path if they are going to learn and use a language, and they aren't likely to be impressed when told that they have to pull in today's favourite numeric Python extension or that they will need to learn another language to rewrite part of their Python code so that it goes fast enough.

Some people will be so embedded in a particular culture that they will just use what their boss/peers recommend - that's how those nasty proprietary systems manage to linger even when the world has moved on - and so they will be happy using whichever numeric extension is popular in their environment. But that doesn't mean that people outside such cultures see things the same way.

Keeping Python competitive

Posted Jun 2, 2017 14:40 UTC (Fri) by ehiggs (subscriber, #90713)

Ok, so:
> I think the difficulty here is actually implementing such a thing, but maybe you know something that would make the lives of the implementers of Unladen Swallow, Pyston, and Pyjion somewhat easier.

I obviously communicated my point poorly. Picking out some of the points:

> Low barrier to experimentation (and hence trust) is key... a decorator like @jit... Let the programmer tell the runtime what to do instead of coming up with all sorts of whizbang tracing infrastructure... As a user, if @jit gives an error or the performance isn't worth it, I can drop the decorator. And if I, as a user, have asked to eat the up front cost, then I'm happy to eat it.
From a deployment perspective, Unladen Swallow, Pyston, Pyjion, and PyPy are high-risk changes. Using them is a big-bang change to the project. If it doesn't work initially then I can spend considerable time trying to port my code into a state where it can be used by the different runtimes - all the while not knowing what the payoff will be. This makes them nonstarters and that's why adoption is so low even with PyPy.

What I am trying to suggest has a low deployment footprint. You profile your program and find a function that needs to run fast. You mark it "@jit". If it works and performance improves, you go find something else to do with your time. If it didn't make the program fast enough, you mark a few more functions as "@jit", etc. If it made it worse, then you remove "@jit". If the code is too weird (monkey patched, weird types like string + int, etc) then it raises an exception saying it couldn't jit it.
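numba's nopython mode already behaves roughly this way: when it cannot compile a function, its real @njit decorator raises a typing error at the first call rather than silently running slowly. A small sketch, assuming numba is installed:

    from numba import njit

    @njit
    def mixed(x):
        return x + "1"  # str + int: nopython mode refuses to type this

    try:
        mixed(1)
    except Exception as exc:  # numba raises a TypingError here
        print("could not jit:", type(exc).__name__)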

The deployment story is so laughably simple compared to fundamentally replacing the runtime that I don't feel that it's a similar solution to Unladen Swallow, Pyston, Pyjion, or PyPy at all. How does this make the maintainer's life easier? It vastly simplifies the problem. No tracing. No API or ABI compatibility hell. No weird benchmark issues where you have to warm up the JIT first. Easier to sell to users. Less code to maintain. Easier to document.

Keeping Python competitive

Posted Jun 2, 2017 15:05 UTC (Fri) by pboddie (guest, #50784)

> From a deployment perspective, Unladen Swallow, Pyston, Pyjion, and PyPy are high-risk changes.

But they shouldn't be. It is only because the default strategy to mitigate CPython limitations has been to more or less "rewrite it in C" that people now find themselves with the CPython and C API baggage mentioned in the article.

I remember attending a seminar where people were being told to consider a cornucopia of different speed-up technologies, but knowing that some of the people present were not array processing people, I made a point of suggesting that perhaps those people could consider trying PyPy and seeing if it made a difference instead. Note well that I brought up PyPy because it was conspicuously absent from the material presented in the seminar.

Of course, trying out PyPy wasn't really an option for people with C extension dependencies, but I was trying to persuade people that they might consider using what should be a transparent replacement for CPython (in its main job of running Python code) before dashing down the road of reimplementation and sprinkling their code with artefacts of a range of other technologies that will most certainly broaden their dependency manifests.

> Using them is a big-bang change to the project. If it doesn't work initially then I can spend considerable time trying to port my code into a state where it can be used by the different runtimes - all the while not knowing what the payoff will be. This makes them nonstarters and that's why adoption is so low even with PyPy.

I don't disagree that adopting PyPy once you've started to rely on CPython extensions is going to be an upheaval. My point is that lots of people, led by the core CPython developers, have taken a path that limits their options considerably. Indeed, the reason why many of these "complete" JIT compilers struggle to balance CPython legacy requirements while attempting to introduce more modern techniques and still deliver the performance benefits that you see with other languages is because they want to deliver it all, precisely so that they can cater to people with specific and unyielding dependencies. Which is again a point made in the article.

> The deployment story is so laughably simple compared to fundamentally replacing the runtime that I don't feel that it's a similar solution to Unladen Swallow, Pyston, Pyjion, or PyPy at all.

I'd almost believe the "laughably simple" part if I had any confidence in these technologies staying the course, rather than becoming the "old new thing", the "old old thing", and the "new old thing", which is a bit of a problem in the Python scene in general. But again, marking up and tuning individual functions for handling by some CPython extension just ties people to CPython even more. Maybe the array processing people will see this as still being better than Fortran, but for many other people, it makes them look elsewhere. Again, covered by the article.

Keeping Python competitive

Posted Jun 3, 2017 2:54 UTC (Sat) by njs (subscriber, #40338)

Yeah, Numba's design is super clever, exactly because it can provide incremental value without having to first reimplement all of Python. But it's probably worth keeping in mind that (a) this design really only works for the kinds of numerical kernel code that they're focusing on, and (b) their success has also depended on having a corporate backer and tons of funding. (I think it's varied between 2 and 4 full-time devs over its lifetime?) So... it's not necessarily something other projects can generalize from.

The kinds of programs that PyPy can speed up are almost disjoint from the ones that Numba can speed up, and this isn't just a difference in what they've worked on - it's fundamental to their underlying optimization strategies. In many cases, using PyPy and Numba together would make a lot of sense, with Numba acting as a highest "tier" for hot numerical loops.

Keeping Python competitive

Posted Jun 8, 2017 18:41 UTC (Thu) by jgfenix (guest, #113371)

I see Julia as that new language, especially in the numerical field.

Keeping Python competitive

Posted Jun 1, 2017 15:18 UTC (Thu) by mgedmin (subscriber, #34497)

> PyPy tried to reimplement the NumPy extension a few years back, so that it would work with PyPy, but that effort failed.

Is this referring to NumPyPy? It failed? The last mention of it that I can find on the PyPy blog (https://morepypy.blogspot.lt/2016/11/) still mentions it as a viable alternative. What happened since then?

Keeping Python competitive

Posted Jun 1, 2017 22:08 UTC (Thu) by vstinner (subscriber, #42675)

Sorry, I was maybe plain wrong. I heard that NumPyPy is a small subset of numpy, too small to be usable in many cases. I also heard that NumPyPy is developed by a small team, much smaller than numpy's.

My point was more that cpyext became faster and supports more functions of the CPython C API.

"We continue to make incremental improvements to our C-API compatibility layer (cpyext). We pass all but 12 of the over-6000 tests in the upstream NumPy test suite, and have begun examining what it would take to support Pandas and PyQt."
https://morepypy.blogspot.fr/2016/11/

Keeping Python competitive

Posted Jun 2, 2017 3:33 UTC (Fri) by njs (subscriber, #40338)

The PyPy team has since given up on developing numpypy as a straight replacement for numpy; their current plan is to support real numpy via their cpyext layer (which has involved a bunch of improvements in cpyext, plus working with the numpy developers to remove some of the nastier stuff numpy does), and then monkeypatch in performance-critical bits of numpypy on top of numpy when it's imported.

I'm not really a fan of the latter part, but there isn't yet much in the way of viable alternatives.

The underlying problem is that numpy's API is big, complicated, quirky, and a moving target, so trying to create a bug-for-bug replica of all or even part of it is not easy.

Keeping Python competitive

Posted Jun 1, 2017 16:23 UTC (Thu) by dowdle (subscriber, #659)

Maybe this is the video of his presentation?
https://www.youtube.com/watch?v=d65dCD3VH9Q

He seems to be wearing a different shirt in the video than in the photo, so maybe not.

Keeping Python competitive

Posted Jun 1, 2017 16:58 UTC (Thu) by pboddie (guest, #50784)

No, that is most likely (without checking the schedule) a regular talk, whereas the Language Summit is an invitation-only event separate from the regular proceedings. So, I guess we should be thankful for LWN's coverage, because without it the rest of us would probably never really hear what was discussed in sessions like the one featured in this article.

Keeping Python competitive

Posted Jun 1, 2017 22:15 UTC (Thu) by roc (subscriber, #30627)

Apart from the C extension issues, isn't a big part of the problem that Python-the-language is really complicated and underdocumented, being defined by its interpreter and growing by accretion over many years, with zillions of special cases that have to be checked at run time? There was an LWN article about this not that long ago that I can't find now...

Keeping Python competitive

Posted Jun 2, 2017 4:19 UTC (Fri) by njs (subscriber, #40338)

Yes, this is what motivates PyPy's "metatracing" approach: they implement all the quirky special cases inside the JITted language, so the tracer can see them and prune off the ones that aren't actually used in a particular case.

But there are ways to attack this problem (as demonstrated by PyPy), or to remove the GIL, or etc.; the problem for CPython is that it's incredibly difficult to even begin on any of those without breaking the C extension API. The current C API is not entirely unlike Firefox's old extension API: it pretty much directly exposes all the interpreter's internals. In particular, it makes it very obvious that CPython is a straightforward mapping of Python constructs onto C structures + a straightforward bytecode interpreter loop, all of which is fine for what it is but sure looks different from modern high-performance JITs.
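A small CPython-specific illustration of how directly those internals show through, even from Python code; this relies on id() returning the object's address and on a PyObject beginning with its reference count, so it would break on any other implementation:

    import ctypes
    import sys

    x = object()
    # Read the ob_refcnt field straight out of the live PyObject.
    refcnt = ctypes.c_ssize_t.from_address(id(x)).value
    # sys.getrefcount() reports one extra reference, for its own argument.
    print(refcnt, sys.getrefcount(x))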


Copyright © 2017, Eklektix, Inc.
This article may be redistributed under the terms of the Creative Commons CC BY-SA 4.0 license
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds