LWN: Comments on "A better story for multi-core Python" https://lwn.net/Articles/650489/ This is a special feed containing comments posted to the individual LWN article titled "A better story for multi-core Python". en-us Fri, 26 Sep 2025 11:11:40 +0000 Fri, 26 Sep 2025 11:11:40 +0000 https://www.rssboard.org/rss-specification lwn@lwn.net A public relations problem https://lwn.net/Articles/652696/ https://lwn.net/Articles/652696/ pboddie <blockquote>I agree that Python is not the best tool for that job. The answer here would be C or Cython.</blockquote> <p>I heard this myself from a bunch of "scientific Python" people a few years ago. The response from a colleague of mine who isn't (or wasn't) really a Python user was, "Why not just write the code in C in the first place and just ignore Python?" That's a pretty hard question to answer even for those of us who feel moderately productive in Python.</p> <p>The big problems with Python's evolution have been the denial that various disadvantages are "real enough" and that everything has to be tied somehow to CPython or not be completely legitimate (although some in the "scientific" community are slowly accepting things like PyPy after choosing to ignore it for years). Need to cross-compile Python in a sane way or target embedded systems or mobile devices? No-one needs to do that! Wasn't Python for Symbian Series 60 enough?! Thankfully, stuff like Micro Python has been developed and has presumably thrived by filling an otherwise neglected niche. Meanwhile, attempts to deliver CPython as a mobile platform seem to be stuck on repeat at the earliest stage. Plenty of examples of other domains exist if you care to look.</p> <blockquote>In the end, Python values other features higher than performance; and again, that's largely a design decision.</blockquote> <p>People have been saying this for twenty years. 
Making a virtue of such things - that performance is a lost cause and that everyone should instead celebrate other things including the tendency to make the language even more baroque - is precisely what has held the language back for a good long time, too. Such attitudes probably put Perl in the place it currently resides today, in case any lessons from history were needed.</p> Tue, 28 Jul 2015 23:19:19 +0000 A public relations problem https://lwn.net/Articles/651664/ https://lwn.net/Articles/651664/ Kwi <p>A quick Internet survey of implementations in other languages demonstrates my point.</p> <p>There's a 76 line <a href="https://github.com/sourcefrog/natsort/blob/master/strnatcmp.c">C implementation</a> and a related 63 line <a href="https://github.com/paour/natorder/blob/master/NaturalOrderComparator.java">Java implementation</a>. The large number of lines reflects that both C and Java are lacking in their native support for high-level list and string operations.</p> <p>I struggled to find an idiomatic C++ implementation (found plenty of "C in C++" style implementations), though I did find one <a href="https://web.archive.org/web/20071217040157/http://www.boostcookbook.com/Recipe:/1235053">using Boost</a> (39 lines).</p> <p><a href="http://www.interact-sw.co.uk/iangblog/2007/12/13/natural-sorting">With C#</a> we're finally getting somewhere; it can be done in 7 lines, plus a 17 line utility class that really ought to be in the standard library (but isn't). (C# in general seems to be a good fit if one wants a statically typed, compiled and managed language with a level of expressiveness that approaches Python.)</p> <p>Again, I'm sure that an experienced C/C++/Java developer could do it in fewer lines than the above, but according to Google, those examples are the best the Internet has to offer. Google also finds several Python implementations, all of them variations on the same 5 lines as I posted above. 
(I guess there's <a href="https://www.python.org/dev/peps/pep-0020/">only one obvious way to do it</a>.)</p> Mon, 20 Jul 2015 01:00:36 +0000 A public relations problem https://lwn.net/Articles/651648/ https://lwn.net/Articles/651648/ Kwi <div class="FormattedComment"> Oh, I also wanted to add the following correction to my earlier post, though it evidently got dropped while I was editing my reply:<br> <p> A "100x" boost between C and Python is overstating it, but I'm confident that 10x is a lower bound.<br> <p> In the end, it's all fuzzy numbers, obviously. :-)<br> </div> Sun, 19 Jul 2015 17:41:02 +0000 A public relations problem https://lwn.net/Articles/651633/ https://lwn.net/Articles/651633/ Kwi <p>Maybe I'm just a bad C developer. ;-)</p> <p>All joking aside: For a project where I can fully harvest the benefits of Python features like tuples, generators, memory safety and the wide selection of readily available libraries, I routinely write in 5 lines what would have taken 50 in a language like C. (I'd put modern C++ – that is C++11 or later – somewhere in the middle, let's say 3x faster than C and 3x slower than Python.)</p> <p>Not only does that save me the time it takes to type those lines, but several studies suggest that the bug density (bugs per line) is <i>roughly</i> independent of the choice of programming language*, which means I save the time needed to debug those lines.</p> <p>Coming up with a simple example to demonstrate the benefits of a programming language is always difficult, but I'll try anyway.</p> <p>Here's a 5-line Python function. The function depends on the standard library <tt>re</tt> (regular expression) module, and it's used with the built-in <tt>sorted</tt> function.</p>
<pre>import re

def natural(s, _re=re.compile('([0-9]+)')):
    """ Provides a sort key for obtaining a natural collation of strings.
    &gt;&gt;&gt; sorted(['Figure 2', 'Figure 11a', 'Figure 7b'])
    ['Figure 11a', 'Figure 2', 'Figure 7b']
    &gt;&gt;&gt; sorted(['Figure 2', 'Figure 11a', 'Figure 7b'], key=natural)
    ['Figure 2', 'Figure 7b', 'Figure 11a']
    """
    return tuple(
        int(text) if text.isdigit() else text.lower()
        for text in _re.split(s)
    )
</pre>
<p>If you count the docstring, it's 10 lines, but then you also have unit tests (<tt>python -m doctest natural_sort.py</tt>).</p> <p>And yes, I'll go out on a limb and say that the above is representative of maybe 80% of the Python code I write – except for the number of lines, of course. ;-)</p> <p>If put to the challenge, I'm sure that <i>someone</i> can come up with a more or less equivalent C function in less than 50 lines (or less than 15 lines of C++). But it'll take them significantly longer than the 10 minutes it took to write the above, and it won't be nearly as readable (YMMV).</p> <p>*) I know, I know, it's nearly impossible to measure with any level of scientific rigor, and the research is <a href="https://programmers.stackexchange.com/questions/131137/research-on-software-defects">highly contested</a>. Still, some references:</p> <p>Ray et al., 2014. <i><a href="http://macbeth.cs.ucdavis.edu/lang_study.pdf">A Large Scale Study of Programming Languages and Code Quality in Github</a></i>.</p> <p>While the paper draws no conclusions, its data suggests that Python has roughly twice the bug density (bugs per SLOC) of C, C++ or Java. (Assuming Python has at most half as many SLOC as the equivalent C, that's still a win.)</p> <p>Phipps, 1999. 
<i>Comparing observed bug and productivity rates for Java and C++</i>.</p> <p>Apparently (haven't read the study) suggests that C++ has 15–30% more defects per line than Java.</p> Sun, 19 Jul 2015 17:09:35 +0000 A public relations problem https://lwn.net/Articles/651545/ https://lwn.net/Articles/651545/ linuxrocks123 <div class="FormattedComment"> Unless we're talking assembler versus something else, I don't think developer productivity impact should be anywhere near 10x between different high-level languages. Python is supremely convenient, but that convenience is worth no more to me than a 2x productivity increase over C or C++. In fact, for very large projects, Python starts to break down for a variety of reasons, and the speedup could even turn negative.<br> <p> What type of code are you writing where you think you get a 10x productivity boost by switching languages?<br> </div> Sat, 18 Jul 2015 06:04:39 +0000 A public relations problem https://lwn.net/Articles/651481/ https://lwn.net/Articles/651481/ Kwi <p>I should clarify that 99% of the time, the performance hit is insignificant, but in a tight spot, one may want to replace <tt>while ...: x.y()</tt> by:</p>
<pre>
z = x.y
while ...:
    z()
</pre>
<p>This saves a lookup of the <tt>y</tt> attribute on every iteration, in favor of a much faster local variable access (assuming this is in a function body).</p> <p>The Python interpreter can't do this optimization automatically, because it'd change the semantics if one thread assigned to <tt>x.y</tt> while another was in the loop. It's just one example of the performance difficulties imposed by a highly dynamic language like Python (which doesn't have C's <tt>volatile</tt> keyword).</p> <p>But again, 99% of the time, you care more about the language than the performance. So don't go "optimize" every bit of Python code like this. 
:-)</p> Fri, 17 Jul 2015 15:51:26 +0000 A public relations problem https://lwn.net/Articles/651447/ https://lwn.net/Articles/651447/ arvidma <div class="FormattedComment"> I had no idea about the difference in cost between x.y() and y(), I assumed that that type of thing would be optimized away by the interpreter.<br> <p> Thanks for a very informative response!<br> </div> Fri, 17 Jul 2015 11:51:14 +0000 A public relations problem https://lwn.net/Articles/650892/ https://lwn.net/Articles/650892/ Kwi <p>Sorry, I should've clarified that the data structure should be in C as well, for the reasons you give.</p> <p>Anyway, the secondary point (besides "the GIL is just one of many problems for performance") is that all languages to some extent trade CPU hours for programmer hours... with the possible exception of assembler code (as written by humans), which often performs worse than the equivalent C.</p> <p>The Java JITs are generally considered to provide excellent performance, while providing a high-level language. However, let's face it, outside microbenchmarks, the "Java performs better than C/C++" claims are completely bogus.</p> <p>Now, C++ is a high-performance and high-level language, but you'll find people making reasonable arguments that C performs slightly better, by stripping away the (very thin) abstractions provided by C++.</p> <p>And C is considered the king of performance, except that you'll find people making reasonable arguments that <a href="https://stackoverflow.com/questions/146159/is-fortran-faster-than-c">Fortran performs slightly better</a>, by stripping away the (exceedingly thin) abstractions provided by C.</p> <p>A trivial example of this trade-off is signed integer overflow, which goes from impossible (Python) to possible but well-defined (Java) to undefined behavior (C).</p> <p>Now, if developer performance is the primary concern, I use Python, in which my performance is (rough estimate) 10x better than in Java, or 100x better than in C. 
If CPU performance is the primary concern, the reverse holds.</p> <p>(Except nowadays, I'd look into using Rust instead of C. And if both developer performance and CPU performance were concerns, I'd use Python and curse its performance, because I really don't like the Java language... but that's beside the point.)</p> Mon, 13 Jul 2015 16:56:19 +0000 A public relations problem https://lwn.net/Articles/650891/ https://lwn.net/Articles/650891/ Cyberax <div class="FormattedComment"> C or Cython won't help. If you wish to access Python objects then you still have to lock the interpreter. <br> <p> We've learned that the hard way, while building a messaging system in Python. We had a similar problem - a largish shared object queue on which multiple consumers performed some operations. Python simply crashed and burned.<br> </div> Mon, 13 Jul 2015 16:19:58 +0000 A public relations problem https://lwn.net/Articles/650866/ https://lwn.net/Articles/650866/ Kwi <p><font class="QuotedText">I learnt about the GIL, when I was writing a piece of code that needed (frequently) to traverse</font> [preferably in parallel] <font class="QuotedText">a (big) tree structure and perform some calculations on each node.</font></p> <p>I agree that Python is not the best tool for that job. The answer here would be C or Cython.</p> <p>Even without the GIL, you'd probably have lock contention on the reference counters for any Python function called while processing your tree.</p> <p>All CPython objects, including functions, are reference counted; while executing a function, the reference count is increased.</p>
<pre>
&gt;&gt;&gt; import sys
&gt;&gt;&gt; def foo():
...     print(sys.getrefcount(foo))
...
&gt;&gt;&gt; print(sys.getrefcount(foo))
2
&gt;&gt;&gt; foo()
4
</pre>
<p>Reference counting is another reason why multithreaded CPython is bad for performance critical stuff. Note that, unlike removing the GIL, removing reference counting would change the semantics of the language. That's why e.g. 
PyPy (which uses garbage collection) is not the "standard" interpreter.</p> <p>(Now, <a href="https://pypy.readthedocs.org/en/latest/faq.html?highlight=gil#does-pypy-have-a-gil-why">PyPy still has the GIL</a> &ndash; except in the experimental STM branch &ndash; but my experience indicates that PyPy using a single thread is likely faster than a hypothetical GIL-free CPython using four cores.)</p> <p>Python compromises performance in numerous places, by design, whether it's by allowing crazy monkey patching of modules at runtime or by rejecting <a href="http://neopythonic.blogspot.com/2009/04/final-words-on-tail-calls.html">tail call optimizations</a>.</p> <p>Did you know that <tt>from module import func</tt> and calling <tt>func</tt> gives better performance than <tt>import module</tt> and calling <tt>module.func</tt> (in a tight loop)? It's obvious when you know Python, but it can be surprising to newcomers.</p> <p>In the end, Python values other features higher than performance; and again, that's largely a design decision.</p> Mon, 13 Jul 2015 15:41:19 +0000 A public relations problem https://lwn.net/Articles/650854/ https://lwn.net/Articles/650854/ arvidma <div class="FormattedComment"> I learnt about the GIL, when I was writing a piece of code that needed (frequently) to traverse a (big) tree structure and perform some calculations on each node. <br> <p> The problem itself is super easy to parallelize, the tree is static at the time of traversal, there are no cross-branch references and you only modify branch-local data (in the nodes). You spin-off a thread per first-level branch (or second level branch if you want to use moar cores), wait for the recursion to trickle down and up again and summarize each thread-result on the top node when they are finished.<br> <p> Problem is, only one thread at a time is active, due to the GIL. 
Using the multiprocessing module is of no help, since that means you have to serialize the full tree in one chunk per process, copy each chunk to its new context, deserialize the chunks in every new context, reserialize it when you're finished, copy back to main context and then deserialize again... Unless the computations per node are extremely heavy, you lose way more from the overhead than you gain from the threads.<br> <p> I would very much appreciate if this type of problem could be handled efficiently in Python.<br> </div> Mon, 13 Jul 2015 13:59:45 +0000 A better story for multi-core Python https://lwn.net/Articles/650809/ https://lwn.net/Articles/650809/ wahern <div class="FormattedComment"> This project claims to have a copy-on-write fork implementation for Windows: <a href="http://midipix.org/">http://midipix.org/</a><br> <p> I don't think they've released their implementation, yet, though.<br> <p> </div> Sun, 12 Jul 2015 17:32:41 +0000 A public relations problem https://lwn.net/Articles/650765/ https://lwn.net/Articles/650765/ Kwi <p>As for allowing Python to "remain relevant" and your outsider(?) PoV, that's exactly my point about the GIL being a PR problem.</p> <p>For years, I've used Python for server administration and orchestration, web development, and even network packet processing. All of these things have been highly concurrent, and not once has the GIL been a problem.</p> <p>And where is Python in danger of becoming irrelevant? Python might be facing fierce competition in web development from Rails and Node.js, but it's hardly because of the GIL (or performance at all, when it comes to Ruby). NumPy and SciPy are not going anywhere in the scientific/numeric world, and Python reigns supreme in Linux server administration, having even dethroned Perl.</p> <p>Complaints about the GIL are many, but concrete examples of people having problems with the GIL in actual application code are rare. 
I'd like to hear more of those, because otherwise, it's just a bunch of people saying "I can't write threaded code like I'm used to in Python", and honestly, there are <i>a lot</i> of things you're used to which you can't (or shouldn't) do in Python. That's a feature, not a bug.</p> Sat, 11 Jul 2015 12:30:40 +0000 A public relations problem https://lwn.net/Articles/650759/ https://lwn.net/Articles/650759/ alankila <div class="FormattedComment"> <font class="QuotedText">&gt; Calling the GIL a public relations problem is exactly right. It's more a PR problem than a technical problem, anyway.</font><br> <p> Do not undersell this. This is an actual technical problem, caused by design decisions appropriate for uniprocessor machines, and general neglect of execution performance. That there are solutions, such as using a different language or implementation, just underlines (C)Python's unsuitability for certain tasks where other languages that are still fairly similar to Python are far better. PyPy is perhaps the thing that allows Python to remain relevant -- from my outsider point of view, the whole Python community should simply abandon CPython as soon as possible and make PyPy the official way to run Python.<br> </div> Sat, 11 Jul 2015 05:54:16 +0000 A public relations problem https://lwn.net/Articles/650740/ https://lwn.net/Articles/650740/ Kwi <p>Calling the GIL a public relations problem is exactly right. It's more a PR problem than a technical problem, anyway.</p> <p>If you want concurrency in your Python application, you have the following options:</p> <ul> <li>Just use threads (if you're I/O-bound). The GIL is only a problem if you're CPU-bound.</li> <li>Use message passing instead of shared memory for communication. Besides removing a bunch of race condition worries, it enables you to use Python concurrency solutions like <a href="https://docs.python.org/3/library/multiprocessing.html">multiprocessing</a> that avoid the GIL. 
(The proposed subinterpreter support is just one more of these solutions.)</li> <li>Implement your performance critical code in a C module, allowing you to avoid the GIL - and harness the improved performance of C.</li> <li>Implement your performance critical code in <a href="http://cython.org/">Cython</a>, for the same benefits as C, with a Python-like syntax.</li> <li>Use Jython.</li> <li>Use IronPython.</li> <li>Use <a href="http://pypy.org/">PyPy</a>. In fact, if you're CPU-bound, why are you not using this already?</li> </ul> <p>"If you want your code to run faster, you should probably just use PyPy." — Guido van Rossum</p> <p>Would it be easier for everybody if we could all just use threads and not worry about the GIL? Yes. But performance is rarely easy - not in Python, nor in any language.</p> Fri, 10 Jul 2015 14:49:33 +0000 A better story for multi-core Python https://lwn.net/Articles/650614/ https://lwn.net/Articles/650614/ flussence <div class="FormattedComment"> This sounds like a carbon copy of the threading model introduced in Perl 5.8, design mistakes and all. I suppose going from having nothing whatsoever to that is still an improvement, but it's a seriously weak offering for something released in 2015.<br> </div> Thu, 09 Jul 2015 14:44:47 +0000