A public relations problem

Posted Jul 13, 2015 15:41 UTC (Mon) by Kwi (subscriber, #59584)
In reply to: A public relations problem by arvidma
Parent article: A better story for multi-core Python

I learnt about the GIL, when I was writing a piece of code that needed (frequently) to traverse [preferably in parallel] a (big) tree structure and perform some calculations on each node.

I agree that Python is not the best tool for that job. The answer here would be C or Cython.

Even without the GIL, you'd probably have lock contention on the reference counters for any Python function called while processing your tree.

All CPython objects, including functions, are reference counted; while executing a function, the reference count is increased.

>>> import sys
>>> def foo():
...     print(sys.getrefcount(foo))
...
>>> print(sys.getrefcount(foo))
2
>>> foo()
4

Reference counting is another reason why multithreaded CPython is bad for performance-critical stuff. Note that, unlike removing the GIL, removing reference counting would change the semantics of the language. That's why e.g. PyPy (which uses garbage collection) is not the "standard" interpreter.

(Now, PyPy still has the GIL – except in the experimental STM branch – but my experience indicates that PyPy using a single thread is likely faster than a hypothetical GIL-free CPython using four cores.)

Python compromises performance in numerous places, by design, whether it's by allowing crazy monkey patching of modules at runtime or by rejecting tail call optimizations.
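To illustrate the tail call point, here is a minimal sketch (the function names are made up for this example). A function written in tail-recursive form still consumes one interpreter frame per call in CPython, so it runs into the recursion limit where the equivalent loop does not.

```python
import sys


def count_down_recursive(n):
    """Tail-recursive in form, but CPython still pushes a frame per call."""
    if n == 0:
        return "done"
    return count_down_recursive(n - 1)


def count_down_loop(n):
    """The same computation as an explicit loop; no frames pile up."""
    while n > 0:
        n -= 1
    return "done"


def exceeds_recursion_limit(n):
    """Return True if the recursive version fails with RecursionError for n."""
    try:
        count_down_recursive(n)
    except RecursionError:
        return True
    return False


if __name__ == "__main__":
    depth = sys.getrecursionlimit() * 10
    print(count_down_loop(depth))
    print(exceeds_recursion_limit(depth))
```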

Did you know that from module import func and calling func gives better performance than import module and calling module.func (in a tight loop)? It's obvious when you know Python, but it can be surprising to newcomers.
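A rough way to check this yourself, assuming math.sqrt as a stand-in for module.func (the helper names are hypothetical). The size of the difference depends heavily on the interpreter version, so treat the numbers as indicative only.

```python
import math
import timeit
from math import sqrt


def via_module(n=1000):
    """Call math.sqrt: a global lookup of 'math', then an attribute lookup."""
    total = 0.0
    for i in range(n):
        total += math.sqrt(i)
    return total


def via_name(n=1000):
    """Call sqrt directly: a single global lookup per iteration."""
    total = 0.0
    for i in range(n):
        total += sqrt(i)
    return total


def compare(repeats=200):
    """Return (seconds for module.func style, seconds for func style)."""
    return (
        timeit.timeit(via_module, number=repeats),
        timeit.timeit(via_name, number=repeats),
    )


if __name__ == "__main__":
    slow, fast = compare()
    print(f"math.sqrt: {slow:.4f}s  sqrt: {fast:.4f}s")
```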

In the end, Python values other features higher than performance; and again, that's largely a design decision.

A public relations problem

Posted Jul 13, 2015 16:19 UTC (Mon) by Cyberax (✭ supporter ✭, #52523) [Link] (5 responses)

C or Cython won't help. If you wish to access Python objects then you still have to lock the interpreter.

We learned that the hard way while building a messaging system in Python. We had a similar problem: a largish shared object queue on which multiple consumers performed some operations. Python simply crashed and burned.

A public relations problem

Posted Jul 13, 2015 16:56 UTC (Mon) by Kwi (subscriber, #59584) [Link] (4 responses)

Sorry, I should've clarified that the data structure should be in C as well, for the reasons you give.

Anyway, the secondary point (besides "the GIL is just one of many problems for performance") is that all languages to some extent trade CPU hours for programmer hours... with the possible exception of assembler code (as written by humans), which often performs worse than the equivalent C.

The Java JITs are generally considered to provide excellent performance, while providing a high-level language. However, let's face it, outside microbenchmarks, the "Java performs better than C/C++" claims are completely bogus.

Now, C++ is a high-performance and high-level language, but you'll find people making reasonable arguments that C performs slightly better, by stripping away the (very thin) abstractions provided by C++.

And C is considered the king of performance, except that you'll find people making reasonable arguments that Fortran performs slightly better, by stripping away the (exceedingly thin) abstractions provided by C.

A trivial example of this trade-off is signed integer overflow, which goes from impossible (Python) to possible but well-defined (Java) to undefined behavior (C).
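A small sketch of the first two points on that scale: Python integers simply grow, while Java's 32-bit int wraps around in two's complement. The wrap_int32 helper is made up for this example and only emulates the Java behavior; the C case can't be demonstrated meaningfully, since undefined behavior has no single outcome.

```python
INT32_MIN = -2**31
INT32_MAX = 2**31 - 1


def wrap_int32(n):
    """Emulate two's-complement 32-bit wraparound, as Java's int does."""
    return (n - INT32_MIN) % 2**32 + INT32_MIN


if __name__ == "__main__":
    # Python: the result just becomes a bigger integer.
    print(INT32_MAX + 1)
    # Java-style: the same addition wraps to the most negative value.
    print(wrap_int32(INT32_MAX + 1))
```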

Now, if developer performance is the primary concern, I use Python, in which my performance is (rough estimate) 10x better than in Java, or 100x better than in C. If CPU performance is the primary concern, the reverse holds.

(Except nowadays, I'd look into using Rust instead of C. And if both developer performance and CPU performance were concerns, I'd use Python and curse its performance, because I really don't like the Java language... but that's beside the point.)

A public relations problem

Posted Jul 18, 2015 6:04 UTC (Sat) by linuxrocks123 (subscriber, #34648) [Link] (3 responses)

Unless we're talking assembler versus something else, I don't think developer productivity impact should be anywhere near 10x between different high-level languages. Python is supremely convenient, but that convenience is worth no more to me than a 2x productivity increase over C or C++. In fact, for very large projects, Python starts to break down for a variety of reasons, and the speedup could even turn negative.

What type of code are you writing where you think you get a 10x productivity boost by switching languages?

A public relations problem

Posted Jul 19, 2015 17:09 UTC (Sun) by Kwi (subscriber, #59584) [Link] (2 responses)

Maybe I'm just a bad C developer. ;-)

All joking aside: For a project where I can fully harvest the benefits of Python features like tuples, generators, memory safety and the wide selection of readily available libraries, I routinely write in 5 lines what would have taken 50 in a language like C. (I'd put modern C++ – that is C++11 or later – somewhere in the middle, let's say 3x faster than C and 3x slower than Python.)

Not only does that save me the time it takes to type those lines, but several studies suggest that the bug density (bugs per line) is roughly independent of the choice of programming language*, which means I save the time needed to debug those lines.

Coming up with a simple example to demonstrate the benefits of a programming language is always difficult, but I'll try anyway.

Here's a 5-line Python function. The function depends on the standard library re (regular expression) module, and it's used with the built-in sorted function.

import re

def natural(s, _re=re.compile('([0-9]+)')):
    """ Provides a sort key for obtaining a natural collation of strings.

        >>> sorted(['Figure 2', 'Figure 11a', 'Figure 7b'])
        ['Figure 11a', 'Figure 2', 'Figure 7b']
        >>> sorted(['Figure 2', 'Figure 11a', 'Figure 7b'], key=natural)
        ['Figure 2', 'Figure 7b', 'Figure 11a']
    """
    return tuple(
        int(text) if text.isdigit() else text.lower()
        for text in _re.split(s)
    )

If you count the docstring, it's 10 lines, but then you also have unit tests (python -m doctest natural_sort.py).

And yes, I'll go out on a limb and say that the above is representative of maybe 80% of the Python code I write – except for the number of lines, of course. ;-)

If put to the challenge, I'm sure that someone can come up with a more or less equivalent C function in less than 50 lines (or less than 15 lines of C++). But it'll take them significantly longer than the 10 minutes it took to write the above, and it won't be nearly as readable (YMMV).

*) I know, I know, it's nearly impossible to measure with any level of scientific rigor, and the research is highly contested. Still, some references:

Ray et al., 2014. A Large Scale Study of Programming Languages and Code Quality in Github.

While the paper draws no conclusions, its data suggests that Python has roughly twice the bug density (bugs per SLOC) of C, C++ or Java. (Assuming Python has at most half as many SLOC as the equivalent C, that's still a win.)

Phipps, 1999. Comparing observed bug and productivity rates for Java and C++.

Apparently (haven't read the study) suggests that C++ has 15–30% more defects per line than Java.

A public relations problem

Posted Jul 19, 2015 17:41 UTC (Sun) by Kwi (subscriber, #59584) [Link]

Oh, I also wanted to add the following correction to my earlier post, though it evidently got dropped while I was editing my reply:

A "100x" boost between C and Python is overstating it, but I'm confident that 10x is a lower bound.

In the end, it's all fuzzy numbers, obviously. :-)

A public relations problem

Posted Jul 20, 2015 1:00 UTC (Mon) by Kwi (subscriber, #59584) [Link]

A quick Internet survey of implementations in other languages demonstrates my point.

There's a 76 line C implementation and a related 63 line Java implementation. The large number of lines reflects that both C and Java are lacking in their native support for high-level list and string operations.

I struggled to find an idiomatic C++ implementation (found plenty of "C in C++" style implementations), though I did find one using Boost (39 lines).

With C# we're finally getting somewhere; it can be done in 7 lines, plus a 17 line utility class that really ought to be in the standard library (but isn't). (C# in general seems to be a good fit if one wants a statically typed, compiled and managed language with a level of expressiveness that approaches Python.)

Again, I'm sure that an experienced C/C++/Java developer could do it in fewer lines than the above, but according to Google, those examples are the best the Internet has to offer. Google also finds several Python implementations, all of them variations on the same 5 lines as I posted above. (I guess there's only one obvious way to do it.)

A public relations problem

Posted Jul 17, 2015 11:51 UTC (Fri) by arvidma (guest, #6353) [Link] (1 responses)

I had no idea about the difference in cost between x.y() and y(), I assumed that that type of thing would be optimized away by the interpreter.

Thanks for a very informative response!

A public relations problem

Posted Jul 17, 2015 15:51 UTC (Fri) by Kwi (subscriber, #59584) [Link]

I should clarify that 99% of the time, the performance hit is insignificant, but in a tight spot, one may want to replace while ...: x.y() with:

z = x.y
while ...: z()

This saves a lookup of the y attribute on every iteration, in favor of a much faster local variable access (assuming this is in a function body).
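Spelled out as a runnable sketch, using list.append as the x.y in question (the function names are made up for this example):

```python
def squares_attr(n):
    """Look up 'append' on the list object on every iteration."""
    out = []
    for i in range(n):
        out.append(i * i)
    return out


def squares_hoisted(n):
    """Look up the bound method once, then call it via a local variable."""
    out = []
    append = out.append
    for i in range(n):
        append(i * i)
    return out


if __name__ == "__main__":
    print(squares_attr(5) == squares_hoisted(5))
```

Both versions build the same list; only the per-iteration lookup differs.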

The Python interpreter can't do this optimization automatically, because it'd change the semantics if one thread assigned to x.y while another was in the loop. It's just one example of the performance difficulties imposed by a highly dynamic language like Python (which doesn't have C's volatile keyword).
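A single-threaded sketch of why the two forms are not interchangeable. The class and function names are hypothetical, and the assignment inside the loop merely stands in for what another thread could do at that moment.

```python
class Greeter:
    def y(self):
        return "original"


def call_via_attribute(x, swap_at, n):
    """Look up x.y on every iteration, so a reassignment is noticed."""
    results = []
    for i in range(n):
        if i == swap_at:
            x.y = lambda: "patched"  # stands in for another thread's assignment
        results.append(x.y())
    return results


def call_via_alias(x, swap_at, n):
    """Hoist the lookup out of the loop, so a reassignment goes unnoticed."""
    results = []
    z = x.y
    for i in range(n):
        if i == swap_at:
            x.y = lambda: "patched"
        results.append(z())
    return results


if __name__ == "__main__":
    print(call_via_attribute(Greeter(), 1, 3))
    print(call_via_alias(Greeter(), 1, 3))
```

The first version sees the new x.y from the swap onward; the second keeps calling the method it captured before the loop.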

But again, 99% of the time, you care more about the language than the performance. So don't go "optimize" every bit of Python code like this. :-)

A public relations problem

Posted Jul 28, 2015 23:19 UTC (Tue) by pboddie (guest, #50784) [Link]

I agree that Python is not the best tool for that job. The answer here would be C or Cython.

I heard this myself from a bunch of "scientific Python" people a few years ago. The response from a colleague of mine who isn't (or wasn't) really a Python user was, "Why not just write the code in C in the first place and just ignore Python?" That's a pretty hard question to answer even for those of us who feel moderately productive in Python.

The big problems with Python's evolution have been the denial that various disadvantages are "real enough" and that everything has to be tied somehow to CPython or not be completely legitimate (although some in the "scientific" community are slowly accepting things like PyPy after choosing to ignore it for years). Need to cross-compile Python in a sane way or target embedded systems or mobile devices? No-one needs to do that! Wasn't Python for Symbian Series 60 enough?! Thankfully, stuff like Micro Python has been developed and has presumably thrived by filling an otherwise neglected niche. Meanwhile, attempts to deliver CPython as a mobile platform seem to be stuck on repeat at the earliest stage. Plenty of examples of other domains exist if you care to look.

In the end, Python values other features higher than performance; and again, that's largely a design decision.

People have been saying this for twenty years. Making a virtue of such things - that performance is a lost cause and that everyone should instead celebrate other things including the tendency to make the language even more baroque - is precisely what has held the language back for a good long time, too. Such attitudes probably put Perl where it is today, in case any lessons from history were needed.