A public relations problem
Posted Jul 13, 2015 16:19 UTC (Mon) by Cyberax (✭ supporter ✭, #52523) In reply to: A public relations problem by Kwi
Parent article: A better story for multi-core Python
We've learned that the hard way, while building a messaging system in Python. We had a similar problem: a largish shared object queue on which multiple consumers performed some operations. Python simply crashed and burned.
Posted Jul 13, 2015 16:56 UTC (Mon)
by Kwi (subscriber, #59584)
Sorry, I should've clarified that the data structure should be in C as well, for the reasons you give.
Anyway, the secondary point (besides "the GIL is just one of many problems for performance") is that all languages to some extent trade CPU hours for programmer hours... with the possible exception of assembler code (as written by humans), which often performs worse than the equivalent C.
The Java JITs are generally considered to provide excellent performance, while providing a high-level language. However, let's face it, outside microbenchmarks, the "Java performs better than C/C++" claims are completely bogus.
Now, C++ is a high-performance and high-level language, but you'll find people making reasonable arguments that C performs slightly better, by stripping away the (very thin) abstractions provided by C++. And C is considered the king of performance, except that you'll find people making reasonable arguments that Fortran performs slightly better, by stripping away the (exceedingly thin) abstractions provided by C.
A trivial example of this trade-off is signed integer overflow, which goes from impossible (Python) to possible but well-defined (Java) to undefined behavior (C).
Now, if developer performance is the primary concern, I use Python, in which my performance is (rough estimate) 10x better than in Java, or 100x better than in C. If CPU performance is the primary concern, the reverse holds. (Except nowadays, I'd look into using Rust instead of C. And if both developer performance and CPU performance were concerns, I'd use Python and curse its performance, because I really don't like the Java language... but that's beside the point.)
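The signed integer overflow example can be made concrete in Python. This is a minimal sketch, assuming 32-bit two's-complement wrapping as Java specifies for its int type; wrap_int32 and INT32_MAX are hypothetical names introduced here, not anything from the thread. C has no counterpart in the sketch, since signed overflow there is undefined behavior and cannot be modelled by a single result.

```python
INT32_MAX = 2**31 - 1  # largest value of a signed 32-bit integer


def wrap_int32(value: int) -> int:
    """Reduce value to a signed 32-bit two's-complement integer.

    This imitates the wrap-around that Java defines for int arithmetic.
    """
    value &= 0xFFFFFFFF  # keep only the low 32 bits
    # Values with the top bit set represent negative numbers.
    return value - 0x100000000 if value >= 0x80000000 else value


# Python integers have arbitrary precision, so the sum simply grows.
print(INT32_MAX + 1)              # 2147483648

# With Java-style wrapping, the same sum becomes the most negative int.
print(wrap_int32(INT32_MAX + 1))  # -2147483648
```

The Python result costs extra work per operation (every integer is a heap object of variable size), which is the CPU-hours side of the trade-off described above.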
Posted Jul 18, 2015 6:04 UTC (Sat)
by linuxrocks123 (subscriber, #34648)
What type of code are you writing where you think you get a 10x productivity boost by switching languages?
Posted Jul 19, 2015 17:09 UTC (Sun)
by Kwi (subscriber, #59584)
Maybe I'm just a bad C developer. ;-)
All joking aside: For a project where I can fully harvest the benefits of Python features like tuples, generators, memory safety and the wide selection of readily available libraries, I routinely write in 5 lines what would have taken 50 in a language like C. (I'd put modern C++ – that is C++11 or later – somewhere in the middle, let's say 3x faster than C and 3x slower than Python.)
Not only does that save me the time it takes to type those lines, but several studies suggest that the bug density (bugs per line) is roughly independent of the choice of programming language*, which means I save the time needed to debug those lines.
Coming up with a simple example to demonstrate the benefits of a programming language is always difficult, but I'll try anyway. Here's a 5-line Python function (the listing appears at the end of this thread).
The function depends on the standard library re (regular expression) module, and it's used with the built-in sorted function. If you count the docstring, it's 10 lines, but then you also have unit tests (python -m doctest natural_sort.py).
And yes, I'll go out on a limb and say that this function is representative of maybe 80% of the Python code I write – except for the number of lines, of course. ;-)
If put to the challenge, I'm sure that someone can come up with a more or less equivalent C function in less than 50 lines (or less than 15 lines of C++). But it'll take them significantly longer than the 10 minutes it took to write the Python version, and it won't be nearly as readable (YMMV).
*) I know, I know, it's nearly impossible to measure with any level of scientific rigor, and the research is highly contested. Still, some references:
Ray et al., 2014. A Large Scale Study of Programming Languages and Code Quality in Github. While the paper draws no conclusions, its data suggests that Python has roughly twice the bug density (bugs per SLOC) of C, C++ or Java. (Assuming Python has at most half as many SLOC as the equivalent C, that's still a win.)
Phipps, 1999. Comparing observed bug and productivity rates for Java and C++. Apparently (haven't read the study) suggests that C++ has 15–30% more defects per line than Java.
Posted Jul 19, 2015 17:41 UTC (Sun)
by Kwi (subscriber, #59584)
A "100x" boost between C and Python is overstating it, but I'm confident that 10x is a lower bound.
In the end, it's all fuzzy numbers, obviously. :-)
Posted Jul 20, 2015 1:00 UTC (Mon)
by Kwi (subscriber, #59584)
A quick Internet survey of implementations in other languages demonstrates my point.
There's a 76-line C implementation and a related 63-line Java implementation. The large number of lines reflects that both C and Java are lacking in their native support for high-level list and string operations.
I struggled to find an idiomatic C++ implementation (found plenty of "C in C++" style implementations), though I did find one using Boost (39 lines).
With C# we're finally getting somewhere; it can be done in 7 lines, plus a 17-line utility class that really ought to be in the standard library (but isn't). (C# in general seems to be a good fit if one wants a statically typed, compiled and managed language with a level of expressiveness that approaches Python.)
Again, I'm sure that an experienced C/C++/Java developer could do it in fewer lines than these examples, but according to Google, they are the best the Internet has to offer.
Google also finds several Python implementations, all of them variations on the same 5 lines that I posted. (I guess there's only one obvious way to do it.)
import re

def natural(s, _re=re.compile('([0-9]+)')):
    """ Provides a sort key for obtaining a natural collation of strings.
    >>> sorted(['Figure 2', 'Figure 11a', 'Figure 7b'])
    ['Figure 11a', 'Figure 2', 'Figure 7b']
    >>> sorted(['Figure 2', 'Figure 11a', 'Figure 7b'], key=natural)
    ['Figure 2', 'Figure 7b', 'Figure 11a']
    """
    return tuple(
        int(text) if text.isdigit() else text.lower()
        for text in _re.split(s)
    )
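To see why this works as a sort key, it can help to look at the tuples the function produces. The sketch below restates natural from the post so that it runs on its own; the sample strings come from the docstring, and the printed comparisons are only illustrative.

```python
import re


def natural(s, _re=re.compile('([0-9]+)')):
    # Restated from the post so that this snippet is self-contained.
    return tuple(
        int(text) if text.isdigit() else text.lower()
        for text in _re.split(s)
    )


# Splitting on a capturing group keeps the digit runs in the result,
# so the pieces alternate: text, number, text, ...
print(natural('Figure 2'))    # ('figure ', 2, '')
print(natural('Figure 11a'))  # ('figure ', 11, 'a')

# Tuples compare element by element, so once the text prefixes match,
# 2 < 11 is decided numerically rather than character by character.
print(natural('Figure 2') < natural('Figure 11a'))  # True
```

Because the pieces always alternate starting with text (possibly an empty string), integers are only ever compared with integers and strings with strings, which matters on Python 3 where mixed comparisons raise TypeError.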