Python's implementation is absolutely horrible performance-wise, and Mr. van Rossum ought to be ashamed of it, if anything.
Van Rossum: Python is not too slow (InfoWorld)
Posted Mar 16, 2012 22:01 UTC (Fri) by zooko (subscriber, #2589)
What Guido is saying is exactly correct, in my experience. The Python culture is not like the Java culture of "100% Pure" in which re-using extant libraries written in native (C/C++) code is frowned on. Instead, the Python culture has always been pragmatic and polyglot, and it is common to re-use extant C/C++ libraries when appropriate, or to write a new bit of C/C++ code to compute the inner loop of your Python program. That latter technique is what Guido is talking about.
In practice, "Python" always means "Python plus as much C/C++ as you need".
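A minimal sketch of the pattern zooko describes (not from the thread itself): keep the program in Python, and push the inner loop down into C-coded machinery. Here the "C layer" is just the interpreter's own C-implemented builtins; in a real program it might be a hand-written extension module or an existing C/C++ library.

```python
def dot_pure(xs, ys):
    # Straightforward Python inner loop: every iteration executes bytecode.
    total = 0.0
    for x, y in zip(xs, ys):
        total += x * y
    return total

def dot_builtin(xs, ys):
    # Same computation, but the summation loop runs inside C-implemented
    # sum(); a real extension module or numpy would push the per-element
    # multiply down into C as well.
    return sum(x * y for x, y in zip(xs, ys))

xs = [0.5] * 1000
ys = [2.0] * 1000
assert dot_pure(xs, ys) == dot_builtin(xs, ys) == 1000.0
```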
Posted Mar 16, 2012 23:36 UTC (Fri) by pboddie (subscriber, #50784)
The argument that there are other factors in program performance is a valid one. However, this shouldn't be used to justify closing the door on the fundamental runtime or generated code performance. Fortunately, systems like PyPy and a variety of other tools are keeping that particular door open.
Posted Mar 17, 2012 6:03 UTC (Sat) by cmccabe (guest, #60281)
Posted Mar 18, 2012 15:03 UTC (Sun) by pboddie (subscriber, #50784)
Posted Mar 19, 2012 1:56 UTC (Mon) by agrover (subscriber, #55381)
Posted Mar 19, 2012 7:27 UTC (Mon) by ssmith32 (subscriber, #72404)
Posted Mar 19, 2012 22:11 UTC (Mon) by cmccabe (guest, #60281)
Rust's idea of making everything immutable sounds like a great idea and it appeals to a lot of folks (especially people with a functional programming background) but how practical is it really? The jury is still out, I think. Maybe at some point I'll give it another look, however.
Posted Mar 20, 2012 2:19 UTC (Tue) by Cyberax (✭ supporter ✭, #52523)
Mutability is evil (EVIL, I say!!!), but sometimes it's required. So making it optional is a good design choice.
And there are successful languages without mutable variables (Erlang comes to mind).
Posted Mar 20, 2012 9:06 UTC (Tue) by cmccabe (guest, #60281)
Allocation: Rust has three different types of allocation -- "interior types" which are stored on the stack, "unique types" which are scoped, and "boxed types" which are GC'ed. Go only has one type of allocation and the compiler figures out what to do for you. Go does have the "defer" statement, which you can use to get something very like C++ scoped destructors or Rust "unique types." This somewhat reflects the C++ background of the Rust developers. C++ has always given you lots of different flavors of allocation. Is this a good thing? Um... for a high-level language, probably not.
Rust goes to great lengths to avoid global garbage collection. Go currently has a stop-the-world garbage collector. I think this reflects the fact that Mozilla, with their interest in big, user-interface intensive programs, developed Rust, whereas Google, with their interest in server stuff, is backing Go. I don't think this is a big long-term problem for Go. Google has shown us with Dalvik and V8 that incremental garbage collectors are a practical way to solve this problem.
Both Rust and Go have structural subtyping. Go has a very intuitive system for creating modules and encapsulation (in my opinion.) I'm not as familiar with Rust's systems in these areas, so I can't comment on them. Both Rust and Go provide mechanisms to structure programs in terms of many small tasklets that share state by communicating rather than with mutexes, locks, and shared state.
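The "share state by communicating" style attributed here to both Rust and Go can be sketched in Python as well (a hypothetical illustration, not from the thread): workers exchange messages over a queue instead of guarding shared data with locks.

```python
import queue
import threading

tasks = queue.Queue()
results = queue.Queue()

def worker():
    while True:
        item = tasks.get()
        if item is None:           # sentinel value: shut down
            break
        results.put(item * item)   # no shared mutable state is touched

t = threading.Thread(target=worker)
t.start()
for n in (2, 3, 4):
    tasks.put(n)
tasks.put(None)
t.join()

squares = sorted(results.get() for _ in range(3))
assert squares == [4, 9, 16]
```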
As mentioned above, Rust has this concept of immutable data, which Go lacks. In Rust, mutable data needs to be allocated in a special way, on a special heap (all other data is immutable.) Currently there is a flaw in the Rust type system which allows supposedly immutable data to be modified, but hopefully this will soon be fixed. [ See https://github.com/mozilla/rust/wiki/Proposal-to-address-... ] You can get most of the benefits of immutable data in Go by passing around values rather than pointers. (In Go, channels can take either values or pointers.) It is good to avoid gratuitous mutation, but I find myself wondering whether the Rust approach will be clunky in practice.
I've suffered through a lot of pain being "helped" by the C++ type system, which forbids, for example, passing a vector<char*>* to a function which expects vector<const char*>*. So I am a little bit gun-shy of trying to enforce these types of things through a complex type system. For now, I'll just say this: time will tell if Rust's approach is viable.
Rust also disallows null pointers, which I personally do not agree with. Maybe it's my background as a C programmer, but I like having a special value to represent "none." I always find myself using clunky workarounds to emulate NULL when it is not present in the language-- especially in languages without exceptions (but more about that later.)
Rust has generics, and Go does not. The creators of Go have publicly stated that generics might be added in the future, but nobody has a timeframe. Generics are probably the thing that I most miss in Go.
Both Rust and Go try to encourage you NOT to use exceptions as a control flow mechanism. Rust has "fail," and Go has "panic," but neither of them is a traditional exception mechanism like you might find in Java. They don't allow you to associate much with the exception except a string. This seems like a good choice to me-- flow control via exceptions is icky. It will surprise some people, though.
Rust has this concept of "typestate," which is kind of like the Linux kernel's BUILD_BUG_ON or Boost's static_assert, but built-in. This is something that I really like. Things like this can be retrofitted on to the language after the fact-- like sparse does for C, or annotations do for Java-- but it is better to put it to the compiler itself.
Language change: Rust has changed a lot from 2010 to 2011, and seemingly in backwards-compatibility breaking ways. Even the syntax changed a lot. This reflects a lack of actual users and a more academic attitude. Will Rust finally stabilize in the future, or keep ever-changing like D has? Hopefully the former.
Code maturity: The Rust slide deck from late 2011 says "we're missing garbage collection... and the compiler isn't yet production-quality." (This is a direct quote from the slide deck-- please don't flame me about this.) So it seems irresponsible to point people looking for something to use in production towards Rust, until those points are fixed. On the other hand, Google is using Go in production right now.
I hope I've been fair here!
Posted Mar 20, 2012 10:10 UTC (Tue) by khim (subscriber, #9252)
Google has shown us with Dalvik and V8 that incremental garbage collectors are a practical way to solve this problem.
What Google actually showed is that RFC 1925 is still in effect: GC is generally unsuitable for UI (that's why the company which knows how to create attractive UIs deprecates it), but yes, if you spend enough time with a profiler and your system is powerful enough (say, 10x as powerful as the task at hand really needs), then GC may work for UI. It remains to be seen whether Go will ever actually be usable for UI.
Rust has changed a lot from 2010 to 2011, and seemingly in backwards-compatibility breaking ways.
Go did that too. This is not a crime: C++, for example, introduced a lot of backward-incompatible changes in its early days. Only years after introduction (when there was already a huge amount of code written) did it stop doing this. And C++11 introduces breaking changes again.
Posted Mar 20, 2012 13:02 UTC (Tue) by Cyberax (✭ supporter ✭, #52523)
Unique types allow deterministic destruction of resources - you are guaranteed not to have memory/resource leaks with them. Unfortunately, the next step (refcounted handles) cannot make such guarantees. So they've stopped at unique types.
Dividing the RAM into independent thread-local heaps helps immensely - Erlang does it for exactly the same purpose, for example. Experience has taught us that truly pauseless 'global' garbage collectors are possible, but they either have a big overhead or require hardware assists (there's a company that even creates special hardware for high-speed GCs!).
The split between immutable and mutable types in Rust is interesting. But I don't think it can be simplified much. And no, Go doesn't have anything like it - you can pass mutable data using channels (which is actually an advantage in Go's programming style).
Posted Mar 20, 2012 18:17 UTC (Tue) by njs (guest, #40338)
Personally this difference is enough to pretty much rule out Go as an option so long as Rust ends up being viable at all, no matter what other clever syntax and type systems and stuff they have... but of course YMMV.
> Rust also disallows null pointers, which I personally do not agree with. Maybe it's my background as a C programmer, but I like having a special value to represent "none." I always find myself using clunky workarounds to emulate NULL
I don't remember the details, but I'm pretty sure it's easy and idiomatic in Rust to define a "maybe<X>" type: a value which is either "none" or else has a real pointer-to-X in it. The trick is that unlike C, now you *can't* dereference such a pointer without checking for NULL-ness when "unpacking" it. This might seem like a burden, but of course it only applies to those pointers which *might* be NULL, which are exactly the ones that you have to check. So they're not trying to make this clunky; basically it's just like C, except now the compiler will keep track of when you need to check for NULL and when you don't. Boom, no more segfaults.
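The shape of that idea can be sketched in Python (a hypothetical runtime analogue -- Rust enforces this at compile time, Python can only enforce it when the program runs): a value you cannot dereference without first unpacking it, so the "is it none?" check can't be silently forgotten.

```python
class Maybe:
    """Either holds a value or is empty; the value is only reachable
    through unwrap(), which refuses to hand back 'nothing'."""
    _NONE = object()   # private sentinel, distinct from any real value

    def __init__(self, value=_NONE):
        self._value = value

    def is_some(self):
        return self._value is not Maybe._NONE

    def unwrap(self):
        # Forgetting the check becomes a loud error instead of a
        # silent NULL dereference.
        if not self.is_some():
            raise ValueError("unwrapped an empty Maybe")
        return self._value

found = Maybe("hello")
missing = Maybe()
assert found.unwrap() == "hello"
assert not missing.is_some()
```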
> Rust has this concept of "typestate," which is kind of like the Linux kernel's BUILD_BUG_ON or Boost's static_assert
Typestates are *substantially* more powerful than BUILD_BUG_ON, because typestates make compile-time assertions about program flow. A simple example of a typestate assertion would be "strings which come from the network and then later end up being passed to the filesystem *must* go through the utf8 sanitizer at some point in between". If you accidentally add a code path that violates this invariant, the compiler will tell you.
Generally a lot of things in Rust are designed around the idea a good language should help you write correct code.
(Disclaimer: Graydon's a friend, and I reviewed an early version of the Rust spec, but I haven't been involved or followed the project much since.)
Posted Mar 20, 2012 19:24 UTC (Tue) by Cyberax (✭ supporter ✭, #52523)
So I'd certainly like at least some controllable amount of sharing with explicit locking. But message-passing seems to be much easier to write for.
Posted Mar 20, 2012 20:10 UTC (Tue) by njs (guest, #40338)
But yeah, it's possible that in some situations, code in Rust will be slower than the best possible concurrent implementation that exploits details of the CPU's cache coherency model etc. For me this is a totally acceptable trade-off, but again YMMV.
(Anyway, it doesn't look like Go has any primitive memory barrier operations, so your only safe concurrency options there are mutexes and channel sends. No RCU.)
Posted Mar 21, 2012 16:36 UTC (Wed) by cmccabe (guest, #60281)
As far as I can see, you should be able to use the CompareAndSwapUintptr operation to get the update-side memory barrier you need for RCU. Then you should be able to have reader threads read the pointer value normally, without a memory barrier, and get whatever they get.
In Go, you have garbage collection, so you can forget all about fooling with grace periods and so forth. Just update the pointer.
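That point can be illustrated with a rough Python analogue (hypothetical, not from the thread): with a garbage collector, an RCU-style scheme reduces to "build a new structure, then swap the reference." Readers that grabbed the old reference keep a valid object until the collector reclaims it; there are no grace periods to manage by hand.

```python
import threading

_config = {"timeout": 30}    # shared snapshot, treated as immutable
_lock = threading.Lock()     # serializes writers only

def read_config():
    # Readers take no lock: loading one reference is atomic in CPython,
    # and the snapshot they get stays valid as long as they hold it.
    return _config

def update_config(key, value):
    global _config
    with _lock:
        new = dict(_config)  # copy...
        new[key] = value     # ...modify the copy...
        _config = new        # ...then publish with a single store

snapshot = read_config()
update_config("timeout", 60)
assert snapshot["timeout"] == 30        # old readers still see old data
assert read_config()["timeout"] == 60   # new readers see the update
```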
Posted Mar 21, 2012 16:53 UTC (Wed) by khim (subscriber, #9252)
In Go, you have garbage collection, so you can forget all about fooling with grace periods and so forth. Just update the pointer.
Not exactly. Quite often things like RCU are used for performance-critical tasks. The fact that the language is built around GC means that now you not only need to worry about grace periods but must also convince the GC to [roughly] obey them.
YMMV: in some cases it may all “just work”, in some other cases you'll spend a huge amount of time taming the GC (and then everything will break once the GC is changed in your implementation).
In fact non-optional GC is my biggest gripe with Go.
Posted Mar 21, 2012 17:05 UTC (Wed) by njs (guest, #40338)
Posted Mar 17, 2012 11:04 UTC (Sat) by anselm (subscriber, #2796)
For people who already know C or are thinking of learning it, the response is then, "Why don't I just write it in C to start with?" Yes, I've had this conversation. The response is not quite soundbite-sized and is thus unsatisfactory for some audiences.
What's wrong with »Because those 90% of your program which are not in fact time-critical will be that much easier to write and maintain«?
Also, you never know, the Python code may actually be fast enough in practice. In any case, it is a good idea to prototype the whole thing in Python to begin with to (a) see whether it will work at all, and (b) provide a performance baseline for later optimisation work.
As Donald E. Knuth famously said, »Premature optimisation is the root of all evil.«
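That workflow -- prototype first, then let a profiler rather than intuition say which part deserves optimisation effort -- can be sketched like this (a hypothetical example; the function names are invented):

```python
import cProfile
import io
import pstats

def slow_part(n):
    # A deliberately bytecode-heavy inner loop: the optimisation target.
    return sum(i * i for i in range(n))

def fast_enough_part(items):
    # Already runs in C inside the interpreter; not worth touching.
    return sorted(items)

def main():
    slow_part(200_000)
    fast_enough_part(list(range(1000)))

profiler = cProfile.Profile()
profiler.enable()
main()
profiler.disable()

out = io.StringIO()
pstats.Stats(profiler, stream=out).sort_stats("cumulative").print_stats(5)
report = out.getvalue()
assert "slow_part" in report   # the hotspot shows up by name
```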
Python is too slow, but that's not even the problem
Posted Mar 17, 2012 12:15 UTC (Sat) by tialaramex (subscriber, #21167)
But there is a substantial opportunity for cost inefficiency from requiring extra skillsets in your programming team. The very expensive C++ programmers you hired, who can write templated code that's simultaneously fast, correct and re-usable, are wasted writing noddy beginner-level Python for a week when there is no C++ work on the Gantt chart. A project that takes 2 Python programmers two weeks not only doesn't take one week if you throw 4 Python programmers at it, it can take six months if all you have is a C++ programmer.
In a hobby Free Software project this sort of thing can just strangle you. You do not get to hire the "right" people, you have to take what you get, and if what you've got is six C++ programmers then writing the non-critical paths in Python makes no sense. If the project's progenitor happens to love Python but none of its subsequent maintainers do, either the Python code is slowly removed or the project bitrots.
Now it so happens that the best programmer in my company happens to be skilled in both C++ and Python. But that wasn't a hiring decision, it was a happy accident. He's a perfectly good Java and C programmer too; I hope we're paying him very well indeed. So for us we have this type of flexibility. But despite that we rarely use it within a program. Programs that need raw bit-banging performance are in C and C++, programs that don't are in Java, scripts that bolt stuff together are in Perl or Python. And we still run into manpower problems where we have a developer ready to work on a high priority task, but their skillset doesn't match the task needed.
Posted Mar 17, 2012 18:33 UTC (Sat) by robert_s (subscriber, #42402)
Posted Mar 19, 2012 0:26 UTC (Mon) by jmalcolm (guest, #8876)
OK. So your argument against using Python is that you may need SOME expensive C++ programmers to optimize critical paths? Is using ONLY C++ programmers a better solution?
If Python coders are cheaper, this sounds like an argument FOR using Python.
Posted Mar 19, 2012 10:04 UTC (Mon) by tialaramex (subscriber, #21167)
It's not just Python vs C++, the opportunity to screw this up will arise in other areas. Maybe your program could be faster if it used a custom network protocol instead of the stuff provided. You could hire a specialist to do that work. Maybe your program would benefit from a novel indexing strategy in the database. You could hire a specialist for that too. Does that make good sense? It depends.
One-language programmer == bad idea
Posted Mar 19, 2012 1:46 UTC (Mon) by david.a.wheeler (subscriber, #72896)
No one language can be all things to all people. Most large programs include multiple languages, even if they appear otherwise at first (e.g., they start growing specialized interpreters).
Calling out to other programming languages for specific tasks is often a very *efficient* way to use human time. Use a language that's easy and fast to program in for most stuff, and then dive down when you need to. In many problems there are a few hotspots; spend most of the expensive (human) time on those, and not on stuff that doesn't matter. This is old (and still valid) advice.
Clearly, Python's slow speed and its poor support for multithreading make it inappropriate for some tasks. It'd be sensible to improve that in Python implementations. But that doesn't mean it's inappropriate for all tasks. The biggest problem with Python right now is the really awful way they've handled the 2-to-3 transition. The CPython developers have abandoned Python 2, but almost no one is moving to Python 3, and they're incompatible. Note that PyPy does not even *try* to implement Python 3, and most of the available Python libraries are for Python 2, not Python 3.
The problem here is that the CPython developers have confused implementations of the language with the language itself. They should have had a single implementation of both Python variants, so that people could gradually move up as the language changes (as they'd always done previously). Historically, it was easy to write code that ran on multiple versions of Python; once that failed to be true, python got a lot less useful. I like python, but I think the 2-vs-3 gap is a way more important problem than its (lack of) speed.
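The "easy to write code that ran on multiple versions" style the comment refers to looked roughly like this (a hypothetical sketch of the common straddling idiom, not code from the thread):

```python
# __future__ imports make Python 2 behave like Python 3 for the
# features that matter most in shared code.
from __future__ import print_function, division

import sys

if sys.version_info[0] >= 3:
    text_type = str
else:
    text_type = unicode  # noqa: F821 -- only defined on Python 2

def describe(value):
    # .format() works identically on both lines, unlike f-strings.
    kind = "text" if isinstance(value, text_type) else "other"
    return "{0}: {1!r}".format(kind, value)

print(describe("hello"))
assert describe("hello") == "text: 'hello'"
assert describe(42) == "other: 42"
```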
Posted Mar 19, 2012 2:23 UTC (Mon) by dlang (✭ supporter ✭, #313)
I don't think this is the problem. Python has multiple implementations, so they are pretty careful to keep the language implementation from being the language definition.
The problem here is the Python mantra "there must only be one way to do something"
if you find that the way you implemented something is wrong, you can't add a better way and keep the old one, because there would now be two ways of doing it.
In the interview, he says that a large part of the reason for doing Python3 was that Python2 had grown more than one way to do some things.
you can't both keep backwards compatibility and eliminate the 'extra' ways to do something.
Posted Mar 19, 2012 6:05 UTC (Mon) by ssmith32 (subscriber, #72404)
The main problem I run into with the "use a different interpreter" argument is that CPython is installed everywhere, and other implementations nowhere. Which means you constantly have to install other implementations.. and install them in a side-by-side setup that doesn't affect other programs that need CPython for whatever reason (like needing particular versions, etc). Basically, nice in theory, but, assuming your program isn't just something for one special machine, using other interpreters ends up being a huge PITA.
And, yes, threads are nice to have sometimes (they are not universally "evil" - I've seen some terribly evil, twisted programs from people who just discovered Python & Twisted and got religion but no implementation skills, and quite nice threaded programs from people who know what they're doing). So the GIL does suck, in the end. And the whole "need to support the uniprocessor" case is long gone..
Posted Mar 22, 2012 23:52 UTC (Thu) by samroberts (subscriber, #46749)
xrange and range
ElementTree and cElementTree (one is fast, the other is the default)
string module, string methods
Python has lots of different ways to do things.
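The ElementTree/cElementTree pair is the classic example: the fast C version had to be imported explicitly, guarded for interpreters that lacked it. (Since Python 3.3 ElementTree uses the C accelerator automatically, and the cElementTree alias was removed in 3.9, so the fallback below is what keeps old code running.)

```python
try:
    import xml.etree.cElementTree as ET   # the "fast" spelling, gone in 3.9
except ImportError:
    import xml.etree.ElementTree as ET    # the default, C-accelerated since 3.3

root = ET.fromstring("<config><item name='a'/></config>")
assert root.find("item").get("name") == "a"
```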
Posted Mar 19, 2012 11:08 UTC (Mon) by tialaramex (subscriber, #21167)
Posted Mar 22, 2012 22:24 UTC (Thu) by zuki (subscriber, #41808)
> Note that PyPy does not even *try* to implement python3, and most of the
> available python libraries are for python2, not python3.
PyPy is working on it: http://pypy.org/py3donate.html.
Posted Mar 17, 2012 15:20 UTC (Sat) by iabervon (subscriber, #722)
On the other hand, I think you should keep the real work out of your Python code not just for performance reasons but also because Python doesn't have good encapsulation, which means that the size of the total Python code is a factor in how hard it is to maintain the program as a whole.
But I think that Python is an ideal bridge between UNIX-style C code that does one thing well and monolithic interactive applications that do all of the steps of a complex task.
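That bridging role might look like this in practice (a hypothetical sketch assuming a Unix system with the standard `sort` utility on the PATH): Python orchestrates small tools that each do one thing well, instead of reimplementing them.

```python
import subprocess

# Delegate the actual work to a classic UNIX tool; Python just glues
# the pieces together and interprets the result.
listing = subprocess.run(
    ["sort"],
    input="pear\napple\nbanana\n",
    capture_output=True,
    text=True,
    check=True,
).stdout

assert listing.splitlines() == ["apple", "banana", "pear"]
```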
Posted Mar 16, 2012 22:07 UTC (Fri) by ovitters (subscriber, #27950)
Posted Mar 16, 2012 22:35 UTC (Fri) by slashdot (guest, #22014)
This is not just slow, it's utterly embarrassing performance, and simply unusable for any non-trivial computation.
Posted Mar 16, 2012 23:31 UTC (Fri) by Sho (subscriber, #8956)
Posted Mar 17, 2012 7:29 UTC (Sat) by Cato (subscriber, #7643)
Posted Mar 17, 2012 10:58 UTC (Sat) by anselm (subscriber, #2796)
Posted Mar 17, 2012 18:38 UTC (Sat) by robert_s (subscriber, #42402)
This is not just slow, it's utterly embarrassing performance, and simply unusable for any non-trivial computation.
Yeah that's why nobody uses it for serious computation, I - oh wait.
Posted Mar 19, 2012 6:12 UTC (Mon) by ssmith32 (subscriber, #72404)
Like on an HPC cluster? Doing physics or weather simulations or something? Not a rhetorical question, I actually don't know.
I thought most serious numerical computation was in C and MPI on high-end clusters.. but I really have no idea. Heard of NumPy several times. Never heard of it being used for serious numerical computation, though.
Then again, someone who consulted for JPL once told me they do insane amounts of stuff in Excel spreadsheets.. so... not sure if science geeks using it for serious computation would mean anything anyways :D
Posted Mar 19, 2012 12:29 UTC (Mon) by tnoo (subscriber, #20427)
Python/Numpy/Scipy really has an advantage over the competition (Matlab) in that Python is a universal, fully object-oriented programming language, is extremely flexible, and is very easy to use for evolutionary programming.
So first there is a rough idea; once this works, abstraction and encapsulation happen without any effort (unlike in Java or C++).
Once the code is fully functional, maybe performance bottlenecks play a role, but profiling and refactoring some code to run as a compiled module is usually sufficient.
The last step, which is probably rarely done, is to implement everything from scratch with performance as the main objective, and thus choosing the fastest language/library available.
But doing this last step before the evolutionary exploration of ideas is not possible, and for that purpose Python & Co is ideal.
Posted Mar 21, 2012 13:47 UTC (Wed) by jbh (subscriber, #494)
It's very nice to be able to prototype with help from numpy/matplotlib/etc, and use the same code (without matplotlib :-) on a cluster. And yes, this is real physics, on up to thousands of processors.
(Disclaimer: I am occasionally involved in dolfin/fenics development)
Posted Mar 19, 2012 13:07 UTC (Mon) by magi (subscriber, #4051)
At the moment we are trying to convince people to move from matlab to python.
Posted Mar 21, 2012 23:50 UTC (Wed) by dashesy (subscriber, #74652)
Posted Mar 17, 2012 20:55 UTC (Sat) by Tobu (subscriber, #24111)
More Python fragmentation
Posted Mar 18, 2012 13:00 UTC (Sun) by man_ls (subscriber, #15091)
What van Rossum doesn't seem to understand is that while this fragmentation is not good for Python programs, it is very damaging to the library ecosystem. The "little piece of your system" may not be in your program, but in one of the libraries you are using. At which point you are out of luck, or you have to enter a rabbit hole not apparent from the surface.
Posted Mar 18, 2012 14:54 UTC (Sun) by Tobu (subscriber, #24111)
PyPy isn't a Python subset, it is a compatible alternative implementation of the same language. One PyPy developer did work on the faster but restricted approach (Armin Rigo on Psyco) before getting involved in PyPy. Of the incompatibilities between CPython and PyPy, the one that matters in practice is the absence of refcounting.
Python 3 compatibility is being worked on. So is compatibility with the C api, which has been reworked to be less abstraction-leaky in Python 3. When C is used to accelerate bottlenecks, a pure-python fallback for PyPy can perform faster than the C version. People are also working on implementing enough C-api compatibility that Cython-generated bindings work out of the box; this is competing with another approach that makes Cython generate pure python + ctypes bindings. Some parts of the ecosystem are harder to crack, especially scientific Python, but that is being worked on by rewriting the numpy core.
As far as compatibility being a maintenance burden: I tend to write Python 2 code, which is compatible with CPython2 and PyPy, and let distribute invoke 2to3 to convert my packages when installed in a CPython3 environment. Running a testsuite across multiple implementations and versions of Python is very simple using a tool called tox.
Posted Mar 18, 2012 15:23 UTC (Sun) by pboddie (subscriber, #50784)
Most compatibility issues with alternative Python implementations are purely due to library availability, and if there hadn't been such a focus on CPython and migrating code to C for improved performance, this would be less of an issue. In fact, some code is being migrated back to Python to take advantage of PyPy's just-in-time compiler. The remaining issues are arguably a result of people taking advantage of CPython implementation details, often in an ill-advised manner or "because they're there".
Make no mistake: the principal cause of fragmentation in the Python realm is the introduction of Python 3. Fortunately for the language designers, PyPy will probably come to the rescue of Python 3 because the improved performance will no longer be confined to Python 2 programs. And I can easily see PyPy, not CPython 3, being the preferred runtime for Python in the not too distant future because of this.
Posted Mar 19, 2012 0:41 UTC (Mon) by jmalcolm (guest, #8876)
Does the Alioth test bench resemble your program? Most programs?
Let's say you are writing an app that is memory bound or I/O bound and nowhere near CPU bound. Most web apps fit this description and a big factor in their performance is network latency. For such an app, the choice of language may make only a small difference to performance but a big difference to programmer productivity.
There are many, many examples of systems that win in the market against systems that "perform" better.
Now what if the increased programmer productivity translates into more time for profiling and optimizing the design and infrastructure? These could easily lead to a faster system overall despite working with a base system that benchmarks more slowly.
Saying that something is "unusable" because of performance is silly at the best of times. Saying it about a system that is extensively used is even sillier.
Posted Mar 16, 2012 23:13 UTC (Fri) by hazmat (subscriber, #668)
Copyright © 2013, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds