> I was a huge python fan until a few years ago when I started making bigger
> and bigger apps in it, and threading them more and more. And I was not
> getting the performance gain from multi-core machines that I would get
> with other languages.
Just out of curiosity, what applications did you feel you needed threads for?
If you're sharing a lot of state between threads, then I have to ask why? It really makes everything so much more difficult. My experience has been that once your program starts using locking, you're not object-oriented any more; you're "mutex-oriented." You can't just freely reuse objects and code because you might violate the constraints that the code was written under.
If you're not sharing a lot of state, then processes are just as good as threads.
Posted Feb 13, 2011 13:39 UTC (Sun) by kleptog (subscriber, #1183)
I'm not a huge fan of threads but where they mostly come in is dealing with I/O. Say you have an app that has to send/receive data over 3 different pipes and in between it also has to do actual work. While you can write a main loop that does a select() over each descriptor and calls the right code when something becomes readable/writable, it's conceptually much clearer having a thread whose job it is to read any data and process it. Especially when you have to deal with issues like write() blocking, etc...
This right away gives you 4 threads. Add a thread to monitor everything (since thread death does not get signalled anywhere) and you're at 5.
There isn't much shared state as such; rather, I/O on any port can execute callbacks which could access anything the initiator of the request wanted (go closures!). There's barely any locking; Python's atomic operations are sufficient (though I imagine Queue does its own locking under the hood).
One effect of I/O falling outside the GIL is that the process running at full speed can take 110% CPU. (There's a lot of I/O.)
Back to the issue at hand: Python 2's unicode handling bites me daily. Whoever decided that calling str() on a unicode string should raise an exception *only* when the string contains a non-ASCII character should be shot. Just error *every* time, then I won't get called at 3 in the morning to fix the bloody thing (usually buried in some library; even some standard Python libs have had bugs in the past).
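A minimal sketch of the thread-per-pipe layout described above, assuming three local pipes stand in for the real connections. The names reader, run and events are hypothetical, and the main thread plays the role of the worker; the monitoring thread is left out. The only shared object is a Queue, which handles its own locking.

```python
import os
import queue
import threading

def reader(fd, name, events):
    """Block on one descriptor and hand everything read to the main thread."""
    while True:
        data = os.read(fd, 4096)
        if not data:                      # EOF: the writer closed its end
            events.put((name, None))
            return
        events.put((name, data))

def run(num_pipes=3):
    events = queue.Queue()                # the only state shared between threads
    pipes = [os.pipe() for _ in range(num_pipes)]
    threads = []
    for i, (r, _w) in enumerate(pipes):
        t = threading.Thread(target=reader, args=(r, "pipe%d" % i, events),
                             daemon=True)
        t.start()
        threads.append(t)

    # Stand-in for the remote peers: one message per pipe, then close.
    for i, (_r, w) in enumerate(pipes):
        os.write(w, b"hello %d" % i)
        os.close(w)

    # The "actual work" loop: consume events until every pipe has hit EOF.
    received = {}
    open_pipes = num_pipes
    while open_pipes:
        name, data = events.get()
        if data is None:
            open_pipes -= 1
        else:
            received[name] = received.get(name, b"") + data

    for t in threads:
        t.join()
    for r, _w in pipes:
        os.close(r)
    return received
```

Each reader can block in os.read() without stalling the others, which is the property the select() loop has to provide by hand.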
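To illustrate the data-dependent failure, here is a rough Python 3 analogue. It assumes the Python 2 default encoding was left as ASCII, in which case str() on a unicode object behaved much like the explicit encode("ascii") below. The helper names to_ascii_bytes and fails_to_encode are hypothetical.

```python
def to_ascii_bytes(text):
    # Roughly what Python 2's str(unicode_obj) did implicitly,
    # assuming the default encoding was ASCII.
    return text.encode("ascii")

def fails_to_encode(text):
    """Return True if the ASCII conversion raises for this particular input."""
    try:
        to_ascii_bytes(text)
    except UnicodeEncodeError:
        return True
    return False
```

The same call succeeds for "cafe" and raises for "caf\u00e9", so the bug only shows up once non-ASCII data reaches that code path.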
Moving to Python 3
Posted Feb 14, 2011 19:26 UTC (Mon) by cmccabe (guest, #60281)
> I'm not a huge fan of threads but where they mostly come in is dealing
> with I/O. Say you have an app that has to send/receive data over 3
> different pipes and in between it also has to do actual work. While you
> can write a main loop that does a select() over each descriptor and calls
> the right code when something becomes readable/writable, it's conceptually
> much clearer having a thread whose job it is to read any data and process
> it. Especially when you have to deal with issues like write() blocking,
> etc...
Twisted is a great way to do multiple I/O operations without using threads. It wraps the ugly select() interface in something much nicer.
Posted Feb 17, 2011 8:26 UTC (Thu) by rqosa (subscriber, #24136)
That way can scale poorly, because there must be at least one thread per FD.
Using an epoll-driven main loop and a pool of worker threads (with one work queue per worker thread) makes the number of threads independent of the number of FDs, so you can adjust the number of threads to whatever gives the best performance. It also has the benefit of avoiding the overhead of thread-start-on-FD-open and thread-quit-on-FD-close, since you can reuse the existing threads. (Make it so that any idle thread will wait on a semaphore until its work queue becomes non-empty. Also, rather than using epoll directly, use libevent, so that it's portable to non-Linux systems.)
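A rough sketch of that design in Python, under some loudly stated substitutions: the standard selectors module stands in for libevent (it picks epoll on Linux and other mechanisms elsewhere), a blocking Queue.get() stands in for the semaphore, and socket pairs stand in for real clients. The names serve, worker and NUM_WORKERS are hypothetical.

```python
import queue
import selectors
import socket
import threading

NUM_WORKERS = 2  # chosen independently of the number of descriptors

def worker(jobs, results):
    while True:
        job = jobs.get()                  # idle workers block here until work arrives
        if job is None:                   # sentinel: shut down
            return
        fd_id, data = job
        results.put((fd_id, data.upper()))  # stand-in for real processing

def serve(num_fds=5):
    sel = selectors.DefaultSelector()
    job_queues = [queue.Queue() for _ in range(NUM_WORKERS)]  # one per worker
    results = queue.Queue()
    workers = [threading.Thread(target=worker, args=(q, results))
               for q in job_queues]
    for t in workers:
        t.start()

    # Stand-in clients: each sends one message and closes its end.
    peers = []
    for i in range(num_fds):
        a, b = socket.socketpair()
        a.setblocking(False)
        sel.register(a, selectors.EVENT_READ, data=i)
        peers.append(b)
    for i, b in enumerate(peers):
        b.sendall(b"msg%d" % i)
        b.close()

    # Main loop: one thread watches every descriptor and dispatches work.
    open_fds = num_fds
    while open_fds:
        for key, _mask in sel.select():
            try:
                data = key.fileobj.recv(4096)
            except BlockingIOError:
                continue
            if data:
                # Same FD always goes to the same worker, keeping its data ordered.
                job_queues[key.data % NUM_WORKERS].put((key.data, data))
            else:                         # EOF
                sel.unregister(key.fileobj)
                key.fileobj.close()
                open_fds -= 1

    for q in job_queues:
        q.put(None)
    for t in workers:
        t.join()
    sel.close()

    out = {}
    while not results.empty():
        fd_id, data = results.get()
        out[fd_id] = out.get(fd_id, b"") + data
    return out
```

Here five descriptors are served by two workers, and raising num_fds does not change the thread count.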
Posted Feb 17, 2011 8:41 UTC (Thu) by rqosa (subscriber, #24136)
> at least one thread per FD
Forgot to mention this in my previous post: the "one thread/process per FD" pattern is the main design issue that made possible the Slowloris DoS attack, which LWN covered 2 years ago.
Posted Feb 17, 2011 20:50 UTC (Thu) by kleptog (subscriber, #1183)
I think you misread my post: I have a fixed number of FDs and a fixed number of threads. Each FD has a completely different purpose and protocol and so an event loop is not really practical. You have to get several different components of the system (which don't know of each other's existence) to work through a single event loop. Sure it's possible, but threads are a nice way to isolate them.
Of course in the general case you are right, a service like a webserver should try to reduce the number of threads. But also in the special case of CPython it's pointless to use more threads, since the GIL prevents more than one thread running at a time anyway.
Posted Feb 14, 2011 4:33 UTC (Mon) by ssmith32 (subscriber, #72404)
As the other comment mentioned - I/O stuff is common.
Where I first started running up against problems was in a GUI that had a central append-only (which eased the locking constraints) data structure, and a corresponding tree of threads. Each thread could generate a new node(s) on the data structure, which might need more analysis by more threads. The operations on the data structure that needed to be thread safe had the thread safety encapsulated in the data structure. It just fell naturally into a threaded design.
As far as being mutex-oriented programming goes: with an appropriately designed shared data structure, I've rarely run into that problem. There's usually a couple of operations that you think hard about, get right, encapsulate, and move on. OOP really complements this well.
As far as the GIL-specific issue I ran into: with the GIL, only one thread in a given interpreter can execute Python bytecode at a time, so all the threads in that interpreter together get about one core's worth of CPU. That makes multi-core kind of useless for heavily threaded, CPU-bound programs. If you go the multi-process route, you get multiple interpreters. But like I said, sometimes threads can be nice :|
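A minimal sketch of that kind of encapsulation, assuming a flat append-only list rather than the tree described above. The class AppendOnlyLog and the helpers fill and demo are hypothetical. The lock lives entirely inside the structure, so callers never see it.

```python
import threading

class AppendOnlyLog:
    """Append-only container; all locking is hidden inside the methods."""

    def __init__(self):
        self._items = []
        self._lock = threading.Lock()

    def append(self, item):
        with self._lock:
            self._items.append(item)
            return len(self._items) - 1   # index of the new entry

    def snapshot(self):
        with self._lock:
            return list(self._items)      # copy, so readers need no lock afterwards

def fill(log, worker_id, count):
    for n in range(count):
        log.append((worker_id, n))

def demo(num_threads=4, per_thread=1000):
    log = AppendOnlyLog()
    threads = [threading.Thread(target=fill, args=(log, i, per_thread))
               for i in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return log.snapshot()
```

Because entries are never modified or removed, the two locked operations are the only places that need careful thought.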
Posted Feb 14, 2011 19:39 UTC (Mon) by cmccabe (guest, #60281)
> Where I first started running up against problems was in a GUI that had a
> central append-only (which eased the locking constraints) data structure,
> and a corresponding tree of threads. Each thread could generate a new
> node(s) on the data structure, which might need more analysis by more
> threads. The operations on the data structure that needed to be thread
> safe had the thread safety encapsulated in the data structure. It just
> fell naturally into a threaded design.
In my experience, when you start sharing a lot of data between threads, you can take one of two approaches. You can have a giant lock that covers all threads. This is sort of like the old BKL or the GIL itself. One giant lock is simple, but it limits scalability a lot. Alternately, you can have many different little locks. This is great for performance and scalability, but hard on the poor programmers. Debugging becomes very difficult because runs are not repeatable. Code reuse is impaired because everything is tangled in this web of locks and you can't easily move code around.
When you have many little locks, the usual approach is to impose absolute ordering, so that if you take lock A before B at any point, you must always take A before B. That's what the kernel does. It seems to be the best strategy, but again, why impose this on yourself when you don't have to? Just don't share state unless you have to.
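The ordering rule can be sketched in Python as follows. This is only an illustration under assumed names (Account, transfer, demo and the rank attribute are hypothetical), not how the kernel implements it: every lock gets a fixed rank, and any code needing two locks takes the lower-ranked one first.

```python
import itertools
import threading

_ranks = itertools.count()                # source of globally unique lock ranks

class Account:
    def __init__(self, balance):
        self.balance = balance
        self.lock = threading.Lock()
        self.rank = next(_ranks)          # fixed position in the global lock order

def transfer(src, dst, amount):
    # Always lock the lower-ranked account first, whichever direction
    # the money moves, so two opposite transfers cannot deadlock.
    first, second = sorted((src, dst), key=lambda acct: acct.rank)
    with first.lock:
        with second.lock:
            src.balance -= amount
            dst.balance += amount

def demo(rounds=2000):
    a, b = Account(1000), Account(1000)

    def a_to_b():
        for _ in range(rounds):
            transfer(a, b, 1)

    def b_to_a():
        for _ in range(rounds):
            transfer(b, a, 1)

    threads = [threading.Thread(target=a_to_b), threading.Thread(target=b_to_a)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return a.balance, b.balance
```

If transfer() instead locked src before dst, the two threads in demo() could each hold one lock while waiting for the other.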
It's sad that programmers have been trained to think that threads are "simple" and "natural" and processes are "hard." That's the exact reverse of reality. I blame Windows and its high per-process overhead.
Posted Feb 15, 2011 11:15 UTC (Tue) by mpr22 (subscriber, #60784)
Data sharing looks easy in threadland, as long as you don't look too closely. Processes make you worry about files and sockets and pipes and god-knows-what, and J. Newbie Programmer thinks that all looks like a lot more work than just having threads that can all see all the data.
(See also: "Why do I have to go through this big scary graphics library with lots of things to set up? Why can't I just stuff pixels into the graphics card?")