
Unladen Swallow 2009Q2 released


Posted Jul 15, 2009 16:41 UTC (Wed) by rriggs (guest, #11598)
Parent article: Unladen Swallow 2009Q2 released

I'm really looking forward to the Q3 release, where the intention is to remove the GIL. Many times over the past decade I have needed speed, had many CPUs available, and had a problem that lent itself to Python and threading. But the GIL always got in the way in these situations: I could only choose Python OR threading.

When I (invariably) did choose Python, I usually ended up spending most of my time on multi-processing and IPC issues rather than on the main problem domain.

Some of the newer modules (e.g. multiprocessing) do help, but they are not nearly as helpful as removing the GIL will be.


Unladen Swallow 2009Q2 released

Posted Jul 15, 2009 20:54 UTC (Wed) by drag (guest, #31333) [Link] (2 responses)

I like using forks and sockets for IPC.

Linux does copy-on-write ('COW') with memory for forks, so as the number of processing 'threads' goes up, the memory overhead does not grow to match. The only unique memory is whatever gets created after the fork, which is probably going to be unique to each process/thread anyway if you do your job right and load up the modules you need prior to the fork().

I don't know how helpful that approach would be. Probably not much, but it's what I like doing.
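As a rough sketch of that pattern (names here are illustrative, not from any particular project): do all imports and build any large read-only data before fork(), so every child shares those pages copy-on-write, and only post-fork writes cost unique memory.

```python
import os

# Load modules and build large read-only structures BEFORE forking,
# so every child shares these pages copy-on-write.
lookup_table = {i: i * i for i in range(100000)}

children = []
for worker_id in range(4):
    pid = os.fork()
    if pid == 0:
        # Child: reading lookup_table touches the parent's shared pages;
        # only memory written after the fork is unique to this child.
        _ = lookup_table[worker_id]
        os._exit(0)
    children.append(pid)

# Parent reaps the workers.
for pid in children:
    os.waitpid(pid, 0)
```

The IPC (sockets, pipes) would be set up before the fork as well, e.g. with socketpair(), so each child inherits its end of the connection.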

Unladen Swallow 2009Q2 released

Posted Jul 15, 2009 23:33 UTC (Wed) by rriggs (guest, #11598) [Link] (1 responses)

When you have 16 worker threads all consuming work from a producer thread, the amount of coding you have to do for sockets (which I use as well) is a bit cumbersome (serializing the data, select() or poll() loops, waitpid(), etc.) given the clean interface one has with Queue and Thread.

Sometimes serializing the data (pickle/unpickle or whatever makes sense) takes up too much overhead.

Memory usage is rarely an issue. I've got enough RAM, enough CPU; I just need my favorite language to make using the resources I already have a bit simpler.
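For comparison, the in-process version with Queue and Thread really is compact. A minimal sketch of the 16-worker producer/consumer shape (the squaring is a stand-in for real work; because of the GIL this only shows the interface, not a parallel speedup for CPU-bound work):

```python
import queue
import threading

def worker(tasks, results):
    # Each worker pulls items until it sees the None sentinel.
    while True:
        item = tasks.get()
        if item is None:
            break
        results.put(item * item)  # stand-in for real work
        tasks.task_done()

tasks = queue.Queue()
results = queue.Queue()

threads = [threading.Thread(target=worker, args=(tasks, results))
           for _ in range(16)]
for t in threads:
    t.start()

# Producer side: hand out work, then one sentinel per worker.
for item in range(100):
    tasks.put(item)
for _ in threads:
    tasks.put(None)
for t in threads:
    t.join()

total = sum(results.get() for _ in range(100))
```

No serialization, no select()/poll() loop, no waitpid() — which is exactly the convenience being weighed against the socket approach above.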

Unladen Swallow 2009Q2 released

Posted Jul 16, 2009 11:28 UTC (Thu) by pboddie (guest, #50784) [Link]

When you have 16 worker threads all consuming work from a producer thread, the amount of coding you have to do for sockets (which I use as well) is a bit cumbersome (serializing the data, select() or poll() loops, waitpid(), etc.) given the clean interface one has with Queue and Thread.

There's a list of probably appropriate solutions for you on the Python Wiki; my humble offering, pprocess, tries to hide the awkward socket management behind fairly straightforward abstractions, as (I believe) do other solutions including the 2.6/3.x standard library offering, multiprocessing.

There are also libraries like Kamaelia which take a more radical approach to doing concurrent programming, and these certainly shouldn't require you to manage the tedious details.

Unladen Swallow 2009Q2 released

Posted Jul 16, 2009 0:31 UTC (Thu) by sergey (guest, #31763) [Link]

I've just tried the multiprocessing module in Python 2.6 for this and was able to make a single-threaded program scale on multiple CPUs quite nicely. It was custom database dump code, and it took less than an hour to make the necessary changes and debug multiprocessing-specific quirks. Even better, it turned out to work well with py2exe, so I was able to package the whole thing into a single executable file and give it to my teammates who don't have Python installed. I'm sure multiprocessing won't work for some "interesting" requirements, but it seems to be a very efficient and well-designed general-purpose solution.


Copyright © 2025, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds