
Moving to Python 3


Posted Feb 13, 2011 13:39 UTC (Sun) by kleptog (subscriber, #1183)
In reply to: Moving to Python 3 by cmccabe
Parent article: Moving to Python 3

I'm not a huge fan of threads but where they mostly come in is dealing with I/O. Say you have an app that has to send/receive data over 3 different pipes and in between it also has to do actual work. While you can write a main loop that does a select() over each descriptor and calls the right code when something becomes readable/writable, it's conceptually much clearer having a thread whose job it is to read any data and process it. Especially when you have to deal with issues like write() blocking, etc...

This right away gives you 4 threads. Add a thread to monitor everything (since thread death does not get signalled anywhere) and you're at 5.
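
A minimal sketch of the layout described above, not the poster's actual code: one blocking reader thread per pipe, plus a helper that a monitor thread could poll to notice dead threads. The names `reader`, `start_readers`, `dead_threads` and `demo` are made up for illustration, and `os.pipe()` stands in for whatever the real descriptors are.

```python
import os
import threading


def reader(fd, handle):
    """Block on one descriptor and pass each chunk to its handler."""
    while True:
        data = os.read(fd, 4096)
        if not data:  # empty read means EOF: the write end was closed
            break
        handle(data)


def start_readers(fds_and_handlers):
    """Start one daemon thread per (fd, handler) pair."""
    threads = []
    for fd, handle in fds_and_handlers:
        t = threading.Thread(target=reader, args=(fd, handle), daemon=True)
        t.start()
        threads.append(t)
    return threads


def dead_threads(threads):
    """What a monitor thread would poll: the threads that have exited."""
    return [t for t in threads if not t.is_alive()]


def demo():
    received = []
    pipes = [os.pipe() for _ in range(3)]  # three (read_fd, write_fd) pairs
    threads = start_readers([(r, received.append) for r, _ in pipes])
    for i, (_, w) in enumerate(pipes):
        os.write(w, b"msg%d" % i)
        os.close(w)  # closing the write end makes the reader see EOF
    for t in threads:
        t.join(timeout=5)
    for r, _ in pipes:
        os.close(r)
    return sorted(received), dead_threads(threads)
```

Nothing notifies the rest of the program when a reader exits, which is why something has to call `is_alive()` periodically.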

It's not so much that there's shared state; rather, I/O on any port can execute callbacks which could access anything the initiator of the request wanted (go closures!). There's barely any locking; Python's atomic operations are sufficient (though I imagine Queue does its locking under the hood).
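
As a rough illustration of that callback style (a sketch under assumptions, not the actual application): the initiator hands the I/O thread a closure, and the closure pushes the reply onto a `queue.Queue`, which does take a lock internally. The names `io_thread`, `make_callback` and `demo` are hypothetical, and `payload.upper()` is a stand-in for real I/O.

```python
import queue
import threading


def io_thread(requests):
    """Hypothetical I/O thread: handle each request, then run its callback."""
    while True:
        item = requests.get()
        if item is None:  # sentinel: shut down
            break
        payload, callback = item
        callback(payload.upper())  # the callback runs in the I/O thread


def demo():
    requests = queue.Queue()
    replies = queue.Queue()  # thread-safe hand-off back to the initiator
    t = threading.Thread(target=io_thread, args=(requests,))
    t.start()

    def make_callback(tag):
        # The closure captures `tag` and `replies` from the initiator's scope.
        def on_reply(reply):
            replies.put((tag, reply))
        return on_reply

    requests.put(("ping", make_callback("a")))
    requests.put(("pong", make_callback("b")))
    requests.put(None)
    t.join()
    return [replies.get_nowait() for _ in range(replies.qsize())]
```

The initiator never touches a lock explicitly; the only synchronization is inside the two queues.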

One effect of I/O falling outside the GIL is that the process running at full speed can take 110% CPU. (There's a lot of I/O.)

Back to the issue at hand: Python 2's unicode handling bites me daily. Whoever decided that using str() on a unicode string should raise an exception *only* when the string contains a non-ASCII character should be shot. Just error *every* time, then I won't get called at 3 in the morning to fix the bloody thing (usually buried in some library; even some standard Python libs have had bugs in the past).
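
For readers who haven't hit this: in Python 2, str() on a unicode object does an implicit encode with the default codec, which is ASCII unless changed. The sketch below is written in Python 3 and only mimics that behaviour with an explicit `.encode("ascii")`; `py2_style_str` and `demo` are made-up names. It also shows the "error every time" behaviour that Python 3 adopted for mixing bytes and text.

```python
def py2_style_str(text):
    """Mimic Python 2's str(unicode_obj): an implicit ASCII encode."""
    return text.encode("ascii")


def demo():
    # Pure-ASCII text converts fine, so the bug stays hidden in testing.
    ok = py2_style_str("report")

    # The same call fails as soon as a non-ASCII character shows up.
    try:
        py2_style_str("r\u00e9sum\u00e9")
        non_ascii_failed = False
    except UnicodeEncodeError:
        non_ascii_failed = True

    # Python 3 refuses to mix bytes and text at all, whatever the content.
    try:
        b"bytes" + "text"
        mixing_failed = False
    except TypeError:
        mixing_failed = True

    return ok, non_ascii_failed, mixing_failed
```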



Moving to Python 3

Posted Feb 14, 2011 19:26 UTC (Mon) by cmccabe (guest, #60281)

> I'm not a huge fan of threads but where they mostly come in is dealing
> with I/O. Say you have an app that has to send/receive data over 3
> different pipes and in between it also has to do actual work. While you
> can write a main loop that does a select() over each descriptor and calls
> the right code when something becomes readable/writable, it's conceptually
> much clearer having a thread whose job it is to read any data and process
> it. Especially when you have to deal with issues like write() blocking,
> etc...

Twisted is a great way to do multiple I/O operations without using threads. It wraps the ugly select() interface in something much nicer.

http://en.wikipedia.org/wiki/Twisted_(software)

Moving to Python 3

Posted Feb 17, 2011 8:10 UTC (Thu) by rqosa (subscriber, #24136)

Or libevent with pyevent (or with gevent).

Moving to Python 3

Posted Feb 17, 2011 8:26 UTC (Thu) by rqosa (subscriber, #24136)

That approach can scale poorly, because there must be at least one thread per FD.

Using an epoll-driven main loop and a pool of worker threads (with one work queue per worker thread) makes the number of threads independent of the number of FDs, so you can adjust the number of threads to whatever gives the best performance. It also has the benefit of avoiding the overhead of thread-start-on-FD-open and thread-quit-on-FD-close, since you can reuse the existing threads. (Make it so that any idle thread will wait on a semaphore until its work queue becomes non-empty. Also, rather than using epoll directly, use libevent, so that it's portable to non-Linux systems.)
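
A small sketch of that design using only the Python standard library, with two substitutions that should be read as assumptions: `selectors.DefaultSelector` (which picks epoll on Linux and another mechanism elsewhere) stands in for libevent, and a blocking `queue.Queue.get()` stands in for the explicit semaphore. The names `worker` and `demo` are made up, and `job.upper()` is a placeholder for real work.

```python
import queue
import selectors
import socket
import threading


def worker(jobs, results):
    """Sleep until this worker's own queue has a job; None means stop."""
    while True:
        job = jobs.get()  # blocks while the queue is empty
        if job is None:
            break
        results.put(job.upper())  # stand-in for real processing


def demo(n_workers=2, n_fds=4):
    results = queue.Queue()
    job_queues = [queue.Queue() for _ in range(n_workers)]
    workers = [threading.Thread(target=worker, args=(q, results))
               for q in job_queues]
    for w in workers:
        w.start()

    sel = selectors.DefaultSelector()
    pairs = [socket.socketpair() for _ in range(n_fds)]
    for i, (rx, tx) in enumerate(pairs):
        rx.setblocking(False)
        sel.register(rx, selectors.EVENT_READ, data=i)
        tx.sendall(b"fd%d" % i)

    # Main loop: wait for readable FDs and hand the data to a worker.
    pending = n_fds
    while pending:
        for key, _events in sel.select(timeout=5):
            data = key.fileobj.recv(4096)
            job_queues[key.data % n_workers].put(data)
            sel.unregister(key.fileobj)
            pending -= 1

    for q in job_queues:
        q.put(None)  # tell each worker to exit
    for w in workers:
        w.join()
    sel.close()
    for rx, tx in pairs:
        rx.close()
        tx.close()
    return sorted(results.get_nowait() for _ in range(n_fds))
```

Here four descriptors are served by two threads; changing `n_workers` does not require touching the descriptor handling, which is the point of decoupling the two.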

Moving to Python 3

Posted Feb 17, 2011 8:41 UTC (Thu) by rqosa (subscriber, #24136)

> at least one thread per FD

Forgot to mention this in my previous post: the "one thread/process per FD" pattern is the main design issue that made possible the Slowloris DoS attack, which LWN covered 2 years ago.

Moving to Python 3

Posted Feb 17, 2011 20:50 UTC (Thu) by kleptog (subscriber, #1183)

I think you misread my post: I have a fixed number of FDs and a fixed number of threads. Each FD has a completely different purpose and protocol and so an event loop is not really practical. You have to get several different components of the system (which don't know of each other's existence) to work through a single event loop. Sure it's possible, but threads are a nice way to isolate them.

Of course in the general case you are right, a service like a webserver should try to reduce the number of threads. But also in the special case of CPython it's pointless to use more threads, since the GIL prevents more than one thread running at a time anyway.


Copyright © 2017, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds