Moving to Python 3

Posted Feb 10, 2011 7:10 UTC (Thu) by ssmith32 (subscriber, #72404)
Parent article: Moving to Python 3

Hmm.. what I wish they'd done in py3 was fix the GIL in CPython. I'd switch for that.

Yes, I could be misinformed - I haven't kept up because:

I was a huge Python fan until a few years ago, when I started making bigger and bigger apps in it and threading them more and more. I was not getting the performance gain from multi-core machines that I would get with other languages.

Quick googling seems to indicate it's still a known issue with the fix being "use processes". Perhaps viable, but not for me. Threads can be a mess, but there are some apps that fit the thread model pretty well.
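For CPU-bound work, the "use processes" advice can often be close to a one-line change in Python 3's concurrent.futures: swap ThreadPoolExecutor for ProcessPoolExecutor. This is an illustrative sketch only. count_primes and run are hypothetical names, and the explicit "fork" start method assumes a POSIX system.

```python
import multiprocessing
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor


def count_primes(limit: int) -> int:
    """Hypothetical CPU-bound task: count primes below `limit` by trial division."""
    return sum(
        1
        for n in range(2, limit)
        if all(n % d for d in range(2, int(n ** 0.5) + 1))
    )


def run(executor_cls, limits):
    """Map count_primes over `limits` using either threads or processes."""
    kwargs = {}
    if executor_cls is ProcessPoolExecutor:
        # Assumption: a POSIX system. With "fork", the worker processes do
        # not re-import this module.
        kwargs["mp_context"] = multiprocessing.get_context("fork")
    with executor_cls(max_workers=4, **kwargs) as pool:
        return list(pool.map(count_primes, limits))


if __name__ == "__main__":
    limits = [20_000] * 4
    # Under CPython's GIL the thread pool runs these pure-Python tasks
    # essentially one at a time; the process pool can use four cores.
    print(run(ThreadPoolExecutor, limits))
    print(run(ProcessPoolExecutor, limits))
```

The catch, which comes up later in this thread, is that arguments and results cross the process boundary by pickling. That is only cheap when they are small.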

I still use Python for little scripts, but at that point the version rarely matters.



Moving to Python 3

Posted Feb 10, 2011 15:48 UTC (Thu) by fandingo (guest, #67019)

What's wrong with the multiprocessing library? It provides things like mutex locks and process-safe data structures (like Queue) that work across processes.
Using multiprocessing, a Python programmer can use "multi-threaded" techniques while the underlying implementation uses processes instead of threads.
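A minimal sketch of that style, assuming a POSIX system (the "fork" start method is chosen explicitly). Worker processes pull items from a multiprocessing.Queue until they see a sentinel, and a multiprocessing.Lock guards a counter in shared memory. worker and square_all are hypothetical names for illustration.

```python
import multiprocessing as mp


def worker(tasks, results, lock, counter):
    """Pull items until a None sentinel arrives; Queue and Lock work across processes."""
    for item in iter(tasks.get, None):
        with lock:                      # a cross-process mutex
            counter.value += 1          # shared integer living in shared memory
        results.put((item, item * item))


def square_all(items, nworkers=3):
    ctx = mp.get_context("fork")        # assumption: a POSIX system
    tasks, results = ctx.Queue(), ctx.Queue()
    lock = ctx.Lock()
    counter = ctx.Value("i", 0, lock=False)  # guarded by `lock` instead
    procs = [ctx.Process(target=worker, args=(tasks, results, lock, counter))
             for _ in range(nworkers)]
    for p in procs:
        p.start()
    for item in items:
        tasks.put(item)
    for _ in procs:
        tasks.put(None)                 # one sentinel per worker
    # Drain the results before join(), so no child blocks on a full pipe.
    out = dict(results.get() for _ in items)
    for p in procs:
        p.join()
    return out, counter.value
```

The code reads like a thread-pool program; only the objects come from the multiprocessing context instead of threading and queue.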

I've been using multiprocessing with Python 3 for a while, and I haven't found any limitations. What specific problems keep you from using it?

The GIL needs to go, but there are a lot of ways to work around it.

Moving to Python 3

Posted Feb 10, 2011 23:58 UTC (Thu) by Cyberax (✭ supporter ✭, #52523)

>What's wrong with the Multiprocess library.

It's not threading. My friend wrote a SIP switch in Python, and the major bottleneck is SIP message parsing. Right now they use a thread pool and a C library to parse messages in several threads.

Pickling/unpickling results to pass them between processes would defeat the whole idea.

Moving to Python 3

Posted Feb 11, 2011 1:21 UTC (Fri) by cmccabe (guest, #60281)

Python seems like a weird choice for writing a SIP switch. It seems like you would want to write that sort of thing in C, for much the same reason Apache is written in C.

Then again, I've never worked with SIP, so maybe I'm off base here.

I do think there's a desire for a language that's lower level than Python/Ruby/Perl, but higher level than C. Java and C++ sort of fill that role, but poorly. I'm hoping that I'll get a chance to test out Google Go to see if it works for that.

Moving to Python 3

Posted Feb 11, 2011 1:46 UTC (Fri) by nybble41 (subscriber, #55106)

You may want to look into D: http://www.digitalmars.com/d/2.0/index.html

Like C++, but better. Native performance, safe dynamic arrays/strings, Unicode support, garbage collection (if you wish), namespaces, classes, exception handling, interfaces, value types, metaprogramming, compile-time evaluation, and more. The main limitation at the moment is that it remains a new language under heavy development, but it's far enough along for real programs.

Moving to Python 3

Posted Feb 11, 2011 19:47 UTC (Fri) by cmccabe (guest, #60281)

D is kind of interesting. Wikipedia says it's existed since 1999, but apparently the 1.0 release was only in 2007.

To me the big question is why I should use D over, say, Java. D has "Java-style single inheritance with interfaces and mixins" according to Wikipedia. If you're going to be Java-style, why not just use Java? I'm sure that D's syntax is probably slightly better than Java's. (That's not a difficult achievement.) But does that slight improvement justify throwing away all the existing libraries and code?

On the other hand, Google Go doesn't have inheritance at all. It has duck typing, enforced at compile-time. Its approach to concurrency is not mutexes and condition variables, but channels. I think these two improvements alone justify the switch to another language.

I am curious how the D and Google Go runtimes compare to the JVM. My experience has been that the JVM runs code relatively fast, but starting the JVM itself takes up a huge amount of resources. In comparison, equivalent C++ or C programs have a lower up-front cost. This is one of the reasons why Java web hosting is still more expensive than PHP web hosting, even today. Java has a high up-front performance cost but then does better than PHP over time.

Moving to Python 3

Posted Feb 11, 2011 21:02 UTC (Fri) by nybble41 (subscriber, #55106)

D is far from "Java with slightly better syntax". It compiles to native binary code, like C or C++, and can link directly against C and C++ library functions. Its metaprogramming features give you all the flexibility of C++-style class and function templates, including compile-time "duck typing". It has a multithreading model based on message-passing, not mutexes or conditions, with explicit sharing and process-wide immutability built into the type system. Object invariants and pre-/post-condition annotations are available to support design-by-contract.

Moving to Python 3

Posted Feb 12, 2011 1:33 UTC (Sat) by cmccabe (guest, #60281)

> D compiles to native binary code, like C or C++

Well, technically gcj can compile Java to native binary code. However, Java in general has poor integration with non-Java code-- I can't deny that.

> Its metaprogramming features give you all the flexibility of C++-style
> class and function templates, including compile-time "duck typing"

Saying D has duck typing because it has templates is a little misleading. Nobody would seriously say that C++ has duck typing, and it also has templates. For most of your classes in either C++ or D, you're still going to be worrying about inheritance hierarchies and doing "big design up front" which seems more like the Java way of doing things, not the Ruby way.

Speaking of metaprogramming... one nice thing about Google Go is that because there are no inheritance hierarchies, there's no dynamic_cast. One less ugly piece of clutter.

The message passing stuff in Phobos is interesting. It seems that the integration into the language is just at the library level, though, rather than being an integral part of the syntax as in Go.

Overall, the more I learn about D, the more I see it as a "better C++". Like C++, it tries to include everything *and* the kitchen sink. Thankfully multiple inheritance didn't make the cut this time, but most of the other clutter did. (And just like C++, there are some weird omissions-- like reflection.) Google Go, on the other hand, seems to be a more elegant and minimalist language, kind of like C. I like that. But, these are just first impressions, and I guess I might change my mind later.

Moving to Python 3

Posted Feb 12, 2011 3:44 UTC (Sat) by nybble41 (subscriber, #55106)

> Saying D has duck typing because it has templates is a little misleading. Nobody would seriously say that C++ has duck typing, and it also has templates.

Why not? The following is valid D code:

void callMethod(T)(T object)
{
  if (object.property)
    object.method();
}
...
callMethod(new ClassA);
callMethod(new ClassB); // unrelated to ClassA

Any type which has the required property and method can be passed to the template function. I would say that looks exactly like duck typing. You can do the same thing in C++ with template functions, and in fact the STL makes extensive use of such functions.

If you end up using interfaces and/or inheritance instead that is only because the strongly-typed approach (which Go does not support) has advantages over duck typing, including better compile-time error checking and runtime performance.

> Speaking of metaprogramming... one nice thing about Google Go is that because there's no inheritance hierarchies, there's no dynamic_cast. One less ugly piece of clutter.

One could say that this is only because there is an implied dynamic_cast (but with worse performance, since it requires a runtime lookup) at every location where an object is used.

> The message passing stuff in Phobos is interesting. It seems that the integration into the language is just at the library level, though, rather than being an integral part of the syntax as in Go.

While I fail to see any reason to prefer extraneous syntax over a library, there is in fact some syntax in D intended to support multithreaded programming: the 'immutable' and 'shared' type keywords, for example, plus the fact that global variables occupy thread-local storage by default. Assurance that shared state cannot be mutated behind the scenes (not just through a given reference, like 'const', but *anywhere* in the process) allows large messages to be passed efficiently and safely.

Moving to Python 3

Posted Feb 13, 2011 1:55 UTC (Sun) by cmccabe (guest, #60281)

> [For golang] One could say that this is only because there is an
> implied dynamic_cast (but with worse performance, since it requires a
> runtime lookup) at every location where an object is used.

You are confused. Go has static typing. As in, checked at compile-time. Code that misuses types will not compile.

Think of it this way: if you refer to a type T in a C++ template, and then use T::foo, you're not doing a dynamic_cast. You're just using the normal type system. If T does not have a foo method, the code will not compile. Similarly, in Go, you will get a compile-time, not runtime, error, if you try to use methods on an object that don't exist.

Also, you seem to be confused about C++ as well. dynamic_cast *always* "requires a runtime lookup" on the RTTI (runtime type information) in C++. If you do not compile with support for RTTI, you cannot use dynamic_cast. C++ templates, on the other hand, are a completely compile-time mechanism. I don't know why you would say that templated code is not as "strongly-typed" as other C++ code.

> While I fail to see any reason to prefer extraneous syntax over a library,
> there is in fact some syntax in D intended to support multithreaded
> programming: the 'immutable' and 'shared' type keywords, for example,
> plus the fact that global variables occupy thread-local storage by
> default. Assurance that shared state cannot be mutated behind the scenes
> (not just through a given reference, like 'const', but *anywhere* in the
> process) allows large messages to be passed efficiently and safely.

There's a lot of syntax in D, but is it the right syntax? I think we are going to have to agree to disagree.

I also would like to note in passing that C/C++ have the const, restrict and __thread keywords, which could be used to provide exactly the same message-passing interface you describe in D. In fact, many such interfaces have been built over the years.

Moving to Python 3

Posted Feb 13, 2011 9:33 UTC (Sun) by nybble41 (subscriber, #55106)

> You are confused. Go has static typing. As in, checked at compile-time. Code that misuses types will not compile.... I don't know why you would say that templated code is not as "strongly-typed" as other C++ code.

Perhaps I am confused after all. I fail to see how Go can be statically typed, with, in particular, specific types for each function parameter, as well as "duck typed", where any type which provides certain methods will be accepted. If types must be known at compile-time then what would be the point of "duck typing"?

Note, too, that it is entirely possible to misuse types in a "duck typed" language, even with compile-time checking. Since there is no information regarding the connections between types (namely inheritance), any type which provides the right method signatures will be accepted, whether or not it was intended to be used in that fashion. This is where explicit interfaces and inheritance can prove very useful as a way of preventing semantic errors.

Go code and C++/D templates are not as strongly-typed for the reasons I stated above: namely, any type with a matching signature will be accepted, even if that use of the type's members is incorrect or even undefined. Classes with inheritance are more strongly typed because they explicitly state that they implement the expected interface--the semantics, not just the type signatures.

> Also, you seem to be confused about C++ as well. dynamic_cast *always* "requires a runtime lookup" on the RTTI (runtime type information) in C++.

RTTI is only required to test that the object you are casting is derived from the class you are casting to. As such, it is a very simple test. In Go the compiler must either generate separate code for each type (like C++/D template functions) or else look up the member offset or function pointer by name for each object at runtime, which seems to me quite likely to be far more expensive.

> I also would like to note in passing that C/C++ have the const, restrict and __thread keywords, which could be used to provide exactly the same message-passing interface you describe in D.

The 'const' and 'restrict' keywords in C++ are not equivalent to 'immutable' in D. Immutable data is guaranteed to never change, and this is enforced by the compiler. You can't pass mutable data to a function expecting immutable, for example. (You *can* pass mutable data to a function which accepts 'const' data, in addition to immutable data.) The 'restrict' keyword is just a declaration of intent. If you declare a pointer 'const' and 'restrict' then the compiler will optimize the code on the assumption that the data cannot be altered via another pointer, but it is up to the programmer to ensure that this is actually the case. I do not think it insignificant, either, that in D you have to specifically declare data as shareable between threads (again enforced by the compiler, unlike in C/C++), whereas in C/C++ all global data is shared by default and you must use __thread to make it private. In D thread-safety is the default; in C/C++ it is a rarely-used compiler-specific extension. (C++0x will supposedly add TLS as a standard storage class.)

Moving to Python 3

Posted Feb 13, 2011 21:12 UTC (Sun) by cmccabe (guest, #60281)

> Perhaps I am confused after all. I fail to see how Go can be statically
> typed, with, in particular, specific types for each function parameter, as
> well as "duck typed", where any type which provides certain methods will
> be accepted. If types must be known at compile-time then what would be the
> point of "duck typing"?

Golang's philosophy is that inheritance is evil. Not "multiple inheritance is evil" (that is Java's philosophy), or "inheritance is often less useful than composition" (that's Scott Meyers' philosophy in Effective C++). Just "inheritance is evil."

Why is inheritance evil? Well, it forces you to do a lot of work up front before you start writing code. A lot of that work is just writing boilerplate code like Singletons, abstract base classes, Factories, Adaptors, etc. This leads to longer and less readable code. Changing the inheritance hierarchy is difficult after you've written the code. Moreover, unless the code is totally trivial, you will *have* to change the hierarchy in response to changing requirements and new insights into the design that you'll have over time.

The dirty little secret of C++ is that code written in the high-level, object-oriented style often tends to be longer than code written in the old-fashioned C style. It starts to smell like Java.

For a good criticism of Java, and deep inheritance hierarchies in general, see:
http://steve-yegge.blogspot.com/2006/03/execution-in-king...

[snip discussion of TLS, const, and restrict]

You seem to have a good understanding of const and restrict. Your analysis is correct. I'm glad to hear that __thread will be standardized soon. pthread_getspecific is slow on Linux.
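For readers coming from the Python side of this thread: Python exposes the same per-thread storage idea as __thread and pthread_getspecific through threading.local. A small sketch; ctx, handle and the request ids are hypothetical.

```python
import threading

ctx = threading.local()        # each thread gets its own attribute namespace


def handle(request_id, seen):
    ctx.request_id = request_id    # invisible to every other thread
    # Code deeper in the call stack could read ctx.request_id here
    # without it being passed down explicitly.
    seen.append((threading.current_thread().name, ctx.request_id))


seen = []
threads = [threading.Thread(target=handle, args=(i, seen), name=f"t{i}")
           for i in range(3)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(sorted(seen))                  # [('t0', 0), ('t1', 1), ('t2', 2)]
print(hasattr(ctx, "request_id"))    # False: the main thread never set it
```

Unlike D's default, ordinary Python globals are shared by all threads; thread-local storage has to be requested explicitly, as with __thread in C.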

Moving to Python 3

Posted Feb 15, 2011 13:05 UTC (Tue) by marcH (subscriber, #57642)

> Perhaps I am confused after all. I fail to see how Go can be statically typed, with, in particular, specific types for each function parameter, as well as "duck typed", where any type which provides certain methods will be accepted. If types must be known at compile-time then what would be the point of "duck typing"?

The point is: no need to design and maintain a type *hierarchy*. This point looks orthogonal to the static versus dynamic debate.

See also "Structural Typing".
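Python's closest analogue to the structural typing mentioned above is typing.Protocol: a class satisfies a protocol just by having the right methods, with no inheritance from it. A static checker such as mypy verifies this at check time, and @runtime_checkable additionally allows isinstance() checks, which only test that the methods exist. The class names are hypothetical.

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class Quacker(Protocol):
    def quack(self) -> str: ...


class Duck:                  # does NOT inherit from Quacker
    def quack(self) -> str:
        return "Quack!"


class Robot:                 # unrelated class that happens to match
    def quack(self) -> str:
        return "beep, quack"


class Dog:
    def bark(self) -> str:
        return "Woof"


def make_noise(q: Quacker) -> str:
    # A static checker accepts Duck() and Robot() here and rejects Dog(),
    # without any type hierarchy being declared.
    return q.quack()


print(make_noise(Duck()), make_noise(Robot()))
print(isinstance(Dog(), Quacker))   # False: Dog has no quack() method
```

Robot also illustrates the objection raised earlier in the thread: any type with a matching signature is accepted, whether or not it was meant to be used that way.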

Moving to Python 3

Posted Feb 11, 2011 10:48 UTC (Fri) by mgedmin (subscriber, #34497)

Anthony Baxter had an entertaining presentation about writing VoIP code in Python back in 2004, titled "Scripting Language", My Shiny Metal Arse. Apparently, it is fast enough.

Moving to Python 3

Posted Feb 11, 2011 6:36 UTC (Fri) by njs (guest, #40338)

The multiprocessing module can use shared memory to pass state:
http://docs.python.org/library/multiprocessing.html#shari...

I've never used it, but in principle it should be pretty much equivalent to threads.
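A minimal sketch of the shared-memory route, assuming a POSIX system. Each process writes its own disjoint slice of a multiprocessing.Array, so nothing is pickled on the way back and no lock is needed. The caveat is that these shared objects hold flat ctypes data (numbers, fixed-size arrays), not arbitrary Python objects such as parsed messages, which still have to be pickled. fill_slice and parallel_squares are hypothetical names.

```python
import multiprocessing as mp


def fill_slice(shared, start, stop):
    """Write results straight into shared memory."""
    for i in range(start, stop):
        shared[i] = i * i


def parallel_squares(n, nworkers=4):
    ctx = mp.get_context("fork")              # assumption: a POSIX system
    shared = ctx.Array("d", n, lock=False)    # n zeroed doubles in shared memory
    step = max(1, (n + nworkers - 1) // nworkers)
    # Slices are disjoint, so the workers never touch the same element.
    procs = [ctx.Process(target=fill_slice, args=(shared, k, min(k + step, n)))
             for k in range(0, n, step)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    return list(shared)


if __name__ == "__main__":
    print(parallel_squares(10))
```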

Moving to Python 3

Posted Feb 13, 2011 20:42 UTC (Sun) by spaetz (subscriber, #32870)

What's wrong with multiprocessing? According to the docs, some parts don't work on *BSD, which breaks interoperability.

Moving to Python 3

Posted Feb 11, 2011 1:09 UTC (Fri) by cmccabe (guest, #60281)

> I was a huge python fan until a few years ago when I started making bigger
> and bigger apps in it, and threading them more and more. And I was not
> getting the performance gain from multi-core machines that I would get
> with other languages.

Just out of curiosity, what applications did you feel you needed threads for?

If you're sharing a lot of state between threads, then I have to ask why? It really makes everything so much more difficult. My experience has been that once your program starts using locking, you're not object-oriented any more; you're "mutex-oriented." You can't just freely reuse objects and code because you might violate the constraints that the code was written under.

If you're not sharing a lot of state, then processes are just as good as threads.

Moving to Python 3

Posted Feb 13, 2011 13:39 UTC (Sun) by kleptog (subscriber, #1183)

I'm not a huge fan of threads but where they mostly come in is dealing with I/O. Say you have an app that has to send/receive data over 3 different pipes and in between it also has to do actual work. While you can write a main loop that does a select() over each descriptor and calls the right code when something becomes readable/writable, it's conceptually much clearer having a thread whose job it is to read any data and process it. Especially when you have to deal with issues like write() blocking, etc...

This right away gives you 4 threads. Add a thread to monitor everything (since thread death does not get signalled anywhere) and you're at 5.
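A rough sketch of that layout: one thread per pipe blocking in its own read, the main thread doing the "actual work" (here, just writing to the pipes), and a monitor thread that polls for exited threads. With three pipes this gives the five threads counted above. The channel names and the handler are hypothetical stand-ins for the real protocols.

```python
import os
import threading


def reader(fd, name, handler):
    """One thread per pipe: block in read() and pass each line to `handler`."""
    with os.fdopen(fd, "rb") as pipe:
        for line in pipe:               # only this thread blocks on this pipe
            handler(name, line.rstrip(b"\n"))


def monitor(threads, stop, dead, interval=0.01):
    """Poll for exited threads, since a thread's exit notifies nobody by itself."""
    while True:
        stopping = stop.is_set()
        for t in threads:
            if not t.is_alive():
                dead.add(t.name)
        if stopping:                    # one last sweep after stop was set
            return
        stop.wait(interval)


def run_demo(payload):
    received, lock = [], threading.Lock()

    def handle(name, line):             # stand-in for per-protocol processing
        with lock:
            received.append((name, line))

    readers, write_ends = [], []
    for name in ("control", "data", "log"):     # hypothetical channel names
        r, w = os.pipe()
        write_ends.append(w)
        readers.append(threading.Thread(target=reader, args=(r, name, handle),
                                        name=name))
    for t in readers:
        t.start()
    stop, dead = threading.Event(), set()
    watcher = threading.Thread(target=monitor, args=(readers, stop, dead))
    watcher.start()
    for w in write_ends:                # the main thread's "actual work"
        os.write(w, payload)
        os.close(w)                     # EOF lets that reader thread finish
    for t in readers:
        t.join()
    stop.set()
    watcher.join()
    return sorted(received), dead
```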

There isn't much shared state as such; rather, I/O on any port can execute callbacks that could access anything the initiator of the request wanted (go closures!). There's barely any locking: Python's atomic operations are sufficient (though I imagine Queue does locking under the hood).

Because I/O happens outside the GIL, the process can use 110% CPU when running at full speed. (There's a lot of I/O.)

Back to the issue at hand: Python 2's Unicode handling bites me daily. Whoever decided that calling str() on a unicode string should raise an exception *only* when it contains a non-ASCII character should be shot. Just error *every* time; then I won't get called at 3 in the morning to fix the bloody thing (usually buried in some library; even some standard Python libs have had bugs in the past).
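Python 3 behaves the way this comment asks for: mixing bytes and text raises TypeError every time, whether or not the text happens to be ASCII, so the bug shows up on the first test run rather than at 3 in the morning. A small sketch; mix is a hypothetical helper standing in for code that used to combine the two implicitly.

```python
def mix(data: bytes, text: str):
    """Concatenate bytes and text the way Python 2 code often did implicitly."""
    return data + text


for text in ("plain ascii", "café"):
    try:
        mix(b"header: ", text)
    except TypeError as exc:
        # Python 3 fails for ASCII and non-ASCII text alike.
        print(f"{text!r}: TypeError: {exc}")

# Once an encoding is chosen explicitly, any text works.
print(b"header: " + "café".encode("utf-8"))   # b'header: caf\xc3\xa9'
```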

Moving to Python 3

Posted Feb 14, 2011 19:26 UTC (Mon) by cmccabe (guest, #60281)

> I'm not a huge fan of threads but where they mostly come in is dealing
> with I/O. Say you have an app that has to send/receive data over 3
> different pipes and in between it also has to do actual work. While you
> can write a main loop that does a select() over each descriptor and calls
> the right code when something becomes readable/writable, it's conceptually
> much clearer having a thread whose job it is to read any data and process
> it. Especially when you have to deal with issues like write() blocking,

Twisted is a great way to do multiple I/O operations without using threads. It wraps the ugly select() interface in something much nicer.
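Twisted is a third-party framework, so rather than guess at its API, here is the same single-threaded idea sketched with the standard library's selectors module, which wraps select/poll/epoll/kqueue behind one interface. This is a swapped-in illustration, not Twisted. event_loop is a hypothetical name, and simply buffering each pipe's data stands in for a real per-pipe callback.

```python
import os
import selectors


def event_loop(pipes):
    """Serve several pipes from one thread; `pipes` maps a name to a read fd."""
    sel = selectors.DefaultSelector()   # epoll on Linux, kqueue on BSD/macOS
    received = {}
    for name, fd in pipes.items():
        os.set_blocking(fd, False)
        sel.register(fd, selectors.EVENT_READ, data=name)
        received[name] = b""
    while sel.get_map():                # run until every pipe reaches EOF
        for key, _events in sel.select():
            chunk = os.read(key.fd, 4096)
            if chunk:                   # per-pipe "callback": here, just buffer
                received[key.data] += chunk
            else:                       # an empty read means the writer closed
                sel.unregister(key.fd)
                os.close(key.fd)
    sel.close()
    return received


if __name__ == "__main__":
    pipes = {}
    for name in ("a", "b", "c"):
        r, w = os.pipe()
        os.write(w, f"hello from {name}".encode())
        os.close(w)
        pipes[name] = r
    print(event_loop(pipes))
```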

http://en.wikipedia.org/wiki/Twisted_(software)

Moving to Python 3

Posted Feb 17, 2011 8:10 UTC (Thu) by rqosa (subscriber, #24136)

Or libevent with pyevent (or with gevent).

Moving to Python 3

Posted Feb 17, 2011 8:26 UTC (Thu) by rqosa (subscriber, #24136)

That way can scale poorly, because there must be at least one thread per FD.

Using an epoll-driven main loop and a pool of worker threads (with one work queue per worker thread) makes the number of threads independent of the number of FDs, so you can adjust the number of threads to whatever gives the best performance. It also avoids the overhead of starting a thread whenever an FD is opened and stopping it when the FD is closed, since you can reuse the existing threads. (Make it so that any idle thread waits on a semaphore until its work queue becomes non-empty. Also, rather than using epoll directly, use libevent, so that it's portable to non-Linux systems.)
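A Python sketch of that design under a few assumptions. It uses the standard selectors module instead of raw epoll or libevent. Each worker blocks in queue.Queue.get() on its own queue, which waits on a condition variable rather than a semaphore but plays the same role. Pinning each FD to one worker (fd % nworkers) is a hypothetical policy chosen so that one FD's chunks are handled in order. Upper-casing the data stands in for real work.

```python
import os
import queue
import selectors
import threading


def worker(q, results, lock):
    """Block on this worker's own queue; a None item means shut down."""
    for name, chunk in iter(q.get, None):
        with lock:
            results.setdefault(name, []).append(chunk.upper())  # stand-in work


def serve(pipes, nworkers=2):
    """One selector thread for all FDs, a fixed pool of workers for the work."""
    queues = [queue.Queue() for _ in range(nworkers)]
    results, lock = {}, threading.Lock()
    workers = [threading.Thread(target=worker, args=(q, results, lock))
               for q in queues]
    for t in workers:
        t.start()
    sel = selectors.DefaultSelector()
    for name, fd in pipes.items():
        sel.register(fd, selectors.EVENT_READ, data=name)
    while sel.get_map():
        for key, _events in sel.select():
            chunk = os.read(key.fd, 4096)
            if chunk:
                # Pool size is independent of the number of FDs.
                queues[key.fd % nworkers].put((key.data, chunk))
            else:
                sel.unregister(key.fd)
                os.close(key.fd)
    sel.close()
    for q in queues:
        q.put(None)                     # one shutdown sentinel per worker
    for t in workers:
        t.join()
    return {name: b"".join(parts) for name, parts in results.items()}


if __name__ == "__main__":
    pipes = {}
    for i in range(5):
        r, w = os.pipe()
        os.write(w, f"msg {i}".encode())
        os.close(w)
        pipes[f"fd{i}"] = r
    print(serve(pipes))
```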

Moving to Python 3

Posted Feb 17, 2011 8:41 UTC (Thu) by rqosa (subscriber, #24136)

> at least one thread per FD

Forgot to mention this in my previous post: the "one thread/process per FD" pattern is the main design issue that made possible the Slowloris DoS attack, which LWN covered 2 years ago.

Moving to Python 3

Posted Feb 17, 2011 20:50 UTC (Thu) by kleptog (subscriber, #1183)

I think you misread my post: I have a fixed number of FDs and a fixed number of threads. Each FD has a completely different purpose and protocol and so an event loop is not really practical. You have to get several different components of the system (which don't know of each other's existence) to work through a single event loop. Sure it's possible, but threads are a nice way to isolate them.

Of course in the general case you are right, a service like a webserver should try to reduce the number of threads. But also in the special case of CPython it's pointless to use more threads, since the GIL prevents more than one thread running at a time anyway.

Moving to Python 3

Posted Feb 14, 2011 4:33 UTC (Mon) by ssmith32 (subscriber, #72404)

As the other comment mentioned - I/O stuff is common.

Where I first started running up against problems was in a GUI that had a central append-only data structure (which eased the locking constraints) and a corresponding tree of threads. Each thread could add one or more new nodes to the data structure, which might need more analysis by more threads. The operations on the data structure that needed to be thread-safe had the thread safety encapsulated in the data structure. It just fell naturally into a threaded design.
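A minimal sketch of that pattern, with the locking hidden inside the data structure so callers never touch a mutex. AppendOnlyLog and analyse are hypothetical names, and the "analysis" here just adds two child nodes per level and hands each to a new thread.

```python
import threading


class AppendOnlyLog:
    """Append-only node store; all locking is encapsulated in the class."""

    def __init__(self):
        self._nodes = []
        self._lock = threading.Lock()

    def append(self, node):
        """Add a node and return its index; indices are never reused."""
        with self._lock:
            self._nodes.append(node)
            return len(self._nodes) - 1

    def get(self, index):
        # Reads skip the lock in this sketch: entries never change once
        # written, and CPython list indexing is thread-safe.
        return self._nodes[index]

    def __len__(self):
        return len(self._nodes)


def analyse(log, index, depth):
    """Hypothetical analysis step: each node spawns child nodes and threads."""
    if depth == 0:
        return
    children = []
    for _ in range(2):
        child = log.append((index, depth - 1))
        t = threading.Thread(target=analyse, args=(log, child, depth - 1))
        t.start()
        children.append(t)
    for t in children:
        t.join()


if __name__ == "__main__":
    log = AppendOnlyLog()
    analyse(log, log.append(("root", 3)), 3)
    print(len(log))     # 15: the root plus 2 + 4 + 8 descendants
```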

As far as being mutex-oriented programming goes - with an appropriately designed shared data structure, I've rarely run into that problem. There are usually a couple of operations that you think hard about, get right, encapsulate, and move on. OOP really complements this well.

As far as the GIL-specific issue I ran into - with the GIL, only one thread in a Python interpreter can execute Python code at any moment, so all the threads in that interpreter effectively share a single core. That makes multi-core kind of useless for heavily threaded, CPU-bound programs. If you go the multi-process route, you get multiple interpreters. But like I said, sometimes threads can be nice :|

Moving to Python 3

Posted Feb 14, 2011 19:39 UTC (Mon) by cmccabe (guest, #60281)

> Where I first started running up against problems was in a GUI that had a
> central append-only (which eased the locking constraints) data structure,
> and a corresponding tree of threads. Each thread could generate a new
> node(s) on the data structure, which might need more analysis by more
> threads. The operations on the data structure that needed to be thread
> safe had the thread safety encapsulated in the data structure. It just
> fell naturally into a threaded design.

In my experience, when you start sharing a lot of data between threads, you can take one of two approaches. You can have a giant lock that covers all threads. This is sort of like the old BKL or the GIL itself. One giant lock is simple, but it limits scalability a lot. Alternately, you can have many different little locks. This is great for performance and scalability, but hard on the poor programmers. Debugging becomes very difficult because runs are not repeatable. Code reuse is impaired because everything is tangled in this web of locks and you can't easily move code around.

When you have many little locks, the usual approach is to impose absolute ordering, so that if you take lock A before B at any point, you must always take A before B. That's what the kernel does. It seems to be the best strategy, but again, why impose this on yourself when you don't have to? Just don't share state unless you have to.
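A sketch of the lock-ordering rule in Python, under the assumption that every lock is given a fixed global rank when it is created. Any code that needs several locks takes them lowest rank first, so two threads can never hold them in opposite orders. RankedLock, acquire_in_order, Account and transfer are hypothetical names.

```python
import threading
from contextlib import contextmanager
from itertools import count

_next_rank = count()


class RankedLock:
    """A lock with a fixed global rank, assigned at creation time."""

    def __init__(self):
        self.rank = next(_next_rank)
        self._lock = threading.Lock()

    def acquire(self):
        self._lock.acquire()

    def release(self):
        self._lock.release()


@contextmanager
def acquire_in_order(*locks):
    """Take every lock lowest rank first; release in the reverse order."""
    ordered = sorted(locks, key=lambda lk: lk.rank)
    for lk in ordered:
        lk.acquire()
    try:
        yield
    finally:
        for lk in reversed(ordered):
            lk.release()


class Account:
    def __init__(self, balance):
        self.balance = balance
        self.lock = RankedLock()


def transfer(src, dst, amount):
    # Whichever direction the money moves, the lower-ranked lock is taken
    # first, so transfer(a, b) and transfer(b, a) cannot deadlock.
    with acquire_in_order(src.lock, dst.lock):
        src.balance -= amount
        dst.balance += amount
```

Usage: running transfer(a, b, 1) in one thread and transfer(b, a, 1) in another, many times each, finishes without deadlock because both threads always lock in the same order.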

It's sad that programmers have been trained to think that threads are "simple" and "natural" and processes are "hard." That's the exact reverse of reality. I blame Windows and its high per-process overhead.

Moving to Python 3

Posted Feb 15, 2011 11:15 UTC (Tue) by mpr22 (subscriber, #60784)

Data sharing looks easy in threadland, as long as you don't look too closely. Processes make you worry about files and sockets and pipes and god-knows-what, and J. Newbie Programmer thinks that all looks like a lot more work than just having threads that can all see all the data.

(See also: "Why do I have to go through this big scary graphics library with lots of things to set up? Why can't I just stuff pixels into the graphics card?")


Copyright © 2018, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds