
Grok the GIL (opensource.com)

Posted Apr 21, 2017 8:34 UTC (Fri) by Sesse (subscriber, #53779)
In reply to: Grok the GIL (opensource.com) by quietbritishjim
Parent article: Grok the GIL (opensource.com)

> This is what CPU heavy Python apps usually spend most of their time in anyway

Not necessarily. There are tons of cases (basically anything related to web) where you spend your time in the Python interpreter, not in C extensions.



Grok the GIL (opensource.com)

Posted Apr 21, 2017 9:01 UTC (Fri) by quietbritishjim (subscriber, #114117) [Link] (22 responses)

I would have thought that "Basically anything related to the web" is going to spend most of its time interacting with a database, which of course is I/O. What's more, database interactions run fastest (even with one thread) when you put as much logic as possible into prepared statements, just as numpy runs fastest when you heavily vectorise your operations. Maybe you could elaborate on what you think of as a web application? I'm genuinely curious.

More broadly, if you have a Python application in any domain that's using loads of CPU in the interpreter then I think you've used the wrong language, because native code will be orders of magnitude faster. Removing the GIL is not going to change that. An obvious counter argument is that CPUs are cheap. Yes, but are they so cheap that you can afford to buy 100 times as many web servers?

It's OK that Python is hopelessly slow because it's intended as a glue language that just coordinates fast code (in-process C extensions or out-of-process via I/O). This works really well because most code is actually the same in different applications, so with this strategy the shared code gets written once in a native language (in numpy or Postgres or whatever) and the tiny amount of application-specific code gets written in Python. As I see it, the GIL doesn't have much effect on this.
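To illustrate the point about pushing logic into the database, here is a minimal sketch using the standard-library `sqlite3` module as a stand-in for a real database server. The `orders` table, its columns, and the function names are hypothetical and chosen only for illustration. Both functions return the same total, but the second lets the database engine do the filtering and summing instead of looping in the interpreter.

```python
import sqlite3


def make_db():
    # Hypothetical schema and rows, used only for this illustration.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (customer TEXT, amount INTEGER)")
    rows = [("alice", 10), ("bob", 5), ("alice", 7)]
    conn.executemany("INSERT INTO orders VALUES (?, ?)", rows)
    return conn


def total_in_python(conn, customer):
    # Fetch every row and filter/sum in the interpreter.
    total = 0
    for name, amount in conn.execute("SELECT customer, amount FROM orders"):
        if name == customer:
            total += amount
    return total


def total_in_sql(conn, customer):
    # Let the database filter and aggregate; one parameterized statement.
    query = "SELECT COALESCE(SUM(amount), 0) FROM orders WHERE customer = ?"
    return conn.execute(query, (customer,)).fetchone()[0]
```

With only three rows the difference is invisible; the sketch shows where the work happens, not how much time it saves.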

Grok the GIL (opensource.com)

Posted Apr 21, 2017 9:58 UTC (Fri) by Sesse (subscriber, #53779) [Link] (10 responses)

Well, for one, the HTML templating systems tend to be pure-Python, and they take a non-zero amount of time. And if you're using a web framework, all sorts of routing logic… It all adds up. Talking to the database is usually only a small part.

This isn't unique to Python, by the way. Perl, Python, Ruby, Lua… most applications I've seen _don't_ have this “90% of time is spent doing XYZ, so just use libXYZ from C” pattern.

Grok the GIL (opensource.com)

Posted Apr 21, 2017 11:04 UTC (Fri) by excors (subscriber, #95769) [Link] (9 responses)

Or if they do follow that pattern, and they switch to libXYZ which is 100x faster, then the not-XYZ work which used to take 10% of the execution time is now taking ~92%. Now they're back in the same pattern, and (if they're operating at a large enough scale that hosting costs outweigh development costs) they really need to optimise all that non-XYZ code.

That process will repeat until there are no identifiable bottlenecks, they've just got hundreds of thousands of lines of Python code that are uniformly slow, and then they'll want to make Python faster. (It's basically Amdahl's law - once you speed up everything you can, performance is likely dominated by the parts you couldn't speed up.)
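The ~92% figure can be checked with a few lines of arithmetic. This is a sketch under the comment's own assumptions (XYZ is 90% of the original runtime and libXYZ is exactly 100x faster); the function name is made up for illustration.

```python
def remaining_share(fast_fraction, speedup):
    """Share of the new runtime taken by the part that was NOT sped up."""
    slow_fraction = 1.0 - fast_fraction
    # New total runtime, as a fraction of the original runtime.
    new_total = slow_fraction + fast_fraction / speedup
    return slow_fraction / new_total


share = remaining_share(0.90, 100)
print(f"{share:.1%}")  # prints 91.7%
```

The non-XYZ code has not become any slower; it simply dominates what is left.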

Grok the GIL (opensource.com)

Posted Apr 23, 2017 10:39 UTC (Sun) by joib (subscriber, #8541) [Link] (8 responses)

The sort-of sad thing is that python (or to be specific, cpython which is the implementation ~everybody uses) is still a simplistic bytecode interpreter which uses reference counting. I mean, come on, the 1970's called.

Meanwhile, say, Lisp, a similarly dynamic language, has had sophisticated native code compilers and generational GC for decades. Whereas in the python world efforts like unladen swallow (sp?) fail or are not widely used (pypy). What gives?

Not to pick on python specifically; ruby, perl, R, octave etc. are all equally bad. LuaJIT and modern JavaScript runtimes are the happy exceptions, though. Perhaps one shouldn't draw too many conclusions from javascript, considering the $$$ spent there, but then again, LuaJIT is basically a one-man show.

Grok the GIL (opensource.com)

Posted Apr 23, 2017 11:00 UTC (Sun) by Sesse (subscriber, #53779) [Link]

To be honest, I believe even LuaJIT without the JIT would be faster than most languages in this class. JIT versus interpreter is only one part of the equation; just as important are things like value representation. Lua has a head start there in terms of being a simple and small language, which helps. That, and the LuaJIT guy is brilliant.

Grok the GIL (opensource.com)

Posted Apr 23, 2017 12:26 UTC (Sun) by njs (subscriber, #40338) [Link]

The main reason PyPy's not widely used is that people don't use Python for just the language, but also the ecosystem. And it turns out that ecosystem has a bunch of important libraries that are written against the CPython C API, which more-or-less exposes all of CPython's internal implementation details. This makes it extremely difficult for alternative implementations to get traction. (There's even tons of code out there that implicitly assumes a refcounting gc – the Pyston folks actually started with a fancier GC and then switched back to refcounting to improve CPython compatibility.) None of this is how you would ideally want anything to work, but when it's ideals versus the 100,000 packages on PyPI, the 100,000 packages tend to win.
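As an illustration of code that "implicitly assumes a refcounting gc", consider this sketch. The `Resource` class is hypothetical. On CPython the object created in `implicit_cleanup` is normally finalized as soon as its reference count drops to zero, so the log is usually updated before the function returns; an implementation with a tracing GC may run `__del__` much later. The `with` version does not depend on that timing.

```python
class Resource:
    """Hypothetical resource that records when it gets released."""

    def __init__(self, log):
        self.log = log
        self.closed = False

    def close(self):
        # Idempotent, so __exit__ followed by __del__ logs only once.
        if not self.closed:
            self.closed = True
            self.log.append("closed")

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        self.close()
        return False

    def __del__(self):
        self.close()


def implicit_cleanup(log):
    Resource(log)  # relies on the collector finalizing this promptly
    return len(log)  # typically 1 on CPython; not guaranteed elsewhere


def explicit_cleanup(log):
    with Resource(log):
        pass
    return len(log)  # 1 on any implementation
```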

It doesn't help that Python's semantics are much richer than other dynamic languages like JS and Lua (and I suspect lisp too) in a way that makes effective JITting much harder – e.g. a basic operation like attribute lookup involves like 10 different special cases you have to check. This is what motivates PyPy's weird architecture – they need a JIT engine that can introspect "basic operations" like this and automatically generate context specific optimized variants, so they implement these operations in a high-level language ("RPython") that their JIT engine can see through.
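A few of the special cases in attribute lookup can be seen from plain Python. The sketch below covers only four of them (data descriptor, instance `__dict__`, non-data descriptor or class attribute, and the `__getattr__` fallback); all class names are made up for illustration.

```python
class DataDescriptor:
    """Defines __get__ and __set__, so it takes priority over the instance __dict__."""

    def __get__(self, obj, objtype=None):
        return "data descriptor"

    def __set__(self, obj, value):
        raise AttributeError("read-only")


class NonDataDescriptor:
    """Defines only __get__, so the instance __dict__ takes priority over it."""

    def __get__(self, obj, objtype=None):
        return "non-data descriptor"


class Example:
    a = DataDescriptor()
    b = NonDataDescriptor()
    c = "class attribute"

    def __getattr__(self, name):
        # Called only after every other lookup step has failed.
        return f"fallback for {name}"


obj = Example()
# Write straight into the instance dict, bypassing the descriptors.
obj.__dict__.update(a="instance", b="instance", c="instance")

results = {
    "a": obj.a,  # data descriptor beats the instance dict
    "b": obj.b,  # instance dict beats the non-data descriptor
    "c": obj.c,  # instance dict beats the plain class attribute
    "d": obj.d,  # nothing found anywhere, so __getattr__ runs
}
```

A JIT has to prove which of these branches applies before it can turn `obj.x` into a simple memory load.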

The good news is that the PyPy devs have been doing a truly heroic job over the last ~year on polishing up their fake CPython C API layer to the point where it's starting to be able to handle gnarly old libraries like numpy, lxml, etc. So hopefully this last major roadblock will be removed soon.

Grok the GIL (opensource.com)

Posted Apr 25, 2017 5:43 UTC (Tue) by Cyberax (✭ supporter ✭, #52523) [Link] (5 responses)

Lua is not really free-threaded; you can use it in a multithreaded environment only as long as you're extra-careful with data sharing.

So far I don't know _any_ mainstream scripting languages that have proper multithreading.

Grok the GIL (opensource.com)

Posted Apr 25, 2017 7:06 UTC (Tue) by Sesse (subscriber, #53779) [Link] (1 responses)

Perl? The Perl 5 threads are deprecated, though.

Grok the GIL (opensource.com)

Posted Apr 25, 2017 11:50 UTC (Tue) by niner (subscriber, #26151) [Link]

Perl 5 threads have never been more than a base for fork() emulation on Windows. Ironically there's a drop-in replacement called "forks", so instead of "use threads" you write "use forks" and get even better performance.

Grok the GIL (opensource.com)

Posted Apr 25, 2017 11:48 UTC (Tue) by niner (subscriber, #26151) [Link] (2 responses)

Perl 6 does support proper multithreading.

Grok the GIL (opensource.com)

Posted Apr 25, 2017 11:49 UTC (Tue) by rahulsundaram (subscriber, #21946) [Link] (1 responses)

Is Perl 6 considered a mainstream scripting language?

Grok the GIL (opensource.com)

Posted Apr 29, 2017 19:08 UTC (Sat) by flussence (guest, #85566) [Link]

I'd consider Perl 6 mainstream: all mainstream distros are making an effort to package it, it's regularly shipping Windows/MacOS/Docker installers, it's supported on various "exec arbitrary code on our server" sites, and even GitHub's broken and subpar syntax/doc rendering toolchain now partially acknowledges it exists.

But to play the numbers game a bit, I'd like to point out that PHP provides a genuine shoe-facing hand cannon: http://php.net/pthreads

Grok the GIL (opensource.com)

Posted Apr 21, 2017 11:20 UTC (Fri) by kigurai (guest, #85475) [Link] (3 responses)

I agree that most CPU intensive stuff is already done in C-libraries, but I am not so sure this makes the point of removing the GIL moot.
Sometimes you have a library function that does the computations in C (fast) but you need to run it on a shitload of data, and the library itself is not built for that.
Today you solve that by e.g. multiprocessing.Pool() but if the data you are operating on is large, or not picklable, then you either can't, or at least have to write quite complicated boilerplate code to share the data somehow.
Thus I would really love to instead be able to use a pool of threads that can all freely share the data I have already loaded, which is of course impossible as long as the GIL remains.
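The contrast between the two pools can be sketched as follows. `Dataset` and `score` are hypothetical stand-ins for already-loaded data and a C library call. The lock inside `Dataset` makes it unpicklable, so it cannot be shipped to `multiprocessing.Pool` workers as-is, while threads share it with no copying. Note the assumption: threads can share the data today, but the work only runs in parallel if the library call releases the GIL, and pure-Python code like this `score` stays serialized.

```python
import pickle
import threading
from concurrent.futures import ThreadPoolExecutor


class Dataset:
    """Hypothetical in-memory dataset; the lock makes it unpicklable."""

    def __init__(self, values):
        self.values = values
        self.lock = threading.Lock()


def score(dataset, index):
    # Stand-in for a C library call on one item of the shared data.
    return dataset.values[index] ** 2


def run_with_threads(dataset, workers=4):
    # Every worker thread sees the same Dataset object; nothing is copied.
    indices = range(len(dataset.values))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda i: score(dataset, i), indices))


def is_picklable(obj):
    # multiprocessing has to pickle arguments to send them to workers.
    try:
        pickle.dumps(obj)
        return True
    except Exception:
        return False
```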

Grok the GIL (opensource.com)

Posted Apr 21, 2017 15:46 UTC (Fri) by khim (subscriber, #9252) [Link] (2 responses)

Yup. Think TensorFlow. Highly-optimized C++ code with threading pools and everything, 99% of CPU time is spent there. 1% is spent in the linear (because of the GIL) python “driver”. Now think about a “monster desktop” with two Monster Xeons. 96 threads, add Amdahl's law and... voilà: almost half of wall-clock time is spent in python! Still think GIL is “not a big deal if core of your toolbox is in C++”?
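The "almost half" claim follows from Amdahl's law. A minimal sketch, assuming the 99% spent in C++ scales perfectly across 96 threads and the 1% Python driver runs on a single thread (the function name is illustrative):

```python
def serial_share(serial_fraction, workers):
    """Fraction of wall-clock time spent in the single-threaded part."""
    # Wall-clock time of the parallel part once spread over all workers.
    parallel_time = (1.0 - serial_fraction) / workers
    return serial_fraction / (serial_fraction + parallel_time)


share = serial_share(0.01, 96)
print(f"{share:.1%}")  # prints 49.2%
```

Real workloads rarely scale perfectly, so this is an upper bound on how much the serial part can come to dominate on such a machine.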

Grok the GIL (opensource.com)

Posted Apr 21, 2017 15:51 UTC (Fri) by Sesse (subscriber, #53779) [Link] (1 responses)

As someone who has had TensorFlow optimization as a day job (in the Google Brain team)… sorry, it's a bad example. :-) But it doesn't really use Python for much except just setting up the graph and then pressing play.

Grok the GIL (opensource.com)

Posted Apr 22, 2017 18:45 UTC (Sat) by khim (subscriber, #9252) [Link]

Yes, TensorFlow does not use Python for much... and usually GIL is not a problem... yet in certain use cases almost half of wall-clock time is spent in Python. I believe dvyukov has some numbers.

Grok the GIL (opensource.com)

Posted Apr 21, 2017 16:40 UTC (Fri) by niner (subscriber, #26151) [Link] (6 responses)

So which native library do you recommend as replacement for http://pygments.org/?

Grok the GIL (opensource.com)

Posted Apr 25, 2017 1:33 UTC (Tue) by sciurus (guest, #58832) [Link] (5 responses)

If you're building a web application, consider doing syntax highlighting client-side using a library like http://prismjs.com/

Grok the GIL (opensource.com)

Posted Apr 25, 2017 3:55 UTC (Tue) by pabs (subscriber, #43278) [Link] (4 responses)

Please don't, you will alienate people who do not allow JavaScript to run.

Grok the GIL (opensource.com)

Posted Apr 25, 2017 10:47 UTC (Tue) by jond (subscriber, #37669) [Link] (3 responses)

One could argue that losing syntax highlighting is still consistent with using Javascript in a "gracefully degrade" fashion.

Grok the GIL (opensource.com)

Posted Apr 25, 2017 11:41 UTC (Tue) by karkhaz (subscriber, #99844) [Link]

Slightly orthogonal issue: it seems ridiculous to have client-side computation for something that will compute exactly the same result on every client's machine. Why not do the syntax highlighting once on the server and serve the result to everybody, saving their battery life and cutting an iota from the page latency?

Grok the GIL (opensource.com)

Posted Apr 25, 2017 17:24 UTC (Tue) by pboddie (guest, #50784) [Link] (1 responses)

Except that Hipster 2.0 sites - those that use JavaScript for everything, which is where this ends up - don't tend to gracefully degrade, instead leaving a jumble of elements or even a blank page since the text and graphics aren't able to glide in smoothly from off the sides of the screen without scripts activated (or whatever special effect it is that is inevitably needed to spice up the ten to twenty words of actual content). On my rather old machine, such animations are wasted anyway: by the time the browser has woken up to the task of animating, the end of the animation timeline is upon it, and it might as well just put things in their final positions.

Also, while I'm impressed with things like PDF.js, I find myself using a native viewer like Okular for documents of any size, purely because the performance difference is significant. So, JavaScript-based equivalent solutions certainly have their limitations.

Grok the GIL (opensource.com)

Posted Apr 28, 2017 15:30 UTC (Fri) by mstone_ (subscriber, #66309) [Link]

And pdf.js tends to print like crap.


Copyright © 2025, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds