Apple Open Sources Snow Leopard's Grand Central Dispatch (Apple Insider)

Apple has decided to open the code behind Snow Leopard's Grand Central Dispatch feature. "The user-space implementation of the Grand Central Dispatch services API, called libdispatch, has been delivered as its own open source project, joining with other components that are part of projects Apple has already designated as open, including the kernel components in the Darwin OS XNU kernel and the blocks runtime that is part of the LLVM project."

Apple Open Sources Snow Leopard's Grand Central Dispatch (Apple Insider)

Posted Sep 11, 2009 19:49 UTC (Fri) by tajyrink (subscriber, #2750) [Link]

Yay for these two leading companies in openness with announcements today... it really has come to the point that everyone has to play being open one way or another.

Apple Open Sources Snow Leopard's Grand Central Dispatch (Apple Insider)

Posted Sep 11, 2009 21:15 UTC (Fri) by kragil (guest, #34373) [Link]

Emphasis on _play_

It is not like IBM, Apple, MSFT or any other big company think like RMS now. They will do whatever is in the short term interest of their investors.

The only good thing is that FOSS will win in the very long run (IMHO) and they can't do anything about it. The future is bright :)

Apple Open Sources Snow Leopard's Grand Central Dispatch (Apple Insider)

Posted Sep 12, 2009 5:14 UTC (Sat) by sbergman27 (guest, #10767) [Link]

> It is not like IBM, Apple, MSFT or any other big company think like RMS now.

Well... that's a blessing!

Apple Open Sources Snow Leopard's Grand Central Dispatch (Apple Insider)

Posted Sep 12, 2009 5:28 UTC (Sat) by muwlgr (guest, #35359) [Link]

Could anybody tell how GCD is different from, or better than, existing alternatives like ThreadWeaver or QtConcurrent?

Apple Open Sources Snow Leopard's Grand Central Dispatch (Apple Insider)

Posted Sep 12, 2009 6:14 UTC (Sat) by k8to (subscriber, #15413) [Link]

An analysis of the competing tools for this sort of thing would be a welcome article indeed.

Apple Open Sources Snow Leopard's Grand Central Dispatch (Apple Insider)

Posted Sep 12, 2009 12:37 UTC (Sat) by chr_reisinger (guest, #41249) [Link]

http://www.osnews.com/thread?383495

Apple Open Sources Snow Leopard's Grand Central Dispatch (Apple Insider)

Posted Sep 17, 2009 2:59 UTC (Thu) by k8to (subscriber, #15413) [Link]

Hmm, this doesn't go deep, and it doesn't compare it to anything but Microsoft CLR threadpools. It does highlight some of the rationales for having GCD at all (system-wide), but those rationales are unlikely to resonate on platforms where it is unlikely to be the default.

I'm sure there's more meat here.

Apple Open Sources Snow Leopard's Grand Central Dispatch (Apple Insider)

Posted Sep 12, 2009 18:56 UTC (Sat) by malor (subscriber, #2973) [Link]

This isn't a comparison, but John Siracusa does a very good writeup on GCD and how it works in the middle of his Snow Leopard review.

GCD uses a new compiler feature called "blocks", which are described earlier in the review. This allows for some mighty complex functionality encapsulated into just a few simple, easily-understood lines of code.

Apple Open Sources Snow Leopard's Grand Central Dispatch (Apple Insider)

Posted Sep 12, 2009 22:04 UTC (Sat) by jzbiciak (✭ supporter ✭, #5246) [Link]

Lexical closures in my C program? It's more likely than I think!

  • Blocks can be defined inline, as “anonymous functions.”
  • Blocks capture read-only copies of local variables, similar to “closures” in other languages

Nifty!

Apple Open Sources Snow Leopard's Grand Central Dispatch (Apple Insider)

Posted Sep 12, 2009 23:33 UTC (Sat) by nix (subscriber, #2304) [Link]

GCC nested functions done right :) (well, OK, they're a lot more than
that.)

The only downside now is unportability. I mean, it's part of Apple's GCC
4.2 fork, but is it upstream? (I'd search the gcc-patches list, but it's
sort of hard to search it for 'blocks'!)

Apple Open Sources Snow Leopard's Grand Central Dispatch (Apple Insider)

Posted Sep 13, 2009 19:27 UTC (Sun) by muwlgr (guest, #35359) [Link]

In C++ we already have perfectly usable "function objects" with an overloaded operator() and clear semantics (and well-known limitations as well). Compared to that, Apple's "block" extensions to ObjC do not turn it into Smalltalk anyway. Very soon you stumble on the same memory management issues you have in C/C++/ObjC. E.g. if you construct a block which refers to "auto" (local) objects of your function and then return this block somewhere up-level, you have to mark these auto objects with a special attribute (by hand) so that they are allocated on the heap, not on the stack, and not destroyed on function return. Seamless introduction of blocks into ObjC would require changes in the lowest levels of its runtime, and raising the overall level of the language to be closer to Smalltalk or Lisp (and farther from raw-iron C, accordingly).
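The lifetime issue described in this comment can be sketched in plain C, without the blocks extension. This is not Apple's implementation; it is the hand-written pattern (a function pointer plus captured state) that a closure feature automates. All names (`adder`, `make_adder`, `adder_call`) are hypothetical. The point it illustrates is that state captured by something that outlives its creating function has to be copied off the stack.

```c
#include <assert.h>
#include <stdlib.h>

/* A hand-rolled "closure": a function pointer plus the state it captured.
   All names here are hypothetical, chosen for illustration only. */
typedef struct adder {
    int (*call)(const struct adder *self, int x);
    int captured;   /* copy of the local variable, taken at creation time */
} adder;

static int adder_call(const adder *self, int x)
{
    return x + self->captured;
}

/* Returns a closure that outlives this function. The captured value is
   copied into heap storage; a pointer to the parameter `n` itself would
   dangle as soon as make_adder() returns. The caller releases the result
   with free(). Returns NULL if allocation fails. */
adder *make_adder(int n)
{
    adder *a = malloc(sizeof *a);
    if (a == NULL)
        return NULL;
    a->call = adder_call;
    a->captured = n;
    return a;
}
```

A caller would write `adder *a = make_adder(5);`, invoke it as `a->call(a, 10)`, and `free(a)` when done. The explicit allocation and the explicit `free` are the by-hand bookkeeping the comment is complaining about.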

Apple Open Sources Snow Leopard's Grand Central Dispatch (Apple Insider)

Posted Sep 12, 2009 22:15 UTC (Sat) by jzbiciak (✭ supporter ✭, #5246) [Link]

Oy... I just spotted something in that review that gave me pause:

> There are further optimizations inherent in this scheme. In Mac OS X, threads are relatively heavyweight. Each thread maintains its own set of register values, stack pointer, and program counter, plus kernel data structures tracking its security credentials, scheduling priority, set of pending signals and signal masks, etc. It all adds up to over 512 KB of overhead per thread.

Yowzers! 512K bytes per thread? Where does that figure come from? And what are they storing in that half a megabyte?? Surely this number is too big. I can imagine some overhead for OS structures and stuff, on the order of a page's worth (4K), and maybe even several pages of unique VM mappings. But a half-megabyte of unique storage per thread? Surely that number can't be right.

Apple Open Sources Snow Leopard's Grand Central Dispatch (Apple Insider)

Posted Sep 12, 2009 23:34 UTC (Sat) by nix (subscriber, #2304) [Link]

It certainly explains why they needed to automate thread pooling, doesn't
it?

But thankfully GCD is a hell of a lot more than just an automatic
thread-pooler, so it might be useful to the rest of us as well (as long as
we're willing to throw portability to the winds, or at least say that you
only get systemwide thread pooling if you compile with *this* compiler and
run *this* daemon).

Apple Open Sources Snow Leopard's Grand Central Dispatch (Apple Insider)

Posted Sep 13, 2009 8:07 UTC (Sun) by mjg59 (subscriber, #23239) [Link]

Well. "useful to the rest of us" ends up being kind of conditional on the licensing - Apache v2 isn't compatible with GPLv2, so the code (as is) isn't likely to end up in any of the desktop stacks where it could probably be of most use. We'll have to see what happens with the concept.

Apple Open Sources Snow Leopard's Grand Central Dispatch (Apple Insider)

Posted Sep 14, 2009 16:13 UTC (Mon) by epa (subscriber, #39769) [Link]

Citation for 'Apache v2 is not compatible with GPLv2'? I thought it was a permissive, essentially BSD-style licence with an additional grant of patent use, so it would be compatible with the GPL in the same way the BSD licence is.

Apple Open Sources Snow Leopard's Grand Central Dispatch (Apple Insider)

Posted Sep 14, 2009 23:27 UTC (Mon) by mjg59 (subscriber, #23239) [Link]

http://www.fsf.org/licensing/licenses/ - search for apache license, version 2.0

Apple Open Sources Snow Leopard's Grand Central Dispatch (Apple Insider)

Posted Sep 18, 2009 9:40 UTC (Fri) by trasz (guest, #45786) [Link]

It doesn't matter, because libdispatch counts as a 'system library' - see the GPL FAQ for an explanation of what system libraries are and why GPL applications can link against them, regardless of their license. (For example, GPL-ed applications under Windows can link against Microsoft-provided DLLs.)

Apple Open Sources Snow Leopard's Grand Central Dispatch (Apple Insider)

Posted Sep 18, 2009 13:10 UTC (Fri) by mjg59 (subscriber, #23239) [Link]

The system library exemption only covers the case where you're distributing the code that uses the library separately. Traditional interpretation in the Linux world has been that this prevents you using this for a non-GPL compatible library if other libraries or applications in the standard install make use of it.

Apple Open Sources Snow Leopard's Grand Central Dispatch (Apple Insider)

Posted Sep 13, 2009 8:10 UTC (Sun) by hppnq (subscriber, #14462) [Link]

I am not sure whether the number of 512KB is correct (see this table, for instance), but one of the points of GCD is that it reduces it: GCD queues are very lightweight compared to threads, taking up only 256 bytes.

Apple Open Sources Snow Leopard's Grand Central Dispatch (Apple Insider)

Posted Sep 13, 2009 15:09 UTC (Sun) by jzbiciak (✭ supporter ✭, #5246) [Link]

Oh, indeed, I have no doubt that GCD does reduce the overhead as compared to a fork/join type operation that actually creates and destroys threads.

Apple Open Sources Snow Leopard's Grand Central Dispatch (Apple Insider)

Posted Sep 13, 2009 19:08 UTC (Sun) by muwlgr (guest, #35359) [Link]

512KB probably refers to the amount of address space mapped for the thread stack. This value is usually even higher on Linux (PTHREAD_STACK_SIZE; 2 or 8 MB, as I remember). So a thread is really quite a heavyweight resource inside the process.
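One way to check the stack-size guess on a given system is to ask the pthreads library what a default-attribute thread would request. The following is a minimal sketch using standard POSIX calls; the function names `default_thread_stack_size` and `run_thread_with_stack` are made up for this example, and the value reported will vary by platform and resource limits.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Returns the stack size, in bytes, that a thread created with default
   attributes would request on this system, or 0 on error. */
size_t default_thread_stack_size(void)
{
    pthread_attr_t attr;
    size_t size = 0;

    if (pthread_attr_init(&attr) != 0)
        return 0;
    if (pthread_attr_getstacksize(&attr, &size) != 0)
        size = 0;
    pthread_attr_destroy(&attr);
    return size;
}

/* Trivial thread body: returns its argument unchanged. */
static void *noop(void *arg)
{
    return arg;
}

/* Creates and joins one thread with an explicitly requested stack size.
   Returns 0 on success or a pthread error number on failure. */
int run_thread_with_stack(size_t bytes)
{
    pthread_attr_t attr;
    pthread_t tid;
    int rc = pthread_attr_init(&attr);

    if (rc != 0)
        return rc;
    rc = pthread_attr_setstacksize(&attr, bytes);
    if (rc == 0)
        rc = pthread_create(&tid, &attr, noop, NULL);
    if (rc == 0)
        rc = pthread_join(tid, NULL);
    pthread_attr_destroy(&attr);
    return rc;
}
```

Note that this is reserved address space, not necessarily resident memory: pages of a thread stack are typically only backed by physical memory once they are touched.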

Apple Open Sources Snow Leopard's Grand Central Dispatch (Apple Insider)

Posted Sep 13, 2009 20:24 UTC (Sun) by nix (subscriber, #2304) [Link]

Yeah, but I don't see why conserving 512KB is really worth introducing all
this complexity. 512KB is basically nothing on a 64-bit system, and not
terribly much on a 32-bit one unless you have thousands of threads (in
which case each thread will be incredibly slow unless your system is
humungous --- in which case it's probably 64-bit).

I suspect the real reason this was implemented is because it actually
allows proper utilization of all CPUs: it's not doable without some sort
of systemwide coordination. I wish we had something like it in Linux.

Apple Open Sources Snow Leopard's Grand Central Dispatch (Apple Insider)

Posted Sep 13, 2009 20:43 UTC (Sun) by malor (subscriber, #2973) [Link]

Well, if you read Siracusa's article up there, the point of GCD is having a single, system-wide thread manager that understands what hardware you actually have, and then intelligently manages thread creation. He specifically uses the example of having a six-core machine with four cores busy; if you spawn more than two additional threads, you'll cause unneeded overhead. But you have no good way to determine how many cores are in use. One common strategy is to spawn one thread per core on the system, which works well if you're not loaded, and badly if you are.

GCD tracks that for you. You specify the maximum parallelism of your algorithm, hand it off to GCD, and then it figures out how many threads to actually spawn, given the total system load at that exact moment. (I don't know if it can track non-GCD load, though.) It can reuse threads over and over for new code; apparently thread setup on OS X is painful and slow.

By using GCD this way, Siracusa claims that your code automatically scales with the hardware you're on, up to whatever level of maximum parallelism you coded into your program.

It seems like a very good idea. Putting it under the Apache license will cripple it, though. This is pretend open source -- none of Apple's real competitors can use it.

Apple Open Sources Snow Leopard's Grand Central Dispatch (Apple Insider)

Posted Sep 13, 2009 21:35 UTC (Sun) by nix (subscriber, #2304) [Link]

Agreed, the idea seems really good. It doesn't seem like it would be too
terribly hard to reimplement, though. (The block stuff would probably have
to be done by some of the GCC hackers not employed by Apple: it seems a
*hell* of a lot nicer than the ugly mess of pragmas which is GOMP.)

Apple Open Sources Snow Leopard's Grand Central Dispatch (Apple Insider)

Posted Sep 13, 2009 22:58 UTC (Sun) by mikov (subscriber, #33179) [Link]

I don't think it is that clear cut. What happens for example if one task in GCD blocks on slow IO?

The kernel scheduler is in a much better position to arbitrate the available CPU resources. In theory a context switch between threads should not be much less efficient than switching between tasks.

In short, this seems mainly an attempt to work around the heavyweight and relatively inefficient threads in Mac.

OTOH, I think blocks (or closures in C) are very nice. They do not violate the spirit of C. I would love to see them in mainstream GCC.

Apple Open Sources Snow Leopard's Grand Central Dispatch (Apple Insider)

Posted Sep 14, 2009 4:36 UTC (Mon) by muwlgr (guest, #35359) [Link]

Copying my comment: http://lwn.net/Articles/352461/

Apple Open Sources Snow Leopard's Grand Central Dispatch (Apple Insider)

Posted Sep 14, 2009 7:22 UTC (Mon) by malor (subscriber, #2973) [Link]

The difference between GCD and a kernel would appear to be this: the kernel doesn't spawn threads sequentially based on available hardware. It's designed to run all the threads "at once". It schedules all of them for CPU time in small slices until they complete, incurring extra overhead for doing so. If you spawn 500 identical threads, they will all finish around the same time, much more slowly than they would have if you'd spawned one thread per available processor.

This is what GCD is doing; it's offering a different service. Threads don't exist or run until there's room for them on the hardware resources, so there's no contention, and throughput is maximized. It's a combination of serialization and concurrency. If you spawn 500 GCD jobs on a quadcore box, it shouldn't finish them very much slower than if you'd spawned just four, because it will only run four at a time. Apparently, GCD also offers some functionality for groups and dependencies as well.
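The "500 jobs, but only as many running as there are cores" behaviour can be approximated in user space with a fixed set of worker threads pulling from a shared job counter. The sketch below is not libdispatch and does not track system-wide load; it only caps one process's workers at the online CPU count. The names `run_jobs`, `job_queue`, `mark_done` and `MAX_WORKERS` are hypothetical, and `_SC_NPROCESSORS_ONLN` is a common extension (Linux, macOS, the BSDs) rather than strict POSIX.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <unistd.h>

enum { MAX_WORKERS = 64 };   /* arbitrary cap for this sketch */

typedef void (*job_fn)(size_t index, void *ctx);

typedef struct {
    pthread_mutex_t lock;
    size_t next;     /* index of the next job nobody has claimed yet */
    size_t njobs;    /* total number of jobs */
    job_fn fn;
    void *ctx;
} job_queue;

/* Worker loop: claim the next unclaimed job index, run it, repeat. */
static void *worker(void *arg)
{
    job_queue *q = arg;
    for (;;) {
        pthread_mutex_lock(&q->lock);
        size_t i = q->next;
        if (i < q->njobs)
            q->next++;
        pthread_mutex_unlock(&q->lock);
        if (i >= q->njobs)
            return NULL;
        q->fn(i, q->ctx);
    }
}

/* Example job: records that job `index` ran, in a byte array. */
void mark_done(size_t index, void *ctx)
{
    ((unsigned char *)ctx)[index] = 1;
}

/* Runs fn(i, ctx) for i in [0, njobs) on at most one thread per online
   CPU. Returns the number of worker threads started; 0 means the jobs
   ran on the calling thread because no thread could be created. */
size_t run_jobs(job_fn fn, void *ctx, size_t njobs)
{
    long cpus = sysconf(_SC_NPROCESSORS_ONLN);
    size_t nworkers = cpus > 0 ? (size_t)cpus : 1;
    if (nworkers > njobs)
        nworkers = njobs;
    if (nworkers > MAX_WORKERS)
        nworkers = MAX_WORKERS;

    job_queue q;
    q.next = 0;
    q.njobs = njobs;
    q.fn = fn;
    q.ctx = ctx;
    if (pthread_mutex_init(&q.lock, NULL) != 0)
        return 0;

    pthread_t tids[MAX_WORKERS];
    size_t started = 0;
    for (size_t i = 0; i < nworkers; i++) {
        if (pthread_create(&tids[started], NULL, worker, &q) != 0)
            break;
        started++;
    }
    if (started == 0)
        worker(&q);              /* fall back to running inline */
    for (size_t i = 0; i < started; i++)
        pthread_join(tids[i], NULL);
    pthread_mutex_destroy(&q.lock);
    return started;
}
```

Calling `run_jobs(mark_done, done, 500)` with a 500-byte array queues 500 jobs but never has more threads than cores working on them, which is the per-process half of what the comment describes. The system-wide coordination is the part this sketch leaves out.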

It's a different problem domain. The kernel could obviously do something like this, and might potentially be a better place to do so, but it would be somewhat atypical for the Linux kernel team to provide a rich, highly functional interface for programmers. They don't like to maintain that stuff, so they keep it as minimalist as possible. Exporting that kind of rich ABI would seem politically impossible, no matter how much sense it might otherwise make.

Plus, Apple doesn't control any kernel but its own, so doing it one level higher makes more sense from the standpoint of trying to spread the concept and their particular interface to that concept.

With it being Apache-licensed, though, it's not going to spread very far anyway, so they pretty much shot themselves in the foot.

Apple Open Sources Snow Leopard's Grand Central Dispatch (Apple Insider)

Posted Sep 14, 2009 16:20 UTC (Mon) by epa (subscriber, #39769) [Link]

Why can't the kernel provide a batch scheduling class whereby threads or processes run in sequence? So if you have 2 processors, just start 500 threads but only two are scheduled, and others don't run until one of the first two threads finishes, or blocks for IO. That seems better than guessing how many threads to start at once.

Apple Open Sources Snow Leopard's Grand Central Dispatch (Apple Insider)

Posted Sep 14, 2009 16:32 UTC (Mon) by jzbiciak (✭ supporter ✭, #5246) [Link]

Indeed, it sounds an awful lot like the SCHED_FIFO scheduling class, except not quite as strict about the real-time priority.

Apple Open Sources Snow Leopard's Grand Central Dispatch (Apple Insider)

Posted Sep 14, 2009 16:48 UTC (Mon) by malor (subscriber, #2973) [Link]

> just start 500 threads

> That seems better than guessing how many threads to start at once.

You're still more or less guessing with that 500.

And yeah, the kernel probably would be a better place, but this kind of rich management API seems very against the minimalist Linux kernel team ethos. It strikes me that they'd insist, whether rightly or wrongly, that an API with this much management (like dependencies and subqueues) would have to be implemented in user space, which goes right back to GCD.

Apple Open Sources Snow Leopard's Grand Central Dispatch (Apple Insider)

Posted Sep 16, 2009 11:58 UTC (Wed) by epa (subscriber, #39769) [Link]

I'm not suggesting any kind of rich management API - just a scheduling class that runs threads one at a time and never pre-empts one thread to run another. If they are all batch jobs and all of the same priority, there is no need for preemption.

You're right, the choice to start 500 threads is itself a bit arbitrary. What I mean is if you know that (for whatever reason) you want to start 500, why hold back for fear of overwhelming the poor kernel with too many at once? Surely you should just be able to create them, and the kernel will decide for itself when to schedule each one.

Apple Open Sources Snow Leopard's Grand Central Dispatch (Apple Insider)

Posted Sep 14, 2009 8:19 UTC (Mon) by njs (guest, #40338) [Link]

I... still don't get it. Why would I want to system-wide run exactly as many CPU-bound threads as there are CPUs, assuming I have a half-decent kernel?

That article tries to justify it with nonsense like "if you have 8 cpus, 6 threads are running, and you start 4 threads, then those 4 threads have to spend all their time on the 2 remaining CPUs", but of course in reality all 10 threads would be scheduled across all 8 cpus, and our program with 4 threads would get a larger share of the total CPU than if it only spawned 2 threads.

If I have a big video transcode running in the background with one thread per cpu, and then I interactively apply a filter in Photoshop, I don't really want that interactive job to politely become sequential rather than step on the batch job's toes. And it's not like anything *bad* will happen if they each spawn 1 thread per CPU and then share.

Obviously if you have enough threads then at some point context switch overhead and cache effects will become overwhelming, but this is just a latency/throughput trade-off -- GCD only avoids context-switch slowdowns because it is willing to decide that some code can run now, and some will have to wait (how does it know which is which?). I already have a rich vocabulary for managing that trade-off (SCHED_IDLE/SCHED_BATCH/nice/cgroups/etc.) without reimplementing my scheduler in userspace...
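For readers unfamiliar with the "vocabulary" mentioned here, a batch job can mark itself as low priority with two existing calls. This is a minimal sketch, assuming a POSIX system for the nice value and Linux with glibc for `SCHED_BATCH`; the function names `be_nice` and `request_batch_policy` are invented for the example, and the batch request is compiled out where `SCHED_BATCH` is not defined.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE   /* exposes SCHED_BATCH in glibc's <sched.h> */
#endif
#include <assert.h>
#include <errno.h>
#include <sched.h>
#include <sys/resource.h>

/* Raises the caller's nice value to at least min_nice (a higher nice
   value means a lower priority). Never lowers it, since that would need
   privileges. Returns 0 on success, -1 with errno set on failure. */
int be_nice(int min_nice)
{
    errno = 0;
    int cur = getpriority(PRIO_PROCESS, 0);
    if (cur == -1 && errno != 0)
        return -1;               /* -1 is also a legal nice value */
    if (cur >= min_nice)
        return 0;                /* already at least this nice */
    return setpriority(PRIO_PROCESS, 0, min_nice);
}

/* Asks the kernel to treat the caller as a CPU-bound batch job.
   Linux-specific; returns 0 on success, -1 if unsupported or refused. */
int request_batch_policy(void)
{
#ifdef SCHED_BATCH
    struct sched_param param = { .sched_priority = 0 };
    return sched_setscheduler(0, SCHED_BATCH, &param);
#else
    errno = ENOSYS;
    return -1;
#endif
}
```

A background transcoder might call `be_nice(10)` and `request_batch_policy()` at startup and then spawn one thread per CPU, leaving it to the kernel scheduler to favour interactive work.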

Apple Open Sources Snow Leopard's Grand Central Dispatch (Apple Insider)

Posted Sep 14, 2009 8:37 UTC (Mon) by dlang (✭ supporter ✭, #313) [Link]

if you are running batch jobs that do not interact with anything then the most efficient way to run them is also the most unfair: you pick one per core and run it to completion with no context switches at all.

now in the real world, almost nothing short of HPC cluster jobs truly avoids interacting with anything. even normal batch jobs need to get data from disk, at which point that thread must sit idle waiting for the disk and another thread could run instead.

don't think of this like you would a normal kernel scheduler. think of this like you would a userspace batch engine. it tries to maximize the total throughput at the expense of latency

Apple Open Sources Snow Leopard's Grand Central Dispatch (Apple Insider)

Posted Sep 14, 2009 12:20 UTC (Mon) by malor (subscriber, #2973) [Link]

Well, that isn't the only problem it appears to be trying to solve.

From what I can see, there seem to be four things in play here:

1. Systemic scheduling of threads for absolute maximum throughput, largely disregarding latency;
2. Thread re-use on a platform that finds them expensive to create;
3. Very, very easy integration of multicore processing into formerly single-threaded applications. This is a critical feature; with a few lines of special wrapping code, using blocks, programmers can do batch threading with little additional effort.
4. Very broad scalability. The whole concept of sharing CPUs may just go away as we keep adding more and more. At the rate things are going, we're going to have desktops with 256 cores within 10 years. If you have 256 full speed CPUs, why on earth would you even share most of them? The whole concept of a scheduler is probably going to need to be rethought soon, and GCD would appear to be one early attempt.

Apple Open Sources Snow Leopard's Grand Central Dispatch (Apple Insider)

Posted Sep 14, 2009 20:56 UTC (Mon) by Tomislav (guest, #56442) [Link]

> At the rate things are going, we're going to have
> desktops with 256 cores within 10 years.

I wouldn't count on it. We are getting close to the physical boundaries in the photolithography technology so there would have to be a real revolution in the manufacturing process if this is to be. Clock speeds have remained around 2.5-3.5 GHz for a while now and it's doubtful that having more than 6-8 cores on the desktop is useful enough to offset the cost of such chips.

Apple Open Sources Snow Leopard's Grand Central Dispatch (Apple Insider)

Posted Sep 14, 2009 23:31 UTC (Mon) by jzbiciak (✭ supporter ✭, #5246) [Link]

Well, that's true if the core size stays the same and the cores stay homogeneous. GPUs already have 100s of simple cores, and stuff like OpenCL allows your multithreaded, vectorizable app to run on such a heterogeneous setup today. Technologies such as Larrabee blur this further, putting dozens of simple x86 ISA cores out there.

I think you're right that you won't see 256 i7-class CPUs in a computer. I think you're wrong that you won't see 256 application processors in a computer. You will probably have a couple beefy CPUs for the single-threaded portions, and lots of simpler processors to sop of the rest.

Apple Open Sources Snow Leopard's Grand Central Dispatch (Apple Insider)

Posted Sep 14, 2009 23:35 UTC (Mon) by jzbiciak (✭ supporter ✭, #5246) [Link]

Errr... sop *up* the rest.

Apple Open Sources Snow Leopard's Grand Central Dispatch (Apple Insider)

Posted Sep 18, 2009 9:43 UTC (Fri) by trasz (guest, #45786) [Link]

It may be worth mentioning that libdispatch has just been ported to FreeBSD: http://libdispatch.macosforge.org/trac/changeset/27

Copyright © 2009, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds