LWN.net Logo

GCC and static analysis

GCC and static analysis

Posted Apr 21, 2012 22:45 UTC (Sat) by khim (subscriber, #9252)
In reply to: GCC and static analysis by neilbrown
Parent article: GCC and static analysis

Most programming really is fairly boring single threaded stuff.

Not really. Most programs need to deal with bazillion things simultaneously. Mouse, network, computations (even if they are simple they don't happen in zero time). Heck, even simple parse of the config takes unpredictable time because you need to find said config somewhere on slow HDD first (SSD is faster but still many times slower then CPU). Contemporary systems include bazillion tricks which make single-threaded programs somewhat acceptable (insanely fast CPUs and multitude of “transparent” caches are primary solution… but as I've said we've hit the wall there).

Of course when you've created all that it makes no sense to try to use functional programming in this environment: it does not really fit! The fact that all these “transparent accelerators” are not needed does not help you to produce the result faster and you need to hire more knowledgeable, more expensive programmers!

I've always thought that functional languages are great for writing interesting algorithms but just didn't make sense for writing complete applications.

On the existing hardware, sure, it makes no sense. But I'm not so sure it'll be true in 10, 20, or 30 years time. We've hit the wall with CPUs about 10 years ago. I think we'll hit similar wall WRT memory soon. It'll be interesting to see how long we'll be able to pretend that programs which utilize our systems by 1% of the capability or so are exactly what we need.


(Log in to post comments)

GCC and static analysis

Posted Apr 23, 2012 15:23 UTC (Mon) by cmccabe (guest, #60281) [Link]

There are other ways to write imperative programs besides using locks and condition variables. Check out Golang for an example of an imperative language built around message passing between lightweight threads. Such a language could easily scale to thousands of cores.

Whether or not something can be parallelized is more related to the algorithm than the programming language you write it in. There are algorithms like Blowfish which require a fixed number of iterations and simply cannot be parallelized no matter what McGuffin you use. I wrote a compiler once in OCaml (a functional programming language), and there was a lot of stuff I put in there that couldn't be parallelized (for both good and bad reasons.)

GCC and static analysis

Posted Apr 23, 2012 21:40 UTC (Mon) by khim (subscriber, #9252) [Link]

Check out Golang for an example of an imperative language built around message passing between lightweight threads. Such a language could easily scale to thousands of cores.

Golang still produces programs which connects sequential islands with pipes. To efficiently ulitize millions of cores you need to parallelize task as much as possible. Not sure about thousands: perhaps Go will scale to this level, perhaps not, time will tell.

It's possible to write anything in imperative languages (any functional language is translated to imperative ISA of target CPU, after all). But I'm not sure if such a feat is achievable by someone who can not program in functional languages - and if you need to employ such people anyway, then it makes to sense to strangle them with imperative programming.

Whether or not something can be parallelized is more related to the algorithm than the programming language you write it in.

Sure, but that's the problem: when you write programs of thousand of lines they you usually think that you are implementing one single algorithm. Which may be parallelziable or not. But in reality if implement hundreds of tiny subalgorithms there - and even if the "main" algorithm is not parallelizeable these subalgorithms may be parallelziable.

Think about map::insert example again. Suppose you have sequence of keys and you need to print for each key the line number. High-level algorithm employed by the imperative programmer and by functional programmer may be the same (insert all the elements in the map or, better yet, priority queue, pull them one after another), but imperative program (even written in Go) will, most likely, use just one core while functional program (given smart enough compiler) will meaningfully saddle dozen of core with meaningful work!

The problem we hit here is simple: contemporary CPUs are so much tuned for streamlined execution of imperative programs that they are useless for such organization of the work! Sure, you can meaningfully use dozen of cores for the aforementioned task… in your dreams. In real life all these countless buffers and caches start working against you and lose so much speed in communications between them that functional program may be even slower on dozen of cores then imperative one on a single core!

You need completely different organization of the hardware to effectively utilize this kind of parallelism! It's possible to create it, but then it'll be slow when you'll try to run the existing programs so who'll buy it? Thus I'm sure that till SMP paradigm will work we'll be stuck with the existing architecture and imperative languages.

GCC and static analysis

Posted Apr 24, 2012 3:43 UTC (Tue) by cmccabe (guest, #60281) [Link]

I honestly hadn't thought about the problems that would arise with millions of cores. I guess you could call it a failure of imagination on my part. But on the other hand, even if the number of cores doubles every year, it would still take about 20 years for us to get to that point. I find it hard to make predictions about things 5 years from now, let alone 20 years away.

Even if you somehow had a chip with millions of cores, and a functional programming language that could harness that chip perfectly, there would still be limits. Lets say you had a map-reduce operation that could efficiently distribute out the work to all million cores, and then reduce it to a single dataset. To use a million functional units, you need at least a million-way parallelism. How many data sets need this kind of parallelism?

I guess there are a lot of things that could happen between then and now. One is that we could stop taking the Moore's law dividend in terms of performance, and start taking it in terms of increased battery life, lower thermal dissipation, and so forth. There is evidence that this is already happening, with the popularity of various pods and pads.

The other is that special-purpose hardware could evolve to do the common tasks that we currently use a general-purpose CPU for. So instead of a PC running MySQL, you would have some kind of storage device that understood SQL. The hardware itself would understand and process SQL-- there would be no need to for busses to slowly shuttle data back and forth, but instead, a single die intermingling flash memory and small cores. There might be another combined physics engine and graphics chip used for games.

In this future, general purpose CPUs still exist, but only as tiny add-ons to beefy co-processors. This future is not that unrealistic: most programmers don't work on MySQL or write physics engines. They write business logic, or do level design. Why should they be concerned with how those things work? As always, capitalism produces more and more specialists in narrower and narrower fields.

The other thing that could happen is that FPGAs could go mainstream. This is probably the least likely possibility, but it's the most exciting. In this case, you would basically be able to generate your own hardware on the fly. This would be the best possibility from the open source point of view, of course.

GCC and static analysis

Posted Apr 28, 2012 17:05 UTC (Sat) by khim (subscriber, #9252) [Link]

To use a million functional units, you need at least a million-way parallelism. How many data sets need this kind of parallelism?

Most of them? Compare our lighting-speed tiny-number-of-cores computer with super-duper-slow bazillion-cores computer (called human brain) and you'll see that most tasks out there can be solved quite efficiently using massively parallel computer. Some can not (that's why regular computers were invented in first place) but most can. This is only achievable with totally different paradigm and approach to programming.

Why should they be concerned with how those things work? As always, capitalism produces more and more specialists in narrower and narrower fields.

Nope. This process is more-or-less finished. Speed of light stopped The Megahertz War. Size of Earth population is stopping future specialization. This process is at it's end. Will capitalism survive? Who knows, it's interesting question. But it'll stop producing "specialists in narrower and narrower fields", that's for sure.

GCC and static analysis

Posted Apr 29, 2012 8:33 UTC (Sun) by paulj (subscriber, #341) [Link]

Nit: Wasn't it quantum tunneling effects and resultant current leakage that killed the MHz war?

It's surely not speed of light, cause electrons/information in copper guides do not propagate at that speed, and I think its much slower again through CMOS gates. Hence the research interest in photonic computing.

GCC and static analysis

Posted Apr 29, 2012 10:45 UTC (Sun) by khim (subscriber, #9252) [Link]

Nit: Wasn't it quantum tunneling effects and resultant current leakage that killed the MHz war?

Nope. These can and do limit speed - but this limit is near terahertz. Worse: this limit have not stopped growing in the last decade: it went from about 100MHz to 800MHz+. Yet CPUs have stopped at 4-5GHz. Pentium4 was famously unable to reach this limit (fastest model reached 3.8GHz) while POWER6 (with it's entirely different architecture) stopped at 5GHz.

There are many different processes which limit the speed of CPU but they all are related to propagation of signals - and these are measured in fractions of speed of light. Refractive index of silicone is large (about 4, but, as usual, it depends on frequency) while refractive index of copper is low (about 1.10 - but, again, it depends on frequency). In the end the fact that at frequency of 100GHz speed can only pass 3mm in vacuum means working CPUs can not be built at this frequency.

Note that frequency of x86 CPUs is still incresing - but at totally different speed. Nehalem reached 3.86GHz at 45nm (Core™ i5-680) while Sandy Bridge reached 4GHz at 32nm (Xeon® E3-1290). You may say that it's limited by thermal envelope (as current leakage will predict), but I'll just point out that Core™ i5-680 is dual-core CPU while Xeon® E3-1290 is quad-core CPU. Where is my single-core 6GHz CPU? I know a lot of guys who'll be ready to pay good money for such a beast.

GCC and static analysis

Posted Apr 29, 2012 18:11 UTC (Sun) by Cyberax (✭ supporter ✭, #52523) [Link]

Well, there are asynchronous chips that can work without clocking entirely.

They are easy to create for low 'frequencies', but it becomes harder and harder to design them as you increase their complexity. However they are not limited by the lightspeed.

We still have a lot of way to go. Probably, we'll reach at least 20-30GHz with carbon-based transistors.

GCC and static analysis

Posted Apr 29, 2012 18:21 UTC (Sun) by apoelstra (subscriber, #75205) [Link]

A light-nanosecond is roughly 30cm (or roughly a foot, if you'd like). So at 1Ghz, light in a vacuum can reach 30cm per tick. Electrical signals in copper travel at around (1/3)c can reach 10cm.

(I find it very funny that we can use such nice numbers, given that both the meter and second were defined long before c was known.)

So at 20Ghz, this is cut to 5mm. In other words, your die can have a diameter of at most 5mm, no matter what crazy architecture you can imagine, if you want the processor to function at 20Ghz. So in a space of (5mm)^2, you're sending 20 000 000 000 signals per second, which need to be much stronger than background noise.

This does not sound possible to me.

GCC and static analysis

Posted Apr 29, 2012 18:28 UTC (Sun) by Cyberax (✭ supporter ✭, #52523) [Link]

>A light-nanosecond is roughly 30cm (or roughly a foot, if you'd like). So at 1Ghz, light in a vacuum can reach 30cm per tick. Electrical signals in copper travel at around (1/3)c can reach 10cm.

Yep, I have one nano-lightsecond length of wire in a frame on a wall at home (it is a gift from a friend who got it from Admiral Grace Hopper herself).

>So at 20Ghz, this is cut to 5mm. In other words, your die can have a diameter of at most 5mm, no matter what crazy architecture you can imagine
Why? Async chips do not depend (strongly) on the speed of light. One part of an async chip can be doing completely unrelated tasks. Of course, in the end you'll hit the lightspeed wall, but it's a long way off.

>So in a space of (5mm)^2, you're sending 20 000 000 000 signals per second, which need to be much stronger than background noise.
So? Modern network routers work at about the same speed and die size.

GCC and static analysis

Posted Apr 30, 2012 15:10 UTC (Mon) by khim (subscriber, #9252) [Link]

Why? Async chips do not depend (strongly) on the speed of light.

Async has huge PR value (you can claim you chip is working on 100GHz or even 1THz if you are lucky). It does not change anything WRT to actual speed of computation.

One part of an async chip can be doing completely unrelated tasks.

It's the same with synchronous schemes as well. Why do you think Prescott had over 30 stages in pipeline? Why do you think Itanic has L1, L2 and L3 caches - all on the same chip with the same semiconductor process technology?

Of course, in the end you'll hit the lightspeed wall, but it's a long way off.

In the end the nominal number on the CPU envelope is just a number. Sure, you can create totally-async 1THz chip which will need "about 1000 tics" to add two numbers together, but it'll be slower then Core i7 so what's the point? Lightspeed barrier it as real for async as it's for traditional CPU.

Note that lightspeed barrier only limits latency (of single-threaded program in this particular case), it does not affect throughput at all. In fact a way to create tremendous throughput is well known: just use more of the same things!

So? Modern network routers work at about the same speed and die size.

They don't execute sequential program, that's totally different kettle of fish. For them streaming async design makes perfect sense. For GPUs it can be applied, too. For CPUs? Not so much. Netburst tried to do that. It failed. I doubt anyone will actually try to do it again.

1mm cube?

Posted May 5, 2012 3:28 UTC (Sat) by gmatht (guest, #58961) [Link]

If thermal density wasn't a problem, we should be able to fit lots of 22nm transistors into a 1mm cube.

GCC and static analysis

Posted Apr 23, 2012 23:43 UTC (Mon) by marcH (subscriber, #57642) [Link]

> Whether or not something can be parallelized is more related to the algorithm than the programming language you write it in.

In theory yes. But in practice the language and its environment (built-in features, libraries, "culture",...) have a strong influence on the choice of algorithm(s). You just gave an example yourself!

Copyright © 2013, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds