LWN.net Logo

GCC and static analysis

By Jonathan Corbet
April 21, 2012
Concurrency tends to make programming hard. Kernel development obviously involves dealing with a lot of concurrency, but there is also a lot of multi-threaded user-space development that suffers from the same issues. It would be nice if the computer could help developers avoid race conditions and other problems that arise in concurrent environments. Some developers at Google have been working on just such a project for some time, but they have just relocated the project from GCC to the LLVM Clang compiler, saying that GCC is not suited to the work they want to do. The result has been a sort of wake-up call for GCC developers. Is the GCC compiler suite not well suited to the creation of static analysis tools?

Like kernel programming, multi-threaded user-space programming involves creating and using locks to prevent concurrent access to shared data. In a properly designed and implemented locking scheme, code will never see inconsistent views of shared data, and that data will not change when the code is not expecting changes. Getting to that point is hard, though, and the bugs that result from locking mistakes can be hard to reproduce and hard to diagnose. There are few things more frustrating than a highly intermittent bug that seemingly makes no sense and defies efforts to track it down.

In 2008, Google developer Le-Chun Wu announced a project to add support for "thread safety annotations" in C and C++ programs to the GCC compiler. Since then, work has progressed to the point that the developers have a useful system that is employed by a number of internal projects. The ideas are relatively straightforward. Shared data requiring a lock is annotated with something like:

    Mutex data_mutex;
    struct shared_data Data GUARDED_BY(data_mutex);

Functions that manipulate locks are annotated with:

    class some_class {
  	Mutex mutex;
	void get_lock() EXCLUSIVE_LOCK_FUNCTION(mutex) { ... }
	void put_lock() UNLOCK_FUNCTION(mutex) { ... }
	/* ... */
    };

There are also annotations for shared locks and "trylock" functions that may not succeed. If a function expects a given lock to be held when it is called, it can be annotated with EXCLUSIVE_LOCKS_REQUIRED(); there is also a LOCKS_EXCLUDED() annotation for functions that will acquire non-nesting locks themselves. Finally, this construction can be used for lock ordering constraints:

    class some_class {
	Mutex m1;
	Mutex m2 ACQUIRED_AFTER(m1);
	/* ... */
    };

Some other annotations exist; they can be seen in this slide deck [PDF], which is where your editor stole the above examples from.

The GCC implementation sets itself up as an early optimization pass. It builds a representation of the code from the GIMPLE internal representation, tracking which locks are held at each point through the function. When problems or inconsistencies are found, the compiler raises the alarm, hopefully causing the problem to be fixed before it gets into production code and bites somebody.

This code is available in a branch in the GCC repository and appears to be useful, so it came as a bit of a surprise when, on April 19, Google developer Diego Novillo announced that the project had been terminated, and that the group was now working to implement the same functionality in the LLVM Clang compiler instead. When asked why the group was making this change, developer Delesley Hutchins responded:

The gcc version has been difficult to support and maintain, due mainly to the fact that the GIMPLE intermediate language was never designed for static analysis. The abstract syntax tree provided by Clang is an easier data structure to work with for front-end analyses of this kind.

The response added that the GCC implementation has "some issues" that would make it hard to merge into the GCC mainline, while the Clang implementation has been in the trunk all along.

The GCC developers, naturally, would like to understand what it is about their compiler that made this project hard. The Google team was hesitant to respond, seemingly unwilling to criticize GCC or to cause a compiler flame war. Eventually, though, Delesley posted a more detailed description of the kinds of difficulties they had run into. There is a lot of information there, but it seems to come down to two crucial points:

  • The GIMPLE representation loses information about the structure of the original program that the static analysis pass really needs to have. LLVM, instead, builds an abstract syntax tree that much more closely matches the original C/C++ syntax; additional structures are then based on that tree. LLVM's tree is evidently well suited to the task of static analysis.

  • The GIMPLE representation changes significantly from one compiler release to the next, causing a lot of things to break. That adds a maintenance cost that the Google developers are unwilling to pay - especially if basing on LLVM instead makes that cost go away.

There were other things that were better for their purposes in GCC, but, in the balance, the developers concluded that LLVM is the better platform for their work. They seem determined to make the shift and feel that attempts to fix the problems they encountered in GCC would create difficulties elsewhere in the compiler. So this move appears to be a done deal.

In one sense, this decision lacks serious implications for the free software community. LLVM, too, is free software and the static analysis code will be part of it. That said, it is worrisome if GCC's internal structure truly turns out to be poorly suited to static analysis tasks. There is a lot of interesting work being done in the static analysis area; it offers the prospect of finding large numbers of bugs early in the development process. Static analysis tools will almost certainly be an increasingly important part of many developers' tool boxes. If those tools cannot easily be implemented with GCC, that compiler's future may not be as bright as it otherwise should be.

That said, it is dangerous to extrapolate from one project to the whole field of static analysis tools. The GCC plugin mechanism is just beginning to mature; it really has not had the time to turn into a platform that complex tools can be built on top of. So, while this episode is a warning that the GCC developers should hear (and evidently have heard quite well), it should not be seen as evidence of fatal design flaws. Likely as not, it is just an indication of a problem in need of solution - something the GCC community is good at.


(Log in to post comments)

GCC and static analysis

Posted Apr 21, 2012 16:48 UTC (Sat) by JohnLenz (subscriber, #42089) [Link]

Like kernel programming, multi-threaded user-space programming involves creating and using locks to prevent concurrent access to shared data.

I guess you haven't read this blog post which has been making the rounds recently. Also, this talk by Simon Peyton Jones gives a more in depth overview of things like software transactional memory, data parallel arrays, channels, and so on which allow concurrency with no locks.

GCC and static analysis

Posted Apr 21, 2012 17:04 UTC (Sat) by juliank (subscriber, #45896) [Link]

GCC and static analysis

Posted Apr 21, 2012 18:45 UTC (Sat) by ballombe (subscriber, #9523) [Link]

Why all the man attending Simon Peyton Jones talk are bald ?

GCC and static analysis

Posted Apr 21, 2012 22:19 UTC (Sat) by nix (subscriber, #2304) [Link]

Oh, come on. They're *functional programmers*. There can be only one answer: their sheer heat of intellect has burned their hair away.

GCC and static analysis

Posted Apr 21, 2012 22:38 UTC (Sat) by neilbrown (subscriber, #359) [Link]

> Oh, come on. They're *functional programmers*.

So I guess the rest of us are *dysfunctional programmers*??

(sounds like a good name for a blog - "dysfunctional programming" is taken, but not "The Dysfunctional Programmer")

GCC and static analysis

Posted Apr 26, 2012 12:03 UTC (Thu) by nix (subscriber, #2304) [Link]

Either a blog or a geeky rock band.

GCC and static analysis

Posted Apr 22, 2012 2:25 UTC (Sun) by skx (subscriber, #14652) [Link]

Lazy evaluation ..

GCC and static analysis

Posted Apr 21, 2012 19:18 UTC (Sat) by linusw (subscriber, #40300) [Link]

So the declaration of Imperative languages (BTW a term only used by the opponents of this class of languages IIRC) has been stated before. To be more precise, in the last multi-CPU crisis in the 1970s. The fifth generation computers were a response to that last crisis:
http://en.wikipedia.org/wiki/Fifth_generation_computer

So they actually designed the whole OS for that machine in KL1, based on Prolog, which is a functional language. I think the code is actually open source. There is a KL1-to-C translator but the FGCS inference machine ran the language natively, in machine code.

So why did the last crisis not switch all computer and language development to functional languages? What do I know. One official story goes that speculative execution and caches made imperative languages run faster than any of the machines designed to run e.g. KL1, and performace was all that mattered in the end.

Am I saying that the current "multicore crisis" will be solved by something totally different again? Not really, I like functional languages so I hope they will fit the bill this time.

GCC and static analysis

Posted Apr 21, 2012 19:53 UTC (Sat) by khim (subscriber, #9252) [Link]

So why did the last crisis not switch all computer and language development to functional languages?

Because it made no sense? Functional languages have many advantages and one, yet crippling disadvantage: most programmers can't use them. Not as in "they don't know them", but as in "they'll never be able to learn them". And the same is true for concurrent programming in traditional imperative languages!

It's hard to say right now if functional programming will win or not, but I'm pretty sure it'll do. Not because it's better alternative, but because it'll be the only alternative. It'll happen in the same way multicores won the war: not because they were better, but because they become the only possibility after certain threshold! They were always better (transputers did amazing things quarter-century ago) but till AMD and Intel were able to provide faster single-core chips mainstream stayed single core…

Well, if functional languages will win… then when this may happen? Hard to say. There are the whole industry right now which tries to somehow help people who just can not write concurrent or functional code to still use multicore systems (tools parent article discusses is very much in this bucket), but as number of cores grows it becomes harder and harder to do. Beyond certain threshold the whole thing will just fall apart and something like Cell will be the only alternative - but this time lies years from now. We know contemporary SMP/Numa systems can handle tens of cores, they will probably never be able to handle millions, but where exactly the threshold lies we'll know only aposteory: after Pentium4-like fiasco when “latest and greatest” creature will be found to be slower then it's predecessor. Then and only then functional languages will become mainstream.

And of course even after that point a lot of guys will still stick with imperative programming (like today a lot of guys write single-threaded programs still) because it's easier.

GCC and static analysis

Posted Apr 21, 2012 20:03 UTC (Sat) by pcarrier (subscriber, #65446) [Link]

Let's not forget that single-threaded programming works perfectly fine in many cases, as there isn't any need to scale to multiple cores.
Not every piece of software involves CPU-intensive tasks on shared data. A few single-threaded processes will use my cores just fine, thanks.

GCC and static analysis

Posted Apr 21, 2012 21:22 UTC (Sat) by neilbrown (subscriber, #359) [Link]

Absolutely agreed.
Most programming really is fairly boring single threaded stuff. Imperative languages excel at that.
Sometimes you do need to do something interesting and highly parallel and having a functional language - or language subset - for those situations makes lots of sense.
I've always thought that functional languages are great for writing interesting algorithms but just didn't make sense for writing complete applications. As one visionary repeatedly says "Use the right tool for the job".

GCC and static analysis

Posted Apr 21, 2012 22:45 UTC (Sat) by khim (subscriber, #9252) [Link]

Most programming really is fairly boring single threaded stuff.

Not really. Most programs need to deal with bazillion things simultaneously. Mouse, network, computations (even if they are simple they don't happen in zero time). Heck, even simple parse of the config takes unpredictable time because you need to find said config somewhere on slow HDD first (SSD is faster but still many times slower then CPU). Contemporary systems include bazillion tricks which make single-threaded programs somewhat acceptable (insanely fast CPUs and multitude of “transparent” caches are primary solution… but as I've said we've hit the wall there).

Of course when you've created all that it makes no sense to try to use functional programming in this environment: it does not really fit! The fact that all these “transparent accelerators” are not needed does not help you to produce the result faster and you need to hire more knowledgeable, more expensive programmers!

I've always thought that functional languages are great for writing interesting algorithms but just didn't make sense for writing complete applications.

On the existing hardware, sure, it makes no sense. But I'm not so sure it'll be true in 10, 20, or 30 years time. We've hit the wall with CPUs about 10 years ago. I think we'll hit similar wall WRT memory soon. It'll be interesting to see how long we'll be able to pretend that programs which utilize our systems by 1% of the capability or so are exactly what we need.

GCC and static analysis

Posted Apr 23, 2012 15:23 UTC (Mon) by cmccabe (guest, #60281) [Link]

There are other ways to write imperative programs besides using locks and condition variables. Check out Golang for an example of an imperative language built around message passing between lightweight threads. Such a language could easily scale to thousands of cores.

Whether or not something can be parallelized is more related to the algorithm than the programming language you write it in. There are algorithms like Blowfish which require a fixed number of iterations and simply cannot be parallelized no matter what McGuffin you use. I wrote a compiler once in OCaml (a functional programming language), and there was a lot of stuff I put in there that couldn't be parallelized (for both good and bad reasons.)

GCC and static analysis

Posted Apr 23, 2012 21:40 UTC (Mon) by khim (subscriber, #9252) [Link]

Check out Golang for an example of an imperative language built around message passing between lightweight threads. Such a language could easily scale to thousands of cores.

Golang still produces programs which connects sequential islands with pipes. To efficiently ulitize millions of cores you need to parallelize task as much as possible. Not sure about thousands: perhaps Go will scale to this level, perhaps not, time will tell.

It's possible to write anything in imperative languages (any functional language is translated to imperative ISA of target CPU, after all). But I'm not sure if such a feat is achievable by someone who can not program in functional languages - and if you need to employ such people anyway, then it makes to sense to strangle them with imperative programming.

Whether or not something can be parallelized is more related to the algorithm than the programming language you write it in.

Sure, but that's the problem: when you write programs of thousand of lines they you usually think that you are implementing one single algorithm. Which may be parallelziable or not. But in reality if implement hundreds of tiny subalgorithms there - and even if the "main" algorithm is not parallelizeable these subalgorithms may be parallelziable.

Think about map::insert example again. Suppose you have sequence of keys and you need to print for each key the line number. High-level algorithm employed by the imperative programmer and by functional programmer may be the same (insert all the elements in the map or, better yet, priority queue, pull them one after another), but imperative program (even written in Go) will, most likely, use just one core while functional program (given smart enough compiler) will meaningfully saddle dozen of core with meaningful work!

The problem we hit here is simple: contemporary CPUs are so much tuned for streamlined execution of imperative programs that they are useless for such organization of the work! Sure, you can meaningfully use dozen of cores for the aforementioned task… in your dreams. In real life all these countless buffers and caches start working against you and lose so much speed in communications between them that functional program may be even slower on dozen of cores then imperative one on a single core!

You need completely different organization of the hardware to effectively utilize this kind of parallelism! It's possible to create it, but then it'll be slow when you'll try to run the existing programs so who'll buy it? Thus I'm sure that till SMP paradigm will work we'll be stuck with the existing architecture and imperative languages.

GCC and static analysis

Posted Apr 24, 2012 3:43 UTC (Tue) by cmccabe (guest, #60281) [Link]

I honestly hadn't thought about the problems that would arise with millions of cores. I guess you could call it a failure of imagination on my part. But on the other hand, even if the number of cores doubles every year, it would still take about 20 years for us to get to that point. I find it hard to make predictions about things 5 years from now, let alone 20 years away.

Even if you somehow had a chip with millions of cores, and a functional programming language that could harness that chip perfectly, there would still be limits. Lets say you had a map-reduce operation that could efficiently distribute out the work to all million cores, and then reduce it to a single dataset. To use a million functional units, you need at least a million-way parallelism. How many data sets need this kind of parallelism?

I guess there are a lot of things that could happen between then and now. One is that we could stop taking the Moore's law dividend in terms of performance, and start taking it in terms of increased battery life, lower thermal dissipation, and so forth. There is evidence that this is already happening, with the popularity of various pods and pads.

The other is that special-purpose hardware could evolve to do the common tasks that we currently use a general-purpose CPU for. So instead of a PC running MySQL, you would have some kind of storage device that understood SQL. The hardware itself would understand and process SQL-- there would be no need to for busses to slowly shuttle data back and forth, but instead, a single die intermingling flash memory and small cores. There might be another combined physics engine and graphics chip used for games.

In this future, general purpose CPUs still exist, but only as tiny add-ons to beefy co-processors. This future is not that unrealistic: most programmers don't work on MySQL or write physics engines. They write business logic, or do level design. Why should they be concerned with how those things work? As always, capitalism produces more and more specialists in narrower and narrower fields.

The other thing that could happen is that FPGAs could go mainstream. This is probably the least likely possibility, but it's the most exciting. In this case, you would basically be able to generate your own hardware on the fly. This would be the best possibility from the open source point of view, of course.

GCC and static analysis

Posted Apr 28, 2012 17:05 UTC (Sat) by khim (subscriber, #9252) [Link]

To use a million functional units, you need at least a million-way parallelism. How many data sets need this kind of parallelism?

Most of them? Compare our lighting-speed tiny-number-of-cores computer with super-duper-slow bazillion-cores computer (called human brain) and you'll see that most tasks out there can be solved quite efficiently using massively parallel computer. Some can not (that's why regular computers were invented in first place) but most can. This is only achievable with totally different paradigm and approach to programming.

Why should they be concerned with how those things work? As always, capitalism produces more and more specialists in narrower and narrower fields.

Nope. This process is more-or-less finished. Speed of light stopped The Megahertz War. Size of Earth population is stopping future specialization. This process is at it's end. Will capitalism survive? Who knows, it's interesting question. But it'll stop producing "specialists in narrower and narrower fields", that's for sure.

GCC and static analysis

Posted Apr 29, 2012 8:33 UTC (Sun) by paulj (subscriber, #341) [Link]

Nit: Wasn't it quantum tunneling effects and resultant current leakage that killed the MHz war?

It's surely not speed of light, cause electrons/information in copper guides do not propagate at that speed, and I think its much slower again through CMOS gates. Hence the research interest in photonic computing.

GCC and static analysis

Posted Apr 29, 2012 10:45 UTC (Sun) by khim (subscriber, #9252) [Link]

Nit: Wasn't it quantum tunneling effects and resultant current leakage that killed the MHz war?

Nope. These can and do limit speed - but this limit is near terahertz. Worse: this limit have not stopped growing in the last decade: it went from about 100MHz to 800MHz+. Yet CPUs have stopped at 4-5GHz. Pentium4 was famously unable to reach this limit (fastest model reached 3.8GHz) while POWER6 (with it's entirely different architecture) stopped at 5GHz.

There are many different processes which limit the speed of CPU but they all are related to propagation of signals - and these are measured in fractions of speed of light. Refractive index of silicone is large (about 4, but, as usual, it depends on frequency) while refractive index of copper is low (about 1.10 - but, again, it depends on frequency). In the end the fact that at frequency of 100GHz speed can only pass 3mm in vacuum means working CPUs can not be built at this frequency.

Note that frequency of x86 CPUs is still incresing - but at totally different speed. Nehalem reached 3.86GHz at 45nm (Core™ i5-680) while Sandy Bridge reached 4GHz at 32nm (Xeon® E3-1290). You may say that it's limited by thermal envelope (as current leakage will predict), but I'll just point out that Core™ i5-680 is dual-core CPU while Xeon® E3-1290 is quad-core CPU. Where is my single-core 6GHz CPU? I know a lot of guys who'll be ready to pay good money for such a beast.

GCC and static analysis

Posted Apr 29, 2012 18:11 UTC (Sun) by Cyberax (✭ supporter ✭, #52523) [Link]

Well, there are asynchronous chips that can work without clocking entirely.

They are easy to create for low 'frequencies', but it becomes harder and harder to design them as you increase their complexity. However they are not limited by the lightspeed.

We still have a lot of way to go. Probably, we'll reach at least 20-30GHz with carbon-based transistors.

GCC and static analysis

Posted Apr 29, 2012 18:21 UTC (Sun) by apoelstra (subscriber, #75205) [Link]

A light-nanosecond is roughly 30cm (or roughly a foot, if you'd like). So at 1Ghz, light in a vacuum can reach 30cm per tick. Electrical signals in copper travel at around (1/3)c can reach 10cm.

(I find it very funny that we can use such nice numbers, given that both the meter and second were defined long before c was known.)

So at 20Ghz, this is cut to 5mm. In other words, your die can have a diameter of at most 5mm, no matter what crazy architecture you can imagine, if you want the processor to function at 20Ghz. So in a space of (5mm)^2, you're sending 20 000 000 000 signals per second, which need to be much stronger than background noise.

This does not sound possible to me.

GCC and static analysis

Posted Apr 29, 2012 18:28 UTC (Sun) by Cyberax (✭ supporter ✭, #52523) [Link]

>A light-nanosecond is roughly 30cm (or roughly a foot, if you'd like). So at 1Ghz, light in a vacuum can reach 30cm per tick. Electrical signals in copper travel at around (1/3)c can reach 10cm.

Yep, I have one nano-lightsecond length of wire in a frame on a wall at home (it is a gift from a friend who got it from Admiral Grace Hopper herself).

>So at 20Ghz, this is cut to 5mm. In other words, your die can have a diameter of at most 5mm, no matter what crazy architecture you can imagine
Why? Async chips do not depend (strongly) on the speed of light. One part of an async chip can be doing completely unrelated tasks. Of course, in the end you'll hit the lightspeed wall, but it's a long way off.

>So in a space of (5mm)^2, you're sending 20 000 000 000 signals per second, which need to be much stronger than background noise.
So? Modern network routers work at about the same speed and die size.

GCC and static analysis

Posted Apr 30, 2012 15:10 UTC (Mon) by khim (subscriber, #9252) [Link]

Why? Async chips do not depend (strongly) on the speed of light.

Async has huge PR value (you can claim you chip is working on 100GHz or even 1THz if you are lucky). It does not change anything WRT to actual speed of computation.

One part of an async chip can be doing completely unrelated tasks.

It's the same with synchronous schemes as well. Why do you think Prescott had over 30 stages in pipeline? Why do you think Itanic has L1, L2 and L3 caches - all on the same chip with the same semiconductor process technology?

Of course, in the end you'll hit the lightspeed wall, but it's a long way off.

In the end the nominal number on the CPU envelope is just a number. Sure, you can create totally-async 1THz chip which will need "about 1000 tics" to add two numbers together, but it'll be slower then Core i7 so what's the point? Lightspeed barrier it as real for async as it's for traditional CPU.

Note that lightspeed barrier only limits latency (of single-threaded program in this particular case), it does not affect throughput at all. In fact a way to create tremendous throughput is well known: just use more of the same things!

So? Modern network routers work at about the same speed and die size.

They don't execute sequential program, that's totally different kettle of fish. For them streaming async design makes perfect sense. For GPUs it can be applied, too. For CPUs? Not so much. Netburst tried to do that. It failed. I doubt anyone will actually try to do it again.

1mm cube?

Posted May 5, 2012 3:28 UTC (Sat) by gmatht (guest, #58961) [Link]

If thermal density wasn't a problem, we should be able to fit lots of 22nm transistors into a 1mm cube.

GCC and static analysis

Posted Apr 23, 2012 23:43 UTC (Mon) by marcH (guest, #57642) [Link]

> Whether or not something can be parallelized is more related to the algorithm than the programming language you write it in.

In theory yes. But in practice the language and its environment (built-in features, libraries, "culture",...) have a strong influence on the choice of algorithm(s). You just gave an example yourself!

GCC and static analysis

Posted Apr 21, 2012 22:34 UTC (Sat) by ballombe (subscriber, #9523) [Link]

> A few single-threaded processes will use my cores just fine, thanks.
Depend how many core you have.

But the issue is that algorithms that scale linearly are not known for a lot of problems. In most case, one get a speed up of at best 3, while using two time more total CPU time (on 8 cores. Using more cores would only increase the wastage but not the speed). So the incentive is not that large.

GCC and static analysis

Posted Apr 22, 2012 11:36 UTC (Sun) by KSteffensen (subscriber, #68295) [Link]

>> Not as in "they don't know them", but as in "they'll never be able to learn them".

That's a rather arrogant statement. What makes you say so?

GCC and static analysis

Posted Apr 22, 2012 12:09 UTC (Sun) by khim (subscriber, #9252) [Link]

Experience. I've teached CS at one time and we had functional programming as a mandatory course (toy simplified language, nothing too fancy). Small group of students took to it immediately and did everything in very short amount of time (later we added quite a bunch of optional tasks to make sure they'll have things to do in the course). Yet significant percentage of students were never able to do even simple things in the course without extereme struggles and it often looked that they employ “combinatorial programming” (shuffle symbols and variable names around till you'll get working program by some random chance). Many of them went on to become SWE.

GCC and static analysis

Posted Apr 22, 2012 16:58 UTC (Sun) by rwst (guest, #84121) [Link]

Combinatorial, heh, I know the beginner's experience with Haskell. Headers, globals/locals, data structures ok but I never knew when to put $ or (...). Finally, monads are a black art, but presumably the most powerful construct. Though I didn't stay in that language, I would come back and try again should I need to.

Is it true what someone said: to parallelize a Haskell program needs the mere setting of a switch?

GCC and static analysis

Posted Apr 22, 2012 17:29 UTC (Sun) by khim (subscriber, #9252) [Link]

Is it true what someone said: to parallelize a Haskell program needs the mere setting of a switch?

This is exaggeration but with a kernel of truth. Programs in functional languages are usually easily parallelizable by a compiler if the underlying algorithm is parallelizable. But it is possible to write even well-parallelizable algorithm in such a way as to make it impossible. This is an art, though - and an ugly one. Trivial proof-of-concept: implement Turing Machine on top of Haskell, implement your algorithm on top of said machine… mission accomplished.

GCC and static analysis

Posted Apr 23, 2012 11:41 UTC (Mon) by sorpigal (subscriber, #36106) [Link]

Interesting. Have you read "The camel has two humps"? It seems like this might be related. To quote the introduction:

most people can’t learn to program: between 30% and 60% of every university computer science department’s intake fail the first programming course.

Does a similar curve to the one they describe (e.g. figure 14) occur for functional programming, or is it different?

Learning to program

Posted Apr 23, 2012 13:40 UTC (Mon) by tialaramex (subscriber, #21167) [Link]

When I learned, the introductory programming course at university used a functional programming language (ML) for no other reason than because students were extremely unlikely to already be familiar with the language and would therefore have to attend at least the early classes in order to have any idea what was going on, giving faculty and opportunity to impress upon those (like me) who already thought they could program a computer that there was a good deal more to it than we thought.

But I don't remember washout rates as high as those suggested. I guess it's conceivable that some of my colleagues just avoided all subsequent programming but it seems very unlikely.

Learning to program

Posted Apr 23, 2012 14:22 UTC (Mon) by juliank (subscriber, #45896) [Link]

People should use Haskell, and only Haskell. Haskell is the most important language in the FP area. Once people have mastered Haskell or in parallel, they should learn C Programming in an environment sufficiently compatible to POSIX.1-2001 (or Linux), and Makefiles.

Almost everything else apart from Haskell programming and POSIX programming is just non-sense.

Learning to program

Posted Apr 23, 2012 16:39 UTC (Mon) by dps (subscriber, #5725) [Link]

When I did my 1st degree in (Mathematics and) CS we used a functional programming language before an imperative one. One of the exercises was to write an interactive program which appeared to have state. Obviously actually having state was impossible because that would violate fundamental principles of functional programming.

I think that functional programming languages are mathematically beautiful inferior to imperative languages for most tasks. Attempting to use an imperative programming language as functional one, or vica-vesra, is both stupid and painful.

Serious numerical analysis is *never* done using functional programming languages: any language without BLAS and linpack can be ruled out immediately. The popular choices include high performance fortran, which does not support recursion.

Learning to program

Posted Apr 23, 2012 18:08 UTC (Mon) by Cyberax (✭ supporter ✭, #52523) [Link]

>Obviously actually having state was impossible because that would violate fundamental principles of functional programming.

It's certainly possible. You just pass the state of the 'world' as a parameter to functions that can affect it.

>Serious numerical analysis is *never* done using functional programming languages: any language without BLAS and linpack can be ruled out immediately. The popular choices include high performance fortran, which does not support recursion.

Wow! So much bogosity in two statements.
1) Functional programs certainly can be fast. And serious numeric analysis is now often done with NumPy which is _interpreted_.
2) Fortran supports recursion since 1990 (actually since much earlier mostly in form of vendor-specific extensions).

Learning to program

Posted Apr 26, 2012 3:16 UTC (Thu) by samlh (subscriber, #56788) [Link]

> And serious numeric analysis is now often done with NumPy which is _interpreted_.

Much of NumPy is a thin wrapper around BLAS, linpack, etc. What makes it popular is that it provides an intuitive interface, but the number-crunching is rarely written in Python.

Learning to program

Posted Apr 26, 2012 12:58 UTC (Thu) by nix (subscriber, #2304) [Link]

Serious numerical analysis is *never* done using functional programming languages: any language without BLAS and linpack can be ruled out immediately. The popular choices include high performance fortran, which does not support recursion.
I don't do numerical analysis, nor do I know anyone who does, so I cannot argue regarding the truth of your assertion. But as a simple Google search will show, Haskell has bindings to BLAS.

Learning to program

Posted Apr 28, 2012 0:20 UTC (Sat) by giraffedata (subscriber, #1954) [Link]

"most people can’t learn to program: between 30% and 60% of every university computer science department’s intake fail the first programming course."
But I don't remember washout rates as high as those suggested.

It's misleading. A footnote implies most of those who failed received credit for the class and moved on to the next level. By "failed," the authors mean they failed to learn the subject matter -- to program -- and they say the policy in these classes is to pass a certain fraction of the class rather than to apply an absolute standard as the researchers did.

GCC and static analysis

Posted Apr 23, 2012 20:32 UTC (Mon) by khim (subscriber, #9252) [Link]

Interesting. Have you read "The camel has two humps"? It seems like this might be related.

Yes, I've read this thing - sadly after I've stopped teaching. It's related but different. The camel has two humps explains that the most important thing for a programmer it to understand that programming is The Glass Bead Game: computer has no common sense thus it's your responsibility to bring sense to the program. And that it's very hard, almost impossible, to teach it to someone who's not accepting that right away!

Functional programming is similar but different: it separates the result of said game from the process! It certainly feels that even people who can play The Glass Bead Game successfully not all can do this next step - and that means they'll not be able to successfully utilize functional programming. Even if they'll be able to write programs using functional languages they'll do in a very inefficient way.

Think STL: it mostly tries to use functional style, but sometimes fails. For example map::insert function - it returns the pair<iterator, bool>. This is obvious thing to do in the imperative world. But in functional world (which is related to SMP world, BTW) it's bad idea: it means that the whole insert operation must be done sequentially before insert method will return!

Does a similar curve to the one they describe (e.g. figure 14) occur for functional programming, or is it different?

Well, I no longer teach CS thus I can not check, but it certainly felt this way: people either "got" functional programming easily or were unable to pick it at all. They tried to invent a way (often successful) to shoehorn cycles to the language which had noting of the sort.

Of course most people had problem with just an imperative programming (as The camel has two humps shows) thus I can not really say what percentage of students had trouble with programming and what percentage only had trouble with functional programming. I only remember the key result: people who groked functional programming were able to program imperatively (I can not recall even a single person who was able to succeed with functional programming but had trouble with imperative one - this probably happens, but it's extremely rare), yet people who were barely able to pass functional programming part often did very well with imperative part.

GCC and static analysis

Posted Apr 24, 2012 11:59 UTC (Tue) by renox (subscriber, #23785) [Link]

> computer has no common sense thus it's your responsibility to bring sense to the program. And that it's very hard, almost impossible, to teach it to someone who's not accepting that right away!

This specific part depends probably a lot if you're teaching Prolog or if you're teaching assembly language..

GCC and static analysis

Posted May 17, 2012 15:50 UTC (Thu) by nye (guest, #51576) [Link]

>I only remember the key result: people who groked functional programming were able to program imperatively (I can not recall even a single person who was able to succeed with functional programming but had trouble with imperative one - this probably happens, but it's extremely rare)

That's interesting - not particularly surprising but it still makes me feel weird.

When I started my CS course (about a decade ago) we started with Scheme, which I found simple and intuitive. It took me quite a while to get used to imperative programming (in C and Ada) after that - it felt very unnatural and I didn't really feel like I understood what I was doing, more like shooting in the dark.

Later we had a couple of courses using Prolog, and I *never* managed to grok that programming style, so I find it easy to believe that some people are intrinsically more inclined towards one programming style versus another.

GCC and static analysis

Posted May 20, 2012 21:49 UTC (Sun) by mathstuf (subscriber, #69389) [Link]

> Later we had a couple of courses using Prolog, and I *never* managed to grok that programming style, so I find it easy to believe that some people are intrinsically more inclined towards one programming style versus another.

It took until the last weekend of a two week project for things to "click" with Prolog. My experience tells me that half of Prolog programming is knowing where to stick '!' in between your statements. Of course, my imperative skills were a little "broken" after that project, but I'm glad I got to work with Prolog even though I don't use it for anything today.

GCC and static analysis

Posted Apr 23, 2012 21:23 UTC (Mon) by HelloWorld (guest, #56129) [Link]

I have only read the abstract for now, but it seems to make for an interesting read. Thanks for that!

GCC and static analysis

Posted Apr 23, 2012 23:27 UTC (Mon) by neilbrown (subscriber, #359) [Link]

Thanks so much for that link. A delightfully written - as well as interesting - paper.

I particularly liked:

> what distinguishes the three groups in the first test is their different attitudes to meaninglessness.

That sums it all up so beautifully!

GCC and static analysis

Posted Apr 23, 2012 23:30 UTC (Mon) by dlang (✭ supporter ✭, #313) [Link]

It is a good paper, is there any follow-up work happening? Are the full tests (and scoring info) available anywhere?

GCC and static analysis

Posted Apr 24, 2012 10:58 UTC (Tue) by sorpigal (subscriber, #36106) [Link]

Yes, there have been several follow-up papers from the same people on the same subject. I've just skimmed them, but they mostly look like defense against various attempts to deny the results of the first paper. There are additional statistics, though, which are interesting.

GCC and static analysis

Posted Apr 26, 2012 13:05 UTC (Thu) by nix (subscriber, #2304) [Link]

Quite. I'm not so sure about the division they jokingly draw a few paras further down, though, dividing software practitioners into those who can take the full dose of CS formality and those who are doomed to not enjoy programming, become software engineers, and be drowned in UML. Some of us had trouble with formal CS stuff (certainly at that age) but still enjoy programming very much, thank you (and see UML as the useless pile of notation that it is!).

GCC and static analysis

Posted Apr 24, 2012 12:51 UTC (Tue) by piman (subscriber, #8957) [Link]

> Interesting. Have you read "The camel has two humps"? It seems like this might be related.

I'm sick of seeing this paper thrown around. To quote the followup paper from the same author,

> Two years ago we appeared to have discovered an exciting and enigmatic new predictor of success in a first programming course. We now report that after six experiments, involving more than 500 students at six institutions in three countries, the predictive effect of our test has failed to live up to that early promise.

In other words, more research is needed, especially before lay people go around citing the original paper as evidence of any particular teaching method or subject.

GCC and static analysis

Posted Apr 24, 2012 16:13 UTC (Tue) by dlang (✭ supporter ✭, #313) [Link]

and there's a newer paper (July 2009) than the one you are referring (Jan 2008) to that seems to say that there is correlation

GCC and static analysis

Posted Apr 26, 2012 4:21 UTC (Thu) by fuhchee (guest, #40059) [Link]

"We now report that [...] the predictive effect of our test has failed to live up to that early promise."

What a breath of fresh air, to scientifically evaluate one's prediction, then publish its failure. Bravo.

GCC and static analysis

Posted Apr 24, 2012 8:58 UTC (Tue) by ssmith32 (subscriber, #72404) [Link]

How did you factor in the teaching skill? After all, your teaching was the one constant here. In the early programmer courses I took, there always students who felt hopeless, and just did brute forces symbol manipulation, until you pulled them aside and explain things to the differently. I had a few people that helped me, and I helped a few people once I learned. The teachers often just kept doing the same wrong things over and over again.

Most off-hand statistics like this also ignore the previous experience of the students. Even studies which supposedly factor this out only use things like "previous classes" or "previous work" or "how the student rated themselves". Yet all of these are not-so-great proxies for previous experience on a specific problem with specific approach. More often then not, the students I knew who just "got it", often were hacking on similar problems in their spare time with similar tools, before they ever started a class. Once again, I've been on both sides of this equation: and OS class with a horrible teacher and friends who just "magically" got it, assured me that it was a bad teacher and not me (I was in doubt). After which I started hacking on linux at home. Then I took a more advanced OS class with an even more horrible teacher, and just "got it", leaving another friend frustrated. So I had to pull them aside, step through stuff, assure them that it was not them, the teacher just sucked.

That's how the other side looks to most of us normal, imperative people. Maybe we just suck, but we write most of the software, so :P

PS. Also, you basically contradicted yourself - you started off saying "no one can learn functional programming", yet somehow concluded that it will be the answer, because it's the only thing that works... if no one can use it, sorry, it won't be the answer.

GCC and static analysis

Posted Apr 24, 2012 16:07 UTC (Tue) by dlang (✭ supporter ✭, #313) [Link]

I have been in a lot of different programming classes, with a lot of different teachers with different styles, as well as doing tutoring of fellow students. I also now have quite a few years of working in a company that does a lot of development.

The idea that some people just can't wrap their mind around programming, and others can get the job done, but always with a struggle really matches my experience.

The claim that this relates to people accepting that the program is really meaningless really rings true as I think about it.

GCC and static analysis

Posted Apr 24, 2012 16:45 UTC (Tue) by dlang (✭ supporter ✭, #313) [Link]

by the way, the studies don't say that the best results were from people who were correct initially, just that the best results were from the people who were most consistent initially.

This also feels right to me. Someone who applies the wrong rules consistently just needs to learn what the right rules are. People who don't consistently use the same rules are in much worse shape.

GCC and static analysis

Posted Apr 24, 2012 21:32 UTC (Tue) by neilbrown (subscriber, #359) [Link]

My own reflections suggest that the consistency is probably a bit of a distraction and that it is the meaninglessness that is really important.
When people are not consistent it isn't because it is their nature to be inconsistent, but rather because they cannot see any possible consistency because they are blinded by the meaninglessness.

I actually suspect a strong correlation here with enjoying Science Fiction, which can often seem quite meaningless to outsiders and requires serious suspension-of-disbelief.

To enjoy SciFi (and fantasy, and probably D&D is a good predictor too) you need to immerse yourself in a world that has clear rules (mostly) but very different rules from the ones we are used to. So you cannot use your "intuition" to understand what is happening, you must reason from the stated rules. This is exactly the attitude that you need to bring to computer programming.

The people who were consistent made up rules and followed them. The people who were not consistent simply couldn't find any rules and their intuition failed them (as you would expect it to).

The paper also recalls previous reports of a correlation between programming ability and and ability with native language (already noted by Dijkstra). I think this fits the pattern. Meaning in language comes through use and people who depend on meaning will simply copy the usage they are exposed to which will be a/ limited and b/ erroneous (Like people using "Bob and I" as the object of the verb - I hate that!). People who are finding meaning in "meaningless" rules will understand the grammar (often seen as meaningless by people who haven't 'got' it) and other aspects of the language that they are not so often exposed to and thus have a richer language model to speak from.

Now I just need to convince some academic to test my brilliant hypothesis.

GCC and static analysis

Posted Apr 24, 2012 23:02 UTC (Tue) by mathstuf (subscriber, #69389) [Link]

> …finding meaning in "meaningless"[ness]…

Sounds like something down Hofstadter's alley to me.

GCC and static analysis

Posted Apr 24, 2012 23:32 UTC (Tue) by neilbrown (subscriber, #359) [Link]

> Sounds like something down Hofstadter's alley to me.

Yes - just to the left of the 'l'. :-)

GCC and static analysis

Posted Apr 25, 2012 14:54 UTC (Wed) by fuhchee (guest, #40059) [Link]

Presenting Erlang and Haskell as lacking mutable state, and thus free of concurrency problems, is only approximately true. While plain variables may be unchangeable after assignment, via actors or other syntax, mutable state can definitely be created in functional languages.

GCC and static analysis

Posted Apr 26, 2012 19:49 UTC (Thu) by peter-b (subscriber, #66996) [Link]

Don't forget the "wait-free" class of algorithms, which avoid locking by restarting an operation if a conflict is detected -- usually these are implemented using atomic operations such as compare-and-swap. This paper gives a decent overview:

M. Herlihy, "Wait-free synchronization", ACM Transactions on Programming Languages and Systems, Volume 13 Issue 1, Jan. 1991

Wait-free disjoint set algorithms turned out to be the key to getting high multiprocessing performance in the image feature extraction tools I've written for my PhD research.

GCC and static analysis - GCC Python Plugin?

Posted Apr 21, 2012 17:50 UTC (Sat) by mjw (subscriber, #16740) [Link]

As a counter it would be interesting have an LWN article about the new GCC Python Plugin https://fedorahosted.org/gcc-python-plugin/ which is fairly new but includes a static analysis tool for CPython extension code http://gcc-python-plugin.readthedocs.org/en/latest/cpyche... which already seems to have some nice success stories:
http://gcc-python-plugin.readthedocs.org/en/latest/succes...

GCC and static analysis - GCC Python Plugin?

Posted Apr 21, 2012 17:57 UTC (Sat) by corbet (editor, #1) [Link]

I did look briefly at that plugin last year. Doing more with it would be great, but it's a hard subject to approach; one needs to know a lot about GCC internals to even begin.

GCC and static analysis

Posted Apr 21, 2012 17:57 UTC (Sat) by khim (subscriber, #9252) [Link]

Note that Google also moves dynamic tools to LLVM: TSan is reimplemented as LLVM plugin while ASan is implemented similarly from the start thus it's not just about static analysis

GCC and static analysis

Posted Apr 21, 2012 21:15 UTC (Sat) by mitchskin (subscriber, #32405) [Link]

Is this really a surprise? RMS's resistance to a GCC plugin system was intense and long-lived. LLVM/clang, on the other hand, tried to be modular and pluggable from the very beginning. One might naturally expect that LLVM would be a better platform for this sort of thing, even if GCC did finally start to soften its stance on plugins; GCC's architecture isn't going to change wholesale overnight.

GCC and static analysis

Posted Apr 21, 2012 22:59 UTC (Sat) by jpnp (subscriber, #63341) [Link]

Google have also shown some resistance to GPL projects. When GCC was the only game in town they'll use it, but recently CLang has gotten to the stage of being a feasible alternative. I'd imagine that, even if they considered both GCC & LLVM technically equal to the task, they'd favour the non-GPL project.

GCC and static analysis

Posted Apr 22, 2012 15:15 UTC (Sun) by mbligh (subscriber, #7720) [Link]

Google is a company run by its engineers. You'll find a widely diverse set of opinions and attitudes and licensing - it depends more on the individuals involved than some corporate stance.

GCC and static analysis

Posted Apr 23, 2012 3:42 UTC (Mon) by dberlin (subscriber, #24694) [Link]

Yes, google has no official corporate stance, except that we support both projects.

GCC and static analysis

Posted Apr 23, 2012 23:35 UTC (Mon) by marcH (guest, #57642) [Link]

Any big company has a widely diverse set of opinions and attitudes. The only difference between companies is how much of this diversity is hidden/controlled.

GCC and static analysis

Posted Apr 25, 2012 10:15 UTC (Wed) by wahern (subscriber, #37304) [Link]

If you read the actual explanation of the issues (follow the link in the article), you'd realize that your analysis bares no relationship whatsoever to the real reasons they switched. He lists several requirements, and GCC by his estimation scored better than LLVM/Clang for most of them. The only issue issue is that the areas where LLVM/Clang excelled happened to be far more important for those particular engineers wrt to their particular project.

GCC and static analysis

Posted Apr 24, 2012 11:29 UTC (Tue) by jwakely (subscriber, #60262) [Link]

> The result has been a sort of wake-up call for GCC developers.

Really?

> Is the GCC compiler suite not well suited to the creation of static analysis tools?

No. I thought that was well known.

GCC and static analysis

Posted Apr 24, 2012 23:50 UTC (Tue) by stevenb (guest, #11536) [Link]

Not really, I'd say. GIMPLE was not designed with source language level static analysis in mind.

Also, this "The GIMPLE representation changes significantly from one compiler release to the next, causing a lot of things to break." argument is funny, coming from a Google developer. The only major change in the GIMPLE representation since GCC 4.0 is the change from 'everything-is-a-tree' to GIMPLE tuples. This change was mostly the work of ... Google developers!

I can't help having the impression that Google have made their own job harder by working on several branches simultaneously, instead of following the GCC trunk. It leads to what appears to be a waste of developer time (see for example http://gcc.gnu.org/ml/gcc-patches/2012-04/msg01511.html where a Google developer hacks around a GCC 4.6 deficiency that was already fixed for GCC 4.7 in a much more comprehensive way).

Maybe Google should align their work better internally, and work more closely with the GCC community? I think that would be beneficial for both...

And for something completely different: I had so hoped that there would be a GSoC project proposal to port clang to emit GIMPLE, but alas! ...

GCC and static analysis

Posted May 3, 2012 8:06 UTC (Thu) by mlopezibanez (guest, #66088) [Link]

Blaming Google instead of reflecting on what GCC devs could have done to avoid this situation seems to be a bad strategy for keeping Google as a contributor to GCC.

LLVM is a much more welcoming environment than GCC. See the reaction to essentially the same patch:

http://www.mail-archive.com/cfe-commits@cs.uiuc.edu/msg30...
http://gcc.gnu.org/ml/gcc-patches/2010-06/msg00974.html

The result? Clang has this warning, GCC doesn't. I think it is easy to find many examples like this.

Improving Clang/LLVM is relatively easy. Working on GCC requires incredibly amounts of knowledge, motivation, persistence and social maneuvering to get patches reviewed and approved.

One of the points that Delesley made is that the code was integrated into trunk immediately in LLVM, whereas in GCC it was languishing in a branch that would take super-human persistence during several months to get reviewed and accepted.

In fact, it is likely that Google will work less closely with GCC, as more of their engineers rely on LLVM.

GCC and static analysis

Posted May 3, 2012 7:29 UTC (Thu) by mlopezibanez (guest, #66088) [Link]

Well, the article seems to miss the most important question:

* Does GCC need to concern itself with static analysis, diagnostics, modularity and all the other things where Clang excels and GCC sucks?

My opinion is that these shortcomings are producing an increasing transfer of contributors and users from GCC to LLVM, so this should be a high-priority.

Unfortunately, GCC maintainers see these things as nuisances and, at best, their answer is that people who are interested in this should do the work (without disturbing the GCC maintainers' work). But people who are interested in these features are moving to LLVM, and then working on LLVM to get the features where GCC still has some advantage (portability, optimization).

It would have been nice if the article, apart from reading what anyone could read in the mailing list, have asked some GCC global maintainers (some from google, some for suse/fedora) about their opinion on the subject and the future of GCC. Contrary to what the article says, there has been no visible wake-up call at all.

GCC and static analysis

Posted Apr 24, 2012 14:58 UTC (Tue) by zmower (subscriber, #3005) [Link]

put_lock is an UNLOCK_FUNCTION? No wonder it needs an annotation!

Adding annotations to one C compiler to achieve this seems like a lot of work for little gain. Especially coming from a defense background were we've had protected types in Ada since Ada 95.

GCC and static analysis

Posted Apr 25, 2012 20:13 UTC (Wed) by magnus (subscriber, #34778) [Link]

"release_lock" would be a lot more unambiguous than "put_lock", but too many characters to type for the programmer I suppose... :)

The whole metaphor of "getting a lock" is a bit broken overall when you think about it. Actually "getting a key" to something would make more sense (like if I have the key to something then I'm the only one that can access it)

Existing Tools.

Posted Apr 29, 2012 7:51 UTC (Sun) by gmatht (guest, #58961) [Link]

Since the LLVM project has had its own static analysis tool for a while now, it doesn't surprise me that that it makes a good base to build a new static analysis tool on top of.

One thing mentioned was that the LLVM representation was at a higher level. I imagine that this means that the LLVM tool is less likely to pick up security flaws introduced by flawed compiler optimizations. One interesting technique is to used "Typed Assembly Language" so we can verify the generated assembly code, for example TALx86 verifies x86 level assembly. It also has some significant downsides; only a safe subset of C can be compiled to TALx86.

GCC and static analysis

Posted May 2, 2012 15:19 UTC (Wed) by npsimons (guest, #80535) [Link]

> Static analysis tools will almost certainly be an increasingly important
> part of many developers' tool boxes.

Will be? I don't know about you, but when I'm hacking on C or C++, one of the first things I do is setup my run_tests script to run as many static tools on my code as possible. It's saved me tons of time, and those tools are just ones separate from GCC! I am happy that there is competition, and I still need to look into LLVM/CLANG, but there are other static code checking tools (splint and cppcheck are two that come to mind).

GCC and static analysis

Posted May 2, 2012 16:21 UTC (Wed) by mathstuf (subscriber, #69389) [Link]

Same here. I don't have static analysis tools enabled in my build trees yet, but all unit tests get coupled with targets to run the following over them:

- valgrind
- callgrind
- helgrind
- cachegrind
- massif
- Also experimental valgrind tools, but I haven't actually used them yet.
- gprof

Fire off the 'valgrind' target, wait a while, come back and inspect memory leaks (mainly in Python from the bindings). Tack that on top of pretty much every warning flag GCC understands, and I am much more sure that the code is safe (though still not correct).

GCC and static analysis

Posted May 4, 2012 16:23 UTC (Fri) by jezuch (subscriber, #52988) [Link]

> Will be? I don't know about you, but when I'm hacking on C or C++, one of the first things I do is setup my run_tests script to run as many static tools on my code as possible.

Same here (in Java, I mean; FindBugs all the way!), but not everyone is so enlightened yet, hence "will be". (I also turn on as many compiler warnings as is practical; and I say "practical" because sometimes there are warnings that are a matter of taste, at least in Java.)

Copyright © 2012, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds