LWN.net Logo

LWN.net Weekly Edition for June 12, 2008

Google announces Gadgets for Linux

By Jake Edge
June 11, 2008

Google recently announced the release of their Gadgets for the Linux desktop, and, unlike some of their other desktop offerings, they released it under a free software license. While it is not earth-shattering technology, Gadgets does provide some interesting features and amusing diversions. It also generates some hope that Google is getting better at understanding what free software users are looking for, so perhaps things like the Google Desktop for Linux will be better integrated and more useful in the future.

Gadgets are a cross-platform way to create simple applications that can run on web pages and desktops. The gadget API provides a means to retrieve content from other sites and display it along with a user interface. Many kinds of applications can be created, from clocks and calendars to RSS-feedreaders and "picture of the day" viewers.

[Gadget desktop]

There are numerous gadgets available, a semi-random collection on a KDE desktop can be seen at left. Google has created a handful of gadgets, but the vast majority are available from others in various categories including News, Sports, Finance, Fun and Games, Technology, and Communication. The gadget browser shown below, at right, allows easy access to an amazing number of choices, many of which are variations on a theme.

[Gadget browser]

To get started with gadgets, it is first necessary to build the tool. Google does not yet provide .rpm or .deb files for various distributions. The "how to build" page was useful, but there was some difficulty in trying to translate the dependencies into Fedora 9 package names. A page in a language I don't know needed no translation, however. Linux commands, it seems, are multi-lingual.

Building from the Apache-licensed source tarball was straightforward after that. Gadgets for Linux comes in both GTK+ and Qt flavors which allows for integration with the two dominant Linux desktop environments. The screenshots accompanying this article are from the Qt version, but a bit of a look at the GTK+ version seemed roughly the same—though the Qt version lacks the sidebar dock.

This is a beta release, perhaps more of a beta than many Google releases, so there are still a fair number of glitches. Perhaps 20% of the gadgets tried had one problem or another, with some seeming not to function at all. Having no experience with gadgets on other platforms, it was not clear whether these were caused by bugs in the gadgets themselves or the desktop gadget program.

[Moon image gadget]

The main benefit of the gadget API seems to be the cross-platform capabilities. Gadgets can run—largely unchanged—on Linux, Mac OS X, or Windows, but can also run in browsers on web pages at social networking sites or on other pages. If the API can deliver that wide of a range of platform choices, it could open up a much wider audience for folks that want to develop their gadgets on Linux.

Still missing is one of the tools recommended for developing gadgets, Gadget Designer, which is only available for Windows. The documentation for creating a gadget make it look like a tedious exercise in XML manipulation and Javascript programming, but there may be tools available or in development to make some of that easier.

Overall, gadgets look like an interesting project. There is really nothing new about the kinds of applications that can be built using the API, but there are few choices to build those kinds of programs in a truly cross-platform way. Google's choice to support Linux—and support it well—accompanied by the code under a free software license is, perhaps, the best news of all.

Comments (7 posted)

An interview with Jim Ready

By Jonathan Corbet
June 11, 2008
Jim Ready has a long history in the embedded systems market. Most recently, he became the founder of MontaVista, now one of the most successful embedded Linux companies. A recent LWN article took issue with some of Jim's comments; it only seemed fair to give him the opportunity to present his side of the story. Thus, this interview. We asked several questions about MontaVista and its approach to Linux marketing, and Jim took quite a bit of time to answer them in detail. So, without further ado...

You have been working in the embedded Linux market for some years. How has that market changed over that time? What do you think are the prospects for embedded Linux now?

The single biggest change, and one that gives me great pleasure, is that embedded Linux is now mainstream, part of the landscape, and arguably the fastest growing embedded OS. Believe me, when we started in 1999 that was hardly the case. Of course, the complexity of the devices that our customers are building continues to increase. The underlying hardware is typically a highly integrated system with loads of I/O, for example the SOCs (System On a Chip) such as the TI 3430. That in turn drives both complexity in Linux as well as in the application and middleware software stacks. It's pretty amazing to realize that a little Linux-based handheld device running on batteries is powerful enough to have supplied the State of California government's computing needs not so many years ago.

Where do you think MontaVista's sweet spot is in that market?

Companies who are highly focused on their value-add who want a first class partner to supply them a suitable Linux and associated services (consulting and training etc.) upon which they will develop their application. The more formal approach a company takes towards their own software development, the more they care about meeting schedules and the higher their requirements for quality, the "sweeter" MontaVista looks. When you're basing a billion dollar product line on someone's OS you care about what's going in your product. So when Motorola, NEC or Panasonic end up shipping 30 million phones, they need a supplier who can meet their technical, schedule and quality requirements. The phones have to ship for the Christmas season, and no one wants to recall millions of devices.

As another example, our Carrier Grade Linux distribution is the core OS in deployed NEC systems which have established 99.9999% availability (that's no more than ~31.5 seconds of unscheduled downtime in a year, which is a DoCoMo requirement). Our Professional Edition is the OS for two different patient monitoring systems that have been through FDA certification. We're truly fortunate to have thousands of customers, both big and not-so-big.

Embedded systems vendors have, as a group, been criticized for their lack of participation in the free software development process. Are you happy with MontaVista's level of contribution? What, in your mind, are some of the highlights of MontaVista's community participation?

Most contribution surveys show MV in the top 10 of Linux contributors. (No other embedded Linux vendor even makes it into the top 30) Arguably MontaVista contributes more to Linux relative to our size than any other company in the world. It has always been a cornerstone of our strategy to be a major contributor to Linux. We figured the more gas we poured on the Linux fire the faster we could erode the RTOS suppliers installed base and speed to movement towards Linux. We are perhaps best known for our work in helping making Linux real-time capable but over time we have also made significant contributions in the PPC, XScale, MIPS and ARM trees as well as some other specific projects such as kgdb, LTT, DPM (Dynamic Power Management) etc.

Your recent article in Military Embedded Systems was seized upon by a proprietary embedded vendor as proof that Linux is too expensive and difficult for embedded applications. Assuming you disagree with his conclusions, where do you think his reasoning went wrong?

Well there really wasn't any reasoning, just ranting. But having said that Dan (Dan O'Dowd Greenhills' CEO) implied that our business model was to allow customers to more or less get in trouble by developing their own from scratch Linux distribution and then charge them for support to bail them out. Of course that's not what we do. Rather we build a robust fully tested and supported embedded Linux distribution (MontaVista Linux Professional Edition, for example) and deliver that to our customers. We then maintain and support that specific version of MontaVista Linux over time, even as the community dashes forward. In fact we have maintenance obligations that can be as long as a decade from initial deployment.

That approach gets the customer out of the business of making their own distribution, maintaining and supporting it with all the accompanying costs. So we shield the customer from the complexity and change rate that they otherwise would be exposed to if they were on their own. They don't have to watch all the patches, monitor the newsgroups and otherwise be tied up, they can get on to building their product. Dan purposely ignored the fact that a commercial embedded Linux distribution makes it very easy to use Linux as an embedded OS. I suspect that's why he tried to hide it.

Your article suggests that an embedded systems manufacturer using Linux would start by assembling the kernel and development toolchain by hand. Why do you think they would do that? Even in the absence of vendors like MontaVista, there are numerous options which do not require assembling systems at such a low level; why would a vendor not use one of them?

We know from direct experience that even starting with what appears to be "pre-assembled" distribution, from a semiconductor maker or elsewhere, a developer sometimes isn't getting what they think they are.

Don't get me wrong, almost any Linux distribution can serve as a starting point, maybe 99.99% perfect, but our customers demand more than that. They want to be at the end of the Linux development cycle, not the beginning. For example, a Linux distribution we recently started working with had the following problems:

  • The code explicitly ignored Linux coding standards by adding hardware dependencies. That code would never be accepted into the upstream trees, and this kind of fork creates debugging issues and additional maintenance burden.

  • The drivers were not SMP-safe, real-time safe nor did they support DPM, yet the device was designed for applications where all three could well be required. In order to take advantage of these advanced features, the device driver would need to be re-written from scratch.

  • The code contained numerous defects that caused the system to crash. Error returns were not checked and other problems indicating very poor coding practices. These are exactly the type of quality issues that should compel businesses to find a Linux commercialization partner.

We had the great pleasure of fixing all these problems as we assembled our distribution. Even with our standard practice of pushing back the changes, as you well know, there is no guarantee by the community that these changes make it back into the appropriate open source trees.

The fact is it is difficult for a prospective Linux developer to have any idea of the state of the Linux distribution they might select. A high quality, commercial distribution can give a developer some peace of mind about what they are getting. For example, MontaVista has a formal development process in place for each of its releases, with quantitative criteria that must be met for defects (0 critical defects for example with a sharply declining overall new defect detection curve.) before the distribution can ship. Our processes have been formally audited by a number of our largest customers in order to assure themselves of what they are getting from us. And as we mentioned above, the proven results from devices in the field speak to our abilities. As for other starting points, you'll have to ask them about their process.

There were some interesting numbers in that article. Where did the 5000 messages/day for kernel.org come from - which lists?

Since our engineers live and breathe Linux and the other Open Source components that make up and embedded Linux distribution, we have a pretty good feel for the overall rate of traffic we keep up with. Based upon that experience we measured the overall message traffic that a developer would have to monitor to keep abreast of the daily ins and outs of the typical mix of software they would use for an embedded Linux project. We aggregated the total under kernel.org, which isn't precisely correct, but the paragraph preceding that statement clearly referred to a set of lists one would have to monitor.

For example, the monitoring would include not only lkml, but also the lists for other significant parts of the software typically used for an embedded project, including the list maintained for the specific architecture used (MIPS, PPC ARM etc), the real-time list, networking, IPv6, security, advanced filesystems etc. By the way, the lkml list on May 21, 2008 contained ~500 messages, and gcc contained ~100, just for starters. So it wasn't just lkml at 5000 but a total set that can total up to 5000 per day. Does the fact that lkml is only ~500 a day (and "only" ~200-300 on weekends) make it any less daunting? I don't think so.

You say: "a recent security patch that took all of 13 lines of code to implement against an embedded Linux system would have taken more than 800k lines of source patches to implement if the previous trail of patches had been ignored." How was that number arrived at? Which security patch were you describing? How could it possibly require 800,000 lines of patches "to implement" this security fix?

This example comes from a sequence back in 2006 (CVE-2006-1528 to be precise), but the "problem" is just as true today. Here's the setup: A developer decides to use Linux and has taken the strategy of minimizing their costs by using a community-maintained Linux kernel. (This story would be true if the developer started with the typical semiconductor distribution, by the way) The community has a good reputation for stability and defect resolution and therefore the developers think they can minimize their own effort.

They start with Linux 2.6.10 and base their device application software on this release. During testing, they notice a defect and find that the defect has been identified (the good news) and fixed in 2.6.13 (the bad news). So now they have a problem, moving up to 2.6.13, where the defect is fixed also introduces 846,233 new lines of code (the delta between 2.6.10 and 2.6.13).

This magnitude of change restarts their QA process, since so much code has changed in the underlying Linux kernel. Their other choice is to backport the fix, which in this particular case is 33 lines (we know because we did it), but now the developer has taken on maintenance of their own Linux, which was what they were trying to avoid in the first place. This drift between a Linux release you have baselined and the fact that defects are often fixed in newer releases presents a less than perfect set of choices for developers. Whether you wanted to or not, you're in the Linux maintenance business. This drift problem is true of many distributions, not just dealing with kernel.org.

If our customer found the same defect, we have the obligation to fix it in the release that they purchased from us; we don't force them to potentially destabilize their environment by sending them a newer kernel release where the defect was originally fixed. I guess it all depends how cavalier one is about changing your underlying operating system after you've developed and tested your application. In general our customers are very strict about minimizing changes, and so are we.

At least some of MontaVista's marketing would appear to focus on making Linux look scary. Are you not concerned that this approach might have the effect of making Linux in general look less attractive and, thus, playing into the hands of proprietary systems vendors?

No one is a bigger proponent of embedded Linux than MontaVista (and we have the contributions to prove it). But it doesn't do us any good to have folks try Linux, get over their heads and fail, and attribute that failure to Linux.

We have seen over the past 8 years any number of projects that got into trouble by not understanding what to expect when they downloaded some Linux and started in by themselves. In fact one of our very earliest customers, back in 1999, had started off building their own Linux, and hit a hardware integration bug that stopped them dead in their tracks for weeks, putting their project in real trouble. Had we not been able to help them out, their alternative was Windows CE. Ugh!

Why shouldn't many millions of lines of complex operating system code that changes daily be a little scary, especially when your business is making devices, not operating systems? I think it is a mistake to "trivialize" the difficulty in owning large amounts of any software, including Linux. That's why I think it's important for folks to be well informed about what they are getting into, so they can make good decisions on how they will approach using Linux for their system, whether they do-it-themselves or go commercial. In either case we want them to succeed.

Is there anything else you would like to pass on to LWN's readers?

We can all be quite proud of the enormous progress Linux has made in transforming the embedded OS marketplace from one which was highly fragmented and largely devoid of standards, to an environment based upon a highly functional OS which is a truly open standard: Linux. A whole cast of characters made this possible: visionary customers who dared think it was possible to embed Linux in their devices, the semiconductor companies making sure Linux was ported to their chips, commercial companies such as MontaVista and others making rock solid distributions that were capable of being deployed by the millions, and numerous individuals who made significant contributions along the way. It's a pretty powerful combination that's hard to beat.

We would like to thank Jim for taking the time to answer our questions.

Comments (2 posted)

Implications of pure and constant functions

June 10, 2008

This article was contributed by Diego Pettenò

Introduction

Attributes and why you should use them

Free Software development is often a fun task for developers, and it is its low barrier to entry (on average) that makes it possible to have so much available software for so many different tasks. This low barrier to entry, though, is also probably the cause of the widely varying quality of the code of these projects.

Most of the time, the quality issues one can find are not related to developers' lack of skill, but rather to lack of knowledge of how the tools work, in particular, the compiler. For non-interpreted languages, the compiler is probably the most complex tool developers have to deal with. Because a lot of Free Software is written in C, GCC is often the compiler of choice.

Modern compilers are also supposed to do a great job at optimizing the code by taking code, often written with maintainability and readability in mind, and translating it into assembler code with a focus on performance. Code analysis for optimization (which is also used for warnings about the code) has the task of taking a semantic look at the code, rather than syntactic, and identifying various fragments of algorithms that can be replaced with faster code (or with code that uses a smaller memory footprint, if the user desires to do so).

This task is a pretty complex one and relies on the compiler knowing about the function called by the code. For instance, the compiler might know when to replace a call to a (local, static) function with its body (inlining) by looking at its size, the number of times it is called, and its content (loops, other calls, variables it uses). This is because the compiler can give a semantic value to the code for a function, and can thus assess the costs and benefits of a particular transformation at the time of its use.

I specified above that the compiler knows when to inline a function by looking at its content. Almost all optimizations related to function calls work this way: the compiler, knowing the body of a function, can decide when it's the case to replace a call with its body; when it is possible to completely avoid calling the function at all; and when it is possible to call it just once and thereby avoid multiple calls. This means, though, that these optimization can be applied only to functions that are defined in the same unit wherein they are used. These functions are usually limited to static functions (functions that are not defined as static can often be overridden both at link time and runtime, so the compiler cannot safely assume that what it finds in the unit is what the code will be calling).

As this is far from optimal, modern compilers like GCC provide a way for the developer to provide information about the semantics of a function, through the use of attributes attached to declarations of functions and other symbols. These attributes provide information to the compiler on what the function does, even though its body is not available. Consequently, the compiler can optimize at least some of its calls.

This article will focus on two particular attributes that GCC makes available to C developers: pure and const, which can declare a function as either pure or constant. The next section will provide a definition of these two kinds of functions, and after that I'll get into an analysis of some common optimizations that can be performed on the calls of these functions.

As with all the other function attributes supported by GCC and ICC, the pure and const attributes should be attached to the declarative prototype of the function, so that the compiler know about them when it finds a call to the function even without its definition. For static functions, the attribute can be attached to the definition by putting it between the return type and the name of the function:

    int extern_pure_function([...])
      __attribute__((pure));
    int extern_const_function([...])
      __attribute__((const));

    int __attribute__((pure)) static_pure_function([...]) {
      [...]
    }

    int __attribute__((const)) static_const_function([...]) {
      [...]
    }
    

Pure and Constant Functions

For what concerns the scope of this article, functions can be divided into three categories, from the smallest to the biggest: constant functions, pure functions and the remaining functions can be called normal functions.

As you can guess, constant functions are also pure functions, but pure functions cannot be not all pure functions are constant functions. In many ways, constant functions are a special case of pure functions. It is, therefore, best to first define pure functions and how they differ from all the rest of the functions.

A pure function is a function with basically no side effect. This means that pure functions return a value that is calculated based on given parameters and global memory, but cannot affect the value of any other global variable. Pure functions cannot reasonably lack a return type (i.e. have a void return type).

GCC documentation provides strlen() as an example of a pure function. Indeed, this function takes a pointer as a parameter, and accesses it to find its length. This function reads global memory (the memory pointed to by parameters is not considered a parameter), but does not change it, and the value returned derives from the global memory accessed.

A counter-example of a non-pure function is the strcpy() function. This function takes two pointers as parameters. It accesses the latter to read the source string, and the former to write to the destination string. As I said, the memory areas pointed to by the parameters are not parameters on their own, but are considered global memory and, in that function, global memory is not only accessed for reading, but also for writing. The return value derives directly from the parameters (it is the same as the first parameter), but global memory is affected by the side effect of strcpy(), making it not pure.

Because the global memory state remains untouched, two calls to the same pure function with the same parameters will have to return the same value. As we'll see, it is a very important assumption that the compiler is allowed to make.

A special case of pure functions is constant functions. A pure function that does not access global memory, but only its parameters, is called a constant function. This is because the function, being unrelated to the state of global memory, will always return the same value when given the same parameters. The return value is thus derived directly and exclusively from the values of the parameters given.

The way a constant function "consumes" pointers is very different from the way other functions do: it can handle them as both parameter and return value only if they are never dereferenced, for accessing the memory they are referencing would be a global memory access, which breaks the requirements of constant functions.

Of course these requirements have to apply not only to the operations in the given function, but also recursively to all the functions it calls. One function can at best be of the same kind of the least restrictive kind of function it calls. So when it calls a normal function it can't be but a normal function itself, if it only calls pure functions it can be either pure or normal, but not constant, and if it only calls constant functions it can be constant.

As with inlining, the compiler will be able to decide if a function is pure or constant, in case no attribute is attached to it, only if the function is static (with the exception of special cases for freestanding code and other advanced options). When a function is not static, even if it's local, the compiler will assume that the function can be overridden at link- or run-time so it will not make any assumption based on the body for the definition it may find.

Optimizing Function Calls

Why should developers bother with marking functions pure or constant, though? As I said, these two attributes help the compiler to know some semantic meaning of a function call, so that it can apply higher optimization than to normal functions.

There are two main optimizations that can be applied to these kinds of functions: CSE (Common Sub-expression Elimination) and DCE (Dead Code Elimination). We'll soon see in detail, with the help of the compiler itself, what these two consist of. Their names, however, are already rather explicit: CSE is used to avoid duplicating the same code inside a function, usually factoring out the code before branching or storing the results of common operations in temporary variables (registers or stack), while DCE will remove code that would never be executed or that would be executed but never used.

These are both optimization that can be implemented in the source code, to an extent, reducing the usefulness of declaring functions pure or constant. On the other hand, as I'll demonstrate, doing so often reduces the readability of the code by obscuring the actual algorithm in favor of making it faster. This does not apply to all cases though, sometimes, doing the optimization "manually", directly in the source code, makes it more readable, and makes the code resemble the output of the compiler more.

About Assemblers and Examples

When talking about optimization, it's quite difficult to visualize the task of the compiler, and the way the code morphs from what you read in the C source code into what the CPU is really going to execute. For this reason, the best way to write about them is to use examples, showing what the compilers generates starting from the source code.

Given the way in which GCC works, this is actually quite easy. You just need to enable optimization and append the -S switch to the gcc command line. This switch stops the compiler after the transformation of C source code into assembly, before the result is passed to the assembler program to produce the object file.

Although I suspect a good fraction of the people reading this article would be comfortable reading IA-32 or x86-64 assembly code, I decided to use the Blackfin [1] assembly language, which should be readable for people who have never studied a particular assembly language.

The Blackfin assembler is more symbolic than IA-32: instead of having operations named movl and addq, the operations are identified by their algebraic operators (=, +), while the registers are merely called R1, R2 and so on.

Calling conventions are also quite easy to understand: for all the cases we'll look through in the article (at most four parameters, integers or pointers), the parameters are passed through the registers, starting in order from R0. The return value of the function call is also stored in the R0 register.

To clarify the examples which will appear later on, let's see how the following C source code is translated by GCC into Blackfin code:

    int somefunction(int a, int b, int c);
    void somestringfunction(char *pA, char *pB);

    int globalvar;

    void test() {
      somestringfunction("foo", "bar");
      globalvar = somefunction(11, 22, 33);
    }
      

becomes:

	    .section	.rodata
	    .align 4
    L$LC$0:
	    .string	"foo"
	    .align 4
    L$LC$1:
	    .string	"bar"
    .text;
	    .align 4
    .global _test;
    .type _test, STT_FUNC;
    _test:
	    LINK 12;
	    R0.H = L$LC$0; 1
	    R0.L = L$LC$0;
	    R1.H = L$LC$1;
	    R1.L = L$LC$1;
	    call _somestringfunction; 2
	    R0 = 11 (X); 3
	    R1 = 22 (X);
	    R2 = 33 (X);
	    call _somefunction;
	    P2.H = _globalvar; 4
	    P2.L = _globalvar;
	    [P2] = R0; 5
	    UNLINK;
	    rts; 6
	    .size	_test, .-_test
      
1

As the Blackfin does not have 32 bit immediate load, you have to load high and low addresses separately (in whichever order); the assembler will take care of properly loading the high 16 bits of the label to the upper part of the register, and the low 16 bits to the lower part.

2 Once the parameters are loaded, the function is called almost identically to any other call operation on other architectures; note the prefixed underscore on symbols' names.

3 Integers, both constant or parameters and variables, are also loaded for calls in the registers. Blackfin doesn't have 32 bit immediate loading, but if the constant to load fits into 16 bits, it can be loaded through sign extension by appending the (X) suffix.

4 When accessing a global memory location, the P2 pointer is set to the address of the memory location...

5 ... and then dereferenced to assign that memory area. Being a RISC architecture, Blackfin does not have direct memory operations.

The return value for a function is loaded into the R0 register, and can be accessed from there.

6 The rts command is the return from subroutine, and usually indicates the end of the function, but like the return statement in C, it might appear in any place of the routine.

In the following examples, the preambles with declarations and data will be omitted whenever these are not useful to the discussion.

Concerning optimization levels, the code will almost always be compiled with at least the first optimization level enabled (-O1). This both because it makes the code cleaner to read (using register-register copy for parameters passing, instead of saving to the stack and then restoring from that) and because we need optimization enabled to see how they are applied.

Also, most of the times I'll refer to the fastest alternative. Most of what I say, though, applies also to the smaller alternative when using the -Os optimization level. In any case, the compiler always weighs the cost-to-benefit ratio between the optimized and the unoptimized version, or between different optimized versions. If you want to know the exact route the compiler takes for your code, you can always use the -S switch to find out.

DCE and Unused Variables

One area where DCE is useful is to avoid operations that result in unused data. It's not that uncommon that a variable is defined by an operation, complex or not, and is then never used by the code, either because it is intended for future expansion or because it's a remnant of older code that has been removed or replaced. While the best thing would be to get rid of the definition entirely, users expect the compiler to produce a good result with sloppy code too, and that operation should not be emitted.

The DCE pass can remove all the code that has no side effect, when its result is not used. This includes all mathematical operations and functions known to be pure or constant (as neither are allowed to change the global state of the variables). If a function call is not known to be at least pure, it may change the global state, and its call will not be eliminated, as shown in the following code:

    int someimpurefunction(int a, int b);
    int somepurefunction(int a, int b)
      __attribute__((pure));

    int testfunction(int a, int b, int c) {
      int res1 = someimpurefunction(c, b);
      int res2 = somepurefunction(b, c);
      int res3 = a + b - c;

      return a;
    }
      

Which, once compiled with -O1, [2] produces the following Blackfin assembler:

    _testfunction:
	    [--sp] = ( r7:7 );

	    LINK 12;
	    R7 = R0;
	    R0 = R2;
	    call _someimpurefunction;
	    R0 = R7;
	    UNLINK;
	    ( r7:7 ) = [sp++];

	    rts;
      

As you can see, the call to the pure function has been eliminated (the res2 variable was not being used), together with the algebraic operation but, the impure function, albeit having its return value discarded, is still called. This is due to the fact that the compiler emits the call, not knowing whether the latter function has side effects on the global memory state or not.

This is equivalent to the following code (which produces the same assembler code):

    int someimpurefunction(int a, int b);

    int testfunction(int a, int b, int c) {
      someimpurefunction(c, b);

      return a;
    }
      

The Dead Code Elimination optimization can be very helpful to reduce the overhead caused by code written to conform to C89 standard, where you couldn't mix variables (and constant) declarations with executable code.

In those sources, you had to declare variables at the top of the function, and then start to check for prerequisites. If you wanted to make it explicit that some variable had to keep its value, by making it constant, you would often have to fill them before the prerequisites could be checked.

Without discussing legacy code, it is also useful when writing debug code, so that it doesn't look out of place from the use of lots of #ifdef directives. Take for instance the following code:

    #ifdef NDEBUG
    # define assert_se(x) (x)
    #else
    void assert_se(int boolean);
    #endif

    char *getsomestring(int i) __attribute__((pure));
    int dosomethinginternal(void *ctx, int code, int val);

    int dosomething(void *ctx, int code, int val) {
      char *string = getsomestring(code);

      // returning string might be a sub-string of "something"
      // like "some" or "so"
      assert_se(strncmp(string, "something", strlen(string)) == 0);

      return dosomethinginternal(ctx, code, val);
    }
      

The assert_se macro has different behavior from the standard assert, as it has side effects, which basically means that the code passed to the assertion is called even though the compiler is told to disable debugging. This is a somewhat common trick, although its effects on readability are debatable.

With getsomestring() pure, when compiling without debugging, the DCE will remove the calls to all three functions: getsomestring(), strncmp() and strlen() (the latter two are usually declared as pure by both the C library and by GCC's built-in replacements). This because none of these functions have a side effect, resulting in a very short function:

    _dosomething:
	    LINK 0;
	    UNLINK;
	    jump.l _dosomethinginternal;
      

If our getsomestring() function weren't pure, even though its return value is not going to be used, the compiler would have to emit the call, resulting in rather more complex (albeit still simple, compared with most real-world functions) assembler code:

    _dosomething:
	    [--sp] = ( r7:5 );

	    LINK 12;
	    R7 = R0;
	    R0 = R1;
	    R6 = R1;
	    R5 = R2;
	    call _getsomestring;
	    UNLINK;
	    R0 = R7;
	    R1 = R6;
	    R2 = R5;
	    ( r7:5 ) = [sp++];

	    jump.l _dosomethinginternal;
      

Common Sub-expression Elimination

The Common Sub-expression Elimination optimization is one of the most important optimizations performed by the compiler, because it's the one that, for instance, replaces multiple indexed accesses to an array so that the actual memory address is calculated just once.

What this optimization does is to find common operations executed on the same operands (even when they are not known at compile-time), decide which ones are more expensive than saving the result in a temporary (register or stack), and then swapping the code around to take the cheapest course.

While its uses are quite varied, one of the easiest ways to see the work of the CSE is to look at the code generated when using the ternary if operator. Let's take the following code:

    int someimpurefunction(int a);
    int somepurefunction(int a)
      __attribute__((pure));

    int testfunction(int a, int b, int c, int d) {

      int res1 = someimpurefunction(a) ? someimpurefunction(a) : b;
      int res2 = somepurefunction(a) ? somepurefunction(a) : c;
      int res3 = a+b ? a+b : d;

      return res1+res2+res3;
    }
      

The compiler will optimize the code as:

    _testfunction:
	    [--sp] = ( r7:4 );

	    LINK 12;
	    R7 = R0;
	    R5 = R1;
	    R4 = R2;
	    call _someimpurefunction;
	    cc =R0==0;
	    if !cc jump L$L$2;
	    R6 = R5;
	    jump.s L$L$4;
    L$L$2:
	    R0 = R7;
	    call _someimpurefunction;
	    R6 = R0;
    L$L$4:
	    R0 = R7;
	    call _somepurefunction;
	    R1 = R0;
	    cc =R0==0;
	    if cc R1 =R4; /* movsicc-1b */
	    R0 = R5 + R7;
	    cc =R0==0;
	    R2 = [FP+36];
	    if cc R0 =R2; /* movsicc-1b */
	    R1 = R1 + R6;
	    R0 = R1 + R0;
	    UNLINK;
	    ( r7:4 ) = [sp++];

	    rts;
      

As you can see, the pure function is called just once, because the two references inside the ternary operator are equivalent, while the other one is called twice. This is because there was no change to global memory known to the compiler between the two calls of the pure function (the function itself couldn't change it – note that the compiler will never take multi-threading into account, even when asking for it explicitly through the -pthread flag), while the non-pure function is allowed to change global memory or use I/O operations.

The equivalent code in C would be something along the following lines (it differs a bit because the compiler will use different registers):

    int someimpurefunction(int a);
    int somepurefunction(int a)
      __attribute__((pure));

    int testfunction(int a, int b, int c, int d) {

      int res1 = someimpurefunction(a) ? someimpurefunction(a) : b;
      const int tmp1 = somepurefunction(a);
      int res2 = tmp1 ? tmp1 : c;
      const int tmp2 = a+b;
      int res3 = tmp2 ? tmp2 : d;

      return res1+res2+res3;
    }
      

The Common Sub-expression Elimination optimization is very useful when writing long and complex mathematical operations. The compiler can find common calculations even though they don't look common to the naked eye, and act on those.

Although sometimes you can get away with using multiple constants or variables to carry out temporary operations so that they can be re-used in the following calculations, leaving the formulae entirely explicit is usually more readable, as long as the formulae are not intended to change.

Like with other algorithms, there are some advantages to reducing the source code used to calculate the same thing; for instance you can easily make a change directly to the definition of a constant and get the change propagated to all the uses of that constant. On the other hand, this can be quite a problem if the meaning of two calculations is very different (and thus can vary in different ways with the evolution of the code), and just happen to be calculated in the same way at a given time.

Another rather useful place where the compiler can further optimize code with CSE, where it wouldn't be so nice or simple to do manually in the source code, is where you deal with static functions that are inlined by the compiler.

Let's examine the following code for instance:

    extern int a;
    extern int b;

    static inline int somefunc1(int p) {
      return (p * 16) + (3 << a);
    }

    static inline int somefunc2(int p) {
      return (p * 16) + (4 << b);
    }

    extern int res1;
    extern int res2;
    extern int res3;
    extern int res4;

    void testfunc(int p1, int p2)
    {
      res1 = somefunc1(p1);
      res2 = somefunc2(p1);
      res3 = somefunc1(p2);
      res4 = somefunc2(p2);
    }
      

In this code, you can find four basic expressions: (p1 * 16), (p2 * 16), (3 << a) and (4 << b). Each of these four expressions is used twice in the somefunc() function. Thanks to the CSE, though, the code will calculate each of them once, even though they cross the function boundary, producing the following code:

    _testfunc:
	    [--sp] = ( r7:7 );

	    LINK 0;

	    R0 <<= 4;
	    R1 <<= 4;

	    P2.H = _a;
	    P2.L = _a;
	    R2 = [P2];
	    R7 = 3 (X);
	    R7 <<= R2;

	    P2.H = _b;
	    P2.L = _b;
	    R2 = [P2];
	    R3 = 4 (X);
	    R3 <<= R2;

	    R2 = R0 + R7;
	    P2.H = _res1;
	    P2.L = _res1;
	    [P2] = R2;
	    P2.H = _res2;
	    P2.L = _res2;
	    R0 = R0 + R3;
	    [P2] = R0;

	    R7 = R1 + R7;
	    P2.H = _res3;
	    P2.L = _res3;
	    [P2] = R7;
	    R1 = R1 + R3;
	    P2.H = _res4;
	    P2.L = _res4;
	    [P2] = R1;
	    UNLINK;
	    ( r7:7 ) = [sp++];

	    rts;
      

As you can easily see (the assembly was modified a bit to improve its readability, the compiler re-ordered loads of registers to avoid pipeline stalls, making it harder to see the point), the four expressions are calculated first, and stored respectively in the registers R0, R1, R7 and R3.

These kinds of sub-expressions are usually harder to see in the code and also harder to implement. Sometimes they get factored out on their own parameter, but that can be more expensive during execution, depending on the calling conventions of the architecture.

Cheats

As I wrote above, there are some requirements that apply to functions that are declared pure and constant, related to not changing or accessing global memory; not executing I/O operations; and, of course, not calling further impure functions. The reason for this is that the compiler will accept what the user declares the function to be, whatever its body is (as it's usually unknown by the compiler at the call stage).

Sometimes, though, it's possible to fool the compiler so that it treats impure functions as pure or even constant functions. Although this is a risky endeavor, as it might truly cause bad code generation by the compiler, it can sometimes be used to force optimization for particular functions.

An example of this can be a lookup function that scans through a global table to return a value. While it is accessing global memory, you might want the compiler to promote it to a constant function, rather than simply to a pure one.

Let's take for instance the following code:


    const struct {
      const char *str;
      int val;
    } strings[] = {
      { "foo", 31 },
      { "bar", 34 },
      { "baz", -24 }
    };

    const char *lookup(int val) {
      int i;
      for(i = 0; i < sizeof(strings)/sizeof(*strings); i++)
	if ( strings[i].val == val )
	  return strings[i].str;

      return NULL;
    }

    void testfunction(int val, const char **str, unsigned long *len) {
      if ( lookup(val) ) {
	*str = lookup(val);
	*len = strlen(lookup(val));
      }
    }

      

If the lookup() function is only considered a pure function, as it is, adhering to the rules we talked about at the start of the article, it will be called three times in testfunction(), like this:

    _testfunction:
	    [--sp] = ( r7:7, p5:4 );

	    LINK 12;
	    R7 = R0;
	    P5 = R1;
	    P4 = R2;
	    call _lookup;
	    cc =R0==0;
	    if cc jump L$L$17;
	    R0 = R7;
	    call _lookup;
	    [P5] = R0;
	    R0 = R7;
	    call _lookup;
	    call _strlen;
	    [P4] = R0;
    L$L$17:
	    UNLINK;
	    ( r7:7, p5:4 ) = [sp++];

	    rts;
      

Instead, we can trick the compiler by declaring the lookup() function as constant (the data it is reading is constant, after all, so at a given parameter it will always return the same result). If we do that, the three calls will have to return the same value, and the compiler will be able to optimize them as a single call:

    _testfunction:
	    [--sp] = ( p5:4 );

	    LINK 12;
	    P5 = R1;
	    P4 = R2;
	    call _lookup;
	    cc =R0==0;
	    if cc jump L$L$17;
	    [P5] = R0;
	    call _strlen;
	    [P4] = R0;
    L$L$17:
	    UNLINK;
	    ( p5:4 ) = [sp++];

	    rts;
      

In addition to lookup functions on constant tables, this trick is useful with functions which read data from files or other volatile data, and cache it in a memory variable.

Take for instance the following function that reads an environment variable:

    char *get_testval() {
      static char *cachedval = NULL;

      if ( cachedval == NULL ) {
	cachedval = getenv("TESTVAL");
	if ( cachedval == NULL )
	  cachedval = "";
	else
	  cachedval = strdup(cachedval);
      }

      return cachedval;
    }
      

This is not truly a constant function, as its return value depends on the environment. Even so, assuming that the environment of the process is left untouched, its return value will never change between calls. Even though it will affect the global state of the program (as the cachedval static variable will be filled in the first time the function is called), it can be assumed to always return the same value.

Tricking the compiler into thinking that a function is constant even though it has to load data through I/O operations, as I said, is risky, as the compiler will think there is no I/O operation going on; on the other hand, this trick might make a difference sometimes, as it allows the expression of functions in more semantic ways, leaving it up to the compiler to optimize the code with temporaries, where needed.

One example can be the following code:

    char *get_testval() {
      static char *cachedval = NULL;

      if ( cachedval == NULL ) {
	cachedval = getenv("TESTVAL");
	if ( cachedval == NULL )
	  cachedval = "";
	else
	  cachedval = strdup(cachedval);
      }

      return cachedval;
    }

    extern int a;
    extern int b;
    extern int c;
    extern int d;

    static int testfunc1() {
      if ( strcmp(get_testval(), "FOO") == 0 )
	return a;
      else
	return b;
    }

    static int testfunc2() {
      if ( strcmp(get_testval(), "BAR") == 0 )
	return c;
      else
	return d;
    }

    int testfunction() {
      return testfunc1() + testfunc2();
    }
      
Note: To make sure that the compiler won't reduce the three function calls to their return values right away, the static sub-functions return values taken from global variables; the meanings of those variables are not important.

Considering the above source code, if get_testval() is impure, as the compiler will automatically find it to be, it will be compiled into:

    _testfunction:
	    [--sp] = ( r7:7 );

	    LINK 12;
	    call _get_testval;
	    R1.H = L$LC$2;
	    R1.L = L$LC$2;
	    call _strcmp;
	    cc =R0==0;
	    if !cc jump L$L$11 (bp);
	    P2.H = _a;
	    P2.L = _a;
	    R7 = [P2];
    L$L$13:
	    call _get_testval;
	    R1.H = L$LC$3;
	    R1.L = L$LC$3;
	    call _strcmp;
	    cc =R0==0;
	    if !cc jump L$L$14 (bp);
	    P2.H = _c;
	    P2.L = _c;
	    R0 = [P2];
	    UNLINK;
	    R0 = R0 + R7;
	    ( r7:7 ) = [sp++];

	    rts;
    L$L$11:
	    P2.H = _b;
	    P2.L = _b;
	    R7 = [P2];
	    jump.s L$L$13;
    L$L$14:
	    P2.H = _d;
	    P2.L = _d;
	    R0 = [P2];
	    UNLINK;
	    R0 = R0 + R7;
	    ( r7:7 ) = [sp++];

	    rts;
      

As you can see, the get_testval() is called twice, even though its result will be identical. If we declare it constant, instead, the code of our test function will be the following:

    _testfunction:
	    [--sp] = ( r7:6 );

	    LINK 12;
	    call _get_testval;
	    R1.H = L$LC$2;
	    R1.L = L$LC$2;
	    R7 = R0;
	    call _strcmp;
	    cc =R0==0;
	    if !cc jump L$L$11 (bp);
	    P2.H = _a;
	    P2.L = _a;
	    R6 = [P2];
    L$L$13:
	    R1.H = L$LC$3;
	    R0 = R7;
	    R1.L = L$LC$3;
	    call _strcmp;
	    cc =R0==0;
	    if !cc jump L$L$14 (bp);
	    P2.H = _c;
	    P2.L = _c;
	    R0 = [P2];
	    UNLINK;
	    R0 = R0 + R6;
	    ( r7:6 ) = [sp++];

	    rts;
    L$L$11:
	    P2.H = _b;
	    P2.L = _b;
	    R6 = [P2];
	    jump.s L$L$13;
    L$L$14:
	    P2.H = _d;
	    P2.L = _d;
	    R0 = [P2];
	    UNLINK;
	    R0 = R0 + R6;
	    ( r7:6 ) = [sp++];

	    rts;
      

The CSE pass combines the two calls to get_testval with one. Again, this is one of the optimizations that are harder to achieve by manually changing the source code since the compiler can have a larger view of the use of its value. A common way to handle this is by using global variables, but that might require one more load from the memory, while CSE can take care of keeping the values in registers or on the stack.

Conclusions

After what you have read about pure and constant functions, you might have some concerns about the average use of them. Indeed, in a lot of cases, these two attributes allow the compiler to do something you can easily achieve by writing better code.

There are two objectives you have to keep in mind that are related to the use of these (and other) attributes. The first is code readability because sometimes the manually optimized functions are harder to read than what the compiler can produce. The second is allowing the compiler to optimize legacy or external code.

While you might not be too concerned with letting legacy code or code written by someone else get away with slower execution, a pragmatic view of the current Free Software world should take into consideration the fact that there are probably thousands lines of code of legacy code around. Some of that code, written with pre-C99 declarations, might be even using libraries that are being developed with their older interface, which could be improved by providing some extra semantic information to the compiler through use of attributes.

Also, it's unfortunately true that extensive use of these attributes might be seen by neophytes as an easy solution to let sloppy code run at a decent speed. On the other hand, the same attributes could be used to identify such sloppy code through analysis of the source code.

Although GCC does not issue warnings for all of these cases, it already warns for some of them, like unused variables, or statements without effect (both triggered by the DCE). In the future more warnings might be reported if pure and constant functions get misused.

In general, like with many other GCC function attributes, their use is tightly related to how programmers perceive their task. Most pragmatic programmers would probably like these tools, while purists will probably dislike the way these attributes help sloppy code to run almost as fast as properly written code.

My hopes are that in the future better tools will make good use of these and other attributes on different levels than compilers, like static and dynamic analyzers.


[1] The Blackfin architecture is a RISC architecture developed by Analog Devices, supported by both GCC and Binutils (and Linux, but I'm not interested in that here).

[2] I have chosen -O1 rather than -O2 because in the latter case the compiler performs extra optimization passes that I do not wish to discuss within the scope of this article.

Comments (45 posted)

Page editor: Jake Edge

Inside this week's LWN.net Weekly Edition

  • Security: SCADA system vulnerabilities; New vulnerabilities in kernel, openoffice.org, snort, tomcat,...
  • Kernel: The linux-staging tree; 2.6.26 API changes; Andrew Morton interview
  • Distributions: openSUSE merges forums ahead of 11.0 release; 64 Studio 2.1; Debian Installer lenny beta 2; Mandriva Flash 2008 Spring; openSUSE Build Service 1.0 RC 1; Celebrating 10 years of Mandriva
  • Development: Detect and record video movement with Motion, About Django and the importance of releases, threads in Python, new versions of pgAdmin III, BusyBox, JVoiceXML, OrangeHRM, GNOME, GARNOME, pdfposter, Gimp User Filter, ScaleTempo, OO.o, Dirac, Souzou, GCC, GNU Classpath, Qt Jambi, UMLet.
  • Press: Stallman complains about the Oyster payment system, Acer likes Linux for laptops, Google Gadgets for Linux, multitouch gesture support for Synaptics TouchPads, KDE 4.1 snapshot review, Linux vs. Windows on power usage.
  • Announcements: Ardour in Summer Code Finland, EFF on unmasking MySpace user, Mozilla parties, SFLC files more BusyBox suits, SUSE Linux supercomputers, Win4Lin Desktop 5, ODBMS.org panel discussion, OO.o conf cfp, PyOhio cfp, French KDE Day videos.
Next page: Security>>

Copyright © 2008, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds