June 10, 2008
This article was contributed by Diego Pettenò
Introduction
Attributes and why you should use them
Free Software development is often a fun task for developers,
and it is its (on average) low barrier to entry that makes so
much software available for so many different tasks. This low
barrier to entry, though, is also probably the cause of the
widely varying code quality of these projects.
Most of the time, the quality issues one can find are not
related to developers' lack of skill, but rather to lack of
knowledge of how the tools work, in particular, the
compiler. For non-interpreted languages, the compiler is
probably the most complex tool developers have to deal
with. Because a lot of Free Software is written in C, GCC is
often the compiler of choice.
Modern compilers are also expected to do a great job of
optimization: taking code, often written with maintainability
and readability in mind, and translating it into assembly with
a focus on performance. The code analysis performed for
optimization (which is also used to generate warnings) takes a
semantic, rather than syntactic, look at the code, identifying
fragments of algorithms that can be replaced with faster code
(or with code that has a smaller memory footprint, if the user
so desires).
This task is a pretty complex one and relies on the compiler
knowing about the functions called by the code. For instance, the
compiler might know when to replace a call to a (local, static)
function with its body (inlining) by
looking at its size, the number of times it is called, and its
content (loops, other calls, variables it uses). This is because
the compiler can give a semantic value to the code for a
function, and can thus assess the costs and benefits of a
particular transformation at the time of its use.
I specified above that the compiler knows when to
inline a function by looking at its
content. Almost all optimizations related to function calls work
this way: the compiler, knowing the body of a function, can
decide when to replace a call with its body; when
it is possible to avoid calling the function at all;
and when it is possible to call it just once and thereby
avoid multiple calls. This means, though, that these
optimizations can be applied only to functions that are defined
in the same unit wherein they are used. These functions are
usually limited to static functions (functions that are not
defined as static can often be overridden both at link time and
runtime, so the compiler cannot safely assume that what it finds
in the unit is what the code will be calling).
As this is far from optimal, modern compilers like GCC provide a
way for the developer to provide information about the semantics
of a function, through the use of
attributes attached to declarations of
functions and other symbols. These attributes provide
information to the compiler on what the function does, even
though its body is not available. Consequently, the compiler can
optimize at least some of its calls.
This article will focus on two particular attributes that GCC
makes available to C developers: pure and
const, which can declare a function as
either pure or
constant. The next section will provide a
definition of these two kinds of functions, and after that I'll
get into an analysis of some common optimizations that can be
performed on the calls of these functions.
As with all the other function attributes supported by GCC and
ICC, the pure and
const attributes should be attached to the
declared prototype of the function, so that the compiler knows
about them when it finds a call to the function, even without its
definition. For static functions, the attribute can be attached
to the definition by putting it between the return type and the
name of the function:
int extern_pure_function([...])
__attribute__((pure));
int extern_const_function([...])
__attribute__((const));
int __attribute__((pure)) static_pure_function([...]) {
[...]
}
int __attribute__((const)) static_const_function([...]) {
[...]
}
Pure and Constant Functions
For the purposes of this article, functions can be
divided into three categories, from the smallest set to the
largest: constant functions, pure
functions, and the remaining functions, which can be called
normal functions.
As you can guess, constant functions are also pure functions,
but not all pure functions are constant
functions. In many ways, constant functions are a special case
of pure functions. It is, therefore, best to first define pure
functions and how they differ from all the rest of the
functions.
A pure function is a function with essentially no side
effects. Pure functions return a value that
is calculated from the given parameters and from global memory,
but they cannot affect the value of any global variable. Pure
functions cannot reasonably lack a return value
(i.e. have a void return type).
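To make the definition concrete, here is a minimal sketch of a pure function (the function and data are hypothetical): it reads global memory, but never writes it and has no other side effects:

```c
#include <stddef.h>

/* Global state that the pure function reads but never modifies. */
static const char buffer[] = "hello world";
static const size_t buffer_len = sizeof(buffer) - 1;

/* count_char derives its result from its parameter and from global
 * memory (the buffer), without writing anything: a pure function. */
static size_t count_char(char c) __attribute__((pure));

static size_t count_char(char c)
{
    size_t n = 0;
    for (size_t i = 0; i < buffer_len; i++)
        if (buffer[i] == c)
            n++;
    return n;
}
```

Two consecutive calls to count_char('l'), with no intervening write to the buffer, may thus be merged by the compiler into one.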
GCC documentation provides strlen() as an
example of a pure function. Indeed, this function takes a pointer
as a parameter, and accesses it to find its length. This
function reads global memory (the memory pointed to by
parameters is not considered a parameter), but does not change
it, and the value returned derives from the global memory
accessed.
A counter-example is the
strcpy() function, which is not
pure. This function takes two
pointers as parameters. It accesses the latter to read the
source string, and the former to write to the destination
string. As I said, the memory areas pointed to by the parameters
are not parameters on their own, but are considered global
memory and, in that function, global memory is not only accessed for
reading, but also for writing. The return value derives directly
from the parameters (it is the same as the first parameter), but
global memory is affected by the side effect of
strcpy(), making it not pure.
Because the global memory state remains untouched, two calls
to the same pure function with the same parameters have to
return the same value. As we'll see, this is a very important
assumption that the compiler is allowed to make.
A special case of pure functions is constant functions. A pure
function that does not access global memory, but only its
parameters, is called a constant function. This is because the
function, being unrelated to the state of global memory, will
always return the same value when given the same parameters. The
return value is thus derived directly and exclusively from the
values of the parameters given.
Constant functions handle pointers very differently from other
functions: a constant function may take or return a pointer
only if that pointer is never dereferenced, since accessing the
memory it references would be a global memory access, which
breaks the requirements of constant functions.
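As a sketch (names hypothetical), a constant function computes its result from the parameter values alone; a pointer parameter is acceptable as long as only the address itself, never the pointed-to memory, enters the computation:

```c
#include <stdint.h>

/* Depends only on its arguments: same inputs, same result. */
static int align_up(int value, int alignment) __attribute__((const));

static int align_up(int value, int alignment)
{
    return (value + alignment - 1) & ~(alignment - 1);
}

/* Takes a pointer but never dereferences it: only the address value
 * itself is used, so the function can still be constant. */
static uintptr_t tag_pointer(const void *p, uintptr_t tag) __attribute__((const));

static uintptr_t tag_pointer(const void *p, uintptr_t tag)
{
    return (uintptr_t)p | tag;
}
```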
Of course these requirements apply not only to the
operations in the given function, but also recursively to all
the functions it calls. A function can be, at best, of the same
kind as the least restrictive function it calls: if it calls a
normal function, it can only be a normal function itself; if it
only calls pure functions, it can be pure or
normal, but not constant; and if it only calls constant
functions, it can be constant.
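The propagation rule can be sketched like this (all names hypothetical): a function calling only constant functions may itself be constant, while one that calls a pure function can be, at best, pure:

```c
/* Constant: depends only on its argument. */
static int square(int x) __attribute__((const));
static int square(int x) { return x * x; }

/* Pure at best: it reads global memory. */
static const int table[16] = { 0, 1, 2, 3 };
static int table_lookup(int i) __attribute__((pure));
static int table_lookup(int i) { return table[i & 15]; }

/* Only calls a constant function, so it may be constant itself. */
static int sum_of_squares(int a, int b) __attribute__((const));
static int sum_of_squares(int a, int b) { return square(a) + square(b); }

/* Calls a pure function, so it can be declared pure, but not const. */
static int scaled_lookup(int i, int k) __attribute__((pure));
static int scaled_lookup(int i, int k) { return table_lookup(i) * k; }
```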
As with inlining, when no attribute is attached, the compiler
can decide on its own whether a function is pure or constant
only if the function is
static (with the exception of special cases for freestanding
code and other advanced options). When a function is not static,
even if it's local, the compiler will assume that the function
can be overridden at link or run time, so it will not make any
assumptions based on the body of the definition it may find.
Optimizing Function Calls
Why should developers bother with marking functions pure or
constant, though? As I said, these two attributes give the
compiler some semantic knowledge about a function call, so
that it can apply optimizations that would be unsafe for normal
functions.
There are two main optimizations that can be applied to these
kinds of functions: CSE
(Common Sub-expression Elimination) and
DCE (Dead Code
Elimination). We'll soon see in detail, with the help of the
compiler itself, what these two consist of. Their names,
however, are already rather explicit: CSE is
used to avoid duplicating the same code inside a function,
usually factoring out the code before branching or storing the
results of common operations in temporary variables (registers
or stack), while DCE will remove code that
would never be executed or that would be executed but never
used.
These are both optimizations that can be implemented in the
source code, to an extent, reducing the usefulness of declaring
functions pure or constant. On the other hand, as I'll
demonstrate, doing so often reduces the readability of the code
by obscuring the actual algorithm in favor of making it
faster. This does not apply to all cases though, sometimes, doing
the optimization "manually", directly in the source code, makes
it more readable, and makes the code resemble the output of
the compiler more.
About Assemblers and Examples
When talking about optimization, it's quite difficult to
visualize the task of the compiler, and the way the code
morphs from what you read in the C source code into what the
CPU is really going to execute. For this reason, the best way
to write about them is to use examples, showing what the
compiler generates from the source code.
Given the way in which GCC works, this is actually quite
easy. You just need to enable optimization and append the
-S switch to the gcc
command line. This switch stops the compiler after the
transformation of C source code into assembly, before the
result is passed to the assembler program to produce the
object file.
Although I suspect a good fraction of the people reading this article
would be comfortable reading IA-32 or x86-64 assembly code, I
decided to use the Blackfin
[1]
assembly language, which should be readable for people who
have never studied a particular assembly language.
The Blackfin assembler is more symbolic than IA-32: instead of
having operations named movl and
addq, the operations are identified by
their algebraic operators (=,
+), while the registers are merely called
R1, R2 and so on.
Calling conventions are also quite easy to understand: for all
the cases we'll look through in the article (at most four
parameters, integers or pointers), the parameters are passed
through the registers, starting in order from
R0. The return value of the function call
is also stored in the R0 register.
To clarify the examples which will appear later on, let's see
how the following C source code is translated by GCC into
Blackfin code:
int somefunction(int a, int b, int c);
void somestringfunction(char *pA, char *pB);
int globalvar;
void test() {
somestringfunction("foo", "bar");
globalvar = somefunction(11, 22, 33);
}
becomes:
.section .rodata
.align 4
L$LC$0:
.string "foo"
.align 4
L$LC$1:
.string "bar"
.text;
.align 4
.global _test;
.type _test, STT_FUNC;
_test:
LINK 12;
R0.H = L$LC$0;
R0.L = L$LC$0;
R1.H = L$LC$1;
R1.L = L$LC$1;
call _somestringfunction;
R0 = 11 (X);
R1 = 22 (X);
R2 = 33 (X);
call _somefunction;
P2.H = _globalvar;
P2.L = _globalvar;
[P2] = R0;
UNLINK;
rts;
.size _test, .-_test

As the Blackfin does not have a 32-bit immediate load, you
have to load the high and low halves of an address separately
(in whichever order); the assembler will take care of loading
the high 16 bits of the label into the upper part of the
register, and the low 16 bits into the lower part.
Once the parameters are loaded, the function is called
almost identically to any other call
operation on other architectures; note the prefixed
underscore on symbols' names.
Integer arguments, whether constants, parameters, or
variables, are likewise loaded into registers for the
call. Blackfin doesn't have a 32-bit immediate load, but if
the constant fits into 16 bits, it can be loaded with sign
extension by appending the (X) suffix.
When accessing a global memory location, the
P2 pointer is set to the address of the
memory location...
... and then dereferenced to assign that memory
area. Being a RISC architecture, Blackfin does not have
direct memory operations.
The return value for a function is loaded into the
R0 register, and can be accessed from
there.
The rts command is the return from
subroutine, and usually indicates the end of the function,
but like the return statement in C,
it might appear in any place of the routine.
In the following examples, the preambles with declarations and
data will be omitted whenever these are not useful to the
discussion.
Concerning optimization levels, the code will almost
always be compiled with at least the first optimization level
enabled (-O1). This is both because it makes the code cleaner
to read (using register-to-register copies for parameter
passing, instead of saving to the stack and then restoring
from it) and because we need optimizations enabled to see how
they are applied.
Also, most of the time I'll refer to the
fastest alternative. Most of what I say,
though, applies also to the smaller
alternative when using the -Os optimization level. In any
case, the compiler always weighs the cost-to-benefit ratio
between the optimized and the unoptimized version, or between
different optimized versions. If you want to know the exact
route the compiler takes for your code, you can always use the
-S switch to find out.
DCE and Unused Variables
One area where DCE is useful
is to avoid operations that result in unused data. It's
not that uncommon for a variable to be defined by an operation,
complex or not, and then never used by the code, either
because it is intended for future expansion or because it's a
remnant of older code that has been removed or replaced. While
the best thing would be to get rid of the definition entirely,
users expect the compiler to produce good results from sloppy
code too, so the operation should not be emitted.
The DCE pass can remove any code that
has no side effects, when its result is not used. This includes
all mathematical operations and calls to functions known to be
pure or constant (as neither is allowed to change the global
state). If a function call is not known to be at least
pure, it may change the global state, and so the call will not
be eliminated, as shown in the following code:
int someimpurefunction(int a, int b);
int somepurefunction(int a, int b)
__attribute__((pure));
int testfunction(int a, int b, int c) {
int res1 = someimpurefunction(c, b);
int res2 = somepurefunction(b, c);
int res3 = a + b - c;
return a;
}
Which, once compiled with -O1,
[2]
produces the following Blackfin assembler:
_testfunction:
[--sp] = ( r7:7 );
LINK 12;
R7 = R0;
R0 = R2;
call _someimpurefunction;
R0 = R7;
UNLINK;
( r7:7 ) = [sp++];
rts;
As you can see, the call to the pure function has been
eliminated (the res2 variable was not being
used), together with the algebraic operation; the impure
function, however, is still called, even though its return
value is discarded. The compiler has to emit the call because
it does not know whether the function has side effects on the
global memory state.
This is equivalent to the following code (which
produces the same assembler code):
int someimpurefunction(int a, int b);
int testfunction(int a, int b, int c) {
someimpurefunction(c, b);
return a;
}
The Dead Code Elimination optimization can be very helpful in
reducing the overhead of code written to conform to the C89
standard, where you couldn't mix variable (and constant)
declarations with executable code.
In those sources, you had to declare variables at the top of
the function, and then start checking for prerequisites. If
you wanted to make it explicit that some variable had to keep
its value, by making it constant, you often had to fill it in
before the prerequisites could be checked.
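A sketch of that C89 pattern (the functions are hypothetical): the constant has to be initialized before the prerequisite check, but with parse_header() declared pure, the compiler is free to sink or drop the call on the early-return path:

```c
#include <stddef.h>

/* Hypothetical pure helper: reads memory through its parameter but
 * modifies nothing. */
static int parse_header(const char *buf) __attribute__((pure));
static int parse_header(const char *buf)
{
    return buf[0] - '0';
}

int process(const char *buf, size_t len)
{
    /* C89 style: the declaration (and its initializer) must come
     * before any statement, including the prerequisite check... */
    const int version = parse_header(buf);

    /* ...but since parse_header() is pure, the call above is dead
     * on this path and the compiler may avoid it entirely. */
    if (len < 4)
        return -1;

    return version;
}
```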
Legacy code aside, it is also useful when
writing debug code, which otherwise tends to look out of place
because of the many #ifdef directives it
requires. Take for instance the following code:
#ifdef NDEBUG
# define assert_se(x) (x)
#else
void assert_se(int boolean);
#endif
char *getsomestring(int i) __attribute__((pure));
int dosomethinginternal(void *ctx, int code, int val);
int dosomething(void *ctx, int code, int val) {
char *string = getsomestring(code);
// returning string might be a sub-string of "something"
// like "some" or "so"
assert_se(strncmp(string, "something", strlen(string)) == 0);
return dosomethinginternal(ctx, code, val);
}
The assert_se macro behaves differently
from the standard assert: it has side
effects, which means that the expression passed to the
assertion is evaluated even when the compiler is told to
disable debugging. This is a somewhat common trick, although
its effects on readability are debatable.
With getsomestring() declared pure, when
compiling without debugging, the DCE pass will remove the calls
to all three functions: getsomestring(),
strncmp() and
strlen() (the latter two are usually
declared pure both by the C library and by GCC's built-in
replacements). This is because none of these functions has
side effects, resulting in a very short function:
_dosomething:
LINK 0;
UNLINK;
jump.l _dosomethinginternal;
If our getsomestring() function weren't
pure, even though its return value is not going to be used,
the compiler would have to emit the call, resulting in rather
more complex (albeit still simple, compared with most
real-world functions) assembler code:
_dosomething:
[--sp] = ( r7:5 );
LINK 12;
R7 = R0;
R0 = R1;
R6 = R1;
R5 = R2;
call _getsomestring;
UNLINK;
R0 = R7;
R1 = R6;
R2 = R5;
( r7:5 ) = [sp++];
jump.l _dosomethinginternal;
Common Sub-expression Elimination
The Common Sub-expression Elimination optimization is one of
the most important optimizations performed by the compiler,
because it's the one that, for instance, replaces multiple
indexed accesses to an array so that the actual memory address
is calculated just once.
What this optimization does is find common operations
executed on the same operands (even when their values are not
known at compile time), decide which ones are more expensive
than saving the result in a temporary (register or stack), and
then rearrange the code to take the cheapest course.
While its uses are quite varied, one of the easiest ways to
see the work of the CSE is to look at the
code generated when using the ternary if
operator. Let's take the following code:
int someimpurefunction(int a);
int somepurefunction(int a)
__attribute__((pure));
int testfunction(int a, int b, int c, int d) {
int res1 = someimpurefunction(a) ? someimpurefunction(a) : b;
int res2 = somepurefunction(a) ? somepurefunction(a) : c;
int res3 = a+b ? a+b : d;
return res1+res2+res3;
}
The compiler will optimize the code as:
_testfunction:
[--sp] = ( r7:4 );
LINK 12;
R7 = R0;
R5 = R1;
R4 = R2;
call _someimpurefunction;
cc =R0==0;
if !cc jump L$L$2;
R6 = R5;
jump.s L$L$4;
L$L$2:
R0 = R7;
call _someimpurefunction;
R6 = R0;
L$L$4:
R0 = R7;
call _somepurefunction;
R1 = R0;
cc =R0==0;
if cc R1 =R4; /* movsicc-1b */
R0 = R5 + R7;
cc =R0==0;
R2 = [FP+36];
if cc R0 =R2; /* movsicc-1b */
R1 = R1 + R6;
R0 = R1 + R0;
UNLINK;
( r7:4 ) = [sp++];
rts;
As you can see, the pure function is called just once, because the
two references inside the ternary operator are equivalent,
while the impure one is called twice. This is because no change
to global memory, as far as the compiler knows, can happen
between the two calls of the pure function (the function itself
couldn't change it; note that the compiler never takes
multi-threading into account, even when it is requested
explicitly through the -pthread flag),
while the non-pure function is allowed to change global memory
or perform I/O operations.
The equivalent code in C would be something along the
following lines (it differs a bit because the compiler will
use different registers):
int someimpurefunction(int a);
int somepurefunction(int a)
__attribute__((pure));
int testfunction(int a, int b, int c, int d) {
int res1 = someimpurefunction(a) ? someimpurefunction(a) : b;
const int tmp1 = somepurefunction(a);
int res2 = tmp1 ? tmp1 : c;
const int tmp2 = a+b;
int res3 = tmp2 ? tmp2 : d;
return res1+res2+res3;
}
The Common Sub-expression Elimination optimization is very
useful when writing long and complex mathematical
operations. The compiler can find common calculations even
though they don't look common to the naked eye, and act on
those.
Although sometimes you can get away with using multiple
constants or variables to carry out temporary operations so
that they can be re-used in the following calculations,
leaving the formulae entirely explicit is usually more
readable, as long as the formulae are not intended to change.
Like with other algorithms, there are some advantages to
reducing the source code used to calculate the same thing; for
instance you can easily make a change directly to the
definition of a constant and get the change propagated to all
the uses of that constant. On the other hand, this can be
quite a problem if the meaning of two calculations is very
different (and thus can vary in different ways with the
evolution of the code), and just happen to be calculated in
the same way at a given time.
Another rather useful place where the compiler can further
optimize code with CSE, where it wouldn't be so nice or simple
to do manually in the source code, is where you deal with
static functions that are inlined by the compiler.
Let's examine the following code for instance:
extern int a;
extern int b;
static inline int somefunc1(int p) {
return (p * 16) + (3 << a);
}
static inline int somefunc2(int p) {
return (p * 16) + (4 << b);
}
extern int res1;
extern int res2;
extern int res3;
extern int res4;
void testfunc(int p1, int p2)
{
res1 = somefunc1(p1);
res2 = somefunc2(p1);
res3 = somefunc1(p2);
res4 = somefunc2(p2);
}
In this code, you can find four basic expressions:
(p1 * 16), (p2 *
16), (3 << a) and
(4 << b). Once the calls to
somefunc1() and
somefunc2() are inlined, each of these
four expressions is used twice inside
testfunc(). Thanks to the CSE pass,
though, the code will calculate each of them just once, even
though they cross the function boundaries, producing the
following code:
_testfunc:
[--sp] = ( r7:7 );
LINK 0;
R0 <<= 4;
R1 <<= 4;
P2.H = _a;
P2.L = _a;
R2 = [P2];
R7 = 3 (X);
R7 <<= R2;
P2.H = _b;
P2.L = _b;
R2 = [P2];
R3 = 4 (X);
R3 <<= R2;
R2 = R0 + R7;
P2.H = _res1;
P2.L = _res1;
[P2] = R2;
P2.H = _res2;
P2.L = _res2;
R0 = R0 + R3;
[P2] = R0;
R7 = R1 + R7;
P2.H = _res3;
P2.L = _res3;
[P2] = R7;
R1 = R1 + R3;
P2.H = _res4;
P2.L = _res4;
[P2] = R1;
UNLINK;
( r7:7 ) = [sp++];
rts;
As you can easily see (the assembly was modified a bit to
improve its readability; the compiler had re-ordered the
register loads to avoid pipeline stalls, making the point
harder to see), the four expressions are calculated first, and
stored respectively in the registers R0,
R1, R7 and
R3.
These kinds of sub-expressions are usually harder to spot in
the source code and also harder to factor out by
hand. Sometimes they get hoisted into a parameter of their
own, but that can be more expensive at run time, depending on
the calling conventions of the architecture.
Cheats
As I wrote above, there are some requirements that apply to
functions that are declared pure and constant, related to not
changing or accessing global memory; not executing I/O
operations; and, of course, not calling further impure
functions. The reason for this is that the compiler takes the
user's declaration of the function at face value, whatever its
body does (the body is usually unknown to the compiler at the
call site).
Sometimes, though, it's possible to fool the compiler so that
it treats impure functions as pure or even constant
functions. Although this is a risky endeavor, as it might
truly cause bad code generation by the compiler, it can
sometimes be used to force optimization for particular
functions.
An example of this can be a lookup function that scans through
a global table to return a value. While it is accessing global
memory, you might want the compiler to promote it to a
constant function, rather than simply to a pure one.
Let's take for instance the following code:
const struct {
const char *str;
int val;
} strings[] = {
{ "foo", 31 },
{ "bar", 34 },
{ "baz", -24 }
};
const char *lookup(int val) {
int i;
for(i = 0; i < sizeof(strings)/sizeof(*strings); i++)
if ( strings[i].val == val )
return strings[i].str;
return NULL;
}
void testfunction(int val, const char **str, unsigned long *len) {
if ( lookup(val) ) {
*str = lookup(val);
*len = strlen(lookup(val));
}
}
If the lookup() function is only
considered a pure function, which is what it is according to
the rules discussed at the start of the article, it will be
called three times in testfunction(), like this:
_testfunction:
[--sp] = ( r7:7, p5:4 );
LINK 12;
R7 = R0;
P5 = R1;
P4 = R2;
call _lookup;
cc =R0==0;
if cc jump L$L$17;
R0 = R7;
call _lookup;
[P5] = R0;
R0 = R7;
call _lookup;
call _strlen;
[P4] = R0;
L$L$17:
UNLINK;
( r7:7, p5:4 ) = [sp++];
rts;
Instead, we can trick the compiler by declaring the
lookup() function as constant (the data
it reads is constant, after all, so for a given parameter
it will always return the same result). If we do that, the
three calls will have to return the same value, and the
compiler will be able to merge them into a single call:
_testfunction:
[--sp] = ( p5:4 );
LINK 12;
P5 = R1;
P4 = R2;
call _lookup;
cc =R0==0;
if cc jump L$L$17;
[P5] = R0;
call _strlen;
[P4] = R0;
L$L$17:
UNLINK;
( p5:4 ) = [sp++];
rts;
In addition to lookup functions on constant tables, this
trick is useful with functions that read data from files or
other volatile sources and cache it in a static variable.
Take for instance the following function that reads an
environment variable:
char *get_testval() {
static char *cachedval = NULL;
if ( cachedval == NULL ) {
cachedval = getenv("TESTVAL");
if ( cachedval == NULL )
cachedval = "";
else
cachedval = strdup(cachedval);
}
return cachedval;
}
This is not truly a constant function, as its return value
depends on the environment. Even so, assuming that the
environment of the process is left untouched, its return value
will never change between calls. Even though it will affect
the global state of the program (as the
cachedval static variable will be filled in
the first time the function is called), it can be assumed to
always return the same value.
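Under that assumption, the trick simply consists of attaching the const attribute to the prototype. This is, to be clear, a deliberate lie to the compiler, valid only as long as the environment is not changed while the program runs:

```c
#include <stdlib.h>
#include <string.h>

/* Deliberately over-declared as const: the body performs I/O on the
 * first call, but the cached result never changes afterwards. */
char *get_testval(void) __attribute__((const));

char *get_testval(void)
{
    static char *cachedval = NULL;
    if (cachedval == NULL) {
        cachedval = getenv("TESTVAL");
        if (cachedval == NULL)
            cachedval = "";
        else
            cachedval = strdup(cachedval);
    }
    return cachedval;
}
```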
Tricking the compiler into thinking that a function is constant
even though it loads data through I/O operations is, as I
said, risky, since the compiler will assume no I/O is going
on; on the other hand, this trick can sometimes make a
difference, as it allows functions to be expressed in a more
semantic way, leaving it up to the compiler to introduce
temporaries where needed.
One example can be the following code:
char *get_testval() {
static char *cachedval = NULL;
if ( cachedval == NULL ) {
cachedval = getenv("TESTVAL");
if ( cachedval == NULL )
cachedval = "";
else
cachedval = strdup(cachedval);
}
return cachedval;
}
extern int a;
extern int b;
extern int c;
extern int d;
static int testfunc1() {
if ( strcmp(get_testval(), "FOO") == 0 )
return a;
else
return b;
}
static int testfunc2() {
if ( strcmp(get_testval(), "BAR") == 0 )
return c;
else
return d;
}
int testfunction() {
return testfunc1() + testfunc2();
}
Note:
To make sure that the compiler won't reduce the three
function calls to their return values right away, the static
sub-functions return values taken from global variables; the
meanings of those variables are not important.
Considering the above source code, if
get_testval() is impure, as the compiler
will automatically find it to be, it will be compiled into:
_testfunction:
[--sp] = ( r7:7 );
LINK 12;
call _get_testval;
R1.H = L$LC$2;
R1.L = L$LC$2;
call _strcmp;
cc =R0==0;
if !cc jump L$L$11 (bp);
P2.H = _a;
P2.L = _a;
R7 = [P2];
L$L$13:
call _get_testval;
R1.H = L$LC$3;
R1.L = L$LC$3;
call _strcmp;
cc =R0==0;
if !cc jump L$L$14 (bp);
P2.H = _c;
P2.L = _c;
R0 = [P2];
UNLINK;
R0 = R0 + R7;
( r7:7 ) = [sp++];
rts;
L$L$11:
P2.H = _b;
P2.L = _b;
R7 = [P2];
jump.s L$L$13;
L$L$14:
P2.H = _d;
P2.L = _d;
R0 = [P2];
UNLINK;
R0 = R0 + R7;
( r7:7 ) = [sp++];
rts;
As you can see, get_testval() is
called twice, even though its result will be identical. If we
declare it constant instead, the code of our test function
will be the following:
_testfunction:
[--sp] = ( r7:6 );
LINK 12;
call _get_testval;
R1.H = L$LC$2;
R1.L = L$LC$2;
R7 = R0;
call _strcmp;
cc =R0==0;
if !cc jump L$L$11 (bp);
P2.H = _a;
P2.L = _a;
R6 = [P2];
L$L$13:
R1.H = L$LC$3;
R0 = R7;
R1.L = L$LC$3;
call _strcmp;
cc =R0==0;
if !cc jump L$L$14 (bp);
P2.H = _c;
P2.L = _c;
R0 = [P2];
UNLINK;
R0 = R0 + R6;
( r7:6 ) = [sp++];
rts;
L$L$11:
P2.H = _b;
P2.L = _b;
R6 = [P2];
jump.s L$L$13;
L$L$14:
P2.H = _d;
P2.L = _d;
R0 = [P2];
UNLINK;
R0 = R0 + R6;
( r7:6 ) = [sp++];
rts;
The CSE pass combines the two calls to
get_testval() into one. Again, this is one
of the optimizations that are harder to achieve by manually
changing the source code, since the compiler can have a larger
view of how the value is used. A common way to handle this
manually is to use a global variable, but that might require
one more load from memory, while CSE can keep the value in a
register or on the stack.
Conclusions
After what you have read about pure and constant functions, you
might wonder how useful they are in practice. Indeed,
in a lot of cases, these two attributes allow the compiler to do
something you could easily achieve by writing better code.
There are two objectives you have to keep in mind that are
related to the use of these (and other) attributes. The first is
code readability because sometimes the manually optimized
functions are harder to read than what the compiler can
produce. The second is allowing the compiler to optimize legacy
or external code.
While you might not be too concerned with letting legacy code,
or code written by someone else, get away with slower
execution, a pragmatic view of the current Free Software world
should take into account that there are probably thousands of
lines of legacy code around. Some of that code, written with
pre-C99 declarations, might even be using
libraries that are still developed with their older interfaces,
which could be improved by providing some extra semantic
information to the compiler through attributes.
Also, it's unfortunately true that extensive use of these
attributes might be seen by neophytes as an easy solution to let
sloppy code run at a decent speed. On the other hand, the same
attributes could be used to identify such sloppy code through
analysis of the source code.
Although GCC does not issue warnings for all of these cases, it
already warns for some of them, like unused variables, or
statements without effect (both triggered by the
DCE). In the future more warnings might be
reported if pure and constant functions get misused.
In general, like with many other GCC function attributes, their
use is tightly related to how programmers perceive their
task. Most pragmatic programmers would probably like these
tools, while purists will probably dislike the way these
attributes help sloppy code to run almost as fast as properly
written code.
My hope is that in the future, better tools, such as static
and dynamic analyzers, will make good use of these and other
attributes at levels other than the compiler.
[1]
The Blackfin architecture is a RISC architecture developed
by Analog Devices, supported by both GCC and Binutils (and
Linux, but I'm not interested in that here).
[2]
I have chosen -O1 rather than -O2 because in the latter
case the compiler performs extra optimization passes that
I do not wish to discuss within the scope of this article.