Google announces Gadgets for Linux
By Jake Edge
June 11, 2008
Google recently announced
the release of their Gadgets for the Linux desktop, and, unlike some of
their other
desktop offerings, they released it under a free software license. While
it is not earth-shattering technology, Gadgets does provide some
interesting features and amusing diversions. It also generates some hope
that Google is getting better at understanding what free software users are
looking for, so perhaps things like the Google Desktop for Linux will be
better integrated and more useful in the future.
Gadgets are a
cross-platform way to create simple applications that can run on web pages
and desktops. The gadget API provides a means to retrieve content from
other sites and display it along with a user interface. Many kinds of
applications can be created, from clocks and calendars to RSS-feedreaders
and "picture of the day" viewers.
There are numerous gadgets available, a semi-random collection on a KDE
desktop can be seen at left. Google has created a handful of gadgets, but
the vast majority are available from others in various categories including
News, Sports, Finance, Fun and Games, Technology, and Communication. The
gadget browser shown below, at right, allows easy access to an amazing
number of choices, many of which are variations on a theme.
To get started with gadgets, it is first necessary to build the tool.
Google does not yet provide .rpm or .deb files for various distributions.
The "how
to build" page was useful, but there was some difficulty in trying to
translate the dependencies
into Fedora 9 package names. A page
in a language I don't know needed no translation, however. Linux commands,
it seems, are multi-lingual.
Building from the Apache-licensed source tarball was straightforward after
that. Gadgets for Linux comes in both
GTK+ and Qt flavors which allows for integration with the two dominant
Linux desktop environments. The screenshots accompanying this article are
from the Qt version, but a bit of a look at the GTK+ version seemed roughly
the same—though the Qt version lacks the sidebar dock.
This is a beta release, perhaps more of a beta than many Google releases,
so there are still a fair number of glitches. Perhaps 20% of the gadgets
tried had one problem or another, with some seeming not to function at
all. Having no experience with gadgets on other platforms, it was not
clear whether these were caused by bugs in the gadgets themselves or the
desktop
gadget program.
The main benefit of the gadget API seems to be the cross-platform
capabilities. Gadgets can run—largely unchanged—on Linux, Mac OS
X, or Windows, but can also run in browsers on web pages at social
networking sites or on other pages. If the API can deliver that wide of a
range of platform choices, it could open up a much wider audience for folks
that want to develop their gadgets on Linux.
Still missing is one of the tools recommended for developing gadgets, Gadget
Designer, which is only available for Windows. The documentation
for creating a gadget make it look like a tedious exercise in XML
manipulation and Javascript programming, but there may be tools available
or in development to make some of that easier.
Overall, gadgets look like an interesting project. There is really nothing
new about the kinds of applications that can be built using the API, but
there are few choices to build those kinds of programs in a truly cross-platform way.
Google's choice to support Linux—and support it
well—accompanied by the code under a
free software license is, perhaps, the best news of all.
Comments (7 posted)
An interview with Jim Ready
By Jonathan Corbet
June 11, 2008
Jim Ready has a long history in the embedded systems market. Most
recently, he became the founder of MontaVista, now one of the most
successful embedded Linux companies. A
recent LWN article took issue
with some of Jim's comments; it only seemed fair to give him the
opportunity to present his side of the story. Thus, this interview. We
asked several questions about MontaVista and its approach to Linux
marketing, and Jim took quite a bit of time to answer them in detail. So,
without further ado...
You have been working in the embedded Linux market for some years. How has
that market changed over that time? What do you think are the prospects
for embedded Linux now?
The single biggest change, and one that gives me great pleasure, is that
embedded Linux is now mainstream, part of the landscape, and arguably the
fastest growing embedded OS. Believe me, when we started in 1999 that was
hardly the case. Of course, the complexity of the devices that our
customers are building continues to increase. The underlying hardware is
typically a highly integrated system with loads of I/O, for example the
SOCs (System On a Chip) such as the TI 3430. That in turn drives both
complexity in Linux as well as in the application and middleware software
stacks. It's pretty amazing to realize that a little Linux-based
handheld device running on batteries is powerful enough to have supplied
the State of California government's computing needs not so many years
ago.
Where do you think MontaVista's sweet spot is in that market?
Companies who are highly focused on their value-add who want a first class
partner to supply them a suitable Linux and associated services (consulting
and training etc.) upon which they will develop their application. The more
formal approach a company takes towards their own software development, the
more they care about meeting schedules and the higher their requirements
for quality, the "sweeter" MontaVista looks. When you're basing a
billion dollar product line on someone's OS you care about what's
going in your product. So when Motorola, NEC or Panasonic end up shipping
30 million phones, they need a supplier who can meet their technical,
schedule and quality requirements. The phones have to ship for the
Christmas season, and no one wants to recall millions of devices.
As
another example, our Carrier Grade Linux distribution is the core OS in
deployed NEC systems which have established 99.9999% availability
(that's no more than ~31.5 seconds of unscheduled downtime in a year,
which is a DoCoMo requirement). Our Professional Edition is the OS for two
different patient monitoring systems that have been through FDA
certification. We're truly fortunate to have thousands of customers,
both big and not-so-big.
Embedded systems vendors have, as a group, been criticized for their lack
of participation in the free software development process. Are you happy
with MontaVista's level of contribution? What, in your mind, are some of
the highlights of MontaVista's community participation?
Most contribution surveys show MV in the top 10 of Linux contributors. (No
other embedded Linux vendor even makes it into the top 30) Arguably
MontaVista contributes more to Linux relative to our size than any other
company in the world. It has always been a cornerstone of our strategy to
be a major contributor to Linux. We figured the more gas we poured on the
Linux fire the faster we could erode the RTOS suppliers installed base and
speed to movement towards Linux. We are perhaps best known for our work in
helping making Linux real-time capable but over time we have also made
significant contributions in the PPC, XScale, MIPS and ARM trees as well as
some other specific projects such as kgdb, LTT, DPM (Dynamic Power
Management) etc.
Your recent article
in Military Embedded Systems was seized upon by a
proprietary embedded vendor as proof that Linux is too expensive and
difficult for embedded applications. Assuming you disagree with his
conclusions, where do you think his reasoning went wrong?
Well there really wasn't any reasoning, just ranting. But having said
that Dan (Dan O'Dowd Greenhills' CEO) implied that our business model
was to allow customers to more or less get in trouble by developing their
own from scratch Linux distribution and then charge them for support to
bail them out. Of course that's not what we do. Rather we build a robust
fully tested and supported embedded Linux distribution (MontaVista Linux
Professional Edition, for example) and deliver that to our customers. We
then maintain and support that specific version of MontaVista Linux over
time, even as the community dashes forward. In fact we have maintenance
obligations that can be as long as a decade from initial deployment.
That
approach gets the customer out of the business of making their own
distribution, maintaining and supporting it with all the accompanying
costs. So we shield the customer from the complexity and change rate that
they otherwise would be exposed to if they were on their own. They don't
have to watch all the patches, monitor the newsgroups and otherwise be tied
up, they can get on to building their product. Dan purposely ignored the
fact that a commercial embedded Linux distribution makes it very easy to
use Linux as an embedded OS. I suspect that's why he tried to hide it.
Your article suggests that an embedded systems manufacturer using Linux
would start by assembling the kernel and development toolchain by hand.
Why do you think they would do that? Even in the absence of vendors like
MontaVista, there are numerous options which do not require assembling
systems at such a low level; why would a vendor not use one of them?
We know from direct experience that even starting with what appears to be
"pre-assembled" distribution, from a semiconductor maker or
elsewhere, a developer sometimes isn't getting what they think they
are.
Don't get me wrong, almost any Linux distribution can serve as a
starting point, maybe 99.99% perfect, but our customers demand more than
that. They want to be at the end of the Linux development cycle, not the
beginning. For example, a Linux distribution we recently started working
with had the following problems:
- The code explicitly ignored Linux coding standards by adding hardware
dependencies. That code would never be accepted into the upstream trees,
and this kind of fork creates debugging issues and additional maintenance
burden.
- The drivers were not SMP-safe, real-time safe nor did they support DPM, yet
the device was designed for applications where all three could well be
required. In order to take advantage of these advanced features, the
device driver would need to be re-written from scratch.
- The code contained numerous defects that caused the system to crash. Error
returns were not checked and other problems indicating very poor coding
practices. These are exactly the type of quality issues that should compel
businesses to find a Linux commercialization partner.
We had the great pleasure of fixing all these problems as we assembled our
distribution. Even with our standard practice of pushing back the changes,
as you well know, there is no guarantee by the community that these changes
make it back into the appropriate open source trees.
The fact is it is difficult for a prospective Linux developer to have any
idea of the state of the Linux distribution they might select. A high
quality, commercial distribution can give a developer some peace of mind
about what they are getting. For example, MontaVista has a formal
development process in place for each of its releases, with quantitative
criteria that must be met for defects (0 critical defects for example with a
sharply declining overall new defect detection curve.) before the
distribution can ship. Our processes have been formally audited by a number
of our largest customers in order to assure themselves of what they are
getting from us. And as we mentioned above, the proven results from devices
in the field speak to our abilities. As for other starting points,
you'll have to ask them about their process.
There were some interesting numbers in that article. Where did the
5000 messages/day for kernel.org come from - which lists?
Since our engineers live and breathe Linux and the other Open Source
components that make up and embedded Linux distribution, we have a pretty
good feel for the overall rate of traffic we keep up with. Based upon that
experience we measured the overall message traffic that a developer would
have to monitor to keep abreast of the daily ins and outs of the typical
mix of software they would use for an embedded Linux project. We aggregated
the total under kernel.org, which isn't precisely correct, but the
paragraph preceding that statement clearly referred to a set of lists one
would have to monitor.
For example, the monitoring would include not only
lkml, but also the lists for other significant parts of the software
typically used for an embedded project, including the list maintained for
the specific architecture used (MIPS, PPC ARM etc), the real-time list,
networking, IPv6, security, advanced filesystems etc. By the way, the lkml
list on May 21, 2008 contained ~500 messages, and gcc contained ~100, just
for starters. So it wasn't just lkml at 5000 but a total set that can
total up to 5000 per day. Does the fact that lkml is only ~500 a day (and
"only" ~200-300 on weekends) make it any less daunting? I don't
think so.
You say: "a recent security patch that took all of 13 lines of code to
implement against an embedded Linux system would have taken more than 800k
lines of source patches to implement if the previous trail of patches had
been ignored." How was that number arrived at? Which security patch were
you describing? How could it possibly require 800,000 lines of patches "to
implement" this security fix?
This example comes from a sequence back in 2006 (CVE-2006-1528 to be
precise), but the "problem" is just as true today. Here's the
setup:
A developer decides to use Linux and has taken the strategy of minimizing
their costs by using a community-maintained Linux kernel. (This story would
be true if the developer started with the typical semiconductor
distribution, by the way) The community has a good reputation for stability
and defect resolution and therefore the developers think they can minimize
their own effort.
They start with Linux 2.6.10 and base their device
application software on this release. During testing, they notice a defect
and find that the defect has been identified (the good news) and fixed in
2.6.13 (the bad news). So now they have a problem, moving up to 2.6.13,
where the defect is fixed also introduces 846,233 new lines of code (the
delta between 2.6.10 and 2.6.13).
This magnitude of change restarts their
QA process, since so much code has changed in the underlying Linux
kernel. Their other choice is to backport the fix, which in this particular
case is 33 lines (we know because we did it), but now the developer has
taken on maintenance of their own Linux, which was what they were trying to
avoid in the first place. This drift between a Linux release you have
baselined and the fact that defects are often fixed in newer releases
presents a less than perfect set of choices for developers. Whether you
wanted to or not, you're in the Linux maintenance business. This drift
problem is true of many distributions, not just dealing with kernel.org.
If our customer found the same defect, we have the obligation to fix it in
the release that they purchased from us; we don't force them to
potentially destabilize their environment by sending them a newer kernel
release where the defect was originally fixed. I guess it all depends how
cavalier one is about changing your underlying operating system after
you've developed and tested your application. In general our customers
are very strict about minimizing changes, and so are we.
At least some of MontaVista's marketing would appear to focus on making
Linux look scary. Are you not concerned that this approach might have the
effect of making Linux in general look less attractive and, thus, playing
into the hands of proprietary systems vendors?
No one is a bigger proponent of embedded Linux than MontaVista (and we have
the contributions to prove it). But it doesn't do us any good to have
folks try Linux, get over their heads and fail, and attribute that failure
to Linux.
We have seen over the past 8 years any number of projects that got into
trouble by not understanding what to expect when they downloaded some Linux
and started in by themselves. In fact one of our very earliest customers,
back in 1999, had started off building their own Linux, and hit a hardware
integration bug that stopped them dead in their tracks for weeks, putting
their project in real trouble. Had we not been able to help them out, their
alternative was Windows CE. Ugh!
Why shouldn't many millions of lines of complex operating system code
that changes daily be a little scary, especially when your business is
making devices, not operating systems? I think it is a mistake to
"trivialize" the difficulty in owning large amounts of any software,
including Linux. That's why I think it's important for folks to be
well informed about what they are getting into, so they can make good
decisions on how they will approach using Linux for their system, whether
they do-it-themselves or go commercial. In either case we want them to
succeed.
Is there anything else you would like to pass on to LWN's readers?
We can all be quite proud of the enormous progress Linux has made in
transforming the embedded OS marketplace from one which was highly
fragmented and largely devoid of standards, to an environment based upon a
highly functional OS which is a truly open standard: Linux. A whole cast of
characters made this possible: visionary customers who dared think it was
possible to embed Linux in their devices, the semiconductor companies
making sure Linux was ported to their chips, commercial companies such as
MontaVista and others making rock solid distributions that were capable of
being deployed by the millions, and numerous individuals who made
significant contributions along the way. It's a pretty powerful
combination that's hard to beat.
We would like to thank Jim for taking the time to answer our questions.
Comments (2 posted)
Implications of pure and constant functions
June 10, 2008
This article was contributed by Diego Pettenò
Introduction
Attributes and why you should use them
Free Software development is often a fun task for developers,
and it is its low barrier to entry (on average) that makes it
possible to have so much available software for so many
different tasks. This low barrier to entry, though, is also
probably the cause of the widely varying quality of the code of
these projects.
Most of the time, the quality issues one can find are not
related to developers' lack of skill, but rather to lack of
knowledge of how the tools work, in particular, the
compiler. For non-interpreted languages, the compiler is
probably the most complex tool developers have to deal
with. Because a lot of Free Software is written in C, GCC is
often the compiler of choice.
Modern compilers are also supposed to do a great job at
optimizing the code by taking code, often written with
maintainability and readability in mind, and translating it into
assembler code with a focus on performance. Code analysis for
optimization (which is also used for warnings about the code)
has the task of taking a semantic look at the code, rather than
syntactic, and identifying various fragments of algorithms that
can be replaced with faster code (or with code that uses a
smaller memory footprint, if the user desires to do so).
This task is a pretty complex one and relies on the compiler
knowing about the function called by the code. For instance, the
compiler might know when to replace a call to a (local, static)
function with its body (inlining) by
looking at its size, the number of times it is called, and its
content (loops, other calls, variables it uses). This is because
the compiler can give a semantic value to the code for a
function, and can thus assess the costs and benefits of a
particular transformation at the time of its use.
I specified above that the compiler knows when to
inline a function by looking at its
content. Almost all optimizations related to function calls work
this way: the compiler, knowing the body of a function, can
decide when it's the case to replace a call with its body; when
it is possible to completely avoid calling the function at all;
and when it is possible to call it just once and thereby
avoid multiple calls. This means, though, that these
optimization can be applied only to functions that are defined
in the same unit wherein they are used. These functions are
usually limited to static functions (functions that are not
defined as static can often be overridden both at link time and
runtime, so the compiler cannot safely assume that what it finds
in the unit is what the code will be calling).
As this is far from optimal, modern compilers like GCC provide a
way for the developer to provide information about the semantics
of a function, through the use of
attributes attached to declarations of
functions and other symbols. These attributes provide
information to the compiler on what the function does, even
though its body is not available. Consequently, the compiler can
optimize at least some of its calls.
This article will focus on two particular attributes that GCC
makes available to C developers: pure and
const, which can declare a function as
either pure or
constant. The next section will provide a
definition of these two kinds of functions, and after that I'll
get into an analysis of some common optimizations that can be
performed on the calls of these functions.
As with all the other function attributes supported by GCC and
ICC, the pure and
const attributes should be attached to the
declarative prototype of the function, so that the compiler know
about them when it finds a call to the function even without its
definition. For static functions, the attribute can be attached
to the definition by putting it between the return type and the
name of the function:
int extern_pure_function([...])
__attribute__((pure));
int extern_const_function([...])
__attribute__((const));
int __attribute__((pure)) static_pure_function([...]) {
[...]
}
int __attribute__((const)) static_const_function([...]) {
[...]
}
Pure and Constant Functions
For what concerns the scope of this article, functions can be
divided into three categories, from the smallest to the biggest:
constant functions, pure
functions and the remaining functions can be called
normal functions.
As you can guess, constant functions are also pure functions,
but pure functions cannot be not all pure functions are constant
functions. In many ways, constant functions are a special case
of pure functions. It is, therefore, best to first define pure
functions and how they differ from all the rest of the
functions.
A pure function is a function with basically no
side effect. This means that pure functions return a value that
is calculated based on given parameters and global memory, but
cannot affect the value of any other global variable. Pure
functions cannot reasonably lack a return type
(i.e. have a void return type).
GCC documentation provides strlen() as an
example of a pure function. Indeed, this function takes a pointer
as a parameter, and accesses it to find its length. This
function reads global memory (the memory pointed to by
parameters is not considered a parameter), but does not change
it, and the value returned derives from the global memory
accessed.
A counter-example of a non-pure function is the
strcpy() function. This function takes two
pointers as parameters. It accesses the latter to read the
source string, and the former to write to the destination
string. As I said, the memory areas pointed to by the parameters
are not parameters on their own, but are considered global
memory and, in that function, global memory is not only accessed for
reading, but also for writing. The return value derives directly
from the parameters (it is the same as the first parameter), but
global memory is affected by the side effect of
strcpy(), making it not pure.
Because the global memory state remains untouched, two calls
to the same pure function with the same parameters will have to
return the same value. As we'll see, it is a very important
assumption that the compiler is allowed to make.
A special case of pure functions is constant functions. A pure
function that does not access global memory, but only its
parameters, is called a constant function. This is because the
function, being unrelated to the state of global memory, will
always return the same value when given the same parameters. The
return value is thus derived directly and exclusively from the
values of the parameters given.
The way a constant function "consumes" pointers is very
different from the way other functions do: it can handle them as
both parameter and return value only if they are never
dereferenced, for accessing the memory they are referencing
would be a global memory access, which breaks the requirements
of constant functions.
Of course these requirements have to apply not only to the
operations in the given function, but also recursively to all
the functions it calls. One function can at best be of the same
kind of the least restrictive kind of function it calls. So when
it calls a normal function it can't be but a normal function
itself, if it only calls pure functions it can be either pure or
normal, but not constant, and if it only calls constant
functions it can be constant.
As with inlining, the compiler will be able
to decide if a function is pure or constant, in case no
attribute is attached to it, only if the function is
static (with the exception of special cases for freestanding
code and other advanced options). When a function is not static,
even if it's local, the compiler will assume that the function
can be overridden at link- or run-time so it will not make any
assumption based on the body for the definition it may find.
Optimizing Function Calls
Why should developers bother with marking functions pure or
constant, though? As I said, these two attributes help the
compiler to know some semantic meaning of a function call, so
that it can apply higher optimization than to normal functions.
There are two main optimizations that can be applied to these
kinds of functions: CSE
(Common Sub-expression Elimination) and
DCE (Dead Code
Elimination). We'll soon see in detail, with the help of the
compiler itself, what these two consist of. Their names,
however, are already rather explicit: CSE is
used to avoid duplicating the same code inside a function,
usually factoring out the code before branching or storing the
results of common operations in temporary variables (registers
or stack), while DCE will remove code that
would never be executed or that would be executed but never
used.
These are both optimization that can be implemented in the
source code, to an extent, reducing the usefulness of declaring
functions pure or constant. On the other hand, as I'll
demonstrate, doing so often reduces the readability of the code
by obscuring the actual algorithm in favor of making it
faster. This does not apply to all cases though, sometimes, doing
the optimization "manually", directly in the source code, makes
it more readable, and makes the code resemble the output of
the compiler more.
About Assemblers and Examples
When talking about optimization, it's quite difficult to
visualize the task of the compiler, and the way the code
morphs from what you read in the C source code into what the
CPU is really going to execute. For this reason, the best way
to write about them is to use examples, showing what the
compilers generates starting from the source code.
Given the way in which GCC works, this is actually quite
easy. You just need to enable optimization and append the
-S switch to the gcc
command line. This switch stops the compiler after the
transformation of C source code into assembly, before the
result is passed to the assembler program to produce the
object file.
Although I suspect a good fraction of the people reading this article
would be comfortable reading IA-32 or x86-64 assembly code, I
decided to use the Blackfin
[1]
assembly language, which should be readable for people who
have never studied a particular assembly language.
The Blackfin assembler is more symbolic than IA-32: instead of
having operations named movl and
addq, the operations are identified by
their algebraic operators (=,
+), while the registers are merely called
R1, R2 and so on.
Calling conventions are also quite easy to understand: for all
the cases we'll look through in the article (at most four
parameters, integers or pointers), the parameters are passed
through the registers, starting in order from
R0. The return value of the function call
is also stored in the R0 register.
To clarify the examples which will appear later on, let's see
how the following C source code is translated by GCC into
Blackfin code:
int somefunction(int a, int b, int c);
void somestringfunction(char *pA, char *pB);
int globalvar;
void test() {
somestringfunction("foo", "bar");
globalvar = somefunction(11, 22, 33);
}
becomes:
.section .rodata
.align 4
L$LC$0:
.string "foo"
.align 4
L$LC$1:
.string "bar"
.text;
.align 4
.global _test;
.type _test, STT_FUNC;
_test:
LINK 12;
R0.H = L$LC$0;
R0.L = L$LC$0;
R1.H = L$LC$1;
R1.L = L$LC$1;
call _somestringfunction;
R0 = 11 (X);
R1 = 22 (X);
R2 = 33 (X);
call _somefunction;
P2.H = _globalvar;
P2.L = _globalvar;
[P2] = R0;
UNLINK;
rts;
.size _test, .-_test

As the Blackfin does not have 32 bit immediate load, you
have to load high and low addresses separately (in
whichever order); the assembler will take care of properly
loading the high 16 bits of the label to the upper
part of the register, and the low 16 bits to the lower part.
Once the parameters are loaded, the function is called
almost identically to any other call
operation on other architectures; note the prefixed
underscore on symbols' names.
Integers, both constant or parameters and variables, are
also loaded for calls in the registers. Blackfin doesn't
have 32 bit immediate loading, but if the constant to load
fits into 16 bits, it can be loaded through sign extension
by appending the (X) suffix.
When accessing a global memory location, the
P2 pointer is set to the address of the
memory location...
... and then dereferenced to assign that memory
area. Being a RISC architecture, Blackfin does not have
direct memory operations.
The return value for a function is loaded into the
R0 register, and can be accessed from
there.
The rts command is the return from
subroutine, and usually indicates the end of the function,
but like the return statement in C,
it might appear in any place of the routine.
In the following examples, the preambles with declarations and
data will be omitted whenever these are not useful to the
discussion.
Concerning optimization levels, the code will almost
always be compiled with at least the first optimization level
enabled (-O1). This both because it makes the code cleaner to
read (using register-register copy for parameters passing,
instead of saving to the stack and then restoring from that)
and because we need optimization enabled to see how they are
applied.
Also, most of the times I'll refer to the
fastest alternative. Most of what I say,
though, applies also to the smaller
alternative when using the -Os optimization level. In any
case, the compiler always weighs the cost-to-benefit ratio
between the optimized and the unoptimized version, or between
different optimized versions. If you want to know the exact
route the compiler takes for your code, you can always use the
-S switch to find out.
DCE and Unused Variables
One area where DCE is useful
is to avoid operations that result in unused data. It's
not that uncommon that a variable is defined by an operation,
complex or not, and is then never used by the code, either
because it is intended for future expansion or because it's a
remnant of older code that has been removed or replaced. While
the best thing would be to get rid of the definition entirely,
users expect the compiler to produce a good result with sloppy
code too, and that operation should not be emitted.
The DCE pass can remove all the code that
has no side effect, when its result is not used. This includes
all mathematical operations and functions known to be pure or
constant (as neither are allowed to change the global state of
the variables). If a function call is not known to be at least
pure, it may change the global state, and its call will not be
eliminated, as shown in the following code:
int someimpurefunction(int a, int b);
int somepurefunction(int a, int b)
__attribute__((pure));
int testfunction(int a, int b, int c) {
int res1 = someimpurefunction(c, b);
int res2 = somepurefunction(b, c);
int res3 = a + b - c;
return a;
}
Which, once compiled with -O1,
[2]
produces the following Blackfin assembler:
_testfunction:
[--sp] = ( r7:7 );
LINK 12;
R7 = R0;
R0 = R2;
call _someimpurefunction;
R0 = R7;
UNLINK;
( r7:7 ) = [sp++];
rts;
As you can see, the call to the pure function has been
eliminated (the res2 variable was not being
used), together with the algebraic operation but, the impure
function, albeit having its return value discarded, is still
called. This is due to the fact that the compiler emits the
call, not knowing whether the latter function has side
effects on the global memory state or not.
This is equivalent to the following code (which
produces the same assembler code):
int someimpurefunction(int a, int b);
int testfunction(int a, int b, int c) {
someimpurefunction(c, b);
return a;
}
The Dead Code Elimination optimization can be very helpful to
reduce the overhead caused by code written to conform to C89
standard, where you couldn't mix variables (and constant)
declarations with executable code.
In those sources, you had to declare variables at the top of
the function, and then start to check for prerequisites. If
you wanted to make it explicit that some variable had to keep
its value, by making it constant, you would often have to fill
them before the prerequisites could be checked.
Without discussing legacy code, it is also useful when
writing debug code, so that it doesn't look out of place from
the use of lots of #ifdef directives. Take
for instance the following code:
#ifdef NDEBUG
# define assert_se(x) (x)
#else
void assert_se(int boolean);
#endif
char *getsomestring(int i) __attribute__((pure));
int dosomethinginternal(void *ctx, int code, int val);
int dosomething(void *ctx, int code, int val) {
char *string = getsomestring(code);
// returning string might be a sub-string of "something"
// like "some" or "so"
assert_se(strncmp(string, "something", strlen(string)) == 0);
return dosomethinginternal(ctx, code, val);
}
The assert_se macro has different
behavior from the standard assert, as it
has side effects, which basically means that the code passed
to the assertion is called even though the compiler is told to
disable debugging. This is a somewhat common trick, although
its effects on readability are debatable.
With getsomestring() pure, when compiling
without debugging, the DCE will remove the calls to all three
functions: getsomestring(),
strncmp() and
strlen() (the latter two are usually
declared as pure by both the C library and by GCC's built-in
replacements). This because none of these functions have a
side effect, resulting in a very short function:
_dosomething:
LINK 0;
UNLINK;
jump.l _dosomethinginternal;
If our getsomestring() function weren't
pure, even though its return value is not going to be used,
the compiler would have to emit the call, resulting in rather
more complex (albeit still simple, compared with most
real-world functions) assembler code:
_dosomething:
[--sp] = ( r7:5 );
LINK 12;
R7 = R0;
R0 = R1;
R6 = R1;
R5 = R2;
call _getsomestring;
UNLINK;
R0 = R7;
R1 = R6;
R2 = R5;
( r7:5 ) = [sp++];
jump.l _dosomethinginternal;
Common Sub-expression Elimination
The Common Sub-expression Elimination optimization is one of
the most important optimizations performed by the compiler,
because it's the one that, for instance, replaces multiple
indexed accesses to an array so that the actual memory address
is calculated just once.
What this optimization does is to find common operations
executed on the same operands (even when they are not known at
compile-time), decide which ones are more expensive than
saving the result in a temporary (register or stack), and then
swapping the code around to take the cheapest course.
While its uses are quite varied, one of the easiest ways to
see the work of the CSE is to look at the
code generated when using the ternary if
operator. Let's take the following code:
int someimpurefunction(int a);
int somepurefunction(int a)
__attribute__((pure));
int testfunction(int a, int b, int c, int d) {
int res1 = someimpurefunction(a) ? someimpurefunction(a) : b;
int res2 = somepurefunction(a) ? somepurefunction(a) : c;
int res3 = a+b ? a+b : d;
return res1+res2+res3;
}
The compiler will optimize the code as:
_testfunction:
[--sp] = ( r7:4 );
LINK 12;
R7 = R0;
R5 = R1;
R4 = R2;
call _someimpurefunction;
cc =R0==0;
if !cc jump L$L$2;
R6 = R5;
jump.s L$L$4;
L$L$2:
R0 = R7;
call _someimpurefunction;
R6 = R0;
L$L$4:
R0 = R7;
call _somepurefunction;
R1 = R0;
cc =R0==0;
if cc R1 =R4; /* movsicc-1b */
R0 = R5 + R7;
cc =R0==0;
R2 = [FP+36];
if cc R0 =R2; /* movsicc-1b */
R1 = R1 + R6;
R0 = R1 + R0;
UNLINK;
( r7:4 ) = [sp++];
rts;
As you can see, the pure function is called just once, because the
two references inside the ternary operator are equivalent,
while the other one is called twice. This is because there was
no change to global memory known to the compiler between the
two calls of the pure function (the function itself couldn't
change it – note that the compiler will never take
multi-threading into account, even when asking for it
explicitly through the -pthread flag),
while the non-pure function is allowed to change global memory
or use I/O operations.
The equivalent code in C would be something along the
following lines (it differs a bit because the compiler will
use different registers):
int someimpurefunction(int a);
int somepurefunction(int a)
__attribute__((pure));
int testfunction(int a, int b, int c, int d) {
int res1 = someimpurefunction(a) ? someimpurefunction(a) : b;
const int tmp1 = somepurefunction(a);
int res2 = tmp1 ? tmp1 : c;
const int tmp2 = a+b;
int res3 = tmp2 ? tmp2 : d;
return res1+res2+res3;
}
The Common Sub-expression Elimination optimization is very
useful when writing long and complex mathematical
operations. The compiler can find common calculations even
though they don't look common to the naked eye, and act on
those.
Although sometimes you can get away with using multiple
constants or variables to carry out temporary operations so
that they can be re-used in the following calculations,
leaving the formulae entirely explicit is usually more
readable, as long as the formulae are not intended to change.
Like with other algorithms, there are some advantages to
reducing the source code used to calculate the same thing; for
instance you can easily make a change directly to the
definition of a constant and get the change propagated to all
the uses of that constant. On the other hand, this can be
quite a problem if the meaning of two calculations is very
different (and thus can vary in different ways with the
evolution of the code), and just happen to be calculated in
the same way at a given time.
Another rather useful place where the compiler can further
optimize code with CSE, where it wouldn't be so nice or simple
to do manually in the source code, is where you deal with
static functions that are inlined by the compiler.
Let's examine the following code for instance:
extern int a;
extern int b;
static inline int somefunc1(int p) {
return (p * 16) + (3 << a);
}
static inline int somefunc2(int p) {
return (p * 16) + (4 << b);
}
extern int res1;
extern int res2;
extern int res3;
extern int res4;
void testfunc(int p1, int p2)
{
res1 = somefunc1(p1);
res2 = somefunc2(p1);
res3 = somefunc1(p2);
res4 = somefunc2(p2);
}
In this code, you can find four basic expressions:
(p1 * 16), (p2 *
16), (3 << a) and
(4 << b). Each of these four
expressions is used twice in the
somefunc() function. Thanks to the CSE,
though, the code will calculate each of them once, even
though they cross the function boundary, producing the
following code:
_testfunc:
[--sp] = ( r7:7 );
LINK 0;
R0 <<= 4;
R1 <<= 4;
P2.H = _a;
P2.L = _a;
R2 = [P2];
R7 = 3 (X);
R7 <<= R2;
P2.H = _b;
P2.L = _b;
R2 = [P2];
R3 = 4 (X);
R3 <<= R2;
R2 = R0 + R7;
P2.H = _res1;
P2.L = _res1;
[P2] = R2;
P2.H = _res2;
P2.L = _res2;
R0 = R0 + R3;
[P2] = R0;
R7 = R1 + R7;
P2.H = _res3;
P2.L = _res3;
[P2] = R7;
R1 = R1 + R3;
P2.H = _res4;
P2.L = _res4;
[P2] = R1;
UNLINK;
( r7:7 ) = [sp++];
rts;
As you can easily see (the assembly was modified a bit to
improve its readability, the compiler re-ordered loads of
registers to avoid pipeline stalls, making it harder to see the
point), the four expressions are calculated first, and stored
respectively in the registers R0,
R1, R7 and
R3.
These kinds of sub-expressions are usually harder to see in
the code and also harder to implement. Sometimes they get
factored out on their own parameter, but that can be more
expensive during execution, depending on the calling conventions
of the architecture.
Cheats
As I wrote above, there are some requirements that apply to
functions that are declared pure and constant, related to not
changing or accessing global memory; not executing I/O
operations; and, of course, not calling further impure
functions. The reason for this is that the compiler will
accept what the user declares the function to be, whatever its
body is (as it's usually unknown by the compiler at the call
stage).
Sometimes, though, it's possible to fool the compiler so that
it treats impure functions as pure or even constant
functions. Although this is a risky endeavor, as it might
truly cause bad code generation by the compiler, it can
sometimes be used to force optimization for particular
functions.
An example of this can be a lookup function that scans through
a global table to return a value. While it is accessing global
memory, you might want the compiler to promote it to a
constant function, rather than simply to a pure one.
Let's take for instance the following code:
const struct {
const char *str;
int val;
} strings[] = {
{ "foo", 31 },
{ "bar", 34 },
{ "baz", -24 }
};
const char *lookup(int val) {
int i;
for(i = 0; i < sizeof(strings)/sizeof(*strings); i++)
if ( strings[i].val == val )
return strings[i].str;
return NULL;
}
void testfunction(int val, const char **str, unsigned long *len) {
if ( lookup(val) ) {
*str = lookup(val);
*len = strlen(lookup(val));
}
}
If the lookup() function is only
considered a pure function, as it is, adhering to the rules we
talked about at the start of the article, it will be called
three times in testfunction(), like this:
_testfunction:
[--sp] = ( r7:7, p5:4 );
LINK 12;
R7 = R0;
P5 = R1;
P4 = R2;
call _lookup;
cc =R0==0;
if cc jump L$L$17;
R0 = R7;
call _lookup;
[P5] = R0;
R0 = R7;
call _lookup;
call _strlen;
[P4] = R0;
L$L$17:
UNLINK;
( r7:7, p5:4 ) = [sp++];
rts;
Instead, we can trick the compiler by declaring the
lookup() function as constant (the data
it is reading is constant, after all, so at a given parameter
it will always return the same result). If we do that, the
three calls will have to return the same value, and the
compiler will be able to optimize them as a single call:
_testfunction:
[--sp] = ( p5:4 );
LINK 12;
P5 = R1;
P4 = R2;
call _lookup;
cc =R0==0;
if cc jump L$L$17;
[P5] = R0;
call _strlen;
[P4] = R0;
L$L$17:
UNLINK;
( p5:4 ) = [sp++];
rts;
In addition to lookup functions on constant tables, this
trick is useful with functions which read data from files or
other volatile data, and cache it in a memory variable.
Take for instance the following function that reads an
environment variable:
char *get_testval() {
static char *cachedval = NULL;
if ( cachedval == NULL ) {
cachedval = getenv("TESTVAL");
if ( cachedval == NULL )
cachedval = "";
else
cachedval = strdup(cachedval);
}
return cachedval;
}
This is not truly a constant function, as its return value
depends on the environment. Even so, assuming that the
environment of the process is left untouched, its return value
will never change between calls. Even though it will affect
the global state of the program (as the
cachedval static variable will be filled in
the first time the function is called), it can be assumed to
always return the same value.
Tricking the compiler into thinking that a function is constant
even though it has to load data through I/O operations, as I
said, is risky, as the compiler will think there is no I/O
operation going on; on the other hand, this trick might make a
difference sometimes, as it allows the expression of functions
in more semantic ways, leaving it up to the compiler to
optimize the code with temporaries, where needed.
One example can be the following code:
char *get_testval() {
static char *cachedval = NULL;
if ( cachedval == NULL ) {
cachedval = getenv("TESTVAL");
if ( cachedval == NULL )
cachedval = "";
else
cachedval = strdup(cachedval);
}
return cachedval;
}
extern int a;
extern int b;
extern int c;
extern int d;
static int testfunc1() {
if ( strcmp(get_testval(), "FOO") == 0 )
return a;
else
return b;
}
static int testfunc2() {
if ( strcmp(get_testval(), "BAR") == 0 )
return c;
else
return d;
}
int testfunction() {
return testfunc1() + testfunc2();
}
Note:
To make sure that the compiler won't reduce the three
function calls to their return values right away, the static
sub-functions return values taken from global variables; the
meanings of those variables are not important.
Considering the above source code, if
get_testval() is impure, as the compiler
will automatically find it to be, it will be compiled into:
_testfunction:
[--sp] = ( r7:7 );
LINK 12;
call _get_testval;
R1.H = L$LC$2;
R1.L = L$LC$2;
call _strcmp;
cc =R0==0;
if !cc jump L$L$11 (bp);
P2.H = _a;
P2.L = _a;
R7 = [P2];
L$L$13:
call _get_testval;
R1.H = L$LC$3;
R1.L = L$LC$3;
call _strcmp;
cc =R0==0;
if !cc jump L$L$14 (bp);
P2.H = _c;
P2.L = _c;
R0 = [P2];
UNLINK;
R0 = R0 + R7;
( r7:7 ) = [sp++];
rts;
L$L$11:
P2.H = _b;
P2.L = _b;
R7 = [P2];
jump.s L$L$13;
L$L$14:
P2.H = _d;
P2.L = _d;
R0 = [P2];
UNLINK;
R0 = R0 + R7;
( r7:7 ) = [sp++];
rts;
As you can see, the get_testval() is
called twice, even though its result will be identical. If we
declare it constant, instead, the code of our test function
will be the following:
_testfunction:
[--sp] = ( r7:6 );
LINK 12;
call _get_testval;
R1.H = L$LC$2;
R1.L = L$LC$2;
R7 = R0;
call _strcmp;
cc =R0==0;
if !cc jump L$L$11 (bp);
P2.H = _a;
P2.L = _a;
R6 = [P2];
L$L$13:
R1.H = L$LC$3;
R0 = R7;
R1.L = L$LC$3;
call _strcmp;
cc =R0==0;
if !cc jump L$L$14 (bp);
P2.H = _c;
P2.L = _c;
R0 = [P2];
UNLINK;
R0 = R0 + R6;
( r7:6 ) = [sp++];
rts;
L$L$11:
P2.H = _b;
P2.L = _b;
R6 = [P2];
jump.s L$L$13;
L$L$14:
P2.H = _d;
P2.L = _d;
R0 = [P2];
UNLINK;
R0 = R0 + R6;
( r7:6 ) = [sp++];
rts;
The CSE pass combines the two calls to
get_testval with one. Again, this is one
of the optimizations that are harder to achieve by manually
changing the source code since the compiler can have a larger
view of the use of its value. A common way to handle this is
by using global variables, but that might require one more
load from the memory, while CSE can take care of keeping the
values in registers or on the stack.
Conclusions
After what you have read about pure and constant functions, you
might have some concerns about the average use of them. Indeed,
in a lot of cases, these two attributes allow the compiler to do
something you can easily achieve by writing better code.
There are two objectives you have to keep in mind that are
related to the use of these (and other) attributes. The first is
code readability because sometimes the manually optimized
functions are harder to read than what the compiler can
produce. The second is allowing the compiler to optimize legacy
or external code.
While you might not be too concerned with letting legacy code or
code written by someone else get away with slower execution, a
pragmatic view of the current Free Software world should take
into consideration the fact that there are probably thousands
lines of code of legacy code around. Some of that code, written with
pre-C99 declarations, might be even
using
libraries that are being developed with their older interface,
which could be improved by providing some extra semantic
information to the compiler through use of attributes.
Also, it's unfortunately true that extensive use of these
attributes might be seen by neophytes as an easy solution to let
sloppy code run at a decent speed. On the other hand, the same
attributes could be used to identify such sloppy code through
analysis of the source code.
Although GCC does not issue warnings for all of these cases, it
already warns for some of them, like unused variables, or
statements without effect (both triggered by the
DCE). In the future more warnings might be
reported if pure and constant functions get misused.
In general, like with many other GCC function attributes, their
use is tightly related to how programmers perceive their
task. Most pragmatic programmers would probably like these
tools, while purists will probably dislike the way these
attributes help sloppy code to run almost as fast as properly
written code.
My hopes are that in the future better tools will make good use
of these and other attributes on different levels than
compilers, like static and dynamic analyzers.
[1]
The Blackfin architecture is a RISC architecture developed
by Analog Devices, supported by both GCC and Binutils (and
Linux, but I'm not interested in that here).
[2]
I have chosen -O1 rather than -O2 because in the latter
case the compiler performs extra optimization passes that
I do not wish to discuss within the scope of this article.
Comments (45 posted)
Page editor: Jake Edge
Inside this week's LWN.net Weekly Edition
- Security: SCADA system vulnerabilities; New vulnerabilities in kernel, openoffice.org, snort, tomcat,...
- Kernel: The linux-staging tree; 2.6.26 API changes; Andrew Morton interview
- Distributions: openSUSE merges forums ahead of 11.0 release; 64 Studio 2.1; Debian Installer lenny beta 2; Mandriva Flash 2008 Spring; openSUSE Build Service 1.0 RC 1; Celebrating 10 years of Mandriva
- Development: Detect and record video movement with Motion,
About Django and the importance of releases, threads in Python,
new versions of pgAdmin III, BusyBox, JVoiceXML, OrangeHRM, GNOME, GARNOME,
pdfposter, Gimp User Filter, ScaleTempo, OO.o, Dirac, Souzou, GCC,
GNU Classpath, Qt Jambi, UMLet.
- Press: Stallman complains about the Oyster payment system,
Acer likes Linux for laptops, Google Gadgets for Linux, multitouch gesture support for Synaptics TouchPads, KDE 4.1 snapshot review, Linux vs. Windows on power usage.
- Announcements: Ardour in Summer Code Finland, EFF on unmasking MySpace user, Mozilla
parties, SFLC files more BusyBox suits, SUSE Linux supercomputers,
Win4Lin Desktop 5, ODBMS.org panel discussion, OO.o conf cfp, PyOhio cfp,
French KDE Day videos.
Next page:
Security>>