September 27, 2011
This article was contributed by Matt Davis
The free and open source community is largely based on
extensibility, sharing, and modularity, so projects will often provide an
API that allows users to add functionality.
One way to accomplish that is
via plugins to the base project, which allows a user
to customize the project to their needs and easily share the
result with others who might also see utility in such a
modification. Such a framework was recently implemented in GCC.
As
of version 4.5 of GCC, a user can create a plugin that provides extra
features to the compiler, which they can further pass around as a
shareable module. GCC plugins provide developers with a rich subset
of the GCC API to allow them to extend GCC as they see fit.
Whether it is writing an additional optimization pass, transforming
code, or analyzing information, plugins can be quite useful.
In this article we will learn how to develop a GCC plugin by creating one that
can be used for spell checking read-only strings in an executable. What is the
reason for doing so, one might ask? Well, it's a lesson on plugins, and this
example provides a nice way to investigate this aspect of GCC.
Creating a plugin
A plugin is just a shared object file, following an API defined by GCC. To get
started cooking up a plugin we will need to obtain a version of GCC, 4.5 or
later.
After your GCC is built, you are ready to get your hack on. Create a
development directory with "mkdir myplugin" and enter it with "cd myplugin".
From there we will create the source file which will be our plugin: myplugin.c.
Plugin preparation
The first step in creating a plugin is to fill out some of the structures that
the plugin API provides. These allow GCC to properly call the plugin, provide
the GCC user some "help" information, and verify that the plugin is able to run
on the version of the compiler that is trying to use the plugin. Before I
introduce the plugin API, take a peek at it for yourself. The easiest way to
figure out where the API headers are located is to run the GCC that you are
targeting your plugin for:
gcc -print-file-name=plugin
The command prints a directory, and the header files in there are the GCC API
accessible to your plugin. If, however, the command simply prints the word
"plugin", that means those files are not installed on your system. Of interest
right now is the boilerplate stuff, located in gcc-plugin.h. The following
structures are provided by the API, and their fields are documented in
gcc-plugin.h.
- struct plugin_name_args:
  - Contains information GCC deduces about the plugin that is being called.
  - We do not fill this structure out; rather, GCC passes us an instance of it
    when our plugin's initialization routine is called. We can also use it to
    get arguments for the plugin that were passed via the command line, such as:
    -fplugin-arg-myplugin-foo=bar.
    All plugin arguments are prefixed with the "basename" of the plugin,
    which is the name of the plugin shared object file that results after
    compilation, but without the .so extension.
- struct plugin_info:
  - Version and help string for your plugin, what gcc -v will display.
  - Optional, but if used it must be registered upon initialization of your
    plugin.
- struct plugin_gcc_version:
  - Ensures that your plugin will operate on the appropriate version of GCC.
  - Gets passed as input to the plugin via initialization. We can use this to
    verify version information.
  - Checking it is optional.
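As a rough illustration of how a plugin might look up one of its command-line
arguments, here is a minimal sketch. The demo_* structures are simplified
stand-ins written for this example, not the real definitions; a real plugin
would include gcc-plugin.h and use the structures it provides. The helper name
demo_plugin_arg() is likewise hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Stand-ins for the real structures in gcc-plugin.h, reduced to the
 * fields this sketch needs.  A real plugin would include the header
 * instead of redefining them. */
struct demo_plugin_argument {
    const char *key;    /* "foo" in -fplugin-arg-myplugin-foo=bar */
    const char *value;  /* "bar", or NULL when no "=value" was given */
};

struct demo_plugin_name_args {
    const char *base_name;              /* e.g. "myplugin" */
    int argc;                           /* number of plugin arguments */
    struct demo_plugin_argument *argv;  /* the arguments themselves */
};

/* Return the value for `key`, or `fallback` if the key is absent
 * or was passed without a value. */
const char *demo_plugin_arg(const struct demo_plugin_name_args *info,
                            const char *key, const char *fallback)
{
    int i;

    for (i = 0; i < info->argc; ++i) {
        if (strcmp(info->argv[i].key, key) == 0)
            return info->argv[i].value ? info->argv[i].value : fallback;
    }
    return fallback;
}
```

With -fplugin-arg-myplugin-foo=bar on the command line, a lookup of "foo" in
this sketch would yield "bar", while a key that was never passed falls back to
the supplied default.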
Now to get started. Using your favorite text editor, create a file called
myplugin.c and include the headers for the basic GCC API.
#include <gcc-plugin.h>
The next thing we must define is a variable called plugin_is_GPL_compatible.
When plugins are being loaded, the symbols in the resulting object file are
examined, and before the plugin will actually be usable, GCC makes sure that
there is a symbol called plugin_is_GPL_compatible. If this symbol exists,
then your plugin is deemed to have the proper license.
int plugin_is_GPL_compatible = 1;
Next, let's define some metadata about the plugin; that is done by creating an
instance of struct plugin_info, in case the user asks for help (via gcc -v
and/or gcc -v --help) with the plugin loaded. This is not required but can be
helpful. Also, to make this effective, GCC must be handed this structure during
initialization:
static struct plugin_info myplugin_info =
{
    .version = "42",
    .help = "Not yet...",
};
Next, we will define an instance of plugin_gcc_version so that, during
initialization, we can verify that our plugin and GCC can dance together
nicely.
static struct plugin_gcc_version myplugin_ver =
{
    .basever = "4.6",
};
Note that every field of this structure (basever, datestamp, devphase,
revision, and configuration_arguments), all of which are strings, must match
the corresponding field of the GCC you are using if you let GCC determine
whether the versions match, e.g. via a call at initialization to
plugin_default_version_check(). You can use your own version check method
instead, which is ideal, as I doubt that your configuration options would
exactly match those of your plugin's users. In short, do your own version
checking unless you are bundling your plugin as part of a GCC release where the
configuration options, datestamps, etc. will match.
If you are curious, the plugin-version.h header has the data for the basever,
datestamp, devphase, and revision that were used when you built GCC (or when
your distribution package maintainer built it). For our example, we will
validate only against the base version string of GCC and will handle the
version check ourselves. For our purposes we accept any GCC 4.6 release and
ignore the micro version number, e.g. the '0' in 4.6.0.
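A hand-rolled base version check along these lines could look like the
following minimal sketch. The function name demo_basever_matches() is
hypothetical. Unlike a bare prefix comparison, this variant also insists that
the match end on a component boundary, so that a made-up version such as
"4.60" is not mistaken for 4.6; whether that extra strictness is wanted is a
design choice, not something the plugin API requires.

```c
#include <string.h>

/* Return 1 if `actual` (e.g. "4.6.0") starts with `wanted` (e.g. "4.6")
 * and the match ends on a version-component boundary, 0 otherwise. */
int demo_basever_matches(const char *actual, const char *wanted)
{
    size_t n = strlen(wanted);

    if (strncmp(actual, wanted, n) != 0)
        return 0;
    /* Accept "4.6" itself or "4.6.<micro>", but not "4.60". */
    return actual[n] == '\0' || actual[n] == '.';
}
```

In a plugin, the first argument would come from the basever field of the
plugin_gcc_version structure that GCC passes to the initialization routine.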
Plugin initialization
By now we have provided a bunch of info about the plugin, but there is one
more piece of information that is required by GCC: the routine
that actually initializes the plugin and registers callbacks to occur during the
compilation of a program. This is another symbol the plugin framework will
pluck out of the file, so it must be named plugin_init(). Our
plugin is going
to operate as a GIMPLE pass, allowing us to analyze (and transform if we wish)
GCC's middle-end representation of the program being compiled.
There are a couple of other options available.
IPA_PASS is an interprocedural pass, where the developer analyzes the program
as a whole rather than one function at a time and optimizes code that is
dependent across functions, such as function inlining. RTL_PASS passes are the
last ones the compiler runs. This type of pass operates on the RTL (register
transfer language) representation of the program. RTL is the low-level form
that the GIMPLE code is lowered into; it is what gets mapped onto the
registers and instructions of a machine (via the machine description).
To register our GIMPLE pass, we need to fill in the proper struct and pass it
to the plugin framework, which will then call our callback handler once for
each function in the source file. There are a number of other things that can
be accomplished via plugins. These hook points are called "events" and are
enumerated values in the plugin.def file. The .def files in GCC represent
enumerations and are translated to source code when GCC is being compiled.
These are then included into GCC's source for plugin.c:
#include "plugin.def"
This makes using ctags kind of tricky, because it does not inspect
.def files, nor does it know how to parse them. On the other hand, cscope works
well if you tell it to look for .def files.
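The .def approach is a form of what is often called the "X-macro" technique:
one list of names is expanded several times with different macro definitions.
The sketch below shows the idea in a single file. The event names and the
DEMO_* macros are made up for this example, and a list macro stands in for the
separate .def file that GCC would #include.

```c
/* Hypothetical event list.  In GCC the entries live in a separate
 * .def file that is #included; a list macro keeps this sketch in
 * one file. */
#define DEMO_EVENTS(X) \
    X(DEMO_PASS_SETUP) \
    X(DEMO_FINISH_UNIT) \
    X(DEMO_INFO)

/* First expansion: one enumerator per entry. */
enum demo_event {
#define DEMO_AS_ENUM(name) name,
    DEMO_EVENTS(DEMO_AS_ENUM)
#undef DEMO_AS_ENUM
    DEMO_EVENT_COUNT
};

/* Second expansion: matching printable names, in the same order. */
static const char *const demo_event_names[] = {
#define DEMO_AS_STRING(name) #name,
    DEMO_EVENTS(DEMO_AS_STRING)
#undef DEMO_AS_STRING
};

/* Map an event value back to its name. */
const char *demo_event_name(enum demo_event ev)
{
    return ev < DEMO_EVENT_COUNT ? demo_event_names[ev] : "unknown";
}
```

Because both the enumeration and the name table are generated from the same
list, adding an event in one place keeps the two in sync. The cost is that the
enumerators never appear literally in the source, which is the likely reason
tools such as ctags struggle with them.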
Below is the data structure we use when our plugin is asked to
initialize itself.
/* See tree-pass.h for a list and descriptions for the fields of this struct */
static struct gimple_opt_pass myplugin_pass =
{
    .pass.type = GIMPLE_PASS,
    .pass.name = "myplugin", /* For use in the dump file */
    /* Predicate (boolean) function that gets executed before your pass. If the
     * return value is 'true' your pass gets executed, otherwise, the pass is
     * skipped.
     */
    .pass.gate = myplugin_gate, /* always returns true, see full code */
    .pass.execute = myplugin_exec, /* Your pass handler/callback */
};
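To make the relationship between the gate and the execute callback concrete,
here is a small stand-alone model. Nothing in it is GCC code: struct demo_pass
and demo_run_pass() are hypothetical and only mimic the "run the gate first,
then maybe run the pass" behavior described in the comment above.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, much-reduced stand-in for a pass descriptor. */
struct demo_pass {
    const char *name;
    bool (*gate)(void);         /* NULL means "always run" in this sketch */
    unsigned (*execute)(void);  /* the pass body */
};

int demo_runs;  /* counts how often the pass body ran */

bool demo_gate_on(void)  { return true; }
bool demo_gate_off(void) { return false; }
unsigned demo_exec(void) { ++demo_runs; return 0; }

/* Run the pass only if its gate allows it; report whether it ran. */
bool demo_run_pass(const struct demo_pass *p)
{
    if (p->gate != NULL && !p->gate())
        return false;
    p->execute();
    return true;
}
```

A gate that always returns true, as in our plugin, simply means the pass body
runs every time the pass is reached.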
Please note that there are many more options we can specify for our pass, but
to keep things simple, we are just going to have a basic pass with a gate and
an execution callback. myplugin_exec() is the function that is called back,
given the conditions specified in the optimization pass defined above. (The
source code for the full plugin, including the gate function, is linked from
the original article.)
Now that we have a structure defining how we want our pass to act,
let's finally write the required plugin_init() function:
/* Return 0 on success or error code on failure */
int plugin_init(struct plugin_name_args *info, /* Argument information */
                struct plugin_gcc_version *ver) /* Version info of GCC */
{
    /*
     * Used to tell the plugin-framework about where we want to be called in the
     * set of all passes. This is located in tree-pass.h
     */
    struct register_pass_info pass;

    printf("Plugin initialized...\n");

    /*
     * We could call: plugin_default_version_check() to validate our plugin, but
     * we will skip that. Instead, as mentioned it can be more useful if we
     * validate the version information ourselves
     */
    if (strncmp(ver->basever, myplugin_ver.basever, strlen("4.6")))
        return -1; /* Incorrect version of GCC */

    /*
     * Setup the info to register with GCC telling when we want to be called and
     * to what GCC should call, when it's time to be called.
     */
    pass.pass = &myplugin_pass.pass;

    /*
     * Get called after GCC has produced the SSA representation of the program.
     * After the first SSA pass.
     */
    pass.reference_pass_name = "ssa";
    pass.ref_pass_instance_number = 1;
    pass.pos_op = PASS_POS_INSERT_AFTER;

    /* Tell GCC we want to be called after the first SSA pass */
    register_callback("myplugin", PLUGIN_PASS_MANAGER_SETUP, NULL, &pass);

    /*
     * Tell GCC some information about us... just for use in --help and
     * --version
     */
    register_callback("myplugin", PLUGIN_INFO, NULL, &myplugin_info);

    /* Successful initialization */
    return 0;
}
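The three positioning fields (reference pass name, instance number, and
"insert after") can be pictured with a toy model of a pass list. The sketch
below is not how GCC stores its passes; the demo_* names, the flat array, and
the example pass names are all assumptions made for illustration.

```c
#include <string.h>

#define DEMO_MAX_PASSES 16

/* Hypothetical pass list: just an ordered array of names. */
struct demo_pass_list {
    const char *names[DEMO_MAX_PASSES];
    int count;
};

/* Insert `new_name` after the `instance`-th pass (1-based) called
 * `ref_name`.  Returns the index of the new entry, or -1 if the
 * reference was not found or the list is full. */
int demo_insert_after(struct demo_pass_list *list, const char *ref_name,
                      int instance, const char *new_name)
{
    int i, seen = 0;

    if (list->count >= DEMO_MAX_PASSES)
        return -1;
    for (i = 0; i < list->count; ++i) {
        if (strcmp(list->names[i], ref_name) == 0 && ++seen == instance) {
            /* Shift the tail right by one slot to open a gap. */
            memmove(&list->names[i + 2], &list->names[i + 1],
                    (size_t)(list->count - i - 1) * sizeof list->names[0]);
            list->names[i + 1] = new_name;
            list->count++;
            return i + 1;
        }
    }
    return -1;
}
```

In this model, asking for instance 1 of "ssa" places the new pass directly
behind the first entry with that name and leaves any later entry of the same
name untouched.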
The plugin_init() routine ties everything together. The first thing we need our
plugin to do is to visit each statement in the source code that is being
compiled and identify the read-only data. Because our pass runs after the SSA
pass, the compiler will hand our plugin GIMPLE, GCC's intermediate language, as
a set of basic blocks per function.
Since all frontends (C, C++, Go, Fortran, etc.) produce GIMPLE, GCC can
effectively optimize just the GIMPLE code, rendering all optimizers capable of
working on any language that GCC can parse. Likewise, when a developer
writes a pass for the GIMPLE intermediate language, it becomes language
agnostic and can be applied to any language GCC parses. GIMPLE is a
three-address code: each statement in the input language is broken down into
simple statements consisting of at most two operands and a result value. In the
case of an assignment statement with an addition operator, we have something
like:
lhs = op1 + op2
Here lhs, op1, and op2 are the three addresses that make up the assignment
statement.
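To see what the three-address form means for a larger expression, the sketch
below writes the same computation twice in plain C: once as a programmer might
write it, and once with one operator per statement. The temporaries t1 to t3
are hypothetical names chosen for this example; the compiler picks its own
names for the temporaries it introduces.

```c
/* The expression as a programmer might write it. */
int demo_direct(int b, int c, int d, int e)
{
    return (b + c * d) - e;
}

/* The same computation with one operator per statement, each of the
 * form "lhs = op1 <operator> op2".  t1..t3 are made-up temporaries. */
int demo_three_address(int b, int c, int d, int e)
{
    int t1 = c * d;
    int t2 = b + t1;
    int t3 = t2 - e;

    return t3;
}
```

Both functions compute the same value; the second simply makes every
intermediate result explicit, which is the shape a pass like ours gets to
inspect.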
Read-only identification
There might be an easier way of finding read-only data in a program, however
our approach here allows one to understand how to traverse basic blocks and
statements in the program. We are more concerned with learning than
functionality for this tutorial.
When a function gets passed to our plugin we need to do something with it,
mainly analyze it and identify read-only data. The following is the callback
we registered previously, myplugin_exec():
static unsigned myplugin_exec(void)
{
    unsigned i;
    const_tree str, op;
    basic_block bb;
    gimple stmt;
    gimple_stmt_iterator gsi;

    FOR_EACH_BB(bb)
        for (gsi=gsi_start_bb(bb); !gsi_end_p(gsi); gsi_next(&gsi))
        {
            stmt = gsi_stmt(gsi);
            for (i=0; i<gimple_num_ops(stmt); ++i)
                if ((op = gimple_op(stmt, i)) && (str = is_str_cst(op)))
                    spell_check(stmt, str);
        }

    return 0;
}
FOR_EACH_BB operates on a global variable in GCC which represents
the current function being processed, cfun. The gsi
is a GIMPLE statement iterator. In short, this loop traverses each basic
block in cfun and visits each statement in that function. We look
at each statement in the basic block via the gsi interface.
Finally we look at each operand that makes up the GIMPLE statement.
is_str_cst() is a helper function that we will define below. It determines
whether the operand we plucked from the statement represents a string constant
and, if so, returns the string node. If it does, we will spell check that
string.
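The shape of that traversal (blocks, then statements, then operands) can be
tried out without GCC using a toy model. Everything below is hypothetical: the
demo_* structures are plain arrays invented for this sketch and bear no
relation to how GCC actually represents basic blocks or GIMPLE statements.

```c
#include <stddef.h>

enum demo_op_kind { DEMO_OP_INT, DEMO_OP_STRING };

struct demo_operand {
    enum demo_op_kind kind;
    const char *text;            /* only meaningful for DEMO_OP_STRING */
};

struct demo_stmt {
    const struct demo_operand *ops;
    size_t num_ops;
};

struct demo_block {
    const struct demo_stmt *stmts;
    size_t num_stmts;
};

/* Visit every operand of every statement of every block and count the
 * string operands; a real pass would call its checker here instead. */
size_t demo_count_strings(const struct demo_block *blocks, size_t num_blocks)
{
    size_t b, s, o, found = 0;

    for (b = 0; b < num_blocks; ++b)
        for (s = 0; s < blocks[b].num_stmts; ++s)
            for (o = 0; o < blocks[b].stmts[s].num_ops; ++o)
                if (blocks[b].stmts[s].ops[o].kind == DEMO_OP_STRING)
                    ++found;
    return found;
}
```

The three nested loops correspond to FOR_EACH_BB, the gsi iteration, and the
gimple_num_ops() loop in the callback above.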
In GCC, every construct in the source code is essentially a tree. A tree
node could represent constants, variables, functions, etc. Often these tree
instances are wrapped multiple times. For instance the tree node might be
encapsulated in an SSA_NAME instance for the SSA representation. Under that
might be a POINTER_TYPE and under that an INTEGER_CST for a constant integer.
Note that these node types are defined in tree.def. Helper macros in tree.h
can help to identify and peel off these layers: TREE_CODE() identifies the kind
of node and TREE_OPERAND() fetches what it wraps. Also, TREE_TYPE() is useful
for peeling off the layers. The debug_* functions can help you better
understand this layering, particularly debug_tree(). The following routine is
what we will start off with for identifying a STRING_CST:
static const_tree is_str_cst(const_tree node)
{
    const_tree str = node;

    /* Filter out types we are ignoring */
    if (TREE_CODE(str) == VAR_DECL)
    {
        if (!(str = DECL_INITIAL(node))) /* nop expr */
            return NULL_TREE;
        else if (TREE_CODE(str) == INTEGER_CST) /* Ignore single chars */
            return NULL_TREE;
        str = TREE_OPERAND(str, 0); /* addr expr */
    }
    else if (TREE_CODE(str) == ADDR_EXPR)
        str = TREE_OPERAND(str, 0);

    /* We only deal with readonly stuff */
    if (!TYPE_READONLY(str) && (TREE_CODE(str) != ARRAY_REF))
        return NULL_TREE;

    if (TREE_CODE(str) != STRING_CST)
        str = TREE_OPERAND(str, 0);

    if (TREE_CODE(str) != STRING_CST)
        return NULL_TREE;
    else
        return str;
}
As you can see, we detect the type of node to be processed. If we are
initially presented with a variable declaration node (VAR_DECL), we look at its
initial value via DECL_INITIAL(). If we get an address expression, we peel off
that layer and look at what it is the address of. If the node isn't read-only
we assume that it's probably not a hard-coded string, so we just ignore it.
There is probably a better way of handling this, such as placing the operand
checks in a loop around the node, but I'll leave other methods of peeling the
types as an exercise for the reader.
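One possible shape for the loop-based approach hinted at above is sketched
here on a toy node type. This is not GCC's tree API: struct demo_node and the
DEMO_* codes are hypothetical, loosely named after the tree codes used in
is_str_cst(), and the set of wrappers treated as "peelable" is an assumption.

```c
#include <stddef.h>

/* Hypothetical node kinds, loosely named after the tree codes above. */
enum demo_code {
    DEMO_STRING_CST,
    DEMO_INTEGER_CST,
    DEMO_ADDR_EXPR,
    DEMO_NOP_EXPR,
    DEMO_ARRAY_REF
};

struct demo_node {
    enum demo_code code;
    const struct demo_node *operand0;  /* wrapped node, or NULL for leaves */
    const char *string;                /* payload for DEMO_STRING_CST */
};

/* Peel wrapper nodes until a string constant is reached.  Returns the
 * string node, or NULL if the chain ends in anything else. */
const struct demo_node *demo_find_string(const struct demo_node *node)
{
    while (node != NULL) {
        switch (node->code) {
        case DEMO_STRING_CST:
            return node;
        case DEMO_ADDR_EXPR:
        case DEMO_NOP_EXPR:
        case DEMO_ARRAY_REF:
            node = node->operand0;   /* step one layer down */
            break;
        default:
            return NULL;             /* e.g. an integer constant */
        }
    }
    return NULL;
}
```

The loop handles any nesting depth of the listed wrappers, which is the main
advantage over checking a fixed number of layers by hand.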
Conclusion
Plugin passes are quite powerful, as can be seen above. I hope this brief
jaunt down "pass" lane has been helpful. But before I leave you to your plugin
hacking I want to give a few more tips that can aid learning the GCC internals.
Use a debugger. If you built GCC and your plugin with debugging symbols, you
can learn a lot by stepping through the code. When debugging via gdb, make sure
you use the actual binary for compiling the language you want: for Go the
binary is gccgo, and for C you probably want to use xgcc instead of the
installed gcc. xgcc is located in the object directory where you built GCC, in
the gcc subdirectory. Use that for debugging C-related code. One other useful
tip in learning the GCC internals is to actually trace the code that is used
for printing and dumping data from the passes, such as debug_gimple_stmt(),
debug_generic_stmt(), and debug_tree(). These will help you see how GCC
structures its objects, because everything is a tree.