
An introduction to creating GCC plugins

September 27, 2011

This article was contributed by Matt Davis

The free and open source community is largely based on extensibility, sharing, and modularity, so projects will often provide an API that allows users to add functionality. One way to accomplish that is via plugins to the base project, which allows a user to customize the project to their needs and easily share the result with others who might also see utility in such a modification. Such a framework was recently implemented in GCC.

As of version 4.5 of GCC, a user can create a plugin that provides extra features to the compiler, which they can further pass around as a shareable module. GCC plugins provide developers with a rich subset of the GCC API to allow them to extend GCC as they see fit. Whether it is writing an additional optimization pass, transforming code, or analyzing information, plugins can be quite useful.

In this article we will learn how to develop a GCC plugin by creating one that can be used for spell checking read-only strings in an executable. What is the reason for doing so, one might ask? Well, it's a lesson on plugins, and this example provides a nice way to investigate this aspect of GCC.

Creating a plugin

A plugin is just a shared object file that follows an API defined by GCC. To get started cooking up a plugin we will need to obtain a version of GCC, 4.5 or later. After your GCC is built, you are ready to get your hack on. Create a development directory with "mkdir myplugin" and enter it with "cd myplugin". From there we will create the source file which will be our plugin: myplugin.c.

Plugin preparation

The first step in creating a plugin is to fill out some of the structures that the plugin API provides. These allow GCC to properly call the plugin, provide the GCC user some "help" information, and verify that the plugin is able to run on the version of the compiler that is trying to use the plugin. Before I introduce the plugin API, take a peek at it for yourself. The easiest way to figure out where the API headers are located is to run the GCC that you are targeting your plugin for:

    gcc -print-file-name=plugin   

The files in there are the GCC API accessible to your plugin. If, however, the command simply prints the word "plugin", that means those files are not installed on your system. Of interest right now is the boiler-plate stuff, located in gcc-plugin.h. The following structures are provided by the API, and their fields are documented in gcc-plugin.h.

  • struct plugin_name_args:
    • Contains information GCC deduces about the plugin that is being called.
    • We do not fill this structure out, rather GCC passes us an instance of it when our plugin's initialization routine is called. We can also get arguments for the plugin that were passed via the command line, such as: -fplugin-arg-myplugin-foo=bar. All plugin arguments are prefixed with the "basename" of the plugin, which is the name of the plugin shared object file that results after compilation but without the .so extension.

  • struct plugin_info:
    • Version and help string for your plugin, what gcc -v will display.
    • Must be registered upon initialization of your plugin
    • Optional

  • struct plugin_gcc_version:
    • Ensures that your plugin will operate on the appropriate version of GCC.
    • Gets passed as input to the plugin via initialization. We can use this to verify version information.
    • Optional
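As a rough illustration of the argument handling described for struct plugin_name_args above, the sketch below walks a key/value argument array and looks up one key. The two structs are simplified stand-ins that mirror only the fields used here (a real plugin gets the full definitions from gcc-plugin.h), and find_plugin_arg() is a hypothetical helper, not part of the GCC API.

```c
#include <stddef.h>
#include <string.h>

/* Simplified stand-ins for the types declared in gcc-plugin.h; a real
 * plugin would include that header instead of defining these. */
struct plugin_argument {
    char *key;    /* "foo" in -fplugin-arg-myplugin-foo=bar */
    char *value;  /* "bar", or NULL if no "=value" was given */
};

struct plugin_name_args {
    char *base_name;               /* "myplugin" for myplugin.so */
    int argc;                      /* number of arguments */
    struct plugin_argument *argv;  /* the arguments themselves */
};

/* Hypothetical helper: return the value stored for key.  The result is
 * NULL both when the key is absent and when it was given without a value. */
static const char *find_plugin_arg(const struct plugin_name_args *info,
                                   const char *key)
{
    int i;

    for (i = 0; i < info->argc; ++i)
        if (strcmp(info->argv[i].key, key) == 0)
            return info->argv[i].value;
    return NULL;
}
```

With -fplugin-arg-myplugin-foo=bar on the command line, a call such as find_plugin_arg(info, "foo") inside plugin_init() would be expected to yield "bar".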

Now to get started. Using your favorite text editor, create a file called myplugin.c and include the headers for the basic GCC API.

    #include <gcc-plugin.h>  

The next thing we must define is a variable called plugin_is_GPL_compatible. When plugins are being loaded, the symbols in the resulting object file are examined, and before the plugin will actually be usable, GCC makes sure that there is a symbol called plugin_is_GPL_compatible. If this symbol exists, then your plugin is deemed to have the proper license.

    int plugin_is_GPL_compatible = 1;  

Next, let's define some metadata about the plugin by creating an instance of struct plugin_info, which GCC shows when the user asks for help (via gcc -v and/or gcc -v --help) with the plugin loaded. This is not required but can be helpful. For it to take effect, GCC must be handed this structure during initialization:

    static struct plugin_info myplugin_info =
    {
        .version = "42",
        .help = "Not yet...",
    };

Next, we will define an instance of plugin_gcc_version so that, during initialization, we can verify that our plugin and GCC can dance together nicely.

    static struct plugin_gcc_version myplugin_ver =
    {
        .basever = "4.6",
    };

Note that if you let GCC decide whether the version matches, e.g. via a call at initialization to plugin_default_version_check(), then every field of this structure (basever, datestamp, devphase, revision, and configuration_arguments), all of which are strings, must match the corresponding field of the GCC you are using. You can use your own version check instead, which is usually the better choice, as I doubt that your configuration options would exactly match those of your plugin's users. In short, do your own version checking unless you are bundling your plugin as part of a GCC release, where the configuration options, datestamps, etc. will match.

If you are curious, the plugin-version.h header has the basever, datestamp, devphase, and revision data that were used when you built GCC (or when your distribution's package maintainer built it). For our example, we will validate only against the base version string of GCC and will handle the version check ourselves. For our purposes we accept any GCC 4.6 release and ignore the micro version number, e.g. the '0' in 4.6.0.
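A hand-rolled check of only the base version could be sketched as follows. This is a standalone illustration, not code from the plugin: basever_matches() is a hypothetical helper, and it is slightly stricter than a bare prefix comparison in that it also looks at the character after the prefix.

```c
#include <string.h>

/* Hypothetical helper: does basever (e.g. "4.6.0") belong to the release
 * series named by wanted (e.g. "4.6")?  Returns 1 for yes, 0 for no. */
static int basever_matches(const char *basever, const char *wanted)
{
    size_t n = strlen(wanted);

    if (strncmp(basever, wanted, n) != 0)
        return 0;
    /* Accept "4.6" and "4.6.x" but not a longer minor such as "4.60". */
    return basever[n] == '\0' || basever[n] == '.';
}
```

In a plugin, the first argument would be the basever field of the struct plugin_gcc_version that GCC passes to the initialization routine.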

Plugin initialization

By now we have provided a bunch of info about the plugin, but there is one more piece of information that is required by GCC: the routine that actually initializes the plugin and registers callbacks to occur during the compilation of a program. This is another symbol the plugin framework will pluck out of the file, so it must be named plugin_init(). Our plugin is going to operate as a GIMPLE pass, allowing us to analyze (and transform if we wish) GCC's middle-end representation of the program being compiled. There are a couple of other options available. An IPA_PASS is an interprocedural pass, in which the developer works across function boundaries to perform optimizations that depend on more than one function, such as function inlining. An RTL_PASS runs late in compilation, after the GIMPLE passes. This type of pass operates on the RTL (register transfer language) representation of the program. RTL is the lower-level form that GCC uses, together with the machine description, to map GIMPLE onto the instructions and registers of the target machine.

In order to do that, we need to define the proper struct and pass its information to the plugin framework, which will then call our callback handler once for each function in the source file. There are a number of other things that can be accomplished via plugins. These things are called "events" and are enumerated values in the plugin.def file. The .def files in GCC represent enumerations and are translated to source code when GCC is being compiled. These are then included into GCC's source for plugin.c:

    #include "plugin.def"

This makes using ctags kind of tricky, because it does not inspect .def files, nor does it know how to parse them. On the other hand, cscope works well if you tell it to look for .def files.
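The way a list of entries turns into an enumeration can be sketched in a few lines of standalone C. This is only an illustration of the general technique: the list lives in a macro here rather than in a separate .def file, the three event names are a small sample, their numeric values do not correspond to GCC's, and sample_event_name() is a hypothetical helper.

```c
#include <stddef.h>

/* Stand-in for the contents of a .def file: one DEFEVENT() line per event.
 * The event names below are a small sample chosen for illustration. */
#define SAMPLE_EVENTS \
    DEFEVENT(PLUGIN_PASS_MANAGER_SETUP) \
    DEFEVENT(PLUGIN_FINISH_UNIT) \
    DEFEVENT(PLUGIN_INFO)

/* First expansion: each entry becomes an enumerator. */
#define DEFEVENT(NAME) NAME,
enum sample_event { SAMPLE_EVENTS SAMPLE_EVENT_COUNT };
#undef DEFEVENT

/* Second expansion: the same list becomes a table of printable names. */
#define DEFEVENT(NAME) #NAME,
static const char *const sample_event_names[] = { SAMPLE_EVENTS };
#undef DEFEVENT

/* Map an enumerator back to its name, or NULL if it is out of range. */
static const char *sample_event_name(enum sample_event ev)
{
    return ev < SAMPLE_EVENT_COUNT ? sample_event_names[ev] : NULL;
}
```

Because the enumerators only exist after the preprocessor has run, a tool that reads the raw source never sees them spelled out in an ordinary enum declaration, which is why tag generators struggle.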

Below is the data structure we use when our plugin is asked to initialize itself.

    /* See tree-pass.h for a list and descriptions for the fields of this struct */
    static struct gimple_opt_pass myplugin_pass = 
    {
        .pass.type = GIMPLE_PASS,
        .pass.name = "myplugin", /* For use in the dump file */
    
        /* Predicate (boolean) function that gets executed before your pass.  If the
         * return value is 'true' your pass gets executed, otherwise, the pass is
         * skipped.
         */
        .pass.gate = myplugin_gate,  /* always returns true, see full code */
        .pass.execute = myplugin_exec, /* Your pass handler/callback */
    };

Please note that there are many more options we can specify for our pass, but to keep things simple, we are just going to have a basic pass with a gate and an execution callback. myplugin_exec() is the function that GCC calls back under the conditions given when the pass is registered in plugin_init(). The source code for the full plugin is linked from the online version of this article.
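The gate named in the pass structure is not reproduced in the article text. A minimal sketch, assuming the bool (*gate)(void) signature that tree-pass.h declares in the GCC 4.6 series, might look like this (the stdbool.h include is only there so the fragment compiles on its own; inside a plugin the GCC headers already provide bool):

```c
#include <stdbool.h>

/* Gate for the pass: returning true means "always run myplugin_exec()". */
static bool myplugin_gate(void)
{
    return true;
}
```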

Now that we have a structure defining how we want our pass to act, let's finally write the required plugin_init() function:

    /* Return 0 on success or error code on failure */
    int plugin_init(struct plugin_name_args   *info,  /* Argument information */
                    struct plugin_gcc_version *ver)   /* Version info of GCC  */
    {
        /* 
         * Used to tell the plugin-framework about where we want to be called in the
         * set of all passes.  This is located in tree-pass.h
         */
        struct register_pass_info pass;
        printf("Plugin initialized...\n");
        
        /*
         * We could call: plugin_default_version_check() to validate our plugin, but
         * we will skip that.  Instead, as mentioned it can be more useful if we
         * validate the version information ourselves
         */
        if (strncmp(ver->basever, myplugin_ver.basever, strlen("4.6")))
            return -1; /* Incorrect version of GCC */
    
        /*
         * Setup the info to register with GCC telling when we want to be called and
         * to what GCC should call, when it's time to be called.
         */
        pass.pass = &myplugin_pass.pass;
    
        /*
         * Get called after GCC has produced the SSA representation of the program.
         * After the first SSA pass.
         */
        pass.reference_pass_name = "ssa";
        pass.ref_pass_instance_number = 1;
        pass.pos_op = PASS_POS_INSERT_AFTER;
    
        /* Tell GCC we want to be called after the first SSA pass */
        register_callback("myplugin", PLUGIN_PASS_MANAGER_SETUP, NULL, &pass);
    
        /*
         * Tell GCC some information about us... just for use in --help and
         * --version
         */
        register_callback("myplugin", PLUGIN_INFO, NULL, &myplugin_info);
    
        /* Successful initialization */ 
        return 0;
    }  

The above ties everything together. The first thing we need our plugin to do is to visit each statement in the source code that is being compiled and identify the read-only data. Because our pass runs after SSA construction, the compiler will hand our plugin GIMPLE, GCC's intermediate language, as a set of basic blocks per function. Since all frontends (C, C++, Go, Fortran, etc.) produce GIMPLE, GCC can optimize just the GIMPLE code, making every optimizer capable of working on any language that GCC can parse. Likewise, a pass that a developer writes for GIMPLE is language agnostic and can be applied to any language GCC parses. GIMPLE is a three-address code: expressions in the input language are broken down into statements that typically consist of two operands and a result value. In the case of an assignment statement with an addition operator, we have something like:

    lhs = op1 + op2

Here lhs, op1, and op2 are the three addresses that make up the assignment statement.
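To make the three-address shape concrete, here is a hand-written sketch in plain C of how an expression with two operators can be split into one-operator statements. The temporaries t1 and t2 are illustrative names only; the temporaries GCC actually creates, and the way it prints them in dump files, will look different.

```c
/* The expression as a programmer might write it. */
static int source_form(int b, int c, int d)
{
    return b + c * d;
}

/* The same computation with one operator per statement, in the spirit of
 * three-address code.  t1 and t2 are illustrative temporaries. */
static int three_address_form(int b, int c, int d)
{
    int t1, t2;

    t1 = c * d;   /* lhs = op1 * op2 */
    t2 = b + t1;  /* lhs = op1 + op2 */
    return t2;
}
```

Both functions compute the same value; the second simply exposes each intermediate result as its own statement, which is the granularity a GIMPLE pass gets to inspect.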

Read-only identification

There might be an easier way of finding read-only data in a program. However, our approach here allows one to understand how to traverse basic blocks and statements in the program. We are more concerned with learning than functionality for this tutorial.

When a function gets passed to our plugin we need to do something with it, mainly analyze it and identify read-only data. The following is the callback we registered previously, myplugin_exec():

   
    static unsigned myplugin_exec(void)
    {
       unsigned i;
       const_tree str, op;
       basic_block bb;
       gimple stmt;
       gimple_stmt_iterator gsi;

       FOR_EACH_BB(bb)
         for (gsi=gsi_start_bb(bb); !gsi_end_p(gsi); gsi_next(&gsi))
         {
             stmt = gsi_stmt(gsi);
             for (i=0; i<gimple_num_ops(stmt); ++i)
               if ((op = gimple_op(stmt, i)) && (str = is_str_cst(op)))
                 spell_check(stmt, str);
         }

       return 0;
    }

FOR_EACH_BB operates on a global variable in GCC which represents the current function being processed, cfun. The gsi is a GIMPLE statement iterator. In short, this loop traverses each basic block in cfun and visits each statement in that function. We look at each statement in the basic block via the gsi interface. Finally we look at each operand that makes up the GIMPLE statement. The is_str_cst() is a predicate function that we will define below. This function determines if the operand we plucked from the statement represents a string constant. If it does we will spell check that string.
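The article does not show spell_check() itself. As a rough, self-contained sketch of the checking step only, the code below splits a C string into alphabetic words and counts those missing from a tiny hard-coded word list. Everything in it (known_words, is_known_word(), count_misspelled()) is hypothetical, and a real plugin would still have to extract the characters from the STRING_CST node and report the result through GCC's diagnostics.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Tiny illustrative dictionary; a real checker would use a word list. */
static const char *const known_words[] = { "hello", "world", "plugin" };

/* Return 1 if the len characters at word spell a dictionary entry. */
static int is_known_word(const char *word, size_t len)
{
    size_t i;

    for (i = 0; i < sizeof(known_words) / sizeof(known_words[0]); ++i)
        if (strlen(known_words[i]) == len &&
            strncmp(known_words[i], word, len) == 0)
            return 1;
    return 0;
}

/* Count the alphabetic words in text that are not in the dictionary.
 * Matching is case-sensitive to keep the sketch short. */
static unsigned count_misspelled(const char *text)
{
    unsigned bad = 0;
    const char *p = text;

    while (*p) {
        const char *start;

        while (*p && !isalpha((unsigned char)*p))
            ++p;                      /* skip separators */
        start = p;
        while (isalpha((unsigned char)*p))
            ++p;                      /* consume one word */
        if (p > start && !is_known_word(start, (size_t)(p - start)))
            ++bad;
    }
    return bad;
}
```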

In GCC, every construct in the source code is essentially a tree. A tree node could represent constants, variables, functions, etc. Often these tree instances are wrapped multiple times. For instance the tree node might be encapsulated in an SSA_NAME instance for the SSA representation. Under that might be a POINTER_TYPE and under that an INTEGER_CST for a constant integer. Note that these node types are defined in tree.def. Helper functions in tree.h can help to identify and peel off these layers, such as with TREE_CODE() and TREE_OPERAND() respectively. Also, TREE_TYPE() is useful for peeling off the layers. The debug_* functions can help better understand this layering, particularly debug_tree(). The following routine is what we will start off with for identifying STRING_CST:

    static const_tree is_str_cst(const_tree node)
    {
       const_tree str = node;

       /* Filter out types we are ignoring */
       if (TREE_CODE(str) == VAR_DECL)
       {
           if (!(str = DECL_INITIAL(node))) /* nop expr  */
             return NULL_TREE;
           else if (TREE_CODE(str) == INTEGER_CST) /* Ignore single chars */
             return NULL_TREE;

           str = TREE_OPERAND(str, 0); /* addr expr */
       }
       else if (TREE_CODE(str) == ADDR_EXPR)
           str = TREE_OPERAND(str, 0);

       /* We only deal with readonly stuff */
       if (!TYPE_READONLY(str) && (TREE_CODE(str) != ARRAY_REF))
           return NULL_TREE;
      
       if (TREE_CODE(str) != STRING_CST) 
           str = TREE_OPERAND(str, 0);

       if (TREE_CODE(str) != STRING_CST)
           return NULL_TREE;
       else
           return str;
    }

As you can see, we detect the type of node to be processed. If we are initially presented with a declaration node (which one might get if the node were a parameter instance) we look at its initial value via DECL_INITIAL(). If we get an address, we peel off that variant and look at what it is an address expression of. If the node isn't read-only we assume that it's probably not a hard-coded string, so we just ignore it. There is probably a better way of handling this, such as just placing the node in a loop around operand checks, but I'll leave other methods of peeling the types as an exercise for the reader.
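One way to picture the "loop around operand checks" idea is the standalone sketch below. It does not use GCC's tree type: struct node and enum node_code are simplified stand-ins whose members merely borrow the names of a few GCC tree codes, and peel_to_string() is a hypothetical helper. The read-only test and the DECL_INITIAL() step from is_str_cst() are left out.

```c
#include <stddef.h>

/* Stand-in node kinds, loosely named after GCC tree codes. */
enum node_code { ADDR_EXPR, NOP_EXPR, ARRAY_REF, STRING_CST, INTEGER_CST };

/* Stand-in for a tree node: a kind plus at most one operand. */
struct node {
    enum node_code code;
    const struct node *operand;  /* what this node wraps, or NULL */
};

/* Peel wrapper nodes in a loop until a string constant or a dead end. */
static const struct node *peel_to_string(const struct node *n)
{
    while (n != NULL) {
        switch (n->code) {
        case STRING_CST:
            return n;            /* found the string constant */
        case ADDR_EXPR:
        case NOP_EXPR:
        case ARRAY_REF:
            n = n->operand;      /* strip one layer and look again */
            break;
        default:
            return NULL;         /* not something that wraps a string */
        }
    }
    return NULL;
}
```

In a real plugin the same loop would presumably switch on TREE_CODE() and step inward with TREE_OPERAND(), so that any depth of wrapping is handled without a fixed sequence of if statements.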

Conclusion

Plugin passes are quite powerful, as can be seen above. I hope this brief jaunt down "pass" lane has been helpful. But before I leave you to your plugin hacking I want to give a few more tips that can aid in learning the GCC internals. Use a debugger. If you built GCC and your plugin with debugging symbols, you can learn a lot by stepping through the code. When debugging via gdb, make sure you use the actual binary for compiling the language you want: for Go the binary is gccgo, and for C you probably want to use xgcc instead of gcc. xgcc is located in the object directory where you built GCC, in the gcc subdirectory. Use that for debugging C-related code. One other useful tip for learning the GCC internals is to trace the code that is used for printing and dumping data from the passes, such as debug_gimple_stmt(), debug_generic_stmt(), and debug_tree(). These will help you see how GCC structures its objects, because everything is a tree.

An introduction to creating GCC plugins: also GCC MELT or GCC Python

Posted Sep 29, 2011 6:42 UTC (Thu) by bstarynk (guest, #63409) [Link]

There are also some simpler ways to extend GCC: coding GCC plugins in C is tedious, and there are some alternatives:

  • First, use GCC-MELT (disclaimer: I, Basile, am the main author of GCC MELT). MELT is a high-level domain specific language to ease the development of GCC extensions. The MELT language provides some high-level features: ability to do powerful pattern matching on GCC internal data (like Gimple): first-class dynamically typed garbage-collected values; ability to add new GCC passes in MELT; the MELT language is translated to C, and you can even add small C code chunks in your MELT code; MELT has a syntax similar to Lisp or Scheme; your MELT extensions can analyze or modify GCC internal representations, for various tasks like static analysis, coding rules validation, specific warnings, optimization, code refactoring, etc. MELT is available as a GCC [meta-] plugin.
  • There is also a GCC Python Plugin which enables you to code your GCC plugin-like extensions in Python. However, Python doesn't give you pattern matching on GCC internal representations (like MELT does), and the GCC Python plugin is less mature than MELT, so it gives you access to less internal GCC data.

The main point is that people should consider extending GCC for their own needs and this is now possible with GCC plugin machinery, either in C, or in MELT or Python.

-- Basile Starynkevitch

An introduction to creating GCC plugins: also GCC MELT or GCC Python

Posted Sep 29, 2011 20:19 UTC (Thu) by dave_malcolm (subscriber, #15013) [Link]

Thanks for mentioning the gcc Python plugin.

I'm the author of the Python plugin, so naturally I'm fond of it.

The lack of pattern matching within Python's syntax hasn't been a major issue to me in practice. Also, I like to think that it's become significantly more mature than when I first announced it :)

I suspect that MELT will lead to faster-executing results, given that CPython isn't the fastest runtime in the world - though speed hasn't been an issue for me yet, either.

Inspired by this LWN article, I had a go at implementing a spell-checking GCC pass in Python, using my plugin.

You can see the results at:
http://readthedocs.org/docs/gcc-python-plugin/en/latest/w...

I believe my version is considerably simpler than doing it in C (about 35 lines of Python code, including comments), and, I hope, easier to read.

The Python implementation of the spell checker is slightly different to the one in the article, in that I'm using gcc.Gimple.walk_tree() on each statement to (recursively) visit all tree nodes it references. I'm also using gcc.warning() and gcc.inform() to integrate with GCC's warnings subsystem, so that the error messages are in GCC's normal format (so that e.g. when I click on them in emacs it takes me to the source line containing the error).

One of the nice things about using Python for this is all of the 3rd-party libraries that are available. In the Python implementation I'm using the "enchant" module that came out of AbiWord's spellchecker, and this makes it easy to offer spelling suggestions. There is a huge collection of Python libraries out there, so if you want to e.g. pause compilation part-way through and serve the internal-representation over HTTP, or dump it all into a database etc, there are lots of possibilities there.

Hope this is helpful
Dave

source code for example

Posted Sep 29, 2011 9:04 UTC (Thu) by mjw (subscriber, #16740) [Link]

Is the source code for this example plugin available for those of us who are too lazy to copy/paste from the article?

source code for example

Posted Sep 29, 2011 13:27 UTC (Thu) by corbet (editor, #1) [Link]

Over here. It was linked from the article, though, I must admit, not in an easy-to-notice way.

An introduction to creating GCC plugins

Posted Sep 29, 2011 21:58 UTC (Thu) by PaXTeam (subscriber, #24616) [Link]

if anyone's interested in what gcc plugins can do in a kernel context, check out the few distributed in PaX (under tools/gcc). some examples: automatic ops structure constification, user/kernel function pointer separation, *alloc size statistics, etc.

An introduction to creating GCC plugins

Posted Sep 30, 2011 13:38 UTC (Fri) by vonbrand (subscriber, #4458) [Link]

I'd vote for a way to point out such opportunities to hackers, or even write source patches. Some magic mangling that gives object code that doesn't behave as the source says is deeply disturbing to my little mind.

An introduction to creating GCC plugins

Posted Sep 30, 2011 21:10 UTC (Fri) by PaXTeam (subscriber, #24616) [Link]

i assume you're compiling all your code with -O0 then lest you fall victim of the infamous NULL check elimination as the kernel itself did not long ago, not to mention SSP that must be your enemy #1 then ;). more seriously, and assuming you're talking about the constification plugin, the truth is that the patch route was tried before and it just doesn't scale at both the producer (patch writer) and the consumer (lkml/kernel devs) side.

to give you some numbers, an allmodconfig 2.6.39 i386 kernel loses over 7000 static (i.e., not runtime allocated) writable function pointers (a reduction of about 16%). creating an equivalent source patch would be thousands of lines of code and have virtually no chance to be accepted in any reasonable amount of time (not to mention the maintenance effort of being out-of-tree during the process; i've carried a small fraction of such a patch in PaX for years and it was a PITA even if the patch itself was 'only' 800k).

also by virtue of enforcing const on these types one will quickly find out all the places where variables of a given type are modified (sometimes in direct contradiction to kernel policy) and one will be forced to document it in the code (for each such exception there's a writable type to be used, based on the no_const attribute introduced by the plugin). the patch for this is about 100k, but it also includes changes that actually eliminate the need to modify a given variable (vs. just changing its type to be writable).

all in all, the cost/benefit ratio of the plugin approach is excellent and there's a lot more in the pipeline ;).

An introduction to creating GCC plugins

Posted Oct 4, 2011 8:44 UTC (Tue) by MattDavis (guest, #79566) [Link]

Thanks for the tip!

An introduction to creating GCC plugins

Posted Jun 17, 2013 21:04 UTC (Mon) by sandeepchaudhary (guest, #91458) [Link]

Hi guys,

I am posting this comment here for some help. I have followed the steps mentioned in this tutorial to create the plugin, and I am able to successfully build the plugin shared object. However, I can not run the plugin. It gives errors related to undefined symbols. Below is the terminal snippet.

sandeep@ubuntu:~/myplugin$ /usr/bin/gcc -fplugin=/home/sandeep/myplugin/speller.so -c test.c
cc1: error: cannot load plugin /home/sandeep/myplugin/speller.so
/home/sandeep/myplugin/speller.so: undefined symbol: warning_at

I can see that the mentioned symbol is indeed undefined using the following:

sandeep@ubuntu:~/myplugin$ nm -D -C speller.so | grep warning_at
U warning_at

And I see that

sandeep@ubuntu:/usr/lib/gcc/x86_64-linux-gnu/4.7.2$ pwd
/usr/lib/gcc/x86_64-linux-gnu/4.7.2
sandeep@ubuntu:/usr/lib/gcc/x86_64-linux-gnu/4.7.2$ nm -D -C $(gcc -print-file-name=cc1) | grep "warning_"
0000000000c1e5a0 T warning_at(unsigned int, int, char const*, ...)

What am I missing? How should I link the libraries correctly?

Thanks,
Sandeep.

An introduction to creating GCC plugins

Posted Jun 17, 2013 21:28 UTC (Mon) by PaXTeam (subscriber, #24616) [Link]

gcc 4.7 introduced the ability to be built with a c++ compiler and most/all distros build gcc that way so you'll have to build your plugins with g++ instead of gcc as well.

Copyright © 2011, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds