Replacing AWK with Python in GCC?
GCC has a lot of command-line options: so many, in fact, that its build process does a fair amount of processing using AWK to generate the option-parsing code for the compiler. But some find the AWK code to be difficult to work with. A recent post to the GCC mailing list proposes replacing AWK with Python in the hope of making the generation of option-parsing code more maintainable in the future.
Martin Liška raised the idea on July 17 to gauge the reaction of the GCC development community to a switch away from AWK; there are a number of cleanups that he would like to make, but he doesn't want to make them in AWK. One problem that he noted is that the .opt file format used for specifying the options is not well-specified, so part of what he would like to do is to clean that up. That should make those files easier to parse, and easier to produce for the targets (e.g. ARM) that generate them. There are other problems with the AWK code, he said, including too few sanity checks on the options and the code generally being "quite unpleasant to make any adjustments".
His post was accompanied by a Python script that can build code from an optionlist file. That file is created as part of the process of building GCC by combining all of the options in various .opt files (using an AWK script that would presumably be replaced as well). Right now, the optionlist file is processed by other AWK scripts to generate .c and .h files that will do the option parsing. The .opt files specify the names, types, and arguments for the GCC command-line options.
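The generation step described above can be sketched roughly as follows. This is a minimal illustration, not Liška's script and not GCC's real output. It assumes a simplified record layout (option name on the first line, space-separated properties on the second, help text after that, records separated by blank lines, and `;` comment lines) and emits a C enum. The helper names, the `OPT_` naming rule, and the sample records are hypothetical.

```python
import re
from dataclasses import dataclass
from typing import List


@dataclass
class OptRecord:
    """One option record (simplified): name, property list, help text."""
    name: str
    properties: List[str]
    help_text: str = ""


def parse_opt_records(text: str) -> List[OptRecord]:
    """Split .opt-style text into records separated by blank lines.

    Assumed (simplified) layout: line 1 is the option name, line 2 holds
    space-separated properties, and any remaining lines are help text.
    Lines beginning with ';' are treated as comments and skipped.
    """
    records: List[OptRecord] = []
    block: List[str] = []
    # A trailing "" acts as a sentinel so the final record is flushed.
    for line in text.splitlines() + [""]:
        if line.startswith(";"):
            continue
        if line.strip():
            block.append(line.rstrip())
            continue
        if block:
            # Naive whitespace split: fine for this sample, but it would
            # break arguments containing spaces, such as Var(a, b).
            props = block[1].split() if len(block) > 1 else []
            records.append(OptRecord(block[0], props, " ".join(block[2:])))
            block = []
    return records


def c_identifier(option_name: str) -> str:
    """Hypothetical naming rule: prefix OPT_ and map non-identifier chars to '_'."""
    return "OPT_" + re.sub(r"[^A-Za-z0-9_]", "_", option_name)


def emit_header(records: List[OptRecord]) -> str:
    """Generate a C enum with one enumerator per option, sorted by name."""
    lines = ["enum opt_code", "{"]
    for rec in sorted(records, key=lambda r: r.name):
        lines.append("  %s," % c_identifier(rec.name))
    lines.append("  N_OPTS")
    lines.append("};")
    return "\n".join(lines)


SAMPLE = """\
; Illustrative records only
fomit-frame-pointer
Common Var(flag_omit_frame_pointer) Optimization
When possible do not generate stack frames.

Wall
C ObjC C++ ObjC++ Warning
Enable most warning messages.
"""

if __name__ == "__main__":
    print(emit_header(parse_opt_records(SAMPLE)))
```

Run as a script, it prints an enum containing `OPT_Wall`, `OPT_fomit_frame_pointer`, and `N_OPTS`. Keeping parsing and output generation as separate functions is what would let one parsed structure feed several generated .c and .h files.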
While there was not a huge amount of opposition to Python per se, there were a number of questions about requiring it to build GCC. Those included which version of Python to require (Liška's script uses Python 3) and how difficult it would be to bring up Python on targets that do not have it available. For example, Karsten Merker was concerned that a Python dependency would make it harder to bring up GCC on new target architectures, but Matthias Klose said that building Python should not be any more of a burden than building AWK. In addition, Python can be cross-built more easily than two other languages that were mentioned along the way: "you can cross build python as well more easily than for example perl or guile".
A bigger problem would seem to be for Windows, where building Python using GCC is not supported, Vadim Konovalov said. That's all a bit circular—using GCC to build Python to build GCC—but using GCC to build itself is a time-honored tradition. There are various binary versions of Python for Windows available, however.
The Python version question is also something that would need to be resolved. There are lots of systems out there that only have Python 2 available, particularly those running enterprise Linux distributions. But creating Python programs that can run on either Python 2 or 3 is certainly possible, and perhaps desirable. Eric S. Raymond said: "It's not very difficult to write 'polyglot' Python that is indifferent to which version it runs under." He pointed to a FAQ/HOWTO that he co-authored, which documents the techniques he has used for projects like reposurgeon.
Those methods do not provide compatibility with Python 2.6, though, which is the version available in some of the older (but still supported) enterprise distributions. Raymond pointed out that Python 2.6 stopped being supported by the Python core developers in 2013; beyond that, the lack of 2.6 support has not been a problem for his projects:
In practice, no deployment of reposurgeon or src or doclifter or any of the other polyglot Python code I maintain has tripped over this, or at least I'm not seeing issue reports about it.
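To make the "polyglot" idea concrete, here is a minimal sketch (not taken from Raymond's HOWTO) of code intended to behave the same under Python 2.7 and Python 3. The `__future__` imports make `print` a function and `/` true division on Python 2, and a small try/except picks the native text type. The function names are hypothetical.

```python
# Polyglot sketch: intended to behave identically under Python 2.7 and Python 3.
from __future__ import absolute_import, division, print_function

try:
    text_type = unicode  # noqa: F821 (the Python 2 text type)
except NameError:
    text_type = str  # Python 3: str is already text


def mean_name_length(names):
    """Mean length of option names; '/' is true division on both versions."""
    if not names:
        return 0.0
    return sum(len(name) for name in names) / len(names)


def describe(names):
    """Return a text (not bytes) summary on either major version."""
    return text_type("%d options, mean name length %.1f") % (
        len(names), mean_name_length(names))


if __name__ == "__main__":
    # print() is a function on both versions thanks to print_function.
    print(describe(["Wall", "O2"]))
```

Without the `division` import, `mean_name_length(["abc", "de"])` would return 2 under Python 2 rather than 2.5. Some conveniences are newer than 2.6, though; for example, `'{}'.format()` with automatic field numbering only arrived in Python 2.7, so code written this way can still trip over 2.6.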
Perl is already required for building certain targets, so some wondered if it made more sense to use it to replace AWK instead of Python. There was also mention of using the GNU Project's preferred scripting language, Guile. But there are a number of advantages to choosing Python, not least of which is that Liška is offering to do the work for that switch. He also elaborated on why he wants to move away from AWK:
Others agreed with the idea of switching away from AWK and thought that Python was a good choice. Several in the thread said that they found AWK hard to read. Paul Koning noted that he supports switching to Python:
Joseph Myers also supported the switch. He also had some suggestions for features that the current AWK scripts lack:
Common code that reads .opt files into some logical datastructure, complete with validation including that all flags specified are in the list of valid flags, followed by converting those structures to whatever output is required, seems appropriate to me.
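The structure Myers describes (read the .opt data into a data structure, validate every flag against a list of valid flags, then generate output) could look something like this minimal sketch. It is not from the thread or from GCC: `VALID_FLAGS` is a small hypothetical subset, the parenthesis-aware splitter and the regular expression are assumptions about property syntax, and `emit_declarations` is a made-up output step.

```python
import re
from typing import Dict, List, NamedTuple

# Hypothetical, deliberately short list of recognized properties; the real
# set accepted in GCC's .opt files is much larger.
VALID_FLAGS = {
    "Common", "Target", "Driver", "Var", "Init", "Optimization",
    "Warning", "Joined", "Separate", "RejectNegative",
    "C", "ObjC", "C++", "ObjC++",
}

# A property is a name, optionally followed by a parenthesized argument.
PROPERTY_RE = re.compile(r"^([A-Za-z+]+)(?:\((.*)\))?$")


class Option(NamedTuple):
    name: str
    flags: Dict[str, str]  # flag name -> argument text ("" if none)


def split_properties(line: str) -> List[str]:
    """Split on whitespace, keeping parenthesized arguments like Var(a, b) intact."""
    tokens: List[str] = []
    current: List[str] = []
    depth = 0
    for ch in line:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        if ch.isspace() and depth == 0:
            if current:
                tokens.append("".join(current))
                current = []
        else:
            current.append(ch)
    if current:
        tokens.append("".join(current))
    return tokens


def parse_option(name: str, property_line: str) -> Option:
    """Build a validated Option, rejecting malformed, unknown, or duplicate flags."""
    flags: Dict[str, str] = {}
    for prop in split_properties(property_line):
        m = PROPERTY_RE.match(prop)
        if m is None:
            raise ValueError("%s: malformed property %r" % (name, prop))
        flag, arg = m.group(1), m.group(2) or ""
        if flag not in VALID_FLAGS:
            raise ValueError("%s: unknown flag %r" % (name, flag))
        if flag in flags:
            raise ValueError("%s: duplicate flag %r" % (name, flag))
        flags[flag] = arg
    return Option(name, flags)


def emit_declarations(options: List[Option]) -> str:
    """Hypothetical output step: one C declaration per Var() property."""
    decls = []
    for opt in options:
        var = opt.flags.get("Var")
        if var:
            # Var() may carry extra comma-separated arguments; keep the name.
            decls.append("extern int %s;" % var.split(",")[0].strip())
    return "\n".join(decls)
```

With validation done once, at parse time, a typo such as `Warnnig` in a property line becomes a build error naming the offending option, rather than something that silently produces odd generated code.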
As noted in his first message, Liška thinks the decision will ultimately need to be made by the GCC steering committee (which Myers is a member of). Given the reaction to Liška's RFC of sorts, it seems likely that the decision will be favorable. He said that he is targeting GCC 10 for these changes, which is still two years or so out at this point. A more full-featured scripting language (and someone actively working on making the option handling better) will serve the project well down the road.
