Replacing AWK with Python in GCC?
GCC has a lot of command-line options—so many, in fact, that its build process does a fair amount of processing using AWK to generate the option-parsing code for the compiler. But some find the AWK code to be difficult to work with. A recent post to the GCC mailing list proposes replacing AWK with Python in the hopes of more maintainable option-parsing generation in the future.
Martin Liška raised the idea on July 17 to gauge the reaction of the GCC development community to a switch away from AWK; there are a number of cleanups that he would like to make, but he doesn't want to make them in AWK.
One problem that he noted is that the .opt file format used for specifying the options is not well-specified, so part of what he would like to do is to clean that up. That should make things easier both for parsing those files and for the targets that generate them (e.g. ARM). There are other problems with the AWK code, he said, including too few sanity checks on the options and the code generally being "quite unpleasant to make any adjustments".
His post was accompanied by a Python script that can build code from an optionlist file. That file is created as part of the process of building GCC by combining all of the options in various .opt files (using an AWK script that would presumably be replaced as well). Right now, the optionlist file is processed by other AWK scripts to generate .c and .h files that will do the option parsing. The .opt files specify the names, types, and arguments for the GCC command-line options.
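GCC's internals documentation describes a .opt file as a series of records separated by blank lines, with comment lines starting with a semicolon. An option record puts the option name (without the leading "-") on its first line, a whitespace-separated list of properties on the second, and help text after that. The following Python sketch parses a simplified version of that format into a list of records. It is not Liška's script or GCC's AWK code. Special record types such as Language or Variable get no special treatment, and the property syntax handled here is an assumption: a word, optionally followed by a parenthesized argument that may contain spaces.

```python
# Simplified .opt record parser: a sketch, not GCC's actual implementation.
import re
from collections import namedtuple

# One option definition: name (no leading '-'), property dict, help text.
OptionRecord = namedtuple("OptionRecord", ["name", "props", "help"])

# A property is a word such as 'Common' or 'C++', optionally followed by a
# parenthesized argument, as in 'Var(flag_example)'.
PROP_RE = re.compile(r"^([\w+]+)(?:\((.*)\))?$")


def split_properties(line):
    """Split a property line on whitespace that is outside parentheses."""
    tokens, current, depth = [], [], 0
    for ch in line:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                raise ValueError("unbalanced ')' in %r" % line)
        if ch.isspace() and depth == 0:
            if current:
                tokens.append("".join(current))
                current = []
        else:
            current.append(ch)
    if depth != 0:
        raise ValueError("unbalanced '(' in %r" % line)
    if current:
        tokens.append("".join(current))
    return tokens


def parse_property(token):
    """Turn 'Var(flag_x)' into ('Var', 'flag_x') and 'Common' into ('Common', None)."""
    match = PROP_RE.match(token)
    if match is None:
        raise ValueError("malformed property %r" % token)
    return match.group(1), match.group(2)


def parse_opt(text):
    """Parse blank-line-separated records, skipping ';' comment lines."""
    records, block = [], []
    # The extra empty line at the end flushes the final record.
    for line in text.splitlines() + [""]:
        if line.lstrip().startswith(";"):
            continue
        if line.strip():
            block.append(line.rstrip())
            continue
        if block:
            props = {}
            if len(block) > 1:
                for token in split_properties(block[1]):
                    key, arg = parse_property(token)
                    if key in props:
                        raise ValueError("%s: duplicate property %s" % (block[0], key))
                    props[key] = arg
            records.append(OptionRecord(block[0], props, " ".join(block[2:])))
            block = []
    return records
```

Suppose parse_opt() receives an illustrative two-record input whose second record has the property line "Common Var(flag_example) Init(0)". It returns two OptionRecord tuples, and the second one's props map "Var" to "flag_example" and "Common" to None. An unbalanced parenthesis raises ValueError instead of being silently accepted, which is the kind of sanity check the existing AWK scripts are said to lack.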
While there was not a huge amount of opposition to Python per se, there were a number of questions about requiring it to build GCC. There were questions about which version of Python to require (Liška's script uses Python 3) as well as how difficult it will be to bring up Python on targets that do not have it available. For example, Karsten Merker was concerned that a Python dependency would make it harder to bring up GCC on new target architectures, but Matthias Klose said that building Python should not be any more of a burden than building AWK. In addition, Python can be cross-built more easily than two other languages that were mentioned along the way: "you can cross build python as well more easily than for example perl or guile".
A bigger problem would seem to be for Windows, where building Python using GCC is not supported, Vadim Konovalov said. That's all a bit circular—using GCC to build Python to build GCC—but using GCC to build itself is a time-honored tradition. There are various binary versions of Python for Windows available, however.
The Python version question is also something that would need to be resolved. There are lots of systems out there that only have Python 2 available, particularly those running enterprise Linux distributions. But creating Python programs that can run on either Python 2 or 3 is certainly possible, and perhaps desirable. Eric S. Raymond said: "It's not very difficult to write 'polyglot' Python that is indifferent to which version it runs under." He pointed to a FAQ/HOWTO that he co-authored, which documented the techniques he has used for projects like reposurgeon.
Those methods do not provide compatibility with Python 2.6, though, which is the version available in some of the older (but still supported) enterprise distributions. Raymond pointed out that Python 2.6 stopped being supported by the Python core developers in 2013; beyond that, the lack of 2.6 support has not been a problem for his projects:
In practice, no deployment of reposurgeon or src or doclifter or any of the other polyglot Python code I maintain has tripped over this, or at least I'm not seeing issue reports about it.
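For readers unfamiliar with the technique, the sketch below shows a few common polyglot idioms. They are not necessarily the ones described in Raymond's HOWTO:

* Try the Python 3 spelling of a renamed name and fall back to the Python 2 one.
* Use io.open for explicit text decoding.
* Use // when integer division is intended.
* Refuse to start on interpreters older than 2.7.

Many polyglot code bases also put `from __future__ import print_function, division` at the top of each file. That line is left out here, and print() is only called with a single argument, which behaves the same on either version.

```python
# A minimal polyglot sketch: intended to run unchanged on Python 2.7 and 3.x.
import io
import sys

# Fail early, with a clear message, on interpreters older than 2.7 (e.g. 2.6).
if sys.version_info < (2, 7):
    sys.exit("this script needs Python 2.7 or later")

# Names that moved between versions: try the Python 3 spelling first.
try:
    from itertools import zip_longest  # Python 3
except ImportError:
    from itertools import izip_longest as zip_longest  # Python 2

# One name for "text": str on Python 3, unicode on Python 2.
try:
    text_type = unicode  # noqa: F821 (only exists on Python 2)
except NameError:
    text_type = str


def read_text(path):
    """Read a UTF-8 file as text; io.open behaves the same on 2.7 and 3.x."""
    with io.open(path, encoding="utf-8") as f:
        return f.read()


def pair_up(names, values):
    """Pair items up, padding the shorter sequence with None."""
    return list(zip_longest(names, values))


def halve(n):
    """Integer halving: // floors on both versions, unlike / on Python 2."""
    return n // 2


if __name__ == "__main__":
    # print() with a single argument behaves the same on both versions.
    print(pair_up(["Wall", "Wextra"], [1]))
    print(halve(7))
```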
Perl is already required for building certain targets, so some wondered if it made more sense to use it to replace AWK instead of Python. There was also mention of using the GNU Project's preferred scripting language, Guile. But there are a number of advantages to choosing Python, not least that Liška is offering to do the work for that switch. He also elaborated on why he wants to move away from AWK:
Others agreed with the idea of switching away from AWK and thought that Python was a good choice. Several in the thread said that they found AWK hard to read. Paul Konig noted that he supports switching to Python:
Joseph Myers also supported the switch. He had some suggestions for features that are not part of the current AWK scripts, as well.
Common code that reads .opt files into some logical datastructure, complete with validation including that all flags specified are in the list of valid flags, followed by converting those structures to whatever output is required, seems appropriate to me.
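Myers's suggestion can be sketched in the same spirit. The Python below assumes the records have already been parsed into (name, properties) pairs. It collects every validation error instead of stopping at the first, and then generates a C enumeration from the records. VALID_FLAGS is a hypothetical, deliberately incomplete list. The OPT_ naming, with non-alphanumeric characters mapped to underscores, is loosely modeled on GCC's generated options.h rather than copied from it.

```python
# Sketch: validate parsed option records, then generate C from them.
import re

# Hypothetical and deliberately incomplete set of recognized properties.
VALID_FLAGS = frozenset([
    "C", "C++", "ObjC", "ObjC++", "Common", "Target", "Driver",
    "Joined", "Separate", "Warning", "Optimization", "Var", "Init",
])


def validate(records):
    """Return every problem found; an empty list means the records are valid."""
    errors = []
    seen = set()
    for name, props in records:
        if name in seen:
            errors.append("duplicate option: %s" % name)
        seen.add(name)
        for flag in sorted(props):
            if flag not in VALID_FLAGS:
                errors.append("%s: unknown flag %r" % (name, flag))
    return errors


def c_identifier(name):
    """Map an option name to an enumerator name, e.g. 'std=c++11' -> 'OPT_std_c__11'."""
    return "OPT_" + re.sub(r"[^A-Za-z0-9_]", "_", name)


def emit_enum(records):
    """Generate a C enum with one enumerator per option, sorted by name."""
    lines = ["enum opt_code", "{"]
    for name, _props in sorted(records, key=lambda record: record[0]):
        lines.append("  %s," % c_identifier(name))
    lines.append("  N_OPTS")
    lines.append("};")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    records = [
        ("Wall", {"C": None, "Warning": None}),
        ("fexample-flag", {"Common": None, "Var": "flag_example"}),
        ("Wtypo", {"Warnign": None}),  # misspelled flag, reported by validate()
    ]
    problems = validate(records)
    if problems:
        print("\n".join(problems))
    else:
        print(emit_enum(records))
```

Validation runs before any output is produced, so a misspelled flag is reported as an error. In this sketch it never turns into silently wrong generated code.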
As noted in his first message, Liška thinks the decision will ultimately need to be made by the GCC steering committee (which Myers is a member of). Given the reaction to Liška's RFC of sorts, it would seem likely that a decision would be favorable. He said that he is targeting GCC 10 for these changes, which is still two years or so out at this point. A more full-featured scripting language (and someone actively working on making the option handling better) will serve the project well down the road.
Posted Jul 25, 2018 18:26 UTC (Wed)
by nirbheek (subscriber, #54111)
"Oh cool, you want to fix this? I support you fully, but only if you do it the way I would've done it."
Well, guess what: you're not doing it, so the choice is not "Python 3 or Python 2 or Perl or Guile", it's "Python 3 or status-quo of dumpster fire".
For god's sake it's not Java, why is there so much bikeshedding over it.
Posted Jul 25, 2018 18:55 UTC (Wed)
by smoogen (subscriber, #97)
Posted Jul 26, 2018 4:45 UTC (Thu)
by marcH (subscriber, #57642)
Software and especially software maintenance is unlike any other kind. It is very different.
Posted Jul 30, 2018 7:08 UTC (Mon)
by Rudd-O (guest, #61155)
Posted Aug 2, 2018 22:38 UTC (Thu)
by Wol (subscriber, #4433)
If you let the engineers do things properly, it's a simple job. If you let people with no clue (like the beancounters and marketeers) interfere, then things get messy.
Sadly, it's usually the beancounters and marketeers that hold the purse-strings, so it gets done their way or - more usually - not at all (meanwhile promising customers "it's coming - it really is - it'll be here real soon now").
Cheers,
Wol
Posted Jul 25, 2018 19:58 UTC (Wed)
by madscientist (subscriber, #16861)
Saying "guess what, your previously working environment is now working Even Better(tm), but oh by the way now you need to install a whole bunch of extra stuff onto your system before you can do the same things you were doing yesterday" is not, to everyone, actually an improvement.
I build GCC regularly. Many of the systems I do this on do not have Python 3 installed. I am NOT excited about having to go find/download/build Python 3 on all my build systems.
The GCC build system already relies (at least tangentially) on Perl. So why introduce Yet Another Interpreter Dependency? If you don't like awk (and who does?) then Perl is more than powerful and expressive enough for this job: rewrite the awk scripts in Perl and avoid adding new prerequisites for the build system.
Change has consequences, and just because someone did some work doesn't mean the results have no cost and should be accepted without thought. It's not bikeshedding if the change has a major repercussion like increasing the dependency footprint of the software.
Posted Jul 25, 2018 20:03 UTC (Wed)
by rahvin (guest, #16953)
Posted Jul 25, 2018 20:07 UTC (Wed)
by epa (subscriber, #39769)
Posted Jul 25, 2018 20:49 UTC (Wed)
by smurf (subscriber, #17840)
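They are also complex enough that the translated version is unlikely to work at all, given that a2p is essentially unmaintained AFAIK.
WRT old machines without Python >= 2.7: there's always cross compilation.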
Posted Jul 25, 2018 20:37 UTC (Wed)
by madscientist (subscriber, #16861)
However, to answer your question: if the choice ends up being either that someone comes up with a Perl implementation, in which case we'd prefer to use that, or, if no one does that, that we're going to start requiring Python 3 as a GCC build dependency, then sure, I'd be willing to put in that effort. It does not appear that these are the options on offer, however.
Posted Jul 25, 2018 21:22 UTC (Wed)
by rahvin (guest, #16953)
Commenting on it here is all well and good, but the discussion is moving towards python, at least partly, because no one has come forward and said they'd move the scripts to perl. If you don't make the offer it's not there for consideration.
Posted Jul 26, 2018 4:42 UTC (Thu)
by marcH (subscriber, #57642)
Replacing certainly unreadable code by likely unreadable code doesn't sound like a major upgrade IMHO, not worth the disruption.
> and avoid adding new prerequisites for the build system.
Granted.
Posted Jul 26, 2018 20:14 UTC (Thu)
by MrWim (subscriber, #47432)
One trick they could apply is to run these scripts during `make dist` and include the output in the source tarball. This would mean that there are no additional dependencies required for people who just want to build GCC, only for those who want to develop GCC. This would make bring-up on new targets just as easy as now.
Of course this assumes that the process is deterministic and that the generated C files are themselves portable - but this should not be too much of an inconvenience. You can even go further and include the generated C files in the git repo. It sounds icky I know, but the downsides of the duplication can be mitigated as long as:
I've used this approach successfully in some of my projects where I want a build dependency, but I don't want to burden other developers (including casual, drive-by ones) with the requirement to install that dependency.
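The pre-generation idea can be sketched in Python. Everything here is hypothetical. generate() stands in for a real option-processing script. The two helpers show one way, not necessarily what the commenter had in mind, to keep a shipped copy trustworthy: rewrite the file only when its content actually changes, so make's timestamps are not disturbed, and let a CI job fail when the shipped copy is stale.

```python
# Sketch: ship generated output, regenerate it only when it changes, and let
# CI verify that the shipped copy matches a fresh regeneration.


def generate(option_names):
    """Hypothetical stand-in generator. Sorting and de-duplicating the input
    makes the output deterministic: the same names always give the same text."""
    body = "".join('  "%s",\n' % name for name in sorted(set(option_names)))
    return "static const char *const option_names[] = {\n" + body + "};\n"


def read_or_none(path):
    """Return the file's text, or None if it does not exist yet."""
    try:
        with open(path, encoding="utf-8") as f:
            return f.read()
    except FileNotFoundError:
        return None


def write_if_changed(path, content):
    """Write only when the content differs, so unchanged files keep their
    timestamps and make does not rebuild everything that depends on them."""
    if read_or_none(path) == content:
        return False
    with open(path, "w", encoding="utf-8") as f:
        f.write(content)
    return True


def is_up_to_date(path, content):
    """For CI: True only if the shipped file matches a fresh regeneration."""
    return read_or_none(path) == content
```

In this scheme, a `make dist` rule would call write_if_changed() so the tarball carries the generated file, and a CI job would call is_up_to_date() and fail when it returns False. Someone building from the tarball would never run the generator, so they would not need its interpreter.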
Posted Jul 26, 2018 20:54 UTC (Thu)
by jwilk (subscriber, #63328)
(BTW, it's Subversion, not git.)
Posted Jul 27, 2018 21:19 UTC (Fri)
by paulj (subscriber, #341)
Posted Jul 25, 2018 21:35 UTC (Wed)
by david.a.wheeler (subscriber, #72896)
However, this is proposing a major new dependency in a key component. That's a big decision, and worth discussing first.
Posted Jul 25, 2018 23:12 UTC (Wed)
by jhoblitt (subscriber, #77733)
I am completely ignorant as to the gcc build process but is it possible for `optionlist` to be pre-computed and perhaps even included in the release tarballs? Is the .opt format so complicated that a small c/c++ parser couldn't be included as part of the build?
Posted Jul 25, 2018 21:39 UTC (Wed)
by xorbe (guest, #3165)
Posted Jul 25, 2018 23:13 UTC (Wed)
by jhoblitt (subscriber, #77733)
Posted Jul 26, 2018 4:26 UTC (Thu)
by marcH (subscriber, #57642)
Plus "light-weight" is intuitively not one of the top requirements when gcc is concerned... is it?
Now compared to awk (as opposed to Python) I suspect lua would win on practically every possible metric :-)
Posted Jul 26, 2018 6:52 UTC (Thu)
by eru (subscriber, #2753)
On the other hand, Lua is very readable and easy to learn. Rather like Python in that respect.
> Plus "light-weight" is intuitively not one of the top requirements when gcc is concerned... is it?
Its implementation is so small that you could include it with GCC sources without growing them with any significant percentage!
Posted Jul 26, 2018 4:38 UTC (Thu)
by marcH (subscriber, #57642)
No one can tell for how long the volunteer/hero-of-the-day will stick around after his initial feat for less appealing maintenance, not even himself. No one knows if/when he will have kids, a super demanding job in a brand new startup, sickness, etc. People capable of large and intense efforts tend to also be the ones getting bored the most easily. Maybe the people "volunteering an opinion" are the ones who have been taking out the garbage for decades, did you check?
> to cleanup a horribly messed up and unmaintainable piece of crap part of the code [...] status-quo of dumpster fire".
Did anyone say it was that bad? AFAIK gcc works.
> For god's sake it's not Java, why is there so much bikeshedding over it.
Good to know we can count on your bikeshedding whenever Java is considered for some random thing :-)
Posted Jul 26, 2018 5:02 UTC (Thu)
by marcH (subscriber, #57642)
It blew me away because it methodically removes most of Java's pain points one by one - while of course keeping all the advantages.
I digress sorry; *not* suggesting Kotlin for this case.
Posted Jul 26, 2018 9:48 UTC (Thu)
by jwilk (subscriber, #63328)
But I imagine that AWK programs would be too verbose when rewritten in Python. Perl would probably be a better choice.
Posted Jul 26, 2018 19:36 UTC (Thu)
by marcH (subscriber, #57642)
Posted Jul 26, 2018 19:43 UTC (Thu)
by zdzichu (subscriber, #17118)
Posted Jul 26, 2018 20:10 UTC (Thu)
by jwilk (subscriber, #63328)
Posted Jul 28, 2018 21:05 UTC (Sat)
by dvdeug (guest, #10998)
Posted Jul 26, 2018 14:44 UTC (Thu)
by Tara_Li (guest, #26706)
AWK is one of the modules in busybox - while Python I'm fairly sure isn't. I doubt there are many situations where full GCC building is done in a busybox environment, but it is a possible use case, I think.
Posted Jul 26, 2018 17:28 UTC (Thu)
by mm7323 (subscriber, #87386)
Surely fixing the grammar and syntax of the .opt files should be the first step, and then a trivial parser/processor could be made in any language, ideally C to ease bootstrapping.
Additionally, I've only ever found cross-compiling various Python interpreters difficult, though that was some years ago and things may have improved.
Posted Jul 26, 2018 22:39 UTC (Thu)
by RooTer (guest, #91640)
Why would anyone want to use a version of Python that is known to expire 1.5 years from now, just to support some enterprise distro users who want to compile gcc AND don't want to compile Python 3 for some unfathomable reason?
Posted Jul 26, 2018 23:27 UTC (Thu)
by Cyberax (✭ supporter ✭, #52523)
And really, I looked at the code and there's no reason NOT to make it compatible with both Python versions.
Posted Jul 28, 2018 11:31 UTC (Sat)
by azumanga (subscriber, #90158)
Posted Aug 3, 2018 19:07 UTC (Fri)
by flussence (guest, #85566)
Posted Jul 29, 2018 18:23 UTC (Sun)
by ofranja (guest, #11084)
IMHO, the awk version is easier to read. Sure, it has more duplication, but one can easily abstract awk code (hint: it supports functions and there are ways to encapsulate data in arrays). Bad/redundant code can be written in any language. The python version has tons of details that require a deeper knowledge of python as well and is not necessarily easier to work with.
Also, not everyone can (or wants to) afford an extra ~130MB of storage just to process some text.
In my opinion, putting some effort to improve the awk version would probably be a better solution in the long term.