
Replacing AWK with Python in GCC?

By Jake Edge
July 25, 2018

GCC has a lot of command-line options—so many, in fact, that its build process does a fair amount of processing using AWK to generate the option-parsing code for the compiler. But some find the AWK code to be difficult to work with. A recent post to the GCC mailing list proposes replacing AWK with Python in the hopes of more maintainable option-parsing generation in the future.

Martin Liška raised the idea on July 17 to gauge the reaction of the GCC development community to a switch away from AWK; there are a number of cleanups that he would like to make, but he doesn't want to make them in AWK. One problem that he noted is that the .opt file format used for specifying the options is not well-specified, so part of what he would like to do is to clean that up. That should make those files easier to parse, and easier to produce for the targets that generate them (e.g. ARM). There are other problems with the AWK code, he said, including too few sanity checks on the options and the code generally being "quite unpleasant to make any adjustments".

His post was accompanied by a Python script that can build code from an optionlist file. That file is created as part of the process of building GCC by combining all of the options in various .opt files (using an AWK script that would presumably be replaced as well). Right now, the optionlist file is processed by other AWK scripts to generate .c and .h files that will do the option parsing. The .opt files specify the names, types, and arguments for the GCC command-line options.
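As a rough illustration of the kind of processing involved, here is a minimal Python sketch. It assumes a simplified record layout: records separated by blank lines, the option name on the first line, space-separated properties on the second, and help text after that. The real .opt format has more record types and details than this, and the names OptionRecord and parse_records are hypothetical; they are not taken from Liška's script.

```python
class OptionRecord(object):
    """One option record: name, list of properties, help text."""

    def __init__(self, name, properties, help_text):
        self.name = name
        self.properties = properties
        self.help_text = help_text


def parse_records(text):
    """Split text into blank-line-separated records (simplified layout)."""
    records = []
    block = []
    # Append a sentinel blank line so the final record is flushed.
    for line in text.splitlines() + [""]:
        line = line.strip()
        if line.startswith(";"):
            # Assumption: lines starting with ';' are comments.
            continue
        if line:
            block.append(line)
        elif block:
            name = block[0]
            # Simplification: properties never contain embedded spaces.
            properties = block[1].split() if len(block) > 1 else []
            help_text = " ".join(block[2:])
            records.append(OptionRecord(name, properties, help_text))
            block = []
    return records


# Hypothetical sample input, not copied from any real .opt file.
SAMPLE = """\
; hypothetical sample input
fexample-option
Common Var(flag_example)
Enable the example behavior.

Wexample-warning
Common Warning
Warn about the example.
"""

records = parse_records(SAMPLE)
```

Keeping the parsing step separate from code emission is what would allow each piece to be unit-tested on its own, which is one of the benefits Liška cites for the switch.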

While there was not a huge amount of opposition to Python per se, there were a number of questions about requiring it to build GCC. There were questions about which version of Python to require (Liška's script uses Python 3) as well as how difficult it will be to bring up Python on targets that do not have it available. For example, Karsten Merker was concerned that a Python dependency would make it harder to bring up GCC on new target architectures, but Matthias Klose said that building Python should not be any more of a burden than building AWK. In addition, Python can be cross-built easily, unlike two other languages that were mentioned along the way: "you can cross build python as well more easily than for example perl or guile".

A bigger problem would seem to be for Windows, where building Python using GCC is not supported, Vadim Konovalov said. That's all a bit circular—using GCC to build Python to build GCC—but using GCC to build itself is a time-honored tradition. There are various binary versions of Python for Windows available, however.

The Python version question is also something that would need to be resolved. There are lots of systems out there that only have Python 2 available, particularly those running enterprise Linux distributions. But creating Python programs that can run on either Python 2 or 3 is certainly possible—and perhaps desirable. Eric S. Raymond said: "It's not very difficult to write 'polyglot' Python that is indifferent to which version it runs under." He pointed to a FAQ/HOWTO that he co-authored, which documented the techniques he has used for projects like reposurgeon.
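For readers unfamiliar with the idea, the following is a small sketch of what version-indifferent Python can look like. These are generic, widely used idioms for code that runs on both Python 2.7 and 3.x; they are not necessarily the techniques described in the HOWTO, and the function names are made up for illustration.

```python
from __future__ import print_function, unicode_literals

import io
import sys


def read_text(path):
    """Read a UTF-8 text file; io.open behaves the same on 2.7 and 3.x."""
    with io.open(path, encoding="utf-8") as handle:
        return handle.read()


def describe(counts):
    """Format a dict of counts; sorted() makes the ordering deterministic."""
    # dict.items() exists on both versions; iteritems() is Python 2 only.
    return ", ".join("{0}={1}".format(k, v) for k, v in sorted(counts.items()))


if __name__ == "__main__":
    # print() is a function on both versions thanks to the __future__ import.
    print("running under Python {0}".format(sys.version_info[0]))
    print(describe({"warnings": 2, "flags": 5}))
```

The `__future__` imports require Python 2.6 or later, but other constructs that polyglot code commonly relies on arrived only in 2.7, which is part of why older releases are harder to cover.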

Those methods do not provide compatibility with Python 2.6, though, which is the version available in some of the older (but still supported) enterprise distributions. Raymond pointed out that Python 2.6 stopped being supported by the Python core developers in 2013; beyond that, the lack of 2.6 support has not been a problem for his projects:

The HOWTO introduction does say that its techniques won't guarantee 2.6 compatibility. That would have been a great deal more difficult - some 3.x syntax backported into 2.7.2 makes a large difference here.

In practice, no deployment of reposurgeon or src or doclifter or any of the other polyglot Python code I maintain has tripped over this, or at least I'm not seeing issue reports about it.

Perl is already required for building certain targets, so some wondered if it made more sense to use it to replace AWK instead of Python. There was also mention of using the GNU Project's preferred scripting language, which is Guile. But there are a number of advantages to choosing Python, not least that Liška is offering to do the work for that switch. He also elaborated on why he wants to move away from AWK:

Yes, using Python is mainly because of object-oriented programming paradigm. It's handy to have encapsulation of functionality in methods, one can do unit-testing of parts of the script. Currently AWK scripts are mix of input/output transformation and various emission of printf('#error..') sanity checks. In general the script is not easily readable and contains multiple global arrays that simulate encapsulation in classes.

Others agreed with the idea of switching away from AWK and thought that Python was a good choice. Several in the thread said that they found AWK hard to read. Paul Koning noted that he supports switching to Python:

In roughly 40 years, and roughly 40 programming languages, I've only twice encountered a language where I could go from knowing nothing at all to writing a substantial real world program in just one week: Pascal (in college) and Python (about 15 years ago). This is why Python became my language of choice whenever I don't need the speed or small memory footprint of C/C++.

Joseph Myers also supported the switch and suggested some features that are not part of the current AWK scripts:

More generally, I don't think there are any checks that flags specified for options are known flags at all; I expect a typo in a flag to result in it being silently ignored.

Common code that reads .opt files into some logical datastructure, complete with validation including that all flags specified are in the list of valid flags, followed by converting those structures to whatever output is required, seems appropriate to me.
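The validation step Myers describes might look something like the sketch below. It is only an illustration: the KNOWN_FLAGS set is a hypothetical subset chosen for the example (a real list would have to come from GCC itself), and the function names are invented here.

```python
# Hypothetical subset of valid flags; the real list would come from GCC.
KNOWN_FLAGS = frozenset(["Common", "Warning", "Driver", "Undocumented"])


def flag_name(flag):
    """Strip any parenthesized argument: 'Var(x)' -> 'Var'."""
    return flag.split("(", 1)[0]


def unknown_flags(option_name, flags, known=KNOWN_FLAGS):
    """Return error messages for flags that are not in the known set."""
    return [
        "{0}: unknown flag '{1}'".format(option_name, flag)
        for flag in flags
        if flag_name(flag) not in known
    ]


# A typo such as 'Warnnig' is reported instead of being silently ignored.
errors = unknown_flags("Wexample-warning", ["Common", "Warnnig"])
```

Returning a list of messages, instead of stopping at the first problem, would let a build report every bad flag in one pass.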

As noted in his first message, Liška thinks the decision will ultimately need to be made by the GCC steering committee (which Myers is a member of). Given the reaction to Liška's RFC of sorts, it would seem likely that a decision would be favorable. He said that he is targeting GCC 10 for these changes, which is still two years or so out at this point. A more full-featured scripting language (and someone actively working on making the option handling better) will serve the project well down the road.



Replacing AWK with Python in GCC?

Posted Jul 25, 2018 18:26 UTC (Wed) by nirbheek (subscriber, #54111) [Link] (22 responses)

I've never understood this mentality in open source, that when someone comes along to clean up a horribly messed up and unmaintainable piece of crap part of the code in a particular way, suddenly the *entire community* has an opinion on how it has to be done and *none* of those people are actually volunteering to help, just volunteering an opinion.

"Oh cool, you want to fix this? I support you fully, but only if you do it the way I would've done it."

Well, guess what: you're not doing it, so the choice is not "Python 3 or Python 2 or Perl or Guile", it's "Python 3 or status-quo of dumpster fire".

For god's sake it's not Java, why is there so much bikeshedding over it.

Replacing AWK with Python in GCC?

Posted Jul 25, 2018 18:55 UTC (Wed) by smoogen (subscriber, #97) [Link] (3 responses)

I have the same reaction too, but have come to realize it is a basic human thing. It is no different with auto mechanics, building houses, or the various sciences. Everyone has grumbled about getting deionized water into the labs, but done nothing about it. Then when someone comes up with a pipe system, every chemist is now a plumbing expert and complains that their preferred method wasn't used. They all think they are helping make things better but it is as much about humans being hard wired to keep the status quo for safety as it is "well maybe my idea I have been sitting on for 20 years could get done since you are actually going to do something about it."

Replacing AWK with Python in GCC?

Posted Jul 26, 2018 4:45 UTC (Thu) by marcH (subscriber, #57642) [Link] (2 responses)

> It is no different with auto mechanics, building houses, or the various sciences.

Software and especially software maintenance is unlike any other kind. It is very different.

Replacing AWK with Python in GCC?

Posted Jul 30, 2018 7:08 UTC (Mon) by Rudd-O (guest, #61155) [Link] (1 responses)

It isn't at all different in the sense that people suddenly become "experts" when their cheese is moved.

Replacing AWK with Python in GCC?

Posted Aug 2, 2018 22:38 UTC (Thu) by Wol (subscriber, #4433) [Link]

Actually, it isn't different at all.

If you let the engineers do things properly, it's a simple job. If you let people with no clue (like the beancounters and marketeers) interfere, then things get messy.

Sadly, it's usually the beancounters and marketeers that hold the purse-strings, so it gets done their way or - more usually - not at all (meanwhile promising customers "it's coming - it really is - it'll be here real soon now").

Cheers,
Wol

Replacing AWK with Python in GCC?

Posted Jul 25, 2018 19:58 UTC (Wed) by madscientist (subscriber, #16861) [Link] (9 responses)

There's a reason something is the status quo: it works. If you're proposing to replace something that works, it needs to be replaced in a way that doesn't cause other people pain.

Saying "guess what, your previously working environment is now working Even Better(tm), but oh by the way now you need to install a whole bunch of extra stuff onto your system before you can do the same things you were doing yesterday" is not, to everyone, actually an improvement.

I build GCC regularly. Many of the systems I do this on do not have Python 3 installed. I am NOT excited about having to go find/download/build Python 3 on all my build systems.

The GCC build system already relies (at least tangentially) on Perl. So why introduce Yet Another Interpreter Dependency? If you don't like awk (and who does?) then Perl is more than powerful and expressive enough for this job: rewrite the awk scripts in Perl and avoid adding new prerequisites for the build system.

Change has consequence and just because someone did some work doesn't mean the results have no cost and should be accepted without thought. It's not bikeshedding if the change has a major repercussion like increasing the dependency footprint of the software.

Replacing AWK with Python in GCC?

Posted Jul 25, 2018 20:03 UTC (Wed) by rahvin (guest, #16953) [Link] (4 responses)

Are you volunteering to rewrite the AWK scripts in perl?

Replacing AWK with Python in GCC?

Posted Jul 25, 2018 20:07 UTC (Wed) by epa (subscriber, #39769) [Link] (1 responses)

There is an awk to Perl translator. It used to ship as part of the Perl core and is still available.

Replacing AWK with Python in GCC?

Posted Jul 25, 2018 20:49 UTC (Wed) by smurf (subscriber, #17840) [Link]

The a2p translator is likely to create an even greater mess of unreadable code from the current awk scripts.
They are also complex enough that the translated version is unlikely to work at all, given that a2p is essentially unmaintained AFAIK.

WRT old machines without Python >= 2.7: there's always cross compilation.

Replacing AWK with Python in GCC?

Posted Jul 25, 2018 20:37 UTC (Wed) by madscientist (subscriber, #16861) [Link] (1 responses)

What I'm saying is that the complaint in the post I replied to is inappropriate for (at least) this situation. It's perfectly reasonable, and not whining or bikeshedding, to consider and discuss carefully and fully a decision like adding a huge new build dependency.

However, to answer your question, if the choice ends up being either someone comes up with a Perl implementation in which case we'd prefer to use that, or if no one does that then we're going to start requiring Python 3 as a GCC build dependency, then sure, I'd be willing to put in that effort. It does not appear that these are the options on offer, however.

Replacing AWK with Python in GCC?

Posted Jul 25, 2018 21:22 UTC (Wed) by rahvin (guest, #16953) [Link]

Maybe you should make your offer to convert these scripts to perl officially on the list so they can be considered. Mind you, they'd probably also want a commitment from you to maintain them as well.

Commenting on it here is all well and good, but the discussion is moving towards python, at least partly, because no one has come forward and said they'd move the scripts to perl. If you don't make the offer it's not there for consideration.

Replacing AWK with Python in GCC?

Posted Jul 26, 2018 4:42 UTC (Thu) by marcH (subscriber, #57642) [Link]

> rewrite the awk scripts in Perl

Replacing certainly unreadable code by likely unreadable code doesn't sound like a major upgrade IMHO, not worth the disruption.

> and avoid adding new prerequisites for the build system.

Granted.

Replacing AWK with Python in GCC?

Posted Jul 26, 2018 20:14 UTC (Thu) by MrWim (subscriber, #47432) [Link] (2 responses)

> Saying "guess what, your previously working environment is now working Even Better(tm), but oh by the way now you need to install a whole bunch of extra stuff onto your system before you can do the same things you were doing yesterday" is not, to everyone, actually an improvement.

One trick they could apply is to run these scripts during make dist and include the output in the source tarball. This would mean that there are no additional dependencies required for people who just want to build GCC, only for those who want to develop GCC. This would make bring-up on new targets just as easy as now.

Of course this assumes that the process is deterministic and that the generated C files are themselves portable - but this should not be too much of an inconvenience.

You can even go further and include the generated C files in the git repo. It sounds icky I know, but the downsides of the duplication can be mitigated as long as:

  1. The generation is deterministic - it depends only on files in the same git repo and isn't dependent on the architecture of the machine it's running on or the environment in which it is run.
  2. You have a CI system to reject git commits with an inconsistency between the .opt and .c file
  3. Concurrent modifications to the .opt files are fairly rare - you don't want to be dealing with merge conflicts in generated code - even if they can be resolved just by rerunning the tool on the merge commit.

I've used this approach successfully in some of my projects where I want a build dependency, but I don't want to burden other developers (including casual, drive-by ones) with the requirement to install that dependency.

Replacing AWK with Python in GCC?

Posted Jul 26, 2018 20:54 UTC (Thu) by jwilk (subscriber, #63328) [Link] (1 responses)

They already have configure (generated from configure.ac) committed to the repo, so the eww factor shouldn't be a problem.

(BTW, it's Subversion, not git.)

Replacing AWK with Python in GCC?

Posted Jul 27, 2018 21:19 UTC (Fri) by paulj (subscriber, #341) [Link]

The general approach is to have those built "source" files in the 'dist' tarball (generated by 'make dist'), so that that is buildable without dependencies on the tools needed for those built "source" files.

Replacing AWK with Python in GCC?

Posted Jul 25, 2018 21:35 UTC (Wed) by david.a.wheeler (subscriber, #72896) [Link] (1 responses)

Just fixing something probably wouldn't have raised any issues.

However, this is proposing a major new dependency in a key component. That's a big decision, and worth discussing first.

Replacing AWK with Python in GCC?

Posted Jul 25, 2018 23:12 UTC (Wed) by jhoblitt (subscriber, #77733) [Link]

As gcc is so fundamental during bootstrapping, I also agree that introducing a dep on python 3 is likely to raise problems on many platforms.

I am completely ignorant as to the gcc build process but is it possible for `optionlist` to be pre-computed and perhaps even included in the release tarballs? Is the .opt format so complicated that a small c/c++ parser couldn't be included as part of the build?

Replacing AWK with Python in GCC?

Posted Jul 25, 2018 21:39 UTC (Wed) by xorbe (guest, #3165) [Link] (3 responses)

I don't think anyone is denying that awk is unfavorable or difficult. It's the complication of adding dependencies to the gcc build flow, it's a valid concern. If only there were light weight scripting languages that didn't inevitably implement every feature request. Sort of like a usable awk.

Replacing AWK with Python in GCC?

Posted Jul 25, 2018 23:13 UTC (Wed) by jhoblitt (subscriber, #77733) [Link] (2 responses)

Isn't that essentially what lua is?

Lua

Posted Jul 26, 2018 4:26 UTC (Thu) by marcH (subscriber, #57642) [Link] (1 responses)

Yes - the drawbacks being as expected: much smaller community/number of people understanding it and generally smaller "ecosystem" of tools like debuggers, editors,... You can't have your cake and eat it.

Plus "light-weight" is intuitively not one of the top requirements when gcc is concerned... is it?

Now compared to awk (as opposed to Python) I suspect lua would win on practically every possible metric :-)

Lua

Posted Jul 26, 2018 6:52 UTC (Thu) by eru (subscriber, #2753) [Link]

> much smaller community/number of people understanding it and generally smaller "ecosystem" of tools like debuggers, editors,...

On the other hand, Lua is very readable and easy to learn. Rather like Python in that respect.

> Plus "light-weight" is intuitively not one of the top requirements when gcc is concerned... is it?

Its implementation is so small that you could include it with GCC sources without growing them with any significant percentage!

Replacing AWK with Python in GCC?

Posted Jul 26, 2018 4:38 UTC (Thu) by marcH (subscriber, #57642) [Link] (1 responses)

> and *none* of those people are actually volunteering to help, just volunteering an opinion.
> "Oh cool, you want to fix this? I support you fully, but only if you do it the way I would've done it."

No one can tell for how long the volunteer/hero-of-the-day will stick around after his initial feat for less appealing maintenance, not even himself. No one knows if/when he will have kids, a super demanding job in a brand new startup, sickness, etc. People capable of large and intense efforts tend to also be the ones getting bored the most easily. Maybe the people "volunteering an opinion" are the ones who have been taking out the garbage for decades, did you check?

> to cleanup a horribly messed up and unmaintainable piece of crap part of the code [...] status-quo of dumpster fire".

Did anyone say it was that bad? AFAIK gcc works.

> For god's sake it's not Java, why is there so much bikeshedding over it.

Good to know we can count on your bikeshedding whenever Java is considered for some random thing :-)

Kotlin

Posted Jul 26, 2018 5:02 UTC (Thu) by marcH (subscriber, #57642) [Link]

More seriously: if you ever had to suffer with Java I very highly recommend this Kotlin demo at Google I/O 2017 https://www.youtube.com/watch?v=X1RVYt2QKQE

It blew me away because it methodically removes most of Java pain points one by one - while of course keeping all the advantages.

I digress sorry; *not* suggesting Kotlin for this case.

Replacing AWK with Python in GCC?

Posted Jul 26, 2018 9:48 UTC (Thu) by jwilk (subscriber, #63328) [Link] (4 responses)

There's no benefit of having polyglot Python programs. Just stick to one version (so Python 2 probably). Also, don't bother reading esr's FAQ; it's full of bad ideas.

But I imagine that AWK programs would be too verbose when rewritten in Python. Perl would probably be a better choice.

Replacing AWK with Python in GCC?

Posted Jul 26, 2018 19:36 UTC (Thu) by marcH (subscriber, #57642) [Link] (2 responses)

You forgot the content in your comment, you stopped after the introduction. It happens when people click "send" by mistake before they're finished, however it's a bit surprising on LWN which forces a "preview" step?

Replacing AWK with Python in GCC?

Posted Jul 26, 2018 19:43 UTC (Thu) by zdzichu (subscriber, #17118) [Link]

That may happen when you start writing, hit the preview button, then finish your comment and click “Publish comment”. It will publish the original preview.
You are supposed to push “Preview comment” *again* after making changes in textarea.

Replacing AWK with Python in GCC?

Posted Jul 26, 2018 20:10 UTC (Thu) by jwilk (subscriber, #63328) [Link]

I didn't forget anything. I'm sorry that you found my comment insufficiently contentful.

Replacing AWK with Python in GCC?

Posted Jul 28, 2018 21:05 UTC (Sat) by dvdeug (guest, #10998) [Link]

An obsolete version of Python that's already scheduled to be removed from cutting edge distros? Sounds problematic.

Replacing AWK with Python in GCC?

Posted Jul 26, 2018 14:44 UTC (Thu) by Tara_Li (guest, #26706) [Link]

Not sure this is a real concern, but...

AWK is one of the modules in busybox - while Python I'm fairly sure isn't. I doubt there are many situations where full GCC building is done in a busybox environment, but it is a possible use case, I think.

Replacing AWK with Python in GCC?

Posted Jul 26, 2018 17:28 UTC (Thu) by mm7323 (subscriber, #87386) [Link]

Isn't this one of those silly problems where things are getting built on unstable foundations just because of what is there already? E.g. like initially using SATA to connect SSDs which are essentially filled with memory chips.

Surely fixing the grammar and syntax of the .opt files should be the first step, and then a trivial parser/processor could be made in any language, ideally C to ease bootstrapping.

Additionally, I've only ever found cross-compiling various Python interpreters difficult, though that was some years ago and things may have improved.

Replacing AWK with Python in GCC?

Posted Jul 26, 2018 22:39 UTC (Thu) by RooTer (guest, #91640) [Link] (3 responses)

Python 2 focus seems like a horrible idea. They want to fix a maintainability problem, not add to it, right?

Why would anyone want to use a version of Python that is known to expire 1.5 years from now, just to support some enterprise distro users that want to compile gcc AND don't want to compile Python 3 for some unfathomable reason?

Replacing AWK with Python in GCC?

Posted Jul 26, 2018 23:27 UTC (Thu) by Cyberax (✭ supporter ✭, #52523) [Link]

There are systems right now that don't have Py3 installed by default and they are going to be supported for years.

And really, I looked at the code and there's no reason to NOT make it compatible with both Python versions.

Replacing AWK with Python in GCC?

Posted Jul 28, 2018 11:31 UTC (Sat) by azumanga (subscriber, #90158) [Link] (1 responses)

Or... All Macs, which don't come with Python 3

Replacing AWK with Python in GCC?

Posted Aug 3, 2018 19:07 UTC (Fri) by flussence (guest, #85566) [Link]

Macs don't come with modern GCC either. Besides, Mac developers are used to massive amounts of bloat, a single 100MB runtime isn't much to ask.

Replacing AWK with Python in GCC?

Posted Jul 29, 2018 18:23 UTC (Sun) by ofranja (guest, #11084) [Link]

I did not have a strong opinion about this until I read the email containing the arguments and proposed solution, and looked at the original code.

IMHO, the awk version is easier to read. Sure, it has more duplication, but one can easily abstract awk code (hint: it supports functions and there are ways to encapsulate data in arrays). Bad/redundant code can be written in any language. The python version has tons of details that require a deeper knowledge of python as well and is not necessarily easier to work with.

Also, not everyone can (or wants to) afford an extra ~130MB of storage just to process some text.

In my opinion, putting some effort to improve the awk version would probably be a better solution in the long term.


Copyright © 2018, Eklektix, Inc.
This article may be redistributed under the terms of the Creative Commons CC BY-SA 4.0 license
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds