bison-3.1 released

Version 3.1 of the Bison parser generator has been released. "It introduces new features such as typed midrule actions, brings improvements in the diagnostics, fixes several bugs and portability issues, improves the examples, and more".

From:		Akim Demaille <akim-AT-lrde.epita.fr>
To:		GNU Announcements List <info-gnu-AT-gnu.org>
Subject:		bison-3.1 released [stable]
Date:		Tue, 28 Aug 2018 06:33:15 +0200
Message-ID:		<F19B66DE-52D0-4D7D-8770-D8DA8F6966DF__44397.7687428289$1535461018$gmane$org@lrde.epita.fr>
Cc:		coordinator-AT-translationproject.org, Bison Help <help-bison-AT-gnu.org>, Bison Bugs <bug-bison-AT-gnu.org>, Bison Patches <bison-patches-AT-gnu.org>

We are very happy to announce the release of GNU Bison 3.1.  It introduces
new features such as typed midrule actions, brings improvements in the
diagnostics, fixes several bugs and portability issues, improves the
examples, and more.

See the NEWS below for more details.

Enjoy!

==================================================================

Bison is a general-purpose parser generator that converts an annotated
context-free grammar into a deterministic LR or generalized LR (GLR) parser
employing LALR(1) parser tables.  Bison can also generate IELR(1) or
canonical LR(1) parser tables. Once you are proficient with Bison, you can
use it to develop a wide range of language parsers, from those used in
simple desk calculators to complex programming languages.

Bison is upward compatible with Yacc: all properly-written Yacc grammars
ought to work with Bison with no change. Anyone familiar with Yacc should be
able to use Bison with little trouble. You need to be fluent in C or C++
programming in order to use Bison. Java is also supported.

Here is the GNU Bison home page:
    https://gnu.org/software/bison/

==================================================================

Here are the compressed sources:
  https://ftp.gnu.org/gnu/bison/bison-3.1.tar.gz   (4.4MB)
  https://ftp.gnu.org/gnu/bison/bison-3.1.tar.xz   (2.0MB)

Here are the GPG detached signatures[*]:
  https://ftp.gnu.org/gnu/bison/bison-3.1.tar.gz.sig
  https://ftp.gnu.org/gnu/bison/bison-3.1.tar.xz.sig

Use a mirror for higher download bandwidth:
  https://www.gnu.org/order/ftp.html

[*] Use a .sig file to verify that the corresponding file (without the
.sig suffix) is intact.  First, be sure to download both the .sig file
and the corresponding tarball.  Then, run a command like this:

  gpg --verify bison-3.1.tar.gz.sig

If that command fails because you don't have the required public key,
then run this command to import it:

  gpg --keyserver keys.gnupg.net --recv-keys 0DDCAA3278D5264E

and rerun the 'gpg --verify' command.

This release was bootstrapped with the following tools:
  Autoconf 2.69
  Automake 1.16.1
  Flex 2.6.4
  Gettext 0.19.8.1
  Gnulib v0.1-2061-ga05181f4b

==================================================================

NEWS

* Noteworthy changes in release 3.1 (2018-08-27) [stable]

** Backward incompatible changes

  Compiling Bison now requires a C99 compiler---as announced during the
  release of Bison 3.0, five years ago.  Generated parsers do not require a
  C99 compiler.

  Support for DJGPP, which has been unmaintained and untested for years, is
  obsolete.  Unless there is activity to revive it, it will be removed in the
  next release of Bison.

** New features

*** Typed midrule actions

  Because their type is unknown to Bison, the values of midrule actions are
  not treated like the others: they don't have %printer and %destructor
  support.  It also prevents C++ (Bison) variants from handling them properly.

  Typed midrule actions address these issues.  Instead of:

    exp: { $<ival>$ = 1; } { $<ival>$ = 2; }   { $$ = $<ival>1 + $<ival>2; }

  write:

    exp: <ival>{ $$ = 1; } <ival>{ $$ = 2; }   { $$ = $1 + $2; }

*** Reports include the type of the symbols

  The sections about terminal and nonterminal symbols of the '*.output' file
  now specify their declared type.  For instance, for:

    %token <ival> NUM

  the report now shows '<ival>':

    Terminals, with rules where they appear

    NUM <ival> (258) 5

*** Diagnostics about useless rules

  In the following grammar, the 'exp' nonterminal is trivially useless.  So,
  of course, its rules are useless too.

    %%
    input: '0' | exp
    exp: exp '+' exp | exp '-' exp | '(' exp ')'

  Previously all the useless rules were reported, including those whose
  left-hand side is the 'exp' nonterminal:

    warning: 1 nonterminal useless in grammar [-Wother]
    warning: 4 rules useless in grammar [-Wother]
    2.14-16: warning: nonterminal useless in grammar: exp [-Wother]
     input: '0' | exp
                  ^^^
    2.14-16: warning: rule useless in grammar [-Wother]
     input: '0' | exp
                  ^^^
    3.6-16: warning: rule useless in grammar [-Wother]
     exp: exp '+' exp | exp '-' exp | '(' exp ')'
          ^^^^^^^^^^^
    3.20-30: warning: rule useless in grammar [-Wother]
     exp: exp '+' exp | exp '-' exp | '(' exp ')'
                        ^^^^^^^^^^^
    3.34-44: warning: rule useless in grammar [-Wother]
     exp: exp '+' exp | exp '-' exp | '(' exp ')'
                                      ^^^^^^^^^^^

  Now, rules whose left-hand side symbol is useless are no longer reported
  as useless.  The locations of the errors have also been adjusted to point
  to the first use of the nonterminal as a left-hand side of a rule:

    warning: 1 nonterminal useless in grammar [-Wother]
    warning: 4 rules useless in grammar [-Wother]
    3.1-3: warning: nonterminal useless in grammar: exp [-Wother]
     exp: exp '+' exp | exp '-' exp | '(' exp ')'
     ^^^
    2.14-16: warning: rule useless in grammar [-Wother]
     input: '0' | exp
                  ^^^

*** C++: Generated parsers can be compiled with -fno-exceptions (lalr1.cc)

  When compiled with exceptions disabled, the generated parsers no longer
  use try/catch clauses.

  Currently only GCC and Clang are supported.
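
  As a rough illustration of how C++ code can adapt to -fno-exceptions, the
  sketch below tests the '__EXCEPTIONS' macro, which GCC and Clang define
  only when exceptions are enabled.  The names 'MY_USE_EXCEPTIONS',
  'checked_div' and 'safe_divide' are hypothetical and chosen for this
  example; this is not the code that Bison emits.

```cpp
#include <stdexcept>

// Hypothetical macro (not Bison's): 1 when exceptions may be used.
// GCC and Clang define __EXCEPTIONS only when exceptions are enabled.
#if defined __GNUC__ && !defined __EXCEPTIONS
# define MY_USE_EXCEPTIONS 0
#else
# define MY_USE_EXCEPTIONS 1
#endif

// Hypothetical user action: throws on a zero divisor when exceptions
// are available, otherwise falls back to returning 0.
static int checked_div(int num, int den)
{
#if MY_USE_EXCEPTIONS
  if (den == 0)
    throw std::domain_error("division by zero");
#endif
  return den == 0 ? 0 : num / den;
}

// Wrap the call in try/catch only when exceptions are available.
// Returns false, leaving *out untouched, when the division fails.
bool safe_divide(int num, int den, int *out)
{
#if MY_USE_EXCEPTIONS
  try
    {
      *out = checked_div(num, den);
      return true;
    }
  catch (const std::exception &)
    {
      return false;
    }
#else
  if (den == 0)
    return false;
  *out = checked_div(num, den);
  return true;
#endif
}
```

  The same source compiles with or without -fno-exceptions; only the
  preprocessor decides whether the try/catch clauses are present.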

** Documentation

*** A demonstration of variants

  A new example was added (installed in .../share/doc/bison/examples),
  'variant.yy', which shows how to use (Bison) variants in C++.

  The other examples were made nicer to read.

*** Some features are no longer 'experimental'

  The following features, mature enough, are no longer flagged as
  experimental in the documentation: push parsers, default %printer and
  %destructor (typed: <*> and untyped: <>), %define api.value.type union and
  variant, Java parsers, XML output, LR family (lr, ielr, lalr), and
  semantic predicates (%?).

** Bug fixes

*** GLR: Predicates support broken by #line directives

  Predicates (%?) in GLR such as

    widget:
      %? {new_syntax} 'w' id new_args
    | %?{!new_syntax} 'w' id old_args

  were emitted with #line directives in the middle of C code.

*** Printer and destructor with broken #line directives

  The #line directives were not properly escaped when emitting the code for
  %printer/%destructor, which resulted in compiler errors if there were
  backslashes or double-quotes in the grammar file name.

*** Portability on ICC

  The Intel compiler claims compatibility with GCC, yet rejects its _Pragma.
  Generated parsers now work around this.

*** Various

  There were several small fixes in the test suite and in the build system,
  and many warnings in bison and in the generated parsers were eliminated.
  The documentation also received its share of minor improvements.

  Useless code was removed from C++ parsers, and some of the generated
  constructors are more 'natural'.



-- 
If you have a working or partly working program that you'd like
to offer to the GNU project as a GNU package,
see https://www.gnu.org/help/evaluation.html.

Posted Aug 29, 2018 19:23 UTC (Wed) by tshow (subscriber, #6411)

Now if we can get a version of flex that supports utf-8 fully, we'd be off to the races.

Posted Sep 4, 2018 15:28 UTC (Tue) by eru (subscriber, #2753)

I wonder if the almost unbounded character set of Unicode would blow up the implementation strategies of Flex and similar tools. For instance, it would no longer be feasible to simply index tables by character code; you would have to use hash tables instead.

Posted Sep 4, 2018 16:44 UTC (Tue) by excors (subscriber, #95769)

Couldn't you just index by UTF-8 code units, i.e. bytes? I don't really know how Flex works but it sounds like it constructs an NFA then converts to a DFA, and I would imagine you could take an NFA with Unicode symbols and simply replace each of them with a chain of UTF-8 single-byte symbols, then construct and execute the DFA as normal (with all the normal byte-based optimisations). Is there some reason that wouldn't work?

Posted Sep 5, 2018 8:06 UTC (Wed) by eru (subscriber, #2753)

If I understood correctly, your solution would keep everything in the "UTF-8 domain". But I think this would run into trouble with regular expressions like . or [x-y] (where x and y represent some Unicode characters that need more than one byte in UTF-8 representation). The regular expressions are meant to match characters, not bytes.
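
To make the character-versus-byte distinction concrete, here is a minimal sketch that counts code points in a UTF-8 string by skipping continuation bytes (those of the form 10xxxxxx). The function name `count_code_points` is made up for this example, and the code assumes well-formed UTF-8 input; it has nothing to do with how Flex is implemented.

```cpp
#include <cstddef>
#include <string>

// Count the code points in a UTF-8 string (assumed well-formed).
// Every code point starts with exactly one byte that is NOT a
// continuation byte, so counting those bytes counts characters.
std::size_t count_code_points(const std::string &utf8)
{
  std::size_t n = 0;
  for (unsigned char byte : utf8)
    if ((byte & 0xC0) != 0x80)  // not of the form 10xxxxxx
      ++n;
  return n;
}
```

For the two-byte string "\xC3\xA9" (U+00E9), `size()` is 2 while `count_code_points` gives 1, so a `.` that consumes a single byte would stop in the middle of the character.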

Posted Sep 5, 2018 9:55 UTC (Wed) by excors (subscriber, #95769)

I'm imagining it uses something similar to https://en.wikipedia.org/wiki/Thompson%27s_construction , where you convert the regexp to an NFA state machine with some simple rules. /[a-c]/ is implemented the same as /a|b|c/, where you construct NFAs for symbols a,b,c, each with a single state transition like "---> (q) ---a--> ((f))", then combine them into a bigger NFA with the union rule.

Similarly /[\U00010000-\U00010002]/ is a union of three Unicode symbols, but the NFA for symbol \U00010000 would be replaced with something like "---> (q) ---F0--> ( ) ---90--> ( ) ---80--> ( ) ---80--> (( f ))", so it accepts the UTF-8 byte representation directly. Then (in theory) continue with the union as usual.

In practice I guess you'd want to optimise it a bit for large character ranges, so you can share some states between characters with the same prefix bytes, to save memory. Maybe that's non-trivial, but I don't immediately see why it'd be problematic. /./ is equivalent to /[\x00-\x09\x0b-\U0010ffff]/ and doesn't need anything special.

(I'm totally not an expert though, I'm just remembering the theoretical stuff from university and I don't know how it really works in Flex.)
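
The byte chain described above can be checked with a small encoder. This is only a sketch of standard UTF-8 encoding, with the hypothetical name `utf8_encode`; it assumes the input is a valid code point (it does not reject surrogates or values above U+10FFFF) and says nothing about Flex internals.

```cpp
#include <cstdint>
#include <vector>

// Encode one code point as UTF-8 bytes.  Assumes cp is a valid
// Unicode scalar value; no validation is performed.
std::vector<std::uint8_t> utf8_encode(std::uint32_t cp)
{
  auto byte = [](std::uint32_t v) { return static_cast<std::uint8_t>(v); };
  std::vector<std::uint8_t> out;
  if (cp < 0x80)
    {
      out.push_back(byte(cp));                        // 0xxxxxxx
    }
  else if (cp < 0x800)
    {
      out.push_back(byte(0xC0 | (cp >> 6)));          // 110xxxxx
      out.push_back(byte(0x80 | (cp & 0x3F)));        // 10xxxxxx
    }
  else if (cp < 0x10000)
    {
      out.push_back(byte(0xE0 | (cp >> 12)));         // 1110xxxx
      out.push_back(byte(0x80 | ((cp >> 6) & 0x3F)));
      out.push_back(byte(0x80 | (cp & 0x3F)));
    }
  else
    {
      out.push_back(byte(0xF0 | (cp >> 18)));         // 11110xxx
      out.push_back(byte(0x80 | ((cp >> 12) & 0x3F)));
      out.push_back(byte(0x80 | ((cp >> 6) & 0x3F)));
      out.push_back(byte(0x80 | (cp & 0x3F)));
    }
  return out;
}
```

Each byte of the result would become one transition in the chain, so `utf8_encode(0x10000)` yields the four symbols F0, 90, 80, 80, and the neighbouring code points U+10001 and U+10002 differ only in the last byte, which is where the shared-prefix optimisation would apply.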

Posted Sep 7, 2018 6:21 UTC (Fri) by eru (subscriber, #2753)

I am sure a Flex-like program could be implemented along the lines you propose. However, it would probably be slower. The 'F' in Flex originally stood for "fast": the goal was to make it generate lexers that are as fast as good hand-written ones. I would say it succeeds in this admirably well: I have inherited the maintenance of one program with an impossibly large lexical specification (hundreds of tokens, ickier than usual rules about delimiters, it basically simulates an obsolete job command language), and the speed of reading input has never been a problem with it.