Visualization of Ruby's Grammar
As part of the momentum surrounding the Ruby implementer's summit, I have decided to take on a pet project to understand Ruby's grammar better, with the goal of contributing to an implementation-independent specification of the grammar. Matz mentioned during his keynote how parse.y was one of the uglier parts of Ruby, but just how ugly?" (Found on Linux Journal)
Posted Oct 28, 2006 17:16 UTC (Sat)
by tjc (guest, #137)
[Link] (10 responses)
http://flickr.com/photos/nicksieger/281055530/ It's interesting how clean the C and Phython graphs are compared to the other three.
Posted Oct 28, 2006 18:47 UTC (Sat)
by gmaxwell (guest, #30048)
[Link]
Posted Oct 28, 2006 20:29 UTC (Sat)
by hilmi (guest, #41368)
[Link]
Posted Oct 29, 2006 3:03 UTC (Sun)
by mikov (guest, #33179)
[Link] (6 responses)
Posted Oct 30, 2006 15:30 UTC (Mon)
by tjc (guest, #137)
[Link] (1 responses)
Posted Oct 30, 2006 18:38 UTC (Mon)
by mikov (guest, #33179)
[Link]
I can't really understand the meaning of your post. I will take it as a not very successful joke :-) I think I explained why the graphs are meaningless, and more importantly that Ruby's graph is incorrect (LR(1) vs LL(k), etc)
This story should not really have been posted here - unlike most other stories on LWN, it has no useful content and is misleading. I suspect that most people do not know enough about parsing to be able to judge that, so they might take those graphs as a "valuable insight". They aren't. Perhaps I sound harsher than I intend to - I am just trying to be helpful to the other LWN readers.
Posted Oct 31, 2006 16:21 UTC (Tue)
by tjc (guest, #137)
[Link] (1 responses)
Posted Oct 31, 2006 16:47 UTC (Tue)
by mikov (guest, #33179)
[Link]
I have a problem with the automatic conversion. I am sure pretty much
anything (well, within limits) can eventually be parsed with ANTLR using
backtracking, with varying amount of effort. In this case, since the
generated grammar is almost certainly not correct, and since the person
who did it even neglected to mention that, it cannot be taken seriously.
I have no idea whether the diagram looks similar to one for the correct
grammar. Do you ? Oh, it will probably have similar node names, but that
doesn't really tell us much, which is my next point.
I have no idea what the diagram really means. Where
is it defined ? Who uses it ? Subjective observations that one "looks
cleaner" than the other are meaningless. I could rearrange any diagram to
look terrible. Perhaps one could derive some value from it, of one already
had intimate knowledge about the grammar, but not the other way around.
I agree that his comment is interesting. Then I am pretty sure it is
because he probably knows Ruby, not because he actually used the diagram
to derive insight. His comments about the size of the parser make sense in
the same context. The observation about the lexical analyzer doing a lot
of work especially doesn't stand on its own - it is impossible to deduce
that
from the diagram.
The problem is that the entire thing it is based on a wrong premise.
Even if it can generate interesting comments and useful tidbits of
information, ultimately it is misleading. A casual reader could be lead to
believe that:
All these things are false.
Posted Nov 1, 2006 9:18 UTC (Wed)
by rafdz (guest, #41427)
[Link] (1 responses)
Posted Nov 1, 2006 16:44 UTC (Wed)
by mikov (guest, #33179)
[Link]
Posted Nov 2, 2006 17:07 UTC (Thu)
by kbob (guest, #1770)
[Link]
Don't miss the C and Python graphs posted further down in comment #8:Visualization of Ruby's Grammar
http://flickr.com/photos/nicksieger/281055485/
I wonder if such a graph for perl could be generated in finite time... :)Visualization of Ruby's Grammar
I wonder if the reason for Ruby's graph looking different is the Visualization of Ruby's Grammar
fundamental difference of Ruby from others that everything is an
expression and returns a value (like Lisp). In Ruby, you can embed any
language structure into another by using parenthesis.
These graphs are completely meaningless, as far as I can tell. They may be Visualization of Ruby's Grammar
pretty to look at, but that's all. So, the relative "cleanliness" of one
graphs vs the other tells as nothing. I suspect anybody with artistic
sense could make any graph to look better.
Further on, I should point that the original blog post appears mostly
meaningless too. They converted Ruby's grammar from BISON (LRAR(1) parser
generator) to ANTLR (LL(k) parser generator) using an automatic converter.
General automatic conversion between LR and LL is, of course, completely
impossible (since LR(1) languages are a superset of LL(k) languages). I
suspect that the converter they used simply translates some common idioms,
but _does not produce a valid LL grammar_.
So, this is all totally uninteresting, and actually fairly misleading.
Visualization of Ruby's Grammar
These graphs are completely meaningless, as far as I can tell.
Most non-programmers would probably say the same thing about source code. ;-)
Visualization of Ruby's Grammar
Visualization of Ruby's Grammar
They converted Ruby's grammar from BISON (LRAR(1) parser generator) to ANTLR (LL(k) parser generator) using an automatic converter.
Perhaps you mean LALR(1)? I do not know what LRAR(1) is.
General automatic conversion between LR and LL is, of course, completely impossible (since LR(1) languages are a superset of LL(k) languages). I suspect that the converter they used simply translates some common idioms, but _does not produce a valid LL grammar_.
Are you saying that Ruby can not be parsed with an ANTLR grammar, or that it's just the automatic conversion from LALR(1) that is problematic? Even if the generated grammar is incorrect, I suspect that a corrected version would look substantially the same, assuming that it can be generated at all.
So, this is all totally uninteresting, and actually fairly misleading.
On the contrary, I find this very interesting! See Bruce Person's comment about the primary node in Ruby, for example.
Visualization of Ruby's Grammar
Are you saying that Ruby can not be parsed with an
ANTLR grammar, or that it's just the automatic conversion from LALR(1)
that is problematic? Even if the generated grammar is incorrect, I suspect
that a corrected version would look substantially the same, assuming that
it can be generated at all.
On the contrary, I find this very interesting! See
Bruce Person's comment about the primary node in Ruby, for example.
LR(1) grammares are not supersets of LL(k) grammars.only LR(k) are supersets of LL(k) grammars.Visualization of Ruby's Grammar
While I am a little rusty on the theory, I seem to remember that any LR(k) Visualization of Ruby's Grammar
grammar can be converted to an equivalent LR(1) grammar, and since LR(k)
is superset of LL(k), it follows that LR(1) is superset of LL(k).
I'd like to see the graph for Lisp... (-:Visualization of Ruby's Grammar