LWN.net Logo

Full regular expressions?

Full regular expressions?

Posted Mar 12, 2010 19:13 UTC (Fri) by Pc5Y9sbv (guest, #41328)
In reply to: Full regular expressions? by vonbrand
Parent article: Google's RE2 regular expression library

Don't some of the back-referencing extensions actually take it from regular deep into LR(1) languages?

If I remember way back from school, in practice, many human-comprehensible non-determinstic REs can be emulated in closer to linear time. Running a set of DFAs, growing on each non-deterministic branch point, doesn't grow as wildly as it could in theory, because so many of the DFAs are culled relatively soon on dead-end input transitions. Is that no longer considered a valid heuristic strategy?


(Log in to post comments)

Full regular expressions?

Posted Mar 14, 2010 10:50 UTC (Sun) by ballombe (subscriber, #9523) [Link]

> Don't some of the back-referencing extensions actually take it from regular deep into LR(1) languages?

This is much much worse. Languages defined with general back-references are undecidable, and even the simplest kind of back reference can cause languages not to be algebraic, for example ([ab]*)\1 and (a*)(b*)\1\2 are not.

Now, all true regular expressions can be matched in linear time and bounded memory. However both the memory bound and the setup time (before starting the match) are in the worse case exponential in the size of the expression.

Copyright © 2013, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds