LWN.net Logo

Google's RE2 regular expression library

Google's RE2 regular expression library

Posted Mar 15, 2010 2:37 UTC (Mon) by tkil (subscriber, #1787)
In reply to: Google's RE2 regular expression library by tzafrir
Parent article: Google's RE2 regular expression library

What's so special about a regex, then?

If the VM has the code half-compiled, there are probably many other optimizations to do under the cover.

That's pretty much the reason: it's a tradeoff between upfront regex parsing / compilation / optimizing, and the ongoing cost of matching against that regular expression.

If you are only going to match against a regex once or twice, then there's no point in spending a huge amount of time optimizing that regex (and certainly not in doing it every time it's seen!). If, on the other hand, you are going to match many things against that regex, it makes sense to spend more time up front so that each match is a bit faster.

It's a continuum: on the one end, the regex engine can just parse the regex text as it goes, and do no compilation or optimization whatsoever (this would be analogous to a true interpreter). As the next step, the interpreter could process the regex into some internal representation, but not do any optimizations. A step further, that internal representation could be optimized in various ways. Finally, the optimized internal representation could be compiled down to machine code (with possible further optimizations applied). Each of those steps takes longer up front, though, so you have to amortize the up-front regex processing time across the number of matches.


(Log in to post comments)

Copyright © 2013, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds