Google's RE2 regular expression library
Google's RE2 regular expression library
Posted Mar 13, 2010 19:24 UTC (Sat) by tkil (guest, #1787)In reply to: Google's RE2 regular expression library by tzafrir
Parent article: Google's RE2 regular expression library
Something like perl's qr//
operator?
Not really; qr//
simply lets the programmer control when the
double-quotish regex string is turned into a regex object. It's more like
generating an opcode tree for the regex, as opposed to compiling it down to
bytecode or machine instructions.
This is still a win, since the regex object creation step is often the most expensive part of using a regex: it involves interpolation, compilation to opcodes, and various optimizations. Doing that only once vs. every time through a loop is a huge help.
Being able to compile down to machine instruction (with all the optimizations along the way) is just going further along the scale of "more effort up front" vs. "better speed on each match". When you're working with long-lived C++ apps and the regexes are static, it makes perfect sense to try to compile them down as far as possible; when working with short- lived dynamic scripts, it isn't always a win.
Posted Mar 14, 2010 10:02 UTC (Sun)
by tzafrir (subscriber, #11501)
[Link] (1 responses)
If the VM has the code half-compiled, there are probably many other optimizations to do under the cover.
Posted Mar 15, 2010 2:37 UTC (Mon)
by tkil (guest, #1787)
[Link]
What's so special about a regex, then? If the VM has the code half-compiled, there are probably many other
optimizations to do under the cover. That's pretty much the reason: it's a tradeoff between upfront regex
parsing / compilation / optimizing, and the ongoing cost of matching
against that regular expression. If you are only going to match against a regex once or twice, then
there's no point in spending a huge amount of time optimizing that regex
(and certainly not in doing it every time it's seen!). If, on the other
hand, you are going to match many things against that regex, it makes sense
to spend more time up front so that each match is a bit faster. It's a continuum: on the one end, the regex engine can just parse the
regex text as it goes, and do no compilation or optimization whatsoever
(this would be analogous to a true interpreter). As the next step, the
interpreter could process the regex into some internal representation, but
not do any optimizations. A step further, that internal representation
could be optimized in various ways. Finally, the optimized internal
representation could be compiled down to machine code (with possible
further optimizations applied). Each of those steps takes longer up front,
though, so you have to amortize the up-front regex processing time across
the number of matches.
Google's RE2 regular expression library
Google's RE2 regular expression library