Google's RE2 regular expression library


Posted Mar 15, 2010 6:34 UTC (Mon) by dlang (✭ supporter ✭, #313)
In reply to: Google's RE2 regular expression library by tkil
Parent article: Google's RE2 regular expression library

if you are going to be doing a lot of string manipulation based on character positions _and_ expect to be dealing with a lot of non-ASCII data, then it's worth converting the strings to a fixed-width encoding when you read them in.

on the other hand, many programs copy strings around a lot, but don't actually manipulate them much.

and for many things, the data being used really is almost entirely ASCII.

in these cases it is far better to leave things in UTF-8's variable-length encoding and just walk the string when needed.
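as a rough illustration of what "walk the string when needed" means, here is a minimal Python sketch that finds the byte offset of the Nth character by scanning the UTF-8 bytes. the function name `utf8_offset` is made up for this example, and it assumes the input is valid UTF-8.

```python
def utf8_offset(data: bytes, char_index: int) -> int:
    """Return the byte offset where code point number char_index starts.

    Hypothetical helper for illustration; assumes data is valid UTF-8.
    """
    seen = 0
    for offset, byte in enumerate(data):
        # Continuation bytes have the form 0b10xxxxxx; any other byte
        # begins a new code point.
        if byte & 0xC0 != 0x80:
            if seen == char_index:
                return offset
            seen += 1
    raise IndexError("character index out of range")


if __name__ == "__main__":
    data = "na\u00efve caf\u00e9".encode("utf-8")
    # The 'v' is character 3, but the two-byte encoding of U+00EF
    # before it pushes it to byte offset 4.
    print(utf8_offset(data, 3))
```

the cost is a linear scan per lookup, which only hurts if you do such lookups often; for mostly-ASCII data that is copied far more than it is indexed, that is usually an acceptable trade.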

in the case of regex matching, if you are going to start at the beginning of the string and walk through it looking for matches, then you may as well leave it in UTF-8. you aren't doing anything that would benefit from knowing ahead of time where a particular character position starts, and the fact that the string is almost always going to be smaller than a fixed-width encoding of the same text is a win.
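a small Python sketch of both points, under stated assumptions: `encoded_sizes` and `find_literal` are names invented for this example, UTF-32 stands in for "a fixed-width encoding", and Python's `re` module is used only as a convenient stand-in for a left-to-right matcher (it is not RE2 and works differently internally).

```python
import re


def encoded_sizes(text: str):
    """Return (UTF-8 size, UTF-32 size) in bytes for the same text."""
    # "utf-32-le" is used so that no byte-order mark is added.
    return len(text.encode("utf-8")), len(text.encode("utf-32-le"))


def find_literal(haystack: bytes, needle: str):
    """Search UTF-8 bytes from the start for a literal, without decoding.

    Returns the byte offset of the first match, or None.
    """
    pattern = re.compile(re.escape(needle.encode("utf-8")))
    match = pattern.search(haystack)
    return match.start() if match else None


if __name__ == "__main__":
    # Almost entirely ASCII data: UTF-8 is a quarter of the size.
    print(encoded_sizes("GET /index.html HTTP/1.1"))
    data = "na\u00efve caf\u00e9".encode("utf-8")
    print(find_literal(data, "caf\u00e9"))
```

the byte-level search is safe because UTF-8 is self-synchronizing: a validly encoded needle cannot match starting in the middle of another character, so the matcher never needs to know character positions up front.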



Copyright © 2013, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.