
Development quotes of the week - regular expression edition

Posted Feb 17, 2022 12:51 UTC (Thu) by flussence (guest, #85566)
Parent article: Development quotes of the week - regular expression edition

Oh, here's a fun regex story I had last week:

I had a bunch of syslog data in wildly varying shapes (httpd, postfix, various other servers) and wanted to pull out badly-behaving remote actors' IPs and stuff them into an ipset. Easy enough to do those one at a time, but wouldn't it be nice to do them all in a single pass? (Before you run away screaming: I had no intention of using a single regex for the whole thing.)

So I used ripgrep's handy --file flag, which allegedly reads one regex per line (and, more importantly, supports comments between them). Then I put the IP address part of each line in a capture group, intending to ask for "$1" as output. And... it broke horribly, because it turns out something internally merges them into a single regex for the whole thing, so the capture groups are numbered sequentially across all the patterns. What about named (?<foo>) groups? That's a no-go too, because libpcre loudly complains about the duplicate names and errors out.

I ended up doing it one at a time.
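The renumbering problem can be reproduced in a few lines. This is a minimal Python sketch, assuming the patterns are simply joined into one alternation (which is roughly what merging one-pattern-per-line input amounts to; it is not ripgrep's actual code). The two log patterns and the `extract_ip` helper are hypothetical illustrations. Because each alternative has exactly one group, and only the alternative that matched has a non-empty group, the IP can be recovered without relying on a fixed "$1":

```python
import re

# Hypothetical per-service patterns (real log formats vary). Each contains
# exactly one capture group: the remote IPv4 address.
IPV4 = r"(\d{1,3}(?:\.\d{1,3}){3})"
PATTERNS = [
    r"\[client " + IPV4,                    # httpd error-log style
    r"connect from \S+\[" + IPV4 + r"\]",   # postfix smtpd style
]

# Merge the patterns into one alternation, as a tool might do internally.
COMBINED = re.compile("|".join(f"(?:{p})" for p in PATTERNS))


def extract_ip(line):
    """Return the IP captured by whichever sub-pattern matched, or None."""
    m = COMBINED.search(line)
    if m is None:
        return None
    # Groups are numbered across the whole alternation: the httpd pattern
    # owns group 1 and the postfix pattern owns group 2, so a fixed "$1"
    # is empty for postfix lines. Take the one group that participated.
    return next(g for g in m.groups() if g is not None)
```

For a postfix line, `COMBINED.search(line).group(1)` is `None` while `group(2)` holds the address, which is exactly the "$1 is wrong" symptom described above.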



Development quotes of the week - regular expression edition

Posted Feb 21, 2022 8:54 UTC (Mon) by tlamp (subscriber, #108540)

> That's a no-go too because libpcre loudly complains about duplicates and errors out.

ripgrep doesn't use libpcre but rather the native Rust regex crate[0], which is inspired by Google's RE2 and has pros and cons similar to those NYKevin mentioned above w.r.t. RE2.

It might be worth opening an issue with ripgrep about this behavior, but IMO it's a bit odd to use grep/rg/ag/... for this; awk (and derivatives like frawk[1]) sounds like a better option. I use awk quite successfully for searching nginx access logs for unusual patterns.

[0]: https://github.com/rust-lang/regex
[1]: https://github.com/ezrosent/frawk
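The awk-style approach keeps a single pass over the input but tries each pattern separately, so every rule's IP stays in group 1. Below is a minimal Python sketch of that idea, modeled on a sequence of awk pattern-action rules that each end in `next`; the rule list and the `bad_ips` name are hypothetical:

```python
import re

# Hypothetical per-service rules. Each pattern is compiled on its own,
# so the IP is always group 1 (no renumbering across patterns).
IPV4 = r"(\d{1,3}(?:\.\d{1,3}){3})"
RULES = [
    re.compile(r"\[client " + IPV4),                   # httpd style
    re.compile(r"connect from \S+\[" + IPV4 + r"\]"),  # postfix style
]


def bad_ips(lines):
    """Collect distinct IPs in one pass, trying every rule on each line."""
    seen = set()
    for line in lines:
        for rule in RULES:
            m = rule.search(line)
            if m:
                seen.add(m.group(1))
                break  # first matching rule wins, like awk's `next`
    return seen
```

The returned set is deduplicated, which suits feeding the addresses into something like an ipset afterwards.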

Development quotes of the week - regular expression edition

Posted Feb 26, 2022 3:02 UTC (Sat) by flussence (guest, #85566)

> ripgrep doesn't use libpcre, but the native rust regex crate

It uses both; see `ldd`. I was using the --pcre2 flag because a few of the patterns contained lookbehinds.
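The Rust regex crate deliberately omits look-around, which is why PCRE2 was needed here. A lookbehind lets the whole match be just the IP, so no capture group is required at all (similar in spirit to printing only the matched text, e.g. with ripgrep's `-o`). A minimal sketch using Python's `re`, which supports fixed-width lookbehind; the pattern and the `ips_after_client` helper are hypothetical:

```python
import re

# Fixed-width lookbehind: require "[client " before the address without
# including it in the match, so each match is just the IP itself.
IP_AFTER_CLIENT = re.compile(r"(?<=\[client )\d{1,3}(?:\.\d{1,3}){3}")


def ips_after_client(line):
    # No capturing groups, so findall returns the full matches.
    return IP_AFTER_CLIENT.findall(line)
```

Note that Python's `re` only accepts fixed-width lookbehind; a variable-length form such as `(?<=\[client +)` is rejected with `re.error` at compile time.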


Copyright © 2026, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds