
Python support for "irregular" expressions

Posted Feb 23, 2022 13:16 UTC (Wed) by fman (subscriber, #121579)
In reply to: Python support for "irregular" expressions by brenns10
Parent article: Python support for regular expressions

> (A good background on this from Russ Cox is found here [1]).
> [1]: https://swtch.com/~rsc/regexp/regexp1.html

Thanks for that link. That is certainly an enlightening read.
Makes me wonder whether what is *really* needed is a "simple-re" module with a "Thompson NFA" regex engine. A six-digit speedup factor should be worth aiming for, after all.
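To make the "Thompson NFA" idea concrete, here is a minimal Python sketch of the state-set simulation that Cox's article describes. It covers only a deliberately tiny, hypothetical subset of regex syntax: literal characters, `.` (which here matches any character, newline included), and the postfix quantifiers `?`, `*` and `+`. It has no groups, alternation, classes or escapes. The names `parse`, `closure` and `fullmatch` are illustrative and not part of `re` or any real "simple-re" module. Instead of backtracking, the sketch advances every live NFA state in lockstep, so for a fixed pattern the run time grows linearly with the length of the text.

```python
from typing import List, Optional, Set, Tuple

# (atom, quantifier): atom is a single literal character or "." (any char);
# quantifier is None, "?", "*" or "+".
Item = Tuple[str, Optional[str]]


def parse(pattern: str) -> List[Item]:
    """Split a pattern in the toy subset into (atom, quantifier) items."""
    items: List[Item] = []
    i = 0
    while i < len(pattern):
        atom = pattern[i]
        if atom in "?*+":
            raise ValueError(f"nothing to repeat at position {i}")
        quant: Optional[str] = None
        if i + 1 < len(pattern) and pattern[i + 1] in "?*+":
            quant = pattern[i + 1]
            i += 1
        items.append((atom, quant))
        i += 1
    return items


def closure(items: List[Item], states: Set[int]) -> Set[int]:
    """Add every state reachable without consuming input.

    State k means "about to match items[k]"; state len(items) is the accept
    state. Items quantified with "?" or "*" may be skipped (epsilon moves).
    """
    result: Set[int] = set()
    stack = list(states)
    while stack:
        s = stack.pop()
        if s in result:
            continue
        result.add(s)
        if s < len(items) and items[s][1] in ("?", "*"):
            stack.append(s + 1)
    return result


def fullmatch(pattern: str, text: str) -> bool:
    """Return True if the whole text matches, simulating all NFA states at once."""
    items = parse(pattern)
    current = closure(items, {0})
    for ch in text:
        step: Set[int] = set()
        for s in current:
            if s == len(items):
                continue  # the accept state has no outgoing character edges
            atom, quant = items[s]
            if atom == "." or atom == ch:
                step.add(s + 1)  # move past this item
                if quant in ("*", "+"):
                    step.add(s)  # or stay to match it again
        current = closure(items, step)
        if not current:
            return False  # no live states left: the text cannot match
    return len(items) in current
```

On the pathological a?^n a^n case from the article, `fullmatch("a?" * 30 + "a" * 30, "a" * 30)` returns `True` while tracking at most 61 states per input character. A backtracking engine can take time exponential in n on the same input.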



Python support for "irregular" expressions

Posted Mar 1, 2022 0:22 UTC (Tue) by NYKevin (subscriber, #129325)

IMHO all you really need is 2½ pieces:

1. Modify one or both of re/regex to use an NFA/DFA implementation when possible (i.e. when the expression contains no backreferences, lookarounds, possessive quantifiers, or atomic groups).
2. Add a flag to re/regex.compile() that raises an exception if it sees any of those features in the expression being compiled. Off by default, must be explicitly passed.
2½. For bonus points, make a locked_down_regex module (preferably with a better name) that is exactly like re/regex, except the flag in (2) is always passed for you automatically and cannot be turned off by any means. This is analogous to the use of the secrets module in lieu of random. Since it's a whole new module, it won't break anything and must be opted-into, but OTOH it's easy to audit whether you are using the "right" module if your org cares.
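Point 2 could be prototyped on top of today's `re` module. The sketch below is an illustration under assumptions, not an existing API: `find_nonregular_feature`, `compile_strict` and `NonRegularFeatureError` are hypothetical names. It scans the pattern text, skipping escapes and character classes, for constructs that a plain NFA/DFA engine cannot handle. If it finds one it raises; otherwise it defers to `re.compile`. It errs on the side of rejecting. A three-digit octal escape such as `\101`, or a literal `}` followed by `+`, is reported even though it is harmless, and comments in `re.VERBOSE` patterns are not understood. (Atomic groups and possessive quantifiers exist in `re` only since Python 3.11.)

```python
import re
from typing import Optional


class NonRegularFeatureError(ValueError):
    """Hypothetical error: the pattern uses a feature an NFA/DFA cannot support."""


# Group openers that a plain finite automaton cannot implement.
_FORBIDDEN_GROUPS = {
    "(?=": "lookahead",
    "(?!": "negative lookahead",
    "(?<=": "lookbehind",
    "(?<!": "negative lookbehind",
    "(?>": "atomic group",
    "(?P=": "named backreference",
    "(?(": "conditional group",
}


def find_nonregular_feature(pattern: str) -> Optional[str]:
    """Return a description of the first non-regular construct, or None."""
    i, in_class = 0, False
    while i < len(pattern):
        c = pattern[i]
        if c == "\\":
            nxt = pattern[i + 1:i + 2]
            # \1..\9 outside a class start a numbered backreference
            # (conservatively also flags three-digit octal escapes like \101).
            if not in_class and len(nxt) == 1 and nxt in "123456789":
                return "numbered backreference"
            i += 2  # skip the backslash and the escaped character
            continue
        if in_class:
            if c == "]":
                in_class = False
            i += 1
            continue
        if c == "[":
            in_class = True
            i += 1
            # "[^" and a "]" right after "[" or "[^" do not end the class.
            if pattern[i:i + 1] == "^":
                i += 1
            if pattern[i:i + 1] == "]":
                i += 1
            continue
        if c == "(":
            for opener, name in _FORBIDDEN_GROUPS.items():
                if pattern.startswith(opener, i):
                    return name
        if c in "*+?}" and pattern[i + 1:i + 2] == "+":
            return "possessive quantifier"
        i += 1
    return None


def compile_strict(pattern: str, flags: int = 0) -> "re.Pattern[str]":
    """Compile with re, but refuse patterns that need backtracking-only features."""
    feature = find_nonregular_feature(pattern)
    if feature is not None:
        raise NonRegularFeatureError(f"{feature} not allowed in {pattern!r}")
    return re.compile(pattern, flags)
```

This only enforces the restriction: `re` itself still backtracks, so the linear-time guarantee would have to come from point 1, where the same check could decide whether a pattern may be routed to an NFA/DFA engine. A `locked_down_regex` module in the spirit of point 2½ could simply expose `compile_strict` as its only `compile`.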

This is 100% backwards compatible, would significantly improve the worst-case performance of existing regular expressions (linear-time matching instead of exponential backtracking), and the only downside is a more complicated implementation.


Copyright © 2025, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds