Development quotes of the week - regular expression edition
Development quotes of the week - regular expression edition
Posted Feb 20, 2022 23:00 UTC (Sun) by ras (subscriber, #33059)In reply to: Development quotes of the week - regular expression edition by NYKevin
Parent article: Development quotes of the week - regular expression edition
At some point, sometime between 10 and 20 years ago, that decision made it's way across the wide blurry line delineating "good engineering decision" to "actively harmful". Using an NFA instead of a DFA is now so far into "actively harmful" territory it's bloody annoying - I get bitten by non-deterministically lurking exponential time blowup bugs at least once a year now. Well maybe it's not non-deterministic, but you would have to be superman to figure out when a given NFA engine take until until the heat death of the universe to process some random real world input, while breezing though all the unit tests you give it. And it will do that despite you give it a well formed RE a DFA could easily represent in sane amount of space.
If I had a choice, I would use any DFA based engine (like RE) every time. But the regex engines most languages put in their "comes with batteries" standard libraries are NFA's, because some bright spark decided to allow back reference matching donkey's years ago. I presume they added it because it's something NFA's can do easily, and that fact that it turned an O(1) problem into an NP one didn't really matter because hey, NFA's already can't detect NP regex's so why not add a few more cases? The answer turned out to be because when the time came to move from NFA's to DFA's, back references would prevent it.
It's going to take a major change mindset to change this situation. It's not that languages like Python or PHP couldn't add an RE like regex engine to their batteries. It's that they would have to acknowledge it's more important to have a safe regex engine than to have the most features. In fact there was some discussion about replacing Python's regex engine not so long ago. I don't know what happened - maybe it's still ongoing. I got all excited and looked up what was being proposed. But no, the one thing I wanted - guaranteed O(1) execution time was not on the list. In fact as far as I could tell, it hadn't even occurred to anyone the current situation might be a problem. They just wanted more features.
