From: Andrew Kuchling <akuchlin@cnri.reston.va.us>
Subject: New release of PCRE module
Date: Mon, 06 Apr 1998 14:11:06 EDT
A new release of the Python PCRE module, dated March 26, is now available.
It contains all the patches listed on the Python 1.5 errors page,
along with some additional significant changes and bug fixes.
This independent distribution is at
<ftp://starship.skyport.net/pub/crew/amk/regex/pcre.tgz>
The PCRE module underlies the re module for regular expression
matching that was added in Python 1.5, so this distribution
also includes re.py and test_re.py, the re module's test suite.
*Please* try this release, so that it can get as much testing
as possible, and so we can verify that it doesn't break anything.
Hopefully we can be convinced of its stability before Guido decides to
release Python 1.5.1 (whenever that will be).
Changes:
* Code updated to the upstream PCRE release 1.07.
* A binary \0 (null byte) is no longer legal in a pattern, because
escaping it proved too unreliable.
A more detailed explanation of this change: the re functions
take a normal Python string as the pattern, and such strings can contain
the character with ASCII value 0. This caused problems with PCRE,
because it expects a C string, in which the only null byte is the one
terminating the pattern. The original PCRE module hid this by replacing
null bytes with the 4 characters '\000', but that turned out to break
things like \<null byte>. Rather than make the escaping more
complicated, I decided to rip it out completely. (From the C
extension's point of view, the re.compile function now parses its
pattern with the "s" format of PyArg_ParseTuple() instead of "s#",
and therefore disallows an ASCII zero in its input.)
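To make the failure concrete, here is a small sketch of the old workaround as described above: each null byte becomes the four characters '\000'. It runs against modern Python 3's re module, not the 1998 PCRE module, and the function name old_escape is hypothetical.

```python
import re

def old_escape(pattern):
    # Hypothetical sketch of the old workaround: replace every null byte
    # with the four characters backslash, '0', '0', '0'.
    return pattern.replace('\0', '\\000')

# A bare null byte survives: the regex escape \000 still means NUL.
ok = old_escape('a\0b')
print(repr(ok))                                # 'a\\000b'
print(re.fullmatch(ok, 'a\0b') is not None)    # True

# An escaped null byte (backslash followed by NUL) does not: the inserted
# backslash pairs up with the user's backslash, leaving a literal "000".
bad = old_escape('\\\0')
print(repr(bad))                               # '\\\\000'
print(re.fullmatch(bad, '\0') is not None)     # False
print(re.fullmatch(bad, '\\000') is not None)  # True: a backslash, then "000"
```

The second case is the \<null byte> breakage: after escaping, the pattern no longer matches a null byte at all.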
So, re.compile('\0') won't work, but re.compile(r"\0") will,
because the compiler sees a two-character string containing '\' and
'0'. It's therefore now an even better idea to always use r"..." for
re patterns: in a raw string, \0 is not turned into a binary zero, so
one can't end up in the pattern.
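The difference between the two spellings is visible at the string level, before re ever sees the pattern. This sketch uses modern Python 3, whose re module (unlike the 1998 PCRE module) accepts null bytes in patterns, so only the string contents are the point here.

```python
import re

plain = '\0'   # escape processed by Python: one character, the null byte
raw = r'\0'    # raw string: two characters, a backslash and a zero

print(len(plain), len(raw))   # 1 2
print(chr(0) in plain)        # True
print(chr(0) in raw)          # False
print(list(raw))              # ['\\', '0']

# At the regex level, the escape \0 denotes the null character, so the raw
# pattern still matches a null byte without ever containing one itself.
print(re.fullmatch(raw, '\0') is not None)   # True
```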
* In the replacement string for re.sub, \g<1> is now a synonym
for \1, but it avoids the ambiguity of strings like \10\2: write
\g<1>0\2 to mean group 1, a literal 0, then group 2.
The documentation has been updated accordingly.
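A quick illustration of the ambiguity, using modern Python 3's re.sub, which still supports both spellings: \10 is read as a reference to group 10, whereas \g<1>0 unambiguously means group 1 followed by a literal 0. The pattern and input string are arbitrary examples.

```python
import re

pattern = r'(a)(b)'

print(re.sub(pattern, r'\2\1', 'ab'))        # ba
print(re.sub(pattern, r'\g<2>\g<1>', 'ab'))  # ba: same as above
print(re.sub(pattern, r'\g<1>0\2', 'ab'))    # a0b: group 1, literal 0, group 2

try:
    # Read as a reference to group 10, which this pattern does not have.
    re.sub(pattern, r'\10\2', 'ab')
except re.error as exc:
    print('re.error:', exc)
```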
* The test suite has been greatly expanded to cover more
functions and cases.
* Previously fixed bugs: the maxsplit option of re.split is
now implemented; MatchObject.groups() now returns a tuple even when
there's only one group; patterns such as ((a)*)* no longer dump core;
re.sub(r'^\s*', 'X', ' test') returned 'XtXeXsXtX' instead of 'Xtest'.
(Patches for these are already on the 1.5 errors page at
<http://www.python.org/1.5/errors.html>.)
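For reference, here is what the corrected behaviours look like in modern Python 3's re module. This is a quick check against today's re, not a test of the 1998 PCRE release, and the ((a)*)* core dump is not reproduced.

```python
import re

# maxsplit limits the number of splits performed.
print(re.split(r',', 'a,b,c', maxsplit=1))   # ['a', 'b,c']

# groups() returns a tuple even when the pattern has a single group.
m = re.match(r'(\d+)', '42 apples')
print(m.groups())                            # ('42',)

# The anchored pattern ^\s* can only match at the start of the string,
# so exactly one replacement is made.
print(re.sub(r'^\s*', 'X', ' test'))         # Xtest
```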
A.M. Kuchling http://starship.skyport.net/crew/amk/
Human history becomes more and more a race between education and catastrophe.
-- H.G. Wells
--
-------- comp.lang.python.announce (moderated) --------
Article Submission Address: python-announce@python.org
Python Language Home Page: http://www.python.org/
-------------------------------------------------------