On Sat, Jun 2, 2012 at 10:37 AM, Mark Lawrence <firstname.lastname@example.org> wrote:
> On 01/06/2012 18:27, Brett Cannon wrote:
>> About the only thing I can think of from the language summit that we
>> discussed doing for Python 3.3 that has not come about is accepting the
>> regex module and getting it into the stdlib. Is this still being worked
> Umpteen versions of regex have been available on pypi for years. Umpteen
> bugs against the original re module have been fixed. If regex can't now go
> into the standard library, what on earth can?
That's why it's approved *in principle* already. However, it's not a
simple matter of dropping something into the standard library and
calling it done, especially an extension module as complex as regex.
Even integrating a simple pure Python module like ipaddr took
1. The API had to be reviewed to see if it was suitable for someone
that was *not* familiar with the problem domain, but was instead
learning about it from the standard library documentation. This isn't
a big concern for regex, since it is replacing the existing re module,
but this is the main reason ipaddr became ipaddress before PEP 3144
was approved (the ipaddr API plays fast and loose with network
terminology in a way that someone who already *knows* that
terminology can easily grasp, but that would have been incredibly
confusing to someone discovering those terms for the first time).
2. The code had to actually be added to the standard library (not a
big effort for PEP 3144 - saving ipaddress.py into Lib/ and
test_ipaddress.py into Lib/test/ pretty much covered it)
3. Redundant 2.x cruft needed to be weeded out (ongoing)
4. The howto guide needed to be incorporated into the documentation
(and rewritten to be more suitable for genuine beginners)
5. An API module reference still needs to be incorporated into the
standard library reference
The effort to integrate regex is going to be substantially higher,
since it's a significantly more complicated module:
1. A new, non-trivial C extension needs to be incorporated into both
the autotools and Windows build processes
2. Due to PEP 393, there's a major change to the string implementation
in 3.3. Does regex still build against that? Even if it builds, it
should probably be ported to the new API for performance reasons.
3. Does regex build cleanly on all platforms supported by CPython? If
not, do we need to keep the existing re module around as a fallback
4. How do we merge the test suites? Do we keep the existing test
suite, add the regex test suite, then filter for duplication
5. What, precisely, *are* the backwards incompatibilities between
regex and re? Does the standard library trigger any of them? Does the
6. How will the PyPI backport be maintained in the future? The amount
of backwards compatibility cruft in standard library code should be
minimised, but that potentially makes backports more difficult.
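One way to start answering question 5 is to run the same patterns through both modules and record where they disagree. The following is only a rough sketch: the helper name `find_differences` and the sample cases are made up for illustration, and it assumes the third-party regex module may or may not be installed (it compares re against itself if not).

```python
import re

try:
    import regex  # third-party module from PyPI; may not be installed
except ImportError:
    regex = None


def find_differences(cases, old=re, new=None):
    """Return the cases where the two modules disagree.

    `cases` is an iterable of (pattern, text) pairs. Each module is asked
    for its findall() result; an exception is recorded by type name, so
    "compiles in one module, fails in the other" also counts as a difference.
    """
    if new is None:
        # Hypothetical default: use regex when available, otherwise compare
        # the old module against itself (which reports no differences).
        new = regex if regex is not None else old

    def run(module, pattern, text):
        try:
            return ("ok", module.findall(pattern, text))
        except Exception as exc:  # deliberately broad: any failure is a data point
            return ("error", type(exc).__name__)

    differences = []
    for pattern, text in cases:
        old_result = run(old, pattern, text)
        new_result = run(new, pattern, text)
        if old_result != new_result:
            differences.append((pattern, text, old_result, new_result))
    return differences


# Illustrative cases only; a real survey would harvest patterns from the
# standard library and from both test suites.
CASES = [
    (r"\d+", "issue2636 and PEP 3144"),
    (r"(a)|(b)", "ab"),
    (r"\w+", "ipaddr became ipaddress"),
]

if __name__ == "__main__":
    for difference in find_differences(CASES):
        print(difference)
```

An empty result for a set of cases says nothing about patterns that were not tried, so this would complement a written list of known incompatibilities rather than replace it.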
ipaddress is in the 3.3 standard library because Peter Moody cared
enough about the concept to initially submit it for inclusion, and
because I volunteered to drive the review and integration process
forward and to be the final arbiter of what counted as "good enough"
for inclusion. That hasn't happened yet for regex - either nobody has
cared enough to write a PEP for it, or the bystander effect has kicked
in and everyone that cares is assuming *someone else* will take up the
burden of being the PEP champion.
So that's the first step: someone needs to take
http://bugs.python.org/issue2636 and turn it into a PEP (searching the
python-dev and python-ideas archives for references to previous
discussions of the topic would also be good, along with summarising
the open Unicode-related re bugs reported by Tom Christiansen where the
answer is currently "use regex from PyPI instead of the standard
library's re module").
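For reference, the "use regex from PyPI" workaround usually looks something like the sketch below in user code. The alias `re_impl` and the `words` helper are illustrative names only, and the sketch assumes the code sticks to the subset of the API that both modules share.

```python
try:
    import regex as re_impl  # third-party package from PyPI, if installed
except ImportError:
    import re as re_impl     # standard library fallback

# Only features common to both modules are used here, so the pattern
# compiles whichever implementation was imported.
WORD = re_impl.compile(r"\w+")


def words(text):
    """Return the runs of word characters found in `text`."""
    return WORD.findall(text)


if __name__ == "__main__":
    print(words("regex or re"))
```

Code written this way cannot rely on regex-only features, which is part of why a bug whose answer is "use regex" remains open against re.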
Nick Coghlan | email@example.com | Brisbane, Australia