
PEP 461 Final?

From:  Ethan Furman <ethan-AT-stoneleaf.us>
To:  Python Dev <Python-Dev-AT-python.org>
Subject:  PEP 461 Final?
Date:  Fri, 17 Jan 2014 08:49:21 -0800
Message-ID:  <52D95F11.3020005@stoneleaf.us>

Here's the text for your reading pleasure.  I'll commit the PEP after I add some markup.

Major change:

   - dropped `format` support, just using %-interpolation

Coming soon:

   - Rationale section  ;)

================================================================================
PEP: 461
Title: Adding % formatting to bytes
Version: $Revision$
Last-Modified: $Date$
Author: Ethan Furman <ethan@stoneleaf.us>
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 2014-01-13
Python-Version: 3.5
Post-History: 2014-01-14, 2014-01-15, 2014-01-17
Resolution:


Abstract
========

This PEP proposes adding % formatting operations to bytes, similar to those
of Python 2's str type [1]_ [2]_.


Overriding Principles
=====================

To avoid the problems of auto-conversion and the Unicode exceptions that
could plague Python 2 code, all object checking will be done by duck-typing,
not by inspecting values in a Unicode representation [3]_.


Proposed semantics for bytes formatting
=======================================

%-interpolation
---------------

All the numeric formatting codes (such as %x, %o, %e, %f, %g, etc.)
will be supported, and will work as they do for str, including the
padding, justification and other related modifiers.

Example::

    >>> b'%4x' % 10
    b'   a'

    >>> b'%#4x' % 10
    b' 0xa'

    >>> b'%04X' % 10
    b'000A'
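The statement that numeric codes "work as they do for str" can be spot-checked on CPython 3.5 or later, where bytes %-formatting was eventually added. This is an illustration rather than part of the proposal; the helper name ``numeric_mismatches`` and the list of specs are arbitrary choices made for the sketch.

```python
def numeric_mismatches(value, specs):
    """Return the specs whose bytes output differs from the ASCII-encoded str output.

    Requires Python 3.5+, where bytes supports %-formatting.
    """
    mismatches = []
    for spec in specs:
        expected = (spec % value).encode("ascii")  # format as str, then encode
        actual = spec.encode("ascii") % value      # format as bytes directly
        if actual != expected:
            mismatches.append(spec)
    return mismatches


# Arbitrary sample of numeric specs, including padding and justification flags.
SPECS = ["%4x", "%#4x", "%04X", "%o", "%+d", "%-6d", "%.2f", "%10.3e", "%g"]
print(numeric_mismatches(10, SPECS))  # expected: []
```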

%c will insert a single byte, either from an int in range(256), or from
a bytes argument of length 1, not from a str.

Example::

     >>> b'%c' % 48
     b'0'

     >>> b'%c' % b'a'
     b'a'
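The %c rule can be modelled as a small pure-Python function. ``format_c`` is a hypothetical name, and the exception types (TypeError for the wrong type or length, OverflowError for an out-of-range int) are assumptions; the draft only says that str arguments are refused.

```python
def format_c(arg: object) -> bytes:
    """Sketch of the draft %c rule for bytes (hypothetical helper, not CPython code)."""
    if isinstance(arg, bytes):
        # A bytes argument must hold exactly one byte.
        if len(arg) != 1:
            raise TypeError("%c requires a bytes object of length 1")
        return arg
    if isinstance(arg, int):
        # An int argument must be a valid byte value, i.e. in range(256).
        if not 0 <= arg < 256:
            raise OverflowError("%c arg not in range(256)")
        return bytes([arg])
    # Everything else, str included, is refused.
    raise TypeError("%c requires an int in range(256) or a bytes object "
                    "of length 1, not " + type(arg).__name__)


print(format_c(48))    # b'0'
print(format_c(b"a"))  # b'a'
```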

%s is restricted in what it will accept:

   - input type supports Py_buffer?
     use it to collect the necessary bytes

   - input type is something else?
     use its __bytes__ method; if there isn't one, raise a TypeError

Examples::

     >>> b'%s' % b'abc'
     b'abc'

     >>> b'%s' % 3.14
     Traceback (most recent call last):
     ...
     TypeError: 3.14 has no __bytes__ method

     >>> b'%s' % 'hello world!'
     Traceback (most recent call last):
     ...
     TypeError: 'hello world!' has no __bytes__ method, perhaps you need to encode it?
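The two %s rules can be sketched in pure Python. ``format_s`` is a hypothetical name; ``memoryview()`` stands in for a C-level Py_buffer check, ``__bytes__`` is looked up on the type as special methods usually are, and the error messages imitate the draft's examples rather than any shipped implementation.

```python
def format_s(arg: object) -> bytes:
    """Sketch of the draft %s rules for bytes (hypothetical helper, not CPython code)."""
    # Rule 1: anything exposing the buffer protocol (Py_buffer) is copied verbatim.
    try:
        view = memoryview(arg)
    except TypeError:
        view = None
    if view is not None:
        with view:
            return view.tobytes()

    # Rule 2: otherwise fall back to __bytes__; without one, raise TypeError.
    to_bytes = getattr(type(arg), "__bytes__", None)
    if to_bytes is None:
        hint = ", perhaps you need to encode it?" if isinstance(arg, str) else ""
        raise TypeError("%r has no __bytes__ method%s" % (arg, hint))
    result = to_bytes(arg)
    if not isinstance(result, bytes):
        raise TypeError("__bytes__ returned non-bytes (type %s)" % type(result).__name__)
    return result


class Token:
    """Hypothetical non-buffer type that opts in through __bytes__."""

    def __bytes__(self):
        return b"<token>"


print(format_s(b"abc"))             # b'abc'
print(format_s(bytearray(b"xy")))   # b'xy'
print(format_s(Token()))            # b'<token>'
```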

.. note::

    Because the str type neither supports the buffer protocol nor has a
    __bytes__ method, attempts to use a str such as 'a string' directly as a
    bytes interpolation value will raise an exception.  To use str values,
    first encode them or otherwise transform them into a bytes sequence::

       'a string'.encode('latin-1')


Numeric Format Codes
--------------------

To properly handle int and float subclasses, int(), operator.index(), and
float() will be called on the objects intended for (d, i, u), (b, o, x, X),
and (e, E, f, F, g, G) respectively.
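A minimal sketch of the conversion step, assuming numeric output is always ASCII so the formatting itself can be delegated to str and then encoded. The table, ``format_numeric``, and the ``Port`` subclass are hypothetical; ``b`` is left out because the str %-formatting this sketch delegates to has no %b code.

```python
import operator

# Conversion applied before formatting, per code family (hypothetical table).
_CONVERTERS = {}
_CONVERTERS.update(dict.fromkeys("diu", int))
_CONVERTERS.update(dict.fromkeys("oxX", operator.index))
_CONVERTERS.update(dict.fromkeys("eEfFgG", float))


def format_numeric(spec: str, arg: object) -> bytes:
    """Format a single numeric %-spec such as '%04X' as bytes (sketch)."""
    code = spec[-1]
    # int(), operator.index() and float() turn subclasses into plain values,
    # so a subclass's own __str__ or __repr__ never leaks into the output.
    value = _CONVERTERS[code](arg)
    return (spec % value).encode("ascii")


class Port(int):
    """Hypothetical int subclass with a custom __str__."""

    def __str__(self):
        return "Port(%d)" % int(self)


print(format_numeric("%04X", 10))         # b'000A'
print(format_numeric("%d", Port(8080)))   # b'8080'
print(format_numeric("%.1f", 2))          # b'2.0'
```

With ``operator.index()`` guarding o, x and X, a float such as 3.5 is rejected with TypeError instead of being silently truncated.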


Unsupported codes
-----------------

%r (which calls __repr__) and %a (which calls ascii(), i.e. __repr__ with
non-ASCII characters escaped) are not supported.
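One way to picture the restriction is a scanner that walks a bytes format string and refuses %r and %a before any interpolation happens. The regular expression is a simplified, hypothetical approximation of a %-spec (key, flags, width, precision, length modifier, code), and raising ValueError is an assumption borrowed from how str reports unsupported format characters.

```python
import re

# Simplified shape of one %-spec: optional (key), flags, width, precision,
# an ignored length modifier, then the one-character conversion code.
_SPEC = re.compile(
    rb"%(?:\([^)]*\))?[-#0 +]*(?:\*|\d+)?(?:\.(?:\*|\d+))?[hlL]?(.)", re.DOTALL
)


def conversion_codes(fmt: bytes) -> list:
    """Return the conversion codes in fmt, rejecting %r and %a (sketch)."""
    codes = []
    for match in _SPEC.finditer(fmt):
        code = match.group(1)
        if code in (b"r", b"a"):
            raise ValueError("unsupported format character %r in bytes format"
                             % code.decode("ascii"))
        codes.append(code)
    return codes


print(conversion_codes(b"%4x and %(name)s, 100%%"))  # [b'x', b's', b'%']
```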


Proposed variations
===================

It was suggested to let %s accept numbers, but since numbers have their own
format codes this idea was discarded.

It has been suggested to use %b for bytes instead of %s.

   - Rejected as %b does not exist in Python 2.x %-interpolation, which is
     why we are using %s.

It has been proposed to automatically use .encode('ascii', 'strict') for str
arguments to %s.

   - Rejected as this would lead to intermittent failures: the same code would
     succeed or fail depending on whether the data happened to be pure ASCII.
     Better to have the operation always fail so the trouble-spot can be
     correctly fixed.

It has been proposed to have %s return the ascii-encoded repr when the value
is a str (b'%s' % 'abc' --> b"'abc'").

   - Rejected as this would lead to hard-to-debug failures far from the problem
     site.  Better to have the operation always fail so the trouble-spot can be
     easily fixed.
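The failure modes behind both rejections can be made concrete. The two functions below are hypothetical models of the rejected variations, not part of the proposal: auto-encoding succeeds or fails depending on the data, while the ASCII-repr approach never fails but silently injects quotes and escapes into the output.

```python
def rejected_auto_encode(value: str) -> bytes:
    """Rejected variation: implicitly encode a str argument as strict ASCII."""
    return value.encode("ascii", "strict")


def rejected_ascii_repr(value: str) -> bytes:
    """Rejected variation: substitute the ASCII-escaped repr of a str argument."""
    return ascii(value).encode("ascii")


# Auto-encoding: the same line of code succeeds or fails depending on the data.
print(rejected_auto_encode("abc"))  # b'abc'
try:
    rejected_auto_encode("caf\u00e9")
except UnicodeEncodeError as exc:
    print("failed:", exc.reason)  # failed: ordinal not in range(128)

# ASCII repr: never raises, but quotes end up in the bytes far from the bug.
print(b"GET /" + rejected_ascii_repr("index.html"))  # b"GET /'index.html'"
```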

Originally this PEP also proposed adding format-style formatting, but it was
decided that format and its related machinery are all strictly text-based
(i.e. str), so it was dropped.

Various new special methods were proposed, such as __ascii__, __format_bytes__,
etc.; such methods are not needed at this time, but can be visited again later
if real-world use shows deficiencies with this solution.


Footnotes
=========

.. [1] http://docs.python.org/2/library/stdtypes.html#string-for...
.. [2] Neither string.Template, format, nor str.format is under consideration.
.. [3] %c is not an exception, as neither of its possible argument types is str.


Copyright
=========

This document has been placed in the public domain.


..
    Local Variables:
    mode: indented-text
    indent-tabs-mode: nil
    sentence-end-double-space: t
    fill-column: 70
    coding: utf-8
    End:
================================================================================




Copyright © 2014, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds