
Re: PEP 461 - Adding % and {} formatting to bytes

From:  Ethan Furman <ethan-AT-stoneleaf.us>
To:  Python Dev <Python-Dev-AT-python.org>
Subject:  Re: PEP 461 - Adding % and {} formatting to bytes
Date:  Tue, 14 Jan 2014 11:56:25 -0800
Message-ID:  <52D59669.20404@stoneleaf.us>

Duh.  Here's the text, as well.  ;)


PEP: 461
Title: Adding % and {} formatting to bytes
Version: $Revision$
Last-Modified: $Date$
Author: Ethan Furman <ethan@stoneleaf.us>
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 2014-01-13
Python-Version: 3.5
Post-History: 2014-01-13
Resolution:


Abstract
========

This PEP proposes adding the % and {} formatting operations from str to bytes.


Proposed semantics for bytes formatting
=======================================

%-interpolation
---------------

All the numeric formatting codes (such as %x, %o, %e, %f, and %g)
will be supported, and will work as they do for str, including the
padding, justification and other related modifiers.

Example::

    >>> b'%4x' % 10
    b'   a'
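
The intended equivalence with str formatting can be approximated on any
Python 3 interpreter by formatting as str and ASCII-encoding the result.
This is only a sketch: the helper name ``emulate_numeric_format`` is
hypothetical and is not part of the proposal.

```python
def emulate_numeric_format(template, value):
    """Emulate a numeric bytes %-code through str formatting (illustrative)."""
    # Assumption: numeric codes only ever produce ASCII characters,
    # so a strict ASCII encode of the str result should not fail.
    return (template % value).encode('ascii')


print(emulate_numeric_format('%4x', 10))          # padded hexadecimal
print(emulate_numeric_format('%-6.2f|', 3.14159))  # left-justified float
print(emulate_numeric_format('%05d', 42))          # zero-padded integer
```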

%c will insert a single byte, either from an int in range(256), or from
a bytes argument of length 1.

Examples::

     >>> b'%c' % 48
     b'0'

     >>> b'%c' % b'a'
     b'a'
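
The two accepted argument kinds for %c can be sketched in pure Python as
follows.  The function name ``emulate_percent_c`` and the exception types
are assumptions made for illustration; the text above does not specify
which exceptions would be raised.

```python
def emulate_percent_c(value):
    """Return the single byte that %c would insert (illustrative only)."""
    if isinstance(value, int):
        # An int must name exactly one byte value.
        if value not in range(256):
            # Assumed exception type; not specified by the proposal.
            raise OverflowError('%c argument not in range(256)')
        return bytes([value])
    if isinstance(value, bytes) and len(value) == 1:
        # A length-1 bytes object is inserted unchanged.
        return value
    # Assumed exception type; not specified by the proposal.
    raise TypeError('%c requires an int in range(256) or a bytes of length 1')


print(emulate_percent_c(48))
print(emulate_percent_c(b'a'))
```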

%s, because it is the most general, has the most convoluted resolution:

   - input type is bytes?
     pass it straight through

   - input type is numeric?
     use its __xxx__ [1]_ [2]_ method and ascii-encode it (strictly)

   - input type is something else?
     use its __bytes__ method; if there isn't one, raise an exception [3]_

Examples::

     >>> b'%s' % b'abc'
     b'abc'

     >>> b'%s' % 3.14
     b'3.14'

     >>> b'%s' % 'hello world!'
     Traceback (most recent call last):
     ...
     TypeError: 'hello world!' has no __bytes__ method, perhaps you need to encode it?
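
The three-step resolution for %s can be sketched in pure Python.  This is
an illustration under stated assumptions, not the proposed implementation:
the name ``resolve_percent_s`` is hypothetical, ``str()`` is picked
arbitrarily for numeric values (footnote 1 leaves str versus repr open),
and TypeError is picked for the failure case (footnote 3 leaves the
exception type open).

```python
import numbers


def resolve_percent_s(value):
    """Resolve a %s argument to bytes, following the three steps above."""
    # Step 1: bytes pass straight through.
    if isinstance(value, bytes):
        return value
    # Step 2: numeric values are rendered as text and strictly ASCII-encoded.
    # Assumption: str() is used here; the draft has not settled str vs. repr.
    if isinstance(value, numbers.Number):
        return str(value).encode('ascii', 'strict')
    # Step 3: anything else must provide __bytes__.
    to_bytes = getattr(type(value), '__bytes__', None)
    if to_bytes is None:
        # Assumption: TypeError, matching the example traceback.
        raise TypeError('%r has no __bytes__ method, '
                        'perhaps you need to encode it?' % (value,))
    return to_bytes(value)


class Packet:
    """Hypothetical user type that knows its own bytes representation."""

    def __bytes__(self):
        return b'\x00\x01'


print(resolve_percent_s(b'abc'))
print(resolve_percent_s(3.14))
print(resolve_percent_s(Packet()))
try:
    resolve_percent_s('hello world!')
except TypeError as exc:
    print('TypeError:', exc)
```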

.. note::

    Because the str type does not have a __bytes__ method, attempts to
    directly use 'a string' as a bytes interpolation value will raise an
    exception.  To use 'string' values, they must be encoded or otherwise
    transformed into a bytes sequence::

       'a string'.encode('latin-1')


format
------

The format mini-language will be used as-is, with the same behaviors as
those listed for %-interpolation.
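
As with %-interpolation, the intent for numeric values can be approximated
by formatting a str and ASCII-encoding the result.  The helper below is a
hypothetical stand-in for illustration only, and it assumes the arguments
are numeric.

```python
def emulate_bytes_format(template, *args):
    """Emulate {}-formatting of numeric values into bytes (illustrative)."""
    # Assumption: the str format mini-language is reused unchanged and the
    # formatted result contains only ASCII characters.
    return template.format(*args).encode('ascii')


print(emulate_bytes_format('{:4x}', 10))
print(emulate_bytes_format('{:>8.3f}', 3.14159))
```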


Open Questions
==============

For %s there has been some discussion of trying to use the buffer protocol
(Py_buffer) before trying __bytes__.  This question should be answered before
the PEP is implemented.
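
One way the buffer-first ordering under discussion might look is sketched
below, using ``memoryview`` as the Python-level window onto the buffer
protocol.  The function name and the choice of TypeError are assumptions;
the question itself remains open.

```python
import array


def resolve_percent_s_buffer_first(value):
    """Try the buffer protocol first, then fall back to __bytes__."""
    try:
        # memoryview() raises TypeError for objects without buffer support.
        view = memoryview(value)
    except TypeError:
        pass
    else:
        with view:
            # Copy the exported buffer into an immutable bytes object.
            return view.tobytes()
    to_bytes = getattr(type(value), '__bytes__', None)
    if to_bytes is None:
        # Assumption: TypeError; the draft has not fixed the exception type.
        raise TypeError('%r supports neither the buffer protocol '
                        'nor __bytes__' % (value,))
    return to_bytes(value)


print(resolve_percent_s_buffer_first(bytearray(b'abc')))
print(resolve_percent_s_buffer_first(array.array('B', [104, 105])))
```

Under this ordering, types such as bytearray and array.array would be
accepted without defining __bytes__, while str would still be rejected
because it does not export a buffer.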


Proposed variations
===================

It has been suggested to use %b for bytes instead of %s.

   - Rejected as %b does not exist in Python 2.x %-interpolation, which is
     why we are using %s.

It has been proposed to automatically use .encode('ascii','strict') for str
arguments to %s.

   - Rejected as this would lead to intermittent failures, occurring only
     when a value happens to contain non-ASCII characters.  Better to have
     the operation always fail so the trouble-spot can be correctly fixed.
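
The intermittent nature of such failures is easy to demonstrate with plain
str.encode.  The helper name ``auto_encode`` is hypothetical and stands in
for the rejected automatic behavior.

```python
def auto_encode(value):
    """Stand-in for the rejected automatic ASCII encoding of str values."""
    return value.encode('ascii', 'strict')


# The same code path succeeds or fails depending only on the data.
for name in ('alice', 'zo\u00eb'):
    try:
        print(auto_encode(name))
    except UnicodeEncodeError as exc:
        print('fails only for this input:', exc.reason)
```

Code tested only with ASCII data would pass, then fail later when a
non-ASCII value arrives.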

It has been proposed to have %s return the ascii-encoded repr when the value
is a str  (b'%s' % 'abc'  --> b"'abc'").

   - Rejected as this would lead to hard-to-debug failures far from the
     problem site.  Better to have the operation always fail so the
     trouble-spot can be easily fixed.
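
A short sketch of why the repr approach hides the mistake: the quotes are
silently embedded in the output and only cause trouble wherever the bytes
are eventually consumed.  The helper name ``repr_percent_s`` and the
header example are hypothetical.

```python
def repr_percent_s(value):
    """Stand-in for the rejected behavior: ASCII-encoded repr of a str."""
    return ascii(value).encode('ascii')


# No error is raised here, but the result contains stray quote characters.
header = b'Host: ' + repr_percent_s('example.com')
print(header)  # b"Host: 'example.com'"
```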


Footnotes
=========

.. [1] Not sure if this should be the numeric __str__ or the numeric __repr__,
        or if there's any difference
.. [2] Any proper numeric class would then have to provide an ascii
        representation of its value, either via __repr__ or __str__ (whichever
        we choose in [1]).
.. [3] TypeError, ValueError, or UnicodeEncodeError?


Copyright
=========

This document has been placed in the public domain.


..
    Local Variables:
    mode: indented-text
    indent-tabs-mode: nil
    sentence-end-double-space: t
    fill-column: 70
    coding: utf-8
    End:




