
Improving pretty-printing in Python

By Jake Edge
March 18, 2020

The python-ideas mailing list is typically used to discuss new features or enhancements for the language; ideas that gain traction will get turned into Python Enhancement Proposals (PEPs) and eventually make their way to python-dev for wider consideration. Steve Jorgensen recently started a discussion of just that sort; he was looking for a way to add customization to the "pretty-print" module (pprint) so that objects could change the way they are displayed. The subsequent thread went in a few different directions that reflect the nature of the mailing list—and the idea itself.

Jorgensen prefaced his thoughts with a disclaimer of sorts: "This is really an idea for an idea [...]". He suggested adding a "dunder" method to Python objects for pretty-printing purposes. Those methods have names that start and end with double underscores (i.e. "dunder"); they are used internally by Python for a number of standard tasks (e.g. __init__()). A new one might allow objects to represent themselves differently in Unicode streams:

The informal (`str`) representations of `inf` and `-inf` are "inf" and "-inf", and that seems appropriate as a known-safe value, but if we're writing the representation to a stream, and the stream has a Unicode encoding, then those might prefer to represent themselves as "∞" and "-∞". If there were a dunder method for informal representation to which the destination stream was passed, then the object could decide how to represent itself based on the properties of the stream.
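Jorgensen did not settle on a name or signature for such a method. The sketch below is purely illustrative: `__str_for_stream__()` and the `informal()` helper are hypothetical names, not anything that exists in Python, and a small `Number` wrapper stands in for `float` because methods cannot be added to built-in types. It treats a stream as able to show "∞" if its declared encoding can encode that symbol, which is only one possible reading of "has a Unicode encoding".

```python
import io
import math


def _can_encode(stream, text: str) -> bool:
    """True if the stream declares an encoding that can represent text."""
    encoding = getattr(stream, "encoding", None)
    if encoding is None:
        return False  # unknown encoding: keep the known-safe ASCII form
    try:
        text.encode(encoding)
    except (LookupError, UnicodeEncodeError):
        return False
    return True


class Number:
    """Wraps a float, standing in for a type that implements the hook."""

    def __init__(self, value: float) -> None:
        self.value = value

    def __str__(self) -> str:
        return str(self.value)  # "inf", "-inf", "1.5", ...

    def __str_for_stream__(self, stream) -> str:
        # HYPOTHETICAL dunder: the object inspects the destination stream.
        if math.isinf(self.value):
            symbol = "∞" if self.value > 0 else "-∞"
            if _can_encode(stream, symbol):
                return symbol
        return str(self)


def informal(obj, stream) -> str:
    """HYPOTHETICAL helper: the text a stream-aware print would write."""
    hook = getattr(type(obj), "__str_for_stream__", None)
    return hook(obj, stream) if hook is not None else str(obj)


utf8_stream = io.TextIOWrapper(io.BytesIO(), encoding="utf-8")
ascii_stream = io.TextIOWrapper(io.BytesIO(), encoding="ascii")
on_utf8 = informal(Number(float("-inf")), utf8_stream)    # "-∞"
on_ascii = informal(Number(float("-inf")), ascii_stream)  # "-inf"
```

Objects without the hook, such as a plain `int`, simply fall back to `str()`.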

Beyond that, objects might like to control how they are pretty-printed in the general case. pprint provides some amount of customization, in terms of text width, indentation, and traversal depth, but he is looking for more than that:
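For reference, those existing knobs are the `width`, `indent`, and `depth` parameters accepted by `pprint.pprint()`, `pprint.pformat()`, and `pprint.PrettyPrinter`. A short sketch of their effect on some made-up data:

```python
import pprint

config = {"name": "demo", "options": {"paths": ["/usr", "/opt"], "debug": False}}

# depth: containers nested deeper than the limit are elided as {...} / [...]
shallow = pprint.pformat(config, depth=1)
# shallow == "{'name': 'demo', 'options': {...}}"

# width: the one-line form no longer fits, so each item gets its own line
narrow = pprint.pformat(list(range(10)), width=20)

# indent: extra spaces added per nesting level when lines are broken
indented = pprint.pformat(list(range(10)), width=20, indent=4)
```

None of these lets an object influence its own layout, which is the gap Jorgensen was pointing at.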

It would be nice if there were some method that, if implemented for the object, would be used to allow the object to tell the pretty printer to treat it as a composite with starting text, component objects, and ending text.
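One way to picture that request is a hook that returns the three pieces Jorgensen lists. In this sketch, both the `__pretty_parts__()` method and the `pretty()` function are hypothetical, and the layout rule (keep the composite on one line if it fits within `width`, otherwise put one component per indented line) is a simplification, not what pprint actually does.

```python
from typing import Any


class Point:
    def __init__(self, x: float, y: float) -> None:
        self.x, self.y = x, y

    def __pretty_parts__(self):
        # HYPOTHETICAL hook: (starting text, component objects, ending text)
        return "Point(", [self.x, self.y], ")"


def pretty(obj: Any, width: int = 40, indent: int = 0) -> str:
    """Toy pretty printer that honours the hypothetical __pretty_parts__ hook."""
    hook = getattr(type(obj), "__pretty_parts__", None)
    if hook is not None:
        start, parts, end = hook(obj)
    elif isinstance(obj, list):
        start, parts, end = "[", obj, "]"  # treat lists as composites too
    else:
        return repr(obj)
    # Format components first; nested composites are indented one level deeper.
    pieces = [pretty(p, width, indent + 4) for p in parts]
    flat = start + ", ".join(pieces) + end
    if indent + len(flat) <= width and "\n" not in flat:
        return flat
    pad = " " * (indent + 4)
    body = ",\n".join(pad + p for p in pieces)
    return start + "\n" + body + "\n" + " " * indent + end


one_line = pretty([Point(1, 2), Point(3, 4)])            # fits in 40 columns
broken = pretty([Point(1, 2), Point(3, 4)], width=20)    # one point per line
```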

Guido van Rossum thought the idea had some merit. He suggested that a pprint alternative "that allows classes to have formatting hooks that get passed in some additional information (or perhaps a PrettyPrinter object) that can affect the formatting" might make sense. It would be the type of feature that could be developed as independent modules on the Python Package Index (PyPI), "*except* it would be more effective if there was a standard, rather than several competing such modules, with different APIs for the formatting hooks". He encouraged a discussion on what that API might look like.

Jonathan Fine offered up some potential starting points, at least in terms of design, in the reprlib and json modules in the standard library. Eric V. Smith pointed to the @functools.singledispatch decorator as a potential pattern to use; it allows for overloaded functions based on the type of the first argument.
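A rough sketch of how Smith's suggested pattern could apply to pretty-printing: the formatters live outside the classes they describe, so an application can register one for a type it does not own. The `pretty()` function and its rules (five significant digits for floats, "∞" for infinities) are illustrative assumptions; only `functools.singledispatch` and its `register()` method are the real API. Registering by type annotation needs Python 3.7 or later.

```python
import math
from functools import singledispatch


@singledispatch
def pretty(obj) -> str:
    # Fallback for types with no registered formatter.
    return repr(obj)


@pretty.register
def _(obj: float) -> str:
    if math.isinf(obj):
        return "∞" if obj > 0 else "-∞"
    return f"{obj:.5g}"  # illustrative choice: five significant digits


@pretty.register
def _(obj: list) -> str:
    # Recurse through the dispatcher so each element gets its own formatter.
    return "[" + ", ".join(pretty(item) for item in obj) + "]"


sample = pretty([1.0, float("inf"), 2 / 3, "a"])  # "[1, ∞, 0.66667, 'a']"
```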

But the definition of some putative __pretty__() method on objects could be problematic, Barry Scott said. "Pretty" is "in the eye of the beholder", so he is skeptical that objects can define a one-size-fits-all implementation; for example, internationalization and localization might be required. Instead of driving it from the object side, he would rather have something that takes an object "and returns the pretty version depending on the apps demands/config". Stephen J. Turnbull more or less concurred with that:

Allowing objects to decide implicitly how to represent themselves is usually a bad idea, and we shouldn't encourage it. Yes, it's *very* cool that you can do things like "π = math.pi", and with MacroPy you can even do things like substitute "λ" for "lambda". However, if ways are provided to do this automatically depending on encodings and other variable environment state, people *will* put them into public libraries, and clients of those libraries will have to compensate for that. And of course there's the potential for foot-shooting in private libraries.

If an application wants to make such substitutions, I have no objection to that. But "explicit is better than implicit", and those substitutions should be made at the level of application I/O, not the class level IMO. (Yes, I know those "levels" are ill-defined, but that's still an appropriate informal principle, I think.)

Christopher Barker was concerned that adding a new dunder method for pretty-printing, beyond the existing __str__() and __repr__(), might just lead to the need for more than one version of "pretty". He wondered about updating __str__() for standard types, so that the output was "pretty" by default, but recognized that it would likely break many things: "I imagine a LOT of code out there (doctests, who knows what else) does in fact expect the str() of builtins not to change -- so this is probably dead in the water." But beyond the code (and documentation) upheaval, it is far from clear what "pretty" means, as Steven D'Aprano pointed out:

Define "pretty". The main reason I don't use the pprint module at the moment is that it formats things like lists into a single long, thin column which is *less* attractive than the unformatted list:
    py> pprint.pprint(list(range(200)))
    [0,
     1,
     2,
     3,
     ...
     198,
     199]

I've inserted the ellipsis for brevity, the real output is 200 rows tall.

When it comes to floats, depending on what I'm doing, I may consider any of these to be "pretty":

  • the minimum number of digits which are sufficient to round trip;
  • the mathematically exact value, which could take a lot of digits;
  • some short number of digits, say, 5, that is "close enough".
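Both of D'Aprano's points can be illustrated with the standard library as it stands. pprint's `compact=True` option (available since Python 3.4) packs several list items per line instead of one, and the three float renderings he lists roughly correspond to `repr()`, `decimal.Decimal`, and a format specification such as `.5g`. This is only a sketch of existing options, not a claim that they settle what "pretty" should mean:

```python
import pprint
from decimal import Decimal

# Default layout: one element per line once the list is wider than 80 columns.
tall = pprint.pformat(list(range(200)))
# compact=True fills each line with as many elements as fit in the width.
packed = pprint.pformat(list(range(200)), compact=True)

x = 2 / 3
shortest = repr(x)         # fewest digits that round-trip: "0.6666666666666666"
exact = str(Decimal(x))    # the exact binary value, with many more digits
close_enough = f"{x:.5g}"  # five significant digits: "0.66667"
```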

Turnbull agreed with Barker that doctest-based tests would be affected by a change to str() (which calls __str__() if present), and added that other things would break as well, which is something the project tries to avoid:

Python may be good for developers who are moving fast and breaking things, but that's partly because (despite frequent complaints to the contrary) we don't move fast and break things most of the time.

Beyond the standard library modules, Alex Hall noted two projects on GitHub that may be of interest: PrettyPrinter and pprint++. Jorgensen said that he is looking at those as well as the others suggested in the thread. He is continuing the discussion, but is now thinking that adding dunder methods is not the right approach:

There has been some argument regarding whether objects should say how to present themselves "prettily". I think a case can be made either way, but in either case, it makes sense that it should be easy to override the representation for an object type without subclassing or monkey-patching it. Also, it might make sense not to clutter up the dunder-method space for all kinds of objects with this kind of thing.

Instead, he suggested adding a way for objects to register hooks governing how they want to be represented. It is still in the early going for any pretty-printing improvements; Jorgensen posted his initial message on March 15. Any wrangling over an API is still down the road a bit; a PEP and changes to the language, if any, are further out still. But there does seem to be a contingent that favors a feature of this sort, so it may well work its way into, say, Python 3.10, presumably sometime in 2021.
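Jorgensen had not proposed a concrete API at this point. The sketch below shows one shape such a registry might take; `HookPrinter`, `register()`, and `pformat()` are hypothetical names. Keeping the registry on a printer instance, rather than in a dunder method, lets each application override the representation of a type it does not own without subclassing or monkey-patching it:

```python
from fractions import Fraction
from typing import Any, Callable, Dict

Hook = Callable[[Any, "HookPrinter"], str]


class HookPrinter:
    """HYPOTHETICAL printer holding its own type -> formatter registry."""

    def __init__(self) -> None:
        self._hooks: Dict[type, Hook] = {}

    def register(self, cls: type) -> Callable[[Hook], Hook]:
        def decorator(func: Hook) -> Hook:
            self._hooks[cls] = func
            return func
        return decorator

    def pformat(self, obj: Any) -> str:
        # Walk the MRO so a hook registered for a base class covers subclasses.
        for klass in type(obj).__mro__:
            hook = self._hooks.get(klass)
            if hook is not None:
                return hook(obj, self)
        return repr(obj)


printer = HookPrinter()


@printer.register(Fraction)
def _(value: Fraction, p: HookPrinter) -> str:
    return f"{value.numerator}/{value.denominator}"


@printer.register(list)
def _(value: list, p: HookPrinter) -> str:
    # Hooks receive the printer so they can format their components.
    return "[" + ", ".join(p.pformat(v) for v in value) + "]"


result = printer.pformat([Fraction(1, 3), 2])  # "[1/3, 2]"
```

A second `HookPrinter()` starts with an empty registry, so the same `Fraction` falls back to its ordinary `repr()` there.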






Posted Mar 19, 2020 1:19 UTC (Thu) by NYKevin (subscriber, #129325)

If you want to do this "at the level of application I/O," I would tend to assume you'd just subclass reprlib.Repr and implement the repr_foo() method overrides. Since that's already in the standard library, I'm not sure I understand what they are proposing... Maybe you could add a fallback option where reprlib.Repr.repr1() dispatches to some dunder method on the object (with the expectation that the object call back into repr1() when it wants to format a subobject)? That seems messy, since I usually think of dunder methods as belonging to the language rather than the standard library. But there is precedent for this sort of thing (e.g. __copy__ and __deepcopy__), so I suppose it could work.
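A minimal sketch of both halves of this comment. `reprlib.Repr.repr1()` dispatches to a method named `repr_` plus the type name, so defining `repr_float()` customizes every float, including those nested inside containers. For types with no such method, CPython's `repr1()` falls back to `repr_instance()`, which exists in the module source but is not documented; overriding it to look for a dunder is one way to build the suggested fallback. The `__pretty_repr__` name and the `Interval` class are hypothetical.

```python
import math
import reprlib


class AppRepr(reprlib.Repr):
    # Found by repr1() through the "repr_" + type-name lookup.
    def repr_float(self, x, level):
        if math.isinf(x):
            return "∞" if x > 0 else "-∞"
        return repr(x)

    # repr1() falls back here for types without a repr_<type> method.
    # __pretty_repr__ is a HYPOTHETICAL dunder, not part of reprlib.
    def repr_instance(self, x, level):
        hook = getattr(type(x), "__pretty_repr__", None)
        if hook is not None:
            return hook(x, self, level)
        return super().repr_instance(x, level)


class Interval:
    def __init__(self, lo, hi):
        self.lo, self.hi = lo, hi

    def __pretty_repr__(self, printer, level):
        # Call back into repr1() to format sub-objects, one level deeper.
        lo = printer.repr1(self.lo, level - 1)
        hi = printer.repr1(self.hi, level - 1)
        return f"[{lo} .. {hi}]"


shown = AppRepr().repr([Interval(0.0, float("inf")), -math.inf])
```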


Posted Mar 21, 2020 7:22 UTC (Sat) by divbzero (guest, #137744)

I don’t have a strong opinion on this, but would note that the debate over a __pretty__ method seems to parallel the debate over a __json__ method. [1] [2]

[1]: https://bugs.python.org/issue27362

[2]: https://mail.python.org/pipermail/python-ideas/2010-July/...

__json__ has not gained traction, probably because JSON serialization can be application dependent and JSONEncoder already provides a flexible way to customize JSON serialization.
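For comparison, serialization customization in the json module happens on the encoder side: a `json.JSONEncoder` subclass overrides `default()`, which is called only for objects the encoder cannot handle natively. The types handled and the choices made for them here (sorted lists for sets, strings for `Decimal`) are illustrative assumptions:

```python
import json
from decimal import Decimal


class AppEncoder(json.JSONEncoder):
    # The application, not the object, decides the JSON form.
    def default(self, o):
        if isinstance(o, (set, frozenset)):
            return sorted(o)
        if isinstance(o, Decimal):
            return str(o)
        return super().default(o)  # raises TypeError for anything else


encoded = json.dumps({"tags": {"b", "a"}, "price": Decimal("9.99")}, cls=AppEncoder)
# encoded == '{"tags": ["a", "b"], "price": "9.99"}'
```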

__pretty__ seems even more in the eye of the beholder so I’m not surprised to see hesitant reactions to the idea.


Copyright © 2020, Eklektix, Inc.
This article may be redistributed under the terms of the Creative Commons CC BY-SA 4.0 license
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds