
Python and public APIs

By Jake Edge
July 31, 2019

In theory, the public API of a Python standard library module is fully specified as part of its documentation, but in practice it may not be quite so clear-cut. There are other ways to specify the names in a module that are meant to be public, and there are naming conventions for things that should not be public (e.g. the name starts with an underscore), but there is no real consistency in how those are used throughout the standard library. A mid-July discussion on the python-dev mailing list considered the problem and some possible solutions; the main outcome seems to be interest in making the rules more explicit.

It should be noted that the Python language does not enforce any access restrictions at all; any program that can import a module can access any top-level name defined in it. All of the "rules" that govern access restrictions are simply conventions, though they are meant to delineate things that can be changed by a module maintainer without going through the usual deprecation cycle. A big part of the public API is effectively a list of names that the module maintainer promises not to change without a good deal of warning (at least two full development cycles).

Rules?

Serhiy Storchaka raised the issue by listing the rules that he thought governed the public/private question for names in modules. They revolve around the use of the __all__ attribute, which is a way to list names (or submodules) that should be imported when a "from module import *" is executed. If there is no __all__, that statement will import any names that do not start with an underscore, so those names would be part of the public API, Storchaka suggested. In addition, any name that was explicitly documented to be part of the public API would be.
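The star-import behavior described here can be observed with a small experiment. This is only a sketch: star_import() is a hypothetical helper that builds a temporary module named _star_demo from a source string and reports which names a star import binds from it.

```python
import sys
import types

def star_import(source):
    """Return the sorted names that 'from _star_demo import *' would bind."""
    module = types.ModuleType("_star_demo")  # hypothetical throwaway module
    exec(source, module.__dict__)
    sys.modules["_star_demo"] = module
    try:
        namespace = {}
        exec("from _star_demo import *", namespace)
    finally:
        del sys.modules["_star_demo"]
    # exec() adds __builtins__ to the namespace; it is not part of the import.
    namespace.pop("__builtins__", None)
    return sorted(namespace)

# No __all__: every name without a leading underscore is picked up,
# including the module that the demo source itself imported.
WITHOUT_ALL = "import os\ndef helper(): pass\ndef _hidden(): pass\n"

# With __all__: only the listed names are picked up.
WITH_ALL = "__all__ = ['api']\ndef api(): pass\ndef helper(): pass\n"

print(star_import(WITHOUT_ALL))  # ['helper', 'os']
print(star_import(WITH_ALL))     # ['api']
```

Note that without __all__ the imported os module leaks into the star import, which is one reason a bare "no underscore means public" reading can be misleading.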

He noted that two bug reports with some recent comments seemed to violate his mental model of how the public API is specified. In the first, Raymond Hettinger asked that all non-public functions in the calendar module be renamed to start with an underscore. In the other, Gregory P. Smith suggested documenting the escape_decode() function in the codecs module because "it is public by virtue of its name". escape_decode() is recommended in answers at sites like Stack Overflow, which is part of what motivated the suggestion.

But in both cases, Storchaka said, the modules have __all__ attributes where the names in question are not listed, so they should not be considered part of the public API. Thus they don't need underscores or documentation, Stack Overflow notwithstanding. Hettinger argued that the calendar module was one that had adhered to the underscore convention along the way until "a recent patch went against that practice". It came to his attention via a tweet from a confused user.

As Storchaka pointed out, however, calendar already had quite a few non-public functions that did not start with an underscore back in Python 3.6. Part of the problem is that some people are using the dir() builtin to examine the names in a module. But dir(module) will give a list of all of the names, public or private, without regard for __all__ or the underscore-prefix convention. Storchaka said that dir() is not the proper tool and suggested the help() builtin instead (e.g. help(module)).
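To get a feel for how far dir() and __all__ diverge for a module such as calendar, a helper along these lines could be used. unlisted_names() is a hypothetical function written for this illustration; its output depends on the Python version, so none is shown.

```python
import calendar

def unlisted_names(module):
    """Names that dir() reports without a leading underscore but __all__ omits."""
    listed = set(getattr(module, "__all__", ()))
    return sorted(
        name
        for name in dir(module)
        if not name.startswith("_") and name not in listed
    )

# These look public by the underscore convention alone, yet are not in __all__.
print(unlisted_names(calendar))
```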

The first line of Hettinger's mail should cover the question: "The RealDefinition™ is that whatever we include in the docs is public, otherwise not." His point about maintaining the conventions used by a module (though calendar is apparently not a good example) was a good one, Brett Cannon said. He thought that the core developers should encourage the leading-underscore practice for new modules, in fact.

But a suggestion from Kyle Stanley to do a mass rename of the standard library did not get far. There are logistical hurdles, in terms of the deprecation cycle, but there is also a question of whether it would solve a real problem or not. Steven D'Aprano pointed out that he had rarely seen people misuse the private parts of a module, "but frankly that's going to happen even if we named them all '_private_implementation_detail_dont_use_this_you_have_been_warned' *wink*". Meanwhile, though, there are a lot of costs to making the change, which D'Aprano described at some length.

He also mentioned a "rule" that governs all of this: "unless explicitly documented public, all imports are private even if not prefixed with an underscore". Stanley replied that he was rethinking his advocacy of a tree-wide change, but wondered where the rule was specified. Steve Dower said that the rule was probably not documented anywhere, but thought D'Aprano's formulation was a reasonable one. In addition, Stanley suggested another path toward cleaning up the inconsistencies in the standard library: the @public decorator.

atpublic

Early on in the thread, Barry Warsaw pointed to his @public decorator project, noting one of the "Zen of Python" principles: "Explicit is better than implicit." The module is available from PyPI (thus via pip) under the name "atpublic"; it provides a simple decorator that can be used to explicitly indicate the public names in a module:

    @public
    def foo():
        pass

    def bar():
        pass

    @public
    class Baz:
        pass

    public(QUX=42)

The function foo() and class Baz would be listed in the __all__ attribute, while bar() would not. Since constants cannot be decorated, public() can be used to both define the name and add it to __all__ as seen with QUX above. That way, __all__ will always reflect the current state of the intended public API.

Dower did not think @public was the way forward, in part because it should be harder, rather than easier, to change the names of the public API. He also mentioned the runtime overhead of @public, but Warsaw and others pointed out various ways to make that essentially disappear (though it is pretty tiny even with the pure-Python implementation of the module). Dower backed away from the performance impact question and, instead, looked at the downsides of a tree-wide change. He was also concerned with making changes before the actual policy was clearly articulated:

> We already maintain separate documentation from the source code, and this is the canonical reference for what is public or not. Until we make a new policy for __all__ to be the canonical reference, touching every file to use it is premature (let alone adding a builtin).
>
> So I apologise for mentioning that people care about import performance. Let's ignore them/that issue for now and worry instead about making sure people (including us!) know what the canonical reference for public/internal is.

Stanley and Warsaw were both in favor of making it clear what the rule is for delineating the public API, and Cannon said that PEP 8 ("Style Guide for Python Code") would be the right place to put it. Warsaw noted that he was not envisioning some tree-wide operation should @public be adopted; that decorator can and should only be added incrementally. He was also concerned that there have been inconsistencies between the code and the documentation in the past. "The question always becomes whether the source or the documentation is the source of truth. For any individual case, we don't always come down on the same side of that question."

For the most part, there was widespread agreement on the underlying rules for determining the public API. D'Aprano's formulation seems to be a nice, compact way to put it, but a more detailed statement might make it clearer. APIs are tricky beasts; in general, development projects do not spend enough time designing, reviewing, and testing them before they commit to them for the long term. If the rules governing what is even in the API are not clear, it makes things that much worse. Resolving that ambiguity for Python would be a nice step forward.





Python and public APIs

Posted Aug 1, 2019 2:06 UTC (Thu) by NYKevin (subscriber, #129325)

The arguments in bug #28292 (the calendar bug) seem to be going in circles:

  • They don't want a lot of code churn from refactoring.
  • They want standard modules to follow consistent rules.

I'm unconvinced you can do both those things at once, in general. Of course, these desires are expressed by different people, so it may just be a difference of opinion.

The other side of the question is whether, and to what extent, the language should change its behavior to facilitate this. I doubt anybody is seriously advocating for Python to introduce formal access control specifiers (public/private like in Java), obviously. But I'm firmly of the opinion that PEP 484 (optional static typing) was a Good Idea, and it may be desirable to take a similar approach here. IIRC pylint can already tell you if you're importing or using a non-public interface, so it may just be a matter of formalizing the rules and/or exposing a warning hook in the interpreter.

Or perhaps Python should go the route of Perl and Javascript, and grow some kind of 'use strict' declaration for these things... from __future__ import i_wish_i_was_writing_java?

Python and public APIs

Posted Aug 1, 2019 13:27 UTC (Thu) by mathstuf (subscriber, #69389)

> "The RealDefinition™ is that whatever we include in the docs is public, otherwise not."

I hope this is adopted because then maybe some of the darker corners of the API will finally get better documentation. The parts around metaloaders, metasearchers, etc. are really sparse for how they're supposed to actually be used. Is a searcher supposed to use a single loader instance for multiple modules? If not, why pass the module name to each method of the loader? Seems kind of inefficient to me to have to do the name lookup N times instead of having the searcher amortize it for you. If you're trying to do some even vaguely off-the-beaten-path stuff with setup.py, the docs are pretty silent (I ended up having to go to the *tests* to figure out what some keyword arguments were supposed to be).

I will say the stdlib API is usually of very high quality, but some of the things "behind the curtains" towards the mechanisms are not so nice.


Copyright © 2019, Eklektix, Inc.
This article may be redistributed under the terms of the Creative Commons CC BY-SA 4.0 license
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds