Python and public APIs
In theory, the public API of a Python standard library module is fully specified as part of its documentation, but in practice it may not be quite so clear cut. There are other ways to specify the names in a module that are meant to be public, and there are naming conventions for things that should not be public (e.g. the name starts with an underscore), but there is no real consistency in how those are used throughout the standard library. A mid-July discussion on the python-dev mailing list considered the problem and some possible solutions; the main outcome seems to be interest in making the rules more explicit.
It should be noted that the Python language does not enforce any access restrictions at all; any program that can import a module can access any top-level name defined in it. All of the "rules" that govern access restrictions are simply conventions, though they are meant to delineate things that can be changed by a module maintainer without going through the usual deprecation cycle. A big part of the public API is effectively a list of names that the module maintainer promises not to change without a good deal of warning (at least two full development cycles).
Rules?
Serhiy Storchaka raised the issue by listing the rules that he thought governed the public/private question for names in modules. They revolve around the use of the __all__ attribute, which is a way to list names (or submodules) that should be imported when a "from module import *" is executed. If there is no __all__, Python will import any names that do not start with an underscore in a "from module import *", so those names would be part of the public API, Storchaka suggested. In addition, any name that was explicitly documented to be part of the public API would be public as well.
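The star-import behavior described above can be checked with a small experiment. What follows is only a sketch: the module names demo_no_all and demo_with_all and the helpers make_module() and star_import() are made up for illustration, and the modules are built in memory rather than loaded from files.

```python
import sys
import types


def make_module(name, source):
    """Build an in-memory module from source text and register it for import."""
    module = types.ModuleType(name)
    exec(source, module.__dict__)
    sys.modules[name] = module
    return module


def star_import(name):
    """Return the names that 'from <name> import *' binds in a fresh namespace."""
    namespace = {}
    exec(f"from {name} import *", namespace)
    namespace.pop("__builtins__", None)  # added by exec(), not by the import
    return sorted(namespace)


# Hypothetical module without __all__: only underscore-free names are exported.
make_module("demo_no_all", "def helper(): pass\ndef _hidden(): pass\n")

# Hypothetical module with __all__: only the listed names are exported.
make_module(
    "demo_with_all",
    "__all__ = ['api']\ndef api(): pass\ndef helper(): pass\n",
)

no_all_names = star_import("demo_no_all")      # ['helper']
with_all_names = star_import("demo_with_all")  # ['api']

# Nothing is enforced: an explicit import of an underscore name still works.
explicit = {}
exec("from demo_no_all import _hidden", explicit)
explicit_ok = "_hidden" in explicit            # True

print(no_all_names, with_all_names, explicit_ok)
```

Note that helper() in demo_with_all is left out of the star import even though it has no leading underscore, which is the situation Storchaka's rules are meant to cover.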
He noted that two bug reports with some recent comments seemed to violate his mental model of how the public API is specified. In the first, Raymond Hettinger asked that all non-public functions in the calendar module be renamed to start with an underscore. In the other, Gregory P. Smith suggested documenting the escape_decode() function in the codecs module because "it is public by virtue of its name". escape_decode() is recommended in answers at sites like Stack Overflow, which is part of what motivated the suggestion.
But in both cases, Storchaka said, the modules have __all__ attributes where the names in question are not listed, so they should not be considered part of the public API. Thus they don't need underscores or documentation, Stack Overflow notwithstanding. Hettinger argued that the calendar module was one that had adhered to the underscore convention along the way until "a recent patch went against that practice". It came to his attention via a tweet from a confused user.
As Storchaka pointed out, however, calendar already had quite a few non-public functions that did not start with an underscore back in Python 3.6. Part of the problem is that some people are using the dir() builtin to examine the names in a module. But dir(module) will give a list of all of the names, public or private, without regard for __all__ or the underscore-prefix convention. Storchaka said that dir() is not the proper tool and suggested the help() builtin instead (e.g. help(module)).
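The gap between dir() and __all__ is easy to see on the calendar module itself. This is a rough sketch; the variable names are arbitrary, and the exact list printed will vary from one Python version to the next, so no expected output is shown.

```python
import calendar

# Names the module declares as its star-import surface.
declared = set(calendar.__all__)

# Names that dir() reports, that lack a leading underscore, and that are
# nevertheless absent from __all__ (imported modules, helpers, and so on).
unlisted = [
    name
    for name in dir(calendar)
    if not name.startswith("_") and name not in declared
]

print(len(dir(calendar)), "names reported by dir()")
print(len(declared), "names listed in __all__")
print("unprefixed but unlisted:", unlisted)
```

Modules that calendar imports for its own use tend to show up in the last list, which is one reason dir() is a poor guide to what is public.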
The first line of Hettinger's mail should cover the question: "The RealDefinition™ is that whatever we include in the docs is public, otherwise not." His point about maintaining the conventions used by a module (though calendar is apparently not a good example) was a good one, Brett Cannon said. He thought that the core developers should encourage the leading-underscore practice for new modules, in fact.
But a suggestion from Kyle Stanley to do a mass rename of the standard library did not get far. There are logistical hurdles, in terms of the deprecation cycle, but there is also a question of whether it would solve a real problem or not. Steven D'Aprano pointed out that he had rarely seen people misuse the private parts of a module, "but frankly that's going to happen even if we named them all '_private_implementation_detail_dont_use_this_you_have_been_warned' *wink*". Meanwhile, though, there are a lot of costs to making the change, which D'Aprano described at some length. He also mentioned a "rule" that governs all of this: "unless explicitly documented public, all imports are private even if not prefixed with an underscore".
Stanley replied that he was rethinking his advocacy of a tree-wide change, but wondered where the rule was specified. Steve Dower said that the rule was probably not documented anywhere, but thought D'Aprano's formulation was a reasonable one. In addition, Stanley suggested another path toward cleaning up the inconsistencies in the standard library: the @public decorator.
atpublic
Early on in the thread, Barry Warsaw pointed to his @public decorator project, noting one of the "Zen of Python" principles: "Explicit is better than implicit." The module is available from PyPI (thus via pip) under the name "atpublic"; it provides a simple decorator that can be used to explicitly indicate the public names in a module:
    @public
    def foo():
        pass

    def bar():
        pass

    @public
    class Baz:
        pass

    public(QUX=42)

The function foo() and class Baz would be listed in the __all__ attribute, while bar() would not. Since constants cannot be decorated, public() can be used to both define the name and add it to __all__ as seen with QUX above. That way, __all__ will always reflect the current state of the intended public API.
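To give a feel for how such a decorator could work, here is a minimal sketch. It is not the atpublic implementation; it assumes that the caller's module globals can be found with sys._getframe(), which is a CPython detail, and the demo_source and demo_globals names exist only to run the article's example in an isolated namespace.

```python
import sys


def public(thing=None, **constants):
    """Hypothetical sketch: record names in the calling module's __all__."""
    # Globals of the module-level code that called public().
    module_globals = sys._getframe(1).f_globals
    exported = module_globals.setdefault("__all__", [])

    # Either a decorated function/class, or keyword-style constants.
    names = {thing.__name__: thing} if thing is not None else constants
    for name, value in names.items():
        module_globals[name] = value  # defines constants such as QUX
        if name not in exported:
            exported.append(name)
    return thing


# The article's example, executed in a throwaway namespace.
demo_source = '''
@public
def foo():
    pass

def bar():
    pass

@public
class Baz:
    pass

public(QUX=42)
'''

demo_globals = {"public": public, "__name__": "demo"}
exec(demo_source, demo_globals)

exported_names = demo_globals["__all__"]
print(exported_names)       # ['foo', 'Baz', 'QUX']
print(demo_globals["QUX"])  # 42
```

The only work done per decorated name is a list append at import time, which fits the observation in the thread that the overhead of the pure-Python approach is small.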
Dower did not think @public was the way forward, in part because it should be harder, rather than easier, to change the names of the public API. He also mentioned the runtime overhead of @public, but Warsaw and others pointed out various ways to make that essentially disappear (even though it is pretty tiny even with the pure Python implementation of the module). Dower backed away from the performance impact question and, instead, looked at the downsides of a tree-wide change. He was also concerned with making changes before the actual policy was clearly articulated:
So I apologise for mentioning that people care about import performance. Let's ignore them/that issue for now and worry instead about making sure people (including us!) know what the canonical reference for public/internal is.
Stanley and Warsaw were both in favor of making it clear what the rule is for delineating the public API, and Cannon said that PEP 8 ("Style Guide for Python Code") would be the right place to put it. Warsaw noted that he was not envisioning some tree-wide operation should @public be adopted; that decorator can and should only be added incrementally. He was also concerned that there have been inconsistencies between the code and the documentation in the past. "The question always becomes whether the source or the documentation is the source of truth. For any individual case, we don't always come down on the same side of that question."
For the most part, there was widespread agreement on the underlying rules for determining the public API. D'Aprano's formulation seems to be a nice, compact way to put it, but a more detailed statement might make it clearer. APIs are tricky beasts; in general, development projects do not spend enough time designing, reviewing, and testing them before they commit to them for the long term. If the rules governing what is even in the API are not clear, it makes things that much worse. Resolving that ambiguity for Python would be a nice step forward.
Posted Aug 1, 2019 2:06 UTC (Thu)
by NYKevin (subscriber, #129325)
The arguments in bug #28292 (the calendar bug) seem to be going in circles:
I'm unconvinced you can do both those things at once, in general. Of course, these desires are expressed by different people, so it may just be a difference of opinion.
The other side of the question is whether, and to what extent, the language should change its behavior to facilitate this. I doubt anybody is seriously advocating for Python to introduce formal access control specifiers (public/private like in Java), obviously. But I'm firmly of the opinion that PEP 484 (optional static typing) was a Good Idea, and it may be desirable to take a similar approach here. IIRC pylint can already tell you if you're importing or using a non-public interface, so it may just be a matter of formalizing the rules and/or exposing a warning hook in the interpreter.
Or perhaps Python should go the route of Perl and Javascript, and grow some kind of 'use strict' declaration for these things... from __future__ import i_wish_i_was_writing_java?
Posted Aug 1, 2019 13:27 UTC (Thu)
by mathstuf (subscriber, #69389)
I hope this is adopted because then maybe some of the darker corners of the API will finally get better documentation. The parts around metaloaders, metasearchers, etc. are really sparse for how they're supposed to actually be used. Is a searcher supposed to use a single loader instance for multiple modules? If not, why pass the module name to each method of the loader? Seems kind of inefficient to me to have to do the name lookup N times instead of having the searcher amortize it for you. If you're trying to do some even vaguely off-the-beaten-path stuff with setup.py, the docs are pretty silent (I ended up having to go to the *tests* to figure out what some keyword arguments were supposed to be).
I will say the stdlib API is usually of very high quality, but some of the things "behind the curtains" towards the mechanisms are not so nice.