Python attributes, __slots__, and API design

By Jake Edge
July 6, 2021

A discussion on the python-ideas mailing list touched on a number of interesting topics, from the problems with misspelled attribute names through the design of security-sensitive interfaces and to the use of the __slots__ attribute of objects. The latter may not be all that well-known (or well-documented), but could potentially fix the problem at hand, though not in a backward-compatible way. The conversation revolves around the ssl module in the standard library, which has been targeted for upgrades, more than once, over the years—with luck, the maintainers may find time for some upgrades relatively soon.

Thomas Grainger posted about a problem he encountered when setting the minimum TLS version to use for a particular SSLContext using the following code:

    context.miunimum_version = ssl.TLSVersion.TLSv1_3

That was meant to ensure that the program would only use TLS version 1.3 (and TLS 1.4+ someday perhaps), but he observed the program using TLS 1.2. As sharp-eyed readers may have noticed, "minimum_version" has been misspelled, leading to the bug.

It is, of course, no surprise that Python happily accepts the attribute name, even though it is "wrong". In a dynamic language, there is nothing inherently wrong with setting an attribute with an arbitrary name, but this case is a little different. For one thing, SSLContext is, obviously, a security-sensitive object, so an API that requires setting attributes—correctly spelled—may be less than ideal.
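The failure mode is easy to reproduce. What follows is a minimal sketch, assuming a CPython build whose SSLContext instances still accept arbitrary attributes, as described in the thread; the variable names are only illustrative:

```python
import ssl

context = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
before = context.minimum_version

# The typo from Grainger's report: no exception and no warning.
context.miunimum_version = ssl.TLSVersion.TLSv1_3

# The real setting has not moved; the typo simply became a new
# entry in the instance dictionary that nothing ever reads.
unchanged = context.minimum_version == before
stray = "miunimum_version" in vars(context)
print(unchanged, stray)
```

Nothing in the program's output hints that the intended restriction was never applied, which is what makes the bug hard to spot.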

One way to potentially fix the problem is by using the __slots__ class variable for the SSLContext, as Jonathan Fine pointed out. A Python class that has a __slots__ entry is restricted to attribute names that are listed in the class variable:

    >>> class Foo:
    ...     __slots__ = ('bar', 'baz')
    ... 
    >>> x = Foo()
    >>> x.bar = 3
    >>> x.qux = 9
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    AttributeError: 'Foo' object has no attribute 'qux'

Fine also noted that the use of __slots__ is not well-documented; he laid down a challenge, in fact: "Find a page in docs.python.org that describes clearly, with helpful examples, when and how to use __slots__." One of the links he suggested, the "Descriptor HowTo Guide", does have a section with some concrete examples on how __slots__ could be used, though it seemingly falls short of what he was looking for.

Under the covers, __slots__ shorts out the normal per-instance dictionary for attributes and replaces it with a fixed-sized array, which saves some memory and speeds up attribute lookup; it will also catch the kind of pesky spelling error that Grainger ran into by raising an AttributeError at run time. Fine pointed to the warning in red in the ssl description, which strongly suggests reading the "Security Considerations" section of the module documentation. "Given that context is important for security, perhaps it's worthwhile closing the door to spelling errors creating security holes."
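The difference can be seen by comparing two toy classes; this is a sketch, with class names invented for illustration, that relies on CPython behavior. It also shows a caveat worth keeping in mind: a subclass that does not declare __slots__ of its own gets a per-instance dictionary back, so the typo protection does not automatically extend to subclasses.

```python
class WithDict:
    def __init__(self):
        self.bar = 1
        self.baz = 2


class WithSlots:
    __slots__ = ("bar", "baz")

    def __init__(self):
        self.bar = 1
        self.baz = 2


class Child(WithSlots):
    pass  # no __slots__ here, so instances get a __dict__ again


plain, slotted, child = WithDict(), WithSlots(), Child()

has_dict_plain = hasattr(plain, "__dict__")      # ordinary instance dictionary
has_dict_slotted = hasattr(slotted, "__dict__")  # none when __slots__ is used

# In CPython, each slot is exposed on the class as a descriptor object.
descriptor_name = type(WithSlots.bar).__name__

try:
    slotted.qux = 9
    typo_caught = False
except AttributeError:
    typo_caught = True

child.qux = 9  # accepted: the subclass instance has a __dict__
print(has_dict_plain, has_dict_slotted, descriptor_name, typo_caught)
```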

"Bluenix" also suggested __slots__, but as Guido van Rossum said, that is not a backward-compatible solution. There could be "code out there that for some reason adds private attributes to an SSLContext instance, and using __slots__ would break such usage." Even if it were determined that a compatibility break was in order here, __slots__ could not be used for other reasons, as Christian Heimes, one of the ssl maintainers, pointed out:

Also __slots__ won't work. The class has class attributes that can be modified in instances. You cannot have attributes that are both class and instance attributes with __slots__. We'd have to overwrite __setattr__() and block unknown attributes of exact instances of ssl.SSLContext.
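Both halves of that objection can be sketched with toy classes. The first part shows the conflict between a slot and a class attribute of the same name; the second is one possible reading of the __setattr__() idea, not the ssl module's code. StrictContext, Extended, and the _known set are hypothetical names made up for this illustration.

```python
# 1. A name cannot be both a slot and a class attribute.
conflict_error = None
try:
    class Conflicting:
        __slots__ = ("sslsocket_class",)
        sslsocket_class = object  # class-level default
except ValueError as exc:
    conflict_error = str(exc)


# 2. An alternative: reject unknown names in __setattr__(), but only
#    for exact instances, so that subclasses keep their freedom.
class StrictContext:
    _known = frozenset({"sslsocket_class", "minimum_version"})
    sslsocket_class = object  # class attribute, overridable per instance
    minimum_version = None

    def __setattr__(self, name, value):
        if type(self) is StrictContext and name not in self._known:
            raise AttributeError(
                f"{type(self).__name__!r} object has no attribute {name!r}"
            )
        super().__setattr__(name, value)


class Extended(StrictContext):
    pass


ctx = StrictContext()
ctx.sslsocket_class = dict  # shadows the class default on this instance only

try:
    ctx.miunimum_version = "TLSv1.3"
    typo_rejected = False
except AttributeError:
    typo_rejected = True

ext = Extended()
ext.private_note = "ok"  # subclass instances may still add attributes
print(conflict_error, typo_rejected)
```

Even this guard would break any existing code that stores private attributes directly on an instance, which is the compatibility concern Van Rossum raised.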

There are two class attributes, sslsocket_class and sslobject_class, that are specifically mentioned as being settable in an instance, but __slots__ does not work with attributes that can switch between class and instance attributes. Eric V. Smith thought that making a specific fix for SSLContext was not the right approach, though:

Isn't this a general "problem" in python, that's always been present? Why are we trying to address the problem with this specific object? I suggest doing nothing, or else thinking big and solve the general problem, if in fact it needs solving.

But the problem at hand is "that assigning attributes is a bad API", Oscar Benjamin said. He suggested adding a new class with a better interface as a backward-compatible way forward. Others agreed that because of the security-sensitive nature of SSLContext, it deserves "special" treatment; the general "problem" of misspelled attributes is not seen as something that needs to be addressed in the language, however.

Grainger is advocating using a frozen dataclass for SSLContext, though that would also break backward compatibility. Python dataclasses were added for Python 3.7 (in 2018) as a way to represent a collection of data items as the attributes on an object, similar to a C struct. A frozen dataclass gets initialized with a set of values that cannot be changed by setting the attribute directly; Grainger suggested having explicit methods to change the attributes of an SSLContext.
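As a rough sketch of that idea (TLSSettings and with_minimum_version() are hypothetical names, not anything proposed for the ssl module), a frozen dataclass rejects a misspelled attribute both at assignment time and at construction time:

```python
import ssl
from dataclasses import FrozenInstanceError, dataclass, replace


@dataclass(frozen=True)
class TLSSettings:  # hypothetical; not part of the ssl module
    minimum_version: ssl.TLSVersion = ssl.TLSVersion.TLSv1_2
    check_hostname: bool = True

    def with_minimum_version(self, version: ssl.TLSVersion) -> "TLSSettings":
        # Explicit method: returns a new object instead of mutating this one.
        return replace(self, minimum_version=version)


settings = TLSSettings()

try:
    settings.miunimum_version = ssl.TLSVersion.TLSv1_3
    assignment_typo_rejected = False
except FrozenInstanceError:
    assignment_typo_rejected = True

try:
    TLSSettings(miunimum_version=ssl.TLSVersion.TLSv1_3)
    constructor_typo_rejected = False
except TypeError:
    constructor_typo_rejected = True

stricter = settings.with_minimum_version(ssl.TLSVersion.TLSv1_3)
print(assignment_typo_rejected, constructor_typo_rejected, stricter)
```

FrozenInstanceError is a subclass of AttributeError, so callers that already catch the latter would see the typo reported the same way.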

Most who commented in the thread seemed to agree that there is a problem to be solved; Marc-Andre Lemburg put it this way:

IMO, a security relevant API should not use direct attribute access for adjusting important parameters. Those should always be done using functions or method calls which apply extra sanity checks and highlight issues in [the] form of exceptions.

Steven D'Aprano thought that a compatibility break might be in order to try to resolve a "mildly troubling security flaw/bug/vulnerability". Others were less sure of that, though. If there is to be a compatibility break, it should create "a cleaner, more Pythonic API", Brendan Barnwell said. He had some suggestions for how that might look:

Why not have the class accept only valid options at creation time and raise an error if any unexpected arguments are passed? Is there even any reason to allow changing the SSLContext parameters after creation, or could we just freeze them on instance creation and make people create a separate context if they want a different configuration? I think any of these would be better than the current setup that expects people to adjust the options by manually setting attributes one by one after instance creation.
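Application code can get part of the way there today without any change to the ssl module. The sketch below assumes a hypothetical helper, make_client_context(), that takes keyword-only options and checks them before touching the context; a misspelled option then fails at the call site rather than being silently ignored.

```python
import ssl


def make_client_context(*, minimum_version=ssl.TLSVersion.TLSv1_2):
    """Hypothetical helper: build a client context from checked options."""
    if not isinstance(minimum_version, ssl.TLSVersion):
        raise TypeError("minimum_version must be an ssl.TLSVersion")
    context = ssl.create_default_context()
    context.minimum_version = minimum_version
    return context


context = make_client_context(minimum_version=ssl.TLSVersion.TLSv1_2)

try:
    make_client_context(miunimum_version=ssl.TLSVersion.TLSv1_3)
    typo_rejected = False
except TypeError:
    typo_rejected = True

print(context.minimum_version, typo_rejected)
```

The object returned is still an ordinary SSLContext, so later attribute typos on it would go unnoticed; the helper only protects the options that pass through it.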

Heimes said that there will not be any incompatible changes made to SSLContext in the near future, however. If time is found to work on this problem for Python 3.11, changes along the lines of the configuration object in PEP 543 ("A Unified TLS API for Python") would be made. We looked at the PEP in early 2017, but it was withdrawn in mid-2020, "due to changes in the APIs of the underlying operating systems". There are still pieces of the PEP that could be used to address the problem that Grainger encountered.

The ssl module has always been a thin layer atop OpenSSL, which has undergone a number of API (and other) changes over the years. Support for TLS in the Python standard library has changed as well; up until Python 3.4 in 2014, TLS certificates were not able to be checked for validity using it at all, for example. The ssl module has seemingly always had a lack of available developer time, which is rather worrisome for a critical piece of security infrastructure. Hopefully some time can be found to at least resolve problems like this that can be caused by a simple misspelling.


Index entries for this article
Python / Security



Python attributes, __slots__, and API design

Posted Jul 7, 2021 7:01 UTC (Wed) by LtWorf (subscriber, #124958) [Link] (9 responses)

I think it's funny how python is gradually becoming not python.

Annotating types, forbidding to add attributes to objects…

I understand the issue, but I think that if it's an issue for one object it can be an issue in any case. Not all the security issues are created miscalling ssl interfaces.

Python attributes, __slots__, and API design

Posted Jul 7, 2021 11:25 UTC (Wed) by vadim (subscriber, #35271) [Link] (1 responses)

I think that's a perpetual cycle of software development.

You start excited that it's nice and easy to just get stuff done without the compiler whining about something every 5 seconds. Life is good, code gets written fast.

Then a little project grows until there's 20 devs working on it. One day you spend hours figuring out that somebody made a typo and in one obscure user branch the code sets data['Username'] rather than data['username'] and this propagates through a bunch of layers until it explodes somewhere else entirely.

And then you start getting thoughts like "If I could have a hash with a fixed set of keys, or the compiler could check that for me, a lot of annoyance could have been avoided". And so you start grafting a way to get that done to your favorite language.

In my old age, I'm starting to develop the idea that writing anything big in something like Perl or Python may be a fundamentally bad idea -- you spend more time debugging issues that could have been avoided, and if you try to graft on checks afterwards it ends up that there are several slightly different and incompatible ways of doing it floating around.

Python attributes, __slots__, and API design

Posted Jul 7, 2021 19:43 UTC (Wed) by pj (subscriber, #4506) [Link]

>In my old age, I'm starting to develop the idea that writing anything big in something like Perl or Python may be a fundamentally bad idea

I tend to agree, if only because with python nothing should be 'big' - anything that might qualify should instead be broken into a bunch of modules, pulled together by a core that uses that functionality.

Python attributes, __slots__, and API design

Posted Jul 7, 2021 14:59 UTC (Wed) by pbryan (guest, #3438) [Link]

Every language undergoes evolution and Python even underwent an unpopular revolution: Python 3000 (aka. Python 3). If Python continues to evolve to meet greater and changing sets of needs, I'm all for it.

Python attributes, __slots__, and API design

Posted Jul 7, 2021 15:38 UTC (Wed) by mb (subscriber, #50428) [Link] (5 responses)

>I think it's funny how python is gradually becoming not python.

__slots__ and typing are optional.
It's up to you, as a developer, whether you use it or not.

Python doesn't become less Python, if optional features are added.

Python attributes, __slots__, and API design

Posted Jul 7, 2021 20:26 UTC (Wed) by Sesse (subscriber, #53779) [Link] (4 responses)

Programming languages are not primarily about their language definition, but about their culture. If everyone ends up using Python types, then Python programming will effectively be different: You can try to go against the grain and not use them, but every other programmer will look at you funny, every library you interface with will require them, all documentation and sample code you can find will use them. It doesn't matter how optional the interpreter says they are; Python as a language has still changed. (Whether you consider programming with static typing “less Python” or not is a different question, of course.)

Python attributes, __slots__, and API design

Posted Jul 7, 2021 20:42 UTC (Wed) by mb (subscriber, #50428) [Link] (3 responses)

>every library you interface with will require them

That's not how typing works in Python.

Python attributes, __slots__, and API design

Posted Jul 7, 2021 21:52 UTC (Wed) by NYKevin (subscriber, #129325) [Link] (2 responses)

Just as a refresher for anyone who hasn't been following this, Python's static typing essentially consists of the following:

* Optional syntax for annotating the types of things. The interpreter treats this syntax as a glorified comment. It checks for basic syntactic validity and NameError, and that's about it. If you say that the type of an object is "12", or don't indicate a type at all, the interpreter is perfectly happy with that. If you write a syntax error, it will yell at you, but that's hardly a problem since you can just omit annotations altogether.
* The typing module, which includes a number of classes that let you write things like Union[foo, bar] or Optional[T]. The resulting objects are (for the most part) just opaque blobs that have reasonable-looking repr() text. They generally don't have any actual logic in them, and are just designed to remember the arguments you passed in.
* A set of linting rules for checking the types of things. CPython itself does not include a reference implementation of those rules, so the CPython interpreter is incapable of applying them. Instead, you have to download a third-party linter such as mypy. Obviously, the linter will be unhappy if an object's type is "12", but if you're running the linter, then you presumably wanted to be warned about that... right?

Hypothetically, if the entire Python community decided tomorrow that all new code must have static types, then the only problem you would have is that some people would file bugs against your project saying the linter doesn't like it. You can WONTFIX those bugs and carry on as usual, if that is your inclination. There should be no compatibility issues with anyone else's code. If you write library code, and your library is unannotated, then some people might be unhappy about that (because it would make the linter less accurate on their application code where it calls into your library), but that's arguably their problem, not yours.

(If you want, you can provide a set of type hints for your external API without having to type hint every line of code inside your library. This is particularly useful for C extensions, which otherwise would not be possible to annotate.)

Python attributes, __slots__, and API design

Posted Jul 7, 2021 23:03 UTC (Wed) by Sesse (subscriber, #53779) [Link] (1 responses)

I am fully aware, but it's also missing my point. Yes, you can ignore whatever you'd like and code in your own little corner. You can also write Python with braces and have a preprocessor convert them into the right indentation. But at some point, that's going to be culturally very much uphill. (I guess I should have picked a different example than types.)

Python attributes, __slots__, and API design

Posted Jul 7, 2021 23:30 UTC (Wed) by NYKevin (subscriber, #129325) [Link]

Seeing as this whole discussion was already about types to begin with, I frankly fail to understand what kind of point you were trying to make in the first place.

Python attributes, __slots__, and API design

Posted Jul 7, 2021 12:30 UTC (Wed) by smurf (subscriber, #17840) [Link] (3 responses)

Python doesn't have checks for attribute misspellings in most standard library objects.

Maybe the Python people should propagate the use of mypy and/or similar tools to find this sort of problem.

Python attributes, __slots__, and API design

Posted Jul 7, 2021 12:36 UTC (Wed) by mradziej (subscriber, #124815) [Link] (2 responses)

Exactly. There are lots of tools that can detect this kind of mistake.

And one lesson learned: Using an attribute as part of the API is usually a bad idea for a python library. Make it a function call.

Python attributes, __slots__, and API design

Posted Jul 9, 2021 18:43 UTC (Fri) by bluss (guest, #47454) [Link] (1 responses)

Using attributes is pretty much promoted as the pythonic way to do it (See: Any Raymond Hettinger talk). It's also much easier to read. The upgrade path is using properties if you need to add getter/setter logic.

Python attributes, __slots__, and API design

Posted Jul 27, 2021 18:14 UTC (Tue) by sammythesnake (guest, #17693) [Link]

The problem is that - properties or not - there's no way for the class to spot a mistyped attempt at using a valid attribute name without using __slots__ or overriding __setattr__ or the like (and being able to extend that class with a subclass needs planning for, too).

Personally, I *want* static typing and complaints about my dumb mistakes. It'd be nice to be able to get that without reams of boilerplate, which itself is exactly where I'll be making a bunch of boring adult mistakes of exactly the kind that computers are better at spotting than me!

I would love to be able to use all the rich typing info in my annotations to complain when somebody (most likely me) tries passing a parameter that is of the wrong type, but isinstance(param, mapping [str, some_type]) isn't available, so I have to write loops or comprehensions or whatever by hand for each case, making the code really verbose.

Honestly, the need for either attacks of boilerplate or deep dives into reimplementing fairly for boys of functionality just to make mistakes hard is my main gripe against python.

One thing I've found appears in my current project a dozen times is that I want a way to delegate a bunch of dunders to some class member en masse so I can make a class implement, say, the mapping protocol by nominating some dict member as the real mapping. I either have to write a list of one liner methods or read up on how to write a decorator to do it.

I think all these are things I (or some library writer) could do with decorators, but I've not found such a library and I haven't dug into how to make decorators do it. I guess I'll read up on the decorator thing...


Copyright © 2021, Eklektix, Inc.
This article may be redistributed under the terms of the Creative Commons CC BY-SA 4.0 license
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds