
Improving Python's SimpleNamespace

By Jake Edge
April 29, 2020

Python's SimpleNamespace class provides an easy way for a programmer to create an object to store values as attributes without creating their own (almost empty) class. While it is useful (and used) in its present form, Raymond Hettinger thinks it could be better. He would like to see the hooks used by mappings (e.g. dictionaries) added to the class, so that attributes can be added and removed using either x.a or x['a']. That change would benefit JSON handling, among other uses.

A SimpleNamespace provides a mechanism to instantiate an object that can hold attributes and nothing else. It is, in effect, an empty class with a fancier __init__() and a helpful __repr__():

    >>> from types import SimpleNamespace
    >>> sn = SimpleNamespace(x=1, y=2)
    >>> sn
    namespace(x=1, y=2)
    >>> sn.z = 'foo'
    >>> del sn.x
    >>> sn
    namespace(y=2, z='foo')

Hettinger proposed his idea to the python-dev mailing list in mid-April. He described it as follows:

SimpleNamespace() is really good at giving attribute style-access. I would like to make that functionality available to the JSON module (or just about anything else that accepts a custom dict) by adding the magic methods for mappings so that this works:
     catalog = json.load(f, object_hook=SimpleNamespace)
     print(catalog['clothing']['mens']['shoes']['extra_wide']['quantity']) # currently possible with dict()
     print(catalog.clothing.mens.shoes.extra_wide.quantity)                # proposed with SimpleNamespace()
     print(catalog.clothing.boys['3t'].tops.quantity)                      # would also be supported
The json.load() function will use the object_hook to create SimpleNamespace objects rather than dictionaries. Then a mixture of operations can be used to retrieve information from the data structure. In effect, json.load() will be using dictionary-style access to store things into the data structure and Hettinger wants the ability to work with it using attribute notation.
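Here is a rough sketch of what Hettinger's request amounts to. It assumes the simplest design, in which item access is forwarded to attribute access. The class name `AttrNamespace` and the helper `load_namespace()` are hypothetical and not part of the standard library. At the time of the discussion, SimpleNamespace accepted only keyword arguments, so the hook unpacks each decoded dict instead of passing `object_hook=SimpleNamespace` directly as the quoted example does.

```python
import json
from types import SimpleNamespace


class AttrNamespace(SimpleNamespace):
    """Hypothetical SimpleNamespace variant that also supports x['a'].

    Item access is forwarded to attribute access, so both spellings
    read and write the same underlying attribute.
    """

    def __getitem__(self, key):
        try:
            return getattr(self, key)
        except AttributeError:
            # Mapping-style callers expect KeyError, not AttributeError
            raise KeyError(key) from None

    def __setitem__(self, key, value):
        setattr(self, key, value)

    def __delitem__(self, key):
        try:
            delattr(self, key)
        except AttributeError:
            raise KeyError(key) from None


def load_namespace(text):
    # json calls object_hook with each decoded JSON object (a dict),
    # innermost objects first, so nested objects become AttrNamespace too
    return json.loads(text, object_hook=lambda d: AttrNamespace(**d))


catalog = load_namespace(
    '{"clothing": {"mens": {"shoes": {"extra_wide": {"quantity": 7}}},'
    ' "boys": {"3t": {"tops": {"quantity": 3}}}}}'
)
print(catalog['clothing']['mens']['shoes']['extra_wide']['quantity'])  # 7
print(catalog.clothing.mens.shoes.extra_wide.quantity)                # 7
print(catalog.clothing.boys['3t'].tops.quantity)                      # 3
```

Because every lookup goes through getattr(), a JSON key that collides with a built-in attribute such as `__class__` would not behave as plain data. A real implementation would have to settle edge cases like that one.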

There are examples of production code that does this sort of thing, he said, but each user needs to reinvent the wheel: "This is kind of [a] bummer because the custom subclasses are a pain to write, are non-standard, and are generally somewhat slow." He had started with a feature request in the Python bug tracker, but responses there suggested adding a new class instead. He was not enthusiastic about that approach:

[...] but I don't see the point in substantially duplicating everything SimpleNamespace already does just so we can add some supporting dunder methods. Please add more commentary so we can figure-out the best way to offer this powerful functionality.

Guido van Rossum thought that kind of usage was not particularly Pythonic, and was not really in favor of propagating it:

I've seen this pattern a lot at a past employer, and despite the obvious convenience I've come to see it as an anti-pattern: for people expecting Python semantics it's quite surprising to read code that writes foo.bar and then reads back foo['bar']. We should not try to import JavaScript's object model into Python.

Kyle Stanley wondered if it made sense for the feature to reside in the json module; "that seems like the most useful and intuitive location for the dot notation". He thought that JSON users would not be surprised by that style of usage, but Van Rossum disagreed:

Well, as a user of JSON in Python I *would* be surprised by it, since the actual JSON notation uses dicts, and most Python code I've seen that access raw JSON data directly uses dict notation. Where you see dot notation is if the raw JSON dict is verified and converted to a regular object (usually with the help of some schema library), but there dict notation is questionable.

Several others agreed that the duality of object and dictionary access was not a good fit for Python, but there is still a problem to be solved, as Hettinger noted: "working with heavily nested dictionaries (typical for JSON) is no fun with square brackets and quotation marks". Victor Stinner listed a handful of different projects from the Python Package Index (PyPI) that provide some or all of the features that are desired, but he did not see that any of those had "been battle-tested and gained enough popularity" that they should be considered for the standard library.

Stinner (and others in the thread) pointed to the glom library as one that might be of use in working with deeply nested JSON data. But the "AttrDict" pattern is rather popular, as Hettinger pointed out. glom can do lots more things, but it is not able to freely mix and match the two access types as Hettinger wants.

There were some who thought it might be reasonable for the json module to provide the functionality, as Stanley had suggested, including Van Rossum who seemed to come around to the idea. Glenn Linderman supported adding the feature in a bug report comment; he thinks it is useful well beyond just JSON. "Such a feature is just too practical not to be Pythonic." Similarly, Cameron Simpson thought it would make a good addition:

I'm with Raymond here. I think my position is that unlike most classes, SimpleNamespace has very simple semantics, and no __getitem__ facility at all, so making __getitem__ map to __getattr__ seems low impact.

It is true that adding dictionary-like functionality to SimpleNamespace should not affect existing code, but most in the thread still seem to be against adding the feature to that class. Eric Snow put it this way:

Keep in mind that I added SimpleNamespace when implementing PEP [Python Enhancement Proposal] 421, to use for the new "sys.implementation". The whole point was to keep it simple, as the docs suggest.

Perhaps the most radical suggestion came from Rob Cliffe. He thought it might make sense to add a new operator to the language (perhaps "..") with no default implementation. That would allow classes to define the operator for themselves:

Then in a specific class you could implement x..y to mean x['y'] and then you could write
    obj..abc..def..ghi
Still fairly concise, but warns that what is happening is not normal attribute lookup.

As Stinner pointed out, though, that and some of the other more speculative posts probably belonged in a python-ideas thread instead. It does not seem particularly likely that SimpleNamespace will be getting this added feature anytime soon—or at all. There is enough opposition to making that change, but there is recognition of the problem, so some other solution might come about. It would, presumably, need the PEP treatment, though; a visit to python-ideas might be in the offing as well.





Improving Python's SimpleNamespace

Posted Apr 29, 2020 17:18 UTC (Wed) by NYKevin (subscriber, #129325) [Link] (3 responses)

Honestly, SimpleNamespace has always struck me as a weird class. It feels like a dict() playing dress-up as a "real" object. But I've never understood why that is considered useful or desirable. If you have a particular schema in mind, then surely dataclasses (or attrs) are more useful than SimpleNamespace, and if you don't, then you probably ought to be using a dict instead, to call attention to the fact that d['key'] can throw. I suppose it might be useful as a shortcut for "I would use dataclasses but I don't actually need their functionality" - but then you really ought to document the schema somewhere else, and I don't think that actually saves you all that much work. dataclasses do not take a lot of typing to create, after all.

(I'm also not a huge fan of JSON's object_hook, since it's slightly too dumb to actually deal with complicated JSON object structures correctly - the type of an object is context-sensitive and needs to be recursively deduced based on the parent type, unless you were clever and preemptively tagged all of your JSON objects with a type hint. Unfortunately, I don't see that sort of tagging much in practice. So you end up just converting everything to dicts and then manually parsing it into the actual object type. A declarative way of writing these schemata, and passing them directly to json.load(), would be Nice To Have. Perhaps just pass the dataclass of the root object as an argument or something like that? dataclasses already have all the introspection support required for json to figure the rest out on its own.)
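The "type hint" tagging that the comment mentions can be sketched with object_hook alone. This assumes, purely for illustration, that every typed JSON object carries a `"__type__"` key naming a dataclass in a hypothetical `TYPES` registry. The dataclasses do no type checking of their own.

```python
import json
from dataclasses import dataclass


@dataclass
class Point:
    x: int
    y: int


@dataclass
class Segment:
    start: Point
    end: Point


# Hypothetical registry: the "__type__" tag names which dataclass to build
TYPES = {"Point": Point, "Segment": Segment}


def tagged_hook(obj):
    """object_hook that turns tagged JSON objects into dataclasses."""
    tag = obj.pop("__type__", None)
    if tag is None:
        return obj  # untagged objects stay plain dicts
    # json calls the hook innermost-first, so nested values are already built
    return TYPES[tag](**obj)


text = """
{"__type__": "Segment",
 "start": {"__type__": "Point", "x": 0, "y": 0},
 "end":   {"__type__": "Point", "x": 3, "y": 4}}
"""
seg = json.loads(text, object_hook=tagged_hook)
print(seg)  # Segment(start=Point(x=0, y=0), end=Point(x=3, y=4))
```

Without the tag, the hook receives only a bare dict. It has no way to tell a Point from any other object with x and y keys, which is the context-sensitivity problem the comment describes.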

Improving Python's SimpleNamespace

Posted Apr 29, 2020 19:44 UTC (Wed) by martin.langhoff (subscriber, #61417) [Link] (2 responses)

In some languages, simple-ish objects and dicts are used (almost) interchangeably. Some folks "fold" this into their coding style. I've seen it done in PHP and in JS.

In other words -- it's dynamic typing, applied to complex variables.

Just like dynamic typing, it works well for small projects, breaks down eventually because they are not the same thing.

Improving Python's SimpleNamespace

Posted Apr 29, 2020 22:03 UTC (Wed) by NYKevin (subscriber, #129325) [Link] (1 responses)

Dynamically-typed JSON seems like a really Bad Idea to me. JSON is an interaction point with the outside world, and very likely to contain untrusted (or only marginally trusted) data. At a bare minimum, you should be checking the types of the parsed objects, recursively in all sub-objects. Otherwise, it's ridiculously easy for an attacker to cause all sorts of headaches. Consider:

>>> import json
>>> ham = json.loads(r'{"eggs": 5}')
>>> ham['eggs'] * 1000000000  # Convert seconds to nanoseconds
5000000000
>>> spam = json.loads(r'{"eggs": [1, 2, 3]}')
>>> spam['eggs'] * 1000000000  # Convert tiny list to MemoryError

And I'm sure there are more "interesting" examples than just OOMing the client. But by the time you're doing recursive type checking, it's really hard to justify not using a "real" statically-checkable type like a dataclass. The only excuse I can think of is that the standard library lacks a facility to do it automatically, which is a shame because I could probably bang that out in an hour or two (with the bulk of that time devoted to re-normalizing the weird type objects in typing.py back into classes that you can hand to isinstance()).
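The kind of recursive checking the comment describes can be sketched using only the standard library's dataclasses and typing introspection. `load_typed()` is a hypothetical helper, not an existing API. It deliberately covers only dataclasses, lists, and four scalar types; libraries such as typedload (mentioned further down the thread) handle far more cases.

```python
import dataclasses
import json
import typing
from dataclasses import dataclass


def load_typed(data, tp):
    """Recursively check JSON-decoded `data` against `tp`, building dataclasses.

    Handles only dataclasses, List[T] / list[T], and the scalars int, float,
    str and bool; anything else raises TypeError.
    """
    if dataclasses.is_dataclass(tp):
        if not isinstance(data, dict):
            raise TypeError(f"{tp.__name__}: expected object, got {type(data).__name__}")
        hints = typing.get_type_hints(tp)  # resolves string annotations too
        kwargs = {
            f.name: load_typed(data[f.name], hints[f.name])
            for f in dataclasses.fields(tp)
            if f.name in data  # absent fields fall back to defaults, if any
        }
        return tp(**kwargs)  # raises TypeError if a required field is absent
    if typing.get_origin(tp) is list:
        (item_tp,) = typing.get_args(tp)
        if not isinstance(data, list):
            raise TypeError(f"expected array, got {type(data).__name__}")
        return [load_typed(item, item_tp) for item in data]
    if tp in (int, float, str, bool):
        # bool is a subclass of int, so reject true/false where a number is expected
        if isinstance(data, bool) and tp is not bool:
            raise TypeError(f"expected {tp.__name__}, got bool")
        if tp is float and isinstance(data, int):
            return float(data)  # JSON does not distinguish 5 from 5.0
        if not isinstance(data, tp):
            raise TypeError(f"expected {tp.__name__}, got {type(data).__name__}")
        return data
    raise TypeError(f"unsupported type: {tp!r}")


@dataclass
class Ham:
    eggs: int


print(load_typed(json.loads('{"eggs": 5}'), Ham))  # Ham(eggs=5)
try:
    load_typed(json.loads('{"eggs": [1, 2, 3]}'), Ham)
except TypeError as exc:
    print("rejected:", exc)  # rejected: expected int, got list
```

Unknown keys in the input are silently ignored here. A stricter variant could reject them instead.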

Improving Python's SimpleNamespace

Posted Apr 30, 2020 7:11 UTC (Thu) by smurf (subscriber, #17840) [Link]

That's true, and a JsonSchemaDict that enforces not storing nonsense into it would be a superb idea (assuming it doesn't exist already), but this is orthogonal to the current discussion.

Improving Python's SimpleNamespace

Posted Apr 29, 2020 20:21 UTC (Wed) by amarao (guest, #87073) [Link] (21 responses)

I think the problem everyone is trying to fix is not a dict/attr thing, but simple convenience. Nested dicts and frequent access to dicts through constant keys is a pattern everyone uses. To use a value from a dict, a rather bizarre operator is needed: [' and then ']. If someone made it one character, everyone would be happy.

Improving Python's SimpleNamespace

Posted Apr 30, 2020 2:22 UTC (Thu) by dtlin (subscriber, #36537) [Link] (19 responses)

That's an interesting idea. Using something like obj!abc!def!ghi to mean obj['abc']['def']['ghi'] wouldn't break any existing code. Still feels pretty foreign to Python though.

Improving Python's SimpleNamespace

Posted Apr 30, 2020 5:12 UTC (Thu) by neilbrown (subscriber, #359) [Link] (18 responses)

> Still feels pretty foreign to Python though.

I wonder what you mean by that exactly. For example, does it contravene some part of the Zen of Python?

Simple is better than complex.
Readability counts.
Now is better than never.

All seem to support it.

The article mentioned a ".." suggestion which is much like this "!" suggestion. Would ".." feel less foreign??

Improving Python's SimpleNamespace

Posted Apr 30, 2020 6:19 UTC (Thu) by dtlin (subscriber, #36537) [Link] (17 responses)

There should be one-- and preferably only one --obvious way to do it.

Obviously there are other forms of syntactic sugar in Python. But it seems to me that other sugar has more benefit than saving 3 characters (or 2 characters, in the case of obj..abc..def..ghi). IMO obj['abc']['def']['ghi'] already scores reasonably well on the simple, readable, and now measures, so a proposal should be substantially better.

Improving Python's SimpleNamespace

Posted Apr 30, 2020 7:16 UTC (Thu) by smurf (subscriber, #17840) [Link] (16 responses)

Well, I disagree. I also seem not to be the only person who has re-invented "AttrDict" (even using that exact name), so I might be a bit prejudiced.

It's also not just readability but also typing. One dot is one keystroke with pretty much any keyboard layout ever. On a German keyboard, however, [''] requires eight (brackets require AltGr while single quotes need Shift). Owch.

Typing costs of non-English keyboard layouts in programming languages

Posted Apr 30, 2020 10:42 UTC (Thu) by mbunkus (subscriber, #87248) [Link] (7 responses)

This is a real problem. I'm German and, like most Germans, used to use the German keyboard layout. Many years ago I had to write so much LaTeX by day (job) and C++ by night (hobby) that I developed pain in my right wrist (possibly RSI, though I never went to see a doctor). The reason is that on German keyboards you need to press AltGr (think "right Alt key" for non-German keyboard users) for all of the most often used characters in LaTeX: \ { } [ ]

Not only that, pressing { with one hand requires funky acrobatics, as it's on AltGr+7. Doing that for several hours a day _hurt_! Look at images of German keyboard layouts to get an idea of how you have to contort your hand for that.

The result was that I switched to English layouts, even with German keyboards. I then spent hours on implementing some way to write German Umlauts & ß without too much hassle (I also switched to using ergonomic keyboards, but that's a different topic).

smurf is right, having to type asd['qwe']['whatever'] requires a LOT of changing states of different modifier keys, it slows down typing significantly: a s d press&hold AltGr 8 release AltGr press&hold Shift # release Shift q w e press&hold Shift # release Shift press&hold AltGr+9 etc. etc.

I'm pretty sure other non-English languages have similar problems typing something like that.

Typing costs of non-English keyboard layouts in programming languages

Posted Apr 30, 2020 14:12 UTC (Thu) by NAR (subscriber, #1313) [Link] (2 responses)

This is the reason why I use a "hunglish" layout: the English layout, with the extra Hungarian characters available via AltGr (mostly) on the right side of the keyboard (e.g. 0-=[];'\), so they're fairly easy to type. I never understood how people can use the Hungarian layout for programming...

BTW, Elixir has maps (the usual associative arrays) built into the language. It also has structs, but those are implemented as maps with a special __struct__ field containing the type, and the field names of the structs are keys in the map (as atoms). The generic syntax for accessing maps is the usual map[key], while for structs it is struct.attribute. So far so good; it's probably what people coming from other languages expect. However, for some reason, if the keys of a map are atoms, the "struct syntax" also works, which sometimes drives me nuts, as I can't tell by looking at code like this:

some.thing

that it's accessing a struct or a map. It also has an interesting interplay with command-line completion: it is possible to create atom names containing spaces (a crazy idea, but possible):

iex(8)> m3 = %{:"a b" => "c"}
%{"a b": "c"}

Then when I type m3. followed by TAB, the shell helpfully completes the field name, so I get

iex(9)> m3.a b
** (CompileError) iex:9: undefined function b/0

Of course, m3."a b" or m3[:"a b"] works.

Typing costs of non-English keyboard layouts in programming languages

Posted May 4, 2020 12:14 UTC (Mon) by ballombe (subscriber, #9523) [Link] (1 responses)

I do something similar. I stopped using X for typing when xmodmap was deprecated in favor of unusable xkb.

Typing costs of non-English keyboard layouts in programming languages

Posted May 4, 2020 12:58 UTC (Mon) by mathstuf (subscriber, #69389) [Link]

I use XCompose settings, which still seem to work even in the xkb era. `setxkbmap` is how I enable AltGr/compose keys on my layout as well. `xmodmap` seems mostly replaced with libinput these days. For example, here's how I remap Caps Lock to be Backspace:

% cat /etc/udev/hwdb.d/99-kb-capslock.hwdb
evdev:input:b0011v0001p0001eAB41-*:
 KEYBOARD_KEY_70039=backspace

which means that it also works on the TTY and not just when X is running (and apps can't sniff the fact that it is Caps Lock behind my back and do the wrong thing).

Typing costs of non-English keyboard layouts in programming languages

Posted May 1, 2020 8:03 UTC (Fri) by knuto (subscriber, #96401) [Link]

This brought back some old memories...
Yes, the same issue exists in Norwegian: both {} and [] require AltGr. But when ISO 8859-1 emerged, it was still an enormous relief compared to the old 7-bit ASCII days, when a construct like

\documentstyle[12pt]{report}

would show up on Norwegian screens as

ØdocumentstyleÆ12ptÅæreportå

and 'hopeless' ('håpløst' in Norwegian) would have to be written as the less readable 'hØaaplØost'.

I ended up writing a small preprocessor for TeX (138 lines of C), which I used throughout my early years of TeX and later LaTeX work. It used /<> instead of \{}, translated all the æøå variants to the right escape sequences, and provided an escape construct @( @) and special handling of \begin{verbatim} .. \end{verbatim}.
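The garbling above comes from the Norwegian national variant of 7-bit ASCII (ISO 646-NO), which reassigned the code points of [ \ ] { | } to Æ Ø Å æ ø å. A small Python sketch of just those six substitutions reproduces the screen output quoted above. The function name is made up for illustration. This is not the commenter's preprocessor, which solved a different problem.

```python
# Code points that the Norwegian 7-bit national variant of ASCII
# (ISO 646-NO) reassigned to letters: [ \ ] { | } became Æ Ø Å æ ø å
ASCII_TO_NO7 = str.maketrans("[\\]{|}", "ÆØÅæøå")


def as_seen_on_norwegian_terminal(text):
    """Show how ASCII text would be rendered by a 7-bit Norwegian display."""
    return text.translate(ASCII_TO_NO7)


print(as_seen_on_norwegian_terminal(r"\documentstyle[12pt]{report}"))
# ØdocumentstyleÆ12ptÅæreportå
```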

Typing costs of non-English keyboard layouts in programming languages

Posted May 7, 2020 22:24 UTC (Thu) by flussence (guest, #85566) [Link]

I've been using a weird mini keyboard for a few years now. Having {}[]()<> all equally awkward to reach (though thankfully none of them require one-hand chording) has really made me appreciate programming languages that offer more than one way to do it.

But now I can empathise with the plight of users who get this experience by default. A lot of programming is still too ASCII-centric.

Typing costs of non-English keyboard layouts in programming languages

Posted May 7, 2020 23:01 UTC (Thu) by Cyberax (✭ supporter ✭, #52523) [Link] (1 responses)

Have you tried installing two keyboard layouts and switching between them (CapsLock is the best switch key)?

Typing costs of non-English keyboard layouts in programming languages

Posted May 9, 2020 16:42 UTC (Sat) by smurf (subscriber, #17840) [Link]

That's a good idea in principle – I tried, but I need to use computers where this is not possible far too often. The mental effort to switch is … painful. I positively envy people who can do that on the fly.

Improving Python's SimpleNamespace

Posted May 2, 2020 4:20 UTC (Sat) by NYKevin (subscriber, #129325) [Link] (7 responses)

One of the more interesting suggestions in that thread, from Chris Angelico:

My solution to that has usually been something along the lines of:

def get(obj, path):
    for step in path.split("-"):
        obj = obj[step]
    return obj

print(get(catalog, 'clothing-mens-shoes-extra_wide-quantity'))

Will often be custom-tweaked to the situation, but the basic idea is the same.

My 2 cents: I would much prefer forward slashes as the separator (by analogy with filesystem paths), but otherwise that looks quite reasonable to me. It also solves the foo["names with multiple words aren't valid identifiers"] problem. And as an extra bonus, this requires no changes to core Python or the standard library, and can traverse any dict-like object that supports __getitem__(), rather than being a class in its own right (no ugly multiple inheritance if you want to combine functionality with another mapping type).

Improving Python's SimpleNamespace

Posted May 2, 2020 7:12 UTC (Sat) by dtlin (subscriber, #36537) [Link] (4 responses)

Seems to me that it makes more sense to make splitting the responsibility of the caller:

import functools
import operator

def deep_getitem(obj, *path):
    return functools.reduce(operator.getitem, path, obj)

deep_getitem(catalog, *'clothing mens shoes extra_wide quantity'.split())

The caller should know what an appropriate separator is, and could even build the path up from multiple parts split in different ways if that's appropriate.

Although that reminds me of how convenient Perl's qw(...) and Ruby's %w[...] are. I wonder if there might be interest in some hypothetical w-string in Python, such that

w'hello world' == ["hello", "world"]

Improving Python's SimpleNamespace

Posted May 2, 2020 8:47 UTC (Sat) by smurf (subscriber, #17840) [Link] (1 responses)

> w'hello world' == ["hello", "world"]

One more character and it works today.

w/'hello world'

Writing the three-line singleton object 'w', with an appropriate dunder method, is left as an exercise to the reader.

Improving Python's SimpleNamespace

Posted May 2, 2020 10:30 UTC (Sat) by dtlin (subscriber, #36537) [Link]

Yeah, that would be simple, but precedence doesn't work out entirely in our favor:

w/'hello world'[0]

would result in ["h"] instead of "hello".
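Here is one possible answer to the exercise. It assumes `w` should simply call str.split() on its right operand; `_WordSplitter` and `w` are hypothetical names. The sketch also demonstrates the precedence problem: subscription binds more tightly than `/`, so the index is applied to the string before the split.

```python
class _WordSplitter:
    """Hypothetical singleton so that w/'hello world' splits into words."""

    def __truediv__(self, text):
        return text.split()


w = _WordSplitter()

print(w/'hello world')       # ['hello', 'world']
print(w/'hello world'[0])    # ['h']: indexing happens before the division
print((w/'hello world')[0])  # hello
```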

Improving Python's SimpleNamespace

Posted May 3, 2020 14:35 UTC (Sun) by kleptog (subscriber, #1183) [Link] (1 responses)

When dealing with complex data structures from a database or client, it's useful to have a kind of "deep get" like yours, but one that handles missing entries gracefully. Otherwise you end up with horrors like:
a.get('foo', {}).get('bar', {}).get('baz', None)
Ideally you'd like something that also handled arrays, but Python doesn't have an easy way to index an array with a default when you go off the end.

I've often ended up coding methods to do this, but it'd be cool if there was something standard.
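Here is a minimal sketch of such a helper. It assumes the goal is to return a default instead of raising when any step misses. `deep_get()` is a hypothetical name, not a standard-library function. List steps use ordinary Python indexing, so negative indices count from the end.

```python
def deep_get(obj, *path, default=None):
    """Follow `path` through nested dicts and lists, returning `default` on a miss.

    A step misses if the key is absent, the index is out of range, or the
    current value cannot be indexed that way (e.g. a string key on a list).
    """
    for step in path:
        if isinstance(obj, (str, bytes)):
            return default  # don't silently index into text
        try:
            obj = obj[step]
        except (KeyError, IndexError, TypeError):
            return default
    return obj


a = {"foo": {"bar": [{"baz": 1}, {"baz": 2}]}}
print(deep_get(a, "foo", "bar", 1, "baz"))          # 2
print(deep_get(a, "foo", "bar", 5, "baz"))          # None (off the end)
print(deep_get(a, "foo", "qux", "baz", default=0))  # 0
```

With it, the chained .get() calls above become deep_get(a, 'foo', 'bar', 'baz').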

Improving Python's SimpleNamespace

Posted May 7, 2020 5:45 UTC (Thu) by njs (subscriber, #40338) [Link]

This kind of stuff is exactly what the 'glom' package (linked in the article) is all about.

Improving Python's SimpleNamespace

Posted May 2, 2020 8:03 UTC (Sat) by PhilippWendler (subscriber, #126612) [Link] (1 responses)

Especially in complex cases, JSONPath is a nice standard for this. It would allow you to write 'clothing.mens.shoes.extra_wide.quantity' here, but it also supports arrays etc. (e.g. if shoes were an array and you needed to find the element in it that has the extra_wide property set to true, and then retrieve quantity from it). I have not yet used it from a Python project, but it seems there are libraries implementing it.

Improving Python's SimpleNamespace

Posted May 3, 2020 11:24 UTC (Sun) by mathstuf (subscriber, #69389) [Link]

I've always ended up using the JSON Pointer standard for stuff like this. Are they basically the same, with Pointer preferring `/` rather than `.`? But it seems there are quite a few existing standards for this. `-` truly would be an xkcd#927 instance :) .
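For comparison, JSON Pointer (RFC 6901) is small enough to sketch directly. `resolve_pointer()` is a hypothetical helper. It follows the RFC's escaping rules (`~1` for `/`, `~0` for `~`) and its array-index syntax, but it is not a complete or validated implementation.

```python
import re

_ARRAY_INDEX = re.compile(r"0|[1-9][0-9]*")  # RFC 6901: no leading zeros


def resolve_pointer(doc, pointer):
    """Resolve an RFC 6901 JSON Pointer (e.g. "/clothing/mens/0") against doc."""
    if pointer == "":
        return doc  # the empty pointer refers to the whole document
    if not pointer.startswith("/"):
        raise ValueError(f"invalid JSON Pointer: {pointer!r}")
    for token in pointer[1:].split("/"):
        # Decode "~1" before "~0" so that "~01" becomes "~1", not "/"
        token = token.replace("~1", "/").replace("~0", "~")
        if isinstance(doc, list):
            if not _ARRAY_INDEX.fullmatch(token):
                raise KeyError(token)  # includes "-", which names no existing element
            doc = doc[int(token)]  # IndexError if out of range
        else:
            doc = doc[token]  # KeyError if the member is missing
    return doc


doc = {"a/b": 1, "m~n": 2, "shoes": [{"size": 44}, {"size": 46}]}
print(resolve_pointer(doc, "/a~1b"))          # 1
print(resolve_pointer(doc, "/m~0n"))          # 2
print(resolve_pointer(doc, "/shoes/1/size"))  # 46
```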

Improving Python's SimpleNamespace

Posted May 1, 2020 1:18 UTC (Fri) by moxfyre (guest, #13847) [Link]

I agree. It's a matter of convenient access, whether you're populating contents “on the fly” (more a use case for dict) or with a relatively small set of fixed names (more a use case for a custom class or attrs).

In my vpn-slice and wtf utilities, I have long used an even simpler version of SimpleNamespace, dubbed slurpy:

# Quacks like a dict and an object
class slurpy(dict):
    def __getattr__(self, k):
        try:
            return self[k]
        except KeyError as e:
            raise AttributeError(*e.args)
    def __setattr__(self, k, v):
        self[k] = v

This allows you to create an object like d = slurpy(foo="bar", baz=1) and then refer to any of its members/contents either by member access (.foo) or by item access (["foo"]).

It's very simple and performs well for a pure-Python implementation, and it even throws KeyError or AttributeError appropriately so that callers/REPLs don't get confused by the “wrong” kind of exception.

(See https://github.com/dlenski/vpn-slice/blob/HEAD/vpn_slice/util.py#L13-L22 and https://github.com/dlenski/wtf/blob/HEAD/wtf.py#L10-L18 for some context as to how this is useful.)

Improving Python's SimpleNamespace

Posted Apr 30, 2020 7:27 UTC (Thu) by LtWorf (subscriber, #124958) [Link] (3 responses)

I have written a library, called typedload[1], that I use to put JSON data into dataclasses or similar.

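from dataclasses import dataclass, field
from typing import List

import typedload

@dataclass
class A:
    a: int
    b: List[str] = field(default_factory=list)

data = {"a": 1, "b": ["x", "y"]}
typedload.load(data, A)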

It tells mypy the output type, performs runtime checks to make sure the type is actually correct, and its exceptions offer a way to figure out where exactly in the data the error happened.

It has a number of options, for example to disallow unknown fields in the dictionaries that are not in the classes, and it allows you to define custom functions to load into whatever type.

Personally I'd rather be able to access fields that exist for sure than expect exceptions to happen all over the place.

[1]: https://github.com/ltworf/typedload

Improving Python's SimpleNamespace

Posted May 1, 2020 9:52 UTC (Fri) by tamasrepus (subscriber, #33205) [Link] (2 responses)

Improving Python's SimpleNamespace

Posted May 1, 2020 15:01 UTC (Fri) by LtWorf (subscriber, #124958) [Link]

Yes they are pretty similar.

I'd trust mine more because it has tests running on all the python versions that are supported and on mypy.

jsons doesn't seem to use mypy, because at a casual glance I found some typing errors.

jsons supports more types, but typedload has better code for unions and is more customisable.

Pretty similar but jsons is MIT licensed and typedload is GPL3 so I guess I lose on the license. Which is pretty deliberate because I don't want my free time work to be used for free by proprietary projects.

Ah, mine already exists in Debian, so that's a slight advantage.

Improving Python's SimpleNamespace

Posted Aug 7, 2022 21:48 UTC (Sun) by LtWorf (subscriber, #124958) [Link]

After over 2 years I got around to try jsons :D

It seems to be very low quality. For example loading 1.1 into a Union[int, float] returns 1, which is obviously wrong.

It's also 10 to 40 times slower than typedload.

Despite this it has 8x more downloads :)


Copyright © 2020, Eklektix, Inc.
This article may be redistributed under the terms of the Creative Commons CC BY-SA 4.0 license
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds