
Python time-zone handling

By Jake Edge
March 4, 2020

Handling time zones is a pretty messy affair overall, but language runtimes may have even bigger problems. As a recent discussion on the Python discussion forum shows, there are considerations beyond those that an operating system or distribution needs to handle. Adding support for the IANA time zone database to the Python standard library, which would allow using names like "America/Mazatlan" to designate time zones, is more complicated than one might think—especially for a language trying to support multiple platforms.

It may come as a surprise to some that Python has no support in the standard library for getting time-zone information from the IANA database (also known as the Olson database after its founder). The datetime module in the standard library has the idea of a "time zone" but populating an instance from the database is typically done using one of two modules from the Python Package Index (PyPI): pytz or dateutil. Paul Ganssle is the maintainer of dateutil and a contributor to datetime; he has put out a draft Python Enhancement Proposal (PEP) to add IANA database support as a new standard library module.

Ganssle gave a presentation at the 2019 Python Language Summit about the problem. On February 25, he posted a draft of PEP 615 ("Support for the IANA Time Zone Database in the Standard Library"). The original posted version of the PEP can be found in the PEPs GitHub repository. The datetime.tzinfo abstract base class provides ways "to implement arbitrarily complex time zone rules", but he has observed that users want to work with three time-zone types: fixed offsets from UTC, the system time zone, and IANA time zones. The standard library supports the first type with datetime.timezone objects, and the second to a certain extent, but does not support IANA time zones at all.
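The first two of those types can be sketched with what datetime already provides. This is only an illustration: the -07:00 offset and the meeting time are arbitrary values, not taken from the PEP.

```python
from datetime import datetime, timedelta, timezone

# Type 1: a fixed offset from UTC, expressed with datetime.timezone.
# The offset and the meeting time are arbitrary illustrative values.
fixed = timezone(timedelta(hours=-7))
meeting = datetime(2020, 3, 4, 9, 30, tzinfo=fixed)
meeting_utc = meeting.astimezone(timezone.utc)
print(meeting_utc.isoformat())  # 2020-03-04T16:30:00+00:00

# Type 2: the system time zone, "to a certain extent". Calling astimezone()
# with no argument attaches the current local UTC offset as a fixed-offset
# timezone object, not the zone's full set of rules.
local_now = datetime.now().astimezone()
print(local_now.tzinfo, local_now.utcoffset())
```

The second half shows the limitation: the attached object knows only the offset in effect right now, so it cannot answer questions about dates on the other side of a transition.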

There are some wrinkles to handling time zones, starting with the fact that they change—frequently. The IANA database is updated multiple times per year; "between 1997 and 2020, there have been between 3 and 21 releases per year, often in response to changes in time zone rules with little to no notice". Linux and macOS have packages with that information which get updated as usual, but the situation for Windows is more complicated. Beyond that, there is a question of what should happen in a running program when the time-zone information changes out from under it.

The PEP proposes adding a top-level zoneinfo standard library module with a zoneinfo.ZoneInfo class for objects corresponding to a particular time zone. A call like:

    tz = zoneinfo.ZoneInfo("Australia/Brisbane")

will search for a corresponding Time Zone Information Format (TZif) file in various locations to populate the object. The zoneinfo.TZPATH list will be consulted to find the file of interest.

On Unix-like systems, that variable will be set to a list of the standard locations (e.g. /usr/share/zoneinfo, /etc/zoneinfo) where the time-zone data files are normally stored. On Windows, there is no official location for the system-wide time-zone information, so TZPATH will initially be empty. The PEP proposes that a data-only tzdata package be created for PyPI that would be maintained by the CPython core developers. That could be used on Windows systems to provide a source for the IANA database information.
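As a rough sketch of the lookup described above, written against the zoneinfo module as it later shipped in Python 3.9 (the draft under discussion may have differed in details), the following inspects the search path and loads a zone. The load_zone() helper is a hypothetical convenience wrapper, not part of the module.

```python
import zoneinfo
from datetime import datetime
from zoneinfo import ZoneInfo, ZoneInfoNotFoundError


def load_zone(key):
    """Return a ZoneInfo for key, or None if no time-zone data can be found.

    Hypothetical helper for this example; not part of zoneinfo itself.
    """
    try:
        return ZoneInfo(key)
    except ZoneInfoNotFoundError:
        # Neither TZPATH nor the optional tzdata package supplied this key.
        return None


# The search path may be empty, for example on Windows.
print("search path:", zoneinfo.TZPATH)

brisbane = load_zone("Australia/Brisbane")
if brisbane is None:
    print("no IANA data found; installing the tzdata package may help")
else:
    noon = datetime(2020, 3, 4, 12, 0, tzinfo=brisbane)
    print(noon.isoformat(), noon.tzname())
```

Catching ZoneInfoNotFoundError keeps the example usable on systems that have neither a system database nor the tzdata package installed.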

By default, ZoneInfo objects would effectively be singletons; a cache would be maintained so that repeated uses of the same time-zone name would return the exact same object. That is not specifically being done for efficiency reasons, but to ensure that times in the same time zone will be handled correctly. The existing datetime arithmetic operations only consider time zones to be equal if they are the same object, not just if they contain the same information. But caching also protects running programs from strange behavior if the underlying time-zone data changes. Effectively, the data will be read once, on first use, and never change again until the interpreter is restarted.
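The singleton behavior can be checked directly. This is a small sketch against zoneinfo as it later shipped in Python 3.9; same_object_each_time() is a hypothetical helper name used only here.

```python
from zoneinfo import ZoneInfo, ZoneInfoNotFoundError


def same_object_each_time(key):
    """Report whether two lookups of key return the identical object.

    Returns None when no time-zone data for key is available.
    Hypothetical helper for this example.
    """
    try:
        first = ZoneInfo(key)
        second = ZoneInfo(key)
    except ZoneInfoNotFoundError:
        return None
    # Identity, not equality: this is what datetime arithmetic looks at.
    return first is second


print(same_object_each_time("America/New_York"))
```

When time-zone data is installed, this prints True, because the second call is served from the cache rather than by re-reading the TZif file.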

There is support for loading time zones without consulting (or changing) the cache, as well as for clearing the cache, which would effectively reload the time zone for any new ZoneInfo object. But getting updates to time zones mid-stream is problematic in its own right, Ganssle said:

In the end, always getting “the latest data” is fraught with edge cases anyway, and the fact that datetime semantics rely on object identity rather than object equality just adds to the edge cases that are possible.

I will note that there is some precedent in this very area: local time information is only updated in response to a call to time.tzset(), and even that doesn’t work on Windows. The equivalent to calling time.tzset() to get updated time zone information would be calling ZoneInfo.clear_cache() to force ZoneInfo to use the updated data (or to always bypass the main constructor and use the .nocache() constructor).
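The two escape hatches mentioned in the quote can be sketched as follows. Note the assumption: this uses the module as it eventually shipped in Python 3.9, where the cache-bypassing constructor is spelled no_cache() rather than the draft's nocache(). The cache_demo() function is a hypothetical helper.

```python
from zoneinfo import ZoneInfo, ZoneInfoNotFoundError


def cache_demo(key):
    """Return (bypassed, reloaded) flags, or None if key has no data.

    bypassed: no_cache() gave an object distinct from the cached one.
    reloaded: the primary constructor built a new object after clear_cache().
    """
    try:
        cached = ZoneInfo(key)
        uncached = ZoneInfo.no_cache(key)  # neither reads nor fills the cache
        ZoneInfo.clear_cache()             # drop every cached zone
        fresh = ZoneInfo(key)              # built again from the data files
    except ZoneInfoNotFoundError:
        return None
    return (uncached is not cached, fresh is not cached)


print(cache_demo("America/New_York"))
```

Objects created before clear_cache() keep whatever data they were built with; only zones constructed afterward pick up any updated files.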

But Florian Weimer was concerned that users would want those time-zone updates to automatically be incorporated, so he sees the caching behavior as problematic. "I do not think that users would want to restart their application (with a scheduled downtime) just to apply one of those updates." Ganssle acknowledged the concern, "but there are a lot of reasons to use the cache, and good reasons to believe that using the cache won’t be a problem". He went on to note that both pytz and dateutil already behave this way and he has heard no complaints. He also gave an example of surprising behavior without any caching:

>>> from datetime import *
>>> from zoneinfo import ZoneInfo
>>> dt0 = datetime(2020, 3, 8, tzinfo=ZoneInfo.nocache("America/New_York"))
>>> dt1 = dt0 + timedelta(1)
>>> dt2 = dt1.replace(tzinfo=ZoneInfo.nocache("America/New_York"))
>>> dt2 == dt1
True

Each call to ZoneInfo.nocache() will return a different object, even if the time-zone name is the same. So dt1 and dt2 have the same time-zone information, but different ZoneInfo objects. The two datetime objects compare "equal" (==) because they represent the same "wall time", but that does not mean that arithmetic operations will behave as one might expect:

>>> print(dt2 - dt1)
0:00:00
>>> print(dt2 - dt0)
23:00:00
>>> print(dt1 - dt0)
1 day, 0:00:00

March 8, 2020 is the day of the daylight saving time transition in the US, so adding one day (i.e. timedelta(1)) crosses that boundary. In a followup message, he explained more about the oddities of datetime math that are shown by the example:

This is because there’s an STD->DST transition between 2020-03-08 and 2020-03-09, so the difference in wall time is 24 hours, but the absolute elapsed time is 23 hours.

[...] So dt2 - dt0 is treated as two different zones and the math is done in UTC, whereas dt1 - dt0 is treated as the same zone, and the math is done in local time.

dt1 will necessarily be the same zone as dt0, because it’s the result of an arithmetical operation on dt0. dt2 is a different zone because I bypassed the cache, but if it hit the cache, the two would be the same.
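The same effect can be reproduced without any time-zone database. In the sketch below, Eastern2020 is a hypothetical toy tzinfo subclass with the 2020 US Eastern transitions hard-coded, and elapsed() is an illustrative helper that sidesteps the identity rule by converting both operands to UTC first. Neither name comes from the PEP or from any library.

```python
from datetime import datetime, timedelta, timezone, tzinfo


class Eastern2020(tzinfo):
    """Toy zone with the 2020 US Eastern rules hard-coded (illustration only)."""

    DST_START = datetime(2020, 3, 8, 2)   # local wall-clock time
    DST_END = datetime(2020, 11, 1, 2)

    def dst(self, dt):
        wall = dt.replace(tzinfo=None)
        in_dst = self.DST_START <= wall < self.DST_END
        return timedelta(hours=1) if in_dst else timedelta(0)

    def utcoffset(self, dt):
        return timedelta(hours=-5) + self.dst(dt)

    def tzname(self, dt):
        return "EDT" if self.dst(dt) else "EST"


def elapsed(later, earlier):
    """Absolute elapsed time, whatever tzinfo objects are attached."""
    return later.astimezone(timezone.utc) - earlier.astimezone(timezone.utc)


zone_a, zone_b = Eastern2020(), Eastern2020()   # same rules, distinct objects
dt0 = datetime(2020, 3, 8, tzinfo=zone_a)
dt1 = dt0 + timedelta(days=1)                   # keeps zone_a
dt2 = dt1.replace(tzinfo=zone_b)

print(dt1 - dt0)          # 1 day, 0:00:00  (same object: wall-time math)
print(dt2 - dt0)          # 23:00:00        (different objects: UTC math)
print(elapsed(dt1, dt0))  # 23:00:00
print(elapsed(dt2, dt0))  # 23:00:00
```

Converting to UTC before subtracting gives the elapsed time regardless of which tzinfo objects happen to be attached, which is one way for code that cares about absolute durations to avoid depending on object identity.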

Using the pickle object-serialization mechanism on ZoneInfo objects was also discussed. The PEP originally proposed that pickling a ZoneInfo object would serialize all of the information from the object (e.g. all of the current and historical transition dates), rather than simply serializing the key (e.g. "America/New_York"). Only serializing the key could lead to problems when de-serializing the object with a different set of time-zone data (e.g. the "Asia/Qostanay" time zone was added in 2018).

But, as pytz maintainer Stuart Bishop pointed out, serializing all of the transition data is likely to lead to other, worse problems:

If I serialize ‘2022-06-05 14:00 Europe/Berlin’ today, and deserialize it in two years time after Berlin has ratified EU recommendations and abolished DST, then there are two possible results. If my application requires calendaring semantics, when deserializing I want to apply the current timezone definition, and my appointment at 2pm in Berlin is still at 2pm in Berlin. Because I need wallclock time (the time a clock hung on the wall in that location should show). If I wanted a fixed timestamp, best practice is to convert it to UTC to avoid all the potential traps, but it would also be ok to deserialize the time using the old, incorrect offset it was stored with and end up with 1pm wallclock time.

The PEP specifies that datetimes get serialized with all transition data. That seems unnecessary, as the transition data is reasonably likely to be wrong when it is de-serialized, and I can’t think of any use cases where you want to continue using the wrong data.

Ganssle agreed that it makes more sense to pickle ZoneInfo objects "by reference" (i.e. by time-zone name), though providing a way to also pickle "by value" for those who need or want it would be an option. Guido van Rossum had suggested an approach where a RawZoneInfo class would underlie ZoneInfo objects. Pickling a RawZoneInfo could be done by value. Ganssle liked that idea but thought that it could always be added later if there was a need for it; dateutil.tz already gives the by-value ability, so that could be used in the interim if needed.
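Pickling "by reference" is observable from the outside. The sketch below assumes the behavior of zoneinfo as it eventually shipped in Python 3.9, which settled on serializing only the key; roundtrip_is_same_object() is a hypothetical helper name.

```python
import pickle
from zoneinfo import ZoneInfo, ZoneInfoNotFoundError


def roundtrip_is_same_object(key):
    """Pickle and unpickle ZoneInfo(key); None if key has no data."""
    try:
        zone = ZoneInfo(key)
    except ZoneInfoNotFoundError:
        return None
    payload = pickle.dumps(zone)
    restored = pickle.loads(payload)
    # Only the key travels in the pickle, so loading goes back through the
    # cache and yields the very same object within one process.
    return restored is zone


print(roundtrip_is_same_object("Europe/Berlin"))
```

Loaded in a different process or on a different machine, the same pickle would instead be resolved against whatever time-zone data is installed there, which is the calendaring behavior Bishop argued for.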

Overall, the reaction to the PEP seems quite favorable. Bishop said that he looks forward to "being able to deprecate pytz, making it a thin wrapper around the standard library when run with a supported Python". Ganssle is still working out some of the details, particularly around whether to automatically install the tzdata module for platforms where there is no system-supplied IANA database. It seems likely that we will soon see support for IANA time zones in Python—presumably in Python 3.9 in October.





tzdata versions

Posted Mar 4, 2020 12:10 UTC (Wed) by mirabilos (subscriber, #84359) [Link] (3 responses)

I’d argue that the *default* case, and the one a user would _want_ as default case, should be to have the same version of the tzdata information used in Python that is used by libc and thus almost (Perl notwithstanding) anywhere else on the system.

Getting more up-to-date tzdata should be the secondary case. Optional, possible, but not default.

This is mostly for consistency.

tzdata versions

Posted Mar 4, 2020 16:29 UTC (Wed) by Nahor (subscriber, #51583) [Link] (1 responses)

If your Python app talks to some other Python app, using consistent data between the two systems might be better. But if your app talks to the user (e.g. clock app), or to some non-Python app (e.g. a web server), or uses crypto (certificates), or ... using an up-to-date timeinfo is likely better. Feels to me that the latter is more common than the former.

tzdata versions

Posted Mar 9, 2020 21:22 UTC (Mon) by cortana (subscriber, #24596) [Link]

Oh no... surely dates and times in certificates are specified in terms of UTC... right?

tzdata versions

Posted Mar 9, 2020 6:23 UTC (Mon) by ceplm (subscriber, #41334) [Link]

Which is BTW what happens on (I guess) most Linux distributions (I know for sure about Fedora and openSUSE): the Olson database is cut out of pytz and it is patched to use the system database, and any attempts to include yet another copy of the database are removed with extreme prejudice.

Python time-zone handling

Posted Mar 4, 2020 17:36 UTC (Wed) by kleptog (subscriber, #1183) [Link] (5 responses)

The fundamental issue with Python datetime handling is that it conflates two separate concepts: (a) you have a time as you see on the clock and you may or may not want to refer to a timezone (b) you want to refer to a particular instant in time (like seconds since epoch), and any timezone attached affects the display and calculations. Python supports the former use-case, but the latter is more often what people want.

This leads to storing of datetime in anything other than UTC being incredibly painful because the way datetimes are stored using (year,month,day,hour,minute,second,tz) is fundamentally ambiguous. The tm_isdst field is just a hack to paper over this. See for example the FixedOffsetTimezone in psycopg2 whose sole purpose is to accurately represent the timestamps coming from the database which actually represent instants in time. Manipulation of datetime with timezones in Python is hard to get right, which is unfortunate for a language which tries to be "batteries included".

We have libraries like Maya, Delorean, Arrow and Pendulum which all try to solve this problem but it would be nice if the standard library at least offered a basic datetime type which acted sanely with timezones.

Python time-zone handling

Posted Mar 4, 2020 19:13 UTC (Wed) by perennialmind (guest, #45817) [Link] (4 responses)

Thank you! Civil time and wall/epoch time are different spaces for which relations exist. They deserve assistance from the type system. Implicitly transforming abstract civil intervals (1 day) to elapsed time (86400 SI seconds) is nuts and will inevitably result in subtle errors. Add a civil day to an abstract date? Sure. Add SI seconds to TAI? Sure. Add SI seconds to ISO 8601? With a time zone attached, maybe.

One "pound" sounds the same whether you're talking about force (lbf), mass (lb), or currency (£). With F# style units, you can tell the difference:

The unit of measure '£' does not match the unit of measure 'lb'

It's not obvious how (year,month,day,hour,minute,second,tz) is ambiguous. With the IANA timezone, `tm_isdst` obliquely encodes the UTC offset. The difference operator is ambiguous – because it capriciously selects between wall time and civil time units.

Python time-zone handling

Posted Mar 4, 2020 20:15 UTC (Wed) by mathstuf (subscriber, #69389) [Link] (3 responses)

Suppose I make a timestamp for a future time near a DST transition. What should happen if that DST transition is moved over my timestamp's intended time? You also need to encode either the state of the future DST transitions at the time if you care about wall time or store it as UTC if you want that instant. So basically, you need to also store whether you want that time in the wall clock or instant sense.

Python time-zone handling

Posted Mar 4, 2020 21:27 UTC (Wed) by perennialmind (guest, #45817) [Link] (2 responses)

Yes! Both units and position/coordinate are worth making distinct. If you add SI seconds to a tz-aware civil `datetime`, you're asking for extrapolation with an emphasis on the "civil" over the "time". If I make an appointment for noon six months out in a region that decides on a whim to adopt DST, I should still have my noon appointment. That's how it will work out if I'm using iCalendar, since it stores ISO8601 with IANA time zone id. If I want my Starliner thruster to fire at a precise offset, hopefully I'm using something like CLOCK_MONOTONIC, CLOCK_TAI, or Barycentric Dynamic Time.

Python time-zone handling

Posted Mar 4, 2020 22:04 UTC (Wed) by perennialmind (guest, #45817) [Link]

Yeesh. I gave the iCalendar spec too much credit. They got the TZID part right, but didn't include UTC offset or a 'isdst' equivalent, so there are times that can't be represented in that format. Bummer.

Python time-zone handling

Posted Mar 5, 2020 0:39 UTC (Thu) by excors (subscriber, #95769) [Link]

> If I want my Starliner thruster to fire at a precise offset, hopefully I'm using something like CLOCK_MONOTONIC, CLOCK_TAI, or Barycentric Dynamic Time.

CLOCK_MONOTONIC sounds like a bad idea for precise thrusting, since it "is affected by the incremental adjustments performed by adjtime(3) and NTP" (per the man page), so 1 second on that clock may not be 1 second in real time. CLOCK_MONOTONIC_RAW sounds a bit safer.

(Both of those have an unspecified starting point though, so they probably wouldn't have saved Boeing from their Starliner issue where it missed the desired orbit because a clock was off by 11 hours.)

Python time-zone handling

Posted Mar 4, 2020 20:33 UTC (Wed) by NYKevin (subscriber, #129325) [Link] (4 responses)

> So dt2 - dt0 is treated as two different zones and the math is done in UTC, whereas dt1 - dt0 is treated as the same zone, and the math is done in local time.

Why on Earth was it designed to work that way in the first place? What they're basically saying is that the subtraction operator does two entirely different things depending on the values it is subtracting.

I'm beginning to think they should invent a datetime.instant class that has the same API as datetime.datetime, but stores (time and date in UTC, timezone) instead of (local time and date, timezone), and does all computations with the UTC time (the timezone is just for display and conversion). Then maybe you could also have datetime.wall_time which stores local time and always does wall time math. The latter is almost the same thing as a "naive" datetime.datetime object, but I think a few parts of that API are terrible or broken in various ways... (i.e. they assume no tz means the local system tz).

Of course, you can build these things out of the primitives in the datetime library, but you shouldn't have to. There should be one family of types that everyone can standardize on.

Python time-zone handling

Posted Mar 4, 2020 20:51 UTC (Wed) by JFlorian (guest, #49650) [Link] (3 responses)

> Why on Earth...

Right there's your answer: it was designed on Earth. What else have humans made so impossibly complex as time? Do we have a worse technical debt?

Python time-zone handling

Posted Mar 6, 2020 19:09 UTC (Fri) by flussence (guest, #85566) [Link] (2 responses)

There are three unsolvable problems in computing: time, the Web, and software distribution.

Fixing them would entail removing other users, which can't be done for compatibility reasons.

Python time-zone handling

Posted Mar 9, 2020 13:14 UTC (Mon) by JFlorian (guest, #49650) [Link] (1 responses)

I love it! This strikes me as a variation of something I've heard before but I can't quite place it. Though my first guess would be from the HHGTG.

Python time-zone handling

Posted Mar 9, 2020 20:19 UTC (Mon) by flussence (guest, #85566) [Link]

It's a riff on the "naming things, cache invalidation, off-by-one errors" line - but those are merely _unsolved_, not unsolvable :)

Python time-zone handling

Posted Mar 5, 2020 9:39 UTC (Thu) by guus (subscriber, #41608) [Link] (5 responses)

If you are thinking: "What's all this fuss about time-zones? How hard can it be?" You have to watch this excellent explanation about the issues surrounding them: https://www.youtube.com/watch?v=-5wpm-gesOY

Python time-zone handling

Posted Mar 5, 2020 11:51 UTC (Thu) by mbunkus (subscriber, #87248) [Link] (1 responses)

Tom Scott's classic. Highly educational and entertaining at the same time.

"And THEN, then you get a call from the West Bank."

Python time-zone handling

Posted Mar 5, 2020 20:37 UTC (Thu) by smitty_one_each (subscriber, #28989) [Link]

"Time. . .is a *Western* disease"--Mullen.

Python time-zone handling

Posted Mar 5, 2020 23:14 UTC (Thu) by jschrod (subscriber, #1646) [Link]

Thank you for this link!

Highly recommendable and highly entertaining, I saved it for further reference.

Joachim

Python time-zone handling

Posted Mar 9, 2020 6:39 UTC (Mon) by ceplm (subscriber, #41334) [Link] (1 responses)

TL;DR: use glibc calls and never ever try to do this yourself, this way lies madness.

Python time-zone handling

Posted Mar 9, 2020 9:19 UTC (Mon) by Cyberax (✭ supporter ✭, #52523) [Link]

The problem with glibc is that TZ handling in it sucks.

No, it SUCKS.

For example, it's not possible to extract the date and time of the DST change. Or even to convert between arbitrary timezones without being thread unsafe.

Python time-zone handling

Posted Mar 9, 2020 14:24 UTC (Mon) by anarcat (subscriber, #66354) [Link]

i'm using both pytz and dateutil now in undertime which is too bad: i'd rather have only one or the other. but i need dateutil for the math, and i need pytz for the timezone listing (which dateutil doesn't directly provide). would be nice to have the latter in the stdlib as well so that pytz (and dateutil of course) could be truly deprecated.


Copyright © 2020, Eklektix, Inc.
This article may be redistributed under the terms of the Creative Commons CC BY-SA 4.0 license
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds