Python time-zone handling
Handling time zones is a pretty messy affair overall, but language runtimes may have even bigger problems. As a recent discussion on the Python discussion forum shows, there are considerations beyond those that an operating system or distribution needs to handle. Adding support for the IANA time zone database to the Python standard library, which would allow using names like "America/Mazatlan" to designate time zones, is more complicated than one might think—especially for a language trying to support multiple platforms.
It may come as a surprise to some that Python has no support in the standard library for getting time-zone information from the IANA database (also known as the Olson database after its founder). The datetime module in the standard library has the idea of a "time zone" but populating an instance from the database is typically done using one of two modules from the Python Package Index (PyPI): pytz or dateutil. Paul Ganssle is the maintainer of dateutil and a contributor to datetime; he has put out a draft Python Enhancement Proposal (PEP) to add IANA database support as a new standard library module.
Ganssle gave a presentation at the 2019 Python Language Summit about the problem. On February 25, he posted a draft of PEP 615 ("Support for the IANA Time Zone Database in the Standard Library"). The original posted version of the PEP can be found in the PEPs GitHub repository.
The datetime.tzinfo abstract base class provides ways "to implement arbitrarily complex time zone rules", but he has observed that users want to work with three time-zone types: fixed offsets from UTC, the system time zone, and IANA time zones. The standard library supports the first type with datetime.timezone objects, and the second to a certain extent, but does not support IANA time zones at all.
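As a small illustration of the first category, the following sketch builds a fixed-offset zone with datetime.timezone and converts a value to UTC. The ten-hour offset, the label, and the sample date are arbitrary choices made for this example; none of them come from the PEP.

```python
from datetime import datetime, timedelta, timezone

# A fixed offset of ten hours east of UTC; the label is purely illustrative.
fixed = timezone(timedelta(hours=10), "UTC+10")

# An aware datetime that carries the fixed offset.
meeting = datetime(2020, 3, 4, 9, 30, tzinfo=fixed)

# Converting to UTC just subtracts the offset; no rule database is consulted.
as_utc = meeting.astimezone(timezone.utc)

print(meeting.isoformat())
print(as_utc.isoformat())
```

A fixed offset like this never changes, which is exactly why it cannot stand in for a named zone whose rules include daylight saving transitions.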
There are some wrinkles to handling time zones, starting with the fact that they change, and frequently. The IANA database is updated multiple times per year; "between 1997 and 2020, there have been between 3 and 21 releases per year, often in response to changes in time zone rules with little to no notice". Linux and macOS have packages with that information which get updated as usual, but the situation for Windows is more complicated. Beyond that, there is a question of what should happen in a running program when the time-zone information changes out from under it.
The PEP proposes adding a top-level zoneinfo standard library module with a zoneinfo.ZoneInfo class for objects corresponding to a particular time zone. A call like:
tz = zoneinfo.ZoneInfo("Australia/Brisbane")
will search for a corresponding Time Zone Information Format (TZif) file in various locations to populate the object. The zoneinfo.TZPATH list will be consulted to find the file of interest.
On Unix-like systems, that variable will be set to a list of the standard locations (e.g. /usr/share/zoneinfo, /etc/zoneinfo) where the time-zone data files are normally stored. On Windows, there is no official location for the system-wide time-zone information, so TZPATH will initially be empty. The PEP proposes that a data-only tzdata package be created for PyPI that would be maintained by the CPython core developers. That could be used on Windows systems to provide a source for the IANA database information.
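A minimal sketch of that lookup, assuming the module as it was eventually released in Python 3.9, where zoneinfo.TZPATH is exposed as a tuple and a key that cannot be found raises ZoneInfoNotFoundError. The helper name find_zone is invented for this example, and what it returns depends on the data installed on the local system.

```python
import zoneinfo
from zoneinfo import ZoneInfo, ZoneInfoNotFoundError


def find_zone(key):
    """Hypothetical helper: return the ZoneInfo for key, or None if no data is found."""
    try:
        return ZoneInfo(key)
    except ZoneInfoNotFoundError:
        # Neither the TZPATH directories nor a tzdata package supplied this key.
        return None


# Directories searched for TZif files; may be empty, for example on Windows.
search_path = zoneinfo.TZPATH
print(search_path)

brisbane = find_zone("Australia/Brisbane")
print(brisbane)
```

Returning None rather than letting the exception propagate is only a convenience for the sketch; real code may well prefer to fail loudly when the database is missing.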
By default, ZoneInfo objects would effectively be singletons; a cache would be maintained so that repeated uses of the same time-zone name would return the exact same object. That is not specifically being done for efficiency reasons, but to ensure that times in the same time zone will be handled correctly. The existing datetime arithmetic operations only consider time zones to be equal if they are the same object, not just if they contain the same information. But caching also protects running programs from strange behavior if the underlying time-zone data changes. Effectively, the data will be read once, on first use, and never change again until the interpreter is restarted.
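The singleton-like behavior can be checked directly. This sketch assumes the released Python 3.9 zoneinfo module and guards against systems that have no IANA data at all; the variable names are illustrative.

```python
from zoneinfo import ZoneInfo, ZoneInfoNotFoundError

try:
    first = ZoneInfo("America/New_York")
    second = ZoneInfo("America/New_York")
    # Same key, same object: the second call is served from the cache.
    same_object = first is second
except ZoneInfoNotFoundError:
    # No IANA data on this system, so there is nothing to compare.
    same_object = None

print(same_object)
```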
There is support for loading time zones without consulting (or changing) the cache, as well as for clearing the cache, which would effectively reload the time zone for any new ZoneInfo object. But getting updates to time zones mid-stream is problematic in its own right, Ganssle said:
I will note that there is some precedent in this very area: local time information is only updated in response to a call to time.tzset(), and even that doesn’t work on Windows. The equivalent to calling time.tzset() to get updated time zone information would be calling ZoneInfo.clear_cache() to force ZoneInfo to use the updated data (or to always bypass the main constructor and use the .nocache() constructor).
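The two escape hatches mentioned in that message can be sketched as follows. The draft under discussion spelled the bypass constructor nocache(); the module as released in Python 3.9 names it no_cache(), which is what this example assumes. The guard for missing data and the variable names are additions made for the sketch.

```python
from zoneinfo import ZoneInfo, ZoneInfoNotFoundError

KEY = "America/New_York"

try:
    cached = ZoneInfo(KEY)

    # no_cache() always builds a new object and leaves the cache untouched.
    bypass = ZoneInfo.no_cache(KEY)
    bypass_is_distinct = bypass is not cached
    still_cached = ZoneInfo(KEY) is cached

    # After clear_cache(), the next constructor call rereads the data
    # and therefore returns a new object.
    ZoneInfo.clear_cache()
    reloaded_is_distinct = ZoneInfo(KEY) is not cached
except ZoneInfoNotFoundError:
    # No IANA data on this system; nothing can be demonstrated.
    bypass_is_distinct = still_cached = reloaded_is_distinct = None

print(bypass_is_distinct, still_cached, reloaded_is_distinct)
```

Objects created before clear_cache() keep whatever data they were built from, so clearing the cache only affects zones constructed afterward.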
But Florian Weimer was concerned that users would want those time-zone updates to automatically be incorporated, so he sees the caching behavior as problematic. "I do not think that users would want to restart their application (with a scheduled downtime) just to apply one of those updates." Ganssle acknowledged the concern, "but there are a lot of reasons to use the cache, and good reasons to believe that using the cache won’t be a problem". He went on to note that both pytz and dateutil already behave this way and he has heard no complaints. He also gave an example of surprising behavior without any caching:
>>> from datetime import *
>>> from zoneinfo import ZoneInfo
>>> dt0 = datetime(2020, 3, 8, tzinfo=ZoneInfo.nocache("America/New_York"))
>>> dt1 = dt0 + timedelta(1)
>>> dt2 = dt1.replace(tzinfo=ZoneInfo.nocache("America/New_York"))
>>> dt2 == dt1
True
Each call to ZoneInfo.nocache() will return a different object, even if the time-zone name is the same. So dt1 and dt2 have the same time-zone information, but different ZoneInfo objects. The two datetime objects compare "equal" (==) because they represent the same "wall time", but that does not mean that arithmetic operations will behave as one might expect:
>>> print(dt2 - dt1)
0:00:00
>>> print(dt2 - dt0)
23:00:00
>>> print(dt1 - dt0)
1 day, 0:00:00
March 8, 2020 is the day of the daylight saving time transition in the US, so adding one day (i.e. timedelta(1)) crosses that boundary. In a follow-up message, he explained more about the oddities of datetime math that are shown by the example:
[...] So dt2 - dt0 is treated as two different zones and the math is done in UTC, whereas dt1 - dt0 is treated as the same zone, and the math is done in local time.
dt1 will necessarily be the same zone as dt0, because it’s the result of an arithmetical operation on dt0. dt2 is a different zone because I bypassed the cache, but if it hit the cache, the two would be the same.
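The same effect can be reproduced without any time-zone database. The sketch below uses a made-up tzinfo subclass, ToyEastern, as a stand-in for "America/New_York"; it only knows the 2020 daylight saving dates and ignores the fold attribute, so it is an illustration of the identity rule and not a usable zone implementation.

```python
from datetime import datetime, timedelta, tzinfo


class ToyEastern(tzinfo):
    """Hypothetical stand-in for America/New_York, valid for 2020 only."""

    # In 2020, US daylight saving time ran from March 8 to November 1 (local 02:00).
    _start = datetime(2020, 3, 8, 2)
    _end = datetime(2020, 11, 1, 2)

    def dst(self, dt):
        naive = dt.replace(tzinfo=None)
        if self._start <= naive < self._end:
            return timedelta(hours=1)
        return timedelta(0)

    def utcoffset(self, dt):
        return timedelta(hours=-5) + self.dst(dt)

    def tzname(self, dt):
        return "EDT" if self.dst(dt) else "EST"


zone_a = ToyEastern()
zone_b = ToyEastern()  # same rules, but a different object

dt0 = datetime(2020, 3, 8, tzinfo=zone_a)
dt1 = dt0 + timedelta(days=1)      # keeps zone_a
dt2 = dt1.replace(tzinfo=zone_b)   # same wall time, different tzinfo object

print(dt2 == dt1)  # True: both map to the same UTC instant
print(dt1 - dt0)   # 1 day, 0:00:00 (same tzinfo object, wall-clock math)
print(dt2 - dt0)   # 23:00:00 (different objects, math done in UTC)
```

Nothing about the rules differs between zone_a and zone_b; only object identity decides which kind of subtraction is performed.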
Using the pickle object-serialization mechanism on ZoneInfo objects was also discussed. The PEP originally proposed that pickling a ZoneInfo object would serialize all of the information from the object (e.g. all of the current and historical transition dates), rather than simply serializing the key (e.g. "America/New_York"). Only serializing the key could lead to problems when de-serializing the object with a different set of time-zone data (e.g. the "Asia/Qostanay" time zone was added in 2018).
But, as pytz maintainer Stuart Bishop pointed out, serializing all of the transition data is likely to lead to other, worse problems:
The PEP specifies that datetimes get serialized with all transition data. That seems unnecessary, as the transition data is reasonably likely to be wrong when it is de-serialized, and I can’t think of any use cases where you want to continue using the wrong data.
Ganssle agreed that it makes more sense to pickle ZoneInfo objects "by reference" (i.e. by time-zone name), though providing a way to also pickle "by value" for those who need or want it would be an option. Guido van Rossum had suggested an approach where a RawZoneInfo class would underlie ZoneInfo objects. Pickling a RawZoneInfo could be done by value. Ganssle liked that idea but thought that it could always be added later if there was a need for it; dateutil.tz already gives the by-value ability, so that could be used in the interim if needed.
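For illustration, here is how by-reference pickling looks with the zoneinfo module as released in Python 3.9, which adopted that approach. This is a sketch under that assumption; the guard for systems without IANA data and the variable names are additions made for the example.

```python
import pickle
from zoneinfo import ZoneInfo, ZoneInfoNotFoundError

try:
    zone = ZoneInfo("America/New_York")
    payload = pickle.dumps(zone)
    restored = pickle.loads(payload)

    # Only the key travels in the pickle; loading goes back through the
    # cache, so the very same object comes out again.
    round_trip_same = restored is zone
    key_in_payload = b"America/New_York" in payload
except ZoneInfoNotFoundError:
    # No IANA data on this system; nothing can be demonstrated.
    round_trip_same = key_in_payload = None

print(round_trip_same, key_in_payload)
```

The flip side is the one raised in the discussion: a process that loads the pickle resolves the key against its own copy of the database, which may be newer or older than the one used when the object was saved.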
Overall, the reaction to the PEP seems quite favorable. Bishop said that he looks forward to "being able to deprecate pytz, making it a thin wrapper around the standard library when run with a supported Python". Ganssle is still working out some of the details, particularly around whether to automatically install the tzdata module for platforms where there is no system-supplied IANA database. It seems likely that we will soon see support for IANA time zones in Python, presumably in Python 3.9 in October.
Posted Mar 4, 2020 12:10 UTC (Wed)
by mirabilos (subscriber, #84359)
Getting more up-to-date tzdata should be the secondary case. Optional, possible, but not default.
This is mostly for consistency.
Posted Mar 4, 2020 16:29 UTC (Wed)
by Nahor (subscriber, #51583)
Posted Mar 9, 2020 21:22 UTC (Mon)
by cortana (subscriber, #24596)
Posted Mar 9, 2020 6:23 UTC (Mon)
by ceplm (subscriber, #41334)
Posted Mar 4, 2020 17:36 UTC (Wed)
by kleptog (subscriber, #1183)
This leads to storing of datetime in anything other than UTC being incredibly painful because the way datetimes are stored using (year,month,day,hour,minute,second,tz) is fundamentally ambiguous. The tm_isdst field is just a hack to paper over this. See for example the FixedOffsetTimezone in psycopg2 whose sole purpose is to accurately represent the timestamps coming from the database which actually represent instants in time. Manipulation of datetime with timezones in Python is hard to get right, which is unfortunate for a language which tries to be "batteries included".
We have libraries like Maya, Delorean, Arrow and Pendulum which all try to solve this problem but it would be nice if the standard library at least offered a basic datetime type which acted sanely with timezones.
Posted Mar 4, 2020 19:13 UTC (Wed)
by perennialmind (guest, #45817)
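Thank you! Civil time and wall/epoch time are different spaces for which relations exist. They deserve assistance from the type system. Implicitly transforming abstract civil intervals (1 day) to elapsed time (86400 SI seconds) is nuts and will inevitably result in subtle errors. Add a civil day to an abstract date? Sure. Add SI seconds to TAI? Sure. Add SI seconds to ISO 8601? With a time zone attached, maybe.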
One "pound" sounds the same whether you're talking about force (lbf), mass (lb), or currency (£). With F# style units, you can tell the difference:
The unit of measure '£' does not match the unit of measure 'lb'
It's not obvious how (year,month,day,hour,minute,second,tz) is ambiguous. With the IANA timezone, `tm_isdst` obliquely encodes the UTC offset. The difference operator is ambiguous – because it capriciously selects between wall time and civil time units.
Posted Mar 4, 2020 20:15 UTC (Wed)
by mathstuf (subscriber, #69389)
Posted Mar 4, 2020 21:27 UTC (Wed)
by perennialmind (guest, #45817)
Yes! Both units and position/coordinate are worth making distinct. If you add SI seconds to a tz-aware civil `datetime`, you're asking for extrapolation with an emphasis on the "civil" over the "time". If I make an appointment for noon six months out in a region that decides on a whim to adopt DST, I should still have my noon appointment. That's how it will work out if I'm using iCalendar, since it stores ISO8601 with IANA time zone id. If I want my Starliner thruster to fire at a precise offset, hopefully I'm using something like CLOCK_MONOTONIC, CLOCK_TAI, or Barycentric Dynamic Time.
Posted Mar 4, 2020 22:04 UTC (Wed)
by perennialmind (guest, #45817)
Posted Mar 5, 2020 0:39 UTC (Thu)
by excors (subscriber, #95769)
CLOCK_MONOTONIC sounds like a bad idea for precise thrusting, since it "is affected by the incremental adjustments performed by adjtime(3) and NTP" (per the man page), so 1 second on that clock may not be 1 second in real time. CLOCK_MONOTONIC_RAW sounds a bit safer.
(Both of those have an unspecified starting point though, so they probably wouldn't have saved Boeing from their Starliner issue where it missed the desired orbit because a clock was off by 11 hours.)
Posted Mar 4, 2020 20:33 UTC (Wed)
by NYKevin (subscriber, #129325)
Why on Earth was it designed to work that way in the first place? What they're basically saying is that the subtraction operator does two entirely different things depending on the values it is subtracting.
I'm beginning to think they should invent a datetime.instant class that has the same API as datetime.datetime, but stores (time and date in UTC, timezone) instead of (local time and date, timezone), and does all computations with the UTC time (the timezone is just for display and conversion). Then maybe you could also have datetime.wall_time which stores local time and always does wall time math. The latter is almost the same thing as a "naive" datetime.datetime object, but I think a few parts of that API are terrible or broken in various ways... (i.e. they assume no tz means the local system tz).
Of course, you can build these things out of the primitives in the datetime library, but you shouldn't have to. There should be one family of types that everyone can standardize on.
Posted Mar 4, 2020 20:51 UTC (Wed)
by JFlorian (guest, #49650)
Right there's your answer: it was designed on Earth. What else have humans made so impossibly complex as time? Do we have a worse technical debt?
Posted Mar 6, 2020 19:09 UTC (Fri)
by flussence (guest, #85566)
Fixing them would entail removing other users, which can't be done for compatibility reasons.
Posted Mar 9, 2020 13:14 UTC (Mon)
by JFlorian (guest, #49650)
Posted Mar 9, 2020 20:19 UTC (Mon)
by flussence (guest, #85566)
Posted Mar 5, 2020 9:39 UTC (Thu)
by guus (subscriber, #41608)
Posted Mar 5, 2020 11:51 UTC (Thu)
by mbunkus (subscriber, #87248)
"And THEN, then you get a call from the West Bank."
Posted Mar 5, 2020 20:37 UTC (Thu)
by smitty_one_each (subscriber, #28989)
Posted Mar 5, 2020 23:14 UTC (Thu)
by jschrod (subscriber, #1646)
Highly recommendable and highly entertaining, I saved it for further reference.
Joachim
Posted Mar 9, 2020 6:39 UTC (Mon)
by ceplm (subscriber, #41334)
Posted Mar 9, 2020 9:19 UTC (Mon)
by Cyberax (✭ supporter ✭, #52523)
No, it SUCKS.
For example, it's not possible to extract the date and time of the DST change. Or even to convert between arbitrary timezones without being thread unsafe.
Posted Mar 9, 2020 14:24 UTC (Mon)
by anarcat (subscriber, #66354)
i'm using both pytz and dateutil now in undertime which is too bad: i'd rather have only one or the other. but i need dateutil for the math, and i need pytz for the timezone listing (which dateutil doesn't directly provide).
would be nice to have the latter in the stdlib as well so that pytz (and dateutil of course) could be truly deprecated.
