It would not just break POSIX but a huge number of applications, unless you complicated glibc to do the conversion, in which case it would still be obviously broken in some major cases, as the time you got from gettimeofday() would be different from the time visible on a file you just created when you stat()ted it, unless you had glibc adjust *that* too -- and that way lies madness.
And for what benefit? Moving a pile of complexity out of ntpd (which is meant for dealing with this sort of thing) and out of the kernel's time handling code (which is a single body of code maintained by people who know what they're doing) into glibc and a vast body of applications. Now perhaps glibc could get its updates via tzdata, but are the applications all going to get it right? They get it wrong *now*, many would need changing, and as has been pointed out elsethread, getting this right is hard, since even the original authors of many programs probably didn't expect 'this time tomorrow' and 'this time 86400 seconds away' to have distinct answers, and nearly all the time they wouldn't.
The solution to bugs in a bit of code in highly-tested critical software that is hard to debug and test because it caters to a rarely-arising condition is surely *not* to distribute and multiply that code among a vast number of applications, critical and otherwise, many of which are much less tested than the kernel is.
For this to be less dangerous, you'd need to translate every single time the kernel passes to userspace by whatever means (including timestamps in network filesystems!), teach everything that touched raw fs dumps to translate times as well, and unless you want to waste all that time add TAI-returning syscalls akin to gettimeofday() et al... and that looks like a lot more code than the existing set of leap-second-handling code, which is clearly *already* too rarely executed to be expected to keep working between leap second invocations.
Or we could just wait. Leap seconds are getting quadratically more frequent over long enough timespans, and will be downright common in timespans comparable to that since the invention of the computer. When the things are occurring monthly, something will either be done to fix it or at the very least the code to handle them will be frequently tested!
Posted Jul 9, 2012 17:49 UTC (Mon) by rschroev (subscriber, #4164)
Maybe I'm missing something, but it seems to me you're making it much more complex than it needs to be.
Given the fact that leap seconds exist, it seems to me the best way to handle time is:
- The kernel deals exclusively with TAI, by which I mean the number of seconds since the Epoch. Leap seconds are seconds just like any other. That means that some days are somewhat shorter or longer than 86400 seconds, but that's not important to the kernel. That is consistent with the manpage already: "time() returns the time as the number of seconds since the Epoch, 1970-01-01 00:00:00 +0000 (UTC)."
- Programs can use that directly if they want to, mostly if they want to know the time delta between two moments in seconds. In other cases, they can call asctime(), ctime(), gmtime() or localtime(), which handle the translation from TAI to UTC and local time. Perhaps functions should be added for conversion between TAI and UTC, both expressed as seconds since the epoch.
This way the kernel time handling code is as simple as can be (no special cases there), and the complexity is moved to glibc, which already knows how to handle time zones. Adding or subtracting leap seconds is very similar to handling time zones, I would think. The system always has a simple unambiguous idea of the time, converted to/from wall clock time as needed by glibc for interacting with users, which to me is just a natural extension of the current concept of the kernel using UTC instead of local time.
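As a rough illustration of the kind of lookup such a conversion would need, here is a minimal sketch in C. The names `leap_entry`, `leap_table` and `tai_minus_utc()` are hypothetical and not part of glibc, and the table is deliberately partial (only the four most recent leap seconds as of July 2012); a real implementation would presumably load the full list from a maintained data file rather than hard-code it.

```c
#include <stddef.h>
#include <time.h>

/* One row per leap second: from `since` (POSIX seconds since the Epoch)
 * onward, TAI is ahead of UTC by `tai_minus_utc` seconds. */
struct leap_entry {
    time_t since;
    int tai_minus_utc;
};

/* Deliberately partial, illustrative table: only the four most recent
 * leap seconds as of July 2012, in ascending order. */
static const struct leap_entry leap_table[] = {
    {  915148800, 32 },  /* 1999-01-01 00:00:00 UTC */
    { 1136073600, 33 },  /* 2006-01-01 00:00:00 UTC */
    { 1230768000, 34 },  /* 2009-01-01 00:00:00 UTC */
    { 1341100800, 35 },  /* 2012-07-01 00:00:00 UTC */
};

/* Hypothetical helper: return TAI - UTC (in seconds) in effect at
 * `posix_time`, or -1 if the time predates this partial table. */
int tai_minus_utc(time_t posix_time)
{
    size_t n = sizeof leap_table / sizeof leap_table[0];

    /* Walk backwards to find the latest entry not after posix_time. */
    for (size_t i = n; i > 0; i--) {
        if (posix_time >= leap_table[i - 1].since)
            return leap_table[i - 1].tai_minus_utc;
    }
    return -1;  /* not covered by the partial table */
}
```

The lookup itself is small; the disagreement in this thread is about where such a table would have to live and how many code paths would need to consult it.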
I agree it makes time delta calculations more difficult: instead of adding n * 86400 seconds for n days, you have to add the required number of days/months/years in struct tm.
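A minimal sketch of that struct tm approach, assuming a POSIX system: the helper name `add_calendar_days()` is made up for illustration, and it works in whatever local time zone the process is configured with. It relies on mktime() normalizing an out-of-range `tm_mday`, which is standard behavior.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>  /* setenv(), only needed if the caller pins TZ as in the usage note */
#include <time.h>

/* Hypothetical helper: return the time_t for "the same wall-clock time,
 * n calendar days later" in the current local time zone.
 * Returns (time_t)-1 if the conversion fails. */
time_t add_calendar_days(time_t t, int n)
{
    struct tm tm;

    if (localtime_r(&t, &tm) == NULL)
        return (time_t)-1;

    tm.tm_mday += n;   /* may run past the end of the month; mktime() normalizes */
    tm.tm_isdst = -1;  /* let mktime() work out whether DST applies on the new day */
    return mktime(&tm);
}
```

With TZ pinned to `UTC0` (via `setenv("TZ", "UTC0", 1); tzset();`), `add_calendar_days(0, 1)` gives 86400. With a zone that observes daylight saving time, the result can differ from `t + n * 86400` by the size of the DST shift.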
If I understand you correctly, you are saying that such a solution would cause the time from gettimeofday() to be different from the file time on a freshly created file. I don't understand that... since TAI is all the kernel knows, both will give you the same value.
The only real disadvantage I see is that it potentially makes interoperation between different systems harder, because some systems might do it this way and some might not.
Leaping seconds and looping servers
Posted Jul 10, 2012 16:00 UTC (Tue) by nix (subscriber, #2304)
The problem with that ultra-simple case is that it is completely incompatible with the installed base of applications. You *cannot* change gettimeofday() et al to return TAI, because nobody who calls gettimeofday() is expecting it, and because the relevant standard guarantees that adding 86400 to a time will always give you the same time on the next day. A *lot* of code depends on this assumption, and you can't sensibly distinguish between code that wants 'exactly one day from now' and code that wants 'the same time, tomorrow' -- which suddenly become different things, though the authors of the code making that assumption pretty much universally didn't expect that.
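For reference, the guarantee comes from the POSIX definition of "Seconds Since the Epoch", which is a pure arithmetic formula over the broken-down UTC time in which every day counts as exactly 86400 seconds. The sketch below transcribes that formula into C; the function name `posix_seconds_since_epoch()` is made up for illustration and is not a libc call.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Illustrative transcription of the POSIX "Seconds Since the Epoch"
 * formula. `utc` must hold a broken-down UTC time with tm_year (years
 * since 1900) and tm_yday (days since January 1) filled in. */
long long posix_seconds_since_epoch(const struct tm *utc)
{
    assert(utc != NULL);

    long long year = utc->tm_year;

    return utc->tm_sec
         + utc->tm_min  * 60LL
         + utc->tm_hour * 3600LL
         + utc->tm_yday * 86400LL
         + (year - 70) * 31536000LL
         + ((year - 69) / 4) * 86400LL      /* leap years since 1970 */
         - ((year - 1) / 100) * 86400LL     /* minus century years */
         + ((year + 299) / 400) * 86400LL;  /* plus every 400th year */
}
```

Feeding in 2012-06-30 23:59:60 UTC (the leap second) and 2012-07-01 00:00:00 UTC yields the same value, 1341100800. The leap second has no distinct representation, which is exactly why adding 86400 always lands on the same UTC clock reading the next day.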
Leaping seconds and looping servers
Posted Jul 10, 2012 16:09 UTC (Tue) by rschroev (subscriber, #4164)
> ... and because the relevant standard guarantees that adding 86400 to a time will always give you the same time on the next day. A *lot* of code depends on this assumption, ...
That assumption is already wrong twice a year in places that observe daylight saving time, by a full hour in most places, which is a lot more than the difference between TAI and UTC. Code that depends on it is already incorrect.
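The local-time effect can be seen with a short C sketch, assuming a POSIX system. The helper names `local_hour()` and `hour_shift_after_86400()` are invented for this example, and the result depends on the TZ setting of the process.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>  /* setenv(), only needed if the caller pins TZ as in the usage note */
#include <time.h>

/* Hypothetical helper: hour of day (0-23) that the local wall clock
 * shows at time t, or -1 if the conversion fails. */
int local_hour(time_t t)
{
    struct tm tm;

    if (localtime_r(&t, &tm) == NULL)
        return -1;
    return tm.tm_hour;
}

/* Hypothetical helper: by how many hours the local clock reading changes
 * when exactly 86400 seconds are added to t. Zero means "same time
 * tomorrow". The plain subtraction is only meaningful for times well
 * away from midnight, which is enough for this illustration. */
int hour_shift_after_86400(time_t t)
{
    return local_hour(t + 86400) - local_hour(t);
}
```

With `setenv("TZ", "EST5EDT,M3.2.0,M11.1.0", 1); tzset();` (a POSIX-style rule for US Eastern time), `hour_shift_after_86400(1331398800)` is 1: noon on 2012-03-10 plus 86400 seconds is 13:00 on 2012-03-11, because DST began that morning. In UTC the shift is always 0, which is the case the quoted guarantee covers.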
Leaping seconds and looping servers
Posted Jul 9, 2012 20:49 UTC (Mon) by Jonno (subscriber, #49613)
> It would not just break POSIX but a huge number of applications
Very few applications make syscalls directly; almost all of them go through glibc, which already converts times using tzdata to get time-zone handling right.
> unless you complicated glibc to do the conversion, in which case it would still be obviously broken in some major cases, as the time you got from gettimeofday() would be different from the time visible on a file you just created when you stat()ted it
Of course not, all not-time-zone-aware times would be TAI, and all time-zone-aware times would be correct for that time-zone. UTC would be just another time-zone (which it already is, though today the kernel-time to UTC conversion is trivial).
> And for what benefit? Moving a pile of complexity [...] into glibc and a vast body of applications.
Applications would need no more complexity than what they already need for correct time-zone handling. The small number of applications that lack that (small) complexity today and thus use UTC exclusively (and are thus wrong for all users at least half the year) would just start using TAI instead (and thus be wrong for all users all year). All other applications would work just fine without any change.
tz-data would need to start carrying leap second information, and glibc would need to make use of it, but that is trivial compared to what they already deal with (leap seconds are after all the same in all jurisdictions).
The only real problem is handling the transition correctly. I believe fixes would only have to be added to three components for it to work.
glibc would need to detect the kernel version and decide whether to use the leap second information from tz-data depending on whether the kernel runs in UTC or in TAI.
hwclock would need to use glibc to convert between UTC and kernel time (just like it does today when it converts between local time and kernel time for dual-boot systems whose BIOS clock runs in local time).
NTP would need to either introduce a flag day when the NTP pool switches from UTC to TAI, or (more likely) add some compatibility code so new NTP versions that speak TAI can communicate with old NTP versions that speak UTC.
Leaping seconds and looping servers
Posted Jul 10, 2012 16:02 UTC (Tue) by nix (subscriber, #2304)
You completely ignored everything I mentioned about filesystems, networking, the intersection of the two, and other routes for times out of the kernel which do not pass through glibc (or, if they do,. Since this is the nub of the problem, requiring replication of rarely-tested TAI-to-UTC conversion code in many places where it is currently centralized in two (NTP and the kernel), it is not surprising that you thought there was no real problem. There is.