Re: [PATCH 0/2][RFC] Potential fix for leapsecond caused futex issue (v2)

Posted Jul 2, 2012 18:53 UTC (Mon) by jhhaller (guest, #56103)
In reply to: Re: [PATCH 0/2][RFC] Potential fix for leapsecond caused futex issue (v2) by Baylink
Parent article: Re: [PATCH 0/2][RFC] Potential fix for leapsecond caused futex issue (v2)

There are actually a couple of ways leap seconds are implemented in GNU/Linux (we can't blame the whole mess on the kernel). Three parts are involved in handling leap seconds: ntpd, the kernel, and glibc.

To use glibc with leap seconds, "right" timezone files must be used, e.g. US_Central_right. This allows the ISO version of timestamps to be used, and the clock will indeed tick at 235958, 235959, 235960, 000000. These are not the default timezones, for reasons described below. Note that this requires the count of seconds since January 1, 1970 to include all leap seconds.

With the conventional timezones, glibc has no notion of leap seconds. This is the way ctime has worked since the beginning. It ensures that every year has the same number of seconds, except for leap years, which have an extra day. This obviously causes problems when there is a leap second: the kernel has to play one second twice, since the kernel clock can't know about leap seconds if ctime doesn't. The current interfaces offer no way to report a time together with an indication that this kernel time represents a leap second. Also, the POSIX definition of time does not account for leap seconds. The only option is to replay the time value for second 59, as ctime has no way to know a leap second is coming and so cannot show the displayed second as 60.
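To make the point about the POSIX definition concrete, here is a minimal sketch in Python. It relies on calendar.timegm, which applies the same "every day is 86400 seconds" arithmetic; the helper name posix_seconds is made up for this illustration.

```python
import calendar

def posix_seconds(year, month, day, hour, minute, second):
    # calendar.timegm does plain arithmetic on the fields and treats
    # every day as exactly 86400 seconds, as the POSIX formula does.
    return calendar.timegm((year, month, day, hour, minute, second, 0, 0, 0))

before = posix_seconds(2012, 6, 30, 23, 59, 59)  # last ordinary second of the day
leap = posix_seconds(2012, 6, 30, 23, 59, 60)    # the inserted leap second
after = posix_seconds(2012, 7, 1, 0, 0, 0)       # first second of the next day

# Three distinct UTC seconds, but only two distinct POSIX values.
print(before, leap, after)
```

The formula gives 23:59:60 the same number as 00:00:00 of the next day. Whichever second ends up repeated on a real system (the comment above describes replaying second 59), one POSIX value has to cover two UTC seconds.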

Now, to throw NTP into the mix. The NTP protocol reports time in UTC, which is kept close to solar time by means of leap seconds. There is no history of leap seconds in the NTP protocol, just an indication that the last minute of an upcoming day will have 59, 60, or 61 seconds. While NTP could in theory run with the "right" timezones, and add or subtract historical leap seconds when setting the system time, that would make the value returned by the time call incorrect according to POSIX.
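For reference, that indication is the two-bit leap indicator at the top of the first byte of an NTP packet header, followed by a three-bit version and a three-bit mode. A small sketch of decoding it; the function name and the sample byte are made up for this illustration.

```python
# Meaning of the 2-bit leap indicator (LI) field in the NTP header.
LEAP_MEANING = {
    0: "no warning",
    1: "last minute of the day has 61 seconds",
    2: "last minute of the day has 59 seconds",
    3: "unknown (clock unsynchronized)",
}

def parse_first_byte(b):
    """Split the first header byte into (leap indicator, version, mode)."""
    li = (b >> 6) & 0x3    # top two bits
    vn = (b >> 3) & 0x7    # next three bits
    mode = b & 0x7         # low three bits
    return li, vn, mode

# Sample byte 0b01_100_100: LI=1, version 4, mode 4 (server).
li, vn, mode = parse_first_byte(0x64)
print(LEAP_MEANING[li], vn, mode)
```

Note that the packet only announces the upcoming event; nothing in this field tells the client how many leap seconds have occurred in the past.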

In short, POSIX is inconsistent with itself, or at least needs a new kernel API to reflect historical leap seconds, both for time and adjtime, although it appears that this was known when the time system call was standardized.

Re: [PATCH 0/2][RFC] Potential fix for leapsecond caused futex issue (v2)

Posted Jul 2, 2012 20:18 UTC (Mon) by chloe_zen (guest, #8258)

Should ntp not use a gradual clock skew instead of simply slamming the time to its new value?

Re: [PATCH 0/2][RFC] Potential fix for leapsecond caused futex issue (v2)

Posted Jul 2, 2012 20:36 UTC (Mon) by Thue (guest, #14277)

In any sane standard, the clock (and NTP) should be using TAI ( http://en.wikipedia.org/wiki/International_Atomic_Time ). The same way as the computer's clock doesn't include time zones, it shouldn't include leap seconds.

Time zone and leap second offsets should be added in user space programs, the same way I assume time zones currently are.
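A rough sketch of what that user-space step could look like, assuming the clock counts TAI seconds as "POSIX value plus TAI-UTC offset". The table holds only the two entries relevant to 2012, and the function names are invented for this illustration.

```python
import time

# (POSIX timestamp at which the offset takes effect, TAI-UTC in seconds).
# Excerpt only: valid for TAI counts from 2009 onwards.
LEAP_TABLE = [
    (1230768000, 34),  # 2009-01-01 00:00:00 UTC
    (1341100800, 35),  # 2012-07-01 00:00:00 UTC
]

def tai_to_utc(tai):
    """Return (posix_timestamp, is_leap_second) for a TAI-based count."""
    offset = LEAP_TABLE[0][1]
    for effective, new_offset in LEAP_TABLE[1:]:
        if tai >= effective + new_offset:
            offset = new_offset           # new offset already in force
        elif tai == effective + new_offset - 1:
            return effective - 1, True    # the inserted second, 23:59:60
    return tai - offset, False

def format_utc(tai):
    posix, is_leap = tai_to_utc(tai)
    t = time.gmtime(posix)
    sec = 60 if is_leap else t.tm_sec
    return "%04d-%02d-%02d %02d:%02d:%02d" % (
        t.tm_year, t.tm_mon, t.tm_mday, t.tm_hour, t.tm_min, sec)

for tai in (1341100833, 1341100834, 1341100835):
    print(tai, format_utc(tai))
```

The underlying count never repeats or jumps; only the conversion to a displayed UTC time knows about the leap second, much as time zone offsets are applied today.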

Re: [PATCH 0/2][RFC] Potential fix for leapsecond caused futex issue (v2)

Posted Jul 2, 2012 20:45 UTC (Mon) by chloe_zen (guest, #8258)

I don't disagree, but that's beside the point IMO. Sometimes clock drift happens. When ntp is called on to fix that drift -- whether due to stupid standards or everyday imprecision -- shouldn't it use the system calls that are designed for adjusting the system clock's fundamental speed, instead of just saying "ok your time is different NOW!" ?

Re: [PATCH 0/2][RFC] Potential fix for leapsecond caused futex issue (v2)

Posted Jul 2, 2012 21:08 UTC (Mon) by Thue (guest, #14277)

Of course we would still need NTP if the system clock were set to TAI, and of course the same gradual clock adjustment as now should be used. The use of NTP is orthogonal to the UTC vs TAI as system clock argument.

Re: [PATCH 0/2][RFC] Potential fix for leapsecond caused futex issue (v2)

Posted Jul 2, 2012 21:40 UTC (Mon) by chloe_zen (guest, #8258)

Er, "the same gradual clock adjust as now" isn't so gradual, is it? Else this bug wouldn't have hit?

Re: [PATCH 0/2][RFC] Potential fix for leapsecond caused futex issue (v2)

Posted Jul 2, 2012 22:05 UTC (Mon) by Thue (guest, #14277)

The current clock implementation is only non-gradual at leap seconds.

gradual clock adjust

Posted Jul 2, 2012 23:11 UTC (Mon) by pflugstad (subscriber, #224)

FWIW, using NTP to gradually account for a leap second is what Google decided to do:

<http://googleblog.blogspot.com/2011/09/time-technology-an...>

Note that they don't explicitly say over what time window they adjust the time, but my impression from the above article is that it's done over a few hours, not over an entire day.
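To illustrate the general idea of spreading the extra second over a window, here is a sketch of a plain linear smear. This is not a claim about the curve or window Google uses; the two-hour window and the function name are assumptions made up for this example.

```python
LEAP_AT = 1341100800   # POSIX value of 2012-07-01 00:00:00 UTC
WINDOW = 7200          # hypothetical smear window in seconds (an assumption)

def smeared_offset(t, leap_at=LEAP_AT, window=WINDOW):
    """Fraction of the extra second already absorbed at time t."""
    start = leap_at - window
    if t <= start:
        return 0.0                    # smear has not started yet
    if t >= leap_at:
        return 1.0                    # the whole second has been absorbed
    return (t - start) / window       # linear ramp in between

# The clock runs slow by 1/WINDOW during the window (about 139 ppm here).
for t in (LEAP_AT - WINDOW, LEAP_AT - WINDOW // 2, LEAP_AT):
    print(t, smeared_offset(t))
```

With a smear like this the reported time never repeats or jumps; the cost is that it disagrees with true UTC by up to a second while the window is open.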

Re: [PATCH 0/2][RFC] Potential fix for leapsecond caused futex issue (v2)

Posted Jul 3, 2012 14:19 UTC (Tue) by Tobu (subscriber, #24111)

NTP merely informs the kernel (using adjtimex) that a leap second is being inserted. The kernel implementation makes an internal timestamp jump, but that's because the kernel counts using POSIX timestamps, which is an implementation decision. If the kernel's internal timekeeping used TAI, the kernel would cross-reference the adjtimex notification with some sort of leap seconds table, which it would use whenever it needs to come up with a POSIX timestamp (for much of its ABI including protocols, filesystem formats, and system calls). NTP is agnostic about how the kernel clock is run.
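The internal timestamp jump can be pictured with a toy model. The state names and values are the ones documented in adjtimex(2); the ToyClock class itself is made up for this illustration and is a simplification, not the kernel's actual code.

```python
# Clock states as documented in adjtimex(2).
TIME_OK, TIME_INS, TIME_OOP, TIME_WAIT = 0, 1, 3, 4

class ToyClock:
    def __init__(self, now, insert_pending):
        self.now = now
        # TIME_INS: a leap second insertion has been announced for end of day.
        self.state = TIME_INS if insert_pending else TIME_OK

    def tick(self):
        """Advance one real second and return the POSIX timestamp."""
        self.now += 1
        if self.state == TIME_INS and self.now % 86400 == 0:
            self.now -= 1            # step back: second 59 is played again
            self.state = TIME_OOP    # leap second in progress
        elif self.state == TIME_OOP:
            self.state = TIME_WAIT   # leap second has occurred
        return self.now

clock = ToyClock(1341100798, insert_pending=True)
stamps = [clock.tick() for _ in range(4)]
print(stamps)
```

Four real seconds pass but the timestamp advances by only three, which is the jump described above. A kernel counting in TAI would keep its internal count strictly increasing and produce this sequence only when asked for a POSIX timestamp.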