Re: [PATCH 0/2][RFC] Potential fix for leapsecond caused futex issue (v2)
| From: | John Stultz <johnstul-AT-us.ibm.com> | |
| To: | Linux Kernel <linux-kernel-AT-vger.kernel.org> | |
| Subject: | Re: [PATCH 0/2][RFC] Potential fix for leapsecond caused futex issue (v2) | |
| Date: | Sun, 01 Jul 2012 15:05:08 -0700 | |
| Message-ID: | <4FF0C994.2020300@us.ibm.com> | |
| Cc: | Prarit Bhargava <prarit-AT-redhat.com>, stable-AT-vger.kernel.org, Thomas Gleixner <tglx-AT-linutronix.de>, Jan Engelhardt <jengelh-AT-inai.de> | |
On 07/01/2012 11:29 AM, John Stultz wrote:
> TODOs:
> * Chase down the futex/hrtimer interaction to see if this could
> be triggered in any other way.
Ok, I've figured out a somewhat more detailed diagnosis of what is going on:
* Leap second occurs, CLOCK_REALTIME is set back one second.
* As clock_was_set() is not called, the hrtimer base.offset value for
CLOCK_REALTIME is not updated, thus its sense of wall time is one second
ahead of the timekeeping core's.
* At interrupt time (T), the hrtimer code expires all CLOCK_REALTIME
based timers set for T+1s and before, causing early expirations for
timers between T and T+1s since the hrtimer code's sense of time is one
second ahead.
* This causes all TIMER_ABSTIME CLOCK_REALTIME timers to expire one
second early.
* More problematically, all sub-second TIMER_ABSTIME CLOCK_REALTIME
timers will return immediately. If any such timer calls are done in a
loop (as commonly done with futex_wait or other timeouts), this will
cause load spikes in those applications.
* This state persists until clock_was_set() is called (most easily done
via settimeofday())
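
The sequence above can be modelled in a few lines of userspace C. This is only a toy sketch of the mechanism described in the list, not kernel code: `struct toy_clock` and the `toy_*` functions are hypothetical names invented for illustration, and the `notify` flag stands in for whether clock_was_set() runs after the leap second is applied.

```c
#include <stdbool.h>
#include <stdint.h>

#define NSEC_PER_SEC 1000000000LL

/* Toy stand-ins for kernel state; none of these are real kernel symbols. */
struct toy_clock {
    int64_t mono_ns;        /* monotonic time, never stepped */
    int64_t core_offset_ns; /* timekeeping core: realtime = mono + core_offset */
    int64_t base_offset_ns; /* hrtimer code's cached copy of that offset */
};

/* What a CLOCK_REALTIME read would report in this model. */
static int64_t toy_realtime(const struct toy_clock *c)
{
    return c->mono_ns + c->core_offset_ns;
}

/* Insert a leap second: step wall time back by one second.
 * 'notify' models whether clock_was_set() runs afterwards. */
static void toy_insert_leap(struct toy_clock *c, bool notify)
{
    c->core_offset_ns -= NSEC_PER_SEC;
    if (notify)
        c->base_offset_ns = c->core_offset_ns;
}

/* Expiry check as the hrtimer side sees it, using its cached offset. */
static bool toy_timer_expired(const struct toy_clock *c, int64_t abs_expiry_ns)
{
    return c->mono_ns + c->base_offset_ns >= abs_expiry_ns;
}
```

In this model, calling toy_insert_leap() with notify set to false and then arming an absolute timer half a second past toy_realtime() makes toy_timer_expired() report true immediately, which is the early-return behaviour described above. With notify set to true the same timer is still pending.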
I've used the attached test case to demonstrate triggering a leap-second
and its effect on CLOCK_REALTIME hrtimers.
The test sets a leapsecond to trigger in 10 seconds, then in a loop
sleeps for half a second via clock_nanosleep, printing out the current
time, and the delta from the target wakeup time for 30 seconds.
When the leap second triggers, on affected machines you'll see the
output stream by quickly, with negative diff values, as clock_nanosleep is
returning immediately.
To build:
gcc leaptest-timer.c -o leaptest-timer -lrt
I've reproduced this behaviour in kernel versions:
v3.5-rc4
v2.6.37
v2.6.32.59
(And quite likely all in-between).
I haven't been able to build or boot anything earlier with the distro on
my current test boxes, but I'm working to get an older distro installed so
I can do further testing.
The issue has likely been around since commit
746976a301ac9c9aa10d7d42454f8d6cdad8ff2b in v2.6.22, as Ben Blum
and Jan Ceuleers already noted.
With my fix to call clock_was_set when we apply a leapsecond, I no
longer see the issue.
thanks
-john
/* Leap second timer test
 *    by: john stultz (johnstul@us.ibm.com)
 *    (C) Copyright IBM 2012
 *    Licensed under the GPL
 */
#include <stdio.h>
#include <time.h>
#include <unistd.h>     /* sleep() */
#include <sys/time.h>
#include <sys/timex.h>

#define CALLS_PER_LOOP 64
#define NSEC_PER_SEC 1000000000ULL

/* Return ts advanced by ns nanoseconds, with tv_nsec normalized */
struct timespec timespec_add(struct timespec ts, unsigned long long ns)
{
    ts.tv_nsec += ns;
    while (ts.tv_nsec >= NSEC_PER_SEC) {
        ts.tv_nsec -= NSEC_PER_SEC;
        ts.tv_sec++;
    }
    return ts;
}

/* Return a - b; both fields are negative when a is earlier than b */
struct timespec timespec_diff(struct timespec a, struct timespec b)
{
    long long ns;
    int neg = 0;

    ns = a.tv_sec * NSEC_PER_SEC + a.tv_nsec;
    ns -= b.tv_sec * NSEC_PER_SEC + b.tv_nsec;

    if (ns < 0) {
        neg = 1;
        ns = -ns;
    }
    a.tv_sec = ns / NSEC_PER_SEC;
    a.tv_nsec = ns % NSEC_PER_SEC;

    if (neg) {
        a.tv_sec = -a.tv_sec;
        a.tv_nsec = -a.tv_nsec;
    }
    return a;
}

int main(void)
{
    struct timeval tv;
    struct timex tx;
    int i, inconsistent;
    long now, then;
    struct timespec ts;
    int clock_type = CLOCK_REALTIME;
    int flag = TIMER_ABSTIME;
    long long sleeptime = NSEC_PER_SEC/2;

    /* clear TIME_WAIT */
    tx.modes = ADJ_STATUS;
    tx.status = 0;
    adjtimex(&tx);

    sleep(2);

    /* Get the current time */
    gettimeofday(&tv, NULL);

    /* Calculate the next leap second */
    tv.tv_sec += 86400 - tv.tv_sec % 86400;

    /* Set the time to be 10 seconds from that time */
    tv.tv_sec -= 10;
    settimeofday(&tv, NULL);

    /* Set the leap second insert flag */
    tx.modes = ADJ_STATUS;
    tx.status = STA_INS;
    adjtimex(&tx);

    clock_gettime(clock_type, &ts);
    now = then = ts.tv_sec;
    while (now - then < 30) {
        struct timespec target, diff, rem;

        rem.tv_sec = 0;
        rem.tv_nsec = 0;

        /* Absolute sleeps target "last reading + sleeptime" */
        if (flag == TIMER_ABSTIME)
            target = timespec_add(ts, sleeptime);
        else
            target = timespec_add(rem, sleeptime);

        clock_nanosleep(clock_type, flag, &target, &rem);
        clock_gettime(clock_type, &ts);

        /* A negative diff means we woke before the target time */
        diff = timespec_diff(ts, target);
        printf("now: %ld:%ld diff: %ld:%ld rem: %ld:%ld\n",
               ts.tv_sec, ts.tv_nsec,
               diff.tv_sec, diff.tv_nsec,
               rem.tv_sec, rem.tv_nsec);
        now = ts.tv_sec;
    }

    /* clear TIME_WAIT */
    tx.modes = ADJ_STATUS;
    tx.status = 0;
    adjtimex(&tx);

    return 0;
}
Posted Jul 2, 2012 16:09 UTC (Mon)
by Baylink (guest, #755)
[Link] (13 responses)
Do people not understand how leap seconds are implemented? Really?
235958
235959
235960
000000
000001
Not, as Red Hat seems to think:
235958
235959
235959
000000
000001
Or, as Google seems to think would be a Good Idea:
235958
235959
000000
000001
with *seconds being 1/86,401th of a second longer than other days* (no, I am not making any part of that up).
Posted Jul 2, 2012 16:24 UTC (Mon)
by Baylink (guest, #755)
[Link] (1 responses)
ISO 8601 actually permits 60 as a valid seconds count, for precisely this reason.
https://en.wikipedia.org/wiki/ISO_8601
I had thought that it, for some reason, permitted 61, too, but I was wrong.
Posted Jul 4, 2012 7:12 UTC (Wed)
by butlerm (subscriber, #13312)
[Link]
Changing the kernel's internal time base to use something TAI-derived instead of UTC-derived is probably the only way to fix this problem reliably. The downside is that the kernel would have to maintain a leap second table and convert back and forth between POSIX time and linear time where necessary.
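
A minimal sketch of the kind of table-driven conversion this would involve, written as plain userspace C. Everything here is an assumption for illustration: the `leap_entry` layout, the function names, the three-row table excerpt (valid only for timestamps from 1999 onward), and the choice to report the inserted second as a replay of 23:59:59 with a flag. "Linear" time is taken to be the POSIX count plus the TAI-UTC offset in force at that moment.

```c
#include <stddef.h>
#include <stdint.h>

/* One row per leap second: the first POSIX second after the insertion and
 * the TAI-UTC offset in force from that second on. Excerpt only. */
struct leap_entry {
    int64_t posix_start;
    int     tai_minus_utc;
};

static const struct leap_entry leap_table[] = {
    { 1136073600, 33 }, /* 2006-01-01 00:00:00 UTC */
    { 1230768000, 34 }, /* 2009-01-01 00:00:00 UTC */
    { 1341100800, 35 }, /* 2012-07-01 00:00:00 UTC */
};
#define LEAP_ENTRIES (sizeof(leap_table) / sizeof(leap_table[0]))
#define BASE_OFFSET 32  /* TAI-UTC in force before the first row (1999-2005) */

/* POSIX -> linear: add the offset in force at that POSIX second. */
static int64_t posix_to_linear(int64_t posix)
{
    int offset = BASE_OFFSET;
    for (size_t i = 0; i < LEAP_ENTRIES; i++)
        if (posix >= leap_table[i].posix_start)
            offset = leap_table[i].tai_minus_utc;
    return posix + offset;
}

/* Linear -> POSIX. Sets *in_leap when 'linear' is the inserted second
 * itself (23:59:60), which is reported here as a replay of 23:59:59. */
static int64_t linear_to_posix(int64_t linear, int *in_leap)
{
    *in_leap = 0;
    for (size_t i = LEAP_ENTRIES; i-- > 0; ) {
        int64_t first = leap_table[i].posix_start + leap_table[i].tai_minus_utc;
        if (linear >= first)
            return linear - leap_table[i].tai_minus_utc;
        if (linear == first - 1) {
            *in_leap = 1;
            return leap_table[i].posix_start - 1;
        }
    }
    return linear - BASE_OFFSET;
}
```

The linear count never steps backwards, so timers armed against it would not see a discontinuity. Only the conversion to POSIX time at the ABI boundary has to deal with the repeated second.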
Posted Jul 2, 2012 16:27 UTC (Mon)
by lindi (subscriber, #53135)
[Link] (1 responses)
1341100798
1341100799
1341100800
1341100800
1341100801
Isn't this exactly the correct behavior?
Posted Jul 2, 2012 17:24 UTC (Mon)
by Jonno (subscriber, #49613)
[Link]
However, there was some internal kernel code that expected to be informed whenever the time of day and elapsed time weren't continuous (done by calling clock_was_set()). The code for settimeofday() got this right, but the clock_was_set() call was missing from the code that inserts the leap second, leading to some trouble I don't really understand.
Posted Jul 2, 2012 18:53 UTC (Mon)
by jhhaller (guest, #56103)
[Link] (8 responses)
To use glibc with leap seconds, the "right" timezone files must be used, e.g. US_Central_right. This allows the ISO version of timestamps to be used, and the clock will indeed tick at 235958, 235959, 235960, 000000. These are not the default timezones, for reasons described below. Note that this requires the number of seconds since January 1, 1970 to account for all leap seconds.
With the conventional timezones, glibc has no notion of leap seconds. This is the way ctime has worked since the beginning. It ensures that every year has the same number of seconds, except for leap years, which have an extra day. This obviously causes problems when there is a leap second: the kernel has to replay a second, because the kernel clock can't know about leap seconds when ctime doesn't. The current interfaces offer no way to report a time and also flag that this kernel time represents a leap second. Also, the POSIX definition of time does not account for leap seconds. The only option is to replay the time value for second 59, as ctime has no way to know that a leap second is coming and that the displayed second should be 60.
Now, to throw NTP into the mix. The NTP protocol reports time in UTC, which follows solar time. There is no history of leap seconds in the NTP protocol, just an indication that an upcoming minute at the end of the day will have 59, 60, or 61 seconds. While NTP could in theory run with the "right" timezones, and add or subtract historical leap seconds when setting the system time, that would make the time returned from the time call incorrect according to POSIX.
In short, POSIX is inconsistent with itself, or at least needs a new kernel API to reflect historical leap seconds, both for time and adjtime, although it appears that this was known when the time system call was standardized.
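
The point about the POSIX definition can be made concrete with the "Seconds Since the Epoch" expression from the POSIX base definitions, which treats every day as exactly 86400 seconds. The sketch below is only an illustration: the function name `posix_seconds` is made up, and the arguments follow `struct tm` conventions (tm_year counts from 1900, tm_yday is zero-based).

```c
#include <stdint.h>

/* "Seconds since the Epoch" as POSIX defines it for a broken-down UTC time.
 * Integer division is intentional; it implements the leap-year rules. */
static int64_t posix_seconds(int tm_year, int tm_yday,
                             int tm_hour, int tm_min, int tm_sec)
{
    return tm_sec + tm_min * 60 + tm_hour * 3600
         + (int64_t)tm_yday * 86400
         + (int64_t)(tm_year - 70) * 31536000
         + (int64_t)((tm_year - 69) / 4) * 86400
         - (int64_t)((tm_year - 1) / 100) * 86400
         + (int64_t)((tm_year + 299) / 400) * 86400;
}
```

Feeding in 2012-06-30 23:59:60 (tm_year 112, tm_yday 181) and 2012-07-01 00:00:00 (tm_yday 182) gives 1341100800 for both, so the leap second has no timestamp of its own under this definition.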
Posted Jul 2, 2012 20:18 UTC (Mon)
by chloe_zen (guest, #8258)
[Link] (7 responses)
Posted Jul 2, 2012 20:36 UTC (Mon)
by Thue (guest, #14277)
[Link] (5 responses)
Time zone and leap second offsets should be added in user space programs, the same way I assume time zones currently are.
Posted Jul 2, 2012 20:45 UTC (Mon)
by chloe_zen (guest, #8258)
[Link] (4 responses)
Posted Jul 2, 2012 21:08 UTC (Mon)
by Thue (guest, #14277)
[Link] (2 responses)
Posted Jul 2, 2012 21:40 UTC (Mon)
by chloe_zen (guest, #8258)
[Link] (1 responses)
Posted Jul 2, 2012 22:05 UTC (Mon)
by Thue (guest, #14277)
[Link]
Posted Jul 2, 2012 23:11 UTC (Mon)
by pflugstad (subscriber, #224)
[Link]
<http://googleblog.blogspot.com/2011/09/time-technology-an...>
Note that they don't explicitly say over what time window they adjust the time, but my impression from the above article is that it's done over a few hours, not over an entire day.
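
As a rough illustration of what a gradual adjustment means, here is a sketch that spreads the extra second linearly over a window ending at the leap second. The linear shape, the function name `smear_fraction`, and the window length are all assumptions made for the example; nothing here is taken from Google's actual implementation, whose window is not stated in the article.

```c
#include <stdint.h>

/* Fraction (0.0 to 1.0) of the extra second already absorbed at time 'now'.
 * 'leap_epoch' is the POSIX second right after the leap second and 'window'
 * is the smear length in seconds (must be > 0). All times are in seconds. */
static double smear_fraction(int64_t now, int64_t leap_epoch, int64_t window)
{
    if (now >= leap_epoch)
        return 1.0;               /* smear finished */
    if (now <= leap_epoch - window)
        return 0.0;               /* smear not started yet */
    return (double)(now - (leap_epoch - window)) / (double)window;
}
```

With a hypothetical two-hour window (7200 seconds), a clock one hour before the leap second would already be running half a second behind an unsmeared clock, and would reach the full one-second correction exactly at the leap second without ever stepping.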
Posted Jul 3, 2012 14:19 UTC (Tue)
by Tobu (subscriber, #24111)
[Link]
gradual clock adjust
NTP merely informs the kernel (using adjtimex) that a leap second is being inserted. The kernel implementation makes an internal timestamp jump, but that's because the kernel counts using POSIX timestamps, which is an implementation decision. If the kernel's internal timekeeping used TAI, the kernel would cross-reference the adjtimex notification with some sort of leap seconds table, which it would use whenever it needs to come up with a POSIX timestamp (for much of its ABI including protocols, filesystem formats, and system calls). NTP is agnostic about how the kernel clock is run.
