Kernel prepatch 3.5-rc5

Posted Jul 1, 2012 19:09 UTC (Sun) by theophrastus (guest, #80847)
Parent article: Kernel prepatch 3.5-rc5

Well done. thanks be to Linus! looking forward to 3.5.

now my annoying doesn't-belong-anywhere question (please ignore it as you should):

Can anyone explain in technical terms why or how a leap second causes problems for large servers? (e.g. http://www.wired.com/wiredenterpris /2012/07/leap-second-bug-wreaks-havoc-with-java-linux/ and I guess it *wasn't* the problem with the_pirate_bay.) So OK, "Java" is not my favorite programming language, but I still don't see:

if(time.hour == 23 and time.min == 59 and time.sec == 60) {
process.go(nuts)
}


Kernel prepatch 3.5-rc5

Posted Jul 1, 2012 20:14 UTC (Sun) by Tobu (subscriber, #24111) [Link]

Apparently the bug was noticed (presumably some forward-looking organisations do run tests with clocks running a few months fast) and fixed in 3.4 but not publicised and not CC-ed to stable. So there's a process problem in that the test->publicise loop wasn't closed. And the bug wasn't noticed after it was pulled in 2008, presumably because no one ran leap second tests that flagged an abnormally high cpu load.

Leap seconds & the kernel

Posted Jul 1, 2012 20:27 UTC (Sun) by Tobu (subscriber, #24111) [Link]

Except there was a leap second in December 2008, so there's more to it than that.

Leap seconds & the kernel

Posted Jul 1, 2012 20:31 UTC (Sun) by Cyberax (✭ supporter ✭, #52523) [Link]

And we had kernel crashes back then on loaded systems :)

Well, at least we're safe for 3-4 more years.

Leap seconds & the kernel

Posted Jul 1, 2012 20:55 UTC (Sun) by theophrastus (guest, #80847) [Link]

I guess what i was hoping for - and thankee for any insights! - was more along the lines of: "...process_table scheduler counts up to a hardwired 86,400 and then increments day_count..."? And what does Java have to do with it (as various reports have claimed)?

Leap seconds & the kernel

Posted Jul 1, 2012 21:09 UTC (Sun) by Cyberax (✭ supporter ✭, #52523) [Link]

No, it's more like "incremental seconds counter and absolute time diverge in an unexpected way".
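One way to picture that divergence is the toy model below. It is a sketch under stated assumptions, not kernel code: every name is hypothetical, and it assumes the timer code keeps its own cached copy of the offset between the monotonic counter and wall-clock time. If the leap second steps the wall clock back without refreshing that copy, a short wall-clock deadline looks as if it has already passed.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy model only: these names are hypothetical, not kernel identifiers. */
struct toy_clock {
    int64_t mono_ms;          /* monotonic counter, never steps            */
    int64_t wall_offset_ms;   /* authoritative: wall = mono + offset       */
    int64_t cached_offset_ms; /* copy assumed to be held by the timer code */
};

/* Current wall-clock time according to the authoritative offset. */
int64_t toy_wall_now(const struct toy_clock *c)
{
    return c->mono_ms + c->wall_offset_ms;
}

/* Insert a leap second: the wall clock repeats one second, so the
 * offset shrinks by 1000 ms. If notify is false, the cached copy used
 * by the timer code goes stale. */
void toy_insert_leap(struct toy_clock *c, bool notify)
{
    c->wall_offset_ms -= 1000;
    if (notify)
        c->cached_offset_ms = c->wall_offset_ms;
}

/* Timer code: translate a wall-clock deadline to monotonic time using
 * the cached offset and report whether it has already expired. */
bool toy_deadline_expired(const struct toy_clock *c, int64_t wall_deadline_ms)
{
    int64_t mono_deadline = wall_deadline_ms - c->cached_offset_ms;
    return mono_deadline <= c->mono_ms;
}
```

In this model, a caller that asks for "wall time now plus 500 ms" after an unannounced leap second gets an immediate expiry, and a caller that retries in a loop would spin.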

Leap seconds & the kernel

Posted Jul 1, 2012 22:03 UTC (Sun) by alankila (subscriber, #47141) [Link]

Given that a wide array of programs was affected, not just Java, it was a more general issue. Apparently futexes timed out immediately after the leap second was applied, and this caused high CPU usage.

The freakish thing was that just restarting the affected program did not fix it. I'm still waiting for the full LWN exposé of this particular issue...

Leap seconds & the kernel

Posted Jul 2, 2012 8:54 UTC (Mon) by Cyberax (✭ supporter ✭, #52523) [Link]

Looks like the problem has been identified: http://article.gmane.org/gmane.linux.kernel/1321284

Oh well, I guess I'll wait 4 more years to see if it's fixed by that time :)

Kernel prepatch 3.5-rc5

Posted Jul 2, 2012 4:22 UTC (Mon) by jstultz (subscriber, #212) [Link]

Here's my analysis for what most folks seem to be seeing:
https://lkml.org/lkml/2012/7/1/203

Proposed fix thread starts here:
https://lkml.org/lkml/2012/7/1/176

Kernel prepatch 3.5-rc5

Posted Jul 2, 2012 5:32 UTC (Mon) by theophrastus (guest, #80847) [Link]

fascinating. thank you very much for your efforts into something that the linux community clearly needed resolved.

how did you decide to make it 10 seconds offset? does that include a substantial padding against unstably reset clocks?
...
/* Calculate the next leap second */
tv.tv_sec += 86400 - tv.tv_sec % 86400;
/* Set the time to be 10 seconds from that time */
tv.tv_sec -= 10;
settimeofday(&tv, NULL);
...

Kernel prepatch 3.5-rc5

Posted Jul 2, 2012 9:36 UTC (Mon) by benja (guest, #70385) [Link]

Hi, I think you're confused: this is a user-mode test case to trigger the bug, not the actual kernel fix. IIUC, the code sets the system time to 10 seconds before the next UTC midnight (23:59:50 UTC), then tells the kernel that it needs to insert a leap second, which will be done at 23:59:60. BR.
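The arithmetic in the quoted snippet can be checked in isolation, without touching the system clock. The sketch below only restates the two `tv.tv_sec` lines as pure functions. The helper names are made up for illustration, and it assumes a non-negative `time_t` that counts exactly 86400 seconds per day.

```c
#include <time.h>

#define SECONDS_PER_DAY 86400

/* Hypothetical helper: the next UTC midnight strictly after `now`.
 * Mirrors `tv.tv_sec += 86400 - tv.tv_sec % 86400;` from the test case.
 * Assumes now >= 0. */
time_t next_utc_midnight(time_t now)
{
    return now + (SECONDS_PER_DAY - now % SECONDS_PER_DAY);
}

/* Hypothetical helper: 10 seconds before that midnight, i.e. the value
 * the test case passes to settimeofday(). */
time_t ten_seconds_before_midnight(time_t now)
{
    return next_utc_midnight(now) - 10;
}
```

For noon UTC on 2012-06-30 (1341057600), `ten_seconds_before_midnight` gives 1341100790, which is 23:59:50 UTC on that day.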

Kernel prepatch 3.5-rc5

Posted Jul 2, 2012 15:15 UTC (Mon) by theophrastus (guest, #80847) [Link]

you're correct, (i'm incorrect), i read the "proposed fix thread" and 'leap'ed that it was a proposed fix patch. ...shutt'n-up now boss.

Kernel prepatch 3.5-rc5

Posted Jul 2, 2012 14:25 UTC (Mon) by kjp (subscriber, #39639) [Link]

OK, it's high time (haha) to get moving on UTC-SLS (smoothed leap seconds). This is insanity.

Seriously, is there an NTP daemon that can do this?

- Karl

Kernel prepatch 3.5-rc5

Posted Jul 2, 2012 16:03 UTC (Mon) by mikemol (subscriber, #83507) [Link]

I was under the impression that NTP updated small time differences using the 'slew' method. I honestly expected the leap second would slowly propagate via stratum after stratum of clock slewing...

It seems I need to read up more on NTP.

Kernel prepatch 3.5-rc5

Posted Jul 2, 2012 16:30 UTC (Mon) by dgm (subscriber, #49227) [Link]

A time difference is not the same as a leap second. The former comes from a clock running out of sync with the reference one, while the latter means some minutes are 61 seconds long on all clocks, even the reference one.
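Part of why the 61-second minute is awkward for software: POSIX time counts every day as exactly 86400 seconds, so the label 23:59:60 has no value of its own. The sketch below applies that day-based formula directly. The function name is made up for illustration, and the day numbers are days since 1970-01-01.

```c
#include <stdint.h>

/* Hypothetical helper: seconds since the epoch for a UTC time of day,
 * using the POSIX rule that every day contributes exactly 86400 seconds. */
int64_t posix_seconds(int64_t days_since_epoch, int hour, int min, int sec)
{
    return days_since_epoch * 86400 + hour * 3600 + min * 60 + sec;
}
```

Day 15521 is 2012-06-30. `posix_seconds(15521, 23, 59, 60)` and `posix_seconds(15522, 0, 0, 0)` both give 1341100800, so the leap second and the first second of July 1 share one timestamp.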

Kernel prepatch 3.5-rc5

Posted Jul 2, 2012 16:33 UTC (Mon) by mikemol (subscriber, #83507) [Link]

My expectation was that the leap second would be *communicated* as a time difference. I.e. have the stratum N time sources appear to themselves to have gotten 1 second behind their stratum N-1 time sources, and slew back to resync.

Leap seconds

Posted Jul 2, 2012 20:11 UTC (Mon) by midg3t (subscriber, #30998) [Link]

Impending leap seconds are explicitly signaled in NTP payloads on the day of the leap second. Search for "Leap Indicator" in RFC 2030.
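For reference, the Leap Indicator is the top two bits of the first octet of the NTP header, followed by a 3-bit version number and a 3-bit mode. A minimal sketch of decoding that octet is below. The struct, enum, and function names are made up for illustration, and the code only decodes the byte, it does no network I/O.

```c
#include <stdint.h>

/* Leap Indicator values as listed in RFC 2030. */
enum ntp_leap {
    NTP_LEAP_NONE   = 0, /* no warning                               */
    NTP_LEAP_INSERT = 1, /* last minute of the day has 61 seconds    */
    NTP_LEAP_DELETE = 2, /* last minute of the day has 59 seconds    */
    NTP_LEAP_ALARM  = 3  /* alarm condition (clock not synchronized) */
};

/* Hypothetical container for the three fields packed into octet 0. */
struct ntp_flags {
    unsigned leap;    /* LI, 2 bits   */
    unsigned version; /* VN, 3 bits   */
    unsigned mode;    /* Mode, 3 bits */
};

/* Split the first octet of an NTP packet into LI, VN and Mode. */
struct ntp_flags ntp_parse_flags(uint8_t first_octet)
{
    struct ntp_flags f;
    f.leap    = (first_octet >> 6) & 0x3u;
    f.version = (first_octet >> 3) & 0x7u;
    f.mode    = first_octet & 0x7u;
    return f;
}
```

A server reply whose first octet is 0x64 decodes to LI = 1, version 4, mode 4: a leap second will be inserted at the end of the current day.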

Copyright © 2013, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds