
The leap second bug

A lot of people clearly want to be sure we're aware of the leap second bug, which causes load spikes on some systems. We'll definitely write something more comprehensive, but, for now:

  • Running date -s "`date`" seems to be an effective workaround.
  • See this note from John Stultz for a preliminary diagnosis of the problem; he also has a fix available for testing.

More soon.



The leap second bug

Posted Jul 2, 2012 15:12 UTC (Mon) by theophrastus (guest, #80847) [Link] (2 responses)

To the editors: you could consider moving the comment tree on the leap second issue (which I apologize for starting) from under "Kernel prepatch 3.5-rc5" to here, where it is more appropriate. As penitence, you could delete my comments as you do so. I just wanted LWN to be at the forefront of this rather interesting Linux topic.

The leap second bug

Posted Jul 2, 2012 15:17 UTC (Mon) by corbet (editor, #1) [Link] (1 response)

It never occurred to me to implement a "reparent these comments" functionality. I could do it in SQL, I guess, but we believe that a natural aversion to typing SQL directly at the production site is a healthy thing. So the best thing, for those who are interested, is to follow this pointer to said comments.

The leap second bug

Posted Jul 2, 2012 16:14 UTC (Mon) by Baylink (guest, #755) [Link]

I, too, may have commented in the wrong place on this:

https://lwn.net/Articles/504658/

The leap second bug

Posted Jul 2, 2012 17:22 UTC (Mon) by fest3er (guest, #60379) [Link] (2 responses)

Good Lord, I'm glad computers didn't exist in the 1700s when 2½ weeks were dropped; surely the universe would've imploded from the loss of all that time.

I don't understand the difficulty. Leap seconds, minutes, hours, days, and years are simply adjustments to our perception of 'now' and should be handled in the same way as all other date-conversion problems. Elapsed time is contiguous and constant*; our interpretation of elapsed time as a 'when' is discontiguous where adjustments are concerned.

* Fairly constant in our gravity well. Time passes at a slightly different pace up in orbit.

The leap second bug

Posted Jul 3, 2012 1:53 UTC (Tue) by kevinm (guest, #69913) [Link]

This is a great sentiment, but unfortunately POSIX doesn't agree, so we are stuck with having to munge time for leap seconds.

(POSIX time_t isn't the number of seconds since 01-JAN-1970 00:00 UTC - it's the number of whole days since then multiplied by 86400 plus the number of seconds since the most recent midnight UTC).
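For illustration, a minimal sketch of that rule in C, using the June 2012 leap second; the 15522-day count is worked out by hand for this example. Because POSIX leaves no slot for a 61st second, 23:59:60 and the following midnight collapse onto the same time_t value:

    /* Sketch of the POSIX rule: whole days since the epoch times 86400,
     * plus seconds since the most recent UTC midnight. */
    #include <stdio.h>

    int main(void)
    {
        long days_to_jul1 = 15522;  /* whole days from 1970-01-01 to 2012-07-01 */

        long midnight_jul1 = days_to_jul1 * 86400;                /* 2012-07-01 00:00:00 */
        long leap_second   = (days_to_jul1 - 1) * 86400 + 86400;  /* "2012-06-30 23:59:60" */

        printf("2012-07-01 00:00:00 UTC -> %ld\n", midnight_jul1);
        printf("2012-06-30 23:59:60 UTC -> %ld\n", leap_second);
        /* Both print 1341100800: the leap second has no time_t of its own. */
        return 0;
    }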

The leap second bug

Posted Jul 3, 2012 9:38 UTC (Tue) by Tobu (subscriber, #24111) [Link]

If you want to break convention and use TAI64 timestamps rather than POSIX timestamps (which are ambiguous when a leap second is inserted), you can use djb's libtai. Of course, that won't protect you from leap second bugs in the kernel or glibc. TAI64 timestamps can only be used internally, but you should already be using RFC 3339 dates for interoperability anyway, and you will need to maintain your own leap second table without the help of NTP. Finally, for many uses where the timestamps won't leave your process, you can just use clock_gettime() with CLOCK_MONOTONIC.
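A minimal sketch of that last option, assuming Linux (older glibc may need -lrt at link time):

    /* Measure an interval with CLOCK_MONOTONIC; the reading never jumps
     * backwards, regardless of leap seconds or clock steps. */
    #include <stdio.h>
    #include <time.h>
    #include <unistd.h>

    int main(void)
    {
        struct timespec start, end;

        clock_gettime(CLOCK_MONOTONIC, &start);
        sleep(1);                               /* stand-in for real work */
        clock_gettime(CLOCK_MONOTONIC, &end);

        double elapsed = (end.tv_sec - start.tv_sec)
                       + (end.tv_nsec - start.tv_nsec) / 1e9;
        printf("elapsed: %.3f s\n", elapsed);
        return 0;
    }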

The leap second bug

Posted Jul 2, 2012 18:04 UTC (Mon) by bluss (guest, #47454) [Link] (5 responses)

It seems it would be much simpler if leap second handling were dropped from the kernel and handled in userspace like a normal clock adjustment.

The leap second bug

Posted Jul 2, 2012 18:49 UTC (Mon) by dirtyepic (guest, #30178) [Link] (4 responses)

Try running an RTK GPS array with userspace clock adjustment. A one-second clock skew will only get you about 300m from your target. Hopefully it's not controlling any drones.

I imagine the NASDAQ people might also have a problem with this.

The leap second bug

Posted Jul 2, 2012 19:55 UTC (Mon) by cmccabe (guest, #60281) [Link] (3 responses)

> A one-second clock skew will only get you about 300m from your target.

I guess maybe you should be using CLOCK_MONOTONIC then? That is what it's for-- monotonic time.

The leap second bug

Posted Jul 4, 2012 6:27 UTC (Wed) by butlerm (subscriber, #13312) [Link] (2 responses)

CLOCK_MONOTONIC is of little use in clustered systems, unfortunately.

The leap second bug

Posted Jul 4, 2012 10:03 UTC (Wed) by cmccabe (guest, #60281) [Link] (1 response)

Well, CLOCK_MONOTONIC is useful for timeouts. And timeouts are often quite useful in a distributed system.

Having your 1 second mutex timedwait last for an hour because someone set the wall clock time back an hour, on the other hand, is not useful. Having your 1 second wait take 2 seconds because of a leap second is not useful.

Assuming that wall-clock time is properly synchronized across all nodes in a distributed system is usually not a good idea. It introduces operational complexity and more things to go wrong. And even the best maintained, properly synchronized clocks aren't usually really all that synchronized.

What we have here is a bad default, unfortunately. Most timeouts should not be in terms of wall-clock time-- ideally, none, except for calendar programs and similar.
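As a sketch of the alternative (assuming Linux; the names here are illustrative, not anyone's production code): a condition variable can be bound to CLOCK_MONOTONIC with pthread_condattr_setclock(), which makes the one-second timeout below immune to wall-clock steps and leap-second weirdness.

    #include <errno.h>
    #include <pthread.h>
    #include <time.h>

    static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
    static pthread_cond_t  cond;
    static int             ready;    /* predicate, guarded by 'lock' */

    static void init_cond(void)
    {
        pthread_condattr_t attr;
        pthread_condattr_init(&attr);
        pthread_condattr_setclock(&attr, CLOCK_MONOTONIC);
        pthread_cond_init(&cond, &attr);
        pthread_condattr_destroy(&attr);
    }

    /* Wait up to one second for 'ready'; returns nonzero if it became true. */
    static int wait_one_second(void)
    {
        struct timespec deadline;
        clock_gettime(CLOCK_MONOTONIC, &deadline);
        deadline.tv_sec += 1;

        pthread_mutex_lock(&lock);
        int rc = 0;
        while (!ready && rc != ETIMEDOUT)
            rc = pthread_cond_timedwait(&cond, &lock, &deadline);
        int ok = ready;
        pthread_mutex_unlock(&lock);
        return ok;
    }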

The leap second bug

Posted Jul 5, 2012 3:03 UTC (Thu) by dirtyepic (guest, #30178) [Link]

GPS satellites are generally synchronized to within ~10 ns, but the receivers have to operate in UTC and use a delta that's broadcast with the nav message, which cycles every 12.5 minutes. That's where I thought a leap second could have a chance to mess with phase tracking. It turns out I was wrong, though; there are failsafes built in specifically to prevent that from happening. Yay learning.

The leap second bug

Posted Jul 2, 2012 18:48 UTC (Mon) by slashdot (guest, #22014) [Link] (16 responses)

* Leap second occurs, CLOCK_REALTIME is set back one second.

OK, the terminal idiocy is right there.
Who thought of this dumb idea?

I mean, libc already has to handle timezones when converting to human time, so surely it can handle leap seconds as well, allowing an unambiguous absolute time representation which is also monotonic if time is correctly set and synchronized.

One would think that the POSIX standard committee would not approve such a totally broken interface that jumps back one second on perfectly working systems, but clearly that's not the case.

And obviously whoever used CLOCK_REALTIME to do timeouts not tied to real time is also not a genius.

I also fail to see why we have leap seconds at all, given that I don't think anybody cares if the sun sets a minute later on average, but that's not the real issue.

The leap second bug

Posted Jul 2, 2012 19:01 UTC (Mon) by slashdot (guest, #22014) [Link] (12 responses)

BTW, why the **** is software using subsecond timeouts in futexes?

Are these polling loops or perhaps code that fails if something holds a mutex for more than a second?

Do these clowns seriously put such crap in production software?

The leap second bug

Posted Jul 2, 2012 20:04 UTC (Mon) by clugstj (subscriber, #4020) [Link]

I assume your last question is rhetorical.

The leap second bug

Posted Jul 3, 2012 7:41 UTC (Tue) by farnz (subscriber, #17727) [Link] (4 responses)

As an example from a codebase I worked on (under VxWorks, not Linux, but the idea carries over) - you want a decoder thread to run immediately after the graphics thread has finished rendering a frame to screen, to repopulate the queue of decoded frames (if needed due to timestamps being missed by the graphics thread). Your graphics thread is set up to sync to vblank, and you know the expected frame rate; you don't want the decoder thread to stall completely if the graphics stalls due to complex rendering.

You know that in your environment, the decoder thread keeps a minimum of 5 decoded frames queued and ready to go; that's 80 milliseconds. You expect the graphics thread to wake the decoder up every 16 milliseconds, but know that if it doesn't, you can wait up to 50 milliseconds before you urgently need to catch up to ensure that when the graphics thread recovers, it has current data to display. You therefore set the timeout on the mutex sleep decoder-side to 50 milliseconds, knowing that if the graphics thread stalls for whatever reason, you will keep the queue filled with current frames.
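A minimal sketch of that pattern as it might look on Linux rather than VxWorks (wake_fd and the refill helper are hypothetical names, not from the codebase described above): the decoder blocks waiting for a wakeup from the render thread, but never for more than 50 milliseconds.

    #include <poll.h>
    #include <stdint.h>
    #include <unistd.h>

    void decoder_loop(int wake_fd)   /* e.g. an eventfd written by the renderer */
    {
        struct pollfd pfd = { .fd = wake_fd, .events = POLLIN };

        for (;;) {
            int n = poll(&pfd, 1, 50);    /* renderer wakeup or 50 ms, whichever first */
            if (n > 0) {
                uint64_t count;
                ssize_t r = read(wake_fd, &count, sizeof(count));  /* consume the wakeup */
                (void)r;
            }
            /* refill_decoded_frame_queue();  hypothetical: top up the queue */
        }
    }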

The leap second bug

Posted Jul 3, 2012 19:34 UTC (Tue) by slashdot (guest, #22014) [Link] (3 responses)

Assuming this is a viewer for a real-time graphics stream (since you said it wants to have "current" data) that needs to be decoded, why does the graphics thread wake up the decoder?

One would expect the decoder to read from the network (sleeping if no data is available) and decode frames, putting them in a shared queue, while discarding any that have a "too old" timestamp when a new one is queued (and/or discarding to keep the queue smaller than a fixed limit), and waking up the graphics thread on empty -> non-empty transition.

The graphics thread simply removes the oldest one from the queue (or waits for one to appear if the queue is empty), waits for vblank, and displays it.

No idea why you would want to hold mutexes for milliseconds (which is generally not ideal in any case), or sleep for any time interval, instead of just waiting for network I/O, for vblank, and for the decoding queue being non-empty.

The leap second bug

Posted Jul 4, 2012 9:20 UTC (Wed) by farnz (subscriber, #17727) [Link] (2 responses)

The device in question did encoded video stream conformance checking. We had a total of four interesting threads; the statistics gathering thread, the automatic reaction thread, the decoder thread, and the render thread, and three input feeds.

The statistics gathering thread gathers interesting information about the stream, and stores it for the automatic reaction thread and the render thread to use. This can take up to 4ms per frame per input, depending on the instantaneous complexity of the inputs (it does partial decode of video and audio to approximate some interesting measures of the video and audio), but normally takes about 1ms per input.

The automatic reaction thread applies a set of business rules against the statistics, and can trigger external systems to react to an out-of-bounds condition, plus indicates to the render thread that it should show an alarm state to any human user. This takes no more than 1ms with the most complicated rules permitted.

The render thread takes 1ms to render the statistics and any alarm and a further 4ms to update the video box.

The decode thread takes up to 4ms to decode each frame of a selected input. As decoded video is only for presentation to a user, it is considered low importance.

When you add it up, the statistics take 12ms. The reaction thread gets us to 13ms. The render thread needs 1ms if not showing video, for 14ms total, or 5ms if showing video, for 18ms total. The decode thread can add another 4ms to that (22ms total), and our deadline is 16.66ms per frame. We are 6ms over budget for a single frame, in the worst case.

We took this to our product managers, and were told that as long as the automatic reactions happen every frame, we would be OK if the UI was late (render and decode threads), but that they'd want to see at least 1 in every 4 frames of video. This was because the automation was expected to run 24/7 as a set-and-forget system, but the UI would be something only used by some customers at critical times, and could be slow to update.

We handled this by making the reaction thread highest possible priority; the stats thread is the next priority down, as it's more important to have the automated stuff happening than it is to keep the user updated (we expected most people to treat the product as set-and-forget). The decode thread is the next highest priority, as we want to complete a frame decode once we've started it, so that there is some video to display - we don't want to be unable to ever decode a frame in time to display it. The render thread runs at a low priority. The mutex then permits the render thread to release the decode thread when the render thread has enough time left until its next deadline that it should be safe to decode a frame; if the render thread doesn't reach this point in 3 frame times, the decode thread will start anyway, and claim the CPU (delaying the render thread, as it's higher priority, and this is all SCHED_FIFO scheduling).
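For concreteness, a minimal sketch of creating one SCHED_FIFO thread at an explicit priority on Linux; the priority value and the thread body are illustrative only, not taken from the product described above.

    #include <pthread.h>
    #include <sched.h>
    #include <stdio.h>
    #include <string.h>

    static void *reaction_thread(void *arg)
    {
        (void)arg;
        /* ... apply the business rules against the latest statistics ... */
        return NULL;
    }

    int main(void)
    {
        pthread_attr_t attr;
        struct sched_param sp = { .sched_priority = 80 };   /* illustrative */
        pthread_t tid;
        int err;

        pthread_attr_init(&attr);
        pthread_attr_setschedpolicy(&attr, SCHED_FIFO);
        pthread_attr_setschedparam(&attr, &sp);
        /* Without this, the new thread inherits the creator's policy instead. */
        pthread_attr_setinheritsched(&attr, PTHREAD_EXPLICIT_SCHED);

        err = pthread_create(&tid, &attr, reaction_thread, NULL);
        if (err)    /* needs CAP_SYS_NICE or a suitable RLIMIT_RTPRIO */
            fprintf(stderr, "pthread_create: %s\n", strerror(err));
        else
            pthread_join(tid, NULL);
        return 0;
    }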

Given the constraints, how would you have implemented it?

The leap second bug

Posted Jul 5, 2012 2:35 UTC (Thu) by slashdot (guest, #22014) [Link] (1 response)

Assuming it's a UP machine, I think what you need to do is give priority to the decode thread if a low number of video frames has been decoded during the last N renderer frames, and to the renderer thread otherwise.

This can for instance be accomplished by giving higher priority to the render thread, and having it wait for the decode thread if N frames have been output with less than K decoded video frames.

Alternatively, decide on a fixed rate and simply sequentially decode one frame and then display N stats frames with that same video frame.

The latter is less flexible but might allow tricks like decoding directly to the framebuffer and then XORing each stats overlay on and off, thus never copying decoded video.

Anyway, non-embedded software does not usually have those issues.

The leap second bug

Posted Jul 5, 2012 5:20 UTC (Thu) by farnz (subscriber, #17727) [Link]

Fixed rate is out - while we only guaranteed a low frame rate, we also wanted to achieve a high frame rate if possible, as in the common case (main feed to transmitter, received transmission, low bit rate "we are sorry for any inconvenience" feed), we can meet full frame rate.

Your suggested solution has the same problem as a low timeout - in terms of system overhead, it's identical, plus it now needs extra analysis to verify that the decode thread will run often enough. The advantage of a 50ms timeout is that it's obvious that the decode thread will run once every 50ms in the worst case. In general, it's a bad idea to introduce complexity for the maintenance programmer - if it's not obvious, there's a good chance they'll miss it, make an apparently unrelated change, and now the render thread isn't kicking until you've missed 4 frames, instead of waking up after you've missed 3.

And in non-embedded systems, you get small timeouts by calculation - e.g. "wait for the result of the work I've just asked another thread to do, or for the application layer keepalive timeout to expire". If you've already done 239 seconds of work on this request, and the keepalive timer is 4 minutes, the computed time to sleep will be under a second. Adding extra application code to make the timeout sloppy (e.g. send the keepalive early if the remaining is less than 5 seconds) is extra complexity for a rare case that isn't even needed in the absence of kernel/libc bugs (and one of the powerful points of open source is that you can fix kernel/libc bugs if they affect you, instead of having to have everyone work around them).

The leap second bug

Posted Jul 4, 2012 2:13 UTC (Wed) by jzbiciak (guest, #5246) [Link] (5 responses)

Well, both Firefox and VirtualBox seemed to go nutso after the leap second. I only straced Firefox; it seemed to be in a mad poll/read loop. I assume VirtualBox was doing the same. My system was clocking over 300,000 context switches/sec...

The leap second bug

Posted Jul 9, 2012 23:37 UTC (Mon) by dlang (guest, #313) [Link] (4 responses)

Firefox does that on a fairly regular basis for me (admittedly not _as_ frequently as it used to)

The leap second bug

Posted Jul 10, 2012 0:14 UTC (Tue) by jzbiciak (guest, #5246) [Link] (3 responses)

An actual berserk poll loop, or mad context switches due to GC?

I had to disable various extensions to slay the memory leaks that were driving Firefox up to the 10-12GB range on me regularly. GC would regularly insert 3+ second pauses and run up the soft-fault count. (At least, I assume that's what it was doing... I had soft-fault counts in the billions.)

The leap second bug

Posted Jul 10, 2012 0:19 UTC (Tue) by dlang (guest, #313) [Link] (2 responses)

It fairly commonly gets into a condition where it uses an entire CPU and when I do an strace on it, it's in some poll loop (it tries to read something, gets an error, does a poll, repeat)

This has been going on for several years, but is getting less common on recent versions (I am now running the Aurora versions everywhere, so my 'recent' may be newer than your 'current' :-)

The leap second bug

Posted Jul 10, 2012 0:34 UTC (Tue) by jzbiciak (guest, #5246) [Link] (1 response)

Ah yep, that's the sort of mad poll loop I saw on Leap Second Day. I can't say for sure I've seen that any other time, though I will be popping up strace next time I see Firefox running up the cycles.

The leap second bug

Posted Jul 10, 2012 1:01 UTC (Tue) by dlang (guest, #313) [Link]

It's a race condition of some sort inside Firefox. I'm one of those crazy people who have a couple hundred tabs open across a dozen or so windows, and I've seen many cases where, on startup, Firefox would go into this mad loop. Just killing Firefox and trying again would eventually get it out of the loop and operating sanely.

The leap second bug

Posted Jul 2, 2012 19:58 UTC (Mon) by chloe_zen (guest, #8258) [Link] (2 responses)

POSIX decided to enshrine the convention that when time() % 60 == 0, then now is the top of a minute. I must say when I first heard them do that, I was VERY pleased, because ANSI is so wishy-washy about times.

Too late to reconsider it, IMO.

The leap second bug

Posted Jul 2, 2012 23:33 UTC (Mon) by slashdot (guest, #22014) [Link]

Are applications relying on that?

The thing is that, using modular arithmetic on time_t, you can only recover UTC h:mm:ss and the local-time seconds, because of half-hour-offset timezones, DST, and irregular month lengths.

Now, you generally want to display at least the UTC date and time or the local h:mm:ss, neither of which you can do without calling library functions, so maybe not much, or even nothing, will break?
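For example, a minimal sketch of what plain modular arithmetic on time_t does give you (UTC hh:mm:ss only; anything involving the local zone, DST, or the calendar needs gmtime()/localtime() instead):

    #include <stdio.h>
    #include <time.h>

    int main(void)
    {
        time_t t = time(NULL);
        long s = (long)(t % 86400);    /* seconds since the most recent UTC midnight */

        printf("UTC %02ld:%02ld:%02ld\n", s / 3600, (s % 3600) / 60, s % 60);
        return 0;
    }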

I'm not sure I'd risk it though, and not adding leap seconds ever again to UTC seems a much better solution.

The leap second bug

Posted Jul 4, 2012 6:13 UTC (Wed) by butlerm (subscriber, #13312) [Link]

>POSIX decided to enshrine the convention that when time() % 60 == 0, then now is the top of a minute.

That is a perfectly adequate (if not ideal) convention for use in generating representations of civil time. Month, day, year and so on. The problem comes where programs (and kernels) are designed around the preposterous presumption that POSIX time has any reliable relationship to real time.

That might work reasonably well when your timeouts are measured in a cardinal number of seconds greater than two, but experience clearly demonstrates that using POSIX time (or UTC) for anything requiring sub-second accuracy is foolish in the extreme. It amounts to designing a system to fail with a high probability every three or four years.

The leap second bug

Posted Jul 2, 2012 20:42 UTC (Mon) by man_ls (guest, #15091) [Link] (1 response)

What? When? Why? Whatever?

I think both of my home systems may have been affected: my desktop (i3, two cores) started consuming a whole core whenever I started Firefox or Chrome, for whatever reason. When I shut both down, it went back to normal; when I started either of them again, the CPU spiked.

My SheevaPlug just stopped responding to certain commands: df -h worked, while free yielded an I/O error after a while. Not a common occurrence; I had to restart it, and then it was fine.

Does this bug solve both mysteries (happening at about the right time IIRC), or am I retconning the solution?

The leap second bug

Posted Jul 4, 2012 21:20 UTC (Wed) by idupree (guest, #71169) [Link]

I doubt you're retconning. The morning before the leap second I warned my family "there's a leap second coming - our systems might do weird things!" Later that day my sibling's Chromium and Git Gui started eating the CPU, and later my own Firefox did (and a few other programs too). It rarely happens for either of us that multiple programs randomly eat the CPU!

The leap second bug

Posted Jul 2, 2012 23:21 UTC (Mon) by adamgundy (subscriber, #5418) [Link] (1 response)

Just a warning: I've seen some very odd behavior with 'clearing' the bug by setting the date and then stopping/starting ntpd. It seems to throw the time forward an hour plus some minutes. Waiting a minute or two before starting ntpd seems to be OK.

The leap second bug

Posted Jul 2, 2012 23:27 UTC (Mon) by adamgundy (subscriber, #5418) [Link]

Sigh. Spoke too soon. At some point after the initial date reset, possibly in combination with ntpd (and maybe not), the time appears to jump forward to the next hour. This may cause havoc, depending on your software. ntpdate will fix it, but you have to wait for the jump to happen (5-10 minutes?) before using it to correct the problem.

The leap second bug

Posted Jul 2, 2012 23:41 UTC (Mon) by camh (guest, #289) [Link]

If date -s "`date`" does not work for you (invalid date), you may need to run date -s "`LC_ALL=C date`" instead. At least I had to for my locale (en_AU.UTF-8)

The leap second bug

Posted Jul 3, 2012 1:57 UTC (Tue) by kevinm (guest, #69913) [Link] (1 response)

It can cause more than just load spikes - I had a Debian Lenny machine freeze solid at precisely midnight UTC.

The leap second bug

Posted Jul 3, 2012 4:38 UTC (Tue) by adisaacs (subscriber, #53996) [Link]

That's a different bug, 6b43ae8a619d (ntp: Fix leap-second hrtimer livelock).

The leap second bug

Posted Jul 3, 2012 13:55 UTC (Tue) by welinder (guest, #4699) [Link]

We need to source a rather large number of brown paper bags.

Seriously, we _knew_ something unusual time-wise was coming up. We didn't test what would happen. As a community -- developers and technical users -- that's an embarrassment! Brown bags for everyone.

It's like sales tax being considered an unexpected expense.


Copyright © 2012, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds