Leaping seconds and looping servers
The earth's rotation is slowing over time; contrary to some public claims, this slowing is not caused by Republican administrations, government spending, or proprietary software. In an attempt to keep the official Coordinated Universal Time (UTC) in sync with the earth's behavior, the powers that be occasionally insert an additional second (a "leap second") into a day; 25 such seconds have been inserted since the practice began in 1972. This habit is not without its detractors, and there are constant calls for its abolition, but, for now, leap seconds are a reality that the world (and the kernel) must deal with. For the curious, the Wikipedia leap second page has more detail than almost anybody could want.
The kernel's core time is kept in a timespec structure:
struct timespec {
        __kernel_time_t tv_sec;                 /* seconds */
        long            tv_nsec;                /* nanoseconds */
};
It is, in essence, a count of seconds since the beginning of the epoch. Unfortunately, that count is defined to not include leap seconds. So when a leap second happens, the system time must be explicitly corrected; that is done by setting the system clock back one second at the end of that leap second. The code that handles this change is quite old and works pretty much as advertised. It is the source of this message that most Linux systems should have (in some form) in their logs:
Jun 30 19:59:59 dt kernel: Clock: inserting leap second 23:59:60 UTC
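To make the convention concrete, here is a small user-space illustration (not kernel code) of how the 2012-06-30 leap second looks in POSIX time. The timestamp constants are ordinary epoch arithmetic; the inserted second 23:59:60 simply has no time_t value of its own:

#include <stdio.h>
#include <time.h>

int main(void)
{
    time_t before = 1341100799;   /* 2012-06-30 23:59:59 UTC */
    time_t after  = 1341100800;   /* 2012-07-01 00:00:00 UTC */
    struct tm tm;
    char buf[64];

    gmtime_r(&before, &tm);
    strftime(buf, sizeof(buf), "%Y-%m-%d %H:%M:%S UTC", &tm);
    printf("%lld -> %s\n", (long long)before, buf);

    gmtime_r(&after, &tm);
    strftime(buf, sizeof(buf), "%Y-%m-%d %H:%M:%S UTC", &tm);
    printf("%lld -> %s\n", (long long)after, buf);

    /* 61 SI seconds actually elapsed between these two instants,
     * but the POSIX difference is exactly one second: */
    printf("difftime = %.0f\n", difftime(after, before));
    return 0;
}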
The kernel's high-resolution timer (hrtimer) code does not use this version of the system time, though — at least, not directly. Instead, hrtimers have a couple of internal time bases that are offset from the system time. These time bases allow the implementation of different clocks; the "realtime" clock should adjust with the time, while the "monotonic" clock must always move forward, for example. Importantly, these timer bases are CPU-specific, since realtime clocks can differ between one CPU and the next in the same system. The hrtimer offsets allow the timer subsystem to quickly turn a system time into a time value appropriate for a specific processor's realtime clock.
If the system time changes, those offsets must be adjusted accordingly. There is a function called clock_was_set() that handles this task. As long as any system time change is followed by a call to clock_was_set(), all will be well. The problem, naturally, is that the kernel failed to call clock_was_set() after the leap second adjustment, which certainly qualifies as a system time change. So the hrtimer subsystem's idea of the current time moved forward while the system time was held back for a second; hrtimers were thereafter operating one second in the future. The result of that offset is that timers started expiring one second sooner than they should have; that is not quite what the timer developers had in mind when they used the term "high resolution."
For many applications, having a timer go off one second early is not a big problem. But there are plenty of situations where timers are set for less than one second in the future; all such timers will naturally expire immediately if the timer subsystem is operating one second ahead of the system time. Many of these timers are also recurring timers; they will be re-set immediately after expiration, at which point they will immediately expire again — and so on. The resulting loop is the source of the load spikes reported by victims of this bug across the net.
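As a rough user-space sketch of the kind of pattern that got stuck (assuming a pthread condition variable with an absolute CLOCK_REALTIME timeout; the details are illustrative, and on a fixed kernel this loop simply ticks twice per second):

#include <pthread.h>
#include <stdio.h>
#include <time.h>

int main(void)
{
    pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
    pthread_condattr_t attr;
    pthread_cond_t cond;

    pthread_condattr_init(&attr);
    pthread_condattr_setclock(&attr, CLOCK_REALTIME);
    pthread_cond_init(&cond, &attr);

    pthread_mutex_lock(&lock);
    for (int i = 0; i < 10; i++) {
        struct timespec deadline;

        /* arm a timeout 500ms in the future, as an absolute time */
        clock_gettime(CLOCK_REALTIME, &deadline);
        deadline.tv_nsec += 500000000;
        if (deadline.tv_nsec >= 1000000000) {
            deadline.tv_nsec -= 1000000000;
            deadline.tv_sec++;
        }
        /* If the timer subsystem is running one second in the future,
         * this deadline is already "in the past" and the wait returns
         * immediately, turning the loop into a busy spin. */
        pthread_cond_timedwait(&cond, &lock, &deadline);
        printf("timer fired (%d)\n", i);
    }
    pthread_mutex_unlock(&lock);
    return 0;
}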
The fix is to call clock_was_set() in the leap second code—a call that had been removed in 2007. But it's not quite that simple. The work done by clock_was_set() must happen on every CPU, since each CPU has its own set of timer bases. That's not something that can be done in atomic context. So John's patch detects a call in atomic context and defers the work to a workqueue in that case. With this patch in place, the kernel's leap second handling should work again.
How could such a bug come about? Time-related code is notoriously tricky in general; bugs are common. But the situation is far worse when the code in question is almost never executed. Prior to June 30, 2012, the last leap second was at the end of 2008. That is 3½ years in which the leap second code could have been broken without anybody noticing. If the kernel had a regularly-run regression test that verified the correct functioning of hrtimers in the presence of leap second adjustments, this problem might just have been caught before it affected production systems, but nobody has made a habit of running such tests thus far.
Perhaps that will change in the future; if nothing else, distributors with
support obligations are likely to run some tests ahead of the next
scheduled leap second adjustment. Hopefully, that will catch any problems
in this particular little piece of code, should they happen to slip in
again. Beyond that, one can always hope for an end to leap seconds. The
kernel could also contemplate a switch to international
atomic time (TAI), which does not have leap seconds, for its internal
representation. Using TAI internally has its own challenges, though,
including a need to avoid changing the time representation as seen by user
space—meaning that the kernel would still have to track leap seconds
internally. So it seems that, one way or another, leap seconds are likely to
continue to be a source of irritation and bugs in the future.
Index entries for this article
Kernel: hrtimer
Kernel: Timers
Posted Jul 3, 2012 1:34 UTC (Tue)
by ras (subscriber, #33059)
[Link]
At around 2012-06-27T09:30:00+1000 several servers had their time go backwards by precisely 10 hours, i.e. the time zone difference. These servers were all running a fully patched Debian stable. They are distributed around the country, all running ntpd with 4 debian.pool.ntp.org upstream servers, presumably different servers in each case. Most of them are little more than an internet gateway and so were running very little in the way of non-Debian software. But two of them are application servers.
That day I discovered how well our in-house software copes with the day being set to yesterday. Turns out it causes lots of transient problems as it whinges about the entered dates being wrong. The most serious thing was a SIP E1 gateway running embedded Linux that stopped receiving calls. That took out the company's entire phone system, but a reboot fixed that.
I am pretty sure it was ntpd that was at fault, as "ntpq -c peers" showed them being offset by precisely 10 hours from their upstream servers.
Setting the date manually and restarting ntpd fixed the problem, but at the time I was at a total loss as to what the cause might have been. Then I read about the leap second, and found out that ntpd can start adjusting for it days beforehand. Anyway, it made for an interesting departure from the normal daily routine.
Posted Jul 3, 2012 2:03 UTC (Tue)
by geofft (subscriber, #59789)
[Link] (6 responses)
Is that actually true? Shouldn't we just be able to make libc deal with it?
Posted Jul 3, 2012 3:16 UTC (Tue)
by corbet (editor, #1)
[Link] (5 responses)
Posted Jul 3, 2012 6:47 UTC (Tue)
by josh (subscriber, #17465)
[Link] (4 responses)
Posted Jul 3, 2012 11:07 UTC (Tue)
by kunitz (subscriber, #3965)
[Link] (3 responses)
Posted Jul 3, 2012 22:07 UTC (Tue)
by simlo (guest, #10866)
[Link] (2 responses)
Make an alias for the POSIX CLOCK_REALTIME -> CLOCK_UTC. Make a new CLOCK_TAI running alongside CLOCK_UTC. And some function to get the difference between CLOCK_TAI and CLOCK_UTC at any given time (except you don't know about future leap seconds).
In the applications I am working on right now, I would try to restrict myself to CLOCK_MONOTONIC and CLOCK_TAI, but I would need to translate to and from UTC because some protocols require timestamps in UTC.
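A hypothetical user-space sketch of the kind of difference function being described (none of these names are existing kernel interfaces, and the table holds only the two most recent published TAI-UTC offsets, not a complete list):

#include <stddef.h>
#include <stdio.h>
#include <time.h>

/* First POSIX second at which each TAI-UTC offset took effect. */
static const struct { time_t start; int offset; } tai_utc[] = {
    { 1230768000, 34 },   /* 2009-01-01: offset became 34 s */
    { 1341100800, 35 },   /* 2012-07-01: offset became 35 s */
};

/* TAI - UTC, in seconds, in effect at a given UTC time.  Future leap
 * seconds are unknown, so the table must be updated as they are announced. */
static int tai_minus_utc(time_t utc)
{
    int off = 33;   /* value in force immediately before the first entry */

    for (size_t i = 0; i < sizeof(tai_utc) / sizeof(tai_utc[0]); i++)
        if (utc >= tai_utc[i].start)
            off = tai_utc[i].offset;
    return off;
}

int main(void)
{
    time_t now = time(NULL);
    printf("TAI - UTC = %d s (according to this partial table)\n",
           tai_minus_utc(now));
    return 0;
}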
Posted Jul 3, 2012 23:43 UTC (Tue)
by dashesy (guest, #74652)
[Link] (1 responses)
This is what I use for any relative time:
Posted Jul 4, 2012 13:57 UTC (Wed)
by simlo (guest, #10866)
[Link]
Posted Jul 3, 2012 4:43 UTC (Tue)
by chloe_zen (guest, #8258)
[Link] (33 responses)
Posted Jul 3, 2012 5:34 UTC (Tue)
by aburgoyne (subscriber, #3924)
[Link]
Posted Jul 3, 2012 6:48 UTC (Tue)
by josh (subscriber, #17465)
[Link] (30 responses)
Posted Jul 3, 2012 8:51 UTC (Tue)
by gevaerts (subscriber, #21521)
[Link] (29 responses)
Posted Jul 3, 2012 12:51 UTC (Tue)
by nix (subscriber, #2304)
[Link] (28 responses)
Posted Jul 3, 2012 13:18 UTC (Tue)
by Thue (guest, #14277)
[Link]
Posted Jul 3, 2012 19:02 UTC (Tue)
by kleptog (subscriber, #1183)
[Link] (26 responses)
I read somewhere that an NTP server (it might have been OpenBSD's) handled it by smearing it over 10 seconds, since the adjtime interface specifies a maximum slew of 10%.
Frankly I think Google's smearing algorithm is a brilliant idea. You could implement it with a handful of lines in the NTP server. (You don't need to do the lying-to-downstream bit).
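For illustration, a sketch of the general idea (not Google's exact algorithm; the window length and names here are made up): run the reported time slightly slow over a window ending at the leap second, so the extra second is absorbed without any backward step:

#include <stdio.h>

#define SMEAR_WINDOW 86400.0     /* spread the second over the preceding day */

/* elapsed: seconds that have actually passed since the epoch (a clock that
 * does not step for the leap second); leap: POSIX time of the post-leap
 * midnight.  Returns the smeared time to hand out to clients. */
static double smear(double elapsed, double leap)
{
    double start = leap - SMEAR_WINDOW;

    if (elapsed <= start)
        return elapsed;                      /* before the window */
    if (elapsed >= leap + 1.0)
        return elapsed - 1.0;                /* the second has been absorbed */
    /* inside the window (plus the leap second itself): run at a rate of
     * 86400/86401, ending the window exactly one second behind */
    return start + (elapsed - start) * (SMEAR_WINDOW / (SMEAR_WINDOW + 1.0));
}

int main(void)
{
    double leap = 1341100800.0;              /* 2012-07-01 00:00:00 UTC */

    for (double t = leap - 2.0; t <= leap + 2.0; t += 1.0)
        printf("elapsed %.0f -> reported %.3f\n", t, smear(t, leap));
    return 0;
}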
Our systems were apparently protected by an upstream old OpenBSD server losing the leap-second bit, so they spent the next day resyncing their clock back in line with the new time (the munin graphs are interesting). Unfortunately, the OpenBSD servers themselves didn't do quite so well: they stepped the clock back a whole second, which apparently threw OpenVPN for a loop.
Posted Jul 3, 2012 19:28 UTC (Tue)
by drag (guest, #31333)
[Link] (24 responses)
Posted Jul 3, 2012 19:54 UTC (Tue)
by paulj (subscriber, #341)
[Link] (15 responses)
Posted Jul 3, 2012 19:59 UTC (Tue)
by drag (guest, #31333)
[Link] (14 responses)
I expect that for a whole host of applications having a second that is randomly different from all other seconds would be irritating. Anything dealing with automation on an assembly line would probably be irritated to know that their devices are going to have to deal with time units that are shifting goal posts. Avionics was mentioned above as well as a couple of other things that I won't bother repeating.
The whole point of the leap second is to keep a second a second. When a second is not a second then what do you do to deal with that?
Posted Jul 3, 2012 20:08 UTC (Tue)
by paulj (subscriber, #341)
[Link] (13 responses)
For timestamps for record keeping, with similar requirements, surely they should be using epoch-based kernel interfaces, and doing any remaining formatting to and conversion for calendar times in userspace?
Posted Jul 3, 2012 21:03 UTC (Tue)
by drag (guest, #31333)
[Link] (12 responses)
Well if I was following the discussion correctly it's those 'epoch-based kernel interfaces' that are the things being 'smeared'.
Besides that,
If it was up to me all the programs I use would only see time in epoch UTC. That would be the only time supported by anything. It's the userspace's responsibility to present time in a human-readable format. Unfortunately that is not how people do things.
Posted Jul 3, 2012 23:53 UTC (Tue)
by xman (subscriber, #46972)
[Link] (4 responses)
TAI is the way to go.
Posted Jul 4, 2012 3:28 UTC (Wed)
by drag (guest, #31333)
[Link] (2 responses)
As far as scientific time keeping goes, it's already been found to be fundamentally flawed because the effect of gravity wasn't taken into account, and it is probably going to be replaced eventually by something else that is adjusted for altitude.
Posted Jul 4, 2012 13:26 UTC (Wed)
by andreasb (guest, #80258)
[Link]
Posted Jul 4, 2012 18:32 UTC (Wed)
by cesarb (subscriber, #6266)
[Link]
GPS does. The GPS timestamp is TAI with a fixed offset.
Posted Jul 17, 2012 17:59 UTC (Tue)
by Baylink (guest, #755)
[Link]
The problem here is not the *choice of timescale*: UTC is monotonic even over leap seconds; 58, 59, 60, 00.
The *problem* is that the kernel isn't following UTC *either*; not if it's ticking backwards. It's that *ticking backwards* part that is the problem, and I've yet to see a truly compelling reason why it should do so.
Posted Jul 4, 2012 11:49 UTC (Wed)
by paulj (subscriber, #341)
[Link] (6 responses)
Anyway, the question remains: exactly who needs to have the leap-second occur as an inserted second, rather than a spaced out smear?
People who need precise control/responses: They definitely don't want it, they need accurate relative time.
People who need 1±0.1 second accuracy to the global reference of UTC: well, that must be because they need to compare time across systems. In which case, they need some *other* system to synchronise time across those systems, such as NTP. If those systems are within one organisation, they can use NTP to do the slew in a relatively co-ordinated fashion.
So who's left? It seems to me that it must be organisations who wish to compare time to ±0.1s accuracy across systems distributed over multiple organisations, who do not normally work closely enough together that they can arrange to synchronise to anything other than UTC.
So how many such organisations exist with those kinds of requirements? What is the application? Is it even realistic to expect ±0.1s accuracy in timestamps at such scales?
Why hold the reliability of our software hostage to requirements that few likely need or care about? Would it be possible to punt to userspace, and have NTP handle the method, through an interface that allows the kernel to be agnostic about it? Wouldn't that be much much better for pretty much everyone? (Don't you need to run NTP in the first place in order to get the leap-second?).
Posted Jul 4, 2012 14:01 UTC (Wed)
by faramir (subscriber, #2327)
[Link] (3 responses)
Posted Jul 4, 2012 14:22 UTC (Wed)
by paulj (subscriber, #341)
[Link] (1 responses)
Posted Jul 4, 2012 17:56 UTC (Wed)
by jwakely (subscriber, #60262)
[Link]
Posted Jul 7, 2012 1:13 UTC (Sat)
by BenHutchings (subscriber, #37955)
[Link]
Posted Jul 6, 2012 17:31 UTC (Fri)
by giraffedata (guest, #1954)
[Link]
Well, to be precise, there's no such thing as time since X in UTC. UTC is a means of identifying a moment in time that involves year, month, day, etc. It implicitly provides a means of identifying certain intervals too; for example, it tells you what "February 1992" is. It doesn't deal with lengths of time (periods).
The relationship between the POSIX time representation and UTC is that the POSIX count assumes 60 seconds in every UTC minute, regardless of how long that minute actually lasts.
Posted Jul 12, 2012 18:59 UTC (Thu)
by VITTUIX-MAN (guest, #82895)
[Link]
Any system that has heterogeneous hardware and software but nevertheless passes around timestamps could get upset if different machines used heterogeneous sources of time as well, so that this smearing won't happen in unison. That is, different NTP servers, some of which smear and others not, or which use different smear periods and intervals.
Rare, surely; as rare as leap second problems. But have a million systems, and a few get hit by this hard.
Posted Jul 3, 2012 20:04 UTC (Tue)
by kleptog (subscriber, #1183)
[Link] (7 responses)
leap_second_handling = smear|strict
I'd use the 'smear' setting. Alternatively, if there was a syscall that gave me the TAI I might use that. But monotonic time is just one of those assumptions that creeps in very easily.
Interestingly, you could define a smeared time which would be mostly UTC but be convertible exactly to both UTC and TAI on demand. But for me monotonicity is the most important bit really.
Posted Jul 3, 2012 21:47 UTC (Tue)
by lindi (subscriber, #53135)
[Link] (5 responses)
Posted Jul 4, 2012 18:49 UTC (Wed)
by kleptog (subscriber, #1183)
[Link] (1 responses)
Posted Jul 4, 2012 19:38 UTC (Wed)
by lindi (subscriber, #53135)
[Link]
Posted Jul 6, 2012 4:48 UTC (Fri)
by pr1268 (guest, #24648)
[Link] (2 responses)
Why return t + 10;? I'm also curious about the curly braces creating a new scope but no if() / do / while() code. I'm not trying to be critical; just a little curious...
Posted Jul 6, 2012 5:43 UTC (Fri)
by lindi (subscriber, #53135)
[Link] (1 responses)
The new block was just for clarity.
Posted Jul 8, 2012 1:53 UTC (Sun)
by pr1268 (guest, #24648)
[Link]
I was then going to ask why add ten seconds, but then I found out what the ten seconds were about (scroll down to the image titled "Time scales since the cesium atomic frequency standard").
Posted Jul 4, 2012 9:40 UTC (Wed)
by Tobu (subscriber, #24111)
[Link]
Posted Jul 3, 2012 23:52 UTC (Tue)
by xman (subscriber, #46972)
[Link]
Posted Jul 3, 2012 23:49 UTC (Tue)
by xman (subscriber, #46972)
[Link]
Posted Jul 3, 2012 8:28 UTC (Tue)
by nowster (subscriber, #67)
[Link] (1 responses)
Posted Jul 3, 2012 10:28 UTC (Tue)
by ballombe (subscriber, #9523)
[Link]
Posted Jul 3, 2012 14:02 UTC (Tue)
by joey (guest, #328)
[Link] (4 responses)
This seems to be a good overview: http://landslidecoding.blogspot.com/2012/07/linuxs-leap-s...
There were reports of deadlocks with older kernels on, at least, Debian stable. The above page also points out reports of a "spinlock lockup" message whose cause remains unknown.
Posted Jul 3, 2012 17:10 UTC (Tue)
by mgedmin (subscriber, #34497)
[Link]
Posted Jul 3, 2012 23:59 UTC (Tue)
by mcisely (guest, #2860)
[Link] (2 responses)
I personally experienced the deadlock issue.
I spent 3+ unplanned hours at work last Saturday afternoon trying to figure out why our main server kept wedging itself every time it transitioned from single user to multi-user mode. I had no idea about the pending leap-second at the time and had been suspecting storm damage from the previous night. I finally clued-in when I found that the kernel hang would happen every time about 5-8 seconds after starting ntpd. It was running Debian Lenny (yeah, old) with a Debian stock 2.6.32-5-amd64 kernel. Server architecture is a multicore Athlon setup. The hangs started happening at about 7:43AM CDT. Once I figured out that trigger was ntp, I disabled that, then searched the web and found this: Monday morning I enabled ntp again and started it - and as expected, no lockup...
Posted Jul 7, 2012 1:16 UTC (Sat)
by BenHutchings (subscriber, #37955)
[Link] (1 responses)
Posted Jul 7, 2012 1:22 UTC (Sat)
by mcisely (guest, #2860)
[Link]
Posted Jul 3, 2012 14:22 UTC (Tue)
by bobsol (subscriber, #54641)
[Link] (7 responses)
Posted Jul 3, 2012 15:29 UTC (Tue)
by ejr (subscriber, #51652)
[Link]
Posted Jul 3, 2012 17:16 UTC (Tue)
by Ringding (guest, #34316)
[Link] (1 responses)
Posted Jul 7, 2012 1:36 UTC (Sat)
by BenHutchings (subscriber, #37955)
[Link]
The known bugs involving kernel hangs are:
1. Deadlock in printk at midnight (introduced in ???, fixed by commits b845b51, fa33507 in 2.6.29, 2.6.27.46, Debian package version 2.6.26-20)
When I saw the fix for bug 2 it appeared that the live-lock being described was introduced by bd33126 etc. (post-3.3) and therefore not possible in earlier versions, but I'm no longer certain of this.
So I'm still hoping to find out what went wrong in 2.6.32 and in 3.2 so these can be fixed in stable updates.
Posted Jul 3, 2012 18:32 UTC (Tue)
by smoogen (subscriber, #97)
[Link] (1 responses)
My Fedora laptop running Chrome went 100% load during the leap second. My Fedora and RHEL server just running apache didn't.
However other RHEL and Fedora boxes reported issues depending on the software that was being run at that time.
Posted Jul 4, 2012 11:05 UTC (Wed)
by man_ls (guest, #15091)
[Link]
The funny thing was that after stopping both browsers everything went back to normal, but restarting any of them CPU load went up again. I had never seen something like this, so I took it as a divine sign telling me to go to bed and didn't look back... until the first LWN report.
My Debian stable SheevaPlug server displayed the deadlock, or something like it. That is a lot of bugs for a tiny second!
Posted Jul 4, 2012 9:09 UTC (Wed)
by ftc (subscriber, #2378)
[Link] (1 responses)
The installation on CentOS was affected by the problem (i.e. java started using an insane amount of CPU due to timers expiring immediately), while the installations on Debian appeared to work fine.
Posted Jul 4, 2012 17:38 UTC (Wed)
by Lennie (subscriber, #49641)
[Link]
So maybe you have 32-bit Java on Debian and 64-bit Java on CentOS ?
Posted Jul 3, 2012 20:54 UTC (Tue)
by andrewt (guest, #5703)
[Link] (2 responses)
Posted Jul 3, 2012 21:07 UTC (Tue)
by corbet (editor, #1)
[Link] (1 responses)
Posted Jul 4, 2012 10:37 UTC (Wed)
by nix (subscriber, #2304)
[Link]
(I note that they've been renamed to the 'International Earth Rotation and Reference Systems Service', which is a very much less nifty name somehow.)
Posted Jul 3, 2012 22:36 UTC (Tue)
by butlerm (subscriber, #13312)
[Link] (16 responses)
1. The adoption of a standard for real time, something like CLOCK_TAI. (It is too bad that CLOCK_REALTIME is already taken for such a ridiculously non real time clock).
Posted Jul 4, 2012 2:40 UTC (Wed)
by wahern (subscriber, #37304)
[Link] (13 responses)
The problem is that it's probably not the best thing on which to base your internal clock, nor for high-precision, SI second based algorithms.
The second problem is that a ton of Unix kernel code (Linux and *BSD), as well as the NTP system, is predicated on the existing behavior. Which is why the software industry and their lobbyists and politicians want to redefine UTC, instead of letting people migrate to TAI (or a TAI successor), with mappings to UTC.
Posted Jul 4, 2012 13:24 UTC (Wed)
by paulj (subscriber, #341)
[Link] (12 responses)
Do leap seconds not invalidate that latter assumption? There are still 24 hours in the day, and 1440 minutes, but any of those minutes could have 61 seconds in them?
Posted Jul 4, 2012 15:05 UTC (Wed)
by nix (subscriber, #2304)
[Link] (1 responses)
Posted Jul 4, 2012 15:29 UTC (Wed)
by Tobu (subscriber, #24111)
[Link]
Posted Jul 5, 2012 1:46 UTC (Thu)
by wahern (subscriber, #37304)
[Link] (9 responses)
The SI second is different than the second that most people care about. When people speak of seconds, they're usually speaking in terms of remainders of a minute, which are remainders of an hour, which are remainders of a day. People chunk time, the same way we chunk everything else for our memory.
The day is a fuzzy unit, and equivocating day units with second units is always going to fail. So you can either redefine the day (as industry wants to do by changing UTC), or you can redefine the second, as POSIX effectively does. Or, I suppose, software could stop using second units for storage, and use expanded date-time descriptions.
I say that POSIX time is beautiful because it, rather accidentally, codified a redefinition of the second. And it turns out that it works very well for many use cases, both where you produce output directly for human consumption, or for inputting into algorithms which collaterally produce output for human consumption. It's a manifestation of the worse-is-better principle, where you sacrifice a quality like elapsed second precision for other qualities--simplicity and, depending on the context, robustness. However, it's just horrible for cases that rely on precise SI seconds, and it's clearly not simplistic from the perspective of the kernel.
Posted Jul 5, 2012 4:53 UTC (Thu)
by raven667 (subscriber, #5198)
[Link] (1 responses)
8-) kidding of course.
Posted Jul 12, 2012 13:41 UTC (Thu)
by njs (subscriber, #40338)
[Link]
1) When the event occurs, get a timestamp from the nearest convenient national lab atomic clock.
2) Wait a few weeks while the different national labs compare their clock's relative drift rates over the last period, pick some sort of average as the "real TAI time", and publish tables mapping from what their clock actually did to this consensus clock that doesn't exist.
3) Look up the timestamp you got in step 1 in this table.
It's hard to make jokes about time...
Posted Jul 5, 2012 5:20 UTC (Thu)
by butlerm (subscriber, #13312)
[Link]
This failure in particular would not have occurred if the kernel and all the pertinent applications used a reliable time base to do timing with, instead of trying to derive reliable timing from a time base with all the stability of a drunken sailor.
Posted Jul 5, 2012 9:40 UTC (Thu)
by dgm (subscriber, #49227)
[Link] (5 responses)
Forcing the 1 day = 86400 seconds stuff was one of the worst ideas ever, and we're still paying the consequences. Effectively it means you are really counting days, not seconds, but a day is not a precisely defined unit. Had they just kept it as a simple count of seconds, that would have been useful and simple. And less error prone.
I sincerely don't get how you can say this is beautiful.
Posted Jul 6, 2012 17:14 UTC (Fri)
by giraffedata (guest, #1954)
[Link] (4 responses)
You might be able to argue that in 2012 it would be better for POSIX to specify a simple count of seconds, but if you look at how the world was when that time format was invented, there just can't be any question that not including leap seconds in the count was the right thing to do.
Users of computers want traditional UTC-style year-month-day etc. datetimes. To convert from a simple count of seconds to that requires not only significantly more code to be written but some way to know when the leap seconds were. A file, a manual maintenance procedure, or a global computer network, all of which were simply not practical enough to meet
the timekeeping requirements of the users.
Most of the present discussion isn't about whether the POSIX standard should be a simple count of seconds, but whether the kernel's time base should be. POSIX doesn't say how anyone has to keep time; it just says how it gets communicated. Either way, some users are going to be adjusting for leap seconds; some adding them, others removing them.
Posted Jul 9, 2012 11:33 UTC (Mon)
by dgm (subscriber, #49227)
[Link] (3 responses)
It was a mistake, and no amount of historic excuses can make it a right decision. It never was the right thing to do, nor is it now.
> Users of computers want traditional UTC-style year-month-day etc.
Users don't talk to the kernel, nor with the system libraries. They talk to applications. And by the way, users really don't care about UTC, 99% of the time they want the time their watch says it is. Until that's not enough, and then what they want is an _unambiguous_ moment in time. Anything between those extremes is completely useless.
> To convert from a simple count of seconds to that requires not only significantly more code to be written but some way to know when the leap seconds were.
What exactly is solved by forcing the duration of the day? Nothing; everything is exactly as complex as it was, but now you have that "fictional" day which is different from what users need. And you still need complex time handling code because there are such niceties as timezones, local calendars, different week conventions, and even time dilation!
> Most of the present discussion isn't about whether the POSIX standard should be a simple count of seconds, but whether the kernel's time base should be.
The mere fact that there's a discussion is a sign that people are not sufficiently aware of past mistakes. We should be raising awareness of this, with the goal of avoiding repeating them. The Kernel should just count time in the most unambiguous way it can, and let applications handle the presentation.
> POSIX doesn't say how anyone has to keep time; it just says how it gets communicated.
An application cannot give the user an unambiguous moment in time if all it can get from POSIX is an ambiguous representation. That is the crux of the matter.
Posted Jul 9, 2012 15:57 UTC (Mon)
by giraffedata (guest, #1954)
[Link] (2 responses)
Every watch in the world displays traditional UTC-style year-month-day etc.
My point is that POSIX time makes it more practical for a computer to display that than a straight seconds-since epoch time format would. Or are you referring to the watch being inaccurate?
Except for the time dilation, which history proves you don't need in your time handling code, those are all simpler to handle than leap seconds. Asking the computer to understand leap seconds is asking for a whole other level of computation. The most significant part of that is knowing when the leap seconds are.
At the risk of being repetitive, it allows for practical calculation of UTC-style, wristwatch-style datetimes. Also for calculating differences in the larger units such as hours and days (when a computer user says "3 days after 10:00 Wednesday," he normally means 10:00 Thursday, not 86400*3 seconds after 10:00 Wednesday).
It also causes or fails to solve some other problems. The only question is which problems are greater?
By the way, if unambiguous datetimes were seen as a problem worth solving in the days that POSIX was invented, I think the proper solution would have been to go with the same "seconds since epoch not counting leap seconds" and then have a separate bit saying "leap second" that the folks who need to distinguish between 23:59:59 and 23:59:60 could use. Practically computable datetimes are that important.
Posted Jul 9, 2012 18:44 UTC (Mon)
by dark (guest, #8483)
[Link] (1 responses)
Ah, but consider the user who says "3 days after Friday 10:00". Presumably that user wants "Monday 10:00", and not "Monday 9:00, 10:00 or 11:00 depending on whether there is a DST transition this weekend". So yeah, you can't use 86400*n anyway. Unless you want to mishandle DST, which is the option chosen by the vast majority of applications :)
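A small illustration of that point (the zone and date are just an example: the US DST transition of March 2012). Calendar arithmetic via struct tm gives the same wall-clock time three days later, while adding 3*86400 seconds does not:

#include <stdio.h>
#include <stdlib.h>
#include <time.h>

int main(void)
{
    char buf[64];
    struct tm tm = { 0 };

    setenv("TZ", "America/New_York", 1);
    tzset();

    /* Friday 2012-03-09 10:00 local time; DST starts on Sunday March 11 */
    tm.tm_year  = 2012 - 1900;
    tm.tm_mon   = 2;          /* March */
    tm.tm_mday  = 9;
    tm.tm_hour  = 10;
    tm.tm_isdst = -1;         /* let mktime work out DST */
    time_t friday = mktime(&tm);

    /* calendar arithmetic: three days later, same wall-clock time */
    tm.tm_mday += 3;
    tm.tm_isdst = -1;
    time_t monday_calendar = mktime(&tm);

    /* naive arithmetic: 3 * 86400 seconds later */
    time_t monday_naive = friday + 3 * 86400;

    strftime(buf, sizeof(buf), "%a %Y-%m-%d %H:%M %Z",
             localtime(&monday_calendar));
    printf("calendar: %s\n", buf);   /* Mon 2012-03-12 10:00 EDT */

    strftime(buf, sizeof(buf), "%a %Y-%m-%d %H:%M %Z",
             localtime(&monday_naive));
    printf("naive:    %s\n", buf);   /* Mon 2012-03-12 11:00 EDT */
    return 0;
}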
Posted Jul 9, 2012 23:16 UTC (Mon)
by dgm (subscriber, #49227)
[Link]
As a bunch of others have pointed out, leap seconds are very similar in essence to timezones -or any other oddity of civil time- in that they are _arbitrary_. They really belong in the code that has no choice but to handle them: the system libraries (glibc for modern Linux distros).
Posted Jul 4, 2012 9:29 UTC (Wed)
by Tobu (subscriber, #24111)
[Link]
Posted Jul 5, 2012 15:27 UTC (Thu)
by kjp (guest, #39639)
[Link]
Posted Jul 4, 2012 1:48 UTC (Wed)
by leromarinvit (subscriber, #56850)
[Link]
Now I finally know what happened to my file server/router a few days ago! A few minutes after the leap second, Nagios sent me a warning about the load being somewhere around 15. I didn't think much of it, since the nightly backup sometimes does that. The next day I couldn't log in over SSH any more, and some time later dnsmasq stopped serving DHCP requests. Killing everything via sysrq finally brought it back to its senses, but at the expense of all the services normally started by init, so I just rebooted in the end.
There was a Java program running that updates a timestamp in its GUI every second, so that probably caused it. I initially thought said Java app had just gone mad and used up all memory, causing the box to swap itself to death, because I saw the HDD light flickering like mad. Of course, this being Java, it quite possibly did allocate some memory every time the counter fired, saving the GC for later...
For reference, this is a Ubuntu 10.04 box running 3.4.3-ck.
Posted Jul 7, 2012 11:45 UTC (Sat)
by lab (guest, #51153)
[Link] (2 responses)
Posted Jul 8, 2012 0:14 UTC (Sun)
by pr1268 (guest, #24648)
[Link]
Fascinating indeed, and quite apocalyptic, too. Heavens, yes! This experience has shown us that how Linux handles leap seconds is not a "pointless" detail.
Posted Jul 8, 2012 10:45 UTC (Sun)
by nix (subscriber, #2304)
[Link]
"Some systems have resorted to slowing down the clock by 1/3600th for the last hour before the leap second, hoping that nobody notices that seconds suddenly are 277 microseconds long."
I think that might be noticed! Seconds being 277 microseconds longer, though, is less likely to be noticed except by people who really care.
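(For reference: slowing the clock by 1/3600th of its rate stretches each second by about 277.8µs, and over the 3600 seconds of that final hour the adjustments add up to the one inserted second.)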
Posted Jul 9, 2012 13:54 UTC (Mon)
by Jonno (subscriber, #49613)
[Link] (6 responses)
That way inserting a leap second will be handled the same way as when politicians change the DST rules, with a simple tzdata package update.
Yes, this would break POSIX, but it wouldn't be the first time Linux breaks POSIX in some small way when POSIX dictates stupid behaviour.
Posted Jul 9, 2012 16:15 UTC (Mon)
by nix (subscriber, #2304)
[Link] (5 responses)
And for what benefit? Moving a pile of complexity out of ntpd (which is meant for dealing with this sort of thing) and out of the kernel's time handling code (which is a single body of code maintained by people who know what they're doing) into glibc and a vast body of applications. Now perhaps glibc could get its updates via tzdata, but are the applications all going to get it right? They get it wrong *now*, many would need changing, and as has been pointed out elsethread, getting this right is hard, since even the original authors of many programs probably didn't expect 'this time tomorrow' and 'this time 86400 seconds away' to have distinct answers, and nearly all the time they wouldn't.
The solution to bugs in a bit of code in highly-tested critical software that is hard to debug and test because it caters to a rarely-arising condition is surely *not* to distribute and multiply that code among a vast number of applications, critical and otherwise, many of which are much less tested than the kernel is.
For this to be less dangerous, you'd need to translate every single time the kernel passes to userspace by whatever means (including timestamps in network filesystems!), teach everything that touched raw fs dumps to translate times as well, and unless you want to waste all that time add TAI-returning syscalls akin to gettimeofday() et al... and that looks like a lot more code than the existing set of leap-second-handling code, which is clearly *already* too rarely executed to be expected to keep working between leap second invocations.
Or we could just wait. Leap seconds are getting quadratically more frequent over long enough timespans, and will be downright common in timespans comparable to that since the invention of the computer. When the things are occurring monthly, something will either be done to fix it or at the very least the code to handle them will be frequently tested!
Posted Jul 9, 2012 17:49 UTC (Mon)
by rschroev (subscriber, #4164)
[Link] (2 responses)
Given the fact that leap seconds exist, to me it seems the best way to handle time is:
- The kernel deals exclusively with TAI, by which I mean the number of seconds since the Epoch. Leap seconds are seconds just like any other. That means that some days are somewhat shorter or longer than 86400 seconds, but that's not important to the kernel. That is consistent with the manpage already: "time() returns the time as the number of seconds since the Epoch, 1970-01-01 00:00:00 +0000 (UTC)."
- Programs can use that directly if they want to, mostly if they want to know the time delta between two moments in seconds. In other cases, they can call asctime(), ctime(), gmtime() or localtime() which handle the translation from TAI to UTC and local time. Perhaps functions should be added for conversion between TAI and UTC, both expressed as seconds since the epoch.
This way the kernel time handling code is as simple as can be (no special cases there), and the complexity is moved to glibc which already knows how to handle time zones. Adding or subtracting leap seconds is very similar to handling time zones, I would think. The system always has a simple unambiguous idea of the time, converted to/from wall clock time as needed by glibc for interacting with users, which to me is just a natural extension of the current concept of the kernel using UTC instead of local time.
I agree it makes time delta calculations more difficult: instead of adding n * 86400 seconds for n days, you have to add the required number of days/months/years in struct tm.
If I understand you correctly, you are saying that such a solution would cause the time from gettimeofday() to be different from the file time on a freshly created file. I don't understand that... since TAI is all the kernel knows, both will give you the same value.
The only real disadvantage I see is that it potentially makes interoperation between different systems harder, because some systems might do it this way and some might not.
Posted Jul 10, 2012 16:00 UTC (Tue)
by nix (subscriber, #2304)
[Link] (1 responses)
Posted Jul 10, 2012 16:09 UTC (Tue)
by rschroev (subscriber, #4164)
[Link]
Posted Jul 9, 2012 20:49 UTC (Mon)
by Jonno (subscriber, #49613)
[Link] (1 responses)
> It would not just break POSIX but a huge number of applications

Very few applications do syscalls directly; almost everyone goes through glibc, which already converts times using tzdata to get time-zone handling correct.

> unless you complicated glibc to do the conversion, in which case it would still be obviously broken in some major cases, as the time you got from gettimeofday() would be different from the time visible on a file you just created when you stat()ted it

Of course not: all non-time-zone-aware times would be TAI, and all time-zone-aware times would be correct for that time-zone. UTC would be just another time-zone (which it already is, though today the kernel-time to UTC conversion is trivial).

> And for what benefit? Moving a pile of complexity [...] into glibc and a vast body of applications.

Applications would need no more complexity than what they already need for correct time-zone handling. The small number of applications that lack that (small) complexity today, and thus use UTC exclusively (and are thus wrong for all users at least half the year), would just start using TAI instead (and thus be wrong for all users all year). All other applications would work just fine without any change.

tzdata would need to start carrying leap second information, and glibc would need to make use of it, but that is trivial compared to what they already deal with (leap seconds are, after all, the same in all jurisdictions).

The only real problem is handling the transition correctly. I believe fixes would only have to be added to three components for it to work.
Posted Jul 10, 2012 16:02 UTC (Tue)
by nix (subscriber, #2304)
[Link]
Posted Jul 12, 2012 13:50 UTC (Thu)
by njs (subscriber, #40338)
[Link] (1 responses)
Posted Jul 12, 2012 14:41 UTC (Thu)
by Cyberax (✭ supporter ✭, #52523)
[Link]
Leaping seconds and looping servers
If you were designing a system from the beginning, you could consider that kind of division of responsibilities. The kernel has to fit into existing systems, though, without breaking things. That means preserving the current interfaces.
Leaping seconds and looping servers
Many POSIX APIs implicitly assume CLOCK_REALTIME=CLOCK_UTC. Alternative APIs must be made where the user can choose between CLOCK_UTC, CLOCK_TAI, CLOCK_MONOTONIC, etc. In many (most) cases CLOCK_MONOTONIC would make the most sense.
Right now we are using CLOCK_REALTIME and doing arithmetic on the resulting struct timespec to find time differences.
Leaping seconds and looping servers
#ifndef CLOCK_MONOTONIC_RAW
#define CLOCK_MONOTONIC_RAW CLOCK_MONOTONIC
#endif
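A minimal usage sketch (assuming the fallback above is in place on older headers), timing an interval with the raw monotonic clock:

#include <stdio.h>
#include <time.h>

int main(void)
{
    struct timespec start, end;

    clock_gettime(CLOCK_MONOTONIC_RAW, &start);
    /* ... the work being timed ... */
    clock_gettime(CLOCK_MONOTONIC_RAW, &end);

    printf("elapsed: %.9f s\n",
           (end.tv_sec - start.tv_sec) +
           (end.tv_nsec - start.tv_nsec) / 1e9);
    return 0;
}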
Leaping seconds and looping servers
That's exactly what google did. See here.
Leaping seconds and looping servers
Ah yes, the damn POSIX epoch specifies time since the Epoch in UTC. I thought it was just "seconds since epoch". :(
Leaping seconds and looping servers
Well, you can demonstrate it fairly easily on the command-line:
Leaping seconds and looping servers
$ TZ="right/UTC" date ; TZ="posix/UTC" date
Wed Jul 4 18:26:24 UTC 2012
Wed Jul 4 18:26:49 UTC 2012
What you need is a library that can easily open multiple timezones at once. I thought glibc could do it, but it doesn't appear to, from reading the source. However, GLib seems to have support. I would have thought the g_time_zone_get_offset() would be enough, but apparently not. Actually, even though I can open both zones I can't convince GLib to give me the answer :(. Though it must be possible.
Leaping seconds and looping servers
tai.c - why return t + 10?
why t + 10?
Here's a little about libtai. lindi's approach is interesting as well since it uses the leap seconds table implicitly present in Olson's tzdata, which is already regularly updated in stable distributions.
Leaping seconds and looping servers
what about the deadlocks?
The system locked up solid each time, with no signs of distress or trouble anywhere prior to the lockup, not even on the console. So, no, there were unfortunately no kernel logs for this.
what about the deadlocks?
Leaping seconds and looping servers
2. Possible live-lock when NTP sets up the leap second (introduced in ???, fixed by 6b43ae8 in 3.4)
Leaping seconds and looping servers
My Debian testing machine went to 100% CPU with either Chrome or Firefox, the former load being quite irregular with many spikes and the latter a solid regular 100%. Luckily I have two cores and the second one seemed to be unaffected.
Leaping seconds and looping servers
You'd be amazed how hard it is to get that kind of patch merged...
Leaping seconds and looping servers
2. The standardization of an interface for smoothed UTC, something like CLOCK_SLS.
3. The standardization of an interface for strict (non-smoothed,non-slewed) UTC, something like CLOCK_UTC.
4. Deprecate CLOCK_REALTIME.
5. At the kernel level, do all timekeeping using TAI or the equivalent. Convert to UTC, UTC-SLS as necessary.
Leaping seconds and looping servers
POSIX time is a beautiful thing. … You can depend on 86400 "seconds" per day,
Leaping seconds and looping servers
Or said second happens twice, if you have higher-resolution POSIX timestamps.
Leaping seconds and looping servers
Forcing the 1 day=86400 seconds stuff was one of the worst ideas ever ...
Had they just kept as a simple count of seconds, that would have been useful
and simple. And less error prone.
Leaping seconds and looping servers
Users of computers want traditional UTC-style year-month-day etc.
users really don't care about UTC, 99% of the time they want the time their watch says it is.
And you still need complex time handling code because there are such niceties as timezones, local calendars, different week conventions, and even time dilation!
What exactly is solved by forcing the duration of the day?
Leaping seconds and looping servers
There was a recent patch (at the RFC stage) adding CLOCK_TAI to the kernel. The implementation is convoluted (John Stultz prefers if it is built using the current abstractions, which also had a problem with CLOCK_MONOTONIC pausing during a leap second), so it might still be bug-prone. Actually testing the damn clock subsystem ahead of time, and after reviewing any related commits, could compensate for that.
Leaping seconds and looping servers
But Linus Torvalds' observation that "95% of all programmers think they are in the top 5%, and the rest are certain they are above average" [...] When a large fraction of the world economy is run by the creations of lousy programmers, and when embedded systems are increasingly capable of killing people, do we raise the bar and demand that programmers pay attention to pointless details...?
Leaping seconds and looping servers
... and because the relevant standard guarantees that adding 86400 to a time will always give you the same time on the next day. A *lot* of code depends on this assumption, ...
That assumption is already wrong twice a year in places that observe daylight saving time, by a full hour in most places, which is a lot more than the difference between TAI and UTC. Code that depends on it is already incorrect.