
Leaping seconds and looping servers

By Jonathan Corbet
July 2, 2012
As most of the net is likely to have heard by now, Linux servers displayed a notable tendency to misbehave during the leap second event at the end of the day on June 30. The problem often presented itself as abrupt and sustained load spikes on the affected machines. The bug that caused this behavior has been tracked down (thanks to a determined effort by John Stultz); a look at what happened shines an interesting light on the trickiness of dealing with time in software systems.

The earth's rotation is slowing over time; contrary to some public claims, this slowing is not caused by Republican administrations, government spending, or proprietary software. In an attempt to keep the official Coordinated Universal Time (UTC) in sync with the earth's behavior, the powers that be occasionally insert an additional second (a "leap second") into a day; 25 such seconds have been inserted since the practice began in 1972. This habit is not without its detractors, and there are constant calls for its abolition, but, for now, leap seconds are a reality that the world (and the kernel) must deal with. For the curious, the Wikipedia leap second page has more detail than almost anybody could want.

The kernel's core time is kept in a timespec structure:

    struct timespec {
        __kernel_time_t tv_sec;     /* seconds */
        long            tv_nsec;    /* nanoseconds */
    };

It is, in essence, a count of seconds since the beginning of the epoch. Unfortunately, that count is defined to not include leap seconds. So when a leap second happens, the system time must be explicitly corrected; that is done by setting the system clock back one second at the end of that leap second. The code that handles this change is quite old and works pretty much as advertised. It is the source of this message that most Linux systems should have (in some form) in their logs:

    Jun 30 19:59:59 dt kernel: Clock: inserting leap second 23:59:60 UTC

The kernel's high-resolution timer (hrtimer) code does not use this version of the system time, though — at least, not directly. Instead, hrtimers have a couple of internal time bases that are offset from the system time. These time bases allow the implementation of different clocks; the "realtime" clock should adjust with the time, while the "monotonic" clock must always move forward, for example. Importantly, these timer bases are CPU-specific, since realtime clocks can differ between one CPU and the next in the same system. The hrtimer offsets allow the timer subsystem to quickly turn a system time into a time value appropriate for a specific processor's realtime clock.

If the system time changes, those offsets must be adjusted accordingly. There is a function called clock_was_set() that handles this task. As long as any system time change is followed by a call to clock_was_set(), all will be well. The problem, naturally, is that the kernel failed to call clock_was_set() after the leap second adjustment, which certainly qualifies as a system time change. So the hrtimer subsystem's idea of the current time moved forward while the system time was held back for a second; hrtimers were thereafter operating one second in the future. The result of that offset is that timers started expiring one second sooner than they should have; that is not quite what the timer developers had in mind when they used the term "high resolution."

For many applications, having a timer go off one second early is not a big problem. But there are plenty of situations where timers are set for less than one second in the future; all such timers will naturally expire immediately if the timer subsystem is operating one second ahead of the system time. Many of these timers are also recurring timers; they will be re-set immediately after expiration, at which point they will immediately expire again — and so on. The resulting loop is the source of the load spikes reported by victims of this bug across the net.

The fix is to call clock_was_set() in the leap second code—a call that had been removed in 2007. But it's not quite that simple. The work done by clock_was_set() must happen on every CPU, since each CPU has its own set of timer bases. That's not something that can be done in atomic context. So John's patch detects a call in atomic context and defers the work to a workqueue in that case. With this patch in place, the kernel's leap second handling should work again.

How could such a bug come about? Time-related code is notoriously tricky in general; bugs are common. But the situation is far worse when the code in question is almost never executed. Prior to June 30, 2012, the last leap second was at the end of 2008. That is 3½ years in which the leap second code could have been broken without anybody noticing. If the kernel had a regularly-run regression test that verified the correct functioning of hrtimers in the presence of leap second adjustments, this problem might just have been caught before it affected production systems, but nobody has made a habit of running such tests thus far.

Perhaps that will change in the future; if nothing else, distributors with support obligations are likely to run some tests ahead of the next scheduled leap second adjustment. Hopefully, that will catch any problems in this particular little piece of code, should they happen to slip in again. Beyond that, one can always hope for an end to leap seconds. The kernel could also contemplate a switch to international atomic time (TAI), which does not have leap seconds, for its internal representation. Using TAI internally has its own challenges, though, including a need to avoid changing the time representation as seen by user space—meaning that the kernel would still have to track leap seconds internally. So it seems that, one way or another, leap seconds are likely to remain a source of irritation and bugs in the future.

Index entries for this article
Kernel: hrtimer
Kernel: Timers



Leaping seconds and looping servers

Posted Jul 3, 2012 1:34 UTC (Tue) by ras (subscriber, #33059) [Link]

Yeah, well my experience was rather different. I'd be interested to know if anybody else saw the same thing.

At around 2012-06-27T09:30:00+1000 several servers had their time go backwards by precisely 10 hours, i.e. the time zone difference. These servers were all running a fully patched Debian stable. They are distributed around the country, all running ntpd with four debian.pool.ntp.org upstream servers, presumably different servers in each case. Most of them are little more than internet gateways and so were running very little in the way of non-Debian software, but two of them are application servers.

That day I discovered how well our in-house software copes with the day being set to yesterday. It turns out this causes lots of transient problems as it whinges about the entered dates being wrong. The most serious thing was that a SIP E1 gateway running embedded Linux stopped receiving calls. That took out the company's entire phone system, but a reboot fixed it.

I am pretty sure it was ntpd that was at fault, as "ntpq -c peers" showed them being offset by precisely 10 hours from their upstream servers.

Setting the date manually and restarting ntpd fixed the problem, but at the time I was at a total loss as to what the cause might have been. Then I read about the leap second, and found out that ntpd can start adjusting for it days beforehand. Anyway, it made for an interesting departure from the normal daily routine.

Leaping seconds and looping servers

Posted Jul 3, 2012 2:03 UTC (Tue) by geofft (subscriber, #59789) [Link] (6 responses)

> Using TAI internally has its own challenges, though, including a need to avoid changing the time representation as seen by user space—meaning that the kernel would still have to track leap seconds internally.

Is that actually true? Shouldn't we just be able to make libc deal with it?

Leaping seconds and looping servers

Posted Jul 3, 2012 3:16 UTC (Tue) by corbet (editor, #1) [Link] (5 responses)

If you were designing a system from the beginning, you could consider that kind of division of responsibilities. The kernel has to fit into existing systems, though, without breaking things. That means preserving the current interfaces.

Leaping seconds and looping servers

Posted Jul 3, 2012 6:47 UTC (Tue) by josh (subscriber, #17465) [Link] (4 responses)

True, but the kernel has gone through transitions like this before. glibc could learn to track leap-seconds itself. The kernel could have a compile-time option to enable compatibility support for older userspace, defaulting to on for now. After a distro release or two, that option could go away.

Leaping seconds and looping servers

Posted Jul 3, 2012 11:07 UTC (Tue) by kunitz (subscriber, #3965) [Link] (3 responses)

The kernel must know the POSIX time to update file system times, because of the definitions of the file system formats. POSIX time requires you to manage UTC, including leap seconds, in the kernel, but having a linearly progressing time as the basic time source would definitely make sense.

Leaping seconds and looping servers

Posted Jul 3, 2012 22:07 UTC (Tue) by simlo (guest, #10866) [Link] (2 responses)

Yes; if I had the time, I would do the following:

Make an alias for the POSIX CLOCK_REALTIME -> CLOCK_UTC. Make a new CLOCK_TAI running alongside CLOCK_UTC, and some function to get the difference between CLOCK_TAI and CLOCK_UTC at any given time (except you don't know about future leap seconds).
Many POSIX APIs implicitly assume CLOCK_REALTIME=CLOCK_UTC. Alternative APIs must be made where the user can choose between CLOCK_UTC, CLOCK_TAI, CLOCK_MONOTONIC, etc. In many (most) cases CLOCK_MONOTONIC would make the most sense.

In the applications I am working on right now, I would try to restrict myself to CLOCK_MONOTONIC and CLOCK_TAI, but I would need to translate to and from UTC because some protocols require timestamps in UTC.
Right now we are using CLOCK_REALTIME and doing arithmetic on the resulting struct timespec to find time differences.

Leaping seconds and looping servers

Posted Jul 3, 2012 23:43 UTC (Tue) by dashesy (guest, #74652) [Link] (1 responses)

This is what I use for any relative time:

#ifndef CLOCK_MONOTONIC_RAW
#define CLOCK_MONOTONIC_RAW CLOCK_MONOTONIC
#endif

Leaping seconds and looping servers

Posted Jul 4, 2012 13:57 UTC (Wed) by simlo (guest, #10866) [Link]

Use of CLOCK_MONOTONIC is not an option because we might want to compare timestamps between different servers.

Leaping seconds and looping servers

Posted Jul 3, 2012 4:43 UTC (Tue) by chloe_zen (guest, #8258) [Link] (33 responses)

Jumping backwards seems like a big hammer for the leap second. Why not adjust the time gradually, so that e.g. 101 seconds of objective time results in 100 seconds of system time?

Leaping seconds and looping servers

Posted Jul 3, 2012 5:34 UTC (Tue) by aburgoyne (subscriber, #3924) [Link]

That's exactly what Google did. See here.

Leaping seconds and looping servers

Posted Jul 3, 2012 6:48 UTC (Tue) by josh (subscriber, #17465) [Link] (30 responses)

ntp does precisely that for most normal time adjustments, which makes me wonder why it doesn't do so for leap seconds.

Leaping seconds and looping servers

Posted Jul 3, 2012 8:51 UTC (Tue) by gevaerts (subscriber, #21521) [Link] (29 responses)

Because leap seconds are defined as extra seconds, not as a one-second error that suddenly appears and must be corrected for.

Leaping seconds and looping servers

Posted Jul 3, 2012 12:51 UTC (Tue) by nix (subscriber, #2304) [Link] (28 responses)

Quite. Slewing the time over an entire day and changing the length of a second for all that time would have scientists, avionics people and so on hiring assassins to take the proposers of such an insane idea down as fast as possible. Google can do this because Google only cares about time synch, not about its absolute value down to the second. A lot of entities care more deeply about time than that.

Leaping seconds and looping servers

Posted Jul 3, 2012 13:18 UTC (Tue) by Thue (guest, #14277) [Link]

Hopefully anybody who really cares about time already uses TAI. The article mentions the kernel should use TAI internally - the same goes for user space programs such as databases, for the same reason.

Leaping seconds and looping servers

Posted Jul 3, 2012 19:02 UTC (Tue) by kleptog (subscriber, #1183) [Link] (26 responses)

You don't have to smear it over a whole day. For example, you could, one second beforehand, set the clock to half speed. Then time still increases monotonically.

I read somewhere that an NTP server (it might have been OpenBSD's) handled it by smearing it over 10 seconds, since the adjtime interface specifies a maximum slew of 10%.

Frankly I think Google's smearing algorithm is a brilliant idea. You could implement it with a handful of lines in the NTP server. (You don't need to do the lying-to-downstream bit).

Our systems were apparently protected by an upstream old OpenBSD server losing the leap-second bit, so they spent the next day resyncing their clocks back in line with the new time (the munin graphs are interesting). Unfortunately, the OpenBSD servers themselves didn't do quite so well; they stepped the clock back a whole second, which apparently threw OpenVPN for a loop.

Leaping seconds and looping servers

Posted Jul 3, 2012 19:28 UTC (Tue) by drag (guest, #31333) [Link] (24 responses)

Like the man says, the 'smearing' is only brilliant if you don't care that the leap second happens. For some people it does matter, although this is not typical.

Leaping seconds and looping servers

Posted Jul 3, 2012 19:54 UTC (Tue) by paulj (subscriber, #341) [Link] (15 responses)

For whom would it matter, exactly?

Leaping seconds and looping servers

Posted Jul 3, 2012 19:59 UTC (Tue) by drag (guest, #31333) [Link] (14 responses)

People that need to keep accurate time keeping, I suppose.

I expect that for a whole host of applications having a second that is randomly different from all other seconds would be irritating. Anything dealing with automation on an assembly line would probably be irritated to know that their devices are going to have to deal with time units with shifting goal posts. Avionics was mentioned above, as well as a couple of other things that I won't bother repeating.

The whole point of the leap second is to keep a second a second. When a second is not a second then what do you do to deal with that?

Leaping seconds and looping servers

Posted Jul 3, 2012 20:08 UTC (Tue) by paulj (subscriber, #341) [Link] (13 responses)

For control functionality that is compromised by 10% inaccuracy, surely they should be using monotonic timers?

For timestamps for record keeping, with similar requirements, surely they should be using epoch-based kernel interfaces, and doing any remaining formatting to and conversion for calendar times in userspace?

Leaping seconds and looping servers

Posted Jul 3, 2012 21:03 UTC (Tue) by drag (guest, #31333) [Link] (12 responses)

> surely they should be using epoch-based kernel interfaces

Well if I was following the discussion correctly it's those 'epoch-based kernel interfaces' that are the things being 'smeared'.

Besides that,

If it was up to me all the programs I use would only see time in epoch UTC. That would be the only time supported by anything. It's the userspace's responsibility to present time in a human-readable format. Unfortunately that is not how people do things.

Leaping seconds and looping servers

Posted Jul 3, 2012 23:53 UTC (Tue) by xman (subscriber, #46972) [Link] (4 responses)

The kernel's use of UTC time was why ntpd had to do what it did, which exposed the faulty logic.

TAI is the way to go.

Leaping seconds and looping servers

Posted Jul 4, 2012 3:28 UTC (Wed) by drag (guest, #31333) [Link] (2 responses)

Does anybody actually use TAI for anything?

As far as scientific time keeping goes, it has already been found to be fundamentally flawed because the effect of gravity was not taken into account, and it will probably be replaced eventually by something adjusted for altitude.

Leaping seconds and looping servers

Posted Jul 4, 2012 13:26 UTC (Wed) by andreasb (guest, #80258) [Link]

Going by what's on the Wikipedia page, it has been corrected for altitude (normalized to mean sea level) since 1/1/1977. The former uncorrected TAI got the new name EAL.

Leaping seconds and looping servers

Posted Jul 4, 2012 18:32 UTC (Wed) by cesarb (subscriber, #6266) [Link]

> Does anybody actually use TAI for anything?

GPS does. The GPS timestamp is TAI with a fixed offset.

Leaping seconds and looping servers

Posted Jul 17, 2012 17:59 UTC (Tue) by Baylink (guest, #755) [Link]

No; that's factually incorrect, I'm afraid.

The problem here is not the *choice of timescale*: UTC is monotonic even over leap seconds; 58, 59, 60, 00.

The *problem* is that the kernel isn't following UTC *either*; not if it's ticking backwards. It's that *ticking backwards* part that is the problem, and I've yet to see a truly compelling reason why it should do so.

Leaping seconds and looping servers

Posted Jul 4, 2012 11:49 UTC (Wed) by paulj (subscriber, #341) [Link] (6 responses)

Ah yes, the damn POSIX epoch specifies time since the Epoch in UTC. I thought it was just "seconds since the epoch". :(

Anyway, the question remains: exactly who needs to have the leap-second occur as an inserted second, rather than a spaced out smear?

People who need precise control/responses: They definitely don't want it, they need accurate relative time.

People who need 1±0.1 second accuracy to the global reference of UTC: well, that must be because they need to compare time across systems. In which case, they need some *other* system to synchronise time across those systems, such as NTP. If those systems are within one organisation, they can use NTP to do the slew in a relatively co-ordinated fashion.

So who's left? It seems to me that it must be organisations who wish to compare time to ±0.1s accuracy across systems distributed over multiple organisations, who do not normally work closely enough together that they can arrange to synchronise to anything other than UTC.

So how many such organisations exist with those kinds of requirements? What is the application? Is it even realistic to expect ±0.1s accuracy in timestamps at such scales?

Why hold the reliability of our software hostage to requirements that few likely need or care about? Would it be possible to punt to userspace, and have NTP handle the method, through an interface that allows the kernel to be agnostic about it? Wouldn't that be much much better for pretty much everyone? (Don't you need to run NTP in the first place in order to get the leap-second?).

Leaping seconds and looping servers

Posted Jul 4, 2012 14:01 UTC (Wed) by faramir (subscriber, #2327) [Link] (3 responses)

I have this feeling that you have just described "the stock market". If true, that would pretty much explain why everyone ends up caring.

Leaping seconds and looping servers

Posted Jul 4, 2012 14:22 UTC (Wed) by paulj (subscriber, #341) [Link] (1 responses)

Trading requires imposing an absolute ordering - not an absolute time. You could specify order with UTC time-stamps I guess, but still you'd want a central arbiter to provide those time-stamps. Otherwise you'd need an honour system to settle trades - which surely would be open to abuse?

Leaping seconds and looping servers

Posted Jul 4, 2012 17:56 UTC (Wed) by jwakely (subscriber, #60262) [Link]

You only need an ordering at the matching engine. But most participants in the markets are not running matching engines. For measuring the age of quotes and your network latency and your software latency and the latency of messages from the exchange and the latency of messages from other sources and numerous other measurements you want accurate (definitely sub-0.1s!) timestamps, that agree across multiple organisations.

Leaping seconds and looping servers

Posted Jul 7, 2012 1:13 UTC (Sat) by BenHutchings (subscriber, #37955) [Link]

Some people were very happy that June 30 was not a trading day!

Leaping seconds and looping servers

Posted Jul 6, 2012 17:31 UTC (Fri) by giraffedata (guest, #1954) [Link]

> Ah yes, the damn POSIX epoch specifies time since the Epoch in UTC. I thought it was just "seconds since epoch". :(

Well, to be precise, there's no such thing as time since X in UTC. UTC is a means of identifying a moment in time that involves year, month, day, etc. It implicitly provides a means of identifying certain intervals too; for example, it tells you what "February 1992" is. But it doesn't deal with lengths of time (periods).

The relationship between the POSIX time representation and UTC is that the POSIX count assumes 60 seconds in every UTC minute, regardless of how long that minute actually lasts.

Leaping seconds and looping servers

Posted Jul 12, 2012 18:59 UTC (Thu) by VITTUIX-MAN (guest, #82895) [Link]

"Anyway, the question remains: exactly who needs to have the leap-second occur as an inserted second, rather than a spaced out smear?"

Any system that has heterogeneous hardware and software but nevertheless passes timestamps around could get upset if different machines also used heterogeneous sources of time, so that the smearing doesn't happen in unison. That is, different NTP servers, some of which smear and others not, or which use different smear periods and intervals.

Rare, surely; as rare as leap second problems. But have a million systems, and a few will get hit hard by this.

Leaping seconds and looping servers

Posted Jul 3, 2012 20:04 UTC (Tue) by kleptog (subscriber, #1183) [Link] (7 responses)

Let me put it this way, if there were an option in the NTP server:

leap_second_handling = smear|strict

I'd use the 'smear' setting. Alternatively, if there were a syscall that gave me TAI, I might use that. But monotonic time is just one of those assumptions that creeps in very easily.

Interestingly, you could define a smeared time which would be mostly UTC but be convertible exactly to both UTC and TAI on demand. But for me monotonicity is the most important bit really.

Leaping seconds and looping servers

Posted Jul 3, 2012 21:47 UTC (Tue) by lindi (subscriber, #53135) [Link] (5 responses)

Getting TAI time under GNU/Linux seems to be quite challenging. I came up with a rather hacky approach that seems to work: http://iki.fi/lindi/tai.c -- can you figure out how to clean that up?

Leaping seconds and looping servers

Posted Jul 4, 2012 18:49 UTC (Wed) by kleptog (subscriber, #1183) [Link] (1 responses)

Well, you can demonstrate it fairly easily on the command-line:
$ TZ="right/UTC" date ; TZ="posix/UTC" date
Wed Jul  4 18:26:24 UTC 2012
Wed Jul  4 18:26:49 UTC 2012
What you need is a library that can easily open multiple timezones at once. I thought glibc could do it, but it doesn't appear to, from reading the source. However, GLib seems to have support. I would have thought the g_time_zone_get_offset() would be enough, but apparently not. Actually, even though I can open both zones I can't convince GLib to give me the answer :(. Though it must be possible.

Leaping seconds and looping servers

Posted Jul 4, 2012 19:38 UTC (Wed) by lindi (subscriber, #53135) [Link]

Thanks for the pointer! If we had an easy way to get TAI I'd seriously consider using it in our internal systems, since many parts already assume they can use simple subtraction to get lengths of time intervals.

tai.c - why return t + 10?

Posted Jul 6, 2012 4:48 UTC (Fri) by pr1268 (guest, #24648) [Link] (2 responses)

Why return t + 10;?

I'm also curious about the curly braces creating a new scope but no if() / do / while() code.

I'm not trying to be critical; just a little curious...

tai.c - why return t + 10?

Posted Jul 6, 2012 5:43 UTC (Fri) by lindi (subscriber, #53135) [Link] (1 responses)

"1 January 1972 00:00:00 UTC was 1 January 1972 00:00:10 TAI exactly" -- http://en.wikipedia.org/wiki/Coordinated_Universal_Time

The new block was just for clarity.

why t + 10?

Posted Jul 8, 2012 1:53 UTC (Sun) by pr1268 (guest, #24648) [Link]

I was then going to ask why add ten seconds, but then I found out what the ten seconds were about (scroll down to the image titled "Time scales since the cesium atomic frequency standard").

Leaping seconds and looping servers

Posted Jul 4, 2012 9:40 UTC (Wed) by Tobu (subscriber, #24111) [Link]

Here's a little about libtai. lindi's approach is interesting as well, since it uses the leap seconds table implicitly present in Olson's tzdata, which is already regularly updated in stable distributions.

Leaping seconds and looping servers

Posted Jul 3, 2012 23:52 UTC (Tue) by xman (subscriber, #46972) [Link]

All this does though is change which assumptions about time break. For a particular use case, this might be the right thing to do, but in general, one approach has as much a chance of triggering a bug as the other.

Leaping seconds and looping servers

Posted Jul 3, 2012 23:49 UTC (Tue) by xman (subscriber, #46972) [Link]

Because that would destroy *other* contracts about precision and such with regard to time.

Leaping seconds and looping servers

Posted Jul 3, 2012 8:28 UTC (Tue) by nowster (subscriber, #67) [Link] (1 responses)

Don't forget that leap seconds can go either way. You can have an extra second 23:59:60 (as this year), or you could skip 23:59:59 completely (that has never happened yet). If the IERS determines that the earth's rotation is running fast relative to atomic time, the leap second applied would be negative.

Leaping seconds and looping servers

Posted Jul 3, 2012 10:28 UTC (Tue) by ballombe (subscriber, #9523) [Link]

True, but I doubt it will ever happen, if only for political reasons. The drift is strongly biased toward positive leap seconds, and even if a negative leap second were necessary due to fluctuation, it would likely be merged with the next positive leap second, given the pressure to avoid them.

what about the deadlocks?

Posted Jul 3, 2012 14:02 UTC (Tue) by joey (guest, #328) [Link] (4 responses)

I'm surprised there's no mention of the kernel deadlocks experienced by some due to the leap second insertion.

This seems to be a good overview: http://landslidecoding.blogspot.com/2012/07/linuxs-leap-s...

There were reports of deadlocks with older kernels on, at least, Debian stable. Above page also points out reports of a "spinlock lockup" message whose cause remains unknown.

what about the deadlocks?

Posted Jul 3, 2012 17:10 UTC (Tue) by mgedmin (subscriber, #34497) [Link]

They were briefly mentioned in the comments of the other LWN story: http://lwn.net/Articles/504835/

what about the deadlocks?

Posted Jul 3, 2012 23:59 UTC (Tue) by mcisely (guest, #2860) [Link] (2 responses)

I personally experienced the deadlock issue.

I spent 3+ unplanned hours at work last Saturday afternoon trying to figure out why our main server kept wedging itself every time it transitioned from single user to multi-user mode. I had no idea about the pending leap-second at the time and had been suspecting storm damage from the previous night. I finally clued-in when I found that the kernel hang would happen every time about 5-8 seconds after starting ntpd.

It was running Debian Lenny (yeah, old) with a Debian stock 2.6.32-5-amd64 kernel. Server architecture is a multicore Athlon setup.

The hangs started happening at about 7:43AM CDT. Once I figured out that the trigger was ntp, I disabled it, then searched the web and found this:

http://serverfault.com/questions/403732/anyone-else-experiencing-high-rates-of-linux-server-crashes-during-a-leap-second

Monday morning I enabled ntp again and started it - and as expected, no lockup...

what about the deadlocks?

Posted Jul 7, 2012 1:16 UTC (Sat) by BenHutchings (subscriber, #37955) [Link] (1 responses)

Did you get a kernel log for this?

what about the deadlocks?

Posted Jul 7, 2012 1:22 UTC (Sat) by mcisely (guest, #2860) [Link]

The system locked up solid each time, with no signs of distress or trouble anywhere prior to the lockup, not even on the console. So, no, unfortunately there were no kernel logs for this.

Leaping seconds and looping servers

Posted Jul 3, 2012 14:22 UTC (Tue) by bobsol (subscriber, #54641) [Link] (7 responses)

I'm curious about the scope of this problem. Where the reports refer to a distribution, Debian or a derivative keeps coming up. My servers are RH or Slackware based; no problems.

Leaping seconds and looping servers

Posted Jul 3, 2012 15:29 UTC (Tue) by ejr (subscriber, #51652) [Link]

And I haven't seen the problem on Debian (unstable & testing). Must be an interesting combination of factors.

Leaping seconds and looping servers

Posted Jul 3, 2012 17:16 UTC (Tue) by Ringding (guest, #34316) [Link] (1 responses)

I have a RHEL 6.2 server that sent MySQL into the CPU-burning state, kernel 2.6.32-220.17.1.el6.x86_64.

Leaping seconds and looping servers

Posted Jul 7, 2012 1:36 UTC (Sat) by BenHutchings (subscriber, #37955) [Link]

All kernel versions from the past few years have that bug.

The known bugs involving kernel hangs are:

1. Deadlock in printk at midnight (introduced in ???, fixed by commits b845b51, fa33507 in 2.6.29, 2.6.27.46, Debian package version 2.6.26-20)
2. Possible live-lock when NTP sets up the leap second (introduced in ???, fixed by 6b43ae8 in 3.4)

When I saw the fix for bug 2 it appeared that the live-lock being described was introduced by bd33126 etc. (post-3.3) and therefore not possible in earlier versions, but I'm no longer certain of this.

So I'm still hoping to find out what went wrong in 2.6.32 and in 3.2 so these can be fixed in stable updates.

Leaping seconds and looping servers

Posted Jul 3, 2012 18:32 UTC (Tue) by smoogen (subscriber, #97) [Link] (1 responses)

The problem was not limited to any one distribution.

My Fedora laptop running Chrome went 100% load during the leap second. My Fedora and RHEL server just running apache didn't.

However other RHEL and Fedora boxes reported issues depending on the software that was being run at that time.

Leaping seconds and looping servers

Posted Jul 4, 2012 11:05 UTC (Wed) by man_ls (guest, #15091) [Link]

My Debian testing machine went to 100% CPU with either Chrome or Firefox; the former's load was quite irregular with many spikes, the latter a solid 100%. Luckily I have two cores, and the second one seemed to be unaffected.

The funny thing was that after stopping both browsers everything went back to normal, but restarting either of them sent the CPU load up again. I had never seen anything like it, so I took it as a divine sign telling me to go to bed and didn't look back... until the first LWN report.

My Debian stable SheevaPlug server displayed the deadlock, or something like it. That is a lot of bugs for a tiny second!

Leaping seconds and looping servers

Posted Jul 4, 2012 9:09 UTC (Wed) by ftc (subscriber, #2378) [Link] (1 responses)

I had some problems with a java-based streaming server, where identical software was installed on one CentOS 6.2 machine and two Debian Squeeze ones.

The installation on CentOS was affected by the problem (i.e. java started using an insane amount of CPU due to timers expiring immediately), while the installations on Debian appeared to work fine.

Leaping seconds and looping servers

Posted Jul 4, 2012 17:38 UTC (Wed) by Lennie (subscriber, #49641) [Link]

I've seen 64-bit Java on Debian having problems, but 32-bit Java on Debian was fine (even on the same machine).

So maybe you have 32-bit Java on Debian and 64-bit Java on CentOS ?

Leaping seconds and looping servers

Posted Jul 3, 2012 20:54 UTC (Tue) by andrewt (guest, #5703) [Link] (2 responses)

All this talk of fixing Linux, and I was hoping to hear how we would alter the Earth's rotation...

Leaping seconds and looping servers

Posted Jul 3, 2012 21:07 UTC (Tue) by corbet (editor, #1) [Link] (1 responses)

You'd be amazed how hard it is to get that kind of patch merged...

Leaping seconds and looping servers

Posted Jul 4, 2012 10:37 UTC (Wed) by nix (subscriber, #2304) [Link]

But, dammit, that sort of thing is just what a body with a name like the International Earth Rotation Service should do! Should the Earth spin on unregulated?

(I note that they've been renamed to the 'International Earth Rotation and Reference Systems Service', which is a very much less nifty name somehow.)

Leaping seconds and looping servers

Posted Jul 3, 2012 22:36 UTC (Tue) by butlerm (subscriber, #13312) [Link] (16 responses)

It ought to be obvious by now that POSIX time is a cruel joke not suitable for timing anything. So the question is, what is to be done about it? I suggest the following:

1. The adoption of a standard for real time, something like CLOCK_TAI. (It is too bad that CLOCK_REALTIME is already taken for such a ridiculously non-real-time clock.)
2. The standardization of an interface for smoothed UTC, something like CLOCK_SLS.
3. The standardization of an interface for strict (non-smoothed, non-slewed) UTC, something like CLOCK_UTC.
4. Deprecate CLOCK_REALTIME.
5. At the kernel level, do all timekeeping using TAI or the equivalent. Convert to UTC and UTC-SLS as necessary.

Leaping seconds and looping servers

Posted Jul 4, 2012 2:40 UTC (Wed) by wahern (subscriber, #37304) [Link] (13 responses)

POSIX time is a beautiful thing. It simplifies time management for the vast majority of use cases. You can depend on 86400 "seconds" per day, yet still derive all the proper dates with a simple formula. For most cases, even software cases, time matters because we want to relate events to the natural rhythms of human life, not because we want to relate it to some arbitrary point in 1955 when the first atomic clock started ticking away.

The problem is that it's probably not the best thing on which to base your internal clock, nor for high-precision, SI second based algorithms.

The second problem is that a ton of Unix kernel code (Linux and *BSD), as well as the NTP system, is predicated on the existing behavior. Which is why the software industry and their lobbyists and politicians want to redefine UTC, instead of letting people migrate to TAI (or a TAI successor), with mappings to UTC.

Leaping seconds and looping servers

Posted Jul 4, 2012 13:24 UTC (Wed) by paulj (subscriber, #341) [Link] (12 responses)

> POSIX time is a beautiful thing. … You can depend on 86400 "seconds" per day,

Do leap seconds not invalidate that latter assumption? There are still 24 hours in the day, and 1440 minutes, but any of those minutes could have 61 seconds in them?

Leaping seconds and looping servers

Posted Jul 4, 2012 15:05 UTC (Wed) by nix (subscriber, #2304) [Link] (1 responses)

Not under POSIX time they can't. One of those seconds might take 2s to pass, is all. It's a different tradeoff (and one that is intolerable for some applications, though perhaps not as intolerable as a livelock!)

Leaping seconds and looping servers

Posted Jul 4, 2012 15:29 UTC (Wed) by Tobu (subscriber, #24111) [Link]

Or said second happens twice, if you have higher-resolution POSIX timestamps.

Leaping seconds and looping servers

Posted Jul 5, 2012 1:46 UTC (Thu) by wahern (subscriber, #37304) [Link] (9 responses)

That's why I used the scare quotes. "Seconds" are the units the computer stores, but 7 months later nobody cares how many seconds have passed. They care how many months, days, hours and minutes have passed.

The SI second is different than the second that most people care about. When people speak of seconds, they're usually speaking in terms of remainders of a minute, which are remainders of an hour, which are remainders of a day. People chunk time, the same way we chunk everything else for our memory.

The day is a fuzzy unit, and equating day units with second units is always going to fail. So you can either redefine the day (as industry wants to do by changing UTC), or you can redefine the second, as POSIX effectively does. Or, I suppose, software could stop using second units for storage, and use expanded date-time descriptions.

I say that POSIX time is beautiful because it, rather accidentally, codified a redefinition of the second. And it turns out that it works very well for many use cases, both where you produce output directly for human consumption and where you feed algorithms which collaterally produce output for human consumption. It's a manifestation of the worse-is-better principle, where you sacrifice a quality like elapsed-second precision for other qualities--simplicity and, depending on the context, robustness. However, it's just horrible for cases that rely on precise SI seconds, and it's clearly not simple from the perspective of the kernel.

Leaping seconds and looping servers

Posted Jul 5, 2012 4:53 UTC (Thu) by raven667 (subscriber, #5198) [Link] (1 responses)

How about we use GUIDs and distribute copies of the official list that uniquely identifies every second past or future.

8-) kidding of course.

Leaping seconds and looping servers

Posted Jul 12, 2012 13:41 UTC (Thu) by njs (subscriber, #40338) [Link]

Well. Except. It turns out that to get a really precise TAI timestamp, what you do is:

1) When the event occurs, get a timestamp from the nearest convenient national lab atomic clock.

2) Wait a few weeks while the different national labs compare their clock's relative drift rates over the last period, pick some sort of average as the "real TAI time", and publish tables mapping from what their clock actually did to this consensus clock that doesn't exist.

3) Look up the timestamp you got in step 1 in this table.

It's hard to make jokes about time...

Leaping seconds and looping servers

Posted Jul 5, 2012 5:20 UTC (Thu) by butlerm (subscriber, #13312) [Link]

POSIX time is great for most of the uses to which it is put. There is no need to get rid of it any time soon, nor could we. It just shouldn't be used as the basis of anything requiring sub-second accuracy, linearity, or precision.

This failure in particular would not have occurred if the kernel and all the pertinent applications used a reliable time base to do timing with, instead of trying to derive reliable timing from a time base with all the stability of a drunken sailor.

Leaping seconds and looping servers

Posted Jul 5, 2012 9:40 UTC (Thu) by dgm (subscriber, #49227) [Link] (5 responses)

> I say that POSIX time is beautiful because it, rather accidentally, codified a redefinition of the second.

Forcing the 1 day=86400 seconds stuff was one of the worst ideas ever, and we're still paying the consequences. Effectively it means you are really counting days, not seconds, but a day is not a precisely defined unit. Had they just kept it as a simple count of seconds, that would have been useful and simple. And less error prone.

I sincerely don't get how you can say this is beautiful.

Leaping seconds and looping servers

Posted Jul 6, 2012 17:14 UTC (Fri) by giraffedata (guest, #1954) [Link] (4 responses)

> Forcing the 1 day=86400 seconds stuff was one of the worst ideas ever ... Had they just kept it as a simple count of seconds, that would have been useful and simple. And less error prone.

You might be able to argue that in 2012 it would be better for POSIX to specify a simple count of seconds, but if you look at how the world was when that time format was invented, there just can't be any question that not including leap seconds in the count was the right thing to do.

Users of computers want traditional UTC-style year-month-day etc. datetimes. To convert from a simple count of seconds to that requires not only significantly more code to be written but also some way to know when the leap seconds were: a file, a manual maintenance procedure, or a global computer network, none of which was practical enough at the time to meet the timekeeping requirements of the users.

Most of the present discussion isn't about whether the POSIX standard should be a simple count of seconds, but whether the kernel's time base should be. POSIX doesn't say how anyone has to keep time; it just says how it gets communicated. Either way, some users are going to be adjusting for leap seconds; some adding them, others removing them.

Leaping seconds and looping servers

Posted Jul 9, 2012 11:33 UTC (Mon) by dgm (subscriber, #49227) [Link] (3 responses)

> there just can't be any question that not including leap seconds in the count was the right thing to do.

It was a mistake, and no amount of historic excuses can make it a right decision. It never was the right thing to do, nor is it now.

> Users of computers want traditional UTC-style year-month-day etc.

Users don't talk to the kernel, nor to the system libraries. They talk to applications. And by the way, users really don't care about UTC; 99% of the time they want the time their watch says it is. Until that's not enough, and then what they want is an _unambiguous_ moment in time. Anything between those extremes is completely useless.

> To convert from a simple count of seconds to that requires not only significantly more code to be written but some way to know when the leap seconds were.

What exactly did forcing the duration of the day solve? Nothing; everything is exactly as complex as it was, but now you have that "fictional" day which is different from what users need. And you still need complex time handling code because there are such niceties as timezones, local calendars, different week conventions, and even time dilation!

> Most of the present discussion isn't about whether the POSIX standard should be a simple count of seconds, but whether the kernel's time base should be.

The mere fact that there's a discussion is a sign that people are not sufficiently aware of past mistakes. We should be raising awareness of this, with the goal of avoiding repeating them. The kernel should just count time in the most unambiguous way it can, and let applications handle the presentation.

> POSIX doesn't say how anyone has to keep time; it just says how it gets communicated.

An application cannot give the user an unambiguous moment in time if all it can get from POSIX is an ambiguous representation. That is the crux of the matter.

Leaping seconds and looping servers

Posted Jul 9, 2012 15:57 UTC (Mon) by giraffedata (guest, #1954) [Link] (2 responses)

> Users of computers want traditional UTC-style year-month-day etc.

> users really don't care about UTC, 99% of the time they want the time their watch says it is.

Every watch in the world displays traditional UTC-style year-month-day etc. My point is that POSIX time makes it more practical for a computer to display that than a straight seconds-since epoch time format would. Or are you referring to the watch being inaccurate?

> And you still need complex time handling code because there are such niceties as timezones, local calendars, different week conventions, and even time dilation!

Except for the time dilation, which history proves you don't need in your time handling code, those are all simpler to handle than leap seconds. Asking the computer to understand leap seconds is asking for a whole other level of computation. The most significant part of that is knowing when the leap seconds are.

> What exactly did forcing the duration of the day solve?

At the risk of being repetitive, it allows for practical calculation of UTC-style, wristwatch-style datetimes. Also for calculating differences in the larger units such as hours and days (when a computer user says "3 days after 10:00 Wednesday," he normally means 10:00 Saturday, not 86400*3 seconds after 10:00 Wednesday).

It also causes or fails to solve some other problems. The only question is which problems are greater?

By the way, if unambiguous datetimes were seen as a problem worth solving in the days that POSIX was invented, I think the proper solution would have been to go with the same "seconds since epoch not counting leap seconds" and then have a separate bit saying "leap second" that the folks who need to distinguish between 23:59:59 and 23:59:60 could use. Practically computable datetimes are that important.

Leaping seconds and looping servers

Posted Jul 9, 2012 18:44 UTC (Mon) by dark (guest, #8483) [Link] (1 responses)

Ah, but consider the user who says "3 days after Friday 10:00". Presumably that user wants "Monday 10:00", and not "Monday 9:00, 10:00 or 11:00 depending on whether there is a DST transition this weekend". So yeah, you can't use 86400*n anyway. Unless you want to mishandle DST, which is the option chosen by the vast majority of applications :)

Leaping seconds and looping servers

Posted Jul 9, 2012 23:16 UTC (Mon) by dgm (subscriber, #49227) [Link]

Indeed. 1 day = 86400 seconds is a computation that was broken from the start. It wasn't working back in the day POSIX was defined, and of course, things could only go downhill from there.

As a bunch of others have pointed out, leap seconds are very similar in essence to timezones -or any other oddity of civil time- in that they are _arbitrary_. They really belong in the code that has no choice but to handle them: the system libraries (glibc for modern Linux distros).

Leaping seconds and looping servers

Posted Jul 4, 2012 9:29 UTC (Wed) by Tobu (subscriber, #24111) [Link]

There was a recent patch (at the RFC stage) adding CLOCK_TAI to the kernel. The implementation is convoluted (John Stultz prefers that it be built using the current abstractions, which also had a problem with CLOCK_MONOTONIC pausing during a leap second), so it might still be bug-prone. Actually testing the damn clock subsystem ahead of time, and reviewing any related commits, could compensate for that.

Leaping seconds and looping servers

Posted Jul 5, 2012 15:27 UTC (Thu) by kjp (guest, #39639) [Link]

I mentioned in the earlier article comments that I would support UTC-SLS (smeared seconds) in ntpd. Seriously, my company would contribute to a bounty for that. Is there a petition somewhere... PS Posix time works great for an awful lot of people.. when it's not going backwards.

Leaping seconds and looping servers

Posted Jul 4, 2012 1:48 UTC (Wed) by leromarinvit (subscriber, #56850) [Link]

Now I finally know what happened to my file server/router a few days ago! A few minutes after the leap second, Nagios sent me a warning about the load being somewhere around 15. I didn't think much of it, since the nightly backup sometimes does that. The next day I couldn't log in over SSH any more, and some time later dnsmasq stopped serving DHCP requests. Killing everything via sysrq finally brought it back to its senses, but at the expense of all the services normally started by init, so I just rebooted in the end.

There was a Java program running that updates a timestamp in its GUI every second, so that probably caused it. I initially thought said Java app had just gone mad and used up all memory, causing the box to swap itself to death, because I saw the HDD light flickering like mad. Of course, this being Java, it quite possibly did allocate some memory every time the counter fired, saving the GC for later...

For reference, this is a Ubuntu 10.04 box running 3.4.3-ck.

Leaping seconds and looping servers

Posted Jul 7, 2012 11:45 UTC (Sat) by lab (guest, #51153) [Link] (2 responses)

For those interested in the deeper background, I can highly recommend this article by an authority on the matter (my countryman Poul-Henning Kamp): http://cacm.acm.org/magazines/2011/5/107699-the-one-secon.... It's a fascinating read.

Leaping seconds and looping servers

Posted Jul 8, 2012 0:14 UTC (Sun) by pr1268 (guest, #24648) [Link]

Fascinating indeed, and quite apocalyptic, too:

But Linus Torvalds' observation that "95% of all programmers think they are in the top 5%, and the rest are certain they are above average" [...] When a large fraction of the world economy is run by the creations of lousy programmers, and when embedded systems are increasingly capable of killing people, do we raise the bar and demand that programmers pay attention to pointless details [1]...?

Heavens, yes!

[1] This experience has shown us that how Linux handles leap seconds is not a "pointless" detail.

Leaping seconds and looping servers

Posted Jul 8, 2012 10:45 UTC (Sun) by nix (subscriber, #2304) [Link]

Couple of amusing typos in that.

"Some systems have resorted to slowing down the clock by 1/3600th for the last hour before the leap second, hoping that nobody notices that seconds suddenly are 277 microseconds long."

I think that might be noticed! Seconds being 277 microseconds longer, though, is less likely to be noticed except by people who really care.

Leaping seconds and looping servers

Posted Jul 9, 2012 13:54 UTC (Mon) by Jonno (subscriber, #49613) [Link] (6 responses)

What we really should do is make the kernel (including all syscalls that deal with time) use TAI exclusively, and let tzdata convert between TAI, UTC and local time zones as needed.

That way inserting a leap second will be handled the same way as when politicians change the DST rules, with a simple tzdata package update.

Yes, this would break POSIX, but it wouldn't be the first time Linux breaks POSIX in some small way when POSIX dictates stupid behaviour.

Leaping seconds and looping servers

Posted Jul 9, 2012 16:15 UTC (Mon) by nix (subscriber, #2304) [Link] (5 responses)

It would not just break POSIX but a huge number of applications, unless you complicated glibc to do the conversion, in which case it would still be obviously broken in some major cases, as the time you got from gettimeofday() would be different from the time visible on a file you just created when you stat()ted it, unless you had glibc adjust *that* too -- and that way lies madness.

And for what benefit? Moving a pile of complexity out of ntpd (which is meant for dealing with this sort of thing) and out of the kernel's time handling code (which is a single body of code maintained by people who know what they're doing) into glibc and a vast body of applications. Now perhaps glibc could get its updates via tzdata, but are the applications all going to get it right? They get it wrong *now*, many would need changing, and as has been pointed out elsethread, getting this right is hard, since even the original authors of many programs probably didn't expect 'this time tomorrow' and 'this time 86400 seconds away' to have distinct answers, and nearly all the time they wouldn't.

The solution to bugs in a bit of code in highly-tested critical software that is hard to debug and test because it caters to a rarely-arising condition is surely *not* to distribute and multiply that code among a vast number of applications, critical and otherwise, many of which are much less tested than the kernel is.

For this to be less dangerous, you'd need to translate every single timestamp the kernel passes to userspace by whatever means (including timestamps in network filesystems!), teach everything that touched raw fs dumps to translate times as well, and, unless you want to waste all that time, add TAI-returning syscalls akin to gettimeofday() et al... and that looks like a lot more code than the existing set of leap-second-handling code, which is clearly *already* too rarely executed to be expected to keep working between leap second invocations.

Or we could just wait. Leap seconds are getting quadratically more frequent over long enough timespans, and will be downright common in timespans comparable to that since the invention of the computer. When the things are occurring monthly, something will either be done to fix it or at the very least the code to handle them will be frequently tested!

Leaping seconds and looping servers

Posted Jul 9, 2012 17:49 UTC (Mon) by rschroev (subscriber, #4164) [Link] (2 responses)

Maybe I'm missing something, but it seems to me you're making it much more complex than it needs to be.

Given the fact of leap seconds, it seems to me the best way to handle time is:

- The kernel deals exclusively with TAI, by which I mean the number of seconds since the Epoch. Leap seconds are seconds just like any other. That means that some days are somewhat shorter or longer than 86400 seconds, but that's not important to the kernel. That is consistent with the manpage already: "time() returns the time as the number of seconds since the Epoch, 1970-01-01 00:00:00 +0000 (UTC)."

- Programs can use that directly if they want to, mostly if they want to know the time delta between two moments in seconds. In other cases, they can call asctime(), ctime(), gmtime() or localtime(), which handle the translation from TAI to UTC and local time. Perhaps functions should be added for conversion between TAI and UTC, both expressed as seconds since the epoch.

This way the kernel time handling code is as simple as can be (no special cases there), and the complexity is moved to glibc, which already knows how to handle time zones. Adding or subtracting leap seconds is very similar to handling time zones, I would think. The system always has a simple unambiguous idea of the time, converted to/from wall clock time as needed by glibc for interacting with users, which to me is just a natural extension of the current concept of the kernel using UTC instead of local time.

I agree it makes time delta calculations more difficult: instead of adding n * 86400 seconds for n days, you have to add the required number of days/months/years in struct tm.

If I understand you correctly, you are saying that such a solution would cause the time from gettimeofday() to be different from the file time on a freshly created file. I don't understand that: since TAI is all the kernel knows, both will give you the same value.

The only real disadvantage I see is that it potentially makes interoperation between different systems harder, because some systems might do it this way and some might not.

Leaping seconds and looping servers

Posted Jul 10, 2012 16:00 UTC (Tue) by nix (subscriber, #2304) [Link] (1 responses)

The problem with that ultra-simple case is that it is completely incompatible with the installed base of applications. You *cannot* change gettimeofday() et al to return TAI, because nobody who calls gettimeofday() is expecting it, and because the relevant standard guarantees that adding 86400 to a time will always give you the same time on the next day. A *lot* of code depends on this assumption, and you can't sensibly distinguish between code that wants 'exactly one day from now' and code that wants 'the same time, tomorrow' -- which suddenly become different things, though the authors of the code making that assumption pretty much universally didn't expect that.

Leaping seconds and looping servers

Posted Jul 10, 2012 16:09 UTC (Tue) by rschroev (subscriber, #4164) [Link]

> ... and because the relevant standard guarantees that adding 86400 to a time will always give you the same time on the next day. A *lot* of code depends on this assumption, ...

That assumption is already wrong twice a year in places that observe daylight saving time, by a full hour in most places, which is a lot more than the difference between TAI and UTC. Code that depends on it is already incorrect.

Leaping seconds and looping servers

Posted Jul 9, 2012 20:49 UTC (Mon) by Jonno (subscriber, #49613) [Link] (1 responses)

> It would not just break POSIX but a huge number of applications

Very few applications make syscalls directly; almost all go through glibc, which already converts times using tzdata to get time-zone handling correct.

> unless you complicated glibc to do the conversion, in which case it would still be obviously broken in some major cases, as the time you got from gettimeofday() would be different from the time visible on a file you just created when you stat()ted it

Of course not, all not-time-zone-aware times would be TAI, and all time-zone-aware times would be correct for that time-zone. UTC would be just another time-zone (which it already is, though today the kernel-time to UTC conversion is trivial).

> And for what benefit? Moving a pile of complexity [...] into glibc and a vast body of applications.

Applications would need no more complexity than what they already need for correct time-zone handling. The small number of applications that lack that (small) complexity today and thus use UTC exclusively (and are thus wrong for all users at least half the year) would just start using TAI instead (and thus be wrong for all users all year). All other applications would work just fine without any change.

tz-data would need to start carrying leap second information, and glibc would need to make use of it, but that is trivial compared to what they already deal with (leap seconds are, after all, the same in all jurisdictions).

The only real problem is handling the transition correctly. I believe only three components would need fixes for it to work.

  • glibc would need to detect the kernel version and decide whether to use the leap second information from tz-data depending on whether the kernel runs in UTC or in TAI.
  • hwclock would need to use glibc to convert between UTC and kernel time (just like it does today when it converts between local time and kernel time for dual boot system whose BIOS clock runs in local time).
  • NTP would need to either introduce a flag day when the NTP pool switches from UTC to TAI, or (more likely) add some compatibility code so new NTP versions that speak TAI can communicate with old NTP versions that speak UTC.

Leaping seconds and looping servers

Posted Jul 10, 2012 16:02 UTC (Tue) by nix (subscriber, #2304) [Link]

You completely ignored everything I mentioned about filesystems, networking, the intersection of the two, and other routes for times out of the kernel which do not pass through glibc. Since this is the nub of the problem, requiring replication of rarely-tested TAI-to-UTC conversion code in many places where it is currently centralized in two (NTP and the kernel), it is not surprising that you thought there was no real problem. There is.

Leaping seconds and looping servers

Posted Jul 12, 2012 13:50 UTC (Thu) by njs (subscriber, #40338) [Link] (1 responses)

I just find it baffling that apparently there is not a single Linux-supporting IT shop in the world that thought "hmm, leap second coming up in a few months, maybe we should test that". It's not like the hrtimer bug was subtle or dependent on an unusual system configuration.

Leaping seconds and looping servers

Posted Jul 12, 2012 14:41 UTC (Thu) by Cyberax (✭ supporter ✭, #52523) [Link]

Yep. As we call it in Russia: "The winter came suddenly".


Copyright © 2012, Eklektix, Inc.
This article may be redistributed under the terms of the Creative Commons CC BY-SA 4.0 license
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds