The trouble with the TSC

By Jake Edge
May 19, 2010

The time stamp counter (TSC) provided by x86 processors is a high-resolution counter that can be read with a single instruction (RDTSC), which makes it a tempting target for applications that need fine-grained timestamps. Unfortunately, it is also rather unreliable, so the kernel jumps through hoops to decide whether to use it and to try to detect when it goes awry. An effort to export the kernel's knowledge about the reliability of the TSC has met strong resistance for a number of reasons, but the biggest is that the kernel developers don't think that applications should be accessing the counter directly.

Dan Magenheimer and Venkatesh Pallipadi proposed adding a /sys/devices/tsc directory with several entries corresponding to the kernel's internal TSC information, including the tsc_unstable flag, which governs whether the kernel uses the counter as a stable time source. Andi Kleen questioned the idea:

Is this really a good idea? It will encourage the applications to use RDTSC directly, but there are all kinds of constraints on that. Even the kernel has a hard time with them, how likely is it that applications will get all that right?

That is exactly what the patch is meant to do, Magenheimer said, because applications have no reliable way to determine whether the standard system calls will be "fast" or "slow":

The problem is from an app point-of-view there is no vsyscall. There are two syscalls: gettimeofday and clock_gettime. Sometimes, if it gets lucky, they turn out to be very fast and sometimes it doesn't get lucky and they are VERY slow (resulting in a performance hit of 10% or more), depending on a number of factors completely out of the control of the app and even undetectable to the app.

Note also that even vsyscall with TSC as the clocksource will still be significantly slower than rdtsc, especially in the common case where a timestamp is directly stored and the delta between two timestamps is later evaluated; in the vsyscall case, each timestamp is a function call and a convert to nsec but in the TSC case, each timestamp is a single instruction.
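The two ways of taking a timestamp that Magenheimer contrasts can be sketched side by side. raw_stamp() is a single RDTSC through the compiler's __rdtsc() intrinsic; on non-x86 builds it falls back to the clock so the sketch still compiles. ns_stamp() goes through clock_gettime(CLOCK_MONOTONIC) and converts the result to nanoseconds on every call. Both helper names are made up for illustration.

```c
#define _GNU_SOURCE
#include <stdint.h>
#include <time.h>
#if defined(__x86_64__) || defined(__i386__)
#include <x86intrin.h>
#endif

/* clock_gettime() path: a (v)syscall plus a conversion to nanoseconds. */
uint64_t ns_stamp(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000u + (uint64_t)ts.tv_nsec;
}

/* Raw path: one RDTSC instruction, result in TSC ticks (not ns). */
uint64_t raw_stamp(void)
{
#if defined(__x86_64__) || defined(__i386__)
    return __rdtsc();
#else
    return ns_stamp();   /* non-x86 fallback so the sketch still builds */
#endif
}
```

A caller stores raw_stamp() values and subtracts them later. The raw difference is in TSC ticks, though: turning it into time requires knowing the TSC frequency and assuming that the counter ticks at a constant rate on every CPU, which is exactly what the rest of this discussion calls into question.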

Depending on the hardware, gettimeofday() and clock_gettime() may be implemented as vsyscalls (virtual system calls) rather than standard system calls, which eliminates the transition from user space into the kernel. Vsyscalls are code stored in a special memory region in user space (the vdso region) that can read kernel-maintained data, like clock ticks. With vsyscalls the calls are (relatively) fast, but on hardware (or virtual machines) where reaching a reliable counter requires kernel-space operations, a vsyscall cannot be used, so the calls are slower. For applications that "need to obtain timestamp data tens or hundreds of thousands of times per second", the difference is significant.
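Because an application cannot ask whether clock_gettime() is taking the fast path, about the only thing it can do is time the call. The sketch below averages the cost of back-to-back clock_gettime(CLOCK_MONOTONIC) calls. The function names and the 200ns cut-off are assumptions for illustration; a real threshold would have to be calibrated on the target hardware.

```c
#define _GNU_SOURCE
#include <stdint.h>
#include <time.h>

static uint64_t to_ns(const struct timespec *ts)
{
    return (uint64_t)ts->tv_sec * 1000000000u + (uint64_t)ts->tv_nsec;
}

/* Average cost, in nanoseconds, of one clock_gettime(CLOCK_MONOTONIC)
 * call, measured over `iterations` back-to-back calls. */
double clock_call_cost_ns(int iterations)
{
    struct timespec start, end, scratch;

    if (iterations <= 0)
        return 0.0;
    clock_gettime(CLOCK_MONOTONIC, &start);
    for (int i = 0; i < iterations; i++)
        clock_gettime(CLOCK_MONOTONIC, &scratch);
    clock_gettime(CLOCK_MONOTONIC, &end);
    return (double)(to_ns(&end) - to_ns(&start)) / iterations;
}

/* Hypothetical policy: call the clock "fast" (probably served in user
 * space, without entering the kernel) if one call costs under 200ns.
 * The threshold is an illustrative assumption, not a kernel constant. */
int clock_calls_look_fast(void)
{
    return clock_call_cost_ns(100000) < 200.0;
}
```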

But Magenheimer believes that if the kernel finds the TSC stable enough for its own timekeeping purposes, then that guarantees that it is usable by applications. Arjan van de Ven and Thomas Gleixner are quick to correct that misunderstanding. Van de Ven notes that the stability of the TSC can change under certain circumstances and there would be no way to notify the applications. His advice: "friends don't let friends use rdtsc in application code".

Gleixner goes into some detail about how the TSC can get out of whack: system management mode interrupts (SMIs) may fiddle with the TSC to hide their presence, multiple cores can have different values because of boot offsets and/or hotplugging, and multiple sockets can introduce differences due to separate clocks or to temperature-induced drift in the clock signals. There is, in short, nothing reliable about the TSC: "the stupid hardware is not reliable whether it has some 'I claim to be reliable tag' on it or not". Gleixner did offer a possible alternative, though:

[...] but as long as we do not have some really reliable hardware I'm going to NACK any exposure of the gory details to user space simply because I have to deal with the fallout of this.

What we can talk about is a vget_tsc_raw() interface along with a vconvert_tsc_delta() interface, where vget_tsc_raw() returns you an nasty error code for everything which is not usable.
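Gleixner's suggestion amounts to a calling pattern: ask for the raw counter, get an error whenever the kernel considers it unusable, and let the kernel supply the ticks-to-nanoseconds conversion. The sketch below shows an application using such an interface. vget_tsc_raw() and vconvert_tsc_delta() are only names from the mailing-list message; no such interface exists, so the versions here are hypothetical stubs (with an assumed 2 GHz counter), as are struct stamp, take_stamp(), and stamp_delta_ns().

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stdint.h>
#include <time.h>
#if defined(__x86_64__) || defined(__i386__)
#include <x86intrin.h>
#endif

/* HYPOTHETICAL stand-ins for the proposed interface. The kernel's
 * "verdict" and the TSC frequency are plain variables in this sketch. */
static int tsc_usable_stub = 0;          /* kernel says: don't use the TSC */
static uint64_t tsc_khz_stub = 2000000;  /* assumed 2 GHz constant-rate TSC */

static int vget_tsc_raw(uint64_t *out)
{
#if defined(__x86_64__) || defined(__i386__)
    if (tsc_usable_stub) {
        *out = __rdtsc();
        return 0;
    }
#endif
    (void)out;
    return -ENODEV;                      /* the "nasty error code" */
}

/* Ticks -> ns (ticks * 1e6 / kHz), split to avoid overflow on big deltas. */
static uint64_t vconvert_tsc_delta(uint64_t delta)
{
    return (delta / tsc_khz_stub) * 1000000u
         + (delta % tsc_khz_stub) * 1000000u / tsc_khz_stub;
}

/* Application side: prefer the raw counter, fall back to clock_gettime(). */
struct stamp {
    int raw;        /* 1: TSC ticks, 0: nanoseconds from CLOCK_MONOTONIC */
    uint64_t v;
};

struct stamp take_stamp(void)
{
    struct stamp s;
    s.raw = (vget_tsc_raw(&s.v) == 0);
    if (!s.raw) {
        struct timespec ts;
        clock_gettime(CLOCK_MONOTONIC, &ts);
        s.v = (uint64_t)ts.tv_sec * 1000000000u + (uint64_t)ts.tv_nsec;
    }
    return s;
}

/* Store elapsed ns in *ns. Returns -1 if the stamps are of different
 * kinds (the kernel's verdict changed in between), 0 otherwise. */
int stamp_delta_ns(struct stamp a, struct stamp b, uint64_t *ns)
{
    if (a.raw != b.raw)
        return -1;
    *ns = a.raw ? vconvert_tsc_delta(b.v - a.v) : b.v - a.v;
    return 0;
}
```

The mixed-kind check in stamp_delta_ns() reflects van de Ven's point that the TSC's status can change while an application is running.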

Because the performance of gettimeofday() and friends is so uncertain, some unnamed "enterprise applications" currently try to figure out whether they can use the TSC, and use it if they think it will work. Magenheimer suggests that perhaps information about that performance could be made available:

But the kernel doesn't expose a "gettimeofday performance sucks" flag either. If it did (or in the case of the patch, if tsc_reliable is zero) the application could at least choose to turn off the 10000-100000 timestamps/second and log a message saying "you are running on old hardware so you get fewer features".

Magenheimer also wonders if the kernel developers are suffering from "hot stove" syndrome, in that they have been burned in the past and are reluctant to even consider changes. But Gleixner and van de Ven both point out that there is no hardware that can make the guarantees that Magenheimer wants. And Gleixner has the burn marks to prove it:

I'm unfortunately forced to deal with the 500+ different variants of borked timers and that makes me very reluctant to believe anything what chip/board/bios vendors promise. It's not the one time hot stove experience, it's the constant exposure to the never ending supply of hot stoves, which makes me nervous.

While the discussion had various interesting analogies, including hanging ropes/knives and condoms versus abstinence, it did not (yet) find a car analogy. It did, however, seem to find some common ground on the idea that information about whether the clock calls are implemented as vsyscalls or as system calls should be exported. That is unlikely to satisfy those who have been "using vsyscalls for a while and still have a performance headache", whom Magenheimer quotes, but there is nothing stopping applications from reading the TSC directly. Those applications just have to be prepared to handle any strange TSC behavior they encounter.
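For applications that do take that route, one cheap sanity check is to hop across the CPUs the process may run on and see whether the TSC ever appears to go backwards. The sketch below does that with sched_setaffinity() and the __rdtsc() intrinsic; count_tsc_backsteps() is a hypothetical helper name. It is deliberately crude: because the reads are sequential, it only detects offsets between CPUs that are larger than the cost of a migration, and a clean result says nothing about SMIs or about what happens after the next hotplug event.

```c
#define _GNU_SOURCE
#include <sched.h>
#include <stdint.h>
#if defined(__x86_64__) || defined(__i386__)
#include <x86intrin.h>
#endif

/* Visit every CPU in the process's affinity mask, `rounds` times, reading
 * the TSC on each, and count how often a reading is lower than the one
 * before it. Returns that count, or -1 if the check cannot run here. */
int count_tsc_backsteps(int rounds)
{
#if defined(__x86_64__) || defined(__i386__)
    cpu_set_t allowed, one;
    uint64_t prev = 0;
    int have_prev = 0, backsteps = 0;

    if (sched_getaffinity(0, sizeof(allowed), &allowed) != 0)
        return -1;
    for (int r = 0; r < rounds; r++) {
        for (int cpu = 0; cpu < CPU_SETSIZE; cpu++) {
            if (!CPU_ISSET(cpu, &allowed))
                continue;
            CPU_ZERO(&one);
            CPU_SET(cpu, &one);
            /* Once this succeeds, the thread should be running on `cpu`. */
            if (sched_setaffinity(0, sizeof(one), &one) != 0)
                continue;
            uint64_t now = __rdtsc();
            if (have_prev && now < prev)
                backsteps++;
            prev = now;
            have_prev = 1;
        }
    }
    sched_setaffinity(0, sizeof(allowed), &allowed);  /* restore the mask */
    return backsteps;
#else
    (void)rounds;
    return -1;
#endif
}
```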

Ingo Molnar tries to clarify the reasons that the kernel can't export the reliability information: "The point is for the kernel to not be complicit in practices that are technically not reliable. [...] So the kernel wont 'signal' that something is safe to use if it is not safe to use." But he also sees some reason to hope:

You could win the argument by coming up with a patch that changes gettimeofday to make use of the TSC in a reliable manner.

I really mean it - and it might be possible - but we have not found it yet.

Peter Zijlstra has another solution to the problem. He would like to see the kernel eventually disable RDTSC in user space. By emulating the instruction and logging all uses of it (and of the related RDTSCP), the kernel could identify the user-space programs that use it so that they could be changed:

Once we get most of userspace running fine, we can switch it to generating faults.

Of course closed source stuff will have to deal with it themselves, but who cares about that anyway ;-)
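The fault-generating half of Zijlstra's plan can already be tried on a per-thread basis: Linux's prctl(PR_SET_TSC, PR_TSC_SIGSEGV) asks the kernel to make RDTSC raise SIGSEGV for the calling thread. The sketch below forks a child that opts in and then executes RDTSC; the helper name is made up, and the emulate-and-log step Zijlstra describes is not something this prctl provides.

```c
#define _GNU_SOURCE
#include <signal.h>
#include <stdint.h>
#include <sys/prctl.h>
#include <sys/resource.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>
#if defined(__x86_64__) || defined(__i386__)
#include <x86intrin.h>
#endif

/* Returns 1 if RDTSC trapped (child killed by SIGSEGV), 0 if it ran
 * normally, -1 if the experiment could not be performed here. */
int rdtsc_traps_after_prctl(void)
{
#if (defined(__x86_64__) || defined(__i386__)) && defined(PR_SET_TSC)
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        struct rlimit no_core = { 0, 0 };
        setrlimit(RLIMIT_CORE, &no_core);      /* don't leave a core file */
        if (prctl(PR_SET_TSC, PR_TSC_SIGSEGV, 0, 0, 0) != 0)
            _exit(2);                          /* not supported here */
        volatile uint64_t t = __rdtsc();       /* expected to fault */
        (void)t;
        _exit(0);                              /* RDTSC ran normally */
    }
    int status;
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    if (WIFSIGNALED(status) && WTERMSIG(status) == SIGSEGV)
        return 1;
    if (WIFEXITED(status) && WEXITSTATUS(status) == 0)
        return 0;
    return -1;
#else
    return -1;
#endif
}
```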

Exporting the information about whether gettimeofday() is "slow" or not seems like a reasonable starting point. No patches to do that have emerged yet, but it is a fairly straightforward thing to do. Eventually, something like Gleixner's vget_tsc_raw() may also come about, though it won't satisfy those who are unhappy with the current vsyscall performance. Those applications will just have to read the TSC themselves and deal with whatever the hardware throws at them.
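In the meantime, one rough proxy on Linux for whether gettimeofday() is likely to be fast is the name of the active clocksource, which the kernel does expose in sysfs at /sys/devices/system/clocksource/clocksource0/current_clocksource. Which names actually get a user-space fast path depends on the architecture and kernel version, so the "fast" list below (tsc and kvm-clock, as on recent x86-64 kernels) is a heuristic assumption, and both function names are hypothetical.

```c
#include <stdio.h>
#include <string.h>

/* Read the active clocksource name into buf. Returns 0 on success, -1 if
 * the sysfs file is missing or unreadable (non-Linux, restricted sandbox). */
int current_clocksource(char *buf, size_t len)
{
    FILE *f = fopen("/sys/devices/system/clocksource/clocksource0/"
                    "current_clocksource", "r");
    if (!f)
        return -1;
    if (!fgets(buf, (int)len, f)) {
        fclose(f);
        return -1;
    }
    fclose(f);
    buf[strcspn(buf, "\n")] = '\0';   /* strip the trailing newline */
    return 0;
}

/* HEURISTIC ASSUMPTION: clocksources likely to be read without a syscall. */
int clocksource_probably_fast(const char *name)
{
    static const char *const fast[] = { "tsc", "kvm-clock" };
    for (size_t i = 0; i < sizeof fast / sizeof fast[0]; i++)
        if (strcmp(name, fast[i]) == 0)
            return 1;
    return 0;
}
```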


TSC useful for embedded

Posted May 20, 2010 13:49 UTC (Thu) by abatters (✭ supporter ✭, #6932)

I write code for embedded x86 systems. I use RDTSC in my code on embedded systems where testing has shown it to be reliable. I don't use it with AMD processors or boards with multiple CPU sockets, etc. Most applications have to run on a large variety of hardware, but my embedded code runs only on hardware that I personally test and verify to work. Maybe you can't use RDTSC in a reliable way in general applications on general hardware, but in more restricted circumstances it can be useful.

The trouble with the TSC

Posted May 20, 2010 17:36 UTC (Thu) by sustrik (guest, #62161)

I am using RDTSC as well. I don't need it to be reliable, but I need it to be fast. Disabling it in userspace would suck badly.

The trouble with the TSC

Posted May 21, 2010 17:04 UTC (Fri) by blitzkrieg3 (guest, #57873)

So you're cool with time going backwards?

The trouble with the TSC

Posted May 21, 2010 17:13 UTC (Fri) by sustrik (guest, #62161)

Sure. No problem with that. It's only used as a heuristic to find out whether at least some time has elapsed since the previous time measurement. I don't need the correct answer each time. The correct answer in, say, 90% of cases will do.
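A check like this is easy to get subtly wrong with unsigned arithmetic: if the counter reads lower than last time (a backward step, or a read on a CPU whose TSC lags), now - prev wraps around to an enormous value and the check answers "yes, lots of time has passed". A hypothetical helper that interprets the difference as signed instead:

```c
#include <stdint.h>

/* Has at least `min_ticks` elapsed between two raw counter readings?
 * The difference is taken modulo 2^64 and then read as signed, so a
 * genuine wraparound of the counter still counts as forward progress,
 * while a backward step gives "no" instead of a huge false "yes".
 * (Converting an out-of-range value to int64_t is implementation-defined
 * in C; GCC and Clang wrap it modulo 2^64.) */
int ticks_elapsed(uint64_t prev, uint64_t now, uint64_t min_ticks)
{
    int64_t delta = (int64_t)(now - prev);
    return delta >= 0 && (uint64_t)delta >= min_ticks;
}
```

Treating a backward step as "not elapsed" errs towards doing the periodic work later rather than earlier; whether that is the right failure mode depends on the application.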

The trouble with the TSC

Posted May 21, 2010 21:11 UTC (Fri) by foom (subscriber, #14868)

Well, if you use gettimeofday(), you'd better be cool with time going backwards too, because it happens rather often.

(For example: a hardware/kernel combination with malfunctioning timekeeping that is also running NTP; NTP will step the time once in a while because it can't keep up with the clock drift. You might say that's uncommon, but it's not so uncommon that you won't run into it!)
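For measuring intervals, the usual way to sidestep NTP steps is to use CLOCK_MONOTONIC rather than gettimeofday(): the monotonic clock cannot be set, and NTP adjustments only slew its rate rather than stepping it. It does not protect against the broken-hardware half of the scenario above. A minimal sketch, with hypothetical helper names:

```c
#define _GNU_SOURCE
#include <stdint.h>
#include <time.h>

/* Current CLOCK_MONOTONIC reading in nanoseconds. */
int64_t monotonic_ns(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (int64_t)ts.tv_sec * 1000000000 + ts.tv_nsec;
}

/* Nanoseconds elapsed since `start_ns`, a value from monotonic_ns().
 * Not affected by settimeofday() or by NTP stepping the wall clock. */
int64_t elapsed_since_ns(int64_t start_ns)
{
    return monotonic_ns() - start_ns;
}
```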

The trouble with the TSC

Posted May 20, 2010 19:35 UTC (Thu) by dmadsen (guest, #14859)

I know! Let's make a GUI with a paperclip that says "I'm sorry Dave, I'm afraid I can't do that".

What happened to the philosophy of letting the {user,programmer} decide whether he wants to shoot himself in the foot? If someone is accessing the TSC, then he also should know the pitfalls.

Isn't there a better use of kernel developer time than creating needless restrictions?

The trouble with the TSC

Posted May 21, 2010 1:32 UTC (Fri) by dlang (guest, #313)

the kernel developers are afraid that if they provide this and programs rely on it, there will be a bunch of complaints to the kernel people when it doesn't work well.

given how badly it works, they expect this to be the common case rather than the exception, so they don't want to enable something that doesn't work and will cause them problems.

The "philosophy" was dead for a long, long time already... deal with it

Posted May 21, 2010 8:30 UTC (Fri) by khim (subscriber, #9252)

The ability to shoot yourself in the foot is all well and good if you only have knowledgeable, responsive people at the keyboard. Or a massive peer-review system. When IT was young all IT people were like this (hey, if you need a week just to run the program once, you'll be careful... and ask a lot of other people to verify that you've not written crap).

But over time, as more and more people got access to computers, this approach started to fail: more and more clueless people appeared. First as users, then as developers too. And as access to the CPU became less and less expensive (hey, the CPU you have in your pocket is more powerful than what you could get for a million dollars fifty years ago), even clueful people started making mistakes (there is not enough time to carefully reread every written line a hundred times anymore). So today this approach is limited to the kernel - and maybe even this group is too big.

This is natural evolution. Compare with cars. Early models were primitive but allowed you to tinker freely - and it was easy to damage them by abuse. Today if you sell a car which blows up when you press the gas pedal too hard... well, you'll be fired - best case. Worst case - you'll go to jail.

Unix (and Linux) always had the ability to restrict the user (file permissions and quotas). Today it can restrict the program too (seccomp, SELinux, AppArmor, etc). The next step is, of course, developers.

And just like with cars: if you need a highly specialized system which will not use the same roads as regular cars (an off-road car or a rocket car), you can ignore the warnings and remove the "superfluous" checks - but this is not an option for 99% of developers out there.

The "philosophy" was dead for a long, long time already... deal with it

Posted May 22, 2010 6:38 UTC (Sat) by dmadsen (guest, #14859)

The problem I have is not necessarily with this particular instance; it's with the philosophy of restricting someone else's freedom "for their own good".

I should not have to pay because "more and more clueless people appeared". Shall we all then live to the lowest common denominator? My God, soon we'd all be using Windows Starter Edition! :-)

One way I got to be a "knowledgeable, responsive [responsible?] user" was to hurt myself doing "stupid" things. Ctrl-Alt-Backspace *should* hurt -- once.

And you know, code review isn't so bad: when I coded at the keypunch, it required more debugging than when I used the coding pad. Writing code that I (and hopefully others) reviewed not only made better code then, but taught me (and others) to write better in the future. (I would recommend reading "The Psychology of Computer Programming" by Gerald M Weinberg for more information). And if I talked to someone who'd done rm -rf incorrectly, maybe I could avoid their pain.

If it is true that natural evolution is to produce buggy code quicker, then perhaps we should resist. Maybe sometimes celerity isn't a virtue, and the modern notion of the inevitability of bugs in code isn't true.

The assumption that kernel code is somehow special and should be specially treated is wrong -- *someone* is going to depend on code you write no matter how big or small the project, and an error in that code is gonna cause *someone* some trouble. If you don't believe that, then you should not be writing code for others to use.

File permissions, quotas, etc are there to mainly stop a system user from hurting other system users. That's one of the normal policies of an OS. On the other hand, training wheels are fine for novices, but they are meant to come off.

Again, I'm not talking about removing for everyone the equivalent of safety gear in a program. What I'm talking about is deliberately engineering something so that any safety gear is [almost] impossible to remove. In normal use, I should be able to use something safely, but, in my freedom, I must be able to remove or modify it *even if you don't think I should*.

Keeping up with the car analogy, would you buy a car deliberately built so it cannot go faster than 65 MPH and attempts to circumvent that would be illegal?

I do understand -- and appreciate -- the point that they aren't only trying to protect others, but that the kernel devs don't want to hear clueless whining.

My point -- and really my main point -- is that when you operate to remove another's freedom, it had better be for more than just a whim, and more than your convenience. In this case, the fact that you have a couple of people saying "don't do it!" to me means that it shouldn't be done.

It's not as if there aren't plenty of other kernel decisions for people to whine about, you know. This is just yet another reason for those on LKML to say "RTFM. Go Away.". :-)

Look around you before answering... please...

Posted May 22, 2010 10:26 UTC (Sat) by khim (subscriber, #9252)

I should not have to pay because "more and more clueless people appeared".

You are not paying "more". If anything you pay less - old systems can be bought cheaply (unless they are very old and have reached the antique category). You want new stuff for cheap? Sorry, it's not for you - it's cheap because of economies of scale, and those are only possible because it's not made for you.

One way I got to be a "knowledgeable, responsive [responsible?] user" was to hurt myself doing "stupid" things.

You think that being a "knowledgeable, responsible user" is a worthwhile goal and worth spending time on. Most users don't think so - and since systems are created for them, the rules are adjusted for them.

Maybe sometimes celerity isn't a virtue, and the modern notion of the inevitability of bugs in code isn't true.

Bug-free software is absolutely impossible. Deal with it. You can create a tiny kernel which is mostly bug-free (and no, Linux is way too big for that), but the rest of the system will be bug-infested no matter what the programmers do.

The assumption that kernel code is somehow special and should be specially treated is wrong -- *someone* is going to depend on code you write no matter how big or small the project, and an error in that code is gonna cause *someone* some trouble.

Brilliant observation! That's why I have zero sympathy for the FreeType developers. They brought the problems on themselves by exposing private interfaces - so now they should deal with the fallout. They should have done what all sensible people do: attach hidden visibility (visibility("hidden")) to all internal functions.
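For reference, this is what hiding internal functions with GCC's visibility attribute looks like. In the sketch below (all names are made up), demo_internal_scale() can be called from other files of the same library but is not exported from the resulting shared object, so applications cannot come to depend on it. Building with -fvisibility=hidden makes hidden the default, so only symbols explicitly marked visibility("default") are exported; the explicit hidden attribute then just documents intent.

```c
/* Illustrative library source; build as a shared object with, e.g.:
 *   cc -shared -fPIC -fvisibility=hidden demo.c -o libdemo.so        */

/* Public entry point: explicitly exported. */
__attribute__((visibility("default")))
int demo_render(int x);

/* Internal helper: shared between the library's own files,
 * but absent from the .so's dynamic symbol table. */
__attribute__((visibility("hidden")))
int demo_internal_scale(int x)
{
    return x * 2;
}

int demo_render(int x)
{
    return demo_internal_scale(x) + 1;
}
```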

File permissions, quotas, etc are there to mainly stop a system user from hurting other system users. That's one of the normal policies of an OS. On the other hand, training wheels are fine for novices, but they are meant to come off.

A helmet, on the other hand, is meant to be used by everyone. Yes, some things should only be enabled in debug libraries (like _GLIBCXX_DEBUG) and some should be enabled all the time.

What I'm talking about is deliberately engineering something so that any safety gear is [almost] impossible to remove.

This is the right way to design things. Safety gear must be built robustly - or else it won't work. It can always be removed by direct modification of the program source (or even the binary if the need is really strong). The ability to turn it off easily is actively harmful.

Keeping up with the car analogy, would you buy a car deliberately built so it cannot go faster than 65 MPH and attempts to circumvent that would be illegal?

Brilliant example. Of course not! 65 MPH is not enough - there are roads where you can travel faster! Real cars' speed limiters are set to 155mph or 112mph (depending on the country). On the other hand, if you meant to say that artificial limits in cars are something unimaginable, you've utterly failed: not only are they imaginable - they're widespread in the real world!

My point -- and really my main point -- is that when you operate to remove another's freedom, it had better be at more than just a whim, and more than your convenience.

See the FreeType example above. API usage freedom is not a right, you must earn it - by showing use cases where it's needed and where nothing else really fits. See the recent discussion related to such a right: it looks like Google will get wakelocks in the end, but it was not an easy sell. And this is a good thing. Often it's much easier to just muck with system internals rather than use proper interfaces - but this leads to a Windows-like mess, where you can't change anything without breaking something.

Look around you before answering... please...

Posted May 23, 2010 5:40 UTC (Sun) by dmadsen (guest, #14859)

It is clear to me that we have rather different world views in the areas we have discussed.

This has brought home to me in a personal way the difficulties that a successful project manager must have when working with a diverse population, especially regarding cohesiveness and consistency of vision. Gentlemen, my hat's off to you!

Look around you before answering... please...

Posted May 23, 2010 13:49 UTC (Sun) by nix (subscriber, #2304)

There was no portable way to enable hidden visibility when FreeType 2.0 was released. GCC's visibility attribute is much newer. (But, yes, they should have hidden their internal interfaces *somehow*.)

I can understand SNAFU with freetype 2.1, but why persist in folly?

Posted May 23, 2010 17:54 UTC (Sun) by khim (subscriber, #9252)

You are right, of course. Freetype 2.1 was released in 2002 and gcc only got the visibility attribute in 2003. And other methods are not as nice. But we are in 2010 - and yet Freetype 2.3.12 (released just a few months ago) does not use visibility("hidden"). Not even as an option!

The "philosophy" was dead for a long, long time already... deal with it

Posted May 30, 2010 4:32 UTC (Sun) by fredi@lwn (subscriber, #65912)

I agree with most of what you said, except for one thing: you're talking about freedom, the freedom to use your hardware as you please, and kernel people taking that freedom away. Here is where I don't agree. Look, you have the sources. The fact that there's no kernel-side interface for, e.g., RDTSC doesn't limit you: in case you need it, just add another syscall and use it.

The "philosophy" was dead for a long, long time already... deal with it

Posted Jun 1, 2010 5:30 UTC (Tue) by dmadsen (guest, #14859)

I understand your point: if I don't like the code, I'm free to modify it as I please. And that is such a good thing, because it means that no matter what anyone does, if I feel strongly enough about it to spend money/time/effort, I can change it.

But I'm also talking about something a bit more subtle, which says "[don't restrict me] ** only because you think it's for my own good **". In this case, the reasoning is "let's disable it because it's not reliable and someone using it might have negative results". As a philosophy, this says "protect people from harming themselves".

My point is that to go down this path *in a general way* is a slippery slope; where do we stop in making a "safe" system, one where a {user,developer} can't hurt himself? Shall we, for example, as a policy decision, eliminate the ability to remove files from users and make it for root only?

I am aware, of course, that different functionality is directed towards users of different experience levels, and how that might affect any let's-protect-the-user-from-himself decisions.

But much learning comes from making mistakes -- and the "baby-proofing" one does in a home with toddlers is only enough so that the baby doesn't get permanently damaged before he learns.

So I re-emphasize that we must be careful not to *blindly* apply the "let's remove that function for his own good" philosophy. Each case must be thought about carefully. And even then, should the decision be made to protect the user from himself, it should be easy for that user to say "I've learned I can hurt myself, and I don't want to be coddled/restricted anymore because I've also learned how to use that sharp knife you were shielding me from". That's where freedom comes in.

And I re-state that the particular function in this article is surely not likely to be used by someone who doesn't already have the level of knowledge to understand the potential pitfalls. That is, he should not need to be protected, as he knows the stove is hot already.

The trouble with the TSC

Posted May 21, 2010 10:20 UTC (Fri) by farnz (subscriber, #17727)

There's a well-supported interface that (in theory) could be almost as fast as RDTSC for the case where the TSC is reliable - it's called gettimeofday(). It has the advantage that it can be reliable even when the TSC is not, and can exploit future timers that are even better than the TSC. And, with vsyscall support, it's all userspace, so you don't pay the price of a context switch, just of converting the TSC to seconds.

Given this, why not disable userspace support for RDTSC when it's not needed as part of a fast implementation of gettimeofday()? Applications on a custom platform can use a slightly modified kernel that always permits RDTSC (and the serializing variant RDTSCP), while applications that can't control the platform should rely on gettimeofday(). The missing bit is kernel support for telling you whether gettimeofday() is fast or slow - if it's slow, you can log this, and disable things that depend on gettimeofday() being fast.

The trouble with the TSC

Posted May 21, 2010 10:39 UTC (Fri) by tialaramex (subscriber, #21167)

The kernel also "goes out of its way" to stop you from trying to generically hook system calls. No doubt there are thousands of developers out there who think they have a brilliant use case for hooking a system call, and no doubt one of them really does - he should talk to the LKML about his idea. The rest though, were going to do something horrid and thoughtless and then everyone else was going to have to live with the resulting mess. So, you don't get to do that in a vanilla kernel.

There's a balance to be struck and I think that the difference between what people think the TSC will do (cheap, fast, reliable way to measure time) and what they get (rarely all, and often none of the above) makes the status quo acceptable. I wouldn't support actively blocking use of TSC, but it certainly makes no sense for the kernel folk to endorse it as this patch proposed.