It might be sufficient to just disable the heuristic when the thread making the syscall has real-time priority. I don't know if that's the perfect answer, but I think it would at least be an incremental improvement.
I don't much like the idea of making this an undocumented difference between timing syscalls (as someone else suggested), so that ppoll gives you one behavior and timerfd another. I don't see why timerfd should be useful only to apps that need precision and don't care about power! The behavior should really be uniform across ppoll/poll/pselect/select, epoll, timerfd, nanosleep, POSIX timers, and interval timers. (I'm not sure whether all of those use hrtimers yet; I know nanosleep and POSIX timers do in -rt.)
For timerfd in particular, one could add a timerfd_setslack call without breaking compatibility. It might be possible for some of those other APIs as well.
Posted Sep 5, 2008 5:24 UTC (Fri) by arjan (subscriber, #36785)
The realtime case is already handled in my current codebase. In summary, what I do right now is:
- if realtime, slack is 0
- if niced, slack is 0.5% of the timeout, capped at 100 msec
- if not niced, slack is 0.1% of the timeout, capped at 100 msec
- if not realtime and the computed slack is less than the per-thread setting, use the per-thread setting