
CLOCK_MONOTONIC change reverted

The 4.17 merge window included a change in the behavior of the CLOCK_MONOTONIC system clock; in particular, it would advance after a resume to reflect the time that the system was suspended. At the time, the developers involved acknowledged that the change might have to be reverted if it caused regressions. It seems that some such regressions have indeed been reported, so a revert has been queued with this comment:

As reported by several folks systemd and other applications rely on the documented behaviour of CLOCK_MONOTONIC on Linux and break with the above changes. After resume daemons time out and other timeout related issues are observed.

It's sad, that these problems were neither catched in -next nor by those folks who expressed interest in this change.
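For reference, the documented distinction that the affected applications rely on can be observed from user space. The following is a minimal sketch, assuming a Linux system with glibc; the helper names `clock_seconds` and `suspended_seconds` are made up for illustration. Under the documented behaviour, CLOCK_MONOTONIC does not advance while the system is suspended and CLOCK_BOOTTIME does, so their difference approximates the total time spent suspended since boot. With the reverted change applied, the two clocks would have reported the same value.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE   /* makes CLOCK_BOOTTIME visible in <time.h> on glibc */
#endif
#include <time.h>

/* Read the given clock and return its value in seconds,
 * or -1.0 if clock_gettime() fails. */
static double clock_seconds(clockid_t id)
{
    struct timespec ts;

    if (clock_gettime(id, &ts) != 0)
        return -1.0;
    return (double)ts.tv_sec + (double)ts.tv_nsec / 1e9;
}

/* Approximate time spent suspended since boot: the clock that counts
 * suspend minus the clock that (per the documentation) does not.
 * Returns -1.0 if either clock cannot be read. */
static double suspended_seconds(void)
{
    double mono = clock_seconds(CLOCK_MONOTONIC);
    double boot = clock_seconds(CLOCK_BOOTTIME);

    if (mono < 0.0 || boot < 0.0)
        return -1.0;
    return boot - mono;
}
```

Calling `suspended_seconds()` before and after a suspend/resume cycle should show the value growing by roughly the length of the suspend on a kernel with the documented behaviour.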




CLOCK_MONOTONIC change reverted

Posted Apr 26, 2018 12:22 UTC (Thu) by judas_iscariote (guest, #47386)

Was a bug report filed at the systemd issue tracker? If not, please do so, explaining exactly what relies on the incorrect old behaviour! I'm certain that everybody wants this new, sane CLOCK_MONOTONIC behaviour.

CLOCK_MONOTONIC change reverted

Posted Apr 26, 2018 13:02 UTC (Thu) by tglx (subscriber, #31301)

Well, even if we file a bug and stuff eventually gets fixed, the change will break existing user space and probably more than just systemd and network mangler. It was expected and it might have been possible to fix this ahead of time, but the only option 4 weeks before the final 4.17 release is to revert it.

CLOCK_MONOTONIC change reverted

Posted Apr 27, 2018 14:48 UTC (Fri) by eternaleye (guest, #67051)

To be honest, what confused me most here was the approach taken to making the change. If a behavior is as useful and widely-documented as CLOCK_BOOTTIME vs. CLOCK_MONOTONIC is (and believe me, the _behavior_ distinction is very useful regardless of name), why was doing a straight renaming in one shot _ever_ considered?

When a behavior's name is suboptimal, there's a pretty well established way of fixing that - introduce a new name, deprecate the old name, and have a deprecation cycle.

1. Add CLOCK_MONOTONIC_ACTIVE as an alias for CLOCK_MONOTONIC, deprecating CLOCK_MONOTONIC (maybe add #warning)
2. Add an #ifdef that people can set to switch (deprecated) CLOCK_MONOTONIC = CLOCK_MONOTONIC_ACTIVE to (new) CLOCK_MONOTONIC = CLOCK_BOOTTIME
3. Flip the default ifdef, after waiting long enough
4. Deprecate CLOCK_BOOTTIME (maybe add #warning)
5. Remove CLOCK_BOOTTIME, after waiting long enough
6. Remove the #ifdef, after waiting long enough (this is last because previous stages _introduce_ errors or at least warnings, while this would cause build-silent misbehavior, as was seen with the current patch)

I cannot think of a reason that "just change it" was considered viable for a public, documented API and ABI.
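The first three steps of the cycle proposed above could look roughly like the following user-space sketch. It is only an illustration under stated assumptions: CLOCK_MONOTONIC_ACTIVE is the name suggested in the comment and exists in no real header, the opt-in macro `CLOCK_MONOTONIC_COUNTS_SUSPEND` is invented here, and the sketch relies on glibc defining CLOCK_MONOTONIC as a preprocessor macro so that it can be redefined.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE   /* makes CLOCK_BOOTTIME visible in <time.h> on glibc */
#endif
#include <time.h>

/* Step 1: give the existing "does not count suspend" behaviour an
 * explicit name. The enum captures the current numeric id of
 * CLOCK_MONOTONIC before the macro is touched below.
 * (Hypothetical name, taken from the proposal.) */
enum { CLOCK_MONOTONIC_ACTIVE = CLOCK_MONOTONIC };

/* Step 2: an opt-in switch (hypothetical macro) that lets a build choose
 * the new meaning, in which CLOCK_MONOTONIC also counts suspended time.
 * Step 3 would make this branch the default after a deprecation period.
 * A real header would probably also emit a deprecation warning here. */
#ifdef CLOCK_MONOTONIC_COUNTS_SUSPEND
#undef CLOCK_MONOTONIC
#define CLOCK_MONOTONIC CLOCK_BOOTTIME
#endif
```

Code that needs the old semantics would then ask for CLOCK_MONOTONIC_ACTIVE explicitly, so its behaviour would not change when the default flips.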

CLOCK_MONOTONIC change reverted

Posted Apr 26, 2018 13:32 UTC (Thu) by zuki (subscriber, #41808)

The effect was reported as "If you suspend >3min, then after a resume systemd watchdog timers all fire."
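The mechanism behind that report can be illustrated with a toy model. This is a sketch only, not systemd's actual code: the function names are hypothetical, and the 180-second timeout is an assumption drawn from the "3min" figure in the report. A watchdog that compares "now" against the last keep-alive stays quiet when the clock ignores suspend, but fires immediately after resume when the clock counts the suspended time.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical watchdog check: true if more than timeout_s seconds have
 * passed on the clock since the last keep-alive. */
static bool watchdog_expired(int64_t last_ping_s, int64_t now_s,
                             int64_t timeout_s)
{
    return now_s - last_ping_s > timeout_s;
}

/* Clock reading after a suspend/resume cycle. With counts_suspend false
 * the clock only advances while the system is awake (documented
 * CLOCK_MONOTONIC behaviour); with counts_suspend true it also advances
 * by the suspended time (behaviour of the reverted change). */
static int64_t now_after_suspend(int64_t before_s, int64_t awake_s,
                                 int64_t suspended_s, bool counts_suspend)
{
    return before_s + awake_s + (counts_suspend ? suspended_s : 0);
}
```

For example, with the last keep-alive at t=1000, five seconds awake and a ten-minute suspend, the documented clock reads 1005 and the watchdog stays quiet, while the changed clock reads 1605 and a 180-second watchdog fires.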

CLOCK_MONOTONIC change reverted

Posted May 3, 2018 5:15 UTC (Thu) by smckay (guest, #103253)

In what sense is the old behavior incorrect? It matches the documentation and is monotonic. The new behavior is monotonic but doesn't match the documentation. That sounds less correct than before, to me.

CLOCK_MONOTONIC change reverted

Posted May 14, 2018 12:29 UTC (Mon) by sourcejedi (guest, #45153)

See e.g. "Why is there a Linux kernel policy to never break user space?"

Specifically also, we really want people to be testing new upstream kernels and reporting regressions. (That's probably not a very compelling activity on its own, but there are more positive reasons people play with running upstream kernels as well. For example, hoping it fixes a regression introduced in the previous version :-).) They will be much less willing to test and report if we're deliberately breaking their systems.

Making a change which kills logind on any suspend over 3 minutes, which then kills X / gnome-shell, definitely counts as breakage :-).

AFAIK this change also breaks rtkit, for example.

Here's another error you get if you suspend the host that's running a VM (as I mentioned on the previous LWN.net article, this causes the same problem, due to a bug somewhere in qemu/KVM):

rtkit-daemon[1218]: The canary thread is apparently starving. Taking action.
rtkit-daemon[1218]: Demoting known real-time threads.
rtkit-daemon[1218]: Demoted 0 threads.


Copyright © 2018, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds