I think the leveling confusion in the diagram starts with the exit from the "wait for QS" state.
"CPU passes through QS" and "CPU offline" are treated as singleton events, each concerning one CPU, whereas "GP too long" is treated as a bulk event that addresses all blocking CPUs at once.
This, and the explanation of how the check of the dynticks state is initiated, clarified for me (indeed corrected a misconception I had) that the dynticks counters are not captured at the beginning of each GP, but only after a "time out" (a reasonable optimization for a low-occurrence event).
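To make that reading concrete, here is a minimal sketch in plain user-space C, not the kernel's actual code. The names (cpu_state, start_gp, force_qs_pass) are hypothetical, and the convention that an even counter value means the CPU is in dyntick-idle mode is my assumption about the design under discussion. The point is only that nothing is captured when the GP starts; the snapshot happens on the first pass after the GP has run too long, and that pass covers all holdout CPUs in bulk.

```c
#include <stdbool.h>

/* Hypothetical per-CPU bookkeeping, for illustration only. */
struct cpu_state {
    unsigned long dynticks;  /* bumped on each idle entry/exit; even = idle (assumed) */
    unsigned long snap;      /* snapshot of dynticks, meaningful only if snap_valid */
    bool snap_valid;
    bool qs_seen;            /* a quiescent state has been credited to this CPU */
};

/* Start of a grace period: note that no dynticks value is captured here. */
static void start_gp(struct cpu_state *cpus, int n)
{
    for (int i = 0; i < n; i++) {
        cpus[i].snap_valid = false;
        cpus[i].qs_seen = false;
    }
}

/*
 * Called only once the GP has run too long. One bulk pass over every CPU
 * still blocking the GP. The first pass snapshots the counter; later passes
 * compare against that snapshot. Returns the number of remaining holdouts.
 */
static int force_qs_pass(struct cpu_state *cpus, int n)
{
    int holdouts = 0;

    for (int i = 0; i < n; i++) {
        struct cpu_state *c = &cpus[i];

        if (c->qs_seen)
            continue;                    /* already reported on its own */
        if (!c->snap_valid) {
            c->snap = c->dynticks;       /* lazy capture, after the timeout */
            c->snap_valid = true;
            if ((c->snap & 1UL) == 0)
                c->qs_seen = true;       /* idle at snapshot time */
        } else if (c->dynticks != c->snap) {
            c->qs_seen = true;           /* passed through idle since snapshot */
        }
        if (!c->qs_seen)
            holdouts++;
    }
    return holdouts;
}
```

A CPU that reports its own quiescent state before the timeout never has its counter snapshotted at all, which is where the saving comes from.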
Speaking of the time out:
in "Detect a Too-Long Grace Period", the timeframe in "record_gp_stall_check_time() function records the time and also a timestamp set three seconds into the future." seemed excessive. I believe jiffies was meant (not seconds), which would be consistent with the later reference "A two-jiffies offset helps ensure that CPUs report on themselves when possible".
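To show how far apart the two readings are, here is a small user-space sketch of a deadline recorded in jiffies. It is not the kernel's code: HZ = 1000 is an assumed tick rate, and stall_state, record_stall_check_time, and gp_too_long are hypothetical names. The comparison follows the wraparound-safe style of the kernel's time_after() macro.

```c
#include <stdbool.h>

#define HZ 1000UL  /* assumed ticks per second; the real value is a kernel config choice */

/* Wraparound-safe "a is later than b", in the style of the kernel's time_after(). */
static bool after(unsigned long a, unsigned long b)
{
    return (long)(b - a) < 0;
}

/* Hypothetical stand-in for the state the article describes. */
struct stall_state {
    unsigned long gp_start;        /* jiffies value when the GP began */
    unsigned long stall_deadline;  /* jiffies value after which the GP is "too long" */
};

/* Record the start time and a deadline delay_jiffies into the future. */
static void record_stall_check_time(struct stall_state *s, unsigned long now,
                                    unsigned long delay_jiffies)
{
    s->gp_start = now;
    s->stall_deadline = now + delay_jiffies;
}

/* True once the current jiffies value has passed the recorded deadline. */
static bool gp_too_long(const struct stall_state *s, unsigned long now)
{
    return after(now, s->stall_deadline);
}
```

Passing 3 as the delay gives the "three jiffies" reading, while passing 3 * HZ gives the "three seconds" reading; under the assumed HZ of 1000 the two deadlines are 3 and 3000 ticks away respectively.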