You're right that there are some implementation variations. That said, leakage is getting nasty even on LP processes nowadays. Clock gating used to be sufficient, but from 40nm downward (even in 40LP, the low-power / low-leakage variant) you want power gating for long idle periods if you want good results. So the notion of race to idle will apply to most systems, and I can testify it applies to embedded SoCs very (VERY ;) far from Intel CPUs.
There will still be variations in how long an idle period has to be to justify the cost of a mode change, and in which operating points are available for DVFS (dynamic voltage and frequency scaling). But this can be handled as configuration.
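
To make the "handled as configuration" point concrete, here is a rough sketch of how the break-even idle time could be derived from per-platform numbers. Everything in it is an assumption for illustration: the `IdleState` fields, the helper names, and the power and energy figures are made up and do not describe any real SoC or kernel interface. The model is the simplest one possible (constant power in each state plus a fixed transition energy), so treat it as a way to reason about the tradeoff, not as an implementation.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class IdleState:
    """Hypothetical per-platform description of one idle state."""
    name: str
    power_mw: float       # power drawn while sitting in this state
    transition_uj: float  # energy spent entering and leaving the state
    latency_us: float     # time spent entering and leaving the state


def break_even_us(shallow: IdleState, deep: IdleState) -> float:
    """Idle duration above which `deep` costs less energy than `shallow`."""
    saved_mw = shallow.power_mw - deep.power_mw
    if saved_mw <= 0:
        return float("inf")  # the deeper state never pays off
    extra_uj = deep.transition_uj - shallow.transition_uj
    # uJ / mW gives milliseconds, so scale by 1000 to get microseconds.
    return max(deep.latency_us, 1000.0 * extra_uj / saved_mw)


def pick_state(states: Sequence[IdleState],
               expected_idle_us: float) -> Optional[IdleState]:
    """Pick the state with the lowest total energy for the expected idle time."""
    usable = [s for s in states if s.latency_us <= expected_idle_us]
    if not usable:
        return None  # idle period too short for any state transition

    def energy_uj(s: IdleState) -> float:
        # mW * us gives nJ, so divide by 1000 to get uJ.
        return s.transition_uj + s.power_mw * expected_idle_us / 1000.0

    return min(usable, key=energy_uj)


# Made-up numbers standing in for a platform configuration table.
CLOCK_GATE = IdleState("clock_gate", power_mw=4.5, transition_uj=0.0, latency_us=1.0)
POWER_GATE = IdleState("power_gate", power_mw=0.5, transition_uj=40.0, latency_us=200.0)
```

With a table like this per platform, the generic code only needs the expected idle duration; the leakier the process, the larger the gap between the two `power_mw` values and the shorter the idle period for which power gating wins.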