I may be missing something, but restoring the *whole* cache at context switch
will not magically be faster than restoring *all* individual cachelines
on first (re)use. So you are essentially trading a huge context switch
delay for fewer initial cache misses. Overall this should at best perform
equally (if each restored cacheline is used at least once), and typically
worse (if there are cachelines that won't be used again) than the normal,
lazy strategy.
Unless there is some win from doing bulk memory transfers. But individual
cachelines should be wide enough to capture this win as well.
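
To make the argument concrete, here is a rough back-of-the-envelope model.
It is only a sketch under simplifying assumptions: every cacheline costs the
same to fetch, nothing overlaps with useful work, and the names and the
numbers (saved_lines, reused_lines, line_cost, bulk_factor) are made up for
illustration, not measured on any real hardware.

```python
def eager_cost(saved_lines, line_cost=1.0, bulk_factor=1.0):
    """Cost of restoring every saved cacheline at context-switch time.

    bulk_factor < 1.0 models a hypothetical discount for bulk transfers;
    1.0 means a bulk restore is no cheaper per line than a single miss.
    """
    return saved_lines * line_cost * bulk_factor


def lazy_cost(reused_lines, line_cost=1.0):
    """Cost of one miss per cacheline that is actually touched again."""
    return reused_lines * line_cost


def eager_wins(saved_lines, reused_lines, line_cost=1.0, bulk_factor=1.0):
    """True only if the eager restore is strictly cheaper than lazy refill."""
    return eager_cost(saved_lines, line_cost, bulk_factor) < lazy_cost(
        reused_lines, line_cost
    )


# Every saved line is reused: the two strategies tie.
print(eager_wins(512, 512))
# Half of the saved lines are never touched again: eager is worse.
print(eager_wins(512, 256))
# Eager only comes out ahead if bulk transfers are assumed to be cheaper.
print(eager_wins(512, 512, bulk_factor=0.5))
```

In this model the eager restore can only win when bulk_factor is below
reused_lines / saved_lines, which is the "unless" case above. Whether real
hardware offers such a discount beyond what a single cacheline fill already
gets is exactly the open question.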