For some reason, an edit I tried to make didn't get into the just-immediate post:
Although caching hash(key) might be useful to avoid its recomputation when splitting a bucket, it is not needed at all when removing a bucket, since we know absolutely what bucket the removed-bucket's records go into. It is the bucket number (actually, M) masked by the smaller mask at the time of the merge-back and before M is decreased.
Whether or not hash(key) is cached, it is a nice value to keep as the record itself in a fast simulation looking for where splitting doesn't catch up fast enough and what the headroom needs to be on load factor, etc. OK, that's more than enough speculation from me on one day.