You're forgetting the memory overhead (including cache misses); not everything is pure CPU cycles. Besides that, this adds a cost to most memory operations, even when it turns out it was never necessary. Why slow down the common case to speed up special cases in rare situations?
I don't have time to look further into the details, at least not this month. Hopefully next month.