To find identical pages by comparing their hashes, first you have to calculate these hashes, reading full contents of each page into CPU registers (and maybe even evicting all useful CPU cache). Your memory traffic is 100% of your used pages, every time.
Contrary, when you compare pages by their contents right from the start, in most cases you could find differing pages quite quickly, by reading only their first few bytes. You have to read two whole pages only to find out that they are identical. And this is done without mathematical overhead of hash calculation, remember. So your memory traffic is 100% only for those pages that happen to be identical. For pages that are different pairwise, you usually have to read only a small part of them to find that. With random contents of the pages, you stop comparing them on the first byte in 255/256 cases. On the fourth byte (32-bit word), in (2^32-1)/2^32 cases. In Linux case, there could exist differing pages with longer common prefixes, but I think this prefix will be shorter than half of page length often enough. So for differing memory pages you most probably don't get 100% of memory traffic anyway. I clearly see why simplified approach of KSM could have its benefits.