1) Less I$ pollution. You won't see this in a memcpy() microbenchmark, where the copy loop stays hot in the instruction cache, but what about a more realistic workload?
2) Give CPU makers some incentive to optimize the simple rep movs instead of requiring ever fancier unrolled loops written in the latest instruction set extension. :)
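
To make the code-size point concrete, here is a rough sketch of what a rep movsb based copy can look like. The helper name copy_rep_movsb is made up for illustration, the inline asm assumes GCC or Clang on x86-64 with the direction flag clear (as the usual ABIs guarantee), and other targets just fall back to memcpy(). It says nothing about speed on any particular CPU; it only shows that the copy itself is a single short instruction rather than a large unrolled loop.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not taken from any library: copy n bytes with a
   single rep movsb on x86-64, or with memcpy() on any other target. */
static void *copy_rep_movsb(void *dst, const void *src, size_t n)
{
#if defined(__x86_64__) && (defined(__GNUC__) || defined(__clang__))
    void *d = dst;
    /* rep movsb copies RCX bytes from [RSI] to [RDI], advancing all three
       registers. Assumes the direction flag is clear. */
    __asm__ volatile("rep movsb"
                     : "+D"(d), "+S"(src), "+c"(n)
                     :
                     : "memory");
    return dst;
#else
    return memcpy(dst, src, n);
#endif
}
```

Whether this beats a tuned SIMD loop depends heavily on the microarchitecture and the copy size, so treat it as an illustration of the footprint argument only.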