The memcpy() routine is used in a wide variety of places, sometimes implicitly, to copy small amounts of data. For example, the compiler may insert implicit memcpy() calls whenever an initialized array is allocated on the stack. For copies of only a few bytes, a test+branch on the order of the operands may take just as long as copying the data itself. Ergo, adding even a small amount of additional error-checking to memcpy() may have a significant impact on performance.
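To make the cost concrete, here is a rough sketch of what a direction-aware copy has to do before it moves a single byte. The name copy_either_direction() is made up for illustration, and this is not how glibc implements anything; real implementations are far more heavily optimized.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not glibc code: a byte-wise copy that tolerates
 * overlap by choosing the copy direction from the order of the operands. */
static void *copy_either_direction(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    /* The extra test+branch: compare the addresses as integers. */
    if ((uintptr_t)d <= (uintptr_t)s) {
        /* Destination starts at or below the source: copy forwards. */
        for (size_t i = 0; i < n; i++)
            d[i] = s[i];
    } else {
        /* Destination starts above the source: copy backwards so that
         * source bytes are read before they are overwritten. */
        for (size_t i = n; i > 0; i--)
            d[i - 1] = s[i - 1];
    }
    return dst;
}
```

For a four-byte copy, the comparison and branch are a noticeable fraction of the total work, which is the whole point.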
The restrictions on memcpy() are hardly unique; *most* APIs do not tolerate overlapping memory regions. The memmove() routine is an exception. If you want a nice "safe" way to copy some data between buffers which may or may not overlap, and don't care so much about performance, just use memmove() everywhere.
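As a small illustration of a case where the regions really do overlap, consider deleting a range from the middle of a buffer by sliding the tail down. The helper erase_range() is hypothetical, just a sketch of the pattern:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: remove `count` bytes starting at `pos` from a
 * buffer holding `len` bytes, and return the new length. */
static size_t erase_range(char *buf, size_t len, size_t pos, size_t count)
{
    if (pos > len)
        return len;               /* nothing to erase */
    if (count > len - pos)
        count = len - pos;        /* clamp to the end of the buffer */

    /* Source and destination overlap whenever the tail is longer than
     * the gap being closed, so memmove() is required here; memcpy()
     * would be undefined behavior in that case. */
    memmove(buf + pos, buf + pos + count, len - pos - count);
    return len - count;
}
```

If you cannot prove the two regions are disjoint, this is the call to use.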
While forward compatibility is a good thing in general, it is unreasonable for API developers to feel bound to support obvious *misuse* of their APIs that directly contradicts the explicit API documentation, and that is exactly what is happening here. Given that any broken application can be trivially worked around with an LD_PRELOAD shim, I see no reason not to permit this change to the internal implementation of memcpy() in glibc.
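For reference, a minimal sketch of such an LD_PRELOAD workaround might look like the following. The file and library names are my own placeholders, and I am assuming a typical GCC/glibc setup; symbol versioning, fortified headers, or static linking could all change the details.

```c
#include <stddef.h>
#include <string.h>

/* Interpose memcpy() so that every call resolved through the dynamic
 * linker is forwarded to memmove(), which tolerates overlapping regions.
 *
 * Assumed build and use (names are placeholders):
 *   gcc -O2 -fno-builtin -fPIC -shared -o libmemcpy_shim.so memcpy_shim.c
 *   LD_PRELOAD=./libmemcpy_shim.so ./broken_app
 */
void *memcpy(void *dst, const void *src, size_t n)
{
    return memmove(dst, src, n);
}
```

Note that this only affects calls that actually go through the dynamic symbol; copies the compiler has already inlined into the application are not touched.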