Actually, if the compiler were forced to consider overflow, it would just have to think a little harder before making the same optimizations. The compiler could determine that 1<<32 / GCD(1<<32, 0x04040404) > 256/64 and thus that the first time i >= 256/64, val == 0x04030200 with unsigned math, and val != 0x04030200 before that. Your loop would have to get slower (or, more likely, take an additional register) if it was going to use the same value in "val" multiple times, simply because it becomes necessary to track another piece of information.
(Also note that it doesn't matter if you declare the loop counter as an "unsigned int"; the nominal loop counter actually gets discarded entirely, in favor of a proxy loop counter, which is what could overflow.)