Please look at the presentation again. The things you mention are already implemented in the compiler. The compiler was designed from the start to make adding state-of-the-art optimizations easy. In particular, I think the multi-class register allocator is one of the best available today.
Currently, the generated code is between 0% and 10% slower than the code gcc generates, and most of that gap comes from optimizations that have not been added yet. Even so, the compiler itself runs around 15 times faster than gcc.