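I think the author missed one important argument for x32 mode: position-independent code (PIC), commonly used in shared libraries, is quite slow in 32-bit x86 mode. See slide 3 of http://linuxplumbersconf.net/2011/ocw//system/presentatio... The claim there is that on 32-bit x86 the performance penalty of PIC is more than 20%. The reason is that i386 has no instruction-pointer-relative data addressing, so PIC code has to compute the address of the global offset table (GOT) at run time and tie up one of only eight general-purpose registers to hold it; x32 uses the x86-64 instruction set, which has RIP-relative addressing and sixteen general-purpose registers. As most of an application's code comes from shared libraries, this affects application performance considerably.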
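The cost on 32-bit x86 comes from how PIC reaches global data. Below is a minimal C sketch of the kind of code a shared library contains: a function that updates a global variable. The file name `pic_demo.c`, the global `counter` and the function `next_id` are hypothetical, chosen only for illustration. The comments describe in general terms what GCC emits under `-fPIC`; exact instructions vary by compiler version, so check the generated assembly with `-S` rather than relying on them.

```c
/* pic_demo.c: hypothetical example of code built into a shared library.
   Build it as PIC for each ABI and compare the generated assembly, e.g.:
     gcc -O2 -fPIC -S -m32  pic_demo.c -o pic_demo-i386.s
     gcc -O2 -fPIC -S -mx32 pic_demo.c -o pic_demo-x32.s
     gcc -O2 -fPIC -S -m64  pic_demo.c -o pic_demo-x86_64.s */

/* A global with default visibility: in a shared library it may be
   interposed by another module, so PIC code must reach it via the GOT. */
int counter = 0;

int next_id(void)
{
    /* i386 (-m32): there is no EIP-relative data addressing, so the
       function first computes the GOT address with a call/pop sequence
       (recent GCC calls a __x86.get_pc_thunk.* helper), keeps it in one
       of the eight general-purpose registers, and only then loads
       &counter from the GOT.
       x32 (-mx32) and x86-64 (-m64): a single RIP-relative load of
       counter's GOT entry; no register is tied up holding the GOT base. */
    return ++counter;
}
```

Comparing the three `.s` files shows the thunk call and GOT setup only in the i386 version. How much that costs in a real program depends on the workload; the 20% figure comes from the cited slides, not from this sketch.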
Embedded systems with an Intel Atom that currently run only x86 code might benefit most from x32, as they are unlikely to have an AMD64 user space. An x32 kernel might be an excellent idea for this class of systems.