While many programs need a 32-bit address space, many others would run fine in a 16-bit space. With 16-bit pointers and 16-bit int, data structures would be much smaller than on x64 or x32. Better yet, since each program would address at most 64 KiB, the complete data sets of dozens of programs would fit in L3 cache. A machine with no RAM at all, just a CPU, would be useful in many applications where the need for separate RAM and a RAM controller adds prohibitive expense.
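To make the size argument concrete, here is a minimal sketch in C. No x16 ABI exists, so the 16-bit "pointer" is emulated as an index into an arena; the names `node_wide`, `node16`, `NIL16` and `sum_list16` are made up for illustration. On a typical x86-64 ABI the conventional node occupies 16 bytes after padding, while the 16-bit node occupies 4.

```c
#include <assert.h>
#include <stdint.h>

/* Conventional list node: on a typical 64-bit ABI this is an 8-byte
 * pointer plus a 4-byte int, padded to 16 bytes. */
struct node_wide {
    struct node_wide *next;
    int value;
};

/* Hypothetical "x16-style" node: a 16-bit index stands in for the
 * pointer and the payload is a 16-bit int. */
struct node16 {
    uint16_t next;   /* index of the next node in the arena */
    int16_t  value;
};

#define NIL16 UINT16_MAX   /* assumed end-of-list marker */

static_assert(sizeof(struct node16) == 4,
              "two 16-bit fields pack into 4 bytes");

/* Sum the values of a list by following 16-bit links inside `arena`. */
long sum_list16(const struct node16 *arena, uint16_t head)
{
    long total = 0;
    for (uint16_t i = head; i != NIL16; i = arena[i].next)
        total += arena[i].value;
    return total;
}
```

An arena indexed this way holds at most 65535 usable nodes (one index is reserved as the terminator), which is the kind of limit a 16-bit address space would impose anyway.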
Carefully designed, an x16 mode would enable four-slice SIMD programming on commodity hardware, using a quarter of each 64-bit register for each slice. Indeed, x32 could run with two slices. A kernel confined to 4G is little inconvenienced, but a kernel that can do twice as many operations in many of its cycles may be noticeably faster. GCC already generates code for Itanic; could sliced x32 really be so difficult to add?
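The slicing idea can be approximated in software today with the well-known "SIMD within a register" trick. The sketch below is plain portable C, not the proposed hardware mode, and the function name `add4x16` is made up for illustration. It adds four independent 16-bit slices packed into one 64-bit word, with each slice wrapping on overflow without disturbing its neighbours.

```c
#include <assert.h>
#include <stdint.h>

/* Add four independent 16-bit lanes packed into one 64-bit word.
 * Each lane wraps modulo 2^16; no carry leaks into the next lane. */
uint64_t add4x16(uint64_t a, uint64_t b)
{
    const uint64_t HIGH = 0x8000800080008000ULL; /* top bit of each lane */

    /* Add the low 15 bits of every lane. The largest per-lane result is
     * 0x7FFF + 0x7FFF = 0xFFFE, so nothing crosses a lane boundary. */
    uint64_t low = (a & ~HIGH) + (b & ~HIGH);

    /* The top bit of each lane is a_top XOR b_top XOR carry-in, and the
     * carry-in already sits in the top bit of `low`. */
    return low ^ ((a ^ b) & HIGH);
}
```

For example, `add4x16(0x0001FFFF80001234, 0x0001000180000001)` yields `0x0002000000001235`: the two middle lanes overflow and wrap to zero while the outer lanes are unaffected. A hardware mode of the kind proposed here would presumably spare the masking and do this in a single instruction.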