"Average win of 1.6% to 3.8% does not look very compelling."
Perhaps, but the average win on typical GNU/Linux systems is greater than that: GCC does not do so well on register-starved platforms. Find a comparison done with GCC and you will see that the difference is greater, because x86 is really quite register starved.
Additionally, x86_64 makes mandatory a number of micro-architectural improvements which were previously optional. For example, consider SSE2, which is not universally available on x86. So generally distributed binaries must either offer alternative functions and detect at run time, or, more commonly, avoid using those instructions. x86_64 reset the baseline and you can just assume that many of these features will be there. This difference won't show up in benchmarks like the one you cited, since it used builds targeted at the particular CPUs in use, but it does show up in the real world.
There are also a couple of other advantages of x86_64 mode which were not mentioned up-thread: much faster syscalls and improved security (no-execute pages are always available, bigger address space means that addresses can always contain null bytes, etc).
And I'm not sure why you say that ~3% by itself isn't all that compelling when the price difference between a 2.83GHz Core 2 Quad and a 3.0GHz Core 2 Quad CPU is $230 US. That step is a clock increase of about 6%, so based on that simplistic analysis we might expect 3% to be worth $115 US, while if you use a modern x86_64-painless distro and don't use non-free software the cost of 64-bit mode is pretty close to $0.
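
For anyone who wants to check the back-of-envelope numbers, here is a rough sketch of that arithmetic. It uses only the clock speeds and the price gap quoted above, and it assumes (simplistically) that performance scales linearly with clock speed; the variable and function names are made up for illustration.

```python
# Figures quoted in the comment above.
slow_ghz = 2.83
fast_ghz = 3.0
price_gap_usd = 230.0

# Relative clock increase between the two parts (roughly 6%).
clock_gain = fast_ghz / slow_ghz - 1.0

# Assumption: performance scales linearly with clock speed, so each
# percent of speedup is "worth" an equal share of the price gap.
usd_per_percent = price_gap_usd / (clock_gain * 100.0)


def implied_value(speedup_percent):
    """Dollar value of a given speedup under the linear-scaling assumption."""
    return speedup_percent * usd_per_percent


for pct in (1.6, 3.0, 3.8):
    print(f"{pct}% speedup is worth roughly ${implied_value(pct):.0f} US")
```

The 3% case lands at about $115, which is where that figure comes from; the 1.6% and 3.8% ends of the benchmark range scale the same way.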