"Itanic" indeed, that's exactly my point about "no clue". You correctly identify that there's a transition process, and the Itanium made no allowance for such a process, it seemed as though Intel hadn't even considered that the software then running on their x86 line would be expected to somehow transition onto this weird new architecture.
The transition process doesn't have to take decades; I think you've got that part wrong. The degree of abstraction from the hardware is much greater this time, meaning you don't have all-or-nothing transitions. On OS X, apparently even drivers can opt out of knowing about 64-bit for a while. With RAM being so cheap, most general-purpose computing devices will quickly be 64-bit capable for practical reasons, and once the CPU is 64-bit you will have some applications that make use of that. You no longer have "32-bit only" as an option, so if you want the minimum number of platforms to support, "64-bit only" becomes the sensible choice. I think this creates a situation in which a "siphon effect" occurs, taking people from "well, there is one 64-bit program I want to run" to a 64-bit-only system in one upgrade cycle. I think the limiting factor will be ISVs, particularly on Windows, who loathe having to change /anything/. They were still shipping 16-bit installer programs that hadn't been maintained in almost a decade, right up until Microsoft banned such nonsense from their co-branding programme.
But then maybe I'm just counting very differently to you. I would not have said that the transition to 32-bit ended in 2001; I'd say that happened in the mid 1990s. There was a lot of mess inside Windows 95, but new development of 16-bit software had ceased, and PC video games (which you might think of as 16-bit because they still ran under DOS for a few years after Win 95) were actually using DOS extenders to escape 16-bit DOS and run 32-bit code. When the Pentium Pro flopped it was just barely mis-timed: its 16-bit performance was poor, and the last few major applications with performance-critical 16-bit code were just dying off. The Pentium II, also with fairly bad 16-bit performance, went into the market just fine.
Now, how about I show you how to burn lots of RAM on a web page without trying. Let's write a blog entry about a TV show we saw last night. First, we inline the show itself, compressed of course, but still easily 2GB of data. Then we add our commentary track, synchronised to the original, and then a series of images illustrating particular points, which people can click on to zoom to their full HD resolution. Today it would be considered anti-social to embed a 2GB video on my hypothetical blog, but ten years ago it would have been anti-social to put a 2MB image on there. In 2013 a 2GB video may just be the obvious way to illustrate what I'm talking about.
Of course very little of this strictly has to be in RAM at any one moment; it's just that the difference in user experience between a program desperately trying to keep all of this squeezed into 100MiB and one that doesn't worry about it and lets the OS set cache policy is very noticeable. When you seek back a few seconds and a little beach ball appears and the disk goes crazy for 15 seconds, that makes you desperately want to avoid ever seeking again; whereas if it happens instantly, you now have a useful feature. If scrolling to a different image causes the whole machine to have a heart attack while it goes off to load the image back into RAM, again you find yourself not wanting to scroll any more. This is the same as Linus' observation about git - faster isn't just a quantitative thing.
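For what it's worth, "letting the OS set cache policy" is roughly what you get from mmap(). A minimal POSIX sketch in C, just to make the idea concrete (error handling trimmed to the essentials):

    #include <fcntl.h>
    #include <stdio.h>
    #include <sys/mman.h>
    #include <sys/stat.h>
    #include <unistd.h>

    int main(int argc, char **argv)
    {
        if (argc != 2) {
            fprintf(stderr, "usage: %s FILE\n", argv[0]);
            return 1;
        }

        int fd = open(argv[1], O_RDONLY);
        if (fd < 0) { perror("open"); return 1; }

        struct stat st;
        if (fstat(fd, &st) < 0) { perror("fstat"); return 1; }

        /* Map the whole file and let the kernel's page cache decide
           what stays resident. On a 32-bit system a 2GB file barely
           fits in the address space, if it fits at all; with 64-bit
           pointers the mapping is trivial. */
        const unsigned char *data =
            mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
        if (data == MAP_FAILED) { perror("mmap"); return 1; }

        /* "Seeking" is now just pointer arithmetic; pages you touched
           recently are still cached, so seeking back is instant. */
        unsigned long sum = 0;
        for (off_t i = 0; i < st.st_size; i += 4096)
            sum += data[i];
        printf("touched %lld bytes, checksum %lu\n",
               (long long)st.st_size, sum);

        munmap((void *)data, (size_t)st.st_size);
        close(fd);
        return 0;
    }

The point is that the program never decides what to evict; the OS already has a cache policy, and it sees the whole system, not just your process.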
Now, as to the more general case, with the clock. You're right - in principle there are programs running on this laptop in front of me that could be 16-bit, or even 8-bit, programs. They don't do very much; they don't manipulate huge files or anything. But in practice it's much easier to just have one platform, a platform that's big enough for all the programs you run, and I argue that today, and certainly tomorrow, that platform is 64-bit.
Finally, the pointers question. This is a design issue. There's a common Unix design where pointers are used as opaque handles, so an "image handle" or a "message handle" or whatever is just a pointer. Fast, but not necessarily space-efficient. If your program typically has a few individual handles being passed around, that's a good pragmatic decision, but if you keep huge structures filled with handles then you may need to re-evaluate when porting to 64-bit: could the handles be indexes into an array or vector instead? The "indexing" style of handle is more common on Windows, but each kind is found on both systems. With a 32-bit index instead of an address-sized pointer you move the same amount of data (and thus get more or less the same performance) regardless of 64-bit vs 32-bit code. That's just one illustration, but hopefully it's helpful. Programs that absolutely /must have/ large numbers of real pointers or other address-sized things are fairly rare.
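To make that concrete, here's a minimal sketch in C of the two handle styles (the names are hypothetical, not any particular library's API):

    #include <stdint.h>

    /* Pointer-style handle: opaque and fast to follow, but it grows
       from 4 to 8 bytes the moment you compile for 64-bit, and so does
       every structure that stores one. */
    typedef struct image *image_ptr;

    /* Index-style handle: a 32-bit index into a table owned by the
       library. It stays 4 bytes on either architecture, so a structure
       packed with thousands of them has the same memory footprint and
       cache behaviour in 32-bit and 64-bit builds. */
    typedef uint32_t image_idx;

    struct image { int width, height; /* ... */ };

    static struct image *image_table;   /* grown by the library as needed */

    static inline struct image *image_lookup(image_idx i)
    {
        return &image_table[i];   /* one extra add over a raw pointer */
    }

If you hold a handful of handles, the pointer style wins on simplicity; if you hold a million of them in one big structure, the index style keeps your 64-bit port from doubling that structure's size.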