1. These languages are slow because of their interpreter loop, not because they use GC. As a rule of thumb, an interpreter evaluating an opcode stream tends to be about 10 times slower than a compiler that translates the same stream into native code.
2. Probably true. Missing from this discussion is the observation that even malloc/free can have unpredictable latency, because the allocator must maintain the free list and occasionally reorganize it to keep allocations fast. No doubt malloc technology has advanced and will keep advancing, and there are multiple implementations to pick from, each exposing different tradeoffs.
3. Python, I've been told, also has a true GC in addition to the refcounter. I think the single largest advantage of a refcounter is that it gives objects a very predictable lifecycle, often destroying them as soon as the code exits a block. This makes user code simpler to write because file handles don't need to be closed explicitly and database statement handles go away automatically. Still, this is synchronous object destruction, and deferring object destruction to user think-time would give a better user experience.
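
To make point 3 concrete, here is a minimal sketch of the difference between the refcounter and the cycle-detecting GC. It assumes CPython 3.4 or later (other Python implementations may behave differently), and the `Handle` class and the `events` list are hypothetical names of mine used only to record when finalizers run.

```python
import gc

events = []  # records the order in which finalizers run


class Handle:
    """Hypothetical stand-in for a file or statement handle."""

    def __init__(self, name):
        self.name = name
        self.partner = None

    def __del__(self):
        events.append(self.name)


def use_handle():
    h = Handle("plain")
    # On return the only reference disappears, so the refcounter
    # destroys the object immediately and synchronously.


def make_cycle():
    a = Handle("cycle-a")
    b = Handle("cycle-b")
    a.partner = b
    b.partner = a
    # On return each object is still referenced by the other, so the
    # refcount never reaches zero and nothing is destroyed yet.


gc.disable()  # keep the cyclic collector from running on its own schedule
use_handle()
after_plain = list(events)

make_cycle()
after_cycle = list(events)

gc.collect()  # the cycle detector finds and finalizes the unreachable pair
after_collect = sorted(events)  # finalization order within a cycle is not guaranteed
gc.enable()
```

The plain object is finalized the moment `use_handle` returns, which is the predictable lifecycle described above. The two objects in the cycle linger until the collector runs, so code that relies on prompt cleanup should not count on the refcounter once cycles are involved.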