The article partly answers your question:
> How does the kernel decide how long to wait for incoming packets before
> merging them? It turns out that there is no real need for any special
> waiting code: the NAPI API already has the driver polling for new packets
> occasionally and processing them in batches. GRO can simply be performed
> at NAPI poll time.
In a low-throughput setting, the kernel uses the normal interrupt-driven networking mode. Individual packets are processed as quickly as they come in over the wire.
Only when the CPU is too busy to keep up with the interrupt load does NAPI switch to polling mode. At that point, without NAPI, the CPU would already be thrashing: spending time receiving packets that it has no time to process. GRO merely increases the throughput that can be handled in polling mode.
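To make "GRO at poll time" concrete, here is a toy model in Python. It is not kernel code: `poll_once`, the `(flow, size)` tuples, and the `budget` parameter are all made up for illustration, and the merge rule (only back-to-back packets of the same flow) is much simpler than what real GRO does.

```python
from collections import deque

def poll_once(rx_ring, budget):
    """Drain up to `budget` packets from the ring in one poll.

    Packets are hypothetical (flow_id, size_in_bytes) tuples. Back-to-back
    packets of the same flow are merged into a single larger entry, which is
    a crude stand-in for what GRO does during a poll.
    """
    merged = []  # entries are [flow_id, total_bytes]
    taken = 0
    while rx_ring and taken < budget:
        flow, size = rx_ring.popleft()
        taken += 1
        if merged and merged[-1][0] == flow:
            merged[-1][1] += size  # coalesce with the previous segment
        else:
            merged.append([flow, size])
    return taken, merged

# Four packets are waiting, but this poll may only take three of them.
ring = deque([("A", 1500), ("A", 1500), ("B", 1500), ("A", 1500)])
taken, merged = poll_once(ring, budget=3)
print(taken, merged, len(ring))
```

Nothing here waits for more packets to arrive: the merge only ever sees what is already queued when the poll runs, which is the point the article makes.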
The number of packets handled in each polling cycle can be changed through `/proc/sys/net/core/netdev_budget` (the default of 300 is quite modest, IMHO).
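If you would rather check or change the budget from a script, something like the following should work. It is only a sketch: the helper names `read_budget` and `write_budget` are my own, the file holds a single integer, and writing to it requires root.

```python
from pathlib import Path

# The sysctl file mentioned above; it only exists on Linux.
NETDEV_BUDGET = Path("/proc/sys/net/core/netdev_budget")

def read_budget(path=NETDEV_BUDGET):
    """Return the current budget as an int."""
    return int(Path(path).read_text().strip())

def write_budget(value, path=NETDEV_BUDGET):
    """Set a new budget; on the real file this needs root privileges."""
    Path(path).write_text(f"{int(value)}\n")

if __name__ == "__main__" and NETDEV_BUDGET.exists():
    print("current netdev_budget:", read_budget())
```

A change made this way does not survive a reboot; to make it permanent, set `net.core.netdev_budget` in your sysctl configuration instead.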