The description of the 'low_latency' mode isn't very accurate unfortunately, perhaps Jon didn't have his coffee before reading over it :-). Let me attempt to rectify that.
The low_latency knob doesn't impact delays or merging. One of the key aspects of getting low latency for a series of operations (like starting your firefox while other IO is happening) is actually making sure we get the delays right. If we take the classic case of reader vs writer, the rate at which the writer dirties pages will greatly exceed the writeback rate, so we always have tons of dirty pages waiting to be written. The normal reader, however, does dependent reads that are serialized: when one read finishes, the reader will issue the next one very shortly. CFQ achieves good throughput and latency for the reader by briefly waiting for another IO from it when one has completed. In CFQ, this is called idling.
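To make the idling idea concrete, here is a toy decision function. It is only a sketch of the concept, not CFQ's code: the function name, the string return values and the 8 ms window used in the usage note are all made up for illustration.

```python
def next_dispatch(now_ms, last_sync_done_ms, idle_window_ms,
                  sync_pending, async_pending):
    """Toy model of idling: decide what the scheduler does next.

    Returns "sync", "async", "idle" (hold the device and wait), or "none".
    All names and units here are illustrative assumptions.
    """
    if sync_pending:
        # The reader already has its next dependent read queued: serve it.
        return "sync"
    if (last_sync_done_ms is not None
            and now_ms - last_sync_done_ms < idle_window_ms):
        # A sync request completed very recently. Wait briefly instead of
        # handing the device to the writer's backlog of dirty pages.
        return "idle"
    # The idle window expired without a new read: let writeback proceed.
    return "async" if async_pending else "none"
```

With an assumed 8 ms window, next_dispatch(10, 8, 8, False, True) holds the device ("idle"), while next_dispatch(20, 8, 8, False, True) gives up on the reader and lets the writer in ("async").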
The two primary changes in behaviour for CFQ in -rc3 are letting seeky IO also idle, even if the hardware does command queuing (which most hardware does these days), and limiting the damage that async IO can do while sync IO is also happening. With the 'low_latency' knob switched on, CFQ will only slowly build up a queue depth of async writes. This greatly helps reduce the impact that a writer has on system interactivity, since when we miss a sync idle window only slightly, the amount of async writeback sent to the device is limited by the time since the last sync IO.
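The "slowly build up a queue depth" part can be sketched as a small function. This is a simplified model under stated assumptions, not the kernel's implementation: the function and parameter names are hypothetical, the ramp of one extra request per elapsed sync slice is an assumed policy, and so is the rule that one write is always allowed when nothing is in flight.

```python
def async_dispatch_limit(now_ms, last_sync_end_ms, sync_slice_ms,
                         max_depth, in_flight):
    """Toy model: how many async writes may be outstanding right now.

    The allowed depth grows with the time since the last sync IO
    completed, so a writer that just barely missed a sync idle window
    cannot flood the device. All parameters are illustrative.
    """
    elapsed = now_ms - last_sync_end_ms
    # Assumed ramp: one more request allowed per sync slice elapsed.
    depth = elapsed // sync_slice_ms
    if depth == 0 and in_flight == 0:
        # Assumed safeguard: never starve writeback completely.
        depth = 1
    return min(depth, max_depth)
```

For example, with a 100 ms slice and a hardware depth of 4, a writer that shows up 50 ms after the last sync IO gets a single request in flight, at 250 ms it gets two, and only after 400 ms of sync silence does it get the full depth.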
The end result is that the desktop experience should be less impacted by background IO activity. It's also worth mentioning that the 'low_latency' setting defaults to on.
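If you want to check the setting on your own box, something like the following should work. It assumes the knob is exposed in the usual sysfs location for I/O scheduler tunables (queue/iosched/low_latency under the block device) and that the device is actually using CFQ. The helper name and the sys_block parameter are made up for this example.

```python
from pathlib import Path


def low_latency_enabled(device, sys_block="/sys/block"):
    """Return True if the scheduler's low_latency knob reads as "1".

    Assumes the tunable lives at <sys_block>/<device>/queue/iosched/low_latency.
    Raises FileNotFoundError if the file is absent, e.g. because the
    device uses a different I/O scheduler.
    """
    knob = Path(sys_block) / device / "queue" / "iosched" / "low_latency"
    return knob.read_text().strip() == "1"
```

Calling low_latency_enabled("sda") would then report whether the knob is on for sda, where "sda" is just an example device name.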