This seems a lot better than what we have now, but there still seems to be room for improvement.
Dirty throttling should be mostly independent of memory pressure. If you only start throttling IO once you come under memory pressure, the damage may already be done. Throttling should instead kick in whenever the rate of dirtying exceeds the rate of writeout; that automatically finds the best buffer size for that particular IO pattern and device. The tricky part is measuring the IO speed. If you throttle per task, you have to measure the IO speed per task too, because the spread between the maximum, minimum, and average IO speeds is just too great for a global figure to be useful.
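Roughly what I have in mind, as a user-space sketch (all the names and the smoothing factor are made up; this is not the kernel's actual balance_dirty_pages()):

    #include <stdint.h>

    /* Hypothetical per-task bookkeeping. */
    struct task_io_state {
        uint64_t dirty_rate;   /* smoothed bytes dirtied per second */
        uint64_t write_rate;   /* smoothed bytes written back per second */
    };

    /* Exponential smoothing absorbs the big spread between samples. */
    static uint64_t smooth(uint64_t old_rate, uint64_t sample)
    {
        return (7 * old_rate + sample) / 8;
    }

    /* Feed in the byte counts from one measurement window. */
    static void update_rates(struct task_io_state *t, uint64_t dirtied,
                             uint64_t written, uint64_t window_us)
    {
        t->dirty_rate = smooth(t->dirty_rate, dirtied * 1000000ULL / window_us);
        t->write_rate = smooth(t->write_rate, written * 1000000ULL / window_us);
    }

    /* How long the dirtying task should pause: zero while writeout keeps
     * up, otherwise long enough for writeback to absorb this chunk. */
    static uint64_t throttle_delay_us(const struct task_io_state *t,
                                      uint64_t bytes_dirtied)
    {
        if (t->write_rate == 0 || t->dirty_rate <= t->write_rate)
            return 0;
        return bytes_dirtied * 1000000ULL / t->write_rate;
    }

The point is that no tunable buffer size appears anywhere: the delay falls out of the two measured rates.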
One reason not to throttle is to cache dirty memory in the hope that it will soon be rewritten or removed, so that less is written overall. Another is when something wants to use the just-written data again immediately. And things like laptop mode may deliberately delay writes even further. How much to cache does depend on memory pressure, but extra caching should be the exception, not the default algorithm.
Another concern is latency, mostly for unrelated read IOs.
For rotating disks it's most efficient to give them as many writes as possible, to fill up their write buffers and reduce the seek cost. Even then, 100MB is a tad excessive, especially on a system with many disks.
SSDs need much less outstanding write data to stay saturated. Even the crappiest ones should reach close to maximum throughput with a couple of MBs outstanding. More importantly, this figure is independent of the speed of the SSD: faster SSDs don't need more data in flight. So the "1 second of work" rule of thumb is a bit flawed.
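As a back-of-envelope illustration (the constants are assumptions, not measurements):

    #include <stdint.h>

    /* Rotating disk: the useful backlog arguably scales with throughput,
     * to amortise seeks, but a full second of work (100MB at 100MB/s) is
     * far more than the drive's write buffer can exploit. */
    static uint64_t hdd_outstanding_bytes(uint64_t throughput_bps)
    {
        return throughput_bps / 10;    /* say, ~100ms of work */
    }

    /* SSD: no seeks to amortise, so a small constant keeps even a fast
     * drive busy; the figure does not grow with device speed. */
    static uint64_t ssd_outstanding_bytes(void)
    {
        return 2 * 1024 * 1024;    /* a couple of MB */
    }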
Also, the effective disk throughput depends on how many read IOs are happening at the same time, so I think something more dynamic is needed than a handful of arbitrary thresholds.
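Something adaptive along these lines might work better (purely a sketch, with made-up names and factors): watch the effective write throughput, shrink the allowed dirty backlog when it drops because competing reads are stealing bandwidth, and grow it back when writes flow freely again.

    #include <stdint.h>

    struct writeback_target {
        uint64_t target_bytes;            /* allowed dirty backlog */
        uint64_t min_bytes, max_bytes;    /* sanity bounds */
    };

    /* measured_bps: current effective write throughput;
     * idle_bps: throughput observed with no competing reads. */
    static void adjust_target(struct writeback_target *w,
                              uint64_t measured_bps, uint64_t idle_bps)
    {
        if (measured_bps < idle_bps - idle_bps / 4)
            w->target_bytes -= w->target_bytes / 8;    /* reads suffering: back off */
        else
            w->target_bytes += w->target_bytes / 8;    /* writes flow: grow back */

        if (w->target_bytes < w->min_bytes)
            w->target_bytes = w->min_bytes;
        if (w->target_bytes > w->max_bytes)
            w->target_bytes = w->max_bytes;
    }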
All in all, this is a big step in the right direction, so I hope it gets merged soon.