By the way, I suspect that Python's human factors are so good because its immediate, strong ancestor ABC was subjected to iterated user testing. Imagine! A tool intended for use by users actually being *tested* on users and then changed in response to what was observed.
If I were the emperor of the programming languages research universe, I would summarily reject all papers that didn't come with studies on real humans, which would be all of them.