Let's not exaggerate
Posted Jul 28, 2018 19:03 UTC (Sat) by wx (guest, #103979)
In reply to: Let's not exaggerate by Cyberax
Parent article: Deep learning and free software
> Yup.
Unless you can come up with code that reproduces such a situation, I'll have to call this out as an urban legend. I've been doing numerical optimization research (which has to be 100% reproducible) on virtually every high-end Nvidia GPU, consumer and Tesla, released since 2011. I've seen plenty of issues with Nvidia's toolchain, but not a single case of the actual hardware misbehaving like that. Correct machine code generates exactly the same results every time it runs on the same GPU with the same kernel launch parameters.
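To illustrate what "reproducible" means here, a minimal sketch (my own; the kernel and parameters are hypothetical, not from any real workload): a shared-memory tree reduction adds floating-point values in an order that depends only on the launch configuration, never on timing, so two runs with identical parameters on the same GPU should compare bitwise equal:

    #include <cstdio>
    #include <cstring>
    #include <cuda_runtime.h>

    __global__ void block_sum(const float *in, float *out, int n)
    {
        __shared__ float buf[256];
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        buf[threadIdx.x] = (i < n) ? in[i] : 0.0f;
        __syncthreads();
        // Tree reduction: the floating-point summation order is fixed by
        // blockDim.x, so reruns with the same launch parameters add the
        // same values in the same order.
        for (int s = blockDim.x / 2; s > 0; s >>= 1) {
            if (threadIdx.x < s)
                buf[threadIdx.x] += buf[threadIdx.x + s];
            __syncthreads();
        }
        if (threadIdx.x == 0)
            out[blockIdx.x] = buf[0];
    }

    int main()
    {
        const int n = 1 << 20, threads = 256, blocks = n / threads;
        float *in, *out;
        cudaMallocManaged(&in, n * sizeof(float));
        cudaMallocManaged(&out, blocks * sizeof(float));
        for (int i = 0; i < n; i++)
            in[i] = 0.1f * (i % 17);    // values whose summation order matters

        float *ref = (float *)malloc(blocks * sizeof(float));

        block_sum<<<blocks, threads>>>(in, out, n);    // first run
        cudaDeviceSynchronize();
        memcpy(ref, out, blocks * sizeof(float));

        block_sum<<<blocks, threads>>>(in, out, n);    // identical second run
        cudaDeviceSynchronize();

        printf("bitwise identical: %s\n",
               memcmp(ref, out, blocks * sizeof(float)) == 0 ? "yes" : "no");
        free(ref);
        cudaFree(in);
        cudaFree(out);
        return 0;
    }

If the hardware really misbehaved the way the parent comment suggests, a loop around this comparison would eventually print "no"; I've never seen that happen.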
I'm also not aware of any credible reports to the contrary from others. There was one vague report of a Titan V not producing reproducible results (http://www.theregister.co.uk/2018/03/21/nvidia_titan_v_re...), but that is much more likely to be caused by the microarchitecture changes in Volta: intra-warp communication now requires explicit synchronization, which can mean significant porting effort for existing code bases and is rather tricky to get right.
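To make the porting issue concrete, here is a sketch of the change (my own illustration, assuming CUDA 9 or later): pre-Volta code often wrote val += __shfl_down(val, offset) and relied on implicit warp-synchronous execution, while the Volta-safe form passes an explicit mask of participating lanes:

    #include <cstdio>
    #include <cuda_runtime.h>

    // Each warp sums 32 values. The 0xffffffff mask says all 32 lanes
    // participate; the _sync suffix makes reconvergence explicit instead
    // of assumed, which is what Volta's independent thread scheduling
    // requires.
    __global__ void warp_sum(const float *in, float *out)
    {
        float val = in[threadIdx.x];
        for (int offset = 16; offset > 0; offset /= 2)
            val += __shfl_down_sync(0xffffffffu, val, offset);
        if (threadIdx.x == 0)
            *out = val;                 // lane 0 holds the warp-wide sum
    }

    int main()
    {
        float *in, *out;
        cudaMallocManaged(&in, 32 * sizeof(float));
        cudaMallocManaged(&out, sizeof(float));
        for (int i = 0; i < 32; i++)
            in[i] = 1.0f;
        warp_sum<<<1, 32>>>(in, out);   // a single warp
        cudaDeviceSynchronize();
        printf("sum = %g\n", *out);     // expected: 32
        cudaFree(in);
        cudaFree(out);
        return 0;
    }

Getting the mask wrong, or assuming lanes have reconverged when they haven't, is exactly the sort of mistake that produces non-reproducible results on Volta with no hardware fault involved.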