Let's not exaggerate
Posted Jul 26, 2018 14:26 UTC (Thu) by gpu (guest, #125963)
In reply to: Let's not exaggerate by excors
Parent article: Deep learning and free software
1. Non-synchronized GPU atomics (https://docs.nvidia.com/cuda/cuda-c-programming-guide/ind...) — see the sketch after this list
2. Non-deterministic numerical algorithms, e.g. http://papers.nips.cc/paper/4390-hogwild-a-lock-free-appr... (though this particular example is CPU-specific)
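
To make the first point concrete, here is a minimal sketch (pure NumPy, no GPU required, so it only illustrates the underlying mechanism rather than the CUDA atomics themselves): atomic adds serialize concurrent updates but leave their order to the hardware, and floating-point addition is not associative, so different accumulation orders can yield slightly different results from identical inputs.

    # Mechanism behind non-determinism from unordered accumulation:
    # summing the same floating-point values in a different order
    # can give a slightly different result.
    import numpy as np

    rng = np.random.RandomState(0)
    values = rng.standard_normal(1000000).astype(np.float32)

    sum_in_order = np.sum(values)                    # one accumulation order
    sum_shuffled = np.sum(rng.permutation(values))   # same values, different order

    # The two sums typically differ slightly; on a GPU the atomics decide
    # the order at run time, so the variation shows up between runs.
    print(sum_in_order, sum_shuffled, sum_in_order == sum_shuffled)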
Let's not exaggerate
Posted Jul 28, 2018 19:44 UTC (Sat) by wx (guest, #103979)
This is spot on, at least for anything using TensorFlow, which sadly applies to the majority of deep learning research out there. The respective issue trackers on GitHub are full of bug reports about TensorFlow not generating reproducible results. These are usually closed with the claim that the use of atomics is strictly required to obtain plausible performance.
Anecdotal evidence from colleagues involved in deep learning research suggests that, even if you have access to all source code and training data, the resulting networks will often differ wildly when TensorFlow is involved. For example, it's not uncommon for the success rate of a trained classifier to vary between 75% and 90% across training runs. With that in mind, the discussion within Debian is, well, a little off from the actual real-world problems.
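
Seeding alone does not address this, by the way. A minimal sketch, assuming the TensorFlow 1.x API that was current at the time of this thread: fixing the graph-level and per-op seeds makes the pseudo-random initialization reproducible, but it does nothing about GPU kernels that accumulate with atomics, so end-to-end training can still diverge between runs.

    # Sketch only: seeds pin down random initialization, not kernel
    # execution order on the GPU.
    import numpy as np
    import tensorflow as tf

    np.random.seed(0)
    tf.set_random_seed(0)                  # graph-level seed

    x = tf.random_normal([2, 2], seed=0)   # per-op seed
    with tf.Session() as sess:
        print(sess.run(x))                 # this much is reproducible across runs
        # End-to-end training of a real model on a GPU generally is not,
        # as long as atomics-based reductions or other non-deterministic
        # kernels are involved.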