Let's not exaggerate
Posted Jul 26, 2018 14:26 UTC (Thu) by gpu (guest, #125963)
In reply to: Let's not exaggerate by excors
Parent article: Deep learning and free software
1. Non-synchronized GPU atomics (https://docs.nvidia.com/cuda/cuda-c-programming-guide/ind...) — see the sketch after this list
2. Non-deterministic numerical algorithms, e.g. http://papers.nips.cc/paper/4390-hogwild-a-lock-free-appr... (though this particular example is CPU-specific)
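
To make the first point concrete, here is a minimal sketch (pure NumPy, no GPU required, so it only illustrates the underlying mechanism rather than the CUDA atomics themselves): atomic adds serialize concurrent updates but leave their order to the hardware, and floating-point addition is not associative, so different accumulation orders can yield slightly different results from identical inputs.

    # Mechanism behind non-determinism from unordered accumulation:
    # summing the same floating-point values in a different order
    # can give a slightly different result.
    import numpy as np

    rng = np.random.RandomState(0)
    values = rng.standard_normal(1000000).astype(np.float32)

    sum_in_order = np.sum(values)                    # one accumulation order
    sum_shuffled = np.sum(rng.permutation(values))   # same values, different order

    # The two sums typically differ slightly; on a GPU the atomics decide
    # the order at run time, so the variation shows up between runs.
    print(sum_in_order, sum_shuffled, sum_in_order == sum_shuffled)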
Let's not exaggerate
Posted Jul 28, 2018 19:44 UTC (Sat) by wx (guest, #103979)
This is spot on, at least for anything using TensorFlow, which sadly applies to the majority of deep learning research out there. The respective issue trackers on GitHub are full of bug reports about TensorFlow not generating reproducible results. These are usually closed with the claim that the use of atomics is strictly required to obtain plausible performance.
Anecdotal evidence from colleagues involved in deep learning research suggests that, even if you have access to all source code and training data, the resulting networks will often differ wildly when TensorFlow is involved. For example, it's not uncommon for the success rate of a trained classifier to vary between 75% and 90% across training runs. With that in mind, the discussion within Debian is, well, a little off from the actual real-world problems.
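
Seeding alone does not address this, by the way. A minimal sketch, assuming the TensorFlow 1.x API that was current at the time of this thread: fixing the graph-level and per-op seeds makes the pseudo-random initialization reproducible, but it does nothing about GPU kernels that accumulate with atomics, so end-to-end training can still diverge between runs.

    # Sketch only: seeds pin down random initialization, not kernel
    # execution order on the GPU.
    import numpy as np
    import tensorflow as tf

    np.random.seed(0)
    tf.set_random_seed(0)                  # graph-level seed

    x = tf.random_normal([2, 2], seed=0)   # per-op seed
    with tf.Session() as sess:
        print(sess.run(x))                 # this much is reproducible across runs
        # End-to-end training of a real model on a GPU generally is not,
        # as long as atomics-based reductions or other non-deterministic
        # kernels are involved.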