There are CP15 registers (ID_ISARx, MVFRx, CPACR, NSACR) that tell you whether VFP and NEON are present, and if present, whether they're powered up and available. (VFP/NEON are in a separate power domain on A15.) So, you could theoretically build software that includes NEON-optimized algorithms, and set up the right versions of the functions at the start of the code.
Your point stands, though, for devices that would have to run the fallback version. Those would benefit from ARMv8 without having to use NEON. Plus, keeping the multiple versions around takes up more space, wrangling them is added complexity, etc. etc.
For something like a cell phone, where interpreters like Dalvik have a JIT, the compiled code can match the exact CPU in the phone. After all, that isn't exactly a user-serviceable part. ;-)