Reporting either too many or too few holes is potentially less efficient, but it should be allowed simply because it won't change the content of the file. So while the interface allows it, it doesn't (imo) claim that an implementation *must* report any and all actual or potential holes. Doing so would work, but would be a pessimization.
I assume the wording of the specification has to be "loose" like this to cover cases where the filesystem converts zero-filled data blocks to holes (via background block scrubbing), or where a block of zeros gets rewritten as actual zeros (an optimization like zero-block data deduplication, for example). In both cases the logical content of the file has not changed, but the hole structure is different, so an offset that a previous lseek(SEEK_HOLE) reported as a hole may no longer be one. This is a lesser constraint than the content itself being altered, and clients should still work under it.
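To make the "loose" semantics concrete, here's a minimal sketch of walking a file's data/hole structure with SEEK_DATA/SEEK_HOLE (using Python's os.lseek wrappers, available on Linux; the function name and demo file are my own, not from the spec):

```python
import errno
import os
import tempfile

def data_extents(fd):
    """Walk the file with SEEK_DATA/SEEK_HOLE and return the byte
    ranges the filesystem chose to report as data."""
    extents = []
    end = os.lseek(fd, 0, os.SEEK_END)
    off = 0
    while off < end:
        try:
            start = os.lseek(fd, off, os.SEEK_DATA)
        except OSError as e:
            if e.errno == errno.ENXIO:   # nothing but hole from here to EOF
                break
            raise
        stop = os.lseek(fd, start, os.SEEK_HOLE)
        extents.append((start, stop))
        off = stop
    return extents

# demo: 4 KiB of data followed by ~1 MiB that may or may not be a hole
fd, path = tempfile.mkstemp()
os.pwrite(fd, b"x" * 4096, 0)
os.ftruncate(fd, 1 << 20)
print(data_extents(fd))
os.close(fd)
os.remove(path)
```

Note how the looseness shows up in practice: a filesystem with no hole support legitimately prints one extent covering the whole file, while a sparse-capable one reports only the 4 KiB of data. Both answers are valid, and the caller has to cope with either.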
"so an implementation that reported every 0 in the file would be valid" - Yes, although it should at least adhere to the _PC_MIN_HOLE_SIZE as a lower bound. If that lower bound can be 1, clients should be prepared for that; in particular, backup software might need to detect and refuse to bother with bookkeeping such small holes, and just read and store the zeros verbatim.
"the ability to find potential holes without having to push the data all the way to userspace just to find 0's int he file seems like a useful optimisation for a small amount of code." - A filesystem could choose to "scrub" the data in the background and look for places to add holes, but whether its userspace or kernel, the act of looking for potential holes will involve processing a lot of data blocks, and could be tricky when done on active filesystems. The copying to userspace is trivial, compared to the block reads (even on non-rotating media). Whereas, creating the file with holes initially can often be done efficiently, since the writing application may know where the holes belong at the start. (ie. compare "time dd if=/dev/zero of=/var/tmp/non-sparse-file bs=1M count=1000" vs. "time dd if=/dev/zero of=/var/tmp/sparse-file bs=1M count=1 seek=999")
That said, I wonder if any of the compressing filesystems try to aggressively find ways to make files sparser (given that they have to process all the data anyway)? My guess is that sparseness is not much of a win on those filesystems, so they don't bother.