it just seems conceptually wrong to me that finding holes (or potential holes) should be a two-step process.
step 1 use SEEK_HOLE to find holes the filesystem knows about
step 2 read the rest of the file in userspace to look for additional holes (i.e. all-zero regions that SEEK_HOLE didn't report).
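a minimal sketch of step 1, assuming Linux lseek(2) semantics for SEEK_DATA/SEEK_HOLE; the function name `walk_holes` is my own. note that each boundary costs a separate lseek() call, and step 2 (reading the reported "data" regions to look for zeroes the filesystem stored as real blocks) is elided:

```c
#define _GNU_SOURCE   /* for SEEK_DATA / SEEK_HOLE */
#include <stdio.h>
#include <unistd.h>
#include <fcntl.h>
#include <errno.h>

/* Print the data/hole layout the filesystem knows about for fd,
 * from offset 0 up to end.  One lseek() per boundary. */
static void walk_holes(int fd, off_t end)
{
    off_t pos = 0;
    while (pos < end) {
        off_t data = lseek(fd, pos, SEEK_DATA);
        if (data < 0) {
            if (errno == ENXIO)   /* nothing but a hole out to EOF */
                printf("hole: %lld..%lld\n", (long long)pos, (long long)end);
            break;
        }
        if (data > pos)
            printf("hole: %lld..%lld\n", (long long)pos, (long long)data);
        off_t hole = lseek(fd, data, SEEK_HOLE);  /* next boundary, another syscall */
        printf("data: %lld..%lld\n", (long long)data, (long long)hole);
        pos = hole;
    }
}
```

a filesystem that doesn't track holes is allowed to report the whole file as one data region, which is exactly why the userspace step 2 is still needed.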
examining a range of memory to determine whether it is entirely zero seems like the type of thing that is amenable to optimisation for the particular CPU in use. since the kernel is already optimised this way, it would seem better to leverage that rather than require multiple userspace tools to each implement the check (with the optimisations).
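for illustration, here is the sort of check each tool ends up carrying. this is a hedged sketch, not any particular tool's code (the name `buffer_is_zero` is my own); it uses the common trick of checking the first byte and then comparing the buffer against itself shifted by one, so an optimised memcmp() does the heavy lifting:

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Return true if all len bytes of buf are zero.
 * buf[0] == 0 plus buf[i] == buf[i+1] for every i implies all-zero,
 * and memcmp() is typically vectorised for the host CPU. */
static bool buffer_is_zero(const char *buf, size_t len)
{
    if (len == 0)
        return true;
    return buf[0] == 0 && memcmp(buf, buf + 1, len - 1) == 0;
}
```

the point is that this per-CPU tuning already exists once in the kernel; duplicating it in every tool is the waste being described.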
getting the full details of which extents a file uses doesn't seem like the right answer either: it's complex, and it presents a lot of information that isn't useful (e.g. you don't care whether a run of real data sits in one extent or is fragmented across many). at the same time, it seems a bit wasteful to find the holes by doing a separate system call for each hole boundary.