as you note, every flash drive available has a remapping layer hiding the details from the OS.
but let me point out some things that _could_ be done with either raw access to the flash, or some more smarts in the remapping layer
the key factor is that flash does not always need to be erased before it's changed
SLC flash can change any bit from a 1 to a 0 without needing to erase the block first
MLC flash can change any pair of bits from 11 to 00 (I think the order is 11 -> 10 -> 01 -> 00 but I'm not sure) without needing to erase the block first.
when a block is modified the hardware could compare the new data to the old data, and if the only differences are 1->0 transitions, it could modify the existing block in place rather than writing a new version elsewhere
if the hardware supported this, then the OS could take advantage of the capability to reduce the number of erases necessary
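the check the remapping layer (or filesystem) would need is just a bitwise subset test; a minimal sketch (the function name is mine):

```python
def can_overwrite_in_place(old: bytes, new: bytes) -> bool:
    """True if every changed bit is a 1->0 transition, i.e. the
    new data never needs to set a 0 back to a 1, so the existing
    block can be reprogrammed without an erase first."""
    # each new byte must be a bitwise subset of the old byte:
    # new & old == new  <=>  no bit goes 0 -> 1
    return len(old) == len(new) and all(n & o == n for o, n in zip(old, new))

# 0b1110 -> 0b0110 only clears a bit, so no erase is needed
assert can_overwrite_in_place(bytes([0b1110]), bytes([0b0110]))
# 0b0110 -> 0b1110 needs a 0 -> 1 transition, which requires an erase
assert not can_overwrite_in_place(bytes([0b0110]), bytes([0b1110]))
# anything can be written over freshly-erased (all 1s) flash
assert can_overwrite_in_place(b"\xff\xff", b"\x12\x34")
```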
the filesystem could:
leave the unused space at the end of a block as all '1's rather than all '0's, so that additional data can be appended without needing to erase the block first.
change its 'nothing more to point to' marker from a pointer containing all 0s to a pointer containing all 1s, so that adding an additional block to a chain (or extent) would not require re-writing the prior block as well
make a space/rewrite tradeoff in favor of reducing rewrites by allocating space for multiple copies of frequently changed metadata so that the entire block only needs to be re-written when all the extra slots have been used up.
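the all-1s terminator idea can be sketched like this: a hypothetical pointer table where erased slots read as all 1s, so appending a block is a pure 1->0 write (names and the layout are mine, not any real filesystem's):

```python
ERASED = 0xFFFFFFFF  # an erased 32-bit pointer slot reads as all 1s

def append_block(chain: list, new_block: int) -> bool:
    """overwrite the all-1s terminator slot with a real pointer.
    this works in place because writing any pointer value over
    all 1s only clears bits (1 -> 0), so no erase is needed.
    returns False if the chain has no erased slot left."""
    for i, ptr in enumerate(chain):
        if ptr == ERASED:
            chain[i] = new_block   # pure 1->0 write
            return True
    return False  # table full; only now does a rewrite become necessary

chain = [17, 42, ERASED, ERASED]
append_block(chain, 99)
assert chain == [17, 42, 99, ERASED]
```

had the terminator been all 0s, writing a real pointer over it would need 0->1 transitions, forcing an erase-and-rewrite of the whole block.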
as a trivial example of this last one: with atime enabled, every time a file is accessed it requires a rewrite of the entire eraseblock to record the new time. if you need a sync mount for data reliability, this could result in a rewrite for each file that's looked at
if however you had 10 atime slots, you would only do a rewrite after accessing a particular file 10 times, and with a sync mount you would only have to do a rewrite after doing 10 passes through all the files (each file accessed would modify an atime slot, but until all 10 slots are full for any one file the block would not need to be moved; when the filesystem overflows the available slots on one file it can clean up all the other files in that block at the same time)
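a minimal sketch of the atime-slot idea (the names, the 4-byte slot format, and the slot count are assumptions for illustration):

```python
ERASED_SLOT = b"\xff" * 4  # erased flash reads as all 1s
NUM_SLOTS = 10

def record_atime(slots: list, atime: int) -> bool:
    """write the new atime into the first still-erased slot.
    each write is pure 1->0 programming, so the eraseblock only
    has to be rewritten once every NUM_SLOTS accesses.
    returns False when every slot is used (rewrite needed)."""
    for i, slot in enumerate(slots):
        if slot == ERASED_SLOT:
            slots[i] = atime.to_bytes(4, "big")
            return True
    return False

def current_atime(slots: list):
    """the last non-erased slot holds the newest atime."""
    used = [s for s in slots if s != ERASED_SLOT]
    return int.from_bytes(used[-1], "big") if used else None

slots = [ERASED_SLOT] * NUM_SLOTS
for t in range(10):
    assert record_atime(slots, 1000 + t)   # ten accesses, zero rewrites
assert current_atime(slots) == 1009
assert not record_atime(slots, 1010)       # the 11th access forces a rewrite
```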
similar tricks could be done for the file size (either multiple slots or a size+delta+delta approach)
exactly what metadata should be given extra slots, and how many slots, is an interesting problem to consider and experiment with (and is probably going to be different for different use cases as well)
if the hardware can tell the filesystem where the eraseblock boundaries are then there are more optimizations that can take place
a couple of side notes: the musical greeting cards and similar cheap recorder chips that became available in the 1990s actually worked by using eprom chips that had similar programming properties to flash: when you erased them you got all 1s, and then by programming you could change a 1 to a 0. the recording capability showed up when someone realized that you didn't have to program them all the way to a 0; like flash, they actually store an analog value, and by rapidly sending programming pulses to the device (up to 100 per bit) you could adjust the output voltage to match the audio sample. then to play it back you just cycle through the addresses and amplify the analog voltage produced.
MLC flash takes advantage of a similar thing: it doesn't just program the flash cell to a true 1 or 0, it can also program it to one of two additional analog values, and then labels the original '1' as '11', the original '0' as '00', and the two additional values as '10' and '01'. the difficulty is that it's now harder to tell the different voltages apart.
I expect that MLC flash is going to climb in capacity rapidly as the manufacturers copy ideas from the history of modems:
1. more values in a particular cell (what MLC does today vs SLC) as the ability to distinguish (and program) voltages that are close together gets better (similar to how modems got faster as they distinguished more different tones as they went from 1200bps to 9600bps)
right now I believe that flash programming is mostly (if not entirely) a case of 'hit it with one programming pulse to change the cell'. I expect that things will shift to 'hit it with a series of short programming pulses, checking between each pulse, until the cell gets to the desired voltage'. doing this will increase complexity, and may slow down writes slightly (in some cases it may even speed up writes: a single programming pulse needs to cover the worst case, but the new approach can avoid 'overprogramming' the cell), but it will result in more precise control of the cell voltage.
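a toy illustration of that program-and-verify loop (the cell model, the step size, and the function names are all made up; real cells respond nonlinearly):

```python
def program_cell(read_voltage, apply_pulse, target, max_pulses=100):
    """incremental step-pulse programming sketch: apply short
    pulses, verifying after each one, until the cell reaches the
    target voltage. this avoids the overshoot that a single
    worst-case-sized pulse can cause."""
    pulses = 0
    while read_voltage() < target and pulses < max_pulses:
        apply_pulse()
        pulses += 1
    return pulses

# toy cell model: each pulse raises the stored level by a fixed
# 50 mV step (an assumption purely for this illustration)
cell = {"mv": 0}
pulses = program_cell(lambda: cell["mv"],
                      lambda: cell.__setitem__("mv", cell["mv"] + 50),
                      target=1000)
assert pulses == 20 and cell["mv"] == 1000  # stops exactly at target
```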
2. combining adjacent flash cells and defining that only some of the range of possible bit patterns are legal, allowing the use of voltages in an individual cell that could be ambiguous on their own, but are no longer ambiguous when combined with the data from the adjacent cell (similar to how modems shifted from pure tone detection to tone/phase detection, with only some combinations being legal to allow for easier detection, as they went above 9600bps)