The idea was sufficiently well known for Val Henson (now Val Aurora) to publish a paper arguing against it in May 2003; she cites six earlier systems that use it: http://valerieaurora.org/review/hash/node2.html
The oldest appears to be rsync, with Tridge's thesis coming out in 1999. For de-duplication specifically, I'd check the paper on a backup system called "Pastiche", which was formally published in 2002...