I don't know exactly when COW fork was implemented in various kernels.
Presumably not long after MMU hardware became available.
Still, a 1GB process needs roughly 262,144 4KB page-table entries to be copied for the child. That's a lot of baggage if the child's sole purpose is to call exec(). Better to use vfork/exec when possible.
I'd like to be clear that the overcommit issues with fork() are not an implementation problem but a fundamental consequence of what fork does.
If the parent has a working data set of 100MB, and the child only needs 5MB from the parent, fork() still marks the remaining 95MB as needed by the child.
Assume the parent modifies its entire 100MB working set while the child continues running with its 5MB working set: eventually both processes will consume 200MB instead of the 105MB that is technically needed.
So, regardless of the fork implementation, 95MB of that 200MB is wasted. As the parent spawns more children over time, the fraction wasted only grows.
Of course there are workarounds, but they come at the expense of forgoing the semantics which make fork appealing in the first place: inheriting context and data structures from the parent without IPC.