You have to understand that TCP was not designed to save power; quite the opposite. It was designed to provide a relatively smooth transmission rate without hardware support. To that end it uses ACK clocking, where ideally no more than two packets are transmitted for every incoming ACK. That requires TCP-level processing for every ACK, which amounts to considerable CPU overhead on a high-bandwidth transfer.
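To put a rough number on that overhead, here is a back-of-the-envelope sketch. It assumes 1500-byte segments, one ACK for every two segments, and a sender running at full line rate; the function name and the defaults are mine, purely for illustration.

```python
def ack_events_per_second(link_bps: float,
                          segment_bytes: int = 1500,
                          segments_per_ack: int = 2) -> float:
    """Approximate ACKs per second a sender must process at line rate.

    Assumptions (illustrative only): fixed-size segments, one ACK per
    `segments_per_ack` segments, and no header or framing overhead.
    """
    segments_per_second = link_bps / (segment_bytes * 8)
    return segments_per_second / segments_per_ack


for gbps in (1, 10):
    rate = ack_events_per_second(gbps * 1e9)
    print(f"{gbps:>2} Gbit/s: ~{rate:,.0f} ACKs/s, each needing TCP-level processing")
```

Under those assumptions a 10 Gbit/s transfer means on the order of four hundred thousand ACKs per second, every one of which wakes the TCP stack.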
If you want to save power there is really only one good way to go about it: hardware support for packet pacing, where the driver tells the hardware, "here is a series of ten packets, schedule them for transmission at X-microsecond intervals." Adapting that to TCP is a bit tricky, but the point is that the kernel can then sleep for considerably longer before it has to wake up again to do proper congestion control.
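As a sketch of what such an instruction might look like, the snippet below models a paced batch. `PacedBatch` and its fields are hypothetical and do not correspond to any real driver or NIC interface; the 12 microsecond interval is just what a 1500-byte packet takes on a 1 Gbit/s link.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PacedBatch:
    """Hypothetical descriptor a driver could hand to pacing-capable hardware."""
    packet_count: int   # e.g. "a series of ten packets"
    interval_us: int    # e.g. "at X microsecond intervals"

    def departure_offsets_us(self) -> List[int]:
        # Offset of each packet's transmission from the start of the batch.
        return [i * self.interval_us for i in range(self.packet_count)]

    def time_until_next_batch_us(self) -> int:
        # How long the hardware stays busy before the next batch is due,
        # i.e. roughly how long the kernel could stay asleep.
        return self.packet_count * self.interval_us


batch = PacedBatch(packet_count=10, interval_us=12)
print(batch.departure_offsets_us())
print(batch.time_until_next_batch_us(), "us per wakeup, versus",
      batch.interval_us, "us if software paced each packet itself")
```

The ratio between the two figures is simply the batch size, so larger batches buy proportionally longer sleeps, at the cost of reacting more slowly to congestion signals.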
Unfortunately, as far as I know, there are no Ethernet chipsets out there with support for hardware packet pacing. That is a big problem, because on a high-bandwidth interface the kernel can only respond to incoming traffic so fast (and still get any other work done). What we get instead is ACK compression, where a much longer series of packets gets queued for transmission in a short period of time, swamping network queues and increasing packet loss and jitter.
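The effect of a burst on a downstream queue can be seen in a toy model. This is only a sketch: a single FIFO that holds a fixed number of packets and drains one packet per tick, with the queue size and packet counts picked arbitrarily for illustration.

```python
from collections import Counter
from typing import Iterable, Tuple


def simulate_queue(arrival_ticks: Iterable[int],
                   queue_limit: int,
                   drain_per_tick: int = 1) -> Tuple[int, int]:
    """Return (dropped, max_depth) for a toy tail-drop FIFO.

    Each entry in `arrival_ticks` is the tick at which one packet arrives.
    The queue holds at most `queue_limit` packets and forwards
    `drain_per_tick` packets at the end of every tick.
    """
    arrivals = Counter(arrival_ticks)
    depth = dropped = max_depth = 0
    for tick in range(max(arrivals) + 1):
        for _ in range(arrivals[tick]):
            if depth < queue_limit:
                depth += 1
            else:
                dropped += 1      # queue full: tail drop
        max_depth = max(max_depth, depth)
        depth = max(0, depth - drain_per_tick)
    return dropped, max_depth


burst = [0] * 20            # 20 packets queued at once
paced = list(range(20))     # the same 20 packets, one per tick

print("burst:", simulate_queue(burst, queue_limit=8))   # (12, 8)
print("paced:", simulate_queue(paced, queue_limit=8))   # (0, 1)
```

The same twenty packets either overflow the queue and lose twelve of their number, or pass through with the queue never holding more than one packet. Queue depth is also a rough stand-in for queueing delay, which is where the jitter comes from.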