I believe there are instructions to flush particular TLB entries -- invlpg for one.
AFAICS the trouble in this case stems from the inefficiency of a global TLB flush. After a vfree() you have changed some global state from your *local* CPU. However, the TLBs on the remote CPUs don't know about this change and will happily keep resolving references to the vfree'd range from their stale entries unless told otherwise. Hence the IPIs telling them to flush their TLBs.
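
To make the stale-entry problem concrete, here is a rough userspace toy model, not kernel code. Everything in it (NR_CPUS, TLB_SLOTS, struct cpu, tlb_fill(), tlb_hit(), tlb_flush_page(), model_vfree()) is a hypothetical name made up for illustration: each "CPU" gets a tiny direct-mapped TLB, tlb_flush_page() stands in for a per-page flush such as invlpg, and the "IPI" is just a counter bump followed by the same flush on the remote CPU.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define NR_CPUS    4   /* hypothetical CPU count for this model */
#define TLB_SLOTS  8   /* hypothetical per-CPU TLB size */
#define PAGE_SHIFT 12  /* 4 KiB pages */

struct tlb_entry {
    uintptr_t vpn;     /* virtual page number cached in this slot */
    bool valid;
};

struct cpu {
    struct tlb_entry tlb[TLB_SLOTS];
    unsigned ipis_received;  /* how many flush "IPIs" this CPU has seen */
};

static struct cpu cpus[NR_CPUS];

/* Pick the direct-mapped slot that would hold addr on the given CPU. */
static struct tlb_entry *tlb_slot(int cpu, uintptr_t addr)
{
    assert(cpu >= 0 && cpu < NR_CPUS);
    return &cpus[cpu].tlb[(addr >> PAGE_SHIFT) % TLB_SLOTS];
}

/* Model a CPU touching addr and caching the translation. */
static void tlb_fill(int cpu, uintptr_t addr)
{
    struct tlb_entry *e = tlb_slot(cpu, addr);
    e->vpn = addr >> PAGE_SHIFT;
    e->valid = true;
}

/* Would this CPU still resolve addr from its TLB? */
static bool tlb_hit(int cpu, uintptr_t addr)
{
    const struct tlb_entry *e = tlb_slot(cpu, addr);
    return e->valid && e->vpn == (addr >> PAGE_SHIFT);
}

/* Stand-in for a single-page flush (invlpg-like): drop one entry only. */
static void tlb_flush_page(int cpu, uintptr_t addr)
{
    struct tlb_entry *e = tlb_slot(cpu, addr);
    if (e->valid && e->vpn == (addr >> PAGE_SHIFT))
        e->valid = false;
}

/*
 * Model the unmap done on local_cpu. The local TLB is always flushed for
 * the range. Remote CPUs are flushed only if shootdown is true, in which
 * case each of them receives one "IPI" and flushes the same range.
 */
static void model_vfree(int local_cpu, uintptr_t start, size_t npages,
                        bool shootdown)
{
    for (int cpu = 0; cpu < NR_CPUS; cpu++) {
        if (cpu != local_cpu) {
            if (!shootdown)
                continue;              /* remote TLB is left stale */
            cpus[cpu].ipis_received++; /* pretend an IPI was delivered */
        }
        for (size_t i = 0; i < npages; i++)
            tlb_flush_page(cpu, start + ((uintptr_t)i << PAGE_SHIFT));
    }
}
```

If CPU 0 and CPU 1 have both cached the same page and CPU 0 calls model_vfree() with shootdown set to false, CPU 1 still hits on the freed address. With shootdown set to true the stale entry is gone, at the price of one "IPI" per remote CPU. Note that this model flushes only the affected pages on the remote side and leaves unrelated entries alone; whether a real implementation does that or a full flush is a separate question.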