The first main benefit of this is that you no longer need to have stack frames (frame pointers) enabled. The -pg option with mcount requires them. With -pg and -mfentry it does not, and dropping them gives a bit of a performance boost.
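If you want to see the difference for yourself, one way is to compile a tiny file both ways and compare the assembly. This is only a sketch, assuming x86-64 and a GCC recent enough to accept -mfentry; the file name and the function are made up for illustration.

```c
/* fentry_demo.c (hypothetical file name)
 *
 * Build twice and compare the generated assembly:
 *
 *   gcc -O2 -pg           -S -o mcount.s fentry_demo.c
 *   gcc -O2 -pg -mfentry  -S -o fentry.s fentry_demo.c
 */

/* Any small function will do; this one only exists to be profiled. */
int add(int a, int b)
{
    return a + b;
}
```

In mcount.s the profiling call should show up after the frame pointer has been set up, while in fentry.s the call to __fentry__ should come before any prologue instructions. The exact spelling of the call varies with the GCC version and with options such as PIE.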
The next part is that the callbacks to the function tracer can now get access to the registers. With mcount, the stack frame is set up before the call, so by the time mcount runs you can no longer count on the stack and registers still holding the function parameters. With fentry right at the beginning of the function, you have full access to the registers and stack exactly as they were handed to the function, which means we now have the possibility of tracing the data in the function parameters as well.
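To make the idea concrete, here is a plain C simulation. It is not real fentry code and not an ftrace API: struct entry_regs, entry_callback, active_callback and scale() are all hypothetical names, and the "registers" are just a struct filled in by hand. It only shows the ordering that matters, namely that the callback runs before the function has touched its arguments.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for the argument registers an entry hook would see. */
struct entry_regs {
    long arg0;
    long arg1;
};

/* Hypothetical callback type: gets the function name and its arguments
 * exactly as the caller passed them. */
typedef void (*entry_callback)(const char *name, const struct entry_regs *regs);

static entry_callback active_callback; /* NULL means tracing is off */
static long last_seen_arg0;            /* recorded by log_args() below */

/* Sample callback: print the arguments and remember the first one. */
static void log_args(const char *name, const struct entry_regs *regs)
{
    assert(name != NULL && regs != NULL);
    printf("%s(%ld, %ld)\n", name, regs->arg0, regs->arg1);
    last_seen_arg0 = regs->arg0;
}

/* Traced function. The callback is the very first thing it does,
 * mimicking a hook that fires at function entry. */
static long scale(long value, long factor)
{
    if (active_callback) {
        struct entry_regs regs = { value, factor };
        active_callback("scale", &regs);
    }
    value *= factor; /* arguments change only after the callback has run */
    return value;
}
```

Setting active_callback = log_args and then calling scale(6, 7) lets the callback see 6 and 7 untouched, even though scale() overwrites value afterwards.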
The third part, and the most extreme, is that because fentry is called as the very first instruction of the function, we could now possibly "hijack" the function completely! That is, we could call a different function and return to the original caller without any issue. I could imagine crazy things with this feature.
Perhaps combining points 2 and 3 above: instead of a full hijack, we could also have the ability to modify the parameters. I'm not sure how useful that is outside of rootkits and academia. But who knows?
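Both the full hijack and the parameter rewrite can be sketched the same way. Again, this is a plain C simulation under my own assumptions, not anything ftrace provides: enum hook_action, entry_hook, active_hook and the add()/multiply() functions are hypothetical, and the entry stub is ordinary C rather than a real fentry trampoline.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the argument registers at function entry. */
struct entry_regs {
    long arg0;
    long arg1;
};

/* What the hook asks the entry stub to do next (hypothetical). */
enum hook_action {
    HOOK_CONTINUE, /* run the original body, possibly with edited arguments */
    HOOK_HIJACK    /* skip the original body and return *retval to the caller */
};

typedef enum hook_action (*entry_hook)(struct entry_regs *regs, long *retval);

static entry_hook active_hook; /* NULL means no hook installed */

/* The original function body. */
static long add_body(long a, long b)
{
    return a + b;
}

/* Simulated entry point: the hook runs before the body ever starts. */
static long add(long a, long b)
{
    struct entry_regs regs = { a, b };
    long retval = 0;

    if (active_hook && active_hook(&regs, &retval) == HOOK_HIJACK)
        return retval;                      /* body never runs */
    return add_body(regs.arg0, regs.arg1);  /* body sees the hook's edits */
}

/* A different function to run in place of add(). */
static long multiply(long a, long b)
{
    return a * b;
}

/* Hook for point 3: full hijack. */
static enum hook_action hijack_with_multiply(struct entry_regs *regs, long *retval)
{
    assert(regs != NULL && retval != NULL);
    *retval = multiply(regs->arg0, regs->arg1);
    return HOOK_HIJACK;
}

/* Hook for points 2 and 3 combined: only rewrite a parameter. */
static enum hook_action clamp_first_arg(struct entry_regs *regs, long *retval)
{
    (void)retval; /* unused: the original body produces the result */
    assert(regs != NULL);
    if (regs->arg0 > 100)
        regs->arg0 = 100;
    return HOOK_CONTINUE;
}
```

With hijack_with_multiply installed, a caller of add(3, 4) gets the product back and never knows add_body() was skipped. With clamp_first_arg installed, the body still runs, but on the clamped value.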
As for a link to documentation of what ftrace could do with this: sorry, but I don't know of a URL that points into my head ;)