Might have found out where the FPS timer B overhead comes from, on 5D3:

Previously, I noticed the timer B overhead (i.e. the difference between the vertical resolution and the minimum timer B that still gives a clean image) was a bit hard to figure out; it wasn't a plain number that I could just hardcode in crop_rec. I could see the overhead was larger in high-FPS modes and smaller at large resolutions (such as 4K half-FPS), but could not quantify it, so I had to tune each preset manually. Now I've confirmed a hypothesis I had since looking into 5D2: the overhead for timer B is, indeed, constant, but measured in milliseconds.
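To illustrate why a constant time shows up as a different number of timer B units in each mode, here is a rough sketch (not the crop_rec code). It assumes timer B counts lines, that one line lasts timer A divided by the main clock, and that the main clock is 24 MHz; the function names and the timer A values in the usage note are made up for illustration, and the 2.3 ms figure is only the approximate execution time mentioned further down.

```python
def timer_b_overhead_lines(overhead_us, timer_a, main_clock_hz):
    """Convert a fixed time overhead (microseconds) into timer B units.

    Assumption: one timer B unit is one line, lasting timer_a / main_clock_hz.
    Integer ceiling division, to avoid floating-point rounding surprises.
    """
    numerator = overhead_us * main_clock_hz
    denominator = timer_a * 1_000_000
    return -(-numerator // denominator)


def min_timer_b(vertical_res, overhead_us, timer_a, main_clock_hz):
    """Minimum timer B = vertical resolution + overhead expressed in lines."""
    return vertical_res + timer_b_overhead_lines(overhead_us, timer_a, main_clock_hz)


if __name__ == "__main__":
    MAIN_CLOCK_HZ = 24_000_000   # assumed main clock
    OVERHEAD_US = 2300           # about 2.3 ms, approximate
    for timer_a in (440, 200):   # hypothetical timer A values
        lines = timer_b_overhead_lines(OVERHEAD_US, timer_a, MAIN_CLOCK_HZ)
        print(f"timer A = {timer_a}: overhead = {lines} lines")
```

With these made-up inputs, the same 2.3 ms comes out as 126 lines at timer A = 440 and 276 lines at timer A = 200, i.e. the shorter the line, the larger the overhead looks when counted in timer B units.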
The red line shows the timer B limit found experimentally in the old FPS override code. Notice it matches the last red marker.
Between the red markers, EvfState executes its state transition functions; in particular, evfSetParamInterrupt and evfReadOutDoneInterrupt. During the last transition (last pair of red markers), Canon code sets up the hardware for capturing the next frame. This process is quite complex and... takes a significant chunk of execution time on the ARM processor (about 2.3 ms).
According to that graph, the 1080p60 mode can easily be pushed to 70 FPS without sacrificing resolution. Whether that can be recorded, I did not test.
If that software reconfiguration can be somehow optimized (e.g. by caching the register values configured by Canon code, and just replaying them for every frame), the 720p mode can - in theory - be pushed to about 83 FPS. Probably not worth the effort, as the lossless encoder is likely to be unable to keep up, the rolling shutter would be very close to 100%, and so on.
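A back-of-the-envelope way to play with these limits, assuming the frame time is simply the readout time (vertical resolution times line time) plus the fixed overhead. This is only a model, not a measurement: `max_fps` is a hypothetical helper, the 24 MHz clock is an assumption, and the inputs in the usage note are illustrative values, not the actual 5D3 presets, so it does not reproduce the 70 or 83 FPS figures above.

```python
def max_fps(vertical_res, overhead_us, timer_a, main_clock_hz):
    """Estimate the maximum frame rate under the simple model:
    frame time = vertical_res * line_time + fixed overhead.

    Ignores the fact that timer B must be an integer, and anything
    else (encoder speed, rolling shutter) that may limit the frame rate.
    """
    line_time_s = timer_a / main_clock_hz
    frame_time_s = vertical_res * line_time_s + overhead_us * 1e-6
    return 1.0 / frame_time_s


if __name__ == "__main__":
    MAIN_CLOCK_HZ = 24_000_000  # assumed main clock
    # hypothetical mode: 1000 lines, timer A = 240 (10 us per line)
    for overhead_us in (2300, 1000, 0):
        fps = max_fps(1000, overhead_us, 240, MAIN_CLOCK_HZ)
        print(f"overhead = {overhead_us} us: up to {fps:.1f} FPS")
```

In this model, shaving the overhead only matters when it is a sizeable fraction of the readout time, which is why the gain would be most visible in the short-readout, high-FPS modes.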
Our raw video "vsync" hook is placed right at the last red marker (it's executed right after Canon code does its evfReadOutDoneInterrupt transition, i.e. it extends the duration of that), so whatever we do there might also limit the maximum frame rate, to some extent.
Also notice the evfSetParam event sometimes starts to be processed *after* the HEAD3 interrupt (which signals the evfReadOutDoneInterrupt event). That might result in corrupted frames (unconfirmed hypothesis). Reducing the value of HEAD4 (which triggers evfSetParamInterrupt) *might* fix these corrupted frames (again, unconfirmed hypothesis).
Here's what happens if one reduces HEAD4 from 1320 to 1000 in 1080p24, timing-wise:

In other words, it appears to help, but that's not the only cause of delayed frames.