First, you need to realize that a local tone mapping operation has to analyze the neighbouring pixels over quite a large area. With a 10x10 kernel, every output pixel reads 100 input pixels, so the processing cost is roughly 100x that of a simple per-pixel operation on the same image. With a 100x100 kernel, it's roughly 10000x (unless you can optimize that particular operation with some clever FFT math or something like that).
On a PC, just time enfuse and compare it with some trivial brightness change in a video editor.
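To put rough numbers on the kernel argument, here is a minimal Python sketch that counts pixel reads instead of measuring time. It is not how enfuse or any real tone mapper works: the function names (`brightness`, `local_mean`, `cost_ratio`) are made up for illustration, and a plain neighbourhood mean stands in for whatever local analysis a real operator would do.

```python
def brightness(img, gain):
    """Point operation: one multiply per pixel, no neighbours needed."""
    ops = 0
    out = []
    for row in img:
        new_row = []
        for p in row:
            new_row.append(p * gain)
            ops += 1
        out.append(new_row)
    return out, ops


def local_mean(img, k):
    """Naive k x k neighbourhood mean (edges clamped): k*k reads per pixel."""
    h, w = len(img), len(img[0])
    r = k // 2
    ops = 0
    out = []
    for y in range(h):
        new_row = []
        for x in range(w):
            total = 0.0
            # Visit every pixel of the k x k window around (x, y).
            for dy in range(-r, k - r):
                for dx in range(-r, k - r):
                    yy = min(max(y + dy, 0), h - 1)
                    xx = min(max(x + dx, 0), w - 1)
                    total += img[yy][xx]
                    ops += 1
            new_row.append(total / (k * k))
        out.append(new_row)
    return out, ops


def cost_ratio(width, height, k):
    """How many times more pixel reads the local filter needs than the point op."""
    img = [[float((x + y) % 256) for x in range(width)] for y in range(height)]
    _, point_ops = brightness(img, 1.2)
    _, local_ops = local_mean(img, k)
    return local_ops / point_ops


if __name__ == "__main__":
    print(cost_ratio(32, 24, 10))
```

The ratio is k*k no matter how big the image is, so a 100x100 kernel gives 10000 by the same count. Separable filters or FFT tricks can bring that down for some operations, but only if the local operator happens to allow it.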
Second, remember that we don't know how to program Canon's image processing chip.
Third, good luck designing the math formulas to do that.