a1ex

  • Administrator
  • Hero Member
  • *****
  • Posts: 10177
  • 5D Mark Free
EDMAC internals
« on: November 26, 2016, 01:28:55 PM »
Until now, we didn't know much about how to configure the EDMAC. Recently we did some experiments that cleared up a large part of the mystery.

I'll start with the size parameters. They are labeled xa, xb, xn, ya, yb, yn, off1a, off1b, off2a, off2b, off3 (names taken from debug strings). Their meaning was largely unknown; so far we had only used the following configuration:

Code: [Select]
xb = width
yb = height-1
off1b = padding after each line

Let's start with the simplest configuration (memcpy):

Code: [Select]
xb = size in bytes. 

Unfortunately, it doesn't work - the image height must be at least 2.


Simplest WxH


How Canon code sets it up:
Code: [Select]
  CalculateEDmacOffset(edmac_info, 720*480, 480):
     xb=0x1e0, yb=0x2cf

Transfer model (what the EDMAC does, in a compact notation):
Code: [Select]
xb * (yb+1)        (xb bytes, then repeated yb more times)


WxH + padding (skip after each line)


Code: [Select]
(xb, skip off1b) * (yb+1)

Note: skipping does not change the contents of the memory,
so the above is pretty much the same as:
 
Code: [Select]
(xb, skip off1b) * yb
followed by xb (without skip)
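
A minimal sketch of this padded transfer in plain C, to make the notation concrete. This is not Canon or Magic Lantern code; copy_with_padding is a hypothetical name, and it assumes a write transfer (the skip is applied to the destination only):

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: emulates (xb, skip off1b) * (yb+1) for a write
 * transfer. The source is read contiguously; the destination position
 * advances by xb + off1b after each line.
 * Returns the end offset in dst (data plus skips). */
static size_t copy_with_padding(uint8_t *dst, const uint8_t *src,
                                size_t xb, size_t yb, size_t off1b)
{
    size_t d = 0;
    for (size_t line = 0; line <= yb; line++)
    {
        memcpy(dst + d, src + line * xb, xb);
        d += xb + off1b;    /* skipped bytes in dst are left untouched */
    }
    return d;
}
```

The skip after the last line only moves the end position; it does not touch memory, which is why both notations above describe the same memory contents.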


xa, xb, xn (usual raw buffer configuration)


Code: [Select]
xa = xb = width
xn = height-1

To see what xa and xn do, let's look at some more examples (how Canon code configures them):
Code: [Select]
  edmac_setup_size(ch, 0x1000000):
    xn=0x1000, xa=0x1000

  edmac_setup_size(6, 76800):
    xa=0x1000, xb=0xC00, xn=0x12

  CalculateEDmacOffset(edmac_info, 0x100000, 0x20):
    xa=0x20, xb=0x20, yb=0xfff, xn=0x7

  CalculateEDmacOffset(edmac_info, 1920*1080, 240):
    xa=0xf0, xb=0xf0, yb=0xb3f, xn=0x2

The above can be explained by a transfer model like this:
Code: [Select]
(xa * xn + xb) * (yb+1)
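
A quick way to cross-check this model against the Canon configurations listed above. The helper name is hypothetical, and parameters that do not appear in a given log line are assumed to be zero:

```c
#include <stdint.h>

/* Bytes moved by (xa * xn + xb) * (yb+1), per the model above.
 * Hypothetical helper, for illustration only. */
static uint64_t edmac_size_xy(uint32_t xa, uint32_t xb,
                              uint32_t xn, uint32_t yb)
{
    return ((uint64_t) xa * xn + xb) * ((uint64_t) yb + 1);
}
```

With the values from the logs: edmac_size_xy(0x1000, 0, 0x1000, 0) gives 0x1000000, edmac_size_xy(0x1000, 0xC00, 0x12, 0) gives 76800, edmac_size_xy(0x20, 0x20, 0x7, 0xfff) gives 0x100000 and edmac_size_xy(0xf0, 0xf0, 0x2, 0xb3f) gives 1920*1080.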


Adding ya, yn (to xa, xb, xn, yb)


Some experiments (trial and error, 5D3):
Code: [Select]
  xa = 3276, xb = 1638, xn = 1055                  => 3276*1055 + 1638 transferred
  xa = 3276, xb = 32,   xn = 1055                  => 3276*1055 + 32
  xa = 3276, xb = 0,    xn = 1055                  => 3276*1056 - 20 (?!)
  xa = 3276, xb = 3276, xn = 95, yb = 10           => 3276*96*11
  xa = 3276, xb = 3276, xn = 95, yb = 7,  yn = 3   => 3276*96*11
  xa = 3276, xb = 3276, xn = 10, yb = 62, yn = 33  => 3276*11*96
  xa = 3276, xb = 3276, xn = 10, yb=3, yn=5, ya=2  => 3276*11*19
  xa = 3276, xb = 3276, xn = 10, yb=5, yn=3, ya=6  => 3276*11*27
  xa = 3276, xb = 3276, xn = 10, yb=5, yn=3, ya=7  => 3276*11*30
  xa = 3276, xb = 3276, xn = 10, yb=7, yn=8, ya=9  => 3276*11*88
  xa = 3276, xb = 3276, xn = 10, yb=8, yn=3, ya=28 => 3276*11*96

Code: [Select]
(xa * xn + xb) REP (yn REP ya + yb)

Here, a REP b means 'perform a, repeat b times' => a * (b+1).
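
The REP notation can be spelled out in C to check it against the experiments above. This is only a sketch; rep and edmac_size_no_offsets are hypothetical names, and all offsets are assumed to be zero:

```c
#include <stdint.h>

/* a REP b: perform a, then repeat it b more times => a * (b + 1) */
static uint64_t rep(uint64_t a, uint64_t b)
{
    return a * (b + 1);
}

/* (xa * xn + xb) REP (yn REP ya + yb), with no offsets involved */
static uint64_t edmac_size_no_offsets(uint64_t xa, uint64_t xb, uint64_t xn,
                                      uint64_t ya, uint64_t yb, uint64_t yn)
{
    return rep(xa * xn + xb, rep(yn, ya) + yb);
}
```

For example, xn=10, yb=8, yn=3, ya=28 gives a repeat count of 3*29 + 8 = 95, so the line of 3276*11 bytes is transferred 96 times, matching the last row of the experiments.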

So far, so good, the above model appears to explain the behavior
when there are no offsets, and looks pretty simple.

There is a quirk: if xb = 0, the behavior looks strange.
Let's ignore it for now.


Adding off1b (to xa, xb, xn, ya, yb, yn)


What do we do about the offset off1b?

Experiment:
Code: [Select]
xa = 3276, xb = 3276, xn = 10, yb=95, off1b=100
=> copied 3276*10*96 + 3276, skipped 100,
   (CP 3276, SK 100) repeated 94 times (95 runs).

It copies a large block, then it starts skipping after each line.
Let's decompose our model and reorder the terms.
Then, let's skip off1b after each xb.

Code: [Select]
(xa * xn)        REP (yn REP ya + yb)
(xb, skip off1b) REP (yn REP ya + yb)

Let's check a more complex scenario:
Code: [Select]
xa = 3276, xb = 3276, xn = 10, yb=8, yn=3, ya=28, off1b=100
=> (CP 3276*10*29 + 3276,   SK 100), (CP 3276, SK 100) * 27,
   (CP 3276*10*29 + 3276*2, SK 100), (CP 3276, SK 100) * 27,
   (CP 3276*10*29 + 3276*2, SK 100), (CP 3276, SK 100) * 27,
   (CP 3276*10*9  + 3276*2, SK 100), (CP 3276, SK 100) * 8.

There's some big operation that appears repeated 3 times (yn),
although the copied block sizes are a little inconsistent (first is smaller).

After that, (xa * xn) is executed 9 times (yb+1).
At the end, (xb, skip off1b) is executed 9 times (also yb+1).

In the big operation, the 29 is clearly ya+1.

What if off1b is skipped after every xb iteration except the last one?
This could explain why we have an extra 3276 (the *2) on the last 3 log lines.

Regroup the terms like this:
Code: [Select]
  => ((CP 3276*10*29), (CP 3276, SK 100) * 28, CP 3276) * 3,
      (CP 3276*10*9 ), (CP 3276, SK 100) * 9.

Our model starts to look like this:
Code: [Select]
(
   (xa * xn)        * (ya+1)
   (xb, skip off1b) *  ya
    xb without skip
)
  * yn

followed by:

   (xa * xn)        * (yb+1)
   (xb, skip off1b) * (yb+1)

So far so good, it's a bit more complex,
but explains all the above observations.
Of course, the last line may just as well be:
Code: [Select]
  (xb, skip off1b) * yb, xb without skip
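
To double-check that this intermediate model reproduces the raw log (before regrouping), here is a small sketch that expands it into "(CP n, SK m)" entries. The type and function names are hypothetical; a copy that is not followed by a skip is merged into the next entry, which is how it would show up when looking at memory contents:

```c
#include <stddef.h>
#include <stdint.h>

/* One "(CP copy, SK skip)" log entry */
typedef struct { uint32_t copy; uint32_t skip; } cp_sk;

/* Hypothetical helper: expands the off1b-only model above.
 * Returns the number of entries written (never more than max_out). */
static size_t expand_off1b_model(uint32_t xa, uint32_t xb, uint32_t xn,
                                 uint32_t ya, uint32_t yb, uint32_t yn,
                                 uint32_t off1b, cp_sk *out, size_t max_out)
{
    size_t n = 0;
    uint32_t pending = 0;               /* bytes copied since the last skip */

    for (uint32_t jn = 0; jn <= yn; jn++)
    {
        int last_block = (jn == yn);
        uint32_t y = last_block ? yb : ya;

        pending += xa * xn * (y + 1);   /* (xa * xn) * (y+1), contiguous */

        for (uint32_t j = 0; j <= y; j++)
        {
            pending += xb;
            /* no skip after the last xb of a ya block */
            if (j < y || last_block)
            {
                if (n < max_out)
                {
                    out[n].copy = pending;
                    out[n].skip = off1b;
                    n++;
                }
                pending = 0;
            }
        }
    }
    return n;
}
```

With xa = xb = 3276, xn = 10, yb = 8, yn = 3, ya = 28, off1b = 100, the first entry is CP 3276*10*29 + 3276, entries 28 and 56 are CP 3276*10*29 + 3276*2, and entry 84 is CP 3276*10*9 + 3276*2, as in the log above.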


Adding off1a


Let's try another offset: off1a = 44.
The log from this experiment is pretty long, so I'll simplify it by regrouping the terms.

Code: [Select]
xa = 3276, xb = 3276, xn = 10, yb=8, yn=3, ya=28, off1a=44, off1b=100
=> (
     ((CP 3276, SK 44)  * 28, CP 3276) * 10,
     ((CP 3276, SK 100) * 28, CP 3276),
   ) * 3,
   (
     ((CP 3276, SK 44)  * 8, CP 3276) * 10,
     ((CP 3276, SK 100) * 8, CP 3276)
   )

This gives a good hint about what happens and when:
Code: [Select]
(
   ((xa, skip off1a) * ya, xa) * xn
    (xb, skip off1b) * ya, xb
) * yn,

(
   ((xa, skip off1a) * yb, xa) * xn
    (xb, skip off1b) * yb, xb
)


Adding the remaining offsets (all parameters are now used)


Let's add off2a, off2b and off3. They are pretty obvious now, so I'll skip the log file (which looks quite intimidating anyway).

Code: [Select]
(
   ((xa, skip off1a) * ya, xa, skip off2a) * xn
    (xb, skip off1b) * ya, xb,
     skip off3
) * yn,

(
   ((xa, skip off1a) * yb, xa, skip off2b) * xn
    (xb, skip off1b) * yb, xb
)

So, there is a pattern: perform N iterations with some settings, then perform the last iteration with slightly different parameters. The pattern repeats at all iteration levels (somewhat like fractals).

Just by looking at the memory contents, we can't tell which skip value is used after the very last iteration. However, by reading the memory address register (0x08) directly from hardware (not from the shadow memory), we can get the end address (after the EDMAC transfer has finished). For a write transfer, this includes the transferred data and also the skip offsets. From there, it's straightforward to see that the last offset is off3, so our final model for EDMAC becomes:


EDMAC transfer model


Code: [Select]
(
   ((xa, skip off1a) * ya, xa, skip off2a) * xn
    (xb, skip off1b) * ya, xb, skip off3
) * yn,

(
   ((xa, skip off1a) * yb, xa, skip off2b) * xn
    (xb, skip off1b) * yb, xb, skip off3
)
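
From this model, the expected end address (relative to the start address) can be worked out in closed form: total data plus total skips. A sketch, with hypothetical struct and function names; signed 64-bit values are used because offsets may be negative:

```c
#include <stdint.h>

/* Hypothetical parameter block; field names follow the debug strings. */
typedef struct {
    int64_t xa, xb, xn, ya, yb, yn;
    int64_t off1a, off1b, off2a, off2b, off3;
} edmac_params;

/* Bytes actually transferred: (xa*xn + xb) * (yn*(ya+1) + (yb+1)) */
static int64_t edmac_total_data(const edmac_params *p)
{
    return (p->xa * p->xn + p->xb) * (p->yn * (p->ya + 1) + p->yb + 1);
}

/* Sum of all skips, including the trailing off3 */
static int64_t edmac_total_skip(const edmac_params *p)
{
    /* one of the first yn blocks */
    int64_t block_a = p->xn * (p->ya * p->off1a + p->off2a)
                    + p->ya * p->off1b + p->off3;
    /* the final block */
    int64_t block_b = p->xn * (p->yb * p->off1a + p->off2b)
                    + p->yb * p->off1b + p->off3;
    return p->yn * block_a + block_b;
}

/* End position relative to the start address, on the side where skips apply */
static int64_t edmac_end_offset(const edmac_params *p)
{
    return edmac_total_data(p) + edmac_total_skip(p);
}
```

For the off1a experiment above (xa = xb = 3276, xn = 10, yb = 8, yn = 3, ya = 28, off1a = 44, off1b = 100, other offsets 0), this gives 3459456 bytes of data and 49680 bytes of skips.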

The offset labels now start to make sense :)

C code (used in qemu):
Code: [Select]
for (int jn = 0; jn <= yn; jn++)
{
    int y     = (jn < yn) ? ya    : yb;
    int off2  = (jn < yn) ? off2a : off2b;
    for (int in = 0; in <= xn; in++)
    {
        int x     = (in < xn) ? xa    : xb;
        int off1  = (in < xn) ? off1a : off1b;
        int off23 = (in < xn) ? off2  : off3;
        for (int j = 0; j <= y; j++)
        {
            int off = (j < y) ? off1 : off23;
            cpu_physical_memory_write(dst, src, x);
            src += x;
            dst += x + off;
        }
    }
}

The above model is for write operations. For read, the skip offsets are applied to the source buffer - that's the only difference.
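
For experimenting outside QEMU, the same loop can be written against plain buffers, with a flag selecting which side gets the skip offsets. This is a sketch rather than the actual emulation code: edmac_cfg and edmac_emulate are hypothetical names, and memcpy stands in for cpu_physical_memory_write:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical parameter block; field names follow the debug strings. */
typedef struct {
    int xa, xb, xn, ya, yb, yn;
    int off1a, off1b, off2a, off2b, off3;
} edmac_cfg;

/* is_write != 0: skips are applied to dst; otherwise they are applied to src.
 * Returns the end offset on the side where the skips are applied. */
static ptrdiff_t edmac_emulate(uint8_t *dst, const uint8_t *src,
                               const edmac_cfg *c, int is_write)
{
    ptrdiff_t pos = 0;      /* position on the side that skips */

    for (int jn = 0; jn <= c->yn; jn++)
    {
        int y    = (jn < c->yn) ? c->ya    : c->yb;
        int off2 = (jn < c->yn) ? c->off2a : c->off2b;
        for (int in = 0; in <= c->xn; in++)
        {
            int x     = (in < c->xn) ? c->xa    : c->xb;
            int off1  = (in < c->xn) ? c->off1a : c->off1b;
            int off23 = (in < c->xn) ? off2     : c->off3;
            for (int j = 0; j <= y; j++)
            {
                int off = (j < y) ? off1 : off23;
                memcpy(dst, src, (size_t) x);
                if (is_write) { src += x;       dst += x + off; }
                else          { src += x + off; dst += x;       }
                pos += x + off;
            }
        }
    }
    return pos;
}
```

The caller is responsible for making both buffers large enough for the chosen parameters, including negative offsets.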

Offsets can be positive or negative. In particular, off1a and off1b only use 17 bits (digic 3 and 4) or 19 bits (digic 5), so we have to extend the sign.
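
A minimal sign-extension sketch. The helper name is hypothetical, and it assumes the offset occupies the low bits of the register value, which is not stated above:

```c
#include <stdint.h>

/* Sign-extend the low `bits` bits of `value` (e.g. bits = 17 or 19).
 * Assumes 1 <= bits <= 31; higher bits of `value` are ignored. */
static int32_t sign_extend(uint32_t value, int bits)
{
    uint32_t mask = (1u << bits) - 1;
    uint32_t sign = 1u << (bits - 1);
    value &= mask;
    /* flipping the sign bit, then subtracting it, maps
     * [sign, mask] onto [-sign, -1] and leaves [0, sign-1] unchanged */
    return (int32_t) (value ^ sign) - (int32_t) sign;
}
```

For example, sign_extend(0x1FFFF, 17) is -1, while the same raw value treated as 19 bits, sign_extend(0x1FFFF, 19), is 131071.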

The above model explains all the combinations that are not edge cases (such as yb=0 or odd values). Here are the tests I've run: 5D3 vs QEMU.

For more details, please have a look at the "edmac" and "qemu" branches.

To be continued.

g3gg0

  • Developer
  • Hero Member
  • *****
  • Posts: 3024
Re: EDMAC internals
« Reply #1 on: November 26, 2016, 01:50:53 PM »
after alex found out how to correctly configure the edmac, it was easy to match it with patents.

https://www.google.de/patents/US7817297 see fig 11a
the description matches the reverse engineered information

a1ex

Re: EDMAC internals
« Reply #2 on: December 12, 2016, 01:21:37 AM »
Some pictures showing the EDMAC usage on 5D3 LiveView:




a1ex

Re: EDMAC internals
« Reply #3 on: December 12, 2016, 09:22:25 PM »
5D3 photo mode:





TTJ = TwoInTwoOutJpegPath
TTL = TwoInTwoOutLosslessPath

a1ex

Re: EDMAC internals
« Reply #4 on: June 20, 2017, 04:19:03 PM »
Committed the test code that outputs the raw logs used to figure out the EDMAC model, just in case anyone would like to play with it.

This code can be used for cross-checking the EDMAC behavior against our understanding of what it does, by running it on both a real camera and QEMU - the logs should match.

PR ready.

a1ex

Re: EDMAC internals
« Reply #5 on: August 17, 2017, 01:37:34 AM »
Something easier: playing back an image.



Translation:
- the graph shows only the steps performed on the image processing engine
- first step: some memcpy of size 3840x1079, using connection <6> which is pass-through (probably zeroing out some buffer); note the input is a single line repeated many times.
- second step: a quick JPEG read pass (reading the embedded JPEG from the CR2 to find its metadata).
- third step: decoding the JPEG (it reads the same JPEG again, but this time outputs a 2880x960 YUV image in some unusual order); input from connection <5>, output on <3>, using JPCORE.
- fourth step: resizing the 2880x960 YUV to 1440x480 (all these sizes were in bytes, so the end result is 720x480 pixels - the displayed image). Input and output on connection <3>.

The last configuration probably shows that data arriving at a connection is not automatically forwarded to the other end of that connection (exception: connections 6 and 7 simply copy the input data to the output).

A real-time connection diagram is available on the "edmac" branch, but it only shows the latest state (so, in this case of playing back an image, it will only show the last processing step, because the EDMAC channels are reused).

After some playing around, I've got a few snapshots throughout the image playback process (captured one screenshot on each StartEDmac call):