Until now, we didn't know much about how to configure the
EDMAC. Recently we did some experiments that cleared up a large part of the mystery.
Let's start with the size parameters. They are labeled xa, xb, xn, ya, yb, yn, off1a, off1b, off2a, off2b, off3 (names taken from debug strings). Their meaning was largely unknown, and so far we only used the following configuration:
xb = width
yb = height-1
off1b = padding after each line
Let's start with the simplest configuration (memcpy):
xb = size in bytes.
Unfortunately, it doesn't work - the image height must be at least 2.
Simplest WxH
How Canon code sets it up:
CalculateEDmacOffset(edmac_info, 720*480, 480):
xb=0x1e0, yb=0x2cf
Transfer model (what the EDMAC does, in a compact notation):
xb * (yb+1) (xb transferred once, then repeated yb more times)
WxH + padding (skip after each line)
(xb, skip off1b) * (yb+1)
Note: skipping does not change the contents of the memory,
so the above is pretty much the same as:
(xb, skip off1b) * yb
followed by xb (without skip)
xa, xb, xn (usual raw buffer configuration)
xa = xb = width
xn = height-1
To see what xa and xn do, let's look at some more examples (how Canon code configures them):
edmac_setup_size(ch, 0x1000000):
xn=0x1000, xa=0x1000
edmac_setup_size(6, 76800):
xa=0x1000, xb=0xC00, xn=0x12
CalculateEDmacOffset(edmac_info, 0x100000, 0x20):
xa=0x20, xb=0x20, yb=0xfff, xn=0x7
CalculateEDmacOffset(edmac_info, 1920*1080, 240):
xa=0xf0, xb=0xf0, yb=0xb3f, xn=0x2
The above can be explained by a transfer model like this:
(xa * xn + xb) * (yb+1)
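A quick way to double-check this model is to plug Canon's numbers back into it. The sketch below does just that; the function names are made up for illustration, and any parameter not listed in a Canon call is assumed to be 0. The first call (0x1000000) lists no xb, so it is left out here.

```c
#include <assert.h>
#include <stdint.h>

/* Bytes moved by the no-offset model: (xa * xn + xb) * (yb + 1).
 * Hypothetical helper; unlisted parameters are assumed to be 0. */
uint32_t edmac_simple_size(uint32_t xa, uint32_t xb, uint32_t xn, uint32_t yb)
{
    return (xa * xn + xb) * (yb + 1);
}

/* Re-checks the Canon configurations quoted above against the model. */
void check_canon_examples(void)
{
    /* edmac_setup_size(6, 76800) */
    assert(edmac_simple_size(0x1000, 0xC00, 0x12, 0) == 76800u);
    /* CalculateEDmacOffset(edmac_info, 0x100000, 0x20) */
    assert(edmac_simple_size(0x20, 0x20, 0x7, 0xfff) == 0x100000u);
    /* CalculateEDmacOffset(edmac_info, 1920*1080, 240) */
    assert(edmac_simple_size(0xf0, 0xf0, 0x2, 0xb3f) == 1920u * 1080u);
    /* CalculateEDmacOffset(edmac_info, 720*480, 480) */
    assert(edmac_simple_size(0, 0x1e0, 0, 0x2cf) == 720u * 480u);
}
```

Calling check_canon_examples() should pass silently if the model matches all four configurations.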
Adding ya, yn (to xa, xb, xn, yb)
Some experiments (trial and error, 5D3):
xa = 3276, xb = 1638, xn = 1055 => 3276*1055 + 1638 transferred
xa = 3276, xb = 32, xn = 1055 => 3276*1055 + 32
xa = 3276, xb = 0, xn = 1055 => 3276*1056 - 20 (?!)
xa = 3276, xb = 3276, xn = 95, yb = 10 => 3276*96*11
xa = 3276, xb = 3276, xn = 95, yb = 7, yn = 3 => 3276*96*11
xa = 3276, xb = 3276, xn = 10, yb = 62, yn = 33 => 3276*11*96
xa = 3276, xb = 3276, xn = 10, yb=3, yn=5, ya=2 => 3276*11*19
xa = 3276, xb = 3276, xn = 10, yb=5, yn=3, ya=6 => 3276*11*27
xa = 3276, xb = 3276, xn = 10, yb=5, yn=3, ya=7 => 3276*11*30
xa = 3276, xb = 3276, xn = 10, yb=7, yn=8, ya=9 => 3276*11*88
xa = 3276, xb = 3276, xn = 10, yb=8, yn=3, ya=28 => 3276*11*96
(xa * xn + xb) REP (yn REP ya + yb)
Here, a REP b means 'perform a, repeat b times' => a * (b+1).
So far, so good, the above model appears to explain the behavior
when there are no offsets, and looks pretty simple.
There is a quirk: if xb = 0, the behavior looks strange.
Let's ignore it for now.
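The REP notation is easy to get wrong by one, so here is a small sketch that evaluates the model and compares it with the 5D3 experiments listed above. The helper names are hypothetical, and the xb = 0 quirk is deliberately not modeled.

```c
#include <assert.h>
#include <stdint.h>

/* "a REP b" from the text: perform a, then repeat it b more times. */
uint64_t rep(uint64_t a, uint64_t b)
{
    return a * (b + 1);
}

/* Total bytes for (xa * xn + xb) REP (yn REP ya + yb), all offsets zero.
 * Hypothetical helper; does not model the xb = 0 quirk. */
uint64_t edmac_total_bytes(uint32_t xa, uint32_t xb, uint32_t xn,
                           uint32_t ya, uint32_t yb, uint32_t yn)
{
    uint64_t line = (uint64_t)xa * xn + xb;
    return rep(line, rep(yn, ya) + yb);
}

/* Re-checks the experiments above; arguments are (xa, xb, xn, ya, yb, yn). */
void check_5d3_experiments(void)
{
    assert(edmac_total_bytes(3276, 1638, 1055, 0, 0, 0) == 3276ull * 1055 + 1638);
    assert(edmac_total_bytes(3276, 32, 1055, 0, 0, 0)   == 3276ull * 1055 + 32);
    assert(edmac_total_bytes(3276, 3276, 95, 0, 10, 0)  == 3276ull * 96 * 11);
    assert(edmac_total_bytes(3276, 3276, 95, 0, 7, 3)   == 3276ull * 96 * 11);
    assert(edmac_total_bytes(3276, 3276, 10, 0, 62, 33) == 3276ull * 11 * 96);
    assert(edmac_total_bytes(3276, 3276, 10, 2, 3, 5)   == 3276ull * 11 * 19);
    assert(edmac_total_bytes(3276, 3276, 10, 6, 5, 3)   == 3276ull * 11 * 27);
    assert(edmac_total_bytes(3276, 3276, 10, 7, 5, 3)   == 3276ull * 11 * 30);
    assert(edmac_total_bytes(3276, 3276, 10, 9, 7, 8)   == 3276ull * 11 * 88);
    assert(edmac_total_bytes(3276, 3276, 10, 28, 8, 3)  == 3276ull * 11 * 96);
}
```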
Adding off1b (to xa, xb, xn, ya, yb, yn)
What do we do about the offset off1b?
Experiment:
xa = 3276, xb = 3276, xn = 10, yb=95, off1b=100
=> copied 3276*10*96 + 3276, skipped 100,
(CP 3276, SK 100) repeated 94 times (95 runs).
It copies a large block, then it starts skipping after each line.
Let's decompose our model and reorder the terms.
Then, let's skip off1b after each xb.
(xa * xn) REP (yn REP ya + yb)
(xb, skip off1b) REP (yn REP ya + yb)
Let's check a more complex scenario:
xa = 3276, xb = 3276, xn = 10, yb=8, yn=3, ya=28, off1b=100
=> (CP 3276*10*29 + 3276, SK 100), (CP 3276, SK 100) * 27,
(CP 3276*10*29 + 3276*2, SK 100), (CP 3276, SK 100) * 27,
(CP 3276*10*29 + 3276*2, SK 100), (CP 3276, SK 100) * 27,
(CP 3276*10*9 + 3276*2, SK 100), (CP 3276, SK 100) * 8.
There's some big operation that appears repeated 3 times (yn),
although the copied block sizes are a little inconsistent (first is smaller).
After that, (xa * xn) is executed 9 times (yb+1).
At the end, (xb, skip off1b) is executed 9 times (also yb+1).
In the big operation, the 29 is clearly ya+1.
What if off1b is skipped after all xb iterations, but not the last one?
This could explain why we have an extra 3276 (the *2) on the last 3 log lines.
Regroup the terms like this:
=> ((CP 3276*10*29), (CP 3276, SK 100) * 28, CP 3276) * 3,
(CP 3276*10*9 ), (CP 3276, SK 100) * 9.
Our model starts to look like this:
(
(xa * xn) * (ya+1)
(xb, skip off1b) * ya
xb without skip
)
* yn
followed by:
(xa * xn) * (yb+1)
(xb, skip off1b) * (yb+1)
So far so good, it's a bit more complex,
but explains all the above observations.
Of course, the last line may just as well be:
(xb, skip off1b) * yb, xb without skip
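To see whether this intermediate model really reproduces the raw log (before regrouping), here is a small sketch that replays it and merges adjacent copies the same way the log does. All names are hypothetical, only off1b is modeled, and the skip after the very last xb is kept (the first of the two variants above).

```c
#include <assert.h>
#include <stddef.h>

#define MAX_RUNS 1024

/* One log entry: bytes copied contiguously, then bytes skipped. */
struct run { long copy; long skip; };

struct run runs[MAX_RUNS];
size_t nruns;
long pending;   /* bytes copied since the last skip */

void copy_bytes(long n) { pending += n; }

void skip_bytes(long n)
{
    assert(nruns < MAX_RUNS);
    runs[nruns].copy = pending;
    runs[nruns].skip = n;
    nruns++;
    pending = 0;
}

/* Replays the off1b-only model and fills runs[] with the merged log. */
void simulate_off1b(long xa, long xb, long xn,
                    long ya, long yb, long yn, long off1b)
{
    nruns = 0;
    pending = 0;
    for (long jn = 0; jn <= yn; jn++) {
        long y = (jn < yn) ? ya : yb;
        int last_block = (jn == yn);
        copy_bytes(xa * xn * (y + 1));        /* (xa * xn) * (y+1) */
        for (long j = 0; j <= y; j++) {
            copy_bytes(xb);
            if (j < y || last_block)          /* no skip after the last xb of an "a" block */
                skip_bytes(off1b);
        }
    }
}
```

With xa = xb = 3276, xn = 10, ya = 28, yb = 8, yn = 3 and off1b = 100, runs[0] should hold 3276*10*29 + 3276 bytes, and the next big entry 3276*10*29 + 3276*2, as in the log above.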
Adding off1a
Let's try another offset: off1a = 44.
The log from this experiment is pretty long, so I'll simplify it by regrouping the terms.
xa = 3276, xb = 3276, xn = 10, yb=8, yn=3, ya=28, off1a=44, off1b=100
=> (
((CP 3276, SK 44) * 28, CP 3276) * 10,
((CP 3276, SK 100) * 28, CP 3276),
) * 3,
(
((CP 3276, SK 44) * 8, CP 3276) * 10,
((CP 3276, SK 100) * 8, CP 3276)
)
This gives good hints about what happens, and in which order:
(
((xa, skip off1a) * ya, xa) * xn
(xb, skip off1b) * ya, xb
) * yn,
(
((xa, skip off1a) * yb, xa) * xn
(xb, skip off1b) * yb, xb
)
Adding the remaining offsets (all parameters are now used)
Let's add off2a, off2b and off3. They are pretty obvious now, so I'll skip the log file (which looks quite intimidating anyway).
(
((xa, skip off1a) * ya, xa, skip off2a) * xn
(xb, skip off1b) * ya, xb,
skip off3
) * yn,
(
((xa, skip off1a) * yb, xa, skip off2b) * xn
(xb, skip off1b) * yb, xb
)
So, there is a pattern: perform N iterations with some settings, then perform the last iteration with slightly different parameters. The pattern repeats at all iteration levels (somewhat like fractals).
Just by looking at the memory contents, we can't tell which skip value is used for the very last iteration. However, by reading the memory address register (0x08) directly from hardware (not from the shadow memory), we can get the end address (after the EDMAC transfer has finished). For a write transfer, this includes the transferred data and also the skip offsets. Now it's straightforward to notice that the last offset is off3, so our final model for EDMAC becomes:
EDMAC transfer model
(
((xa, skip off1a) * ya, xa, skip off2a) * xn
(xb, skip off1b) * ya, xb, skip off3
) * yn,
(
((xa, skip off1a) * yb, xa, skip off2b) * xn
(xb, skip off1b) * yb, xb, skip off3
)
The offset labels now start to make sense.

C code (used in qemu):
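/* outer level: yn iterations with the "a" settings, then a last one with the "b" settings */
for (int jn = 0; jn <= yn; jn++)
{
    int y    = (jn < yn) ? ya    : yb;
    int off2 = (jn < yn) ? off2a : off2b;

    /* middle level: xn blocks of xa bytes, then a last block of xb bytes */
    for (int in = 0; in <= xn; in++)
    {
        int x     = (in < xn) ? xa    : xb;
        int off1  = (in < xn) ? off1a : off1b;
        int off23 = (in < xn) ? off2  : off3;

        /* inner level: y+1 lines; off1 after each line, off2/off3 after the last one */
        for (int j = 0; j <= y; j++)
        {
            int off = (j < y) ? off1 : off23;
            cpu_physical_memory_write(dst, src, x);
            src += x;
            dst += x + off;
        }
    }
}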
The above model is for write operations. For read, the skip offsets are applied to the source buffer - that's the only difference.
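To make the read/write difference concrete, here is a standalone sketch of the same loop nest running on ordinary memory, with memcpy standing in for cpu_physical_memory_write. The struct, enum and function names are hypothetical (this is not the QEMU code), and the return value mimics the end address discussed earlier: how far the strided pointer advanced, skips included.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical container; field names follow the debug strings. */
struct edmac_params {
    int xa, xb, xn, ya, yb, yn;
    int off1a, off1b, off2a, off2b, off3;
};

enum edmac_dir { EDMAC_WRITE, EDMAC_READ };

/* Runs the transfer model between two plain buffers.
 * WRITE: offsets are applied to dst. READ: offsets are applied to src.
 * Returns the advance of the strided pointer (data plus skips). */
ptrdiff_t edmac_simulate(const struct edmac_params *p, enum edmac_dir dir,
                         unsigned char *dst, const unsigned char *src)
{
    assert(p && dst && src);
    ptrdiff_t s = 0, d = 0;
    for (int jn = 0; jn <= p->yn; jn++) {
        int y    = (jn < p->yn) ? p->ya    : p->yb;
        int off2 = (jn < p->yn) ? p->off2a : p->off2b;
        for (int in = 0; in <= p->xn; in++) {
            int x     = (in < p->xn) ? p->xa    : p->xb;
            int off1  = (in < p->xn) ? p->off1a : p->off1b;
            int off23 = (in < p->xn) ? off2     : p->off3;
            for (int j = 0; j <= y; j++) {
                int off = (j < y) ? off1 : off23;
                memcpy(dst + d, src + s, (size_t)x);
                if (dir == EDMAC_WRITE) { s += x;       d += x + off; }
                else                    { s += x + off; d += x;       }
            }
        }
    }
    return (dir == EDMAC_WRITE) ? d : s;
}
```

For example, with xb = 4, yb = 2, off1b = 2 and everything else 0, a write transfer spreads 12 packed bytes over three 6-byte lines, and a read transfer with the same parameters packs them back.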
Offsets can be positive or negative. In particular, off1a and off1b only use 17 bits (DIGIC 3 and 4) or 19 bits (DIGIC 5), so they have to be sign-extended.
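Sign extension of such a narrow field can be sketched as below. This is a generic helper with a made-up name, and it assumes the offset sits in the low 17 or 19 bits of the value read back; how the register packs anything else is not covered here.

```c
#include <assert.h>
#include <stdint.h>

/* Interprets the low `bits` bits of raw as a two's complement number.
 * Hypothetical helper; bits = 17 for DIGIC 3/4, 19 for DIGIC 5. */
int32_t edmac_sign_extend(uint32_t raw, unsigned bits)
{
    assert(bits >= 1 && bits <= 31);
    uint32_t mask = (1u << bits) - 1;     /* keep only the field */
    uint32_t sign = 1u << (bits - 1);     /* top bit of the field */
    raw &= mask;
    /* flipping the sign bit and subtracting it maps the upper half to negatives */
    return (int32_t)(raw ^ sign) - (int32_t)sign;
}
```

For instance, a 17-bit field holding 0x1FF9C would come out as -100, while 100 stays 100.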
The above model explained all the combinations that are not edge cases (such as yb=0 or odd values). Here are the tests I've run: 5D3 vs QEMU.
For more details, please have a look at the "edmac" and "qemu" branches.
To be continued.