Intercepting DebugMsg with cache hacks

Started by a1ex, August 28, 2012, 10:56:48 AM

Previous topic - Next topic

0 Members and 1 Guest are viewing this topic.

nikfreak

just gave it a quick try after a long time (=months).
Best thing is that dm.log doesn't get overwritten like in past:

https://drive.google.com/drive/folders/0B9Mu66yg5QzROGV6aTNWTlJNbFE?usp=sharing
[size=8pt]70D.112 & 100D.101[/size]

a1ex

You can log code sequences from other branches as well. Here's an example for debugging compressed_raw on 5D3 1.2.3:

Makefile.user:

CONFIG_DEBUG_INTERCEPT = y


Merge the two branches:

hg update compressed_raw -C
hg merge dm-spy-experiments


(there will be some merge conflicts, sorry about that, I'm still novice in this area)
(maybe it's best to cleanup the dm-spy-experiments branch a bit and merge it in mainline)
(ignore mlv_lite and mlv_rec for now, as we are not going to use them - just merge the other files)

Change the code to enable logging:

diff -r ad3969261ccf modules/silent/lossless.c
--- a/modules/silent/lossless.c
+++ b/modules/silent/lossless.c
@@ -3,6 +3,7 @@
#include "lens.h"
#include "raw.h"
#include "edmac.h"
+#include "dm-spy.h"

struct TwoInTwoOutLosslessPath_args
{
@@ -251,10 +252,18 @@

int lossless_compress_raw(struct raw_info * raw_info, struct memSuite * output_memsuite)
{
-    return lossless_compress_raw_rectangle(
+    /* start logging */
+    debug_intercept();
+
+    int result = lossless_compress_raw_rectangle(
         output_memsuite, raw_info->buffer,
         raw_info->width, 0, 0,
         raw_info->width, raw_info->height
     );
+
+    /* stop logging and save to file */
+    debug_intercept();
+
+    return result;
}


Now compile ML core and the silent picture module, place them on the card, and take a full-res silent picture with lossless compression (so the routine we have modified will get called in a quiet context - with LiveView turned off).


cd platform/5D3.123
make clean; make
make installq       # this copies only autoexec and sym file
cd ../../modules/silent
make clean; make
make install        # copy only this module to your card


You should get a file named dm-000.log in the root of the card, that contains these lines (among many others):

172EB> shoot_task:000aeef0:00:00: *** register_interrupt("Jp57IntrHandler", 0xa3, 0xff291e2c, 0x0), from ff3dcc78
17882> shoot_task:ff3dd01c:16:03: [TTL] START WR1:0x18800064 WR2:0x15a0c000
178B2> shoot_task:ff3dd048:16:03: [TTL] START RD1:0x86051a8 RD2:0xad24490
34733> **INT-A3h*:ff3dcc08:16:03: [TTL][0,0,0] JpCoreCompleteCBR( 0x15336af )
3476D> **INT-83h*:00a0e980:00:00: LosslessCompleteCBR









Want more details?

Let's enable interrupts and ENGIO messages:

diff -r 9146aa47814c src/dm-spy-extra.c
--- a/src/dm-spy-extra.c
+++ b/src/dm-spy-extra.c
@@ -20,7 +20,7 @@
#include "timer.h"

/* this needs pre_isr_hook/post_isr_hook stubs */
-//~ #define LOG_INTERRUPTS
+#define LOG_INTERRUPTS

/* delay mpu_send calls, to allow grouping MPU messages by timestamps */
#define MPU_DELAY_SEND
@@ -338,11 +338,11 @@
     { 0xff2953b8, "LockEngineResources", 1, LockEngineResources_log },
     { 0xff29555c, "UnLockEngineResources", 1, UnLockEngineResources_log},

-    //~ { 0xff290f98, "engio_write", 1, engio_write_log},   /* on 5D3, Canon's engio_write is silent */
-    //~ { 0xff290c80, "EngDrvOut", 2 },                     /* same here */
-    //~ { 0xff290ca0, "EngDrvIn", 1 },
-    //~ { 0xff290d38, "EngDrvBits", 3, engdrvbits_log},
-    //~ { 0xff290cd4, "EngDrvIns", 3 },
+    { 0xff290f98, "engio_write", 1, engio_write_log},   /* on 5D3, Canon's engio_write is silent */
+    { 0xff290c80, "EngDrvOut", 2 },                     /* same here */
+    { 0xff290ca0, "EngDrvIn", 1 },
+    { 0xff290d38, "EngDrvBits", 3, engdrvbits_log},
+    { 0xff290cd4, "EngDrvIns", 3 },
     /* only EngDrvOuts is verbose, no need to log it here */
     
     //~ { 0xFF694880, "eeko_setup_global_vars", 10 },
diff -r 9146aa47814c src/dm-spy.c
--- a/src/dm-spy.c
+++ b/src/dm-spy.c
@@ -52,8 +52,6 @@
{
     if (!buf) return;
         
-    if (class == 21) // engio, lots of messages
-        return;
     
     #ifdef CONFIG_7D
     if (class == 0x3E) // IPC, causes ERR70


Lots of messages now! Let's look at some of them.

How does SetEDmac work?


CDD68> shoot_task:000af06c:00:00: *** SetEDmac(0x11, 0x58800064, 0x1c42ac, 0x20000001), from ff580f28
CDD80> shoot_task:000af168:00:00:     *0x1c42ac = { 0 0 0 0  0 0 2000 0  fff 0 0 a0f7f8 ... }
CDDBD> shoot_task:ff290e18:15:02: [ENGIO](Addr:0xc0f2610c, Data:0x       0)[8]EngDrvOuts
CDDE5> shoot_task:ff290e18:15:02: [ENGIO](Addr:0xc0f26110, Data:0x fff2000)[7]EngDrvOuts
CDE0F> shoot_task:ff290e18:15:02: [ENGIO](Addr:0xc0f26114, Data:0x       0)[6]EngDrvOuts
CDE32> shoot_task:ff290e18:15:02: [ENGIO](Addr:0xc0f26118, Data:0x       0)[5]EngDrvOuts
CDE55> shoot_task:ff290e18:15:02: [ENGIO](Addr:0xc0f2611c, Data:0x       0)[4]EngDrvOuts
CDE79> shoot_task:ff290e18:15:02: [ENGIO](Addr:0xc0f26120, Data:0x       0)[3]EngDrvOuts
CDE9C> shoot_task:ff290e18:15:02: [ENGIO](Addr:0xc0f26124, Data:0x       0)[2]EngDrvOuts
CDEBF> shoot_task:ff290e18:15:02: [ENGIO](Addr:0xc0f26128, Data:0x       0)[1]EngDrvOuts
CDEE5> shoot_task:000125c0:15:01: ResetEDmac(0x20000001)
CDF1F> shoot_task:000af06c:00:00: *** EngDrvOut(0xc0f26104, 0x20000001), from 125d4
CDF55> shoot_task:000af06c:00:00: *** EngDrvOut(0xc0f26108, 0x18800064), from 125ec


This reveals the register configuration performed by SetEDmac.

How do we know where to stop? The next line reads:

CDF9C> shoot_task:000af06c:00:00: *** RegisterEDmacCompleteCBR(0x11, 0xff580b3c "[PackMem] CompleteInterrupt In %d", 0x11), from ff580f38


which is the next call, and nothing interesting happens in-between (so, all those messages must be from SetEDmac).

Alternative: log the return value of SetEDmac, so you'll know for sure where this call ends. Exercise for the reader.

How does the main CPU know when the compression is finished?

The JPCORE CPU announces it using an interrupt:

EC9F6> **INT-A3h*:000aed4c:00:00: >>> INT-A3h Jp57IntrHandler ff291e2c(0)
ECA36> **INT-A3h*:000af06c:00:00: *** EngDrvIn(0xc0e20044), from ff291ce4
ECA70> **INT-A3h*:000af06c:00:00: *** EngDrvIn(0xc0e20024), from ff291d18
ECA90> **INT-A3h*:ff3dcc08:16:03: [TTL][0,0,0] JpCoreCompleteCBR( 0x1403b4e )
ECADC> **INT-A3h*:000af06c:00:00: *** EngDrvOut(0xc0f26104, 0x20000006), from 12aec
...
ECCB1> **INT-83h*:00000588:00:00: <<< INT-A3h done


That magic number is 0x1403b4e = 20986702 - looks like a valid compressed size for a full-res picture (and you can confirm by printing output_size in lossless.c).

OK, so that must be where the firmware finds out the compressed size! Let's see which of these registers returns it. Enable logging the return value of EngDrvIn and repeat the experiment:


-    { 0xff290ca0, "EngDrvIn", 1 },
+    { 0xff290ca0, "EngDrvIn", 1 | RET },



D46DC> **INT-A3h*:000aed4c:00:00: >>> INT-A3h Jp57IntrHandler ff291e2c(0)
D4720> **INT-A3h*:000af06c:00:00: *** EngDrvIn(0xc0e20044), from ff291ce4
D482B> **INT-A3h*:000af1d0:00:00: !!! cannot log return value (err 10)
D4876> **INT-A3h*:000af06c:00:00: *** EngDrvIn(0xc0e20024), from ff291d18
D491E> **INT-A3h*:000aee24:00:00: ==> 0x14264a0, from ff291d18
D4954> **INT-A3h*:ff3dcc08:16:03: [TTL][0,0,0] JpCoreCompleteCBR( 0x14264a0 )
D49AA> **INT-A3h*:000af06c:00:00: *** EngDrvOut(0xc0f26104, 0x20000006), from 12aec
...
D4E17> **INT-0Ah*:00000588:00:00: <<< INT-A3h done


So it's the register 0xc0e20024 where compressed output size is coming from!

Still, what's up with 0xc0e20044? Why its value could not be logged? Look up the error code in patch.h => E_PATCH_CACHE_COLLISION...

Oh no, a cache conflict! What do we do?

Check the index bits of ff291ce8 (LR after the first EngDrvIn call): 0xff291ce8 & 0x7E0 = 0x4e0. Check the other logged addresses (I know, this should be automatic) => 0xff4fa4ec & 0x7E0 = 0x4e0. Comment it out.

Not feeling like doing the math? Just comment out all other ROM addresses (those starting with FF).


E45AF> **INT-A3h*:000aed4c:00:00: >>> INT-A3h Jp57IntrHandler ff291e2c(0)
E45F3> **INT-A3h*:000af06c:00:00: *** EngDrvIn(0xc0e20044), from ff291ce4
E4681> **INT-A3h*:000aee24:00:00: ==> 0x400, from ff291ce4
E46D2> **INT-A3h*:000af06c:00:00: *** EngDrvIn(0xc0e20024), from ff291d18
E47C4> **INT-A3h*:000aee24:00:00: ==> 0x14c14d8, from ff291d18
E47FA> **INT-A3h*:ff3dcc08:16:03: [TTL][0,0,0] JpCoreCompleteCBR( 0x14c14d8 )


With this method and some patience, you can trace just about anything in the firmware.

nikfreak

Wow, thx for the nice tour and explanation.  8)
Will definitely become useful, not only for me. It just becomes a pain to insert the card a dozen of times but i am used to it from porting. Guess QEMU is the way to go in future...
[size=8pt]70D.112 & 100D.101[/size]


nikfreak

Houston, we've got a malloc problem and that only with the lossless DNG option, rest is fine:

[size=8pt]70D.112 & 100D.101[/size]

a1ex


dfort

Quote from: a1ex on April 14, 2017, 05:33:58 PM
(the same steps should work just fine on the crop_rec_4k branch)

You mean the same steps should work on the compressed_raw branch?

Last time I checked, full res silent lossless compressed DNG (that was a mouthful) wasn't working on the crop_rec_4k branch. The files were being created but the files were way too small and would not open with any of the raw editing programs I've got. Mainly Adobe Camera Raw and dcraw.

Ant123

Can anybody post here logs containing UHS-SD initialisation messages(like there)

nikfreak

[size=8pt]70D.112 & 100D.101[/size]

dfort

Quote from: a1ex on April 14, 2017, 12:48:04 PM
Now compile ML core and the silent picture module, place them on the card, and take a full-res silent picture with lossless compression (so the routine we have modified will get called in a quiet context - with LiveView turned off).

Not sure how to shoot a silent full resolution picture with LiveView turned off but I did manage to capture a dm-0000.log file on the 5D3.123 starting from the compressed_raw branch.

B0B0A> shoot_task:000af81c:00:00: *** register_interrupt("Jp57IntrHandler", 0xa3, 0xff291e2c, 0x0), from ff3dcc78
B10AE> shoot_task:ff3dd01c:16:03: [TTL] START WR1:0x18800064 WR2:0x15a0c000
B10DA> shoot_task:ff3dd048:16:03: [TTL] START RD1:0x86051a8 RD2:0xad24490
CDF6B> **INT-A3h*:ff3dcc08:16:03: [TTL][0,0,0] JpCoreCompleteCBR( 0x1327926 )
CDFAA> **INT-83h*:00a12858:00:00: LosslessCompleteCBR


Switching over to the 700D I got the Malloc error that @nikfreak pointed out and no log saved. Next I tried silent.c from the crop_rec_4k branch but that gave the same errors that I reported back in Reply #31 on the 5D3.123 with the crop_rec_4k branch. A screenshot is probably worth more than any description I can come up with:



Then I tried applying just the changes in the commit that @a1ex pointed out but ended up with the same results. On the 700D it results in a system freeze that requires a battery pull. No log or DNG file saved.

a1ex

Quote from: dfort on April 15, 2017, 12:05:22 AM
Last time I checked, full res silent lossless compressed DNG (that was a mouthful) wasn't working on the crop_rec_4k branch.

Hopefully fixed.

dfort

Thought I'd create some logs on the 7D (2.0.3) just to see what makes it tick. First issue was that dm-spy-experiments wouldn't compile because it was missing some stubs. That was easy enough to find:

/** MPU communication **/
NSTUB(0xFF1F0298, _mpu_send);                               /* dwSize < TXBD_DATA_SIZE */
NSTUB(0xFF06B2BC, _mpu_recv);                               /* registered in InitializeIntercom as CBR */


So now it compiles and creates logs. I was going to make a pull request but there are no mpu events in the log files so maybe I haven't found the right addresses?

Also noticed there are some memory patches showing up in the Debug menu while logging is in progress:



There are a few of us looking to get the 7D running on the crop_rec_4k branch. Looks like it is going to be a challenging group project to say the least.

a1ex

The stubs look fine, and so does the memory patches dialog. Hoped to debug it in QEMU, but the emulation doesn't go as far as initializing the MPU communication on this model, and it's not trivial to do so (need to sit down to investigate it).

While you are at it, you could try to collect logs regarding IPC (inter-processor communication) - what registers it uses, what their values are, what's their meaning. For example, the emulation gets stuck at "Wait Master Wakeup Timeout". This one is easy to figure out - return 1 for 0xC0220024 (eos_handle_gpio) and it goes past this, but then gets stuck waiting for startup flag 0x40000000. To figure it out, log FF02AAF8 NotifyComplete and see where it's called with this argument; if it's not obvious, try to get a backtrace. Then see why the emulation doesn't reach that code (usually it's waiting for some semaphore triggered by some interrupt).

Also try to find out where FF06B404 InitializeIntercom is called (it's not obvious to me).

edit: looking to an older startup log, I've noticed there are very few messages with DisablePowerSave / EnablePowerSave (which are called every time a MPU message is received, among others):

for f in *.log *.LOG; do echo -n "$f: "; cat $f | grep PowerSave | wc -l; done
7D-startup.LOG: 18
60D-startup-menu.LOG: 382
550D-startup.LOG: 330
500D-startup.log: 274
450D-dm-mpu.log: 418
700D-startup.LOG: 1543


Hypothesis: could the 7D be the only EOS model that does not use a MPU?!

(photo-parts.ua shows TX19A, and I even have a dump of its code...)

Hm, maybe it's used from the master CPU then. Not obvious from that code either, and emulation doesn't even go past bootloader.

The best way to tell is getting a backtrace for GUI_Control to see where the button codes are coming from.



To log the values of 0xC0220024:


./run_canon_fw.sh 7D -d debugmsg,io,nochain -singlestep |& grep -C 5 0xC0220024


It will print something like this:

    { 0xFF080094, "0xC0220024", R(0), mmio_log },     /* [GPIO] GPIO_9 (master woke up on 7D) at Startup:FF080090 (0x0)*/


Paste that into dm-spy-extra.c and you'll get the values returned by this register on actual hardware.

Repeat for other relevant registers (unfortunately I don't know them in advance).




Based on this snippet from a 7D startup log (using an older dm-spy-experiments):


**INTERRUP:ff079ff0:3f:02:9: ipcEventHandler (1, 0x280b, 8)
**INTERRUP:ff07b084:40:01:9: (ID = 0x280b, 0x8)
  MainCtrl:ff01e584:9c:01:9: ID:13(42)
  MainCtrl:ff01c10c:89:03:9: bindReceiveSwitch (19, 0)


I'm pretty sure the button codes are coming from the master CPU. In other words, the slave CPU (where ML runs currently) does not talk directly to the MPU; that's probably what the master CPU does.

dfort

Followed your instructions and got QEMU to print out the line for 0xC0220024 so I pasted it in dm-spy-extra.c and ran a startup log on the camera. I also made a pull request for dm-spy-experiments to work with the 7D.

Don't see anything on the log that looks like eos_handle_gpio and no mpu activity so perhaps this is happening on the master CPU.

a1ex

Something's wrong - the log is not covering anything from the startup. The logging hooks are probably disabled (a call to sync_caches would do that) and will get re-enabled when loading modules (most likely). That's something I need to fix in the patch manager.

Workaround: move the first call to debug_intercept() right before calling Canon's init_task (init_task_func). FIXED!

Definitions with firmware version are not there yet, except for 5D3 - let's do this in makefile-updates while we are at it (also need to fix some Mac stuff iirc - will check later). DONE!

eos_handle_gpio is from QEMU ;)

BTW - this looks like an easy way to get a log from both master and slave CPU (not as complete as ours, but still useful):

call("dumpfall");


based on these strings from master CPU:

Cannot use [dumpf] from Master
Please use [dumpfall] from Slave


note: call("dumpf") was the old way to get a debug log (see http://magiclantern.wikia.com/wiki/Debugging_Magic_Lantern )

dfort

Saw several commits and merges made on the dm-spy-experiments branch. Not sure if that was my cue but I posted a new 7D startup log - photo mode. Still no MPU events.

[EDIT] Looks like we both posted at the same time. I'll read up on dumpfall.

a1ex

Confirmed - MPU messages are on the Master processor. Actually, g3gg0 tried to log them back in 2012 (!)

a1ex

Sneak preview - able to log MMIO reads/writes for many all? peripherals, outside QEMU:


B3C46> >>> INT-A3h Jp57IntrHandler FF28DAD0(00000000)
B3C5D> **INT-A3h*:FF28C95C:MMIO:  [0xC0E20044] -> 0x00000400
B3C6B> **INT-A3h*:FF28C95C:MMIO:  [0xC0E20024] -> 0x01A39113
B3C95> **INT-A3h*:000afd90:16:03: [TTL][167,86328,0] JpCoreCompleteCBR( 0x1a39113 )


In the previous example, to discover the above MMIO reads, we went to great lengths, adding custom handlers, fighting with cache conflicts and so on. Now it's as simple as enabling CONFIG_MMIO_TRACE=y and selecting the JPCORE memory region in io_trace.c.

This is insanely powerful - it has the potential to reveal the complete interactions between our ARM processor, and the various peripherals in the camera. It would shorten the time required to get a half-decent log by at least one order of magnitude :)

Credits goto g3gg0 - I just adapted his code (mem_prot) and re-purposed it.

dfort


a1ex

That's cool, first try on a different camera (I've only tested on 5D3) and worked out of the box :D

What about trying it to see the touch screen protocol? On M2 (unlike 100D), the touch screen uses I2C - can you try logging REGION(0xC0090000, 0x001000) ?

BTW, there's a newer changeset that has a nicer logging output (although I can interpret both).

dfort

CONFIG_MMIO_TRACE with CONFIG_DEBUG_INTERCEPT_STARTUP is working great. CONFIG_DEBUG_INTERCEPT, not so much. At least not on the EOSM2. The camera freezes after a few seconds. I made another startup log with your latest changes and tapping on the touch screen as the camera started up. Hopefully there's something interesting in there.

https://pastebin.com/Rjr0N6gM

a1ex

Please log the I2C memory region mentioned above (by default, it logs the GPIO registers).

dfort

Oops, sorry didn't read the instructions.

Tried this:

static ASM_VAR uint32_t protected_region = REGION(0xC0090000, 0x001000);


but the camera won't start up. Blinking orange LED then solid orange. No log file saved.

a1ex

Hm, that works fine on 5D3 with HDMI monitor attached (the HDMI chip also talks via I2C). Not sure what the problem might be.

I was also able to log the entire MMIO range with REGION(0xC0000000, 0x10000000), but this one will fill the log very quickly. Any luck with that? (just curious whether it crashes or not).

REGION(0xC0000000, 0x100000) is probably also worth trying.

dfort

Quote from: a1ex on February 08, 2018, 01:18:46 AM
REGION(0xC0000000, 0x100000) is probably also worth trying.

Same lock up issue with that one. Need to run off now but will check back later. It might be something I'm doing or the EOSM2 isn't ready to come out of the closet.