Magic Lantern Forum

Developing Magic Lantern => Reverse Engineering => Topic started by: a1ex on August 28, 2012, 10:56:48 AM

Title: Intercepting DebugMsg with cache hacks
Post by: a1ex on August 28, 2012, 10:56:48 AM
Problem: Canon logs get trimmed (because they are too big), or they don't log all messages.

Is there a way to log all DebugMsg calls?

YES :)

https://bitbucket.org/hudson/magic-lantern/src/tip/src/dm-spy.c

Credits goto g3gg0 for the ultimate patching tool :)

Code: [Select]
        cache_fake((uint32_t)&DryosDebugMsg, B_INSTR((uint32_t)&DryosDebugMsg, my_DebugMsg), TYPE_ICACHE);
Title: Re: Intercepting DebugMsg with cache hacks
Post by: nanomad on August 28, 2012, 11:28:45 AM
Remember to add dm-spy.o to your Makefile too ;)
Title: Re: Intercepting DebugMsg with cache hacks
Post by: Indy on August 29, 2012, 09:32:42 PM
Respect to G3gg0 !
Title: Re: Intercepting DebugMsg with cache hacks
Post by: 1% on August 29, 2012, 09:36:32 PM
Yes, indeed!

Hopefully I can use this to see all JPCORE messages

Nope, still can't see stuff like this:

[JPCORE] SetEncodeH264Parameter I %d,%d,%d'
Title: Re: Intercepting DebugMsg with cache hacks
Post by: nanomad on August 30, 2012, 12:36:16 AM
Try increasing the log size
Title: Re: Intercepting DebugMsg with cache hacks
Post by: a1ex on August 30, 2012, 10:26:31 AM
1%: you need to enable ENGIO messages.

Code: [Select]
[JPCORE] SetEncodeH264Parameter I 1920,1088,1
[JPCORE] SetEncodeH264Parameter P 1920,1088,2

This should be what happens when encoding one frame (5D2): http://pastebin.com/J21HuvLX
Title: Re: Intercepting DebugMsg with cache hacks
Post by: 1% on August 30, 2012, 04:13:25 PM
When I was setting the store level to 15, 2 (what I got from Tramell's post) I was getting engio writes but barely anything from jpcore in regular dumpf. I'll try again but what you posted is pretty good to look through. Set store level before or after starting the hijacked logger?


LV_GetPtpZoomWindowArea IX=2327(2496), IY=1582(1630) - Full frame size?
RequestH264Encode is set to 1920x1080 from only 2 calls + the 2 parameters I/P. But I'm getting off topic...
Title: Re: Intercepting DebugMsg with cache hacks
Post by: nanomad on August 30, 2012, 04:43:41 PM
Well 0xFF2493B0 Looks like an encoder initialization routine to me
Title: Re: Intercepting DebugMsg with cache hacks
Post by: 1% on August 30, 2012, 05:35:58 PM
Yep, initializes all the modes... Enabling engio logging worked to see some messages. 
Title: Re: Intercepting DebugMsg with cache hacks
Post by: jplxpto on September 16, 2012, 12:40:28 AM
Problem: Canon logs get trimmed (because they are too big), or they don't log all messages.

Is there a way to log all DebugMsg calls?

YES :)

https://bitbucket.org/hudson/magic-lantern/src/tip/src/dm-spy.c

Credits goto g3gg0 for the ultimate patching tool :)

Code: [Select]
        cache_fake((uint32_t)&DryosDebugMsg, B_INSTR((uint32_t)&DryosDebugMsg, my_DebugMsg), TYPE_ICACHE);

--------------------------------

Can I use the methods in the file cache_hacks.h, I want to replace the call of some functions of the 40D.
I found a function in 40D program that those records p15. I believe that they start to cache I and D.
Can you explain me better how this mechanism works?

I want to write all messages directly to a file without having to invoke dumpf.
For that I need to replace the function that writes messages to the log 0xFFD4C1EC

Thank you
Title: Re: Intercepting DebugMsg with cache hacks
Post by: a1ex on April 27, 2014, 06:34:26 PM
I wanted to take a closer look at the boot process, and I've also seen Maqs on IRC doing some low-level research, so today I sat down and pimped this function a little.

Code: https://bitbucket.org/hudson/magic-lantern/commits/branch/dm-spy-experiments
First proof of concept: https://bitbucket.org/hudson/magic-lantern/commits/9e7945c

There are two interesting points here: the plain DebugMsg logging, and a new generic code logger.

Plain DebugMsg log (dm-spy.c):
- updated for the latest codebase (you can enable it from Makefile.user with CONFIG_DEBUG_INTERCEPT, it uses the new patch manager, stuff like that)
- prints task name for each debug message
- can intercept the debug messages very early in the boot process (CONFIG_DEBUG_INTERCEPT_STARTUP)

Startup log example from 5D2: http://pastebin.com/zYc1jwaq

New generic logger (dm-spy-extra.c, built on top of g3gg0's GDB hooks mixed with dm-spy):

- With a simple description like this, you can trace virtually any Canon function calls, combined with Canon's own debug messages:
Code: [Select]
    { 0xff9b9198, "StateTransition", 4 , state_transition_log },
    { 0xff9b989c, "TryPostEvent", 4 },

- by default, this uses a generic logger that should cover many usual cases, but you can always use a custom logging function.

- the code does a bit of reverse engineering on the fly (see also asm.c): for functions, it guesses the type for its arguments (string, function name from referenced string, or plain hex) and it also prints the caller address.
Code: [Select]
          PropMgr:33:01: SendPipeEvent [2][0][5]
          PropMgr:00:00: *** TryPostStageEvent(&"StageClass", &"PtpDps", 0x5, 0x92a96c), from ff83a9f8
          PropMgr:00:00: *** TryPostEvent(&"TaskClass", &"PropMgr", 0x5, 0x735b64), from ff85f220
     ShootCapture:00:00: *** RegisterEDmacCompleteCBR(0x2, 0xffa37488 "CCDWriteEDmacCompleteCBR", 0x0), from ffa37644

- this replaces the old tp-spy and state object logger, with fairly generic code:
Code: [Select]
         EventMgr:00:00: *** TryPostEvent(&"TaskClass", &"LiveViewMgr", 0xe, 0x4), from ff835648
     ShootCapture:00:00: *** SCSState: (8) --10--> (8)                              x=640190 z=0 t=ffcd0741
     ShootCapture:93:03: scsReleaseEnd
      LiveViewMgr:00:00: *** LVState: (0) --14--> (0)                              x=72c44c z=4 t=72ee54
           RscMgr:00:00: *** SRMState: (0) --13--> (0)                              x=63cb0c z=515f1a t=ffcd0741
           RscMgr:00:00: *** SRMState: (0) --14--> (0)                              x=63cb0c z=0 t=ffcd0741

Picture taking log from 5D2: http://pastebin.com/8TPBUAbT

Known issues:
- startup log cannot use the extra loggers (probably GDB hooks get destroyed somehow, didn't look too much into it)
- be careful on low-memory cameras: the startup log allocates 1MB in the main binary. I've only tested it only on 5D2; for other cameras you may need to either limit the log size, remove some ML features, or find some place to allocate it without malloc calls - since the malloc library is not yet initialized, neither in Canon code, nor in ML.

Happy hacking!
Title: Re: Intercepting DebugMsg with cache hacks
Post by: nikfreak on September 28, 2014, 09:49:25 PM
I get some error if i compile with "CONFIG_DEBUG_INTERCET_STARTUP = y". GDB = y is ofc set.
The error doesn't happen with "CONFIG_DEBUG_INTERCEPT = y".

Code: [Select]
make[2]: Leaving directory `/home/magiclantern/nik6d/magic-lantern/tcc'
[ CC       ]   module.o
[ CC       ]   afma.o
[ CC       ]   gps.o
[ AR       ]   strrchr.o
[ AR       ]   dietlibc.a
[ AR       ]   lib_a-setjmp.o
[ AR       ]   newlib-libc.a
[ CP       ]   newlib-libm.a
[ CP       ]   gcc-libgcc.a
[ LD       ]   magiclantern
dm-spy-extra.o: In function `generic_log':
dm-spy-extra.c:(.text+0xc4): undefined reference to `looks_like_string'
dm-spy-extra.c:(.text+0xe0): undefined reference to `is_sane_ptr'
dm-spy-extra.c:(.text+0xf0): undefined reference to `looks_like_string'
dm-spy-extra.c:(.text+0x11c): undefined reference to `asm_guess_func_name_from_string'
make[1]: *** [magiclantern] Error 1
make[1]: Leaving directory `/home/magiclantern/nik6d/magic-lantern/platform/6D.113'
make: *** [6D] Error 2



Anyone else encountering this compile error with "CONFIG_DEBUG_INTERCET_STARTUP = y" ???
Title: Re: Intercepting DebugMsg with cache hacks
Post by: a1ex on September 28, 2014, 09:50:27 PM
Add asm.o in your Makefile.
Title: Re: Intercepting DebugMsg with cache hacks
Post by: nikfreak on September 28, 2014, 10:04:12 PM
That did it! Thank you.
Title: Re: Intercepting DebugMsg with cache hacks
Post by: dfort on July 20, 2015, 09:16:31 PM
A group of us are trying to resolve the EOS-M shutter-bug issue and someone asked for a dm_spy_experiments build to look into what is going on in the boot process. Since I am able to compile ML on my MacBook Pro I tackled this project. Here are some issues and workarounds that I made to get it to compile--hope I didn't mess anything up!

Starting with a fresh clone of dm_spy_experiments:

First of all I copied Makefile.user.default to Makefile.user and turned on CONFIG_DEBUG_INTERCEPT and CONFIG_DEBUG_INTERCEPT_STARTUP. I left CONFIG_DEBUG_INTERCEPT_STARTUP_BLINK off because I think it might not apply to the EOS-M.
Code: [Select]
# Intercept all DebugMsg's
CONFIG_DEBUG_INTERCEPT = y
CONFIG_DEBUG_INTERCEPT_STARTUP = y
CONFIG_DEBUG_INTERCEPT_STARTUP_BLINK = n

My first attempts to compile I got errors relating to the menu and documentation. My workaround was to copy src/menuindexentries.h from the unified branch--it is missing in dm_spy_experiments.

Since I'm on a Mac using the macports version of truncate I got the "truncate: illegal option -- s" so a quick search revealed the following changes in src/Makefile.src:
Code: [Select]
- @truncate -s 284 $@
+ @truncate $@ 284
and
- truncate -s 0 $@ ; \
+ truncate $@ 0 ; \

So I got it to compile and it seems to be working. The DM.LOG looks very interesting. Hopefully this will help figure out what's going on with the EOS-M shutter-bug.

A big thanks to a1ex!
Title: Intercepting DebugMsg with cache hacks
Post by: DeafEyeJedi on July 21, 2015, 01:54:04 AM
@dfort:

That's just awesome news for M users. Great read and thanks for pointing us to this thread.

Thumbs up to @a1ex as always!
Title: Re: Intercepting DebugMsg with cache hacks
Post by: dfort on July 30, 2015, 06:09:22 AM
I just wanted to mention that nikfreak made an updated branch of dm-spy-experiments that is working great. A few of us are using this on the EOS-M to run tests on the shutter-bug--lots of interesting logs.

https://bitbucket.org/nikfreak/magic-lantern/branch/dmspy-70D-new
Title: Re: Intercepting DebugMsg with cache hacks
Post by: nikfreak on July 30, 2015, 10:19:38 AM
It's just the latest unified branch merged into the dmspy-experiments-branch with the additions of 70D related committs. Goal:

Get 70D merged into unified and hope a1ex finds some free time and update some relevant branches which will make life easier and enable us to add committs there: dmspy-experiments, qemu
Title: Re: Intercepting DebugMsg with cache hacks
Post by: escho on August 02, 2016, 11:02:24 PM
Quote
Known issues:
- startup log cannot use the extra loggers (probably GDB hooks get destroyed somehow, didn't look too much into it)
- be careful on low-memory cameras: the startup log allocates 1MB in the main binary. I've only tested it only on 5D2; for other cameras you may need to either limit the log size, remove some ML features, or find some place to allocate it without malloc calls - since the malloc library is not yet initialized, neither in Canon code, nor in ML.

Some feedback using 600D:

The cam does pretty things:
M-Mode: Freezeing, if I switch the cam on
Video-Mode: I can start the cam. But I,m told, there is no card inside
Sometimes bootlog is created, sometimes not.

After removing a bunch of features and compiling new, no problems more . I receive a nice bootlog

Edgar
Title: Re: Intercepting DebugMsg with cache hacks
Post by: rbrune on November 15, 2016, 08:24:15 PM
I'm trying to get the current dm-spy-experiments running on EOSM, is there an easy way to find the current_task address in RAM?
It is missing from the stubs and I'm not sure what to look for. Can I retrieve/reduce it by using get_current_task?
Title: Re: Intercepting DebugMsg with cache hacks
Post by: a1ex on November 15, 2016, 11:05:57 PM
Yes, it's very easy to find from get_current_task. It reads current_task->taskId, and and in tasks.h, taskId is at offset 0x40 in struct task.

5D3:
Code: [Select]
FF9E1474             get_current_task
ROM:FF9E1480 1C 00 9F 05                 LDREQ   R0, =0x23E14
ROM:FF9E1484 00 00 90 05                 LDREQ   R0, [R0]
ROM:FF9E1488 40 00 90 05                 LDREQ   R0, [R0,#0x40]

For CURRENT_INTERRUPT_ADDR (optional, but very useful for low-level research), you can find instructions in qemu/scripts/debug-logging.gdb (qemu branch).
Title: Re: Intercepting DebugMsg with cache hacks
Post by: rbrune on November 16, 2016, 11:09:38 AM
Yes, it's very easy to find from get_current_task. It reads current_task->taskId, and and in tasks.h, taskId is at offset 0x40 in struct task.

For CURRENT_INTERRUPT_ADDR (optional, but very useful for low-level research), you can find instructions in qemu/scripts/debug-logging.gdb (qemu branch).

Thanks!
Much easier than expected. I made a pull request for the current_task stub.
The log is way more detailed than I expected.

I will take a look at CURRENT_INTERRUPT_ADDR tonight.
Title: Re: Intercepting DebugMsg with cache hacks
Post by: josepvm on November 16, 2016, 12:39:04 PM
So, for 50D:

Code: [Select]
loc_ff8166b4: ; 14 refs     <  -------------------- (get_current_task)

ff8166b4: e59f0020 ldr r0, [pc, #32] ; ff8166dc: (00001a74)
ff8166b8: e5900000 ldr r0, [r0]
ff8166bc: e3500000 cmp r0, #0
ff8166c0: 059f001c ldreq r0, [pc, #28] ; ff8166e4: (00001a70)        <--------
ff8166c4: 05900000 ldreq r0, [r0]
ff8166c8: 05900040 ldreq r0, [r0, #64] ; 0x40

Can I assume that current_task = 0x1A70 ?
Title: Re: Intercepting DebugMsg with cache hacks
Post by: a1ex on November 16, 2016, 02:05:28 PM
That's right.

You can check it either by:
- looking in the log files from dm-spy-experiments (whether they contain valid task names)
- running the firmware under QEMU and GDB like this (http://www.magiclantern.fm/forum/index.php?topic=2864.msg168321#msg168321) (CURRENT_TASK in debugmsg.gdb)
- running the firmware under QEMU with "-d io" and checking whether the I/O register accesses have valid task names (define current_task_addr in qemu/eos/model_list.c)
Title: Re: Intercepting DebugMsg with cache hacks
Post by: a1ex on April 14, 2017, 11:21:05 AM
Some additions and tips:

You can now log the return value of any function (a temporary logging point is added on the LR address on the fly, similar to gdb tbreak). Example:

Code: [Select]
     { 0xFF304A74, "IsAddressEEP", 1 | RET },
     { 0xFF309F34, "IsAddressSerialFlash", 1 | RET },
     { 0xFF128228, "Compare_FROMAddress", 1 | RET },

Code: [Select]
75500>  Terminate:000aee60:00:00: *** Compare_FROMAddress(0x2000000), from ff0ce434
75600>  Terminate:000aee60:00:00: *** IsAddressEEP(0xf8f51000), from ff128260
75600>  Terminate:000aec18:00:00: ==> 0x0, from ff128260
75600>  Terminate:000aee60:00:00: *** IsAddressSerialFlash(0xf8f51000), from ff128274
76100>  Terminate:000aec18:00:00: ==> 0x0, from ff128274
76200>  Terminate:ff1282a4:02:03: Compare FROMAddress (0) 0x406cf800 0xfff40000 Size 4088
76300>  Terminate:000aec18:00:00: ==> 0x0, from ff0ce434

(notice the nested calls)


- RAM pointers that appear to be data structures are logged as well (no need to do anything special):
Code: [Select]
     { 0xFF127844, "prop_erase", 1 },
     { 0xFF1272FC, "propad_erase", 3 },
Code: [Select]
76300>  Terminate:000aee60:00:00: *** prop_erase(0x2000000), from ff0ce444
76400>  Terminate:000aee60:00:00: *** propad_erase(0x6cf708, 0x1, 0x147c70), from ff127870
76400>  Terminate:000aef5c:00:00:     *0x6cf708 = { f8f40000 fff40000 20000 2  f8f51000 f8f52000 406cf800 406cf790  1000 1 64f55c 6d0904 ... }
76400>  Terminate:000aef5c:00:00:     *0x147c70 = { 6cf708 0 19980218 19980218  19980218 23f2c 19980218 ff0ce448  0 becd3c 19980218 ff0f8ee0 ... }


- Functions already defined in stubs can be taken from there (no need to type their address again):
Code: [Select]
    STUB_ENTRY(task_create, 5),
    STUB_ENTRY(msleep, 1),
    STUB_ENTRY(_malloc, 1 | RET),
    STUB_ENTRY(_free, 1),


- The logging points do not have to be functions - they can be anywhere in the middle of the code (in theory - bug reports welcome) and you can log arbitrary register values (without a custom logging function):

Code: [Select]
    { 0xFF1C8C00, "resinfo_wait_smth", R(0) },
    { 0xFF1C8AC0, "resinfo_check", R(7) },


- You can log interrupts as well. Define LOG_INTERRUPTS in dm-spy-extra.c:
Code: [Select]
ShootCaptu:000afe68:00:00: *** register_interrupt("Jp57IntrHandler", 0xa3, 0xff28dad0, 0x0), from ff28f71c
ShootPreDe:ff14f698:95:03: spsInit
**INT-A3h*:000afc54:00:00: >>> INT-A3h Jp57IntrHandler ff28dad0(0)
**INT-A3h*:00000588:00:00: <<< INT-A3h done




- You can run the startup logging under QEMU, in real-time, with no cache conflicts:

Code: [Select]
CONFIG_DEBUG_INTERCEPT_STARTUP = y
CONFIG_QEMU = y

It's best to install QEMU from the qemu branch for now - which should not cause problems with branching, as QEMU runs outside the ML tree.

There is similar functionality in the GDB scripts from the QEMU branch - they output somewhat similar logs, without modifying the guest code (so they can be used very early in the boot process). The method from here requires running autoexec.bin alongside the main firmware.

Q: Then why not just use GDB? (when running in QEMU)

A: it's very very slow. I bet you don't like to wait for half an hour for the firmware to boot the GUI under GDB, when it can do the same thing with DebugMsg hooks in about 1 minute.

Q: Then why do we use GDB?

A: It works, it's well-tested and does not require (http://www.magiclantern.fm/forum/index.php?topic=2864.msg166936#msg166936) modifying (http://www.magiclantern.fm/forum/index.php?topic=2864.msg168321#msg168321) the guest code (https://bitbucket.org/hudson/magic-lantern/commits/e84f8b32acb#Lcontrib/qemu/eos/debug_message_helper.cT47). The method from here must have ML ported on your camera before you can use it (not to mention bugs in this code possibly affecting the results), while with GDB, just add your logging points and you are good to go.



- Anything helpful for understanding ENGIO, ADTG, EDMAC and similar monsters?

These things are a bit verbose, so a bunch of messages are disabled by default. Comment out this in dm-spy.c:
Code: [Select]
    if (class == 21) // engio, lots of messages
        return;

There is also a script in the iso-research branch, that can annotate logs with info about ADTG/CMOS/ENGIO registers:

https://bitbucket.org/hudson/magic-lantern/src/iso-research/modules/adtg_gui/annotate_log.py

Some registers are set directly (without the EngDrvOut/engio_write wrappers) - simply by writing to memory. Currently, the only way to catch these is to run the analyzed routine in QEMU (assuming it can emulate at least some of it) and log the MMIO activity:

Code: [Select]
./run_canon_fw.sh 5D3,firmware="boot=1" -d io



- What about timestamps?

We have those, too. The number on the left of > is the hardware counter aka digic timer (microseconds, wraps at 0x100000):
Code: [Select]
04A55>       init:000af020:00:00: *** msleep(0x64), from 69674
...
1BACC>       init:000af020:00:00: *** msleep(0x64), from 69674

All that's left is to unwrap the counter in post.



- Common errors and how to solve them?

Code: [Select]
Will NOT load on:
    100D (debug_intercept)

Recompile ML core, double-checking Makefile.user (CONFIG_DEBUG_INTERCEPT = y) and running "make clean" before.

Code: [Select]
Birthday paradox: ff4fa224 RegisterHead3InterruptHandler
  conflicts with: ff216a20 RegisterHead1InterruptHandler

That's a limitation of cache hacks - you can't patch two ROM addresses whose index bits (mask 0x7E0) are the same, but the more significant bits are different.

What you can do?
- Find a different logging address (sometimes, moving a few bytes around will change the index bits, solving the conflict)
- Log the two conflicting functions in two sessions
- Try under QEMU (which does not emulate cache)
- Update cache_fake to lock down two caching lines for ROM patches.



The examples were from 5D3 1.2.3 (most) and 60D 1.1.1 (resinfo).

Happy reversing.
Title: Re: Intercepting DebugMsg with cache hacks
Post by: nikfreak on April 14, 2017, 12:27:21 PM
just gave it a quick try after a long time (=months).
Best thing is that dm.log doesn't get overwritten like in past:

https://drive.google.com/drive/folders/0B9Mu66yg5QzROGV6aTNWTlJNbFE?usp=sharing
Title: Re: Intercepting DebugMsg with cache hacks
Post by: a1ex on April 14, 2017, 12:48:04 PM
You can log code sequences from other branches as well. Here's an example for debugging compressed_raw (http://www.magiclantern.fm/forum/index.php?topic=18443) on 5D3 1.2.3:

Makefile.user:
Code: [Select]
CONFIG_DEBUG_INTERCEPT = y

Merge the two branches:
Code: [Select]
hg update compressed_raw -C
hg merge dm-spy-experiments

(there will be some merge conflicts, sorry about that, I'm still novice in this area)
(maybe it's best to cleanup the dm-spy-experiments branch a bit and merge it in mainline)
(ignore mlv_lite and mlv_rec for now, as we are not going to use them - just merge the other files)

Change the code to enable logging:
Code: [Select]
diff -r ad3969261ccf modules/silent/lossless.c
--- a/modules/silent/lossless.c
+++ b/modules/silent/lossless.c
@@ -3,6 +3,7 @@
 #include "lens.h"
 #include "raw.h"
 #include "edmac.h"
+#include "dm-spy.h"
 
 struct TwoInTwoOutLosslessPath_args
 {
@@ -251,10 +252,18 @@
 
 int lossless_compress_raw(struct raw_info * raw_info, struct memSuite * output_memsuite)
 {
-    return lossless_compress_raw_rectangle(
+    /* start logging */
+    debug_intercept();
+
+    int result = lossless_compress_raw_rectangle(
         output_memsuite, raw_info->buffer,
         raw_info->width, 0, 0,
         raw_info->width, raw_info->height
     );
+
+    /* stop logging and save to file */
+    debug_intercept();
+
+    return result;
 }

Now compile ML core and the silent picture module, place them on the card, and take a full-res silent picture with lossless compression (so the routine we have modified will get called in a quiet context - with LiveView turned off).

Code: [Select]
cd platform/5D3.123
make clean; make
make installq       # this copies only autoexec and sym file
cd ../../modules/silent
make clean; make
make install        # copy only this module to your card

You should get a file named dm-000.log in the root of the card, that contains these lines (among many others):
Code: [Select]
172EB> shoot_task:000aeef0:00:00: *** register_interrupt("Jp57IntrHandler", 0xa3, 0xff291e2c, 0x0), from ff3dcc78
17882> shoot_task:ff3dd01c:16:03: [TTL] START WR1:0x18800064 WR2:0x15a0c000
178B2> shoot_task:ff3dd048:16:03: [TTL] START RD1:0x86051a8 RD2:0xad24490
34733> **INT-A3h*:ff3dcc08:16:03: [TTL][0,0,0] JpCoreCompleteCBR( 0x15336af )
3476D> **INT-83h*:00a0e980:00:00: LosslessCompleteCBR







Want more details?

Let's enable interrupts and ENGIO messages:
Code: [Select]
diff -r 9146aa47814c src/dm-spy-extra.c
--- a/src/dm-spy-extra.c
+++ b/src/dm-spy-extra.c
@@ -20,7 +20,7 @@
 #include "timer.h"
 
 /* this needs pre_isr_hook/post_isr_hook stubs */
-//~ #define LOG_INTERRUPTS
+#define LOG_INTERRUPTS
 
 /* delay mpu_send calls, to allow grouping MPU messages by timestamps */
 #define MPU_DELAY_SEND
@@ -338,11 +338,11 @@
     { 0xff2953b8, "LockEngineResources", 1, LockEngineResources_log },
     { 0xff29555c, "UnLockEngineResources", 1, UnLockEngineResources_log},
 
-    //~ { 0xff290f98, "engio_write", 1, engio_write_log},   /* on 5D3, Canon's engio_write is silent */
-    //~ { 0xff290c80, "EngDrvOut", 2 },                     /* same here */
-    //~ { 0xff290ca0, "EngDrvIn", 1 },
-    //~ { 0xff290d38, "EngDrvBits", 3, engdrvbits_log},
-    //~ { 0xff290cd4, "EngDrvIns", 3 },
+    { 0xff290f98, "engio_write", 1, engio_write_log},   /* on 5D3, Canon's engio_write is silent */
+    { 0xff290c80, "EngDrvOut", 2 },                     /* same here */
+    { 0xff290ca0, "EngDrvIn", 1 },
+    { 0xff290d38, "EngDrvBits", 3, engdrvbits_log},
+    { 0xff290cd4, "EngDrvIns", 3 },
     /* only EngDrvOuts is verbose, no need to log it here */
     
     //~ { 0xFF694880, "eeko_setup_global_vars", 10 },
diff -r 9146aa47814c src/dm-spy.c
--- a/src/dm-spy.c
+++ b/src/dm-spy.c
@@ -52,8 +52,6 @@
 {
     if (!buf) return;
         
-    if (class == 21) // engio, lots of messages
-        return;
     
     #ifdef CONFIG_7D
     if (class == 0x3E) // IPC, causes ERR70

Lots of messages now! Let's look at some of them.

How does SetEDmac work?

Code: [Select]
CDD68> shoot_task:000af06c:00:00: *** SetEDmac(0x11, 0x58800064, 0x1c42ac, 0x20000001), from ff580f28
CDD80> shoot_task:000af168:00:00:     *0x1c42ac = { 0 0 0 0  0 0 2000 0  fff 0 0 a0f7f8 ... }
CDDBD> shoot_task:ff290e18:15:02: [ENGIO](Addr:0xc0f2610c, Data:0x       0)[8]EngDrvOuts
CDDE5> shoot_task:ff290e18:15:02: [ENGIO](Addr:0xc0f26110, Data:0x fff2000)[7]EngDrvOuts
CDE0F> shoot_task:ff290e18:15:02: [ENGIO](Addr:0xc0f26114, Data:0x       0)[6]EngDrvOuts
CDE32> shoot_task:ff290e18:15:02: [ENGIO](Addr:0xc0f26118, Data:0x       0)[5]EngDrvOuts
CDE55> shoot_task:ff290e18:15:02: [ENGIO](Addr:0xc0f2611c, Data:0x       0)[4]EngDrvOuts
CDE79> shoot_task:ff290e18:15:02: [ENGIO](Addr:0xc0f26120, Data:0x       0)[3]EngDrvOuts
CDE9C> shoot_task:ff290e18:15:02: [ENGIO](Addr:0xc0f26124, Data:0x       0)[2]EngDrvOuts
CDEBF> shoot_task:ff290e18:15:02: [ENGIO](Addr:0xc0f26128, Data:0x       0)[1]EngDrvOuts
CDEE5> shoot_task:000125c0:15:01: ResetEDmac(0x20000001)
CDF1F> shoot_task:000af06c:00:00: *** EngDrvOut(0xc0f26104, 0x20000001), from 125d4
CDF55> shoot_task:000af06c:00:00: *** EngDrvOut(0xc0f26108, 0x18800064), from 125ec

This reveals the register configuration performed by SetEDmac.

How do we know where to stop? The next line reads:
Code: [Select]
CDF9C> shoot_task:000af06c:00:00: *** RegisterEDmacCompleteCBR(0x11, 0xff580b3c "[PackMem] CompleteInterrupt In %d", 0x11), from ff580f38

which is the next call, and nothing interesting happens in-between (so, all those messages must be from SetEDmac).

Alternative: log the return value of SetEDmac, so you'll know for sure where this call ends. Exercise for the reader.

How does the main CPU know when the compression is finished?

The JPCORE CPU announces it using an interrupt:
Code: [Select]
EC9F6> **INT-A3h*:000aed4c:00:00: >>> INT-A3h Jp57IntrHandler ff291e2c(0)
ECA36> **INT-A3h*:000af06c:00:00: *** EngDrvIn(0xc0e20044), from ff291ce4
ECA70> **INT-A3h*:000af06c:00:00: *** EngDrvIn(0xc0e20024), from ff291d18
ECA90> **INT-A3h*:ff3dcc08:16:03: [TTL][0,0,0] JpCoreCompleteCBR( 0x1403b4e )
ECADC> **INT-A3h*:000af06c:00:00: *** EngDrvOut(0xc0f26104, 0x20000006), from 12aec
...
ECCB1> **INT-83h*:00000588:00:00: <<< INT-A3h done

That magic number is 0x1403b4e = 20986702 - looks like a valid compressed size for a full-res picture (and you can confirm by printing output_size in lossless.c).

OK, so that must be where the firmware finds out the compressed size! Let's see which of these registers returns it. Enable logging the return value of EngDrvIn and repeat the experiment:

Code: [Select]
-    { 0xff290ca0, "EngDrvIn", 1 },
+    { 0xff290ca0, "EngDrvIn", 1 | RET },

Code: [Select]
D46DC> **INT-A3h*:000aed4c:00:00: >>> INT-A3h Jp57IntrHandler ff291e2c(0)
D4720> **INT-A3h*:000af06c:00:00: *** EngDrvIn(0xc0e20044), from ff291ce4
D482B> **INT-A3h*:000af1d0:00:00: !!! cannot log return value (err 10)
D4876> **INT-A3h*:000af06c:00:00: *** EngDrvIn(0xc0e20024), from ff291d18
D491E> **INT-A3h*:000aee24:00:00: ==> 0x14264a0, from ff291d18
D4954> **INT-A3h*:ff3dcc08:16:03: [TTL][0,0,0] JpCoreCompleteCBR( 0x14264a0 )
D49AA> **INT-A3h*:000af06c:00:00: *** EngDrvOut(0xc0f26104, 0x20000006), from 12aec
...
D4E17> **INT-0Ah*:00000588:00:00: <<< INT-A3h done

So it's the register 0xc0e20024 where compressed output size is coming from!

Still, what's up with 0xc0e20044? Why its value could not be logged? Look up the error code in patch.h => E_PATCH_CACHE_COLLISION...

Oh no, a cache conflict! What do we do?

Check the index bits of ff291ce8 (LR after the first EngDrvIn call): 0xff291ce8 & 0x7E0 = 0x4e0. Check the other logged addresses (I know, this should be automatic) => 0xff4fa4ec & 0x7E0 = 0x4e0. Comment it out.

Not feeling like doing the math? Just comment out all other ROM addresses (those starting with FF).

Code: [Select]
E45AF> **INT-A3h*:000aed4c:00:00: >>> INT-A3h Jp57IntrHandler ff291e2c(0)
E45F3> **INT-A3h*:000af06c:00:00: *** EngDrvIn(0xc0e20044), from ff291ce4
E4681> **INT-A3h*:000aee24:00:00: ==> 0x400, from ff291ce4
E46D2> **INT-A3h*:000af06c:00:00: *** EngDrvIn(0xc0e20024), from ff291d18
E47C4> **INT-A3h*:000aee24:00:00: ==> 0x14c14d8, from ff291d18
E47FA> **INT-A3h*:ff3dcc08:16:03: [TTL][0,0,0] JpCoreCompleteCBR( 0x14c14d8 )

With this method and some patience, you can trace just about anything in the firmware.
Title: Re: Intercepting DebugMsg with cache hacks
Post by: nikfreak on April 14, 2017, 01:03:14 PM
Wow, thx for the nice tour and explanation.  8)
Will definitely become useful, not only for me. It just becomes a pain to insert the card a dozen of times but i am used to it from porting. Guess QEMU is the way to go in future...
Title: Re: Intercepting DebugMsg with cache hacks
Post by: a1ex on April 14, 2017, 01:30:38 PM
It just becomes a pain to insert the card a dozen of times

Here's my current solution for this. (http://www.magiclantern.fm/forum/index.php?topic=17041.msg183173#msg183173)
Title: Re: Intercepting DebugMsg with cache hacks
Post by: nikfreak on April 14, 2017, 03:44:48 PM
Houston, we've got a malloc problem and that only with the lossless DNG option, rest is fine:

(http://i67.tinypic.com/9uwuwl.jpg)
Title: Re: Intercepting DebugMsg with cache hacks
Post by: a1ex on April 14, 2017, 05:33:58 PM
This should help: https://bitbucket.org/hudson/magic-lantern/commits/05a3127230a07ceefabaa45c870739c5460e4945?at=crop_rec_4k

(the same steps should work just fine on the crop_rec_4k branch)
Title: Re: Intercepting DebugMsg with cache hacks
Post by: dfort on April 15, 2017, 12:05:22 AM
(the same steps should work just fine on the crop_rec_4k branch)

You mean the same steps should work on the compressed_raw branch?

Last time I checked, full res silent lossless compressed DNG (that was a mouthful) wasn't working on the crop_rec_4k branch. The files were being created but the files were way too small and would not open with any of the raw editing programs I've got. Mainly Adobe Camera Raw and dcraw.
Title: Re: Intercepting DebugMsg with cache hacks
Post by: Ant123 on April 15, 2017, 12:09:09 AM
Can anybody post here logs containing UHS-SD initialisation messages(like there (https://chdk.setepontos.com/index.php?topic=13089.msg132312#msg132312))
Title: Re: Intercepting DebugMsg with cache hacks
Post by: nikfreak on April 15, 2017, 12:40:25 AM
is this ok for you?
http://www.magiclantern.fm/forum/index.php?topic=12862
Title: Re: Intercepting DebugMsg with cache hacks
Post by: dfort on April 16, 2017, 07:58:28 AM
Now compile ML core and the silent picture module, place them on the card, and take a full-res silent picture with lossless compression (so the routine we have modified will get called in a quiet context - with LiveView turned off).

Not sure how to shoot a silent full resolution picture with LiveView turned off but I did manage to capture a dm-0000.log file on the 5D3.123 starting from the compressed_raw branch.

Code: [Select]
B0B0A> shoot_task:000af81c:00:00: *** register_interrupt("Jp57IntrHandler", 0xa3, 0xff291e2c, 0x0), from ff3dcc78
B10AE> shoot_task:ff3dd01c:16:03: [TTL] START WR1:0x18800064 WR2:0x15a0c000
B10DA> shoot_task:ff3dd048:16:03: [TTL] START RD1:0x86051a8 RD2:0xad24490
CDF6B> **INT-A3h*:ff3dcc08:16:03: [TTL][0,0,0] JpCoreCompleteCBR( 0x1327926 )
CDFAA> **INT-83h*:00a12858:00:00: LosslessCompleteCBR

Switching over to the 700D I got the Malloc error that @nikfreak pointed out and no log saved. Next I tried silent.c from the crop_rec_4k branch but that gave the same errors that I reported back in Reply #31 (http://www.magiclantern.fm/forum/index.php?topic=2388.msg183203#msg183203) on the 5D3.123 with the crop_rec_4k branch. A screenshot is probably worth more than any description I can come up with:

(https://c1.staticflickr.com/3/2934/33252839873_4419a32c77_z.jpg) (https://flic.kr/p/SEruMp)

Then I tried applying just the changes in the commit that @a1ex pointed out (https://bitbucket.org/hudson/magic-lantern/commits/05a3127230a07ceefabaa45c870739c5460e4945?at=crop_rec_4k) but ended up with the same results. On the 700D it results in a system freeze that requires a battery pull. No log or DNG file saved.
Title: Re: Intercepting DebugMsg with cache hacks
Post by: a1ex on April 21, 2017, 11:44:43 PM
Last time I checked, full res silent lossless compressed DNG (that was a mouthful) wasn't working on the crop_rec_4k branch.

Hopefully fixed. (https://bitbucket.org/hudson/magic-lantern/commits/a9c8ad412af5c282bbad6889ce5f65d974ca3ada)
Title: Re: Intercepting DebugMsg with cache hacks
Post by: dfort on November 22, 2017, 04:49:52 PM
Thought I'd create some logs on the 7D (2.0.3) just to see what makes it tick. First issue was that dm-spy-experiments wouldn't compile because it was missing some stubs. That was easy enough to find:

Code: [Select]
/** MPU communication **/
NSTUB(0xFF1F0298, _mpu_send);                               /* dwSize < TXBD_DATA_SIZE */
NSTUB(0xFF06B2BC, _mpu_recv);                               /* registered in InitializeIntercom as CBR */

So now it compiles and creates logs. I was going to make a pull request but there are no mpu events in the log files so maybe I haven't found the right addresses?

Also noticed there are some memory patches showing up in the Debug menu while logging is in progress:

(https://farm5.staticflickr.com/4539/37865846634_9d908ab7d1.jpg) (https://flic.kr/p/ZG5mYJ)

There are a few of us looking to get the 7D running on the crop_rec_4k branch. Looks like it is going to be a challenging group project to say the least.
Title: Re: Intercepting DebugMsg with cache hacks
Post by: a1ex on November 22, 2017, 06:01:41 PM
The stubs look fine, and so does the memory patches dialog. Hoped to debug it in QEMU, but the emulation doesn't go as far as initializing the MPU communication on this model, and it's not trivial to do so (need to sit down to investigate it).

While you are at it, you could try to collect logs regarding IPC (inter-processor communication) - what registers it uses, what their values are, what's their meaning. For example, the emulation gets stuck at "Wait Master Wakeup Timeout". This one is easy to figure out - return 1 for 0xC0220024 (eos_handle_gpio) and it goes past this, but then gets stuck waiting for startup flag 0x40000000. To figure it out, log FF02AAF8 NotifyComplete and see where it's called with this argument; if it's not obvious, try to get a backtrace. Then see why the emulation doesn't reach that code (usually it's waiting for some semaphore triggered by some interrupt).

Also try to find out where FF06B404 InitializeIntercom is called (it's not obvious to me).

edit: looking to an older startup log, I've noticed there are very few messages with DisablePowerSave / EnablePowerSave (which are called every time a MPU message is received, among others):
Code: [Select]
for f in *.log *.LOG; do echo -n "$f: "; cat $f | grep PowerSave | wc -l; done
7D-startup.LOG: 18
60D-startup-menu.LOG: 382
550D-startup.LOG: 330
500D-startup.log: 274
450D-dm-mpu.log: 418
700D-startup.LOG: 1543

Hypothesis: could the 7D be the only EOS model that does not use a MPU?!

(photo-parts.ua shows TX19A (https://photo-parts.com.ua/parts/?part=Canon%207D), and I even have a dump of its code...)

Hm, maybe it's used from the master CPU then. Not obvious from that code either, and emulation doesn't even go past bootloader.

The best way to tell is getting a backtrace for GUI_Control to see where the button codes are coming from.


To log the values of 0xC0220024:

Code: [Select]
./run_canon_fw.sh 7D -d debugmsg,io,nochain -singlestep |& grep -C 5 0xC0220024

It will print something like this:
Code: [Select]
    { 0xFF080094, "0xC0220024", R(0), mmio_log },     /* [GPIO] GPIO_9 (master woke up on 7D) at Startup:FF080090 (0x0)*/

Paste that into dm-spy-extra.c and you'll get the values returned by this register on actual hardware.

Repeat for other relevant registers (unfortunately I don't know them in advance).



Based on this snippet from a 7D startup log (using an older dm-spy-experiments):

Code: [Select]
**INTERRUP:ff079ff0:3f:02:9: ipcEventHandler (1, 0x280b, 8)
**INTERRUP:ff07b084:40:01:9: (ID = 0x280b, 0x8)
  MainCtrl:ff01e584:9c:01:9: ID:13(42)
  MainCtrl:ff01c10c:89:03:9: bindReceiveSwitch (19, 0)

I'm pretty sure the button codes are coming from the master CPU. In other words, the slave CPU (where ML runs currently) does not talk directly to the MPU; that's probably what the master CPU does.
Title: Re: Intercepting DebugMsg with cache hacks
Post by: dfort on November 23, 2017, 03:49:19 AM
Followed your instructions and got QEMU to print out the line for 0xC0220024 so I pasted it in dm-spy-extra.c and ran a startup log on the camera (https://pastebin.com/9JZWsx0U). I also made a pull request (https://bitbucket.org/hudson/magic-lantern/pull-requests/881/dm-spy-experiments-for-7d/diff) for dm-spy-experiments to work with the 7D.

Don't see anything on the log that looks like eos_handle_gpio and no mpu activity so perhaps this is happening on the master CPU.
Title: Re: Intercepting DebugMsg with cache hacks
Post by: a1ex on November 23, 2017, 09:56:30 AM
Something's wrong - the log is not covering anything from the startup. The logging hooks are probably disabled (a call to sync_caches would do that) and will get re-enabled when loading modules (most likely). That's something I need to fix in the patch manager.

Workaround: move the first call to debug_intercept() right before calling Canon's init_task (init_task_func). FIXED!

Definitions with firmware version are not there yet, except for 5D3 - let's do this in makefile-updates while we are at it (also need to fix some Mac stuff iirc - will check later). DONE!

eos_handle_gpio is from QEMU ;)

BTW - this looks like an easy way to get a log from both master and slave CPU (not as complete as ours, but still useful):
Code: [Select]
call("dumpfall");

based on these strings from master CPU:
Code: [Select]
Cannot use [dumpf] from Master
Please use [dumpfall] from Slave

note: call("dumpf") was the old way to get a debug log (see http://magiclantern.wikia.com/wiki/Debugging_Magic_Lantern )
Title: Re: Intercepting DebugMsg with cache hacks
Post by: dfort on November 24, 2017, 01:47:52 AM
Saw several commits and merges made on the dm-spy-experiments branch. Not sure if that was my cue but I posted a new 7D startup log - photo mode (https://pastebin.com/CSFpnx7c). Still no MPU events.

[EDIT] Looks like we both posted at the same time. I'll read up on dumpfall.
Title: Re: Intercepting DebugMsg with cache hacks
Post by: a1ex on November 24, 2017, 08:14:55 AM
Confirmed - MPU messages are on the Master processor. Actually, g3gg0 tried to log them (https://bitbucket.org/hudson/magic-lantern/commits/f856de937243) back in 2012 (!)
Title: Re: Intercepting DebugMsg with cache hacks
Post by: a1ex on February 07, 2018, 09:39:04 PM
Sneak preview - able to log MMIO reads/writes for many all? peripherals (https://bitbucket.org/hudson/magic-lantern/pull-requests/900/mmio-tracing-backend-insanely-powerful), outside QEMU:

Code: [Select]
B3C46> >>> INT-A3h Jp57IntrHandler FF28DAD0(00000000)
B3C5D> **INT-A3h*:FF28C95C:MMIO:  [0xC0E20044] -> 0x00000400
B3C6B> **INT-A3h*:FF28C95C:MMIO:  [0xC0E20024] -> 0x01A39113
B3C95> **INT-A3h*:000afd90:16:03: [TTL][167,86328,0] JpCoreCompleteCBR( 0x1a39113 )

In the previous example (https://www.magiclantern.fm/forum/index.php?topic=2388.msg183168#msg183168), to discover the above MMIO reads, we went to great lengths, adding custom handlers, fighting with cache conflicts and so on. Now it's as simple as enabling CONFIG_MMIO_TRACE=y and selecting the JPCORE memory region in io_trace.c.

This is insanely powerful - it has the potential to reveal the complete interactions between our ARM processor, and the various peripherals in the camera. It would shorten the time required to get a half-decent log by at least one order of magnitude :)

Credits goto g3gg0 - I just adapted his code (mem_prot) and re-purposed it.
Title: Re: Intercepting DebugMsg with cache hacks
Post by: dfort on February 07, 2018, 10:27:44 PM
Merged in the EOSM2 and it spit out this startup log: https://pastebin.com/3vXEdXRP
Title: Re: Intercepting DebugMsg with cache hacks
Post by: a1ex on February 07, 2018, 10:38:21 PM
That's cool, first try on a different camera (I've only tested on 5D3) and worked out of the box :D

What about trying it to see the touch screen protocol? On M2 (unlike 100D), the touch screen uses I2C - can you try logging REGION(0xC0090000, 0x001000) ?

BTW, there's a newer changeset that has a nicer logging output (although I can interpret both).
Title: Re: Intercepting DebugMsg with cache hacks
Post by: dfort on February 08, 2018, 12:19:50 AM
CONFIG_MMIO_TRACE with CONFIG_DEBUG_INTERCEPT_STARTUP is working great. CONFIG_DEBUG_INTERCEPT, not so much. At least not on the EOSM2. The camera freezes after a few seconds. I made another startup log with your latest changes and tapping on the touch screen as the camera started up. Hopefully there's something interesting in there.

https://pastebin.com/Rjr0N6gM
Title: Re: Intercepting DebugMsg with cache hacks
Post by: a1ex on February 08, 2018, 12:24:24 AM
Please log the I2C memory region mentioned above (by default, it logs the GPIO registers).
Title: Re: Intercepting DebugMsg with cache hacks
Post by: dfort on February 08, 2018, 01:06:19 AM
Oops, sorry didn't read the instructions.

Tried this:

Code: [Select]
static ASM_VAR uint32_t protected_region = REGION(0xC0090000, 0x001000);

but the camera won't start up. Blinking orange LED then solid orange. No log file saved.
Title: Re: Intercepting DebugMsg with cache hacks
Post by: a1ex on February 08, 2018, 01:18:46 AM
Hm, that works fine on 5D3 with HDMI monitor attached (the HDMI chip also talks via I2C). Not sure what the problem might be.

I was also able to log the entire MMIO range with REGION(0xC0000000, 0x10000000), but this one will fill the log very quickly. Any luck with that? (just curious whether it crashes or not).

REGION(0xC0000000, 0x100000) is probably also worth trying.
Title: Re: Intercepting DebugMsg with cache hacks
Post by: dfort on February 08, 2018, 01:50:04 AM
REGION(0xC0000000, 0x100000) is probably also worth trying.

Same lock up issue with that one. Need to run off now but will check back later. It might be something I'm doing or the EOSM2 isn't ready to come out of the closet.
Title: Re: Intercepting DebugMsg with cache hacks
Post by: dfort on February 08, 2018, 07:18:21 AM
Hum. It compiles fine on the EOSM2 merge I made but not on the original EOSM in an unadulterated io_trace branch. Maybe this is pointing to a possible problem?

Code: [Select]
[ LD       ]   magiclantern
liveview.o:(.rodata+0x0): undefined reference to `LiveViewApp_handler_BL_JudgeBottomInfoDispTimerState'
make: *** [magiclantern] Error 1

Title: Re: Intercepting DebugMsg with cache hacks
Post by: a1ex on February 08, 2018, 12:40:59 PM
Right, the vanilla startup-log build also failed to build for EOSM.

This comes from an experimental PR (https://bitbucket.org/hudson/magic-lantern/pull-requests/734/cleaner-method-for-hiding-canons-bottom) I've included in order to solve a lockup on 60D in LiveView (back then I wrote it to solve issue #2434 (https://bitbucket.org/hudson/magic-lantern/issues/2434/ml-crash-when-resizing-image-eos-650d) on 650D and 700D, but was reported to cause issues with 70D, so I left it aside for a while).
Title: Re: Intercepting DebugMsg with cache hacks
Post by: dfort on February 08, 2018, 05:38:52 PM
Found out that I have the DEBUGMSG variation of CONFIG_LVAPP_HACK instead of PATCH in the EOSM2. This change will compile on the original EOSM:

Code: [Select]
  /** Patch LiveViewApp dialog hander to hide Canon bottom bar */
- #define CONFIG_LVAPP_HACK_PATCH
+ #define CONFIG_LVAPP_HACK_DEBUGMSG

but is this correct?

FYI - When I hit a problem on the EOSM2 I try it out on the EOSM.

Going back to the touch screen test, here is a startup log on the EOSM (https://pastebin.com/SQb3Tugr) using CONFIG_LVAPP_HACK_DEBUGMSG and REGION(0xC0090000, 0x001000). I was tapping on the screen during startup. Like the EOSM2 the EOSM hung on startup but it did save a log.
Title: Re: Intercepting DebugMsg with cache hacks
Post by: a1ex on February 08, 2018, 09:42:50 PM
The log from M looks good.

CONFIG_LVAPP_HACK_DEBUGMSG is no longer a valid option on the above PR (it has no effect, same as commenting out the old option). Without any bottom bar hack enabled, Canon's bottom bar will probably conflict with ML one (it does on all other cameras I tried).

To debug, first advice would be to start without LiveView, but I assume you already tried that.

Next, I'd comment out our logging stuff from the data abort handler, and leave just the bare minimum for tracing the MMIO reads/writes (without actually logging anything):

Code: [Select]
static void __attribute__ ((naked)) trap()
{
    /* data abort exception occurred. switch stacks, log the access,
     * enable permissions and re-execute trapping instruction */
    asm(
        /* save context */
        "STMFD  SP!, {R0-R12, LR}\n"

        /* prepare to re-execute the old instruction */
        "SUB    R0, LR, #8\n"
        "LDR    R0, [R0]\n"

        /* copy it to uncacheable memory - will this work? */
        "LDR    R1, =trapped_instruction\n"
        "ORR    R1, #0x40000000\n"
        "STR    R0, [R1]\n"

        /* sync caches - unsure */
        "mov    r0, #0\n"
        "mcr    p15, 0, r0, c7, c10, 4\n" // drain write buffer
        "mcr    p15, 0, r0, c7, c5, 0\n" // flush I cache

        /* enable full access to memory */
        "MOV    R0, #0x00\n"
        "MCR    p15, 0, r0, c6, c7, 0\n"

        /* restore context */
        "LDMFD  SP!, {R0-R12, LR}\n"

        /* placeholder for executing the old instruction */
        "trapped_instruction:\n"
        ".word 0x00000000\n"

        /* save context once again */
        "STMFD  SP!, {R0-R12, LR}\n"

        /* sync caches - unsure, try commenting out, see b771f5a0ed3f */
        "mov    r0, #0\n"
        "mcr    p15, 0, r0, c7, c10, 4\n" // drain write buffer
        "mcr    p15, 0, r0, c7, c5, 0\n" // flush I cache

        /* re-enable memory protection */
        "LDR    R0, protected_region\n"
        "MCR    p15, 0, R0, c6, c7, 0\n"

        /* restore context */
        "LDMFD  SP!, {R0-R12, LR}\n"

        /* continue the execution after the trapped instruction */
        "SUBS   PC, LR, #4\n"

        /* ------------------------------------------ */
    );
}

The above runs "step by step" the trapped instruction (the one that does the MMIO read/write), and allows one to log what happens before and after its execution (but doesn't actually log anything - this is just the framework for logging the MMIO accesses). This code should run without crashing, but I'm not sure what to do with the caching registers - if it doesn't work, that could benefit from some fiddling. I don't really know whether the cache is handled correctly - if it's not, it's likely to result in subtle side effects that may be hard to track down. If you know how to handle this stuff properly, please chime in.

On 5D3, with REGION(0xC0000000, 0x10000000), the above code sequence may have some side effects when capturing photos (distorted colors, non-deterministic, doesn't happen all the time) and LiveView doesn't start (black screen, camera locked up). Looking into it.

If that won't work, changeset b771f5a0ed3f was the first one that worked on the camera; it did not save any kind of logs, but it did not trap all the MMIO accesses either; only the first one from a DryOS time slice (a very small percentage of them). Still, useful for debugging. You don't have the REGION macro on that changeset - REGION(0xC0090000, 0x001000) evaluates to 0xC0090017.



Edit: narrowed down a LiveView crash to REGION(0xC0F19000, 0x001000). No idea what device uses this range, and emulation doesn't go far enough to access these registers.

The crash happens on the first exception from this range (which makes it easier to debug). Failed instruction:
Code: [Select]
0001C370  LDMIA   R1!, {R3,R4,R12,LR}

and it appears to be in some sort of memcpy, which gets called like this:
Code: [Select]
7DD90>        Evf:000afaa8:00:00: *** sub_1C364(0xfe4bd00, 0xc0f19000, 0x140, 0x0), from ff48f25c
7DDDF>        Evf:000afaa8:00:00: *** sub_1C364(0xfe4be40, 0xc0f19200, 0x140, 0x0), from ff48f270
7DE2C>        Evf:000afaa8:00:00: *** sub_1C364(0xfe4bf80, 0xc0f19400, 0x140, 0x0), from ff48f284
7DE78>        Evf:000afaa8:00:00: *** sub_1C364(0xfe4c0c0, 0xc0f19600, 0x140, 0x0), from ff48f298
...

If I re-enable the permissions and return from the data abort handler so it re-executes this instruction execute from its original location (SUBS PC, LR, #8), I get no crash.

If I execute this instruction from our handler (self-modifying code), and return from data abort to the next instruction (SUBS PC, LR, #4), it crashes.

This scenario should be reproducible in QEMU, outside LiveView.



Edit: indeed, it is. Somehow it attempts to execute an undefined instruction.

Code: [Select]

0x0001c370:  e8b15018      ldm r1!, {r3, r4, ip, lr}
Taking exception 4 [Data Abort]
...with DFSR 0x1 DFAR 0xc0f19000
0x00000010:  e59ff014      ldr pc, [pc, #20] ; 0x2c
0x000b0710:  e92d5fff      push {r0, r1, r2, r3, r4, r5, r6, r7, r8, r9, sl, fp, ip, lr}
0x000b0714:  e59f05b4      ldr r0, [pc, #1460] ; 0xb0cd0
0x000b0718:  e2800001      add r0, r0, #1 ; 0x1
0x000b071c:  e58f05ac      str r0, [pc, #1452] ; 0xb0cd0
0x000b0720:  e24e0008      sub r0, lr, #8 ; 0x8
0x000b0724:  e5900000      ldr r0, [r0]
0x000b0728:  e59f15a8      ldr r1, [pc, #1448] ; 0xb0cd8
0x000b072c:  e3811101      orr r1, r1, #1073741824 ; 0x40000000
0x000b0730:  e5810000      str r0, [r1]
0x000b0734:  e3a00000      mov r0, #0 ; 0x0
0x000b0738:  ee070f9a      mcr 15, 0, r0, cr7, cr10, {4}
0x000b073c:  ee070f15      mcr 15, 0, r0, cr7, cr5, {0}
0x000b0740:  e3a00000      mov r0, #0 ; 0x0
0x000b0744:  ee060f17      mcr 15, 0, r0, cr6, cr7, {0}
000B073C: MCR p15, ...          : CACHEMAINT x261 (omitted)
000B0744: MCR p15,0,Rd,cr6,cr7,0:  946_PRBS7 <- 0x0       
0x000b0748:  e8bd5fff      pop {r0, r1, r2, r3, r4, r5, r6, r7, r8, r9, sl, fp, ip, lr}
0x000b074c:  e8b15018      ldm r1!, {r3, r4, ip, lr}
[ENGIO] at run_test:000B074C:0001C378 [0xC0F19000] -> 0x0       : ???
[ENGIO] at run_test:000B074C:0001C378 [0xC0F19004] -> 0x0       : ???
[ENGIO] at run_test:000B074C:0001C378 [0xC0F19008] -> 0x0       : ???
[ENGIO] at run_test:000B074C:0001C378 [0xC0F1900C] -> 0x0       : ???
0x000b0750:  e25ef004      subs pc, lr, #4 ; 0x4
0xfffffffc:  ffffffff      undefined instruction 0xffffffff
Taking exception 1 [Undefined Instruction]

The trapped instruction modifies LR...
Title: Re: Intercepting DebugMsg with cache hacks
Post by: dfort on February 09, 2018, 12:42:26 AM
Interesting. That M log that looks good was done with CONFIG_LVAPP_HACK_DEBUGMSG which is not a valid option. The M won't compile with CONFIG_LVAPP_HACK_PATCH but the M2 will.

I didn't think of starting up without LiveView so I tried it on the M2 (start with the playback button) and with the REGION(0xC0090000, 0x001000) you were interested in. Here's the startup log for that (https://pastebin.com/kkSny6VM).

Cameras that are using the invalid CONFIG_LVAPP_HACK_DEBUGMSG in the io_trace branch are the 70D, 100D and 5D3.123 -- though you already know that.

Ok--trying that block of code, the M2 hangs on normal LiveView startup but it starts up fine starting in Play mode. In fact it even saves a startup log.

One last test, going back to the original M2 merged version with CONFIG_LVAPP_HACK_PATCH. I switched from CONFIG_DEBUG_INTERCEPT_STARTUP to CONFIG_DEBUG_INTERCEPT but it hangs on startup. Darn it, I was hoping this would help point out what is going on with the M2. Can't use io_trace to compare with the M because that camera won't compile with CONFIG_LVAPP_HACK_PATCH.

Title: Re: Intercepting DebugMsg with cache hacks
Post by: a1ex on February 09, 2018, 01:05:36 AM
That LiveView hack doesn't affect the logs; it only affects the user interface in LiveView (in other words, you can still get valid logs by disabling it).

The M2 log looks good, too.

The logging backend depends a lot on the address where ML is loaded (actually, whether it can do short jumps into ROM, or not - when patching low-level code, we have to be careful with this tiny detail, while for regular C code, it doesn't really matter - the compiler handles it for us). On 60D and 7D IIRC, I had to change the boot method to "classic" AllocateMemory - this change fixed the logging hooks, but broke the LiveView hack (so 60D was crashing in LiveView). The LVAPP_HACK_PATCH method attempts to unify the LiveView hacks (same code working on all cameras) and also plays nice with the logging hooks. There are, indeed, a few things to clean up, as you've already noticed.
Title: Re: Intercepting DebugMsg with cache hacks
Post by: a1ex on February 15, 2018, 11:48:28 AM
Some logs (huge, previously impossible to capture) from 5D3.113:

- camera startup (http://a1ex.magiclantern.fm/bleeding-edge/mmio/DM-0000-startup.LOG)
- still photo capture (http://a1ex.magiclantern.fm/bleeding-edge/mmio/DM-0002-photo-capture.LOG)
- image playback (http://a1ex.magiclantern.fm/bleeding-edge/mmio/DM-0003-image-play-zoom.LOG) (with zoom)
- LiveView, entering from photo mode (http://a1ex.magiclantern.fm/bleeding-edge/mmio/DM-0004-enter-lv-fps22wtf.LOG) (for some unknown reason, frames were captured at about 22 fps)
- 1080p25, entering from photo mode (http://a1ex.magiclantern.fm/bleeding-edge/mmio/DM-0005-enter-1080p25-fps22wtf.LOG) (same FPS issue)
- 1080p25 (http://a1ex.magiclantern.fm/bleeding-edge/mmio/DM-0006-1080p25.LOG) (a few frames, FPS correct this time)

It is my understanding that the above logs include *all* MMIO activity performed by Canon code (or ML) in the 0xC0000000 range (there are a few other regions used for Eeko); however, I have no hard proof for that (or for the correctness of all these values). The logs are huge, but the information you are looking for (regarding how the hardware is configured for various tasks) is probably there.

Previously, to get the value of one (1) single register, I had to examine the disassembly, find out where that register is read or written (from QEMU logs), find a place to install a logging hook, find out in what register I can find the result, troubleshoot if it doesn't work out of the box etc (the MPU communication was reverse engineered this way (https://bitbucket.org/hudson/magic-lantern/commits/20836fb5a7)). Then I've automated that process to some extent (mmio_log (https://bitbucket.org/hudson/magic-lantern/src/d6d93736bd4abc2c843995f629fc5f593a90f322/contrib/qemu/HACKING.rst#rst-header-checking-mmio-values-from-actual-hardware), example (https://bitbucket.org/hudson/magic-lantern/commits/4316a84f3095b07bdf5db6988f8b964b4f9c8bac)), but still, I could only place a few logging hooks in a few isolated places (so the information was scattered in many related logs that had to be re-assembled manually to understand the big picture). Now I get the complete interactions with ALL the devices our processor is talking to, in one (huge) log!

The above logs have nearly 5000 unique registers (3500 just from the photo capture log). Next: good luck understanding the meaning of all these numbers :D
(many of them are documented in various places (https://www.magiclantern.fm/forum/index.php?topic=14656.msg175083#msg175083), so these logs would benefit from some postprocessing to label the registers we already know - you can help with that! (https://www.magiclantern.fm/forum/index.php?topic=14656.0))

Code in io_trace_full (https://bitbucket.org/hudson/magic-lantern/branch/io_trace_full) (major rework of the entire logging backend, breaks some of the other features, for example CONFIG_DEBUG_INTERCEPT_STARTUP+CONFIG_MMIO_TRACE, so I created yet another branch for now). The "very simple" io_trace (https://bitbucket.org/hudson/magic-lantern/branch/io_trace) works very well with less demanding logging scenarios, such as GPIO, or for LiveView at reduced FPS, without breaking any compatibility (at startup or otherwise). The above startup log was done with a hardcoded buffer (hack not committed, trying to find a better way).

Thanks g3gg0 for the help (in particular, regarding cache coherence issues) and for lots of good advice!

For DIGIC 4, the news are a little less good: 60D and 500D seem to work fine with GPIO, but they crash when trying to log the full 0xC0000000 range. Didn't try to narrow down. Works in QEMU.

I'm very curious to see what this hack can do on DIGIC 6 and 7 (more details after getting it to work in QEMU).
Title: Re: Intercepting DebugMsg with cache hacks
Post by: dfort on March 09, 2018, 10:59:06 PM
I was doing some tests on the EOSM and came up with the issue of not being able to compile dm-spy-experiments. Ran hg bisect on dm-spy-experiments and it pointed to a merge with cleaner-lvapp-hack so I ran hg bisect on that branch and came up with this:

Code: [Select]
The first bad revision is:
changeset:   14357:6b83ca04841b
branch:      cleaner-lvapp-hack
user:        alex@thinkpad
date:        Sun May 15 23:09:21 2016 +0300
summary:     Disabled CONFIG_LVAPP_HACK_DEBUGMSG and replaced it with CONFIG_LVAPP_HACK_PATCH. 6D, 650D, 700D and EOSM will need a new stub. TBD on 70D/100D as well.

The EOSM doesn't have the LiveViewApp_handler_BL_JudgeBottomInfoDispTimerState stub. I looked for it but no luck. This isn't the new stub that is referred to in the commit is it? All the other platforms mentioned on the commit build fine.
Title: Re: Intercepting DebugMsg with cache hacks
Post by: a1ex on March 09, 2018, 11:07:08 PM
Yes, that's the new stub; there is one possible address mentioned in the PR (https://bitbucket.org/hudson/magic-lantern/pull-requests/734) (and looks sane at first sight).

However, you don't need it to get debug messages (and it might even interfere - possible cache conflict with some addresses). Comment out the bottom bar hacks in internals.h (you did this here (https://www.magiclantern.fm/forum/index.php?topic=2388.msg196976#msg196976) - unused CONFIG definitions have no effect).
Title: Re: Intercepting DebugMsg with cache hacks
Post by: dfort on March 10, 2018, 12:04:16 AM
Thanks--looks like I've been here before but there are so many places to look. Now I understand what nikfreak and Licaon Kter were talking about in that PR (https://bitbucket.org/hudson/magic-lantern/pull-requests/734/cleaner-method-for-hiding-canons-bottom/diff#comment-19933682). Tried it out and it seems to work fine.

Any tests to run to check on the possible cache conflicts? This would also be a problem on the other cameras, right? So why not just make the EOSM change on the cleaner-lvapp-hack branch?
Title: Re: Intercepting DebugMsg with cache hacks
Post by: a1ex on March 10, 2018, 12:16:49 AM
That depends on the addresses you are logging. If there's any issue, you should get an error message (although I remember it only worked if two logged functions were in conflict; this case is a bit different, as it involves some other patch).

TODO: I should just disable the hack when compiling with CONFIG_DEBUG_INTERCEPT.

Yes, the M stub can be committed.
Title: Re: Intercepting DebugMsg with cache hacks
Post by: dfort on March 10, 2018, 12:56:35 AM
Yes, the M stub can be committed.

Pull request (https://bitbucket.org/hudson/magic-lantern/pull-requests/910/eosm-missing-stub-found/diff) submitted.

BTW--What I was trying to do was to solve the EOSM shutter-bug issue. With all the stuff I learned in the past several years maybe I'm ready to tackle this again? Maybe not?
Title: Re: Intercepting DebugMsg with cache hacks
Post by: a1ex on March 10, 2018, 01:09:57 AM
Good idea.

If the issue can be reproduced outside LiveView - for example, with a call("Release") somewhere at startup - that's a good candidate for trying in QEMU. I have some local changes that emulate the photo capture process to some extent - at least it's making the screen black and starts capturing the image. That should be enough for diagnosing the issue, as my understanding is that, when this bug happens, the camera doesn't even start capturing the image (maybe it also locks up?)

We'll need two logs with the above scenario, with the camera starting outside LiveView and trying to capture an image. MPU messages should be enabled (they are by default) - that's the "driving force" behind photo capture (and that's what my local changes are all about - MPU messages from photo capture logs, assigned to different hotkeys). One log should be successful and the other should have the bug.

If there is some workaround (maybe removing the lens and putting it back?), that's also worth logging.
Title: Re: Intercepting DebugMsg with cache hacks
Post by: a1ex on March 11, 2018, 07:47:06 PM
Next: good luck understanding the meaning of all these numbers :D

O jus a few thousand registers  to try, I'm almost there ;D

When all else fails, brute force prevails.

- camera startup (http://a1ex.magiclantern.fm/bleeding-edge/mmio/DM-0000-startup.LOG)

Code: [Select]
[REPL]         at 0xFF0C7950:FF0C1428 [0xC0201100] <- 0x43210DCB: found @ 3bb
[REPL]         at 0xFF0C7958:FF0C1428 [0xC0201104] <- 0xFEA98765: found @ 3bc
[REPL]         at 0xFF0C7960:FF0C1428 [0xC0201200] <- 0x1       : found @ 3bf
[REPL]         at 0xFF0C796C:FF0C1428 [0xC0201200] -> 0x1       : replayed @ 3c0
[REPL]         at 0xFF143100:FF0C76B8 [0xC0800008] -> 0x8081    : replayed @ 3c1
[REPL]         at 0xFF143108:FF0C76B8 [0xC0800008] <- 0x81      : found @ 3c2
[REPL]         at 0xFF143110:FF0C76B8 [0xC0800018] -> 0x4       : replayed @ 3c3
[REPL]         at 0xFF143118:FF0C76B8 [0xC0800018] <- 0x3F      : found @ 3c4
[REPL]         at 0xFF31AC10:FF143128 [0xC0800008] -> 0x81      : replayed @ 3c5
[REPL]         at 0xFF31AC14:FF143128 [0xC0800008] -> 0x81      : replayed @ 3c6
[REPL]         at 0xFF31AC1C:FF143128 [0xC0800008] <- 0x81      : found @ 3c7
[REPL]         at 0xFF31AC34:FF31AC2C [0xC0800010] <- 0x19      : found @ 3c8
[REPL]         at 0xFF31AC3C:FF31AC2C [0xC0800008] <- 0x81      : found @ 3c9
[REPL]         at 0xFF31AC4C:FF143148 [0xC0800008] -> 0x81      : replayed @ 3ca
[REPL]         at 0xFF31AC70:FF143148 [0xC0800008] <- 0x81      : found @ 3cb
[REPL]         at 0xFF31AC80:FF143154 [0xC0800008] -> 0x81      : replayed @ 3cc
[REPL]         at 0xFF31ACA8:FF143154 [0xC0800008] <- 0x81      : found @ 3cd
[REPL]         at 0xFF143164:FF143154 [0xC0800008] -> 0x81      : replayed @ 3ce
[REPL]         at 0xFF14316C:FF143154 [0xC0800008] <- 0x8081    : found @ 3cf
[REPL]         at 0xFF143178:FF143154 [0xC0800014] <- 0x2       : found @ 3d0
[REPL]         at 0xFF143180:FF143154 [0xC0800000] <- 0x0       : found @ 3d1
[REPL]         at 0xFF143188:FF143154 [0xC0800008] -> 0x8081    : replayed @ 3d2
[REPL]         at 0xFF0C7A38:FF0C1680 [0xC0203008] <- 0x8       : found @ 3d3
[REPL]         at 0xFF0C79A8:FF0C1688 [0xC0201010] <- 0xA       : found @ 3d4
[REPL]         at 0xFF0C79AC:FF0C1688 [0xC0201200] -> 0x1       : replayed @ 3d5
[REPL]         at 0xFF0C168C:FF0C1688 [0xC0400018] <- 0x0       : found @ 3d6
[REPL]         at 0xFF0C7AD8:FF0C1698 [0xC0400008] -> 0x430007  : replayed @ 3d7
[REPL]         at 0xFF0C7AE4:FF0C1698 [0xC0400008] <- 0x431007  : found @ 3d8
[REPL]         at 0xFF0C1698:FF0C1698 [0xC0400000] -> 0x80000000: replayed @ 3d9
[REPL]         at 0xFF0C7A68:FF0C16A4 [0xC0210200] <- 0x80000000: found @ 3da
[REPL]         at 0xFF0C7A6C:FF0C16A4 [0xC0210200] -> 0x0       : replayed @ 3db
[REPL]         at 0xFF0C7A9C:FF0C16C8 [0xC0210204] <- 0x2       : found @ 3dc
[REPL]         at 0xFF0C7AB0:FF0C16C8 [0xC0210214] <- 0x3       : found @ 3dd
[REPL]         at 0xFF0C16D0:FF0C16C8 [0xC0210208] <- 0x270F    : found @ 3de
[REPL]         at 0xFF0C7AC4:FF0C16DC [0xC0210210] <- 0x1       : found @ 3df
[REPL]         at 0xFF0C7AC8:FF0C16DC [0xC0210200] <- 0x1       : found @ 3e0
[REPL]      at init:FF0C79A8:FF0C77F8 [0xC0201010] <- 0x2E      : found @ 3e1
[REPL]      at init:FF0C79AC:FF0C77F8 [0xC0201200] -> 0x1       : replayed @ 3e2
[REPL]      at init:FF0C79A8:FF0C7800 [0xC0201010] <- 0x3A      : found @ 3e3
[REPL]      at init:FF0C79AC:FF0C7800 [0xC0201200] -> 0x1       : replayed @ 3e4
[REPL]      at init:FF1432CC:FF31AE98 [0xC05000D0] -> 0x20      : replayed @ 3e5
[REPL]      at init:FF1432D4:FF31AE98 [0xC05000D0] <- 0x20      : found @ 3e6
[REPL]      at init:FF1432E0:FF31AE98 [0xC05000C0] <- 0x4003BF20: found @ 3e7
[REPL]      at init:FF1432EC:FF31AE98 [0xC05000C4] <- 0x80      : found @ 3e8
[REPL]      at init:FF1432F8:FF31AE98 [0xC05000C8] <- 0x4003BF20: found @ 3e9
[REPL]      at init:FF143304:FF31AE98 [0xC05000CC] <- 0x80      : found @ 3ea
[REPL]      at init:FF14330C:FF31AE98 [0xC05000D4] <- 0x0       : found @ 3eb
[REPL]      at init:FF143318:FF31AE98 [0xC05000D8] <- 0x17      : found @ 3ec
[REPL]      at init:FF143324:FF31AE98 [0xC05000D0] <- 0x10023   : found @ 3ed
[REPL]      at init:FF143334:FF31AE98 [0xC0800018] <- 0x3F      : found @ 3f3
[REPL]      at init:FF143340:FF31AE98 [0xC0800014] <- 0x1       : found @ 3f4
[REPL]      at init:FF143348:FF31AE98 [0xC0800018] -> 0x3F      : replayed @ 3f5
[REPL]      at init:FF143350:FF31AE98 [0xC0800018] <- 0x7       : found @ 3f6
[REPL]      at init:FF14335C:FF31AE98 [0xC0800008] <- 0x8881    : found @ 3f7
[REPL]      at init:FF143364:FF31AE98 [0xC0800008] -> 0x8081    : replayed @ 3f8
[REPL]      at init:FF14350C:FF0C78CC [0xC0800014] -> 0x16      : replayed @ 3f9
[REPL]      at init:FF14353C:FF0C78CC [0xC0800014] <- 0x2       : found @ 3fa
[REPL]      at init:FF143548:FF0C78CC [0xC0800000] <- 0x4B      : found @ 3fb
[REPL]      at init:FF14350C:FF0C78CC [0xC0800014] -> 0x14      : replayed @ 3fc
[REPL]      at init:FF143518:FF0C78CC [0xC0800018] -> 0x7       : replayed @ 3fd
[REPL]      at init:FF143520:FF0C78CC [0xC0800018] <- 0x5       : found @ 3fe

Next: interrupu.