Stack trace (aka backtrace) in crash logs

Started by a1ex, July 01, 2017, 04:26:49 PM

Previous topic - Next topic

0 Members and 1 Guest are viewing this topic.


Sneak preview of what I'm working on:

at ../../src/stdio.c:44 (streq), task module_task
lv:0 mode:3

module_task stack: ad340 [69c60-1dd3b0]
0x0006DC34 @ 7162c:1dd400
0x0007CF7C @ 6dd3c:1dd3f0
0x00069C00 @ 7cf9c:1dd3e0
0x000AD644 @ 69c5c:1dd3b0

What's the meaning of these codes?

eu-addr2line -s -S --pretty-print -e magiclantern 0x0006DC34 0x0007CF7C 0x00069C00 0x000AD644
entry_guess_icon_type at menu.c:694
streq at stdio.c:43
ml_assert_handler at boot-hack.c:596
backtrace_getstr at backtrace.c:859

eu-addr2line -s -S --pretty-print -e magiclantern 7162c 6dd3c 7cf9c 69c5c
menu_add.part.25+0x100 at menu.c:1212
entry_guess_icon_type+0x108 at menu.c:711
streq+0x20 at stdio.c:44
ml_assert_handler+0x5c at boot-hack.c:605

Putting all together:

menu_add (menu.c:1212) called entry_guess_icon_type (located menu.c:694)
entry_guess_icon_type (menu.c:711) called streq (located at stdio.c:43)
  streq (stdio.c:44) called ml_assert_handler (located at boot-hack.c:605) - that's the ASSERT macro
   ml_assert_handler (boot-hack.c:605) called backtrace_getstr (located backtrace.c:859)

Heh, that backtrace went a little bit too far :)

Note: the above line numbers are valid for this changeset.

Works for Canon code too (but it's unable to figure out indirect calls):

at RscMgr.c:2513, task InnerDevelopMgr
lv:0 mode:3

InnerDevelopMgr stack: ad360 [697d8-19e498]
0xUNKNOWN  @ de48:19e568
0xUNKNOWN  @ 17bbc:19e540
0x000178B4 @ ff139c38:19e528
0xUNKNOWN  @ 178e4:19e518
0xUNKNOWN  @ 1796c:19e4f8
0xFF0F2F14 @ ff301928:19e4e0
0x00001900 @ ff0f2f80:19e4d0
0x000AD664 @ 697d4:19e498

Will post more details after committing the source.

In the mean time, I'd appreciate a small script (easy coding task) that would take a crash log as input (as in the above examples) and create a human-readable output from it (as in the "putting all together" example). To get the debugging info required for name translation, you'll need this changeset.



that canon assert tracing would have saved a lot of work if we had that earlier :D
Help us with datasheets - Help us with register dumps
magic lantern: 1Magic9991E1eWbGvrsx186GovYCXFbppY, server expenses: [email protected]
ONLY donate for things we have done, not for things you expect!


PR open:

How does it work?

It attempts to figure out where return addresses are on the stack (some sort of stack unwinding). On ARM, when a function call is made, the return address is saved in the LR register; with nested calls, LR is pushed on the stack (and restored on return). However, the stack contains a bunch of other stuff, so it's not straightforward to tell what positions on the stack were used for LR.

To find the return address of a function, this code attempts to "emulate" the ARM program from the current position (e.g. where the exception happened) until it gets out of the function. The interesting things to emulate are:
- instructions that modify SP
- direct jumps (they should be followed)
- instructions that restore LR (POP)
- function calls are skipped (they shouldn't modify SP according to the ABI)

Sometimes the "emulation" may get stuck in a loop, without being able to find the end of the function. To find a solution, I've tried to take conditional branches randomly, hoping one of them would be lucky. It helps in some cases, but not much. It's not enabled by default.

After writing all this code, I've found out I might have been reinventing the wheel:
- g3gg0 has an interesting heuristic in gdb.c: check each item and see whether it looks like a valid LR (whether there's a call instruction at that LR); that's worth trying as a backup strategy (edit: DONE).
- this public domain code appears to do something similar. Didn't try it yet. edit: tried it; if anyone would like to see a comparison, just ask.
- stack unwinding for Thumb using the same idea.

However, all the solutions I've found before writing this code required some sort of debugging info (dwarf) or frame pointer (not present in Canon code). Maybe I didn't know what keywords to use.


This is really useful. So cool that I needed to post this just to follow this thread.
5D3.113 | 5D3.123 | EOSM.203 | 7D.203 | 70D.112 | 100D.101 | EOSM2.* | 50D.109