Diagnosed and provisionally fixed a quite hard bug in module relocation. It took me a few weeks to understand. This is very significant: it removes a large number of very hard to understand crashes from modules, and exposes other, easier to debug problems.
Module loading in ML works by building ELF objects, which are copied to the card; ML then uses libtcc to load these into memory. The loading process has a relocation step. There were (probably) two problems here. The more important one: libtcc has, I think, a bug in its check for whether a relocation target is too far away. Since modules are built as ARM, not Thumb, the default approach means the target of calls / jumps can only be +-32MB from the call site; an ARM branch instruction encodes a signed 24-bit word offset rather than a full address.
For at least Digic 7 cams, targets can be outside this range; they're typically in the 0xe000.0000 or 0xdf00.0000 areas. On older cams they are at 0xff80.0000 or thereabouts, so the offset can underflow (wrap around) and still be within 32MB. Libtcc tries to check if the target is outside this range, but the error condition doesn't trigger. This was causing relocate_sections() in tccelf.c to fix up the modules with inappropriate relocations. That meant calls from within module code would go to *completely unrelated* addresses, with unpredictable and very hard to diagnose crashes.
As far as I can see, the fact that older cams happen to allow modules to relocate successfully was always luck. If heap allocs had occurred from a different region, or if stub addresses hadn't happened to be "near" to heap addresses via wraparound, it would never have worked.
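To make the range limit and the wraparound case concrete, here is a minimal sketch of a reachability check for an ARM-state branch. It is not the libtcc code; the function name is hypothetical, and the addresses in the note below are made up for illustration. It assumes the displacement is taken relative to the call site plus 8 (the value of PC as seen by an ARM-state instruction) and does the arithmetic modulo 2^32, which is why a target near the top of the address space can still be "near" a low heap address.

```c
#include <stdint.h>
#include <stdbool.h>

/*
 * Hypothetical helper, not part of libtcc.
 * An ARM-state B/BL holds a signed 24-bit word offset, so the byte
 * displacement must lie in [-0x02000000, +0x01fffffc], measured from
 * PC = site + 8.
 */
static bool arm_branch_reachable(uint32_t site, uint32_t target)
{
    /* Unsigned subtraction wraps modulo 2^32, matching what the CPU does. */
    uint32_t disp = target - (site + 8u);

    /* Forward range: 0 .. +0x01fffffc; backward range: -0x02000000 .. -1,
       which as an unsigned value is 0xfe000000 .. 0xffffffff. */
    return disp <= 0x01fffffcu || disp >= 0xfe000000u;
}
```

With a made-up call site at 0x00100000, a target at 0xff810000 is reachable (the displacement wraps to roughly -9MB), while a target at 0xe0000000 is not. A real check would also want to verify that the displacement is a multiple of 4.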
My provisional fix is to build modules with -mlong-calls. This changes the asm output so calls take the "blx r3" form, with the target address loaded into a register first, which allows full 32-bit addresses. This incurs a minor space and perf cost, and is currently untested on old cams.
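As a rough source-level analogy for what -mlong-calls does, the sketch below calls a function through a pointer: the full address sits in a register and the call is an indirect branch, so distance from the call site no longer matters. The names here are hypothetical, and example_stub is only a stand-in for a firmware stub; this illustrates the call shape and is not how ML modules are written.

```c
#include <stdint.h>

typedef int (*far_fn)(int);

/* Stand-in for a firmware stub (hypothetical; a real stub lives at a fixed address). */
static int example_stub(int x)
{
    return x + 1;
}

/* Call via a pointer held in a variable. The volatile qualifier discourages
   the compiler from turning this back into a direct, range-limited branch. */
static int call_via_register(far_fn target, int arg)
{
    far_fn volatile f = target;
    return f(arg);
}
```

The cost mentioned above comes from the extra instruction and literal needed to load the address before each call.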