the BUG is back on my camera.
Yes, this is what I'm trying to tell you. All this going back and forth between multiple versions is completely useless. You may get lucky or you may not; if you get lucky, the bug will return sooner or later.
I thought it was fixed.
Please stop speculating whether the bug is fixed or not.
It's not.
To fix the bug, one has to understand what's going on and patch it somehow (emphasis on understand, not "it seems to work"). We need a clear way to reproduce it (a deterministic way, not a probabilistic one like "yesterday it worked and today it doesn't"). So, if you try different settings, do it in order to find a clear pattern for reproducing the bug, not to try to get rid of it.
I've hunted a similar bug in 550D/600D for roughly 1 year. Turns out it was a buffer overflow in Canon code: they were writing a 32-bit 0 in the middle of ML code. There were no side effects without ML, because the 0 was in the middle of an uninitialized malloc area (where ML was loading). With ML, the side effects obviously depended on where exactly that 0 was falling: from not noticeable (writing a 0 on something already 0 or on unused memory), to cropped strings (say it displayed Inte instead of Intervalometer) or weird behavior if the 0 was in the middle of some code (lockup in playback mode, garbled histogram and so on). The deterministic pattern was: (1) the symptom was always the same with a given autoexec.bin and (2) any code added or removed before the affected location (even just a variable declaration) changed the symptom somehow (of course, the symptom had to be visible and known before inserting the test code), but any code added after that location (in the source code) never changed the symptom. This allowed me to narrow down the affected location to a small function; I dumped that part of RAM, compared "before" and "after", and then g3gg0 patched it within one day (remember: it took 1 year to find a pattern and diagnose it). See
https://bitbucket.org/hudson/magic-lantern/commits/99e80332797c53d3b8f3b75b3ad7fe7f608da4c5
So far, my observations are:
- The bug never happens with plain Canon firmware.
- The bug happens with any user code (even with simply jumping to Canon firmware, both autoexec and FIR), with some probability. The autocorrelation is very high (once the bug appears, it's very likely to stay).
- Removing the lens and putting it back will always remove the symptoms.
- Incomplete reboot (power down and power up quickly) will always remove the symptoms.
- When the bug is present, enabling the intervalometer will lock up the camera (battery pull needed).
- Do not confuse the bug with the camera simply failing to focus (that's not a bug).
The above items should be 100% true. If you can prove that any of them is wrong, please do.
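If you want to put a number on the "autocorrelation is very high" observation, here is a minimal sketch. It assumes you keep a log with one entry per boot (1 = bug present, 0 = bug absent). The function name lag1_autocorr and the log format are my own illustration, not something that exists in ML.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Lag-1 sample autocorrelation of a 0/1 sequence (one entry per boot).
 * Hypothetical helper for illustration only.
 * Returns 0.0 when the sequence is constant (variance is zero),
 * since the autocorrelation is undefined in that case.
 */
double lag1_autocorr(const int *x, size_t n)
{
    assert(x != NULL && n >= 2);

    /* Mean of the observations. */
    double mean = 0.0;
    for (size_t i = 0; i < n; i++) {
        mean += x[i];
    }
    mean /= (double)n;

    /* Numerator: products of neighbouring deviations.
       Denominator: sum of squared deviations. */
    double num = 0.0;
    double den = 0.0;
    for (size_t i = 0; i < n; i++) {
        double d = x[i] - mean;
        den += d * d;
        if (i + 1 < n) {
            num += d * (x[i + 1] - mean);
        }
    }

    return den > 0.0 ? num / den : 0.0;
}
```

A clearly positive value means the state tends to stay the same from one boot to the next; a clearly negative value means it tends to flip. This only describes the pattern, it does not explain it, and you need a reasonably long log before the number means anything.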
So, here's my current hypothesis:
Jumping to Canon firmware from the bootloader may not be the best thing to do. One has to understand what exactly the bootloader is doing when it's jumping to Canon firmware on its own, without ML, and what is different when our code is jumping to Canon firmware.
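One way to look for such a difference, in the same spirit as the "before" and "after" RAM comparison from the 550D/600D hunt, is to diff two dumps word by word. Below is a rough sketch that assumes you already have two equally sized dumps of the same region loaded into buffers on a PC; how to obtain the dumps is not covered here. The names word_diff and diff_dumps are made up for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* One 32-bit word that differs between the two dumps. */
struct word_diff {
    size_t   offset;  /* byte offset from the start of the dump */
    uint32_t before;  /* value in the first dump */
    uint32_t after;   /* value in the second dump */
};

/*
 * Compare two dumps of the same length, 32 bits at a time.
 * Stores up to max_out differences in out and returns the total
 * number of differing words (which may be larger than max_out).
 * Trailing bytes that do not form a full word are ignored.
 * Words are read in the byte order of the machine running this tool;
 * that does not matter for spotting changed or zeroed words.
 */
size_t diff_dumps(const uint8_t *before, const uint8_t *after, size_t len,
                  struct word_diff *out, size_t max_out)
{
    assert(before != NULL && after != NULL);
    assert(out != NULL || max_out == 0);

    size_t count = 0;
    for (size_t off = 0; off + 4 <= len; off += 4) {
        uint32_t b, a;
        /* memcpy avoids unaligned access and aliasing problems. */
        memcpy(&b, before + off, sizeof b);
        memcpy(&a, after + off, sizeof a);
        if (a != b) {
            if (count < max_out) {
                out[count].offset = off;
                out[count].before = b;
                out[count].after  = a;
            }
            count++;
        }
    }
    return count;
}
```

In the 550D/600D case, the interesting entries would have been the ones where after is 0 and before is not. For the current bug, nobody knows yet which region (if any) differs, so treat this as a tool for finding a pattern, not as a fix.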
Just a guess: when our code jumps to Canon firmware, lens initialization may be incomplete. Listening to the lens pins may be helpful (if there's a difference in the logs, it may confirm this). We may be able to force a lens re-initialization somehow (since the symptoms disappear when rotating the lens). Be careful though: this will only address the symptoms without understanding the root cause of the bug, so I do not recommend this way (or, if you find a solution with this method, you should not stop researching).