Magic Lantern Forum

Developing Magic Lantern => General Development => Topic started by: dfort on March 11, 2018, 05:04:43 AM

Title: EOSM Shutter-Bug reboot
Post by: dfort on March 11, 2018, 05:04:43 AM
EOS-M Shutter-Bug

One of the peskiest and long lived Magic Lantern bug is the EOSM shutter-bug. The first report of this bug that I could find was made in 2012 (https://www.magiclantern.fm/forum/index.php?topic=3648.msg21088#msg21088). It has been debated to death in the EOS M Alpha shutter-bug discussion  (https://www.magiclantern.fm/forum/index.php?topic=8347.0)and Issue #1893, EOS-M shutter bug (https://bitbucket.org/hudson/magic-lantern/issues/1893/eos-m-shutter-bug) on Bitbucket.

Quick workaround - use the one SD card that has been verified to work consistently, the SanDisk 32GB Extreme PRO SDHC UHS-I. A smaller SanDisk Extreme PRO card is ok but don't use anything larger because pretty much all SDXC cards show the shutter-bug.

Better solution, let's try to find a way to squash this bug.

First of all, how to reproduce this bug.

Lens - Only EF-M zoom lenses, both Canon and third party like the Tamron 18-200mm exhibit the shutter-bug. It is probably not the zoom but more likely when the boot process is reading the image stabilization system that is triggering the bug. Maybe, we don't really know. Lenses that have the collapsing feature also exhibit the shutter-bug if the camera is started with the lens extended.

SD Card - Many cards exhibit the bug but there are a few surprises. For example I've got an old 1GB Kodak card that is super slow but works fine. Anything over 32GB seems to show the bug.

Photo Mode - The shutter bug prevents you from taking a still photo. It doesn't affect the silent still module, only regular JPEG and CR2 shots that need a full shutter press. It also has nothing to do with movie recording.

Starting the camera with the Play button makes no difference. Even holding down the INFO button which on the EOSM prevents ML from loading will show the shutter-bug. However, interrupting the startup process by turning the camera on/off/on curing startup will kill the bug if you do it just right. This puts the camera in some sort of a special "StartupCondition" - This is what we know about StartupCondition so far:

StartupCondition : 1, 0 = camera powered on normally, with the ON/OFF button
StartupCondition : 16, 0 = camera powered on via the Play button
StartupCondition : 4, 0 = on/off/on trick that somehow kills the shutter-bug

Another way to kill the shutter-bug is by either twisting the lens on and off in order to break the lens/body connection or if the lens has the collapsing feature simply collapsing and extending the lens also works.

Experienced EOSM ML users have learned to live with the shutter-bug but it continues to surprise new users. Developers have tried multiple times to figure this out but now we have some more powerful tools that we can use.

I'm not sure if this time we'll be able to do it. This will be more of a test of what we've learned over the past several years and what the new tools reveal. Here's a suggestion where to start:

Quote from: a1ex on March 10, 2018, 01:09:57 AM
If the issue can be reproduced outside LiveView - for example, with a call("Release") somewhere at startup - that's a good candidate for trying in QEMU. I have some local changes that emulate the photo capture process to some extent - at least it's making the screen black and starts capturing the image. That should be enough for diagnosing the issue, as my understanding is that, when this bug happens, the camera doesn't even start capturing the image (maybe it also locks up?)

We'll need two logs with the above scenario, with the camera starting outside LiveView and trying to capture an image. MPU messages should be enabled (they are by default) - that's the "driving force" behind photo capture (and that's what my local changes are all about - MPU messages from photo capture logs, assigned to different hotkeys). One log should be successful and the other should have the bug.

If there is some workaround (maybe removing the lens and putting it back?), that's also worth logging.

First test - using the io_trace branch I ran a startup log holding down both the Play button so it would start outside of LiveView and the INFO button so it wouldn't load ML. I ran this on both a card that is known to always show the shutter-bug and a SanDisk 32GB Extreme PRO SDHC UHS-I card. Then I looked up some addresses for lens properties and found something interesting:

src/property.h
#define PROP_LENS               0x80030011 // info about lens? flags?

card with shutter-bug log (https://pastebin.com/7vaKC1eZ)
PROP:0x80030011,10502801

card without shutter-bug log (https://pastebin.com/6XMUXbCD)
PROP:0x80030011,10522a01

Don't know if this is significant. I was careful not to change anything between the tests and the logs are very similar. So if this register holds some lens information or flags, it looks like there is something going on here. Perhaps figuring out how to re-read the lens information/flags at the end of the boot process might be a way to eliminate the shutter-bug?
Title: Re: EOSM Shutter-Bug reboot
Post by: dfort on March 11, 2018, 05:58:28 AM
Tried anther test with the io_trace build only this time logging events instead of just a startup. What happens if the 11-22mm lens along with the card that exhibits the shutter-bug is started with the lens collapsed and we log the action of extending the lens. This kills the shutter-bug but what is going on with that interesting lens property? Well, it shows up several times on the log:

EOSM extend 11-22mm lens log (https://pastebin.com/G9kJv2mq)
copyDataToStorage eventID(0x80030011)Data(11740044)size(31)

So it is copying lens information to storage? This showed up on once in each of the logs I have posted.
Title: Re: EOSM Shutter-Bug reboot
Post by: dfort on March 11, 2018, 08:06:02 AM
Ok--one more test before calling it a night.

Log shooting a CR2 on both a card without the shutter-bug (https://pastebin.com/NzyvLj9K) and a card with the shutter-bug (https://pastebin.com/TUzdnsSc). Of course the card with the shutter-bug blocked the shot but it also logged some ptpPropertyChangeEvent events that didn't show up on the other log.

ptpPropertyChangeEvent[80000028][4][87][0]
ptpPropertyChangeEvent[80030013][4][0][0]


So what are these properties?

src/property.h
#define PROP_PICTURE_STYLE 0x80000028   // 0x81 = std, 82 = portrait, 83 = landscape, 84 = neutral, 85 = faithful, 86 = monochrome, 21 = user 1, 22 = user 2, 23 = user 3
#define PROP_BATTERY_CHECK 0x80030013


The camera was set to Auto Picture Style so it makes sense that it changes, but to 87? Looks like that one isn't a known picture style.

[EDIT] Just out of curiosity I ran the test again with the Picture Style set to Standard and it did come up with 81 so that's good--though why would it be a property change when taking a picture?
Title: Re: EOSM Shutter-Bug reboot
Post by: a1ex on March 11, 2018, 12:18:19 PM
Last night I've cleaned up a script that annotates this kind of logs (https://www.magiclantern.fm/forum/index.php?topic=17596.msg198329#msg198329), printing the property ID and GUI event names for most of the MPU messages.

Noticed a bug: the log files must have Unix-style newlines, otherwise the script won't work well. The logs from here seem to be with Windows-style newlines.

Tried to diff the CR2 logs, but LiveView is way too noisy. Please capture the same info with call("Release") somewhere at startup, without entering LiveView. For example, with CONFIG_DEBUG_INTERCEPT_STARTUP, locate the 20-second blinks that indicate the experiment is running:


#ifdef CONFIG_DEBUG_INTERCEPT_STARTUP
    info_led_blink(20,500,500);
    debug_intercept();
#endif


and change to:

#ifdef CONFIG_DEBUG_INTERCEPT_STARTUP
    info_led_blink(10,500,500);
    beep();
    DryosDebugMsg(0, 0, "*** Attempting to take a picture...");
    call("Release");
    info_led_blink(10,500,500);
    debug_intercept();
#endif


What I could tell from the two CR2 logs was that both logs have the SW1 (half-shutter) event, but only one has SW2 (full shutter press or capture start, an event initiated by the MPU).

cr2-nobug-vs-bug.html (http://a1ex.magiclantern.fm/bleeding-edge/EOS-M/cr2-nobug-vs-bug.html) (too noisy)

Diff'ed the two startup logs:
startup-nobug-vs-bug.html (http://a1ex.magiclantern.fm/bleeding-edge/EOS-M/startup-nobug-vs-bug.html)

- 02 12 is a group of properties (https://bitbucket.org/hudson/magic-lantern/src/qemu/contrib/qemu/eos/mpu_spells/known_spells.py?fileviewer=file-view-default#known_spells.py-172) related to lens (sent by MPU to main CPU)
- 09 2e is some property related to LiveView (sent by MPU to main CPU); seems to arrive with some delay (edit: 80050045 PROP_LV_FOCAL_DISTANCE)
- PROP_AVAIL_SHOT: likely given by free space on the card (it's sent by main CPU to MPU)
- PROP_POWER_KIND appears in a different order (multitasking during EOS startup - PowerShot firmware doesn't do that)
- 03 25: no idea, also out of order
- 02 15 is PROP_LENS
- 06 05 06 12 00 00 is some unidentified GUI event (GUICMD_CLOSE_SLOT_COVER)

Note: both startup logs were created with: StartupCondition : 4, 0:

QuoteStartupCondition : 4, 0 = on/off/on trick that somehow kills the shutter-bug

Shouldn't they be created with this one instead?

QuoteStartupCondition : 16, 0 = camera powered on via the Play button




The above logs can be used in QEMU.

1) convert them with extract_init_spells.py (https://bitbucket.org/hudson/magic-lantern/src/qemu/contrib/qemu/eos/mpu_spells/extract_init_spells.py)


python extract_init_spells.py EOSM-start-bug-a.log > EOSM-bug.h
python extract_init_spells.py EOSM-start-nobug-a.log > EOSM-nobug.h


That way, I can switch between the two by renaming the header used in mpu.c .

This will allow comparing the different properties and figuring out what they might be.

2) run ML and try to print the contents of PROP_LENS, for example:

PROP_HANDLER(PROP_LENS)
{
    uint8_t* info = (uint8_t *) buf;

    qprintf("PROP_LENS: ");
    for (int i = 0; i < len; i++)
    {
        qprintf("%02x ", info[i]);
    }
    qprintf("\n");
    ...



(./run_canon_fw.sh EOSM,firmware="boot=1" -s -S & arm-none-eabi-gdb -x EOSM/patches.gdb) |& grep -a PROP_LENS

# good
[MPU] Sending : 20 1f 03 24 45 46 2d 4d 31 31 2d 32 32 6d 6d 20 66 2f 34 2d 35 2e 36 20 49 53 20 53 54 4d 00 00  (PROP_LENS_NAME)
[MPU] Sending : 24 23 03 15 01 2a 52 10 33 00 0b 00 16 90 53 9a 7c 08 ff 00 00 00 00 00 00 00 00 49 04 01 00 00 37 2a 01 00  (PROP_LENS)
[MPU] Sending : 24 23 03 15 01 2a 52 10 33 00 0b 00 16 90 53 9a 7c 08 ff 00 00 00 00 00 00 00 00 49 04 01 00 00 37 2a 01 00  (PROP_LENS)
PROP_LENS: 01 2a 52 10 33 00 0b 00 16 90 53 9a 7c 08 ff 00 00 00 00 00 00 00 00 49 04 01 00 00 37 2a 01

# bad
[MPU] Sending : 20 1f 03 24 45 46 2d 4d 31 31 2d 32 32 6d 6d 20 66 2f 34 2d 35 2e 36 20 49 53 20 53 54 4d 00 00  (PROP_LENS_NAME)
[MPU] Sending : 24 23 03 15 01 28 50 10 33 00 0b 00 16 90 53 9a 7c 08 ff 00 00 00 00 00 00 00 00 49 04 01 00 00 37 29 01 00  (PROP_LENS)
[MPU] Sending : 24 23 03 15 01 28 50 10 33 00 0b 00 16 90 53 9a 7c 08 ff 00 00 00 00 00 00 00 00 49 04 01 00 00 37 29 01 00  (PROP_LENS)
PROP_LENS: 01 28 50 10 33 00 0b 00 16 90 53 9a 7c 08 ff 00 00 00 00 00 00 00 00 49 04 01 00 00 37 29 01


The two differences are raw_aperture_min and raw_aperture_max (f4.3-f24 vs f4-f22). Likely different focal length - that one is taken from PROP_LV_LENS. Though, in QEMU, using the startup logs only, focal length is reported as 1408 (ignored by ML).

Focal length is reported in LiveView, in the CR2 logs, but there are a couple of different values (showing the lens was probably zoomed in and out during the experiment).

3) Try to identify properties from unknown MPU messages (https://www.magiclantern.fm/forum/index.php?topic=17596.msg195296#msg195296):


# properties
if 1
  b *0xFF11FB04
  prop_request_change_log

  b *0xFF31F6F0
  mpu_analyze_recv_data_log

  b *0xFF31CE8C
  prop_lookup_maybe_log

  b *0xFF325994
  mpu_prop_lookup_log
end


4) ... ??




StartupCondition is a property that comes from the MPU:

*** mpu_recv(06 05 03 20 04 00), from 1dedc                        ; PROP_STARTUP_CONDITION


Can be easily faked on the main CPU side. Guess: maybe it skips sensor cleaning if camera is powered often?


# edit PROP_STARTUP_CONDITION in mpu_spells/EOSM.h (values: 0x01, 0x04, 0x10)
# we need -icount for deterministic emulation (so the stuff that appears in logs will be in the same order)
(./run_canon_fw.sh EOSM,firmware="boot=1" -s -S -d debugmsg -icount 5 & arm-none-eabi-gdb -x EOSM/patches.gdb) &> startup-condition-01
(./run_canon_fw.sh EOSM,firmware="boot=1" -s -S -d debugmsg -icount 5 & arm-none-eabi-gdb -x EOSM/patches.gdb) &> startup-condition-04
(./run_canon_fw.sh EOSM,firmware="boot=1" -s -S -d debugmsg -icount 5 & arm-none-eabi-gdb -x EOSM/patches.gdb) &> startup-condition-10



startup-condition-01-vs-04.html (http://a1ex.magiclantern.fm/bleeding-edge/EOS-M/startup-condition-01-vs-04.html) (looks like - START_AS_CHECK is sensor cleaning)
startup-condition-01-vs-10.html (http://a1ex.magiclantern.fm/bleeding-edge/EOS-M/startup-condition-04-vs-10.html) (16 decimal)

Long shot: what if you disable sensor cleaning?




Some more tests I'd like to run:

1) to exclude differences in SD card contents (including, but not limited to, card size):
- prepare the smaller SD card out of these two (format, make bootable, copy logging code)
- create a card image (dd if=/dev/sdcard of=sd.img bs=1M)
- copy the image to the large card (dd if=sd.img of=/dev/sdcard bs=1M)
- run the logging experiment from both cards (starting with the same initial conditions)

2) to exclude timing differences while loading autoexec.bin:
- compile with CONFIG_MARK_UNUSED_MEMORY_AT_STARTUP
- this will slow down the boot process by a few seconds
- the slowdown happens before the MPU is initialized (06 04 02 00 00)
- it's likely some times larger than any timing difference caused by different card speeds
- I should probably think of a way to get repeatable timings (maybe by delaying certain debug messages until some preset value)

3) can the bug be reproduced without autofcus? (if yes, turn it off)
Title: Re: EOSM Shutter-Bug reboot
Post by: dfort on March 11, 2018, 06:29:25 PM
Quote from: a1ex on March 11, 2018, 12:18:19 PM
Noticed a bug: the log files must have Unix-style newlines, otherwise the script won't work well. The logs from here seem to be with Windows-style newlines.

Darn it, just found out that Pastebin is doing that. Can't find a way to change it--is it a problem? Should I find another service to post my logs?

Thanks for the detailed post. This will keep me busy for a while.

Quote from: a1ex on March 11, 2018, 12:18:19 PM
Note: both startup logs were created with: StartupCondition : 4, 0:

Interesting--I pressed both the Play and INFO button on startup so it would start outside of LiveView and not load ML yet the shutter-bug still showed up. So much for StartupCondition having something to do with killing the bug.

[EDIT] Thought I'd try this:

Quote from: a1ex on March 11, 2018, 12:18:19 PM
Long shot: what if you disable sensor cleaning?

Didn't kill the bug. I also tried "Clean now" because the EOSM displays the sensor cleaning only on shutdown but even though it triggered the shutter during cleaning the shutter-bug was still present after sensor cleaning.
Title: Re: EOSM Shutter-Bug reboot
Post by: a1ex on March 11, 2018, 06:44:03 PM
Replace "\n" with "\r\n" in the script (will commit it later).
Title: Re: EOSM Shutter-Bug reboot
Post by: dfort on March 11, 2018, 11:42:33 PM
Started creating logs and processing them. Got something worth sharing:

https://www.dropbox.com/sh/j59xlz1qd1yelen/AABiu9QYITsJ-8RYBQiQZFnta?dl=0

First of all, the changes I did to io_trace (love being able to read this from autoexec.bin):

Magic Lantern Nightly.2018Mar11.EOSM202
Camera   : EOSM
Firmware : 202
Changeset: 2838aae45890+ (io_trace)
Built on : 2018-03-11 20:56:53 by rosiefort@RosieFoComputer

diff -r 2838aae45890 src/boot-hack.c
--- a/src/boot-hack.c Tue Feb 13 21:36:23 2018 +0100
+++ b/src/boot-hack.c Sun Mar 11 13:56:54 2018 -0700
@@ -25,6 +25,7 @@
  * Boston, MA  02110-1301, USA.
  */

+#include "beep.h"
#include "dryos.h"
#include "config.h"
#include "version.h"
@@ -528,7 +529,11 @@


#ifdef CONFIG_DEBUG_INTERCEPT_STARTUP
-    info_led_blink(20,500,500);
+    info_led_blink(10,500,500);
+    beep();
+    DryosDebugMsg(0, 0, "*** Attempting to take a picture...");
+    call("Release");
+    info_led_blink(10,500,500);
     debug_intercept();
#endif
}
diff -r 2838aae45890 src/dm-spy.c
--- a/src/dm-spy.c Tue Feb 13 21:36:23 2018 +0100
+++ b/src/dm-spy.c Sun Mar 11 13:56:54 2018 -0700
@@ -112,7 +112,7 @@

     /* only log messages from LiveView tasks until the buffer is half-full
      * (these are very verbose) */
-    if (lv && len > buf_size / 2)
+    if (lv)
     {
         if (streq(current_task->name, "Evf") ||
             streq(current_task->name, "Epp") ||
diff -r 2838aae45890 platform/EOSM.202/internals.h
--- a/platform/EOSM.202/internals.h Tue Feb 13 21:36:23 2018 +0100
+++ b/platform/EOSM.202/internals.h Sun Mar 11 13:56:54 2018 -0700
@@ -138,7 +138,7 @@
//~ #define CONFIG_AUDIO_CONTROLS

/** Patch LiveViewApp dialog hander to hide Canon bottom bar */
-#define CONFIG_LVAPP_HACK_PATCH
+// #define CONFIG_LVAPP_HACK_PATCH

/** Workaround for menu timeout in LiveView */
#define CONFIG_MENU_TIMEOUT_FIX


CONFIG_DEBUG_INTERCEPT_STARTUP = y


The test wouldn't run when pressing Play and INFO so I used only the Play button. Lights flashed two logs were saved and in both cases the second log showed the "*** Attempting to take a picture..." message so I processed the second logs.

Found some interesting differences but that's about as far as I got.
Title: Re: EOSM Shutter-Bug reboot
Post by: a1ex on March 12, 2018, 12:35:45 AM
The SW1 event is near the end of the log; may be worth increasing the log size (default is 512K, but many models accept 2MB or more). May take some trial and error.

Comparing MPU messages gives pretty much the same pattern as before (e.g. 06 04 09 2d 00 appears early on the good log). First difference is on some mpu_recv (the MPU decides to send different data, for some reason).
Title: Re: EOSM Shutter-Bug reboot
Post by: dfort on March 12, 2018, 06:22:04 AM
Saw you made some tweaks to the io_trace_full branch so I jump on that and ran the same test again only this time pretending that the EOSM is a real grownup camera:

src/dm-spy.c
/* plenty of memory available */
-#if defined(CONFIG_5D2) || defined(CONFIG_5D3) || defined(CONFIG_50D) || defined(CONFIG_500D) || defined(CONFIG_100D) || defined(CONFIG_70D)
+#if defined(CONFIG_5D2) || defined(CONFIG_5D3) || defined(CONFIG_50D) || defined(CONFIG_500D) || defined(CONFIG_100D) || defined(CONFIG_70D) || defined(CONFIG_EOSM)
#define BUF_SIZE_MALLOC (2048*1024)
#endif


It worked! Well, sort of. It can run the test as long as the camera is started via the play button. Starting "normally" will hang the camera but that isn't anything new (https://www.magiclantern.fm/forum/index.php?topic=2388.msg196998#msg196998).

One thing I was interested in was "ALERT DELIVER ARRAY" because it shows up in the "good" log but not in the shutter-bug log on my earlier test. It looks like this is located around ff11fc00 and it is a part of the EOSM _prop_request_change stub that is apparently never called when the shutter-bug is present. However, the new logs are much larger and both logs (shutter-bug and no shutter-bug) have ALERT DELIVER ARRAY. So much for that theory.

The io_trace_full logs are in the same Dropbox location (https://www.dropbox.com/sh/j59xlz1qd1yelen/AABiu9QYITsJ-8RYBQiQZFnta?dl=0) as the earlier logs.

By the way, it is easy to figure out which log belongs to which SD card because the 32GB card is an SDHC card formatted as FAT 32 and has all uppercase filenames while the card with the shutter-bug is a 64GB SDXC card formatted as exFAT and has lowercase filenames. Note that it isn't just the file format that is the problem though I'm pretty sure that all SDXC cards have the bug while only a few SDHC cards don't.

In any case, this seems like an interesting experiment:

Quote from: a1ex on March 11, 2018, 12:18:19 PM
1) to exclude differences in SD card contents (including, but not limited to, card size):
- prepare the smaller SD card out of these two (format, make bootable, copy logging code)
- create a card image (dd if=/dev/sdcard of=sd.img bs=1M)
- copy the image to the large card (dd if=sd.img of=/dev/sdcard bs=1M)
- run the logging experiment from both cards (starting with the same initial conditions)

I'm on a Mac which is slightly different (https://www.cyberciti.biz/faq/how-to-create-disk-image-on-mac-os-x-with-dd-command/) but dd is running now. Don't know if it is a Mac thing but it is taking a long time.
Title: Re: EOSM Shutter-Bug reboot
Post by: a1ex on March 12, 2018, 10:26:16 AM
Yeah, I fixed some bugs related to what I was studying (emulation-related stuff). There were some missing messages in these huge logs (https://www.magiclantern.fm/forum/index.php?topic=2388.msg197313#msg197313), so I updated the first one.

Something interesting: both logs are showing LiveView frames (uppercase: 19 frames; lowercase: 14 frames). Look for ">>> INT-6Ah" (HEAD1), 6B (HEAD2, missing, present on 5D3), D9 (HEAD3), E0 (HEAD4) and check the timestamps.

For example:

cat DM-0001.LOG | grep ">>> INT-6Ah" | cut -c 1-6,8-10 | gnuplot -p -e 'plot "/dev/stdin"'


Average delta 23.248 ms (43 FPS?!) +/- 0.13 us.

(TODO: I should take the interrupt names from here (https://www.magiclantern.fm/forum/index.php?topic=21526) and fill in the gaps; enabling the register_interrupt stub will also name some of them)

The switch to LiveView happens shortly after call("Release") - at MPU message 06 05 04 00 00 (half-shutter).

Can we find a way to take a picture without half-shutter? (maybe by triggering PROP_REMOTE_SW2 directly). Will look into it.
Title: Re: EOSM Shutter-Bug reboot
Post by: dfort on March 12, 2018, 04:58:37 PM
Got the 64GB exFAT card that reliably shows the shutter-bug formatted to 32GB FAT 32 using dd. Now I've got a 32GB card that reliably shows the shutter-bug. Yes, this card still has the shutter-bug. Ran the same io_trace_full test and saved the log files in the same Dropbox folder (https://www.dropbox.com/sh/j59xlz1qd1yelen/AABiu9QYITsJ-8RYBQiQZFnta?dl=0).

Something interesting here--the DM-0000.LOG file is much smaller than on the other tests and it is missing "mpu_recv" messages that are on the other logs.

Ran gnuplot on the various io_trace_full startup logs and this is what I got:

32GB card without the shutter-bug
(https://farm5.staticflickr.com/4794/38959466220_cdb5bd8d3e.jpg) (https://flic.kr/p/22mHrYQ)

64GB card with shutter-bug
(https://farm5.staticflickr.com/4797/40059880494_4b1370df7d.jpg) (https://flic.kr/p/242XmNJ)

64GB card formatted to 32GB shutter-bug
(https://farm5.staticflickr.com/4796/25898715357_a57d56c237.jpg) (https://flic.kr/p/FszFM8)

Don't know what they mean but graphs are always fun to look at.
Title: Re: EOSM Shutter-Bug reboot
Post by: dfort on March 12, 2018, 05:57:23 PM
Had a crazy idea -- is a firmware dump done while the shutter-bug is active different from a firmware dump done without the shutter-bug?

MD5 (shutter-bug/ROM0.BIN) = 39e8ee840f543fd2e5312bb9ea54a153
MD5 (no-shutter-bug/ROM0.BIN) = 99d5e249bdccaded63fa1acf669cc90f

MD5 (shutter-bug/ROM1.BIN) = b85842898bb3ff1328dbf83047bba1c4
MD5 (no-shutter-bug/ROM1.BIN) = 4a068761b41e61bd0e2741526d8795da


Same camera, same conditions so shouldn't they match? I'll see if this is repeatable then do a disassembly and see what changed. Uh--how do you do those cool html diff files? My diff fu is dismal.

diff -q shutter-bug/ROM1.BIN.dis no-shutter-bug/ROM1.BIN.dis
Files shutter-bug/ROM1.BIN.dis and no-shutter-bug/ROM1.BIN.dis differ
Title: Re: EOSM Shutter-Bug reboot
Post by: a1ex on March 12, 2018, 06:18:39 PM
ROMs are reflashed at each shutdown with updated Canon settings (from menu or internal), so I don't expect repeatability here.

https://bitbucket.org/hudson/magic-lantern/pull-requests/825/prevent-canon-settings-from-being-saved
https://www.magiclantern.fm/forum/index.php?topic=19369

Warning: huge can of worms.




The graphs show timestamps for each LiveView frame (X = frame ID, Y = timestamp in seconds, with microsecond resolution).

The frame rate might be bumped to 43 FPS during autofocus, and gets back to 30 (guess) when autofocus finishes. Does ML's frame rate indicator agree with this?




HTML diffs:

#!/usr/bin/python2
import difflib
import os, sys

a = open(sys.argv[1]).readlines()
b = open(sys.argv[2]).readlines()

a = [l[:100] for l in a]
b = [l[:100] for l in b]

c = open("diff.html", "w")

h = difflib.HtmlDiff()
print >> c, h.make_file(a, b)


Gets in trouble in some tricky cases (doesn't always match the unchanged lines well). Very slow on large files.

May require preprocessing (to remove unimportant stuff that may be different - timestamps, code addresses etc)

Alternatives:
colordiff -y (very fast; console only; can be converted (http://skramm.blogspot.com/2014/05/subversion-colordiff-fo-html-files.html), but...)
gvimdiff before.log after.log -c "colorscheme garden" -c TOhtml -c "w! diff.html" -c q! -c q! -c q!  (looks ugly)
kdiff3 and meld seem to work too (no idea how to save as HTML though)
Beyond Compare gives good results in tricky cases (matches unchanged lines very well), but it's not free.
Pretty sure there are a lot of other tools around.
Title: Re: EOSM Shutter-Bug reboot
Post by: dfort on March 12, 2018, 09:16:04 PM
Quote from: a1ex on March 12, 2018, 06:18:39 PM
Warning: huge can of worms.

Figures. Thought I had something there.

Here's a one-liner that works pretty good but of course I can't share the output:

diff -y -W250 shutter-bug/ROM1.BIN.dis no-shutter-bug/ROM1.BIN.dis | expand | grep -E -C3 '^.{123} [|<>]( |$)' | colordiff | aha >mydiff.html

Had to install colordiff and aha. Yeah aha is the name of the program. Well, at least my diff fu has improved.




Quote from: a1ex on March 12, 2018, 06:18:39 PM
The graphs show timestamps for each LiveView frame (X = frame ID, Y = timestamp in seconds, with microsecond resolution).

The frame rate might be bumped to 43 FPS during autofocus, and gets back to 30 (guess) when autofocus finishes. Does ML's frame rate indicator agree with this?

The frame rate indicator changes during autofocus? Wow it does! Never noticed that before. As you know on standby the fps indicator shows 29.97 (approx. it bounces around) on most settings except for Movie crop mode where it shows the setting that corresponds with the Canon menu (1920x1080 30/25/24).

Ok--so here it is in movie mode on standby:
(https://farm5.staticflickr.com/4798/39878909145_449a566562.jpg) (https://flic.kr/p/23KXQk4)

During focus (bottom display trick working nicely):
(https://farm5.staticflickr.com/4788/40063956034_3232e52007.jpg) (https://flic.kr/p/243jfjE)

Finally, it shows the "real" fps only when recording H.264:
(https://farm5.staticflickr.com/4778/25902935457_85e3081bd7.jpg) (https://flic.kr/p/FsXjgt)




Quote from: a1ex on March 11, 2018, 12:18:19 PM
The two differences are raw_aperture_min and raw_aperture_max (f4.3-f24 vs f4-f22). Likely different focal length - that one is taken from PROP_LV_LENS. Though, in QEMU, using the startup logs only, focal length is reported as 1408 (ignored by ML).

Focal length is reported in LiveView, in the CR2 logs, but there are a couple of different values (showing the lens was probably zoomed in and out during the experiment).

Maybe on the first tests. On my later tests I was very careful not to bump anything. I also made sure that WB and ISO were not on auto and set Focus on manual. That's why I was a little surprised when you mentioned the fps changing during autofocus.
Title: Re: EOSM Shutter-Bug reboot
Post by: dfort on March 13, 2018, 08:45:25 PM
Ran some more tests today. Log files located in this Dropbox folder:

https://www.dropbox.com/sh/hwfpnsz7ki1yowu/AABYKwFaP-UYtBzeiSFsKcOLa?dl=0

First of all:

Quote from: a1ex on March 11, 2018, 12:18:19 PM
3) can the bug be reproduced without autofcus? (if yes, turn it off)

Yes and all the recent tests have been run using the MF setting.

Quote from: a1ex on March 11, 2018, 12:18:19 PM
2) to exclude timing differences while loading autoexec.bin:
- compile with CONFIG_MARK_UNUSED_MEMORY_AT_STARTUP
- this will slow down the boot process by a few seconds
- the slowdown happens before the MPU is initialized (06 04 02 00 00)
- it's likely some times larger than any timing difference caused by different card speeds
- I should probably think of a way to get repeatable timings (maybe by delaying certain debug messages until some preset value)

Ran this with the io_trace_full branch so there's lots of information in that log. I only ran it on the card with the shutter-bug because I figured that the idea here was that if we slow down the boot process maybe that will kill the shutter-bug. It didn't but I do have another large startup log file to look at.

Another test I thought would help was twisting the lens off and on. This is one of the oldest and most reliable workarounds for cards that exhibit the shutter-bug. Tried it both on the card with the shutter-bug as well as the one without the shutter-bug. I wanted to see if there was any difference and of course every line is different on these logs. In fact I ran the test multiple times and every time it was different. Something interesting is the area around cls_task because when the lens is twisted back on the screen goes black for a moment. Maybe this is the point where the lens firmware is scanned? My prediction is that the solution is going to be figuring out a way to reload the lens firmware because the shutter-bug is only happening on CPU lenses that have IS so there's quite a bit of code in the lens firmware as opposed to the simpler EF-M 22mm prime lens that doesn't have IS and doesn't exhibit the shutter-bug.
Title: Re: EOSM Shutter-Bug reboot
Post by: a1ex on March 13, 2018, 09:18:42 PM
Quote from: dfort on March 13, 2018, 08:45:25 PM
I only ran it on the card with the shutter-bug because I figured that the idea here was that if we slow down the boot process maybe that will kill the shutter-bug.

Or maybe the large delay might cause the shutter bug on the good card - who knows.

Although I still believe there might be something in the bootloader (since there were minimal autoexec.bin and FIRs that did nothing other than jumping to Canon firmware - one instruction in ASM - but still showed the bug). One of my earlier tests - which IIRC failed - was to re-run the entire bootloader with boot flag disabled (to simulate a software reset). Maybe I should revisit it.
Title: Re: EOSM Shutter-Bug reboot
Post by: dfort on March 13, 2018, 09:36:16 PM
Quote from: a1ex on March 13, 2018, 09:18:42 PM
Or maybe the large delay might cause the shutter bug on the good card - who knows.

I didn't think of that. Ok, fair enough--ran the same test on the Sandisk 32GB card that has never exhibited the shutter-bug and surprise! Shutter-bug! Log uploaded to today's test folder (https://www.dropbox.com/sh/hwfpnsz7ki1yowu/AABYKwFaP-UYtBzeiSFsKcOLa?dl=0).

[EDIT] Ran the test again, this time on the current unified branch with the only change being to define CONFIG_MARK_UNUSED_MEMORY_AT_STARTUP in config-defines.h. Shutter-bug. So now we know how to put the shutter-bug on the one card that has consistently worked for us.

So is this the step back we need to take before running forward? In animation we call this "anticipation."

(https://farm5.staticflickr.com/4773/40794318701_467c4a115f.jpg) (https://flic.kr/p/259RxGr)
Title: Re: EOSM Shutter-Bug reboot
Post by: a1ex on March 13, 2018, 09:49:27 PM
Wow - first sign that we might be getting closer?!

Binary search - reduce the size argument for memset64 until you find the threshold. There may be some range of values where the bug appears with some probability - finding that interval could be useful.

TODO: I should find a way to measure the time spent in bootloader (before loading autoexec.bin). ARMv7 has performance counters, but I couldn't find any on ARM946E-S; however, our processor appears to have some extensions borrowed from future CPUs, so it might be worth trying some stuff blindly. Re-reading autoexec.bin from bootloader code (including SD card re-initialization) might give a good approximation.

BTW - these two logs, with CONFIG_MARK_UNUSED_MEMORY_AT_STARTUP, don't have any interesting messages - they only have the interrupts (which are not very interesting btw - you can comment them out). The messages from MPU are the most important here - don't disable them.
Title: Re: EOSM Shutter-Bug reboot
Post by: dfort on March 13, 2018, 10:05:20 PM
Updated my previous post.  :D

Quote from: a1ex on March 13, 2018, 09:49:27 PM
The messages from MPU are the most important here - don't disable them.

Hum--I didn't disable them. Using the same options as the previous tests that showed the MPU messages.

So what am I supposed to do here?

src/boot-hack.c
#ifdef CONFIG_MARK_UNUSED_MEMORY_AT_STARTUP
    memset64(0x00D00000, 0x124B1DE0 /* RA(W)VIDEO*/, 0x1FE00000 - 0x00D00000);
#endif


Would it be best to work on the unified branch for this or is there a different branch I should be using?
Title: Re: EOSM Shutter-Bug reboot
Post by: a1ex on March 13, 2018, 10:28:42 PM
There is an option that filters out messages not starting with lowercase e - probably taken from here (https://www.magiclantern.fm/forum/index.php?topic=21617.msg197312#msg197312).

(0x1FE00000 - 0x00D00000) / 2 = 0xF880000.

http://www.101computing.net/my-first-python-game-guess-the-number/

Any branch should be fine; there might be a small difference in results if you are also logging the startup messages somehow, but otherwise I don't expect any difference.
Title: Re: EOSM Shutter-Bug reboot
Post by: dfort on March 14, 2018, 06:59:03 AM
I played that divide by 2 game many times and it has always worked for me but this time it was rather frustrating. Partially because of the interruptions during testing and partially because there seems to be this range where the shutter-bug sometimes shows up and sometime it doesn't. This is probably the zone where several people have reported the bug but couldn't reliably reproduce it.

Here are my notes to find the sweet spot--or rather the bitter spot:

0x0000000 - no shutter-bug
0xF880000 - shutter-bug
0x7C40000 - shutter-bug
0x3E20000 - shutter-bug
0x1F10000 - shutter-bug
0x0F88000 - shutter-bug
0x07C4000 - no shutter-bug
0x0BA6000 - shutter-bug
0x09B5000 - no shutter-bug
0x0AAD800 - no shutter-bug
0x0B29C00 - no shutter-bug
0x0B67E00 - shutter-bug
0x0B48D00 - no shutter-bug
0x0B58580 - shutter-bug
0x0B50940 - no shutter-bug
0x0B54760 - shutter-bug
0x0B52850 - shutter-bug
0x0B518C8 - shutter-bug
0x0B51104 - shutter-bug
0x0B50D22 - shutter-bug
0x0B50B31 - no shutter-bug
0x0B50C29 - shutter-bug
0x0B50BAD - no shutter-bug
0x0B50BEB - shutter-bug
0x0B50BCC - shutter-bug
0x0B50BBC - no shutter-bug
0x0B50BC3 - shutter-bug
0x0B50BBF - shutter-bug
0x0B50BBD - shutter-bug
0x0B50BBB - shutter-bug
0x0B50BBC - shutter-bug
0x0B50BBB - shutter-bug
0x0B50BBA - shutter-bug
0x0B50BB9 - no shutter-bug
0x0B50BBA - no shutter-bug
0x0B50BBB - Sometimes yes, sometimes no


So it looks like the shutter bug shows up somewhere around 0x0B50BB9 to 0x0B50BBC. At this point it was pretty much a coin toss whether or not the shutter-bug would show up.

Used the unified branch for this test. Only changes were to define CONFIG_MARK_UNUSED_MEMORY_AT_STARTUP in src/config-defines.h and change the value here:

src/boot-hack.c
Quote#ifdef CONFIG_MARK_UNUSED_MEMORY_AT_STARTUP
    memset64(0x00D00000, 0x124B1DE0 /* RA(W)VIDEO*/, 0x0B50BBB);
#endif
Title: Re: EOSM Shutter-Bug reboot
Post by: a1ex on March 14, 2018, 10:56:45 AM
That's quite a narrow range - I've expected to be a coin toss for some hundreds or thousands of units.

You can convert this to seconds after running the memory benchmark - you may need this PR (https://bitbucket.org/hudson/magic-lantern/pull-requests/906/memory-backend-improvements/diff) in case of error.

5D3: 217MB/s; 70D: 213MB/s => about (https://twitter.com/astroengy/status/841585485592088576) 50 ms?

Also run the following:

./run_canon_fw.sh EOSM,firmware="boot=1" -d sdcf     # any autoexec, doesn't matter, we will be analyzing the bootloader only
./run_canon_fw.sh EOSM,firmware="boot=1" -d sdcf,io  # look for "Card LED"


Hypothesis: when starting the camera with boot flag enabled, we've got the following delays:
- SD card initialization (unknown, the LED is off during this process)
- loading autoexec.bin (LED is on during this process)
- ML initialization (blob_memcpy, memset64)
- Canon initialization (until the first mpu_send) - there are some variations here as well, e.g. in the first startup logs (bug vs nobug) there is a 71-microsecond difference (minor compared to 50ms):

15609>    Startup:000e146c:00:00: *** mpu_send(06 04 02 00 00), from 38b8                            ; Init group
155C2>    Startup:000e146c:00:00: *** mpu_send(06 04 02 00 00), from 38b8                            ; Init group


- that also implies the autoexec.bin size matters - larger binaries might be more likely to cause the bug (therefore explaining the inconsistent reports with the minimal binaries - some showing the bug, others not)
- that also implies the filesystem usage matters - as the card fills up with many files, finding autoexec.bin takes more time
- therefore, formatting the card may also appear to fix this bug in some cases (if the total delay was slightly above 50ms before formatting, and slightly below afterwards).

Can you somehow time the startup process of the EOS M on both cards, since pressing the button?

Recording at 60 FPS would give a time slice of 16.7 ms. Probably good enough for our tests, as long as the button press event can be clearly seen in the video. I just want to check whether the startup with the bad card is slower by more than ~ 50 ms, compared with the good card (in vanilla configuration).

You can also add a delay (msleep) for the first mpu_send call (but not for any other). In the logging function (mpu_send_log), which gets called right before running the mpu_send, try something like this (with the good card):

  static int call_num = 0;
  call_num++;
  if (call_num == 1) /* delay the first call only */
  {
    /* do the math after running memory benchmark, then round to multiples of 10 */
    /* let's say it's 53ms -> 50 should not cause the bug, 60 should cause it */
    msleep(50);
  }


That would move the delay right before mpu_send (where I believe it matters).

Then, try delaying some other mpu_send call (the second, the third and so on). At some point, the bug should disappear.




Low-level question: when is the hardware timer (0xC0242014) started?

5D3: reading it before reloc_entry() (boot-hack.c, early startup) locks up the camera. Reading it right after that returns 0x3170 (repeatable), even if there's a CF card in the camera (in this case, the startup time is noticeably longer). Might be initialized here:


[*unk*]        at 0xFF0C008C:FFFF09C0 [0xC0242010] -> 0x0       : ???
[*unk*]        at 0xFF0C0094:FFFF09C0 [0xC0242010] <- 0x1       : ???


In other words, it cannot be used to measure the time required by the bootloader to do whatever it does (including SD card activity).




BTW, this works on QEMU, but not on the camera (autoexec.bin loaded twice):

diff -r 36acb0f9c3e2 src/reboot.c
--- a/src/reboot.c
+++ b/src/reboot.c
@@ -211,2 +211,25 @@
{
+    /* assuming memory contents are kept across software reboots */
+    /* pick some address that's not going to be overwritten during startup */
+    /* e.g. from https://www.magiclantern.fm/forum/index.php?topic=5601.msg196632#msg196632 */
+    volatile uint32_t * first_time = 0x4CDFFFFC;    /* 5D3 */
+    //volatile uint32_t * first_time = 0x47F24000;    /* EOSM */
+
+    if (*(first_time) != 0xABCD1234)  /* likely true after power on */
+    {
+        /* re-execute the bootloader, but next time take the other branch */
+        *(first_time) = 0xABCD1234;
+        void (*reboot)() = (void *) 0xFFFF0000;
+        reboot();
+
+        /* unreachable */
+        while (1);
+    }
+    else
+    {
+        /* we are back! */
+        /* clear this address for the next power on and continue loading ML */
+        *(first_time) = 0;
+    }
+
     #if !(CURRENT_CAMERA_SIGNATURE)
Title: Re: EOSM Shutter-Bug reboot
Post by: dfort on March 16, 2018, 01:32:02 AM
TL;DR

Just kidding  :D

Of course I was hoping for a, "Just change this one line and Bob's your uncle" solution. Somehow I knew that wasn't going to happen.

I also made a few discoveries. The one card that I have that is under 2GB is an old Kodak "SD Standard" card. According to the SD Association (https://www.sdcard.org/developers/overview/capacity/) standard cards are formatted either FAT 12 or FAT 16. That SD Standard card doesn't exhibit the shutter-bug. Cards 2-32GB are "SDHC Standard" cards and formatted FAT 32. The SanDisk Extreme Pro card that works so well falls into this category but there are lots of other cards that don't work which also fall into this category. Finally, cards greater than 32GB are "SDXC Standard" formatted exFAT and all of these cards that I have tested show the shutter-bug.

That said, nothing seems to be 100% certain with the shutter-bug. When I was doing the prior test I found a zone where it may or may not show up. In this case I tried to shoot 5 stills:

0x0B50BBB
1 bug
2 no bug
3 no bug
4 no bug
5 no bug


So I went to the next lower value and darn it, didn't work at first. There are reports that formatting the card "fixes" the shutter-bug so:

0x0B50BBA
1 bug
2 bug
3 bug
4 bug
5 bug
format card
6 no bug
7 no bug
8 no bug
9 no bug


Interesting, so I gathered several cards and started shooting the video you asked for. First attempt--the 32GB SandDisk Extreme Pro that has never failed me showed the shutter-bug. What? That was the test card I've been using and at first I thought I got things mixed up but nope--all cards had the same unified build. So not even this card is 100% free of the shutter-bug. Another thing I discovered was that the 128GB SanDisk Extreme Pro card could be cleared of the shutter-bug using the in-camera formatting trick but the bug comes back if the camera is restarted. The Lexar 64GB card I tested reliably showed the shutter-bug throughout my tests.

Put the video on YouTube and also in this Dropbox folder (https://www.dropbox.com/sh/vj6cosen2spwj9n/AABT0YbO9kPygh0pUp29RzFLa?dl=0). Hope you find what you're looking for. Note that at the end I showed the formatting trick. I also cut out the boring bits to keep the file manageable.



Anyway, still got some more homework to complete from your previous post.
Title: Re: EOSM Shutter-Bug reboot
Post by: dfort on March 16, 2018, 02:13:04 AM
I assume you want this test to be run in the camera but this is QEMU code, right?

Quote from: a1ex on March 14, 2018, 10:56:45 AM
In the logging function (mpu_send_log), which gets called right before running the mpu_send, try something like this (with the good card):

  static int call_num = 0;
  call_num++;
  if (call_num == 1) /* delay the first call only */
  {
    /* do the math after running memory benchmark, then round to multiples of 10 */
    /* let's say it's 53ms -> 50 should not cause the bug, 60 should cause it */
    msleep(50);
  }


That would move the delay right before mpu_send (where I believe it matters).

Then, try delaying some other mpu_send call (the second, the third and so on). At some point, the bug should disappear.

The only place I found mpu_send_log was in the QEMU GDB scripts.

Quote from: a1ex on March 14, 2018, 10:56:45 AM
Low-level question: when is the hardware timer (0xC0242014) started?

Is it possible find a low-level answer to this? Will a io_trace startup log show it?

[EDIT] I remember testing several minimal autoexec.bin files to see if the shutter-bug shows up so just for the heck of it I tried building a minimal autoexec.bin from the minimal/EOSM directory and put it on the card that always shows the shutter-bug -- no shutter-bug!
Title: Re: EOSM Shutter-Bug reboot
Post by: a1ex on March 16, 2018, 06:50:06 AM
The function you are looking for is in dm-spy-extra.c.

The vanilla io_trace won't work because:
1) reading the timer (to timestamp the MMIO messages) will crash the camera before the initial hardware setup (the first part of reloc_entry, until "We're back!")
2) the pointer to data abort function, at that early stage, is at 0x30, not at 0x2C (from FFFF0010 vs FF0C0D9C, both copied to 0x10 - the exception vector for data abort)

Both issues can be worked around and debugged in QEMU. After that, got a trace log for the initial hardware setup for 5D3 1.1.3:

; 000dd43c maps to FF0C0014
000dd43c:MMIO : [0xC0000010] <- 0xD9C5D9C5
000dd448:MMIO : [0xC020010C] <- 0x00000001
000dd450:MMIO : [0xC020000C] <- 0x000000FF
000dd454:MMIO : [0xC020001C] <- 0x000000FF
000dd458:MMIO : [0xC020002C] <- 0x000000FF
000dd45c:MMIO : [0xC020003C] <- 0x000000FF
000dd460:MMIO : [0xC020004C] <- 0x000000FF
000dd464:MMIO : [0xC020005C] <- 0x000000FF
000dd468:MMIO : [0xC020006C] <- 0x000000FF
000dd46c:MMIO : [0xC020007C] <- 0x000000FF
000dd470:MMIO : [0xC020008C] <- 0x000000FF
000dd474:MMIO : [0xC020009C] <- 0x000000FF
000dd478:MMIO : [0xC02000AC] <- 0x000000FF
000dd47c:MMIO : [0xC02000BC] <- 0x000000FF
000dd480:MMIO : [0xC02000CC] <- 0x000000FF
000dd484:MMIO : [0xC02000DC] <- 0x000000FF
000dd488:MMIO : [0xC02000EC] <- 0x000000FF
000dd48c:MMIO : [0xC02000FC] <- 0x000000FF
000dd494:MMIO : [0xC0400008] -> 0x00010003
000dd4a0:MMIO : [0xC0400008] <- 0x00430007
000dd4b4:MMIO : [0xC0242010] -> 0x40000400
000dd4bc:MMIO : [0xC0242010] <- 0x40000401

QEMU:
000dd494:MMIO : [0xC0400008] -> 0x00010003 (set in bootloader)
000dd4a0:MMIO : [0xC0400008] <- 0x00430007
000dd4b4:MMIO : [0xC0242010] -> 0x00000000 (0x40000400 comes from hardware)
000dd4bc:MMIO : [0xC0242010] <- 0x00000001


Then, tried this before reloc_entry():

    MEM(0xC0400008) |= 0x430005;    /* enable clocks, FF0C0068; locks up without it */
    MEM(0xC0242010) |= 1;           /* start digic timer, FF0C0088; prints 0 without it after adding memset64 */
    t0 = MEM(0xC0242014);           /* read digic timer */
    /* printed t0 later when boot finished => 0 */


Then, added this right before reading the timer into t0:

    memset64(0x00D00000, 0x124B1DE0 /* RA(W)VIDEO*/, 0x0B50BBA);


=> 51910 microseconds.

Some notes from g3gg0 on this register (Register_Map (http://magiclantern.wikia.com/wiki/Register_Map) page):


Another Timer
0xC0242010 [32] Control register
                  ---------------1 enable flag
                  ----xxxx-------- predivider
                      1 = 3.000 MHz tick rate
                      2 = 2.250 MHz tick rate
                      3 = 1.125 MHz tick rate
                      4 = 1.000 MHz tick rate (default)
0xC0242014 [32] 12 bit counter. used for task load measurement.


=> let's try different values:

0x40000501 => 25955 (ratio 0.5)
0x40000401 => 51910 (reference)
0x40000301 => 58399 (ratio 1.125)
0x40000201 => 116798 (ratio 2.25)
0x40000101 => 155730 (ratio 3)
0x40000001 => 233595 (ratio 4.5)
0x10000401 => 51910 (no change)
0x40000421 => 710 (?!)
Title: Re: EOSM Shutter-Bug reboot
Post by: dfort on March 17, 2018, 08:38:05 PM
True Confessions -- I'm only getting about 10% of what you're talking about but I do find it fascinating. Tried a few of your suggestions in QEMU and understand that the issue seems to be some sort of a delay between mpu events. I attempted to put in some delays in the boot up process but everything I tried either had no effect or the camera locked up.

Maybe there is still something that a mere mortal with an EOSM in hand can do to help a1ex resolve the shutter-bug?

Here's what I tried today. Startup logs using io_trace with these options:

src/trace_io.c
static ASM_VAR uint32_t protected_region = REGION(0xC0210000, 0x010000); // timers

Makefile.user
CONFIG_DEBUG_INTERCEPT_STARTUP = y
CONFIG_MMIO_TRACE=y


I've always had issues getting this working without crashing the camera but it does save startup logs so here's the tests I ran:

SanDisk 32GB card using Canon EF-M 11-22mm lens (https://pastebin.com/LVRPe7wm) -- no shutter-bug.
Lexar 64GB card using Canon EF-M 11-22mm lens (https://pastebin.com/b6uqE84e) -- shutter-bug.
Lexar 64GB card using Canon EF-M 22mm prime lens (https://pastebin.com/EugJF1bC) -- no shutter-bug.

Got crash logs with different information on the zoom lens logs and on the Lexar card with the zoom lens (shutter-bug) it created two log files while all the others had only one log but that's just something worth mentioning, the juicy stuff is in these logs.

This is the second and third mpu_recv events when the shutter-bug is not present. (167D9 - 161A3 =1590μs ?)

161A3> **INT-36h*:000e14e4:00:00: *** mpu_recv(50 4e 02 0e 03 03 03 00 03 00 00 65 00 00 01 19 64 00 00 00 00 81 00 00 06 04 00 00 06 04 00 00 06 04 01 01 00 00 3c 00 68 3b 00 00 00 00 00 00 00 00 00 00 00 00 80 48 78 48 00 03 00 01 00 00 00 00 02 01 00 00 00 01 00 00 00 00 00 00 01), from 1dedc
161E9> **INT-36h*:ff0f382c:00:02: [SEQ] NotifyComplete (Startup, Flag = 0x2)
16213> **INT-36h*:ff0f3890:00:03: [SEQ] NotifyComplete (Cur = 1, 0x20000003, Flag = 0x2)
16276> **INT-50h*:ff137e38:00:01: [PM] DisablePowerSave (Counter = 2)
1629E> **INT-50h*:ff137ea8:00:01: [PM] EnablePowerSave (Counter = 1)
1657F> **INT-36h*:000e14e4:00:00: *** mpu_recv(24 23 02 0d 00 00 00 00 01 00 00 00 00 00 00 00 00 01 02 00 00 00 00 00 01 01 01 01 00 00 00 00 00 00 00 00), from 1dedc
165D9> **INT-50h*:ff137e38:00:01: [PM] DisablePowerSave (Counter = 2)
165FF> **INT-50h*:ff137ea8:00:01: [PM] EnablePowerSave (Counter = 1)
167D9>    PropMgr:ff132454:00:05: [RTC] ChangePropertyCBR 0x0, 0x0


These are the same mpu_recv events when the shutter-bug is present. (16D15 - 1656F = 1958μs ?)

1656F> **INT-36h*:000e14e4:00:00: *** mpu_recv(50 4e 02 0e 03 03 03 00 03 00 00 65 00 00 01 19 64 00 00 00 00 81 00 00 06 04 00 00 06 04 00 00 06 04 01 01 00 00 3c 00 68 3b 00 00 00 00 00 00 00 00 00 00 00 00 80 48 78 48 00 03 00 01 00 00 00 00 02 01 00 00 00 01 00 00 00 00 00 00 01), from 1dedc
165B6> **INT-36h*:ff0f382c:00:02: [SEQ] NotifyComplete (Startup, Flag = 0x2)
165E2> **INT-36h*:ff0f3890:00:03: [SEQ] NotifyComplete (Cur = 1, 0x20000003, Flag = 0x2)
16635> **INT-36h*:0001DBF4:MMIO:  [0xC022006C] -> 0x00000047
1665C> **INT-50h*:ff137e38:00:01: [PM] DisablePowerSave (Counter = 2)
16684> **INT-50h*:ff137ea8:00:01: [PM] EnablePowerSave (Counter = 1)
166A5> **INT-50h*:0001DB64:MMIO:  [0xC022006C] -> 0x00000047
166C4>    PropMgr:FF131D60:MMIO:  [0xC022C0C4] <- 0x0083DC00
166D5> **INT-36h*:0001DBF4:MMIO:  [0xC022006C] -> 0x00000047
166EC> **INT-36h*:0001DBF4:MMIO:  [0xC022006C] -> 0x00000047
1670C> **INT-36h*:0001DBF4:MMIO:  [0xC022006C] -> 0x00000047
1671A>    PropMgr:FF131CAC:MMIO:  [0xC022C0C4] <- 0x0093D800
1672A> **INT-36h*:0001DBF4:MMIO:  [0xC022006C] -> 0x00000047
16743> **INT-36h*:0001DBF4:MMIO:  [0xC022006C] -> 0x00000047
1675F> **INT-36h*:0001DBF4:MMIO:  [0xC022006C] -> 0x00000047
1677C> **INT-36h*:0001DBF4:MMIO:  [0xC022006C] -> 0x00000047
16796> **INT-36h*:0001DBF4:MMIO:  [0xC022006C] -> 0x00000047
167B3> **INT-36h*:0001DBF4:MMIO:  [0xC022006C] -> 0x00000047
167CE> **INT-36h*:0001DBF4:MMIO:  [0xC022006C] -> 0x00000047
167EA> **INT-36h*:0001DBF4:MMIO:  [0xC022006C] -> 0x00000047
16806> **INT-36h*:0001DBF4:MMIO:  [0xC022006C] -> 0x00000047
1681A>    PropMgr:FF131D60:MMIO:  [0xC022C0C4] <- 0x0083DC00
16822> **INT-36h*:0001DBF4:MMIO:  [0xC022006C] -> 0x00000047
1683D> **INT-36h*:0001DBF4:MMIO:  [0xC022006C] -> 0x00000047
1685A> **INT-36h*:0001DBF4:MMIO:  [0xC022006C] -> 0x00000047
1687D> **INT-36h*:0001DBF4:MMIO:  [0xC022006C] -> 0x00000047
16896> **INT-36h*:0001DBF4:MMIO:  [0xC022006C] -> 0x00000047
16A35> **INT-36h*:000e14e4:00:00: *** mpu_recv(24 23 02 0d 00 00 00 00 01 00 00 00 00 00 00 00 00 01 02 00 00 00 00 00 01 01 01 01 00 00 00 00 00 00 00 00), from 1dedc
16A7E> **INT-36h*:0001DBF4:MMIO:  [0xC022006C] -> 0x00000047
16A9E> **INT-50h*:ff137e38:00:01: [PM] DisablePowerSave (Counter = 2)
16AC3> **INT-50h*:ff137ea8:00:01: [PM] EnablePowerSave (Counter = 1)
16AE3> **INT-50h*:0001DB64:MMIO:  [0xC022006C] -> 0x00000047
16B11> **INT-36h*:0001DBF4:MMIO:  [0xC022006C] -> 0x00000047
16B27> **INT-36h*:0001DBF4:MMIO:  [0xC022006C] -> 0x00000047
16B43> **INT-36h*:0001DBF4:MMIO:  [0xC022006C] -> 0x00000047
16B62> **INT-36h*:0001DBF4:MMIO:  [0xC022006C] -> 0x00000047
16B7D> **INT-36h*:0001DBF4:MMIO:  [0xC022006C] -> 0x00000047
16B9A> **INT-36h*:0001DBF4:MMIO:  [0xC022006C] -> 0x00000047
16BBC> **INT-36h*:0001DBF4:MMIO:  [0xC022006C] -> 0x00000047
16BD8> **INT-36h*:0001DBF4:MMIO:  [0xC022006C] -> 0x00000047
16BF5> **INT-36h*:0001DBF4:MMIO:  [0xC022006C] -> 0x00000047
16C09>    PropMgr:FF131CAC:MMIO:  [0xC022C0C4] <- 0x0093D800
16C10> **INT-36h*:0001DBF4:MMIO:  [0xC022006C] -> 0x00000047
16C38> **INT-36h*:0001DBF4:MMIO:  [0xC022006C] -> 0x00000047
16C4D> **INT-36h*:0001DBF4:MMIO:  [0xC022006C] -> 0x00000047
16C51>    PropMgr:FF131D60:MMIO:  [0xC022C0C4] <- 0x0083DC00
16C68> **INT-36h*:0001DBF4:MMIO:  [0xC022006C] -> 0x00000047
16C84> **INT-36h*:0001DBF4:MMIO:  [0xC022006C] -> 0x00000047
16CA2> **INT-36h*:0001DBF4:MMIO:  [0xC022006C] -> 0x00000047
16CAA>    PropMgr:FF131CAC:MMIO:  [0xC022C0C4] <- 0x0093D800
16CBE> **INT-36h*:0001DBF4:MMIO:  [0xC022006C] -> 0x00000047
16CDA> **INT-36h*:0001DBF4:MMIO:  [0xC022006C] -> 0x00000047
16CF9> **INT-36h*:0001DBF4:MMIO:  [0xC022006C] -> 0x00000047
16D15> **INT-36h*:0001DBF4:MMIO:  [0xC022006C] -> 0x00000047


The Lexar 64GB that is consistently showing the shutter-bug looks like this with the EF-M 22mm prime lens -- this is a lens that doesn't trigger the shutter-bug. . (167E8 - 1614D = 1691μs ?)

1614D> **INT-36h*:000e14e4:00:00: *** mpu_recv(50 4e 02 0e 03 03 03 00 03 00 00 65 00 00 01 19 64 00 00 00 00 81 00 00 06 04 00 00 06 04 00 00 06 04 01 01 00 00 3c 00 68 3b 00 00 00 00 00 00 00 00 00 00 00 00 80 48 78 48 00 03 00 01 00 00 00 00 02 01 00 00 00 01 00 00 00 00 00 00 01), from 1dedc
16193> **INT-36h*:ff0f382c:00:02: [SEQ] NotifyComplete (Startup, Flag = 0x2)
161BE> **INT-36h*:ff0f3890:00:03: [SEQ] NotifyComplete (Cur = 1, 0x20000003, Flag = 0x2)
1621D> **INT-50h*:ff137e38:00:01: [PM] DisablePowerSave (Counter = 2)
16245> **INT-50h*:ff137ea8:00:01: [PM] EnablePowerSave (Counter = 1)
16561> **INT-36h*:000e14e4:00:00: *** mpu_recv(24 23 02 0d 00 00 00 00 01 00 00 00 00 00 00 00 00 01 02 00 00 00 00 00 01 01 01 01 00 00 00 00 00 00 00 00), from 1dedc
165B6> **INT-50h*:ff137e38:00:01: [PM] DisablePowerSave (Counter = 2)
165DB> **INT-50h*:ff137ea8:00:01: [PM] EnablePowerSave (Counter = 1)
167E8>    PropMgr:ff132454:00:05: [RTC] ChangePropertyCBR 0x0, 0x0
Title: Re: EOSM Shutter-Bug reboot
Post by: g3gg0 on March 18, 2018, 01:44:37 AM
@dfort:
when formatting, does the camera reboot?
the delay after that reboot when it works is then 1.6msec again, right?

edit:
probably a good idea to hook mpu_recv and check for the two messages, measuring the delay and show on screen "EXPECT [NO] BUG"
and run the procedure several times to get 100% confidence that this is correlated
Title: Re: EOSM Shutter-Bug reboot
Post by: dfort on March 18, 2018, 04:08:51 AM
Quote from: g3gg0 on March 18, 2018, 01:44:37 AM
@dfort:
when formatting, does the camera reboot?

Yes, the camera reboots and there is quite a bit of LED activity--assume that it is saving a new firmware dump at this point?

Quote from: g3gg0 on March 18, 2018, 01:44:37 AM
the delay after that reboot when it works is then 1.6msec again, right?

I can't test for that with this trace_io build because it freezes up the camera. Note that reformatting the card doesn't always work on the 128GB SanDisk card and it doesn't work at all on the 64GB Lexar card. When I was doing the tests that showed the shutter-bug on the SanDisk 32GB card formatting the card worked on that narrow range that we found but I haven't done enough testing to conclude when it is 100% reproducible.

It looks like the time it takes to run just those two mpu_recv events takes about 1.7msec when the shutter-bug isn't present and nearly 2msec when it is.

In addition, The shutter-bug has a bunch of "**INT-36h*:0001DBF4:MMIO:  [0xC022006C] -> 0x00000047" events while the logs that don't have the shutter-bug don't include these events.

Quote from: g3gg0 on March 18, 2018, 01:44:37 AM
probably a good idea to hook mpu_recv and check for the two messages, measuring the delay and show on screen "EXPECT [NO] BUG"
and run the procedure several times to get 100% confidence that this is correlated

Makes sense but not sure I can figure out how to do that. Tell me exactly how to set this up and I'll test it.

Curious if the card speed affecting the total boot time had anything to do with the shutter-bug but it doesn't look like it does. The 1GB Kodak card is super slow yet it doesn't show the shutter-bug.

Taking a couple of unique events in the logs and calculating the time between these events:

SanDisk 32GB zoom -> no shutter-bug
0A316 [SEQ] seqEventDispatch (Startup, 0)
FF2D1 gmtProperty ID=0x80030011(0x10502801)
1,003 msec

Lexar 64GB zoom -> shutter-bug
0A45D [SEQ] seqEventDispatch (Startup, 0)
ED71B gmtProperty ID=0x80030011(0x10502801)
930 msec

Lexar 64GB prime -> no shutter-bug
0A321 [SEQ] seqEventDispatch (Startup, 0)
D06D3 gmtProperty ID=0x80030011(0x10501801)
812 msec


Just wondering, is the "gmtProperty ID=0x80030011" the lens ID? Notice the difference between the 11-22mm zoom and the 22mm prime lens. The shutter-bug only shows up on EF-M zoom lenses -- they all have IS so it seems reasonable to search for some sort of a connection here?
Title: Re: EOSM Shutter-Bug reboot
Post by: dfort on March 18, 2018, 01:00:52 PM
Cleaning out some files and realized that I sort of just glossed over this:

Quote from: dfort on March 17, 2018, 08:38:05 PM
...on the Lexar card with the zoom lens (shutter-bug) it created two log files while all the others had only one log but that's just something worth mentioning...

Turns out that second log of the card with the shutter-bug and the EF-M zoom (https://pastebin.com/vHbUVyfJ) doesn't contain those "**INT-36h*:0001DBF4:MMIO:  [0xC022006C] -> 0x00000047" events.

So how does this work? The camera is started, code is copied to RAM then it reboots, right?

Lexar 64GB zoom -> shutter-bug -> dm-0001.log (2nd log)
0A303 [SEQ] seqEventDispatch (Startup, 0)
D0D30 gmtProperty ID=0x80030011(0x10502801)
813 msec


So this looks pretty much like the logs without the shutter-bug yet the bug is there.

Quote from: dfort on March 17, 2018, 08:38:05 PM
Maybe there is still something that a mere mortal with an EOSM in hand can do to help a1ex resolve the shutter-bug?

Maybe not?
Title: Re: EOSM Shutter-Bug reboot
Post by: a1ex on March 18, 2018, 01:12:31 PM
For some reason, the logs without shutter bug do not include any MMIO events. Were they created by the same binary? (I should probably include some version info in the logs). BTW, REGION(0xC0210000, 0x010000) does not match anything from C022nnnn, so it must have been a different binary.

Don't see any reason to log the MMIO activity for the timers; that range only covers DryOS heartbeat timer.

./run_canon_fw.sh 5D3,firmware="boot=0" -d io,int,v |& grep -a 0xC021


You can see these in my startup log (https://www.magiclantern.fm/forum/index.php?topic=2388.msg197313#msg197313), too (with the first debug_intercept call moved at "We're back!" - a modification not compatible with all models). Without that change, REGION(0xC0210000, 0x010000) will not log any events (because there's no Canon code that changes these registers after the early boot sequence).

My experiment was with the initialization before the "We're back!" stage (where the vanilla io_trace and most of other ML code cannot run). That register turned out not to be useful for our investigation (unless we can somehow reflash the bootloader or re-run it with caches patched during a software reboot).

0x80030011 is PROP_LENS (what it contains is in lens.c).
Title: Re: EOSM Shutter-Bug reboot
Post by: dfort on March 18, 2018, 01:26:29 PM
Same binary. Just io_trace with these minimal changes:

Magic Lantern Nightly.2018Mar18.EOSM202
Camera   : EOSM
Firmware : 202
Changeset: 2838aae45890+ (io_trace)
Built on : 2018-03-18 12:22:52 by rosiefort@RosieFoComputer

diff -r 2838aae45890 src/io_trace.c
--- a/src/io_trace.c Tue Feb 13 21:36:23 2018 +0100
+++ b/src/io_trace.c Sun Mar 18 05:22:52 2018 -0700
@@ -35,7 +35,7 @@
  * REGION(0xC0E20000, 0x010000): JPCORE (JP57, lossless)
  * REGION(0xC0000000, 0x1000000): everything except EEKO? (DIGIC <= 5)
  */
-static ASM_VAR uint32_t protected_region = REGION(0xC0220000, 0x010000);
+static ASM_VAR uint32_t protected_region = REGION(0xC0210000, 0x010000);

/* number of 32-bit integers recorded for one MMIO event (power of 2) */
#define RECORD_SIZE 8
diff -r 2838aae45890 platform/EOSM.202/internals.h
--- a/platform/EOSM.202/internals.h Tue Feb 13 21:36:23 2018 +0100
+++ b/platform/EOSM.202/internals.h Sun Mar 18 05:22:52 2018 -0700
@@ -138,7 +138,7 @@
//~ #define CONFIG_AUDIO_CONTROLS

/** Patch LiveViewApp dialog hander to hide Canon bottom bar */
-#define CONFIG_LVAPP_HACK_PATCH
+// #define CONFIG_LVAPP_HACK_PATCH

/** Workaround for menu timeout in LiveView */
#define CONFIG_MENU_TIMEOUT_FIX


CONFIG_DEBUG_INTERCEPT_STARTUP = y
CONFIG_MMIO_TRACE=y
Title: Re: EOSM Shutter-Bug reboot
Post by: dfort on March 19, 2018, 03:21:31 PM
Quote from: a1ex on March 18, 2018, 01:12:31 PM
BTW, REGION(0xC0210000, 0x010000) does not match anything from C022nnnn, so it must have been a different binary.

A screw up on my part is a possibility though I would swear under oath that I did everything possible to make sure that all tests were on the same binary.

Took another look at your startup log and noticed what looks like Russian characters in it:
0.000.000    юяякюяяк:ff0c7950:MMIO : [0xC0201100] <- 0x43210DCB

Ran it through Google Translate and it detected Belarusian:
(https://farm5.staticflickr.com/4785/40005436975_fe39e2ffbc_z.jpg) (https://flic.kr/p/23X9jDn)

My logs aren't showing yuyayakyuyayak.

Ok--enough of the fun and games.

Quote from: a1ex on March 18, 2018, 01:12:31 PM
My experiment was with the initialization before the "We're back!" stage (where the vanilla io_trace and most of other ML code cannot run). That register turned out not to be useful for our investigation (unless we can somehow reflash the bootloader or re-run it with caches patched during a software reboot).

That's the area that I was experimenting with by putting in some msleep calls but it would just freeze on startup.

One thing I noticed is that with the camera bootflag enabled when changing the lens with the camera turned off the LED would flash if there was a card with the bootflag enabled but not if the card didn't have the bootflag enabled.

Lens twist test with camera turned off:







Camera bootflag     Card bootflag     LED
ononflashes
onoffno activity
offonno activity
offoffno activity

I did a quick test on the 5D3 and it does the same thing.

So is the LED activity indicating that it is writing to the card or is it also active when reading the card? Whatever is going on with the lens is also happening even without a card in the camera because on the 22mm lens every time the lens is put onto the body it racks the focus even when the camera is turned off. That rack focus doesn't happen with any other lens that I tried including EF lenses with the EF-M adapter.

Of course ML requires both camera and card bootflags set so maybe whatever is causing the shutter-bug might be in this very early lens initiation stage before the camera is even turned on?
Title: Re: EOSM Shutter-Bug reboot
Post by: a1ex on March 19, 2018, 03:38:24 PM
Quote from: dfort on March 19, 2018, 03:21:31 PM
That's the area that I was experimenting with by putting in some msleep calls but it would just freeze on startup.

msleep only works after DryOS is up and running - long after that early boot stage. I've only tried it after calling Canon's init_task; might also work right before it, but it won't work earlier.

Quote
So is the LED activity indicating that it is writing to the card or is it also active when reading the card?

Generally speaking, LED activity is independent of card activity. There can be card access with LED turned on, card access with LED turned off, no card access with LED turned on and... no card access with led turned off.

For example, 5D2 bootloader reads from the card without ever bothering to turn on the LED.

To see the specific activity in some particular case, run in QEMU with -d io and look for SD/CF and LED activity.

Quote
Of course ML requires both camera and card bootflags set so maybe whatever is causing the shutter-bug might be in this very early lens initiation stage before the camera is even turned on?

Possible. Lens initialization (whether successful or not) might depend on the time it takes from physical power on until the MPU initialization (first mpu_send call, or maybe some subsequent mpu_send call). That's why I told you to place the msleeps on only one of the mpu_send_log calls - to verify this hypothesis.
Title: Re: EOSM Shutter-Bug reboot
Post by: dfort on March 30, 2018, 09:07:15 PM
Quote from: a1ex on March 16, 2018, 06:50:06 AM
The function you are looking for is in dm-spy-extra.c.

Ok--so I switched over to the dm-spy-experiments branch. Commented out CONFIG_LVAPP_HACK_PATCH in internals.h in order to get it to compile (adding the new stub might interfere (https://www.magiclantern.fm/forum/index.php?topic=2388.msg198295#msg198295)) ran a few startup logs and--the "good" card is showing the shutter-bug!

Let me repeat that - The "good" card is showing the shutter-bug in the dm-spy-experiments branch with CONFIG_DEBUG_INTERCEPT_STARTUP enabled.

Eight out 10 logs showed the same sequence of mpu_recv events:

DM-0000
15F7D> **INT-36h*:000e1624:00:00: *** mpu_recv(50 4e 02 0e 03 03 03 00 03 00 00 68 00 00 01 19 64 00 00 00 00 81 00 00 06 04 00 00 06 04 00 00 06 04 01 01 00 00 00 00 68 3b 00 00 00 00 00 00 00 00 00 00 00 00 80 48 78 48 00 03 00 01 00 00 00 00 02 01 00 00 00 01 00 00 00 00 00 00 01), from 1dedc
16315> **INT-36h*:000e1624:00:00: *** mpu_recv(24 23 02 0d 00 00 00 00 01 00 00 00 00 00 00 00 00 01 02 00 00 00 00 00 01 01 01 01 00 00 00 00 00 00 00 00), from 1dedc
16685> **INT-36h*:000e1624:00:00: *** mpu_recv(28 27 02 0f 01 00 00 10 01 01 00 00 00 00 00 00 00 00 00 00 00 18 00 00 00 0c 00 00 00 00 00 00 00 02 00 00 03 01 00 00), from 1dedc
1727A> **INT-36h*:000e1624:00:00: *** mpu_recv(3c 3a 02 10 00 00 00 00 00 00 01 00 00 00 00 ff 00 00 ff 00 ff ff ff ff ff ff ff ff 00 00 00 00 00 01 ff 00 00 02 01 01 00 03 00 01 00 04 00 00 01 05 00 01 01 00 01 00 00 3f 00), from 1dedc
17738> **INT-36h*:000e1624:00:00: *** mpu_recv(3e 3c 02 11 01 ff 01 00 01 00 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 03 3b 00 ff 00 00 ff 00 ff ff ff ff ff ff ff ff 00 00), from 1dedc
17841> **INT-36h*:000e1624:00:00: *** mpu_recv(06 05 01 4f 01 00), from 1dedc
17973> **INT-36h*:000e1624:00:00: *** mpu_recv(0a 08 01 6e 80 48 00 00 00), from 1dedc


However, two of the logs showed a different sequence:

DM-0002
15840> **INT-36h*:000e1624:00:00: *** mpu_recv(06 05 01 00 03 00), from 1dedc
161D1> **INT-36h*:000e1624:00:00: *** mpu_recv(50 4e 02 0e 03 03 03 00 00 00 00 68 00 00 01 19 64 00 00 00 00 81 00 00 06 04 00 00 06 04 00 00 06 04 01 01 00 00 00 00 68 3b 00 00 00 00 00 00 00 00 00 00 00 00 80 48 78 48 00 03 00 01 00 00 00 00 02 01 00 00 00 01 00 00 00 00 00 00 01), from 1dedc
16573> **INT-36h*:000e1624:00:00: *** mpu_recv(24 23 02 0d 00 00 00 00 01 00 00 00 00 00 00 00 00 01 02 00 00 00 00 00 01 01 01 01 00 00 00 00 00 00 00 00), from 1dedc
168E1> **INT-36h*:000e1624:00:00: *** mpu_recv(28 27 02 0f 01 00 00 10 01 01 00 00 00 00 00 00 00 00 00 00 00 18 00 00 00 0c 00 00 00 00 00 00 00 02 00 00 03 01 00 00), from 1dedc
16D7A> **INT-36h*:000e1624:00:00: *** mpu_recv(3c 3a 02 10 00 00 00 00 00 00 01 00 00 00 00 ff 00 00 ff 00 ff ff ff ff ff ff ff ff 00 00 00 00 00 01 ff 00 00 02 01 01 00 03 00 01 00 04 00 00 01 05 00 01 01 00 01 00 00 3f 00), from 1dedc
17250> **INT-36h*:000e1624:00:00: *** mpu_recv(3e 3c 02 11 01 ff 01 00 01 00 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 3b 00 ff 00 00 ff 00 ff ff ff ff ff ff ff ff 00 00), from 1dedc
17441> **INT-36h*:000e1624:00:00: *** mpu_recv(0a 08 01 6e 80 48 00 00 00), from 1dedc


Note that the only way to startup the EOSM with this build is via the PLAY button. Starting up normally the camera never gets to LiveView though it does save a startup log.

Quote from: a1ex on March 14, 2018, 10:56:45 AM
You can also add a delay (msleep) for the first mpu_send call (but not for any other). In the logging function (mpu_send_log), which gets called right before running the mpu_send, try something like this (with the good card):

  static int call_num = 0;
  call_num++;
  if (call_num == 1) /* delay the first call only */
  {
    /* do the math after running memory benchmark, then round to multiples of 10 */
    /* let's say it's 53ms -> 50 should not cause the bug, 60 should cause it */
    msleep(50);
  }


That would move the delay right before mpu_send (where I believe it matters).

Then, try delaying some other mpu_send call (the second, the third and so on). At some point, the bug should disappear.

Alright, we need to run memory benchmarks but the bench module doesn't compile in the dm-spy-experiments branch so I switched over to the unified branch--not working there either so switched to the lua_fix branch...

Not very useful:
(https://farm1.staticflickr.com/882/26247435327_2a171e8ae1.jpg) (https://flic.kr/p/FZoY5p)

Shoot another boring video:


To come up with the test results:
Test function: Display on:        Display off:
memcpy cacheable 34.18 MB/s 39.68 MB/s OK
memcpy uncacheable 10.46 MB/s 11.90 MB/s OK
memcpy64 cacheable 40.04 MB/s 44.87 MB/s OK
memcpy64 uncacheable 15.96 MB/s 18.24 MB/s OK
dma_memcpy cacheable 34.20 MB/s 37.47 MB/s OK
dma_memcpy uncachab 33.55 MB/s 37.34 MB/s OK
edmac_memcpy 345.99 MB/s (test skipped) OK
edmac_copy_rectangle 175.15 MB/s (test skipped)
memset cacheable 95.11 MB/s 110.40 MB/s
memset uncacheable 40.97 MB/s 47.35 MB/s
memset64 cacheable 89.53 MB/s 104.35 MB/s
memset64 uncacheable 58.31 MB/s 65.16 MB/s
read32 cacheable 39.68 MB/s 45.50 MB/s
read32 uncacheable 12.55 MB/s 14.68 MB/s
read64 cacheable 53.03 MB/s 60.54 MB/s
read64 uncacheable 23.25 MB/s 27.27 MB/s
bpm_fill to idle buf 88.28 MB/s (test skipped)


It's not for a lack of trying but I'm still lost. Which of these values am I supposed to use to figure out how long to make the msleep call? In addition, what is going on with that code snippet? Aren't we looking for mpu_recv events? Where does it go?

[EDIT] I did try this but it isn't creating the shutter-bug on the good card (with startup log disabled) or removing the shutter-bug on the card that always shows the bug:
diff -r 912ba70b63dc src/dm-spy-extra.c
--- a/src/dm-spy-extra.c Sat Feb 03 21:55:07 2018 +0100
+++ b/src/dm-spy-extra.c Fri Mar 30 12:38:09 2018 -0700
@@ -55,6 +55,8 @@
static void StartEDmac_log(uint32_t* regs, uint32_t* stack, uint32_t pc);
static void TryPostEvent_log(uint32_t* regs, uint32_t* stack, uint32_t pc);

+static int call_num = 0;
+
struct logged_func
{
     uint32_t addr;                              /* Logged address (usually at the start of the function; will be passed to gdb_add_watchpoint) */
@@ -750,6 +752,14 @@
     mpu_decode(buf, msg, sizeof(msg));
     DryosDebugMsg(0, 0, "*** mpu_send(%02x %s), from %x", size_ex, msg, caller);
     last_mpu_timestamp = MEM(0xC0242014);
+   
+    call_num++;
+    if (call_num == 1) /* delay the first call only */
+    {
+        /* do the math after running memory benchmark, then round to multiples of 10 */
+        /* let's say it's 53ms -> 50 should not cause the bug, 60 should cause it */
+        msleep(60);
+    }
}

static void mpu_recv_log(uint32_t* regs, uint32_t* stack, uint32_t pc)


Quote from: a1ex on March 14, 2018, 10:56:45 AM
At some point, the bug should disappear.

Just not yet.
Title: Re: EOSM Shutter-Bug reboot
Post by: a1ex on March 30, 2018, 09:41:10 PM
Press PLAY before running the benchmark. Running the benchmarks in LiveView will give much lower numbers, not relevant for this experiment; the PLAY mode is a good approximation, as the DryOS tasks are mostly idle in this case. You may also disable the interrupts while running these benchmarks - that might give even higher speeds (but will prevent any kind of multitasking during the test).


/* modules/bench/mem_bench.c */
/* caveat: might interfere with EDMAC or DMA memcpy (comment them out if they lock up) */
int old = cli();
bench_fun(arg0, arg1, arg2, arg3);
sei(old);
/* might need some msleep here, to allow the Canon tasks to catch up - they could not do anything during the test */


Our call from boot-hack.c is memset64 with cacheable memory and display off.

You've got some good logs before - you could start from the same configuration. You can disable everything else, except for MPU messages, to minimize overhead. Disable MPU_DELAY_SEND too, as we don't want any slowdown for MPU (I slow them down to find the dependency relationships between them, e.g. which mpu_recv lines are likely replies to the previous mpu_send). The io_trace_full branch has minimal logging overhead, so you may use that one instead. MMIO messages won't work on that branch when logging the startup process, but we don't need them for this experiment.
Title: Re: EOSM Shutter-Bug reboot
Post by: dfort on March 30, 2018, 10:21:06 PM
Don't get how to work with that new code snippet but did get a good memory benchmark run in Play mode:

(https://farm1.staticflickr.com/788/27248056048_ac0ac263fe.jpg) (https://flic.kr/p/HvPpKN)

Quote from: a1ex on March 30, 2018, 09:41:10 PM
Our call from boot-hack.c is memset64 with cacheable memory and display off.

    /* do the math after running memory benchmark, then round to multiples of 10 */
    /* let's say it's 53ms -> 50 should not cause the bug, 60 should cause it */


So let's say it is 180, right? Am I on the right track with the [EDIT] I made on Reply #33 (https://www.magiclantern.fm/forum/index.php?topic=21728.msg199063#msg199063)?

Or am I getting ahead of myself and need to create some more startup logs showing only MPU messages with MPU_DELAY_SEND disabled.

Switching over to the io_trace_full branch and uhhhhhhhhhh....
Title: Re: EOSM Shutter-Bug reboot
Post by: dfort on March 31, 2018, 03:31:36 AM
Did lots of experimenting today. Disabled MPU_DELAY_SEND, tried to follow the instructions to add a delay (msleep) for the first mpu_send call, even used the MPU_DELAY_SEND trick of using the busy waiting code instead of msleep. At one point I had the shutter-bug on the card that reliably doesn't show the bug and vice versa but before posting how I did it I tried reproducing the results without success. Slippery little bastard.

I did make some new startup logs with the io_trace_full branch. Here's a shutter-bug (https://pastebin.com/3qCx2PXy) log and a no shutter-bug log (https://pastebin.com/DG1AMRyQ). Searching for anything that might offer a clue I grepped mpu_send and found something rather interesting.

no shutter-bug
0.088.758     Startup:000038b8:00:00: *** mpu_send(06 04 02 00 00)
0.116.269     PropMgr:0000393c:00:00: *** mpu_send(08 06 00 00 02 0e 00)
0.117.568     PropMgr:0000393c:00:00: *** mpu_send(08 06 00 00 01 97 00)
0.121.538     PropMgr:0000393c:00:00: *** mpu_send(0a 08 03 57 10 33 00 01 00)
0.146.158     PropMgr:0000393c:00:00: *** mpu_send(0a 08 03 06 00 00 00 00 00)
0.146.631     PropMgr:0000393c:00:00: *** mpu_send(06 04 03 10 00)
0.147.472     PropMgr:0000393c:00:00: *** mpu_send(06 05 03 07 ff 00)
0.170.272     PropMgr:0000393c:00:00: *** mpu_send(06 05 01 2e 01 00)
0.172.456     PropMgr:0000393c:00:00: *** mpu_send(0a 08 03 0b 00 00 00 00 00)
0.233.858     PropMgr:0000393c:00:00: *** mpu_send(22 20 03 3b 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00)
0.244.377     PropMgr:0000393c:00:00: *** mpu_send(06 05 04 0e 01 00)
0.253.125     PropMgr:0000393c:00:00: *** mpu_send(08 07 01 2a 17 d4 00 00)
0.253.721     PropMgr:0000393c:00:00: *** mpu_send(06 05 03 07 05 00)
0.254.194     PropMgr:0000393c:00:00: *** mpu_send(0a 08 03 06 00 00 04 59 00)
0.255.082     PropMgr:0000393c:00:00: *** mpu_send(06 05 03 11 01 00)
0.255.497     PropMgr:0000393c:00:00: *** mpu_send(06 05 02 0a 00 00)
0.262.666     PropMgr:0000393c:00:00: *** mpu_send(06 05 03 0d 00 00)
0.262.971     PropMgr:0000393c:00:00: *** mpu_send(06 05 03 0c 00 00)
0.290.910     PropMgr:0000393c:00:00: *** mpu_send(06 05 03 19 01 00)
0.323.529     PropMgr:0000393c:00:00: *** mpu_send(06 05 09 11 01 00)
0.324.523     PropMgr:0000393c:00:00: *** mpu_send(12 11 09 15 00 00 00 00 00 00 00 00 00 00 00 00 00 00)
0.325.260     PropMgr:0000393c:00:00: *** mpu_send(14 13 09 25 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00)
0.325.979     PropMgr:0000393c:00:00: *** mpu_send(06 05 01 5a 00 00)
0.326.420     PropMgr:0000393c:00:00: *** mpu_send(06 05 09 2f 01 00)
0.349.545     PropMgr:0000393c:00:00: *** mpu_send(14 13 09 25 02 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00)
0.350.631     PropMgr:0000393c:00:00: *** mpu_send(06 05 01 5a 01 00)
0.375.568     PropMgr:0000393c:00:00: *** mpu_send(0a 08 09 1a 00 00 00 00 00)
0.376.172     PropMgr:0000393c:00:00: *** mpu_send(0a 08 09 24 00 00 00 00 00)
0.388.969     PropMgr:0000393c:00:00: *** mpu_send(08 06 03 18 00 00 00)
0.389.683     PropMgr:0000393c:00:00: *** mpu_send(08 06 03 1f 00 00 00)
0.390.479     PropMgr:0000393c:00:00: *** mpu_send(06 05 03 13 00 00)
0.390.880     PropMgr:0000393c:00:00: *** mpu_send(06 05 03 1e 00 00)
0.399.998     PropMgr:0000393c:00:00: *** mpu_send(06 05 01 8a 00 00)
0.400.643     PropMgr:0000393c:00:00: *** mpu_send(08 06 03 1f 00 00 00)
0.401.637     PropMgr:0000393c:00:00: *** mpu_send(0a 08 01 3b ff ff 00 00 00)
0.405.589     PropMgr:0000393c:00:00: *** mpu_send(0a 08 01 3b ff 00 00 00 00)


shutter-bug
0.088.785     Startup:000038b8:00:00: *** mpu_send(06 04 02 00 00)
0.116.777     PropMgr:0000393c:00:00: *** mpu_send(08 06 00 00 02 0e 00)
0.118.057     PropMgr:0000393c:00:00: *** mpu_send(08 06 00 00 01 97 00)
0.122.037     PropMgr:0000393c:00:00: *** mpu_send(0a 08 03 57 10 33 00 01 00)
0.146.153     PropMgr:0000393c:00:00: *** mpu_send(0a 08 03 06 00 00 00 00 00)
0.146.618     PropMgr:0000393c:00:00: *** mpu_send(06 04 03 10 00)
0.147.482     PropMgr:0000393c:00:00: *** mpu_send(06 05 03 07 ff 00)
0.170.538     PropMgr:0000393c:00:00: *** mpu_send(06 05 01 2e 01 00)
0.172.702     PropMgr:0000393c:00:00: *** mpu_send(0a 08 03 0b 00 00 00 00 00)
0.234.753     PropMgr:0000393c:00:00: *** mpu_send(22 20 03 3b 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00)
0.245.293     PropMgr:0000393c:00:00: *** mpu_send(06 05 04 0e 01 00)
0.254.129     PropMgr:0000393c:00:00: *** mpu_send(08 07 01 2a 17 d3 00 00)
0.254.905     PropMgr:0000393c:00:00: *** mpu_send(06 05 03 07 05 00)
0.255.395     PropMgr:0000393c:00:00: *** mpu_send(0a 08 03 06 00 00 08 c7 00)
0.256.404     PropMgr:0000393c:00:00: *** mpu_send(06 05 03 11 01 00)
0.256.811     PropMgr:0000393c:00:00: *** mpu_send(06 05 02 0a 00 00)
0.264.150     PropMgr:0000393c:00:00: *** mpu_send(06 05 03 0d 00 00)
0.264.453     PropMgr:0000393c:00:00: *** mpu_send(06 05 03 0c 00 00)
0.291.576     PropMgr:0000393c:00:00: *** mpu_send(06 05 03 19 01 00)
0.324.192     PropMgr:0000393c:00:00: *** mpu_send(06 05 09 11 01 00)
0.325.238     PropMgr:0000393c:00:00: *** mpu_send(12 11 09 15 00 00 00 00 00 00 00 00 00 00 00 00 00 00)
0.325.968     PropMgr:0000393c:00:00: *** mpu_send(14 13 09 25 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00)
0.326.703     PropMgr:0000393c:00:00: *** mpu_send(06 05 01 5a 00 00)
0.327.142     PropMgr:0000393c:00:00: *** mpu_send(06 05 09 2f 01 00)
0.347.417     PropMgr:0000393c:00:00: *** mpu_send(14 13 09 25 02 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00)
0.348.033     PropMgr:0000393c:00:00: *** mpu_send(06 05 01 5a 01 00)
0.372.658     PropMgr:0000393c:00:00: *** mpu_send(0a 08 09 1a 00 00 00 00 00)
0.373.331     PropMgr:0000393c:00:00: *** mpu_send(0a 08 09 24 00 00 00 00 00)
0.385.430     PropMgr:0000393c:00:00: *** mpu_send(08 06 03 18 00 00 00)
0.386.130     PropMgr:0000393c:00:00: *** mpu_send(08 06 03 1f 00 00 00)
0.386.919     PropMgr:0000393c:00:00: *** mpu_send(06 05 03 13 00 00)
0.387.330     PropMgr:0000393c:00:00: *** mpu_send(06 05 03 1e 00 00)
0.396.373     PropMgr:0000393c:00:00: *** mpu_send(06 05 01 8a 00 00)
0.397.026     PropMgr:0000393c:00:00: *** mpu_send(08 06 03 1f 00 00 00)
0.398.011     PropMgr:0000393c:00:00: *** mpu_send(0a 08 01 3b ff ff 00 00 00)
0.401.832     PropMgr:0000393c:00:00: *** mpu_send(0a 08 01 3b ff 00 00 00 00)


If I understand the way this log is laid out both these logs show the first mpu_send event (Startup) starting at approximately the same time but the shutter-bug log is running ahead of the no shutter-bug log. Something probably even more significant is that there is a difference between a few lines. I ran multiple startup logs and while the no shutter-bug logs were always the same, the logs with the shutter-bug showed some differences:

no shutter-bug - all iterations were the same
0.253.125     PropMgr:0000393c:00:00: *** mpu_send(08 07 01 2a 17 d4 00 00)
0.253.721     PropMgr:0000393c:00:00: *** mpu_send(06 05 03 07 05 00)
0.254.194     PropMgr:0000393c:00:00: *** mpu_send(0a 08 03 06 00 00 04 59 00)


shutter-bug 1
0.254.129     PropMgr:0000393c:00:00: *** mpu_send(08 07 01 2a 17 d3 00 00)
0.254.905     PropMgr:0000393c:00:00: *** mpu_send(06 05 03 07 05 00)
0.255.395     PropMgr:0000393c:00:00: *** mpu_send(0a 08 03 06 00 00 08 c7 00)


shutter-bug 2
0.253.759     PropMgr:0000393c:00:00: *** mpu_send(08 07 01 2a 17 d4 00 00)
0.254.294     PropMgr:0000393c:00:00: *** mpu_send(06 05 03 07 05 00)
0.254.786     PropMgr:0000393c:00:00: *** mpu_send(0a 08 03 06 00 00 08 c8 00)


shutter-bug 3
0.313.787     PropMgr:0000393c:00:00: *** mpu_send(08 07 01 2a 17 d4 00 00)
0.314.456     PropMgr:0000393c:00:00: *** mpu_send(06 05 03 07 05 00)
0.315.267     PropMgr:0000393c:00:00: *** mpu_send(0a 08 03 06 00 00 08 c7 00)


I don't know, it just seems strange to me. Shouldn't the mpu_send events always be the same on startup like the no shutter-bug logs?
Title: Re: EOSM Shutter-Bug reboot
Post by: a1ex on March 31, 2018, 08:55:57 AM
Quote from: dfort on March 31, 2018, 03:31:36 AM
grepped mpu_send

That's a warning flag - if you disable everything else (including Canon messages) to have minimal overhead, you should get *only* mpu_send and mpu_recv lines in the log. Preparing one such build here (#2 will be the minimalist one):

https://builds.magiclantern.fm/jenkins/view/Experiments/job/startup-log-mpu/

In particular, logging all the interrupts has a high overhead.

The startup process is non-deterministic anyway, so you won't get the same sequence twice (if you look at all the messages, including non-MPU ones, some will be different).

To find out what's up with the above differences, try annotating the logs (https://www.magiclantern.fm/forum/index.php?topic=17596.msg198329#msg198329) (it will "explain" what each MPU message is).
Title: Re: EOSM Shutter-Bug reboot
Post by: dfort on March 31, 2018, 04:10:18 PM
Quote from: a1ex on March 31, 2018, 08:55:57 AM
To find out what's up with the above differences, try annotating the logs (https://www.magiclantern.fm/forum/index.php?topic=17596.msg198329#msg198329) (it will "explain" what each MPU message is).

So much for my great discovery--the differences between the logs are because of:


0.253.125     PropMgr:0000393c:00:00: *** mpu_send(08 07 01 2a 17 d4 00 00)                         ; PROP_CARD2_FILE_NUMBER
...
0.254.194     PropMgr:0000393c:00:00: *** mpu_send(0a 08 03 06 00 00 04 59 00)                      ; PROP_AVAIL_SHOT


Of course the logs are different, the cards are different sizes, 32GB for the no shutter-bug card and 64GB for the shutter-bug card thus the difference in PROP_AVAIL_SHOT. I probably checked to see if the shutter-bug was active a few times on the shutter-bug card so that might be why the file number changed.

Quote from: a1ex on March 31, 2018, 08:55:57 AM
Preparing one such build here (#2 will be the minimalist one):

That link points to the latest build which is currently build #3 (https://builds.magiclantern.fm/jenkins/view/Experiments/job/startup-log-mpu/3/). I was able to find build #2 (https://builds.magiclantern.fm/jenkins/view/Experiments/job/startup-log-mpu/2/) but in all cases there is no EOSM build. Easy on me you're making it not.

[EDIT] Without the new LiveViewApp_handler_BL_JudgeBottomInfoDispTimerState stub or disabling CONFIG_LVAPP_HACK_PATCH in internals.h the EOSM won't compile in these io_trace branches.
Title: Re: EOSM Shutter-Bug reboot
Post by: dfort on March 31, 2018, 05:06:32 PM
Ok--applied the patch and created new annotated logs. Hope this brings some light to the shutter-bug:

https://www.dropbox.com/sh/eq0k1qcyeijt1c2/AADbKjfbrIio1fflq6L1gNi8a?dl=0

Interesting side note, on one of my cards that always shows the shutter-bug this build didn't show the shutter-bug but that was only on the first boot. All subsequent startups, including after formatting the card, the shutter-bug would show up.

I know that I'm supposed to be experimenting with delaying various mpu calls but so far nothing I have tried has made any difference. Any hints? Of course the Coup De Grâce for the shutter-bug is what I really want.
Title: Re: EOSM Shutter-Bug reboot
Post by: a1ex on March 31, 2018, 05:31:22 PM
Start with a card that does not show the shutter bug with logging enabled.

Then delay the first mpu_send call by some large amount, e.g. 100ms. Does it show the shutter bug? (I assume yes)

Then try 10ms. Does it show the shutter bug? (I assume no)

If my assumptions were correct, keep reading, otherwise stop.

Binary search to find the critical delay value (where slightly higher values show the shutter bug and slightly lower values do not). Timer resolution for msleep is 10 ms. Cross-check the value with memset64 critical buffer size, which you already found. Use benchmark result to convert between the two, so you can compare.

Once that works, delay some other mpu_send call (not the first). At some point, I'd expect the shutter bug to disappear, even if you increase the delay.

Write down where that happens - it's going to be a very important clue.
Title: Re: EOSM Shutter-Bug reboot
Post by: dfort on April 01, 2018, 05:56:23 AM
Quote from: a1ex on March 31, 2018, 05:31:22 PM
Start with a card that does not show the shutter bug with logging enabled.

Check!

Quote from: a1ex on March 31, 2018, 05:31:22 PM
Then delay the first mpu_send call by some large amount, e.g. 100ms. Does it show the shutter bug? (I assume yes)

I did this before on the the dm-spy-experiments branch and it didn't work but it is doing exactly as you assumed it would with this trimmed down io_trace_full branch.

Magic Lantern Nightly.2018Mar31.EOSM202
Camera   : EOSM
Firmware : 202
Changeset: 90647297667d+ (io_trace_full_shutter-bug)
Built on : 2018-03-31 20:59:25 by rosiefort@RosieFoComputer

diff -r 90647297667d src/dm-spy-extra.c
--- a/src/dm-spy-extra.c Sat Mar 31 07:37:17 2018 -0700
+++ b/src/dm-spy-extra.c Sat Mar 31 13:59:25 2018 -0700
@@ -57,6 +57,8 @@
static void StartEDmac_log(uint32_t* regs, uint32_t* stack, uint32_t pc);
static void TryPostEvent_log(uint32_t* regs, uint32_t* stack, uint32_t pc);

+static int call_num = 0;
+
#define DM_OVERRIDE(field, new_value) \
     do { \
         struct debug_msg * dm = debug_get_last_block(); \
@@ -706,6 +708,14 @@

static void mpu_send_log(uint32_t* regs, uint32_t* stack, uint32_t pc)
{
+  call_num++;
+  if (call_num == 1) /* delay the first call only */
+  {
+    /* do the math after running memory benchmark, then round to multiples of 10 */
+    /* let's say it's 53ms -> 50 should not cause the bug, 60 should cause it */
+    msleep(100);
+  }
+
#ifdef MPU_DELAY_SEND
   if (MPU_DELAY_SEND)
   {


CONFIG_DEBUG_INTERCEPT_STARTUP = y


Quote from: a1ex on March 31, 2018, 05:31:22 PM
Then try 10ms. Does it show the shutter bug? (I assume no)

If my assumptions were correct, keep reading, otherwise stop.

Right again so let's continue.

Quote from: a1ex on March 31, 2018, 05:31:22 PM
Binary search to find the critical delay value (where slightly higher values show the shutter bug and slightly lower values do not). Timer resolution for msleep is 10 ms.

Now for the crazy part:


10  - no shutter-bug
100 - shutter-bug
50  - no shutter-bug
70  - no shutter-bug
80  - shutter-bug (format card) no shutter-bug
90  - no shutter-bug
100 - no shutter-bug (what the?)
110 - no shutter-bug
120 - no shutter-bug
130 - no shutter-bug
140 - no shutter-bug
150 - no shutter-bug
160 - shutter-bug (format card) no shutter-bug
170 - no shutter-bug
180 - no shutter-bug
190 - no shutter-bug
200 - no shutter-bug
210 - no shutter-bug
220 - no shutter-bug
230 - no shutter-bug
240 - no shutter-bug
250 - no shutter-bug
300 - shutter-bug (format card) shutter-bug
270 - shutter-bug (format card) shutter-bug
260 - shutter-bug (format card) shutter-bug
250 - shutter-bug (format card) no shutter-bug
260 - no shutter-bug (restart) shutter-bug
270 - shutter-bug (format card) no shutter-bug
280 - shutter-bug (format card) shutter-bug


Ok--so it looks like 280 is where the shutter-bug can be reliably reproduced on a SanDisk Extreme Pro 32GB card. I don't want to say that there is a lower end that is "safe" because I have experienced the shutter-bug on these cards, albeit on very rare instances. Why the card formatting? I've got a SanDisk Extreme Pro 128GB card that can be cleared of the shutter-bug by formatting the card but it will return on restart. The Lexar 64GB card that reliably shows the shutter-bug has in fact been clear of the shutter-bug on one of the tests that I ran in Reply #39 (https://www.magiclantern.fm/forum/index.php?topic=21728.msg199091#msg199091). My hypothesis is that there are several cards that fall into a rather large zone that could be affected by the shutter-bug depending on the card's memory fragmentation.

Quote from: a1ex on March 31, 2018, 05:31:22 PM
Cross-check the value with memset64 critical buffer size, which you already found. Use benchmark result to convert between the two, so you can compare.

On Reply #35 (https://www.magiclantern.fm/forum/index.php?topic=21728.msg199066#msg199066) I came up with 180 and on Reply #20 (https://www.magiclantern.fm/forum/index.php?topic=21728.msg198456#msg198456) I came up with 0x0B50BBC -- is this the memset64 critical buffer size? How does this translate into some value that we can use?

Quote from: a1ex on March 31, 2018, 05:31:22 PM
Once that works, delay some other mpu_send call (not the first). At some point, I'd expect the shutter bug to disappear, even if you increase the delay.

I suppose you want me to keep using the same card now that we have it tuned into something that reliably shows the shutter-bug but how much to delay some other mpu_send call? It seems a little heavy handed to use 280ms, doesn't it?

Quote from: a1ex on March 31, 2018, 05:31:22 PM
Write down where that happens - it's going to be a very important clue.

Has an important clue emerged out of all this yet?
Title: Re: EOSM Shutter-Bug reboot
Post by: dfort on April 01, 2018, 03:51:31 PM
Perhaps some logs will help determine where to put the next mpu_send delay. This time the card was formatted before installing ML so these are first time startup logs with and without the shutter-bug on the same card. The only difference being the msleep(280) call in dm-spy-extra.c in order to induce the shutter-bug. After startup I checked to make sure that the shutter-bug was or wasn't present.

Annotated logs:

https://www.dropbox.com/sh/nskqhfj1am797et/AAA11okP24-gPpJwEHVGSVLAa?dl=0

A couple of quick observations: "PROP 80050044" is missing in the log with the shutter-bug and the positions of "PROP 8004004B" and "Complete WaitID = Mode group" are swapped between the two logs.

[EDIT] Holy smokes -- first thing I tried worked. Must be the April 1st effect!

static void mpu_send_log(uint32_t* regs, uint32_t* stack, uint32_t pc)
{
  call_num++;
  if (call_num == 1) /* delay the first call only */
  {
    /* do the math after running memory benchmark, then round to multiples of 10 */
    /* let's say it's 53ms -> 50 should not cause the bug, 60 should cause it */
    /* tests on camera show that 270ms is the least amount that will reliably reproduce the shutter-bug */
    msleep(280);
  }
  if (call_num == 53) /* PROP 80050044 */
  {
    msleep(280);
  }

#ifdef MPU_DELAY_SEND


Uploading the log and running some more tests to make sure this is repeatable but it looks promising.

[EDIT2] Took out the first delay and put it on the Lexar 64GB card that reliably shows the shutter-bug and -- no shutter-bug! Just to make sure this wasn't a fluke I formatted the card and re-installed ML just like on the previous tests. The shutter-bug came back, bummer. Ok, more testing to do.
Title: Re: EOSM Shutter-Bug reboot
Post by: a1ex on April 01, 2018, 04:12:04 PM
Yeah, 280 is a bit high. From the benchmark, memset64 uncacheable runs at 180MB/s. Since it has to fill 0x0B50BBC, that would take 0x0B50BBC / (1024*1024) / 180 = 0.062 seconds. In other words, I've expected 70 ms to show the shutter but and 60 ms not to show it. At least the order of magnitude is not too different.

Does the shutter bug disappear if you delay some other mpu_send call? (second, third...)
Title: Re: EOSM Shutter-Bug reboot
Post by: dfort on April 01, 2018, 04:38:20 PM
Looks like we were both posting at the same time. Check my Reply #42 (https://www.magiclantern.fm/forum/index.php?topic=21728.msg199135#msg199135).

Of course I should test to make sure the results I got with the SanDisk card (no shutter-bug card) are repeatable and to see if I can find something on the Lexar card (shutter-bug card) that can be reliably reproduced.

Title: Re: EOSM Shutter-Bug reboot
Post by: dfort on April 01, 2018, 05:13:37 PM
Nothing seems to be reliably reproducible with the exception of being able to get the shutter-bug to reliably show up on a card that usually isn't affected by the shutter-bug. Ugh!

Maybe we're trying to hit a moving target? What I mean is that the mpu_send call that needs to be delayed might be showing up in different locations? Seems like I was on to something when I found out that "PROP 80050044" is missing in the log with the shutter-bug. Went back to previous logs and it seems to be consistant. Well, almost--the log with the secondary mpu_send delay is also missing that property yet it didn't show the shutter-bug.

[EDIT] Just an educated guess at  "PROP 80050044" -- PROP_LV_CONTINUOUS_STOP ?? Searched the disassembly for a match.
Title: Re: EOSM Shutter-Bug reboot
Post by: a1ex on April 01, 2018, 06:39:37 PM
PROP_LV_PROHIBIT_OUTPUT?


AB96A> **INT-36h*:000e14d4:00:00: *** mpu_recv(06 04 09 2d 01), from 1dedc
AB9F6>    PropMgr:ff31fc9c:01:03: Receive Size = 0 ( ID = 0x9, 0x2D )
ABA77>    PropMgr:ff31ce4c:01:03: Deliv WaitID = 0x80050044, 0xFF183400(1)
ABAE9>        Gmt:ff17c93c:9a:01: gmtProperty ID=0x80050044(0x1)
ABB26>        Gmt:ff1800e0:9a:16:  PROP_LV_PROHIBIT_OUTPUT 1 1
Title: Re: EOSM Shutter-Bug reboot
Post by: g3gg0 on April 01, 2018, 09:19:51 PM
Quote from: dfort on April 01, 2018, 05:13:37 PM
Maybe we're trying to hit a moving target?

sporadic race conditions in a system you didn't design are the worst thing you could ever face :)
Title: Re: EOSM Shutter-Bug reboot
Post by: dfort on April 01, 2018, 09:24:32 PM
Interesting. PROP_LV_PROHIBIT_OUTPUT sounds like something that would block the camera from doing something--like saving an image.

However, going over previous logs it looks like some of the shutter-bug logs have the same values for that property as the no shutter-bug logs so it is probably just another dead end.
Title: Re: EOSM Shutter-Bug reboot
Post by: g3gg0 on April 01, 2018, 09:28:18 PM
yet the timing might play some role.
did it change during startup?
Title: Re: EOSM Shutter-Bug reboot
Post by: dfort on April 01, 2018, 10:17:36 PM
Yes, the timing changes slightly and sometimes the order also changes. I thought I had something significant this morning when the log that exhibited the shutter-bug showed that the mpu_send event that sets what we found out was PROP_LV_PROHIBIT_OUTPUT was skipped. I counted lines and figured it was the 53rd mpu_send call so that's the one I delayed and it worked on both the SanDisk card hacked to show the shutter-bug as well as the Lexar card that shows the shutter-bug. However, it wasn't consistant over several camera restarts.

https://www.dropbox.com/sh/nskqhfj1am797et/AAA11okP24-gPpJwEHVGSVLAa?dl=0
Title: Re: EOSM Shutter-Bug reboot
Post by: dfort on April 02, 2018, 08:01:15 AM
Yet more testing. I wasn't getting anything worth posting with the SanDisk card hacked to show the shutter-bug so I grabbed the Lexar card that reliably shows the shutter-bug. This morning it did start up without the shutter-bug by delaying mpu_send call 53 by 280ms so how about trying calls 2-60 one at a time? In addition, 280ms is heavy handed so I went with the expected 70ms delay. Here's how it went:

call  3 delay 70 - worked but not repeatable
call 26 delay 70 - worked but not repeatable
call 28 delay 70 - worked log saved not repeatable
call 34 delay 70 - worked but not repeatable
call 42 delay 70 - worked but not repeatable
call 43 delay 70 - worked but not repeatable
call 50 delay 70 - worked but not repeatable
call 51 delay 70 - worked but not repeatable


Then I tried doing "else if" statements that covered all of the calls that worked but that didn't work at all.

Finally, I tried the trick of pressing the ON/OFF button quickly twice as soon as something comes on the LiveView screen and was able to get it to start up without the shutter-bug. Yeah, sounds weird but that's one of the workarounds that was reported on the Bitbucket issue page (https://bitbucket.org/hudson/magic-lantern/issues/1893/eos-m-shutter-bug). Now that I think of it, that workaround is probably doing the delay only it is something that is repeatable--thought not very practical. Saved that log in the same Dropbox folder listed on my previous posts from today. Maybe there are some important clues in these logs?

Looks like this is the section that gets a big delay with the ON/OFF/ON workaround:

0.329.426  **INT-36h*:0001dedc:00:00: *** mpu_recv(06 05 01 8a 00 00)                               ; PROP_SW2_MOVIE_START
0.330.038     PropMgr:0000393c:00:00: *** mpu_send(0a 08 01 3b ff ff 00 00 00)                      ; PROP_USBDEVICE_CONNECT
0.331.657     PropMgr:0000393c:00:00: *** mpu_send(0a 08 01 3b ff 00 00 00 00)                      ; PROP_USBDEVICE_CONNECT
0.707.970  **INT-36h*:0001dedc:00:00: *** mpu_recv(06 05 03 5d 94 00)                               ; PROP 80030062
0.708.158  **INT-36h*:0001dedc:00:00: *** mpu_recv(06 05 03 38 96 00)                               ; PROP 80030035
0.788.374     PropMgr:0000393c:00:00: *** mpu_send(08 07 01 2a 1c e4 00 00)                         ; PROP_CARD2_FILE_NUMBER
0.789.574     PropMgr:0000393c:00:00: *** mpu_send(08 07 01 2a 1c e4 00 00)                         ; PROP_CARD2_FILE_NUMBER


[EDIT] Another false assumption. Turns out that there is always a delay of about 300-400ms at mpu_send event 37 whether or not the ON/OFF/ON workaround is used.
Title: Re: EOSM Shutter-Bug reboot
Post by: dfort on April 02, 2018, 07:08:44 PM
Whoopi -- more logs:

https://www.dropbox.com/sh/91yck4i9cn1fbi9/AACUnEJ-X_ERgXJCkLO7AOi0a?dl=0

Since the ON/OFF/ON workaround is something that consistently kills the shutter-bug, I thought it might be interesting to see what makes it tick. Maybe figure out where to put a delay?

Process - make 5 runs each of the ON/OFF/ON workaround with the Lexar (shutter-bug) card and the SanDisk (no shutter-bug) card. The SanDisk card has the delay on the first mpu_send event to induce the shutter-bug while the Lexar card does not (not needed). Also saved are reference logs showing what happens without the ON/OFF/ON workaround. One last log is the SanDisk card started up normally without the shutter-bug hack as a "gold standard" reference.

One of the things I noticed after annotating the logs is that all of the ON/OFF/ON workaround logs show PROP_ACTIVE_SWEEP_STATUS. This is the sensor cleaning on shutdown event. This makes sense because the camera is shutdown and quickly restarted. I also tried disabling sensor cleaning and the PROP_ACTIVE_SWEEP_STATUS doesn't show up on the log but it has no effect on the shutter-bug or the effectiveness of the ON/OFF/ON workaround so I won't go any further down that road.

Interesting is that even though the power is shutdown then quickly turned back on it never goes back to the "Init group" and doesn't repeat most of the early mpu_send calls.

These do not repeat after ON/OFF/ON is triggered:
   ; Init group
   ; Complete WaitID = Mode group
   ; Complete WaitID = PROP 8003006F
   ; PROP 8003005E
   ; PROP 80030008
   ; PROP_SAVE_MODE


But wait--there's more. As long as I was logging workarounds I made one last annotated log of the lens twist workaround only this time with the trimmed down io_trace_full build on the Lexar card. I keep thinking that one way to kill the shutter-bug is to re-initialize the lens properties at some point in the startup process and maybe this log will show how that can be done?
Title: Re: EOSM Shutter-Bug reboot
Post by: dfort on April 05, 2018, 06:14:27 PM
Quote from: a1ex on April 01, 2018, 10:23:19 PM
Let's try some overclocking: sd_uhs.mo (https://a1ex.magiclantern.fm/bleeding-edge/uhs/sd_uhs.mo) (to be loaded on top of crop_rec_4k branch). Got the idea after looking into the EOS M shutter bug and understanding how clocks work (https://www.magiclantern.fm/forum/index.php?topic=21728.msg198529#msg198529) - to some extent.

Cool - good to know something good has emerged from this.

First thing I tried was to see if overclocking cures the shutter-bug. Nope.

Found something interesting when untwisting the lens--which always clears up the shutter-bug. PROP_LENS (0x80030011) temporarily changes value to zero.

8CBEF>     PtpDps:ff292d3c:33:01: ptpPropertyChangeEvent[80030011][1f][0][0]
8CC2E>     PtpDps:ff2838c8:32:01: GetConnectSessionFirst end[0][3]
8CC5F>     PtpDps:ff29e648:33:01: PROP:0x80030011,0


Note that this is not the same as starting the camera without a lens attached:

9CC43>     PtpDps:ff292d3c:33:01: ptpPropertyChangeEvent[80030011][1f][10542c01][0]
9CC7E>     PtpDps:ff2838c8:32:01: GetConnectSessionFirst end[0][3]
9CCA2>     PtpDps:ff29e648:33:01: PROP:0x80030011,10542c01


Maybe there's something here? Do a prop_request_change on PROP_LENS to "0" hoping it doesn't brick the camera so next time the lens is polled it will reload the properties and kill the shutter-bug? Maybe this is a read only property and prop_request_change won't work? Never tried it--sounds sort of scary.

[EDIT] Never mind. Looks like that property is read only.
Title: Re: EOSM Shutter-Bug reboot
Post by: dfort on May 01, 2018, 09:39:20 PM
Picked up a couple of super cheap sd cards at the local electronics swap meet (http://w6trw.com/index.php/w6trw-amateur-radio-club-swap-meet/) and turns out that they aren't affected by the shutter-bug. These cards still had the GPS maps on them but after formatting and installing ML on them, they worked fine. I didn't run the benchmark speed test on them but they are very slow cards.

(https://farm1.staticflickr.com/908/41790424312_f4efff02f0.jpg) (https://flic.kr/p/26ESRc7)

More information about card sizes and formatting:

https://www.sdcard.org/developers/overview/capacity/index.html

So far all SD standard cards I've tested don't show the shutter-bug, the SanDisk Extreme 32GB which is a SDHC standard card is fine most of the time and no SDXC standard card I've tested is immune from the bug.

Yeah, I'd like to resolve the shutter-bug once and for all but sometimes you just need to let something rest for a while.
Title: Re: EOSM Shutter-Bug reboot
Post by: Danne on May 01, 2018, 10:09:49 PM
sd_uhs? :) :)
Title: Re: EOSM Shutter-Bug reboot
Post by: a1ex on May 01, 2018, 10:10:54 PM
The file I/O routines have some trouble with large filesystems - for example, the portable ROM dumper works best on a 256MB filesystem, but a 16GB card is already too large for it. The issue can be reproduced in QEMU, without loading any code from the card first: from the FROMUTILITY menu, jump to "Init SD" (address printed by portable ROM dumper), then to the "Open file for write" function (address printed there as well) and use a GDB breakpoint to supply some parameters (drive, file name, address, size). Maybe these bootloader I/O functions are just very slow with large cards?

During normal execution (with boot flag disabled), there's no SD card access made by the bootloader. With boot flag enabled, the bootloader has to mount the filesystem in order to load autoexec.bin from it.

If you copy the 256MB QEMU image with dd to the large card, and run ML from it, does it still show the shutter bug?

(I've suggested a similar test earlier, but the raw card speed can also influence the result; this should no longer be the case with a 256MB filesystem)

@Danne: we are talking about bootloader, before ML code is loaded and has a chance to run.
Title: Re: EOSM Shutter-Bug reboot
Post by: Danne on May 01, 2018, 10:20:04 PM
Yes, following this thread always, what I can understand. Nasty bug...
Title: Re: EOSM Shutter-Bug reboot
Post by: dfort on May 02, 2018, 03:30:51 AM
Quote from: a1ex on May 01, 2018, 10:10:54 PM
If you copy the 256MB QEMU image with dd to the large card, and run ML from it, does it still show the shutter bug?

No it doesn't. I think we might have played around with this before but it is worth revisiting. I took a 64GB card that quite reliably shows the shutter bug and copied the QEMU sd.img to it this way:

sudo dd if=~/qemu-eos/sd.img of=/dev/disk3
Password:
506880+0 records in
506880+0 records out
259522560 bytes transferred in 245.820091 secs (1055742 bytes/sec)


Viola! No shutter-bug. Of course now that 64GB card is only 256MB. Formatted card in camera and it is back to 64GB and the shutter-bug is back too.
Title: Re: EOSM Shutter-Bug reboot
Post by: Licaon_Kter on May 18, 2018, 10:52:52 AM
Keeping an eye on this too
Title: Re: EOSM Shutter-Bug reboot
Post by: ArcziPL on May 18, 2018, 12:51:07 PM
Quote from: dfort on May 02, 2018, 03:30:51 AMI took a 64GB card that quite reliably shows the shutter bug and copied the QEMU sd.img to it
Does the camera format the card using another filesystem than the QEMU image uses?
Title: Re: EOSM Shutter-Bug reboot
Post by: dfort on May 18, 2018, 10:26:24 PM
The camera formats the card differently depending on what card you're using, SD/SDHC/SDXC. According to the Mac OS disk utility the QEMU sd.img file is formatted as MS-DOS (FAT16) which is used for SD standard cards. Note that SD standard (2GB and less) do not show the shuter-bug. More information here:

https://www.sdcard.org/developers/overview/capacity/index.html
Title: Re: EOSM Shutter-Bug reboot
Post by: a1ex on May 19, 2018, 10:32:12 AM
Current hypothesis: what makes a difference is the time needed to index the contents of that filesystem (or whatever Canon bootloader does in order to load autoexec.bin). That time might be smaller with a FAT32 filesystem, as it's less complex. That time may also depend on whether the card contains photos or not, or maybe just other files in the root directory.

The 256MB image used with QEMU makes a difference because the bootloader no longer looks at all the unused space on the card (a process that takes a while on larger cards). If you are wondering where that magic size comes from -- one of my cards has that size, so the initial QEMU image was a dd image of that card. However, now that we can format the virtual card directly from the emulated camera, you can use any other size (dd if=/dev/zero of=sd.img count=whatever). The largest I've tested in the emulator is a 64GB image copied from a real card, saved as qcow2 (https://bitbucket.org/hudson/magic-lantern/src/qemu/contrib/qemu/README.rst#rst-header-creating-custom-sd-cf-card-images).

You can format the 64GB card as FAT32 in the camera with the experimental card_fmt (https://www.magiclantern.fm/forum/index.php?topic=13983.msg163200#msg163200) module. Maybe it helps with some cards (though I remember copying the FAT32 filesystem from a 32GB card didn't help...)
Title: Re: EOSM Shutter-Bug reboot
Post by: dfort on July 27, 2018, 08:36:20 PM
Wondering if anybody experienced the shutter-bug with the EF-M 28mm IS Macro lens?

(https://farm1.staticflickr.com/858/41868387390_158df5925b.jpg) (https://flic.kr/p/26MLqU7)

Just curious if what is causing the bug is related to the image stabilization or if it only happens on zoom lenses. All EF-M IS lenses up until now were zoom lenses--in fact all EF-M zoom lenses have IS. This macro lens is the first prime lens that I'm aware of that has image stabilization.
Title: Re: EOSM Shutter-Bug reboot
Post by: Kathode-Ray on October 18, 2018, 12:17:17 PM
I can confirm that with the 28mm Macro lens the shutter-bug is also present.

I used the crop_rec_4k_mlv_lite_snd.2018Apr03.EOSM202 build. As soon as I unlock the lens and turn it to the other shooting mode (normal to super macro or v.v.), the bug is gone.

Switching off the camera for few minutes usually brings it back, but sometimes it also decides to stay away, and then I have to pull the battery to get the bug back. I tried to reproduce it reliably for two hours but no luck...
Title: Re: EOSM Shutter-Bug reboot
Post by: dfort on October 18, 2018, 06:17:08 PM
v.v.?

Ok--I was betting that the shutter-bug would show up with this lens.

Looking at all of the available EF-M lenses from Canon, only the 22mm prime lens doesn't show the shutter-bug. It is also the only non Image Stabilization (IS) lens from Canon. Other manufacturers like Tamron also make EF-M lenses, also with their form of image stabilization and also show the shutter-bug.

@Kathode-Ray - you should be able to reproduce the bug with any SD card 64GB or larger, any ML build and starting the camera with the lens in the shooting position. It only affects still images so put the camera in photo mode and try shooting a still. Put the lens in the collapsed position then back to the shooting position and the shutter-bug should disappear. There's no need to pull the battery, simply restarting the camera with the lens in the shooting position will bring it back.
Title: Re: EOSM Shutter-Bug reboot
Post by: dfort on October 20, 2018, 03:32:40 AM
It has been a while since we made any progress on this so I ran a rather unscientific test and found something that may or may not be significant.

I was looking at some posts in the Canon EF lens mount - Communication protocol (https://www.magiclantern.fm/forum/index.php?topic=3796.msg163298#msg163298) topic and one of the links pointed to an interesting experiment to control the lens using an Arduino board (https://www.arduino.cc/). This particular suggestion to throw out cable that connects to the image stabilization system was intriguing:

(https://farm2.staticflickr.com/1945/45386461902_dbb2087469.jpg) (https://flic.kr/p/2c9DtTd)

Not wanting to take apart my lenses and start cutting cables I thought an interesting experiment would be to block each pin on the lens to see if perhaps one will have an affect on the shutter-bug. Yeah, I know the lens/body communication is more complicated than that but as a kid I used to stick various objects in to electrical outlets to see what happens so I thought, why not?

First problem was that I couldn't find a pin out chart for EF-M lenses. The EF to EF-M adapter is wired one-to-one but there are 8 pins on the EF, EF-S side and 9 pins on the EF-M side. Note that two pins are used for P-Gnd on EF lenses but apparently this is different on the EF-M so there might be two pins that are used exclusively in the EF-M lens protocol.

(https://farm2.staticflickr.com/1959/30494370687_75d592c2fb.jpg) (https://flic.kr/p/NsFBZR)

What I did was to use a small piece of tape to block one pin at a time and eventually I found something that eliminated the shutter-bug. Well, sort of. Blocking the pin that is probably the +6 volts to power internal lens focus motors with the lens in manual still photo mode, the shutter-bug disappeared. Of course this isn't a practical solution, the lens isn't getting power so the aperture is stopped down all the way. However, I was able to shoot stills with no sign of the shutter-bug. Now for the weird part, with the camera on full auto mode, the shutter-bug is present. Not only is it present, but the workarounds we use to kill the bug don't work.

If we call that "VBat" pin #1 (if indeed that's what it is on the EOSM) then going through the other pins and blocking them created some interesting results. First of all, the next two pins should be power ground and blocking one should do nothing but on the EOSM I got different results blocking each pin.

Blocking pin #2 caused this error:

(https://farm2.staticflickr.com/1908/44520132745_9f729e7bbc.jpg) (https://flic.kr/p/2aQ6jpB)

and saved this error log (https://pastebin.com/kK81SUgw) using the lua_fix branch.

However, blocking pin #3 caused this error:

(https://farm2.staticflickr.com/1949/44520129485_7ff0605cd8.jpg) (https://flic.kr/p/2aQ6irp)

and this error log (https://pastebin.com/d2BzYmhc). So obviously these aren't the same as the EF/EF-S pins.

Blocking pin #4 the camera wouldn't even start.

Blocking pin #5 gave Err 01 and this error log (https://pastebin.com/t298Dx1r).

Blocking pin #6 gave Err 01 and this error log (https://pastebin.com/PD9Dz3Xc).

Blocking pin #7 gave Err 01 and this error log (https://pastebin.com/TsMC7kGN).

Blocking pin #8 did nothing as far as I could see, no error message or log and the shutter-bug showed up as expected. Maybe it isn't used on the lens I used for this test?

Blocking pin #9 gave Err 01 and this error log (https://pastebin.com/pWcw1cKd).

This test was done with an 11-22mm EF-M lens and a 64GB Komputerbay card that reliably shows the shutter-bug.
Title: Re: EOSM Shutter-Bug reboot
Post by: a1ex on October 20, 2018, 08:48:23 AM
Interesting experiments.

Quote from: dfort on October 20, 2018, 03:32:40 AM
First problem was that I couldn't find a pin out chart for EF-M lenses.

This came up as first result in a search engine (query string: "ef-m pinout"):
https://www.youtube.com/watch?v=iCVcoF_YOY0




One hard way to figure this out would be via emulation, but one has to cover 3 processors:
- the MPU (Renesas SHA2-FPU (https://www.magiclantern.fm/forum/index.php?topic=17596.msg199973;topicseen#msg199973)); this talks directly to the lens
- the lens MCU (unknown, we tried to look into the lens firmware update (https://www.magiclantern.fm/forum/index.php?topic=20969), and IIRC some lens MCUs were identified by @leegong as  Renesas)
- the main CPU (already emulated reasonably well).

Emulating the firmware running on these secondary CPUs is probably the easier part; figuring out their hardware connections is going to be difficult.

If I'll ever look into this, I'll start by emulating the DIGIC 4 MPU (TX19A), because:
- the architecture is well documented (MIPS + some Toshiba extensions)
- the exact MCU used in these cameras is documented as well
- I've actually emulated a few instructions from the TX19A firmware with NikonHacker's emulator some time ago

Functionally, both MPUs (TX19A and SHA2-FPU) do pretty much the same thing; the messages exchanged with the main CPU are pretty much the same.
Title: Re: EOSM Shutter-Bug reboot
Post by: dfort on October 20, 2018, 06:01:36 PM
Quote from: a1ex on October 20, 2018, 08:48:23 AM
This came up as first result in a search engine (query string: "ef-m pinout"):
https://www.youtube.com/watch?v=iCVcoF_YOY0

I did that exact same search and looked at the video but the only pinout information was this:

(https://farm2.staticflickr.com/1935/31573976658_4d0ff3f62e.jpg) (https://flic.kr/p/Q75TeU)

That didn't help explain how all of the 8 pins on the EF/EF-S side were wired to 9 pins on the EF-M side. This is the best illustration I could find about the EF pinouts, taken from this document on How to Move Canon EF Lenses (http://web.media.mit.edu/~bandy/invariant/move_lens.pdf).

(https://farm2.staticflickr.com/1911/44724898054_92d948655c.jpg) (https://flic.kr/p/2b9bMZY)

So I pulled out my trusty multimeter and started checking how the pins were wired on my EF to EF-M adapter but the meter was telling me that the wiring wasn't all that straight forward. Next, I took apart the adapter and found this:

(https://farm2.staticflickr.com/1955/31574553518_1f7d4d5bed.jpg) (https://flic.kr/p/Q78QHL)

Hum, maybe that explains how the 8 pins on the EF side are connected to the 9 pins on the EF-M side but it is hard to see the other end of that cable so I put it back together and continued checking the pins with the multimeter and came up with this:











EF
EF-M
Pin 1 (VBAT)Pin 1 and 2
Pin 2 (P-GND)Pin 3
Pin 3 (P-GND)Pin 4
Pin 4 (VDD)Pin 5
Pin 5 (DCL)Pin 8
Pin 6 (DLC)Pin 6
Pin 7 (CLK)Pin 9
Pin 8 (D-GND)Pin 7

Hope I didn't mix things up but as you can see it isn't quite that straight forward. It would be great if someone else with a EF to EF-M adapter and a multimeter or continuity checker can verify this.
Title: Re: EOSM Shutter-Bug reboot
Post by: Kathode-Ray on October 25, 2018, 10:40:01 PM
v.v. = vice versa  :)

The problem with the bug over here is, that sometimes it decides to stay away just like that. A simple power cycle doesn't help to get it back, and I have to pull the battery to get the bug back. It also helps to remove the 28mm macro and mount the 18-55m to get it back. I also tried if manipulating the SD card with my W10 laptop helped, it seemed at first the bug would return by simply copying and removing some files to the card but i'm having a hard time reproducing that. I'll try a more scientific approach.

The pinning for the EF/EF-M adapter checks out, but i'm measuring pin 1 & 2 of the EF-M are connected to the metal bajonets, so that must be GND and not VBAT. Also, i'm getting around 680 ohms between pins 7 and 9 (CLK) so it looks like Canon inserted a resistor somewhere.
Title: Re: EOSM Shutter-Bug reboot
Post by: a1ex on February 12, 2019, 09:47:54 PM
If there is a way to diagnose the shutter bug, I believe it's along these lines:

https://www.magiclantern.fm/forum/index.php?topic=22770.msg211964#msg211964
https://www.magiclantern.fm/forum/index.php?topic=7531.msg208794#msg208794

Reason: we need to find out what the MPU is thinking. It will output debug messages via some TX pin, which is... somewhere on the mainboard.




edit: Daniel already found the UART pins:

Quote from: a1ex on February 13, 2019, 12:09:50 AM
(https://a1ex.magiclantern.fm/bleeding-edge/EOS-M/canon-eos-m_serial-ports.jpeg)

He doesn't have any lens that would trigger this bug, so the diagnostic will have to be done by someone else. Step by step:
- get a 3.3V UART to USB dongle (Daniel recommends one with FTDI chip, such as this (https://www.sparkfun.com/products/12977))
- locate the TXDMPU and GROUND (GND) pins in the above picture
- connect GND between camera and dongle (do I really need to say this?)
- connect TXDMPU to the RX pin of your dongle (this pin uses 3.3V, so it should work out of the box, maybe even with 5V UART dongles)
- use a serial terminal monitor (https://learn.sparkfun.com/tutorials/terminal-basics/all) to log the messages coming from that pin
- capture a log with:
    (a) a lens + SD card that will show the bug
    (b) same lens + another SD card that does not show the bug (or the same SD card formatted with a very small partition)
- comparing the two logs should give additional insights
Title: Re: EOSM Shutter-Bug reboot
Post by: dfort on February 13, 2019, 08:12:19 PM
Quote from: a1ex on February 12, 2019, 09:47:54 PM
- get a 3.3V UART to USB dongle (Daniel recommends one with FTDI chip, such as this (https://www.sparkfun.com/products/12977))

Step one -- ordered the cable. Though wouldn't it be easier just to send that Daniel a lens that triggers the shutter-bug?
Title: Re: EOSM Shutter-Bug reboot
Post by: dfort on February 24, 2019, 06:44:46 AM
Cable arrived -- looks like the next step is to take apart the camera. Are there instructions for this? Hidden screw locations? Use a saw and crowbar?
Title: Re: EOSM Shutter-Bug reboot
Post by: DeafEyeJedi on February 24, 2019, 07:10:07 AM
You may use my 2nd hand body that I have.. it's the used one that I found online for cheap which ended up having this big ugly green pixelated (seems burnt through) that shows up on my files and could careless what happens to it if you'd prefer to dissemble this one?  :-X
Title: Re: EOSM Shutter-Bug reboot
Post by: a1ex on February 24, 2019, 07:21:14 AM
Quote from: dfort on February 13, 2019, 08:12:19 PM
Though wouldn't it be easier just to send that Daniel a lens that triggers the shutter-bug?

Worth mentioning we actually did this step a few days ago (together with danieel and dfort), but could not find any interesting messages.
Quote
18:22 <danieel> ok, not shooting now
18:24 <danieel> it does show only MON>>> at power on, and nothing otherwise
18:27 <danieel> so its not taking photos until i cycle the lens (take off, mount on)
18:27 <danieel> when taking off the lens, the thing is like:
18:27 <danieel> MON>>>[LEO]LIN E
18:28 <danieel> after installing the lens, it takes photos normally
...
19:31 <danieel> untwisting the lens gave another error:
19:31 <danieel> MON>>>[LEO]1_CS_ERR
19:31 <danieel> [LEO]LIN E

So, unless there is some easy way to increase the verbosity level on the MPU monitor, I'm afraid the path to solving this puzzle is a long one:
- identify the MPU processor architecture and microcontroller (DONE, danieel did this (https://www.magiclantern.fm/forum/index.php?topic=17596.msg212171#msg212171))
- dump MPU firmware (mpu_dump.c (https://bitbucket.org/hudson/magic-lantern/src/unified/modules/mpu_dump/mpu_dump.c), 1MB for the ROM, TBD - i.e. read the MCU manual - for other areas)
- attempt to run it in QEMU (might require implementing additional instructions, emulating peripherals etc)
- figure out lens communication (i.e. what happens with the "buggy" lenses) ? Might require running some logging code on the MPU; maybe even something like io_trace. Caveat: it's not ARM processor, so one has to learn the new architecture.
- run it alongside the main CPU in QEMU to attempt to reproduce the error

And that's just to figure out where exactly the issue is. Yes, it can be more difficult than porting ML to a Nikon camera. Don't be discouraged by that - it does not mean it's impossible. Far from it.

Now, the easy "shot in the dark" path:
- hacksaw (https://www.scotttorborg.com/canon-eos-m-teardown)
- RX of that cable soldered to TXDMPU
- TX of that cable soldered to RXDMPU, with a resistor in series to limit the current, as this pin will send stuff into the camera (unsure which pin; should be nearby; 1K resistor should be OK, I guess)
- type "?" on the terminal and hope to get more commands.

Do not try the above with RXDICU - that one runs at 1.8V. Use a voltage divider for that (2x470 ohm) if you really want to. All of the stuff from RXDICU is under our full control in software; there's no need to use this pin, unless the camera is bricked (i.e. when not even the bootloader is running any more). The command interpreter listening on RXDICU includes some commands to brick your camera (erasing flash, bootloader etc). Not joking.
Title: Re: EOSM Shutter-Bug reboot
Post by: kitor on February 24, 2019, 02:20:11 PM
Quote- hacksaw
By location of this connector on the board, I still believe there's a hole somewhere under battery/sd card slot cover. Unfortunately my keywords don't return any good photos to confirm that.
Title: Re: EOSM Shutter-Bug reboot
Post by: a1ex on March 08, 2019, 09:21:39 PM
Don't remember if it was tested or not - does the shutter bug persist after a software reboot triggered from ML?

With some lua_fix-based build, try this script:

camera.reboot()


It works for me on 5D3 for a different issue (https://www.magiclantern.fm/forum/index.php?topic=23622.msg213163#msg213163).

If the answer is negative (i.e. no shutter bug), I can prepare a workaround that would delay the startup process by 2-3 seconds.
Title: Re: EOSM Shutter-Bug reboot
Post by: dfort on March 08, 2019, 11:34:52 PM
Tried it and the it didn't work. If I follow your post -- the answer is positive, shutter-bug still present after a software reboot.

By the way, don't set Autorun on that script.  :P