EOSM Shutter-Bug reboot

Started by dfort, March 11, 2018, 05:04:43 AM

Previous topic - Next topic

0 Members and 1 Guest are viewing this topic.

dfort

Yes, the timing changes slightly and sometimes the order also changes. I thought I had something significant this morning when the log that exhibited the shutter-bug showed that the mpu_send event that sets what we found out was PROP_LV_PROHIBIT_OUTPUT was skipped. I counted lines and figured it was the 53rd mpu_send call so that's the one I delayed and it worked on both the SanDisk card hacked to show the shutter-bug as well as the Lexar card that shows the shutter-bug. However, it wasn't consistant over several camera restarts.

https://www.dropbox.com/sh/nskqhfj1am797et/AAA11okP24-gPpJwEHVGSVLAa?dl=0

dfort

Yet more testing. I wasn't getting anything worth posting with the SanDisk card hacked to show the shutter-bug so I grabbed the Lexar card that reliably shows the shutter-bug. This morning it did start up without the shutter-bug by delaying mpu_send call 53 by 280ms so how about trying calls 2-60 one at a time? In addition, 280ms is heavy handed so I went with the expected 70ms delay. Here's how it went:

call  3 delay 70 - worked but not repeatable
call 26 delay 70 - worked but not repeatable
call 28 delay 70 - worked log saved not repeatable
call 34 delay 70 - worked but not repeatable
call 42 delay 70 - worked but not repeatable
call 43 delay 70 - worked but not repeatable
call 50 delay 70 - worked but not repeatable
call 51 delay 70 - worked but not repeatable


Then I tried doing "else if" statements that covered all of the calls that worked but that didn't work at all.

Finally, I tried the trick of pressing the ON/OFF button quickly twice as soon as something comes on the LiveView screen and was able to get it to start up without the shutter-bug. Yeah, sounds weird but that's one of the workarounds that was reported on the Bitbucket issue page. Now that I think of it, that workaround is probably doing the delay only it is something that is repeatable--thought not very practical. Saved that log in the same Dropbox folder listed on my previous posts from today. Maybe there are some important clues in these logs?

Looks like this is the section that gets a big delay with the ON/OFF/ON workaround:

0.329.426  **INT-36h*:0001dedc:00:00: *** mpu_recv(06 05 01 8a 00 00)                               ; PROP_SW2_MOVIE_START
0.330.038     PropMgr:0000393c:00:00: *** mpu_send(0a 08 01 3b ff ff 00 00 00)                      ; PROP_USBDEVICE_CONNECT
0.331.657     PropMgr:0000393c:00:00: *** mpu_send(0a 08 01 3b ff 00 00 00 00)                      ; PROP_USBDEVICE_CONNECT
0.707.970  **INT-36h*:0001dedc:00:00: *** mpu_recv(06 05 03 5d 94 00)                               ; PROP 80030062
0.708.158  **INT-36h*:0001dedc:00:00: *** mpu_recv(06 05 03 38 96 00)                               ; PROP 80030035
0.788.374     PropMgr:0000393c:00:00: *** mpu_send(08 07 01 2a 1c e4 00 00)                         ; PROP_CARD2_FILE_NUMBER
0.789.574     PropMgr:0000393c:00:00: *** mpu_send(08 07 01 2a 1c e4 00 00)                         ; PROP_CARD2_FILE_NUMBER


[EDIT] Another false assumption. Turns out that there is always a delay of about 300-400ms at mpu_send event 37 whether or not the ON/OFF/ON workaround is used.

dfort

Whoopi -- more logs:

https://www.dropbox.com/sh/91yck4i9cn1fbi9/AACUnEJ-X_ERgXJCkLO7AOi0a?dl=0

Since the ON/OFF/ON workaround is something that consistently kills the shutter-bug, I thought it might be interesting to see what makes it tick. Maybe figure out where to put a delay?

Process - make 5 runs each of the ON/OFF/ON workaround with the Lexar (shutter-bug) card and the SanDisk (no shutter-bug) card. The SanDisk card has the delay on the first mpu_send event to induce the shutter-bug while the Lexar card does not (not needed). Also saved are reference logs showing what happens without the ON/OFF/ON workaround. One last log is the SanDisk card started up normally without the shutter-bug hack as a "gold standard" reference.

One of the things I noticed after annotating the logs is that all of the ON/OFF/ON workaround logs show PROP_ACTIVE_SWEEP_STATUS. This is the sensor cleaning on shutdown event. This makes sense because the camera is shutdown and quickly restarted. I also tried disabling sensor cleaning and the PROP_ACTIVE_SWEEP_STATUS doesn't show up on the log but it has no effect on the shutter-bug or the effectiveness of the ON/OFF/ON workaround so I won't go any further down that road.

Interesting is that even though the power is shutdown then quickly turned back on it never goes back to the "Init group" and doesn't repeat most of the early mpu_send calls.

These do not repeat after ON/OFF/ON is triggered:
   ; Init group
   ; Complete WaitID = Mode group
   ; Complete WaitID = PROP 8003006F
   ; PROP 8003005E
   ; PROP 80030008
   ; PROP_SAVE_MODE


But wait--there's more. As long as I was logging workarounds I made one last annotated log of the lens twist workaround only this time with the trimmed down io_trace_full build on the Lexar card. I keep thinking that one way to kill the shutter-bug is to re-initialize the lens properties at some point in the startup process and maybe this log will show how that can be done?

dfort

Quote from: a1ex on April 01, 2018, 10:23:19 PM
Let's try some overclocking: sd_uhs.mo (to be loaded on top of crop_rec_4k branch). Got the idea after looking into the EOS M shutter bug and understanding how clocks work - to some extent.

Cool - good to know something good has emerged from this.

First thing I tried was to see if overclocking cures the shutter-bug. Nope.

Found something interesting when untwisting the lens--which always clears up the shutter-bug. PROP_LENS (0x80030011) temporarily changes value to zero.

8CBEF>     PtpDps:ff292d3c:33:01: ptpPropertyChangeEvent[80030011][1f][0][0]
8CC2E>     PtpDps:ff2838c8:32:01: GetConnectSessionFirst end[0][3]
8CC5F>     PtpDps:ff29e648:33:01: PROP:0x80030011,0


Note that this is not the same as starting the camera without a lens attached:

9CC43>     PtpDps:ff292d3c:33:01: ptpPropertyChangeEvent[80030011][1f][10542c01][0]
9CC7E>     PtpDps:ff2838c8:32:01: GetConnectSessionFirst end[0][3]
9CCA2>     PtpDps:ff29e648:33:01: PROP:0x80030011,10542c01


Maybe there's something here? Do a prop_request_change on PROP_LENS to "0" hoping it doesn't brick the camera so next time the lens is polled it will reload the properties and kill the shutter-bug? Maybe this is a read only property and prop_request_change won't work? Never tried it--sounds sort of scary.

[EDIT] Never mind. Looks like that property is read only.

dfort

Picked up a couple of super cheap sd cards at the local electronics swap meet and turns out that they aren't affected by the shutter-bug. These cards still had the GPS maps on them but after formatting and installing ML on them, they worked fine. I didn't run the benchmark speed test on them but they are very slow cards.



More information about card sizes and formatting:

https://www.sdcard.org/developers/overview/capacity/index.html

So far all SD standard cards I've tested don't show the shutter-bug, the SanDisk Extreme 32GB which is a SDHC standard card is fine most of the time and no SDXC standard card I've tested is immune from the bug.

Yeah, I'd like to resolve the shutter-bug once and for all but sometimes you just need to let something rest for a while.

Danne


a1ex

The file I/O routines have some trouble with large filesystems - for example, the portable ROM dumper works best on a 256MB filesystem, but a 16GB card is already too large for it. The issue can be reproduced in QEMU, without loading any code from the card first: from the FROMUTILITY menu, jump to "Init SD" (address printed by portable ROM dumper), then to the "Open file for write" function (address printed there as well) and use a GDB breakpoint to supply some parameters (drive, file name, address, size). Maybe these bootloader I/O functions are just very slow with large cards?

During normal execution (with boot flag disabled), there's no SD card access made by the bootloader. With boot flag enabled, the bootloader has to mount the filesystem in order to load autoexec.bin from it.

If you copy the 256MB QEMU image with dd to the large card, and run ML from it, does it still show the shutter bug?

(I've suggested a similar test earlier, but the raw card speed can also influence the result; this should no longer be the case with a 256MB filesystem)

@Danne: we are talking about bootloader, before ML code is loaded and has a chance to run.

Danne

Yes, following this thread always, what I can understand. Nasty bug...

dfort

Quote from: a1ex on May 01, 2018, 10:10:54 PM
If you copy the 256MB QEMU image with dd to the large card, and run ML from it, does it still show the shutter bug?

No it doesn't. I think we might have played around with this before but it is worth revisiting. I took a 64GB card that quite reliably shows the shutter bug and copied the QEMU sd.img to it this way:

sudo dd if=~/qemu-eos/sd.img of=/dev/disk3
Password:
506880+0 records in
506880+0 records out
259522560 bytes transferred in 245.820091 secs (1055742 bytes/sec)


Viola! No shutter-bug. Of course now that 64GB card is only 256MB. Formatted card in camera and it is back to 64GB and the shutter-bug is back too.

Licaon_Kter


ArcziPL

Quote from: dfort on May 02, 2018, 03:30:51 AMI took a 64GB card that quite reliably shows the shutter bug and copied the QEMU sd.img to it
Does the camera format the card using another filesystem than the QEMU image uses?
M50.110 [main cam] | G7X III [pocket cam] | 70D.112 [gathers dust] | M.202 [gathers dust] | waiting for M5II

dfort

The camera formats the card differently depending on what card you're using, SD/SDHC/SDXC. According to the Mac OS disk utility the QEMU sd.img file is formatted as MS-DOS (FAT16) which is used for SD standard cards. Note that SD standard (2GB and less) do not show the shuter-bug. More information here:

https://www.sdcard.org/developers/overview/capacity/index.html

a1ex

Current hypothesis: what makes a difference is the time needed to index the contents of that filesystem (or whatever Canon bootloader does in order to load autoexec.bin). That time might be smaller with a FAT32 filesystem, as it's less complex. That time may also depend on whether the card contains photos or not, or maybe just other files in the root directory.

The 256MB image used with QEMU makes a difference because the bootloader no longer looks at all the unused space on the card (a process that takes a while on larger cards). If you are wondering where that magic size comes from -- one of my cards has that size, so the initial QEMU image was a dd image of that card. However, now that we can format the virtual card directly from the emulated camera, you can use any other size (dd if=/dev/zero of=sd.img count=whatever). The largest I've tested in the emulator is a 64GB image copied from a real card, saved as qcow2.

You can format the 64GB card as FAT32 in the camera with the experimental card_fmt module. Maybe it helps with some cards (though I remember copying the FAT32 filesystem from a 32GB card didn't help...)

dfort

Wondering if anybody experienced the shutter-bug with the EF-M 28mm IS Macro lens?



Just curious if what is causing the bug is related to the image stabilization or if it only happens on zoom lenses. All EF-M IS lenses up until now were zoom lenses--in fact all EF-M zoom lenses have IS. This macro lens is the first prime lens that I'm aware of that has image stabilization.

Kathode-Ray

I can confirm that with the 28mm Macro lens the shutter-bug is also present.

I used the crop_rec_4k_mlv_lite_snd.2018Apr03.EOSM202 build. As soon as I unlock the lens and turn it to the other shooting mode (normal to super macro or v.v.), the bug is gone.

Switching off the camera for few minutes usually brings it back, but sometimes it also decides to stay away, and then I have to pull the battery to get the bug back. I tried to reproduce it reliably for two hours but no luck...
6DmkII | M50 | M || AT-X 16-28 | 24-70 2.8 VC | 24-105 4.0L | 70-200 4.0L IS || 35 2.0 IS | 45 1.8 VC | 100 2.0 | 135 2.0 || 11-22 | 18-55 | 55-200 | 22 | 28 Macro || Metz 58 AF-2 | Røde Videomic Pro

dfort

v.v.?

Ok--I was betting that the shutter-bug would show up with this lens.

Looking at all of the available EF-M lenses from Canon, only the 22mm prime lens doesn't show the shutter-bug. It is also the only non Image Stabilization (IS) lens from Canon. Other manufacturers like Tamron also make EF-M lenses, also with their form of image stabilization and also show the shutter-bug.

@Kathode-Ray - you should be able to reproduce the bug with any SD card 64GB or larger, any ML build and starting the camera with the lens in the shooting position. It only affects still images so put the camera in photo mode and try shooting a still. Put the lens in the collapsed position then back to the shooting position and the shutter-bug should disappear. There's no need to pull the battery, simply restarting the camera with the lens in the shooting position will bring it back.

dfort

It has been a while since we made any progress on this so I ran a rather unscientific test and found something that may or may not be significant.

I was looking at some posts in the Canon EF lens mount - Communication protocol topic and one of the links pointed to an interesting experiment to control the lens using an Arduino board. This particular suggestion to throw out cable that connects to the image stabilization system was intriguing:



Not wanting to take apart my lenses and start cutting cables I thought an interesting experiment would be to block each pin on the lens to see if perhaps one will have an affect on the shutter-bug. Yeah, I know the lens/body communication is more complicated than that but as a kid I used to stick various objects in to electrical outlets to see what happens so I thought, why not?

First problem was that I couldn't find a pin out chart for EF-M lenses. The EF to EF-M adapter is wired one-to-one but there are 8 pins on the EF, EF-S side and 9 pins on the EF-M side. Note that two pins are used for P-Gnd on EF lenses but apparently this is different on the EF-M so there might be two pins that are used exclusively in the EF-M lens protocol.



What I did was to use a small piece of tape to block one pin at a time and eventually I found something that eliminated the shutter-bug. Well, sort of. Blocking the pin that is probably the +6 volts to power internal lens focus motors with the lens in manual still photo mode, the shutter-bug disappeared. Of course this isn't a practical solution, the lens isn't getting power so the aperture is stopped down all the way. However, I was able to shoot stills with no sign of the shutter-bug. Now for the weird part, with the camera on full auto mode, the shutter-bug is present. Not only is it present, but the workarounds we use to kill the bug don't work.

If we call that "VBat" pin #1 (if indeed that's what it is on the EOSM) then going through the other pins and blocking them created some interesting results. First of all, the next two pins should be power ground and blocking one should do nothing but on the EOSM I got different results blocking each pin.

Blocking pin #2 caused this error:



and saved this error log using the lua_fix branch.

However, blocking pin #3 caused this error:



and this error log. So obviously these aren't the same as the EF/EF-S pins.

Blocking pin #4 the camera wouldn't even start.

Blocking pin #5 gave Err 01 and this error log.

Blocking pin #6 gave Err 01 and this error log.

Blocking pin #7 gave Err 01 and this error log.

Blocking pin #8 did nothing as far as I could see, no error message or log and the shutter-bug showed up as expected. Maybe it isn't used on the lens I used for this test?

Blocking pin #9 gave Err 01 and this error log.

This test was done with an 11-22mm EF-M lens and a 64GB Komputerbay card that reliably shows the shutter-bug.

a1ex

Interesting experiments.

Quote from: dfort on October 20, 2018, 03:32:40 AM
First problem was that I couldn't find a pin out chart for EF-M lenses.

This came up as first result in a search engine (query string: "ef-m pinout"):
https://www.youtube.com/watch?v=iCVcoF_YOY0




One hard way to figure this out would be via emulation, but one has to cover 3 processors:
- the MPU (Renesas SHA2-FPU); this talks directly to the lens
- the lens MCU (unknown, we tried to look into the lens firmware update, and IIRC some lens MCUs were identified by @leegong as  Renesas)
- the main CPU (already emulated reasonably well).

Emulating the firmware running on these secondary CPUs is probably the easier part; figuring out their hardware connections is going to be difficult.

If I'll ever look into this, I'll start by emulating the DIGIC 4 MPU (TX19A), because:
- the architecture is well documented (MIPS + some Toshiba extensions)
- the exact MCU used in these cameras is documented as well
- I've actually emulated a few instructions from the TX19A firmware with NikonHacker's emulator some time ago

Functionally, both MPUs (TX19A and SHA2-FPU) do pretty much the same thing; the messages exchanged with the main CPU are pretty much the same.

dfort

Quote from: a1ex on October 20, 2018, 08:48:23 AM
This came up as first result in a search engine (query string: "ef-m pinout"):
https://www.youtube.com/watch?v=iCVcoF_YOY0

I did that exact same search and looked at the video but the only pinout information was this:



That didn't help explain how all of the 8 pins on the EF/EF-S side were wired to 9 pins on the EF-M side. This is the best illustration I could find about the EF pinouts, taken from this document on How to Move Canon EF Lenses.



So I pulled out my trusty multimeter and started checking how the pins were wired on my EF to EF-M adapter but the meter was telling me that the wiring wasn't all that straight forward. Next, I took apart the adapter and found this:



Hum, maybe that explains how the 8 pins on the EF side are connected to the 9 pins on the EF-M side but it is hard to see the other end of that cable so I put it back together and continued checking the pins with the multimeter and came up with this:











EF
EF-M
Pin 1 (VBAT)Pin 1 and 2
Pin 2 (P-GND)Pin 3
Pin 3 (P-GND)Pin 4
Pin 4 (VDD)Pin 5
Pin 5 (DCL)Pin 8
Pin 6 (DLC)Pin 6
Pin 7 (CLK)Pin 9
Pin 8 (D-GND)Pin 7

Hope I didn't mix things up but as you can see it isn't quite that straight forward. It would be great if someone else with a EF to EF-M adapter and a multimeter or continuity checker can verify this.

Kathode-Ray

v.v. = vice versa  :)

The problem with the bug over here is, that sometimes it decides to stay away just like that. A simple power cycle doesn't help to get it back, and I have to pull the battery to get the bug back. It also helps to remove the 28mm macro and mount the 18-55m to get it back. I also tried if manipulating the SD card with my W10 laptop helped, it seemed at first the bug would return by simply copying and removing some files to the card but i'm having a hard time reproducing that. I'll try a more scientific approach.

The pinning for the EF/EF-M adapter checks out, but i'm measuring pin 1 & 2 of the EF-M are connected to the metal bajonets, so that must be GND and not VBAT. Also, i'm getting around 680 ohms between pins 7 and 9 (CLK) so it looks like Canon inserted a resistor somewhere.
6DmkII | M50 | M || AT-X 16-28 | 24-70 2.8 VC | 24-105 4.0L | 70-200 4.0L IS || 35 2.0 IS | 45 1.8 VC | 100 2.0 | 135 2.0 || 11-22 | 18-55 | 55-200 | 22 | 28 Macro || Metz 58 AF-2 | Røde Videomic Pro

a1ex

If there is a way to diagnose the shutter bug, I believe it's along these lines:

https://www.magiclantern.fm/forum/index.php?topic=22770.msg211964#msg211964
https://www.magiclantern.fm/forum/index.php?topic=7531.msg208794#msg208794

Reason: we need to find out what the MPU is thinking. It will output debug messages via some TX pin, which is... somewhere on the mainboard.




edit: Daniel already found the UART pins:

Quote from: a1ex on February 13, 2019, 12:09:50 AM


He doesn't have any lens that would trigger this bug, so the diagnostic will have to be done by someone else. Step by step:
- get a 3.3V UART to USB dongle (Daniel recommends one with FTDI chip, such as this)
- locate the TXDMPU and GROUND (GND) pins in the above picture
- connect GND between camera and dongle (do I really need to say this?)
- connect TXDMPU to the RX pin of your dongle (this pin uses 3.3V, so it should work out of the box, maybe even with 5V UART dongles)
- use a serial terminal monitor to log the messages coming from that pin
- capture a log with:
    (a) a lens + SD card that will show the bug
    (b) same lens + another SD card that does not show the bug (or the same SD card formatted with a very small partition)
- comparing the two logs should give additional insights

dfort

Quote from: a1ex on February 12, 2019, 09:47:54 PM
- get a 3.3V UART to USB dongle (Daniel recommends one with FTDI chip, such as this)

Step one -- ordered the cable. Though wouldn't it be easier just to send that Daniel a lens that triggers the shutter-bug?

dfort

Cable arrived -- looks like the next step is to take apart the camera. Are there instructions for this? Hidden screw locations? Use a saw and crowbar?

DeafEyeJedi

You may use my 2nd hand body that I have.. it's the used one that I found online for cheap which ended up having this big ugly green pixelated (seems burnt through) that shows up on my files and could careless what happens to it if you'd prefer to dissemble this one?  :-X
5D3.113 | 5D3.123 | EOSM.203 | 7D.203 | 70D.112 | 100D.101 | EOSM2.* | 50D.109

a1ex

Quote from: dfort on February 13, 2019, 08:12:19 PM
Though wouldn't it be easier just to send that Daniel a lens that triggers the shutter-bug?

Worth mentioning we actually did this step a few days ago (together with danieel and dfort), but could not find any interesting messages.
Quote
18:22 <danieel> ok, not shooting now
18:24 <danieel> it does show only MON>>> at power on, and nothing otherwise
18:27 <danieel> so its not taking photos until i cycle the lens (take off, mount on)
18:27 <danieel> when taking off the lens, the thing is like:
18:27 <danieel> MON>>>[LEO]LIN E
18:28 <danieel> after installing the lens, it takes photos normally
...
19:31 <danieel> untwisting the lens gave another error:
19:31 <danieel> MON>>>[LEO]1_CS_ERR
19:31 <danieel> [LEO]LIN E

So, unless there is some easy way to increase the verbosity level on the MPU monitor, I'm afraid the path to solving this puzzle is a long one:
- identify the MPU processor architecture and microcontroller (DONE, danieel did this)
- dump MPU firmware (mpu_dump.c, 1MB for the ROM, TBD - i.e. read the MCU manual - for other areas)
- attempt to run it in QEMU (might require implementing additional instructions, emulating peripherals etc)
- figure out lens communication (i.e. what happens with the "buggy" lenses) ? Might require running some logging code on the MPU; maybe even something like io_trace. Caveat: it's not ARM processor, so one has to learn the new architecture.
- run it alongside the main CPU in QEMU to attempt to reproduce the error

And that's just to figure out where exactly the issue is. Yes, it can be more difficult than porting ML to a Nikon camera. Don't be discouraged by that - it does not mean it's impossible. Far from it.

Now, the easy "shot in the dark" path:
- hacksaw
- RX of that cable soldered to TXDMPU
- TX of that cable soldered to RXDMPU, with a resistor in series to limit the current, as this pin will send stuff into the camera (unsure which pin; should be nearby; 1K resistor should be OK, I guess)
- type "?" on the terminal and hope to get more commands.

Do not try the above with RXDICU - that one runs at 1.8V. Use a voltage divider for that (2x470 ohm) if you really want to. All of the stuff from RXDICU is under our full control in software; there's no need to use this pin, unless the camera is bricked (i.e. when not even the bootloader is running any more). The command interpreter listening on RXDICU includes some commands to brick your camera (erasing flash, bootloader etc). Not joking.