EOS M Alpha shutter-bug discussion

Started by jerrykil, September 17, 2013, 07:14:00 AM

Malakai

Quote from: maxotics on October 09, 2013, 06:57:58 PM
Hi Malakai, since one can't shoot decent RAW with any of those cards, it's hard (for me at least) to get worried about a bug I can't seem to replicate. You need to get a recent UHS-1 card rated at 40 MB/s or greater (like SanDisk). Maybe borrow one? Maybe it's completely wrong, but the two people with outstanding shutter bug problems are not using what I would consider the right SD cards. Shouldn't we factor that out?

I tried it yesterday with a brand new SanDisk SD card, this one to be exact, and it has the same bug.

I think you might find that the type of SD card has no effect on RAW capture. The sensor data will be exactly the same whether you use a cheap nasty card or a high-priced top-of-the-range card. All it will affect is the time it takes to write the data to the card.

As for the bug: I use my 18-55mm a lot and it's a pain to use ML/TL with this bug. We shouldn't factor this out just because you don't have it and a few of us do. I can see it not being an issue for those who don't use this lens or the 11-22mm, but there are some who do. Bear in mind that ML/TL for the EOS M is still in alpha state. For it to ever progress out of alpha, these bugs need ironing out.
Hunting for that elusive EOS M shutterbug!!

maxotics

Hi Malakai, I bought a brand new EOS-M with the 18-55mm just to help debug this problem! Thanks for getting a newer card (though that is an early model of the fast ones). Interesting that in the other thread someone has the shutter bug with a manual lens. It seems we don't have enough users. As you know, we offered to send 1% a lens, but he wanted to wait. Probably a good thing, since someone is now having the problem with a manual lens.

I haven't had the problem, probably because I use these cameras to shoot video RAW. 

I believe I asked you for some steps to reproduce the problem, but they included things like taking the battery out while the camera is running, which I won't do because one shouldn't do that in any case.

We need steps to reproduce the bug, from installation to taking a photo, that DO NOT include any inappropriate actions.

Anyway, I want to fix this too and so does everyone else. I thought it was fixed. I'll take your word for it that it isn't.



1%

The manual lens thing is probably incorrect settings, as I've used extension tubes (just metal) and chipped manual lenses without issue.

Still, the bug remains and goes hand in hand with the funky USB behavior. I think we need another boot option, i.e. rscmgr boot. I don't know if g3gg0 updated the chart for the M when he was making it for 7D/60D. Fingers crossed it has free space.

a1ex

The bug shows up even with the minimal bin (also when loaded as FIR), which does not reserve any memory (it just jumps to Canon firmware at FF0C0000). So, you need to look elsewhere.

maxotics

I don't believe the bug is in the minimal bin. Who has that?

Malakai



1%

It never allocates the memory in the bad one. I can't find the point where they diverge tho.

a1ex


1%

I don't see any free rscmgr memory from just casually looking... there would have to be a space in between one of the sections.

EVVK

Malakai and a1ex, I've mentioned it a few times now in both threads, but I had to re-flash Canon's original firmware to get rid of the shutter bug. I also did a factory reset of the settings to get rid of some weird colors in LiveView during bursts.

My memory card was totally formatted and clean when I still had the bug. So don't just blindly look at what is currently on your card.

maxotics

Hi EVVK, can you give some specific step-by-step instructions, thanks!  (there are so many places one could misunderstand).

gary2013

The shutter bug is back on my camera. I even tried a new 32 GB card. I have no bug with just the Canon install and no ML. As soon as I install ML, no matter what version, the bug shows up. I am trying the Oct 11 version of ML I saw posted tonight. It seems that once the bug shows up, it just stays, and it now shows up on both cards.

Gary

EVVK

Quote from: maxotics on October 10, 2013, 11:14:59 PM
Hi EVVK, can you give some specific step-by-step instructions, thanks!  (there are so many places one could misunderstand).

Sure, I've mostly written this from memory, but it should be almost correct:

Uninstall/reverting back to a known working state:
1) Put the camera in M mode and flash the FIR file that was included with ML.
2) Disable the bootflag with the jog dial as instructed.
3) Do a low-level format in the camera.
4) Download the Windows version of the firmware from Canon's site:
http://www.canon.co.uk/Support/Consumer_Products/products/cameras/Digital_SLR/EOS_M.aspx
5) Put EOSM1202.FIR from the ZIP on the clean memory card and flash it the same way in M mode.
6) Go to the menu (still in M mode) and "Clear settings"; clearing all camera settings should be enough (C.Fn not needed).
7) Done.

Install ML and reproduce the shutter bug:
1) Use EOSCard or another method you prefer to set the bootflags on the card. (I use Ubuntu, so I can't really give any guidance here.)
2) Download the EOSM-labeled ML from http://ml.bot-fly.com/ and extract the files to the card.
3) Put the card back in the camera and flash the ML FIR file in M mode, like a firmware update.
4) Reboot the camera and (auto)load all the ML modules. No modules are needed with the Oct 10 build to reproduce the issue.
5) Shut down the camera and remove the battery.
6) Attach either the EF-M 11-22mm or the 18-55mm lens if you are not using one of those already, and start up the camera again.
7) Get focus and press the shutter button. Either it's working or it's not. Repeat a few times from step 5.
8) Have a coffee.
9) You could also try from step 5 with the EF-M 22mm, or the EF to EF-M adapter with other lenses, just to see that it's still working from a cold boot. Any lens will work after a hot-swap.

a1ex

Quote: the BUG is back on my camera.

Yes, this is what I'm trying to tell you. All this going back and forth between multiple versions is completely useless. You may get lucky or you may not; if you get lucky, the bug will return sooner or later.

Quote: I thought it was fixed.
Please stop speculating whether the bug is fixed or not. It's not.

To fix the bug, one has to understand what's going on and patch it somehow (emphasis on understand, not "it seems to work"). We need a clear way to reproduce it (a deterministic way, not a probabilistic one like yesterday it worked and today it doesn't). So, if you try different settings, you should do this in order to find a clear pattern for reproducing it, not to try to get rid of the bug.

I hunted a similar bug in 550D/600D for roughly 1 year. It turned out to be a buffer overflow in Canon code: they were writing a 32-bit 0 in the middle of ML code. There were no side effects without ML, because the 0 landed in the middle of an uninitialized malloc area (where ML was loading). With ML, the side effects obviously depended on where exactly that 0 was falling: from not noticeable (writing a 0 on something already 0 or on unused memory), to cropped strings (say it displayed Inte instead of Intervalometer) or weird behavior if the 0 was in the middle of some code (lockup in playback mode, garbled histogram and so on).

The deterministic pattern was: (1) the symptom was always the same with a given autoexec.bin and (2) any code added or removed before the affected location (even just a variable declaration) changed the symptom somehow (of course, the symptom had to be visible and known before inserting the test code), but any code added after that (in the source code) never changed the symptom.

This allowed me to narrow down the affected location to a small function. I dumped that part of RAM and compared "before" and "after", and then g3gg0 patched it within one day (remember: it took 1 year to find a pattern and diagnose it). See https://bitbucket.org/hudson/magic-lantern/commits/99e80332797c53d3b8f3b75b3ad7fe7f608da4c5
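
A toy illustration of that pattern, written in Python rather than as real Magic Lantern code. Everything here is an assumption made up for the sketch: the address 0x40, the region names and sizes, and the helper functions. The only idea taken from the post is that a 32-bit zero written to a fixed address lands inside the loaded binary, so code added before that address changes what gets clobbered, while code added after it does not.

```python
# Toy model of a stray fixed-address write landing inside a loaded binary.
# All names, sizes and addresses here are made up for illustration.

STRAY_ADDR = 0x40  # hypothetical address where the stray 32-bit zero is written


def load_image(regions):
    """Lay regions out back to back; return (image, {name: (start, end)})."""
    image = bytearray()
    layout = {}
    for name, data in regions:
        start = len(image)
        image += data
        layout[name] = (start, len(image))
    return image, layout


def boot(regions, addr=STRAY_ADDR):
    """Load the image, apply the stray write, return (before, after, layout)."""
    image, layout = load_image(regions)
    before = bytes(image)
    image[addr:addr + 4] = bytes(4)  # the 32-bit zero
    return before, bytes(image), layout


def clobbered_regions(regions):
    """Names of the regions whose bytes were changed by the stray write."""
    before, after, layout = boot(regions)
    return [name for name, (s, e) in layout.items() if before[s:e] != after[s:e]]


def displayed_string(regions, name):
    """Read a region as a C string (up to the first NUL) after the stray write."""
    _, after, layout = boot(regions)
    start, end = layout[name]
    return after[start:end].split(b"\x00", 1)[0].decode("ascii")


base = [
    ("code_a", b"\x11" * 0x30),
    ("menu_string", b"Intervalometer\x00\x00"),
    ("code_b", b"\x22" * 0x30),
]
# 12 bytes of new variables inserted BEFORE the affected address shift
# everything that follows them.
added_before = [base[0], ("new_vars", b"\x33" * 12)] + base[1:]
# New code appended AFTER the affected address moves nothing that matters.
added_after = base + [("new_code", b"\x44" * 8)]

for label, regions in [("base", base),
                       ("added before", added_before),
                       ("added after", added_after)]:
    print(label, clobbered_regions(regions),
          repr(displayed_string(regions, "menu_string")))
```

In this toy layout the base build and the "added after" build both corrupt `code_b`, while the "added before" build shifts the menu string under the stray write and crops it to `Inte`.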

So far, my observations are:

- The bug never happens with plain Canon firmware.
- The bug happens with any user code (even with simply jumping to Canon firmware, both autoexec and FIR), with some probability. The autocorrelation is very high (once the bug appears, it's very likely to stay).
- Removing the lens and putting it back will always remove the symptoms.
- Incomplete reboot (power down and power up quickly) will always remove the symptoms.
- When the bug is present, enabling the intervalometer will lock up the camera (battery pull needed).
- Do not confuse the bug with the camera simply failing to focus (that's not a bug).

The above items should be 100% true. If you can prove that any of them is wrong, please do.
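
For anyone who wants to check the "high autocorrelation" observation against their own notes, a minimal sketch: record each cold boot as 1 (shutter blocked) or 0 (shutter works) and compute the lag-1 autocorrelation. The two sequences below are invented examples, not real test data, and the function name is just a placeholder.

```python
def lag1_autocorrelation(outcomes):
    """Lag-1 autocorrelation of a 0/1 sequence; None if it is undefined."""
    n = len(outcomes)
    if n < 2:
        return None
    mean = sum(outcomes) / n
    var = sum((x - mean) ** 2 for x in outcomes)
    if var == 0:
        return None  # all boots had the same outcome
    cov = sum((outcomes[i] - mean) * (outcomes[i + 1] - mean)
              for i in range(n - 1))
    return cov / var


# Invented logs: 1 = shutter blocked on that boot, 0 = shutter worked.
sticky = [0, 0, 0, 0, 1, 1, 1, 1, 1, 1]       # bug appears and then stays
alternating = [0, 1, 0, 1, 0, 1, 0, 1, 0, 1]  # bug flips on every boot

print(lag1_autocorrelation(sticky))       # clearly positive
print(lag1_autocorrelation(alternating))  # clearly negative
```

A value near +1 means the outcome of one boot strongly predicts the next one, which is what "once the bug appears, it's very likely to stay" would look like in a log.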

So, here's my current hypothesis:

Jumping to Canon firmware from the bootloader may not be the best thing to do. One has to understand what exactly the bootloader is doing when it's jumping to Canon firmware on its own, without ML, and what is different when our code is jumping to Canon firmware.


Just a guess: when our code jumps to Canon firmware, lens initialization may be incomplete. Listening to the lens pins may be helpful (if there's a difference in the logs, it may confirm this). We may be able to force a lens re-initialization somehow (since the symptoms disappear when rotating the lens). Be careful though: this will only address the symptoms without understanding the root cause of the bug, so I do not recommend this way (or if you find a solution with this method, you should not stop researching).
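
If two such logs were captured (one from a plain Canon boot, one from a boot through our code), finding where they first differ could be sketched like this. The log entries below are invented placeholders, not real lens-pin traffic, and real captures would need timestamps stripped before comparing.

```python
from itertools import zip_longest


def first_divergence(log_a, log_b):
    """Return (index, entry_a, entry_b) at the first difference, else None."""
    for i, (a, b) in enumerate(zip_longest(log_a, log_b)):
        if a != b:
            return i, a, b
    return None


# Hypothetical event names, for illustration only.
canon_boot = ["power_on", "lens_detect", "lens_init_done", "liveview_start"]
user_code_boot = ["power_on", "lens_detect", "liveview_start"]

print(first_divergence(canon_boot, user_code_boot))
```

`zip_longest` pads the shorter log with `None`, so a log that simply stops early is also reported as a divergence.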

gary2013

It can be a few hours or a few days before the bug shows up on my camera. I couldn't say what I actually did to cause it to appear, since I do a lot of testing every day with this camera and I cannot recall every step.

Gary

jerrykil

a1ex,

First off, thanks for taking the time to explain the appropriate approach and giving us an example.

We've only had one user report that the minimal .bin still has the bug. I couldn't get a shutterbug with the minimal bin. Has this been confirmed? I get the feeling you were able to reproduce the bug with the minimal autoexec...

I also noticed this weird thing while digging through the menu:


Lastly, touch shutter also locks up the camera. EDIT: if the shutter won't fire because of the bug

Thanks again! very intelligent and informative

a1ex

For me, this one user showing the bug with the minimal bin is relevant. He can reproduce it every time and I've got the diagnostic logs from him. Of course, having more users that can reproduce it with the minimal bin is even better (but I have a feeling that not many people tried it).

The exact conditions that trigger the bug are not known, but this shows the issue is not in the user code, not in the memory allocation approach and not in which modules you load or how many. It's true that these things cause the bug to come and go, but I don't think they are the root cause.

PROP is a tool for reverse engineering and it's not meant to be touched by users. In my opinion, it should never be included in the public builds (but since EOS-M development only happens in Tragic Lantern, which diverged in significant ways from Magic Lantern, I can't do much about it).

jerrykil

my concern is that with all the twisting on and off of the lens, there may have been damage. you're probably right, anyway, but would it be a good idea to suggest that malakai turn on the "shoot without lens" in the custom functions menu while trying the minimal build?

also, i meant to show the weird "µ'" that shows up in my menu below the PROP function, is that normal? apologies that i was not clear

from my experience with the bug i agree that every build i've tried so far has it. it is not reliably reproduced.

malakai, if you haven't tried yet, could you set the shoot without lens custom function?

a1ex

I twist off lenses and remove battery all day long, no damage yet.

I've already explained what's up with the PROP function (remove it, it's FEATURE_PROP_DISPLAY in features.h).

We've tried with "shoot without lens", AF, MF... doesn't seem to change anything.

jerrykil

ah, gotcha, didn't realize the µ symbol was part of that

1%

Quote: also, i meant to show the weird "µ'" that shows up in my menu below the PROP function, is that normal? apologies that i was not clear

The "µ" is just the property value being converted to a string. The prop display is pretty harmless, but you can disable it and save maybe 1K of bin. It helps if, let's say, you want to find out what happens when you change touch shutter (let's assume it's a property). You navigate to the value and then turn touch shutter on, value is X; turn touch shutter off, value is Y. Now you know how to toggle it.
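
The same before/after comparison, sketched over two snapshots of property values. The property IDs and values below are invented for illustration and do not correspond to real Canon properties.

```python
def changed_properties(before, after):
    """Map each property whose value differs to its (before, after) pair."""
    keys = sorted(set(before) | set(after))
    return {k: (before.get(k), after.get(k))
            for k in keys if before.get(k) != after.get(k)}


# Hypothetical snapshots taken with a setting off, then on.
snapshot_off = {0x80000001: 0, 0x80000002: 5, 0x80000003: 100}
snapshot_on = {0x80000001: 1, 0x80000002: 5, 0x80000003: 100}

for prop, (old, new) in changed_properties(snapshot_off, snapshot_on).items():
    print(hex(prop), old, "->", new)
```

Only the entries that differ between the two snapshots are reported, which narrows down which value tracks the setting that was toggled.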

Quote: Turns out it was a buffer overflow in Canon code: they were writing a 32-bit 0 in the middle of ML code. There were no side effects without ML, because the 0 was in the middle of an uninitialized malloc area

Didn't g3gg0 make some kind of memory protection? This seems likely related, and related to the USB bug as well. It is affected by using or not using malloc.

a1ex

The overflow I'm talking about was caused by Canon's init_task (before most ML stuff has a chance to run).

mem_prot is good to catch some null pointer errors (writing below 0x1000 or something like this), but on 5D2 for example it just locks up. The stack overflow warnings are quite handy, but they can't catch every possible overflow, and they slow down raw_rec by 2 MB/s or so (enough for people to complain).

The new memory backend catches off-by-one (or off-by-a-little) errors, also double free, but not much else.
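
As a generic illustration of how guard words can catch small overruns and double frees (this is a toy Python model of the general technique, not the actual Magic Lantern memory backend; the class, the guard pattern and the return strings are all made up):

```python
GUARD = b"\xde\xad\xbe\xef"  # arbitrary guard pattern placed around each block


class GuardedHeap:
    """Toy allocator that surrounds each payload with guard words."""

    def __init__(self):
        self._blocks = {}   # handle -> bytearray (guard + payload + guard)
        self._sizes = {}    # handle -> payload size
        self._freed = set()
        self._next_handle = 1

    def alloc(self, size):
        handle = self._next_handle
        self._next_handle += 1
        self._blocks[handle] = bytearray(GUARD + bytes(size) + GUARD)
        self._sizes[handle] = size
        return handle

    def write(self, handle, offset, data):
        """Unchecked write relative to the payload start, like a raw store."""
        buf = self._blocks[handle]
        start = len(GUARD) + offset
        buf[start:start + len(data)] = data

    def free(self, handle):
        """Return 'ok', 'guard overwritten' or 'double free'."""
        if handle in self._freed:
            return "double free"
        buf = self._blocks.pop(handle)
        self._freed.add(handle)
        size = self._sizes[handle]
        head = bytes(buf[:len(GUARD)])
        tail = bytes(buf[len(GUARD) + size:len(GUARD) + size + len(GUARD)])
        if head != GUARD or tail != GUARD:
            return "guard overwritten"
        return "ok"


heap = GuardedHeap()
good = heap.alloc(8)
heap.write(good, 0, b"A" * 8)   # fits exactly
bad = heap.alloc(8)
heap.write(bad, 0, b"A" * 9)    # one byte too many
print(heap.free(good), "|", heap.free(bad), "|", heap.free(good))
```

A check like this only notices writes that touch the guard words next to a block, which matches the "not much else" caveat: a stray write far from any block goes undetected.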

1%

With the USB bug, malloc goes down by 1K after unplugging the cable, and the camera is unable to shut down.

The stack in that case also shows 0 sometimes.

a1ex

Can you catch some stack overflow messages? In what task do these happen?