Just finished something that I had in mind for a long time: a new unified memory backend.
https://bitbucket.org/hudson/magic-lantern/src/tip/src/mem.h
https://bitbucket.org/hudson/magic-lantern/src/tip/src/mem.c
Background: each camera has a number of memory pools with some unused space (Canon code allocates memory from them, but never fills them completely - there's a bit of free space left, and ML uses it).
Of course, Canon code doesn't know about ML (so it expects there's enough RAM for whatever it wants to do), and ML should be careful not to take all of that RAM away (otherwise Canon code will fail, usually with ERR70).
Until now, we have used malloc and AllocateMemory for general-purpose stuff, and shoot_malloc for very large stuff (e.g. raw video frames, or storing ML files for restoring them after format). The decision whether to use malloc or AllocateMemory was hardcoded (most cameras have more free RAM in AllocateMemory, but some of them have more in malloc). A table with free RAM for each memory pool is here: https://docs.google.com/spreadsheet/ccc?key=0AgQ2MOkAZTFHdFFIcFp1d0R5TzVPTVJXOEVyUndteGc#gid=2
Now, we can just call "malloc" and the backend will decide what buffer to use (based on free space in each of them).
Advantages:
- you just call malloc without thinking too much about it (the backend decides what pool to use; you will only see a large heap of RAM that just works)
- traceable: you can see in menu how much RAM was allocated, current/peak usage, a list of blocks that also shows source file, line, task, also a history of RAM usage...
- extensible: if you find some more methods for allocating RAM, simply create a malloc-like interface and add it to the list of allocators
- you can run huge modules if you really want that (didn't try, but should work)
More RAM sources that were not used: task stack space (most cameras have 500K free, some have 1MB) and unused RscMgr memory (e.g. on 60D). For these, you can take PicoC's heap routines (that behave like malloc), add them to the allocator list in mem.c, and the existing code will start using them right away.
Notes:
- shoot_malloc is a bit special: Canon code expects it to be free when you change certain settings (e.g. noise reduction on 60D). Therefore, if we use it, we must free it quickly, before the user goes to Canon menu.
- error handling: on failure, malloc always returns 0 (so check the return value)
- for new code and refactoring existing code, I suggest using:
- plain malloc/free for most cases
- fio_malloc/fio_free if the memory is used for reading files (this goes to alloc_dma_memory or shoot_malloc)
- tmp_malloc/tmp_free if the memory will be freed quickly (so the backend will prefer shoot_malloc)
A bit related: after loading modules, TCC will be unloaded. This will free another big chunk of RAM.
Screenshots:
5D3, after loading a bunch of modules:
(http://a1ex.magiclantern.fm/bleeding-edge/newmem-5d3.png) (http://a1ex.magiclantern.fm/bleeding-edge/newmem-5d3-graph.png)
60D, no modules loaded:
(http://a1ex.magiclantern.fm/bleeding-edge/newmem-60d.png) (http://a1ex.magiclantern.fm/bleeding-edge/newmem-60d-graph.png)
(the peak is still from the module backend, btw - turn it off in the debug menu and it will disappear)
Caveat: all ports will require renaming the malloc stubs (add an underscore); otherwise, compiling will fail (so nightly builds will be broken until somebody with code skills takes a look at each port, updates the stub names and checks that it works). I know it's not very nice, but I prefer a completely broken build (users can just download an older build) to something that compiles and has a big chance of failing.
So, all you have to do is to:
- rename malloc-related stubs in stubs.h
- get the two screenshots and upload them here
- create a pull request (so I don't have to edit each stubs file)
Now you have enough memory to run OpenCV on the camera ;)
Credits go to g3gg0 for memcheck (this code is based on it).
Damn A1ex, great news. This is when I wish I had coding skills to lend a hand.
I wouldn't worry about broken nightly builds, like you say, there are plenty of other builds to use.
Good job dude ....
500D, no modules loaded:
(http://imageshack.us/a/img856/8026/t17d.png) (http://imageshack.us/a/img819/9997/cw74.png)
So far no memory related crashes + lots of ram free.
Defish never gives up what it allocates, but now it isn't a problem. Will have to check the 600D soon; it and the 6D were the main crash-on-start offenders.
good job, thanks!
should we push the module major number to make sure the users will get an error message about outdated modules,
or should we add stubs for the old calls, which don't have the extra preprocessor-based information (__LINE__, etc.)?
first option: clean cut; latter: legacy mode
I'd push the major number only when there are data structure changes not caught by the linker (e.g. if adding some field to menu structure => size changes, modules no longer binary compatible).
Right now the linker will just complain about missing symbols. If a module does not allocate anything dynamically, I think it will still work.
Warning: OldAPI modules are still loaded and code executed from them. Need to fix that.
550D Ettr, Raw_Rec, and Dual_ISO
(http://s15.postimg.org/h552frbiz/VRAM0_Modules.png) (http://s15.postimg.org/k0i5mmfiz/VRAM1_Modules.png)
550D No Modules
(http://s23.postimg.org/3pa6r8otn/VRAM1_No_Modules.png) (http://s23.postimg.org/qf9bk8817/VRAM0_No_Modules.png)
I noticed that on my 550D and 600D, after using the cameras in LiveView and shooting Dual ISO, LiveView freezes on screen and I have to take out the battery to reset. I need to do more tests, but so far this is happening on both cameras with the latest update containing the memory code.
The 600D freezes when trying to get details of the memory: when the display finishes its refresh, it freezes LiveView?
The 550D mostly freezes when using the modules, but I have not used the camera long enough without modules loaded to see if it freezes without them too.
Just wanted to post a result update.
Here is the 600D memory image you asked for A1ex.
(http://i44.tinypic.com/2ed6ucx.jpg)
That's an ancient one...
Yeah... that was the 17th build, as all other builds said "failed" at the time. I'm now on the latest, the 20th build, and here are the results - the screenshot no longer crashes the 600D:
(http://i40.tinypic.com/2howmxd.jpg)
Nice work on the new memory stuff btw, ML seems more responsive than ever.
could this allow Small Hack to work better or is that something completely different?
Placebo? The new fonts are twice as slow (try running the menu benchmark).
This update is mostly about not failing with err70 when you load a lot of things.
Think it's because I've jumped from TragicLantern to this... it just seems smoother. I do prefer the old font though (from the 15th Sept build) - it was much clearer and bolder to read.
Will run a benchmark now.
Latest nightly build, 20th Sept: elapsed time 16300 ms
TragicLantern, Aug 13th: 11179 ms
After doing the menu benchmark I now have this; rebooted and it's still here.
(http://i43.tinypic.com/zslgzo.jpg)
Update: Formatting card from camera has cleared it.
This means the memory fragmentation is bad, so the backend needs to know the size of the largest block (not just the free space).
Or a fallback mechanism (if one allocator fails, try another one).
Or both.
Other than that all seems OK, good work; I'll keep trying the nightlies with my 600D.
Alex, did this memory fix affect raw recording at all? I was pleasantly surprised that I'm no longer getting pink frames with the latest nightly.
Or is this the placebo effect again ... lol
I did not even try it...
What I did at some point was to use the fast zebras by default in LiveView instead of slow raw ones. This could have had an influence over pink frames.
Quote from: a1ex on September 20, 2013, 07:29:16 PM
I did not even try it...
What I did at some point was to use the fast zebras by default in LiveView instead of slow raw ones. This could have had an influence over pink frames.
Once the build bot compiles tonight's build I will test raw further, I see you fixed the arrow shortcuts already. Thanks!
What's the status on this? Could the potentially released RAM be used to shoot longer RAW recordings?
Maybe if you solder some memory chips.
Quote from: tin2tin on October 05, 2013, 11:25:48 AM
What's the status on this? Could the potentially released RAM be used to shoot longer RAW recordings?
Before the memory backend, I managed to record 1600 frames of raw video at 1920x1080, 23.9 fps on the 50D; now I manage to record 2500 frames at the same settings! 64 GB KomputerBay card, 81-82 MB/s boundary write speeds, 81.5 MB/s mean value. But I don't want this to be a placebo effect... ;-)
I bet it's unrelated.
I have (some cams had) issues with it not freeing unless using tmp_malloc/fio_malloc... it takes memory but it won't give it back until memory dwindles to nothing.
Do you have some screenshots that show the source of the error?
I would have to make you a gif/video of it eating the memory...
I had 2 problems:
1. 7D using alloc_dma_memory for FPS from slave (right inside eng_drv_out_lv)
Double free errors, underflow
*Changed this to allocate only once at init; it's just sizeof(uint32_t)
2. 6D murdered recording wav + H264
http://www.magiclantern.fm/forum/index.php?topic=8657.msg80747#msg80747
I changed *most* calls to use fio_malloc and small allocations to tmp_malloc; the 6D records for 27+ min, the 7D does 3 minutes and freezes
This morning I worked on the 7D some more; boggled by the hard lock... went to bypass the backend completely, and found I'd missed the alloc_dma_memory calls in wav init
change to:
wav_buf[0] = (int *) _alloc_dma_memory(WAV_BUF_SIZE);
wav_buf[1] = (int *) _alloc_dma_memory(WAV_BUF_SIZE);
so far so good, knock on wood - recorded over 5 mins, no lock...
step 3 is going back to the fio/tmp malloc calls, fixing the above, then making sure the 7D is still lockup-free.
what else could this be affecting?
*had another thought... could those buffers actually be getting freed while recording, after ~3 mins?
The backend is most likely catching a nasty bug, so don't bypass the backend just because it appears to work. This is not an error that can be safely ignored. The side effects for this kind of error vary from not noticeable in practice, to the camera acting drunk with no logical explanation to be found.
I've recorded some short WAV clips on 5D2, all seems OK.
I'll try to check the 7D next week.
I know... there is a memory leak... either I forgot a free, or bypassing the backend shows that beep.c has the leak in general - record over 2 mins and look at memory in the debug menu. I think part of the problem is SmallAlloc; I don't think it's possible to replace it with tmp_malloc.
I did this:
wav_buf[0] = (int16_t*) _alloc_dma_memory(WAV_BUF_SIZE);
wav_buf[1] = (int16_t*) _alloc_dma_memory(WAV_BUF_SIZE);
rootq = SmallAlloc(sizeof(WRITE_Q));
Used fio_malloc/SmallAlloc for the rest and it's still freezing; I'm going to try more combos, like all fio_malloc, etc.
The alloc type is only a hint for choosing the allocator (whether it will prefer malloc/AllocateMemory/shoot_malloc/whatever).
It seems to be choosing malloc, though...
- No memory wrapper: malloc doesn't leak, AllocateMemory leaks very slowly, no freezing
- fio_malloc/_malloc + _alloc_dma (for the initializer): malloc leaks ~1 KB/min, AllocateMemory doesn't leak (I think), no freezing for ~5 mins
- fio_malloc/_malloc: malloc leaks ~1 KB/min, AllocateMemory doesn't leak (I think), no freezing for 6 mins
- fio_malloc/tmp_malloc: freeze after ~2 min
On the 6D, just changing dma_memory to fio_malloc appears to fix everything.
I think the issue *might* be that it allocates dynamically based on the data it's holding; if the data is too big, it allocates a larger block, and then when it goes to free it, it may be freeing the wrong thing.
Understood. I think we need to rewrite it from scratch, with the method used for burst silent pics (zero CPU overhead) and maybe a buffering algorithm similar to raw_rec (or maybe just double-buffering with a larger buffer should do the trick). This will also remove the need for dma_memcpy.
Also I'd like an option to pass the buffers filled with audio data via a CBR (instead of saving to a file), which should make it easy to integrate audio in MLV.
Would be nice to have a rewrite... I tried edmac memcpy there and it would kill the screen/camera as soon as you left the wav menu...
Also, if you rewrite it: everything seems to work with stereo int16 sampled as 2 uint8; both channels are recorded.
16->16 had hiccups because of size, but may not when using the silent pic method/edmac.
The mono 8-bit was just :( in terms of quality, and unable to do internal L, external R, etc.
Cool. It may be a good idea to write down what's needed for an audio API (e.g. a function for playing custom wav data, like in g3gg0's SMPTE module, or maybe sending out RS232 signals, or a function to record short samples of audio, maybe somebody wants to write a module that does voice recognition or plot a spectrogram? stuff like that)
Selecting quality (maybe you only need 22.1 reference audio), maybe fixed duration, RS232/timecode out the headphone jack (or pseudo headphone). Or genlock can be done this way... cam 1 HP -> cam 2 mic, vice versa, etc.
One rather nice thing: write the header at the end, so you have the real length.
Real length is possible now, since we have a file seek function. Back then we didn't.
For double free, I think I found the bug: if you look at add_write_q and write_q_dump, the latter is allowed to save a queue that was not yet filled completely. When this happens, that buffer is saved (and freed) twice.
It's not a bug in the memory backend, but in the wav recording code.
At a closer look, the bug is severe (the recording task will write on deallocated memory, so you can expect any kind of side effects), so I'll disable this feature until a solution is found.
ah, finally. nice :)
Yeah, my bad for accepting code without fully understanding how it works.
What about static allocation of the buffer and only free when recording is finished?
understanding the code? that's rarely possible in reality.
static allocation? yep - allocate when recording starts, free when it finishes.
permanent alloc/free/alloc makes no sense for such large blocks. (it really does that?)
My old implementation had just two buffers and no memcpy (but I think it did file I/O from the CBR). I'm thinking to try it again.
Seems to use only a few hundred KB, but I meant static allocation while recording, instead of allocate -> write -> free.
It looks like it uses 3*~200KB or so.
Otherwise maybe do it like silent pics but using allocate mem instead of shoot malloc.
Some tweaks to the memory backend (including a fix for err70 when memory gets fragmented):
https://bitbucket.org/hudson/magic-lantern/pull-request/384/memory-backend-improvements
g3gg0 wrote a really cool module that exercises the memory backend: https://bitbucket.org/hudson/magic-lantern/commits/c7d8de335e664504755d4d2d7a0b6af1db995042
After tweaking it to be a little more aggressive (https://bitbucket.org/hudson/magic-lantern/commits/9e657454a9ce494d863e2a36a23e2b115a643eed) (allocate a lot of small blocks and keep them allocated for a while), I ran a little test on 60D (of course, with the latest tweaks applied). There are 18 tasks flooding the backend with random malloc/free requests, and the total load varies around 10-50 MB:
(http://a1ex.magiclantern.fm/bleeding-edge/malloc-flood.png)
On the 700D, the stub for GetMemoryInformation is 0x7498; so, according to my IDA, am I right that GetSizeOfMaxRegion = 0x7444?
(http://s12.postimg.org/f64d2boel/Untitled_1.jpg)
Yep. Easy to find, just time-consuming.
Suggestion: you can load an additional binary in IDA at the RAM functions' address space. It will make your life much easier.
More updates:
- much faster allocation (the underlying memcheck library was filling every single buffer with 0xA5)
- updated heuristics (it should no longer try to allocate tiny blocks from shoot_malloc)
- re-implemented cache bit mangling => g3gg0's flood tests passed on 5D3 too, not just 60D (right now 2600 iterations x 8 tasks and counting)
A bunch of flood tests in parallel on 5D3 (8 threads):
(http://a1ex.magiclantern.fm/bleeding-edge/malloc-flood2.png)
5D2 is even stronger (25 threads). The screenshot function can no longer keep up with screen redraws :D
(http://a1ex.magiclantern.fm/bleeding-edge/malloc-flood3.png)
Looks like the startup lockup is gone!
Cool, now try loading as many modules as you can.
Seems to be able to load all of them.. the big test will be enabling GDB again and trying that monster module.
Just loaded the big ADTG on 60D, along with 2 pages of other modules. 763K module code, 846 total, peak 1.7M, malloc 82k/381k, AllocMem 763k/1.0M, and... shoot_malloc: 0 used.
What I'm wondering is why get max region shows lower than the total... is that just contiguous memory vs. all available? Or is it from the remapping of AllocateMemory?
Yep, max contiguous region. If you try to allocate more than that... err70.
Sorry for posting here (as I'm not a dev), but maybe there's a chance to use this memory to hold data that slower cameras (like the 600D) cannot write out fast enough? :) This could be a stupid idea, just sayin'... Sorry!
ML already uses as much memory as is available for this purpose (ever notice all the 'buffer' stuff? A buffer is simply memory being used for exactly this purpose). This thread has to do with the nitty-gritty of how we allocate and manage the various types of memory that are available for this and various other purposes.
Some backend updates. Started to fix things to allow the 1100D and EOSM to run the memory benchmarks, and found a bunch of other issues in the process. Result: finally managed to run the old-style Lua (which does over 5000 malloc calls just to load the default set of scripts) on 1100D (the camera with the lowest amount of memory)!
Details and torture tests: https://bitbucket.org/hudson/magic-lantern/pull-requests/906/memory-backend-improvements/diff