New writing strategy - variable buffering

a1ex · June 20, 2013, 05:08:30 PM

If you ever looked in the comments from raw_rec.c, you have noticed that I've stated a little goal: 1920x1080 on 1000x cards (of course, 5D3 at 24p). Goal achieved and exceeded - even got reports of 1920x1280 continuous.

During the last few days I took a closer look at the buffering strategy. While it was near-optimal for continuous recording (large contiguous chunks = faster write speed), there was (and still is) room for improvement for those cases when you want to push the recording past the sustained write speed, and squeeze as many frames as possible.

So, I've designed a new buffering strategy (I'll call it variable buffering), with the following ideas in mind:

* Write speed varies with buffer size, like this (thanks to testers who ran the benchmarks for hours on their cameras)

* Noticing the speed drop is small, it's almost always better to start writing as soon as we have one frame captured. Therefore, the new strategy aims for 100% duty cycle of the card writing task.

* Because large buffers are faster than small ones, these are preferred. If the card is fast enough, only the largest buffers will be touched, and therefore the method is still optimal for continuous recording. Even better - adding a bunch of small buffers will not slow it down at all.

* This algorithm will use every single memory buffer that can contain at least one frame (because small buffers are no longer slowing it down).

* Another cause of stopping: when the buffer is about to overflow, it's usually because the camera is trying to save a huge buffer (say a 32MB one), which will take a long time (say 1.5 seconds on slow SD cameras, 21MB/s). So, I've added a heuristic that limits buffer size - so, in this case, if we predict the buffer will overflow after only 1 second, we'll save only 20MB out of 32, which will finish at 0.95 seconds. At that moment, the capturing task will have a 20MB free chunk that can be used for capturing more frames.

* Buffering is now done at frame level, not at chunk level. This finer granularity allows me to split buffers on the fly, in whatever configuration I believe it's best for the situation.

* The algorithm is designed to adjust itself on the fly; for this, it does some predictions, such as when the buffer is likely to overflow. If it predicts well, it will squeeze a few more frames. If not... not

* More juicy details in the comments.

This is experimental. I've ran a few tests, played back a few videos on the camera, but that was all. I didn't even check whether the frames are saved in the correct order.

Build notes

- This breaks bolt_rec. Buffering is now done at frame level, not at chunk level, so bolt_rec has to be adjusted.
- The current source code has debug mode enabled - it prints funky graphs. You'll find them on the card.
- The debug code will slow down the write speed.
- I'd like you to run some test recordings and paste the graphs - this will allow me to check if there's any difference between theory and practice (you know, in theory there isn't any).
- I did not run any comparison with the older method on the camera (I did only in simulation). Would be very nice if you can do this.
- It may achieve lower write speeds. This is normal, because it also uses smaller buffers. If you also consider the idle time, it should be better overall.
- For normal usage, disable the debug code (look at the top of raw_rec.c).

History

[2013-05-17] Experiment about checking the optimal buffer sizes. People ran the benchmarks for hours on their cameras and posted a bunch of logs. They pretty much confirmed my previous theory, that any buffer size between 16MB and 32MB should result in highest speeds.

[2013-05-30] Noticed that file writes aligned at 512 bytes are a little faster (credits: CHDK). Rounded image size to multiples of 64x32 or 128x16 to ensure 512-byte rounding.

[2013-08-06] Figured out that I could just add some padding to each frame to ensure 512-byte rounding and keep the high write speeds without breaking the converters too hard. Also aligned everything at 4096 bytes, which solved some mysterious lockups from EDMAC and brought back the highest speed in benchmarks (over 700MB/s).

[2013-05-30] speedsim.py - First attempt to get a mathematical model of the recording process. Input: resolution, fps and available buffers. Output: how many frames you will get, with detailed graphs. Also in-camera estimation of how many frames you will get with current settings.

[2013-06-18] Took a closer look at these logs and fitted a mathematical model for the speed drop at small buffer sizes.

[2013-06-18] Does buffer ordering/splitting matter? 1% experimented with it before, but there was no clear conclusion.
* is it better to take the highest one first or the smallest one first? there's no clear answer, each one is best for some cases and suboptimal for others.
* since some cameras had very few memory chunks (e.g. 550D: 32+32+8 MB), what if each of the 32MB buffer is divided in 2x16 or 4x8 MB? This brought a significant improvement for resolutions just above the continuous recording threshold, but lowered performance for continuous recording.
* optimization: updated speedsim.py so it finds the best memory configuration for one particular situation. Xaint confirmed the optimization results on 550D.
* problem: there was no one-size-fits-all solution.

[2013-06-19] Simulation now matches perfectly the real-world results. So, the mathematical model is accurate!

[2013-06-19] Started to sketch the variable buffering algorithm and already got some simulation results. There was a clear improvement for borderline cases (settings that require just a little more write speed that your camera+card can deliver).

Example: 550D, 1280x426, 23.976fps, 21.16MB/s, simulation:

- 8MB + 2x32MB (current method) - 317 frames
- 9x8MB - 1566 frames
- Variable buffering, starting from 8MB + 2x32MB - 1910 frames

[2013-06-20] Ran a few more tests and noticed that it meets or exceeds the performance of the old algorithm with sort/split optimization. There are still some cases where the sort/split method gives 2-3 more frames (no big deal).

Got rid of some spikes, which squeezed a few more frames (1925 or something).

Implemented the algorithm in camera. It's a bit simplified, I didn't include all the optimizations from the simulated version, but at least it sems to work.

Took me two hours just to write this post. Whooo

Enjoy and let me know if the theory actually works!

Fixed broken link - Audionut

Rewind · June 20, 2013, 06:30:39 PM

Yes, it works.
I get about 45-55% increase in number of frames before skipping compared to 1x8Mb + 4x16Mb buffering method.
(550d Sandisk extreme pro 95 mb/s).
Some actual test results

haemma · June 20, 2013, 06:31:54 PM

Very nice. Hopefully my Lexar 1000x 64gb Card arrives tomorrow so maybe I could test some few things. (on 5D3)

Thanks for your amazing work!

Stedda · June 20, 2013, 06:32:45 PM

You sir are a wizard... can't wait to try it out.

a1ex · June 20, 2013, 06:34:27 PM

@Rewind: did you find any screenshots on your card? can you upload them?

They are saved when recording stops (if the build was compiled with debug flags on, which is default right now).

Rewind · June 20, 2013, 06:46:41 PM

Screenshots for this test
[spoiler]

[/spoiler]

driftwood · June 20, 2013, 06:47:28 PM

@a1ex I want to test this on the 5DMKIII immediately - have you got a build link?

a1ex · June 20, 2013, 06:51:56 PM

Straight from the card, not packaged:

raw_rec.mo - portable, should work on any camera

autoexec.bin and 5D3_113.sym - only for 5D3 1.1.3 (optional)

driftwood · June 20, 2013, 06:57:10 PM

Thanks. Testing now...

Stedda · June 20, 2013, 07:18:00 PM

I'm not finding any screenshots... maybe it's me...

Bioskop.Inc · June 20, 2013, 07:27:47 PM

Here's some from 60D @ 16:9:

https://dl.dropboxusercontent.com/u/90827273/60D.zip

This was the first build before you disabled Debug Code

a1ex · June 20, 2013, 07:32:05 PM

So far, all the screenshots looked good, so we can disable the debug info and let it run at full speed.

a.d. · June 20, 2013, 07:33:08 PM

@Stedda
You need to edit the code
from

Code Select

#undef DEBUG_BUFFERING_GRAPH
to

Code Select

#define DEBUG_BUFFERING_GRAPH

Stedda · June 20, 2013, 07:34:39 PM

ahhh... i thought I read it was by default. My bad. Thanks.

Danne · June 20, 2013, 07:39:39 PM

Couldn,t find any screenshots but managed to record 1920x1152 25fps continuosly on my transcend 1000x 64gb card on my 5d mark 3. Before it worked on 1920x1080 24fps, occasionally I could get it to work with 25fps in 1920x1080 25fps depending on the mood of my camera

.

This means 1 step up in aspect ratio fully working with 25fps. NICE!

Tried also some 50fps and got it working continuosly in 1856x492.

Awesome work Alex!
//D

a1ex · June 20, 2013, 07:46:17 PM

I didn't expect any improvement in continuous recording TBH.

But if you were getting around 1000 frames before, now you can expect to get around 10000 (which is almost continuous).

driftwood · June 20, 2013, 07:53:21 PM

Definite improvement a1ex, getting continuous at 24p/25p 1920x1152 now on the lexar 128Gb 1000x. Global Draw On, ETTR On

Quick tests:

24p - 1080 - 92.8MB/s - 10% idle continuous
24p - 1152 - 93MB/s - 5% idle continuous
24p - 1280 - 95.7MB/s - 1-7ms idle > 1000 frames stopped
30p - 1080 - 91.8MB/s - 1ms idle stopped at 502 frames
30p - 1.85:1 - 93.8MB/s - 1ms idle continuous
25p - 1080 - 91.9MB/s - 5% idle continuous
25p - 1152 - 92.2MB/s - 49ms climbing to 80ms idle continuous

Incred1ble work a1ex. :-)

Also getting around 25% improvement on slower 800x cards.

DNG files look good.

NateVolk · June 20, 2013, 07:59:07 PM

I'm seeing an improvement as well. 5d3 Komputerbay 128gb card. Was 1920 @24 for 150 frames before skipping. Now 518 frames before skipping... Oh, and global draw on, memory hack on, force left...
I'm still new, hope this helps!

Stedda · June 20, 2013, 08:05:22 PM

I am seeing the same as Danne Reported.

I was able to record 1920x1080 continuous previously. Now I can record 1920x1152 to what seems to be continuous. I didn't let it run until it filled the card so I'll have to test more but I look like I may even be able to go to 1920x1280 without a problem based off the on screen graph.

This is on a Transcend 1000x 32GB GD on with ETTR only NO Memory Hack.
Nice !

Yuppa · June 20, 2013, 08:10:55 PM

Model: 60D
Card: Sandisk Extreme 45 MB/s 16 GB
Global draw off.

1280 x 544 (2.35:1).

Before: 707 frames.
After: 861 frames.

Got an ERR 70 the 1st time, but not the 2nd, 3rd...

The buffer(s) indicator is cool, better than the stars by far.

Edit: at 1152 x 490 (2.35:1), before 1802 frames, after over 3,000 frames!

swinxx · June 20, 2013, 08:52:58 PM

that looks really promising. great alex

KMA_WWC · June 20, 2013, 09:03:57 PM

As soon as I get back home Alex, I will run the test on all of my cards, and let you know the results and screenshots.

Aborgh · June 20, 2013, 09:11:38 PM

Does it work with 650D?

kgv5 · June 20, 2013, 09:55:57 PM

Could we test it on 6D?

crash-film · June 20, 2013, 10:04:16 PM

thank you a1ex for that wonderful work!

has anybody found a reason why there´s always that mysterious 2 times so called warm up recordings?
it can´t really be a thermal warm up.... then we would be way out of regular technical specifications.

what´s your strategy in optimizing the (probably) card specific writing strategy?
firmware check on the cards (brand, size, manufacturing date)?

or a new benchmarking tool in ML to determine the best settings?
is there a way to format the card from inside ML with optimized cluster settings and/or aligning?

i think the problem here will be the cards and their constantly changing inner structure. let there be a new controler IN the card and the writing strategy might be a nightmare.... a komputerbay x1000 might work for one user and for the next it might be completely "useless"....because everybody is looking for the highest resolutions possible. sick pixel x pixel marketing....

or am i wrong and everything you do is card independent?

News:

New writing strategy - variable buffering

Aborgh