RAW video format specification

Started by studiokitara, May 29, 2013, 03:50:31 PM



Hi all,

Is there a specification sheet for the current RAW video file format?
I was looking at the raw2dng source and see that ML writes the metadata in the file footer. I was wondering whether it would make more sense to reserve a specific part of the file header for it, and to draw up a format specification sheet to ensure future compatibility. That way, we can start building post-processing applications without having to worry that the file spec is going to change while it is still in the experimental stage. I know I can reverse engineer it from the raw2dng source, but it would be nice to have a format specification. Perhaps we can approach the ffmpeg and libav teams with this format specification for them to integrate, or do it ourselves. I did a quick search on the forum, but did not find a topic about this. Sorry if it's a duplicate.

Does anybody feel like sharing some ideas about this?
I would be happy to make a format specification document.

Kind regards


There's no formal specification, and the file format may change. This is really bleeding edge stuff.

Proposals are welcome.


I'll see if I can make a good proposal.

I assume using an existing format specification would be out of the question because of performance issues? For example, if we use MXF we are going to have a lot of overhead writing descriptive content besides the raw frames.

Are there any technical issues we should take into account? I'm thinking about the position of the format descriptor: is there a technical issue preventing us from writing the header to the start of the file and letting the raw frames start at an offset of X bytes?

Audio will probably not be muxable, and will always be a separate file?

I did not check yet in the source if there is any exif information currently written to the raw video files, but I think it's a good idea to include it in the spec.

Reading the 10-12bit thread, it would be a good idea to allow different bpp in the raw format.


It's a footer because g3gg0 implemented it this way in lv_rec, and I've copied his code. I don't see any solid reasons for this.

Technical limitation: we can't rewind and write the header at the end of recording, but other than number of frames (which can be autodetected from file size) I can't imagine what other params are not known prior to recording. So yeah, we can write a header, it will help in-camera playback too.
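As a rough illustration of autodetecting the frame count from the file size, here is a minimal Python sketch. It assumes frames are packed back to back with no padding, that the file carries a single metadata block of 192 bytes (the footer size mentioned elsewhere in this thread), and that the resolution and bpp are already known. The function names are made up for this example and are not taken from raw2dng.

```python
def frame_size_bytes(width: int, height: int, bpp: int = 14) -> int:
    """Bytes per packed raw frame: width * height * bpp / 8."""
    bits = width * height * bpp
    if bits % 8:
        raise ValueError("frame does not end on a byte boundary")
    return bits // 8


def count_frames(file_size: int, width: int, height: int,
                 bpp: int = 14, metadata_size: int = 192) -> int:
    """Whole frames in a file holding packed frames plus one metadata block.

    Assumes no padding between frames (an assumption, not the ML layout).
    """
    payload = file_size - metadata_size
    if payload < 0:
        raise ValueError("file is smaller than the metadata block")
    return payload // frame_size_bytes(width, height, bpp)
```

If frames turn out to be padded or aligned on disk, the per-frame size would have to come from the metadata instead of being derived from the resolution.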

MXF is completely unknown to me. Audio... muxing may be possible, but I have no experience with that.

We can also embed some metadata for every frame.

bpp is already in the footer.


Basically, in post processing it's usually not a problem that the format descriptor or metadata partition is at the foot, unless you are streaming the file in its entirety and want to display it progressively while transferring (as in a web download). But that will never be the case, so that's not an issue. It's just a choice we have to make in the specification. It is indeed true that we can calculate the number of frames from the file size. But then I'm thinking about the dropped frames. Is it currently known which frame(s) were dropped? Can we have that level of information and add it to the descriptor? Better yet, if it doesn't slow down card writes too much and we are able to add per-frame metadata, we could add a presentation timestamp to every frame; that way we can identify the dropped ones.

Well, MXF is a really big book, but it's a standard that can have CinemaDNG as video content. If I understand correctly, the raw frames are currently converted by raw2dng to DNG pictures, right? So they need some processing before they become a DNG picture, which means we would be introducing a whole new operational pattern in MXF. That is not very easy, and frankly I think nobody would support it anyway. That kind of defeats the purpose of implementing a standard format.

Since I read in some threads that the audio is slightly out of sync (either drift or offset), it would be interesting to make a muxer that could solve the sync issues. But that would also complicate the format spec. Is this something we want to do?


We are streaming the file when we play it back in camera. Now it works only with last clip and only with the last settings, mainly because ML can't read the footer before playing (because it's at the end).

So, I'd move the metadata to the beginning just for this.

For dropped frames, we can consider metadata for each frame. This doesn't add much overhead IMO (probably not noticeable at all). This would enable a smart converter to use interframe or twixtor to interpolate the dropped frames.
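To make the dropped-frame idea concrete, here is a small Python sketch of what a converter could do if every frame carried a timestamp. The timestamp field, its unit (microseconds) and the function name are assumptions for illustration; nothing like this exists in the current format.

```python
def find_dropped_frames(timestamps_us, frame_interval_us):
    """Return (index, missing) pairs: `missing` frames were lost after frame `index`.

    timestamps_us: per-frame timestamps in microseconds (hypothetical field).
    frame_interval_us: nominal time between two consecutive frames.
    """
    gaps = []
    for i in range(1, len(timestamps_us)):
        delta = timestamps_us[i] - timestamps_us[i - 1]
        # Round the gap to the nearest whole number of frame intervals.
        steps = (delta + frame_interval_us // 2) // frame_interval_us
        if steps > 1:
            gaps.append((i - 1, steps - 1))
    return gaps
```

A converter could then ask an interpolation tool for exactly `missing` new frames at each reported position.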

Another example for per-frame metadata: I wanted to add fine-tuning offsets for smooth panning with increments smaller than 8 pixels. But since I did not see any video sample with panning, I guess nobody uses it, so it's not worth the hassle.

For audio, sync is difficult (I have no idea where to start).


Quote from: a1ex on May 29, 2013, 06:32:22 PM
Another example for per-frame metadata: I wanted to add fine-tuning offsets for smooth panning with increments smaller than 8 pixels. But since I did not see any video sample with panning, I guess nobody uses it, so it's not worth the hassle.

I would love that. Digital panning is a great option when there's no physical space to put a slider (or for tight budgets). I don't use it more because I can't smooth the start/stop and, also, because when I enter zoom mode the image pixelates and goes black and white, so I can't see on the screen where I'm focusing. But that only requires me to make focus marks and that's not a real problem.
5D Mark III




Quote from: sergiocamara93 on May 29, 2013, 07:03:22 PM
I would love that. Digital panning is a great option when there's no physical space to put a slider (or for tight budgets). I don't use it more because I can't smooth the start/stop and, also, because when I enter zoom mode the image pixelates and goes black and white, so I can't see on the screen where I'm focusing. But that only requires me to make focus marks and that's not a real problem.

About digital panning: I've given it a couple of tries, and it works like a charm at lower resolutions on the 550D.


It seems there is some interest a1ex  :)

Regarding spanned files (> 4 GB), I see that the advised way to process them is to merge the files. Does this mean no footer is written to the subsequent files? In that case I think we must add a per-frame metadata property that identifies which master file (or better, frameset) the spanned file belongs to. Perhaps a GUID, or just the filename of the master. Of course it would be interesting if we had the ability to update the metadata of the master file and specify how many spanned files there are. But I get that you cannot go back to the master file to adjust the footer, so we'll have to improvise. Perhaps add a bit in the per-frame metadata tag to specify that the file is continued in the next part; post-processing software can then just read the last frame's metadata tag and see if it needs to look further.

I studied the raw2dng source yesterday and saw that the stored raw frames are actually a direct copy of the Bayer raw buffer, is that right?


Spanned files: RAW, R00, R01 and so on, just like WinRar. Footer is only on the last one, header can be on the first one.
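For anyone writing a post-processing tool, the naming scheme above can be walked with a few lines of Python. This is only a sketch: the two-digit numbering limit (R00 to R99) and the function names are assumptions, and the file name in the test is just an example.

```python
from pathlib import Path


def spanned_part_names(first_part: str, max_parts: int = 100):
    """Yield candidate part names: clip.RAW, clip.R00, clip.R01, ..."""
    base = Path(first_part)
    yield str(base)
    for i in range(max_parts):
        yield str(base.with_suffix(f".R{i:02d}"))


def existing_parts(first_part: str):
    """Return the parts that actually exist on disk, stopping at the first gap."""
    parts = []
    for name in spanned_part_names(first_part):
        if not Path(name).is_file():
            break
        parts.append(name)
    return parts
```

With the footer only on the last part, a reader would open the final entry of `existing_parts(...)` to find the metadata.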

Raw frames are direct copy from Canon's raw buffer, with some byte swapping to match the DNG spec.


I see, ok no problem.

I have not tried this myself yet, but is it currently possible to zoom while recording raw? I don't think so, because right now the frame size is only specified in the footer, and when you zoom on a 5d2 it changes resolution to 1920. Do we need to add support in the spec for different frame sizes in the same recording?

Also, currently we are using RAWM as FourCC identifier, which is fine and not in use yet (you guys probably already looked this up :)).
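A quick way for a tool to recognise these files is to check the identifier. The Python sketch below assumes the metadata block is the last 192 bytes of the file (the footer size mentioned elsewhere in this thread) and that RAWM sits in its first four bytes; that position is my assumption and should be checked against the raw2dng source.

```python
import io
import os

FOOTER_SIZE = 192   # footer size quoted in this thread
MAGIC = b"RAWM"     # identifier; its position in the footer is assumed


def read_footer(f) -> bytes:
    """Return the last FOOTER_SIZE bytes of a seekable binary file object."""
    f.seek(0, os.SEEK_END)
    size = f.tell()
    if size < FOOTER_SIZE:
        raise ValueError("file is too small to contain a footer")
    f.seek(size - FOOTER_SIZE)
    return f.read(FOOTER_SIZE)


def looks_like_raw_video(f) -> bool:
    """True if the footer starts with the RAWM identifier."""
    try:
        return read_footer(f)[:4] == MAGIC
    except ValueError:
        return False


# In-memory stand-in for a file: fake frame data followed by a fake footer.
demo = io.BytesIO(b"\x00" * 1000 + MAGIC + b"\x00" * (FOOTER_SIZE - 4))
```

If the metadata moves to a header, the same check would simply read the first bytes of the file instead.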


Hahaha, first time I hear about variable frame size in video... imagine the video player resizing back and forth while playing!


The MXF spec supports this :-)
I work at a TV broadcast network; we occasionally receive MXF files that are non-conforming and merged together. When played in Sony's XDCAM player it behaves exactly like you say: the window goes crazy.

It's a matter of choice, for future proofing this could be interesting, but personally I don't see the use of it.


Another thing: I remember reading on CHDK that writing 512-byte multiples improves write speed. We don't have this problem with the footer, but if we are adding headers, maybe we should take it into account (e.g. minimum header size should be 512, both global and per-frame).


Excellent idea, I was also just thinking about the size; currently the footer size is 192.
How do you feel about adding full EXIF data to the header? I need to do the math, but I guess we'll really need 512 bytes. I guess it's not that much overhead: 512 bytes per frame is about 512 KB per 1000 frames. Anyway, if we gain performance by it, it's definitely a no-brainer.
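The alignment rule and the overhead estimate are easy to check with a few lines of Python. This is a sketch only: the 512-byte block size comes from the CHDK remark above, while the function names and the idea of one 512-byte block per frame are assumptions for the sake of the calculation.

```python
BLOCK = 512  # write granularity suggested above


def align_up(size: int, block: int = BLOCK) -> int:
    """Round size up to the next multiple of block."""
    return (size + block - 1) // block * block


def padding_for(size: int, block: int = BLOCK) -> int:
    """Bytes of padding needed after `size` bytes to reach a block boundary."""
    return align_up(size, block) - size


def metadata_overhead(frames: int, per_frame_bytes: int = BLOCK,
                      global_header_bytes: int = BLOCK) -> int:
    """Total metadata bytes for a clip with one global and N per-frame headers."""
    return global_header_bytes + frames * per_frame_bytes
```

For example, the current 192-byte footer would need 320 bytes of padding to fill one block, and 1000 frames with one block each add 512,000 bytes on top of the global header.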

MA Visuals


Regarding panning mode.... +1 here as well.  If studiokitara's work helps improve this feature, I say go for it.

When needing to gather additional b-roll without a slider, this feature will be invaluable in my opinion.  Also, configurable speeds as well as a way to configure a delay before recording begins and then have it automatically begin panning.  This would make it very flexible for self shooters.   I'm kicking myself for not remembering about this feature this weekend when I went out to shoot.  Here's what I shot and I definitely could have used panning mode for a few shots.


I'm new, so please excuse me if what I'm suggesting is not possible.
If I understand correctly, the DNGs created with Magic Lantern are compatible with Adobe's DNG specification and can be opened using Adobe's Camera Raw.
Wouldn't it make sense then to just use their CinemaDNG specification? This way it should be possible to open those files in After Effects and any other application supporting that format (Wikipedia suggests there are a few able to do so).



There is already a CinemaDNG thread about exactly the same thing.
It's just a matter of time until it's programmed.


Regarding the variable recording size (dynamic resolution): if the FOV remains the same, then you could add sharpness by increasing the size, in the same way that frame rate remapping alters temporal resolution. Then you could perform the crop in post (retaining pixel width, sacrificing FOV) to effect a zoom.

Also, you could alter detail during the shot to effect a sort of codec compression. Could you use the focus algorithm to estimate the required detail? More frequency = more detail, more flat areas = less detail.
550D on ML-roids


g3gg0, are you by any chance referring to the CinemaDNG for raw2dng thread? Because I'm talking about making a spec for the recording format. As far as recording directly to CinemaDNG goes, you have two format choices: a directory with a separate DNG file for each frame, or MXF as a container. But using MXF as a container will cause too much overhead, I guess. As for saving a separate DNG file for each frame, wouldn't that also be a performance hit, since we need to create a new file for each frame (add the DNG header, shift bits...)?

If someone does implement the CinemaDNG standard directly on camera without too much performance loss, that would be great. That's much more desirable than creating a custom format spec.


The problem is, we need 10, 12 or 16 bits of video data per pixel/sensel; Resolve and SpeedGrade do not understand 14-bit. So we would have to restructure the sensor data while writing to the CF card. Possible? Don't know.
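On the arithmetic side, changing the bit depth of a single sample is just a shift. The Python sketch below shows the value mapping only; it says nothing about whether the camera can do this fast enough while recording, and it ignores how samples are packed into bytes. The function name is made up for this example.

```python
def rescale_bits(sample: int, src_bits: int = 14, dst_bits: int = 16) -> int:
    """Rescale one linear sample from src_bits to dst_bits by shifting."""
    if not 0 <= sample < (1 << src_bits):
        raise ValueError("sample does not fit in src_bits")
    shift = dst_bits - src_bits
    # Widening pads with zero bits; narrowing drops the least significant bits.
    return sample << shift if shift >= 0 else sample >> -shift
```

The black and white levels stored in the metadata would need the same shift, otherwise the converted data would be interpreted with the wrong range.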



To understand the requirements better, I'm writing a small video player in C. I have no experience with linear 14-bit color spaces; could somebody explain what I should do with the black/white level integers? Also, what is the pitch: is that bytes per line? (I copied it from the raw2dng source, which says width*14/8; does the 14 mean the bpp?)
In my source I'm using raw_get_pixel from raw2dng. What do I get back? A 32-bit integer with 14-bit color information?

I'm trying to display this data on a 24bit rgb surface, but I'm kind of screwing around  ;)
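For readers following along, here is a generic Python sketch of the two calculations asked about: bytes per line for packed samples, and mapping a linear sample to an 8-bit display value using the black and white levels. This is not the raw2dng or Magic Lantern code; the gamma value and function names are assumptions, and the level values in any test are invented.

```python
def pitch_bytes(width: int, bpp: int = 14) -> int:
    """Bytes per line of packed samples: width * bpp / 8."""
    return width * bpp // 8


def raw_to_8bit(raw: int, black: int, white: int, gamma: float = 2.2) -> int:
    """Map a linear raw sample to a display value in 0..255."""
    # Subtract the black level and normalise so the white level maps to 1.0.
    linear = (raw - black) / (white - black)
    # Clamp values below black or above white.
    linear = min(max(linear, 0.0), 1.0)
    # Simple power-law curve for display (an assumption, not ML's curve).
    return round(255 * linear ** (1.0 / gamma))
```

Each raw sample holds one color channel of the Bayer pattern, so this yields a grayscale-looking mosaic; getting RGB for a 24-bit surface still requires combining neighbouring samples (demosaicing).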


See raw_to_ev and raw_preview_fast, these will answer your questions.


Thanks, will try to implement this later. I was successful in displaying frames; the only thing missing was the color conversion.