Basically, in post processing it's usually not a problem that the format descriptor or metadata partition is at the end of the file, unless you are streaming the file in its entirety and want to progressively display it while transferring (as in a web download). But that will never be the case here, so that's not an issue; it's just a choice we have to make in the specification. It is indeed true that we can calculate the number of frames from the file size. But then I'm thinking about dropped frames. Do we currently know which frame(s) were dropped? Can we have that level of information and add it to the descriptor? Better yet, if it doesn't cause too much slowdown in card write speeds and we are able to add per-frame metadata, we could add a presentation timestamp to every frame; that way we can identify the dropped ones.
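Just to sketch what I mean (this is not part of any existing spec, all names and the frame interval here are made up by me): a hypothetical per-frame header carrying a presentation timestamp, plus a scan that flags dropped frames by looking for gaps larger than the nominal frame interval.

```c
/* Hedged sketch only: frame_header_t and the 40000 us interval are
 * assumptions for illustration, not anything from the current spec. */
#include <stdint.h>
#include <stdio.h>

typedef struct {
    uint64_t pts_us;   /* presentation timestamp in microseconds */
    uint32_t size;     /* payload size of the raw frame in bytes */
} frame_header_t;

/* Report gaps: any step noticeably larger than one frame interval
 * means one or more frames were dropped at that point. */
static void report_drops(const frame_header_t *hdr, size_t count,
                         uint64_t frame_interval_us)
{
    for (size_t i = 1; i < count; i++) {
        uint64_t delta = hdr[i].pts_us - hdr[i - 1].pts_us;
        if (delta > frame_interval_us + frame_interval_us / 2) {
            /* round the gap to the nearest multiple of the interval */
            uint64_t dropped = (delta + frame_interval_us / 2)
                               / frame_interval_us - 1;
            printf("~%llu frame(s) dropped after pts %llu us\n",
                   (unsigned long long)dropped,
                   (unsigned long long)hdr[i - 1].pts_us);
        }
    }
}

int main(void)
{
    /* 25 fps -> 40000 us per frame; the frame at 80000 us is missing */
    frame_header_t h[] = {
        { 0, 0 }, { 40000, 0 }, { 120000, 0 }, { 160000, 0 }
    };
    report_drops(h, sizeof h / sizeof h[0], 40000);
    return 0;
}
```

The nice property is that the cost during recording is just 8 extra bytes per frame; all the detection work happens in post.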
Well, MXF is a really big book, but it's a standard that can carry CinemaDNG as video content. If I understand correctly, the raw frames are currently converted by raw2dng into DNG pictures, right? So they need some processing before they become DNG pictures, which means storing our raw frames directly would introduce a whole new operational pattern into MXF. That is not very easy, and frankly I think nobody would support it anyway, which kind of defeats the point of implementing a standard format.
Since I read on some threads that the audio is slightly out of sync (either drift or a fixed offset), it would be interesting to build a muxer that could correct the desync. But that would also complicate the format spec. Is this something we want to do?
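For what it's worth, here is roughly what such a muxer would have to compute to separate the two problems (fixed start offset vs. clock drift). All the numbers below are invented purely for illustration; I don't know the actual drift figures.

```c
/* Hedged sketch: given the recording's video duration (derived from
 * frame PTS) and the number of audio samples actually captured,
 * estimate the fixed offset and the drift a muxer would need to
 * correct. Nothing here reflects an existing tool. */
#include <stdint.h>
#include <stdio.h>

int main(void)
{
    double video_duration_s = 600.0;      /* 10 min of video by PTS */
    double nominal_rate_hz  = 48000.0;    /* what the WAV header claims */
    uint64_t samples_captured = 28795200; /* what we actually got */
    double audio_start_offset_s = 0.012;  /* audio started 12 ms late */

    /* Effective sample rate: samples really produced per video second. */
    double effective_rate = samples_captured / video_duration_s;

    /* Drift accumulated over the whole clip if the audio is played
     * back at the nominal rate instead of the effective one. */
    double drift_ms = (samples_captured / nominal_rate_hz
                       - video_duration_s) * 1000.0;

    /* A muxer would trim/pad the start offset and resample by this
     * ratio (or simply restamp the audio track's sample rate). */
    double resample_ratio = nominal_rate_hz / effective_rate;

    printf("offset: %.1f ms, drift over clip: %.1f ms\n",
           audio_start_offset_s * 1000.0, drift_ms);
    printf("effective rate: %.2f Hz, resample ratio: %.6f\n",
           effective_rate, resample_ratio);
    return 0;
}
```

If we do add per-frame presentation timestamps as discussed above, the video side of this calculation comes for free, which makes the muxer idea a lot more tractable.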