12-bit (and 10-bit) RAW video development discussion

Audionut · June 05, 2013, 07:50:40 AM

Let's stop with the useless speculation and keep it on topic, thanks.

The activity in this thread should make it pretty clear that the developers are interested in the idea, and in being able to make it happen.

mucher · June 05, 2013, 10:12:05 AM

Quote from: vicnaum on May 28, 2013, 08:16:00 AM
Nope, LUT things will be just y = lut [ x ], and that's all.

And about the calculations you mention - they are easily done (like g3gg0 said) with bit shifting:
Raw14b: 13598 = 0011 0101 0001 1110
Shift 2 bits right =>12bit = 0000 1101 0100 0111 = 3399
13598 * 2^12 / 2^14 = 13598 * 4096 / 16384 = 3399.5 = 3400

So no need for calculations at all. Rsh solves it on cpu-elementary level.
If you need better rounding (like said), add half of the bits (2bits = 4, so add 2) to number before shifting:
(13598+2) = 13600
13600 Rsh 2 = 3400

As per vicnaum: if this works, there will be no loss in DR; we just lose luminance levels, because 10 bits encode fewer levels. I understand very little of these bit-level calculations, though, and I have no time to check their validity. If vicnaum is right, there are no calculations to do except shifting bits, which sounds very promising to me.
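The shift-and-round arithmetic quoted above can be sketched in C as follows. The function names are made up for illustration and are not taken from any Magic Lantern source. The clamp in the rounding variant is an addition: without it, the very largest 14-bit inputs would round up to a value that no longer fits in the reduced bit width.

```c
#include <assert.h>
#include <stdint.h>

/* Drop the lowest `drop` bits of a 14-bit sample by plain truncation. */
static uint16_t reduce_truncate(uint16_t raw14, unsigned drop)
{
    assert(drop >= 1 && drop < 14);
    return (uint16_t)(raw14 >> drop);
}

/* Same, but round to nearest by adding half of the dropped range first.
 * The result is clamped, because rounding the largest inputs would
 * otherwise overflow the reduced bit width (16383 would become 4096,
 * which needs 13 bits). */
static uint16_t reduce_round(uint16_t raw14, unsigned drop)
{
    assert(drop >= 1 && drop < 14);
    uint16_t max_out = (uint16_t)((1u << (14 - drop)) - 1);
    uint32_t rounded = ((uint32_t)raw14 + (1u << (drop - 1))) >> drop;
    return (uint16_t)(rounded > max_out ? max_out : rounded);
}
```

With the numbers from the quote, reduce_truncate(13598, 2) gives 3399 and reduce_round(13598, 2) gives 3400.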

glnf · June 07, 2013, 01:50:57 AM

I don't want to distract anyone working on this. I'd just like to add a little clarification for those who are not clear about the goal of this operation. It is actually a very simple idea. The RAW data from the sensor is 14 bits long. Each 14-bit number represents the grey value of a red, green or blue image pixel.

So a typical value looks like this: 1010 0101 0100 10 (the spaces are just for reading convenience)

We could translate the value into the decimal system (it is 10578) but there is not really a need for this. All we have to do is to get rid of the last two digits. So we end up with 1010 0101 0100. That is quite a different value. We write that onto the card. Then, when we retrieve the values from the card, we simply add 00 at the end of the number, making it: 1010 0101 0100 00. This is almost the same value (decimal 10576) as initially. So we lose a tiny bit of precision but we save some space and reduce the data rate.

There is, by the way, no need to do any rounding. Chopping off the lowest 2 (or alternatively 4) bits before writing the value to the card and replacing them with 2 or 4 "0" for further processing on the computer does the trick. Instead of adding "0" it would be slightly more elegant to add random values, hence mimicking (a very low level of) noise.
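The restore step on the computer could look like the sketch below. The function names are hypothetical, and rand() is only a stand-in for whatever noise source one would really want. Either way, the missing low bits are filled in after shifting the stored value back up.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Restore a reduced sample to 14 bits by appending `drop` zero bits. */
static uint16_t expand_zero(uint16_t reduced, unsigned drop)
{
    assert(drop >= 1 && drop < 14);
    return (uint16_t)(reduced << drop);
}

/* Restore to 14 bits, filling the missing low bits with pseudo-random
 * values (rand() is a placeholder for a proper noise source). */
static uint16_t expand_dither(uint16_t reduced, unsigned drop)
{
    assert(drop >= 1 && drop < 14);
    uint16_t noise = (uint16_t)(rand() & ((1 << drop) - 1));
    return (uint16_t)((reduced << drop) | noise);
}
```

For the example value, expand_zero(10578 >> 2, 2) returns 10576, and expand_dither lands somewhere between 10576 and 10579.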

So much for the theory. Since I haven't done any experiments with the camera, I don't know if there are any additional obstacles, e.g. a reverse or unusual bit order. (I remember reading something about black level in this thread and that sounded a bit odd to me. And there is no reason to come up with logs or whatever. That only harms the beauty of this trick.) Cheers, g

Critical Point · June 07, 2013, 03:46:19 AM

So, let's say on a 600D: what would 10- or 12-bit raw mean? What kind of resolutions are we talking about for 24 fps? Please don't say 1080p... because I'm going to have a heart attack.

hammermina · June 07, 2013, 06:31:59 AM

@ critical point

There will be no 1080/24p; I think the max will be 720p, if this mode works at all.

xNiNELiVES · June 07, 2013, 07:27:31 AM

Not to be a stickler, guys, but Audionut just told everyone to keep useless speculation aside. It'd be best if we just keep out of this development discussion. If you were to open up a thread dedicated to general discussion of this topic, that would be permissible.

mucher · June 10, 2013, 12:32:45 AM

Maybe we can use some precalculated values. It seems that DIGIC has accuracy problems when multiplying too many times. I am not sure how it will do with division, but it will probably be fine. The equation for changing to 10-bit, x * 2^10 / 2^14, can be regrouped as x * (2^10 / 2^14). Since 2^10 / 2^14 equals 1/16, the equation becomes x * 1/16, which equals x / 16. Likewise, changing to 12-bit means x / 4.

g3gg0 · June 10, 2013, 12:51:42 AM

for the last time:
devs are aware how to convert data from 14 bit to 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, and even 0 bits.
it is no riddle. it is no challenge.

we can do this while sleeping, we can do this while steering a car and we can even do this in electrical circuits.
seriously.

the only issue is: THE CPU IS SLOW.

driftwood · June 10, 2013, 12:55:47 AM

Exactly, the CPU is perpetually at 96% or higher at the moment.

necronomfive · June 10, 2013, 12:59:49 AM

Quote from: mucher on June 10, 2013, 12:32:45 AM
...and that equals to x / 16, and likewise, changing to 12bit means x/4

Or in other words: shift right by 4 or 2 bits....

Concerning precalculated values: if CPU bus speed to memory is an issue (just look at the plain memcpy values), then I don't see how ADDITIONAL external memory fetches from a LUT will help this cause in any way.
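To make the comparison concrete, here is a small sketch (all names are illustrative, not from the Magic Lantern code base) that builds the precalculated 14-to-10-bit table and checks it against the division and the shift. For this linear conversion the three agree on every input, so the table adds a memory fetch per pixel without changing the result.

```c
#include <assert.h>
#include <stdint.h>

#define RAW14_LEVELS 16384u   /* 2^14 possible input values */

static uint16_t lut_14_to_10[RAW14_LEVELS];

/* Fill the table with x * 2^10 / 2^14, i.e. x / 16. */
static void build_lut(void)
{
    for (uint32_t x = 0; x < RAW14_LEVELS; x++)
        lut_14_to_10[x] = (uint16_t)(x * 1024u / 16384u);
}

/* Table lookup: costs one extra memory read per pixel. */
static uint16_t lookup(uint32_t x)
{
    assert(x < RAW14_LEVELS);
    return lut_14_to_10[x];
}

/* Count inputs where the table, the division and the shift disagree. */
static uint32_t count_mismatches(void)
{
    uint32_t bad = 0;
    build_lut();
    for (uint32_t x = 0; x < RAW14_LEVELS; x++)
        if (lookup(x) != x / 16u || x / 16u != (x >> 4))
            bad++;
    return bad;
}
```

count_mismatches() returns 0. A table would only become interesting for a non-linear mapping, which is not what is being discussed here.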

I also see no way that 12-bit/10-bit compression will ever be achieved with the CPU. It was only designed to do GUI/filesystem management related stuff. The real processing is done by highly dedicated signal processing blocks inside the DIGIC. As said before, the only chance to get this working is finding an undocumented register bit in DIGIC which reduces the bit width of written sensor data.

With no documentation on DIGIC, this is like finding a small coin inside a barn full of hay, without any kind of guarantee that this coin exists at all.
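For a sense of what a CPU path would have to do per pixel, here is a sketch of packing two 12-bit samples into three bytes and unpacking them again. The byte layout is an assumption chosen for readability; it is not claimed to be the layout Canon or Magic Lantern uses. Every pair of pixels costs several shifts, masks and byte stores on top of the memory traffic.

```c
#include <assert.h>
#include <stdint.h>

/* Pack two 12-bit samples into three bytes, most significant bits first. */
static void pack12_pair(uint16_t a, uint16_t b, uint8_t out[3])
{
    assert(a < 4096 && b < 4096);
    out[0] = (uint8_t)(a >> 4);                        /* a[11:4]         */
    out[1] = (uint8_t)(((a & 0x0F) << 4) | (b >> 8));  /* a[3:0], b[11:8] */
    out[2] = (uint8_t)(b & 0xFF);                      /* b[7:0]          */
}

/* Inverse of pack12_pair: recover both 12-bit samples. */
static void unpack12_pair(const uint8_t in[3], uint16_t *a, uint16_t *b)
{
    *a = (uint16_t)((in[0] << 4) | (in[1] >> 4));
    *b = (uint16_t)(((in[1] & 0x0F) << 8) | in[2]);
}
```

Packing 0xABC and 0x123 this way yields the bytes AB C1 23.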

necronomfive · June 10, 2013, 01:10:26 AM

.

IliasG · June 11, 2013, 10:08:53 PM

Quote from: necronomfive on June 10, 2013, 12:59:49 AM
...
Concerning precalculated values: if CPU bus speed to memory is an issue (just look at the plain memcpy values), then I don't see how ADDITIONAL external memory fetches from a LUT will help this cause in any way.

I also see no way that 12-bit/10-bit compression will ever be achieved with the CPU. It was only designed to do GUI/filesystem management related stuff. The real processing is done by highly dedicated signal processing blocks inside the DIGIC. As said before, the only chance to get this working is finding an undocumented register bit in DIGIC which reduces the bit width of written sensor data.

With no documentation on DIGIC, this is like finding a small coin inside a barn full of hay, without any kind of guarantee that this coin exists at all.

Since I am not familiar with 5DIII internals, and I have already asked 2-3 times how time-consuming the LUT proposal would be (no answer yet), can you give an estimate?

A clear answer from the start would keep the thread compact and on target.

Chucho · June 12, 2013, 08:21:40 AM

I'm a little behind here: so changing the 14-bit data can be done in code, but the big challenge is finding a way to do it in camera? If you search for "CrawBit Invalid", it will give you the acceptable arguments for FA_SetCRawBitNum(): 0x10 = 16-bit, 0xE = 14-bit, 0xC = 12-bit, and 0xA = 10-bit. Doing something like this:
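void foo()
{
    /* firmware functions named in this post, invoked by name via call() */
    call("FA_CaptureTestImage");
    call("FA_GetCrawBuf");
    call("FA_SetCRawBitNum", 0xE);   /* 0xE = 14-bit; try 0xC (12-bit) or 0xA (10-bit) */
    call("FA_DefectsTestImage");
    call("FA_CreateTestImage");
}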

And then looking for the registers that change. Would that give us some sort of hint?
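The "look for the registers that change" step amounts to diffing two snapshots of the same register range, one taken before the call sequence and one after. A minimal sketch follows. How the snapshots are captured on the camera is left out, and the function name and base address are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Compare two snapshots of the same register range and print every
 * 32-bit register whose value differs. Returns the number of changes. */
static size_t diff_snapshots(uint32_t base_addr,
                             const uint32_t *before,
                             const uint32_t *after,
                             size_t count)
{
    assert(before != NULL && after != NULL);
    size_t changed = 0;
    for (size_t i = 0; i < count; i++) {
        if (before[i] != after[i]) {
            printf("0x%08lX: 0x%08lX -> 0x%08lX\n",
                   (unsigned long)(base_addr + 4u * (uint32_t)i),
                   (unsigned long)before[i],
                   (unsigned long)after[i]);
            changed++;
        }
    }
    return changed;
}
```

Running the sequence once per bit depth and diffing each result against the 14-bit baseline would narrow down which registers depend on the argument.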

tin2tin · June 14, 2013, 08:12:12 AM

@ Chucho
Looks like a very interesting finding. I wonder what the devs can make of it?

g3gg0 · June 14, 2013, 10:16:44 AM

i am not 100% sure, but i remember something like that.
iirc it was setting up a bunch of registers that are related to the twoadd etc. stuff.
so it was not a simple register change that makes the CMOS send us 10 bit, but a parameter to one of those not-yet-understood modules.
if we understand what they do, we can redirect EDMAC output from CMOS to these modules etc.

that involves a lot of register setup.

rudi · June 14, 2013, 12:48:24 PM

Quote from: g3gg0 on May 23, 2013, 12:16:09 AM
yeah..
well, i implemented a memcpy using LDMIA/STMIA for LV buffer copying and this was a dead end.
so i tried to get EDMAC working.

I don't want to keep the idea of a 12-bit shifter alive, but I'm just interested in how many MB/s you achieved with LDMIA/STMIA.
The mysterious "d" seemed to reach about 30-40 MB/s on a smaller model.
And as far as I can see from the debug screenshots, a memcpy can reach over 70 MB/s. Is that correct? Was that LDMIA/STMIA?
Some years ago I did heavy x86/MMX/SSE2 optimisation, and a few days ago I looked at ARM optimisation.
As far as I can see (and from what "d" did), you can squeeze a lot of bubbles and stalls out by hand-optimizing.
Shifts are also free on ARM. Nearly every instruction can be combined with the barrel shifter at no cost. (I'm sure you are aware of this.)
And you have a lot of cache/flush/prefill options on ARM.
But again, I don't want to reanimate the idea; I'm just curious how fast an optimized LDMIA/STMIA memcpy on a 5DMK3 can be.

bjacklee · September 27, 2013, 02:47:02 PM

If only this is possible..

1berto · September 30, 2013, 12:36:42 PM

I don't understand any of this, but IMO compression is the way to go. That was the secret of RED cameras and the problem of the BMCC workflow...

If only you could find a way to compress the raw data so that the image doesn't get compromised and can still be written to the card.

Midphase · October 01, 2013, 10:28:43 AM

Sigh, for the nth time... it's not possible, since the Canon CPUs aren't fast enough to perform the necessary calculations. They can barely keep up with writing the data to a CF card as it is.

pascal · October 01, 2013, 11:24:54 AM

Quote from: Midphase on October 01, 2013, 10:28:43 AM
Sigh, for the nth time... it's not possible, since the Canon CPUs aren't fast enough to perform the necessary calculations. They can barely keep up with writing the data to a CF card as it is.

On the contrary: if you add CPU time for conversion, you remove CPU time from writing data.

Audionut · October 01, 2013, 11:57:24 AM

Read the thread fully before making assumptions!

Feel free to PM me to have this thread reopened, if you have code that makes bitdepth reduction in camera possible.

a1ex · May 11, 2016, 05:37:11 PM

Quote from: Audionut on October 01, 2013, 11:57:24 AM
[...] to have this thread reopened, if you have code that makes bitdepth reduction in camera possible.

I think I've got such code, based on g3gg0's raw_twk experiments.

https://bitbucket.org/hudson/magic-lantern/commits/c6fbba9

Timing: 120ms for a full-sized image (5936x3950).
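As a rough way to read that timing, the sketch below turns it into a pixel rate and compares it with what a video mode would need. It assumes the processing time scales linearly with pixel count and ignores any per-frame setup cost, neither of which is stated in this post. The 1920x1080 at 24 fps mode is only an illustrative target, and the function names are made up.

```c
#include <assert.h>
#include <stdint.h>

/* Pixels per second processed, given one frame and its processing time. */
static double pixels_per_second(uint32_t width, uint32_t height, double seconds)
{
    assert(seconds > 0.0);
    return (double)width * (double)height / seconds;
}

/* Pixels per second that a given video mode has to sustain. */
static double required_rate(uint32_t width, uint32_t height, double fps)
{
    assert(fps > 0.0);
    return (double)width * (double)height * fps;
}
```

pixels_per_second(5936, 3950, 0.120) comes out at roughly 195 million pixels per second, while required_rate(1920, 1080, 24.0) is about 49.8 million, so under these assumptions there would be headroom of a bit under 4x.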

Help is welcome - not just by testing (or asking for) the finished thing, but by trying to understand how Canon's image processor works, how to call it for other purposes (it can do lots of operations, such as add, sum, min, max, debayer, curves, filters, (m)jpeg compression, lossless raw compression... the main difficulty is understanding how to configure these image processing modules), and - of course - by turning this proof of concept into something useful.

Other relevant pages:

http://magiclantern.wikia.com/wiki/Register_Map
http://www.magiclantern.fm/forum/index.php?topic=13408
http://www.magiclantern.fm/forum/index.php?topic=6740.0
http://www.magiclantern.fm/forum/index.php?topic=16428.msg159735#msg159735

Happy hacking.

markodarko · May 11, 2016, 07:05:22 PM

I've just read the entire thread (I wasn't around here in 2013) and I'm a little confused. Without standing on anyone's toes by not helping from a development POV, can someone explain why @d apparently got this working but others say that the CPU is not fast enough? (Or did I misread?)

Thanks,

Mark.

a1ex · May 11, 2016, 07:33:26 PM

Read the benchmarks and do the math.

markodarko · May 11, 2016, 08:12:59 PM

Quote from: a1ex on May 11, 2016, 07:33:26 PM
Read the benchmarks and do the math.

:-D you take me for a learn-ed man, Mr. @a1ex.

I did take a look at these:

Quote from: a1ex on May 23, 2013, 10:26:21 AM

Impressive.

...but I don't know what I'm looking for to do any math on. Or where to find a standard 14-bit benchmark to compare against. I figured that you saying "impressive" was a good thing though. :-)

Cheers,

Mark.
