12-bit (and 10-bit) RAW video development discussion

Started by d, May 22, 2013, 10:58:34 PM

Previous topic - Next topic

0 Members and 1 Guest are viewing this topic.

mucher

Or the other way:

int x; // 14-bit brightness value
x = x * (1 << 12); // multiply by 2^12 -- note that ^ in C is XOR, so powers of two need shifts
x = x / (1 << 14); // divide by 2^14


mucher

But if x is an int and it gets divided by a very large number, it may end up as 0,


so I would rather change it to:

int x; //YUV brightness value

double y = x; //temporarily widen to double precision

y = y / (1 << 14); //divide by 2^14
y = y * (1 << 12); //multiply by 2^12

x = (int)y; //convert the value back to int

What do you say developers?
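For reference, the conversion described above written out as compilable C might look like this (a sketch only; the function name is made up, and since `^` in C is XOR, the powers of two are written as shifts):

```c
/* Hypothetical helper sketching the double-precision route proposed
 * above: widen to double, scale 14-bit down to 12-bit, truncate back.
 * Because both factors are exact powers of two, for integer inputs
 * this ends up identical to plain integer x / 4. */
int scale_14_to_12(int x)
{
    double y = x;            /* temporarily widen to double */
    y = y / (1 << 14);       /* divide by 2^14 = 16384 */
    y = y * (1 << 12);       /* multiply by 2^12 = 4096 */
    return (int)y;           /* truncate back to int */
}
```

Since powers of two only shift the floating-point exponent, the double detour is exact but buys no extra precision over integer division here.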

KMikhail

Why don't you just divide by 4? That's the same as a right shift by 2 bits (x >> 2).

Bit shifts were handy back in the 80286 era, when x*320 was computed as (x << 8) + (x << 6). But since then mul and div have become very cheap.

Granted, if your integer isn't right-aligned (least significant bits) you have to shift further to the right, and then back to the left. Masking would do the job too, but would require either a register or a memory op.

BTW, according to the spec, ARM9 has 16 32-bit MAD units; it should be more than capable of basic integer math, no?

EDIT:

I see, you are probably trying to get better rounding. For that purpose you can ADD/SUB something prior to dividing. But dividing and then multiplying a double won't give you that; it is called a floating-point number for a reason. There is a set of specialty rounding operations for doubles. However, I doubt double operations are fast on such hardware.

mucher

Quote from: KMikhail on May 25, 2013, 11:18:34 AM
Why don't you just divide by 4? That's the same as a right shift by 2 bits (x >> 2).

Bit shifts were handy back in the 80286 era, when x*320 was computed as (x << 8) + (x << 6). But since then mul and div have become very cheap.

Granted, if your integer isn't right-aligned (least significant bits) you have to shift further to the right, and then back to the left. Masking would do the job too, but would require either a register or a memory op.

BTW, according to the spec, ARM9 has 16 32-bit MAD units; it should be more than capable of basic integer math, no?

EDIT:

I see, you are probably trying to get better rounding. For that purpose you can ADD/SUB something prior to dividing. But dividing and then multiplying a double won't give you that; it is called a floating-point number for a reason. There is a set of specialty rounding operations for doubles. However, I doubt double operations are fast on such hardware.

What worried me was accuracy: using int might not be accurate enough, and if the data were too small, dividing by 2^14 might make them unusable at all, so I changed it to floating point. Then I worried that float might not be accurate enough either, so I wanted it to use double instead. A modern CPU might be fast enough to handle doubles, my wild guess; if not, you can still switch to float, which might already be accurate enough. But if I had the full source code, I would definitely compile it my own way before loading it onto my camera, including meddling a bit with that raw.c too.

BTW, ARM9's 16 x 32-bit MAD units are floating point, to my understanding.

KMikhail

Quote from: mucher on May 25, 2013, 11:51:47 AM
What worried me was accuracy: using int might not be accurate enough, and if the data were too small, dividing by 2^14 might make them unusable at all, so I changed it to floating point. Then I worried that float might not be accurate enough either, so I wanted it to use double instead. A modern CPU might be fast enough to handle doubles, my wild guess; if not, you can still switch to float, which might already be accurate enough. But if I had the full source code, I would definitely compile it my own way before loading it onto my camera, including meddling a bit with that raw.c too.

BTW, ARM9's 16 x 32-bit MAD units are floating point, to my understanding.

int x; // 14-bit number, in a 16/32-bit format
x = (x + 2) / 4; // rounds closer to normal rounding, not CPU truncation

// x is 12 bits now

2 => 1; 3 => 1; 4 => 1; 5 => 1; 6 => 2; 7 => 2; 8 => 2;

Double is 64-bit, and by now is totally fine on a modern CPU. But for raw handling in our Canons we need integer performance more, for packing all this raw data into something more space efficient. Floats, though, are nice to have at times too.
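The packing step mentioned here is worth sketching too. A hypothetical example (this is not ML's actual raw buffer layout, just an illustration of the pure shift-and-mask integer work involved): two 12-bit samples squeezed into 3 bytes.

```c
#include <stdint.h>

/* Pack two 12-bit samples into 3 bytes with an 8+4 / 4+8 split.
 * Purely illustrative; the real raw format may differ. */
void pack_two_12bit(uint16_t a, uint16_t b, uint8_t out[3])
{
    out[0] = (uint8_t)(a >> 4);                        /* top 8 bits of a */
    out[1] = (uint8_t)(((a & 0x0F) << 4) | (b >> 8));  /* low 4 of a, top 4 of b */
    out[2] = (uint8_t)(b & 0xFF);                      /* low 8 bits of b */
}
```

This is exactly the kind of shift/or/mask sequence where integer throughput, not floating point, sets the frame rate ceiling.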

mucher

Quote from: KMikhail on May 25, 2013, 12:08:01 PM
int x; // 14-bit number, in a 16/32-bit format
x = (x + 2) / 4; // rounds closer to normal rounding, not CPU truncation

// x is 12 bits now

2 => 1; 3 => 1; 4 => 1; 5 => 1; 6 => 2; 7 => 2; 8 => 2;

Double is 64-bit, and by now is totally fine on a modern CPU. But for raw handling in our Canons we need integer performance more, for packing all this raw data into something more space efficient. Floats, though, are nice to have at times too.

I don't quite understand your code.

ARM9 has 16 32-bit floating-point units. Let us assume a double calculation costs 4 times as much as a float calculation; then the 16 units' throughput is like 16/4 = 4, meaning 4 doubles can run at the same time, which translates to 4 frames at the same time. If all the calculation costs around 16-32 CPU cycles per frame (including time waiting for the CPU's task scheduling), the CPU should by far have enough power to process 24fps in real time, I reckon. So my modification should work, theoretically.

eyeland

We seriously need a canon engineer to defect (again?) and give us a few spoilers...
Daybreak broke me loose and brought me back...

1%

Why not hack at those registers in the noise thread... one of those probably enables 10/12/14 bit. Then nothing will have to run on the CPU.

g3gg0

the ARM used in DIGIC has no VFP afaik.
this is an extra option when licensing ARM's IP.

so processing any floating point can only be done in software.
Help us with datasheets - Help us with register dumps
magic lantern: 1Magic9991E1eWbGvrsx186GovYCXFbppY, server expenses: [email protected]
ONLY donate for things we have done, not for things you expect!

budafilms

For those who have videos in 10 bits, the question is whether that video isn't just the quality of a good bitrate in h264 with a prolost preset...?

mucher

Then we have to stick to the original int thing :o

But the color levels still go from 2^14 down to 2^12 or 2^10, right? Mighty developers

squig

Quote from: budafilms on May 26, 2013, 10:04:11 AM
For those who have videos in 10 bits, the question is whether that video isn't just the quality of a good bitrate in h264 with a prolost preset...?

This is a joke right?

g3gg0

when cutting the least significant 4 bits you will usually cut off the noise, which is said to be ~3-4 bits in bright areas

g3gg0

i just ran some code to read out the VFP registers and can confirm now that this HW has no VFP unit.

vicnaum

Quote from: g3gg0 on May 26, 2013, 01:08:50 PM
i just ran some code to read out the VFP registers and can confirm now that this HW has no VFP unit.

So there's no need for it then :-) There are many other (fast) ways of doing bit conversion without floating point calculations.

IliasG

Of course the best is to get ready-made 12-bit raw, or even 10-bit if it is gamma/log encoded. In fact I believe 10-bit log is better than 12-bit linear.

But if this hack is difficult or impossible, then the compression must be done on the CPU.

I think the way to go is 10-bit log or rec.709 gamma encoding via a lookup table. No floats, no calculations .. all precalculated in a table.
Then the reverse (linearization table) can be in raw2dng or in the dedicated exif tag.

BlackMagic, Sony, Nikon etc. do something similar.
BlackMagic encodes 16-bit linear raw into a 12-bit gamma rec.709 (or log) file. 12-bit log for 16-bit linear bitdepth sounds excessive, but we have to remember that BM uses only digital amplification for ISOs higher than the base (in fact it is just an exif tag .. "Baseline exposure"), so they need a bit more bitdepth.

ML needs 14-bit linear to 12-10 bit gamma or log, and together with the analog ISO (up to 3200-6400) the expected file quality is at the same level as BM for mid ISO, and higher at high ISO.

The question is how fast this lookup can be.

A lookup table with 16384 indexes and one data column with 1024 distinct values (10-bit) is OK, I think.

To make things better we can safely clip the 14-bit data at about 1984 (the black point is around 2048 - 64 = 1984), so any pixel with a 14-bit value <= 1984 will be mapped to 0 in 10-bit. This way we get a denser sampling of the useful range of values.

Looking at BlackMagic's and Nikon's linearisation tables ("Dump raw curve" with RawDigger), one can see that Nikon uses an extended linear area in the darks, with the exp area following for midtones and highlights. It is useful to keep data linear and unmanipulated in the darks .. useful to avoid posterization and color shifts there.

I chose 64 levels "below black" because BM does the same (256 levels in 16-bit) .. and to leave footroom if something goes wrong (black level). It could be half (32 levels) with no problem.

Ideally we need a gamma with a 16X multiplier in the linear part (so that nothing changes there going from 14-bit to 10), extending around 20-40 levels (10-bit) above the black level (Nikon uses 400 linear levels at 14 bits) .. and a relatively mild exp (^2.0) so that the highlight data stays relatively dense. Sadly my background in maths is weak and I cannot match the linear part with the exp part .. help needed ..

The ProPhoto gamma has a 16X linear multiplier, but its linear area is very small; it all falls in the "below black" area ..

xNiNELiVES

If the devs here get 12-bit working, I think it will be possible for the 6D to shoot at 1880x817 resolution. From photos taken in 14-bit and 12-bit raw, I found a 12-bit file to be about 79% of the size of a 14-bit one. So 1880x817, which requires 55MB/s, becomes 55 x 0.79 = 43.45MB/s. The highest 6D write speed I've seen is 41.7MB/s, so within 2MB/s of that; do you guys think it would be sustainable? Am I doing any of the math or analysis wrong?

Or is there a way to do real-time 24p compression? Something to allow for higher resolutions?

Audionut

Quote from: xNiNELiVES on May 27, 2013, 06:20:05 AM
Or is there a way to do real-time 24p compression? Something to allow for higher resolutions?

We should leave this thread to development discussion of making 12/10 bit encoding possible, before considering what advantages it will have with RAW recording.

vicnaum

Quote from: IliasG on May 26, 2013, 10:14:01 PM
I think the way to go is 10-bit log or rec.709 gamma encoding via a lookup table. No floats, no calculations .. all precalculated in a table.
Then the reverse (linearization table) can be in raw2dng or in the dedicated exif tag.

Yeah, I also thought about the lookup table method. It should be faster than doing the math.
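The inner loop this implies would be nothing but a masked load per pixel; a minimal sketch (function and table names are made up, and the table is assumed to be built elsewhere at init):

```c
#include <stdint.h>

/* Encode n raw samples through a precomputed 16384-entry table:
 * no multiply, no divide, no floating point in the loop. */
void encode_frame(const uint16_t *in, uint16_t *out, int n,
                  const uint16_t *lut)
{
    int i;
    for (i = 0; i < n; i++)
        out[i] = lut[in[i] & 0x3FFF];   /* mask to 14 bits, then look up */
}
```

On an in-order ARM the cost per pixel is essentially one load, one mask, one table load, one store, which is about as cheap as any conversion can get.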

xNiNELiVES

Quote from: Audionut on May 27, 2013, 06:27:15 AM
We should leave this thread to development discussion of making 12/10 bit encoding possible, before considering what advantages it will have with RAW recording.

Ok :) it may be an incentive for other developers not yet dedicated to implementing this feature; then again, they probably already knew...

mucher

Hello, Masters,

I just did some simple math to see which values will change while converting from 14-bit to 12-bit or 10-bit, using this formula:

We will try to find the brightness value in the 14-bit world that, after conversion to 12-bit using an int formula like x * (2^12) / (2^14), changes to the smallest value, either 1 or 0, in the int world:

int x; //the value to be found

formula:

x * (2^12) / (2^14) = 1

the result:

x = 4

So any value in the original 14-bit data smaller than 4 will be converted to 0 in the final 12-bit format; that is 4 values in total: 0, 1, 2, 3.

That means that in the original 14-bit data, only brightness levels 0, 1, 2, 3 are converted to 0 in the final 12-bit format. These 4 values are only a tiny fraction of the 14-bit world, about 0.024% (4 / 2^14) of all values. The good news is that this part of the data would not be transferred over HDMI anyway (I heard that HDMI usually does not pass brightness levels 0 - 5 of the 0 - 255 range on to display equipment like TVs and projectors).

For the 10-bit scenario:

formula:

x * (2^10) / (2^14) = 1

x = 16

There will be 16 values in total converted to 0 when using the int formula x * (2^10) / (2^14); that is 0.098% of all the values in the 14-bit data format, and these values might be dropped by the HDMI interface anyway.

So I think that using the int formula x * (2^12) / (2^14) for 12-bit, or x * (2^10) / (2^14) for 10-bit, might be a very viable method.
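These counts are easy to double-check by brute force; a small sketch (the helper name is mine) that walks all 16384 input values and counts how many collapse to zero under the multiply-then-divide formula:

```c
/* Count how many 14-bit values map to 0 under x * 2^out_bits / 2^14,
 * multiplying first so intermediate values stay nonzero. The worst
 * case intermediate, 16383 * 4096, still fits in a 32-bit int. */
int count_zeros(int out_bits)
{
    int x, n = 0;
    for (x = 0; x < (1 << 14); x++) {
        if (x * (1 << out_bits) / (1 << 14) == 0)
            n++;
    }
    return n;
}
```

count_zeros(12) gives 4 and count_zeros(10) gives 16, matching the 0.024% and 0.098% figures above.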

mucher

The bad news is that every 4 values in the 14-bit data map to the same value in 12-bit form, but then I realized that the 12-bit world has 4 times fewer values than the 14-bit world anyway.

What do you say masters? ;D

vicnaum

HDMI doesn't belong here in any way, because it's a display transfer format; raw is not made for display but for processing first. For transfer via HDMI, sRGB values are compressed to 16-235 (in the Rec.709 standard), and then expanded back to the full 0-255 range for display on the TV (as far as I remember).

It's obvious that we'll keep only every 16th value if we compress 14->10 bit (each bit is a power of 2, so 2^4 = 16), so we'll get 1024 values instead of 16384. But the good news is that's still more than sRGB's 255 (although we need to try it: how will it raw-convert, will there be banding, etc, etc).

Btw, to eliminate banding, we can use a bit of dithering while applying the lookup table (though that's not as good as dithering in picture space).

Anyway, all this is idle talk until we get confirmation that this lookup-table thing is realizable and fast on our little slow ARMs.
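The dithering idea, as a minimal sketch (the xorshift generator is just a cheap illustrative choice, not a proposal): add up to one quantization step of noise before dropping the low bits, so the rounding error turns into noise rather than banding:

```c
#include <stdint.h>

static uint32_t rng_state = 1;   /* cheap PRNG state for the dither */

static uint32_t xorshift32(void)
{
    rng_state ^= rng_state << 13;
    rng_state ^= rng_state >> 17;
    rng_state ^= rng_state << 5;
    return rng_state;
}

/* Dithered 14 -> 10 bit reduction: add noise in [0, 15] (one 10-bit
 * quantization step in 14-bit units), clamp, then drop 4 bits. */
uint16_t dither_14_to_10(uint16_t x)
{
    uint32_t v = (uint32_t)x + (xorshift32() & 15);
    if (v > 16383)
        v = 16383;               /* keep within 14-bit range */
    return (uint16_t)(v >> 4);
}
```

Combining this with a lookup table means indexing the table with the dithered value instead of shifting, which is why it is cheap enough to consider even on the camera.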

mucher

Another thought.

In x * (2^12) / (2^14), one can only multiply first, or every value will become 0 if one divides by 2^14 first. So the worst case can be this:

The largest number in the 14-bit data times 2^12, and then divided by 2^14, can be painstakingly slow, but worth a try.

the largest value: 2^14 * 2^12 / 2^14, that means 16384 * 4096 (which is 67108864) / 16384

And we can use precalculated values in the program too: instead of writing x * (2^12) / (2^14) for easier reading, we can always write x * 4096 / 16384.

As a last resort, we can use unsigned int instead of int, can't we?

ARM might well be fast enough, because int math is pretty fast, usually 1 to a few CPU cycles in the x86 world, and if we use unsigned int it can be even faster.
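On the overflow worry: multiplying first is safe in plain signed 32-bit math, since the worst real sample is 16383 and 16383 * 4096 = 67,104,768, far below INT_MAX (about 2.1 billion), so no unsigned trick is needed. A minimal sketch with the constants written out, as suggested:

```c
#include <limits.h>

/* Multiply-first 14 -> 12 bit conversion with precalculated constants.
 * Worst-case intermediate: 16383 * 4096 = 67104768, well under INT_MAX,
 * so a plain signed int is enough (assuming 32-bit int). */
int to_12bit(int x)
{
    return x * 4096 / 16384;   /* multiply first, then divide */
}
```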

mucher

As for the rounding issue, I don't know what these DIGICs will do; probably they will drop everything after the decimal point to speed things up. But the difference would be like 2394 vs 2395, still far more accurate than the human eye can tell.