After a bunch of investigation, here is my current state of understanding (long post!).
How Canon determines white balance
(1) Choose a subset of the sensor pixels. More on this later.
(2) Add up the (black-level-subtracted) ADC values from this pixel set. These values are recorded as "Raw Measured RGGB", e.g.:
Raw Measured RGGB : 99230 126060 976974 359041
In this 50D dual ISO shot, the RG pair appears to only sample the primary ISO (100) while the GB pair appears to only sample the recovery ISO (800), which strongly indicates that there is line skipping going on in step (1). If the ADC values are not being scaled down in this sum, then it could mean that the white balance is being computed from as few as ~100 pixels.
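As a rough check on that pixel count, here is a minimal sketch of the lower bound implied by the sums above. It assumes the 50D's raw values are 14-bit (at most 16383 per pixel) and that nothing is scaled before summing. The variable names are only for illustration.

```python
import math

# "Raw Measured RGGB" from the dual ISO shot above
raw_measured = [99230, 126060, 976974, 359041]

# Assumption: 14-bit ADC, so no single pixel can contribute more than 16383
# (black-level subtraction only lowers this ceiling).
ADC_MAX = 2**14 - 1

# Fewest pixels that could have produced each sum
min_pixels = [math.ceil(total / ADC_MAX) for total in raw_measured]
print(min_pixels)
```

Even the largest sum could be reached by about 60 fully saturated pixels, so a sample on the order of a hundred pixels is plausible if typical values sit below saturation.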
(3) Divide a large number, proportional to the number of pixels in that channel, by each of the previous values (essentially, this gives the inverse of the average ADC value). These values are recorded as "WB RGGB Levels Measured", e.g.:
WB RGGB Levels Measured : 5160 4224 582 1457
Because the dividend appears to be channel-dependent, I suspect that Canon masks out a subset of the pixels in each channel (possibly those that fall below a certain threshold) to keep them from biasing the white balance.
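To make the channel dependence concrete, this small sketch multiplies each "WB RGGB Levels Measured" value by the matching "Raw Measured RGGB" sum from the examples above. It assumes step (3) is a plain division, in which case the product recovers the dividend up to rounding of the stored level.

```python
raw_measured = [99230, 126060, 976974, 359041]   # Raw Measured RGGB
levels_measured = [5160, 4224, 582, 1457]        # WB RGGB Levels Measured

# level ~= dividend / raw_sum, so dividend ~= level * raw_sum
implied_dividends = [lvl * raw for lvl, raw in zip(levels_measured, raw_measured)]

for name, dividend in zip(["R", "G1", "G2", "B"], implied_dividends):
    print(f"{name}: {dividend:.3e}")
```

The four products range from about 5.1e8 to 5.7e8, a spread of roughly 10%, which is more than rounding of the stored levels would explain.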
(4) Transform the "WB RGGB Levels Measured" values into a temperature ("Color Temp Measured") and a tint (not recorded in the CR2).
Color Temp Measured : 10719
(5) If the temperature is outside of some "acceptable" range, choose a better temperature through some algorithm. This value is recorded as "Color Temp Auto", e.g.:
Color Temp Auto : 7448
That is, Canon decided that 10719 K was far too high, and set a new temperature of 7448 K. As an alternate example, from a regular shot in tungsten lighting:
Color Temp Measured : 2400
Color Temp Auto : 3611
Canon decided that the "Measured" temperature was too low, and raised it. I suspect this is why I typically can't rely on AWB for tungsten lighting.
(6) Using the (corrected) temperature and the previously determined tint, which has not been changed, invert the transform in step (4). These values are stored in "WB RGGB Levels Auto", e.g.:
WB RGGB Levels Auto : 2970 1024 1024 1369
Even when the temperature is not changed, the "WB...Measured" and "WB...Auto" values can be slightly different, which suggests that this transform-inverse-transform step always takes place.
(7) Assuming the camera is set to AWB, copy these values to "WB RGGB Levels As Shot"
WB RGGB Levels As Shot : 2970 1024 1024 1369
Color Temp As Shot : 7448
and, on the 50D at least, compute "Red Balance" and "Blue Balance"
WB RGGB Levels : 2970 1024 1024 1369
Blue Balance : 1.336914
Red Balance : 2.900391
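The two balance values are consistent with simple ratios of the red and blue levels to green. Here is a quick check against the numbers above; the ratio interpretation is inferred from these values, not taken from any Canon documentation.

```python
wb_levels = [2970, 1024, 1024, 1369]   # WB RGGB Levels: R, G1, G2, B

r, g1, g2, b = wb_levels
red_balance = r / g1    # red level relative to green
blue_balance = b / g2   # blue level relative to green

print(f"Red Balance  : {red_balance:.6f}")
print(f"Blue Balance : {blue_balance:.6f}")
```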
White balance approaches for dual ISO shots
The convenient approach is to use the EXIF information to construct a better white balance.
- For cameras like the 5D3, we may be fortunate that the RGGB values are always determined from one ISO (as opposed to mixing them together), and so you'll just have to set ISOs accordingly. For example, 100/800 might have correct white balance, while 800/100 does not. If the ISOs are mixed, then all balances will be thrown off by saturation, with no obvious way to correct them.
- For cameras like the 50D, it looks to be "easy", because you can just renormalize "WB RGGB Levels Measured" (see my earlier post). One color balance (i.e., red or blue) will be correct, and the other one may be somewhat off due to saturation. Note that using these corrected "WB RGGB Levels Measured" values for "WB RGGB Levels As Shot" values will circumvent the sanity check that Canon applies to the temperature.
In either case, the "WB RGGB Levels As Shot" values need to be converted in cr2hdr and fed into dng_set_wbgain.
The alternate approach is to estimate the white balance in cr2hdr, since we're looking at all of the raw data anyway. We would just need to mimic steps (1)–(3) and (7), and then feed the values into dng_set_wbgain. It'd be very difficult to replicate steps (4)–(6) because we don't know what Canon's transforms and algorithm are.
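As a rough illustration of what mimicking steps (1)–(3) could look like, here is a minimal Python sketch (not cr2hdr code, which is C). Everything in it is an assumption: the function name estimate_wb_levels, the RGGB layout, masking of clipped pixels, and scaling so that green comes out at 1024 as in the EXIF examples above. It is a plain per-channel average, not Canon's algorithm.

```python
def estimate_wb_levels(raw, black, white, rows=None):
    """Per-channel average estimate from an RGGB mosaic (list of rows of ints).

    rows: optional iterable of row indices to sample (stand-in for step 1).
    Returns levels for R, G1, G2, B scaled so the mean green level is 1024.
    """
    sums = [0, 0, 0, 0]
    counts = [0, 0, 0, 0]
    for y in (range(len(raw)) if rows is None else rows):
        for x, value in enumerate(raw[y]):
            if value >= white:          # assumed mask: ignore clipped pixels
                continue
            ch = 2 * (y % 2) + (x % 2)  # RGGB layout: 0=R, 1=G1, 2=G2, 3=B
            sums[ch] += value - black   # step (2): black-subtracted sum
            counts[ch] += 1
    if min(sums) <= 0:
        raise ValueError("a channel has no usable signal")
    means = [s / c for s, c in zip(sums, counts)]
    green = (means[1] + means[2]) / 2
    # step (3): level is inversely proportional to the channel mean
    return [round(1024 * green / m) for m in means]


# Tiny synthetic mosaic: R=1100, G=2100, B=1600 with a black level of 100
raw = [[1100, 2100, 1100, 2100],
       [2100, 1600, 2100, 1600]] * 2
levels = estimate_wb_levels(raw, black=100, white=16383)
print(levels)
```

For a dual ISO frame, rows would be restricted to lines of a single ISO so that the two exposures are not mixed. The resulting levels would still need converting to whatever form dng_set_wbgain expects.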