Based on the data from Audionut, I've computed how much noise we gain (or lose) by increasing each of these 3 gains.
Think of ISO 100 vs 200: you increase the gain and you get lower noise (cleaner shadows). It makes sense to keep this kind of gain at a large value (you may clip highlight detail, but you get cleaner shadows in return). So, the 0xFE gain is one of the gains that help reduce noise.
Also think of ISO 3200 vs 6400: you increase the gain, but there's little or no improvement in noise. You want to keep this kind of gain at a small value (because you may clip highlight detail and you get nothing in return). The other two gains have only a minor impact on the measured noise, so we want to keep them as low as possible.
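To make the two cases above concrete, here is a minimal sketch of a textbook two-source noise model: one uncorrelated noise source before the amplifier and one after it, added in quadrature. This is only an illustration, not the method used for the numbers in this post; the function names and the noise values (arbitrary units) are made up and are not Audionut's measurements.

```python
import math

def input_referred_noise(gain, noise_before, noise_after):
    """Total noise referred to the amplifier input.

    Assumes two uncorrelated sources: one injected before the amp
    (amplified together with the signal) and one injected after it.
    """
    return math.sqrt(noise_before**2 + (noise_after / gain)**2)

def improvement_in_stops(gain, noise_before, noise_after):
    """Stops of shadow noise improvement obtained by doubling `gain`."""
    n1 = input_referred_noise(gain, noise_before, noise_after)
    n2 = input_referred_noise(2 * gain, noise_before, noise_after)
    return math.log2(n1 / n2)

# Hypothetical values, arbitrary units (NOT measured data).
NOISE_BEFORE = 1.0   # noise upstream of the amplifier
NOISE_AFTER = 8.0    # noise downstream of the amplifier

# Low gain: downstream noise dominates, so doubling the gain helps a lot
# (the "ISO 100 vs 200" case).
low_gain_case = improvement_in_stops(1, NOISE_BEFORE, NOISE_AFTER)

# High gain: upstream noise dominates, so doubling the gain barely helps
# (the "ISO 3200 vs 6400" case).
high_gain_case = improvement_in_stops(32, NOISE_BEFORE, NOISE_AFTER)

print(f"low gain:  {low_gain_case:.2f} stops per doubling")
print(f"high gain: {high_gain_case:.2f} stops per doubling")
```

In this model, a gain that gives close to one stop of improvement per doubling has significant noise injected downstream of it, while a gain with almost no effect has little noise left after it. That is the kind of reasoning one could apply to the measured numbers, assuming the real chain behaves roughly like this model.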
Now, from these numbers, can we draw a conclusion regarding the position of each amp in the processing chain? Can we say the order is CMOS[0] -> 0xFE -> 888x + SaturateOffset -> 8/9/A/B -> ADC?
(Note that I've renamed them back to their register addresses, because I'm no longer sure whether 8/9/A/B comes before 888x or vice versa.)