When I snap a normal picture, the field of view of my sensor is laid out like a sheet over the pixels, which we could consider to be like lots of red party cups fitted closely together.
Looking at the sheet as a whole, we'll see the outlines of lots of rings (the rims of the cups), but the majority of the sheet will be drooping into the cups. In this analogy, most of the light reaching the sensor will end up illuminating one pixel or another.
In the lite version of this proposal, I want to discuss the possibility of capturing the parts of the FOV that don't ultimately land on a pixel, i.e., the parts that land on the rim of a cup or between rims. By using small vibrations, which I think could be generated by an image stabilizer or by the environment, we can do the equivalent of wiggling the sheet around. As we wiggle it, the parts of the sheet that previously landed between the cups will eventually end up in one cup or another. Thus, we should be able to take a handful of pictures of the same scene, combine them, and end up with slightly higher resolution.
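As a rough illustration of the combining step, here's a minimal 1-D sketch. All the numbers (a 400-sample scene, 8 fine samples per coarse pixel) are purely illustrative, not an optical model: a coarse "sensor" averages groups of underlying samples, and interleaving several exposures taken at known sub-pixel shifts yields a denser sampling than any single exposure.

```python
import numpy as np

# Illustrative sketch: a 1-D "scene" of 400 fine samples imaged by a
# coarse sensor whose pixels each average 8 underlying samples.
scene = np.sin(np.linspace(0, 4 * np.pi, 400))
factor = 8  # fine samples per coarse pixel

def snap(shift):
    """One coarse exposure with the sensor offset by `shift` fine samples."""
    window = scene[shift:shift + 49 * factor]  # 49 whole pixels fit at any shift
    return window.reshape(-1, factor).mean(axis=1)

frames = {s: snap(s) for s in range(factor)}

# Interleave: pixel i of the frame shifted by s sits at fine position
# i * factor + s, so the frames mesh into one dense grid.
dense = np.empty(49 * factor)
for s, frame in frames.items():
    dense[s::factor] = frame

print(len(frames[0]), "samples per frame ->", len(dense), "combined")
# -> 49 samples per frame -> 392 combined
```

Note that each combined sample is still an 8-sample average, so this raises the sampling density rather than undoing the blur, which matches the modest "slightly higher resolution" claim above.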
That seems possible, but not all that useful. The main hope I have is to use this far more dramatically.
(The following is just a thought experiment, and I make no attempt to describe this method mathematically).
Let's imagine we have a single-pixel camera (the pixel being composed of the typical three color sub-sensors). Let's consider two different lenses focusing two different fields of view: one focused narrowly (A), and one wide-angle (B).
Regardless of the narrowness of the field (assuming it's not so narrow as to require QM considerations), there will always be some blending of details. This is clearest with B, where we might imagine a 90 degree field of view focused on our single pixel. Composing an image where a green lawn takes up the lower half of this FOV and a red building takes up the upper half will result in an image that is a blended yellow. We can still refer to the original three color channels to see the contribution each makes, but no information will be recorded about where within the FOV each contribution comes from. But this is only true of a single snapshot.
With a narrower focus, blending will still occur, but due to the nature of things, the narrower the view, the less the disparity across the FOV. If we use lens A, with say a 1 degree FOV, from the same distance that we used lens B, we can represent the image much more accurately by stitching together something like a thousand pictures taken with small angle adjustments. This isn't miraculous, as we've essentially used our single pixel to replicate temporally the job that a higher-res sensor could do spatially, but it segues into my next point.
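The stitching idea can be sketched as a raster scan: a single averaging "pixel" is pointed at one small patch after another, and the readings are assembled into an image. Everything here (the 40x40 scene, the 4x4 patch per pointing) is an invented toy standing in for the angle adjustments.

```python
import numpy as np

# Toy scene: "red house" encoded as 1.0 on top, "green grass" as 0.0 below.
scene = np.zeros((40, 40))
scene[:20, :] = 1.0

fov = 4  # the single pixel averages a 4x4 patch per pointing

# One exposure per pointing; 100 pointings build a 10x10 image.
image = np.array([[scene[r:r + fov, c:c + fov].mean()
                   for c in range(0, 40, fov)]
                  for r in range(0, 40, fov)])
print(image.shape)  # -> (10, 10)
```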
Let's aim lens A at some transition point between the red house and the green grass. For the sake of this argument, let's consider an infinitely sharp transition line between the physical house and grass, meaning that any blending of red and green in our sensor is due to the blending described above and isn't inherent in the scene. How might we reconstruct an image that shows this contrast even beyond what the resolution and FOV combination might allow? That is the purpose of this analysis.
We begin at the transition point described above and allow our camera to vibrate, to move randomly, with the maximum displacement amounting to half a degree of FOV. So, as our sensor moves around, it will record data from a total area of 2 degrees of FOV. To simplify this down to what's relevant, let's consider only Y axis displacement. Let Y=0 be our starting position, where the FOV is cleanly divided horizontally with the red house on top and the green grass on bottom. Let Y=-1 be half a degree moved down, so the grass fills 100% of the FOV; let Y=+1 be likewise but with regard to the house. Snapping a picture at Y=-1 will result in a purely green image; snapping a picture at Y=+1 will result in a purely red image; snapping a picture at Y=0 will result in a yellow image; snapping a picture anywhere in this range will result in some intermediate level of blending.
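The sweep just described amounts to a linear mixing model, which a small sketch makes concrete (the RGB values and the linear blend are my own simplification):

```python
# Displacement y runs from -1 (all grass) to +1 (all house); the red
# house then fills (1 + y) / 2 of the FOV and the grass fills the rest.
def reading(y):
    red_fraction = (1 + y) / 2
    house = (255, 0, 0)   # pure red
    grass = (0, 255, 0)   # pure green
    return tuple(round(red_fraction * h + (1 - red_fraction) * g)
                 for h, g in zip(house, grass))

print(reading(-1))  # -> (0, 255, 0): pure green
print(reading(0))   # -> (128, 128, 0): the blended yellow at the transition
print(reading(+1))  # -> (255, 0, 0): pure red
```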
We can intuit from the setup described above that it should be possible to construct an image with a higher level of contrast than the sensor can provide. It seems we can sharpen the line of contrast as much as we want, regardless of the resolution of the sensor or the FOV of the focus, by using a couple of additional pieces of information.
In a world where infinitely precise measurements were possible, a single pixel could theoretically resolve a quasar, but we have some constraints on the information we can obtain.
We will be limited by noise in the environment, noise in the sensor/camera, the A/D converter that changes the analog input of each color sub-pixel into some number of bits (which discretizes the signal and so discards some information), and the accuracy with which we can measure the position of the camera at the moment each picture was taken (the key concept of this method).
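As one concrete example of the A/D limit (the 8-bit depth here is just an assumption), two blends that differ by less than one quantization step become indistinguishable after digitization:

```python
def quantize(value, bits=8):
    """Round an analog level in [0, 1] to one of 2**bits ADC codes."""
    levels = 2 ** bits - 1
    return round(value * levels) / levels

# Blends 0.500 and 0.501 fall into the same 8-bit code, so any edge
# information carried by that difference is lost at digitization.
print(quantize(0.500) == quantize(0.501))  # -> True
print(quantize(0.500) == quantize(0.510))  # -> False
```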
To put this all together, we would begin with a certain number of pictures taken from a single vantage point while the camera was in some way vibrating or moving very subtly from one frame to the next. Within the metadata of each image, we would include the position of the camera (either by measuring absolute position in some way, or by using an accelerometer of some kind to measure the acceleration of the camera from frame to frame). Assuming we can measure the position with an accuracy finer than the FOV of a single pixel, we should be able to use this method to improve the resolution of a single pixel by combining the position information with the color levels at that position, and comparing this to the color levels at nearby positions. As described, this is all scalable, and so it should be possible to implement even in cameras of very high resolution.
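To make the pipeline concrete, here is a sketch under stated assumptions: the edge position, noise level, and number of frames are all invented, and the "reconstruction" is a plain least-squares fit I've chosen for illustration, not something the method prescribes.

```python
import numpy as np

# Assumed setup: a sharp house/grass edge sits at sub-pixel offset `edge`
# (in units of one pixel-FOV); each jittered exposure records the red
# fraction seen by the single pixel plus sensor noise, and the camera
# position y is known from the per-frame metadata.
rng = np.random.default_rng(1)
edge = 0.137  # ground truth we hope to recover

def exposure(y):
    """Red fraction at position y (FOV width 1), with additive noise."""
    red = np.clip(y - edge + 0.5, 0.0, 1.0)
    return red + rng.normal(0.0, 0.02)

ys = rng.uniform(-0.5, 0.5, 500)          # vibration positions, from metadata
ms = np.array([exposure(y) for y in ys])

# Away from saturation the blend is linear in y: m = (y - edge) + 0.5.
# A least-squares line through the unsaturated readings locates the edge
# far more finely than the pixel itself could.
mask = (ms > 0.05) & (ms < 0.95)
slope, intercept = np.polyfit(ys[mask], ms[mask], 1)
edge_est = (0.5 - intercept) / slope      # position where the blend is 50/50
print(f"true edge {edge:.3f}, recovered {edge_est:.3f}")
```

With 500 frames and 2% sensor noise the fit should land within about a hundredth of a pixel-FOV; any error in the position measurements discussed above would add directly to that budget.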
In theory, any camera coupled with an accelerometer of sufficient accuracy could be used to increase resolution beyond that of its sensor. Though random vibrations from the environment could be used to achieve the required displacement, they would be uncontrolled in direction and magnitude, and so more pictures of the same scene would need to be taken than strictly necessary. The optimal setup would be a mechanism for very controlled movements that could move the sensor in precisely calibrated ways, so as to reduce redundancy and allow for full coverage.
I don't know the details of how accurately we can measure positions in this way, and depending on that accuracy, this method might be lost to noise. Still, I found the thought experiment interesting, and would appreciate thoughts and elaborations.