When we do GPU-based image processing with the Fast CinemaDNG Processor software, we have to bear in mind that CPU and SSD performance also plays an important role. For every image we have to read the file from disk, load it into memory, parse it, and decode the DNG data, and only then can we upload the uncompressed raw data into GPU memory for image processing and display.
The main bottleneck on the CPU is DNG decoding. DNG images are typically encoded with a lossless compression algorithm, usually Lossless JPEG.
The Lossless JPEG algorithm is essentially serial, so the GPU can't help with decoding. To speed up decompression on the CPU, one can decode each tile, or each entire image, in a separate thread, and one can also accelerate the Huffman decoding algorithm itself.
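The per-tile threading idea can be sketched in a few lines of Python. This is only an illustration, not the Fastvideo decoder: `decode_tile` and `decode_tiles_parallel` are hypothetical names, and `zlib` stands in for a real Lossless JPEG tile decoder so that the example runs without extra libraries. The point is that independently compressed tiles can be handed to a thread pool and collected in their original order.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor


def decode_tile(compressed: bytes) -> bytes:
    # Stand-in for a Lossless JPEG tile decoder (assumption for this sketch);
    # zlib is used only because it ships with Python.
    return zlib.decompress(compressed)


def decode_tiles_parallel(tiles, workers=4):
    # Each tile is compressed independently, so tiles can be decoded
    # concurrently. map() yields results in the original tile order.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(decode_tile, tiles))


# Build eight fake "tiles" of raw data and compress each one separately.
raw_tiles = [bytes([i]) * 4096 for i in range(8)]
compressed_tiles = [zlib.compress(t) for t in raw_tiles]

decoded_tiles = decode_tiles_parallel(compressed_tiles)
```

In a real decoder the worker would be native code, and the same pattern applies when whole frames rather than tiles are distributed across threads.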
We have implemented both methods: we can do Lossless JPEG decoding in multithreaded mode, and we have also optimized the DNG decoding process itself on the CPU. It is difficult to say exactly how fast the new DNG decoder is, because decoding performance depends strongly on image content. The benchmarks below correspond to the best and the worst cases of Lossless JPEG decoding in a multithreaded application, and they give an idea of multithreaded decoding performance on a multicore CPU.
16-bit image, compressed to 10.4 bpp (lossless compression)
LJ92 (library liblj92): 266 MPix/s
Fastvideo LJ Decoder: 407 MPix/s
12-bit image, compressed to 5.6 bpp (lossless compression)
LJ92 (library liblj92): 284 MPix/s
Fastvideo LJ Decoder: 475 MPix/s
These results show that realtime DNG decoding on the CPU is possible for DNG series at 4K resolution and above. Decoder optimization, vectorization, and multithreading are the key factors in achieving high decoding performance.
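To make the realtime claim concrete, the benchmark throughput can be converted into an upper bound on frames per second. The sketch below uses the 16-bit figures listed above and assumes a 4096 x 2160 frame; that frame size and the helper name `max_fps` are assumptions for illustration. It counts decoding only, so file reading, parsing, and GPU upload would lower the real figure.

```python
def max_fps(decode_mpix_per_s: float, width: int, height: int) -> float:
    # Upper bound on frame rate if decoding were the only cost.
    frame_mpix = width * height / 1e6
    return decode_mpix_per_s / frame_mpix


# Decoding rates for the 16-bit image from the benchmarks above (MPix/s).
rates = {"liblj92": 266.0, "Fastvideo LJ Decoder": 407.0}

# Assumed 4K frame size; the benchmark listing does not state one.
width, height = 4096, 2160

fps = {name: max_fps(rate, width, height) for name, rate in rates.items()}
for name, value in fps.items():
    print(f"{name}: {value:.1f} fps")
```

Under these assumptions the bound is roughly 30 fps for liblj92 and roughly 46 fps for the Fastvideo decoder, which leaves headroom over common playback frame rates for 4K footage.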
More detail on the benchmarks, and other information about Lossless JPEG decoding on the CPU, is available at the following link:
http://www.fastcinemadng.com/info/jpeg/lossless-jpeg-decoder.html
Our new DNG decoder helps to reduce CPU load. Earlier, because DNG decoding on the CPU was not fast enough, we ran into problems with smooth video playback of 4.6K footage from the BMD URSA. Now, on a good PC, playback no longer stutters, even when the GPU-based denoiser is switched on.