MLV App 1.14 - All in one MLV Video Post Processing App [Windows, Mac and Linux]

Started by ilia3101, July 08, 2017, 10:19:19 PM



vastunghia

Quote from: names_are_hard on January 22, 2023, 08:44:00 PM
I think we're learning that old Macs are slow, not new Macs are fast :)

Ahah, maybe, yes.

But actually, the fact that just switching to updated libs / a different compiler gave us a 300% performance increase tells me that the code optimization is far from perfect. And that could largely explain the observed performance differences between machines.

Just my 2 cents.

EDIT: btw I tried the MLV App official 1.14 release on the Windows partition of my x86 Mac, and in terms of playback it sits just below the latest local build with Qt6 + llvm15 on macOS (but still much faster than the official 1.14 build on macOS).
5D3 for video
70D for photo

Mattia


masc

Quote from: vastunghia on January 22, 2023, 11:24:34 PM
EDIT: btw I tried the MLV App official 1.14 release on the Windows partition of my x86 Mac, and in terms of playback it sits just below the latest local build with Qt6 + llvm15 on macOS (but still much faster than the official 1.14 build on macOS).
The official x86 macOS build was compiled with the old llvm@6 in order to support all macOS versions since 10.9.5. It is possible that this is why it is slower.
5D3.113 | EOSM.202

70MM13

Are we getting close to a new MLV App official version?
I'd like to try it again and compare it to Resolve, and any extra speed for the Windows version would be most welcome!

iaburn

While cleaning up and bringing back the custom parameters to the dual ISO processing, I've noticed that activating "fullres reconstruction" is what causes the pink stripes in the highlights, even with the updated code.
There is a comment from the programmer:

/* reconstruct a full-resolution image (discard interpolated fields whenever possible) */
/* this has full detail and lowest possible aliasing, but it has high shadow noise and color artifacts when high-iso starts clipping */


There is a hidden option in the current version of MLV App to toggle fullres reconstruction on and off, but it is disabled. I enabled the option, and with the updated code it works as expected: turning it off fixes the artifacts, but in the "old" version the image is totally broken.
I'm not going to try to find out why; I'm happy that it's broken, otherwise I would have updated the code for nothing xD

Activating "fullres reconstruction" improves the resolution in the shadows but can break the highlights, so I will try to add an option to do fullres reconstruction only on the shadows for these cases.
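
As a rough sketch of how such an option could work (this is not MLV App code; output, halfres, shadow_limit and the loop bounds are made-up names for illustration), the full-resolution result would only be kept where the pixel is dark enough, falling back to the safer blend everywhere else:

/* hypothetical shadow-only fullres blend */
uint32_t shadow_limit = 8000;                  /* illustrative raw-level threshold, would need tuning */
for (int y = 0; y < h; y ++)
{
    for (int x = 0; x < w; x ++)
    {
        uint32_t safe = halfres[x + y*w];      /* interpolated, artifact-free reconstruction */
        uint32_t full = fullres[x + y*w];      /* detailed but fragile reconstruction */
        /* dark pixel: take the extra detail; bright pixel: keep the safe result */
        output[x + y*w] = (safe < shadow_limit) ? full : safe;
    }
}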

masc

Sometimes it is good that we just hide features instead of deleting them when they are not working... nice find!  8)

Quote from: 70MM13 on January 25, 2023, 05:30:36 PM
Are we getting close to a new MLV App official version?
I'd like to try it again and compare it to Resolve, and any extra speed for the Windows version would be most welcome!
Quote from: Mattia on January 23, 2023, 12:18:51 PM
Could anyone share an updated build of MLV App for Windows?
Each commit and every bit of dev work brings us closer... but there is still a way to go. iaburn just started porting dual ISO and has already done a great job. Thank you!

Don't expect the Windows version to get faster for now - this was a macOS compiler optimization. And Windows offers no hardware support for ProRes encoding.
As compiling MLVApp is really very easy, you could compile any commit you like, test it and report back.
5D3.113 | EOSM.202

BatchGordon

Quote from: iaburn on January 25, 2023, 09:37:40 PM
Activating "Fullres reconstruction" improves the resolution on the shadows but can break highlights, so I will try to add an option to do fullres reconstruction only on the shadows for these cases

From the documentation of a1ex's dual ISO algorithm, full-resolution reconstruction should only be possible for the midtones, and impossible for both highlights and shadows.
This is because only the midtones fall into the "sensitive" part of both the low and high ISO lines.


iaburn

Quote from: BatchGordon on January 26, 2023, 10:59:30 AM
From the documentation of a1ex's dual ISO algorithm, full-resolution reconstruction should only be possible for the midtones, and impossible for both highlights and shadows.
This is because only the midtones fall into the "sensitive" part of both the low and high ISO lines.

You are right, but I was talking about one step before the final blending, where they build the non-interpolated fullres image out of the interpolated, exposure-corrected bright and dark images. For context:
Interpolated bright image example:


Interpolated dark image example:


Then they build an initial fullres image using pixels from the bright image if it's a bright row (and not overexposed in theory), or from the dark image if it's a dark row:

for (int y = 0; y < h; y ++)
{
    for (int x = 0; x < w; x ++)
    {
        if (BRIGHT_ROW)
        {
            uint32_t f = bright[x + y*w];
            /* if the brighter copy is overexposed, the guessed pixel for sure has higher brightness */
            fullres[x + y*w] = f < (uint32_t)white_darkened ? f : MAX(f, dark[x + y*w]);
        }
        else
        {
            fullres[x + y*w] = dark[x + y*w];
        }
    }
}



It looks like there are cases where pixels are so overexposed that they don't look overexposed anymore, and the comparison takes the pink rows from the bright image anyway.
This is what I would like to change to keep as much detail as possible while avoiding the pink parts. I tried ignoring the comparison with white_darkened and always taking the MAX, like this:
fullres[x + y*w] = MAX(f, dark[x + y*w]);
And visually that gives a resolution between the halfres and the fullres results, with no pink artifacts. The problem is that I don't understand what the values in the pink areas mean, so I don't know how to figure out when a pink pixel is "too pink" and should be discarded. The comparison with "white_darkened" seems to work most of the time, but that's just too complex for me.
Maybe someone with more knowledge on the topic can help us...




iaburn

I went for a naive solution: since the dark and bright images should be similar after exposure correction, I compare the absolute value of the difference between the dark and the bright pixel, and if the result is greater than a threshold value I just assign MAX(f, dark[x + y*w]).
If it's smaller, I keep the current f < (uint32_t)white_darkened ? f : MAX(f, dark[x + y*w])

That keeps all the detail if the threshold value is correctly selected, but I guess the ideal value will differ between clips, cameras, configurations...
I picked 50000 and it works fine for my test clips.
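
For illustration, this is roughly what the change looks like inside the BRIGHT_ROW branch of the loop quoted earlier (DIFF_THRESHOLD is just a placeholder name for the 50000 value, it does not exist in the MLV App source):

#define DIFF_THRESHOLD 50000   /* worked for my test clips; probably clip/camera dependent */

uint32_t f = bright[x + y*w];
uint32_t d = dark[x + y*w];
uint32_t diff = (f > d) ? (f - d) : (d - f);   /* absolute difference, safe for unsigned values */

if (diff > DIFF_THRESHOLD)
{
    /* bright and dark disagree too much: don't trust the bright copy */
    fullres[x + y*w] = MAX(f, d);
}
else
{
    /* original behaviour */
    fullres[x + y*w] = f < (uint32_t)white_darkened ? f : MAX(f, d);
}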

Danne

Cool, will be nice to test when ready. Are you considering adding back the openmp code? Would be cool if that was working as speed is totally needed for dualiso  :P.

BatchGordon

Yes, the idea of comparing points between the images and taking the one from the low ISO image if the difference exceeds a threshold might be naive... but in my opinion it's also quite wise.  ;)

The choice of the threshold value can be critical and, as you said, it can differ depending on many things (especially the selected ISO values).
However, I still have the impression that the root of the problem may be how the video is exposed: ETTR should be applied to the high ISO, not the low one.

Technically speaking, with dual ISO we gain more detail in the shadows, not in the highlights (I say technically because in post we can change the exposure as we prefer).
That's why a1ex reduces the exposure of the high ISO image before merging, instead of increasing the exposure of the low ISO one.

By the way, his blending algorithm is much more complex and sophisticated, but I suspect it's also too slow to be usable for video. That's why it has been simplified in the implementation.

iaburn

But if you ETTR for the high ISO with Dual ISO, the darks of the low ISO will be useless, as they will probably be 0 or at least below the noise level.
ETTR for the low ISO causes trouble with the overexposed high ISO part, but it makes more sense because it keeps the highlights while also improving the shadow noise thanks to the high ISO part.
According to this chart, up to 2 stops of shadow improvement: https://photonstophotos.net/Charts/PDR_Shadow.htm#Canon%20EOS%20M

iaburn

Quote from: Danne on January 27, 2023, 10:42:29 AM
Cool, will be nice to test when ready. Are you considering adding back the openmp code? Would be cool if that was working as speed is totally needed for dualiso  :P.

I saw a branch from many years ago that intended to use OpenCL; it would be awesome if we could use the GPU, it's made for this! Imagine fully using the GPU and the CPU... :o

Another important thing is that it looks like the code was never deeply optimized; it reads more like it was written to stay "easy" to understand.
So there is also room for improvement in the algorithm's implementation, but for that you really need to understand what's going on :S

I personally export to DNG and edit in Resolve, so for me the easiest way to use multiple CPU cores would be to assign one frame per core. If we can also use the GPU for processing each frame, that would be a dream <3
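
A minimal sketch of that idea with OpenMP (processFrame, clip, frameCount and output are made-up names, not the actual MLV App export code):

/* hypothetical export loop: every frame is independent, so each thread can take one */
#pragma omp parallel for schedule(dynamic)
for (int frame = 0; frame < frameCount; frame++)
{
    processFrame(clip, frame, output[frame]);
}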

Danne

GPU, hehe, hard shit.
I was referring to theBilalFakhouri's link from before. It was an improvement when it was in there, but a bit buggy. Well, let's see later on :)

masc

@iaburn: if you know how to use the GPU... I have absolutely no idea here. I just heard it is impossible with plain C/C++, because the GPU has its own kernel language. So this means mostly rewriting everything.
Yes, in the past we tried to use the GPU for debayering. In the end we had success with it technically, but it was slower than with the CPU, because the copy actions between GPU RAM and main RAM took nearly as long as letting the CPU debayer. So the GPU might be cool if you process everything on it - single tasks are probably not worth it.
5D3.113 | EOSM.202

iaburn

Quote from: theBilalFakhouri on January 22, 2023, 03:48:15 PM
@iaburn

Nice work on Dual ISO :) , finally we have someone new who can deal with the Dual ISO code :D
Could you fix the multi-threading issue too? :P

Dual ISO multi threaded vs single threaded

Keep it up!

I missed your post! I absolutely cannot deal with this code, let alone do all the rewriting needed for safe multithreading  :o

I was just trying to fix the pink stripes and see how hard it would be to speed up dual ISO processing, but the second part is really hard... Sadly I have no experience in multithreading or GPU programming, I just have good intentions and hope haha

names_are_hard

Quote from: masc on January 27, 2023, 01:59:35 PM
@iaburn: if you know how to use the GPU... I have absolutely no idea here. I just heard it is impossible with plain C/C++, because the GPU has its own kernel language. So this means mostly rewriting everything.
Yes, in the past we tried to use the GPU for debayering. In the end we had success with it technically, but it was slower than with the CPU, because the copy actions between GPU RAM and main RAM took nearly as long as letting the CPU debayer. So the GPU might be cool if you process everything on it - single tasks are probably not worth it.

I've only tinkered with GPU programming a little, so don't take my advice very seriously here.  You're right that GPU programming is a different way of thinking, but, if you work in CUDA, the language you use is very C-like:
https://cuda-tutorial.readthedocs.io/en/latest/tutorials/tutorial01/

Note that CUDA is Nvidia only.  But it's a good interface and is a large part of why Nvidia dominate GPGPU.

Yes, bandwidth to the GPU is often a bottleneck.  But, you only need to copy once for all GPU cores to see the data.  So, you want to divide the problem up, have each core debayer a portion of the image, then combine.  Copy in once, process in parallel,  copy out once.

Debayer should be an easy algorithm to split up like this, and indeed there are free libraries available to do GPU debayer, e.g.: https://github.com/avionic-design/cuda-debayer

masc

I think CUDA is not an option at all. Comparatively few computers are able to run it, while e.g. OpenCL works on nearly all computers out of the box. So if we were to implement GPU features, I'd try again with OpenCL. And there is already a CUDA-based MLV tool: fastcinemadng. I was never able to try it - no CUDA GPU anywhere near me.

@iaburn: OpenMP is not difficult to understand. There are these #pragma lines, which make 'for' loops multi-threaded. There must be no dependency between the loop iterations - that can't work (e.g. writing to the same variable).

This would work:

#pragma omp parallel for
for(int i = 0; i < n; i++)
{
    c[i] = a[i] + b[i];
}
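
And as an illustration of the dependency problem (not MLV App code): if every iteration writes to the same variable, you need a reduction clause, otherwise the threads race on it:

/* without reduction(+:sum) the threads would overwrite each other's partial sums */
#pragma omp parallel for reduction(+:sum)
for(int i = 0; i < n; i++)
{
    sum += a[i] * b[i];
}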
5D3.113 | EOSM.202

iaburn

I've been watching an OpenMP tutorial and then trying it on some expensive loops, and the results are encouraging :D
On the currently available Windows version, it took 5 minutes 33 seconds to process 100 frames from a 2.5K Dual ISO clip, with AMaZE interpolation and the alias map disabled.
With the same configuration and clip, it took just 2 minutes 22 seconds with the new build, and I could see a constant "chainsaw" pattern on all CPU cores in the CPU graph, as opposed to the flat graph while exporting with the "old" version.
Edit: forgot to say that this is on a Ryzen 3400G with 4 cores / 8 threads.

I also took the chance to compare the output, and it was exactly the same but with the pink stripes fixed :)

It would be cool if someone could give it a try and share their results, but it's definitely worth using OpenMP. Here are my WIP changes: https://github.com/anibarro/MLV-App


Spoke too soon, I guess I made a mistake while testing...

Quote from: masc on January 27, 2023, 08:05:42 PM
@iaburn: OpenMP is not difficult to understand. There are these #pragma lines, which make 'for' loops multi-threaded. There must be no dependency between the loop iterations - that can't work (e.g. writing to the same variable).

This would work:

#pragma omp parallel for
for(int i = 0; i < n; i++)
{
    c[i] = a[i] + b[i];
}


I just read your post; you are right, I thought it was going to be more complicated!

names_are_hard

Quote from: masc on January 27, 2023, 08:05:42 PM
I think CUDA is not an option at all. Comparatively few computers are able to run it, while e.g. OpenCL works on nearly all computers out of the box. So if we were to implement GPU features, I'd try again with OpenCL. And there is already a CUDA-based MLV tool: fastcinemadng. I was never able to try it - no CUDA GPU anywhere near me.

Right, if you did want to use it you'd certainly need to detect it at runtime and make it optional. Which is more complexity. OpenMP is nice and general.

names_are_hard

Quote from: iaburn on January 27, 2023, 08:11:23 PM
It would be cool if someone could give it a try and share their results, but it's definitely worth using OpenMP. Here are my WIP changes: https://github.com/anibarro/MLV-App
I just read your post; you are right, I thought it was going to be more complicated!

I can run some comparisons, is there some CLI way to start MLVApp and run a task?  Would be nice for doing a reproducible test.  If not, please describe how to run a test (what options to enable, sample file to use, etc).  I guess I want to compare your 8e1b6d89 against 95d8d20a?

iaburn

Quote from: names_are_hard on January 27, 2023, 08:17:36 PM
I can run some comparisons, is there some CLI way to start MLVApp and run a task?  Would be nice for doing a reproducible test.  If not, please describe how to run a test (what options to enable, sample file to use, etc).  I guess I want to compare your 8e1b6d89 against 95d8d20a?

I think I spoke too soon, I tried more videos and now I cannot replicate the speed gain, it's just the same or worse... :(
I will try to find out why  ::)