Author Topic: MLV App 1.14 - All in one MLV Video Post Processing App [Windows, Mac and Linux]  (Read 994805 times)

mlrocks

  • Member
  • ***
  • Posts: 180
Computer: MBA (entry model)
CPU: M1
RAM: 8GB
GPU: 7 core
OS: Latest
App: mlv app apple silicon version
https://www.dropbox.com/s/ej6ufca61ijeeyf/MLV.App.v1.13.macOS.arm64.zip?dl=0
All default settings
Export to H264 MP4 HQ

Footage: UHD 1X3 12 bit lossless from 650D
Scene: The same test scene

Performance
Playback: 18 fps most of the time, not stable
Export time: 15 minutes, estimated at 50% of the timeline. For the first several minutes the speed was unstable and the estimate kept changing, so I always take the elapsed time at 50% and double it to get the total.

This version is about 50% better than the MLV APP Image version.


mlrocks

  • Member
  • ***
  • Posts: 180
It is interesting to see that my MBA performs similarly to my Xeon E3-1270V2 desktop in exporting UHD 1X3 footage. Makes me feel good that my old desktop is not outdated yet. LOL.

Danne

  • Developer
  • Hero Member
  • *****
  • Posts: 7297
I don't think M1 has so vastly improved on CPU speed, but on power saving.
Plenty to read about M1 CPU progression, which explains why a "simple" iPhone will outperform most (not all) Intel processors. Compiling MLV App for arm64 makes all the difference in playback. Exporting with ffmpeg still mostly falls back to Rosetta, but apps are following. Resolve is already working quite nicely on M1 too.

mlrocks

  • Member
  • ***
  • Posts: 180
Plenty to read about M1 CPU progression, which explains why a "simple" iPhone will outperform most (not all) Intel processors. Compiling MLV App for arm64 makes all the difference in playback. Exporting with ffmpeg still mostly falls back to Rosetta, but apps are following. Resolve is already working quite nicely on M1 too.

Maybe this explains why the M1 has 3 times better playback but the same export speed.

mlrocks

  • Member
  • ***
  • Posts: 180
Just tried another workflow on MBA M1.
In MLV App Apple Silicon, set the export to uncompressed cdng with the dvr file name format. Exporting 1 minute of UHD footage takes about 1 minute. The cdng folder can be opened in Davinci Resolve 17. This way DVR's GPU acceleration can be utilized to shorten the processing time significantly.
I know many people here have known this workflow for a long time. It was just a learning experience for me.
The cdng files cannot be opened in Blender 3.1 VSE. So this workflow does not help Blender.

masc

  • Contributor
  • Hero Member
  • *****
  • Posts: 1985
DNG "export" is no real export, it is more a copy action of RAW data. You could also use MLVFS - this saves time and disk space.
5D3.113 | EOSM.202

70MM13

  • Hero Member
  • *****
  • Posts: 548
Also it is best to use EXR format if you want to load the files into blender...

bouncyball

  • Contributor
  • Hero Member
  • *****
  • Posts: 849
What was the video card on this system?
Some integrated Intel GPU (I don't remember the Xeon model now), as all server hardware has.

bouncyball

  • Contributor
  • Hero Member
  • *****
  • Posts: 849
Single thread performance seems much more important than multi threaded for playback.
Yup, this is 99% true. As I remember, decoding and raw correction of the frame happen in a separate thread, but debayering and color processing are then multithreaded.

@masc You have shot some amazing clips before using 5D3/EOS M, it would be cool if you shorten some of them and upload them somewhere (if you still have them and don't mind :) ).
Yeah!
@theBilalFakhouri: did you mean he was more creative while using his good, old 5d2? :D

Skinny

  • Member
  • ***
  • Posts: 178
I suggest someone record a few MLV clips in different resolutions/modes and upload them somewhere.
And if we want to use one MLV file as the "standard" for testing, it will be downloaded many times, especially by people who are new to ML and just want to see how it looks, how it can be color graded and so on. So it's important to use something that shows ML RAW at its best. I think it should be:

1. Something really beautiful and not just test clips with some random stuff
2. It should look nice with default MLV App settings
3. Probably shot with 5D3 because it is the best ML raw camera for now
4. Shot using steadycam or tripod, no shaky footage obviously
5. 1920x1080 resolution because most people want at least true 1920 and at the same time file size won't be too big.
6. Filesize? ~2gb or more?

Just my thoughts.

mlrocks

  • Member
  • ***
  • Posts: 180
Computer: MBA (entry model)
CPU: M1
RAM: 8GB
GPU: 7 core
OS: Latest
App: mlv app apple silicon version
https://www.dropbox.com/s/ej6ufca61ijeeyf/MLV.App.v1.13.macOS.arm64.zip?dl=0
All default settings

Footage: UHD 1X3 12 bit lossless from 650D
Scene: The same test scene

Codec Optimization
Export to H264 MP4 HQ, ffmpeg, Export time: 15 minutes. Used as the baseline here.

Export to ProRes 422 LT, ffmpeg, Export time: 12 minutes.
Export to ProRes 422, ffmpeg, Export time: 8 minutes.
Export to ProRes 422 HQ, ffmpeg, Export time: 10 minutes.
Export to ProRes 4444, ffmpeg, Export time: 18 minutes.

All of the above codecs can be opened by the Blender VSE. Exporting to ProRes 422 gives another improvement of nearly twofold over the H264 baseline (8 minutes versus 15). Maybe ProRes 422 is the most widely used and therefore the best optimized and most mature one.


More:

AVID's codecs:
Export to DNxHD, 10 bit 1080p, frame rate override to 23.976, Export time: 6 minutes.
Export to DNxHR, 444 10 bit, Export time: 12 minutes.
Export to DNxHR, HQX 10 bit, Export time: 7 minutes.
Export to DNxHR, HQ 8 bit, Export time: 8 minutes.
Export to DNxHR, SQ 8 bit, Export time: 8 minutes.
Export to DNxHR, LB 8 bit, Export time: 7 minutes.

All of AVID's codecs can be imported in the Blender VSE.


It seems to me the best workflow for UHD 1x3 is: set the camera frame rate override to 23.976 when recording; use MLV App on Linux or a Mac M1; export to DNxHD 10 bit 1080p, DNxHR HQX 10 bit, or DNxHR LB 8 bit, without having to consider a frame rate override to 23.976 at export; and edit in Blender VSE 3.1. For high resolution cinematic projects, DNxHR 444 10 bit encodes faster than H264 420 8 bit. AVID is still the king of video editing.


In conclusion, exporting with AVID DNxHR HQX 10 bit roughly doubles the encoding speed compared to H264 (7 minutes versus 15) and raises the output to 10 bit 422. Apple's ProRes 422 is a good second choice, roughly 15% slower than DNxHR HQX 10 bit (8 minutes versus 7). In general, AVID's codecs are efficient.
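
For anyone who wants to redo the comparison with their own timings, here is a small sketch of the arithmetic. The function names are made up for illustration, and the inputs are simply the whole-minute export times listed in this post, so the results are rough.

```cpp
#include <cassert>

// Speed-up of a codec relative to a baseline, from export times in minutes.
// A value of 2.0 means the export finished in half the baseline time.
double speedup(double baselineMinutes, double codecMinutes)
{
    assert(baselineMinutes > 0.0 && codecMinutes > 0.0);
    return baselineMinutes / codecMinutes;
}

// How much longer, in percent, one export took than a faster one.
double percentLonger(double slowerMinutes, double fasterMinutes)
{
    assert(slowerMinutes > 0.0 && fasterMinutes > 0.0);
    return (slowerMinutes / fasterMinutes - 1.0) * 100.0;
}
```

With the times above, speedup(15, 7) is about 2.14 for DNxHR HQX against the H264 baseline, and percentLonger(8, 7) is about 14 for ProRes 422 against DNxHR HQX.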



This is what Avid recommends each codec for:

DNxHR LB - Low Bandwidth (8-bit 4:2:2) Offline Quality

DNxHR SQ - Standard Quality (8-bit 4:2:2) (suitable for delivery format)

DNxHR HQ - High Quality (8-bit 4:2:2)

DNxHR HQX - High Quality (12-bit 4:2:2) (UHD/4K Broadcast-quality delivery)

DNxHR 444 - Finishing Quality (12-bit 4:4:4) (Cinema-quality delivery)

theBilalFakhouri

  • Developer
  • Hero Member
  • *****
  • Posts: 894
Yup, this is 99% true. As I remember, decoding and raw correction of the frame happen in a separate thread, but debayering and color processing are then multithreaded.

I don't know how MLVApp code works, never looked into it.
But in theory, if we rewrite the code somehow so it can utilize more threads for decoding and raw correction, we may get faster playback. Would that be possible, or must *some parts of the processing be single threaded?

*Even if it must be single threaded, I guess we can process multiple frames at the same time (multi processing), e.g. one frame per thread or two frames per thread:
The first frame would be processed on CORE 0 while the second frame is being processed on CORE 1, and so on for the other cores. When CORE 0 finishes, it would load the next frame that needs processing.


I get ~16 FPS playback speed using MLVApp 1.13 (default settings, Bilinear debayer) on a Ryzen 3900x. Clip used: UHD 1280x2160 10-bit lossless 23.976 FPS
CPU utilization is only ~17%
Using Amaze debayer it's ~33% (~11 FPS)

I opened two copies of MLVApp with the same clip and default settings (Bilinear debayer): playback in each copy was ~16 FPS and CPU utilization was ~35%. The multi processing theory works: ~32 FPS total playback speed :D and there is 65% of CPU power left.
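
To make the idea concrete, here is a minimal sketch of frame-level parallelism in standard C++. It is not MLVApp code: processFrame is a hypothetical stand-in for the real per-frame work, and the sketch ignores everything a real player also needs (presenting frames in order, audio sync, memory limits).

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical stand-in for the per-frame work (decode, raw correction, debayer).
// It only derives a number from the frame index so the result can be checked.
static int processFrame(int frameIndex)
{
    return frameIndex * 2;
}

// Process frameCount frames on workerCount threads. Each worker takes the next
// unprocessed frame index from a shared atomic counter, so a core that finishes
// early immediately picks up more work.
std::vector<int> processFramesInParallel(int frameCount, unsigned workerCount)
{
    assert(frameCount >= 0 && workerCount > 0);
    std::vector<int> results(static_cast<std::size_t>(frameCount));
    std::atomic<int> nextFrame{0};

    auto worker = [&]() {
        for (;;) {
            const int frame = nextFrame.fetch_add(1);
            if (frame >= frameCount) break;
            // Every frame index is handed out once, so no two threads
            // ever write to the same element.
            results[static_cast<std::size_t>(frame)] = processFrame(frame);
        }
    };

    std::vector<std::thread> workers;
    for (unsigned i = 0; i < workerCount; ++i) workers.emplace_back(worker);
    for (auto &t : workers) t.join();
    return results;
}
```

Calling processFramesInParallel(100, 4) would process 100 frames on 4 threads. Whether this helps playback in practice depends on the rest of the pipeline.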

What I am saying in short:
There is room for improvement, even if this means rewriting MLVApp from scratch. I bet someone from the MLVApp team would do that :P (seems like a lot of work).

That's why we should start a dedicated Patreon account for MLVApp. If one (or two, or all) of the MLVApp team feel they can improve MLVApp but can't do it for free (fully understandable), funding is a solution for that. I think a lot of users would fund such a thing; besides, there are no legal concerns. If the MLVApp team doesn't want to work on MLVApp for whatever reason (again, fully understandable), the idea of a Patreon account for MLVApp lives on, but we could hire a freelancer instead.
 
Yeah!
@theBilalFakhouri: did you mean he was more creative while using his good, old 5d2? :D

I did not mean that. I just meant some beautiful MLV clips and used 5D3/EOSM as an example (masc seems to have nice MLV clips that look good out of the box in MLVApp, I guess). I forgot that masc owned a 5D2 before :P, and I don't think I have watched 5D2 clips from masc (or I just don't remember how they looked). Of course clips from the 5D2 would be ideal too, for uncompressed MLV benchmarking :D
New custom build for 100D/SL1 is up:
https://www.magiclantern.fm/forum/index.php?topic=26511.msg239125#msg239125
Porting still work in progress!

mlrocks

  • Member
  • ***
  • Posts: 180
Blender adds a render farm feature, multithreading, and maybe 10 bit color depth editing in its latest VSE version. Blender has improved dramatically in the last five years, making it a serious option for video editing and VR creation. MLV App may consider adding render farm support and improving multithreading in future versions. ML coders and heroes could set up their own render farm companies, creating a business for themselves and for the good of ML.

bouncyball

  • Contributor
  • Hero Member
  • *****
  • Posts: 849
What I am saying in short:
There is room for improvement, even if this means rewriting MLVApp from scratch. I bet someone from the MLVApp team would do that :P (seems like a lot of work).
Yes, it surely is a lot of work. Let me explain.

The decoding/playback engine of MLVApp does not use SDL or any other existing lib. It was written from scratch by Markus in Qt, with a few humble suggestions from me. This includes everything: video + audio, syncing frames with audio, dropping frames if processing power is insufficient, etc. Just one framebuffer is always being filled in the background in a separate thread.

Preparing the framebuffer consists of:
* reading the raw frame from the MLV according to the index (an easy task, initially written by Ilia and optimized/enhanced by me),
* decoding the frame if it is compressed, using the lib written by @Baldwin, which is quite tricky to optimize further (it needs a deep understanding of the lossless JPEG 92 compression used by the DNG/cDNG standard),
* doing raw correction if checked (darkframe subtraction, stripes removal, bad/focus pixel removal, dual ISO, smoothing; implemented by me, non multithreaded because it is mostly an adaptation of the existing ML/MLVFS raw correction code)

Then a multithreaded debayer of the framebuffer to Camera RGB space is done and, after converting it (for now) to the rec709 color space, we are ready for image processing, which also uses some multithreading.

The most time-consuming parts (areas for optimization) are:
* decompression of the image (some overhead)
* raw correction (mostly light, except smoothing and dual ISO; there was also pattern noise removal, disabled now)
* color processing (this is the bitch)

In theory dividing clip logically and processing each chunk in parallel should decrease exporting time but won't improve playing speed.
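
As a rough illustration of the "divide the clip logically" part, this sketch only computes the frame ranges each parallel export job would get. The function name is hypothetical, and how the encoded chunks would be joined afterwards is left out.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Split the frame range [0, frameCount) into chunkCount contiguous chunks whose
// sizes differ by at most one frame. Each pair is (firstFrame, endFrame), with
// endFrame exclusive.
std::vector<std::pair<int, int>> splitIntoChunks(int frameCount, int chunkCount)
{
    assert(frameCount >= 0 && chunkCount > 0);
    std::vector<std::pair<int, int>> chunks;
    const int base = frameCount / chunkCount;
    const int extra = frameCount % chunkCount;  // the first `extra` chunks get one more frame
    int start = 0;
    for (int i = 0; i < chunkCount; ++i) {
        const int length = base + (i < extra ? 1 : 0);
        chunks.emplace_back(start, start + length);
        start += length;
    }
    return chunks;
}
```

For a 10 frame clip and 3 jobs this gives the ranges 0-4, 4-7 and 7-10. It helps export only, because playback needs the next frame right now, not a chunk from later in the clip.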

Writing from scratch is not an option either :P

regards
bb

masc

  • Contributor
  • Hero Member
  • *****
  • Posts: 1985
@mlrocks: use the AVFoundation option instead of ffmpeg and ProRes will be faster again (at better quality).

I don't know how MLVApp code works, never looked into it.
But in theory, if we rewrite the code somehow so it can utilize more threads for decoding and raw correction, we may get faster playback. Would that be possible, or must *some parts of the processing be single threaded?

Today MLVApp is nearly completely multithreaded. The last single-threaded features:
-> RBF denoiser, highlights/shadows/clarity. All of these work with a recursive bilateral filter, which only works single threaded. You could swap the recursion for multithreading, but this makes the filter way slower, even on octa-core+ CPUs.
-> DualISO. There is much room for improvement here. Anybody is invited to create a new algorithm which works faster at the same or better quality. I gave up on it: I tried to multithread the current algorithm and it never worked.
-> Lossless de-/compression. It shouldn't be too hard to do, but it won't have much effect. Yes, uncompressed footage is faster than lossless footage in MLVApp, but only by a few percent.

---> All other algorithms are already multithreaded (or did I forget something?).

I don't understand why MLVApp doesn't use your Ryzen. My M1 always runs at 95-100% when exporting in MLVApp.

*Even if it must be single threaded, I guess we can process multiple frames at the same time (multi processing), e.g. one frame per thread or two frames per thread:
The first frame would be processed on CORE 0 while the second frame is being processed on CORE 1, and so on for the other cores. When CORE 0 finishes, it would load the next frame that needs processing.
That's what we had in the very first beta versions of MLVApp (processing multiple frames, each single threaded) and it was very bad. You need 2 processing pipelines if you still want realtime preview: one for preview and one for export. That is very hard to debug... and doesn't make much sense. We discussed it a lot; you will find this discussion many times in this thread.

masc

  • Contributor
  • Hero Member
  • *****
  • Posts: 1985
* doing raw correction if checked (darkframe subtraction, stripes removal, bad/focus pixel removal, dual ISO, smoothing; implemented by me, non multithreaded because it is mostly an adaptation of the existing ML/MLVFS raw correction code)
RAW corrections are already multithreaded... I did that, as far as I remember.

bouncyball

  • Contributor
  • Hero Member
  • *****
  • Posts: 849
RAW corrections are already multithreaded... I did that, as far as I remember.
Yeah, yeah, right. We used OpenMP, then discussed and corrected some issues. Not every loop can be OMP accelerated.
In short, we OMPed every loop and disabled some after issues appeared. 99% of the correction code (except dual ISO) is multithreaded.
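
A generic sketch of why some loops take an OpenMP pragma and others don't. This is not the MLVApp correction code: both functions are simplified, hypothetical examples. The first loop has independent iterations; the second is a plain first-order recursive filter where every output needs the previous one.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Independent per-pixel work: every output depends only on its own input,
// so the iterations can be shared among threads.
void subtractBlackLevel(std::vector<int> &pixels, int blackLevel)
{
    const long count = static_cast<long>(pixels.size());
#ifdef _OPENMP
#pragma omp parallel for
#endif
    for (long i = 0; i < count; ++i) {
        const std::size_t idx = static_cast<std::size_t>(i);
        const int value = pixels[idx] - blackLevel;
        pixels[idx] = value < 0 ? 0 : value;  // clamp at zero
    }
}

// First-order recursive smoothing: each output depends on the previous output,
// so this loop has to run in order and is left without an OpenMP pragma.
void recursiveSmooth(std::vector<float> &samples, float alpha)
{
    assert(alpha >= 0.0f && alpha <= 1.0f);
    for (std::size_t i = 1; i < samples.size(); ++i) {
        samples[i] = alpha * samples[i] + (1.0f - alpha) * samples[i - 1];
    }
}
```

The pragma only takes effect when the compiler is run with OpenMP enabled (for example -fopenmp with GCC or Clang); without it the first loop simply runs on one thread and gives the same result.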

bouncyball

  • Contributor
  • Hero Member
  • *****
  • Posts: 849
But... as I said image processing is a real heavy bitch :D

names_are_hard

  • Developer
  • Senior
  • *****
  • Posts: 499
  • 200D idiot
MLVApp uses Qt Creator, I think? It should be relatively easy for users to profile their usage and see where the pain is coming from. I've never done this with Qt Creator, but I'm a big fan of profiling if you're trying to optimise.

masc

  • Contributor
  • Hero Member
  • *****
  • Posts: 1985
Just the UI uses Qt. All the processing is standard C; some 3rd party algorithms are standard C++.

devtone

  • New to the forum
  • *
  • Posts: 2
Is there any option to use Nvidia GPU NVENC in MLV-APP to speed up export? I can't find any way to change the ffmpeg parameters to use NVENC.

mlrocks

  • Member
  • ***
  • Posts: 180
Does anyone have MLV App workstations built with AMD Threadripper or EPYC CPUs? How do they perform compared to their Intel or ARM competitors?

devtone

  • New to the forum
  • *
  • Posts: 2
My i7-10875H 8-core (HT disabled) is fully used: 4 cores at 100% for MLVApp and 4 cores at 100% for ffmpeg. It seems to be multithreaded. That should scale pretty nicely on those AMDs.

masc

  • Contributor
  • Hero Member
  • *****
  • Posts: 1985
Is there any option to use Nvidia GPU NVENC in MLV-APP to speed up export? I can't find any way to change the ffmpeg parameters to use NVENC.
ffmpeg NVENC was for H.264? There is a way, but it isn't implemented, because I don't own an Nvidia graphics card. Without it, it can't work...
In MainWindow.cpp you'll find a function "startExportPipe". In this function you'll find
Code:
...else if( m_codecProfile == CODEC_H264 )...
In the following lines you could change the command parameters and try what happens.
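
As a starting point for such an experiment, here is a hypothetical helper that only builds the encoder part of an ffmpeg argument string. It is not the string MLVApp actually passes: the function, its parameters and the chosen options are assumptions for illustration. "libx264" and "h264_nvenc" are existing ffmpeg encoder names, but h264_nvenc only works with an ffmpeg build that includes NVENC support and an Nvidia GPU in the machine.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: returns the video-encoder part of an ffmpeg command line.
// useNvenc selects the Nvidia hardware encoder instead of the software x264 one.
// bitrateMbit is the target video bitrate in Mbit/s (an assumed parameter).
std::string h264EncoderArgs(bool useNvenc, int bitrateMbit)
{
    assert(bitrateMbit > 0);
    const std::string encoder = useNvenc ? "h264_nvenc" : "libx264";
    return "-c:v " + encoder +
           " -b:v " + std::to_string(bitrateMbit) + "M" +
           " -pix_fmt yuv420p";
}
```

For example, h264EncoderArgs(true, 50) gives "-c:v h264_nvenc -b:v 50M -pix_fmt yuv420p", which would then have to be merged by hand into the command built in startExportPipe.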

theBilalFakhouri

  • Developer
  • Hero Member
  • *****
  • Posts: 894
Thanks @bouncyball and @masc for the details.

I don't understand why MLVApp doesn't use your Ryzen. My M1 always runs at 95-100% when exporting in MLVApp.

Nah, MLVApp utilizes my Ryzen 3900x up to ~80% during export. In reply #4961 I was talking about CPU utilization during playback.