Author Topic: MLV App 1.14 - All in one MLV Video Post Processing App [Windows, Mac and Linux]  (Read 1041128 times)

masc

  • Contributor
  • Hero Member
  • *****
  • Posts: 2007
Okay, probably not a leak using all your allowed process memory... 
Who knows :)
No idea if there is a limit per application, or a limit per app per time. Because we malloc/calloc a lot (but also free a lot). In sum we're about 200-300MB, but it changes all the time.

I can try and look for problems in the Linux version, they might apply on both. 
Thank you!

Did you try an ASAN build?
Sorry, never heard of this. I will do some research.

Is this the right place to get current version?
https://github.com/ilia3101/MLV-App
Yes, correct.
5D3.113 | EOSM.202

masc

Is there any chance that MLVApp will get VP9 codec export ?

Added VP9 lossless and CRF18. Please test.
https://github.com/ilia3101/MLV-App/commit/9a533b8a2e9c33d9cfb8c7d89b34b4858eb50b1f

theBilalFakhouri

  • Developer
  • Hero Member
  • *****
  • Posts: 1013
  • UHS-I
Cool! How does this work?

I don't know how WSL2 works internally. I use it for ML compiling and QEMU stuff; I followed g3gg0's tutorial and some notes for WSL2 and DISPLAY export. You will also need an X server for MLVApp.

So WSL2 is some kind of VM designed by Microsoft, you can choose what Linux distribution you want to install when installing WSL2, it's Ubuntu by default (which I am using).


That was funny, but what is funnier:

Running MLVApp linux version using WSL2 on Windows 10 is faster compared to native MLVApp for Windows :D:

-Playback speed
-1360x1976 @ 23.976 FPS 14-bit lossless, default MLVApp 1.13 settings:
Windows MLVApp version on Windows 10: ~16 FPS
Linux MLVApp version via WSL2 on Windows 10: ~20 FPS

-1280x2160 @ 23.976 FPS 14-bit lossless, default MLVApp 1.13 settings:
Windows MLVApp version on Windows 10: ~15-16 FPS
Linux MLVApp version via WSL2 on Windows 10: ~18-19 FPS

-1736x2214 @ 23.976 FPS 11-bit lossless, default MLVApp 1.13 settings:
Windows MLVApp version on Windows 10: ~12 FPS
Linux MLVApp version via WSL2 on Windows 10: ~14-15 FPS

-Export
-1736x976 @ 23.976 FPS 14-bit lossless, default MLVApp 1.13 settings, ProRes 444:
Windows MLVApp version on Windows 10: 2:18 (2 minutes and 18 seconds)
Linux MLVApp version via WSL2 on Windows 10: 1:54 (1 minute and 54 seconds)

-Same CPU utilization for both versions.
-More tests are probably needed.


Regarding MLVApp crash on Windows 10:
Yeah, I can have some random crashes especially when rendering time is long, probably same memory issue.

masc

Running MLVApp linux version using WSL2 on Windows 10 is faster compared to native MLVApp for Windows :D
Haha... how funny is that.

names_are_hard

  • Developer
  • Hero Member
  • *****
  • Posts: 506
  • 200D idiot
Masc - thanks for the reply.  It built nice and easily.  Had to fiddle around a bit with config (never used it before), am now trying to repro dark subtraction crash.

Only complaint so far: it expects ffmpeg binary to be in the same dir as mlvapp, and the error message isn't very good if it's missing "encoder ffmpeg missing".  I have ffmpeg in system path, but not next to mlvapp.  Had to use strace to realise it was opening with AT_FDCWD, and then copy the binary into my build location.  Is this deliberate?  Maybe people need to use custom ffmpeg versions sometimes?  Maybe it's fixed by a proper install (I just ran make and started mlvapp from there).  It would be nicer for me if when ffmpeg is not found, it tried to use the one in system path.  Maybe that would be a bad default for other people, I don't know.
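A fallback like the one suggested above could be sketched in C roughly as follows. This is only a sketch, not MLVApp's code: `find_encoder` and `exe_in_dir` are hypothetical names, the lookup assumes a POSIX system (`access()` with `X_OK`), and the `:` separator assumes a Unix-style PATH.

```c
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper (not MLVApp code): writes "dir/name" into out and
 * returns 1 if that path exists and is executable, otherwise returns 0.
 * It does not check whether the path is a regular file. */
static int exe_in_dir(const char *dir, const char *name, char *out, size_t out_size)
{
    int n = snprintf(out, out_size, "%s/%s", dir, name);
    if (n < 0 || (size_t)n >= out_size) return 0;   /* path did not fit */
    return access(out, X_OK) == 0;
}

/* Look next to the application first, then walk a ':'-separated PATH list.
 * On success returns 1 and out holds the full path; out is only meaningful
 * when the return value is 1. */
int find_encoder(const char *app_dir, const char *path_list,
                 const char *name, char *out, size_t out_size)
{
    char dir[1024];

    if (app_dir && exe_in_dir(app_dir, name, out, out_size)) return 1;
    if (!path_list) return 0;

    const char *p = path_list;
    while (*p) {
        const char *end = strchr(p, ':');
        size_t len = end ? (size_t)(end - p) : strlen(p);
        if (len > 0 && len < sizeof dir) {
            memcpy(dir, p, len);
            dir[len] = '\0';
            if (exe_in_dir(dir, name, out, out_size)) return 1;
        }
        if (!end) break;
        p = end + 1;
    }
    return 0;
}
```

A caller might pass the application directory plus `getenv("PATH")`, and only show the "missing" error if both lookups fail. Whether falling back to a system ffmpeg is a good default is a separate question, as noted above.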

names_are_hard

Top during dark frame subtraction export:

Code: [Select]
1395499 username       20   0 1710964 267988  22812 R 888.7   0.8  51:24.36 ffmpeg                                                                                 
1395388 username       20   0   18.0g  15.2g  53684 R 324.6  48.5  18:32.42 mlvapp

The amount of reserved mem steadily increases through the export.  Peaked at 20GB.  No crash here, this machine is fat.  It would surely crash if you had less ram + swap.  I would expect it's easy to observe memory going up during the export - should it be doing this?  Exported file is 7.7GB.  Maybe we're keeping a reference to each frame, something like that?  So they don't get garbage collected during the export?  Will dig a bit deeper.
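For anyone who wants to watch this from inside the process rather than via top, here is a minimal Linux-only sketch. `current_rss_kib` is a hypothetical helper, not something in MLVApp; it assumes `/proc/self/status` is available and reads the `VmRSS` line.

```c
#include <stdio.h>

/* Linux-only sketch: returns the resident set size of the calling process
 * in KiB, read from the VmRSS line of /proc/self/status.
 * Returns -1 if the file or the line is not available. */
static long current_rss_kib(void)
{
    FILE *f = fopen("/proc/self/status", "r");
    if (!f) return -1;

    char line[256];
    long kib = -1;
    while (fgets(line, sizeof line, f)) {
        /* Non-matching lines leave kib untouched. */
        if (sscanf(line, "VmRSS: %ld kB", &kib) == 1) break;
    }
    fclose(f);
    return kib;
}
```

Printing this value every few hundred exported frames would show whether memory keeps climbing or levels off.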

theBilalFakhouri

Peaked at 20GB.  No crash here, this machine is fat.  It would surely crash if you had less ram + swap.

My system has 64 GB of RAM, and sometimes it crashes very early when dark-frame subtraction is on, like ~10 seconds after hitting export.
Did you run your test on Windows? Also, what do you mean by "swap"?

If you want me to run some analysis on my machine, please let me know.

names_are_hard

I don't have a Windows machine to test on.  Swap is disk space reserved to swap memory to if ram is exhausted.

Currently I am trying to run export under valgrind, but it might not be practical, it's so slow :)

names_are_hard

Valgrind has found one likely error so far, conceivably related to the dark subtraction issue, though I'd guess probably not.  It's an easy fix in code to test.

Here, in dng.c, we ROR32 over a pointer into a uint16 buffer.  This can read 2 bytes past the end of the buffer.  (This *probably* won't crash on Windows, which typically has readable bytes after the allocated space on the heap.  I think if the allocation ends exactly at a page boundary, maybe it doesn't.)  Probably we should check the buffer is 4-byte aligned earlier on?  Sometimes this type of error is a false positive (FP) from valgrind when it doesn't understand the asm for the function, but ROR32 over a 16-bit buffer feels likely to be real to me.

Code: [Select]
703         uint32_t uncorrected_data = *((uint32_t *)&packed_bits[bits_address]);
 704         uint32_t data = ROR32(uncorrected_data, rotate_value);

Valgrind dump so people unfamiliar can see how useful it is:
Code: [Select]
==1650919== Thread 35:
==1650919== Invalid read of size 4
==1650919==    at 0x1FC885: dng_unpack_image_bits._omp_fn.0 (dng.c:704)
==1650919==    by 0x4890DE5: ??? (in /usr/lib/x86_64-linux-gnu/libgomp.so.1.0.0)
==1650919==    by 0x5DEFEA6: start_thread (pthread_create.c:477)
==1650919==    by 0x6231DEE: clone (clone.S:95)
==1650919==  Address 0xc676e0e is 967,678 bytes inside a block of size 967,680 alloc'd
==1650919==    at 0x483AB65: calloc (vg_replace_malloc.c:760)
==1650919==    by 0x2114F1: df_load_ext (darkframe.c:102)
==1650919==    by 0x2119BC: df_validate (darkframe.c:256)
==1650919==    by 0x15A5E2: MainWindow::on_lineEditDarkFrameFile_textChanged(QString const&) (MainWindow.cpp:9728)
==1650919==    by 0x26D5C2: MainWindow::qt_metacall(QMetaObject::Call, int, void**) (moc_MainWindow.cpp:1764)

Shows you which buffer was used badly, including where it was allocated.  Learn to use valgrind if you're debugging C or C++! (works on anything but especially useful in these languages).
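One way to avoid the out-of-bounds read in the 32-bit load quoted above is to build the value from two bounds-checked 16-bit reads. This is a sketch under assumptions, not a patch to dng.c: `load_pair_le` and `ror32` are hypothetical names, it assumes a little-endian layout (which is what the original cast gives on x86), and it assumes that substituting 0 for the missing word past the end does not affect the unpacked pixel.

```c
#include <stddef.h>
#include <stdint.h>

/* Rotate right by count bits; the masks keep both shift counts in 0..31. */
static uint32_t ror32(uint32_t value, unsigned count)
{
    count &= 31u;
    return (value >> count) | (value << ((32u - count) & 31u));
}

/* Hypothetical replacement for *(uint32_t *)&words[index]:
 * returns what a little-endian 32-bit load would produce from
 * words[index] and words[index + 1], but never reads past word_count.
 * Words that would lie beyond the end are treated as 0 (an assumption). */
static uint32_t load_pair_le(const uint16_t *words, size_t word_count, size_t index)
{
    uint32_t lo = (index < word_count) ? words[index] : 0u;
    uint32_t hi = (index < word_count && word_count - index > 1)
                      ? words[index + 1] : 0u;
    return lo | (hi << 16);
}
```

With these, the two quoted lines would become something like `ror32(load_pair_le(packed_bits, count, bits_address), rotate_value)`, where `count` is the number of 16-bit words in the buffer. It also sidesteps the alignment question, since no 32-bit pointer is ever formed.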

names_are_hard

Probably a mem leak.  It took over two hours to run the test:

Code: [Select]
==1650919== HEAP SUMMARY:
==1650919==     in use at exit: 2,168,469,999 bytes in 10,637 blocks
==1650919==   total heap usage: 3,537,763 allocs, 3,527,126 frees, 62,845,029,066 bytes allocated
==1650919==
==1650919== LEAK SUMMARY:
==1650919==    definitely lost: 1,769,939,527 bytes in 4,387 blocks

Now I need to run it again with more logging, which will make it take longer.

names_are_hard

I made a much shorter clip, hoping it would still show the leak.  It did, and ran faster.

Code: [Select]
==1685697== 182,476,800 bytes in 55 blocks are definitely lost in loss record 511 of 511
==1685697==    at 0x483877F: malloc (vg_replace_malloc.c:307)
==1685697==    by 0x1DC83C: openMlvClip (video_mlv.c:1876)
==1685697==    by 0x2114A1: df_load_ext (darkframe.c:57)
==1685697==    by 0x1DFE3C: applyLLRawProcObject (llrawproc.c:172)
==1685697==    by 0x1D9C30: getMlvRawFrameFloat (video_mlv.c:308)
==1685697==    by 0x1D879A: get_mlv_raw_frame_debayered (frame_caching.c:305)
==1685697==    by 0x1DA0C5: getMlvRawFrameDebayered (video_mlv.c:439)
==1685697==    by 0x1DA190: getMlvProcessedFrame16 (video_mlv.c:465)
==1685697==    by 0x160B8F: MainWindow::startExportPipe(QString) (MainWindow.cpp:2572)
==1685697==    by 0x170093: MainWindow::exportHandler() (MainWindow.cpp:8128)
==1685697==    by 0x17D688: MainWindow::on_actionExport_triggered() (MainWindow.cpp:6596)
==1685697==    by 0x26D5C2: MainWindow::qt_metacall(QMetaObject::Call, int, void**) (moc_MainWindow.cpp:1764)

Valgrind thinks the allocation to rgb_raw_current_frame is not always being freed, and because "definitely" lost, we are getting to a state where there are no references to that block of mem.  That suggests we overwrite the pointer.
Code: [Select]
1874     /* For frame cache */
1875     video->rgb_raw_frames = (uint16_t **)malloc( sizeof(uint16_t *) * video->frames );
1876     video->rgb_raw_current_frame = (uint16_t *)malloc( getMlvWidth(video) * getMlvHeight(video) * 3 * sizeof(uint16_t) );
1877     video->cached_frames = (uint8_t *)calloc( sizeof(uint8_t), video->frames );

Hacked in some quick printf debugging around alloc free of rgb_raw_current_frame and got this:
Code: [Select]
initMlvObject hit
rgb_raw_current_frame alloc'd
freeMlvObject hit
rgb_raw_current_frame free'd
rgb_raw_current_frame alloc'd
rgb_raw_current_frame alloc'd
rgb_raw_current_frame alloc'd
rgb_raw_current_frame alloc'd
rgb_raw_current_frame alloc'd
rgb_raw_current_frame alloc'd
rgb_raw_current_frame alloc'd
rgb_raw_current_frame alloc'd
rgb_raw_current_frame alloc'd
rgb_raw_current_frame alloc'd
rgb_raw_current_frame alloc'd
rgb_raw_current_frame alloc'd
rgb_raw_current_frame alloc'd

That's truncated a lot.  Looks like it allocates to rgb_raw_current_frame every frame that's exported, and never frees them.
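The general shape of this kind of leak, and one way to avoid it, can be sketched like this. It is not the actual MLVApp code or fix: `clip_t`, `clip_alloc_frame` and `clip_free` are hypothetical stand-ins that only model the one field discussed above.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for the clip object; only the field under discussion. */
typedef struct {
    uint16_t *rgb_raw_current_frame;
    size_t    pixels;
} clip_t;

/* The leaky pattern is assigning a fresh malloc() result to
 * rgb_raw_current_frame on every call: the previous block then has no
 * remaining reference, which is what valgrind reports as "definitely lost".
 * Here the old block is released first; free(NULL) is a no-op, so this is
 * also safe on a zero-initialised clip. Returns 1 on success, 0 on failure. */
static int clip_alloc_frame(clip_t *clip, size_t width, size_t height)
{
    free(clip->rgb_raw_current_frame);
    clip->rgb_raw_current_frame = malloc(width * height * 3 * sizeof(uint16_t));
    clip->pixels = clip->rgb_raw_current_frame ? width * height : 0;
    return clip->rgb_raw_current_frame != NULL;
}

/* Matching cleanup; resets the pointer so a second call is harmless. */
static void clip_free(clip_t *clip)
{
    free(clip->rgb_raw_current_frame);
    clip->rgb_raw_current_frame = NULL;
    clip->pixels = 0;
}
```

The other option is to keep the allocation as it is and make sure whoever opens the clip per frame also closes it per frame; which of the two fits better depends on who owns the object.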

masc

Wow wow wow. Thank you so much @names_are_hard. I'll have a look for this variable. So linux "ps" or something already shows you 20GB of RAM usage? On Windows or macOS it is all the time about 200-300MB here and at least on macOS I could export over days without any swap.
Now I'll look for, what rgb_raw_current_frame does exactly...

masc

@names_are_hard: were your tests done using a darkframe? I think I now know what happens and will try to fix this. Thank you!
Did you also test without a darkframe? Because last week I also had a crash here, and I have no idea why this never happened before in endless tests.

masc

Find a small change in the latest commit.
Could you all test again please? Would be interesting if this runs stable on all the different platforms. macOS only here - I don't "feel" any change :D .

masc

Only complaint so far: it expects ffmpeg binary to be in the same dir as mlvapp, and the error message isn't very good if it's missing "encoder ffmpeg missing". 
Hm... are you using the latest commit? For some time now the message has said "Encoder ffmpeg missing in application path.". When compiling, Qt should extract and copy the right ffmpeg version into the application path automatically. Only on Windows we have no alternative for this yet.

names_are_hard

Cool, glad if it seems helpful testing.  Bilal gave nice repro instructions so I was trying to do that for dark frame export, yes.  Haven't tried anything else yet.

Yes, high memory usage via "top".  Not very scientific.

Your change seems the right kind of thing to me.  I tried a quick hack of freeing rgb_raw_current_frame as part of dl_free(), but that seg faulted, so I didn't mention it.  I didn't know where cleanup should live :)  Testing now.

names_are_hard

Nice, much better:

Code: [Select]
==1745753== LEAK SUMMARY:
==1745753==    definitely lost: 7,140 bytes in 32 blocks
==1745753==    indirectly lost: 116,660 bytes in 156 blocks
==1745753==      possibly lost: 7,961,728 bytes in 30 blocks

There were several other leaks besides the big one I listed before, these are also fixed - makes sense, you free a bunch of related things in the latest change.

Most of the remaining leaks are a single block per size.  Often that means you created something once at the start and never free it.  That's fine if you want it to exist until the program exits.  It's nicer to explicitly free on exit, just so leak checkers don't FP on it.  Not important beyond that.

Bilal, does that change stop your crash?  This commit: https://github.com/ilia3101/MLV-App/commit/faddb3e1b5a1cec8c73b85252728fe031b6b23d3

Oh yeah, re ffmpeg, I am only doing make, not make install.  So perhaps it's not expected that ffmpeg gets copied for me?

theBilalFakhouri

I made a few tests (I will do more later today with a lot of clips):

MLVApp compiled by MinGW_32:
I exported a clip five times with "Darkframe Subtraction" on and no crashes so far . .

MLVApp compiled by MinGW_64:
I exported a clip two times with "Darkframe Subtraction", first time no crash, but in the second try it crashed.

Tests made on Windows 10 x64 (no WSL2 used here :P).

Q: Does compiling with MinGW_32 make the MLVApp build 32-bit too, and does MinGW_64 make it 64-bit?

Nice work @names_are_hard and @masc, many thanks for looking into it.

names_are_hard

I ran a non-dark frame export with a bunch of different options turned on and didn't get anything suspicious.  These tests take about 20 minutes to export a 2s clip, so I can't be bothered doing them if there's not a decent chance it will find something.  If you have something fairly reproducible and can share e.g. a session file so I can copy it, I'm quite happy to try it.

Since you're on Mac, Valgrind isn't well supported (it used to work okay, Apple broke it with Big Sur).  ASAN via clang does some of the same things, I strongly recommend you try it (introduce some buffer overflow bugs to test it in action).  According to Stack Overflow, Apple clang doesn't support leak checking via ASAN, so you'll want to get llvm from Brew or similar.  Then you need to change build options to include "-fsanitize=address", and when running, use "ASAN_OPTIONS=detect_leaks=1".  ASAN is faster than Valgrind, but not as thorough.
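To see ASAN in action before pointing it at MLVApp, a tiny deliberate bug is enough. The function below is a hypothetical test case (not from the MLVApp source): with `overrun` set it reads one element past a heap buffer, which an ASAN build should report as a heap-buffer-overflow; with `overrun` at 0 it is well-defined.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical ASAN test case.
 *
 * Build with the flag mentioned above:   cc -g -fsanitize=address demo.c
 * Run with leak checking enabled:        ASAN_OPTIONS=detect_leaks=1 ./a.out
 *
 * Fills a heap buffer of n samples with 1..n and returns their sum.
 * If overrun is non-zero, the loop deliberately reads buf[n], one element
 * past the end, which is the bug ASAN is expected to catch. */
static uint32_t sum_samples(size_t n, int overrun)
{
    uint16_t *buf = malloc(n * sizeof *buf);
    if (!buf) return 0;
    for (size_t i = 0; i < n; i++) buf[i] = (uint16_t)(i + 1);

    size_t limit = overrun ? n + 1 : n;   /* n + 1 is the deliberate bug */
    uint32_t sum = 0;
    for (size_t i = 0; i < limit; i++) sum += buf[i];

    free(buf);
    return sum;
}
```

Calling `sum_samples(4, 1)` from `main` in an ASAN build should abort with a report that names the bad read and the `malloc` that created the buffer, much like the valgrind dump earlier in the thread. Removing the `free(buf)` line turns it into a leak test instead.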

I've got two small fixes to current code that are unrelated to export, I'll PR them later.

names_are_hard

Q: Does compiling with MinGW_32 make the MLVApp build 32-bit too, and does MinGW_64 make it 64-bit?

Depends on build system.  Both compilers should be able to make 32 and 64 bit output files.  You can inspect the exe or the running process to find out what they've done.  Task Manager, should be a Platform column (maybe not visible by default?).  A 32 bit process will be limited to 2GB mem so this can be quite relevant.  Do you see the same crashes via WSL?

I saw the leaks from a 64 bit mlvapp (but on Linux).
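If you'd rather have the binary tell you itself, the pointer width is fixed at compile time, so a build can report it. A minimal sketch; `pointer_bits` is a hypothetical helper, not something MLVApp provides:

```c
#include <limits.h>
#include <stddef.h>

/* Returns the width of a data pointer in bits for this build,
 * typically 32 for a 32-bit build and 64 for a 64-bit one. */
static int pointer_bits(void)
{
    return (int)(sizeof(void *) * CHAR_BIT);
}
```

Logging this once at startup (or showing it in an About dialog) removes the guesswork about which toolchain output is actually running.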

Can you share a session file with me?  I can edit to use my MLV files.  If that doesn't repro then sharing your clips might help, hopefully not needed (exporting a 3 min clip takes several hours so I would like to avoid this!).

theBilalFakhouri

Depends on build system.  Both compilers should be able to make 32 and 64 bit output files.  You can inspect the exe or the running process to find out what they've done.  Task Manager, should be a Platform column (maybe not visible by default?).

Thanks for the answer, yeah I can see if a program is 64 bit or 32 bit in Windows task manager and it's visible by default.

Do you see the same crashes via WSL?

Well, could you provide a compiled MLVApp version (with the latest commit) on Linux?

Can you share a session file with me? ..

Sure, here is one which causes crashing issues, and was used in the following tests:

Tests:

-Two 1736x976 clips and one 1736x1160 clip, all loaded with "Darkframe Subtraction" on (the same .MLV dark frame used for both 1736x976 clips); the .MLV dark frames are also loaded in MLVApp.

64 bit:
-MLVApp 1.13 (official version):
 Made three tests; MLVApp crashed after a few minutes in all three.

-MLVApp with latest commit (compiled with MinGW_64):
 Made three tests; MLVApp crashed a few seconds after it started exporting in all three.

32 bit:
-MLVApp 1.13 (official version):
 Made three tests; MLVApp crashed after a few seconds in all three.

-MLVApp with latest commit (compiled with MinGW_32):
 Made three tests; MLVApp crashed after a few minutes in two of them, and in less than a minute in the third.

Same exporting settings mentioned here. "Darkframe Subtraction" was on in all tests.

2blackbar

  • Hero Member
  • *****
  • Posts: 507
At least one pass should be doable.
https://trac.ffmpeg.org/wiki/Encode/VP9

What mode is required? Lossless? Or constant bitrate? Or...
Well, for me, I'd like to export VP9 at resolutions like 8K, because I've got some 6K/8K footage from other Blackmagic cams (yeah, BRAW works in MLVApp), with settings that would allow it to work on YouTube, so 10-bit. Ideally I'd like adjustable bitrate so I could use my own settings, but I asked about it a few years ago and you said it's complex to add a box with customisable bitrate values, so I have my own compiled versions where I bumped up the quality of x265 and H.264.
Ideally: lossless, VBR, CBR, and a box where we can set our own values for VBR and CBR.

masc

Tested latest commit today on Win10 with Qt 5.13.2, MinGW_64 7.3.0, dynamic build; 1736x976, 14bit, >3200 frames clip, with darkframe subtraction. Exported this clip ~20x to ProRes4444 without any issue over the entire day, using the settings described by theBilalFakhouri.

theBilalFakhouri

@masc

Could you provide the compiled version with latest commit for Windows?

bouncyball

  • Contributor
  • Hero Member
  • *****
  • Posts: 849
Tested latest commit today on Win10 with Qt 5.13.2, MinGW_64 7.3.0, dynamic build; 1736x976, 14bit, >3200 frames clip, with darkframe subtraction. Exported this clip ~20x to ProRes4444 without any issue over the entire day, using the settings described by theBilalFakhouri.

Very nice job guys! Thank you all!