Topic: Benchmarking & CFLAGS

Marsu42

  • Contributor
  • Hero Member
  • *****
  • Posts: 1557
  • 66d + flashes
Benchmarking & CFLAGS
« on: September 08, 2012, 01:28:31 AM »
1. cpu optimization:

I tried different optimization flags for the ARM CPU and would advise using "CFLAGS_USER= -march=armv5te -mcpu=arm946e-s", as it reduces the autoexec.bin size by 12%, while "CFLAGS_USER= -march=armv5te -mtune=arm946e-s" only cuts 6%. Neither changes performance in the zebra benchmark.

2. gcc optimization:

On my 60D, a -O2 build is 20% faster in zebras than -Os. However, -O3 crashes the camera, so I cannot say how much faster that would be. The problem has to be one or more of the flags -O3 adds over -O2, i.e. "-finline-functions, -funswitch-loops, -fpredictive-commoning, -fgcse-after-reload, -ftree-vectorize, -fvect-cost-model, -ftree-partial-pre and -fipa-cp-clone" ... with trial and error it would be possible to find the culprit and exclude it, for example with "-O3 -fno-inline-functions", if anyone has time to spare.
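
The trial-and-error search can be scripted. This is only a minimal sketch: it assumes the eight flags listed above really are the full difference between -O2 and -O3 for the GCC in use (the set varies by GCC version) and that each one has a matching -fno- form. The helper names are made up for illustration.

```python
# Flags that -O3 adds over -O2, as listed in the post (GCC-version dependent).
O3_EXTRAS = [
    "-finline-functions",
    "-funswitch-loops",
    "-fpredictive-commoning",
    "-fgcse-after-reload",
    "-ftree-vectorize",
    "-fvect-cost-model",
    "-ftree-partial-pre",
    "-fipa-cp-clone",
]


def negate(flag):
    """Turn -ffoo into -fno-foo, GCC's usual spelling for disabling an -f option."""
    if not flag.startswith("-f") or flag.startswith("-fno-"):
        raise ValueError("not a positive -f flag: " + flag)
    return "-fno-" + flag[2:]


def candidates():
    """One CFLAGS string per suspect: -O3 with exactly one extra option disabled."""
    return ["-O3 " + negate(flag) for flag in O3_EXTRAS]


if __name__ == "__main__":
    for cflags in candidates():
        print("CFLAGS_USER = " + cflags)
```

Build and boot once per printed line; a build that no longer crashes points at the flag it disabled. If every build still crashes, more than one flag may be involved, or the cause may not be a single optimization at all (for example the larger binary).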

3. gcc version:

The Linaro gcc 4.7 shows no fps improvement over vanilla FSF 4.6. That isn't really a surprise: many Linaro optimizations target the latest ARM cores, and only the zebra benchmark was tested. Still, as I wrote before, Ubuntu & Linaro have switched to Linaro and it certainly doesn't hurt. I'll run this build on my camera and see if any regressions occur, but since few things changed from gcc 4.6 to 4.7 I don't expect any.

1%

  • Developer
  • Hero Member
  • *****
  • Posts: 5936
  • 600D/6D/50D/EOSM/7D
Re: Benchmarking & CFLAGS
« Reply #1 on: September 08, 2012, 02:51:26 AM »
I put -O3 in and the binary size got bigger. It's compiling the modules from .c files with -O3. Will try another option.


I put O3 everywhere and no further changes:

Code:
%.s: %.c
	$(call build,CC -S,$(CC) $(CFLAGS) -S -o $@ $<)
%.o: $(SRC_DIR)/%.c
	$(call build,CC,$(CC) $(CFLAGS) -c -O3 -o $@ $<)
%.o: $(PLATFORM_DIR)/%.c
	$(call build,CC,$(CC) $(CFLAGS) -c -O3 -o $@ $<)
%.i: %.c
	$(call build,CPP,$(CC) $(CFLAGS) -E -c -O3 -o $@ $<)
%: %.c
	$(call build,LD,$(CC) $(CFLAGS) -o $@ $<)
%.o: %.S
	$(call build,AS,$(CC) $(AFLAGS) -c -O3 -o $@ $<)
%.bin: %
	$(call build,OBJCOPY,$(OBJCOPY) -O binary $< $@)

I tried the cflags user and it didn't change size.

Putting those flags in CFLAGS produces:

Code:
/home/user/arm-toolchain462/lib/gcc/arm-elf/4.6.2/../../../../arm-elf/bin/ld: error: dm-spy.o uses VFP instructions, whereas magiclantern does not
/home/user/arm-toolchain462/lib/gcc/arm-elf/4.6.2/../../../../arm-elf/bin/ld: failed to merge target specific data of file dm-spy.o
collect2: ld returned 1 exit status
make[1]: *** [magiclantern] Error 1
make[1]: Leaving directory `/home/user/magic/tragic-lantern/platform/600D.102'
make: *** [600D] Error 2

a1ex

  • Administrator
  • Hero Member
  • *****
  • Posts: 12564
Re: Benchmarking & CFLAGS
« Reply #2 on: September 08, 2012, 07:40:43 AM »
The -O3 binary may not load on 60D because it may be too big. Try adding this in CFLAGS: -DCONFIG_5D3_MINIMAL . This macro doesn't enable any 5D3-specific things, but only enables a minimal feature set (this) and the benchmarks are enabled. I've tried the macro on 5D2.

Marsu42

Re: Benchmarking & CFLAGS
« Reply #3 on: September 08, 2012, 10:14:34 AM »
Quote from 1%: "I tried the cflags user and it didn't change size."

On my box, CFLAGS_USER for some reason or another sometimes isn't picked up by the main Makefile. Make sure it gets through by running make V=1 and looking at the gcc lines! Or use the fool-proof way and put the flags in Makefile.inc anyway. The problem simply is that the current Makefile is indeed quite a mess.

Quote from a1ex: "The -O3 binary may not load on 60D because it may be too big. Try adding this in CFLAGS: -DCONFIG_5D3_MINIMAL . This macro doesn't enable any 5D3-specific things, but only enables a minimal feature set (this) and the benchmarks are enabled. I've tried the macro on 5D2."

I used the 60D-only autoexec.bin from the 60D.111 directory. Should adding -DCONFIG_5D3_MINIMAL do anything to this, i.e. is the 5D2/5D3 code compiled even into the 60D binary?

But you might be correct: with -O2 I get "Free Memory: 70k + 1226k" ... what's the meaning of this anyway? Maybe you could add an explanation in the source, like "70k flash + 1226k ram".

1%

Re: Benchmarking & CFLAGS
« Reply #4 on: September 08, 2012, 05:10:30 PM »
I did get the options in using makefile.inc... O3 optimizations are working but...

The processor/architecture flags do not work: all modules compile successfully, but it won't put the bin together from them.

Error is:
whatever.o uses VFP instructions, whereas magiclantern does not

How do I turn on VFP? I think it's virtual floating point.

Marsu42

Re: Benchmarking & CFLAGS
« Reply #5 on: September 08, 2012, 06:17:39 PM »
Quote from 1%: "How do I turn on VFP? I think it's virtual floating point."

I changed nothing further in the Makefile. What compiler are you using? With the gcc versions I compiled myself, it doesn't throw any error when using -mtune or -mcpu. Try the ARM binary from Launchpad if it works on your platform (it doesn't on my x64 Ubuntu). And if you want, I can upload my gcc build archive somewhere; the one from ARM on Launchpad has some bugs I had to work around.

1%

Re: Benchmarking & CFLAGS
« Reply #6 on: September 08, 2012, 07:20:09 PM »
I'm using GCC from summon arm. I think it's arm-elf. I'll update to a newer gcc and see if that helps. I'm using the march stuff. How do I add -mtune or -mcpu?

Marsu42

Re: Benchmarking & CFLAGS
« Reply #7 on: September 08, 2012, 07:48:04 PM »
Quote from 1%: "I'm using GCC from summon arm. I think it's arm-elf. I'll update to a newer gcc and see if that helps. I'm using the march stuff. How do I add -mtune or -mcpu?"

Maybe you should try arm-none-eabi, that's what I've compiled. I added the march stuff by using CFLAGS_USER (see first post), but you can also do it the foolproof way by replacing every "-march=armv5te" in the Makefile with "-march=armv5te -mcpu=arm946e-s".
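
The "foolproof" replacement can be done mechanically instead of by hand. This is a rough sketch that works on the Makefile contents as a string; the function name is invented for illustration, and it assumes the flag is always spelled exactly "-march=armv5te".

```python
import re

MARCH = "-march=armv5te"
MCPU = "-mcpu=arm946e-s"

# Match -march=armv5te only as a whole word, and only when it is not
# already followed by the -mcpu flag, so running this twice is harmless.
_PATTERN = re.compile(
    re.escape(MARCH) + r"(?!\S)(?!\s+" + re.escape(MCPU) + r")"
)


def add_mcpu(makefile_text):
    """Append -mcpu=arm946e-s after every bare -march=armv5te."""
    return _PATTERN.sub(MARCH + " " + MCPU, makefile_text)


if __name__ == "__main__":
    sample = "CFLAGS += -Os -mthumb-interwork -march=armv5te -D__ARM__\n"
    print(add_mcpu(sample), end="")
```

Read the Makefile, pass its text through add_mcpu, and write it back; keep a copy of the original in case the toolchain rejects the combination.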

1%

Re: Benchmarking & CFLAGS
« Reply #8 on: September 08, 2012, 08:07:01 PM »
Well, I seem to have gotten it to stop giving the error when I add just -mcpu instead of both -march and -mcpu. File size goes down a few kB. -march was already set in most of the things built.

What about -mfloat-abi?


Marsu42

Re: Benchmarking & CFLAGS
« Reply #9 on: September 08, 2012, 08:30:13 PM »
Quote from 1%: "What about -mfloat-abi?"

Though I am rather intimate with OpenWrt and my MIPS router, I'm no ARM compiler insider :-o ... What I did was just compile Linaro 4.7 with the ARM scripts from Launchpad (and not summon-arm) and add the flags nomad suggested; other than that I left the whole thing as it was, except for -O2 and -O3. For other options I can only read the docs, too: http://gcc.gnu.org/onlinedocs/gcc/ARM-Options.html ... and in this case it'll tell you to set it to soft, because the ARM core inside the EOS cameras certainly doesn't have a floating point unit.

1%

Re: Benchmarking & CFLAGS
« Reply #10 on: September 08, 2012, 08:36:07 PM »
Yea, I tried softfp and it kept looking for SD card. Hard doesn't compile.

I get a performance boost from -mcpu for sure. Going to try the none-eabi toolchain next and see if any improvements.

None-eabi doesn't produce much difference but will take mcpu/march together. Same binary size. Compiles a little faster tho.

miyake

  • Developer
  • Senior
  • *****
  • Posts: 396
Re: Benchmarking & CFLAGS
« Reply #11 on: September 16, 2012, 06:16:34 PM »
@Marsu42

Do you still have a problem with CFLAGS_USER?
I just compiled Linaro 4.7 (/arm-none-eabi-gcc-4.7.2) and tested it.
I set these flags.
Code:
CFLAG_USER = -DCONFIG_AUDIO_600D_DEBUG -march=armv5te -mcpu=arm946e-s -O3 \

Then "make 600D V=1"
Code:
~/sat/bin/arm-none-eabi-gcc-4.7.2 -Wp,-MMD,./.zebra.o.d -Wp,-MT,zebra.o -nostdlib -fomit-frame-pointer -fno-strict-aliasing -DCONFIG_MAGICLANTERN=1 -DCONFIG_600D=1 -DRESTARTSTART=0xC80100 -DROMBASEADDR=0xFF010000 -DVERSION=\"v2.3.NEXT.2012Sep17.600D102\" -DCONFIG_DEBUGMSG=0  -Os -Wall -W -mstructure-size-boundary=32 -Wno-unused-parameter -Wno-implicit-function-declaration -Wno-unused-function -Wno-missing-field-initializers -Wno-format -std=gnu99 -D__ARM__ -I. -I../../src  -DCONFIG_AUDIO_600D_DEBUG -march=armv5te -mcpu=arm946e-s -O3  -c -o zebra.o ../../src/zebra.c

My flags are confirmed.
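
Note that the logged gcc line contains both -Os (from the default flags) and -O3 (from the user flags). GCC documents that when several -O options are given, the last one is the one that takes effect, so this build is compiled with -O3. A small sketch for checking this on any line copied from the "make V=1" output; the function name is made up for illustration.

```python
import shlex

# Suffixes that make up a real optimization option: -O, -O0..-O3, -Os, -Ofast.
_LEVELS = ("", "0", "1", "2", "3", "s", "fast")


def effective_opt_level(command_line):
    """Return the last -O option on a gcc command line, or None if there is none."""
    level = None
    for arg in shlex.split(command_line):
        # "-o file" (lowercase) is the output option and is ignored here.
        if arg.startswith("-O") and arg[2:] in _LEVELS:
            level = arg
    return level


if __name__ == "__main__":
    line = ("arm-none-eabi-gcc-4.7.2 -Os -Wall -march=armv5te "
            "-mcpu=arm946e-s -O3 -c -o zebra.o ../../src/zebra.c")
    print(effective_opt_level(line))
```

If this reports the default level instead of the one you set, the user flags are being placed before the defaults, or are not reaching gcc at all.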



 -march=armv5te -mcpu=arm946e-s -O3
Code:
[miyake@MLdev32 magic-lantern600daudio]$ ll platform/600D.102/autoexec.bin
-rwxrwxr-x. 1 miyake miyake 471040 Sep 17 01:10 platform/600D.102/autoexec.bin

 -march=armv5te -mcpu=arm946e-s
Code:
[miyake@MLdev32 magic-lantern600daudio]$ ll platform/600D.102/autoexec.bin
-rwxrwxr-x. 1 miyake miyake 319488 Sep 17 01:12 platform/600D.102/autoexec.bin

no additional flags
Code:
[miyake@MLdev32 magic-lantern600daudio]$ ll platform/600D.102/autoexec.bin
-rwxrwxr-x. 1 miyake miyake 319488 Sep 17 01:13 platform/600D.102/autoexec.bin


I think "-march=armv5te -mcpu=arm946e-s" has no effect on the binary size....

1%

Re: Benchmarking & CFLAGS
« Reply #12 on: September 16, 2012, 06:39:55 PM »
The user flags don't work so well for me. I had to put them in makefile.inc. That shrank the binary by about 5 kB and gave a slight speed boost. ML dialogs sometimes drew really fast and you could see the Canon screen underneath for a millisecond or so. Piggybacking is disabled. I think -O3 gave most of the results, the other flags only a tiny bit. I'm using precompiled 4.6; I wonder if there is any difference with 4.7.

miyake

Re: Benchmarking & CFLAGS
« Reply #13 on: September 16, 2012, 07:44:44 PM »
Not yet confirmed on an actual camera.
I just saw that ML menus and overlay items (histogram, waveform) redraw really fast.
*Maybe* CBR 3.0 is more stable. I think some ML tasks free up resources, so the buffer-full issue is reduced.
And the focus peak bench is about 3 fps faster.
We probably can't use the big binary because of the size issue, but it's a good reason to change the compiler environment.

big binary size.
Code:
-rwxrwxr-x. 1 miyake miyake 2723844 Sep 17 01:18 platform/all/autoexec.bin

Will try the focus peak bench on 4.6 and 4.7 with -O3.

miyake

Re: Benchmarking & CFLAGS
« Reply #14 on: September 16, 2012, 07:49:40 PM »
If you want, overwrite this script in your git-cloned folder.
http://chirari.ddo.jp/pub/betauploader/OnePercent/

Benchmark

linaro gcc 4.7 -O3
23sec => 43fps

gcc 4.6.2 -O3
23sec => 43fps

w/o O3
36fps
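
For reference, going from 36 fps to 43 fps is a gain of roughly 19%. A one-function sketch of the arithmetic (the function name is just for illustration):

```python
def speedup_percent(baseline_fps, new_fps):
    """Relative fps gain of new_fps over baseline_fps, in percent."""
    return (new_fps - baseline_fps) / baseline_fps * 100.0


if __name__ == "__main__":
    # 36 fps without -O3, 43 fps with -O3 (same result for gcc 4.6.2 and Linaro 4.7).
    print("%.1f%%" % speedup_percent(36, 43))
```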


*******And found CFLAG_USER issue**************
You may need to move CFLAG_USER to the final line of your Makefile.user.
http://www.magiclantern.fm/forum/index.php?topic=2582.0

1%

Re: Benchmarking & CFLAGS
« Reply #15 on: September 16, 2012, 08:41:00 PM »
The only benefit from Linaro was that it took the ARM-specific flags. If you used -mtune instead of -mcpu, it worked on GCC and performance/size was similar. Compile time went down a little on Linaro so I kept it. Binary size is fine; it's only a problem for unified.

miyake

Re: Benchmarking & CFLAGS
« Reply #16 on: September 16, 2012, 08:47:13 PM »
hmm.
Code:
arm-none-eabi-gcc-4.7.2: error: unrecognized command line option '-mtune'
arm-none-eabi-gcc-4.7.2: error: unrecognized command line option '-mcpu'

1%

Re: Benchmarking & CFLAGS
« Reply #17 on: September 16, 2012, 11:01:35 PM »
-mtune=arm946e-s, etc.; that wasn't all typed out. It should take the parameters.

Marsu42

Re: Benchmarking & CFLAGS
« Reply #18 on: September 20, 2012, 11:15:26 AM »
Quote from a1ex: "The -O3 binary may not load on 60D because it may be too big. Try adding this in CFLAGS: -DCONFIG_5D3_MINIMAL."

Thanks again, that did it, -O3 now works on my 60D, too.

nikfreak

  • Developer
  • Hero Member
  • *****
  • Posts: 1197
Re: Benchmarking & CFLAGS
« Reply #19 on: July 15, 2014, 04:51:05 PM »
Did some testing regarding autoexec binary sizes with different flags and a fresh VM with GCC 4.8.4. Posted values are for building 6D.113. All changes were made in Makefile.setup, and this is how it looks by default on bitbucket:
....
ifeq ($(TARGET_COMPILER), arm-gcc)
   CFLAGS += -Os -mthumb-interwork -march=armv5te \
            -D__ARM__
endif
....

Compiling that outputs autoexec.bin: 464016 bytes

Next try:
....
ifeq ($(TARGET_COMPILER), arm-gcc)
   CFLAGS += -Os -mthumb-interwork -march=armv5te -mfloat-abi=softfp \
            -D__ARM__
endif
....

autoexec.bin: 463440 bytes

Next try:
....
ifeq ($(TARGET_COMPILER), arm-gcc)
   CFLAGS += -O2 -mthumb-interwork -march=armv5te \
            -D__ARM__
endif
....

autoexec.bin: 500400 bytes

Next try:
....
ifeq ($(TARGET_COMPILER), arm-gcc)
   CFLAGS += -O3 -mthumb-interwork -march=armv5te \
            -D__ARM__
endif
....

autoexec.bin: 597936 bytes

Next try:
....
ifeq ($(TARGET_COMPILER), arm-gcc)
   CFLAGS += -O3 -mthumb-interwork -march=armv5te -mtune=arm946e-s \
            -D__ARM__
endif
....

autoexec.bin: 597360 bytes
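
To put the sizes above in relation: compared with the default build, -O2 comes out about 8% larger and -O3 about 29% larger, while the softfp and -mtune variants each move the size by well under 1%. A small sketch that prints the comparison from the numbers posted above; the labels are shortened and the function name is made up for illustration.

```python
BASELINE = 464016  # default flags: -Os -mthumb-interwork -march=armv5te

# (changed flags, autoexec.bin size in bytes), as posted above for 6D.113
BUILDS = [
    ("-Os -mfloat-abi=softfp", 463440),
    ("-O2", 500400),
    ("-O3", 597936),
    ("-O3 -mtune=arm946e-s", 597360),
]


def growth_percent(size, baseline=BASELINE):
    """Size change relative to the baseline build, in percent."""
    return (size - baseline) / baseline * 100.0


if __name__ == "__main__":
    for flags, size in BUILDS:
        print("%-24s %7d bytes  %+6.2f%%" % (flags, size, growth_percent(size)))
```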

I expect to have my cam in hand once I am back home. Could increased binary sizes brick my brand-new cam? Of course I wouldn't use them for now, because I have no clue what to benchmark and would first need to get a feeling for the default nightlies, but the flags tried above (mfloat-abi=softfp / mtune=arm946e-s / O2 / O3), or a combo of them, might be an option to compile with and test later. At least my VM is set up to build ML, so I tried setting some flags and comparing the output. I just need to get comfortable with hg and bitbucket, as I have only used git to fetch code from github while compiling Android ROMs...
70D.112 & 100D.101