fyi: gcc 4.9.3 breaks ml (sometimes)

Started by Marsu42, July 13, 2015, 09:26:58 PM


Marsu42

Just to spare all you good people some trouble and b/c it might be the natural inclination of the hardware hacker to use bleeding edge software, too...

... the latest gcc 4.9.3 (from launchpad) breaks my 60d build, the led activates on boot and that's that. It's working just fine though with gcc 4.8.4, so some new optimizations don't like the ml source or there are some bugs in 4.9.

Strangely, compiling the very same source with 4.9.3 for *6d* works just fine. Go figure :-p

nikfreak

no problems with 70D here for the past 6 months trying 2015 q1 / q2 gcc from launchpad. maybe related to digic4? did you already try to redefine the path to libc.a in makefile.user.default? It points to ml source dir by default.

On a side note: has anybody tried to use newlib-nano already? My intention was to unify the build and drop dietlibc, to see what I would get in terms of binary size when using only newlib-nano. Having a single libc would be much easier (also for newbies)... In the end I was able to compile an autoexec.bin w/o the map files by defining gcc instead of ld as the linker, in combination with -specs=nano.specs. I had some small success and reduced autoexec.bin by a few kbytes, but it didn't feel like I was purely using the nano libs. Guess I have to wait for someone who has some experience with newlib-nano.
70D.112 & 100D.101

Marsu42

Quote from: nikfreak on July 13, 2015, 09:37:58 PM
no problems with 70D here for the past 6 months trying 2015 q1 / q2 gcc from launchpad. maybe related to digic4?

Must be *something* that works with 4.8 but not 4.9 on the 60d, so maybe really some digic4-stuff gets mis-compiled. Not that important, it's just a ~20k smaller binary with the newer gcc - since the memory issues have been resolved, I'm not that much into optimizing for size but rather for speed.

Quote from: nikfreak on July 13, 2015, 09:37:58 PM
did you already try to redefine the path to libc.a in makefile.user.default. It points to ml source dir by default.

Um, never heard of that idea. What are you talking about exactly? The only libc.a I see in makefile.user.default is the "NEWLIB_LIBC_A=$(NEWLIB_PATH)/libc.a" line, which indeed ends up in the src/libs dir - but where should I point it instead?

nikfreak

Modify the newlib path to your downloaded 4.9.3 gcc arm path which contains libc.a or libc_nano.a.
ML extracts needed stuff like memcpy / atoi etc. from different libs, including dietlibc. As far as I recall, once you change the path you would additionally delete the line with memcpy-stub.o in makefile.src, as it is obsolete.

Quote from: Marsu42 on July 14, 2015, 02:22:34 AM
...I'm not that much into optimizing for size but rather for speed.

I can report better performance on raw zebras with O2 / O3, also read32/64, but no positive effects on write speed. Therefore I tried to see if I can replace dietlibc with newlib-nano and compare the end results.

Marsu42

Quote from: nikfreak on July 14, 2015, 06:43:34 AM
Modify the newlib path to your downloaded 4.9.3 gcc arm path which contains libc.a or libc_nano.a.

Thanks, two questions though even if I appear rather incompetent (again :-p)...

1. What's the difference between vanilla libc and _nano, and what's the upside of manually specifying the _nano flavor in the Makefile.user?

2. Just replacing the path doesn't work for me, the compiler error is (cygwin with launchpad gcc) ... any hints?


make -C  /cygdrive/H/SHARED/ml/ml-modm42/platform/60D.111
make[1]: Entering directory '/cygdrive/H/SHARED/ml/ml-modm42/platform/60D.111'
[ VERSION  ]   ../../platform/60D.111/version.bin
[ VERSION  ]   ../../platform/60D.111/version.c
[ CC       ]   version.o
make -C ../../tcc
make[2]: Entering directory '/cygdrive/H/SHARED/ml/ml-modm42/tcc'
make[2]: Nothing to be done for 'all'.
make[2]: Leaving directory '/cygdrive/H/SHARED/ml/ml-modm42/tcc'
make[1]: *** No rule to make target 'C:\cygwin\opt\gcc-arm-484\arm-none-eabi\lib\libc.a', needed by 'lib_a-setjmp.o'.  Stop.
make[1]: Leaving directory '/cygdrive/H/SHARED/ml/ml-modm42/platform/60D.111'
Makefile:18: recipe for target '60D' failed
make: *** [60D] Error 2


Quote from: nikfreak on July 14, 2015, 06:43:34 AM
I can report better performance on raw zebras with O2 / O3 also read32/64 but no positive effects on write speed. Therefore I tried to see if I can replace dietlibc with the newlib-nano and compare the end results

Well, write speed is a train wreck on the 6d anyway, so never mind that :-p ... but somewhat faster overlays are definitely a plus as it's immediate better usability.

nikfreak

Can't speak for cygwin. On Ubuntu I have no probs with using the delivered libc.a from the 4.9.3 launchpad embedded gcc. libc_nano.a is said to have the same or better performance compared to dietlibc. That's what I've seen in forum discussions on the inet. You should first try to use the libc.a from 4.9.3 rather than libc_nano.a and see if that solves your 60D issue.

Btw: I think you're using 4.9.3, but are you sure?

Quote from: Marsu42 on July 14, 2015, 10:14:22 AM
...
make[1]: *** No rule to make target 'C:\cygwin\opt\gcc-arm-484\arm-none-eabi\lib\libc.a', needed by 'lib_a-setjmp.o'.  Stop.
...
Well, write speed is a train wreck on the 6d anyway, so never mind that :-p ... but somewhat faster overlays are definitely a plus as it's immediate better usability.

As said, more fluid overlays / zebras are welcome for me too, but I think a1ex is more interested in a small binary size than in blowing it up. Once the following PRs are merged there should be some room for gcc flag-related optimizations:

https://bitbucket.org/hudson/magic-lantern/pull-request/610/benchmarks-and-self-tests-refactored-as/diff
https://bitbucket.org/hudson/magic-lantern/pull-request/603/dng-module-wip/diff

Marsu42

Quote from: nikfreak on July 14, 2015, 11:53:30 AM
On Ubuntu I have no probs with using the delivered libc.a from 4.9.3 launchpad embedded gcc. libc_nano.a is said to have same or better performance compared to dietlibc. That's what I've seen on forum discussions on the inet.

My 6d works fine with both launchpad's 4.9 libc.a and libc_nano.a, and from what I understand I should prefer the latter? Btw with 4.9 I really had to delete the memcpy-stub line while with 4.8 it was working 'as is'.

Most likely the differences are minor anyway, but makes you feel special to have a custom ML running :-p

Quote from: nikfreak on July 14, 2015, 11:53:30 AM
You should first try to use the libc.a from 4.9.3 rather than libc_nano.a and see if that solves your 60D issue.

Nope, still crashes on start - but it works fine with the 4.8.4 launchpad libc, so the blame seems to lie with the compiler, or with some ml boot code that gets garbled on digic4. Probably there's a reason why the default makefile uses 4.8 :-p

Quote from: nikfreak on July 14, 2015, 11:53:30 AM
Can't speak for cygwin.

I solved that: as the gcc binary is a mingw/windows build, it wants the libc path in windows notation - unlike the rest of the cygwin makefile, which wants unix-ish paths. It's easy to get confused when mixing these two systems, but cygwin is just darn convenient for a quick ml compile.
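
For anyone hitting the same thing, here is a minimal sketch of that path translation in C. It assumes the usual /cygdrive/<letter>/ mount prefix. The name cygdrive_to_windows is made up for illustration and is not part of ml or cygwin. In practice cygwin's own `cygpath -w` does this conversion and handles cases this sketch ignores.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: turn "/cygdrive/<letter>/rest" into "<LETTER>:\rest".
 * Returns 0 on success, -1 if the input is not a /cygdrive path or the
 * output buffer is too small. */
int cygdrive_to_windows(const char *in, char *out, size_t out_size)
{
    static const char prefix[] = "/cygdrive/";
    const size_t plen = sizeof(prefix) - 1;

    assert(in != NULL && out != NULL);

    if (strncmp(in, prefix, plen) != 0)
        return -1;

    char drive = in[plen];
    if (!isalpha((unsigned char)drive))
        return -1;

    const char *rest = in + plen + 1;
    if (*rest != '\0' && *rest != '/')
        return -1;                      /* e.g. "/cygdrive/cc/..." */

    size_t need = 2 + strlen(rest) + 1; /* "X:" + rest + NUL */
    if (need > out_size)
        return -1;

    out[0] = (char)toupper((unsigned char)drive);
    out[1] = ':';

    size_t i = 2;
    for (; *rest != '\0'; rest++)
        out[i++] = (*rest == '/') ? '\\' : *rest; /* flip the separators */
    out[i] = '\0';
    return 0;
}
```

Paths that do not start with /cygdrive/ are rejected rather than guessed at, since cygwin mount points like /opt depend on the local install.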

Quote from: nikfreak on July 14, 2015, 11:53:30 AM
Once the following PR's are merged

Yeah, right - but it's rather *if* than once :-p though I do like the dng module approach. Really not picking on alex here, but the time doesn't seem very convenient for big ml refactoring... real life tends to get in the way and code beautification might not make it to the top of the "most wanted" list.

jpaana

4.9 also miscompiled 5D3 and M builds last time I checked, same symptoms, ie. the led stays on and nothing else happens and neither of them has Digic 4.

Marsu42

Quote from: jpaana on July 20, 2015, 10:35:32 AM
4.9 also miscompiled 5D3 and M builds last time I checked, same symptoms, ie. the led stays on and nothing else happens and neither of them has Digic 4.

Doh - so someone has to sit down, look at the 4.8 -> 4.9 changes and try to track down the culprit by disabling new optimizations.

Btw I faintly remember compiling 60d with 4.9 quite some time ago, so the bug might have been introduced in a very recent 4.9 minor release - if I'm not mistaken, going back some revisions on launchpad might be worth a try if anyone is desperate for a 4.9 compile (resulting in a slightly smaller binary).

a1ex

I've got the 60D working after disabling CONFIG_TSKMON, then narrowed the issue down to null_pointer_check. Then I've got the following minimal example that causes the crash:


    int value_at_zero = MEM(0);
    /* note: MEM is defined as: #define MEM(x) *(volatile uint32_t*)(x) */


ASM code created by gcc 4.9:

.text:1FE3818C 00 30 A0 E3                 MOV     R3, #0
.text:1FE38190 00 30 93 E5                 LDR     R3, [R3]
.text:1FE38194 F0 00 F0 E7                 UND     #0


ASM by gcc 4.8:

.text:1FE382D4 00 30 A0 E3                 MOV     R3, #0
.text:1FE382D8 00 30 93 E5                 LDR     R3, [R3]
.text:1FE382DC 1E FF 2F E1                 BX      LR


What the duck?!

It appears to be related to this option:
Quote
-fdelete-null-pointer-checks
    Assume that programs cannot safely dereference null pointers, and that no code or data element resides at address zero. This option enables simple constant folding optimizations at all optimization levels. In addition, other optimization passes in GCC use this flag to control global dataflow analyses that eliminate useless checks for null pointers; these assume that a memory access to address zero always results in a trap, so that if a pointer is checked after it has already been dereferenced, it cannot be null.

    Note however that in some environments this assumption is not true. Use -fno-delete-null-pointer-checks to disable this optimization for programs that depend on that behavior.

    This option is enabled by default on most targets. On Nios II ELF, it defaults to off. On AVR and CR16, this option is completely disabled.

    Passes that use the dataflow information are enabled independently at different optimization levels.
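
To make the quoted "checked after it has already been dereferenced" case concrete, here is a small C sketch. The function names are made up for illustration and are not ml code. Whether a given gcc version actually drops the check depends on the optimization level and target.

```c
#include <stddef.h>

/* Dereference first, check afterwards. With -fdelete-null-pointer-checks
 * the compiler is allowed to reason "p was already dereferenced, so it
 * cannot be NULL" and remove the check below. */
int load_then_check(const int *p)
{
    int v = *p;
    if (p == NULL)      /* may be optimized away */
        return -1;
    return v;
}

/* Check first, dereference afterwards. The check comes before any
 * dereference, so the compiler has no basis to remove it. */
int check_then_load(const int *p)
{
    if (p == NULL)
        return -1;
    return *p;
}
```

On a desktop OS both functions behave the same for valid pointers, and the first one simply faults on NULL. On a target where address zero is readable memory, the two can diverge, which is the environment the gcc note warns about.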

edit: seems to work, https://bitbucket.org/hudson/magic-lantern/pull-requests/671/updated-to-gcc-493
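
A minimal sketch of the kind of read involved, for reference. read_word is a made-up name, not an ml function, and the macro is the one from the comment above with an extra pair of parentheses. The assumption here is that the file gets compiled with -fno-delete-null-pointer-checks (the counterpart named in the gcc text quoted above) so that a call like read_word(0) on the camera is treated as an ordinary load. The thread itself does not spell out what the pull request changes.

```c
#include <assert.h>
#include <stdint.h>

/* Same idea as the MEM macro quoted earlier, parenthesized for safety. */
#define MEM(x) (*(volatile uint32_t *)(x))

/* Hypothetical helper: read one 32-bit word from an absolute address.
 * On the camera, address 0 is a valid location to read from; on a PC
 * only pass addresses of real objects. */
uint32_t read_word(uintptr_t addr)
{
    assert((addr & 3u) == 0);   /* expect word-aligned addresses */
    return MEM(addr);
}
```

On a PC this can only be tried with the address of an existing variable, e.g. read_word((uintptr_t)&some_uint32). Reading address 0 there will fault no matter which flags are used.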