Did you configure everything the zImage boot path needs? IIRC the initial decompressor needs a separate memory area, which is set with an ordinary CONFIG_whatever option.
About the context switches: the actual switch to user mode happens in arch/arm/kernel/entry-common.S, around ret_to_user(), in a macro called restore_user_regs. The program counter does end up in the user process. The disassembly in gdb shows its asm source, and the address matches the one printed in load_flat_binary().

However, single-stepping the first instruction ("mov r2, 0x39") does not appear to execute it. Instead gdb freezes, and a ^C breaks in properly but shows the machine in a strange state: pc is 0xc, i.e. __vectors_start+12, which holds the jump to vector_pabt. Offset 0xc is the prefetch abort vector on ARM. lr contains just 0x10, and the jump never seems to happen, as if fetching it triggers another abort. That would fit: on a prefetch abort, lr is set to the aborted instruction's address + 4, so lr = 0x10 means the fetch at 0xc itself aborted.

I wonder if the memory is not set up correctly, or if some cache flag goes wrong in the context switch mangling.