Messages - sooda

#1
Did you configure all the necessary things for the zImage way? The initial decompressor needs a separate memory area that is a normal CONFIG_whatever setting IIRC.

About the context switches: the actual switch happens in arch/arm/kernel/entry-common.S, around ret_to_user(), in a macro called restore_user_regs. I can see that the program counter goes properly into the user process (disassembly in gdb shows its asm source, and the address is the same one that is printed in load_flat_binary()). However, a single step at the first "mov r2, 0x39" does not appear to execute it; gdb freezes instead. A ^C breaks properly but shows the machine in a strange state: pc is 0xc, in __vectors_start+12, which contains a jump to vector_pabt, i.e. an abort exception, and lr contains just 0x10 (the jump never happens though, as if it triggered another abort right away). Looks like 0xc is the prefetch abort vector on ARM. I wonder if the memory is not set up correctly, or if some cache flag goes wrong in the context switch mangling?
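
For reference, the classic ARM exception vector table has eight 4-byte slots, so a pc of 0xc lands in slot 3. A minimal C sketch of that mapping follows; the offsets are the standard ARM ones relative to the vector base, while the names arm_vectors and arm_vector_name are made up for illustration:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Standard ARM exception vectors, one 4-byte slot each,
 * listed in order of their offset from the vector base. */
static const char *const arm_vectors[] = {
    "reset",                 /* 0x00 */
    "undefined instruction", /* 0x04 */
    "supervisor call (SWI)", /* 0x08 */
    "prefetch abort",        /* 0x0c */
    "data abort",            /* 0x10 */
    "reserved",              /* 0x14 */
    "IRQ",                   /* 0x18 */
    "FIQ",                   /* 0x1c */
};

static_assert(sizeof(arm_vectors) / sizeof(arm_vectors[0]) == 8,
              "the table must have exactly eight slots");

/* Hypothetical helper: name the vector slot at a given offset from the
 * vector base. Returns NULL if the offset is not the start of a slot. */
static const char *arm_vector_name(uint32_t offset)
{
    if (offset >= 0x20 || (offset & 3) != 0)
        return NULL;
    return arm_vectors[offset / 4];
}
```

arm_vector_name(0xc) gives "prefetch abort", which matches the jump to vector_pabt seen in gdb. An lr of 0x10 would also be consistent with the instruction fetch at 0xc itself aborting, since on a prefetch abort lr is set to the address of the aborted instruction plus 4.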
#2
I just pulled again and tried qemu with a 700D build. It booted up until the delay loop calibration, where it got stuck every time as it got no timer interrupts. I copied the eos_trigger_int() call also to the "interrupt enable?" branch in eos_handle_timers() and got the timer interrupts properly in Linux, and the boot process continued properly. Edit: Whoops, I didn't notice the change in the Linux patch. Now it works without changes. Great - now that qemu runs Linux, gdb can be used to debug the process starting issues.
#3
That lkml answer is not quite complete, as it only describes exec done from an already existing user process via the execve() syscall. In Linux, context switches often happen implicitly. There is an explicit timer that runs CONFIG_HZ times per second for that, but often the scheduler runs implicitly in e.g. mutex_lock(), and several other places too (grep for cond_resched(), or might_sleep()). Process creation just sets up the process and puts it in the scheduler's task list, where it is started "sometime later", AFAIK. Returning from syscalls is one of those implicit places, like the lkml answer says. Returning from a handled interrupt is another, I guess.

If you put printk()s all over the place where kernel_init() is called and run_init_process() too, you'll see that it just returns. Track schedule() to see that init actually gets set up:


--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -2849,6 +2851,10 @@ asmlinkage __visible void __sched schedule(void)
        struct task_struct *tsk = current;

        sched_submit_work(tsk);
+       printk("schedule %d:%s @ %lx\n", tsk->pid, tsk->comm, instruction_pointer(current_pt_regs()));
        __schedule();
}
EXPORT_SYMBOL(schedule);


Result: continuous prints alternating between init (its ip doesn't increment though :-/) and ksoftirqd. (Not sure if ksoftirqd actually has something to do or not... and its ip appears as zero since it doesn't have a user process. The regs of a process get saved at the end of its stack; for kernel threads, that memory has other uses.)



(the log contains some of my other debug prints too, but that's irrelevant)

I recommend a good book, as these things are not consistently described elsewhere unless you really dig through the whole code - Robert Love's Linux Kernel Development is a recent one. (edit: by consistent, I mean that you have to swim around the internets to find small bits of information at a time. That's probably necessary for some ARM specific stuff anyway, though.)

I'll discuss this at work - we've got many old Linux experts that are probably interested. Will keep you updated.
#4
A compressed initrd is easy with a small change. The size given in the ramdisk atag needs to be the actual, uncompressed size; otherwise I got a "RAMDISK: incomplete write ...". It can also be bigger: say, just set it to 16MB and be done with it until the error appears again, then double the size if necessary. The ramdisk is usually unmounted and its memory freed anyway when the actual rootfs would be mounted from the memory card. When I hardcoded its exact uncompressed size in the bootloader and bzip2'd the image, the userspace init loaded successfully (the config didn't have gzip support enabled; with it set, a gzipped image works too).
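
The "start at 16MB and double when it no longer fits" rule can be sketched as a small helper. This is only an illustration, not the bootloader's actual code: ramdisk_size_for and bytes_to_kib are hypothetical names. The KiB conversion is there because, as far as I recall, the classic ATAG_RAMDISK tag counts its size in KiB rather than bytes; check the tag definition your bootloader uses.

```c
#include <assert.h>
#include <stdint.h>

/* Round a byte count up to whole KiB (for tags that count in KiB). */
static uint32_t bytes_to_kib(uint32_t bytes)
{
    return bytes / 1024 + (bytes % 1024 != 0);
}

/* Hypothetical helper: start from start_bytes and keep doubling until the
 * uncompressed image fits. Returns 0 if doubling would overflow 32 bits. */
static uint32_t ramdisk_size_for(uint32_t uncompressed_bytes,
                                 uint32_t start_bytes)
{
    uint32_t size = start_bytes;

    assert(start_bytes > 0);
    while (size < uncompressed_bytes) {
        if (size > UINT32_MAX / 2)
            return 0; /* cannot double any further */
        size *= 2;
    }
    return size;
}
```

For a 20MB uncompressed image and a 16MB starting guess, this settles on 32MB, which leaves some slack for the image to grow before the error shows up again.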
#5
Thanks for the memory map, that confirms what I've read so far. (edit: oh, and any pointers for qemu? fiddling with the memory card is slowish)

The timer/irq stuff also confuses me. What do you mean by "sometimes it doesnt get the timer interrupt, or looses it and it wont fire again"? I'd like to get a stable led blinker and a proper clock at least for the kernel log... About the irq lookup failure comment in the code, I'm not sure if you first need to register irq numbers to specific handlers and stuff in order to work with that. I've only worked with slightly higher-level irqs before, so not sure about that before actually looking around the code.

About the general questions on what is possible: At first, this is obviously just playing around for fun to see how far we can go. There is this joke that you can easily ping localhost (i.e. have a working network stack just inside the box) but not actually do anything meaningful (such as take pics, or talk with the outer world) on stuff that you install Linux on. I believe it would be easier to write modules as just userspace processes that would use drivers written into the kernel, with proper multiprocessing, maybe even a windowing system. How easy that would be for the end user without a mouse is another question. With the original firmware missing, we obviously have full control of the hardware, but we'd still need to know how exactly to control it. With "normal" ML, we can jump into the original firmware to take photos and all that, but the firmware probably assumes that its own operating system is running, so none of this can be done from Linux.

I also wonder how not having an mmu (memory management unit, providing virtual addresses) affects the userspace. At least any kind of memory protection is not there, as all pointers represent physical memory, and all processes can poke around other processes' memory even if they just get pointers wrong. Less controlled application crashing, more memory corruption. And forking (duplicating processes) is done in some strange way. Linux was not really designed for mmu-less processors, but nowadays it still works kind of well, with certain restrictions.
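
To make the forking part concrete: on no-MMU Linux, fork() is not available, because the parent's address space cannot be duplicated without virtual addresses, so userspace uses vfork() followed immediately by an exec. A minimal sketch of that pattern, with spawn_and_wait as a made-up helper name (it also runs on a normal desktop Linux):

```c
#define _DEFAULT_SOURCE /* for the vfork() declaration under strict -std= modes */
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: run a program and return its exit status,
 * or -1 if it could not be started or did not exit normally. */
static int spawn_and_wait(const char *path, char *const argv[])
{
    int status;
    pid_t pid;

    assert(path != NULL && argv != NULL);

    pid = vfork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        /* Child: it borrows the parent's memory and the parent is suspended
         * until we exec or exit, so do nothing here except exec or _exit. */
        execv(path, argv);
        _exit(127); /* only reached if execv() failed */
    }
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

The restriction in the child is the "strange" part: since parent and child share the same memory until the exec, anything the child writes is written into the parent too.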

One of the best things is that compiling stuff for Linux is kind of easy whatever the hardware is, and all the interfaces are well documented. But as the hw is different, you can't just take gimp built for a desktop pc and copy it to the memory card and expect it to work at all. (ML's module system is good already, though.)

Python or some other scripting language would also be interesting to get working asap as a proof-of-concept, as it wouldn't need cross-compiling (binary libs for it are a different thing). Of course, it's possible to run that also without Linux directly, but anyway.
#6
@g3gg0: yes, that's right. I just used a slightly different place to store the binaries and elf2flt to test things around. And the little readme was missing the instructions for the initrd image, but the usual loop device trick worked. I also didn't need to compile binutils separately since Gentoo's crossdev did that.

I was wondering what's the strange jump-straight-to-loadaddr change in load_flat_binary() (I believe it won't help/work in the long run), and took it away to see what happens. With some debug prints in schedule(), it seems that the process starts up (or is at least put somewhere to be scheduled properly), but its instruction pointer never increments from the start address between the calls to schedule(). Not sure if there's something wrong in some low-level assembly stuff that boots processes up?

Also, I'm slightly confused about some memory addresses (and the kernel is too). Backtraces (from WARN_ON debugging, and the panic seen by all) show correct numerical addresses that can be tracked back to file/line/function name as long as CONFIG_DEBUG_INFO is set to y in .config. However, the function names in the traces are completely bogus. I believe this is because the kernel is loaded as executable-in-place (why's that? it's meant for loading from rom, afaik) into a position (0x00800000) that is smaller than the ram start (0x01000000). How is that possible or is that just a different ram mapping? And why is it done this way in the bootloader? Anyway, to help debugging, I deleted the check "if (s->addr < kernel_start_addr) return 0;" in scripts/kallsyms.c:symbol_valid() as a workaround and got the backtraces working. That sanity check is to disable loading some unnecessary symbol names to the kernel.

About buildroot, I couldn't find support for this particular cpu in its menus, and it also wants to use elf binaries even if mmu support is turned off. We could check whether yocto is any better out-of-the-box.
#7
ok, i can has tools set up. verified that everything gets built by modifying the kernel's console writer into a rainbow, and added my name in the userspace tool that prints hello:



my phone's camera doesn't like blue though.
#8
Ah, I misread the assignment. Now it makes sense. I'll try to dig in but first these toolchains need to be set up...
#9
@g3gg0: could you spare some details? I can see some fb stuff in the patch, does it do anything yet? I grepped for the memory address (0xC0F140D0), and it looks like disp_direct.c puts the actual memory buffer address there. The kernel code just passes that address to the fb driver, which I'd guess uses it directly for writing the pixel data, so a pointer indirection seems to be missing.
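
The suspected missing indirection, sketched in plain C with an ordinary variable standing in for the register at 0xC0F140D0, so that it runs on a PC. This is an assumption about what the fix would look like, not the patch's code: fb_base_from_register is a made-up name, and in the kernel the read would go through the usual MMIO accessors instead of a plain dereference.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* The register does not contain pixels; it HOLDS the address of the
 * pixel buffer. Read it once to find out where the buffer really is. */
static uint8_t *fb_base_from_register(const volatile uintptr_t *reg)
{
    assert(reg != NULL);
    return (uint8_t *)*reg;
}
```

Using the register's own address as the frame buffer base would instead write pixel data over the register (and whatever follows it) rather than into the buffer it points to.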

I'll try to apply the patch on a kernel source and build things from scratch. Buildroot is a good idea too, I've used that before successfully for a whole tiny "distro".
#10
Confirmed for 700D.





This is awesome. I want to help. I know something about Linux source (I play with it for a living) and am willing to study whatever is necessary. For plain ML code too, of course.

(yiss, this is my first actual post.)
#11
User Introduction / Hello from Finland
April 02, 2015, 07:44:32 PM
Hi all,

I registered here about a year ago but have been only reading occasionally for now. We Finns are not that talkative usually. Ever since I was a kid, I've loved to take things apart just to see how they work, and sometimes to modify them to work better. I love nature, puzzles, numbers, bit twiddling and hardware, so naturally low-level hacking and reverse engineering are relevant to my interests. I've done several reverse engineering projects just for fun and taken part in some CTF contests, and C is my other native language. I'm not afraid to swim deep into assembly code either.

I registered initially to better look at ML's internals for my master's thesis project, which was about implementing a 3D scanning rig. ML was one of the reasons to choose Canon DSLRs for that, even though I didn't end up using too many features (yet), but just the possibility to write my own stuff is awesome; I'm all for open source. (Writing the thesis isn't quite finished yet, so I have tried to stay away from ML to keep distractions minimal, as I also have a real job already.)

I've had a 700D for about a year now, shutter count is at ~14K, and I still can't take nice pictures; I only understand numbers and technology, not art. I'm a programmer for life and I patch the Linux kernel for embedded devices for a living, in a company that makes GPUs and stuff. I hope to be useful for ML development too. TBH, ML source code isn't nearly as clean as my daily dose of Linux is (well, Linus is a bigger nazi asshole than you all combined ;)), but I hope that the main ML devs won't turn down all my wild ideas to make it more comfortable. I'm excited to start hacking on new features, general improvements and reverse-engineering, though I do have other projects in my drawers too. The Linux thread was the last straw; I really want to play with that too and to help see if you can only ping localhost, or even run Gimp. Even though I hack Linux source daily, I'm no expert in porting Linux to a new platform, but I've taken part in doing that before.