Dual-core cams direct RPC

Started by names_are_hard, November 05, 2023, 06:21:21 PM

names_are_hard

Digic 7, 8 and X are dual-core parts.  This is distinct from "Dual Digic", which means a cam with two separate Digic chips on different "sockets".  Dual-core is two cores on the same die.

We found task_create_ex() a long time ago, that allows one core to create a task that starts on the other.  This task goes into normal task scheduling, which has pre-emption and priorities, so when it runs, or how often, is not strongly controlled by the code creating the task.

Recently I found a method for direct RPC.  A core can send a function address to another core, which will immediately switch to that code, regardless of current task.  In fact, this works before the task system has been initialised, which is valuable.

All addresses are from 200D 1.0.1 unless otherwise stated.  These are fairly easy to find in other cams; there are lots of RPC-related strings, including references to create_named_semaphore().  There's an associated spinlock global that helps, too.

The top-level DryOS function looks like this, at 0x34b5a:

int _request_RPC(func *param_1, void *func_param)


The first param is a function pointer that takes void * and returns void.  At least on 200D, the param must fit in a single register (no passing large structs by value), because the call happens via a blx r1 at one point.

_request_RPC() disables interrupts, then calls an inner function that does the work.  The target func address is written into a global - there's a single address for this, so there can only be one RPC func in progress at a time.  On the Canon side, a global spinlock is used to ensure no conflicts.  On the ML side, we use a semaphore.

The caller then wakes up all CPUs and sends an interrupt: send_software_interrupt(0xc, cpu_mask);

During early init, each CPU registers a handler for SGI 0xc, at 0x349d6:


void register_SGI_handler_0xc(void)

{
  register_GIC_handler(0x1cc, check_for_RPC, 0);
  return;
}

The handler looks like this, at 0x349ba:

void check_for_RPC(void)
{
  uint cpu_id = get_current_cpu();
  if ((1 << (cpu_id & 0xff) & ~*(uint *)(inter_cpu_spinlock - 4)) == 0) {
    call_RPC_func();
    return;
  }
  return;
}


It looks to me like this is generic code that can cope with up to 8 cores, allowing any core to trigger a function on a masked set of cores (including itself if desired).  Presumably this is library code, and it looks a bit redundant when MAX_CPUS is set to 2.  It seems clear that SGI 0xc is reserved for inter-core comms.

The reason this was relevant to me is that I've been working on MMU remapping code.  For this, you want each CPU to use the remapped addresses as early as possible - otherwise it will run code that uses the old content, a particular problem if that code initialises an external device.  Prior to finding this RPC code, I had no generic way to get cpu1 to see patched memory before it initialised its tasks.  Now, for all the D78X cams I've checked, it's possible to get cpu1 to take patched mem before it starts init1_task...  which means all tasks on both cpus will see our updated memory content.

As a side benefit this means we can intercept init1_task in the same way we do for init_task on cpu0.

Using the DryOS functions directly is a little tricky.  When a cpu wakes up and calls the target RPC function, it's in a loop, and will keep calling it until the function pointer gets set to NULL!  In practice, this means the target function is responsible for clearing the global, or the cpu will loop forever, constantly calling it.  You must also ensure the passed-in params remain valid until the RPC call completes - this is easy to forget, since the call happens on cpu1; if you use local vars for storage, they live on the cpu0 stack, and your function may return before cpu1 reads from that stack...

Consequently, in ML code I wrap this in request_RPC(), which tries to make things safe, in dryos_rpc.c:


struct RPC_args
{
    void (*RPC_func)(void *);
    void *RPC_arg; // argument to be passed to above func
};

int request_RPC(struct RPC_args *args)
{
    extern int _request_RPC(void (*f)(void *), void *o);

    // we can only have one request in flight, cpu0 takes sem
    int sem_res = take_semaphore(RPC_sem, 0);
    if (sem_res != 0)
        return -1;

    // storage for RPC params must remain valid until cpu1 completes request,
    // don't trust the caller to do this.
    static struct RPC_args RPC_args = {0};
    RPC_args.RPC_func = args->RPC_func;
    RPC_args.RPC_arg = args->RPC_arg;

    int status = _request_RPC(do_RPC, (void *)&RPC_args);
    return status;
}

void do_RPC(void *args)
{
    extern int clear_RPC_request(void);
    struct RPC_args *a = (struct RPC_args *)args;
    a->RPC_func(a->RPC_arg);
    clear_RPC_request();
    a->RPC_func = 0;
    a->RPC_arg = 0;
    // request complete, cpu1 releases sem
    give_semaphore(RPC_sem);
}