At present, the spufs code has the invariant that a context is only ever loaded to an SPE when it is being run; i.e., a thread is calling the spu_run syscall on the context.

However, there are situations where we may want to load the context without it being run. For example, using the SPU's DMA engine from the PPE requires the PPE thread to write to registers in the SPU's problem-state mapping (psmap). Faults on the psmap area can only be serviced while the context is loaded, so they will block until someone runs the context. Ideally, we could allow such accesses to the psmap without the spu_run call. We also need to allow contexts to be loaded outside of spu_run to implement gang scheduling correctly and efficiently.

So, I've been working on some experimental changes to allow "external scheduling" for SPE contexts. The "external" refers to a thread external to the SPE's usual method of scheduling (i.e., its owning thread calling spu_run). In the example above, the external schedule would be caused by the fault handler for the problem-state mapping.

Although a context may be scheduled to an SPE, we still can't always guarantee forward progress. For example, in the "use the psmap to access the DMA engine" scenario, a DMA may cause a major page fault, which needs a controlling thread to service. In this case, the only way to ensure forward progress is through calling spu_run. However, I have some ideas on how we can remove this restriction later.

the interface

First up, we need to tell the spufs scheduler that we want a context to be loaded:

/*
 * Request an 'external' schedule for this context.
 * The context will be either loaded to an SPU, or added to the run queue,
 * depending on SPU availability.
 * Should be called with the context's state mutex locked, and the context
 * in SPU_STATE_SAVED state.
 */
int spu_request_external_schedule(struct spu_context *ctx);

spu_request_external_schedule and its counterpart, spu_cancel_external_schedule, are implemented by incrementing and decrementing a count of "external schedulers" on the context. If multiple threads request an external schedule, the first will activate the context. When the last thread calls the cancel function, the context can be descheduled.
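To make the counting concrete, here is a minimal user-space sketch of the idea. It is not the spufs code: struct model_ctx, its fields, and the model_* functions are hypothetical names invented for illustration, and locking is left out (the real request function expects the context's state mutex to be held).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a context; not the real struct spu_context. */
struct model_ctx {
	int ext_sched_count;	/* threads currently holding an external schedule */
	bool active;		/* true while an external schedule is outstanding */
};

/*
 * Take a reference on the external schedule.
 * Returns true if this caller was the first, i.e. the one that
 * activates the context.
 */
static bool model_request_external_schedule(struct model_ctx *ctx)
{
	if (ctx->ext_sched_count++ == 0) {
		ctx->active = true;
		return true;
	}
	return false;
}

/*
 * Drop a reference on the external schedule.
 * Returns true if this caller was the last, i.e. the context is now
 * allowed to be descheduled.
 */
static bool model_cancel_external_schedule(struct model_ctx *ctx)
{
	assert(ctx->ext_sched_count > 0);	/* unbalanced cancel is a bug */
	if (--ctx->ext_sched_count == 0) {
		ctx->active = false;
		return true;
	}
	return false;
}
```

In this model, only the first request and the last cancel change the context's state; every call in between just adjusts the count.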


We can use these two functions to allow the problem-state mapping fault handler to proceed outside of spu_run:

--- a/arch/powerpc/platforms/cell/spufs/file.c
+++ b/arch/powerpc/platforms/cell/spufs/file.c
@@ -413,9 +413,11 @@ static int spufs_ps_fault(struct vm_area_struct *vma,

        if (ctx->state == SPU_STATE_SAVED) {
+               spu_request_external_schedule(ctx);
                spu_context_nospu_trace(spufs_ps_fault__sleep, ctx);
                ret = spufs_wait(ctx->run_wq, ctx->state == SPU_STATE_LOADED);
                spu_context_trace(spufs_ps_fault__wake, ctx, ctx->spu);
+               spu_cancel_external_schedule(ctx);
        } else {
                area = ctx->spu->problem_phys + ps_offs;

Note that the spu_cancel_external_schedule function doesn't unload the context right away; if it did, the refault would fail too, and we'd end up in an infinite loop of faults. Instead, it keeps the context scheduled for the rest of its timeslice. This gives the faulting thread time to access the mapping after the fault handler has been invoked.
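The deferred unload can be sketched with a small user-space model. Everything here is an assumption for illustration: struct slice_ctx, the tick-based timeslice, and the model_* functions are hypothetical, and the model ignores the case where the owning thread is also inside spu_run.

```c
#include <stdbool.h>

/* Hypothetical model of a loaded context with a timeslice. */
struct slice_ctx {
	bool loaded;		/* context currently occupies an SPU */
	int ext_sched_count;	/* outstanding external schedule requests */
	int ticks_left;		/* remaining timeslice, in scheduler ticks */
};

/*
 * Cancel an external schedule. Deliberately does NOT unload the
 * context: the faulting thread still needs to retry its access.
 */
static void model_cancel(struct slice_ctx *ctx)
{
	if (ctx->ext_sched_count > 0)
		ctx->ext_sched_count--;
}

/*
 * Hypothetical scheduler tick. The context is only unloaded once its
 * timeslice has run out and nobody holds an external schedule.
 * Returns true if this tick unloaded the context.
 */
static bool model_sched_tick(struct slice_ctx *ctx)
{
	if (!ctx->loaded)
		return false;
	if (ctx->ticks_left > 0)
		ctx->ticks_left--;
	if (ctx->ticks_left == 0 && ctx->ext_sched_count == 0) {
		ctx->loaded = false;
		return true;
	}
	return false;
}
```

The point of the split is that the cancel path only drops the reference, while the decision to unload is left to the tick path, so the refault has a window in which the context is still loaded.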

We also need to do a bit of trickery with the priorities of contexts during external schedule operations. If a high-priority thread accesses the problem-state mapping of a low-priority context, we want the context to temporarily inherit the higher priority. To do this, we raise the priority when spu_request_external_schedule is called, and drop it back after the context has finished its timeslice on the SPU.
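A rough sketch of that priority inheritance, again as a user-space model rather than the real implementation. The struct prio_ctx type and both functions are hypothetical, and the sketch assumes the usual Linux convention that a numerically lower value means a higher priority.

```c
/* Hypothetical model of a context's scheduling priority. */
struct prio_ctx {
	int base_prio;	/* priority inherited from the owning thread */
	int prio;	/* effective priority used by the scheduler */
};

/*
 * On an external schedule request: inherit the requester's priority,
 * but only if it is higher (numerically lower) than the current one.
 */
static void model_boost_prio(struct prio_ctx *ctx, int requester_prio)
{
	if (requester_prio < ctx->prio)
		ctx->prio = requester_prio;
}

/* At the end of the timeslice: drop back to the base priority. */
static void model_timeslice_expired(struct prio_ctx *ctx)
{
	ctx->prio = ctx->base_prio;
}
```

A lower-priority requester never lowers the effective priority in this model, so several concurrent requesters leave the context at the highest priority among them until the timeslice ends.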

the code

I've created a development branch in the spufs repository for these changes, which is available:

  • via git: git://, in the ext-sched branch; or
  • on the browsable gitweb interface.

Note that this is an experimental codebase; expect breakages!