TrapHandlers – Oberon RTK

Overview

This document and the accompanying test program TrapHandlers explore and evaluate the use of synchronous trap exceptions as a means to grant exclusive mutating access to shared data, or to other protected functionality or hardware.

The following topics are covered:

basic exception behaviour, regarding exception priorities and exception numbers;
using trap handlers to implement exclusive access to protected data and functions;
passing parameters to trap handlers.

We’ll use module Alarms to trigger hardware-based asynchronous interrupts.

Program Description

Structure

The test program TrapHandlers implements

three alarm interrupt handlers, which will be triggered with different timing relationships, interrupt priorities, and alarm numbers to evaluate the interrupt behaviour, first the basics, then when triggering trap handlers (TrapHandlers.ah0, TrapHandlers.ah1, and TrapHandlers.ah2);
a trap exception handler in various implementation variants, which will synchronously be triggered by software, not hardware, both from alarm interrupt handlers as well as from program thread mode code (Kernel threads or otherwise), using one of the unwired interrupts on the NVIC (TrapHandlers.th0, TrapHandlers.th1, TrapHandlers.th2, TrapHandlers.th3);
procedures to trigger the trap exception handlers from code in thread mode (TrapHandlers.tt0, TrapHandlers.tt1).

The trap mechanism is of specific interest with a view to kernel-v2, as well as other similar system modules. This program, however, does not make use of the kernel.

Timing and Pre-caching

The measured times are used to interpret the results, so we want to get them as precise and consistent as possible.

Hence, the test handlers and procedures are pre-cached to avoid any influence of the flash memory caching mechanism. Also, the handlers are coded in-line, without calls to library code, eg. to de-assert the alarm interrupts, to get consistent and comparable results without the need to also pre-cache any library code.

Test Cases

Program module TrapHandlers is further structured into test cases, which can be selected in TrapHandlers.run, and which do all the required set-up and selection of code paths in the different procedures.

As an aside, there’s a lot of duplicated code in these test cases, which could be factored out, but I have left it as unwieldy as it is, since this has the advantage that the parameters for each test case are clear and visible in one spot.

Data Recording and Output

The different handlers and procedures collect their run data, which will then be printed by yet another alarm handler TrapHandlers.pR.

Terminology

I may at times use language such as “ah1 fired at 10250”, in lieu of “the alarm triggering ah1 fired at 10250”, which is not precise, but avoids lengthy wording.

Basic Test Cases

Test Case 0: Baseline

ah1 fires after the run time of ah0.
Nothing really interesting here, just to explain the output.

Build and run TrapHandlers, which prints to the serial terminal:

test case: 0
 rec   id int prio   alarm   begin    p-th    t-th     end   rtm   p0   p1
   0  ah0   1    2   10000   10000      --      --   10112   112    -    -
   1  ah1   0    2   10250   10250      --      --   10363   113    -    -

rec: run record, ie. the recorded data set in the sequence as collected by the test handlers and procedures;
id: the id of the test handler or procedure, eg. ah0, ah2, th1, or tt1, referring to their procedure name;
int: interrupt number, if applicable; note that the alarm number corresponds to the interrupt number;
prio: interrupt priority, if applicable;
alarm: alarm trigger time, if applicable;
begin: time at the start of the handler;
end: time at the end of the handler;
rtm: run time, ie. end - begin.

The other output data/columns are not yet relevant for the first series of test cases. We’ll introduce them for the relevant test cases.

All times are in microseconds (us), read from the timer device with the usual caveats.

Observations:

With the alarms 250 us apart (at 10000 and 10250) and run times of just above 100 us, we just see the two corresponding handlers ah0 and ah1 triggered and executed independently, and without interaction.

Test Case 1

ah1 fires during the run time of ah0.
ah0 and ah1 have the same priority.

test case: 1
 rec   id int prio   alarm   begin    p-th    t-th     end   rtm   p0   p1
   0  ah0   1    2   10000   10000      --      --   10112   112    -    -
   1  ah1   0    2   10050   10113      --      --   10227   114    -    -

Observations:

ah1 fires at 10050, but only starts executing at 10113, right after ah0 ends;
this is consistent with the ARMv6 specs: an interrupt triggered during the run time of another will remain pending, and then be executed right after the running one, if the second interrupt has equal or lower priority (note: lower prio numbers mean higher prio in the hardware);

The RP2040 implements tail-chaining, hence ah1 executes without unstacking after ah0 and stacking again for ah1.

Test Case 2

ah1 fires during the run time of ah0.
ah0 and ah1 have different priority.

test case: 2
 rec   id int prio   alarm   begin    p-th    t-th     end   rtm   p0   p1
   0  ah0   1    2   10000   10000      --      --   10228   228    -    -
   1  ah1   0    1   10050   10050      --      --   10163   113    -    -

Observations:

ah1 fires at 10050, and immediately starts to execute, preempting ah0, which then finishes after ah1 has terminated;
this is consistent with the ARMv6 specs: a higher prio interrupt preempts a running lower prio one;
the state of ah0 is stacked and then unstacked upon entry and exit of ah1, respectively.

Test Case 3

ah0 and ah1 fire at the same time.
ah0 and ah1 have the same priority.

test case: 3
 rec   id int prio   alarm   begin    p-th    t-th     end   rtm   p0   p1
   0  ah1   0    2   10000   10000      --      --   10114   114    -    -
   1  ah0   1    2   10000   10114      --      --   10228   114    -    -

Observations:

ah1 gets executed before ah0, the latter is executed right after the former;
this is consistent with the ARMv6 specs: if two interrupts with the same prio become pending at the same time, the one with the lower interrupt number gets executed first.

Due to tail-chaining, ah0 is executed without unstacking after ah1 and stacking again for ah0.
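
The rules observed in test cases 1 to 3 can be condensed into a small decision model. The following is a minimal sketch in plain Oberon, not part of TrapHandlers: the module and procedure names are made up for illustration, and the code only models the arbitration logic, without touching the hardware.

```oberon
MODULE NvicModel;
  (* Illustrative model only; all names are hypothetical, not from Oberon RTK. *)

  (* A pending interrupt preempts a running handler only if its priority
     is strictly higher, ie. its prio number is lower (test case 2).
     With equal or lower priority it remains pending (test case 1). *)
  PROCEDURE Preempts*(pendingPrio, runningPrio: INTEGER): BOOLEAN;
  BEGIN
    RETURN pendingPrio < runningPrio
  END Preempts;

  (* Of two interrupts A and B pending at the same time, A runs first if it
     has the higher priority, or, with equal priority, the lower interrupt
     number (test case 3). *)
  PROCEDURE RunsFirst*(prioA, intNoA, prioB, intNoB: INTEGER): BOOLEAN;
  BEGIN
    RETURN (prioA < prioB) OR ((prioA = prioB) & (intNoA < intNoB))
  END RunsFirst;

END NvicModel.
```

With the values of test case 3, RunsFirst(2, 0, 2, 1) evaluates to TRUE: ah1 (interrupt 0) runs before ah0 (interrupt 1). With the values of test case 1, Preempts(2, 2) is FALSE, so ah1 has to wait for ah0 to end.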

About Traps

Overview

The term trap is borrowed from operating systems concepts and design. It is used to denote access to, and execution of, system level functions, with two main aspects:

raised execution privilege level to operate low level facilities, including the hardware, and
exclusive access to mutate protected data, via defined corresponding (API) function calls.

In Oberon RTK, we want to use the trap concept for protecting the kernel data when accessing and executing kernel functions. Other comparable modules can possibly use the same approach.

A related trap concept is used by the Astrobe compiler: a failed run-time check will result in the immediate execution of the SVC exception, which in turn is handled by RuntimeError.errorHandler. As with the trap handlers described here, triggering the error trap is synchronous with the faulty program code.

Raised Privilege Level

Without any operating system support, on an embedded ARM MCU, we usually have two privilege levels, Unprivileged and Privileged (no kidding), as implemented in the hardware. With the Cortex-M0/M0+ (ARMv6), without a corresponding architecture extension,¹ all code is always running in privileged mode, hence there’s no need to raise it for access to all register addresses and privileged registers and operations.

Other Cortex-M MCUs (ARMv7) offer unprivileged and privileged execution modes, however with Astrobe all thread mode code always runs privileged, too, unless the programmer changes that: the MCU starts from reset in privileged thread mode, so changing to running unprivileged means explicit measures – default is privileged.

Exception handlers execute in handler mode, all other code in thread mode. An exception handler always executes in privileged mode, as does thread mode code on the M0/M0+, as outlined above, but code running in thread mode on the M3 and above can be privileged or unprivileged. For clarity, thread mode here denotes a mode of the MCU; a kernel thread is a different concept, even though kernel threads usually do run in MCU thread mode.²

As an aside for now, to add some more fun, there’s another, orthogonal concept: thread mode code can use the main stack (MSP) or the process stack (PSP), handler mode code always uses the main stack. Exception stacking can happen in the process stack or the main stack. We’ll encounter that further down when discussing parameter passing.

Exclusive Protected Data (or Functionality) Access

Access to protected (kernel) data requires synchronisation among kernel threads, as well as with exception handlers. If we access the protected data exclusively via exceptions – the trap handlers –, and ensure exclusivity via a defined exception priority scheme, where all code that needs to mutate the protected data cannot preempt a currently running trap handler, the MCU’s Nested Vectored Interrupt Controller (NVIC) will do the legwork for us in hardware.

Trap (Exception) Handlers

As we have observed with the above test cases, the NVIC implements and enforces clear rules of how different exception handlers interact or interfere with each other. This allows us to implement both aspects of traps, namely raising the privilege level as well as exclusive write access to data, using exceptions, without resorting to other synchronisation methods.

The basic concept is simple and straight-forward:

Identify
- the protected data: to be protected from being mutated at the same time by different threads or exception handlers, as well as
- the requesting or triggering code: the potentially mutating sections of code, usually via specific procedures.
Trap priority: define an exception priority level where no preemption of the trap handler by any requesting code will occur. If the requesting code runs in thread mode, any trap priority will do, but if it runs in handler mode, the trap priority must be higher than the requesting handler priority.
Trap handler: implement and configure one or more exception handlers to mutate the data, running at the trap priority level. Only the trap handlers are allowed to mutate the protected data.
Trigger the trap handlers from code in thread or handler mode. This is a synchronous operation with respect to the triggering code, unlike an asynchronous interrupt triggered by the hardware. An asynchronous interrupt can trigger a trap, but from the perspective of its handler this will be a synchronous operation.

Trap Handler Exception

The SVC system exception would be the obvious candidate to use for traps as described here, but Astrobe uses it for run-time error trapping. While this could easily be resolved, ie. using the SVC exception for both trap handlers and run-time error trapping, it’s useful to have run-time error detection also in trap handlers, and SVC exceptions cannot be used within the SVC exception handler.³

However, of the 32 interrupts of the RP2040 and its NVIC, only 26 are actually wired ({0..25}), so six ({26..31}) can be used for this purpose.

We’ll use interrupt 26 here for the test traps. The trap interrupt is triggered by setting it to pending state from software. Since, according to the concept rules, the trap handler has higher priority than the requesting code, it will execute immediately.

Note the disadvantage of using an interrupt this way: we need two CPU registers to set it to pending, see below.
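
To illustrate the set-up this implies, here is a minimal sketch of how the trap interrupt could be given its priority and be enabled. It is not taken from TrapHandlers: the module, procedure, and constant names are hypothetical, and the register addresses are the generic ARMv6-M NVIC addresses, written out here as an assumption rather than taken from module MCU. On ARMv6-M, the priority registers are accessed as whole words, with the two implemented priority bits in the top two bits of each interrupt's byte. Installing the handler address in the vector table is omitted.

```oberon
MODULE TrapConfig;
  (* Sketch only: names are hypothetical, addresses are the ARMv6-M NVIC ones. *)
  IMPORT SYSTEM;

  CONST
    NVIC_ISER = 0E000E100H; (* interrupt set-enable register *)
    NVIC_IPR0 = 0E000E400H; (* first interrupt priority register *)
    TrapIntNo* = 26;        (* first unwired interrupt of the RP2040 *)
    TrapPrio* = 1;          (* higher than requesting handlers at prio 2 or 3 *)

  (* Set the priority (0 .. 3) of interrupt 'intNo'. Each priority register
     word serves four interrupts, one byte each, prio in the top two bits. *)
  PROCEDURE SetIntPrio*(intNo, prio: INTEGER);
    VAR addr, shift: INTEGER; x: SET;
  BEGIN
    addr := NVIC_IPR0 + (intNo DIV 4) * 4;
    shift := (intNo MOD 4) * 8 + 6;
    SYSTEM.GET(addr, x);
    x := x - {shift, shift + 1} + BITS(LSL(prio MOD 4, shift));
    SYSTEM.PUT(addr, x)
  END SetIntPrio;

  (* Configure the trap priority first, then enable the interrupt. *)
  PROCEDURE EnableTrap*;
  BEGIN
    SetIntPrio(TrapIntNo, TrapPrio);
    SYSTEM.PUT(NVIC_ISER, {TrapIntNo})
  END EnableTrap;

END TrapConfig.
```

For interrupt 26, SetIntPrio addresses the seventh priority register (offset 24) and the bits 22 and 23 therein. The read-modify-write leaves the priorities of the three neighbouring interrupts in the same word untouched.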

Test Cases for the Trap Handler Triggered by Handler Mode Code

In the following test cases, both handlers ah0 and ah1 attempt to get access to the protected data, by triggering th0. Both ah0 and ah1 need to be of lower prio than th0, according to the concept rules.

Before triggering the trap interrupt, we store two (randomly chosen) parameter “marker” values in registers R3 and R12, respectively, which are read and registered by the trap handler th0. Together with the timing data, we can then assess if the lockout works, and if the trap handler executions are actually initiated by the corresponding requesting code ah0 or ah1, respectively.

We’ll look at passing parameters to trap handlers later.

Test Case 4

ah1 is triggered while ah0 runs, but before ah0 triggers th0.
ah0 and ah1 have the same priority.

test case: 4
 rec   id int prio   alarm   begin    p-th    t-th     end   rtm   p0   p1
   0  ah0   1    2   10000   10000   10001   10112   10226   226   13  -13
   1  th0  26    1      --   10113      --      --   10225   112   13  -13
   2  ah1   0    2   10050   10226   10227   10340   10455   229   42  -42
   3  th0  26    1      --   10342      --      --   10455   113   42  -42

Additional relevant data columns:

p-th: time of the start to prepare the trap handler (parameters);
t-th: time just before the trap handler is triggered;
p0: parameter 0, as passed to the trap handler by ah0 and ah1, and as read by the trap handler th0, respectively;
p1: parameter 1, analogously.

Observations:

ah0 fires and starts to run at 10000, triggers th0 at 10112, which starts to run at 10113, and ends at 10225;
ah1 fires at 10050, but preempts neither ah0 (same prio) nor th0 (higher prio), so starts to run at 10226, ie. right after th0 as triggered by ah0;
importantly, each th0 run is uninterrupted, correctly as triggered by the requester ah0 or ah1 (compare the “marker” parameter data), so any data mutations (or other protected functions) by th0 are safe.

Test Case 5

ah1 is triggered while th0 triggered by ah0 runs.
ah0 and ah1 have the same priority.

test case: 5
 rec   id int prio   alarm   begin    p-th    t-th     end   rtm   p0   p1
   0  ah0   1    2   10000   10000   10001   10112   10226   226   13  -13
   1  th0  26    1      --   10113      --      --   10225   112   13  -13
   2  ah1   0    2   10150   10226   10227   10340   10454   228   42  -42
   3  th0  26    1      --   10341      --      --   10454   113   42  -42

Observations:

again, ah1 cannot preempt running th0, as the latter runs at higher prio than the former.

Test Case 6

ah1 and ah0 are triggered at the same time.
ah0 and ah1 have the same priority.

test case: 6
 rec   id int prio   alarm   begin    p-th    t-th     end   rtm   p0   p1
   0  ah1   0    2   10000   10000   10001   10113   10227   227   42  -42
   1  th0  26    1      --   10114      --      --   10227   113   42  -42
   2  ah0   1    2   10000   10227   10229   10341   10454   227   13  -13
   3  th0  26    1      --   10341      --      --   10454   113   13  -13

Observations:

since ah1 is assigned a lower interrupt number, it takes precedence over ah0 with equal priority when triggered at the same time;
th0 runs accordingly.

Test Case 7

ah1 is triggered while ah0 runs, but before ah0 triggers th0.
ah0 and ah1 have different priority.

test case: 7
 rec   id int prio   alarm   begin    p-th    t-th     end   rtm   p0   p1
   0  ah0   1    3   10000   10000   10001   10343   10456   456   13  -13
   1  ah1   0    2   10050   10050   10051   10164   10279   229   42  -42
   2  th0  26    1      --   10166      --      --   10278   112   42  -42
   3  th0  26    1      --   10343      --      --   10456   113   13  -13

Observations:

ah1 preempts ah0, and triggers “its” trap handler run th0;
ah0 will continue after th0 triggered by ah1 terminates;
the state of ah0 is saved and restored by stacking/un-stacking of ah1.

Test Case 8

ah1 is triggered while th0 triggered by ah0 runs.
ah0 and ah1 have different priority.

test case: 8
 rec   id int prio   alarm   begin    p-th    t-th     end   rtm   p0   p1
   0  ah0   1    3   10000   10000   10001   10113   10455   455   13  -13
   1  th0  26    1      --   10113      --      --   10226   113   13  -13
   2  ah1   0    2   10150   10226   10227   10340   10455   229   42  -42
   3  th0  26    1      --   10342      --      --   10454   112   42  -42

Observations:

ah1 cannot preempt th0 triggered by ah0.

Test Cases for the Trap Handler Triggered by Thread Mode Code

In the following test cases, both handler ah1 and thread mode code tt0 attempt to get access to the protected data, by triggering th0. Both ah1 and tt0 need to be of lower prio than th0, according to the concept rules. tt0 implicitly has the lowest priority, ie. lower than any exception handler.

This scenario reflects, for example, when we have a kernel thread enabling another (tt0), while the SysTick timer handler does its spiel (ah1), both wanting to mutate the corresponding kernel data.

Test Case 9

ah1 is triggered while tt0 runs, but before tt0 synchronously triggers th0.
tt0 is thread mode code and therefore has no priority.

test case: 9
 rec   id int prio   alarm   begin    p-th    t-th     end   rtm   p0   p1
   0  tt0   -    -      --   10000   10001   10342   10455   455   13  -13
   1  ah1   0    2   10050   10050   10051   10164   10279   229   42  -42
   2  th0  26    1      --   10166      --      --   10278   112   42  -42
   3  th0  26    1      --   10342      --      --   10455   113   13  -13

Observations:

ah1 preempts tt0, since its priority is implicitly higher than the thread mode code;
tt0 will continue after th0 triggered by ah1 terminates;
the state of tt0 is saved and restored by stacking/unstacking of ah1.

Test Case 10

ah1 is triggered while th0 triggered by tt0 runs.
tt0 is thread mode code and therefore has no priority.

test case: 10
 rec   id int prio   alarm   begin    p-th    t-th     end   rtm   p0   p1
   0  tt0   -    -      --   10000   10000   10111   10454   454   13  -13
   1  th0  26    1      --   10112      --      --   10224   112   13  -13
   2  ah1   0    2   10150   10224   10225   10338   10454   230   42  -42
   3  th0  26    1      --   10340      --      --   10453   113   42  -42

Observations:

the running th0 as triggered by tt0 cannot be preempted by ah1, which has to wait until th0 ends.

Test Case 11

ah1 is triggered, and tt0 starts, at the same time.

test case: 11
 rec   id int prio   alarm   begin    p-th    t-th     end   rtm   p0   p1
   0  ah1   0    2   10000   10000   10001   10114   10229   229   42  -42
   1  th0  26    1      --   10116      --      --   10228   112   42  -42
   2  tt0   -    -      --   10230   10230   10341   10454   224   13  -13
   3  th0  26    1      --   10342      --      --   10454   112   13  -13

Observations:

basically the same as test case 10, there’s nothing special with starting at the same time when considering a handler and thread mode code.

Passing Parameters to a Trap Handler

Overview

To use a trap handler for system functions, we want to be able to pass parameters. For example, in a generalised set-up, we only have one kernel (or OS) trap, which will execute the requested function based on a corresponding selection parameter. In the kernel, where we use a trap to put a thread on the run-queue, we need to pass along the id of the thread.

An exception handler is a parameterless procedure, whose address is selected from the vector table by the hardware exception mechanism, and “called” by putting that address into the program counter. There’s no concept of parameters. Hence we need to pass the parameters in a different fashion than to non-handler procedures.

Basically, we need to prepare the parameter data in a suitable storage location, before triggering the handler, and the handler will then pick the data up when it runs.
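
As an illustration of the generalised set-up with one trap and a selection parameter, the following minimal sketch shows what the dispatching part could look like once the trap handler has picked up its parameters. Everything here is hypothetical and for illustration only: the function codes, the ready set standing in for the protected kernel data, and the procedure name are not from kernel-v2.

```oberon
MODULE TrapDispatch;
  (* Sketch only: function codes and the 'ready' data are hypothetical. *)

  CONST
    FnEnable* = 0;   (* put thread 'arg' on the run-queue *)
    FnSuspend* = 1;  (* take thread 'arg' off the run-queue *)
    MaxThreads = 16;

  VAR ready: SET; (* the protected data: mutated only via Dispatch *)

  (* To be called by the trap handler only, after it has picked up
     the selector 'fn' and the argument 'arg'. Invalid requests are ignored. *)
  PROCEDURE Dispatch*(fn, arg: INTEGER);
  BEGIN
    IF (arg >= 0) & (arg < MaxThreads) THEN
      IF fn = FnEnable THEN INCL(ready, arg)
      ELSIF fn = FnSuspend THEN EXCL(ready, arg)
      END
    END
  END Dispatch;

BEGIN
  ready := {}
END TrapDispatch.
```

Since Dispatch only ever runs at the trap priority, the mutation of ready needs no further protection, provided no other code writes to it.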

Module Variables vs. Registers and Stack

Parameter storage in module variables is out of the question, since that is not thread-safe. For example, assume a thread starts to prepare the parameter data in module variables, with the intention to then trigger the trap handler. If an exception handler is triggered during this preparation, which also intends to trigger the same trap handler, and also stores its parameters in the same module variables, the parameters put there by the thread would be corrupted. Of course, we could disable all exceptions during the parameter preparation, but disabling all exceptions should be avoided if there are better alternatives, not least since the whole trap concept attempts to avoid just that. :)

The alternative is to use CPU registers and the stack. If the thread in the above example stores the parameters in registers, the incoming exception would push the values of registers r0 to r3, and r12, onto the stack upon exception entry (stacking), and restore them upon exception exit (unstacking). Hence, storing the parameters in registers protects against incoming exception handlers, without disabling them.

Parameter Pickup by the Trap Handler

The question is: where shall the trap handler pick up its parameters, from the registers directly, or from the stack?

In the above test cases, with the “marker” parameters stored in R3 and R12, the trap handler th0 picked them up directly from the registers. This has worked so far, since we don’t have any other exception handlers than the ones used in our test framework so far.

This would even work in general, if there weren’t a pesky edge case: if an accepted exception, say the trap handler, is in the process of entry stacking, and another exception becomes pending, and has higher priority than the original one, the higher prio exception gets actually executed first, and only thereafter the original one (late arrival optimisation).

Basically, the higher prio exception usurps the exception execution. Importantly, it runs with the same stacking as the original exception, ie. does not do its own stacking. After the higher prio exception terminates, the original exception runs, but the potential problem is that the higher prio exception may have changed the registers’ contents.

Consequently, it’s not safe to pick up the parameters from the registers directly, they need to be read from the stack.

Test Cases with Parameters

The following test case 12, which shows how an incoming high prio exception can interfere with directly reading the parameters from the registers, is a bit shaky: the high prio exception ah2 has to fire exactly when the stacking for the trap handler th0, triggered by ah0, happens. That’s 15 clock cycles. The register stacking and fetching of the exception vector run purely in hardware, without reading any instructions, so with a 125 MHz clock we have a window of 15 * 8 ns = 120 ns, ie. about 1/8 of a microsecond. Hard to hit with a microseconds timer.

I have fiddled with the while loop in ah0 to get this working, but unfortunately it’s not a stable, repeatable test case, and on a different MCU specimen, it will likely not be the same. Then again, the beauty of falsification is that we only need one failed experiment⁴ to topple a theory or concept. :)

Test Case 12: Read from Registers, ah2 Interfering

ah0 prepares and triggers th0, while the “trouble maker” ah2 fires exactly when the exception entry for th0 happens.
ah2 changes one of the parameter registers to -99.
th0 reads the parameters directly from registers.

test case: 12
 rec   id int prio   alarm   begin    p-th    t-th     end   rtm   p0   p1
   0  ah0   1    2   10000   10000   10001   10112   10251   251   13  -13
   1  ah2   2    0   10113   10113      --      --   10138    25    -    -
   2  th0  26    1      --   10138      --      --   10250   112  -99  -13

Observations:

th0 reads a wrong parameter directly from the register, as “injected” by ah2.

Test Case 13: Read from Stack, ah2 Not Interfering

Same as test case 12, but:
Now using th1: read the parameters from the stack.

test case: 13
 rec   id int prio   alarm   begin    p-th    t-th     end   rtm   p0   p1
   0  ah0   1    2   10000   10000   10001   10112   10256   256   13  -13
   1  ah2   2    0   10113   10113      --      --   10138    25    -    -
   2  th1  26    1      --   10138      --      --   10256   118   13  -13

Observations:

th1 reads the correct value from the stack, where it was put by the exception entry stacking for th1.

Before or after the hardware exception entry time window, we just have a normal higher prio exception, with regular stacking and unstacking.

Test Case 14

ah0 prepares and triggers th1, trouble maker ah2 fires and runs during this preparation.
ah2 changes one of the parameter registers to -99.
th1 reads the parameters from the stack.

test case: 14
 rec   id int prio   alarm   begin    p-th    t-th     end   rtm   p0   p1
   0  ah0   1    2   10000   10000   10001   10138   10257   257   13  -13
   1  ah2   2    0   10050   10050      --      --   10075    25    -    -
   2  th1  26    1      --   10138      --      --   10256   118   13  -13

Observations:

Normal stacking and unstacking for ah2 while ah0 sets the parameters for th1, without impact on the interaction of ah0 and th1.

Test Case 15

ah0 prepares and triggers th1, trouble maker ah2 fires and runs during the execution of th1.
ah2 changes one of the parameter registers to -99.
th1 reads the parameters from the stack.

test case: 15
 rec   id int prio   alarm   begin    p-th    t-th     end   rtm   p0   p1
   0  ah0   1    2   10000   10000   10001   10112   10257   257   13  -13
   1  th1  26    1      --   10113      --      --   10256   143   13  -13
   2  ah2   2    0   10150   10150      --      --   10175    25    -    -

Observations:

Normal stacking and unstacking for ah2 while th1 runs, without impact on the interaction of ah0 and th1.

Getting the Parameters from the Stack

When reading the parameters from the stack, we need to pay attention to whether both the process and the main stack pointers are used, ie. PSP and MSP, respectively. If the PSP is used, the stacking for an exception taken from thread mode happens on the process stack, but the handler itself uses the main stack. If only the MSP is used, or when a handler preempts another, the stacking happens on the main stack as well.

The value EXC_RETURN in the link register tells us what’s what.

The code in th1 demonstrates this:

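PROCEDURE th1[0];
  CONST Rv0offset = 12; Rv1offset = 16; LR = 14; SP = 13; PSPflag = 2;
  VAR v0, v1, regAddr: INTEGER;
BEGIN
  IF PSPflag IN BITS(SYSTEM.REG(LR)) THEN
    (* stacking happened on the process stack: read the PSP into r3 *)
    SYSTEM.EMIT(MCU.MRS_R03_PSP);
    regAddr := SYSTEM.REG(3)
  ELSE
    (* stacking happened on the main stack: skip lr and the local variables *)
    regAddr := SYSTEM.REG(SP) + 16
  END;
  SYSTEM.GET(regAddr + Rv0offset, v0); (* stacked r3 *)
  SYSTEM.GET(regAddr + Rv1offset, v1); (* stacked r12 *)
  (* ... *)
END th1;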

Remember, we use R3 and R12. Their offsets in the stack frame after stacking are Rv0offset = 12 and Rv1offset = 16, respectively.
Bit 2 in the value EXC_RETURN in the link register tells us if the PSP is used, in which case we’re getting its value from the special register using a SYSTEM.EMIT instruction, which reads it into r3 (MRS instruction, privileged).
If only the MSP is used, we simply correct the current SP value by 16, accounting for the procedure’s local variables, and for the link register, which has been pushed by the prologue:

.   728   02D8H  0B500H          push     { lr }
.   730   02DAH  0B083H          sub      sp,#12

Now regAddr points to the bottom of the stacked registers for both cases, and we can get the stacked register values at their respective offsets.
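
For reference, the complete layout of the stacked registers and the meaning of the relevant EXC_RETURN bits can be written down as follows. This is a minimal sketch with hypothetical module, constant, and procedure names, not code from TrapHandlers. The offsets and bit positions are the ones defined by the ARMv6-M architecture; the offsets for r3 and r12 match Rv0offset and Rv1offset used in th1.

```oberon
MODULE ExcFrame;
  (* Sketch only: names are hypothetical; layout and bit positions
     are as defined by the ARMv6-M architecture. *)
  IMPORT SYSTEM;

  CONST
    (* byte offsets into the eight-word frame stacked upon exception entry *)
    R0offset* = 0; R1offset* = 4; R2offset* = 8; R3offset* = 12;
    R12offset* = 16; LRoffset* = 20; PCoffset* = 24; PSRoffset* = 28;
    FrameSize* = 32;

    (* EXC_RETURN bits *)
    ThreadFlag = 3; (* set: the exception was taken from thread mode *)
    PSPflag = 2;    (* set: the frame is on the process stack *)

  (* EXC_RETURN values on ARMv6-M:
     0FFFFFFF1H: return to handler mode, frame on the main stack
     0FFFFFFF9H: return to thread mode, frame on the main stack
     0FFFFFFFDH: return to thread mode, frame on the process stack *)

  PROCEDURE FrameOnPSP*(excReturn: INTEGER): BOOLEAN;
  BEGIN
    RETURN PSPflag IN BITS(excReturn)
  END FrameOnPSP;

  PROCEDURE FromThreadMode*(excReturn: INTEGER): BOOLEAN;
  BEGIN
    RETURN ThreadFlag IN BITS(excReturn)
  END FromThreadMode;

  (* Read one stacked register, given the frame base address as
     determined in th1 ('regAddr'), and one of the above offsets. *)
  PROCEDURE StackedReg*(frameAddr, offset: INTEGER; VAR value: INTEGER);
  BEGIN
    SYSTEM.GET(frameAddr + offset, value)
  END StackedReg;

END ExcFrame.
```

For example, FrameOnPSP(0FFFFFFFDH) is TRUE, while FrameOnPSP(0FFFFFFF9H) and FrameOnPSP(0FFFFFFF1H) are FALSE. Note that calling such procedures from a handler makes the compiler push r4 to r7 in the handler's prologue, which changes the SP correction of 16 used in th1.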

If we only consider the requesting code, and if the exception scheme outlined above is adhered to, we could read the registers directly. It’s any other exception outside this framework that can disrupt it, if it is assigned a priority that is equal to or higher than the trap priority. Hence, in libraries we had better read the parameters from the stack, as we never know how they will be used in a specific control program.⁵

Which Registers Can Be Used?

Registers r0 to r3, and r12

First, let’s look at the registers that are stacked by the hardware during the exception entry sequence.

The test code for all the above test cases uses registers r3 and r12 for the two parameters. r0 to r2 are used by the test code logic, but in a real program, we could arrange this differently, so the question is which registers are available at most.

Since we need to set the trap handler exception to pending, we can start there to explore. Obviously this happens right after the parameters are set, so any register used for this operation is not free for parameters.

Typically:

  VAR v: INTEGER

    SYSTEM.PUT(MCU.NVIC_ISPR, {v})
.  1162   048AH  09800H          ldr      r0,[sp]
.  1164   048CH  02101H          movs     r1,#1
.  1166   048EH  04081H          lsls     r1,r0
.  1168   0490H  04801H          ldr      r0,[pc,#4] -> 1176
.  1170   0492H  06001H          str      r1,[r0]

r0 and r1 are used for setting the trap exception to pending in the NVIC.

Let’s look at setting the registers.

    VAR v,w, x, y, z: INTEGER;

    SYSTEM.LDREG(12, v);
.  1308   051CH  09800H          ldr      r0,[sp]
.  1310   051EH  04684H          mov      r12,r0
    SYSTEM.LDREG(3, w);
.  1312   0520H  09801H          ldr      r0,[sp,#4]
.  1314   0522H  04603H          mov      r3,r0
    SYSTEM.LDREG(2, x);
.  1316   0524H  09802H          ldr      r0,[sp,#8]
.  1318   0526H  04602H          mov      r2,r0
    SYSTEM.LDREG(1, y);
.  1320   0528H  09803H          ldr      r0,[sp,#12]
.  1322   052AH  04601H          mov      r1,r0
    SYSTEM.LDREG(0, z);
.  1324   052CH  09804H          ldr      r0,[sp,#16]

Setting the parameter registers in this order would allow us to use r0 to r3, plus r12 if we’re adventurous (r12 is declared as “reserved” by Astrobe).

But we’re limited by the use of r0 and r1 for pending the trap handler, so only r2, r3, and r12 remain of the ones stacked by the hardware upon exception entry.

Note: using the SVC exception, we could use all registers r0 to r3, since we would use SYSTEM.EMITH to issue the SVC instruction right after preparing the parameters.

Registers r4 to r7

Now let’s look at the registers not stacked by the hardware upon exception entry, namely r4 to r11, of which only r4 to r7 are actually used by the Astrobe for Cortex-M0 compiler. Astrobe for Cortex-M3 and up allocates all registers up to r11.

The Astrobe compiler adds all registers upward from r4 to the push operation in the prologue, and to the pop in the epilogue, in case they are being altered by the code of the exception handler.

Caveat: this does not include any registers set via SYSTEM.LDREG, or via code inserted by SYSTEM.EMIT or SYSTEM.EMITH that modifies any register.

If the handler calls any procedure, all registers r4 to r7 (for M0) are pushed, regardless of whether the code actually alters them.

Put the other way around, we can be sure that registers r4 to r7 will always “survive” any exception, either because they are not altered, or because they are saved and restored. As this happens in software, the above late arrival edge case is not an issue: if the higher prio interrupt becomes pending during the stacking of the lower prio one, and gets precedence and takes over, it will save and restore r4 to r7 as needed, or leave them unaltered.

Consequently, we can use r4 to r7 to pass our parameters. In the handler we read them directly from the registers.

However, due to the above caveat regarding SYSTEM.LDREG and SYSTEM.EMIT, if we write an exception handler that will trigger a trap handler, and we put the parameters into r4 to r7, we need to save and restore these registers ourselves. The same is true if the trap handler uses SYSTEM.LDREG or SYSTEM.EMIT to mutate registers r4 to r7.

Registers r8 to r11

The Astrobe for Cortex-M0 compiler never allocates registers r8 to r11. Consequently, they will never be altered by an exception handler (unless we use SYSTEM.LDREG or SYSTEM.EMIT). An exception handler does not even have to save and restore them (M0), but we cannot know if another module gets “creative” about these registers, as we are here, so it’s always better to save and restore.

Therefore, these registers can be used for passing parameters, and the handler can read them directly from the registers.

The compiler allocates registers r8 to r11 for the M3, M4, and M7 MCUs. However, they will be saved and restored by corresponding push and pop operations in case they are altered. So with the same reasoning and conditions as for registers r4 to r7 for the M0 MCU, these registers can be used for passing parameters also for the M3, M4, and M7.

What About Using Only the Stack?

The stack is a memory space that is always preserved by any configuration and combination of exceptions. Could we not save ourselves some conceptual and implementation headaches by passing the parameters for a trap handler on the stack? Any number of parameters could be passed this way, in a unified way, without the need to consider the different register ranges, and without ever having to set, save, or restore any registers.

Here’s a typical piece of code triggering a trap handler with parameters passed in registers r2 and r3.

PROCEDURE tt;
  CONST R2 = 2; R3 = 3; IntNo = 26;
  VAR v0, v1: INTEGER;
BEGIN
  (* determine v0 and v1 *)
  SYSTEM.LDREG(R2, v0);
  SYSTEM.LDREG(R3, v1);
  SYSTEM.PUT(MCU.NVIC_ISPR, {IntNo})
END tt;

The handler would then retrieve the parameters from the stack using the code above, from the stack frame created by the trap handler entry stacking.

However, v0 and v1 are already on the stack of tt, at addresses SP + 0 and SP + 4, respectively. In most cases, the parameters for the trap handler are already there, or can be set there. Hence, the trap handler can access these local variables directly on the stack.

Just as easily as the trap handler can determine the base address of the stacked registers, it can also determine the stack pointer value at the time it was triggered: the corresponding address is right above the stack frame created by the exception stacking, possibly corrected for the double-word (eight bytes) stack alignment.
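The address arithmetic can be checked off-target. The following is a minimal host-side Python model, not code for the MCU, and the function names are made up for this illustration. It assumes the standard Cortex-M exception entry: the hardware stacks eight words (32 bytes), and if the stack pointer is not eight-byte aligned at that moment, it first inserts four bytes of padding and sets bit 9 in the stacked xPSR.

```python
FRAME_SIZE = 32   # r0-r3, r12, lr, pc, xPSR: eight words stacked by hardware
PSR_OFFSET = 28   # xPSR is the last word of the stacked frame
ALIGN_BIT = 9     # set in the stacked xPSR if a padding word was inserted


def exception_entry(sp_before, xpsr=0x01000000):
    """Model the hardware stacking: return (frame base, stacked xPSR)."""
    padding = 4 if sp_before % 8 != 0 else 0
    frame_base = sp_before - padding - FRAME_SIZE
    stacked_xpsr = (xpsr | (1 << ALIGN_BIT)) if padding else xpsr
    return frame_base, stacked_xpsr


def sp_at_trigger(frame_base, stacked_xpsr):
    """Recover the triggering code's stack pointer from the stacked frame."""
    addr = frame_base + FRAME_SIZE
    if stacked_xpsr & (1 << ALIGN_BIT):
        addr += 4  # skip the alignment padding word
    return addr


# One aligned and one unaligned stack pointer (hypothetical SRAM addresses).
for sp in (0x20001000, 0x20001004):
    base, psr = exception_entry(sp)
    print(hex(sp), hex(base), hex(sp_at_trigger(base, psr)))
```

In both cases the recovered address equals the stack pointer at the time of triggering, which is where the first local variable of the triggering procedure lives.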

We can even make it a rule that the trap handler parameters are always the first variables in the VAR declaration of the requesting code, which is usually a specific procedure. Any position will do, though, as long as the trap handler uses the right stack pointer offset. Following a rule like this makes the code easier to maintain.

Here’s the code to access the parameters on the triggering code’s stack:

PROCEDURE th3[0];
  CONST
    LR = 14; SP = 13; PSPflag = 2; AlignFlag = 9;
    StackFrameSize = 32; PSRoffset = 28;
  VAR v0, v1, regAddr, parAddr: INTEGER;
BEGIN
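  IF PSPflag IN BITS(SYSTEM.REG(LR)) THEN
    (* EXC_RETURN bit 2 set: the frame was stacked on the process stack *)
    SYSTEM.EMIT(MCU.MRS_R03_PSP);
    regAddr := SYSTEM.REG(3)
  ELSE
    (* main stack: skip this handler's own stack usage *)
    regAddr := SYSTEM.REG(SP) + 16
  END;
  (* the triggering code's SP is right above the stacked frame *)
  parAddr := regAddr + StackFrameSize;
  IF SYSTEM.BIT(regAddr + PSRoffset, AlignFlag) THEN
    (* stacked xPSR bit 9 set: skip the alignment padding word *)
    INC(parAddr, 4)
  END;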
  SYSTEM.GET(parAddr, v0);
  SYSTEM.GET(parAddr + 4, v1);
  (* ... *)
END th3;

The corresponding triggering procedure now simply is:

PROCEDURE tt1;
  CONST IntNo = 26;
  VAR v0, v1: INTEGER;
BEGIN
  (* determine v0 and v1 *)
  SYSTEM.PUT(MCU.NVIC_ISPR, {IntNo})
END tt1;

Passing parameters this way makes the mechanism somewhat opaque, so the triggering code needs a clear comment.

The following test case uses that method.

Test Case 17

Thread mode code tt1 sets the parameters as local variables on its stack.
th3 reads the parameters from the stack.
ah2 changes one of the parameter registers to -99.

test case: 17
 rec   id int prio   alarm   begin    p-th    t-th     end   rtm   p0   p1
   0  tt1   -    -      --   10001   10009   10124   10268   267   13  -13
   1  th3  26    1      --   10129      --      --   10267   138   13  -13
   2  ah2   2    0   10150   10150      --      --   10175    25    -    -

Observations:

Normal stacking and unstacking for ah2 while th3 runs, without impact on the interaction of tt1 and th3.

Bottom Line

This document describes the concept and implementation of traps. A trap is an exception, with a corresponding trap handler, used as a means to grant exclusive mutating access to shared data, or other protected functionality or hardware. It also raises the privilege level, which isn’t relevant for the Cortex-M0/M0+, but can be for the M3 and up.

Priority scheme

The concept relies on defining an exception priority scheme, and then letting the NVIC arbitrate the mutual access to the protected data, without the need for other synchronising or lock-out mechanics:

prio 0: SVC for run-time error handling
prio 1: trap handlers: mutate protected data (or access other privileged functionality)
prio 2 or 3: exceptions requesting mutating access to protected data via trap
no prio: thread code requesting mutating access to protected data via trap

For clarity, exception handlers that do not request access to the protected data can be of any priority. If we want to have run-time error reporting in these exception handlers, they should avoid priority level 0.
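To illustrate why this scheme needs no further locking, here is a minimal host-side Python model of the preemption rule. It is a sketch, not target code, and the names are made up for this illustration. It assumes the usual NVIC behaviour: a pending exception preempts the running code only if its priority value is numerically lower, thread mode can be preempted by any exception, and interrupts are not masked.

```python
THREAD = None  # thread mode has no exception priority


def preempts(pending_prio, running_prio):
    """True if a pending exception would preempt the running code."""
    if running_prio is THREAD:
        return True
    return pending_prio < running_prio


TRAP = 1
REQUESTERS = (2, 3, THREAD)  # exceptions at prio 2 or 3, and thread code

# A trap set pending by any requester is taken at once, that is, synchronously.
trap_taken_at_once = all(preempts(TRAP, r) for r in REQUESTERS)

# No requesting exception can interrupt a running trap handler.
trap_undisturbed = not any(preempts(r, TRAP) for r in (2, 3))

# Only priority 0 can still interrupt the trap handler.
print(trap_taken_at_once, trap_undisturbed, preempts(0, TRAP))
```

In this model, a handler at priority 0 that also requested mutating access would break the exclusivity, which is why the requesting exceptions sit at priorities 2 or 3.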

Parameters

We can pass parameters to the trap handler using different methods:

via r2, r3, and (if adventurous) r12, reading the values from the stack; this assumes an unwired interrupt, where r0 and r1 are not available, whereas with SVC we could also use r0 and r1;
via r4 to r7, reading the values from the registers;
via r8 to r11, reading the values from the registers;
via the stack only, reading the values from the stack.

Or a combination thereof.

Results

Not shown here, but results of the trap handler could be passed back to the triggering code using the same concepts as for passing parameters.

Output Terminal

See Set-up, one-terminal set-up.

Build and Run

Build module TrapHandlers with Astrobe, and create and upload the UF2 file using abin2uf2.

Set Astrobe’s memory options as listed, and the library search path as explained.

Repository

TrapHandlers.mod

Also the Cortex-M0/M0+ can offer both levels of privilege with the addition of a corresponding architecture extension, but the RP2040 does not. ↩︎
We could design and implement a kernel where all threads are executed in handler mode. It’s an interesting concept, which I may one day attempt to implement. ↩︎
Maybe to be reconsidered. If (or when) the kernel trap handlers become stable, maybe we could get away with the hard fault error messages resulting from using SVC inside an SVC handler. SVC is a nice concept for protected system level functions. ↩︎
Or successful experiment, depending on the point of view. ↩︎
Which means the prototype implementation of kernel-v2 at the time of this writing is wrong. One of the motivators for this test program and documenting the results here was to figure all this out. ↩︎