Introduction
Aspects
With two processor cores available, certain aspects of creating programs and library modules need specific consideration, including:
- concurrent execution of the same code at the same time by both cores
- memory segmentation and allocation between the cores
- data sharing and separation
- module initialisation
- use of hardware resources, such as peripheral devices
Note that the following does not yet consider any memory segmentation for Secure and Non-secure code on the RP2350.
Basics
Generally speaking, the code needs to be re-entrant and thread-safe, meaning that the two cores can execute the same code at the same time without any impact on control flow and results, or on the operation of the hardware devices (peripherals).
For any procedure, this means at minimum:
- don’t hold state between calls, and
- don’t use non-local variables as temporary storage, eg. string buffers.
In other words, the crucial question is: can this procedure be called at any time from both cores, and also – not directly related to the two-core architecture – can this procedure be interrupted at any time, and can the “interrupter” also call it? Such an interruption could be caused by a hardware interrupt, or by a preemptive scheduler, time-sliced or not. For interrupt handlers, we can define restrictions on what is allowed; for library modules in general we should not need to stipulate such restrictions.
Temporary storage should be
- allocated on the stack, or
- provided by the caller via the API.
If a procedure’s state needs to be retained, the caller needs to provide the corresponding storage via the API. For state not pertaining to a single procedure, see next point.
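These rules can be illustrated with a small Oberon sketch. The module and procedure names are hypothetical, not part of the RTK framework: all working storage is local (on the stack), and the result buffer is provided by the caller, so the procedure can be called from both cores, or re-entered after an interrupt, without conflict.

```oberon
MODULE Sketch;
(* hypothetical example: a re-entrant integer-to-string conversion
   for non-negative values; no module-level state, no shared buffers *)

  PROCEDURE IntToStr*(x: INTEGER; VAR s: ARRAY OF CHAR);
    VAR i, j: INTEGER; d: ARRAY 12 OF CHAR;  (* local, on the stack *)
  BEGIN
    i := 0;
    REPEAT
      d[i] := CHR(x MOD 10 + ORD("0")); x := x DIV 10; INC(i)
    UNTIL x = 0;
    j := 0;
    WHILE i > 0 DO DEC(i); s[j] := d[i]; INC(j) END;
    s[j] := 0X
  END IntToStr;

END Sketch.
```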
Synchronisation
Related to state, another issue is (potentially) shared data and devices. We need to ensure that calls from both cores (or interrupts) do not
- corrupt the shared data, or
- interfere with the physical operation of a device.
This can be achieved by:
- convention and program design: we assign specific data or a device to a core, that is, only one core is allowed to mutate and use it;
- mutual lockout: we ensure that the shared data is never accessed by one core while the other still mutates it, or a device is never used by two cores at the same time.
Convention and program design can mean:
- implemented in software: the procedure detects which core is calling it, and selects the corresponding data;
- enforced in software: generate a runtime error, since its violation is a programming error (or malicious code);
- a rule that programmers abide by: “just don’t do it”;
depending on the nature of our control program and the modules.
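As a sketch of the “enforced in software” variant, a procedure can check the calling core against an assigned owner. The Device record and its owner field are illustrative; only MCU.SIO_CPUID is an actual register name from this text:

```oberon
(* illustrative: enforce a core assignment at run-time *)
PROCEDURE Use*(dev: Device);
  VAR cid: INTEGER;
BEGIN
  SYSTEM.GET(MCU.SIO_CPUID, cid);  (* which core is calling? *)
  ASSERT(cid = dev.owner);         (* a violation is a programming error *)
  (* ... operate the device ... *)
END Use;
```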
The mutual lockout (or synchronisation) can be internal or external:
- internal: the procedure itself claims a lock and releases it, transparent to the caller;
- external: we require the caller to claim a lock before calling, and then release it when done.
With internal synchronisation, only a single call can be protected; with external synchronisation the protection can span several procedure calls, for example to lock down a serial device in order to call several output procedures to complete the desired output.
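The difference can be sketched as follows; Claim, Release, and the lock field are placeholders for whatever spinlock or semaphore abstraction the program uses:

```oberon
(* internal synchronisation: transparent to the caller *)
PROCEDURE PutChar*(dev: Device; ch: CHAR);
BEGIN
  Claim(dev.lock);
  (* ... write ch to the hardware ... *)
  Release(dev.lock)
END PutChar;

(* external synchronisation: the caller brackets several calls,
   so the whole output sequence appears on the device uninterrupted *)
PROCEDURE report(dev: Device; x: INTEGER);
BEGIN
  Claim(dev.lock);
  PutString(dev, "x = "); PutInt(dev, x); PutLn(dev);
  Release(dev.lock)
END report;
```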
Before we look at different modules in the RTK framework against the above background, let’s have a look at the memory organisation.
Memory Segmentation and Access
The RP MCUs are bare-metal devices, and unless we install an operating system, we are confronted with all the nitty-gritty details of memory management from the ground up. We have to design and implement appropriate solutions for our two-core programs. We need stacks, heaps, vector tables, and some space for module variables.
In general, each core of the RP2040 or RP2350 can access the whole memory space.1 This includes SRAM, flash memory, and peripheral device addresses. For the program code, both cores must be able to access the same address space, be it in flash memory or SRAM.
Regarding volatile data storage, a processor core does not “know” about heap, or even stack, memory; it’s the compiler and linker that instruct the CPU accordingly. Yes, the processor has specific instructions that facilitate using a stack, for example, since its designers assumed that one will be used, but we could write whole programs without using a stack or heap.
Of course we don’t want to go back to the stone ages, and not use a stack. Or two stacks, with two cores. Therefore we need to segment the available SRAM. At the minimum, each core requires a stack, with a stack starting address that provides sufficient space to grow downwards without interfering with the other, or the heap.
The initial stack address is provided
- for core 0 during the boot sequence, read from the binary code file,
- for core 1 when it gets activated by core 0: one of the values that core 0 must send to core 1 through the inter-core FIFO is the initial stack address.
Heap memory space and the module variables space are completely defined in software.
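The launch sequence for core 1, as documented in the RP2040 and RP2350 datasheets, is a handshake over the SIO inter-core FIFO: core 0 sends 0, 0, 1, then the vector table address, the initial stack pointer, and the entry point, and core 1 echoes each value back. A sketch, with FifoPut and FifoGet as placeholder names for the actual FIFO access procedures:

```oberon
(* sketch of the core 1 wake-up handshake via the inter-core FIFO *)
PROCEDURE StartCoreOne(vectorTable, stackStart, entryPoint: INTEGER);
  VAR i, resp: INTEGER; cmd: ARRAY 6 OF INTEGER;
BEGIN
  cmd[0] := 0; cmd[1] := 0; cmd[2] := 1;  (* fixed wake-up preamble *)
  cmd[3] := vectorTable;                  (* core 1 vector table (VTOR) *)
  cmd[4] := stackStart;                   (* core 1 initial stack pointer *)
  cmd[5] := entryPoint;                   (* core 1 start address *)
  i := 0;
  WHILE i < 6 DO
    FifoPut(cmd[i]); resp := FifoGet();
    IF resp = cmd[i] THEN INC(i) ELSE i := 0 END  (* restart on mismatch *)
  END
END StartCoreOne;
```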
Memory Map
Astrobe’s IDEs let you slice and dice the memory map in many ways, using the configuration options – within the physical confines of the device. You specify the address range for the program code, and the one for data. Astrobe does not (yet?) have options for more than one core, so we provide a solution via module Config, which extends the options for the other core, see here.
The chosen memory map for SRAM and flash memory looks like this:2
SRAM:
+---------------------------+ 020040000H = CoreOneStackStart
| core 1 stack |
| |
~~~ ~~~
| |
| core 1 heap | 020030200H = CoreOneHeapStart
+---------------------------+
| |
| core 1 vector table | 020030000H = CoreOneDataStart = LinkOptions.DataEnd
+---------------------------+
| module data |
| |
+---------------------------+ CoreZeroStackStart = LinkOptions.StackStart
| core 0 stack |
| |
~~~ ~~~
| |
| core 0 heap | 020000200H = CoreZeroHeapStart = LinkOptions.HeapStart
+---------------------------+
| |
| core 0 vector table | 020000000H = CoreZeroDataStart = LinkOptions.DataStart
+---------------------------+
Flash memory:
+---------------------------+ 010200000H = CodeEnd = LinkOptions.CodeEnd
| |
| |
~~~ code (shared) ~~~
| |
| | 010000100H = CodeStart = LinkOptions.CodeStart
+---------------------------+
| boot code (RP2040) |
| meta data (RP2350) | 010000000H
+---------------------------+
Using the kernel, the stack space for one core looks like this:
+---------------------------+
| main stack (MSP) |
+---------------------------+
| thread 0 stack (PSP) |
+---------------------------+
| thread 1 stack (PSP) |
+---------------------------+
| thread 2 stack (PSP) |
+---------------------------+
~~~ ~~~
+---------------------------+
| thread n stack (PSP) |
+---------------------------+
The main stack (via MSP) is first used for initialisation, and then for exception handling. The threads use their own stacks, using the PSP.
Let’s focus on SRAM, and have a look at the different parts of the memory map.
Stack Memory
Each core has to get its own stack memory space, there’s no way around it. It’s the storage for local variables. We’ve mentioned above that temporary data should be kept on the stack to avoid any sharing of buffers and the like. Entries on the stack can point to non-local storage space though, via VAR and (value) POINTER parameters.
Heap Memory
The two cores could share a common heap, but this would require coordination, or arbitration, for the dynamic allocation of memory via NEW. Hence, each core gets its own heap memory space. Note that, once allocated, each core can access the heap memory of the other core anyway, provided it gets a corresponding POINTER.
Module Data
These are the VARs declared at the module level. Subject to data visibility and accessibility defined for modules, module data is shared between cores.
Vector table
Each core gets its own exception vector table. They could share one, as each core can be given the address of the vector table (MCU.PPB_VTOR). But exception handling is done by each core separately, based on its interrupt signals, priority settings, etc., and having each core install exception vectors in a shared table just opens up unnecessary complexity.
Device Registers
Most of the RP2040’s and RP2350’s device registers have fully atomic set, clear, and xor aliases, using masks, hence they can be mutated without the need for a read-modify-write cycle, making writing the corresponding procedures much easier.
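The datasheets define these aliases as fixed address offsets from each register: +01000H for atomic XOR, +02000H for atomic bit-set, +03000H for atomic bit-clear. A sketch of the corresponding access procedures, assuming SYSTEM.PUT for direct memory writes; the constant and procedure names are illustrative:

```oberon
CONST
  AXOR = 01000H;  (* atomic XOR on write *)
  ASET = 02000H;  (* atomic bitmask set on write *)
  ACLR = 03000H;  (* atomic bitmask clear on write *)

PROCEDURE SetBits(regAddr: INTEGER; mask: SET);
BEGIN
  SYSTEM.PUT(regAddr + ASET, mask)  (* no read-modify-write cycle needed *)
END SetBits;

PROCEDURE ClearBits(regAddr: INTEGER; mask: SET);
BEGIN
  SYSTEM.PUT(regAddr + ACLR, mask)
END ClearBits;
```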
Program Initialisation
Here’s the initialisation sequence for example program NoBusyWaiting:
04CE4H 0F7FBFB8EH bl.w LinkOptions..init
04CE8H 0F7FBFB94H bl.w MCU2..init
04CECH 0F7FBFBC8H bl.w Config..init
04CF0H 0F7FBFC06H bl.w Resets..init
04CF4H 0F7FBFD26H bl.w GPIO..init
04CF8H 0F7FBFD4CH bl.w PowerOn..init
04CFCH 0F7FBFEC0H bl.w Clocks..init
04D00H 0F7FBFF52H bl.w MAU..init
04D04H 0F7FCF98EH bl.w Memory..init
04D08H 0F7FCF9A2H bl.w LED..init
04D0CH 0F7FCFE5EH bl.w RuntimeErrors..init
04D10H 0F7FDFABEH bl.w Error..init
04D14H 0F7FDFAFAH bl.w TextIO..init
04D18H 0F7FDFD52H bl.w Texts..init
04D1CH 0F7FDFF3CH bl.w ResData..init
04D20H 0F7FEFB52H bl.w RuntimeErrorsOu..init
04D24H 0F7FEFC8CH bl.w UARTdev..init
04D28H 0F7FEFCCCH bl.w UARTstr..init
04D2CH 0F7FEFD44H bl.w Coroutines..init
04D30H 0F7FEFD6EH bl.w SysTick..init
04D34H 0F7FFF994H bl.w Kernel..init
04D38H 0F7FFF9E6H bl.w UARTkstr..init
04D3CH 0F7FFFA8CH bl.w Terminals..init
04D40H 0F7FFFB74H bl.w Out..init
04D44H 0F7FFFBBCH bl.w Main..init
04D48H 0F7FFFC6CH bl.w MultiCore..init
04D4CH 0F7FFFD6EH bl.w Exceptions..init
04D50H 0F7FFFDE4H bl.w Timers..init
04D54H 0F7FFFF68H bl.w NoBusyWaitingC1..init
04D58H 0F7FFFFBEH bl.w NoBusyWaitingC0..init
The bodies of all modules are run in sequence as determined by the Astrobe build system – by core 0. Core 1 is not even awakened yet. Hence, no data corruption can happen due to concurrent data access until NoBusyWaitingC0..init is executed, which calls NoBusyWaitingC0.run, which in turn wakes up core 1 by passing NoBusyWaitingC1.Run to MultiCore.InitCoreOne.
To maintain the correctness of the program, core 1 should only ever be activated from the program for core 0, and only after the complete module initialisation sequence has finished.
Let’s have a look at the different RTK framework modules and consider which memory is being allocated, where, and how it is used.
Pure Hardware Access
Some modules operate the devices directly, without creating data structures that represent abstractions of the hardware: Resets, GPIO, StartUp, Clocks, LED. Any state is held directly in hardware, and the corresponding access procedures only use local storage on the stack, if any. Access by different cores is coordinated by program design (agreement, convention). Access restrictions for specific hardware resources from one core or the other could be implemented and enforced in software, though, depending on the requirements regarding run-time robustness.
A special case in this category are the device register addresses that are the same for each core, but actually operate on different hardware devices.3 One core does not “see” the other core’s registers in that address range. They include the system tick, system control block, and memory protection unit. Currently available related modules are Exceptions and SysTick.
Memory Allocation
About Recursion and Dynamic Memory Allocation
Like recursion, dynamic memory allocation is often not recommended (or even forbidden) by programming guidelines for control programs. The problem, of course, is: what does an unsupervised embedded program do in the case of a stack overflow due to recursion, or when all available heap memory is used up? That is, if the stack overflow is even detected, and does not just corrupt data.
I would strongly agree with not using recursion, as it can be replaced by iteration, but I think dynamic memory allocation can be allowed as follows:
- RTK uses RECORDs to describe and represent hardware devices, and POINTERs to these records can easily be “passed around” and used as procedure parameters to select a specific instance of a device, such as a UART. Such RECORDs are only allocated if the program will actually use the device.
- We know exactly which hardware devices will be used in our program, hence we can allocate all the needed device RECORDs at the start-up of the program. After the initialisation of the program, NEW is not required anymore, and we can even lock down the heap thereafter. During initialisation, we can ASSERT(p # NIL) for hard checks, since we’re catching program design or implementation errors, not run-time errors.
- The same holds for other data structures, eg. in the kernel: create at start-up, assert for NIL.
- If we use the kernel, the same holds, mutatis mutandis, for stack allocation.
So:
- no recursion
- allocate all heap and stack memory during program initialisation
- possibly lock all allocation after the initialisation, eg. with Memory.LockHeaps.
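Put together, the pattern looks like this in a program’s initialisation. The two UART device variables are illustrative; Memory.LockHeaps is the procedure mentioned above:

```oberon
VAR uart0, uart1: UARTdev.Device;  (* POINTERs to device RECORDs *)

PROCEDURE init;
BEGIN
  (* allocate every device RECORD the program will use... *)
  NEW(uart0); ASSERT(uart0 # NIL);  (* hard check: design error, not run-time error *)
  NEW(uart1); ASSERT(uart1 # NIL);
  (* ...then forbid any further dynamic allocation *)
  Memory.LockHeaps
END init;
```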
Heap Memory and Stack Allocators
Module Memory implements the memory allocators for both cores. The data to manage these storage areas is held in two equal data structures, one per core, in the module data space. The access procedures check on which core they are running by enquiring the core number from the hardware (MCU.SIO_CPUID), then select the corresponding management data structure, and only ever mutate data in that RECORD. No conflict possible between the cores.
With cooperatively scheduled threads, or implicit threads with a “Big Main Loop Program”, this arrangement also suffices to avoid conflict between threads. With a preemptive scheduler it suffices if all allocations are done during initialisation. Interrupts should not be allowed to allocate memory in any case.
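The selection pattern can be sketched as follows; the heaps array and the HeapDesc record are illustrative stand-ins for the module’s actual management data:

```oberon
VAR heaps: ARRAY 2 OF HeapDesc;  (* one management RECORD per core *)

PROCEDURE Allocate*(VAR addr: INTEGER; size: INTEGER);
  VAR cid: INTEGER;
BEGIN
  SYSTEM.GET(MCU.SIO_CPUID, cid);  (* 0 or 1: the calling core *)
  (* allocate from heaps[cid] only; the other core's RECORD
     is never touched, so no inter-core conflict is possible *)
END Allocate;
```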
Read-only Data
Some modules, among them Astrobe’s LinkOptions, as well as Config, set variables during initialisation that will never change during the run-time of a program. This kind of “lifetime” read-only variable is safe to access from both cores at any time.
While the exported variables of LinkOptions and Config are read-only, enforced by the compiler, there are other data structures that are read-only by convention, see UARTdev.Device below.
Kernel
All data structures for the kernel data, as well as for threads and their coroutines, are created at start-up. Each core runs its own kernel and scheduler, and each thread is allocated to a core, hence all related data are strictly separated from each other. Access employs the same scheme as module Memory. All kernel data can only be accessed using the corresponding procedures (full encapsulation).
Data Structures Representing Hardware Devices, eg. UARTdev.Device
Module UARTdev, for example, defines a RECORD UARTdev.Device that represents one UART instance in software and is accessed using a POINTER. The corresponding memory is allocated at start-up. The IO procedures in modules UARTstr and UARTkstr take a UARTdev.Device as parameter to access the specific hardware device. While it would be possible to make the fields of UARTdev.Device accessible only via corresponding procedures, a compromise was chosen for performance reasons, namely to make the address values for the transmit, receive, and flag registers directly accessible by making them public.
Of course, this makes these register addresses also open for modification by all modules that use UARTdev.Device, such as UARTstr and UARTkstr. The solution here is a programmer guideline: don’t do it. As long as we don’t expect any malicious code, this is sufficient. More protection is always possible, but at the cost of more code complexity as well as run-time overhead.
While the procedures in UARTstr and UARTkstr are re-entrant – they only use local storage, and two concurrent calls can happily share the same instance of UARTdev.Device in read-only mode – the use of the actual hardware of course is not. The two calls would attempt to write and read the same hardware registers at the same time, resulting in havoc. Hence, the program design must ensure that the two cores do not attempt to use the same UART at the same time.
See Text Output and Input for a possible solution, which can be tweaked in many ways to fit a program’s needs. The software is re-entrant; the synchronised access to the hardware peripherals is solved by program design, including the use of signals, semaphores, or messages.
Run-time Error Handling
Only one run-time error can occur at a time per core.4 Module RuntimeErrors provides a separate set of module variables for the data structures that hold error and fault data. It uses the same access separation mechanics as module Memory.
Module RuntimeErrorsOut, for printing the error data collected by RuntimeErrors, uses the TextIO.Writer channel infrastructure (Text Output and Input), ie. part re-entrant procedures, part program design to access the output peripheral device.
See Runtime Errors.
Summary
- The RTK framework’s procedures are written in a re-entrant fashion:
- no state
- temporary data local on the stack
- Data structures are separated per core, and accessed only from software running on that core; this separation is implemented in software:
- memory management (heap, stacks)
- run-time errors
- kernel
- The driver software for hardware peripheral devices is re-entrant, ie. the cores can use the same procedures at the same time, while access to the peripheral hardware devices themselves is synchronised by program design, for example:
- UART devices
- SPI devices
- GPIO via SIO
- All memory allocation is done at program start-up, either during the module initialisation, or by the programs for the two cores.
- Only wake up core 1 once all modules are initialised.
- Shared read-only data is
- defined at start-up, and cannot be mutated: LinkOptions, Config
- defined when the data structures for devices are set up, and not mutated by their drivers by program design: eg. UARTdev, UARTstr, UARTkstr
1. Unless we implement some protection scheme using the Memory Protection Unit (MPU). ↩︎
2. The actual values used may be different when you read this, but the principle remains. ↩︎
3. All registers that start with PPB_. ↩︎
4. Depending on the design and implementation, this can be different with a time-slicing thread scheduler. ↩︎