Overview
The RP2040/Pico and RP2350/Pico2 use SoC-external flash memory, which is read serially over SSI/SPI. The Execute in Place (XIP) functionality lets the CPU fetch instructions as if the serially connected flash memory were a linear address space: addresses are translated into serial reads, transparently to the CPU and the bus.
Both MCUs provide a 16 kB on-chip SRAM cache for the flash memory to alleviate the long load times of direct flash reads. Like XIP itself, it works transparently from a programmer’s point of view.
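To make the interplay of XIP and the cache more concrete, here is a toy Python model. It is a sketch for illustration, not the actual hardware logic. The window base address `0x10000000`, the 8-byte line size, the two-way associativity, and the LRU replacement policy are assumptions; only the cache size comes from the text above. The class name `XipCacheModel` is hypothetical.

```python
XIP_BASE = 0x1000_0000   # assumed start of the XIP address window
CACHE_SIZE = 16 * 1024   # cache size in bytes, as stated in the text
LINE_SIZE = 8            # assumed cache line size in bytes
WAYS = 2                 # assumed associativity
NUM_SETS = CACHE_SIZE // (LINE_SIZE * WAYS)


class XipCacheModel:
    """Toy model that counts cache hits and misses (i.e. flash reads)."""

    def __init__(self):
        # One list of tags per set, least recently used first.
        self.sets = [[] for _ in range(NUM_SETS)]
        self.hits = 0
        self.misses = 0

    def load(self, addr):
        """Simulate one load; return True on a cache hit."""
        offset = addr - XIP_BASE      # linear address -> flash offset
        line = offset // LINE_SIZE    # cache line number within flash
        index = line % NUM_SETS       # set that this line maps to
        tag = line // NUM_SETS        # identifies the line within its set
        ways = self.sets[index]
        if tag in ways:
            self.hits += 1
            ways.remove(tag)          # move to most-recently-used position
            ways.append(tag)
            return True
        self.misses += 1              # would trigger a serial flash read
        if len(ways) == WAYS:
            ways.pop(0)               # evict LRU entry (assumed policy)
        ways.append(tag)
        return False


cache = XipCacheModel()
code_addrs = list(range(XIP_BASE + 0x100, XIP_BASE + 0x140, 4))  # 16 word loads

for addr in code_addrs:               # first pass: cold cache
    cache.load(addr)
cold_hits, cold_misses = cache.hits, cache.misses

for addr in code_addrs:               # second pass: code is cached now
    cache.load(addr)
```

In this model, the first pass misses once per 8-byte line, while the second pass hits on every load, which is the effect that pre-caching exploits.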
Example Programs
The following example programs explore questions about code loading, and thus execution performance, when
- directly executing from flash memory, or
- loading code in SRAM for execution, or
- pre-caching code before execution.
The programs:
In a Nutshell
- Directly executing from flash, without using the cache, or before the cache has warmed up, results in long loading times of about 400 nanoseconds per load.
- Loading code into SRAM results in the shortest possible load times, below 20 nanoseconds per load.
- Once cached, the code performs as if loaded into SRAM.
- Instead of loading code into SRAM, we can pre-cache it before execution. As shown in AlarmEval, this works well.
- Loading code into different SRAM banks for each core can yield benefits if we need the ultimate jitter-free loading performance, since we avoid access congestion between the bus masters, which include the DMA read and write channels.
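As a rough back-of-the-envelope illustration of what these figures mean for a piece of code, the following Python sketch estimates the total load time for a given number of loads. It takes the approximate 400 ns and 20 ns values from the list above as constants; the function name `fetch_time_ns` and the idea of a "cached fraction" are assumptions made for this example, not something the example programs measure in this form.

```python
FLASH_LOAD_NS = 400   # approx. uncached load from flash (figure from the list above)
FAST_LOAD_NS = 20     # upper bound for an SRAM or cached load (figure from the list above)


def fetch_time_ns(num_loads, cached_fraction):
    """Estimate the total load time for num_loads loads.

    cached_fraction is the share of loads served from cache or SRAM
    (0.0 = completely cold, 1.0 = fully pre-cached or SRAM-resident).
    """
    if not 0.0 <= cached_fraction <= 1.0:
        raise ValueError("cached_fraction must be between 0.0 and 1.0")
    cached = num_loads * cached_fraction
    uncached = num_loads - cached
    return cached * FAST_LOAD_NS + uncached * FLASH_LOAD_NS


cold = fetch_time_ns(100, 0.0)   # first run, nothing cached
warm = fetch_time_ns(100, 1.0)   # pre-cached or running from SRAM
print(f"cold: {cold / 1000:.1f} us, warm: {warm / 1000:.1f} us")
```

With these figures, 100 loads take about 40 microseconds cold and at most about 2 microseconds warm, which is why the first execution of, say, an interrupt or alarm handler can be markedly slower than later ones unless the code is pre-cached.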
Notes:
- Astrobe does support running code from SRAM. To run code from SRAM, certain limitations and coding rules apply (cf. CodeLoading):
  - procedures called from SRAM-based code must be referred to using procedure variables,
  - run-time error messages refer to the SRAM addresses, and are thus of little help, and stack traces are not created.
- Getting helpful error addresses and stack traces from SRAM-based code would be possible, but would introduce some complexity, which I’ll avoid for now, not least because pre-caching works well.
- Maybe at some point Astrobe will provide SRAM-loading and execution of code in a better way, but it’s a hairy problem. I see no obvious generic solution for now.
- The load times above apply to a system clock frequency of 125 MHz.
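To relate the load times quoted above to the 125 MHz clock, it can help to express them in system clock cycles. The following Python sketch does the conversion; the helper name `ns_to_cycles` is hypothetical.

```python
SYS_CLK_HZ = 125_000_000   # system clock frequency from the note above


def ns_to_cycles(ns, clk_hz=SYS_CLK_HZ):
    """Convert a duration in nanoseconds to system clock cycles."""
    return ns * clk_hz / 1_000_000_000


flash_cycles = ns_to_cycles(400)   # uncached load from flash
fast_cycles = ns_to_cycles(20)     # upper bound for an SRAM or cached load
print(flash_cycles, fast_cycles)
```

At 125 MHz one cycle lasts 8 ns, so a load of about 400 ns corresponds to roughly 50 cycles, and a load below 20 ns to fewer than 2.5 cycles. At a different clock frequency the cycle counts for flash loads will not simply scale, since the flash interface has its own clock divider and timing.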