Flash Memory

Overview

The RP2040 uses flash memory external to the SoC, which is read serially via SSI/SPI. The Execute in Place (XIP) functionality lets the CPU fetch instructions as if the serially connected flash memory were a linear address space: addresses are translated into serial reads, transparently to the CPU and the bus.
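To illustrate the address translation, here is a minimal sketch in Python. It assumes the XIP window starts at 0x10000000 and that the flash chip is read with the plain 03h serial read command and a 24-bit address. The command actually used depends on the flash chip and on how the SSI has been configured, so the function names and the command choice are illustrative assumptions only.

```python
XIP_BASE = 0x10000000  # assumed start of the XIP address window


def xip_to_flash_offset(cpu_addr: int) -> int:
    """Map a CPU address in the XIP window to a byte offset in the flash chip."""
    offset = cpu_addr - XIP_BASE
    if not 0 <= offset < (1 << 24):  # 24-bit serial address, i.e. 16 MB
        raise ValueError("address outside the XIP flash window")
    return offset


def serial_read_command(cpu_addr: int) -> bytes:
    """Build a plain 03h read: opcode, then the 24-bit offset, MSB first."""
    offset = xip_to_flash_offset(cpu_addr)
    return bytes([0x03]) + offset.to_bytes(3, "big")
```

The CPU only ever sees the linear address; the opcode and the offset bytes are what travels over the serial link on its behalf.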

The RP2040 provides a 16 kB on-chip SRAM cache for the flash memory, to alleviate the long load times when reading directly from flash. From a programmer’s point of view it works transparently, much like XIP itself.

Example Programs

The following example programs explore how code loading, and thus execution performance, is affected when

  • directly executing from flash memory, or
  • loading code in SRAM for execution, or
  • pre-caching code before execution.

The programs:

In a Nutshell

  • Directly executing from flash, without using the cache, or before the cache has warmed up, results in long loading times of about 400 nanoseconds per load.
  • Loading code into SRAM results in the shortest possible load times, below 20 nanoseconds per load.
  • Once cached, the code performs as if loaded into SRAM.
  • In lieu of loading code into SRAM, it can be pre-cached before execution. As shown in AlarmEval, this works well.
  • Loading code into different SRAM banks for each core can yield benefits if we need the ultimate jitter-free loading performance, since we avoid access congestion between the bus masters, which include the DMA read and write channels.
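The effect of pre-caching can be sketched with a toy model in Python. This is not how the example programs measure anything. It simply uses the figures above (about 400 ns for a load from flash, and 20 ns as an upper bound for a load from cache or SRAM) and assumes 8-byte cache lines, no evictions, and a hypothetical 256-byte routine.

```python
MISS_NS = 400   # load served from flash (figure from the list above)
HIT_NS = 20     # upper bound for a load served from cache or SRAM
LINE_BYTES = 8  # assumed cache line size for this model


def run(addresses, cache):
    """Return the total load time in ns, filling `cache` on every miss."""
    total = 0
    for addr in addresses:
        line = addr // LINE_BYTES
        if line in cache:
            total += HIT_NS
        else:
            total += MISS_NS
            cache.add(line)
    return total


def precache(start, size, cache):
    """Touch one address per cache line so that later loads hit."""
    for addr in range(start, start + size, LINE_BYTES):
        cache.add(addr // LINE_BYTES)


# 64 word loads of a hypothetical routine located in flash
code = range(0x10000100, 0x10000100 + 256, 4)

cold = run(code, set())               # first execution, cache empty

warm_cache = set()
precache(0x10000100, 256, warm_cache)
warm = run(code, warm_cache)          # execution after pre-caching
```

In this model the pre-cached run takes the same time as a run from SRAM would, while the cold run pays the flash penalty once per cache line.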

Notes:

  • Astrobe for Cortex-M0 does not yet “officially” support the RP2040. For now, to run code from SRAM, certain limitations and coding rules apply (cf. CodeLoading):
    • procedures called from SRAM-based code must be referred to using procedure variables,
    • run-time error messages refer to the SRAM addresses, and are thus of little help, and stack traces are not created.
  • Getting helpful error addresses and stack traces from SRAM-based code would be possible, but would introduce some complexity, which I’ll avoid for now, not least because pre-caching works well.
  • Maybe the official release will provide a better way to load and execute code from SRAM, but it’s a hairy problem, and I don’t see an obvious generic solution for now.
  • The above load times are with a 125 MHz system clock frequency.
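To relate these load times to the system clock, the conversion from nanoseconds to clock cycles can be written as a short Python helper. The function name is hypothetical, and the only inputs are the figures quoted above.

```python
SYS_CLK_HZ = 125_000_000  # system clock frequency stated in the notes


def ns_to_cycles(ns: float, clk_hz: int = SYS_CLK_HZ) -> float:
    """Convert a duration in nanoseconds to system clock cycles."""
    return ns * clk_hz / 1e9


flash_load_cycles = ns_to_cycles(400)  # load directly from flash
sram_load_bound = ns_to_cycles(20)     # upper bound for a load from SRAM or cache
```

At 125 MHz one clock cycle lasts 8 ns, so a 400 ns load from flash corresponds to 50 cycles, and "below 20 ns" means fewer than 2.5 cycles.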