Stack Traces

Overview

Runtime Errors provides an overview of the handling of Errors and Faults. Here we look at stack traces, which can be created when catching a run-time error.

Example/Test Programs

There is a set of example/test programs to demonstrate and explore stack traces as described below.

Terminology

  • (Run-time) error exception, error exception handler: the Error and Fault exceptions and handlers in RuntimeErrors.
  • Application exception, application exception handler: exceptions belonging to the application (program) logic.

Interrupts are a subset of all exceptions. That is, the term ‘exception’ includes ‘interrupt’, unless noted otherwise.

About Stack Traces

Stack traces are a programmer’s tool to identify and fix defects, or other run-time problems, such as reading wrong or garbled data from a sensor.

The stacks of a program represent its state, together with the module variables (globals) and the state of the CPU [1]. In particular, the stacks contain the chain of procedure calls and the values of their local variables and parameters at any moment. In Oberon, all code execution happens in procedures. This includes the module bodies, which are compiled as procedures as well (.init), and called by the start-up sequence of the program, as created by the linker.

A stack trace is a snapshot of these procedure calls [2]. It is usually used in error situations to depict what has led up to the problem, but could basically be created at any time and code location.

It’s important to realise that the actual procedure calls themselves are not registered on the stack – this happens only indirectly when the callee’s prologue pushes the link register (LR) value onto the stack, which contains the return address inside the caller. The address of the actual call (bl.w, blx) has to be derived from the pushed return address when creating the stack trace.
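The derivation of the call address from a pushed return address can be sketched as follows. This is a minimal Python model, not the code in RuntimeErrors: the function name, the `read_halfword` callback, and the sample instruction halfwords are assumptions made for illustration. It relies on the Thumb encodings as I understand them: bl.w is a 32-bit instruction, blx with a register operand is a 16-bit instruction, and a return address has bit 0 set.

```python
def call_address(return_addr, read_halfword):
    """Return the address of the bl.w or blx that produced return_addr, or None."""
    if (return_addr & 1) == 0:
        return None                    # Thumb return addresses have bit 0 set
    next_instr = return_addr & ~1      # instruction following the call
    hw1 = read_halfword(next_instr - 4)
    hw2 = read_halfword(next_instr - 2)
    # bl.w: two halfwords, 11110xxx xxxxxxxx and 11x1xxxx xxxxxxxx
    if (hw1 & 0xF800) == 0xF000 and (hw2 & 0xD000) == 0xD000:
        return next_instr - 4
    # blx <Rm>: one halfword, 0100 0111 1mmm m000
    if (hw2 & 0xFF87) == 0x4780:
        return next_instr - 2
    return None


# Hypothetical code memory: the halfword values are invented, only the
# addresses follow the Stacktr0 example output.
code = {0x10004A00: 0xF7FF, 0x10004A02: 0xFFD0,   # a bl.w
        0x10000102: 0x4788}                       # a blx r1
read = lambda addr: code.get(addr, 0)

print(hex(call_address(0x10004A05, read)))  # 0x10004a00
print(hex(call_address(0x10000105, read)))  # 0x10000102
```

With the stacked lr value 10004A05H from the Stacktr0 example, clearing bit 0 and stepping back over a bl.w gives 10004A00H, the address listed for Stacktr0.p2 in that trace.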

Should this push of the LR onto the stack fail, eg. due to an MCU hardware fault, or if intercepted by an interrupt, we simply cannot “see” the call, even though it has been made.

The Stacks

A “plain” program usually uses just one stack, referred to by the main stack pointer, MSP. All procedure calls and exception handling use this one stack. Astrobe allocates the corresponding memory space below the module variables, from where the stack grows downwards.

However, each kernel thread requires its own stack to preserve its state when other threads get scheduled and run. Module Memory implements the allocation of the corresponding memory slots.

We use the process stack pointer (PSP) for kernel threads, so we don’t have to allocate space for exception handling on each thread stack. Exception handling always uses the MSP. We only need to reserve sufficient space on a thread stack for one exception stacking: when an exception occurs in a thread, stacking happens in its (process) stack, but exception handling, including nested interrupt stacking and handling, is done using the MSP.

In any case, the main stack must reserve sufficient memory space for exception handling. With the RP2040 and RP2350 we have ample SRAM available, but on MCUs with less SRAM, eg. 32k only, this needed reserve may impact the program design: an overflowing stack will corrupt the heap.

The needed memory space for exceptions can be substantial, in particular on the RP2350 with its FPU, and in case we use both Secure and Non-secure realms:

  • The maximum stack frame size is 208 bytes, not including any padding for double-word alignment.
  • If we use all eight exception priorities, we need to reserve 8 * 208 = 1,664 bytes just for the exception stack frames.
  • Plus, we also need stack space for the handlers themselves, ie. for local variables, plus any push of non-stacked registers if required.
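As a rough worked example of this budget, here is a small Python calculation. The frame size and the number of priority levels come from the list above; the `handler_bytes` parameter is a hypothetical per-handler allowance for locals and pushed registers, and the padding is assumed to be at most one 4-byte word per stacking.

```python
MAX_FRAME_BYTES = 208   # from the text: largest exception stack frame, no padding
PRIORITY_LEVELS = 8     # from the text: all eight exception priorities in use


def exception_stack_reserve(levels=PRIORITY_LEVELS, frame_bytes=MAX_FRAME_BYTES,
                            handler_bytes=0, padding=True):
    """Worst-case bytes to reserve on the main stack for nested exceptions."""
    pad = 4 if padding else 0            # one word for double-word alignment
    return levels * (frame_bytes + pad + handler_bytes)


print(exception_stack_reserve(padding=False))    # 1664: frames only
print(exception_stack_reserve())                 # 1696: frames plus padding
print(exception_stack_reserve(handler_bytes=64)) # 2208: hypothetical 64 bytes per handler
```

The numbers shrink proportionally when fewer priority levels are in use, or when the frames are smaller because the FP context or the Non-secure realm is not used.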

By using the PSP for threads, we only need to allocate exception handling stack memory space on the main stack (apart from one stacking in the thread’s stack, as outlined above). Running only in Secure mode, and not making use of all exception priorities, reduces the required exception handling stack space accordingly.

MCU Modes

In general, there are two (execution) modes of the MCU:

  • Handler mode: the code is executing an exception handler;
  • Thread mode: the code is not executing an exception handler.

Thread mode of the MCU is not to be confused with kernel threads, even though the latter do execute in thread mode, but so do programs not using the kernel.

  • Thread mode can use either the main stack pointer (MSP) or the process stack pointer (PSP),
  • Handler mode always uses the MSP.
  • The MCU resets to Thread mode, using the MSP.

Stack Traces

A stack trace can be created for both Errors and Faults, in both Thread and Handler mode.

  • Many of the Faults are caused synchronously by the program code, so a stack trace indicates what has led to the problem.

  • Even if a problem in an exception handler arises asynchronously to the currently running program, or thread, the status of the latter is often relevant, especially if there’s some interaction via shared data or other resources.

Creating Stack Traces Without Kernel Threads

Error or Fault Not in an Exception Handler (MCU Thread Mode)

An Error or Fault is signalled and handled as an error exception – this is our starting point to create a stack trace. Our Error or Fault handler starts to execute with the error exception stack frame on the stack, using the main stack (MSP) below the stacked registers. Hence, all Error and Fault handling code and stack use does not interfere with stack traces.

The first entry of the stack trace is extracted from the Error or Fault handler’s stack frame, together with the corresponding Error or Fault code, offending code address, stack address, and source line number, if available.

Since we don’t have a frame pointer, we need to scan the (main) stack upwards from the location of the error to find the return addresses left behind by bl.w and blx calls. For this, we check each value above the stack frame, as inserted by the stacking process upon error exception entry, to see if it’s a valid link register value. If it is, we derive the address of the bl.w or blx instruction from it, together with the source line number, if available (see below). Check the source of RuntimeErrors.getLR and friends. The code is borrowed, if in slightly modified form, from Astrobe’s module Traps [3].

Naturally, this can lead to false positives. The most obvious case is uninitialised, or not yet initialised, local variables on the stack, where former valid link register values “shine through”. There’s nothing that can be done about that, so we just need to be aware when reading the trace.

The scan stops when either the top of the stack is reached, or we have used all available trace points in the Error or Fault exception data structure. If all trace points are used, but the stack trace would reveal more procedure calls, the exception data structure is marked accordingly. The procedure that prints the stack trace can make use of this marker to indicate that what we see is not the full trace.
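The scan loop and its two stop conditions can be modelled in a few lines of Python. This is a sketch under assumptions, not the RuntimeErrors code: the stack is a dictionary from address to word, the stack contents are invented, the seal constant is the value ARM recommends for stack sealing as far as I know, and `looks_like_return_address` is a toy stand-in for the real link register check.

```python
SEAL = 0xFEF5EDA5   # assumption: the ARM-recommended stack seal value


def looks_like_return_address(value):
    """Toy check: Thumb bit set and inside an assumed flash address range."""
    return (value & 1) == 1 and 0x10000000 <= value < 0x10200000


def scan_stack(words, start, is_return_address, max_points):
    """Scan upwards from start; return (trace points, truncated marker)."""
    trace, truncated = [], False
    addr = start
    while addr in words and words[addr] != SEAL:
        if is_return_address(words[addr]):
            if len(trace) == max_points:
                truncated = True        # more calls exist than we can store
                break
            trace.append((addr, words[addr]))
        addr += 4
    return trace, truncated


# Invented stack contents, for illustration only.
stack = {
    0x2003FFE0: 0x10004A05,
    0x2003FFE4: 0x00000007,   # a local variable, not a return address
    0x2003FFE8: 0x10004A17,
    0x2003FFEC: 0x10004A27,
    0x2003FFF0: 0x10004A5F,
    0x2003FFF4: 0x00000000,
    0x2003FFF8: SEAL,
}

trace, truncated = scan_stack(stack, 0x2003FFE0, looks_like_return_address, 8)
print(len(trace), truncated)   # 4 False
trace, truncated = scan_stack(stack, 0x2003FFE0, looks_like_return_address, 2)
print(len(trace), truncated)   # 2 True
```

Note that the marker is only set when a further candidate is actually found after all trace points are used, so a trace that fits exactly is not reported as incomplete.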

The top of the stack is marked with the ARM-recommended seal value.

Example from Stacktrace, including the error message and the collected register values:

run-time error in thread mode: 7 core: 1
integer divided by zero or negative divisor
Stacktr0.error  addr: 100049BCH  ln: 33
trace:
  Stacktr0.error             100049BCH    33   2003FFD0H
  Stacktr0.p2                10004A00H    43   2003FFE0H
  Stacktr0.p1                10004A12H    55   2003FFE8H
  Stacktr0.p0                10004A22H    63   2003FFECH
  Stacktr0.run               10004A5AH    71   2003FFF0H
stacked registers:
 psr: 61000200H
  pc: 100049BEH
  lr: 10004A05H
 r12: 0A0B0C0DH
  r3: 00000002H
  r2: 10003DCCH
  r1: 00000000H
  r0: 00000000H
  sp: 2003FFB8H
current registers:
  sp: 2003FF84H
  lr: FFFFFFF9H
  pc: 10002D52H
 psr: 6000000BH

Note the data selected for output, as well as their format and presentation used here, are not part of the Error or Fault handling per se, but done by RuntimeErrorsOut.PrintException, as described here.

Error or Fault in an Application Exception Handler (MCU Handler Mode)

Errors and Faults in application exception handlers are also caught, by the same error handlers in RuntimeErrors – as long as the application exception priority is lower than the run-time error handlers’ prio.

Creating a stack trace in this case is a bit more involved. We again gather the Error and Fault code, addresses, source line number, etc. from the stack frame of the error exception that caught the run-time error, and then again scan the stack upwards from right above the error stack frame, accounting for a possible padding word to get the stack frame double-word aligned.

However, since the faulty code was executing an application exception handler, possibly nested, there are the corresponding exception stack frames on the main stack, which have to be accounted for. If we naively do the same scan, we will run through these stack frames, and get false positives, eg. for the stacked LR values, and potentially more for each double-word padding word, or for a “reserved” word inside the floating point (FP) context with the RP2350 (and even more when considering Non-secure exceptions).

Consequently, we have to skip these application exception stack frames, including any double-word padding. We know that the application exception handler has pushed the link register onto the stack right below the stack frame, and it contains the distinctive EXC_RETURN value that can be detected by the scan. To avoid false positives (or at least minimise the probability) for other stack values containing an EXC_RETURN value, the current implementation also checks if the value at the stacked LR address inside the stacked register block is valid, or if it is an EXC_RETURN value (which happens with nested exceptions that don’t do any procedure calls).

After we have identified the stack frame, we can then skip it. In addition, we can use the return address within the stack frame (pc value) to identify the procedure that had been interrupted by the application exception. Also, the current trace point gets an annotation marker. The procedure that prints the stack trace can make use of this marker to indicate interrupted procedures.
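The detect-and-skip step can be sketched in Python as follows. All names here are hypothetical, and the model is simplified: it assumes the handler’s pushed EXC_RETURN sits directly below the frame, it only distinguishes the basic 8-word frame from the 26-word frame with FP context (selected by bit 4 of EXC_RETURN), it takes the padding indication from bit 9 of the stacked psr, and it ignores the additional Secure state context.

```python
def is_exc_return(value):
    """Loose check: EXC_RETURN values have 0xFF in the top byte."""
    return (value >> 24) == 0xFF


def looks_like_return_address(value):
    """Toy check: Thumb bit set and inside an assumed flash address range."""
    return (value & 1) == 1 and 0x10000000 <= value < 0x10200000


def skip_exception_frame(words, addr):
    """If words[addr] is an EXC_RETURN below a plausible exception frame,
    return (address right above the frame, stacked pc), else None."""
    exc_return = words.get(addr)
    if exc_return is None or not is_exc_return(exc_return):
        return None
    frame = addr + 4                     # r0, r1, r2, r3, r12, lr, pc, psr
    lr, pc, psr = (words.get(frame + off) for off in (20, 24, 28))
    if lr is None or pc is None or psr is None:
        return None
    if not (looks_like_return_address(lr) or is_exc_return(lr)):
        return None                      # probably not a real stack frame
    size = 32 if exc_return & 0x10 else 104   # basic frame or frame with FP context
    if psr & 0x200:
        size += 4                        # alignment padding word present
    return frame + size, pc


# Invented frame contents; only the layout matters here.
frame_words = [0, 0, 0, 0, 0x0A0B0C0D, 0x100051E9, 0x100051B6, 0x21000000]
words = {0x2003FF6C: 0xFFFFFFF1}
words.update({0x2003FF70 + 4 * i: w for i, w in enumerate(frame_words)})

resume, interrupted_pc = skip_exception_frame(words, 0x2003FF6C)
print(hex(resume), hex(interrupted_pc))   # 0x2003ff90 0x100051b6
```

The returned pc is what identifies the interrupted procedure for the annotated trace point, and the returned address is where the normal scan would continue.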

Example from Stacktrace (omitting the register values):

run-time error in handler mode: 7 core: 1
integer divided by zero or negative divisor
Stacktr1.error  addr: 10005140H  ln: 37
trace:
  Stacktr1.error             10005140H    37   2003FF50H
  Stacktr1.i2                10005184H    47   2003FF5CH
  Stacktr1.i1                10005196H    53   2003FF64H
  Stacktr1.i0                100051A2H    58   2003FF68H
  --- exc ---
  Stacktr1.h2                100051B6H         2003FF80H
  Stacktr1.h1                100051E4H    77   2003FFA4H
  Stacktr1.h0                100051F6H    82   2003FFB0H
  --- exc ---
  Stacktr1.p1                10005216H         2003FFC8H
  Stacktr1.p0                10005222H   101   2003FFECH
  Stacktr1.run               100052A2H   115   2003FFF0H

Creating Stack Traces With Kernel Threads

Error or Fault Not in an Exception Handler (MCU Thread Mode)

Creating the stack trace for a kernel thread is not essentially different from creating one for a “plain program”: scan upwards from above the stacked registers, and check for valid LR entries, as described above.

The difference is that we need to scan the process stack, as indicated by the process stack pointer (PSP), not the main stack. The error handler executes using the main stack, but the error exception stacking has occurred in the process stack. We detect that by inspecting the EXC_RETURN value, set the scan start address above the stacked registers on the process stack, taking into account any double-word alignment padding, and scan as described above. Just as the main stack, the top of the process stack is marked with a seal value. In fact, the code executed to collect the stack trace is the same for the non-kernel and kernel cases.
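Inspecting the EXC_RETURN value comes down to testing a few bits. The following Python sketch shows the idea; the function names are made up, and the bit positions (bit 2 for the stack used, bit 3 for the mode, bit 4 for the frame type) are taken from my reading of the ARM architecture documentation, so check them against the reference manual.

```python
def decode_exc_return(exc_return):
    """Decode the EXC_RETURN bits that matter for choosing the stack to scan."""
    return {
        "thread_mode": bool(exc_return & 0x08),    # 0: exception came from Handler mode
        "process_stack": bool(exc_return & 0x04),  # 0: stacking used the main stack
        "basic_frame": bool(exc_return & 0x10),    # 0: frame includes FP context
    }


def stack_to_scan(exc_return):
    """Name the stack that holds the error exception stack frame."""
    return "PSP" if exc_return & 0x04 else "MSP"


print(stack_to_scan(0xFFFFFFF9))   # MSP: plain program, as in the Stacktr0 example
print(stack_to_scan(0xFFFFFFFD))   # PSP: error in a kernel thread
print(decode_exc_return(0xFFFFFFF1)["thread_mode"])   # False: error in a handler
```

The value FFFFFFF9H is the current lr shown in the register dump of the first example above, which decodes to Thread mode on the main stack, as expected for a program without kernel threads.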

Error or Fault in an Exception Handler (MCU Handler Mode)

Creating a stack trace for an Error or Fault occurring in an application exception handler when using kernel threads is only different insofar as we need to scan both the main stack and the thread’s process stack. First, we scan the main stack, as described above, skipping the application exception stack frame, or frames in case of a nested exception, and then switch over to the process stack.

When the kernel starts, it resets the main stack pointer to the top, just underneath the seal value, so we have only exception handler data on the main stack, and none from the initialisation code. Hence, the first value on the main stack is always an EXC_RETURN value, pushed there by the application exception handler, after stacking on the process stack has occurred. So when scanning the main stack upwards, we switch to the process stack when we reach that specific EXC_RETURN value. From there, scanning continues as described above.

Stack Trace Management

The Astrobe IDE has an option to disable the Stack Trace. However, what this actually means is that no source code line numbers are inserted into the binary for all bl.w or blx instructions, which are the anchor points for a stack trace. Hence, a stack trace can be created even with the Stack Trace option disabled in Astrobe, only without the source code line numbers [4].

RuntimeErrors provides RuntimeErrors.SetStacktraceOn to enable or disable stack traces. Depending on the setting in Astrobe, the stack trace will include source line numbers or not.

Making Sense of Stack Traces With Addresses Only (No Source Line Numbers)

Just the structure of the stack trace gives a quick overview of the program status. Often, the exact locations of the procedure calls are not even needed – or are obvious in short procedures – to find and fix defects.

For more insight, we need an assembly listing, and for this, the Professional Edition of Astrobe is required. With it, we have different options.

  • Use the Project > Disassemble Application functionality to get an assembly listing of the whole program, with the absolute code addresses as allocated by the linker. This program level listing does not include the Oberon source code, though.

  • Use the Project > Disassemble Module functionality to get an assembly listing of the module with the defect, which includes the source code. The code addresses in this listing are relative to the module start, so you need to use the module’s code start address, found in the program’s map file as created by the linker, to find the instruction.
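Converting an absolute trace address into a module-relative one is a single subtraction. A small Python example follows; the module code start address is hypothetical (the real value comes from the map file), and I am assuming the listing shows offsets in hexadecimal.

```python
def parse_oberon_hex(text):
    """Parse an Oberon-style hex literal such as '100049BCH'."""
    return int(text.rstrip("H"), 16)


def module_relative(abs_addr, module_code_start):
    """Offset of an absolute code address from the module's code start."""
    return abs_addr - module_code_start


trace_addr = parse_oberon_hex("100049BCH")    # from the Stacktr0 example
module_start = parse_oberon_hex("10004980H")  # hypothetical, read from the map file
offset = module_relative(trace_addr, module_start)
print(f"{offset:04X}H")   # 003CH
```

The resulting offset is then looked up in the module-level listing to find the instruction and the surrounding source code.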

Since I am lazy, my OberonDev “IDE” creates another set of module level assembly listings after linking, where all the relative addresses have been resolved using the data from the map and binary files. Example/test programs use these files for explanations. It’s my go-to listing for both finding problems as well as understanding the code in detail. All pre-link data in the standard module assembly is resolved, I see the code and addresses as actually in memory, and I have the source code embedded for my better understanding. I am no good at reading assembly code.

Note: if there’s interest in this kind of “extended” or “absolute” module assembly listing, I may extract it as a stand-alone command line program. As said, the Professional Edition of Astrobe is required.

Concluding Thoughts

Stack traces (as well as collecting register values and such) are programmer’s tools. Hence, the stack trace functionality may be best extracted into its own module, in order to keep the pure Error and Fault handling code as simple as possible: Error and Fault handling must be 100% free of any defects, or derailments by edge cases. Creating stack traces is partly based on heuristics, and “minimising probabilities” for false positives. If Error and Fault handling is to be the basis for autonomous corrective action, it may be a good idea to separate the two.

See Also

  • Example/test program Stacktrace for examples of all the Error and Fault types outlined above.

  1. Plus there could be external storage that contributes to the state.

  2. In general, a stack trace could also include the local variable values.

  3. With permission.

  4. Astrobe suppresses the creation of a stack trace in its library module Traps in case no source code line numbers for bl.w or blx are found. I think the former name of that configuration option, Line Numbers, was more precise than the current Stack Trace.