Debug Data Generation

How debug data is extracted from Astrobe listing files and packaged as DWARF sections in an ARM ELF executable

Astrobe compiles Oberon source code to ARM machine code, producing binary files, assembly listing files, and a linker map file. These outputs contain enough information to reconstruct the debug data that a source-level debugger needs – but the information is implicit, scattered across files, and encoded in the compiler’s own formats rather than in a standard debug format.

The elfdata library extracts this information and generates DWARF debug sections. The make-elf tool then packages the program binary together with the debug sections into an ARM ELF executable that GDB, OpenOCD, and other standard tools can use.

The Challenge

A compiler that targets a standard object format (such as ELF) can emit debug data during compilation – it has direct access to type information, variable locations, and the mapping between source lines and machine instructions. Astrobe, however, produces raw binary code and text-based listing files. It does not emit DWARF or any other standard debug format.

Reconstructing debug data from listing files means:

parsing Oberon declarations to recover types, variables, and procedure signatures;
correlating source lines with assembly instructions to build line-number tables;
computing variable locations from the compiler’s known register and stack allocation rules;
resolving type references across module boundaries;
encoding everything in the binary formats that DWARF specifies.

What Is DWARF

DWARF is the standard debug data format for ELF executables. It defines a set of sections, each carrying a specific category of information. A debugger reads these sections to map machine addresses back to source-level constructs.

The sections generated by elfdata are:

Section	Purpose
`.debug_info`	type definitions, variable declarations, procedure signatures – the core structural data
`.debug_abbrev`	templates that define the structure of entries in `.debug_info`
`.debug_line`	mapping from code addresses to source file and line number
`.debug_str`	shared string pool referenced by `.debug_info` entries
`.debug_frame`	call-frame descriptions for stack unwinding
`.debug_aranges`	address ranges per compilation unit, for fast lookup
`.debug_pubnames`	index of exported global variable names
`.ARM.attributes`	ARM architecture attributes (CPU, instruction set)

Two further sections are likewise not part of DWARF (nor, strictly speaking, is `.ARM.attributes` above) but serve related purposes:

Section	Purpose
`.symtab`	symbol table (procedure names, global variables, mapping symbols)
`.strtab`	string table for symbol names

How the Debugger Uses These Sections

When the debugger stops at an address, it uses .debug_aranges to find the compilation unit (module) that contains the address, then looks up .debug_line to determine the source file and line number. It reads .debug_info to find the enclosing procedure, its local variables and parameters, and their types. It uses .debug_frame to unwind the call stack. .debug_pubnames lets it look up exported global variables by name. .symtab provides procedure and variable symbols for symbol-based commands such as break Module_Proc.

Compilation Units

DWARF organises debug data by compilation unit (CU). Each Oberon module becomes one compilation unit in the ELF file, with its own entries in .debug_info, .debug_line, and .debug_frame. The module’s assembly listing file (.alst) is the source file recorded in the line-number table.

Debug Information Entries

The .debug_info section is built from Debug Information Entries (DIEs). Each DIE has a tag that identifies what it describes and a set of attributes that carry the data. For example, a DW_TAG_subprogram DIE describes a procedure and has attributes for its name, address range, parameter list, and local variables. A DW_TAG_variable DIE describes a variable with attributes for its name, type, and location.

DIEs form a tree: the compilation unit DIE is the root, procedures are its children, and parameters and local variables are children of their enclosing procedure.
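The tree shape can be illustrated with a small in-memory model. This is only a sketch: the `DIE` class, the `dump` helper, and the sample names (`Main.alst`, `Init`, `n`, `padCfg`, the address) are hypothetical and not taken from elfdata.

```python
from dataclasses import dataclass, field


@dataclass
class DIE:
    """One Debug Information Entry: a tag, its attributes, and child DIEs."""
    tag: str
    attrs: dict = field(default_factory=dict)
    children: list = field(default_factory=list)

    def add(self, child):
        """Attach a child DIE and return it, so calls can be chained."""
        self.children.append(child)
        return child


def dump(die, depth=0):
    """Return the tree as indented text lines, one per DIE."""
    name = die.attrs.get("DW_AT_name", "")
    lines = [("  " * depth + die.tag + " " + name).rstrip()]
    for child in die.children:
        lines.extend(dump(child, depth + 1))
    return lines


# Hypothetical module with one procedure, one parameter, one local variable.
cu = DIE("DW_TAG_compile_unit", {"DW_AT_name": "Main.alst"})
proc = cu.add(DIE("DW_TAG_subprogram",
                  {"DW_AT_name": "Init", "DW_AT_low_pc": 0x10004C08}))
proc.add(DIE("DW_TAG_formal_parameter", {"DW_AT_name": "n"}))
proc.add(DIE("DW_TAG_variable", {"DW_AT_name": "padCfg"}))
```

Calling `dump(cu)` lists the compilation unit first, then the procedure indented by one level, then its parameter and local variable indented by two.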

Type Representation

Oberon types are represented as DWARF type DIEs:

Oberon type	DWARF tag
`INTEGER`, `CHAR`, `BOOLEAN`, `REAL`, `SET`, `BYTE`	`DW_TAG_base_type`
`ARRAY N OF T`	`DW_TAG_array_type` with `DW_TAG_subrange_type` child
`RECORD` … `END`	`DW_TAG_structure_type` with `DW_TAG_member` children
`POINTER TO T`	`DW_TAG_pointer_type` referencing the target type
`PROCEDURE` type	`DW_TAG_pointer_type` (4 bytes, opaque)

Array dimensions, record field offsets, and pointer targets are encoded so that the debugger can display structured values and navigate nested data.

Variable Locations

DWARF location expressions tell the debugger where to find a variable’s value at run time. The encoding depends on where the compiler placed the variable:

global variables: DW_OP_addr with the variable’s absolute address in RAM;
local variables on the stack: DW_OP_fbreg with an offset from the frame base (stack pointer);
local variables in registers (leaf procedures): DW_OP_regN naming the register directly;
VAR parameters (passed by reference): DW_OP_fbreg plus DW_OP_deref – read the pointer from the stack, then dereference.
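The four cases above can be sketched as byte encoders. The opcode values are those defined by the DWARF standard; the function names are hypothetical, and a 32-bit little-endian target is assumed.

```python
import struct

# Opcode values as defined by the DWARF standard.
DW_OP_addr = 0x03
DW_OP_deref = 0x06
DW_OP_reg0 = 0x50   # DW_OP_reg0 .. DW_OP_reg31 are consecutive
DW_OP_fbreg = 0x91


def sleb128(value):
    """Encode a signed integer as SLEB128, the operand format of DW_OP_fbreg."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        sign_bit = bool(byte & 0x40)
        done = (value == 0 and not sign_bit) or (value == -1 and sign_bit)
        out.append(byte if done else byte | 0x80)
        if done:
            return bytes(out)


def loc_global(address):
    """Global variable: absolute 32-bit address."""
    return bytes([DW_OP_addr]) + struct.pack("<I", address)


def loc_stack(offset):
    """Stack local: signed offset from the frame base."""
    return bytes([DW_OP_fbreg]) + sleb128(offset)


def loc_register(number):
    """Register local: the value lives in register r<number>."""
    if not 0 <= number <= 31:
        raise ValueError("DW_OP_regN covers registers 0..31 only")
    return bytes([DW_OP_reg0 + number])


def loc_var_param(offset):
    """VAR parameter: load the pointer from the stack, then dereference it."""
    return loc_stack(offset) + bytes([DW_OP_deref])
```

For example, `loc_var_param(8)` yields the three bytes `91 08 06`: push frame base plus 8, then dereference.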

Line-number Mapping

The .debug_line section contains a line-number program – a sequence of opcodes that a state machine in the debugger executes to reconstruct the (address, file, line) mapping. The program encodes address and line deltas compactly, typically using a single special opcode per source line.
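A special opcode packs one line delta and one address advance into a single byte, using three parameters from the line-program header. The sketch below follows the formula in the DWARF standard; the parameter values are illustrative assumptions (not necessarily what elfdata writes), and the address advance is counted in units of the header's minimum instruction length.

```python
# Header parameters: assumed values for illustration only.
LINE_BASE = -5     # smallest line delta a special opcode can express
LINE_RANGE = 14    # number of distinct line deltas
OPCODE_BASE = 13   # first opcode number that is a special opcode


def special_opcode(line_delta, addr_advance):
    """Return the special opcode for this step, or None if it does not fit."""
    if addr_advance < 0:
        return None
    if not LINE_BASE <= line_delta < LINE_BASE + LINE_RANGE:
        return None
    opcode = (line_delta - LINE_BASE) + LINE_RANGE * addr_advance + OPCODE_BASE
    return opcode if opcode <= 255 else None


def decode_special(opcode):
    """Inverse operation, as the debugger's state machine performs it."""
    adjusted = opcode - OPCODE_BASE
    return LINE_BASE + adjusted % LINE_RANGE, adjusted // LINE_RANGE
```

When `special_opcode` returns `None`, a generator would fall back to the standard opcodes that advance the line or the address explicitly before emitting a row.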

Call-frame Information

The .debug_frame section describes how each procedure’s stack frame is laid out. It tells the debugger which registers are saved where, so that it can unwind the call stack and show the chain of callers. Each procedure gets a Frame Description Entry (FDE) that records the saved registers and the stack adjustment.
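As a rough illustration, the call-frame instructions for a typical prologue can be generated from the list of pushed registers and the local frame size. This sketch assumes a CIE data alignment factor of -4 and describes only the frame state after the prologue has completed (a full FDE would also advance the location between steps). The function names are hypothetical; the opcode values come from the DWARF standard.

```python
DW_CFA_def_cfa_offset = 0x0E
DW_CFA_offset = 0x80        # low 6 bits carry the register number
DATA_ALIGN = -4             # assumed CIE data alignment factor


def uleb128(value):
    """Encode an unsigned integer as ULEB128."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)


def fde_instructions(saved_regs, locals_size):
    """CFA offset plus save slots for a 'push {...}' then 'sub sp,#N' prologue."""
    regs = sorted(saved_regs)           # push stores the lowest register lowest
    pushed = 4 * len(regs)
    out = bytearray([DW_CFA_def_cfa_offset]) + uleb128(pushed + locals_size)
    for index, reg in enumerate(regs):
        offset_from_cfa = -pushed + 4 * index
        out += bytes([DW_CFA_offset | reg])
        out += uleb128(offset_from_cfa // DATA_ALIGN)
    return bytes(out)


# push {r4, lr} followed by sub sp,#8 (lr is register 14)
example = fde_instructions([4, 14], 8)
```

Here the canonical frame address is `sp + 16`, with r4 saved at CFA-8 and lr at CFA-4.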

The Extraction Process

elfdata processes listing files in several phases.

Input: Assembly Listing Files

The listing files in the rdb/ directory (produced by gen-rdb) interleave Oberon source code with ARM assembly. Source lines appear as-is from the module’s source file. Assembly lines begin with a dot in column zero and show the instruction offset, absolute address, hex encoding, and mnemonic:

  GPIO.GetPadBaseCfg(padCfg);
.     8  010004C10      04668  mov       r0,sp
.    10  010004C12  0F8DF1058  ldr.w     r1,[pc,#88] -> 100
.    14  010004C16  0F7FCFA5F  bl.w      GPIO.GetPadBaseCfg [-7585 -> 100010D8]
.    18  010004C1A      0E000  b         0 -> 22
.    20  010004C1C      0002E  <LineNo: 46>

Annotations such as <LineNo: 46> and <Type: 64> are embedded in the assembly by the Astrobe compiler and carried through into the .alst files by gen-rdb.
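A line classifier for this format might look as follows. The regular expressions are inferred from the excerpt above only, so they are an assumption about the format rather than a specification of it, and `parse_listing_line` is a hypothetical name.

```python
import re

# '.', decimal offset, hex address, hex encoding, then mnemonic or annotation.
ASM_LINE = re.compile(
    r"^\.\s+(\d+)\s+([0-9A-Fa-f]+)\s+([0-9A-Fa-f]+)\s+(.*)$")
ANNOTATION = re.compile(r"^<(\w+):\s*(\d+)>$")


def parse_listing_line(line):
    """Classify one listing line as source, assembly, or annotation."""
    match = ASM_LINE.match(line)
    if match is None:
        return {"kind": "source", "text": line}
    offset, address, encoding, text = match.groups()
    entry = {
        "kind": "asm",
        "offset": int(offset),
        "address": int(address, 16),
        "encoding": int(encoding, 16),
        "text": text.strip(),
    }
    note = ANNOTATION.match(entry["text"])
    if note:
        entry["kind"] = "annotation"
        entry["name"] = note.group(1)
        entry["value"] = int(note.group(2))
    return entry
```

Applied to the excerpt, the `mov r0,sp` line parses to address `0x10004C10`, and the last line becomes an annotation named `LineNo` with value 46.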

Phase 1: Tokenisation and Parsing

The tokeniser reads the listing file and produces a stream of Oberon tokens – identifiers, numbers, and punctuation. It skips assembly lines (those starting with .) and comments ((* ... *)), since only the Oberon declarations carry type and structure information.

The parser is a recursive-descent parser that walks the token stream and extracts:

the module name and import list (with any aliases)
CONST definitions (integer constants used in array dimensions)
TYPE definitions (records, arrays, pointers, procedure types)
VAR declarations (global variables with types and export markers)
PROCEDURE declarations (name, formal parameters, local variables, local types, result type, procedure kind)

The parser skips procedure bodies (BEGIN … END) – only the declaration-level information matters for debug data.

Oberon’s type system allows cross-module references (IMPORT M; ... VAR x: M.SomeType). The parser records import aliases so that type references can be resolved across modules in a later phase.
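The tokenising step can be sketched in a few lines. This is a simplified illustration, not elfdata's tokeniser: it handles nested comments, but ignores string literals and real-number syntax, and the names `strip_comments` and `tokenise` are hypothetical.

```python
import re

# Identifiers, integer literals (decimal or hex with H/X suffix), punctuation.
TOKEN = re.compile(
    r"[A-Za-z][A-Za-z0-9]*|[0-9][0-9A-F]*[HX]?|:=|\.\.|[^\sA-Za-z0-9]")


def strip_comments(text):
    """Remove (* ... *) comments; Oberon comments may be nested."""
    out, depth, i = [], 0, 0
    while i < len(text):
        pair = text[i:i + 2]
        if pair == "(*":
            depth += 1
            i += 2
        elif pair == "*)" and depth > 0:
            depth -= 1
            i += 2
        else:
            if depth == 0:
                out.append(text[i])
            i += 1
    return "".join(out)


def tokenise(listing):
    """Drop assembly lines (leading '.'), strip comments, split into tokens."""
    source = "\n".join(
        line for line in listing.splitlines() if not line.startswith("."))
    return TOKEN.findall(strip_comments(source))


sample = (
    "MODULE M; (* demo (* nested *) *)\n"
    ".     0  010004C08      0B500  push      { lr }\n"
    "VAR x*: ARRAY 4 OF INTEGER;\n"
    "END M."
)
tokens = tokenise(sample)
```

The export marker `*` comes out as its own token, which lets a parser attach it to the preceding identifier.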

Phase 2: Assembly Analysis

A second pass reads the assembly lines to extract information that only exists at the machine-code level:

procedure addresses: the absolute address of each procedure’s first instruction, located by scanning forward from the PROCEDURE declaration line to the first assembly line;
procedure sizes: computed from the distance between consecutive procedure addresses;
frame information: the push {... lr} instruction at procedure entry identifies saved registers; an optional sub sp,#N reveals the local stack frame size;
line entries: <LineNo: N> annotations provide the mapping from code address to source line number;
type sizes: <Type: N> annotations after RECORD … END blocks give the byte size of record types.
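The size computation from consecutive addresses is simple enough to sketch. The helper below is hypothetical; it assumes the end of the module's code is known so that the last procedure also gets a size, and it counts any literal pool between two procedures as part of the earlier one.

```python
def procedure_sizes(addresses, code_end):
    """Map procedure name -> size, from start addresses sorted ascending.

    addresses: dict of procedure name -> absolute start address
    code_end:  address just past the module's last procedure (assumed known)
    """
    ordered = sorted(addresses.items(), key=lambda item: item[1])
    next_starts = [start for _, start in ordered[1:]] + [code_end]
    return {
        name: next_start - start
        for (name, start), next_start in zip(ordered, next_starts)
    }


# Hypothetical procedure names and addresses.
sizes = procedure_sizes({"Run": 0x10004C40, "Init": 0x10004C08}, 0x10004C60)
```

With these inputs `Init` spans 56 bytes and `Run` spans 32 bytes.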

Phase 3: Type Resolution

Before any addresses or sizes are computed, all type references are resolved in place. This means:

import aliases are followed (BT1.T → BasicTypes1.T);
named types are expanded to their definitions;
record base types are resolved and their fields merged;
array element types and pointer targets are resolved recursively.

This phase operates across all modules simultaneously, since a type in one module may refer to a type in another.

Self-referential types (such as a linked-list node with a pointer to its own type) are detected and handled via cycle detection.
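The alias-following part of this phase, with a guard against cycles, might be sketched like this. The data layout (tuples keyed by module and name) and the function names are assumptions made for the example. Pointer targets are kept as names rather than expanded, which is one way a self-referential record can avoid infinite expansion.

```python
BASE_TYPES = {"INTEGER", "CHAR", "BOOLEAN", "REAL", "SET", "BYTE"}


def canonical(ref, module, aliases):
    """Turn 'T' or 'Alias.T' into a (module, name) key."""
    if "." in ref:
        qualifier, name = ref.split(".", 1)
        return aliases[module].get(qualifier, qualifier), name
    return module, ref


def resolve_named(ref, module, types, aliases):
    """Follow named-type chains across modules until a real definition."""
    seen = set()
    while True:
        if ref in BASE_TYPES:
            return ("base", ref)
        key = canonical(ref, module, aliases)
        if key in seen:
            raise ValueError("cyclic type alias at %s.%s" % key)
        seen.add(key)
        definition = types[key]
        if definition[0] != "named":
            return ("def",) + key
        # Continue in the module that owns the alias we just followed.
        module, ref = key[0], definition[1]


# Hypothetical modules: Main imports BasicTypes1 under the alias BT1.
aliases = {"Main": {"BT1": "BasicTypes1"}, "BasicTypes1": {}}
types = {
    ("BasicTypes1", "T"): ("record", [("next", "TP")]),
    ("BasicTypes1", "TP"): ("pointer", "T"),   # points back to T by name
    ("Main", "Node"): ("named", "BT1.T"),
    ("Main", "Count"): ("named", "INTEGER"),
    ("Main", "A"): ("named", "B"),
    ("Main", "B"): ("named", "A"),
}
```

Resolving `Node` in `Main` lands on the record `BasicTypes1.T`, while the deliberately broken pair `A`/`B` raises an error instead of looping forever.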

Phase 4: Address Computation

With all types fully resolved, elfdata computes concrete addresses and offsets:

global variable addresses: allocated downward from each module’s data region top (from the linker map file), following the compiler’s alignment rules;
local variable offsets: allocated upward from the stack pointer, word-aligned;
parameter offsets: derived from the register assignment order (r0, r1, r2, …) and the local frame size.

Leaf procedures are explicitly marked by the programmer (PROCEDURE*) and must not call other procedures. The compiler places many of their variables in registers instead of on the stack. elfdata replicates this allocation to produce the correct DWARF register-based location expressions.
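The two allocation directions can be sketched as follows. The alignment rules used here (1, 2, or 4 bytes depending on size for globals, whole words for locals) are placeholders chosen for illustration; the compiler's actual rules are not spelled out in this document, and the function names are hypothetical.

```python
def align_down(value, alignment):
    return value & ~(alignment - 1)


def align_up(value, alignment):
    return (value + alignment - 1) & ~(alignment - 1)


def allocate_globals(data_top, variables):
    """Assign addresses downward from data_top.

    variables: list of (name, size_in_bytes) in declaration order.
    Alignment rule is an assumption: 4 for sizes >= 4, 2 for size 2, else 1.
    """
    addresses, cursor = {}, data_top
    for name, size in variables:
        alignment = 4 if size >= 4 else (2 if size == 2 else 1)
        cursor = align_down(cursor - size, alignment)
        addresses[name] = cursor
    return addresses


def allocate_locals(variables):
    """Assign word-aligned offsets upward from the stack pointer."""
    offsets, cursor = {}, 0
    for name, size in variables:
        offsets[name] = cursor
        cursor += align_up(size, 4)
    return offsets, cursor


# Hypothetical data region top and variables.
globals_at = allocate_globals(
    0x20001000, [("count", 4), ("flag", 1), ("buf", 10)])
locals_at, frame_size = allocate_locals([("i", 4), ("c", 1), ("r", 8)])
```

The second return value of `allocate_locals` is the total frame size, which can be cross-checked against the `sub sp,#N` found during assembly analysis.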

Phase 5: DWARF Generation

The final phase encodes everything into the binary DWARF format:

.debug_abbrev – the abbreviation table defines 14 DIE templates shared by all compilation units;
.debug_info – one compilation unit per module, containing type DIEs, global variable DIEs, and subprogram DIEs with their parameter and local variable children;
.debug_line – one line-number program per module, mapping code addresses to source lines;
.debug_str – a shared string pool; all names in .debug_info are stored as offsets into this pool to avoid duplication;
.debug_frame – one CIE (Common Information Entry) plus one FDE per procedure, describing register save locations;
.debug_aranges – address range entries per compilation unit;
.debug_pubnames – exported global variable names with their .debug_info offsets.

The symbol table (.symtab) is also built at this stage, containing procedure symbols (STT_FUNC), global variable symbols (STT_OBJECT), and ARM mapping symbols ($t for Thumb code regions).
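The deduplicating string pool behind .debug_str is a small data structure. A minimal sketch, with a hypothetical `StringPool` class:

```python
class StringPool:
    """NUL-terminated strings stored once; callers keep the byte offset."""

    def __init__(self):
        self.data = bytearray()
        self.offsets = {}

    def add(self, text):
        """Return the offset of text, appending it only on first use."""
        if text not in self.offsets:
            self.offsets[text] = len(self.data)
            self.data += text.encode("utf-8") + b"\x00"
        return self.offsets[text]


pool = StringPool()
first = pool.add("INTEGER")    # new string, stored at offset 0
second = pool.add("padCfg")    # stored after "INTEGER" and its NUL byte
again = pool.add("INTEGER")    # reuses the existing offset
```

A DIE attribute that refers to a name then stores only the returned offset, so a type name used by many variables occupies space once.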

ELF Packaging

The make-elf tool takes the program binary and the debug data from elfdata, and writes the final ELF file. It constructs the ELF header, program headers (for loadable code and BSS segments), and section headers, then appends the binary code, all debug sections, and the symbol table.
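For orientation, the 52-byte ELF32 header for a little-endian ARM executable can be built with `struct`. The field layout and constants follow the ELF specification; the function name, its parameters, and the default flags value (EABI version 5) are assumptions for this sketch, not make-elf's actual interface.

```python
import struct

ET_EXEC = 2          # executable file
EM_ARM = 40          # ARM machine type
EV_CURRENT = 1


def elf32_arm_header(entry, phoff, phnum, shoff, shnum, shstrndx,
                     flags=0x05000000):
    """Return the ELF32 file header; flags default to EF_ARM_EABI_VER5."""
    # e_ident: magic, ELFCLASS32, ELFDATA2LSB, EV_CURRENT, then 9 zero bytes
    # (OS ABI, ABI version, padding).
    ident = b"\x7fELF" + bytes([1, 1, EV_CURRENT]) + bytes(9)
    rest = struct.pack(
        "<HHIIIIIHHHHHH",
        ET_EXEC, EM_ARM, EV_CURRENT,
        entry, phoff, shoff, flags,
        52,        # e_ehsize: size of this header
        32,        # e_phentsize: size of one program header
        phnum,
        40,        # e_shentsize: size of one section header
        shnum, shstrndx,
    )
    return ident + rest


# Hypothetical layout: two program headers right after the file header.
header = elf32_arm_header(entry=0x10000100, phoff=52, phnum=2,
                          shoff=0x8000, shnum=14, shstrndx=13)
```

The program headers and section headers that follow use the same packing approach with their own fixed layouts.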

The resulting ELF is a standard 32-bit ARM executable that GDB and OpenOCD can load, flash-program, and debug.