Debug Data Generation
Astrobe compiles Oberon source code to ARM machine code, producing binary files, assembly listing files, and a linker map file. These outputs contain enough information to reconstruct the debug data that a source-level debugger needs – but the information is implicit, scattered across files, and encoded in the compiler’s own formats rather than in a standard debug format.
The elfdata library extracts this information and generates DWARF debug sections. The make-elf tool then packages the program binary together with the debug sections into an ARM ELF executable that GDB, OpenOCD, and other standard tools can use.
The Challenge
A compiler that targets a standard object format (such as ELF) can emit debug data during compilation – it has direct access to type information, variable locations, and the mapping between source lines and machine instructions. Astrobe, however, produces raw binary code and text-based listing files. It does not emit DWARF or any other standard debug format.
Reconstructing debug data from listing files means:
- parsing Oberon declarations to recover types, variables, and procedure signatures;
- correlating source lines with assembly instructions to build line-number tables;
- computing variable locations from the compiler’s known register and stack allocation rules;
- resolving type references across module boundaries;
- encoding everything in the binary formats that DWARF specifies.
What Is DWARF
DWARF is the standard debug data format for ELF executables. It defines a set of sections, each carrying a specific category of information. A debugger reads these sections to map machine addresses back to source-level constructs.
The sections generated by elfdata are:
| Section | Purpose |
|---|---|
| .debug_info | type definitions, variable declarations, procedure signatures – the core structural data |
| .debug_abbrev | templates that define the structure of entries in .debug_info |
| .debug_line | mapping from code addresses to source file and line number |
| .debug_str | shared string pool referenced by .debug_info entries |
| .debug_frame | call-frame descriptions for stack unwinding |
| .debug_aranges | address ranges per compilation unit, for fast lookup |
| .debug_pubnames | index of exported global variable names |
| .ARM.attributes | ARM architecture attributes (CPU, instruction set) |
Two further sections, like .ARM.attributes, are not part of DWARF but serve related purposes:
| Section | Purpose |
|---|---|
| .symtab | symbol table (procedure names, global variables, mapping symbols) |
| .strtab | string table for symbol names |
How the Debugger Uses These Sections
When the debugger stops at an address, it uses .debug_aranges to
find the compilation unit (module) that contains the address, then
looks up .debug_line to determine the source file and line number.
It reads .debug_info to find the enclosing procedure, its local
variables and parameters, and their types. It uses .debug_frame to
unwind the call stack. .debug_pubnames lets it look up exported
global variables by name. .symtab provides procedure and variable
symbols for symbol-based commands such as break Module_Proc.
Compilation Units
DWARF organises debug data by compilation unit (CU). Each Oberon
module becomes one compilation unit in the ELF file, with its own
entries in .debug_info, .debug_line, and .debug_frame. The
module’s assembly listing file (.alst) is the source file recorded
in the line-number table.
Debug Information Entries
The .debug_info section is built from Debug Information Entries
(DIEs). Each DIE has a tag that identifies what it describes and a set
of attributes that carry the data. For example, a DW_TAG_subprogram
DIE describes a procedure and has attributes for its name, address
range, parameter list, and local variables. A DW_TAG_variable DIE
describes a variable with attributes for its name, type, and location.
DIEs form a tree: the compilation unit DIE is the root, procedures are its children, and parameters and local variables are children of their enclosing procedure.
Type Representation
Oberon types are represented as DWARF type DIEs:
| Oberon type | DWARF tag |
|---|---|
| INTEGER, CHAR, BOOLEAN, REAL, SET, BYTE | DW_TAG_base_type |
| ARRAY N OF T | DW_TAG_array_type with DW_TAG_subrange_type child |
| RECORD … END | DW_TAG_structure_type with DW_TAG_member children |
| POINTER TO T | DW_TAG_pointer_type referencing the target type |
| PROCEDURE type | DW_TAG_pointer_type (4 bytes, opaque) |
Array dimensions, record field offsets, and pointer targets are encoded so that the debugger can display structured values and navigate nested data.
Variable Locations
DWARF location expressions tell the debugger where to find a variable’s value at run time. The encoding depends on where the compiler placed the variable:
- global variables: DW_OP_addr with the variable’s absolute address in RAM;
- local variables on the stack: DW_OP_fbreg with an offset from the frame base (stack pointer);
- local variables in registers (leaf procedures): DW_OP_regN naming the register directly;
- VAR parameters (passed by reference): DW_OP_fbreg plus DW_OP_deref – read the pointer from the stack, then dereference.
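
The four cases above can be sketched as byte encoders. This is a minimal illustration, not elfdata's actual code: the function names are hypothetical, and 4-byte little-endian addresses are assumed for a 32-bit ARM target. The opcode values are those defined by the DWARF specification.

```python
import struct

# Opcode values as defined by the DWARF specification.
DW_OP_addr = 0x03
DW_OP_deref = 0x06
DW_OP_reg0 = 0x50    # DW_OP_reg0 .. DW_OP_reg31 occupy 0x50 .. 0x6F
DW_OP_fbreg = 0x91


def sleb128(value: int) -> bytes:
    """Encode a signed integer as SLEB128 (operand format of DW_OP_fbreg)."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7  # Python's >> is arithmetic, so the sign is preserved
        done = bool((value == 0 and not byte & 0x40) or
                    (value == -1 and byte & 0x40))
        out.append(byte if done else byte | 0x80)
        if done:
            return bytes(out)


def loc_global(address: int) -> bytes:
    """Global variable: DW_OP_addr followed by a 4-byte absolute address."""
    return bytes([DW_OP_addr]) + struct.pack("<I", address)


def loc_stack(offset: int) -> bytes:
    """Stack local: DW_OP_fbreg followed by the offset from the frame base."""
    return bytes([DW_OP_fbreg]) + sleb128(offset)


def loc_register(n: int) -> bytes:
    """Register local: a single DW_OP_regN opcode."""
    if not 0 <= n <= 31:
        raise ValueError("register number out of range")
    return bytes([DW_OP_reg0 + n])


def loc_var_param(offset: int) -> bytes:
    """VAR parameter: load the pointer from the stack, then dereference it."""
    return loc_stack(offset) + bytes([DW_OP_deref])
```

For instance, `loc_var_param(12)` yields the three bytes `91 0C 06`: the stack slot at offset 12 holds a pointer, and the trailing DW_OP_deref tells the debugger to follow it.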
Line-number Mapping
The .debug_line section contains a line-number program – a sequence
of opcodes that a state machine in the debugger executes to
reconstruct the (address, file, line) mapping. The program encodes
address and line deltas compactly, typically using a single special
opcode per source line.
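
As a rough illustration of how one special opcode packs both deltas, the sketch below applies the formula from the DWARF specification. The header parameters (`line_base`, `line_range`, `opcode_base`) are illustrative defaults; this document does not state which values elfdata writes into its line-program header, and the function names are hypothetical.

```python
def special_opcode(line_delta, addr_delta,
                   line_base=-5, line_range=14, opcode_base=13):
    """Return the special opcode for the given deltas, or None if they
    do not fit and standard opcodes would be needed instead.

    addr_delta is counted in units of the header's minimum instruction
    length and must be non-negative.
    """
    if not line_base <= line_delta < line_base + line_range:
        return None
    opcode = (line_delta - line_base) + line_range * addr_delta + opcode_base
    return opcode if opcode <= 255 else None


def decode_special(opcode, line_base=-5, line_range=14, opcode_base=13):
    """Inverse operation, as performed by the debugger's state machine."""
    adjusted = opcode - opcode_base
    return line_base + adjusted % line_range, adjusted // line_range
```

When `special_opcode` returns `None`, a generator would typically fall back to the standard opcodes DW_LNS_advance_line and DW_LNS_advance_pc before emitting a row.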
Call-frame Information
The .debug_frame section describes how each procedure’s stack frame
is laid out. It tells the debugger which registers are saved where,
so that it can unwind the call stack and show the chain of callers.
Each procedure gets a Frame Description Entry (FDE) that records the
saved registers and the stack adjustment.
The Extraction Process
elfdata processes listing files in several phases.
Input: Assembly Listing Files
The listing files in the rdb/ directory (produced by
gen-rdb) interleave
Oberon source code with ARM assembly. Source lines appear as-is from
the module’s source file. Assembly lines begin with a dot in column
zero and show the instruction offset, absolute address, hex encoding,
and mnemonic:
GPIO.GetPadBaseCfg(padCfg);
. 8 010004C10 04668 mov r0,sp
. 10 010004C12 0F8DF1058 ldr.w r1,[pc,#88] -> 100
. 14 010004C16 0F7FCFA5F bl.w GPIO.GetPadBaseCfg [-7585 -> 100010D8]
. 18 010004C1A 0E000 b 0 -> 22
. 20 010004C1C 0002E <LineNo: 46>

Annotations such as <LineNo: 46> and <Type: 64> are embedded in
the assembly by the Astrobe compiler and carried through into the
.alst files by gen-rdb.
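
A line in this format can be split into its fields with a regular expression. The sketch below is inferred only from the sample listing above (decimal offset, hexadecimal address and encoding, free-form remainder); the names `AsmLine` and `parse_asm_line` are hypothetical, and real listings may contain variants this pattern does not cover.

```python
import re
from typing import NamedTuple, Optional

# Dot in column zero, decimal offset, hex address, hex encoding, remainder.
ASM_LINE = re.compile(
    r"^\.\s+(\d+)\s+([0-9A-Fa-f]+)\s+([0-9A-Fa-f]+)\s+(.*)$")
LINE_NO = re.compile(r"<LineNo:\s*(\d+)>")


class AsmLine(NamedTuple):
    offset: int             # instruction offset within the module
    address: int            # absolute address
    encoding: str           # hex encoding, kept as text
    text: str               # mnemonic and operands, or an annotation
    line_no: Optional[int]  # value of a <LineNo: N> annotation, if present


def parse_asm_line(line: str) -> Optional[AsmLine]:
    """Return the parsed fields, or None for an Oberon source line."""
    m = ASM_LINE.match(line.rstrip())
    if m is None:
        return None
    offset, address, encoding, text = m.groups()
    note = LINE_NO.search(text)
    return AsmLine(int(offset), int(address, 16), encoding, text,
                   int(note.group(1)) if note else None)
```

Source lines such as `GPIO.GetPadBaseCfg(padCfg);` do not match and yield `None`, which lets a caller separate the two kinds of line in a single pass.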
Phase 1: Tokenisation and Parsing
The tokeniser reads the listing file and produces a stream of
Oberon tokens – identifiers, numbers, and punctuation. It skips
assembly lines (those starting with .) and comments ((* ... *)),
since only the Oberon declarations carry type and structure
information.
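
A much-simplified tokeniser along these lines might look as follows. It is a sketch under stated assumptions, not the elfdata implementation: it handles nested comments and drops assembly lines, but it ignores string literals and makes no attempt to classify tokens. The names `strip_comments` and `tokenise` are hypothetical.

```python
import re

# Identifiers, numbers (with optional H/X suffix), a few two-character
# symbols, then any other single non-space character as punctuation.
TOKEN = re.compile(
    r"[A-Za-z][A-Za-z0-9]*|[0-9][0-9A-F]*[HX]?|:=|<=|>=|\.\.|[^\sA-Za-z0-9]")


def strip_comments(text: str) -> str:
    """Remove (* ... *) comments; Oberon comments may be nested."""
    out, depth, i = [], 0, 0
    while i < len(text):
        two = text[i:i + 2]
        if two == "(*":
            depth += 1
            i += 2
        elif two == "*)" and depth > 0:
            depth -= 1
            i += 2
        else:
            if depth == 0:
                out.append(text[i])
            i += 1
    return "".join(out)


def tokenise(listing: str) -> list:
    """Drop assembly lines, strip comments, and split into raw tokens."""
    source = "\n".join(line for line in listing.splitlines()
                       if not line.startswith("."))
    return TOKEN.findall(strip_comments(source))
```

Filtering the assembly lines before comment stripping keeps the comment scanner from ever seeing machine-code text, which is one simple way to honour the rule that only Oberon declarations are tokenised.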
The parser is a recursive-descent parser that walks the token stream and extracts:
- the module name and import list (with any aliases)
- CONST definitions (integer constants used in array dimensions)
- TYPE definitions (records, arrays, pointers, procedure types)
- VAR declarations (global variables with types and export markers)
- PROCEDURE declarations (name, formal parameters, local variables, local types, result type, procedure kind)
The parser skips procedure bodies (BEGIN … END) – only the
declaration-level information matters for debug data.
Oberon’s type system allows cross-module references
(IMPORT M; ... VAR x: M.SomeType). The parser records import
aliases so that type references can be resolved across modules in a
later phase.
Phase 2: Assembly Analysis
A second pass reads the assembly lines to extract information that only exists at the machine-code level:
- procedure addresses: the absolute address of each procedure’s first instruction, located by scanning forward from the PROCEDURE declaration line to the first assembly line;
- procedure sizes: computed from the distance between consecutive procedure addresses;
- frame information: the push {... lr} instruction at procedure entry identifies saved registers; an optional sub sp,#N reveals the local stack frame size;
- line entries: <LineNo: N> annotations provide the mapping from code address to source line number;
- type sizes: <Type: N> annotations after RECORD … END blocks give the byte size of record types.
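
The procedure-size rule from the list above reduces to a subtraction over sorted entry addresses. In this sketch the function name and the `code_end` parameter are hypothetical: the document does not say how the size of the last procedure in a module is bounded, so an explicit end address is assumed.

```python
def procedure_sizes(entries: dict, code_end: int) -> dict:
    """Compute each procedure's size from consecutive entry addresses.

    entries maps procedure names to the address of their first instruction.
    code_end is an assumed upper bound for the last procedure.
    """
    ordered = sorted(entries.items(), key=lambda item: item[1])
    sizes = {}
    for i, (name, address) in enumerate(ordered):
        following = ordered[i + 1][1] if i + 1 < len(ordered) else code_end
        sizes[name] = following - address
    return sizes
```

A size computed this way includes any literal pool or padding placed between two procedures, which may or may not be what a given consumer expects.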
Phase 3: Type Resolution
Before any address or size computations, all type references are resolved in place. This means:
- import aliases are followed (BT1.T → BasicTypes1.T);
- named types are expanded to their definitions;
- record base types are resolved and their fields merged;
- array element types and pointer targets are resolved recursively.
This phase operates across all modules simultaneously, since a type in one module may refer to a type in another.
Self-referential types (such as a linked-list node with a pointer to its own type) are caught by cycle detection, so that resolution does not recurse indefinitely.
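
One way to combine alias following with cycle handling is to register each type node before descending into its components, so that a reference back to a type still being resolved returns the same node. The sketch below assumes a hypothetical declaration table keyed by qualified name; it omits record base types and is not taken from elfdata.

```python
BASE = {"INTEGER", "CHAR", "BOOLEAN", "REAL", "SET", "BYTE"}


def qualify(ref, module, aliases):
    """Turn a reference such as 'BT1.T' into 'BasicTypes1.T'."""
    if ref in BASE:
        return ref
    if "." in ref:
        alias, name = ref.split(".", 1)
        return aliases.get(module, {}).get(alias, alias) + "." + name
    return module + "." + ref


def resolve(ref, module, decls, aliases, done=None):
    """Return the resolved type node for `ref` as seen from `module`."""
    done = {} if done is None else done
    name = qualify(ref, module, aliases)
    if name in BASE:
        return {"kind": "base", "name": name}
    if name in done:  # already resolved, or currently being resolved
        return done[name]
    kind, *args = decls[name]
    node = done[name] = {"kind": kind, "name": name}  # register first
    owner = name.split(".", 1)[0]  # nested references use the owner's scope
    if kind == "pointer":
        node["target"] = resolve(args[0], owner, decls, aliases, done)
    elif kind == "array":
        node["length"] = args[0]
        node["element"] = resolve(args[1], owner, decls, aliases, done)
    elif kind == "record":
        node["fields"] = [(field, resolve(t, owner, decls, aliases, done))
                          for field, t in args[0]]
    return node
```

With this arrangement a linked-list node resolves to a record whose pointer field refers back to the very same node object, instead of an endlessly expanding copy.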
Phase 4: Address Computation
With all types fully resolved, elfdata computes concrete addresses and offsets:
- global variable addresses: allocated downward from each module’s data region top (from the linker map file), following the compiler’s alignment rules;
- local variable offsets: allocated upward from the stack pointer, word-aligned;
- parameter offsets: derived from the register assignment order (r0, r1, r2, …) and the local frame size.
Leaf procedures are explicitly marked by the programmer
(PROCEDURE*) and must not call other procedures. The compiler
places many of their variables in registers instead of on the stack.
elfdata replicates this allocation to produce the correct DWARF
register-based location expressions.
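
The local-variable rule (allocated upward from the stack pointer, word-aligned) can be sketched as below. The function name is hypothetical, and the sketch assumes that variables receive ascending offsets in declaration order with a 4-byte word; the compiler's actual ordering is not specified in this document.

```python
WORD = 4  # assumed word size on a 32-bit ARM target


def allocate_locals(variables):
    """Assign word-aligned offsets from the stack pointer.

    variables is a list of (name, size_in_bytes) pairs. Returns the
    offset table and the total frame size occupied by the locals.
    """
    offsets, offset = {}, 0
    for name, size in variables:
        offsets[name] = offset
        offset += (size + WORD - 1) & ~(WORD - 1)  # round up to a word
    return offsets, offset
```

Each offset produced here is the operand that a DW_OP_fbreg location expression would carry for the corresponding variable.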
Phase 5: DWARF Generation
The final phase encodes everything into the binary DWARF format:
- .debug_abbrev – the abbreviation table defines 14 DIE templates shared by all compilation units;
- .debug_info – one compilation unit per module, containing type DIEs, global variable DIEs, and subprogram DIEs with their parameter and local variable children;
- .debug_line – one line-number program per module, mapping code addresses to source lines;
- .debug_str – a shared string pool; all names in .debug_info are stored as offsets into this pool to avoid duplication;
- .debug_frame – one CIE (Common Information Entry) plus one FDE per procedure, describing register save locations;
- .debug_aranges – address range entries per compilation unit;
- .debug_pubnames – exported global variable names with their .debug_info offsets.
The symbol table (.symtab) is also built at this stage, containing
procedure symbols (STT_FUNC), global variable symbols
(STT_OBJECT), and ARM mapping symbols ($t for Thumb code
regions).
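
The de-duplicating string pool behind .debug_str is simple to sketch: each distinct name is appended once as a NUL-terminated string, and later requests for the same name return the recorded offset. The class name `StringPool` is hypothetical and the code is an illustration only.

```python
class StringPool:
    """Accumulates NUL-terminated strings and hands out their offsets."""

    def __init__(self):
        self._data = bytearray()
        self._offsets = {}

    def add(self, text: str) -> int:
        """Return the offset of text, appending it on first use."""
        if text not in self._offsets:
            self._offsets[text] = len(self._data)
            self._data += text.encode("utf-8") + b"\x00"
        return self._offsets[text]

    def to_bytes(self) -> bytes:
        """Section contents, ready to be written as .debug_str."""
        return bytes(self._data)
```

An attribute that refers to a name then stores only the returned offset, using the DW_FORM_strp form.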
ELF Packaging
The make-elf tool takes the program binary and the debug data from elfdata, and writes the final ELF file. It constructs the ELF header, program headers (for loadable code and BSS segments), and section headers, then appends the binary code, all debug sections, and the symbol table.
The resulting ELF is a standard 32-bit ARM executable that GDB and OpenOCD can load, flash-program, and debug.
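
To give a sense of the first structure in such a file, the sketch below packs a 52-byte ELF32 header for a little-endian ARM executable. The field layout and constants come from the ELF specification; the function name, its parameters, and the default `flags` value (EABI version 5) are assumptions for illustration and do not describe make-elf's actual interface.

```python
import struct

ET_EXEC = 2   # executable file
EM_ARM = 40   # ARM machine type


def elf32_arm_header(entry, phoff, phnum, shoff, shnum, shstrndx,
                     flags=0x05000000):
    """Pack an ELF32 little-endian header for an ARM executable."""
    # Magic, ELFCLASS32, little-endian, version 1, OS ABI 0, then padding.
    e_ident = b"\x7fELF" + bytes([1, 1, 1, 0]) + bytes(8)
    return struct.pack(
        "<16sHHIIIIIHHHHHH",
        e_ident,
        ET_EXEC, EM_ARM,
        1,           # e_version
        entry,       # e_entry
        phoff,       # e_phoff: file offset of the program headers
        shoff,       # e_shoff: file offset of the section headers
        flags,       # e_flags (assumed value)
        52,          # e_ehsize: size of this header
        32, phnum,   # e_phentsize, e_phnum
        40, shnum,   # e_shentsize, e_shnum
        shstrndx)    # index of the section-name string table
```

The program headers, section contents, and section headers follow at the offsets recorded in this header.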