NoBusyWaiting – Oberon RTK

Overview

The current module UARTstr implements the PutString procedure using busy waiting: as soon as the RP2040’s 32 byte transmitter FIFO is full, it waits in a busy-loop until there’s space in the buffer again. Busy waiting is detrimental for cooperative scheduling (and even with a time-sliced pre-emptive scheduler it would waste precious processor cycles that could be assigned to another thread (control process)).

The kernel offers a simple solution, without using interrupts: let the scheduler poll the device for availability, allowing other threads to run in the meantime. This example program demonstrates the problem first, and then how a simple kernel feature can help.

Description

The program is implemented using two modules:

NoBusyWaitingC0 for core 0
NoBusyWaitingC1 for core 1

Since the issue is confined to a single scheduler, we could have used only one core, but why should we, right? :)

Core 0: run one thread with the mandatory blinker as a “heartbeat” indicator
Core 1: run two threads, one causing the problem (thread 1), one showing it (thread 0)

Since we want to show output from two threads on the same core, the standard module Out with its core-specific output terminals does not work. We could use module Texts, but it was easy enough to quickly write a custom module Out, which is found in the example directory. Simply using Out, thread 0 writes to terminal 0, thread 1 to terminal 1.

To exacerbate the issue for demo purposes, the two terminals are set to work at a measly 9,600 Baud. To print one character takes about one millisecond. This ensures that filling an empty transmit buffer takes way less time than emptying it, so that processing overhead plays a minor role for this demo.

As you’ll see we will swap the device drivers for the UART device, which are allocated in module Main, using Terminals.Open. Hence, there’s also a custom module Main in the example directory so that the generic one remains untouched.

Output Terminals

See Set-up, two-terminal set-up.

Set the terminals to 9,600 Baud.

Build and Run

Threads (Processes)

Thread 0

Thread 0 is instrumented with timing points and output to demonstrate the problem.

It prints, each second:

c1-t0  5002 69

“c1-t0”: core one, thread zero
“5002”: the max time since it was last run by the scheduler (microseconds), captured during the last second
“69”: the max time required for one run-through (microseconds), captured during the last second; this time will vary between the different RPs

The thread is set to a period of 5 milliseconds, so a 5,002 microseconds interval time is OK.

Thread 1

Thread 1 is set up to be the troublemaker. It prints a string (see TestString0 etc.), per second.

Output:

c1-t1 01234567890123456789 P

“c1-t1”: core one, thread one
“01234567890123456789”: the test string (it’s TestString0 in this case)
“P”: triggered by period (see below)

Challenge

Let’s run the code with ever longer strings printed by thread 1; replace TestString0 with its longer siblings TestString1 etc. Here are the outputs of four test runs.

c1-t1 0123456789012345678901234 P
c1-t0  5001    69

c1-t1 012345678901234567890123456789 P
c1-t0  7199    69

c1-t1 01234567890123456789012345678901234 P
c1-t0 12405    74

c1-t1 0123456789012345678901234567890123456789 P
c1-t0 17612    74

As long as we stay below the 32 byte transmit buffer, everything looks good. Thread 0 gets scheduled and run every 5 milliseconds. With longer strings, thread 1 will start to eat processing time busy-waiting for the FIFO to be emptied towards the serial terminals, blocking thread 0 to be scheduled, so its interval time gets above its 5 ms period (7,199, 12,405, and 17,612 microseconds, respectively). For each five characters more that thread 1 transmits, thread 0 has to wait about 5 milliseconds longer, which jibes with the 1 ms per character of 9,600 Baud.

Solution

Let’s run the same tests with the improved kernel, which offers a feature to avoid busy waiting.

c1-t1 0123456789012345678901234 P
c1-t0  5002    69

c1-t1 012345678901234567890123456789 D
c1-t0  5002    69

c1-t1 01234567890123456789012345678901234 D
c1-t0  5001    68

c1-t1 0123456789012345678901234567890123456789 D
c1-t0  5002    69

Thread 0 does not get delayed anymore, but runs on time, independent of the length of the test string that thread 1 prints. Also notice how P changes to D at the end of the line, indicating when thread 1 starts to be triggered by the device availability, not the period anymore. It’s exactly when thread 0 started to show delays above (7,199 us), ie. when thread 1 prints strings that don’t fit the transmit buffer – see the explanation below.

To get this result, change this line in Main.init (the one in the example directory):

Terminals.Open(UART1, UART1_TxPinNo, UART1_RxPinNo, Baudrate, UARTstr.PutString, UARTstr.GetString);

to this:

Terminals.Open(UART1, UART1_TxPinNo, UART1_RxPinNo, Baudrate, UARTkstr.PutString, UARTkstr.GetString);

The terminal for thread 1 will now use a different PutString procedure.

Explanation

PutString in module UARTstr – busy-waiting on buffer full:

PROCEDURE PutString*(dev: TextIO.Device; s: ARRAY OF CHAR; numChar: INTEGER);
  VAR dev0: UARTdev.Device; i: INTEGER;
BEGIN
  dev0 := dev(UARTdev.Device);
  IF numChar > LEN(s) THEN numChar := LEN(s) END;
  i := 0;
  WHILE i < numChar DO
    IF ~SYSTEM.BIT(dev0.FR, UARTdev.FR_TXFF) THEN (* not full *)
      SYSTEM.PUT(dev0.TDR, s[i]);
      INC(i)
    END
  END
END PutString;

PutString in module UARTkstr – transfer control to kernel on buffer full:

PROCEDURE PutString*(dev: TextIO.Device; s: ARRAY OF CHAR; numChar: INTEGER);
  VAR dev0: UARTdev.Device; i: INTEGER;
BEGIN
  dev0 := dev(UARTdev.Device);
  IF numChar > LEN(s) THEN numChar := LEN(s) END;
  i := 0;
  WHILE i < numChar DO
    IF ~SYSTEM.BIT(dev0.FR, UARTdev.FR_TXFF) THEN (* not full *)
      SYSTEM.PUT(dev0.TDR, s[i]);
      INC(i)
    ELSE
      Kernel.AwaitDeviceFlags(dev0.FR, {UARTdev.FR_TXFE}, {}) (* empty *)
    END
  END
END PutString;

That is, when the transmit buffer is full, the thread transfers back to the scheduler, and will only be triggered again as soon as flag UARTdev.FR_TXFE at address dev0.FR (see module UARTdev) is set, indicating an empty buffer. In the meantime, other threads can run, in this case, thread 0.

The example code as well as the implementation in module Kernel should be easy to understand. It’s a very simple solution indeed. But effective, IMHO.

Repository

libv2: for RP2040/Pico and RP2350/Pico2: NoBusyWaiting
lib (v1): this version is no longer being maintained: NoBusyWaiting