Overview
The current module UARTstr
implements the PutString
procedure using busy waiting: as soon as the RP2040’s 32 byte transmitter FIFO is full, it waits in a busy-loop until there’s space in the buffer again. Busy waiting is detrimental for cooperative scheduling (and even with a time-sliced pre-emptive scheduler it would waste precious processor cycles that could be assigned to another thread (control process)).
The kernel offers a simple solution, without using interrupts: let the scheduler poll the device for availability, allowing other threads to run in the meantime. This example program demonstrates the problem first, and then how a simple kernel feature can help.
Description
The program is implemented using two modules:
- NoBusyWaitingC0 for core 0
- NoBusyWaitingC1 for core 1
Since the issue is confined to a single scheduler, we could have used only one core, but why should we, right? :)
- Core 0: run one thread with the mandatory blinker as a “heartbeat” indicator
- Core 1: run two threads, one causing the problem (thread 1), one showing it (thread 0)
Since we want to show output from two threads on the same core, the standard module Out
with its core-specific output terminals does not work. We could use module Texts
, but it was easy enough to quickly write a custom module Out
, which is found in the example directory. Simply using Out
, thread 0 writes to terminal 0, thread 1 to terminal 1.
To exacerbate the issue for demo purposes, the two terminals are set to work at a measly 9,600 Baud. To print one character takes about one millisecond. This ensures that filling an empty transmit buffer takes way less time than emptying it, so that processing overhead plays a minor role for this demo.
As you’ll see we will swap the device drivers for the UART device, which are allocated in module Main
, using Terminals.Open
. Hence, there’s also a custom module Main
in the example directory so that the generic one remains untouched.
Output Terminals
See Set-up, two-terminal set-up.
Set the terminals to 9,600 Baud.
Build and Run
Build module NoBusyWaitingC0
with Astrobe, and create and upload the UF2 file using abin2uf2
.
Set Astrobe’s memory options as listed, and the library search path as explained.
Threads (Processes)
Thread 0
Thread 0 is instrumented with timing points and output to demonstrate the problem.
It prints, each second:
c1-t0 5002 69
- “c1-t0”: core one, thread zero
- “5002”: the max time since it was last run by the scheduler (microseconds), captured during the last second
- “69”: the max time required for one run-through (microseconds), captured during the last second
The thread is set to a period of 5 milliseconds, so 5,002 microseconds interval time is OK.
Thread 1
Thread 1 is set up to be the troublemaker. It prints a string (see TestString0
etc.), per second.
Output:
c1-t1 01234567890123456789 P
- “c1-t1”: core one, thread one
- “01234567890123456789”: the test string (it’s
TestString0
in this case) - “P”: triggered by period (see below)
Challenge
Let’s run the code with ever longer strings printed by thread 1. Here are the outputs of four test runs.
c1-t1 0123456789012345678901234 P
c1-t0 5001 69
c1-t1 012345678901234567890123456789 P
c1-t0 7199 69
c1-t1 01234567890123456789012345678901234 P
c1-t0 12405 74
c1-t1 0123456789012345678901234567890123456789 P
c1-t0 17612 74
As long as we stay below the 32 byte transmit buffer, everything looks good. Thread 0 gets scheduled and run every 5 milliseconds. With longer strings, thread 1 will start to eat processing time busy-waiting for the FIFO to be emptied towards the serial terminals, blocking thread 0 to be scheduled, so its interval time gets above its 5 ms period (7,199, 12,405, and 17,612 microseconds, respectively). For each five characters more that thread 1 transmits, thread 0 has to wait about 5 milliseconds longer, which jibes with the 1 ms per character of 9,600 Baud.
Solution
Let’s run the same tests with the improved kernel, which offers a feature to avoid busy waiting.
c1-t1 0123456789012345678901234 P
c1-t0 5002 69
c1-t1 012345678901234567890123456789 D
c1-t0 5002 69
c1-t1 01234567890123456789012345678901234 D
c1-t0 5001 68
c1-t1 0123456789012345678901234567890123456789 D
c1-t0 5002 69
Thread 0 does not get delayed anymore, but runs on time, independent of the length of the test string that thread 1 prints. Also notice how P
changes to D
at the end of the line, indicating when thread 1 starts to be triggered by the device availability, not the period anymore. It’s exactly when thread 0 started to show delays above (7,199 us), ie. when thread 1 prints strings that don’t fit the transmit buffer – see the explanation below.
To get this result, change this line in Main.init
(the one in the example directory):
Terminals.Open(UART1, UART1_TxPinNo, UART1_RxPinNo, Baudrate, UARTstr.PutString, UARTstr.GetString);
to this:
Terminals.Open(UART1, UART1_TxPinNo, UART1_RxPinNo, Baudrate, UARTkstr.PutString, UARTkstr.GetString);
The terminal for thread 1 will now use a different PutString
procedure.
Explanation
PutString
in module UARTstr
– busy-waiting on buffer full:
PROCEDURE PutString*(dev: TextIO.Device; s: ARRAY OF CHAR; numChar: INTEGER);
VAR dev0: UARTdev.Device; i: INTEGER;
BEGIN
dev0 := dev(UARTdev.Device);
IF numChar > LEN(s) THEN numChar := LEN(s) END;
i := 0;
WHILE i < numChar DO
IF ~SYSTEM.BIT(dev0.FR, UARTdev.FR_TXFF) THEN (* not full *)
SYSTEM.PUT(dev0.TDR, s[i]);
INC(i)
END
END
END PutString;
PutString
in module UARTkstr
– transfer control on buffer full:
PROCEDURE PutString*(dev: TextIO.Device; s: ARRAY OF CHAR; numChar: INTEGER);
VAR dev0: UARTdev.Device; i: INTEGER;
BEGIN
dev0 := dev(UARTdev.Device);
IF numChar > LEN(s) THEN numChar := LEN(s) END;
i := 0;
WHILE i < numChar DO
IF ~SYSTEM.BIT(dev0.FR, UARTdev.FR_TXFF) THEN (* not full *)
SYSTEM.PUT(dev0.TDR, s[i]);
INC(i)
ELSE
Kernel.AwaitDeviceSet(dev0.FR, {UARTdev.FR_TXFE})
END
END
END PutString;
That is, when the transmit buffer is full, the thread transfers to the scheduler, and will only be triggered again as soon as flag UARTdev.FR_TXFE
at address dev0.FR
(see module UARTdev
) is set, indicating an empty buffer. In the meantime, other threads can run, in this case, thread 0.
The example code as well as the implementation in module Kernel
should be easy to understand. It’s a very simple solution indeed. But effective, IMHO.
Repository
- NoBusyWaitingC0
- NoBusyWaitingC1 (that’s the interesting one :)