Chapter 4: Peripherals and the Real World

If you want to read a sensor, spin a motor, or print a debugging message to a console, your CPU needs to talk to the physical world. In the old days of desktop motherboards, engineers routed massive parallel buses—sometimes 32 or 64 physical copper traces—to move data between components.

In the embedded Cyber-Physical System (CPS) world, pins and board space are your most precious commodities. You simply cannot afford to run 32 wires to a temperature sensor. Instead, we use serial communication. By transmitting data sequentially, one bit at a time, we drastically reduce the physical footprint of the hardware.

In this chapter, we will dissect the three most ubiquitous serial protocols in embedded architecture: UART, SPI, and I2C. Then, we are going to roll up our sleeves and write a production-grade, bare-metal UART driver in C.

4.1 Serial Protocols: UART, SPI, and I2C

When choosing a serial protocol, hardware engineers balance three competing needs: wire count, speed, and whether the communication is point-to-point or a shared network.

UART: The Universal Asynchronous Receiver-Transmitter

The UART is the absolute workhorse of the embedded world. If you plug a USB-to-serial cable into a Raspberry Pi or an automotive Engine Control Unit (ECU) to view its console, you are talking to a UART.

UARTs are asynchronous, meaning there is no shared clock wire between the sender and receiver. Because they don’t share a clock, both devices must be configured in software to transmit and receive at the exact same speed, known as the baud rate.

A standard UART connection uses just three wires: Transmit (TX), Receive (RX), and Ground. The protocol frames each byte of data with synchronization bits:

The Idle State: The line is held at a high voltage (logic 1).
The Start Bit: The transmitter pulls the line low (logic 0) to grab the receiver’s attention and start its internal sampling timer.
The Data Payload: The 8 bits of the payload are transmitted sequentially.
The Stop Bit: The transmitter pulls the line high (logic 1) for at least one bit duration to cleanly finish the frame.
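
To make the framing concrete, here is a minimal sketch that packs one byte into a 10-bit frame with one start bit, eight data bits, and one stop bit. The function name uart_frame_8n1 is hypothetical, and the sketch assumes the usual UART convention, not spelled out above, that the least significant data bit is sent first.

```c
#include <stdint.h>

/* Build a 10-bit 8N1 frame for one payload byte.
 * Bit 0 of the returned value is the first bit placed on the wire. */
uint16_t uart_frame_8n1(uint8_t data) {
    uint16_t frame = 0;                        /* bit 0: start bit (logic 0)   */
    frame |= (uint16_t)((uint16_t)data << 1);  /* bits 1..8: payload, LSB first */
    frame |= (uint16_t)(1u << 9);              /* bit 9: stop bit (logic 1)    */
    return frame;
}
```

For the letter 'A' (0x41) this yields 0x282: a low start bit, the eight payload bits, and a high stop bit that leaves the line back in its idle state.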

Because the RX and TX lines are independent, a UART is full-duplex—it can send and receive data simultaneously.

SPI: The Serial Peripheral Interface

While UART is great for consoles, it tops out around a few megabits per second. If you need to quickly stream data to a high-resolution LCD screen or read an SD card, you need SPI.

Developed by Motorola in the 1980s, SPI is synchronous, meaning it uses a dedicated clock wire to keep the sender and receiver in perfect lockstep. SPI typically requires four wires:

SCK (Serial Clock): Generated by the Master.
MOSI (Master Out, Slave In): Data flowing from the CPU to the peripheral.
MISO (Master In, Slave Out): Data flowing from the peripheral to the CPU.
CS/SS (Chip Select / Slave Select): Pulled low by the Master to wake up a specific peripheral.

SPI is essentially one distributed shift register, split between the Master and the peripheral. On every tick of the SCK clock, the Master shifts one bit out on the MOSI line, and simultaneously reads one bit in on the MISO line.

WARNING: The Full-Duplex Trap

SPI is always full-duplex. If you only want to read data from an SPI sensor, you cannot just sit back and listen. Because the Master generates the clock, you must transmit dummy bytes (usually 0x00 or 0xFF) out of the MOSI pin just to keep the clock ticking so the sensor can send its data back to you on the MISO pin.
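
The exchange can be modeled in software as two 8-bit shift registers trading bits. This is only a sketch: spi_device_model and spi_transfer are hypothetical names, the peripheral is reduced to a bare shift register, and the bit order is assumed to be most significant bit first, which is common but configurable on real hardware.

```c
#include <stdint.h>

/* Hypothetical model of a peripheral: nothing but an 8-bit shift register. */
typedef struct {
    uint8_t shift_reg;
} spi_device_model;

/* Clock out one byte, MSB first, and return the byte clocked in.
 * Each loop iteration stands for one SCK tick. */
uint8_t spi_transfer(spi_device_model *dev, uint8_t tx) {
    uint8_t rx = 0;
    for (int bit = 0; bit < 8; bit++) {
        uint8_t mosi = (uint8_t)((tx >> 7) & 1u);              /* Master drives MOSI */
        uint8_t miso = (uint8_t)((dev->shift_reg >> 7) & 1u);  /* device drives MISO */
        tx = (uint8_t)(tx << 1);
        dev->shift_reg = (uint8_t)((dev->shift_reg << 1) | mosi);
        rx = (uint8_t)((rx << 1) | miso);
    }
    return rx;
}
```

Reading a sensor byte is then just spi_transfer(&dev, 0xFF): the dummy 0xFF ends up in the device's register, and the device's old contents come back to the Master.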

I2C: The Inter-Integrated Circuit

SPI is fast, but it requires a dedicated Chip Select wire for every single peripheral you add to the bus. If you have 10 sensors, you need 13 wires: three shared signals (SCK, MOSI, MISO) plus ten Chip Selects. I2C solves this by putting multiple devices on a shared two-wire network.

I2C uses:

SCL (Serial Clock): Driven by the Master.
SDA (Serial Data): A bidirectional data line.

To avoid electrical short circuits when multiple devices try to talk at once, I2C uses an open-drain architecture. The silicon chips can only actively pull the wires low (to 0 volts). When they want to transmit a logic 1, they simply let go of the wire, and external pull-up resistors passively pull the voltage back up to high.
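
One way to picture an open-drain wire is as a logical AND of every device's output: the line is high only if everybody has let go. The sketch below models that behavior; the function name open_drain_line_level is hypothetical and the electrical details (resistor values, rise times) are ignored.

```c
#include <stdbool.h>
#include <stddef.h>

/* released[i] is true when device i has let go of the wire,
 * false when it is actively pulling the wire low. */
bool open_drain_line_level(const bool *released, size_t count) {
    for (size_t i = 0; i < count; i++) {
        if (!released[i]) {
            return false;   /* any single device can force the line low */
        }
    }
    return true;            /* nobody is pulling: the pull-up resistor wins */
}
```

Because no device ever drives the line high, two devices "disagreeing" never fight each other electrically; the low level simply wins.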

Because there are no Chip Select wires, the Master begins communication by broadcasting a 7-bit address, followed by a single read/write bit, over the SDA wire. Every device on the bus listens, but only the hardware matching that specific address responds by pulling the SDA line low for one clock cycle to signal an Acknowledge (ACK). Because SDA is bidirectional, I2C is strictly half-duplex: devices must take turns talking.
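
On the wire, the 7-bit address shares its byte with a read/write bit in the lowest position (a detail from the I2C specification). A minimal sketch of building that first byte follows; the helper name i2c_address_byte is hypothetical, and the address 0x48 used in the note below is only an illustration.

```c
#include <stdint.h>

/* Combine a 7-bit device address with the read/write bit.
 * is_read != 0 means the Master wants to read from the device. */
uint8_t i2c_address_byte(uint8_t addr7, int is_read) {
    return (uint8_t)(((addr7 & 0x7Fu) << 1) | (is_read ? 1u : 0u));
}
```

A device at address 0x48 is therefore addressed with the byte 0x90 for a write and 0x91 for a read.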

4.2 Writing a Bare-Metal UART Driver

Now that we understand the hardware, let’s write a driver. We are going to target the ARM PL011 UART, which is the standard serial port found on Raspberry Pi boards and emulated by the QEMU virt machine.

To use the UART, we can’t just throw data at it. We have to properly compute the timing divisor to hit our target baud rate, configure the line control registers (8 data bits, no parity), and monitor the hardware FIFOs.

4.2.1 The Register Map

Based on the ARM peripheral documentation, our PL011 UART lives at the physical base address 0x09000000 (on QEMU) and exposes several 32-bit Memory-Mapped I/O (MMIO) registers. Here are the ones we care about:

UARTDR (Data Register - Offset 0x00): Read/Write this to receive/send data.
UARTFR (Flag Register - Offset 0x18): Read-only status flags (e.g., is the FIFO full?).
UARTIBRD (Integer Baud Rate Divisor - Offset 0x24): The whole number part of the clock divider.
UARTFBRD (Fractional Baud Rate Divisor - Offset 0x28): The fractional part of the clock divider.
UARTLCR_H (Line Control Register - Offset 0x2C): Sets data frame size, parity, and enables FIFOs.
UARTCR (Control Register - Offset 0x30): Master switch to enable the UART, TX, and RX modules.
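
An alternative to one macro per offset is to overlay a struct on the register block and let the compiler compute the offsets. The sketch below is an assumption-laden illustration rather than the layout used later in this chapter: the type name pl011_regs and its field names are hypothetical, and the gaps are padded with placeholder arrays that cover registers this chapter does not use.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical struct overlay for the PL011 registers listed above. */
typedef struct {
    volatile uint32_t dr;           /* 0x00: Data Register                  */
    volatile uint32_t unused0[5];   /* 0x04..0x14: not used in this chapter */
    volatile uint32_t fr;           /* 0x18: Flag Register                  */
    volatile uint32_t unused1[2];   /* 0x1C..0x20: not used in this chapter */
    volatile uint32_t ibrd;         /* 0x24: Integer Baud Rate Divisor      */
    volatile uint32_t fbrd;         /* 0x28: Fractional Baud Rate Divisor   */
    volatile uint32_t lcr_h;        /* 0x2C: Line Control Register          */
    volatile uint32_t cr;           /* 0x30: Control Register               */
} pl011_regs;
```

Checking the layout with offsetof(pl011_regs, fr) == 0x18 and friends catches a miscounted padding array at build or test time, before it turns into a write to the wrong register.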

4.2.2 The Baud Rate Math

UARTs operate by dividing the main system clock down to the target baud rate. The PL011 uses a fractional baud rate generator. The formula provided by ARM hardware manuals is:

$$ \text{Divisor} = \frac{\text{System Clock Frequency}}{16 \times \text{Target Baud Rate}} $$

Assume our system clock runs at 48 MHz (48,000,000 Hz) and we want a standard console baud rate of 115,200.

Calculate the exact divisor: 48,000,000 / (16 * 115,200) = 26.041666...
Extract the Integer part (IBRD): 26
Calculate the Fractional part (FBRD): We take the fractional remainder (0.041666...), multiply it by 64, and round to the nearest integer. 0.041666... * 64 = 2.666... which rounds to 3.

We will program 26 into the UARTIBRD register and 3 into the UARTFBRD register.
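
The same arithmetic can be done in integer math, which is handy on cores without floating-point hardware. This is a sketch under the formula above; the names pl011_baud_div and pl011_compute_divisor are hypothetical. It relies on the identity that the divisor scaled by 64 equals clock × 4 / baud, so one rounded division yields both fields.

```c
#include <stdint.h>

/* Hypothetical container for the two divisor register values. */
typedef struct {
    uint32_t ibrd;   /* integer part            */
    uint32_t fbrd;   /* fractional part, 0..63  */
} pl011_baud_div;

/* Compute IBRD and FBRD for a given UART clock and target baud rate.
 * divisor * 64 = clk * 64 / (16 * baud) = clk * 4 / baud, rounded to nearest. */
pl011_baud_div pl011_compute_divisor(uint32_t clk_hz, uint32_t baud) {
    uint64_t scaled = ((uint64_t)clk_hz * 4u + baud / 2u) / baud;
    pl011_baud_div d;
    d.ibrd = (uint32_t)(scaled >> 6);      /* upper bits: whole number part */
    d.fbrd = (uint32_t)(scaled & 0x3Fu);   /* lower 6 bits: 64ths           */
    return d;
}
```

For 48 MHz and 115,200 baud this reproduces the hand calculation: 26 and 3. The programmed divisor is then 26 + 3/64 = 26.046875, which gives an actual rate of about 115,177 baud, roughly 0.02% below the target and well within what a receiver tolerates.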

4.2.3 The Driver Code

Here is the complete C code for initializing the UART, putting characters into the transmit FIFO, and polling the receive FIFO for input.

#include <stdint.h>

// 1. Define the physical base address of the UART
#define UART_BASE 0x09000000

// 2. Map the register offsets to volatile pointers
#define UART_DR     (*(volatile uint32_t *)(UART_BASE + 0x00))
#define UART_FR     (*(volatile uint32_t *)(UART_BASE + 0x18))
#define UART_IBRD   (*(volatile uint32_t *)(UART_BASE + 0x24))
#define UART_FBRD   (*(volatile uint32_t *)(UART_BASE + 0x28))
#define UART_LCR_H  (*(volatile uint32_t *)(UART_BASE + 0x2C))
#define UART_CR     (*(volatile uint32_t *)(UART_BASE + 0x30))

// 3. Define the bit-masks for the Flag Register (FR)
#define FR_TXFF (1 << 5) // Transmit FIFO Full
#define FR_RXFE (1 << 4) // Receive FIFO Empty

// 4. Initialize the UART hardware
void uart_init(void) {
    // Step A: Disable the UART before making configuration changes
    UART_CR = 0;

    // Step B: Set the baud rate to 115200 (assuming a 48 MHz clock)
    // Divisor = 48MHz / (16 * 115200) = 26.041666...
    UART_IBRD = 26; 
    UART_FBRD = 3;  // 0.041666 * 64 = 2.666 -> 3

    // Step C: Configure Line Control (8 data bits, 1 stop bit, no parity)
    // Bit 5 & 6 (0b11 << 5) = 8-bit word length
    // Bit 4 (1 << 4) = Enable the hardware FIFOs
    UART_LCR_H = (3 << 5) | (1 << 4);

    // Step D: Enable the UART, Transmit (TXE = bit 8), and Receive (RXE = bit 9)
    // Master Enable (UARTEN) = bit 0
    UART_CR = (1 << 9) | (1 << 8) | (1 << 0);
}

// 5. Send a single character out of the serial port
void uart_putc(char c) {
    // Spin-wait while the Transmit FIFO is Full
    while (UART_FR & FR_TXFF) {
        // CPU burns cycles here waiting for hardware to catch up
    }
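    // Shove the character into the Data Register. The cast through
    // unsigned char avoids sign extension where plain char is signed.
    UART_DR = (uint32_t)(unsigned char)c;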
}

// 6. Receive a single character from the serial port
char uart_getc(void) {
    // Spin-wait while the Receive FIFO is Empty
    while (UART_FR & FR_RXFE) {
        // CPU waits here for the user to hit a key
    }
    // Pull the character out of the Data Register (masking off status bits)
    return (char)(UART_DR & 0xFF);
}

// 7. Helper function to print a whole string
void uart_puts(const char *str) {
    while (*str) {
        uart_putc(*str++);
    }
}

4.2.4 Line-by-Line Breakdown

Let’s look at exactly how this interacts with the silicon:

Step 2: The volatile Pointers

Just as we discussed in Chapter 1, memory-mapped I/O requires the volatile keyword. Notice how we cast the raw hexadecimal address (for example, 0x09000000 + 0x18) to a (volatile uint32_t *), and then immediately dereference it with the leading *. This turns UART_FR into a macro that behaves like a standard C variable, but forces the compiler to emit a real load or store instruction for every access instead of caching the value in a CPU register or optimizing the access away. Each access therefore travels over the system bus to the peripheral.

Step 4: The Initialization Sequence (uart_init)

You cannot change the tires on a car while it’s driving. If you attempt to change the Baud Rate Divisors while the UART is actively transmitting, you will send corrupted data down the wire. Step A disables the peripheral by writing 0 to the Control Register (UART_CR). After writing the computed baud divisors in Step B, we set up the Line Control Register (UART_LCR_H) in Step C. The order matters: the PL011 only latches the new divisors when UART_LCR_H is written, so Step C must follow Step B. By writing (3 << 5), we set the Word Length to 8 bits; by writing (1 << 4), we enable the 16-byte hardware FIFOs. Leaving the parity and two-stop-bit fields at 0 gives us no parity and one stop bit. Finally, in Step D, we turn the master enable, the TX enable, and the RX enable bits back on.
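
The shifted constants in uart_init are easier to audit when each bit has a name. The sketch below uses the field names from ARM's PL011 documentation (FEN, WLEN, UARTEN, TXE, RXE); the PL011_ prefixes and the two helper functions are hypothetical and exist only to show that the named bits reproduce the values written by the driver.

```c
#include <stdint.h>

/* Line Control Register (UARTLCR_H) fields */
#define PL011_LCR_H_FEN     (1u << 4)   /* enable the FIFOs            */
#define PL011_LCR_H_WLEN_8  (3u << 5)   /* word length: 8 data bits    */

/* Control Register (UARTCR) fields */
#define PL011_CR_UARTEN     (1u << 0)   /* master UART enable          */
#define PL011_CR_TXE        (1u << 8)   /* transmit enable             */
#define PL011_CR_RXE        (1u << 9)   /* receive enable              */

/* Value for UARTLCR_H: 8 data bits, no parity, 1 stop bit, FIFOs on. */
uint32_t pl011_lcr_h_8n1_fifo(void) {
    return PL011_LCR_H_WLEN_8 | PL011_LCR_H_FEN;
}

/* Value for UARTCR: UART, transmitter, and receiver all enabled. */
uint32_t pl011_cr_enable_tx_rx(void) {
    return PL011_CR_UARTEN | PL011_CR_TXE | PL011_CR_RXE;
}
```

The two helpers return 0x70 and 0x301, the same values Steps C and D write with raw shifts.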

Step 5: Handling the Transmit FIFO (uart_putc)

Because the CPU operates at gigahertz speeds and the UART transmits at kilobits per second, the CPU can fill the UART’s buffer almost instantly. The UART has a 16-byte Transmit FIFO hardware buffer to absorb bursts. Once 16 characters are waiting in the FIFO, it is full, and the hardware asserts the FR_TXFF (Transmit FIFO Full) bit in the Flag Register. Our while (UART_FR & FR_TXFF) loop is known as polling or spin-waiting. The processor makes no forward progress; it repeatedly reads the Flag Register over the memory bus until the hardware finishes shifting the current frame out over the physical TX wire, pulls the next byte from the FIFO, and frees up a slot. Only when the full flag clears does the CPU write the next character into UART_DR.
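
If the UART clock is misconfigured, the full flag may never clear and an unbounded spin-wait hangs forever. One possible variation, sketched here under assumptions, bounds the number of polls. The function name uart_try_putc, the PL011_FR_TXFF macro name, and the max_polls parameter are hypothetical; the registers are passed in as pointers so the logic can also be exercised on a host machine with ordinary variables standing in for hardware.

```c
#include <stdbool.h>
#include <stdint.h>

#define PL011_FR_TXFF (1u << 5)   /* Transmit FIFO Full flag in UARTFR */

/* Try to queue one character. Returns false if the FIFO is still full
 * after max_polls extra reads of the Flag Register. */
bool uart_try_putc(volatile uint32_t *fr, volatile uint32_t *dr,
                   char c, uint32_t max_polls) {
    while (*fr & PL011_FR_TXFF) {
        if (max_polls-- == 0) {
            return false;          /* give up instead of spinning forever */
        }
    }
    *dr = (uint32_t)(unsigned char)c;
    return true;
}
```

On the target you would pass the addresses of the real registers; the caller then decides what a timeout means, for example dropping the character or reporting a fault.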

Step 6: Handling the Receive FIFO (uart_getc)

Receiving data is the mirror image. If the CPU wants to read user input, but the user hasn’t pressed a key yet, the Receive FIFO is empty. The hardware holds the FR_RXFE (Receive FIFO Empty) bit high. The while (UART_FR & FR_RXFE) loop blocks the program, burning CPU cycles until the UART hardware detects a Start Bit, deserializes an entire 8-bit frame, and drops it into the RX FIFO. Once the empty flag drops, we read UART_DR. We apply an & 0xFF bitwise mask because the PL011 uses bits 8 through 11 of the Data Register to report physical line errors (framing, parity, break, and overrun); masking ensures we only return the clean 8-bit data byte.
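
Rather than discarding the upper bits, a driver can inspect them. The sketch below splits a raw Data Register value into the data byte and its error flags, using the bit positions from ARM's PL011 documentation (framing error in bit 8, parity in bit 9, break in bit 10, overrun in bit 11). The type uart_rx_result, the function uart_decode_dr, and the PL011_DR_ macro names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define PL011_DR_FE (1u << 8)    /* framing error */
#define PL011_DR_PE (1u << 9)    /* parity error  */
#define PL011_DR_BE (1u << 10)   /* break error   */
#define PL011_DR_OE (1u << 11)   /* overrun error */

/* Hypothetical decoded view of one read of UARTDR. */
typedef struct {
    uint8_t data;
    bool framing_error;
    bool parity_error;
    bool break_error;
    bool overrun_error;
} uart_rx_result;

/* Decode a raw 32-bit value that was read once from UARTDR. */
uart_rx_result uart_decode_dr(uint32_t raw) {
    uart_rx_result r;
    r.data          = (uint8_t)(raw & 0xFFu);
    r.framing_error = (raw & PL011_DR_FE) != 0;
    r.parity_error  = (raw & PL011_DR_PE) != 0;
    r.break_error   = (raw & PL011_DR_BE) != 0;
    r.overrun_error = (raw & PL011_DR_OE) != 0;
    return r;
}
```

The important design point is to read UART_DR exactly once into a local variable and decode that copy, since every read of the register pops an entry from the Receive FIFO.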

TIP: The Cost of Polling

Spin-waiting on the UART_FR register works perfectly for a simple console or a bootloader. But as we’ll see in the next chapter, trapping your CPU in a busy-wait loop while waiting for a 115200 baud serial connection is extremely inefficient: each 10-bit frame takes roughly 87 microseconds on the wire, which is tens of thousands of cycles on a gigahertz-class core. A true RTOS will configure the UART to fire a hardware interrupt when the FIFO is ready, allowing the CPU to go to sleep or execute other threads in the meantime!

Foundations of Computer Architecture and Cyber-Physical Systems 2.0