Keyboard shortcuts

Press or to navigate between chapters

Press S or / to search in the book

Press ? to show this help

Press Esc to hide this help

Chapter 1: Meet Your Hardware: Processors and the Memory Map

If you’re coming from desktop software or cloud backend development, you are used to the operating system lying to you. When you allocate an array in Python, instantiate an object in Java, or call malloc() in C++, the operating system’s Memory Management Unit (MMU) hands you a virtual address. You don’t know where that data actually lives in the physical silicon RAM chips, and frankly, you don’t have to care. The OS handles page tables, swaps to disk, and keeps you safely isolated in your own little sandbox so you can’t accidentally overwrite another program.

Welcome to bare-metal embedded systems. The lies stop here.

When you write firmware for a Cyber-Physical System (CPS)—whether that is a drone flight controller, an engine management unit, or an IoT sensor—you are writing code that interacts directly with the physical world. To do this, you must bypass the abstractions and interact directly with the raw hardware. In this chapter, we explore how the CPU sees the world when the training wheels are off.

1.1 The Big Picture: Escaping the Virtual Sandbox

To understand how to write software that controls hardware, you first need to understand how the hardware is built. Modern embedded devices are not built from discrete, disconnected chips scattered across a massive circuit board. Instead, they are designed as a System-on-Chip (SoC).

As defined in David J. Greaves’s Modern System-on-Chip Design, a SoC essentially consists of a collection of Intellectual Property (IP) blocks and an associated interconnect. IP blocks are highly optimized, reusable hardware modules. A typical SoC is assembled from a variety of these blocks:

  • Processor Cores: The CPU itself (e.g., an ARM Cortex-M or a RISC-V core), which fetches and executes your software.
  • Memory Blocks: Static RAM (SRAM) for your variables and Flash ROM for your compiled code.
  • I/O Peripherals: Specialized hardware devices like Universal Asynchronous Receiver-Transmitters (UARTs) for serial communication, Ethernet MACs, and General-Purpose Input/Output (GPIO) pins that physically connect to the outside world.

To the software developer, the most important question is: How does the processor core actually talk to all these different IP blocks?

In a desktop environment, you might assume there are special “hardware instructions” to talk to a network card or a serial port. But in modern 32-bit and 64-bit SoC architectures, such as ARM and RISC-V, this is not the case. Instead, the entire system operates with a global, flat address space that covers all peripherals and memory.

Every IP block on the chip is wired into a massive internal network known as the interconnect bus. When your CPU wants to talk to a peripheral, it doesn’t use a special command; it simply writes a standard 32-bit or 64-bit number to a specific physical address on that bus. The hardware routes your data to the correct IP block just like a postal worker routes a letter.

1.2 The Memory Map

Because every piece of hardware on the SoC shares the same address bus, system architects must slice up the available address space into dedicated regions. This layout is known as the memory map.

A memory map ensures that memory devices within a computer are configured so that each device occupies a unique span of the system address space. If you look at the memory map of a typical ARM processor, you won’t just see RAM. You will see a precisely engineered zoning plan for the entire chip.

Here is a simplified visual representation of a bare-metal memory map for a 32-bit ARM Cortex SoC:

 0xFFFF_FFFF +-------------------------+
             |                         |
             |       System Level      | <- CPU Control Registers, 
             |                         |    Interrupt Controllers (NVIC)
 0xE000_0000 +-------------------------+
             |                         |
             |     I/O Peripherals     | <- UART, GPIO, Timers, SPI, I2C
             |                         |    (Mapped to hardware pins)
 0x4000_0000 +-------------------------+
             |                         |
             |       SRAM (Data)       | <- Your variables, the Call Stack, 
             |                         |    and the Heap
 0x2000_0000 +-------------------------+
             |                         |
             |     Flash ROM (Code)    | <- Your compiled C/Rust/ASM program
             |                         |    and read-only constants
 0x0000_0000 +-------------------------+

When your CPU core executes a load or store instruction, it places the target address on the bus. The interconnect uses an address decoder logic circuit that looks at the upper bits of the address to figure out which IP block should receive the data.

Let’s walk through how this practical layout affects you as a software developer:

1. The Code Region (0x0000_0000) When the processor powers up, the reset circuitry forces the Program Counter (PC) to the bottom of the memory map (or a similarly well-defined reset vector) to fetch the very first instruction. In embedded systems, this region maps to non-volatile Flash ROM. Your compiled program lives here. Because it is physically implemented as ROM, any attempt by your software to write data to an address in this range will simply be ignored by the hardware (or trigger a hardware fault).

2. The Data Region (0x2000_0000) This region is physically wired to the on-chip Static RAM (SRAM). Unlike Flash, SRAM is extremely fast but volatile, meaning it loses its contents when power is removed. When you declare a global variable like int sensor_reading = 0; in your C code, the linker places that variable in this region. When your functions execute, the CPU’s Stack Pointer (SP) moves up and down within this SRAM region to store your local variables and function return addresses.

3. The Peripheral Region (0x4000_0000) This is where the magic happens. Addresses in this range do not point to memory cells at all. Instead, they point directly to the control registers inside the physical I/O devices. If you write a byte to an address mapped to a UART transmitter, that byte does not get stored; it gets shoved into a hardware shift register and physically blasted out over a wire at thousands of bits per second. This concept is called Memory-Mapped I/O (MMIO).

NOTE: Beware the Aliases A common trick SoC designers use to simplify the address decoder logic is to not decode all 32 bits of the address if the IP block only requires a few bits. This can result in aliasing, where the exact same physical memory location or hardware register appears at multiple different addresses in the memory map. As a developer, you must only access hardware using the official base addresses provided in the manufacturer’s datasheet, or you risk strange bugs when porting your code to newer chips.

By understanding the memory map, you realize that writing embedded software is essentially the art of moving data between different zones of physical memory. To change the physical world—to spin a motor, fire a spark plug, or send a network packet—you simply need to write the correct binary sequence to the correct physical address in the peripheral zone.

1.3 Memory-Mapped I/O (MMIO)

If you want to read a file or print text to a screen on a desktop computer, you call a high-level operating system API. On a bare-metal Cyber-Physical System, there is no OS to do the heavy lifting for you. If you want to turn on a motor, blink an LED, or send a byte over a serial port, you have to talk directly to the silicon. You do this using Memory-Mapped I/O (MMIO).

In an MMIO architecture, the hardware peripherals—such as General-Purpose Input/Output (GPIO) pins, timers, and UARTs—are assigned specific addresses in the processor’s physical address space. To the CPU, an I/O device looks exactly like a standard RAM location. You don’t need special, arcane CPU instructions to interact with the outside world; you just read and write to these specific memory addresses using standard pointers.

Let’s look at how we actually command the hardware in C. Suppose you are working with a microcontroller where the datasheet tells you that the GPIO Port output data register is physically wired to memory address 0x40020000. To flip a GPIO pin high (which might turn on an LED or fire a thruster), you just need to write a 32-bit integer to that exact address.

Here is the inline C code to make that happen:

// 1. Define the raw physical memory address from the datasheet
#define GPIO_PORT_DATA_ADDRESS 0x40020000

// 2. Cast that raw address into a pointer to a 32-bit unsigned integer
// 3. Dereference it so we can write directly to the hardware
#define GPIO_PORT_DATA (*((volatile uint32_t *) GPIO_PORT_DATA_ADDRESS))

void turn_on_led(void) {
    // Write a 1 to the 5th bit (Pin 5) to flip the GPIO pin HIGH
    GPIO_PORT_DATA |= (1 << 5); 
}

WARNING: The Optimizer Wants to Ruin Your Day

Notice the volatile keyword in that pointer cast? In embedded systems, volatile is the difference between a working robot and a smoking crater.

Modern C/C++ compilers (like GCC or Clang) are aggressively optimized. If you write a loop that constantly checks a hardware status register to see if a sensor is ready, the compiler’s optimizer will look at your code, notice that your software never changes the value at that memory address, and assume the value is static. It will optimize your hardware read out of the loop entirely, caching the value in a CPU register. Your program will compile perfectly, but it will lock up in an infinite loop because it never checks the actual hardware again.

The volatile keyword forces the compiler to abandon these optimizations for that specific variable. It tells the compiler: “I know what I’m doing. This memory address is actually a physical hardware device that can change on its own. Force a direct, raw memory read/write over the bus every single time this variable is referenced in the code.”

1.4 The Boot Process

If you’ve written C or C++ before, you are used to the idea that program execution magically begins at the main() function. But who calls main()?

In a desktop environment, the operating system loader allocates memory, sets up the environment, and invokes your main() function. In the bare-metal world, there is no operating system. The hardware has strict, immutable rules about what happens the exact nanosecond power is applied to the chip.

When an ARM Cortex-M processor boots, it does not look for main(). Instead, the hardware reset circuitry forces the processor to look at a very specific memory location (usually the very bottom of the memory map, at address 0x0000_0000) for a data structure called the Vector Table.

The first two 32-bit entries in the Vector Table are absolutely critical to booting the processor:

  1. Initial Stack Pointer (Offset 0x00): The CPU fetches this 32-bit value and directly loads it into the Stack Pointer (SP) register. This brilliant architectural design guarantees that your processor has a working call stack before it executes a single instruction.
  2. Reset Vector (Offset 0x04): The CPU fetches this value and loads it into the Program Counter (PC). This is the memory address of the Reset Handler, the very first assembly instruction that the processor will execute.

The Reset Handler points to a small, critical piece of startup assembly code typically referred to as crt0 (C runtime zero). You cannot just jump straight into C code; the C language standard makes certain assumptions about the state of memory, and it is the job of crt0 to build that environment.

crt0 performs several vital chores before handing control over to your application:

  • Clearing the .bss Section: When SRAM powers up, it contains random, chaotic garbage values. However, the C standard dictates that any uninitialized global or static variables must default to zero. The compiler groups all of these uninitialized variables into a memory segment called the .bss section. The crt0 code must manually loop through the entire .bss region in RAM and write zeros to every single address.
  • Initializing the .data Section: If you declare a global variable with a specific starting value (e.g., int motor_speed = 500;), that value is stored in the non-volatile Flash ROM (so it survives power loss). The compiler groups these into the .data section. The crt0 code must copy these initial values from the slow Flash memory into the fast, volatile SRAM where the variables will actually live during execution.
  • Calling main(): Only after the stack pointer is initialized, the .bss section is zeroed out, and the .data section is populated, does crt0 finally execute a branch-and-link instruction to call your main() function.

At this point, the safety wheels are off, your C environment is ready, and you are in full control of the silicon.