Part 1: Bootup to LED

Introduction

This post explains how to get a minimal application booting on a Teensy 3.2. Further blog posts will explain how to access the various hardware features of this chip. The goal of this tutorial series is to explore building safe hardware abstractions in Rust. As such, existing libraries for embedded devices will not be used.

The Teensy family is a set of inexpensive embedded development boards, originally designed to be programmed using the Arduino environment. The Teensy 3.2 that we’ll be targeting is based on a Freescale (NXP) MK20DX256 ARM Cortex-M4 microcontroller.

The Teensy 3.x boards use microcontrollers in the same family. Much of what’s in this series will be applicable to all of them, and you can probably follow along if you’re willing to read the datasheet for your board. I’ll try to call out where things might be different for other chips, but you’ll have the best luck using the Teensy 3.1 or 3.2 that the tutorials are written for.

This tutorial is written mostly for Linux, specifically Arch. You may have to adjust commands for other OSes, or even for other Linux distros. If anything is broken for you, please feel free to file an issue on GitHub.

Target Audience

This series of posts is aimed at someone who has done “lightweight” embedded development on an Arduino (or similar). It does not rely on you having existing knowledge of how microcontrollers are programmed at the “low level”. You will be expected to already have a basic knowledge of Rust, although I’ll cover some language details when we get into places where embedded work differs from desktop development.

A Short Introduction to Embedded Programming

Unlike with typical desktop or server applications, embedded programs do not have an operating system to provide them with hardware control. Instead, they must access the hardware directly. The exact process for hardware control varies depending on the type of processor in use. For the ARM microcontroller that we’re using, we access the hardware through memory mapped registers.

Memory mapping is assigning a special memory address which, when read from or written to, interacts with a hardware device instead of RAM. For example, address 0x4006A007 is the UART Data Register. Writing a byte to this address will cause that data to be sent across the serial port.

Writing to arbitrary memory addresses requires unsafe Rust. One of our goals through this series will be to use Rust’s language features to create safe interfaces for these unsafe memory accesses.
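As a minimal sketch of what such an access looks like, the helper below writes one byte through a raw pointer. The names write_reg and demo_write are made up for illustration and are not part of the code we build later. On the Teensy the pointer would be the UART data register address mentioned above; here an ordinary local variable stands in for the register so the snippet can run on a desktop.

```rust
use core::ptr;

/// Write one byte to a register-like location.
///
/// # Safety
/// `addr` must be valid for a one-byte write.
pub unsafe fn write_reg(addr: *mut u8, value: u8) {
    // Volatile so the compiler cannot drop or reorder the store.
    unsafe { ptr::write_volatile(addr, value) }
}

/// Desktop stand-in: "send" a byte to a fake register and read it back.
pub fn demo_write(value: u8) -> u8 {
    let mut fake_register: u8 = 0;
    unsafe {
        write_reg(&mut fake_register, value);
        ptr::read_volatile(&fake_register)
    }
}
```

The unsafe keyword marks exactly the spot where we promise the compiler that the address is valid. The rest of this series is about shrinking the amount of code that has to make that promise.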

Development Environment

Currently, embedded development requires the use of nightly Rust to be practical. While many things can now be done with stable Rust, we will still need a nightly version to access some specific hardware instructions. We’ll use Rustup to install nightly Rust.

$ rustup toolchain install nightly
$ rustup component add --target thumbv7em-none-eabi rust-std --toolchain=nightly

We need to add the appropriate stdlib for the architecture we’re targeting. For the Teensy 3.2, this is thumbv7em-none-eabi. This provides the core crate that our embedded application will be linked against.

Modern nightly versions of Rust provide lld, the LLVM linker. However, we still require binutils in order to convert our binary to a format which can be loaded onto the Teensy. For Arch, we install binutils like so:

$ sudo pacman -S arm-none-eabi-binutils

Finally, you’ll want to get the Teensy Loader. This is a small command line tool that handles flashing a program to the Teensy. If you are on Linux, it may also be available through your package manager.

Code Overview

For this first post, we’ll be focused on the bootup procedure of the MK20DX256. We’ll start by building up the skeleton of an embedded application. Next, we’ll handle some basic hardware initialization tasks. Lastly, we will add some code to turn on the Teensy’s LED. This will let us see that our code is executing on the device.

Bootup Sequence

The MK20DX256 starts up by loading an initial stack pointer and reset vector from the beginning of flash memory. The reset vector is the equivalent of main in a normal desktop application - it is the first bit of our code that will execute.
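To make those first two words concrete, here is a small desktop-side sketch that reads them back out of a flash image. It assumes the usual Cortex-M layout: a little-endian initial stack pointer at offset 0, followed by the reset vector at offset 4. The names BootVectors and boot_vectors are hypothetical, and the byte values used with it are made up.

```rust
/// First two entries of a Cortex-M vector table.
#[derive(Debug, PartialEq)]
pub struct BootVectors {
    pub initial_sp: u32,
    pub reset_vector: u32,
}

/// Read the little-endian words at offsets 0 and 4 of a flash image.
/// Returns None if the image is shorter than 8 bytes.
pub fn boot_vectors(flash: &[u8]) -> Option<BootVectors> {
    let sp = flash.get(0..4)?;
    let reset = flash.get(4..8)?;
    Some(BootVectors {
        initial_sp: u32::from_le_bytes([sp[0], sp[1], sp[2], sp[3]]),
        reset_vector: u32::from_le_bytes([reset[0], reset[1], reset[2], reset[3]]),
    })
}
```

Running this over the first eight bytes of a built image is a quick way to check that the stack pointer and entry point ended up where the hardware will look for them.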

Once our main function has control, it will have to perform some basic hardware setup - disabling the watchdog and enabling the clock gate for any peripherals that the application needs.

The watchdog is a piece of hardware which will reset the microcontroller unless the running application “checks in” in a certain interval. It’s designed to restart crashed or hung programs. For our needs in this tutorial it just adds complexity, so we will disable it.

The other part of hardware initialization is clock gating. This term comes from implementation details of how microcontrollers are constructed. You should think of a clock gate as an on/off switch for a piece of functionality. As we progress, we will need to enable the clocks for a number of hardware features.

Application Setup

We’ll start by creating a new application with cargo, and setting it to use nightly Rust.

$ cargo new --bin teensy
$ cd teensy
$ rustup override set nightly

The first thing to do is make our program embedded-friendly. There are a few major changes to src/main.rs that we’ll need to make. Here’s the new code, with explanations below:

#![feature(stdsimd)]
#![no_std]
#![no_main]

#[no_mangle]
pub extern fn main() {
    loop{}
}

The first line enables the use of intrinsics, and is the reason we need nightly Rust. The next two lines actually disable features of the Rust environment - the standard library, and the main wrapper. The Rust standard library relies on a full operating system, and can’t typically be used for embedded development. Instead, we will have access to libcore, which is the subset of std that is available without an OS. Similarly, the main wrapper is used for application setup tasks that aren’t necessary in embedded programs.

Lastly, we’ve marked main as an extern function, and added an infinite loop to it. Extern tells the Rust compiler that this function follows the C calling convention. The details of what this does vary by target, and are beyond the scope of this post. The important effect of the change is that it’s now safe to use main as our reset vector. Adding the infinite loop ensures that main will never return. There’s no code for main to return to in this embedded environment.

Language Items

The Rust compiler relies on certain functionality to be defined by the standard library. Unfortunately for us, we just disabled it. This means that we are responsible for providing these features.

For now, the only language feature we’re responsible for is the panic handler. This is the function that gets called to display a message when our code panics. We will eventually want to pass these messages along to the user, but initially we will ignore them and hang the program.

#[panic_handler]
fn teensy_panic(_pi: &core::panic::PanicInfo) -> ! {
    loop {};
}

Static Data

There are two arrays of data that the hardware expects. The first is the interrupt table. This contains the initial stack pointer and reset vector that were mentioned earlier. The second is the flash configuration. This is a block of 16 bytes which control how the flash can be read and written. The Teensy bootloader makes assumptions about these values, so we will use the same set of bytes as the Teensy Arduino tooling. Specifically, we disable all flash security through the FSEC field, and tell the processor to boot into high-power mode with FOPT.

extern {
    fn _stack_top();
}

#[link_section = ".vectors"]
#[no_mangle]
pub static _VECTORS: [unsafe extern fn(); 2] = [
    _stack_top,
    main,
];

#[link_section = ".flashconfig"]
#[no_mangle]
pub static _FLASHCONFIG: [u8; 16] = [
    0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF,
    0xFF, 0xFF, 0xFF, 0xFF, 0xDE, 0xF9, 0xFF, 0xFF
];

We will use the link_section attributes in a minute to control where in the flash memory these arrays end up. The no_mangle attribute is needed to tell Rust that these arrays have special meaning at link time. Without it, the data will not appear in our final executable.

_stack_top is not really a function. It is a memory address representing the initial stack pointer. We pretend that it is a function so that our _VECTORS array is easier to write. Fortunately calling it from our own code is unsafe, so we can be pretty sure that only the hardware will read these values.
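If you’re curious which of those 16 flash configuration bytes are FSEC and FOPT, the sketch below picks them out and checks the two bits described above. The byte offsets and bit meanings are my reading of the Kinetis flash configuration field layout, so treat them as assumptions and verify them against the reference manual for your chip. The helper names are hypothetical.

```rust
/// The 16-byte flash configuration block from the listing above.
pub const FLASHCONFIG: [u8; 16] = [
    0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF,
    0xFF, 0xFF, 0xFF, 0xFF, 0xDE, 0xF9, 0xFF, 0xFF,
];

// Byte offsets within the block (assumed, see the note above).
const FSEC_OFFSET: usize = 0x0C;
const FOPT_OFFSET: usize = 0x0D;

/// True when the SEC field (FSEC bits 1:0) reads 0b10, meaning unsecured.
pub fn is_unsecured(cfg: &[u8; 16]) -> bool {
    (cfg[FSEC_OFFSET] & 0b11) == 0b10
}

/// True when LPBOOT (FOPT bit 0) is set, meaning a normal, full-power boot.
pub fn boots_high_power(cfg: &[u8; 16]) -> bool {
    (cfg[FOPT_OFFSET] & 0b1) == 0b1
}
```

With the bytes we copied from the Arduino tooling, both checks come out true: 0xDE ends in the bits 10, and 0xF9 has its lowest bit set.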

Compiling and Linking

Our program now contains the important data tables, as well as a main that can be called by the microcontroller. We will now turn our attention to building the project for the Teensy. We’ll use a Makefile to handle the build process. Laying out the code and data in the Teensy’s flash memory is done with a linker script.

Linker Scripts

The linker script includes information on available memory regions, and how program code and data are organized within those regions. Our linker script will start out very simply, as we don’t have a lot going on in our program. We’ll put our linker script in a new file called layout.ld:

MEMORY
{
    FLASH (rx) : ORIGIN = 0x00000000, LENGTH = 256K
    RAM  (rwx) : ORIGIN = 0x1FFF8000, LENGTH = 64K
}

SECTIONS
{
    .text : {
        . = 0;
        KEEP(*(.vectors))
        . = 0x400;
        KEEP(*(.flashconfig*))
        . = ALIGN(4);
        *(.text*)
    } > FLASH = 0xFF

    .rodata : {
        *(.rodata*)
    } > FLASH

    _stack_top = ORIGIN(RAM) + LENGTH(RAM);

    /DISCARD/ : {
        *(.ARM.*)
    }
}

We begin by specifying the address ranges of both FLASH and RAM. Next we lay out our program code at the beginning of the flash memory. We start with the interrupt table in the first 1024 bytes. After that comes the flash configuration at address 0x0400. The rest of flash can be laid out however we want. For now, we add our program code and read-only data right after the flash configuration block.

This file also specifies the address of _stack_top as the highest available memory address. Since we currently have no other data in RAM, this means that our stack can grow to fill up the entire memory space. We’ll eventually constrain it by adding additional data to RAM.
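The arithmetic behind _stack_top is simple enough to check by hand. This sketch repeats it in Rust, using the RAM origin and length from the MEMORY block above. The constant and function names are only for illustration.

```rust
const RAM_ORIGIN: u32 = 0x1FFF_8000;
const RAM_LENGTH: u32 = 64 * 1024;

/// Mirrors `_stack_top = ORIGIN(RAM) + LENGTH(RAM)` from layout.ld.
pub const fn stack_top() -> u32 {
    RAM_ORIGIN + RAM_LENGTH
}
```

The result, 0x20008000, is actually one byte past the end of RAM. That is fine on a Cortex-M, because the stack pointer is decremented before each push, so the first value stored lands in the last word of RAM.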

Finally, we discard any sections in the executable that match the pattern .ARM.*. This is all metadata that we don’t need in our binary, and would waste space in our constrained environment.

Rust Linker Configuration

Rust will not use our linker script until we tell it to. This is done with a cargo configuration file, which must be named .cargo/config. While we’re here, we’ll also tell Cargo to target our microcontroller.

[build]
target = "thumbv7em-none-eabi"

[target.thumbv7em-none-eabi]
rustflags = [
    "-C", "link-arg=-Tlayout.ld",
]

The Makefile

Building our project is a three-step process:

Use cargo to compile the rust code for the target processor
Convert the built program to the right format with objcopy
Flash our code to the Teensy with teensy_loader_cli

This sort of repetitive sequenced work is exactly what Makefiles were designed for. Ours isn’t particularly complicated:

BIN=teensy
OUTDIR=target/thumbv7em-none-eabi/release
HEX=$(OUTDIR)/$(BIN).hex
ELF=$(OUTDIR)/$(BIN)

all:: $(ELF)

.PHONY: $(ELF)
$(ELF):
	cargo build --release

$(HEX): $(ELF)
	arm-none-eabi-objcopy -O ihex $(ELF) $(HEX)

.PHONY: flash
flash: $(HEX)
	teensy_loader_cli -w -mmcu=mk20dx256 $(HEX) -v

The targets marked .PHONY will always be built by make, even if it thinks they are up-to-date. This is needed for the ELF target, since its dependencies are managed by cargo. We also add it to the flash target so that make will still flash our image even if someone creates a file called “flash”.

Running make will build our project. make flash will install it to a Teensy.

At this point, we can compile our project and flash it to a Teensy. Sadly, it does nothing interesting - we wouldn’t even be able to tell if it was running. In the next section we’ll expand our code to do something more useful.

Accessing The Hardware

Our first steps here will be some basic hardware initialization tasks. We’ll build accessors for the watchdog and for the System Integration Module, or SIM. The SIM handles clock gating as well as most other global configuration of the microcontroller. Once we have those in place, we’ll turn to the I/O functions necessary to turn on the LED.

Disabling the Watchdog

The first bit of hardware setup we’ll do is disabling the watchdog. The watchdog is controlled through a block of twelve 16-bit registers starting at address 0x40052000. This can be represented in Rust as a packed structure.

#[repr(C,packed)]
pub struct Watchdog {
    stctrlh: u16,
    stctrll: u16,
    tovalh: u16,
    tovall: u16,
    winh: u16,
    winl: u16,
    refresh: u16,
    unlock: u16,
    tmrouth: u16,
    tmroutl: u16,
    rstcnt: u16,
    presc: u16
}

We’ll add this struct to a new file - src/watchdog.rs. The fields of this struct use the same names that the manufacturer does for these registers. They’re hard to read here, but being consistent makes searching for their documentation much easier.
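Since a misplaced field would silently send our writes to the wrong register, it can be worth checking the layout on a desktop before flashing anything. The sketch below does that for a mirror of the watchdog block. WatchdogLayout, unlock_offset and block_size are hypothetical names, and the mirror uses plain repr(C) on the assumption that a struct made entirely of u16 fields has no padding either way.

```rust
use core::mem::size_of;
use core::ptr::addr_of;

// Desktop mirror of the register block, used only for layout checks.
#[allow(dead_code)]
#[repr(C)]
#[derive(Default)]
pub struct WatchdogLayout {
    stctrlh: u16,
    stctrll: u16,
    tovalh: u16,
    tovall: u16,
    winh: u16,
    winl: u16,
    refresh: u16,
    unlock: u16,
    tmrouth: u16,
    tmroutl: u16,
    rstcnt: u16,
    presc: u16,
}

/// Byte offset of the `unlock` field from the start of the block.
pub fn unlock_offset() -> usize {
    let w = WatchdogLayout::default();
    let base = addr_of!(w) as usize;
    let field = addr_of!(w.unlock) as usize;
    field - base
}

/// Total size of the block in bytes.
pub fn block_size() -> usize {
    size_of::<WatchdogLayout>()
}
```

Twelve 16-bit registers should come to 24 bytes, with unlock as the eighth register at offset 0x0E. If either number disagrees with the manufacturer’s register map, a field is missing or out of order.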

Once we have a struct representing the hardware, we need to build our functions to access it safely. To design this abstraction, we need to think about the invariants of accessing these registers. An invariant is any rule or condition that our unsafe code must take into account, in order for it to be safely callable by safe code. Fortunately the watchdog is pretty simple - it looks just like a struct in memory, and can be treated as such. The biggest invariants here are Rust’s rules about reference aliasing. There can only be one mutable reference to the watchdog struct.

For now, we will say that acquiring a reference to the watchdog is an unsafe operation. This puts the responsibility on the calling code to verify there is only one mutable reference. Once we have that reference, all the functions to update the watchdog will be safe - after all, we’re just changing some fields in memory.

In reality, using the watchdog to its full potential could introduce additional invariants. For example, requiring that a certain value be written to a watchdog register during your main loop. This is not a memory safety issue, and thus strictly falls outside of Rust’s idea of safety. It could cause correctness issues, though, and good API design will try to minimize correctness errors - even if they’re technically “safe”.

The watchdog’s implementation looks like this. Note that new is unsafe, but disable is safe.

use core::arch::arm::__NOP;

impl Watchdog {
    pub unsafe fn new() -> &'static mut Watchdog {
        &mut *(0x40052000 as *mut Watchdog)
    }

    pub fn disable(&mut self) {
        unsafe {
            core::ptr::write_volatile(&mut self.unlock, 0xC520);
            core::ptr::write_volatile(&mut self.unlock, 0xD928);
            __NOP();
            __NOP();
            let mut ctrl = core::ptr::read_volatile(&self.stctrlh);
            ctrl &= !(0x00000001);
            core::ptr::write_volatile(&mut self.stctrlh, ctrl);
        }
    }
}

The disable function is following the procedure set forth in the manufacturer’s data sheet. The watchdog is protected against being accidentally disabled by a random write to memory, so our code must “unlock” it first, by writing special values to the unlock register. Once that’s done, we need to wait for the watchdog to actually unlock itself. The __NOP intrinsic tells the processor to briefly do nothing. This introduces our necessary 2-cycle delay. Finally, we read the control register and un-set the “enable” bit.

All of our memory accesses are volatile. This tells the Rust compiler that the read (or write) has an effect that it can’t see from our program code. In this case, that effect is a hardware access. Without marking our memory accesses volatile, the Rust compiler would be free to say “You never read from unlock, so I will optimize away the unneeded write to it”. This would, naturally, cause our code to fail.
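Repeating read_volatile and write_volatile at every access gets noisy. One way to tidy that up, sketched below, is a small wrapper type whose every access is volatile. Reg is a hypothetical helper and is not used in the rest of this post; the demo function runs it against a local variable rather than real hardware.

```rust
use core::ptr;

/// Hypothetical wrapper: one register whose every access is volatile.
#[repr(transparent)]
pub struct Reg<T: Copy>(T);

impl<T: Copy> Reg<T> {
    /// Volatile read of the current value.
    pub fn read(&self) -> T {
        unsafe { ptr::read_volatile(&self.0) }
    }

    /// Volatile write of a new value.
    pub fn write(&mut self, value: T) {
        unsafe { ptr::write_volatile(&mut self.0, value) }
    }

    /// Read, transform, and write back.
    pub fn modify(&mut self, f: impl FnOnce(T) -> T) {
        let value = self.read();
        self.write(f(value));
    }
}

/// Desktop demo: clear bit 0 of a stand-in control register.
pub fn demo_clear_enable(initial: u16) -> u16 {
    let mut ctrl = Reg(initial);
    ctrl.modify(|v| v & !0x0001);
    ctrl.read()
}
```

The unsafe blocks are confined to two one-line methods, and callers only ever see safe functions. That is the same trade we make in the watchdog code: unsafe to obtain the reference, safe to use it.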

This disable process shows why we must have only one mutable reference to the watchdog. If an interrupt were to occur partway through this function and write to the watchdog, our attempt to disable it would fail. Knowing that an interrupt cannot change watchdog settings gives us confidence that this code will execute as we expect.

Clock Gating

The other piece of hardware involved in the microcontroller setup is the System Integration Module. We’ll use this to enable the appropriate clock gate to enable our I/O port. Just like the watchdog, the SIM is controlled through a block of memory, which will also be represented as a struct. It has the same basic memory safety rules as the watchdog does, and for now has no extra memory-safety invariants.

There is a potential correctness issue involved with the SIM - it’s possible to use a mutable reference to the SIM to disable a hardware function that another section of code relies on. We can design an API that keeps better track of which functional units are needed, but we will save that for a future post. For now, we’ll just have to trust ourselves.

The complete code for src/sim.rs is here:

use core;

#[derive(Clone,Copy)]
pub enum Clock {
    PortC,
}

#[repr(C,packed)]
pub struct Sim {
    sopt1: u32,
    sopt1_cfg: u32,
    _pad0: [u32; 1023],
    sopt2: u32,
    _pad1: u32,
    sopt4: u32,
    sopt5: u32,
    _pad2: u32,
    sopt7: u32,
    _pad3: [u32; 2],
    sdid: u32,
    _pad4: [u32; 3],
    scgc4: u32,
    scgc5: u32,
    scgc6: u32,
    scgc7: u32,
    clkdiv1: u32,
    clkdiv2: u32,
    fcfg1: u32,
    fcfg2: u32,
    uidh: u32,
    uidmh: u32,
    uidml: u32,
    uidl: u32
}

impl Sim {
    pub unsafe fn new() -> &'static mut Sim {
        &mut *(0x40047000 as *mut Sim)
    }

    pub fn enable_clock(&mut self, clock: Clock) {
        unsafe {
            match clock {
                Clock::PortC => {
                    let mut scgc = core::ptr::read_volatile(&self.scgc5);
                    scgc |= 0x00000800;
                    core::ptr::write_volatile(&mut self.scgc5, scgc);
                }
            }
        }
    }
}

The simple match-based clock management we have here would get unwieldy pretty quickly if we intended to use it to manage a large number of hardware functions. We’ll get rid of it when we look into more robust ways to manage clock gates.
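To give a feel for how the match could grow, here is a sketch that moves the bit mask onto the enum itself. This is only one possible direction, not necessarily the one later posts will take. The PortC mask matches the 0x00000800 used above; the bit positions for the other ports are my reading of the SCGC5 register description and should be treated as assumptions.

```rust
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum Clock {
    PortA,
    PortB,
    PortC,
    PortD,
    PortE,
}

impl Clock {
    /// Bit in SCGC5 that gates this port's clock.
    pub const fn scgc5_mask(self) -> u32 {
        match self {
            Clock::PortA => 1 << 9,
            Clock::PortB => 1 << 10,
            Clock::PortC => 1 << 11,
            Clock::PortD => 1 << 12,
            Clock::PortE => 1 << 13,
        }
    }
}

/// Pure version of the read-modify-write in enable_clock.
pub fn with_clock_enabled(scgc5: u32, clock: Clock) -> u32 {
    scgc5 | clock.scgc5_mask()
}
```

Keeping the mask computation in a pure function like this also makes it testable on a desktop, with the volatile read and write left as a thin shell around it.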

I/O Ports

With the initial hardware setup out of the way, we can turn our attention to achieving that bright orange¹ glow that we’ve been working towards. We will put a pin into GPIO mode, and use it to turn on the LED. GPIO stands for “General Purpose I/O”. When a pin is in GPIO mode, software has control over the high/low state of an output pin and direct read access to the state of an input pin. This is in contrast to the pin being controlled by a dedicated hardware function, such as a serial port.

Pins are grouped into ports, and all of a pin’s settings are controlled from the port’s register block. This poses a bit of a challenge for us. We’d like each pin to be a self-contained struct, so that ownership of it can be passed from one software module to another, and only the owning module can mutate its pins. This follows Rust’s one-owner rule for pins, but would require that each pin be able to mutate its settings in the Port register block. We all know how Rust feels about shared mutable state.

Fortunately, each pin has a separate control register in the port’s block. That means there’s no actual overlap of memory locations that might be written. We’ll take advantage of this to write some very, very careful unsafe code that allows each pin instance to modify its own control settings.

We’ll start out with a port implementation in src/port.rs.

use core;

#[derive(Clone,Copy)]
pub enum PortName {
    C
}

#[repr(C,packed)]
pub struct Port {
    pcr: [u32; 32],
    gpclr: u32,
    gpchr: u32,
    reserved_0: [u8; 24],
    isfr: u32,
}

impl Port {
    pub unsafe fn new(name: PortName) -> &'static mut Port {
        &mut * match name {
            PortName::C => 0x4004B000 as *mut Port
        }
    }

    pub unsafe fn set_pin_mode(&mut self, p: usize, mut mode: u32) {
        let mut pcr = core::ptr::read_volatile(&self.pcr[p]);
        pcr &= 0xFFFFF8FF;
        mode &= 0x00000007;
        mode <<= 8;
        pcr |= mode;
        core::ptr::write_volatile(&mut self.pcr[p], pcr);
    }
}

Note the array of 32 words called pcr; each of these is an individual pin control register. The set_pin_mode function is responsible for switching a single pin into GPIO (or any other) mode. The only memory it touches is the PCR associated with a single pin. It is unsafe to call because calling it for a pin that you do not own could cause a race condition: an interrupt that changes a PCR between the read and write in this function could have its changes overwritten.
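The bit manipulation inside set_pin_mode is easier to follow when it is pulled out into a pure function. The sketch below does the same masking and shifting, with the mode field occupying bits 8 through 10 of the PCR as in the code above. The names with_mux, MUX_SHIFT and MUX_MASK are made up for this example.

```rust
const MUX_SHIFT: u32 = 8;
const MUX_MASK: u32 = 0x7 << MUX_SHIFT;

/// Return `pcr` with its 3-bit mode field replaced by `mode`.
/// Bits of `mode` above the low three are ignored, and every
/// other bit of `pcr` is left unchanged.
pub const fn with_mux(pcr: u32, mode: u32) -> u32 {
    (pcr & !MUX_MASK) | ((mode & 0x7) << MUX_SHIFT)
}
```

For example, selecting mode 1 (GPIO) on a register that is currently all ones gives 0xFFFFF9FF: only the three mode bits change.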

The pin struct is next on our list. A pin is not a reference to any particular register. Instead, it is a concept in our code that represents a piece of a port. It will have a mutable reference to its containing port, as well as an integer representing which index in the PCR array it is associated with.

In order for this mutable port reference to be safe, Pin instances must only call methods of Port that affect the correct PCR. We can’t really enforce this, but to encourage it, Pin’s Port reference will actually be a pointer. This makes it impossible to call Port methods without an unsafe block, and reinforces the peculiarity of this arrangement.

pub struct Pin {
    port: *mut Port,
    pin: usize
}

impl Port {
    pub unsafe fn pin(&mut self, p: usize) -> Pin {
        Pin { port: self, pin: p }
    }
}

GPIO and the Bit-Band

There are two ways to access the GPIO registers. The first is through a block of 32-bit registers, associated with a port. It looks something like this:

#[repr(C,packed)]
struct Gpio {
    pdor: u32,
    psor: u32,
    pcor: u32,
    ptor: u32,
    pdir: u32,
    pddr: u32
}

This is very convenient to work with, but has an unfortunate flaw. Each of the fields represents all 32 pins in a Port. This means that any pin changes are subject to a race condition during our read/modify/write process. Pins that are owned by a separate piece of code can have an impact on how our pin behaves.
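To see how that race loses data, here is a desktop simulation of two read-modify-write sequences on the same register value, with the second one (standing in for an interrupt) landing between the first one’s read and write. The function name and the choice of pins 3 and 5 are purely illustrative.

```rust
/// Simulate a read-modify-write that is interrupted by another
/// read-modify-write of the same register.
/// Returns (value after the interrupt, final value).
pub fn interrupted_update(initial: u32) -> (u32, u32) {
    let mut register = initial;

    let main_copy = register; // main code reads the register
    let irq_copy = register; // interrupt fires and reads it too
    register = irq_copy | (1 << 3); // interrupt sets pin 3
    let after_irq = register;

    register = main_copy | (1 << 5); // main writes its stale copy plus pin 5
    (after_irq, register)
}
```

Starting from zero, the interrupt leaves the register at 0b001000, but the main code then overwrites it with 0b100000. Pin 3’s change has vanished, even though neither piece of code touched the other’s pin on purpose.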

Fortunately, ARM has a solution to this. We will take advantage of the bit-band alias. Bit-banding is a feature of certain ARM processors that maps a memory region to one 32 times as large. Each 32-bit word of this larger region maps to a single bit of the original region. This gives us the capability to set or clear a single bit at a time, without risk of race conditions. If we visualized this as a Rust struct, the bit-band alias for the GPIO would look like this:

#[repr(C,packed)]
struct GpioBitband {
    pdor: [u32; 32],
    psor: [u32; 32],
    pcor: [u32; 32],
    ptor: [u32; 32],
    pdir: [u32; 32],
    pddr: [u32; 32]
}

This is what we will use to control the GPIO. Just like with Pins and the PCR registers, we will have individual GPIO structures that represent a single GPIO pin. They will ensure safety by only writing to the register words associated with their pin index. Let’s look at all that code now, then walk through it.

pub struct Gpio {
    gpio: *mut GpioBitband,
    pin: usize
}

impl Port {
    pub fn name(&self) -> PortName {
        let addr = (self as *const Port) as u32;
        match addr {
            0x4004B000 => PortName::C,
            _ => unreachable!()
        }
    }
}

impl Pin {
    pub fn make_gpio(self) -> Gpio {
        unsafe {
            let port = &mut *self.port;
            port.set_pin_mode(self.pin, 1);
            Gpio::new(port.name(), self.pin)
        }
    }
}

impl Gpio {
    pub unsafe fn new(port: PortName, pin: usize) -> Gpio {
        let gpio = match port {
            PortName::C => 0x43FE1000 as *mut GpioBitband
        };

        Gpio { gpio, pin }
    }

    pub fn output(&mut self) {
        unsafe {
            core::ptr::write_volatile(&mut (*self.gpio).pddr[self.pin], 1);
        }
    }

    pub fn high(&mut self) {
        unsafe {
            core::ptr::write_volatile(&mut (*self.gpio).psor[self.pin], 1);
        }
    }
}

The Gpio struct, just like the Pin struct, holds a pointer to the shared data block, as well as an index of its pin number. It has two functions: one to set itself as an output, and one to set its output value to high. Thanks to the bit-band, these functions can be implemented with a single write, eliminating the potential race condition that a read-modify-write of a shared memory address would create.
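The 0x43FE1000 address in Gpio::new is not arbitrary. It follows from the standard Cortex-M bit-band formula, sketched below for the peripheral region. The function name is made up, and the GPIO C register addresses used to exercise it (0x400FF080 for the block, 0x400FF094 for its direction register) are my reading of the chip’s memory map, so double-check them against the reference manual.

```rust
const PERIPH_BASE: u32 = 0x4000_0000;
const PERIPH_BITBAND_BASE: u32 = 0x4200_0000;

/// Address of the alias word for bit `bit` of the register at `addr`.
/// Each byte of the original region expands to 32 bytes of alias
/// space, and each bit gets its own 4-byte word.
pub const fn bitband_alias(addr: u32, bit: u32) -> u32 {
    PERIPH_BITBAND_BASE + (addr - PERIPH_BASE) * 32 + bit * 4
}
```

Bit 0 of the register at 0x400FF080 maps to 0x43FE1000, the start of our GpioBitband struct. Bit 5 of the direction register maps to 0x43FE1294, which is exactly where pddr[5] sits in that struct: five arrays of 128 bytes each, plus five words.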

Converting a Pin into a Gpio consumes the Pin. This prevents having more than one reference to a single hardware pin. Getting another copy of a pin from the port is unsafe, so we can be confident that safe code will never make a second copy of a pin that is in use as a GPIO.

Putting it Together

We now have all the pieces for our first program. Going back to the beginning, our application will do the following:

disable the watchdog
turn on the clock gate for Port C
grab pin 5 from that port, and make it a GPIO
set that GPIO high to light the LED

This all ends up being surprisingly short in main:

mod port;
mod sim;
mod watchdog;

extern fn main() {
    let (wdog,sim,pin) = unsafe {
        (watchdog::Watchdog::new(),
         sim::Sim::new(),
         port::Port::new(port::PortName::C).pin(5))
    };

    wdog.disable();
    sim.enable_clock(sim::Clock::PortC);

    let mut gpio = pin.make_gpio();

    gpio.output();
    gpio.high();

    loop {}
}

Our only unsafe code in main is creating the mutable references to the various register blocks. Creating these is always unsafe, since more than one would violate Rust’s memory safety rules. The rest of the code is 100% safe.

It’s finally time to send our first pure-Rust embedded program to the Teensy! Connect your Teensy to a USB port, then run make flash. You should see the LED on the Teensy light up once the process is complete. If it doesn’t, double-check your linker script, and the link sections of the _VECTORS and _FLASHCONFIG arrays. You might also double-check the addresses of the register blocks.

Our next post will look at enabling a UART for serial communication. This will give us access to real panic messages. We’ll take advantage of panics to enforce some of our rules about duplicate pins, similar to how Rust’s RefCell panics on duplicate mutable accesses.

¹ All genuine Teensys have an orange LED. If yours has a different color, I’m sad to say it’s a knockoff.