UART (Universal Asynchronous Receiver-Transmitter)


The objective of this lesson is to understand UART, and use two boards and setup UART communication between them.



Figure 1. UART connection between two devices.

For Universal Asynchronous Receiver Transmitter. There is one wire for transmitting data (TX), and one wire to receive data (RX).


A common parameter is the baud rate known as "bps" which stands for bits per second. If a transmitter is configured with 9600bps, then the receiver must be listening on the other end at the same speed.

UART Frame

UART is a serial communication, so bits must travel on a single wire. If you wish to send a 8-bit byte (uint8_t) over UART, the byte is enclosed within a start and a stop bit. To send a byte, it would require 2-bits of overhead; this 10-bit of information is called a UART frame. Let's take a look at how the character 'A' is sent over UART. In ASCII table, the character 'A' has the value of 65, which in binary is: 0100_0001. If you inform your UART hardware that you wish to send this data at 9600bps, here is how the frame would appear on an oscilloscope :


Figure 2. UART Frame sending letter 'A' 

UART Ports

A micrcontroller can have multiple UARTs. Typically, the UART0 peripheral is interfaced to with a USB to serial port converter which allows users to communicate between the computer and microcontroller. This port is used to program your microcontroller.


  • Hardware complexity is low.
  • No clock signal needed
  • Has a parity bit to allow for error checking
  • As this is one to one connection between two devices, device addressing is not required.


  • The size of the data frame is limited to a maximum of 8 bits (some micros may support non-standard data bits)
  • Doesn’t support multiple slave or multiple master systems
  • The baud rates of each UART must be within 10% (or lower, depending on device tolerance) of each other

Hardware Design


Figure 3. Simplified UART peripheral design for the STM32F429. SCLK is used for USART.

WARNING: The above is missing a common ground connection

Software Driver

The UART chapter on LPC17xx has a really good summary page on how to write a UART driver. 
Read the register description of each UART register to understand how to write a driver. 

Memory Shadowing in UART driver


Figure 4. Memory Shadowing using DLAB Bit Register

In figure 4, you will see that registers RBR/THR and DLM have the same address 0x4000C000. These registers are shadowed using the DLAB control bit. Setting DLAB bit to 1 allows the user to manipulate DLL and DLM, and clearing DLAB to 0 will allow you to manipulate the THR and RBR registers.  

The reason that the DLL register shares the same memory address as the RBR/THR may be historic.  My guess is that it was intentionally hidden such that a user cannot accidentally modify the DLL register.  Even if this case is not very significant present day, the manufacturer is probably using the same UART verilog code from many decades ago.

Control Space Divergence (CSD) in UART driver

In figure 4, you will see that register RBR and THR have the same address 0x4000C000. But also notice that access to each respective register is only from read or write operations. For example, if you read from memory location 0x4000C000, you will get the information from receive buffer and if you write to memory location 0x4000C000, you will write to a separate register which the transmit holding register. We call this Control Space Diverence since access of two separate registers or devices is done on a single address using the read/write control signal is used to multiplex between them. That address is considered to be Control Space Diverent. Typically, the control space aligns with its respective memory or io space.

Note that Control Space Divergence does not have a name outside of this course. It is Khalil Estell's phrase for this phenomenon. (Its my word... mine!)

BAUD Rate Formula


Figure 5. Baud rate formula  

To set the baud rate you will need to manipulate the DLM and DLL registers. Notice the 256*UnDLM in the equation. That is merely another way to write the following (DLM << 8). Shifting a number is akin to multiplying it by 2 to the power of the number of shifts. DLM and DLL are the lower and higher 8-bits of a 16 bit number that divides the UART baudrate clock. DivAddVal and MulVal are used to fine tune the BAUD rate, but for this class, you can simply get "close enough" and ignore these values. Take these into consideration when you need an extremely close baudrate.

Advanced Design

If you used 9600bps, and sent 1000 characters, your processor would basically enter a "busy-wait" loop and spend 1040ms to send 1000 bytes of data. You can enhance this behavior by allowing your uart send function to enter data to a queue, and return immediately, and you can use the THRE or "Transmitter Holding Register Empty" interrupt indicator to remove your busy-wait loop while you wait for a character to be sent.


FreeRTOS & Tasks

Introduction to FreeRTOS


To introduce what, why, when, and how to use Real Time Operating Systems (RTOS) as well as get you
started using it with the SJSU-Dev environment.
I would like to note that this page is mostly an aggregation of information from Wikipedia and the FreeRTOS

What is an OS?

Operating system (OS) is system software that manages computer hardware and software resources and provides common services for computer programs. - Wikipedia

Operating systems like Linux or Windows

They have services to make communicating with Networking devices and files systems possible without having
to understand how the hardware works. Operating systems may also have a means to multi-tasking by allow
multiple processes to share the CPU at a time. They may also have means for allowing processes to
communicate together.

What is an RTOS?

An RTOS is an operating system that meant for real time applications. They typically have fewer services such
as the following:

  • Parallel Task Scheduler
  • Task communication (Queues or Mailboxes)
  • Task synchronization (Semaphores)

Why use an RTOS?

You do not need to use an RTOS to write good embedded software. At some point
though, as your application grows in size or complexity, the services of an RTOS might
become beneficial for one or more of the reasons listed below. These are not absolutes,
but opinion. As with everything else, selecting the right tools for the job in hand is an
important first step in any project.
In brief:

  • Abstract out timing information

The real time scheduler is effectively a piece of code that allows you to specify the
timing characteristics of your application - permitting greatly simplified, smaller (and
therefore easier to understand) application code.

  • Maintainability/Extensibility

Not having the timing information within your code allows for greater maintainability
and extensibility as there will be fewer interdependencies between your software
modules. Changing one module should not effect the temporal behavior of another
module (depending on the prioritization of your tasks). The software will also be less
susceptible to changes in the hardware. For example, code can be written such that it
is temporally unaffected by a change in the processor frequency (within reasonable

  • Modularity

Organizing your application as a set of autonomous tasks permits more effective
modularity. Tasks should be loosely coupled and functionally cohesive units that within
themselves execute in a sequential manner. For example, there will be no need to
break functions up into mini state machines to prevent them taking too long to execute
to completion.

  • Cleaner interfaces

Well defined inter task communication interfaces facilitates design and team

  • Easier testing (in some cases)

Task interfaces can be exercised without the need to add instrumentation that may
have changed the behavior of the module under test.

  • Code reuse

Greater modularity and less module interdependencies facilitates code reuse across
projects. The tasks themselves facilitate code reuse within a project. For an example
of the latter, consider an application that receives connections from a TCP/IP stack -
the same task code can be spawned to handle each connection - one task per

  • Improved efficiency?

Using FreeRTOS permits a task to block on events - be they temporal or external to
the system. This means that no time is wasted polling or checking timers when there
are actually no events that require processing. This can result in huge savings in
processor utilization. Code only executes when it needs to. Counter to that however is
the need to run the RTOS tick and the time taken to switch between tasks. Whether
the saving outweighs the overhead or vice versa is dependent of the application. Most
applications will run some form of tick anyway, so making use of this with a tick hook
function removes any additional overhead.

  • Idle time

It is easy to measure the processor loading when using Whenever the
idle task is running you know that the processor has nothing else to do. The idle task
also provides a very simple and automatic method of placing the processor into a low
power mode.

  • Flexible interrupt handling

Deferring the processing triggered by an interrupt to the task level permits the interrupt
handler itself to be very short - and for interrupts to remain enabled while the task level
processing completes. Also, processing at the task level permits flexible prioritization -
more so than might be achievable by using the priority assigned to each peripheral by
the hardware itself (depending on the architecture being used).

  • Mixed processing requirements

Simple design patterns can be used to achieve a mix of periodic, continuous and
event driven processing within your application. In addition, hard and soft real time
requirements can be met though the use of interrupt and task prioritisation.

  • Easier control over peripherals

Gatekeeper tasks facilitate serialization of access to peripherals - and provide a good
mutual exclusion mechanism.

  • Etcetera

- FreeRTOS Website (

Design Scenario

Building a controllable assembly conveyor belt

 TaQcontrollable assembly-conveyor-belt-system.png

Think about the following system. Reasonable complex, right?

Without a scheduler

 ✓  Small code size.
 ✓  No reliance on third party source code.
 ✓  No RTOS RAM, ROM or processing overhead.
 ✗  Difficult to cater for complex timing requirements.
 ✗  Does not scale well without a large increase in complexity.
 ✗  Timing hard to evaluate or maintain due to the inter-dependencies between the different functions.

With a scheduler

 ✓  Simple, segmented, flexible, maintainable design with few inter-dependencies.
 ✓  Processor utilization is automatically switched from task to task on a most urgent need basis with no
explicit action required within the application source code.
 ✓  The event driven structure ensures that no CPU time is wasted polling for events that have not occurred.
Processing is only performed when there is work needing to be done.
  *   Power consumption can be reduced if the idle task places the processor into power save (sleep) mode,
but may also be wasted as the tick interrupt will sometimes wake the processor unnecessarily.
  *   The kernel functionality will use processing resources. The extent of this will depend on the chosen
kernel tick frequency.
 ✗  This solution requires a lot of tasks, each of which require their own stack, and many of which require a
queue on which events can be received. This solution therefore uses a lot of RAM.
 ✗  Frequent context switching between tasks of the same priority will waste processor cycles.

FreeRTOS Tasks

What is an FreeRTOS Task?

A FreeRTOS task is a function that is added to the FreeRTOS scheduler using the xCreateTask() API call.

A task will have the following:

  1. A Priority level
  2. Memory allocation
  3. Singular input parameter (optional)
  4. A name
  5. A handler (optional): A data structure that can be used to reference the task later. 

A FreeRTOS task declaration and definition looks like the following:

void vTaskCode( void * pvParameters )
    /* Grab Parameter */
    uint32_t c = (uint32_t)(pvParameters);
    /* Define Constants Here */
    const uint32_t COUNT_INCREMENT = 20;
    /* Define Local Variables */
    uint32_t counter = 0;
    /* Initialization Code */
    /* Code Loop */
        /* Insert Loop Code */
    /* Only necessary if above loop has a condition */

Rules for an RTOS Task

  • The highest priority ready tasks ALWAYS runs
    • If two or more have equal priority, then they are time sliced
  • Low priority tasks only get CPU allocation when:
    • All higher priority tasks are sleeping, blocked, or suspended 
  • Tasks can sleep in various ways, a few are the following:
    • Explicit "task sleep" using API call vTaskDelay();
    • Sleeping on a semaphore
    • Sleeping on an empty queue (reading)
    • Sleeping on a full queue (writing)

Adding a Task to the Scheduler and Starting the Scheduler

The following code example shows how to use xTaskCreate() and how to start the scheduler using vTaskStartScheduler()

int main(int argc, char const *argv[])
    //// You may need to change this value.
    const uint32_t STACK_SIZE = 128;
    xReturned = xTaskCreate(
                    vTaskCode,       /* Function that implements the task. */
                    "NAME",          /* Text name for the task. */
                    STACK_SIZE,      /* Stack size in words, not bytes. */
                    ( void * ) 1,    /* Parameter passed into the task. */
                    tskIDLE_PRIORITY,/* Priority at which the task is created. */
                    &xHandle );      /* Used to pass out the created task's handle. */

    /* Start Scheduler */

    return 0;

Task Priorities

High Priority and Low Priority tasks


In the above situation, the high priority task task never sleeps, so it is always running. In this situation where the low priority task never gets CPU time, we consider that task to be starved

Tasks of the same priority


In the above situation, the two tasks have the same priority, thus they share the CPU. The time each task is allowed to run depends on the OS tick frequency. The OS Tick Frequency is the frequency that the FreeRTOS scheduler is called in order to decide which task should run next. The OS Tick is a hardware interrupt that calls the RTOS scheduler. Such a call to the scheduler is called a preemptive context switch.

Context Switching

When the RTOS scheduler switches from one task to another task, this is called a Context Switch

What needs to be stored for a Context switch to happen

In order for a task, or really any executable, to run, the following need to exist and be accessible and storable:

  • Program Counter (PC)
    • This holds the position for which line in your executable the CPU is currently executing.
    • Adding to it moves you one more instruction.
    • Changing it jumps you to another section of code.
  • Stack Pointer (SP)
    • This register holds the current position of the call stack, with regard to the currently executing program. The stack holds information such as local variables for functions, return addresses and [sometimes] function return values.
  • General Purpose Registers
    • These registers are to do computation.
      • In ARM:
        • R0 - R15
      • In MIPS
        • $v0, $v1
        • $a0 - $a3
        • $t0 - $t7
        • $s0 - $s7
        • $t8 - $t9
      • Intel 8086
        • AX
        • BX
        • CX
        • DX
        • SI
        • DI
        • BP

How does Preemptive Context Switch work?

  1. A hardware timer interrupt or repetitive interrupt is required for this preemptive context switch.
    1. This is independent from an RTOS
    2. Typically 1ms or 10ms
  2. The OS needs hardware capability to have a chance to STOP synchronous software flow and enter the OS “tick” interrupt.
    1. This is called the "Operating System Kernel Interrupt"
    2. We will refer to this as the OS Tick ISR (interrupt service routine)
  3. Timer interrupt calls RTOS Scheduler
    1. RTOS will store the previous PC, SP, and registers for that task.
    2. Scheduler picks the highest priority task that is ready to run.
    3. Scheduler places that task's PC, SP, and registers into the CPU.
    4. Scheduler interrupt returns, and the newly chosen task runs as if it never stopped.

NOTE: Context switching takes time. The reason why most OS ticks are 1ms to 10ms, because any shorter means that there is less time for your code to run. If a context switch takes 100uS to do, then with a OS tick of 1ms, your code can only run for 900uS. If an OS Tick is only 150uS, then your code may only have enough time to run a few instructions. You spend more CPU time context switching then you do performing actual work.


RTOS Queues

There are standard queues, or <vector> in C++, but RTOS queues should almost always be used in your application because they are thread-safe (no race conditions with multiple tasks), and they co-operate with your RTOS to schedule the tasks.  For instance, your task could optionally sleep while receiving data if the queue is empty, or it can sleep while sending the data if the queue is full.

Queues vs. Semaphore for "Signalling"

Semaphores may be used to "signal" between two contexts (tasks or interrupts), but they do not contain any payload. For example, for an application that captures a keystroke inside of an interrupt, it could "signal" the data processing task to awake upon the semaphore, however, there is no payload associated with it to identify what keystroke was input.  With an RTOS queue, the data processing task can wake upon a payload and process a particular keystroke.

The data-gathering tasks can simply send the key-press detected to the queue, and the processing task can receive items from the queue, and perform the corresponding action. Moreover, if there are no items in the queue, the consumer task (the processing one) can sleep until data becomes available. You can see how this scheme lends itself well to having multiple ISRs queue up data for a task (or multiple tasks) to handle. 

Example Queue usage for Tasks

After looking through the sample code below, you should then watch this video.

Let's study an example of two tasks communicating to each other over a queue.

QueueHandle_t q;

void producer(void *p)
  int x = 0;
  while (1) {
    xQueueSend(q, &x, 0); // TODO: Find out the significance of the parameters of xQueueSend()

void consumer(void *p)
  while (1) {
    // We do not need vTaskDelay() because this task will sleep for up to 100ms until there is an item in the queue
    if (xQueueReceive(q, &x, 100)) {
      printf("Got %i\n", x);
    else {
      puts("Timeout --> No data received");

void main(void)
  // Queue handle is not valid until you create it
  q = xQueueCreate(10, sizeof(int));

Example Queue usage with Interrupts

// Queue API is special if you are inside an ISR
void uart_rx_isr(void)
  xQueueSendFromISR(q, &x, NULL); // TODO: Find out the significance of the parameters

void queue_rx_task(void *p)
  int x;

  // Receive is the usual receive because we are not inside an ISR
  while (1) {
    xQueueReceive(q, &x, portMAX_DELAY);

Additional Information

Queue Management (Amazon Docs)

Queue API (FreeRTOS Docs)


Lab Assignment: UART


To learn how to communicate between two master devices using UART.


This assignment will require two boards.  The overall idea is to interface two boards using your UART driver. To test you can use a single board and perform a UART loopback (tie your own RX and TX wires together) in order to ensure that your driver is functional before trying to put two boards together.

Part 0: Get the simplest UART driver to function correctly

First, individually, get the simplest UART driver to work.  Here is the rough skeleton:

// WARNING: Some of this is psuedocode, so you figure it out
void Uart2Interrupt()
  // TODO: Queue your data and clear UART Rx interrupt

void InitializeUart2()
  // Init PINSEL, baud rate, frame size, etc. 
  // Init UART Rx interrupt (TX interrupt is optional)
  RegisterIsr(Uart2, Uart2Interrupt);

void Uart2Send(/* fill this out */)
  // Send data via uart
void Uart2Recieve(/* fill this out */)
  // Send data via uart

void vSendOverUartTask(void * pvParamater)
  while (true)
    Uart2Send(/* some data */);

void vRecieveByteOverUartTask(void * pvParamater)
  while (true) 
    if (xQueueReceive(/* ... */, portMAX_DELAY)) 
      printf("Got %c char from my UART... job is half done!");

void main(void)

HINT: You can test that your transmit and receive are working with only one SJOne board if you use a jumper to connect your UART TX and RX together. This is called a loopback test.

Part 1: Design UART driver

Using the following class template

  1. Design a UART driver as you have the previous drivers.
  2. Implement any functionality you deem useful for a general purpose UART driver.
  3. Document/comment each method and variable within the class template.
#pragma once

class LabUart
  // TODO: Fill in methods for Initialize(), Transmit(), Receive() etc.
  // Optional: For the adventurous types, you may inherit from "CharDev" class 
  // to get a lot of functionality for free

Code Block 1. UART Driver Template Class

Part 2: Serial Application

For this application, one device will ask the other device to calculate the result of two numbers and an operation.


Figure 1

Think about it like this, using figure 1 as a guide:

  1. Device 1 sends a single digit '5', Device 2 receives single digit '5' 
  2. Device 1 sends another single digit '7', device 2 receives single digit '7'
  3. Device 1 sends an operator '+', device 2 receives operator '+' and computes the result.
  4. Device 2 sends result back to device 1.

When the result is calculated, both the devices oled displays should show that resultant.

You MAY use the pre-written OLED driver for this lab.


  • Design UART driver to work with both UART2 and UART3
  • UART receive should be interrupt driven
    • When data arrives, store it to a FreeRTOS queue inside of the UART RX interrupt
    • UART receive function should dequeue the data
  • ALU application must be able to support the following operators
    • +   Addition
    • -    Subtraction
    • *   Multiplication

What to turn in:

  • Submit all relevant files and files used (includes previous lab code used).
  • Turn in any the screenshots of terminal output.
  • Logic Analyzer Screenshots
    • Waveform of device 1 UART TX sending a digits and operator to device 2.
      • These can be in separate images if you can't fit everything in one image.
    • Waveform of device 2 UART TX sending result back to device 1.
    • Whole window screenshot with the Decoded Protocols (lower right hand side of window) clearly legible.

Extra Credit

Use the on-board buttons and OLED display as human interface devices (HID) and allow the user to punch in the numbers and operations in some way. Be creative about this. 

If you are doing the extra credit, you may use the the L2/button driver.