Lesson FreeRTOS + LPC17xx

FreeRTOS & Tasks

Introduction to FreeRTOS

Objective


To introduce what, why, when, and how to use Real Time Operating Systems (RTOS) as well as get you
started using it with the SJSU-Dev environment.
I would like to note that this page is mostly an aggregation of information from Wikipedia and the FreeRTOS
website.

What is an OS?

Operating system (OS) is system software that manages computer hardware and software resources and provides common services for computer programs. - Wikipedia

Operating systems like Linux or Windows

They have services to make communicating with Networking devices and files systems possible without having
to understand how the hardware works. Operating systems may also have a means to multi-tasking by allow
multiple processes to share the CPU at a time. They may also have means for allowing processes to
communicate together.

What is an RTOS?

An RTOS is an operating system that meant for real time applications. They typically have fewer services such
as the following:

  • Parallel Task Scheduler
  • Task communication (Queues or Mailboxes)
  • Task synchronization (Semaphores)

Why use an RTOS?


You do not need to use an RTOS to write good embedded software. At some point
though, as your application grows in size or complexity, the services of an RTOS might
become beneficial for one or more of the reasons listed below. These are not absolutes,
but opinion. As with everything else, selecting the right tools for the job in hand is an
important first step in any project.
In brief:

  • Abstract out timing information

The real time scheduler is effectively a piece of code that allows you to specify the
timing characteristics of your application - permitting greatly simplified, smaller (and
therefore easier to understand) application code.

  • Maintainability/Extensibility

Not having the timing information within your code allows for greater maintainability
and extensibility as there will be fewer interdependencies between your software
modules. Changing one module should not effect the temporal behavior of another
module (depending on the prioritization of your tasks). The software will also be less
susceptible to changes in the hardware. For example, code can be written such that it
is temporally unaffected by a change in the processor frequency (within reasonable
limits).

  • Modularity

Organizing your application as a set of autonomous tasks permits more effective
modularity. Tasks should be loosely coupled and functionally cohesive units that within
themselves execute in a sequential manner. For example, there will be no need to
break functions up into mini state machines to prevent them taking too long to execute
to completion.

  • Cleaner interfaces

Well defined inter task communication interfaces facilitates design and team
development.

  • Easier testing (in some cases)

Task interfaces can be exercised without the need to add instrumentation that may
have changed the behavior of the module under test.

  • Code reuse

Greater modularity and less module interdependencies facilitates code reuse across
projects. The tasks themselves facilitate code reuse within a project. For an example
of the latter, consider an application that receives connections from a TCP/IP stack -
the same task code can be spawned to handle each connection - one task per
connection.

  • Improved efficiency?

Using FreeRTOS permits a task to block on events - be they temporal or external to
the system. This means that no time is wasted polling or checking timers when there
are actually no events that require processing. This can result in huge savings in
processor utilization. Code only executes when it needs to. Counter to that however is
the need to run the RTOS tick and the time taken to switch between tasks. Whether
the saving outweighs the overhead or vice versa is dependent of the application. Most
applications will run some form of tick anyway, so making use of this with a tick hook
function removes any additional overhead.

  • Idle time

It is easy to measure the processor loading when using FreeRTOS.org. Whenever the
idle task is running you know that the processor has nothing else to do. The idle task
also provides a very simple and automatic method of placing the processor into a low
power mode.

  • Flexible interrupt handling

Deferring the processing triggered by an interrupt to the task level permits the interrupt
handler itself to be very short - and for interrupts to remain enabled while the task level
processing completes. Also, processing at the task level permits flexible prioritization -
more so than might be achievable by using the priority assigned to each peripheral by
the hardware itself (depending on the architecture being used).

  • Mixed processing requirements

Simple design patterns can be used to achieve a mix of periodic, continuous and
event driven processing within your application. In addition, hard and soft real time
requirements can be met though the use of interrupt and task prioritisation.

  • Easier control over peripherals

Gatekeeper tasks facilitate serialization of access to peripherals - and provide a good
mutual exclusion mechanism.

  • Etcetera

- FreeRTOS Website (https://www.freertos.org/FAQWhat.html)

Design Scenario

Building a controllable assembly conveyor belt

 TaQcontrollable assembly-conveyor-belt-system.png

Think about the following system. Reasonable complex, right?

Without a scheduler

 ✓  Small code size.
 ✓  No reliance on third party source code.
 ✓  No RTOS RAM, ROM or processing overhead.
 ✗  Difficult to cater for complex timing requirements.
 ✗  Does not scale well without a large increase in complexity.
 ✗  Timing hard to evaluate or maintain due to the inter-dependencies between the different functions.

With a scheduler

 ✓  Simple, segmented, flexible, maintainable design with few inter-dependencies.
 ✓  Processor utilization is automatically switched from task to task on a most urgent need basis with no
explicit action required within the application source code.
 ✓  The event driven structure ensures that no CPU time is wasted polling for events that have not occurred.
Processing is only performed when there is work needing to be done.
  *   Power consumption can be reduced if the idle task places the processor into power save (sleep) mode,
but may also be wasted as the tick interrupt will sometimes wake the processor unnecessarily.
  *   The kernel functionality will use processing resources. The extent of this will depend on the chosen
kernel tick frequency.
 ✗  This solution requires a lot of tasks, each of which require their own stack, and many of which require a
queue on which events can be received. This solution therefore uses a lot of RAM.
 ✗  Frequent context switching between tasks of the same priority will waste processor cycles.

FreeRTOS Tasks

What is an FreeRTOS Task?

A FreeRTOS task is a function that is added to the FreeRTOS scheduler using the xCreateTask() API call.

A task will have the following:

  1. A Priority level
  2. Memory allocation
  3. Singular input parameter (optional)
  4. A name
  5. A handler (optional): A data structure that can be used to reference the task later. 

A FreeRTOS task declaration and definition looks like the following:

void vTaskCode( void * pvParameters )
{
    /* Grab Parameter */
    uint32_t c = (uint32_t)(pvParameters);
    /* Define Constants Here */
    const uint32_t COUNT_INCREMENT = 20;
    /* Define Local Variables */
    uint32_t counter = 0;
    /* Initialization Code */
    initTIMER();
    /* Code Loop */
    while(1)
    {
        /* Insert Loop Code */
    }
    /* Only necessary if above loop has a condition */
    xTaskDelete(NULL);
}

Rules for an RTOS Task

  • The highest priority ready tasks ALWAYS runs
    • If two or more have equal priority, then they are time sliced
  • Low priority tasks only get CPU allocation when:
    • All higher priority tasks are sleeping, blocked, or suspended 
  • Tasks can sleep in various ways, a few are the following:
    • Explicit "task sleep" using API call vTaskDelay();
    • Sleeping on a semaphore
    • Sleeping on an empty queue (reading)
    • Sleeping on a full queue (writing)

Adding a Task to the Scheduler and Starting the Scheduler

The following code example shows how to use xTaskCreate() and how to start the scheduler using vTaskStartScheduler()

int main(int argc, char const *argv[])
{
    //// You may need to change this value.
    const uint32_t STACK_SIZE = 128;
    xReturned = xTaskCreate(
                    vTaskCode,       /* Function that implements the task. */
                    "NAME",          /* Text name for the task. */
                    STACK_SIZE,      /* Stack size in words, not bytes. */
                    ( void * ) 1,    /* Parameter passed into the task. */
                    tskIDLE_PRIORITY,/* Priority at which the task is created. */
                    &xHandle );      /* Used to pass out the created task's handle. */

    /* Start Scheduler */
    vTaskStartScheduler();

    return 0;
}

Task Priorities

High Priority and Low Priority tasks

 BSERTOS-Tasks-HL.png

In the above situation, the high priority task task never sleeps, so it is always running. In this situation where the low priority task never gets CPU time, we consider that task to be starved

Tasks of the same priority

 U42RTOS-Tasks-MM.png

In the above situation, the two tasks have the same priority, thus they share the CPU. The time each task is allowed to run depends on the OS tick frequency. The OS Tick Frequency is the frequency that the FreeRTOS scheduler is called in order to decide which task should run next. The OS Tick is a hardware interrupt that calls the RTOS scheduler. Such a call to the scheduler is called a preemptive context switch.

Context Switching

When the RTOS scheduler switches from one task to another task, this is called a Context Switch

What needs to be stored for a Context switch to happen

In order for a task, or really any executable, to run, the following need to exist and be accessible and storable:

  • Program Counter (PC)
    • This holds the position for which line in your executable the CPU is currently executing.
    • Adding to it moves you one more instruction.
    • Changing it jumps you to another section of code.
  • Stack Pointer (SP)
    • This register holds the current position of the call stack, with regard to the currently executing program. The stack holds information such as local variables for functions, return addresses and [sometimes] function return values.
  • General Purpose Registers
    • These registers are to do computation.
      • In ARM:
        • R0 - R15
      • In MIPS
        • $v0, $v1
        • $a0 - $a3
        • $t0 - $t7
        • $s0 - $s7
        • $t8 - $t9
      • Intel 8086
        • AX
        • BX
        • CX
        • DX
        • SI
        • DI
        • BP

How does Preemptive Context Switch work?

  1. A hardware timer interrupt or repetitive interrupt is required for this preemptive context switch.
    1. This is independent from an RTOS
    2. Typically 1ms or 10ms
  2. The OS needs hardware capability to have a chance to STOP synchronous software flow and enter the OS “tick” interrupt.
    1. This is called the "Operating System Kernel Interrupt"
    2. We will refer to this as the OS Tick ISR (interrupt service routine)
  3. Timer interrupt calls RTOS Scheduler
    1. RTOS will store the previous PC, SP, and registers for that task.
    2. Scheduler picks the highest priority task that is ready to run.
    3. Scheduler places that task's PC, SP, and registers into the CPU.
    4. Scheduler interrupt returns, and the newly chosen task runs as if it never stopped.

NOTE: Context switching takes time. The reason why most OS ticks are 1ms to 10ms, because any shorter means that there is less time for your code to run. If a context switch takes 100uS to do, then with a OS tick of 1ms, your code can only run for 900uS. If an OS Tick is only 150uS, then your code may only have enough time to run a few instructions. You spend more CPU time context switching then you do performing actual work.

LPC17xx Memory Map

What is a Memory Map

A memory map is a layout of how the memory maps to some set of information. With respect to embedded systems, the memory map we are concerned about maps out where the Flash (ROM), peripherals, interrupt vector table, SRAM, etc are located in address space.

Memory mapped IO

Memory mapped IO is a a means of mapping memory address space to devices external (IO) to the CPU, that is not memory. 

For example (assuming a 32-bit system)
  • Flash could be mapped to addresses 0x00000000 to 0x00100000 (1 Mbyte range)
  • GPIO port could be located at address 0x1000000 (1 byte)
  • Interrupt vector table could start from 0xFFFFFFFF and run backwards through the memory space
  • SRAM gets the rest of the usable space (provided you have enough SRAM to fill that area)

It all depends on the CPU and the system designed around it.

Port Mapped IO

Port mapped IO uses additional signals from the CPU to qualify which signals are for memory and which are for IO. On Intel products, there is a (~M/IO) pin that is LOW when selecting MEMORY and HIGH when it is selecting IO.

The neat thing about using port mapped IO, is that you don't need to sacrifice memory space for IO, nor do you need to decode all 32-address lines. You can limit yourself to just using 8-bits of address space, which limits you to 256 device addresses, but that may be more than enough for your purposes.

 

Figure 2. Address Decoding with port map

(http://www.dgtal-sysworld.co.in/2012/04/memory-intercaing-to-8085.html)

LPC17xx memory map

 

Figure 3. LPC17xx Memory Map

From this you can get an idea of which section of memory space is used for what. This can be found in the UM10360 LPC17xx user manual. If you take a closer look you will see that very little of the address space is actually taken up. With up to 4 billion+ address spaces (because 2^32 is a big number) to use you have a lot of free space to spread out your IO and peripherials.

Reducing the number of lines needed to decode IO

The LPC17xx chips, reduce bus line count, make all of the peripherals 32-bit aligned. Which means you must grab 4-bytes at a time. You cannot grab a single byte (8-bits) or a half-byte (16-bits) from memory. This eliminates the 2 least significant bits of address space.

Accessing IO using Memory Map in C

Please read the following code snippet. This is runnable on your system now. Just copy and paste it into your main.cpp file.

/*
    The goal of this software is to set the GPIO pin P1.0 to
    low then high after some time. Pin P1.0 is connected to an LED.

    The address to set the direction for port 1 GPIOs is below:

        FIO1DIR = 0x2009C020

    The address to set a pin in port 1 is below:

        FIO1PIN = 0x2009C034
*/

#include <stdint.h>

volatile uint32_t * const FIO1DIR = (uint32_t *)(0x2009C020);
volatile uint32_t * const FIO1PIN = (uint32_t *)(0x2009C034);

int main(void)
{
    //// Set 0th bit, setting Pin 0.0 to an output pin
    (*FIO1DIR) |= (1 << 0);
    //// Set 0th bit, setting Pin 0.0 to high
    (*FIO1PIN) &= ~(1 << 0);
    //// Loop for a while (volatile is needed!)
    for(volatile uint32_t i = 0; i < 0x01000000; i++);
    //// Clear 0th bit, setting Pin 0.0 to low
    (*FIO1PIN) |= (1 << 0);
    //// Loop forever
    while(1);
    return 0;
}

volatile keyword tells the compiler not to optimize this variable out, even if it seems useless

const keyword tells the compiler that this variable cannot be modified

Notice "const" placement and how it is placed after the uint32_t *. This is because we want to make sure the pointer address never changes and remains constant, but the value that it references should be modifiable. 

Using the LPC17xx.h

The above is nice and it works, but its a lot of work. You have to go back to the user manual to see which addresses are for what register. There must be some better way!!

Take a look at the LPC17xx.h file, which It is located in the SJSU-Dev/firmware/lib/L0_LowLevel/LPC17xx.h. Here you will find definitions for each peripheral memory address in the system.

Lets say you wanted to port the above code to something a bit more structured:

  • Open up "LPC17xx.h
  • Search for "GPIO"
    • You will find a struct with the name LPC_GPIO_TypeDef.
  • Now search for "LPC_GPIO_TypeDef" with a #define in the same line.
  • You will see that LPC_GPIO_TypeDef is a pointer of these structs
      • #define LPC_GPIO0 ((LPC_GPIO_TypeDef *) LPC_GPIO0_BASE )
      • #define LPC_GPIO1 ((LPC_GPIO_TypeDef *) LPC_GPIO1_BASE )
      • #define LPC_GPIO2 ((LPC_GPIO_TypeDef *) LPC_GPIO2_BASE )
      • #define LPC_GPIO3 ((LPC_GPIO_TypeDef *) LPC_GPIO3_BASE )
      • #define LPC_GPIO4 ((LPC_GPIO_TypeDef *) LPC_GPIO4_BASE )
  • We want to use LPC_GPIO1 since that corrisponds to GPIO port 1.
  • If you inspect LPC_GPIO_TypeDef, you can see the members that represent register FIODIR and FIOPIN
  • You can now access FIODIR and FIOPIN in the following way:
#include "LPC17xx.h"

int main(void)
{
    LPC_GPIO1->FIODIR |= (1 << 0);
    //// Set 0th bit, setting Pin 0.0 to high
    LPC_GPIO1->FIOPIN &= ~(1 << 0);
    //// Loop for a while (volatile is needed!)
    for(volatile uint32_t i = 0; i < 0x01000000; i++);
    //// Clear 0th bit, setting Pin 0.0 to low
    LPC_GPIO1->FIOPIN |= (1 << 0);
    //// Loop forever
    while(1);
    return 0;
}

At first this may get tedius, but once you get more experience, you won't open the LPC17xx.h file very often. This is the preferred way to access registers in this course.

On occassions, the names of registers in the user manual are not exactly the same in this file.

Lab Assignment: FreeRTOS Tasks

Objective

To give you experience in:

  1. Loading firmware onto the SJOne board
  2. Observing the RTOS round-robin scheduler in effect
  3. Provide hands-on experience with the UART character output timing

Assignment

Part 1. Fill Out the Code

Use the following code template:

  1. Reference the code comments for what you need to do. 
  2. Fill out the xTaskCreate method parameters.
  3. Do not use printf, instead use uart0_puts() function from #include "uart0_min.h"
    • This is deliberate because uart0_puts() is not interrupt driven, and uses a polled version of the char output driver

FreeRTOSConfig.h defines the total priority levels, but for the template project it is likely to be 4

#include "FreeRTOS.h"
#include "task.h"
#include "uart0_min.h"

void vTaskOneCode(void *p)
{
    while(1)
    {
    	uart0_puts("aaaaaaaaaaaaaaaaaaaa");
    	vTaskDelay(100); // This sleeps the task for 100ms (because 1 RTOS tick = 1 millisecond)
    }
}

// Create another task and run this code in a while(1) loop
void vTaskTwoCode(void *p)
{
    while(1)
    {
    	uart0_puts("bbbbbbbbbbbbbbbbbbbb");
    	vTaskDelay(100);
    }
}

// You can comment out the sample code of lpc1758_freertos project and run this code instead
int main(int argc, char const *argv[])
{
    /// This "stack" memory is enough for each task to run properly (512 * 32-bit) = 2Kbytes stack
    const uint32_t STACK_SIZE_WORDS = 512;
  
    /**
     * Observe and explain the following scenarios:
     *
     * 1) Same Priority: A = 1, B = 1
     * 2) Different Priority: A = 2, B = 1
     * 3) Different Priority: A = 1, B = 2
     *
     * Turn in screen shots of what you observed
     * as well as an explanation of what you observed
     */
    xTaskCreate(vTaskOneCode, /* Fill in the rest parameters for this task */ );
    xTaskCreate(vTaskTwoCode, /* Fill in the rest parameters for this task */ );

    /* Start Scheduler - This will not return, and your tasks will start to run their while(1) loop */
    vTaskStartScheduler();

    return 0;
}

Code Block 1: Main.cpp

Part 2: Make observations of the output

  • How come 4(or 3 sometimes) characters are printed from each task? Why not 2 or 5, or 6?
  • Alter the priority of one of the tasks, and note down the observations. Note down WHAT you see and WHY.
Hints
  • The printf data appears over 38400bps UART, and it takes 10-bits to output a single 8-bit char, and therefore you can send as many as 3840 chars per second.
  • The SJ-One board RTOS runs at 1Khz meaning that 1 RTOS tick is 1 millisecond (for the round-robin scheduler)
  • You have to relate the speed of the RTOS round-robin scheduler with the speed of the UART to answer the questions above

Part 3. Change the priority levels

Now that you have the code running with identical priority levels, try the following:

  1. Change the priority of the two tasks.
    • A=1, B=1
    • A=2, B=1
    • A=1, B=2
  2. Take a screenshot of what you see from the console.
  3. Write an explanation of why you think the output came out the way it did using your knowledge about RTOS.

What to turn in (to Canvas as PDF):

  1. Relevant code
  2. Your observation and explanation
  3. Snapshot of the output for all scenarios.