Lookup Tables
Objective
To discuss lookup tables and how to use them to sacrifice storage space to increase computation time.
What Are Lookup Tables
Lookup tables are static arrays that sacrifices memory storage in place of a simple array index lookup of precalculated values. In some examples, a lookup table is not meant to speed a process, but simply an elegant solution to a problem.
Lets look at some examples to see why these are useful.
Why Use Lookup Tables
Simple Example: Convert Potentiometer Voltage to Angle
Lets make some assumptions about the system first:
- Using an 8-bit ADC
- Potentiometer is linear
- Potentiometer sweep angle is 180 or 270 degrees
- Potentiometer all the way left is 0 deg and 0V
- Potentiometer all the way right (180/270 deg) is ADC Reference Voltage
- Using a processor that does NOT have a FPU (Floating Point arithmetic Unit) like the Arm Cortex M3 we use in the LPC1756.
double potADCToDegrees(uint8_t adc)
{
return ((double)(adc))*(270/256);
}
Code Block 1. Without Lookup
const double potentiometer_angles[256] =
{
// [ADC] = Angle
[0] = 0.0,
[1] = 1.0546875,
[2] = 2.109375,
[3] = 3.1640625,
[4] = 4.21875,
[5] = 5.2734375,
[6] = 6.328125,
[7] = 7.3828125,
[8] = 8.4375,
[9] = 9.4921875,
[10] = 10.546875,
[11] = 11.6015625,
[12] = 12.65625,
[13] = 13.7109375,
[14] = 14.765625,
[15] = 15.8203125,
[16] = 16.875,
[17] = 17.9296875,
[18] = 18.984375,
[19] = 20.0390625,
[20] = 21.09375,
[21] = 22.1484375,
[22] = 23.203125,
[23] = 24.2578125,
[24] = 25.3125,
[25] = 26.3671875,
[26] = 27.421875,
[27] = 28.4765625,
[28] = 29.53125,
[29] = 30.5859375,
[30] = 31.640625,
[31] = 32.6953125,
[32] = 33.75,
[33] = 34.8046875,
[34] = 35.859375,
[35] = 36.9140625,
[36] = 37.96875,
[37] = 39.0234375,
[38] = 40.078125,
[39] = 41.1328125,
[40] = 42.1875,
[41] = 43.2421875,
[42] = 44.296875,
[43] = 45.3515625,
[44] = 46.40625,
[45] = 47.4609375,
[46] = 48.515625,
[47] = 49.5703125,
[48] = 50.625,
[49] = 51.6796875,
[50] = 52.734375,
[51] = 53.7890625,
[52] = 54.84375,
[53] = 55.8984375,
[54] = 56.953125,
[55] = 58.0078125,
[56] = 59.0625,
[57] = 60.1171875,
[58] = 61.171875,
[59] = 62.2265625,
[60] = 63.28125,
[61] = 64.3359375,
[62] = 65.390625,
[63] = 66.4453125,
[64] = 67.5,
[65] = 68.5546875,
[66] = 69.609375,
[67] = 70.6640625,
[68] = 71.71875,
[69] = 72.7734375,
[70] = 73.828125,
[71] = 74.8828125,
[72] = 75.9375,
[73] = 76.9921875,
[74] = 78.046875,
[75] = 79.1015625,
[76] = 80.15625,
[77] = 81.2109375,
[78] = 82.265625,
[79] = 83.3203125,
[80] = 84.375,
[81] = 85.4296875,
[82] = 86.484375,
[83] = 87.5390625,
[84] = 88.59375,
[85] = 89.6484375,
[86] = 90.703125,
[87] = 91.7578125,
[88] = 92.8125,
[89] = 93.8671875,
[90] = 94.921875,
[91] = 95.9765625,
[92] = 97.03125,
[93] = 98.0859375,
[94] = 99.140625,
[95] = 100.1953125,
[96] = 101.25,
[97] = 102.3046875,
[98] = 103.359375,
[99] = 104.4140625,
[100] = 105.46875,
// ...
[240] = 253.125,
[241] = 254.1796875,
[242] = 255.234375,
[243] = 256.2890625,
[244] = 257.34375,
[245] = 258.3984375,
[246] = 259.453125,
[247] = 260.5078125,
[248] = 261.5625,
[249] = 262.6171875,
[250] = 263.671875,
[251] = 264.7265625,
[252] = 265.78125,
[253] = 266.8359375,
[254] = 267.890625,
[255] = 268.9453125,
[256] = 270
};
inline double potADCToDegrees(uint8_t adc)
{
return potentiometer_angles[adc];
}
Code Block 2. With Lookup
With the two examples, it may seem trivial since the WITHOUT case is only "really" doing one calculation, mulitplying the uint8_t with (270/256) since the compiler will most likely optimize this value to its result. But if you take a look at the assembly, the results may shock you.
Look up Table Disassembly
00016e08 <main>:
main():
/var/www/html/SJSU-Dev/firmware/Experiements/L5_Application/main.cpp:322
[254] = 268.9411765,
[255] = 270
};
int main(void)
{
16e08: b082 sub sp, #8
/var/www/html/SJSU-Dev/firmware/Experiements/L5_Application/main.cpp:323
volatile double a = potentiometer_angles[15];
16e0a: a303 add r3, pc, #12 ; (adr r3, 16e18 <main+0x10>)
16e0c: e9d3 2300 ldrd r2, r3, [r3]
16e10: e9cd 2300 strd r2, r3, [sp]
16e14: e7fe b.n 16e14 <main+0xc>
16e16: bf00 nop
16e18: c3b9a8ae .word 0xc3b9a8ae
16e1c: 402fc3c3 .word 0x402fc3c3
Code Block 3. Dissassembly of Look up Table
Looks about right. You can see at 16e0a the software is retrieving data from the lookup table, and then it is loading it into the double which is on the stack.
Double Floating Point Disassembly
00017c64 <__adddf3>:
__aeabi_dadd():
17c64: b530 push {r4, r5, lr}
17c66: ea4f 0441 mov.w r4, r1, lsl #1
17c6a: ea4f 0543 mov.w r5, r3, lsl #1
17c6e: ea94 0f05 teq r4, r5
17c72: bf08 it eq
17c74: ea90 0f02 teqeq r0, r2
17c78: bf1f itttt ne
17c7a: ea54 0c00 orrsne.w ip, r4, r0
17c7e: ea55 0c02 orrsne.w ip, r5, r2
17c82: ea7f 5c64 mvnsne.w ip, r4, asr #21
17c86: ea7f 5c65 mvnsne.w ip, r5, asr #21
17c8a: f000 80e2 beq.w 17e52 <__adddf3+0x1ee>
17c8e: ea4f 5454 mov.w r4, r4, lsr #21
17c92: ebd4 5555 rsbs r5, r4, r5, lsr #21
17c96: bfb8 it lt
17c98: 426d neglt r5, r5
17c9a: dd0c ble.n 17cb6 <__adddf3+0x52>
17c9c: 442c add r4, r5
17c9e: ea80 0202 eor.w r2, r0, r2
17ca2: ea81 0303 eor.w r3, r1, r3
17ca6: ea82 0000 eor.w r0, r2, r0
17caa: ea83 0101 eor.w r1, r3, r1
17cae: ea80 0202 eor.w r2, r0, r2
17cb2: ea81 0303 eor.w r3, r1, r3
17cb6: 2d36 cmp r5, #54 ; 0x36
17cb8: bf88 it hi
17cba: bd30 pophi {r4, r5, pc}
17cbc: f011 4f00 tst.w r1, #2147483648 ; 0x80000000
17cc0: ea4f 3101 mov.w r1, r1, lsl #12
17cc4: f44f 1c80 mov.w ip, #1048576 ; 0x100000
17cc8: ea4c 3111 orr.w r1, ip, r1, lsr #12
17ccc: d002 beq.n 17cd4 <__adddf3+0x70>
17cce: 4240 negs r0, r0
17cd0: eb61 0141 sbc.w r1, r1, r1, lsl #1
17cd4: f013 4f00 tst.w r3, #2147483648 ; 0x80000000
17cd8: ea4f 3303 mov.w r3, r3, lsl #12
17cdc: ea4c 3313 orr.w r3, ip, r3, lsr #12
17ce0: d002 beq.n 17ce8 <__adddf3+0x84>
17ce2: 4252 negs r2, r2
17ce4: eb63 0343 sbc.w r3, r3, r3, lsl #1
17ce8: ea94 0f05 teq r4, r5
17cec: f000 80a7 beq.w 17e3e <__adddf3+0x1da>
17cf0: f1a4 0401 sub.w r4, r4, #1
17cf4: f1d5 0e20 rsbs lr, r5, #32
17cf8: db0d blt.n 17d16 <__adddf3+0xb2>
17cfa: fa02 fc0e lsl.w ip, r2, lr
17cfe: fa22 f205 lsr.w r2, r2, r5
17d02: 1880 adds r0, r0, r2
17d04: f141 0100 adc.w r1, r1, #0
17d08: fa03 f20e lsl.w r2, r3, lr
17d0c: 1880 adds r0, r0, r2
17d0e: fa43 f305 asr.w r3, r3, r5
17d12: 4159 adcs r1, r3
17d14: e00e b.n 17d34 <__adddf3+0xd0>
17d16: f1a5 0520 sub.w r5, r5, #32
17d1a: f10e 0e20 add.w lr, lr, #32
17d1e: 2a01 cmp r2, #1
17d20: fa03 fc0e lsl.w ip, r3, lr
17d24: bf28 it cs
17d26: f04c 0c02 orrcs.w ip, ip, #2
17d2a: fa43 f305 asr.w r3, r3, r5
17d2e: 18c0 adds r0, r0, r3
17d30: eb51 71e3 adcs.w r1, r1, r3, asr #31
17d34: f001 4500 and.w r5, r1, #2147483648 ; 0x80000000
17d38: d507 bpl.n 17d4a <__adddf3+0xe6>
17d3a: f04f 0e00 mov.w lr, #0
17d3e: f1dc 0c00 rsbs ip, ip, #0
17d42: eb7e 0000 sbcs.w r0, lr, r0
17d46: eb6e 0101 sbc.w r1, lr, r1
17d4a: f5b1 1f80 cmp.w r1, #1048576 ; 0x100000
17d4e: d31b bcc.n 17d88 <__adddf3+0x124>
17d50: f5b1 1f00 cmp.w r1, #2097152 ; 0x200000
17d54: d30c bcc.n 17d70 <__adddf3+0x10c>
17d56: 0849 lsrs r1, r1, #1
17d58: ea5f 0030 movs.w r0, r0, rrx
17d5c: ea4f 0c3c mov.w ip, ip, rrx
17d60: f104 0401 add.w r4, r4, #1
17d64: ea4f 5244 mov.w r2, r4, lsl #21
17d68: f512 0f80 cmn.w r2, #4194304 ; 0x400000
17d6c: f080 809a bcs.w 17ea4 <__adddf3+0x240>
17d70: f1bc 4f00 cmp.w ip, #2147483648 ; 0x80000000
17d74: bf08 it eq
17d76: ea5f 0c50 movseq.w ip, r0, lsr #1
17d7a: f150 0000 adcs.w r0, r0, #0
17d7e: eb41 5104 adc.w r1, r1, r4, lsl #20
17d82: ea41 0105 orr.w r1, r1, r5
17d86: bd30 pop {r4, r5, pc}
17d88: ea5f 0c4c movs.w ip, ip, lsl #1
17d8c: 4140 adcs r0, r0
17d8e: eb41 0101 adc.w r1, r1, r1
17d92: f411 1f80 tst.w r1, #1048576 ; 0x100000
17d96: f1a4 0401 sub.w r4, r4, #1
17d9a: d1e9 bne.n 17d70 <__adddf3+0x10c>
17d9c: f091 0f00 teq r1, #0
17da0: bf04 itt eq
17da2: 4601 moveq r1, r0
17da4: 2000 moveq r0, #0
17da6: fab1 f381 clz r3, r1
17daa: bf08 it eq
17dac: 3320 addeq r3, #32
17dae: f1a3 030b sub.w r3, r3, #11
17db2: f1b3 0220 subs.w r2, r3, #32
17db6: da0c bge.n 17dd2 <__adddf3+0x16e>
17db8: 320c adds r2, #12
17dba: dd08 ble.n 17dce <__adddf3+0x16a>
17dbc: f102 0c14 add.w ip, r2, #20
17dc0: f1c2 020c rsb r2, r2, #12
17dc4: fa01 f00c lsl.w r0, r1, ip
17dc8: fa21 f102 lsr.w r1, r1, r2
17dcc: e00c b.n 17de8 <__adddf3+0x184>
17dce: f102 0214 add.w r2, r2, #20
17dd2: bfd8 it le
17dd4: f1c2 0c20 rsble ip, r2, #32
17dd8: fa01 f102 lsl.w r1, r1, r2
17ddc: fa20 fc0c lsr.w ip, r0, ip
17de0: bfdc itt le
17de2: ea41 010c orrle.w r1, r1, ip
17de6: 4090 lslle r0, r2
17de8: 1ae4 subs r4, r4, r3
17dea: bfa2 ittt ge
17dec: eb01 5104 addge.w r1, r1, r4, lsl #20
17df0: 4329 orrge r1, r5
17df2: bd30 popge {r4, r5, pc}
17df4: ea6f 0404 mvn.w r4, r4
17df8: 3c1f subs r4, #31
17dfa: da1c bge.n 17e36 <__adddf3+0x1d2>
17dfc: 340c adds r4, #12
17dfe: dc0e bgt.n 17e1e <__adddf3+0x1ba>
17e00: f104 0414 add.w r4, r4, #20
17e04: f1c4 0220 rsb r2, r4, #32
17e08: fa20 f004 lsr.w r0, r0, r4
17e0c: fa01 f302 lsl.w r3, r1, r2
17e10: ea40 0003 orr.w r0, r0, r3
17e14: fa21 f304 lsr.w r3, r1, r4
17e18: ea45 0103 orr.w r1, r5, r3
17e1c: bd30 pop {r4, r5, pc}
17e1e: f1c4 040c rsb r4, r4, #12
17e22: f1c4 0220 rsb r2, r4, #32
17e26: fa20 f002 lsr.w r0, r0, r2
17e2a: fa01 f304 lsl.w r3, r1, r4
17e2e: ea40 0003 orr.w r0, r0, r3
17e32: 4629 mov r1, r5
17e34: bd30 pop {r4, r5, pc}
17e36: fa21 f004 lsr.w r0, r1, r4
17e3a: 4629 mov r1, r5
17e3c: bd30 pop {r4, r5, pc}
17e3e: f094 0f00 teq r4, #0
17e42: f483 1380 eor.w r3, r3, #1048576 ; 0x100000
17e46: bf06 itte eq
17e48: f481 1180 eoreq.w r1, r1, #1048576 ; 0x100000
17e4c: 3401 addeq r4, #1
17e4e: 3d01 subne r5, #1
17e50: e74e b.n 17cf0 <__adddf3+0x8c>
17e52: ea7f 5c64 mvns.w ip, r4, asr #21
17e56: bf18 it ne
17e58: ea7f 5c65 mvnsne.w ip, r5, asr #21
17e5c: d029 beq.n 17eb2 <__adddf3+0x24e>
17e5e: ea94 0f05 teq r4, r5
17e62: bf08 it eq
17e64: ea90 0f02 teqeq r0, r2
17e68: d005 beq.n 17e76 <__adddf3+0x212>
17e6a: ea54 0c00 orrs.w ip, r4, r0
17e6e: bf04 itt eq
17e70: 4619 moveq r1, r3
17e72: 4610 moveq r0, r2
17e74: bd30 pop {r4, r5, pc}
17e76: ea91 0f03 teq r1, r3
17e7a: bf1e ittt ne
17e7c: 2100 movne r1, #0
17e7e: 2000 movne r0, #0
17e80: bd30 popne {r4, r5, pc}
17e82: ea5f 5c54 movs.w ip, r4, lsr #21
17e86: d105 bne.n 17e94 <__adddf3+0x230>
17e88: 0040 lsls r0, r0, #1
17e8a: 4149 adcs r1, r1
17e8c: bf28 it cs
17e8e: f041 4100 orrcs.w r1, r1, #2147483648 ; 0x80000000
17e92: bd30 pop {r4, r5, pc}
17e94: f514 0480 adds.w r4, r4, #4194304 ; 0x400000
17e98: bf3c itt cc
17e9a: f501 1180 addcc.w r1, r1, #1048576 ; 0x100000
17e9e: bd30 popcc {r4, r5, pc}
17ea0: f001 4500 and.w r5, r1, #2147483648 ; 0x80000000
17ea4: f045 41fe orr.w r1, r5, #2130706432 ; 0x7f000000
17ea8: f441 0170 orr.w r1, r1, #15728640 ; 0xf00000
17eac: f04f 0000 mov.w r0, #0
17eb0: bd30 pop {r4, r5, pc}
17eb2: ea7f 5c64 mvns.w ip, r4, asr #21
17eb6: bf1a itte ne
17eb8: 4619 movne r1, r3
17eba: 4610 movne r0, r2
17ebc: ea7f 5c65 mvnseq.w ip, r5, asr #21
17ec0: bf1c itt ne
17ec2: 460b movne r3, r1
17ec4: 4602 movne r2, r0
17ec6: ea50 3401 orrs.w r4, r0, r1, lsl #12
17eca: bf06 itte eq
17ecc: ea52 3503 orrseq.w r5, r2, r3, lsl #12
17ed0: ea91 0f03 teqeq r1, r3
17ed4: f441 2100 orrne.w r1, r1, #524288 ; 0x80000
17ed8: bd30 pop {r4, r5, pc}
17eda: bf00 nop
Code Block 4. Arm Software Floating Point Addition Implementation
This isn't even the full code. This is a function that our calculation function has to run each time it wants to add two doubles together. Also, note that it is not just a straight shot of 202 instructions, because you can see that there are loops in the code where ever you see an instruction's mnemonic that starts with the letter b (stands for branch).
Other Use Cases
- Correlate degrees to radians (assuming degrees are whole numbers)
- Table of cosine or sine given radians or degrees
- In the radians case, you will need to create your own trivial hashing function to convert radians to an index
- Finding a number of bits SET in a 32-bit number
- Without a lookup table time complexity is O(n) where (n = 32), the number of bits you want to look through
- With a lookup table, the time complexity is O(1), constant time, and only needs the followin operations
- 3 bitwise left shifts operations
- 4 bitwise ANDS operations
- 4 load from memory addresses
- 4 binary ADD operations
- Total of 15 operations total
/* Found this on wikipedia! */
/* Pseudocode of the lookup table 'uint32_t bits_set[256]' */
/* 0b00, 0b01, 0b10, 0b11, 0b100, 0b101, ... */
int bits_set[256] = { 0, 1, 1, 2, 1, 2, // 200+ more entries
/* (this code assumes that 'int' is an unsigned 32-bits wide integer) */
int count_ones(unsigned int x) {
return bits_set[ x & 255] + bits_set[(x >> 8) & 255]
+ bits_set[(x >> 16) & 255] + bits_set[(x >> 24) & 255];
}
Code Block 5. Bits set in a 32-bit number (Found this on wikipedia (look up tables))
There are far more use cases then this, but these are a few.
Lookup Table Decision Tree
Lookup tables can be used as elegant ways to structure information. In this case, they may not provide a speed up but they will associate indexes with something greater, making your code more readable and easier to maintain. In this example, we will be looking at a matrix of function pointers.
Example: Replace Decision Tree
See the function below:
void makeADecisionRobot(bool power_system_nominal, bool no_obstacles_ahead)
{
if(power_system_nominal && no_obstacles_ahead) {
moveForward();
}
else if(power_system_nominal && !no_obstacles_ahead) {
moveOutOfTheWay();
}
else if(!power_system_nominal && no_obstacles_ahead) {
slowDown();
}
else {
emergencyStop();
}
}
Code Block 6. Typical Decision Tree
void (* decision_matrix[2][2])(void) =
{
[1][1] = moveForward,
[1][0] = moveOutOfTheWay,
[0][1] = slowDown,
[0][0] = emergencyStop,
};
void makeADecisionRobot(bool power_system_nominal, bool no_obstacles_ahead)
{
decision_matrix[power_system_nominal][no_obstacles_ahead]();
}
Code Block 7. Lookup Table Decision Tree
The interesting thing about the decision tree is that it is also more optimal in that, it takes a few instructions to do the look up from memory, then the address of the procedure [function] is looked up an executed, where the former required multiple read instructions and comparison instructions.
This pattern of lookup table will be most useful to us for the interrupts lab assignment.