Enhanced Arduino C++ Custom Math Library

by Oscar


This is a custom math library for Arduino, which should be more efficient than the standard C math library. It is primarily designed for my Arduino Hexapod robot and Quadruped robot, but you might also find it useful in other applications. So far I have come up with sin(), cos(), acos() and atan2() functions that have proven to be faster. Other function implementations might not be faster than the standard built-in ones, but it's interesting to see how they can be implemented in code.

This custom maths library runs more than 10 times faster than the standard built-in library on my robots (measured by the running time of the relatively math-intensive Inverse Kinematics algorithm). Compared with the old maths library, I also get a 10% improvement; see the results sections for details.

The full source code is not available on this page; please request it in the comments. Part of the implementation is at the end of this post.

Custom Math Library Development History

I have written a basic custom maths library before, where I used look-up tables for the sin() and cos() functions, and a combination of a polynomial and a look-up table for the acos() function.

In this enhanced version, I have come up with some alternative solutions for these functions and for other maths functions, which I will refer to as the 'New' way in this post. They are not necessarily better than the old ways, so I will compare them in the next section. The 'New' implementations are at the end of the post.


Benchmarking: the Built-in Way, Old Way and New Way

The following were tested:

  • New sqrt function – exploiting bit shift operations
  • New sin/cos function – using an int look-up table with a 0.5 degree interval (twice the table size of the old way)
  • New acos function – using only a look-up table (byte values)
  • New atan2 function – using sqrt() and acos()
  • Performance of int/long versus float in arithmetic operations (addition, multiplication and division)

All tests are carried out using the Arduino Mega 2560 board, because that’s the environment I will be using this custom library in.


Some of the results came back as expected, and some didn't. Performance really depends on the hardware architecture and varies from platform to platform. All results were measured in microseconds (us).

sin() and cos()

Both functions were run 2 million times with a random input between 0 and 360 degrees, for the built-in way and the new way. I did not test the old way because the new way is almost the same, except that the look-up table is now twice as large.

  • Built-in way took: 23,273,248 us
  • New way took: 2,524,220 us
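
To make the 0.5 degree look-up concrete, here is a minimal sketch of how a 181-entry sin table could be built and folded across the four quadrants. It assumes the table is filled at start-up with std::sin() and that angles are passed in tenths of a degree, matching the fsin() listing at the end of the post; `buildSinTable()` and `lutSin()` are hypothetical names, not the library's API.

```cpp
#include <cassert>
#include <cmath>

// Assumed layout: sin(0), sin(0.5), ..., sin(90) degrees -> 181 entries.
const int SIN_TABLE_SIZE = 181;
float SIN_TABLE[SIN_TABLE_SIZE];

// Fill the table once (on an Arduino this could be a precomputed const array).
void buildSinTable() {
  const double kDegToRad = 3.14159265358979323846 / 180.0;
  for (int i = 0; i < SIN_TABLE_SIZE; ++i) {
    SIN_TABLE[i] = static_cast<float>(std::sin(i * 0.5 * kDegToRad));
  }
}

// sin() for an angle in tenths of a degree (900 = 90 degrees).
// Every quadrant is folded back onto 0..90 degrees; index = tenths / 5.
float lutSin(int tenths) {
  int sign = 1;
  if (tenths < 0) {            // sin(-x) = -sin(x)
    tenths = -tenths;
    sign = -1;
  }
  tenths %= 3600;              // wrap into 0..3599

  float r;
  if (tenths <= 900)       r =  SIN_TABLE[tenths / 5];           // 0..90
  else if (tenths <= 1800) r =  SIN_TABLE[(1800 - tenths) / 5];  // 90..180
  else if (tenths <= 2700) r = -SIN_TABLE[(tenths - 1800) / 5];  // 180..270
  else                     r = -SIN_TABLE[(3600 - tenths) / 5];  // 270..360
  return sign * r;
}
```

Because `tenths / 5` truncates, the looked-up angle can be off by up to 0.4 degrees, which keeps the error below about 0.007.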


acos()

All three ways were tested: built-in way VS old way (polynomial + look-up) VS new way (pure look-up + multiplication + division). The acos() functions were run 100,000 times with a random input between 0 and 1.

  • Built-in way: 52,245,320 us
  • Old way: 38,648,904 us
  • New way: 19,225,740 us
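
The new acos() stores byte values and samples more densely as the input approaches 1, where acos() is steepest. The sketch below assumes the table layout implied by the facos() index arithmetic at the end of the post (278 entries, input steps of 0.0079, 0.0008 and 0.0002, 0.00616 rad per byte unit, DEC4 = 10000). Generating the table with std::acos() is an assumption about how it could be produced, and `buildAcosTable()` and `lutAcos()` are hypothetical names.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

const float ACOS_SCALE = 0.00616f;   // radians per byte unit: 255 * 0.00616 ~= pi/2
const float DEC4 = 10000.0f;         // 4 decimal places (0.9 * DEC4 == 9000)
const float kPi = 3.14159265f;
const int ACOS_TABLE_SIZE = 278;
uint8_t ACOS_TABLE[ACOS_TABLE_SIZE];

// Store round(acos(x) / ACOS_SCALE) for the sample point x at index i.
static void storeAcos(int i, double x) {
  if (x > 1.0) x = 1.0;              // guard against rounding just above 1
  ACOS_TABLE[i] = static_cast<uint8_t>(std::lround(std::acos(x) / ACOS_SCALE));
}

void buildAcosTable() {
  for (int i = 0; i <= 114; ++i) storeAcos(i, i * 0.0079);            // 0.0 .. 0.9
  for (int k = 0; k <= 113; ++k) storeAcos(114 + k, 0.9 + k * 0.0008); // 0.9 .. 0.99
  for (int m = 0; m <= 50; ++m) storeAcos(227 + m, 0.99 + m * 0.0002); // 0.99 .. 1.0
}

// Same index arithmetic as facos(); acos(-x) = pi - acos(x).
float lutAcos(float num) {
  bool negative = num < 0.0f;
  if (negative) num = -num;

  int idx = -1;                      // stays -1 for |num| > 1 (result 0)
  if (num < 0.9f)
    idx = (int)(num * DEC4 / 79 + 0.5f);
  else if (num < 0.99f)
    idx = (int)((num * DEC4 - 9000) / 8 + 0.5f) + 114;
  else if (num <= 1.0f)
    idx = (int)((num * DEC4 - 9900) / 2 + 0.5f) + 227;

  float rads = 0.0f;
  if (idx >= 0) {
    assert(idx < ACOS_TABLE_SIZE);
    rads = ACOS_TABLE[idx] * ACOS_SCALE;
  }
  return negative ? kPi - rads : rads;
}
```

In a sweep over the whole input range from -1 to 1 this layout stays within 0.02 rad of std::acos(); the largest errors sit just below 1.0, where the last step jumps to acos(1) = 0.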


atan2()

Built-in way VS New way (sqrt() + acos()). The function was run 100,000 times with two random float numbers between -10.0 and 10.0.

  • Built-in way: 6,163,108 us
  • New way: 5,685,292 us
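
The atan2() replacement relies on the identity atan2(y, x) = sign(y) * acos(x / sqrt(x² + y²)). This minimal sketch uses std::sqrt() and std::acos() so it can be checked against std::atan2() on a PC; on the Arduino the look-up facos() would take acos()'s place. The name `sqrtAcosAtan2()` and the zero-length and clamping guards are additions for the sketch.

```cpp
#include <cassert>
#include <cmath>

// atan2(opp, adj) from sqrt() and acos(): the angle of the vector (adj, opp)
// measured from the x axis is acos(adj / r); its sign follows opp.
float sqrtAcosAtan2(float opp, float adj) {
  float hypt = std::sqrt(adj * adj + opp * opp);
  if (hypt == 0.0f) return 0.0f;    // atan2(0, 0) treated as 0

  float c = adj / hypt;
  if (c > 1.0f) c = 1.0f;           // keep rounding noise inside acos()'s domain
  if (c < -1.0f) c = -1.0f;

  float rad = std::acos(c);         // the look-up facos() would go here on the Arduino
  return (opp < 0.0f) ? -rad : rad;
}
```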


sqrt()

Built-in way VS New way (whole number bit shift operation).

I lost the exact results for this one, but they clearly showed that the built-in way is still far better than the new implementation.
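
For reference, the bit-shift square root works digit by digit in base 4: `bit` walks down through the powers of four, and each step decides one binary digit of the result. This sketch is the same method as fsqrt() at the end of the post, rewritten with `uint32_t` (assumed equivalent to the Mega's 32-bit `unsigned long`) so it can be tested on a PC; `isqrt32()` is a hypothetical name.

```cpp
#include <cassert>
#include <cstdint>

// Integer square root: returns floor(sqrt(number)) using only shifts,
// additions, subtractions and comparisons.
uint32_t isqrt32(uint32_t number) {
  uint32_t root = 0;
  uint32_t bit = 1UL << 30;          // highest power of four in 32 bits

  // Start at the largest power of four <= number.
  while (bit > number) bit >>= 2;

  while (bit != 0) {
    if (number >= root + bit) {      // this result digit is 1
      number -= root + bit;
      root = (root >> 1) + bit;
    } else {                         // this result digit is 0
      root >>= 1;
    }
    bit >>= 2;
  }
  return root;
}
```

It only returns the integer part of the root, so float inputs would need scaling first, on top of a loop of up to 16 iterations.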

Arithmetic Operations with Float and Long/Int

This is an interesting one. In theory, arithmetic operations on long and int data are generally faster than on float. But to represent digits after the decimal point with long/int, a number must be multiplied by a scale factor such as 1000, depending on how many decimal places we want; for example, to keep 3 decimal places, 3.203 becomes 3203. When this scaled number is multiplied or divided by another scaled number with the same precision (for example 1.123 -> 1123), we need to divide by 1000 after the multiplication, or multiply by 1000 before the division. That introduces overhead. With float we could simply write:

3.203 * 1.123 OR 3.203 / 1.123

but with the scaled int/long representation:

3203 * 1123 / 1000 OR 3203 * 1000 / 1123
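
The scaled-integer arithmetic above can be written as two small helpers. This is a sketch assuming 3 decimal places (scale 1000) and `int32_t`, which matches the 32-bit `long` on the Mega 2560; `fxMul()`, `fxDiv()` and `kScale` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

// Fixed-point values with 3 decimal places: 3.203 is stored as 3203.
const int32_t kScale = 1000;

// The raw product carries the scale twice, so divide by it once.
int32_t fxMul(int32_t a, int32_t b) {
  return a * b / kScale;
}

// Pre-multiply the dividend so the quotient keeps one factor of the scale.
int32_t fxDiv(int32_t a, int32_t b) {
  assert(b != 0);
  return a * kScale / b;
}
```

The extra `/ kScale` and `* kScale` are the overhead the tests below measure. The intermediate products also limit the range: `fxMul()` overflows once the raw product exceeds 2,147,483,647 (for example both operands above about 46.34), and `fxDiv()` once the dividend exceeds about 2147.483.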

For testing, all numbers used were random. Each test was run 20,000 times.

  • 1 add, 1 mul, 1 div
    • float: 1,019,000 us
    • long: 888,120 us
  • 3 add, 1 mul, 1 div
    • float: 1,355,700 us
    • long: 919,560 us
  • 1 add, 3 mul, 1 div
    • float: 1,398,700 us
    • long: 1,267,352 us
  • 1 add, 1 mul, 3 div
    • float: 2,299,000 us
    • long: 2,448,716 us

The long type performed quite well with a low number of arithmetic operations and with more additions. But, as expected, the more multiplications and divisions there are, the closer it gets to the performance of float. It even became slower than float when more divisions were involved.

Obviously the more operations we do, the more computationally expensive it gets, which does us no good. Besides, on the Arduino (AVR) architecture, float and long both take up 32 bits, so there is no memory advantage in replacing float with long. Therefore we will stick to float when needed.

The same applies to the look-up table. Some might suggest using int data for it, which is 16 bits versus 32 bits for float (what I currently use). But if the output of the trigonometry functions is float, we would have to convert the values (probably with divisions involved) before returning them, and that would kill the performance. So whenever possible, I should stick to float in the look-up table so the data can be used directly.
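
To illustrate the trade-off, here is a sketch with the same samples stored both ways; the values (sin() at 15 degree steps) and the names are illustrative only. The int16_t version halves the table's memory but needs an int-to-float conversion and a float division on every read.

```cpp
#include <cassert>
#include <cstdint>

const int kEntries = 7;   // sin(0), sin(15), ..., sin(90) degrees

// Float table: 4 bytes per entry, values are returned as-is.
const float SIN_F[kEntries] = {0.0f, 0.258819f, 0.5f, 0.707107f, 0.866025f, 0.965926f, 1.0f};
// int16_t table scaled by 10000: 2 bytes per entry.
const int16_t SIN_I[kEntries] = {0, 2588, 5000, 7071, 8660, 9659, 10000};

float readFloatTable(int i) {
  assert(i >= 0 && i < kEntries);
  return SIN_F[i];                 // no arithmetic at all
}

float readIntTable(int i) {
  assert(i >= 0 && i < kEntries);
  return SIN_I[i] / 10000.0f;      // int-to-float conversion plus a float division
}
```

Replacing the division with a multiplication by 0.0001f would be cheaper, but the conversion and a float operation remain on every call.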

Other Note

One thing we should not do is use platforms other than the Arduino itself for performance testing, because performance varies from architecture to architecture.

During my research, I also found some useful information about Arduino:

  • PROGMEM – lets you store variables in flash memory rather than RAM. It's useful when you have run out of RAM and still have a large amount of data to store somewhere, but it's quite slow compared to RAM.
  • Different types of memory in Arduino boards – how large a variable (such as a look-up table) an Arduino can hold in RAM. Variables are stored in SRAM.
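
A short PROGMEM example: on AVR, a table marked `PROGMEM` lives in flash and must be read back with a `pgm_read_*()` macro from `<avr/pgmspace.h>` (the Arduino IDE includes it automatically). The `#else` branch is an assumption added only so the sketch also compiles on a PC; `SIN_TABLE_P` and `readFlashTable()` are hypothetical names.

```cpp
#include <cassert>

#if defined(__AVR__)
#include <avr/pgmspace.h>   // PROGMEM and pgm_read_float() on AVR boards
#else
// Host fallback so the sketch compiles on a PC: the data simply stays in RAM.
#define PROGMEM
#define pgm_read_float(addr) (*(const float *)(addr))
#endif

// On AVR this table goes to flash instead of SRAM (the Mega 2560 has 8 KB of SRAM).
const float SIN_TABLE_P[] PROGMEM = {0.0f, 0.5f, 0.866025f, 1.0f};  // 0, 30, 60, 90 degrees

// Every read of a PROGMEM table must go through pgm_read_*(); indexing the
// array directly on AVR would read RAM at that address instead of flash.
float readFlashTable(int i) {
  assert(i >= 0 && i < 4);
  return pgm_read_float(&SIN_TABLE_P[i]);
}
```

Each `pgm_read_float()` is a separate flash access per lookup, which is where the speed cost comes from.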

If you want to discuss or share your ideas, you can post in our forum.

New Way Implementations:


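[sourcecode language="cpp"]
// deg is in tenths of a degree (900 = 90 degrees, 3600 = 360 degrees).
// SIN_TABLE holds sin() from 0 to 90 degrees in 0.5-degree steps
// (181 entries), so index = deg / 5. Assumed declaration:
extern const float SIN_TABLE[181];

float fsin(int deg){

  float result = 0;
  int sign = 1;

  // sin(-x) = -sin(x)
  if (deg < 0){
    deg = -deg;
    sign = -1;
  }

  // Wrap the angle into 0..3599.
  while (deg >= 3600)
    deg -= 3600;

  // Between 0 and 90 degrees.
  if ((deg >= 0) && (deg <= 900))
    result = SIN_TABLE[deg / 5];

  // Between 90 and 180 degrees: sin(x) = sin(180 - x).
  else if ((deg > 900) && (deg <= 1800))
    result = SIN_TABLE[(1800 - deg) / 5];

  // Between 180 and 270 degrees: sin(x) = -sin(x - 180).
  else if ((deg > 1800) && (deg <= 2700))
    result = -SIN_TABLE[(deg - 1800) / 5];

  // Between 270 and 360 degrees: sin(x) = -sin(360 - x).
  else if ((deg > 2700) && (deg <= 3600))
    result = -SIN_TABLE[(3600 - deg) / 5];

  return sign * result;
}
[/sourcecode]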

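[sourcecode language="cpp"]
// deg is in tenths of a degree; uses the same SIN_TABLE as fsin().
float fcos(int deg){
  float result = 0;

  // cos(-x) = cos(x)
  if (deg < 0)
    deg = -deg;

  // Wrap the angle into 0..3599.
  while (deg >= 3600)
    deg -= 3600;

  // Between 0 and 90 degrees: cos(x) = sin(90 - x).
  if ((deg >= 0) && (deg <= 900))
    result = SIN_TABLE[(900 - deg) / 5];

  // Between 90 and 180 degrees: cos(x) = -sin(x - 90).
  else if ((deg > 900) && (deg <= 1800))
    result = -SIN_TABLE[(deg - 900) / 5];

  // Between 180 and 270 degrees: cos(x) = -sin(270 - x).
  else if ((deg > 1800) && (deg <= 2700))
    result = -SIN_TABLE[(2700 - deg) / 5];

  // Between 270 and 360 degrees: cos(x) = sin(x - 270).
  else if ((deg > 2700) && (deg <= 3600))
    result = SIN_TABLE[(deg - 2700) / 5];

  return result;
}
[/sourcecode]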

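[sourcecode language="cpp"]
// ACOS_TABLE holds byte values; multiplying by 0.00616 gives radians
// (255 * 0.00616 ~= PI/2). The table is sampled more densely near 1.0,
// where acos() changes fastest. DEC4 = 10000 (so 0.9 * DEC4 = 9000).
float facos(float num){

  float rads = 0;
  bool negative = false;

  // Get sign of input: acos(-x) = PI - acos(x).
  if (num < 0){
    negative = true;
    num = -num;
  }

  // num between 0 and 0.9: one entry every 0.0079 (indices 0..114).
  if ((num >= 0) && (num < 0.9))
    rads = (float)ACOS_TABLE[(int)(num*DEC4/79 + 0.5)] * 0.00616;

  // num between 0.9 and 0.99: one entry every 0.0008 (indices 114..227).
  else if ((num >= 0.9) && (num < 0.99))
    rads = (float)ACOS_TABLE[(int)((num*DEC4 - 9000)/8 + 0.5) + 114] * 0.00616;

  // num between 0.99 and 1.0: one entry every 0.0002 (indices 227..277).
  else if ((num >= 0.99) && (num <= 1))
    rads = (float)ACOS_TABLE[(int)((num*DEC4 - 9900)/2 + 0.5) + 227] * 0.00616;

  // Account for the negative sign if required.
  if (negative)
    rads = PI - rads;

  return rads;
}
[/sourcecode]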

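[sourcecode language="cpp"]
// atan2 from sqrt() and acos(): the angle of (adj, opp) is acos(adj / r),
// negated when opp < 0. Uses the built-in sqrt() and facos() from above.
float fatan2(float opp, float adj){

  float hypt = sqrt(adj * adj + opp * opp);

  // Avoid dividing by zero; atan2(0, 0) returns 0.
  if (hypt == 0)
    return 0;

  float rad = facos(adj / hypt);

  if (opp < 0)
    rad = -rad;

  return rad;
}
[/sourcecode]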

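[sourcecode language="cpp"]
// Integer square root (digit-by-digit, base 4): returns floor(sqrt(number))
// for a 32-bit unsigned long (as on AVR), using only shifts and additions.
unsigned long fsqrt(unsigned long number){
  unsigned long root = 0;
  unsigned long bit = 1UL << 30;

  // Bit starts at the highest power of four <= the input number.
  while (bit > number) bit >>= 2;

  while (bit != 0){
    if (number >= root + bit){
      number -= (root + bit);
      root += (bit << 1);
    }
    root >>= 1;
    bit >>= 2;
  }

  return root;
}
[/sourcecode]
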
You can now find the library on GitHub (not my account).



Dj_Garfield 12th May 2015 - 11:43 am

OMG! I will investigate this "Math Lib" for my antenna tracker, because I have to compare 2 GPS coordinates to get the angle between the antenna tracker's reference heading and the UAV's GPS location; this gives the angle to rotate the antenna around the pan axis :) The same goes for the elevation between the antenna tracker and the UAV. Thanks to Mr Pythagoras :)
This will be nice, because GPS coordinates are in degrees :)

Francesco 5th May 2015 - 4:40 pm

Hi Oscar,
you did good work. I'm working on a firmware for Arduino to control a CNC machine (sourceforge.net/projects/easycnc/) and my problem is that the trigonometric functions of the math lib are too heavy and slow down execution. I was thinking of reimplementing them as you did, but fortunately I searched the web first.
Could you provide the sources of the lib?
I’ll promote your work in my project.

Oscar 6th May 2015 - 12:48 pm

There is a link to the GitHub source code at the bottom of the post. :)

Mohit 12th January 2015 - 12:50 pm

Really struggling with the math.h library’s speed. Would love it if you could share your library with me :-)


Sud 14th May 2014 - 7:10 am

This sounds great, can I be emailed the library please? And what licensing will be needed if I use this in a commercial project?

Misha 11th January 2014 - 7:18 am

It all sounds very good! But where can I get the library? :)

Eugene 21st November 2013 - 10:48 am

I’m trying to optimize math as well.
Did you try dividing by 4 or 8 instead, replacing the division with a shift operation?

Also, can you send me the library?

mike 9th November 2013 - 6:25 pm

Please let me know, if you could go into further detail with this one.
Or send me the thing for lookup/improving/adapting.
I'm doing a quadcopter Arduino thing and need some extensive calculus in control theory^^.

Thanks for sharing.
If you are that kind: [email protected] (dot) at
and send me the whole thing. If not, I still thank you!