The formula for the euclidean distance between two points, in three dimensions is:

d3 = (X2 + Y2 + Z2)1/2

In two dimensions its:

d2 = (X2 + Y2)1/2

Computing this function requires a square root, which even on many computers is expensive ( but not always, for example consider SSE instructions ). Even when implemented in hardware it is usually solved by iterative approximation - which means the computer enters a loop approximating the square root until it is within a given error margin. If you are working on more limited hardware which does not have a square root function implemented in hardware, then computing even a small number of square roots may be prohibitive.

For many computations it is possible to compare square distances and so to avoid doing square roots at all. For example, if your computation is concerned only with comparing distances, then comparing square distances is equivalent. However, there are computations where you really do need to get distances - say if you need to normalize vectors, or you are implementing a collision system using spherical bounding volumes.

In these cases, if you cannot afford to compute the standard distance function, there are classes of functions which give a pretty good approximation to the distance function and which are composed entirely of easy to compute linear pieces. In theory you can keep making a linear approximation of a non-linear function arbitrarily accurate by adding more pieces. In fact this is not unlike what the square root function does when implemented on computers. The functions I will be discussing can generally approximate the distance within 2.5% average error, and with about a 5% maximum error with a small number of linear components and are therefore very fast to run. They can be adjusted to trade off maximum error for average error and you can construct functions that will move the error to certain places.

euclidean distance plot
First lets look at a 3d plot of the true distance function. This 3d patch represents a euclidean distance function where x and y range from -1 to 1. The height of the plot represents the negative distance. So when x and y are equal to zero, the distance is equal to zero, and that is the peak of the cone. Increasing values of x and y correspond to points lower down on the cone. The cross section of this plot would look like a circle.

euclidean distance plot
euclidean distance plot
Here are plots of the maximum and minimum of the absolute value of x and y, where x and y range from -1 to 1. The maximum plot is a 4 sided pyramid. The point where x and y are equal to zero corresponds to the top of the pyramid. Its cross section is a square. You can see that the plot for the minimum tends to operate as an inverse of the pyramid. Its cross section looks like a cross.

euclidean distance plot
Therefore if you make a linear combination of these two functions you get a pretty good approximation to the distance function. Different scaling values for min and max will give you approximations with different properties. This one is balanced to produce the smallest possible maximum error. You can also balance the function to produce a minimum average error, or minimum average square error. You can see that the places of maximum error are where either x or y approach zero. That is, these are the places where the shape of the cross section tends to deviate most from the ideal circle. ( It is possible to reduce the error at these points to zero by adjusting the scaling factors, but at great cost to accuracy everywhere else ).

euclidean distance plot
You can improve the accuracy of this function by adding a term specifically designed to correct error at these points. In this case we change the scaling of this function only where the larger of x or y, is greater than 16 times the value of the smaller of x or y. You can see in the plot of this new function that the sharp indentations where x or y approach zero are somewhat rounded out. You can continue to process further reducing the amount of error until it is suitably accurate for your purposes. Of course as you add linear pieces to the function it becomes more expensive to compute.

It is easy to demonstrate that for any approximate distance function that you want to work with the minimum and maximums of the absolute values of x and y. Consider that for the true distance function, for any given distance X you get points on a circle with radius X. So you are essentially trying to make an approximation of a circular graph using straight lines. Notice that taking the absolute value of x and y does not change the result of the distance function. Nor does swapping x and y. By first taking the absolute values of x and y, you are reducing the problem to approximating only one quadrant of the circle. By further considering the mininum and maximum of x and y you are constraining the problem to just one octant of a circle. This is a small enough arc that you can get a pretty good approximation with just a few straight lines.

These functions are very easy to compute, and they can be computed on modern hardware in constant time. Notice that the coefficients I am using are expressed as fractions of 1024. This means that you can implement this function without using only integer registers and a few multiplies and then finally scale the result down by shifting. Expressing your coefficients using a denominator that is a power of two will let you do this. How many bits you use will depend on the size of your registers and how much accuracy you need. Here is an example implementation of one of the given approximation functions:

u32 approx_distance( s32 dx, s32 dy ) { u32 min, max, approx; if ( dx < 0 ) dx = -dx; if ( dy < 0 ) dy = -dy; if ( dx < dy ) { min = dx; max = dy; } else { min = dy; max = dx; } approx = ( max * 1007 ) + ( min * 441 ); if ( max < ( min << 4 )) approx -= ( max * 40 ); // add 512 for proper rounding return (( approx + 512 ) >> 10 ); }

It is also possible to implement this approximation without even using multiplies if your hardware is that limited:

u32 approx_distance( s32 dx, s32 dy ) { u32 min, max; if ( dx < 0 ) dx = -dx; if ( dy < 0 ) dy = -dy; if ( dx < dy ) { min = dx; max = dy; } else { min = dy; max = dx; } // coefficients equivalent to ( 123/128 * max ) and ( 51/128 * min ) return ((( max << 8 ) + ( max << 3 ) - ( max << 4 ) - ( max << 1 ) + ( min << 7 ) - ( min << 5 ) + ( min << 3 ) - ( min << 1 )) >> 8 ); }

Solution in 3 Dimensions

We can derive a similar approximation in 3 dimensions. First we note that we can express the 3 dimensional distance function in terms of the two dimensional version: d3 = (X2 + Y2 + Z2)1/2 d2 = (X2 + Y2)1/2 Therefore: d3 = (d2( X, Y )2 + Z2)1/2 Then: d3 = d2( d2( X, Y ), Z) The function approx_distance from above is an approximation for d2. We can make a trivial implementation of an approximate d3 by just plugging in the approximation function to match the formula above:

u32 approx_distance_3( s32 dx, s32 dy, s32 dz ) { return approx_distance( approx_distance( dx, dy ), dz ); }

There are a number of ways in which this function is not optimal. Since this approx_distance tends to give the greatest error when one of the numbers approaches zero, it would be better to pass the two smallest values to the first call of approx_distance, and then use the result of that and the largest element to the second call. We can also do better if we find new scaling values spefically balanced for the three dimensional case. Of course we do better still by writing it all as one function to save ourselves the overhead of calling two functions. So again what we want to do first is simplify the problem by mapping all the parameters into positive numbers, and then figuring out which is the biggest and the smallest. Then we make a linear combination of the three numbers as we did in the two dimensional case, and we search for new scaling parameters. The function given below is optimized to minimze square error, and has a mean square error of 4.31% compared to the true distance.

u32 approx_distance_3( s32 dx, s32 dy, s32 dz ) { u32 min, med, max, approx; if ( dx < 0 ) dx = -dx; if ( dy < 0 ) dy = -dy; if ( dz < 0 ) dz = -dz; if ( dx < dy ) { min = dy; med = dx; } else { min = dx; med = dy; } if ( dz < (s32)min ) { max = med; med = min; min = dz; } else if ( dz < (s32)med ) { max = med; med = dz; } else { max = dz; } assert( min <= med ); assert( med <= max ); approx = ( max * 860 ) + ( med * 851 ) + ( min * 520 ); if ( max < ( med << 1 )) approx -= ( max * 294 ); if ( max < ( min << 2 )) approx -= ( max * 113 ); if ( med < ( min << 2 )) approx -= ( med * 40 ); // add 512 for proper rounding return (( approx + 512 ) >> 10 ); }
















A note about modern PC processors with SSE:

This code was developed for GameBoy Advance, and on that platform it is way faster to approximate distances the way I describe in this article than doing a proper square root using the floating point emulation library. But it was brought to my attention that processors supporting SSE (P3 and above etc.) have a very fast square root instruction that runs in 8-10 cycles. In addition you can run up to 4 of these square root instructions in parallel, so there are even more gains there. Here is some example assembler code to implement the square root using SSE:

Float Result, X = 4.5;

__asm {
   movss   xmm0, [X]
   rsqrtss xmm1, xmm0
   mulps   xmm1, xmm0
   movss   [Result], xmm1
}

I ran a few tests, and on modern PC's this asm code is in fact way faster than the approximation in this article.