I was writing some tests today and I ran into a peculiar floating point issue. I had generated a sequence of numbers using numpy.linspace:
>>> np.linspace(0.1, 1, 10)
array([ 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1. ])
Part of the code I was testing ended up checking whether the value 0.3 was in the range 0.3 to 0.8, including the endpoints. The answer should of course be yes, but there is a twist due to the actual values in the array returned by linspace:
>>> a = np.linspace(0.1, 1, 10)
>>> 0.3 in a
False
>>> 0.3 < a[2]
True
What’s happening is that the 0.3 returned by linspace is really 0.30000000000000004, but the 0.3 I get when I type 0.3 is really 0.29999999999999999. It’s not clear whether this situation would ever actually arise in normal usage of the code I was testing, but I wanted to make sure it wouldn’t cause problems. My solution was to write a function that tests whether a value is in a given range, with a tiny bit of fuzziness at the edges.
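One way to see the two values exactly is to convert the floats to decimal.Decimal, which shows the full value that is actually stored. This is only a sketch: it uses 0.1 + 0.2 as a stand-in for the linspace element (that sum is a well-known way to land on 0.30000000000000004), and the helper name exact is made up for illustration.

```python
from decimal import Decimal


def exact(x):
    """Return the exact decimal expansion of the double stored for x."""
    return Decimal(x)


typed = 0.3            # the literal 0.3 rounds slightly below three tenths
computed = 0.1 + 0.2   # stand-in (assumption) for the value linspace produced

print(exact(typed))
print(exact(computed))

# The literal sits just under three tenths and the computed value just over it.
print(exact(typed) < Decimal("0.3") < exact(computed))
```

Neither float is exactly three tenths; they are simply the two doubles that straddle it.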
NumPy has a useful function called allclose for comparing floating point values within tolerances. But that’s for testing equality; I need fuzzy (but not very fuzzy) less than / greater than comparisons. To provide just that little bit of fuzziness I turned to the numpy.nextafter function.
nextafter gives the next representable floating point number after the first input value. The second input value controls the direction so you can get the next value either up or down. It turns out that the two numbers that are tripping me up are right next to each other in their floating point representation:
>>> np.nextafter(0.29999999999999999, 1)
0.30000000000000004
>>> np.nextafter(0.30000000000000004, 0)
0.29999999999999999
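As a quick check that these two really are neighbours, the gap between them should be exactly one ULP at 0.3. A minimal sketch, assuming numpy.spacing (which reports the distance from a value to the adjacent representable number) is an acceptable way to measure that gap:

```python
import numpy as np

below = 0.3                      # what the literal 0.3 is stored as
above = np.nextafter(below, 1)   # next representable double toward 1

# np.spacing(x) is the distance from x to the nearest adjacent number.
ulp = np.spacing(below)

print(above - below == ulp)      # compares the gap against one ULP
print(above == 0.1 + 0.2)        # compares the neighbour against 0.1 + 0.2
```

The size of an ULP depends on the magnitude of the number, which is why stepping with nextafter is more convenient here than adding a fixed tolerance.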
So to catch this case my range checking function only needs one ULP (unit in the last place) of fuzziness, which is not much at all, to handle this floating point error. To allow for this I wrote a function called fuzzy_between that takes a value and the lower and upper bounds of the test range and expands the test range by a couple of ULPs before doing a simple minval <= val <= maxval comparison:
import numpy as np


def fuzzy_between(val, minval, maxval, fuzz=2, inclusive=True):
    """
    Test whether a value is within some range with some fuzziness at the edges
    to allow for floating point noise.

    The fuzziness is implemented by expanding the range at each end `fuzz` steps
    using the numpy.nextafter function. For example, with the inputs
    minval = 1, maxval = 2, and fuzz = 2; the range would be expanded to
    minval = 0.99999999999999978 and maxval = 2.0000000000000009 before doing
    comparisons.

    Parameters
    ----------
    val : float
        Value being tested.
    minval : float
        Lower bound of range. Must be lower than `maxval`.
    maxval : float
        Upper bound of range. Must be higher than `minval`.
    fuzz : int, optional
        Number of times to expand bounds using numpy.nextafter.
    inclusive : bool, optional
        Set whether endpoints are within the range.

    Returns
    -------
    is_between : bool
        True if `val` is between `minval` and `maxval`, False otherwise.

    """
    # Expand each bound outward by one representable float per iteration.
    # Stepping toward -inf / +inf keeps the direction unambiguous even for
    # bounds so large that subtracting or adding 1e6 would not change them.
    for _ in range(fuzz):
        minval = np.nextafter(minval, -np.inf)
        maxval = np.nextafter(maxval, np.inf)

    if inclusive:
        return minval <= val <= maxval
    else:
        return minval < val < maxval
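Here is how the check behaves on the case that started all this. To keep the snippet runnable on its own it includes a condensed copy of fuzzy_between (docstring dropped, bounds stepped toward ±infinity, loop written with range), and it uses 0.1 + 0.2 as a stand-in for the 0.30000000000000004 that came out of linspace; both are assumptions made for illustration.

```python
import numpy as np


def fuzzy_between(val, minval, maxval, fuzz=2, inclusive=True):
    # Condensed copy of the function above, for illustration only.
    for _ in range(fuzz):
        minval = np.nextafter(minval, -np.inf)
        maxval = np.nextafter(maxval, np.inf)
    if inclusive:
        return minval <= val <= maxval
    return minval < val < maxval


lower = 0.1 + 0.2   # stand-in for the linspace element, one ULP above 0.3
upper = 0.8

plain = lower <= 0.3 <= upper                 # strict comparison on the raw bounds
fuzzy = fuzzy_between(0.3, lower, upper)      # same test with two ULPs of slack
outside = fuzzy_between(0.29, lower, upper)   # a value well below the range

print(plain, fuzzy, outside)
```

One thing to be aware of: with fuzz greater than zero, inclusive=False still accepts a value equal to one of the original endpoints, because the bounds have already been pushed outward before the strict comparison is made.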
For a great discussion on comparing floating point numbers see this randomascii post, and for some interesting discussion on the fallibility of range functions see this post on Google+ by Guido van Rossum. Guido actually calls out numpy.linspace as a range function not susceptible to floating point drift (since it’s calculating intervals, not adding numbers), but it’s always possible to get surprises with floating point numbers.