They also made a related project, Herbgrind (http://herbgrind.ucsd.edu/), a tool that can automatically find floating-point issues in real programs. You can annotate the region you're interested in with herbgrind on/off controls.
Unfortunately, once you know you have a floating-point problem, tracking it down and fixing it may be the easy part. If you don't realise you have one, you won't reach for these tools and the program will just quietly do the wrong thing :)
Very true. Floating point bugs are hard to find, because there's often no way to compute a more accurate answer. Herbgrind was our attempt at helping track down a root cause once you knew floating point was the problem---that might be the case if, for example, you changed from double to single precision and suddenly the result was way off. But Herbgrind is pretty high overhead; you'd want some other tool to tell you that floating point was a problem to begin with.
Oh nice, this is new to me. Herbgrind does dynamic analysis, but doesn't that mean it doesn't know which values should be treated as variables?
Once I hooked Herbie up to the Clang Static Analyzer to pull out expressions from code and suggest alternatives. It wasn't a great fit---CSA doesn't model floating point, and it would consider every sub-expression separately as it explored the program. But I think there's merit to the idea.
That's right, Herbgrind doesn't know which values are variables. It uses antiunification to see which values are always the same (which it assumes are constants) and which differ between invocations (which it assumes are variables).
It doesn't seem to do too well on log(1+x), though. It correctly identifies that the Taylor expansion is more accurate for small x (although in my opinion it switches back to log(1+x) a bit too soon for positive x). However, it doesn't seem to take into account the singularity at x = -1: it keeps using the Taylor series for negative values all the way down to x = -1, leading to horrendous accuracy.
That's right. Herbie's error measure is average error over the whole range of inputs; one way to think about that is that it's really measuring the area of inputs that have reasonable error. There aren't many floating-point values near -1, so Herbie doesn't consider the error there that important. To be precise, Herbie tries to avoid overfitting by "charging" itself 1 bit of accuracy for every branch; adding a branch for, say, -1 <= x <= -0.8 isn't worth it. If you set a precondition, say -1 <= x <= -0.8, Herbie will instead focus on that range.
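The two regimes described above are easy to reproduce. Here's a minimal Python sketch, treating math.log1p as a correctly rounded reference for log(1+x):

```python
import math

# Truncated Taylor series for log(1+x), as in the discussion above
def taylor(x):
    return x - x * x / 2

# For tiny x the series is extremely accurate, since the omitted
# terms are on the order of x^3:
x = 1e-9
print(abs(taylor(x) - math.log1p(x)))  # far below 1e-18

# But near the singularity at x = -1 it is hopeless:
x = -0.999
print(taylor(x), math.log1p(x))  # about -1.498 vs about -6.908
```

This only illustrates the two regimes; where Herbie actually places the branch is determined by its sampled error measure, as described above.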
Also note that the Herbie web demo has some options set that make it fast (to handle load) at the cost of lower accuracy. For example, if you download and install it yourself, you can turn on support for the special numeric functions (like log1p) or increase the number of search iterations done.
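For readers wondering why log1p matters here: the naive form log(1 + x) loses all precision for tiny x, because 1 + x rounds to exactly 1.0 before the log is ever taken. A quick Python sketch:

```python
import math

x = 1e-16  # smaller than half the machine epsilon of a double

# 1 + x rounds to exactly 1.0, so the naive form returns 0.0:
print(math.log(1 + x))   # 0.0

# log1p computes log(1+x) without forming 1+x, keeping full precision:
print(math.log1p(x))     # 1e-16
```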
I see. I guess it ends up doing something a bit unexpected because the distribution it's sampling from is quite different from the numbers people often deal with (also, unlike the example on the main page, it can apparently fail to find a replacement that is more accurate for all inputs).
Also this is a bit of a special case because it's easy to show that x - x^2 / 2 is within 10^-15 of the true value provided abs(x) is within 10^-6 or so, so it's easy to figure out how good Herbie is doing.
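That bound is easy to sanity-check numerically. A small Python sketch, again using math.log1p as the reference value:

```python
import math
import random

random.seed(0)
for _ in range(10_000):
    x = random.uniform(-1e-6, 1e-6)
    approx = x - x * x / 2
    # the first omitted Taylor term is x^3/3, at most ~3.4e-19 here,
    # so the error is comfortably within 1e-15:
    assert abs(approx - math.log1p(x)) < 1e-15
```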
That's right, the implicit distribution is uniform over floating-point values, so for example about a quarter of samples are between -1e-150 and 1e-150.
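One way to picture that distribution: draw a uniform 64-bit pattern and reinterpret it as a double. This is a sketch of the idea, not Herbie's actual sampler, which may differ in details such as how it treats NaN and infinity:

```python
import math
import random
import struct

random.seed(0)

def random_double():
    # Uniform over bit patterns = uniform over floating-point values;
    # skip the rare NaN/infinity patterns.
    while True:
        bits = random.getrandbits(64)
        (x,) = struct.unpack('<d', struct.pack('<Q', bits))
        if math.isfinite(x):
            return x

samples = [random_double() for _ in range(20_000)]
tiny = sum(abs(x) < 1e-150 for x in samples)
print(tiny / len(samples))  # roughly 0.25, i.e. about a quarter
```

The intuition: floats are roughly logarithmically spaced, so a uniform draw over bit patterns spends a large share of its mass on tiny magnitudes.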
This seems like a great tool for certain statistical applications. I am learning a bit about the controversy surrounding extremely small p-values in Genome-Wide Association Studies (think an ANOVA with n in the millions) and machine precision. I guess tools like this would be very useful for making sure your statistical tests can accurately calculate p-values approaching the limits of machine precision.
If your p-values are approaching machine precision, then you don't need better floating-point precision; you need to be using the log-likelihood (admittedly, for Gaussian samples this is the variance, which requires some careful handling).
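To make the log-space point concrete: a double can't represent a p-value much below about 1e-308, but the log of the p-value stays perfectly representable. A Python sketch using the standard normal tail; note the tail formula used here is the standard asymptotic approximation log P(Z > z) ~ -z^2/2 - log(z*sqrt(2*pi)), not an exact value:

```python
import math

z = 50.0  # an enormous z-score, e.g. a very strong GWAS hit

# The p-value itself underflows to exactly 0.0 in double precision:
p = math.erfc(z / math.sqrt(2))  # two-sided tail P(|Z| > z)
print(p)  # 0.0

# Working in log space keeps the information:
log10_p = (-z * z / 2 - math.log(z * math.sqrt(2 * math.pi))) / math.log(10)
print(log10_p)  # about -545, i.e. p on the order of 1e-545
```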
Also if your p-values are approaching machine precision, then maybe p-values aren't the most meaningful metric.
I totally agree, and in fact in big scientific code this is an ongoing reproducibility issue---different platforms (like GPUs) can have subtly different floating-point behavior, and how do you know the results hold up?
That said, your floating-point code probably didn't come with a numerical analysis to begin with, so how do you know it's better than the replacement? (Well... Herbie does dumb stuff sometimes, so do use with care.)
I think it is probably easier to just use it on specific expressions and then spot-check the results.
It isn't something like a compiler plugin that automatically rewrites your code, it just spits out alternative formulations when you give it a math expression...
It also can show you a graph of the error as a function of input value, which is pretty helpful.
I think it would be interesting to re-run experiments / data analyses (ones sensitive to FP correctness) to see what holds up. I imagine that would be a pain though; reproducibility isn’t usually easy.