Herbie: Automatically Improving Floating Point Accuracy

(herbie.uwplse.org)

152 points | by lelf 1358 days ago

7 comments

  • vii 1357 days ago
    They made a related project - http://herbgrind.ucsd.edu/ - a tool that can automatically find floating point issues in real programs. You can also annotate the region you're interested in with herbgrind on/off controls.

    Unfortunately, once you know you have a floating point problem, tracking it down and fixing it may be easy. If you don't realise you have one, you won't reach for these tools and the program will just do the wrong thing :)

    • pavpanchekha 1357 days ago
      Very true. Floating point bugs are hard to find, because there's often no way to compute a more accurate answer. Herbgrind was our attempt at helping track down a root cause once you knew floating point was the problem---that might be the case if, for example, you changed from double to single precision and suddenly the result was way off. But Herbgrind is pretty high overhead; you'd want some other tool to tell you that floating point was a problem to begin with.
    • rgovostes 1357 days ago
      Oh nice, this is new to me. Herbgrind does dynamic analysis, but doesn't this mean it doesn't know which values are to be treated as variables?

      Once I hooked Herbie up to the Clang Static Analyzer to pull out expressions from code and suggest alternatives. It wasn't a great fit---CSA doesn't model floating point, and it would consider every sub-expression separately as it explored the program. But I think there's merit to the idea.

      • pavpanchekha 1357 days ago
        That's right, Herbgrind doesn't know which values are variables. It uses antiunification to see which values are always the same (which it assumes are constants) and which differ between invocations (which it assumes are variables).
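        (A minimal Python sketch of that heuristic, not Herbgrind's actual implementation: operand values that are identical across every recorded invocation get treated as constants, and values that differ get treated as variables. The `classify_operands` helper and the trace format here are made up for illustration.)

```python
def classify_operands(invocations):
    """invocations: list of dicts mapping operand slot -> value observed.

    A slot whose value never changes across invocations is assumed to be
    a constant; a slot whose value varies is assumed to be a variable.
    """
    classification = {}
    for slot in invocations[0]:
        seen = {inv[slot] for inv in invocations}
        classification[slot] = "constant" if len(seen) == 1 else "variable"
    return classification

# Three recorded evaluations of some expression a*x + b:
traces = [
    {"a": 2.0, "x": 0.1, "b": 1.0},
    {"a": 2.0, "x": 3.7, "b": 1.0},
    {"a": 2.0, "x": -5.2, "b": 1.0},
]
print(classify_operands(traces))
# a and b never change, so they look like constants; x varies.
```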
  • contravariant 1357 days ago
    It doesn't seem to be doing too well on log(1+x). It does correctly identify that the Taylor expansion is more accurate for small x (although in my opinion it switches back to log(1+x) a bit too soon for positive x). However, it doesn't take into account the singularity at x = -1 and keeps using the Taylor series for negative values all the way down to x = -1, leading to horrendous accuracy.
    • pavpanchekha 1357 days ago
      That's right. Herbie's error measure is average error over the whole range of inputs; one way to think about that is that it's really measuring the area of inputs that have reasonable error. There aren't many floating-point values near -1, so Herbie doesn't consider the error there that important. To be precise, Herbie tries to avoid overfitting by "charging" itself 1 bit of accuracy for every branch; adding a branch for, say, -1 <= x <= -0.8 isn't worth it. If you set a precondition, say -1 <= x <= -0.8, Herbie will instead focus on that range.

      Also note that the Herbie web demo has some options set that make it fast (to handle load) at the cost of lower accuracy. For example, if you download and install it yourself, you can turn on support for the special numeric functions (like log1p) or increase the number of search iterations done.
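      For anyone curious what the log1p point looks like concretely, here's a quick stdlib-Python illustration: for tiny x, 1 + x rounds to a nearby double before the log is taken, so log(1 + x) loses all its significant digits, while math.log1p sidesteps the intermediate rounding.

```python
import math

x = 1e-16
naive = math.log(1 + x)   # 1 + 1e-16 rounds to exactly 1.0, so this is 0.0
accurate = math.log1p(x)  # ~1e-16, correct to full precision

print(naive, accurate)
```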

      • contravariant 1357 days ago
        I see. I guess it ends up doing something a bit unexpected because the distribution it's sampling from is quite different from the numbers people often deal with (also unlike the example on the main page it can apparently fail to find a function that is more accurate for all inputs).

        Also this is a bit of a special case because it's easy to show that x - x^2 / 2 is within 10^-15 of the true value provided abs(x) is within 10^-6 or so, so it's easy to figure out how well Herbie is doing.
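        That bound is easy to check numerically; here's a quick Python sweep over [-1e-6, 1e-6], using math.log1p as the reference value (the next Taylor term, x^3/3, is at most ~3e-19 on this range, far below the 1e-15 bound):

```python
import math

worst = 0.0
for i in range(-20, 21):
    x = i * 5e-8                # sample points in [-1e-6, 1e-6]
    approx = x - x * x / 2      # truncated Taylor series for log(1+x)
    worst = max(worst, abs(approx - math.log1p(x)))

print(worst)   # comfortably below 1e-15
```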

        • pavpanchekha 1357 days ago
          That's right, the implicit distribution is uniform over floating-point values, so for example about a quarter of samples are between -1e-150 and 1e-150.
  • jonnycomputer 1357 days ago
    The natural application of this is in a code-inspection plugin for an IDE. Pretty cool tool!
    • pavpanchekha 1357 days ago
      There used to be GHC and Rust plugins. I don't know much about building IDE plugins, but I agree it'd be a great use of Herbie.
  • ImaCake 1357 days ago
    This seems like a great tool for certain statistical applications. I am learning a bit about the controversy surrounding extremely small p-values in Genome Wide Association Studies (think an ANOVA with n in the millions) and machine precision. I guess tools like this would be very useful for making sure your statistical tests can accurately calculate p-values approaching the limits of machine precision.
    • contravariant 1357 days ago
      If your p-values are approaching machine precision then you don't need better floating point precision; you need to be using the log-likelihood (admittedly, for Gaussian samples this comes down to the variance, which requires some careful handling).

      Also if your p-values are approaching machine precision, then maybe p-values aren't the most meaningful metric.
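      To make the log-likelihood point concrete: in stdlib Python, the upper-tail normal p-value for a z-score of 40 underflows double precision outright, while its logarithm, via the standard asymptotic expansion log p ≈ -z²/2 - log z - log(2π)/2, stays perfectly representable. (An illustrative sketch, not GWAS-specific code.)

```python
import math

z = 40.0
# Direct computation: the true p-value is ~1e-350, below the smallest
# subnormal double (~5e-324), so this underflows to exactly 0.0.
naive_p = 0.5 * math.erfc(z / math.sqrt(2))

# Working in log space instead keeps the value finite and usable.
log_p = -z * z / 2 - math.log(z) - math.log(2 * math.pi) / 2

print(naive_p, log_p / math.log(10))   # 0.0 and about -349 (log10 of p)
```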

  • b34r 1357 days ago
    I wouldn’t feel remotely comfortable using this on an existing project unless it had an iron-clad test suite.
    • pavpanchekha 1357 days ago
      I totally agree, and in fact in big scientific codes this is an ongoing reproducibility issue---different platforms (like GPUs) can have subtly different floating-point behavior, and how do you know the results hold up?

      That said, your floating-point code probably didn't come with a numerical analysis to begin with, so how do you know it's better than the replacement? (Well... Herbie does dumb stuff sometimes, so do use with care.)

    • spott 1357 days ago
      I think it is probably easier to just use it for specific instructions and then spot check the results.

      It isn't something like a compiler plugin that automatically rewrites your code; it just spits out alternative formulations when you give it a math expression...

      It also can show you a graph of the error as a function of input value, which is pretty helpful.

  • mrnuclear 1357 days ago
    I think it would be interesting to re-run experiments / data analyses (ones sensitive to FP correctness) to see what holds up. I imagine that would be a pain though; reproducibility isn’t usually easy.
  • bionhoward 1357 days ago
    We could have a field day applying this to all the PyTorch and TensorFlow Probability distributions