Forecasting in Python with Prophet

(mode.com)

151 points | by pplonski86 1889 days ago

5 comments

  • Tarq0n 1889 days ago
    A large forecasting competition called M4 [1] recently published its results. If you're interested in forecasting, I suggest checking out the summary paper. [2]

    Highlights include:

    * Pure ML methods are still not competitive with statistical models;

    * Ensembles perform better than any single model, an important difference from the last competition;

    * Very simple benchmarks can perform very well in this type of competition.

    The top three entries included multiple statistical models feeding into an RNN (by an Uber engineer), another ensemble using XGBoost for the final layer, and a combination of purely statistical methods with a clever weighting scheme.

    If you're interested in making production-level predictions, it's probably a good idea to ensemble Prophet with other methods (rough sketch below the links).

    [1] https://en.wikipedia.org/wiki/Makridakis_Competitions

    [2] https://www.scribd.com/document/382185710/IJF-Published-M4-P...
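
    As a rough illustration of that kind of ensembling, here's a minimal sketch that just averages Prophet with a seasonal-naive benchmark (the equal weights and the naive baseline are arbitrary illustrative choices; real weights would be tuned on a holdout set):

        import numpy as np
        from fbprophet import Prophet  # package name at the time of writing

        def ensemble_forecast(df, horizon=28):
            # df: pandas DataFrame with Prophet's 'ds'/'y' columns, daily data

            # Model 1: Prophet point forecast
            m = Prophet()
            m.fit(df)
            future = m.make_future_dataframe(periods=horizon)
            yhat_prophet = m.predict(future)["yhat"].tail(horizon).values

            # Model 2: seasonal-naive benchmark (repeat the last observed week)
            last_week = df["y"].tail(7).values
            yhat_naive = np.tile(last_week, horizon // 7 + 1)[:horizon]

            # Equal-weight average of the two point forecasts
            return (yhat_prophet + yhat_naive) / 2.0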

    • wenc 1889 days ago
      Those highlights match my field experience.

      I've found that ensembles aren't necessarily more accurate for individual forecasts per se, but in terms of aggregate error over the long run they end up being less wrong (the bias-variance tradeoff).

  • freeradical13 1889 days ago
    For those looking for a straightforward introduction to forecasting, see the free e-book, Forecasting: Principles and Practice by Hyndman and Athanasopoulos: https://otexts.com/fpp2/
    • muraiki 1889 days ago
      That's an excellent resource that I've used heavily (along with their awesome forecast package for R). I also found QuantStart's series on Time Series Analysis (scroll down a bit: https://www.quantstart.com/articles) quite helpful, and it goes a little deeper into the math, which gave me a better understanding of why certain things are done.
  • andrewljohnson 1889 days ago
    Is Prophet useful for forecasting subscription revenue?

    Say you have multiple product lines like iOS-1month, iOS-1year, iOS-1year-premium, Android-1month, web-5years, etc. And you have historical numbers for new users and retention for each product line.

    I looked at using Prophet for this, but instead I made a giant spreadsheet and modeled my assumptions around seasonality, retention, and new user growth. I wasn't totally sure, but it seemed like Prophet wasn't intended for my use case, where my assumptions can be stated very explicitly, and I want to twiddle the knobs to look at various scenarios.

    • pplonski86 1889 days ago
      It should be useful for forecasting subscription revenue. I can help you with this for free! I'm building an AutoML solution, and this is an interesting use case for me. Please email me if you'd like help with it.
  • maliker 1889 days ago
    We've enjoyed using Prophet for forecasting in our work with electric utilities. It's pretty fire-and-forget. We wish it had an option for fast retraining of the model when we get new hourly data, though. That's pushing us towards other options in scipy.
    • wenc 1889 days ago
      I looked at Prophet a few months ago because we needed a fire-and-forget library similar to 'auto.arima' (R "forecast" package) for Python, but no good candidates existed.

      However, I found Prophet to be computationally a little heavier than auto.arima because it uses Stan (Bayesian) underneath, which in turn uses an MCMC-type approach and has quite a few dependencies. We needed fast model retraining as well, and at the time that didn't seem to be something it excelled at. (That might have changed; I'm not sure.)

      I ended up putting together a simple ensemble forecast model class with "statsmodels" which automatically selected/averaged the best models over a collection of model types via heuristics and cross-validation. It works OK, but I'm still waiting for someone to port R's auto.arima to Python. (I tried rpy, which in theory should have worked, but I struggled with the impedance mismatch.)
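
      For flavor, the core of it was roughly shaped like this (simplified sketch; the candidate set and fold logic here are illustrative, not the real thing):

          import numpy as np
          from statsmodels.tsa.holtwinters import ExponentialSmoothing
          from statsmodels.tsa.statespace.sarimax import SARIMAX

          # Candidate model types: each maps (train series, horizon) -> forecast
          CANDIDATES = {
              "ses":   lambda y, h: ExponentialSmoothing(y).fit().forecast(h),
              "holt":  lambda y, h: ExponentialSmoothing(y, trend="add").fit().forecast(h),
              "arima": lambda y, h: SARIMAX(y, order=(1, 1, 1)).fit(disp=False).forecast(h),
          }

          def select_best(y, h=12, n_folds=3):
              # Rolling-origin cross-validation: score each candidate on the
              # last n_folds holdout windows and keep the lowest-error model.
              errors = {name: 0.0 for name in CANDIDATES}
              for fold in range(n_folds):
                  cut = len(y) - (n_folds - fold) * h
                  train, test = y[:cut], y[cut:cut + h]
                  for name, fit_forecast in CANDIDATES.items():
                      pred = np.asarray(fit_forecast(train, h))
                      errors[name] += np.mean((np.asarray(test) - pred) ** 2)
              best = min(errors, key=errors.get)
              return best, CANDIDATES[best](y, h)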

      • venuur 1889 days ago
        I found this port of auto-arima for Python. I haven’t used it in production, but it was easy to test on some demo data. https://pypi.org/project/pyramid-arima/
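
        Something like this was enough to kick the tires (the toy series below is made up):

            import numpy as np
            from pyramid.arima import auto_arima  # later renamed pmdarima

            y = np.sin(np.linspace(0, 20, 200)) + np.random.randn(200) * 0.1
            model = auto_arima(y, seasonal=False, suppress_warnings=True)
            print(model.predict(n_periods=10))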
        • wenc 1887 days ago
          Thanks, I'll take a look.
      • Tarq0n 1888 days ago
        You only need to do MCMC if you want simulated confidence intervals; it doesn't do anything for model estimation.
        • wenc 1887 days ago
          You’re right. It uses Stan to do MAP for model estimation and HMC for (optional) computation of confidence intervals. [1]

          Thanks for the correction.

          I need to look at Prophet again. Maybe it will work for fast model retraining. I had trouble the last time I tried it.

          [1] https://research.fb.com/prophet-forecasting-at-scale/
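
          If I'm reading the docs right, the switch is just the mcmc_samples argument:

              from fbprophet import Prophet

              # Default mcmc_samples=0: Stan finds the MAP estimate only (fast)
              m_map = Prophet()

              # mcmc_samples > 0: full HMC sampling over all parameters
              # (much slower), giving posterior uncertainty estimates
              m_hmc = Prophet(mcmc_samples=300)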

  • minimaxir 1889 days ago
    It's been a while since my undergrad stats classes, but doesn't using a Box-Cox transform on an objective metric, fitting the dataset to the transformed metric, and then inverting it violate model assumptions? (Or maybe that's just in the case of a linear regression and not Prophet's approach.)
    • wenc 1889 days ago
      I believe that as long as the transformation function is invertible (no information is lost), it ought to be a valid approach. The Box-Cox function, as far as I can tell, is invertible.

      If a transformation f has the property (f^-1 o f)(x) = x, then (f^-1 o g o f)(x) should be valid -- though how to rigorously demonstrate this in all cases evades me for now. (There are some conditions that need to be satisfied.)

      Rob Hyndman, the author of the "forecast" package in R, also thinks it's ok (slide 17 [1])

      [1] https://robjhyndman.com/talks/RevolutionR/7-Transformations....
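
      Invertibility is easy to check numerically, since scipy ships both directions (quick sanity check; Box-Cox requires strictly positive data):

          import numpy as np
          from scipy.stats import boxcox
          from scipy.special import inv_boxcox

          y = np.random.lognormal(size=100)              # strictly positive data
          y_t, lmbda = boxcox(y)                         # lambda fit by max likelihood
          assert np.allclose(inv_boxcox(y_t, lmbda), y)  # (f^-1 o f)(y) == y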

    • tfehring 1889 days ago
      It just establishes new model assumptions. In the linear regression case, fitting boxcox(y) = βX + ε and predicting with boxcox^(-1)(yhat) produces different predictions than fitting y = βX + ε (minimizing Σ(boxcox(y) - boxcox(yhat))^2 does not minimize Σ(y - yhat)^2 in general). Either model would be an approximation, of course, and it's possible for either to produce more accurate predictions than the other for a given set of observations.

      I'm not familiar enough with Prophet to know whether the same logic applies here, though I'd hazard a guess that it does.
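
      A concrete way to see the first point, using the log (lambda = 0) case of Box-Cox: the best constant prediction under squared error on the log scale back-transforms to the geometric mean, while on the original scale it's the arithmetic mean, and the two differ for any non-constant positive data:

          import numpy as np

          y = np.array([1.0, 1.0, 100.0])
          best_log_scale = np.exp(np.mean(np.log(y)))  # geometric mean, ~4.64
          best_raw_scale = np.mean(y)                  # arithmetic mean, 34.0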

      • yorwba 1889 days ago
        Right. In particular, using least squares is justified by the assumption that errors are normally distributed, in which case least squares yields the maximum-likelihood estimate. Because boxcox is intended to transform a random variable into one that is normally distributed, the model assumptions are actually more likely to be satisfied if you do regression on the transformed values. (Which was probably the reason boxcox was invented in the first place.)
    • oarabbus_ 1889 days ago
    For those of us for whom it's also been a while since undergrad stats classes, but who clearly retained much less of it than you did, could you elaborate a bit here?

    Shit, I've forgotten everything; I'm googling model assumptions, the Box-Cox transform, the works.

    • CapmCrackaWaka 1889 days ago
      Yes, it does. However, it can usually be corrected by an adjustment factor. The same thing happens when you take the log of your target variable, run a regression with RMSE as your loss, and then try to get your predictions by taking e^prediction. Going back from the normal (log-scale) prediction to the lognormal original scale requires an offset, in this case exp(variance(log preds - log target)/2). Box-Cox (of which log is the lambda = 0 special case) would require a similar offset, but I don't know how to calculate it.

      https://en.wikipedia.org/wiki/Log-normal_distribution

      • wenc 1889 days ago
        > Yes, it does.

      I'm trying to follow your explanation, but help me understand better: which model assumptions would it violate?

      Typical assumptions for time-series forecasting are stationarity (differencing is used to achieve this where it doesn't already hold) and residuals that are homoskedastic (constant variance) and normally distributed. In fact, the primary use of Box-Cox is to stabilize the variance and make the data more normal. Box-Cox itself doesn't violate any model assumptions, and forecasting on Box-Cox-transformed data shouldn't either -- if anything, Box-Cox attempts to better satisfy the assumptions of time-series forecast models (well, as best it can -- some data just don't want to be normal).

      Now the question is whether the inverse Box-Cox (or the entire round trip) violates any model assumptions. Intuitively, I don't believe it does (and Rob Hyndman, author of a book on forecasting, agrees), but I'm not certain how to demonstrate this rigorously.

        • CapmCrackaWaka 1889 days ago
          The assumption that your target is normally distributed about the expected value. Let's say you take the log of your target and run a regression using RMSE. You now assume that log(Y) is normally distributed. Now, for a given sample, your model outputs E[log(Y)|X]. However, exp(E[log(Y)|X]) is not the same as E[exp(log(Y))|X] = E[Y|X], which is what you really want. It can be shown mathematically that in order to get to E[Y|X], you need to multiply by an offset factor, in this case exp(var(log(Y) - E[log(Y)|X])/2), which is how you would normally convert between the mean of a normal distribution and the mean of the corresponding lognormal.

          Here is a paper which addresses the issue in the introduction. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4024993/
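
          A quick simulation of that correction (parameters arbitrary):

              import numpy as np

              np.random.seed(0)
              mu, sigma = 1.0, 0.8
              y = np.exp(np.random.normal(mu, sigma, size=100000))  # lognormal

              naive = np.exp(np.mean(np.log(y)))                 # exp(E[log Y])
              corrected = naive * np.exp(np.var(np.log(y)) / 2)  # * exp(var/2)
              print(naive, corrected, y.mean())  # corrected tracks y.mean()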

          • wenc 1889 days ago
            Let me see if I understood you correctly. Let Y be a random variable representing stationary time-series data, and let g(.) be the forecast function.

            1) Log-transformation: log(Y)

            2) Forecast results of transformed data: E[g(log(Y))]

            3) Inversion of forecast results: exp(E[g(log(Y))])

            4) However, what we really want is the expectation of the transformed-forecasted-inverted results, which is E[exp(g(log(Y)))].

            Jensen's inequality states that for a convex function φ,

            φ(E[X]) <= E[φ(X)]

            And equality is attained only if X is constant or φ is affine. Since neither is (generally) the case here,

            exp(E[g(log(Y))]) < E[exp(g(log(Y)))]

            An offset factor ε is needed to correct (3) to (4)

            E[exp(g(log(Y)))] = exp(E[g(log(Y))]) * ε

            Did I get that right? (Offset changed to multiplicative factor)

            • CapmCrackaWaka 1889 days ago
              Yes, exactly. However, in this case the offset is multiplicative.
              • wenc 1888 days ago
                Understood. To add one more point: I'm noticing that the reason the above works the way it does is that most forecast algorithms output an expected value instead of a random variable, hence the results are E[g(log(Y))] rather than just g(log(Y)).

                It strikes me that if you package the entire thing as a random variable:

                Z = exp(G(log(Y)))

                and use a different kind of forecast function G : Y -> Y', where Y and Y' are normally distributed, then we don't need the multiplicative factor -- which can be difficult to calculate for an arbitrary transformation. We can just take the expected value of Z directly, i.e. E[Z] = E[exp(G(log(Y)))]. This is not done in the article, but in theory it could be.
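
                In sampling terms (a sketch; the normal draws stand in for whatever distributional forecast G would produce):

                    import numpy as np

                    # Draws of the log-scale forecast (e.g. simulated paths)
                    # rather than a single expected value
                    log_draws = np.random.normal(loc=1.0, scale=0.8, size=10000)

                    biased = np.exp(log_draws.mean())  # exp(E[.]): low, per Jensen
                    e_z = np.exp(log_draws).mean()     # E[exp(.)]: mean on original scale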