5 comments

  • sitkack 1711 days ago
    Skimmed the article, didn't read the paper if there was one. The article talks about data center links. These are rarely the failure mode; the failure mode is most often a bad config push that then brings about a gray failure. And while a route may exist, because of priority levels and the way routes are announced, part of the network is down even though a physical path exists. This is a solution to a highly constrained model, not to actual cloud computing.
    • cfors 1711 days ago
      For sure, putting another complex system in front of any sort of traffic routing adds failure modes, but the article makes a nod towards "signal quality" as the metric for traffic shifting.

      > Failure probabilities were obtained by checking the signal quality of every link every 15 minutes. If the signal quality ever dipped below a receiving threshold, they considered that a link failure.

      If the signal quality could be a higher-level construct (Layer 7 errors), this could route around bad config pushes, provided they are constrained. I'm not going to pretend that this is definitely feasible, but at least that was my first thought.
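
      Roughly, what that failure-probability bookkeeping could look like (a minimal sketch in Python; the 15-minute sampling and the receive threshold come from the quote above, the threshold value and the names are made up):

          # Sketch: per-link failure probability from periodic signal-quality
          # samples, one reading per 15-minute interval.
          RECEIVE_THRESHOLD_DBM = -28.0  # hypothetical receiver sensitivity

          def link_failure_probability(signal_samples_dbm):
              """Fraction of intervals in which the link counted as failed."""
              failed = sum(1 for s in signal_samples_dbm if s < RECEIVE_THRESHOLD_DBM)
              return failed / len(signal_samples_dbm)

      The same function would work unchanged if the samples were a Layer 7 error rate instead of optical signal quality, which is roughly the substitution I had in mind.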

  • jamez1 1711 days ago
      It just appears to apply VaR to failure rates, which is hardly a 'wall street secret'.

      The authors also completely misunderstand how VaR works - it is the minimum value at risk, not the maximum.

      > Value at Risk (VaR) [33] captures precisely these bounds. Given a probability threshold β (say β = 0.99), VaRβ provides a probabilistic upper bound on the loss: the loss is less than VaRβ with probability β.

      It is actually a probabilistic lower bound on the loss.

      • seanhunter 1711 days ago
        Value at risk is a probabilistic upper bound on the loss. See for example:

        https://www.risk.net/definition/value-at-risk-var

        "It is defined as the maximum dollar amount expected to be lost over a given time horizon, at a pre-defined confidence level. For example, if the 95% one-month VAR is $1 million, there is 95% confidence that over the next month the portfolio will not lose more than $1 million."

        • jamez1 1710 days ago
          That is incorrect; most definitions of VaR you'll see are incorrect. Think about it logically: the purpose is to quantify the tail events you've observed in the past. Practically, that would be the lower bound (as the future can surprise you more).

          Read the discussion on VaR here if you're interested in the detail: https://martinkronicle.com/aaron-browns-red-blooded-risk/

      • dmurray 1711 days ago
        I'm not sure what you're claiming here: that the authors used the wrong definition for VaR, or that they misread their own definition. Either way, they look correct to me. VaR is a probabilistic upper bound on the loss: You can be 99% sure the loss is no greater than VaRβ, though it may be much less or (in the finance case) negative.
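
        As a quick numerical check of that reading (a sketch using the textbook quantile definition of VaR, not anything from the paper; the loss distribution here is made up):

            import numpy as np

            # Treat VaR_beta as the beta-quantile of a simulated loss distribution.
            rng = np.random.default_rng(0)
            losses = rng.lognormal(mean=0.0, sigma=1.0, size=100_000)  # hypothetical losses

            beta = 0.99
            var_beta = np.quantile(losses, beta)

            # Fraction of outcomes in which the loss does NOT exceed VaR_beta.
            print(np.mean(losses <= var_beta))  # ~0.99

        The loss stays at or below VaRβ in roughly 99% of outcomes, which is exactly the "upper bound with probability β" statement quoted above.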
      • pierlu 1711 days ago
        If I understand correctly: it is a lower bound on the loss with probability 0.01, and an upper bound on the loss with probability 0.99.
      • mattrp 1711 days ago
        Network operators already use a sort of VaR strategy: where there's actual value at risk, the value customers pre-empt general bandwidth in the case of an outage. That's kind of the TE principle the authors were claiming to improve.
  • Buge 1711 days ago
    > for a target percentage of time—say, 99.9 percent—the network can handle all data traffic, so there is no need to keep any links idle. During that 0.01 percent of time, the model also keeps the data dropped as low as possible.

    99.9 + 0.01 != 100
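
    (Presumably one of the two figures is off by a factor of ten: 100 - 99.9 = 0.1, so the complement of 99.9 percent is 0.1 percent, not 0.01 percent.)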

  • thinkloop 1711 days ago
    "Wall Street secrets" to idle nodes less unnecessarily? Article writing truly is an art.