Resilience engineering: Where do I start?

(github.com)

167 points | by azhenley 11 days ago

6 comments

  • solipsism 9 days ago
    Readers beware -- this particular taxonomy of robustness vs resilience is not a pervasive or even common one. Often these terms are used completely synonymously. And often they are used with different subtleties that distinguish them.

    For example, some distinguish between the two terms in that robustness refers more to staying functional in the face of failures, whereas resilience refers more to the capability to work around failures (neither having anything to do in particular with whether the unknowns were unknown).

    The blog post author says that this taxonomy comes straight from David Woods, so there's no problem. Just keep in mind that most people don't use these terms in this particular way.
    • delias_ 9 days ago
      Outside of software, resilience engineering is an established field using this definition and disambiguating the others. Some info on the origins going back to the 70s: http://erikhollnagel.com/ideas/resilience-engineering.html. It's only the last 5-10 years that people in software have been getting involved.
      • vageli 9 days ago
        > Readers beware -- this particular taxonomy of robustness vs resilience is not a pervasive or even common one. Often these terms are used completely synonymously. And often they are used with different subtleties that distinguish them.

        > For example, some distinguish between the two terms in that robustness refers more to staying functional in the face of failures, whereas resilience refers more to the capability to work around failures (neither having anything to do in particular with whether the unknowns were unknown).

        > The blog post author says that this taxonomy comes straight from David Woods, so there's no problem. Just keep in mind that most people don't use these terms in this particular way.

        Can you go into more detail, with specific examples that highlight the differences between the two? "Working around failures" and "staying functional in the face of failures" sound borderline synonymous to me, so I'm curious how that plays out in practice.
        • fao_ 9 days ago
          > "Working around failures" and "staying functional in the face of failures" sound borderline synonymous to me

          One is going around the iceberg, the other is ensuring the Titanic can sail on with a couple of huge holes in its hull.
          • vageli 1 day ago
            If the Titanic does not hit the iceberg, it does not enter a failure mode. That doesn't sound like "working around failures" but "avoiding failure", which seems very different.
      • rdoherty 9 days ago
        This is a great overview. I would also recommend Dekker's book The Field Guide to Understanding Human Error [1]. It's a bit easier to read than Drift Into Failure, which I found to be *very* dense.

        1: https://www.amazon.com/Field-Guide-Understanding-Human-Error/dp/1472439058
        • FigmentEngine 9 days ago
          The team I work on at AWS wrote a paper on this: https://d1.awsstatic.com/whitepapers/architecture/AWS-Reliability-Pillar.pdf. It covers concepts such as Recovery Oriented Computing (ROC), etc.
          • FigmentEngine 9 days ago
            This is a good one as well; it shows how humans and our societal systems act around disasters (we are happy to live on volcanoes even after we see them blow up): The Big Ones: How Natural Disasters Have Shaped Us (and What We Can Do About Them) https://www.amazon.com/Big-Ones-Natural-Disasters-Shaped/dp/0385542704
          • I think the work by NN Taleb on Fat Tails, Black Swans and Antifragility at least deserves a mention on this list.

            Edit: also, this USCSB YouTube channel has some cool info on disaster engineering: https://www.youtube.com/channel/UCXIkr0SRTnZO4_QpZozvCCA (see also https://www.csb.gov/videos/)
            • nwhatt 9 days ago
              I love small, concise mini-syllabi like this. Just give me the big papers and set some context.
              • jonahhorowitz 9 days ago
                I've been trying to implement and apply these principles at my $job. It's so helpful to have an intro guide published with all the supplemental reading. I'm going to send this around to all my teams.
                • sexisfun 9 days ago
                  Is it specific enough to give them direction?