Resilience engineering: Where do I start?

(github.com)

167 points | by azhenley 102 days ago

6 comments

  • solipsism 100 days ago

    Readers beware -- this particular taxonomy of robustness vs resilience is not a pervasive or even common one. Often these terms are used completely synonymously. And often they are used with different subtleties that distinguish them.

    For example, some distinguish between the two terms in that robustness refers more to staying functional in the face of failures, where resilience refers more to the capability to work around failures (neither having anything to do in particular with whether the unknowns were unknown).

    The blog post author says that this taxonomy comes straight from David Woods, so there's no problem. Just keep in mind that most people don't use these terms in this particular way.

    • delias_ 100 days ago

      Outside of software, resilience engineering is an established field that uses this definition and disambiguates it from the others. Some info on the origins, going back to the 70s: http://erikhollnagel.com/ideas/resilience-engineering.html. It’s only in the last 5-10 years that people in software have been getting involved.

      • vageli 100 days ago

        > Readers beware -- this particular taxonomy of robustness vs resilience is not a pervasive or even common one. Often these terms are used completely synonymously. And often they are used with different subtleties that distinguish them.

        > For example, some distinguish between the two terms in that robustness refers more to staying functional in the face of failures, where resilience refers more to the capability to work around failures (neither having anything to do in particular with whether the unknowns were unknown).

        > The blog post author says that this taxonomy comes straight from David Woods, so there's no problem. Just keep in mind that most people don't use these terms in this particular way.

        Can you go into more detail with specific examples between the two that highlight the differences? "Working around failures" and "staying functional in the face of failures" sound borderline synonymous to me, so I'm curious how that plays out in practice.

        • fao_ 100 days ago

          > "Working around failures" and "staying functional in the face of failures" sound borderline synonymous to me

          One is going around the iceberg, the other is ensuring the Titanic can sail on with a couple of huge holes in its hull.

          • vageli 92 days ago

            If the Titanic does not hit the iceberg, it does not enter a failure mode. That doesn't sound like "working around failures" but rather "avoiding failure", which seems very different.

      • rdoherty 100 days ago

        This is a great overview. I would also recommend Dekker's book The Field Guide to Understanding Human Error [1]. It's a bit easier to read than Drift Into Failure, which I found to be very dense.

        1: https://www.amazon.com/Field-Guide-Understanding-Human-Error...

        • FigmentEngine 100 days ago

          The team I work on at AWS wrote a paper on this: https://d1.awsstatic.com/whitepapers/architecture/AWS-Reliab... It covers concepts such as Recovery Oriented Computing (ROC), etc.

        • DyslexicAtheist 100 days ago

          I think the work by NN Taleb on Fat Tails, Black Swans and Antifragility at least deserves a mention on this list.

          Edit: also, this USCSB YouTube channel has some cool info on disaster engineering: https://www.youtube.com/channel/UCXIkr0SRTnZO4_QpZozvCCA (see also https://www.csb.gov/videos/)

          • nwhatt 100 days ago

            I love small, concise mini-syllabi like this. Just give me the big papers and set some context.

            • jonahhorowitz 100 days ago

              I've been trying to implement and apply these principles at my $job. It's so helpful to have an intro guide published with all the supplemental reading. I'm going to send this around to all my teams.

              • sexisfun 100 days ago

                Is it specific enough to give them direction?