How to Safely Think in Systems

(lethain.com)

188 points | by kiyanwang 923 days ago

6 comments

  • Barrin92 923 days ago
    None of that really has anything to do with systems thinking. That models change, omit information, and don't represent reality as such is true for any kind of thinking, systems or otherwise. That's just a truism.

    At its core systems thinking is an antidote to reductionism. The essential feature of a system is that it is an indivisible whole and that its features are more than the sum of its parts. Systems lose their properties when taken apart into components, because their properties are the result of interactions between components, not the sum of the properties of components in isolation.

    That insight produces the most important practical application of systems thinking, which is that systems always need to be designed from the ground up. People should never try to improve individual metrics, but only improve parts if doing so improves the whole. That's also what most of the examples in Meadows's book, the one quoted in the piece, are about.

    It's also quite funny to bring up a transition to microservices in the piece in this context. I think microservice architectures are almost always the result of ignoring systems thinking lessons. People start to think in terms of the performance of each individual service, start to develop each service on its own terms, and start to ignore the fact that what matters is the interaction between services. It's exactly the kind of silo-ing that systems approaches try to avoid.

    • throwaway984393 923 days ago
      To expand on that microservices point, there's a particular form of systems blindness that happens when people focus on one part of a system, think they understand that one part, and then ignore all the others, as if all you need to know is what's right in front of you.

      I see it every day, in every team that I work with. The team works on component A. They can see that it's supposed to work with components B and C (maybe). So they design for those components alone. Meanwhile, components D, E, F, and G depend on components B and C. But because they don't directly work with each other, team A doesn't know that its design decisions will cause problems for D/E/F/G, and those teams don't learn about the limits of A. And they don't even think to ask. It's really nutty.

      As a systems veteran, I already know problems are going to happen purely because of not having a deeper understanding of the system. If I ask them to go out of their way to discover the interaction of the whole system, they balk (because "that's not my job"), and upper management doesn't care that nobody knows how the whole system works. It's like this at most places I've worked. We need a revolution in establishing these conventions of building components in a system, the way the 12 factor app redefined building a modern stateless app.
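To make the blindness concrete: with a purely hypothetical dependency map, a small script can show that a change to component A transitively reaches D/E/F/G even though team A never talks to those teams. A sketch (all names made up):

```python
# Hypothetical dependency map: edges point from a component to what it depends on.
deps = {
    "D": {"B"}, "E": {"B"}, "F": {"C"}, "G": {"C"},
    "B": {"A"}, "C": {"A"},
}

def impacted_by(component, deps):
    """Return every component that transitively depends on `component`."""
    # Invert the map: who depends directly on whom.
    rdeps = {}
    for c, ds in deps.items():
        for d in ds:
            rdeps.setdefault(d, set()).add(c)
    # Breadth-first walk over the reverse dependencies.
    seen, frontier = set(), [component]
    while frontier:
        nxt = []
        for c in frontier:
            for dependent in rdeps.get(c, ()):
                if dependent not in seen:
                    seen.add(dependent)
                    nxt.append(dependent)
        frontier = nxt
    return seen

print(sorted(impacted_by("A", deps)))  # ['B', 'C', 'D', 'E', 'F', 'G']
```

Team A only ever sees B and C in its own view, yet its blast radius is the whole set.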

      • secondaryacct 922 days ago
        I lead a team that went beyond this problem at an investment bank: we must make trading resilient to volume increases, whatever the system.

        It's an exhausting exercise: we must constantly join new teams, extract the source code from their grippy hands, show them how to profile, explain the global nature of the problem, and explain how doing a good job only matters if, down the line, people can still trade whatever the volume. And yes, even if it's a silly volume; what can we do.

        We were able to reach really creative solutions by dropping the absolute technical talk and just asking stupid questions: what's our upstream maximum rate, how fast are we feeding downstream, what's blocking in the middle, how do we scale? (Usually we can stop at profiling, after we find three horrendous beginner mistakes. Thank God, lol.)
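Those stupid questions amount to a back-of-the-envelope capacity model: a pipeline can never move faster than its slowest stage. A minimal sketch with invented per-stage rates:

```python
# Hypothetical per-stage capacities, in messages per second.
stages = {
    "upstream feed": 50_000,
    "enrichment":     8_000,   # the thing blocking in the middle
    "downstream out": 30_000,
}

# End-to-end throughput is bounded by the slowest stage.
bottleneck = min(stages, key=stages.get)
end_to_end = stages[bottleneck]
print(f"bottleneck: {bottleneck} at {end_to_end}/s")

# Headroom check against a "silly" target volume:
target = 20_000
print("scale work needed" if end_to_end < target else "fine for now")
```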

        • EastSmith 922 days ago
          Unoptimized code is not code with "beginner mistakes". It is simply code that needs optimisation.
          • Too 922 days ago
            Boiling water in a pot without the lid on is a beginner's mistake. The fix is easy, obvious, and the gains are significant. Claiming that it eventually works anyway is setting a low standard.

            With code the factor can often be several orders of magnitude larger than the example above. Like using O(N*N) where O(N) is the norm, spawning a thread for every function call, doing RPC for things that should be local, and so on.

            These things are bad practices and it’s ok to call them mistakes. Beginners will make mistakes and that is acceptable. Everybody learns.
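For a concrete (contrived) instance of the O(N*N)-where-O(N)-is-the-norm mistake, compare joining two record sets with a nested scan versus a single indexing pass. A sketch with made-up data:

```python
def match_quadratic(orders, customers):
    # O(N*N): rescans the full customer list for every order.
    return [(o, c) for o in orders for c in customers
            if c["id"] == o["customer_id"]]

def match_linear(orders, customers):
    # O(N): one pass to index customers by id, one pass to join.
    by_id = {c["id"]: c for c in customers}
    return [(o, by_id[o["customer_id"]]) for o in orders
            if o["customer_id"] in by_id]

# Made-up data: 1000 customers, 1000 orders.
customers = [{"id": i} for i in range(1000)]
orders = [{"customer_id": i % 1000} for i in range(1000)]

# Same result, but the quadratic version does ~1,000,000 comparisons
# where the linear one does ~2,000 dict operations.
assert match_quadratic(orders, customers) == match_linear(orders, customers)
```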

          • mbrodersen 921 days ago
            Not understanding how your software fits in the overall system and not understanding what is important is a beginner mistake.
      • DelightOne 922 days ago
        How are all connections usually tracked, if they are?
        • throwaway458864 922 days ago
          Assuming you mean the connections between the components - a hodge-podge of different models, tools, techniques. There is no one way to do it, partly because of how different any given system can be from another. Even within software engineering, it really depends on the industry you're in, the application of the software, the stakeholders, the risks.

          But generally speaking, most people only track the connections at design time, as an artifact of overall architecture. And this isn't great, because as the system changes (modern software systems change constantly) the entire system development lifecycle is not being re-assessed every time some component changes.

          So in the best case, with a Waterfall model, you have very well defined connections in design, and you have to pray that your SDLC validates that design. But most people prefer Agile, which in practice means "I don't need a well defined system! #YOLOEngineering". So everything is built ad-hoc and nobody even attempts to figure out the entire picture. And in that case, Operations may be told to figure it out (they're the ones running it all, so they have the best vantage), and they tend to implement monitoring and distributed tracing that enables cobbling together a picture of how things are actually working. But that's not fed back into teams' designs, it's just used for addressing problems after the fact.

          To be specific: you might use ADRs and manually crafted diagrams to map out the connections, or UML, or some other systems diagramming tool/standard. But often that's created only at a certain level of the system, and doesn't dive deep into component interfaces or tolerances/limits or availability. So the full picture can never be seen from one view, and it's almost never the teams themselves mapping it out.
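One lightweight middle ground, sketched here with invented service names, is to keep the design-time connection map as plain data rather than only as a diagram, so it can at least be diffed against what tracing observes at runtime:

```python
# Design-time map: what the architecture docs claim exists.
designed = {
    ("web", "auth"), ("web", "orders"), ("orders", "billing"),
}

# Runtime edges, e.g. aggregated from distributed-tracing spans (made up here).
observed = {
    ("web", "auth"), ("web", "orders"), ("orders", "billing"),
    ("web", "billing"),   # an undocumented shortcut someone added
}

drift = observed - designed   # calls happening that nobody designed
stale = designed - observed   # designed links that may no longer exist
print("undocumented calls:", sorted(drift))
print("possibly dead links:", sorted(stale))
```

This doesn't capture tolerances or availability either, but it does close the feedback loop the comment describes as missing: runtime observations feed back into the design artifact.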

          • DelightOne 922 days ago
            That's exactly what I meant. For standardization, does Kubernetes help in that regard? For example, when using network rules to whitelist which component is allowed to communicate with which service? I imagine extracting the current rules and building a graph makes discovery easier. No tolerance/limit/throughput or availability data is included, though. The approach is also limited to the cluster level, excluding out-of-cluster communication, while having everything in the cluster may not be that secure.
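A rough sketch of that extraction idea, using the real NetworkPolicy field names (podSelector, ingress, from) but a deliberately simplified reading that ignores namespaceSelector, ipBlock, and ports:

```python
def edges_from_policies(policies):
    """Extract (caller-app, target-app) pairs from NetworkPolicy dicts.

    Simplified: only looks at ingress peers selected by an `app` label;
    real policies also use namespaceSelector, ipBlock, and port ranges,
    which is exactly why such a graph is incomplete.
    """
    edges = set()
    for pol in policies:
        spec = pol.get("spec", {})
        target = spec.get("podSelector", {}).get("matchLabels", {}).get("app")
        for rule in spec.get("ingress", []):
            for peer in rule.get("from", []):
                src = peer.get("podSelector", {}).get("matchLabels", {}).get("app")
                if src and target:
                    edges.add((src, target))
    return edges

# Minimal made-up policy: pods labeled app=orders may call app=billing.
policies = [{
    "spec": {
        "podSelector": {"matchLabels": {"app": "billing"}},
        "ingress": [{"from": [{"podSelector": {"matchLabels": {"app": "orders"}}}]}],
    }
}]
print(edges_from_policies(policies))  # {('orders', 'billing')}
```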
            • throwaway984393 922 days ago
              You're spot on, it would provide limited information. In fact, it may be better to use a network monitor to trace network connections and graph that. Old network rules stick around, and so a graph of just the rules would show you connections that may not exist. And network rules are often made of CIDRs or port ranges, so it's not telling you what actual nodes are receiving traffic. If the CIDR and port range includes multiple networks with multiple components each, you don't really know what's connected to what. Distributed tracing is basically that from the application layer (and includes network calls).

              Like yourapostasy says, this kind of post-hoc system design can lead to fallacies, and doesn't contribute to the initial design of the system. If you have nothing else to go on, it helps. But your time is probably better spent investing in formal specifications, and then developing components, connections, and all the operational aspects as implementations and validations of the specification.

              Many papers have been published about this, spanning from the 70s to the late 90s, talking about the evolution of software systems engineering. After the 2000s, software engineering became more art than science when the Agile Manifesto gave everyone an excuse to stop caring about rigor.

            • yourapostasy 922 days ago
              Oh, ho ho. It is so much more than network dependencies. K8s helps somewhat by pointing a possible direction, but this is truly an Alice in Wonderland, "just how deep into the rabbit hole do you want to go?" problem space. Note the following is from the big-org perspective, small organizations don't really have this problem nearly as bad, but might start seeing this more as we all move into the cloud.

              IMHO, the declarative configuration management folks have their hearts in the right place, but at their level we've already lost a lot of information and are just shoving peas around on the plate. Post hoc systems information capture is always a lossy, imprecise, empirically-driven affair. Service registries are only scratching the surface.

              Everyone is afraid to bite the bullet and start Encoding All The Things, because down that path lie religious wars over what to encode and how to express the encoding. Even with a service registry, I lack information on SLOs, SLAs, RTOs, RPOs, planned outages, A/B (and C/D/E/...) state, ownerships of all kinds, responsibilities of all kinds, architecture, deps of all kinds, onboarding steps and constraints, governance gates, decomm steps and constraints, change approval gates, the timing of each of those, and so on. And that's just capturing the information; now imagine the insanity of walking that nightmare graph to seek impossible interlocks (which we humans accept and override with outages, for example), or of figuring out just how long it should take to accomplish a given set of related goals.
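The simplest kind of impossible interlock, a circular wait, can at least be detected mechanically once the dependencies are encoded. A toy depth-first cycle check over a made-up graph:

```python
def find_cycle(graph):
    """Return one dependency cycle as a list of nodes, or None."""
    WHITE, GREY, BLACK = 0, 1, 2          # unvisited / on current path / done
    color = {n: WHITE for n in graph}
    path = []

    def visit(n):
        color[n] = GREY
        path.append(n)
        for m in graph.get(n, ()):
            if color.get(m, WHITE) == GREY:      # back edge: found a cycle
                return path[path.index(m):] + [m]
            if color.get(m, WHITE) == WHITE:
                found = visit(m)
                if found:
                    return found
        path.pop()
        color[n] = BLACK
        return None

    for n in graph:
        if color[n] == WHITE:
            found = visit(n)
            if found:
                return found
    return None

# Hypothetical: decommissioning A waits on B's change gate, which waits on A.
deps = {"A": ["B"], "B": ["C", "A"], "C": []}
print(find_cycle(deps))  # ['A', 'B', 'A']
```

The hard part isn't the walk; it's getting all those gates and constraints into a graph in the first place.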

              We currently handle this as an industry through blunt-force trauma on the problem space itself, while contorting ourselves, Matrix-like, to take as little of it back upon ourselves in the process, through a hodge-podge of techniques, tools, processes, and exasperation. At this point, I'm not certain we'll fully address this space without a Culture Mind-level AI (said tongue in cheek; I really do think there is some promising work being done in this field, it is just a grind).

    • gumby 923 days ago
      > At its core systems thinking is an antidote to reductionism.

      This is a brilliant insight.

      • groby_b 922 days ago
        It's also wrong :)

        Systems thinking incorporates reductionism: you think about the individual parts as well as the whole. Systems thinking is a synthesis of reductionist and holistic/emergence thinking, and both are necessary.

        From Meadows's "Thinking in Systems": "I start with the basics: the definition of a system and a dissection of its parts (in a reductionist, unholistic way). Then I put the parts back together to show how they interconnect to make the basic operating unit of a system: the feedback loop."

        It is not an antidote so much as an acknowledgment of the incompleteness of reductionism.

      • bmitc 923 days ago
        It’s a pretty old and well known perspective.

        More is Different by P. W. Anderson

        https://cse-robotics.engr.tamu.edu/dshell/cs689/papers/ander...

        More and Different by Philip W. Anderson

        https://www.amazon.com/gp/aw/d/9814350133/

      • gumby 922 days ago
        Thanks for the references bmitc.

        Sorry that your comment is dead for some reason, so I couldn't reply directly.

    • yourapostasy 922 days ago
      > I think microservice architectures are almost always the result of ignoring systems thinking lessons.

      Pretty much. The current hope is This Time It's Different, because of various self-discovery service registries, observability approaches, more orchestration, and so on. We've arguably made some progress; the FAANGs wouldn't be able to operate at their scale if something hadn't been accomplished on that front.

      But I'd wager there is a lot more systems thinking that goes on in those architectures than is being let on in public.

      Management is not a substitute for leadership. Throwing together components is not a substitute for systems thinking.

      There are no silver bullets, whocoodanode?

      It is curiously ironic to me that we're circling back to a similar kind of (yet still different) environment as mainframes, with well-published interfaces, inscrutable non-realtime chargeback-style billing, vendor lock-in, lots of scrutiny upon fine-grained transaction monitoring, and so on. Somewhere, in a well-earned retirement, some white-beard probably sits chuckling to themselves as they down another plate of chicken wings and quaff a fizzy umbrella drink.

      • Blackstrat 922 days ago
        Yep, but my beard isn't completely white yet and I don't like umbrella drinks. But seriously, my position was "eliminated" just prior to the pandemic, largely because I tried to convince the CIO that a financial company with a million-customer target probably didn't need a cloud-based, microservices architecture. I was deemed a dinosaur. He wanted the marketing buzz. They still don't have their new product.
        • yourapostasy 919 days ago
          > ...a financial company with a million-customer target probably didn't need a cloud-based, microservices architecture.

          Only a million? There aren't many scenarios where cloud-based microservices would beat a monolith at that small scale.

          Where many such decisions bog down is in re-engineering existing processes into microservices, or cloud, or simultaneously both. If you're starting out brand-new in a white space, then yes, by all means dive into that, provided you also possess the operational expertise to run it (another area where I see a lot of stumbling). Lift-and-shift strategies into the cloud are still frequently chosen by well-meaning C-levels even this late into the cloud era, and they are still unerringly shocked at the resulting bills. Those with massive legacy infrastructures get seduced by the promised 10X cost reductions, and overlook the fine print that it will cost them 10-100X their current total spend, as capital spend, to get there. The future is always unevenly distributed.

    • varjag 923 days ago
      Yep, the Uber anecdote sounds more about hubris and inexperience than about systems design.
    • naasking 923 days ago
      > At its core systems thinking is an antidote to reductionism. The essential feature of a system is that it is an indivisible whole and that its features are more than the sum of its parts. Systems lose their properties when taken apart into components, because their properties are the result of interactions between components, not the sum of the properties of components in isolation.

      If this is true, then "systems thinking" = "emergentism".

  • lukeasrodgers 923 days ago
    Some other commenters here seem to think this article is (or should have been?) an introduction to systems thinking, or an argument for using it.

    The article is explicitly framed as some advice, based in experience, on avoiding common pitfalls with using explicit models in systems thinking. As such I think it is a nice read. It is also full of links to other interesting work by the author, not least of which is a tool to assist in system modeling (https://github.com/lethain/systems), and links to other similar tools I was unfamiliar with (https://insightmaker.com), which will make for some fine weekend reading and tinkering.

  • RajSinghLA 923 days ago
    Solid read. Will sounds like a Farnam Street reader.

    > Effective systems thinking comes from the tension between model and reality, without a healthy balance you’ll always lose the plot.

    I’ve also heard this as “don’t confuse the map for the territory.”

    • Tuckerism 923 days ago
      I have this article bookmarked on this very topic: https://fs.blog/2015/11/map-and-territory/

      I also like quoting George Box to those who are drinking the kool-aid a little too much with their models, "All models are wrong, but some are useful." :)

  • jackblemming 923 days ago
    The author's number one recommended book is ~166 pages in total. I skimmed through it and it's mostly pictures. The blog also contained no data. This is all completely fine, but it rubs me the wrong way when things are posed as scientific or rigorous but are mostly subjective opinion pieces. I like to see what "thinking in systems" gets me, with real evidence. I appreciate that they took the time to create and share this, though.
    • kqr 923 days ago
      Donella Meadows created real impact wherever she went.

      You might also want to look into real-life studies of efficiency and safety by people such as Deming, Weinberg, Womack, Leveson, Dekker, Shewhart, Hollnagel, Ward, etc. There's plenty of evidence that it works, even if Meadows's more popular book isn't filled with references.

      • janpieterz 922 days ago
        I read the book twice. I walked away feeling the same as OP. I would appreciate it a lot if you have any other reference material on this, any book or resource that convinced or helped you. I feel I'm close to grasping the underlying reasoning and benefits, but think I am missing a slightly different angle on this.
        • kqr 922 days ago
          I did mention several other researchers/authors, and it's hard to give a more specific recommendation without knowing what part of it attracts you.

          - Deming: management philosophy

          - Weinberg: software engineering

          - Womack: lean vs mass production

          - Leveson: accident analysis, system safety

          - Dekker: system safety

          - Hollnagel: operator experience

          - Ward: innovation, product development

        • tuatoru 921 days ago
          Try Business Dynamics by John Sterman.

          I lent my copy to my brother over a decade ago. Still haven't got it back...

          Also Exploring Requirements: Quality Before Design by Donald Gause and Gerald Weinberg. Really useful advice about questions to ask people to understand what the system really does/should do.

    • chubot 923 days ago
      I read Thinking in Systems a few years ago, in part because Bill Gates recommended it. (I had seen it elsewhere too, probably on Hacker News.)

      As I recall, there are problems with the book, because the author actually passed away before it was finished.

      The biggest problem IMO is that it doesn't adequately address modeling error. It barely mentions it at all. It simply introduces models that were simulated on a computer, and talks about their consequences. And I remember there being a tenuous connection to the original research, which is notable because I believe the author's group did a lot of it.

      So I would be interested in some other high level / overview books on the same subject. I'm not sure I got a lot out of this book. I think the main thesis was to focus on "points of leverage" when trying to change systems. I think it is valuable to hammer that point home, but it's hardly novel, and it wasn't justified with practice. I felt like it was too focused on simulations, and confused the simulations with reality.

    • rawicki 922 days ago
      I found Engineering a Safer World: Systems Thinking Applied to Safety to be a much better explainer for systems thinking than Meadows' book.

      https://www.amazon.com/Engineering-Safer-World-Systems-Think...

    • dasil003 922 days ago
      Someone always has to come with this particular middle-brow dismissal whenever anything subjective gets posted to HN. Not that I think this is the most brilliant article, but like most things in general-purpose, high-level software engineering management, the ideas presented are not quantifiable or specific enough to be tractable for rigorous scientific exploration; you just have to consider them within the framework of your own expertise and experience. Trying to force a data-driven approach on top of chaotic human systems, where the inputs and outputs themselves are vague and subjective, is a quick path to the McNamara Fallacy and other management theory quackery.
      • jackblemming 922 days ago
        We're in complete agreement then. The data-driven approach need not be applied to everything. My point was that subjectivity trying to pass itself off as rigorous, using scientific-sounding terms, simply rubs me the wrong way.
    • groby_b 922 days ago
      What does your current way of thinking get you? Do you have tangible evidence for that? Not just "it's my way of thinking, and I create output, therefore", but the causal relationship you demand from this book?

      "How to think" is always subjective, and additional ways to think get us additional tools in our toolbox. I'd strongly recommend you at least skim the book, and see if it's applicable to your world of problems. (It's 166 pages and "mostly pictures", so it should be a quick read - and it only needs to make minimal impact to repay the time you spent)

    • gmuslera 923 days ago
      If it was so short, why did you only skim through it?

      You are seeing a few parts (titles, maybe a paragraph or two, a sample of the diagrams and pictures) and you are not seeing the system behind them. You have to read the whole thing to properly understand its point.

      And as that system is not isolated from the world, after understanding it you will get a good hint of the outer system: the things happening in the real world outside of it.

    • swyx 922 days ago
      Judging the quality of a book by its number of pages, or its picture-to-word density, is like judging the quality of a program by its lines of code. I'm sure you know this; just gently reminding you of the natural bias we all have.
    • visskiss 923 days ago
      Oh no. The book was only 160 pages? Whatever will you do?
  • hyperpallium2 923 days ago
    The iterative interactions of systems also occur in computational fluid dynamics, with lots of clever techniques (some by von Neumann) in a rigorously formalized mathematical domain, where it still isn't a "solved" problem.