Understanding Programs Using Graphs

(engineering.shopify.com)

193 points | by caution 1423 days ago

12 comments

  • simonw 1423 days ago
    One of the many neat uses of Observable is quickly prototyping these kinds of graphics - often by starting with a notebook someone has already created and iterating on it.

    I have a notebook here which can visualize the table structure of any Datasette-hosted database: https://observablehq.com/@simonw/datasette-table-diagram

    My first version used a force-directed graph for this, but then Thomas Ballinger from the Observable team showed me how to use a DOT diagram, which is a much better visualization for relational tables.

  • kipply 1423 days ago
    Seas of nodes are also used as IR in Java Hotspot and the V8 engine (https://doar-e.github.io/blog/2019/01/28/introduction-to-tur...)
  • ncfausti 1423 days ago
    I started learning Blueprints in Unreal Engine recently, I swear it has helped my problem solving ability in other general areas of programming. Watching the links light up as nodes are executed is pretty great for debugging.
    • johnsonjo 1423 days ago
      I honestly feel I had a similar feeling when I played Little Big Planet 2 when I was in my late teens. It had a creative mode where you could use basic logic gates like and gates and or gates, and other utilities like counters and the like (including not just digital inputs but also analog ones). Understanding inputs and outputs from a chip level was quite interesting to me. I tried to make a calculator and learned all about bit adders and such, but was never able to figure out as a teenager how to make a conversion of the binary bits to decimal so that I could display it. I'm still not entirely sure what the proper algorithm is for that (if by chance anyone knows that would be nice to know), but I'm sure I could find it out now that I have a degree in CS (just saying that's plenty sufficient but not necessary to have a degree).
      • indiv0 1423 days ago
        You'd use a BCD to 7 segment display circuit [0]. At least that's what we did in my digital systems design course. Depending on what number you're trying to display you light up different parts of the segmented display.

        [0]: http://electronics-course.com/bcd-7-segment

        • johnsonjo 1423 days ago
          Sweet! Thanks for sharing! I’ll have to look into this more when I have time. I think the missing key for me was the BCD part. I had figured out how to do the seven segment display at some point using other utilities that LBP2 had.
      • gnramires 1420 days ago
        A general formula is:

        d1 = n % 10

        d2 = n/10 % 10

        ...

        dk = n/10^(k-1) % 10

        There are a number of implementations possible, iterative is probably more economical. Another way that avoids integer division (if you just want +/-/* ) is doing all your arithmetic in BCD (binary coded decimal), using 4 bits per digit.

        I think those kinds of exercises are useful because there is some confusion around arithmetic. Sometimes there's confusion between what are numbers, and what are number encodings or digits (which themselves represent individual numbers). I found myself confusing (or simply not having the concept of differentiating) the digits of a number and a number itself. Say '14' to me was uniquely associated to those two digits. When you learn binary arithmetic, you start generalizing and see it could be written '1110' as well. The number 18 is a concept independent of its representation. So you can talk about the digits of a number: the digits represent individual numbers themselves, but they are taken together to represent another number. In my example, I had 'dk' and 'n', where n is a number that will usually be represented in binary form, but that is irrelevant, and 'dk' are their digits, as numbers, also usually represented in some binary form (again might or might not be relevant). You even consider simpler encodings such as unary (e.g. as used in tally marks or finger counting), those of course have lower efficiency (O(logn) vs O(n)). They're all just representations of this abstract concept that are numbers (with which we can do mathematics and arithmetic operations).

        I think it's illuminating to distinguish between digital properties (properties specific to a digital representation), and properties of numbers. For example, 4 in base 3 is 11, which violates the property that even numbers are those whose last digit is even (only true for even bases). Divisibility by two is a numerical property, last even digit (implying evenness) is a digital property.

        Numbers are so simple conceptually, it's their digital representation (and digital arithmetic) that's a bit more complicated.

  • zomglings 1423 days ago
    This was a nice article, and I thank you for providing those references at the end.

    The article makes it seem as if TruffleRuby is a Shopify project, where this GitHub repo seems to differ on that point: https://github.com/oracle/truffleruby. What's the deal?

    I'm also very curious about how much TruffleRuby helps you save in infrastructure costs. It looks like Shopify is starting to put together serious systems around optimization of Ruby code.

    • chrisseaton 1423 days ago
      We're working on TruffleRuby at Shopify, but it's an Oracle-led project yes.

      > I'm also very curious about how much TruffleRuby helps you save in infrastructure costs.

      We don't have an empirical answer to this yet. We're working on it. We're investing in MRI, TruffleRuby, and many other parts of the Ruby ecosystem.

      • zomglings 1423 days ago
        Cool. Another question since you are around: How many of your sea-of-nodes-level optimizations do you think can be applied instead as AST-level optimizations?
        • chrisseaton 1423 days ago
          Well you can see the one in the article about sharing the computation from the two branches - there's no way to represent that in an AST, for the very reason that it's a tree - there can only be one user of a node in an AST, but in a graph, multiple nodes can use it.

          You could introduce some kind of hidden local variables and assign to those, but then you're kind of building a graph in an AST, and might as well use a graph, and those variables are harder to understand for other optimisations.

          (Note that these aren't 'my' optimisations, I'm just writing about them.)

      • thrwaway69 1423 days ago
        I am curious if you use https://crystal-lang.org/ at Shopify.
  • retox 1423 days ago
    Aren't we coming full circle back round to UML? UML is very good at designing and documenting complex code and it's interactions at many different levels, and having a common graphing notation helps ensure everyone is on the same page by looking at the same diagram.

    The problem comes when the design and the code get out of step, but there are tools available to round-trip the code and model to keep everything in sync. Executable UML used to be a thing too, but I was never very convinced by it.

    • RNCTX 1423 days ago
      Hopefully they get it right this time when someone reinvents UML.

      By right I mean, the first time someone comes along and says “it’s not a visual language anymore” the response is “wrong, and you’re fired.”

  • bllguo 1423 days ago
    not _directly_ related, but one interesting thing my coworker has been working on is visualizing ML models using graphs, which has made them much more interpretable for business stakeholders. Being able to break the model down into steps, feed the graph individual observations to see how the model flow would work, etc. I was surprised how receptive non-technical people were to the concept
  • Woberto 1423 days ago
    I enjoyed the article, but am maybe too dense to understand the repercussions. My understanding of the summary is that a goal is to be able to develop these graphs and then convert them back to ruby so that developers who know ruby can see how the compiler is optimizing. My honest question is, why does this matter? If the developers then start writing code more in line with what a compiler would do, then it seems compilation would be faster - is it significant, and is this a reason why? What else could it be? Thanks for the article and any thoughts!
    • pubby 1423 days ago
      My apologies if you know this, but by graph it means the mathematics definition (a type of data structure) and not the actual visualization image: https://en.wikipedia.org/wiki/Graph_(discrete_mathematics)

      Compilers convert source code into graphs and then perform optimizations on these graphs. They never work directly on the source code itself. Once the graphs have been optimized, they're then converted into the target format, which is typically assembly language or bytecode.

      So mostly this is about techniques compilers use to optimize and transform code.

      • Woberto 1423 days ago
        I didn't know that, thank you for kindly enlightening me! So from the blogpost, the "compiler debug dumps" are the optimized graph data struct, and instead of converting to assembly or byte code, they want to "decompile" or convert back to ruby. Do you think they're doing this to better understand and improve the compiler? Or to write code that is more easily compiled?
        • jimsmart 1422 days ago
          I suspect this may have originally developed as a debugging aid, your “to better understand and improve the compiler”, but also to better understand how Ruby programs execute. With non-trivial data structures and algorithms (such as sea-of-nodes here), being able to visually introspect statically, and/or visualise its operation dynamically, can often provide informative insights.

          Instead of, or perhaps as well as, converting to assembly or byte code, here the author is producing a visualisation of the internal graph structure contained within the compiler.

          The graph isn’t necessarily a one-to-one match with the AST derived from the source, due to graph transformations that take place during the optimising phase of the compiler/runtime. This makes it even more important/interesting to see what is going on under the hood.

          Usually, in high-level programming languages one rarely has concerns regarding how easily the compiler might deal with the code - one just codes, and uses language features appropriately, as needed. With the only caveat here being the extreme performance crew, e.g. games programmers might use a restricted subset of C++, and also might disassemble compiler output, and use that information to help try and optimise their source somewhat.

  • flexvision 1423 days ago
    What is a simple tool to produce ASTs from code? Can it work cross-language?
    • AntonyGarand 1423 days ago
      I would start with https://astexplorer.net, which handles many languages, and provides interactivity: click on a node within the ast, and il will highlight the corresponding code and vice versa.

      The ast's are usually language specific, therefore it's not cross language.

  • srikz 1423 days ago
    The big value of the visual representation is if it is dynamic. That is, it gets updated while you are running the code and can use it during debugging. Otherwise, the novelty wears off quickly.

    Static graphs are still useful to understand a new piece of code, see Source Trail[1] [1]: https://www.sourcetrail.com/

  • Sevaris 1423 days ago
    Interesting topic. I feel like the representation of the Ruby fib misses the mark though because of how many language implementation details are caught up in it. The lack of abstraction here takes away from the usefulness of using a graph for understanding control/data flow in a program.
    • tom_mellior 1423 days ago
      > I feel like the representation of the Ruby fib misses the mark though because of how many language implementation details are caught up in it.

      The article's title is not optimal. It's really about compilation to machine code, and for that you do need all these details. The only "understanding" of these graphs is about understanding what the compiler is doing with the code, it's not for program understanding on a high level.

  • navan 1423 days ago
    Interesting article and good question about whether we should program using graphs. In a different domain, languages like halide, tvm and tiramisu or in a way letting programmers create graphs for high performance.
    • lmkg 1423 days ago
      Programming via graphs is common in big-data machine-learning as well. Apache Beam and Tensorflow are both programmed by creating an execution graph object, and handing that off to the runtime.
      • rcfox 1423 days ago
        Makefiles basically just trees.
  • thelazydogsback 1423 days ago
    The graphic is too fuzzy -- but I'm trying to figure out why the AST for fib in Ruby would be so complex...
    • chrisseaton 1423 days ago
      Ruby has more complicated semantics.

      An example of a concrete difference here is that Ruby checks for overflow on integer maths, where Java does not. This means each maths operation is a little cluster of nodes, rather than a single node.

      Another example is that Ruby has dynamic typing, so values that come into the function need to be type checked, and values going out need to put into a format where they can have their type inspected.

      One more example is that the call to fib in Java is static - we know exactly where that goes. In Ruby the call is dynamic, so we need to check we're calling the correct method.

      Here's a full SVG copy of that diagram https://www.dropbox.com/s/k68id9aqu258blw/fib-ruby.svg.

      • thelazydogsback 1423 days ago
        Thanks - I knew all that, but I guess I expected those semantics to be implicit rather than explicit. I guess because we're not really looking at an "AST" but an intermediate executable representation. Certainly this is good for understanding what Ruby is doing, but not so useful for understanding what fib is doing - trees vs. forest. I guess what you really want is to be able to zoom into IL-semantics only if you care about such things, otherwise you show only enough detail to get across the algorithm being expressed -- possibly with the addition of declaratively showing additional constraints that have been added by the runtime without showing how the contraints have been satisfied.