Beginner's Guide to Abstraction

(jesseduffield.com)

243 points | by jesseduffield 1384 days ago

24 comments

  • chowells 1383 days ago
    What's the source of this idea that "abstraction" means "combining duplicated code"? An abstraction is a transformation between models that strips away information not essential to the target model. This is an idea firmly planted in semantics, not code coincidence.

    The first example in the article is a nice demonstration of this happening well. You are transitioning from a model where geometry is unknown to one where at least simple geometry is known. It may not be the most compelling model change ever, but it at least captures the key point - there is a semantic operation going on: calculate the volume of a sphere. The actual formula for doing so isn't important at the level you want to think about your code, so replacing the formula with a function call simplifies the model in which you're working.
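
    A minimal sketch of that first example, in Python (the body here is assumed, not quoted from the article):

      import math

      def sphere_volume(radius: float) -> float:
          # The formula is exactly the detail the caller no longer has to think about.
          return (4.0 / 3.0) * math.pi * radius ** 3

      # Call sites now work entirely in the "simple geometry is known" model:
      tank_capacity = sphere_volume(2.5)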

    Contrast that with the example of things going wrong in the next part, with the bad "average" function. What unnecessary details are being removed there? Being sure you're calling that function correctly actually takes more work than calculating an average without it. That's not an abstraction, it's an indirection. You still need to track it down and read the code to understand it. That's not something you have to do with sphere_volume in the preceding part.

    So how do you know whether duplicated code represents an opportunity for abstraction? You start thinking in terms of the semantics you want to be using. Is the code duplicated because it's doing the same thing, semantically, in your target model? Well then that's an opportunity for abstraction. Or is it just duplicated because it happens to share an implementation? Well then that's just a coincidence. Don't try to share code.

    Not to suggest that this is an easy test, of course. It's very possible that two things might be different instances of a common problem that you're unaware of, and you have no idea that they share code because they actually are the same. That's ok, and there's always more to learn. But I think if people put more thought into what abstraction is and why it exists, the questions of when and how to use it fade away.

    • NumberWangMan 1383 days ago
      The way I've heard Sandi Metz talk about it is basically that Don't Repeat Yourself isn't the important thing, but it's what we tell beginners who don't realise that having 15 copies of the same logic all over their program isn't a good idea. It's a proxy for the idea of abstraction, because that takes much longer to develop a good taste for.

      And unfortunately, a lot of people learn the quick reminder "Don't Repeat Yourself" without ever hearing the actual definition of the principle: "Every piece of knowledge must have a single, unambiguous, authoritative representation within a system", which doesn't roll off the tongue as easily. The DRY principle is all too easy to corrupt and misapply if you never learn what it really means.

    • jesseduffield 1383 days ago
      In my mind, abstraction is finding a single representation for various things that share traits. If those things happen to be chunks of code all performing the same task in the same way, then factoring the code into a method would be an example of abstraction. I'm not sure whether this definition conflicts with the idea of abstraction being about higher-orderness, as another commenter suggests, but I don't think it does.

      I disagree with your claim that the 'bad' `average` method is an example of indirection rather than abstraction. Indirection, afaik, is about decoupling two things via an interface so that one isn't directly dependent on the implementation of the other. In this case you are still directly calling the method which will directly execute the code. Maybe the dispute here is around whether we should call these things 'the wrong abstraction' or 'a failed attempt at abstraction' but I think it's a spectrum where you can have a good abstraction but with some parts that don't really belong, and in that case it's still an abstraction.

      • daviddaviddavid 1383 days ago
        Not the parent, but I think one of the main differences here is that your definition seems to imply that one abstracts from code where parent's definition seems to imply that one abstracts from the domain which one's computer program is modelling.

        I'd tend to think that the "chunks of code all performing the same task" aren't the thing you abstract from, rather they are just a poor abstraction of the same domain.

        • jesseduffield 1383 days ago
          I agree with your distinction; however, I'd say that it's abstraction all the way down: the code you write is an abstraction of the business requirements, which is itself an abstraction of the business domain, and when you're dealing with legacy code that nobody understands, it ends up being treated as a de facto domain unto itself.

          So I don't think the term 'abstraction' should only be reserved for use in reference to the domain because abstraction is already happening at every level anyway.

    • teambayleaf 1383 days ago
      > What's the source of this idea that "abstraction" means "combining duplicated code"?

      It's coming from the tradition of Sir Francis Bacon's Empiricism. Some people believe that abstract ideas are mostly acquired through empirical observation, a.k.a. carefully watching a series of repetitive events. The point here is that "to know" is a synonym for "generalization" for them. So OP naturally puts emphasis on duplicated code as a good opportunity for abstraction.

      People from other camps have a very different view on human knowledge, and think that abstract ideas can indeed come from other sources, most importantly, from our reasoning function. For these people, the process of programming is to implement our innate idea into concrete code. For them, an actual piece of code is basically a "shadow" of our abstract concept.

      Hell, someone really should write a book titled "The Logic of Programmatic Discovery" ;)

    • proc0 1383 days ago
      Yes, removing duplication is closer to refactoring than abstracting, although I don't think they are mutually exclusive. Abstracting is hiding everything BUT the essential elements of something. Good abstraction only exposes what's necessary, and thus interfaces are the epitome of abstraction. Coding with good abstractions is about writing interfaces at every level, from the functions to the user. At the mathematical level, Category Theory is the math of abstraction and basically describes interfaces formally.
    • twirlock 1383 days ago
      It's a heuristic that, with luck, sometimes leads to abstraction. The source is countless blogs and books written confidently by authors who don't actually know what abstraction is, including the Wikipedia page.
  • codemonkey-zeta 1384 days ago
    I'm surprised I'm the first to bring this up, but the "over-abstract" example feels more like "over-specified". The actual operation being defined works on the same level of abstraction, since it defines precisely the same operation. The first has just specified extra tweakable aspects of its execution via the argument list. I'm not saying it's not bad, I just don't think it's because it is "too abstract" compared to the simpler solution.

    Abstraction in my mind is fundamentally about the "higher-orderness" of a thing. These two average methods are just as abstract as each other, since neither is a higher-order operation relative to the other. I would use the word over-abstract if one were to write a program modeling a dog-walking business (a very specific thing) by writing a system which models actions on entities (a very abstract thing), where walking is an action which may involve one or more entities, and dogs, humans, employees, and customers are all entities. If the core thing you want to do is just a single concretion of the system that you actually built, then you "over-abstracted". I feel like we should not discourage the practice of abstraction, since that's our business. I literally get paid to think about the real world in terms of abstraction and write it up into a computer. Young engineers should not be taught to fear "over-abstraction".

    • MaxBarraclough 1383 days ago
      > The actual operation being defined works on the same level of abstraction, since it defines precisely the same operation.

      Agreed. This confusion is something Zed Shaw wrote about in a blog post called Indirection Is Not Abstraction [0]. Surprisingly, it seems it was never discussed properly on HN [1], but it was discussed elsewhere [2][3].

      (There's also another blog post by this name by another blogger, by independent reinvention/coincidence [4].)

      > Young engineers should not be taught to fear "over-abstraction".

      Disagree. As you just showed, unnecessary abstraction is bad. Ineffective abstractions are also bad. It's not easy to get right.

      [0] https://web.archive.org/web/20160304022133/http://zedshaw.co...

      [1] https://news.ycombinator.com/from?site=zedshaw.com&next=8820...

      [2] https://www.reddit.com/r/programming/comments/38hobm/zed_sha...

      [3] https://lobste.rs/s/ja5ihv/indirection_is_not_abstraction

      [4] https://news.ycombinator.com/item?id=18344033

      • gen220 1383 days ago
        Thank you for these excellent links. This debate often takes the concrete form of “one large imperative function with 100 lines” vs “5 functions with 20 lines”. Many people who separate out logic for the sake of minimizing lines-per-function unfortunately do so by introducing indirection, rather than abstractions, and thereby make the program more challenging to reason about and test. But, the reason the debate is never-ending is because the question isn’t sufficiently defined! :)

        I like to think of a program like a tree (main is root, each function is a node, calls are edges). Each sub tree should be a bounded context, in that (ideally) you only have to think about parameters defined in that sub-tree. Leaves implement the “nitty-gritty” (mainly I/O, number-crunching, and complex transformations), and are heavily tested. Each node that isn’t a leaf is either an abstraction composing leaves, or an abstraction composing abstractions. Unit tests for leaves test the nitty gritty, unit tests for non-leaves must only test composition.
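
        A toy sketch of that shape (the leaf names below are hypothetical):

          # Leaves: nitty-gritty parsing and number-crunching, heavily unit-tested
          def parse_order_line(raw: str) -> dict:
              sku, qty = raw.split(",")
              return {"sku": sku.strip(), "qty": int(qty)}

          def total_quantity(lines: list) -> int:
              return sum(line["qty"] for line in lines)

          # Non-leaf: an abstraction composing leaves; its unit tests only check the composition
          def summarize_orders(raw_lines: list) -> int:
              return total_quantity([parse_order_line(raw) for raw in raw_lines])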

        I find that human-readable modules have some limits (number of children, height of the tree). You can violate those limits sometimes, but only if you provide some assistance in the form of comments.

        Sometimes, a 100-line function is not composing many distinct nitty-gritty ideas. It just really takes 100 lines to express “write this model to the database”.

        • MaxBarraclough 1383 days ago
          I agree that relatively large functions aren't always an evil. As you say, sometimes there isn't a tidy way to decompose them further. At the same time, though, I don't think it's always a sin to write a function just for decomposition, without it doing any abstraction.

          A 'helper function' might be tightly bound to some other function, i.e. the helper function is sensitive to the internal workings of the function it serves, and is not intended to be called from anywhere else.

          And there's still no excuse for source files that are 5000 lines long, of course.

          HN discussion of John Carmack's thoughts on how long functions are sometimes preferable: https://news.ycombinator.com/item?id=8374345

    • rpastuszak 1383 days ago
      > I feel like we should not discourage the practice of abstraction, since that's our business. I literally get paid to think about the real world in terms of abstraction and write it up into a computer. Young engineers should not be taught to fear "over-abstraction".

      I agree with most of your comment, but in my experience (and, I think, many may echo the same sentiment) over-abstracting has always been more dangerous than the opposite.

      I'd choose a messy, duplicated piece of code over an over-abstracted mess made by a 10x developer, every single time.

      > I literally get paid to think about the real world in terms of abstraction and write it up into a computer.

      I love software engineering, but the most satisfying moments in my work involve removing code or not relying on tech to solve problems whatsoever.

    • jesseduffield 1383 days ago
      I consider abstraction to be about giving various concrete things the same representation. In this case I'm saying that we're over-abstracting by pulling too much of the dissimilar code between the examples into the one representation (i.e. the method). I would also say with the `average` methods it's not quite the same operation, despite having the same name. The over-abstracted method had quite a bit more going on internally than the minimal abstraction, and had a different interface.

      With your dog walking example, I'd say each of your listed abstractions would be 'the right abstraction' because you're not bundling up dissimilar things into a single representation as if they were similar. Specifically with dogs/humans, my example about circles/squares is the same: both might conform to the same interface, but you wouldn't want to represent them with the same class. I agree that doing so is perhaps a special kind of mistake for which there may be a better term than 'over-abstraction', though it's not obvious to me what that term would be (over-specified doesn't quite sound right to me).

      • codemonkey-zeta 1383 days ago

          In this case I'm saying that we're over-abstracting by pulling too much of the dissimilar code between the examples into the one representation (i.e. the method).
        
        Ok it sounds like we agree on what is wrong with the example. "Over-abstract" still feels like the wrong word for that, because the actual problem is that we have ruined our abstraction layer with junk about lower layers. Average is an operation on lists of numbers, but ignore_nulls is a feature of the members of the list in your programming language, same with the Type argument. The members of data structures intuitively (maybe not always) exist at a lower level of abstraction. I would be more inclined to call this something similar to "partial-abstraction", because the programmer didn't take the time to remove all the semantics of the lower levels from the interface to the higher level.
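
        In signature terms, a sketch (the parameter names are taken from this thread rather than from the article):

          # "Partial abstraction": lower-level concerns leak into the interface
          def average(values, ignore_nulls=False, as_type=float): ...

          # Fully abstracted: the caller deals only in lists of numbers; stripping nulls and
          # converting types happens below this interface, in whatever loads the data
          def average(values):
              return sum(values) / len(values)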
    • searchableguy 1383 days ago
      I think with modern editors, the cost of code duplication is lower than the cost of a tight over abstracted code linked everywhere in your system. You can easily find and replace instances of duplicated code as it would be isolated and can be automated to some degree.
  • kevsim 1383 days ago
    One of the nightmares I've experienced time and time again in large mature codebases is incomplete abstractions. Some developer gets a great idea of how to abstract something away, defines the plumbing needed for said abstraction, and sets to work going through the codebase bit by bit, moving code over to this new abstraction.

    But then, they leave. Or they change projects, or they just lose their enthusiasm for this major refactor. And you're left with a half-baked abstraction in the code. Then another developer comes along, and another, and another, and before too long you've got a spaghetti mess of incomprehensible "abstractions".

    I was told by a friend once that the iOS app for Facebook has a whole bunch of implementations encapsulating what a "form" is in the app. Many developers came and went with their own ideas of what that abstraction should look like, but none became the one abstraction to rule them all.

    • cooperadymas 1383 days ago
      I run into this scenario particularly in large React code bases. Multiple developers building their own half baked abstractions on top of a framework with its own continually shifting abstractions. Lack of uniting engineering leadership and not enough time to devote to code discipline - it's a real joy.
      • searchableguy 1383 days ago
        My problem with React is that there are so many ways to do the same things that reusability and refactoring become harder than they should be.

        I usually find 3-4 ways of styling in a medium-sized codebase on GitHub. I find varying degrees of state management - some built with context and hooks, some imported and traditional MobX/Redux. I haven't found many significant opportunities to reuse code without refactoring a lot or creating micro-files of 7 lines of code and separating out logical components.

        I now think of react as a library to build your own frameworks. It aligns perfectly with the javascript ecosystem mentality.

      • kevsim 1383 days ago
        Oh yeah! Nothing like a bunch of old poorly thought out "render props" style high level components getting mushed together with a bunch of new poorly thought out custom React hooks :-)
    • kqr 1383 days ago
      This is something I've struggled a bit with professionally recently. I've been leading initiatives to modernise legacy applications (both in terms of code and user experience, measurably so). This cannot be done all at once -- the applications are large and used in production and need to be maintained.

      So the only options seem to be to commit to the legacy (in many ways measurably bad) implementation, or to start upgrading/rewriting it a little bit at a time.

      Of course, the latter is good. Except it leads to the situation you describe with multiple ongoing half-finished abstractions.

      There are better ways to do it: don't do all the plumbing first and then the actual business logic; instead, do thin vertical slices at a time. But it still feels... wrong.

  • pierremenard 1384 days ago
    I've learned the hard way that perfect abstractions don't exist. The mathematician in me wants to find the "most elegant representation" of a given problem, but when I give in to that urge, I often end up with a god function that takes `n` boolean flags that toggle the behavior slightly for different cases.

    Why does this happen? I think a partial explanation is in this Nietzsche essay [1], where he says, "every concept arises from the equation of unequal things" — in other words, the abstractions of the world were built bottom up in our heads, and the Platonic essence of things is just a fairy tale we tell young programmers so they can sleep easily at night.

    1. http://nietzsche.holtof.com/Nietzsche_various/on_truth_and_l...

    • twhitmore 1383 days ago
      A key insight can be to abstract behaviors, rather than state.

      It is typically much more feasible to cleanly abstract the behavior for a single interaction/role than for the entirety of an entity's state. If you have multiple interactions you might want to use more than one interface.

      Once you have clean behavioral APIs for your interactions, you may also be able to use composition, wrapping and delegation to implement/enhance behavior. This is the 'Strategy pattern'.

      The one thing you lose here is object identity -- you can no longer assume a delegate is the entity itself. This is no big loss given the clarity & flexibility that are gained.
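
      A small sketch of that idea (all names here are hypothetical):

        from typing import Protocol

        class Notifier(Protocol):
            # one interaction/role, not the entity's whole state
            def notify(self, message: str) -> None: ...

        class EmailNotifier:
            def notify(self, message: str) -> None:
                print(f"email: {message}")

        class ThrottledNotifier:
            # Strategy-style wrapping/delegation: behaviour is enhanced without owning the entity
            def __init__(self, inner: Notifier, limit: int) -> None:
                self.inner, self.limit, self.sent = inner, limit, 0

            def notify(self, message: str) -> None:
                if self.sent < self.limit:
                    self.inner.notify(message)
                    self.sent += 1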

  • mplanchard 1383 days ago
    There was a great discussion here on HN a few months ago about DRY [0] and how developers generally get the concept wrong. Specifically, it’s not about removing duplicated code. It’s about ensuring there’s a single source of truth for a given piece of knowledge in your application. For example, if you’re calculating interest on purchases all over the place and hard coding the rate everywhere, it should be unified into a single function or method, so that there’s one source of truth for calculating interest. If you just have some code that looks similar but doesn’t represent “knowledge” that is being duplicated, DRY does not apply. Some folks in the comments mentioned they liked to use SPoT as an acronym that’s a bit clearer, and I’ve been using that since then in code reviews.

    [0]: https://news.ycombinator.com/item?id=22329787
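
    A tiny sketch of that interest example (the rate and names are made up):

      ANNUAL_INTEREST_RATE = 0.185  # the single, authoritative home for this piece of knowledge

      def interest_on(amount: float, months: int) -> float:
          return amount * ANNUAL_INTEREST_RATE * months / 12

    Callers never hard-code the rate themselves, so changing it means changing one line, while similar-looking arithmetic elsewhere that isn't about interest is free to stay duplicated.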

  • foxtr0t 1383 days ago
    I think this definition of abstraction is lazy. Abstraction is indeed hard to define precisely, but it may be more accurate to describe it as the process of implementing interfaces that are conceptually familiar to users, often through metaphor, such as the unix "pipe".

    Creating the "right" abstraction is _not_ the process of bundling up repeated code and only questioning how "abstract" it should be; it is the process of creating an interface that is familiar and conceptually easy to grasp. This is done through naming, comments, the use of metaphors, etc. From this viewpoint there can be many correct levels of abstraction, some more useful than others. We should aim to create good abstractions at all layers of code, even if finding abstraction bliss is unattainable.

    • kqr 1383 days ago
      As Dijkstra put it in the '60s: we start with hardware, which is fully capable of solving our problem, but not designed to make it convenient to express that solution.

      So we make "virtual machines" by creating new "instructions" that extend the physical machine. (Dijkstra called them virtual machines; these days we might say APIs or something.)

      We keep doing this: extending level n-2 into level n-1, with the guiding principle that n-1 should be a "virtual machine" that makes it slightly more convenient to express level n. At some level k, the solution to our original problem is trivial.

      Other important properties of these abstraction levels are that

      - They should do resource allocation and management to the point where a raw resource used by level n should not be visible as such in level n+1.

      - Any level should ideally only depend on one level below it.

      - No level can ever depend on a level above it -- this creates cycles in the dependency graph and prevents further extension and contraction of functionality.

      ----

      Drawing further from Parnas instead: abstractions should hide design decisions that are likely to change. Typical examples of those are the format of data, layout of data structures, implementation details.

      Things that are less likely to change are things that come from the problem domain. Design interfaces based on problem domain concepts, not solution domain concepts. This applies at every level: the abstractions of level n-1 should have interfaces in terms of problem domain concepts of level n.
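
      For example, hiding a storage format that is likely to change behind problem-domain functions (a sketch; the JSON choice is an arbitrary placeholder):

        import json

        # Only this module knows the on-disk format -- the design decision most likely to change.
        def load_customer(record: str) -> dict:
            return json.loads(record)

        def save_customer(customer: dict) -> str:
            return json.dumps(customer)

        # Callers speak in problem-domain terms ("a customer"), never in solution-domain
        # terms ("a JSON string"), so a format change stays confined to this module.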

      • cinnamonheart 1383 days ago
        One of my favourite quotes on the topic of abstraction comes from Dijkstra, too.

        > The purpose of abstraction is not to be vague, but to create a new semantic level in which one can be absolutely precise.

        • kqr 1383 days ago
          Yes! And this also ties in to the notion that abstractions should encapsulate invariants.
      • blitmap 1383 days ago
        There are things in here I had not considered before. I often abstract away to make something "easier for me" to think about, but I rarely consider "what is likely to change" and hide that behind a layer. Thank you for sharing.
        • kqr 1383 days ago
          Bonus tip: any time your team has a heated argument about any decision, that is a strong contender for abstraction.
          • wh33zle 1383 days ago
            As in, create an abstraction so everyone gets to do it their way and you don't have to debate?
            • ptx 1383 days ago
              Or the heated debate might be an indication that it's not clear which choice is the right one, so we use an abstraction to make it easier to change our minds later if it turns out our choice was wrong.
            • vardump 1383 days ago
              I think it's more that something which gives rise to an argument might be a more multifaceted issue than it seems to any given single participant?
      • agumonkey 1383 days ago
        I like the change-isolation viewpoint, and also the space-subdivision aspect. A new problem is large and dark; cut it into pieces small enough to have the potential to be a bit stable, and aim at a good balance between subdivisions and links (or nodes and edges in graphspeak). This will limit any change in one node to the minimum amount of impact on the whole system.
    • kortilla 1383 days ago
      You’re overly restricting the process of abstraction to making it conceptually easier to grasp for users. That’s only one use case for abstraction. Let’s call it dumbing down for your external callers.

      There is also developing abstractions for south-bound APIs where it goes the opposite way. I declare that I want to be able to do operations X, Y, and Z but I don’t care how it happens. Then an implementer will be the one who knows all of the details specific to their implementation even though I’m the one who “abstracted” by defining the abstract API signatures.

      > it is the process of creating an interface that is familiar and conceptually easy to grasp

      Sometimes it’s not though. Often times the “familiar” and “easy to grasp” is the API burdened by legacy systems tied too much to one implementation. In this case the process of abstraction is cleanly separating the intent from the legacy crap and defining a new interface that captures the intent and nothing more (not leaking the details of the “how”).

      An API that writes bytes to an arbitrary location on a disk is certainly familiar and easy to grasp. But without a filesystem abstraction on top of it, it's not really useful if you want portability, multiple processes using a disk, etc.
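
      A sketch of that south-bound direction (names are hypothetical): the consumer declares the operations it wants and knows nothing about how a provider implements them.

        from typing import Protocol

        class FileStore(Protocol):
            # the consumer's declaration of intent: "I want to be able to do X, Y and Z"
            def read(self, path: str) -> bytes: ...
            def write(self, path: str, data: bytes) -> None: ...

        def archive_report(report: bytes, store: FileStore) -> None:
            # written against the intent, not against any particular disk layout
            store.write("/archive/report.bin", report)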

      • kqr 1383 days ago
        I think your first point is really about the same thing, only taking a top-down design approach instead of bottom-up. The top-down approach is good in cases where you're confident in your ability to estimate the feasibility of constructing an implementation that matches specifications. In cases where you are more likely to do well in estimating the utility of constructed implementations, the bottom-up approach works better. (Randell, 1968)

        In reality, though, a combination tends to be used, and development is an ongoing process of "I have this set of operations and I would like to construct a component that meets this specification -- how do I join those realities?"

        I agree strongly with your second point, though.

        • kortilla 1383 days ago
          But they aren’t the same at all. In one case you fully understand the implementation because you are the author. You’re just guessing what your consumers’ use cases will be.

          When developing a driver framework, you are the consumer and you’re just declaring what you want a provider to do. You don’t know anything about how it will be implemented nor do you care.

          They are completely different actors in the system designing the API with completely different goals. It’s not just top-down vs bottom-up unless you’re both the consumer and the implementor, which is just a trivial subset of API work.

    • jesseduffield 1383 days ago
      I disagree about the primacy of naming and metaphor here. From my perspective, the only reason the unix pipe is intuitive to people is that it actually behaves like a pipe. I don't think that the abstraction of 'pipe' started by thinking of the metaphor, but rather it started by thinking about how best to reduce complexity when it comes to IO operations, and it just so happened that the best way to do it was with something resembling an actual pipe.

      That is to say, the first step is to find the representation of something which reduces overall complexity, and the next step is to think about naming. In my experience, one of the signs that you've found the right abstraction is that it's easy to find a name for it, because the intrinsic complexity of most things is low (or at least lower than I thought when I started programming), and so it's not hard to find metaphors when your abstraction brings the extrinsic complexity of a system down towards the level of intrinsic complexity of its requirements.

    • thecupisblue 1383 days ago
      I think it may be more accurate to describe it as:

      > the process of expressing ideas that are conceptually familiar to users, often through metaphor.

      IMO that fits it better than "implementing interfaces".

  • hackeryogi 1384 days ago
    Well written article

    > DON'T BE AFRAID TO DISMANTLE THE WRONG ABSTRACTION

    Couldn't agree more with the statement, though I don't completely agree with the author's suggestion to copy-paste. Duplicating code _is debt_. It may help us go faster now, but it'll almost inevitably come back to bite. It is manageable if one or two people do it once or twice - definitely not manageable if five or six people do it five or six times.

    I believe the general hesitation to touch a piece of code (or, to get by with that optional param) is due to the fear of fucking things up. Having your code covered by tests gives an amazing amount of confidence to rip apart old abstractions to yield newer ones that serve the purpose of the _current code_. To me, this route is preferable to duplicating code.

    Even with the best of intentions, hacking an abstraction with that one optional parameter is inevitable. Tests help in our ability to repay that debt faster - on time & in full.

    Basically they make all abstractions a lot cheaper - easier to write and easier to throw away. Thereby solving the problem of having a 'wrong abstraction' too early.

    • sagichmal 1383 days ago
      It is almost always better to copy/paste a function to accommodate "that one optional parameter" that breaks the original abstraction, than to add the parameter to the function signature. The "cost" of a broken/leaky abstraction is at least an order of magnitude higher than that of duplicated code.
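
      In sketch form (hypothetical names):

        # Adding the parameter makes every existing caller carry the special case...
        def send_invoice(order, skip_tax_for_internal=False): ...

        # ...whereas copying lets the special case live and evolve on its own:
        def send_internal_invoice(order): ...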
  • Ozzie_osman 1383 days ago
    > Why is it a good idea to abstract the formula for a sphere's volume into its own method? Because if mathematicians ever found out they got the formula wrong, you would want to go through all the places in your code that you used the formula and update it to be correct. That is, we know ahead of time that we want the code to be in lockstep.

    This is actually not the main reason you'd want to abstract, and I think the whole article kind of gets it wrong. The main reason to abstract is not to keep code DRY, but to "abstract away" things that are not important in a certain context (or layer). You want to put the formula for the volume of a sphere aside when it's not relevant to what the code at hand is doing and would get in the way of trying to change or understand that code. For example, a really strong case for abstracting that formula is if you're writing code that is calculating the volume of several shapes.

    Yes, code duplication is often a sign that you've screwed up your abstractions (ie they correlate), and creating the right abstraction will often make your code more DRY, but it's the means, not the end. The end is code that is easy to read, understand, and change.

    My high-level "sniff test" is essentially rubber-ducking (try to explain to yourself, someone else, or a duck) what the code is doing. For instance, you might explain a function called calculate_remaining_space_in_box:

    1) we get the volume of all shapes (including spheres) in the box

    2) we get the volume of the entire box

    3) we calculate the difference between them

    In that explanation, you realize there's really no extra benefit to a reader of that code at that level to knowing the exact formula for the volume of a sphere (or any shape for that matter).
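
    In code, that explanation maps almost one-to-one (a sketch; the volume helpers are assumed to exist):

      def calculate_remaining_space_in_box(box, shapes) -> float:
          used = sum(shape_volume(s) for s in shapes)  # 1) volume of all shapes in the box
          capacity = box_volume(box)                   # 2) volume of the entire box
          return capacity - used                       # 3) the difference between them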

    There are, of course, other signals beyond code duplication for measuring whether you've abstracted correctly. For instance, the Single Responsibility Principle is a good example (https://blog.cleancoder.com/uncle-bob/2014/05/08/SingleRepon...). Code that changes for the same reason should usually be grouped together, and code that changes for different reasons (or at different frequencies) should be grouped apart. But again, this is in service of the end goal: making code easy to read and change.

  • bokwoon 1383 days ago
    Some of my favourite abstraction advice comes from this article: https://blog.carlmjohnson.net/post/2020/go-cli-how-to-and-ad....

    "You want one layer to handle user input and get it into a normalized form. You want one layer to do your actual task. And you want one layer to handle formatting and output to the user. Those are the three layers you always need."

    The "do the task" layer can be abstracted again further. But starting it off as a monolithic layer, separated from input and output, is always the right call.
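
    A skeleton of those three layers (the greeting task is just a placeholder):

      import sys

      def parse_input(argv: list) -> dict:
          # layer 1: get user input into a normalized form
          return {"name": argv[1] if len(argv) > 1 else "world"}

      def run(params: dict) -> str:
          # layer 2: do the actual task (start monolithic, split further only if needed)
          return f"Hello, {params['name']}!"

      def render(result: str) -> None:
          # layer 3: formatting and output to the user
          print(result)

      if __name__ == "__main__":
          render(run(parse_input(sys.argv)))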

  • xcskier56 1384 days ago
    This is the sort of article that I could have really used at about year 1.5 of my programming career. I’ve learned many of these lessons the hard way and resonate/agree with the examples here. It would have been really nice to have read this years ago and not have to hit my head on quite so many sharp corners to learn.

    You invariably have to hit your head sometimes, but I hope clearly written articles with understandable but not completely contrived examples, like this one, reduce the head knocks for some people.

    • darkteflon 1384 days ago
      I came here to say exactly the same thing. This is such an important lesson to get early on - even at the very beginning of your career - even if you don’t fully understand it until later.
  • l0b0 1384 days ago
    Nicely put! Most best-practice articles end up reading like dogma because they only ever show clear-cut cases where the best practice applies. Augmenting the plain good/bad examples with examples that are good or bad depending on the situation seems like a great way to avoid that.
  • halayli 1383 days ago
    IMHO one of the best indicators that I've nailed the abstraction is when I am able to use/re-use it in many places.

    A concrete and easy example is when you're developing new software and building your common utils/libs. If you've nailed the abstractions, you'll notice that you're able to expand the libs by frequently reusing/leveraging other libs you've built.

    Abstractions come to exist from the requirements. IMO the key to a good abstraction is being able to dissect a requirement into smaller requirements that you are familiar with, have already solved, and have a solid understanding of.

    Metaphorically speaking, bad abstractions are ones that convert a requirement into a new polygon shape, and good abstractions are ones that dissect a requirement into one or more shapes that we are all familiar with (circle, square, rectangle, rhombus, etc.).

    The difference between a polygon and common shapes is that no 2 polygons will look alike unless the requirement is exactly the same and I'd argue that a developer will create a new polygon if asked to solve the requirement twice.

    When a new requirement comes in, it's common to start implementing it right away. Create one or more classes with names that map to the requirement, add a few methods, etc., and voila, you have a new polygon shape.

    The key point here is to dissect the requirement into sub-requirements that look like familiar shapes (problems you've previously solved). Every once in a while you'll end up creating a polygon here and there for a sub-requirement, which get refactored over time to a known shape.

    A good developer can quickly see the familiar shapes that the requirement is hiding behind instead of creating a new polygon shape.

  • searchableguy 1383 days ago
    This article is good (although they could work on the examples a bit more). One thing I have really found useful when working with Elixir is that it gives you a way to abstract common patterns, or add more use cases, by differentiating based on arity and pattern matching.

    It's easy to write the average function from the article in this way:

      # Guard clauses can't call Enum functions, so match on the head of the list instead
      def average([head | _rest] = args) when is_binary(head) do
        # implementation for a list of strings, using `args`
      end

      def average([head | _rest] = args) when is_integer(head) do
        # implementation for a list of integers, using `args`
      end
    
    which would be a more proper abstraction as it hides details of the type of your data.

    or using pattern matching in the arguments.

      def shape("circle", ...) do
      # implementation
      end
    
      def shape("square", ...) do
      # different implementation
      end
    
    When you have a map as an argument, you can do

      def is_good(%{ hn: HN }) do
        IO.puts "#{HN.someprop} is good"
      end
    
      def is_good(%{reddit: Reddit}) do
        IO.puts "#{Reddit.someprop} is bad"
      end
    
      def multiply_on_two_numbers_otherwise_square(a), do: a * a
    
      def multiply_on_two_numbers_otherwise_square(a, b), do: a * b
     
    My examples are trivial but this really gives you some awesome refactoring powers.
  • imvetri 1383 days ago
    I want to share a few things I learnt while making https://github.com/imvetri/ui-editor.

    It abstracts frontend component development, hiding details about the framework.

    I applied the DRY principle to the code that we write. Framework syntax is repetitive boilerplate that I tried to abstract away.

    The Pragmatic Programmer is a book a friend of mine recommended to me, and it definitely works!

  • deltron3030 1383 days ago
    Kinda sad that we can't focus on "concrete designs" and instead have to deal with the high cost of rewrites, and therefore with "manual organization" and finding those abstractions. If rewrites weren't costly, it just wouldn't make much sense to compose an architecture manually.

    Parametricism is slowly taking over other industries like architecture and industrial design. In essence it's automatic rewrites and programs finding the right abstractions/compositions/organizations based on given parameters, where the designer's job is more in the actual problem domain: providing the right parameters to the program and selecting the most promising outcomes of that automated process.

    The web moving to site generators and serverless is maybe a glimpse of a future with dynamic site generators, where the generators then get much smarter and responsive to input parameters and surrounding contexts.

  • rahulmax 1383 days ago
    I'm a designer. Just came here to say, this is so much in line with the "right level of abstraction" in graphic design:

    https://computersciencewiki.org/index.php/File:Abstract_hear...

  • kovac 1383 days ago
    I'd argue that your definition of an abstraction resembles wrappers (which IMHO are the weakest form of abstraction) rather than abstraction in general. I think it's better to take it as modelling a complex system in a simple way to solve some specific problem by stripping away the unnecessary details. For example, design patterns are abstractions, but I don't think they all qualify as simply collecting a larger interface into a smaller one. Similarly, a virtual machine process like the JVM is an abstraction over the hardware details, which is also a lot more than simply reducing the size of the hardware interface. Still, a useful article. Thanks.
  • luord 1382 days ago
    In the notices example, I arrived at number one as the answer because of a simpler evaluation: it's less code.

    One of the principles I follow is "the best code goes unwritten" so a good rule of thumb for a good abstraction, for me, is if it reduces (noticeably) the total amount of code. Conversely, if abstracting just increases the total code, I call it an indirection and avoid it.

  • mlthoughts2018 1384 days ago
    One hard-fought lesson I've learned over the years is that copy/paste is often a very good solution. If the downside is that a developer has to manually spray a change to 50 different locations with the same copy/pasted implementation snippet, that's really fine. Even 500 or possibly 5000 locations is fine, with a good editor or other tooling available. Testing these changes for correctness is easy.

    Meanwhile the cost for getting an abstraction wrong is often far worse. And it’s really easy to get wrong because abstractions by their nature are always built based on yesterday’s data plus good intentions. People are arguing about subjective theories and unmeasured concepts of extensibility, wasting time arguing about Liskov Substitution Principle, SOLID, dependency injection, type system design patterns, etc., but it’s mostly junk that just adds code bloat.

    Obviously there are other solutions that don't require premature abstraction or copy/pasting 1000 times (like simple module functions as the core unit of reusability, or a macro system for code injection). But the point is copy/paste gets a bad rap. It's simple, straightforward, easy to automate, easy to test, and adds no extra concepts to the code.

    “Functional core, imperative shell” is the best advice I’ve found. Ruthlessly avoid object orientation, and when you need it, stick to extremely shallow inheritance. Make everything a module function, and when you write data structures, don’t give them member function logic for their core functions (like search, sort, add items, remove items), rather create functions that accept data structure instances as arguments and perform these operations with no class-like internal state.
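
    A rough sketch of that shape (names are illustrative):

      from dataclasses import dataclass

      @dataclass(frozen=True)
      class Invoice:
          # plain data: no member-function logic beyond the structure itself
          amounts: tuple

      # functional core: module functions take the data structure as an argument
      def total(invoice: Invoice) -> float:
          return sum(invoice.amounts)

      def with_discount(invoice: Invoice, rate: float) -> Invoice:
          return Invoice(tuple(a * (1 - rate) for a in invoice.amounts))

      # imperative shell: I/O stays at the thin outer edge
      def main() -> None:
          print(total(with_discount(Invoice((100.0, 250.0)), 0.1)))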

    • tigershark 1383 days ago
      No, having to change the code in 50-plus locations is not fine at all, it's criminal. In a team there will always be someone who misses some of the locations, and the implementations will inevitably diverge over time. And at that point, every time anyone needs to make a change they will need to manually fine-tune each of the implementations and test each of them, since their behaviour may well be different.
      • mlthoughts2018 1383 days ago
        Your concerns (missing locations and divergence over time) are fairly easy to address as copy/paste scales. Of course there are downsides, just as there are major downsides to solving with abstraction, it just comes down to a tradeoff. Many times copy/paste wins that tradeoff but is refused because our industry has a cult obsession with abstraction.
        • tigershark 1383 days ago
          How do you address them fairly easily?
          • mlthoughts2018 1383 days ago
            Various editor tools, code quality tools like sonarqube that can be configured to track duplications, writing code generation tools (usually around templating or macro libraries), and heavy coverage of unit testing.

            You can do “test once, reuse everywhere” without any inheritance at all. You could use metaprogramming, sealed types with pattern matching, or macros/code-gen. It’s just tradeoffs to decide between them.

            The point isn’t that creative abstraction never wins that tradeoff, rather just that it wins very infrequently compared to how often it is preferred based on parochial “software design” hand wavy reasons.

            • tigershark 1383 days ago
              So you will need external tools to find the duplication and edit all the locations, hoping that everyone in your team does the same. In the meantime, instead of having 30 lines that solve the problem, you'll have 30*50=1500 lines spread everywhere that do exactly the same thing. With a proper abstraction you can just look at one line and understand what it does; with your solution you need to look at 30 lines, 50 times over. And all this because you are incapable of, or too lazy for, finding a proper abstraction? I have worked on quite big projects with a lot of copy and paste, and only someone who has never had this "pleasure" can be in favour of copy and paste. Code generation, metaprogramming and macros are all tools to avoid duplication, not to allow you to copy and paste multiple times.
    • im3w1l 1384 days ago
      “Functional core, imperative shell”

      I would say this in a slightly different way myself: "Mathematical core, business shell". Code in the mathematical core is defined by its behaviour. "sort" sorts a list. Functionality in the business shell solves business problems, for instance "add_customer".

      In a year's time, sort will do the exact same thing. But "add_customer" may update a different database from what it used to, and in a different way. Code in the mathematical core will never become incorrect because the world has changed. But it can become irrelevant. Sorting may no longer be needed by the application.

      • tigershark 1383 days ago
        Not exactly, the functional core/imperative shell paradigm has nothing to do with the data model/business logic distinction. It’s used in functional languages to create an imperative boundary with the real world and its side effects. For example in the functional core you’ll have no nulls and all the business objects are valid. In the imperative shell that interfaces with the real world you validate all the inputs, build your valid objects and handle all the external error conditions.
    • wh33zle 1383 days ago
      Adding to this, copy/pasting things works especially well if the code is contained within another abstraction that is much more unlikely to change. For example, say your service has a REST API. It doesn't matter much, how the functionality is implemented as long as you are fulfilling the API contract.

      It is so much easier to just get the functionality down first, properly test it and once it is all there, good abstractions within the implementation might just surface from looking at the (duplicated) code.

    • dragonker 1384 days ago
      Could you elaborate on '...when you write data structures, don't give them member function logic,' please?
      • mlthoughts2018 1384 days ago
        I mean

            heap_sort(my_array)
        
        is strictly better than

            my_array.heap_sort()
        
        The main reason is that fluent interfaces (the second format) make it much harder, and require more code, to do patching / mocking / dependency injection, since you need to carefully patch around instance creation and ensure the patching only applies to specific instances during tests where some objects need patching and others don't.

        The first interface above just needs the simple top level name “heap_sort” patched or mocked with easy control of the scope. The act of array construction and the knowledge of internal array data is completely isolated away from the act of sorting.
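
        For instance, with the standard library's unittest.mock (the module and helper names here are hypothetical):

            from unittest import mock
            import mymodule  # hypothetical module defining heap_sort and a process() that calls it

            my_array = [3, 1, 2]

            # patch the top-level name where it is looked up; no instance bookkeeping needed
            with mock.patch("mymodule.heap_sort", side_effect=sorted) as fake_sort:
                mymodule.process(my_array)
                fake_sort.assert_called_once()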

        As you start adding parameters to the mix and involve highly stateful internal operations, start mixing in inherited methods, class methods and static methods, and chain the fluent interface (eg foo().bar().baz()), all these effects get worse and test complexity becomes unbearable.

        Basically it boils down to function composition.

            f(g(x))
        
        is simply a better way to structure programs than

            x.g().f()
        
        And if you’re worried about writing f() and g() once and not rewriting an implementation, there are many ways to achieve this without classes - the best way being use of GADTs to define exhaustive pattern matching in f() or g() that allows custom implementations matched on basically simple record types or named tuples, instead of defining the custom implementation in overridden inheritance methods.

        You don’t need pure functional programming or compiler enforcement to do this either. It is how I’ve structured large C and Python programs in business settings for many years.
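
        In Python 3.10+ that can look like structural pattern matching over plain record types (a sketch, not a full GADT encoding):

            import math
            from dataclasses import dataclass

            @dataclass
            class Circle:
                radius: float

            @dataclass
            class Square:
                side: float

            def area(shape) -> float:
                # custom implementations live in one exhaustive match, not in overridden methods
                match shape:
                    case Circle(radius=r):
                        return math.pi * r * r
                    case Square(side=s):
                        return s * s
                    case _:
                        raise TypeError(f"unknown shape: {shape!r}")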

        • rileymat2 1384 days ago
          The promoters of OO would not disagree with this example. Clean Code goes into a lot of subtle detail about data structures vs. objects.

          https://blog.cleancoder.com/uncle-bob/2019/06/16/ObjectsAndD...

        • kqr 1383 days ago
          But if your language uses lexical scoping (like nearly all of them do) then the procedure call will be incredibly hard to mock in any code that isn't prepared by taking it as an argument.

          As an example I might want to test code that internally calls heap_sort but replace the heap_sort with a test double (sounds like a silly example but maybe the heap_sort calls out to a heap sorting microservice or something equally dumb.)

          With the procedure call and lexical scope, that's not going to happen. With the instance method, that might happen if I pass in an array with a different heap sorting implementation.

          ----

          To be clear: I'm all for functional composition over mutation through interface methods. But it's not, primarily, about syntactic convenience. It's about immutability and referential transparency.

          The example you picked is bad for another reason too: heap_sort is not a fundamental method to arrays that should be part of its core definition. So no, clearly it shouldn't be an instance method, but for different reasons!

          • mlthoughts2018 1383 days ago
            I think it’s the opposite. It’s not going to happen with the instance method version, because patching instance creation is much harder. Patching within the scope of heap_sort is much simpler. I actually gave a real example of this in another comment not long ago:

            https://news.ycombinator.com/item?id=23573358

        • ivalm 1383 days ago
          So would you say there is never a use case for something like the builder pattern? I feel like, at least for readability, builder patterns can be very nice.

          (Take blah, then do a, then do b, then do c, then do d, then return the result, which is of the same type as blah.)

          The functional approach basically moves “blah” all the way to the right.

          In terms of mocking, builder methods are just as simple (since they return the object rather than modifying it in place).

          (Concretely, I am thinking of something like transformations on a dataset, especially if you apply different transformations, or apply them in a different order, at different places in the code.)

          • mlthoughts2018 1383 days ago
            The trouble with the builder pattern is that it basically introduces a new concept (the builder class and any class hierarchy of builders) purely for the sake of constructor functions.

            The constructors that live in builder classes can just be separate module functions, attached to no class, and use decorators or other non-object-oriented types of metaprogramming to add specializations.

            It’s very nice to encapsulate complex creation in a helper function, but builder pattern just takes this idea and adds unnecessary code bloat.

        • tigershark 1383 days ago
          In C# you can simply use extension methods to have the syntactic sugar of the fluent interface and the flexibility of static functions.
        • Izkata 1383 days ago
          > fluent interfaces (the second format)

          > and as you chain the fluent interface (eg foo().bar().baz())

          Aside: A lone method call isn't a fluent interface (though I suppose it could be part of one), so the naming here is a bit misleading. A fluent interface is a particular type of method chaining, so the second example may or may not be one depending on what foo/bar/baz actually do.

        • andi999 1383 days ago
          Excellent points! What does GADT stand for, though?
          • kqr 1383 days ago
            Generalised algebraic data types. You might come across it as an extension to the Haskell type system -- making it even more expressive.
    • crimsonalucard5 1383 days ago
      >“Functional core, imperative shell”

      IO by nature is an imperative phenomenon. Inputs, outputs and prompts happen in procedural temporal steps.

      Due to this, you have to make the shell imperative. You have no choice because reality is imperative and the shell interfaces with reality.

      Unless you're thinking about Haskell, but even Haskell has a black box that takes the IO monad and does imperative things with it.

  • jyriand 1383 days ago
    In a perfect world all my abstractions would be small and self-contained packages, that i could import as a library without any dependencies.
  • bvrmn 1383 days ago
    My first piece of advice for beginners is not to start new code with classes. This lets them see shared-state patterns after they've built up some data model, and extract those into real abstractions.
  • crimsonalucard5 1383 days ago
    People don't understand how to build the perfect abstraction. They think you can't, but the truth is you can get really, really close.

    What usually happens is that even if you designed something that is 99% perfect, requirements change, or some emergent complexity you didn't anticipate appears out of nowhere. Due to this, the abstraction always ends up being imperfect.

    But like I said you can actually get around this problem using a technique people usually talk negatively about.

    The perfect abstraction is actually ravioli code.

    Take, for example, fizz buzz. Your client wants you to implement fizz buzz.

    Typical solution/abstraction level:

       def fizzbuzz(n: int) -> None:
          for i in range(n):
              if i % 5 == 0:
                  print("Buzz")
              elif i % 3 == 0:
                  print("Fizz")
              else:
                  print(str(i))
     
    
    Ravioli code:

       def map(func, list_of_stuff):
           return [func(i) for i in list_of_stuff]
    
       def create_numbers(n: int) -> List[int]:
           return [i for i in range(n)]
    
       Enum FizzBuzzCase = Fizz | Buzz | Int
       def logic(i: int) -> FizzBuzzCase:
           if i % 5 == 0:
               return Buzz
           elif i % 3 == 0:
               return Fizz
           else:
               return Int
    
       def handle_logic(logic_result: FizzBuzzCase, i: int) -> str:
           if logic_result is Fizz:
                return "Fizz"
           elif logic_result is Buzz:
                return "Buzz"
           elif logic_result is Int:
                return str(i)
    
       def display_string(s: str) -> None:
           print(s)
    
       def fizzbuzz(n: int) -> None:
          map(lambda i: display_string(handle_logic(logic(i), i)), create_numbers(n))
    
    
    Less readable, more complicated. But believe it or not, this ravioli code is a better fit for what most people perceive as the "perfect abstraction". Both examples fit the current requirements of the code perfectly, but as with all projects the requirements change, and the ravioli code is the only one that can keep the code a perfect fit with the requirements as they do.

    Let's say the client wants to change this: Fizz must be printed when less than 50 and buzz must be printed when >= 50. Just rewrite one function:

       Enum FizzBuzzCase = Fizz | Buzz | Int
       def logic(i: int) -> FizzBuzzCase:
           if i < 50:
              return Fizz
           else:
              return Buzz
    
    Let's say the client also wanted to reverse the numbers printing from n to 0. Just rewrite one function:

       def create_numbers(n: int) -> List[int]:
           return [i for i in range(n, 0, -1)]
    
    
    Let's say the client also wanted to change from printing to logging to a file. Just rewrite one function:

          def display_string(s: str) -> None:
             log_to_file(s, "log.txt")
    
    Let's say the client also wanted the system to print "Foo" instead of Fizz and "Boo" instead of Buzz. Just rewrite one function:

          def handle_logic(logic_result: FizzBuzzCase, i: int) -> str:
           if logic_result is Fizz:
                return "Foo"
           elif logic_result is Buzz:
                return "Boo"
           elif logic_result is Int:
                return str(i)
    
    None of these edits could be done with the initial version of FizzBuzz without a massive rewrite or ugly ass hacks

    Essentially the perfect abstraction is a program with many, many small abstractions. The smaller the abstractions, the better. Those abstractions combine to form higher-order abstractions, and those higher-order abstractions combine again to form even higher abstractions.

    By building your abstractions using this method you can tune your abstraction at the smallest level of resolution to the point where it is nearly perfect.

    If requirements change you simply need to retune the higher order abstraction by rearranging the lower level primitives.

    Ravioli code is the key to the perfect abstraction. But if you look at the ravioli code initially, overall it looks pretty bad. It's verbose, harder to read and overcomplicated.

    However the ravioli code fits the shape of the problem and evolves with changing requirements!

    One caveat. Don't try doing this with OOP. It doesn't work. Try it if you don't believe me. First break fizz buzz into OOP ravioli primitives, then try to see if it's easily adaptable to the changes I listed above.