This advice has been around for a while (in various forms), and I would say most people converge on it eventually. The only problem I have is with:
"My response is generally the same to all of them, the maintenance for two function is very small compared to a bad abstraction that can grown into a monster very quickly."
That's a little bit of a red flag. Bad abstractions shouldn't be able to grow into monsters, and that matters more than when to abstract! Bad abstractions are going to happen no matter how you create them, but you should always have a design mindset that lets you back out of them and change your code. Two very simple design concepts (which often have very deep implications) help with that: composability and modularity. They should guide your refactoring a lot: not just whether there is duplication, but, if you do extract it, where to? And how will it compose with other things? Do that well and your abstractions are usually easier to change. (Do it badly and you will often see an explosion of types to allow composition, which then makes your abstractions much harder to change.)
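As a rough illustration of the composability point (a Python sketch with made-up function names, not anyone's actual code): an abstraction that grows by accumulating flags is hard to back out of, while small composable pieces are easy to rearrange or inline later.

```python
# A "monster" abstraction: every new caller adds a flag, so backing
# out any one behavior means touching every call site.
def format_report(data, csv=False, header=True, uppercase=False):
    sep = "," if csv else " "
    rows = [sep.join(map(str, r)) for r in data]
    if header:
        rows.insert(0, "REPORT")
    return "\n".join(r.upper() if uppercase else r for r in rows)

# Composable alternative: small functions that callers combine themselves.
def to_csv_rows(data):
    return [",".join(map(str, r)) for r in data]

def with_header(rows, title="REPORT"):
    return [title] + list(rows)

def render(rows):
    return "\n".join(rows)

# Each caller composes only what it needs, so any single piece can be
# changed, replaced, or inlined without breaking the others.
report = render(with_header(to_csv_rows([(1, 2), (3, 4)])))
```

The second version is the "easy to back out of" shape: deleting `with_header` or swapping `to_csv_rows` touches only the callers that use it.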
If you google 'when to refactor rule', the first hit is "rule of three" followed by a bunch of other articles also calling it by this name. It's a fairly popular term/idea.
The problem with this in practice is that you're often not the person implementing the third instance. Some other dev comes along, sees two copy/pasted implementations, and either figures there's a good reason for them, or is lazy, or is risk averse, or just wants to get the job done and move on, and copy/pastes it a third time.
Explanatory comments that tell the reader to go look for other instances of this pattern are the key to success.
// AB-20200405 - Hi future developer (probably me),
// if you're copying and pasting this code, then
// please take an extra five minutes to locate the
// other similar instance of this code snippet in
// the codebase, and refactor both of these into a
// method first. The other snippet will have the
// same comment that precedes this one.
//
// Here's what this overly clever snippet of code
// is doing, so that when you do refactor it into a
// method in 2023, you don't miss an important edge
// case. Ta!
Or, the new dev doesn’t see the second copy, because they gave you the benefit of the doubt, and applies a bugfix to only one copy.
Later, since your team is growing, some other dev applies some other fix to the other branch (which was also copy pasted three times by three distinct devs because “rule of three”).
Soon, you have a dozen copies, and they are all wrong/diverged enough to not be clear copy pastes any more.
I’ve seen this happen more often than not, especially when scaling teams. Since the team is scaling, the original dev won’t always review all the code changes.
There's an art to knowing when it's time to generalize/create an abstraction. Because of this I find it silly to create heuristics or rules for doing so. They're as likely to be wrong as to be right.
Which leaves us in a spot where this is always subjective. Sometimes the third duplication is the time to generalize. Sometimes it's the tenth. Sometimes the first.
Generalizing is only helpful for the future; in the present, it doesn't do much (unless you're simplifying egregiously complex code, but that doesn't require a generalization, just hygiene).
Thus, every generalization is a guess, at best based on previous experience, unless you already know you're going to use it repeatedly. So you'll always have some kind of (experience-based) heuristic; you might as well define one explicitly to go by, as a rule of thumb.
I completely agree with you. There's never a true "rule" of when to refactor. The 1, 2, n pattern, or as I've learned tonight, the "rule of three", is more of a basic guideline than a steadfast rule.
I think it's easy to say, "If you copied a piece of code three times, please put that in a common place" to a junior engineer as a basic lesson for getting started with refactoring.
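A minimal sketch of that lesson (Python, with hypothetical names): the same check pasted into three handlers, then pulled into one shared function on the third copy.

```python
# Before: this exact snippet was pasted into three handlers (one shown):
#
#   def handle_signup(email):
#       if "@" not in email or email.startswith("@"):
#           raise ValueError(...)
#       ...
#
# After the third copy, extract the shared piece into one function:
def require_valid_email(email):
    """Check that used to be copy/pasted into three handlers."""
    if "@" not in email or email.startswith("@"):
        raise ValueError(f"invalid email: {email!r}")
    return email

# All three former call sites now call this one function,
# so a bugfix lands everywhere at once.
```

The (deliberately naive) validation logic here is just a stand-in; the point is the mechanical move from three pasted copies to one call site each.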
I generally agree with this, although this is more helpful when you can't predict future usages of an abstraction. I don't think there's anything wrong with an early abstraction if you know for certain it will be reused, or if it makes the code easier to understand.
The funny thing is that what we really need and lack is the ability to inline an abstraction. If we could systematically break down the abstraction hierarchy and rebuild it from the ground up when needed, with a helper tool, then many issues would go away. The stress is on "systematically" and "helper tool". Sure, you're smart and you can do it by hand, but it's easy to forget about cases. Coding is not hard at all from this perspective; not forgetting any case is hard, and tedious, and could surely be automated.
The more important thing to me is to make sure the code is written a way that makes it easy to refactor / add abstractions whenever it becomes necessary. Things like:
- having explicit variable / function names to make global search/replace easier when necessary, but generally just for clarity.
- thinking about file structure early and often
- always using constants to represent any value that is repeated across the system
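For the last point, a small sketch (the names are invented): a value repeated across the system gets one named constant, so finding or changing every use is a single grep or a single edit.

```python
# One authoritative definition instead of the literal 30 scattered
# through timeouts, retry logic, and docstrings across the codebase.
SESSION_TIMEOUT_SECONDS = 30

def make_session_config():
    return {"timeout": SESSION_TIMEOUT_SECONDS}

def is_expired(idle_seconds):
    return idle_seconds > SESSION_TIMEOUT_SECONDS

# Changing the limit later is one edit, and a search for the
# constant's name finds every use, unlike a bare "30".
```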
You learn this pretty quickly in practice; I think every junior dev has tried to make an abstraction the first time they saw some complicated business logic, and failed miserably. Now, if I think an abstraction is easy I'll do it the second time around; otherwise I'll wait until it's absolutely necessary.
A related (but distinct) concept I've heard is "0, 1, n" rule: there are rarely actually 2 or 3 or 10 of a "thing". But often there are 0 of a thing, or 1 of a thing. If it's not 0 or 1 thing, it's probably n things.
I've heard a slightly different version of that. 0, 1, infinity. For a data item, there should be 0 items, exactly one item, or (up to) an infinite number. You shouldn't have, for example, a fixed limit on the size of a string or the number of different phone numbers a person can have. If they can have more than one phone number, they should be able to have an infinite number.
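A sketch of the data-modeling consequence (Python, with invented fields): model "can have more than one phone number" as an unbounded list rather than fixed phone1/phone2 slots.

```python
from dataclasses import dataclass, field

# Fixed-slot design (what the rule warns against): it hardcodes an
# arbitrary limit of two numbers per person.
#
#   @dataclass
#   class Person:
#       phone1: str = ""
#       phone2: str = ""

# 0, 1, infinity: zero numbers is an empty list, one number is a
# one-element list, and there is no arbitrary upper bound to outgrow.
@dataclass
class Person:
    name: str
    phones: list[str] = field(default_factory=list)

p = Person("Ada")
p.phones.append("+1-555-0100")
p.phones.append("+1-555-0101")  # no schema change needed for a second number
```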
I'm doing this on a project I was pulled in on. I have 5 sizable behaviors, all copy pasted. I finished merging them this weekend.
"When you find yourself making the same enhancement in many places".
One of the first things I learned as a programmer was "Did you fix it everywhere?" Then, later, I realized how nice it was to have only one place that it needed to be fixed.