The problem with invariants is that they change over time

(surfingcomplexity.blog)

117 points | by kiyanwang 11 days ago

17 comments

  • BlackFly 10 days ago
    I would recommend not calling such things invariants and not thinking about them in the same way you think about invariants. If you will allow a comparison to physics, you would call such things assumptions: frictionless pulleys, small-angle deflections, much slower than light speed, less than nuclear density. The physical theory developed is then correct while the assumptions hold. If you are making such an assumption and your language supports it, add a debug assertion.
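
    (As a minimal sketch of that last suggestion, in Rust, with a made-up region size and function purely for illustration:)

        // Assumption (not an invariant): loaned data currently always fits in
        // one region. Nothing enforces this by construction, so record it with
        // a debug assertion that fires in debug/test builds if it ever breaks.
        const REGION_SIZE: usize = 4096; // hypothetical value

        fn pack_into_region(data: &[u8]) -> &[u8] {
            debug_assert!(
                data.len() <= REGION_SIZE,
                "assumption violated: data no longer fits in a single region"
            );
            data
        }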

    For me, invariants are constructs of the design of an algorithm. A list doesn't change while you iterate over it - by construction, not because someone else owns the reference and promises not to change it concurrently. This structure can only be instantiated via this function, which ensures that the string field always contains only numeric digits. Those are invariants: things you enforce by design so you can rely on them later. Assumptions, instead, are things you do not enforce but rely on anyway.

    Back to physics, an invariant would generally be something like conservation of energy or the speed of light being universal. Both of these things are only invariant in certain physical theories which enforce them by construction.

    • MereInterest 10 days ago
      I think I'd put the difference between "invariant" and "assumption" as purely a difference in framing. Both describe a problem in communication between two components of a system. Suppose there are two components, where the output of component A is sent to component B. Both components seem to be working correctly, but when run together they are not producing the expected output.

      * Option 1: Component A is violating an invariant of its output type. Whether that is invalid data (e.g. nullptr), internal inconsistency (e.g. an index that will be out of bounds for its array), or a violated mathematical property (e.g. matrices that are supposed to have determinant one), the output generated by Component A is wrong in some way.

      * Option 2: Component B is making an unjustified assumption about its input type. The implementation was only valid when the assumption held, but Component B's interface implied that it could accept a wider range of inputs.

      The only difference between these two cases is the human-level description of what the type is "supposed" to hold. By describing it as an "invariant", the blame is placed at Component A. By describing it as an "assumption", the blame is placed at Component B.

    • DougBTX 10 days ago
      Agreed, we shouldn’t use “invariant” as a shorthand for “not expected to change”.

      We can still talk about “implicit assumptions” and “explicit assumptions”, so the article still works after that replacement. The headline isn’t as punchy though.

    • n4r9 10 days ago
      Precisely. If you asked a developer at the time whether the "assumption that any loaned memory would fit into one region" was static, I'm guessing they'd say "no, it's just what we've assumed for now. I'm hoping that no one changes that without it being discussed properly."
    • xmcqdpt2 10 days ago
      I don't think the examples you provided differ all that much. Someone might add a new constructor that no longer sets the string fields correctly, or your function might get modified so that it can be passed a mutable structure. Type systems and tests can be used to make it harder for future people to break your invariants, but they can still be broken.
      • jayd16 10 days ago
        There's a difference between a user breaking an assumption and a self-inflicted breaking change. For starters, the former can happen after compile time.
      • BlackFly 7 days ago
        You've chosen a good example to further this discussion precisely because it emphasizes the difference between assumptions and invariants in code: the point at which they break. Assumptions break at unpredictable times (like when the machine architecture changes); invariants break when someone modifies the code.

        If you or a colleague modified the NumericString constructor so that it no longer enforces the invariant that it contains only digits (remember, though, that UTF-8 has more than just the ASCII digit characters; we aren't necessarily talking about an ASCIINumericString) but can also include hyphens, a reviewer should have alarms going off in their head: somewhere in the code, surely, some conversion to numbers is going to fail without further manipulation. That is the benefit of an invariant: it is part of the code by construction and enforced there. The purpose of reviews, and the purpose of encapsulation and strong typing, is to narrow the boundary where the invariant must be enforced.
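
        (A rough sketch of such a by-construction invariant, in Rust; the names mirror the hypothetical NumericString above and aren't from any real codebase:)

            /// A string that, by construction, contains only ASCII digits.
            /// The field is private, so the only way to get one is `new`,
            /// which is the single place the invariant has to be enforced
            /// (and therefore the single place a reviewer has to watch).
            pub struct NumericString(String);

            impl NumericString {
                pub fn new(s: String) -> Result<Self, String> {
                    if s.chars().all(|c| c.is_ascii_digit()) {
                        Ok(NumericString(s))
                    } else {
                        Err(s) // reject, returning the offending string
                    }
                }

                pub fn as_str(&self) -> &str {
                    &self.0
                }
            }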

        Assumptions, on the other hand, have no narrow boundary. Memory size changes: what part of your code is no longer valid? It is potentially unbounded, if your code was allowed to rely on something that was not enforced.

    • Joel_Mckay 10 days ago
      Personally, I view most invariants as another form of state embedding. It can often be unrolled or duplicated (GPU mailbox) for parallelization, but does not necessarily exclude out-of-order chip specific atomic operations on semaphores etc.

      There is a folk song written about Black flies:

      https://www.youtube.com/watch?v=w2q8jEuH3nk

      =3

    • im3w1l 10 days ago
      Well, consider a system built on the assumption of conservation of energy. Then, a few product iterations later, the system might start leaking energy into the environment. Maybe it's the neodymium magnet in some unrelated component leading to eddy-current braking?
      • epgui 10 days ago
        That's not a violation of the conservation of energy, and the invariant isn't broken.
    • richrichie 10 days ago
      Well stated. Physics analogy is spot on.
  • bruce511 10 days ago
    Looking back, I find it hard to offer concrete advice. So many of the "things I know to be true" changed over time, while so many of the "doing x to allow for future y" didn't lead to anything.

    On the other hand, I have systems still being worked on and still shipping 20 years later because of fortuitous decisions in the 90s. Most of those were taken because of bad experiences with earlier systems.

    For example, I'm a big fan of surrogate keys and UUIDs in database design. Mostly because of experience with systems where they weren't used, which caused problems later. (By contrast, the performance costs of using them get lower every year.)

    Conversely deciding to build code libraries parallel to application development means long-term gains in performance and reliability.

    It's easy to view all this in hindsight, -much- harder with foresight.

    Today I try and see everything through 2 rules.

    1) Beware of "always" statements, especially in regard to client requirements. IME "always" is used as a synonym for "mostly" and "at the moment".

    2) choose paths now that keep options open later. Will this system ever use distributed data? Probably not, but choose a primary key that leaves that option open.

    Lastly, and this applies to data design especially, remember "nothing is universal". Using a Social Security number as an ID is great, until they outsource. Phone numbers are always 8 digits? Yeah, right. Nobody works night shift... you get the idea.

    If in doubt, err on the side of "variant". You'll be right more often than not.

    • abraae 10 days ago
      > For example, I'm a big fan of surrogate keys and UUIDs, in database design.

      You would think this (using surrogate keys) would be such well-worn wisdom by now that discussions about it wouldn't be a thing. But somehow new generations of developers, weaned on NoSQL and on not using the god-given gifts of databases like integrity constraints, seem to love bikeshedding key design and arguing vociferously that natural keys can be used as primary (and foreign) keys.

      • prometheus76 10 days ago
        We were pulling the entire list of tasks (around 80,000 tasks from various projects) from Microsoft Project 2012 (MSP) and dumping it into a database every day, so that we could track progress of projects daily. MSP uses GUIDs and they are great for uniqueness, but the memory footprint they caused when reporting using Power BI became untenable after about six months.

        We ended up creating our own integer table tied to the GUIDs and replaced all the GUIDs with integers in our reporting database. Our memory footprint dropped by 10X and we were able to report on years' worth of data rather than months.

        • CodesInChaos 10 days ago
          > Our memory footprint dropped by 10X

          How? A UUID is only 4x the size of a 32-bit integer.

          • prometheus76 10 days ago
            We had multiple columns using UUIDs that we needed, and the Tabular memory model's usage goes up exponentially with every column.
      • ThereIsNoWorry 10 days ago
        You wouldn't believe how often I have to fight for UUIDs instead of sequences. UUIDs are great: for all practical purposes, zero possibility of collision; you can use one as an ID in a global system. It just makes so much fucking sense.

        But the default is still a natural-number sequence. As if it matters for 99% of all cases that stuff is "ordered" and "easily identifiable" by a natural number.

        But then you want to merge something, or make double use of it, and suddenly you have a huge problem: it isn't unique anymore and you need more information to identify a record.

        Guess what: a UUID does that job for you. Across multiple databases and distributed systems, a UUID is still unique with 99.9999% probability.

        The one counter-example every 10 years can be cared for manually.

        • abraae 10 days ago
          As far as I know the only practical downside with UUIDs in the modern age - unless you like using keys for ordering by creation time, or you have such enormous volumes that storage is a consideration - is that they are cumbersome for humans to read and compare, e.g scanning log files.

          And in any case the German tank problem means that you often can't use incrementing numbers as surrogate keys if they are ever exposed to the public, e.g. in URLs.

          • mrkeen 10 days ago
            > they are cumbersome for humans to read and compare, e.g scanning log files.

            I think the positive still outweighs the negative here: you can search your whole company's logs for a UUID and you won't get false positives like you would with serial integers.

        • akira2501 10 days ago
          I've been using UUIDv7 a lot lately and I've been quite happy with the results.
      • marcosdumay 10 days ago
        I never found any kind of literature that clearly states that artificial keys are superior identifiers to natural ones.

        Every material about relational data I've seen either just explains them, says they can be either natural or artificial, and leaves it at that, or says there's some disagreement over what is better.

        The unanimity exists only among the field experts who apply the stuff. The theoreticians all disagree.

        • CodesInChaos 10 days ago
          I almost always use surrogate keys as primary keys (or a tuple of them, for join-tables). Natural keys are rarely as immutable and unique as we'd like.

          Even ISO country codes can be problematic, for example we currently use XK for Kosovo, but it might get an official code at some point.

        • bruce511 10 days ago
          It's less about literature, and more about experience.

          Over the years, just about every natural key I've seen has ended up being either not universal (-everyone- has a phone number, right? Even that newborn baby?), or not unique (phone numbers can be shared), or mutable (like people's names changing).

          It's just so much easier to add a column and use a surrogate key. I prefer UUIDs because they are unique and immutable - properties I really appreciate in primary keys.

          • abraae 10 days ago
            It's an easy challenge. Just ask someone to name a natural key that will never change. They are as rare as rocking horse shit.
    • magicalhippo 10 days ago
      > beware of "always" statements, especially especially in regard to client requirements.

      Along with its complementary statement:

      Beware of "never", especially in regards to client requirements.

      Been on too many projects where it turned out "never" meant once or twice per year once we'd gone into production...

      • bluGill 10 days ago
        I put it a different way: programmers should gain domain knowledge instead of only reading requirements. If you have an understanding of the customers, you can make a good call 90% of the time as to what customers will demand in the future, without marketing's misunderstandings getting in the way. Marketing often wants today's features now, and so will tell you "never" when what they really mean is: don't make the next release take longer in order to add that future feature, but if you can design for it without making the schedule take too much longer, do it.
        • magicalhippo 10 days ago
          Having domain knowledge is a superpower of sorts. We're a small team doing B2B software, and the senior programmers and the head of the programming team in particular have a lot of domain knowledge.

          Thus as you say we can very often immediately figure out when something is missing from the requirements, or when we feel things might be changing soon down the line.

          We can also detect when the customer is requesting something suboptimal or similar. We've received a lot of positive feedback from customers due to us devs pushing back, helping them to improve their processes and workflow. But it also leads to less work for us, or more generic code which can be reused by more customers.

          Our support team also has extensive domain knowledge, many have been hired from our customers. It makes our support excellent, often the customer's issue is a combination of how to do X and how to do X in our program, and for new devs this is a great resource to tap into.

          Together we're punching well above our weight, dominating our niche with a quite small team. And domain knowledge has helped a lot in that.

    • xxs 10 days ago
      >For example, I'm a big fan of surrogate keys and UUIDs, in database design.

      While I fully support surrogate keys and versioning, I do not support UUIDs as primary keys - they are trivial to generate, not so much to place in B-trees. It's very common to need to process recently added data together; UUIDs make that much harder, as their inherent placement is all over the place without any locality.

      UUIDs are ok/fine as external references, of course.

      • antonyt 10 days ago
        This is a solved problem. See UUIDv6 and higher.
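
        (For illustration, a sketch using the Rust `uuid` crate, assuming a 1.x version with the `v7` feature enabled; the point is that v7 IDs start with a timestamp, so rows created close together also index close together:)

            use uuid::Uuid;

            fn main() {
                // UUIDv7 begins with a 48-bit Unix-millisecond timestamp, so values
                // generated around the same time sort (and therefore cluster in a
                // B-tree index) near each other, unlike effectively random UUIDv4s.
                let id_a = Uuid::now_v7();
                let id_b = Uuid::now_v7();
                println!("{id_a}\n{id_b}"); // shared timestamp prefix, random tail
            }
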
    • yafetn 10 days ago
      > 2) choose paths now that keep options open later.

      Not sure about this one. I’ve seen teams that are too noncommittal in their architecture decisions because “what if [insert unlikely thing that’d not actually be the end of the world] changes?” Then the project ends up only using surface level features of their chosen tooling because people want to keep too many options open and are afraid to commit to something where it matters.

      • bruce511 9 days ago
        There's no good idea or pattern that some team, somewhere, can't warp to create a bad outcome.

        Generally speaking there are lots of well-established best-practices, that are best in most contexts, but which fall over in other contexts. Knowing what to do where is part of the intangibleness that separates the great from the good from the competent.

        So some things are good to defer. -if- this product works, we -may- need to scale, so let's choose a database, and design, that allows for multiple servers. Picking say Postgres over SQLite. Other times saying "this is just a proof of concept, do it quick in SQLite".

        Is our API gonna be XML or JSON? Not sure, could go either way. Let's design so that part can be trivially replaced.

        With data design especially, IMO, it's worth -planning- for success. Starting with UUIDs will end up fine for the 99% of tables which will have "not many records". When we identify the 1% which is growing by millions of records a minute, we can adjust for that table as desired.

        [On a side note, regarding clustering, it seems some people aren't aware that the primary key and clustered key don't have to be the same.]

    • Terr_ 10 days ago
      I like to say "design for deletion", meaning the priority is making sure code with outdated assumptions or fit can be found, deleted, and replaced easily.

      This is in contrast to how my younger self would instead focus on making code that is "extensible" or "flexible" or "configurable". (With some overtones of impressing people, leaving a long-term mark upon the project, etc.)

      Nope! Go for "delete-able."

      • vsnf 10 days ago
        I've never heard this framing before. Can you offer any examples of what you mean?
        • Terr_ 10 days ago
          A lot of it overlaps with Ya Ain't Gonna Need It and avoiding strong coupling, but I think the framing makes it easier to stay on target: A developer is less-likely to end up going: "Hey guys, I created a decoupling framework so we can swap anything for anything whenever we want!"

          If you design thinking "X years from now business-requirements or staff will have changed enough that this will need to be ripped out and rewritten, and I am not good enough at predicting the future to solve it cleverly", then it sets up a different expectation of how you invest your time.

          One might focus on code where it's easy to trace dependencies, even if that means a bit more boilerplate and avoiding layers of indirection that aren't (yet/ever) needed. "Greppability" of code also becomes important, especially if your language/tools aren't very good at tracing usages and caller/callee relationships.

          • whstl 10 days ago
            Yep, you nailed it.

            Often the problem is exactly this magic "decoupling framework", which ends up being exactly the problematic part that one eventually wants to swap out.

            • Terr_ 9 days ago
              I've been trying to think of a micro-example from some refactoring I've been doing lately.

              Long ago somebody made a Thingy.find(...) method where you can pass all sorts of optional arguments to filter which Thingy comes back, like Thingy.find(primary_key=1) or Thingy.find(owner=123, active=true) etc. That complexity made it flexible for people writing caller-code, because they wouldn't need to alter the method to satisfy their future arbitrary needs, they can just pass whatever key-values they want to match.

              However, some of the original invariants have since changed, and now I need to go find all the places it's being used ("flexibly") and reconstruct the caller's intent, because in some scenarios the correct return value is going to be a list of multiple results instead of 0-or-1.

              In contrast, the task would be easier if there were named methods that captured the intent of the 3-4 distinct use-cases.
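
              (A loose sketch of that contrast, in Rust rather than the original language, with hypothetical names:)

                  struct Thingy {
                      id: u64,
                      owner: u64,
                      active: bool,
                  }

                  // Flexible but opaque: callers encode intent in ad-hoc filters, so later
                  // you can't tell which call sites expect 0-or-1 results and which would
                  // actually be fine with a list.
                  fn find<'a>(things: &'a [Thingy], filter: impl Fn(&Thingy) -> bool) -> Option<&'a Thingy> {
                      things.iter().find(|&t| filter(t))
                  }

                  // Intent-revealing: each use-case gets a name, so when an invariant changes
                  // (e.g. "owner + active" may now match several rows) the affected callers
                  // are easy to locate and only one signature has to change.
                  fn find_by_id(things: &[Thingy], id: u64) -> Option<&Thingy> {
                      things.iter().find(|t| t.id == id)
                  }

                  fn find_active_by_owner(things: &[Thingy], owner: u64) -> Vec<&Thingy> {
                      things.iter().filter(|t| t.owner == owner && t.active).collect()
                  }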

      • hinkley 10 days ago
        There’s a lot of power in designing unit tests for deletion. I’d have to think about whether I feel that applies universally.
  • yen223 10 days ago
    The problem I think is that when you define an invariant, you are also defining an infinite number of "implicit" invariants. And it's hard to know a priori which of those implicit invariants is not an actual invariant over time.

    Take a simple User type

      type User {
        name: string
        email: string
      }
    
    This describes some obvious invariants - that a User will have a name and an email.

    But it's also describing some less obvious invariants, e.g. that a User will not have a phone number. This has implications if a phone number ever does get added to a User type, and your system has to deal with two different versions of Users.

    (This isn't just an academic problem - the problem of trying to evolve fields over time is the reason why Protobufs got rid of the idea of a "required" field.)

    And this is an example of a very simple implicit invariant. In practice, systems that interact with each other will give rise to emergent behaviour that one might think is always true, but is not guaranteed.

    • atoav 10 days ago
      This is why I tend to like to use strong type systems where possible and necessary.

          struct LoginUser {
              name: UserName,
              email: Email,
              phone_number: Option<PhoneNumber>,
          }
      
      First, the user type should reflect the function of the user. This is one that is allowed to log in (overly simplified, of course). You can then also have an OnboardingUser, an ArchivedUser, and so on. And if you want to transition between those different types, you need to ensure the required fields are there. This is easy to test and keeps things clear.

      The other thing is to use types for fields that have hard verification requirements. You can still have them behave mostly like strings for reading, but you can ensure that the Email in a LoginUser never contains an unverified email, by first verifying the email string in the OnboardingUser and only converting it into the Email type once it is guaranteed to be valid. Because you use a dedicated type, tacking on further things, like storing the date when this verification last happened, is easy: just put it into the type as well.

      Lastly, the phone number is an Option that wraps a PhoneNumber type. This way it can either be None or Some(PhoneNumber) (this uses the Rust type system, but the principle works elsewhere). The phone number should be verified the same way, ideally relying on a well-tested method that also fixes the formatting at the same time.

      Now, in Rust, if some part of the system didn't handle a user that has a phone_number, your code won't compile, because from the standpoint of the old code this would now be an entirely new, unknown type never seen before. Sure, you could step through your code and explicitly not handle it on purpose because you want to get back to a running system fast, fail to add a TODO, and then later forget about having to do it, but that would be entirely on you.

      Now, those hard type-system constraints become even more valuable when dealing with external systems. Your types should represent all the assumptions you are making about the data, and test those on each import. Of course you will then have import failures if the import source changes, but we want it to fail when the source changes. What we don't want is the thing silently chucking the input into the wrong pipe, or invalid phone numbers being stored because the source system decided to split them up into the prefix and the rest.

      • yen223 10 days ago
        Static types are really good for describing invariants, I definitely agree.

        But it does also describe a lot of implicit invariants. The LoginUser you've defined has the implicit invariant that a LoginUser only has 3 fields. It is very possible for someone to write a function that relies on a LoginUser only ever having 3 fields, which then breaks if you ever extend LoginUser. This is similar to the example the article mentions at the start, where a thing that someone assumed would always fit in one memory region no longer did, and that caused all kinds of problems.

        Those kinds of implicit invariants are the ones that are tricky to make explicit, even in strongly-typed languages.

        • saghm 10 days ago
          > But it does also describe a lot of implicit invariants. The LoginUser you've defined has the implicit invariant that a LoginUser only has 3 fields. It is very possible for someone to write a function that relies on a LoginUser only ever having 3 fields, which then breaks if you ever extended LoginUse

          Rust does actually have a good way of dealing with this specific invariant. If you mark a struct with the `non_exhaustive` attribute[0], trying to construct it with a literal will fail outside of the crate that defines it (even if all fields are marked as public), and pattern matching it outside the defining crate requires the `..` operator to acknowledge any fields you haven't named.

          Not trying to say that it's possible to encode every possible invariant in Rust, but I do find it useful to leverage all of the tools available, and I've found this one super useful (on both structs and enums) despite not being super well known from what I can tell.

          [0]: https://doc.rust-lang.org/reference/attributes/type_system.h...
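
          (A quick sketch of how that looks in practice, with illustrative names:)

              // In the defining crate:
              #[non_exhaustive]
              pub struct LoginUser {
                  pub name: String,
                  pub email: String,
              }

              // In a downstream crate, struct literals like `LoginUser { .. }` won't
              // compile, and destructuring has to acknowledge that more fields may
              // appear in a future version:
              fn describe(user: &LoginUser) -> String {
                  let LoginUser { name, email, .. } = user;
                  format!("{name} <{email}>")
              }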

        • valenterry 10 days ago
          I think that's a wrong assumption on your side. You are assuming that the type describes the "actual thing". But there is no such actual thing, really. All we do is model, and here we model a view of the user within a certain context. It's totally possible that a different department will model their user in a different way.

          This isn't even a technical problem, it's a semantic problem. Trying to reuse the same type throughout the whole world is doomed to fail, no matter what techniques you use.

          • atoav 10 days ago
            This is good advice. If you take your models of the world too seriously chances are you are in the process of programming yourself into a corner, as things get more complex.

            Getting attached to your classes by writing a single superclass of The User™ is a sure way of missing out on the benefits of the type system and creating coupling where there should be none.

            Knowing where to couple things and where to decouple things is one of those hard programming issues, but one that might be even more important than naming things.

        • atoav 10 days ago
          This was an off-the-top-of-my-head example of how you can catch many invariants, not meant to be exhaustive, perfect code. I was also implying that the user type is something you use within your application, not something you expose to other people directly. If your application has to expose internal data, it often makes more sense to add a specific interface for it (e.g. an API with versions). If there is code where people rely on your internal types, it is your responsibility to use semantic versioning to communicate breaking changes, while it is the other person's responsibility to expect breaking changes when versions change.

          If we are talking about someone else relying on your user having 3 fields, that someone is a colleague: it is a good idea to test your code against the whole codebase. Then you know ahead of time what will fail and can act accordingly.

      • cryptonector 10 days ago
        Using "optional" is not really enough for struct type extensions. You really want something more akin to ASN.1's extensibility markers, or typed-holes, or class derivation, etc.
    • _ZeD_ 10 days ago
      >>> (This isn't just an academic problem - the problem of trying to evolve fields over time is the reason why Protobufs got rid of the idea of a "required" field.)

      If this is the reason for dropping the required field, it doesn't seem smart to me.

      They are conflating the definition of a schema with the schema evolution logic.

      And there are multiple ways to deal with the latter (like adding a schema version, for instance)

      • larsrc 10 days ago
        Some "required" fields in very commonly used protos ended up being unused, but everybody had to fill them out all the time. If they didn't, the proto wouldn't even parse, by design.

        Protobufs explicitly do not have a built-in version system (though of course you could add an optional field for it), presumably because it is better to inspect the data than to build in the assumption that a certain version covers a certain invariant.

      • yen223 10 days ago
        I don't have a very strong opinion about that change specifically. I think they could have had their cake and eaten it too by solving schema evolvability some other way, but I also suspect those other ways would come with tradeoffs that are also not pleasant.

        But, the point is schema evolvability is a real problem, and it's often not one that a lot of engineers give a lot of thought to, even those who live in very strict statically-typed worlds.
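
        (For a flavour of the tradeoff, here is a sketch in Rust with serde - a stand-in for protobuf-style evolution, not what the protobuf designers actually did: a new field is made optional so old payloads still parse, at the cost of every consumer having to decide what a missing value means. Assumes the serde derive feature plus serde_json.)

            use serde::Deserialize;

            #[derive(Debug, Deserialize)]
            struct User {
                name: String,
                email: String,
                // Added in a later schema revision. Because it is an Option, old
                // payloads that lack the field still deserialize (as None), but
                // every consumer now has to decide what None means.
                phone_number: Option<String>,
            }

            fn main() {
                let old_payload = r#"{"name": "Ada", "email": "ada@example.com"}"#;
                let user: User = serde_json::from_str(old_payload).expect("old payloads still parse");
                assert!(user.phone_number.is_none());
                println!("{user:?}");
            }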

    • cryptonector 10 days ago
      > But it's also describing some less obvious invariants, e.g. that a User will not have a phone number.

      That's not right. For one, the `User` type might be extensible (what language is this?). For another one can always add additional metadata about users elsewhere -- think of how you might do it in a relational database.

      > (This isn't just an academic problem - the problem of trying to evolve fields over time is the reason why Protobufs got rid of the idea of a "required" field.)

      Protobufs are the poster child of bad reinvention. ASN.1 got all these things right over a long period of time, and instead of taking the best of it and leaving out the worst of it, everyone keeps reinventing ASN.1 badly. And yes, ASN.1's evolution was "academic" in many ways, and its authors blazed trails. ASN.1 never had a "required" anything, but it did (and does) have an OPTIONAL keyword.

      The reason I mention extensibility above is that ASN.1 has been through the whole academic study of extensibility over several decades:

      - since the original encoding rules were tag-length-value, in the beginning you could just add fields, as long as you knew the recipient could handle them (e.g., by ignoring unknown fields)

      - when that proved unsatisfying, they added extensibility markers for denoting "here go extensions" vs "no extensions allowed", and "these are version 2 extension fields", and so on - and not just for fields but also for things like INTEGER ranges, enums, etc.

      - they also put a lot of effort into formalizing the use of "typed holes" (things like {extension-type-ID, <encoded-value-of-that-type>}), and they did this pretty much before any other schemes. (The Wikipedia page comparing serialization schemes[0] refers to "typed holes" as "references", FYI.)
      
      Extensibility is fairly well covered in many serialization formats now, like XML for example, but still fairly poorly covered in others.

      In SQL and relational systems in general one can always add additional fields via tables to JOIN on, and often by adding columns to existing tables.

      Extensibility is a big deal, and serious systems will have had serious thought put into it.

        [0] https://en.wikipedia.org/wiki/Comparison_of_data-serialization_formats
  • kibwen 10 days ago
    I'm happy to see people talking about the problem posed by implicit invariants, but this post strikes me as defeatist. This line especially:

    > Implicit invariants are, by definition, impossible to enforce explicitly.

    Yes, Hyrum's Law (and, at the limit, Rice's Theorem) and all that means that we won't ever be perfect, but just because an invariant is currently implicit doesn't mean it must remain implicit forever. We can identify classes of implicit invariants and make them explicit using things like type systems.

    • metalrain 10 days ago
      I understood the article as saying that implicit invariants are like undocumented choices.

      If I have a blog where blog posts have a maximum length of 4096 bytes: why would there be such a limit? Is it a creative decision? Is there a technical limitation? Was it just some "good enough" number when the blog was created?

      I don't think type systems can really encode these reasons. You see the constraints but not the reasoning.

      • chuckadams 10 days ago

            type BlogPostThatFitsInSMS = BlogPost<FixedString<150>>
        
        No one would write this type in the real world (and most type systems can’t encode it), but you see lower-level sorts of reasoning all over the place in type systems.
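
        (For what it's worth, a length bound is one of the constraints a type system can carry; a rough Rust sketch using const generics, with hypothetical names - though as noted above, the type records the limit, not the reason for it:)

            /// A string whose length is checked once, at construction,
            /// so everything downstream can rely on `len() <= MAX`.
            pub struct BoundedString<const MAX: usize>(String);

            impl<const MAX: usize> BoundedString<MAX> {
                pub fn new(s: String) -> Result<Self, String> {
                    if s.len() <= MAX { Ok(Self(s)) } else { Err(s) }
                }
            }

            // The "why 4096?" question is still unanswered, but the limit
            // now at least has a single, named home.
            pub type BlogPostBody = BoundedString<4096>;
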
  • Swiffy0 10 days ago
    Isn't the whole point of the term "invariant" that it describes something as unchanging under specific circumstances?

    e.g.

    The sum of the angles of a triangle is 180 degrees in the context of Euclidean geometry. However, if we project a triangle onto a sphere, this no longer holds. So the sum of the angles is an invariant under Euclidean geometry.

    On the other hand, the value of pi is a constant because it stays the same regardless of the circumstances. That's why all the numbers themselves are constants as well - the number 5 is the number 5 absolutely always.

    So if you have a value that changes over time, it is definitely not a constant. It could be an invariant if you, e.g., specify that the value does not change as long as time does not change. Your value is now an invariant in the context of stopped time, but it can never be a constant if there is any context where it does change.

    • eschneider 10 days ago
      "the number 5 is the number 5" isn't so much invariant as axiomatic.
    • pixl97 10 days ago
      >the value of PI is a constant

      You mean "about 3" right?

  • larsrc 10 days ago
    Usually an invariant is something you explicitly state about a data structure or algorithm, and which must hold before and after an operation, but can be temporarily violated.

    What you are talking about here are assumptions, which are usually implicit and sometimes not even part of the thought process. One purpose of writing a design document is considering and stating the relevant assumptions.

  • mmis1000 10 days ago
    It's still way easier to handle when you know where it will break (even if that is far away from you) instead of just letting it break everywhere. Making it just explode so you can fix it is still much better than an unknown bug - one you never know whether you have really fixed - haunting you forever.

    Rust, for example, enforces its invariants everywhere. It's probably harder to build, but it is also easier to find out where and why it won't build, instead of having infinite memory-safety problems everywhere.

    And what about implicit invariants? Sometimes a simple code change can fix one, or a test would cover it, or an assertion. Or you need a whole new language (like Rust) that can describe and test it safely, because the one you are using just can't. People are not even close to finding the answer to this problem, but the direction is to make things explicit.

  • throwaway74432 10 days ago
    Yes, it's called evolution. Software evolves as well as scales. Scaling is architected growth, evolution is unarchitected growth. (Sometimes scaling results in unarchitected growth, if the architecture was not reasonable.) There are many patterns for handling evolution, but they almost always involve a pattern outside of the existing architecture (a super architecture) to lean on for support. In my opinion, codifying and optimizing these super architecture patterns is one of the highest goals in software engineering because they allow for less error prone evolutions.
    • larsrc 10 days ago
      Thanks to Titus Winters for the phrase "Software engineering is programming integrated over time". Handling evolution of software is different from just writing it.
  • brabel 10 days ago
    I have always used asserts to enforce invariants (sometimes not just actual "assert" but things like `Preconditions` in Java which are always enabled). Those will actually break during testing if they ever change.

    I understand you cannot assert all invariants, but as far as I can see, that's your main tool against this problem.

    • perlgeek 10 days ago
      And those that you cannot assert, you can still document, which will make debugging easier.
  • olivierduval 10 days ago
    IMHO, an "invariant" is only relative to the current system being built, based on a specific set of specifications. As soon as you build a new version of the system (either because you change the functionality, the implementation, or because the environment changes), you're expected to check which invariants still apply and which should be added, removed, or changed...

    In a way, "invariants" are the same as "test sets": they are there to help ensure that the product has some specific properties. Both are used during product development and checked. And both must be updated when the specs change.

  • agentultra 10 days ago
    Specifications can change over time. It sounds to me as though the developer who made the change to the system that invalidated the invariant of the original specification wasn't aware of the spec in the first place. Maybe they should have worked on the specification first before proceeding.

    Sounds like a communication issue to me.

  • shermantanktop 10 days ago
    The way invariants are used can be different than “this will always be true.”

    It’s a problem solving tool. I sometimes use the term to mean “I will assume this to be true in order to solve an intractable problem. If that succeeds, I need to find a way to guarantee this is actually invariant.”

  • cjfd 10 days ago
    When I read this story, it leaves me wondering whether these people had any automated testing. It may or may not have helped, but it is not even mentioned, and this does sound like a problem that an integration test might have caught.
    • fl0ki 10 days ago
      If every test assumes the same invariant, then no test ever covers cases where the invariant doesn't hold.

      I've seen mature teams get this wrong even for basic things. For example, they had a factor option, but all of their tests only used a factor of 1.0 -- applying the factor 0 or 2+ times didn't fail any tests, the factor was effectively completely untested despite "coverage".

      My tests caught it because I was a dependent of their code and, frankly, trusted them much less than they trusted themselves. Rightly so in this case, as it turned out, and in many others.

      My philosophy was to test with exactly the configuration to be used in production and as much as possible [representative subsets of] the data to be used in production. Their philosophy was only to test contrived isolated cases which proved grossly inadequate with respect to production.
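
      (A distilled version of that failure mode, with a hypothetical function, just to show why testing only with a factor of 1.0 proves nothing:)

          // A "factor" that is accidentally never applied.
          fn scale(value: f64, _factor: f64) -> f64 {
              value // bug: should be `value * _factor`
          }

          #[cfg(test)]
          mod tests {
              use super::*;

              #[test]
              fn factor_of_one_hides_the_bug() {
                  // Passes whether the factor is applied 0, 1, or 2 times.
                  assert_eq!(scale(10.0, 1.0), 10.0);
              }

              #[test]
              fn any_other_factor_exposes_it() {
                  // Fails against the buggy version above.
                  assert_eq!(scale(10.0, 2.0), 20.0);
              }
          }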

    • dexwiz 10 days ago
      Yeah this is the kind of thing testing is useful for. Writing tests against invariants is a great idea. That way if the invariant changes, the test starts to fail.
  • tossandthrow 10 days ago
    > That assumption became obsolete the moment that Matt implemented task packing, but we didn’t notice. This code, which was still simple and easy to read, was now also wrong.

    It seems like Matt did not fully understand the platform he was developing for and introduced a bug as his code did not satisfy the invariants.

    It is really nice when invariants are checked either in types or in tests so Matt would have been alerted when he introduced the bug.

    I don't like the discourse the article introduces. It must always be the one who writes the newest code who has the responsibility to adjust previous code. That also includes changing obsolete modules (where the invariants no longer suffice).

    • xmcqdpt2 10 days ago
      I work on a library used by millions of LOC in a monorepo. For some changes, it is literally impossible to know all the places that could break, so we rely heavily on types and testing to know if a change is safe.

      I think it's a shared responsibility, the person who wrote the old code needs to have written it in a way that makes it hard to inadvertently break, through testing or otherwise. The person who modifies the code needs to understand what to do if a test fails (modify the test? modify the code?) If we had to guarantee that our changes are never going to break any code downstream, no matter how poorly tested or engineered, we couldn't make changes at all.

  • fedeb95 10 days ago
    Implicit or explicit assumptions, or invariants that can actually vary, should be minimized. Otherwise, try to make them explicit with guards and exceptions. "This should never happen, so I'll do that" - when you think this, stop and put an "if(this) then Error" in your code. Breaking is better than not breaking, lurking, and costing money later. My opinion.
  • sixthDot 10 days ago
    Sure sometimes there are deprecations and API changes. Assumptions must be revised accordingly.
  • stevep98 9 days ago
    Osborn’s Law: Variables won’t; constants aren’t.