The semver trick (2019)

(github.com)

85 points | by aazaa 1 day ago

8 comments

  • kazinator 1 day ago

    > In Rust (as in C, for that matter), two structs are not interchangeable just because they look the same.

    In C, two structs with different tags in the same translation unit are indeed incompatible. (That includes two structs with no tag that look the same, because, effectively, each has some internal, machine-generated tag.)

    Within the same translation unit, looking the same is not a consideration at all; it's all based on the tag symbol.

    Between translation units, two complete struct types that are exactly the same (same members, of the same type, in the same order, with the same names, and other details like bitfield configuration) are compatible if they both have the same tag.

    But they are also compatible if neither has a tag: and in that case, they would be incompatible if they were in the same translation unit.

    Basically, two machine-generated tags for anonymous structs are considered equivalent if they are in different translation units, in which case compatibility is purely down to structural equivalence.

    In practice, C compilers do not police struct tags at all between translation units; you can get away with it if the same object is known as "struct point { double x, y; }" in one translation unit and "struct pointe { double x, y; }" in another. You can even change the member names, for that matter.

    You will run aground with a C++ compiler, though, due to type safe linkage which pulls in class names.

    FFI implementations obviously don't care about any of this. If I'm declaring an FFI type that is to be compatible with the C "struct stat" in a Lisp, so I can call the stat function, these concepts have gone out the window. The memory layout compatibility is all that matters: correct sizes, alignments, conversions.
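
    A rough Rust sketch of that layout-only view. The type names `Point` and `Pointe` echo the C example above, and the helper `reinterpret` is hypothetical. The two `#[repr(C)]` structs are distinct types to the compiler, but they share size, alignment, and field order, so an FFI-style reinterpretation between them works:

```rust
use std::mem::{align_of, size_of};

// Two nominally different types with the same C-compatible layout.
#[repr(C)]
#[derive(Debug, Clone, Copy, PartialEq)]
struct Point {
    x: f64,
    y: f64,
}

#[repr(C)]
#[derive(Debug, Clone, Copy, PartialEq)]
struct Pointe {
    a: f64,
    b: f64,
}

// Hypothetical helper: reinterpret the bytes of a Point as a Pointe.
fn reinterpret(p: Point) -> Pointe {
    // SAFETY: both structs are #[repr(C)] with two f64 fields in the same
    // order, so they have identical size, alignment, and field offsets.
    unsafe { std::mem::transmute::<Point, Pointe>(p) }
}

fn main() {
    // Only the layout matters here, not the names.
    assert_eq!(size_of::<Point>(), size_of::<Pointe>());
    assert_eq!(align_of::<Point>(), align_of::<Pointe>());

    let q = reinterpret(Point { x: 1.0, y: 2.0 });
    println!("{:?}", q);

    // Passing a Point where a Pointe is expected without the explicit
    // reinterpretation would be rejected as a type mismatch.
}
```

    Without `#[repr(C)]`, Rust makes no promise about field order, so the reinterpretation would not be justified.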

    • josephcsible 1 day ago

      Doesn't the "common initial sequence" rule imply that structs with the same members must be the same, even within a translation unit? If they didn't, wouldn't putting them both in a union and accessing through the other one not work, even though the standard requires it to work?

      • What is a tag in C?

    • andybak 1 day ago

      Not a Rust user but this all sounds remarkably painful. Is it common? The only other compiled, type-safe language I use regularly is C# (via Unity) and I don't recall this level of upheaval.

      • mbrubeck 1 day ago

        This is uncommon, because it's not really necessary except for “foundational” libraries that are used “publicly” (i.e., not just internally) by hundreds of other libraries.

        If only three or four of my dependencies use the Foo crate, then I can just wait for all of them to upgrade to Foo 2.0, or I can do it myself and submit pull requests. In this case, I don't really care whether Foo's maintainer uses the semver trick.

        However, if I'm developing a really big project like Servo or rustc, and Foo is a foundational crate that is used by dozens of my (transitive) dependencies, then waiting for all of them to upgrade (or doing the work to upgrade all of them myself) starts to become prohibitive. Now a bit of work by Foo's maintainer can save a lot of time for large downstream consumers.

        An example of a “foundational” crate that has used the semver trick is `num-traits`, which has over a thousand reverse dependencies: https://crates.io/crates/num-traits/0.1.43

        • 1wd 1 day ago

          C# has assembly binding redirection.

          https://stackoverflow.com/a/43366172/3679043

          • emn13 1 day ago

            Yeah, and they're a huge pain if you indeed actually need to use them, and often result in a non-working mess. This wasn't all that uncommon in the early days of .net core, which was particularly bad at this IIRC largely because many foundational libraries were split into packages that essentially could only ever be upgraded in concert. There are a few technical nuances that mean I'm sure this isn't quite the same as the rust case, but it's pretty bad nonetheless. Even well thought-out transitions like .net standard weren't free from gotchas, particularly when mixed with multitargeting and deep transitive dependency graphs (which was pretty easy to get in the early .net core days).

            The whole thing is clearly not great (not just in C#). In particular - many of these problems are entirely artificial. If the type system were sufficiently dynamic, or magically psychic - the problems often go away. E.g. in the rust example - c_void didn't actually change, it was just incompatible because rustc isn't psychic. Even seemingly serious problems like an interface (~ in rust a trait) gaining a new method aren't necessarily fundamentally breaking - as long as you're not calling it. Even truly breaking changes like "best-solution-finder returns a different result and has an incompatible API" might not be breaking in the case of a diamond dependency pattern as long as the method's side effects allow running both versions - one for each transitive dependency requesting that specific version. Even in cases where the break is "real" like a rename, it'd be trivial to consider a shim that allows old consumers to use the new api.

            In the vast, vast majority of cases in my experience this kind of breakage is a problem due to technicalities. That doesn't mean I have a clue how to solve that, but it does beg the question: isn't it possible for a dependency resolution system to do fundamentally better, here?

            • qes 1 day ago

              > This wasn't all that uncommon in the early days of .net core

              Good heavens was that a mess. It had been a nice 15+ years of easy-peasy with dependency management prior to that in MS-land, though.

              It seemed that a fair amount of the problems were MS learning the pitfalls of how they packaged System assemblies for NuGet. Then the compatibility shim mostly brought things back to what we had been used to in .Net (works) - even easier, in fact, now that automatic assembly binding in builds isn't a minefield.

              • In C#, default implementations of new methods on interfaces should at least reduce the pain of additive changes.
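
                The Rust counterpart is a provided (default) trait method. A minimal sketch, using a made-up `Shape` trait and `Square` type: the `describe` method stands for one added in a later release, and because it has a default body, an implementor written against the older trait still compiles unchanged.

```rust
trait Shape {
    // Required method: part of the trait since its first release.
    fn area(&self) -> f64;

    // Imagined later addition. The default body means existing
    // implementors do not have to change.
    fn describe(&self) -> String {
        format!("shape with area {:.1}", self.area())
    }
}

// An implementor written before `describe` existed.
struct Square(f64);

impl Shape for Square {
    fn area(&self) -> f64 {
        self.0 * self.0
    }
}

fn main() {
    let s = Square(2.0);
    println!("{}", s.describe());
}
```

                Adding a method without a default body would still break every existing implementor.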

            • ViViDboarder 1 day ago

              I’m new to Rust, but it doesn’t appear to be.

              If I’m following correctly, it would only occur if a library is using a struct defined in a dependency in a public API you are using and you also use the same dependency elsewhere in your application. Upgrading the version elsewhere in your application could break passing that struct.

              Most of the time I see public APIs which use either base types or structs defined by the library itself.

              • geofft 1 day ago

                It's only really relevant when a) you release semver-incompatible upgrades (e.g., 1.x to 2.x), b) people use types from your library in their APIs, i.e., their types depend on yours, and c) your library is sufficiently widely used that people are likely to have transitive dependencies on it by multiple routes, such that a couple of their direct dependencies will have upgraded to your new version and a couple wouldn't have gotten around to it.

                The 'libc' crate was a perfect storm of all three. It was still at 0.x, and the whole point of the library is to define types/bindings for others to use.

                It also requires the FOSS model of independent owners, I guess, since that's why some consumers haven't upgraded. Inside a company and especially in a monorepo, someone - probably the person making the breaking change - would have just upgraded everyone at once. And a redistributor of FOSS like a Linux distro would have probably just patched everyone at once or held back upgrades until they could - in fact, this is the same shape of problem as upgrading from OpenSSL 1.0 to 1.1, or GTK 2 to 3, or libstdc++'s new ABI, or whatever.
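
                A single-file sketch of the shape of the problem and of the trick. Modules stand in for separately published crate versions, and every name here (`foo_v1_0`, `foo_v1_1`, `foo_v2`, `Handle`, `old_lib`, `new_lib`) is hypothetical. In a real setup the re-export would come from Foo 1.0.1 declaring a Cargo dependency on Foo 2.0.

```rust
#![allow(dead_code)]

// Stand-in for Foo 2.0: the new home of the type.
mod foo_v2 {
    #[derive(Debug, PartialEq)]
    pub struct Handle(pub u32);
}

// Stand-in for Foo 1.0.0 as first published: its own, separate definition.
mod foo_v1_0 {
    #[derive(Debug, PartialEq)]
    pub struct Handle(pub u32);
}

// Stand-in for Foo 1.0.1 using the trick: no definition of its own,
// just a re-export of the 2.0 type.
mod foo_v1_1 {
    pub use crate::foo_v2::Handle;
}

// A dependency that has already upgraded to Foo 2.0.
mod new_lib {
    use crate::foo_v2::Handle;
    pub fn consume(h: Handle) -> u32 {
        h.0
    }
}

// A dependency still on Foo 1.x, which resolves to 1.0.1.
mod old_lib {
    use crate::foo_v1_1::Handle;
    pub fn produce() -> Handle {
        Handle(7)
    }
}

fn main() {
    // Works: old_lib's Handle is the very same type as new_lib's.
    let n = new_lib::consume(old_lib::produce());
    println!("{n}");

    // Would not compile: foo_v1_0::Handle is a different type,
    // even though it looks identical.
    // new_lib::consume(foo_v1_0::Handle(7));
}
```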

                • yjftsjthsd-h 1 day ago

                  > Inside a company and especially in a monorepo, someone - probably the person making the breaking change - would have just upgraded everyone at once.

                  Ideally yes, in practice not always. The easy way to break it is of course political problems, but it's perfectly possible to be stuck in a situation where unfortunate design decisions early on leave you stuck unable to make breaking changes even if everyone is on the same side (at any rate, not without taking a massive service disruption that would violate our contracts with customers).

              • gazarullz 1 day ago

                would have been nice to tag the post as being about Rust + semver, instead of only semver as the title implies

                • mjw1007 1 day ago

                  You could use the same trick for other languages and packaging systems, as long as they support linking multiple major versions of the same library into one program.

                  • hombre_fatal 1 day ago

                    I guess we'll have to (ugh) click the title and spend (shudder) 10 seconds reading to get further context.

                  • Wouldn’t this still break a consumer who doesn’t realize the trick is being employed? Doesn’t this assume the consumer is making the requisite changes in their library as part of upgrading to the “nonbreaking” upgrade that slips the new type definition in via its dependency on its own breaking upgrade?

                    • Tuna-Fish 1 day ago

                      No. Someone using the old version can continue using it, completely unaware of the fact that anything changed.

                    • diegocg 1 day ago

                      Wouldn't this be solved easily with symbol versioning?

                      • kibwen 1 day ago

                        Possibly (I've never seen a concrete proposal of per-symbol versioning, so I can't say for certain), but in that case all downstream consumers of your library would have to pre-emptively declare the version for each and every symbol they use.

                        Now obviously not all users will be using every symbol from every library that they depend on, but, to use the OP's example of libc, which contains over 4,500 symbols... that starts to look unwieldy.

                        Of course, you could technically do this today by just having every symbol in its own crate. And while that seems like quite a stretch, I think it is the consensus that the libc crate in particular is too big, and should have been split out into multiple crates in order to better facilitate these sorts of upgrades. So there might be a practical middle ground by having one crate for "crucial, fundamental symbols" and a separate crate for "ancillary symbols", where each could be versioned separately. That might get close enough to the precision of per-symbol versioning without getting unwieldy.

                      • nixpulvis 1 day ago

                        I would be very interested in how this could be created automatically by `cargo` itself. A cargo semver tool which can bump versions, and create two versions on major/minor breaking changes like this post recommends would be really cool to see.

                        • dmitriid 1 day ago

                          > Servo found themselves coordinating an upgrade of 52 libraries over a period of three months

                          And yet... people only complain about npm having lots of dependencies ;)

                          • PhineasRex 1 day ago

                            Projects of similar size in JS have hundreds of dependencies, so 52 really isn't a lot

                            • dmitriid 1 day ago

                              That's mostly because JS doesn't have a suitable standard library.

                              • saagarjha 1 day ago

                                Rust has this same problem. (Although you could make the argument that this specific state of things was explicitly chosen in Rust's case.)

                                • heavenlyblue 22 hours ago

                                  Or because JS is and was a hype and is full of people marketing their names through many easy, small packages.

                              • ilammy 1 day ago

                                Well, looking at various crates pulling in lots (dozens) of dependencies, I'd say crates.io is clearly moving somewhere into that direction of microlibraries with extensive code reuse.

                                • kibwen 1 day ago

                                  I only encounter this in Rust when doing anything web-related. There's quite a lot of prominent authors who go out of their way to reduce their dependencies by any means necessary (e.g. https://github.com/tokio-rs/tokio/pull/1324 , which is a notably extreme case).

                                  • dmitriid 1 day ago

                                    > I only encounter this in Rust when doing anything web-related.

                                    It's probably because anything web-related requires so many things not readily available in the language or in the standard library: anything from databases (sometimes different breeds of databases) to templating to serialisation (possibly multiple types of serialisation) to rest or graphql to...

                                    Just serialisation (which is almost invariably Serde) will pull in at least 54 dependencies (if you only use serde and serde_json). A framework such as rocket which provides all that, and more, will pull in ... 332 dependencies :)

                                    Edit: Calculation is invalid, see https://news.ycombinator.com/item?id=24024671

                                    • dtolnay 1 day ago

                                      > Just serialisation (which is almost invariably Serde) will pull in at least 54 dependencies (if you only use serde and serde_json).

                                      What is this number referring to? Serde + serde_json with all their transitive dependencies is maximum 12 crates (for someone who has enabled all optional features), though in typical usage it's 9 crates: serde, serde_derive, serde_json, syn, quote, proc-macro2, unicode-xid, itoa, ryu. I haven't figured out how you got to 54.

                                      • dmitriid 1 day ago

                                        I ran `cargo tree -e all | wc -l`. Now I realise that's not a proper way to do it :-/

                                        Looks like `cargo metadata --format-version=1 | jq -r ".packages|map(.name)|.[]"` is the way to do it.

                                        - 10 for serde + serde_json

                                        - 88 for rocket only

                                  • ChrisSD 1 day ago

                                    The trouble with monolithic crates is that they're monolithic. People complain about the massive amount of code they have to pull in just for a few functions, and the effect this can have on compile times.

                                    The trouble with small crates is they're small. People complain when the number of dependencies grows larger and mockingly reference "leftpad".

                                    • josephcsible 1 day ago

                                      The issue with leftpad wasn't that it was small. The issue was that so much production code relied on it not going away, despite npm letting the author make it go away.

                                      • tsimionescu 1 day ago

                                        Yes, the problem that broke everything was what you mentioned. But what struck everyone more was that this all happened because of a few-line function that obviously should never have been a dependency in any sane project.

                                        • nimih 1 day ago

                                          The security/operational issue may have been that the author was able to break production code, but the simplicity of the function is what made it extremely funny.

                                        • linkdd 1 day ago

                                          If people pull a massive amount of code to use a few functions, the problem is not the size of the code, the problem is the laziness of the developer who doesn't want to write 3 functions.

                                          Same for leftpad-like packages, I don't need a dependency to a single function that I can rewrite myself. Especially if I'm not writing open-source software and I have to audit the licenses of my dependencies.

                                          Software development is a matter of trade-offs, you get what you choose, people should not be complaining about that.

                                          • viraptor 1 day ago

                                            If you're adding encryption support, you're likely adding openssl which brings hundreds of cryptographic primitives you're not going to use. Are you lazy because you didn't write your own RSA? (you can do it in ~3 functions)

                                            That generalisation didn't work well, because as you say - it's about tradeoffs.
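
                                            For scale, here is roughly what "~3 functions" looks like as textbook RSA in Rust. This is a toy under loud assumptions: the key (n = 61 × 53 = 3233, e = 17, d = 2753) is a tiny classroom example, there is no padding, no key generation, and nothing is constant-time, so it is nowhere near a usable implementation.

```rust
/// Modular exponentiation by repeated squaring: base^exp mod modulus.
/// Assumes modulus > 0. Uses u128 internally so products cannot overflow.
fn mod_pow(base: u64, mut exp: u64, modulus: u64) -> u64 {
    let m = modulus as u128;
    let mut result: u128 = 1 % m;
    let mut b = base as u128 % m;
    while exp > 0 {
        if exp & 1 == 1 {
            result = result * b % m;
        }
        b = b * b % m;
        exp >>= 1;
    }
    result as u64
}

// Toy key, for illustration only: n = 61 * 53, and e * d = 1 mod 3120.
const N: u64 = 3233;
const E: u64 = 17;
const D: u64 = 2753;

/// Textbook encryption: c = m^e mod n (message must be smaller than n).
fn encrypt(m: u64) -> u64 {
    mod_pow(m, E, N)
}

/// Textbook decryption: m = c^d mod n.
fn decrypt(c: u64) -> u64 {
    mod_pow(c, D, N)
}

fn main() {
    let c = encrypt(65);
    println!("ciphertext = {c}, decrypted = {}", decrypt(c));
}
```

                                            Everything that separates this from a real library (big integers, padding, side-channel resistance, key generation) is the part that takes the work.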

                                            • heavenlyblue 22 hours ago

                                              I can’t implement RSA in the amount of time it takes me to Google the name of an RSA library.

                                              I can implement left pad faster than it would take me to Google the name of the library, check whether it actually does what I want and then add it to my package manifest as a dependency.
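
                                              For reference, a left pad really is only a few lines. A minimal Rust sketch (the name `left_pad` and its signature are just one possible choice, and it counts `char`s rather than grapheme clusters):

```rust
/// Pad `s` on the left with `fill` until it is at least `width` chars long.
/// Strings that are already long enough are returned unchanged.
fn left_pad(s: &str, width: usize, fill: char) -> String {
    let len = s.chars().count();
    if len >= width {
        return s.to_string();
    }
    let missing = width - len;
    let mut out = String::with_capacity(s.len() + missing * fill.len_utf8());
    out.extend(std::iter::repeat(fill).take(missing));
    out.push_str(s);
    out
}

fn main() {
    println!("{}", left_pad("42", 5, '0'));
}
```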

                                              • linkdd 22 hours ago

                                                The amount of work needed to write a secure implementation of RSA (you would surely need a library like gmp to handle big numbers) is not worth my time, I would gladly trade that against some download/compilation time.

                                                The same goes for an efficient map/reduce framework or even a key/value database (what's wrong with using Berkeley DB/SQLite instead of writing your own storage format?).

                                                I need to serialize user-supplied data? I'm gonna use a library that will handle edge cases I can't think about.

                                                But I'm not gonna use SpringBoot only for its logging facility, or Django only for its ORM, or a BNF parser generator to parse "hello world".

                                      • Lammy 1 day ago

                                        > The Rust library ecosystem has a history of traumatic library upgrades. The upgrade of libc from 0.1 to 0.2 is known as the "libcpocalypse".

                                        Somebody should come up with a commonly-agreed-upon versioning scheme where we can indicate that breaking changes should be expected so people could avoid putting themselves in situations they regret.

                                        • steveklabnik 1 day ago

                                          The versioning scheme is not the issue here.

                                          The issue is that there can only be one copy of certain kinds of libraries, and when there's a major version bump, it creates a fork in the ecosystem.

                                          • dtolnay 1 day ago

                                            The article isn't supposed to imply that any regret was involved.

                                            People used early libc and early serde to get a massive benefit (respectively: talk to C code, and process JSON). Independent of version numbers those are things people want to do in Rust.