Must, Should, Don't Care: TCP Conformance in the Wild

(arxiv.org)

155 points | by gbrown_ 1475 days ago

12 comments

  • Animats 1475 days ago
    This takes me back to the early days of TCP, when I used to do this. I had a TCP implementation with a "bit bucket"; every packet that was either rejected or didn't advance the connection (such as a duplicate) was logged. Then I'd send out emails to other developers: "Your packet sent at T did not conform to spec, per para..." Gradually, things got better.

    The "urgent" option could be deprecated. The original purpose was for when a TELNET connection was sending data to a slow printing terminal such as a Teletype, and you wanted to cancel the output. The TCP connection was being held back by flow control waiting for the printer on output, and might be held back on input if you'd typed ahead but the server wouldn't take another line until the output was done. The user would push BREAK, the TCP connection would send an urgent message, bypassing any queued data, and the server would get this, stop sending, and clear its output queue.

    Almost nobody has needed that feature in this century. But there's probably someone, somewhere, with some ancient embedded device like a cash register printer, using it.

    • lkrubner 1474 days ago
      Mark Pilgrim once had a hilarious blog post in which he argued that all software developers were either assholes or psychopaths. He also argued that some people believed in angels, who followed the specs for the best of reasons, but who were actually mythical. Your "Your packet sent at T did not conform to spec, per para..." followed by "Gradually, things got better" would make you an angel, as you did it for the right reasons, though I imagine some of the people you wrote to gave you a different classification.

      Sadly, Mark Pilgrim committed info-suicide and erased everything he'd written from the Web. (Off topic: it is sad how much disappears from the Web, and how many weblogs shut down. Some of the best essays I've ever read were on weblogs, now gone. I just revived a weblog I had run in 2005, and checking the links I found that the linkrot was running in the area of 50% to 60%.)

      • Animats 1474 days ago
        I was at an aerospace company. In aerospace, specs matter, because part A has to plug into part B. You can remove the Pratt and Whitney engines from an airliner and substitute Rolls-Royce engines. One is not emulating the other; they both meet the same spec. DoD used to be big on having multiple sources, all making interchangeable units to the same spec.

        We used to say, "If A won't work with B, check the spec. If A doesn't match the spec, A is broken. If B doesn't match the spec, B is broken. If you can't tell, the spec is broken."

        Much of this worked in the early TCP/IP days because DoD was funding most of the players, both industrial and academic. They wanted interoperability.

        There's much less of that today. Compatibility tends to involve reverse engineering the dominant player's product.

      • petercooper 1473 days ago
        On the latter topic, one of my favorite ever weblogs was that of Nat Friedman, now CEO of GitHub. Excluded from the Wayback Machine and gone forever (unless he kept a personal archive, I guess), but a lot of the stuff in the 2000-2010 era covered some very interesting technologies he was involved with building.
    • tptacek 1474 days ago
      FTP used URG too, but then, FTP is itself a deprecated monstrosity.
  • Eikon 1475 days ago
    "Should" and "May" are such horrible words to encounter when implementing an RFC.

    Sometimes, implementing workarounds for ends that only implement "Must" is harder than just implementing the RFC as if everything were mandatory.

    In my opinion, RFCs should strive to keep the optional parts of a specification to a minimum and, maybe, put the rest in extensions.

    • linkdd 1475 days ago
      Funny how your comment also uses "should" and "may" :)

      Anyway, those key words are used to allow flexibility in the implementation. Remember that an RFC is specific to a single version of a single protocol.

      Thus splitting everything into multiple RFCs still implies complexity in the implementation, just not the same kind of complexity (which versions to use, and when?).

      > Imperatives of the type defined in this memo must be used with care and sparingly. In particular, they MUST only be used where it is actually required for interoperation or to limit behavior which has potential for causing harm (e.g., limiting retransmisssions)

      > For example, they must not be used to try to impose a particular method on implementors where the method is not required for interoperability.

      Source: https://www.ietf.org/rfc/rfc2119.txt

    • Disposition 1475 days ago
      That wouldn't work too well with complex RFCs like X.509, where "should" is used for backwards compatibility with earlier RFC revisions and for the many optional elements that would otherwise fragment the standard into dozens of extensions.

      IMHO implementors must implement a "should", not in the sense of the binding requirement that "must" carries, but as a boolean possibility: the element may not be present, but it should never be ignored in a way that would break implementations making use of it.

    • brudgers 1475 days ago
      RFCs are consensus documents. The optional parts address the concerns of stakeholders with existing products. This is an alternative to their veto or non-participation in the standards development process, or to the development of competing standards for business survival.
    • dodobirdlord 1474 days ago
      Expressed a different way, my take is that non-"must" clauses of an RFC should be cleanly severable, i.e. you could literally cut them out of the document and everything else would still make sense. How a full implementation handles interoperating with an implementation that declines an optional feature should be defined in the same place as the optional feature. Further, being able to interoperate with an implementation that declines an optional feature should be required at the same optionality level as the optional feature. It shouldn't be a "must" to support interoperability with an implementation that has declined to implement a "should". This has basically the same effect as pushing optional parts into extensions. There's the core RFC that defines the spec, and a bunch of optional features that two implementations can use by mutual agreement.

      To address some common concerns about the effect this would have on general interoperability, keep in mind that a clause that reads "In scenario Foo, implementations should do X and should not do Y, but may do Z" can be much more cleanly expressed as "In scenario Foo, implementations must do X or do Y or do Z".

    • kstenerud 1475 days ago
      It's remarkably difficult to avoid "should" and "may" in any decently sized spec. I count 11 "shoulds" in [1] and 7 in [2], and I tried REALLY hard to keep them to a minimum :/

      [1] https://github.com/kstenerud/concise-encoding/blob/master/ct...

      [2] https://github.com/kstenerud/concise-encoding/blob/master/cb...

    • battery_cowboy 1474 days ago
      Every international spec I've read contains should and may, but also includes must here and there.
    • beojan 1475 days ago
      > In my opinion, RFCs should strive to limit the optional parts of a specification at a minimum

      If only USBIF had done that

  • iwalton3 1475 days ago
    This reminds me of "Hyrum's Law":

    "With a sufficient number of users of an API, it does not matter what you promise in the contract: all observable behaviors of your system will be depended on by somebody."

    People might implement work-arounds for bugs in an API that could break when the underlying bug is fixed. Or software might implement the absolute bare minimum for it to "work" with some specific implementation.

    • asdfasgasdgasdg 1475 days ago
      This is kind of the opposite of Hyrum's law, actually, in that the protocol promises one behavior but what's actually universally supported in the wild is only a subset of that promise.
      • IceDane 1475 days ago
        I think you're both right.

        He's still right because my experience suggests that it is 100% likely that somewhere, some system relies on a bug in the TCP protocol stack of another system causing a segfault to shut down something critical.

        • ninju 1475 days ago
          https://xkcd.com/1172/

          (leave your mouse over the image and read the hover text :-))

          • codetrotter 1475 days ago
            I’m on mobile, what does the alt text say again?
            • detaro 1475 days ago
              https://m.xkcd.com/1172/ <- mobile version has a button for alt text

              > There are probably children out there holding down spacebar to stay warm in the winter! YOUR UPDATE MURDERS CHILDREN.

  • oconnor663 1475 days ago
    Is there a standardized set of TCP "test vectors" anywhere? Even an informal de facto standard test like what SSLLabs is for TLS?
  • rwmj 1475 days ago
    I can understand why someone implementing - say - a custom web server stack might never have encountered a TCP packet with the URGent bit set. As I understand it, it's only really used by telnet (does ssh use it?)
  • peterwwillis 1474 days ago
    Name a standard with 3 or more implementations and at least one of them will have violated the standard. Sometimes it's all three. Sometimes it's the implementation of the people who wrote the standard.

    This is inherent to all standards implemented by different products. There's no implementation police going around checking on and forcing people to implement standards properly. And in the course of just regular product development, an implementation often starts to slowly drift away from the standard without anyone noticing.

    It sure does help when there's a test suite that you can validate your implementation against, but I find them rare.

  • asdfasgasdgasdg 1475 days ago
    To me, this says a lot about expectations of conformance when many different entities need to implement the same spec. As the size of the spec or the number of entities grow, the probability of nonconformance approaches one. Then you end up with two specs: the ivory tower one, and the usable, actually implemented, lowest common denominator one.

    This problem seems especially acute when there are multiple hops on the path that are able to interpret (and fuck up) the data flowing through them. Especially when the owners of those hops don't have a direct economic connection to the entities dealing with their failures.

    This seems to argue in favor of something like QUIC. You use an extremely simple protocol for the transport (basically, "try to send this data to that address"). You hide the complex parts of the protocol in an encrypted channel so that only the economically connected stakeholders have to conform. This aligns incentives better than in the case of TCP and probably gives you better outcomes in the long run.

    • kevingadd 1475 days ago
      The downside to QUIC is that all the other nodes in the chain lose the ability to do useful things. Of course in the long run it's turned out that Google et al do not want anyone else doing useful stuff, but for quite a while it was very useful to be able to have stuff in the middle like a caching proxy. Alas, that era is over.

      As an admin it's appealing to be able to have stuff like deep packet inspection to give me info on network traffic, but the price we pay collectively for that being possible is way too high so it was inevitable we'd lose it.

      • kevin_thibedeau 1475 days ago
        The upside to QUIC is that all the other nodes in the chain lose the ability to do useful things.

        This behavior is why we can't use SCTP everywhere and can't deploy new protocols on top of IP.

        • touisteur 1475 days ago
          But I thought you could use SCTP over UDP (works, I tried). If QUIC is another layer above SCTP it feels like wasted effort. SCTP is really interesting and featureful. Multi-homing, parallel streams, datagram oriented...
          • pas 1475 days ago
            QUIC is basically SCTP over UDP. It has substreams (like HTTP/2, but it eliminates the TCP level head-of-line blocking), there's a multipath extension (proposal from 2018) - but maybe MP-TCP will land first and then who knows.
            • toast0 1474 days ago
              > maybe MP-TCP will land first and then who knows.

              I'm not a fan of Apple, but they've deployed MP-TCP in iOS. I'm not an iOS developer, but it looks relatively simple to enable if you're already using NSURLSession [1]. Having a server to talk to is another thing, of course. :)

              [1] https://developer.apple.com/documentation/foundation/nsurlse...
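
              (Aside: on the Linux side, rather than the NSURLSession API above, kernels from 5.6 on expose MPTCP as an extra protocol number at socket creation. A minimal sketch, assuming kernel MPTCP support and a hypothetical server:)

                  import socket

                  # IPPROTO_MPTCP is 262 on Linux; older Pythons may not expose the constant.
                  IPPROTO_MPTCP = getattr(socket, "IPPROTO_MPTCP", 262)

                  # Raises OSError (EPROTONOSUPPORT) if the kernel lacks MPTCP support.
                  sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM, IPPROTO_MPTCP)
                  sock.connect(("198.51.100.7", 443))  # hypothetical MPTCP-capable server
                  sock.sendall(b"looks like an ordinary TCP byte stream from here on\n")
                  sock.close()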

            • touisteur 1474 days ago
              Thanks for this answer but I still don't understand, sorry.

              What's missing in SCTP (used everywhere in telephony/3G/4G/5G, right?) that we had to reinvent another complete transport+session layer? It's already in the kernel, and it has had its own share of CVEs, so it should be relatively trustworthy by now. It's also available in userland, especially over UDP.

              Performance is fine, from my benchmarks. Ease of use is (to me) far better than TCP and all the socket hand-holding you need to do (unless you use zmq because you're tired of writing the same thing every time). Flexibility of substreams is amazing. Multi-homing is great, since you can bond links at the application level (higher decoupling than IP-level bonding).

              I'm genuinely curious as to why they haven't just taken SCTP as is, and added the extensions (?) they need.
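
              (The "already in the kernel" part is easy to poke at from userland on Linux, for what it's worth. A minimal one-to-one sketch with a hypothetical peer; stream and multi-homing features need SCTP-specific sockopts or a binding like pysctp, which the stdlib doesn't wrap:)

                  import socket

                  # One-to-one SCTP association; raises OSError if the sctp kernel
                  # module isn't available.
                  sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM, socket.IPPROTO_SCTP)
                  sock.connect(("203.0.113.5", 5000))  # hypothetical SCTP peer
                  sock.sendall(b"one message over the default stream\n")
                  sock.close()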

              • tialaramex 1474 days ago
                > What's missing in SCTP (used everywhere in telephony/3-4-5g, right?) that we had to reinvent another complete transport+session layer ?

                SCTP isn't encrypted. Because "Pervasive Monitoring is an Attack", new IETF protocols should be encrypted or explain why they can't be. HTTP/2, for example, is in effect always encrypted (the document explains how one could in theory build an unencrypted variant, but nobody implements that).

                > I'm genuinely curious as to why they haven't just taken SCTP as is, and added the extensions (?) they need.

                If you "just" drop the encrypted transport on top you either have to do all the work to deliver features like substreams yet again, or else all the metadata in the layer that's not encrypted is left unprotected and you'll regret that.

                • touisteur 1473 days ago
                  Ah, encryption, thanks.

                  I thought there was an SCTP+TLS RFC: https://tools.ietf.org/html/rfc3436

                  I don't know whether any userland lib supports this with SCTP-over-UDP.

                  • tialaramex 1473 days ago
                    I explained the negative consequences of just layering TLS on top in my comment already.

                    RFC3436 just makes pairs of SCTP streams (one in each direction) into a transport like TCP that TLS will run on top of. Each such stream-pair then does a TLS handshake.

                    That's not what QUIC does. The entire QUIC connection, however many streams and in whichever direction, is encrypted.

                    As a very simple example - suppose you spin up three parallel stream pairs to fetch three separate documents over a hypothetical HTTP-over-TLS-over-SCTP. With RFC 3436 you reveal to an on-path adversary that there are three streams, and they get to see how much data travelled over each stream. But with QUIC it's just an opaque pipe, the on-path adversary can see how much data was transmitted each way but can't determine whether that's one document in one stream, or ten documents in two streams or anything else.

                    • touisteur 1473 days ago
                      Sorry I did not mean to say you were wrong, I just meant to say 'it exists', for reference. I understood your first explanation, didn't know whether this RFC encrypted substream by substream or the whole session (I should have read more before posting). Thanks for the clarification. And your patience.
                • nitrogen 1474 days ago
                  In addition to that I believe I've read that SCTP can't traverse all of the networks that TCP and UDP can, so any universal datagram protocol will have to build on UDP.
      • asdfasgasdgasdg 1475 days ago
        Yeah, what's useful today will be ossified in ten or twenty years. That's kind of the point I was making (which you note in your second paragraph). If you want a healthy ecosystem you have to remove the ability of economically-disconnected middlemen to screw with your protocol. And the best way to do that is to make it as dead simple as possible, so there is very little leeway in implementation.
  • dirtydroog 1475 days ago
    HTTP is in a similar sorry state. For example, it offers pipelining support, but nobody can reliably use it because it probably won't work on some forgotten-about machine somewhere on the path.
    • yencabulator 1462 days ago
      Another example: Setting Allow header properly in a 405 response is a MUST. I am not aware of anything that breaks due to missing that, and I am aware of many implementations that do not set it.
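
      (A conforming handler is small enough to sketch with the stdlib server; the handler and port here are made up:)

          from http.server import BaseHTTPRequestHandler, HTTPServer

          class GetOnlyHandler(BaseHTTPRequestHandler):
              def do_GET(self):
                  self.send_response(200)
                  self.send_header("Content-Type", "text/plain")
                  self.end_headers()
                  self.wfile.write(b"ok\n")

              def do_POST(self):
                  # RFC 7231: a 405 response MUST carry an Allow header listing
                  # the methods the resource does support.
                  self.send_response(405)
                  self.send_header("Allow", "GET, HEAD")
                  self.end_headers()

          if __name__ == "__main__":
              HTTPServer(("127.0.0.1", 8080), GetOnlyHandler).serve_forever()
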
    • dagenix 1475 days ago
      Pipelining support in http 1.1 is basically useless even outside of the compatibility issues.
      • rkeene2 1475 days ago
        I built a server-side mouse-tracking-over-HTTP application that used HTTP/1.1 pipelining to increase precision [0], back when web browsers still supported HTTP/1.1 pipelining, that is.

        [0] https://github.com/rkeene/webdraw

        • dagenix 1474 days ago
          That does sound pretty cool. But it sounds like a fairly special case where the responses were likely uniformly small, which is unusual for a more general use case. Given somewhat more recent technology, it sounds like a great use case for something like WebSockets.
          • rkeene2 1474 days ago
            Yeah, WebSockets would also work here -- I would need to invent an ad-hoc protocol for making the resource requests, and it would be similar to a mini HTTP/1.1 Pipelining in spirit.

            The project needs some other work, since it looks like changes to Chrome have broken the mouse tracking. [0]

            [0] http://webdraw.rkeene.org/

      • toast0 1475 days ago
        Pipelining in 1.1 has some issues, but it could be useful in the right circumstances (except that there aren't a lot of implementations of pipelining, so chances are the other end won't do it).

        The perfect use case for pipelining is when the client is connected to a (reverse) proxy near to it, is making retryable requests, the origin is far from the proxy, and the requests don't take much time for the origin to process.

        You can also get benefits if there is a proxy near the server and the requests take significant time to process, but the proxy can divide them over multiple servers (or the server handles pipelining and divides over multiple threads).

        The way these use cases are met instead is through multiple TCP connections, or multiple multiplexed streams in HTTP/2 (and beyond). There are certainly benefits to individual streams, but there are costs too, so it's unfortunate that the HTTP/1.1 way was stifled.
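
        (What pipelining looks like on the wire is easy to show with a raw socket. A rough client-side sketch, where the host is made up and the far end has to actually support pipelined requests, and where the responses must come back in the order the requests were sent:)

            import socket

            HOST = "pipelining-friendly.example"  # hypothetical origin or nearby proxy

            paths = ("/a", "/b", "/c")
            reqs = []
            for i, path in enumerate(paths):
                req = f"GET {path} HTTP/1.1\r\nHost: {HOST}\r\n"
                if i == len(paths) - 1:
                    req += "Connection: close\r\n"  # so the read loop below terminates
                reqs.append(req + "\r\n")

            with socket.create_connection((HOST, 80)) as sock:
                # All three requests leave before any response is read.
                sock.sendall("".join(reqs).encode())
                while True:
                    chunk = sock.recv(4096)  # responses arrive strictly in request order
                    if not chunk:
                        break
                    print(chunk.decode(errors="replace"), end="")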

        • dagenix 1474 days ago
          In both of those cases it's also important that the responses be small, since pipelining requires that responses come back in the same order as the requests, so you have to buffer up any responses that become ready out of order. Also, the response time for each request needs to be about the same, or you have to issue requests in the right order, or you end up potentially blocking every response behind the one that takes the longest.

          HTTP pipelining isn't largely unimplemented because people were lazy - it's largely unimplemented because it's easy to DoS a server, possibly by accident, and there are significant latency issues due to head-of-line blocking with slow requests.

  • Diggsey 1475 days ago
    Maybe part of developing a standard should be developing and maintaining test and validation suites for that standard...
  • majkinetor 1475 days ago
    I thought this was some kind of meme - from "must" through "should" to "don't care".
  • est 1475 days ago
    Could this be used to fingerprint IP origin devices?
  • ape4 1475 days ago
    It's going to be specific stacks that don't do the MUSTs.