Trip report: Fall ISO C++ standards meeting

(herbsutter.com)

88 points | by matt_d 1990 days ago

11 comments

  • Someone 1990 days ago
    http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2018/p090..., concludes

    ”the only machine the author could find using non-two’s complement are made by Unisys, and no counter-example was brought by any member of the C++ standards committee. Nowadays Unisys emulates their old architecture using x86 CPUs with attached FPGAs for customers who have legacy applications which they’ve been unable to migrate. These applications are unlikely to be well served by modern C++, signed integers are the least of their problem. Post-modern C++ should focus on serving its existing users well, and incoming users should be blissfully unaware of integer esoterica.”

    That led to the decision that ”Signed Integers are Two’s Complement” in C++20.

    That makes it too easy to write portable code :-)

    • kazinator 1990 days ago
      ... but overflowing them is still UB?

      One advantage of pinning down two's complement is that the result of arithmetic is known down to the bit level, even when it overflows.

      Also, certain bit operations. For instance, if we know that the representation is two's complement, we are justified in expecting sign extension on a right shift: the sign bit is repeated to fill the top bit positions. (The shift of a signed number should be "arithmetic"; that of unsigned "logical".)

      Also, truncations (converting a wider type to a narrower type) should have predictable results. For instance if we chop the value -238 to 8 bits, we should get 18, because -238 is 0xFF..FF12, which truncates to 0x12.
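
      A small sketch of both behaviors (hypothetical values; it assumes 32-bit int and two's complement, and relies on the arithmetic right shift and modulo narrowing that C++20 pins down):

        #include <cassert>
        #include <cstdint>

        int main() {
            std::int32_t x = -8;
            assert((x >> 1) == -4);                        // sign bit repeated into the top positions

            std::int32_t y = -238;                         // bit pattern 0xFFFFFF12
            std::int8_t  t = static_cast<std::int8_t>(y);  // truncation keeps the low 8 bits: 0x12
            assert(t == 18);
        }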

      • Joky 1990 days ago
        Actually, I discourage my colleagues from using unsigned, especially because it can overflow. The problem is that, most of the time, an overflow that wasn't intended is a program bug.

        Using signed arithmetic means that you can actually trap on overflow and catch these bugs (using fuzzing for instance).

        Chandler explained it nicely in his CppCon 2016 talk "Garbage In, Garbage Out: Arguing about Undefined Behavior...". I encourage you to watch the full talk, but here is one relevant example:

        https://youtu.be/yG1OZ69H_-o?t=2006

        And here he mentions how Google experimented with this internally: https://youtu.be/yG1OZ69H_-o?t=2249
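
        A minimal sketch of the kind of bug such tooling catches (hypothetical code; -ftrapv or UBSan's -fsanitize=signed-integer-overflow are the sort of checkers meant here):

          #include <limits>

          int next_id(int current) {
              return current + 1;  // UB when current == INT_MAX; a checker can trap it,
                                   // whereas unsigned wraparound here would pass silently
          }

          int main() {
              return next_id(std::numeric_limits<int>::max());  // flagged at runtime
          }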

        • kazinator 1988 days ago
          Unsigned types don't overflow; they reduce modulo a power of two. This is a predictable, reproducible behavior (albeit, unfortunately, not entirely portable due to the size of that power of two often being implementation-defined).

          > Using signed arithmetic means that you can actually trap on overflow and catch these bugs (using fuzzing for instance).

          The reason you can't do this for unsigned types is not simply that their modulo-reducing behavior is well-defined, but that it is actually exploited in correct code, which then leads to false positives if the behavior is trapped.

          But the overflow behavior of signed types is also exploited.

          Either one could be usefully caught with tools, if the tools can simply be told where in the program to ignore false positives. If I'm using unsigned types in foo.c, with no intention of relying on the modulo wrapping, I should be able to tell the tool to report all unsigned wrapping occurrences in just foo.c without caring what is done with unsigned types elsewhere in the program or its dependent libraries.

          All that said, I believe unsigned types should be eschewed as much as possible because they have an unacceptable discontinuity just to the left of zero. Suppose A, B and C are small positive integers close to zero, say all less than a hundred or so. Then given

             A + B > C
          
          and knowing elementary school algebra, I can rewrite that as:

             A > C - B
          
          But I can do that in the actual code only if the type is signed. I cannot do that if it is unsigned, because B might be greater than C, in which case C - B produces some huge number. This happens even though I'm working with harmless little numbers less than around a hundred.
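
          A concrete sketch of that trap (hypothetical small values, assuming 32-bit unsigned):

             #include <iostream>

             int main() {
                 unsigned a = 5, b = 20, c = 10;
                 std::cout << (a + b > c) << '\n';  // 1: 25 > 10, as expected
                 std::cout << (a > c - b) << '\n';  // 0: c - b wraps around to 4294967286
             }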

          We should prefer to work with integers that obey basic algebraic laws, at least when their magnitudes are small, so that we don't have to think about "is that test phrased in the right way, or do we have a bug here?"

          In any higher level language than C, we should have multi-precision integers. I no longer consider any non-C language viable if it doesn't. If I'm reading the manual for some new supposedly high level language and come across a statement like "integers are 64 bits wide in Fart, but a clumsy library called Burp provides bolted-on bignums", I stop reading, hit the back button and don't come back.

      • ameliaquining 1990 days ago
        Signed integer overflow being UB enables compiler optimizations. Most famously, when compiling

          for (int i = 0; i < bound; i++)
        
        in an LP64 environment (i.e., one where the native machine word is 64 bits but int is 32 bits, as is the case in most modern compilers targeting most modern architectures), it allows i to be represented by a native machine word, increasing performance.
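
        A rough sketch of the kind of loop where that matters (hypothetical code; the point is what the optimizer is allowed to assume, not any specific compiler's output):

          void scale(float *a, int n) {
              for (int i = 0; i < n; i++)
                  a[i] *= 2.0f;  // a[i] needs a 64-bit address offset; since signed overflow
                                 // is UB, i can live in a 64-bit register with the sign
                                 // extension hoisted out of the loop. If i wrapped at 32 bits
                                 // by definition, that rewrite would not always be valid.
          }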

        You can, and people constantly do, argue about whether this is worth it, but I suspect it's the primary reason why a lot of the committee didn't want to change this behavior.

        • kazinator 1990 days ago
          If it's okay to screw programs that make nonportable assumptions, then int should just be 64 bits. After all, int is supposed to be the machine's natural integer.

          The current fashion is that we like programs that assume int is 32 bits, and don't like ones that assume wraparound arithmetic.

          • slededit 1990 days ago
            32-bit ints have a more compact encoding on x86-64 because it was assumed they would be the most common. The "native" size isn't a simple matter these days. Back in the old dark days of 8- and 16-bit ints, common values regularly exceeded the range. But 4 billion is large enough for it to still be a sensible default.
        • brigade 1990 days ago
          Having optimized very tight C loops which included manually promoting int types to intptr_t to avoid sign extension, I'll say emphatically that avoiding needless sign extension had no effect on my micro-benchmarks on modern wide CPUs, let alone my macro benchmarks.
    • thestoicattack 1990 days ago
      A talk by that paper's author: https://www.youtube.com/watch?v=JhUxIVf1qok
    • favorited 1990 days ago
      Really cool that this change is likely to make it through WG14 as well.
      • xenadu02 1990 days ago
        Unfortunately the working group rejected the author's proposal that signed integers have wrapping behavior, so signed overflow/underflow remain undefined:

        > Status-quo If a signed operation would naturally produce a value that is not within the range of the result type, the behavior is undefined. The author had hoped to make this well-defined as wrapping (the operations produce the same value bits as for the corresponding unsigned type), but WG21 had strong resistance against this.

        http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2018/p090...

        • saagarjha 1990 days ago
          I consider this less of a shoo-in, since it's more complicated to implement this in the case where your integer type doesn't actually match the register size of your hardware (which is true with int on LP64). With unsigned integers you can just mask off the extra bits when overflow occurs, but with signed integers this is more complicated. Having this be implementation-defined behavior might be useful, so you don't have undefined behavior in your program, but I am generally of the opinion that overflowing an int is usually a bug rather than a feature.
          • chrisseaton 1990 days ago
            > it's more complicated to implement this in the case where your integer type doesn't actually match the register size of your hardware (which is true with int on LP64)

            If you do arithmetic on a 32 bit register on AMD64 you still get 32 bit wraparound don't you?

            • saagarjha 1990 days ago
              This is generally an issue with loops that involve memory accesses, because it causes annoying problems when optimizing: https://gist.github.com/rygorous/e0f055bfb74e3d5f0af20690759...
              • usefulcat 1990 days ago
                I wonder, would using int_fast32_t be a viable solution to the problem described above?
              • xenadu02 1990 days ago
                Making signed overflow/underflow trapping behavior would also be acceptable.

                I don't think the performance improvements are worth the security vulnerabilities, besides which the loop variable issue could be solved by adding `auto` and promoting its use - then the compiler can choose the loop counter type appropriate for the machine since the int-as-64-bit ship already sailed.

                (Clang implements this as `__auto_type` and it harmonizes with C++; I've done #define auto __auto_type in projects and it cleans up a lot of C code quite nicely. I'm somewhat surprised no one has offered it as a solution since it is already a reserved keyword).
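
                For example, a minimal sketch of that trick in C (GCC and Clang both accept __auto_type; the names here are made up):

                  #define auto __auto_type

                  void example(long *xs, int n) {
                      auto count = n;       /* deduced as int   */
                      auto first = xs[0];   /* deduced as long  */
                      auto ptr   = &first;  /* deduced as long* */
                      (void)count; (void)ptr;
                  }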

                • colanderman 1990 days ago
                  > the loop variable issue could be solved by adding `auto`

                  I'm not sure how this helps? The compiler doesn't get to choose based on the machine register width – it's bound to choose based on the type of the expression, which is defined by C/C++ integer promotion rules. The expression `0` is an `int` regardless of the context in which it appears, and thus an `auto` variable initialized with `0` will itself be an `int`.

                  You're better off using `ptrdiff_t` than `auto` for loop indices if you want something sized to native register width.
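
                  A tiny sketch of the difference (hypothetical loop):

                    #include <cstddef>

                    void walk(const int *a, int n) {
                        for (auto i = 0; i < n; ++i)            // i is int: 0 is an int literal
                            (void)a[i];
                        for (std::ptrdiff_t i = 0; i < n; ++i)  // i is register-width on LP64
                            (void)a[i];
                    }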

                  • grandmczeb 1990 days ago
                    > The expression `0` is an `int` regardless of the context in which it appears

                    Not totally true. In C, 0 is a special value that can be an integer or a null pointer constant depending on the context.

                • saagarjha 1990 days ago
                  > I don't think the performance improvements are worth the security vulnerabilities

                  This is C++ you're talking about, that's never the right trade-off :P

                  > the loop variable issue could be solved by adding `auto` and promoting its use - then the compiler can choose the loop counter type appropriate for the machine since the int-as-64-bit ship already sailed

                  I'm not sure this completely solves the problem, since often you're doing something like this (I know, I know, iterators are the way to do this, but people do write code like this):

                    for (int i = 0; i < some_container.size(); ++i) {
                    	// do something
                    }
                  
                  Auto doesn't really help here, because if you do something like

                    for (auto i = 0; i < some_container.size(); ++i) {
                    	// do something
                    }
                  
                  the compiler can't look at your code and say "i really looks like it was meant to be a 64-bit index", it needs to be stupid and deduce it to be an int or at least behave as an int would. Unless I'm misunderstanding what you're trying to say?

                  > (Clang implements this as `__auto_type` and it harmonizes with C++; I've done #define auto __auto_type in projects and it cleans up a lot of C code quite nicely. I'm somewhat surprised no one has offered it as a solution since it is already a reserved keyword).

                  Well, you're technically breaking code that actually uses auto the way it was originally intended (as a storage-class specifier), though of course since it's so useless this kind of code is very rare.

                • Joky 1990 days ago
                  > Making signed overflow/underflow trapping behavior would also be acceptable.

                  You can just add the right compiler flag if this is the right tradeoff for your application (it isn't for everyone).

        • colanderman 1990 days ago
          I would guess that the reasoning (which I agree with) is that defined signed overflow precludes a standards-compliant compiler both from making certain optimizations (made under the assumption that overflow will not occur) and from performing error checking (e.g. -ftrapv), and enables the writing of confusing code (which unsigned arithmetic would make more clear).

          Forced to choose, I'd rather see signed overflow be required to abort the program or throw an exception than wrap.

          • int_19h 1990 days ago
            The next best thing would be for C++ to adopt something like Zig's modulo-wraparound operators: +% -% *% /%. Writing a proper implementation of wrapping signed arithmetic in portable C++ is annoyingly complicated right now, for something that is useful surprisingly often.
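
            A minimal sketch of the usual workaround (assuming 32-bit int; the detour through unsigned works because unsigned wraparound is defined, and the conversion back is implementation-defined before C++20 but modulo in C++20):

              #include <cstdint>

              std::int32_t wrapping_add(std::int32_t a, std::int32_t b) {
                  return static_cast<std::int32_t>(
                      static_cast<std::uint32_t>(a) + static_cast<std::uint32_t>(b));
              }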

            Or better yet, some way to specify an expression context for all arithmetic operators contained inside - the default could still be undefined, but you could then also explicitly request wraparound, saturation, or fast-fail. Something like C# checked(), basically (which gives you fast-fail), but covering all useful cases.

            https://docs.microsoft.com/en-us/dotnet/csharp/language-refe...

            • pjmlp 1990 days ago
              I bet there is already something like that available as a library.
              • int_19h 1990 days ago
                There is, e.g.:

                https://github.com/dcleblanc/SafeInt

                But IMO something like this should be a language thing, just so that it's not substantially harder or more verbose to use operators with explicit wraparound or trap.

                • pjmlp 1990 days ago
                  What we need is a winner in the C++ package manager wars, then adding such dependencies wouldn't be such a big issue.
          • kazinator 1990 days ago
            That's bad for code that can handle overflow and do something reasonable, like switch to bignum integers.

            If wrapping takes place, overflow can be detected in-line; no funny control paths via exceptions or whatever.

            Without the spectre of UB hanging over overflows, efficient code for dealing with overflows can also be ISO-defined at the same time.
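
            A sketch of what that in-line detection could look like (hypothetical helper; today the detour through unsigned is still needed to avoid the UB):

              #include <cstdint>

              // returns true if a + b overflowed; *out holds the wrapped result either way
              bool add_overflows(std::int32_t a, std::int32_t b, std::int32_t *out) {
                  *out = static_cast<std::int32_t>(
                      static_cast<std::uint32_t>(a) + static_cast<std::uint32_t>(b));
                  return (b > 0 && *out < a) || (b < 0 && *out > a);
              }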

            • colanderman 1990 days ago
              > That's bad for code that can handle overflow and do something reasonable, like switch to bignum integers.

              I see that as an argument for throwing exceptions or setting a flag on overflow, not for silently wrapping to negative values.

              Defining signed overflow means that I can't enable runtime checking of it without erroneously flagging intended cases of overflow. To deny that to support the tiny percentage of developers who write bignum libraries strikes me as a poor tradeoff.

              (Simply checking for a sign change isn't sufficient for implementing bignums anyway. It's trivially possible to multiply a large positive number by a small positive number and have it wrap not only past negatives back to positives, but to a number larger than the input.)
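
              A concrete instance, with hypothetical numbers: in 32-bit arithmetic, 100000000 * 44 is 4400000000, which wraps modulo 2^32 to 105032704. That is positive and larger than the 100000000 we started with, so a naive sign check misses it.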

    • reality_czech 1990 days ago
      "Post-modern" C++? Finally someone admits that the language was all a huge prank.
      • arximboldi 1990 days ago
      • Koshkin 1990 days ago
        C++ is more like a drug - having used it makes it impossible to go back to C.
        • kazinator 1990 days ago
          It is absolutely possible. I did C++ development as a regular job years ago, now I'm coding in C (for that type of coding).

          What will get you off the C++ drug easily is Lisp.

          Once I discovered Lisp, C++ had no place in my "personal spectrum" any more, but C (including the C-like subset of C++) still did. Well, it wasn't so sudden, mind you. More like: the more Lisp I knew, the less interest I had in C++.

          (Idiotic lambda implementations and whatnot will not woo me back, sorry.)

          • int_19h 1990 days ago
            Automatic resource management killed C for me. The need to write endless towers of explicit error checks, or endless goto cleanup, and making sure that you explicitly clean up things in the right order, is the programming equivalent of washing your dishes by hand instead of using a dishwasher.

            Now, if C had something like Go's "defer"...

            • exDM69 1990 days ago
              > Now, if C had something like Go's "defer"...

              In GCC and Clang you can use __attribute__((cleanup)) to achieve this. I think MSVC has a similar construct.

              I'm pretty sure I've seen a portable "defer" library for C somewhere.
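
              A minimal sketch of that pattern in GNU C (the cleanup function runs when the variable goes out of scope; the names here are made up):

                #include <stdlib.h>

                static void free_charp(char **p) { free(*p); }

                int work(void) {
                    __attribute__((cleanup(free_charp))) char *buf = malloc(64);
                    if (!buf) return -1;
                    /* ... use buf; it is freed automatically on every return path ... */
                    return 0;
                }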

              • pjmlp 1990 days ago
                > I think MSVC has a similar construct.

                Microsoft would rather focus on C++.

          • svnpenn 1990 days ago
            Sorry if this is a hated question - but are all Lisps the same rat's nest of parentheses?

            I've always been interested in Lisp, but that is so ugly - I would prefer JavaScript/Ruby/Python method chaining to 5 levels of nested parentheses

            • dkersten 1990 days ago
              > I've always been interested in Lisp, but that is so ugly

              If it being ugly to you is enough to prevent you from trying a language, then I don't know what to tell you. Personally, I find javascript incredibly ugly, but that doesn't stop me from using it.

              When Python was first picking up steam, I remember a lot of people complaining about how ugly significant whitespace was and how they didn't want to try it because of that. After using Python for a short time, the significant whitespace simply fades into the background and it's not something you really ever think about, so it's really not a problem. Lisp parentheses are the same. After a short while, they simply stop being something to think about and they fade into the background. Besides, Lisp programmers use tools to help them: paredit, rainbow parentheses and more recently parinfer. I now find editing with these so much more pleasant than editing any language that isn't based on s-expressions. Lisp syntax also tends to be very regular, with few things breaking the s-expression rules.

              > are all Lisps the same rat's nest of parentheses?

              No. Clojure, for example, goes to a little bit of effort to reduce the parentheses. There are still enough to annoy people, but far fewer than in other Lisp dialects. Clojure also mixes up the bracket types it uses (e.g., function parameters go in []), which helps to visually break things up, making it easier to read.

              > I would prefer JavaScript/Ruby/Python method chaining to 5 levels of nested parentheses

              In Clojure you can replace something like this:

                  (c (b (a 1 2)) 3 4)
              
              with

                  (-> (a 1 2)
                      b
                      (c 3 4))
            • billsix 1990 days ago
              I implemented a compile-time test framework in Gambit Scheme in 7 lines of code, and Python's yield in about 20 or so.

              http://billsix.github.io/bug.html#_make_generator

              Lisp code has few wasted moves. Programming in it does require changing your mindset of syntactic beauty. And once your mindset changes, you can make whatever syntax you want.

              Perhaps the biggest hurdle is that the code is not best read linearly. The reader must understand the order of evaluation, which follows very simple rules, in order to understand the code correctly. That hurdle is definitely worth the jump.

              • Ace17 1990 days ago
                > Perhaps the biggest hurdle is that the code is not best read linearly. The reader must understand the order of evaluation, which follows very simple rules, in order to understand the code correctly.

                Could you please give some details about how the order of evaluation differs from, say, C? Thanks!

                • billsix 1989 days ago
                  They are very similar. They are both applicative-order, meaning that the arguments to a procedure are evaluated before the procedure is applied.

                  The evaluation of a lambda results in a function value, which captures the enclosing scope, but the procedure is not yet applied to anything.

                  But the main difference that I've seen anecdotally is that imperative programmers as a whole tend to get confused by nested expressions, or let's just say they prefer sequential statements over nested expressions. My assumption is that they don't fully understand the order of evaluation in their language of choice.

                  • kazinator 1989 days ago
                    Scheme and C evaluation rules are very similar: in both languages, the order of evaluation of function arguments is unspecified.

                    Common Lisp is left to right, so that (list (incf i) (incf i) (incf i)) will reliably produce (1 2 3) if i starts at zero.

              • Koshkin 1990 days ago
                > whatever syntax you want

                “... any color as long as it’s black.”

                • billsix 1990 days ago
                  You neglected to quote the full sentence, which wasn’t even long.

                  To change your mindset, read:

                  http://www.paulgraham.com/onlisp.html

                  Or instead, how about this: show me your implementation of generators in the “user space” of your language of choice. Show me your compile-time test framework.

                  Both of these are relevant to the article, i.e., what C++ may provide to users in 2 years. But I did these on my own, without requiring Marc Feeley's approval or his implementation (he is the creator of Gambit).

            • Koshkin 1990 days ago
              Thing is, most programming languages suffer from this problem in one form or another. There are two ways to avoid it - either by using indentation (Python, Haskell) or by using concatenative syntax (Forth, Joy, Factor).
        • pjmlp 1990 days ago
          Safer system programming languages with good type systems are like a drug that make it impossible to go back to C.

          My first drug was Turbo Pascal, from 4.0 all the way to TPW 1.5.

          Naturally the only thing C had going for it was being a language with an OS. In regard to its features, it was already unattractive to me in 1990.

          Thankfully in 1993, C++ was already an option that also came with an OS, as it became adopted by OS vendors in their SDKs.

      • de_watcher 1990 days ago
        Try not to flood the topic with empty comments.
  • fourthark 1990 days ago
    Looks like switching to regular releases of the standard 10 (!) years ago, and all the work they put into proposal workflow, has really paid off.

    C++ stagnated for a long time, but IMO Boost's metaprogramming reinvigorated the language and community.

    Now they are following through with concepts, metaprograms that can do everything runtime programs can do, and soon reflection. Wow.

  • KerrickStaley 1990 days ago
    I'm really excited for the addition of ranges to C++20. Ranges are objects that contain both a start and an end iterator, so you can simplify the APIs of functions that currently take the two iterators as separate arguments.
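
    For example (C++20), a small sketch of the difference:

      #include <algorithm>
      #include <vector>

      int main() {
          std::vector<int> v{3, 1, 2};
          std::sort(v.begin(), v.end());  // classic iterator-pair API
          std::ranges::sort(v);           // range-based overload: pass the container itself
      }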
    • Sharlin 1990 days ago
      It's been a long journey; the first version of Boost.Range dates back to 2003. It's been understood for a long time that functions taking iterator pairs compose very poorly, but as always, the devil is in the details. But compared to modern composable abstractions such as Rust iterators, the STL paradigm feels positively archaic.
  • mehrdadn 1990 days ago
    Does anyone have examples of well-written modern C++ code out there? I used to consider myself a C++ expert circa 2011, but it's advanced so quickly that it's almost like a foreign language to me... (though I have tried to keep up a bit... but it's hard without nontrivial examples)
  • stabbles 1990 days ago
    C++20 must be exciting with ranges, concepts, modules and easier metaprogramming.

    Are there any numbers/graphs on how many proposals are being submitted over time? I feel like they are exploding.

  • saagarjha 1990 days ago
    I'm very excited to see Ranges and Concepts being adopted, since this makes writing and working with templated code that deals with containers much easier.
  • zamadatix 1990 days ago
    > at this meeting we decided to target merging Networking into C++ for soon post-C++20 (i.e., targeting C++23).

    It'll be nearing a decade of "next release" now as its own TS, and nearly two decades as a proposal since TR2.

    Edit: The Reddit trip report lists C++23 as the optimistic target and C++26 as the conservative one.

  • adamnemecek 1990 days ago
    The concept syntax is actually somewhat pleasant. The previous one was such a monstrosity.
  • grandinj 1990 days ago
    Is it not time that specifications for additional features for C++ came with a test-suite? Preferably something shared via git somewhere that people can contribute to.

    Would make bootstrapping some of these new features to a usable level much quicker.

    Speaking as a dev that needs features to span 3 major compilers (clang/gcc/vs) before I can use them.

    • pjmlp 1990 days ago
      You can buy an ISO C or ISO C++ certification suite.

      http://www.plumhall.com/suites.html

      https://www.opengroup.org/testing/testsuites/perenial.htm

      http://modena.us/

      https://peren.com/pages/cppvs_set.htm

      Ah just needing the support across 3 major compilers!

      I guess you haven't enjoyed the glory days of writing portable C, C++ or Pascal code during the 1990's.

      • grandinj 1990 days ago
        I work on a codebase (LibreOffice) that has seen those days (and I worked through them on other projects, but thankfully without needing cross-platform support).

        I note that there is some sharing of test-suites between the different compiler vendors; certainly they occasionally run their compilers against some of each other's tests.

        But a central shared resource that is updated by multiple people to catch edge cases would be first prize.

        (And hopefully, seeded by the proposal authors, to act as a useful starting point)

  • pfarnsworth 1990 days ago
    C++ is changing way too fast to be useful anymore. I spent the first 15 years of my career loving C++, but in the last 8 years or so the C++ committee has vastly outpaced its supply lines. I've been out of the language since then, and I might as well learn a new language rather than try to re-understand the differences between C++ back then and now.

    The only people that can keep up are the hobbyists and the language purists. It's become a circle jerk, and that's a shame, because they are modernizing themselves out of existence, in my opinion. No one in real software can keep up with the three-year release cycle, and it's getting to the point where migrating to another language like Go or Java looks a lot more stable and less risky.

    • slededit 1990 days ago
      C++11 was the big change; these are more window dressing. If you like your current language, that's fine, but as a C++ programmer who had to learn C++11, I found it well worth it.

      The more recent additions are more along the lines of "I can't believe that wasn't already in the standard!" rather than new things to learn.

    • ketzu 1990 days ago
      The good part is, you can (mostly) still program as if it were C++03, and move away from that part by part: start adding auto, continue to lambdas, start using range-based for loops.
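
      For instance, a tiny sketch of that progression (hypothetical code):

        #include <vector>

        void demo(const std::vector<int>& v) {
            // C++03 style
            for (std::vector<int>::const_iterator it = v.begin(); it != v.end(); ++it) { (void)*it; }

            // step by step: auto, then a range-based for, then a lambda
            for (auto it = v.begin(); it != v.end(); ++it) { (void)*it; }
            for (const auto& x : v) { (void)x; }
            auto is_even = [](int x) { return x % 2 == 0; };
            (void)is_even;
        }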

      Or, whenever you have a question about C++, start googling it as "c++17 <question>" instead of "c++ <question>" to find more modern answers, which you might actually like, as some of them are safer (e.g., fewer off-by-one errors) and easier to read.

      I personally felt overwhelmed by all the changes, but watching some videos I saw syntax I really liked and wanted to use! For the most part it was easy to understand, too. Being able to change gradually (at least in smaller projects) was a really nice experience.

    • pjmlp 1990 days ago
      Then good luck keeping up with those languages as well, which release new versions every 6 months nowadays.

      Programming languages are just like any other software product.

      • nikbackm 1990 days ago
        C seems to be pretty stable at least. Even if it too has a new version in the works.
        • pjmlp 1990 days ago
          C is mostly confined to UNIX-derived OSes and embedded devs that won't change to anything else even at gunpoint.

          Yet, even C just got ISO C17 this year, even though it was a minor update.

          https://www.iso.org/standard/74528.html

          Languages either die or get updated to fulfill their customer requirements.

  • eps 1990 days ago
    > So fasten your seat belts, and stay tuned. C++ programming is likely to evolve more, and in better ways, in the upcoming 5 years than it already has in the past 20.

    Yeah, that's the problem, isn't it?

    You do in fact need damn seat belts to cope with all the changes to what used to be a very predictable and simple to understand language.

    • de_watcher 1990 days ago
      What? C++ has always been one of the rare languages that don't mindlessly add stuff that drives them into a corner.
      • pjmlp 1990 days ago
        C++ has this reputation of being a bloated language, which it kind of is.

        However it isn't really that much bigger than Java, .NET + VB/C#/F#, Python, Ada, Common Lisp,... when one looks at the size of printed language, library and implementation specifications.

        It is just that the others are more beginner friendly, and most devs forget they have a pile of features in their own languages as well.

      • eps 1990 days ago
        And what would these numerous languages be - the ones that did mindlessly add stuff and drove themselves into a corner?
    • Kenji 1990 days ago
      > to what used to be a very predictable and simple to understand language.

      Are we talking about the same C++ language?

      I mean, I get what you're saying - they're introducing new concepts fast - but C++ was never "very predictable and simple to understand". There is a staggering number of corner cases and ancient obscure features.