All the talk about `printf("%d\n", a);` being invalid comes down to a specific part of the language. From the list of undefined behaviours (J.2):
> In a context requiring two function types to be compatible, they do not have compatible return types, or their parameters disagree in use of the ellipsis terminator or the number and type of parameters (after default argument promotion, when there is no parameter type list or when one type is specified by a function definition with an identifier list) (6.7.5.3).
> disagree in use of the ellipsis terminator
printf is called with the following implicit type signature (because stdio.h was not included):

`int (char *, int);`

However, the actual printf signature is:

`int (const char *, ...);`
While `char *` can be used where `const char *` is expected, an `int` argument is not a valid match for the ellipsis terminator. As such, the correct answer is that it's undefined behaviour.
(Of course, most implementations will accept this code, but strictly speaking, as a language lawyer, it is undefined behaviour.)
Interestingly, with the K&R version of C we would not have such discussions at all - printf, like many other functions, did not require a declaration. So, as long as we can assume that nobody would call printf with the format string as, say, the second argument, everything would work without anyone giving it much thought. This is but one example of how simple tools reduce unnecessary cognitive load.
I know most of this trivia, except where it crosses into C++, which I don't pretend to understand. And instead of a candidate having deep knowledge of this, I would rather have the compiler refuse to compile most of the examples. Computers are good at repetitively checking things. Why should I have to?
For C++ especially, most of these conventions and techniques co-evolved with the language, so requiring the compiler to produce an error for failing to follow them would be a breaking change. You might be able to configure your particular environment to produce more errors than the standard requires, which is what some of the mentioned warning flags are about. GCC and LLVM both support syntax like this to flag specific warnings as errors: -Werror=reorder (this specific one will error out if member variables aren't declared and initialized in the same order).
I yearn for compilers that can break backwards-compatibility and strictly enforce best practices. This should be achieved with a relatively simple command line flag to the compiler (instead of hunting for the right combination of dozens of -Werror=* flags, which still let a lot of unclean code through).
It won't see widespread usage outside of learning, because you could no longer trust a compiler upgrade not to require a rewrite of your code, as the compiler would arbitrarily decide what the latest best practice is.
What might be useful, though, is something like -Wall printing all the flags representing current best practices, so you can take a snapshot of them when writing new code (the problem with the current -Wall is, again, potentially arbitrary changes on compiler updates as new warnings get added; that is, if it actually did warn on everything).
Then you can choose when you want to update to whatever the modern best practices are.
That would be fine! Although, I also don't think "best practices" would be something that changes every 6 months (maybe every 5 years), and being forced by the compiler to rewrite code to keep up with best practices is absolutely a good thing!
I've been the lead on a fairly large Fortran project, and we make a point to run our test suite on as many compilers as possible, with the strictest flags to avoid any kind of "undefined" behavior (which is rarer in Fortran than in C). I believe it's the only way to ensure long term maintainability.
There are some places where only the programmer can know that the undefined case doesn't happen (like integer overflow). But yeah, what purpose does it serve that the order of side effects in "a() + b()" is unspecified? I assume that's for historical reasons.
A fixed evaluation order may be (very slightly) less efficient than if the compiler is allowed to reorder. Consider evaluating "x + f()" when x is global and the compiler can't prove that x cannot change during execution of f. If the order of evaluation is fixed from left to right, then it must do the following:
1. load x into a register
2. spill the register holding x to the stack
3. call f
4. reload the "local" version of x from the stack
5. add
On the other hand, if the order of evaluation can be chosen by the compiler, it can do this:
1. call f
2. load x into a register
3. add
(Details depend on whether the caller or the callee saves x, but you get the idea.)
In practice, all competitive optimizing compilers are SSA-based, and thus they fix an order of evaluation early in the compilation process. I always wondered how much performance could be gained by actually delaying decisions about evaluation order.
Not sure if you're trying to suggest that SSA-based compilers cannot do this reordering. For my example above, GCC reorders as I sketched, while Clang keeps the program order: https://godbolt.org/g/8KDvFo
I think this is done early, during lowering to a "mid-end" intermediate representation (whether SSA or not). This is the best time to do it, since the legality of this reordering is a property of the source language, which you want to forget about in the mid-end. On the other hand, in this particular example (function call vs. global variable access), I don't think there is any reason to delay this; doing the call first should never be worse than loading the global first.
As for deciding other things later, instruction scheduling and global code motion can reorder a lot of things, though calls and memory accesses block many movements.
The knowledge presented in the slides will advance your programming in the same way that knowing which wood a chess board is made from will improve your Elo rating. Although learning about wood may be a more worthwhile goal than learning about the topics in these slides.
What is this hoping to accomplish? To convince candidates that they can't print a number? Maybe instead of intimidating people they should be inspiring them?
The knowledge presented in the slides isn't deep; it's essential. I can't imagine a C programmer not knowing that parts of an expression can be executed in arbitrary order. Especially since this is aimed at embedded C programmers. It's more like playing chess without knowing what "en passant" is.
Knowledge of standards before C99 is not "essential" unless you're on a legacy codebase. Which, to be fair, is very plausible in embedded. But then you're looking for an expert candidate anyway.
There are actively-used platforms that do not have a fully C99-compliant compiler (unpleasant, I know, but such is life :-( ). Also, many products in the embedded field have very long lifetimes (10-15 years), during which they at least need to be maintained. There are a lot of actively-used platforms that did not have a C99-compliant compiler ten years ago, and they are not exactly legacy codebases.
Edit: oh -- and (this I find truly revolting) there are companies that have not updated their coding standards and mandate that all written code should target an older standard (usually C89).
I agree completely. And honestly, it looks like the presenter is trying to brag about their knowledge of obscure C and C++ quirks through the voice of candidate #2.
There is no doubt these slides reek of humblebragging and maybe even a bit of shaming, which isn't a good look. Obviously, there are important properties of programmers other than knowing obscure stuff.
That being said... It's still solid and I think the slides have many, many important points, both for developer attitudes and things to know about C and C++. A lot of these obscure quirks can hit you hard if you're not paying enough attention. The order of initializer lists is extremely painful and hits me every so often even though I would probably catch it in an interview.
If anyone takes anything away from this w.r.t. C and C++, I'd say the best thing would just be to always code with `-Werror -Weffc++ -Wall ...`
Hold off on -Weffc++; it is known for false positives. It is based on a set of guidelines for C++ that appear in Scott Meyers' book (first edition!), but these guidelines are routinely violated in well-written code. For example, it warns for any base class having a non-virtual destructor, warns if any member is missing from an initialization list, etc. Some of the advice is inappropriate for modern (C++11) code.
I feel similarly. On the other hand, a good intuition about memory representation is important: the difference between global and automatic variables, linker visibility, etc. The unknowledgeable programmer from the slides has no sufficient intuition, in my opinion. You code C precisely because you want control over these things.
Personally I don't care about the tiniest details too much. I pick them up as I go and typically at least remember that there was something that I can look up again later. Also it's really easy to navigate around most pitfalls. As the first programmer says, "I would never write code like that".
There is nothing especially deep about C. Its design is intentionally extremely primitive, almost to the point of being a thin layer of syntactic sugar on top of assembly. In fact, the creation of C was a reaction to a real deep-sea monstrosity that had been gaining widespread popularity at the time, PL/I - a devilish mixture of COBOL, Fortran, and even assembly, all dressed in a horrible ad-hoc syntax, yet intended for use in both systems and application programming - basically, the C++ of its time. PL/I was selected as the implementation language of the Multics operating system which, in turn, precipitated the creation of UNIX.
PL/I is probably the most unpleasant language I have ever had to deal with. It makes C, without warnings, look as safe as a nuclear bunker.
The quote that resonates most with me has to come from an unattributed UNIX fortune file:
> Speaking as someone who has delved into the intricacies of PL/I, I am sure that only Real Men could have written such a machine-hogging, cycle-grabbing, all-encompassing monster. Allocate an array and free the middle third? Sure! Why not? Multiply a character string times a bit string and assign the result to a float decimal? Go ahead! Free a controlled variable procedure parameter and reallocate it before passing it back? Overlay three different types of variable on the same memory location? Anything you say! Write a recursive macro? Well, no, but Real Men use rescan. How would a language so obviously designed and written by Real Men not be intended for Real Man use?
C, with its almost-assembly nature and fairly predictable semantics, was an absolute blessing.
As a consequence we have secure IBM i (PL/I), z/OS (PL/S), and Unisys ClearPath (NEWP) with OS features not yet present in modern OSes, while C coders are the job security of exploit writers.
Personally, I find that I don't ever use any of the more OOP features of C++, especially copy constructors and assignment operators.
If I'm in a situation where I need to make a deep copy of an object, I need to ask myself why am I making an exact copy of a complex object with lots of internal pointers and state, an expensive operation. If I'm going to modify the result of the copy, perhaps what I should do is write a function that generates a new, different object based on the original one, in which case I would be passing in a const reference to the old object and maybe some other data about the new object I want to create. And if I'm not modifying the object, again why not just pass a const reference?
Also, the exact behavior of all these hidden functions can be really hard to figure out. Who wants to figure out how many times the copy constructor and assignment operator are called if you write `a = f(g(a));` with pass-by-value? No thanks.
I got up to slide 80; it was really fun. Although I'm not sure the 'knowledge' given in these slides is of really high importance (for example, it explains that static variables are automatically set to 0 upon declaration; I'd rather my team always made sure variables are initialized after declaration, unless there are performance reasons not to).
It's useful knowledge to have in the sense that it implies these variables are stored in the BSS section. Unless there are highly specific reasons why one would do this, though, I certainly agree that relying on this property is not a good idea.
It can be useful, though. Many years ago, we used it to trim the twenty bytes or so that prevented an updated version of our firmware from fitting into the tiny flash space of a device that had been on the market for quite some time.
It is at the very least useful to know it when debugging code written by programmers who thought this feature was nice and relied on it throughout the code (either because they thought it was a good optimization to make, or because it further obfuscated their code and thus contributed to the security of their jobs).
I seem to recall that one of the previous times this set of slides was presented here ( https://news.ycombinator.com/item?id=3093323 , https://news.ycombinator.com/item?id=6596855) , some people commented that they would hire the less knowledgeable candidate rather than the more knowledgeable candidate. I genuinely can't remember what the justification was. A hope that by not knowing the details of the language and how it was implemented on typical hardware, he was a better programmer, I think.
It feels like some people do programming for the sake of programming (like linguists). Others create poems and exquisite novels with it (authors). I'd argue you don't need to be a linguist to be a Nobel (or Pulitzer) prize-winning author. Why? It'd be a distraction. You'd start focusing on the wrong kinds of things.
Linguists are still highly respectable people. Society needs them. It's just that being a great author does not require you to be one.
Is there any good reason for writing "void" in empty parameter lists? I have never seen one and, unless there is one, that declaration is just useless line noise.
int main() is not a prototype. int main(void) is a prototype. int main() declares main with an unknown (at this point) but fixed number of arguments (i.e., non-variadic). Callers must guess the correct number and types of arguments, the compiler does not enforce anything. In contrast, int main(void) declares main with exactly 0 arguments, to be enforced by the compiler.
For main it doesn't matter much since usually you don't call it yourself, but consider:
```c
int f(); // no arguments, apparently

int g(int a, int b, int c) {
    return f(a) + f(b, c); // at least one of these is fishy...
}

int f(double d) { // oh, the caller guessed wrong. twice.
    return (d != 0.0 ? 0 : 1);
}
```
GCC and Clang do not complain about this program even with -Wall (they do with -Wstrict-prototypes).
Apparently void foo() will not error when called with arbitrary values - without a prototype, the compiler simply doesn't check the arguments at all (it behaves as if unchecked, though it isn't truly variadic). This surprised me too, but I confirmed it with gcc, even with -std=c1x and -Wall -Wextra -pedantic.
I find this interesting, but for another reason than what the authors probably intended: the guy who is presented as the "dumb" one (after a while the invisible interviewer even makes jokes about him) actually shows what many people would intuitively expect to happen in their C code, so he is a good guideline for a compiler that wants to take that into account.
Either a brand new compiler or an existing compiler like gcc/clang that adds a new flag that performs its regular optimizations as long as they wouldn't break common assumptions about what their C code would do. Of course it would be hard to find these assumptions, but the linked presentation is a good start.
Personally, if I were to do something like this, I'd use a simple rule: what would be the dumbest, most straightforward way to implement a C compiler? What effect would some expression have in that compiler? Then that is what the "unsurprising" compiler mode should do - perform any optimizations as long as they do not interfere with that effect.
I think it would be a win/win situation for compilers to do that: they'd get to play their performance game and also provide a surprise-free mode without fully abandoning performance.
I understand why knowing what the standard says is useful, but why should one try to know what happens in a case where the behavior is undefined? It's platform-, compiler-, and optimization-specific, and is literally just trivia.
Glad I checked the comments after a few dozen slides.
I think this presentation is supposed to convince you that it's important to have a deep understanding of both the implementation details and the official specification of your language of choice. However, my takeaway is that C is really complicated and has no respect for the principle of least astonishment.
A huge number of the slides are small incremental changes to a base slide. Things like adding each bullet point, callouts, individual newlines and edits to make code changes easier to follow, etc..
In this short example, you compile, the compiler complains, and then you troubleshoot the errors reported. You don't extrapolate on the variations possible in another dialect. You don't optimize early, you don't overthink. You can then add platform conventions, indentation, and other sugar to the base code to fulfill whatever workplace standard or best practice you need to match.
`error C3873: '0x201c': this character is not allowed as a first character of an identifier`
`error C2065: '“': undeclared identifier`
I bet most HN readers don't speak German.