sudo sh -c 'echo 1 >/proc/sys/kernel/perf_event_paranoid'
Here’s another way of doing this:
echo 1 | sudo tee /proc/sys/kernel/perf_event_paranoid
You can’t use > redirection to write a file as a different user, because the redirection is performed by your own (unprivileged) shell before sudo ever runs; but a pipe can span two users like this (well, I guess it’s sudo handling the plumbing there), and so tee can run as root, receiving input from an unprivileged user and then doing what it does: writing the contents of stdin both to a file and to stdout.
This leads to a particularly useful trick in Vim if you edit a file you don’t have permission to write (most likely because you forgot to run `sudo vim` but now you’ve made the changes and you just want to save it, not quit and do it again as root):
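For the record, the standard incantation (typed in Vim’s command mode; `%` expands to the current file’s name):

```vim
" Pipe the buffer to `tee`, which sudo runs as root; tee writes the
" file, and >/dev/null keeps it from echoing the file contents back
" into the command window.
:w !sudo tee % > /dev/null
```

Vim will typically then warn that the file changed on disk; tell it to keep (or reload) the buffer and you’re done.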
I build systems from source. Sometimes I do things before installing a full userland. (Sometimes I never install a full userland, opting for a smaller customised one.) I have learnt to use what is available on BSD install media and in BSD toolchains. sed is a popular choice for BSD toolchains; tee, not so much.
I see I was correct in guessing that you weren’t dealing with Linux, since that’s almost certain to use GNU userland, and there tee is part of coreutils, and good luck getting anything done without coreutils. (GNU sed is its own thing rather than being part of coreutils or any other *utils bundle. I don’t know why this is.)
I’m still a little surprised to hear of tee not being present when sed is, but I’m not familiar with BSD. Maybe they care even less about POSIX than I would have expected?
I’ve been a little bit embittered against sed because of the differences across platforms. As usual, the version that comes out of the box on macOS is ancient, for licensing reasons, and the distinction has caused me grief too many times. I seem to recall having to give up on using sed for in-place regular expression replacement in a file when I was only caring about Linux, Linux under WSL 1 and macOS, and ended up using a fairly convoluted Perl one-liner instead (something like -plni.bak followed by rm foo.bak, because -i sans extension broke under WSL 1). Cross-platform operation is trickier than I wish it were!
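That Perl approach, for reference, looks something like this (the file name and pattern are placeholders, and this uses the plain `-pi` form rather than the exact flags recalled above):

```shell
# Portable in-place replace via Perl: an explicit backup suffix
# (-i.bak) behaves the same on GNU, BSD, and old WSL userlands,
# where `sed -i` syntax differs. -p loops over lines printing each,
# -e supplies the substitution.
printf 'hello world\n' > foo.txt
perl -pi.bak -e 's/world/there/' foo.txt
rm foo.txt.bak        # discard the backup once you trust the result
```

The annoying part is that `-i.bak` leaves the backup file behind, hence the trailing `rm`.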
I prefer ed for so-called "in-place" editing, not "sed -i". I am always using tmpfs anyway and it makes little difference to me if I can see the temporary file created or whether it is effectively hidden. Even NetBSD has added "-i" to sed now, and coreutils does not include ed. I do not see such moves as "progress" but that's just me.
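For anyone unfamiliar, a minimal ed-flavoured equivalent of an in-place substitution (file name and pattern here are throwaway examples) looks like:

```shell
# In-place substitution with ed: no .bak file left behind, and it
# works with the POSIX toolset that ships on BSD install media.
printf 'alpha\nbeta\n' > demo.txt
ed -s demo.txt <<'EOF'
,s/beta/gamma/
w
q
EOF
```

`-s` suppresses ed’s byte-count chatter; `,s/…/…/` applies the substitution across the whole file, and `w` writes the result back in place.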
If macOS sed has "-a" then something like
sed -na 's/a/b/;H;$!d;g;w file' file
should approximate "-i" (but without a temp file). The catch is that it will leave a blank line at the top.
Note that there is a significant difference: `tee` also copies its input to stdout. For something like this, a single line with a single character, that’s unlikely to bother anyone, but if you are writing an image to a disk it will be rather unpleasant (redirecting tee’s stdout to /dev/null avoids it).
rr is insanely useful. There are some debugging patterns that commonly come up for me when I use it.
1. Investigate a crash: run the program until it crashes / shows an error dialog / reaches some other undesired state. If an exception was involved, set a catchpoint (`catch throw`) and reverse-continue. You are now in the vicinity of the crash, and you can debug it going backwards in time.
2. Some variable has garbage state and you want to know what caused it. Set a hardware watchpoint on the variable, then reverse-continue. You will see the line that last wrote to the variable. Sometimes it is a line writing to an entirely different variable; then you know you have variable-lifetime issues.
3. The executable you debug is timing sensitive (network timeouts for example). Just record a run with rr then debug it later, don't worry about breakpoints in the middle of timing sensitive code.
And I'm pretty sure that there are a ton of other use cases.
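For concreteness, a session for the first two patterns might look roughly like this (`./myapp` and the address are placeholders; `rr replay` drops you into gdb at the start of the recording):

```
$ rr record ./myapp                 # run once; the bug gets captured
$ rr replay                         # replay the exact same execution under gdb
(gdb) continue                      # forward to the crash
(gdb) watch -l *(long *)0x7ffd1234  # hardware watchpoint on the corrupted address
(gdb) reverse-continue              # backwards to the last write to it
```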
These are excellent examples. I’ll throw one of my stories into the pot:
One time I had a client that was using a large and complicated C++ library to build a larger and yet more complicateder C++ program. It had an intermittent crash that they just couldn’t track down. The stack traces they showed me were always deep in the bowels of the C++ library, in places no crash should ever be. The library was open source and widely used; I _knew_ that it didn’t just crash in any of those places.
I recorded it in rr a few times until I captured the crash. I set a watchpoint on the memory address of the crash, and ran the program backwards from the crash. (Memory addresses are stable in the replay, making this kind of thing super easy.) A few seconds later it stopped on the line that was responsible; it turns out they were accidentally overwriting the pointer to the destructor in the vtable of this one class. Eventually one of those objects would go out of scope on some random thread and need to be destroyed. BOOM. I checked a few of the recordings where it didn’t crash, and in those cases it had just overwritten something much less obvious, like a string containing HTML that it had downloaded.
My advice? First, always make sure your pointers are initialized correctly before you go dereferencing them. You won’t like what happens otherwise.
Second, learn rr. With rr, and a little brain–sweat, you can do in a couple of hours what an entire team of engineers couldn’t do in months. Admittedly they had a lot of things on their plate, but it’s definitely a superpower. I think this might have been the first time I ever ran rr, too; some of that time was getting rr built and installed. I think I even had to ask a question on the IRC channel, because there was a confusing error message.
Third, learn Rust. This was a few years back, when Rust was little more than a crazy idea. If the program had been written in Rust, there would have been a lot fewer landmines for them to step on. That’s a rather different kind of superpower. Incidentally, you can combine these superpowers… but that’s a story for another day.
Compare the time spent using rr to track down the occasional failure against the sum of time spent waiting for the Rust compiler, every build, every day. I know which number is bigger, at least for the sort of code I work on.
In the last 10 years I have spent strictly more time preparing bug reports against compilers than on tracking down memory errors. So, while Rust solves a problem all C coders have, it is not a problem that modern C++ coders necessarily experience often enough to justify the extra coding effort, build time, and tool-maturity risk.
Rust, at this stage of maturity, is fun in a puzzle-solving sense, and modern, and enlightening. No one who learns Rust will regret the effort spent. The only serious risk is that it may take many months to restore one's habit of putting a terminating semicolon where C++ demands one but Rust does not. A shift in preference against designs requiring mutex locks may improve the performance of your C++ code.
The real cost to my client was that for months or years their system had been crashing, and they could do nothing about it. They just had to live with it, rerunning jobs that had crashed. Maybe the compile times would have been unfortunate, but I think they would have come out ahead. On the other hand, they probably wouldn’t have had to hire me.
I watched it (at 2x, 17 mins): The enthusiasm of the convert.
Values are in tension, but time is fungible. When you add up hundreds of hours waiting for very, very, very slow builds, you should wonder if maybe those hundreds of hours would be better spent elsewhere. If you have traded them for two hours of debugging, have you come out ahead?
(Is there any objective reason for the Rust compiler to be two orders of magnitude slower than a normal compiler? Maybe an alternative implementation strategy would help?)
It was telling when he posted his "values" of C++. Suffice to say, when you need to lie to make your case, the argument is already lost.
Your client lived with crashes because they couldn't be bothered to use valgrind, to use the address sanitizer, to use the UB sanitizer, to set a watchpoint in gdb? They didn't need rr.
There is no substitute for competence. If they had been competent to (re-?)write it in Rust, they would be more than competent to spend 0.1% as much time just fixing it, or (better) not coding the bug in the first place. But who could they hire to code it in Rust? There are orders of magnitude too few Rust coders for that to work.
Rust might be a way not to code bugs in the first place, but modern C++ is another way. It doesn't pretend to make bugs impossible, as Rust pretends, but it does remove temptation to bugs: bug-prone code is ugly, and better ways are equally fast. And, modern C++ is mature, fun, fast building, and can "#include" C headers for essential libraries unchanged.
People frequently overstate the slowness of the Rust compiler. Sometimes it’s slower than you’d like, but when you investigate you find that you are doing specific things that are really hurting your compile times. Rust’s procedural macros are powerful, but can turn out to be surprisingly expensive. Breaking your code up into smaller crates can have an unexpectedly large beneficial effect on compile times as well. Etc.
Contrast this with C and C++, where the compile time for a large project is often dominated not by the compiler itself but by running the linker at the end. For the project I mention here, a full rebuild took over an hour, with the linker taking a few minutes of that. A partial build, when you had changed only one file, took several minutes. Running the compiler on the one file that you changed took just a second or two, but running the linker took the same amount of time in either case.
But yes, they certainly could have saved time and money if their builds had been faster.
> It was telling when he posted his "values" of C++. Suffice to say, when you need to lie to make your case, the argument is already lost.
Which value or values of C++ do you think he got wrong?
> There is no substitute for competence.
This is true, but I suspect that we disagree on the definition of competence. In my experience, using any of these tools, ever, puts you in a better situation than most programmers. Of course, programmers that use safe languages never need to run Valgrind, UBSan, or similar tools (because the language takes care of memory safety for them), so they’re ahead of the pack as well. But I would bet that 90% of programmers have never used a profiler either. I don’t think that we can call 90% of programmers incompetent simply because they’ve never used a profiler.
You might even say that even the existence of tools like Valgrind and UBSan is a pretty strong condemnation of C and C++.
The reason I recommend learning rr is not merely because it can help you find memory safety bugs in your C++ programs. If all you want to do is that, then there are other more specialized tools that will do the job.
rr is a _general purpose_ tool. It is a debugging system that has powerful features that can be used to debug _any_ problem your program exhibits. I did not at the time know that this bug was due to memory corruption. (Obviously since it was an intermittent crash in a C++ program, it was pretty high on my list of suspects.) I didn’t know that they had never run valgrind on the thing either. But because rr is a general purpose tool, I didn’t need any more information about the nature of the problem in order to find the bug and fix it.
The need for valgrind etc. is indeed a black eye for C. But, as I said, I have not found occasion to use it on C++ code.
Certainly there is plenty of C code, and also plenty of bad and un-modern C++ code, that could benefit from these tools. There is not much Rust code in existence. Imagine running the Rust compiler just once over an amount of Rust code commensurate with those bodies. How many core-millennia would that take? The mind boggles.
Familiarity with essential tools is certainly a prerequisite for competence. One who has not used a profiler might never have been asked to make a program faster. One who has not used valgrind uses some other means to discover the causes of problems; if they spend notably more time than using valgrind or other tools would have needed, that would mark incompetence.
It seems like rr could save many people a great deal of time. It would not have saved me much, this past decade, because it could save no more time than I did spend tracking down proximate causes of trouble.
In general, it is always much better to achieve correctness by construction, using tools that only produce correct or, at worst, easily diagnosed results. Such a toolset might include a compiler alone, but I have found a powerful language that enables good libraries, and such good libraries, yield the same benefit. You need good libraries anyway.
> Your client lived with crashes because they couldn't be bothered to use valgrind, to use the address sanitizer, to use the UB sanitizer, to set a watchpoint in gdb? They didn't need rr.
Sure. In fact, once this bug was fixed I ran the same program through some tests with valgrind and found two more problems that were occurring less frequently. The difference is that with rr I can record and diagnose a specific crash. With valgrind I would have found three problems, fixed them, and the intermittent crashes would have gone away. On the one hand that’s good enough, but on the other hand I wouldn’t have found out that the weird crashes that couldn’t happen were because we were overwriting a vtable.
To see where a value came from, set a memory-breakpoint and reverse-continue. This is not just useful when you suspect memory corruption (your point 2).
I use it more often with a large legacy C code base. It outputs a complex data structure, and there are often several code paths computing values for a particular field. For someone unfamiliar with the code base, it can be tricky to tell which code path was responsible for computing the value in a particular instance of the struct (where I see incorrect output).
By reverse debugging with memory breakpoints, I can trace the data flow backwards until I find where the computation went wrong.
Note: since I'm developing on Windows, I'm not using rr for this, but rather Microsoft's Time Travel Debugging (in WinDbg Preview).
> rr does not monitor processes outside the children of what it is recording, and misses any communication through shared memory to an outside process.
The current tendency seems to be moving towards communication through ring buffers in shared memory, not only with outside processes but also with the kernel (io_uring), so unless they find a workaround, rr is going to become less useful as time goes on.
I think we can support io_uring with some work. Basically we need to have rr manage the real ring buffers and give the application fake ring buffers, then have rr copy data between the fake and real ring buffers at defined times.
Does anyone know how rr implements the rewind capability? It seems that it has to keep a snapshot of the entire program memory at every step, or are there any techniques it uses to optimize this in specific cases?
In replay mode, the program is executed again, but rr will replay the recorded syscall effects instead of doing actual syscalls. As a result, the program will deterministically behave the same way it did the first time around.
There's a bunch more magic to make other things (incl. multi-threading) behave deterministically.
rr uses hardware performance counters (telling the CPU things like "trigger an interrupt after executing N conditional branches") to seek to a particular time in the program's execution.
So as a first approximation, "step back" can be implemented as "re-run from beginning, programming the perf counter break after N-1 steps".
This is where the trouble with certain CPU models comes into play -- these perf counters aren't always perfectly reliable, and rr needs different workarounds for each CPU generation.
And I guess rr has some full program snapshots as well at regular intervals so that the replay doesn't need to start all over from scratch on every "step back".
If you take a snapshot initially, then all you need is the results of anything non-deterministic, i.e. anything which cannot be reproduced by the program's instructions themselves. You can get almost all of those by logging the results of system calls. All that is left is things like RDTSC, RDRAND, and the thread interleaving of a multi-threaded application.
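As a toy sketch of that idea (purely illustrative, nothing like rr's actual implementation): log the one nondeterministic input during the recorded run, then substitute the logged value on replay, so everything computed from it comes out identical.

```shell
# "Record" phase: the program consumes one nondeterministic input
# (the wall clock here) and we log it before using it.
t=$(date +%s)
echo "$t" > replay.log
echo "record: $((t % 7))"

# "Replay" phase: re-run the same computation, but read the logged
# value instead of the real clock; the result is now reproducible.
t=$(cat replay.log)
echo "replay: $((t % 7))"
```

Both lines print the same result, because the only nondeterminism was captured in the log.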
This works in theory, but if your program has been running for a minute, going back a second by re-executing from the beginning will take a minute. rr snapshots every second so that going back takes similar time to going forward, rounded to seconds, adjusted for overhead.
Honestly, the best tool is your own brain. The easiest way to develop that tool is through experience. In a previous role, I did a ton of system level coding in C++ [~10yrs], and no tool can match your own intuition and thinking about the problem at hand.
The best way to develop this is to be mindful of what is happening as the code executes, in terms of the CPU, the memory, and any peripherals or other devices you may be using. For example, be very comfortable tracing through code end-to-end, not just when it crashes. That way the code paths, intermediate variable values, and everything else become ingrained in you. Your brain will very quickly spot situations where something looks off. The more you do this, the easier it gets, and you will slowly develop an intuition for these things.
If you're focusing on crash dumps and postmortems, you also need to be aware of what raw memory looks like at the bit level, understand the basics of the CPU architecture, etc. Again, the more you do this the easier it will get.
With most things the big-picture view is always dark and murky when you start out. The knowledge and experience you gain will be the light that you shine to gain a clearer view.
>"rr records trace information about the execution of an application. This information allows you to repeatedly replay a particular recording of a failure and examine it in the GNU Debugger (GDB) to better investigate the cause. In addition to replaying the trace, rr lets you run the program in reverse, in essence allowing you “rewind the tape” to see what happened earlier in the execution of the program."
PDS: An awesome set of ideas! A sorely lacking feature in most debuggers...
Actually, now that I think about it... there's this notion of "machine state" associated with any point in time of any given program run...
"Machine State" (of a program run, at a point in time) -- not only includes the set of processor registers at that point in time and the memory used by the program -- but also such things as OS entities, like files, sockets, critical sections, shared memory, etc., that are open and active (and their state!) -- at given point in time, during a program's run...
It may be somewhat infeasible for today's operating systems, but in the future it might be possible to take a "snapshot" of a program (CPU registers, memory, open OS objects, everything) at a given point in time -- then be able to reload that snapshot into a debugger and move forward in time from there...
On the one hand, OS'es have had for the longest time, the ability to write "Crash Dumps" of failed programs to disk, and debuggers can load them up and see machine state at the point of the crash -- but you typically can't reverse the program a series of steps from that point, nor can you re-create the now-closed OS objects that the program was holding on to -- nor reinstate their exact state at a given time, which would be necessary to run the point-in-time program snapshots as if nothing happened...
But future OS's -- will/should -- be able to (if requested) -- store state information about a program's open OS objects at a given point in time (perhaps that should be a series of new OS API calls? Like perhaps SnapshotAllOSObjectsUsedByThisProgramNow(); and RestoreAllOSObjectsUsedByThisProgramAtTime(Time);...?). But that's just speculative at this point...
Also -- wouldn't Plan 9 and other OS'es like it, ones that have the ability to move programs/processes from one physical machine to another while the program is running, have the preexisting infrastructure to snapshot a program's open OS objects at points in time, with very little code modification? It seems like they might...
Anyway, rr is a great and awesome step forward for debugging!
rr is magic. Unfortunately I have no use for it since I'm not programming C/C++ anymore.
rr should work on Python, but you need to make sure that you have all of your GDB config set up to properly interact with Python objects so that you aren't dealing with the raw CPython internals too much. GDB plugins are very powerful, and if you find the right ones you can have an excellent experience in just about any language. (I don't know the current state of the Python support, but a number of years ago it was pretty acceptable.) However, it will probably never be quite as convenient as IPDB on pure-Python code. So in the past I have switched back and forth, using IPDB for the simpler stuff and breaking out rr+GDB when I needed advanced features (like time travel).
This is just a link, sorry, but undo.io has "call for a quote" reverse debugging for Java.
TTD uses code instrumentation so typically has a lot higher overhead than rr on single-threaded code. On the other hand, it can handle shared memory and use multicore, so for parallel programs on big machines TTD could be faster.
TTD has some debugging features rr+gdb doesn't have, like some cool query-based debugging features. But it doesn't let you run application functions (e.g. prettyprinters) at debug time like rr does. Pernosco consumes rr recordings and blows TTD away IMHO :-).