$ time sleep 12
$ time ./fluxcapacitor -- sleep 12
In the author's words:
Fluxcapacitor is really good at speeding
up client/server or protocol tests.
Actually fluxcapacitor was originally
created to speed up the sockjs-protocol
test suite. It wasn't possible to mock
up a time library - we needed to run
the tests against any sockjs http server,
whether it's in erlang, node.js or
python. It turns out the only way to run
timeout-related tests in a reasonable
time is to use fluxcapacitor.
You might ask how fluxcapacitor works.
In short - it uses ptrace to catch
syscalls like clock_gettime or
gettimeofday and overwrite the kernel
response with a fake time.
Additionally it short-circuits
syscalls that can block for a timeout
like select or poll. For technical
details see the README.
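To make the idea in the quote concrete, here is a toy model of it in pure Python. It is not fluxcapacitor's code and does not use ptrace; `VirtualClock` and its methods are hypothetical names invented for this sketch. It only shows the two behaviours described above: clock reads return a fake time, and a call that would block for a timeout returns at once while the fake time jumps forward.

```python
import time


class VirtualClock:
    """Toy model (hypothetical): time moves only when a caller would block."""

    def __init__(self, start=0.0):
        self._now = start

    def now(self):
        # Stand-in for a faked clock_gettime/gettimeofday result.
        return self._now

    def sleep(self, seconds):
        # Stand-in for a short-circuited blocking call: instead of
        # waiting, jump the fake clock to the deadline and return.
        self._now += seconds


clock = VirtualClock()
real_start = time.monotonic()
clock.sleep(12)  # returns immediately
real_elapsed = time.monotonic() - real_start

print(clock.now())  # 12.0
```

The program believes 12 seconds have passed, while `real_elapsed` stays close to zero. The real tool has to do this from outside the process, which is where intercepting syscalls comes in.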
It seems that this library might be useful to anyone needing to test communication protocols by helping tickle those edge cases that happen with timing.
Yeah, this is amazing. It allows for a whole different aspect of troubleshooting (or maybe even fuzzing?). What happens if two different pieces of software have to communicate but can't agree on the time? Is there an annoying caching bug that normally only gets triggered every few days?
(author here) That's the intention! Protocol development, protocol testing. Think - leap seconds. While we don't support having a different timer for each process yet, this is easy to imagine.
Also - this is somewhat similar to the Go race detector in spirit - looking for explicit synchronizations.
The general idea is that you should be able to run multiple processes within the thing, and that they should be able to just talk to each other freely.
There are a number of "protocol simulators" out there, but I always found them academic. I wanted to build a tool that would let me run, test, and measure code coverage of my code without having to wait forever for timeouts and timers to kick in to test rare branches.
Try using it for consensus debugging. Like wrapping a few ZooKeeper or Consul processes with FC. Time is a fun one to debug; clock slips are not fun to debug in prod, even though one thinks NTP is always reliable.
Doesn't 0.05s still seem like quite a long runtime for something like this? I wonder what it is doing under the hood that takes so long.
For a start, it seems like they hijack a bunch of clock calls with LD_PRELOAD and then redirect the vDSO ones to syscalls just so they can interpose with ptrace. Why not do the manipulation in the PRELOAD?
I guess using the debugger API makes it easier to coordinate a bunch of processes, but you could probably do the same thing with some sort of IPC mechanism between PRELOADED targets. In the common case (a single program running under FC), no IPC is needed.
This is cool but it should be used with caution. The time your thread spends sleeping may be reduced to nothing, but the time it spends working won't be. This could lead to all kinds of scenarios (timeouts, deadlocks, etc.) that are only theoretically possible in real time. It may or may not be useful to encounter these cases.
Edit: also worth mentioning that a good way of doing this kind of testing when you have full control of the code is by abstracting access to time operations to go through an interface and injecting mock/fake time objects.
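A minimal sketch of that approach, assuming you own the code under test. `FakeClock` and `TtlCache` are hypothetical names made up for the example; the only real API used is `time.monotonic` as the default clock.

```python
import time


class FakeClock:
    """Test double: a callable clock that moves only when told to."""

    def __init__(self, start=0.0):
        self.now = start

    def __call__(self):
        return self.now

    def advance(self, seconds):
        self.now += seconds


class TtlCache:
    """Cache whose notion of time is injected rather than hard-coded."""

    def __init__(self, ttl, clock=time.monotonic):
        self._ttl = ttl
        self._clock = clock  # any zero-argument callable returning seconds
        self._items = {}

    def put(self, key, value):
        self._items[key] = (value, self._clock() + self._ttl)

    def get(self, key):
        entry = self._items.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            # Entry has expired: drop it and report a miss.
            del self._items[key]
            return None
        return value


three_days = 3 * 24 * 3600
clock = FakeClock()
cache = TtlCache(ttl=three_days, clock=clock)
cache.put("session", "abc")
before = cache.get("session")  # still fresh
clock.advance(three_days)      # three days pass instantly
after = cache.get("session")   # now expired
```

Production code passes nothing and gets the real monotonic clock; the test passes a `FakeClock` and reaches the "only happens every few days" branch without waiting.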
But you don't always have access to the code, so I think this tool is very useful.
The problem with creating your own time abstraction class is not only that you need full control of the code base and have to get everyone new to the project to always use this class. Some time-based operations can also be quite hard to mock, such as waiting on a thread synchronization primitive or waiting for I/O with a timeout. However, those are most often used in domains where you don't want to mock time because, as you say, wall time and processing time tend to depend on each other there, and you probably want to test everything exactly as it would run in production.
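A small illustration of the "hard to mock" part, as a sketch only. The `FakeClock` here is a hypothetical injected clock; `threading.Event.wait` is the standard library call. The wait takes its timeout from the real clock, so advancing the fake one does nothing to shorten it.

```python
import threading
import time


class FakeClock:
    """Hypothetical injected clock that application code reads from."""

    def __init__(self):
        self.now = 0.0

    def advance(self, seconds):
        self.now += seconds


fake = FakeClock()
event = threading.Event()

fake.advance(3600)  # the fake clock says an hour has passed

real_start = time.monotonic()
# Nobody sets the event, so this blocks until the real timeout expires.
was_set = event.wait(timeout=0.2)
real_elapsed = time.monotonic() - real_start

print(was_set)  # False
```

To bring this wait under test control you would have to wrap the primitive itself, not just the clock reads, which is the extra work the comment above is pointing at.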