libxev: A cross-platform, high-performance event loop

(github.com)

289 points | by tosh 12 days ago

9 comments

  • mitchellh 12 days ago
    This is my project.

    I use it as the core cross-platform event loop layer for my terminal (https://mitchellh.com/ghostty). I still consider libxev an early, unstable project, but the terminal has been in daily use by hundreds (now over a thousand) of beta testers for over a year, so at least for that use case it's very stable. :) I know of others using it in production shipped software, but use it at your own risk.

    As background, my terminal previously used libuv (the Node.js core event loop library), and I think libuv is a great project! I still have those Zig bindings available (archived) if anyone is interested: https://github.com/mitchellh/zig-libuv

    The main issue I had personally with libuv was that I was noticing performance jitter due to heap allocations. libxev's main design goal was to be allocation-free, and it is. The caller is responsible for allocating all the memory libxev needs (however it decides to do that!) and passing it to libxev. There were some additional things I wanted: more direct access to mach ports on macOS, io_uring on Linux (although I think libuv can use io_uring now), etc. But more carefully controlling memory allocation was the big one.
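
    The caller-owns-memory idea translates beyond Zig. Here is a rough Python sketch of the pattern (illustrative only — not libxev's actual API, and all names here are made up): the loop threads caller-provided completion objects through an intrusive list, so the loop itself never allocates per operation.

```python
# Sketch of the caller-owns-memory pattern: the caller allocates each
# Completion however it likes (pool, arena, stack in a systems language)
# and hands it to the loop, which only links pointers together.

class Completion:
    """Caller-allocated state for one pending operation."""
    __slots__ = ("callback", "next", "pending")

    def __init__(self, callback):
        self.callback = callback
        self.next = None        # intrusive link, owned by the loop
        self.pending = False

class Loop:
    def __init__(self):
        self.head = None        # singly linked list of completions

    def add(self, c):
        # The loop never calls an allocator here; it only threads the
        # caller's object onto its intrusive list.
        c.pending = True
        c.next = self.head
        self.head = c

    def run_once(self):
        # Pretend every pending completion is ready; fire callbacks.
        c, self.head = self.head, None
        while c is not None:
            nxt, c.next = c.next, None
            c.pending = False
            c.callback(c)
            c = nxt
```

    Because insertion pushes at the head, completions fire in LIFO order in this toy version; the point is only that every byte of per-operation state was allocated by the caller.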

    And it worked! Under heavy IO load in my terminal project, p90 performance roughly matched libuv but my p99 performance was much, much better. Like, 10x or more better. I don't have those numbers in front of me anymore to back that up and my terminal project hasn't built with libuv in a very long time. But I consider the project a success for my use case.

    You're probably better off using libuv (i.e. the Node loop, not my project) for your own project. But, the main takeaway I'd give people is: don't be afraid to reimplement this kind of stuff yourself. A purpose-built event loop isn't that complicated, and if your software isn't even cross-platform, it's really not complicated.

    • password4321 12 days ago
      > libxev's main design goal was to be allocation-free

      Maybe "allocation-free" should be in the GitHub project description instead of or in addition to "high performance".

      • rgrmrts 11 days ago
        It is

        > Zero runtime allocations. This helps make runtime performance more predictable and makes libxev well suited for embedded environments.

        • password4321 11 days ago
          To be clear, I am discussing the text under "About" in the top right, labeled as "Description" when edited, which currently states:

          > libxev is a cross-platform, high-performance event loop that provides abstractions for non-blocking IO, timers, events, and more and works on Linux (io_uring or epoll), macOS (kqueue), and Wasm + WASI. Available as both a Zig and C API.

          ... with no mention of zero-allocation though yes it is mentioned later as a feature in the README.

    • mattgreenrocks 12 days ago
      Very nice! TBH, libuv sometimes felt like it was popular because it's popular, rather than for sheer technical prowess. I was never comfortable with how much allocation it does, and I didn't always find its handling of platform primitives as useful as I'd like.

      > don't be afraid to reimplement this kind of stuff yourself. A purpose-built event loop isn't that complicated,

      Amen. There's no need to view the event loop as mysterious. It's just a while loop that is constantly coordinating IO.
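
      To make the "just a while loop" point concrete, here's a minimal sketch using Python's selectors module (the demo names like `on_readable` are made up; real loops add timers, signals, cross-thread wakeups, and so on):

```python
# A minimal event loop: block until some fd is ready, dispatch its
# callback, repeat. That's the whole trick.
import selectors
import socket

sel = selectors.DefaultSelector()

def run(until):
    # The event loop itself: a while loop coordinating IO.
    while not until():
        for key, _events in sel.select(timeout=1.0):
            key.data(key.fileobj)   # key.data holds the callback

# Demo: deliver one message across a socketpair.
a, b = socket.socketpair()
got = []

def on_readable(sock):
    got.append(sock.recv(64))

sel.register(b, selectors.EVENT_READ, on_readable)
a.sendall(b"hello")
run(until=lambda: bool(got))
```

      Everything else an event loop library provides (timers, completion-style APIs, thread pools) hangs off that one dispatch loop.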

    • samsquire 12 days ago
      Thank you for sharing.

      What do you think are the next steps for a next generation event loop?

      I've been experimenting with barriers/phasers, LMAX Disruptors and my own lock free algorithms.

      I think the next step is some form of multithreaded structured concurrency with coroutines and io_uring.

      I've been experimenting with making send and recv independently parallel with multiple io_urings ("split parallel IO") - so you can process incoming traffic separately from the stream that generates data to send. Generating sends isn't blocked by receive parsing, and vice versa.

      Interested in seastar and reactors.

      • password4321 12 days ago
        • nextaccountic 11 days ago
          Is dpdk still needed after io_uring? io_uring can also do zero-copy packet processing

          edit: there is this thesis https://liu.diva-portal.org/smash/get/diva2:1789103/FULLTEXT...

          On 5.1.5 Summary of Benchmarking Results (page 44)

          > Of the three different applications and frameworks, DPDK performs best in all aspects concerning throughput, packet loss, packet rate, and latency. The fastest throughput of DPDK was measured at about 25 Gbit/s and the highest packet rate was measured at about 9 million. The packet loss for DPDK stays under 10% most of the time, but for packet sizes 64 bytes and 128 bytes, and for transmission rates of 32% and over, the packet loss reaches a maximum of 60%. Latency stays at around 12 μs for all sizes and transmission rates under 32% and reaches a maximum latency of 1 ms for packets of size 1518 bytes with transmission rates of 64% and above.

          > Based on these results, it was determined that DPDK can optimally handle transmission rates up to around 64 bytes, above rate 64% performance increases are non-existent while packet loss and latency increase.

          > io_uring had a maximum throughput of 5.0 Gbit/s and was achieved at a transmission rate of 16% or higher when the packet size was 1518 bytes. The packet loss was significant, especially for transmission rates over 16%, and when packet size was below 1280 bytes. Generally, the packet loss decreased when packet sizes increased for all different transmission rates. The packet rate reached a maximum of approximately 460,000 packets per second. For higher transmission rates and for larger packet sizes, the packet rate decreased. This reached a minimum of around 40,000 packets per second for a transmission rate of 1%. The latency of io_uring is highest at size 1518 and transmission rate 100% with a latency of around 1.3 ms. For lower transmission rates under 64%, the latency decreases when packet size increase, reaching a minimum of around 20 to 30 μs.

          > The results of running io_uring at different transmission rates show that io_uring reaches its best performance on our system at around transmission rate 16%. Above rate 16% there are no improvements in performance and latency and packet loss increase.

          Ok, 25 Gbps vs 5 Gbps seems like a huge difference, especially since io_uring was seeing higher packet loss as well.

        • password4321 7 days ago
        • jmakov 11 days ago
          Thank you for that. Would be interesting to see benchmarks.
    • keepamovin 12 days ago
      > Three, I wanted an event loop library that could build to WebAssembly (both WASI and freestanding) and that didn't really fit well into the goals or API style of existing libraries without bringing in something super heavy like Emscripten.

      This is a cool motivation!

      Could you drop this into Node to make Nodeex? A kind of experimental allocation-free Node that somehow carves out the allocations into another layer (admittedly still within the Node C code)?

    • ajoseps 12 days ago
      I saw ghostty and thought, “isn’t that the terminal written by the guy who cofounded hashicorp?”. I really enjoy your ghostty blog posts and will be checking out libxev!
    • adonese 12 days ago
      (Off topic) but any chance you might include me in the ghostty private testers? (adonese@nil.sd)
  • Jarred 12 days ago
    We copied libxev's code for the timer heap implementation in Bun for setTimeout & setInterval, and it was a ~6x throughput improvement[0] on Linux compared to our previous implementation.
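
    The general technique behind a timer heap is a min-heap keyed on deadline (this sketch is illustrative and not the actual heap libxev or Bun use): the loop peeks at the root to know how long it may sleep, and pops everything whose deadline has passed.

```python
# Timer min-heap sketch: earliest deadline sits at the root, so the
# event loop can sleep exactly until the next timer is due.
import heapq

class Timers:
    def __init__(self):
        self._heap = []
        self._seq = 0           # tie-breaker so callbacks never compare

    def schedule(self, deadline, callback):
        heapq.heappush(self._heap, (deadline, self._seq, callback))
        self._seq += 1

    def next_deadline(self):
        # Upper bound for the loop's poll/epoll/kqueue timeout.
        return self._heap[0][0] if self._heap else None

    def fire_due(self, now):
        # Pop and run every timer whose deadline has passed.
        fired = 0
        while self._heap and self._heap[0][0] <= now:
            _, _, cb = heapq.heappop(self._heap)
            cb()
            fired += 1
        return fired
```

    Push and pop are O(log n), and setTimeout-heavy workloads spend most of their time in exactly these two operations, which is why the heap implementation shows up so directly in throughput numbers.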

    [0]: https://twitter.com/jarredsumner/status/1736741811039899871

    • hinkley 12 days ago
      Why do you suppose Node is still 50% faster on this benchmark? v8 trickery or something in the library differences?
      • Jarred 12 days ago
        The timer heap currently lives on a different thread instead of the main thread, which means each timer has to be allocated and scheduled separately. Scheduling things onto other threads is expensive. The reason it works this way isn't good; we will fix it but haven't prioritized it yet.
  • eqvinox 12 days ago
    As someone maintaining a project with its own event loop: don't do it in larger projects.

    The problem is that you'll start having dependencies on external libraries. And when those then need event loop integration, things get messy. We've introduced bugs before, caused by subtle differences in semantics. (Like: does write imply read? Are events disarmed while running? What about errors?)
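
    One of those subtle semantic differences can be made concrete: oneshot vs. persistent arming. The toy loop below (illustrative names, not any real library's API) disarms every event before running its callback; an integration written against a loop with persistent, level-triggered registration will silently stall on it after the first event.

```python
# Toy loop with an explicit contract: every event is disarmed before
# its callback runs, and the callback must re-arm to get more events.
import selectors
import socket

class OneshotLoop:
    def __init__(self):
        self.sel = selectors.DefaultSelector()

    def arm_read(self, sock, callback):
        self.sel.register(sock, selectors.EVENT_READ, callback)

    def run_once(self, timeout=1.0):
        for key, _events in self.sel.select(timeout):
            self.sel.unregister(key.fileobj)   # disarm first
            key.data(self, key.fileobj)        # callback may re-arm

# Demo: a callback that doesn't re-arm only ever sees the first write.
a, b = socket.socketpair()
seen = []

def on_read(loop, sock):
    seen.append(sock.recv(64))
    # deliberately NOT calling loop.arm_read(sock, on_read) again

loop = OneshotLoop()
loop.arm_read(b, on_read)
a.sendall(b"one")
loop.run_once()
a.sendall(b"two")
loop.run_once(timeout=0.1)   # nothing is armed, so nothing fires
```

    Neither contract is wrong; bugs come from a library assuming one and the host loop implementing the other, which is exactly the kind of mismatch a shim layer like libverto has to paper over.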

    If the lib and event loop are reasonably popular, someone else has probably integrated them before. Or the lib supports the event loop natively (or uses libverto). Either saves you some trouble.

    Also: please add libverto support for your event loop :) https://github.com/latchset/libverto

    • GoblinSlayer 12 days ago
      The interface looks like verto is a Linux-first design, like git. But what's the point? Just implement epoll like illumos did. It's also allocation-heavy and apparently can't support a Deno-style loop.
  • jauntywundrkind 12 days ago
    io_uring support is obviously great & excellent, fulfills the "high performance" part well. brought an immediate smile to my face.

    i was not expecting "Wasm + WASI" support at all. that's very cool. implementation is wasi_poll.zig (https://github.com/mitchellh/libxev/blob/main/src/backend/wa...). not to be unkind, but this makes me wonder very much if WASI is already missing the mark, if polling is the solution offered.

    gotta say, this is some very understandable clean code. further enhancing my sense that i really ought to be playing with zig.

  • hinkley 12 days ago
    I was going to say, "I wonder if Bun.js would/could use this" but it looks like Jarred Sumner has been cherry-picking bits of libxev for at least six months.
  • jpgvm 12 days ago
    Completion based cross-platform I/O? Sign me up.
  • dsp_person 12 days ago
    Can this be used to make something that feels like Qt's signals and slots?
    • jcelerier 11 days ago
      Many signal/slot implementations are done synchronously, without any event loop involved; the two are somewhat orthogonal. Even Qt calls slots synchronously most of the time with no event loop involved; queuing the call through the event loop is just an additional feature it offers.
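
      The distinction fits in a few lines (an illustrative sketch, not Qt's implementation; `Signal` and `drain` are made-up names): a direct connection calls the slot inside emit, while a queued connection just defers the call onto a queue the event loop drains later.

```python
# Toy signal/slot: direct connections run synchronously inside emit();
# queued connections are deferred onto an event queue.
from collections import deque

class Signal:
    def __init__(self):
        self._direct = []
        self._queued = []

    def connect(self, slot, queued=False, queue=None):
        if queued:
            self._queued.append((slot, queue))
        else:
            self._direct.append(slot)

    def emit(self, *args):
        for slot in self._direct:
            slot(*args)                  # synchronous, no loop involved
        for slot, queue in self._queued:
            queue.append((slot, args))   # deferred to the loop

def drain(queue):
    # One event-loop iteration: run everything that was queued.
    while queue:
        slot, args = queue.popleft()
        slot(*args)
```

      So an event loop only enters the picture for the queued case, e.g. when the slot must run on another thread's loop.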
  • gigatexal 12 days ago
    on MacOS are kqueue and libdispatch/grand central dispatch doing different things?
    • 0x457 12 days ago
      libdispatch/GCD is a task scheduler built on top of kqueue. It's meant for moving work off the main UI thread without having to think about how often you do it.
  • asveikau 12 days ago
    [flagged]
    • mitchellh 12 days ago
      Hi, this is my project.

      The README only states that I use kqueue on macOS; I don't claim it is specific to or originated on macOS. I've read the README over a few times and can't find where you'd get the feeling that it's a macOS-only thing. If I can edit it in any way to make that clearer let me know.

      libxev is not currently compatible with the BSDs because macOS's kqueue API differs just enough from theirs to be incompatible (i.e. I use mach ports a lot on macOS, but other parts of the syscall interface also vary slightly).

      • asveikau 12 days ago
        If you depend heavily on Mach ports I don't think "kqueue (macOS)" is an accurate description. That makes it sound like it has more of a chance to work on BSD than it does.
        • mitchellh 12 days ago
          It is an accurate description. The mach ports are waited on through kqueue, and I use kqueue for all other waiters with "standard" fds (i.e. files). But my usage of mach ports (even for a partial use case) makes it incompatible with BSD; even if I didn't use mach ports, the kqueue structures used by macOS are slightly different and incompatible anyway, and I don't claim BSD support anywhere.

          It's splitting hairs and being a bit pedantic, but you also reordered my descriptions: in the README I always say "macOS (kqueue)" and not the reverse which you incorrectly quoted. I think that makes a small but tangible difference.

          • asveikau 12 days ago
            I did misread and misquote that. But when a remark is parenthetical I guess I consider them equivalent. macOS and kqueue are not equivalent. Maybe "macOS (using kqueue and Mach ports)" would make it clearer?
            • cornstalks 12 days ago
              No one but you is saying macOS and kqueue are equivalent. OP’s phrasing is perfectly fine.
    • CharlesW 12 days ago
      Yep, macOS leverages a bunch of technologies and tools that originated in FreeBSD.