To me the valuable insight is "Fibers make asynchronous functions appear to be synchronous. Depending on what color glasses you are wearing, this is either a cool trick or a hidden gotcha. Over time, the consensus of most of the computing community has settled on the side of “hidden gotcha”."
Lots of people talk about "coloured functions", because async functions in e.g. C# really do have a different colour, one that propagates up the call stack. Raymond is hinting that this is actually a good thing because you can see the colour; fibers split functions in the same way, except you can't see the colours.
I don't get why he goes through all that trouble to estimate the stack instead of just calling VirtualQuery or QueryVirtualMemoryInformation to figure out where the guard page is. Seems a lot more reliable and doesn't require actually touching that memory.
Slight aside, but reading this - as with most any article on concurrency - just makes me think: "thank you Joe, Richard, Bogdan et al for Erlang and the BEAM".
One consistent, easy to grok, scalable, robust, widely-applicable approach to concurrency. I don't need to debate OS processes vs threads vs fibers vs async/await vs whatever every time I need things to run concurrently. I don't need to remember the pros, cons, and pitfalls of each.
There'll no doubt be those who argue that the BEAM is slow / actors have limitations wrt formal reasoning / they prefer the control of cooperative multi-tasking / ... And I'm not saying you're wrong.
But for me, at least, concurrency in Erlang just means no cognitive load in deciding which concurrency primitive(s) to use. There's just one, and it hasn't failed me yet.
> There'll no doubt be those who argue that the BEAM is slow / actors have limitations wrt formal reasoning / they prefer the control of cooperative multi-tasking / ... And I'm not saying you're wrong.
The "just rewrite the slow bit in C/C++" approach works well for Python. How easy is it to call out to C/C++ from Erlang/BEAM?
Pretty easy; it's a fairly standard approach for anything requiring heavy lifting that doesn't suit the BEAM. The relevant term is "NIF", for "Native Implemented Function".
LuaJIT (which incorporates the Coco patch to PUC Lua 5.1) makes use of Fibers on Windows [1], because "None of the other methods [for switching C stacks] work for Windows because OS specific code is required to switch exception handling contexts." This allows the following very nice extension to Lua [2]:
> The LuaJIT VM is fully resumable. This means you can yield from a coroutine even across contexts, where this would not [be] possible with the standard Lua 5.1 VM: e.g. you can yield across pcall() and xpcall(), across iterators and across metamethods.
This implies that if a Lua function calls a C function, that C function acts as if it's on the stack of the current Lua coroutine: it can yield (suspending itself and the coroutine) and later be resumed. Note that PUC Lua 5.2 added a different (more portable) mechanism for the C-to-Lua part of this: lua_yieldk, lua_callk, and lua_pcallk [3], which require the programmer to do all the hard work themselves. LuaJIT just does it by magic.
That PUC Lua and LuaJIT have suspendable C functions is something that IMO sets them apart as truly mature scripting language implementations; almost no other scripting languages can do this!
As far as I understand it, it's all the same concept; the difference is mainly where the scheduler lives. Raymond Chen also notes that, while an interesting thing to have in the OS in 1996, fibers aren't really used anymore. And unlike deeply kernel-tied things like I/O, there isn't much sense in using OS-provided support that exists on just one OS anyway; it's easier to include your own scheduler, which also avoids the kernel-call overhead.
They're really incredibly useful for writing emulators. You have to simulate 3-8 processors all running in parallel, but doing so with locks and mutexes tens of millions of times a second is excruciatingly slow and painful, so you have to do this in a single thread (unless you're talking about very modern designs that have lower expectations of cycle-based timings).
Cooperative threads like this let you completely avoid having to develop state machines for each cycle within state machines for each instruction, etc. They let you suspend a thread four levels into the call stack, and then immediately resume at that point once other emulated processors have caught up to it in time. That lets you do fun tricks like only synchronizing components when required, so it can in some instances end up not only far more elegant, but also much faster than state machines, when they're used well.
Seems like adopting async/await throughout would accomplish the same benefits (letting you co-operatively yield whenever you want) while maintaining the performance of the state machine (since that's what async/await is in a single-threaded context).
The key thing is that I need to be able to suspend 3-5 layers deep in the call stack. The instruction dispatcher calls into an instruction which calls into a bus memory read function which triggers a DMA transfer that then needs to switch to the video processor, and then I need to resume right there inside the DMA transfer function once the video processor has caught up in time. So the separate stack for each fiber/thread is essential.
Ruby has fibers, though until recently they weren't used for much besides lazy iteration. Recent efforts have built on fibers to provide concurrency with non-blocking I/O:
The main difference, at least from my limited view, is that D fibers have to explicitly yield, whereas as far as I know Project Loom builds the yielding into the JVM, triggered when I/O occurs. It looks like libraries have to intercept libc calls in D to achieve the same thing [0].
I also see that D provides a fiber scheduler, but it only schedules on a single thread, whereas Loom's virtual thread executor will use multiple threads.
https://devblogs.microsoft.com/oldnewthing/20200602-00/?p=10...
https://erlang.org/doc/tutorial/nif.html has some details.
• Do any interpreters/compilers make use of this functionality? I know D has fibers, and Java is planning on adding them.
• How does it compare against 'green threads'?
• Can't this kind of thing be done without making syscalls into the kernel?
[1] https://coco.luajit.org/portability.html [2] https://luajit.org/extensions.html [3] https://www.lua.org/manual/5.2/manual.html#4.7
I wrote a bit more about this and showed some examples here if anyone's interested: https://near.sh/articles/design/cooperative-threading
I also use them for my web server because I like them, but there are probably better ways of doing that.
[0]: https://docs.microsoft.com/en-us/sql/database-engine/configu...
https://github.com/socketry/async https://github.com/digital-fabric/polyphony
Disclaimer: I'm the author of Polyphony.
I imagine you've already seen [1] and the two other articles it links to.
[0] https://vibed.org/features#fibers
[1] https://tour.dlang.org/tour/en/multithreading/fibers
[0] https://github.com/DmitryOlshansky/photon#solution