"No dedicated hardware"; but for best results you'll need a wired ethernet connection, wired headphones, a good microphone and an audio interface, and on Windows an ASIO driver.
"For the lowest possible latency, FarPlay establishes peer-to-peer connections between users": Except when the peers are on different ISP networks and the packets are forced to travel to some distant node, introducing latency at every node along the way. In my experience with this issue using Jamulus, having a server in the cloud at a suitable location reduces latency. Minimizing "the distance travelled" isn't important if there are few nodes introducing latency. The speed of light along wires is orders of magnitude faster than the speed of sound through air.
"the faster your connection, the better your results will be"; Misleading; it's latency not bandwidth that's critical and a "fast" connection normally refers to bandwidth.
Agree on Jamulus. I re-joined my high school band during the pandemic, despite us all living in different cities. Jamulus jams were an important part of us getting back together, along with Zoom calls and passing recordings back and forth. Best thing that happened for me out of the pandemic.
Hope you're not planning to play with a backing track, ASIO4ALL notoriously does not play nice with multiple audio sources. It's almost like someone wanted to backport the horrors of ALSA to Windows because they missed how annoying it was having a single pair of inputs and outputs.
>ASIO4ALL notoriously does not play nice with multiple audio sources.
Yep, that's because ASIO4ALL uses WDM-KS, and since Vista, WDM-KS doesn't support multiple sources.
Actual ASIO drivers made by sound card manufacturers usually don't have this limitation, as long as you keep the same sample rate everywhere. But it can also vary depending on who made the driver and/or on whether the source apps are using ASIO or a mix of ASIO and WDM/WASAPI. Getting low latency audio to work nicely on Windows can be messy (compared to macOS, at least).
I seriously doubt it. I don't think it's possible for shared WASAPI to go below 20-30ms. How are you measuring it? Input, output or round-trip?
For easy RTL measurements, you can use this: https://oblique-audio.com/rtl-utility.php
Oof, I forgot what thread I was in, because I just meant the buffer size, not the round-trip latency, nor even the one-way latency to audio output. Not sure why I said "latency", as that's plainly wrong, especially when we're talking about capture in this case.
It's just that I'm more focused on soft synths, and I can get a clean signal out of 64-sample buffers. Granted, that's not what I'd use with any realistic processing (for instance, I use Reaper at 128spl@48k).
While I haven't measured end-to-end yet, I do hope it stays below 20ms. I'm working on a synth-powered rhythm game, and the whole reason I chose to stick with plain WASAPI was to avoid requiring users to install extra drivers and because of Windows 10's low latency stack, with its advertised 0ms capture and 1.3ms output overhead on top of application and driver buffers.
Update: I ran RTL on the budget 2012 desktop and got worse results than I expected: 18ms@128spl for shared-mode WASAPI. For some reason, I couldn't select smaller buffers. On the same hardware, exclusive-mode WASAPI managed 12.25ms@128spl, and ASIO4ALL managed 12.5ms@64spl and 15.1ms@128spl.
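For comparison, here's a rough sketch of the theoretical floor for those measurements, assuming the simplest possible model (one capture buffer plus one playback buffer, nothing else); real drivers add extra buffering on top, which is why the measured numbers above are higher:

```python
# Lower bound on round-trip latency for a double-buffered audio path:
# one buffer on the capture side plus one on the playback side.
# This deliberately ignores driver/OS buffering, so it's a floor, not a prediction.

def buffer_latency_ms(buffer_samples: int, sample_rate_hz: int) -> float:
    """Duration of one audio buffer in milliseconds."""
    return 1000.0 * buffer_samples / sample_rate_hz

def min_roundtrip_ms(buffer_samples: int, sample_rate_hz: int) -> float:
    """One capture buffer + one playback buffer."""
    return 2 * buffer_latency_ms(buffer_samples, sample_rate_hz)

print(f"{min_roundtrip_ms(128, 48000):.2f} ms")  # 5.33 ms, vs ~12-18 ms measured
print(f"{min_roundtrip_ms(64, 48000):.2f} ms")   # 2.67 ms
```

The gap between the ~5.3 ms floor and the measured 12-18 ms is the driver and OS overhead being discussed in this subthread.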
Thanks for testing. That 18ms result is actually much better than I expected for shared mode. It got me curious, so I tried it and was able to replicate it with my Realtek (I got 19ms). I'm still a bit skeptical about its real-world usefulness, because I've experienced some garbled/distorted audio with some low latency modes. I also can't find that mode in Reaper, which usually has everything. Still, it looks promising.
You might want to check out audio processing on Linux with a (soft) real-time kernel. The choice of plugins is limited, but it is reasonable to run a five-man band (including three guitar amp modelers and voice processing) at 2.8 ms (internal) round-trip latency (plus some ms for AD/DA) on a "somewhat beefy, but still just a laptop" laptop.
Actually, the RT_PREEMPT stuff gives you worst-case blips around the 100-300 microsecond mark, and if it's just audio with remotely tolerant handling of buffer under/overrun, you can ignore those and use the more normal latency ceiling around 20-50 microseconds.
Note: 192 kHz is 5.2 microseconds/sample, 48 kHz is 20.8 microseconds/sample. The 15 cm distance between the ears takes around 100 microseconds to traverse (at the higher speed-of-sound in the head, vs. free-air).
The 1m distance of air for close-by human 1:1 talking takes a full 3 milliseconds to traverse.
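Those figures are easy to verify; here's a quick check, taking the speed of sound in air as roughly 343 m/s (an assumption on my part, at about 20 °C):

```python
# Per-sample period at common sample rates, and how long sound
# takes to cover 1 m of air, matching the figures above.

def sample_period_us(rate_hz: int) -> float:
    """Duration of one sample in microseconds."""
    return 1e6 / rate_hz

print(f"{sample_period_us(192000):.1f} us/sample")  # 5.2 us/sample
print(f"{sample_period_us(48000):.1f} us/sample")   # 20.8 us/sample

SPEED_OF_SOUND_AIR = 343.0  # m/s, assumed value at ~20 C
one_meter_ms = 1.0 / SPEED_OF_SOUND_AIR * 1000
print(f"{one_meter_ms:.1f} ms")  # 2.9 ms, i.e. roughly the 3 ms quoted above
```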
There is a non-profit with hard-realtime applications (including CNC) that runs a few racks of systems with latency monitoring.
For example, the blue rack3slot0 line is a histogram for an almost-standard distribution kernel on an IvyBridge Xeon-E3 running a thread with timer interrupts every 200 microseconds for about 5.5 hours (100 million iterations, specifically), and recording the latency of that interrupt.
As one can see, there were only about 20 samples at or above 20 microseconds of delay, and even those were just barely over.
With remotely decent under/overrun hiding, 10 microsecond latency should be easily usable.
And yes, those systems had background load at normal priority and this realtime thread at high priority:
> Between 7 a.m. and 1 p.m. and between 7 p.m. and 1 a.m., a simulated application scenario is running using cyclictest at priority 99 with a cycle interval of 200 µs and a user program at normal priority that creates burst loads of memory, filesystem and network accesses. The particular cyclictest command is specified in every system's profile referenced above and on the next page. The load generator results in an average CPU load of 0.2 and a network bandwidth of about 8 Mb/s per system.
Not entirely sure what you are getting at, I am guessing "10 ms latency is good enough for audio"? Plus "normally we are so far away from the speaker that it does not really make a difference"?
That figure is thrown around a lot and is definitely grounded in some solid research ... just ... lower latency numbers (in jackd) "feel" better when playing guitar. There is a lot of subjectivity in the guitar playing world and I am definitely not immune to that.
So ... 2.8 ms round-trip time in jackd, plus 1-2 ms each for AD and DA conversion, plus 3 ms for the sound to travel from the speaker to my ear (plus any latency that the brain needs to process the sound). 2.8 + 2×(1-2) + 3 already gets us very close to 10 ms.
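Spelling that sum out (the numbers below are the estimates from this comment, not measurements):

```python
# Back-of-the-envelope total latency for the guitar chain described above.
jackd_roundtrip = 2.8          # ms, internal round-trip in jackd
adda_per_conversion = (1.0, 2.0)  # ms, estimated range for each of AD and DA
speaker_to_ear = 3.0           # ms, ~1 m of air

low = jackd_roundtrip + 2 * adda_per_conversion[0] + speaker_to_ear
high = jackd_roundtrip + 2 * adda_per_conversion[1] + speaker_to_ear
print(f"total: {low:.1f}-{high:.1f} ms")  # total: 7.8-9.8 ms
```

So even before the brain's own processing time, the chain sits at roughly 8-10 ms.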
No idea what I am getting at here, but I am on my third generation of modelling amps (cheap M-Audio BlackBox, POD X3 Live, now Guitarix) and while I never really had an issue with the latency ... I feel like I probably would if I went back to a previous generation.
There is a lower threshold beyond which you won't feel the difference anymore.
Also, replacing a speaker with headphones frees up enough latency budget to be spent on light-speed delay over 100+ km of distance, if the audio stack is optimized for deep-sub-millisecond delays using RT_PREEMPT.
Yes, this precludes USB2, but modern computers have quite decent on-board audio codecs (aka A/D + D/A engines) that end up connected to the southbridge and are accessed via PCIe. That has sub-microsecond latency between the digital side of the A/D + D/A converters and the CPU cache.
I guess I mostly just wanted to say "RT_PREEMPT reduces jitter enough to allow sub-millisecond AD->jackd(mixer only)->DA without much effort using modern onboard audio", and to show that this is truly little when expressed as sound-wave path length.
"the issue" refers to the audio latency of speakers, I assume? If not, please elaborate; my understanding of the matter in practical/human terms is fairly fuzzy due to most of this being very domain-specific knowledge I haven't been in the right places for.
Oh, I know USB-attached SCSI (the good USB3 storage protocol) has nice latency, because it actually exploits the dedicated TX/RX lanes. It just shoves command packets towards the drive and receives response packets when the drive has them ready.
However, USB still has comparatively severe driver overhead due to the MMIO-level protocols, to a similar (but IIRC worse) extent as AHCI (with NVMe being the better replacement).
My main platform is Linux :) On it I run an RME Multiface II; with it I can go down to 32-sample buffers and still do some useful stuff without even using an RT kernel (I can only go down to 64 on Windows with ASIO).
Recently I had a cool art project where we ran 48-channel ambisonic sound spatialization plus live video effects, all from a single Dell laptop sending audio through AES67 (so over Ethernet) and video to three 1080p outputs. Linux is incredible with the right hardware!
I love the idea of connecting musicians over the internet.
But in many music technology applications latency is critical and from a musician's standpoint the technically achievable latency is not suitable for actual music making apart from some exceptions - even with dedicated hardware.
When making music that depends on a beat, pocket or groove any roundtrip latency larger than 3ms is noticeable and >6ms is not playable.
If you play software synthesizers on your computer you probably will be aware of the issue since the sample rate and buffer settings of your soundcard alone introduce latency that ranges from playable (for me <4ms) to unacceptable.
Since network latency alone can easily get worse than that, I don't see technology like this being usable for playing serious music that focuses on rhythm.
Note that the demo they use is piano and singer which in combination with the chosen song is decently forgiving latency-wise.
I'd like to hear a demo with a groove where they switch between each participant to hear each musician's version.
As it happens, I used to play the organ, and yes, latency can be a real issue, especially in orchestras.
Orchestras are notoriously behind conductors to the point that conductors conduct ahead of the time they want the beat to be.
Drummers in orchestras need to guess (and have the experience) at which point after the conductor's beat they hit in order to be in sync with most of the other musicians. That's a tough spot to be in latency-wise :)
Apart from that organ music tends not to have a strong focus on rhythm and groove, in which case latency is less of an issue.
When Discord first launched and my usual TeamSpeak friends moved over there, I was super annoyed by the extra latency. How a group of fairly serious gamers, who would complain about far less lag in any other circumstance, shrugged it off, I'm not sure.
More recently, I was surprised by how low the latency was between two Asterisk servers in the same city on different ISPs. It was a very ad-hoc setup with a cheap EOL Cisco IP phone on either end, each connected to an Asterisk SIP server on the local network. I'm so used to laggy voice chat that it actually caught me off guard how nice it was once I got it to work, despite it being a total nightmare to configure.
I struggle enough in person to find the right time to talk without interrupting, and >100ms of Discord latency makes it that much worse for me. I hope something peer-to-peer like this catches on for remote teams. I could really benefit from it. I don't think I would mind if a screen share lagged behind the speaker's voice. I'm sure Zoom already does that though.
Networks just suck, period. IMO, in WebRTC (which powers stuff like Zoom or Teams), there's nothing in particular that adds latency: connections can be peer-to-peer and it uses UDP under the hood, with latency-optimized codecs. From that point, there's little left on the table that can't be traced back to poor network quality.
Well, there's actually a standard for low-latency LAN audio that uses PTP (Precision Time Protocol) for clock synchronization together with frame prioritization. Given a switch that supports it, including some bog-standard Netgear models, you can use what is called AVB, or Audio Video Bridging. You can use an AVB driver directly on each system, and audio being processed on one system can play back over another's audio hardware!
There's also a layer-three standard called AES67 from the Audio Engineering Society, and this competes directly with systems like Dante.
I have no idea how these perform over the broader general internet, as I would think that each hop alters the latency landscape significantly.
Not the parent post, but I've been thinking about unifying all devices in my home by sending all audio streams into a single, unified "sound server" from which I can easily switch between TV audio, Bluetooth, headphones, etc.
The idea is to be able to take a call on my phone and receive notifications on my laptop while developing on my PC. It'd also be nice for streaming stuff in general.
As many people who have also looked into this stuff probably already know, interactive audio is really sensitive to latency. I've given it a quick shot by running pulse over the network, but honestly I wasn't impressed. I've given up for now, but some suggested software here might be good enough for a second shot, based on pipewire and its JACK backend this time.
That sounds too centralized - I like the idea of some sort of PulseAudio network service but with better latency. I'd like to see PulseAudio with a better cross-platform UI and a set of decentralized services, versus introducing a single point of failure like you are proposing. Maybe a solution with DNS-SD with STUN support and a slick UI for desktop/mobile?
Did you try a local Mumble server? I haven't played with it for a long time, but last time it was much faster than TeamSpeak, and we used it happily at home sitting next to each other. That, and get better headphones that block out sound from the environment :-)
I've been looking for a simple way to be able to listen to audio from my desktop and laptop simultaneously without requiring a hardware mixer or something. I'm currently using Scream but there is a bit of latency/delay.
FarPlay seems to be more for audio recording, but would FarPlay be a good solution for my use case?
This isn't a use case that we've considered, but FarPlay would work well for this. If you want to play a file on one computer and have it play at the same time on the other, you'll have to play the file into FarPlay by using something like BlackHole, Jack or LoopBack, depending on your platform.
What's the point of this? Stuff like Stadia already works on top of WebRTC and can have sub-~100ms latency, where most of it is in the network, which no amount of driver trickery can do anything about. Add to this the fact that humans are generally more tolerant of audio lag than video lag, and I'm not sure, if you don't have video or real-time input, how you'd even detect such low latencies. If it's for musicians, as depicted on the page, you generally need to connect more than 2 of them.
It's different when a game is making a sound and you just hear it a bit later than you would if it were local. This case is for when you need a local sound to occur at the same time as a sound sent over the network. Latency is very noticeable.
big win here is that through self-hosting one can pick the optimal place, i.e. an ISP "in the middle" or otherwise local to the participants. just stay away from places that host a lot of content (OVH, Hetzner, big cloud)
Everything is relative and everything is a tradeoff, so I think what they are trying to say with "ultra-low latency" is that they have tried to minimise possible sources of latency and made all possible trade-offs in favour of lower latency.
A streaming music service might, for instance, buffer audio (which incurs latency) to gain reliability, which FarPlay probably doesn't do.
I'm looking right now at a mtr (traceroute) screen from a rural fibre-to-the-premises broadband connection in Lancashire, north-west UK, achieving 2.5ms RTT to Manchester (the nearest major city, ~60km) and 8.5ms RTT to London (~325km). Things are improving.
That's only ~10x and ~8x the light travel time, assuming straight lines which is best case. ~8.3x and ~5.3x respectively if you consider that light is much slower in fibre than air.
Not all clients have an available P2P route between them; those need a dedicated TURN server to forward the traffic, so there are some ongoing compute costs. The developers also put in the upfront effort of writing the code, and there is an ongoing maintenance burden. And they're looking for a way to turn a profit on their work.
Generating those sessions to punch through NAT will need some sort of backend service, but I don't think it's subscription worthy.
I would be happy to pay for this if it were a one-off purchase once the beta is over, and if FarPlay needs some dev money for upcoming features or functionality, then I would be happy to pay for the updates as well, instead of a subscription.
Stateful firewalls let UDP traffic out. And they generally let "replies" come back from the destination IP & port to the source IP and port on the host. That means that you just need a central server to coordinate the session ID and the IP & port of each side. Then the two peers can send each other data directly and the firewall will let it through because it's a "reply" to the data that was sent.
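A toy model of that "reply" logic (the addresses below are made up for illustration; real conntrack also tracks timeouts and protocol state, which this sketch omits):

```python
# Simplified model of a stateful firewall's UDP handling: an outbound packet
# creates a conntrack entry, and an inbound packet is admitted only if it
# reverses an existing entry. UDP hole punching works because BOTH peers
# send first, so each side's firewall sees the other's traffic as a "reply".

class StatefulFirewall:
    def __init__(self):
        # Entries: (local_ip, local_port, remote_ip, remote_port)
        self.conntrack = set()

    def outbound(self, src, sport, dst, dport):
        """Record an outbound flow, as a stateful firewall would."""
        self.conntrack.add((src, sport, dst, dport))

    def inbound_allowed(self, dst, dport, src, sport):
        """Inbound is allowed iff it reverses an entry created on the way out."""
        return (dst, dport, src, sport) in self.conntrack

fw_a = StatefulFirewall()
# Peer A (1.2.3.4:5000) learns peer B's address (5.6.7.8:6000) from the
# coordination server and sends a packet first...
fw_a.outbound("1.2.3.4", 5000, "5.6.7.8", 6000)
# ...so when B's packet arrives, A's firewall treats it as a reply:
print(fw_a.inbound_allowed("1.2.3.4", 5000, "5.6.7.8", 6000))  # True
# Unsolicited traffic from any other address is still dropped:
print(fw_a.inbound_allowed("1.2.3.4", 5000, "9.9.9.9", 7000))  # False
```

The coordination server's only job is exchanging those IP/port pairs; once both peers have sent their first packet, the data flows directly between them.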
I think they are referring to opening ports on your router as opposed to the automatic opening of incoming ports when connections are opened from the client machine. They probably use the STUN protocol to facilitate connecting two machines together peer-to-peer like what is done with WebRTC.
Hi everyone, Dan Tepfer here, co-creator of FarPlay. As a skeptic myself, I appreciate the skeptical tone of this thread. And I'd like to address a few points made here about FarPlay and low-latency audio.
First, a little about me: I've been coding for most of my life (see https://www.youtube.com/watch?v=SaadsrHBygc for my NPR Tiny Desk Concert of my improvised algorithmic music project Natural Machines) but I'm first and foremost a musician. I make my living playing concerts around the world as a pianist. During the pandemic, I used JackTrip to perform remote livestream concerts with some of the greatest musicians in jazz: Christian McBride, Cécile McLorin Salvant, Ben Wendel, Gilad Hekselman, Fred Hersch, Antonio Sanchez, Melissa Aldana, Miguel Zenon, Linda May Han Oh and others. This is just to say that music, and particularly rhythm, is very important to me, and that I care about low-latency audio as an active practitioner.
Someone in this thread wrote that for rhythmic (groove) music, latencies of 3ms are noticeable, and latencies higher than 6ms are prohibitive. This isn't the case. Sound travels in air at about 1ft per ms, so a latency of 3ms is equivalent to playing with someone 3ft away from you, which is obviously unnoticeable. 6ms is equivalent to playing with someone 6ft away, which is also unnoticeable. James Brown grooved his ass off with his band spread out over a relatively wide area on stage, long before in-ear monitors, which confirms what the research says: even for advanced professional musicians, latencies up to 20ms (equivalent to 20ft in air) are not significantly noticeable even for intricately rhythmic music. Here's an excerpt over JackTrip with Christian McBride, where at the end, we play a demanding bebop head in unison, a very tough test of latency: https://www.facebook.com/watch/?v=1076063889493342. Above 20ms, things do get noticeable, but depending on the type of music you're playing, it's possible to adjust. It starts to feel like the people you're playing with are, as we say in the jazz world, "laying back on the beat". For example, I did a livestream performance for the French Institute in NYC last January with pianist Thomas Enhco in Paris and myself in Brooklyn, 3500 miles away, and despite a clearly noticeable (to us) ~40ms of latency, we were able to make real music together, including rhythmic music. Note that at the time, using JackTrip, I couldn't accurately estimate the actual latency, and this 40ms figure is a guess. FarPlay, in contrast, measures the current latency and displays it on the connection screen.
Someone mentioned Jamulus. I've tried Jamulus, and for my professional needs, which include rock-solid stability and the lowest possible latency, JackTrip is far superior. But JackTrip, as someone else pointed out here, is impossible to use for the average user. It requires opening ports on your router, interacting with the command line, and installing and using Jack, which itself is forbidding for most users. Our goal with FarPlay was to take the best elements of JackTrip, unbeatable stability and latency, and make them easily accessible.
SonoBus is also mentioned in this thread. SonoBus is an excellent project which we only came across a few months ago. We've tried it, and we've found that if you measure the actual sound-to-sound latency, i.e. the time from sound production at the source to sound reproduction at the destination, FarPlay achieves lower latencies than SonoBus, probably because of the way it processes audio internally. Also, we believe our interface, which we've put a lot of thought into, is easier to use for non-technical musicians than SonoBus. Another advantage of FarPlay over SonoBus, this one particularly important to me as a live performer, is Broadcast Output, which is an essential feature of JackTrip that FarPlay co-creator Anton Runov and I invented (see https://farplay.io/about#history). To play in low latency, it's often necessary for the musicians to tolerate artifacts in the audio, since some audio packets inevitably get delayed on their way. Broadcast Output allows you to play in low latency with artifacts in your headphones, while simultaneously outputting artifact-free audio for live broadcast or recording. To me, this is the holy grail of remote performance, allowing us to have our cake and eat it too — ultra-low-latency interaction with no sacrifice in final audio quality (see https://farplay.io/tipsandtricks#broadcastoutput). I should mention that FarPlay only allows one-to-one connections at the moment, while SonoBus allows multi-user sessions. We plan to add multi-user sessions to the FarPlay user interface soon; our underlying processes already allow them.
Some of you are nitpicking our claim to not require third-party software. Remember, we're coming from JackTrip, which requires users to install and use Jack in addition to JackTrip. On Mac and Linux, there is no third-party software whatsoever required. On Windows, low-latency audio is currently impossible without ASIO drivers. Many musicians on Windows have audio interfaces with ASIO drivers already installed, so in their case there are no additional downloads required. If you don't have an ASIO driver, you'll have to use ASIO4ALL, but this is true for any software doing low-latency audio on Windows. In essence, FarPlay is as self-contained as it can be at this stage.
Someone asked if FarPlay will connect two users on the same LAN. The answer is yes, it works great.
Someone else brought up the advantages of low-latency audio not only for music, but also for regular conversations. We wholeheartedly agree: conversations feel vastly more natural without the awkward delay added by Zoom, FaceTime, WhatsApp and regular phone calls. FarPlay also transmits uncompressed audio, so the quality is as good as your mic and sound card can provide, which also helps conversations feel more real.
In conclusion, I want to thank you for bringing your attention to FarPlay, and if you enjoy playing music with other people, we'd love for you to try it! It's really quite magical, I feel, and the magic hasn't worn off for me even after having done it regularly for over a year. We've tried to make the process of using FarPlay as frictionless as possible: you don't even need to register for an account to use it, just download it (https://farplay.io/download) and go.
Thanks and Happy Thanksgiving to those of you who celebrate,