Scaling WebRTC with SFU

Huddle01 is building the next generation of communication systems by taking a decentralized approach to improving the current WebRTC architecture. If you are wondering what WebRTC is and why it needs to be efficient, you can read about it here and here!

If you are lazy just like me and don’t want to open links to read another two blogs, here is the TLDR for you 😉

WebRTC is an open-source technology that enables real-time communication (RTC) over the internet. It is mostly used for video conferencing, but audio-only applications can use it too.
WebRTC is a P2P technology, which basically means everyone is connected to everyone. That means a lot of connections! I mean a lot!
This limits how well WebRTC scales (i.e. having a lot of friends in the same video call), so alternative architectures like SFU and MCU are used. We are going to focus more on SFU here.
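
To put a number on "a lot", here is a minimal sketch that counts connections in a full P2P mesh versus a call routed through an SFU. The function names are made up for illustration, and the SFU count assumes each peer keeps exactly one connection to the server.

```python
def mesh_connections(peers: int) -> int:
    """Connections in a full mesh: one for every pair of peers."""
    return peers * (peers - 1) // 2


def sfu_connections(peers: int) -> int:
    """Assumption: with an SFU, each peer holds one connection to the server."""
    return peers


# Compare both setups as the call grows.
for n in (2, 5, 20):
    print(f"{n} peers: mesh={mesh_connections(n)}, sfu={sfu_connections(n)}")
```

The mesh count grows with the square of the number of peers, while the SFU count grows one connection at a time.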

Like always, take a quick look at what you’ll find here so that if you know something, you can just move ahead!

SFU? What’s that 🤔

That’s way too technical, let’s take an easy route! 😉

Cool, is it the best solution? 😎

Why is Huddle01 talking about this? 👀

SFU? What’s that 🤔

SFU stands for Selective Forwarding Unit and is one of the most popular solutions when it comes to scaling WebRTC. The easiest definition of SFU I found after spending a ton of time on Google is:

“In the SFU architecture, every participant sends his or her media stream to a centralized server (SFU) and receives streams from all other participants via the same central server. The architecture allows the call participant to send multiple media streams to the SFU, where the SFU may decide which of the media streams should be forwarded to the other call participants” by Call stats

In easy language, an SFU chooses which streams you should receive and which you shouldn’t, while you keep just a single uplink to the server.

Before getting further into how it works, let’s first figure out what uplinks and downlinks are. I got you, don’t worry!

Uplink is the path for audio-video information that you send to your friend in a video call ⬆

Downlink is the path for audio-video information you receive from your friend ⬇

Single uplink:

SFU enables a single uplink. In P2P, you need one uplink for every other peer in the call (N-1 uplinks in a call with N people). With an SFU, that drops to just 1, because you only send your stream to the server. A single outgoing channel is also much easier to maintain.
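
Here is a rough sketch of what that means for your upload bandwidth. The 1.5 Mbps per-stream figure and the function name are assumptions picked for illustration, not measured values.

```python
def uplink_mbps(peers: int, architecture: str, stream_mbps: float = 1.5) -> float:
    """Estimated upload bandwidth one peer needs, in Mbps.

    stream_mbps is an assumed bitrate for a single outgoing video stream.
    """
    if architecture == "mesh":
        # P2P: one copy of your stream goes to every other peer.
        return (peers - 1) * stream_mbps
    if architecture == "sfu":
        # SFU: one copy goes to the server, whatever the call size.
        return stream_mbps
    raise ValueError(f"unknown architecture: {architecture}")


print(uplink_mbps(5, "mesh"))  # four outgoing copies
print(uplink_mbps(5, "sfu"))   # one outgoing copy
```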

Selective downlink:

SFU acts like a filter for your incoming downlinks. If there are 20 people on a call and 19 are on mute, there is no need for you to have downlinks from everyone. Instead, the SFU forwards you the audio of just the one person who is speaking. The same selective forwarding applies to video, and that’s how an SFU works for audio and video communication.
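
The 20-person example can be sketched in a few lines. This is only a toy model of the forwarding decision, assuming the SFU knows who is muted; the `Participant` class and `audio_to_forward` function are hypothetical names, not part of any real SFU's API.

```python
from dataclasses import dataclass


@dataclass
class Participant:
    name: str
    muted: bool


def audio_to_forward(participants: list[Participant], receiver: str) -> list[str]:
    """Names of the participants whose audio gets forwarded to `receiver`.

    Muted participants are skipped, and nobody receives their own audio.
    """
    return [p.name for p in participants if p.name != receiver and not p.muted]


# 20 people in the call, only "peer-0" is unmuted.
call = [Participant(f"peer-{i}", muted=(i != 0)) for i in range(20)]
print(audio_to_forward(call, receiver="peer-7"))
```

A real SFU usually decides based on things like active-speaker detection and what the receiver has subscribed to, but the idea is the same: forward only what is worth receiving.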

I know, you might be wondering, what are you talking about? I can’t understand!

That’s way too technical, let’s take an easy route 😉

Let us try to break down SFU with an example. Have you ever gone to a barbecue party with your friends where everyone brings different food & ingredients and then you make a variety of dishes from them❓

For the sake of example, let’s say you bring chicken wings, your friend brings barbecue sauce, your crush brings some veggies, etc. Now what you do is process the raw food with everyone but in the end, only pick what you’d like to eat 😋

If you are a vegetarian, you’d only pick veggies and some lemonade. If you are a hardcore chicken lover, you’d opt for those saucy chicken wings with some beer 🍻

Wouldn’t the party be better if someone just brought you your choice instead of you talking to everyone and then picking food?

That helper, that filter, is SFU!

Rather than talking to everyone and trying everything, you just talk to one person about what you’d have, and then they help you pick food!

It brings you your preference and you can then zone out the stuff you don’t like instead of having a piece of everything that is made!

Cool, is it the best solution 😎

Of course not!

It is one of the most adopted solutions, but whether it is the right one depends on your use case!

Fun fact: There is another type of SFU, called Simulcast SFU, that takes SFU to the next level. Each sender publishes the same video at several quality levels, and each client receives the highest-quality stream that its local network bandwidth can support. This way, one participant on a slow connection no longer drags the quality down for everyone else.
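
Here is a minimal sketch of how a Simulcast SFU might pick a quality level for each receiver. The layer names and bitrates are assumed values for illustration, and `pick_layer` is a hypothetical helper rather than a function from any real library.

```python
# Assumed simulcast layers as (name, required kbps), sorted from best to worst.
SIMULCAST_LAYERS = [("high", 1500), ("medium", 500), ("low", 150)]


def pick_layer(available_kbps: int) -> str:
    """Return the best layer that fits within the receiver's bandwidth.

    Falls back to the lowest layer when nothing fits, so the receiver
    still gets some video.
    """
    for name, required_kbps in SIMULCAST_LAYERS:
        if required_kbps <= available_kbps:
            return name
    return SIMULCAST_LAYERS[-1][0]


for bandwidth in (2000, 600, 100):
    print(bandwidth, "kbps ->", pick_layer(bandwidth))
```

Each receiver gets its own answer, so a friend on fibre can watch the "high" layer while a friend on mobile data stays on "low".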

MCU (Multipoint Control Unit) is also an alternative solution that goes even further and reduces the total number of downlinks to 1. This means all the processing and filtering, including mixing everyone’s streams into one, is done on the server side.

The trade-off is that as more people join the call, the load on the MCU server increases, because all of that processing happens on the server.
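
To see where that server load comes from, here is a rough sketch comparing the work an MCU and an SFU do per call. It assumes the MCU decodes every incoming stream and encodes one shared mix for everyone, and that the SFU forwards every stream to every other peer without touching the media. Real servers vary, so treat the numbers as a simplification.

```python
def server_work(peers: int, architecture: str) -> dict[str, int]:
    """Simplified count of the media work a server does for one call."""
    if architecture == "mcu":
        # Assumption: decode each incoming stream, encode a single shared mix,
        # then send that mix to every peer.
        return {"decodes": peers, "encodes": 1, "outgoing_streams": peers}
    if architecture == "sfu":
        # No decoding or encoding: packets are forwarded as they arrive.
        # Worst case, every peer receives every other peer's stream.
        return {"decodes": 0, "encodes": 0, "outgoing_streams": peers * (peers - 1)}
    raise ValueError(f"unknown architecture: {architecture}")


print(server_work(10, "mcu"))
print(server_work(10, "sfu"))
```

The MCU pays in CPU (decoding and encoding are expensive), while the SFU pays in outgoing bandwidth, which selective forwarding then helps to cut down.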

SFU has its own set of benefits, like:

The incoming connection is made to the media server rather than to each participant.
The client does not require a wide outgoing channel because there is only one outgoing stream.
SFU architecture requires fewer server resources than mixing-based architectures such as MCU, because the server forwards streams instead of decoding and re-encoding them.

Each solution caters to a different problem statement. If your application doesn’t require you to optimize for more than 5 peers in a video call, then you can just use plain P2P WebRTC and dust off the project in style!

But here is a small comparison chart for you.

Why is Huddle01 talking about this 👀

There are a bunch of different solutions that can be applied to scale WebRTC. But we think SFU is one of the best, as it is cost-efficient. Also, read this very closely, here is a small alpha: we are building something in which SFU algorithms will play a vital role 😉

I won’t say how and when, but a form of SFU is always easy to adopt.

As we mentioned before, we are building the future of how communications will take place over the internet, so it makes sense that we may be coming up with an innovation. That’s it from our side!

If you have any questions, suggestions, or team-ups in mind, reach out on Twitter or land on our Discord! See you at the next one 👋
