WebRTC broadcasting with WHIP
Formal Metadata

Title: WebRTC broadcasting with WHIP
Title of Series: FOSDEM 2022
Number of Parts: 287
License: CC Attribution 2.0 Belgium: You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Identifiers: 10.5446/57070 (DOI)
Part: 264 / 287
Transcript: English (auto-generated)
00:10
Hi, everyone, and thanks for tuning in for my presentation today. Unfortunately, just as last year, we cannot meet in person, which is why I'm making this presentation from home rather than enjoying a nice gaufre in Brussels with all of you.
00:22
But hopefully, you'll be able to enjoy my presentation nevertheless. And specifically, today, I'll be talking about broadcasting as a scenario and why and in which cases WebRTC may actually make a lot of sense for the purpose. And I'll also talk a bit about some of the standardization efforts that are actually taking place in that direction,
00:41
more specifically with respect to the WHIP protocol, which at the moment focuses more on the ingestion process. But before I move forward, let me just spend a few words about myself. So I got my PhD at the University of Naples a few years ago, and I'm actually one of the co-founders and currently the chairman of a company based in Naples as well, which is called Meetecho.
01:01
And some of you may know me as the main author of the Janus WebRTC server, which I have talked about often in past editions of FOSDEM. And here you can also find a few links about how to get in touch with me or access some of the presentations I made in the past, or just check what my hobbies are. And in order to understand broadcasting as a scenario in the first place,
01:24
it may make sense to talk a bit about the most common topologies you usually find out there when it comes to WebRTC. So how people typically use WebRTC servers for the purpose of doing some interesting scenarios. And, of course, two come to mind the most.
01:42
One of those is the MCU topology, where you basically have a focus server that basically acts as a central point for all the involved participants. And then all the media the participants send to the server are mixed, and this mix is then sent back to all of the participants, which has some advantages and disadvantages
02:01
that I will not focus much here on, but it's definitely one of the topologies that was very common in the past. Nowadays, it's more common to see SFU as a topology when it comes to WebRTC instead, where typically the WebRTC server doesn't perform any sort of mixing or transcoding of the streams, but just relays them as they are,
02:21
which is why in conferencing, for instance, it's a very common topology because it means that each participant just sends their own contribution and interested participants can then subscribe to each other in order to engage in a conversation with each other. And SFU as a topology in particular is actually quite flexible because it allows you to do a wide range of different applications
02:44
because, of course, it's heavily used in conferencing, but it's also used when you want some sort of limited interaction, for instance a webinar where just a few speakers are talking and a lot of people are receiving. Or, focusing on the streaming part alone, you can also just do a one-to-many kind of approach,
03:01
which is basically a slight variation on the SFU theme, if you want, where you basically have a single participant contributing a media stream and a pool of users that are actually just interested in subscribing to that single stream instead, which, if you want, can be seen as a very simplified version
03:20
of broadcasting in the first place, where broadcasting typically is a single stream being ingested and a potentially very large audience subscribing to it. And when we look at how traditional broadcasting usually works, you typically find an architecture that looks most of the times like this. So you have a media source that basically contributes their stream
03:42
via a protocol that is usually RTMP to a streaming server. This streaming server is then passing this media to a CDN somehow, and this CDN then distributes the stream to a very wide audience using a different set of protocols that may be HLS,
04:03
maybe DASH, and maybe others. And this typically works and is also quite efficient most of the times, and it's also an approach that has been around for a very long time. But it does suffer a bit from issues like latency and stuff like this, which is why whenever you're actually interested in, let's say, more interactivity,
04:22
it may be a bit more problematic to use a solution like this, because, for instance, you may want to do some sort of a betting application. In that case, if you're actually doing betting on a live stream, you want the live stream to be as close to real-time as possible. Otherwise, the results may be flawed. You may actually be doing some sort of a broadcast of a concert, let's say,
04:45
and in this case, for instance, we see a concert being broadcasted where the audience is also remote and everybody's contributing their video from a remote location as well. And in that case, of course, if you assume that both the broadcast and the ingestion are working pretty much the same way,
05:02
you definitely want to avoid issues like a desync between the audience and the performance they are attending, especially if they are not actually in sync with each other as far as timing is concerned. And another interesting scenario may be another one that seems to be quite popular recently,
05:22
like live quiz puzzle applications that are broadcast live, where a lot of people can participate at the same time. In applications like these, you have a live stream broadcasting the puzzles, asking the questions, and so on, and you have to answer as soon as possible. And of course, again, you want to make sure that everybody is on the same page
05:43
and that the stream is as real-time as possible because otherwise the results may be skewed and not really reliable. And these are just, let's say, a few simple examples that already do highlight why having, let's say, less latency might be a very interesting aspect
06:01
in a broadcasting application, which is why this raises the question why not use WebRTC for this instead? Because, as we've said, traditional broadcasting is very efficient and works quite well, especially considering how widespread it is, but at best it can definitely never go beyond a few seconds of delay in the broadcasting,
06:21
which is fine in some scenarios and not in others, as we've just seen. And besides, traditional broadcasting also typically suffers from another side effect that is that typically different users may experience different delays, and this is due to the buffering that takes place in these kinds of broadcasting applications. Different users may be buffering in different ways,
06:41
and so may be actually experiencing the live event in different ways, which can become quite problematic in some use cases, which is why WebRTC may actually be a good idea here, most of all because, as you know, WebRTC was natively conceived for very low latency in the first place, because it was born for conversational audio, video, and data at the very beginning,
07:03
even though, as years passed, it has been used for much more than that and in several different use cases and scenarios, and often it is actually used for monodirectional streaming as well, whether it is just for webinars, for streaming, or even for broadcasting purposes already. And the weird thing is that, for some reason,
07:23
the broadcasting industry seemed to shy away a bit from WebRTC for different reasons, and WebRTC as a broadcasting technology was actually the main topic of my PhD a few years ago, and I'll actually address this in a few slides, but an important aspect here is that, so far, the industry seemed to, let's say,
07:45
take a bit of distance from WebRTC for the wrong reasons, and actually you can read an interesting blog post by Dr. Alex over there that basically cleared up a bit of the FUD that was going around, with the industry focusing on the wrong aspects and actually not being that knowledgeable about WebRTC as a technology itself,
08:03
so possibly seeing obstacles where there weren't actually any. One important aspect, though, is that, for sure, one of the most important aspects to actually foster WebRTC adoption in the broadcasting scenario is actually making ingestion as easy as possible,
08:22
mostly because whenever you think about broadcasting to a platform like Twitch, for example, you always think of just downloading an application like OBS or XSplit or others, just inserting an address, and then you just start broadcasting and streaming, and it is distributed to a very wide audience quite easily,
08:42
so the ingestion part is very easy for a potential streamer. They just download an application, insert a URL, and that works. With WebRTC, it typically isn't that easy, so that was definitely one of the first aspects to focus on, and I'll explain why in a few minutes. But before we go there, let's focus a bit on the codecs first,
09:03
because, of course, if we want to implement a broadcasting scenario in the first place, we need to make sure that the codecs that are involved are actually up to the task, and luckily that is indeed true for WebRTC, especially considering the advancements that WebRTC has seen across the years.
09:23
First of all, for audio, WebRTC mandates Opus, which is definitely a good thing, because Opus was very much designed from the very beginning as an Internet codec, so it was very much designed to be very flexible in terms of sampling rates, bit rates, and very dynamic in that sense.
09:41
It also supports innovative features like in-band forward error correction and these sorts of things. And it's also interesting that it has different profiles depending on whether we are actually using it for voice, for instance for VoIP or conversational audio, or whether we are using it for different purposes, like encoding music or other sources of audio that are not, strictly speaking, voice-related,
10:04
which means that the encoding and the decoding may vary in that case, all in a dynamic fashion. So the Opus specification does take that into account. And it's also interesting to note that Opus can be encoded in either mono or stereo,
10:21
which already gives a wide range of options, again with different sampling rates and bit rates. But also, and this is a bit of a lesser-known feature, it does support, with some tweaks, some kind of surround audio instead. And specifically, in Chrome, there is an experimental way to actually enable surround audio using 5.1 and 7.1 channels instead,
10:44
which is basically performed by using OGG as an encapsulation format for the RTP packets and using SDP in a custom way to actually negotiate these. And it does indeed work. So again, it is not that widely known because it's a bit of a hidden feature and not really documented,
11:02
but it's actually used today, for instance, by Google for Stadia, which means it's definitely a very good option to have available and something that might be interesting in several use cases. And I mentioned the broadcasting of concerts before, for instance, which might be a good example of that.
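As a side note on the more standard Opus knobs mentioned above, the following is a minimal browser-side sketch (my illustration, not something from the talk) of enabling stereo and in-band FEC by munging the Opus fmtp line in an SDP offer; the stereo and useinbandfec parameters are the ones defined in RFC 7587, while the surround setup described above requires further non-standard tweaks.

```typescript
// Minimal sketch: ask for Opus stereo and in-band FEC by munging the fmtp
// line of an SDP offer before applying it (parameters from RFC 7587).
function tweakOpus(sdp: string): string {
  // Find the Opus payload type from the rtpmap line
  const m = sdp.match(/a=rtpmap:(\d+) opus\/48000\/2/);
  if (!m) return sdp; // no Opus in this SDP
  const pt = m[1];
  // Append the desired parameters to the corresponding fmtp line
  return sdp.replace(
    new RegExp(`a=fmtp:${pt} (.*)`),
    (_all, params) => `a=fmtp:${pt} ${params};stereo=1;useinbandfec=1`
  );
}

async function createTweakedOffer(pc: RTCPeerConnection): Promise<void> {
  const offer = await pc.createOffer();
  await pc.setLocalDescription({ type: "offer", sdp: tweakOpus(offer.sdp!) });
}
```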
11:24
And for video, we are in a very good place as well because, of course, we know that WebRTC does support some interesting codecs out there, like VP8, VP9, and H.264, and some browsers also support others. But most importantly, considering that in broadcasting an important aspect is the adaptability of the codec
11:40
depending on different conditions, WebRTC does provide some tools to address that for video as well. And in that case specifically, we can definitely take advantage of features like simulcast or scalable video coding, especially recently thanks to the AV1 support. And both simulcast and SVC are actually quite helpful in terms of adaptability,
12:01
because they provide means for a source to ingest different resolutions or qualities at the same time, and then allow the platform to decide which version is actually best suited for different viewers, because maybe some users don't have enough bandwidth to receive the full high-quality stream, but may have enough bandwidth to receive a lower quality stream instead,
12:22
and possibly also in a dynamic way, to address potential fluctuations in bandwidth and on the network and still provide a reasonably good quality of experience.
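To make the simulcast part of this concrete, here is a minimal browser-side sketch (an illustration, not code from the talk) that publishes three video layers, so that a server can pick the most suitable one per viewer; the rid names and the bitrate and scaling values are arbitrary choices, not anything mandated by WebRTC or WHIP.

```typescript
// Minimal sketch: publish three simulcast layers of the same camera track,
// so the server can relay the best-fitting one to each viewer.
async function publishSimulcast(pc: RTCPeerConnection): Promise<void> {
  const media = await navigator.mediaDevices.getUserMedia({ video: true });
  const [track] = media.getVideoTracks();
  pc.addTransceiver(track, {
    direction: "sendonly", // ingestion only, as in WHIP
    sendEncodings: [
      { rid: "h", maxBitrate: 1_500_000 },                         // full resolution
      { rid: "m", maxBitrate: 500_000, scaleResolutionDownBy: 2 }, // half
      { rid: "l", maxBitrate: 150_000, scaleResolutionDownBy: 4 }, // quarter
    ],
  });
}
```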
12:42
Which means that, when we have a look again at the generic broadcasting infrastructure that we've seen before, we can typically see a couple of different challenges. One, as I said, is definitely ingestion: we definitely want to make the ingestion part as easy as possible. And this is indeed a challenge, and I'll explain why in a few minutes. And of course, we also want to make sure that we can distribute the streams to a very wide audience in an easy way at the same time.
13:01
And this is also an important aspect because the main reason why traditional broadcasting has been so successful in the past is that there is a very long history and tradition in terms of how to actually create and distribute streams via traditional CDNs, which means it's very easy to just send a stream to YouTube Live or Twitch, let's say,
13:23
and then just let Twitch and Google take care of actually distributing this stream to a very wide audience on our behalf, because they have the resources to do that. So if you want to do something similar with WebRTC instead, this could be a bit of a different challenge, because in that case, there are different problems that need to be addressed.
13:43
And of course, I mean, it's not just the matter of how you ingest and how you distribute, the scaling part is definitely important as well. So how you actually turn that single stream into a broadcast and how you distribute the stream internally to cover a large amount of audience,
14:00
that's indeed an important part of the question. But again, the first problem to address is indeed ingestion, which is what a new specification called WHIP is trying to address. And WHIP is not a new effort, because it was first proposed slightly more than a year ago by CoSMo Software,
14:26
and basically we started prototyping it a bit at the time as well, in September 2020. And the name they chose was actually quite interesting because it allowed me, for instance, to choose a very silly picture for the blog post in the first place, using Indiana Jones and his famous whip.
14:42
And of course, I prototyped this using Janus. Luckily, WHIP was so interesting to the standardization community that eventually it actually started to be addressed there in the first place. And I actually discussed a bit of this also in another more recent blog post, if you're interested in having a look at it.
15:02
But before we move forward, I have to give a shout out to Dr. Alex. Some of you may be familiar with Dr. Alex, because he was a very widely known and great expert in the WebRTC community and helped foster a lot of technologies, including WHIP in the first place. WHIP was actually proposed by him and Sergio Garcia Murillo slightly more than a year ago.
15:25
And unfortunately, a few months ago, Dr. Alex passed away, which was a huge blow not only to the whole WebRTC community, but also to me personally, because I considered him a good friend and I was always happy to engage with him. So again, there would be no WHIP without Dr. Alex.
15:42
So I felt it was necessary to give him the proper credit for that before we moved forward. And in fact, Dr. Alex was one of the main reasons why WHIP was eventually adopted by the IETF as a working group specification, basically making sure that WHIP could be designed by the IETF community as a standard protocol in the first place.
16:09
And he basically helped create the WISH working group, which is the working group where WHIP is being specified. And if you're confused now about why the working group is called WISH and the protocol is called WHIP, the reason is quite simple.
16:24
There already existed a working group in the past called WHIP that worked on something completely different, and so they had to find a new acronym for the working group in the first place. But the only activity that is currently taking place in this working group is indeed the specification of the WHIP protocol.
16:44
And right now we are at version -01 of the draft, which seems early, but we're actually at a very good point, as I'll explain in a few minutes, because there has already been a lot of implementation activity and we're actually quite close to a working group last call as well, which means we may be close to an RFC for the specification, which is definitely very good news.
17:07
And to explain a bit how WHIP works and especially what it aims to do: as I was explaining, WHIP is basically an attempt to standardize how you do broadcast ingestion using, indeed, WebRTC,
17:20
which means it aims at being a very simple HTTP-based protocol to create send-only peer connections. Because again, we only want to do ingestion. We are not really interested in any sort of conversational media here. We just want to make sure that we can send media from an application to a server in order for it to be distributed. And the way that it's supposed to work is very simple on purpose.
17:44
So the idea is that, as a client, you just prepare your SDP offer and you put it into an HTTP POST request. You address it to a server, and then in the response to that POST you will get the SDP answer, which means that you already complete the negotiation process in a single exchange,
18:06
without having to add any round-trip time on top of that. Tearing down sessions is also supposed to be quite easy, because you just send an HTTP DELETE. And there are a few other functionalities, for instance how you use bearer tokens to implement authentication or authorization,
18:23
how you can perform the trickling of candidates or ICE restarts using HTTP PATCH and SDP fragments, and so on and so forth. But really, in a nutshell, it's definitely supposed to be just as easy. So the idea is that you just use HTTP to exchange some information,
18:41
and then everything else is your usual and traditional WebRTC, which means that ICE works the same way, DTLS works the same way, and so on and so forth, which means that there is a lot we can reuse when we use WHIP as a signaling protocol instead. And this diagram is supposed to make it a bit easier to understand, because, again, we can just send an HTTP POST including our offer and potentially a token.
19:10
The server will send an SDP answer back, and the address of a different resource, which we can use to trickle candidates and interact with the session itself. This is important, because the endpoint and the resource in WHIP are different things.
19:24
The endpoint is your first point of contact and the resource basically addresses your unique stream that is being broadcasted in that case. And once you have exchanged all the information, you do your ICE and DTLS exchanges and eventually you start sending RTP media to the server.
19:41
And in this case, we are representing endpoint, resource, and media server as different components, but the WHIP specification doesn't say anything in that regard. So these three components may actually be a single server, or they may be split across different servers. It doesn't really matter, as long as you are aware that they are actually three different components that work together as far as the WHIP server is concerned.
20:05
And as I was saying, there are a few different ways that you can actually enhance WHIP in the first place. So after you do your HTTP POST exchange in order to implement the signaling, what's important is that you can also trickle candidates using an HTTP PATCH request after that,
20:23
which means that it's very easy to trickle candidates, and the same mechanism is also used to perform ICE restarts. And ICE restarts in particular are greatly simplified with respect to how they are typically expected to be done in WebRTC, because basically the way that you perform an ICE restart using WHIP is that you just exchange the updated ICE credentials,
20:44
which are really the only relevant piece of information that you need to perform an ICE restart if the session remains largely the same. And again, as I was saying, tearing down a session is also quite easy, because you just send a DELETE to the resource, which will tear down the peer connection and basically clear up all the resources that were involved.
21:04
Even though, of course, the teardown of a peer connection may also happen in different ways, like with an ICE timeout in the consent freshness, for instance, or a DTLS alert, or these sorts of things, an HTTP DELETE is just there if you want to make this explicit using signaling in the first place.
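To make the exchange above more tangible, here is a rough browser-side sketch of a send-only WHIP publication (an illustration under the draft's semantics, not official code): a single POST carries the offer and returns the answer plus the resource URL, candidates can then be trickled to the resource with PATCH, and a DELETE tears the session down. The endpoint URL and token are placeholders.

```typescript
// Rough sketch of a browser-based WHIP publisher.
async function whipPublish(endpoint: string, token: string, stream: MediaStream) {
  const pc = new RTCPeerConnection();
  for (const track of stream.getTracks()) {
    pc.addTransceiver(track, { direction: "sendonly" }); // ingestion only
  }
  const offer = await pc.createOffer();
  await pc.setLocalDescription(offer);
  // One HTTP POST completes the SDP negotiation
  const res = await fetch(endpoint, {
    method: "POST",
    headers: {
      "Content-Type": "application/sdp",
      "Authorization": `Bearer ${token}`,
    },
    body: pc.localDescription!.sdp,
  });
  // The Location header points to the WHIP resource for this session
  const resource = new URL(res.headers.get("Location")!, endpoint).toString();
  await pc.setRemoteDescription({ type: "answer", sdp: await res.text() });
  // Candidates could now be trickled to the resource via HTTP PATCH
  // (content type application/trickle-ice-sdpfrag), and an HTTP DELETE
  // makes the teardown explicit:
  const teardown = () =>
    fetch(resource, { method: "DELETE", headers: { "Authorization": `Bearer ${token}` } });
  return { pc, resource, teardown };
}
```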
21:24
And of course, since WHIP is so simple and looks so interesting and effective, I definitely wanted to prototype some version of it one way or another. And of course, since it's all based on WebRTC, I thought it made a lot of sense to use Janus as a starting point for that.
21:41
And as I was mentioning initially, I'm the main author of Janus, so I was definitely interested in checking whether or not it was a good fit as a WHIP server in the first place. And the main challenge here is that Janus, as most WebRTC servers out there do, implements its own API, and it's not always easy to implement a different API when an existing one is already there.
22:07
And so I thought that the easiest way forward, especially in terms of flexibility, was just to create a very simple API translator in front of Janus, mostly because most of the features that WHIP needs are actually features that are provided by different parts of Janus,
22:22
which means that basically orchestrating the features of Janus using its API could be a very simple way of creating a very quick prototype of WHIP in the first place, which is why I basically just created a very simple Node.js-based application that uses Express to implement some sort of a REST API.
22:43
And so this REST API implements the WHIP API, exposing all the messages that we've seen in the previous examples, and then talks to Janus accordingly in order to establish peer connections, create sessions, and so on and so forth. And of course, since WHIP is only supposed to take care of the ingest part of broadcasting,
23:05
that's true for my prototype as well. So I didn't focus on actually distributing anything from the WHIP server in the first place. This can be done with the help of SOLEIL, which is the platform that I worked on in my PhD and that I'll discuss later, but it's considered out of scope as far as the WHIP server is concerned.
23:23
And this is a completely open-source project, so here you can find the link to it if you're interested, and all you actually need is a working instance of Janus and a working instance of this WHIP server; that's basically all you need to get it running, so everything should work. And the way that it works is in theory quite simple, because if you remember the diagram that we've seen before,
23:46
basically what we need to do is indeed take care of sending an offer to the WHIP server and receiving an answer back. And so, considering that in Janus the easiest way to perform WebRTC ingestion is using the VideoRoom plugin,
24:00
which implements an SFU kind of topology, what we can do is receive an SDP offer via WHIP, then create a connection towards Janus and the VideoRoom plugin, and create a fake participant in a VideoRoom instance, so that we can send the offer the WHIP client sent.
24:22
The moment Janus sends an SDP answer back, we can basically send it back to the client as well, thus completing the negotiation process. And everything else is supposed to be relatively straightforward as well. So, for instance, the trickling of candidates is also relatively straightforward, because it only involves translating between the format by which WHIP describes trickled candidates
24:45
and how Janus expects trickle candidates instead. As soon as they are exchanged, typically ICE and DTLS can work as expected, and so everything should work. ICE restarts are a bit more complex than that, because as I've explained,
25:01
WHIP simplifies the process of ICE restarts, but Janus still expects ICE restarts to work pretty much the way they are supposed to work in WebRTC, which means exchanging a full SDP back and forth. So the way I implemented this was that, whenever I detect new ICE credentials in an HTTP PATCH request,
25:20
I can basically reuse the SDP I received previously, update the ICE credentials there, and send the updated offer to Janus in order to perform a renegotiation and trigger an ICE restart. This will basically ensure that I receive an updated SDP answer in the WHIP server; I can extract the new credentials from there and make sure that the WHIP client only receives those instead,
25:43
in order to update the session and ensure that new ICE exchanges with the new candidates can take place. And again, deleting is also quite easy, because deleting a WHIP session basically means that we need to inform Janus that it needs to tear down the peer connection,
26:02
and the easiest way to do that is just to tear down the handle associated with the VideoRoom connection that was created originally, and this will tear down the session as well. And of course, as I was saying, different events could actually result in the termination of a peer connection,
26:20
which means that if we get events like ICE or DTLS telling us that the peer connection is over, we can also notify the WHIP server accordingly, so that it can clean up the resources automatically if no signaling had been there to notify that.
26:43
because it's basically just running a console application that just expects some interactions via the WIP API, and I also created a very simple user interface to basically manage the creation of endpoints in order to do testing, to make testing a bit easier.
27:00
And of course, a WHIP server is useful only as long as you have a WHIP client to test it with. And when we think about WHIP clients, there are a couple of requirements. First of all, we do need a WebRTC stack, because WebRTC is still involved, but we also need an HTTP stack, because that's what WHIP is based upon. And of course, browsers are the obvious choice here, because they do have both,
27:24
but I explained previously how actually broadcasting as an industry has been successful also because of how easy it is to perform ingestion using native clients instead, like OBS or XSplit or others. And so I wanted to play a bit with this, but unfortunately, OBS itself is not an option,
27:43
because even though there is a version of OBS that does support WebRTC, implemented by CoSMo Software, it currently does not implement WHIP. It did implement WHIP in the past, but not right now; at the moment, it only supports Millicast, and so it was not an option. So I decided to instead play a bit with the webrtcbin stack within GStreamer for the purpose,
28:05
so that I could create a native application that used a native WebRTC stack in order to then implement a native WHIP client instead. And the main reason why I chose GStreamer is, first of all, because I had used it in the past, in this case for JamRTC, which is an application that I actually mentioned a bit in a presentation last year,
28:25
when I talked about how to use WebRTC with music at FOSDEM, but also because it's a very modular and very powerful piece of software that can be used with different codecs and different capture sources. And so it was definitely an interesting aspect to take into account,
28:42
which, as I'll explain later, actually proved to be quite useful. And this application is also completely open source. And without bothering you too much with the details of the implementation, I ended up with a command-line client, which means that the application expects you to provide all the information
29:00
to talk to the WHIP server in advance; once you do that, it does its work and starts streaming to the WHIP server in the first place. More specifically, some interesting aspects to take into account are, of course, the URL of the WHIP server to talk to, so the WHIP endpoint we've seen before, possibly the token to use for authorization purposes, and then the audio and video pipelines to use, which are completely up to the user,
29:24
which means that, if you use the GStreamer syntax for that, it's very easy to use your own capture sources and to encode the media however you want to do it, as long as you do it in a way that is compliant with the WebRTC specification, and so on and so forth. And for instance, an example is provided in the next slide,
29:42
where I can, for instance, use the WHIP client to stream to a WHIP server and a specific endpoint, capturing some test audio and video devices, encoding them via Opus and VP8, and packetizing them accordingly; then the WebRTC stack in the WHIP client,
30:03
which is based on webrtcbin, takes care of the rest and makes sure that it is streamed via WebRTC on the other end. Which means that eventually we end up in a scenario that can be represented as in this diagram, where we basically have a WHIP client that uses the WHIP protocol
30:23
to talk to a WHIP server. If the WHIP server is actually acting as a front-end to Janus, it will use the Janus API to set up a session accordingly, and eventually the end result is that the WHIP client will send media via WebRTC to a Janus instance.
30:40
And of course, this is just an example using Janus, but there are different implementations that use different backends and different frameworks, and I'll actually mention a couple in the next few slides. And again, the WHIP client is a native application, and more specifically a command-line application, so when you launch it, you are basically presented
31:03
with some debugging information, so it tells you how the negotiation is going, how the ICE and DTLS processes go, and eventually, if everything works as expected, something should happen. So in this case, for instance, I was using my WHIP client as a way to talk to a local Janus instance,
31:21
and since the WHIP server that uses Janus acts as a front-end to a VideoRoom instance, the easiest way to check whether it is working is basically to just join the same VideoRoom via a web browser and check whether or not you see something. And in this case, for instance, I was looking at the bouncing ball
31:42
that GStreamer was generating, which meant that the WHIP ingestion was actually doing its job effectively. And as I was anticipating, these are definitely not the only WHIP implementations out there. Actually, WHIP has been of interest to several different implementers, and so, for instance, as far as servers are concerned,
32:01
I implemented my version based on Janus, but there's also an implementation in Galène by Juliusz Chroboczek; Galène is an SFU based on Pion. Sergio Garcia Murillo integrated all this in Millicast, which is a WebRTC CDN, and there's also another integration in another Pion-based SFU called deadSFU.
32:21
And as far as clients are concerned, there are even more choices, because besides my GStreamer-based client, there are a few other clients in different languages and frameworks as well: a couple based on JavaScript, so easy to use in a browser, but there's also a sample based on Pion by Gustavo Garcia, one based on the Python aiortc stack by Alberto Trastoy,
32:46
and even one that you can use on a Raspberry Pi using a Java stack, which Tim Panton wrote as part of Pipe, and which is called Whipi. And we actually made some tests to check whether or not all of these applications could interact with each other in a hackathon recently;
33:03
every IETF meeting is always preceded by a week of a so-called hackathon, where different implementers work together in order to figure out whether or not the specifications that are being worked on in the IETF make sense from a code perspective, and so it made a lot of sense for the WHIP implementers to meet
33:22
and check whether or not we were all on the same page as far as the implementations were concerned. And luckily that seemed to be indeed the case, because basically most of the implementations were able to talk to each other quite easily, and when we started, we even had fewer green faces than you see in the slide over there, so we had to work a bit before getting there,
33:41
but it was definitely a very good opportunity to improve things. And actually most of the red faces that you see over there were not really related to the WHIP specification itself, but mostly to other incompatibilities: for instance, a client only supporting VP8 and a server only supporting H.264, in which case there's little that WHIP can do.
34:01
I was also interested in checking whether or not WHIP could be used in broadcasting workflows in the first place. So again, I mentioned tools like OBS and others that make it so easy for streamers to do their job, and as I explained, OBS WebRTC is not an option there, at least at the moment,
34:20
and no popular streamer tool actually supports WHIP at the moment, because it's a very new specification, of course, and it may take a bit of time, because it does require a WebRTC stack, which may not be a trivial thing to add. So I decided to instead focus a bit on a looser integration, trying to use the tools as they exist today
34:42
and then somehow make them work with my GStreamer-based WHIP client, to turn their output into WebRTC streams instead. And I decided to rely on a protocol called NDI for the purpose, and I talked a bit about this in a couple of different blog posts,
35:00
if you're interested in learning more, and if you don't know much about what NDI is, it's basically a royalty-free standard that stands for Network Device Interface by NewTek that basically allows you to perform a live exchange of multimedia streams within the same LAN, so it basically allows you to exchange multichannel and uncompressed media streams that are of high quality
35:22
within the same LAN, using mDNS for service discovery, so to find out which devices and which streams are available. And it has a native SDK that makes it very easy to integrate in different applications and devices, and it's actually very widespread in the broadcast industry, because it's basically natively supported in many devices
35:43
and streamer tools that are available out there, and I actually worked a lot on this in the past few months, mostly from a WebRTC to NDI perspective, mostly because I wanted to check how I could do WebRTC ingestion, for instance, to do remote interviews
36:01
and then make them available in broadcaster tools for processing, let's say. In this case, we are interested in a different approach instead, which is NDI to WebRTC, because we want to use a popular tool to create a stream the way that we are used to, but then we want to distribute it via WebRTC,
36:20
and the good news is that there is indeed a cool plugin in GStreamer that does support NDI, and the link is available over there. And since our WHIP client is based on GStreamer too, it's very easy to then use this plugin to capture NDI streams and make them available for the transcoding process to WebRTC,
36:44
and of course we need something that generates NDI in the first place, but that's something that is quite common in broadcasting and streaming tools, which is why I chose OBS for the purpose, mostly because it's another open source tool that is very commonly used in the broadcast industry
37:01
and does have an NDI plugin for both NDI input and output, which means that to do my test, I basically just had to configure OBS to create an NDI output. I created my scene using a set of cheesy animations, to create something that looked like what a typical streamer would do,
37:26
and then I told my WHIP client to capture the NDI stream from that remote OBS application, which was actually running on a different laptop, and send it to a WHIP server instead, which means that I turned this diagram that we've seen before
37:42
into this diagram instead, where I was basically creating my scene in OBS, recording me playing guitar, and generating an NDI stream out of that; then the WHIP client would use WHIP to establish a session with the WHIP server and with Janus, and then eventually send the media to Janus, which ended up in something like this,
38:02
which is a demo that I made recently at a ClueCon Dangerous Demos session, where I basically implemented a very simple WebRTC concert that started from an OBS session in the first place. And I actually also used it recently in a CommCon presentation that I made, where I went into more detail about how this all worked.
38:21
And in this case, for instance, you can see me engaging in real time, because I was sending the stream via OBS and it was coming back to me via Broadcaster. And Broadcaster is based on Janus and does support WHIP as well, which was a very easy and interesting way to see it all working. And of course, we've talked a lot about ingestion,
38:42
but distributing the stream is then definitely another important aspect, because ingestion only gives us the first step; a very important first step, but indeed only the first one. So once we get the stream to the VideoRoom plugin in Janus, for instance, or to any other WHIP server,
39:01
then we need to figure out how to actually distribute this. And when it comes to Janus, the Janus video room plugin is definitely not a good option there because it's more optimized for a conferencing scenario than for broadcasting scenarios. So it doesn't work well if you need to distribute the stream to a hundred people or a thousand or a million, let's say.
39:20
The Janus Streaming plugin is indeed a much better choice for the job instead, mostly because it's more natively optimized for doing one-to-many, and there's a nice way to make it communicate with the VideoRoom, as I'll explain in a few slides. And this was actually at the very basis of my PhD thesis, the one that I teased you a bit about in the first slides,
39:43
which was about a framework called SOLEIL, which is an acronym for Streaming Of Large scale Events over Internet cLouds. The idea being that if you have a WebRTC stream that you want to distribute to a very large audience, a tree-based distribution of the stream could actually do the job.
40:02
And if you assume that only the ingest and the edges are WebRTC, and everything in the middle is just RTP, then the distribution is much easier to do, mostly because working with RTP at the intermediate layers has several advantages, like no need to actually worry about the WebRTC overhead
40:20
in that sense, mostly because a lot of applications do understand RTP but do not understand WebRTC. And having a lot of applications that you can take advantage of can be a very important tool. And most importantly, since you are just working with RTP and UDP, you can also just work with the streams at the network level, so you could take advantage of multicast, for instance,
40:41
which would be a very interesting accelerator in that sense. And to get things working, you only need RTP forwarding to get the ball moving. RTP forwarding is something that I talked about a bit at FOSDEM a few years ago as well, and it's something that the WHIP server does support; I actually added support for it a few weeks ago.
41:03
And as I mentioned, I did talk about RTP forwarders in a presentation I made a couple of years ago at FOSDEM. Just to give you a quick understanding of how this works, the idea is basically that if you have a WebRTC participant sending media to Janus, then Janus can, of course, send these to other WebRTC participants,
41:22
but using RTP forwarders, you can also relay the RTP streams to an external application instead, for different purposes, which may be, for instance, remote processing, identity verification, external recording, and so on and so forth, but also just feeding the media to another Janus instance instead,
41:41
for instance, using the Janus Streaming plugin. And in that case, you are basically decoupling the ingestion from the distribution part, which means that you can either use the same server or different servers entirely, because in that case you basically have an RTP stream that can become a WebRTC broadcast that multiple people can subscribe to.
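In practice, the decoupling just described boils down to a single request towards the VideoRoom plugin. The following sketch shows roughly what such a request looks like over the Janus HTTP API; it follows the shape of the VideoRoom rtp_forward request, but field names have changed across Janus versions, and the room, publisher ID, host, and ports are made-up values, so treat it as illustrative only and check the Janus documentation.

```typescript
// Illustrative only: ask the VideoRoom plugin to RTP-forward a publisher's
// media to the host/ports where a Streaming-plugin mountpoint (or a
// multicast group) is listening. Values and exact field names are examples.
const rtpForward = {
  janus: "message",
  transaction: "forward-1",
  body: {
    request: "rtp_forward",
    room: 1234,          // the VideoRoom the stream was ingested into
    publisher_id: 5678,  // the (fake) participant created by the WHIP server
    host: "10.0.0.42",   // where the Streaming mountpoint is listening
    audio_port: 5002,
    video_port: 5004,
  },
};

// Sent to the Janus session/handle the WHIP server created, e.g.:
// await fetch(`${janusUrl}/${sessionId}/${handleId}`, {
//   method: "POST",
//   body: JSON.stringify(rtpForward),
// });
```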
42:03
Looking at it from a tree-based perspective, it means that, if we have a tree-based distribution like this, where we have a stream being ingested and a lot of viewers that actually want to access it, you can have WebRTC at the edges: you can do, for instance, WHIP here to do the ingestion part,
42:21
you can just do a plain RTP-based distribution internally, and then you use WebRTC only at the edges to do the actual distribution to a wide audience. And of course, the wider the tree and the more servers you have at the edges, the larger the audience can be, independently of how powerful the ingestion server is in the first place.
42:48
As long as you distribute the stream internally, it should work and do the job. And as I was saying, multicast could actually be of great help here as well, most importantly because both the VideoRoom plugin, as far as RTP forwarders are concerned,
43:01
and the Streaming plugin, as far as receiving media is concerned, do have support for multicast, which means that it's very easy to RTP-forward media to a multicast group and then distribute it to multiple Streaming instances in the first place. And the moment that you actually have to, for instance,
43:22
make sure that two different multicast groups can communicate with each other so that you can widen the group across, let's say, different data centers, for instance. In that case, the bridging is something that you can easily do at the network level instead. And this is basically all. So I just wanted to give you a very quick introduction to WebRTC broadcasting in the first place,
43:42
especially focusing more on ingestion, because that's where most of the standardization work is happening, but also teasing you a bit on the distribution part, which is definitely an important aspect that a lot of people are working on and that we have started focusing a bit more on ourselves as well. So I hope this was an interesting presentation, and that
44:08
I made you curious enough to start looking into it yourself. And of course, I'm open to any kind of questions you may have in that regard. Thank you.