On May 18, we gave a presentation at the SF WebRTC Meetup hosted by Twilio detailing how Airtime leverages WebRTC to power the media stack for our mobile group video chat. Although WebRTC provides the building blocks for our media stack, there are additional challenges to providing a great multiparty experience on mobile across a wide range of networks and device types.
The slides for the presentation are available here.
The talk starts with why the fully connected mesh peer-to-peer model, which WebRTC provides by default, breaks down for the multiparty video use case. Specifically, mobile devices do not have the resources (network bandwidth and CPU) to encode and send separate streams to every other participant in the group. To overcome this, we use a star topology with a media server in the middle, which allows each publisher to encode and send only a single stream. The rest of the talk dives into the details of two key functions the media server provides: adapting to network conditions and adapting to the heterogeneous device capabilities of the participants in a group video chat.
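To make the scaling problem concrete, here is a small sketch comparing the per-device upload cost of a full mesh against a star topology. The function names and the comparison itself are illustrative, not part of the talk's material:

```python
# Sketch: why a fully connected mesh breaks down as groups grow.
# In a mesh, every publisher must encode and upload a separate stream
# to each of the other N-1 participants; with a media server in the
# middle, each publisher encodes and uploads exactly one stream.

def mesh_upstream_encodes(participants: int) -> int:
    """Streams each device encodes/uploads in a full mesh."""
    return participants - 1

def star_upstream_encodes(participants: int) -> int:
    """Streams each device encodes/uploads with a central media server."""
    return 1

for n in (2, 4, 8):
    print(f"{n} participants: mesh={mesh_upstream_encodes(n)}, "
          f"star={star_upstream_encodes(n)}")
```

The mesh cost grows linearly with group size, which is exactly what a mobile device's radio and encoder cannot keep up with; the star topology keeps the per-device cost constant and moves the fan-out to the server.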
End-to-End Network Adaptation
Imagine two users video chatting on a great LTE network. Since bandwidth is not an issue, they send and receive high-quality video streams (high frame rate and resolution). If a third user joins on a lower-bandwidth connection, we need to make sure that person can still participate; at the same time, since the other users have more bandwidth available, we don't want to degrade their experience. To balance this, the media server independently adapts the streams received by each participant, so that each gets the best quality possible for their current network conditions. We talk about some of the strategies the media server uses to achieve this goal and go through particular examples, such as reducing frame rate by dropping temporal layers and reducing resolution through transcoding.
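A minimal sketch of what per-subscriber adaptation can look like, assuming the publisher sends a temporally layered stream (e.g. a 30 fps stream whose top layer can be dropped to yield 15 fps). The tier thresholds, frame rates, and resolutions below are invented for illustration and are not Airtime's actual values:

```python
# Hypothetical per-subscriber adaptation. Dropping a temporal layer is
# essentially free for the server (it just stops forwarding packets for
# that layer); transcoding to a lower resolution costs server CPU, so
# it is applied only when dropping layers is not enough.

from dataclasses import dataclass

@dataclass
class ForwardingPlan:
    frame_rate: int    # fps delivered after dropping temporal layers
    transcode_to: int  # target height in pixels, or 0 for passthrough

# (min_bandwidth_kbps, fps, height) tiers, best quality first.
# All numbers are illustrative.
TIERS = [
    (1500, 30, 0),    # plenty of bandwidth: forward everything untouched
    (800, 15, 0),     # drop the top temporal layer: 30 -> 15 fps
    (400, 15, 360),   # additionally transcode down to 360p
    (0, 7, 240),      # worst case: 7 fps at 240p
]

def plan_for_subscriber(estimated_kbps: int) -> ForwardingPlan:
    """Pick a forwarding plan from the subscriber's bandwidth estimate."""
    for min_kbps, fps, height in TIERS:
        if estimated_kbps >= min_kbps:
            return ForwardingPlan(frame_rate=fps, transcode_to=height)
```

The key property is that the plan is computed independently for each subscriber, so the third user on a weak connection gets a cheaper rendition while the first two keep receiving full quality.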
End-to-End Device Adaptation
The next section of the talk is about adapting video streams based on device capabilities. Encoding and decoding video are CPU-intensive operations, and mobile devices in use today span a wide range of processing power, so we need to ensure that encoding and decoding do not overwhelm the device and make our application unresponsive. To achieve this, our media server tracks the capabilities of each device participating in a group video chat and adapts the frame rate and resolution of the streams it sends to each. In addition, the media server is aware of the resolution at which each video stream is displayed, as well as other high-CPU application activity (e.g., playing a YouTube video), and uses both to guide its adaptation. The talk ends by walking through a few scenarios that demonstrate how this device adaptation behaves.
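The device-adaptation idea can be sketched as a simple policy function: cap the delivered resolution by what the device can decode, by what is actually rendered on screen, and back off further under load. The function, its parameters, and the numeric floor are all hypothetical, not Airtime's implementation:

```python
# Hypothetical device-aware adaptation: the server tracks each
# participant's capability ceiling and current activity, and caps the
# stream it forwards accordingly. All names and numbers are illustrative.

def target_decode_height(device_max_height: int,
                         rendered_height: int,
                         high_cpu_activity: bool) -> int:
    """Pick the resolution (as frame height) to deliver to one subscriber.

    - Never exceed what the device can decode comfortably.
    - Never exceed what is actually rendered on screen: decoding
      pixels that are immediately downscaled away is wasted CPU.
    - Back off further when the app reports other heavy work,
      e.g. playing a YouTube video alongside the chat.
    """
    target = min(device_max_height, rendered_height)
    if high_cpu_activity:
        target = min(target, 240)  # illustrative floor under load
    return target
```

For example, a phone rendering a remote participant in a small 360-pixel-tall tile would be sent at most a 360p stream even if it could decode 720p, freeing CPU for the rest of the app.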
More to Come
This is only a small subset of the interesting technology we have developed for our media stack. We will continue to present more details in the future, so follow this blog in order to be notified when there are updates. If you have questions or feedback feel free to reach out using the comments section or firstname.lastname@example.org.
Originally published at https://medium.com on June 28, 2016.