What is WebRTC? (Part 2: Signalling)

SUSMIT
Published in Huddle 01
8 min read · Jan 7, 2021


Signalling is the process by which a peer discovers other peers over the internet and exchanges control information such as the kind of media being sent, its format, the transfer protocol being used, and the endpoint's IP address, port, and codecs. WebRTC does not specify a signalling method; one can even use copy/paste, email, or postal delivery to achieve connectivity.

1. Signalling

Initiating Peer-to-Peer Connection

Before initiating a peer-to-peer connection, there are many challenges to deal with that do not arise in client-to-server communication. The major difference is that client-to-server transports (XHR, EventSource, WebSocket) rely on HTTP/HTTPS to negotiate the parameters of the connection, and the server, unlike a remote peer, has a publicly routable IP address and is always listening for requests.

Some of the other problems that we may also encounter are

  • Peers can be difficult to reach directly, as they may sit behind layers of NAT.
  • The connection between peers may be unreliable.
  • The remote peer may be unreachable or offline.
  • The remote peer might not be interested in a connection.
  • The remote peer might already be engaged in other connections.

Therefore, to solve these issues, we need to:

  1. Notify the other peer of the intent to open a peer-to-peer connection, so it starts listening for incoming packets.
  2. Identify potential routing paths for the peer-to-peer connection on both sides of the connection and relay this information between peers.
  3. Exchange the necessary information about the parameters of the different media and data streams — protocols, encodings used, and so on.

All of this is taken care of in the signalling process, where peers exchange control information such as:

  • Control messages used to set up, open, and close the communication channel, and to handle errors.
  • Information needed in order to set up the connection: the IP addressing and port information needed for the peers to be able to talk to one another.
  • Media capability negotiation: what codecs and media data formats can the peers understand? These need to be agreed upon before the WebRTC session can begin.

Since there are many connectivity options available for signalling channels, no specific standard is defined for them. This gives developers the freedom to leverage or interoperate with existing protocols powering other communication infrastructure, such as:

  • Session Initiation Protocol (SIP): an application-level signalling protocol, widely used for voice over IP (VoIP) and videoconferencing over IP networks.
  • Jingle: a signalling extension for the XMPP protocol, used for session control of voice over IP and videoconferencing over IP networks.
  • ISDN User Part (ISUP): a signalling protocol used for the setup of telephone calls in many public switched telephone networks around the globe.

One can even create a custom protocol just for signalling, and it doesn't even need to exist on a network; the message could be written on a paper plane and flown towards the other peer.
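As a sketch of how simple such a custom signalling protocol could be, here is a hypothetical JSON message envelope. The field names (`type`, `from`, `to`, `payload`) and message types are my own invention, not part of any standard; any transport both peers agree on could carry these strings.

```javascript
// A minimal, hypothetical signalling message format. Any transport
// (WebSocket, HTTP polling, even email) can carry it, as long as
// both peers agree on the shape.
function makeSignal(type, from, to, payload) {
  const allowed = ["offer", "answer", "candidate", "bye"];
  if (!allowed.includes(type)) {
    throw new Error(`unknown signal type: ${type}`);
  }
  return JSON.stringify({ type, from, to, payload });
}

function parseSignal(raw) {
  // In a real system you would validate the fields here.
  return JSON.parse(raw);
}

// Example: Alice signals an offer to Bob.
const wire = makeSignal("offer", "alice", "bob", { sdp: "v=0" });
const received = parseSignal(wire);
console.log(received.type); // "offer"
```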

SDP protocol

The exact configuration of the peers is called a session description. It is handled by the Session Description Protocol (SDP) and includes information about the kind of media being sent, its format, the transfer protocol being used, the endpoint's IP address and port, settings, bandwidth information, and other metadata needed to describe a media transfer endpoint.

SDP carries information in the form of key/value pairs. The information primarily exchanged via SDP is:

  • The IPs and ports where peers are reachable
  • The number of audio and video tracks a peer wishes to send
  • The audio/video codecs each peer supports
  • Values used while connecting (uFrag/uPwd)
  • Values used while securing (certificate fingerprint)

Interpreting SDP: Every line in a session description starts with a single character, the key, followed by an equals sign. Everything after the equals sign, up to the newline, is the value.

The Session Description Protocol defines all the valid keys; only the letters defined in the protocol may be used as keys. Each key has a specific meaning, which will be explained later.

Take this Session Description excerpt.

a=my-sdp-value
a=second-value

You have two lines. Each with the key a. The first line has the value my-sdp-value, the second line has the value second-value.
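This line-oriented grammar can be read with a few lines of JavaScript. The following is only a sketch of the key/value structure described above, not a complete SDP parser:

```javascript
// Parse SDP text into [key, value] pairs.
// Each line is "<single-char key>=<value>"; the value may itself
// contain '=' characters, so we take everything after the first one.
function parseSdpLines(sdp) {
  return sdp
    .split(/\r?\n/)
    .filter((line) => line.length > 0)
    .map((line) => [line[0], line.slice(2)]);
}

const excerpt = "a=my-sdp-value\na=second-value";
console.log(parseSdpLines(excerpt));
// [ [ 'a', 'my-sdp-value' ], [ 'a', 'second-value' ] ]
```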

WebRTC only uses some SDP keys. Not all keys defined by the Session Description Protocol are used by WebRTC. The following are the only keys you need to understand. Don't worry about fully understanding them yet; this will be a handy reference in the future.

  • v - Version, should be equal to ‘0’
  • o - Origin, contains a unique ID useful for renegotiations
  • s - Session Name, should be equal to ‘-’
  • t - Timing, should be equal to ‘0 0’
  • m - Media Description, described in detail below
  • a - Attribute, a free-text field; this is the most common line in WebRTC
  • c - Connection Data, should be equal to ‘IN IP4 0.0.0.0’

Media Descriptions in a Session Description

A Session Description can contain an unlimited number of Media Descriptions. Each Media Description contains a list of formats, which map to RTP Payload Types. The actual codec is then defined by an Attribute with the value rtpmap in the Media Description. Each Media Description can contain an unlimited number of attributes.

Take this Session Description excerpt.

v=0
m=audio 4000 RTP/AVP 111
a=rtpmap:111 OPUS/48000/2
m=video 4000 RTP/AVP 96
a=rtpmap:96 VP8/90000
a=my-sdp-value

You have two Media Descriptions, one of type audio with fmt 111 and one of type video with fmt 96. The first Media Description has only one attribute, which maps Payload Type 111 to Opus. The second Media Description has two attributes: the first maps Payload Type 96 to VP8, and the second is just my-sdp-value.

The following brings all the concepts we have talked about together. These are all the features of the Session Description Protocol that WebRTC uses. If you can read this you can read any WebRTC Session Description!

v=0
o=- 0 0 IN IP4 127.0.0.1
s=-
c=IN IP4 127.0.0.1
t=0 0
m=audio 4000 RTP/AVP 111
a=rtpmap:111 OPUS/48000/2
m=video 4002 RTP/AVP 96
a=rtpmap:96 VP8/90000
  • v, o, s, c, t are defined but they do not affect the WebRTC session.
  • You have two Media Descriptions. One of type audio and one of type video.
  • Each of those has one attribute. This attribute configures details of the RTP pipeline, which is discussed in the ‘Media Communication’ chapter.
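Building on the line format described earlier, a small sketch can recover the codec table from a session description by collecting its a=rtpmap attributes. This assumes well-formed input; real SDP parsers handle far more:

```javascript
// Map RTP payload types to codec names using the a=rtpmap attributes
// of a session description.
function codecTable(sdp) {
  const table = {};
  for (const line of sdp.split(/\r?\n/)) {
    const match = line.match(/^a=rtpmap:(\d+) (.+)$/);
    if (match) {
      table[match[1]] = match[2];
    }
  }
  return table;
}

// The session description from the example above.
const sdp = [
  "v=0",
  "m=audio 4000 RTP/AVP 111",
  "a=rtpmap:111 OPUS/48000/2",
  "m=video 4002 RTP/AVP 96",
  "a=rtpmap:96 VP8/90000",
].join("\r\n");

console.log(codecTable(sdp));
// { '96': 'VP8/90000', '111': 'OPUS/48000/2' }
```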

How WebRTC uses SDP

The next piece of the puzzle is understanding how WebRTC uses the Session Description Protocol. WebRTC applications do not have to deal with SDP directly. The JavaScript Session Establishment Protocol (JSEP) abstracts all the inner workings of SDP behind a few simple method calls on the RTCPeerConnection object.

What are Offers and Answers?

When a user starts a WebRTC call to another user, a special description called an offer is created. This description includes all the information about the caller's proposed configuration for the call. The recipient then responds with an answer, which is a description of their end of the call. In this way, both devices share with one another the information needed in order to exchange media data. This exchange is handled using Interactive Connectivity Establishment (ICE), a protocol which lets two devices use an intermediary to exchange offers and answers even if they are separated by Network Address Translation (NAT).

Each peer, then, keeps two descriptions on hand: the local description, describing itself, and the remote description, describing the other end of the call.

The offer/answer process is performed not only when a call is first established, but also any time the call's format or other configuration needs to change. Whether it is a new call or a reconfiguration of an existing one, these are the basic steps of the offer/answer exchange, leaving out the ICE layer for the moment:

  1. The caller captures local media via navigator.mediaDevices.getUserMedia().
  2. The caller creates an RTCPeerConnection and calls RTCPeerConnection.addTrack() (since addStream() is deprecated).
  3. The caller calls RTCPeerConnection.createOffer() to create an offer.
  4. The caller calls RTCPeerConnection.setLocalDescription() to set that offer as the local description (that is, the description of the local end of the connection).
  5. After setLocalDescription(), the caller's ICE agent starts gathering ICE candidates (for example, by querying STUN servers).
  6. The caller uses the signalling server to transmit the offer to the intended receiver of the call.
  7. The recipient receives the offer and calls RTCPeerConnection.setRemoteDescription() to record it as the remote description (the description of the other end of the connection).
  8. The recipient does any setup needed for its end of the call: capturing its local media and attaching each media track to the peer connection via RTCPeerConnection.addTrack().
  9. The recipient then creates an answer by calling RTCPeerConnection.createAnswer().
  10. The recipient calls RTCPeerConnection.setLocalDescription(), passing in the created answer, to set the answer as its local description. The recipient now knows the configuration of both ends of the connection.
  11. The recipient uses the signaling server to send the answer to the caller.
  12. The caller receives the answer.
  13. The caller calls RTCPeerConnection.setRemoteDescription() to set the answer as the remote description for its end of the call. It now knows the configuration of both peers. Media begins to flow as configured.
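The caller's side of these steps can be sketched as follows. This is browser-only code: RTCPeerConnection and getUserMedia exist only in browsers, sendToSignallingServer is a hypothetical helper (e.g. a WebSocket send), and the STUN server URL is a placeholder.

```javascript
// Pure helper: package a session description for the signalling channel.
// (This message shape is an invented example, not a standard.)
function describeSignal(type, sessionDescription) {
  return JSON.stringify({ type, sdp: sessionDescription.sdp });
}

// Sketch of the caller's side of the offer/answer exchange (browser only).
async function startCall(sendToSignallingServer) {
  const pc = new RTCPeerConnection({
    iceServers: [{ urls: "stun:stun.example.org" }], // placeholder STUN server
  });

  // Steps 1-2: capture local media and add each track.
  const stream = await navigator.mediaDevices.getUserMedia({
    audio: true,
    video: true,
  });
  for (const track of stream.getTracks()) {
    pc.addTrack(track, stream);
  }

  // Steps 5-6: forward ICE candidates as the agent gathers them.
  pc.onicecandidate = (event) => {
    if (event.candidate) {
      sendToSignallingServer(
        JSON.stringify({ type: "candidate", candidate: event.candidate })
      );
    }
  };

  // Steps 3-4, 6: create the offer, set it locally, send it to the peer.
  const offer = await pc.createOffer();
  await pc.setLocalDescription(offer);
  sendToSignallingServer(describeSignal("offer", offer));

  // Step 13 happens later, when the answer arrives:
  // await pc.setRemoteDescription(answer);
  return pc;
}
```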

SDP Values used by WebRTC

This list is not exhaustive, but these are common attributes you will see in a Session Description from a WebRTC agent. Many of these values control subsystems that we haven't discussed yet.

group:BUNDLE Bundling is the act of running multiple types of traffic over one connection. Some WebRTC implementations use a dedicated connection per media stream. Bundling should be preferred.

fingerprint:sha-256 This is a hash of the certificate the peer is using for DTLS. After the DTLS handshake is completed you compare this to the actual certificate to confirm you are communicating with whom you expect.

setup: This controls the DTLS agent's behaviour, determining whether it runs as a client or a server after ICE has connected.

  • setup:active - Run as DTLS Client
  • setup:passive - Run as DTLS Server
  • setup:actpass - Ask other WebRTC Agent to choose

ice-ufrag This is the user fragment value for the ICE Agent. Used for the authentication of ICE Traffic.

ice-pwd This is the password for the ICE Agent. Used for authentication of ICE Traffic.

rtpmap This value maps a specific codec to an RTP payload type. Payload types are not static, so for every call the offerer decides the payload type for each codec.

fmtp Defines additional values for one Payload Type. This is useful to communicate a specific video profile or encoder setting.

candidate This is an ICE Candidate that comes from the ICE Agent. This is one possible address that the WebRTC Agent is available on. These are fully explained in the next chapter.

ssrc An SSRC defines a single media stream track.

label is the id for this individual stream. mslabel is the id for a container that can have multiple streams inside of it.
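As a small illustration, the attributes above can be pulled out of a session description with the same line-oriented approach. The excerpt and its values here are invented for the example:

```javascript
// Collect the values of all a=<name>: attributes from an SDP string.
function getAttributes(sdp, name) {
  const prefix = `a=${name}:`;
  return sdp
    .split(/\r?\n/)
    .filter((line) => line.startsWith(prefix))
    .map((line) => line.slice(prefix.length));
}

// Invented excerpt; real values are generated per session.
const excerpt = [
  "a=ice-ufrag:F7gI",
  "a=ice-pwd:x9cmlYzichV2xXlhiMu8g",
  "a=rtpmap:111 OPUS/48000/2",
].join("\r\n");

console.log(getAttributes(excerpt, "ice-ufrag")); // [ 'F7gI' ]
console.log(getAttributes(excerpt, "rtpmap"));    // [ '111 OPUS/48000/2' ]
```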

Inspiration for content and images were taken from references mentioned below.

References
