Title: Media: Voice and Video in your SIP Environment
1Media: Voice and Video in your SIP Environment
Jitendra Shekhawat
2Agenda
Objective: Introduction to media in the SIP environment.
- Common Audio and Video Codecs
- Media/Codec Negotiations
- Tuning Your Network for Voice and Video
- QoS issues, metrics and user quality expectations
3IP Audio/Video Telephony Network
- Call Control: SIP
- Media: RTP
- Video: H.263, H.264, MPEG-4
- Audio: G.711, G.723, G.729, G.726, AMR-NB, etc.
[Diagram: SIP video endpoints, SIP soft phone, SIP desk phone, PC email client, broadband users, and a multimedia server; links labeled SIP, RTP, and RTSP]
- Applications
- Video Mail
- Video Portal
- Live content streaming (CNN, ESPN, Bloomberg, live feed)
4SIP Call Example
5Audio Video Codecs and Payload Types
6Media Transport
- RTP
- Real-time Transport Protocol
- Media packet transport
- One stream per direction between endpoints
- RTCP
- RTP Control Protocol
- Provides quality information
- Generate reports to the network
7RTP Packet
RTP Datagram
- IP Header: 20 bytes
- UDP Header: 8 bytes
- RTP Header: 12 bytes
- RTP Payload: N bytes
RTP Header fields
- Version: 2 bits
- Padding: 1 bit
- Extension: 1 bit
- CSRC count: 4 bits
- Marker: 1 bit
- Payload Type: 7 bits
- Sequence Number: 2 bytes
- Timestamp: 4 bytes
- Synchronization Source (SSRC) Identifier: 4 bytes
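The field widths listed above add up to the 12-byte fixed RTP header. As a minimal sketch, assuming a packet with no CSRC entries or header extension, the fields could be decoded as follows. The names `RtpHeader` and `parse_rtp_header` and the sample bytes are illustrative and do not come from the slides.

```python
import struct
from typing import NamedTuple


class RtpHeader(NamedTuple):
    version: int
    padding: bool
    extension: bool
    csrc_count: int
    marker: bool
    payload_type: int
    sequence_number: int
    timestamp: int
    ssrc: int


def parse_rtp_header(packet: bytes) -> RtpHeader:
    """Decode the 12-byte fixed RTP header at the start of a UDP payload."""
    if len(packet) < 12:
        raise ValueError("RTP fixed header needs at least 12 bytes")
    # Network byte order: two single bytes, one 16-bit and two 32-bit integers.
    b0, b1, seq, ts, ssrc = struct.unpack("!BBHII", packet[:12])
    return RtpHeader(
        version=b0 >> 6,              # top 2 bits
        padding=bool(b0 & 0x20),      # 1 bit
        extension=bool(b0 & 0x10),    # 1 bit
        csrc_count=b0 & 0x0F,         # low 4 bits
        marker=bool(b1 & 0x80),       # 1 bit
        payload_type=b1 & 0x7F,       # low 7 bits
        sequence_number=seq,
        timestamp=ts,
        ssrc=ssrc,
    )


# Illustrative packet: version 2, payload type 0, sequence 1, timestamp 160,
# followed by 160 payload bytes (one 20 ms G.711 frame).
sample = bytes([0x80, 0x00]) + struct.pack("!HII", 1, 160, 0x12345678) + b"\x00" * 160
header = parse_rtp_header(sample)
```

Here `header.version` is 2 and `header.payload_type` is 0, which is the static payload type for G.711 mu-law audio.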
8RTCP Packet
- Receiver of RTP stream sends periodic updates to the originator
- Packet count
- Byte count
- Packet loss
- Timestamps to assess round-trip delay
- Jitter
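As a rough illustration of how the timestamps in a receiver report can yield round-trip delay, the sketch below follows the RFC 3550 approach: the originator subtracts the echoed "last sender report" timestamp (LSR) and the receiver's holding delay (DLSR) from the time the report arrived. The function name and sample values are illustrative.

```python
def rtcp_round_trip_seconds(arrival: int, lsr: int, dlsr: int) -> float:
    """Round-trip delay from one RTCP receiver report block.

    All three inputs are 32-bit values in units of 1/65536 second:
    arrival = time the report reached the original sender,
    lsr     = "last SR" timestamp echoed back by the receiver,
    dlsr    = delay between the receiver getting the SR and sending its report.
    """
    # Subtract modulo 2**32 so timestamp wrap-around is handled.
    delay = (arrival - lsr - dlsr) & 0xFFFFFFFF
    return delay / 65536.0


# Illustrative values: arrival at 46864.500 s, LSR 46853.125 s, DLSR 5.250 s.
rtt = rtcp_round_trip_seconds(0xB7108000, 0xB7052000, 0x00054000)
```

With these inputs the result is 6.125 seconds. In practice the figure is only as good as the sender's clock, since both `arrival` and `lsr` are read from it.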
9RTP Packet Payload size
Function of codec speed and frame size (how frequently packets are sent)
Payload size (bytes) = codec speed (bits/sec) x frame size (msec) / (8 bits/byte x 1000 msec/sec)
- Example: G.711, 20 ms frames: 64000 bps x 20 msec / (8 x 1000) = 160 byte payload
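The payload-size arithmetic can be checked with a few lines of Python. This is a minimal sketch; `payload_bytes` is an illustrative helper name, and the G.729 line is an extra example using the 8 kbps rate quoted elsewhere in the deck.

```python
def payload_bytes(codec_bps: int, frame_ms: float) -> float:
    """Payload size in bytes for one packet carrying frame_ms of audio."""
    # bits/sec x msec / (8 bits/byte x 1000 msec/sec)
    return codec_bps * frame_ms / (8 * 1000)


g711_20ms = payload_bytes(64000, 20)  # G.711, 20 ms frames
g729_20ms = payload_bytes(8000, 20)   # G.729, 20 ms frames
```

`g711_20ms` evaluates to 160.0 bytes, matching the slide, and `g729_20ms` to 20.0 bytes.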
10Media Stream (RTP) Bandwidth
- Packet size = Header + Payload
- Header = Ethernet + (IP + UDP + RTP) = 38 + (20 + 8 + 12) = 38 + 40 bytes
- Payload depends on codec
- Example: G.711, 20 ms frames (50 packets/s)
- 160 byte payload + (38 + 40) byte header
- IP bandwidth: 200 bytes/packet = 80,000 bps -> 160 kbps for 2-way
- Ethernet bandwidth: 238 bytes/packet = 95,200 bps -> 190.4 kbps for 2-way
- Ethernet overhead = Preamble (8) + Ethernet Header (14) + Ethernet CRC (4) + Inter-frame gap (12) = 38 bytes
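The per-stream bandwidth figures can be reproduced with a short calculation. The sketch below assumes the header sizes given on this slide (40 bytes of IP/UDP/RTP, 38 bytes of Ethernet overhead); the function and constant names are illustrative.

```python
IP_UDP_RTP_HEADER = 20 + 8 + 12       # 40 bytes
ETHERNET_OVERHEAD = 8 + 14 + 4 + 12   # preamble, header, CRC, inter-frame gap = 38 bytes


def stream_bandwidth_bps(codec_bps, frame_ms, include_ethernet=False):
    """One-way bandwidth of a single RTP stream in bits per second."""
    payload = codec_bps * frame_ms / 8000           # bytes of audio per packet
    overhead = IP_UDP_RTP_HEADER + (ETHERNET_OVERHEAD if include_ethernet else 0)
    packets_per_second = 1000 / frame_ms
    return (payload + overhead) * 8 * packets_per_second


g711_ip = stream_bandwidth_bps(64000, 20)                               # IP level
g711_ethernet = stream_bandwidth_bps(64000, 20, include_ethernet=True)  # on the wire
g729_ip = stream_bandwidth_bps(8000, 20)
```

For G.711 at 20 ms this gives 80,000 bps at the IP level and 95,200 bps on Ethernet, one way. Doubling each value gives the 2-way figures. The G.729 result of 24,000 bps shows how the fixed 40-byte header dominates when the payload is small.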
11Codec Bandwidths
Coder | Bitrate | Encoded bandwidth
G.711 | 64 kbps | 200-3400 Hz
G.723 | 5.3 or 6.3 kbps | 200-3400 Hz
G.729A (20ms Packet) | 8 kbps | 200-3400 Hz
AMR | 4.75 to 12.2 kbps | 200-3400 Hz
AMR-WB | Variable, 6.6 up to 23.85 kbps (non-continuous) | 50 to 7000 Hz
AMR-WB+ | Variable, 6-36 kbps (mono) or 7-48 kbps (stereo) | 50 Hz - 7.2 kHz up to 50 Hz - 19.2 kHz
iLBC | 13.33 kbps for 30 ms, 15.20 kbps for 20ms | 200-3400 Hz
12Codec Bandwidths
Coder | IP Bandwidth / RTP stream
G.711 (30 ms Packet) | 74.6 kbps
G.711 (20ms Packet) | 80 kbps
G.711 (10 ms Packet) | 96 kbps
G.723.1 (30ms Packet) | 15.7 kbps
G.729A (20ms Packet) | 24 kbps
AMR (20 ms) | 20.4 - 28 kbps
AMR-WB (20ms) | 22.4 - 39.6 kbps
AMR-WB+ (20ms) | 22 - 52 kbps
iLBC (20ms or 30ms) | 31.2 kbps or 24 kbps
13Video streams
I-frames (Key frames)
P-frames (predicted frames)
Frame Sequence
14Video Formats (IP vs. 3G)
- High resolution for IP networks
- More bandwidth available
- SIP Video Phones are generally CIF size (352 x 288 pixels)
- Recommended: CIF, 15 or 30 fps, up to 384 kbps
- Low resolution for 3G networks
- Total bandwidth limited to 64 kbps
- Generally video + audio is approx. 52 kbps (12.2 kbps AMR + 40 kbps H.263)
- 3G Mobile phones are generally QCIF size (176 x 144 pixels)
15Performance Issues
- Propagation Delay
- Time required to travel end to end across the network
- Transport Delay
- Time required to traverse network equipment
- Packetization Delay
- Time to digitize, build frames and undo at destination
- Jitter Delay
- Fixed delay by receiver to hold 1 or more packets to damp variations in arrival times
- Packet Loss
- Packet size impacts sound quality
16Jitter Delay
- Calculated on inter-arrival time of successive packets
- Average inter-arrival time
- Standard deviation
- Goal: inter-arrival time = inter-arrival time on emitted packets
- 3 phenomena affecting jitter
- Packet loss (threshold 5%)
- Silence suppression
- Out of sequence packets
- Can be configured on most VoIP equipment
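The two statistics named on this slide, average inter-arrival time and its standard deviation, can be computed directly from packet arrival timestamps. This is a minimal sketch with made-up arrival times; `interarrival_stats` is an illustrative name, and real equipment may use a different estimator (for example the running jitter estimate reported in RTCP).

```python
from statistics import mean, pstdev


def interarrival_stats(arrival_times_ms):
    """Return (mean, standard deviation) of gaps between successive arrivals."""
    gaps = [later - earlier
            for earlier, later in zip(arrival_times_ms, arrival_times_ms[1:])]
    return mean(gaps), pstdev(gaps)


# Hypothetical arrival times (ms) for packets emitted every 20 ms.
arrivals = [0, 20, 45, 60, 80]
avg_gap, gap_stddev = interarrival_stats(arrivals)
```

In this sample the average gap equals the 20 ms emission interval, so the goal on the slide is met on average. The standard deviation of about 3.5 ms reflects the variation that a jitter buffer has to absorb.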
17Packet Fragmentation
- Audio RTP packets
- Not generally fragmented since packet size is less than MTU
- Video RTP packets
- A large frame is fragmented into a series of packets for transmission over network
- I-Frame fragmentation
- Receiver must receive all fragments to properly reconstruct frame
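Because every fragment must arrive, a large I-frame is more exposed to packet loss than a single audio packet. The sketch below estimates how many packets a frame needs and the chance that at least one is lost. It assumes independent losses; the 1400-byte payload budget, the 14,000-byte frame, and the 1% loss rate are hypothetical figures chosen for illustration.

```python
import math


def packets_for_frame(frame_bytes: int, max_payload_bytes: int = 1400) -> int:
    """Number of RTP packets needed to carry one encoded video frame."""
    return math.ceil(frame_bytes / max_payload_bytes)


def frame_loss_probability(packet_count: int, packet_loss_rate: float) -> float:
    """Chance that at least one fragment is lost, assuming independent losses."""
    return 1 - (1 - packet_loss_rate) ** packet_count


i_frame_packets = packets_for_frame(14000)                     # hypothetical I-frame size
i_frame_loss = frame_loss_probability(i_frame_packets, 0.01)   # 1% packet loss
```

Under these assumptions the I-frame spans 10 packets and has roughly a 9.6% chance of being damaged, compared with 1% for a frame that fits in a single packet.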
18Packet Loss
- Audio
- Packet Loss Concealment (PLC)
- Mask effect of lost or discarded packets
- Replay previous packet or use previous packets to estimate missing data
- Key method for improving voice quality
- Packet Loss Recovery (PLR)
- Packet Redundancy
- Increased bandwidth
- Video
- I-Frame
- If a fragment is lost, subsequent P-Frames will not be sufficient to reconstruct image at receiver
- Video conversion tools available to generate files more suitable for real-time transmission
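The simplest concealment strategy mentioned for audio, replaying the previous packet, can be sketched as follows. This is a toy illustration only: `conceal_losses` and the `filler` default are hypothetical, sequence-number wrap-around is ignored, and production codecs typically estimate the missing audio rather than repeating it verbatim.

```python
def conceal_losses(received, first_seq, last_seq, filler=b"\x00" * 160):
    """Fill gaps in a run of audio payloads by replaying the previous one.

    received maps RTP sequence number -> payload bytes for packets that arrived.
    filler is a placeholder used only if the very first packet is missing.
    """
    output = []
    previous = filler
    for seq in range(first_seq, last_seq + 1):
        payload = received.get(seq)
        if payload is None:
            payload = previous      # replay the last payload that was played
        output.append(payload)
        previous = payload
    return output


# Packet 3 was lost in transit.
packets = {1: b"A" * 160, 2: b"B" * 160, 4: b"D" * 160}
played = conceal_losses(packets, 1, 4)
```

The playout list contains four payloads, with the payload of packet 2 repeated in the slot where packet 3 should have been.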
19G.107 to MOS mapping
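The chart for this slide is not reproduced in the text. As a sketch of the mapping it refers to, the ITU-T G.107 E-model converts a transmission rating factor R into an estimated MOS with the formula below; the function name `r_to_mos` is illustrative.

```python
def r_to_mos(r: float) -> float:
    """Estimated MOS for an E-model rating factor R (ITU-T G.107 mapping)."""
    if r <= 0:
        return 1.0
    if r >= 100:
        return 4.5
    return 1 + 0.035 * r + r * (r - 60) * (100 - r) * 7e-6


mos_default = r_to_mos(93.2)   # R value with all E-model parameters at defaults
mos_poor = r_to_mos(50)
```

An R of 93.2 maps to a MOS of about 4.41, while an R of 50 maps to about 2.58.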
20Codec Bandwidth and Voice Quality Comparison
Codec | Payload Bit Rate | Voice Quality
G.711 | 64 kbps | Excellent (MOS 4.2)
G.723 | 6.4 kbps / 5.3 kbps | Good (MOS 3.9) / Fair (MOS 3.7)
G.729 | 8 kbps | Good (MOS 4.0)
G.726 or G.721 | 16/24/32/40 kbps | MOS 2/3.2/4/4.2
iLBC | 13.33/15.2 kbps | Good (MOS 4.0)
AMR-WB+ | 6-36 kbps | Good (MOS near 4.0)
21Network Issues?
22Network Issues: Now What?
- Determine the source of delay
- Codecs?
- Too many hops?
- Not enough bandwidth?
- Define means to reduce delay
- Provision smaller packet sizes
- Reduce hop count
- Change routing protocols used
- Keep monitoring
- Find problems first
- Objectively identify issues
23IP Header
24Traffic Shaping
25Conclusion
- Reliability
- Can calls be made when needed?
- Will call setup time match current environment?
- Will calls be disconnected?
- Quality
- Is the voice quality of the calls the same?
- Can the users tell the difference?
- Cost
- What are the cost benefits of VoIP?
- What equipment will be needed?
26Wrap-up
Q & A / Quiz
27Frame Sizes
Format | Dimension (W x H, pixels), >1 bits/pixel
Sub-QCIF (SQCIF) | 128 x 96
Quarter-CIF (QCIF) | 176 x 144
CIF (Common Intermediate Format) | 352 x 288
4CIF (4 x CIF) | 704 x 576
16CIF (16 x CIF) | 1408 x 1152
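To put these frame sizes in context, the sketch below estimates the uncompressed bitrate of each format and compares CIF against the 384 kbps ceiling mentioned earlier for SIP video phones. The 12 bits/pixel default is an assumption (typical of 4:2:0 chroma subsampling), not a figure from the slides, and the names are illustrative.

```python
FRAME_SIZES = {
    "SQCIF": (128, 96),
    "QCIF": (176, 144),
    "CIF": (352, 288),
    "4CIF": (704, 576),
    "16CIF": (1408, 1152),
}


def raw_bitrate_bps(fmt: str, fps: int, bits_per_pixel: int = 12) -> int:
    """Uncompressed video bitrate; 12 bits/pixel is an assumed sampling format."""
    width, height = FRAME_SIZES[fmt]
    return width * height * bits_per_pixel * fps


cif_raw = raw_bitrate_bps("CIF", 15)    # CIF at 15 fps
qcif_raw = raw_bitrate_bps("QCIF", 15)  # QCIF at 15 fps
compression_needed = cif_raw / 384_000  # ratio needed to fit 384 kbps
```

Under that assumption, raw CIF at 15 fps is about 18.2 Mbps, so the codec has to compress by a factor of roughly 47 to fit in 384 kbps.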