Title: Internet Quality-of-Service (QoS)
1Internet Quality-of-Service (QoS)
- Henning Schulzrinne
- Columbia University
- Fall 2003
2Quality of Service
- Motivation
- Service availability
- Elementary queueing theory
- Traffic characterization and control
- Integrated services (RSVP, NSIS)
- Differentiated services (DiffServ)
3What is quality of service?
- Many applications are sensitive to the effects of delay (jitter) and packet loss
- may have floor below which utility drops to zero
- The existing Internet architecture provides a best-effort service
- All traffic is treated equally (generally, FIFO queuing)
- No mechanism for distinguishing between delay-sensitive and best-effort traffic
- Original IP architecture (IPv4) has TOS (type-of-service) byte in packet header
- RFC 795 defined multiple axes (delay, throughput, reliability)
- rarely used outside some (rumor) military networks
[Figure: utility as a function of bandwidth]
4Motivation
- QoS → service availability
- not good enough if all but 2 minutes of my phone call sound perfect
- Support mission-critical applications that can't tolerate disruption
- VoIP
- VPNs (LAN emulation)
- high-availability computing
- Charge more for business applications vs. consumer applications
5Service availability
- Users do not care about QoS
- at least not about packet loss, jitter, delay
- rather, it's service availability → how likely is it that I can place a call and not get interrupted?
- availability = MTBF / (MTBF + MTTR)
- MTBF = mean time between failures
- MTTR = mean time to repair
- availability = successful calls / first call attempts
- equipment availability 99.999% (5 nines) → 5 minutes/year
- AT&T (2003)
- Sprint IP frame relay SLA 99.5%
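The availability formula MTBF / (MTBF + MTTR) and the "5 nines is about 5 minutes per year" figure can be checked with a few lines of Python. This is only a sketch: the function names and the MTBF/MTTR values in the last line are illustrative assumptions, not figures from the slides.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600; leap years ignored


def availability(mtbf: float, mttr: float) -> float:
    """Steady-state availability = MTBF / (MTBF + MTTR), both in the same unit."""
    return mtbf / (mtbf + mttr)


def downtime_minutes_per_year(avail: float) -> float:
    """Expected minutes of unavailability per year for a given availability."""
    return (1.0 - avail) * MINUTES_PER_YEAR


if __name__ == "__main__":
    print(f"five nines: {downtime_minutes_per_year(0.99999):.2f} min/year")
    print(f"99.5% SLA:  {downtime_minutes_per_year(0.995) / 60:.1f} hours/year")
    # Hypothetical equipment: one failure per 1000 hours, 4 hours to repair.
    print(f"example availability: {availability(1000.0, 4.0):.5f}")
```

The same helper makes the gap between a 99.999% equipment target and a 99.5% SLA concrete: minutes per year versus hours per year.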
6Availability: PSTN metrics
- PSTN metrics (World Bank study)
- fault rate
- should be less than 0.2 per main line
- fault clearance (≈ MTTR)
- next business day
- call completion rate
- during network busy hour
- varies from about 60% to 75%
- dial tone delay
7Example PSTN statistics
Source: World Bank
8Measurement setup
9Measurement setup
- Active measurements
- call duration 3 or 7 minutes
- UDP packets
- 36 bytes alternating with 72 bytes (FEC)
- 40 ms spacing
- September 10 to December 6, 2002
- 13,500 call hours
10Call success probability
- 62,027 calls succeeded, 292 failed → 99.53% availability
- roughly constant across I2, I2, commercial ISPs
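The availability figure on this slide follows directly from the call counts. A minimal check, assuming availability is defined as successful calls divided by all call attempts (the function name is an illustrative choice):

```python
def call_availability(succeeded: int, failed: int) -> float:
    """Fraction of call attempts that succeeded."""
    return succeeded / (succeeded + failed)


if __name__ == "__main__":
    # Counts taken from the slide.
    print(f"{call_availability(62_027, 292):.2%}")
```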
11Overall network loss
- PSTN: once connected, call usually of good quality
- exception: mobile phones
- compute periods of time below loss threshold
- 5% causes degradation for many codecs
- others acceptable till 20%
12Network outages
- sustained packet losses
- arbitrarily defined at 8 packets
- far beyond any recoverable loss (FEC, interpolation)
- 23 outages
- make up significant part of 0.25% unavailability
- symmetric: A→B ⇔ B→A?
- spatially correlated: A→B ⇒ A→X?
- not correlated across networks (e.g., I2 and commercial)
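One way to extract outages from a packet trace is to look for runs of consecutive losses. The sketch below assumes that "defined at 8 packets" means at least 8 consecutive lost packets, and that the trace is a list of booleans (True = packet received); both the representation and the function name are assumptions for illustration, not the measurement tool actually used.

```python
from typing import List, Tuple

OUTAGE_THRESHOLD = 8  # consecutive lost packets (assumed reading of the slide)


def find_outages(received: List[bool],
                 threshold: int = OUTAGE_THRESHOLD) -> List[Tuple[int, int]]:
    """Return (start_index, length) for each run of >= threshold consecutive losses."""
    outages = []
    run_start = None
    for i, ok in enumerate(received):
        if not ok:
            if run_start is None:
                run_start = i  # a new loss run begins here
        elif run_start is not None:
            if i - run_start >= threshold:
                outages.append((run_start, i - run_start))
            run_start = None
    # A loss run may extend to the end of the trace.
    if run_start is not None and len(received) - run_start >= threshold:
        outages.append((run_start, len(received) - run_start))
    return outages


if __name__ == "__main__":
    trace = [True] * 5 + [False] * 3 + [True] * 4 + [False] * 10 + [True] * 2
    for start, length in find_outages(trace):
        # 40 ms packet spacing, as in the measurement setup.
        print(f"outage at packet {start}: {length} packets (~{length * 40} ms)")
```

Short loss bursts (here the run of 3) are left to FEC or interpolation and are not counted as outages.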
13Network outages
14Network outages
15Outage-induced call abortion probability
- Long interruption → user likely to abandon call
- from E.855 survey: $P_{\text{holding}} = e^{-t/17.26}$ (t in seconds)
- → half the users will abandon call after 12 s
- 2,566 have at least one outage
- 946 of 2,566 expected to be dropped → 1.53% of all calls
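The holding-probability model from the E.855 survey and the figures on this slide can be reproduced numerically. This is a sketch under stated assumptions: the function names are illustrative, and the 1.53% figure is taken relative to the 62,027 successful calls reported earlier.

```python
import math

TAU_SECONDS = 17.26  # time constant of the E.855 model quoted on the slide


def p_holding(t: float) -> float:
    """Probability that a user is still holding after an interruption of t seconds."""
    return math.exp(-t / TAU_SECONDS)


def median_holding_time() -> float:
    """Interruption length at which half of the users have abandoned the call."""
    return TAU_SECONDS * math.log(2)


if __name__ == "__main__":
    print(f"P[holding] after 12 s: {p_holding(12):.3f}")
    print(f"median holding time:   {median_holding_time():.2f} s")
    dropped, with_outage, succeeded = 946, 2_566, 62_027
    print(f"dropped, among calls with an outage: {dropped / with_outage:.1%}")
    print(f"dropped, among successful calls:     {dropped / succeeded:.2%}")
```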
16Conclusions from measurement
- Availability in space is (mostly) solved → availability in time restricts usability for new applications
- initial investigation into service availability for VoIP
- need to define metrics for, say, web access
- unify packet loss and "no Internet dial tone"
- far less than 5 nines
- working on identifying fault sources and locations
- looking for additional measurement sites
17What's next?
- Existing SLAs are mostly useless
- too many exceptions
- wrong time scales: month vs. minutes
- no guarantees for interconnects
- Existing measurements similarly dubious
- Limited ability to learn from mistakes
- what are the primary causes of service unavailability?
- what can I do to protect myself: multi-homing via same fiber? diverse access mechanisms?
- Consumers of services have no good ways to compare service availability
- only some very large customers may get access to carrier-internal data
- Thus, market failure
- Need published metrics
- similar to switch availability reporting
18What's hard to scale (and not)
- Signaling does not have to be hard
- one message, on a reliable peering channel or IP router alert option
- NSIS effort in the IETF?
- YESSIR: RTCP-based signaling
- 700 MHz Celeron processor
- 10,000 flow setups/second → 300,000 soft-state flows
- If scaling matters, sink-tree based reservation (BGRP)
19Diversity is good
- Unlike routing, no need for single signaling protocol
- multicast is much harder
- dumb end devices
- edge "pop-up" → only show up in edge nodes
20AAA
- Signaling can easily be done in ASIC (no harder than IP), but
- need cryptographic verification of request
- need interface to Authentication, Authorization, Accounting (AAA)
- cross-domain authentication → hard, but 3G networks will do it anyway
- easier if both sides ask their own access router
- see also iPass for dial-up, OSP (open settlement protocol)
21AAA example
[Figure: source, access routers AR1 and AR2, Internet, destination; labels: "signs request", "reserves for both directions"]
Cell phone model: both sides pay
22Reservation scaling
- Example: every long-distance call in the US uses VoIP with per-flow resource reservation
- 2000: 567.4 billion minutes at 10 minutes each → 1,800 calls/second
- single MySQL server can sustain 500-2,000 queries and updates/second
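The calls-per-second estimate is simple arithmetic on the yearly minute count. A quick check, assuming traffic is spread evenly over a 365-day year (so this is an average rate, not a busy-hour rate); the function name is illustrative:

```python
SECONDS_PER_YEAR = 365 * 24 * 3600  # leap day ignored


def call_setups_per_second(minutes_per_year: float, minutes_per_call: float) -> float:
    """Average call arrival rate if calls were spread evenly over the year."""
    calls_per_year = minutes_per_year / minutes_per_call
    return calls_per_year / SECONDS_PER_YEAR


if __name__ == "__main__":
    rate = call_setups_per_second(567.4e9, 10)
    print(f"{rate:,.0f} call setups/second on average")
```

The result is of the same order as the per-second update rate quoted for a single database server, which is the point of the comparison.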
23Business models don't work
- Most of the time, "tin" service is no worse than "platinum" service
- can't impress others with platinum AmEx card
- no frequent flyer bonuses
- → everybody switches only when the network is in bad shape
24Resource control and reservation
[Figure: block diagram with components Application, Reservation Protocol, Admission Control, Routing Protocols and DBs, Traffic Control DB, Packet Scheduler, Classifier and route selection; labels: Tspec, Y/N, Data. Source: USC EE-S 555]
25RED (Random Early Detection)
- TCP synchronization effect → during overload, many connections lose packets and go into slow start
- RED: start dropping based on average queue occupancy (vs. instantaneous queue occupancy)
- Parameter setting critical and non-trivial
- See also RFC 2309
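The core idea, dropping based on a smoothed average of the queue length rather than its instantaneous value, can be sketched as follows. This is a simplified illustration, not a complete RED implementation: it omits the packet-count spreading and idle-time handling of the full algorithm, and the class name, parameter names, and default values (`min_th`, `max_th`, `max_p`, `weight`) are assumptions chosen for the example.

```python
import random


class RedQueue:
    """Simplified RED: drop probability grows with the *average* queue length."""

    def __init__(self, min_th=5.0, max_th=15.0, max_p=0.1, weight=0.002, rng=None):
        self.min_th = min_th    # below this average, never drop
        self.max_th = max_th    # at or above this average, always drop
        self.max_p = max_p      # drop probability just below max_th
        self.weight = weight    # EWMA weight for the average queue length
        self.avg = 0.0
        self.rng = rng or random.Random()

    def drop_probability(self) -> float:
        if self.avg < self.min_th:
            return 0.0
        if self.avg >= self.max_th:
            return 1.0
        # Linear ramp from 0 to max_p between the two thresholds.
        return self.max_p * (self.avg - self.min_th) / (self.max_th - self.min_th)

    def on_arrival(self, queue_len: int) -> bool:
        """Update the average; return True if the arriving packet should be dropped."""
        self.avg = (1 - self.weight) * self.avg + self.weight * queue_len
        return self.rng.random() < self.drop_probability()


if __name__ == "__main__":
    red = RedQueue(weight=0.2, rng=random.Random(1))
    for qlen in [2, 4, 30, 30, 30, 30, 30, 30]:
        dropped = red.on_arrival(qlen)
        print(f"queue={qlen:2d} avg={red.avg:5.2f} "
              f"p={red.drop_probability():.3f} drop={dropped}")
```

Because the average reacts slowly, a short burst does not trigger drops, while sustained overload raises the drop probability gradually; this is also why the choice of thresholds and weight is delicate.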
26ECN (Explicit Congestion Notification)
- Extension of RED: mark instead of drop
- RFC 2481 (A Proposal to add Explicit Congestion Notification (ECN) to IP)
- IP TOS bit 6 (ECT) indicates support for the mechanism
- IP TOS bit 7 (CE; labeled "ECN" on this slide) indicates congestion
- Needs cooperation of TCP (or similar protocols)
- TCP should act almost as if packet was dropped
- ½ congestion window
- but don't do slow-start
[Figure: packet sent with ECT=1, ECN=0; marked ECT=1, ECN=1 on congestion; TCP ACK carries ECN echo]
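The difference between reacting to a congestion mark and reacting to a timeout can be illustrated with a toy sender model. This is a sketch only: `TcpSender` and its methods are hypothetical names, window sizes are in segments, and details such as reacting at most once per round-trip time are left out.

```python
from dataclasses import dataclass


@dataclass
class TcpSender:
    cwnd: float      # congestion window, in segments
    ssthresh: float  # slow-start threshold, in segments

    def on_timeout_loss(self) -> None:
        """Loss detected by timeout: restart from one segment in slow start."""
        self.ssthresh = max(self.cwnd / 2, 2.0)
        self.cwnd = 1.0

    def on_ecn_echo(self) -> None:
        """ACK carries an ECN echo: halve the window, but do not slow-start."""
        self.ssthresh = max(self.cwnd / 2, 2.0)
        self.cwnd = self.ssthresh


if __name__ == "__main__":
    marked = TcpSender(cwnd=20.0, ssthresh=64.0)
    marked.on_ecn_echo()
    timed_out = TcpSender(cwnd=20.0, ssthresh=64.0)
    timed_out.on_timeout_loss()
    print(f"after ECN echo: cwnd={marked.cwnd}, after timeout: cwnd={timed_out.cwnd}")
```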
27Next steps in signaling (NSIS)
- RSVP not widely used for resource reservation
- but is used for MPLS path setup
- design heavily biased by multicast needs
- marginal and after-the-fact security
- limited support for IP mobility
- Thus, IETF NSIS working group developing new framework for general state management protocol
- resource reservation
- NAT and firewall control
- traffic and QoS measurement
- MPLS and lambda path setup
- Split into two components
- NSLP: services
- NTLP: transport
28NSIS
- On-path vs. off-path
- off-path → bandwidth brokers
- Discovery of next NTLP or NSLP hop
- use router alert option
[Figure: protocol stack with NSLPs (QoS, NAT/FW, measure) above the NTLP, which runs over SCTP, UDP, or TCP]