Title: A case study: IPTV SLA Monitoring
1A case study IPTV SLA Monitoring
Computer Networks II course
- Giorgio Ventre
- The COMICS Research Group
- The University of Napoli Federico II,
2Outline
- The general problem: SLA, who cares?
- A business case for QoS
- Defining Service Level Agreements
- A Real-Life SLA monitoring service
- A case study: IPTV SLA Monitoring
3Recent trends in the industry
- New emerging multimedia services both in fixed and wireless networks
- Traditional voice carriers are moving to NGN
- Essential to control costs and drive up revenues
- Triple play services: Voice, Video, Data
- Video represents a key element of the service portfolio
- Price/quality balance must attract/retain users
- TV quality must compete with satellite and cable
4Challenges and quality issues
- Users are conditioned to expect high quality TV pictures
- Users are unlikely to tolerate poor/fair quality pictures in IPTV
- Early delivery of broadband services is unfeasible due to the limited bandwidth compared to cable and satellite
- Compulsory data compression can potentially degrade quality
- Need for robust transmission to minimize data loss and delay
5Why Quality Assurance is a major issue?
- Because otherwise we wouldn't be here
- Quality Assurance adds a new perspective to the flatness of the current market of triple-play services
- Quality measurement for service assurance
- End-to-end quality monitoring
- SLA based on quality delivered to end-user
- New business models and scenarios
6QoS vs QoE
- Quality of Service (QoS) refers to the capability of a network to provide better service to selected network traffic over various technologies. QoS is a measure of performance at the packet level from the network perspective.
- Quality of Experience (QoE) describes the performance of a device, system, service, or application (or any combination thereof) from the user's point of view. QoE is a measure of end-to-end performance at the service level from the user perspective.
7From QoS to MOS
- MOS: Mean Opinion Score
- Used in POTS to have a quantitative value for a qualitative evaluation
- How do you evaluate the quality you perceived during your last service usage/access?
- Very easy for simple services: telephony
- Very complex for complex services: multimedia (sound vs video vs data vs mix)
- Even more complex when quality of service depends on the distribution network AND terminals AND servers
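As an illustration of how a MOS turns qualitative opinions into one number, the sketch below averages individual ratings. It assumes the five-point scale (1 = bad, 5 = excellent) commonly used with ITU-T P.800; the function name and the validation rules are hypothetical and not taken from the slides.

```python
from statistics import mean


def mean_opinion_score(ratings):
    """Return the average of individual opinion ratings.

    Assumption: each rating is on a 1-5 scale (1 = bad, 5 = excellent).
    """
    if not ratings:
        raise ValueError("at least one rating is required")
    for rating in ratings:
        if not 1 <= rating <= 5:
            raise ValueError(f"rating out of range: {rating}")
    return mean(ratings)


# Four viewers rate the same session.
print(mean_opinion_score([4, 5, 3, 4]))
```

The averaging itself is simple; the difficulty the slide points at is collecting comparable ratings when the service mixes sound, video and data.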
8QoS evaluation
9Requirements
- Identify parameters contributing to a satisfactory QoE
- Define network performance requirements to achieve target QoE
- Design measurement methods to verify QoE
10Performance parameters
- IPTV service is highly sensitive to packet loss
- The impact of packet loss depends on several factors:
- Compression algorithm (MPEG2, H.264)
- GOP structure
- Type of information lost (I, P, B frame)
- Codec performance (coding, decoding)
- Complexity of the video content
- Error concealment at STB
11Quality Measurement
- Quality Measurement
- Objective
- Pure computational
- Network performance
- Objective perceptual
- Measurements representative of human perception
- Traditional metrics such as PSNR, PLR, BER are inadequate
- Requirements for objective perceptual metrics
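To make the "pure computational" category concrete, here is a minimal sketch of PSNR over two equally sized lists of pixel values. It assumes 8-bit samples (peak value 255); the function name is hypothetical. The metric only measures numerical difference and knows nothing about where or when an error is visible, which is one reason such metrics are considered inadequate for perceived quality.

```python
import math


def psnr(reference, degraded, max_value=255):
    """Peak signal-to-noise ratio in dB between two pixel sequences.

    Assumption: both sequences hold samples in the range 0..max_value.
    """
    if not reference or len(reference) != len(degraded):
        raise ValueError("sequences must be non-empty and of equal length")
    # Mean squared error between corresponding samples.
    mse = sum((r - d) ** 2 for r, d in zip(reference, degraded)) / len(reference)
    if mse == 0:
        return math.inf  # identical signals
    return 10 * math.log10(max_value ** 2 / mse)


print(round(psnr([100, 100], [101, 99]), 2))
```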
12Why Quality-Monitoring is hard?
- Measures have to be
- Time-based
- Remote
- Distributed
- Sharp
- Highly heterogeneous environments (codecs, CPEs, media types, ...)
- Sampled measures?
- SLAs are not sampled.
- In order to ensure quality, measures have to be carried out with quality
13Why Quality-Monitoring is hard?
- High impact also of content based factors
- MPEG performance depends on content pattern and scene changes
- Highly variable (movements, colours, lights) scenes generate more data
- Stallone vs Bergman
- or better
- Rambo vs The Seventh Seal
14Methods state of the art
- Full-Reference
- Reduced-Reference
- No-Reference
15Full-reference
- Measures are performed at both the input to the encoder and the output of the decoder
- Both the source and the processed video sequences are available
- Requires a reliable communication channel in order to collect measurement data
16Reduced-Reference
- Extracts only a (meaningful) sub-set of features from both the source video and the received video
- A perceptual objective assessment of the video quality is made
- The transmitter needs to send extracted features in addition to video data
17No-Reference
- Perceptual video quality evaluation is made based solely on the processed video sequence
- There is no need for the source sequence
- Measurement results are intrinsically based on a predictive model
18Standards for voice quality assessment
- ITU-T P.862 (Feb. 2001)
- Full-reference perceptual model (PESQ)
- Signal-based measurement
- Narrow-band telephony and speech codecs
- P.862.1 provides output mapping for prediction on MOS scale
- ITU-T P.563 (May 2004)
- No-reference perceptual model
- Signal-based measurement
- Narrow-band telephony applications
19Standards for voice quality assessment
- ITU-T P.862.2 (Nov. 2005)
- Extension of ITU-T P.862
- Wide-band telephony and speech codecs (5-7 kHz)
- ITU-T P.VQT (ongoing)
- Targeted at VoIP applications
- Uses P.862 as a reference measurement
- Models analyze packet statistics; speech payload is assumed
20Standards for video quality assessment
- ITU-T J.144 and ITU-R BT.1683 (2004)
- Full reference perceptual model
- Digital TV
- Rec. 601 image resolution (PAL/NTSC)
- Bit rates: 768 kbps to 5 Mbps
- Compression errors
21Standards for video quality assessment
- IETF RFC 4445 (April 2006): A proposed Media Delivery Index (MDI)
- MDI can be used as a quality indicator for monitoring a network intended to deliver applications such as streaming media, MPEG video, Voice over IP, or other information sensitive to arrival time and packet loss.
- It provides an indication of traffic jitter, a measure of deviation from nominal flow rates, and a data loss at-a-glance measure for a particular video flow.
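The two MDI components can be sketched as follows. This is a simplified reading of RFC 4445, not a conformant implementation: the Delay Factor is taken as the spread of a virtual buffer (bytes received minus bytes drained at the nominal media rate) divided by that rate, and the Media Loss Rate as lost packets per second over the measurement interval. The function name, the argument layout and the choice of milliseconds for the Delay Factor are assumptions made for this example.

```python
def media_delivery_index(arrivals, media_rate_bps, packets_expected, interval_s):
    """Return (delay_factor_ms, media_loss_rate) for one measurement interval.

    arrivals: list of (arrival_time_s, size_bytes), with times measured from
    the start of the interval (hypothetical input format).
    """
    rate_bytes_per_s = media_rate_bps / 8
    received = 0
    levels = []  # virtual buffer level just before and just after each packet
    for arrival_time, size in arrivals:
        drained = rate_bytes_per_s * arrival_time
        levels.append(received - drained)
        received += size
        levels.append(received - drained)
    if levels:
        delay_factor_ms = (max(levels) - min(levels)) / rate_bytes_per_s * 1000
    else:
        delay_factor_ms = 0.0
    media_loss_rate = (packets_expected - len(arrivals)) / interval_s
    return delay_factor_ms, media_loss_rate


# Three evenly paced 1000-byte packets on an 8 kbit/s flow, one packet missing.
print(media_delivery_index([(1, 1000), (2, 1000), (3, 1000)], 8000, 4, 4))
```

A perfectly paced flow still shows a Delay Factor of about one packet time, so it is the growth of this value over time that signals jitter.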
22Our research
- Objectives
- Real-time computation of achieved quality level
- Quality as perceived by the user
- Per-single-user measurements
- Light computation (about 5% overhead)
- Approach
- Media playout and measures are both part of an integrated process
- Measurement subsystems expose a consistent abstract interface
- Measurement results are high-level quality indicators
23VQM (1/2)
- No-Reference
- Evaluates the video quality as perceived by the user
- QoS → QoE
- Based on MPEG2
- Light parsing
- Doesn't parse motion vectors, DCT coefficients, and other macroblock-specific information
- Degradation due to packet losses is estimated using only the high-level information contained in Group of Pictures, frame, and slice headers
24VQM (2/2)
- Does not need to make assumptions concerning how the decoder deals with corrupted information
- i.e. what kind of error concealment strategy it uses.
- Based on this information it determines exactly which slices are lost
- GoP loss-rate
- Frame loss-rate
- Slice loss-rate
- Differentiation per frame type (I, P, B)
- It computes how the error from missing slices propagates spatially and temporally into other slices
- Appropriate for measuring video quality in a real-time fashion within a network
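The per-frame-type loss rates listed above could be accumulated along these lines. This is only a sketch of the bookkeeping, assuming a header parser (not shown) already reports, for each frame, its type and how many slices were expected and received; the function name and the record layout are hypothetical and do not describe the actual VQM code.

```python
from collections import defaultdict


def slice_loss_rates(frames):
    """Return the slice loss rate for each frame type.

    frames: iterable of (frame_type, slices_expected, slices_received),
    where frame_type is "I", "P" or "B" (hypothetical record layout).
    """
    expected = defaultdict(int)
    lost = defaultdict(int)
    for frame_type, slices_expected, slices_received in frames:
        expected[frame_type] += slices_expected
        lost[frame_type] += slices_expected - slices_received
    return {t: lost[t] / expected[t] for t in expected if expected[t] > 0}


observed = [("I", 10, 10), ("B", 10, 5), ("B", 10, 10), ("P", 10, 8)]
print(slice_loss_rates(observed))
```

Keeping the counters separate per frame type matters because the same loss rate has a very different visual impact on I, P and B frames.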
25Parsing method (1/2)
[Figure: GOP structure, frame sequence I B B P B B P B B P B B]
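For the GOP pattern shown in the figure, temporal error propagation can be approximated with a very simple dependency model. The sketch assumes a closed GOP in display order, ignores error concealment and spatial propagation, and treats a damaged frame as fully impaired; these simplifications and the function name are assumptions for illustration, not the method used by the tool described in the slides.

```python
GOP_PATTERN = "IBBPBBPBBPBB"  # display order, as in the figure


def impaired_frames(pattern, lost_index):
    """Return the set of frame indices affected by losing one frame.

    Simplified model: a B frame affects only itself; an I frame affects the
    whole GOP; a P frame affects the B frames displayed just before it (they
    also reference it), itself, and every later frame in the GOP.
    """
    kind = pattern[lost_index]
    if kind == "B":
        return {lost_index}
    if kind == "I":
        return set(range(len(pattern)))
    start = lost_index
    while start > 0 and pattern[start - 1] == "B":
        start -= 1
    return set(range(start, len(pattern)))


for index in (0, 1, 3, 9):
    affected = impaired_frames(GOP_PATTERN, index)
    print(GOP_PATTERN[index], index, len(affected))
```

Even this coarse model shows why losses must be differentiated per frame type: one lost I frame impairs all twelve frames, one lost B frame impairs a single frame.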
26Parsing method (2/2)
27QoE vs. MOS
- Mapping between Quality of Experience evaluation and MOS (Mean Opinion Score, ITU-T P.800) value
28MOS vs SLAs
- Knowledge of the function MOS(t) directly enables SLA monitoring
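One way MOS(t) could feed an SLA check is to compute the share of time during which the score stays at or above an agreed threshold. The sketch assumes MOS(t) is piecewise constant between reported timestamps; the threshold, the target fraction and the function name are hypothetical, since the slides do not specify the SLA terms.

```python
def fraction_of_time_above(samples, end_time, threshold):
    """Share of the observation window with MOS at or above the threshold.

    samples: list of (timestamp_s, mos) sorted by time; each value is assumed
    to hold until the next timestamp (or end_time for the last one).
    """
    if not samples:
        raise ValueError("at least one sample is required")
    total = end_time - samples[0][0]
    good = 0.0
    boundaries = samples[1:] + [(end_time, None)]
    for (start, mos), (stop, _) in zip(samples, boundaries):
        if mos >= threshold:
            good += stop - start
    return good / total


mos_t = [(0, 4.2), (10, 2.5), (15, 4.0)]
share = fraction_of_time_above(mos_t, end_time=20, threshold=3.5)
print(share, share >= 0.95)  # hypothetical SLA: MOS >= 3.5 for 95% of the time
```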
29Experimental testbed
[Figure: experimental testbed with Video Server, Controlled-Loss Router (dropped packets), Video Client, Quality Meter]
Video Characteristics: MPEG2-TS, Constant Bit Rate 3.9 Mbps
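For a sense of scale, the stream rate can be converted into transport stream packets per second, using the fixed 188-byte MPEG-2 TS packet size. The loss ratio in the example is a hypothetical value, since the slides do not state the loss rates configured on the router, and losses are counted per TS packet even though a real router would drop IP packets that may each carry several TS packets.

```python
TS_PACKET_BYTES = 188  # fixed size of an MPEG-2 transport stream packet


def ts_packets_per_second(bit_rate_bps):
    """Number of TS packets per second for a constant bit rate stream."""
    return bit_rate_bps / (TS_PACKET_BYTES * 8)


def expected_lost_packets_per_second(bit_rate_bps, loss_ratio):
    """Average TS packets lost per second at a given loss ratio (assumed)."""
    return ts_packets_per_second(bit_rate_bps) * loss_ratio


print(round(ts_packets_per_second(3_900_000)))
print(round(expected_lost_packets_per_second(3_900_000, 0.01), 1))
```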
30High Quality
Throughput 5.0 Mbps
31Medium Quality
Throughput 3.9 Mbps
32Low Quality
Throughput 3.0 Mbps
33From SLA to PLA Provisioning Level Agreements
PhD School in Computer Engineering, Palermo, 17-28 September 2007
- Giorgio Ventre
- The COMICS Research Group
- The University of Napoli Federico II,
- ITEM Laboratory, Italian University Consortium on Informatics
34A service model for resilient networks
- We are moving from Quality of Service to a more complex concept of quality + resiliency
35Quality of future distributed services
- The most important QoS characteristic for future distributed services is arguably going to be resilience
- Resilience is the property of a system to restore services to normal after a failure (as fast as the service users need)
- However, an investigation into resilience reveals the importance of considering risk when developing our future research agenda
36The need for resilience
- We are increasingly reliant on the Internet and on networked systems in general (including of course the Web)
- This is happening in businesses and indeed in every walk of life including the home
- The EU is promoting and developing the Information Society, which is based on communication technologies and systems
37Interdependence of networks (1)
- Not only are we dependent on networks
- But all sorts of other networks are, too
- Electricity, water, gas
- Corporate networks
- Banking networks
- Health networks
- Information networks are crucial to the
successful operation of other networks
38Interdependence of networks (2)
- Interdependencies of critical infrastructures
- Power nets and information nets
- The virtual utility
- Rune.Gustavsson_at_bth.se
- The introduction of proper supporting ICT of power nets forming a virtual utility is an important instance of networked enabled capabilities (NEC) systems. Furthermore, by pursuing this task we can gain experiences and develop models and technologies that besides addressing societal critical systems also can be useful in other efforts on development and maintenance of complex systems.
- In Italy: Report del Comitato sulla Protezione delle Infrastrutture Critiche, Presidenza del Consiglio dei Ministri, 2004
39Internet meltdown?
- Article in The Independent (UK), 8 September 2004
- The internet is becoming a utility (Karl Auerbach)
- As a utility, the net will have to live up to different, more stringent standards than its previous uses as an academic and research playground, and then a mainstream experiment. People are building billion-dollar businesses, governments are turning themselves digital, and in the meantime there isn't so much as a service-level agreement to guarantee that the most basic level of connectivity will be there tomorrow.
- If the technologists no longer believe they can fix it by themselves, the Internet really has hit a meltdown.
40Vulnerabilities
- The Internet was originally designed to withstand basic link and switch failures
- But it was never envisaged as a utility (i.e. offering near-perfect availability), supporting commercial initiatives and acting as a vital infrastructure
- Whatever vulnerabilities are present in the infrastructure may be inherited by the applications it aims to support
41Attacks
- Complex, well engineered systems should be built by keeping faults in mind
- Today, we need to take into account other disruption sources
- Network attacks of all sorts are increasing in variety and number
- Spam / junk
- Viruses, Worms etc.
- DDoS attacks
- Physical
- These cause huge costs in time and energy, but there is no coherent approach to a solution
42Multiple levels
- This is of course a multi-level problem
- Physical layer
- Networking / IP
- Middleware layer / O.S.
- Web / applications
- A solution to achieving resilience needs to apply at all levels: this is a grand challenge for future networked systems infrastructure
43Complexity
- This is a distributed computing problem
- According to Leonard Kleinrock, we have no suitable theory to handle this, because of its inherent complexity
- This is compounded by nomadicity
- Complicated: difficult to study but fit for purpose, static; whereas complex: growing, evolving
44Complexity not simplicity
- In spite of all hype on global network architectures, today we face a complex, heterogeneous reality
- Fixed access networks: POTS, xDSL, CATV, MetroLAN
- Mobile, wireless access networks: GPRS, UMTS, WiFi, WiMAX
- Interoperability with terrestrial digital broadcasting
- Additional complexity issues
- New, diverse terminals (Symbian cell phones, PDAs, smart set-top-boxes)
- Dynamic creation of novel services and applications
45Complexity as an opportunity
- The availability of a multiplicity of networks, devices and services should be seen as an opportunity
- No single infrastructure of critical importance
- Ease of access to all players: government, companies, common people
- Availability of a multitude of sources of information
- Availability of a multitude of computing resources
- Availability of a multitude of communication media/networks
- ... provided that such a rich scenario can be managed as a system
46Some recent events
- We learned some lessons recently
- 9/11 2001 Attacks
- US East Coast Blackout
- Italy Blackout
- Series of attacks
- Worms (NIMDA, Witty, Slammer, ...)
- DDoS
- Routing attacks
- We probably need to re-discover traditional values typical of traditional engineering practice
47Lessons from 9/11 2001
- The Internet under Crisis Conditions: a Committee of the National Research Council of the National Academies (www.nap.edu)
- Findings of the Committee
- Attacks had very limited effects on the Internet as a global, best effort communication system
- Internet technology appears to be robust per se but considerable efforts are needed to protect Internet-based systems
- Many critical interdependencies discovered only after the attacks
48Known and less known effects
- Dependency of Internet on other telecommunication systems (fixed, wireless, cellular)
- Obvious: co-location of sites, tubes, cables, running out of diesel
- Not so obvious: e.g. communications between NYC ISPs and TelCos hampered by problems with toll-free numbers
- Facility disaster planning as a rare expertise/culture in the Internet world
- Very limited capacity of backup power generation even in major ISP sites/POPs
- Other issues, e.g.
- DNS for .za domain was hosted on a server in NYC
- WiFi LANs of two major Manhattan hospitals operating in outsourcing via Internet
49Lessons from 9/28 2003
- Anticipated by the US East Coast blackout: much larger scale than WTC but apparently more limited damage
- Different effects and impacts
- POTS infrastructure capable of enduring very long power outages: practically no effects
- Cellular networks: locally in deep crisis
- National TV and radio broadcasters: OK, local players generally in crisis
- Global and VoIP operators: knocked out
- What about the Internet?
- All IT based services affected: AAA, CDN, servers
50Lessons in ATC systems
- Press Releases (http://www.natca.org/mediacenter/press-release-detail.aspx?id=394)
- MASSIVE POWER, COMMUNICATIONS FAILURE AT MAJOR AIR TRAFFIC CONTROL CENTER PUTS CONTROLLERS IN DARK, FLIGHTS IN JEOPARDY
- 07/19/2006, Bob Marks. PALMDALE, Calif. A massive power and communications failure late Tuesday at the Los Angeles Air Route Traffic Control Center left scrambling air traffic controllers to deal with a nightmare scenario: how to keep dozens of flights away from each other above a large swath of the Southwestern United States despite the inability to see them, talk to them or relay crucial instructions for 15 excruciatingly long minutes. Every ounce of skill, heart and determination that controllers bring into the control room every day was put to the test during one of the worst outages to ever hit the facility. It was so bad, controllers say, that the only thing they had of use to aid the situation that actually worked was their cell phones, devices which the Federal Aviation Administration, inexplicably, has barred from control rooms, further impeding the safety of the system.
- More details in http://themainbang.typepad.com/blog/2006/07/complete_failur.html
51Issues for research (1)
- Forget OSI-type layering/abstractions
- Services depend not only on peer and adjacent layers
- Resiliency is a system-wide issue, with vertical and horizontal dependencies
- Start speaking about networked systems and not only of networks
- IT based services must be considered as part of the whole picture
- Contributions from several disciplines
- Multi-level approach
- Cross-layer approach
52Issues for research (2)
- Monitoring of services and infrastructures
- We can't trust what we can't control
- Robustness of services
- To unexpected situations: faults, misconfigurations, excessive demand, soft attacks (DDoS)
- To expected but complex situations: tools/methodologies for proper dimensioning of services (Service Engineering)
- Resiliency of infrastructures
- Focus on survivability of communication systems to hard attacks (terrorist hits, natural disasters)
- Reconfigurability of communication systems
- Make different networks/systems a single infrastructure
53Issues for research (3)
- Towards a GRID of communication infostructures
- Connect them all physically
- Make them resilient separately
- Allow for services to migrate
- Prepare for interconnecting them if needed
- From the computational GRID to the communication GRID
- But try to make it with an autonomic communication flavour
54Issues for research
- Resiliency of infrastructures
- Focus on survivability of communication systems to hard attacks (terrorist hits, natural disasters)
- Reconfigurability of communication systems
- Make different networks a single infrastructure
- Resiliency of services
- To unexpected situations: faults, excessive demand, soft attacks (DDoS)
- To expected but complex situations: tools/methodologies for proper dimensioning of services (Service Engineering)
55From QoS to Resiliency to ...
- We should not forget the past
- QoS is as important as resiliency, and is back: Value of supporting Class-of-Services in IP Backbones, M. Yuksel et al., IWQoS 2007
- Also because QoS is a good mechanism to improve resiliency of a distributed system
- So, we should probably talk about
- QoSiliency
56User-Centered Architectures
Service Directories
57Info about content (metadata)
[Figure: architecture diagram with labels: SLAs are the triggers, Service Directories, Access Controllers, Service Controller, SLS, Resource Controllers, User, Policy rules, QoS-capable Networks]
58A change of perspective
- One of the major problems with SLA based architectures was their limited capability to scale with the number of users and services
- We therefore introduce the concept of Provisioning Level Agreement (PLA): a PLA is a contract between a service provider and the owner of the Infrastructure defining the level of service to be guaranteed to final users during the provisioning of a service on top of that Infrastructure.
59A change of perspective (cont.)
- In a PLA it is the service provider who defines
- the type of service
- the treatment the service needs to get from the network (QoS, resiliency needs, security and privacy reqs.)
- the classes of possible SLAs that can be subscribed by the users
- A PLA is signed at service deployment time, and can be dynamically modified and updated any time the service characteristics and requirements change
- Once a PLA is signed, Provisioning Level Specifications are produced to allow the infrastructure to be properly configured to accommodate the new service and future service subscriptions by final users
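The elements of a PLA listed above can be pictured as a small data structure. This is a hypothetical sketch: the class names, field names and example values are invented for illustration, and the slides do not prescribe any concrete format for a PLA.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class NetworkTreatment:
    """Treatment the service needs from the network (illustrative fields)."""
    qos: str
    resiliency: str
    security_and_privacy: str


@dataclass
class ProvisioningLevelAgreement:
    """Contract between a service provider and the infrastructure owner."""
    service_provider: str
    infrastructure_owner: str
    service_type: str
    treatment: NetworkTreatment
    sla_classes: List[str] = field(default_factory=list)

    def can_subscribe(self, sla_class: str) -> bool:
        """A user SLA is acceptable only if its class is foreseen by the PLA."""
        return sla_class in self.sla_classes


pla = ProvisioningLevelAgreement(
    service_provider="ExampleTV",        # hypothetical names and values
    infrastructure_owner="ExampleNet",
    service_type="IPTV",
    treatment=NetworkTreatment("low loss", "fast restoration", "encrypted"),
    sla_classes=["gold", "silver"],
)
print(pla.can_subscribe("gold"), pla.can_subscribe("platinum"))
```

Because the SLA classes are fixed when the PLA is signed, a later user subscription only needs a membership check rather than a new negotiation with the infrastructure, which is the scalability argument made in the previous slide.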
60Info about content (metadata)
[Figure: Service Centered Architectures diagram with labels: Service Directories, Service Provider, PLS, Resource Controllers, Policy rules, QoS-figurable Networks, PLAs are the triggers]