OAM and QoS

Description:

Title: Performance Management: Application-driven Evolution. Author: Y(J)S. Keywords: OAM, PM, FM. Last modified by: yaakov_s. Created Date: 2/2/2004 11:54:59 AM

Slides: 41
Provided by: YJS5

Transcript and Presenter's Notes

1
OAM and QoS
  • Presented by
  • Yaakov (J) Stein
  • Chief Scientist

Unique Access Solutions
2
Service Guarantees
3
Why do we pay for services ?
  • Generally good (and frequently much better than
    toll quality)
  • voice service is available free of charge
    (Skype, Fring, Nimbuzz, ...)
  • So why does anyone pay for voice services ?
  • Similarly, one can get free
  • (WiFi) Internet access
  • email boxes
  • file storage and sharing
  • web hosting
  • software services
  • So why pay ?

4
Paying for QoS
  • The simple answer is that one doesn't pay for the
    service
  • one pays for Quality of Service guarantees
  • In our voice model
  • But what does QoS mean
  • and why are we willing to pay for it ?
  • To explain, we need to review some history

5
Father of the telephone
  • Everyone knows that the father of the telephone
    was
  • Alexander Graham Bell
  • (along with his assistant Mr. Watson)
  • But Bell did not invent the telephone network
  • Bell and Watson sold pairs of phones to customers
  • The father of the telephone network was
  • Theodore Vail

6
Theodore Vail -
  • Theodore Who?
  • Cousin of Alfred Vail (Morse's coworker)
  • Ex-General Superintendent of US Railway Mail
    Service
  • First general manager of Bell Telephone
  • Father of the PSTN
  • Why is he so important?
  • Organized PSTN
  • Established principle of reinvestment in R&D
  • Established Bell Telephone's IPR division
  • Executed merger with Western Union to form AT&T
  • Solved the main technological problems
  • use of copper wire
  • use of twisted pairs
  • Organized telephony as a service (like the postal
    service!)
  • Vailism is the philosophy that public services
    should be run as closed centralized monopolies
    for the public good

7
What's the difference ?
  • In the Bell-Watson model
  • the customer pays once, but is responsible for
  • installation
  • wires
  • wiring
  • operations
  • power
  • fault repair
  • performance (distortion and noise)
  • infrastructure maintenance
  • while the Bell company is responsible only for
  • providing functioning telephones
  • In the Vail model the customer pays a monthly fee
  • but the provider assumes responsibility for
    everything
  • including fault repair and performance
    maintenance
  • the telephone company owns the telephone sets
    and even the wires in the walls !

8
Service Level Agreements
  • In order to justify recurring payments
  • the provider agrees to a minimum level of
    service in an SLA
  • SLAs should capture Quality of user Experience
    (QoE)
  • but this is often hard to quantify
  • So SLAs usually actually detail measurable
    network parameters
  • that influence QoE, such as
  • availability (e.g., the famous five nines)
  • time to repair (e.g., the famous 50 ms)
  • information rate (throughput)
  • information latency (delay)
  • allowable defect densities (noise/distortion)
  • Availability (basic connectivity) always
    influences QoE
  • It is hard to predict the effect of the other
    parameters on QoE even when there is only one
    application (e.g., voice)
  • When multiple applications are in use - it may be
    impossible
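As a rough illustration of what an availability figure means in an SLA, the sketch below converts an availability fraction into permitted downtime. The function name and the choice of a 365-day year are assumptions made for this example, not part of the presentation.

```python
def allowed_downtime_seconds(availability: float, period_seconds: float) -> float:
    """Maximum downtime permitted in a period for a given availability fraction."""
    if not 0.0 <= availability <= 1.0:
        raise ValueError("availability must be between 0 and 1")
    return (1.0 - availability) * period_seconds


YEAR_SECONDS = 365 * 24 * 3600  # assumption: non-leap year

five_nines = allowed_downtime_seconds(0.99999, YEAR_SECONDS)
print(f"five nines allows about {five_nines / 60:.2f} minutes of downtime per year")
```

Under these assumptions five nines corresponds to a little over five minutes of outage per year.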

9
Some Applications
  • System traffic
  • routing protocols, DNS, DHCP, time delivery,
    system update, OAM,
  • tunneling and VPN setup
  • Business processes
  • database access, backup and data-center, B2B,
    ERP
  • Communications - interactive
  • voice, video conferencing, telepresence, instant
    messaging,
  • remote desktop, application sharing
  • Communications - non-interactive
  • email, broadcast programming, music
  • video progressive download, live streaming,
    interactive
  • Information gathering
  • http(s), Web 2.0, file transfer
  • Recreational
  • gaming, p2p file transfer
  • Malicious
  • DoS, malware injection, illicit information
    retrieval

10
What do applications need ?
  • Some applications only require availability
  • Some also require minimum available throughput
  • Some require end-to-end (or round-trip, RT) delay
    below some bound
  • Some require packet loss ratio (PLR) below
    some percentage
  • and these parameters are not necessarily
    independent
  • For example,
  • TCP throughput drops with PLR

11
Some rules of thumb
  • Mission Critical (and life critical) applications
    require
  • high availability
  • If there are any MC applications
  • then system traffic requires high availability
    too
  • MC applications do not necessarily require strict
    throughput
  • but always indirectly require
  • a certain minimal average throughput
  • bounded delay
  • If the MC application uses TCP then it requires
  • low PLR
  • Real-time applications require
  • sufficient throughput
  • but not necessarily low PLR (audio and video
    codecs have PLC)
  • Interactive applications require
  • low RT delay
  • It may be more scalable for an SP to measure 1-way
    delays

12
OAM
13
Monitoring an SLA
  • The Service Provider's justification for payment
  • is the maintenance of an SLA
  • To ensure SLA compliance, the SP must
  • monitor the SLA parameters
  • take action if a parameter is dropping below
    compliance levels
  • But how does the SP verify/ensure that the SLA is
    being met ?
  • Monitoring is carried out using
  • Operations, Administration, Maintenance (OAM)
  • The customer too may use OAM to see that the SP
    is compliant !
  • Technical note
  • OAM is a user-plane function
  • but may influence control and management plane
    operations
  • for example
  • OAM may trigger protection switching, but doesn't
    switch
  • OAM may detect provisioned links, but doesn't
    provision them

14
Operations, Administration, Maintenance
  • Traditionally, one distinguishes between 2 OAM
    functionalities
  • Fault Monitoring
  • OAM runs continuously/periodically at required
    rate
  • detection and reporting of anomalies, defects,
    and failures
  • used to trigger mechanisms in the
  • control plane (e.g. protection switching) and
  • management plane (alarms)
  • required for maintenance of basic connectivity
    (availability)
  • Performance Monitoring
  • OAM run
  • before enabling a service
  • on-demand or
  • per schedule
  • measurement of performance criteria (delay, PDV,
    etc.)
  • required for maintenance of all other QoE
    attributes

15
Early OAM
  • Analog channels and 64 kbps digital channels
  • did not have mechanisms to check signal validity
    and quality
  • Thus
  • major faults could go undetected for long periods
    of time
  • hard to characterize and localize faults when
    reported
  • minor defects might be unnoticed indefinitely
  • As PDH networks evolved, more and more OAM was
    added on
  • monitoring for valid signal
  • loopbacks
  • defect reporting
  • alarm indication/inhibition
  • The OAM overhead started to explode in size !
  • When SONET/SDH was designed
  • bounded overhead was reserved for OAM functions

16
OAM for Packet Switched Networks
  • OAM is more complex for Packet Switched Networks
  • in addition to the previous defects
  • loss of signal
  • bit errors
  • we have new defect types
  • packets may be lost
  • packets may be delayed
  • packets may be delivered to the wrong destination
  • The first PSN-like network to acquire OAM was ATM
    (I.610)
  • Although technically ATM is cell-based, not
    packet-based

17
Some FM OAM mechanisms (1)
  • How do we perform Continuity Check ?
  • send OAM packets at a constant known rate
  • if CC packets are not received for > 3 intervals
    then declare a fault
  • see also LB / echo mode
  • How do we perform Connectivity Verification ?
  • send OAM packets to a known destination
  • if CV packets are received somewhere else then
    declare a fault
  • How do we indicate AIS (FDI) ?
  • when forward traffic is not received, send AIS OAM
    packets
  • if AIS packets are received then declare a fault
  • How do we indicate RDI (BDI) ?
  • when reverse traffic is not received, send RDI OAM
    packets
  • if RDI packets are received then declare a fault
  • Note: RDI is often a flag set on the CC message
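The Continuity Check rule above (declare a fault when no CC packet has arrived for more than 3 intervals) can be sketched as a small monitor. The class and method names are hypothetical, time is passed in explicitly so the logic is easy to test, and the treatment of the start-up period is an assumption.

```python
class ContinuityMonitor:
    """Tracks CC arrivals and flags a fault after too many missed intervals."""

    def __init__(self, interval_s: float, start_time: float, max_missed: float = 3.0):
        self.interval_s = interval_s
        self.max_missed = max_missed
        # Assumption: the start time counts as the last reception.
        self.last_rx = start_time

    def on_cc_received(self, now: float) -> None:
        """Record the arrival time of a CC packet."""
        self.last_rx = now

    def is_faulty(self, now: float) -> bool:
        """True if no CC packet has arrived for more than max_missed intervals."""
        return (now - self.last_rx) > self.max_missed * self.interval_s


monitor = ContinuityMonitor(interval_s=1.0, start_time=0.0)
monitor.on_cc_received(now=10.0)
print(monitor.is_faulty(now=12.9), monitor.is_faulty(now=13.5))
```

A real implementation would run is_faulty from a timer and feed the result to protection switching or alarms; that wiring is left out here.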

18
Some FM OAM mechanisms (2)
  • How do we use LoopBack ?
  • non-intrusive (in-service) (echo mode)
  • send LB request OAM packet to remote site
  • remote site replies with LB reply
  • if LB reply not received then declare a fault
  • intrusive (out-of-service)
  • put remote site into LB mode
  • remote site reflects (and does not forward) all
    traffic
  • (note that it must still monitor OAM traffic)
  • if packets sent are not received back then declare a
    fault
  • note: next hops must be informed of the LB by locking
  • How do we use LinkTrace ?
  • send LB request OAM packet to next hop
  • send LB request to following hop
  • etc.

19
Some PM OAM mechanisms (1)
  • How do we measure Packet Loss Ratio ?
  • Traffic (counter) based
  • maintain 2 counters
  • number of packets transmitted to peer (Tx)
  • number of packets received from peer (Rx)
  • send Tx counter to peer at time 1: Tx(1)
  • peer notes its Rx counter at time of reception:
    Rx(2)
  • and its Tx counter at time of its reply: Tx(3)
  • originator notes its Rx counter when reply is
    received: Rx(4)
  • calculate PLR in both directions
  • Synthetic
  • do not maintain counters; use OAM packets
  • Note: synthetic loss is only a rough estimate
  • How do we measure Throughput?
  • Primitive way (RFC 2544)
  • send packets at maximum rate and observe packet
    loss
  • reduce rate until no loss is observed
  • Note there are more sophisticated mechanisms !
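The counter-based loss measurement can be sketched as follows. Because the counters run freely, this example assumes that loss is computed over the interval between two successive exchanges, each yielding the four values Tx(1), Rx(2), Tx(3), Rx(4); that assumption and all names below are illustrative and not taken from the slides.

```python
from dataclasses import dataclass


@dataclass
class LossSample:
    tx1: int  # originator Tx counter when the request is sent
    rx2: int  # peer Rx counter when the request is received
    tx3: int  # peer Tx counter when the reply is sent
    rx4: int  # originator Rx counter when the reply is received


def loss_ratios(prev: LossSample, cur: LossSample) -> tuple:
    """Return (originator-to-peer PLR, peer-to-originator PLR) between two samples."""
    sent_fwd = cur.tx1 - prev.tx1
    rcvd_fwd = cur.rx2 - prev.rx2
    sent_rev = cur.tx3 - prev.tx3
    rcvd_rev = cur.rx4 - prev.rx4
    fwd = (sent_fwd - rcvd_fwd) / sent_fwd if sent_fwd else 0.0
    rev = (sent_rev - rcvd_rev) / sent_rev if sent_rev else 0.0
    return fwd, rev


previous = LossSample(tx1=1000, rx2=990, tx3=500, rx4=495)
current = LossSample(tx1=2000, rx2=1980, tx3=1500, rx4=1495)
forward_plr, reverse_plr = loss_ratios(previous, current)
print(forward_plr, reverse_plr)
```

In the example the originator sent 1000 frames of which the peer saw 990, so the forward PLR is 1%, while nothing was lost in the reverse direction during the interval.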

20
Some PM OAM mechanisms (2)
  • How do we measure 1-way Packet Delay (Latency) ?
  • synchronize clocks at both OAM peers
  • send timestamp T1 to peer
  • peer timestamps receipt with T2
  • calculate time difference T2 - T1
  • How do we measure 2-way Packet Delay (Latency) ?
  • send timestamp T1 to peer
  • peer timestamps receipt with T2
  • peer replies at T3
  • originator timestamps receipt of reply at T4
  • calculate time difference (T4 - T1) - (T3 - T2)
  • assuming symmetry, 1-way delay is half this
    amount
  • Note: no need to synchronize clocks
  • How do we measure Packet Delay Variation ?
  • send timestamps at a constant rate
  • peer calculates timestamp differences and
    statistics thereof
  • Note: no need to synchronize clocks
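The delay calculations above reduce to simple timestamp arithmetic. The sketch below uses integer microsecond timestamps and hypothetical function names; the peer's clock is deliberately offset from the originator's to show why the 2-way and PDV results do not depend on synchronization.

```python
def one_way_delay(t1: int, t2: int) -> int:
    """T2 - T1; only meaningful when both clocks are synchronized."""
    return t2 - t1


def two_way_delay(t1: int, t2: int, t3: int, t4: int) -> int:
    """(T4 - T1) - (T3 - T2): round trip minus the peer's processing time."""
    return (t4 - t1) - (t3 - t2)


def delay_variation(tx_times: list, rx_times: list) -> list:
    """Differences between successive (rx - tx) offsets; a constant clock offset cancels."""
    offsets = [rx - tx for tx, rx in zip(tx_times, rx_times)]
    return [later - earlier for earlier, later in zip(offsets, offsets[1:])]


# Originator clock reads 0 and 30; peer clock is roughly one second ahead.
round_trip = two_way_delay(t1=0, t2=1_000_010, t3=1_000_012, t4=30)

# Packets sent every 10 us; peer clock offset by 1000 us; delays 5, 7, 5, 9 us.
pdv = delay_variation([0, 10, 20, 30], [1005, 1017, 1025, 1039])
print(round_trip, pdv)
```

Statistics such as the maximum or a percentile of the variation list can then be computed by the receiving peer.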

21
Ethernet OAM
22
What about Ethernet ?
  • Carrier Ethernet has replaced ATM as the default
    layer-2
  • Ethernet is by far the most widespread network
    interface
  • Ethernet has some advantages as compared to ATM
  • it has network-wide unique addresses
  • it has a source address in every packet
  • but some aspects make Ethernet OAM more difficult
  • ConnectionLess (CL)
  • multipoint to multipoint
  • overlapping layering: need OAM for operator,
    SP, and customer
  • some specific problematic ETH behaviors
    (flooding, multicast, ...)

23
What's the problem with CL ?
  • OAM makes a lot of sense in Connection Oriented
    environments
  • connections last a relatively long amount of time
  • there is some SLA at the connection level
  • For CL networks, the network path is neither
    known nor pinned
  • So it doesn't really make sense to talk about FM
  • what does continuity mean if, when a link goes
    down,
  • the network automatically reroutes around the
    failure ?
  • The Ethernet CL problem is solved by overlaying
    CO functionality
  • flows or
  • EVCs

24
Ethernet OAM
  • For many years there was no OAM for Ethernet
  • (LANs don't need OAM)
  • now there are two incompatible ones!
  • Link layer OAM: 802.3 clause 57 (EFM OAM,
    802.3ah)
  • single link only
  • slow protocol, limited functionality
  • some management functions
  • Service OAM: Y.1731, 802.1ag (CFM)
  • any network configuration
  • multilevel OAM functionality
  • In some cases one may need to run both
  • while in others only service OAM makes sense
  • Link layer OAM is only for a single link, which
    is necessarily CO
  • Service OAM is most frequently used for
    infrastructure networks,
  • which are also CO

25
Layer 2 control protocols (L2CPs)
  • Do not be confused - L2CPs are NOT OAM !
  • Here are a few well-known L2CPs

Protocol | DA and encapsulation | Reference
STP/RSTP/MSTP | 01-80-C2-00-00-00, 802.2 LLC | 802.1D clauses 8, 9; 802.1D clause 17; 802.1Q clause 13
PAUSE | 01-80-C2-00-00-01 | 802.3 Annex 31B, 802.3x
LACP/LAMP | 01-80-C2-00-00-02, EtherType 88-09, Subtype 01 and 02 | 802.3 clause 43 (ex 802.3ad)
Link OAM | 01-80-C2-00-00-02, EtherType 88-09, Subtype 03 | 802.3 clause 57 (ex 802.3ah)
ESMC | 01-80-C2-00-00-02, EtherType 88-09, Subtype 10 | G.8264
Port Authentication | 01-80-C2-00-00-03 | 802.1X
E-LMI | 01-80-C2-00-00-07 | MEF-16
Provider MSTP | 01-80-C2-00-00-08 | 802.1D, 802.1ad
Provider MMRP | 01-80-C2-00-00-0D | 802.1ak
LLDP | 01-80-C2-00-00-0E, EtherType 88-CC | 802.1AB-2009
GARP (GMRP, GVRP) | block 01-80-C2-00-00-20 through 01-80-C2-00-00-2F | 802.1D clauses 10, 11, 12

Note: IEEE disallows forwarding of L2CPs; MEF
allows it under certain circumstances
26
Link Layer OAM (AKA EFM OAM)
  • Ethernet in the First Mile (Last Mile ?)
  • EFM networks are mostly p2p DSL links or p2mp
    PONs
  • thus a link layer OAM is sufficient for EFM
    applications
  • Since EFM link is between customer and Service
    Provider
  • EFM OAM entities are either active (SP) or
    passive (customer)
  • active entity can place passive one into LB mode
    but not the reverse
  • EFM OAMPDUs are slow-protocol frames, never
    forwarded
  • Ethertype 88-09 and subtype 03
  • messages multicast to slow protocol specific
    group address
  • OAMPDUs must be sent once per second (heartbeat)
  • messages are TLV-based

27
EFM OAM capabilities
  • 6 codes are defined
  • Information (autodiscovery, heartbeat, fault
    notification)
  • Event notification (statistics reporting)
  • Variable request (active entity queries passive's
    configuration) (mngt)
  • Variable response (passive entity responds to
    query) (mngt)
  • Loopback control (active entity enable/disable of
    intrusive LB mode)
  • Organization specific (proprietary extensions)
  • and there are flags in every OAMPDU to
  • expedite notification of critical events
  • link fault (RDI)
  • dying gasp
  • unspecified
  • monitor slow degradations in performance

28
Service OAM (AKA CFM, Y.1731)
  • Many SPs need to monitor full networks
  • not just single links
  • Service layer OAM provides end-to-end integrity
  • of the Ethernet service over arbitrary server
    layers
  • Because Ethernet is flat
  • not true client-server layering (except
    MAC-in-MAC)
  • service layer OAM is multilevel
  • Because SPs want to replace transport networks
    with Ethernet
  • service OAM must support all OAM features
  • and must enable advanced transport capabilities
  • (such as linear/ring protection switching)
  • a transport network is a network with
  • High availability (Fault Management OAM and
    Automatic Protection Switching)
  • SLA support (Performance Management OAM and QoS
    mechanisms)
  • a Management plane (optionally a control plane)
    for configuration and provisioning
  • Efficiency and Scalability

29
Y.1731 messages
  • Y.1731 supports many OAM message types
  • Continuity Check: proactive heartbeat with 7
    possible rates
  • Synthetic Loss Measurement: on-demand loss rate
    estimation
  • LoopBack: unicast/multicast pings with optional
    patterns
  • Link Trace: identify path taken, to detect
    failures and loops
  • AIS: periodically sent when CC fails
  • RDI: flag set to indicate reverse defect
  • Client Signal Fail: sent by MEP when client
    doesn't support AIS
  • LoCK signal: inform peer entity about diagnostic
    actions
  • TeST signal: in-service/out-of-service tests for
    loss rate, etc.
  • Automatic Protection Switching
  • Maintenance Communications Channel: remote
    maintenance
  • EXPerimental
  • Vendor SPecific

30
Y.1731 frame format
  • after DA, SA and Ethertype (8902)
  • Y.1731/802.1ag PDUs have the following header
    (may be VLAN tagged)
  • if there are sequence numbers/timestamp(s)
  • they immediately follow
  • then come TLVs, the end TLV, followed by the
    CRC
  • TLVs have 1B type and 2B length fields
  • there may or may not be a value field
  • the end-TLV has type zero and no length or
    value fields
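The TLV layout just described (1-byte type, 2-byte length, optional value, and an end TLV of type zero with no length) can be walked with a short parser. This is a sketch only: it assumes the length field is in network byte order and covers just the TLV area, not the fixed header or any sequence numbers and timestamps before it. The function name is hypothetical.

```python
import struct


def parse_tlvs(data: bytes) -> list:
    """Return a list of (type, value) pairs, stopping at the end TLV (type 0)."""
    tlvs = []
    offset = 0
    while offset < len(data):
        tlv_type = data[offset]
        offset += 1
        if tlv_type == 0:  # end TLV: no length or value fields follow
            return tlvs
        if offset + 2 > len(data):
            raise ValueError("truncated TLV length field")
        (length,) = struct.unpack_from("!H", data, offset)  # assumed big-endian
        offset += 2
        value = data[offset:offset + length]
        if len(value) != length:
            raise ValueError("truncated TLV value field")
        tlvs.append((tlv_type, value))
        offset += length
    raise ValueError("end TLV not found")


# Two TLVs (one with a 2-byte value, one with an empty value), then the end TLV.
sample = bytes([3, 0, 2, 0xAA, 0xBB, 31, 0, 0, 0])
parsed = parse_tlvs(sample)
print(parsed)
```

The type codes 3 and 31 in the sample are arbitrary test values, not a claim about which TLV types a given PDU carries.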

31
Y.1731 PDU types
opcode | OAM type | DA
1 | CCM | M1 or U
3 | LBM | M1 or U
2 | LBR | U
5 | LTM | M2
4 | LTR | U
6-31 | RES IEEE |
32-63 (unused) | RES ITU-T |
33 | AIS | M1 or U
35 | LCK | M1 or U
37 | TST | M1 or U
39 | Linear APS | M1 or U
40 | Ring APS | M1 or U
41 | MCC | M1 or U
43 | LMM | M1 or U
42 | LMR | U
45 | 1DM | M1 or U
47 | DMM | M1 or U
46 | DMR | U
49 | EXM |
48 | EXR |
51 | VSM |
50 | VSR |
52 | CSF | M1 or U
55 | SLM | U
54 | SLR | U
64-255 | RES IEEE |
(DA: M1 = class 1 multicast, M2 = class 2 multicast, U = unicast)
32
MEPs and MIPs
  • Maintenance Entity (ME): an entity that requires
    maintenance
  • an ME is a relationship between ME end points
  • because Ethernet is MP2MP, we need to define an ME
    Group (MEG)
  • MEGs can be nested, but not overlapped
  • MEG LEVEL takes a value from 0 to 7
  • by default: 0,1,2 operator; 3,4 SP; 5,6,7
    customer
  • MEP: MEG End Point (MEG = ME Group, ME =
    Maintenance Entity)
  • (in IEEE terminology the MEG is called an MA,
    Maintenance Association)
  • unique MEG IDs specify to which MEG we send the
    OAM message
  • MEPs are responsible for OAM messages not leaking out
  • but transparently transfer OAM messages of higher
    level
  • MIPs: MEG Intermediate Points
  • never originate OAM messages
  • process some OAM messages
  • transparently transfer others
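The MEG level filtering performed by a MEP can be sketched as a three-way decision. The slides state only that MEPs stop OAM from leaking out and pass higher-level OAM transparently; discarding frames of a lower level is an assumption made here to model "not leaking", and the function name is hypothetical.

```python
def mep_action(frame_level: int, mep_level: int) -> str:
    """Decide what a MEP does with an OAM frame, based on MEG levels (0 to 7)."""
    for level in (frame_level, mep_level):
        if not 0 <= level <= 7:
            raise ValueError("MEG level must be between 0 and 7")
    if frame_level > mep_level:
        return "forward"   # higher-level OAM is transferred transparently
    if frame_level == mep_level:
        return "process"   # OAM of the MEP's own MEG level terminates here
    return "discard"       # assumption: lower-level OAM must not leak out


# A service-provider MEP at level 3 seeing customer, own, and operator OAM.
decisions = [mep_action(level, mep_level=3) for level in (5, 3, 1)]
print(decisions)
```

With the default level assignment, this is what lets customer OAM (levels 5 to 7) cross the provider's network untouched while operator OAM stays inside the operator's domain.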

33
MEPs and MIPs (cont.)
34
How is OAM used ?
  • MEF-30 Service OAM FM and MEF-xx Service OAM PM
  • describe the use of OAM for Carrier Ethernet
    networks, such as
  • which Y.1731/802.1 features/messages should be
    used
  • where to put MEPs, and what MA names and MEG levels
    should be used
  • minimum number of EVCs that must be supported
  • what should be reported and how
  • Y.1564 (ex Y.156sam): Ethernet Service Activation
    Test Methodology
  • describes commissioning procedures (replaces
    RFC2544-like benchmarking)
  • Tests that desired performance level can be
    achieved, including
  • CIR, EIR (and optionally CBS and EBS for
    bursting)
  • traffic policing
  • rate, loss, delay, delay variation, availability
    (measured simultaneously)
  • Testing in two steps
  • Service Configuration Test: each service
    separately
  • Service Performance Test: all services together
  • Performance testing may be for
  • 15 minutes (new service on operational network)
  • 2 hours (single operator network)
  • 24 hours (multiple operator networks)

35
QoS enforcement
36
QoS approaches
  • There are two approaches to QoS handling
  • IntServ (guaranteed QoS)
  • define traffic flows (CO approach)
  • guarantee QoS attributes for each flow
  • reserve resources at each router along the flow
  • signaling protocol (e.g., RSVP) needed
  • DiffServ (statistical QoS)
  • retain CL paradigm
  • no guaranteed QoS attributes
  • mark packets (differentiated, e.g., gold,
    silver, bronze)
  • marking can be by VLAN, P-bits, IP-ToS/DSCP, or
    general flow
  • offer special treatment (priority) relative to
    other packets
  • no resource reservation
  • For Ethernet and IP, DiffServ is the preferred
    approach

37
Some fields for marking
  • Example
  • For an IPv4 packet inside Q-in-Q Ethernet
  • we have various choices for marking priority

  • 802.1p user priority field (AKA P-bits)
  • values 0 to 7
  • priority tagging (VLAN 0) if no VLAN
  • P=0 means non-expedited traffic
  • 802.1Q recommends mappings
  • IP ToS
  • RFC 2474 redefined ToS to contain
  • 6 bit DSCP (see also RFC 4594)
  • 2 bit ECN
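The split of the former IP ToS byte into a 6-bit DSCP and a 2-bit ECN field is plain bit manipulation. The helper names below are hypothetical; the example value uses DSCP 46 (Expedited Forwarding), a commonly used code point for voice.

```python
def split_tos(tos: int) -> tuple:
    """Split the 8-bit ToS / Traffic Class byte into (DSCP, ECN)."""
    if not 0 <= tos <= 0xFF:
        raise ValueError("ToS must fit in one byte")
    return tos >> 2, tos & 0x03


def make_tos(dscp: int, ecn: int = 0) -> int:
    """Build the ToS byte from a 6-bit DSCP and a 2-bit ECN value."""
    if not 0 <= dscp <= 0x3F or not 0 <= ecn <= 0x03:
        raise ValueError("DSCP is 6 bits and ECN is 2 bits")
    return (dscp << 2) | ecn


dscp, ecn = split_tos(0xB8)
print(dscp, ecn, hex(make_tos(46)))
```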

38
Queuing
  • Ethernet switches have queues (FIFO buffers)
  • on each output port
  • If there were only one queue
  • then traffic handling would be FIFO
  • To enable DiffServ prioritization
  • multiple queues are used
  • Outgoing frames are inserted into queues
  • according to priority marking
  • Many methods for emptying queues
  • The most popular are
  • Strict Priority
  • always take from nonempty queue
  • of highest priority
  • Weighted Fair Queuing
  • take from nonempty queues according
  • to configured weight
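Strict Priority, the simpler of the two disciplines above, can be sketched in a few lines. The class name is hypothetical, and the sketch assumes that a larger queue index means higher priority; Weighted Fair Queuing is not shown.

```python
from collections import deque


class StrictPriorityScheduler:
    """One FIFO queue per priority; always serve the highest nonempty queue."""

    def __init__(self, num_priorities: int = 8):
        self.queues = [deque() for _ in range(num_priorities)]

    def enqueue(self, frame, priority: int) -> None:
        """Insert a frame into the queue selected by its priority marking."""
        self.queues[priority].append(frame)

    def dequeue(self):
        """Return the next frame to transmit, or None if all queues are empty."""
        for queue in reversed(self.queues):  # assumption: highest index first
            if queue:
                return queue.popleft()
        return None


scheduler = StrictPriorityScheduler()
scheduler.enqueue("best-effort", priority=0)
scheduler.enqueue("voice-1", priority=5)
scheduler.enqueue("voice-2", priority=5)
order = [scheduler.dequeue() for _ in range(4)]
print(order)
```

The example also shows the known weakness of strict priority: the low-priority frame is served only once every higher queue is empty, which is what weighted schemes are meant to soften.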

39
Traffic shaping
  • One of the most important parts of an SLA is the
  • Committed Information Rate (bps)
  • This is the datarate (bandwidth) the SP guarantees
    will be forwarded
  • There may also be an
  • Excess Information Rate (bps)
  • This is a datarate that the SP will forward if
    possible
  • Packet traffic is often bursty
  • A customer who did not send data for a while
  • will expect to be able to send a higher rate
    afterwards
  • This is accomplished via traffic shaping
  • time integration is accomplished by leaky/token
    buckets
  • the effect of shaping is marking drop eligibility
  • (marking a packet on the line is only possible
    with S-tags!)
  • There is often also traffic policing
  • policing simply discards packets to police a
    maximum rate !

40
MEF token bucket algorithm
  • Metro Ethernet Forum 10.x defines a bandwidth
    profile
  • there are two byte buckets, C of size CBS and E
    of size EBS (in bytes)
  • tokens are added to the buckets at rate CIR/8 and
    EIR/8
  • when bucket overflows tokens are lost (use it or
    lose it)
  • if ingress frame length < number of tokens in C
    bucket
  • frame is green and its length in tokens is
    debited from C bucket
  • else
  • if ingress frame length < number of tokens in E
    bucket
  • frame is yellow and its length in tokens is
    debited from E bucket
  • else frame is red
  • green frames are delivered
  • and service objectives apply
  • yellow frames are delivered
  • but service objectives don't apply
  • red frames are discarded
  • for simplicity we assume
  • no coupling and
  • no sharing !
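The two-bucket algorithm above can be sketched as a small colour marker. This follows the slide's simplified description (no coupling, no sharing) and uses the same strict comparison; the class name, the explicit time argument, and the choice to start with full buckets are assumptions for the example, so this should not be read as the normative MEF algorithm.

```python
class BandwidthProfile:
    """Two token buckets: C (size CBS, filled at CIR/8) and E (size EBS, filled at EIR/8)."""

    def __init__(self, cir_bps, cbs_bytes, eir_bps, ebs_bytes, start_time=0.0):
        self.cir_bps, self.cbs_bytes = cir_bps, cbs_bytes
        self.eir_bps, self.ebs_bytes = eir_bps, ebs_bytes
        # Assumption: both buckets start full.
        self.c_tokens = float(cbs_bytes)
        self.e_tokens = float(ebs_bytes)
        self.last_time = start_time

    def _refill(self, now: float) -> None:
        """Add tokens for the elapsed time; overflowing tokens are lost."""
        elapsed = now - self.last_time
        self.last_time = now
        self.c_tokens = min(self.cbs_bytes, self.c_tokens + elapsed * self.cir_bps / 8)
        self.e_tokens = min(self.ebs_bytes, self.e_tokens + elapsed * self.eir_bps / 8)

    def color(self, frame_len: int, now: float) -> str:
        """Classify an ingress frame as green, yellow, or red."""
        self._refill(now)
        if frame_len < self.c_tokens:      # strict comparison, as on the slide
            self.c_tokens -= frame_len
            return "green"
        if frame_len < self.e_tokens:
            self.e_tokens -= frame_len
            return "yellow"
        return "red"


profile = BandwidthProfile(cir_bps=8000, cbs_bytes=2000, eir_bps=8000, ebs_bytes=2000)
burst = [profile.color(1500, now=0.0) for _ in range(3)]
later = profile.color(1000, now=1.0)
print(burst, later)
```

In the example a back-to-back burst of three 1500-byte frames exhausts first the C bucket and then the E bucket; one second later 1000 bytes of tokens have been added to each bucket, so a 1000-byte frame is green again.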