Title: MPLS
1MPLS
- Yaakov (J) Stein February 2006
- Chief Scientist
- RAD Data Communications
2Why study MPLS?
- it's new
- it's different and interesting
- along with IPv6 it will save the Internet
- it enables delivery of IP VPN services
- it facilitates pseudowire functionality
- all of the above
3Course Outline
- 1) Introduction
- 2) MPLS Forwarding Component
- 3) MPLS Control Component
- 4) MPLS Traffic Engineering
- 5) MPLS Applications
4Introduction
- 1.1 Review of forwarding and routing
- 1.2 Problems with IP routing
- 1.3 Label Switching
5Forwarding and Routing
- a router (switch) performs 2 distinct algorithms
- forwarding algorithm (forwarding component)
- routing algorithm (control component)
- may be manual configuration or signaling protocol
- from the forwarding point of view
- there are three different types of network
- broadcast (e.g. Ethernet)
- connectionless (e.g. IP)
- connection oriented (e.g. ATM)
1.1
6Broadcast (e.g. Ethernet)
- message is sent to all possible recipients
- all receivers check message destination address
- if message destination address does not match receiver's address, message is ignored (except in promiscuous mode)
- if message destination address matches receiver's address, message is processed
1.1
7Connectionless Forwarding
- broadcast is only practical for small networks (e.g. LANs)
- large networks can be broken into subnetworks
- with routers performing the internetworking
- such a network is connectionless (CL) if
- no setup is required before sending data
- each router makes independent forwarding decision
- packets are self-describing
- packet inserted anywhere will be properly forwarded
- Notes
- the address must have global significance
- IP actually relies on a L2 protocol (Ethernet, PPP) for final stage
1.1
8IP Routing
- Distance Vector (Bellman-Ford), e.g. RIP
- send <addr,cost> to neighbors
- routers maintain cost to all destinations
- need to solve count to infinity problem
- Path Vector, e.g. BGP
- send <addr,cost,path> to neighbors
- similar to distance vector, but w/o count to infinity problem
- like distance vector has slow convergence
- doesn't require consistent topology
- can support hierarchical topology => exterior protocol (EGP)
- Link State, e.g. OSPF, IS-IS
- send <neighbor-addr,cost> to all routers
- determine entire flat network topology (Dijkstra's algorithm)
- fast convergence, guaranteed loopless => interior routing protocol (IGP)
- convergence time is the time taken until all routers work consistently
- before convergence is complete packets may be misforwarded, and there may be loops
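The distance vector exchange can be sketched as a table merge. This is a minimal illustration, not RIP itself: the function name, the table layout and the cap of 16 as the "unreachable" cost are assumptions made for the example.

```python
INFINITY = 16  # assumed cap on the cost, standing in for "unreachable"

def dv_update(table, neighbor, link_cost, advertised):
    """Merge one neighbor's <addr, cost> advertisement into our table.

    table maps destination -> (cost, next_hop). Returns True if it changed.
    """
    changed = False
    for dest, cost in advertised.items():
        new_cost = min(INFINITY, link_cost + cost)
        current = table.get(dest)
        # Take a cheaper route, or any change reported by the current next hop.
        if (current is None
                or new_cost < current[0]
                or (current[1] == neighbor and new_cost != current[0])):
            table[dest] = (new_cost, neighbor)
            changed = True
    return changed
```

Capping the cost is one simple way to bound the count to infinity problem: a route whose cost keeps growing eventually reaches the cap and is treated as unreachable.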
1.1
9IP Forwarding
- what field is used for forwarding?
- field used depends on application
- regular: use longest destination prefix match
- ToS: use longest destination prefix match and exact match on ToS
- multicast: use longest/exact match on source/group address
- how is forwarding performed?
- forwarding algorithm depends on routing algorithm!
- for Distance Vector - forward by cost
- for Path Vector - forward by cost and path
- for Link State - Dijkstra's algorithm
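Longest destination prefix match can be sketched with Python's standard ipaddress module. The route entries and next-hop names below are made up for illustration, and a linear scan is used for clarity; real routers use tries or hardware lookups.

```python
import ipaddress

def longest_prefix_match(routes, destination):
    """Return the next hop of the most specific prefix containing destination.

    routes is a list of (prefix_string, next_hop) pairs; None means no match.
    """
    addr = ipaddress.ip_address(destination)
    best_net, best_hop = None, None
    for prefix, next_hop in routes:
        net = ipaddress.ip_network(prefix)
        if addr in net and (best_net is None or net.prefixlen > best_net.prefixlen):
            best_net, best_hop = net, next_hop
    return best_hop

# Hypothetical routing table: a default route plus two nested prefixes.
ROUTES = [
    ("0.0.0.0/0", "default-gw"),
    ("192.115.0.0/16", "router-A"),
    ("192.115.6.0/24", "router-B"),
]
```

With this table, an address inside 192.115.6.0/24 goes to router-B even though it also matches the /16 and the default route.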
1.1
10Connection Oriented Forwarding
- each forwarder maintains forwarding table (or table per input port)
- control component
- route must be set-up (table must be updated) before data sent
- set-up may be manual or signaled
- once route no longer needed it should be torn-down
1.1
11CO Forwarding
- CO addresses need not have global significance
- by using locally defined addresses
- addresses are smaller in size
- no need for global allocation mechanism
- no need to maintain global database
- L2 forwarding is based on address read and swap
- Note
- when addresses are purely local
- CO forwarding is called L2 (link layer) forwarding
1.1
12CO Forwarding Table
input port  input address  output port  output address
1           21             2            21
1           37             1            21
2           12             5            12
3           5              5            12
4           15             3            37
Note: never really have an input port column. Either it is irrelevant (CO address is per switch not per input port), or if needed (e.g. ATM) we keep separate tables per input port (more efficient)
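The read-and-swap step can be sketched with the rows of the table above held in a dictionary. The function name and the error handling are assumptions of this sketch.

```python
# (input port, input address) -> (output port, output address), rows from the table above
CO_TABLE = {
    (1, 21): (2, 21),
    (1, 37): (1, 21),
    (2, 12): (5, 12),
    (3, 5): (5, 12),
    (4, 15): (3, 37),
}

def co_forward(in_port, in_addr, payload):
    """Read the local address, swap it, and return (out_port, out_addr, payload)."""
    try:
        out_port, out_addr = CO_TABLE[(in_port, in_addr)]
    except KeyError:
        # No entry means the route was never set up (or was already torn down).
        raise LookupError(f"no connection for port {in_port}, address {in_addr}")
    return out_port, out_addr, payload
```

The lookup is an exact match on a small local key, which is why CO forwarding needs neither global addresses nor a search through prefixes.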
1.1
13CO Routing
- for CO forwarding we need to find a route at setup
- manual configuration (strict explicit route)
- use routing protocols (e.g. P-NNI for ATM)
- CO routing protocols search for a global path
- hence they can guarantee global characteristics (SLA)
- pure CL routing can not guarantee path QoS
- information source usually knows desired characteristics
- so usually use source routing
- source routing
- can force route to go through specific forwarders (loose explicit route)
- can reject forwarders that can not guarantee needed characteristic
- can request resource reservation in selected forwarders
1.1
14Problems with IP routing
- scalability
- router table overload
- routing convergence slow-down
- increase in queuing time and routing traffic
- problems specific to underlying L2 technologies
- hard to implement load balancing
- QoS and Traffic Engineering
- problem of routing changes
- difficulties in routing protocol update
- lack of VPN services
1.2
15Scalability
- when IP was first conceived scalability was not a problem
- as IP traffic increases, routing shows stress
- simplistic example
- N hosts
- each router serves M hosts
- each router entry takes a bytes
- hence
- router table size = a N ~ N
- N / M ~ N routers (more routers => slower convergence)
- packet processing time ~ N (since have to examine entire table)
- N routers send to N routers tables of size N
- so routing table update traffic increases ~ N^3 (or N^4)
- IP routing requires 1000s of clocks per decision
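The simplistic estimate above can be checked numerically. The function and parameter names are invented for this sketch, and it assumes one table entry per host and that every router sends its full table to every router, as in the example.

```python
def routing_load(n_hosts, hosts_per_router, entry_bytes):
    """Return (routers, table_bytes, update_bytes) for the simplistic model."""
    routers = n_hosts // hosts_per_router            # N / M
    table_bytes = entry_bytes * n_hosts              # a * N
    update_bytes = routers * routers * table_bytes   # N routers x N routers x table
    return routers, table_bytes, update_bytes
```

Doubling the number of hosts doubles the router count and the table size, so the update traffic grows by a factor of eight, which is the N^3 behavior.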
1.2
16L2 Backbone Scalability
- instead of expensive and slow IP routers
- we can use faster and cheaper ATM switches in core
- but this doesn't help! ATM switches are transparent to routers
- since ATM switches do not participate in IP routing protocols
- every IP router must be logically adjacent to every other
- and we need N^2 ATM VCs !
- if only the ATM switches could understand IP routing protocols!
1.2
17Load Balancing
- using hop-count as path cost
- traffic destined for G from both A and B goes through D
- so links CD and DG become congested
- while links CE, EF, and FG are underutilized
- since IP forwarding uses only destination address
- this can't be overcome in pure IP routing
- problem even exists in equal-cost multi-path (ECMP) case
- solution requires traffic engineering
1.2
18QoS and TE
- pure IP, being CL, can not guarantee path QoS
- but other protocols in the IP suite can help
- TCP adds CO layer compensating for loss and misordering
- IntServ (RSVP) sets up path with reserved resources
- DiffServ (ToS) prioritizes packets (neither -Serv widely deployed)
- so IP network managers mostly use network engineering
- i.e. throw BW at the problem
- rather than traffic engineering
- i.e. optimally exploit the BW you have
1.2
19Routing Changes
- IP routing is satisfactory in the steady state
- but what happens when something changes?
- change in routing information
- (new router, router failure, new inter-router connection, etc)
- necessitates updating of tables of all routers
- convergence will be slow
- change in routing protocol is even worse
- (e.g. Bellman-Ford to present, classes to classless, IPv4 to IPv6)
- necessitates upgrade of all router software
- upgrade may have to be simultaneous
- need more complete separation of forwarding from routing functionality
1.2
20VPN Services
- IP was designed to interconnect LANs
- not to provide VPN services
- all routers logically interconnected, so security weak
- LANs may use non globally unique addresses (present solution - NAT)
- complex provider - customer relationship
- VC-merge problem (discussed later)
1.2
21Solution - Label Switching
- label switching adds the strength of CO to CL forwarding
- label switching has three stages
- routing (topology determination) using L3 protocols
- path setup (label binding and distribution)
- data forwarding
- label switching is the solution to all of the above problems
- speeds up forwarding
- decreases forwarding table size (by using local labels)
- load balance by explicitly setting up paths
- complete separation of routing and forwarding algorithms
- no new routing algorithm needed
- but new signaling algorithm may be needed
1.3
22Where is it?
- unlike TCP, the CO layer lies under the CL layer
- if there is a broadcast L2 (e.g. Ethernet), the CO layer lies above it
- hence, label switching is sometimes called layer 2.5 switching
1.3
23Labels
- a label is a short, fixed length, structure-less address
- the following are not labels
- telephone number (not fixed length, country-code + area-code + local-number)
- Ethernet address (too long, note vendor-code is not meaningful structure)
- IP address (too long, has fields)
- ATM address (has VP/VC)
- not explicit requirement, but normally only local in significance
- label(s) added to CL packet, in addition to L3 address
- layer 2.5 forwarding
- may find a different route than the L3 forwarding
- is faster than L3 forwarding
- requires a flow setup process and signaling protocol
1.3
24Forwarding Equivalence Class
- equivalence class - set of entities sharing common characteristics
- that can be considered equivalent for some purpose
- Theorem (from set theory) - any equivalence relation (e.g. common features)
- divides all entities into non-overlapping equivalence classes
- any forwarding algorithm need only consider
- destination (and sometimes source) address
- service requirements (e.g. priority, BW, allowed delay, etc)
- we can group together all packets with the same
- destination and service requirements as a FEC
- by the theorem every packet belongs to one unique FEC
- packets in the same FEC should follow the same route
- so we should map them to the same label (bind them)
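Binding packets to labels by FEC can be sketched as follows. Here a FEC is taken to be the pair (longest matching destination prefix, service class). The class name, the prefixes and the starting label value are illustrative assumptions; starting at 16 simply skips the low label values that MPLS reserves.

```python
import ipaddress
from itertools import count

class FecBinder:
    """Assign one label per FEC, where a FEC is (destination prefix, service class)."""

    def __init__(self, prefixes, first_label=16):
        self._prefixes = [ipaddress.ip_network(p) for p in prefixes]
        self._labels = {}                 # FEC -> label
        self._next = count(first_label)   # source of unused labels

    def classify(self, dst, service):
        """Return the FEC of a packet, or None if no prefix matches."""
        addr = ipaddress.ip_address(dst)
        matches = [n for n in self._prefixes if addr in n]
        if not matches:
            return None
        return (max(matches, key=lambda n: n.prefixlen), service)

    def label_for(self, dst, service="best-effort"):
        """Return the label bound to the packet's FEC, binding a new one if needed."""
        fec = self.classify(dst, service)
        if fec is None:
            return None
        if fec not in self._labels:
            self._labels[fec] = next(self._next)
        return self._labels[fec]
```

Two packets with different destination addresses inside the same prefix and with the same service class receive the same label, while the same destination with a different service class falls into a different FEC.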
1.3
25FECs
- what constitutes a FEC ?
- for IP routing (de facto)
- all packets w/ same destination IP prefix
- that prefix being the longest in the routing table
- for MPLS we can decide !
- coarsest granularity
- all packets with a destination address
- served by a given router
- finest granularity
- all packets from given source socket
- to given destination socket
- with specified handling requirements
1.3
26Label Switching Architecture
- label switching is needed in the core, access can be L3 forwarding
- core interfaces the access at the edge (ingress, egress)
- LSR: router that can perform label switching
- LER: LSR with non-MPLS neighbors (LSR at edge of core network)
- LSP: unidirectional path used by label switched forwarding (ingress to egress)
- not every packet needs label switching (e.g. only small number of packets, no QoS)
1.3
27Label Switched Forwarding
- LSP needs to be setup before data is forwarded
- and torn down once no longer needed
- LSR performs
- label switched forwarding for labeled packets
- label unique to LSR or unique to input interface (like ATM)
- optionally L3 forwarding for unlabeled packets
- ingress LER
- assigns packet to FEC
- labels packet
- forwards it downstream using label switching
- egress LER
- removes label
- forwards packet using L3 forwarding
- exception PHP (discussed later)
- once packet is assigned to a FEC and labeled,
- no LSR looks at the L3 headers/address
1.3
28Hierarchical Forwarding
- many networks use hierarchical routing
- decreases router table size
- increases forwarding speed
- decreases routing convergence time
- telephone numbers
- Internet DNS
- ATM
- Ethernet/802.3 address space is flat
- (even though written in byte fields)
telephone number: Country-Code 972, Area-Code 2, Exchange-Code 588, Line-Number 9159
DNS name (host . SLD . TLD): myrad.rad.co.il
1.3
29IP Routing Hierarchy
- IPv4
- originally flat space (1st come - 1st served) until router tables exploded
- then 3 classes
- now CIDR <RFC 1519>
- AS = Autonomous System
- IP exploits hierarchy by employing interior/exterior routing protocols
- IP can even support arbitrary levels of hierarchy
- by advertising aggregated addresses
- but the exploitation is not optimal
1.3
30IP Routing Hierarchy Bug
- traffic from network 1 to network 2 must go through transit domain
- to get to any host in network 2,
- all transit domain routers forward to border router 2
- transit domain routers shouldn't need to know anything about network 2
- but for IP protocols change in network 2 forces rerouting in transit domain
- in worst case, transit domain routers need to know all routes in Internet
- avoiding maintenance of interdomain information
- conserves routing table memory
- ensures faster convergence
- transit domain routers restart faster
- provides better fault isolation
except at BGP-OSPF interface
1.3
31Label Stacks
- since labels are structure-less, the label space is flat
- label switching can support arbitrary levels of hierarchy
- by using a label stack
- label forwarding based only on top label
- before forwarding, three possibilities (listed in NHLFE)
- read top label and pop
- read top label and swap
- read top label, swap, and push new label(s)
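The three possibilities can be sketched on a Python list used as the label stack, with the last element as the top label. The function name and the operation strings are assumptions of this sketch.

```python
def apply_label_operation(stack, operation, new_labels=()):
    """Return the new label stack after one operation; stack[-1] is the top label.

    operation is "pop", "swap" or "swap_push". For the last two, new_labels[0]
    replaces the old top label and any further entries are pushed above it.
    """
    if not stack:
        raise ValueError("packet carries no label")
    if operation not in ("pop", "swap", "swap_push"):
        raise ValueError(f"unknown operation: {operation}")
    result = list(stack[:-1])          # read and remove the top label
    if operation == "pop":
        return result
    if not new_labels:
        raise ValueError("swap needs a replacement label")
    result.append(new_labels[0])       # swap: replacement for the old top label
    if operation == "swap_push":
        result.extend(new_labels[1:])  # push: extra labels go on top
    return result
```

Only the top label is ever inspected; labels deeper in the stack pass through untouched, which is what makes the hierarchy possible.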
1.3
32Example Uses of Label Stack
- Example applications that exploit the label stack
- fast rerouting
- VPNs
- X over MPLS (PWE3)
- Note three labels is usually more than enough
1.3
33Fast Rerouting
- IP has no inherent recovery method (unlike SONET)
- in order to ensure resilience we provide bypass links
- to reroute quickly we pre-prepare labels for the bypass links
when link down change fwd table
[Figure: working LSP (labels 10, 11, 12, swap at each LSR) and protection LSP around the failed link (swap + push, swap, pop); the packet rejoins the working path carrying label 11]
from here on no difference!
label space per LSR not per input port
1.3
34Label Switched VPNs
Key: C = Customer router, CE = Customer Edge router, P = Provider router, PE = Provider Edge router
1.3
35Label Switched VPNs (cont.)
- customers 1 and 2 use overlapping IP addresses
- C-routers have inconsistent tables
- ingress PE-router inserts two labels
- P-routers don't see IP addresses
- so no ambiguity
- P-routers see only the label of the egress PE-router
- they don't know about VPNs at all
- no need to understand customer configuration
- hence smaller tables, no rerouting if customer reconfigures
- ingress PE router only knows about CE routers
- no need to understand customer configuration
1.3
36X over MPLS
Dictionary
MPLS-f  inner label         outer label(s)
ITU-T   interworking label  transport label(s)
IETF    PW label            tunnel label(s)
1.3
37Service Interworking
- Network interworking (PW)
- Service interworking
[Figure: Customer Edge (CE) with Native Service A - Provider Edge (PE) - provider network - Provider Edge (PE) - Customer Edge (CE) with Native Service B]
1.3
38- MPLS Forwarding Component
- 2.1 The essentials
- 2.2 The documents
39MPLS history
- many different label switching schemes were invented
- Cell Switching Router (Toshiba) <RFC 2098, 2129>
- IP Switching (Ipsilon, bought by Nokia) <RFC 2297>
- Tag Switching (Cisco) <RFC 2105>
- Aggregate Route-based IP Switching (IBM)
- IP Navigator (Cascade bought by Ascend bought by Lucent)
- so the IETF decided to standardize a single method
- BOFs 1994-1995
- WG chartered 1997
- co-chairs from Cisco and IBM (so similar to tag switching and ARIS)
- Cisco, IBM, Ascend authored architecture document
- MPLS standards-track RFCs 2001-
2.1
40MPLS
- of all the label switching technologies - what is special about MPLS ?
- multiprotocol - from above and below
- label in L2 or shim header
- single forwarding algorithm, including for multicast and TE
- can run on IP router or ATM switch with only SW upgrade
- (although can benefit from special HW)
- label distribution piggybacked on existing routing protocols or via LDP
- control-driven downstream label binding
- support for constraint-based routing (for TE)
2.1
41Multiprotocol Label Switching
- IPv4 IPv6 IPX etc.
- MPLS
- Ethernet ATM frame-relay etc.
2.1
42MPLS Labels
- label may be an appropriate address in either L2 or L3
- (even if we lose other features)
- Examples
- for ATM use VPI and VCI fields as two labels (only two labels, no TTL)
- for frame-relay use DLCI (only one label, no CoS, no TTL)
- otherwise use shim header (described later)
MPLS over ATM
2.1
43ATM Label Switching
1st cell
- MPLS WG devoted a lot of time to this case
- in this mode the label is carried in the VPI/VCI
- this facilitates using ATM switches
- however, we still need the shim header for S and TTL
- the label field in shim header is a placeholder and set to zero
2.1
44MPLS Shim Header
- when a shim header is needed, its format should be
- Label: there are 2^20 different labels (+ 2^20 multicast labels)
- Exp (CoS): left undefined by IETF WG
- was CoS in Cisco Tag Switching
- could influence packet queuing
- Stack bit: S=1 indicates bottom of label stack
- TTL: decrementing hop count
- used to eliminate infinite routing loops
- generally copied from/to IP TTL field
- Special (reserved) labels
- 0 IPv4 explicit null
- 1 router alert
- 2 IPv6 explicit null
- 3 implicit null
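The slide's header diagram did not survive extraction, so the sketch below takes the field widths from RFC 3032 (listed later in this deck): a 20-bit label, 3-bit Exp, 1-bit S and 8-bit TTL packed into one 32-bit word in network byte order. The function names are invented for the example.

```python
import struct

def pack_shim(label, exp, bottom, ttl):
    """Pack one label stack entry into 4 bytes: label(20) | exp(3) | S(1) | TTL(8)."""
    if not (0 <= label < 2**20 and 0 <= exp < 8 and 0 <= ttl < 256):
        raise ValueError("field out of range")
    word = (label << 12) | (exp << 9) | (int(bool(bottom)) << 8) | ttl
    return struct.pack("!I", word)

def unpack_shim(data):
    """Decode the first 4 bytes of data into a dict of shim header fields."""
    (word,) = struct.unpack("!I", data[:4])
    return {
        "label": word >> 12,
        "exp": (word >> 9) & 0x7,
        "bottom": bool((word >> 8) & 0x1),
        "ttl": word & 0xFF,
    }
```

A label stack is then a sequence of such 4-byte entries, with S=1 only on the last one.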
2.1
45Single Forwarding Algorithm
- IP uses different forwarding algorithms
- for unicast, unicast w/ ToS, multicast, etc.
- LSR uses one forwarding algorithm (LER is more complicated)
- read top label L
- consult Incoming Label Map (forwarding table) (Cisco terminology: LFIB)
- perform label stack operation (pop L, swap L -> M, swap L -> M and push N)
- forward based on L's Next Hop Label Forwarding Entry
- NHLFE contains
- next hop (output port, IP address of next LSR)
- if next hop is the LSR itself then operation must be pop
- for multicast there may be multiple next hops, and packet is replicated
- label stack operation to be performed
- any other info needed to forward (e.g. L2 format, how label is encoded)
- ILM contains
- a NHLFE for each incoming label
- possibly multiple NHLFEs for a label, but only one used per packet
2.1
46LER Forwarding Algorithm
- LER's forwarding algorithm is more complex
- check if packet is labeled or not
- if labeled
- then forward as LSR
- else
- lookup destination IP address in FEC-To-NHLFE Map
- if in FTN
- then prepend label
- and forward using LSR algorithm
- else forward using IP forwarding
Cisco terminology LIB
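The decision tree above can be sketched as follows. This is a simplified model under stated assumptions: the packet is a dict, the ILM maps a label directly to (operation, new label, next hop), the FTN is a list of (prefix, label, next hop) entries that already names the next hop, and ip_forward is a caller-supplied stand-in for ordinary IP forwarding. None of these names come from the MPLS specifications.

```python
import ipaddress

def lsr_forward(packet, ilm):
    """Label-switch a labeled packet using the ILM; returns (next_hop, packet)."""
    top = packet["labels"].pop()                 # read top label
    operation, new_label, next_hop = ilm[top]    # ILM lookup
    if operation == "swap":
        packet["labels"].append(new_label)
    elif operation != "pop":
        raise ValueError(f"unsupported operation: {operation}")
    return next_hop, packet

def ler_forward(packet, ilm, ftn, ip_forward):
    """Forward as an LER: labeled -> LSR algorithm, else FTN, else plain IP."""
    if packet["labels"]:
        return lsr_forward(packet, ilm)
    addr = ipaddress.ip_address(packet["dst"])
    matches = []
    for prefix, label, hop in ftn:
        net = ipaddress.ip_network(prefix)
        if addr in net:
            matches.append((net.prefixlen, label, hop))
    if matches:
        _, label, hop = max(matches)             # longest matching FEC prefix
        packet["labels"].append(label)           # prepend label
        return hop, packet
    return ip_forward(packet["dst"]), packet     # not in FTN: IP forwarding
```

Once the ingress LER has pushed a label, every later hop takes the first branch and never looks at the destination address again.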
2.1
47Penultimate Hop Popping
- the egress LER E also may have to work overtime
- read top label
- lookup label in ILM
- find in NHLFE that the label must be popped
- lookup IP address in IP routing
- forward to CE using IP forwarding
- we can save a lookup (and the first 3 steps) by performing PHP
- but pay in loss of OAM capabilities
- penultimate LSR PH performs the following
- read top label
- lookup label in ILM
- pop label revealing IP address of CE router
- forward to CE using IP forwarding
2.1
48Route Aggregation
[Figure: net 1, net 2 and net 3 attached by IP links to an MPLS domain; the diagram is annotated with the numbers 12, 13, 22, 23, 31, 32]
- traffic from both network 1 and network 2 is destined for network 3
- scalability advantages
- fewer labels
- conserve table memory
- disadvantages
- IP forwarding may be required
- OAM backwards trail is destroyed
2.1
49IP Router / ATM Switch
- migration - important to be able to exploit existing forwarding hardware
- we can use IP routers as LSRs
- already use the routing protocols for topology determination
- need to add label distribution protocol (or extend routing protocol)
- need to alter the forwarding algorithm
- can manage with software upgrade
- but probably best to upgrade hardware too
- we can use ATM switches as LSRs
- need to alter the routing protocols for topology determination
- need to add label distribution protocol
- already uses the forwarding algorithm
- probably can just upgrade software
2.1
50ATM Cell Interleave
- ATM switches
- have separate label space per input port
- segment traffic into ATM cells
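- what if the forwarding tables for both port 1 and port 2 have the entry
- incoming label 13, outgoing label 17, outgoing port 1 ?
[Figure: cells of packets A and B, each arriving with label 13 (e.g. "13 data B1", "13 data B2"), leave interleaved under the same outgoing label]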
2.1
51VC/VP Merge
- in order to properly reconstruct the packet we can use
- VC-merge
- buffer cells at the ATM LSR until end-of-packet detected
- cells belonging to different packets aren't interleaved
- introduces delay
- requires large memory
- requires ATM LSR to detect AAL5 end-of-packet
- VP merge
- use VPI as MPLS label
- use VCI as multiplexing index
- limits number of available labels
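VC-merge buffering can be sketched as one buffer per input port that is released only when the end-of-packet cell arrives. The class and the boolean end_of_packet flag are assumptions of this sketch; a real ATM LSR would detect the AAL5 end-of-packet indication in the cell header.

```python
from collections import defaultdict

class VcMerger:
    """Hold cells per input port and release a packet's cells together."""

    def __init__(self):
        self._buffers = defaultdict(list)   # input port -> cells of the current packet

    def receive(self, in_port, cell, end_of_packet):
        """Buffer one cell; return all cells of the packet once it is complete, else []."""
        self._buffers[in_port].append(cell)
        if not end_of_packet:
            return []
        return self._buffers.pop(in_port)
```

Because a packet's cells are emitted as one contiguous run, packets that share an outgoing label are never interleaved, at the cost of the delay and memory noted above.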
2.1
52Label Distribution Protocols
- when LSR creates/removes a FEC - label binding
- needs to inform other LSRs (remote binding information)
- MPLS allows piggybacking label distribution on routing protocols
- protocols already in use (don't need to invent or deploy)
- eliminates race conditions (when route or binding, but not both, defined)
- ensures consistency between binding and routing information
- only for distance vector or path vector routing protocols (not OSPF, IS-IS)
- not all routing protocols are sufficiently extensible (RIP isn't)
- has been implemented for BGP-4
- MPLS WG invented a new protocol LDP for plain label distribution
- messages sent reliably using TCP/IP
- messages encoded in TLVs
- discovery mechanism to find other LSRs
- and extended RSVP to LSPs for QoS
2.1
53Control Mechanisms
- pre-MPLS protocols used various control mechanisms
- Example: label distribution
- Cisco and IBM - control driven
- Toshiba and Ipsilon - data (traffic) driven
- MPLS is distinguished by its choice of control protocols
- control driven
- downstream binding
- unsolicited and on-demand allowed
- independent and ordered allowed
- we will discuss these in the section on the control plane
2.1
54MPLS RFCs (page 1 of 2)
- 2547 BGP/MPLS VPNs
- 2702 Requirements for Traffic Engineering Over MPLS
- 2917 A Core MPLS IP VPN Architecture
- 3031 Multiprotocol Label Switching Architecture
- 3032 MPLS Label Stack Encoding
- 3035 MPLS Using LDP and ATM VC Switching
- 3036 LDP Specification
- 3037 LDP Applicability
- 3038 VCID Notification over ATM link for LDP
- 3063 MPLS Loop Prevention Mechanism
- 3107 Carrying Label Information in BGP-4
- 3209 RSVP-TE Extensions to RSVP for LSP Tunnels
- 3210 AS for Extensions to RSVP for LSP-Tunnels
- 3212 Constraint-Based LSP Setup using LDP
- 3213 Applicability Statement for CR-LDP
- 3214 LSP Modification Using CR-LDP
- 3215 LDP State Machine
2.2
55More MPLS RFCs
- 3270 MPLS Support of Differentiated Services
- 3346 AS for Traffic Engineering with MPLS
- 3353 Overview of IP Multicast in MPLS Environment
- 3429 Assignment of the 'OAM Alert Label' for MPLS OAM functions
- 3443 TTL Processing in MPLS Networks
- 3468 The MPLS WG decision on MPLS signaling protocols
- 3469 Framework for MPLS-based Recovery
- 3477 Signalling Unnumbered Links in RSVP-TE
- 3496 Support of ATM Service Class-aware MPLS Traffic Engineering
- 3564 Support of DiffServ-aware MPLS Traffic Engineering
- 3478 Graceful Restart Mechanism for LDP
- 3479 Fault Tolerance for the LDP
- 3480 Signalling Unnumbered Links in CR-LDP
2.2
56Yet More MPLS RFCs
- 3612 Applicability Statement for Restart Mechanisms for LDP
- 3630 OSPF-TE
- 3809 Generic Requirements for PPVPNs
- 3811 Definitions of Textual Conventions for MPLS Management
- 3812 MPLS Traffic Engineering Management Information Base
- 3813 MPLS Label Switching Router (LSR) MIB
- 3814 MPLS FEC-To-NHLFE MIB
- 3815 Definitions of Managed Objects for LDP
- 3916 Requirements for Pseudo-Wire Emulation Edge-to-Edge
- 3988 Maximum Transmission Unit Signalling Extensions for LDP
- 3985 PWE3 Architecture
- 4023 Encapsulating MPLS in IP or GRE
- 4026 PPVPN Terminology
- 4031 Service requirements for Layer 3 PPVPNs
2.2
57Even More MPLS RFCs
- 4023 Encapsulating MPLS in IP or Generic Routing Encapsulation (GRE)
- 4090 Fast Reroute Extensions to RSVP-TE for LSP Tunnels
- 4105 Requirements for Inter-Area MPLS Traffic Engineering
- 4124 Protocol Extensions for Support of Diffserv-aware MPLS TE
- 4125 Maximum Allocation BW Constraints Model for Diffserv-aware MPLS TE
- 4126 Max Allocation w/ Reservation BW Constraints Model for Diffserv MPLS TE
- 4127 Russian Dolls Bandwidth Constraints Model for Diffserv-aware MPLS TE
- 4128 Bandwidth Constraints Models for Diffserv-aware MPLS TE
- 4182 Removing a Restriction on the use of MPLS Explicit NULL
- 4201 Link Bundling in MPLS Traffic Engineering (TE)
- 4206 Label Switched Paths (LSP) Hierarchy with GMPLS Traffic Engineering
- 4216 MPLS Inter-AS TE Requirements
- 4220 Traffic Engineering Link Management Information Base
- 4221 Multiprotocol Label Switching (MPLS) Management Overview
- 4247 Requirements for Header Compression over MPLS
- 4364 BGP/MPLS IP VPNs (was 2547bis)
- 4368 MPLS Label-Controlled ATM and FR Management Interface Definition
- 4377 OAM Requirements for MPLS Networks
- 4378 A Framework for MPLS OAM
2.2
58MFA forum IAs
- 1.0 Voice over MPLS IA
- 2.0 MPLS-PVC UNI IA
- 3.0 LDP Conformance IA
- 4.0 TDM Transport over MPLS using AAL1 IA
- 5.0 I.366.2 Voice Trunking over MPLS IA
- 6.0.0 MPLS Proxy Admission Control
- 7.0.0 MPLS UNI protocol
- 8.0.0 TDM over MPLS using Raw Encapsulation
2.2
59ITU-T Recommendations
- Y.1411 ATM-MPLS network interworking - Cell mode user plane interworking
- Y.1412 ATM-MPLS network interworking - Frame mode user plane interworking
- Y.1413 TDM-MPLS network interworking - User plane interworking
- Y.1414 Voice services - MPLS network interworking
- Y.1415 Ethernet-MPLS network interworking - User plane interworking
- Y.1710 Requirements for OAM functionality for MPLS networks
- Y.1711 Operation & Maintenance mechanism for MPLS networks
- Y.1712 User-plane fault-management for ATM and MPLS OAM
- Y.1713 Misbranching detection for MPLS networks
- Y.1720 Protection switching for MPLS networks
- X.84 FR over MPLS core networks
- G.8110/Y.1370 MPLS Layer Network architecture
2.2
60MPLS Control Component
- 3.1 Procedures and scenarios
- 3.2 Label distribution protocols
- 3.3 LDP and BGP-4 details
61Control and User Planes
- topology determination
- use standard IP routing protocols
- BGP, OSPF, IS-IS, PIM
- label distribution
- piggyback on routing protocol
- use LDP, RSVP-TE
- forwarding paradigm
- label-stack CO forwarding
3.1
62What is Needed ?
- all IP routing protocols (BGP,OSPF,PIM,etc)
- procedure to bind label to FEC (label assignment)
- protocol to distribute label binding information
- procedure to create forwarding table
- procedure to label incoming packet
- forwarding procedure
- forwarding table lookup
- label stack operations
3.1
63All the Tables
- FEC table
- Free Labels: 128-200 presently free
- FTN
- ILM
- NHLFE
FEC         protocol  input port  handling
192.115/16  IPv4      2           best-effort
FEC         port/label in  port/label out
192.115/16  2/17           3/137
port/label in  port/label out  next hop  operation
2/17           3/137           5.4.3.2   swap
3.1
64LER Architecture
[Figure: LER architecture. Control plane: IP routing table. User (data) plane: FTN, IP forwarding table, MPLS forwarding table. Also marked: egress LER]
3.1
65Binding Distribution Options
- label binding (assignment)
- per port or per LSR label space
- control driven vs. data driven (traffic driven)
- liberal vs. conservative label retention
- label distribution (advertisement)
- downstream vs. upstream
- downstream on-demand (dod) vs. downstream unsolicited (du)
- independent vs. ordered
3.1
66Per Port Label Space
- LSR may have a separate label space for each input port (I/F)
- or a single common label space
- or any combination of the two
- separate label spaces mean separate forwarding tables per port
- ATM LSRs have per port label spaces (leads to interleave problem)
- per port label spaces increase number of available labels
- common label space facilitates several MPLS mechanisms (e.g. reroute)
3.1
67Control vs. Data Driven
- there are two philosophies as to when to create a binding
- data-driven (traffic-driven) binding (Toshiba CSR, Ipsilon IP-Switching)
- automatically create binding when data packets arrive
- (from first packet? after enough packets? when tear LSP down?)
- control-driven binding (Cisco Tag Switching, IBM ARIS)
- create binding when routing updates arrive
- (only update when topology changes? update upon request?)
- although not specifically stated in the architecture document
- MPLS assumes control driven binding
- two implementations of control driven
- topology-driven (routing tables are consulted)
- control-traffic driven (only routing update messages are used)
3.1
68Liberal vs. Conservative Retention
- LSR receives advertisements (label distribution messages) from other LSRs
- conservative label retention
- LSR retains only label-to-FEC bindings that are presently needed
- liberal label retention
- LSR stores all bindings received (more labels need to be maintained)
- using liberal retention can speed response to topology changes
- LSRs must agree upon mode to be used
- A advertises label
- B is previous hop LSR
- but C retains label anyway
- later routing change makes C the previous hop
- C immediately can start forwarding
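The difference between the two retention modes can be sketched as a filter applied when an advertisement arrives. The class and method names are invented for this sketch, and the store represents the bindings held by one LSR (C in the example above).

```python
class LabelStore:
    """Remote bindings held by one LSR under liberal or conservative retention."""

    def __init__(self, liberal):
        self.liberal = liberal
        self.bindings = {}   # fec -> {advertising LSR: label}

    def on_advertisement(self, fec, from_lsr, label, current_next_hop):
        """Keep the binding always (liberal) or only if from_lsr is our next hop."""
        if self.liberal or from_lsr == current_next_hop:
            self.bindings.setdefault(fec, {})[from_lsr] = label

    def label_via(self, fec, next_hop):
        """Return the label to use towards next_hop, or None if none is stored."""
        return self.bindings.get(fec, {}).get(next_hop)
```

In liberal mode C already holds A's label when a routing change makes A its next hop, so it can forward at once; in conservative mode label_via returns None and C must first obtain a binding.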
3.1
69Downstream vs. Upstream
downstream
- binding means to allocate a label to a FEC
- local binding - LSR allocates the label from free label pool
- remote binding - LSR that receives the label
- which LSR allocates ?
- MPLS uses downstream binding
- label allocated by LSR downstream from the LSR that prepends it
- label distribution information flows upstream
- reverse in direction from data packets
- to set up LSP through link from LSR A to LSR B
- LSR B binds label 13 to FEC
- B advertises label to LSR A
- LSR A sends packets with label 13 to B
[Figure: LSR A sends packets with label 13 downstream to LSR B]
3.1
70On-demand vs. Unsolicited
- downstream on-demand label distribution
- LSRs may explicitly request a label
- from its downstream LSR
- unsolicited label distribution
- LSR distributes binding to upstream LSR w/o a request
- (e.g. based on time interval, or upon receipt of topology change)
- LSR may support on-demand, unsolicited, or both
- adjacent LSRs must agree upon which mode to be used
- LSR A needs to send a packet to LSR B
- LSR A requests a label from LSR B
- B binds label 13 to the FEC
- B distributes the label
- A starts sending data with label 13
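The on-demand exchange between A and B can be sketched with two objects that call each other directly. This models only who allocates and who stores the label; the class, the method names and the free-label pools are assumptions, and no real signaling is involved.

```python
class Lsr:
    """Toy LSR that binds labels downstream and learns them upstream."""

    def __init__(self, name, free_labels):
        self.name = name
        self._free = list(free_labels)   # free label pool
        self.local_bindings = {}         # fec -> label allocated by this LSR
        self.remote_bindings = {}        # fec -> (downstream LSR name, label)

    def handle_label_request(self, fec):
        """Downstream side: bind a free label to the FEC (once) and return it."""
        if fec not in self.local_bindings:
            self.local_bindings[fec] = self._free.pop(0)
        return self.local_bindings[fec]

    def request_label(self, downstream, fec):
        """Upstream side: ask the downstream LSR for a label and remember it."""
        label = downstream.handle_label_request(fec)
        self.remote_bindings[fec] = (downstream.name, label)
        return label
```

The label is chosen by B, the LSR that will receive the packets, and travels upstream to A, opposite to the direction of the data.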
3.1
71Independent vs. Ordered
- independent binding (Tag Switching)
- each LSR makes independent decision to bind and distribute
- ordered binding (ARIS)
- egress LSR binds first and distributes binding to neighbors
- LSR that believes that it should be the penultimate LSR
- binds and distributes to its neighbors
- binding proceeds in orderly fashion until ingress LSR is reached
- LSRs must agree upon mode to be used
- B sees that it is egress LSR for 192.115.6
- B allocates label 13
- B distributes label to C and D
- C distributes label to E
3.1
72LDP tasks
- a label distribution protocol is a signaling protocol
- that can perform the following tasks
- discover LSR peers
- initiate and maintain LDP session
- signal label request
- advertise binding
- signal label withdrawal
- loop prevention
- explicit routing
- resource reservation
3.2
73Label Distribution Protocols
- label distribution can be carried over various protocols
- There are presently four options
- LDP
- MPLS-enhanced IP networks
- BGP4-MPLS
- RFC 2547 VPNs
- RSVP-TE
- traffic engineering support
- CR-LDP
- constraint based (no longer recommended by IETF)
3.2
74LDP vs. BGP
- both use TCP for reliable transport (LDP uses UDP for hellos)
- both are hard-state protocols
- both use TLV format for parameters
- BGP
- multiprotocol (IPv4, IPv6, IPX, MPLS)
- highly complex protocol
- provides routing / label distribution
- built-in autodiscovery mechanism
- LDP
- MPLS only
- simpler protocol
- only label distribution
- extendable for autodiscovery
3.2
75LDP
- major focus of the IETF MPLS WG was the design of LDP
- based on similar TDP from Cisco
- LDP sets up a bidirectional LDP session
- both sides can request or advertise labels
- LDP usually uses TCP
- needs reliable transport (e.g. what happens if miss a binding)
- needs in-order delivery (e.g. binding + withdrawal)
- hard to develop new reliable transport protocols
- single acknowledgement timer for session
- piggybacking ACK on data packets
- Use UDP for discovery (hello) messages
- periodic keepalive messages (if not received, session terminated)
- messages encoded in TLV (Type Length Value) form
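Type Length Value encoding can be sketched generically. The layout below (16-bit type, 16-bit length, then the value) is a simplification chosen for illustration and is not the exact LDP wire format; the function names are likewise assumptions.

```python
import struct

def encode_tlv(tlv_type, value):
    """Encode one TLV: 2-byte type, 2-byte length of value, then the value bytes."""
    return struct.pack("!HH", tlv_type, len(value)) + value

def decode_tlvs(data):
    """Decode a byte string of back-to-back TLVs into a list of (type, value)."""
    tlvs, offset = [], 0
    while offset < len(data):
        if offset + 4 > len(data):
            raise ValueError("truncated TLV header")
        tlv_type, length = struct.unpack_from("!HH", data, offset)
        offset += 4
        if offset + length > len(data):
            raise ValueError("truncated TLV value")
        tlvs.append((tlv_type, data[offset:offset + length]))
        offset += length
    return tlvs
```

Because every element carries its own length, a receiver can skip over TLV types it does not understand, which is what makes such a format easy to extend.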
3.2
76LDP Setup
Discovery: Hello (UDP) in each direction
Session: Initialization (TCP) in each direction
Distribution: Label Request (TCP), then Label Distribution (TCP)
3.2
77Discovery Phase
- LSR periodically multicast transmits hello to LDP discovery UDP port
- to all routers on subnet multicast group
- to preconfigured IP address (when not all LSRs on same subnet)
- (extended discovery) targeted LDP
- LSRs listen on this UDP port for hello messages
- Hello message contains
- hold time
- LSR Identifier
- when LSR receives Hello from another LSR
- it opens a TCP connection to that other LSR (if needed)
- or (for extended discovery)
- it unicast transmits a hello back to the other LSR
- LDP session can now be established
3.2
78Session Initialization
- The LSR with higher identifier sends (TCP)
- a session initialization message to the other LSR
- session initialization message contains
- LDP Protocol version
- label distribution and control method
- timer values
- label space ranges (not any more !)
- if receiving LSR accepts these parameters
- then it transmits a KeepAlive
- else it transmits a reject
3.2
79Distribution Messages
- label mapping
- downstream LSR advertisement of a label mapping for a FEC
- two FEC types: host address, IP address prefix
- label withdrawal
- reverse of mapping message
- downstream LSR informs upstream LSR
- that it has revoked a previous binding
- upstream LSR can no longer use the label
- label release
- upstream LSR informs downstream LSR
- that it no longer needs a binding
- typically when downstream is no longer next hop
- and operating in conservative retention mode
3.2
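The three distribution messages can be pictured as bookkeeping on the upstream LSR. The following is only a sketch under stated assumptions: the class `UpstreamLsr`, its method names, and the tuple format used for outgoing messages are all hypothetical and are not taken from any LDP implementation.

```python
class UpstreamLsr:
    """Hypothetical label bookkeeping on an upstream LSR (illustration only)."""

    def __init__(self, conservative=True):
        self.conservative = conservative  # conservative vs. liberal retention
        self.next_hop = {}   # FEC -> downstream peer currently used as next hop
        self.bindings = {}   # (FEC, peer) -> label advertised by that peer
        self.sent = []       # messages this LSR would send, kept for inspection

    def on_label_mapping(self, fec, peer, label):
        # In conservative mode a binding from a peer that is not the
        # next hop is not needed, so it is released straight away.
        if self.conservative and self.next_hop.get(fec) != peer:
            self.sent.append(("label release", fec, peer, label))
            return
        self.bindings[(fec, peer)] = label

    def on_label_withdraw(self, fec, peer):
        # The downstream peer revoked the binding: stop using the label.
        self.bindings.pop((fec, peer), None)

    def on_next_hop_change(self, fec, new_peer):
        old_peer = self.next_hop.get(fec)
        self.next_hop[fec] = new_peer
        # The old downstream peer is no longer next hop: in conservative
        # mode its binding is released.
        if self.conservative and (fec, old_peer) in self.bindings:
            label = self.bindings.pop((fec, old_peer))
            self.sent.append(("label release", fec, old_peer, label))

    def label_for(self, fec):
        """Label to use when forwarding this FEC, or None if unbound."""
        return self.bindings.get((fec, self.next_hop.get(fec)))
```

With `conservative=False` (liberal retention) the same calls keep every advertised binding, so a next hop change can reuse a label that was already received.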
80Request Messages
- in downstream-on-demand mode upstream LSR must request binding
- upstream LSR sends label request message when
- FEC in FEC table
- next hop LSR is LDP peer
- FEC not in forwarding table
- FEC next hop changes
- upstream LSR doesn't have a mapping from new next hop
- receives FEC label request from upstream LDP peer
- next hop LSR is LDP peer
- upstream LSR doesn't have a mapping from next hop
- upstream LSR sends label request abort message when
- upstream LSR needs to revoke request before satisfied
- for example, next hop LSR for FEC has changed
3.2
81Notifications
- there are two types of notifications
- error notifications (fatal errors - terminate session)
- advisory notifications (status messages)
- LSR sends notification messages when
- received LDP message with unsupported protocol version
- received LDP message with unknown type
- KeepAlive timer expired
- session initialization fails due to unacceptable parameters
- etc.
3.2
82LDP state machine
- LSR periodically transmits hello UDP messages
- multicast to all routers on subnet group
- targeted to preconfigured IP address
- LSRs listen on this UDP port for hello messages
- when LSR receives hello from another LSR
- it opens a TCP connection to that other LSR
- or (for extended discovery)
- it unicast transmits a hello back to the other LSR
- LSR with higher ID sends session initialization message
- other LSR accepts (sends keepalive) or rejects
- informative or keepalive messages sent
3.2
83LDP packet format
header (10B)
version (2B)
length (2B)
LDP-ID (6B)
message TLVs (variable)
- version: presently 1
- length: PDU length, excluding version and length fields
- LDP-ID: identifies label space of sending LDP peer
- LSR-ID (4B): globally unique LSR ID
- label space ID (2B): for per-port label spaces
- (zero for per-platform label spaces)
- message TLVs: zero or more message TLVs (see next page)
3.3
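The header layout above maps directly onto a fixed 10-byte structure. A minimal sketch, assuming network byte order; the function names `build_ldp_pdu` and `parse_ldp_pdu` and the dictionary keys are illustrative, not part of any library.

```python
import struct

# version (2B), length (2B), LSR-ID (4B), label space ID (2B) = 10 bytes
LDP_HEADER = struct.Struct("!HH4sH")

def build_ldp_pdu(lsr_id: str, label_space: int, messages: bytes) -> bytes:
    """Prefix the message TLVs with an LDP header (version 1)."""
    raw_id = bytes(int(part) for part in lsr_id.split("."))
    # length excludes the version and length fields themselves, so it
    # counts the 6-byte LDP-ID plus everything that follows
    length = 6 + len(messages)
    return LDP_HEADER.pack(1, length, raw_id, label_space) + messages

def parse_ldp_pdu(pdu: bytes) -> dict:
    """Split a PDU into its header fields and the raw message TLVs."""
    version, length, raw_id, label_space = LDP_HEADER.unpack_from(pdu, 0)
    end = 4 + length  # version and length fields are not counted in length
    if len(pdu) < end:
        raise ValueError("truncated LDP PDU")
    return {
        "version": version,
        "lsr_id": ".".join(str(b) for b in raw_id),
        "label_space": label_space,  # zero for a per-platform label space
        "messages": pdu[LDP_HEADER.size:end],
    }
```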
84LDP message TLVs
- U: unknown message bit
- if message type unknown to receiver
- U=0: receiver returns notification to sender
- U=1: receiver silently ignores
- length: message length, excluding type and length fields
- message-ID: unique ID for message (for matching with returned notification)
- if there are mandatory parameters, they must appear in a specific order
- optional parameters may appear in any order
3.3
85All LDP message types
- Hello (0x0100)
- Initialization (0x0200)
- KeepAlive (0x0201)
- Notification (0x0001)
- Address (0x0300)
- Address Withdraw (0x0301)
- Label Mapping (0x0400)
- Label Withdraw (0x0402)
- Label Request (0x0401)
- Label Release (0x0403)
- Label Abort Request (0x0404)
3.3
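The message layout and the type codes listed above can be combined into a small parser. This is a sketch, assuming the U bit is the top bit of the 2-byte type field and that messages are packed back to back after the LDP header; `parse_messages` and the action strings are hypothetical names.

```python
import struct

MESSAGE_TYPES = {
    0x0001: "Notification", 0x0100: "Hello",
    0x0200: "Initialization", 0x0201: "KeepAlive",
    0x0300: "Address", 0x0301: "Address Withdraw",
    0x0400: "Label Mapping", 0x0401: "Label Request",
    0x0402: "Label Withdraw", 0x0403: "Label Release",
    0x0404: "Label Abort Request",
}

def parse_messages(data: bytes):
    """Yield (name, message_id, parameters, action) for each message."""
    offset = 0
    while offset + 8 <= len(data):
        first, length, message_id = struct.unpack_from("!HHI", data, offset)
        u_bit, msg_type = first >> 15, first & 0x7FFF
        if length < 4:
            raise ValueError("message length must cover the message-ID")
        # length excludes the type and length fields, so it covers the
        # 4-byte message-ID plus all parameter TLVs
        end = offset + 4 + length
        name = MESSAGE_TYPES.get(msg_type)
        if name is not None:
            action = "process"
        elif u_bit == 0:
            action = "notify sender"   # unknown type, U=0
        else:
            action = "ignore"          # unknown type, U=1
        yield name, message_id, data[offset + 8:end], action
        offset = end
```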
86LDP parameter TLVs
U (1b)
F (1b)
type (14b)
length (2B)
value
- U: unknown TLV bit
- if TLV type unknown to receiver
- U=0: receiver returns notification to sender
- U=1: receiver silently ignores
- F: forward unknown TLV bit
- if U=1 and TLV type unknown to receiver
- F=0: do not forward
- F=1: forward
- type: TLV type (FEC TLV, label TLV, address list TLV, hop count TLV, path vector TLV, status TLV, ...)
- length: length of value field in bytes
3.3
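A sketch of how a receiver might split a parameter TLV and decide what to do with an unknown type, assuming U and F are the two high-order bits of the first 2 bytes. The names `parse_tlv` and `unknown_tlv_action` are hypothetical.

```python
import struct

def parse_tlv(data: bytes, offset: int = 0):
    """Return (u, f, tlv_type, value) for the TLV starting at offset."""
    first, length = struct.unpack_from("!HH", data, offset)
    u = (first >> 15) & 1          # unknown bit
    f = (first >> 14) & 1          # forward-unknown bit
    tlv_type = first & 0x3FFF      # remaining 14 bits
    value = data[offset + 4:offset + 4 + length]  # length counts value only
    return u, f, tlv_type, value

def unknown_tlv_action(u: int, f: int) -> str:
    """What to do with a TLV whose type the receiver does not know."""
    if u == 0:
        return "notify sender"
    return "ignore and forward" if f else "ignore, do not forward"
```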
87FEC TLV
type (2B): U=0, F=0, type=0x0100
length (2B)
FEC element 1
- there may be more than one FEC element, for mapping messages only
- the FEC elements are not themselves TLVs (no length needed); instead
- wildcard FEC (0x01)
- prefix FEC (0x02): address family (IPv4, IPv6, Ethernet, E.164, etc.), prefix length in bits, prefix
- host address FEC (0x03): address family, length, address
3.3
88Generic Label TLV
type (2B): U=0, F=0, type=0x0200
length (2B)
label (20 bits)
- this is the generic label TLV
- there are also special label TLVs for ATM and FR
based MPLS
3.3
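A minimal sketch of encoding the generic label TLV. It assumes the 20-bit label is carried right-justified in a 4-byte value field, which matches the length of 4 used in the full-message example; the function names are illustrative.

```python
import struct

GENERIC_LABEL_TLV = 0x0200  # U=0, F=0, type=0x0200

def build_generic_label_tlv(label: int) -> bytes:
    """Encode a 20-bit label as type (2B), length (2B), value (4B)."""
    if not 0 <= label < 2 ** 20:
        raise ValueError("label must fit in 20 bits")
    return struct.pack("!HHI", GENERIC_LABEL_TLV, 4, label)

def parse_generic_label_tlv(tlv: bytes) -> int:
    """Return the label carried in a generic label TLV."""
    tlv_type, length, value = struct.unpack("!HHI", tlv[:8])
    if tlv_type != GENERIC_LABEL_TLV or length != 4:
        raise ValueError("not a generic label TLV")
    return value & 0xFFFFF  # keep the low 20 bits
```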
89Status TLV
type (2B): U, F, type=0x0300
length (2B)
status code (32b): E, F, status code data (30b)
message ID (32b)
message type (16b)
- Status TLVs are mandatory parameters in notification messages
- and optional parameters in other messages
- U=0 when status in notification message, else U=1
- E: fatal error bit (E=1 for fatal error, E=0 for advisory notification)
- the two F bits are equal and have the normal meaning
- status code data: 0 means success
- message ID: the message to which the status refers
- message type: the message type to which the status refers
3.3
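The value part of the Status TLV packs E, F and the 30-bit status data into one 32-bit word. A sketch of unpacking it, assuming E is the most significant bit and F the next one; `parse_status_value` and its dictionary keys are hypothetical names.

```python
import struct

def parse_status_value(value: bytes) -> dict:
    """Decode the 10-byte value field of a Status TLV."""
    status_code, message_id, message_type = struct.unpack("!IIH", value[:10])
    return {
        "fatal": bool(status_code >> 31),           # E bit
        "forward": bool((status_code >> 30) & 1),   # F bit
        "status_data": status_code & 0x3FFFFFFF,    # low 30 bits, 0 = success
        "message_id": message_id,       # message the status refers to
        "message_type": message_type,   # type of that message
    }
```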
90Example full message - label mapping
U=0, mapping message type 0x0400
length=24 (2B)
message-ID (4B)
FEC TLV (2B): U=0, F=0, type=0x0100
length=8 (2B)
prefix FEC element (8B): type 0x02, family=1 (IPv4), prefix-length=16, prefix=192.115/16
label TLV (2B): U=0, F=0, type=0x0200
length=4 (2B)
label 17
3.3
91BGP4 Label Distribution
- BGP peers exchange VPN routes
- can easily associate a label with these routes
- all BGP procedures are immediately available for use
- for label distribution messages
- BGP4 is a very extensible protocol
- multiprotocol extensions support address families
- (originally for IPv4,IPv6, etc)
- MPLS defines a new address family
3.3
92BGP
header (19B)
marker (16B)
length (2B)
type (1B)
data (variable)
- marker can be used for authentication (TCP MD5 signature)
- length is total BGP PDU length, including header
- type
- OPEN (for session initialization)
- UPDATE (add, change and withdraw routes)
- NOTIFICATION (return error messages, terminate session)
- KEEPALIVE (heartbeat)
- KEEPALIVE packet consists of 19B header only
3.3
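The 19-byte header can be sketched as follows. The numeric type codes (1 to 4) and the all-ones marker are taken from RFC 4271 rather than from the slide, and the function names are illustrative.

```python
import struct

BGP_HEADER = struct.Struct("!16sHB")  # marker (16B), length (2B), type (1B)
MESSAGE_TYPES = {1: "OPEN", 2: "UPDATE", 3: "NOTIFICATION", 4: "KEEPALIVE"}

def build_message(msg_type: int, data: bytes = b"") -> bytes:
    """Prefix data with a BGP header; length covers the header too."""
    marker = b"\xff" * 16  # all ones when the marker is not used
    return BGP_HEADER.pack(marker, BGP_HEADER.size + len(data), msg_type) + data

def parse_message(msg: bytes):
    """Return (type name, data) for one BGP message."""
    _marker, length, msg_type = BGP_HEADER.unpack_from(msg, 0)
    if length < BGP_HEADER.size or length > len(msg):
        raise ValueError("bad BGP length field")
    return MESSAGE_TYPES.get(msg_type, "unknown"), msg[BGP_HEADER.size:length]
```

A KEEPALIVE built this way is exactly the 19-byte header with no data.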
93BGP state machine
- idle: no session (awaiting session initialization)
- connect: attempting to connect to peer
- active: started TCP 3-way handshake (router busy)
- open sent: have sent OPEN message
- open confirm: after receiving OPEN message from peer (awaiting KEEPALIVE)
- established: BGP session up and running
3.3
94BGP OPEN
version (1B)
my AS (2B)
hold time (2B)
BGP-ID (4B)
op len (1B)
opt parameters (variable)
- version (3 or 4)
- my AS: identifier of autonomous system
- hold time: max time (sec) between receipt of messages
- BGP ID: sender's BGP identifier
- op len: length (bytes) of optional parameters
- opt parameters: TLVs
3.3
95BGP UPDATE
WR len (2B)
withdrawn routes (var)
PA len (2B)
path attributes (var)
NLRI (var)
- Withdrawn Routes: list of routes no longer to be used (NLRI format, see below)
- Path Attributes: route specific information (see next page)
- Network Layer Reachability Information: (classless) routing information
- the NLRI is a list of address-prefixes
- each prefix must be masked from the left to the length specified
3.3
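The NLRI encoding can be illustrated with a short decoder. This sketch assumes IPv4 and the RFC 4271 encoding, where each entry is one length byte (in bits) followed by just enough bytes to hold the prefix; `parse_nlri` is a hypothetical name.

```python
import ipaddress

def parse_nlri(data: bytes) -> list:
    """Decode a list of (length, prefix) entries into 'a.b.c.d/len' strings."""
    prefixes = []
    offset = 0
    while offset < len(data):
        bits = data[offset]
        nbytes = (bits + 7) // 8           # bytes needed to hold the prefix
        raw = data[offset + 1:offset + 1 + nbytes]
        if bits > 32 or len(raw) < nbytes:
            raise ValueError("malformed NLRI entry")
        address = int.from_bytes(raw.ljust(4, b"\x00"), "big")
        # mask from the left to the specified length, clearing any
        # trailing bits in the last byte
        mask = (0xFFFFFFFF << (32 - bits)) & 0xFFFFFFFF
        prefixes.append(f"{ipaddress.IPv4Address(address & mask)}/{bits}")
        offset += 1 + nbytes
    return prefixes
```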
96BGP UPDATE - Path Attributes
- flags
- O: optional/well-known bit (1 = optional, 0 = well-known)
- well-known attributes must be recognized by all BGP implementations
- if a well-known attribute is unrecognized, BGP sends notification and session is closed
- T: transitive/nontransitive bit
- if 1 and attribute unrecognized it is passed along, else silently ignored
- well-known attributes are always transitive
- C: complete/partial bit (for optional transitive attributes only)
- L: attribute length bit (0: attribute length is 1B, 1: length is 2B)
- type code
- ORIGIN, AS_PATH, NEXT_HOP, MED, LOCAL_PREF,
- AGGREGATOR, COMMUNITY, ORIGINATOR_ID
3.3
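A sketch of decoding one path attribute from its flags byte. The bit positions (0x80, 0x40, 0x20, 0x10) and the numeric type codes are taken from RFC 4271 and related RFCs rather than from the slide; `parse_path_attribute` and the dictionary keys are illustrative names.

```python
import struct

TYPE_CODES = {1: "ORIGIN", 2: "AS_PATH", 3: "NEXT_HOP", 4: "MED",
              5: "LOCAL_PREF", 7: "AGGREGATOR", 8: "COMMUNITY",
              9: "ORIGINATOR_ID"}

def parse_path_attribute(data: bytes, offset: int = 0):
    """Return (attribute dict, offset of the next attribute)."""
    flags, type_code = data[offset], data[offset + 1]
    if flags & 0x10:                       # L bit: 2-byte length
        (length,) = struct.unpack_from("!H", data, offset + 2)
        start = offset + 4
    else:                                  # 1-byte length
        length = data[offset + 2]
        start = offset + 3
    attribute = {
        "optional": bool(flags & 0x80),    # O bit, 0 means well-known
        "transitive": bool(flags & 0x40),  # T bit
        "partial": bool(flags & 0x20),     # C bit
        "name": TYPE_CODES.get(type_code, "unknown"),
        "value": data[start:start + length],
    }
    return attribute, start + length
```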
97BGP NOTIFICATION
error code (1B)
error subcode (1B)
data (var)
- all notification messages cause BGP session to close
- error codes include
- message header error
- open message error
- update message error
- hold timer expired
- state machine error
- other fatal error
3.3
984.1 Introduction to Traffic Engineering 4.2
DiffServ and IntServ 4.3 TE protocols
99Traffic Engineering
- TE is control of network traffic to achieve specific objectives
- unfortunately users and providers have contradictory objectives
- user objectives (QoS)
- network availability
- packet loss
- end-to-end delay
- round-trip delay
- packet delay variation (PDV)
- error rate
- provider objectives (performance)
- bandwidth utilization
- resource utilization
- speed of failure recovery
- ease of management
- monetary outlay
4.1
100Network and Traffic Engineering
- Network Engineering
- putting the bandwidth where the traffic is
- physical cable deployment (thick pipes)
- over-provisioning
- virtual connection provisioning
- violates provider objectives
- Traffic Engineering
- putting the traffic where the bandwidth is
- explicit traffic routing
- route optimization
- can it meet user objectives?
4.1
101IP's Problem
[Figure: A and B attach to C; C reaches G over two paths, C-D-G and C-E-F-G; the labels 1 and 0.5 appear in the figure]
- traffic from A to G: 1Gb
- traffic from B to G: 500Mb
- all links 1Gb except EF, which is 500Mb
- were C, D, E, and F ATM switches there would be no problem
- (1Gb over ACDG, 500Mb over BCEFG)
- with standard hop-count cost function, all traffic goes over CDG
- resulting in 1.5Gb there (congestion) and CEFG idle
- with administrative cost on CDG we can force all the traffic to CEFG
- even worse congestion!
- finally with administrative cost and ECMP we can load balance
- 750Mb over CDG and CEFG, link EF is still congested!
- what can we do?
4.1
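The arithmetic behind the four cases can be checked with a few lines. This is a sketch: the link list is inferred from the paths named on the slide (ACDG and BCEFG), and `CAPACITY_MB`, `link_loads` and `congested` are hypothetical names.

```python
# link capacities in Mb; all links 1Gb except EF
CAPACITY_MB = {
    ("A", "C"): 1000, ("B", "C"): 1000,
    ("C", "D"): 1000, ("D", "G"): 1000,
    ("C", "E"): 1000, ("E", "F"): 500, ("F", "G"): 1000,
}

def link_loads(flows):
    """flows is a list of (path string, Mb); returns load per link."""
    loads = {link: 0 for link in CAPACITY_MB}
    for path, mb in flows:
        for link in zip(path, path[1:]):   # consecutive node pairs
            loads[link] += mb
    return loads

def congested(flows):
    """Links whose offered load exceeds their capacity."""
    loads = link_loads(flows)
    return sorted(link for link, load in loads.items()
                  if load > CAPACITY_MB[link])

hop_count = [("ACDG", 1000), ("BCDG", 500)]      # everything over CDG
admin_cost = [("ACEFG", 1000), ("BCEFG", 500)]   # everything over CEFG
ecmp = [("ACDG", 500), ("ACEFG", 500), ("BCDG", 250), ("BCEFG", 250)]
explicit = [("ACDG", 1000), ("BCEFG", 500)]      # the ATM-style placement

for name, flows in [("hop count", hop_count), ("admin cost", admin_cost),
                    ("ECMP", ecmp), ("explicit", explicit)]:
    print(name, congested(flows))
```

Only the explicit placement leaves every link at or below capacity, which is the motivation for the constraint-based approach discussed next.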
102Solution
- IP's problem arises from
- the forwarding being purely local
- but the routing being too global
- IP routing always optimizes a global (usually additive) metric
- discrete optimization problem
- does not take local constraints (e.g. BW of individual links) into account
- we need to optimize a global metric
- while taking local constraints into account
- this is called constraint-based routing
- discrete optimization with inequality constraints
- main constraint is BW, but also maximum delay, packet loss, etc.
- another constraint: explicit include/exclude of links/routers
4.1
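One common way to realize constraint-based routing is to prune links that violate the constraint and then run an ordinary shortest-path search on what remains. The sketch below uses hop count as the global metric and bandwidth as the only constraint; the topology is the one inferred from the previous slide, and `cspf` and `reserve` are hypothetical names, not a specific protocol's algorithm.

```python
import heapq

def cspf(links, src, dst, demand):
    """Fewest-hop path using only links with residual BW >= demand."""
    adjacency = {}
    for (u, v), bandwidth in links.items():
        if bandwidth >= demand:            # prune links that are too thin
            adjacency.setdefault(u, []).append(v)
    heap = [(0, src, [src])]               # (hops so far, node, path)
    visited = set()
    while heap:
        hops, node, path = heapq.heappop(heap)
        if node == dst:
            return path
        if node in visited:
            continue
        visited.add(node)
        for neighbor in adjacency.get(node, []):
            if neighbor not in visited:
                heapq.heappush(heap, (hops + 1, neighbor, path + [neighbor]))
    return None                            # no path satisfies the constraint

def reserve(links, path, demand):
    """Subtract the demand from every link along the chosen path."""
    for link in zip(path, path[1:]):
        links[link] -= demand

links = {("A", "C"): 1000, ("B", "C"): 1000, ("C", "D"): 1000,
         ("D", "G"): 1000, ("C", "E"): 1000, ("E", "F"): 500,
         ("F", "G"): 1000}
first = cspf(links, "A", "G", 1000)
reserve(links, first, 1000)
second = cspf(links, "B", "G", 500)
reserve(links, second, 500)
print(first, second)
```

Placing the 1Gb demand first and the 500Mb demand second reproduces the ACDG and BCEFG placement; a different ordering of demands can give a different result, which this simple greedy sketch does not try to optimize.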
103MPLS TE
- MPLS that we have seen so far allows explicit routing
- can minimize number of hops
- can tailor LSP to needs
- MPLS-TE LSPs can be set up according to constraints
- only include in LSP LSRs with sufficient available BW
- only include in LSP LSRs that guarantee sufficiently low delay
- traffic engineering achieved by extending base protocols
- enhanced routing protocols (resource reservation)
- OSPF → OSPF-TE
- ISIS → ISIS-TE
- constraint-based signalling protocols (pinned LSPs and reservation)
- RSVP → RSVP-TE
- LDP → CR-LDP (no longer under active development)
4.1
104IP Fixes
- two approaches to QoS were developed for IP (never popular)
- IntServ (guaranteed QoS)
- define traffic flows (CO approach)
- guarantee QoS attributes for each flow
- reserve resources at each router along the flow
- signaling protocol (RSVP) needed
- DiffServ (statistical QoS)
- retain CL paradigm
- no guaranteed QoS attributes
- classify packets (differentiated)
- offer special treatment (priority) relative to other packets
- no resource reservation
4.2
105DiffServ
RFCs 2474, 2475
- DiffServ was developed in IETF after IntServ
- IntServ too heavy-weight (and too revolutionary) for most purposes
- resource reservation is against IP-philosophy
- if not enough BW, then more democratic for all to suffer
- if BW is reserved and not used, then this is simply over-provisioning
- DiffServ is an evolutionary, coarse-grained approach to IP QoS
- DiffServ
- divides traffic into service classes
- and allocates resources on a per-class basis
- uses 6 bits of ToS byte in IP header to mark packets
- field is renamed Differentiated Services Code Point
- no setup or router state required
- DSCP defines per-hop behaviors (PHB)
- tells router how to treat packet
- three standard PHBs (BE, AF, EF)
4.2
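A sketch of reading the DSCP out of the old ToS byte and naming the PHB. The codepoint values used here (0 for BE, 46 for EF, and the AFxy pattern of class in the top 3 bits and drop precedence in the next 2) come from RFCs 2474, 2597 and 3246, not from the slide; the function names are illustrative.

```python
def dscp_from_tos(tos: int) -> int:
    """The DSCP is the upper 6 bits of the former ToS byte."""
    return (tos >> 2) & 0x3F

def classify(dscp: int) -> str:
    """Name the standard PHB for a DSCP value."""
    if dscp == 0:
        return "BE"
    if dscp == 46:                          # 101110
        return "EF"
    af_class = dscp >> 3                    # top 3 bits: class 1..4
    drop = (dscp >> 1) & 0x3                # next 2 bits: precedence 1..3
    if 1 <= af_class <= 4 and 1 <= drop <= 3 and dscp & 1 == 0:
        return f"AF{af_class}{drop}"
    return "unknown"
```

For example, a ToS byte of 0xB8 carries DSCP 46 and would be treated as EF under these assumptions.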
106DiffServ PHBs
- Best Effort
- standard IP service
- QoS depends on momentary network load
- Assured Forwarding
- AF specifies class that determines queue
- in addition, three drop-precedence levels (low, med, high)
- AF packets from a single source should not be mis-ordered
- even if they have different drop-precedence (i.e. single queue)
- Expedited Forwarding
- EF packet should experience no queuing delays
- EF packets should have low loss
- implemented by dedicated EF router queue
4.2
107MPLS and DiffServ
- what does all this have to do with MPLS ?
- MPLS will have to co-exist with DiffServ IP
- MPLS provides similar functionality
- Exp bits in MPLS shim header are similar to DSCPs
- only three bits (8 classes) while DiffServ ended up with 6 bits
- but 8 service classes is usually more than enough
- commonly 4 classes are offered (bronze, silver, gold, platinum)
- LSR can maintain mapping from Exp to PHB
- e.g. Exp=000 for BE, Exp=001 for AF low drop precedence, etc.
- no new signaling mechanism required
- LSP with Exp mapping to PHB is called an E-LSP
- ingress LER assigns the EXP field
4.2
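A sketch of an E-LSP style Exp-to-PHB table, assuming the usual 32-bit shim layout of label (20 bits), Exp (3 bits), S (1 bit) and TTL (8 bits). Only the first two table entries follow the example on the slide; the remaining entries, and the names `push_shim` and `phb_for`, are hypothetical.

```python
EXP_TO_PHB = {
    0b000: "BE",
    0b001: "AF low drop precedence",
    0b010: "AF medium drop precedence",   # hypothetical entry
    0b011: "AF high drop precedence",     # hypothetical entry
    0b101: "EF",                          # hypothetical entry
}
PHB_TO_EXP = {phb: exp for exp, phb in EXP_TO_PHB.items()}

def push_shim(label: int, phb: str, ttl: int = 64, bottom: bool = True) -> int:
    """Ingress LER: build a shim word, setting Exp from the chosen PHB."""
    exp = PHB_TO_EXP[phb]
    return (label << 12) | (exp << 9) | (int(bottom) << 8) | (ttl & 0xFF)

def phb_for(shim: int) -> str:
    """Transit LSR: look up the PHB from the Exp bits of a shim word."""
    exp = (shim >> 9) & 0x7
    return EXP_TO_PHB.get(exp, "BE")      # unmapped Exp falls back to BE
```

Because the mapping is a local table lookup on each LSR, no per-class signaling is needed in this sketch, matching the point made on the slide.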
108L-LSP vs. E-LSP
- with DiffServ each FEC is defined by
- destination address
- service class
- there are two ways of implementing the DiffServ mapping
- L-LSP (Label-inferred LSP)
- behavior based on label alone
- support different service classes by using different labels
- LSP BW allocated from specific queue (class)
- Exp may be used for drop precedence
- E-LSP (Exp-inferred LSP)
- behavior based on label and Exp field
- LSP BW allocated from link
- can support 8 different service classes per label
4.2
109IntServ
RFCs 2205-2216, 2379-2382