Title: The Basics of BGP Routing and its Performance in Today's Internet
- Nina Taft
- Sprint, Advanced Technology Labs
- California, USA
- May 2001
Outline
1. Highlights
2. Addressing and CIDR
3. BGP Messages and Prefix Attributes
4. BGP Decision and Filtering Processes
5. I-BGP
6. Route Reflectors
7. Multihoming
8. Aggregation
9. Routing Instability
10. BGP Table Growth
Internet Topology
- AS (Autonomous System): a collection of routers under the same technical and administrative domain.
- EGP (Exterior Gateway Protocol): used between two ASs to allow them to exchange routing information so that traffic can be forwarded across AS borders. Example: BGP.
Purpose: to share connectivity information
[Figure: AS1 and AS2 exchanging connectivity information over BGP; border routers, internal routers and router A labelled]
BGP Sessions
- One router can participate in many BGP sessions.
- Initially: a node advertises ALL routes it wants its neighbor to know (could be >50K routes).
- Ongoing: a node only informs its neighbor of changes.
[Figure: BGP sessions among AS1, AS2 and AS3]
Routing Protocols
[Figure: router A and an E-BGP session to AS2]
Configuration and Policy
- A BGP node has a notion of which routes to share with its neighbor. It may advertise only a portion of its routing table to a neighbor.
- A BGP node does not have to accept every route that it learns from its neighbor. It can selectively accept and reject messages.
- What to share with neighbors and what to accept from neighbors is determined by the routing policy, which is specified in a router's configuration file.
History
- Popularity:
  - until a few years ago BGP was fairly unknown, used by a small number of large ISPs
  - in 1995 (the beginning of web popularity) the number of organizations using BGP grew tremendously
- Two reasons for the growth of usage and interest:
  - significant growth in the number of ISPs
  - organizations were born that had mission-critical dependence upon their connectivity
- CIDR (Classless Inter-Domain Routing) introduced in 1995
Assigning IP Addresses and AS Numbers (Ideally)
- A host gets its IP address from the IP address block of its organization.
- An organization gets an IP address block from its ISP's address block.
- An ISP gets its address block from its own provider OR from one of the 3 regional Internet registries:
  - ARIN: American Registry for Internet Numbers
  - RIPE: Réseaux IP Européens
  - APNIC: Asia Pacific Network Information Centre
- Each AS is assigned a 16-bit number (65,536 possible values); currently about 10,000 ASs are in use.
Addressing Schemes
- Original addressing schemes (class-based):
  - 32 bits divided into 2 parts: network and host
  - Class A
  - Class B
  - Class C: 2 million networks of 256 hosts each
- CIDR introduced to solve 2 problems:
  - exhaustion of IP address space
  - size and growth rate of the routing table
Problem 1: Lifetime of Address Space
- Example: an organization needs 500 addresses. A single class C address is not enough (256 hosts). Instead a class B address is allocated (64K hosts). That's overkill: a huge waste.
- CIDR allows networks to be assigned on arbitrary bit boundaries:
  - permits arbitrary-sized masks; 178.24.14.0/23 is valid
  - requires explicit masks to be passed in routing protocols
- CIDR solution for the example above: the organization is allocated a single /23 address block (the equivalent of 2 class Cs).
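The /23 in the example comes from rounding 500 up to the next power of two (512 = 2^9, leaving 32 - 9 = 23 prefix bits). A minimal Python sketch of that calculation; the helper name `smallest_prefix_for` is hypothetical, and it treats every address in the block as usable, as the slide's 256-hosts-per-class-C count does:

```python
def smallest_prefix_for(num_addresses: int) -> int:
    """Longest IPv4 prefix length whose block holds num_addresses addresses."""
    if not 1 <= num_addresses <= 2 ** 32:
        raise ValueError("num_addresses must be between 1 and 2**32")
    # Host bits needed: the smallest h with 2**h >= num_addresses.
    host_bits = (num_addresses - 1).bit_length()
    return 32 - host_bits


for need in (256, 500, 65536):
    prefix = smallest_prefix_for(need)
    print(f"{need} addresses -> /{prefix} ({2 ** (32 - prefix)} addresses)")
```

Using integer `bit_length()` instead of a floating-point logarithm keeps the result exact at powers of two.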
Problem 2: Routing Table Size
- Without CIDR: 256 separate entries, 232.71.0.0, 232.71.1.0, 232.71.2.0, ..., 232.71.255.0
- With CIDR: a single entry, 232.71.0.0/16
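The table-size saving can be reproduced with Python's standard `ipaddress` module, whose `collapse_addresses` merges adjacent networks into the fewest covering prefixes. A small sketch using the slide's 232.71.x.0 example:

```python
import ipaddress

# Without CIDR: one entry per class-C-sized block, 232.71.0.0/24 ... 232.71.255.0/24
without_cidr = [ipaddress.ip_network(f"232.71.{i}.0/24") for i in range(256)]

# With CIDR: adjacent blocks are merged into the fewest covering prefixes
with_cidr = list(ipaddress.collapse_addresses(without_cidr))

print(len(without_cidr), "entries ->", len(with_cidr), "entry:", with_cidr[0])
```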
CIDR: Classless Inter-Domain Routing
- Address format: <IP address/prefix P>. The prefix denotes the upper P bits of the IP address.
- Idea: use aggregation; provide routing for a large number of customers by advertising one common prefix.
- This is possible because the nature of addressing is hierarchical.
- Summarizing routing information reduces the size of routing tables while maintaining connectivity.
- Aggregation is critical to the scalability and survivability of the Internet.
Address Arithmetic: Address Blocks
- The <address/prefix> pair defines an address block. Examples:
  - 128.15.0.0/16 -> 128.15.0.0 - 128.15.255.255
  - 188.24.0.0/13 -> 188.24.0.0 - 188.31.255.255 (consider the 2nd octet in binary: 24 = 00011000, and a /13 fixes only its top 5 bits, 00011, so the 2nd octet ranges from 24 to 31)
- Address block sizes:
  - a /13 address block has $2^{32-13}$ addresses (a /16 has $2^{32-16}$)
  - a /13 address block is 8 times as big as a /16 address block, because $2^{32-13} / 2^{32-16} = 2^3$
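Python's standard `ipaddress` module can check this block arithmetic. A short sketch using the two example blocks from the slide:

```python
import ipaddress

for text in ("128.15.0.0/16", "188.24.0.0/13"):
    block = ipaddress.ip_network(text)
    # block[0] and block[-1] are the first and last addresses in the block
    print(f"{block}: {block[0]} - {block[-1]} ({block.num_addresses} addresses)")

# The 2nd octet of 188.24.0.0 in binary; a /13 fixes only its top 5 bits
print(format(24, "08b"))

# Size ratio of a /13 to a /16: 2**(32-13) / 2**(32-16) = 2**3
ratio = (ipaddress.ip_network("188.24.0.0/13").num_addresses
         // ipaddress.ip_network("128.15.0.0/16").num_addresses)
print(ratio)
```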
CIDR: Longest Prefix Match
- Because prefixes of arbitrary length are allowed, overlapping prefixes can exist.
- Example: a router hears 124.39.0.0/16 from one neighbor and 124.39.11.0/24 from another neighbor.
- The router forwards a packet according to the most specific forwarding information, called the longest prefix match:
  - a packet with destination 124.39.11.32 will be forwarded using the /24 entry
  - a packet with destination 124.39.22.45 will be forwarded using the /16 entry
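A minimal sketch of longest-prefix-match lookup over the two routes in the example, assuming a plain Python dictionary as the forwarding table (the neighbor names are hypothetical). Real routers use tries or specialized hardware rather than this linear scan, which only illustrates the selection rule:

```python
import ipaddress

# Hypothetical table: prefix -> neighbor the route was heard from
TABLE = {
    ipaddress.ip_network("124.39.0.0/16"): "neighbor-1",
    ipaddress.ip_network("124.39.11.0/24"): "neighbor-2",
}


def longest_prefix_match(destination: str):
    addr = ipaddress.ip_address(destination)
    matches = [net for net in TABLE if addr in net]
    if not matches:
        return None  # no route (and no default route) for this destination
    best = max(matches, key=lambda net: net.prefixlen)  # most specific wins
    return str(best), TABLE[best]


print(longest_prefix_match("124.39.11.32"))  # ('124.39.11.0/24', 'neighbor-2')
print(longest_prefix_match("124.39.22.45"))  # ('124.39.0.0/16', 'neighbor-1')
```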
Will CIDR work?
- For CIDR to be successful:
  - address registries must assign addresses using the CIDR strategy
  - providers and subscribers should configure their networks and allocate addresses to allow for a maximum amount of aggregation
  - BGP must be configured to do aggregation as much as possible
- Factors that complicate achieving aggregation:
  - multihoming, proxy aggregation, changing providers
Four Basic Messages
- OPEN: establishes a BGP session (uses TCP port 179)
- NOTIFICATION: reports unusual conditions
- UPDATE: informs a neighbor of new routes that become active, and of old routes that become inactive
- KEEPALIVE: informs a neighbor that the connection is still viable
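All four message types share a fixed 19-byte header: a 16-octet marker of all ones, a 2-octet total message length and a 1-octet type code (1 = OPEN, 2 = UPDATE, 3 = NOTIFICATION, 4 = KEEPALIVE), as defined in the BGP-4 specification (RFC 1771, later RFC 4271). A KEEPALIVE is just this header with no body. The helper names below are illustrative, not from any BGP library:

```python
import struct

MESSAGE_TYPES = {1: "OPEN", 2: "UPDATE", 3: "NOTIFICATION", 4: "KEEPALIVE"}
MARKER = b"\xff" * 16
HEADER_LEN = 19  # 16-octet marker + 2-octet length + 1-octet type


def encode_message(msg_type: int, body: bytes = b"") -> bytes:
    # Length field counts the whole message, header included (network byte order)
    return MARKER + struct.pack("!HB", HEADER_LEN + len(body), msg_type) + body


def decode_header(data: bytes):
    if len(data) < HEADER_LEN or data[:16] != MARKER:
        raise ValueError("not a BGP message header")
    length, msg_type = struct.unpack("!HB", data[16:HEADER_LEN])
    return MESSAGE_TYPES.get(msg_type, "unknown"), length


keepalive = encode_message(4)
print(len(keepalive), decode_header(keepalive))
```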
OPEN Message
- During session establishment, two BGP speakers exchange their:
  - AS numbers
  - BGP identifiers (usually one of the router's IP addresses)
  - authentication information (optional)
- A BGP speaker has the option to refuse a session.
- Select the value of the hold timer: the maximum time to wait to hear something from the other end before assuming the session is down.
NOTIFICATION and KEEPALIVE Messages
- NOTIFICATION:
  - indicates an error
  - terminates the TCP session
  - gives the receiver an indication of why the BGP session terminated
  - examples: header errors, hold timer expiry, bad peer AS, bad BGP identifier, malformed attribute list, missing required attribute, AS routing loop, etc.
- KEEPALIVE:
  - the protocol requires some data to be sent periodically. If there is no UPDATE to send within the specified time period, then send a KEEPALIVE message to assure the partner that the connection is still alive.
UPDATE Message
- Used to advertise and/or withdraw prefixes.
- Path attributes: a list of attributes that pertain to ALL the prefixes in the Reachability Information field.
- Format (in order):
  - Withdrawn routes length (2 octets)
  - Withdrawn routes (variable length)
  - Total path attributes length (2 octets)
  - Path attributes (variable length)
  - Reachability information (variable length)
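In the BGP-4 wire format each prefix in the withdrawn-routes and reachability fields is encoded as a 1-octet prefix length followed by just enough octets to hold the prefix bits. The sketch below assembles an UPDATE body (everything after the common header) in the field order listed above. The function names are hypothetical, and the path attributes are left empty purely to show the layout; a real announcement must carry mandatory attributes such as ORIGIN, AS_PATH and NEXT_HOP:

```python
import ipaddress
import struct


def encode_prefix(prefix: str) -> bytes:
    net = ipaddress.ip_network(prefix)
    nbytes = (net.prefixlen + 7) // 8  # only the octets that hold prefix bits
    return bytes([net.prefixlen]) + net.network_address.packed[:nbytes]


def encode_update_body(withdrawn, path_attributes: bytes, announced) -> bytes:
    withdrawn_bytes = b"".join(encode_prefix(p) for p in withdrawn)
    nlri = b"".join(encode_prefix(p) for p in announced)
    return (struct.pack("!H", len(withdrawn_bytes)) + withdrawn_bytes
            + struct.pack("!H", len(path_attributes)) + path_attributes
            + nlri)


body = encode_update_body(["10.1.0.0/16"], b"", ["124.39.11.0/24"])
print(body.hex())
```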
Advertising a prefix
- When a router advertises a prefix to one of its BGP neighbors, the information is valid until the first router explicitly advertises that the information is no longer valid.
- BGP does not require routing information to be refreshed.
- If node A advertises a path for a prefix to node B, then node B can be sure node A is using that path itself to reach the destination.
ATTRIBUTES
- ORIGIN:
  - Who originated the announcement? Where was a prefix injected into BGP?
  - IGP, EGP or INCOMPLETE (often used for static routes)
- AS-PATH:
  - a list of ASs through which the announcement for a prefix has passed
  - each AS prepends its AS number to the AS-PATH attribute when forwarding an announcement
  - useful to detect and prevent loops
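A toy Python model of how AS-PATH grows and how it catches loops. The helpers `originate`, `forward` and `accept` are hypothetical names, and the model simplifies by letting the origin AS place its own number in the path when it injects the prefix:

```python
from typing import Optional, Tuple

AsPath = Tuple[int, ...]


def originate(my_as: int) -> AsPath:
    return (my_as,)  # simplification: origin AS appears in the path from the start


def forward(as_path: AsPath, my_as: int) -> AsPath:
    """Prepend my AS number when passing the announcement to an E-BGP neighbor."""
    return (my_as,) + as_path


def accept(as_path: AsPath, my_as: int) -> Optional[AsPath]:
    """Loop detection: drop any announcement whose AS-PATH already lists my AS."""
    return None if my_as in as_path else as_path


path = originate(1)                 # AS1 injects the prefix
path = forward(accept(path, 2), 2)  # AS2 accepts and forwards
path = forward(accept(path, 3), 3)  # AS3 accepts and forwards
print(path)                         # (3, 2, 1)
print(accept(path, 1))              # None: AS1 finds itself, so this is a loop
```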
Attribute: NEXT HOP
- For E-BGP sessions, NEXT HOP = IP address of the neighbor that announced the route.
- For I-BGP sessions, if the route originated inside the AS, NEXT HOP = IP address of the neighbor that announced the route.
- For routes originated outside the AS, the NEXT HOP of the E-BGP node that learned of the route is carried unaltered into I-BGP.
[Figure: BGP table and IP routing table at Router C]
Attribute: Multi-Exit Discriminator (MED)
- Used when ASs are interconnected via 2 or more links.
- The AS announcing a prefix sets the MED:
  - enables AS2 to indicate its preference
- The AS receiving the prefix uses the MED to select a link (a lower MED is preferred).
- A way to specify how close a prefix is to the link it is announced on.
[Figure: AS1 and AS2 interconnected by Link A and Link B, with the announcements carrying MED=10 and MED=50; AS3 also shown]
Attribute: Local Preference
- Used to indicate preference among multiple paths for the same prefix anywhere in the Internet.
- The higher the value, the more preferred.
- Exchanged between I-BGP peers only; local to the AS.
- Often used to select a specific exit point for a particular destination.
[Figure: BGP table at AS4]
Routing Process Overview
[Figure: input policy engine (accept, deny, set preferences) -> BGP table -> choose best route -> IP routing table; output policy engine (forward, not forward, set MEDs)]
Input Policy Engine
- Inbound filtering controls outbound traffic:
  - filters route updates received from other peers
  - filtering based on IP prefixes, AS_PATH, community
  - denying a prefix means BGP does not want to reach that prefix via the peer that sent the announcement
  - accepting a prefix means traffic towards that prefix may be forwarded to the peer that sent the announcement
- Attribute manipulation:
  - sets attributes on accepted routes
  - example: specify LOCAL_PREF to set priorities among multiple peers that can reach a given destination
BGP Decision Process
- 1. Choose the route with the highest LOCAL-PREF.
- 2. If more than 1 route remains, select the route with the shortest AS-PATH.
- 3. If more than 1 route remains, select according to the lowest ORIGIN type, where IGP < EGP < INCOMPLETE.
- 4. If more than 1 route remains, select the route with the lowest MED value.
- 5. Select the minimum-cost path to the NEXT HOP using IGP metrics.
- 6. If multiple internal paths remain, use the BGP router ID to break the tie.
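The six steps can be sketched as successive filters over the candidate routes for one prefix. This is a simplified Python model: the `Route` fields are hypothetical, MED is compared across all remaining routes exactly as the slide lists it (real implementations normally compare MED only among routes from the same neighboring AS), and the router-ID tie-break assumes the lowest ID wins, which is the usual convention:

```python
import ipaddress
from dataclasses import dataclass
from typing import List, Tuple

ORIGIN_RANK = {"IGP": 0, "EGP": 1, "INCOMPLETE": 2}  # IGP < EGP < INCOMPLETE


@dataclass
class Route:
    local_pref: int
    as_path: Tuple[int, ...]
    origin: str      # "IGP", "EGP" or "INCOMPLETE"
    med: int
    igp_cost: int    # IGP cost from this router to the route's NEXT_HOP
    router_id: str   # BGP identifier of the router that sent the route


# One key per step; at every step a smaller key is better.
STEPS = [
    lambda r: -r.local_pref,                           # 1. highest LOCAL_PREF
    lambda r: len(r.as_path),                          # 2. shortest AS_PATH
    lambda r: ORIGIN_RANK[r.origin],                   # 3. lowest ORIGIN type
    lambda r: r.med,                                   # 4. lowest MED
    lambda r: r.igp_cost,                              # 5. closest NEXT_HOP
    lambda r: int(ipaddress.ip_address(r.router_id)),  # 6. lowest router ID
]


def best_route(candidates: List[Route]) -> Route:
    """Candidates must be non-empty; stops as soon as one route is left."""
    remaining = list(candidates)
    for key in STEPS:
        best = min(key(r) for r in remaining)
        remaining = [r for r in remaining if key(r) == best]
        if len(remaining) == 1:
            break
    return remaining[0]


via_as1 = Route(100, (1, 1, 1, 5), "IGP", 0, 10, "10.0.0.1")  # prepended path
via_as2 = Route(100, (2, 5), "IGP", 0, 20, "10.0.0.2")
print(best_route([via_as1, via_as2]) is via_as2)  # shorter AS_PATH wins
```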
Output Policy Engine
- Outbound filtering controls inbound traffic:
  - forwarding a route means others may choose to reach the prefix through you
  - not forwarding a route means others must use another router to reach the prefix
  - may depend upon whether the peer is an E-BGP or I-BGP peer
  - example: if ORIGIN = EGP, you are a non-transit AS, and the BGP peer is a non-customer, then don't forward
- Attribute manipulation:
  - sets attributes such as AS_PATH and MEDs
Transit vs. Nontransit AS
- Transit traffic: traffic whose source and destination are both outside the AS
- Nontransit AS: does not carry transit traffic
  - advertises its own routes only
  - does not propagate routes learned from other ASs
- Transit AS: does carry transit traffic
  - advertises its own routes PLUS routes learned from other ASs
Customers, Providers and Peers
- An AS has customers, providers and peers.
- Relationships between AS pairs:
  - customer-provider
  - peer-to-peer
- The type of relationship influences policies:
  - exporting to a provider: an AS exports its own routes and its customers' routes, but not routes learned from other providers or peers
  - exporting to a peer: same as above
  - exporting to a customer: an AS exports its own routes plus routes learned from its providers and peers (and from its other customers)
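These export rules can be written as a single predicate. A sketch, assuming each route is tagged with the relationship of the neighbor it was learned from (`None` for the AS's own routes) and that customer routes are also passed on to other customers, which is the usual practice; the `Rel` and `should_export` names are hypothetical:

```python
from enum import Enum
from typing import Optional


class Rel(Enum):
    CUSTOMER = "customer"
    PEER = "peer"
    PROVIDER = "provider"


def should_export(learned_from: Optional[Rel], to: Rel) -> bool:
    """learned_from is None for the AS's own routes, else the neighbor type."""
    if learned_from is None or learned_from is Rel.CUSTOMER:
        # Own routes and customers' routes are exported to everyone.
        return True
    # Routes learned from a peer or a provider are exported only to customers.
    return to is Rel.CUSTOMER


for src in (None, Rel.CUSTOMER, Rel.PEER, Rel.PROVIDER):
    allowed = [dst.value for dst in Rel if should_export(src, dst)]
    print(src.value if src else "own", "->", allowed)
```

Keeping peer- and provider-learned routes away from other peers and providers is what stops an AS from offering free transit between them.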
Internal BGP (I-BGP)
- Used to distribute routes learned via E-BGP to all the routers within an AS.
- I-BGP and E-BGP are the same protocol, in that:
  - the same message types are used
  - the same attributes are used
  - the same state machine is used
- BUT they use different rules for readvertising prefixes:
  - Rule 1: prefixes learned from an E-BGP neighbor can be readvertised to an I-BGP neighbor, and vice versa
  - Rule 2: prefixes learned from an I-BGP neighbor cannot be readvertised to another I-BGP neighbor
I-BGP: Preventing Loops and Setting Attributes
- Why Rule 2? To prevent announcements from looping.
  - In E-BGP, loops can be detected via AS-PATH.
  - AS-PATH is not changed in I-BGP.
- Implication of the rule: a full mesh of I-BGP sessions between each pair of routers in an AS is required.
- Setting attributes: the router that injects the route into the I-BGP mesh is responsible for setting the LOCAL-PREF attribute. The AS number is prepended to AS-PATH only when a route is advertised to an E-BGP peer, not inside the I-BGP mesh.
Route Reflectors
- Problem: requiring a full mesh of I-BGP sessions between all pairs of routers is hard to manage for large ASs.
- Solution:
  - group routers into clusters
  - assign a leader to each cluster, called a route reflector (RR)
  - members of a cluster are called clients of the RR
- I-BGP peering:
  - clients peer only with their RR
  - RRs must be fully meshed
[Figure: clusters of clients, each served by an RR, with the RRs fully meshed]
Route Reflectors: Rules on Announcements
- If received from an RR, reflect to clients.
- If received from a client, reflect to RRs and clients.
- If received from E-BGP, reflect to all: RRs and clients.
- RRs reflect only the best route to a given prefix, not all announcements they receive:
  - reduces the size of routing tables
  - sometimes clients don't need to carry the full table
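A sketch of the reflection rules as a mapping from where a route came from to which I-BGP peer groups receive the RR's best route, next to plain I-BGP Rule 2 for comparison. The `PeerType` names are hypothetical; the sketch covers only I-BGP distribution (E-BGP export is governed by the output policy) and assumes the RR never sends a route back to the peer it came from:

```python
from enum import Enum


class PeerType(Enum):
    EBGP = "E-BGP neighbor"
    CLIENT = "client of this RR"
    RR = "another route reflector"


def reflect_to(received_from: PeerType) -> set:
    """I-BGP peer groups that receive the RR's best route for the prefix."""
    if received_from is PeerType.RR:
        return {PeerType.CLIENT}
    # From a client or from E-BGP: reflect to other RRs and to clients
    # (excluding the client that sent it, which is not modelled here).
    return {PeerType.RR, PeerType.CLIENT}


def plain_ibgp_may_readvertise(received_via_ibgp: bool, to_ibgp: bool) -> bool:
    """Rule 2 for ordinary (non-RR) routers: never I-BGP -> I-BGP."""
    return not (received_via_ibgp and to_ibgp)


for source in PeerType:
    print(source.value, "->", sorted(t.value for t in reflect_to(source)))
```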
Avoiding Loops with Route Reflectors
- Loops cannot be detected by the traditional approach using AS-PATH, because AS-PATH is not modified within an AS.
- Announcements could leave a cluster and re-enter it.
- Two new attributes introduced:
  - ORIGINATOR_ID: router ID of the route's originator in the AS. Rule: an announcement is discarded if it returns to the originator.
  - CLUSTER_LIST: a sequence of cluster IDs, set by RRs. Rule: if an RR receives an update and the cluster list contains its cluster ID, then the update is discarded.
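A small model of the two checks, assuming an RR sets ORIGINATOR_ID when it is still missing and prepends its own cluster ID to CLUSTER_LIST each time it reflects. The class, function names and router IDs are hypothetical:

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Announcement:
    prefix: str
    originator_id: Optional[str] = None
    cluster_list: List[str] = field(default_factory=list)


def accept(ann: Announcement, my_router_id: str,
           my_cluster_id: Optional[str] = None) -> bool:
    if ann.originator_id == my_router_id:
        return False  # the route came back to the router that injected it
    if my_cluster_id is not None and my_cluster_id in ann.cluster_list:
        return False  # an RR sees its own cluster ID: the update looped
    return True


def reflect(ann: Announcement, sender_id: str, my_cluster_id: str) -> Announcement:
    # Record the originator if not yet set, and prepend this RR's cluster ID.
    return Announcement(
        prefix=ann.prefix,
        originator_id=ann.originator_id or sender_id,
        cluster_list=[my_cluster_id] + ann.cluster_list,
    )


# Client 1.1.1.1 injects a route; RR1 (9.9.9.1, cluster "10") reflects it
# to RR2 (cluster "20"), which reflects it again.
via_rr1 = reflect(Announcement("140.40.0.0/16"), sender_id="1.1.1.1", my_cluster_id="10")
via_rr2 = reflect(via_rr1, sender_id="9.9.9.1", my_cluster_id="20")
print(via_rr2)
print(accept(via_rr2, my_router_id="9.9.9.1", my_cluster_id="10"))  # back at RR1
```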
Default Routes
- If you don't have a routing entry in the table for a destination, send it along the default route.
- Can be statically configured.
- Can be learned via BGP.
- Can have multiple defaults:
  - use LOCAL_PREF to distinguish primary and backup default routes
[Figure: AS1 learns 0.0.0.0/0 from AS2 via next_hop=1.1.1.1 and next_hop=2.2.2.2, with LOCAL_PREF values 100 and 300]
Multihoming
Single-homed vs. Multi-homed subscribers
- A single-homed network has one connection to the Internet (i.e., to networks outside its domain).
- A multi-homed network has multiple connections to the Internet. Two scenarios:
  - multi-homed to a single provider
  - multi-homed to multiple providers
- Why multi-home?
  - reliability
  - performance: a site's bandwidth to the Internet is the sum of the bandwidth on all links
Single-homed AS
- The subscriber is called a stub AS.
- Provider-subscriber communication for route advertisement:
  - static configuration (most common): the provider's router is configured with the subscriber's prefix
    - good if the customer's routes can be represented by a small set of aggregate routes
    - bad if the customer has many noncontiguous subnets
  - BGP can be used between the provider and the stub AS
[Figure: provider and subscriber connected via routers R1 and R2]
Multihoming to a Single Provider: 4 scenarios
Multihoming to Multiple Providers
Multihoming Issues
- Load sharing:
  - how should the traffic be distributed over the multiple links?
- Reliability:
  - if load sharing leads to preferring certain links for certain subnets, is reliability reduced?
- Address/aggregation:
  - which subnet addresses should the multihomed customer use?
  - how will this affect its providers' ability to aggregate routes?
Load sharing from ISP to Customer using attributes
- Goal: the provider splits traffic across 2 links according to prefix.
- Implement this strategy using attributes:
  - the customer sets MEDs
  - the provider sets LOCAL_PREF
[Figure: customer connected to the provider over two links, with routers R2 and R3]
Load sharing from Customer to ISP using policy
- Goal: send traffic to the ISP's customers on one link; send traffic to the rest of the Internet on the 2nd link.
- Implement using policy to control announcements.
[Figure: customer router R3 connected to ISP routers R1 and R2; one ISP router advertises the ISP's customer routes only, the other advertises the default route 0/0; announcements shown in blue, traffic in red]
Address/Aggregation Issue
- Where should the customer get its address block from?
  - 1. From ISP1
  - 2. From ISP2
  - 3. From both ISP1 and ISP2
  - 4. Independently from an address registry
- (Cases 1 and 2 are equivalent.)
Cases 1 and 2: Get address block from one ISP
- Example: the customer gets its address block from ISP 1.
  - ISP 1's aggregation is not broken.
  - The customer's prefix is not aggregatable at ISP 2.
  - The longer prefix becomes a traffic magnet.
- How good is load sharing? If all ISPs generate the same amount of traffic for the customer, then the ISP2-customer link is twice as loaded as the ISP1-customer link.
Case 3: Get address block from both ISPs
- Announcement policy: announce each prefix only to its parent ISP.
- Advantage: both ISPs can aggregate the prefix they receive.
- Disadvantage: lose reliability.
- Load balancing good? Depends upon how much traffic is sent to each prefix.
[Figure: provider blocks 140.20/16 and 200.50/16; the customer holds 140.20.1/24 from one provider and 200.50.1/24 from the other]
Case 4: Obtain address block from registry
- No aggregation possible.
- No traffic magnets created.
- Good reliability.
- How to achieve load sharing?
  - the customer breaks its address block into 2 /25 blocks and announces one per link (but may lose reliability)
  - OR uses the method of AS-PATH manipulation
[Figure: the customer announces 150.55.10/24 on both links]
AS-PATH manipulation
- Idea: prepend your AS number in the AS-PATH multiple times to discourage use of a link.
  - makes the AS-PATH seem longer than it is
  - recall that the BGP decision process uses shortest AS-PATH length as a criterion for selecting the best path
- Example: ISP 3 will choose the path through AS 2 because its AS-PATH appears shorter.
[Figure: 150.55.10/24 announced with an artificially lengthened AS-PATH on one link]
Aggregation
Address Arithmetic: When is aggregation possible? Case 1
- Possible when one prefix is contained in another.
- Example: two ASs having a customer-provider relationship. The provider does the aggregation.
  - The provider has address block 18.0.0.0/8.
  - Its customers have address blocks 18.6.0.0/15 and 18.9.0.0/15.
  - The provider announces its address block only.
- Rule: prefix $p_1$ contains prefix $p_2$ iff
$$\mathrm{len}(p_2) > \mathrm{len}(p_1) \quad\text{and}\quad \left\lfloor \frac{\mathrm{addr}(p_2)}{2^{32-\mathrm{len}(p_1)}} \right\rfloor = \left\lfloor \frac{\mathrm{addr}(p_1)}{2^{32-\mathrm{len}(p_1)}} \right\rfloor$$
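The rule maps directly onto integer arithmetic: dividing by $2^{32-\mathrm{len}(p_1)}$ and discarding the remainder is the same as shifting right by $32-\mathrm{len}(p_1)$ bits. A Python sketch using the provider block and the first customer block from the slide, cross-checked against `ipaddress`'s `subnet_of` (Python 3.7+). The second customer block, 18.9.0.0/15, is not used here because 9 is odd, so that address does not fall on a /15 boundary and `ipaddress.ip_network` would reject it. Following the slide's strict `>`, a prefix is not counted as containing itself:

```python
import ipaddress


def contains(p1: str, p2: str) -> bool:
    """Slide rule: len(p2) > len(p1) and both agree on the top len(p1) bits."""
    n1, n2 = ipaddress.ip_network(p1), ipaddress.ip_network(p2)
    shift = 32 - n1.prefixlen  # integer division by 2**shift == right shift
    a1, a2 = int(n1.network_address), int(n2.network_address)
    return n2.prefixlen > n1.prefixlen and (a2 >> shift) == (a1 >> shift)


print(contains("18.0.0.0/8", "18.6.0.0/15"))  # True
print(contains("18.0.0.0/8", "19.6.0.0/15"))  # False
# Cross-check with the standard library's containment test
print(ipaddress.ip_network("18.6.0.0/15").subnet_of(ipaddress.ip_network("18.0.0.0/8")))
```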
Address Arithmetic: When is aggregation possible? Case 2
- Some pairs of consecutive prefixes.
- Example: routes within the same AS.
  - The AS has 2 address blocks:
    - 1.2.2.0/24 = 00000001.00000010.00000010.00000000/24
    - 1.2.3.0/24 = 00000001.00000010.00000011.00000000/24
  - It can announce 1.2.2.0/23 instead.
- Rule: consecutive prefixes $p_1$ and $p_2$ are aggregatable iff
$$\mathrm{len}(p_1) = \mathrm{len}(p_2) \quad\text{and}\quad \left\lfloor \frac{\mathrm{addr}(p_1)}{2^{32-\mathrm{len}(p_1)}} \right\rfloor + 1 = \left\lfloor \frac{\mathrm{addr}(p_2)}{2^{32-\mathrm{len}(p_2)}} \right\rfloor \quad\text{and}\quad \mathrm{addr}(p_1) \bmod 2^{33-\mathrm{len}(p_1)} = 0$$
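A sketch of the same three conditions (equal length, consecutive, and $p_1$ aligned on a boundary of the one-bit-shorter prefix), again with right shifts standing in for the divisions. The helper names are hypothetical; `supernet()` is the standard-library call that drops one prefix bit:

```python
import ipaddress


def aggregatable(p1: str, p2: str) -> bool:
    """Slide rule for merging two consecutive, equal-length prefixes."""
    n1, n2 = ipaddress.ip_network(p1), ipaddress.ip_network(p2)
    length = n1.prefixlen
    if length == 0 or n2.prefixlen != length:
        return False
    shift = 32 - length
    a1, a2 = int(n1.network_address), int(n2.network_address)
    consecutive = (a1 >> shift) + 1 == (a2 >> shift)
    aligned = a1 % 2 ** (shift + 1) == 0  # addr(p1) mod 2^(33 - len(p1)) == 0
    return consecutive and aligned


def aggregate(p1: str) -> str:
    """The covering prefix: p1 with one fewer prefix bit."""
    return str(ipaddress.ip_network(p1).supernet())


print(aggregatable("1.2.2.0/24", "1.2.3.0/24"), aggregate("1.2.2.0/24"))
print(aggregatable("1.2.3.0/24", "1.2.4.0/24"))  # 1.2.3.0 is not on a /23 boundary
```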
Aggregation and Multihoming
- Most common scenario: the customer breaks its address block in 2 for load sharing purposes, YET also advertises the whole block for reliability.
[Figure: the customer announces 1.2.2.0/24 on one link and 1.2.3.0/24 on the other, plus 1.2.2.0/23 on both; AS 2 shown]
Black holes and cardinal sins
- The cardinal sin of BGP routing is advertising routes that you don't know how to get to, called "black-holing" someone.
- If you announce part of the IP space owned by someone else using a more-specific prefix, all the traffic for that space flows to you. This disconnects that address block from the Internet.
- Example: black holes can happen inadvertently through careless aggregation.
[Figure: ISP 1 holds 100.24/16 and ISP 2 holds 222.2/16; an announcement of 100.24.0.0/18 is marked "wrong!!"]
Limitations to Aggregation
- Lack of hierarchical allocation of address space prior to CIDR (before 1995).
- A single AS can have noncontiguous address blocks.
- Customer AS and provider AS can have non-contiguous address blocks.
- Reluctance of customers to renumber their address space when they switch providers.
- Multihoming:
  - multihomed prefixes require global visibility
  - may choose not to aggregate, for load sharing
Rules of Thumb for Aggregation
- To avoid black holes, an ISP is not allowed to aggregate routes unless it holds a supernet of the address blocks it wants to aggregate.
- In other words, an ISP has to specifically announce each IP address range of its downstream customers that is not part of its own address space.
Routing Instability
Route Stability
- Routing instability: rapid fluctuation of network reachability information.
- Route flapping: when a route is withdrawn and re-announced repeatedly in a short period of time.
  - happens via UPDATE messages
  - because messages propagate to the global Internet, route flapping behavior can cascade and deteriorate routing performance in many places
- Effects: increased packet loss, increased network latency, CPU overhead, loss of connectivity.
- Route stability studies by C. Labovitz, R. Malan and F. Jahanian.
Types of Routing Updates
- Forwarding instability:
  - reflects legitimate topology changes
  - e.g., changes in prefix, NEXT_HOP and/or AS_PATH
  - affects the forwarding paths used
- Policy fluctuation:
  - reflects changes in policy
  - e.g., changes in MED, LOCAL_PREF, etc.
  - may not necessarily affect the forwarding paths used
- Pathological:
  - redundant messages
  - reflect neither topology nor policy changes
Anecdotes of Route Flap Storms
- April 25, 1997: a small Virginia ISP injected an incorrect map into the global Internet. The map said the Virginia ISP had optimal connectivity to all destinations, so everyone sent their traffic to this ISP. Result: shutdown of Tier-1 ISPs for 2 hours.
- August 14, 1998: a misconfigured database server forwarded all queries for .net to the wrong server. Result: loss of connectivity to all .net servers for a few hours.
- Nov. 8, 1998: a router software bug led to a malformed routing control message, which caused an interoperability problem between Tier-1 ISPs. Result: persistent pathological oscillations and connectivity loss for several hours.
Taxonomy (as per Labovitz et al.)
General Statistics
- 1996: 3-5 million updates per day in the Internet core.
- 1998: 300K-700K updates per day in the Internet core.
- 1996: the average number of announcements per day was 275K.
- 1998: the average number of announcements per day was 400K.
- Correlation of instability and usage:
  - instability is highest during business hours
  - instability is lowest during nights, on weekends and in summer
Per Event Type Statistics
- 1996 relative impact (approximately): WWDup (96%), AADup (2%), WADup (1%), AADiff (0.5%), WADiff (0.5%)
- 1998 relative impact (approximately): AADup (30-40%), WWDup (25-30%), AADiff (20%), other (rest)
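The category names can be illustrated with a small classifier over a stream of updates for one prefix from one peer. This follows the categories of the Labovitz et al. study as we read them (AA = announcement followed by announcement, WA = withdrawal followed by announcement, WW = repeated withdrawal; Dup = same path, Diff = different path), and it is a simplification: it compares only AS paths, whereas the study considers route attributes more broadly. The event encoding is an assumption of this sketch:

```python
from typing import List, Optional, Tuple

Event = Tuple[str, Optional[Tuple[int, ...]]]  # ("A", as_path) or ("W", None)


def classify(events: List[Event]) -> List[str]:
    labels = []
    last_kind, last_path = None, None
    for kind, path in events:
        if last_kind is None:
            label = "first"
        elif kind == "W":
            label = "WWDup" if last_kind == "W" else "withdrawal"
        elif last_kind == "W":
            label = "WADup" if path == last_path else "WADiff"
        else:
            label = "AADup" if path == last_path else "AADiff"
        labels.append(label)
        last_kind = kind
        if kind == "A":
            last_path = path  # remember the last announced path across withdrawals
    return labels


stream = [("A", (1, 2)), ("A", (1, 2)), ("A", (1, 3)), ("W", None),
          ("W", None), ("A", (1, 3)), ("W", None), ("A", (4,))]
print(classify(stream))
```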
Who's Responsible?
- ASs:
  - no single AS dominates instability statistics
  - no correlation between the size of an AS and its share of updates generated
- Prefixes:
  - instability is evenly distributed across routes
- Example measurements:
  - 75% of AADiff events come from prefixes that change less than 10 times a day
  - 80-90% of instability comes from prefixes that are announced less than 50 times/day
Sources of Instabilities in General
- router configuration errors
- transient physical and data link problems
- software bugs
- problems with leased lines (electrical timing issues that cause false alarms of disconnect)
- router failures
- network upgrades and maintenance
Instability Problem and Cause. Example 1.
- Problem: 3-5 million duplicate withdrawals.
- Cause: stateless BGP implementation:
  - time-space tradeoff: no state maintained on what was advertised to peers
  - when any change is received, transmit a withdrawal to all peers regardless of whether they were previously notified or not
  - sent updates for both explicit and implicit withdrawals
- By 1998, most vendors had BGP implementations with partial state.
- Result: the number of WWDups was reduced by an order of magnitude.
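The fix amounts to remembering, per peer, what was last sent. A minimal sketch of such per-peer state, with a hypothetical `AdjRibOut` class (the name borrows BGP's Adj-RIB-Out term, but this is not any vendor's code):

```python
from typing import Dict, Optional, Tuple


class AdjRibOut:
    """What this router last told one peer, per prefix."""

    def __init__(self) -> None:
        self.sent: Dict[str, Tuple[int, ...]] = {}

    def announce(self, prefix: str, as_path: Tuple[int, ...]) -> Optional[tuple]:
        if self.sent.get(prefix) == as_path:
            return None  # identical to what the peer already has: no AADup
        self.sent[prefix] = as_path
        return ("A", prefix, as_path)

    def withdraw(self, prefix: str) -> Optional[tuple]:
        if prefix not in self.sent:
            return None  # the peer never heard of this route: no WWDup
        del self.sent[prefix]
        return ("W", prefix)


peers = {name: AdjRibOut() for name in ("peer1", "peer2", "peer3")}
peers["peer1"].announce("140.40.4.0/24", (2, 5))

# A stateless speaker would send 3 withdrawals here; the stateful one sends 1.
sent = []
for rib in peers.values():
    msg = rib.withdraw("140.40.4.0/24")
    if msg is not None:
        sent.append(msg)
print(sent)
```

The trade-off the slide mentions is visible here: the saving in messages costs one table entry per prefix per peer.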
Instability Problem and Cause. Example 2
- Problem: duplicate announcements.
- Cause: min-advertisement timer and stateless BGP:
  - min-adv timer: wait 30 seconds, then combine all updates received in the last 30 seconds into a single outbound update message (if possible)
  - within 30 seconds a route can be withdrawn and re-announced so that there is no net change to the original announcement
- Solution: have BGP keep some state about recently sent messages to peers, and avoid sending duplicate messages.
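Instability Problem and Cause. Example 3
[Figure: AS1, AS2 and AS3 with border and internal routers, connected by E-BGP and IGP links; IGP link metrics 6, 5, 4, 10, 3 and 2; destination Net X]
Example: interaction between IGP and BGP policy. MEDs are set using IGP metrics, such as the shortest-path cost, so changes inside the IGP can show up as BGP updates.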
Controlling route instability: Route Dampening
- Track the number of times a route has flapped over a period of time.
- Routes that exhibit a high level of instability in a short period of time should be suppressed (not advertised).
- Penalize ill-behaved routes proportionally to their expected future stability.
- If a suppressed route stops flapping for a long enough period of time, unsuppress it (readvertise).
Route Dampening
[Figure: penalty vs. time, with the suppress limit and reuse limit thresholds marked]
Route Dampening Algorithm
- Each time a route flaps, increase the penalty.
- If the route has not flapped in the last half-life time units, then cut the penalty in half.
- If the penalty > suppress limit, then suppress the route.
- If the penalty < reuse limit, then free a suppressed route.
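The algorithm can be sketched per route in Python. This is an illustrative model, not a vendor implementation: instead of halving the penalty in discrete steps it applies the equivalent continuous exponential decay (the penalty halves every `half_life` time units), and the numeric defaults (1000 per flap, suppress limit 2000, reuse limit 750, half-life 15 minutes) are assumptions chosen for illustration:

```python
from dataclasses import dataclass


@dataclass
class DampenedRoute:
    half_life: float = 15.0        # minutes (illustrative)
    flap_penalty: float = 1000.0   # added on every flap (illustrative)
    suppress_limit: float = 2000.0
    reuse_limit: float = 750.0
    penalty: float = 0.0
    last_update: float = 0.0
    suppressed: bool = False

    def _decay(self, now: float) -> None:
        # Continuous form of "halve the penalty every half-life".
        self.penalty *= 0.5 ** ((now - self.last_update) / self.half_life)
        self.last_update = now

    def flap(self, now: float) -> None:
        self._decay(now)
        self.penalty += self.flap_penalty
        if self.penalty > self.suppress_limit:
            self.suppressed = True

    def is_suppressed(self, now: float) -> bool:
        self._decay(now)
        if self.suppressed and self.penalty < self.reuse_limit:
            self.suppressed = False  # stable long enough: readvertise
        return self.suppressed


route = DampenedRoute()
for minute in (0, 1, 2):  # three flaps in quick succession
    route.flap(minute)
print(route.is_suppressed(2), round(route.penalty))
print(route.is_suppressed(60), round(route.penalty))
```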
Side Effect of Route Dampening
- A legitimate update may arrive and be ignored because that route has been suppressed and not yet released.
- The modification needed for the legitimate announcement is delayed.
Aggregation can help route flapping
- If a more-specific route is flapping but the provider only announces the aggregated prefix, then other networks don't see the route flap.
- Hence aggregation can mask route flapping.
- Aggregation helps combat instability because it reduces the number of networks visible in the core Internet.
[Figure: the aggregate 140.40/16 is announced and not flapping, while the more-specific 140.40.4/24 behind it is flapping]
Growth of BGP Tables
Long Term Growth Trends in Internet Routing
- Will this routing system be able to scale and meet the growth of the Internet and its ever-expanding level of demands?
- Are there any inherent limitations?
- As more devices connect to the Internet and consume addresses, the need to maintain reachability to these addresses implies larger routing tables.
- What is the ability of the system to produce a stable view of the overall network topology?
BGP Table Growth (1989-2001)
AS Number Growth
Reasons for Exponential Growth
Data in the last 3 slides from Geoff Huston's INET publications.
Increasingly fine levels of routing detail in the BGP table
- AS number space growing at 50%, while the addresses spanned by the table grow at 7% -> each AS is advertising smaller address ranges.
- /24 is the fastest-growing prefix length in the table (in absolute terms); /24 - /31 is the area with the fastest relative growth.
- 1999: average span of a prefix 16,000 addresses (mean prefix length 18.03).
- 2000: average span of a prefix 12,000 addresses (mean prefix length 18.44).
Holes
- When both an aggregated prefix and a more-specific prefix exist in the table, the more-specific prefix is called a hole.
- Why are holes created?
  - to punch a hole in the policy of a larger aggregated announcement, creating a different policy for the finer announcement
  - load sharing in a multihoming scenario
  - reliability via multihoming
- 37% of the BGP table is holes.
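Holes can be counted mechanically: a prefix is a hole if some less-specific prefix in the same table covers it. A sketch with a small made-up table (not measured data); the `find_holes` name is hypothetical and its quadratic scan is only for illustration:

```python
import ipaddress
from typing import List


def find_holes(prefixes: List[str]) -> List[str]:
    """A prefix is a hole if a less-specific prefix covering it is also present."""
    nets = [ipaddress.ip_network(p) for p in prefixes]
    return [str(n) for n in nets
            if any(other != n and n.subnet_of(other) for other in nets)]


table = ["140.40.0.0/16", "140.40.4.0/24", "18.0.0.0/8", "18.6.0.0/15", "124.39.0.0/16"]
holes = find_holes(table)
print(holes, f"-> {len(holes) / len(table):.0%} of this table")
```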
Multihoming vs. Resiliency
- Multiple peering relationships can be cheaper than using a single upstream provider:
  - implies multihoming is seen as a substitute for upstream service resiliency
- Impact:
  - providers can't command a price for reliability, and thus don't spend money to engineer it into the network
  - resiliency is becoming the responsibility of the customer, not the provider
- Can BGP scale adequately to continue to undertake this role?
Conclusions
- Things are getting better (stability):
  - router software and configuration management are maturing
  - increased emphasis on aggregation and route dampening is helping
- Things are getting worse (scalability):
  - multihoming is still growing
  - Internet topology is growing less hierarchical
  - noise in the BGP table is growing
Bibliography
Bibliography (contd.)