Title: INFO 330 Computer Networking Technology I
- Chapter 4
- The Network Layer
- Glenn Booker
The Network Layer
- So, the transport layer provides process-to-process communication
- The network layer is expected to provide host-to-host communication
- Cool.
- Um, how?
The Network Layer
- The network layer has to do two things:
- Forwarding is the process, within a single router, of determining which outgoing link a packet has to take
- Routing is the process (and algorithm) of choosing the best path (route) between source and destination
- Forwarding is like deciding which turn to make at one intersection
- Routing is deciding which roads to take
The Network Layer
- Recall that the network layer is expected to:
- Receive segments from the transport layer
- Encapsulate them into datagrams (how much does data weigh?)
- And pass them through the network
- The job of most routers is to look at the network header information and determine over which link to pass the datagram
- The application and transport layer information are invisible and irrelevant to routers
The Network Layer
- A router has a forwarding table which tells which link to take, based on the header's destination address
- The forwarding table is written based on output from a routing algorithm
- Routing algorithms may be centrally controlled and then downloaded to each router, or each router may run its own algorithm
The Network Layer
- A packet switch is a device that transfers a packet from an input link to an output link
- Some are link-layer switches, which use the link-layer header info
- The rest we call routers, which use network-layer header info
- Another function in the network layer can be connection setup
- Only for virtual circuit networks (ATM, X.25)
Network Service Model
- What services could we expect from a network layer?
- Guaranteed delivery of all packets
- Delivery within a specified time (bounded delay)
- Delivery of packets in order
- Guaranteed minimal bandwidth
- Guaranteed maximum jitter (delay variation)
- Security services
- Would be nice, huh?
Network Service Model
- What do we get from the Internet?
- Best-effort service
- Meaning, none of the above!!
- Some VC networks, such as ATM, can provide many of the ideal services (see p. 322)
- Constant Bit Rate (CBR) and Available Bit Rate (ABR) are types of ATM service
Network Service Model
- Refining our earlier definition, the network layer can provide connection-based or connectionless service
- A network that provides only a connection-based service at the network layer is a virtual circuit (VC) network
- A network that provides only connectionless service at the network layer is a datagram network
Virtual Circuit Networks
- A VC network needs to have:
- A path from source to destination
- VC numbers, one per link along the path
- Entries in the forwarding table in each router along the path
- Each packet carries a VC number, which changes as it goes along each link in the VC
- This avoids having to agree on a single VC number across every router along the path
Virtual Circuit Networks
- Each router has to know the VC numbers for its incoming and outgoing links:
Incoming Link | Incoming VC | Outgoing Link | Outgoing VC
- Each foursome of in/out link and VC numbers corresponds to how one VC is handled in that router, so each VC being created adds one line of data (which is later removed)
Virtual Circuit Networks
- So a simple VC might have VC 12 on the first link, then get VC 22 on the second link, and VC 37 on the third
- So the life of a VC connection includes:
- VC setup: the network layer defines the routers in the VC, sets VC numbers for each link, and creates new entries in the forwarding table of each router
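To make the per-link VC numbers concrete, here is a minimal Python sketch of the forwarding-table rows that VC setup adds and VC teardown removes. The dictionary layout, the function names, and the local link numbers are assumptions for illustration; only the VC numbers 12, 22, and 37 come from the example above.

```python
from typing import Dict, Tuple

# Hypothetical per-router VC table:
# (incoming link, incoming VC) -> (outgoing link, outgoing VC)
VcTable = Dict[Tuple[int, int], Tuple[int, int]]


def vc_setup(table: VcTable, in_link: int, in_vc: int, out_link: int, out_vc: int) -> None:
    """VC setup: add one row to this router's forwarding table."""
    table[(in_link, in_vc)] = (out_link, out_vc)


def vc_teardown(table: VcTable, in_link: int, in_vc: int) -> None:
    """VC teardown: remove this VC's row from the table."""
    del table[(in_link, in_vc)]


def forward(table: VcTable, in_link: int, in_vc: int) -> Tuple[int, int]:
    """Data transfer: find the outgoing link and the new VC number to stamp on the packet."""
    return table[(in_link, in_vc)]


# Assumed path: host A -(VC 12)-> R1 -(VC 22)-> R2 -(VC 37)-> host B.
# Link numbers are local interface numbers, chosen arbitrarily for illustration.
r1: VcTable = {}
r2: VcTable = {}
vc_setup(r1, in_link=1, in_vc=12, out_link=2, out_vc=22)
vc_setup(r2, in_link=1, in_vc=22, out_link=3, out_vc=37)

link, vc = forward(r1, 1, 12)   # leaves R1 on link 2 carrying VC 22
link, vc = forward(r2, 1, vc)   # leaves R2 on link 3 carrying VC 37
print(link, vc)                 # 3 37
```

Each router only needs its own rows; no router has to know the VC numbers used on links it is not attached to.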
Virtual Circuit Networks
- Data transfer is the intended purpose of the VC connection
- VC teardown is when the sender or receiver informs the network layer that it wants to end the connection; then the forwarding tables are updated to remove the entries associated with this VC
- Notice that VC setup and teardown involve the hosts and all routers along the path, whereas TCP only involved the hosts
Virtual Circuit Networks
- The messages to set up and tear down a VC are signaling messages, which have their own protocols, e.g. ATM's Q.2931
- No, we're not going to dissect them
- Yippee
Datagram Networks
- Datagram networks stamp each packet with the address of the destination host and send it into the network
- There is no state information about connections, because there aren't any connections within the network!
Datagram Networks
- Each router between hosts uses the address to forward the packet using a forwarding table
- If our addresses had 32 bits, there could be 2^32 = 4,294,967,296 entries in that table!
Datagram Networks
- Fortunately, we don't need to look at ALL of the address to determine its correct link (a key observation!)
- Instead, match the address prefix with forwarding table entries
- Use the longest prefix matching rule:
- Match the longest prefix possible in the forwarding table
- For this to be practical, large ranges of addresses should go to each link, or the table will be huge!
Longest prefix matching rule
- The router just finds the longest matching prefix and uses that entry in the forwarding table to forward the packet
Prefix                        Link
11001000 00010111 00010       0
11001000 00010111 00011000    1
11001000 00010111 00011       2
Otherwise                     3
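A linear-scan sketch of the longest prefix matching rule, using the three prefixes from the table above. Assumption: prefixes are stored as bit strings and searched one by one for clarity; real routers use tries or special hardware instead. The dotted-decimal test addresses are just those prefixes written in decimal, since 11001000 = 200 and 00010111 = 23.

```python
# Forwarding table from the slide: (prefix as a bit string, outgoing link)
FORWARDING_TABLE = [
    ("11001000" "00010111" "00010", 0),
    ("11001000" "00010111" "00011000", 1),
    ("11001000" "00010111" "00011", 2),
]
DEFAULT_LINK = 3  # the "Otherwise" row


def to_bits(dotted: str) -> str:
    """Convert a dotted-decimal IPv4 address to a 32-character bit string."""
    return "".join(f"{int(octet):08b}" for octet in dotted.split("."))


def lookup(address: str) -> int:
    """Return the link for the longest prefix that matches the address."""
    bits = to_bits(address)
    best_len, best_link = -1, DEFAULT_LINK
    for prefix, link in FORWARDING_TABLE:
        if bits.startswith(prefix) and len(prefix) > best_len:
            best_len, best_link = len(prefix), link
    return best_link


print(lookup("200.23.22.161"))  # 0
print(lookup("200.23.24.170"))  # 1 (matches links 1 and 2; the 24-bit prefix wins)
print(lookup("200.23.25.1"))    # 2
print(lookup("10.0.0.1"))       # 3
```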
Datagram Networks
- So even though there is no connection state, routers in datagram networks still need to maintain their forwarding tables
- The routing algorithm typically updates them every 1-5 minutes
- Hence it's quite possible for the later part of a long session to follow a different path than the first part!
More History
- The VC network came about because of its similarity to telephone networks
- But the Internet was connecting complex computers, so the datagram network was created: the end computers could handle more complex operations than the routers (recall our IMP friends from Chapter 1)
- This also makes it easier to connect dissimilar networks and create many new applications
Router Innards
- Now look at forwarding in more detail
- A router has four kinds of parts
- Input ports
- Output ports
- Switch fabric between the inputs and outputs
- And a routing processor to control the switch
fabric, using the routing protocols
Router Innards
- The input and output ports include:
- The physical connection to the network, and
- Processing of the signal through the data link layer
- The input ports also look up the destination address, decide how to forward the packet, and pass control packets (such as routing protocol messages) to the routing processor
- The three boxes represent the physical layer, data link layer, and lookup/forward module
Input Ports
- The routing processor determines the forwarding table contents and shadow-copies it to each input port
- This avoids a processing bottleneck at the routing processor
- Looking up where to forward packets is simple in concept; the challenge is maintaining line speed
- Want to process each packet in less time than it takes to receive the next one
Tree Lookup
- One way to look up the correct output port is through a binary tree data structure
- Look at the first bit in the address: if it's a zero, follow the left branch of the tree; otherwise follow the right branch
- Repeat as many times as needed to resolve the address
- Sadly, this is still too slow, since a 32-bit address can take up to 32 steps per lookup
- Content addressable memories (CAMs), caching, and better data structures are possible solutions
Tree Lookup
For a 3-bit address
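A minimal binary-tree (trie) lookup in Python, following the left-for-0, right-for-1 rule described above. The four prefixes and port numbers are hypothetical, chosen to fit a 3-bit address; the lookup remembers the last port it passes so that it returns the longest matching prefix.

```python
from typing import List, Optional


class TrieNode:
    """One node of a binary lookup tree: child[0] is the left branch, child[1] the right."""

    def __init__(self) -> None:
        self.child: List[Optional["TrieNode"]] = [None, None]
        self.port: Optional[int] = None  # set if a forwarding-table prefix ends here


def insert(root: TrieNode, prefix: str, port: int) -> None:
    node = root
    for bit in prefix:
        b = int(bit)
        if node.child[b] is None:
            node.child[b] = TrieNode()
        node = node.child[b]
    node.port = port


def lookup(root: TrieNode, address: str) -> Optional[int]:
    """Walk one bit at a time, remembering the last (longest) prefix seen."""
    node: Optional[TrieNode] = root
    best = root.port
    for bit in address:
        node = node.child[int(bit)]
        if node is None:          # no deeper prefix exists on this branch
            break
        if node.port is not None:
            best = node.port
    return best


# Hypothetical 3-bit forwarding table: prefix -> output port
root = TrieNode()
for prefix, port in [("0", 0), ("10", 1), ("11", 2), ("111", 3)]:
    insert(root, prefix, port)

print(lookup(root, "010"), lookup(root, "101"), lookup(root, "110"), lookup(root, "111"))  # 0 1 2 3
```

Each bit costs one step (one memory access in hardware), which is why this approach alone struggles to keep up with line speed for 32-bit addresses.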
Switching Fabric
- The input ports determine the output port needed; the switching fabric makes it happen
- Many approaches for switching fabric have been used
- Switching via memory uses the CPU directly
- Switching via bus makes every packet go over a shared bus before getting off at the correct output; only one packet can cross at a time, so this is very slow
Switching Fabric
- Switching via an interconnection network uses 2n horizontal and vertical buses to connect n inputs to n outputs, but this can produce blockages
- Lots of other approaches have been used
- Switches handle staggering data rates (400 million packets/sec as of 11/09), so their technology is constantly being pushed
Switching Fabric Approaches
Output Ports
- The output ports take packets from the output port memory (queue) and transmit them over the outgoing link
- Hence the three functions of output ports are:
- Queuing
- Data link processing
- Physical line termination
Queuing
- We've discussed buffers in connection with output ports, but they also exist at input ports
- Packet loss can occur at input or output queues, depending on:
- Input traffic load
- Switching fabric speed
- Line speed
Switching Fabric Speed
- For a router with n input and n output ports:
- If the switching fabric is n times as fast as the input line speed, no queuing can occur at the inputs
- But the output ports can easily become overloaded if many inputs all feed the same output port
- A packet scheduler at the output port decides which packet is next for transmission
Packet Scheduler
- The packet scheduler needs rules
- Could use a first-come, first-served (FCFS) approach
- Could use weighted fair queuing (WFQ)
- The packet scheduler affects the quality of service of the connection
- More details on this in Chapter 7, which we aren't covering this term
Incoming Buffer
- If there's not enough room in the buffer for a new incoming packet, we have to decide whether to:
- Drop the new packet (called drop-tail), or
- Drop an existing packet to make room
- Can also mark packets for congestion control when the buffer is getting full
- Dropping and marking strategies are Active Queue Management (AQM) algorithms
Incoming Buffer
- Examples of AQM algorithms include:
- Random Early Detection (RED), which uses random variables to decide when to drop or mark a packet as the buffer approaches full
- If the switch fabric is too slow, packets have to wait in the input queue before moving to an output queue
- Head-of-the-line (HOL) blocking is when a packet waits for the packet ahead of it to cross the fabric, even though its own output port is open
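A simplified sketch of the RED drop decision, assuming the classic form: an exponentially weighted average of the queue length, two thresholds (min_th, max_th), and a maximum drop probability max_p. The parameter values and queue lengths here are hypothetical, and full RED adds a correction based on how many packets have arrived since the last drop, which this sketch leaves out.

```python
import random


def update_average(avg: float, queue_len: int, weight: float = 0.002) -> float:
    """Exponentially weighted moving average of the queue length."""
    return (1 - weight) * avg + weight * queue_len


def drop_probability(avg: float, min_th: float, max_th: float, max_p: float) -> float:
    """Never drop below min_th, always drop at or above max_th,
    and ramp linearly from 0 up to max_p in between."""
    if avg < min_th:
        return 0.0
    if avg >= max_th:
        return 1.0
    return max_p * (avg - min_th) / (max_th - min_th)


def should_drop(avg: float, min_th: float, max_th: float, max_p: float,
                rng: random.Random) -> bool:
    """The 'random' in RED: drop (or mark) with the computed probability."""
    return rng.random() < drop_probability(avg, min_th, max_th, max_p)


rng = random.Random(42)
avg = 0.0
for q in [2, 4, 8, 12, 14, 14, 14]:            # hypothetical queue lengths (packets)
    avg = update_average(avg, q, weight=0.5)   # large weight so the average moves quickly
    p = drop_probability(avg, 5, 15, 0.1)
    print(f"queue={q:2d} avg={avg:6.2f} p={p:.3f} drop={should_drop(avg, 5, 15, 0.1, rng)}")
```

Using an average rather than the instantaneous queue length lets short bursts through while still reacting to sustained congestion.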
The Internet Protocol (IP)
- Now see how all this applies to the Internet
- We'll cover both the existing IPv4 and the emerging IPv6 (versions 4 and 6)
- The network layer has three major parts:
- Internet Protocol, which handles addressing
- Routing protocols (e.g. RIP, OSPF, BGP), which choose the best path for packets
- Internet Control Message Protocol (ICMP), which handles error reporting and signaling
Datagram Format
- A segment in the transport layer becomes one or more datagrams in the network layer
- First discuss IPv4, with hints about how IPv6 is different
Datagram Format
- The IPv4 datagram header has at least five 4-byte (32-bit) rows, like TCP:
- Version number, header length, type of service, and datagram length in bytes
- Identifier, flags, and fragmentation offset
- Time-to-live, upper-layer protocol, and header checksum
- Source IP address (32 bits)
- Destination IP address (32 bits)
- Then options, followed by the segment data
Datagram Format
- Version number is 4 bits for the IP version
- Header length is 4 bits giving the size of the IP header in 4-byte (32-bit) words: usually 5, i.e. 20 B
- Type of service (TOS) is 8 bits, which allow one to specify different levels of service (real-time or not)
- Datagram length in bytes is the total of the header plus the actual data segment
- It is a 16-bit field, but the typical length is under 1500 B
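A sketch of how the first 20 bytes of an IPv4 header break into the fields listed above, using Python's standard struct and socket modules. The sample header values (addresses, identifier 777, TTL 64) are made up for illustration, and the parser assumes there are no options.

```python
import socket
import struct

# 20-byte IPv4 header without options, in network (big-endian) byte order
IPV4_HEADER = struct.Struct("!BBHHHBBH4s4s")


def parse_ipv4_header(data: bytes) -> dict:
    (ver_ihl, tos, total_len, ident, flags_frag,
     ttl, proto, checksum, src, dst) = IPV4_HEADER.unpack(data[:IPV4_HEADER.size])
    return {
        "version": ver_ihl >> 4,                    # top 4 bits
        "header_len_bytes": (ver_ihl & 0x0F) * 4,   # header length counts 32-bit words
        "tos": tos,
        "total_length": total_len,                  # header + data, in bytes
        "identification": ident,
        "more_fragments": (flags_frag >> 13) & 1,   # lowest of the 3 flag bits
        "fragment_offset": flags_frag & 0x1FFF,     # 13 bits, in 8-byte units
        "ttl": ttl,
        "protocol": proto,                          # 6 = TCP, 17 = UDP
        "checksum": checksum,
        "source": socket.inet_ntoa(src),
        "destination": socket.inet_ntoa(dst),
    }


# Hypothetical header: first fragment of datagram 777, carrying TCP
sample = IPV4_HEADER.pack(
    (4 << 4) | 5, 0, 1500, 777, 1 << 13, 64, 6, 0,
    socket.inet_aton("192.168.1.10"), socket.inet_aton("10.0.0.1"),
)
print(parse_ipv4_header(sample))
```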
Datagram Format
- The identifier, flags, and fragmentation offset all relate to IP fragmentation (breaking a datagram into multiple smaller datagrams)
- Time-to-live (TTL) is a countdown integer, to prevent packets from wandering in the network for 40 years
- It is decremented by one at each router, and the datagram is discarded when it reaches zero
Datagram Format
- Protocol is the transport layer protocol
- Only used when the datagram gets to the destination host
- E.g. 6 = TCP, 17 = UDP; see RFC 3232 for others
- Header checksum: hey, didn't we have a transport checksum?
- Yes, but this only covers the IP header, not the segment data
- And TCP might be run over other network protocols, e.g. our VC buddy, ATM
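The header checksum is the 16-bit ones' complement of the ones' complement sum of the header's 16-bit words. A minimal sketch follows; the sample header bytes are illustrative, not taken from these notes. Because every router decrements the TTL, each router must also recompute this checksum.

```python
def internet_checksum(data: bytes) -> int:
    """16-bit ones' complement sum of 16-bit words, then complemented."""
    if len(data) % 2:
        data += b"\x00"                      # pad odd-length input
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    while total >> 16:                       # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


# 20-byte header with the checksum field (bytes 10-11) zeroed out
header = bytes.fromhex("4500 0073 0000 4000 4011 0000 c0a8 0001 c0a8 00c7")
cs = internet_checksum(header)
print(hex(cs))                               # 0xb861
filled = header[:10] + cs.to_bytes(2, "big") + header[12:]
print(internet_checksum(filled))             # 0 means the header checks out
```

The receiver simply runs the same computation over the header including the stored checksum; a result of 0 means no error was detected.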
Datagram Format
- Source and destination IP addresses: we'll discuss them in more detail soon
- Option fields allow for rarely used functions, but slow IP processing
- Hence IPv6 removes them from its fixed header
- The data in the datagram can be the TCP or UDP segment, or contain other message formats such as ICMP
Fragmentation
- A frame can hold up to the Maximum Transmission Unit (MTU) bytes of data
- But not all link-layer protocols can handle the same size packets
- Ethernet handles up to 1500 B frames
- Some WAN protocols only handle 500 B frames
- Since datagrams get passed from one router to the next without knowing the path ahead, some routers have to break up a datagram
Fragmentation
- An IP datagram can be broken into two or more fragments
- The fragments are expected to be reassembled by the destination host's network layer
- Recurring theme: minimize work done by routers
- Each initial datagram has an identification number, in addition to the source and destination addresses
Fragmentation
- This is the Identification field in the header
- The sending host increments the identification number for each new datagram
- Each fragment keeps the original identification number
- The last fragment has flag = 0; all other fragments with that ID number have flag = 1 (the "more fragments" flag)
- The offset field identifies where the fragment fits in the original datagram: the number of 8-byte chunks from the start
Fragmentation Example, p. 347
- Suppose we have a 4000 B datagram (20 B of header plus 3980 B of segment), but the MTU only allows 1500 B per frame
- Make three fragments (3980 B of data at up to 1480 B of data per fragment, rounded up)
- All fragments have the same identifier (e.g. 777)
- The first two fragments will have 1480 B of data plus a 20 B IP header; the last fragment will have the remaining data (1020 B) plus a 20 B header
- The first two fragments have flag = 1; the last has flag = 0
Fragmentation Example, p. 347
- The offset value is weird: it counts 8-byte chunks
- Offset is 0 for the first fragment (it's the first fragment), 185 8-byte chunks (1480 B) for the second fragment, and 370 8-byte chunks (2960 B) for the third fragment
- Why 8-byte chunks? Offset is a 13-bit field, but the offset in bytes could need 16 bits, hence use 8-byte (2^3) chunks to describe the offset
- This forces the data in every fragment except the last to be a multiple of 8 bytes in size
- Fortunately, IPv6 gets rid of router fragmentation
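A sketch that reproduces the p. 347 example: split a datagram so that each fragment fits the MTU, keep the identifier, count offsets in 8-byte units, and set the flag to 0 only on the last fragment. It assumes a 20 B header with no options; the Fragment class is a hypothetical helper, not a real library type.

```python
from dataclasses import dataclass
from typing import List

IP_HEADER_BYTES = 20  # assumes no IP options


@dataclass
class Fragment:
    identification: int
    total_length: int    # header + data, in bytes
    offset: int          # in 8-byte units
    more_fragments: int  # the flag: 1 = more follow, 0 = last fragment


def fragment(datagram_length: int, mtu: int, identification: int) -> List[Fragment]:
    data_left = datagram_length - IP_HEADER_BYTES
    # Data per fragment must be a multiple of 8 bytes (except in the last fragment)
    max_data = (mtu - IP_HEADER_BYTES) // 8 * 8
    frags: List[Fragment] = []
    sent = 0
    while data_left > 0:
        size = min(max_data, data_left)
        data_left -= size
        frags.append(Fragment(identification, size + IP_HEADER_BYTES,
                              sent // 8, 1 if data_left > 0 else 0))
        sent += size
    return frags


for f in fragment(4000, 1500, identification=777):
    print(f)
# Fragment(identification=777, total_length=1500, offset=0, more_fragments=1)
# Fragment(identification=777, total_length=1500, offset=185, more_fragments=1)
# Fragment(identification=777, total_length=1040, offset=370, more_fragments=0)
```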
Evil Fragmentation
- Fragmentation can be used for attacks
- Jolt2 attack: send a lot of incomplete fragments to a server (e.g. none have zero offset); it'll eventually run out of storage and crash
- Send overlapping fragments to a server; some get confused and crash
IPv4 Addressing
- Recall that hosts have to have interfaces to the network, over which to send datagrams
- Routers need many interfaces, since they are connected to multiple links
- Therefore every IP address is associated with an interface, not a host or router
- IPv4 addresses are 32 bits (4 bytes), written in dotted-decimal notation (byte.byte.byte.byte)
IPv4 Addressing
- Every interface visible to the Internet must have a unique IP address
- Local networks can hide many systems behind one IP address using network address translation (NAT)
- IP addresses are given out as hierarchically as possible, so many local addresses have the same prefix or subnet (leftmost bits in the IP address)
- Subnet = IP network = network in much literature
IPv4 Addressing
- How many bits of the address are used to define the subnet is given as a suffix after a slash, e.g. 213.1.3.0/24 means the leftmost 24 bits of the address define the subnet (a subnet mask of 255.255.255.0)
- Often the links of a router each point to a different subnet, e.g. in Fig 4.15
- Subnets also can be defined for the interfaces between routers
- A subnet is essentially an isolated part of a larger network
Fig 4.15 Subnet example
Pre-CIDR
- Internet domains originally had prefixes of:
- Class A = 8, Class B = 16, or Class C = 24 bits
- Led to lots of wasted address space!
- Class A: 16,777,216 hosts per domain
- Class B: 64k hosts
- Class C: 256 hosts
CIDR
- Now we use Classless Interdomain Routing (CIDR, RFC 1519) to avoid that limitation
- Any subnet of the form a.b.c.d/x can be used
- The leftmost x bits are called the prefix, or network prefix
- Outside of the network (subnet), only the prefix is used for routing
- The rest of the address defines hosts within the network
Image from http://www.naturalandsustainable.com/category/hard-cider/
CIDR
- So if a prefix is of the form a.b.c.d/21:
- 21 bits of the address are the prefix
- The remaining 32 - 21 = 11 bits are unique to each device within that subnet
- Giving you room for 2^11 = 2048 addresses (2046 usable hosts, since the all-zeros and all-ones host values are reserved)
- The a.b.c.d part of the CIDR address can be anything that fits within the prefix length in binary
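Python's standard ipaddress module can check the /21 arithmetic above. The block 200.23.16.0/21 is a hypothetical example; note that hosts() excludes the all-zeros (network) and all-ones (broadcast) host values.

```python
import ipaddress

subnet = ipaddress.ip_network("200.23.16.0/21")   # hypothetical /21 block
host_bits = 32 - subnet.prefixlen

print(subnet.prefixlen, host_bits)       # 21 11
print(subnet.num_addresses)              # 2048
print(subnet.netmask)                    # 255.255.248.0
print(len(list(subnet.hosts())))         # 2046 usable host addresses
print(ipaddress.ip_address("200.23.20.5") in subnet)   # True
print(ipaddress.ip_address("200.23.24.1") in subnet)   # False

# The a.b.c.d part must have zeros in the host bits; strict=False masks them off
print(ipaddress.ip_network("200.23.20.5/21", strict=False))  # 200.23.16.0/21
```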
Broadcast Address
- The IP broadcast address is a special IP address: 255.255.255.255 (all ones, 11111111.11111111.11111111.11111111)
- When the destination address is that value, the message goes to all hosts within the subnet
- Routers usually won't forward these messages, but might
Obtaining IP Addresses
- Typically an ISP gets a block of IP addresses and assigns them to customers
- E.g. the ISP might get 200.23.16.0/20, which it breaks down into smaller subnets for each customer: 200.23.16.0/23 for one, 200.23.18.0/23 for another, etc.
- That way, routers elsewhere know that anything starting with 200.23.16.0/20 goes to that ISP, and the ISP routes it more specifically to each customer, who then routes it to each specific host
Obtaining IP Addresses
The use of a single prefix to advertise multiple subnets is called address or route aggregation, or route summarization
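A sketch of the ISP example using Python's ipaddress module: carve 200.23.16.0/20 into /23 customer subnets, then confirm that the whole set aggregates back to the single /20 prefix that the rest of the Internet needs to know about. Giving every customer a /23 is an assumption that matches the two subnets named in the example.

```python
import ipaddress

isp_block = ipaddress.ip_network("200.23.16.0/20")

# Carve the ISP's /20 into /23 customer subnets
customers = list(isp_block.subnets(new_prefix=23))
for c in customers:
    print(c)   # 200.23.16.0/23 through 200.23.30.0/23, one per line

# Every customer subnet falls inside the ISP's advertised prefix...
print(all(c.subnet_of(isp_block) for c in customers))      # True
# ...so the rest of the Internet needs only one aggregated entry
print(list(ipaddress.collapse_addresses(customers)))       # [IPv4Network('200.23.16.0/20')]
```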
Managing IP Addresses
- While ideally it would be nice to have a unique subnet for everything, in reality it gets messier: many ISPs might have several subnet ranges assigned to them
- ICANN manages IP addresses, based on RFC 2050, as well as managing domain names
Getting a Host IP Address
- An organization assigns host addresses within its subnet
- Routers have IP addresses manually assigned
- Hosts can be manually assigned, but usually use Dynamic Host Configuration Protocol (DHCP)
- DHCP sets the host IP address and subnet mask, and tells the host its first-hop router (default gateway) and local DNS server
- DHCP is often known as a plug-and-play protocol, because it makes network admin much easier!
DHCP
- For example, an ISP can use DHCP to assign IP addresses to dial-up customers
- It needs fewer IP addresses than it has customers, since not all will be online at once
- It needs to manage which IP addresses are in use and which are available to be assigned
- DHCP is also handy for mobile clients, such as connecting to Dragonfly
DHCP
- Dynamic Host Configuration Protocol (DHCP) makes our lives much easier
- DHCP is client/server based
- There must be at least one DHCP server to tell everyone else what their IP addresses are
- A router can act as a DHCP relay agent, so that multiple subnets can share one DHCP server
DHCP
- A new host on a subnet follows a four-step process to get an address
- DHCP server discovery: the host sends a DHCP discover message (using UDP, port 67) to the broadcast IP address 255.255.255.255, with a source IP address of all zeros (0.0.0.0)
- A relay agent will pass the message to the server
- DHCP server offer(s): each DHCP server responds with a DHCP offer message, including the offered IP address, network mask, address lease time, etc.
- Many offers can be received by a host
DHCP
- DHCP request: the new host (client) chooses from the offers, selects one, and sends a DHCP request message to that server
- DHCP ACK: the server responds with a DHCP ACK message, confirming the requested parameters
- Once the client is connected with its assigned IP address, the lease can be renewed
- One minor drawback is that an IP address can't be kept between subnets, which is bad for mobile clients
Network Address Translation
- Network Address Translation (NAT) allows local networks to define IP addresses that are invisible to the outside world
- The NAT router looks like a device with one IP address to the outside world, but usually uses DHCP to assign IP addresses from private networks to local devices
- It doesn't have to use private networks; you could use publicly visible IP addresses
Private networks
- NAT typically uses prefixes reserved for private networks, per RFC 1918
- The Internet Assigned Numbers Authority (IANA) has reserved the following three blocks of the IP address space for private internets:
- 10.0.0.0/8
- 172.16.0.0/12
- 192.168.0.0/16
Network Address Translation
Network Address Translation
- The NAT router keeps a translation table pairing:
- The router's public-side IP address and port number
- The local host's private IP address AND port number
- Hence NAT has to change the addressing of every datagram going into or out of the network!
- Some purists object to this, because it interferes with host-to-host communication
- Need workarounds for P2P applications
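A minimal sketch of the translation table as a pair of dictionaries. The NatTable class, the public address 138.76.29.7, and the port numbering starting at 5001 are all hypothetical; a real NAT also tracks the protocol and the remote endpoint, and expires idle entries.

```python
import itertools
from typing import Dict, Tuple

Endpoint = Tuple[str, int]  # (IP address, port number)


class NatTable:
    """Hypothetical NAT translation table (not a real library class)."""

    def __init__(self, public_ip: str, first_port: int = 5001) -> None:
        self.public_ip = public_ip
        self._ports = itertools.count(first_port)      # next free public-side port
        self.out_map: Dict[Endpoint, Endpoint] = {}    # private -> public
        self.in_map: Dict[Endpoint, Endpoint] = {}     # public -> private

    def outgoing(self, private_src: Endpoint) -> Endpoint:
        """Rewrite the source of a datagram leaving the local network."""
        if private_src not in self.out_map:
            public = (self.public_ip, next(self._ports))
            self.out_map[private_src] = public
            self.in_map[public] = private_src
        return self.out_map[private_src]

    def incoming(self, public_dst: Endpoint) -> Endpoint:
        """Rewrite the destination of a datagram arriving from outside.
        A KeyError means no entry exists, so a real NAT would drop the datagram."""
        return self.in_map[public_dst]


nat = NatTable("138.76.29.7")                  # hypothetical public address
print(nat.outgoing(("10.0.0.1", 3345)))        # ('138.76.29.7', 5001)
print(nat.outgoing(("10.0.0.2", 3345)))        # ('138.76.29.7', 5002)
print(nat.incoming(("138.76.29.7", 5001)))     # ('10.0.0.1', 3345)
```

Unsolicited inbound traffic has no table entry, which is exactly why P2P applications need workarounds such as the one on the next slide.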
UPnP
- Peer-to-peer applications need an easy way to cross a NAT router (NAT traversal)
- Universal Plug and Play (UPnP) does that, for either TCP or UDP packets
ICMP
- ICMP is an old (1981) protocol (RFC 792) to communicate error messages across the network layer
- E.g. "Destination network unreachable"
- ICMP is a nudge above IP, since ICMP messages are carried directly in IP datagrams rather than in a TCP or UDP segment
- ICMP messages have a type and code field (p. 364), plus the header and first 8 bytes of the offending IP datagram
ICMP Ping
- ICMP messages also convey other kinds of information, such as congestion control, bad IP header data, TTL expired, etc.
- Ping uses an ICMP message of type 8, code 0, which is an echo request
- The reply should be type 0, code 0, echo reply
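A sketch of what ping puts on the wire: an ICMP type 8, code 0 header (type, code, checksum, identifier, sequence number) followed by optional data. The identifier, sequence number, and payload values are arbitrary; the sketch only builds the bytes, since actually sending them needs a raw socket and usually administrator privileges. It carries its own small checksum helper so it runs on its own.

```python
import struct

ICMP_ECHO_REQUEST = 8   # type 8, code 0
ICMP_ECHO_REPLY = 0     # type 0, code 0


def internet_checksum(data: bytes) -> int:
    """16-bit ones' complement checksum, as used by IP and ICMP."""
    if len(data) % 2:
        data += b"\x00"
    total = sum((data[i] << 8) | data[i + 1] for i in range(0, len(data), 2))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def build_echo_request(identifier: int, sequence: int, payload: bytes = b"") -> bytes:
    """Build the bytes of an ICMP echo request (this sketch does not send it)."""
    # Header: type, code, checksum (0 while computing), identifier, sequence number
    header = struct.pack("!BBHHH", ICMP_ECHO_REQUEST, 0, 0, identifier, sequence)
    checksum = internet_checksum(header + payload)
    header = struct.pack("!BBHHH", ICMP_ECHO_REQUEST, 0, checksum, identifier, sequence)
    return header + payload


def is_echo_reply(message: bytes) -> bool:
    return message[0] == ICMP_ECHO_REPLY and message[1] == 0


msg = build_echo_request(identifier=1, sequence=1, payload=b"ping!")
print(msg.hex())
```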
Traceroute
- Traceroute sends UDP segments with unlikely port numbers and successive TTLs (1, then 2, then 3, etc.), and times each datagram
- When each TTL expires, that router sends an ICMP warning message back to the source, which gives the round-trip time (RTT) and the router's information
Traceroute
- When a datagram gets to the other host, the UDP segment has a weird port number, which prompts an ICMP message of type 3, code 3: destination port unreachable
- That tells traceroute the other host has been reached, so no more datagrams are needed
- Sneaky!
ICMP and Firewalls
- Firewalls typically inspect the headers of packets to look for threatening contents
- Pings coming from outside your network can map IP addresses, for example
- Port scans can look for open ports
- An Intrusion Detection System (IDS) goes further by looking at packet contents (data) and comparing them to known attacks
IPv6
- The IETF realized that the Internet would run out of IP address space, and CIDR, NAT, and DHCP aren't enough to save it
- By 1996, 100% of Class A addresses were used, 62% of Class B addresses, and 37% of Class C
- IPv6 was first called IPng (next generation)
- IPv6 is defined by RFC 2460
- What's different from IPv4?
IPv6 Datagram
- The IP addresses went from 32 to 128 bits
- 2^128 = 340,282,366,920,938,463,463,374,607,431,768,211,456
- Really, we won't run out of IP addresses. Ever.
- In contrast, the number of cells in 6 billion people is about 6E9 × 1E12 = 6E21, a factor of 56 million billion under the 3.4E38 possible addresses
IPv6 Datagram
- Adds an anycast address type, which can go to any one of a group of hosts
- Header is a fixed 40 bytes (2 × 4 B + 2 × 16 B)
- Adds flow labeling and priority, where a flow is a group of packets requiring special handling (real-time service, or paid priority enhancement)
IPv6 Datagram
- IPv6 addresses can be written in a 16-value dotted-decimal notation, e.g. 128.91.45.157.220.40.0.0.0.0.252.87.212.200.31.255, or the standard hex equivalent of eight 16-bit groups separated by colons, 805B:2D9D:DC28:0000:0000:FC57:D4C8:1FFF
- There are lots of rules for abbreviating IPv6 addresses; the most common is "::", which hides one run of all-zero groups
- Removes from IPv4:
- Fragmentation, Header checksum, and Options
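Python's standard ipaddress module applies the "::" abbreviation rule and converts between the notations above; plain integer arithmetic confirms the size of the address space.

```python
import ipaddress

addr = ipaddress.IPv6Address("805B:2D9D:DC28:0000:0000:FC57:D4C8:1FFF")
print(addr.compressed)   # 805b:2d9d:dc28::fc57:d4c8:1fff  ("::" hides the zero groups)
print(addr.exploded)     # 805b:2d9d:dc28:0000:0000:fc57:d4c8:1fff
print(".".join(str(b) for b in addr.packed))  # 128.91.45.157.220.40.0.0.0.0.252.87.212.200.31.255
print(2 ** 128)          # 340282366920938463463374607431768211456
```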
IPv6 Datagram
- Specifically, IPv6 headers have the following fields:
- IP version, now obviously a 6
- Traffic class, similar to the TOS field
- Flow label, an identifier for a given flow
- Payload length: the number of bytes in the data
- Does not count the header, since that's a fixed 40 B
- Next header is like the protocol field from IPv4
- Hop limit acts like the time-to-live (TTL) field
- Source and destination addresses, 128 bits each
- Then the data
ICMPv6
- ICMP has been updated with new messages for IPv6
- It also takes over the Internet Group Management Protocol (IGMP), which we'll get to later; it involves joining and leaving multicast groups
IPv4 versus IPv6
- The transition from IPv4 to IPv6 is huge: tens of millions of hosts and routers only speak IPv4
- Three major approaches for making the transition to v6:
- Flag day approach
- Have everyone (in the whole world) update to v6 by a given specific day, and only run v6 after that day
- Isn't logistically or financially possible
IPv4 versus IPv6
- The dual-stack approach means implementing v4 and v6 at the same time, and switching back and forth as needed
- Every v6 node also runs v4; this is called an IPv6/IPv4 node
- Works, but often loses the benefits of v6, since v6-only header information is lost whenever a datagram must be converted to v4
- Tunneling is also possible
- Wherever a section of IPv4 links needs to be crossed, package the IPv6 datagram inside an IPv4 datagram
- Then unwrap the v6 datagram when back in v6 land
IPv6 Adoption
- The adoption of IPv6 has been slow, partly because of CIDR, NAT, and DHCP
- However, large-scale technology changes typically take a long time
- How many phone lines are optical yet?
- Network protocols are very slow to change, whereas apps are easy to change
- IPv6 will probably be around a long time!
IP Security
- IPv4 was designed in the 1970s, long before anyone expected the Internet to be a public medium, and hence it has no security built in
- IPsec was created to work with IPv4 or IPv6 and add security to the network layer
- It allows TCP and UDP traffic to take place in a secure environment
IP Security
- IPsec:
- Allows hosts to negotiate encryption protocols
- Uses that protocol to encrypt each datagram
- Verifies that the header and data retain their integrity
- Authenticates the origin as a trusted source
- This is covered more in Chapter 8
Routing Algorithms
- We have mostly focused on forwarding; now we address routing
- Both datagram and VC networks need to perform routing, i.e. find good paths between sender and receiver
- A host is typically attached to its default router (first hop), which we'll call the source router; similarly, the destination has a destination router
Routing Algorithms
- A good route typically minimizes cost, but may also take other concerns into account (e.g. ownership of networks, privacy of data, etc.)
- Use a graph to show routing problems, with N nodes (routers) and E edges (links)
- Assume the cost of each edge is given: c(x,y) is the cost of the edge (x,y) between nodes x and y
Routing Algorithms
- If there is no edge between two nodes, its cost is infinite
- A path is defined by a sequence of nodes $(x_1, x_2, x_3, \ldots, x_n)$, where each consecutive pair is joined by an edge
- The cost of a path is the sum of the edge costs along it: $c(x_1,x_2) + c(x_2,x_3) + \cdots + c(x_{n-1},x_n)$
- Among all paths between nodes x and y, at least one is the least-cost path
- If all edges have the same cost, the least-cost path is also the shortest path
Routing Algorithms
- One key way to classify routing algorithms is global vs. decentralized:
- A global routing algorithm uses knowledge of the entire network to calculate the best path
- Also called link-state (LS) algorithms
- A decentralized routing algorithm finds the least-cost path in an iterative, decentralized manner; no node has complete knowledge of the network
- Only the local costs are known
- The distance-vector (DV) algorithm is one example
Routing Algorithms
- Another way to classify routing algorithms is static vs. dynamic
- Static routing algorithms change slowly over time, often by human intervention
- Dynamic routing algorithms change to adjust for traffic, topology, etc.
- Can update periodically, or in response to network changes
Routing Algorithms
- A third classification (!) is load-sensitive versus load-insensitive algorithms
- Does congestion change the routing?
- Load-sensitive routing gives a congested link a high cost, but most Internet routing algorithms are load-insensitive
- So we have global vs. decentralized, static vs. dynamic, and load-sensitive vs. load-insensitive
Link-State Routing Algorithm
- The LS algorithm uses complete knowledge of network topology and link costs
- The identity and cost of each router's links are broadcast using a link-state broadcast, such as the Internet's OSPF protocol
- The actual routing is calculated using Dijkstra's algorithm (named for Edsger Dijkstra)
Link-State Routing Algorithm
- Dijkstra's algorithm is iterative, so that after k iterations the least-cost paths are known to k destination nodes
- The global routing algorithm initializes all nodes, then loops as many times as there are nodes in the network
- Each loop adds the lowest-cost remaining node to N', the list of nodes no longer under consideration, until all nodes are in N'
Dijkstra's Algorithm
Dijkstra's Algorithm
- For example, the algorithm finds the cost to get from u to w is first 5 (path u-w), then 4 (u-x-w), then 3 (u-x-y-w), and it can't improve on the cost of 3
- When done, we have the lowest-cost path from the source to all other nodes
- The complexity of this algorithm comes from the need to search n(n+1)/2 nodes in total, which is O(n^2) (the order of n squared)
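A sketch of the O(n^2) version of Dijkstra's algorithm described above, in Python. The six-node graph and its edge costs are an assumption, chosen so that the estimate for w drops from 5 to 4 to 3 exactly as in the u-to-w example; the done set plays the role of N'.

```python
import math
from typing import Dict, List, Tuple

Graph = Dict[str, Dict[str, float]]   # graph[a][b] = c(a, b)


def dijkstra(graph: Graph, source: str) -> Tuple[Dict[str, float], Dict[str, str]]:
    dist = {node: math.inf for node in graph}  # current least-cost estimate per node
    prev: Dict[str, str] = {}                  # predecessor on that least-cost path
    dist[source] = 0
    done = set()                               # N': nodes whose least cost is final
    while len(done) < len(graph):
        # O(n) scan for the cheapest node not yet in N', giving O(n^2) overall
        w = min((n for n in graph if n not in done), key=lambda n: dist[n])
        done.add(w)
        for v, cost in graph[w].items():
            if v not in done and dist[w] + cost < dist[v]:
                dist[v] = dist[w] + cost
                prev[v] = w
    return dist, prev


def path_to(prev: Dict[str, str], source: str, target: str) -> List[str]:
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    return path[::-1]


# Assumed six-router network (undirected edges with costs)
EDGES = [("u", "v", 2), ("u", "x", 1), ("u", "w", 5), ("v", "x", 2), ("v", "w", 3),
         ("x", "w", 3), ("x", "y", 1), ("w", "y", 1), ("w", "z", 5), ("y", "z", 2)]
graph: Graph = {}
for a, b, c in EDGES:
    graph.setdefault(a, {})[b] = c
    graph.setdefault(b, {})[a] = c

dist, prev = dijkstra(graph, "u")
print(dist)                      # {'u': 0, 'v': 2, 'x': 1, 'w': 3, 'y': 2, 'z': 4}
print(path_to(prev, "u", "w"))   # ['u', 'x', 'y', 'w']
```

Production implementations usually replace the linear scan with a priority queue, but the scan matches the n(n+1)/2 counting argument above.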
Oscillations
- If the cost of a link depends on the direction of travel through it (e.g. because it reflects the traffic carried), algorithms can undergo oscillations where the best path changes from clockwise to counter-clockwise with each iteration
- To avoid this, don't run the algorithm on all nodes at the same time
- Or don't use load-based link costs
Distance-Vector (DV) Routing
- The distance-vector routing algorithm is iterative, asynchronous, and distributed
- Nodes get data from directly attached neighbors and distribute the results to those neighbors
- Assume we're going from node x to node y, and the neighbors of x are nodes v
- The Bellman-Ford equation gives us:
- $d_x(y) = \min_v \{ c(x,v) + d_v(y) \}$
Distance-Vector Routing
- Say what?
- Start at node x
- For each neighbor v, find the cost to get from v to y, which is $d_v(y)$
- The cost of reaching y through each neighbor v is the cost from x to v plus the cost from v to y, or $c(x,v) + d_v(y)$
- The cheapest cost from x to y is the smallest value of the previous bullet over all neighbors of x
Distance-Vector Routing
- Cute parlor trick?
- Actually, this is the basis for forwarding tables!
- For some destination y, the lowest-cost path goes through a particular neighbor v, which becomes the next hop toward y
- The DV algorithm essentially follows the Bellman-Ford equation
- As each node gets cost data from its neighbors, its estimate of the cost to get anywhere in the network approaches the ideal value $d_x(y)$
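A sketch of the Bellman-Ford update in Python. Assumption: this simulates all nodes in one loop, updating estimates in place until none change (quiescence), rather than exchanging asynchronous messages the way real routers do. The three-node network and its costs are hypothetical.

```python
import math
from typing import Dict

Graph = Dict[str, Dict[str, float]]   # graph[x][v] = c(x, v) for each neighbor v


def distance_vectors(graph: Graph) -> Dict[str, Dict[str, float]]:
    nodes = list(graph)
    # Initially each node only knows the cost to its directly attached neighbors
    d = {x: {y: 0 if x == y else graph[x].get(y, math.inf) for y in nodes}
         for x in nodes}
    changed = True
    while changed:                     # repeat until no estimate changes
        changed = False
        for x in nodes:
            for y in nodes:
                if x == y:
                    continue
                # Bellman-Ford: d_x(y) = min over neighbors v of c(x, v) + d_v(y)
                best = min((c + d[v][y] for v, c in graph[x].items()), default=math.inf)
                if best < d[x][y]:
                    d[x][y] = best
                    changed = True
    return d


# Hypothetical three-router network: c(x,y) = 2, c(y,z) = 1, c(x,z) = 7
graph: Graph = {
    "x": {"y": 2, "z": 7},
    "y": {"x": 2, "z": 1},
    "z": {"x": 7, "y": 1},
}
tables = distance_vectors(graph)
print(tables["x"])   # {'x': 0, 'y': 2, 'z': 3}
```

Here x learns that reaching z through y (2 + 1 = 3) beats its direct link (7), so y becomes x's next hop toward z.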
Distance-Vector Routing
- This depends on asynchronous data exchange among nodes
- After all nodes have exchanged information, the routing won't change (it becomes quiescent) until there's a change in a link cost or a dead link
- Many protocols use some variation on this approach, including ARPAnet, the Internet's RIP and BGP protocols, Novell IPX, ISO IDRP, etc.
DV Changes
- If the cost of a link decreases, updates to its neighbors will generally propagate peacefully
- If a cost goes up, leftover incorrect information can cause a routing loop (bouncing back and forth between nodes)
- Large cost increases can result in thousands of bounces before the problem corrects itself, hence this is known as the count-to-infinity problem
DV Changes
- Fix this somewhat with poisoned reverse
- If a node routes through a neighbor to reach a destination, it tells that neighbor that its own cost to that destination is infinite, so the neighbor won't try to bounce back through it
- But if the loop involves more than two nodes, this doesn't help
Compare LS vs. DV Routing
- Under LS, nodes talk to all other nodes, but only exchange the costs of their direct connections
- Under DV, nodes only talk to neighbors, but give cost estimates to all other nodes
- Message complexity:
- LS sends cost changes to every node in the network; DV only propagates a change when it alters the least-cost path of some node
Compare LS vs. DV Routing
- Speed of convergence:
- LS is an O(n^2) algorithm; DV converges slowly, and can suffer from routing loops and the count-to-infinity problem
- Robustness:
- If a node fails under LS, the rest of the network is relatively unaffected (for routing); under DV, a faulty router can mislead the rest of the network
- So both approaches have advantages
105Other Routing Approaches
- LS and DV are essentially the only routing approaches widely used in the Internet
- Many others have been defined over the years
- Network flow approaches model the network as one big optimization problem to solve
- Circuit-switched routing algorithms use telephone-network logic to find the cheapest routes
106Hierarchical Routing
- LS and DV assume the network is a flat herd of connected routers, all peers or equals
- Scaling LS routing to a huge number of routers is daunting
- Most administrators want autonomy to decide their own network's structure
- What happens if there's structure to routers?
- Organize routers into autonomous systems (AS)
107Autonomous Systems (AS)
- Under AS, groups of routers
- Are under the control of one administrative authority
- Use one routing protocol (LS or DV) within that group, their intra-autonomous system routing protocol
- Connect to other groups via gateway routers
- Routing information separates routing within the AS from routing outside the AS
- Need to know which outside addresses are best reached through which gateway routers
108Autonomous Systems (AS)
109Autonomous Systems (AS)
- In order for AS to talk to each other, they need to use the same inter-AS routing protocol, which for the Internet is BGP4
- BGP4 determines which subnets are reachable through which gateway routers (assuming more than one exists)
- One common strategy is hot-potato routing: when several gateways can reach a destination, send the packet to the gateway with the least intra-AS cost from the current router
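Hot-potato routing reduces to picking the minimum intra-AS cost among the gateways that can reach the destination, as in this Python sketch. The gateway names, the costs, and the tie-breaking rule are hypothetical.

```python
def hot_potato(intra_as_cost, gateways):
    """Choose the gateway with the least intra-AS cost from this router.

    intra_as_cost: {gateway: cost from this router, e.g. from RIP or OSPF}
    gateways:      gateways that BGP says can reach the destination prefix
    Ties are broken by gateway name (an arbitrary, hypothetical rule).
    """
    return min(gateways, key=lambda g: (intra_as_cost[g], g))


# Hypothetical AS: router 1d can leave via gateway 1b or 1c
intra_as_cost = {"1b": 3, "1c": 2}
forwarding = {"138.67.16.0/24": hot_potato(intra_as_cost, ["1b", "1c"])}
print(forwarding)  # {'138.67.16.0/24': '1c'}
```

The router gets the packet out of its own AS as cheaply as possible and ignores what the path costs once it leaves, which is where the name comes from.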
110Autonomous Systems (AS)
- AS tell each other about newly reachable destinations nearby
- Large ISPs may set up dozens of AS just for themselves; smaller ISPs might be one AS
- Now look at two intra-AS routing protocols (RIP and OSPF) and the inter-AS routing protocol BGP
111RIP
- The Routing Information Protocol (RIP) is an older intra-AS routing protocol
- Based on work by Xerox and included in the BSD Unix distribution in 1982
- RIP version 2 is defined by RFC 2453
- Works based on the DV model
- Cost is based on hop count: each link has cost 1
- A hop count is the number of subnets crossed to get from the source router to the destination subnet, including the destination subnet
112RIP
- Max cost allowed in RIP is 15 hops; a cost of 16 means unreachable
- Routing updates are sent every 30 sec using RIP response messages, also called advertisements
- Each RIP router maintains a routing table
- The routing table contains the destination subnet, the next router to get there, and the number of hops to that destination
- Exchanging routing tables allows routers to find the cheapest routes
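Here is a rough Python sketch of how a router might merge one neighbor's advertisement into its table, loosely following the usual RIP rules (add one hop, cap at 16, and accept any change reported by the current next hop; see RFC 2453 for the real details). The table contents and the function name are hypothetical.

```python
INFINITY = 16  # RIP treats a cost of 16 hops as unreachable


def process_advertisement(table, neighbor, advertisement):
    """Merge one neighbor's RIP-style advertisement into our routing table.

    table:         {dest_subnet: (next_router, hops)}, updated in place
    advertisement: {dest_subnet: hops from the neighbor to that subnet}
    Returns True if any entry changed.
    """
    changed = False
    for dest, adv_hops in advertisement.items():
        hops = min(adv_hops + 1, INFINITY)  # one extra hop to reach the neighbor
        current = table.get(dest)
        if current is None:
            if hops < INFINITY:  # only learn reachable subnets
                table[dest] = (neighbor, hops)
                changed = True
        elif hops < current[1] or (current[0] == neighbor and hops != current[1]):
            # A shorter path, or our current next hop reports a new cost
            table[dest] = (neighbor, hops)
            changed = True
    return changed


# Hypothetical table at router D and an advertisement from neighbor A
table = {"w": ("A", 2), "y": ("B", 2), "z": ("B", 7)}
process_advertisement(table, "A", {"w": 1, "z": 4, "x": 15})
print(table)  # {'w': ('A', 2), 'y': ('B', 2), 'z': ('A', 5)}
```

Subnet z is now reached via A in 5 hops instead of via B in 7, while x is ignored because 15 + 1 hops is RIP's "infinity".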
113RIP
- If a neighboring router doesn't provide an update for three minutes, it's assumed to be dead (rest in peace?), and the routing table is adjusted accordingly
- RIP messages go over UDP using port 520
- In Unix, the daemon routed ("route-dee") implements RIP
114OSPF (think sunscreen?)
- OSPF and its cousin IS-IS are widely used for intra-AS routing
- OSPF version 2 is defined by RFC 2328
- IS-IS (for IP) is defined by RFC 1195
- OSPF uses LS routing, and creates a complete topological map of the entire AS
- Then each router runs Dijkstra's algorithm to find the shortest paths from itself to everywhere in the AS
OSPF = Open Shortest Path First; IS = Intermediate System
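The Dijkstra step can be sketched in Python with a binary heap. This assumes the router already has the full link-state map as a dictionary; the six-router topology and its weights are hypothetical. Besides the distance, the sketch records the first hop, which is what actually goes into the forwarding table.

```python
import heapq

# Hypothetical 6-router AS; link costs are the OSPF weights
GRAPH = {
    "u": {"v": 2, "x": 1, "w": 5},
    "v": {"u": 2, "x": 2, "w": 3},
    "w": {"u": 5, "v": 3, "x": 3, "y": 1, "z": 5},
    "x": {"u": 1, "v": 2, "w": 3, "y": 1},
    "y": {"x": 1, "w": 1, "z": 2},
    "z": {"w": 5, "y": 2},
}


def dijkstra(graph, source):
    """Least-cost distance and first hop from source to every router."""
    dist, first_hop = {source: 0}, {source: None}
    heap, done = [(0, source)], set()
    while heap:
        d, node = heapq.heappop(heap)
        if node in done:
            continue  # stale heap entry
        done.add(node)
        for nbr, cost in graph[node].items():
            if nbr in done:
                continue
            if nbr not in dist or d + cost < dist[nbr]:
                dist[nbr] = d + cost
                # Leaving the source, the first hop is the neighbor itself;
                # otherwise it is inherited from the node we came through
                first_hop[nbr] = nbr if node == source else first_hop[node]
                heapq.heappush(heap, (dist[nbr], nbr))
    return dist, first_hop


dist, first_hop = dijkstra(GRAPH, "u")
print(dist["w"], first_hop["w"])  # 3 x
```

Router u reaches w more cheaply via x and y (1 + 1 + 1) than over its direct link of cost 5, so packets for w leave u toward x.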
115OSPF
- Link cost can be 1 (just count hops) or weighted inversely to the link's capacity (to steer more traffic onto links that can handle it)
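One common way to weight costs inversely to capacity is to divide a reference bandwidth by the link speed. The Python sketch below assumes a hypothetical 10 Gb/s reference value; OSPF itself leaves the choice of costs to the administrator, so this formula is an assumption, not part of the protocol.

```python
REFERENCE_BPS = 10_000_000_000  # hypothetical reference bandwidth: 10 Gb/s


def capacity_cost(link_bps, reference_bps=REFERENCE_BPS):
    """Integer link cost inversely proportional to capacity (minimum 1)."""
    return max(1, reference_bps // link_bps)


def path_cost(link_speeds, weighted):
    # Hop counting gives every link cost 1; weighting favors fast links
    return sum(capacity_cost(s) if weighted else 1 for s in link_speeds)


one_slow_hop = [100_000_000]          # a single 100 Mb/s link
two_fast_hops = [1_000_000_000] * 2   # two 1 Gb/s links
print(path_cost(one_slow_hop, False), path_cost(two_fast_hops, False))  # 1 2
print(path_cost(one_slow_hop, True), path_cost(two_fast_hops, True))    # 100 20
```

Pure hop counting picks the slow one-hop path, while capacity weighting prefers the two fast hops.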
116OSPF
- All routers in the AS broadcast link-state information to all other routers
- 1) when there's a change in link cost or status, or
- 2) at least every 30 minutes, even if nothing changed, to say they're alive
- OSPF messages are carried directly over IP (IP protocol number 89), not over TCP or UDP
117OSPF
- OSPF advantages include
- Security: exchanges between OSPF routers can be authenticated, either by simple password or MD5 authentication
- Multiple same-cost paths can be used at once
- Also handles multicast (MOSPF)
- Allows creation of hierarchy within the AS
- Defines areas, which connect to the AS boundary routers through area border routers and, possibly, other backbone routers
118OSPF Internal Hierarchy
119BGP
- So, RIP or OSPF can be used for routing within an AS
- But when the path between source and destination hosts crosses multiple AS, we need BGP, the Border Gateway Protocol (currently BGP4)
- BGP gives AS the means to
- Get subnet reachability info from neighboring AS
- Propagate that info to all routers within the AS
- Find good routes to subnets, based on reachability info and policy
120BGP
- BGP is massively complex
- BGP uses semi-permanent TCP connections (on port 179) between routers that connect AS, and between routers within an AS
- Sessions between AS use external BGP (eBGP)
- Sessions within an AS use internal BGP (iBGP)
121BGP
- Which destinations are reachable through a neighboring AS is expressed using CIDR prefixes, e.g. 138.67.16/24
- Each AS is identified by an ASN (AS number)
- ASNs are assigned through ICANN and the regional registries; RFC 1930 gives guidelines for creating and registering an AS
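Python's standard `ipaddress` module can show what a CIDR prefix like the one above covers. This sketch only writes the slide's shorthand out in full form; the sample addresses and the /16 aggregate are hypothetical.

```python
import ipaddress

# The slide's shorthand 138.67.16/24 is the network 138.67.16.0/24
advertised = ipaddress.ip_network("138.67.16.0/24")

print(advertised.num_addresses)                              # 256
print(ipaddress.ip_address("138.67.16.200") in advertised)   # True
print(ipaddress.ip_address("138.67.17.1") in advertised)     # False
# A /24 lies inside the shorter /16 prefix that might aggregate it
print(advertised.subnet_of(ipaddress.ip_network("138.67.0.0/16")))  # True
```

One advertised prefix thus stands for all 256 addresses, which is why BGP can describe reachability compactly.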
122BGP
- BGP peers (routers) advertise routes to each other
- A route consists of a prefix plus BGP attributes (e.g. AS-PATH, NEXT-HOP)
- BGP learns all possible routes, then follows a set of rules to determine which to keep
- Policies determine which routes are allowed, not just which are possible
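Here is a heavily simplified Python sketch of "policy first, then rules": drop routes the import policy forbids, then prefer the highest local preference, the shortest AS-PATH, and the closest next hop. Real BGP has more steps and many vendor-specific details; the `Route` class, the blocked-AS policy, and all ASNs (from the private range) are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    prefix: str
    as_path: tuple         # ASNs the advertisement passed through
    next_hop: str
    local_pref: int = 100  # set by local policy; higher is preferred
    igp_cost: int = 0      # intra-AS cost to reach next_hop


def select_route(routes, blocked_asns=frozenset()):
    """Apply an import policy, then a simplified BGP-style preference order."""
    # Policy: refuse any route whose AS-PATH crosses a blocked AS
    allowed = [r for r in routes if not set(r.as_path) & set(blocked_asns)]
    if not allowed:
        return None
    # Highest local preference, then shortest AS-PATH, then closest next hop
    return min(allowed, key=lambda r: (-r.local_pref, len(r.as_path), r.igp_cost))


# Hypothetical routes to one prefix, using private-range ASNs
routes = [
    Route("138.67.16.0/24", (65001, 65003), "10.0.0.1", igp_cost=5),
    Route("138.67.16.0/24", (65002, 65004, 65003), "10.0.0.2", igp_cost=1),
    Route("138.67.16.0/24", (65005, 65003), "10.0.0.3", igp_cost=2),
]
print(select_route(routes).next_hop)                        # 10.0.0.3
print(select_route(routes, blocked_asns={65005}).next_hop)  # 10.0.0.1
```

The second call shows policy at work: the technically best route is still refused because it crosses an AS this network won't use.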
123Broadcast and Multicast
- So far everything has focused on one source and one destination trying to communicate (unicast)
- Broadcast routing sends a packet from a source to all other nodes in the network
- Multicast routing sends from a source node to a selected subset of other network nodes
124Broadcast Routing
- A simple way to handle broadcasting is to make N copies of a packet, and send one to each of the N destination nodes (hosts)
- This is called N-way unicast, since it really isn't a broadcast method at all
- Major disadvantages of this simple approach
- It's really inefficient: all N copies cross the source's first link, overloading it
- It's hard to know all target addresses, unless you add on a broadcast membership protocol
125Uncontrolled Flooding
- A possible approach is for the source to send a packet to its neighbors, who send it to their neighbors, etc.
- Massive problems include
- The cycle never ends if there are loops in the network
- Multiple interconnections result in a broadcast storm: a node gets e.g. three copies of a message and broadcasts each to all its neighbors, who in turn get multiple copies, and so on
126Controlled Flooding
- Try flooding, but with more logic to prevent a broadcast storm
- Several possible approaches
- In sequence-number-controlled flooding, the source puts its address and a broadcast sequence number in the packet
- Each node checks whether it has already received this source and sequence number pair (e.g. broadcast 1254); if not, it duplicates the packet and sends it to its neighbors, otherwise it drops it
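The rule above can be sketched in Python as a queue of in-flight packets plus a per-node record of (source, sequence number) pairs already handled. The four-node topology and the function name are hypothetical; note that the loops in the graph no longer cause endless forwarding.

```python
from collections import deque


def sequence_controlled_flood(graph, source, seq):
    """Flood one broadcast; each node forwards a given (source, seq) only once.

    graph: {node: [neighbors]}. Returns (packet transmissions, nodes reached).
    """
    seen = {node: set() for node in graph}  # (source, seq) pairs each node has handled
    seen[source].add((source, seq))
    queue = deque((nbr, source) for nbr in graph[source])
    transmissions = len(graph[source])
    while queue:
        node, came_from = queue.popleft()
        if (source, seq) in seen[node]:
            continue  # already broadcast this one: drop the duplicate
        seen[node].add((source, seq))
        for nbr in graph[node]:
            if nbr != came_from:  # don't send it back where it came from
                queue.append((nbr, node))
                transmissions += 1
    reached = {node for node in graph if (source, seq) in seen[node]}
    return transmissions, reached


# Hypothetical topology with loops (A-B-C and B-C-D triangles)
GRAPH = {"A": ["B", "C"], "B": ["A", "C", "D"], "C": ["A", "B", "D"], "D": ["B", "C"]}
transmissions, reached = sequence_controlled_flood(GRAPH, "A", 1254)
print(transmissions, sorted(reached))  # 7 ['A', 'B', 'C', 'D']
```

Every node is reached and the flood stops, though 7 transmissions are used where 3 would suffice.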
127Controlled Flooding
- Reverse path forwarding (RPF), also called reverse path broadcasting (RPB), is subtle
- When a packet is received, send it out on all other links ONLY IF it arrived on the link that is on the router's own shortest unicast path back to the source
- Otherwise, throw it out
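A Python sketch of RPF, assuming unit link costs so that each node's next hop back toward the source is simply its breadth-first-search parent (ties broken by BFS order, a hypothetical choice). Unlike sequence-number flooding, no node has to remember which broadcasts it has seen.

```python
from collections import deque


def rpf_broadcast(graph, source):
    """Reverse path forwarding, assuming unit link costs.

    A node forwards a broadcast packet only if it arrived from the node's
    next hop back toward the source; otherwise the copy is discarded.
    graph: {node: [neighbors]}. Returns (packet transmissions, nodes reached).
    """
    # Next hop toward the source = BFS parent (hypothetical tie-breaking)
    toward_source = {source: None}
    frontier = deque([source])
    while frontier:
        node = frontier.popleft()
        for nbr in graph[node]:
            if nbr not in toward_source:
                toward_source[nbr] = node
                frontier.append(nbr)

    reached = {source}
    pending = deque((nbr, source) for nbr in graph[source])
    transmissions = len(graph[source])
    while pending:
        node, came_from = pending.popleft()
        if came_from != toward_source[node]:
            continue  # not on the shortest path back to the source: throw it out
        reached.add(node)
        for nbr in graph[node]:
            if nbr != came_from:
                pending.append((nbr, node))
                transmissions += 1
    return transmissions, reached


GRAPH = {"A": ["B", "C"], "B": ["A", "C", "D"], "C": ["A", "B", "D"], "D": ["B", "C"]}
transmissions, reached = rpf_broadcast(GRAPH, "A")
print(transmissions, sorted(reached))  # 7 ['A', 'B', 'C', 'D']
```

Each node accepts exactly one copy, so the broadcast terminates, but the discarded copies still cost link transmissions.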
128Spanning-Tree Broadcast
- While the controlled flooding approaches do avoid a broadcast storm, they can still send redundant (duplicate) packets
- A spanning tree includes every node in the network but has no loops, so each node is reached by exactly one path
- A spanning tree with minimum total link cost is a minimum spanning tree
- Hence a possible broadcast approach is to construct a minimum spanning tree and send broadcast packets only along its links
129Spanning-Tree Broadcast
- Once defined, the spanning tree can be used to initiate a broadcast from any node
- Each node only needs to know which of its adjacent links are part of the tree
- Many algorithms can be used to create spanning trees
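Prim's algorithm is one such algorithm; here is a centralized Python sketch of it (a distributed version would need message exchange, which this ignores). The topology and costs are hypothetical.

```python
import heapq


def prim_mst(graph, root):
    """Prim's algorithm: grow a minimum spanning tree outward from root.

    graph: {node: {neighbor: link cost}} (undirected, connected).
    Returns the tree's edges as (from, to, cost) tuples.
    """
    in_tree, edges = {root}, []
    heap = [(cost, root, nbr) for nbr, cost in graph[root].items()]
    heapq.heapify(heap)
    while heap and len(in_tree) < len(graph):
        cost, a, b = heapq.heappop(heap)
        if b in in_tree:
            continue  # this link would create a loop
        in_tree.add(b)
        edges.append((a, b, cost))
        for nbr, c in graph[b].items():
            if nbr not in in_tree:
                heapq.heappush(heap, (c, b, nbr))
    return edges


# Hypothetical 6-node network
GRAPH = {
    "u": {"v": 2, "x": 1, "w": 5},
    "v": {"u": 2, "x": 2, "w": 3},
    "w": {"u": 5, "v": 3, "x": 3, "y": 1, "z": 5},
    "x": {"u": 1, "v": 2, "w": 3, "y": 1},
    "y": {"x": 1, "w": 1, "z": 2},
    "z": {"w": 5, "y": 2},
}
tree = prim_mst(GRAPH, "u")
print(tree)  # [('u', 'x', 1), ('x', 'y', 1), ('y', 'w', 1), ('u', 'v', 2), ('y', 'z', 2)]
print(sum(cost for _, _, cost in tree))  # 7
```

Broadcasting along these 5 tree links reaches all 6 nodes with exactly one copy per link and no duplicates.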
130Reality v Broadcast Algorithms
- Broadcast algorithms are used at both the application and network layers
- Gnutella uses app-layer broadcasting, with a time-to-live hop count that counts down to give limited-scope flooding
- OSPF (and likewise IS-IS) uses sequence-number-controlled flooding to broadcast link-state advertisements (LSAs)
- OSPF uses sequence number and age data to tell old LSAs from newer ones
131Multicast
- Multicast sends a packet only to selected nodes in a network
- There also may be more than one sender
- Examples of uses include
- Bulk software upgrades
- Streaming media to a class or meeting
- Shared apps like teleconferencing
- Data feeds (stock prices)
- Interactive gaming
132Multicast
- Key problems are
- How to identify the receivers of the message
- How to address those receivers
- In unicast, the IP address of the recipient was enough; but now, does every packet carry the list of all recipients' addresses?
- Addressing could be larger than the message
- Solve using address indirection
133Multicast
- Address indirection uses a single identifier (here, a class D multicast address) for the group of receivers, and addresses the packet only with that single identifier
- The group of receivers associated with that identifier is a multicast group
- So how do we manage this multicast group? Create an RFC! (duh!)
- Internet Group Management Protocol
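A class D address is simply one whose first four bits are 1110. This Python sketch checks that with the standard `ipaddress` module; the sample addresses are arbitrary examples.

```python
import ipaddress

# Class D: leading bits 1110, i.e. 224.0.0.0 through 239.255.255.255
CLASS_D = ipaddress.ip_network("224.0.0.0/4")

for text in ["224.0.0.5", "239.1.2.3", "192.168.1.10"]:
    addr = ipaddress.ip_address(text)
    # addr.is_multicast performs the same check for IPv4
    print(text, addr in CLASS_D, addr.is_multicast)
# 224.0.0.5 True True
# 239.1.2.3 True True
# 192.168.1.10 False False
```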
134IGMP
- The Internet Group Management Protocol (IGMP), version 3, RFC 3376, works only between a gateway router (first-hop router) and the hosts on its directly attached LAN
- IGMP allows a host to tell the router that an application on that host wants to join a multicast group
- Then the router communicates with other routers using a network-layer multicast routing algorithm, e.g. PIM, DVMRP, or MOSPF
135IGMP
- IGMP has only three message types, each carried in an IP datagram
- Membership_query is sent by the router to find all groups joined by hosts on that interface, or to determine whether a particular group has been joined
- Membership_report is sent by a host to reply to a query, or to tell the router when an application first joins a group
136IGMP
- The leave_group message is oddly optional: a host can leave a group simply by no longer responding to queries
- So joining a multicast group is driven by the receiving host, which sends a membership_report to the router
- This means the sender doesn't control membership; it doesn't add new receivers to the group
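Because leave_group is optional, the router has to treat membership as soft state that expires unless refreshed. Here is a Python sketch of that idea from the router's side; the class name and the timeout value are hypothetical, not taken from RFC 3376.

```python
class GroupMembership:
    """Router-side soft state for multicast groups on one attached LAN.

    A group stays joined only while some host keeps sending membership_reports;
    an explicit leave_group just removes it sooner.
    """

    TIMEOUT = 260.0  # seconds without a report before a group expires (hypothetical)

    def __init__(self):
        self.last_report = {}  # group address -> time of latest membership_report

    def on_membership_report(self, group, now):
        self.last_report[group] = now

    def on_leave_group(self, group):
        self.last_report.pop(group, None)

    def joined_groups(self, now):
        return {g for g, t in self.last_report.items() if now - t <= self.TIMEOUT}


lan = GroupMembership()
lan.on_membership_report("239.1.2.3", now=0)
lan.on_membership_report("239.9.9.9", now=0)
lan.on_membership_report("239.1.2.3", now=200)  # a host answers a later query
print(sorted(lan.joined_groups(now=300)))  # ['239.1.2.3']
```

Group 239.9.9.9 silently drops out because no host refreshed it, which is exactly how a host can leave without ever sending leave_group.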
137Multicast Routing
- Multicast routing algorithms need to ensure that all routers with attached hosts in the group get the desired packets
- Other routers might have to get them too, but avoid that where possible
- Two major approaches are used for multicast routing
- Using a group-shared tree
- Using a source-based tree
138Using a group-shared tree
- Like the spanning-tree algorithm, build a tree that includes all edge routers with attached hosts in the group
- Uses a single tree for all senders in the group; kind of a global approach
- A central node (the center) is used to coordinate the process: new routers send join messages toward it to get added to the tree
- Also called a center-based tree approach
139Using a source-based tree
- Builds a separate routing tree for each source sender in the group
- Uses the RPF (reverse path forwarding) algorithm, tweaked for multicast
- Plain RPF can send thousands of unwanted packets to routers with no attached group members
- A router that gets unwanted packets sends a prune message to its upstream router
140Multicast in the Internet
- The first multicast routing protocol used in the Internet was the Distance-Vector Multicast Routing Protocol (DVMRP, RFC 1075)
- Uses source-based trees with RPF and pruning
- Uses a DV algorithm to find the shortest path back to the source
- Also keeps track of downstream dependent routers
- Has graft messages to, yes, undo a pruning
141Multicast in the Internet
- The Protocol-Independent Multicast (PIM) routing protocol is widely used (dense mode: RFC 3973; sparse mode: RFC 4601)
- Uses dense or sparse mode, depending on how densely the routers with group-member hosts are distributed
- Dense mode uses flood-and-prune RPF
- Sparse mode uses a center-based tree, like the core-based tree (CBT) protocol
- Can switch from a group-shared tree to a source-based tree after joining
142Multicast in the Internet
- PIM sparse-mode domains can be interconnected at their rendezvous points using the Multicast Source Discovery Protocol (MSDP, RFC 3618; deployment scenarios in RFC 4611)
- A third option for multicast is Source-Specific Multicast (SSM, RFC 4607)
- Under SSM only one host can send traffic into a given multicast tree, which makes defining the tree a lot easier
143Multicast in the Internet
- BGP can also carry multicast routing information, via its multiprotocol extensions (RFC 4760; base BGP4 is RFC 4271)
- RFC 5110 is good for more discussion of multicast routing
- Increasingly, multicast is being handled at the application layer, such as End System Multicast (ESM) from Carnegie Mellon
144Multicast Babel?
- So far we have assumed all routers use the same multicast routing protocol
- Within an AS this should be true
- But different AS could run different protocols
- RFC 2715 defines interoperability rules so multicast routing protocols play nicely with each other
- DVMRP has been the de facto standard, but PIM and BGP are also viable
145Are We Dead Yet?
- Diving into the network core, we've covered
- Service models for datagram and VC networks
- Router components and how they work
- IPv4 and IPv6 datagram formats
- Allocation of IP addresses
- NAT and ICMP
- Link-state and distance-vector routing algorithms
146Are We Dead Yet?
- Routing within and among AS
- Routing protocols: RIP, OSPF, BGP
- Broadcast routing algorithms: uncontrolled and controlled flooding, spanning-tree
- Multicast group management and routing: IGMP, DVMRP, PIM, and a few more
- And you thought the network layer was just IP!