Title: Switching
1Switching
- Information Switching Technology in Networks
2What is it?
- Switching --- a process by which a network element forwards information from its inputs to its outputs according to predetermined rules.
- Switch --- an element used to perform the switching task.
3Types of switches
- Telephone switches
- mostly switch voice information, in voice-sample format, in telephone networks
- also called exchanges (e.g., PBX).
- Datagram switches
- (Datagram: a data packet that carries its address information inside.)
- those switching datagrams in the Internet are called routers
- those switching Ethernet datagrams (MAC-based packets) are called Ethernet switches
- ATM/FR/MPLS switches
- (FR: Frame Relay; MPLS: MultiProtocol Label Switching)
- switch ATM/FR/MPLS datagrams
- in ATM: cells, i.e., fixed-size packets
- in FR/MPLS: variable-size packets
4Classification
- Switches can be categorized in the following ways
- Packet switches vs. circuit switches
- A packet switch --- switches packets
- A circuit switch --- the switched data are the pure data to be transmitted.
- Connectionless switches vs. connection-oriented switches
- A connectionless switch --- reads the destination address from the incoming packet and decides its output port (interface) by looking up an existing switching/routing table in the switch.
- A connectionless switch is always a packet switch, but the reverse is not true.
- A connection-oriented switch --- many of them compose a connection-oriented switching system, in which data are transferred along a pre-established path; the path usually has an identifier.
5Classification
- Connection-oriented switches (contd)
- The path is denoted by identifiers, which are assigned ahead of the data transmission
- in ATM/FR, the path identifier is a VCI or a VPI/VCI pair
- in MPLS, the identifier is an MPLS label
- in telephone exchanges, the identifier is a time slot (TS)
- Path IDs are swapped, so they change along the path
- When transmission ends, the switch recycles the path IDs
- A connection-oriented switch can be separated into 2 parts: switch fabric and switch controller
- The switch fabric switches the data to the output interface by the path ID (so a switching/translation table is also associated with it)
- The switch controller assigns a path ID to a data stream according to its destination address (signaling process)
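The split between controller and fabric can be illustrated with a small sketch. It assumes a single switch whose translation table is keyed by (input port, incoming path ID); the function names, port numbers, and IDs are hypothetical and only serve the illustration.

```python
# Hypothetical translation table of one connection-oriented switch.
# Key: (input port, incoming path ID); value: (output port, outgoing path ID).
translation_table = {}


def setup_path(in_port, in_id, out_port, out_id):
    """Controller side: install an entry during the signaling process."""
    translation_table[(in_port, in_id)] = (out_port, out_id)


def switch_unit(in_port, in_id, payload):
    """Fabric side: pick the output port by path ID and swap the ID."""
    out_port, out_id = translation_table[(in_port, in_id)]
    return out_port, out_id, payload


def teardown_path(in_port, in_id):
    """Recycle the path ID when transmission ends."""
    del translation_table[(in_port, in_id)]


setup_path(in_port=1, in_id=42, out_port=3, out_id=7)
result = switch_unit(1, 42, b"data")  # leaves on port 3 carrying ID 7
```

The data path never looks at a destination address here; only the controller does, once, at setup time.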
6Classification(contd)
- Conclude with following table
- ???
7Other functions of switches
- Participate in routing algorithms
- to build routing tables, Chp11
- Resolve contention for output links
- Scheduling, Chp9
- Admission control
- to guarantee resources to certain streams, Chp13,14
- This chapter focuses on pure data movement and architecture
8Requirements
- The capacity of a switch is the maximum rate at which it can move information, assuming all data paths are simultaneously active
- Primary goal: maximize capacity
- subject to cost and reliability constraints
- A circuit switch rejects a call if it has reached its capacity; this is called call blocking. (In this case, the caller usually hears a busy tone.)
- goal: minimize call blocking
- If a packet switch reaches its capacity, it first tries to store data in a buffer. If the buffer then overflows, the switch has to discard data; this is called packet loss.
- goal: minimize packet loss
9Requirements(contd)
- Reordering packets is harmful
- Costly to put packets back in order
- Increases end-to-end delay greatly
- Connection-oriented switches generally do not reorder packets, but connectionless switches (such as routers) may.
- Goal: minimize packet reordering.
10A generic switch
- A generic switch has 4 parts
- input buffers
- a port mapper
- a switch fabric
- output buffers.
- But real switches often distribute, combine, or omit one or more of the above parts
11A generic switch(contd)
- Input buffers temporarily store data arriving on the input ports.
- The buffer space differs between kinds of switches:
- very small, if the buffers are mostly at the output side
- very large, if almost all of the switch's buffers are at the input side.
- A port mapper reads the destination address or path identifier of the packet and uses it as an index to look up the switching/translation/routing table to find the output port
- only for packet switches.
- A circuit switch does not need a port mapper; every time slot (TS) is statically associated with an output port.
- The port mapper itself does not do the switching
12A generic switch(contd)
- A switch fabric switches packets (or data) to the specific output port buffer according to the output port number assigned by the port mapper.
- In packet switches, the switch fabric is usually a processor or a complex multiprocessor running at high speed.
- In circuit switches, the switch fabric is usually an ASIC or several ASICs
- Output buffers store data waiting for their turn on the output link.
- A scheduler is usually used to manage the output buffers and decide which data gets access to the output link.
- The buffer space may differ between kinds of switches:
- very small, with most buffers at the input side
- very large, with almost all of the switch's buffers at the output side.
- Usually, if the input buffers are large, the output buffers are small, and vice versa.
13Outline
- (Circuit switching)
- Packet switching
- Switch generations
- Switch fabrics
- Buffer placement
- Multicast switches
14Packet switching
- Discuss two types of packet switches
- virtual circuit: ATM switches
- datagram switches --- routers
- Other than that, the two are very similar
- ATM switches
- handle fixed-size packets
- the port mapper decides the output port by the VCI, VPI, or VPI/VCI in every incoming packet header
- the switching table is a VCI-to-port table --- a translation table
15Packet switching
- Routers --- a kind of switch
- handle variable-size packets.
- the port mapper decides the output port by the entire destination address included in every incoming packet header
- the switching table is an entire-address-to-port table --- a routing table.
16Repeaters, bridges, routers, and gateways
- Datagram switches have different names according to the layer of the protocol stack at which they work
- Repeaters
- operate at the physical level (L1)
- repeat the input signal on one or more outputs, with the signal only reinforced at the physical level (such as regulating the waveform, restoring timing, etc).
- A hub is a kind of repeater.
- Bridges
- operate at the datalink level (L2)
- connect different segments of a LAN by forwarding packets directly to the right segment according to the packet's datalink-layer destination address (in Ethernet, the MAC address), lowering the amount of traffic on the LAN.
- In Ethernet, a bridge is an Ethernet LAN switch (based on MAC addresses).
- discover attached segments by listening
- Difference between Ethernet switches and hubs?
17Repeaters, bridges, routers, and gateways(contd)
- Routers
- operate at the network level (L3)
- forward packets to one of the output ports according to the packet's network-layer address (in the Internet, the IP address).
- participate in routing protocols
- Differences between bridges and routers
- bridges use the datalink-layer address, connecting different segments of the same network (such as a LAN).
- routers use the network-layer address, connecting different packet-based networks together to form an internet.
- Application level gateways
- a datagram switch working at the application layer (L7)
- interconnect different networks and even different systems, treating an entire network as a single hop
- E.g.
- an IP phone gateway connects the PSTN to the Internet, so that a telephone call can be made via the Internet.
- mail gateways and transcoders
18Repeaters, bridges, routers, and gateways(contd)
- Summary
- repeater -> bridge -> router -> gateway
- layer in protocol stack: low level -> upper level
- functions: simple -> complex
- forwarding speed: fast -> slow
- Functionality is gained at the expense of forwarding speed
- for best performance, push functionality as low as possible
19On port mappers in routers
- Look up the output port based on the destination address
- Easy for ATM: just use a table, with one field (VCI, VPI, or VPI/VCI) to match
- Harder for IP routers
- need to match more fields, and the longest matching entry should win
- e.g. packet with address 128.32.1.20
- the routing table may have entries (128.32.*), (128.32.1.*), (128.32.1.20)
- The routing table is usually quite large, sometimes with tens of thousands of entries, so the data structure and match algorithm become very important for router performance: they should be quick, efficient, and reliable.
- A data structure often used for routing tables: the trie
20Trie
- A commonly used data structure for routing tables
- basically a tree, with the address elements as its nodes.
- A node is also associated with an output interface number, meaning that an address whose match ends at this node should be mapped to that output interface.
- A match starts at the root node and goes as far down as possible.
- The output interface of the farthest node the match can reach is the port mapper's result.
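The match process described above can be sketched as follows. This is a minimal illustration, assuming each trie level holds one dotted-decimal element of the address; the class, function names, and port numbers are hypothetical. Unlike the slide's picture, intermediate nodes here may carry no interface, in which case the last interface seen on the way down is used.

```python
class TrieNode:
    def __init__(self, out_port=None):
        self.children = {}        # address element (e.g. "128") -> child node
        self.out_port = out_port  # output interface, or None if unset


def insert(root, prefix, out_port):
    """Add a route such as "128.32.1" -> out_port."""
    node = root
    for element in prefix.split("."):
        node = node.children.setdefault(element, TrieNode())
    node.out_port = out_port


def lookup(root, address):
    """Walk down as far as possible; return the deepest interface seen."""
    node = root
    best = root.out_port  # the root holds the default route
    for element in address.split("."):
        node = node.children.get(element)
        if node is None:
            break
        if node.out_port is not None:
            best = node.out_port
    return best


root = TrieNode(out_port=0)  # hypothetical default interface
insert(root, "128.32", 1)
insert(root, "128.32.1", 2)
insert(root, "128.32.1.20", 3)
```

With these entries, `lookup(root, "128.32.1.20")` ends at the deepest node, while an address with no matching first element ends at the root and takes the default interface.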
21Trie
- An example
- An IP-address-based, trie-structured routing table. Each node represents a partial string of the whole IP address.
- If no match can be made for a given IP address, the match ends at the root node. This is the router's default routing state, so the output interface associated with the root node is used for all default routes.
- All other nodes have specific IP addresses or address segments to match, and all are associated with output interface numbers.
- Match for 128.32.5.10?
(Figure: trie whose nodes are labelled with output interfaces Out1 to Out8.)
22Trie
- What other data structures can be used for routing tables?
- linear tables, for example
- 128.32.1.120
- 128.32.1.100
- 128.32.1.*
- 128.32.25.*
- 128.32.*
- Benefits of a trie
- quick address matching; the advantage over linear tables grows with the size of the routing table.
- But for small tables, this is not true
- small table storage space.
- Some improvement tricks for the port mapper
- using a cache to store recently matched IP addresses. The cache is checked first, before a new match begins.
- using a cache to store some frequently used IP address match results. The cache is checked first, before a new match begins.
23Blocking in packet switches
- Can have both internal and output blocking
- Internal
- no path to the output; happens at input buffers, port mapper, and switch fabric.
- Output
- trunk unavailable; happens at output buffers and output ports.
- If a packet is blocked, the switch must either buffer or drop it
- May first try to buffer, then drop
- Cannot predict whether packets will be blocked or not, unlike a circuit switch
- Circuit switches can only block a call at call-setup time. During the data transmission phase, no blocking occurs. Blocking a call simply means refusing to set up a path for it. So blocking in circuit switches only affects the network's ability to set up connections, not the transmission quality of a call.
- Packet switches cannot predict the quantity of packets to transfer
- Even in ATM switches, blocking may happen
24Dealing with blocking
- Speed up
- Speed up internal switching to prevent internal blocking; internal links run much faster than the inputs
- Buffers
- Place more at input or output
- QoS
- Use admission control to emulate circuit switching.
- Backpressure
- if blocking happens at the switch fabric or an output port, the switch sends a signal back to its input port reporting the blocking and preventing more data from entering the switch core (flow control).
- Improve performance of the switch fabric
- Sorting and randomization
- Parallel switch fabrics
- increase effective switching capacity
25Outline
- Circuit switching
- Packet switching
- Switch generations
- Switch fabrics
- Buffer placement
- Multicast switches
26Three generations of packet switches
- As technology has advanced, switches have evolved through three generations, with switching capacity becoming larger and larger.
- Different trade-offs between cost and performance
27First generation switch
- The switch fabric is realized by software on a CPU, and so is the port mapper. Buffers are mainly computer memory.
- Most Ethernet switches and cheap packet routers are first generation
- Also PC-based routers
- The bottleneck can be the CPU, host adaptor, or I/O bus, depending on the setup
28Example
- First generation router built with a 133 MHz Pentium (instruction cycle = 1/133 MHz = 7.52 ns); assume
- Mean packet size: 500 bytes
- An interrupt takes 10 µs
- A word access takes 50 ns (time to read/write a word from memory; I/O bus speed)
- Per-packet processing (port mapper, switch fabric) takes 200 instructions = 1.504 µs
- Copy loop (one word)
- register <- memory[read_ptr]
- memory[write_ptr] <- register
- read_ptr <- read_ptr + 4
- write_ptr <- write_ptr + 4
- counter <- counter - 1
- if (counter not 0) branch to top of loop
- Copying one word takes 4 instructions + 2 memory accesses = 130.08 ns; copying a packet takes 500/4 × 130.08 ns = 16.26 µs; interrupt = 10 µs
- Total time
- 10 µs (interrupt) + 16.26 µs (copying packet) + 1.504 µs (per-packet processing) = 27.764 µs
- => speed is 500 bytes / 27.764 µs = 144.1 Mbps
- overhead such as route-table computation is not taken into account
- In conclusion, this design does not use resources efficiently and can be improved
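The arithmetic of this example can be re-checked with a few lines of Python. The constants are the slide's own assumptions (7.52 ns cycle, 50 ns word access, 10 µs interrupt, 500-byte packets, 4-byte words); the variable names are only illustrative.

```python
CYCLE_NS = 7.52                # instruction cycle, 1/133 MHz rounded as on the slide
WORD_ACCESS_NS = 50.0          # one memory word read or write
INTERRUPT_NS = 10_000.0        # 10 microseconds
PACKET_BYTES = 500
WORD_BYTES = 4
PER_PACKET_INSTRUCTIONS = 200  # port mapper + switch fabric work

# Per-packet processing: 200 instructions.
per_packet_ns = PER_PACKET_INSTRUCTIONS * CYCLE_NS

# Copy loop for one word: 4 register instructions + 2 memory accesses.
copy_word_ns = 4 * CYCLE_NS + 2 * WORD_ACCESS_NS
copy_packet_ns = (PACKET_BYTES / WORD_BYTES) * copy_word_ns

total_ns = INTERRUPT_NS + copy_packet_ns + per_packet_ns

# bits per nanosecond is Gbps; multiply by 1000 for Mbps.
throughput_mbps = PACKET_BYTES * 8 / total_ns * 1000

print(f"total {total_ns / 1000:.3f} us, throughput {throughput_mbps:.1f} Mbps")
```

The interrupt and the copy loop account for most of the per-packet time, which is why the later generations move copying and lookup off the central CPU.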
29Second generation switch
- Line cards are more intelligent, with a processor inside
- able to decide the output port of some packets themselves, and to put packets on the bus directly to the output line cards, saving a lot of time
- Some port mapping is done by the line cards themselves, instead of the central CPU
- Line cards need to keep a local routing/translation cache, while the central computer keeps a larger routing table
- For routers, if a packet cannot be routed by the line card, it is sent to the central processor, which forwards it and then updates the routing cache of that line card. This may result in packet reordering.
- For ATM switches, because the translation cache in the line card is set up by the path signaling process, port mapping can always be done in the line cards: no packets are sent to the center, and there is no packet reordering. But during path setup, the line card needs control from the center.
30Second generation switch(contd)
- A shared bus or a ring connects the line cards and the central processor
- a bottleneck for the system: its bandwidth must be at least the sum of the bandwidths of all ports
- Improvement
- In connection-oriented switches, use a shared port mapper card to replace all the distributed port mappers in the port cards.
- Packets go input line card -> port mapper card -> output line card
- Ipsilon IP switching
- goal: route IP packets over ATM, assuming an underlying ATM network
- by default, reassemble cells back into an IP packet and use an adjunct processor to route it.
- if a flow is detected, ask upstream to send it on a particular VCI, and install an entry in the port mapper => implicit signaling, no need to reassemble anymore.
31Third generation switches
- The bottleneck in a second generation switch is the bus (or ring)
- In a third generation switch, the shared bus is replaced by a switch fabric
- The switch fabric is composed of interconnected buses and switching elements.
32Third generation (contd.)
- Port mappers are in the input line cards (ILCs)
- Switch fabric features
- self-routing fabric
- the output buffer is a point of contention
- unless we arbitrate (manage) access to the fabric
33Outline
- Circuit switching
- Packet switching
- Switch generations
- Switch fabrics
- Buffer placement
- Multicast switches
34Switch fabrics
- Transfer data from input to output according to the output port given by the port mapper, ignoring scheduling and buffering
- Usually consist of links and switching elements
- 1st and 2nd generations may use software; 3rd generation uses hardware
- Many designs:
- Crossbar
- Broadcast
- Switching element
- Banyan
- Sorting and merging fabrics
- Batcher-Banyan
- Be aware that the hard part in designing a switch is not the switch fabric, but deciding where to place buffers and how to schedule access to the buffers and bandwidth.
35Crossbar
- Simplest switch fabric
- Much the same as that in a telephone circuit switch
- Used here for packet routing: a cross-point is left open (connected) long enough to transfer a packet from an input to an output
- A switching scheduler is used to tell where, when, and for how long
- (differs from the schedulers of output ports)
- For packets with fixed size or a known arrival pattern (ATM), the schedule can be computed in advance
- Otherwise, the schedule must be computed on the fly (by reading the length field of the packet).
36Buffered crossbar
- What happens if packets at two inputs both want to go to the same output? ---- output blocking. To avoid it:
- Can run the crossbar N times faster than the inputs (expensive)
- Or, buffer at the crosspoints (practical)
37Broadcast
- Packets are tagged with the output port by port mappers
- Then they are broadcast to all output ports
- Each output matches tags, so it needs to match N address items in parallel (should be N times faster than an input)
- No switching schedule is needed
- Useful only for small switches, or as a stage in a large switch
38Switch fabrics built with fabric element
- Can build complicated fabrics from a simple basic switch fabric element
- A switch fabric element
- consists of 2 inputs, 2 outputs, and an optional buffer
- Routing rule: check one bit of the output port number; if 0, send the packet to the upper output, else to the lower output
- If both packets go to the same output, buffer or drop one
39Features of fabrics built with switching elements
- An NxN switch can be built with many bxb switching elements. It will have log_b N stages with N/b switching elements per stage
- E.g. a 4096x4096 switch can be built with 4 stages of 8x8 switching elements, with 512 elements in each stage. An 8x8 switching element can in turn be built from 2x2 elements, in 3 stages with 4 elements per stage.
- The fabric is self-routing, given an output port number.
- Recursively composed from smaller to larger
- Can be synchronous or asynchronous
- Async. needs a buffer; fits variable-size packet switching
- Regular and suitable for VLSI implementation
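The stage and element counts in the example follow from the number of stages being log_b N and the number of elements per stage being N/b. A short check, with a hypothetical helper name and integer arithmetic to avoid floating-point logarithms:

```python
def fabric_dimensions(n_ports, b):
    """Return (stages, elements_per_stage) for an NxN fabric of bxb elements.

    Assumes n_ports is an exact power of b.
    """
    stages = 0
    size = 1
    while size < n_ports:
        size *= b
        stages += 1
    if size != n_ports:
        raise ValueError("n_ports must be a power of b")
    return stages, n_ports // b


print(fabric_dimensions(4096, 8))  # the 4096x4096 example
print(fabric_dimensions(8, 2))     # an 8x8 element built from 2x2 elements
```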
40Banyan(1973)
- Simplest self-routing recursive fabric
- Input packets are tagged with the output port in binary; the element at the i-th stage looks at the i-th bit of the tag. Forwarding rule: 0 --- upper output, 1 --- lower output
- 2^n outputs need n stages with 2^(n-1) elements in each stage
- What if two packets both want to go to the same output of a switching element?
- Blocking happens at the output of the switching element
(Figure: banyan fabric example with a packet tagged 110.)
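Self-routing means each stage needs only one bit of the tag. The sketch below lists the decision taken at each stage for a given output port; it assumes the most significant bit is examined first, which the slide does not state, and the function name is hypothetical.

```python
def banyan_decisions(out_port, n_stages):
    """Per-stage choices for a packet tagged with out_port.

    Assumption: stage i looks at bit i counted from the most significant bit.
    """
    if not 0 <= out_port < 2 ** n_stages:
        raise ValueError("out_port does not fit in n_stages bits")
    tag = format(out_port, "0{}b".format(n_stages))
    return ["upper" if bit == "0" else "lower" for bit in tag]


decisions = banyan_decisions(0b110, 3)  # tag 110 in a 3-stage (8-output) banyan
```

For the tag 110 this gives lower, lower, upper. Two packets collide whenever they reach the same element and need the same choice at that stage.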
41Blocking
- Can avoid with a buffered banyan switch, where every element is associated with a buffer
- but this is too expensive
- hard to achieve zero loss even with buffers (they cannot be made large enough)
- Instead, can check whether a path is available before sending a packet
- three-phase scheme
- send requests
- inform winners
- send packets
- Or, use several banyan fabrics in parallel
- intentionally misroute and tag one of a colliding pair
- divert tagged packets to a second banyan, and so on to k stages
- expensive
- can reorder packets
- output buffers have to run k times faster than the input
- Or, use sorting --- Batcher-Banyan
42Batcher-Banyan
- A Batcher sorting network sorts packets according to their output port
- A trap network removes duplicate packets
- What to do with the removed duplicates?
- recirculate to the beginning
- or run the output of the trap to multiple banyans (dilation)
- A shuffle-exchange network shuffles the packets
43Batcher-Banyan(contd)
- Principle
- blocking in a Banyan can be avoided by choosing the order in which packets appear at the input ports
- E.g. two packets are destined to output ports 000 and 001
- if put at inputs 000 and 001, the packets will be blocked at the first stage
- if put at inputs 000 and 010, no blocking
- Procedure
- packets at the inputs are sorted by output number, by a sorting network
- remove duplicates and remove gaps
- shuffle with a perfect-shuffle network
- enter a normal Banyan network
- It has been shown that a Banyan network with the above procedure is internally nonblocking.
- For example
- Ordered set of output ports: X, 011, 010, X, 011, X, X, X
- -(sort)-> 010, 011, 011, X, X, X, X, X
- -(remove dups)-> 010, 011, X, X, X, X, X, X
- -(shuffle)-> 010, X, 011, X, X, X, X, X
- Batcher-Banyan is cheaper than per-element buffering for avoiding blocking
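The three preprocessing steps can be reproduced in software for the example above. This is only a sketch of the data transformation, not of the hardware networks: `None` stands for an idle input (X), the function names are hypothetical, and trapped duplicates are simply dropped rather than recirculated.

```python
def sort_by_port(slots):
    """Batcher step: order packets by output port, idle slots last."""
    return sorted(slots, key=lambda p: (p is None, p if p is not None else 0))


def trap_duplicates(slots):
    """Trap step: keep one packet per output port and close the gaps."""
    seen = set()
    kept = []
    for port in slots:
        if port is not None and port not in seen:
            seen.add(port)
            kept.append(port)
    return kept + [None] * (len(slots) - len(kept))


def perfect_shuffle(slots):
    """Shuffle step: interleave the first half with the second half."""
    half = len(slots) // 2
    shuffled = []
    for first, second in zip(slots[:half], slots[half:]):
        shuffled.extend([first, second])
    return shuffled


inputs = [None, 0b011, 0b010, None, 0b011, None, None, None]
sorted_slots = sort_by_port(inputs)
trapped = trap_duplicates(sorted_slots)
banyan_inputs = perfect_shuffle(trapped)
```

After the shuffle the two surviving packets sit at inputs 0 and 2, matching the last line of the slide's example.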
44Sorting
- (5,7,2,3,6,2,4,5) -sort-> (2,2,3,4,5,5,6,7)
- Can build sorters from merge networks
- Merging
- merge two sorted lists into one sorted list
- E.g. (5,7) (2,3) -merge-> (2,3,5,7)
- (2,6) (4,5) -merge-> (2,4,5,6)
- E.g. a 4m4->8 merging network using comparators
- Sorting by merging
- E.g.
- 1m1->2 merging
- (5)m(7)->(5,7), (2)m(3)->(2,3), (6)m(2)->(2,6), (4)m(5)->(4,5)
- 2m2->4
- (5,7)m(2,3)->(2,3,5,7)
- (2,6)m(4,5)->(2,4,5,6)
- 4m4->8
- (2,3,5,7)m(2,4,5,6)->(2,2,3,4,5,5,6,7)
- Can sort a list of any size by merging two sorted lists recursively
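The merging passes in the example can be traced with a bottom-up merge sort. This is a sequential software analogue, not a comparator network; the function names are illustrative.

```python
def merge(left, right):
    """Merge two sorted lists into one sorted list."""
    merged = []
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged


def sort_by_merging(items):
    """Return (sorted list, list of intermediate passes)."""
    runs = [[x] for x in items]  # start with runs of length 1
    passes = []
    while len(runs) > 1:
        next_runs = [merge(runs[k], runs[k + 1])
                     for k in range(0, len(runs) - 1, 2)]
        if len(runs) % 2:        # an odd run is carried over unchanged
            next_runs.append(runs[-1])
        runs = next_runs
        passes.append(runs)
    return (runs[0] if runs else []), passes


result, passes = sort_by_merging([5, 7, 2, 3, 6, 2, 4, 5])
```

The three entries of `passes` correspond to the 1m1->2, 2m2->4, and 4m4->8 steps listed above.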
45Shuffle
- Use a shuffle algorithm
- E.g.
- 010, 011, X, X, X, X, X, X -(shuffle)-> 010, X, 011, X, X, X, X, X
- Further reading is needed to understand the algorithm
- http://www.nist.gov/dads/HTML/perfectShuffle.html
46Effect of packet size on switching fabrics
- A major motivation for using a small, fixed packet size in ATM is the ease of building large parallel fabrics.
- But it seems a small packet size may not help in building switches
- In general, smaller size => more per-packet overhead, but less packetization delay
- At high speeds, overhead (such as the interrupt time in a 1st generation switch) dominates!
- Fixed-size packets help build a synchronous switch
- But we could fragment at entry and reassemble at exit
- Or build an asynchronous fabric
- Thus, variable size doesn't hurt too much
- Conclusion: maybe Internet routers with comparatively larger, variable-size packets can be almost as cost-effective as ATM switches
47Outline
- Circuit switching
- Packet switching
- Switch generations
- Switch fabrics
- Buffer placement
- Multicast switches
48Buffering
- All packet switches need buffers to match the input rate to the service rate
- otherwise they suffer heavy packet losses
- Where should we place buffers?
- input
- in the fabric
- output
- shared
49Input buffering (input queuing)
- An arbiter schedules the release of packets from the input queues when access to both the switch fabric and the output trunk is available
- Advantages
- No speedup needed in buffers or trunks (unlike an output-queued switch); all elements inside (queues, switch fabric, etc.) only need to run as fast as the input lines
- only the arbiter needs to run at the sum of the speeds of the input lines
- therefore, high-speed switches can be built this way
50Input Queuing(contd)
- Problem: head-of-line (HoL) blocking
- If the packet at the head of a queue is blocked, all packets behind it are blocked too, even if their paths are available.
- with randomly distributed packets, utilization is at most about 58.6%, and even worse with hot spots (some ports are more favored)
- Dealing with HoL
- Per-output queues at inputs, so N outputs -> N queues in every input queue group. If access to one output is blocked, packets for the others can still go.
- The arbiter must choose one of the input ports for each output port
- But how to select?
- Parallel Iterated Matching
- inputs tell the arbiter which outputs they are interested in
- each output selects one of the inputs
- some inputs may get more than one grant, others may get none
- if > 1 grant, the input picks one at random and tells the output
- losing inputs and outputs try again
- Used in the DEC Autonet 2 switch
- Arbitration schemes for input-queued switches are still on the research frontier
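The request/grant/accept rounds of Parallel Iterated Matching can be sketched sequentially as below. This is an illustrative simulation under assumptions: the function name and data layout are hypothetical, outputs grant uniformly at random, and rounds repeat until no unmatched input still wants an unmatched output (a real arbiter would run a fixed, small number of rounds in parallel hardware).

```python
import random


def parallel_iterated_matching(requests, rng):
    """requests: dict input -> set of outputs it has queued packets for.

    Returns dict input -> output, with each output used at most once.
    """
    match = {}            # input -> output
    taken = set()         # outputs already matched
    while True:
        # Request phase: unmatched inputs ask for unmatched outputs.
        wanted = {}
        for inp, outs in requests.items():
            if inp in match:
                continue
            for out in sorted(outs):
                if out not in taken:
                    wanted.setdefault(out, []).append(inp)
        if not wanted:
            return match
        # Grant phase: each requested output picks one input at random.
        grants = {}
        for out, inps in wanted.items():
            grants.setdefault(rng.choice(inps), []).append(out)
        # Accept phase: an input with several grants picks one at random.
        for inp, outs in grants.items():
            out = rng.choice(outs)
            match[inp] = out
            taken.add(out)


requests = {0: {0, 1}, 1: {0}, 2: {0, 2}, 3: set()}
match = parallel_iterated_matching(requests, random.Random(1))
```

Every round with a pending request produces at least one accepted grant, so the loop terminates with a maximal matching, though not necessarily the largest possible one.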
51Output queueing
- Doesn't suffer from head-of-line blocking
- Also allows fine-grained control of output queues
- But output buffers need to run much faster than the trunk speed
- They need to store packets from all inputs, so the buffer must run at the sum of all input speeds
- This makes it more expensive to build than an input-queued switch.
- At the same cost, a faster input-queued switch can be built
- Can reduce some of the cost by using the knockout principle
- Unlikely (though possible) that all N inputs will have packets for the same output
- So set a number S < N and let each output accept at most S packets at a time
- Need to drop extra packets, distributing the losses fairly among inputs
52Shared memory
- Route only the header to the output port
- The switch fabric only switches packet headers, so it is easier to build
- The bottleneck is the time taken to read and write the multiported memory
- Need to access memory N times as fast as the inputs (sum of input speeds)
- Hard to build a large-scale switch with many inputs
- For an ATM switch with 64 input ports at 155 Mbps each, need to access 64 ATM cells in 53 bytes / 155 Mbps = 2.73 µs => access speed = 1 word / 3.05 ns, too high; common EDO memory is 50 ns/word
- But can form an element in a multistage switch
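The memory-speed figure can be re-derived as follows. The 4-byte word size (so 14 words per 53-byte cell) is an assumption that is not stated on the slide; it is chosen because it reproduces the 3.05 ns figure.

```python
import math

PORTS = 64
LINE_RATE_BPS = 155e6
CELL_BYTES = 53
WORD_BYTES = 4   # assumption: 32-bit memory words

# Time for one cell to arrive on one input line.
cell_time_ns = CELL_BYTES * 8 / LINE_RATE_BPS * 1e9

# In that time the shared memory must handle one cell per port.
words_per_cell = math.ceil(CELL_BYTES / WORD_BYTES)
word_time_ns = cell_time_ns / (PORTS * words_per_cell)

print(f"cell time {cell_time_ns / 1000:.2f} us, "
      f"required access time {word_time_ns:.2f} ns per word")
```

Against a 50 ns/word memory, the required access time is more than an order of magnitude too short, which is why the slide treats a 64-port shared-memory switch as impractical on its own.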
53Datapath switch (IDTI): clever shared memory design
- An 8x8 integrated shared-memory ATM switch module in one chip
- Includes a controller and a 4k-cell memory
- Reduces read/write cost by doing wide reads and writes (using parallel shift registers)
- 1.2 Gbps switch for $50 parts cost
54Buffered fabric
- Buffering at crosspoints was discussed for the crossbar; it can also be used in any switch fabric, such as a Banyan
- Buffers in each switch element
- From first stage to last stage
- Advantages
- Speedup is only as much as the fan-in
- Hardware backpressure can be used to reduce buffer requirements
- Disadvantages
- costly (unless using single-chip switches): many distributed memories
- scheduling is hard, so QoS is hard
- As a result, a purely buffered-fabric switch is impractical; most are actually composed of buffered single-chip switch elements such as Datapath chips
55Hybrid solutions
- Buffers at more than one point
- Such as
- a few input buffers to deal with output blocking, lowering the required output buffer speed
- more output buffers, which can be well scheduled to provide QoS
- a shared bus with each interface card providing both input and output buffering
- Becomes hard to analyze and manage
- But common in practice
- Buffer arrangement is a tough point in switch design
- Transmission cost vs. buffer cost
- Buffers are more economical: use as much memory as possible to reduce the required transmission link speed
56Outline
- Circuit switching
- Packet switching
- Switch generations
- Switch fabrics
- Buffer placement
- Multicast switches
57Multicasting
- Multicast: send packets to several chosen destinations
- Unicast: send packets to one destination
- Broadcast: send packets to all available destinations inside a network such as a LAN
- Transmission: point-to-point, point-to-multipoint, multipoint-to-multipoint (conference)
- Better to do this in hardware in the switch fabric; how?
- When a multicast packet arrives, the port mapper retrieves the list of outputs (instead of one output port number)
- The incoming packet is copied to these output ports
- Two sub-problems
- generating and distributing copies, how?
- VCI translation for the copies
58Generating and distributing copies
- Either implicit or explicit
- Implicit
- suitable for bus-based, ring-based, crossbar, or broadcast switches
- multiple outputs are enabled according to the port list given by the port mapper after the packet is placed on the shared bus
- The packet is put on the bus only once
- used in Paris and Datapath switches
- Explicit
- suitable for fabrics such as a Banyan switch
- need to copy a packet at switch elements
- use a copy network before the Banyan network to do this
- One input, many outputs
- Propagates packets according to the port list given by the port mapper
- Many packets with different port tags at the output of the copy network
- The output of the copy network then becomes the input of the Banyan
- Both schemes increase blocking probability
59Header translation
- Recall that the VCI is swapped in an ATM switch
- Normally, in-VCI to out-VCI translation can be done either at input or at output
- Just look up the translation table, find the out-VCI, and replace the in-VCI with it
- Can be done at the time of port mapping (input side)
- With multicasting, every copied packet needs its own VCI swap
- so translation is easier at the output port side
- Need two separate tables: a port mapping table and a normal translation table
- The port mapping table is at the input side; the incoming packet uses its in-VCI as an index to find a set of output ports (multicast).
- The translation table is at the output side; every copied packet uses its in-VCI and output port to find the out-VCI and swap it in
- Need to do two lookups per packet
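The two lookups can be illustrated with a small sketch. The table contents, VCI values, and function name are hypothetical, and for brevity the port mapping table is keyed by in-VCI alone, as on the slide.

```python
# Input side: in-VCI -> list of output ports (the multicast group).
port_mapping_table = {5: [1, 3]}

# Output side: (in-VCI, output port) -> out-VCI for that branch.
translation_table = {(5, 1): 17, (5, 3): 9}


def multicast(in_vci, payload):
    """Return one (output port, out-VCI, payload) tuple per copy."""
    copies = []
    for out_port in port_mapping_table[in_vci]:        # lookup 1, input side
        out_vci = translation_table[(in_vci, out_port)]  # lookup 2, output side
        copies.append((out_port, out_vci, payload))
    return copies


copies = multicast(5, b"cell")
```

Each copy leaves with its own out-VCI, which is why the swap is placed after the copies have been generated.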
60Summary
- 1st generation switches: a computer with many interface cards
- Can achieve speeds up to 300 Mbps (1997)
- 2nd generation switches: line cards are more intelligent and connected together directly by a high-speed bus
- Can achieve speeds 8 or more times those of 1st gen. switches
- 3rd generation switches: use many parallel high-speed buses
- Can achieve speeds in the Gbps range
61Assignments
- Exercises 8.2
- Exercises 8.9
- Exercises 8.11
- Exercises 8.5