Title: CS 2200 Lecture 15: Networking (with a focus on architectural implications)
- (Lectures based on the work of Jay Brockman,
Sharon Hu, Randy Katz, Peter Kogge, Bill Leahy,
Ken MacKenzie, Richard Murphy, and Michael
Niemier)
Quiz
Networking
- Lots of XANs
- SAN (system area network; the text calls it MPP)
  - Not designed for generality; usually connects homogeneous nodes
  - Physical extent is small: less than 25 meters, usually much less
  - Connectivity usually from hundreds to thousands of nodes
  - Main focus is high bandwidth and low latency
  - Supported by the MPP industry; very model specific
- LAN (local area network)
  - Heterogeneous hosts assumed; designed for generality
  - Physical extent usually within a few hundred kms
  - Connectivity usually in the hundreds of nodes
  - Performance is typically mundane
  - Supported by the workstation industry; definite open system model
One More
- WAN (wide area network)
  - General connectivity for thousands of heterogeneous nodes
  - High bandwidth, but latency is usually horrible
  - Physical extent: 1000s of kilometers
  - Supported by the telecommunications industry
  - Open system standard model
Slightly more general model
- Software responsible for reliable transmission
- Several problem sources
  - Garbled in transit: add a checksum trailer
  - Lost in transit
    - When a message is received, a reply must be generated
    - Pick a time interval such that, if the message was received, the reply would have arrived by then
    - On a send, start a timer; if it times out, assume the message was lost and resend
- New message format
  - 2-bit header (request, reply, request-ack, reply-ack)
  - 32-bit original payload
  - 4-bit trailer checksum
Software Steps (just an example)
- Send
  - Application calls the OS to send data
  - OS copies data to an OS buffer
  - OS calculates the checksum (here it goes in the trailer) and starts the timer
  - OS sends the data to the NW interface and tells the HW to send it
- Receive
  - System copies the data from the NW interface into the OS buffer
  - System calculates the checksum and checks it against the transmitted version
  - If it matches, an ack is sent, the data is copied into the proper user-space location, and the OS signals the application to continue
  - If it doesn't match, the message is deleted, since the sender will resend after the timeout
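The send/receive steps above can be sketched in a few lines. This is a minimal illustration, assuming a simple additive checksum folded into 4 bits to match the 4-bit trailer; the function names are illustrative, not from the lecture.

```python
# Sketch of the send/receive protocol steps, with a hypothetical
# additive checksum and the 2-bit-header / payload / 4-bit-trailer
# message format from the slide.

def checksum(payload: bytes) -> int:
    return sum(payload) & 0xF          # fold into 4 bits for the trailer

def make_packet(payload: bytes) -> tuple:
    header = 0                         # 2-bit header: 0 = request
    return (header, payload, checksum(payload))

def receive(packet: tuple):
    header, payload, trailer = packet
    if checksum(payload) == trailer:
        return ("ack", payload)        # copy to user space, signal the app
    return None                        # drop it; sender resends on timeout
```

A corrupted packet is simply dropped, relying on the sender's timer for recovery, which is exactly the asymmetry the slide describes.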
Notice
- Symmetry
  - In the send and receive protocols
- Similarity
  - Quite close to a Unix UDP/IP protocol
- Note
  - Lots of things are easy here
    - Single-message basis
    - Homogeneous environment
More realistic scenario
- Heterogeneous node types
  - Enter big-endian vs. little-endian byte order
  - The protocol will determine which transmit order is required
- Wrong endian?
  - Will need to do byte reversal
  - On both sends and receives to make up for the difference
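The byte reversal can be seen directly with Python's `struct` module, which supports both byte orders:

```python
import struct

# A 32-bit value transmitted in big-endian order: a host that reads
# it with the wrong endianness sees the bytes reversed.

value = 0x01020304
wire = struct.pack(">I", value)                     # big-endian on the wire
assert wire == b"\x01\x02\x03\x04"
assert struct.unpack(">I", wire)[0] == value        # correct interpretation
assert struct.unpack("<I", wire)[0] == 0x04030201   # wrong-endian read
```

In practice the protocol fixes one network byte order, and wrong-endian hosts swap on both send and receive.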
Other reliability issues
- Duplicate messages
  - A resend happened, but the previous try got there anyway
  - A unique identifier allows the receiver to discriminate properly
  - Usually need some realistic and safe timing assumption to avoid wrap-around aliasing
- Rogue messages
  - Never received, but stay in the fabric
  - A time-to-live field in the packet provides a self-destruct mechanism
  - Must work even when the receiver's buffer is full
- Flow control
  - Requires some form of feedback to the sender, or a piecewise stall capability
- We'll defer issues like
  - Deadlock, livelock, safety, fairness
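The two mechanisms above, unique IDs for duplicate suppression and a time-to-live field for rogue messages, can be sketched as follows. The names and packet structures here are illustrative assumptions, not from the lecture.

```python
# Duplicate suppression: remember IDs already delivered, drop repeats.
seen_ids = set()

def deliver(msg_id, payload):
    # Deliver the payload only the first time this ID is seen;
    # a duplicate from a spurious resend is silently dropped.
    if msg_id in seen_ids:
        return None
    seen_ids.add(msg_id)
    return payload

# Rogue-message self-destruct: each switch decrements the TTL,
# and a packet whose TTL reaches zero is discarded.
def forward(packet):
    ttl, data = packet
    if ttl <= 1:
        return None
    return (ttl - 1, data)
```

A real implementation would also age out old IDs, which is where the "safe timing assumption" on the slide comes in.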
Performance parameters (see board)
- Bandwidth
  - Maximum rate at which the interconnection network can propagate data once a message is in the network
  - Headers and overhead bits are usually included in the calculation
  - Units are usually megabits/second, not megabytes
  - Sometimes you'll see "throughput"
    - Network bandwidth delivered to an application
- Time of flight
  - Time for the 1st bit of a message to arrive at the receiver
  - Includes delays of repeaters/switches, plus length / (m × speed of light), where m is determined by the transmission material
- Transmission time
  - Time required for the message to pass through the network
  - Size of the message divided by the bandwidth
Performance parameters (see board)
- Transport latency
  - Time of flight + transmission time
  - Time the message spends in the interconnection network
  - But not the overhead of pulling it out of or pushing it into the network
- Sender overhead
  - Time for the processor to inject a message into the interconnection network, including both HW and SW components
- Receiver overhead
  - Time for the processor to pull a message out of the interconnection network, including both HW and SW components
- So, the total latency of a message is
  - Total latency = Sender overhead + Time of flight + Transmission time + Receiver overhead
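The total-latency terms above combine as a straight sum, which is easy to check with a small calculation (the numbers below are hypothetical, chosen only for illustration):

```python
# Total latency = sender overhead + time of flight
#               + transmission time (size / bandwidth) + receiver overhead

def total_latency(sender_ovhd_s, time_of_flight_s, msg_bits,
                  bandwidth_bps, receiver_ovhd_s):
    transmission_time = msg_bits / bandwidth_bps
    return sender_ovhd_s + time_of_flight_s + transmission_time + receiver_ovhd_s

# 100 us of overhead on each side, 5 us time of flight,
# an 8000-bit message on a 100 Mbit/s link:
t = total_latency(100e-6, 5e-6, 8000, 100e6, 100e-6)   # 285 us total
```

Note that with these numbers the overheads, not the network, dominate the total.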
Example
Some more odds and ends
- Note from the example (with regard to longer distance)
  - Time of flight dominates the total latency
  - Repeater delays would factor significantly into the equation
  - Message transmission failure rates rise significantly
- It's possible to send other messages without responses to previous ones
  - If you have control of the network
  - Can help increase network use by overlapping overheads and transport latencies
- Can simplify the total latency equation to
  - Total latency = Overhead + (Message size / Bandwidth)
- Leads to
  - Effective bandwidth = Message size / Total latency
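The simplified equations above show why small messages achieve only a fraction of the raw link bandwidth. A quick calculation with hypothetical numbers:

```python
# Effective bandwidth = message size / total latency, using the
# simplified Total latency = Overhead + (Message size / Bandwidth).

def effective_bandwidth(msg_bits, overhead_s, bandwidth_bps):
    total_latency = overhead_s + msg_bits / bandwidth_bps
    return msg_bits / total_latency

# With 200 us of total overhead on a 100 Mbit/s link:
small = effective_bandwidth(1_000, 200e-6, 100e6)       # ~4.8 Mbit/s
large = effective_bandwidth(1_000_000, 200e-6, 100e6)   # ~98 Mbit/s
```

The fixed overhead swamps a 1000-bit message, while a megabit message approaches the raw link rate.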
Interconnection Networks
[Figure: several nodes attached to a shared medium (Ethernet) vs. nodes attached through a switch (switched media, e.g. ATM)]
(a.k.a. data switching interchanges, multistage interconnection networks, interface message processors)
Shared Media Networks
- Need arbitration to decide who gets to talk
- Arbitration can be centralized or distributed
- Centralized: not used much for networks
  - Special arbiter device (or must elect an arbiter)
  - Good performance if the arbiter is far away? Nah.
- Distributed arbitration
  - Check if the media is already in use (carrier sensing)
  - If the media is not in use now, start sending
  - Check if another node is also sending (collision detection)
  - If there is a collision, wait for a while and retry
  - "For a while" is random (otherwise collisions repeat forever)
  - Exponential back-off to avoid wasting bandwidth on collisions
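The randomized exponential back-off above can be sketched in a few lines. This assumes an Ethernet-style scheme where the wait is measured in slot times and the window doubles per collision up to a cap; the cap value here is an illustrative assumption.

```python
import random

# After the k-th collision, wait a random number of slot times
# drawn from [0, 2^k - 1]; the window is capped so it stops growing.

def backoff_slots(collisions: int, cap: int = 10) -> int:
    k = min(collisions, cap)
    return random.randint(0, 2**k - 1)
```

Doubling the window spreads retries out as contention rises, so repeated collisions among the same senders become increasingly unlikely.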
Switched Networks
- Need switches
  - Introduces switching overheads
- No time wasted on arbitration and collisions
- Multiple transfers can be in progress
  - If they use different links, of course
- Circuit or packet switching
  - Circuit switching: end-to-end connections
    - Reserves links for a connection (e.g. the phone network)
  - Packet switching: each packet is routed separately
    - Links are used only when data is transferred (e.g. Internet Protocol)
Routing
- Shared media has trivial routing (broadcast)
- In switched media we can have
  - Source-based (source specifies the route)
  - Virtual circuits (end-to-end route created)
    - When the connection is made, set up the route
    - Switches forward packets along the route
  - Destination-based (source specifies the destination)
    - Switches must route the packet toward the destination
- Routing can also be classified as
  - Deterministic (one route from a source to a destination)
  - Adaptive (different routes can be used)
Routing Messages
- Shared media
  - Broadcast to everyone!
- Switched media needs real routing. Options:
  - Source-based routing: the message specifies the path to the destination (changes of direction)
  - Virtual circuit: a circuit is established from source to destination; the message picks the circuit to follow
  - Destination-based routing: the message specifies the destination; the switch must pick the path
    - Deterministic: always follow the same path
    - Adaptive: pick different paths to avoid congestion or failures
    - Randomized routing: pick between several good paths to balance network load
Routing Methods for Switches
- Store-and-forward
  - The switch receives the entire packet, then forwards it
  - If an error occurs when forwarding, the switch can re-send
- Wormhole routing
  - A packet consists of flits (a few bytes each)
  - The first flit contains a header with the destination address
  - The switch gets the header and decides where to forward
  - Other flits are forwarded as they arrive
  - Looks like the packet is worming through the network
  - If an error occurs along the way, the sender must re-send
    - No switch has the entire packet to re-send it
Cut-Through Routing
- What happens when a link is busy?
  - The header arrives at a switch, but the outgoing link is busy
  - What do we do with the other flits of the packet?
- Wormhole routing: stop the tail when the head stops
  - Now each flit along the way blocks a link
  - One busy link creates other busy links → traffic jam
- Cut-through routing
  - If the outgoing link is busy, receive and buffer the incoming flits
  - The buffered flits stay there until the link becomes free
  - When the link is free, the flits start worming out of the switch
  - Needs packet-sized buffer space in each switch
  - Wormhole routing: a switch needs to buffer only one flit
Store and Forward vs. Cut-Through
- Store-and-forward policy: each switch waits for the full packet to arrive before sending it to the next switch (good for WANs)
- Cut-through (or wormhole) routing: the switch examines the header, decides where to send the message, and then starts forwarding it immediately
  - In wormhole routing, when the head of the message is blocked, the message stays strung out over the network, potentially blocking other messages (a switch needs to buffer only the piece of the packet that is sent between switches)
  - Cut-through routing lets the tail continue when the head is blocked, accordioning the whole message into a single switch (requires a buffer large enough to hold the largest packet)
- See board
Routing Network Latency
- Switch delay
  - Time from the incoming to the outgoing link in a switch
- Switches
  - Number of switches along the way
- Transfer time
  - Time to send the packet through a link
- Store-and-forward end-to-end transfer time
  - (Switches × SwitchDelay) + (TransferTime × (Switches + 1))
- Wormhole or cut-through end-to-end transfer time
  - (Switches × SwitchDelay) + TransferTime
  - Much better if there are many switches along the way
- See the example on page 811
Switch Technology
- What do we want in a switch?
  - Many input and output links
    - Usually the number of input and output links is the same
  - Low contention inside the switch
    - Best if there is none (only external links cause contention)
  - Short switching delay
Switch Technology
- Two common switching organizations
- Crossbar
  - Allows any node to communicate with any other node with 1 pass through an interconnection
  - Very low switching delay, no internal contention
  - Complexity grows as the square of the number of links
  - Cannot have too many links (i.e. 64 in, 64 out)
- Omega
  - Uses less HW ((n/2) × log₂n vs. n² switches) but has more contention
  - Builds switches with more ports using small crossbars
  - Lower complexity per link, but longer delay and more contention
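The hardware-cost comparison above is worth putting in concrete numbers. A minimal sketch (the helper names are illustrative):

```python
import math

# Per the slide: a crossbar needs n^2 crosspoints, while an omega
# network needs (n/2) * log2(n) small 2x2 switching elements.

def crossbar_cost(n: int) -> int:
    return n * n

def omega_cost(n: int) -> int:
    return (n // 2) * int(math.log2(n))

# For n = 64 ports: 4096 crosspoints vs. 192 2x2 switches.
```

The omega network trades that large hardware saving for internal contention: some port pairs block each other even when their external links are free.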
(others) crossbar vs. omega
[Figure: a crossbar network and an omega network, side by side]
(others) Fat-tree topology
[Figure: fat tree; circles are switches, squares are processor-memory nodes]
- Bandwidth is higher, higher in the tree, to match common communication patterns
(others) Ring topology
- Instead of a centralized switching element, small switches are placed at each computer
- Avoids a full interconnection network
- Disadvantages
  - Some nodes are not directly connected
    - Results in multiple stops, more overhead
  - The average message must travel through n/2 switches (n = number of nodes)
- Advantages
  - Unlike shared lines, a ring can have several transfers going at once
[Figure: example of a ring topology]
(others) Dedicated communication links
- Usually takes the form of a communication link between every switch
- An expensive alternative to a ring
  - Big performance gains, but big costs as well
  - Usually the cost scales with the square of the number of nodes
- The big costs led designers to invent things in between
  - In other words, topologies between the cost of rings and the performance of fully connected networks
  - Whether or not a topology is good typically depends on the situation
- Some popular topologies for MPPs are
  - Grids, tori, hypercubes
(others) Topologies for commercial MPPs
[Figure: a 2D grid (mesh) of 16 nodes, a 2D torus of 16 nodes, and a hypercube of 16 nodes (16 = 2^4, so n = 4)]
Practical issues with topologies
- 3D drawings have to be mapped to chips
  - This is easier said than done
  - Different layers of metal in VLSI/CMOS circuits help give you added dimensions, but only so much
  - (See board for explanation)
- Reality: things that should work perfectly in theory don't really work in practice
- What about the speed of a switch?
  - If it's fixed, more links/switch means less bandwidth/link
    - Which could make a topology less desirable
  - Latency through a switch depends on the complexity of the routing pattern, which depends on the topology
Network Topology
- What do we want in a network topology?
  - Many nodes, high bandwidth, low contention, low latency
- Low latency: few switches along any route
  - For each (src, dst) pair, we choose the shortest route
  - The longest such route over all (src, dst) pairs is the network diameter
  - We want networks with small diameter!
- Low contention: high aggregate bandwidth
  - Divide the network into two groups, each with half the nodes
  - The total bandwidth between the groups is the bisection bandwidth
  - Actually, we use the minimum over all such bisections
Bisection bandwidth
- A popular measure for MPP connections
- Calculated by dividing all of the interconnect of a machine/system into 2 equal parts
  - Each part has 1/2 of the nodes
  - Then, sum the bandwidth of the lines that cross the imaginary dividing line
- For example
  - For fully connected interconnections, the bisection bandwidth is (n/2)² (n = number of nodes)
- Problem: not all interconnections are symmetric
  - Solution: pick the worst possible configuration
  - We generally want a worst-case estimate
Example
- See board for a bisection bandwidth example
Protocol Stacks
Protocol Stack: TCP/IP
Clusters
- A kind of message-passing machine
  - Uses commodity components
  - Or even commodity PCs and LANs
- Very cost effective
  - Uses mass-produced components (cheap)
- Very good for highly parallel tasks
  - E.g. web searches: largely independent
Rack-Mounted Systems
Top 500 Supercomputers
TCO for Clusters