Networking for the Grid - PowerPoint PPT Presentation

Title: Networking for the Grid

1
Networking for the Grid

Yee-Ting Li
eScience Summer School @ Edinburgh

2
What the GRID is

Worldwide Distributed System
Interconnected with networks
Balancing processors, storage and network
utilization
Networking is important to make GRID work

3
Networking Important!

Only way two grid nodes can communicate with each other
Need ways of determining how efficiently they talk
Focus on
Characterising how they talk
The language they use to talk

4
Part 1

Networking
Network Monitoring
Networks are also transient
Network performance also varies as you're sharing with n million other users
Sometimes you can notice periodic patterns, sometimes you can't
Difficult to analyse and create trends/predictions
Show steps towards

5
Networking 101

Networking is straightforward
Just connect to the network and it works!
HA!

6
Networking

Complex? Gets more complex!
Each node has its own scheduling priorities
Routers must serve trillions of data units per
second!

7
Networking

Complex stack through which data has to flow to get onto the network
Each node on the network also has its own stack
Routers' stacks are protected by IPR: no one knows what the Cisco stuff looks like!

8
Example Metrics

Connectivity
Delay
One-way delay
Two-way delay
Throughput / goodput
Network path
Loss
Jitter
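
As a rough illustration of how several of these metrics could be derived from a run of ping-style probes, here is a minimal sketch. The function name, the input format (`None` for a lost probe) and the jitter definition (mean absolute difference between consecutive delays) are assumptions made for the example, not something defined in the talk.

```python
from statistics import mean


def summarise_probes(rtts_ms):
    """Summarise a run of probes; each entry is a round-trip time in ms,
    or None if the probe was lost (hypothetical input format)."""
    if not rtts_ms:
        raise ValueError("need at least one probe")
    received = [r for r in rtts_ms if r is not None]
    # Jitter is taken here as the mean absolute difference between
    # consecutive received delays; other definitions exist.
    diffs = [abs(b - a) for a, b in zip(received, received[1:])]
    return {
        "connectivity": bool(received),
        "two_way_delay_ms": mean(received) if received else None,
        "jitter_ms": mean(diffs) if diffs else 0.0,
        "loss_fraction": 1 - len(received) / len(rtts_ms),
    }


summary = summarise_probes([19.0, 21.0, None, 20.0, 24.0])
print(summary)
```

One-way delay is not covered by this sketch, since it needs synchronised clocks at both ends.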

9
Metrics Example

Video Conferencing
Needs predictable bit rate
Doesn't usually matter if bit rate changes too much
Needs constant jitter
Low one-way delay preferable
FTP
Needs reliable transport
Throughput depends on urgency of data
Jitter and delay don't matter

10
Network Monitoring Uses

Monitoring is measuring over long periods of time
Gives an indication of network performance over time: a baseline
Allows comparison of different tools for analysis
Allows analysis of how different protocols behave in different conditions in real life
Allows tuning of existing protocols to make the most of the network

11
Possible Users of a NM Web Service

Network Managers
See how much bandwidth is being used
Network Analysts
Make things faster and better!
Resource Brokers
Determine where to send jobs based on network cost
Bandwidth Brokers
Allocate bandwidth depending on current network
state
Replication Managers
Distribute data only when network is not busy
QoS Brokers (aka Managed bandwidth Services)
Universal language for intercommunication..?
Next Generation FTP
First look up historical throughputs before
sending to determine best path

12
GridNM

Architecture for monitoring the network
Backend collects data for presentation
Logs metrics in ASCII log files on a single host
Allows mesh measurements: all nodes perform measurements to all other nodes
Uses standard UNIX infrastructure (ssh)
Should be easily adaptable to using Globus certifications once interactive processing is introduced in EDG.

13
GridNM (cont)

Uses existing (and future) tools to collect metrics
Modular: uses XML to describe available resources
Hosts
Tools
Locks hosts while under measurement, which prevents other tests from affecting metrics
Currently monitoring 6 sites around Europe using 5 tools
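
The combination of mesh measurements and host locking can be pictured as a scheduling problem. The sketch below is not GridNM's actual implementation; it is a simple greedy scheduler, with made-up site names, that groups every source-to-destination test into rounds so that no host takes part in two tests at once.

```python
from itertools import permutations


def mesh_schedule(hosts):
    """Group every ordered (source, destination) pair into rounds so that
    no host is involved in two measurements during the same round."""
    pending = list(permutations(hosts, 2))
    rounds = []
    while pending:
        locked, this_round, deferred = set(), [], []
        for src, dst in pending:
            if src in locked or dst in locked:
                deferred.append((src, dst))    # a host is busy: retry next round
            else:
                locked.update((src, dst))      # lock both ends for this round
                this_round.append((src, dst))
        rounds.append(this_round)
        pending = deferred
    return rounds


# Hypothetical site names, for illustration only.
rounds = mesh_schedule(["site-a", "site-b", "site-c", "site-d"])
for number, tests in enumerate(rounds, start=1):
    print(number, tests)
```

Each round always schedules at least one test, so the loop terminates once all pairs have run.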

14
GridNM plot

15
Web Service Network Monitoring

GridNM is just one network monitoring program
Many different programs out there!
Need to unify data exchange between different monitoring infrastructures

16
piPEs

Internet2 e2ePI architecture for network monitoring
Defines information flow to diagnose network and host performance (white paper)
Incorporates a finger-pointing mechanism to identify poor performers
Ideal starting point!
BUT found out about it too late
Currently investigating the SLAC web service software as a possible implementation of piPEs

17
GGF NMWG

Defines characteristics that are just the values
that we are interested in
Defines classes of metrics, e.g. bandwidth, delay
etc. that these characteristics report
Defines singleton and derived characteristics
Defines samples of data and their inherent
sampling patterns
Timestamps
Still in draft form

18
GGF NMWG cont. / Schema Design

As it's all in XML, we are designing an XML schema to describe objects to be passed around
XML Schema Document (XSD)
Focusing on actually implementing what the NMWG document says, and doesn't say
Note: we are also tackling this from a pure OO design. However, due to technical differences between objects in C, Java and SOAP/XML, there may be issues to overcome

19
Part 2

Network Communication Languages
Known as transport protocols: they determine how applications put traffic into the network
Sit on top of IP, the common language of the internet

20
Transport Level Protocols

TCP (HTTP, FTP, GridFTP) used for file transfer
Gives guarantee on delivery
All data is copied precisely
Performance can be poor
Respects other internet users
UDP (Real, H.323) used for video conferencing
Gives no guarantees on delivery
Data may be incomplete
Performance good
Doesn't respect other internet users

21
UDP vs TCP

UDP: min = 274, max = 565, ave = 493, stdev = 43
TCP: min = 37, max = 292, ave = 195, stdev = 40
Summary: TCP is rubbish! Why?

22
Memory and Disk transfers

[Figure: file copy disk-to-disk against iperf TCP throughput (Mbits/s). Source: Les Cottrell, SLAC]
Fast Ethernet: over 60 Mbits/s, iperf >> file copy
OC3: disk limited

23
What does TCP do?

TCP retransmits lost data
Even retransmits data it thinks has been lost!
Needs and uses a windowing system
Uses ACKnowledgements from receiver
Grows a Congestion Window (cwnd) to determine the size of window
Model
Tap is independent of Tank size
Tank filled by application
Valve opening (data rate) determined by feedback from network
Small tanks mean small data rate
Large tanks mean larger data rate
[Diagram labels: Socket buffer size, TCP Protocol, Network]

24
TCP socket buffer sizes

Iperf observations 490
Standard socket buffer graph
Shows linear(ish) region followed by plateau
Optimal socket buffer size just over 2 MB
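
For reference, an application can ask for a particular socket buffer size through the standard `SO_SNDBUF` and `SO_RCVBUF` socket options. The sketch below uses Python's `socket` module; the 2 MB request simply echoes the figure on this slide, and the helper name is made up for the example. The operating system may clamp (or, on Linux, double) the requested value, so the granted size is read back rather than assumed.

```python
import socket


def make_tcp_socket(buffer_bytes):
    """Create a TCP socket and request send/receive buffers of the given size."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, buffer_bytes)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, buffer_bytes)
    return sock


sock = make_tcp_socket(2 * 1024 * 1024)   # request roughly 2 MB
# The kernel decides what is actually granted, subject to system limits.
granted_snd = sock.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF)
granted_rcv = sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)
sock.close()
print(granted_snd, granted_rcv)
```

The buffers should be set before the connection is established so that they can influence the window negotiated for it.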

25
Retransmitted Data

Graph shows the amount of retransmitted data against the throughput
Retransmitted data is due to loss on the network
General case: ACKs have to time out before resending
We get more retransmitted data for low throughputs with large windows

26
Measuring Performance of Transport Level Protocols

Need to identify what we want to measure: the metrics.
Dependent on the use of the transport protocol.
Need to analyse application level usage
For Grid
Movement of transient data
File Transfer and Replication
Process jobs or sandboxes
Movement of Real-Time Data
Video Conferencing (Access Grid)
Real-Time applications

27
Web 100 TCP

OSI states that we should not know anything about the separate layers
How do we know something is going wrong? Your throughput decreases!
Prevents congestion collapse!
Need Web100! Allows in-depth TCP stack analysis per flow
Kernel patch 2.4.16, alpha1.2
New version 2.4.19 alpha2.0pre1
Using a program, logvars, to grab Web100 results

28
Reliability of Web100 results

Still alpha but reliable
Graphed against iperf, the throughputs correlate very well
At least as reliable as the result offered by iperf!

29
Congestion Window

Looking at the max_cwnd achieved for each measurement
Appears to be two regions
One with high correlation of throughput and max_cwnd
A linear region where we get a range of throughputs for the same max_cwnd
Cwnd never grows beyond 1500 kbytes!

30
Bandwidth Delay Product

Window = bandwidth × delay
We want
Bandwidth = 1,000,000,000 bit/sec
We have
Delay = 19 ms
Window needs to be an average of
1e9 × 19e-3 / 8 bytes
= 2,375,000 bytes, about 2.26 Mbytes (taking 1 Mbyte = 2^20 bytes)!
We only achieve 1.5 Mbytes max!
Need to implement some monitoring of the average and variation of cwnd for each TCP connection
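
The arithmetic on this slide can be checked with a couple of lines of code. This is a minimal sketch, assuming the 19 ms figure is the round-trip delay and that 1.5 Mbytes means 1.5e6 bytes; the function names are made up for the example.

```python
def bdp_bytes(bandwidth_bits_per_s, rtt_s):
    """Bandwidth-delay product: window (in bytes) needed to fill the path."""
    return bandwidth_bits_per_s * rtt_s / 8


def window_limited_throughput(window_bytes, rtt_s):
    """Best-case throughput (bit/s) when at most one window is sent per RTT."""
    return window_bytes * 8 / rtt_s


needed = bdp_bytes(1e9, 19e-3)                        # window for 1 Gbit/s at 19 ms
ceiling = window_limited_throughput(1.5e6, 19e-3)     # what a 1.5 Mbyte cwnd allows
print(f"window needed: {needed:.0f} bytes")
print(f"ceiling with 1.5 Mbytes: {ceiling / 1e6:.0f} Mbit/s")
```

Under these assumptions a window capped at 1.5 Mbytes cannot carry more than roughly 630 Mbit/s over this path, whatever the link speed.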

31
TCP Optimisation

It's actually TCP that is limiting our transfer rates!
All applications use it!
Understandable, as TCP hasn't changed much for the last 15-20 years!
When the standard link was about 56 kbit/sec!
Solution: need new TCP implementations!

32
What is High Speed TCP?

Changes the way TCP behaves at high speed (i.e. large cwnd)
Standard TCP has two modes
Slow start (not very slow)
Congestion Avoidance
Focuses on the Congestion Avoidance region, i.e. when TCP knows (thinks it knows) how well the network behaves
BUT only when we are at high speeds, else do what normal Standard TCP does
Readily deployable 1st step towards Equation Based Congestion Control

33
What does it do?

Standard TCP uses two parameters
Increase parameter, a
Decrease parameter, b
i.e. AIMD(a, b)
Standard TCP uses
a = 1
b = 0.5
High Speed TCP introduces
a -> a(cwnd)
b -> b(cwnd)
i.e. the values of a and b depend on the current congestion window size
If we increase a more with larger cwnd, we can get back up to our optimal cwnd size for the network path sooner
If we decrease b less, we don't lose as much bandwidth due to a small congestion window
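
To see what the two parameters do, here is a toy AIMD(a, b) trace. It is a deliberately simplified sketch: cwnd is counted in segments, one update is applied per round trip, losses are injected at chosen round trips, and slow start is ignored. The second run uses fixed values a = 4 and b = 0.2 purely for illustration; High Speed TCP makes them functions of cwnd rather than constants.

```python
def aimd(rounds, loss_rounds, a=1.0, b=0.5, cwnd=1.0):
    """Trace cwnd (in segments) over a number of round trips.

    Each loss-free round trip adds `a` segments; a round trip listed in
    `loss_rounds` instead shrinks cwnd to (1 - b) * cwnd.
    """
    trace = []
    for rtt in range(rounds):
        if rtt in loss_rounds:
            cwnd = max(1.0, (1 - b) * cwnd)   # multiplicative decrease
        else:
            cwnd += a                         # additive increase
        trace.append(cwnd)
    return trace


standard = aimd(rounds=20, loss_rounds={10})                # AIMD(1, 0.5)
gentler = aimd(rounds=20, loss_rounds={10}, a=4.0, b=0.2)   # illustrative values
print(standard)
print(gentler)
```

With the same single loss, the second trace both loses a smaller share of its window and climbs back faster, which is the behaviour the slide describes.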

34
What exactly does it do?

Based on the TCP response function
Relates loss and throughput
Uses the TCP response function to investigate certain parameters
High_Window, High_Loss: largest cwnd needed for x throughput and the required loss for that throughput
Low_Window, Low_Loss: smallest cwnd when we actually switch from Standard TCP and the required loss rate for that cwnd size
High_B: the smallest decrease in b when we are at a large cwnd
Equations to transform this information into a table for a(cwnd) and b(cwnd)
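
As a sketch of how such a table can be generated, the code below uses the formulas and default parameters later published in RFC 3649 (HighSpeed TCP): Low_Window = 38, High_Window = 83000, and a decrease of 0.1 at High_Window, which is assumed here to play the role of High_B. The draft this talk was based on may have used different names or values, so treat the numbers as illustrative.

```python
import math

# Defaults as given in RFC 3649; assumed, not taken from the slides.
LOW_WINDOW = 38          # at or below this cwnd, behave as Standard TCP
HIGH_WINDOW = 83000      # cwnd needed for the target high-speed throughput
HIGH_DECREASE = 0.1      # decrease parameter b at HIGH_WINDOW


def b_of(w):
    """Decrease parameter b(w): 0.5 at LOW_WINDOW, falling log-linearly."""
    if w <= LOW_WINDOW:
        return 0.5
    frac = (math.log(w) - math.log(LOW_WINDOW)) / (
        math.log(HIGH_WINDOW) - math.log(LOW_WINDOW))
    return (HIGH_DECREASE - 0.5) * frac + 0.5


def p_of(w):
    """Loss rate given by the HighSpeed response function for window w."""
    return 0.078 / w ** 1.2


def a_of(w):
    """Increase parameter a(w), in segments per round trip."""
    if w <= LOW_WINDOW:
        return 1.0
    b = b_of(w)
    return w * w * p_of(w) * 2.0 * b / (2.0 - b)


for w in (38, 1000, 10000, 83000):
    print(w, round(a_of(w), 1), round(b_of(w), 2))
```

A real stack would precompute these values into a lookup table indexed by cwnd rather than evaluate logarithms on every ACK.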

35
Transport Protocols NG

Name            Transport     Notes
UDP Blast       UDP
Tsunami         UDP/TCP       Uses TCP as control channel
High Speed TCP  TCP           For 10Gb/sec links
PGM / CC        Modified UDP  Multicast UDP, new transport protocol
IBP             Application   Logistical networking
