Title: Networking for the Grid
1Networking for the Grid
- Yee-Ting Li
- eScience Summer School _at_ Edinburgh
2What the GRID is
- Worldwide Distributed System
- Interconnected with networks
- Balancing processors, storage and network
utilization - Networking is important to make GRID work
3Networking Important!
- Only way two grid nodes can communicate with each
other - Need ways of determining how efficiently they
talk - Focus on
- The characterising how they talk
- The language they use to talk
4Part 1
- Networking
- Networking Monitoring
- Networks are also transient
- Network performance also varies as youre sharing
with n million other users - Sometimes you can notice periodic patterns
sometimes you cant - Difficult to analyse and create
trends/predictions - Show steps towards
5Networking 101
- Networking straight forward
- Just connect to the network and it works!
- HA!
6Networking
- Complex? Gets more complex!
- Each node has its own scheduling priorities
- Routers must serve trillions of data units per
second!
7Networking
- Complex stack from which data has to flow to get
onto network - Each node on the network also has their own
stacks - Routers have IPR on stacks no one knows what
Cisco stuff looks like!
8Example Metrics
- Connectivity
- Delay
- One-way delay
- Two-way delay
- Throughput / goodput
- Network path
- Loss
- Jitter
9Metrics Example
- Video Conferencing
- Needs predictable bit rate
- Doesnt usually matter if bit rate changes too
much - Needs constant jitter
- Low one-way delay preferable
- FTP
- Needs reliable transport
- Throughput depends on urgency of data
- Jitter and delay dont matter
10Network Monitoring Uses
- Monitoring is measuring over long periods of time
- Gives an indication of network performance over
time a baseline - Allows comparison of different tools for analysis
- Allows analysis of how different protocols behave
in different conditions in real life - Allows tuning of existing protocols to make
most out of network
11Possible Users of a NM Web Service
- Network Managers
- See how much bandwidth is being used
- Network Analysts
- Make things faster and better!
- Resource Brokers
- Broker to determine where to send jobs Network
Cost - Bandwidth Brokers
- Allocate bandwidth depending on current network
state - Replication Managers
- Distribute data only when network is not busy
- QoS Brokers (aka Managed bandwidth Services)
- Universal language for intercommunication..?
- Next Generation FTP
- First look up historical throughputs before
sending to determine best path
12GridNM
- Architecture for monitoring the network
- Backend collects data for presentation
- Logs metrics in ASCII log files on a single host
- Allows mesh measurements all nodes performs
measurements to al other nodes - Uses standard UNIX infrastructure ssh
- Should be easily adaptable to using Globus
certifications once interactive processing is
introduced in EDG.
13GridNM (cont)
- Uses existing (and future tools) to collect
metrics - Modular - uses XML to describe available
resources - Hosts
- Tools
- Locks hosts if under measurement prevents other
tests affecting metrics - Currently monitoring 6 sites around Europe using
5 tools
14GridNM plot
15Web Service Network Monitoring
- GridNM just one Network Monitoring Program
- Many different programs out there!
- Unify data exchange between different monitoring
infrastructures
16piPEs
- Internet2 e2ePI Architecture for network
monitoring - Defines information flow to diagnose networks and
hosts performance white paper - Incorporates a finger pointing mechanism to
identify poor performers - Ideal starting point!
- BUT found out about it too late
- Currently investigating implementation with SLAC
software web service as possible implementation
of piPEs software
17GGF NMWG
- Defines characteristics that are just the values
that we are interested in - Defines classes of metrics, e.g. bandwidth, delay
etc. that these characteristics report - Defines singleton and derived characteristics
- Defines samples of data and their inherent
sampling patterns - Timestamps
- Still in draft form
18GGF NMWG cont. / Schema Design
- As its all in XML, designing a XML schema to
describe objects to be passed around - XML Schema Document (XSD)
- Focusing actually implementing what the NMWG
document says and doesnt say - Note We are also tackling this from a pure OO
design too however, due to technical
differences between objects in C, Java and
SOAP/XML then there may be issues to overcome
19Part 2
- Network Communication Languages
- Known as transport protocols - determines how
applications put traffic into the network - Sits on top of IP common language of the
internet
20Transport Level Protocols
- TCP (HTTP, FTP, GridFTP) used for file transfer
- Gives guarantee on delivery
- All data is copied precisely
- Performance can be poor
- Respects other internet users
- UDP (Real, H323) used for video conferencing
- Gives no guarantees on delivery
- Data may be incomplete
- Performance good
- Doesnt respect other internet users
21UDP vs TCP
- Udp min274, max565, ave493, stdev43
- Tcp min37, max292, ave195, stdev40
- Summary tcp is rubbish! why?
22Memory and Disk transfers
Fast Ethernet
Over 60Mbits/s iperf gtgt file copy
OC3
Disk limited
File copy disk-to-disk
Iperf TCP Mbits/s
Les Cottrell, SLAC
23What does TCP do?
Socket buffer size
- TCP retransmits lost data
- Even retransmits data it thinks has been lost!
- Needs and uses a windowing system
- Uses ACKnowledgements from reciever
- Grows a Congestion Window cwnd to determine the
size of window - Model
- Tap is independent of Tank size
- Tank filled by application
- Valve opening (data rate) determined by feedback
from network - Small tanks mean small data rate
- Large tanks mean larger data rate
TCP Protocol
Network
24TCP socket buffer sizes
- Iperf observations 490
- Standard socket buffer graph
- Shows linear(ish) region followed by plateau
- Optimal socket buffer size just over 2mB
25Retransmitted Data
- Graph shows the amount of retransmitted data
against the throughput - Retransmitted data is due to loss on the network
- General case ACKs have to timeout before
resending - We get more retransmitted data for low
throughputs with large windows
26Measuring Performance of Transport Level Protocols
- Need to identify what we want to measure the
metrics. - Dependant on the use of the transport protocol.
Need to analyse application level usage - For Grid
- Movement of transient data
- File Transfer and Replication
- process jobs or sandboxes
- Movement of Real-Time Data
- Video Conferencing Access Grid
- Real-Time applications
27Web 100 TCP
- OSI states that we should not know anything about
the separate layers - How do we know something is going wrong? your
throughput decreases! - Prevents congestion collapse!
- Need Web100! Allows in depth tcp stack analysis
per flow - Kernel patch 2.4.16, alpha1.2
- New version 2.4.19 alpha2.0pre1
- Using program to grab web100 results - logvars
28Reliability of Web100 results
- Still alpha but reliable
- Graph against iperf throughputs correlate very
well - At least as reliable as the result offered by
iperf!
29Congestion Window
- Looking at the max_cwnd achieved for each
measurement - Appears to be two regions
- with high correlation of throughput and max cwnd
- A linear region where we get the a range of
throughputs for same max_cwnd - Cwnd never grows beyond 1500kbytes!
30Bandwidth Delay Product
- Window bandwidth delay
- We want
- Bandwidth 1,000,000,000 bit/sec
- We have
- Delay 19ms
- Window needs to be an average of
- 1e9 19e-3 / 8 bytes
- 2.25mbytes!
- We only achieve 1.5mbytes max!
- Need to implement some monitoring of the degree
of the average and variation of cwnd for each tcp
connection
31TCP Optimisation
- Its actually TCP that is limiting our transfer
rates! - All applications use it!
- Understandable as TCP hasnt changed much for the
last 15-20 years! - When standard link was about 56kbit/sec!
- Solution Need new TCP implementations!
32What is High Speed TCP?
- Changes the way TCP behaves at high speed (ie
large cwnd) - Standard TCP has two modes
- Slow start (not very slow)
- Congestion Avoidance
- Focuses on Congestion Avoidance Region ie when
TCP knows (thinks it knows) how well the network
behaves - BUT only when we are at high speeds, else do what
normal Standard TCP does - Readily deployable 1st step towards Equation
Based Congestion Control
33What does it do?
- Standard TCP uses two parameters
- Increase parameter, a
- Decrease parameter, b
- i.e. AIMD( a,b )
- Standard TCP uses
- a1
- b0.5
- High Speed TCP introduces
- a-gta(cwnd)
- b-gtb(cwnd)
- i.e. The value of a and b depends on the current
congestion window size - If we increase a more with larger cwnd we can get
back up to our optimal cwnd size for the
network path - If we decrease b less we dont lose as much
bandwidth due to a small congestion window
34What exactly does it do?
- Based on the TCP response function
- Relates loss and throughput
- Uses the TCP response function to investigate
certain parameters - High_Window, High_Loss largest cwnd needed for x
throughput and the required loss for that
throughput - Low_Window, Low_Loss smallest cwnd when we
actually switch from Standard TCP and the
required loss rate for that cwnd size - High_B the smallest decrease in b when we are at
a large cwnd - Equations to transform this information into a
table for a(cwnd) and b(cwnd)
35Transport Protocols NG
Name Transport Notes
UDP Blast UDP
Tsunami UDP/TCP Uses TCP as control channel
High Speed TCP TCP For 10Gb/sec links
PGM / CC Modified UDP Multicast UDP new transport protocol
IBP Application logistical networking