Networking for the Grid - PowerPoint PPT Presentation

1
Networking for the Grid
  • Yee-Ting Li
  • eScience Summer School at Edinburgh

2
What the GRID is
  • Worldwide Distributed System
  • Interconnected with networks
  • Balancing processors, storage and network
    utilization
  • Networking is important to make GRID work

3
Networking Important!
  • Only way two grid nodes can communicate with each
    other
  • Need ways of determining how efficiently they
    talk
  • Focus on:
  • Characterising how they talk
  • The language they use to talk

4
Part 1
  • Networking
  • Networking Monitoring
  • Networks are also transient
  • Network performance also varies as you're sharing
    with n million other users
  • Sometimes you can notice periodic patterns,
    sometimes you can't
  • Difficult to analyse and create
    trends/predictions
  • Show steps towards

5
Networking 101
  • Networking is straightforward
  • Just connect to the network and it works!
  • HA!

6
Networking
  • Complex? Gets more complex!
  • Each node has its own scheduling priorities
  • Routers must serve trillions of data units per
    second!

7
Networking
  • Complex stack from which data has to flow to get
    onto network
  • Each node on the network also has their own
    stacks
  • Router stacks are covered by IPR: no one knows what
    the Cisco stuff looks like!

8
Example Metrics
  • Connectivity
  • Delay
  • One-way delay
  • Two-way delay
  • Throughput / goodput
  • Network path
  • Loss
  • Jitter
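
Several of the metrics listed above can be derived from one series of probe results. The following is a minimal sketch, assuming two-way delay samples in milliseconds with `None` marking a lost probe, and taking jitter as the mean absolute difference between consecutive delays (one common definition among several). The function name and the sample values are hypothetical.

```python
import statistics

def summarise_rtts(rtts_ms):
    """Summarise two-way delay samples; None marks a lost probe."""
    received = [r for r in rtts_ms if r is not None]
    # Loss: fraction of probes that never came back.
    loss = 1 - len(received) / len(rtts_ms)
    # Jitter (assumed definition): mean absolute difference
    # between consecutive received delays.
    diffs = [abs(b - a) for a, b in zip(received, received[1:])]
    return {
        "connectivity": bool(received),
        "loss_fraction": loss,
        "delay_min_ms": min(received) if received else None,
        "delay_mean_ms": statistics.mean(received) if received else None,
        "jitter_ms": statistics.mean(diffs) if diffs else None,
    }

# Hypothetical samples, for illustration only.
samples = [19.0, 21.0, None, 20.0, 24.0]
summary = summarise_rtts(samples)
print(summary)
```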

9
Metrics Example
  • Video Conferencing:
  • Needs predictable bit rate
  • Doesn't usually matter if bit rate changes too
    much
  • Needs constant jitter
  • Low one-way delay preferable
  • FTP:
  • Needs reliable transport
  • Throughput depends on urgency of data
  • Jitter and delay don't matter

10
Network Monitoring Uses
  • Monitoring is measuring over long periods of time
  • Gives an indication of network performance over
    time: a baseline
  • Allows comparison of different tools for analysis
  • Allows analysis of how different protocols behave
    in different conditions in real life
  • Allows tuning of existing protocols to make
    most out of network

11
Possible Users of a NM Web Service
  • Network Managers
  • See how much bandwidth is being used
  • Network Analysts
  • Make things faster and better!
  • Resource Brokers
  • Broker determines where to send jobs (network
    cost)
  • Bandwidth Brokers
  • Allocate bandwidth depending on current network
    state
  • Replication Managers
  • Distribute data only when network is not busy
  • QoS Brokers (aka Managed Bandwidth Services)
  • Universal language for intercommunication?
  • Next Generation FTP
  • First look up historical throughputs before
    sending to determine best path

12
GridNM
  • Architecture for monitoring the network
  • Backend collects data for presentation
  • Logs metrics in ASCII log files on a single host
  • Allows mesh measurements: every node performs
    measurements to all other nodes
  • Uses standard UNIX infrastructure (ssh)
  • Should be easily adaptable to using Globus
    certificates once interactive processing is
    introduced in EDG.

13
GridNM (cont)
  • Uses existing (and future) tools to collect
    metrics
  • Modular: uses XML to describe available
    resources
  • Hosts
  • Tools
  • Locks hosts while under measurement, which prevents
    other tests affecting metrics
  • Currently monitoring 6 sites around Europe using
    5 tools
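
As a rough illustration of the ideas above (an XML resource description, full-mesh measurements, and host locking), here is a minimal sketch. It is not GridNM's actual schema or scheduler: the element names, attribute names, host names and both functions are hypothetical. Locking is modelled by grouping tests into rounds so that no host takes part in two tests at once.

```python
import itertools
import xml.etree.ElementTree as ET

# Hypothetical resource description; the element and attribute
# names are assumptions, not GridNM's real format.
RESOURCES_XML = """
<resources>
  <host name="site-a.example.org"/>
  <host name="site-b.example.org"/>
  <host name="site-c.example.org"/>
  <host name="site-d.example.org"/>
  <tool name="ping"/>
  <tool name="iperf"/>
</resources>
"""

def mesh_schedule(xml_text):
    """Return every (source, destination, tool) test for a full mesh."""
    root = ET.fromstring(xml_text)
    hosts = [h.get("name") for h in root.findall("host")]
    tools = [t.get("name") for t in root.findall("tool")]
    return [(src, dst, tool)
            for src, dst in itertools.permutations(hosts, 2)
            for tool in tools]

def plan_rounds(schedule):
    """Greedily group tests into rounds; a host is 'locked' within a
    round, so it appears in at most one test per round."""
    rounds = []
    for test in schedule:
        src, dst, _ = test
        for rnd in rounds:
            if all(src not in (s, d) and dst not in (s, d) for s, d, _ in rnd):
                rnd.append(test)
                break
        else:
            rounds.append([test])
    return rounds

schedule = mesh_schedule(RESOURCES_XML)
rounds = plan_rounds(schedule)
print(len(schedule), "tests in", len(rounds), "rounds")
```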

14
GridNM plot

15
Web Service Network Monitoring
  • GridNM is just one network monitoring program
  • Many different programs out there!
  • Unify data exchange between different monitoring
    infrastructures

16
piPEs
  • Internet2 e2ePI architecture for network
    monitoring
  • Defines information flow to diagnose network and
    host performance (white paper)
  • Incorporates a finger-pointing mechanism to
    identify poor performers
  • Ideal starting point!
  • BUT found out about it too late
  • Currently investigating the SLAC software web
    service as a possible implementation of piPEs

17
GGF NMWG
  • Defines characteristics that are just the values
    that we are interested in
  • Defines classes of metrics, e.g. bandwidth, delay
    etc. that these characteristics report
  • Defines singleton and derived characteristics
  • Defines samples of data and their inherent
    sampling patterns
  • Timestamps
  • Still in draft form

18
GGF NMWG cont. / Schema Design
  • As it's all in XML, we are designing an XML schema
    to describe the objects to be passed around
  • XML Schema Document (XSD)
  • Focusing on actually implementing what the NMWG
    document says and doesn't say
  • Note: we are also tackling this from a pure OO
    design. However, due to technical differences
    between objects in C, Java and SOAP/XML, there
    may be issues to overcome

19
Part 2
  • Network Communication Languages
  • Known as transport protocols: they determine how
    applications put traffic into the network
  • Sit on top of IP, the common language of the
    internet

20
Transport Level Protocols
  • TCP (HTTP, FTP, GridFTP): used for file transfer
  • Gives guarantee on delivery
  • All data is copied precisely
  • Performance can be poor
  • Respects other internet users
  • UDP (Real, H.323): used for video conferencing
  • Gives no guarantees on delivery
  • Data may be incomplete
  • Performance good
  • Doesn't respect other internet users

21
UDP vs TCP
  • UDP: min=274, max=565, ave=493, stdev=43
  • TCP: min=37, max=292, ave=195, stdev=40
  • Summary: TCP is rubbish! Why?

22
Memory and Disk transfers
[Figure: file copy disk-to-disk plotted against iperf TCP (Mbits/s), with Fast Ethernet and OC3 marked. Over 60 Mbits/s, iperf >> file copy (disk limited). Source: Les Cottrell, SLAC]

23
What does TCP do?
  • TCP retransmits lost data
  • Even retransmits data it thinks has been lost!
  • Needs and uses a windowing system
  • Uses ACKnowledgements from receiver
  • Grows a congestion window (cwnd) to determine the
    size of window
  • Model:
  • Tap is independent of tank size
  • Tank filled by application
  • Valve opening (data rate) determined by feedback
    from network
  • Small tanks mean small data rate
  • Large tanks mean larger data rate

[Diagram: tank-and-valve model, labelled "Socket buffer size", "TCP Protocol" and "Network"]

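
The socket buffer that the model above refers to can be requested per socket. A minimal sketch using the standard `SO_SNDBUF` and `SO_RCVBUF` socket options follows. The 256 KiB request is an arbitrary illustrative value and the helper name is hypothetical. The operating system may clamp or adjust the request (Linux, for instance, reports double the requested value), so the sketch reads back what was actually granted.

```python
import socket

def open_tcp_socket(buffer_bytes):
    """Create an unconnected TCP socket and request send/receive
    buffers of buffer_bytes each."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, buffer_bytes)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, buffer_bytes)
    return sock

requested = 256 * 1024  # illustrative value only
sock = open_tcp_socket(requested)
# The kernel decides the real size, so read it back.
granted_snd = sock.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF)
granted_rcv = sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)
print("requested", requested, "granted send/recv", granted_snd, granted_rcv)
sock.close()
```
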
24
TCP socket buffer sizes
  • Iperf observations 490
  • Standard socket buffer graph
  • Shows linear(ish) region followed by plateau
  • Optimal socket buffer size just over 2 MB
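
The linear-then-plateau shape can be reproduced with a deliberately simplified model: TCP can have at most one buffer's worth of data in flight per round trip, and can never exceed the link rate. This sketch ignores loss and host limits. The 19 ms delay and 1 Gbit/s rate are the figures quoted elsewhere in these slides, and the function name is hypothetical.

```python
def tcp_throughput_bps(buffer_bytes, rtt_s, link_bps):
    """Idealised throughput: one buffer per round trip, capped by the link."""
    return min(buffer_bytes * 8 / rtt_s, link_bps)

RTT_S = 0.019      # 19 ms delay, as quoted in the slides
LINK_BPS = 1e9     # 1 Gbit/s target

# Sweep buffer sizes: throughput rises linearly, then flattens
# once the buffer exceeds bandwidth x delay.
for size in (0.5e6, 1e6, 2e6, 3e6, 4e6):
    rate = tcp_throughput_bps(size, RTT_S, LINK_BPS)
    print(f"{size / 1e6:.1f}e6 bytes -> {rate / 1e6:.0f} Mbit/s")
```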

25
Retransmitted Data
  • Graph shows the amount of retransmitted data
    against the throughput
  • Retransmitted data is due to loss on the network
  • General case: ACKs have to time out before
    resending
  • We get more retransmitted data for low
    throughputs with large windows

26
Measuring Performance of Transport Level Protocols
  • Need to identify what we want to measure: the
    metrics.
  • Dependent on the use of the transport protocol.
    Need to analyse application level usage
  • For Grid:
  • Movement of transient data:
  • File transfer and replication
  • Process jobs or sandboxes
  • Movement of real-time data:
  • Video conferencing (Access Grid)
  • Real-time applications

27
Web 100 TCP
  • OSI states that we should not know anything about
    the separate layers
  • How do we know something is going wrong? Your
    throughput decreases!
  • Prevents congestion collapse!
  • Need Web100! Allows in-depth TCP stack analysis
    per flow
  • Kernel patch: 2.4.16, alpha1.2
  • New version: 2.4.19, alpha2.0pre1
  • Using a program to grab Web100 results: logvars

28
Reliability of Web100 results
  • Still alpha but reliable
  • Graphed against iperf, throughputs correlate very
    well
  • At least as reliable as the result offered by
    iperf!

29
Congestion Window
  • Looking at the max_cwnd achieved for each
    measurement
  • Appears to be two regions:
  • One with high correlation of throughput and max_cwnd
  • A linear region where we get a range of
    throughputs for the same max_cwnd
  • cwnd never grows beyond 1500 kbytes!

30
Bandwidth Delay Product
  • Window = bandwidth × delay
  • We want:
  • Bandwidth = 1,000,000,000 bit/sec
  • We have:
  • Delay = 19 ms
  • Window needs to be an average of:
  • 1e9 × 19e-3 / 8 bytes
  • = 2,375,000 bytes, about 2.27 Mbytes
    (1 Mbyte = 2^20 bytes)!
  • We only achieve 1.5 Mbytes max!
  • Need to implement some monitoring of the
    average and variation of cwnd for each TCP
    connection
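
The arithmetic above can be checked directly. This is a minimal sketch using the slide's figures (1 Gbit/s, 19 ms). Treating the observed maximum of "1.5 Mbytes" as 1.5e6 bytes is an assumption, and the function name is hypothetical.

```python
def bdp_bytes(bandwidth_bps, delay_s):
    """Bandwidth-delay product, converted from bits to bytes."""
    return bandwidth_bps * delay_s / 8

required = bdp_bytes(1e9, 19e-3)   # window needed to fill the path
observed_max = 1.5e6               # assumed reading of "1.5 Mbytes"
fill_fraction = observed_max / required

print(f"required window: {required:,.0f} bytes ({required / 2**20:.2f} MiB)")
print(f"observed max cwnd fills {fill_fraction:.0%} of the path")
```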

31
TCP Optimisation
  • It's actually TCP that is limiting our transfer
    rates!
  • All applications use it!
  • Understandable, as TCP hasn't changed much for the
    last 15-20 years!
  • When the standard link was about 56 kbit/sec!
  • Solution: need new TCP implementations!

32
What is High Speed TCP?
  • Changes the way TCP behaves at high speed (i.e.
    large cwnd)
  • Standard TCP has two modes:
  • Slow start (not very slow)
  • Congestion avoidance
  • Focuses on the congestion avoidance region, i.e. when
    TCP knows (thinks it knows) how well the network
    behaves
  • BUT only when we are at high speeds, else do what
    Standard TCP does
  • Readily deployable: 1st step towards equation-based
    congestion control

33
What does it do?
  • Standard TCP uses two parameters:
  • Increase parameter, a
  • Decrease parameter, b
  • i.e. AIMD(a, b)
  • Standard TCP uses:
  • a = 1
  • b = 0.5
  • High Speed TCP introduces:
  • a -> a(cwnd)
  • b -> b(cwnd)
  • i.e. the values of a and b depend on the current
    congestion window size
  • If a is larger at large cwnd, we can get back up
    to our optimal cwnd size for the network path
    more quickly
  • If b is smaller at large cwnd, the window is cut by
    less on loss, so we don't lose as much
    bandwidth due to a small congestion window
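
A minimal sketch of AIMD(a, b) as described above, with cwnd counted in segments and updated once per round trip: add a when there is no loss, multiply by (1 - b) when there is. The per-round-trip granularity, the floor of one segment, and the regular loss pattern in the simulation are simplifying assumptions. The function names are hypothetical.

```python
def aimd_step(cwnd, loss, a=1.0, b=0.5):
    """One round trip of AIMD(a, b); cwnd is in segments."""
    if loss:
        # Multiplicative decrease, never below one segment.
        return max(1.0, cwnd * (1 - b))
    # Additive increase.
    return cwnd + a

def simulate(round_trips, loss_every, start=10.0, a=1.0, b=0.5):
    """Trace cwnd with a loss on every loss_every-th round trip."""
    cwnd, trace = start, []
    for i in range(1, round_trips + 1):
        cwnd = aimd_step(cwnd, loss=(i % loss_every == 0), a=a, b=b)
        trace.append(cwnd)
    return trace

# Standard TCP parameters from the slide: a = 1, b = 0.5.
trace = simulate(round_trips=5, loss_every=5)
print(trace)
```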

34
What exactly does it do?
  • Based on the TCP response function
  • Relates loss and throughput
  • Uses the TCP response function to investigate
    certain parameters:
  • High_Window, High_Loss: largest cwnd needed for x
    throughput and the required loss rate for that
    throughput
  • Low_Window, Low_Loss: smallest cwnd at which we
    actually switch from Standard TCP, and the
    required loss rate for that cwnd size
  • High_B: the smallest decrease factor b, used when
    we are at a large cwnd
  • Equations to transform this information into a
    table for a(cwnd) and b(cwnd)
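
One way to turn these parameters into a(cwnd) and b(cwnd) is the set of equations in RFC 3649 (HighSpeed TCP), sketched below. The numeric defaults are the RFC's suggested values, not figures from these slides, and the mapping of the slide's High_Loss, Low_Loss and High_B onto the RFC's High_P, Low_P and High_Decrease is an assumption. cwnd is in segments.

```python
import math

# RFC 3649 suggested defaults (assumed here; not taken from the slides).
LOW_WINDOW = 38        # Low_Window: below this, behave as Standard TCP
HIGH_WINDOW = 83000    # High_Window
LOW_P = 1e-3           # Low_Loss: loss rate at LOW_WINDOW
HIGH_P = 1e-7          # High_Loss: loss rate at HIGH_WINDOW
HIGH_DECREASE = 0.1    # High_B: decrease factor at HIGH_WINDOW

_LOG_SPAN = math.log(HIGH_WINDOW) - math.log(LOW_WINDOW)

def loss_rate(w):
    """p(w): response function, a straight line on a log-log plot
    through (LOW_WINDOW, LOW_P) and (HIGH_WINDOW, HIGH_P)."""
    slope = (math.log(HIGH_P) - math.log(LOW_P)) / _LOG_SPAN
    return math.exp(math.log(LOW_P) + slope * (math.log(w) - math.log(LOW_WINDOW)))

def decrease_b(w):
    """b(w): 0.5 at LOW_WINDOW, falling linearly in log(w) to HIGH_DECREASE."""
    if w <= LOW_WINDOW:
        return 0.5
    frac = (math.log(w) - math.log(LOW_WINDOW)) / _LOG_SPAN
    return 0.5 + (HIGH_DECREASE - 0.5) * frac

def increase_a(w):
    """a(w) = w^2 * p(w) * 2 * b(w) / (2 - b(w)); 1 in the Standard TCP region."""
    if w <= LOW_WINDOW:
        return 1.0
    b = decrease_b(w)
    return w * w * loss_rate(w) * 2 * b / (2 - b)

# Print a small table of a(cwnd) and b(cwnd).
for w in (38, 118, 1058, 10661, 83000):
    print(f"cwnd={w:6d}  a={increase_a(w):6.2f}  b={decrease_b(w):.2f}")
```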

35
Transport Protocols NG
Name            | Transport    | Notes
UDP Blast       | UDP          |
Tsunami         | UDP/TCP      | Uses TCP as control channel
High Speed TCP  | TCP          | For 10 Gb/sec links
PGM / CC        | Modified UDP | Multicast UDP; new transport protocol
IBP             | Application  | Logistical networking