1
End-to-end performance: issues and suggestions
  • TERENA 5th NRENs and Grids Workshop
  • Paris, June 2007
  • Mark Leese

2
Talk Emphasis
  • MonALISA = a monitoring tool/framework
  • DANTE = a network operator
  • EGEE-II = a Grid
  • Mark = a pseudo-Grid end user
  • I'm not a real user, but I look at the issues
    from their viewpoint
  • Large Hadron Collider in the UK (GridPP)
  • UK e-Science
  • OGF
  • Aimed at a mixed audience (NRENs and Grid users),
    so some network/Grid things you will know
    already. Zzzzzzzzzzzz :)

3
Contents
  • Just two things
  • What makes the Grid different to other network
    users, wrt performance?
  • What are the end-to-end performance (monitoring)
    issues? Any suggestions?
  • If the links in the presentation don't work,
    they are listed again on the last three slides

4
1. What makes the Grid different to other network
users, wrt performance?
5
The Grid
  • The Grid is all about:
  • Sharing resources
  • the obvious, e.g. databases
  • the specialised, e.g. remotely controlled
    telescopes
  • and new ideas, e.g. CPU time
  • co-allocate resources to a task to remove the
    limitations of the individual resources
  • most basic analogy: you can move house faster if
    you have two vans
  • Sharing resources which are geographically
    distributed
  • Sharing resources efficiently
  • optimisation: selecting the best resources for
    the job

6
The Grid
Network(s)
7
The Grid
  • Get apps running on the right resources
    (wherever they are)
  • Make disparate compute resources into a coherent
    whole

Network(s)
8
Optimisation
  • It's a little like the checkout counters in a
    supermarket
  • There is a line of 10 checkouts to which you can
    take your big shopping basket
  • Two checkouts you cannot use. They are for people
    with five items or fewer (caisse express)
  • Another two checkouts cannot be used. They are
    reserved for something else (the staff's lunch
    break)
  • Six left: how big is each queue and how long will
    it take each person to exit the queue (how many
    items in each basket)?
  • If you choose wrong, you get delayed!
  • You miss the train, you get home late,
  • your partner has given your dinner to the dog
  • To take the analogy to extremes: hopefully your
    basket does not have a broken wheel :)

9
Scheduling
  • Grid job: the basic unit of work
  • SEs (Storage Elements) provide storage resources
    and access to mass storage systems
  • CEs (Computing Elements) provide processing
    power, e.g. cluster of Worker Nodes (PC farm)
  • Scheduling: deciding when a job will run, and
    with which resources
  • Typically there will be many CEs capable of
    running a job
  • If a CE already has lots of jobs queued, you
    would like to use another
  • File replication: proven technique for improving
    data access
  • Distribute multiple copies of the same file
    across a Grid
  • Increases number of CEs with good network
    connectivity to the data
  • Extreme example: Pisa→Roma or Pisa→Fermilab?
  • So, typically there may also be several SEs
    holding the required data

10
Network Aware Scheduling (i)
  • So we have a set of CEs (a, b, c) and SEs
    (x, y, z) capable of running a job
  • We want a node from each list such that the job
    will complete the fastest
  • Take account of:
  • capability of CEs
  • size and number of jobs already waiting (queued)
    at CEs
  • performance of network link for each CE-SE
    combination
  • Further complicated by the compute/data intensity
    of the job
  • computationally intensive job: lots of maths
  • data intensive job: lots and lots and lots of
    data
  • do we pull the data to the job or push the job to
    the data?

11
Network Aware Scheduling (ii)
  • In Utopia we would know about the current state
    of the network, and any future reserved bandwidth
  • In reality we could use monitored network
    performance to make an estimate
  • It's not perfect, but patterns (diurnal
    variation, chronic poor performance) can be
    identified
  • The following slides show iperf tests between
    dedicated test nodes at LHC sites in the UK
    (GridPP's gridmon infrastructure)

12
Network Aware Scheduling (iii.a)
  • Transfer at 00:00, yes. Transfer at 12:00, no.
    There's a big difference between 500 and 200 Mbps
    for data intensive jobs!

13
Network Aware Scheduling (iii.b)
  • RAL Tier-2→Tier-1: local transfers are likely the
    best performers

14
Network Aware Scheduling (iii.c)
  • Here, you have absolutely no idea what
    performance you would get → avoid
  • Summary: ignore the network at your peril :)

15
Network Aware Scheduling (iv)
  • Two good papers to read:
  • B. Volckaert, P. Thysebaert, M. De Leenheer, F.
    De Turck, B. Dhoedt, P. Demeester
  • "Network Aware Scheduling in Grids"
  • Richard McClatchey, Ashiq Anjum, Heinz
    Stockinger, Arshad Ali, Ian Willers, Michael
    Thomas
  • "Data Intensive and Network Aware (DIANA) Grid
    Scheduling"
  • We don't consider potential uses in more detail
    (job placement, replica selection) because we
    don't know if it will happen!

16
Network Aware Scheduling (v)
  • There are some -ve feelings
  • The network is not a problem. Over-provisioning
    will always keep us ahead. Either that or fibre
    and GigE everywhere
  • Report of the International Grid Performance
    Workshop 2005 concluded that "Performance simply
    is not on the critical path for many application
    projects. Applications that struggle to get code
    to execute correctly simply do not consider
    whether they are using resources efficiently or
    achieving good performance"
  • Personal experience suggests that there is so
    much to think about elsewhere, that the network
    is often the last thing to be considered
  • Right now, Grid apps rely on the network being
    good, with no real checks
  • And by way of real life indications:
  • EDG WP7 developed a network cost function
  • Returned cost of variable size file transfers
    between source and dest Grid elements
  • Based on periodic (WP7) iperf measurements
  • Used by WP2 Replica Optimization Service
  • job placement: where to start a job so that it is
    as close as possible to the required data
  • replica selection: from where to fetch the
    closest replica once a job had started
  • EDG was not a production Grid, and the work was
    not taken forward

17
Network Aware Scheduling (vi)
  • In EGEE
  • Tommaso Coviello and Tiziana Ferrari proposed to
    use network performance data from EGEE-JRA4
  • CompletionTime(CEi) = JobExecutionTime +
    max(InputDataTransferTime, QueueTime)
  • estimate file transfer times based on throughput
  • reject paths exhibiting packet loss
  • SE selection refined to prefer SEs on low
    congestion links (jitter was the suggested test)
  • Some prototype work, but not taken forward
  • QueueTime found to be unreliable
  • Data for 100 paths required within 0.2 seconds of
    receiving request
  • Grid Information Service was not ready to hold
    the data
  • a problem for JRA4's Web Service interface (WS, so
    accessible but slow)
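The completion-time estimate above can be sketched in a few lines of Python. This is only an illustration of the formula as reconstructed on this slide, not the EGEE prototype: the `Candidate` fields, the CE/SE names and all the numbers are made up, and "reject paths exhibiting packet loss" is modelled as a simple loss == 0 filter.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    ce: str                 # hypothetical CE name
    se: str                 # hypothetical SE holding a replica
    exec_time_s: float      # estimated job execution time on this CE
    queue_time_s: float     # estimated wait in this CE's queue
    throughput_mbps: float  # monitored SE-to-CE throughput
    packet_loss: float      # monitored loss fraction on the path


def completion_time_s(c: Candidate, input_size_mb: float) -> float:
    """Execution time plus the longer of data transfer and queue wait."""
    transfer_s = input_size_mb * 8 / c.throughput_mbps  # megabytes -> megabits
    return c.exec_time_s + max(transfer_s, c.queue_time_s)


def best_candidate(cands, input_size_mb):
    """Drop lossy paths, then pick the smallest estimated completion time."""
    usable = [c for c in cands if c.packet_loss == 0.0]
    return min(usable, key=lambda c: completion_time_s(c, input_size_mb))


candidates = [
    Candidate("ce-a", "se-x", 3600, 600, 500, 0.0),
    Candidate("ce-b", "se-y", 3600, 60, 200, 0.0),
    Candidate("ce-c", "se-z", 3000, 0, 900, 0.01),  # lossy path: rejected
]
best = best_candidate(candidates, input_size_mb=50_000)
print(best.ce, completion_time_s(best, 50_000))
```

With these invented figures the CE with the longer queue still wins, because its faster link more than pays for the wait. That is the whole argument for looking at the network as well as the queue.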

18
Network Aware Scheduling (vii)
  • In WLCG/EGEE (if I understand correctly)
  • The "close SE" approach is applied
  • Each CE must have a close SE: the node with the
    best access for data retrieval from that CE
  • These relationships are statically defined in the
    Grid's Information Service, which provides
    information about the Grid resources and their
    status
  • lcg-infosites --vo dteam closeSE
  • Name of the CE:
    g02.phy.bg.ac.yu:2119/blah-pbs-dteam
  • se.phy.bg.ac.yu
  • Name of the CE:
    fangorn.man.poznan.pl:2119/jobmanager-lcgpbs-dteam
  • se1.egee.man.poznan.pl
  • se2.egee.man.poznan.pl

19
Network Aware Scheduling (viii)
  • To run a job the user submits a job description
    in JDL (Job Description Language) format
  • It defines which executable to run, any
    parameters, input data (Grid files) etc.
  • A match-making process then takes place to
    identify a CE to execute the job
  • Identify all CEs which:
  • can run the job, i.e. match the user's
    requirements (JDL)
  • are close to an SE holding the required input
    Grid files
  • select CE with the highest rank
  • by default, rank is based on an estimate of the
    time interval between the job being submitted and
    execution actually beginning
  • a function of the number of running and queued
    jobs at each CE
  • See gLite User Guide for more info
  • As already stated, the presence of replicas of
    data increases the number of CEs close to the
    data which can potentially execute the job
  • But decisions are still made on the static
    declaration of close SEs
  • Users are able to re-write the site selection
    code themselves
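The match-making steps above can be sketched roughly as follows. This is a toy model, not gLite code: the host names, the job counts and the `estimated_wait` formula are all invented, and the rank is simply assumed to be the negated wait estimate so that "highest rank" means "shortest expected wait".

```python
# CE -> statically declared close SEs (hypothetical names)
close_se = {
    "ce1.example.org": ["se1.example.org"],
    "ce2.example.org": ["se2.example.org", "se3.example.org"],
    "ce3.example.org": ["se3.example.org"],
    "ce4.example.org": ["se3.example.org"],
}

# CE -> (running jobs, queued jobs, satisfies the job's JDL requirements?)
ce_state = {
    "ce1.example.org": (10, 0, True),
    "ce2.example.org": (20, 5, True),
    "ce3.example.org": (4, 2, True),
    "ce4.example.org": (0, 0, False),
}

# SEs that hold a replica of the required input file
replicas = {"se2.example.org", "se3.example.org"}


def estimated_wait(running, queued):
    """Toy estimate (an assumption): queued jobs cost more than running ones."""
    return queued * 10 + running


def rank(ce):
    running, queued, _ = ce_state[ce]
    return -estimated_wait(running, queued)


def match():
    """Filter on requirements and close-SE replicas, then take the top rank."""
    eligible = [
        ce for ce, (_, _, meets_jdl) in ce_state.items()
        if meets_jdl and replicas.intersection(close_se[ce])
    ]
    return max(eligible, key=rank)


print(match())
```

Note that nothing in this selection looks at measured network performance: the only network knowledge is the static `close_se` table, which is the point the slide makes.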

20
Difference 1
  • So, difference 1
  • The Grid may use network performance data to
    improve its decision making

21
Difference 2
  • Difference 2
  • The Grid will exercise the network

22
Qualitative View
  • By its very nature
  • sharing lots of resources to build powerful
    systems
  • to process complex, large data sets
  • in geographically distributed teams
  • some in real-time, e.g. visualisation
  • so far there have been lots of embarrassingly
    parallel problems (completely independent tasks
    which can be executed in parallel) but what about
    tasks requiring inter-processor communication
    (MPI, Message Passing Interface)?
  • a lot of data moving across the network
  • high bandwidth
  • low-latency
  • stable and guaranteed transmission rates

23
Quantitative View (i)
  • The Large Hadron Collider is a collection of four
    experiments based at CERN (ALICE, ATLAS, CMS and
    LHCb) that will monitor the collision of
    accelerated particles
  • 15 Petabytes of data generated every year
  • Around 100,000 standard CPUs required to process
  • GridPP (UK) is contributing the equivalent of
    10,000 PCs

24
Quantitative View (ii)
  • My understanding is that the LHC, when
    operational, will be pushing out 700 Mbytes/s
    (over 5 Gbps) from the Tier-0 to each Tier-1
  • 11 Tier-1s, linked to CERN with 10 Gbps Optical
    Private Network
  • So no problems there
  • Additional variable flows of 4 Gbps are expected
    between the Tier-1s
  • What about Tier-1s to Tier-2s?
  • > 150 Tier-2s, 18 in UK
  • Tier-1s and Tier-2s currently linked by standard
    research networks
  • Are you going to commission dedicated fibres or
    lambdas for each?

25
Quantitative View (iii)
26
Rolls Royce Networks
  • Lots of projects working on adding extra
    intelligence into the network, and/or interfacing
    Grid applications with network control plane for
    auto-provisioning of dedicated bandwidth
  • Cisco's Network Based On-demand/Grid System
    (NBGS)
  • The NAREGI project
  • Enlightened Computing
  • G-Lambda: http://www.g-lambda.net/
  • These are still development projects
  • Can fibre/lambdas be provided for all that need
    it?
  • Even if provided, temptation to spend on CPU
    power?
  • May still fall victim to end-system and last
    mile (e.g. firewall) problems

27
Is the Grid a lot of Hype?
  • It's good to be skeptical about things. Every
    four years people say England will win the World
    Cup/Coupe du Monde :-)
  • The Grid is ambitious
  • but so was the World Wide Wait
  • Now everyone loves the Web, and it has become
    important to people
  • Internet banking, online shopping (flights,
    holidays, music, supermarket), e-Government etc.
    etc.
  • MySpace, Facebook, YouTube
  • The Web also drove investment in the Net
    infrastructure and as a result it can now support
    video conferencing, VoIP etc.

28
Summary of Differences
  • Network Operations: we can safely say that
    greater demands will be placed on the network
  • massive datasets, 1000s of networked resources
  • geographically distributed: Long Fat Networks
  • high bandwidth, high availability, low latency
  • networks will need to be debugged for efficiency
  • Network Intelligence: the Grid may want to
    consume network performance data to improve its
    decision making

29
2. What are the end-to-end performance
(monitoring) issues?
30
The Overall Issue
  • We have seen that the Grid could use network
    performance data for decision making
  • but we don't know whether it will
  • As a result, we concentrate on debugging the
    network for Grid users

31
End-to-End?
  • When I say end-to-end I mean PC-PC, not PoP to
    PoP or similar
  • Core and Metro Area are normally fine
  • Most problems are in the last mile
  • End-system
  • NIC
  • disc
  • TCP config
  • poor cabling
  • the application itself (e.g. older versions of
    scp)
  • I could go on for ever (no, please don't!)
  • Site firewall
  • Off-site connections

32
So Many Issues
  • Beyond the basics of which tests to run, and how
    to control/schedule them, there are too many
    end-to-end performance issues to consider when
    monitoring. Here, I mention a few and make some
    suggestions.
  • TCP performance
  • Parallel TCP streams
  • Different data transfer protocols (e.g. GridFTP
    vs HTTP)
  • New protocols, e.g. DCCP
  • TCP/IP is ubiquitous so we stick with it - we
    can't necessarily wait for new protocols and
    network architectures
  • Measurement types
  • active vs passive
  • capture logs of real GridFTP transfers: is there
    Grid Information Service support?
  • can we monitor Grid workflows in real-time?
  • Too many test paths. Can we plug in to VO data to
    test only the required paths?

33
Over-Provisioning
  • Q: Okay, so why don't we just throw some more
    bandwidth at the problem? Upgrade the links.
  • A: For want of a more interesting term to make
    sure you're still paying attention, this is what
    I call the Heroin Effect
  • You start off with a little, but that's not
    really doing it for you: it's not solving the
    problem. So you keep increasing the dose, yet
    it's never as good as you thought it would be.
  • By analogy, you keep buying more and more
    bandwidth to take you to new highs but it's never
    quite as good as you thought it would be
  • Simple over-provisioning is not sufficient
  • Doesn't address the key issue of end-to-end
    performance
  • Network backbone in most cases is genuinely not
    the source of the problem
  • Last mile (campus network → end-user system → your
    app) is often the cause of the problem: firewall,
    wiring, hard disc, application and many more
    potential culprits
  • Also, if simple over-provisioning were a total
    solution, there would not be so much other work
    going on, e.g. protocol research (high speed TCPs)

34
Let's Put Fibre Everywhere (1)
  • Fibre is cheaper than it was, but for large
    deployments, it's still expensive
  • We can see the benefits of fibre with the UKLight
    infrastructure and the ESLEA exploitation
    project, but it still doesn't address the
    end-to-end issue. Take a real-life ESLEA example
    (thanks to ESLEA for the figures)
  • The UK wanted to transfer data from Fermilab
    (Chicago) to UCL for analysis by physicists,
    before returning the results
  • datasets currently 1-50TB
  • 50TB would take > 6 months on production net, or
    one week at 700Mbps
  • So a 1Gbps circuit-switched light path was
    provisioned
  • Result: disc-to-disc transfers @ 250Mbps, just
    1/4 of theoretical max
  • Tests revealed a problem at an end site
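The figures on this slide are easy to sanity-check. The sketch below assumes decimal units (1 TB = 10^12 bytes, 1 Mbps = 10^6 bits/s) and a six-month period of 182.5 days; both are my assumptions, not something stated in the ESLEA figures.

```python
SECONDS_PER_DAY = 86_400


def transfer_days(size_tb: float, rate_mbps: float) -> float:
    """Days needed to move size_tb terabytes at a sustained rate_mbps."""
    bits = size_tb * 1e12 * 8
    return bits / (rate_mbps * 1e6) / SECONDS_PER_DAY


def implied_rate_mbps(size_tb: float, days: float) -> float:
    """Sustained rate implied by moving size_tb terabytes in the given days."""
    bits = size_tb * 1e12 * 8
    return bits / (days * SECONDS_PER_DAY) / 1e6


print(round(transfer_days(50, 700), 1))        # the "one week at 700Mbps" case
print(round(transfer_days(50, 250), 1))        # at the 250Mbps actually achieved
print(round(implied_rate_mbps(50, 182.5), 1))  # rate implied by "6 months"
```

So "one week at 700Mbps" holds up (about 6.6 days), the 250Mbps actually achieved stretches the same transfer to roughly two and a half weeks, and "more than 6 months" on the production network corresponds to a sustained rate of only about 25Mbps.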

35
Let's Put Fibre Everywhere (2)
  • UCL: RealityGrid, for modelling complex condensed
    matter systems (computational steering,
    visualisation)
  • Test node: 2 x 1.8GHz Athlon, 4 GB, GigE, CentOS
  • DL: HPCx supercomputer
  • Test node: 3 GHz P4, 2 GB, GigE, Scientific Linux
  • RTT is always 9 ms
  • TCP bandwidth is, errr....

36
Mark's Tips
  • There are lots of tools, frameworks,
    infrastructures out there.
  • Massive list at
    http://www.slac.stanford.edu/xorg/nmtf/nmtf-tools.html
  • Pick something that works for you - it's a
    balance of
  • ongoing administration
  • deployment effort (e.g. persuading remote sites
    to install tools and allow you to run tests)
  • how intrusive the tests are
  • Start your investigations in the last mile
  • Do put real data over the network
  • you can send 1 ping a second forever and see 10^-8
    loss
  • you then run an iperf test and the performance is
    terrible
  • Keep historic data: things change
  • you will want to look back, and you will want
    points of reference
  • When you see a problem, follow it up and get
    information
  • Not only is the problem fixed, but you get to
    demonstrate why this is useful which helps with
    deployment, support, growing user base
  • Remember the social aspects - persistent but
    patient :)

37
Suggestions: Tools and Techniques
  • Start with the local host
  • As you would expect
  • uname
  • netstat
  • ifconfig (watch error counters etc.)
  • LISA (Localhost Information Service Agent)
  • a component of MonALISA
  • almost complete system monitoring (load, CPU,
    memory, disk, disk I/O, paging, processes,
    network traffic and connectivity...)
  • Check everything
  • TCP configuration
  • machine load
  • disc (SAS, SATA, nasty old IDE?)
  • If TCP is the problem, what UDP rates can you
    achieve?

38
Suggestions: Tools and Techniques
  • ping: still useful, but you need to send much
    faster than 1 per second, and for a long time,
    e.g. to detect 10^-8 loss
  • back of envelope calculation: on Saturday I ran
    a 10 sec iperf test which transferred 624MB in
    480,000 packets. So 1.3KB per packet
  • 1 loss every 100,000,000 packets = 128GB
    transferred before a loss causes your transfer
    rate to drop
  • can use Synack tool (sparingly) if ICMP is
    blocked
  • traceroute and reverse traceroutes: regularly
    measuring the routes to your most important
    collaborators is very useful
  • dedicated monitoring boxes are useful here
    because they may be allowed (firewalls etc.) for
    ICMP
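The back-of-envelope calculation above can be replayed as follows. The sketch reads the slide's 624MB as decimal megabytes, which is an assumption; with binary units the numbers shift by a few percent but the conclusion is the same.

```python
bytes_sent = 624e6              # 624MB from the 10 second iperf test
packets = 480_000
packets_per_loss = 100_000_000  # one loss per 10^8 packets

bytes_per_packet = bytes_sent / packets
gb_per_loss = bytes_per_packet * packets_per_loss / 1e9

# How long a 1-per-second ping would need to run to send that many probes
years_at_1pps = packets_per_loss / (365 * 86_400)

print(bytes_per_packet)         # bytes per packet, i.e. about 1.3KB
print(round(gb_per_loss))       # GB transferred per loss event
print(round(years_at_1pps, 1))  # years of pinging at 1 probe/sec
```

This lands at roughly 130GB per loss, in line with the slide's 128GB. It also shows why a one-per-second ping is a poor detector of this kind of loss: it would need around three years just to send 10^8 probes.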

39
Suggestions: Tools and Techniques
  • As we will see, time series data is probably the
    most useful
  • When did your problems start? When did things
    change?
  • Unfortunately, relies on there being proximity
    between your paths/devices and ones for which
    there is available data
  • If you suspect the problem is in the core you may
    be able to find the problem router (or rough
    location) through so-called "looking glass"
    servers: statistics of network operator
    performance
  • ping and iperf are very useful here, but be wary
  • In May 2004, Les Cottrell (SLAC) said "As
    measured by NetFlow, 25% of the traffic on
    Abilene is iperf and ping type traffic"
40
Suggestions: Tools and Techniques
  • Thrulay is an iperf-like tool for measuring TCP
    and UDP bandwidth
  • useful because it also gives you the RTT seen by
    the transfer, not ping/traceroute's estimate
  • Two detective type tools
  • Tom Dunnigan and Rich Carlson's Network
    Diagnostic Tool (NDT)
  • client-server
  • useful because client can be lightweight Java
    applet, runs in a Web browser on most systems
  • command line client (compile and install) also
    available
  • public servers (Linux boxes with Web100 kernels)
    although I think only one outside US (thank you
    SWITCH)
  • detects problems and makes suggestions: duplex
    problems, TCP tuning amongst others
  • The SURFnet Detective

41
Suggestions: Tools and Techniques
42
Suggestions: Tools and Techniques
  • We could do these but don't because there's too
    much data to process/correlate
  • Cisco NetFlow data: routers record details of
    all traffic flows which they see
  • src and dest IP addresses and ports
  • start and end time
  • amount of traffic transferred
  • Parsing firewall logs
  • root@gridmon2 iperf -c hepgrid7.ph.liv.ac.uk
  • ------------------------------------------------------------
  • Client connecting to hepgrid7.ph.liv.ac.uk, TCP
    port 5001
  • TCP window size: 16.0 KByte (default)
  • ------------------------------------------------------------
  • [ 3] local 193.62.125.96 port 58316 connected
    with 138.253.178.107 port 5001
  • [ 3] 0.0-10.0 sec 873 MBytes 732 Mbits/sec
  • Jun 10 22:12:58 NetScreen device_id=gw-fw
    system-notification-00257(traffic):
    start_time="2007-06-10 22:15:55" duration=22
    service=tcp/port:5001 src zone=ESC-DMZ dst
    zone=Untrust action=Permit sent=948533470
    rcvd=40793960 src=<hidden> dst=<hidden>
    src_port=58316 dst_port=5001 session_id=995619
  • Not wholly accurate (22 secs not 10) and ignores
    overheads but can be used for relative
    comparisons
43
Suggestions: Tools and Techniques
  • SNMP data is (understandably) impossible to
    obtain for non-networkers
  • Sharing data with the OGF NM-WG XML schemas may
    improve things
  • And now some quick examples from gridmon
  • Dedicated boxes
  • Same spec, OS, configuration - makes life a lot
    easier (comparing like-for-like)
  • If running regular tests, get the results into an
    SQL database: fast, repeatable queries
  • If no dedicated boxes are available, deploy a box
    for either
  • the best performance possible
  • something representative of systems at that
    end-site
  • Sorry, no end-system examples here: we
    configured the boxes ourselves :-)
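As an illustration of the "results in SQL" tip, here is a minimal sketch using Python's built-in sqlite3 module. The table layout, site names and throughput values are invented for the example and are not gridmon's actual schema or data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE iperf_result (src TEXT, dst TEXT, started TEXT, mbps REAL)"
)

# Made-up measurements: two days, tests at midnight and midday
rows = [
    ("site-a", "site-b", "2007-06-09 00:00:00", 510.0),
    ("site-a", "site-b", "2007-06-09 12:00:00", 190.0),
    ("site-a", "site-b", "2007-06-10 00:00:00", 490.0),
    ("site-a", "site-b", "2007-06-10 12:00:00", 210.0),
]
conn.executemany("INSERT INTO iperf_result VALUES (?, ?, ?, ?)", rows)

# Average throughput per hour of day for one path: exposes diurnal variation
hourly = conn.execute(
    """
    SELECT strftime('%H', started) AS hour, AVG(mbps)
    FROM iperf_result
    WHERE src = ? AND dst = ?
    GROUP BY hour
    ORDER BY hour
    """,
    ("site-a", "site-b"),
).fetchall()
print(hourly)
```

Once the results are in a table, the question "is this path always slow at midday?" becomes a one-line query that can be re-run whenever someone reports a problem.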

44
Example 1
  • Glasgow running transfer tests to Edinburgh over
    weekend 28-29th October
  • Experiencing poor rates (80Mbps)
  • 1st thing: despite transferring just 80Mbps,
    residual TCP bandwidth drops by 400Mbps
  • Warning bells

45
Example 1
  • Traceroute data reveals suspect router
  • traceroute to gridmon.epcc.ed.ac.uk
    (129.215.175.71), 30 hops max, 38 byte packets
  • 1 194.36.1.1 (194.36.1.1) 0.941 ms 0.882 ms
    0.815 ms
  • 2 130.209.2.1 (130.209.2.1) 0.875 ms 0.831 ms
    0.830 ms
  • 3 130.209.2.118 (130.209.2.118) 60.415 ms
    55.453 ms 31.327 ms
  • 4 glasgowpop-ge1-2-glasgowuni-ge1-1-v152.clyde.net.uk
    (194.81.62.153) 32.420 ms 34.404 ms 29.424 ms
  • 5 glasgow-bar.ja.net (146.97.40.57) 43.467 ms
    52.298 ms 39.349 ms
  • 6 po9-0.glas-scr.ja.net (146.97.35.53) 45.856
    ms 44.445 ms 41.388 ms
  • 7 po3-0.edin-scr.ja.net (146.97.33.62) 51.509
    ms 63.493 ms 31.435 ms
  • 8 po0-0.edinburgh-bar.ja.net (146.97.35.62)
    22.454 ms 25.412 ms 31.381 ms
  • 9 146.97.40.122 (146.97.40.122) 44.602 ms
    42.494 ms 35.492 ms
  • 10 gridmon.epcc.ed.ac.uk (129.215.175.71)
    33.515 ms 34.623 ms 37.694 ms

46
Example 1
  • Reverse route confirms. Traceroutes are normal
    until we hit suspect router
  • traceroute to gppmon-gla.scotgrid.ac.uk
    (194.36.1.56), 30 hops max, 38 byte packets
  • 1 vlan175.srif-kb1.net.ed.ac.uk
    (129.215.175.126) 0.435 ms 0.387 ms 0.380 ms
  • 2 edinburgh-bar.ja.net (146.97.40.121) 0.357 ms
    0.329 ms 0.322 ms
  • 3 po9-0.edin-scr.ja.net (146.97.35.61) 0.564 ms
    0.485 ms 0.485 ms
  • 4 po3-0.glas-scr.ja.net (146.97.33.61) 1.656 ms
    1.511 ms 1.499 ms
  • 5 po0-0.glasgow-bar.ja.net (146.97.35.54) 1.850
    ms 1.352 ms 1.422 ms
  • 6 146.97.40.58 (146.97.40.58) 1.679 ms 1.661
    ms 1.569 ms
  • 7 glasgowuni-ge1-1-glasgowpop-ge1-2-v152.clyde.net.uk
    (194.81.62.154) 1.796 ms 1.677 ms 1.646 ms
  • 8 130.209.2.117 (130.209.2.117) 31.197 ms
    34.615 ms 29.121 ms
  • 9 130.209.2.2 (130.209.2.2) 32.814 ms 32.158
    ms 32.145 ms
  • 10 gppmon-gla.scotgrid.ac.uk (194.36.1.56) 41.634
    ms 37.555 ms 24.635 ms
  • Graphs and traceroutes provide evidence for
    further investigation
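Spotting the suspect hop can be automated when traceroutes are collected regularly. The sketch below uses the RTTs from the reverse route above and flags the first hop whose best RTT exceeds the previous hop's by more than a threshold. The 10 ms threshold and the function name are my own choices.

```python
# (hop name, RTTs in ms) taken from the Edinburgh-to-Glasgow traceroute above
hops = [
    ("vlan175.srif-kb1.net.ed.ac.uk", [0.435, 0.387, 0.380]),
    ("edinburgh-bar.ja.net", [0.357, 0.329, 0.322]),
    ("po9-0.edin-scr.ja.net", [0.564, 0.485, 0.485]),
    ("po3-0.glas-scr.ja.net", [1.656, 1.511, 1.499]),
    ("po0-0.glasgow-bar.ja.net", [1.850, 1.352, 1.422]),
    ("146.97.40.58", [1.679, 1.661, 1.569]),
    ("glasgowuni-ge1-1-glasgowpop-ge1-2-v152.clyde.net.uk",
     [1.796, 1.677, 1.646]),
    ("130.209.2.117", [31.197, 34.615, 29.121]),
    ("130.209.2.2", [32.814, 32.158, 32.145]),
    ("gppmon-gla.scotgrid.ac.uk", [41.634, 37.555, 24.635]),
]


def first_jump(hops, threshold_ms=10.0):
    """Return (hop number, name) of the first hop where the best RTT jumps."""
    previous_best = 0.0
    for number, (name, rtts) in enumerate(hops, start=1):
        best = min(rtts)  # the minimum is least affected by transient queueing
        if best - previous_best > threshold_ms:
            return number, name
        previous_best = best
    return None


print(first_jump(hops))
```

A single slow hop on its own can just mean a router that answers ICMP at low priority. Here the delay persists on every later hop, which is what makes hop 8 a credible suspect rather than a measurement artefact.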

47
Example 1
  • Further investigation revealed that the router
    had exhausted its CAM space
  • <see next slide if you want to know what this is>
  • In simple terms, the router was forced to switch
    in software
  • Because a particular lookup in a
    routing/switching/access table was not being
    hardware accelerated, problems were caused under
    certain flow conditions
  • The solution: the CAM dynamic database was
    re-optimised (to free up CAM space) and the unit
    began switching in hardware again

48
Example 1
  • CAM = Content-Addressable Memory
  • Hardware (fast) implementation of an associative
    array
  • a data word (not memory address!) is used to
    access it
  • the CAM searches its entire contents to see if
    the data word is stored
  • if the word is found, the CAM returns a list of
    one or more corresponding storage addresses, or
    other data associated with those storage
    addresses
  • CAM memory is used for switching and routing,
    e.g. Ethernet switches store learned MAC
    addresses and their associated switch port in CAM
  • MAC Address Located on Port
  • ------------- ---------------
  • 000039-0643f5 26
  • 000089-01af9a 5
  • 000102-162346 16
  • When an Ethernet frame arrives at the switch with
    a destination address of 000089-01af9a the switch
    searches its CAM for that address.
  • The CAM will return 5 so the switch sends this
    Ethernet frame out on port 5
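In software terms, the lookup described above behaves like a dictionary keyed by the data word. The sketch below mirrors the table on this slide. It only imitates the interface: a real CAM compares all entries in parallel in hardware, whereas a Python dict is a hash table.

```python
# Learned MAC address -> switch port, as in the table above
cam = {
    "000039-0643f5": 26,
    "000089-01af9a": 5,
    "000102-162346": 16,
}


def output_port(dst_mac):
    """Return the port for a destination MAC, or None if it is not in the CAM."""
    return cam.get(dst_mac)


print(output_port("000089-01af9a"))
print(output_port("00aabb-ccddee"))  # address that has not been learned
```

The second lookup returns None. When entries do not fit in the CAM, as in Example 1 where the CAM space was exhausted, the device has to fall back to a slower path for those lookups.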

49
Example 2
  • Local departmental firewall reconfigured to
    switch off strict checking of TCP sequence
    numbers
  • Potential minefield: SACK etc.

50
Example 3
  • Almost constant 33% UDP packet loss
  • Fatal to most/all applications using UDP
  • Occasional dip to 0%

51
Example 3
  • Zooming into a particular day shows a period of
    0% loss
  • Site firewall limits UDP to 1,000 packets per
    second, per endpoint pair
  • Temporarily raised to 20,000 pps for Video
    Conferences

52
The Answer
  • Blair (vintage 1996) before he came to power
  • "Education, education, education" became a mantra
    for his party
  • NRENs are ideally placed to provide this

55
NFNN
  • As an example
  • Networks for non-Networkers (NFNN) workshops
  • Aimed at people working at the technical level in
    high-bandwidth dependent science
  • Talks on TCP, LAN, diagnostic steps, security
  • http://gridmon.dl.ac.uk/nfnn/

56
Your Application
  • Is your application making effective use of the
    network?
  • Consider using multiple TCP sockets (i.e.
    multiple streams) for your data transfers
  • One thread per socket
  • Keep your pipe full of data
  • use asynchronous I/O, i.e. run computation and
    I/O in parallel
  • pre-fetch data you know you are going to need,
    again in parallel with other computation or I/O
  • when possible, read/write large blocks of data at
    a time: better to infrequently r/w ≥ 1MB than
    frequently r/w 4K
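Two of the tips above, large reads and pre-fetching in parallel with other work, can be combined in a few lines. This is a sketch under simple assumptions: the data source is any object with a `read()` method (an in-memory buffer stands in for a file or socket), and the block size and queue depth are arbitrary.

```python
import io
import queue
import threading

BLOCK_SIZE = 1024 * 1024  # read 1MB at a time rather than 4K


def prefetching_reader(stream, block_size=BLOCK_SIZE, depth=4):
    """Yield blocks from stream while a background thread reads ahead."""
    blocks = queue.Queue(maxsize=depth)  # bounds how far ahead we read

    def fill():
        while True:
            block = stream.read(block_size)
            blocks.put(block)  # an empty block marks end of stream
            if not block:
                break

    threading.Thread(target=fill, daemon=True).start()
    while True:
        block = blocks.get()
        if not block:
            return
        yield block


data = bytes(range(256)) * 16384  # 4MB of sample data
received = list(prefetching_reader(io.BytesIO(data)))
print(len(received), sum(len(block) for block in received))
```

While the consumer is busy with one block, the background thread is already fetching the next ones, so the pipe stays full. A production version would also need error handling in the reader thread and a way to stop it early. Both are left out here.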

57
What Is Your Application Doing?
  • Instrument your code, e.g. Netlogger, a
    Networked Application Logger
  • Methodology and set of tools
  • Low overhead: can generate up to 5000/500
    events/sec using the C/Java APIs with negligible
    impact on the app
  • Simple and sensible methodology, e.g.
  • Rule 3: Log all of the following events: entering
    and exiting any program or software component,
    and begin/end of all I/O (disk and network).
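The spirit of Rule 3 can be shown without any special library. The sketch below is not the NetLogger API: it is a plain Python context manager, with names of my own choosing, that records a timestamped start and end event around each component or I/O step.

```python
import time
from contextlib import contextmanager

events = []  # (timestamp, event name); a real logger would write these out


@contextmanager
def logged(name):
    """Record start and end events around the enclosed block."""
    events.append((time.time(), f"{name}.start"))
    try:
        yield
    finally:
        events.append((time.time(), f"{name}.end"))


with logged("program"):
    with logged("disk.read"):
        payload = b"x" * 1024  # stand-in for a real disk read
    with logged("network.write"):
        time.sleep(0.01)       # stand-in for a real network send

for timestamp, name in events:
    print(f"{timestamp:.6f} {name}")
```

With start and end timestamps for every step, the gaps between events show where the time goes, for example a long pause between connecting and the first write.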

58
Netlogger
  • client side GridFTP
  • note the large overhead (8s) of initial
    handshaking before real writing begins

59
Conclusion
  • The Grid could use network performance data
  • The reality is that it doesn't
  • The Grid will exercise networks
  • Core: fine. Metro: mostly fine. Most problems
    in the last mile.
  • Not every Grid app wants, needs or can afford
    dedicated λs
  • Education, education, education. But please, no
    wars!
  • Tune your end systems and applications
  • Instrument your application so you can see what's
    happening
  • For more information: m.j.leese@dl.ac.uk

60
Links (1)
  • The GridPP (LHC in the UK) "gridmon" network
    monitoring infrastructure:
    http://gridmon3.dl.ac.uk/gridmon/
  • Network Aware Scheduling in Grids
  • "Network Aware Scheduling in Grids" paper:
    http://users.atlantis.ugent.be/bvolckae/papers/NOC2004.pdf
  • "Data Intensive and Network Aware (DIANA) Grid
    Scheduling" paper:
    http://hst.web.cern.ch/hst/publications/diana-JoGC.pdf
  • Report of the International Grid Performance
    Workshop 2005:
    http://www-unix.mcs.anl.gov/schopf/GPW2005/report.pdf
  • EDG WP7 Final Report:
    https://edms.cern.ch/file/414132/2.1/DataGrid-07-D7-4-0206-2.0.pdf
  • EGEE-JRA4: http://egee-jra4.web.cern.ch/EGEE-JRA4/
  • gLite User Guide:
    https://edms.cern.ch/file/722398/gLite-3-UserGuide.html

61
Links (2)
  • Rolls Royce Networks
  • Cisco's Network Based On-demand/Grid System:
    http://www.terena.org/activities/nrens-n-grids/workshop-03/NBGS-Terena.pdf
  • The NAREGI project:
    http://www.naregi.org/index_e.html
  • Enlightened Computing:
    http://www.mcnc.org/index.cfm?fuseaction=page&filename=enlightened_computing.html
  • G-Lambda: http://www.g-lambda.net
  • Monitoring Grid workflows in real-time:
    http://www.di.unipi.it/augusto/seminars/200705_OGF20/2007-04-09_OGF-Slides.pdf
  • Exploiting fibre infrastructures, UK ESLEA
    project closing conference:
    http://www.eslea.uklight.ac.uk/conf.html
  • UCL RealityGrid project:
    http://www.realitygrid.org
  • Daresbury Laboratory HPCx supercomputer:
    http://www.hpcx.ac.uk

62
Links (3)
  • End host monitoring, LISA (Localhost Information
    Service Agent): http://monalisa.cacr.caltech.edu
  • Synack, alternative ping tool:
    http://www-iepm.slac.stanford.edu/tools/synack/
  • Thrulay, iperf-like tool:
    http://www.internet2.edu/shalunov/thrulay/
  • Network Diagnostic Tool:
    http://e2epi.internet2.edu/ndt/
  • SURFnet Detective:
    http://detective.surfnet.nl/en/index_en.html
  • Sharing network performance data, OGF Network
    Measurements Working Group:
    http://nmwg.internet2.edu/
  • TCP Selective Acknowledgements (SACK):
    http://www.ietf.org/rfc/rfc2018.txt
  • Netlogger (Networked Application Logger):
    http://dsd.lbl.gov/NetLogger/