1
E2E Integrated
  • A Holistic View of Diagnostics and Performance
    Tuning
  • June 14, 2006 TG06
  • Chris Rapier, Kathy Benninger - Pittsburgh
    Supercomputing Center
  • rapier_at_psc.edu

2
What E2E is...
E2E performance is a major part of the user
experience and is the product of the interaction
between the network, the hardware, the
application, and the user.
3
Which means
  • Diagnosing problems and tuning E2E to meet the
    user's needs both require an integrated approach.

4
The User
  • User expectations
  • What does the user expect to see?
  • Are those expectations realistic?
  • Both extremes are problematic
  • How do you teach them?

5
Host Hardware
  • Interface point for the user and the E2E system.
    Might not be where the user is sitting.
  • Chokepoints
  • Mistuned TCP Buffers
  • Web100 and autotuning help resolve this problem
  • NB: setting a buffer size with setsockopt() overrides autotuning
  • Resource Contention
  • Just a few busy users can bog down a system
  • Resource Limits
  • Disk and file system
  • Cache nodes, like the LCNs at PSC, can help
    address these limits if not overcome them.
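
As a rough illustration of the buffer-tuning point above, the sketch below computes the bandwidth-delay product (the amount of data that must be in flight to fill a path) and shows the kind of explicit setsockopt() request that the slide warns will override autotuning. The function names are hypothetical, and the 1 Gb/s rate and the two RTTs are borrowed from the Network Delay slide purely as sample inputs.

```python
import socket


def bdp_bytes(link_bps, rtt_seconds):
    """Bandwidth-delay product: bytes that must be in flight to fill the path."""
    return round(link_bps * rtt_seconds / 8)


def pin_buffers(sock, size):
    """Request fixed send/receive buffers and report what the kernel granted.

    Per the slide, an explicit request like this overrides autotuning for
    the socket. The granted sizes may differ from the request because the
    operating system applies its own limits and rounding.
    """
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, size)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, size)
    return (
        sock.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF),
        sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF),
    )


if __name__ == "__main__":
    # Sample inputs only: 1 Gb/s at the two RTTs quoted later in the deck.
    for rtt in (0.0155, 0.070):
        print(f"RTT {rtt * 1000:.1f} ms -> BDP {bdp_bytes(1e9, rtt)} bytes")

    # A fixed 64 KiB request is far below either BDP, so a socket pinned
    # this way could not fill the path even on a clean network.
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        print("granted (send, recv):", pin_buffers(s, 64 * 1024))
```

Comparing the requested size with the computed product is one quick way to spot a mistuned buffer; leaving the buffers unset lets autotuning, where available, do the sizing.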

6
Networks
  • Encompasses everything between the end points of
    the E2E system.
  • Routers, switches, firewalls, IDS, fiber, copper
    and even policy
  • Networks are complex with many possible failure
    modes that all look alike
  • Poor performance is often the only symptom
  • Use the right tools to find the problem
  • Iperf, TG Network Graphs, pathdiag, etc.

7
Network Delay
  • Delay is critical in TCP/IP and has wide-reaching
    impact
  • Increased delay can change the behavior and needs
    of an application
  • Increased delay demands decreased loss rates
  • 1 Gb/s to NCSA
  • RTT: 15.5 ms
  • Acceptable loss: 1 in 100,000 packets (9k packet size)
  • 1 Gb/s to SDSC
  • RTT: 70 ms
  • Acceptable loss: 1 in 2,000,000 packets (9k packet size)
  • Pathdiag, part of NPAD from PSC, can help
    determine if delay or mistuning on local
    resources will hamper performance. Addresses
    symptom scaling issues.
  • http://www.psc.edu/networking/projects/pathdiag
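
The slide does not say how the acceptable loss figures were derived. One common way to estimate them is the Mathis et al. model for steady-state TCP throughput, rate = (MSS / RTT) * C / sqrt(p), solved for the loss probability p. The sketch below does that under stated assumptions: a 9000-byte segment size standing in for "9k packets", and a constant C of 0.7, chosen here only because it lands close to the slide's numbers. The function names are hypothetical.

```python
def acceptable_loss(rate_bps, rtt_seconds, mss_bytes=9000, c=0.7):
    """Largest loss probability that still permits rate_bps under the
    model rate = (MSS / RTT) * C / sqrt(p).

    mss_bytes=9000 and c=0.7 are assumptions, not values from the slides.
    """
    return (c * mss_bytes * 8 / (rtt_seconds * rate_bps)) ** 2


def packets_per_loss(rate_bps, rtt_seconds, mss_bytes=9000, c=0.7):
    """Express the same limit as 'one loss in N packets'."""
    return 1.0 / acceptable_loss(rate_bps, rtt_seconds, mss_bytes, c)


if __name__ == "__main__":
    for site, rtt in (("NCSA", 0.0155), ("SDSC", 0.070)):
        n = packets_per_loss(1e9, rtt)
        print(f"{site}: RTT {rtt * 1000:.1f} ms -> about 1 loss in {n:,.0f} packets")
```

Under these assumptions the results fall a little under 100,000 and 2,000,000 packets, in line with the slide. Because the tolerable loss rate scales with the inverse square of the RTT, a path that looks clean over a short distance can fail badly over a long one.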

8
Applications
  • Interactive
  • Low latency is critical
  • Latency can be caused by host, application, or
    network.
  • It may work in the LAN but break in the WAN or
    vice versa
  • Because of symptom scaling
  • Knowing how an application or protocol works can
    help

9
Applications II
  • Bulk
  • Buffers must be properly sized and managed
  • Otherwise these may act like an invisible brake
    on the throughput.
  • Great applications can be crippled by not
    addressing this
  • SSH is multiplexed and so has its own flow
    control
  • The window is limited to 64K so throughput will
    always be less than 64K/RTT
  • HPN-SSH, developed at PSC, fixes this problem and
    can provide a 10x to 20x performance boost
  • http://www.psc.edu/networking/projects/hpn-ssh
  • Is it the right tool for the job?
  • That depends, what's the job?
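
The 64K/RTT ceiling mentioned above can be worked through directly: a sender that may have at most one window of data outstanding per round trip cannot exceed window / RTT. The sketch below assumes "64K" means 64 KiB (65,536 bytes) and reuses the two RTTs from the Network Delay slide as sample inputs. The function name is hypothetical.

```python
SSH_WINDOW_BYTES = 64 * 1024  # assumption: "64K" read as 64 KiB


def window_limited_throughput(window_bytes, rtt_seconds):
    """Upper bound on throughput (bytes/s) when at most one window of data
    can be outstanding per round trip."""
    return window_bytes / rtt_seconds


if __name__ == "__main__":
    for rtt in (0.0155, 0.070):
        limit = window_limited_throughput(SSH_WINDOW_BYTES, rtt)
        print(f"RTT {rtt * 1000:.1f} ms -> at most {limit / 1e6:.2f} MB/s")
```

At a 70 ms RTT the bound is a little under 1 MB/s whatever the link speed, which is why a fixed application-level window acts as the "invisible brake" described on this slide.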

10
How Components Relate
  • Not all components of E2E are equal.
  • Not all data is equal
  • Bulk data tends to be more sensitive to host
    tuning issues
  • Interactive data most sensitive to delay and
    contention
  • There are no laws, only rules of thumb.

11
Our Strategy
  • What does the user expect to happen?
  • Modify expectations
  • Many times you may need to increase expectations
  • Talk to the user
  • Get as much information as possible
  • Don't be rigid in approach
  • Symptoms may look the same but the problems are
    often different
  • Start with the obvious.
  • Don't assume anything. Not even on the TG.
  • Know what your tools can tell you
  • NPAD and Iperf are both network tools but answer
    different questions

12
E2E Diagnostics In Action
  • User only seeing 1MB/s with SCP
  • Problems and Solutions
  • Application
  • Non-HPN SSH is too slow to sustain high
    throughput
  • Installed HPN-SSH
  • Host Misconfiguration
  • Buffers were far too small.
  • Dramatically increased buffer size
  • Contention
  • Other users on system were using enough CPU
    resources to reduce performance
  • Time transfers for low-load periods or use a
    different node
  • Improved throughput to 18.2MB/s

13
The End(2End)
  • Don't assume it's any one thing
  • E2E has many different components and each has
    many facets
  • People have a tendency to focus on what they know
    best.
  • Don't forget the user
  • Use the right tools to find the problem