1
Network performance measurements
  • Les Cottrell SLAC
  • Prepared for the ICFA-SCIC, CERN December 8, 2001

Partially funded by DOE/MICS Field Work Proposal
on Internet End-to-end Performance Monitoring
(IEPM), also supported by IUPAP
2
PingER deployment
  • Measurements from
  • 34 monitors in 14 countries
  • 6 DoE Labs, 4 DoE Uni
  • Over 600 remote hosts
  • 65 DoE funded universities
  • Over 72 countries
  • Over 3300 monitor-remote site pairs
  • Measurements go back to Jan-95
  • Reports on RTT, loss, reachability, jitter,
    reorders, duplicates
  • Countries monitored
  • Contain 78% of world population
  • 99% of online users of Internet
  • Lightweight (100bps/host pair)
  • Very useful for inter-regional and poor links,
    need more intensive for high performance Grid
    sites
  • Plan to continue project

3
New stuff
  • 8 metrics, added duplicates, out of order,
    jitter, min RTT, conditional loss probability
  • Now have 45 defined affinity groups, 72
    countries, 32 monitoring sites
  • FNAL re-involved: cleaning up the database and
    graphing engine, looking to migrate database,
    already much more reliable, creating web site
  • UCL, Daresbury extending to IperfER
  • Besides ESnet, there are data sets for IPv6
    (6bone), XIWT, and NTON (from when it existed).

4
Throughput quality improvements
TCP BW < MSS/(RTT * sqrt(loss))
~80% annual improvement, about a factor of 10 in 4 years
China
Note E. Europe keeping up
Macroscopic Behavior of the TCP Congestion
Avoidance Algorithm, Mathis, Semke, Mahdavi,
Ott, Computer Communication Review 27(3), July
1997
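As a rough illustration of the bound on this slide, the sketch below evaluates MSS/(RTT * sqrt(loss)) for one set of inputs and checks the "80% a year is about a factor of 10 in 4 years" arithmetic. The MSS, RTT and loss values are assumed for illustration and are not measurements from the talk; the order-one constant from the cited paper is left out, as it is on the slide.

```python
import math


def mathis_throughput_bps(mss_bytes: float, rtt_s: float, loss: float) -> float:
    """Upper bound on TCP throughput in bits/s: MSS / (RTT * sqrt(loss))."""
    if rtt_s <= 0 or not 0 < loss <= 1:
        raise ValueError("rtt_s must be > 0 and loss must be in (0, 1]")
    return (mss_bytes * 8) / (rtt_s * math.sqrt(loss))


def compound_growth(annual_rate: float, years: int) -> float:
    """Overall improvement factor after compounding an annual rate."""
    return (1 + annual_rate) ** years


# Assumed example inputs: 1460-byte MSS, 150 ms RTT, 1% packet loss.
bound = mathis_throughput_bps(1460, 0.150, 0.01)
print(f"throughput bound: {bound / 1e6:.2f} Mbits/s")

# 80% improvement per year compounded over 4 years.
print(f"4-year factor: {compound_growth(0.80, 4):.1f}")
```

With these assumed inputs the bound comes out a little under 1 Mbits/s, which shows how strongly loss and RTT limit a single untuned stream.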
5
Losses Rest of world by region
  • < 1% good, < 2.5% acceptable, < 5% poor,
    < 12% very poor, > 12% bad
  • E. Europe still problems
  • Middle East, S. Central America also problem
    regions
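The loss-quality bands on this slide can be written as a small lookup. This is a minimal sketch: the function name is hypothetical, and treating a loss of exactly 12% as "bad" is an assumption, since the slide does not say which band the boundary falls in.

```python
def classify_loss(loss_percent: float) -> str:
    """Map a packet-loss percentage onto the quality bands from the slide."""
    if loss_percent < 0 or loss_percent > 100:
        raise ValueError("loss_percent must be between 0 and 100")
    if loss_percent < 1:
        return "good"
    if loss_percent < 2.5:
        return "acceptable"
    if loss_percent < 5:
        return "poor"
    if loss_percent < 12:
        return "very poor"
    return "bad"  # assumption: exactly 12% is counted as bad


for sample in (0.3, 2.0, 4.0, 8.0, 20.0):
    print(sample, classify_loss(sample))
```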

6
Rest of world by TLD
  • Russia poor to bad; China poor

7
We need to better understand
  • Closer to applications, e.g. FTP
  • Understand how to make throughput measurements
  • Duration & frequency (balance impact against
    granularity needed),
  • Windows versus, or combined with, parallel streams,
  • OS dependencies, cpu utilization, interface
    speeds, security (e.g. ssh)
  • Impact on others, variability on different
    time-scales
  • Can we use QBSS, can/should application self
    limit?
  • How well does simulation work, how to improve?
  • How to relate to simpler measurements
  • How does file transfer work compared to iperf?
  • Is compression useful and when?
  • How useful is it for the application to get
    feedback from the network?

8
How to measure network throughput
  • Selected about 2 dozen major collaborator sites
    in US, CA, JP, FR, CH, IT, UK over last year
  • Of interest to SLAC (HENP, PPDG, Internet
    measurement centers)
  • Can get logon accounts
  • Use iperf, bbcp (soon bbftp, gridftp)
    traceroute etc.
  • Choose window size and parallel streams
  • Run for 10 seconds together with ping (loaded)
  • Stop iperf, run ping (unloaded) for 10 seconds
  • Change window or number of streams & repeat
  • Record streams, window, throughput (Mbits/s),
    loaded & unloaded ping responses
  • Verify window sizes are set properly by using
    tcpdump; can't believe what the application
    tells you
  • Compare bandwidth measurement tools with one
    another and with iperf, bbcp, bbftp, gridFTP;
    choose minimum set; automate
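The measurement loop described above could be automated roughly as follows. This is a sketch under stated assumptions, not the scripts used for the talk: the host name is a placeholder, the function names are hypothetical, and it assumes the classic iperf client options (-c, -t, -w, -P, -f) and a ping that accepts -c. Parsing of the reports is left out.

```python
import itertools
import subprocess


def iperf_cmd(host, window, streams, seconds=10):
    """Command line for an iperf client run with a given window and stream count."""
    return ["iperf", "-c", host, "-t", str(seconds),
            "-w", window, "-P", str(streams), "-f", "m"]


def ping_cmd(host, count=10):
    """Roughly 10 seconds of ping at the default one-second interval."""
    return ["ping", "-c", str(count), host]


def measurement_plan(host, windows, stream_counts):
    """Yield (window, streams, iperf argv, ping argv) for every combination."""
    for window, streams in itertools.product(windows, stream_counts):
        yield window, streams, iperf_cmd(host, window, streams), ping_cmd(host)


def run_step(iperf_argv, ping_argv):
    """Ping while iperf loads the path, then ping again once iperf has stopped."""
    load = subprocess.Popen(iperf_argv, stdout=subprocess.PIPE, text=True)
    loaded_ping = subprocess.run(ping_argv, capture_output=True, text=True).stdout
    iperf_report, _ = load.communicate()  # wait for the iperf run to finish
    unloaded_ping = subprocess.run(ping_argv, capture_output=True, text=True).stdout
    return iperf_report, loaded_ping, unloaded_ping


# Print the plan only; call run_step() on a host where you have permission to test.
for window, streams, iperf_argv, ping_argv in measurement_plan(
        "remote.example.org", ["8K", "16K", "32K", "64K"], [1, 2, 4, 8]):
    print(window, streams, " ".join(iperf_argv))
```

As the slide warns, the window the tool reports may not be the one in effect, so a run like this is best checked against a packet capture.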

9
Typical results
[Figure: typical results, one curve per window size: 8kB, 16kB, 32kB, 64kB, 100kB]
10
Windows vs Streams
  • Multi-streams often more effective than windows
  • more agile in face of congestion
  • Often easier to set up
  • Need root to configure kernel to set max window
  • Network components may not support big windows
  • Some OS treat max windows strangely
  • May be able to take advantage of multiple paths
  • But
  • may be considered over-aggressive (RFC 2914)
  • can take more cpu cycles
  • how to know how many streams?
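One rough way to approach the "how many streams?" question is the standard limit that a single TCP stream cannot carry more than one window per round trip. That limit is a general TCP property and is not stated on the slide. The sketch below uses it to estimate the number of streams needed for a target rate; the RTT and target values are assumed for illustration, and real paths will need measurement rather than this estimate alone.

```python
import math


def per_stream_limit_bps(window_bytes: int, rtt_s: float) -> float:
    """Window-limited ceiling for one TCP stream: one window per round trip."""
    return window_bytes * 8 / rtt_s


def streams_needed(target_bps: float, window_bytes: int, rtt_s: float) -> int:
    """Smallest stream count whose combined window-limited rate reaches the target."""
    return math.ceil(target_bps / per_stream_limit_bps(window_bytes, rtt_s))


# Assumed example: 64 KB window, 100 ms RTT, 100 Mbits/s target.
limit = per_stream_limit_bps(64 * 1024, 0.100)
print(f"per-stream ceiling: {limit / 1e6:.2f} Mbits/s")
print("streams needed:", streams_needed(100e6, 64 * 1024, 0.100))
```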

11
Iperf client CPU utilization
  • As expected increases with throughput (mainly
    kernel)
  • ~ 0.7 MHz/Mbits/s
  • For fixed throughput
  • Fewer streams take less cpu
  • E.g. 1-4 streams take 20% less cpu than 8-16
    streams for same throughput (if can get it)

12
CPU vs window vs streams vs throughput
Increasing window
  • MHz ~ 0.97 * Mbps
  • Bigger windows less cpu for fixed throughput

Increasing streams
Hooks at end = saturation
13
Pathologies
Diurnal variation often indicates saturation
Routing change
Flat time series due to host NIC limit (100Mbps)
14
BBCP vs Iperf
iperf
bbcp
Slope: Avg 0.6, Sd 0.2
15
Bbcp memory vs disk (> /tmp)
16
But
LANL has Maxtor IDE disks
17
Compression
  • 60Mbyte Objectivity file, using zlib, 8 streams,
    64KB window
  • Can improve throughput on this link with these
    hosts (Sun Ultra Sparcs with 360MHz cpus) by more
    than a factor of 2.
  • Want to characterize improvement as
    function(hosts, link speeds, )

18
Impact on Others
  • Make ping measurements with & without iperf
    loading
  • Loss loaded(unloaded)
  • RTT
  • Looking at how to avoid impact, e.g. QBSS/LBE,
    application pacing, control loop on stdev(RTT)
    reducing streams; want to avoid scheduling

19
File Transfer
  • Used bbcp (written by Andy Hanushevsky)
  • similar methodology to iperf, except ran for file
    length rather than time, provides incremental
    throughput reports, supports /dev/zero, adding
    duration
  • looked at /afs/, /tmp/, /dev/null
  • checked different file sizes
  • Behavior with windows & streams similar to iperf
  • Thru(bbcp) ~ 0.8 * Thru(iperf)
  • For modest throughputs (< 50Mbits/s) rates are
    independent of whether destination is /afs/,
    /tmp/ or /dev/null.
  • Cpu utilization ~ 1MHz/Mbit/s is 20% greater
    than for iperf
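The two rules of thumb on this slide (bbcp reaches about 0.8 of the iperf throughput, and costs about 1 MHz of cpu per Mbit/s) can be turned into a quick estimate. This is only a sketch of the arithmetic: the function names and the 50 Mbits/s input are assumed for illustration, and the ratios are the averages quoted here, not guarantees for any particular host or link.

```python
BBCP_TO_IPERF_RATIO = 0.8   # from the slide: bbcp throughput ~ 0.8 * iperf throughput
BBCP_MHZ_PER_MBPS = 1.0     # from the slide: ~ 1 MHz of cpu per Mbit/s


def expected_bbcp_mbps(iperf_mbps: float) -> float:
    """Estimate bbcp throughput from a measured iperf throughput."""
    return BBCP_TO_IPERF_RATIO * iperf_mbps


def bbcp_cpu_mhz(bbcp_mbps: float) -> float:
    """Estimate the cpu (in MHz) consumed by bbcp at a given throughput."""
    return BBCP_MHZ_PER_MBPS * bbcp_mbps


iperf_mbps = 50.0  # assumed example measurement
bbcp_mbps = expected_bbcp_mbps(iperf_mbps)
print(f"expected bbcp: {bbcp_mbps:.0f} Mbits/s, cpu: {bbcp_cpu_mhz(bbcp_mbps):.0f} MHz")
```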

20
Application rate-limiting
  • Bbcp has transfer rate limiting
  • Could feed network information (e.g. from Web100
    or independent pinging) to bbcp so that it can
    reduce/increase its transfer rate, or change
    the number of parallel streams

No rate limiting, 64KB window, 32 streams
15MB/s rate limiting, 64KB window, 32 streams
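One way the feedback idea could look is a control loop that watches the spread of recent ping RTTs and backs off the number of streams when it grows. This is a hypothetical sketch only: the halving and add-one policy, the tolerance factor and the function name are assumptions made for illustration, and nothing here is bbcp's actual behaviour.

```python
import statistics


def adjust_streams(current_streams, rtt_samples_ms, baseline_stdev_ms,
                   max_streams=32, tolerance=2.0):
    """Return a new stream count based on the spread of recent RTT samples.

    Halve the streams when stdev(RTT) exceeds `tolerance` times the unloaded
    baseline, otherwise probe upward by one stream (both rules are assumed).
    """
    jitter = statistics.stdev(rtt_samples_ms)
    if jitter > tolerance * baseline_stdev_ms:
        return max(1, current_streams // 2)
    return min(max_streams, current_streams + 1)


# Assumed samples: a calm path, then one disturbed by our own traffic.
print(adjust_streams(16, [10.0, 10.1, 9.9, 10.0], baseline_stdev_ms=1.0))
print(adjust_streams(16, [10.0, 30.0, 12.0, 45.0, 11.0], baseline_stdev_ms=1.0))
```

The same decision could instead drive a transfer rate limit, which is the other option the slide mentions.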
21
Typical QBSS test bed
Cisco 7200s
  • Set up QBSS testbed
  • Configure router interfaces
  • 3 traffic types
  • QBSS, BE, Priority
  • Define policy, e.g.
  • QBSS > 1%, priority < 30%
  • Apply policy to router interface queues

[Figure: testbed diagram with link speeds of 10Mbps, 100Mbps and 1Gbps]
22
Example of effects
Kicks in fast (< 1 s)
  • Also tried 1 stream for all, and priority at
    30%, with 100 Mbps & 2 Gbps bottlenecks
  • 2Gbps was a challenge to saturate: did it at
    SC2001 with 3 Linux cpus with 5 x 1 Gbps NIC
    cards and a 2 Gbps trunk from subnet to floor
    network, sending to 17 hosts in 5 countries

23
Impact on response time (RTT)
  • Run ping with iperf loading with various QoS
    settings, iperf ~ 93Mbps
  • No iperf: ping avg RTT ~ 300usec (regardless of
    QoS)
  • Iperf = QBSS, ping = BE or Priority: RTT ~ 550usec
  • 70% greater than unloaded
  • Iperf = ping QoS (exc. Priority): RTT ~ 5msec
  • > factor of 10 larger RTT than unloaded
  • If both ping & iperf have QoS = Priority then ping
    RTT very variable since iperf limited to 30%
  • RTT quick when iperf limited, long when iperf
    transmits

24
Possible HEP usage
  • Apply priority to lower volume interactive
    voice/video-conferencing and real time control
  • Apply QBSS to high volume data replication
  • Leave the rest as Best Effort
  • Since 40-65% of bytes to/from SLAC come from a
    single application, we have modified it to enable
    setting of TOS bits
  • Need to identify bottlenecks and implement QBSS
    there
  • Bottlenecks tend to be at edges so hope to try
    with a few HEP sites
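For an application to be treated as QBSS or priority traffic, it has to mark its own packets. A minimal sketch of setting the TOS byte on a socket is shown below. It assumes a platform that exposes the IP_TOS socket option, and it assumes that QBSS uses DSCP code point 8 (binary 001000); check that value against the router policy in use before relying on it. The function name is hypothetical.

```python
import socket

QBSS_DSCP = 8  # assumption: QBSS code point 001000; confirm against router config


def mark_socket(sock: socket.socket, dscp: int) -> int:
    """Set the IP TOS byte so routers can classify this socket's traffic.

    The 6-bit DSCP occupies the upper bits of the TOS byte, hence the shift.
    Returns the TOS value that was requested.
    """
    if not 0 <= dscp <= 63:
        raise ValueError("dscp must fit in 6 bits")
    tos = dscp << 2
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_TOS, tos)
    return tos


with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
    tos = mark_socket(s, QBSS_DSCP)
    print(f"TOS byte requested: {tos:#04x}")
```

Marking only has an effect where a router on the path is configured to honour it, which is why the bottleneck interfaces need the policy applied.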

25
Acknowledgements for SC2001
  • Many people assisted in getting accounts, setting
    up servers, providing advice, software etc.
  • Suresh Man Singh, Harvey Newman, Julian Bunn
    (Caltech), Andy Hanushevsky, Paola Grosso, Gary
    Buhrmaster, Connie Logg (SLAC), Olivier Martin
    (CERN), Loïc Tortay, Jerome Bernier (IN2P3),
    Dantong Yu (BNL), Robin Tasker, Paul Kummer (DL),
    John Gordon (RL), Brian Tierney, Bob Jacobsen
    (LBL), Stanislav Shalunov (Internet2), Joe Izen
    (UT Dallas), Linda Winkler, Bill Allcock (ANL),
    Ruth Pordes, Frank Nagy (FNAL), Emanuele Leonardi
    (INFN), Chip Watson (JLab), Yukio Karita (KEK),
    Tom Dunigan (ORNL), Andrew Daviel (TRIUMF),
    Paul Avery, Greg Goddard (UFL), Paul Barford,
    Miron Livny (UWisc), Shane Canon (NERSC),
    Andy Germain (NASA), Richard Baraniuk,
    Rolf Riedi (Rice).

26
SC2001 Bandwidth Challenge/demo
  • Send data from SLAC/FNAL booth computers (emulate
    a tier 0 or 1 HENP site) to over 20 other sites
    with good connections in about 6 countries
  • Throughputs from SLAC range from 3Mbps to >
    300Mbps
  • 2 Dell/Linux cpus with 2 GE NICs each,
    1.4Gbps/cpu, plus 1 Dell with 1 GE NIC
  • Can get 980Mbits/s/cpu with jumbo frames
  • Saturate 2Gbps connection to floor network
  • Got 1.6Gbits/s from booth to floor network
  • Apply QBSS to some sites, priority to a few and
    rest Best Effort
  • See how QBSS works at high speeds
  • Competing bulk throughput streams
  • Interactive low throughput streams, look at RTT
    with ping

27
Ns-2 thruput & loss predict
  • Indicates on unloaded link can get 70% of
    available bandwidth without causing noticeable
    packet loss
  • Can get over 80-90% of available bandwidth
  • Can overdrive: no extra throughput BUT extra loss

28
WAN thruput conclusions
  • High FTP performance across WAN links is possible
  • Can do 100s of Mbits/s
  • Even with 20-30Mbps bottleneck can do >
    100Gbytes/day
  • Can easily saturate a fast Ethernet interface
    over WAN
  • Need GE NICs, > OC3 WANs to improve performance
  • Need high speed cpus, disks, buses
  • Careful attention to copies, buffering etc.
  • Performance is improving
  • OS must support big windows selectable by
    application
  • Need multiple parallel streams in some cases
  • Improvements of 5 to 60 in thruput by using
    multiple streams & larger windows
  • Impacts other users, QBSS looks hopeful
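The "> 100Gbytes/day" figure can be checked with simple arithmetic. The sketch below converts a sustained rate into a daily volume, assuming the link is kept busy for the full 24 hours and using decimal gigabytes; the function name is illustrative.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def gbytes_per_day(rate_mbps: float) -> float:
    """Data volume in Gbytes moved in one day at a sustained rate in Mbits/s."""
    bits_per_day = rate_mbps * 1e6 * SECONDS_PER_DAY
    return bits_per_day / 8 / 1e9


for rate in (20, 30):
    print(f"{rate} Mbits/s sustained -> {gbytes_per_day(rate):.0f} Gbytes/day")
```

Both ends of the 20-30Mbps range come out comfortably above 100 Gbytes/day, so the claim holds even if the link is only busy for part of the day.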

29
More Information
  • IEPM/PingER home site
  • www-iepm.slac.stanford.edu/
  • Bulk throughput site
  • www-iepm.slac.stanford.edu/monitoring/bulk/
  • SC2001 high throughput measurements
  • www-iepm.slac.stanford.edu/monitoring/bulk/sc2001/
  • Transfer tools
  • http://dast.nlanr.net/Projects/Iperf/release.html
  • http://doc.in2p3.fr/bbftp/
  • www.slac.stanford.edu/~abh/bbcp/
  • http://hepwww.rl.ac.uk/Adye/talks/010402-ftp/html/sld015.htm
  • TCP Tuning
  • www.ncne.nlanr.net/training/presentations/tcp-tutorial.ppt
  • www-didc.lbl.gov/tcp-wan.html
  • QBSS measurements
  • www-iepm.slac.stanford.edu/monitoring/qbss/measure.html