GridPP Meeting, Edinburgh, 4-5 February 2004

Transcript and Presenter's Notes
1
High Performance Networking for ALL
Members of GridPP are in many network collaborations, including:
  • Close links with SLAC
  • UKERNA, SURFNET and other NRNs
  • Dante, Internet2
  • Starlight, Netherlight
  • GGF, RIPE, Industry
  • UKLIGHT
2
Network Monitoring 1
  • Architecture

3
Network Monitoring 2
24 Jan to 4 Feb 04: TCP iperf, RAL to HEP sites. Only 2 sites >80 Mbit/s
24 Jan to 4 Feb 04: TCP iperf, DL to HEP sites
HELP!
4
High bandwidth, long distance. Where is my
throughput?
  • Robin Tasker
  • CCLRC, Daresbury Laboratory, UK
  • r.tasker@dl.ac.uk

DataTAG is a project sponsored by the European
Commission - EU Grant IST-2001-32459
5
Throughput: What's the problem?
One Terabyte of data transferred in less than an hour
On 27-28 February 2003, the transatlantic DataTAG network was extended: CERN - Chicago - Sunnyvale (>10,000 km). For the first time, a terabyte of data was transferred across the Atlantic in less than one hour using a single TCP (Reno) stream. The transfer was accomplished from Sunnyvale to Geneva at a rate of 2.38 Gbit/s.
6
Internet2 Land Speed Record
On 1 October 2003, DataTAG set a new Internet2 Land Speed Record by transferring 1.1 Terabytes of data in less than 30 minutes from Geneva to Chicago across the DataTAG provision, corresponding to an average rate of 5.44 Gbit/s using a single TCP (Reno) stream.
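As a back-of-envelope check on those record figures (assuming decimal units, i.e. 1.1 Terabytes = 1.1×10^12 bytes):

```python
# Sanity check on the Internet2 Land Speed Record numbers quoted above.
# Assumes decimal units (1.1 TB = 1.1e12 bytes), as is usual for network rates.
data_bytes = 1.1e12
rate_bps = 5.44e9                        # quoted average rate, bits per second
transfer_s = data_bytes * 8 / rate_bps
print(f"{transfer_s / 60:.1f} minutes")  # just under 27 minutes, i.e. under half an hour
```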
7
So how did we do that?
  • Management of the end-to-end connection
  • Memory-to-memory transfer: no disk system involved
  • Processor speed and system bus characteristics
  • TCP configuration: window size and frame size (MTU)
  • Network Interface Card and associated driver and their configuration
  • End-to-end no-loss environment from CERN to Sunnyvale!
  • At least a 2.5 Gbit/s capacity pipe on the end-to-end path
  • A single TCP connection on the end-to-end path
  • No real user application
  • That's to say - not the usual user experience!
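The window-size point can be made concrete with the bandwidth-delay product. The RTT below is an illustrative assumption for a CERN - Sunnyvale path, not a figure from the slides:

```python
# Bandwidth-delay product: bytes in flight needed to keep a long fat pipe full.
# The RTT is an assumed, illustrative value for CERN - Sunnyvale (not from the slide).
link_bps = 2.5e9        # the 2.5 Gbit/s capacity pipe mentioned above
rtt_s = 0.180           # assumed round-trip time for the transatlantic path
bdp_bytes = link_bps * rtt_s / 8
print(f"TCP window needed: {bdp_bytes / 1e6:.0f} MB")  # ~56 MB, far above default windows
```

A default TCP window of 64 kBytes would limit such a path to a few Mbit/s, which is why window tuning appears on this list.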

8
Realistically, what's the problem? Why do
network research?
  • End System Issues
  • Network Interface Card and Driver and their
    configuration
  • TCP and its configuration
  • Operating System and its configuration
  • Disk System
  • Processor speed
  • Bus speed and capability
  • Network Infrastructure Issues
  • Obsolete network equipment
  • Configured bandwidth restrictions
  • Topology
  • Security restrictions (e.g., firewalls)
  • Sub-optimal routing
  • Transport Protocols
  • Network Capacity and the influence of Others!
  • Many, many TCP connections
  • Mice and Elephants on the path

9
End Hosts Buses, NICs and Drivers
Throughput
  • Use UDP packets to characterise Intel PRO/10GbE
    Server Adaptor
  • SuperMicro P4DP8-G2 motherboard
  • Dual Xeon 2.2 GHz CPUs
  • 400 MHz System bus
  • 133 MHz PCI-X bus

Latency
Bus Activity
10
End Hosts Understanding NIC Drivers
  • Linux driver basics: TX
  • Application makes a system call
  • Encapsulation in UDP/TCP and IP headers
  • Enqueue on device send queue
  • Driver places information in DMA descriptor ring
  • NIC reads data from main memory via DMA and sends it on the wire
  • NIC signals to processor that the TX descriptor has been sent
  • Linux driver basics: RX
  • NIC places data in main memory via DMA to a free RX descriptor
  • NIC signals RX descriptor has data
  • Driver passes frame to IP layer and cleans RX descriptor
  • IP layer passes data to application
  • Linux NAPI driver model
  • On receiving a packet, NIC raises an interrupt
  • Driver switches off RX interrupts and schedules an RX DMA ring poll
  • Frames are pulled off the DMA ring and processed up to the application
  • When all frames are processed, RX interrupts are re-enabled
  • Dramatic reduction in RX interrupts under load
  • Improving the performance of a Gigabit Ethernet driver under Linux
  • http://datatag.web.cern.ch/datatag/papers/drafts/linux_kernel_map/
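The NAPI receive flow above can be sketched as a toy model. This is an illustrative Python sketch of the control flow only (all class and method names are hypothetical); the real implementation is C code inside the kernel:

```python
# Toy sketch of the Linux NAPI receive model described above (hypothetical names).
class NapiNic:
    def __init__(self):
        self.rx_ring = []                  # DMA descriptor ring (frames placed by the NIC)
        self.rx_interrupts_enabled = True
        self.delivered = []                # frames handed up towards the application

    def interrupt(self):
        # The NIC raises an interrupt on the first packet; the driver then masks
        # RX interrupts and switches to polling the DMA ring.
        if self.rx_interrupts_enabled:
            self.rx_interrupts_enabled = False
            self.poll()

    def poll(self):
        # Pull frames off the DMA ring and process them up the stack; only when
        # the ring is drained are RX interrupts re-enabled.
        while self.rx_ring:
            self.delivered.append(self.rx_ring.pop(0))
        self.rx_interrupts_enabled = True

nic = NapiNic()
nic.rx_ring.extend(["frame1", "frame2", "frame3"])
nic.interrupt()    # one interrupt, three frames processed
print(nic.delivered, nic.rx_interrupts_enabled)
```

Under load many frames arrive per poll, so the interrupt rate stays low - the "dramatic reduction in RX interrupts" on the slide.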

11
Protocols: TCP (Reno) Performance
  • AIMD and high bandwidth, long distance networks
  • Poor performance of TCP in high bandwidth wide area networks is due in part to the TCP congestion control algorithm
  • For each ACK in an RTT without loss:
    cwnd → cwnd + a / cwnd  (Additive Increase, a = 1)
  • For each window experiencing loss:
    cwnd → cwnd − b × cwnd  (Multiplicative Decrease, b = ½)
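The two rules can be simulated at per-RTT granularity: adding a/cwnd on each of roughly cwnd ACKs amounts to +a per lossless RTT. A minimal sketch, showing the slow linear recovery after a halving:

```python
# Toy per-RTT simulation of TCP Reno's AIMD congestion window (units: segments).
# The per-ACK rule cwnd += a/cwnd, summed over ~cwnd ACKs, gives +a per lossless RTT.
def reno_cwnd(rtts, loss_rtts, a=1.0, b=0.5, cwnd=10.0):
    history = []
    for t in range(rtts):
        if t in loss_rtts:
            cwnd -= b * cwnd        # multiplicative decrease on loss, b = 1/2
        else:
            cwnd += a               # additive increase, a = 1 segment per RTT
        history.append(cwnd)
    return history

h = reno_cwnd(100, loss_rtts={50})
print(h[49], h[50])   # 60.0 just before the loss, 30.0 just after
```

At a cwnd of thousands of segments (a multi-Gbit/s transatlantic path), regaining the halved window at +1 segment per RTT takes thousands of RTTs - the poor performance the slide refers to.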

12
Protocols: HighSpeed TCP & Scalable TCP
  • Adjusting the AIMD algorithm of TCP Reno
  • For each ACK in an RTT without loss:
    cwnd → cwnd + a / cwnd  (Additive Increase, a = 1)
  • For each window experiencing loss:
    cwnd → cwnd − b × cwnd  (Multiplicative Decrease, b = ½)
  • HighSpeed TCP
  • a and b vary depending on the current cwnd, where
  • a increases more rapidly with larger cwnd and, as a consequence, returns to the optimal cwnd size sooner for the network path, and
  • b decreases less aggressively and, as a consequence, so does the cwnd. The effect is that there is not such a decrease in throughput.
  • Scalable TCP
  • a and b are fixed adjustments for the increase and decrease of cwnd, such that the increase is greater than TCP Reno, and the decrease on loss is less than TCP Reno
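To see why fixed per-ACK adjustments help at large cwnd, compare growth over loss-free RTTs. The Scalable TCP constants used here (0.01 per ACK for the increase, 0.125 for the decrease) come from Kelly's Scalable TCP proposal, not from the slide, so treat them as an assumption:

```python
# Growth of cwnd over loss-free RTTs: Reno's a/cwnd per ACK versus Scalable TCP's
# fixed per-ACK increment (growth proportional to cwnd, i.e. exponential).
def grow_one_rtt(cwnd, rule):
    for _ in range(int(cwnd)):       # roughly one ACK per in-flight segment
        if rule == "reno":
            cwnd += 1.0 / cwnd       # Reno: about +1 segment per RTT in total
        else:
            cwnd += 0.01             # Scalable TCP: about +1% of cwnd per RTT
    return cwnd

reno = scalable = 1000.0
for _ in range(100):                 # 100 RTTs without loss
    reno = grow_one_rtt(reno, "reno")
    scalable = grow_one_rtt(scalable, "scalable")
print(round(reno), round(scalable))  # Reno ~1100; Scalable roughly 2.7x its start
```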

13
Protocols: HighSpeed TCP & Scalable TCP
  • HighSpeed TCP
  • Scalable TCP

Success!
HighSpeed TCP implemented by Gareth (Manchester); Scalable TCP implemented by Tom Kelly (Cambridge); integration of the stacks into the DataTAG kernel by Yee (UCL) and Gareth
14
Some Measurements of Throughput: CERN - SARA
  • Using the GÉANT backup link
  • 1 GByte file transfers
  • Blue: data
  • Red: TCP ACKs
  • Standard TCP
  • Average throughput 167 Mbit/s
  • Users see 5 - 50 Mbit/s!
  • HighSpeed TCP
  • Average throughput 345 Mbit/s
  • Scalable TCP
  • Average throughput 340 Mbit/s

15
Users, the Campus & the MAN 1
Pete White, Pat Meyrs
  • NNW to SJ4 access: 2.5 Gbit PoS; hits 1 Gbit (50%)
  • Manchester - NNW access: 2 × 1 Gbit Ethernet

16
Users, the Campus & the MAN 2
  • Message:
  • Continue to work with your network group
  • Understand the traffic levels
  • Understand the network topology
  • LMN to site 1 access: 1 Gbit Ethernet
  • LMN to site 2 access: 1 Gbit Ethernet

17
10 GigEthernet: Tuning PCI-X
18
10 GigEthernet at SC2003 BW Challenge (Phoenix)
  • Three server systems with 10 GigEthernet NICs
  • Used the DataTAG altAIMD stack, 9000 byte MTU
  • Streams from the SLAC/FNAL booth in Phoenix to:
  • Palo Alto, PAIX: 17 ms RTT
  • Chicago, Starlight: 65 ms RTT
  • Amsterdam, SARA: 175 ms RTT

19
Helping Real Users 1: Radio Astronomy VLBI
PoC with NRNs & GÉANT: 1024 Mbit/s, 24x7, NOW
20
VLBI Project: Throughput, Jitter, 1-way Delay
  • 1472 byte packets, Manchester → Dwingeloo (JIVE)
  • FWHM 22 µs (back-to-back: 3 µs)
  • 1-way delay: note the packet loss (points with zero 1-way delay)

21
VLBI Project: Packet Loss Distribution
  • Measure the time between lost packets in the time series of packets sent
  • Lost 1410 packets in 0.6 s
  • Is it a Poisson process?
  • Assume the Poisson process is stationary: λ(t) = λ
  • Use the probability density function: P(t) = λe^(−λt)
  • Mean λ = 2360 /s, i.e. 426 µs between losses
  • Log plot slope −0.0028; expected −0.0024
  • Could be an additional process involved
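The check above can be reproduced numerically: for a stationary Poisson process the gaps between losses are exponentially distributed, so the log of the gap histogram falls with slope −λ. A small sketch using the slide's loss count:

```python
# Poisson check on the VLBI packet-loss data: 1410 packets lost in 0.6 s.
losses, duration_s = 1410, 0.6
lam_per_s = losses / duration_s        # ~2350 /s (the slide quotes 2360 /s)
lam_per_us = lam_per_s / 1e6
mean_gap_us = 1.0 / lam_per_us         # mean gap between losses
# For an exponential PDF P(t) = lam * exp(-lam * t), log P(t) has slope -lam,
# i.e. about -0.0024 per microsecond here.
expected_slope = -lam_per_us
print(f"mean gap {mean_gap_us:.0f} us")  # ~426 us, matching the slide
```

The measured slope of −0.0028 is steeper than the expected −0.0024, consistent with the slide's suggestion of an additional, non-Poisson loss process.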

22
VLBI Traffic Flows: Only testing!
  • Manchester - NetNorthWest - SuperJANET access links
  • Two 1 Gbit/s access links
  • SJ4 to GÉANT; GÉANT to SurfNet

23
Throughput & PCI transactions on the Mark5 PC
  • Mark5 uses the Supermicro P3TDLE motherboard
  • 1.2 GHz PIII
  • Memory bus 133/100 MHz
  • 2 × 64-bit 66 MHz PCI
  • 4 × 32-bit 33 MHz PCI

[Block diagram: Ethernet NIC, IDE disc pack, SuperStor input card, logic analyser display]
24
PCI Activity: Read Multiple data blocks, 0 wait
  • Read 999,424 bytes
  • Each data block:
  • Setup CSRs
  • Data movement
  • Update CSRs
  • For 0 wait between reads:
  • Data blocks 600 µs long, take 6 ms
  • Then a 744 µs gap
  • PCI transfer rate 1188 Mbit/s (148.5 MBytes/s)
  • Read_sstor rate 778 Mbit/s (97 MBytes/s)
  • PCI bus occupancy 68.44%
  • Concern about Ethernet traffic: 64-bit 33 MHz PCI needs 82% occupancy for 930 Mbit/s; expect 360 Mbit/s

[PCI trace detail: data transfer in data blocks of 131,072 bytes, CSR accesses, PCI bursts of 4096 bytes]
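The overall read rate follows directly from the trace timings: each 131,072-byte block occupies roughly 600 µs of bus activity followed by a 744 µs gap.

```python
# Effective PCI read rate implied by the logic-analyser timings on this slide.
block_bytes = 131072                  # one data block
data_us, gap_us = 600.0, 744.0        # bus-active time and idle gap per block
rate_mbit = block_bytes * 8 / ((data_us + gap_us) * 1e-6) / 1e6
print(f"{rate_mbit:.0f} Mbit/s")      # ~780 Mbit/s, close to the 778 Mbit/s read rate
```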
25
PCI Activity: Read Throughput
  • Flat, then 1/t dependence
  • 860 Mbit/s for read blocks > 262,144 bytes
  • CPU load 20%
  • Concern about the CPU load needed to drive a Gigabit link

26
Helping Real Users 2: HEP - BaBar & CMS
Application Throughput
27
BaBar Case Study: Disk Performance
  • BaBar disk server
  • Tyan Tiger S2466N motherboard
  • 1 × 64-bit 66 MHz PCI bus
  • Athlon MP2000 CPU
  • AMD-760 MPX chipset
  • 3Ware 7500-8 RAID5
  • 8 × 200 GB Maxtor IDE 7200 rpm disks
  • Note the VM parameter readahead max
  • Disk to memory (read): max throughput 1.2 Gbit/s (150 MBytes/s)
  • Memory to disk (write): max throughput 400 Mbit/s (50 MBytes/s) - not as fast as RAID0

28
BaBar & Serial ATA RAID Controllers
  • 3Ware, 66 MHz PCI
  • ICP, 66 MHz PCI

29
BaBar Case Study: RAID Throughput & PCI Activity
  • 3Ware 7500-8 RAID5, parallel EIDE
  • 3Ware forces the PCI bus to 33 MHz
  • BaBar Tyan to MB-NG SuperMicro, network mem-mem: 619 Mbit/s
  • Disk-to-disk throughput with bbcp: 40-45 MBytes/s (320 - 360 Mbit/s)
  • PCI bus effectively full!

Read from RAID5 Disks
Write to RAID5 Disks
30
MB-NG SuperJANET4 Development Network: BaBar Case Study
  • Status / Tests
  • Manchester host has the DataTAG TCP stack
  • RAL host now available
  • BaBar - BaBar mem-mem
  • BaBar - BaBar real data over MB-NG
  • BaBar - BaBar real data over SJ4
  • mbng - mbng real data over MB-NG
  • mbng - mbng real data over SJ4
  • Different TCP stacks already installed

31
Study of Applications: MB-NG & SuperJANET4 Development Network
32
24 Hours HighSpeed TCP mem-mem
  • TCP mem-mem, lon2 - man1
  • Tx 64, Tx-abs 64
  • Rx 64, Rx-abs 128
  • 941.5 ± 0.5 Mbit/s

33
Gridftp Throughput HighSpeedTCP
  • Int Coal 64 128
  • Txqueuelen 2000
  • TCP buffer 1 M byte(rttBW 750kbytes)
  • Interface throughput
  • Acks received
  • Data moved
  • 520 Mbit/s
  • Same for B2B tests
  • So its not that simple!
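One way to read those buffer numbers: assuming the path runs at roughly 1 Gbit/s (hedged - the link rate is not stated explicitly on this slide), rtt × BW = 750 kBytes implies an RTT of about 6 ms, and the 1 MByte buffer already supports more than line rate. The 520 Mbit/s ceiling is therefore not a window limit.

```python
# Is the TCP buffer the bottleneck?  The rate a window of W bytes can sustain
# over round-trip time rtt is W * 8 / rtt.  Link rate is an assumed value.
link_bps = 1e9                           # assumed ~1 Gbit/s path
bdp_bytes = 750e3                        # rtt x BW figure from the slide
buffer_bytes = 1e6                       # configured TCP buffer
rtt_s = bdp_bytes * 8 / link_bps         # implied RTT: 6 ms
buffer_limited_bps = buffer_bytes * 8 / rtt_s
print(f"rtt {rtt_s*1e3:.0f} ms, buffer-limited rate {buffer_limited_bps/1e9:.2f} Gbit/s")
```

Since the buffer-limited rate exceeds the link rate, the 520 Mbit/s observed (and the same figure back-to-back) points to an end-system limit, not the window - hence "it's not that simple".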

34
Gridftp Throughput & Web100
  • Throughput in Mbit/s
  • See alternating 600/800 Mbit/s and zero
  • Cwnd is smooth
  • No duplicate ACKs / send stalls / timeouts

35
HTTP data transfers with HighSpeed TCP
  • Apache web server, out of the box!
  • Prototype client using the curl HTTP library
  • 1 MByte TCP buffers
  • 2 GByte file
  • Throughput 72 MBytes/s
  • Cwnd: some variation
  • No duplicate ACKs / send stalls / timeouts

36
More Information: Some URLs
  • MB-NG project web site: http://www.mb-ng.net/
  • DataTAG project web site: http://www.datatag.org/
  • UDPmon / TCPmon kit write-up: http://www.hep.man.ac.uk/rich/net
  • Motherboard and NIC tests: www.hep.man.ac.uk/rich/net/nic/GigEth_tests_Boston.ppt and http://datatag.web.cern.ch/datatag/pfldnet2003/
  • TCP tuning information may be found at http://www.ncne.nlanr.net/documentation/faq/performance.html and http://www.psc.edu/networking/perf_tune.html