Title: Investigating the Network Performance of Remote Real-Time Computing Farms For ATLAS Trigger DAQ.
1 Investigating the Network Performance of Remote Real-Time Computing Farms for ATLAS Trigger DAQ
Richard Hughes-Jones, University of Manchester
In collaboration with Bryan Caron (University of Alberta), Krzysztof Korcyl (IFJ PAN Krakow), Catalin Meirosu (Politehnica University of Bucuresti / CERN) and Jakob Langgard Nielsen (Niels Bohr Institute)
2 Introduction
- Poster: On the potential use of Remote Computing Farms in the ATLAS TDAQ System
3 ATLAS Computing Model
[Diagram: the tiered ATLAS computing model, with approximate data rates and capacities]
- Detector to Level-1 trigger: ~PByte/s; Event Builder to Event Filter (~7.5 MSI2k, where one 2004 PC ≈ 1 kSpecInt2k): ~10 GByte/s
- Event Filter to Tier 0: ~320 MByte/s; the CERN centre (Castor, PBytes of disk, tape robot) stores ~5 PByte/year, with no simulation load
- Tier-1 regional centres (UK RAL, US, French and Dutch centres) with mass storage: 75 MB/s per Tier 1 for ATLAS, ~2 PByte/year per Tier 1
- Tier-2 centres (~200 kSI2k each; e.g. Lancaster (~0.25 TIPS), Sheffield, Manchester and Liverpool) with physics data caches: 622 Mbit/s - 1 Gbit/s links, ~200 TByte/year per Tier 2
- Desktops connect over 100-1000 MB/s links
The model rests on:
- a high-bandwidth network
- many processors
- experts at the remote sites
4 Remote Computing Concepts
[Diagram: event flow from the experimental area to local and remote farms]
- In the experimental area, the ATLAS detectors feed the Level-1 trigger; the Level-2 trigger and the Event Builders follow
- Local event processing farms and mass storage are in CERN B513
- Remote event processing farms in Copenhagen, Edmonton, Krakow and Manchester are reached over GÉANT and dedicated lightpaths
5 ATLAS Remote Farms Network Connectivity
6 ATLAS Application Protocol
- Event request
  - The EFD requests an event from the SFI
  - The SFI replies with the event (~2 Mbytes)
- Processing of the event
- Return of the computation
  - The EF asks the SFO for buffer space
  - The SFO sends OK
  - The EF transfers the results of the computation
- tcpmon: an instrumented TCP request-response program that emulates the Event Filter EFD-to-SFI communication (a sketch of such a client follows below)
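
The EFD-SFI exchange above is a plain request-response pattern over TCP. Below is a minimal sketch of a tcpmon-style client in Python; the host, port and fixed message sizes are illustrative assumptions, and the real EFD/SFI messages carry their own framing rather than raw padding.

    # Minimal tcpmon-style request-response client (sketch). The endpoint
    # and message sizes are illustrative assumptions.
    import socket, time

    SFI_HOST, SFI_PORT = "sfi.example.cern.ch", 10000   # hypothetical endpoint
    REQUEST_SIZE, RESPONSE_SIZE = 64, 1_000_000         # 64-byte request, 1-Mbyte event

    def request_event(sock):
        """Send one fixed-size request, then read back one full event."""
        t0 = time.perf_counter()
        sock.sendall(b"R" * REQUEST_SIZE)
        received = 0
        while received < RESPONSE_SIZE:
            chunk = sock.recv(65536)
            if not chunk:
                raise ConnectionError("SFI closed the connection early")
            received += len(chunk)
        return time.perf_counter() - t0

    with socket.create_connection((SFI_HOST, SFI_PORT)) as sock:
        for i in range(10):
            dt = request_event(sock)
            print(f"event {i}: {dt*1000:.1f} ms, "
                  f"{RESPONSE_SIZE * 8 / dt / 1e6:.0f} Mbit/s")

Timing each request-response cycle in this way is what produces the latency and throughput series shown on the later slides.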
8 End Hosts and NICs, CERN-nat-Manchester: Throughput, Packet Loss, Re-Ordering
- Use UDP packets to characterise the host, NIC and network
  - SuperMicro P4DP8 motherboard
  - Dual Xeon 2.2 GHz CPUs
  - 400 MHz system bus
  - 64-bit 66 MHz PCI / 133 MHz PCI-X bus
- Request-response latency
- Findings:
  - The network can sustain 1 Gbit/s of UDP traffic
  - The average server can lose smaller packets
  - Packet loss is caused by lack of processing power in the PC receiving the traffic
  - Out-of-order packets are due to WAN routers
  - Lightpaths look like extended LANs and have no re-ordering
(A sketch of a UDPmon-style loss/re-order test follows below.)
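
A minimal sketch of a UDPmon-style test in Python: the sender emits evenly spaced packets carrying sequence numbers, and the receiver counts what arrives to estimate loss and detect re-ordering. The port, packet count and spacing are illustrative assumptions; the real UDPmon paces packets far more precisely than time.sleep() can.

    import socket, struct, sys, time

    PORT, NPACKETS, PKT = 14196, 10000, 1472   # illustrative values

    def send(dest):
        """Sender: evenly spaced packets, each carrying a sequence number."""
        s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        for seq in range(NPACKETS):
            s.sendto(struct.pack("!I", seq).ljust(PKT, b"\0"), (dest, PORT))
            time.sleep(12e-6)   # ~12 us spacing ~ 1 Gbit/s for 1472-byte frames

    def receive():
        """Receiver: count packets seen, estimate loss, detect re-ordering."""
        s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        s.bind(("", PORT))
        s.settimeout(5.0)
        seen = reordered = 0
        last = -1
        try:
            while seen < NPACKETS:
                data, _ = s.recvfrom(65536)
                (seq,) = struct.unpack_from("!I", data)
                if seq < last:
                    reordered += 1   # arrived behind a later-numbered packet
                last = max(last, seq)
                seen += 1
        except socket.timeout:
            pass                     # assume the sender has finished
        # loss estimate assumes the highest-numbered packet arrived
        print(f"received {seen}, lost ~{last + 1 - seen}, re-ordered {reordered}")

    if __name__ == "__main__":
        receive() if sys.argv[1] == "recv" else send(sys.argv[1])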
9 Using Web100 TCP Stack Instrumentation to analyse the application protocol: tcpmon
10 tcpmon: TCP Activity, Manc-CERN Req-Resp
- Round-trip time 20 ms
- 64-byte request (green), 1-Mbyte response (blue)
- TCP in slow start
- The 1st event takes 19 rtt, or 380 ms
11 tcpmon: TCP Activity, Manc-CERN Req-Resp, TCP stack tuned
- Round-trip time 20 ms
- 64-byte request (green), 1-Mbyte response (blue)
- TCP starts in slow start; the 1st event takes 19 rtt, or 380 ms
- The TCP congestion window grows nicely
- A response takes 2 rtt after 1.5 s
- Rate 10/s (with a 50 ms wait)
- The achievable transfer throughput grows to 800 Mbit/s
12 tcpmon: TCP Activity, Alberta-CERN Req-Resp, TCP stack tuned
- Round-trip time 150 ms
- 64-byte request (green), 1-Mbyte response (blue)
- TCP starts in slow start; the 1st event takes 11 rtt, or 1.67 s
- The TCP congestion window is in slow start up to 1.8 s, then in congestion avoidance
- A response takes 2 rtt after 2.5 s
- Rate 2.2/s (with a 50 ms wait)
- The achievable transfer throughput grows slowly from 250 to 800 Mbit/s
(A rough slow-start model follows below.)
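
A rough model of why the first event is so slow: during slow start the congestion window roughly doubles each rtt (or grows about 1.5x per rtt with delayed ACKs), so the first 1-Mbyte response costs many round trips regardless of link speed. The sketch below, with an assumed initial window and assumed growth factors, reproduces the order of magnitude of the measured 19 rtt; connection setup and the request/response turnarounds add further round trips on top.

    # Rough slow-start model (sketch): how many rtts to deliver a 1-Mbyte
    # response? The initial window and per-rtt growth factors are assumptions.
    MSS = 1448                 # segment payload seen in the tcpdump traces
    EVENT = 1_000_000          # 1-Mbyte event
    for growth in (2.0, 1.5):  # without / with delayed ACKs, roughly
        cwnd, sent, rtts = 2 * MSS, 0, 0
        while sent < EVENT:
            sent += cwnd       # one window of data per round trip
            cwnd *= growth
            rtts += 1
        print(f"growth x{growth}: ~{rtts} rtts "
              f"(~{rtts * 20} ms at a 20 ms rtt)")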
13 SC2004 Disk-Disk bbftp
- The bbftp file transfer program uses TCP/IP
- UKLight path London-Chicago-London; PCs: Supermicro with 3Ware RAID0
- MTU 1500 bytes, socket size 22 Mbytes, rtt 177 ms, SACK off
- Moving a 2-Gbyte file; Web100 plots
- Standard TCP: average 825 Mbit/s (bbcp: 670 Mbit/s)
- Scalable TCP: average 875 Mbit/s (bbcp: 701 Mbit/s, 4.5 s of overhead)
- Disk-TCP-Disk at 1 Gbit/s is here!
14 Time Series of Request-Response Latency
- Manchester-CERN
  - Round-trip time 20 ms
  - 1 Mbyte of data returned
  - Stable for 18 s at 42.5 ms
  - Then alternate points at 29 and 42.5 ms
- Alberta-CERN
  - Round-trip time 150 ms
  - 1 Mbyte of data returned
  - Stable for 150 s at 300 ms
  - Falls to 160 ms with 80 µs variation
15 Using the Trigger DAQ Application
16 Time Series of T/DAQ Event Rate
- Manchester-CERN
  - Round-trip time 20 ms
  - 1 Mbyte of data returned
  - 3 nodes: one Gigabit Ethernet node and two 100 Mbit nodes
  - 2 nodes: two 100 Mbit nodes
  - 1 node: one 100 Mbit node
- Event rate
  - Using the tcpmon transfer time of 42.5 ms and adding the time to return the data gives a 95 ms cycle
  - Expected rate: 10.5/s (see the arithmetic below)
  - Observed: 6/s for the gigabit node
  - Reason: the TCP buffers could not be set large enough in the T/DAQ application
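
The expected rate follows from simple arithmetic on the measured times. A sketch (the split of the 95 ms cycle into transfer and return phases is inferred, not measured separately):

    # Expected event rate from the slide's numbers (sketch).
    transfer = 0.0425   # s, 1-Mbyte transfer time measured with tcpmon
    cycle    = 0.095    # s, total per-event cycle including the return
    print(f"return phase ~{(cycle - transfer)*1000:.1f} ms")
    print(f"expected rate {1/cycle:.1f} events/s")   # ~10.5/s, as on the slide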
17 tcpdump of the Trigger DAQ Application
18 tcpdump of the T/DAQ dataflow at the SFI (1)
- CERN-Manchester, 1.0-Mbyte event; the remote EFD requests the event from the SFI
- Incoming event request, followed by the ACK
- The SFI sends the event, limited by the TCP receive buffer: time 115 ms (4 events/s)
- When TCP ACKs arrive, more data is sent
- N × 1448-byte packets
19 tcpdump of TCP Slow Start at the SFI (2)
- CERN-Manchester, 1.0-Mbyte event; the remote EFD requests the event from the SFI
- First event request
- The SFI sends the event, limited by TCP slow start: time 320 ms
- N × 1448-byte packets
- When ACKs arrive, more data is sent
20 tcpdump of the T/DAQ dataflow for SFI and SFO
- CERN-Manchester, another test run
- 1.0-Mbyte event
- The remote EFD requests events from the SFI
- The remote EFD sends the computation back to the SFO
- Links are closed by the application
- [Plot annotations: link setup, TCP slow start]
(A sketch for post-processing such traces follows below.)
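
A sketch of post-processing a tcpdump text trace into the kind of time-sequence view shown on these slides, extracting the timestamp and TCP sequence range of each data segment. It assumes output in the style of a recent tcpdump run with -tt -n; older tcpdump versions print sequence numbers in a different format.

    import re, sys

    # Matches e.g. "1101234567.123456 IP 10.0.0.1.32770 > 10.0.0.2.10000:
    # Flags [.], seq 1:1449, ack 1, win 5840, length 1448"
    pat = re.compile(r"^(\d+\.\d+) IP .* seq (\d+):(\d+)")

    t0 = None
    for line in open(sys.argv[1]):          # text trace from tcpdump -tt -n
        m = pat.match(line)
        if not m:
            continue                        # skip ACK-only and non-TCP lines
        t, lo, hi = float(m.group(1)), int(m.group(2)), int(m.group(3))
        t0 = t if t0 is None else t0
        print(f"{(t - t0) * 1000:10.3f} ms  bytes {lo}-{hi}  len {hi - lo}")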
21 Some First Conclusions
- The TCP protocol dynamics strongly influence the behaviour of the application.
- Care is required with the application design, e.g. the use of timeouts.
- With the correct TCP buffer sizes:
  - It is not throughput but the round-trip nature of the application protocol that determines performance.
  - Requesting the 1-2 Mbytes of data takes 1 or 2 round trips.
  - TCP slow start (the opening of Cwnd) considerably lengthens the time for the first block of data.
  - Implementation "improvements" (Cwnd reduction) kill performance!
- When the TCP buffer sizes are too small (the default):
  - The amount of data sent is limited on each rtt.
  - Data is sent, and arrives, in bursts.
  - It takes many round trips to send 1 or 2 Mbytes.
- The end hosts themselves:
  - CPU power is required for the TCP/IP stack as well as for the application.
  - Packets can be lost in the IP stack due to lack of processing power.
(A sketch of setting the TCP buffer sizes follows below.)
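
A minimal sketch of the buffer-size fix in Python: setting the socket buffers to the bandwidth-delay product before the connection is established. The 1 Gbit/s x 150 ms path is the Alberta-CERN assumption; note that Linux silently caps these values at net.core.rmem_max / wmem_max, so the kernel limits must be raised as well.

    import socket

    BDP = int(1e9 / 8 * 0.150)    # ~18.75 Mbytes: 1 Gbit/s x 150 ms rtt

    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # Buffers must be set before connect()/accept() so that the TCP window
    # scale option is negotiated to match.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, BDP)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, BDP)
    print("sndbuf:", sock.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF),
          "rcvbuf:", sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF))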
22 Summary
- We are investigating the technical feasibility of remote real-time computing for ATLAS.
- We have exercised multiple 1 Gbit/s connections between CERN and universities located in Canada, Denmark, Poland and the UK.
- Network providers are very helpful and interested in our experiments.
- We have developed a set of tests for characterising the network connections.
- Network behaviour is generally good, e.g. little packet loss is observed.
- Backbones tend to be over-provisioned; however, access links and campus LANs need care.
- Properly configured end nodes are essential for getting good results with real applications.
- Collaboration between the experts from the application and network teams is progressing well, and is required to achieve performance.
- Although the application is ATLAS-specific, the information presented on the network interactions is applicable to other areas, including:
  - Remote iSCSI
  - Remote database accesses
  - Real-time Grid computing, e.g. real-time interactive medical image processing
23 Thanks to all who helped, including:
- National research networks: Canarie, Dante, DARENET, Netera, PSNC and UKERNA
- ATLAS remote farms: J. Beck Hansen, R. Moore, R. Soluk, G. Fairey, T. Bold, A. Waananen, S. Wheeler, C. Bee
- ATLAS online and dataflow software: S. Kolos, S. Gadomski, A. Negri, A. Kazarov, M. Dobson, M. Caprini, P. Conde, C. Haeberli, M. Wiesmann, E. Pasqualucci, A. Radu
24 More Information: Some URLs
- Real-Time Remote Farm site: http://csr.phys.ualberta.ca/real-time
- UKLight web site: http://www.uklight.ac.uk
- DataTAG project web site: http://www.datatag.org/
- UDPmon / TCPmon kit and writeup: http://www.hep.man.ac.uk/rich/ (Software Tools)
- Motherboard and NIC tests: http://www.hep.man.ac.uk/rich/net/nic/GigEth_tests_Boston.ppt and http://datatag.web.cern.ch/datatag/pfldnet2003/
- "Performance of 1 and 10 Gigabit Ethernet Cards with Server Quality Motherboards", FGCS special issue, 2004: http://www.hep.man.ac.uk/rich/ (Publications)
- TCP tuning information: http://www.ncne.nlanr.net/documentation/faq/performance.html and http://www.psc.edu/networking/perf_tune.html
- TCP stack comparisons: "Evaluation of Advanced TCP Stacks on Fast Long-Distance Production Networks", Journal of Grid Computing, 2004: http://www.hep.man.ac.uk/rich/ (Publications)
- PFLDnet: http://www.ens-lyon.fr/LIP/RESO/pfldnet2005/
- Dante PERT: http://www.geant2.net/server/show/nav.00d00h002
27 End Hosts and NICs, CERN-Manchester: Throughput, Packet Loss, Re-Ordering
- Use UDP packets to characterise the host and NIC
  - SuperMicro P4DP8 motherboard
  - Dual Xeon 2.2 GHz CPUs
  - 400 MHz system bus
  - 66 MHz 64-bit PCI bus
- Request-response latency
28 TCP (Reno) Details
- The time for TCP to recover its throughput after one lost packet is τ = C · rtt² / (2 · MSS), where C is the link capacity and MSS the maximum segment size.
- [Plot: recovery time versus rtt, with example round-trip times marked: UK 6 ms, Europe 20 ms, USA 150 ms; for an rtt of 200 ms the recovery time is about 2 min]
(The formula is evaluated numerically below.)
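
Evaluating the recovery-time formula numerically, as a sketch; the capacities and the MSS are assumed values, chosen to bracket the links discussed on these slides:

    # Reno recovery time tau = C * rtt^2 / (2 * MSS) for one lost packet.
    MSS = 1460                             # bytes, an assumed segment size
    for gbps in (0.1, 1.0):                # 100 Mbit/s and 1 Gbit/s
        C = gbps * 1e9 / 8                 # capacity in bytes/s
        for rtt_ms in (6, 20, 150, 200):   # UK, Europe, USA, slide example
            rtt = rtt_ms / 1000
            tau = C * rtt ** 2 / (2 * MSS) # time to recover from one loss
            print(f"{gbps:3} Gbit/s, rtt {rtt_ms:3} ms: recovery {tau:7.1f} s")

The quadratic dependence on rtt is the point: a loss that costs a fraction of a second on a UK path costs minutes on a transatlantic one.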