1
PHENIX Computing Center in Japan (CC-J)
  • Takashi Ichihara
  • (RIKEN and RIKEN BNL Research Center)
  • Presented on 08/02/2000 at CHEP2000
    conference, Padova, Italy

2
Contents
  • 1. Overview
  • 2. Concept of the system
  • 3. System Requirements
  • 4. Other Requirements as a Regional Computing Center
  • 5. Plan and current status
  • 6. WG for constructing the CC-J (CC-J WG)
  • 7. Current configuration of the CC-J
  • 8. Photographs of the CC-J
  • 9. Linux CPU farm
  • 10. Linux NFS performance vs. kernel
  • 11. HPSS current configuration
  • 12. HPSS performance test
  • 13. WAN performance test
  • 14. Summary

3
PHENIX CC-J Overview
  • PHENIX Regional Computing Center in Japan (CC-J)
    at RIKEN
  • Scope
  • Principal site of computing for PHENIX simulation
  • PHENIX CC-J aims to cover most of the simulation tasks of the entire PHENIX experiment
  • Regional Asian computing center
  • Center for the analysis of RHIC spin physics
  • Architecture
  • Essentially follows the architecture of the RHIC Computing Facility (RCF) at BNL
  • Construction
  • R&D for the CC-J started in April 1998 at RBRC
  • Construction began in April 1999, extending over a three-year period
  • A 1/3-scale CC-J will be operational in April 2000

4
Concept of the CC-J System
5
System Requirements for the CC-J
  • Annual data amount
  • DST: 150 TB
  • micro-DST: 45 TB
  • Simulated data: 30 TB
  • Total: 225 TB
  • Hierarchical storage system
  • Handles a data volume of 225 TB/year
  • Total I/O bandwidth: 112 MB/s
  • HPSS system
  • Disk storage system
  • 15 TB capacity
  • All-RAID system
  • I/O bandwidth: 520 MB/s
  • CPU (SPECint95)
  • Simulation: 8200
  • Sim. reconst.: 1300
  • Sim. ana.: 170
  • Theor. model: 800
  • Data analysis: 1000
  • Total: 11470 (see the tally sketch after this list)
  • Data duplication facility
  • Export/import of DST and simulated data
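As a quick consistency check, the totals above can be tallied directly from the per-item figures. A minimal sketch in Python, for illustration only (all numbers are taken from this slide):

    # Tally of the CC-J requirements quoted on this slide.
    data_tb = {"DST": 150, "micro-DST": 45, "Simulated data": 30}
    cpu_specint95 = {
        "Simulation": 8200,
        "Sim. reconst.": 1300,
        "Sim. ana.": 170,
        "Theor. model": 800,
        "Data analysis": 1000,
    }

    print(f"Annual data total: {sum(data_tb.values())} TB")           # 225 TB
    print(f"CPU total: {sum(cpu_specint95.values())} SPECint95")      # 11470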

6
Other Requirements as a Regional Computing Center
  • Software Environment
  • The software environment of the CC-J should be compatible with the PHENIX Offline Software environment at the RHIC Computing Facility (RCF) at BNL
  • AFS accessibility (/afs/rhic)
  • Objectivity/DB accessibility (replication to be
    tested soon)
  • Data Accessibility
  • Need to exchange 225 TB/year of data with RCF
  • Most of the data exchange will be done with SD3 tape cartridges (50 GB/volume)
  • Some of the data exchange will be done over the WAN
  • CC-J will use the Asia-Pacific Advanced Network (APAN) for the US-Japan connection
  • http://www.apan.net/
  • APAN currently has 70 Mbps bandwidth for the Japan-US connection
  • Expecting that 10-30% of the APAN bandwidth (7-21 Mbps) can be used for this project
  • 75-230 GB/day (27-82 TB/year) will be transferred over the WAN (see the estimate sketch after this list)
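The WAN transfer estimate above is a simple unit conversion. A minimal sketch in Python, assuming the link is used continuously at the stated share of APAN's 70 Mbps Japan-US bandwidth:

    # Back-of-envelope check of the WAN transfer estimate quoted above.
    APAN_MBPS = 70.0

    for share in (0.10, 0.30):                  # 10-30% of the APAN bandwidth
        mbps = APAN_MBPS * share                # 7-21 Mbps
        gb_per_day = mbps / 8 * 86400 / 1000    # Mbps -> MB/s -> GB/day
        tb_per_year = gb_per_day * 365 / 1000
        print(f"{mbps:4.1f} Mbps -> {gb_per_day:6.1f} GB/day -> {tb_per_year:5.1f} TB/year")
    # Prints roughly 75-230 GB/day, i.e. about 27-82 TB/year, as quoted above.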

7
Plan and current status of the CC-J
8
Working Group for the CC-J construction (CC-J WG)
  • The CC-J WG is the main body for constructing the CC-J
  • Holds regular bi-weekly meetings at RIKEN Wako to discuss technical items, project plans, etc.
  • A mailing list of the CC-J WG has been created (mail traffic: about 1600 mails/year)

9
Current configuration of the CC-J
10
Photographs of the PHENIX CC-J at RIKEN
11
Linux CPU farms
  • Memory requirement: 200-300 MB/CPU for a simulation chain
  • Node specification
  • Motherboard: ASUS P2B
  • Dual CPU/node (currently 64 CPUs in total)
  • Pentium II (450 MHz): 32 CPUs, Pentium III (600 MHz): 32 CPUs
  • 512 MB memory/node (1 GB swap/node)
  • 14 GB HD/node (system 4 GB, work 10 GB)
  • 100BaseT Ethernet interface (DECchip Tulip)
  • Linux Red Hat 5.2 (kernel 2.2.11 with nfsv3 patch)
  • Portable Batch System (PBS V2.1) for batch queuing (a submission sketch follows this list)
  • AFS is accessed through NFS (no AFS client is installed on the Linux PCs)
  • Daily mirroring of the /afs/rhic contents to a local disk file system is carried out
  • PC assembly: Alta cluster
  • Remote hardware reset/power control, remote CPU temperature monitoring
  • Serial-port login from the next node (minicom) for maintenance (fsck etc.)
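A minimal sketch of submitting one simulation chain to the PBS batch system mentioned above. PBS's qsub command and resource directives are standard, but the job name, script contents, and resource values here are hypothetical illustrations, not taken from the slide:

    # Hypothetical PBS job for one simulation chain on a dual-CPU farm node.
    # nodes=1:ppn=2 asks for one node with both CPUs; mem=512mb matches the
    # ~200-300 MB/CPU figure quoted above.
    import subprocess
    import textwrap

    job = textwrap.dedent("""\
        #PBS -N phenix_sim
        #PBS -l nodes=1:ppn=2
        #PBS -l mem=512mb
        cd $PBS_O_WORKDIR
        ./run_simulation_chain.sh
        """)

    with open("phenix_sim.pbs", "w") as fh:
        fh.write(job)

    subprocess.run(["qsub", "phenix_sim.pbs"], check=True)   # submit to PBS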

12
Linux NFS performance vs. kernel
  • NFS performance test using the bonnie benchmark with a 2 GB file (an example invocation follows this list)
  • NFS server: Sun Enterprise 450 (Solaris 2.6), 4 CPUs (400 MHz), 1 GB memory
  • NFS client: Linux RH5.2, dual Pentium II (600 MHz), 512 MB memory
  • NFS performance of recent Linux kernels seems to be improved
  • The nfsv3 patch is still useful for the recent kernel (2.2.14)
  • We are currently using kernel 2.2.11 with the nfsv3 patch
  • The nfsv3 patch is available from http://www.fys.uio.no/~trondmy/src/
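For reference, a sketch of how such a bonnie run could be driven from Python. The NFS mount point and label are hypothetical; -d (test directory), -s (file size in MB) and -m (machine label) are standard bonnie options:

    # Run the bonnie benchmark with a 2 GB file on an NFS-mounted directory.
    import subprocess

    subprocess.run(
        [
            "bonnie",
            "-d", "/nfs/e450/bench",       # hypothetical NFS mount from the E450 server
            "-s", "2048",                  # 2 GB test file, as in the measurement above
            "-m", "linux-2.2.14-nfsv3",    # label for the kernel/patch under test
        ],
        check=True,
    )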

13
Current HPSS hardware configuration
  • IBM RS6000-SP
  • 5 nodes (silver nodes: quadruple PowerPC 604e 332 MHz CPUs per node)
  • Core server: 1, Disk movers: 2, Tape movers: 2
  • SP switch (300 MB/s) and 1000BaseSX NIC (OEM of
    Alteon)
  • A StorageTek Powderhorn Tape Robot
  • 4 Redwood drives and 2000 SD3 cartridges (100 TB) dedicated to HPSS
  • Sharing the robot with other HSM systems
  • 6 drives and 3000 cartridges for other HSM
    systems
  • Gigabit Ethernet
  • Alteon ACE180 switch for jumbo frames (9 kB MTU)
  • Use of jumbo frames reduces the CPU utilization for transfers
  • Cisco Catalyst 2948G for distribution to 100BaseT
  • Cache disk: 700 GB (total), 5 components (capacities tallied in the sketch below)
  • 3 SSA loops (50 GB each)
  • 2 FW-SCSI RAID (270 GB each)
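A quick check of the capacity figures quoted above (illustration only; all numbers come from this slide):

    # Capacity tally for the HPSS configuration.
    tape_tb = 2000 * 50 / 1000          # 2000 SD3 cartridges x 50 GB each -> 100 TB
    cache_gb = 3 * 50 + 2 * 270         # 3 SSA loops + 2 FW-SCSI RAIDs    -> 690 GB
    print(f"Dedicated tape capacity: {tape_tb:.0f} TB")
    print(f"Cache disk capacity: {cache_gb} GB (quoted as 700 GB total)")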

14
Performance test of parallel ftp (pftp) of HPSS
  • pput from the Sun E450: 12 MB/s for one pftp connection
  • Gigabit Ethernet, jumbo frames (9 kB MTU)
  • pput from Linux: 6 MB/s for one pftp connection
  • 100BaseT - Gigabit Ethernet jumbo frames (defragmentation on a switch)
  • In total, ~50 MB/s pftp performance was obtained for pput (see the aggregation sketch after this list)
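How the per-connection rates add up to the ~50 MB/s aggregate is sketched below. The mix of concurrent connections is hypothetical; the slide only quotes the per-connection rates and the total:

    # Illustrative aggregation of concurrent pftp pput streams.
    per_conn_mb_s = {"Sun E450 (GigE, jumbo)": 12, "Linux node (100BaseT)": 6}
    streams = {"Sun E450 (GigE, jumbo)": 2, "Linux node (100BaseT)": 4}   # example mix

    total = sum(per_conn_mb_s[k] * n for k, n in streams.items())
    print(f"Aggregate pput estimate: {total} MB/s")   # 2*12 + 4*6 = 48, i.e. ~50 MB/s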

15
WAN performance test
  • RIKEN (12 Mbps) - IMnet - APAN (70 Mbps) - STAR TAP - ESnet - BNL
  • Round-trip time for RIKEN-BNL: 170 ms
  • The file transfer rate was 47 kB/s for an 8 kB TCP window size (the Solaris default); see the window/RTT sketch after this list
  • A large TCP window size is necessary to obtain a high transfer rate
  • RFC 1323 (TCP Extensions for High Performance, May 1992) describes the method of using a large TCP window size (> 64 kB)
  • A high ftp transfer rate (641 kB/s ≈ 5 Mbps) was obtained for a single ftp connection using a large TCP window size (512 kB) over the Pacific Ocean (RTT 170 ms)
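The 47 kB/s figure is just the ceiling imposed by the TCP window size, throughput <= window / RTT. A quick check with the slide's numbers:

    # TCP throughput ceiling set by the window size: rate <= window / RTT.
    RTT_S = 0.170                       # RIKEN-BNL round-trip time in seconds

    for window_kb in (8, 64, 512):      # Solaris default, RFC 1323 threshold, test value
        print(f"{window_kb:4d} kB window -> at most {window_kb / RTT_S:7.1f} kB/s")

    # 8 kB   ->   ~47 kB/s ceiling, matching the Solaris-default measurement
    # 512 kB -> ~3012 kB/s ceiling; 641 kB/s (~5 Mbps) was achieved in practice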

16
Summary
  • The construction of the PHENIX Computing Center in Japan (CC-J) at the RIKEN Wako campus, which will extend over a three-year period, began in April 1999.
  • The CC-J is intended as the principal site of computing for PHENIX simulation, a regional PHENIX Asian computing center, and a center for the analysis of RHIC spin physics.
  • The CC-J will handle about 220 TB of data per year, and the total CPU performance is planned to reach 10,000 SPECint95 in 2002.
  • The CPU farm of 64 processors (RH5.2, kernel 2.2.11 with nfsv3 patch) is stable.
  • About 50 MB/s pftp performance was obtained for
    HPSS access.
  • A high ftp transfer rate (641 kB/s ≈ 5 Mbps) was obtained for a single ftp connection using a large TCP window size (512 kB) over the Pacific Ocean (RTT 170 ms).
  • Stress tests for the entire system were carried
    out successfully.
  • Replication of the Objectivity/DB over the WAN
    will be tested soon.
  • CC-J operation will start in April 2000.