Title: Converting ASGARD into a MC-Farm for Particle Physics
Converting ASGARD into a MC-Farm for Particle Physics
- Beowulf-Day, 17.01.05
- A. Biland, IPP/ETHZ
Beowulf Concept
Three main components:
- CPU nodes
- Network
- Fileserver
How much of the (limited) money should be spent on which component?
Beowulf Concept
Intended (main) usage
"Eierlegende Woll-Milch-Sau" (egg-laying wool-milk-sow, i.e. one size fits everything): put an equal amount of money into each component → OK for (almost) any possible use, but a waste of money for most applications.
Beowulf Concept
Intended (main) usage
CPU-bound jobs with limited I/O and inter-CPU communication
Budget split: 80% CPU nodes, 10% network, 10% fileserver
Examples: ASGARD, HREIDAR-I
Beowulf Concept
Intended (main) usage
Jobs with high inter-CPU communication needs (parallel processing)
Budget split: 50% CPU nodes, 40% network, 10% fileserver
Example: HREIDAR-II
Beowulf Concept
Intended (main) usage
Jobs with high I/O needs or large datasets (data analysis)
Budget split: 50% CPU nodes, 10% network, 40% fileserver
Fileserver Problems
a) Speed (parallel access)
Inexpensive fileservers reach a disk I/O of about 50 MB/s. Shared among 500 single-CPU jobs: 50 MB/s / 500 jobs = 100 kB/s per job (an upper limit; typical values reached are much smaller; checked below).
Using several fileservers in parallel:
- difficult data management (where is which file?) → use parallel filesystems?
- hot spots (all jobs want to access the same dataset) → data replication?
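A back-of-the-envelope check of the per-job figure (a minimal Python sketch; the 50 MB/s and 500-job numbers are the ones quoted above, decimal units assumed):

    # Aggregate fileserver disk I/O, shared by all concurrently running jobs.
    server_io_mb_s = 50.0   # MB/s, figure quoted above
    n_jobs = 500            # concurrent single-CPU jobs
    per_job_kb_s = server_io_mb_s * 1000.0 / n_jobs
    print(f"{per_job_kb_s:.0f} kB/s per job")  # -> 100 kB/s, the upper limit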
Fileserver Problems
a) Speed (parallel access)
How (not) to read/write the data:
Bad: NFS (constant transfer of small chunks of data) → constant disk repositioning → disk I/O drops towards 0. Somewhat improved with a large in-memory cache (>>100 MB), but if the write cache fills up, flushing to disk takes a long time → the server blocks.
OK: rcp (transfer of large blocks from/to the local /scratch; sketched below). But /scratch is rather small on ASGARD, and what happens if many jobs want to transfer at the same time?
Best: the fileserver initiates rcp transfers on request. Requires user discipline and is not very transparent.
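A minimal sketch of the "OK" variant: the job writes locally and ships each finished file in one large rcp transfer instead of streaming small chunks over NFS (the paths and fileserver name are hypothetical; scp would work the same way):

    import os
    import subprocess

    SCRATCH = "/scratch/myjob"         # local node disk (hypothetical path)
    DEST = "fileserver:/work/output/"  # hypothetical fileserver target

    def stage_out(filename):
        """Ship one finished file in a single large transfer, then free
        the scarce local /scratch space immediately."""
        src = os.path.join(SCRATCH, filename)
        subprocess.run(["rcp", src, DEST], check=True)
        os.remove(src)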
Fileserver Problems
b) Capacity
500 jobs producing data, each writing 100 kB/s → 50 MB/s to the fileserver → 4.2 TB / day!
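The same numbers as a worked computation (Python sketch, decimal units assumed; it reproduces the figure above up to rounding):

    # Daily volume if 500 jobs stream their output to the fileserver.
    n_jobs = 500
    rate_kb_s = 100.0                         # kB/s written per job
    total_mb_s = n_jobs * rate_kb_s / 1000.0  # -> 50 MB/s aggregate
    tb_per_day = total_mb_s * 86400 / 1e6     # 86400 s/day, MB -> TB
    print(f"{total_mb_s:.0f} MB/s -> {tb_per_day:.1f} TB/day")  # ~4.3 TB/day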
Particle Physics MC
- Need a huge number of statistically independent events; #events >> #CPUs → embarrassingly parallel problem → 5x 500 MIPS is as good as 1x 2500 MIPS
- Usually two sets of programs:
  a) Simulation: produces huge, very detailed MC files (adapted standard packages: GEANT, CORSIKA, ...)
  b) Reconstruction: reads the MC files, writes much smaller reco files with selected events and physics data (special software developed by each experiment)
- Mass production: only the reco files are needed → combine both tasks in one job and use /scratch (see the sketch below)
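A minimal sketch of such a combined mass-production job ("simulate" and "reconstruct" are hypothetical stand-ins for the GEANT/CORSIKA-based simulation and the experiment's reconstruction code; paths are assumptions):

    import os
    import subprocess

    SCRATCH = "/scratch/run042"  # local node disk, path assumed
    os.makedirs(SCRATCH, exist_ok=True)
    mc_file = os.path.join(SCRATCH, "events.mc")
    reco_file = os.path.join(SCRATCH, "events.reco")

    # 1) Simulation: the huge, detailed MC file stays on local /scratch.
    subprocess.run(["simulate", "--out", mc_file], check=True)

    # 2) Reconstruction: read the MC file, write the much smaller reco file.
    subprocess.run(["reconstruct", mc_file, "--out", reco_file], check=True)

    # 3) Only the small reco file travels to the fileserver.
    subprocess.run(["rcp", reco_file, "fileserver:/work/reco/"], check=True)
    os.remove(mc_file)  # the bulky intermediate is never sent anywhere
    os.remove(reco_file)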
ASGARD Status
10 frames, 24 nodes per frame
Shared filesystems: /home, /work, /arch
Local disk per node: 1 GB /, 1 GB swap, 4 GB /scratch
Needed: fileserver bandwidth and capacity; guaranteed /scratch space per job (see the sketch below)
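One way to approximate the "guaranteed /scratch space per job" requirement is a pre-flight check before a job starts (a sketch; the 2 GB threshold is an assumed figure, not from the slides):

    import shutil
    import sys

    NEED_GB = 2.0  # space this job requires (assumed figure)

    free_gb = shutil.disk_usage("/scratch").free / 1e9
    if free_gb < NEED_GB:
        sys.exit(f"only {free_gb:.1f} GB free on /scratch, need {NEED_GB} GB")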
ASGARD Upgrade
Per frame: 4x 400 GB SATA, RAID-10 (800 GB usable)
10 frames, 24 nodes per frame
Shared filesystems: /home, /work, /arch
Local disk per node: 0.2 GB /, 0.3 GB swap, 2.5 GB /scr1, 2.5 GB /scr2
Adding 10 fileservers (65 kFr), ASGARD can serve two years as an MC farm and GRID testbed.
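A quick consistency check of the upgrade's storage figures (Python sketch, assuming one such RAID set per frame, as the per-frame figure above suggests):

    # RAID-10 stripes over mirrored pairs, so half the raw capacity is usable.
    disks, disk_gb = 4, 400
    usable_gb = disks * disk_gb // 2        # -> 800 GB per frame, as above
    frames = 10
    total_tb = frames * usable_gb / 1000.0  # -> 8 TB across the whole farm
    print(f"{usable_gb} GB/frame, {total_tb:.0f} TB total")

Set against the roughly 4.2 TB/day of raw MC output computed earlier, the 8 TB total underlines why mass production keeps only the compact reco files.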