Title: Performance - Directed Resource Allocation
1Performance - Directed Resource Allocation
- Seung-Hye Jang, Xingfu Wu, Valerie Taylor
- Department of Computer Science
- Texas A&M University
The International Grid Performance Workshop,
Edinburgh, UK, June 22-23, 2005
2Outline
- Motivation and Goals
- Performance Prediction: Prophesy
- Case Studies
- Grid Physics Network (GriPhyN)
- Grid2003 infrastructure
- GEO LIGO pulsar search
- Educational Application
- Multiple servers
- AADMLSS
3Motivation
Galaxy Formation Simulation
4Distributed Systems are available
5Goals
- To efficiently map jobs to appropriate resources
- To present a resource planner that uses
performance prediction based upon historical data
- To select resources to reduce the execution time
of the application
6Performance Prediction
- Prophesy (http://prophesy.cs.tamu.edu)
- Performance analysis and modeling of parallel and
distributed applications
- Three main components
- PAIDE System
- Database
- Model builder
7Prophesy System
[Figure: Prophesy system diagram. Labels: Web-based Prophesy GUI; Data Collection (Actual Execution); Databases (Performance Database); Data Analysis]
8Prophesy Model Builder
- Utilize historical information in the Prophesy
databases
- Performance database
- Template database
- System database
- Three techniques
- Curve Fitting
- Easy to generate the model
- Very few exposed parameters
- Parameterization
- Requires one-time manual analysis
- Exposes many parameters; explore different
system scenarios
- Coupling
- Represents application performance in terms of
its kernels and components
- Builds upon previous techniques: a curve fit for each
kernel, combined into a polynomial function
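Curve fitting, the first of the three techniques, can be illustrated with a least-squares polynomial fit over historical measurements. The sketch below is only an illustration under stated assumptions, not Prophesy's model builder: the names `fit_polynomial` and `predict` and the synthetic (problem size, runtime) history are hypothetical.

```python
def fit_polynomial(xs, ys, degree):
    """Least-squares polynomial fit.

    Returns coefficients c such that y ~ c[0] + c[1]*x + ... + c[degree]*x**degree.
    Assumes enough distinct x values that the system is non-singular.
    """
    n = degree + 1
    # Normal equations (V^T V) c = V^T y, with V the Vandermonde matrix of xs.
    a = [[sum(x ** (i + j) for x in xs) for j in range(n)] for i in range(n)]
    b = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(n)]

    # Gaussian elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for row in range(col + 1, n):
            factor = a[row][col] / a[col][col]
            for k in range(col, n):
                a[row][k] -= factor * a[col][k]
            b[row] -= factor * b[col]

    # Back substitution.
    coeffs = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(a[i][k] * coeffs[k] for k in range(i + 1, n))
        coeffs[i] = (b[i] - tail) / a[i][i]
    return coeffs


def predict(coeffs, x):
    """Evaluate the fitted polynomial at x."""
    return sum(c * x ** i for i, c in enumerate(coeffs))


# Hypothetical history: problem sizes and runtimes in seconds.
# The runtimes are synthetic and exactly quadratic, so the fit is exact.
sizes = [1.0, 2.0, 3.0, 4.0, 5.0]
times = [2.0 * n ** 2 + 3.0 * n + 1.0 for n in sizes]

model = fit_polynomial(sizes, times, degree=2)
print(predict(model, 6.0))  # predicted runtime for an unseen problem size
```

A coupling-style model would, under the same assumptions, fit one such polynomial per kernel and combine the per-kernel predictions.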
9Curve Fitting Usage
Application Performance
Function Performance
Basic Unit Performance
Model Template
Data Structure Performance
10Matrix-matrix multiplication, 16P, IBM SP
11Case Study 1: GriPhyN
[Figure: resource selection workflow. Labels: Resource Selection; Chimera Virtual Data System; Prophesy; Transform using VDL; Grid Middleware; Ganglia Monitoring; Submission]
12Case Study 1: GriPhyN
[Figure: same diagram as slide 11, with the added label GRID 2003]
13Resource Selector
Prophesy
Application Name
Interface
Rankings of sites
Input Parameters, List of available sites
Weights of each site
Predictor
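The labels above suggest that the selector takes an application name, input parameters, and a list of available sites, and returns site rankings and per-site weights. A minimal sketch of that flow follows. The slides do not say how the weights are computed, so the normalized inverse predicted time used here is an assumption, and `rank_sites`, `toy_predictor`, the site names, and the cost table are all hypothetical.

```python
def rank_sites(predictor, app_name, params, sites):
    """Rank sites by predicted execution time (fastest first).

    predictor(app_name, params, site) must return a positive time in seconds.
    Weights are an assumed scheme: inverse predicted time, normalized to sum to 1.
    """
    predicted = {site: predictor(app_name, params, site) for site in sites}
    ranking = sorted(sites, key=lambda s: predicted[s])
    inverse = {s: 1.0 / predicted[s] for s in sites}
    total = sum(inverse.values())
    weights = {s: inverse[s] / total for s in sites}
    return ranking, weights


# Hypothetical per-site cost model standing in for a real performance model.
SECONDS_PER_WORK_UNIT = {"site_a": 4.0, "site_b": 2.0, "site_c": 8.0}


def toy_predictor(app_name, params, site):
    """Made-up predictor: time grows linearly with the amount of work."""
    return SECONDS_PER_WORK_UNIT[site] * params["work_units"]


ranking, weights = rank_sites(
    toy_predictor,
    "example_app",
    {"work_units": 100},
    ["site_a", "site_b", "site_c"],
)
print(ranking)
print(weights)
```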
14Application - GEO LIGO Pulsar Search -
- The pulsar search is a process of finding
celestial objects that may emit gravitational
waves
- The GEO (German-English Observatory) LIGO (Laser
Interferometer Gravitational-wave Observatory)
pulsar search is the most frequent coherent
search method that generates the F-statistic for
known pulsars
15Grid2003 Testbed
16Execution Environment
Site Name                         CPUs  Batch   Node Processors                          Node Cache Size  Node Memory
alliance.unm.edu (UNM)            436   PBS     1 X PIII 731 MHz                         256 KB           1 GB
atlas.iu.edu (IU)                 400   PBS     2 X Intel Xeon 2.4 GHz                   512 KB           2.5 GB
pdsfgrid3.nersc.gov (PDSF)        349   LSF     2 X PIII 650-1.8 GHz; 2 X AMD 2100-2600  256 KB           2 GB
atlas.dpcc.uta.edu (UTA)          158   PBS     2 X Intel Xeon 2.4-2.6 GHz               512 KB           2 GB
nest.phys.uwm.edu (UWM)           296   CONDOR  1 X PIII 1 GHz                           256 KB           0.5 GB
boomer1.oscer.ou.edu (OU)         286   PBS     3 X Intel Xeon 2 GHz                     512 KB           2 GB
cmsgrid.hep.wisc.edu (UWMadison)  64    CONDOR  1 X Intel Xeon 2.8 GHz                   512 KB           2 GB
cluster28.knu.ac.kr (KNU)         104   CONDOR  1 X AMD Athlon XP 1700                   256 KB           0.8 GB
acdc.ccr.buffalo.edu (Ubuffalo)   74    PBS     1 X Intel Xeon 1.6 GHz                   256 KB           3.7 GB
Selected Grid2003 Sites (Batch: local scheduler; VO: Virtual Organization)
17Comparison
- Load-based selection method
- Uses Ganglia [1] to monitor Grid2003
- Selects the least loaded site
- Random selection method
[1] Ganglia, http://ganglia.sourceforge.net/.
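The two baseline methods are simple enough to sketch directly. The example below is illustrative only: the load readings are a made-up dictionary rather than real Ganglia output, and `select_least_loaded`, `select_random`, and the site names are hypothetical.

```python
import random


def select_least_loaded(loads):
    """Load-based baseline: return the site with the smallest reported load."""
    return min(loads, key=loads.get)


def select_random(sites, rng=random):
    """Random baseline: return one site chosen uniformly at random."""
    return rng.choice(list(sites))


# Hypothetical load readings per site (for example, a normalized load average).
loads = {"site_a": 0.7, "site_b": 0.2, "site_c": 0.5}

print(select_least_loaded(loads))
print(select_random(loads, random.Random(0)))  # seeded for repeatability
```

Neither baseline looks at the application or its input parameters, which is the gap the prediction-based selector is meant to close.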
18Experimental Results
Parameters     Prediction-based  Load-based                        Random
Alpha   Freq   Site  Time (sec)  Site       Time (sec)  Error (%)  Site         Time (sec)  Error (%)
0.0065  0.002  PDSF  3863.66     UWMadison  9435.80     59.05      UWMilwaukee  48065.83    60.09
0.0085  0.001  IU    2850.39     UWMadison  11360.28    74.91      KNU          7676.56     62.87
0.0075  0.009  IU    22090.17    PDSF       20197.88    -9.37      UNM          77298.13    71.42
0.0055  0.009  IU    16216.25    UTA        27412.45    40.84      UWMadison    31555.10    48.61
0.0005  0.009  PDSF  1365.51     Ubuffalo   3226.00     57.67      UWMilwaukee  16009.82    91.47
0.0075  0.003  PDSF  6723.30     IU         7343.37     8.44       KNU          8287.77     18.88
0.0065  0.007  PDSF  13561.01    PDSF       13561.01    0.00       UNM          52379.31    74.65
0.0085  0.004  PDSF  10121.27    Ubuffalo   19649.22    48.49      IU           11158.72    9.30
0.0035  0.005  PDSF  5241.28     Ubuffalo   20799.05    74.80      UWM          51936.49    89.91
0.0065  0.009  IU    19184.36    UWMadison  24995.94    23.25      OU           23441.16    18.16
0.0045  0.009  IU    13278.68    UTA        20453.30    35.08      UWMadison    14137.44    6.07
0.0085  0.009  IU    25021.39    UWMadison  26246.68    4.67       OU           31538.22    20.66
Average                                                 32.62                               52.01
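The slides do not define the Error columns. Several rows are consistent with reading Error as the percentage by which the prediction-based time undercuts the other method's time; that reading is an inference from the numbers, not a stated definition. A small sketch under that assumption, with a hypothetical function name and times taken from three table rows:

```python
def improvement_percent(predicted_time, other_time):
    """Percentage by which the prediction-based run beats the other method.

    Positive: the prediction-based site was faster. Negative: it was slower.
    """
    return (other_time - predicted_time) / other_time * 100.0


# Load-based comparisons from the rows (Alpha, Freq) =
# (0.0065, 0.002), (0.0075, 0.009), and (0.0065, 0.007).
rows = [
    (3863.66, 9435.80),
    (22090.17, 20197.88),
    (13561.01, 13561.01),
]
for predicted, load_based in rows:
    print(round(improvement_percent(predicted, load_based), 2))
```

A negative value, as in the second row, marks a case where the load-based choice finished sooner than the prediction-based one.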
19Case Study 2: AADMLSS
- African American Distributed Multiple
Learning Styles System (AADMLSS), developed by Dr.
Juan E. Gilbert
20Site Selection Process
21Testbed Overview
CATEGORY  SPECS             Loner (TX)        Tina (MA)         Interact (AL)
Hardware  CPU Speed (MHz)   997.62            1993.56           697.87
Hardware  Bus Speed (MB/s)  205               638               214
Hardware  Memory (MB)       256               256               256
Hardware  Hard Disk (GB)    30                40                10
Software  O/S               Redhat Linux 9.0  Redhat Linux 9.0  Redhat Linux 9.0
Software  Web Server        Apache 2.0        Apache 2.0        Apache 2.0
Software  Web Application   PHP 4.2           PHP 4.2           PHP 4.1
22Experimental Results - 3 Servers -
Concept SRT-LOAD (%) SRT-RANDOM (%)
3/0/0 D 6.21 14.05
3/0/1 D 12.13 21.94
3/0/2 N 14.02 25.83
3/0/3 N 18.12 23.52
3/1/0 N 8.05 12.04
3/1/1 N 7.31 12.25
3/1/2 N 12.60 18.74
3/1/3 N 10.96 19.11
3/2/0 N 7.93 12.58
3/2/1 N 8.05 14.25
3/2/2 N 9.14 15.97
3/2/3 D 9.79 20.58
3/3/0 D 8.94 13.64
3/3/1 D 8.26 16.74
3/3/2 D 9.21 15.21
3/3/3 D 9.97 19.36
AVERAGE 10.04 17.24
23Experimental Results - 2 Servers (local and remote) -
Concept SRT-LOAD (%) SRT-RANDOM (%)
3/0/0 D 9.91 10.24
3/0/1 D 13.04 15.06
3/0/2 D 18.06 19.16
3/0/3 D 20.54 21.29
3/1/0 N 9.81 9.58
3/1/1 N 7.02 7.91
3/1/2 N 11.35 12.15
3/1/3 N 10.47 10.36
3/2/0 D 8.56 8.67
3/2/1 D 8.75 9.75
3/2/2 D 10.06 10.92
3/2/3 D 10.15 10.50
3/3/0 N 8.41 9.56
3/3/1 N 8.58 8.08
3/3/2 N 8.31 7.95
3/3/3 N 10.21 10.19
AVERAGE 10.83 11.34
24Experimental Results - 2 Servers (remote and remote) -
Concept SRT-LOAD (%) SRT-RANDOM (%)
3/0/0 D 3.13 4.03
3/0/1 D 4.26 5.97
3/0/2 D 7.02 8.28
3/0/3 D 8.64 9.02
3/1/0 D 3.25 4.94
3/1/1 D 3.27 4.10
3/1/2 D 3.93 5.97
3/1/3 D 3.64 4.08
3/2/0 D 3.15 3.32
3/2/1 D 4.39 5.20
3/2/2 D 5.80 5.97
3/2/3 D 6.52 6.95
3/3/0 D 4.39 5.64
3/3/1 D 4.16 5.20
3/3/2 D 4.81 5.73
3/3/3 D 5.02 5.58
AVERAGE 4.71 5.62
25Summary
- Presented a resource selection method based upon
performance predictions
- Illustrated the advantages of using performance
predictions with two case studies
- Large-scale scientific application GEO LIGO on
Grid2003
- An average of 33% better than load-based selection
- Now considering queue wait time predictions
- AADMLSS on 3 servers
- An average of 10% better than load-based selection
26Thanks!
- Questions?
- http://prophesy.cs.tamu.edu
- Seung-Hye Jang (jangs_at_cs.tamu.edu)