Title: Parallel Job and File Distribution in an Agent Hierarchy
1Parallel Job and File Distribution in an Agent Hierarchy
- Munehiro Fukuda
- Computing Software Systems, University of Washington, Bothell
- Funded by
2Outline
- Objectives
- AgentTeamwork
- Job Distribution
- File Distribution
- Performance Evaluation
- Summary
3Objectives
- Utilizing Campus Computing Resources
- Why Mobile Agents?
- Objectives
4Utilizing Campus Computing Resources
1. Objectives
- Labs at 3pm on a weekday
- Computational demands
- Research: UWB Brain Network, etc.
- Teaching: Parallel and Distributed Computing, Network Design, etc.
- Resources set-up and management
- Instructional machines renewed periodically and maintained
- Research machines purchased with external funds asynchronously
- Needs for allocating idle machines to demanding applications
5Why Mobile Agents?
1. Objectives
- An execution model previously highlighted as a prospective infrastructure of distributed systems
- Static job deployment and result collection: no more than an alternative approach to centralized grid middleware implementation
- Our goal: use agents' navigational/behavioral autonomy for grid computing
[Figure: centralized grid middleware; labels: User, Internet, Central manager, Server (x3), Cycle (x3), FTP, HTTP, RPC]
6Objectives
1. Objectives
- Multi-cluster job coordination
- Supporting clusters of asynchronously-controlled, independent computing nodes
- Check-pointing various communication styles in applications: master-slave, heartbeat, all-reduce, etc.
- Improving performance for job deployment, file transfer, monitoring, and resumption
- Use of mobile agents
- Migrating and resuming jobs with agents
- Having each agent check-point its corresponding user process
- Performing in a hierarchy of mobile agents
7AgentTeamwork
- System Overview
- Execution Layer
- Top-Down Details of Each Layer
- User Program Wrapper
- Error Recoverable Multi-Cluster Tcp
- UWAgent Mobile Agent Platform
8System Overview
2. AgentTeamwork
[Figure: system overview; labels: User A's process (x2), User B's process, TCP communication, snapshot methods, GridTcp, user program wrapper, sentinel agents, commander agents, resource agents, bookkeeper agents]
9Execution Layer
2. AgentTeamwork
Java user applications
mpiJava API
GridTcp
Java socket
User program wrapper
Commander, resource, sentinel, and bookkeeper agents
UWAgents mobile agent execution platform
Operating systems
10User Program and Wrapper
2. AgentTeamwork
    import AgentTeamwork.Ateam.*;

    public class MyApplication extends AteamProg {
        // constructor bodies are not shown on the slide
        public MyApplication( Object o ) { }
        public MyApplication( ) { }

        private void compute( ) {
            // the argument lists of Send and Recv are elided on the slide
            if ( MPI.rank( ) == 0 )
                MPI.Send( );
            else if ( MPI.rank( ) == 1 )
                MPI.Recv( );
            ateam.takeSnapshot( phase );   // check-point at the end of a phase
        }

        public static void main( String[] args ) {
            MyApplication program = null;
            if ( ateam.isResumed( ) ) {
                // resumed job: recover the application object from a snapshot
                program = ( MyApplication ) ateam.retrieveLocalvar( program );
            } else {
                // fresh start: initialize MPI and create the application
                MPI.Init( args );
                program = new MyApplication( );
            }
            program.compute( );
            MPI.Finalize( args );
        }
    }
[Figure: user program wrappers on n0.uwb.edu, n1.uwb.edu, and n2.uwb.edu; labels: incoming, backup, snapshot maintenance]
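The wrapper calls on this slide (takeSnapshot, isResumed) belong to AgentTeamwork and are not reproduced here. As a rough illustration of the snapshot-and-resume pattern only, the following sketch uses plain Java serialization. The names SnapshotDemo, State, snapshot, restore, and compute are hypothetical and are not part of AgentTeamwork.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

final class SnapshotDemo {
    /** Hypothetical application state: the next phase to run plus a running sum. */
    static final class State implements Serializable {
        private static final long serialVersionUID = 1L;
        int phase;
        long sum;
    }

    /** Serializes the state into a byte array, standing in for a snapshot. */
    static byte[] snapshot(State s) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(s);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    /** Rebuilds the state from a snapshot taken earlier. */
    static State restore(byte[] snap) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(snap))) {
            return (State) in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Runs the phases from s.phase up to (not including) lastPhase. */
    static void compute(State s, int lastPhase) {
        while (s.phase < lastPhase) {
            s.sum += s.phase;   // stand-in for one phase of real work
            s.phase++;
        }
    }

    public static void main(String[] args) {
        State s = new State();
        compute(s, 3);                   // run phases 0, 1, 2
        byte[] snap = snapshot(s);       // check-point
        State resumed = restore(snap);   // simulate a crash followed by resumption
        compute(resumed, 5);             // continue with phases 3, 4
        System.out.println("phase=" + resumed.phase + " sum=" + resumed.sum);
    }
}
```

The resumed run continues from the saved phase rather than from phase 0, which is the property the check-pointing on this slide is after.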
11Error-Recoverable MultiCluster Tcp
2. AgentTeamwork
[Figure: multi-cluster network; labels: Internet, medusa.uwb.edu, metis.uwb.edu, perseus.uwb.edu, uw1-320-01, metis, umetis, cluster-private domain, mnode0, mnode1, mnode4]
12UWAgent Mobile Agent Platform
2. AgentTeamwork
- An agent hierarchy created for each submission from the Unix shell
- Messages forwarded through an agent tree (or directly if possible)
- A user job scheduled as a thread, using suspend/resume
13Job Distribution
- Job Deployment in Public Domain
- Challenges in Multi-Cluster Agent Tree
- Multi-Cluster Job Deployment
- Multi-Cluster Job Resumption
14Job Deployment in Public Domain
3. Job Distribution
Job Submission
Commander id 0
XML Query
Spawn
Sentinel id 2 rank 0
Bookkeeper id 3
Resource id 1
XML DB
Sentinel id 8 rank 1
Sentinel id 11 rank 4
Sentinel id 10 rank 3
Sentinel id 9 rank 2
Bookkeeper id 12
Bookkeeper id 13
Sensor id 4
Sensor id 5
snapshot
Sentinel id 32 rank 5
Sentinel id 34 rank 7
Sentinel id 33 rank 6
Legend: id = agent id, rank = MPI rank
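The ids in this tree are consistent with a numbering in which the children of agent i receive ids 4i through 4i+3 (the commander, id 0, spawns ids 1 to 3; sentinel 2 spawns 8 to 11; sentinel 8 spawns 32 to 34), and in which MPI ranks follow a breadth-first order over the sentinel subtree rooted at id 2. The sketch below assumes exactly that. The fan-out of 4 and the rank rule are inferred from the labels on this slide, not taken from UWAgents documentation, and the class and method names are hypothetical.

```java
final class AgentIds {
    static final int FAN_OUT = 4;        // assumption inferred from the ids on the slide
    static final int ROOT_SENTINEL = 2;  // id of the root sentinel on the slide

    /** Id of the parent of the given agent (the commander, id 0, has none). */
    static int parent(int id) {
        if (id <= 0) throw new IllegalArgumentException("the commander has no parent");
        return id / FAN_OUT;
    }

    /** Id of the k-th child of the given agent, 0 <= k < FAN_OUT. */
    static int child(int id, int k) {
        if (k < 0 || k >= FAN_OUT || (id == 0 && k == 0)) {
            throw new IllegalArgumentException("invalid child index");
        }
        return id * FAN_OUT + k;
    }

    /** Breadth-first rank of a sentinel under ROOT_SENTINEL, or -1 for other agents. */
    static int rank(int id) {
        long first = ROOT_SENTINEL;  // smallest sentinel id on the current level
        long width = 1;              // number of sentinel ids on the current level
        int before = 0;              // sentinels on all shallower levels
        while (first <= id) {
            if (id < first + width) return before + (int) (id - first);
            before += width;
            first *= FAN_OUT;
            width *= FAN_OUT;
        }
        return -1;
    }

    public static void main(String[] args) {
        for (int id : new int[] {2, 8, 11, 32, 34}) {
            System.out.println("Sentinel id " + id + " rank " + rank(id));
        }
    }
}
```

Under these assumptions rank(8) is 1 and rank(32) is 5, matching the labels above, while bookkeeper ids such as 3 or 12 fall outside the sentinel subtree. The multi-cluster trees on later slides assign ranks differently (gateways carry negative ranks), so this rule applies only to the single-domain tree.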
15Challenges in Multi-Cluster Agent Tree
3. Job Distribution
Commander id 0
Partition 2
Partition 1
Sentinel id 2 rank 0
Bookkeeper id 3
Resource id 1
Cluster 0
Sentinel id 8 rank 1
Sentinel id 11 rank 4
Sentinel id 10 rank 3
Sentinel id 9 rank 2
Cluster 2
Sentinel id 32 rank 5
Sentinel id 34 rank 7
Sentinel id 33 rank 6
Sentinel id 35 rank 8
Sentinel id 46 rank 19
Sentinel id 47 rank 20
Cluster 1
16Multi-Cluster Job Deployment
3. Job Distribution
Commander id 0
Sentinel id 2 rank -1
Bookkeeper id 3
Resource id 1
Cluster gateway 0
Desktop computers
Sentinel id 8 rank -2
Sentinel id 9 rank X
Cluster gateways 1, 2, and 3
Sentinel id 32 rank 0
Sentinel id 33 rank -3
Sentinel id 34 rank -4
Sentinel id 35 rank -5
Sentinel id 39 rank X4
Sentinel id 38 rank X3
Sentinel id 37 rank X2
Sentinel id 36 rank X1
Sentinel id 131 rank 4
Sentinel id 130 rank 3
Sentinel id 129 rank 2
Sentinel id 132 rank 6
Sentinel id 128 rank 1
Cluster 1
Cluster 3
Cluster 2
Cluster 0
Sentinel id 531 rank 10
Sentinel id 512 rank 5
Sentinel id 530 rank 9
Sentinel id 529 rank 8
Sentinel id 528 rank 7
17Multi-Cluster Job Resumption
3. Job Distribution
Commander id 0
Sentinel id 2 rank -1
Bookkeeper id 3
Resource id 1
Cluster gateway 0
Desktop computers
Sentinel id 8 rank -2
Sentinel id 9 rank 11
Cluster gateways 1, 2, and 3
Sentinel id 32 rank 0
Sentinel id 32 rank 0
Sentinel id 33 rank -3
Sentinel id 34 rank -4
Sentinel id 35 rank -5
Cluster 0
Another Cluster 0
Sentinel id 131 rank 4
Sentinel id 130 rank 3
Sentinel id 129 rank 2
Sentinel id 132 rank 6
Sentinel id 128 rank 1
Sentinel id 131 rank 4
Sentinel id 130 rank 3
Sentinel id 129 rank 2
Sentinel id 128 rank 1
Cluster 1
Cluster 3
Cluster 2
Sentinel id 531 rank 10
Sentinel id 512 rank 5
Sentinel id 530 rank 9
Sentinel id 529 rank 8
Sentinel id 528 rank 7
New Sentinel id 512 rank 5
Sentinel id 512 rank 5
18File Distribution
- File Duplication in a Hierarchy
- File Partitioning
- File Fragmentation and Aggregation
- File-Stripe Check-Pointing and Recovery
19File Duplication in a Hierarchy
- A commander agent reads a file
- A sentinel agent duplicates the file
- The sentinel agent sends the file to its child agents
[Figure: (1) the commander reads a file, (2) a sentinel duplicates the file into four copies, (3) the copies are transferred to child sentinels]
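The three steps on this slide (read, duplicate, transfer) can be sketched as a recursive hand-off down a tree. This is a minimal in-memory illustration, not AgentTeamwork code: Node, receive, delivered, and sampleTree are hypothetical names, and real agents would send the copies over the network rather than call a method.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class FileDuplication {
    /** Hypothetical stand-in for a sentinel agent that holds its own copy of the file. */
    static final class Node {
        final String name;
        final List<Node> children = new ArrayList<>();
        byte[] file;  // this node's copy; null until the file arrives

        Node(String name) { this.name = name; }

        Node add(Node c) { children.add(c); return this; }

        /** Keeps the file, then hands an independent copy to each child. */
        void receive(byte[] data) {
            file = data;
            for (Node c : children) {
                c.receive(Arrays.copyOf(data, data.length));  // (2) duplicate, (3) transfer
            }
        }
    }

    /** Counts the nodes in the subtree whose copy equals the expected contents. */
    static int delivered(Node n, byte[] expected) {
        int count = Arrays.equals(n.file, expected) ? 1 : 0;
        for (Node c : n.children) count += delivered(c, expected);
        return count;
    }

    /** Builds a root sentinel with four children, each with one child of its own. */
    static Node sampleTree() {
        Node root = new Node("sentinel-root");
        for (int i = 0; i < 4; i++) {
            Node mid = new Node("sentinel-" + i);
            mid.add(new Node("sentinel-" + i + "-0"));
            root.add(mid);
        }
        return root;
    }

    public static void main(String[] args) {
        byte[] data = "contents".getBytes(StandardCharsets.UTF_8);  // (1) the file, as read by the commander
        Node root = sampleTree();
        root.receive(data);
        System.out.println(delivered(root, data) + " nodes hold the file");
    }
}
```

Because each level duplicates only for its own children, the work of copying is spread across the tree instead of falling entirely on the commander.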
20File Partitioning
4. File Distribution
- MPI/IO-based file
- etype
- a unit of file access
- a primitive data type
- filetype
- a repetitive file record tiled with etypes
- Divide an entire file into multiple stripes.
- Specify which file portion is delivered to a given MPI rank.
- Define an etype and a filetype of each file portion.
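To make the etype/filetype terms concrete, the sketch below works through one common layout: the file is a sequence of fixed-size etypes, and the filetype hands out blocks of blockLen etypes to the ranks in round-robin order. This block-cyclic layout is an assumption chosen for illustration; the slide does not say which filetypes AgentTeamwork uses, and FileView, offsetOf, and ownerOf are hypothetical names.

```java
final class FileView {
    /**
     * Byte offset in the file of the i-th etype visible to the given rank,
     * assuming blocks of blockLen etypes are dealt to ranks round-robin.
     */
    static long offsetOf(int rank, int nRanks, int etypeSize, int blockLen, long i) {
        long block = i / blockLen;    // which of this rank's blocks holds etype i
        long within = i % blockLen;   // position of etype i inside that block
        long etypeIndex = (block * nRanks + rank) * blockLen + within;
        return etypeIndex * etypeSize;
    }

    /** Rank that receives the etype containing the given byte offset. */
    static int ownerOf(long byteOffset, int nRanks, int etypeSize, int blockLen) {
        long etypeIndex = byteOffset / etypeSize;
        return (int) ((etypeIndex / blockLen) % nRanks);
    }

    public static void main(String[] args) {
        int nRanks = 3, etypeSize = 4, blockLen = 2;  // e.g. 4-byte integers, 2 per block
        for (int rank = 0; rank < nRanks; rank++) {
            StringBuilder sb = new StringBuilder("rank " + rank + " reads byte offsets:");
            for (long i = 0; i < 4; i++) {
                sb.append(' ').append(offsetOf(rank, nRanks, etypeSize, blockLen, i));
            }
            System.out.println(sb);
        }
    }
}
```

With these parameters each rank sees two consecutive 4-byte etypes, then skips the four etypes that belong to the other two ranks, which is the repetitive record structure a filetype describes.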
21File Fragmentation and Aggregation
4. File Distribution
[Figure: file fragments carried as key/value pairs down the agent tree]
- The commander (Id 0) reads files through the GUI
- Each fragment is a key/value pair: a key such as 32_inputFile1_0, with the file contents as the value
- Fragment keys shown: 32_inputFile1_0, 32_inputFile2_0, 128_inputFile1_1, 528_inputFile1_7, 528_inputFile2_7
- Fragments grouped under the labels 32, 128, and 528
- Agents shown: commander Id 0; root sentinel Id 2; sentinels Id 8, 9, 32, 33, 36, 37, 38, 39, 128, 129, 130, 131, 132, 528
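Keys such as 32_inputFile1_0 and 528_inputFile2_7 appear to combine a destination agent id, a file name, and an MPI rank (sentinel 32 has rank 0 and sentinel 528 has rank 7 in the multi-cluster deployment tree). The sketch below assumes that reading of the key format, which is inferred from the slide labels rather than documented, and shows how fragments could be grouped by destination for aggregation. FragmentKeys and its methods are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class FragmentKeys {
    /** Builds a key in the assumed form agentId_fileName_rank. */
    static String key(int agentId, String fileName, int rank) {
        return agentId + "_" + fileName + "_" + rank;
    }

    /** Destination agent id: the text before the first underscore. */
    static int destination(String key) {
        return Integer.parseInt(key.substring(0, key.indexOf('_')));
    }

    /** File name: the text between the first and the last underscore. */
    static String fileName(String key) {
        return key.substring(key.indexOf('_') + 1, key.lastIndexOf('_'));
    }

    /** MPI rank: the text after the last underscore. */
    static int rank(String key) {
        return Integer.parseInt(key.substring(key.lastIndexOf('_') + 1));
    }

    /** Groups fragment keys by destination agent id, as an aggregation step might. */
    static Map<Integer, List<String>> aggregate(List<String> keys) {
        Map<Integer, List<String>> byDestination = new TreeMap<>();
        for (String k : keys) {
            byDestination.computeIfAbsent(destination(k), d -> new ArrayList<>()).add(k);
        }
        return byDestination;
    }

    public static void main(String[] args) {
        List<String> keys = Arrays.asList(
                "128_inputFile1_1", "32_inputFile1_0", "32_inputFile2_0",
                "528_inputFile2_7", "528_inputFile1_7");
        System.out.println(aggregate(keys));
    }
}
```

Parsing from the first and the last underscore keeps the sketch usable even if a file name itself contains underscores.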
22File-Stripe Check-Pointing and Recovery
- Messages relayed through a tree
- Threads created within each sentinel
- Stripes passed through the main (user) thread
- Outputs directly sent back to the commander
- Snapshot taken and reported to a sentinel
- Snapshot retrieved upon a crash
- Lost messages requested directly from the commander and the source agent
[Figure: commander, bookkeeper, and three sentinels]
23Performance
- Multi-Cluster Job Deployment
- Multi-Cluster Job Check-Pointing
- Comparison of File Duplication
- File Fragmentation and Pipelined Transfer
- Random-Access File Transfer
- File Recovery Overheads
24Multi-Cluster Job Deployment
5. Performance
AgentTeamwork
Duroc/OpenPBS
Depth First
100Mbps backbone
Snt 8
Snt 32
PBS
PBS
Cluster-R (32 CPUs, 2.8-3.2GHz, 1Gbps)
Cluster-I (32 CPUs, 1.5GHz, 100Mbps-1Gbps)
Breadth First
Snt 8
Snt 32
PBS
PBS
25Multi-Cluster Job Check-Pointing
5. Performance
Commander id 0
Sentinel id 2 rank -1
Bookkeeper id 3
Resource id 1
Sentinel id 8 rank -2
Cluster R Gateway
Sentinel id 33 rank -3
Sentinel id 32 rank 0
Cluster I Gateway
Cluster R
Sentinel id 131 rank 4
Sentinel id 130 rank 3
Sentinel id 129 rank 2
Sentinel id 132 rank 32
Sentinel id 128 rank 1
Cluster I
..
Sentinel id 531 rank 36
Sentinel id 512 rank 5
Sentinel id 530 rank 35
Sentinel id 529 rank 34
Sentinel id 528 rank 33
26Comparison of File Duplication
5. Performance
Commander id 0
sentinel id 2 rank 0
sentinel id 9 rank 0
Sentinel id 36 rank 1
Sentinel id 39 rank 4
Sentinel id 38 rank 3
Sentinel id 37 rank 2
Sentinel id 144 rank 5
Sentinel id 146 rank 7
Sentinel id 145 rank 6
Sentinel id 147 rank 8
Sentinel id 156 rank 19
Sentinel id 157 rank 20
27File Fragmentation and Pipelined Transfer
5. Performance
[Figure: the commander (id 0) reads a file in 1M fragments, which are passed down the sentinel tree]
Commander id 0
sentinel id 2 rank 0
sentinel id 9 rank 0
Sentinel id 36 rank 1
Sentinel id 39 rank 4
Sentinel id 38 rank 3
Sentinel id 37 rank 2
Sentinel id 144 rank 5
Sentinel id 146 rank 7
Sentinel id 145 rank 6
Sentinel id 147 rank 8
Sentinel id 156 rank 19
Sentinel id 157 rank 20
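Why splitting a file into 1M fragments helps can be seen with an idealized timing model: if every hop forwards a fragment as soon as it arrives, later fragments overlap with earlier ones. The sketch below compares that with forwarding the whole file hop by hop. It is a textbook pipeline model that ignores fan-out, contention, and per-message overhead; it is not the measurement method of this talk, and all names and parameters are hypothetical.

```java
final class PipelineModel {
    /** Whole-file forwarding: each hop waits for every fragment before sending on. */
    static double storeAndForward(int fragments, int hops, double secondsPerFragmentHop) {
        return (double) fragments * hops * secondsPerFragmentHop;
    }

    /** Pipelined forwarding: a hop relays each fragment as soon as it arrives. */
    static double pipelined(int fragments, int hops, double secondsPerFragmentHop) {
        return (fragments + hops - 1) * secondsPerFragmentHop;
    }

    public static void main(String[] args) {
        int fragments = 5;   // e.g. a 5M file cut into 1M fragments
        int hops = 3;        // commander to leaf through two intermediate sentinels
        double t = 1.0;      // assumed time for one fragment to cross one hop
        System.out.println("store-and-forward: " + storeAndForward(fragments, hops, t));
        System.out.println("pipelined:         " + pipelined(fragments, hops, t));
    }
}
```

In this model the pipelined time grows with the sum of the fragment count and the tree depth rather than their product, so the advantage widens as files get larger or trees get deeper.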
28Random-Access File Transfer
5. Performance
29File Recovery Overheads
5. Performance
- Recovery overhead at a leaf: 4.08 sec
- Recovery overheads at the root
- 30.97 sec for 11 nodes (2.82 sec per node)
- 87.28 sec for 31 nodes
- 177.4 sec for 63 nodes
[Figure: the commander (id 0) reads a file in 1M fragments, which are passed down the sentinel tree]
Commander id 0
sentinel id 2 rank 0
sentinel id 9 rank 0
Sentinel id 36 rank 1
Sentinel id 39 rank 4
Sentinel id 38 rank 3
Sentinel id 37 rank 2
Sentinel id 144 rank 5
Sentinel id 146 rank 7
Sentinel id 145 rank 6
Sentinel id 147 rank 8
Sentinel id 156 rank 19
Sentinel id 157 rank 20
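The slide gives a per-node figure only for the 11-node case. Dividing the other two root-level totals by their node counts in the same way is a quick check on how the overhead scales. The sketch below does only that arithmetic on the three reported numbers; RecoveryOverhead and perNode are hypothetical names.

```java
import java.util.Locale;

final class RecoveryOverhead {
    /** Recovery overhead per node, in seconds. */
    static double perNode(double totalSeconds, int nodes) {
        return totalSeconds / nodes;
    }

    public static void main(String[] args) {
        double[] totals = {30.97, 87.28, 177.4};  // root-level overheads reported on the slide
        int[] nodes = {11, 31, 63};               // corresponding node counts
        for (int i = 0; i < totals.length; i++) {
            System.out.println(String.format(Locale.ROOT, "%d nodes: %.2f sec per node",
                    nodes[i], perNode(totals[i], nodes[i])));
        }
    }
}
```

All three ratios come to about 2.82 seconds per node, which suggests that recovery at the root grows roughly linearly with the number of nodes beneath it.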
30Final Comments
31Summary
6. Summary
- Applying mobile agents to multi-cluster computing
- Job deployment/resumption in a two-level agent hierarchy
- Performance improved with depth-first deployment, direct snapshot transfer, and multiple bookkeepers
- Distributing files in an agent hierarchy
- Duplication, fragmentation, and aggregation of file stripes at each tree level
- Better performance than NFS, and best with random-access files
- Recovery overheads increased at higher-level agents
32Plans
6. Summary
- To complete
- File recovery mechanism
- Native Execution with JNI (Java Native Interface)
- To implement
- Dynamic load-balancing with active agent migration
- Runtime job scheduling
- Our web site
- http://depts.washington.edu/dslab/AgentTeamwork/
33Can AgentTeamwork Become Their Competitor?
6. Summary
Nimrod
34Questions?
35Resource Allocation and Monitoring
7. Appendix Resource Management
Job submission
total nodes x multiplier
Our own XML DB
Commander id 0
Resource id 1
eXist
An XML query
A list of available nodes
CPU, Architecture, OS, Memory, Disk, Total nodes, Multiplier
Spawn
Sentinel id 2 rank 0
Sentinel id 8 rank 1
Case 1: Total nodes = 2, Multiplier = 1.5
Bookkeeper id 2 rank 0
Bookkeeper id 12 rank 5
Sentinel id 2 rank 0
Sentinel id 8 rank 1
Bookkeeper id 2 rank 0
Bookkeeper id 12 rank 5
Case 2: Total nodes = 2, Multiplier = 3
Future use
Future use
Future use
36Computational Granularity 1
7. Appendix More Execution Performance
Master-slave computation
Master
Communication
Slave
Slave
Slave
Slave
Slave
37Computational Granularity 2
7. Appendix More Execution Performance
Heartbeat communication
Process
Process
Process
Process
Process
Communication
38Computational Granularity 3
7. Appendix More Execution Performance
All to all broadcast
Communication
Process
Process
Process
Process
Process
39Performance Evaluation - Series
7. Appendix More Execution Performance
Master-slave computation
40Performance Evaluation - RayTracer
7. Appendix More Execution Performance
All-reduce communication, but little data to send
41Performance Evaluation - MolDyn
7. Appendix More Execution Performance
All to all broadcast
42Overhead of Job Resumption
7. Appendix More Execution Performance