Title: AgentTeamwork: Mobile-Agent-Based Middleware for Distributed Job Coordination


1
AgentTeamwork: Mobile-Agent-Based Middleware for
Distributed Job Coordination
  • Munehiro Fukuda
  • Computing Software Systems, University of
    Washington, Bothell
  • Funded by

2
Outline
  1. Introduction
  2. Execution Model
  3. System Design
  4. Performance Evaluation
  5. Related Work
  6. Conclusions

3
1. Introduction
  • Why Grid Computing
  • Background
  • Objective
  • Project Overview

4
Why Grid Computing
  • Textbooks say
  • Only 30% CPU utilization
  • Only episodic job requirements
  • Anyone and anywhere like a power grid
  • Many research prototypes and commercial products
  • Globus, Condor, Legion(Avaki), NetSolve, Ninf,
    Entropia PCGrid, Sun Grid Engine, etc.
  • Then, have you ever used them?
  • Probably not so many of you.
  • What is a big hurdle?
  • You don't need it anyway. Or what?

5
Background: Most Grid Systems
  • Functional viewpoints
  • Centralized resource/job management
  • Two drawbacks
  • A powerful central server essential to manage all
    slave computing nodes
  • Applications based on master-slave or
    parameter-sweep model
  • Our motivation
  • Decentralized job distribution, coordination, and
    fault tolerance
  • Applications based on a variety of communication
    models
  • Practical viewpoints
  • Systems dedicated to large institutions/companies
  • Two drawbacks
  • A lot of installation work required under the
    root account.
  • Groups of individual computer owners are not
    targeted.
  • Our motivation
  • Easy participation in grid-computing and easy
    installation

6
Background: How to Pursue Our Motivation
  • Use of mobile agents
  • We are experts in mobile agents.
  • Most mobile agents
  • An execution model previously highlighted as a
    prospective infrastructure of distributed
    systems.
  • No more than an alternative approach to
    centralized grid middleware implementation.
  • Our initial goal
  • Decentralized middleware design with mobile agents

7
Objective
  • A mobile agent execution platform fitted to grid
    computing
  • Allowing an agent to identify which MPI rank to
    handle and which agent to send a job snapshot to.
  • A fault-tolerant inter-process communication
  • Recovering lost messages.
  • Allowing over-gateway connections.
  • Agent-collaborative algorithms for job
    coordination
  • Allocating computing nodes in a distributed
    manner.
  • Implementing decentralized snapshot maintenance
    and job recovery.

8
Project Overview
  • Funded by NSF Middleware Initiative
  • Sponsored by University of Washington
  • In collaboration with Ehime University
  • With a team of UWB undergraduates

9
2. Execution Model
  • System Overview
  • Execution Layer
  • Programming Interface

10
System Overview
[Figure: system overview. Labels: User A's processes, User B's process, TCP communication, snapshot methods, GridTcp, user program wrapper, sentinel agents, commander agents, resource agents, bookkeeper agents.]
11
Execution Layer
Java user applications
mpiJava API
GridTcp
Java socket
User program wrapper
Commander, resource, sentinel, and bookkeeper agents
UWAgents mobile agent execution platform
Operating systems
12
Programming Interface
    public class MyApplication {
        public GridIpEntry ipEntry[];          // used by the GridTcp socket library
        public int funcId;                     // used by the user program wrapper
        public GridTcp tcp;                    // the GridTcp error-recoverable socket
        public int nprocess;                   // number of processors
        public int myRank;                     // processor id (or MPI rank)

        public int func_0( String args[] ) {   // constructor
            MPJ.Init( args, ipEntry, tcp );    // invoke mpiJava-A
            .....                              // more statements to be inserted
            return 1;                          // calls func_1( )
        }

        public int func_1( ) {                 // called from func_0
            if ( MPJ.COMM_WORLD.Rank( ) == 0 )
                MPJ.COMM_WORLD.Send( ... );
            else
                MPJ.COMM_WORLD.Recv( ... );
            .....                              // more statements to be inserted
            return 2;                          // calls func_2( )
        }
    }

13
3. System Design
  • Mobile Agents
  • Job Coordination
  • Distribution
  • Monitoring
  • Resumption and migration
  • Programming Support
  • Language preprocessing
  • Communication check-pointing

14
UWAgents: Concept of Agent Domain
  • An agent domain is created for each submission from the
    Unix shell
  • The maximum number of children each agent can spawn is
    given upon the initial submission
  • No name server
  • Messages forwarded through an agent tree
  • A user job scheduled as a thread, using
    suspend/resume
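One way to picture message forwarding without a name server is sketched below. It assumes that agent ids are positions in a tree, with child id = parent id x m + i, where m is the maximum number of children given at submission. That rule is only an inference from the ids shown on the Job Distribution slide (0; 1 to 3; 4 and 5; 8 to 15; 32 to 34; 48 to 50, which fit m = 4). It is not a documented UWAgents interface, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper, not part of UWAgents: treats agent ids as positions
// in a tree where each agent may spawn at most maxChildren children.
final class AgentTreeSketch {
    private AgentTreeSketch() {}

    // Assumed rule: the i-th child of `parent` gets id parent * maxChildren + i.
    // The root (id 0) cannot use i = 0, because that would be its own id.
    static int childId(int parent, int i, int maxChildren) {
        if (maxChildren < 2 || parent < 0 || i < 0 || i >= maxChildren
                || (parent == 0 && i == 0)) {
            throw new IllegalArgumentException(
                    "invalid child index " + i + " for parent " + parent);
        }
        return parent * maxChildren + i;
    }

    // Inverse of childId: integer division recovers the parent without a lookup.
    static int parentId(int id, int maxChildren) {
        if (id <= 0 || maxChildren < 2) {
            throw new IllegalArgumentException("agent " + id + " has no parent");
        }
        return id / maxChildren;
    }

    // Agents a message visits from src to dst: up to the closest common
    // ancestor, then down to the destination.
    static List<Integer> route(int src, int dst, int maxChildren) {
        List<Integer> up = new ArrayList<>();
        List<Integer> down = new ArrayList<>();
        int a = src;
        int b = dst;
        while (a != b) {
            // Under the assumed rule a larger id is never an ancestor of a
            // smaller one, so the side with the larger id must climb.
            if (a > b) {
                up.add(a);
                a = parentId(a, maxChildren);
            } else {
                down.add(b);
                b = parentId(b, maxChildren);
            }
        }
        up.add(a);                    // the common ancestor (or src itself)
        Collections.reverse(down);    // collected bottom-up, needed top-down
        up.addAll(down);
        return up;
    }

    public static void main(String[] args) {
        // Sentinel id 32 to bookkeeper id 13, with at most 4 children per agent.
        System.out.println(route(32, 13, 4));
    }
}
```

Under these assumptions, a message from sentinel 32 to bookkeeper 13 would pass through agents 32, 8, 2, 0, 3, and 13. Every hop is computed from the two ids alone, which is what makes a name server unnecessary in this picture.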

15
UWAgents: Over-Gateway Migration
16
Job Distribution
[Figure: agents spawned after a job submission. Labels: job submission, XML query, eXist, spawn. id = agent id, rank = MPI rank.]
Agent        id   rank
Commander    0    -
Resource     1    -
Sensor       4    -
Sensor       5    -
Sentinel     2    0
Sentinel     8    1
Sentinel     9    2
Sentinel     10   3
Sentinel     11   4
Sentinel     32   5
Sentinel     33   6
Sentinel     34   7
Bookkeeper   3    0
Bookkeeper   12   1
Bookkeeper   13   2
Bookkeeper   14   3
Bookkeeper   15   4
Bookkeeper   48   5
Bookkeeper   49   6
Bookkeeper   50   7
17
Resource Allocation
[Figure: resource allocation. Labels: job submission, total nodes x multiplier, Commander id 0, Resource id 1, eXist, spawn.]
An XML query: CPU architecture, OS, memory, disk, total nodes, multiplier
Result: a list of available nodes
Case 1: total nodes 2, multiplier 1.5
  Sentinel id 2 rank 0
  Sentinel id 8 rank 1
  Bookkeeper id 2 rank 0
  Bookkeeper id 12 rank 5
  Future use
Case 2: total nodes 2, multiplier 3
  Sentinel id 2 rank 0
  Sentinel id 8 rank 1
  Bookkeeper id 2 rank 0
  Bookkeeper id 12 rank 5
  Future use
  Future use
18
Resource Monitoring
[Figure: resource monitoring. Labels: a resource request, Commander id 0, Resource id 1, eXist, a list of available nodes, spawn.]
  • Current restrictions
  • Minimum interval: 3 seconds
  • Static distribution of sensor agents
  • Future extensions
  • Sensor migration
  • Use of NWS at each site

19
Job Resumption by a Parent Sentinel
[Figure: job resumption by a parent sentinel. Labels: MPI connections; Sentinel id 2 rank 0; Sentinel id 8 rank 1; Sentinel id 9 rank 2; Sentinel id 10 rank 3; Sentinel id 11 rank 4; Bookkeeper id 15 rank 4.]
20
Job Resumption by a Child Sentinel
[Figure: job resumption by a child sentinel. Labels: Commander id 0; New; Sentinel id 2 rank 0; Bookkeeper id 3 rank 0; Resource id 1; Sentinel id 8 rank 1; Bookkeeper id 12 rank 1.]
21
User Program Wrapper
Source Code
    statement_1; statement_2; statement_3;
    check_point( );
    statement_4; statement_5; statement_6;
    check_point( );
    statement_7; statement_8; statement_9;
    check_point( );

Preprocessed
    func_0( ) { statement_1; statement_2; statement_3; return 1; }
    func_1( ) { statement_4; statement_5; statement_6; return 2; }
    func_2( ) { statement_7; statement_8; statement_9; return -2; }

User Program Wrapper
    int fid = 1;
    while ( fid != -2 ) {
        switch ( func_id ) {
            case 0: fid = func_0( ); break;
            case 1: fid = func_1( ); break;
            case 2: fid = func_2( ); break;
        }
        func_id = fid;     // remember which function runs next
        check_point( );    // save this object, including func_id, into a file
    }
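The wrapper loop can be tried out with plain Java serialization. The sketch below is a minimal illustration under stated assumptions, not AgentTeamwork's wrapper: the class name, the `sum` field, the `maxSteps` parameter (used only to simulate a crash), and the snapshot file name are all hypothetical, and the snapshot is written with `ObjectOutputStream`.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

// Hypothetical user program already split into func_N( ) methods.
class CheckpointedJobSketch implements Serializable {
    private static final long serialVersionUID = 1L;

    int funcId = 0;   // next func_N( ) to run; saved with every snapshot
    long sum = 0;     // stand-in for the application's data

    int func_0() { sum += 1;   return 1;  }
    int func_1() { sum += 10;  return 2;  }
    int func_2() { sum += 100; return -2; }   // -2 ends the job

    // Wrapper loop: run the next function, then snapshot the whole object.
    // maxSteps exists only so a crash can be simulated after a few steps.
    static void run(CheckpointedJobSketch job, File snapshot, int maxSteps) {
        for (int step = 0; job.funcId != -2 && step < maxSteps; step++) {
            switch (job.funcId) {
                case 0: job.funcId = job.func_0(); break;
                case 1: job.funcId = job.func_1(); break;
                case 2: job.funcId = job.func_2(); break;
                default: throw new IllegalStateException("unknown function id " + job.funcId);
            }
            save(job, snapshot);
        }
    }

    static void save(CheckpointedJobSketch job, File file) {
        try (ObjectOutputStream out = new ObjectOutputStream(new FileOutputStream(file))) {
            out.writeObject(job);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    static CheckpointedJobSketch load(File file) {
        try (ObjectInputStream in = new ObjectInputStream(new FileInputStream(file))) {
            return (CheckpointedJobSketch) in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        File file = new File(System.getProperty("java.io.tmpdir"), "job-sketch.snapshot");
        run(new CheckpointedJobSketch(), file, 2);    // "crash" after func_1
        CheckpointedJobSketch resumed = load(file);   // resume from the last snapshot
        run(resumed, file, Integer.MAX_VALUE);
        System.out.println(resumed.funcId + " " + resumed.sum);
    }
}
```

Because the function id is part of the saved object, the resumed job continues at func_2( ) instead of starting over, which is the property the wrapper relies on.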
22
Preprocessor and Drawbacks
Source Code
    statement_1; statement_2; statement_3;
    check_point( );
    while (...) {
        statement_4;
        if (...) {
            statement_5;
            check_point( );
            statement_6;
        } else
            statement_7;
        statement_8;
    }
    check_point( );

Preprocessed Code
    int func_0( ) {
        statement_1; statement_2; statement_3;
        return 1;
    }
    int func_1( ) {              // before check_point( ) in if-clause
        while (...) {
            statement_4;
            if (...) {
                statement_5;
                return 2;
            } else
                statement_7;
            statement_8;
        }
    }
    int func_2( ) {              // after check_point( ) in if-clause
        statement_6; statement_8;
        while (...) {
            statement_4;
            if (...) {
                statement_5;
                return 2;
            } else
                statement_7;
            statement_8;
        }
    }

  • No recursions
  • Useless source line numbers indicated upon errors
  • Explicit snapshot points are still needed.

23
GridTcp Check-Pointed Connection
[Figure: GridTcp check-pointed connection. Labels: user program wrapper, user program, TCP, outgoing, backup, and incoming queues, snapshot maintenance, hosts n1.uwb.edu, n2.uwb.edu, and n3.uwb.edu.]
rank  ip
1     n1.uwb.edu
2     n2.uwb.edu
  • Outgoing packets saved in a backup queue
  • All packets serialized in a backup file at every
    check point
  • Upon a migration
  • Packets de-serialized from a backup file
  • Backup packets restored in outgoing queue
  • IP table updated
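The backup-and-restore steps for a check-pointed connection can be sketched as follows. This is a simplified illustration, not GridTcp itself: the class and method names are hypothetical, transmission is simulated by moving packets between queues, and the snapshot is kept in a byte array rather than in a backup file.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.UncheckedIOException;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for one end of a check-pointed connection.
class BackupQueueSketch {
    final Map<Integer, String> ipTable = new HashMap<>();  // rank -> host name
    final Deque<byte[]> outgoing = new ArrayDeque<>();     // packets waiting to be sent
    final List<byte[]> backup = new ArrayList<>();         // copies of packets already sent

    void send(byte[] packet) {
        outgoing.add(packet);
    }

    // Simulated transmission: each packet leaves the outgoing queue and a
    // copy stays in the backup queue. Returns the number of packets "sent".
    int flush() {
        int sent = 0;
        while (!outgoing.isEmpty()) {
            backup.add(outgoing.poll());
            sent++;
        }
        return sent;
    }

    // Check point: serialize all packets, sent and unsent, in their original order.
    byte[] checkpoint() {
        ArrayList<byte[]> all = new ArrayList<>(backup);
        all.addAll(outgoing);
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(all);
            }
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // After a migration: de-serialize the packets, put them back in the
    // outgoing queue for retransmission, and update the IP table entry.
    static BackupQueueSketch resume(byte[] snapshot, Map<Integer, String> oldTable,
                                    int movedRank, String newHost) {
        BackupQueueSketch q = new BackupQueueSketch();
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(snapshot))) {
            @SuppressWarnings("unchecked")
            List<byte[]> saved = (List<byte[]>) in.readObject();
            q.outgoing.addAll(saved);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
        q.ipTable.putAll(oldTable);
        q.ipTable.put(movedRank, newHost);
        return q;
    }
}
```

The sketch never discards backup packets. When a real implementation may safely drop them is not stated on the slide, so that policy is left out here.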
24
GridTcp Over-Gateway Connection
[Figure: four user program wrappers, each running a user program and holding a routing table, on mnode0 (rank 0), medusa.uwb.edu (rank 1), uw1-320.uwb.edu (rank 2), and uw1-320-00 (rank 3).]

rank  dest        gateway
0     mnode0      -
1     medusa      -
2     uw1-320     medusa
3     uw1-320-00  medusa

rank  dest        gateway
0     mnode0      -
1     medusa      -
2     uw1-320     -
3     uw1-320-00  uw1-320

rank  dest        gateway
0     mnode0      medusa
1     medusa      -
2     uw1-320     -
3     uw1-320-00  -

rank  dest        gateway
0     mnode0      uw1-320
1     medusa      uw1-320
2     uw1-320     -
3     uw1-320-00  -

  • RIP-like connection
  • Restriction: each node name must be unique.
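The four routing tables on this slide can be read as per-node next-hop tables. The sketch below traces a connection hop by hop. It assumes, from the table contents, that the tables belong to mnode0, medusa, uw1-320, and uw1-320-00 in that order; the slide text does not label them. The class and method names are hypothetical and are not the GridTcp API.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of the slide's tables: owner host -> rank -> { dest, gateway }.
// A null gateway means the destination is reached directly.
class GatewayRoutingSketch {
    static final Map<String, Map<Integer, String[]>> TABLES = new HashMap<>();

    static void row(String owner, int rank, String dest, String gateway) {
        TABLES.computeIfAbsent(owner, k -> new HashMap<>())
              .put(rank, new String[] { dest, gateway });
    }

    static {
        // Assumed owner of each table, in the order the tables appear.
        row("mnode0", 0, "mnode0", null);
        row("mnode0", 1, "medusa", null);
        row("mnode0", 2, "uw1-320", "medusa");
        row("mnode0", 3, "uw1-320-00", "medusa");

        row("medusa", 0, "mnode0", null);
        row("medusa", 1, "medusa", null);
        row("medusa", 2, "uw1-320", null);
        row("medusa", 3, "uw1-320-00", "uw1-320");

        row("uw1-320", 0, "mnode0", "medusa");
        row("uw1-320", 1, "medusa", null);
        row("uw1-320", 2, "uw1-320", null);
        row("uw1-320", 3, "uw1-320-00", null);

        row("uw1-320-00", 0, "mnode0", "uw1-320");
        row("uw1-320-00", 1, "medusa", "uw1-320");
        row("uw1-320-00", 2, "uw1-320", null);
        row("uw1-320-00", 3, "uw1-320-00", null);
    }

    static String[] lookup(String host, int rank) {
        Map<Integer, String[]> table = TABLES.get(host);
        if (table == null || !table.containsKey(rank)) {
            throw new IllegalArgumentException("no route from " + host + " to rank " + rank);
        }
        return table.get(rank);
    }

    // Host to connect to next: the gateway if one is listed, else the destination.
    static String nextHop(String host, int rank) {
        String[] route = lookup(host, rank);
        return route[1] != null ? route[1] : route[0];
    }

    // Hosts visited on the way from `from` to the node that runs `rank`.
    static List<String> path(String from, int rank) {
        List<String> hops = new ArrayList<>();
        String current = from;
        while (!lookup(current, rank)[0].equals(current)) {
            if (hops.size() > TABLES.size()) {
                throw new IllegalStateException("routing loop towards rank " + rank);
            }
            current = nextHop(current, rank);
            hops.add(current);
        }
        return hops;
    }

    public static void main(String[] args) {
        System.out.println(path("mnode0", 3));
    }
}
```

Under these assumptions, a message from mnode0 to rank 3 goes through medusa and uw1-320 before it reaches uw1-320-00. The lookup keys on host names alone, which is consistent with the restriction that each node name must be unique.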
25
MPJ Package
MPJ: Init( ), Rank( ), Size( ), and Finalize( )
Communicator: all communication functions, such as Send( ), Recv( ), Gather( ), Reduce( ), etc.
JavaComm: mpiJava-S uses Java sockets and server sockets.
GridComm: mpiJava-A uses GridTcp sockets.
DataType: MPJ.INT, MPJ.LONG, etc.
MPJMessage: getStatus( ), getMessage( ), etc.
Op: Operate( )
etc.: other utilities
  • InputStream for each rank
  • OutputStream for each rank
  • Uses a permanent 64K buffer for serialization
  • Emulates collective communication by sending the same
    data to each OutputStream, which deteriorates
    performance
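The emulated collective communication described on this slide can be pictured as a loop of point-to-point writes. The sketch below is an illustration with hypothetical names, not the MPJ package's code. It serializes the data once and then writes the same bytes to every other rank's OutputStream. For simplicity it allocates a 64K buffer on each call, whereas the slide mentions a permanent one.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.OutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.List;

// Hypothetical broadcast built only from per-rank output streams.
class BroadcastEmulationSketch {

    // Writes `data` to the stream of every rank except myRank and returns
    // the number of streams written. The cost grows linearly with the
    // number of ranks, since the sender performs every write itself.
    static int broadcast(Serializable data, int myRank, List<? extends OutputStream> streams) {
        try {
            // Serialize once into a 64K buffer (the buffer grows if needed).
            ByteArrayOutputStream buffer = new ByteArrayOutputStream(64 * 1024);
            try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
                out.writeObject(data);
            }
            byte[] bytes = buffer.toByteArray();

            int written = 0;
            for (int rank = 0; rank < streams.size(); rank++) {
                if (rank == myRank) {
                    continue;   // the sender already holds the data
                }
                OutputStream stream = streams.get(rank);
                stream.write(bytes);
                stream.flush();
                written++;
            }
            return written;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

With N ranks the sender performs N - 1 sequential writes of the full payload, which is one plausible reading of why the slide says this emulation deteriorates performance.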
26
MPI Job Execution
UWPlace (UWAgent Execution Platform)
27
4. Performance Evaluation
  • Evaluation Environment
  • An 8-node Myrinet-2000 cluster: 2.8GHz
    Pentium4-Xeon w/ 512MB
  • A 24-node Giga-Ethernet cluster: 3.4GHz
    Pentium4-Xeon w/ 512MB
  • Computation Granularity
  • Java Grande MPJ Benchmark
  • Process Resumption Overhead

28
MPJ.Send and Recv Performance
29
Computational Granularity 1
30
Computational Granularity 2
31
Computational Granularity 3
32
Performance Evaluation - Series
33
Performance Evaluation - RayTracer
34
Performance Evaluation - MolDyn
35
Overhead of Job Resumption
36
5. Related Work
  • From the viewpoints of
  • System Architecture
  • Mobile Agents
  • Fault Tolerance

37
System Architecture
Systems | Architectural basis
Globus | A toolkit
Condor | Process migration
Ninf, NetSolve | RPC
Legion (Avaki) | OO
Catalina, J-SEAL2, AgentTeamwork | Mobile agents
  • Difference from Catalina/J-SEAL2
  • They are not fully implemented.
  • They are based on a master-slave model

38
Mobile Agents
Mobile agents | Naming | Cascading termination | Job scheduling | Security
IBM Aglets | AgletFinder traces all agents | Needs to retract one by one | Schedules jobs with Baglets. | Java byte-code verification
Voyager | RPC-based system-unique agent IDs | Needs to be implemented at a user level | Launches an independent user process. | CORBA security service
D'Agent | Unpredictable agent IDs | Needs to be implemented at a user level | Launches an independent user process. | A currency-based model
Ara (Obsolete) | Unpredictable agent IDs | Calls ara_kill to kill all agents | Launches an independent user process. | An allowance model
UWAgent | Agent domain | Waits for all descendants' termination | Schedules jobs with Java thread functions. | Agent-to-agent security w/ Agent domain
39
Fault Tolerance
Systems | Libraries | Data recovery | Communication recovery
Legion (Avaki) | FT-MPI | Variables passed to MPI_FT_save( ) | N/A
Condor | MW Library | All master data | Master-worker communication
Dome | Dome_env | Objects declared as dXXX<type> | N/A
AgentTeamwork | GridTcp | All serializable class data | All in-transit messages
40
6. Conclusions
  • Project Summary
  • Next Two Years

41
Project summary
  • Our focus
  • A decentralized job execution and fault-tolerant
    environment
  • Applications not restricted to the master-slave
    or parameter-sweeping model.
  • Applications
  • 40,000 doubles x 10,000 floating-point operations
  • Moderate data transfer combined with
    massive/collective communication
  • At least three times larger than its
    computational granularity
  • Current status
  • UWAgent: completed
  • Agent behavioral design: basic job
    deployment/resumption implemented
  • User program wrapper: completed except security
    feature
  • GridTcp/mpiJava: in testing
  • Preprocessor: in design

42
Next Two Years
  • Application support
  • Preprocessor implementation
  • Efficient input/output file transfer
  • Security enhancement in remote execution
  • GUI improvement
  • Agent algorithms
  • Over-gateway application deployment
  • Dynamic resource monitoring
  • Priority-based agent migration
  • Performance evaluation
  • Dissemination

43
Questions?