Title: Connecting Condor Pools into Computational Grids by Jini
1 Connecting Condor Pools into Computational Grids by Jini
- Gergely Sipos and Péter Kacsuk
- MTA SZTAKI Computer and Automation Research Institute
- sipos, kacsuk_at_sztaki.hu
2 Goals of the research
- To create Grid systems by connecting a large number of clusters in a dynamic way
  - Condor is a successful approach (see the Hungarian ClusterGrid with more than 1000 PCs), but it has limitations
- To overcome the limitation of Condor Grids, where friendly Condor pools are connected using the flocking technique
- To demonstrate that the service-oriented Grid approach (such as OGSA) is usable for HPC
- To demonstrate that Jini is a viable alternative for creating service-oriented Grid systems
3 A pure Condor grid
[Figure: five Condor pools connected by static friendly relationships]
4 A pure Condor grid
[Figure: a client machine and five Condor pools]
- Resources meet the requirements of the job: execute it
- Resources do not meet the requirements of the job: forward it to a friendly pool
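The execute-or-forward rule on this slide can be illustrated with a minimal sketch. It is not Condor's implementation: the `Pool` class, its fields, and the `submit` function are hypothetical names chosen for illustration, and a job's requirements are reduced to a single memory figure.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Pool:
    """Hypothetical stand-in for a Condor pool (not a Condor API)."""
    name: str
    free_memory_mb: int
    # Friendly pools are fixed at configuration time (static relationships).
    friends: List["Pool"] = field(default_factory=list)

    def meets(self, required_memory_mb: int) -> bool:
        return self.free_memory_mb >= required_memory_mb


def submit(pool: Pool, required_memory_mb: int) -> Optional[str]:
    """Return the name of the pool that executes the job, or None."""
    if pool.meets(required_memory_mb):
        return pool.name  # resources meet the requirements: execute it
    for friend in pool.friends:  # otherwise forward it to a friendly pool
        if friend.meets(required_memory_mb):
            return friend.name
    return None  # no friendly pool can run the job


local = Pool("pool-a", free_memory_mb=512)
local.friends = [Pool("pool-b", 1024), Pool("pool-c", 4096)]
print(submit(local, 256))   # pool-a
print(submit(local, 2048))  # pool-c
```

Note that the client only names the pool it submits to; which friendly pool finally runs the job is decided by the static `friends` list, not by the client.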
5 Problems with Condor grids
- Friendly relationships are defined statically.
- Firewalls are not allowed between friendly pools.
- The client cannot choose a pool directly.
- Private Condor protocols are used to connect friendly pools together.
- The client needs an account to be able to submit a job and get the results back.
- Not service-oriented.
6 The role of Jini in connecting Condor pools
[Figure: three Condor pools, each with a Jini service program; a client machine running a Jini client program; a Jini lookup service that performs the matching]
7 The role of Jini in Condor-based grids
Everything is made dynamic
[Figure: the Jini service program of each of three Condor pools registers its proxy and attributes with the Jini lookup service; the Jini client program on the client machine sends its requirements to the lookup service, receives a proxy, and uses it to submit the job]
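The register, match, and submit flow shown above can be sketched as a toy in-memory registry. This is not the Jini API: `LookupService`, `Registration`, and `make_proxy` are hypothetical names, and attribute matching is simplified to numeric minimums, which is an assumption of this sketch.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class Registration:
    """A service proxy plus the attributes it was registered with."""
    proxy: Callable[[str], str]  # hypothetical proxy: takes a job name
    attributes: Dict[str, int]


class LookupService:
    """Toy in-memory registry; a real Jini lookup service is remote and lease-based."""

    def __init__(self) -> None:
        self._registrations: List[Registration] = []

    def register(self, proxy: Callable[[str], str], attributes: Dict[str, int]) -> None:
        self._registrations.append(Registration(proxy, attributes))

    def lookup(self, requirements: Dict[str, int]) -> Optional[Callable[[str], str]]:
        """Return the first proxy whose attributes satisfy every requirement."""
        for reg in self._registrations:
            if all(reg.attributes.get(k, 0) >= v for k, v in requirements.items()):
                return reg.proxy
        return None


def make_proxy(pool_name: str) -> Callable[[str], str]:
    """Build a hypothetical proxy that 'submits' a job to the named pool."""
    def submit(job: str) -> str:
        return f"{job} submitted to {pool_name}"
    return submit


lookup_service = LookupService()
lookup_service.register(make_proxy("pool-a"), {"nodes": 16})
lookup_service.register(make_proxy("pool-b"), {"nodes": 64})

proxy = lookup_service.lookup({"nodes": 32})
if proxy is not None:
    print(proxy("my-job"))  # my-job submitted to pool-b
```

Because pools register themselves at run time, a new pool becomes visible to clients without any change on the client side, in contrast to statically configured friendly pools.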
8 The functioning of the service 1. job submission
[Figure: a client machine (Jini client program, proxy) and a Condor cluster made up of a cluster front-end node (Jini service program, proc), a central manager node (Condor daemon), and three nodes; local file systems and HTTP servers are also shown]
Annotations in the figure:
- Can be used to submit a job
- To control the remote job
  - Stop the job
  - Get status
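The two roles annotated on this slide, one object for submitting a job and one for controlling it afterwards, can be sketched as follows. The class and method names (`ServiceProxy`, `JobProxy`, `get_status`, `stop`) are hypothetical, and job state is kept in a local dictionary instead of being handed to Condor.

```python
import itertools
from typing import Dict


class JobProxy:
    """Hypothetical handle returned on submission; controls one remote job."""

    def __init__(self, job_id: int, states: Dict[int, str]) -> None:
        self.job_id = job_id
        self._states = states

    def get_status(self) -> str:
        return self._states[self.job_id]

    def stop(self) -> None:
        self._states[self.job_id] = "stopped"


class ServiceProxy:
    """Hypothetical service proxy; here it only records jobs locally."""

    def __init__(self) -> None:
        self._ids = itertools.count(1)
        self._states: Dict[int, str] = {}
        self._executables: Dict[int, str] = {}

    def submit(self, executable: str) -> JobProxy:
        job_id = next(self._ids)
        self._executables[job_id] = executable
        # A real service would pass the job to the local Condor pool here.
        self._states[job_id] = "running"
        return JobProxy(job_id, self._states)


service = ServiceProxy()
job = service.submit("simulate.exe")
print(job.get_status())  # running
job.stop()
print(job.get_status())  # stopped
```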
9 The functioning of the service 2. result download
[Figure: the same client machine and Condor cluster as on the previous slide, now with job processes running on the nodes and a job proxy in addition to the service proxy]
10 The functioning of the service 2. result download
[Figure: the same layout as on the previous slide, with an added "archive results" step; local file systems and HTTP servers are shown]
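The "archive results" step can be sketched as packing a job's output files into a single archive that an HTTP server could then offer for download. The slides do not state the archive format or directory layout; the zip format, the `archive_results` function, and the file names below are assumptions made for illustration.

```python
import tempfile
import zipfile
from pathlib import Path


def archive_results(job_dir: Path, archive_path: Path) -> Path:
    """Pack every file under job_dir into one zip archive."""
    with zipfile.ZipFile(archive_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for path in sorted(job_dir.rglob("*")):
            if path.is_file():
                zf.write(path, str(path.relative_to(job_dir)))
    return archive_path


with tempfile.TemporaryDirectory() as tmp:
    job_dir = Path(tmp) / "job-1"
    job_dir.mkdir()
    (job_dir / "stdout.txt").write_text("42\n")
    (job_dir / "result.dat").write_text("data\n")

    # The archive is written outside job_dir, where a web server could serve it.
    archive = archive_results(job_dir, Path(tmp) / "job-1.zip")
    with zipfile.ZipFile(archive) as zf:
        names = zf.namelist()
    print(names)  # ['result.dat', 'stdout.txt']
```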
11 Cluster-level security
[Figure: the same client machine and Condor cluster layout as on the previous slide, with a demilitarized zone marked]
- Condor clusters can be protected by firewalls
12 Application monitoring by GAMI
- GAMI: Grid Application Monitoring Infrastructure (developed by SZTAKI in the DataGrid and GridLab projects)
- On the client side
  - The Instrumentation API of GRM has to be used during application development
  - The GRM trace collector has to be started
  - The PROVE trace visualiser has to be started
- On the cluster side
  - The Mercury monitor service has to be started
13 Application monitoring scenario
[Figure: Jini lookup service; client machine (Jini client program, job proxy, PROVE, GRM); Condor cluster with a cluster front-end node (Jini service program, proc), a central manager node (Mercury: Main monitor, Monitor service), and nodes each running a Local monitor]
Annotations in the figure:
- The contact URL of the Monitor Service is stored as a service attribute.
- ID: the information necessary to subscribe for the trace of a job.
14 Application monitoring scenario
[Figure: the same layout as on the previous slide; the service registers its proxy and the Monitor Service (MS) URL with the Jini lookup service, the client downloads them, job processes run on the nodes, and the job ID is passed along with the job proxy]
15 Application monitoring scenario
On-line application monitoring and visualization
[Figure: the same layout as on the previous slide, with an added "subscribe" step from PROVE and GRM on the client machine to the Monitor service on the central manager node]
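The subscription step can be sketched as follows: the client takes the Monitor Service contact URL from the service's attributes and the job ID from the job proxy, then asks the monitor service for that job's trace. This is not the Mercury or GRM API; `MonitorService`, the `monitor://cluster-a` URL, the `monitor_url` attribute name, and the event strings are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


class MonitorService:
    """Toy stand-in for the cluster's monitor service (not the Mercury API)."""

    def __init__(self) -> None:
        self._traces: Dict[str, List[str]] = {}

    def record(self, job_id: str, event: str) -> None:
        self._traces.setdefault(job_id, []).append(event)

    def subscribe(self, job_id: str) -> List[str]:
        """Return the trace events recorded so far for one job."""
        return list(self._traces.get(job_id, []))


@dataclass
class JobProxy:
    job_id: str  # the ID needed to subscribe for the job's trace


# Hypothetical registry that resolves contact URLs to monitor services.
monitors = {"monitor://cluster-a": MonitorService()}

# The contact URL arrives as a service attribute from the lookup service.
service_attributes = {"monitor_url": "monitor://cluster-a"}
job = JobProxy(job_id="job-7")

ms = monitors[service_attributes["monitor_url"]]
ms.record("job-7", "start")
ms.record("job-7", "send msg")
print(ms.subscribe(job.job_id))  # ['start', 'send msg']
```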
16 Conclusions 1.
- We have created a service-oriented Grid by connecting Condor clusters
  - Protected by firewalls
  - Applying publicly known protocols and APIs
  - Providing on-line monitoring and visualization
- Implementation by Jini
  - Authentication and authorization facility from Jini 2.0
- Generalization possibilities
  - Jini services can be easily replaced by GT-3 or GT-4 services
  - Condor is replaceable by any local job manager (SGE, PBS, etc.)
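The claim that Condor can be replaced by another local job manager can be illustrated by hiding the submission command behind a small interface. The class names and the `build_submission` helper are hypothetical; only the command names `condor_submit` and `qsub` are real, and the commands are built but not executed.

```python
from abc import ABC, abstractmethod
from typing import List


class LocalJobManager(ABC):
    """Hypothetical interface the grid service would program against."""

    @abstractmethod
    def submit_command(self, job_file: str) -> List[str]:
        """Return the command line that submits job_file to this manager."""


class CondorManager(LocalJobManager):
    def submit_command(self, job_file: str) -> List[str]:
        return ["condor_submit", job_file]


class PbsManager(LocalJobManager):
    def submit_command(self, job_file: str) -> List[str]:
        return ["qsub", job_file]


def build_submission(manager: LocalJobManager, job_file: str) -> List[str]:
    # The service code stays the same whichever manager is plugged in.
    return manager.submit_command(job_file)


print(build_submission(CondorManager(), "job.sub"))  # ['condor_submit', 'job.sub']
print(build_submission(PbsManager(), "job.sh"))      # ['qsub', 'job.sh']
```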
17 Conclusions 2.
- Successful test on several clusters of Hungary's country-wide academic network
- Live demo at the IEEE Cluster2003 conference
- Part of the Hungarian Jgrid project
- Project web page: pds.irt.vein.hu/jgrid
- Thanks for your attention