Title: Bridging Unicore and Condor
1. Bridging Unicore and Condor
- Hidemoto Nakada
- National Institute of Advanced Industrial Science and Technology, Japan
2. Background: The NAREGI project (1)
- NAtional REsearch Grid Initiative
- Japanese national Grid project, funded by the Ministry of Education
- Five-year project, started April 2003
- 17M / year
- The Goals
  - To develop a Grid middleware and upper layer
  - To construct a production Grid for nano-science simulations
- Organization
  - Corporate: Fujitsu, Hitachi, NEC
  - Academic: AIST, NII, Titech, Institute for Molecular Science
3. Background: The NAREGI project (2)
- For the first two years, employ Condor, UNICORE and Globus as the bases
- Construct a Grid testbed using these three
- They must be interoperable
[Figure: Condor, UNICORE and Globus]
4. The UNICORE System
- A Grid middleware developed mainly by Fujitsu Laboratories of Europe
  - Owned by the UNICORE Forum
- Designed to utilize several supercomputers installed at distributed supercomputer centers
- SSL based security model
  - No proxy certificates
- Firewall aware (c.f. Globus)
  - Connection is one-way
  - Can be used from a privately addressed network
  - Submit and disconnect!
- Written entirely in Java (except for small Perl scripts)
5. The UNICORE System
- Workflow management
  - Everything is a task
    - Invocation of an executable file
    - File transfer: staging in, out
  - Workflow (task flow) is represented as a Java object (AJO: Abstract Job Object)
  - Flow control structures are provided
    - If branch
    - For loop
  - The workflow graph can be cyclic!
- No scheduler/broker included
  - NAREGI will provide it
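The task-flow idea on this slide can be illustrated with a toy model. This is only a sketch in Python, not the AJO API (which is Java); the names `Task`, `Workflow` and `run` are hypothetical. Each task chooses its successor at run time, so an if-branch and a loop (a cycle in the graph) both fall out of the same mechanism.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

# Hypothetical toy model of a task flow; NOT the real UNICORE AJO classes.


@dataclass
class Task:
    name: str
    # Mutates a shared state dict (stands in for "run executable" or "stage file").
    action: Callable[[dict], None]
    # Picks the next task name from the state; None ends the workflow.
    next_task: Callable[[dict], Optional[str]]


@dataclass
class Workflow:
    tasks: Dict[str, Task] = field(default_factory=dict)

    def add(self, task: Task) -> None:
        self.tasks[task.name] = task

    def run(self, start: str, state: dict, max_steps: int = 100) -> List[str]:
        """Walk the graph from `start` and return the names of executed tasks."""
        trace: List[str] = []
        current: Optional[str] = start
        while current is not None:
            if len(trace) >= max_steps:  # guard: a cyclic graph may never end
                raise RuntimeError("step limit exceeded")
            task = self.tasks[current]
            task.action(state)
            trace.append(task.name)
            current = task.next_task(state)
        return trace


def build_example() -> Workflow:
    wf = Workflow()
    wf.add(Task("stage_in", lambda s: s.update(staged=True), lambda s: "simulate"))
    wf.add(Task("simulate", lambda s: s.update(iters=s["iters"] + 1), lambda s: "check"))
    # The branch below loops back to "simulate", which makes the graph cyclic.
    wf.add(Task("check", lambda s: None,
                lambda s: "simulate" if s["iters"] < 3 else "stage_out"))
    wf.add(Task("stage_out", lambda s: s.update(done=True), lambda s: None))
    return wf


state = {"iters": 0}
trace = build_example().run("stage_in", state)
print(trace)
```

The `max_steps` guard is a design choice of the sketch: once cycles are allowed, termination depends on the branch conditions, so an engine needs some limit or monitoring.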
6. The UNICORE architecture
- Gateway
  - Application-level router
  - Runs on a firewall
  - Relays all communications
  - SSL based security
- NJS (Network Job Supervisor)
  - Workflow engine
  - Interprets the AJO and executes it
- TSI (Target System Interface)
  - Wraps the batch subsystem
  - Implemented in Perl
[Figure: Client, Firewall, Gateway, NJS, TSI, Batch Subsystem; Vsite inside Usite]
7. GUI Client (UNICORE Pro Client)
- GUI
  - Edits workflows
  - Monitors jobs
- Not freely available right now
  - Provided by a company called Pallas
  - Will be available soon ...
8. Overview of UNICORE
- Usite: Supercomputer center
- Vsite: Cluster or supercomputer in the center
[Figure: Client, Firewall, two Usites, three Vsites]
9. Bridging Condor and UNICORE
- UNICORE -> Condor
  - Use Condor as the local scheduler within a single site
  - EASY: just write a TSI Perl script
  - C.f. the Condor job manager for Globus
- Condor -> UNICORE
  - Use Condor as a global scheduler
  - UNICORE serves as the local scheduler
  - NOT SO EASY: have to implement bridging modules
  - C.f. Condor-G for Globus
10. UNICORE-C: UNICORE -> Condor
11. Unicore-C Overview
[Figure: Client, Firewall, Gateway, NJS, two TSIs (one for PBS, one using Condor Submit for a Condor Pool), Vsite, Usite]
12. TSI: Target System Interface
- Written in Perl
- Takes care of
  - Job invocation
  - File placement
13. NJS-TSI interface
[Figure: two NJS/TSI pairs; each NJS passes a script to its TSI, which submits either via qsub to PBS or via Condor Submit to Condor]
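The TSI is the layer that turns one job request into a call on the local batch system, which is why UNICORE-C only needs a different TSI script. The real TSI is written in Perl; the Python sketch below only illustrates the idea. `JobRequest`, `to_pbs_command` and `to_condor_submit` are hypothetical names, and the fields of the request are assumptions rather than the actual NJS-TSI message format.

```python
from dataclasses import dataclass, field
from typing import List


# Hypothetical job description handed from the NJS to a TSI.
@dataclass
class JobRequest:
    executable: str
    arguments: List[str] = field(default_factory=list)
    stdout: str = "stdout.txt"
    stderr: str = "stderr.txt"


def to_pbs_command(job: JobRequest, script_path: str) -> List[str]:
    """Command line a PBS-backed TSI might run: qsub with output and error files."""
    return ["qsub", "-o", job.stdout, "-e", job.stderr, script_path]


def to_condor_submit(job: JobRequest, log: str = "job.log") -> str:
    """Submit description a Condor-backed TSI might pass to condor_submit."""
    lines = [
        "universe = vanilla",
        f"executable = {job.executable}",
        f"arguments = {' '.join(job.arguments)}",
        f"output = {job.stdout}",
        f"error = {job.stderr}",
        f"log = {log}",
        "queue",
    ]
    return "\n".join(lines) + "\n"


job = JobRequest("/home/foo/a.exe", ["arg1", "-a"])
pbs_cmd = to_pbs_command(job, "job.sh")
condor_text = to_condor_submit(job)
print(pbs_cmd)
print(condor_text)
```

Only the translation step differs between the two back ends; the request coming from the NJS is the same in both cases.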
14. Implication of Unicore-C
- UNICORE serves as a workflow engine for Condor
  - C.f. DAGMan
  - Users can use the GUI to edit workflow graphs
- UNICORE serves as a submission tool for Condor
  - Users can submit jobs from outside of the site
  - Can submit from a privately addressed network
  - Submit and disconnect
15. Condor-U: Condor -> UNICORE
16. Condor-U overview
- Condor-G
  - Condor -> Globus bridge
- Replace G (Globus) with U (UNICORE)
17. Condor-G Overview
[Figure: on the Condor submit machine, the User, Schedd, Grid Manager and Globus GAHP Server; in the Globus world, the GateKeeper, JobManager, Batch System and Job; the two sides communicate over the GRAM protocol]
18. What is the GAHP (Grid ASCII Helper Protocol)?
- Simple text-based protocol
- Introduced to cope with the instability of the Globus modules
  - Encapsulates the Globus modules inside the GAHP server
  - Originally, it was the Globus ASCII Helper Protocol
- Separates Return and Result
  - To enable asynchronous operation
  - Return comes immediately
  - Result comes later
[Figure: the Grid Manager sends a Request to the GAHP Server, which sends back a Return and, later, a Result]
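The Return/Result split can be sketched as follows. This is a toy model in Python, not the real GAHP wire format: `ToyHelperServer`, its methods, the `PING` command and the `"S"`/`"E"` reply strings are all assumptions made for illustration. The point is that `request` answers at once, while the outcome is collected later with a separate call.

```python
from collections import deque
from typing import Callable, Deque, Dict, List, Tuple


class ToyHelperServer:
    """Simplified Return/Result model; hypothetical, not the GAHP protocol itself."""

    def __init__(self) -> None:
        self._handlers: Dict[str, Callable[[List[str]], str]] = {}
        self._pending: Deque[Tuple[int, str, List[str]]] = deque()
        self._results: List[Tuple[int, str]] = []

    def register(self, command: str, handler: Callable[[List[str]], str]) -> None:
        self._handlers[command] = handler

    def request(self, req_id: int, command: str, *args: str) -> str:
        """Return immediately: 'S' if the request was accepted, 'E' if unknown."""
        if command not in self._handlers:
            return "E"
        self._pending.append((req_id, command, list(args)))
        return "S"

    def work(self) -> None:
        """Run queued requests (a real server would do this in the background)."""
        while self._pending:
            req_id, command, args = self._pending.popleft()
            self._results.append((req_id, self._handlers[command](args)))

    def results(self) -> List[Tuple[int, str]]:
        """Hand over the finished results and clear the internal list."""
        done, self._results = self._results, []
        return done


server = ToyHelperServer()
server.register("PING", lambda args: "pong " + " ".join(args))

first_return = server.request(1, "PING", "siteA")     # accepted at once
unknown_return = server.request(2, "NO_SUCH_COMMAND")  # rejected at once
before = server.results()                              # nothing has finished yet
server.work()
after = server.results()                               # the Result arrives later
print(first_return, unknown_return, before, after)
```

Tagging each request with an ID lets the caller match a late Result to the request that caused it, so many operations can be in flight at once.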
19. Condor-G Overview
[Figure: the Condor-G diagram from slide 17, repeated for comparison]
20. Condor-U Overview
[Figure: on the Condor submit machine, the User, Schedd, Grid Manager and UNICORE GAHP Server; in the UNICORE world, the Firewall, Gateway, NJS, TSI, Batch Subsystem and Job inside a Vsite within a Usite; the two sides communicate over the UNICORE protocol]
21. Almost the Same!
- Can we do it just by re-implementing the GAHP server?
  - NO!
  - The GAHP command set is Globus specific
  - Cannot be used for UNICORE
22. GAHP command set for Globus
- INITIALIZE_FROM_FILE
- INITIALIZE_FROM_MYPROXY
- COMMANDS
- VERSION
- ASYNC_MODE_ON
- ASYNC_MODE_OFF
- QUIT
- RESULTS
- GRAM_CALLBACK_ALLOW
- GRAM_ERROR_STRING
- GRAM_JOB_REQUEST
- GRAM_JOB_CANCEL
- GRAM_JOB_STATUS
- GRAM_JOB_SIGNAL
- GRAM_PING
- GRAM_JOB_CALLBACK_REGISTER
- GASS_SERVER_INIT
- REFRESH_PROXY_FROM_FILE
- MYPROXY_REFRESH
- MYPROXY_RETRIEVE
- PROXY_INFO
- MYPROXY_DESTROY
- MYPROXY_DELEGATE
23. What we did
- Redesign the GAHP command set
  - As simple and generic as possible
  - Note: The GAHP protocol is not changed
- Implement the Grid Manager for UNICORE
  - Can be reused for other systems
  - Done by Jaime Frey of the Condor Team
- Implement the GAHP server for UNICORE
24. Design principle of the GAHP commands for UNICORE
- Simple and generic
  - High-level commands
  - Just 4 commands (c.f. 23 commands for Globus)
- Hide UNICORE-specific logic in the GAHP server, not the Grid Manager
  - So that the Grid Manager can be used for other systems
- Use a ClassAd as the command argument and the return value
  - To ensure the generality of the command set
  - System-specific things are encapsulated in the ClassAd
  - You can extend the functionality just by extending the ClassAd attributes, without touching the command set itself
25. Command set for Unicore GAHP
- Job Create
  - Create a job
  - Input: ClassAd
  - Output: Job handle
- Job Start
  - Invoke the job
  - Input: Job handle
- Job Status
  - Query the job status
  - Input: Job handle
  - Output: Status ClassAd
- Job Destroy
  - Destroy the information stored in the GAHP server
  - Input: Job handle
[Figure: message sequence between the GridManager and the Unicore GAHP: Job Create returns a Job Handle; Job Start; Job Status returns Running; Job Status returns Complete; Job Destroy]
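The four-command life cycle can be sketched with an in-memory stand-in for the GAHP server. Everything here is hypothetical: `ToyJobServer`, the method names, the `job-N` handle format and the `Created` state are assumptions, and a plain `dict` stands in for a ClassAd. Only the `Running` and `Complete` states and the order of the calls come from the slide.

```python
import itertools
from typing import Dict

ClassAd = Dict[str, object]  # stand-in for a real ClassAd


class ToyJobServer:
    """Hypothetical in-memory model of the create/start/status/destroy commands."""

    def __init__(self) -> None:
        self._jobs: Dict[str, ClassAd] = {}
        self._ids = itertools.count(1)

    def job_create(self, ad: ClassAd) -> str:
        """Store the job description and return a handle for later commands."""
        handle = f"job-{next(self._ids)}"
        self._jobs[handle] = {**ad, "JobStatus": "Created"}
        return handle

    def job_start(self, handle: str) -> None:
        self._jobs[handle]["JobStatus"] = "Running"

    def job_status(self, handle: str) -> ClassAd:
        """Return a copy of the status ClassAd for the job."""
        return dict(self._jobs[handle])

    def job_destroy(self, handle: str) -> None:
        """Forget everything the server stored about the job."""
        del self._jobs[handle]

    def finish(self, handle: str, exit_code: int) -> None:
        """Test hook: simulate the remote job completing."""
        self._jobs[handle].update(JobStatus="Complete", ExitCode=exit_code)


server = ToyJobServer()
handle = server.job_create({"Cmd": "/home/foo/a.exe", "Args": "arg1 -a"})
server.job_start(handle)
first = server.job_status(handle)["JobStatus"]
server.finish(handle, exit_code=0)
final = server.job_status(handle)
server.job_destroy(handle)
print(handle, first, final["JobStatus"], final["ExitCode"])
```

Because both the argument and the status are attribute maps, a new attribute such as `ExitCode` can be added without changing any of the four method signatures.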
26. ClassAd Attributes (1): generic
- Cmd: pathname of the command to execute (example: /home/foo/a.exe)
- Args: arguments for the executable (example: arg1, -a)
- Env: environment variables (example: LANG=en_US)
- current directory (example: home/nakada/condor)
- In: standard input (example: input.dat)
- Out: standard output (example: ouput.dat)
- Err: standard error (example: Error.dat)
- TransferInput: stage-in files (example: a.exe, input.dat)
- TransferOutput: stage-out files (example: out.dat)
- JobStatus: Condor job status (example: jobClassAdAttributes)
- ErrMessage: error messages (example: Script reported no errors)
- RemoteWallClockTime: execution wall clock time (example: 123.0)
- ByteSent: bytes sent (example: 1023004)
- ByteRecvd: bytes received (example: 1023004)
- ExitBySignal: was the job process terminated by a signal? (example: TRUE)
- ExitCode: exit code of the process (example: 1)
- ExitSignal: exit signal of the process (example: 9)
27. ClassAd attributes (2): UNICORE specific
- UnicoreUsite: FQDN and port number of the UNICORE Usite gateway (example: fujitsu.com:1234)
- UnicoreVsite: Vsite name (example: NaReGI)
- KeystoreFile: keystore file name (example: /home/foo/key)
- PassphraseFile: passphrase file name (example: /home/foo/passwd)
- UnicoreJobId: job ID used as the handle (example: fujitsu.com:1234/NaReGI/1374036929)
- UnicoreJobStatus: UNICORE job status
- UnicoreLog: UNICORE log file name (example: /var/log/unicore.log)
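As a rough illustration of how these attributes travel together, the Python sketch below renders a set of attributes as `Name = value` lines and splits a job ID into its parts. It rests on assumptions: the quoting rules are simplified, the job ID is taken to be `host:port/vsite/id` based only on the example value, and `render_classad`, `JobId` and `parse_unicore_job_id` are hypothetical names, not part of Condor or UNICORE.

```python
from typing import Dict, NamedTuple, Union

Value = Union[str, int, float, bool]


def render_classad(attrs: Dict[str, Value]) -> str:
    """Render attributes as 'Name = value' lines (simplified ClassAd-like syntax)."""

    def fmt(value: Value) -> str:
        if isinstance(value, bool):  # check bool first: bool is a subclass of int
            return "TRUE" if value else "FALSE"
        if isinstance(value, str):
            return '"' + value.replace('"', '\\"') + '"'
        return repr(value)

    return "\n".join(f"{name} = {fmt(value)}" for name, value in attrs.items())


class JobId(NamedTuple):
    host: str
    port: int
    vsite: str
    local_id: str


def parse_unicore_job_id(job_id: str) -> JobId:
    """Split 'host:port/vsite/id' (format assumed from the example value)."""
    usite, vsite, local_id = job_id.split("/")
    host, port = usite.rsplit(":", 1)
    return JobId(host, int(port), vsite, local_id)


ad = {
    "Cmd": "/home/foo/a.exe",
    "ExitBySignal": True,
    "ExitSignal": 9,
    "UnicoreUsite": "fujitsu.com:1234",
    "UnicoreVsite": "NaReGI",
}
text = render_classad(ad)
parsed = parse_unicore_job_id("fujitsu.com:1234/NaReGI/1374036929")
print(text)
print(parsed)
```

Generic and UNICORE-specific attributes sit side by side in one map, which is what lets the command set stay unchanged when another back end needs its own attributes.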
28. Submit file sample for Condor-U
- Universe = globus (for historical reasons)
- SubUniverse = unicore (specifies the GAHP server)
- Executable = a.out
- output = tmpOut
- error = tmpErr
- log = tmp.log
- UnicoreUsite = fujitsu.com:1234 (specifies the site to be used)
- UnicoreVsite = NaReGI
- KeystoreFile = /home/foo/key (used to get the certificate)
- PassphraseFile = /home/foo/passwd
- Queue
29. Implication of the Condor-U
- Condor users can use resources managed by UNICORE
  - From outside of the sites
- Users can use Condor as a job scheduler for UNICORE-managed resources
  - Condor GlideIn might be used on it
  - Communication between nodes has to be assured; this is not common in UNICORE setups
30. Current Status
- UNICORE-C
  - Done
  - Will be available soon from our Web site
  - http://www.naregi.org/
- Condor-U
  - Under implementation
  - Will be available by this summer, I hope.
31. Summary
- Condor, UNICORE and Globus are bridged together!
  - Users can submit jobs from one system to another
- The UNICORE GAHP server command set is generic and simple
  - Can be used to bridge to other systems
32. Thank you!