Bridging Unicore and Condor - PowerPoint PPT Presentation

About This Presentation
Title:

Bridging Unicore and Condor

Description:

Bridging Unicore and Condor Hidemoto Nakada National Institute of Advanced Industrial Science and Technology, Japan – PowerPoint PPT presentation

Slides: 32

Transcript and Presenter's Notes


1
Bridging Unicore and Condor
  • Hidemoto Nakada
  • National Institute of Advanced Industrial Science
    and Technology, Japan

2
Background: The NAREGI project (1)
  • NAtional REsearch Grid Initiative
  • Japanese national Grid project, funded by
    Ministry of Education.
  • 5-year project, starting April 2003
  • 17M / year
  • The Goals
  • To develop a Grid Middleware and Upper Layer
  • To construct a production Grid for Nano-science
    simulations
  • Organization
  • Corporate: Fujitsu, Hitachi, NEC
  • Academic: AIST, NII, Titech, Institute for
    Molecular Science

3
Background: The NAREGI project (2)
  • For the first 2 years, employ Condor, UNICORE and
    Globus as the bases
  • Construct a Grid testbed using these three
  • They must be interoperable

[Figure: Condor, UNICORE and Globus]

4
The UNICORE System
  • A Grid middleware developed mainly by Fujitsu
    Lab. Europe
  • Owned by UNICORE Forum
  • Designed to utilize several supercomputers
    installed at distributed supercomputer centers
  • SSL based security model
  • No PROXY CERT.
  • Firewall aware (cf. Globus)
  • Connection is one-way
  • Can be used from a privately addressed network
  • Submit and Disconnect!
  • Totally in Java (except for small perl scripts)

5
The UNICORE System
  • Workflow Management
  • Everything is a task
  • Invocation of an executable file
  • File transfer: staging in, out
  • Workflow (task flow) is represented as a Java
    object (AJO: Abstract Job Object)
  • Flow control structures are provided
  • If branch
  • For loop
  • The workflow graph can be cyclic!
  • No scheduler/broker included
  • NAREGI will provide it
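
The idea of a workflow held as an object, with tasks plus If and For control structures, can be sketched as below. This is only an illustration in Python: UNICORE itself uses Java AJO classes, and every class and function name here (Task, ForLoop, IfBranch, run_workflow) is hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Task:
    """A leaf action, e.g. running an executable or staging a file."""
    name: str

    def run(self, log: List[str]) -> None:
        log.append(self.name)


@dataclass
class ForLoop:
    """Repeats its body a fixed number of times (a cycle in the graph)."""
    count: int
    body: List[object] = field(default_factory=list)

    def run(self, log: List[str]) -> None:
        for _ in range(self.count):
            for step in self.body:
                step.run(log)


@dataclass
class IfBranch:
    """Runs one of two step lists depending on a condition."""
    condition: Callable[[], bool]
    then: List[object] = field(default_factory=list)
    otherwise: List[object] = field(default_factory=list)

    def run(self, log: List[str]) -> None:
        chosen = self.then if self.condition() else self.otherwise
        for step in chosen:
            step.run(log)


def run_workflow(steps: List[object]) -> List[str]:
    """Execute the steps in order and return the names of the tasks that ran."""
    log: List[str] = []
    for step in steps:
        step.run(log)
    return log


workflow = [
    Task("stage-in input.dat"),
    ForLoop(2, [Task("run a.exe")]),
    IfBranch(lambda: True, [Task("stage-out out.dat")]),
]
trace = run_workflow(workflow)
```

Because the control structures are themselves steps, they can be nested, which is one simple way to get a workflow that revisits the same task more than once.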

6
The UNICORE architecture
  • Gateway
  • Application level Router
  • Runs on a Firewall
  • Relays all communications
  • SSL based security
  • NJS (Network Job Supervisor)
  • Workflow engine
  • Interprets the AJO and executes it
  • TSI (Target System Interface)
  • Wraps the batch subsystem
  • Implemented in Perl

[Figure: Client, Firewall, Gateway, NJS, TSI and Batch Subsystem, grouped into Vsite and Usite]

7
GUI Client (UNICORE Pro Client)
  • GUI
  • Edits workflows
  • Monitors jobs
  • Not freely available right now
  • Provided by a company called Pallas
  • Will be soon ...

8
Overview of UNICORE
  • Usite: Supercomputer center
  • Vsite: Cluster or supercomputer in the center

[Figure: Client, Firewall, two Usites and three Vsites]

9
Bridging Condor and UNICORE
  • UNICORE → Condor
  • Use Condor as local scheduler within a single
    site.
  • EASY: just write a TSI Perl script.
  • cf. Condor Job-manager for Globus
  • Condor → UNICORE
  • Use Condor as a global scheduler
  • UNICORE serves as local scheduler
  • NOT SO EASY: have to implement bridging modules
  • cf. Condor-G for Globus

10
UNICORE-C: UNICORE → Condor
11
Unicore-C Overview
[Figure: Client, Firewall, Gateway, NJS and two TSIs (one for PBS, one using Condor Submit to reach a Condor Pool), grouped into Vsite and Usite]

12
TSI: Target System Interface
  • Written in Perl
  • Takes care of
  • Job invocation
  • File placement
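
The job-invocation half of a Condor-facing TSI can be sketched as follows: take the script handed over by the NJS and wrap it in a Condor submit description. The real TSI is written in Perl; this Python version, its function names and its default file names are assumptions made for illustration only. The submit keywords and the condor_submit command are standard Condor ones.

```python
from typing import List


def make_submit_description(script_path: str,
                            out: str = "job.out",
                            err: str = "job.err",
                            log: str = "job.log") -> str:
    """Build the text of a Condor submit description for one job script."""
    lines = [
        "universe = vanilla",            # plain, unmodified executable
        f"executable = {script_path}",   # the script received from the NJS
        f"output = {out}",
        f"error = {err}",
        f"log = {log}",
        "queue",                         # enqueue exactly one job
    ]
    return "\n".join(lines) + "\n"


def submit_command(submit_file: str) -> List[str]:
    """Return the command line a TSI-like wrapper would execute."""
    return ["condor_submit", submit_file]


description = make_submit_description("/tmp/unicore_job.sh")
command = submit_command("/tmp/unicore_job.sub")
```

A PBS-facing TSI would do the same job with qsub in place of condor_submit, which is why swapping the batch system only requires touching this layer.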

13
NJS-TSI interface
[Figure: the NJS passes a script to the TSI, which submits it to PBS via qsub or to Condor via Condor Submit]

14
Implication of Unicore-C
  • UNICORE serves as a workflow engine for Condor
  • cf. DAGMan
  • Users can use the GUI to edit workflow graphs
  • UNICORE as a submission tool for Condor
  • Users can submit jobs from outside of the site
  • Can submit from a privately addressed network
  • Submit and Disconnect

15
Condor-U: Condor → UNICORE
16
Condor-U overview
  • Condor-G
  • Condor → Globus bridge
  • Replace G (Globus) with U (UNICORE)

17
Condor-G Overview
[Figure: on the Condor Submit Machine, the User, Schedd, Grid Manager and Globus GAHP Server; in the GLOBUS WORLD, reached over the GRAM Protocol, the GateKeeper, JobManager, Batch System and Job]

18
What is the GAHP (Grid ASCII Helper Protocol)?
  • Text based simple protocol
  • Introduced to cope with the Globus module
    instability
  • Encapsulate Globus module inside the GAHP Server
  • Originally, it was Globus ASCII Helper Protocol
  • Separates Return and Result
  • To enable asynchronous operation
  • Return comes immediately
  • Result comes later
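
The split between Return and Result can be sketched with a toy in-memory server. This is a simplified illustration, not the GAHP wire protocol: the class, its method names, the command string and the "S" acknowledgement marker are all assumptions made for the example.

```python
import itertools
from collections import deque


class ToyAsyncServer:
    """In-memory stand-in for a GAHP-style server (hypothetical, simplified)."""

    def __init__(self):
        self._ids = itertools.count(1)
        self._pending = deque()   # accepted requests, not yet executed
        self._results = deque()   # finished work waiting to be collected

    def request(self, command, arg):
        """Accept a request; the Return is only an acknowledgement plus an id."""
        req_id = next(self._ids)
        self._pending.append((req_id, command, arg))
        return ("S", req_id)      # handed back at once, before any work is done

    def work(self):
        """Do the slow part later (a real server would use threads or I/O)."""
        while self._pending:
            req_id, command, arg = self._pending.popleft()
            self._results.append((req_id, f"{command} done for {arg}"))

    def results(self):
        """Hand over every Result produced so far and empty the queue."""
        done = list(self._results)
        self._results.clear()
        return done


server = ToyAsyncServer()
ack = server.request("JOB_STATUS", "job-1")   # Return: immediate
early = server.results()                      # nothing finished yet
server.work()
late = server.results()                       # Result: arrives later
```

The request id is what lets the caller match a late Result to the Request that caused it, so the Grid Manager never has to block on a slow remote call.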

[Figure: the Grid Manager sends a Request to the GAHP Server, which sends back a Return and, later, a Result]

19
Condor-G Overview
[Figure: on the Condor Submit Machine, the User, Schedd, Grid Manager and Globus GAHP Server; in the GLOBUS WORLD, reached over the GRAM Protocol, the GateKeeper, JobManager, Batch System and Job]

20
Condor-U Overview
[Figure: on the Condor Submit Machine, the User, Schedd, Grid Manager and UNICORE GAHP Server; in the UNICORE WORLD, reached over the UNICORE Protocol through the Firewall, the Gateway, NJS, TSI, Batch Subsystem and Job, grouped into Vsite and Usite]

21
Almost the Same!
  • Can we do it just by re-implementing the GAHP
    Server?
  • NO!
  • The GAHP command set is Globus Specific
  • Cannot be used for UNICORE

22
GAHP command set for Globus
  • INITIALIZE_FROM_FILE
  • INITIALIZE_FROM_MYPROXY
  • COMMANDS
  • VERSION
  • ASYNC_MODE_ON
  • ASYNC_MODE_OFF
  • QUIT
  • RESULTS
  • GRAM_CALLBACK_ALLOW
  • GRAM_ERROR_STRING
  • GRAM_JOB_REQUEST
  • GRAM_JOB_CANCEL
  • GRAM_JOB_STATUS
  • GRAM_JOB_SIGNAL
  • GRAM_PING
  • GRAM_JOB_CALLBACK_REGISTER
  • GASS_SERVER_INIT
  • REFRESH_PROXY_FROM_FILE
  • MYPROXY_REFRESH
  • MYPROXY_RETRIEVE
  • PROXY_INFO
  • MYPROXY_DESTROY
  • MYPROXY_DELEGATE

23
What we did
  • Redesign the GAHP command set
  • As simple and generic as possible
  • Note: the GAHP protocol itself is not changed
  • Implement the Grid Manager for UNICORE
  • Can be reused for other systems
  • Done by Jaime Frey of the Condor Team
  • Implement the GAHP server for UNICORE

24
Design principle of the GAHP commands for UNICORE
  • Simple and Generic
  • High level commands
  • Just 4 commands (cf. 23 commands for Globus)
  • Hide UNICORE specific logic in the GAHP server,
    not the Grid Manager
  • So that it can be used for other systems.
  • Use ClassAd as a command argument and a return
    value
  • To ensure the generality of the command set
  • System specific things are encapsulated in the
    ClassAd
  • You can extend the functionality by just
    extending the ClassAd attribute, without touching
    the Command set itself

25
Command set for Unicore GAHP
  • Job Create
  • Create a Job
  • Input: ClassAd
  • Output: Job Handle
  • Job Start
  • Invoke the Job
  • Input: Job Handle
  • Job Status
  • Query the Job status
  • Input: Job Handle
  • Output: Status ClassAd
  • Job Destroy
  • Destroy information stored in the GAHP server
  • Input: Job Handle
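
The four-command life cycle can be sketched as a small in-memory class, with a plain dict standing in for a ClassAd. Everything here is hypothetical: the class and method names, the handle format, the "Created" state and the finish() helper are assumptions for illustration. Only the four operations and the Running and Complete states come from the slides.

```python
class ToyUnicoreGahp:
    """Hypothetical stand-in for a GAHP server exposing four generic commands."""

    def __init__(self):
        self._jobs = {}
        self._next = 1

    def job_create(self, job_ad: dict) -> str:
        """Job Create: takes a ClassAd (here a dict), returns a job handle."""
        handle = f"job-{self._next}"
        self._next += 1
        self._jobs[handle] = {"ad": dict(job_ad), "status": "Created"}
        return handle

    def job_start(self, handle: str) -> None:
        """Job Start: invoke the job identified by the handle."""
        self._jobs[handle]["status"] = "Running"

    def job_status(self, handle: str) -> dict:
        """Job Status: returns a status ClassAd (here a dict)."""
        return {"UnicoreJobStatus": self._jobs[handle]["status"]}

    def job_destroy(self, handle: str) -> None:
        """Job Destroy: drop whatever the server stored about the job."""
        del self._jobs[handle]

    def finish(self, handle: str) -> None:
        """Simulation hook only: pretend the remote job has completed."""
        self._jobs[handle]["status"] = "Complete"


gahp = ToyUnicoreGahp()
handle = gahp.job_create({"Cmd": "/home/foo/a.exe", "Args": "arg1, -a"})
gahp.job_start(handle)
first = gahp.job_status(handle)    # polled while the job runs
gahp.finish(handle)
second = gahp.job_status(handle)   # polled again after completion
gahp.job_destroy(handle)
```

Since both the input and the status are open-ended attribute maps, a new backend can add its own attributes without any change to these four method signatures.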

[Figure: message sequence between the GridManager and the Unicore GAHP: Job Create, Job Handle, Job Start, Job Status (Running), Job Status (Complete), Job Destroy]

26
ClassAd Attributes (1): generic
Attribute            Description                                   Example
Cmd                  pathname of the command to execute            /home/foo/a.exe
Args                 arguments for the executable                  arg1, -a
Env                  environment variables                         LANG=en_US
(name not given)     current directory                             home/nakada/condor
In                   standard input                                input.dat
Out                  standard output                               ouput.dat
Err                  error                                         Error.dat
TransferInput        stage-in files                                a.exe, input.dat
TransferOutput       stage-out files                               out.dat
JobStatus            Condor job status                             jobClassAdAttributes
ErrMessage           error messages                                Script reported no errors
RemoteWallClockTime  execution wall clock time                     123.0
ByteSent             bytes sent                                    1023004
ByteRecvd            bytes received                                1023004
ExitBySignal         was the job process terminated by a signal?   TRUE
ExitCode             exit code of the process                      1
ExitSignal           exit signal of the process                    9

27
ClassAd attributes (2): UNICORE specific
Attribute          Description                                         Example
UnicoreUsite       FQDN and port number for the Unicore Usite gateway  fujitsu.com:1234
UnicoreVsite       Vsite name                                          NaReGI
KeystoreFile       keystore file name                                  /home/foo/key
PassphraseFile     pass phrase file name                               /home/foo/passwd
UnicoreJobId       job ID used as the handle                           fujitsu.com:1234/NaReGI/1374036929
UnicoreJobStatus   UNICORE job status
UnicoreLog         UNICORE log filename                                /var/log/unicore.log
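
A job description built from these attributes can be sketched as a dict rendered into "Name = value" lines in the style of a ClassAd. The helper to_classad is hypothetical and the rendering is deliberately simplified (strings quoted, booleans as TRUE or FALSE, numbers bare); it is not Condor's own ClassAd library.

```python
def to_classad(attrs: dict) -> str:
    """Render a flat attribute dict as simplified ClassAd-style text."""
    lines = []
    for name, value in attrs.items():
        if isinstance(value, bool):          # check bool before int
            rendered = "TRUE" if value else "FALSE"
        elif isinstance(value, (int, float)):
            rendered = str(value)
        else:
            # Simplification: only embedded double quotes are escaped.
            rendered = '"' + str(value).replace('"', '\\"') + '"'
        lines.append(f"{name} = {rendered}")
    return "\n".join(lines)


job_ad = {
    "Cmd": "/home/foo/a.exe",
    "Args": "arg1, -a",
    "In": "input.dat",
    "UnicoreVsite": "NaReGI",      # system-specific attribute, same mechanism
    "ExitBySignal": True,
    "ExitCode": 1,
}
text = to_classad(job_ad)
```

Generic and UNICORE-specific attributes travel in the same structure, so supporting another system means adding attributes rather than adding commands.
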
28
Submit file sample for Condor-U
  • Universe = globus
  • SubUniverse = unicore
  • Executable = a.out
  • output = tmpOut
  • error = tmpErr
  • log = tmp.log
  • UnicoreUsite = fujitsu.com:1234
  • UnicoreVsite = NaReGI
  • KeystoreFile = /home/foo/key
  • PassphraseFile = /home/foo/passwd
  • Queue

[Callouts on the sample: Historical reason; Specifies GAHP Server; Specifies the site that will be used; To get certificate]

29
Implication of the Condor-U
  • Condor users can use resources managed by the
    UNICORE
  • From outside the sites
  • Users can use Condor as a job-scheduler for
    UNICORE managed resources
  • Condor GlideIn might be used on it
  • Communication between nodes has to be assured;
    this is not common in UNICORE setups

30
Current Status
  • UNICORE-C
  • Done
  • Will be available soon from our Web site
  • http://www.naregi.org/
  • Condor-U
  • Under implementation
  • Will be available by this summer, I hope.

31
Summary
  • Condor and UNICORE and Globus are bridged
    together!
  • Users can submit jobs from one system to another
  • The UNICORE GAHP server command set is Generic
    and Simple
  • Can be used to bridge to other systems

32
Thank you!