Title: Remote Deployment and Execution in Distributed Systems
1 Remote Deployment and Execution in Distributed Systems
2 Agenda
- Historical Development of Distributed Systems
- Basic Questions of Remote Deployment and Execution
- Example Application: Image Rendering with POV-Ray
- Remote Deployment and Execution with
  - a Simple Shell Script
  - the Distributed Resource Management System Condor
  - the Globus Grid Toolkit
3 Historical Development of Distributed Systems
- Problem: execution of computationally intensive jobs
- First solution: supercomputers
  - Expensive, much processing power, ownership by one organisation, restricted access
- Introduction of personal computers
  - Cheap, each user can own one, little processing power
- Introduction of Internet and Web technologies
  - World-wide access to any resource, global distribution possible
- Second solution: clustering of personal computers
  - Cheap, distributed, scalable
  - Type 1: distributed ownership, heterogeneous resources
  - Type 2: central ownership, homogeneous resources
- Third solution: Grid computing
  - Share resources (supercomputers, clusters) across organizational borders through standardized interfaces and protocols
4 Basic Questions of Remote Deployment and Execution
- How do we describe what we want to do?
  - Shell script, C program
  - Job description
- How do we transfer files to and from the execution machine?
  - Program files
  - Input and output data
  - Log and error data
- How do we execute and manage the jobs?
  - Uncontrolled (start process manually)
  - Controlled (use management software for cluster or grid systems)
5 Example Application: Image Rendering with POV-Ray
- POV-Ray, the Persistence of Vision Raytracer, is a ray tracing program that can render a 3D scene from a scene description file written in the Scene Description Language (SDL).
- Problem
  - Rendering a scene requires much processing power and might take hours or days
- Solution
  - POV-Ray can render only a part of a scene and store it in PPM format
  - POV-Ray can compose the scene out of the independently rendered scene parts
6 Example Application: Image Rendering with POV-Ray
    povray +FP +Irenderfile.pov +Opart1.ppm +W1024 +H768 +SR1 +ER96

- Image rendering with POV-Ray is a good example of the distributed execution of an application on a cluster or grid system!
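
The command above renders only rows 1 to 96 of a 1024x768 image (+SR/+ER select the start and end row, +FP writes PPM output). A minimal sketch of how the whole image could be split into eight such stripes; the loop and the part file names are illustrative, not part of the original slides:

    # Render a 1024x768 scene in eight stripes of 96 rows each.
    # Each invocation could run on a different execution node.
    for i in 1 2 3 4 5 6 7 8; do
        START=$(( ($i - 1) * 96 + 1 ))   # first row of this stripe
        END=$(( $i * 96 ))               # last row of this stripe
        povray +FP +Irenderfile.pov +Opart$i.ppm +W1024 +H768 +SR$START +ER$END
    done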
7 Distributing a Job with a Simple Shell Script
- Requirements
  - Knowledge about available machines (execution nodes)
  - User account on each execution node (NIS)
  - Private/public key pair on each execution node (ssh-keygen)
  - Executable shell scripts
- Deployment
  - Alternative 1: NFS - home directory is available on each execution node
  - Alternative 2: SMOUNT - mount a directory of the submission node on each execution node
  - Alternative 3: SFTP (SCP) - transmit program and input data to and output data from each execution node
- Execution
  - Start process on execution node via SSH (a setup sketch follows below)
(Figure: the submission node starts processes on four execution nodes via SSH; user accounts are shared via NIS, the home directory via NFS)
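
A minimal setup and execution sketch for the requirements above, assuming the execution node tb0.asg-platform.org used on the next slide and a home directory shared via NFS; the key type and the explicit SCP copy are illustrative assumptions:

    # Generate a private/public key pair on the submission node (no passphrase,
    # so jobs can be started unattended) and authorize it; with an NFS home
    # directory the same authorized_keys file is visible on every execution node.
    ssh-keygen -t rsa -N "" -f ~/.ssh/id_rsa
    cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys

    # Deployment alternative 3: copy program and input data explicitly via SCP.
    scp createimagepart.sh renderfile.pov tb0.asg-platform.org:

    # Execution: start the render process on the execution node via SSH.
    ssh tb0.asg-platform.org ./createimagepart.sh part1 +W1024 +H768 +SR1 +ER96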
8 Distributing the POV-Ray Job with a Simple Shell Script
- multipovray.sh script, executed on the submission node
  - starts the render processes on the execution nodes and waits for them to end
  - builds the image from the rendered parts using the buildimage.sh script
- createimagepart.sh script, executed on the execution nodes, renders an image part

Steps on the submission node (multipovray.sh):

1. Prepare image generation.
2. Start the process on the execution node (SSH):
    ssh tb0.asg-platform.org ./createimagepart.sh part1 +W1024 +H768 +SR1 +ER192
3. Wait for process execution end:
    while [ -f ~/tb0.asg-platform.org ]; do sleep 1; done
4. Build the image from the rendered parts (buildimage.sh):
    tail --bytes=+17 part1.ppm > part_t1.ppm
    echo "P6" > header
    echo "1024 768" >> header
    echo "255" >> header
    cat header part_t1.ppm > renderedimage.ppm

Step on the execution node (createimagepart.sh):

3. Render the image part:
    touch ~/`hostname`
    povray +FP +Irenderfile.pov +O$1.ppm $2 $3 $4 $5
    rm ~/`hostname`
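
The slide shows the calls for a single execution node only. A speculative sketch of how multipovray.sh might tie the steps together for several nodes; the node list, the backgrounded ssh calls and the row ranges are assumptions, not taken from the original deck:

    #!/bin/sh
    # Hypothetical node list; each node renders one 96-row stripe of the image.
    NODES="tb0.asg-platform.org tb1.asg-platform.org"

    I=1
    START=1
    for NODE in $NODES; do
        END=`expr $START + 95`
        # Step 2: start one render process per node via SSH, in the background.
        ssh $NODE ./createimagepart.sh part$I +W1024 +H768 +SR$START +ER$END &
        I=`expr $I + 1`
        START=`expr $END + 1`
    done

    # Step 3: wait until every node has removed its marker file from the
    # shared NFS home directory.
    for NODE in $NODES; do
        while [ -f ~/$NODE ]; do sleep 1; done
    done

    # Step 4: build the final image from the rendered parts.
    ./buildimage.sh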
9 Problems of the Simple Shell Script Solution
- General Problems
  - User requires an account on each execution node
  - User needs to know all available execution nodes
  - Job code and job management code are mixed up
  - Script can hardly be reused
- Execution-Specific Problems
  - Job execution is not reliable
  - Capabilities of the execution node are not considered
    - Number of processors, processor speed, operating system, shell, installed software, etc.
  - Process priority is not considered
  - Utilization of the execution node is not considered
- Advanced Problems
  - Consumption of resources cannot be monitored, metered, accounted and billed
10 Lessons learned from the Simple Shell Script Solution
- We require better resource management that tells us which resources are available, what capabilities they have, and how they are utilized!
- We require better job management that can match the resources required by a job with the available resources, execute jobs reliably, define the order of job execution, and charge a user for consumed resources!
- We require a Distributed Resource Management (DRM) system that provides the desired functionality!
11 Architecture of a DRM
12 Scheduling Strategies of a DRM
- Basic Scheduling Algorithm
  - First-Come-First-Serve (FCFS)
  - Queue with priority order
- Backfilling
  - Allows small jobs to move ahead
  - Problem: starvation
- Advanced Reservation
  - Book resources in advance to run a job in the future
  - Problem: gaps
- Gang Scheduling
  - Schedules related threads or processes to run simultaneously on different processors
  - Allows the threads to communicate with each other at the same time
  - Jobs are preempted and re-scheduled as a unit
(Figure: jobs are submitted from the submission node to the head node with the batch scheduler; the job queue holds jobs A to D with the first job at the front; the jobs are executed on the execution nodes, one of which is fully utilized and one of which is booked from 7:00 to 8:00 PM)
13 What is Condor?
- System for Distributed Resource Management (DRM)
  - Manages resources (machines) and resource requests (jobs)
- System for High Throughput Computing (HTC)
  - Manages and exploits unused computing resources efficiently
  - Maximizes the amount of resources accessible to its users
  - Resources are not dedicated and not always available, unlike in other DRM systems
  - Ownership of resources is distributed among different users
14 Architecture of Condor
15 Key Features of Condor
- Distributed Infrastructure
  - Available resources are always known
  - Job execution can be monitored and is reliable
- Declarative Job Description
  - Job code and job management code are separated
- Resource Matchmaking via the Classified Advertisement (ClassAd) Mechanism
  - Resources advertise their capabilities
  - Jobs describe the required and desired resources
- Universe Mechanism
  - Different run-time environments for program execution can be selected (Standard, Vanilla, MPI, etc.)
16 Key Features of Condor
- Checkpointing
  - Job execution is checkpointed and jobs can be migrated to another resource
- File Transfer Mechanism
  - Program code and data can automatically be transferred to the execution node
- Priority Scheduling Algorithm
  - The priority queue is sorted by user priority, job priority and submission time
  - Starvation is prevented by giving each user the same amount of machine allocation time over a specified interval
  - Scheduling behaviour can be changed through the ClassAd mechanism
- DAGMan Meta-Scheduler
  - Job dependencies can be described as Directed Acyclic Graphs (DAGs)
  - A DAG can be used to describe sequential and parallel executions
17 Distributing a Job with Condor
- Requirements
  - User account on each execution node (NIS)
  - Machine configured as submission node
  - Valid job description
- Deployment
  - Alternative 1: NFS - home directory is available on each execution node
  - Alternative 2: Condor's file transfer mechanism
- Execution
  - Execute condor_submit or condor_submit_dag to add the job to the local queue (a command sketch follows below)
(Figure: the execution nodes send resource ClassAds to the central node, the central node queries the resource request ClassAds of the submission node, and the remote processes are started on the matched execution nodes; user accounts are shared via NIS, the home directory via NFS)
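
A minimal command sketch for the execution step; the file names echo the job files on the next slide, and the .sub/.dag extensions are assumptions:

    # Submit a single job description to the local queue.
    condor_submit createimagepart.sub

    # Or submit the whole rendering workflow through the DAGMan meta-scheduler.
    condor_submit_dag multipovray.dag

    # Inspect the local queue and the state of the pool.
    condor_q
    condor_status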
18 Distributing the POV-Ray Job with Condor
- createimagepart job file contains the job description for image part rendering
- buildimage job file contains the job description for image generation
- multipovray job file contains the workflow for image generation

Workflow steps: 1. submit the job, 2. start the first job in the workflow, 3. start image part rendering, 4. start the next job in the workflow, 5. start image generation.

createimagepart job file:
    Executable   = ./povray-3.6/povray
    Universe     = vanilla
    Requirements = (Arch == "INTEL") && (OpSys == "LINUX")
    Arguments    = +FP +Irenderfile.pov +Opart1.ppm +L./povray-3.6/include/ +W1024 +H768 +SR1 +ER96
    Queue

buildimage job file:
    Executable   = ./buildimage.sh
    Universe     = vanilla
    Requirements = (Arch == "INTEL") && (OpSys == "LINUX")
    Queue

multipovray job file (DAGMan workflow):
    Job A createimageparts
    Job B buildimage
    PARENT A CHILD B

(Figure: the job is submitted from the submission node to the central node; image part rendering and image generation run on the execution nodes; user accounts are shared via NIS, the home directory via NFS)
19 Problems with the Condor Solution
- User requires an account on each execution node
- No central submission node
  - Jobs cannot be executed if the submission node is down
- No automated distribution and deployment of software
  - Software required for job execution is not deployed automatically
- Interoperability with other DRMs and Grid solutions
  - Standardized protocols and interfaces are needed to access the resources and schedulers of other clusters and supercomputers, and also to provide such access (Condor-G, Glide-In, Flocking)
20 Lessons learned from the Condor Solution
- Condor is an excellent solution for distributing a computationally intensive job to a pool of available resources. But it would be nice to also be able to access the schedulers and resources of other DRMs and to provide access to Condor-managed schedulers and resources to other DRMs.
- To make a long story short, it would be nice to have standardized protocols and interfaces that allow sharing (computing) resources across organizational borders. This is one goal that grid computing tries to achieve.
21 What is the Globus Toolkit?
- A fundamental enabling technology for the Grid
  - Allows people to share computing power, databases, and other tools
  - Resources can be shared across corporate, institutional, and geographic boundaries
  - Enforces local autonomy
- A software toolkit for developing grid applications
  - Provides software services and libraries for resource management (WS-GRAM), data management (RFT, GridFTP), information services (WS MDS: Index and Trigger services), and security
  - Services, interfaces and protocols are based on the WS-Resource Framework (WSRF) and the Open Grid Services Architecture (OGSA) standards
  - Goal: achieve interoperability in distributed, dynamic and heterogeneous environments
22 Globus Toolkit Architecture
23 Distributing a Job with the Globus Toolkit
- Requirements
  - Valid Globus security credentials
  - User account on each execution host
  - Mapping from the Globus credentials to the local user identity
  - Machine configured as submission node
  - Valid job description
- Deployment
  - GridFTP server on the submission node and Reliable File Transfer (RFT) service in the Globus grid container
  - RFT service and execution nodes use a shared file system
- Execution
  - Create security credentials via grid-proxy-init
  - Submit the job via globusrun-ws to the specified WS-GRAM service (a command sketch follows below)
(Figure: the submission node runs a GridFTP server for file upload and download and submits the WS-GRAM job description to the Globus node; the Globus node submits the job via adapters to the LSF or PBS head node, which runs it on its execution nodes; user credentials are required, user accounts are shared via NIS, the home directory via NFS)
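
A minimal command sketch for the execution step, assuming the job description file createimageparts.xml introduced on the next slide; the exact flags can vary between Globus Toolkit 4 releases:

    # Create a short-lived proxy certificate from the user's Globus credentials.
    grid-proxy-init

    # Submit the WS-GRAM job description; the factory endpoint inside the
    # XML file selects the ManagedJobFactoryService on tb1 and the LSF scheduler.
    globusrun-ws -submit -f createimageparts.xml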
24 Distributing the POV-Ray Job with the Globus Toolkit
- createimageparts.xml contains the WS-GRAM job description
- First part: definition of the WS-GRAM WSRF factory endpoint

Steps: 1. submit the job using the job description, 2. the factory endpoint points to the WS-GRAM service on tb1, 3. the ResourceID says: use the scheduler of the LSF cluster.

Job description skeleton:
    <?xml version="1.0" encoding="UTF-8"?>
    <job xmlns:gram="http://www.globus.org/namespaces/2004/10/gram/job"
         xmlns:wsa="http://schemas.xmlsoap.org/ws/2004/03/addressing">
      ...
    </job>

Factory endpoint definition:
    <factoryEndpoint>
      <wsa:Address>https://tb1.asg-platform.org:8443/wsrf/services/ManagedJobFactoryService</wsa:Address>
      <wsa:ReferenceProperties>
        <gram:ResourceID>LSF</gram:ResourceID>
      </wsa:ReferenceProperties>
    </factoryEndpoint>

(Figure: the testbed contains Globus Node 1 (tb1), Globus Node 2 (tb2), an LSF head node and a PBS head node; the job description is submitted to tb1)
25 Distributing the POV-Ray Job with the Globus Toolkit
- Second part: definition of the program to be executed

Steps: 1. execute povray on the execution node (tb3), 2. use renderfile.pov as input and store the result in renderedImage.ppm.

Program definition (inside the <job> element of the skeleton above):
    <directory>${GLOBUS_USER_HOME}/povray/globus/exec1</directory>
    <executable>./povray-3.6/povray</executable>
    <argument>+Irenderfile.pov +L./povray-3.6/include/ +OrenderedImage.ppm</argument>
    <argument>+FP +W1024 +H768 +SR1 +ER768</argument>
    <stderr>multipovray.err</stderr>
    <stdin>/dev/null</stdin>
    <stdout>multipovray.out</stdout>
    <count>1</count>

(Figure: the job passes from the Globus node (tb1) via the LSF head node (tb1) to the execution node (tb3))
26 Distributing the POV-Ray Job with the Globus Toolkit
- Third part: file staging statements

Steps: 1. download renderfile.pov and the povray-3.6 folder, 2. upload renderedImage.ppm. The transfers run via GridFTP between the submission node and the Globus node (tb1).

File transfer definition (inside the <job> element of the skeleton above):
    <fileStageIn>
      <transfer>
        <sourceUrl>gsiftp://tb1.asg-platform.org:2811/povray/globus/renderfile.pov</sourceUrl>
        <destinationUrl>file:///${GLOBUS_USER_HOME}/povray/globus/exec1/renderfile.pov</destinationUrl>
      </transfer>
    </fileStageIn>
    <fileStageIn>
      <transfer>
        <sourceUrl>gsiftp://tb1.asg-platform.org:2811/povray/globus/povray-3.6</sourceUrl>
        <destinationUrl>file:///${GLOBUS_USER_HOME}/povray/globus/exec1/povray-3.6</destinationUrl>
      </transfer>
    </fileStageIn>
    <fileStageOut>
      <transfer>
        <sourceUrl>file:///${GLOBUS_USER_HOME}/povray/globus/exec1/renderedImage.ppm</sourceUrl>
        <destinationUrl>gsiftp://tb1.asg-platform.org:2811/povray/globus/output</destinationUrl>
      </transfer>
    </fileStageOut>
27 Problems with the Globus Solution
- User account on each execution node required, plus valid security credentials, plus a mapping from the credentials to the local user identity
- Additional infrastructure and adapters required
- Additional overhead through Web services and data exchange via XML
- Not everything (services, interfaces, protocols) is standardized yet
  - Matchmaking
  - Automated deployment of software
  - Workflow
  - etc.
28 Lessons learned from the Globus Grid Solution
- The Globus Grid Toolkit is not the Holy Grail for solving interoperability problems in distributed, dynamic and heterogeneous environments.
- The advantage of standardized interfaces, services, protocols and job descriptions comes at the cost of possibly fewer accessible features and more administrative overhead for homogenizing the heterogeneous infrastructure.
29 Questions?