
Title: Overview


1
Overview
  • Grid Computing: What Is It?
  • Condor: What Is It?
  • The Condor Project at Boise State University
    College of Engineering
  • Guidelines for Implementing a Grid Project

2
Acknowledgements
  • Brook Gore, Senior Fellow, Micron Technology
  • Elisa Barney Smith, Associate Professor, ECE,
    College of Engineering, Boise State University
  • Lynn Russell, Former Dean of the College of
    Engineering, Boise State University

3
What is Grid Computing?
  • Grid Computing is both batch processing and
    service delivery through distributed networked
    computers.
  • It can use dedicated resources or make opportunistic
    use of otherwise idle computer cycles (scavenging).
  • Grid computing can be thought of as distributed,
    large-scale cluster computing.
  • Sometimes it is characterized as a form of
    networked, distributed parallel computing.

4
Why Should We Be Interested in Grid Computing?
  • John Patrick, IBM's vice-president for Internet
    strategies, says, "The next big thing will be grid
    computing."
  • Grid computing is very cost-effective when it
    harvests otherwise unused computer cycles.
  • Grid computing opens up solutions to problems
    that can't be approached without an enormous
    amount of computing power.

5
What is Condor?
  • Condor converts a collection of distributed
    computers into a high-throughput computing
    facility.
  • Condor provides:
    • a queuing mechanism
    • a scheduling policy
    • a priority scheme
    • resource classification
    • client services
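The slide only names these services. As a toy illustration of how a queuing mechanism and a priority scheme fit together (a sketch, not Condor's actual scheduler), the hypothetical JobQueue class below hands out the highest-priority job first and falls back to submission order among equal priorities; both rules are assumptions made for the example.

```python
import heapq
import itertools


class JobQueue:
    """Toy job queue (hypothetical): highest priority first, FIFO among equal priorities."""

    def __init__(self):
        self._heap = []
        self._counter = itertools.count()  # submission order breaks ties

    def submit(self, name, priority=0):
        # heapq is a min-heap, so the priority is negated to pop the largest first.
        heapq.heappush(self._heap, (-priority, next(self._counter), name))

    def next_job(self):
        # Return the next job to schedule, or None if the queue is empty.
        if not self._heap:
            return None
        _, _, name = heapq.heappop(self._heap)
        return name

    def __len__(self):
        return len(self._heap)


if __name__ == "__main__":
    q = JobQueue()
    q.submit("printname.bat")
    q.submit("gaussa_X.bat", priority=5)
    q.submit("other.bat")  # hypothetical job name
    print([q.next_job() for _ in range(len(q))])
    # ['gaussa_X.bat', 'printname.bat', 'other.bat']
```

A heap keeps each submit and each pop at logarithmic cost, which is why it is a common choice for this kind of toy scheduler.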

6
What is Condor? (Continued)
  • Condor is supported on many platforms:
    • Mac OS X
    • AIX
    • Sun
    • HP-UX
    • IRIX
    • Linux
    • Windows NT, 2000, XP

7
Condor Components
  • The Central Manager:
    1. Oversees all resources of the pool.
    2. Schedules jobs.
    3. Manages the job queue.
  • The pool machines (clients) can be configured in
    one of three ways:
    • to only run jobs,
    • to only submit jobs, or
    • to both run and submit jobs.

8
A Typical Condor Pool
  • Central Manager: monitors the status of execute
    hosts and assigns jobs to them, matching jobs from
    submit hosts to appropriate execute hosts.
  • Submit hosts
  • Execute hosts
  • Some machines are both submit and execute hosts.
  • Checkpoint server: stores the checkpoint files
    from jobs that checkpoint.
9
Diagram: the Central Manager, Submit Host, and Execute Host
  • Central Manager (Condor daemons, normally listening
    on ports 9614 and 9618)
    • The Execute Host tells the Central Manager about
      itself; the Central Manager tells it when to accept
      a job from the Submit Host.
    • The Submit Host tells the Central Manager about a
      job; the Central Manager tells it which Execute
      Host to send the job to.
    • ClassAds are passed in by the execute host and
      asked for by the submit host.
  • Submit Host (Condor daemons and a condor_shadow
    process)
    • Sends the job to the Execute Host; results are
      sent back to the Submit Host.
  • Execute Host (Condor daemons, the user's job, and
    the user's executable code)
    • The Condor daemons spawn the job and signal it
      when to abort, suspend, or checkpoint.
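The exchange in this diagram can be mimicked with a small in-process simulation. This is only a sketch: the CentralManager and ExecuteHost classes, the single OpSys comparison used for matching, and the returned result string are hypothetical stand-ins for what the real Condor daemons do over the network with full ClassAds.

```python
from dataclasses import dataclass, field


@dataclass
class ExecuteHost:
    name: str
    opsys: str

    def run(self, job):
        # Stand-in for spawning the user's executable; returns its "results".
        return f"{job['cmd']} ran on {self.name}"


@dataclass
class CentralManager:
    machines: dict = field(default_factory=dict)
    jobs: list = field(default_factory=list)

    def advertise_machine(self, host):
        # Step 1: an execute host tells the Central Manager about itself.
        self.machines[host.name] = host

    def advertise_job(self, job):
        # Step 2: a submit host tells the Central Manager about a job.
        self.jobs.append(job)

    def negotiate(self):
        # Step 3: pair each queued job with the first free machine that suits it.
        # A matched machine is taken out of the free set for this pass.
        matches = []
        free = dict(self.machines)
        for job in list(self.jobs):
            host = next((h for h in free.values() if h.opsys == job["opsys"]), None)
            if host is not None:
                matches.append((job, host))
                self.jobs.remove(job)
                del free[host.name]
        return matches


if __name__ == "__main__":
    cm = CentralManager()
    cm.advertise_machine(ExecuteHost("BSU200108", "WINNT51"))
    cm.advertise_machine(ExecuteHost("coengrid", "LINUX"))
    cm.advertise_job({"cmd": "printname.bat", "opsys": "WINNT51"})
    for job, host in cm.negotiate():
        # Step 4: the submit host sends the job; the execute host returns results.
        print(host.run(job))  # printname.bat ran on BSU200108
```

Jobs that find no free machine simply stay in cm.jobs, which corresponds to a job remaining idle in the queue.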
10
The ClassAd Mechanism
  • Each execute host in the pool reports its
    ClassAd to the Central Manager.
  • ClassAds are configurable to a certain degree.
  • Each submitted job has a list of ClassAd
    requirements.
  • For a job to run, its requirements must match
    a machine ClassAd in the Central Manager's
    list.

11
What is ClassAd Matchmaking?
  • Condor uses ClassAd matchmaking to make sure that
    work gets done within the constraints of both the
    users and the owners of the workstations.
  • Users (jobs) have constraints:
    • "I need an Opteron with 512 MB of RAM."
  • Owners (machines) have constraints:
    • "Only run jobs when I am away from my desk, and
      never run jobs owned by Bob."
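Both sides of a match get a veto. A minimal sketch of that two-sided check, assuming each ad is a Python dict whose Requirements entry is a function of (this ad, the other ad); the Arch value "X86_64" for an Opteron, the 15-minute idle threshold, and the owner name "alice" are illustrative assumptions, and real ClassAds use their own expression language rather than Python functions.

```python
def matches(job_ad, machine_ad):
    """Two-sided ClassAd-style match: each side's Requirements must accept the other."""
    return (job_ad["Requirements"](job_ad, machine_ad)
            and machine_ad["Requirements"](machine_ad, job_ad))


def make_job(owner):
    return {
        "Owner": owner,
        # User constraint: "I need an Opteron with 512 MB of RAM" (Arch value assumed).
        "Requirements": lambda my, other: other["Arch"] == "X86_64" and other["Memory"] >= 512,
    }


# Owner constraint: run only when the keyboard has been idle for 15+ minutes
# (assumed threshold), and never run jobs owned by Bob.
workstation = {
    "Arch": "X86_64",
    "Memory": 1024,
    "KeyboardIdle": 1800,  # seconds since last keyboard activity
    "Requirements": lambda my, other: my["KeyboardIdle"] >= 15 * 60 and other["Owner"] != "bob",
}

print(matches(make_job("alice"), workstation))  # True
print(matches(make_job("bob"), workstation))    # False
```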

12
ClassAds
    MyType = "Machine"
    TargetType = "Job"
    Name = "MEC408-15"
    Machine = "MEC408-15"
    Rank = 0.000000
    CpuBusy = ((LoadAvg - CondorLoadAvg) > 0.500000)
    VirtualMachineID = 1
    VirtualMemory = 754084
    Disk = 65736860
    CondorLoadAvg = 0.000000
    LoadAvg = 0.020000
    KeyboardIdle = 0
    ConsoleIdle = 0
    Memory = 384
    Cpus = 1
    StartdIpAddr = "<132.178.151.18:1029>"
    Arch = "INTEL"
    OpSys = "WINNT51"
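CpuBusy is an expression evaluated against the ad's own attributes, not a stored value. Re-evaluating it by hand with the MEC408-15 numbers (the function and its high_load parameter are just a Python restatement of the expression, written for illustration) shows why this machine does not count as busy:

```python
# Values copied from the MEC408-15 machine ClassAd shown above.
mec408_15 = {"LoadAvg": 0.020000, "CondorLoadAvg": 0.000000}


def cpu_busy(ad, high_load=0.5):
    # Load not caused by Condor jobs = total load minus Condor's share.
    non_condor_load = ad["LoadAvg"] - ad["CondorLoadAvg"]
    return non_condor_load > high_load


print(cpu_busy(mec408_15))                               # False: 0.02 is not above 0.5
print(cpu_busy({"LoadAvg": 1.2, "CondorLoadAvg": 0.1}))  # True: about 1.1 is above 0.5
```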

13
The Condor Job Format
  • A minimal Condor job is made up of:
    1) a submit file, and
    2) an executable file or a batch file.
  • Additional data files can be part of a more
    complicated job stream.

14
A Simple Condor Submit File, printname.sub
    universe = vanilla
    requirements = (OpSys == "WINNT51")
    executable = printname.bat
    output = printname.out
    error = printname.err
    log = printname.log
    queue

15
A Simple Condor Job, printname.bat
    echo Howdy!
    echo Output from "net name"
    net name
    echo That's all folk

16
Output from the Simple Job
    Howdy!
    Output from "net name"
    Name
    -----------------------------------
    BSU200108
    The command completed successfully.
    That's all folk

17
What Can Happen During a Job
  • A job can fail to be matched through ClassAds to
    an execute host and remain idle in the queue.
  • The job can run to completion, and its output is
    returned to the submit host.
  • If there is keyboard or mouse activity on the
    execute host, one of three things will happen to
    the job, depending on how the machine is configured:
    1) it is terminated and requeued,
    2) it is suspended and held in memory, or
    3) it continues running in the background.
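The three outcomes can be summarized as a small state transition. In this sketch the Policy names and the state strings are hypothetical labels chosen for illustration; in a real pool the behavior is selected by the execute host's configuration, not by a function like this.

```python
from enum import Enum


class Policy(Enum):
    REQUEUE = "terminate and requeue"
    SUSPEND = "suspend and hold in memory"
    CONTINUE = "continue in the background"


def on_owner_activity(job_state, policy):
    """Next state of a job when keyboard/mouse activity appears on its execute host."""
    if job_state != "Running":
        return job_state  # only a running job is affected
    if policy is Policy.REQUEUE:
        return "Idle"  # back in the queue, waiting for a new match
    if policy is Policy.SUSPEND:
        return "Suspended"  # process kept in memory, can resume later
    return "Running"  # keeps running in the background


for p in Policy:
    print(p.value, "->", on_owner_activity("Running", p))
```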

18
Basic Condor Commands
  • All commands are run from the command prompt on
    Windows, Unix, Linux, etc.
  • condor_status reports on the status of the pool.
  • condor_submit <file.sub> submits a job to the
    pool.
  • condor_q reports on queued or running jobs.
  • condor_rm <job number> removes a queued or
    running job.
  • condor_q -analyze checks a queued job's ClassAd
    requirements against the machines in the pool and
    reports why it does or does not match.

19
condor_status
    [amcdonal@coengrid amcdonal]$ condor_status
    Name         OpSys    Arch   State      Activity  LoadAv  Mem
    coengrid     LINUX    INTEL  Owner      Idle      0.380   501
    BSU101190    WINNT50  INTEL  Unclaimed  Idle      0.000   384
    MEC202P-03   WINNT50  INTEL  Unclaimed  Idle      2.180   384
    MEC202P-05   WINNT50  INTEL  Unclaimed  Idle      2.030   384
    raidman      WINNT50  INTEL  Unclaimed  Idle      0.000   312
    BSU101194    WINNT51  INTEL  Unclaimed  Idle      0.010   384
    BSU104889    WINNT51  INTEL  Unclaimed  Idle      0.010   1024
    BSU200108    WINNT51  INTEL  Unclaimed  Idle      0.000   256
    ET238-BSU    WINNT51  INTEL  Unclaimed  Idle      0.010   256
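Because each row of this listing is whitespace-separated, it is easy to post-process. A minimal sketch, assuming the column layout shown above (no spaces inside values) and ignoring the summary totals that condor_status normally prints after the machine list; the SAMPLE text copies three rows from the listing, and parse_status is a hypothetical helper.

```python
from collections import Counter

SAMPLE = """\
Name         OpSys    Arch   State      Activity  LoadAv  Mem
coengrid     LINUX    INTEL  Owner      Idle      0.380   501
BSU101190    WINNT50  INTEL  Unclaimed  Idle      0.000   384
BSU104889    WINNT51  INTEL  Unclaimed  Idle      0.010   1024
"""


def parse_status(text):
    """Turn whitespace-separated condor_status rows into dicts keyed by the header."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    header = lines[0].split()
    return [dict(zip(header, ln.split())) for ln in lines[1:]]


rows = parse_status(SAMPLE)
print(Counter(r["State"] for r in rows))  # Counter({'Unclaimed': 2, 'Owner': 1})

# WINNT51 is how Condor reports Windows XP machines.
unclaimed_xp = [r["Name"] for r in rows
                if r["State"] == "Unclaimed" and r["OpSys"] == "WINNT51"]
print(unclaimed_xp)  # ['BSU104889']
```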

20
condor_submit
    [amcdonal@coen]$ condor_submit printname.sub
    Submitting job(s)...
    Logging submit event(s)...
    1 job(s) submitted to cluster 194.0

21
condor_q command
    [amcdonal@coen]$ condor_q
    -- Submitter: coengrid.boisestate.edu : <132.178.144.76:32773> : coengrid.boisestate.edu
    ID      OWNER      SUBMITTED      RUN_TIME  ST PRI SIZE CMD
    96.0    jjensen    3/23 13:48  17+23:28:48  R  0   13.8 gaussa_X.bat 0
    96.1    jjensen    3/23 13:48  17+08:41:46  R  0   13.8 gaussa_X.bat 1
    96.2    jjensen    3/23 13:48  17+03:49:33  R  0   13.8 gaussa_X.bat 2
    96.3    jjensen    3/23 13:48  17+04:40:34  R  0   13.8 gaussa_X.bat 3
    194.0   amcdonal   4/11 19:08   0+00:00:01  I  0   0.0  printname.bat 60
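The RUN_TIME column is written as days+hours:minutes:seconds, so job 96.0 had accumulated a little under 18 days of run time since its 3/23 submission. A small helper (hypothetical, not part of Condor) that converts that format to seconds:

```python
def run_time_seconds(run_time):
    """Convert a condor_q RUN_TIME string of the form D+HH:MM:SS to seconds."""
    days, clock = run_time.split("+")
    hours, minutes, seconds = (int(part) for part in clock.split(":"))
    return int(days) * 86400 + hours * 3600 + minutes * 60 + seconds


print(run_time_seconds("17+23:28:48"))  # 1553328
print(run_time_seconds("0+00:00:01"))   # 1
```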

22
condor_q -analyze
    [amcdonal@coen]$ condor_q -analyze 194.0
    -- Submitter: coengrid.boisestate.edu : <132.178.144.76:32773> : coengrid.boisestate.edu
    ID      OWNER      SUBMITTED      RUN_TIME  ST PRI SIZE CMD
    194.000:  Run analysis summary.  Of 87 machines,
         86 are rejected by your job's requirements
          1 reject your job because of their own requirements
          0 match, but are serving users with a better priority in the pool
          0 match, but prefer another specific job despite its worse user-priority
          0 match, but will not currently preempt their existing job
          0 are available to run your job
        No successful match recorded.
        Last failed match: Mon Apr 11 19:08:04 2005
        Reason for last match failure: no match found
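The summary sorts every machine in the pool into exactly one bucket, which is why the counts add up to 87. A simplified sketch of that bookkeeping, assuming only the two rejection buckets and the "available" bucket matter (the priority and preemption buckets are left out), with hypothetical requirements functions standing in for ClassAd expressions and an assumed owner policy on BSU200108:

```python
def analyze(job, machines):
    """Tally machines roughly the way the condor_q -analyze summary groups them."""
    summary = {"rejected_by_job": 0, "reject_job": 0, "available": 0}
    for m in machines:
        if not job["requirements"](m):
            summary["rejected_by_job"] += 1  # the job's requirements rule this machine out
        elif not m["requirements"](job):
            summary["reject_job"] += 1  # the machine's own policy refuses the job
        else:
            summary["available"] += 1
    return summary


job = {"requirements": lambda m: m["OpSys"] == "WINNT51"}
machines = [
    {"Name": "coengrid", "OpSys": "LINUX", "requirements": lambda j: True},
    {"Name": "BSU200108", "OpSys": "WINNT51", "requirements": lambda j: False},  # assumed policy
    {"Name": "ET238-BSU", "OpSys": "WINNT51", "requirements": lambda j: True},
]
print(analyze(job, machines))
# {'rejected_by_job': 1, 'reject_job': 1, 'available': 1}
```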

23
condor_rm
    [amcdonal@coen]$ condor_rm 194.0
    Cluster 194 has been marked for removal.

24
The multiple job facilities of Condor
  • The previous job was a single batch job.
  • The power of Condor is in its multiple-job
    facilities.
  • These can be simultaneous submissions of similar
    jobs.
  • They can also be parallel processing of a single
    job that has been decomposed into smaller parts.
  • Let's look at parallel processing of a single job.

25
A Parallel Job
  • Dr. Elisa Barney Smith in the Department of ECE at
    Boise State University has a large, easily
    decomposable problem that must be run on 20 to 30
    cases.
  • This is ideal for Condor.
  • Before the Condor Grid was developed, she
    manually distributed her work across all the
    available machines she could find.
  • With the Condor Grid, she can submit one
    job that breaks each case into 178 parts and
    runs them separately.
  • At the end, the results are assembled
    programmatically.

26
Submit File Constructs
  • Several submit file constructs are important for
    parallel processing.
  • The queue command allows multiple jobs to be run:
    queue N queues N copies of the job.
  • The $(Process) macro takes a different value for
    each job queued: 0, 1, 2, and so on.
  • The arguments command passes job-specific values
    (such as $(Process)) to each queued job.
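Taken together, queue N and $(Process) behave like a loop over process numbers 0 through N-1. A sketch of that expansion in Python (expand_queue is a hypothetical helper; condor_submit performs this expansion itself and supports many more macros than $(Process)):

```python
def expand_queue(submit, count):
    """Expand a submit description with 'queue <count>' into one dict per job.

    Every occurrence of $(Process) is replaced by that job's process number,
    which runs 0, 1, ..., count - 1 within the cluster.
    """
    jobs = []
    for process in range(count):
        jobs.append({key: value.replace("$(Process)", str(process))
                     for key, value in submit.items()})
    return jobs


submit = {
    "universe": "vanilla",
    "executable": "sqr_gaussa_W.bat",
    "arguments": "$(Process)",
}
jobs = expand_queue(submit, 178)
print(len(jobs), jobs[0]["arguments"], jobs[-1]["arguments"])  # 178 0 177
```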

27
The Parallel Job Condor Submit File
    universe = vanilla
    executable = sqr_gaussa_W.bat
    arguments = $(Process)
    transfer_input_files = sqr_gaussa_W.m, sqr_gaussa_W.txt
    requirements = (OpSys == "Linux")
    queue 178

28
The sqr_gaussa_W.bat file
    #!/bin/csh -v
    setenv PATH /usr/coen/matlab/bin:$PATH
    setenv LM_LICENSE_FILE /usr/coen/matlab/etc/license.dat
    setenv PROCESS_NUM "$1"
    matlab -nosplash -r sqr_gaussa_W -logfile $1.log

29
The Parallel Job Matlab Code
    a = getenv('PROCESS_NUM')
    process = str2num(a)
    offset = 256*process
    [char, psf_type, psf] = textread(datafile, offset)
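Each job therefore works on a fixed-size block starting at 256 times its process number. A Python restatement of that arithmetic, assuming 256 counts records in the data file (the slide does not say what the unit is) and using hypothetical helper names; PROCESS_NUM is the environment variable exported by the wrapper script, whose value comes from the job's argument.

```python
import os

RECORDS_PER_PART = 256  # from "offset = 256*process"; unit assumed to be records


def part_offset(env=None):
    """Read the job's process number from PROCESS_NUM and compute its starting offset."""
    env = os.environ if env is None else env
    process = int(env["PROCESS_NUM"])
    return RECORDS_PER_PART * process


def part_slice(records, process):
    """Return the block of records that this process number should work on."""
    start = RECORDS_PER_PART * process
    return records[start:start + RECORDS_PER_PART]


print(part_offset({"PROCESS_NUM": "177"}))  # 45312
records = list(range(178 * 256))
print(part_slice(records, 1)[0], len(part_slice(records, 1)))  # 256 256
```

With 178 jobs this covers 178 x 256 = 45,568 records with no overlap between jobs.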

30
Resources for Starting A Grid Project
  • http://www.cs.wisc.edu/condor
    1) Condor download site.
    2) Tools for managing and using Condor.
    3) Technical support forum.
    4) Documentation.
  • http://www-128.ibm.com/developerworks/library/gr-design.html
    1) Guidelines for evaluating the suitability of
       a project for a grid.
    2) 32 design elements are considered.
  • http://www-128.ibm.com/developerworks/library/gr-enable.html
    1) Six strategies for grid application
       enablement.
    2) Maps the progressive steps of grid application
       development.