1
The CrossBroker Resource Management for the Grid
  • Enol Fernández
  • Computer Architecture & Operating Systems
    Department
  • Universitat Autònoma de Barcelona

European Condor Week Barcelona 2008
2
Outline
  • Introduction
  • The CrossBroker GRMS
  • Interactive Job Support
  • Parallel Job Support
  • Conclusions

3
Batch Execution on Grids
[Diagram: two remote sites reached over the Internet]
4
Parallel Interactive Job Execution
  • Use of resources from different sites
  • Resource-sets search
  • Co-allocation synchronization
  • Fast start-up
  • Execution in high-occupancy situations

[Diagram: an MPI job spanning two remote sites over the Internet]
5
CrossBroker
  • Automatic job management on Grid Environments
  • Search and selection of available resources, job
    conditioning, job launching, job monitoring, job
    retry (in case of failures) and results
    retrieval.
  • Sequential and parallel applications.
  • Support for interactive and batch execution modes
  • Best effort approach to deal with
    failures/problems

6
CrossBroker - Architecture
[Diagram: CrossBroker architecture. Labelled components: User Interface, Information Index, Replica Manager, CrossBroker (Scheduling Agent, Resource Searcher, Application Launcher), DAGMan, Condor-G, EGEE/Globus, LRMS, Computing Element, WN]
7
CrossBroker - Architecture
  • Scheduling Agent
      • Receives each job and keeps it in a persistent
        queue
      • Contacts the Resource Searcher and gets a list of
        available resources
      • Selects resources and passes them to the
        Application Launcher
  • Resource Searcher
      • Given a job description, performs the matchmaking
        between job needs and available resources
      • Uses the Condor ClassAd library, originally
        designed for matching a single job with a
        single resource
      • Set matching has been developed to support
        matching a single job to a group of resources
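
As a rough illustration of the single-job, single-resource matchmaking described above, the following Python sketch filters resources with a requirements predicate and orders the survivors by a rank function. It does not use the Condor ClassAd library. The dictionary keys mirror the Glue attribute names used in the JDL examples of this talk, while the function name `matchmake`, the host names and the benchmark values are hypothetical.

```python
from typing import Callable, Dict, List

Resource = Dict[str, object]


def matchmake(
    resources: List[Resource],
    requirements: Callable[[Resource], bool],
    rank: Callable[[Resource], float],
) -> List[Resource]:
    """Keep resources that satisfy the requirements, best rank first."""
    matching = [r for r in resources if requirements(r)]
    return sorted(matching, key=rank, reverse=True)


# Hypothetical resource descriptions (not taken from the talk).
resources = [
    {"name": "ce-a.example.org", "GlueCEStateStatus": "Production",
     "GlueHostBenchmarkSI00": 1200},
    {"name": "ce-b.example.org", "GlueCEStateStatus": "Draining",
     "GlueHostBenchmarkSI00": 2000},
    {"name": "ce-c.example.org", "GlueCEStateStatus": "Production",
     "GlueHostBenchmarkSI00": 1800},
]

ranked = matchmake(
    resources,
    requirements=lambda r: r["GlueCEStateStatus"] == "Production",
    rank=lambda r: r["GlueHostBenchmarkSI00"],
)
print([r["name"] for r in ranked])
```

Only the two resources in "Production" state survive the filter, and the one with the higher benchmark value is listed first.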

8
CrossBroker Job Execution
[Diagram: job execution path. Labels: Grid Level, Resource Level, Application Launcher, Condor-G, Computing Element, LRMS, Synchronization, Job Starter, User's Application, Interactive Shadow, Interactive Agent]
9
CrossBroker Job Execution
  • Application Launcher
      • Responsible for providing a reliable submission
        service for parallel applications on the Grid
      • Handles synchronization and monitors the
        application
      • Uses the services of Condor-G
  • Job Starter
      • Initiates applications at the Worker Nodes
      • Responsible for file staging at the remote site
        (executable and input/output files)
      • Handles the details of the LRMS and the parallel
        communication libraries
  • Interactive Agent
      • Creates interactive sessions between application
        and user
      • Split execution: Shadow and Agent

10
Job Description Language
  • Text file using extended version of JDL (Job
    Description Language)

VirtualOrganisation = "imain";
JobType = "Normal";
Executable = "tester-app";
Arguments = "-f 23 -d";
StdOutput = "std.out";
StdError = "std.err";
InputSandbox = {"tester-app", "data"};
OutputSandbox = {"output-data", "std.out"};
Rank = other.GlueHostBenchmarkSI00;
Requirements = other.GlueCEStateStatus == "Production";
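
A job description like the one on this slide can also be generated programmatically. The sketch below is a minimal, hypothetical helper (`to_jdl`, `format_value` and the `Expr` wrapper are names made up for illustration, not part of the CrossBroker or any JDL tool). It assumes the usual `Attribute = value;` layout of JDL, with strings quoted, lists in braces, and expressions such as `Rank` written verbatim.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Expr:
    """Marks a value that must be emitted verbatim (an expression, not a string)."""
    text: str


def format_value(value) -> str:
    if isinstance(value, Expr):
        return value.text
    if isinstance(value, bool):  # checked before int: bool is a subclass of int
        return "true" if value else "false"
    if isinstance(value, (int, float)):
        return str(value)
    if isinstance(value, (list, tuple)):
        return "{" + ", ".join(format_value(v) for v in value) + "}"
    return '"' + str(value) + '"'


def to_jdl(attributes: dict) -> str:
    """Render one 'Name = value;' line per attribute, in insertion order."""
    return "\n".join(f"{k} = {format_value(v)};" for k, v in attributes.items())


job = {
    "VirtualOrganisation": "imain",
    "JobType": "Normal",
    "Executable": "tester-app",
    "Arguments": "-f 23 -d",
    "InputSandbox": ["tester-app", "data"],
    "Rank": Expr("other.GlueHostBenchmarkSI00"),
    "Requirements": Expr('other.GlueCEStateStatus == "Production"'),
}
print(to_jdl(job))
```

Wrapping expressions in `Expr` keeps them from being quoted as strings, which is the one distinction a plain dictionary cannot express.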
11
Interactive Job Support
  • Job Description Language file
      • INTERACTIVE: true/false. Indicates that the job
        is interactive and the broker should treat it
        with higher priority
      • INTERACTIVEAGENT, INTERACTIVEAGENTARGUMENTS:
        the command (and its arguments) used to
        communicate with the user

12
Interactive Job Support
  • Type = "Job";
  • VirtualOrganisation = "imain";
  • JobType = "Normal";
  • Interactive = TRUE;
  • InteractiveAgent = "glogin";
  • InteractiveAgentArguments = "-r p
    195.168.105.65:23433";
  • Executable = "test-app";
  • InputSandbox = {"test-app", "inputfile"};
  • OutputSandbox = {"std.out", "std.err"};
  • StdError = "std.err";
  • StdOutput = "std.out";
  • Rank = other.GlueHostBenchmarkSI00;
  • Requirements =
    other.GlueCEStateStatus == "Production";

13
Interactive Job Support
  • Scheduling priority
      • Interactive jobs are sent to sites with available
        machines
      • If no machines are available, time sharing is
        used
  • Support for interactivity in all kinds of jobs
      • Sequential and parallel jobs
  • CrossBroker injects interactive agents that
    enable communication between user and job
      • Transparent to the user
      • Full integration with glogin and gVid
      • Condor Bypass supported
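
The scheduling priority rule above (free machines first, time sharing otherwise) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `place_interactive_job`, the site dictionaries and their `free_machines` / `free_slots` fields are hypothetical, and the real broker bases its decision on information-system data rather than a static list.

```python
from typing import Dict, List, Optional, Tuple


def place_interactive_job(sites: List[Dict]) -> Optional[Tuple[str, str]]:
    """Return (site name, mode) for an interactive job, or None if nothing fits."""
    # First choice: a site that has an idle machine right now.
    for site in sites:
        if site["free_machines"] > 0:
            return site["name"], "free-machine"
    # Fallback: share a machine through an already available execution slot.
    for site in sites:
        if site["free_slots"] > 0:
            return site["name"], "time-sharing"
    return None


# Hypothetical site states.
busy_sites = [
    {"name": "site-a", "free_machines": 0, "free_slots": 0},
    {"name": "site-b", "free_machines": 0, "free_slots": 1},
]
print(place_interactive_job(busy_sites))
```

With the sample data no site has an idle machine, so the job falls back to time sharing on `site-b`.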

14
Time sharing
  • The idea
      • Each job is encapsulated in an agent that takes
        control over the WN independently of its LRMS
  • Lightweight Virtual Machines
      • Each Worker Node is divided into 2 execution slots
      • Each VM can execute jobs independently (e.g.
        batch and interactive)
      • NOT a full virtual machine (Xen, VMware, etc.)
      • NO need for special privileges in the WN

15
Time sharing
[Diagram: CrossBroker, Computing Element, LRMS and a WN in the Grid resource, with a single job]
16
Time sharing
[Diagram: CrossBroker, Computing Element, LRMS and a WN in the Grid resource; an agent on the WN provides Slot 1 and Slot 2, each holding a job]
17
Time sharing: interactivity
  • Batch jobs create execution slots in the WN
      • Extra overhead due to the creation of slots
  • Interactive jobs only use resources available at
    submission time
      • Free WNs
      • Available slots created by Glidein
  • Priority adjustment
      • Batch job priority is decreased
      • Interactive jobs get more CPU
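
The priority adjustment can be pictured with a toy model of a Worker Node that has two execution slots. The Python sketch below is an assumption-laden illustration, not the broker's mechanism: the classes `Job` and `WorkerNode`, the numeric priorities and the penalty value are all hypothetical, and a higher number simply stands for a larger CPU share.

```python
from dataclasses import dataclass, field
from typing import List, Optional

BATCH_PENALTY = 10  # hypothetical amount by which batch priority is lowered


@dataclass
class Job:
    name: str
    interactive: bool
    priority: int = 0  # toy scale: higher means more CPU


@dataclass
class WorkerNode:
    # Two execution slots per node, both empty at first.
    slots: List[Optional[Job]] = field(default_factory=lambda: [None, None])

    def start(self, job: Job) -> int:
        """Place the job in the first free slot and return the slot index."""
        if None not in self.slots:
            raise RuntimeError("no free slot on this worker node")
        index = self.slots.index(None)
        self.slots[index] = job
        if job.interactive:
            # Lower the priority of co-located batch jobs.
            for other in self.slots:
                if other is not None and not other.interactive:
                    other.priority -= BATCH_PENALTY
        return index


wn = WorkerNode()
batch = Job("batch-job", interactive=False)
inter = Job("int-job", interactive=True)
wn.start(batch)
wn.start(inter)
print(batch.priority, inter.priority)
```

After the interactive job starts, the batch job's priority is below the interactive job's, which is the effect the slide describes.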

18
Time sharing: interactivity
[Diagram: CrossBroker, Computing Element, LRMS and a WN in the Grid resource with Slot 1 and Slot 2, a batch job and an interactive job. Annotations: 200 s; 50 s; priority adjustment; startup-time reduction (only one layer involved)]
19
Parallel Job Support
  • Support for parallel jobs
      • Open MPI
      • PACX-MPI
      • MPICH-P4
      • MPICH-G2
  • Takes into account site capabilities
  • Ability to define user Job Starters to initiate
    the parallel job
  • mpi-start is configured automatically and used by
    default.

20
Parallel Job Support
  • Job Description Language file
  • JOBTYPE
      • Normal: sequential jobs, just one CPU
      • Parallel: more than one CPU
  • SUBJOBTYPE
      • openmpi
      • pacx-mpi
      • mpich
      • mpich-g2
      • plain
  • JOBSTARTER (if not defined, mpi-start)
  • JOBSTARTERARGUMENTS
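
The attribute rules above lend themselves to a small validation step. The following Python sketch shows one way to apply them to a job description held as a dictionary. The function name `resolve_parallel_settings` is hypothetical, and treating a missing `JobType` as "Normal" and a missing `JobStarterArguments` as an empty string are assumptions made for the example; only the `mpi-start` default comes from the slide.

```python
VALID_SUBJOBTYPES = {"openmpi", "pacx-mpi", "mpich", "mpich-g2", "plain"}


def resolve_parallel_settings(jdl: dict) -> dict:
    """Validate JobType/SubJobType and fill in the default job starter."""
    job_type = jdl.get("JobType", "Normal")  # assumed default
    if job_type == "Normal":
        # Sequential job: just one CPU, no job starter involved.
        return {"JobType": "Normal", "NodeNumber": 1}
    if job_type != "Parallel":
        raise ValueError(f"unknown JobType: {job_type!r}")

    sub_type = jdl.get("SubJobType")
    if sub_type not in VALID_SUBJOBTYPES:
        raise ValueError(f"unknown SubJobType: {sub_type!r}")

    return {
        "JobType": "Parallel",
        "SubJobType": sub_type,
        "NodeNumber": jdl["NodeNumber"],
        # mpi-start is used when no job starter is defined.
        "JobStarter": jdl.get("JobStarter", "mpi-start"),
        "JobStarterArguments": jdl.get("JobStarterArguments", ""),
    }


settings = resolve_parallel_settings(
    {"JobType": "Parallel", "SubJobType": "pacx-mpi", "NodeNumber": 5}
)
print(settings["JobStarter"])
```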

21
Parallel Job Support
  • Type = "Job";
  • VirtualOrganisation = "imain";
  • JobType = "Parallel";
  • SubJobType = "pacx-mpi";
  • NodeNumber = 5;
  • Executable = "test-app";
  • Arguments = "-v";
  • InputSandbox = {"test-app", "inputfile"};
  • OutputSandbox = {"std.out", "std.err"};
  • StdError = "std.err";
  • StdOutput = "std.out";
  • Rank = other.GlueHostBenchmarkSI00;
  • Requirements =
    other.GlueCEStateStatus == "Production";

22
MPI Across Sites
Groups with 1 CE:
  Rank = 2000
    aocegrid.uab.es:2119/jobmanager-pbs-workq (freeCPUs = 10)
Groups with 2 CEs:
  Rank = 1500
    zeus.cyf-kr.edu.pl:2119/jobmanager-pbs-workq (freeCPUs = 2)
    bee001.ific.uv.es:2119/jobmanager-pbs-workq (freeCPUs = 3)
  Rank = 1000
    bee001.ific.uv.es:2119/jobmanager-pbs-workq (freeCPUs = 3)
    lngrid02.lip.pt:2129/jobmanager-pbs-workq (freeCPUs = 2)
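
The grouping on this slide can be reproduced with a short Python sketch of set matching. It is not the CrossBroker algorithm: the function `minimal_groups` is hypothetical, and the rule it applies (keep a group of CEs only if its free CPUs cover the request and no smaller accepted group is contained in it) is an assumption that happens to yield the same three groups for a 5-node job. The rank values shown on the slide are not computed here.

```python
from itertools import combinations
from typing import Dict, List, Tuple


def minimal_groups(
    free_cpus: Dict[str, int], node_number: int, max_group_size: int = 2
) -> List[Tuple[str, ...]]:
    """List groups of CEs whose combined free CPUs cover the request."""
    groups: List[Tuple[str, ...]] = []
    for size in range(1, max_group_size + 1):
        for combo in combinations(sorted(free_cpus), size):
            if sum(free_cpus[ce] for ce in combo) < node_number:
                continue
            # Skip groups that merely extend a smaller group already accepted.
            if any(set(g) <= set(combo) for g in groups):
                continue
            groups.append(combo)
    return groups


# Free CPU counts as listed on the slide.
free_cpus = {
    "aocegrid.uab.es": 10,
    "zeus.cyf-kr.edu.pl": 2,
    "bee001.ific.uv.es": 3,
    "lngrid02.lip.pt": 2,
}
for group in minimal_groups(free_cpus, node_number=5):
    print(len(group), group)
```

The pair of `zeus.cyf-kr.edu.pl` and `lngrid02.lip.pt` is rejected because it offers only 4 free CPUs, and every pair containing `aocegrid.uab.es` is dropped because that CE already satisfies the request alone.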
23
MPI Across Sites
[Diagram: CrossBroker, a startup server and two MPI subtasks]
1. Launch a PACX startup server
2. Submit the MPI subtasks
3. mpi-start will start each of the subtasks
4. Subtasks notify the startup server and start running
5. CrossBroker monitors the application
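
The notification step can be illustrated with a toy rendezvous object in Python. This sketch does not implement the PACX-MPI startup protocol; the class `StartupServer`, its methods and the subtask identifiers are hypothetical and only show the idea of a server that knows how many subtasks to expect and reports when all of them have checked in.

```python
class StartupServer:
    """Toy rendezvous point: collects notifications from the expected subtasks."""

    def __init__(self, expected_subtasks: int) -> None:
        self.expected_subtasks = expected_subtasks
        self.registered = []

    def notify(self, subtask_id: str) -> bool:
        """Record a subtask and report whether every subtask has checked in."""
        if subtask_id not in self.registered:
            self.registered.append(subtask_id)
        return self.all_ready()

    def all_ready(self) -> bool:
        return len(self.registered) == self.expected_subtasks


server = StartupServer(expected_subtasks=2)
print(server.notify("subtask-site-a"))  # first subtask: still waiting
print(server.notify("subtask-site-b"))  # second subtask: everyone is present
```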
24
MPI Across Sites
  • CrossBroker searches and selects sets of
    resources for the jobs
  • There is no guarantee that all tasks of the same
    job will start at the same time
      • 1st choice: select only sites with free
        resources. The job will run immediately.
        Unfortunately, free resources are not always
        available
      • 2nd choice: allocate a resource temporarily and
        wait until all other tasks show up. Timeshare
        the resource with a backfilling policy to avoid
        resource idleness
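
To see why backfilling matters for the second choice, consider how long each allocated resource would otherwise sit idle. The Python sketch below is a simple illustration with hypothetical names and numbers: `backfill_windows` takes the time at which each task obtained its resource and returns how long that resource can be backfilled before the last task shows up and the parallel job can start.

```python
from typing import Dict


def backfill_windows(arrival_times: Dict[str, int]) -> Dict[str, int]:
    """For each task, the time its resource waits for the last task to arrive."""
    job_start = max(arrival_times.values())  # the job starts when all tasks are ready
    return {task: job_start - arrived for task, arrived in arrival_times.items()}


# Hypothetical arrival times, in seconds after submission.
arrivals = {"task-a": 0, "task-b": 40, "task-c": 15}
print(backfill_windows(arrivals))
```

In this example the resource held by `task-a` could run backfill work for 40 seconds, while the resource of the last task to arrive has no waiting time at all.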

25
Time sharing: MPI Across Sites
[Diagram: CrossBroker, Computing Element, LRMS and a WN in the Grid resource with Slot 1 and Slot 2, holding an MPI task and a job. Annotations: "MPI task waiting"; "Back filling while the MPI waits"; "All tasks Ready!"; "Priority Lowered"]
26
Simulation of Fusion Devices
  • Simulation of 100,000 particles
  • Master/worker (M/W) MPI application
      • Workers simulate trajectories
      • Master renders OpenGL video
  • glogin and gVid encode and transmit the video
  • User can interact with the simulation through a GUI
  • Type = "Job";
  • JobType = "Parallel";
  • SubJobType = "openmpi";
  • NodeNumber = 5;
  • Interactive = TRUE;
  • InteractiveAgent = "glogin";
  • InteractiveAgentArguments = "-r p
    195.168.105.65:23433";
  • Executable = "fusion-app";
  • InputSandbox = {"fusion-app", "inputfile"};

27
Conclusions
  • CrossBroker supports both parallel and
    interactive jobs
      • Automatically
      • Interoperable with EGEE
  • Time sharing
      • Fast startup of jobs
      • Co-allocation without reservation or wasting
        resources
  • Used in production environments
      • Used in the EU CrossGrid and int.eu.grid projects
        (12K to 55K jobs per month)
  • Applications using the CrossBroker features
      • Visualization of plasma in fusion devices
      • Evolution of pollution clouds in the atmosphere
      • Ultrasound Computing Tomography: reconstruction
        of a 3D volume

28
Questions?
  • Thanks
