Condor, Condor-G, Condor-C and Stork

Transcript and Presenter's Notes

1
Condor, (Condor-G), Condor-C and Stork
2
Resource Allocation vs. Work Delegation
3
(No Transcript)
4
Resource Allocation
  • A limited assignment of the ownership of a
    resource
  • Owner is charged for allocation regardless of
    actual consumption
  • Owner can allocate the resource to others
  • Owner has the right and means to revoke an
    allocation
  • Allocation is governed by an agreement between
    the consumer and the owner
  • Allocation is always a lease
  • Trees of allocations can be formed

5
  • We present some principles that we believe should apply in any compute resource management system. The first, P1, speaks to the need to avoid resource leaks of all kinds, as might result, for example, from a monitoring system that consumes a nontrivial number of resources.
  • P1 - It must be possible to monitor and control all resources consumed by a CE, whether for computation or management.
  • Our second principle is a corollary of P1:
  • P2 - A system should incorporate circuit breakers to protect both the compute resource and clients. For example, negotiating with a CE consumes resources. How do we prevent an eager client from turning into a denial-of-service attack?

Ian Foster and Miron Livny, "Virtualization and Management of Compute Resources: Principles and Architecture", a working document (February 2005)
6
Work Delegation
  • A limited assignment of the responsibility to
    perform the work
  • Delegation involves a definition of these
    responsibilities
  • Responsibilities may be further delegated
  • Delegation always consumes resources
  • Delegation is always a lease
  • Trees of delegations can be formed

7
(Diagram: numbered interaction steps (1-6) among DAGMan, schedD, shadow, startD, starter, grid manager, GAHP-Globus, Globus, and NorduGrid)
8
Some details
9
Condor-C to Condor-G
10
Condor-G to Condor-C
11
1. Glide-in
2. Submit jobs
12
Matchmaking
  • In all of these examples, Condor-C (and Condor-G) went to a specific remote schedD (or remote site)
  • This is not required; you can do matchmaking!

13
Matchmaking with Condor-C
14
What about other types of work and resources?
  • Make data placement jobs first class citizens
  • Manage storage space
  • Manage FTP connections
  • Bridge protocols
  • Manage network connections
      • Across private networks
      • Through firewalls
      • Through shared gateways

15
Customer requests: Place y = F(x) at L! Master delivers.
16
Data Placement
  • Management of storage space and bulk data
    transfers play a key role in the end-to-end
    performance of an application.
  • Data Placement (DaP) operations must be treated
    as first class jobs and explicitly expressed in
    the job flow
  • Fabric must provide services to manage storage
    space and connections
  • Data Placement schedulers are needed.
  • Data Placement and computing must be coordinated
  • Smooth transition of CPU-I/O interleaving across
    software layers
  • Error handling and garbage collection

17
A simple DAG for y = F(x) → L (see the sketch after this list)
  • Allocate (size(x) + size(y) + size(F)) at SE(i)
  • Move x from SE(j) to SE(i)
  • Place F on CE(k)
  • Compute F(x) at CE(k)
  • Move y to L
  • Release allocated space

Storage Element (SE) Compute Element (CE)
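As a rough illustration, the six steps above can be written in the DaP/Job DAG notation that appears on the "Concept" slide below. The node names, submit-file names (allocate.submit, move_x.submit, and so on) and the choice to treat the staging of F as a DaP node are invented for this sketch; the # lines are just annotations:

    DaP  Allocate  allocate.submit   # reserve size(x)+size(y)+size(F) at SE(i)
    DaP  Move_x    move_x.submit     # transfer x from SE(j) to SE(i)
    DaP  Place_F   place_F.submit    # stage the executable F on CE(k)
    Job  Compute   compute.submit    # run F(x) at CE(k)
    DaP  Move_y    move_y.submit     # transfer y to L
    DaP  Release   release.submit    # free the allocated space

    Parent Allocate child Move_x
    Parent Move_x   child Compute
    Parent Place_F  child Compute
    Parent Compute  child Move_y
    Parent Move_y   child Release

DAGMan would hand the DaP nodes to Stork and the Compute node to Condor, in dependency order.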
18
Data Placement Jobs
Computational Jobs
19
The Concept
DAG specification:
    DaP A A.submit
    DaP B B.submit
    Job C C.submit
    ...
    Parent A child B
    Parent B child C
    Parent C child D, E
    ...
(Diagram: DAGMan reads the DAG specification and places computational jobs (e.g. C) in the Condor job queue and data placement jobs (e.g. E) in the Stork job queue)
20
Current Status
  • Implemented a first version of a framework that
    unifies the management of compute and data
    placement activities.
  • DaP-aware job flow (DAGMan).
  • Stork: a DaP scheduler.
  • Parrot: a tool that "speaks" a variety of distributed I/O services.
  • NeST: a portable, grid-enabled storage appliance (lot and connection management).

21
(Diagram: a Planner and matchmaker (MM) coordinating SchedD, Stork, StartD, RFT, and GridFTP components)
22
Failure Recovery and Efficient Resource
Utilization
  • Fault tolerance
      • Just submit a bunch of data placement jobs, and then go away... (see the retry sketch after this list)
  • Control the number of concurrent transfers from/to any storage system
      • Prevents overloading
  • Space allocation and de-allocation
      • Make sure space is available
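A small sketch of the retry side, assuming the hypothetical Move_x data placement node from the earlier DAG sketch and an illustrative retry count: DAGMan's RETRY statement resubmits a failed node at the DAG level, complementing the per-transfer Max_Retry attribute shown on a later slide.

    # resubmit the Move_x node up to 5 times before declaring it failed
    Retry Move_x 5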

23
Support for Heterogeneity
Protocol translation using Stork memory buffer.
24
Support for Heterogeneity
Protocol translation using Stork Disk Cache.
25
Flexible Job Representation and Multilevel Policy
Support
  • Type = Transfer
  • Src_Url = srb://ghidorac.sdsc.edu/kosart.condor/x.dat
  • Dest_Url = nest://turkey.cs.wisc.edu/kosart/x.dat
  • Max_Retry = 10
  • Restart_in = 2 hours
  (see the assembled submit description below)
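Assembled into a single submit description, the attributes above might look roughly as follows. The bracketed ClassAd layout and semicolons are assumptions about the Stork submit format (ClassAd attribute names are case-insensitive, and the first attribute may be spelled dap_type rather than type in actual Stork submit files); the URLs and values are the ones from the slide:

    [
      // a Stork data placement job: move x.dat from an SRB server to a NeST server
      type       = "transfer";
      src_url    = "srb://ghidorac.sdsc.edu/kosart.condor/x.dat";
      dest_url   = "nest://turkey.cs.wisc.edu/kosart/x.dat";
      max_retry  = 10;
      restart_in = "2 hours";
    ]

Such a description would be handed to the Stork server with its submission tool (stork_submit in the Condor 6.7 releases mentioned on the status slide).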

26
Real-life Data Pipelines
  • Astronomy data processing pipeline
      • 3 TB (2611 x 1.1 GB files)
      • Joint work with Robert Brunner, Michael Remijan et al. at NCSA
  • WCER educational video pipeline
      • 6 TB (13 GB files)
      • Joint work with Chris Thorn et al. at WCER

27
DPOSS Data
  • Palomar-Oschin photographic plates used to map one half of the celestial sphere
  • Each photographic plate digitized into a single
    image
  • Calibration done by software pipeline at Caltech
  • Want to run SExtractor on the images

28
NCSA Pipeline
(Diagram: input and output data flows among UniTree @NCSA, the staging sites @NCSA and @UW, and processing at the Condor pool @Starlight)
29
NCSA Pipeline
  • Moved and processed 3 TB of DPOSS image data in under 6 days
  • Most powerful astronomy data processing facility!
  • Adapt for other (petabyte-scale) datasets: Quest2, CARMA, NOAO, NRAO, LSST

30
WCER Pipeline
  • Need to convert DV videos to MPEG-1, MPEG-2 and
    MPEG-4
  • Each 1 hour video is 13 GB
  • Videos accessible through Transana software
  • Need to stage the original and processed videos
    to SDSC

31
WCER Pipeline
  • First attempt at such large scale distributed
    video processing
  • Decoder problems with large 13 GB files
  • Uses bleeding edge technology

32
WCER Pipeline
Staging Site @UW
SRB Server @SDSC
33
Current status
  • The Stork binaries are included with the condor-6.7.-linux-x86-glibc23 releases. These are at least compatible with RedHat 9, Fedora Core and Scientific Linux.
  • The list of supported Stork protocols is at http://www.cs.wisc.edu/condor/stork.
  • Stork was tested against the following remote servers: GridFTP v2.x and v3.x, SRB v3.1.2, dCache SRM v1.5.2, FTP, HTTP, NeST, UniTree, DiskRouter, Castor SRM, LBNL SRM, and JLAB SRM.

34
How can we accommodate an unbounded need for computing with an unbounded amount of resources?