Planning on the Grid - PowerPoint PPT Presentation

Provided by: jimbl
Learn more at: https://www.isi.edu

Transcript and Presenter's Notes

Title: Planning on the Grid


1
Planning on the Grid
  • With slides contributed by
  • Ewa Deelman and Yolanda Gil

2
Thinking about applications of planning
  • You've seen Planning as X,
  • X ∈ SAT, CSP, ILP, ...
  • Now Y as Planning
  • Y ∈ Grid/Web services
    composition, ...

3
Problem-solving on Grids
  • Users pool access to distributed resources
    (computers, instruments, data, ..)
  • Applications are often composed of separate
    components run at several locations
  • Grid middleware tools, e.g. the Globus Toolkit,
    allow for job scheduling and resource discovery

4
The Computational Grid
  • Emerging computational and networking
    infrastructure
  • bring together compute resources, data storage
    systems, instruments, human resources
  • Enable entirely new approaches to applications
    and problem solving
  • remote resources the rule, not the exception
  • can solve ever bigger problems
  • Wide-area distributed computing
  • national and international
  • Facilitate collaborative environments
  • Sharing of data which can be expensive to produce
    (experimentation/simulation)

5
Example: LIGO Experiment (Laser Interferometer
Gravitational-Wave Observatory)
  • Aims to detect gravitational waves predicted
    by the theory of relativity.
  • Can be used to detect:
  • binary pulsars
  • mergers of black holes
  • starquakes in neutron stars
  • Two installations: Louisiana (Livingston) and
    Washington State (Hanford)
  • Other projects: Virgo (Italy), GEO (Germany),
    TAMA (Japan)
  • Instruments are designed to measure the effect of
    gravitational waves on test masses suspended in
    vacuum.
  • Data collected during experiments is a collection
    of time series (multi-channel)
  • Analysis is performed in time and Fourier domains

6
LIGO's Pulsar Search (Laser Interferometer
Gravitational-Wave Observatory)
[Diagram: pulsar search pipeline. Stage labels: long time frames (30 minutes), short time frames, single frame, extract channel, short Fourier transform, transpose, time-frequency image, extract frequency range, construct image, find candidate, store, event DB]

7
Motivation: Using Today's Grid
  • Users have high-level requirements naturally
    stated in terms of the application domain
  • Ex.: Obtain frequency spectrum for signal S in
    instrument I and timeframe T
  • Users have to turn these requirements into
    executable job workflows in detailed scripts
  • Users must figure out which code generates
    desired products, which files contain them,
    physical location of the files, hosts that
    support execution given code requirements,
    availability of hosts, access policies, etc.
  • Users must query Grid middleware: metadata
    catalog, replica locator, resource descriptor and
    monitoring, etc.
  • Users must oversee execution

8
Problems with today's Grid
  • Usability: users must be proficient in grid
    computing
  • Complexity: many interrelated choices and dead
    ends
  • Solution cost: finding any solution, regardless
    of cost, is already hard
  • Global cost optimization: necessary when there
    is contention
  • Reliability of execution: job resubmission upon
    failure

9
Planning for workflow generation and maintenance
  • Outline
  • Formalization as a planning problem
  • Integration with the grid middleware
  • Case study: planning for workflows in LIGO
  • The grid as a test bed for planning and
    scheduling research

11
Desiderata for workflow generator
  • Allow users to refer to data requirements by
    descriptions, not file names
  • Intuitive, requires far less input
  • Seek high-quality workflows according to a
    variable metric
  • Model variety of constraints declaratively
  • Data dependencies, resource constraints, user
    access rights, ...

12
Planning for workflow generation and maintenance
  • Outline
  • Formalization as a planning problem
  • Integration with the grid middleware
  • Case study: planning for workflows in LIGO
  • The grid as a test bed for planning and
    scheduling research

13
Planning for workflow generation
  • Application components as operators
  • Desired data as goals
  • World state includes available hosts, existing
    data products, network bandwidths, ...
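
The mapping on this slide (components as operators, desired data as goals, available resources and data as world state) can be illustrated with a minimal sketch. It is not the planner used in the talk: it assumes a plain STRIPS-style encoding with a breadth-first forward search, and the fact and operator names are hypothetical.

```python
from collections import deque
from typing import NamedTuple


class Operator(NamedTuple):
    name: str
    preconds: frozenset  # facts that must hold before the component can run
    adds: frozenset      # facts that hold once the component has run


def plan(state, goals, operators):
    """Breadth-first forward search. Returns a list of operator names, or None."""
    start = frozenset(state)
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        facts, steps = queue.popleft()
        if goals <= facts:
            return steps
        for op in operators:
            # Apply an operator only if it is applicable and adds something new.
            if op.preconds <= facts and not op.adds <= facts:
                nxt = facts | op.adds
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append((nxt, steps + [op.name]))
    return None


# Hypothetical operators, loosely modelled on the pulsar search pipeline.
operators = [
    Operator("create-sft", frozenset({"created raw-frames"}), frozenset({"created sft"})),
    Operator("extract-band", frozenset({"created sft"}), frozenset({"created ssft"})),
    Operator("pulsar-search", frozenset({"created ssft"}), frozenset({"created pulsar-result"})),
]
initial_state = {"created raw-frames"}
goals = frozenset({"created pulsar-result"})
workflow = plan(initial_state, goals, operators)
print(workflow)
```

If an intermediate product such as "created ssft" is already in the initial state, the search returns a shorter workflow, which mirrors the reuse of existing data products.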

14
Existing tools for building workflows: abstract
workflow generation
  • Chimera
  • Input-output transforms for files, in Virtual
    Data Language

DV third1->pulsar( a=@{input:"H2_sSFT_LSC-AS-Q_714384000_256_50_1.ilwd"},
    b=@{output:"H2_pulsar_LSC-AS-Q_714384000_256_50.5_0.004_3.ilwd"},
    t1="714384000", t2="714384255", format="ilwd",
    channel="LSC-AS-Q", fcenter="50.5",
    fband="0.004", instrument="H2", ra="3.123643",
    de="2.56234", fderv1="0.0", fderv2="0.0",
    fderv3="0.0", fderv4="0.0", fderv5="0.0" )
15
Planning operator
  (operator pulsar-search
    (preconds
      (
       ( 7143800)
       ( LSC-AS-Q)
       ( 0.5)
       ( 50)
       ( 20)
      )
      (and
        (created H2_sSFT_LSC-AS-Q_714384000_256_50_1.ilwd)))
    (effects
      ()
      ((add
         (created H2_pulsar_LSC-AS-Q_714384000_256_50.5_0.004_3.ilwd))
      )
    ))

16
Operator with metadata parameters
  (operator pulsar-search
    (preconds
      (
       ( Number)
       ( Channel)
       ( Number)
       ( Number)
       ( Number)
       ( File-Handle)
       ;; These two are parameters for the frequency-extract.
       ( (and Number (get-low-freq-from-center-and-band
                       )))
       ( (and Number (get-high-freq-from-center-and-band
                       )))
      )
      (and
        (forall ((
                  (and File-Group-Handle
                       (gen-sub-sft-range-for-pulsar-search
    (effects
      ()
      (
       (add (created ))
       (add (pulsar
             ))
      )
    ))

17
Operator with host identified
  (operator pulsar-search
    (preconds
      (( (or Condor-pool Mpi))
       ( Number)
       ( Channel)
       ( Number)
       ( Number)
       ( Number)
       ( File-Handle)
       ;; These two are parameters for the frequency-extract.
       ( (and Number (get-low-freq-from-center-and-band
                       )))
       ( (and Number (get-high-freq-from-center-and-band
                       )))
       ( (and Number
              (estimate-pulsar-search-run-time
                )))
      )
    (effects
      ()
      (
       (add (created ))
       (add (at ))
       (add (pulsar
             ))
      )
    ))

18
Planning for workflow generation
  • Application components as operators
  • Parameters include the host, so the plan is a
    concrete workflow
  • Desired data (in descriptive form) as goals
  • World state includes available hosts, existing
    data products, network bandwidths, ...

19
Operator descriptions
  • Represent applying a given component at a
    particular location with fixed parameters, inputs
    and outputs.
  • Preconditions combine:
  • Data dependencies: derive input requirements
    from outputs
  • Task constraints, e.g. component must be run on
    an MPI machine

20
Plan quality
  • Objective function may include:
  • Performance: expected runtime, variance
  • Reliability: probability of failure, expected
    number of retries
  • Computational cost: use of expensive
    resources, conformance to policies
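
One way to combine these terms into a single score is sketched below. The weighting scheme, the `Step` fields and the assumption that steps run one after another with independent retries are all illustrative choices, not the objective function used in the talk.

```python
from dataclasses import dataclass


@dataclass
class Step:
    runtime: float  # expected runtime of one attempt, in seconds
    p_fail: float   # probability that a single attempt fails, 0 <= p_fail < 1
    cost: float     # charge for one attempt on the chosen resource


def expected_attempts(p_fail):
    # With independent attempts, the number of tries until the first success
    # is geometric, so its mean is 1 / (1 - p_fail).
    return 1.0 / (1.0 - p_fail)


def plan_score(steps, w_time=1.0, w_retry=1.0, w_cost=1.0):
    """Lower is better. Assumes the steps execute sequentially."""
    time = sum(s.runtime * expected_attempts(s.p_fail) for s in steps)
    retries = sum(expected_attempts(s.p_fail) - 1.0 for s in steps)
    cost = sum(s.cost * expected_attempts(s.p_fail) for s in steps)
    return w_time * time + w_retry * retries + w_cost * cost


flaky_cheap = [Step(runtime=100.0, p_fail=0.5, cost=2.0)]
reliable_expensive = [Step(runtime=100.0, p_fail=0.0, cost=10.0)]
print(plan_score(flaky_cheap), plan_score(reliable_expensive))
```

Changing the weights shifts the preference between fast, reliable and cheap plans, which is what a variable quality metric would need.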

21
Using local heuristics and global metrics
  • Need local heuristics since exhaustive search of
    the plan space is intractable
  • e.g. prefer host for program with high-bandwidth
    connection to where the output is required
  • Need to test a global metric (e.g. overall
    runtime) since local heuristics can lead to
    a globally poor solution
  • Create as many plans as possible, return best
  • Search control to eliminate redundant solutions
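
The interplay between a local heuristic and a global metric can be sketched as follows. This is an illustration under stated assumptions: the heuristic keeps only the hosts with the best bandwidth, the global metric is a simple makespan in which tasks on the same host run one after another, and all host names and numbers are made up.

```python
from itertools import product


def makespan(assignment, runtime):
    """Global metric: tasks sharing a host run serially, hosts run in parallel."""
    load = {}
    for task, host in assignment.items():
        load[host] = load.get(host, 0.0) + runtime[task][host]
    return max(load.values())


def best_plan(tasks, hosts, runtime, bandwidth, keep=2):
    # Local heuristic: per task, consider only the `keep` hosts with the
    # highest bandwidth to the site where the output is required.
    shortlist = sorted(hosts, key=lambda h: -bandwidth[h])[:keep]
    best = None
    # Generate every plan over the shortlisted hosts and keep the best one.
    for choice in product(shortlist, repeat=len(tasks)):
        assignment = dict(zip(tasks, choice))
        score = makespan(assignment, runtime)
        if best is None or score < best[0]:
            best = (score, assignment)
    return best


tasks = ["t1", "t2"]
hosts = ["a", "b", "c"]
bandwidth = {"a": 100, "b": 50, "c": 10}
runtime = {t: {"a": 10.0, "b": 12.0, "c": 5.0} for t in tasks}
score, assignment = best_plan(tasks, hosts, runtime, bandwidth)
print(score, assignment)
```

Following the bandwidth heuristic alone would place both tasks on host a for a makespan of 20, while comparing whole plans against the global metric finds the split across a and b.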

22
Example search heuristics
  (control-rule only-transfer-from-loc-with-greatest-bandwidth
    (if (and (current-ops (transfer-file))
             (current-goal (at ))
             (true-in-state (at ))
             (true-in-state (at ))
             (higher-bandwidth
               )))
    (then reject bindings (( . ))))
  (control-rule prefer-mpi-to-condor-for-pulsar-search
    (if (and (current-ops (pulsar-search))
             (type-of Mpi)
             (type-of Condor-pool)))
    (then prefer bindings (( . ))
                          (( . ))))
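
The intent of the two control rules can be paraphrased in a few lines. This is a hedged sketch rather than the planner's control-rule mechanism: "reject" is modelled as filtering candidate bindings and "prefer" as reordering them, and the site and host names are hypothetical.

```python
def only_transfer_from_greatest_bandwidth(sources, dest, bandwidth):
    """Reject every source that has a lower bandwidth to dest than some other source."""
    best = max(bandwidth[(s, dest)] for s in sources)
    return [s for s in sources if bandwidth[(s, dest)] == best]


def prefer_mpi_to_condor(hosts, host_type):
    """Order candidate hosts so that MPI machines are tried before Condor pools."""
    # sorted() is stable, so hosts of the same type keep their original order.
    return sorted(hosts, key=lambda h: host_type[h] != "Mpi")


bandwidth = {("site1", "dest"): 40, ("site2", "dest"): 100, ("site3", "dest"): 10}
sources = only_transfer_from_greatest_bandwidth(["site1", "site2", "site3"], "dest", bandwidth)

host_type = {"pool1": "Condor-pool", "cluster1": "Mpi", "pool2": "Condor-pool"}
ordered = prefer_mpi_to_condor(["pool1", "cluster1", "pool2"], host_type)
print(sources, ordered)
```

A reject rule prunes the search space outright, whereas a prefer rule only changes the order in which alternatives are explored, so the Condor pools remain available as fallbacks.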

23
Planning for workflow generation and maintenance
  • Outline
  • Formalization as a planning problem
  • Integration with the grid middleware
  • The grid as a test bed for planning and
    scheduling research

25
Generating the planning problem
  • Currently, static file representation for
    available hosts, bandwidths
  • Query grid services prior to planning to find
    which relevant files exist
  • Future versions will make dynamic queries
  • The goal is translated from the user request; the
    plan is translated into a DAG format suitable for
    the grid scheduler.
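
The last step, turning a plan into a DAG, can be sketched by deriving edges from data dependencies. The slide does not name the DAG format, so the output below assumes a DAGMan-style text file with JOB and PARENT ... CHILD lines; the step names, file names and `.sub` submit files are hypothetical.

```python
def plan_to_dag(steps):
    """steps: list of (name, inputs, outputs). Returns the DAG as a list of text lines."""
    # Record which step produces each file.
    producer = {}
    for name, _inputs, outputs in steps:
        for f in outputs:
            producer[f] = name
    lines = [f"JOB {name} {name}.sub" for name, _inputs, _outputs in steps]
    # A step depends on every step that produces one of its inputs.
    for name, inputs, _outputs in steps:
        parents = sorted({producer[f] for f in inputs if f in producer})
        for parent in parents:
            lines.append(f"PARENT {parent} CHILD {name}")
    return lines


steps = [
    ("create_sft", ["raw.gwf"], ["sft.ilwd"]),
    ("transfer", ["sft.ilwd"], ["sft_at_b.ilwd"]),
    ("pulsar_search", ["sft_at_b.ilwd"], ["result.ilwd"]),
]
dag = plan_to_dag(steps)
print("\n".join(dag))
```

Inputs with no producer, such as raw.gwf here, are treated as data that already exists on the grid and add no edge.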

26
LIGO's Pulsar Search at SC02
  • Used LIGO data collected during the first
    scientific run of the instrument
  • Targeted a set of 1000 locations: known pulsars
    or random locations
  • Results of the analysis published to the LIGO
    Scientific Collaboration
  • Performed using LDAS and compute and storage
    resources at Caltech, University of Southern
    California, University of Wisconsin-Milwaukee.

27
Summary: benefits of planning
  • Automating workflow composition
  • Only just being addressed in Grid middleware
  • Reasoning with explicit descriptions of data
  • More intuitive for users
  • Far fewer inputs required than at file level
  • Better workflows by searching many plans

28
Planning for workflow generation and maintenance
  • Outline
  • Existing Grid tools for workflow generation
  • Formalization as a planning problem
  • Integration with the grid middleware
  • The grid as a test bed for planning and
    scheduling research

29
Many areas of planning research relevant for grid
  • Planning for a dynamic environment: plan
    monitoring and repair, planning under uncertainty
  • Scheduling: resource reasoning, temporal
    reasoning
  • Plan quality: learning, acquiring preferences,
    local search planning
  • Planning for information gathering: integrating
    access to grid services with workflow creation
  • Domain modeling: handling multiple ontologies,
    acquiring metadata descriptions, acquiring
    operators

30
Fault-tolerant planning for a dynamic environment
  • Grid resources become unavailable; queue lengths
    and network bandwidth change
  • Exploring plan repair strategies, balance of work
    done off-line and on-line
  • Modeling failures, keeping statistics for
    creating plans more likely to succeed,
    conditional plans, ...

31
Fault-tolerant straw men
  • Current version: build fully detailed plan
    offline; resource allocation is fixed
  • Ignores world dynamics
  • Build abstract plan (without specifying hosts)
    offline, use a matchmaker online
  • Matchmaker makes local decisions only
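
The second straw man can be sketched as an abstract plan whose tasks receive hosts only at dispatch time. The matchmaking policy below (shortest queue among available hosts of the required type) and all field names are assumptions made for illustration.

```python
def matchmake(task, hosts):
    """Local decision: the available host of the required type with the shortest queue."""
    eligible = [h for h in hosts if h["up"] and task["needs"] in h["types"]]
    if not eligible:
        return None
    return min(eligible, key=lambda h: h["queue"])


def run_abstract_plan(tasks, hosts):
    """Assign hosts task by task, looking only at the current host state."""
    schedule = []
    for task in tasks:
        host = matchmake(task, hosts)
        if host is None:
            schedule.append((task["name"], None))
        else:
            host["queue"] += 1  # the dispatched job lengthens that host's queue
            schedule.append((task["name"], host["name"]))
    return schedule


hosts = [
    {"name": "a", "up": True, "types": {"Mpi"}, "queue": 0},
    {"name": "b", "up": True, "types": {"Mpi", "Condor-pool"}, "queue": 1},
    {"name": "c", "up": False, "types": {"Condor-pool"}, "queue": 0},
]
tasks = [
    {"name": "t1", "needs": "Mpi"},
    {"name": "t2", "needs": "Condor-pool"},
    {"name": "t3", "needs": "Mpi"},
]
schedule = run_abstract_plan(tasks, hosts)
print(schedule)
```

Each choice reacts to the current state, so the unavailable host c is never used, but no choice accounts for where later tasks will need their data, which is the limitation the slide points out.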

32
Global reasoning is needed for resource
allocation
33
Approaches for fault-tolerant planning in dynamic
domains
  • RAX (Jonsson et al.): general framework. As
    implemented:
  • offline: builds complete plan
  • online: adjusts temporal intervals
  • Combining planning and scheduling
  • offline: build several abstract plans
  • online: reason about critical path to
    instantiate each plan
  • MDP/POMDP approaches
  • Open area...

34
Challenge: understanding when different
approaches are more important
  • Hypotheses
  • Uneven task distribution, in terms of
    computational and data expense and resource
    constraints, will indicate global planning
  • Time-dependency, e.g. need to re-plan during
    execution, will indicate local planning
  • Interesting project: use experiments in synthetic
    and real domains to test hypotheses and uncover
    new insights

35
Empirical tests with synthetic LIGO problems
  • Example: problem requires 100 files on one
    machine. Vary the number that already exist.

36
Domain modeling
Current system
Knowledge from several sources must be used
[Diagram labels: info from Grid services (RLS, MCS etc.); task requirements; existing data in files; state info (files, resources); comp. selector; user policies; monolithic planner; available resources; KBs combined in one location; resource selector; resource queues; concrete tasks; exec. monitor; network bandwidth; Grid task schedulers]
37
Where does knowledge used by our planners come
from?
[Diagram: an operator skeleton, (Operator (preconditions ..) (effects ..)), annotated with its knowledge sources: task resource requirements; user policies, preferences; resource policies; data dependencies (VDL)]
Each knowledge component is used for other
purposes beyond planning
38
Automatically generated operators for several
application domains
[Diagram: an operator skeleton, (Operator (preconditions ..) (effects ..)), generated from task resource requirements, policies and data dependencies (VDL). Application domains shown: Digital sky survey, LIGO, GEO, Galaxy morphology, Tomography]
Investigating patterns of data descriptions for
more efficient planning
39
  • Question: if operators are gathered from
    distributed services, can we still guarantee
    soundness and completeness?
  • Under what kinds of conditions?

40
Representing appropriate information units with
metadata
  • E.g. have 60,000 files, want to allocate 60 tasks
    each dealing with 1,000 files.
  • Previously, application components specified in
    terms of specific files:

DV run59000->extractSFTData( input=@{input:"nSFT.59000"}, ..., @{input:"nSFT.59999"},
    output=@{output:"eSFT.59000"}, ..., @{output:"eSFT.59999"},
    t1="714384000", t2="714384063",
    freq=1008, band=4, instrument="H2" )
  (59 similar clauses)
DV final->computeFStatistic( input=@{input:"eSFT.00000"}, ..., @{input:"eSFT.59999"}, ... )

[Diagram labels: 1000 files, 60000 files]
41
Metadata representation
  • Replace with two clauses, two input predicates
  • A predicate now represents a range of files
  • Simpler to model, greater generality, more
    efficient for reasoner
  (operator run-extractSFTData-range
    (preconds
      (( Number)
       ( (and Number (
                      0)))
       ( (and Number
              (gen-smaller-number 1000
                ))))
      (and (range "eSFT" 2 1
             )
           (range "nSFT" 2 1
             999)))
    (effects ()
      ((add (range "eSFT" 2
              )))))
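
The saving from range predicates can be illustrated with the 60,000-file example from the previous slide. This is a sketch only: the `FileRange` tuple and its fields are hypothetical stand-ins for the range predicate, not the planner's representation.

```python
from typing import NamedTuple


class FileRange(NamedTuple):
    prefix: str  # file family, e.g. "nSFT" or "eSFT"
    start: int   # index of the first file in the range
    count: int   # number of files in the range


def split_into_ranges(prefix, total, per_task):
    """Describe `total` files as consecutive ranges of at most `per_task` files."""
    return [
        FileRange(prefix, start, min(per_task, total - start))
        for start in range(0, total, per_task)
    ]


ranges = split_into_ranges("nSFT", 60_000, 1_000)
print(len(ranges), ranges[0], ranges[-1])
```

Sixty short descriptions replace clauses that would otherwise enumerate 60,000 file names, and the last range covers files 59000 to 59999, matching the run59000 derivation.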

42
Requires library operators for ranges
  • E.g. if a range of files exists, then so does any
    subrange
  • Questions: what are the required operators?
    Similar to spatial calculus RCC-8?

  (operator subranges-exist
    (preconds
      (( Number)
       ( Object)
       ( (and Number (
                      0)))
       ( (and Number
              (gen-known-enclosing-begins
                2 1 )))
       (
        (and Number (gen-known-enclosing-number-of-files
                      2 1
                      ))))
      (created-range 2 1
        ))
    (effects ()
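
The rule that an existing range implies every subrange amounts to an interval containment test. The sketch below assumes ranges are (start, count) pairs over one file family; it is an illustration of the idea, not the library operator itself.

```python
def contains(known, wanted):
    """True if the wanted (start, count) range lies entirely inside the known one."""
    known_start, known_count = known
    wanted_start, wanted_count = wanted
    return (
        wanted_count > 0
        and known_start <= wanted_start
        and wanted_start + wanted_count <= known_start + known_count
    )


def subrange_exists(known_ranges, wanted):
    """A wanted range exists if some single known range encloses it."""
    return any(contains(known, wanted) for known in known_ranges)


known_ranges = [(0, 1000), (2000, 500)]
print(subrange_exists(known_ranges, (100, 50)))
print(subrange_exists(known_ranges, (990, 20)))
```

This check does not merge adjacent known ranges, so a request spanning two of them is reported as missing; handling such cases is the kind of additional range operator the slide asks about.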

43
Conclusions
  • Implemented system takes data description
    requests from LIGO users, composes workflow and
    executes on the Grid
  • Planning and scheduling technologies can make a
    large contribution to Grid infrastructure
  • Many interesting challenges for planning and
    scheduling research from Grid applications
  • http://www.isi.edu/ikcap/cognitive-grids
  • http://www.isi.edu/deelman/pegasus.htm

44
Koehler and Srivastava
  • Different approaches to specifying workflows by
    hand

45
WSDL service specification (no workflow specified)
[WSDL listing garbled in extraction. Surviving fragments: namespace http://schemas.xmlsoap.org/wsdl/; messages "OrderEvent", "TripRequest", "FlightRequest", "HotelRequest", "BookingFailure"; port types "pt1" (message "TripRequest"), "pt2" (message "HotelRequest"), ..., "pt9" (message "BookingFailure"); operation "CIToFS"]
46
BPEL4WS
[BPEL4WS listing garbled in extraction. Surviving fragments: portType "pt1", operation "CToCI", container "OrderEvent"; "HotelService", portType "pt2", operation "CIToHS", inputContainer "HotelRequest"; portType "pt3", operation "CIToFS", inputContainer "FlightRequest"]
47
Golog
48
Back-up slides
49
What is Needed
  • We need alternative foundations that offer
  • expressive representations
  • flexible reasoners
  • Many Artificial Intelligence (AI) techniques are
    relevant
  • Planning to achieve given requirements
  • Searching through problem spaces of related
    choices
  • Using and combining heuristics
  • Expressive knowledge representation languages
  • Reasoners that can incorporate rules,
    definitions, axioms, etc.
  • Schedulers and resource allocation techniques

50
Existing tools for building workflows: abstract
workflow generation
  • Chimera
  • Input-output transforms at level of actual files,
    in Virtual Data Language

DV first1->createSFT( b=@{output:"H2_SFT_LSC-AS-Q_714384000_64.gwf"},
    t1="714384000", t2="714384063", format="frame",
    channel="H2LSC-AS-Q", instrument="H2" )
DV first2->createSFT( b=@{output:"H2_SFT_LSC-AS-Q_714384064_64.gwf"},
    t1="714384064", t2="714384127", format="frame",
    channel="H2LSC-AS-Q", instrument="H2" )
DV third1->pulsar( a=@{input:"H2_sSFT_LSC-AS-Q_714384000_256_50_1.ilwd"},
    b=@{output:"H2_pulsar_LSC-AS-Q_714384000_256_50.5_0.004_3.123643_2.56234.ilwd"},
    t1="714384000", t2="714384255",
    format="ilwd", channel="LSC-AS-Q",
    fcenter="50.5", fband="0.004", instrument="H2",
    ra="3.123643", de="2.56234", fderv1="0.0",
    fderv2="0.0", fderv3="0.0", fderv4="0.0",
    fderv5="0.0" )
52
Existing tools 2: concrete planner
  • Assigns specific hosts and data locations for
    tasks
  • Makes random selection of resources and data
  • Provided a feasible solution
  • Reused existing data products

53
Sample Pulsar Search Results to Date
  • SC 2002 run
  • Over 58 pulsar searches
  • Total of
  • 330 tasks
  • 469 data transfers
  • 330 output files produced.
  • The total runtime was 11:24:35.
  • To date
  • 185 pulsar searches
  • Total of
  • 975 tasks
  • 1365 data transfers
  • 975 output files
  • Total runtime
  • 96:49:47