Asynchronous Remote Execution (Transcript and Presenter's Notes)


1
Asynchronous Remote Execution
  • PhD Preliminary Examination
  • Douglas Thain
  • University of Wisconsin
  • 19 March 2002

2
Thesis
  • Asynchronous operations improve the throughput,
    resiliency, and scalability of remote execution
    systems.
  • However, asynchrony introduces new failure modes
    that must be carefully understood in order to
    preserve the illusion of synchronous operation.

3
Proposal
  • I propose to explore the coupling between
    asynchrony, failures, and performance in remote
    execution.
  • To accomplish this, I will modify an existing
    system and increase the available asynchrony in
    degrees.

4
Contributions
  • A measurement of the performance benefits through
    asynchrony.
  • A system design that accommodates asynchrony
    while tolerating a significant set of expected
    failure modes.
  • An exploration of the balance between
    performance, risk, and knowledge in a distributed
    system.

5
Outline
  • Introduction
  • A Case Study
  • Remote Execution
  • Related Work
  • Progress Made
  • Research Agenda
  • Conclusion

6
Science Needs Large-Scale Computing
  • Theory, Experiment, Computation
  • Nearly every field of scientific study has a
    grand challenge problem
  • Meteorology
  • Genetics
  • Astronomy
  • Physics

7
The Grid Vision
[Diagram: a grid linking security services, a tape archive, and several disk archives.]
8
The Grid Reality
  • Systems for managing CPUs: Condor, LSF, PBS
  • Programming interfaces: POSIX, Java, C, MPI, PVM, ...
  • Systems for managing data: SRB, HRM, SAM, ReqEx (DapMan)
  • Systems for storing data: NeST, IBP
  • Systems for moving data: GridFTP, HTTP, Kangaroo
  • Systems for remote authentication: SSL, GSI, Kerberos, NTSSPI

9
The Grid Reality
  • Host uptime
  • Median 15.92 days
  • Mean 5.53 days
  • Local maximum 1 day
  • Long et al., "A Longitudinal Survey of Internet Host
    Reliability," Symposium on Reliable Distributed
    Systems, 1995.
  • Wide-area connectivity
  • Approx. 1% chance of a 30-second interruption
  • Approx. 0.1% chance of a persistent outage
  • Chandra et al., "End-to-End WAN Service
    Availability," Proceedings of the 3rd USENIX Symposium
    on Internet Technologies and Systems, 2001.

10
[Diagram: a job amid security services, several disk archives, and several tape archives: the resources it must coordinate.]
11
Usual Approach: Hold and Wait
  • Request CPU, wait for success
  • Stage data to CPU, wait
  • Move executable to CPU, wait
  • Execute program, wait
  • Missing file? Stage data, wait
  • Stage output data, wait
  • Failure? Start over from scratch (sketched below)
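A minimal sketch of this hold-and-wait cycle; the step names and the randomly failing acquire are illustrative stand-ins for real, blocking allocations:

```python
import random

class RemoteFailure(Exception):
    pass

def acquire(step):
    # Stand-in for a blocking remote operation that sometimes fails.
    if random.random() < 0.2:
        raise RemoteFailure(step)

def run_job_synchronously():
    # Hold-and-wait: one step at a time; any failure restarts from scratch.
    steps = ["request CPU", "stage input", "move executable",
             "execute program", "stage output"]
    while True:
        try:
            for step in steps:
                acquire(step)
            return "complete"
        except RemoteFailure as failure:
            print("failure during", failure, "- starting over")

print(run_job_synchronously())
```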

12
Synchronous Systems are Inflexible
  • Poor utilization and throughput due to
    hold-and-wait.
  • CPU idle while disk busy.
  • Disk idle while CPU busy.
  • Disk full? CPU stops.
  • System sensitive to failures of both performance
    and correctness.
  • Network down? Everything stops.
  • Network slow? Everything slows.
  • Credentials lost? Everything aborts.

13
Resiliency Requires Flexibility
  • Most jobs have weak couplings between all of
    their components.
  • Asynchrony = time decoupling:
  • Can't have the network now? OK, use disk.
  • Can't have the CPU now? OK, checkpoint.
  • Can't store data now? OK, recompute later.
  • Time decoupling → space decoupling (a fallback sketch follows)
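As a sketch, this decoupling amounts to a fallback chain; the handlers here are hypothetical stand-ins for the real network, disk, and recomputation paths:

```python
def dispose_of_output(data, handlers):
    # Time decoupling: try the preferred resource now; if it is
    # unavailable, fall back to another and reconcile later.
    for name, handler in handlers:
        if handler(data):
            return name
    raise RuntimeError("no disposition possible; hold and retry later")

handlers = [
    ("network",         lambda d: False),  # network down right now
    ("local disk",      lambda d: True),   # buffer to disk instead
    ("recompute later", lambda d: True),   # or remember how to remake it
]
print(dispose_of_output(b"results", handlers))   # -> "local disk"
```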

14
Computing's Central Challenge
  • "How not to make a mess of it."
  • - Edsger Dijkstra, CACM, March 2001.

How can we harness the advantages of asynchrony
while maintaining a coherent and reliable user
experience?
15
Outline
  • Introduction
  • A Case Study The Distributed Buffer Cache
  • Remote Execution
  • Related Work
  • Progress Made
  • Research Agenda
  • Conclusion

16
Case Study: The Distributed Buffer Cache
  • The Kangaroo distributed buffer cache introduces
    asynchronous I/O for remote execution.
  • It offers improved job throughput and failure
    resiliency at the price of increased latency in
    I/O arrival.
  • A small mess: jobs and I/O are not recoupled at
    completion time.

17
Kangaroo Prototype
An application may contact any node in the system
and perform partial-file reads and writes.
The node may then execute or buffer operations as
conditions warrant.
[Diagram: an application sends I/O to a chain of Kangaroo (K) nodes, each of which may buffer operations to local disk.]
A consistency protocol ensures no loss of data
due to crash/disconnect.
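A minimal in-memory sketch of the write-behind idea, assuming a single mover thread and a deliver callback; real Kangaroo spools to disk and runs the consistency protocol mentioned above:

```python
import queue
import threading

class WriteBehindBuffer:
    """In-memory sketch of Kangaroo-style write-behind; the real system
    spools to disk and survives crashes via its consistency protocol."""

    def __init__(self, deliver):
        self.q = queue.Queue()
        self.deliver = deliver      # callback that sends one operation
        threading.Thread(target=self._mover, daemon=True).start()

    def write(self, path, offset, data):
        self.q.put((path, offset, data))    # returns immediately

    def _mover(self):
        while True:
            op = self.q.get()       # forward whenever conditions allow
            self.deliver(*op)
            self.q.task_done()

    def push(self):
        self.q.join()               # block until every write is delivered

buf = WriteBehindBuffer(deliver=lambda p, o, d: print("sent", p, o, len(d)))
buf.write("/out/image1", 0, b"pixels...")
buf.push()   # the hold-and-wait barrier that slide 22 revisits
```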
18
Distributed Buffer Cache
[Diagram: Kangaroo (K) nodes across many hosts form one distributed buffer cache backed by disk.]
19
Macrobenchmark: Image Processing
  • Post-processing of satellite image data: compute
    various enhancements and produce an output for each.
  • Read input image
  • For i = 1 to N
  • Compute transformation of image
  • Write output image
  • Example:
  • Image size: about 5 MB
  • Compute time: about 6 sec
  • I/O-to-CPU ratio: 0.91 MB/s

20
I/O Models for Image Processing
[Diagram: execution timelines of CPU, input, and output phases under three models: offline staging I/O, online streaming I/O, and Kangaroo.]
21

22
A Small Mess
  • The output will make it back eventually, barring
    the removal of a disk.
  • But, what if...
  • ...we need to know when it arrives?
  • ...the data should be cancelled?
  • ...it never arrives?
  • There is a hold-and-wait operation (push), but
    this defeats much of the purpose.
  • The job result needs to be a function of both the
    compute and data results.

23
Lesson
  • We may decouple CPU and I/O consumption for
    improved throughput.
  • But, CPU and I/O must be semantically coupled at
    both dispatch and completion in order to provide
    useful semantics.
  • Not necessary in a monolithic system
  • All components fail at once.
  • Integration of CPU and I/O management (fsync)

24
Outline
  • Introduction
  • A Case Study
  • Remote Execution
  • Synchronous Execution
  • Asynchronous Execution
  • Failures, Transparency, and Performance
  • Related Work
  • Progress Made
  • Research Agenda
  • Conclusion

25
Remote Execution
  • Remote execution is the problem of running a job
    in a distributed system.
  • A job is a request to consume a set of resources
    in a coordinated way:
  • Abstract: programs, files, licenses, users
  • Concrete: CPUs, storage, servers, terminals
  • A distributed system is a computing environment
    that is:
  • Composed of autonomous units.
  • Subject to uncoordinated failure.
  • Subject to high performance variability.

26
About Jobs
  • Job policy dictates which resources are acceptable
    to consume (a predicate sketch follows this list):
  • CPU must be a SPARC.
  • Must have > 128 MB of memory.
  • Must be within 100 ms of a disk server.
  • CPU must be owned by a trusted authority.
  • Input data set X may come from any trusted
    replication site.
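One way to read such a policy is as a predicate over candidate resources, in the spirit of ClassAd requirements; the attribute names below are hypothetical, not Condor's actual syntax:

```python
def acceptable(machine, trusted_owners, trusted_replicas):
    # A job policy as a predicate: the machine is acceptable only if
    # every constraint holds (attribute names are illustrative).
    return (machine["arch"] == "SPARC"
            and machine["memory_mb"] > 128
            and machine["disk_rtt_ms"] < 100
            and machine["owner"] in trusted_owners
            and bool(trusted_replicas))   # some trusted copy of X exists

m = {"arch": "SPARC", "memory_mb": 256,
     "disk_rtt_ms": 40, "owner": "cs.wisc.edu"}
print(acceptable(m, trusted_owners={"cs.wisc.edu"},
                 trusted_replicas={"replica1"}))   # -> True
```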

27
About Jobs
  • The components of a job have flexible temporal
    and spatial requirements.

[Diagram: a job's components: input device, output device, CPU, input data (read), output data (write), program image, license, and credentials, annotated with temporal requirements such as "present at startup," "present throughout," and "interactive preferred."]
28
Expected Jobs
  • In this work, I will concentrate on a limited
    class of jobs:
  • Executable image
  • Single CPU request
  • May checkpoint/restart to manage CPU.
  • Input data (online/offline)
  • Output data (multiple targets)

29
Expected Systems
  • High latency
  • I/O operations take milliseconds to seconds.
  • Process dispatch takes seconds to minutes.
  • Performance variation
  • TCP hiccups cause outages of seconds to minutes.
  • By day the network is congested; by night it is free.
  • Uncoordinated failure
  • File system fails, CPU continues to run.
  • Network fails, but endpoints continue.
  • Autonomy
  • Users reclaim workstation CPUs.
  • Best-effort storage is reclaimed.

30
Expected Users
  • A wide variety of users will have varying degrees
    of policy aggression.
  • Scientific computation
  • Maximize long-term utilization/throughput.
  • Scientific instrument
  • Minimize use of one device.
  • Disaster response
  • Compute this ASAP at any expense!
  • Graduate student
  • Finish job before mid-life crisis.

31
The Synchronous Approach
  • Grab one resource at a time as they become
    necessary and available.
  • Assume any other resources are immediately
    available online.
  • Start with the resource with the most contention.
  • Examples
  • Condor distributed batch system
  • Fermi Sequential Access Manager (SAM)

32
The Condor Approach
[Diagram: a match maker pairs the job (input needs, CPU needs) with one of many CPUs; results return to online storage.]
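A toy sketch of the matchmaking step, with each side's requirements reduced to Python predicates (the ad fields are illustrative, not Condor's actual ClassAd language):

```python
def match(job_ads, machine_ads):
    # Pair each job ad with the first machine ad such that each side's
    # requirements accept the other, then remove that machine from the pool.
    pairs = []
    for job in job_ads:
        for machine in machine_ads:
            if job["requirements"](machine) and machine["requirements"](job):
                pairs.append((job["name"], machine["name"]))
                machine_ads.remove(machine)
                break
    return pairs

jobs = [{"name": "job1", "requirements": lambda m: m["memory_mb"] > 128}]
machines = [{"name": "vulture13", "memory_mb": 256,
             "requirements": lambda j: True}]
print(match(jobs, machines))   # -> [('job1', 'vulture13')]
```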
33
The SAM Approach
[Diagram: the job (input needs, CPU needs) runs on one of many CPUs; input is staged from a tape archive to temp disk, and results flow back.]
34
Problems
  • What if one resource is not obviously (or
    consistently) the constraint?
  • What is the expense of holding one resource idle
    while waiting for another?
  • What if no single resource is under your absolute
    control?
  • What if all your resource requirements cannot be
    stated offline?
  • How can we deal with failures without starting
    everything again from scratch?

35
Asynchronous Execution
  • Recognize when a job has loose synchronization
    requirements.
  • Seek parallelism where available.
  • Synchronize parallel activities at necessary
    joining points.
  • Allow idle resources to be released and
    re-allocated for use by others.
  • Consider failures in execution as allocation
    problems.
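A sketch of that pattern using a thread pool, with a stand-in acquire in place of real allocations; the three resources are sought in parallel and joined only at the dispatch point:

```python
import concurrent.futures as cf

def acquire(name):
    return name    # stand-in for a real, slow allocation

with cf.ThreadPoolExecutor() as pool:
    cpu = pool.submit(acquire, "CPU")
    data = pool.submit(acquire, "input data")
    prog = pool.submit(acquire, "executable")
    # Joining point: execution needs all three, but each was sought
    # in parallel, and any could be released and re-sought on failure.
    ready = [f.result() for f in (cpu, data, prog)]
    print("dispatch with", ready)
```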

36
[Diagram: a job's loosely coupled components: a CPU request, a program image (exec), input data (read), output data (write), and an exit code (exit).]
37
The Benefits of Asynchrony
  • Better utilization of disjoint resources →
    higher throughput.
  • More resilient to performance variations.
  • Less expensive recovery from partial system
    failures.

38
The Price of Asynchrony
  • Complexity
  • Many new boundary cases to cover.
  • Is the complexity worth the trouble?
  • Risk
  • Without appropriate policies, we may
  • Oversubscribe (internal fragmentation)
  • Undersubscribe (external fragmentation)

39
The Problem of Closing the Loop
[Diagram: the loop from job submission to job completion.]
40
Synchronous I/O
[Timeline: synchronous I/O. The CPU is busy, then idle from I/O dispatch until the I/O result returns, then busy again until the program result.]
41
Asynchronous Open-Loop I/O
[Timeline: asynchronous open-loop I/O. The CPU stays busy past I/O dispatch and delivers the program result while the I/O is still busy; the I/O result and its validation arrive later, with nothing waiting on them.]
42
Asynchronous Closed-Loop I/O
[Timeline: asynchronous closed-loop I/O. As above, the CPU stays busy past I/O dispatch, but the job result is issued only after both the program result and the validated I/O result are in hand.]
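A sketch of the closed loop, assuming stand-in callables for program execution and I/O validation; the timeout marks exactly where policy must decide between waiting longer and re-running:

```python
import concurrent.futures as cf

def run_job_closed_loop(execute, output_committed, wait_s):
    # Closed loop: the job result is a function of BOTH the program
    # result and the validated I/O result.
    with cf.ThreadPoolExecutor() as pool:
        exit_code = pool.submit(execute).result()   # program result
        io_future = pool.submit(output_committed)   # I/O validation
        try:
            io_ok = io_future.result(timeout=wait_s)
        except cf.TimeoutError:
            return "unknown"   # policy must choose: keep waiting or re-run
    return "success" if exit_code == 0 and io_ok else "failure"

print(run_job_closed_loop(lambda: 0, lambda: True, wait_s=5.0))
```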
43
Outline
  • Abstract
  • Introduction
  • Remote Execution
  • Related Work
  • Progress Made
  • Research Agenda
  • Conclusion

44
Related Work
  • Many components of grid computing
  • CPUs, storage, networks...
  • Many traditional research areas
  • scheduling, file systems, virtual memory...
  • What systems seek parallelism in operations that
    would appear to be atomic?
  • What systems exchange one resource for another?
  • How do they deal with failures?

45
Computer Architecture
  • These two are remarkably similar:
  • sort -n < infile > outfile
  • ADD r1, r2, r3
  • Each has multiple parts with a loose coupling in
    time and space:
  • idle → working → done → committed
  • A failure or an unsuccessful speculation must
    roll back dependent parts.

46
Trading Storage for Communication
  • Immediate closure: synchronous I/O
  • Bounded closure: GASS, AFS, transactions
  • Indeterminate closure: UNIX buffer cache,
    imprecise exceptions
  • Human closure: Coda → failures invoke email

47
Trading Computation for Communication
  • Time Warp simulation model
  • All nodes checkpoint frequently.
  • All messages are one-way, without synchronization.
  • Missed a message? Roll back and send out
    anti-messages to undo earlier work.
  • Problems:
  • When can I throw out a checkpoint?
  • Can't tolerate message failure.
  • Virtual Data Grid
  • Data sets have a functional specification.
  • Transfer here, or recompute?
  • Decide at run-time using cost/benefit.

48
Outline
  • Introduction
  • Case Study
  • Remote Execution
  • Related Work
  • Progress Made
  • Research Agenda
  • Conclusion

49
Progress Made
  • We have already laid much of the research
    foundation necessary to explore asynchronous
    remote execution.
  • Deployable software
  • Bypass, Kangaroo, NeST
  • Organizing Concepts
  • Distributed buffer cache
  • I/O communities
  • Error management theory

50
Interposition Agents
[Diagram: layered stack: application, interposition agent, standard library, kernel.]
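A toy sketch of the interposition idea in Python, wrapping the built-in open so an agent sees every call first; Bypass does the equivalent for C programs at the linker level, and the logging here is purely illustrative:

```python
import builtins

real_open = builtins.open

def agent_open(path, *args, **kwargs):
    # The agent observes every operation first; here it only logs, but it
    # could just as well redirect the path to a remote I/O server.
    print("agent intercepted open:", path)
    return real_open(path, *args, **kwargs)

# Unmodified code above this layer now reaches the kernel via the agent.
builtins.open = agent_open
```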
51
The Grid Console
[Diagram: the Grid Console: a half-interactive process reached over an unreliable network.]
52
I/O Communities
[Diagram: I/O communities: two Condor pools, each with a NeST storage server; a job in one community reaches its data in the other.]
53
References in ClassAds
[Diagram: the job ad refers to NearestStorage; the machine ad knows where NearestStorage is; the match maker joins the job ad, machine ad, and storage ad, binding job, machine, and NeST.]
54
Distributed Buffer Cache
[Diagram, repeated from slide 18: Kangaroo (K) nodes form one distributed buffer cache backed by disk.]
55
Error Management
  • In preparation: "Error Scope on a Computational
    Grid: Theory and Practice"
  • An environment for Java in Condor.
  • How do we understand the significance of the many
    things that may go wrong?
  • Every scope must have a handler.
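A sketch of the scope idea, with two hypothetical error classes standing in for a real taxonomy; every scope has a handler that can actually act on the error:

```python
class JobScopeError(Exception):
    """E.g. 'file not found': the job itself must cope (as an errno)."""

class SystemScopeError(Exception):
    """E.g. 'server crashed': the surrounding system must cope."""

def perform(operation):
    # Narrow errors become errno values returned to the program;
    # wider errors trigger rescheduling by the agent or scheduler.
    try:
        return ("ok", operation())
    except JobScopeError as e:
        return ("errno to job", str(e))
    except SystemScopeError as e:
        return ("reschedule elsewhere", str(e))

def missing_file():
    raise JobScopeError("ENOENT")

print(perform(missing_file))   # -> ('errno to job', 'ENOENT')
```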

56
Publications
  • Douglas Thain and Miron Livny, "Error Scope on a
    Computational Grid," in preparation.
  • Douglas Thain, John Bent, Andrea Arpaci-Dusseau,
    Remzi Arpaci-Dusseau, and Miron Livny,
    "Gathering at the Well: Creating Communities for
    Grid I/O," in Proceedings of Supercomputing 2001,
    Denver, Colorado, November 2001.
  • Douglas Thain, Jim Basney, Se-Chang Son, and
    Miron Livny, "The Kangaroo Approach to Data
    Movement on the Grid," in Proceedings of the
    Tenth IEEE Symposium on High Performance
    Distributed Computing (HPDC10), San Francisco,
    California, August 7-9, 2001, pp. 325-333.
  • Douglas Thain and Miron Livny, "Multiple Bypass:
    Interposition Agents for Distributed Computing,"
    Journal of Cluster Computing, Volume 4, pp. 39-47,
    2001.
  • Douglas Thain and Miron Livny, "Bypass: A Tool
    for Building Split Execution Systems," in
    Proceedings of the Ninth IEEE Symposium on High
    Performance Distributed Computing (HPDC9),
    Pittsburgh, Pennsylvania, August 1-4, 2000, pp.
    79-85.

57
Outline
  • Introduction
  • A Case Study
  • Remote Execution
  • Related Work
  • Progress Made
  • Research Agenda
  • Conclusion

58
Research Agenda
  • I propose to create an end-to-end structure for
    asynchronous remote execution.
  • To accomplish this, I will take an existing
    remote execution system, and increase the
    asynchrony by degrees.
  • The focus will be mechanisms, not policies.
  • Suggest points where policies must be attached.
  • Use simple policies to demonstrate use.
  • Mechanisms must be correct regardless of policy
    choices.

59
Research Environment
  • The Condor distributed batch system.
  • Local resources
  • Test Pool - approx 20 workstations.
  • Main Pool - approx 1000 machines.
  • Possible to deploy significant changes to all
    participating software.
  • Remote Resources
  • INFN Bologna - approx 300 workstations.
  • Other pools as necessary.
  • Can only deploy changes within the context of an
    interposition agent.

60
Stage One: Asynchronous Output
  • Timeline: March-May 2002
  • Goal
  • Decouple CPU allocation from output data
    movement.
  • Method
  • Couple Kangaroo with Condor and close the loop.
  • The job requires a new state, "waiting for output."
  • Policy
  • How long should a job remain in "waiting for
    output" before it is re-run? (Sketched below.)
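A sketch of the proposed completion path, with output_committed standing in for Kangaroo's acknowledgment; the wait limit is exactly the policy knob this stage raises, and its default here is illustrative:

```python
import time

def finish_job(output_committed, wait_limit_s=3600.0, poll_s=1.0):
    # New lifecycle state: "waiting for output". How long to stay in it
    # before re-running the job is a policy decision, not a mechanism.
    deadline = time.monotonic() + wait_limit_s
    while time.monotonic() < deadline:
        if output_committed():
            return "completed"
        time.sleep(poll_s)
    return "re-run"   # output never arrived within policy: run again

print(finish_job(lambda: True))   # -> "completed"
```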

61
Stage Two: Asynchronous Input
  • Timeline: June-August 2002
  • Goal
  • Decouple CPU allocation from input data movement.
  • Method
  • Modify scheduler to be aware of I/O communities
    and seek CPU and I/O allocations independently.
  • Unexpected I/O needs may use checkpointing to
    release CPU allocations.
  • Policy
  • How long to hold idle before timeout?
  • How to estimate queueing time for each resource?

62
Stage Three: Disconnected Operation
  • Timeline: September-December 2002
  • Goal
  • Permit the application to execute without any
    run-time dependence on the submitter.
  • Method
  • Release the umbilical once policy is set.
  • The job requires a new state, "presumed alive."
  • Unexpected policy needs may require reconnection.
  • Policy
  • How much autonomy may be delegated to the
    interposition agent? (performance/control)

63
Stage Four: Dissertation
  • Timeline: January-May 2003
  • Design
  • What algorithms and data structures are
    necessary?
  • Performance
  • What are the quantitative costs/benefits?
  • Discussion
  • What are the tradeoffs between performance, risk,
    and knowledge?
  • What are the implications of designing for
    fault-tolerance?

64
Evaluation Criteria
  • Correctness
  • The system must meet its interface obligations.
  • Reliability
  • Satisfy the user with high probability.
  • Throughput
  • Improve by avoiding hold-and-wait.
  • Latency
  • A modest increase is ok for batch workloads.
  • Knowledge
  • Has my job finished?
  • How much has it consumed?
  • Complexity

65
Contributions
  • Short-term
  • Improvement of the Condor software.
  • Not the goal, but a necessary validation
  • Medium-term
  • Serve as a design resource for grid computing.
  • Key concepts such as closing the loop and
    matching interfaces.
  • Long-term
  • Serve as a basis for further research.

66
Further Work
  • Should jobs move to data or vice versa?
  • Let's try both!
  • Many opportunities for speculation
  • Potentially stale data in file cache? Keep going.
  • Partial program completion is useful
  • DAGMan: dispatch a process based on exit code,
    dispatch another based on output data.
  • What if we change the API?
  • A drastic step, but... MPI, PVM, MW, Linda.
  • Can we admit subprogram failure and maintain a
    usable interface?

67
Outline
  • Introduction
  • A Case Study
  • Remote Execution
  • Related Work
  • Progress Made
  • Research Agenda
  • Conclusion

68
Conclusion
  • Large-grained asynchrony has yet to be explored
    in the context of remote program execution.
  • Asynchrony has benefits, but requires careful
    management of failure modes.
  • This dissertation will contribute a system design
    and an exploration of performance, risk, and
    knowledge in a distributed system.

69
Extra Slides
70
[Diagram: split execution. At the execution site, the starter forks the JVM, which runs the job against an I/O library and an I/O proxy speaking local I/O (Chirp) and local system calls; secure remote I/O connects the I/O proxy and shadow to the I/O server and home file system at the submission site.]
71
Explicit descriptions of ordering, reliability,
performance, and availability. (POSIX, MPI, PVM,
MW)
Running Program
Application Interface
Interposition Agent
Remote Resource Interfaces
Few guarantees on performance, availability, and
reliability.
Disk
CPU
RAM
Network
72
[Diagram: a supervisor process oversees the running program, whose interposition agent sits atop disk, CPU, RAM, and network.]
  • In the event of a failure:
  • Retry: hold the CPU allocation and try again.
  • Checkpoint: release the CPU, with some restart
    condition.
  • Abort: expose the failure to a supervisor.

73
The Cost of Hiding Failures
  • Each technique is valid from the standpoint of
    API conformity.
  • What to use? Depends on cost:
  • Retry: holds the CPU idle while retrying the device.
  • Checkpoint: consumes disk and network, but
    releases the CPU.
  • Abort: no expense up front, but resources must be
    re-consumed when ready to roll forward.
  • User policy is vital in determining what costs
    are acceptable for hiding failures (sketched below).
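A sketch of that cost comparison as a policy-driven choice; the policy knobs, the outage estimate, and the thresholds are all illustrative:

```python
def handle_failure(policy, outage_estimate_s, checkpoint_cost_s):
    # Pick the cheapest failure-hiding technique the policy allows.
    if outage_estimate_s < policy["max_idle_s"]:
        return "retry"         # hold the CPU and try the device again
    if checkpoint_cost_s < policy["max_ckpt_s"]:
        return "checkpoint"    # spend disk/network to release the CPU
    return "abort"             # expose the failure to the supervisor

policy = {"max_idle_s": 30, "max_ckpt_s": 300}
print(handle_failure(policy, outage_estimate_s=120, checkpoint_cost_s=60))
# -> "checkpoint"
```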

74
[Diagram: the running program's interposition agent, atop disk, CPU, RAM, and network, consults a policy director: "What file should I open? How long should I try? May I checkpoint now? Where should I store the checkpoint image? Should I stage or stream the output? FYI, I've used 395 service units here. FYI, I'm about to be evicted from this site."]
75
Disconnected Operation
  • The policy manager is also an execution resource
    that is occasionally slow or unavailable.
  • Holding a resource idle while waiting
    indefinitely for policy direction is still
    wait-while-idle.
  • A higher degree of asynchrony can be achieved
    through disconnected operation.
  • Requires each autonomous unit be given an
    allowance.