The Kangaroo Approach to Data Movement on the Grid - PowerPoint PPT Presentation

About This Presentation
Title:

The Kangaroo Approach to Data Movement on the Grid

Description:

Douglas Thain, Jim Basney, Se-Chang Son, and Miron Livny, Condor Project, University of Wisconsin

Slides: 32
Provided by: Dougla215
Learn more at: https://www3.nd.edu
Transcript and Presenter's Notes

Title: The Kangaroo Approach to Data Movement on the Grid


1
The Kangaroo Approach to Data Movement on the Grid
  • Douglas Thain, Jim Basney,
  • Se-Chang Son, and Miron Livny
  • Condor Project
  • University of Wisconsin

2
  • The Grid is
  • BYOFS.

3
Bring Your Own File System: You can't depend on the host.
  • Problems of configuration
  • Execution sites do not necessarily have a
    distributed file system, or even a userid for
    you.
  • Problems of correctness
  • Networks go down, servers crash, disks fill,
    users forget to start servers
  • Problems of performance
  • Bandwidth and latency may fluctuate due to
    competition with other users, both local and
    remote.

4
Applications are Not Prepared to Handle These Errors
  • Examples
  • open(input) -> connection refused
  • write(file,buffer,length) -> wait ten minutes
  • close(file) -> couldn't write data
  • Applications respond by dumping core, exiting,
    producing incorrect results, or just running
    slowly.
  • Users respond with

5
Focus: Half-Interactive Jobs
  • Users want to submit batch jobs to the Grid, but
    still be able to monitor the output
    interactively.
  • But, network failures are expected as a matter of
    course, so keeping the job running takes priority
    over getting output.
  • Examples
  • Simulation of high-energy collider events. (CMS)
  • Simulation of molecular structures. (Gaussian)
  • Rendering of animated images. (Maya)

[Diagram labels: App, Unreliable Network]
6
The Kangaroo Approach To Data Movement
  • Make a third party responsible for executing each
    application's I/O operations.
  • Never return an error to the application.
  • (Maybe tell the user or scheduler.)
  • Use all available resources to hide latencies.
  • Benefit: Higher throughput, fault tolerance.
  • Cost: Weaker consistency.

7
Philosophical Musings
  • Two problems, one solution
  • Hiding errors: Retry, report the error to a third
    party, and use another resource to satisfy the
    request.
  • Hiding latencies: Use another resource to satisfy
    the request in the background, but if an error
    occurs, there is no channel to report it.

8
This is an Old Problem
  • Weak consistency guarantees. Scheduler chooses
    when and where.
  • Interface is a file system.
  • Application can request consistency operations:
    Is it done? Wait until done.
  • Data Mover Process: Relieves the application of
    the responsibility of collecting, scheduling, and
    retrying operations.
  • [Diagram: App, RAM Buffer, Data Mover Process,
    and three Disks]
9
Apply it to a New World
  • Provides weak consistency guarantees. Moves data
    according to network, buffer, and target
    availability.
  • Data Mover Process: Accepts the responsibility of
    moving data. App should never receive errors.
  • [Diagram: App, File System Interface, two RAM
    Buffers, Data Mover Process, and six Disks]
10
Our Vision: A Grid Data Movement System
  • [Diagram: seven nodes labeled K and a Disk]
11
Reality Check: Are we on the right track?
  • David Turek on Reliability
  • Be less like a lizard, and more like a human.
  • (Be self repairing.)
  • Peter Nugent on Weak Consistency
  • Datasets are written once. Recollection or
    recomputation results in a new file.
  • (No read/write or write/write issues.)
  • Miron Livny on the Grid Environment
  • The grid is constantly changing. Networks go up,
    and down, and machines come and go. Software
    must be agile.

12
Introducing Kangaroo
13
Kangaroo Prototype
  • We have built a first-try Kangaroo that validates
    the central ideas of hiding errors and latencies.
  • Emphasis on high-level reliability and
    throughput, not on low-level optimizations.
  • First, work to improve writes, but leave room in
    the design to improve reads.

14
Kangaroo Prototype
  • An application may contact any node in the system
    and perform partial-file reads and writes.
  • The node may then execute or buffer operations as
    conditions warrant.
  • [Diagram: App, three nodes labeled K, and a Disk]
15
The Kangaroo Protocol
  • Simple, easy to implement.
  • Same protocol is used between all participants.
  • Client -> Server
  • Server -> Server
  • Can be thought of as an indirect NFS.
  • Idempotent operations on a (host,file) name.
  • Servers need not remember state of clients.

16
The Kangaroo Protocol
  • Get( host, file, offset, length, data ) -> returns success/failure and data
  • Put( host, file, offset, length, data ) -> no response
  • Commit() -> returns success/failure
  • Push( host, file ) -> returns success/failure
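A minimal sketch of how Get and Put might behave as idempotent operations on a (host, file) name. Everything here is an assumption for illustration: the class ToyKangarooStore, the host name storage.example, and the in-memory dictionary are hypothetical and are not the Kangaroo implementation. The length argument of Put is taken from the data itself.

```python
from dataclasses import dataclass, field


@dataclass
class ToyKangarooStore:
    """Hypothetical in-memory stand-in for one node; not the real server."""
    files: dict = field(default_factory=dict)  # (host, file) -> bytearray

    def put(self, host, file, offset, data):
        # Put has no response in the protocol, so nothing is returned.
        buf = self.files.setdefault((host, file), bytearray())
        end = offset + len(data)
        if len(buf) < end:
            buf.extend(b"\0" * (end - len(buf)))  # zero-fill any gap
        buf[offset:end] = data

    def get(self, host, file, offset, length):
        # Get returns a success flag together with the requested bytes.
        buf = self.files.get((host, file))
        if buf is None:
            return False, b""
        return True, bytes(buf[offset:offset + length])


store = ToyKangarooStore()
store.put("storage.example", "/out.dat", 0, b"hello")
store.put("storage.example", "/out.dat", 5, b" grid")
store.put("storage.example", "/out.dat", 5, b" grid")  # replayed Put, same state
ok, data = store.get("storage.example", "/out.dat", 0, 10)
```

Because each Put names an explicit offset, replaying it after a lost connection leaves the file unchanged, which is what lets a server stay stateless about its clients.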
17
The Kangaroo Protocol
  • Writes do not return a result!
  • Why? A grid application has no reasonable
    response to possible errors
  • Connection lost
  • Out of space
  • Permission denied
  • The Kangaroo server becomes responsible for
    trying and retrying the write, whether it is an
    intermediate or ultimate destination.
  • If there is a brief resource shortage, the server
    may simply pause the incoming stream.
  • If there is a catastrophic error, the server may
    drop the connection -- the caller must roll back.

18
The Kangaroo Protocol
  • Two consistency operations
  • Commit
  • Block until all writes have been safely recorded
    in some stable storage.
  • App must do this before it exits.
  • Push
  • Block until all writes are delivered to their
    ultimate destinations.
  • App may do this to externally synchronize.
  • User may do this to discover if data movement is
    done.
  • Consistency guarantees
  • The end result is the same as an interactive
    system.
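One way to picture the difference between the two consistency operations is a pipeline with three places data can be. This is only an illustrative model: ToyWritePipeline and its three lists are hypothetical, and both operations always succeed here, whereas the protocol lets them report failure.

```python
class ToyWritePipeline:
    """Hypothetical model of where written data sits at each stage."""

    def __init__(self):
        self.in_memory = []       # accepted, not yet safe
        self.on_stable_disk = []  # safe in some stable storage, not delivered
        self.at_destination = []  # delivered to the ultimate destination

    def write(self, block):
        self.in_memory.append(block)

    def commit(self):
        # Commit: every write so far is recorded in stable storage.
        self.on_stable_disk.extend(self.in_memory)
        self.in_memory.clear()
        return True

    def push(self):
        # Push: every write so far reaches its ultimate destination.
        self.commit()
        self.at_destination.extend(self.on_stable_disk)
        self.on_stable_disk.clear()
        return True


pipeline = ToyWritePipeline()
pipeline.write(b"frame-1")
pipeline.write(b"frame-2")
pipeline.commit()
after_commit = (list(pipeline.on_stable_disk), list(pipeline.at_destination))
pipeline.push()
```

After commit() the data is safe but still local, so the application may exit; only after push() is it visible at the destination.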

19
User Interface
  • Although applications could write to the Kangaroo
    interface, we don't expect or require this.
  • An interposition agent is responsible for
    converting POSIX operations into the Kangaroo
    protocol.

  • [Diagram: App, Agent, and a node labeled K; the
    links are labeled POSIX and Kangaroo]
20
User Interface
  • Interposition agent built with Bypass.
  • A tool for trapping UNIX I/O operations and
    routing them through new code.
  • Works on any dynamically-linked, unmodified,
    unprivileged program.
  • Examples
  • vi /kangaroo/coral.cs.wisc.edu/etc/hosts
  • gcc /gsiftp/server/input.c -o
    /kangaroo/server/output.exe

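The example paths above suggest a naming convention of /kangaroo/<host>/<path on that host>. A minimal sketch of the mapping an agent might perform, assuming that convention holds; split_kangaroo_path is a hypothetical helper and does not show how Bypass itself traps I/O calls.

```python
def split_kangaroo_path(path):
    """Split '/kangaroo/<host>/<file path>' into (host, file).

    Returns None for paths outside the assumed /kangaroo/ namespace.
    """
    prefix = "/kangaroo/"
    if not path.startswith(prefix):
        return None
    host, sep, rest = path[len(prefix):].partition("/")
    if not host or not sep:
        return None  # no host or no file component
    return host, "/" + rest


vi_target = split_kangaroo_path("/kangaroo/coral.cs.wisc.edu/etc/hosts")
gcc_output = split_kangaroo_path("/kangaroo/server/output.exe")
local_file = split_kangaroo_path("/etc/hosts")
```

Paths that return None would be passed through to the ordinary local file system untouched.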
21
Performance Evaluation
  • Not a full-fledged file-system evaluation.
  • A proof-of-concept that shows latencies and
    errors can be hidden correctly.
  • Preview of results
  • As a data-copier, Kangaroo is reasonable.
  • Real benefit comes from the ability to overlap
    I/O and CPU.

22
Microbenchmark: File Transfer
  • Create a large output file at the execution site,
    and send it to a storage site.
  • Ideal conditions: No competition for CPU,
    network, or disk bandwidth.
  • Three methods
  • Stream output directly to target. (Online)
  • Stage output to disk. (Offline)
  • Kangaroo

23
(No Transcript)
24
Macrobenchmark: Image Processing
  • Post-processing of satellite image data: Need to
    compute various enhancements and produce output
    for each.
  • Read input image
  • For I = 1 to N
  • Compute transformation of image
  • Write output image
  • Example
  • Image size about 5 MB
  • Compute time about 6 sec
  • I/O-to-CPU ratio about 0.9 MB/s

25

26
I/O Models Compared
[Timeline diagram comparing three I/O models: Offline I/O, Online I/O, and Kangaroo. Each timeline shows INPUT, CPU, and OUTPUT phases with "CPU Released" and "Task Done" markers; the Kangaroo timeline also shows a PUSH.]
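A rough way to see why overlapping I/O and CPU helps is to compare a strictly sequential loop with a two-stage pipeline. This is a simplified model, not the measured benchmark: both functions are hypothetical, the input read is ignored, every step takes the same time, and the 4-second transfer time is an assumed value (only the 6-second compute time comes from the example above).

```python
def sequential_time(n_outputs, compute_s, transfer_s):
    """Each output is fully transferred before the next compute starts."""
    return n_outputs * (compute_s + transfer_s)


def overlapped_time(n_outputs, compute_s, transfer_s):
    """Each transfer runs in the background during the next compute.

    Two-stage pipeline: one compute to start, one transfer to finish,
    and the slower of the two stages for every step in between.
    """
    return compute_s + (n_outputs - 1) * max(compute_s, transfer_s) + transfer_s


# Assumed workload: 10 outputs, 6 s of compute each, 4 s to transfer each.
blocking = sequential_time(10, 6, 4)
overlapped = overlapped_time(10, 6, 4)
```

In this model the CPU is released after the last compute step, and only the final transfer remains to be drained by a push.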
27
Summary of Results
  • At the micro level, our prototype provides
    reliability with reasonable performance.
  • At the macro level, I/O overlap gives reliability
    and speedups (for some applications).
  • Kangaroo allows the application to survive on its
    real I/O needs: 0.91 MB/s. Without it, there is
    false pressure to provide fast networks.

28
Research Problems
  • Commit means "make data safe somewhere."
  • Greedy approach: Commit all dirty data here.
  • Lazy approach: Commit nothing until final
    delivery.
  • Solution must be somewhere in between.
  • Disk as Buffer, not as File System
  • Existing buffer implementation is clumsy and
    inefficient. Need to optimize for 1-write,
    1-read, 1-delete.
  • Fine-Grained Scheduling
  • Reads should have priority over writes. This is
    easy at one node, but multiple nodes?

29
Related Work
  • Some items neglected from the paper
  • HPSS data movement - Move data from RAM -> disk
    -> tape
  • Internet Backplane Protocol (IBP)
  • Passive storage building block.
  • Kangaroo could use IBP as underlying storage.
  • PUNCH virtual file system
  • Uses NFS as data protocol.
  • Uses indirection for implicit naming.

30
Conclusion
  • The Grid is BYOFS.
  • Error hiding and latency hiding are tightly-knit
    problems.
  • The solution to both is to make a third party
    responsible for I/O execution.
  • The benefits of high-level overlap can outweigh
    any low-level inefficiencies.

31
Contact Us
  • Douglas Thain
  • Thain@cs.wisc.edu
  • Miron Livny
  • Miron@cs.wisc.edu
  • Kangaroo software, papers, and more
  • http://www.cs.wisc.edu/condor/kangaroo
  • Condor in general
  • http://www.cs.wisc.edu/condor
  • Questions now?