Stork: State of the Art - PowerPoint PPT Presentation
Transcript and Presenter's Notes

1
Stork: State of the Art
  • Tevfik Kosar
  • Computer Sciences Department
  • University of Wisconsin-Madison
  • kosar@cs.wisc.edu
  • http://www.cs.wisc.edu/condor/stork

2
The Imminent Data Deluge
  • Exponential growth of scientific data:
  • 2000: 0.5 Petabytes
  • 2005: 10 Petabytes
  • 2010: 100 Petabytes
  • 2015: 1,000 Petabytes
  • "I am terrified by terabytes."
  • -- Anonymous
  • "I am petrified by petabytes."
  • -- Jim Gray
  • Moore's Law is outpaced by the growth of
    scientific data!

3
[Figure: annual data volumes of major science projects, ranging from 20 TB - 1 PB/year through 500 TB/year and 2-3 PB/year up to 11 PB/year]
4
How to access and process distributed data?
[Figure: terabyte- and petabyte-scale data sets spread across distributed sites]
5
I/O Management in History
6
I/O Management in History
7
I/O Management in History
8
I/O Management in History
[Figure labels: DISTRIBUTED SYSTEMS LEVEL, I/O SUBSYSTEM]
9
[Figure: each individual job is wrapped by data-placement steps:
allocate space for input and output data, get input, run JOB j,
release input space, put output, release output space]
Individual Jobs
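The allocate/get/run/put/release cycle shown on this slide can be sketched in Python. All names below are illustrative stand-ins, not part of the Stork API; in Stork each step is a separate data-placement job in the DAG rather than a function call:

```python
# Sketch of the data flow wrapped around each individual job:
# allocate space -> get input -> run job -> put output -> release space.
# The storage set, get/compute/put callables, and job ids are all
# hypothetical; they only illustrate the ordering of the steps.

def run_with_placement(job_id, storage, get, compute, put):
    """Wrap a compute job with explicit data-placement steps."""
    storage.add(("in", job_id))       # allocate space for input data
    storage.add(("out", job_id))      # allocate space for output data
    data = get(job_id)                # "get": stage input to the execute site
    result = compute(data)            # run the actual JOB j
    storage.discard(("in", job_id))   # release input space
    put(job_id, result)               # "put": stage output back
    storage.discard(("out", job_id))  # release output space
    return result

if __name__ == "__main__":
    store = set()
    out = run_with_placement(
        "j1", store,
        get=lambda j: [1, 2, 3],
        compute=sum,
        put=lambda j, r: None,
    )
    print(out, store)   # 6 set()
```

The point of the wrapping is that space is held only while the job actually needs it, which is what lets a scheduler reason about storage as a first-class resource.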
10
Separation of Jobs
  Data A A.stork
  Data B B.stork
  Job C C.condor
  ..
  Parent A child B
  Parent B child C
  Parent C child D, E
  ..
DAG specification
11
Stork Data Placement Scheduler
  • First scheduler specialized for data
    movement/placement.
  • De-couples data placement from computation.
  • Understands the characteristics and semantics of
    data placement jobs.
  • Can make smart scheduling decisions for reliable
    and efficient data placement.
  • A prototype is already implemented and deployed
    at several sites.
  • Now distributed with the Condor Developers
    Release v6.7.6.

http://www.cs.wisc.edu/condor/stork
12
Support for Heterogeneity
ICDCS04
  • Provides uniform access to different data storage
    systems and transfer protocols.
  • Acts as an IOCS for distributed systems.
  • Multilevel Policy Support

[Figure: protocol translation using the Stork Disk Cache or the Stork Memory Buffer]
13
Dynamic Protocol Selection
ICDCS04
  • dap_type = transfer
  • src_url = drouter://slic04.sdsc.edu/tmp/test.dat
  • dest_url = drouter://quest2.ncsa.uiuc.edu/tmp/test.dat
  • alt_protocols = gsiftp-gsiftp, nest-nest
  • or
  • src_url = any://slic04.sdsc.edu/tmp/test.dat
  • dest_url = any://quest2.ncsa.uiuc.edu/tmp/test.dat

Traditional Scheduler: 48 Mb/s. Using Stork: 72 Mb/s.
[Plot annotations: DiskRouter crashes, DiskRouter resumes]
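The alt_protocols mechanism amounts to ordered failover: try the preferred protocol first and fall back to the alternatives when it fails. A minimal sketch, with protocol names taken from the slide and the transfer callables as hypothetical stand-ins:

```python
# Ordered protocol failover in the spirit of Stork's alt_protocols:
# try each transfer protocol in turn until one succeeds. The transfer
# functions here are hypothetical stand-ins, not real clients.

def transfer_with_fallback(src, dest, protocols):
    """protocols: list of (name, transfer_fn) pairs, tried in order."""
    errors = {}
    for name, fn in protocols:
        try:
            fn(src, dest)
            return name               # first protocol that succeeds
        except OSError as exc:
            errors[name] = exc        # remember the failure, try the next
    raise RuntimeError(f"all protocols failed: {errors}")

def drouter(src, dest):               # simulate the DiskRouter crash
    raise OSError("DiskRouter unavailable")

def gsiftp(src, dest):                # simulate a successful transfer
    pass

if __name__ == "__main__":
    used = transfer_with_fallback(
        "slic04.sdsc.edu/tmp/test.dat",
        "quest2.ncsa.uiuc.edu/tmp/test.dat",
        [("drouter", drouter), ("gsiftp", gsiftp)],
    )
    print(used)   # gsiftp
```

This is why the throughput plot recovers after the DiskRouter crash: the transfer continues over the alternative protocol instead of failing outright.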
14
Run-time Auto-tuning
AGridM03
Traditional Scheduler (without tuning): 0.5 MB/s. Using Stork (with tuning): 10 MB/s.
  • link = slic04.sdsc.edu - quest2.ncsa.uiuc.edu
  • protocol = gsiftp
  • bs = 1024KB // I/O block size
  • tcp_bs = 1024KB // TCP buffer size
  • p = 4 // number of parallel streams
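A common heuristic behind this kind of tuning (a generic sketch, not necessarily Stork's actual algorithm) is to size the aggregate TCP buffer near the bandwidth-delay product of the path and split it across the parallel streams:

```python
# Bandwidth-delay-product heuristic for transfer tuning: size the
# aggregate TCP buffer to bandwidth * RTT, split across parallel
# streams. This is an illustrative sketch, not Stork's tuner; the
# function name and example path figures are assumptions.

def tune(bandwidth_mbit_s, rtt_ms, streams):
    bdp_bytes = bandwidth_mbit_s * 1e6 / 8 * (rtt_ms / 1000)
    per_stream_kb = int(bdp_bytes / streams / 1024)
    return {"p": streams, "tcp_bs": f"{per_stream_kb}KB"}

if __name__ == "__main__":
    # e.g. a 622 Mbit/s WAN path with 55 ms RTT and 4 parallel streams
    print(tune(622, 55, 4))   # {'p': 4, 'tcp_bs': '1044KB'}
```

For a path of roughly that capacity, the heuristic lands near the 1024KB-per-stream value shown on the slide, which is why per-link tuning pays off so dramatically over an untuned default buffer.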

15
Controlling Throughput
Europar04
[Figure panels: Wide Area, Local Area]
  • Increasing concurrency/parallelism does not
    always increase the transfer rate
  • The effect on local-area and wide-area transfers
    is different
  • Concurrency and parallelism have slightly
    different impacts on transfer rate

16
Controlling CPU Utilization
Europar04
[Figure panels: Client, Server]
  • Concurrency and parallelism have totally
    opposite impacts on CPU utilization at the server
    side.

17
Detecting and Classifying Failures
Grid04
[Figure: failure-detection decision chain; S = success, F = failure at each check]
  • Check DNS Server, on failure: DNS server error (transient)
  • Check DNS, on failure: no DNS entry (permanent)
  • Check Network, on failure: network outage (transient)
  • Check Host, on failure: host down (transient)
  • Check Protocol, on failure: protocol unavailable (transient)
  • Check Credentials, on failure: not authenticated (permanent)
  • Check File, on failure: source file does not exist (permanent)
  • Test Transfer, on failure: transfer failed
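The decision chain above can be sketched as an ordered list of checks, each mapping a failure to a transient (retry) or permanent (give up) class. The check names and probe interface below are hypothetical stand-ins for real network/host/credential probes:

```python
# Sequential failure classification following the slide's decision
# chain: run the checks in order; the first failing check names the
# error and whether it is transient or permanent. The check names are
# illustrative, not Stork's actual probe functions.

CHECKS = [
    ("check_dns_server",  "DNS server error",           "transient"),
    ("check_dns_entry",   "no DNS entry",               "permanent"),
    ("check_network",     "network outage",             "transient"),
    ("check_host",        "host down",                  "transient"),
    ("check_protocol",    "protocol unavailable",       "transient"),
    ("check_credentials", "not authenticated",          "permanent"),
    ("check_file",        "source file does not exist", "permanent"),
]

def classify_failure(probe_results):
    """probe_results: dict mapping check name -> bool (True = passed)."""
    for name, error, kind in CHECKS:
        if not probe_results.get(name, True):
            return error, kind
    return "transfer failed", "unknown"   # every individual check passed

if __name__ == "__main__":
    print(classify_failure({"check_network": False}))
    # ('network outage', 'transient')
```

Ordering matters: an earlier, more fundamental failure (no DNS) masks later ones, so the scheduler reports the root cause and can decide between retrying and giving up.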
18
Detecting Hanging Transfers
Cluster04
  • Collect job execution time statistics
  • Fit a distribution
  • Detect and avoid:
  • black holes
  • hanging transfers
  • E.g., for a normal distribution, 99.7% of job
    execution times should lie within
    (avg - 3*stdev, avg + 3*stdev)
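The three-sigma rule above can be applied directly to collected transfer times. A minimal sketch, assuming past durations are roughly normally distributed (the function name and sample data are illustrative):

```python
# Flag hanging transfers with the slide's three-sigma rule: fit a
# normal distribution to past execution times and treat a transfer
# running longer than avg + 3*stdev as a likely hang or black hole.
import statistics

def is_hanging(history, elapsed):
    """history: past execution times; elapsed: current job's runtime."""
    avg = statistics.mean(history)
    stdev = statistics.stdev(history)
    # 99.7% of normally distributed samples lie below avg + 3*stdev
    return elapsed > avg + 3 * stdev

if __name__ == "__main__":
    past = [100, 105, 98, 102, 99, 101, 103, 97]   # seconds
    print(is_hanging(past, 101))   # False: within the normal range
    print(is_hanging(past, 600))   # True: likely a hanging transfer
```

A flagged transfer can then be killed and rescheduled, possibly on a different route or protocol, rather than blocking the pipeline indefinitely.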

19
Stork can also
  • Allocate/de-allocate (optical) network links
  • Allocate/de-allocate storage space
  • Register/un-register files with a Metadata Catalog
  • Locate the physical location of a logical file name
  • Control concurrency levels on storage servers
  • See [ICDCS'04], [JPDC'05], [AGridM'03]

20
  • Applying Stork to Real-Life Applications

21
DPOSS Astronomy Pipeline
22
Failure Recovery
[Plot annotations: DiskRouter reconfigured and restarted; UniTree not responding; SDSC cache reboot; UW CS network outage; software problem]
23
End-to-end Processing of 3 TB DPOSS Astronomy Data
  • Traditional Scheduler: 2 weeks
  • Using Stork: 6 days

24
Summary
  • Stork provides solutions for the data placement
    needs of the Grid community.
  • It is ready to fly!
  • Now distributed with the Condor Developers
    Release v6.7.6.
  • All the basic features you will need are included
    in the initial release.
  • More features are coming in future releases.

25
Thank you for listening. Questions?