Title: Stork: State of the Art
1 Stork: State of the Art
- Tevfik Kosar
- Computer Sciences Department
- University of Wisconsin-Madison
- kosart_at_cs.wisc.edu
- http://www.cs.wisc.edu/condor/stork
2 The Imminent Data Deluge
- Exponential growth of scientific data
  - 2000: 0.5 petabytes
  - 2005: 10 petabytes
  - 2010: 100 petabytes
  - 2015: 1000 petabytes
- "I am terrified by terabytes" -- Anonymous
- "I am petrified by petabytes" -- Jim Gray
- Moore's Law is outpaced by the growth of scientific data!
3 (Figure: data-generation rates of example scientific projects: 2-3 PB/year, 11 PB/year, 500 TB/year, and 20 TB - 1 PB/year)
4 How to access and process distributed data?
(Figure: distributed sites holding TB- and PB-scale data)
5-8 I/O Management in History
(Figure: I/O management shown at two levels: the I/O subsystem and the distributed systems level)
9 Individual Jobs
(Figure: the data placement steps surrounding a single job j)
- Allocate space for input and output data
- get: stage in the input data
- Run job j
- put: stage out the output data
- Release input space
- Release output space
10 Separation of Jobs
DAG specification:
    Data A A.stork
    Data B B.stork
    Job C C.condor
    ..
    Parent A child B
    Parent B child C
    Parent C child D E
    ..
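To illustrate how a DAG like this separates the two kinds of jobs, here is a minimal Python sketch. It is not DAGMan or Stork code: `parse_dag` and `dispatch_order` are hypothetical helpers that understand only the `Data`, `Job`, and `Parent ... child ...` lines, and they assume child lists are whitespace-separated. Data placement nodes are routed to Stork and computational nodes to Condor, in dependency order.

```python
from graphlib import TopologicalSorter  # Python 3.9+

def parse_dag(text):
    """Parse a tiny subset of a DAGMan-style file (hypothetical helper)."""
    kinds = {}  # node name -> "stork" for Data nodes, "condor" for Job nodes
    deps = {}   # node name -> set of parent node names
    for raw in text.splitlines():
        tokens = raw.split()
        if not tokens:
            continue
        keyword = tokens[0].upper()  # keywords treated case-insensitively
        if keyword == "DATA":
            kinds[tokens[1]] = "stork"
        elif keyword == "JOB":
            kinds[tokens[1]] = "condor"
        elif keyword == "PARENT":
            split = [t.upper() for t in tokens].index("CHILD")
            parents, children = tokens[1:split], tokens[split + 1:]
            for child in children:
                deps.setdefault(child, set()).update(parents)
    return kinds, deps

def dispatch_order(text):
    """Return (node, scheduler) pairs in an order that respects every dependency."""
    kinds, deps = parse_dag(text)
    for node in kinds:
        deps.setdefault(node, set())  # nodes without parents are ready at once
    order = TopologicalSorter(deps).static_order()
    return [(node, kinds.get(node, "unknown")) for node in order]
```

With the DAG above (filling the `..` with `Job D D.condor` and `Job E E.condor` for the example), the order starts with `A` and `B` on Stork, then `C` on Condor, then `D` and `E` in either order. The point of the separation is that the two Data nodes are scheduled by a data-aware scheduler rather than hidden inside job C.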
11 Stork: Data Placement Scheduler
- First scheduler specialized for data movement/placement.
- De-couples data placement from computation.
- Understands the characteristics and semantics of data placement jobs.
- Can make smart scheduling decisions for reliable and efficient data placement.
- A prototype is already implemented and deployed at several sites.
- Now distributed with Condor Developers Release v6.7.6.
http://www.cs.wisc.edu/condor/stork
12 Support for Heterogeneity [ICDCS'04]
- Provides uniform access to different data storage systems and transfer protocols.
- Acts as an I/O control system (IOCS) for distributed systems.
- Multilevel policy support
(Figure: protocol translation using the Stork disk cache and using the Stork memory buffer)
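Protocol translation through a disk cache can be pictured as splitting one logical transfer into two hops. The sketch below assumes a hypothetical `plan_transfer` helper, a hypothetical cache directory, and that every protocol can read from and write to local `file://` storage; it is an illustration of the idea, not Stork's implementation.

```python
from urllib.parse import urlparse

def plan_transfer(src_url, dest_url, direct_pairs, cache_dir="/tmp/stork-cache"):
    """Return the (source, destination) hops for one logical transfer.

    direct_pairs: set of (source_protocol, destination_protocol) pairs that can
    move data directly. Any other combination is translated by staging the file
    in a local disk cache (assumes every protocol can read/write file://).
    """
    src_proto = urlparse(src_url).scheme
    dest_proto = urlparse(dest_url).scheme
    if (src_proto, dest_proto) in direct_pairs:
        return [(src_url, dest_url)]  # no translation needed
    # Keep the original file name inside the (hypothetical) cache directory.
    file_name = urlparse(src_url).path.rsplit("/", 1)[-1]
    staged = f"file://{cache_dir}/{file_name}"
    return [(src_url, staged), (staged, dest_url)]
```

For example, an SRB-to-NeST transfer with only GridFTP-to-GridFTP available directly becomes two hops through `file:///tmp/stork-cache/test.dat`. The memory-buffer variant would stream between the two protocols instead of writing the intermediate file; that is not modeled here.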
13 Dynamic Protocol Selection [ICDCS'04]
    [
      dap_type      = "transfer";
      src_url       = "drouter://slic04.sdsc.edu/tmp/test.dat";
      dest_url      = "drouter://quest2.ncsa.uiuc.edu/tmp/test.dat";
      alt_protocols = "gsiftp-gsiftp, nest-nest";
    ]
or, letting Stork choose the protocol:
      src_url       = "any://slic04.sdsc.edu/tmp/test.dat";
      dest_url      = "any://quest2.ncsa.uiuc.edu/tmp/test.dat";
(Figure: transfer rate over time while DiskRouter crashes and later resumes. Traditional scheduler: 48 Mb/s; using Stork: 72 Mb/s)
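The `alt_protocols` idea can be sketched as an ordered fallback list: try the primary URLs, and if that transfer fails, retry with each alternative protocol pair in turn. In this Python sketch, `transfer_with_fallback`, `with_scheme`, and the caller-supplied `transfer` function (assumed to raise `OSError` on failure) are hypothetical; the `any://` form, where Stork picks the protocol itself, is not modeled.

```python
from typing import Callable, Tuple
from urllib.parse import urlparse, urlunparse

def with_scheme(url: str, scheme: str) -> str:
    """Return url with its protocol (scheme) replaced, e.g. drouter:// -> gsiftp://."""
    return urlunparse(urlparse(url)._replace(scheme=scheme))

def transfer_with_fallback(src_url: str, dest_url: str, alt_protocols: str,
                           transfer: Callable[[str, str], None]) -> Tuple[str, str]:
    """Try the primary URLs, then each "src-dest" pair from alt_protocols in order."""
    attempts = [(src_url, dest_url)]
    for pair in alt_protocols.split(","):
        src_proto, dest_proto = pair.strip().split("-")
        attempts.append((with_scheme(src_url, src_proto), with_scheme(dest_url, dest_proto)))
    errors = []
    for src, dest in attempts:
        try:
            transfer(src, dest)   # caller-supplied; raises OSError on failure
            return src, dest      # first protocol pair that succeeded
        except OSError as exc:
            errors.append(f"{src} -> {dest}: {exc}")
    raise RuntimeError("all protocols failed:\n" + "\n".join(errors))
```

If the DiskRouter transfer raises an error, the next attempt uses `gsiftp://slic04.sdsc.edu/tmp/test.dat` and `gsiftp://quest2.ncsa.uiuc.edu/tmp/test.dat`, which mirrors the crash-and-continue behavior in the figure.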
14 Run-time Auto-tuning [AGridM'03]
- Traditional scheduler (without tuning): 0.5 MB/s
- Using Stork (with tuning): 10 MB/s
    [
      link     = "slic04.sdsc.edu - quest2.ncsa.uiuc.edu";
      protocol = "gsiftp";
      bs       = 1024KB;   // I/O block size
      tcp_bs   = 1024KB;   // TCP buffer size
      p        = 4;        // number of parallel streams
    ]
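One standard way to choose a value like `tcp_bs` is the bandwidth-delay product (BDP): the number of bytes that must be in flight to keep the link full. The sketch below applies that textbook heuristic and, as a common approximation, splits the BDP evenly across `p` parallel streams. It is not necessarily how Stork's tuning computes its parameters, and the bandwidth and round-trip time in the example are hypothetical.

```python
def tcp_buffer_bytes(bandwidth_bps: int, rtt_ms: int, streams: int = 1) -> int:
    """Per-stream TCP buffer size from the bandwidth-delay product (BDP).

    bandwidth_bps: bottleneck bandwidth in bits per second
    rtt_ms:        round-trip time in milliseconds
    streams:       number of parallel streams sharing the BDP
    """
    bdp_bytes = bandwidth_bps * rtt_ms // 8000  # bits/s * ms / (8 bits * 1000 ms/s)
    return -(-bdp_bytes // streams)             # ceiling division across streams
```

For a hypothetical 100 Mb/s path with an 80 ms round-trip time, the BDP is 1,000,000 bytes; with 4 parallel streams each stream needs about 250,000 bytes of buffer. A buffer much smaller than the BDP caps throughput regardless of available bandwidth, which is one reason untuned wide-area transfers can be so slow.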
15 Controlling Throughput [Euro-Par'04]
(Figure panels: Wide Area; Local Area)
- Increasing concurrency/parallelism does not always increase the transfer rate.
- The effect differs between local-area and wide-area networks.
- Concurrency and parallelism have slightly different impacts on the transfer rate.
16 Controlling CPU Utilization [Euro-Par'04]
(Figure panels: Client; Server)
- Concurrency and parallelism have opposite impacts on CPU utilization at the server side.
17 Detecting and Classifying Failures [Grid'04]
(Figure: a chain of checks; each check either succeeds (S) and passes to the next check, or fails (F) and classifies the failure.)
- Check DNS server. F: DNS server error (transient)
- Check DNS. F: no DNS entry (permanent)
- Check network. F: network outage (transient)
- Check host. F: host down (transient)
- Check protocol. F: protocol unavailable (transient)
- Check credentials. F: not authenticated (permanent)
- Check file. F: source file does not exist (permanent)
- Test transfer. F: transfer failed
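The checks above form an ordered decision chain, which can be sketched in Python as follows. The probe names and the `classify_failure` and `should_retry` helpers are hypothetical; each probe is assumed to be a zero-argument callable returning `True` on success. The slide gives no class for a failed test transfer, so the sketch reports it as unclassified.

```python
from typing import Callable, Dict, Optional, Tuple

# Flowchart order: (probe name, failure reported when the probe fails, class).
CHECKS = [
    ("dns_server", "DNS server error", "transient"),
    ("dns_entry", "No DNS entry", "permanent"),
    ("network", "Network outage", "transient"),
    ("host", "Host down", "transient"),
    ("protocol", "Protocol unavailable", "transient"),
    ("credentials", "Not authenticated", "permanent"),
    ("source_file", "Source file does not exist", "permanent"),
]

def classify_failure(probes: Dict[str, Callable[[], bool]]) -> Optional[Tuple[str, str]]:
    """Run probes in order; return (failure, class) for the first failing one, else None."""
    for name, failure, kind in CHECKS:
        if not probes[name]():
            return failure, kind
    # Every precondition passed: try a small test transfer last.
    if not probes["test_transfer"]():
        return "Transfer failed", "unclassified"  # the slide gives no class here
    return None

def should_retry(result: Optional[Tuple[str, str]]) -> bool:
    """Retry only transient failures; permanent ones need user intervention."""
    return result is not None and result[1] == "transient"
```

The value of the classification is in the last function: a transient failure such as a host being down is worth retrying later, while retrying a permanent one such as a missing source file only wastes resources.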
18 Detecting Hanging Transfers [Cluster'04]
- Collect job execution time statistics.
- Fit a distribution.
- Detect and avoid:
  - black holes
  - hanging transfers
- E.g., for a normal distribution, 99.7% of job execution times should lie between (avg - 3*stdev) and (avg + 3*stdev).
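The three-sigma rule above can be sketched in a few lines of Python using the standard `statistics` module. The helpers `bounds` and `classify` are hypothetical, and the sketch assumes execution times are roughly normally distributed. A running transfer that exceeds the upper bound is flagged as likely hanging; treating a completed job that falls below the lower bound as a possible black hole is an interpretive assumption, not something the slide specifies.

```python
import statistics

def bounds(durations, k=3.0):
    """Return (mean - k*stdev, mean + k*stdev) from past execution times."""
    mean = statistics.mean(durations)
    stdev = statistics.stdev(durations)  # sample standard deviation; needs >= 2 values
    return mean - k * stdev, mean + k * stdev

def classify(elapsed, durations, k=3.0):
    """Compare an elapsed time against the k-sigma interval of past durations."""
    low, high = bounds(durations, k)
    if elapsed > high:
        return "hanging"   # running far longer than almost all past jobs
    if elapsed < low:
        return "too fast"  # finished suspiciously early (possible black hole)
    return "normal"
```

With past durations such as `[10, 12, 11, 9, 10, 11, 12, 10, 9, 11]` seconds, the interval is roughly 7.3 to 13.7 seconds, so a transfer still running at 30 seconds would be flagged and could be killed and rescheduled elsewhere.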
19 Stork Can Also
- Allocate/de-allocate (optical) network links
- Allocate/de-allocate storage space
- Register/un-register files with a metadata catalog
- Locate the physical location of a logical file name
- Control concurrency levels on storage servers
- For details, see [ICDCS'04], [JPDC'05], [AGridM'03]
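Controlling concurrency levels on a storage server amounts to capping how many transfers may touch that server at once. The Python sketch below does this with one bounded semaphore per server; `ConcurrencyLimiter` and the server name used in the example are hypothetical, and this is a generic technique rather than Stork's internal mechanism.

```python
import threading

class ConcurrencyLimiter:
    """Cap the number of simultaneous transfers per storage server (hypothetical helper)."""

    def __init__(self, max_per_server: int):
        self._max = max_per_server
        self._lock = threading.Lock()  # guards creation of per-server semaphores
        self._sems = {}                # server name -> BoundedSemaphore

    def _semaphore(self, server: str) -> threading.BoundedSemaphore:
        with self._lock:
            if server not in self._sems:
                self._sems[server] = threading.BoundedSemaphore(self._max)
            return self._sems[server]

    def run(self, server: str, transfer, *args):
        """Block until a slot on `server` is free, then run transfer(*args)."""
        with self._semaphore(server):
            return transfer(*args)
```

With `ConcurrencyLimiter(2)`, eight threads calling `limiter.run("srb.sdsc.edu", job)` will never have more than two `job` calls active at the same time; transfers to other servers get their own independent limit. Tuning the cap per server matters because, as the throughput and CPU slides show, more concurrency is not always better.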
20 Applying Stork to Real-Life Applications
21 DPOSS Astronomy Pipeline
22 Failure Recovery
(Figure annotations: DiskRouter reconfigured and restarted; UniTree not responding; SDSC cache reboot; UW CS network outage; software problem)
23 End-to-End Processing of 3 TB of DPOSS Astronomy Data
- Traditional scheduler: 2 weeks
- Using Stork: 6 days
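A quick back-of-the-envelope reading of these two numbers, assuming 2 weeks = 14 days and decimal units (1 TB = 10^12 bytes). The rates are averages over the whole end-to-end run, processing included, not measured transfer rates:

```latex
\[
\text{speedup} = \frac{14\ \text{days}}{6\ \text{days}} \approx 2.3
\]
\[
\frac{3 \times 10^{12}\ \text{B}}{6 \times 86\,400\ \text{s}} \approx 5.8\ \text{MB/s}
\qquad \text{vs.} \qquad
\frac{3 \times 10^{12}\ \text{B}}{14 \times 86\,400\ \text{s}} \approx 2.5\ \text{MB/s}
\]
```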
24 Summary
- Stork provides solutions for the data placement needs of the Grid community.
- It is ready to fly!
- Now distributed with Condor Developers Release v6.7.6.
- All the basic features you will need are included in the initial release.
- More features are coming in future releases.
25 Thank you for listening. Questions?