Title: Project Overview
2. Optimisation of Data Access in Grid Environment
- Darin Nikolow (1), Renata Slota (1), Lukasz Dutka (1), Jacek Kitowski (1,2), Piotr Nyczyk (1), Mariusz Dziewierz (1)
- (1) Institute of Computer Science - AGH
- (2) Academic Computer Centre CYFRONET - AGH
- University of Mining and Metallurgy, Cracow, Poland
CrossGrid Project - Task 3.4
Cracow Grid Workshop, Nov. 5-6, 2001
3. Outline
- Background
- Bottom-top approach
- Media management software
  - middleware for existing HSM
  - dedicated VTSS
- Local component-expert systems
- Global policy for migration/replication
For more info: http://www.icsr.agh.edu.pl/
4. Motivation
- Large and growing volume of data
- Multimedia database systems (applications: medical, educational, virtual reality, virtual laboratories, digital libraries, advanced simulations, ...)
- Solution: Tertiary Storage Systems (TSS), i.e. media libraries with management software
- Examples of existing TSS
  - HPSS, DataCutter, APRIL, Condor, OmniStore, UniTree, ...
- Possible directions
  - Data access time estimation system - efficient usage
  - Data distribution and grid implementation - large scale experiments
  - Expert system for data management
  - Replication policies
5. Background
- PARMED Project (Univ. of Klagenfurt - Univ. of Mining and Metallurgy, Cracow)
- to support physicians with telematic services for
  - long-distance collaboration of medical centers
  - medical tele-education
  - case archives
6. Bottom-top approach - Major Components
- Assumptions
  - mechanism neutrality
  - policy neutrality
  - compatibility with grid infrastructure
  - uniformity of information infrastructure
[Diagram labels: Replica Selection; Replica Management; Resource Management; Storage System; Metadata Repository; HSM; UniTree; Castor; HPSS; LDAP; ...]
7. Media Management Software
- Nikolow, D., Slota, R., Kitowski, J., Nyczyk, P., Otfinowski, J., "Tertiary Storage System for Index-Based Retrieving of Video Sequences", Proc. Int. Conf. HPCN, Amsterdam, June 25-27, 2001, Lect. Notes in Comp. Sci. 2110, pp. 62-71, Springer, 2001.
- Nikolow, D., Slota, R., Kitowski, J., "Benchmarking Tertiary Storage Systems with File Fragmentation", PPAM Conf., Naleczów, Lect. Notes in Comp. Sci., accepted.
8. Media Management Software and its usage in X
- Darin Nikolow
- darin_at_uci.agh.edu.pl
9. Motivation
- Main purpose of the developed TSS: efficient index-based retrieving of video fragments (instead of file fragments)
- Specific requirements for frequent data reading
  - startup latency
  - transfer time
  - minimal transfer rate > video bitrate
- Two prototypes proposed and benchmarked
  - middleware layer for existing HSM
  - dedicated TSS
- The developed systems are of general use -> possible grid implementations
10. Multimedia Storage and Retrieval System (MMSRS)
- Requirements
  - use existing software (UniTree HSM)
  - reduce latency (start-up delay), i.e. reduce file granularity: file fragmentation (subfiles)
- Implementation
  - splitting files into pieces of similar size
- Middleware layer on HSM
- Consists of
  - Automated Media Library
  - UniTree HSM managing system
  - MPEG extension for HSM (MEH)
- MEH receives the name of the video file and the frame range (start/end frames)
- Output is streamed via HTTP
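The mapping from a requested frame range to the subfiles that must be staged can be sketched as follows. This is a minimal illustration, not the MEH implementation: it assumes every frame occupies the same number of bytes, which real MPEG streams do not guarantee (an index of frame offsets would be needed instead). The function name and the numbers in the usage lines are hypothetical.

```python
def subfiles_for_frames(start_frame, end_frame, bytes_per_frame, subfile_size):
    """Return the indices of the subfiles covering frames start_frame..end_frame.

    Assumption (for illustration only): each frame occupies bytes_per_frame
    bytes, so a frame number maps linearly to a byte offset in the file.
    """
    if start_frame < 0 or end_frame < start_frame:
        raise ValueError("invalid frame range")
    first_byte = start_frame * bytes_per_frame
    last_byte = (end_frame + 1) * bytes_per_frame - 1
    first_subfile = first_byte // subfile_size
    last_subfile = last_byte // subfile_size
    return list(range(first_subfile, last_subfile + 1))


# Illustrative numbers: 16 000 bytes per frame and 16 000 000-byte subfiles,
# i.e. 1000 frames per subfile.
needed = subfiles_for_frames(2500, 4100, 16_000, 16_000_000)
```

Only the listed subfiles have to be read from tape, which is how file fragmentation shortens the start-up delay for a fragment request.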
11. Video Tertiary Storage System (VTSS)
- Repository Daemon (REPD)
  - keeps repository information
- Tertiary File Manager Daemon (TFMD)
  - manages
    - filedb - tape ident and startup position of the fragment
    - tapedb - information about tape usage
- Dedicated TSS
- Client requests to VTSS can be of the following kinds: write a new file to VTSS, read a file fragment from VTSS, delete a file from VTSS
- The fragment range is defined in frame units
- Two daemons implemented in C using Unix sockets
12. MMSRS and VTSS performance
- Hardware (AML QuantumATL)
  - ATL 4/52 (DLT 2000)
  - ATL 7100 (DLT 7000)
  - HP D-class server (with UniTree HSM)
- Data
  - 790 MB MPEG1 file with a bitrate of 0.4 MB/s (33 min.)
  - subfile for MMSRS - 16 MB (8, 16, 32 MB tested)
    - as short as possible (low latency) while keeping playback smooth
    - optimal subfile length depends on
      - positioning time
      - drive transfer rate
      - bitrate of the video file
13. Benchmarks
- Startup latency - time elapsed from issuing the request to receiving the first byte
- Transfer time - time from receiving the first byte till the end of transmission
- Minimal rate - minimal transfer rate experienced by a client with endless buffer (should be greater than the bitrate of the video stream to have smooth reproduction)
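The three metrics can be computed from a log of packet arrivals, as in the sketch below. The first two follow the definitions above directly. For the minimal rate the sketch assumes one plausible reading: the lowest cumulative average rate (bytes received so far divided by time since the first byte) observed at any packet arrival. That reading, the function name, and the log format are assumptions, not the authors' definition.

```python
def benchmark_metrics(request_time, packets):
    """Compute (startup_latency, transfer_time, minimal_rate).

    packets: list of (arrival_time_s, size_bytes) tuples in arrival order.
    minimal_rate is None when fewer than two distinct arrival times exist.
    """
    if not packets:
        raise ValueError("no packets received")
    first_time = packets[0][0]
    startup_latency = first_time - request_time
    transfer_time = packets[-1][0] - first_time

    received = 0
    minimal_rate = None
    for arrival, size in packets:
        received += size
        elapsed = arrival - first_time
        if elapsed > 0:
            # Assumed definition: cumulative bytes over time since first byte.
            rate = received / elapsed
            if minimal_rate is None or rate < minimal_rate:
                minimal_rate = rate
    return startup_latency, transfer_time, minimal_rate


# Request issued at t = 0 s; four 100-byte packets arrive later.
log = [(10.0, 100), (11.0, 100), (13.0, 100), (14.0, 100)]
latency, duration, min_rate = benchmark_metrics(0.0, log)
```

Under this reading, playback stays smooth as long as `min_rate` exceeds the bitrate of the video stream.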
14. Startup latency
[Plots: VTSS (DLT2000); MMSRS (DLT2000); VTSS (DLT7000)]
UniTree reference startup latency: 718 s
15. Transfer time (beginning part shown only)
[Plots: VTSS (DLT2000); MMSRS (DLT2000); VTSS (DLT7000)]
UniTree reference transfer time: 135 s
16. System performance for the whole video file transfer (DLT2000)
17. Minimal transfer rate
- Definitions (for VTSS)
  - Minimal transfer rate
  - Time offset for tape changing direction
  - n - number of packets
  - B_j - number of bytes in the j-th packet
  - t_i - time when the i-th packet was received
  - T - tape capacity in MB
  - N - number of tracks
  - Br - bitrate of video file in MB/s
  - no bad blocks
18. Minimal transfer rate
[Plots: VTSS (DLT2000); MMSRS (DLT2000); VTSS (DLT7000)]
- For DLT2000
  - T = 10 GB
  - N = 64
  - Br = 0.4 MB/s
  - Qdt = 400 s
- For DLT7000
  - T = 35 GB
  - N = 52
  - Br = 0.4 MB/s
  - Qdt = 1723 s
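The two time values quoted on this slide (400 s and 1723 s) are numerically consistent with T / (N * Br), taking 1 GB as 1024 MB: the playing time of the video data stored on one track, after which the tape changes direction. The formula itself is not present in the transcript, so the sketch below is an inferred reading, and the function name is hypothetical.

```python
def direction_change_offset(tape_capacity_mb, tracks, bitrate_mb_s):
    """Seconds of video stored on one tape track (assumed: T / (N * Br))."""
    return tape_capacity_mb / (tracks * bitrate_mb_s)


# Parameters from the slide, with 1 GB taken as 1024 MB.
dlt2000 = direction_change_offset(10 * 1024, 64, 0.4)
dlt7000 = direction_change_offset(35 * 1024, 52, 0.4)
```

Rounded to whole seconds, the two results match the 400 s and 1723 s given for DLT2000 and DLT7000.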
19. Access Time Estimation - Motivation for X
- Retrieving a file from TSS can take from a few seconds to a few hours
- User satisfaction increases when the data access time is known (e.g. a user waiting to watch a selected video, an administrator recovering from backup)
- Efficient use of storage resources in Grid environment (data replication subsystem)
20. Access Time Estimation - Approaches
- Open TSS approach
  - source code changes
  - will be used as experimental platform
- Black Box TSS approach
  - for existing HSMs in X sites
  - retrieving the TSS state info via its native tools and available internal files
21. Access Time Estimation - Open TSS Approach
[Diagram: Client, TSS, TSS Simulator. Numbered messages: 1 - req.; 2 - req. id; 3 - ETA of req. id?; 4 - ETA. Unnumbered flows: events, data.]
TSS source code changes - adding event reporting functions
22. Access Time Estimation - Black Box TSS Approach
[Diagram: Client, Request Monitor Proxy, TSS Monitor, TSS Simulator, TSS, Monitoring tools, Disk cache. Numbered messages: 1 - fileid ETA?; 2 - fileid; 3 - queue state; 4 - update; 5 - TSS state; 6 - ETA; 7 - ETA; 8 - fileid; 9 - fileid; 10 - data; 11 - data; 12 - feedback. Other labels: events collecting, databases, logs, conf. files.]
- Info needed by Simulator
  - nr of drives
  - tape labels
  - media types
  - position of file in media
  - nr of requests
  - ...
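How a simulator might turn such state information into an ETA can be illustrated with a deliberately simple model. The sketch assumes a FIFO queue served by identical drives that are all idle at the start, a fresh tape mount for every request, and seek and transfer times proportional to position and size. None of this comes from the slides; all names and parameters are hypothetical.

```python
import heapq
from dataclasses import dataclass


@dataclass
class Request:
    position_mb: float  # offset of the file from the start of the tape
    size_mb: float      # amount of data to read


def service_time(req, mount_s, seek_mb_s, transfer_mb_s):
    """Mount + positioning + transfer time for one request (toy model)."""
    return mount_s + req.position_mb / seek_mb_s + req.size_mb / transfer_mb_s


def estimate_eta(queue, new_req, drives, mount_s, seek_mb_s, transfer_mb_s):
    """Seconds until new_req is fully read, queued behind `queue` (FIFO)."""
    if drives < 1:
        raise ValueError("at least one drive is required")
    free_at = [0.0] * drives  # time at which each drive becomes free
    heapq.heapify(free_at)
    finish = 0.0
    for req in [*queue, new_req]:
        start = heapq.heappop(free_at)  # earliest available drive
        finish = start + service_time(req, mount_s, seek_mb_s, transfer_mb_s)
        heapq.heappush(free_at, finish)
    return finish


pending = [Request(position_mb=1000, size_mb=500)]
wanted = Request(position_mb=2000, size_mb=100)
eta_one_drive = estimate_eta(pending, wanted, 1, 60, 100, 5)
eta_two_drives = estimate_eta(pending, wanted, 2, 60, 100, 5)
```

With a second drive the new request no longer waits behind the pending one, which shows why the number of drives and the queue state are among the inputs the simulator needs.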
23. Conclusions
- MMSRS and VTSS more efficient than standard UniTree HSM
- MMSRS efficient enough to be used as a middleware for existing HSM of UniTree type (in X sites)
- Proposed measurements could be used for
  - building more sophisticated distributed storage systems (faster access to files stored in TSS)
  - building access time estimation subsystem
- Access time estimation subsystem -> an information provider for X replication and migration of data
25. Component-expert Systems
- Dutka, L., and Kitowski, J., "Implementation of expert technologies in information systems based on a component methodology", MSK 2001 Conf., Nov. 19-21, 2001, Cracow, accepted (in Polish).
- Dutka, L., and Kitowski, J., "Component-expert technology in mass-storage grid applications", ICCS 2002 Conf., April 2002, Amsterdam, in preparation.
26. Basics of Component-Expert Technology and its usage in X
- Lukasz Dutka
- dutka_at_agh.edu.pl
27. Classical component strategy
28. Component-expert strategy
29. Component structure
30. Component header structure
31. Structure of component code
32. Call-Environment
- Describes the state of the call place
- Describes the call place requirements
- Carries information about user or programmer wishes
- Expert system processes the Call-Environment and finds the best component for the given Call-Environment
33. Expert Subsystem
- Rule-based expert system
- A typical rule looks like: If log-expr Then action1 Else action2
- The rules describe what is meant by "the best component for the given Call-Environment"
- Expert system logs calls and stores deduction results for further analysis
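The If/Then/Else rule form and the logging of deductions can be sketched as below. This is an illustrative toy, not the authors' expert system: it assumes that actions vote for components by adding points and that the component with the most points is "best". The component names, environment keys, and helper functions are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Env = Dict[str, object]            # a Call-Environment as key/value pairs
Scores = Dict[str, int]
Action = Callable[[Scores], None]


@dataclass
class Rule:
    condition: Callable[[Env], bool]  # the "log-expr"
    then_action: Action               # "action1"
    else_action: Action               # "action2"


def prefer(component: str, points: int = 1) -> Action:
    """Build an action that adds points to one component (assumed scheme)."""
    def action(scores: Scores) -> None:
        scores[component] = scores.get(component, 0) + points
    return action


def no_op(scores: Scores) -> None:
    """Action that leaves the scores unchanged."""


def select_component(env: Env, rules: List[Rule], components: List[str],
                     log: List[Tuple[Env, str]]) -> str:
    scores = {name: 0 for name in components}
    for rule in rules:
        action = rule.then_action if rule.condition(env) else rule.else_action
        action(scores)
    best = max(components, key=lambda name: scores[name])
    log.append((dict(env), best))  # keep the deduction for later analysis
    return best


rules = [
    Rule(lambda e: e["data_type"] == "video",
         prefer("stream_reader"), prefer("block_reader")),
    Rule(lambda e: e["throughput_mb_s"] > 1,
         prefer("stream_reader", 2), no_op),
]
components = ["block_reader", "stream_reader"]
history: List[Tuple[Env, str]] = []
chosen = select_component({"data_type": "video", "throughput_mb_s": 0.4},
                          rules, components, history)
```

New behaviour is added by appending rules or components, without touching the code at the call place.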
34. Benefits of Component-Expert technology
- System can be expanded dynamically
- Ease of solving new problems
- Minimising programmer responsibility for component choice
- Ease of programming in heterogeneous environment
- Maximal reusability of components
- Internal simplicity of component code
- Increased efficiency of the programming process
35. Component-Expert Technology for X Task 3.4
36. Basic analysis of Data-access problems in X
- Different data set types
- Huge data files
- Distributed environment
- Long distance connections
- Mission critical applications
- Heterogeneous data storing systems
- Heterogeneous computing systems
- Open system
- Unpredictable file types
37. Basic connection diagram
38. Sequence Diagram
39. Example of Component-Expert technology usage for data access in X
- Sample Attributes
  - User ID
  - Computing Node ID
  - Preferred replica location
  - Required throughput
  - Application purpose
  - Data sharing
  - Critical level
  - Replica expiration
  - ...
- Example of local decisions
  - Choosing devices (according to availability and type)
  - Storing format (blocks, multimedia streams, ...)
  - Available delivery performance (network, storage devices, ...)
  - ... and much more ...
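The sample attributes and one local decision (choosing a device by availability and type) might be represented as in the sketch below. The field names, the `Device` record, and the selection policy (pick the slowest available device that still meets the required throughput, keeping faster devices free) are illustrative assumptions only.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class CallEnvironment:
    user_id: str
    computing_node_id: str
    preferred_replica_location: str
    required_throughput_mb_s: float
    application_purpose: str
    data_sharing: bool
    critical_level: int
    replica_expiration_s: Optional[int] = None


@dataclass
class Device:
    name: str
    kind: str              # e.g. "disk" or "tape"
    available: bool
    throughput_mb_s: float


def choose_device(env: CallEnvironment,
                  devices: List[Device]) -> Optional[Device]:
    """Return the slowest available device that meets the throughput need."""
    adequate = [d for d in devices
                if d.available
                and d.throughput_mb_s >= env.required_throughput_mb_s]
    if not adequate:
        return None
    return min(adequate, key=lambda d: d.throughput_mb_s)


devices = [
    Device("disk_cache", "disk", True, 50.0),
    Device("tape_a", "tape", True, 5.0),
    Device("tape_b", "tape", False, 10.0),
]
env = CallEnvironment("user1", "node7", "cracow", 4.0,
                      "video playback", False, 1)
picked = choose_device(env, devices)
```

In the component-expert scheme a decision like this would be taken locally, using the attributes carried by the Call-Environment rather than logic written at the call place.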
40. Control System for Migration/Replication Strategies (1/2)
- Assumptions
  - replica <--> file instances
  - read only
  - no update, no coherence
[Diagram label: From replica manager]
41. System Management for Migration/Replication Strategies (2/2)
- In cooperation with other projects
- High-level control system (e.g. cooperating with LDAP)
- Two possible realizations
  - heuristic reinforcement learning based on heuristic strategies for migration/replication and system state
  - classical rule-based expert system
42. Conclusions
- Some elements have been defined and implemented
- Working on higher level structure and cooperation with other X modules and services