Title: Data Management Challenges, Practices and Technologies
1 Data Management Challenges, Practices and Technologies
- Dr. P. Sambath Narayanan
- Senior Technology Architect
- Customer Experience Centre
- Sun Microsystems India
2 Three Phases of Data Management: Data at Work, Data in Motion, Data at Rest
[Figure: data lifecycle diagram; recoverable labels: Capture, Creation]
3 Today's Data Management Challenges: The Budget and Data Growth Gap
- Budgets not keeping up with demand
- Storage proliferation enterprise-wide
- Current storage strategy cannot keep up with data growth
- Storage management increasingly a burden
- New applications not easily accommodated
4 All Data is Not Created Equal
5 Sun Storage
6 Multi-Tiered Storage
- Support for all Sun Storage Systems
- Cost-differentiated storage strategy with centralized provisioning
7 Heterogeneous Storage Pooling
- Third-party storage system capacities in storage pool
- VLVs open to replication and mirroring
- Easy data migration
- Investment protection
8 Data Management Practices
ENTERPRISE CONTENT MANAGEMENT
BUSINESS AND REGULATORY COMPLIANCE
Achieve operational goals; meet regulatory requirements
Data management; policy-based archive; data warehousing
BUSINESS CONTINUITY
ENGAGEMENT SERVICES
Data continuance; operational resilience; disaster recovery
IT INFRASTRUCTURE CONSOLIDATION
Centralize management; consolidate resources; migration
END-TO-END INFORMATION MANAGEMENT
9 Data Management Technologies I
- Solaris QFS Filesystem
- High Performance
- Suitable for Scientific I/O and Data
10 Technical Overview
- Technical overview
- Variable Disk Allocation Unit (DAU) size
- Metadata separation
- Multiple stripe options
- SAN file system support
- Automatic direct I/O
- Pre-allocation of disk blocks
- Quick-write feature
- Multiple threads for reads or writes
- Integrated volume management
- Q-start capability
11 Technical Overview
- Variable DAU size
- Adjusted/optimized based on hardware
- Allows aligning disk I/O with hardware to optimize read/write performance
Sun™ QFS Software Technology
- DAU size
- Metadata
- Stripe options
- SAN support
- Direct I/O
- Pre-allocation
- Quick-write
- Multithreaded
- Volume management
- Q-start feature
[Figure: RAID array made of 128k stripe units; Stripe Size 128k, DAU Size 640k]
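The alignment idea on this slide comes down to simple arithmetic. The following is a minimal sketch, assuming the figure shows five data devices with a 128k stripe unit each (which would give the 640k DAU it quotes). The function names are hypothetical and are not part of any QFS tool.

```python
def dau_size_kib(stripe_unit_kib: int, data_devices: int) -> int:
    """Return a DAU that covers one full stripe: one stripe unit per data device."""
    if stripe_unit_kib <= 0 or data_devices <= 0:
        raise ValueError("stripe unit and device count must be positive")
    return stripe_unit_kib * data_devices


def is_aligned(io_size_kib: int, dau_kib: int) -> bool:
    """An I/O is aligned when it is a whole number of DAUs."""
    return io_size_kib % dau_kib == 0


if __name__ == "__main__":
    dau = dau_size_kib(128, 5)  # assumed layout: 5 devices x 128k
    print("DAU size:", dau, "KiB")
    print("1280 KiB write aligned:", is_aligned(1280, dau))
    print("700 KiB write aligned:", is_aligned(700, dau))
```

An aligned I/O touches every device in the stripe equally, which is the effect the slide attributes to matching the DAU to the hardware.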
12 Technical Overview
- High-performance metadata positioning
- Separates file system metadata (inodes, indirect extents, directories) from user data
- No head seek conflict on reads and writes of short and long data
[Figure: Sun™ QFS file system with metadata stored separately from file data]
13 Technical Overview
- Multiple stripe options
- Standard block level
- Stripe a file on block level over a disk, RAID array, group of disks, or striped group
- Stripe groups
- Group disks or arrays of disks (RAID, etc.) together for optimized I/O
- Round robin
- Keep a complete file within a disk, array, or striped group
[Figure: Sun™ QFS software with separate metadata and file data devices]
14 Striping Options
[Figure: Standard Striping and Round Robin layouts, each with a metadata device and single disks or RAID devices]
- Standard Striping: multiple I/O streams; each stream (file) is transferred partially to multiple drives in parallel; I/O based on DAU size
- Round Robin: multiple I/O streams; each stream (file) is transferred entirely to a single drive in parallel; I/O based on file size
15 Striping Options
[Figure: Standard Striping and Round Robin layouts over groups of striped drives, each with a metadata device]
- Standard Striping: multiple I/O streams; each stream (file) is transferred partially to multiple groups of striped drives in parallel; I/O based on DAU size
- Round Robin: multiple I/O streams; each stream (file) is transferred entirely to a single group of striped drives in parallel; I/O based on file size
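The difference between the two allocation styles can be illustrated with a toy placement model. This is only a sketch under simple assumptions (devices are plain names, sizes are in arbitrary units, and allocation always continues on the next device). It is not how QFS is implemented, and all names are hypothetical.

```python
from collections import defaultdict


def stripe(files, devices, dau):
    """Standard striping: cut each file into DAU-sized chunks spread over all devices."""
    placement = defaultdict(list)
    next_dev = 0
    for name, size in files:
        offset = 0
        while offset < size:
            chunk = min(dau, size - offset)
            placement[devices[next_dev]].append((name, chunk))
            next_dev = (next_dev + 1) % len(devices)
            offset += chunk
    return dict(placement)


def round_robin(files, devices):
    """Round robin: keep each file whole and rotate files across devices."""
    placement = defaultdict(list)
    for index, (name, size) in enumerate(files):
        placement[devices[index % len(devices)]].append((name, size))
    return dict(placement)


if __name__ == "__main__":
    files = [("a", 300), ("b", 100)]
    devices = ["d0", "d1"]
    print("striped:    ", stripe(files, devices, dau=128))
    print("round robin:", round_robin(files, devices))
```

In the striped result, file "a" appears on both devices in DAU-sized pieces; in the round-robin result each file lives entirely on one device, so the I/O unit is the file rather than the DAU.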
16 Technical Overview
- Supports Fibre Channel disk devices
- Share the file system between multiple hosts
- No need to duplicate data or hardware
- Multiple readers/writers
- Useful for high availability and fail-over environments
- Excellent for shared environments, load-balancing systems, or other environments
[Figure: hosts labeled Reader, Reader/Writer, and Reader sharing a Sun™ QFS file system over a Fibre Channel fabric]
17 Technical Overview
- QFS fully supports direct I/O
- Automatically switches between paged I/O and direct I/O depending on I/O size
- Set special attributes to force direct I/O for specific files or directories, or enable it by API
- Attributes on directories are inherited
- Optionally, force direct I/O on all files in a file system by mount parameter
[Figure: I/O path from Application through System Call and Virtual File System to either Direct I/O or Paged I/O]
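The selection logic described on this slide can be sketched as a small decision function. This is an illustration only: the 256 KiB threshold is an arbitrary assumption, the function names are hypothetical, and resolving inherited directory attributes by walking up the path is a simplification rather than a description of how QFS stores them.

```python
from pathlib import PurePosixPath

ASSUMED_THRESHOLD = 256 * 1024  # hypothetical cut-over size in bytes


def is_forced_direct(path: str, forced_paths: set) -> bool:
    """True if the file or any parent directory carries the force-direct attribute."""
    p = PurePosixPath(path)
    return any(str(candidate) in forced_paths for candidate in [p, *p.parents])


def choose_io_path(io_size: int, forced: bool = False,
                   threshold: int = ASSUMED_THRESHOLD) -> str:
    """Large or explicitly forced requests bypass the page cache."""
    return "direct" if forced or io_size >= threshold else "paged"


if __name__ == "__main__":
    forced_dirs = {"/data/raw"}
    for path, size in [("/data/raw/run1.dat", 4096), ("/home/u/notes.txt", 4096),
                       ("/home/u/big.bin", 1 << 20)]:
        forced = is_forced_direct(path, forced_dirs)
        print(path, size, "->", choose_io_path(size, forced))
```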
18 Technical Overview
- Supports pre-allocation of disk blocks
- Among the best performance for large, sequential I/O
- Helps ensure that contiguous disk blocks are allocated
- Reads continuously without having to move/seek around the disk
- Since metadata is separated, the head is not disturbed during I/O
- Can be used with direct I/O
- Can be switched on by API or attribute
[Figure: blocks are allocated sequentially on the disk]
19 Technical Overview
- Quick-write feature
- Switches off write lock in virtual file system layer
- Allows simultaneous reads and writes to the same file
- Application must know and control multiple writes
- Can be switched on by API or attribute
[Figure: simultaneous writes to File A in a Sun™ QFS file system]
20 Technical Overview
- Fully threaded
- Multiple, simultaneous reads, writes, etc.
- Multiple read/write threads per file I/O, selectable by API or attribute
- Supports multiple file systems
- Each file system can be tuned and configured independently
- Virtually unlimited number of files per file system
- Inodes are dynamically allocated
- True 64-bit file system
- Supports file sizes up to 18.4 EB (true 64-bit)
- No kernel modifications
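The 18.4 EB figure follows directly from 64-bit byte addressing. A short check, assuming the slide uses decimal exabytes (10^18 bytes); the helper name is hypothetical:

```python
EB = 10**18   # decimal exabyte
EIB = 2**60   # binary exbibyte


def max_file_size_bytes(bits: int = 64) -> int:
    """Largest byte count representable with an unsigned offset of the given width."""
    return 2**bits - 1


if __name__ == "__main__":
    limit = max_file_size_bytes()
    print(f"{limit} bytes")
    print(f"about {limit / EB:.1f} EB (decimal)")
    print(f"just under {2**64 // EIB} EiB (binary)")
```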
21 Technical Overview
- Integrated volume management
- Provided internal to the file system
- Create file systems from slices, complete disks, RAID subsystems, or meta-devices
- Create even the largest file systems in seconds
- Grow file systems and add devices without dump/restore
[Figure: a file system assembled from many Solaris disk slices (device names c0t0d0s0 through c10t0d0s2), with separate metadata devices]
22 Technical Overview
- Q-start provides instant-on technology
- Keeps file system clean with virtually no system overhead
- Integrated error checking on all critical I/O
- Serialization of critical metadata writes
- Keeps identification records on metadata, which can be dynamically detected and recovered
- No fsck required after interruption
- Even the largest file systems are generated in seconds
- Dynamic inode allocation for an almost unlimited number of files
- Grow file system without dump/restore
23 QFS - Summary
- High performance
- Provides users and applications with one of the fastest file systems available on the Solaris Operating Environment today
- Provides near-linear scaling when adding hardware
- Includes internal volume manager
- Extremely low CPU usage, even at maximum I/O rates
24 Data Management Technologies II
- Solaris SAM-QFS Filesystem
- SAM: Storage Archive Manager filesystem
- Information Life-cycle Management
- Suitable for Scientific I/O and Data
25 Sun Content Infrastructure System: Automated, Policy-based Data Management
- Dynamic, application-transparent movement of data to appropriate class of storage
- Automatic recovery/recall of data from any storage tier
- On-demand restore of files from user or application
- Continuous archive to tape via global archiving policies
[Figure: applications and three clients connected to Policy and Archiving Services, which reach Tiered Storage over a SAN Fabric]
26 Dynamic Tiered Storage
- Before (traditional file system): accumulating all data on disk
- After (SAM-FS and QFS): data transparently moved to the most cost-effective media via user-set policies
- Tier 1: FC disks
- Tier 2: SATA
- Tier 3: Tape
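A user-set policy of the kind mentioned on this slide can be pictured as a rule that maps a file's age to a tier. The sketch below is purely illustrative: the age thresholds, tier labels, and function name are assumptions, and real SAM-FS policies are configured through its own archiver settings rather than code like this.

```python
# Hypothetical policy: (maximum days since last access, tier label)
ASSUMED_POLICY = [
    (30, "tier1-fc-disk"),
    (365, "tier2-sata"),
]


def pick_tier(days_since_access: int, policy=ASSUMED_POLICY,
              fallback: str = "tier3-tape") -> str:
    """Return the first tier whose age limit covers the file, else the fallback tier."""
    for max_age_days, tier in policy:
        if days_since_access <= max_age_days:
            return tier
    return fallback


if __name__ == "__main__":
    for age in (3, 90, 1000):
        print(f"{age:>5} days idle -> {pick_tier(age)}")
```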
27 SAM-FS Software: Advanced Storage Management
28 SAM-FS: Advanced Storage Management
- Product design has taken full advantage of Solaris Operating Environment multithreading
- High-speed, parallel archiving and retrieving of files to multiple devices at full rated streaming speeds
- Data may be immediately available to users or applications during file retrieval, with minimal or no waiting for the stage to complete
- Optimized handling of large and small files: no penalty for small files
- Disaster recovery can average 100,000 inodes per minute
- Virtually unmatched performance
- Complete file system
- Complete media management
- Multi-layered data protection
- Storage policy management
- Advanced storage management
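To put the quoted recovery rate in perspective, the following back-of-the-envelope sketch estimates how long metadata recovery would take at 100,000 inodes per minute. The file counts are made-up inputs, and actual rates depend on hardware and configuration.

```python
def restore_minutes(inode_count: int, inodes_per_minute: int = 100_000) -> float:
    """Estimated minutes to recover the given number of inodes at a steady rate."""
    if inode_count < 0 or inodes_per_minute <= 0:
        raise ValueError("counts must be non-negative and the rate positive")
    return inode_count / inodes_per_minute


if __name__ == "__main__":
    for count in (1_000_000, 10_000_000, 100_000_000):  # hypothetical file counts
        minutes = restore_minutes(count)
        print(f"{count:>11,} inodes: {minutes:,.0f} min ({minutes / 60:.1f} h)")
```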
29 High-performance 64-bit File System
- Provides storage management capabilities through standard UNIX file system interface
- Operations and hardware transparent to users and applications
- No kernel modifications
- No proprietary database (.inodes file)
- Virtually unlimited size of files, number of files, and number of file systems
- Supports direct access, ftp, NFS, rcp, and so on
30 High-performance 64-bit File System
- Dynamically allocated inodes
- Disk file systems may be extended almost indefinitely (up to 2^64 - 1 bytes, about 18.4 exabytes)
- Dual and variable block allocation units
- Increase performance and disk utilization
- Adjust DAU based on hardware
- High-speed data transfer for small and large files
- Block read/write-ahead adjustable per file system
31 Sun™ SAM-FS Software Block Diagram
[Figure: block diagram. Recoverable labels: user applications (NFS apps, FTP, shell, and so on); SAM-INIT master process; archiver (arfind, arcopy); releaser process; Sun SAM-FS admin commands (label, import, export, and so on); robotic control processes; catalog(s); manually loaded removable media]
32 Sun™ SAM-FS Software: Automated Robotic Control
- Complete control of almost all major libraries
- Network attached: STK, IBM, EMASS/Grau
- Direct attached: Ampex, Sony, ATL, STK, others
- Support for almost all major tape and optical devices
- DLT, STK Redwood SD-3, Magstar, Sony DTF, Ampex DST, AIT, 3490E, HP, Sony, and others
- Variable tape block size up to 2 GB
- ANSI tape label processing
- Catalog management
- Stores important information such as number of mounts, mount date, fill grade, VSN, and so on in a catalog for each robot individually
- Includes support for off-site tapes in a special historian catalog
33 SAM-FS Software: Automatic Volume Management
- Simplified access to tape-based data
- Applications perform normal I/O to a file name
- SAM-FS automatically performs the tape mounts, label processing, positioning, data transfers, and so on
- Multi-volume tape capability
- Files can span multiple volumes
- Multiple tape library capability
- Supports multiple, different types of libraries simultaneously
- Multiple drives and media types capability
- Transparently read/write to multiple devices simultaneously at device speeds
- Bad media handling and history tracking
34 SAM-FS Software: Tape Management System
- Automated tape management for tape-based applications
- Directly write/read tapes in virtually any format using standard UNIX® commands or applications
- User data sets resident on a tape or tapes can be automatically referenced through a single file name
- ANSI label processing (write labels, verify labels)
- Multiple volume support
- Bar code support
35 SAM-FS Software: Advanced Storage Management
- Flexible policy management for file grouping, media assignment, and so on
- Special file attributes to customize and automatically control file access depending on user needs
- Associative archiving/retrieving, direct retrieve/stage, thumbnails, and so on
- "Time-based" archiving to best protect users' data
- No additional backup of user data necessary
36 SAM-FS Software: Advanced Storage Management
- Includes almost all standard HSM capabilities
- Provides storage management capabilities through standard UNIX file system interface
- "Virtual Disk": almost all data appears online to the user, whether it is online, near-line, or off-line
- API available for flexible application control
37 Advanced Storage and Archive Manager
- Archive: Copy files from disk cache to removable media automatically, without operator intervention
- Release: Manage disk space; free up copied files from disk cache automatically
- Stage: Automatically bring copied files back to disk cache when accessed
- Recycle: Repack removable media for reuse
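The first three operations form a simple per-file lifecycle, which the toy model below sketches. It is not SAM-FS code: the class and function names are hypothetical, recycling of media is left out, and the rule that release requires an existing archive copy is taken from a later slide in this deck.

```python
from dataclasses import dataclass


@dataclass
class ManagedFile:
    name: str
    on_disk: bool = True      # data currently present in the disk cache
    archive_copies: int = 0   # copies written to removable media


def archive(f: ManagedFile) -> None:
    """Copy the file from disk cache to removable media."""
    f.archive_copies += 1


def release(f: ManagedFile) -> None:
    """Free the disk cache space, but only if a valid archive copy exists."""
    if f.archive_copies == 0:
        raise RuntimeError(f"{f.name}: no archive copy, refusing to release")
    f.on_disk = False


def stage(f: ManagedFile) -> None:
    """Bring a released file back into the disk cache."""
    if not f.on_disk:
        f.on_disk = True


def read(f: ManagedFile) -> str:
    """Accessing a released file triggers a stage first."""
    stage(f)
    return f.name


if __name__ == "__main__":
    f = ManagedFile("results.dat")
    archive(f)
    release(f)
    print("after release, on disk:", f.on_disk)
    read(f)
    print("after read, on disk:", f.on_disk)
```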
38 Advanced Storage and Archive Manager
- Advanced storage management - options
- Archive
- Release
- Stage
- Recycle
- Archive sets can be defined freely
- By user, group, minsize, maxsize, directory, file, and wildcard
- Copy to specified pool(s) of media
- Support for scratch pools
- Automatic archive set generation
- Reserve media, used with scratch pools
- Optimized archiving of large and small files
- Default is large files first, then small files
- Files are optionally copied together
- Join dir/size - sort date/size
- Data verification: archive files with "checksum"
- Shell commands (archive, sls, sfind, and so on) available
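The archive-set criteria listed on this slide (user, group, minsize, maxsize, directory, wildcard) can be pictured as a filter applied to each file. The sketch below is a simplified model and not the SAM-FS configuration syntax; the class, its fields, and the first-match-wins rule are assumptions made for illustration.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase
from typing import Optional


@dataclass
class ArchiveSet:
    name: str
    directory: str = "/"
    user: Optional[str] = None
    group: Optional[str] = None
    minsize: int = 0
    maxsize: Optional[int] = None
    pattern: str = "*"

    def matches(self, path: str, owner: str, group: str, size: int) -> bool:
        """True when the file satisfies every criterion set on this archive set."""
        basename = path.rsplit("/", 1)[-1]
        in_dir = path.startswith(self.directory.rstrip("/") + "/")
        return (in_dir
                and self.user in (None, owner)
                and self.group in (None, group)
                and size >= self.minsize
                and (self.maxsize is None or size <= self.maxsize)
                and fnmatchcase(basename, self.pattern))


def assign(sets, path, owner, group, size):
    """Return the name of the first matching archive set (assumed rule), or None."""
    for archive_set in sets:
        if archive_set.matches(path, owner, group, size):
            return archive_set.name
    return None


if __name__ == "__main__":
    sets = [ArchiveSet("images", directory="/data", pattern="*.tif", minsize=1024),
            ArchiveSet("default")]
    print(assign(sets, "/data/scan/a.tif", "bob", "staff", 5000))
    print(assign(sets, "/home/bob/notes.txt", "bob", "staff", 200))
```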
39 Advanced Storage and Archive Manager
- Release only if a valid copy exists
- Partial release of data possible
- Beginning of file (stub) stays on disk
- For file manager application, and so on
- Amount to be released can be specified
- Release directly after archiving
- Release at watermarks
- Optionally never release data
- Data always stays in the disk cache
- Shell commands to directly control releasing
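Releasing at watermarks can be sketched as follows: once disk cache usage reaches a high watermark, archived files are released until usage falls to a low watermark. This is a minimal illustration with assumed thresholds (80% and 60%) and an assumed oldest-access-first ordering. The names are hypothetical and the real releaser's priorities may differ.

```python
from dataclasses import dataclass


@dataclass
class CachedFile:
    name: str
    size: int
    archived: bool     # a valid archive copy exists
    last_access: int   # larger means more recent


def release_at_watermarks(files, capacity, high=0.80, low=0.60):
    """Return names of files to release so usage drops from the high to the low watermark."""
    used = sum(f.size for f in files)
    if used < high * capacity:
        return []  # below the high watermark: nothing to do
    released = []
    for f in sorted(files, key=lambda f: f.last_access):
        if used <= low * capacity:
            break
        if not f.archived:
            continue  # never release a file without a valid copy
        used -= f.size
        released.append(f.name)
    return released


if __name__ == "__main__":
    cache = [CachedFile("a", 30, True, 1),
             CachedFile("b", 30, False, 2),
             CachedFile("c", 30, True, 3)]
    print(release_at_watermarks(cache, capacity=100))
```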
40 Advanced Storage and Archive Manager
- Associative stage
- Files with the associative attribute set will be staged together
- Read-behind stage
- Data is immediately available to users or applications while staging of the file is still in progress
- Never stage
- Data will be given to the user or application directly from the media without going through the disk cache
- Pre-stage
- Automatically stage a selection of data back to the disk cache
- Shell commands to directly control staging
41 Advanced Storage and Archive Manager
- Consolidate media with "inactive" files
- "Inactive" files are files that no longer exist on the file system, that is, they have no inode entry
- Policies to define level of recycling
- Percent of "active" files on media with specific fill grade
- Recycle media by robot or archive set
- Recycle multiple volumes in parallel
- Exclude volumes from recycling
- Automatically or manually re-label media after all active files have been moved to other media
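The recycling policy described on this slide (percentage of active files on sufficiently full media, with optional exclusions) can be sketched as a selection function. The thresholds, field names, and function name below are assumptions for illustration and do not reflect the actual recycler configuration.

```python
from dataclasses import dataclass


@dataclass
class Volume:
    vsn: str            # volume serial name
    fill_pct: float     # how full the medium is
    used_bytes: int     # bytes written to the medium
    active_bytes: int   # bytes still referenced by an inode


def recycle_candidates(volumes, max_active_pct=50.0, min_fill_pct=90.0, excluded=()):
    """Select volumes that are full enough and hold mostly inactive data."""
    candidates = []
    for v in volumes:
        if v.vsn in excluded or v.fill_pct < min_fill_pct or v.used_bytes == 0:
            continue
        active_pct = 100.0 * v.active_bytes / v.used_bytes
        if active_pct <= max_active_pct:
            candidates.append(v.vsn)
    return candidates


if __name__ == "__main__":
    library = [Volume("T001", 95.0, 1000, 200),   # mostly inactive: recycle
               Volume("T002", 95.0, 1000, 900),   # mostly active: keep
               Volume("T003", 40.0, 400, 50),     # not full enough yet
               Volume("T004", 99.0, 1000, 100)]   # excluded below
    print(recycle_candidates(library, excluded={"T004"}))
```

After the remaining active files on a selected volume are copied elsewhere, the medium can be re-labeled and reused, as the slide notes.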
42 Sun™ SAM-FS Software Overview: Drive Support
- Tapes
- 3480/3490E (1/2-inch tape)
- Ampex DST 310
- Ampex DST 312
- DLT 2000/4000/7000
- Exabyte 8505c (8mm tape)
- IBM Magstar 3590/3590E
- STK Redwood SD-3
- STK TimberLine 9490
- STK 9840
- Sony AIT/AIT2
- Sony DTF
- Sun DAT (4mm tape)
- Optical Disk
- HP 1714/15/16T (1.3 GB)
- HP C1113F (2.6 GB)
- IBM 632-C2X (5-1/4" WORM)
- Maxoptix T4-2600 (1.3/2.6 GB)
- Nikon DD121 (8.0 GB)
- Sony SMO-F531/F541/F551
- Plasmon DW260 (LIMDOW P2300DW/P2600DW)
43 Sun™ SAM-FS Software Overview: Robot Support
- Optical Media Changer
- DISC DocuStore
- HP SureStore ex/fx/st/t series
- HP 1714T/1715T
- LMS 4500/6600
- Maxoptix MX-552
- Plasmon 260/520/695 series
- Tape Media Changer
- ADIC Scalar 224/448/458
- Ampex DST 410/412/712/810/812
- ASACA AD-Series 15-900
- ATL L500/P1000/P3000
- ATL 520/2640/7100 series
- Breece Hill Q series
- EMASS AML series
- GRAU ABBA series
- IBM 3494 dataserver
- IBM 3570 Magstar MP
- Mountain Gate D-28/60/360/900 series
- Mountain Gate N-300/540
- Sony DMS-8400 PetaSite
- Spectra Logic 2000/9000/10000S
- Overland Data LXB
- Qualstar TLS 4000
- Quantum DLT 2500/2700/4500/4700/XT
- STK PowderHorn 9310
- STK WolfCreek 9360
- STK TimberWolf 9710/14/30/40
- Sony DMS B9/35
44 SAM-QFS - Summary
- Easily keeps up with data growth
- Even the largest file systems can be created
- Extensive tape library and media support
- Restoring data is near-instant
- Multiple copies of data provide disaster recovery capability
- Important middleware for Information Lifecycle Management
45 Key Points to Remember
- Data is growing exponentially
- Not all data are equal
- Even very old data may need to be submitted to auditors one day, so the data cannot be destroyed
- Data management is a challenge
- Information Lifecycle Management tools, processes and technologies will help to meet the challenge
46 Summary: Points to Remember
- Data is growing exponentially
- Not all data are equal
- Even very old data may need to be submitted to court one day, so the data cannot be destroyed
- Data management is a challenge
- Information Lifecycle Management tools, processes and technologies will help to meet the challenge
[Figure label: Data Store/Retrieve]