1
Data Management Challenges, Practices and
Technologies
  • Dr. P. Sambath Narayanan
  • Senior Technology Architect
  • Customer Experience Centre
  • Sun Microsystems India

2
Three Phases of Data Management: Data at Work,
Data in Motion, Data at Rest
Capture
Creation
3
Today's Data Management Challenges: The Budget
and Data Growth Gap
  • Budgets not keeping up with demand
  • Storage proliferation enterprise-wide
  • Current storage strategy cannot keep up with data
    growth
  • Storage management increasingly a burden
  • New applications not easily accommodated

4
All Data is Not Created Equal
5
Sun Storage
6
Multi-Tiered Storage
  • Support for all Sun Storage Systems
  • Cost-differentiated Storage strategy with
    centralized provisioning

7
Heterogeneous Storage Pooling
  • Third party storage system capacities in storage
    pool
  • VLVs open to replication and mirroring
  • Easy data migration
  • Investment protection

8
Data Management Practices
ENTERPRISE CONTENT MANAGEMENT
BUSINESS AND REGULATORY COMPLIANCE
Achieve operational goals, meet regulatory
requirements
Data Management, Policy-based Archive, Data
Warehousing
BUSINESS CONTINUITY
ENGAGEMENT SERVICES
Data Continuance, Operational Resilience, Disaster
Recovery
IT INFRASTRUCTURE CONSOLIDATION
Centralize Management, Consolidate
Resources, Migration
END-TO-END INFORMATION MANAGEMENT
9
Data Management Technologies I
  • Solaris QFS Filesystem
  • High Performance
  • Suitable for Scientific I/O and Data

10
Technical Overview
  • Technical overview
  • Variable Disk Allocation Unit (DAU) size
  • Metadata separation
  • Multiple stripe options
  • SAN file system support
  • Automatic direct I/O
  • Pre-allocation of disk blocks
  • Quick-write feature
  • Multiple threads for reads or writes
  • Integrated volume management
  • Q-start capability

11
Technical Overview
  • Variable DAU size
  • Adjusted/optimized based on hardware
  • Allows aligning disk I/O with hardware to
    optimize read/write performance

Sun™ QFS Software Technology
  • DAU size
  • Metadata
  • Stripe options
  • SAN support
  • Direct I/O
  • Pre-allocation
  • Quick-write
  • Multithreaded
  • Volume management
  • Q-start feature

Figure: RAID array with five 128k segments (stripe size 128k, DAU size 640k).
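
The numbers in the figure above suggest a DAU sized to cover one full stripe across the array: five segments of 128k add up to 640k. A minimal sketch of that arithmetic follows, assuming the DAU should equal the per-disk stripe size multiplied by the number of data disks. The function name is hypothetical and is not part of QFS.

```python
def full_stripe_dau_kb(stripe_kb: int, data_disks: int) -> int:
    """Return a DAU size in KB that covers exactly one full stripe.

    Assumption: aligning the DAU with the full stripe width lets each
    allocation touch every data disk once.
    """
    if stripe_kb <= 0 or data_disks <= 0:
        raise ValueError("stripe size and disk count must be positive")
    return stripe_kb * data_disks


# Values taken from the slide: 128k stripe size, five segments.
print(full_stripe_dau_kb(128, 5))  # 640
```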
12
Technical Overview
  • High-performance metadata positioning
  • Separates file system metadata (inodes, indirect
    extents, directories) from user data
  • No head seek conflict on reads and writes of
    short and long data

Figure: Sun™ QFS file system with metadata stored separately from file data.
13
Technical Overview
  • Multiple stripe options
  • Standard block level
  • Stripe a file on block level over a disk, RAID
    array, group of disks, or striped group
  • Stripe groups
  • Group disks or array of disks (RAID, etc.)
    together for optimized I/O
  • Round robin
  • Keep a complete file within a disk, array, or
    striped group

Figure: Sun™ QFS software with separate metadata and file data devices.
14
Striping Options
Figure: metadata plus single disks or RAID arrays, shown for each mode.
  • Standard Striping: multiple I/O streams; each stream (file) is
    transferred partially to multiple drives in parallel, I/O based on
    DAU size
  • Round Robin: multiple I/O streams; each stream (file) is
    transferred entirely to a single drive, in parallel, I/O based on
    file size
15
Striping Options
Figure: Sun™ QFS software with metadata plus striped groups of single disks or RAID arrays, shown for each mode.
  • Standard Striping: multiple I/O streams; each stream (file) is
    transferred partially to multiple groups of striped drives in
    parallel, I/O based on DAU size
  • Round Robin: multiple I/O streams; each stream (file) is
    transferred entirely to a single group of striped drives, in
    parallel, I/O based on file size
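
The difference between the two modes can be sketched as a placement rule. Under striping, consecutive DAUs of one file rotate across devices; under round robin, each whole file lands on one device and the next file goes to the next device. This is an illustrative model only, with hypothetical function names, and it does not reproduce the QFS allocator.

```python
def striped_layout(dau_count: int, devices: int) -> list:
    """Device index for each DAU of a single file under striping."""
    return [dau % devices for dau in range(dau_count)]


def round_robin_layout(file_count: int, devices: int) -> list:
    """Device index that holds each whole file under round robin."""
    return [file_no % devices for file_no in range(file_count)]


# One file of five DAUs spread over three devices.
print(striped_layout(5, 3))      # [0, 1, 2, 0, 1]
# Four files, each kept whole on one of three devices.
print(round_robin_layout(4, 3))  # [0, 1, 2, 0]
```

The same rule applies whether a "device" is a single disk, a RAID array, or a striped group.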
16
Technical Overview
  • Supports Fibre Channel disk devices
  • Share the file system between multiple hosts
  • No need to duplicate data or hardware
  • Multiple readers and writers
  • Useful for high availability and fail-over
    environments
  • Excellent for shared environments, load balancing
    systems, or other environments

Figure: one reader/writer host and two reader hosts sharing a Sun™ QFS file system over a Fibre Channel fabric.
17
Technical Overview
  • QFS fully supports direct I/O
  • Automatically switch between page I/O and direct
    I/O depending on I/O size
  • Set special attributes to force direct I/O for
    specific files or directories or enabled by API
  • Attributes on directories are inherited
  • Optionally, force direct I/O on all files in a
    file system by mount parameter
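
The switching behaviour described above can be sketched as a small decision function. The 256 KB threshold and all names here are assumptions made for illustration; the slide does not state the actual threshold or the names of the QFS attributes and mount parameters.

```python
def choose_io_path(request_bytes: int,
                   threshold_bytes: int = 256 * 1024,
                   force_direct: bool = False) -> str:
    """Pick "direct" or "paged" I/O for one request.

    force_direct models a file, directory, or mount-level setting that
    forces direct I/O; otherwise large requests bypass the page cache.
    The default threshold is a hypothetical value.
    """
    if force_direct or request_bytes >= threshold_bytes:
        return "direct"
    return "paged"


print(choose_io_path(4096))                     # paged
print(choose_io_path(1024 * 1024))              # direct
print(choose_io_path(4096, force_direct=True))  # direct
```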

Figure: an application's system call passes through the virtual file system to either direct I/O or paged I/O.
18
Technical Overview
  • Supports pre-allocation of disk blocks
  • Among the best performance for large, sequential
    I/O
  • Helps ensure that contiguous disk blocks are
    allocated
  • Reads continuously without having to move/seek
    around the disk
  • Since metadata is separated, head is not
    disturbed during I/O
  • Can be used with direct I/O
  • Can be switched on by API or attribute

Figure: blocks are allocated sequentially on the disk.
19
Technical Overview
  • Quick-write features
  • Switches off write lock in virtual file system
    layer
  • Allows simultaneous reads and writes to same file
  • Application must know and control multiple writes
  • Can be switched on by API or attribute
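
Because the application must coordinate its own writers, one common approach is to give each writer a disjoint byte range. The sketch below shows that coordination with ordinary Python file handles and threads on any file system; it does not call a QFS API, and the function names are hypothetical.

```python
import os
import tempfile
import threading


def write_region(path: str, offset: int, payload: bytes) -> None:
    """Write payload at offset through this thread's own file handle."""
    with open(path, "r+b") as handle:
        handle.seek(offset)
        handle.write(payload)


def parallel_fill(path: str, chunks: list) -> None:
    """Start one writer per chunk, each owning a non-overlapping range."""
    threads, offset = [], 0
    for chunk in chunks:
        threads.append(threading.Thread(
            target=write_region, args=(path, offset, chunk)))
        offset += len(chunk)
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()


def demo() -> bytes:
    """Fill a pre-sized temporary file from three concurrent writers."""
    chunks = [b"A" * 4, b"B" * 4, b"C" * 4]
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"\0" * sum(len(c) for c in chunks))
        path = tmp.name
    try:
        parallel_fill(path, chunks)
        with open(path, "rb") as handle:
            return handle.read()
    finally:
        os.remove(path)


print(demo())  # b'AAAABBBBCCCC'
```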

Figure: simultaneous writes to File A in a Sun™ QFS file system.
20
Technical Overview
  • Fully threaded
  • Multiple, simultaneous reads, writes, etc.
  • Multiple read/write threads per file I/O,
    selectable by API or attribute
  • Supports multiple file systems
  • Each file system can be tuned and configured
    independently
  • Virtually unlimited number of files per file
    system
  • Inodes are dynamically allocated
  • True 64-bit file system
  • Supports file sizes up to 18.4 EB (true 64-bit)
  • No kernel modifications
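
The 18.4 EB figure follows from the largest value a 64-bit unsigned offset can hold. A short check of that arithmetic, using decimal exabytes (10^18 bytes); the function name is only illustrative:

```python
def max_size_exabytes(bits: int = 64) -> float:
    """Largest byte count addressable with `bits` bits, in decimal EB."""
    return (2 ** bits - 1) / 10 ** 18


print(f"{max_size_exabytes():.1f} EB")  # 18.4 EB
```

Measured in binary units (2^60 bytes), the same limit is just under 16 EiB.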

21
Technical Overview
  • Integrated volume management
  • Provided internal to the file system
  • Create file systems from slices, complete disks,
    RAID subsystems, or meta-devices
  • Create one of the largest file systems in seconds
  • Grow file systems and add devices without
    dump/restore

Figure: a file system built from metadata devices and data slices
(c10t0d0s2, c9t0d0s2, c8t0d0s2, c4t0d0s1, c5t0d0s1, c0t0d0s0, c6t0d0s1,
c1t0d0s1, c7t0d0s1, c2t1d0s0, c3t0d0s2, c3t1d0s1, c4t0d0s0, c5t0d0s0,
c6t0d0s0, c5t0d0s2, c7t0d0s0).
22
Technical Overview
  • Q-start provides instant-on technology
  • Keeps file system clean with virtually no system
    overhead
  • Integrated error checking on all critical I/O
  • Serialization of critical metadata writes
  • Keeps identification records on metadata, which
    can be dynamically detected and recovered
  • No fsck required after interruption
  • Even the largest file systems are created in seconds
  • Dynamic inode allocation for almost unlimited
    number of files
  • Grow file system without dump/restore

23
QFS - Summary
  • High performance
  • Provides users and applications with one of
    the fastest file systems available on the Solaris
    Operating Environment today
  • Provides near linear scaling when adding hardware
  • Includes internal volume manager
  • Extremely low CPU usage, even at maximum I/O rates

24
Data Management Technologies II
  • Solaris SAM-QFS Filesystem
  • SAM: Storage Archive Manager file system
  • Information Life-cycle Management
  • Suitable for Scientific I/O and Data

25
Sun Content Infrastructure System: Automated,
Policy-based Data Management
Applications
  • Dynamic, application-transparent movement of data
    to appropriate class of storage
  • Automatic recovery/recall of data from any
    storage tier
  • On-demand restore of files from user or
    application
  • Continuous archive to tape via global archiving
    policies

Figure: three clients, policy and archiving services, SAN fabric, and tiered storage.
26
Dynamic Tiered Storage
Figure: a traditional file system compared with SAM-FS and QFS across
Tier 1 (FC disks), Tier 2 (SATA), and Tier 3 (tape).
  • Before: accumulating all data on disk
  • After: data transparently moved to the most cost-effective media
    via user-set policies
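
A user-set policy of this kind can be pictured as a rule that maps a file to a tier. The sketch below assumes an age-based rule with made-up thresholds of 30 and 180 days; the slide only says policies are user-set, so both the criterion and the numbers are assumptions, and the tier labels simply mirror the figure.

```python
def pick_tier(days_since_access: int,
              fc_days: int = 30,
              sata_days: int = 180) -> str:
    """Map a file's idle time to one of the three tiers in the figure.

    The thresholds are hypothetical; a real policy could also key on
    size, owner, or directory.
    """
    if days_since_access < fc_days:
        return "tier1-fc-disk"
    if days_since_access < sata_days:
        return "tier2-sata"
    return "tier3-tape"


for idle_days in (3, 90, 400):
    print(idle_days, pick_tier(idle_days))
```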
27
SAM-FS software Advanced Storage Management
28
SAM-FS Advanced Storage Management
  • Product design has taken full advantage of
    Solaris Operating Environment multithreading
  • High speed, parallel archiving, and retrieving of
    files to multiple devices at full rated streaming
    speeds
  • Data may immediately be available to users or
    applications during file retrieval - with minimum
    or no waiting for the stage to complete
  • Optimized handling of large and small files, with no
    penalty for small files
  • Disaster recovery can average 100,000 inodes per
    minute
  • Virtually unmatched performance
  • Complete file system
  • Complete media management
  • Multi-layered data protection
  • Storage policy management
  • Advanced storage management

29
High-performance 64-bit File System
  • Provides storage management capabilities through
    standard UNIX file system interface
  • Operations and hardware transparent to users AND
    applications
  • No kernel modifications
  • No proprietary database (.inodes file)
  • Virtually unlimited size of files, number of
    files, and number of file systems
  • Supports direct access, ftp, NFS, rcp, and so on

30
High-performance 64-bit File System
  • Dynamically allocated inodes
  • May extend disk file systems indefinitely (2^64 - 1
    bytes, about 18.4 exabytes)
  • Dual and variable block allocation units
  • Increase performance and disk utilization
  • Adjust DAU based on hardware
  • High-speed data transfer for small and large
    files
  • Block read/write-ahead adjustable per file system

31
Sun™ SAM-FS Software Block Diagram
  • User applications: NFS apps, FTP, shell, and so on
  • SAM-INIT master process
  • Archiver: arfind, arcopy
  • Releaser process
  • Sun SAM-FS admin commands: label, import, export, and so on
  • Manually loaded removable media
  • Robotic control processes, each with catalog(s): generic SCSI,
    StorageTek, IBM, GRAU

32
Sun™ SAM-FS Software: Automated Robotic Control
  • Complete control of most major libraries
  • Network attached: STK, IBM, EMASS/Grau
  • Direct attached: Ampex, Sony, ATL, STK, others
  • Support for most major tape and optical
    devices
  • DLT, STK Redwood SD-3, Magstar, Sony DTF, Ampex
    DST, AIT, 3490E, HP, Sony, and others
  • Variable tape block size up to 2 GB
  • ANSI tape label processing
  • Catalog management
  • Stores important information like number of
    mounts, mount date, fill grade, vsn name, and so
    on in a catalog for each robot individually
  • Includes support for off-site tapes in a special
    historian catalog

33
SAM-FS Software Automatic Volume Management
  • Simplified access to tape-based data
  • Applications perform normal I/O to a file name
  • SAM-FS automatically performs the tape mounts,
    label processing, positioning, data transfers,
    and so on
  • Multi-volume tape capability
  • Files can span multiple volumes
  • Multiple tape library capability
  • Supports multiple, different types of libraries
    simultaneously
  • Multiple drives and media types capability
  • Transparently read/write to multiple devices
    simultaneously at device speeds
  • Bad media handling and history tracking

34
SAM-FS Software Tape Management System
  • Automated tape management for tape-based
    applications
  • Directly write/read tapes in virtually any format
    using standard UNIX commands or applications
  • User data sets resident on a tape or tapes can be
    automatically referenced through a single file
    name
  • ANSI label processing (write labels, verify
    labels)
  • Multiple volume support
  • Bar code support

35
SAM-FS Software Advanced Storage Management
  • Flexible policy management for file grouping,
    media assignment, and so on
  • Special file attributes to customize and
    automatically control file access depending on
    user needs
  • Associative archiving/retrieving, direct
    retrieve/stage, thumbnails, and so on
  • "Time-based" archiving to best protect users' data
  • No additional backup of user data necessary

36
SAM-FS Software Advanced Storage Management
  • Includes most standard HSM capabilities
  • Provides storage management capabilities through
    standard UNIX file system interface
  • "Virtual Disk": almost all data appears online to the
    user whether online, near-line, or off-line
  • API interface available for flexible application
    control

37
Advanced Storage and Archive Manager
  • Archive: Copy files from disk cache to removable
    media automatically without operator intervention
  • Release: Manage disk space, free up copied files
    from disk cache automatically
  • Stage: Automatically bring copied files back to
    disk cache when accessed
  • Recycle: Repack removable media for reuse
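
The archive, release, and stage steps form a simple life cycle for each file. The toy model below illustrates it under stated assumptions: all names are hypothetical, recycling is left out because it acts on media rather than on files, and none of this is the SAM-FS API.

```python
from dataclasses import dataclass


@dataclass
class ManagedFile:
    name: str
    on_disk: bool = True      # data currently held in the disk cache
    archive_copies: int = 0   # copies written to removable media


def archive(f: ManagedFile) -> None:
    """Copy the file from disk cache to removable media."""
    f.archive_copies += 1


def release(f: ManagedFile) -> None:
    """Free disk-cache space, but only when an archive copy exists."""
    if f.archive_copies == 0:
        raise RuntimeError("refusing to release: no archive copy exists")
    f.on_disk = False


def stage(f: ManagedFile) -> None:
    """Bring an archived file back into the disk cache."""
    if not f.on_disk:
        f.on_disk = True


def read_file(f: ManagedFile) -> str:
    """Access triggers staging automatically, as on the slide."""
    stage(f)
    return f"contents of {f.name}"


f = ManagedFile("report.dat")
archive(f)
release(f)
print(f.on_disk)      # False
print(read_file(f))   # contents of report.dat
print(f.on_disk)      # True
```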

38
Advanced Storage and Archive Manager
  • Advanced storage management - options
  • Archive
  • Release
  • Stage
  • Recycle
  • Archive sets can be defined freely
  • By user, group, minsize, maxsize, directory,
    file, and wildcard
  • Copy to specified pool(s) of media
  • Support for scratch pools
  • Automatic archive set generation
  • Reserve media, used with scratch pools
  • Optimized archiving of large and small files
  • Default is large files first then small files
  • Files are optionally copied together
  • Join dir/size - sort date/size
  • Data verification - archive files with
    "checksum"
  • Shell commands such as archive, sls, sfind, and so on
    are available
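
The selection criteria listed for archive sets (user, minsize, maxsize, directory, wildcard) can be modelled as a first-match rule list. This sketch is illustrative only: the class, its field names, and the example sets are hypothetical and do not reproduce the SAM-FS archiver configuration syntax.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase
from typing import Optional


@dataclass
class ArchiveSet:
    name: str
    directory: str = ""             # path prefix; empty matches all
    user: Optional[str] = None      # None matches any owner
    minsize: int = 0                # bytes, inclusive
    maxsize: Optional[int] = None   # bytes, inclusive; None = no limit
    pattern: str = "*"              # wildcard applied to the file name

    def matches(self, path: str, owner: str, size: int) -> bool:
        filename = path.rsplit("/", 1)[-1]
        return (path.startswith(self.directory)
                and (self.user is None or owner == self.user)
                and size >= self.minsize
                and (self.maxsize is None or size <= self.maxsize)
                and fnmatchcase(filename, self.pattern))


def assign(sets: list, path: str, owner: str, size: int) -> Optional[str]:
    """Return the name of the first archive set that matches the file."""
    for archive_set in sets:
        if archive_set.matches(path, owner, size):
            return archive_set.name
    return None


sets = [
    ArchiveSet("big_images", directory="/data/images/",
               minsize=1_000_000, pattern="*.tif"),
    ArchiveSet("everything"),
]
print(assign(sets, "/data/images/scan.tif", "alice", 5_000_000))  # big_images
print(assign(sets, "/data/images/scan.tif", "alice", 10))         # everything
```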

39
Advanced Storage and Archive Manager
  • Release only if a valid copy exists
  • Partial release of data possible
  • Beginning of file (stub) stays on disk
  • For file manager application, and so on
  • Amount to be released can be specified
  • Release directly after archiving
  • Release at watermarks
  • Optionally never release data
  • Data always stays in the disk cache
  • Shell commands to directly control releasing
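
Releasing at watermarks can be sketched as follows: once disk-cache usage reaches a high watermark, archived files are released until usage falls to a low watermark. The 80% and 60% values and the oldest-access-first ordering are assumptions for illustration; the slide does not specify either.

```python
def files_to_release(files: list, capacity: int,
                     high_pct: float = 80.0,
                     low_pct: float = 60.0) -> list:
    """Pick files to release from the disk cache.

    files holds (name, size, archived, last_access) tuples. Only files
    with a valid archive copy are eligible, matching the rule that
    release happens only if a valid copy exists.
    """
    used = sum(size for _, size, _, _ in files)
    if used * 100 < high_pct * capacity:
        return []                       # below the high watermark
    target = low_pct * capacity / 100
    released = []
    for name, size, archived, _ in sorted(files, key=lambda f: f[3]):
        if used <= target:
            break
        if archived:
            released.append(name)
            used -= size
    return released


cache = [("a", 40, True, 1), ("b", 30, False, 2), ("c", 20, True, 3)]
print(files_to_release(cache, capacity=100))  # ['a']
```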

40
Advanced Storage and Archive Manager
  • Associative stage
  • Files with the associative attribute set will be
    staged together
  • Read-behind stage
  • Data is immediately available to users or
    applications while staging of the file is still
    in progress
  • Never stage
  • Data will be given to user or application
    directly from the media without going through the
    disk cache
  • Pre-stage
  • Automatically stage a selection of data back to
    the disk cache
  • Shell commands to directly control staging

41
Advanced Storage and Archive Manager
  • Consolidates media that contain "inactive" files
  • "Inactive" files are files which no longer exist
    on file system, that is they have no inode entry
  • Policies to define level of recycling
  • Percent of "active" files on media with specific
    fill grade
  • Recycle media by robot or archive set
  • Recycle multiple volumes in parallel
  • Exclude volumes from recycling
  • Automatically or manually re-label media after
    all active files have been moved to other media
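
A recycling policy based on the share of "active" data per volume might look like the sketch below. It assumes a volume qualifies once its active share drops to a threshold (50% here, a made-up default) and that excluded volumes are skipped; names and numbers are hypothetical.

```python
def recycle_candidates(volumes: dict,
                       max_active_pct: float = 50.0,
                       excluded: tuple = ()) -> list:
    """Return volume names worth recycling, in sorted order.

    volumes maps a volume name to (active_bytes, used_bytes). Active
    bytes belong to files that still exist in the file system.
    """
    picks = []
    for vsn, (active, used) in sorted(volumes.items()):
        if vsn in excluded or used == 0:
            continue
        if active * 100 / used <= max_active_pct:
            picks.append(vsn)
    return picks


tapes = {"VOL001": (10, 100), "VOL002": (90, 100), "VOL003": (0, 100)}
print(recycle_candidates(tapes))                       # ['VOL001', 'VOL003']
print(recycle_candidates(tapes, excluded=("VOL003",))) # ['VOL001']
```

Once the remaining active files are copied elsewhere, the volume can be re-labeled for reuse.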

42
Sun™ SAM-FS Software Overview: Drive Support
  • Tapes
  • 3480/3490E (1/2-inch tape)
  • Ampex DST 310
  • Ampex DST 312
  • DLT 2000/4000/7000
  • Exabyte 8505c (8mm tape)
  • IBM Magstar 3590/3590E
  • STK Redwood SD-3
  • STK TimberLine 9490
  • STK 9840
  • Sony AIT/AIT2
  • Sony DTF
  • Sun DAT (4mm tape)
  • Optical Disk
  • HP 1714/15/16T (1.3 GB)
  • HP C1113F (2.6 GB)
  • IBM 632-C2X (5-1/4" WORM)
  • Maxoptix T4-2600 (1.3/2.6 GB)
  • Nikon DD121 (8.0 GB)
  • Sony SMO-F531/F541/F551
  • Plasmon DW260 (LIMDOW P2300DW/P2600DW)

43
Sun™ SAM-FS Software Overview: Robot Support
  • Spectra Logic 2000/9000/10000S
  • Overland Data LXB
  • Qualstar TLS 4000
  • Quantum DLT 2500/2700/4500/4700/XT
  • STK PowderHorn 9310
  • STK WolfCreek 9360
  • STK TimberWolf 9710/14/30/40
  • Sony DMS B9/35
  • Optical Media Changer
  • DISC DocuStore
  • HP SureStore ex/fx/st/t series
  • HP 1714T/1715T
  • LMS 4500/6600
  • Maxoptix MX-552
  • Plasmon 260/520/695 series
  • Tape Media Changer
  • ADIC Scalar 224/448/458
  • Ampex DST 410/412/712/810/812
  • ASACA AD-Series 15-900
  • ATL L500/P1000/P3000
  • ATL 520/2640/7100 series
  • Breece Hill Q series
  • EMASS AML series
  • GRAU ABBA series
  • IBM 3494 dataserver
  • IBM 3570 Magstar MP
  • Mountain Gate D-28/60/360/900 series
  • Mountain Gate N-300/540
  • Sony DMS-8400 PetaSite

44
SAM-QFS - Summary
  • Easily keeps up with data growth
  • Largest filesystems can be created
  • Exhaustive Tape Library and Media Support
  • Restoring data is instant
  • Having multiple copies of data provides Disaster
    Recovery features
  • Important middleware for Information Lifecycle
    Management

45
Key Points to Remember
  • Data is Growing Exponentially
  • Not all data are Equal
  • Even very old data may need to be submitted to
    auditors one day, so the data cannot be
    destroyed
  • Data Management is a challenge
  • Information Lifecycle Management tools, processes
    and technologies will help to meet the challenge

46
Summary: Points to Remember
  • Data is Growing Exponentially
  • Not all data are Equal
  • Even very old data may need to be submitted to a court
    one day, so the data cannot be destroyed
  • Data Management is a challenge
  • Information Lifecycle Management tools, processes
    and technologies will help to meet the challenge
