Title: Implementing VTL and De-Duplication Technology
- Gavin Cole
- Storage Consultant SEE
- Sun Microsystems
Factors impacting your ability to protect data
- Explosive data growth impacts performance and the ability to maintain backup SLAs
- Keep everything, delete nothing
- Backup/restore failures causing increased risk
- Management complexity making the backup process ineffective
- Cost to store data estimated at 5,935/TB per year
[Figure: Backup Window, Data Volume, Value of Data, Storage Complexity, Budgets]
Source: ESG Study
Market Trends
- InfoPro 2008
Backup Storage Optimization
Storage Optimization Cost Savings: Three Methods
- Straight disk for small-TB configurations
- VTL with hardware compression
  - Best used for performance-sensitive backups
  - Half the physical storage space needed to store your backup data (2:1 compression ratio)
  - Compression is applied across all storage within the VTL
- Data de-duplication
  - Flexibility: policy based
  - Best price per GB for optimized capacity
  - Look for duplication in data AND long retention timeframes
  - Best used for disaster recovery replication and storing months of backup versions
  - Engineering performance and/or higher utilization of storage
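The capacity effect of each method is a simple division of logical backup data by the reduction ratio. The minimal sketch below, assuming a hypothetical 100 TB of backup data and an assumed 10:1 de-duplication ratio (real ratios vary widely with the data), shows why 2:1 compression halves the physical storage needed.

```python
def physical_tb(logical_tb: float, reduction_ratio: float) -> float:
    """Physical capacity needed for `logical_tb` of backup data at an N:1 reduction ratio."""
    if reduction_ratio <= 0:
        raise ValueError("reduction_ratio must be positive")
    return logical_tb / reduction_ratio


# Hypothetical 100 TB of backup data; the 10:1 de-dup ratio is an assumption,
# since actual de-duplication ratios depend heavily on the data and retention.
scenarios = {
    "straight disk (1:1)": 1.0,
    "VTL hardware compression (2:1)": 2.0,
    "data de-duplication (assumed 10:1)": 10.0,
}
for name, ratio in scenarios.items():
    print(f"{name}: {physical_tb(100.0, ratio):.1f} TB")
```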
Virtual Tape Libraries
Open Systems Tape Emulation Improves Performance and Enhances Backup
- The backup application thinks it is talking to a tape library
- It is actually backing up to the VTL disk buffer
- Data can then be de-duplicated, replicated and/or migrated to tape
- Many more virtual tape resources
- Disk performance with tape look, feel, and cost structure
- Data movement without impact to servers or the backup window
- Emulated:
  - Library
  - Drive
  - Cartridges
[Diagram: Tape Automation, Sun VTL Appliance]
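The emulation idea, where the backup application sees a library, drives and cartridges while the data lives on disk, can be sketched as below. This is a minimal Python model with in-memory cartridges standing in for the VTL disk buffer; the class and method names (`VirtualTapeLibrary`, `mount`, `unmount`, `VirtualDrive.write`) are hypothetical and do not reflect the Sun VTL's internals or any backup product's interface.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class VirtualCartridge:
    """Emulated tape cartridge; its records actually live on disk (here: in memory)."""
    barcode: str
    records: list[bytes] = field(default_factory=list)


@dataclass
class VirtualDrive:
    """Emulated tape drive with sequential, tape-like read/write semantics."""
    name: str
    loaded: VirtualCartridge | None = None
    position: int = 0

    def write(self, record: bytes) -> None:
        if self.loaded is None:
            raise RuntimeError(f"{self.name}: no cartridge loaded")
        # Tape semantics: a write discards everything after the current position.
        del self.loaded.records[self.position:]
        self.loaded.records.append(record)
        self.position += 1

    def rewind(self) -> None:
        self.position = 0

    def read(self) -> bytes | None:
        if self.loaded is None:
            raise RuntimeError(f"{self.name}: no cartridge loaded")
        if self.position >= len(self.loaded.records):
            return None  # end of data
        record = self.loaded.records[self.position]
        self.position += 1
        return record


class VirtualTapeLibrary:
    """Emulated library: slots hold cartridges, drives mount them on request."""

    def __init__(self, barcodes: list[str], drive_count: int) -> None:
        self.slots = {b: VirtualCartridge(b) for b in barcodes}
        self.drives = [VirtualDrive(f"drive{i}") for i in range(drive_count)]

    def mount(self, barcode: str, drive_index: int) -> VirtualDrive:
        drive = self.drives[drive_index]
        if drive.loaded is not None:
            raise RuntimeError(f"{drive.name} is busy")
        drive.loaded = self.slots.pop(barcode)  # KeyError if the cartridge is not in a slot
        drive.position = 0
        return drive

    def unmount(self, drive_index: int) -> None:
        drive = self.drives[drive_index]
        if drive.loaded is not None:
            self.slots[drive.loaded.barcode] = drive.loaded
            drive.loaded = None
```

Creating "many more virtual tape resources" in this model is just a matter of passing more barcodes or a larger `drive_count`, since nothing physical has to be bought.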
Do I need a VTL?
[Diagram: backup infrastructure (Backup Server, Media Server, Primary/Secondary Disk Array, SAN Disk, SAN Tape, IP Network, Drive Sharing Module, Vault), annotated with the following factors]
- Size of file systems used for D2D
- Global amount of data
- Restore times
- Size of the DB
- Number of backup policies
- Number of HBAs and operating systems
- CPU usage
- Number of volumes to be vaulted
- Multiple backup applications
- SAN throughput (difficult to feed)
- Cost of the drive sharing module
- Incremental backups on tape take time to restore
- Number of tape drives to manage
- Slow IP network
- Cloning times
- Number of media servers
- Start/stop mechanism
- Hardware errors
- Tape drive throughput (difficult to feed)
- Type of client (Oracle, SAP, Lotus, ...)
Integrated Tape with Your VTL
Integrated Tape VTL: Automated Tape Caching
- The VTL acts as a cache for the physical tape library, providing transparent access to data regardless of its location
  - Tapes always appear to be inside virtual libraries
  - The backup application always has direct access to data
  - The VTL maps directly to tape resources
- Policy-based migration of virtual cartridges to physical tape
  - Time based: daily or weekly
  - Policy based (can be combined):
    - Age of data
    - Disk space used
    - End of backup
- Flexible space reclamation policies:
  - Free space immediately upon migration
  - Free space after a specified retention period
  - Free space when running out of space
[Diagram: VTL, Tape Library]
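The migration triggers and reclamation options above amount to a small decision procedure. The Python sketch below is a hypothetical model, not the VTL's actual policy engine: it assumes the policy triggers are combined with OR, does not model the time-based (daily/weekly) schedule, and uses invented names (`MigrationPolicy`, `CartridgeState`, `can_reclaim`, `high_water_pct`).

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum


class Reclaim(Enum):
    """The three space-reclamation options listed on the slide."""
    IMMEDIATELY = "free space immediately upon migration"
    AFTER_RETENTION = "free space after the specified retention period"
    WHEN_LOW_ON_SPACE = "free space when running out of space"


@dataclass
class CartridgeState:
    """Hypothetical bookkeeping for one virtual cartridge held in the VTL cache."""
    age_days: float
    backup_finished: bool
    migrated: bool = False
    days_since_migration: float = 0.0


@dataclass
class MigrationPolicy:
    """Policy triggers from the slide; any enabled trigger causes migration (OR)."""
    max_age_days: float | None = None       # "Age of data"
    max_disk_used_pct: float | None = None  # "Disk space used"
    on_end_of_backup: bool = False          # "End of backup"

    def should_migrate(self, cart: CartridgeState, disk_used_pct: float) -> bool:
        if cart.migrated:
            return False
        if self.max_age_days is not None and cart.age_days >= self.max_age_days:
            return True
        if self.max_disk_used_pct is not None and disk_used_pct >= self.max_disk_used_pct:
            return True
        return self.on_end_of_backup and cart.backup_finished


def can_reclaim(cart: CartridgeState, mode: Reclaim, *, retention_days: float = 0.0,
                disk_used_pct: float = 0.0, high_water_pct: float = 90.0) -> bool:
    """Decide whether the disk copy of an already-migrated cartridge may be freed."""
    if not cart.migrated:
        return False  # never free the only copy of the data
    if mode is Reclaim.IMMEDIATELY:
        return True
    if mode is Reclaim.AFTER_RETENTION:
        return cart.days_since_migration >= retention_days
    return disk_used_pct >= high_water_pct  # Reclaim.WHEN_LOW_ON_SPACE
```

Because a migrated cartridge keeps its disk copy until reclamation, the cache can keep serving restores from disk while tape holds the long-term copy.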
Integrated VTL: NDMP 4.0
- NDMP 4.0 using NetBackup Direct to Tape copy
- The backup application is in complete control:
  - Mount requests: physical drive and tape
  - What data is copied
- The VTL is only the data mover
- Symantec's Veritas NetBackup 6.5.1 and above only
- VTL Plus only; requires the NDMP software option
Integrated VTL: NDMP 4.0 with ACSLS
[Diagram: Backup Clients, NetBackup 6.5 Master and Media Servers, SAN, VTL (data mover), ACSLS (mount request)]
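The division of labour in this configuration, where the backup application makes every decision and the VTL only moves data, can be illustrated with a toy model. The three classes below are hypothetical stand-ins for the roles of NetBackup, ACSLS and the VTL; they do not implement NDMP, the ACSLS interface, or any NetBackup API.

```python
class LibraryManager:
    """Stand-in for the library manager role (ACSLS): executes mount requests only."""

    def __init__(self, scratch_tapes: list[str]) -> None:
        self.scratch = list(scratch_tapes)
        self.mounted: dict[str, str] = {}  # drive -> physical volume serial

    def mount(self, drive: str, volser: str) -> None:
        self.scratch.remove(volser)
        self.mounted[drive] = volser


class DataMover:
    """Stand-in for the VTL: copies exactly what it is told to, decides nothing."""

    def __init__(self, virtual_tapes: dict[str, bytes]) -> None:
        self.virtual_tapes = virtual_tapes
        self.physical_tapes: dict[str, bytes] = {}

    def copy(self, src_barcode: str, dst_volser: str) -> None:
        self.physical_tapes[dst_volser] = self.virtual_tapes[src_barcode]


class BackupApplication:
    """Stand-in for the backup application: owns every decision and the catalog."""

    def __init__(self, library: LibraryManager, mover: DataMover) -> None:
        self.library = library
        self.mover = mover
        self.catalog: dict[str, str] = {}  # virtual barcode -> physical volser

    def duplicate(self, src_barcode: str, drive: str) -> str:
        volser = self.library.scratch[0]      # the application picks the physical tape
        self.library.mount(drive, volser)     # ...and issues the mount request
        self.mover.copy(src_barcode, volser)  # the VTL only moves the data
        self.catalog[src_barcode] = volser    # the application records where the copy lives
        return volser
```

Keeping the catalog in the backup application is what lets it restore from either the virtual or the physical copy later.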
Integrated VTL: ISV Copy Process
[Diagram: Backup Clients, Backup Servers, SAN, VTL (Virtual Tape Library), Tape Automation, MIAV (Man In A Van)]
- The ISV backup application manages virtual and physical tape devices and cartridges
- The backup application is in complete control:
  - Mount requests: physical drive and tape
  - What data is copied
- Highest performance
De-Duplication
What is data de-duplication?
- A means through which software analyzes data streams, using a hashing algorithm, to identify repetitive data
- A high-performance database is maintained to store a single instance of each unique block of data
- If data is identified as identical to existing data in the database, only a pointer to that data is stored, not a new instance of the data
- To restore data, pointers must be matched to the data in the database, and the data must be reconstituted
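The hash, store-once, point-and-reconstitute cycle described above can be sketched in a few lines of Python. This is a minimal illustration assuming fixed-size 4 KiB chunks, SHA-256 hashes and a plain dictionary in place of the high-performance database; real products may use variable-size chunking and different hash functions, and the function names are hypothetical.

```python
import hashlib

CHUNK_SIZE = 4096  # assumed fixed-size chunking, for illustration only


def dedup_store(data: bytes, store: dict[str, bytes], chunk_size: int = CHUNK_SIZE) -> list[str]:
    """Split `data` into chunks, keep one instance of each, return the list of pointers."""
    pointers = []
    for offset in range(0, len(data), chunk_size):
        chunk = data[offset:offset + chunk_size]
        digest = hashlib.sha256(chunk).hexdigest()
        if digest not in store:
            store[digest] = chunk  # first occurrence: store the single instance
        pointers.append(digest)    # every occurrence: store only a pointer
    return pointers


def restore(pointers: list[str], store: dict[str, bytes]) -> bytes:
    """Match each pointer to its stored block and reconstitute the original stream."""
    return b"".join(store[digest] for digest in pointers)
```

Backing up the same stream a second time adds pointers but no new blocks, which is where the savings over repeated full backups come from.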
Data De-duplication: Weighing the Benefits
Premise
- COST: Significant budget and space savings are possible because only one instance of data is stored
- COST: Double-digit compression ratios can be achieved
- COST: Bandwidth savings for replication can be significant
Considerations
- RELIABILITY: Because only one instance of data is stored, a single corruption will impact many references
- COMPLEXITY: Adds several additional steps to existing backup processes
- PLANNING: Actual compression ratios are very unpredictable
- PERFORMANCE: Extremely CPU-intensive, causing performance degradation
- COST: The required high-performance disk server may negate the cost savings of de-duplication
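The reliability consideration follows directly from pointer sharing: a damaged block is damaged for every backup that points to it. The hypothetical Python sketch below stores three tiny "weekly backups" that share a common block (4-byte chunks and SHA-256 are assumptions chosen only to keep the example small) and lists which backups a single corrupt block would affect.

```python
import hashlib


def store_backup(data: bytes, store: dict[str, bytes], chunk_size: int) -> list[str]:
    """Store each unique chunk once and return this backup's list of chunk pointers."""
    pointers = []
    for i in range(0, len(data), chunk_size):
        chunk = data[i:i + chunk_size]
        digest = hashlib.sha256(chunk).hexdigest()
        store.setdefault(digest, chunk)
        pointers.append(digest)
    return pointers


def backups_affected(corrupt_digest: str, backups: dict[str, list[str]]) -> list[str]:
    """Every backup that points at the damaged chunk is damaged too."""
    return sorted(name for name, ptrs in backups.items() if corrupt_digest in ptrs)


store: dict[str, bytes] = {}
backups = {
    "week1": store_backup(b"OSOSAPP1", store, 4),
    "week2": store_backup(b"OSOSAPP2", store, 4),
    "week3": store_backup(b"OSOSAPP3", store, 4),
}
shared = backups["week1"][0]  # the b"OSOS" chunk, common to all three backups
print(len(store), backups_affected(shared, backups))  # 4 ['week1', 'week2', 'week3']
```

Without de-duplication each week would hold its own copy of the common data, so one bad block would damage only one backup.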
Without Data De-Duplication
Sample parameters: data volume 20TB, 5% growth and weekly change, onsite retention 5 weeks
Data stored by week:
- Week 1: 20TB
- Week 2: 41TB
- Week 3: 63TB
- Week 4: 86TB
- Week 5: 110TB
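The chart's figures can be reproduced under one simple reading of the sample parameters: a full backup is taken every week, each full is 5% larger than the previous one, and all five weeks are retained on site. The Python sketch below is that assumed model, not necessarily the method used to draw the chart; `cumulative_stored_tb` is a hypothetical helper.

```python
def cumulative_stored_tb(initial_tb: float, weekly_growth: float, weeks: int) -> list[float]:
    """Total disk used when every weekly full backup is kept for the retention period."""
    totals = []
    full_tb, total = initial_tb, 0.0
    for _ in range(weeks):
        total += full_tb              # this week's full is added to everything retained
        totals.append(total)
        full_tb *= 1 + weekly_growth  # next week's full grows by the weekly growth rate
    return totals


totals = cumulative_stored_tb(20.0, 0.05, 5)
print([int(t) for t in totals])  # truncated to whole TB: [20, 41, 63, 86, 110]
```

Under this model the week-5 total is about 110.5TB, so the chart's labels match when values are truncated rather than rounded.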
With Data De-Duplication
Sample parameters: data volume 20TB, 5% growth and weekly change, onsite retention 5 weeks
86% reduction: data stored 15.2TB; redundant data NOT stored 94.8TB
Data stored by week:
- Week 1: 6.6TB
- Week 2: 10TB
- Week 3: 12TB
- Week 4: 14TB
- Week 5: 15.2TB
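The headline 86% figure is simply one minus the ratio of de-duplicated to non-de-duplicated storage at week 5. The short Python calculation below uses the weekly values from the two charts in this deck and prints the reduction week by week; the variable names are illustrative.

```python
# TB of data stored at the end of each week, taken from the two charts in this deck.
without_dedup = [20, 41, 63, 86, 110]
with_dedup = [6.6, 10, 12, 14, 15.2]

for week, (logical, stored) in enumerate(zip(without_dedup, with_dedup), start=1):
    reduction = 1 - stored / logical
    print(f"Week {week}: {stored} TB stored instead of {logical} TB ({reduction:.0%} reduction)")

not_stored = without_dedup[-1] - with_dedup[-1]
print(f"Redundant data not stored: {not_stored:.1f} TB")
```

The reduction grows each week because every new full backup is mostly data already held in the repository.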
Ultra-efficient Replication
Data Replication: VTL Prime
- VTV data blocks are hashed (de-duplicated), resulting in Virtual Index Tapes (VITs)
- VITs are replicated to the remote VTL Prime based on standard VTL IP replication policies (VTL to VTL)
- The remote VTL Prime PULLS new data blocks (de-dup repository to de-dup repository)
- Only net new blocks are retrieved/pulled
- Global data de-duplication
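The pull model can be sketched as follows: the source sends only the index of block hashes, and the target fetches the blocks missing from its own repository. In this hypothetical Python model a plain list of SHA-256 hashes stands in for a VIT and dictionaries stand in for the de-dup repositories; chunk size, hash function and function names are assumptions, not VTL Prime's actual formats.

```python
import hashlib


def index_tape(data: bytes, repo: dict[str, bytes], chunk_size: int = 4096) -> list[str]:
    """De-duplicate a virtual tape into `repo`; return its index (ordered block hashes)."""
    index = []
    for offset in range(0, len(data), chunk_size):
        chunk = data[offset:offset + chunk_size]
        digest = hashlib.sha256(chunk).hexdigest()
        repo.setdefault(digest, chunk)
        index.append(digest)
    return index


def replicate(index: list[str], source_repo: dict[str, bytes],
              target_repo: dict[str, bytes]) -> int:
    """Target has received `index`; it pulls only blocks it does not already hold."""
    missing = [d for d in dict.fromkeys(index) if d not in target_repo]  # unique, ordered
    for digest in missing:
        target_repo[digest] = source_repo[digest]  # the only block data sent over the wire
    return len(missing)


def rebuild(index: list[str], repo: dict[str, bytes]) -> bytes:
    """Reconstitute a replicated tape from its index and the target repository."""
    return b"".join(repo[d] for d in index)
```

Because the target checks its whole repository, blocks already received from any other site are never pulled again, which is the global de-duplication effect.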
Sun's VTL Prime Configuration: Spoke and Hub
Never Forget Tape
0 kWh, 0 CO2
Data at Rest on Tape
Off-Site Protection
Thank you!
- Gavin Cole
- gavin.cole_at_Sun.com