Microsoft SQL Server High Availability and Disaster Recovery - PowerPoint PPT Presentation

1 / 49

About This Presentation

Title:

Microsoft SQL Server High Availability and Disaster Recovery

Description:

Manual intervention required to restore offline data copy. Backup and restore. Warm standby ... Backup and Restore. Database. Backup. Log Shipping. Primary ... – PowerPoint PPT presentation

Number of Views:3291

Avg rating:5.0/5.0

Slides: 50

Provided by: michael1059

Category:

more less

Transcript and Presenter's Notes

Title: Microsoft SQL Server High Availability and Disaster Recovery

1
Microsoft SQL Server High Availabilityand
Disaster Recovery

Michael Poremba // October 2008

2
Database HA DR Experience

Work with business to determine HA or DR
requirements for applications and data?
Design HA or DR solutions?
Administer HA or DR process?
Still learning MS SQL Server HA DR capabilities?

3
Scope of this Presentation

Presentation Focus

Beyond Scope of Presentation

Data Availability
Data recovery
High availability
Disaster recovery
Technology Focus
MS SQL Server
Physical servers
SANs

In-depth how-to(available elsewhere)
Partitioned views (federated)
Advanced DBA techniques
Custom application logic
3rd-party software solutions
Alternate DBMS engines(e.g. Oracle DB2)
HA on virtual machines
Complex scenarios solutions
Load balancing

4
Introduction to Data Availability

So, you need to make yourproduction database
bulletproof

5
Data Availability Continuum

Degrees of protection for information systems

6
Business Case for Availability

High Availability

Disaster Recovery

Keep business-critical applications available
Secondary
Server maintenance

Protect against loss of data center
Secondary
Application upgrades
Infrastructure upgrades

7
Service Level Agreement (SLA)

Permitted downtime (planned vs. unplanned?)
Acceptable data/transaction loss
Application response times
Mean time to recovery
Note Database uptime is not equivalent to
application availability
Failures of other application services
Network outages

8
Protect What?

Application data stores
Databases
Files
Other data repositories
Database services
DBMS availability for applications
Application services
Application availability for users and external
systems
Databases are the heart of most information
systemsthey deserve the highest affordable
protection.

9
Database Failure Scenarios

Physical Infrastructure Failures

Logical Data Failures

Storage subsystem
Disk
Controller
Network
Server
Power

Operator errors
DBMS interruption
Drops / deletes
Application defects
DBMS defects
Data corruption

10
Service Recovery Strategies
11
Data RecoveryTerminology

Terminology varies for source vs. copy

12
Data Recovery

Briefly

13
Database Backups

Traditional backup types
Full backup
Differential backup
Transaction log backup
Disk is better than tape
First backup to disk (separate physical disk
volume)
Detect exceptions encountered during backup
Verify backup files
Copy backup files to tape or remote disk
Data retention policy for backup files

14
Database Backup Strategy

Backup of user databases not sufficient for
recovery
System database
Master database
MSDB database
Model database
External data stores

15
Synch with External Data Stores

Synchronize recovered database with external data
stores
Identity column seeds
Full-text indexes(SQL Server 2000)
LDAP entries
File system objects
Other databases

16
Backup Retention Policy

Location of backup files
Duration of retention
Protection of sensitive data
Sarbanes/Oxley (SOX)
HIPAA
Internal policies for data management and
protection
Access to backups from offsite data storage

17
Data Recovery Process

Backup file sets
Full baseline, differential, and transaction logs
Retrieving backup files
Offsite storage
Tape
Network copy
Dependency on multiple people to get access to
backup files

Recovery strategy depends on failure scenario
Create comprehensive failure matrix
Devise recovery strategy for each scenario
Does worst-case recovery scenario fit within SLA
parameters?
Recovery time SLA
Include future data growth in recovery plan
Fully test recovery strategiespractice is
essential

18
High Availability
19
High Availability

Minimize or avoid service downtime
Whether planned or unplanned
When components fail,service interruption is
brief or non-existent
Automatic failover
Eliminate single points of failure (as
affordable)
Redundant components
Fault-tolerant servers

20
Redundant Components

Objective Avoid single points of failure (where
affordable)
Approach Use redundant components for database
service
Database server nodes
Server components
ECC RAM failure-tolerant HW OS
DBMS instance
User databases
Storage devices
Storage unit components
MPIO Interfaces paths switches controllers
RAID Disks
Networking
MPIO Interfaces paths switches
Data copies
E.g. Recovering torn page from mirror in SQL
Server 2008

21
Transaction Log Shipping

Warm standby solution
Duplicate user database
Copy transaction logs to standby server restore
Database available for read-only access
Users must disconnect for logs to be applied
Two database licenses required if querying
standby
Manual application failover
Supported on standard hardware
Possible data loss (unapplied transactions)

22
Database Mirroring

Redundancy at user database level
Duplicate copy of user database
Independent storage devices
Multiple copies of instance databases
Mirrored over private network channel
Mirror always redoing transactions from principal
Negligible impact on transaction throughput
Multiple mirroring modes
High-availability commit _at_ log on mirror
automatic failover
High-protection commit _at_ log on mirror manual
failover
High-performance commit when logged on principal
Very fast automatic failoverseconds
Requires witness server
Mirror-aware application client connection
Provided by client library
Database connection string must specify both
servers
Mirror may be available for read-only access
(snapshots)
Works with standard hardware

23
Mirror Witness

With mirroring, more than one server is required
to decide on failover
Witness automates failover from primary to mirror
Watches database availability
Reports observations back to principal and mirror
Runs in separate SQL Server instance (Express is
OK)
Prevents split brain scenario
Very low resource consumption
Can be witness for multiple databases
Not a single point of failure

24
SQL Server Failover Clustering

Two clustered nodes
Active/Passive config
MS SQL services
Running on virtual server
Shared storage device
User databases
System databases
Quorum drive
Redundant internal components

25
Active/Passive Failover Clustering

Redundancy at database instance level
All databases fail over together
Shared copy of system databases
Single data copy on shared storage device
No I/O overhead reducing throughput
Storage unit is single point of failure for
cluster
All database services are clustered
SQL Agent Analysis Services Full-Text engine,
MS DTC
Automatic failover (up to minutes)
DBMS accessed over virtual IP
Database not available from inactive node for DB
client connections
Storage is controlled by one cluster node at a
time
Requires hardware certified by Microsoft for
Microsoft Cluster Service

26
HA Comparison

Database Mirroring

Failover Clustering

Scope user DB
Standard hardware
One SQL license(unless querying snapshots on
mirror)
Very fast failover (seconds)
OS flexible (e.g. 32/64)
Independent storage
Independent services
Reporting on mirror
Geographic separation OK

Scope DBMS instance
Certified hardware
One SQL license(only one node can access
database)
Automatic failover (up to minutes)
Enterprise OS
Shared storage
Clustered services
Standby not available
Servers are usually co-located

27
Considerations for HA

HA complements backup and recovery strategy
Does not replace data recovery plan
Application service availability is often
determined by a network of interdependent
services
Availability can be difficult to define (e.g.
partial failures)
Failure probability difficult to measure or
compute
Increased system complexity could lead to lower
service availability!
Operator error a leading cause of availability
issues
Increased number/types of system components
More complex to configure and administer

28
Data Recovery Requirements
29
Disaster Recovery
30
Disaster Recovery

Minimize downtime of business operations
Redundant systems and facilities
SQL Server features
Transaction log shipping
Database mirroring
Failover clustering
Other technologies
Storage-based mirroring

31
Disaster Recovery Planning

Data security requirements
Clarify SLA, data loss allowance
Evaluate system cost vs. data protection
Failure analysis
System redundancy
Process validation
Training for personnel
Prevention practices
Executing disaster recovery and business
continuity
Practice, practice, practice

32
Business Continuity Facility

System redundancy
Systems Web servers app servers database, etc.
Data Databases data files on OS security info,
etc.
Networking Domain, routing, subnet, VIPs, etc.
Alternate facilities
Network bandwidth
Physical or network access by operations staff
Failover
Often a deliberate decision, using manual failover

33
Data Redundancy

Synchronous redundancy
Network bandwidth cost
Network latency and application performance
Network reliability
Asynchronous redundancy
Risk of data loss
More cost-effective
Resilient to network latency issues
Candidate Technologies
SQL Server database mirroring
Failover clustering with SAN-based mirroring

34
DR Using Database Mirroring

Two sites Primary and DR location
Separate failover clusters at each site
SQL Server database mirroring between sites

35
DR Using SAN-Based Mirroring

Two sites Primary and DR location
Four-node failover cluster one virtual IP
address
SAN-based mirroring between sites
Manual cluster failover

36
Complimentary Technologies

Skip if time is running short.

37
SAN-Based Data Mirroring

Data blocks duplicated at storage level
Similar to transaction log shipping
Copy performed in sequence and coordinated with
database checkpoint
Ensures consistency of mirrored data files
Synchronous or asynchronous mirroring
Co-located or geographically dispersedboth are
OK
SAN link bandwidth must support database I/O rate
May require extra feature support from SAN vendor
Could rely on Failover Clustering for HA

38
SQL Server Database Snapshots

Read-only point-in-time database snapshot
No data is copiedinstantaneous
Historical snapshot pages tracked separately from
changing pages
Snapshots can be maintained indefinitely
Limited only by available storage
Snapshot copy can be used for reporting
Read-only, so no locking issues

39
SQL Server Replication

Transactional replication
High transaction volume
Low data latency required
Mixed technologies Integrates with other DBMS
Merge replication
Bi-directional data changes
Typically server-to-client
Snapshot replication
Large, infrequent data changes
Data change latency OK
Best for smaller data sets

Subscriber databases available for reporting
Replicate data subsets
Some data loss is possible
Periodically validate replicated data

40
App Development and Admin
41
Considerations for App Developers

App services tolerant to database service
interruptions
Application transactions must be handled in
codedata consistency
Exception handling for transaction retry,
connection recovery
Requires coding standards, code reviews, and
testing
Bulk data operations
Transaction volume impacts rollback time during
failover
Batch jobs must be run on alternate nodes
Dont bypass transaction logging
Synchronization with external data sources?
Be aware of database recovery model
Mirroring uses FailoverPartner in connection
string
Use TCP/IP as client protocol

42
Considerations for Admins

Use identical server hardware, when possible
Design network redundancies, when feasible
Consider network latency for geographic
separation
Always manage through virtual cluster, not
individual cluster nodes
Retest failover/failback after HA maintenance
Diagnose after failover
Repair alternate node
Resynchronize data, as necessary
Be aware of primary/secondary locations
Ensure application services are connected and
functioning properly
Keep server node configurations synchronized
Service pack and patch levels
Duplicate non-redundant resources
Jobs logins and permissions OS sys objects

43
HA Risks

System performance degradation
HA system complexity leads to availability issues
Some system failures not planned for
Backup and recovery planning incomplete
Administrators not fully trained or informed
User databases not synchronized with other data
sources

44
Common Admin Use Cases

Maintain HA nodes
Hardware maintenance
Rolling upgrades and software patches
Resynchronize the redundant copy
Re-synch mirror
Restart log shipping
Diagnose and repair
Diagnose cause of failover
Repair failed node and restore failover
capabilities
Test failover and failback

45
Common Admin Actions

Train and practice administrators to
Initiate a database mirror
Manually failover mirror database or cluster node
Add/remove passive node from mirror or cluster
Upgrade/patch servers nodes
Restart or redirect application services

46
More Information
47
ReferencesBooks

High Availability

Related Topics

Microsoft SQL Server 2008 High Availability with
Clustering Database Mirroringby Michael Otey,
2009.
Microsoft SQL Server High Availabilityby Paul
Bertucci, 2004.
Pro SQL Server 2005 High Availabilityby Allan
Hirt, 2007.

Pro SQL Server 2005 Replication by Sujoy Paul,
2006.
Pro SQL Server 2005 Service Brokerby Klaus
Aschenbrenner, 2007.
The Rational Guide to SQL Server 2005 Service
Brokerby Roger Wolter, 2006.

48
ReferencesPresentations

Microsoft Load Balancing and Clusteringhttp//ce.
sharif.edu/courses/84-85/2/ce317/resources/root/le
cture20slides/14.20Microsoft20Load20Balancing
20and20Clustering.ppt
SQL Server 2005 High Availabilityhttp//www.atlan
tamdf.com/Presentations/AtlantaMDF_111207HA.ppt
High Availability Technologies In SQL Server 2000
And SQL Server 2005 http//202.181.238.2/hk/teche
d2004/ppt/Day_2_Rm407/DAT431(1330-1445).ppt
Meeting the Availability Challengehttp//download
.microsoft.com/download/E/D/C/EDCF54DB-19CD-4882-9
FC4-4F7D46FCEAA6/HighAvailability.ppt
Disaster Recovery Mistakeshttp//www.sqlsig.org/O
ct201120DASSUG20-20Jason20Hall2010-11-0720M
M.ppt
SQL Server 2005 High Availabilityhttp//blogs.msd
n.com/sql2005event/attachment/564303.ashx
Effective Usage of SQL Server 2005 Database
Mirroring http//www.sqlserver-qa.net/SSQA-Effect
ive20Usage20of20SQL20Server20200520Database
20Mirroring_show.ppt

49
ReferencesArticles

Achieve High Availability for SQL
Serverhttp//technet.microsoft.com/en-us/magazine
/cc162477.aspx
Geographically Dispersed Clusters in Windows
Server 2003http//www.microsoft.com/windowsserver
2003/techinfo/overview/clustergeo.mspx
Restoring file and filegroup backups
http//support.microsoft.com/kb/281122/en-us
Restoring specific tables or rows from
backupshttp//support.microsoft.com/kb/321836/en-
us
Maintaining Availability During
Upgradeshttp//msdn.microsoft.com/en-us/library/m
s191449.aspx