Title: Thomas E' Canty ServerCare, Inc' Session
1Thomas E. CantyServerCare, Inc.Session 126
Data Guard Best Practices Tuning
2Speaker Qualifications
- Thomas E. Canty, Senior Oracle DBA, ServerCare,
Inc. - 19 years of Oracle experience, starting with
version 5 - Has presented at IOUG, OpenWorld, NoCOUG, IASA,
- Has been a DBA, Developer, Architect, and IT
Manager
- Has worked with Fortune 100 companies in
Healthcare, Technology, Pharmaceuticals, and
Telecom, as well as Major Universities
888-918-6309 http//www.ServerCare.com
3Outline
- Overview
- Network Optimization
- ARCn LGWR Redo Transport
- Checkpoint, Redo Read/Apply Recovery
- Wait Events
- 10g R2 11g Improvements
- Best Practices
4Data Guard Modes
- Maximum Performance Mode
- Least performance impact
- Default mode
- Maximum Protection Mode
- Emphasis on data safety
- Requires at least one secondary
- Maximum Availability Mode
- Emphasis on uptime
- Continues if secondary unavailable
5Physical vs. Logical Standby
6Outline
- Overview
- Network Optimization
- ARCn LGWR Redo Transport
- Checkpoint, Redo Read/Apply Recovery
- Wait Events
- 10g R2 11g Improvements
- Best Practices
7Session Data Unit (SDU)
- In Oracle Net connect descriptor
- sales.servercare.com
- (DESCRIPTION
- (SDU32767)
- (ADDRESS(PROTOCOLtcp)
- (HOSTsales-server)(PORT1521))
- (CONNECT_DATA
- (SIDsales.servercare.com)))
- Globally in sqlnet.ora
- DEFAULT_SDU_SIZE32767
8Session Data Unit (SDU) (Cont.)
- On standby DB, set in listener.ora
- SID_LIST_listener_name
- (SID_LIST
- (SID_DESC
- (SDU32767)
- (GLOBAL_DBNAMEsales.servercare.com)
- (SID_NAMEsales)
- (ORACLE_HOME/u01/app/oracle/product/10.2.0/db_1))
)
9TCP Socket Buffer Size
- Set TCP socket buffer size 3 BDP
- Data Guard broker config. Set in sqlnet.ora
- Non Data Guard broker set in connect descriptor
- BDP - Bandwidth Delay Product
- RTT- Round Trip Time
10TCP Socket Buffer Size
- Assume gigabit network with RTT 25 ms
- BDP 1,000 Mbps 25msec (.025 sec)
- 1,000,000,000 .025
- 25,000,000 Megabits / 8 3,125,000 bytes
- In this example
- socket buffer size 3 bandwidth delay
- 3,125,000 3
- 9,375,000 bytes
- sqlnet.ora
- RECV_BUF_SIZE9375000
- SEND_BUF_SIZE9375000
11Network Queue Sizes
- Between kernel net. subsystems NIC driver
- txqueuelen - transmit queue size
- netdev_max_backlog - receive queue size
- Assumes gigabit network with 100ms latency
- Set queues
- ifconfig eth0 txqueuelen 10000
- sysctl.conf
- net.core.netdev_max_backlog20000
12Overall Network
- Ensure sufficient bandwidth to standby
- Verify TCP_NODELAY set to YES (default)
- RHEL3 - increase /proc/sys/fs/aio-max-size on
standby - From 131072(default) to 1048576
- Set RECV_BUF_SIZE SEND_BUF_SIZE 3 Bandwidth
Delay Product (BDP) - Use Session Data Unit (SDU) size of 32767
- Increase send receive queue sizes
- TXQUEUELENGTH
- NET_DEV_MAX_BACKLOG
13Outline
- Overview
- Network Optimization
- ARCn LGWR Redo Transport
- Checkpoint, Redo Read/Apply Recovery
- Wait Events
- 10g R2 11g Improvements
- Best Practices
14ARCn Redo Transport
1) Read from local arch 2) Receive redo 3) Ack -
know -ledge
15ASYNC LGWR Redo Transport
1) Write local redo 2) ASYNC send redo 3) Receive
redo 4) Ack - know -ledge 5) Write stdby redo
16SYNC LGWR Redo Transport
1) Write local redo 2) SYNC send redo 3) Receive
redo 4) Ack - know -ledge 5) Post receipt to LGWR
17Optimize ARCn Transport
- Increase MAX_CONNECTIONS to 5 on standby (if
possible) - default (2), maximum (5)
- Increase LOG_ARCHIVE_MAX_PROCESSES
- Larger than MAX_CONNECTIONS
- Up to network bandwidth
- default (2), maximum (30)
18Optimize LGWR Transport
- Decrease NET_TIMEOUT (default 180 secs.)
- Be careful! - Not too low
- New COMMITS
- COMMIT IMMEDIATE WAIT (default)
- COMMIT NOWAIT
- COMMIT NOWAIT BATCH
19All Redo Transport
- Standby redo logs
- Use fastest disks
- No RAID5
- Dont multiplex
- Use the recommended number of SRLs
- (maximum of online logfiles 1) maximum of
threads
20Outline
- Overview
- Network Optimization
- ARCn LGWR Redo Transport
- Checkpoint, Redo Read/Apply Recovery
- Wait Events
- 10g R2 11g Improvements
- Best Practices
21Checkpoint Phase
- Checkpoint occurs
- During log switch
- LOG_CHECK_TIMEOUT expiration
- LOG_CHECKOUT_INTERVAL reached
- Reduce log switch interval
- Resize redo log to 1GB - primary and secondary
- Recommended - checkpoint every 15 minutes
22Checkpoint Phase (Cont.)
- Determine checkpoint frequency
- COL NAME FOR A35
- SELECT NAME, VALUE, TO_CHAR(SYSDATE, HHMISS)
TIME - FROM VSYSSTAT WHERE NAME 'DBWR checkpoints'
- NAME VALUE
TIME - ----------------------------------- ----------
-------- - DBWR checkpoints 264
081543 - SQLgt /
- NAME VALUE
TIME - ----------------------------------- ----------
-------- - DBWR checkpoints 267
083406
23Redo Read (Secondary)
- Obtain read rate for the standby redo log
- SQLgt ALTER SYSTEM DUMP LOGFILE
'/u01/oradata/docprd/sredo01.log validate - System altered.
-
- vi docprd_ora_3560.trc
- Mon Mar 12 085952 2007
-
- ----- Redo read statistics for thread 1 -----
- Read rate (ASYNC) 4527Kb in 0.58s gt 6.90
Mb/sec - Longest record 19Kb, moves 0/7586 (0)
- Change moves 4340/18026 (24), moved 2Mb
- Longest LWN 92Kb, moves 1/1365 (0), moved
0Mb - Last redo scn 0x0000.01272351 (19342161)
24Redo Apply (Secondary)
- Goal
- Redo apply rate (secondary) gt Redo create rate
(primary) - Carefully consider enabling DB_BLOCK_CHECKING
- LOW, MEDIUM and FULL options
- Possible performance impact
25Redo Apply (Cont.)
- Determine Log Block Size (LEBSZ)
- SELECT LEBSZ FROM XKCCLE WHERE ROWNUM1
- Get recovery blocks - at least two snapshots
- Managed Recovery Case
- SELECT PROCESS, SEQUENCE, THREAD, block,
BLOCKS, - TO_CHAR(SYSDATE, 'DD-MON-YYYY HHMISS') time
- from vMANAGED_STANDBY WHERE PROCESS'MRP0'
- Determine the recovery rate (MB/sec) for a
specific archive sequence number - Managed Recovery Case
- ((BLOCK_END - BLOCK_BEG) LOG_BLOCK_SIZE) /
- (TIME_END - TIME_BEG) 1024 1024
26Redo Apply (Cont.)
27Recovery
- Parallel Recovery (before 10.1.0.5)
- Set to number of CPUs
- recover managed standby database parallel ltgt
- PARALLEL_EXECUTION_MESSAGE_SIZE
- Can increase to 4096 or 8192
- Uses additional shared pool memory
- Problems if set too high
- DB_CACHE_SIZE
- Can set secondary DB_CACHE_SIZE gt primary
- Must set to primary before changing roles
28Outline
- Overview
- Network Optimization
- ARCn LGWR Redo Transport
- Checkpoint, Redo Read/Apply Recovery
- Wait Events
- 10g R2 11g Improvements
- Best Practices
29Arch Wait Events - Primary
- ARCH wait on ATTACH
- Time for all arch processes to spawn RFS
connection - ARCH wait on SENDREQ
- Time for all arch processes to write received
redo to disk open close remote archived redo
logs - ARCH wait on DETACH
- Time for all arch processes to delete RFS
connection
30LGWR SYNC Wait Events - Primary
- LGWR wait on ATTACH
- Time for all log writer processes to spawn RFS
connection - LGWR wait on SENDREQ
- Time for all log writer processes to write
received redo to disk open close the remote
archived redo logs - LGWR wait on DETACH
- Time for all log writer processes to delete RFS
conn.
31LGWR ASYNC Wait Events - Primary
- LNS wait on ATTACH
- Time for all network servers to spawn RFS
connection - LNS wait on SENDREQ
- Time for all network servers to write received
redo to disk open close the remote archived
redo logs - LNS wait on DETACH
- Time for all network servers to delete RFS conn.
- LGWR wait on full LNS buffer
- Time for log writer (LGWR) process awaiting for
network server (LNS) to free ASYNC buffer space
32Wait Events on Secondary
- RFS Write
- Time to write to standby redo log or archive log
non I/O work like redo block checksum
validation - RFS Random I/O
- Time to write to a standby redo log to occur
- RFS Sequential I/O
- Time to write to an archive log to occur
33Outline
- Overview
- Network Optimization
- ARCn LGWR Redo Transport
- Checkpoint, Redo Read/Apply Recovery
- Wait Events
- 10g R2 11g Improvements
- Best Practices
3410g R2 Improvements
- Multiple archive processes can transmit a redo
log in parallel to the standby database - MAX_CONNECTIONS attribute of the
LOG_ARCHIVE_DEST_n controls the number of these
processes - Parallel Recovery for Redo apply is automatically
set equal to number of CPUs - 10.1.0.5 and 10.2.0.1
- Fast-Start Failover
- Automatically fails over to a previously chosen
physical standby database
3510g R2 Improvements (Cont.)
- LGWR ASYNC
- Uses a new process (LNSn) to transmit the redo
data directly from the online redo log to the
standby database - Physical standby database flashback
- Can flash back temporarily for reporting
- Logical standby database
- Automatically deletes applied archived log
- RMAN
- Automatically creates temp datafiles after
recovery
3611g Improvements
- Physical standby database open read/write for
test or other purposes with zero compromise in
data protection using new Snapshot Standby - Automatic failover configurable for immediate
response to designated events or errors - More flexibility in primary/standby
configurations - e.g. Windows primary and Linux standby
- Rolling upgrade options now in physical standby
with Transient Logical Standby - ASYNC transport enhanced to eliminate the impact
of latency on network throughput
3711g Improvements (Cont.)
- Fast detection of corruptions caused by lost
writes in the storage layer - SQL Apply supports XML data type (CLOB)
- Many performance, manageability, and security
enhancements - Support for new Oracle Database 11g Options
Oracle Active Data Guard and Oracle Advanced
Compression - Fast Start Failover now available for Maximum
Performance mode
38Outline
- Overview
- Network Optimization
- ARCn LGWR Redo Transport
- Checkpoint, Redo Read/Apply Recovery
- Wait Events
- 10g R2 11g Improvements
- Best Practices
39Best Practices
- Geographically separate primary standby DB
- Ensure standby hardware configuration same as the
primary - Tune standby for write intensive operations
- Test Data Guard before deploy in production
- Set standard OS and DB parameters to recommended
values - Perform switchover testing
- Fully document a failover procedure
- Use FORCE LOGGING mode
40Best Practices (Cont.)
- Use real-time apply
- Use the Data Guard Broker
- Enable Flashback Database on both primary and
secondary databases - Evaluate using AFFIRM attribute
- Possible performance issues on primary
- Verify Asynchronous I/O enabled
- Carefully consider DB_BLOCK_CHECKING
41Best Practices (Cont.)
- Dont multiplex standby redo logs (SRLs)
- Correctly set number of SRLs
- Increase PARALLEL_EXECUTION_MESSAGE_SIZE
- Place SRLs in fast disk group or disks
- Use at lease two standby DBs with Maximum
Protection Mode - Utilize COMMIT NOWAIT if appropriate
42Best Practices (Cont.)
- Ensure appropriate bandwidth between primary and
secondary - Increase default send receive queue sizes
- TXQUEUELENGTH
- NET_DEV_MAX_BACKLOG
- Session Data Unit
- Adjust value to 32767
- Improvement during large data transmissions
43Questions?
- Lots of things we didnt cover
- If we dont cover something you wanted to hear,
please contact me.
44Session 126 Data Guard Best Practices Tuning
THANK YOU!
Please fill out evaluations!
Email Tom Canty tom.canty_at_servercare.com
Or Call 888-918-6309 http//www.ServerCare.com