Oracle Data Archiving: Taming the Beast - PowerPoint PPT Presentation
1
Oracle Data Archiving: Taming the Beast
Dave Moore, Neon Enterprise Software
2
Agenda
  • Archiving Defined
  • Requirements and Solutions
  • Oracle Archiving Strategies
  • Oracle Row Removal Options
  • Oracle Post Archive Operations
3
Dave
  • Oracle ACE
  • Using Oracle since 1991
  • Product Author at Neon Enterprise Software
  • Creator of OracleUtilities.com
  • Author of Oracle Utilities from Rampant Tech
    Press
  • Core competencies include performance, utilities
    and data management

4
Database Archiving
Database Archiving: The process of removing
selected data records from operational databases
that are not expected to be referenced again, and
storing them in an archive data store where they
can be retrieved if needed.
Purge
5
Trends Impacting Archive Needs
Data Retention Issues
  • Volume of data
  • Length of retention requirement
  • Varied types of data
  • Security issues
6
Archiving All Types of Data
Paper Blueprints Forms Claims
Word Excel PDF XML
IMS DB2 ORACLE SYBASE SQL Server IDMS
VSAM Programs UNIX Files
Outlook Lotus Notes Attachments
Sound Pictures Video
7
Data Archiving and ILM
Data lifecycle stages: Create → Operational → Reference → Archive → Discard
  • Operational: needed for completing business transactions
  • Reference: needed for reporting or expected queries
  • Archive: needed for compliance and business protection (mandatory retention period)
8
Some Sample Regulations Impacting Data Retention
9
What Does It All Mean?
  • Enterprises must recognize that there is a
    business value in organizing their information
    and data.
  • Organizations that fail to respond run the risk
    of seeing more of their cases decided on
    questions of process rather than merit.
  • (Gartner, 20-April-2007, Research Note G00148170,
    "Cost of E-Discovery Threatens to Skew Justice System")

10
Operational Efficiency
  • Database Archiving can be undertaken to improve
    operational efficiency
  • Large volumes of data can interfere with
    production operations
  • efficiency of transactions
  • efficiency of utilities (BACKUP/RESTORE, REORG,
    etc.)
  • Storage
  • Gartner: databases are copied an average of 6 times!

11
What Solutions Are Out There?
  • Keep Data in Operational Database
  • Problems with authenticity of large amounts of
    data over long retention times
  • Store Data in UNLOAD files (or backups)
  • Problems with schema changes; reading archived
    data from backups poses even more serious
    problems
  • Move Data to a Parallel Reference Database
  • Combines problems of the previous two
  • Move Data to a Database Archive

12
Components of aDatabase Archiving Solution
Data Recall
13
Archiving Requirements
  • Policy-based archiving (logical selection)
  • Keep data for very long periods of time
  • Store very large amounts of data in archive
  • Maintain archives for ever-changing operational
    systems
  • Become independent from applications/DBMS/systems
  • Protect authenticity of data
  • Access data when needed, as needed
  • Discard data automatically after the retention period

14
Policy based archiving
  • Why
  • Business objects are archived, not files
  • Rules for when something is ready can be complex
  • Data ready to be archived is distributed over
    database
  • Implications
  • User must provide policies for when something is
    to be archived
  • How
  • Full metadata description of data
  • Flexible specification of policy WHERE clause

15
For Example
Parts Master is the parent table to all other
tables
PARTS MASTER
Part Number Type Description
Unit Type Cost Price
Substitute Parts
16
Keep Data for a Long Time
  • Why: retention requirements span decades
  • Implications
  • Archive will outlive applications/DBMS/systems
    that generated them
  • Archive will outlive people who designed and
    managed operational systems
  • Archive will outlive media we store it on
  • How
  • Unique data store
  • Application/DBMS/system independence
  • Metadata independence
  • Continuous management of storage
  • Continuous management of archive content

17
Maintain Archive for Changing Operational Systems
  • Why
  • Metadata changes frequently
  • Applications are re-engineered periodically
  • Change DBMS platform
  • Change System platform
  • Replace with new application
  • Consolidate after merger or acquisition
  • Implications
  • Archive must support multiple variations of an
    application
  • Archive must deal with metadata changes
  • How
  • Manage applications as major archive streams
    having multiple minor streams with metadata
    differences
  • Achieve independence from operating environment

18
Achieve Metadata Independence
  • Why
  • Operational metadata is inadequate
  • Operational metadata changes
  • Operational systems keep only the current
    metadata
  • Data in archive often does not mirror data in
    operational structures
  • Implications
  • Archive must encapsulate metadata
  • Metadata must be improved
  • How
  • Metadata Capture, Validate, Enhance capabilities
  • Store structure that encapsulates with data
  • Keeps multiple versions of metadata

19
Protect Authenticity of Data
  • Why
  • Potential use in lawsuits/ investigations
  • Potential use in business analysis
  • Implications
  • Protect from unwanted changes
  • Show original input
  • Cannot be managed in operational environment
  • How
  • SQL access that does not support INSERT/UPDATE/DELETE
  • Do not modify archive data on metadata changes
  • Encryption as stored
  • Checksum for detection of sabotage
  • Limit access to functions
  • Audit use of functions
  • Maintain offsite backup copies for restore if
    sabotaged

20
Access Data Directly From Archive
  • Why
  • Cannot depend on application environment
  • Implications
  • Full access capability within archive system
  • How
  • Industry standard interface (e.g. JDBC)
  • LOAD format output, for load into a database
  • Target database may be different from the source database
  • Requires full and accurate metadata
  • Ability to review metadata
  • Ability to function across metadata changes

21
Discard Function
  • Why
  • Legal exposure for data kept too long
  • Implications
  • Data cannot be kept in archive beyond retention
    period
  • Must be removed with no exposure to forensic
    software
  • How
  • Policy based discard
  • System level function
  • Tightly controlled and audited
  • True zero out capability
  • Discard from backups as well

22
Database or Archive?
Keep in DB
Keep in Archive
Performance
Space
Compliance
23
Based on Data Availability
Purge
Keep in DB
Keep in Archive
Must be Available to App
Must be Available
Must Be Secure
Not Needed
24
Oracle Archiving Strategies
  • Designed Up Front (Yeah, right)
  • Determined by Application Owner
  • Implemented by ____________
  • Utilize Oracle Features

25
Finding Large Tables
  • DBA_SEGMENTS (bytes)
  • DBA_TABLES (num_rows)
  • or based on I/O (see the example query below)

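A minimal sketch of the DBA_SEGMENTS approach, assuming a hypothetical schema named APP; it lists the ten largest segments by allocated space:

    -- Ten largest segments for a hypothetical APP schema, by allocated bytes
    SELECT *
      FROM (SELECT owner, segment_name, segment_type,
                   ROUND(bytes / 1024 / 1024) AS size_mb
              FROM dba_segments
             WHERE owner = 'APP'
             ORDER BY bytes DESC)
     WHERE ROWNUM <= 10;
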
26
Rolling Windows
  • Self Managing
  • Mostly based on DATE
  • Utilize DBMS Features
  • Partitioning
  • Transportable Tablespaces
  • Exchange Partition
  • Set tablespace read only
  • Expdp
  • Copy the export file and data file to archive storage (see the sketch below)

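A sketch of one rolling-window sequence under assumed names: a range-partitioned ORDERS table whose oldest partition P_2006 lives in tablespace TS_2006, plus an empty, structurally identical staging table ORDERS_2006 in that same tablespace.

    -- Swap the oldest partition's segment into a standalone table
    ALTER TABLE orders EXCHANGE PARTITION p_2006
      WITH TABLE orders_2006 INCLUDING INDEXES WITHOUT VALIDATION;

    -- Freeze the tablespace so its datafiles can be copied safely
    ALTER TABLESPACE ts_2006 READ ONLY;

    -- Export the tablespace metadata with Data Pump, then copy the dump file
    -- and datafiles to archive storage (run from the OS shell; the directory
    -- object dpump_dir is an assumed name):
    --   expdp system DIRECTORY=dpump_dir DUMPFILE=ts_2006.dmp
    --         TRANSPORT_TABLESPACES=ts_2006
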
27
Rolling Windows via Partitioning
Partitions P1 ... P47, oldest to newest:
  • Data Profile: Probably Never Accessed → Rarely Accessed → Heavily Accessed
  • Storage Profile: Cheap as you can get → Not so fast or expensive → Fast, expensive
  • Access Mode: Read Only / Compressed → Read Only → Read / Write
28
Why not use transportable tablespaces or Oracle
exports for data retention?
29
The Problem with Oracle Files
  • Transportable Tablespaces
  • Exports
  • Backups

(Diagram: export files and datafiles written by Oracle in 2007 may not be readable by the Import or transportable tablespace features of a far-future release, e.g. "Version 16Z" in 2030.)
Not a good method for long-term data retention.
30
Partitioning (Old ways)
  • Range Partitioning
  • Data is distributed based on ranges of partition
    key values, usually a date.
  • Good when: data is date-based (see the sketch below).

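A minimal range-partitioning sketch, using a hypothetical ORDERS table keyed on a date column:

    CREATE TABLE orders (
      order_id   NUMBER,
      order_date DATE,
      amount     NUMBER(12,2)
    )
    PARTITION BY RANGE (order_date) (
      PARTITION p_2006 VALUES LESS THAN (TO_DATE('2007-01-01','YYYY-MM-DD')),
      PARTITION p_2007 VALUES LESS THAN (TO_DATE('2008-01-01','YYYY-MM-DD')),
      PARTITION p_max  VALUES LESS THAN (MAXVALUE)
    );
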
31
Partitioning (Old Ways)
  • Hash Partitioning
  • Uses a hash algorithm to create equally sized
    buckets of data.
  • Good when: there is no natural partition key and you
    want I/O balancing (avoiding hot spots); see the sketch below.

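A minimal hash-partitioning sketch on a hypothetical ORDER_HISTORY table:

    -- Eight equally sized buckets, spread by customer_id
    CREATE TABLE order_history (
      order_id    NUMBER,
      customer_id NUMBER,
      order_date  DATE
    )
    PARTITION BY HASH (customer_id)
    PARTITIONS 8;
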
32
Partitioning (Old Ways)
  • List Partitioning
  • Data is distributed based on a LIST of values in the
    partition key.
  • Good when: you have a short list of values (states,
    regions, account types), as in the sketch below.

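A minimal list-partitioning sketch on a hypothetical ACCOUNTS table:

    CREATE TABLE accounts (
      account_id NUMBER,
      region     VARCHAR2(2)
    )
    PARTITION BY LIST (region) (
      PARTITION p_west    VALUES ('CA', 'WA', 'OR'),
      PARTITION p_east    VALUES ('NY', 'NJ', 'MA'),
      PARTITION p_default VALUES (DEFAULT)
    );
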
33
Partitioning (New Ways 11G)
  • Interval Partitioning
  • The initial partition is created manually; the rest
    are created automatically as new data arrives.
  • Good when: you need a rolling window! (Example below.)

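A minimal interval-partitioning sketch (11g) on a hypothetical SALES table; only the first partition is declared:

    -- Oracle adds a new monthly partition automatically as data arrives
    CREATE TABLE sales (
      sale_id   NUMBER,
      sale_date DATE,
      amount    NUMBER(12,2)
    )
    PARTITION BY RANGE (sale_date)
    INTERVAL (NUMTOYMINTERVAL(1, 'MONTH')) (
      PARTITION p_initial VALUES LESS THAN (TO_DATE('2008-01-01','YYYY-MM-DD'))
    );
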
34
Partitioning (New Ways 11G)
  • REF partitioning
  • Related tables benefit from the same partitioning
    strategy, whether or not the partition key column
    exists in the child tables!
  • Good when: you want related data to be partitioned
    in the same manner, as sketched below.

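A minimal reference-partitioning sketch (11g) with hypothetical ORDERS and ORDER_ITEMS tables; the child carries no date column, yet is partitioned like its parent:

    CREATE TABLE orders (
      order_id   NUMBER PRIMARY KEY,
      order_date DATE NOT NULL
    )
    PARTITION BY RANGE (order_date) (
      PARTITION p_2007 VALUES LESS THAN (TO_DATE('2008-01-01','YYYY-MM-DD')),
      PARTITION p_2008 VALUES LESS THAN (TO_DATE('2009-01-01','YYYY-MM-DD'))
    );

    -- The FK must be NOT NULL; the child inherits the parent's partitioning
    CREATE TABLE order_items (
      item_id  NUMBER,
      order_id NUMBER NOT NULL,
      CONSTRAINT order_items_fk
        FOREIGN KEY (order_id) REFERENCES orders (order_id)
    )
    PARTITION BY REFERENCE (order_items_fk);
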
35
Partitioning (New Ways 11G)
  • Virtual Column Partitioning
  • The partition key may be based on a virtual column.
  • Good when: the partition key must be derived from
    existing columns via a virtual column (see the sketch below).

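A minimal virtual-column-partitioning sketch (11g) on a hypothetical ACCOUNTS table, where the partition key is derived from ACCOUNT_NO:

    CREATE TABLE accounts (
      account_id  NUMBER,
      account_no  VARCHAR2(12),
      -- virtual column: first three characters identify the branch
      branch_code AS (SUBSTR(account_no, 1, 3)) VIRTUAL
    )
    PARTITION BY LIST (branch_code) (
      PARTITION p_hq     VALUES ('100'),
      PARTITION p_retail VALUES ('200', '300'),
      PARTITION p_other  VALUES (DEFAULT)
    );
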
36
Rows Gotta Go
37
Row Removal Options
  • SQL DELETE
  • CTAS / DROP / RENAME
  • TRUNCATE
  • Row Marking

38
SQL DELETE
  • Good for small number of rows
  • RI handled automatically
  • Oracle was born to DELETE, better than any
    PL/SQL that you write.
  • Issue with un-indexed foreign keys? (More on this later.)

39
DELETE Optimization
  • Work in batches, committing periodically (only when
    deleting programmatically); see the sketch after this list
  • Use parallel DML (Partitioned tables only)
  • Drop Indexes before (if possible)
  • Index FK columns

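A minimal sketch of batched deleting in PL/SQL; the ORDERS table, the 7-year cutoff, and the 10,000-row batch size are illustrative assumptions only:

    -- Delete archive-eligible rows in batches to limit undo/redo pressure
    BEGIN
      LOOP
        DELETE FROM orders
         WHERE order_date < ADD_MONTHS(SYSDATE, -84)   -- hypothetical cutoff
           AND ROWNUM <= 10000;                         -- batch size
        EXIT WHEN SQL%ROWCOUNT = 0;
        COMMIT;
      END LOOP;
      COMMIT;
    END;
    /
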
40
CTAS
  • Works well for PURGE, not archive
  • Perfect when you want to keep low percentage of
    rows in the table
  • Doesn't handle RI, since no DELETE is issued.
  • Process
  • Create table with rows you want to keep
  • Drop old table
  • Rename table
  • Recreate indexes
  • create table new_table unrecoverable as select *
    from old_table where ... (see the fuller sketch below)

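A fuller sketch of that process under assumed names (an ORDERS table where only the last 12 months are kept); indexes, constraints, grants, and triggers must all be recreated on the new table:

    -- Keep only the rows of interest, without logging the copy
    CREATE TABLE orders_keep UNRECOVERABLE AS
      SELECT * FROM orders
       WHERE order_date >= ADD_MONTHS(SYSDATE, -12);

    DROP TABLE orders;
    RENAME orders_keep TO orders;

    -- Recreate indexes (plus constraints, grants, triggers) on the new table
    CREATE INDEX orders_date_ndx ON orders (order_date);
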
41
TRUNCATE
  • Congratulations if your application lends itself
    to TRUNCATE without losing new data
  • What about RI?
  • May truncate or drop individual partitions (see the
    example below)
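If the table is partitioned, an individual old partition can be truncated instead; a sketch with an assumed ORDERS table and partition P_2006 (any foreign keys referencing the table still have to be dealt with first):

    -- Empties the partition without logging each row
    ALTER TABLE orders TRUNCATE PARTITION p_2006 UPDATE GLOBAL INDEXES;
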

42
DROP
  • DROP PARTITION
  • What would you do before you drop it?
  • Exchange partition with a table (as sketched below)
  • Transportable tablespace
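A sketch of that pattern with assumed names: the exchange swaps the old partition's segment into a standalone table (which can then be archived or transported), leaving an empty partition that is safe to drop.

    -- ORDERS_2006_ARCH must match ORDERS column-for-column
    ALTER TABLE orders EXCHANGE PARTITION p_2006
      WITH TABLE orders_2006_arch INCLUDING INDEXES WITHOUT VALIDATION;

    -- The partition now holds the empty segment that came from the staging table
    ALTER TABLE orders DROP PARTITION p_2006 UPDATE GLOBAL INDEXES;
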

43
Things to Remember
  • Benchmark the best way for you
  • Benchmark against real data if possible
  • Use parallel DML

44
Design Summary
  • Create an architecture that lends itself to
    aging, archiving, deleting
  • This architecture should accommodate business
    requirements
  • For instance, "customer orders not accessible
    after 6 months" or
  • "top query performance needed for all ACTIVE
    accounts", etc.
  • Implement it: THE EASY PART

45
Post Archive Challenges
46
Post Archive Challenges
"I have successfully deleted 10 billion rows from
the table. HoooAhhhh! Performance will be
great, space will be available, and I will get
credit for optimizing our data warehouse
application, saving the company billions of
dollars."
47
2 Days Later
48
Post Archive Challenges
Hmmmmm. It looks like:
  • Queries are not any faster . . .
  • The SELECT COUNT(*) took the same amount of time . . .
  • Space was not freed in Oracle (DBA_FREE_SPACE) . . .
  • Space was not freed in the operating system . . .
WHY NOT????? Where are the benefits???
49
From Swiss to Provolone
After Maintenance
After DELETE
50
Post Archive Challenges
  • Statistics are not fresh
  • High Water Marks are very high
  • Space has not been freed within Oracle (if
    that's what you want)
  • Space has not been freed to the OS

51
Refresh Statistics
  • Help the optimizer, easy enough
  • dbms_stats provides many options

52
Automatic Stats
  • Recommended by Oracle
  • Calls DBMS_STATS_JOB_PROC
  • Enabled via
    BEGIN
      dbms_auto_task_admin.enable(
        client_name => 'auto optimizer stats collection',
        operation   => NULL,
        window_name => NULL);
    END;
    /

53
When do you go manual ?
  • High transaction DELETEs or TRUNCATEs
  • Bulk loads which add more than 10% of the table's size
  • So there's our answer: go manual.

54
How do we Gather Them?
  • NOT the Analyze Command
  • Instead, the DBMS_STATS package:
    exec dbms_stats.gather_table_stats(ownname => 'BDB',
      tabname => 'MASTER',
      estimate_percent => dbms_stats.auto_sample_size)

55
High Water Mark
56
High Water Mark
57
Reset High Water Mark (HWM)
  • DROP or TRUNCATE
  • Multiple OTHER ways to do this depending on
    version
  • In v9: alter table <table_name> move tablespace <tablespace_name>
  • Row movement must be enabled
  • Tablespace must be a LMT
  • Can move into same tablespace
  • Will occupy 2X space temporarily
  • Must then rebuild indexes (see the sketch below)
  • In v10: alter table <table_name> shrink space
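A sketch of the v9-style approach with assumed object names; MOVE rebuilds the table below a new high water mark, but leaves its indexes UNUSABLE:

    -- Rebuild the table segment (needs roughly 2X space temporarily)
    ALTER TABLE big_history MOVE TABLESPACE users;

    -- Indexes are invalidated by the move and must be rebuilt
    ALTER INDEX big_history_pk  REBUILD;
    ALTER INDEX big_history_ndx REBUILD;
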

58
  • Freeing Allocated Space

59
Create table, check space
SQL> create table space_example as select * from dba_source;

Table created.

SQL> select count(*) from space_example;

  COUNT(*)
----------
    296463

SQL> exec dbms_space.unused_space('DAVE', 'SPACE_EXAMPLE')

Total blocks          6328
Unused blocks         1
Unused bytes          8192
Last Used Block       55
Last Used Block ID    10377
Last Used Ext File ID 4
60
Check datafile space

                                                     Poss.  Current    Poss.
FILE_NAME                                             Size     Size  Savings
-------------------------------------------------- ------- -------- --------
/export/home/ora102/oradata/ora102/qasb001.dbf           29       46       17
/export/home/ora102/oradata/ora102/example01.dbf         69      100       31
/export/home/ora102/oradata/ora102/qasb002.dbf           41       41        0
/export/home/ora102/oradata/ora102/system01.dbf         493      500        7
/export/home/ora102/oradata/ora102/sysaux01.dbf         430      430        0
/export/home/ora102/oradata/ora102/undotbs01.dbf         91      175       84
/export/home/ora102/oradata/ora102/users01.dbf           44       83       39
/export/home/ora102/oradata/ora102/test.dbf              51       70       19

(Sizes in MB.)
61
Delete rows, check space
SQL> delete from space_example;

296463 rows deleted.

SQL> commit;

SQL> exec dbms_space.unused_space('DAVE', 'SPACE_EXAMPLE')

Total blocks          6328
Unused blocks         1
Unused bytes          8192
Last Used Block       55
Last Used Block ID    10377
Last Used Ext File ID 4
Nothing Changed !
62
Shrink it, check space
SQL> alter table space_example enable row movement;
SQL> alter table space_example shrink space;
SQL> exec dbms_space.unused_space('BDB', 'SPACE_EXAMPLE')

Total blocks          8
Unused blocks         4
Unused bytes          32768
Last Used Block       4
Last Used Block ID    5129
Last Used Ext File ID 4
Space Freed From Table, but still in Oracle
63
Check space again

                                                     Poss.  Current    Poss.
FILE_NAME                                             Size     Size  Savings
-------------------------------------------------- ------- -------- --------
/export/home/ora102/oradata/ora102/qasb001.dbf           29       46       17
/export/home/ora102/oradata/ora102/example01.dbf         69      100       31
/export/home/ora102/oradata/ora102/qasb002.dbf           41       41        0
/export/home/ora102/oradata/ora102/system01.dbf         493      500        7
/export/home/ora102/oradata/ora102/sysaux01.dbf         430      430        0
/export/home/ora102/oradata/ora102/undotbs01.dbf        171      175        4
/export/home/ora102/oradata/ora102/users01.dbf           44       83       39
/export/home/ora102/oradata/ora102/test.dbf               1       70       69

This datafile should be resized to save 69 MB:

SQL> alter database datafile '/export/home/ora102/oradata/ora102/test.dbf' resize 1m;
64
Free the Space
  • Space is still reserved for future inserts and
    updates, just not freed back to the OS
  • Space will not be automatically freed; confirm
    by checking DBA_FREE_SPACE
  • Ways to set it free
  • drop
  • truncate
  • alter table move
  • alter table shrink space

65
Unindexed Foreign Keys Example
PARENT (COL1), 1 million rows
   |  ON DELETE CASCADE
CHILD (COL1, COL1_PARENT), 1 million rows

SQL> DELETE FROM PARENT WHERE COL1 < 1000;

(Fky.sql)
66
Before Index
delete from parent where col1 < 1000

call     count       cpu    elapsed       disk      query    current       rows
------- ------  -------- ---------- ---------- ---------- ----------  ---------
Parse        1      0.01       0.08          2         27          0          0
Execute      1      0.90       0.80          4    2208799       6062        999
Fetch        0      0.00       0.00          0          0          0          0
------- ------  -------- ---------- ---------- ---------- ----------  ---------
total        2      0.91       0.88          6    2208826       6062        999

delete from "DAVE"."CHILD" where "COL1_PARENT" = :1

call     count       cpu    elapsed       disk      query    current       rows
------- ------  -------- ---------- ---------- ---------- ----------  ---------
Parse        1      0.00       0.00          0          0          0          0
Execute    999    285.94     293.11    1543900    2208789       1029        999
Fetch        0      0.00       0.00          0          0          0          0
------- ------  -------- ---------- ---------- ---------- ----------  ---------
total     1000    285.94     293.11    1543900    2208789       1029        999
67
SQL> create index prnt_ndx on child(col1_parent);

delete from parent where col1 < 1000

call     count       cpu    elapsed       disk      query    current       rows
------- ------  -------- ---------- ---------- ---------- ----------  ---------
Parse        1      0.00       0.00          0          0          0          0
Execute      1      0.53       0.47          7         13       7053        999
Fetch        0      0.00       0.00          0          0          0          0
------- ------  -------- ---------- ---------- ---------- ----------  ---------
total        2      0.53       0.47          7         13       7053        999

delete from "DAVE"."CHILD" where "COL1_PARENT" = :1

call     count       cpu    elapsed       disk      query    current       rows
------- ------  -------- ---------- ---------- ---------- ----------  ---------
Parse        1      0.00       0.00          0          0          0          0
Execute    999      0.42       0.46          2       3002       4058        999
Fetch        0      0.00       0.00          0          0          0          0
------- ------  -------- ---------- ---------- ---------- ----------  ---------
total     1000      0.42       0.46          2       3002       4058        999
68
Unindexed Foreign Keys
  • Problem is not limited to DELETE statements
  • Search database for unindexed FK columns
  • Script is on AskTom; search for unindex.sql
    (a simplified check is sketched below)
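Not the AskTom script itself, just a simplified sketch of the idea: list single-column foreign keys whose column is not the leading column of any index on the child table.

    SELECT cc.table_name, cc.constraint_name, cc.column_name
      FROM user_cons_columns cc
      JOIN user_constraints  c
        ON c.constraint_name = cc.constraint_name
     WHERE c.constraint_type = 'R'          -- referential (FK) constraints
       AND NOT EXISTS (SELECT 1
                         FROM user_ind_columns ic
                        WHERE ic.table_name      = cc.table_name
                          AND ic.column_name     = cc.column_name
                          AND ic.column_position = 1);
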

69
Summary Points
  • Create a sound archiving strategy based on Oracle
    technical features as well as business and/or
    legal requirements
  • Leverage partitioning
  • Move partitions to cheap disk when appropriate
  • Make partitions read only and compressed
  • Remove data via DROP or TRUNCATE if possible
  • If using SQL DELETE, make sure to perform the
    post-delete maintenance operations (stats, HWM, space)
  • Consider 3rd party solutions

70
Questions?
"Well done is better than well said." - Ben Franklin