Stack It

About This Presentation

Title:

Stack It

Description:

UK Music Sales Data Mart. Produces BBC Radio 1 Top 40 chart and many more ... Serial Inserts using INSERT /* APPEND */ Parallel Inserts (PDML) ALTER TABLE...MOVE... – PowerPoint PPT presentation

Number of Views:23

Avg rating:3.0/5.0

Slides: 35

Provided by: jeff51

Category:

Tags: stack

more less

Transcript and Presenter's Notes

Title: Stack It

1
Stack It Pack ItPartitioning And Compression
For Warehouses / VLDB

Jeff Moss

2
Who Dunnit ?
3
Agenda

My background
Squeeze your data with data segment compression
Partition for success
Questions

4
My Background

Independent Consultant
13 years Oracle experience
Blog http//oramossoracle.blogspot.com/
Focused on warehousing / VLDB since 1998
First project
UK Music Sales Data Mart
Produces BBC Radio 1 Top 40 chart and many more
2 billion row sales fact table
1 Tb total database size
Currently working with Eon UK (Powergen)
4Tb Production Warehouse, 8Tb total storage
Oracle Product Stack

5
What Is Data Segment Compression ?

Compresses data by eliminating intra block
repeated column values
Reduces the space required for a segment
but only if there are appropriate repeats!
Self contained
Lossless algorithm

6
Where Can Data Segment Compression Be Used ?

Can be used with a number of segment types
Heap Nested Tables
Range or List Partitions
Materialized Views
Cant be used with
Subpartitions
Hash Partitions
Indexes but they have row level compression
IOT
External Tables
Tables that are part of a Cluster
LOBs

7
How Does Segment Compression Work ?
Database Block
Block Common Header (20 bytes)
Transaction Header (24 bytes fixed 24 bytes per
ITL)
Data Header (14 bytes)
Compressed Data Header (16 bytes - variable)
Symbol Table
Row Data Area
Tail (4 bytes)
8
What Affects Compression ?

Undisclosed Algorithm
I asked but support wouldnt play ball!
Many Factors
Block size
Anything which affects block overhead
Interested Transaction Lists (INITRANS)
Number of columns
Number of rows
PCTFREE
Number of repeats (in the block)
Length of column value(s)

9
Compression v Block Size

200K rows, Non ASSM Uniform Local extents
More chance of repeats in any given block

10
Compression v ITL

10K rows, Non ASSM Uniform Local extents
More ITL more overhead less repeats

11
Compression v Number Of Columns

500K rows, Non ASSM Uniform Local extents
Same amount of data to store
More columns more overhead less repeats

12
Compression v PCTFREE

200K rows, Non ASSM Uniform Local extents
Higher PCTFREE less space less repeats

13
Compression v NDV

200K rows, Non ASSM Uniform Local extents
Higher NDV less repeats

14
Compression v Column Length

80K rows, Non ASSM Uniform Local extents
Minimum 6 characters for compression
Longer Length more compression savings

15
Compression v Ordering

Colocate data to maximise compression benefits
For maximum compression
Minimise the total space required by the segment
Identify most compressable column(s)
For optimal access
We know how the data is to be queried
Order the data by
Access path columns
Then the next most compressable column(s)

Uniformly distributed
Colocated
16
Get Max Compression Order Package

PROCEDURE mgmt_p_get_max_compress_order
Argument Name Type
In/Out Default?
------------------------------ -------------------
---- ------ --------
P_TABLE_OWNER VARCHAR2
IN DEFAULT
P_TABLE_NAME VARCHAR2
IN
P_PARTITION_NAME VARCHAR2
IN DEFAULT
P_SAMPLE_SIZE NUMBER
IN DEFAULT
P_PREFIX_COLUMN1 VARCHAR2
IN DEFAULT
P_PREFIX_COLUMN2 VARCHAR2
IN DEFAULT
P_PREFIX_COLUMN3 VARCHAR2
IN DEFAULT
BEGIN
mgmt_p_get_max_compress_order(p_table_owner gt
AE_MGMT
,p_table_name gtBIG_TABLE
,p_sample_size gt10000)
END
/

Running mgmt_p_get_max_compress_order... ---------
--------------------------------------------------
----------------------------------------- Table
BIG_TABLE Sample Size 10000 Unique Run
ID 25012006232119 ORDER BY Prefix --------------
--------------------------------------------------
------------------------------------ Creating
MASTER Table TEMP_MASTER_25012006232119 Creatin
g COLUMN Table 1 COL1 Creating COLUMN Table 2
COL2 Creating COLUMN Table 3 COL3 ---------------
--------------------------------------------------
----------------------------------- The output
below lists each column in the table and the
number of blocks/rows and space used when the
table data is ordered by only that column, or in
the case where a prefix has been specified, where
the table data is ordered by the prefix and then
that column. From this one can determine if there
is a specific ORDER BY which can be applied to to
the data in order to maximise compression within
the table whilst, in the case of a a prefix being
present, ordering data as efficiently as possible
for the most common access path(s). --------------
--------------------------------------------------
------------------------------------ NAME
COLUMN
BLOCKS ROWS SPACE_GB

TEMP_COL_001_25
012006232119 COL1
290 10000 .0022 TEMP_COL_002_25012
006232119 COL2
345 10000 .0026 TEMP_COL_003_25012006
232119 COL3
555 10000 .0042
17
Pros Cons

Pros
Saves space
Reduces LIO / PIO
Speeds up backup/recovery
Improves query response time
Transparent
To readers
and writers
Decreases time to perform some DML
Deletes should be quicker
Bulk inserts may be quicker

18
Pros Cons

Cons
Increases CPU load
Can only be used on Direct Path operations
CTAS
Serial Inserts using INSERT / APPEND /
Parallel Inserts (PDML)
ALTER TABLEMOVE
Direct Path SQLLoader
Increases time to perform some DML
Bulk inserts may be slower
Updates are slower

19
Data Warehousing Specifics

Star Schema compresses better than Normalized
More redundant data
Focus on
Fact Tables and Summaries in Star Schema
Transaction tables in Normalized Schema
Performance Impact1
Space Savings
Star schema 67
Normalized 24
Query Elapsed Times
Star schema 16.5
Normalized 10

1 - Table Compression in Oracle 9iR2 A
Performance Analysis
20
Things To Watch Out For

DROP COLUMN is awkward
ORA-39726 Unsupported add/drop column operation
on compressed tables
Uncompress the table and try again - still gives
ORA-39726!
After UPDATEs data is uncompressed
Performance impact
Row migration
Use appropriate physical design settings
PCTFREE 0 - pack each block
Large blocksize - reduce overhead / increase
repeats per block
Minimise INITRANS - reduce overhead
Order data for best compression / access path

21
A Funny Thing

Block dump trace files still show 9iR2 even in
10g releases
ALTER SYSTEM DUMP DATAFILE x BLOCK y

Thanks to Julian Dyke for the block dumping
information http//www.juliandyke.com
22
What Is Partitioning ?

Partitioning addresses key issues in supporting
very large tables and indexes by letting you
decompose them into smaller and more manageable
pieces called partitions. Oracle Database
Concepts Manual, 10gR2
Introduced in Oracle 8.0
Numerous improvements since
Subpartitioning adds another level of
decomposition
Partitions and Subpartitions are logical
containers

23
Partition To Tablespace Mapping

Partitions map to tablespaces
Partition can only be in One tablespace
Tablespace can hold many partitions
Highest granularity is One tablespace per
partition
Lowest granularity is One tablespace for all the
partitions
Tablespace volatility
Read / Write
Read Only

T_Q1_2005
P_JAN_2005
P_FEB_2005
P_MAR_2005
P_APR_2005
T_Q2_2005
P_MAY_2005
P_JUN_2005
P_JUL_2005
T_Q3_2005
T_Q3_2005
P_AUG_2005
P_SEP_2005
P_OCT_2005
T_Q4_2005
P_NOV_2005
P_DEC_2005
T_Q1_2006
P_JAN_2006
P_FEB_2006
P_MAR_2006
Read / Write
Read Only
24
Read Only Tablespaces

Quicker checkpointing
Quicker backup
Quicker recovery
Reduced space use via compression
But
depends on granularity

Partition
Tablespace
25
Why Partition ? - Performance

Improved query performance
Pruning or elimination
Partition wise joins
Full
Partial
Selective Compression
By Partition
Selective Reorganisation
Index Partition REBUILD
Table Partition MOVE

Sales Fact Table
SELECT SUM(sales) FROM part_tab WHERE sales_date
BETWEEN 01-JAN-2005 AND 30-JUN-2005
Oracle 10gR2 Data Warehousing Manual
26
Why Partition ? - Manageability

Archiving
Use a rolling window approach
ALTER TABLE ADD/SPLIT/DROP PARTITION
Easier ETL Processing
Build a new dataset in a staging table
Add indexes and constraints
Collect statistics
Then swap the staging table for a partition on
the target
ALTER TABLEEXCHANGE PARTITION
Easier Maintenance
Table partition move, e.g. to compress data
Local Index partition rebuild

27
Why Partition ? - Scalability

Partition is generally consistent and predictable
Assuming an appropriate partitioning key is used
and data has an even distribution across the key
Read only approach
Scalable backups - read only tablespaces are
ignored
so partitions in those tablespaces are ignored
Pruning allows consistent query performance

28
Why Partition ? - Availability

Offline data impact minimised
depending on granularity
Quicker recovery
Pruned data not missed
EXCHANGE PARTITION
Allows offline build
Quick swap over

P_JAN_2005
T_Q1_2005
P_FEB_2005
P_MAR_2005
P_APR_2005
T_Q2_2005
P_MAY_2005
P_JUN_2005
P_JUL_2005
T_Q3_2005
T_Q3_2005
P_AUG_2005
P_SEP_2005
P_OCT_2005
T_Q4_2005
P_NOV_2005
P_DEC_2005
P_JAN_2006
T_Q1_2006
P_FEB_2006
P_MAR_2006
Read / Write
Read Only
29
Fact Table Partitioning
Transaction Date
Load Date
Easier ETL Processing Each load deals with only 1
partition No use to end user queries! Cant prune
Full scans!
Harder ETL Processing But still uses EXCHANGE
PARTITION Useful to end user queries Allows full
pruning capability
30
Watch out for

Partition exchange and table statistics1
Partition stats updated
but Global stats are NOT!
Affects queries accessing multiple partitions
Solution
Gather stats on staging table prior to EXCHANGE
Partition exchange
Gather stats on partitioned table using GLOBAL

Jonathan Lewis Cost-Based Oracle Fundamentals,
Chapter 2
31
Partitioning Feature Characteristic Reason Matrix
32
Questions ?
33
References Papers

Table Compression in Oracle 9iR2 A Performance
Analysis
Table Compression in Oracle 9iR2 An Oracle White
Paper
Scaling To Infinity, Partitioning In Oracle Data
Warehouses, Tim Gorman
Decision Speed Table Compression In Action

34
References Online Presentation / Code

http//www.oramoss.demon.co.uk/presentations/stack
itandpackit.ppt
http//www.oramoss.demon.co.uk/Code/mgmt_p_get_max
_compression_order.prc
http//www.oramoss.demon.co.uk/Code/test_dml_perfo
rmance_delete.sql
http//www.oramoss.demon.co.uk/Code/test_dml_perfo
rmance_insert.sql
http//www.oramoss.demon.co.uk/Code/test_dml_perfo
rmance_update.sql
http//www.oramoss.demon.co.uk/Code/test_block_siz
e_compression.sql
http//www.oramoss.demon.co.uk/Code/test_column_le
ngth_compression.sql
http//www.oramoss.demon.co.uk/Code/test_itl_compr
ession.sql
http//www.oramoss.demon.co.uk/Code/test_ndv_compr
ession.sql
http//www.oramoss.demon.co.uk/Code/test_num_cols_
compression.sql
http//www.oramoss.demon.co.uk/Code/test_pctfree_c
ompression.sql

Write a Comment

User Comments (0)

About PowerShow.com

Stack It - PowerPoint PPT Presentation

Stack It

UK Music Sales Data Mart. Produces BBC Radio 1 Top 40 chart and many more ... Serial Inserts using INSERT /* APPEND */ Parallel Inserts (PDML) ALTER TABLE...MOVE... – PowerPoint PPT presentation