Title: Stack It
1Stack It Pack ItPartitioning And Compression
For Warehouses / VLDB
2Who Dunnit ?
3Agenda
- My background
- Squeeze your data with data segment compression
- Partition for success
- Questions
4My Background
- Independent Consultant
- 13 years Oracle experience
- Blog http//oramossoracle.blogspot.com/
- Focused on warehousing / VLDB since 1998
- First project
- UK Music Sales Data Mart
- Produces BBC Radio 1 Top 40 chart and many more
- 2 billion row sales fact table
- 1 Tb total database size
- Currently working with Eon UK (Powergen)
- 4Tb Production Warehouse, 8Tb total storage
- Oracle Product Stack
5What Is Data Segment Compression ?
- Compresses data by eliminating intra block
repeated column values - Reduces the space required for a segment
- but only if there are appropriate repeats!
- Self contained
- Lossless algorithm
6Where Can Data Segment Compression Be Used ?
- Can be used with a number of segment types
- Heap Nested Tables
- Range or List Partitions
- Materialized Views
- Cant be used with
- Subpartitions
- Hash Partitions
- Indexes but they have row level compression
- IOT
- External Tables
- Tables that are part of a Cluster
- LOBs
7How Does Segment Compression Work ?
Database Block
Block Common Header (20 bytes)
Transaction Header (24 bytes fixed 24 bytes per
ITL)
Data Header (14 bytes)
Compressed Data Header (16 bytes - variable)
Symbol Table
Row Data Area
Tail (4 bytes)
8What Affects Compression ?
- Undisclosed Algorithm
- I asked but support wouldnt play ball!
- Many Factors
- Block size
- Anything which affects block overhead
- Interested Transaction Lists (INITRANS)
- Number of columns
- Number of rows
- PCTFREE
- Number of repeats (in the block)
- Length of column value(s)
9Compression v Block Size
- 200K rows, Non ASSM Uniform Local extents
- More chance of repeats in any given block
10Compression v ITL
- 10K rows, Non ASSM Uniform Local extents
- More ITL more overhead less repeats
11Compression v Number Of Columns
- 500K rows, Non ASSM Uniform Local extents
- Same amount of data to store
- More columns more overhead less repeats
12Compression v PCTFREE
- 200K rows, Non ASSM Uniform Local extents
- Higher PCTFREE less space less repeats
13Compression v NDV
- 200K rows, Non ASSM Uniform Local extents
- Higher NDV less repeats
14Compression v Column Length
- 80K rows, Non ASSM Uniform Local extents
- Minimum 6 characters for compression
- Longer Length more compression savings
15Compression v Ordering
- Colocate data to maximise compression benefits
- For maximum compression
- Minimise the total space required by the segment
- Identify most compressable column(s)
- For optimal access
- We know how the data is to be queried
- Order the data by
- Access path columns
- Then the next most compressable column(s)
Uniformly distributed
Colocated
16Get Max Compression Order Package
- PROCEDURE mgmt_p_get_max_compress_order
- Argument Name Type
In/Out Default? - ------------------------------ -------------------
---- ------ -------- - P_TABLE_OWNER VARCHAR2
IN DEFAULT - P_TABLE_NAME VARCHAR2
IN - P_PARTITION_NAME VARCHAR2
IN DEFAULT - P_SAMPLE_SIZE NUMBER
IN DEFAULT - P_PREFIX_COLUMN1 VARCHAR2
IN DEFAULT - P_PREFIX_COLUMN2 VARCHAR2
IN DEFAULT - P_PREFIX_COLUMN3 VARCHAR2
IN DEFAULT - BEGIN
- mgmt_p_get_max_compress_order(p_table_owner gt
AE_MGMT - ,p_table_name gtBIG_TABLE
- ,p_sample_size gt10000)
- END
- /
Running mgmt_p_get_max_compress_order... ---------
--------------------------------------------------
----------------------------------------- Table
BIG_TABLE Sample Size 10000 Unique Run
ID 25012006232119 ORDER BY Prefix --------------
--------------------------------------------------
------------------------------------ Creating
MASTER Table TEMP_MASTER_25012006232119 Creatin
g COLUMN Table 1 COL1 Creating COLUMN Table 2
COL2 Creating COLUMN Table 3 COL3 ---------------
--------------------------------------------------
----------------------------------- The output
below lists each column in the table and the
number of blocks/rows and space used when the
table data is ordered by only that column, or in
the case where a prefix has been specified, where
the table data is ordered by the prefix and then
that column. From this one can determine if there
is a specific ORDER BY which can be applied to to
the data in order to maximise compression within
the table whilst, in the case of a a prefix being
present, ordering data as efficiently as possible
for the most common access path(s). --------------
--------------------------------------------------
------------------------------------ NAME
COLUMN
BLOCKS ROWS SPACE_GB
TEMP_COL_001_25
012006232119 COL1
290 10000 .0022 TEMP_COL_002_25012
006232119 COL2
345 10000 .0026 TEMP_COL_003_25012006
232119 COL3
555 10000 .0042
17Pros Cons
- Pros
- Saves space
- Reduces LIO / PIO
- Speeds up backup/recovery
- Improves query response time
- Transparent
- To readers
- and writers
- Decreases time to perform some DML
- Deletes should be quicker
- Bulk inserts may be quicker
18Pros Cons
- Cons
- Increases CPU load
- Can only be used on Direct Path operations
- CTAS
- Serial Inserts using INSERT / APPEND /
- Parallel Inserts (PDML)
- ALTER TABLEMOVE
- Direct Path SQLLoader
- Increases time to perform some DML
- Bulk inserts may be slower
- Updates are slower
19Data Warehousing Specifics
- Star Schema compresses better than Normalized
- More redundant data
- Focus on
- Fact Tables and Summaries in Star Schema
- Transaction tables in Normalized Schema
- Performance Impact1
- Space Savings
- Star schema 67
- Normalized 24
- Query Elapsed Times
- Star schema 16.5
- Normalized 10
1 - Table Compression in Oracle 9iR2 A
Performance Analysis
20Things To Watch Out For
- DROP COLUMN is awkward
- ORA-39726 Unsupported add/drop column operation
on compressed tables - Uncompress the table and try again - still gives
ORA-39726! - After UPDATEs data is uncompressed
- Performance impact
- Row migration
- Use appropriate physical design settings
- PCTFREE 0 - pack each block
- Large blocksize - reduce overhead / increase
repeats per block - Minimise INITRANS - reduce overhead
- Order data for best compression / access path
21A Funny Thing
- Block dump trace files still show 9iR2 even in
10g releases - ALTER SYSTEM DUMP DATAFILE x BLOCK y
Thanks to Julian Dyke for the block dumping
information http//www.juliandyke.com
22What Is Partitioning ?
- Partitioning addresses key issues in supporting
very large tables and indexes by letting you
decompose them into smaller and more manageable
pieces called partitions. Oracle Database
Concepts Manual, 10gR2 - Introduced in Oracle 8.0
- Numerous improvements since
- Subpartitioning adds another level of
decomposition - Partitions and Subpartitions are logical
containers
23Partition To Tablespace Mapping
- Partitions map to tablespaces
- Partition can only be in One tablespace
- Tablespace can hold many partitions
- Highest granularity is One tablespace per
partition - Lowest granularity is One tablespace for all the
partitions - Tablespace volatility
- Read / Write
- Read Only
T_Q1_2005
P_JAN_2005
P_FEB_2005
P_MAR_2005
P_APR_2005
T_Q2_2005
P_MAY_2005
P_JUN_2005
P_JUL_2005
T_Q3_2005
T_Q3_2005
P_AUG_2005
P_SEP_2005
P_OCT_2005
T_Q4_2005
P_NOV_2005
P_DEC_2005
T_Q1_2006
P_JAN_2006
P_FEB_2006
P_MAR_2006
Read / Write
Read Only
24Read Only Tablespaces
- Quicker checkpointing
- Quicker backup
- Quicker recovery
- Reduced space use via compression
- But
- depends on granularity
Partition
Tablespace
25Why Partition ? - Performance
- Improved query performance
- Pruning or elimination
- Partition wise joins
- Full
- Partial
- Selective Compression
- By Partition
- Selective Reorganisation
- Index Partition REBUILD
- Table Partition MOVE
Sales Fact Table
SELECT SUM(sales) FROM part_tab WHERE sales_date
BETWEEN 01-JAN-2005 AND 30-JUN-2005
Oracle 10gR2 Data Warehousing Manual
26Why Partition ? - Manageability
- Archiving
- Use a rolling window approach
- ALTER TABLE ADD/SPLIT/DROP PARTITION
- Easier ETL Processing
- Build a new dataset in a staging table
- Add indexes and constraints
- Collect statistics
- Then swap the staging table for a partition on
the target - ALTER TABLEEXCHANGE PARTITION
- Easier Maintenance
- Table partition move, e.g. to compress data
- Local Index partition rebuild
27Why Partition ? - Scalability
- Partition is generally consistent and predictable
- Assuming an appropriate partitioning key is used
- and data has an even distribution across the key
- Read only approach
- Scalable backups - read only tablespaces are
ignored - so partitions in those tablespaces are ignored
- Pruning allows consistent query performance
28Why Partition ? - Availability
- Offline data impact minimised
- depending on granularity
- Quicker recovery
- Pruned data not missed
- EXCHANGE PARTITION
- Allows offline build
- Quick swap over
P_JAN_2005
T_Q1_2005
P_FEB_2005
P_MAR_2005
P_APR_2005
T_Q2_2005
P_MAY_2005
P_JUN_2005
P_JUL_2005
T_Q3_2005
T_Q3_2005
P_AUG_2005
P_SEP_2005
P_OCT_2005
T_Q4_2005
P_NOV_2005
P_DEC_2005
P_JAN_2006
T_Q1_2006
P_FEB_2006
P_MAR_2006
Read / Write
Read Only
29Fact Table Partitioning
Transaction Date
Load Date
Easier ETL Processing Each load deals with only 1
partition No use to end user queries! Cant prune
Full scans!
Harder ETL Processing But still uses EXCHANGE
PARTITION Useful to end user queries Allows full
pruning capability
30Watch out for
- Partition exchange and table statistics1
- Partition stats updated
- but Global stats are NOT!
- Affects queries accessing multiple partitions
- Solution
- Gather stats on staging table prior to EXCHANGE
- Partition exchange
- Gather stats on partitioned table using GLOBAL
Jonathan Lewis Cost-Based Oracle Fundamentals,
Chapter 2
31Partitioning Feature Characteristic Reason Matrix
32Questions ?
33References Papers
- Table Compression in Oracle 9iR2 A Performance
Analysis - Table Compression in Oracle 9iR2 An Oracle White
Paper - Scaling To Infinity, Partitioning In Oracle Data
Warehouses, Tim Gorman - Decision Speed Table Compression In Action
34References Online Presentation / Code
- http//www.oramoss.demon.co.uk/presentations/stack
itandpackit.ppt - http//www.oramoss.demon.co.uk/Code/mgmt_p_get_max
_compression_order.prc - http//www.oramoss.demon.co.uk/Code/test_dml_perfo
rmance_delete.sql - http//www.oramoss.demon.co.uk/Code/test_dml_perfo
rmance_insert.sql - http//www.oramoss.demon.co.uk/Code/test_dml_perfo
rmance_update.sql - http//www.oramoss.demon.co.uk/Code/test_block_siz
e_compression.sql - http//www.oramoss.demon.co.uk/Code/test_column_le
ngth_compression.sql - http//www.oramoss.demon.co.uk/Code/test_itl_compr
ession.sql - http//www.oramoss.demon.co.uk/Code/test_ndv_compr
ession.sql - http//www.oramoss.demon.co.uk/Code/test_num_cols_
compression.sql - http//www.oramoss.demon.co.uk/Code/test_pctfree_c
ompression.sql