Stack It - PowerPoint PPT Presentation

1 / 34
About This Presentation
Title:

Stack It

Description:

UK Music Sales Data Mart. Produces BBC Radio 1 Top 40 chart and many more ... Serial Inserts using INSERT /* APPEND */ Parallel Inserts (PDML) ALTER TABLE...MOVE... – PowerPoint PPT presentation

Number of Views:23
Avg rating:3.0/5.0
Slides: 35
Provided by: jeff51
Category:
Tags: stack

less

Transcript and Presenter's Notes

Title: Stack It


1
Stack It Pack ItPartitioning And Compression
For Warehouses / VLDB
  • Jeff Moss

2
Who Dunnit ?
3
Agenda
  • My background
  • Squeeze your data with data segment compression
  • Partition for success
  • Questions

4
My Background
  • Independent Consultant
  • 13 years Oracle experience
  • Blog http//oramossoracle.blogspot.com/
  • Focused on warehousing / VLDB since 1998
  • First project
  • UK Music Sales Data Mart
  • Produces BBC Radio 1 Top 40 chart and many more
  • 2 billion row sales fact table
  • 1 Tb total database size
  • Currently working with Eon UK (Powergen)
  • 4Tb Production Warehouse, 8Tb total storage
  • Oracle Product Stack

5
What Is Data Segment Compression ?
  • Compresses data by eliminating intra block
    repeated column values
  • Reduces the space required for a segment
  • but only if there are appropriate repeats!
  • Self contained
  • Lossless algorithm

6
Where Can Data Segment Compression Be Used ?
  • Can be used with a number of segment types
  • Heap Nested Tables
  • Range or List Partitions
  • Materialized Views
  • Cant be used with
  • Subpartitions
  • Hash Partitions
  • Indexes but they have row level compression
  • IOT
  • External Tables
  • Tables that are part of a Cluster
  • LOBs

7
How Does Segment Compression Work ?
Database Block
Block Common Header (20 bytes)
Transaction Header (24 bytes fixed 24 bytes per
ITL)
Data Header (14 bytes)
Compressed Data Header (16 bytes - variable)
Symbol Table
Row Data Area
Tail (4 bytes)
8
What Affects Compression ?
  • Undisclosed Algorithm
  • I asked but support wouldnt play ball!
  • Many Factors
  • Block size
  • Anything which affects block overhead
  • Interested Transaction Lists (INITRANS)
  • Number of columns
  • Number of rows
  • PCTFREE
  • Number of repeats (in the block)
  • Length of column value(s)

9
Compression v Block Size
  • 200K rows, Non ASSM Uniform Local extents
  • More chance of repeats in any given block

10
Compression v ITL
  • 10K rows, Non ASSM Uniform Local extents
  • More ITL more overhead less repeats

11
Compression v Number Of Columns
  • 500K rows, Non ASSM Uniform Local extents
  • Same amount of data to store
  • More columns more overhead less repeats

12
Compression v PCTFREE
  • 200K rows, Non ASSM Uniform Local extents
  • Higher PCTFREE less space less repeats

13
Compression v NDV
  • 200K rows, Non ASSM Uniform Local extents
  • Higher NDV less repeats

14
Compression v Column Length
  • 80K rows, Non ASSM Uniform Local extents
  • Minimum 6 characters for compression
  • Longer Length more compression savings

15
Compression v Ordering
  • Colocate data to maximise compression benefits
  • For maximum compression
  • Minimise the total space required by the segment
  • Identify most compressable column(s)
  • For optimal access
  • We know how the data is to be queried
  • Order the data by
  • Access path columns
  • Then the next most compressable column(s)

Uniformly distributed
Colocated
16
Get Max Compression Order Package
  • PROCEDURE mgmt_p_get_max_compress_order
  • Argument Name Type
    In/Out Default?
  • ------------------------------ -------------------
    ---- ------ --------
  • P_TABLE_OWNER VARCHAR2
    IN DEFAULT
  • P_TABLE_NAME VARCHAR2
    IN
  • P_PARTITION_NAME VARCHAR2
    IN DEFAULT
  • P_SAMPLE_SIZE NUMBER
    IN DEFAULT
  • P_PREFIX_COLUMN1 VARCHAR2
    IN DEFAULT
  • P_PREFIX_COLUMN2 VARCHAR2
    IN DEFAULT
  • P_PREFIX_COLUMN3 VARCHAR2
    IN DEFAULT
  • BEGIN
  • mgmt_p_get_max_compress_order(p_table_owner gt
    AE_MGMT
  • ,p_table_name gtBIG_TABLE
  • ,p_sample_size gt10000)
  • END
  • /

Running mgmt_p_get_max_compress_order... ---------
--------------------------------------------------
----------------------------------------- Table
BIG_TABLE Sample Size 10000 Unique Run
ID 25012006232119 ORDER BY Prefix --------------
--------------------------------------------------
------------------------------------ Creating
MASTER Table TEMP_MASTER_25012006232119 Creatin
g COLUMN Table 1 COL1 Creating COLUMN Table 2
COL2 Creating COLUMN Table 3 COL3 ---------------
--------------------------------------------------
----------------------------------- The output
below lists each column in the table and the
number of blocks/rows and space used when the
table data is ordered by only that column, or in
the case where a prefix has been specified, where
the table data is ordered by the prefix and then
that column. From this one can determine if there
is a specific ORDER BY which can be applied to to
the data in order to maximise compression within
the table whilst, in the case of a a prefix being
present, ordering data as efficiently as possible
for the most common access path(s). --------------
--------------------------------------------------
------------------------------------ NAME
COLUMN
BLOCKS ROWS SPACE_GB

TEMP_COL_001_25
012006232119 COL1
290 10000 .0022 TEMP_COL_002_25012
006232119 COL2
345 10000 .0026 TEMP_COL_003_25012006
232119 COL3
555 10000 .0042
17
Pros Cons
  • Pros
  • Saves space
  • Reduces LIO / PIO
  • Speeds up backup/recovery
  • Improves query response time
  • Transparent
  • To readers
  • and writers
  • Decreases time to perform some DML
  • Deletes should be quicker
  • Bulk inserts may be quicker

18
Pros Cons
  • Cons
  • Increases CPU load
  • Can only be used on Direct Path operations
  • CTAS
  • Serial Inserts using INSERT / APPEND /
  • Parallel Inserts (PDML)
  • ALTER TABLEMOVE
  • Direct Path SQLLoader
  • Increases time to perform some DML
  • Bulk inserts may be slower
  • Updates are slower

19
Data Warehousing Specifics
  • Star Schema compresses better than Normalized
  • More redundant data
  • Focus on
  • Fact Tables and Summaries in Star Schema
  • Transaction tables in Normalized Schema
  • Performance Impact1
  • Space Savings
  • Star schema 67
  • Normalized 24
  • Query Elapsed Times
  • Star schema 16.5
  • Normalized 10

1 - Table Compression in Oracle 9iR2 A
Performance Analysis
20
Things To Watch Out For
  • DROP COLUMN is awkward
  • ORA-39726 Unsupported add/drop column operation
    on compressed tables
  • Uncompress the table and try again - still gives
    ORA-39726!
  • After UPDATEs data is uncompressed
  • Performance impact
  • Row migration
  • Use appropriate physical design settings
  • PCTFREE 0 - pack each block
  • Large blocksize - reduce overhead / increase
    repeats per block
  • Minimise INITRANS - reduce overhead
  • Order data for best compression / access path

21
A Funny Thing
  • Block dump trace files still show 9iR2 even in
    10g releases
  • ALTER SYSTEM DUMP DATAFILE x BLOCK y

Thanks to Julian Dyke for the block dumping
information http//www.juliandyke.com
22
What Is Partitioning ?
  • Partitioning addresses key issues in supporting
    very large tables and indexes by letting you
    decompose them into smaller and more manageable
    pieces called partitions. Oracle Database
    Concepts Manual, 10gR2
  • Introduced in Oracle 8.0
  • Numerous improvements since
  • Subpartitioning adds another level of
    decomposition
  • Partitions and Subpartitions are logical
    containers

23
Partition To Tablespace Mapping
  • Partitions map to tablespaces
  • Partition can only be in One tablespace
  • Tablespace can hold many partitions
  • Highest granularity is One tablespace per
    partition
  • Lowest granularity is One tablespace for all the
    partitions
  • Tablespace volatility
  • Read / Write
  • Read Only

T_Q1_2005
P_JAN_2005
P_FEB_2005
P_MAR_2005
P_APR_2005
T_Q2_2005
P_MAY_2005
P_JUN_2005
P_JUL_2005
T_Q3_2005
T_Q3_2005
P_AUG_2005
P_SEP_2005
P_OCT_2005
T_Q4_2005
P_NOV_2005
P_DEC_2005
T_Q1_2006
P_JAN_2006
P_FEB_2006
P_MAR_2006
Read / Write
Read Only
24
Read Only Tablespaces
  • Quicker checkpointing
  • Quicker backup
  • Quicker recovery
  • Reduced space use via compression
  • But
  • depends on granularity

Partition
Tablespace
25
Why Partition ? - Performance
  • Improved query performance
  • Pruning or elimination
  • Partition wise joins
  • Full
  • Partial
  • Selective Compression
  • By Partition
  • Selective Reorganisation
  • Index Partition REBUILD
  • Table Partition MOVE

Sales Fact Table
SELECT SUM(sales) FROM part_tab WHERE sales_date
BETWEEN 01-JAN-2005 AND 30-JUN-2005
Oracle 10gR2 Data Warehousing Manual
26
Why Partition ? - Manageability
  • Archiving
  • Use a rolling window approach
  • ALTER TABLE ADD/SPLIT/DROP PARTITION
  • Easier ETL Processing
  • Build a new dataset in a staging table
  • Add indexes and constraints
  • Collect statistics
  • Then swap the staging table for a partition on
    the target
  • ALTER TABLEEXCHANGE PARTITION
  • Easier Maintenance
  • Table partition move, e.g. to compress data
  • Local Index partition rebuild

27
Why Partition ? - Scalability
  • Partition is generally consistent and predictable
  • Assuming an appropriate partitioning key is used
  • and data has an even distribution across the key
  • Read only approach
  • Scalable backups - read only tablespaces are
    ignored
  • so partitions in those tablespaces are ignored
  • Pruning allows consistent query performance

28
Why Partition ? - Availability
  • Offline data impact minimised
  • depending on granularity
  • Quicker recovery
  • Pruned data not missed
  • EXCHANGE PARTITION
  • Allows offline build
  • Quick swap over

P_JAN_2005
T_Q1_2005
P_FEB_2005
P_MAR_2005
P_APR_2005
T_Q2_2005
P_MAY_2005
P_JUN_2005
P_JUL_2005
T_Q3_2005
T_Q3_2005
P_AUG_2005
P_SEP_2005
P_OCT_2005
T_Q4_2005
P_NOV_2005
P_DEC_2005
P_JAN_2006
T_Q1_2006
P_FEB_2006
P_MAR_2006
Read / Write
Read Only
29
Fact Table Partitioning
Transaction Date
Load Date
Easier ETL Processing Each load deals with only 1
partition No use to end user queries! Cant prune
Full scans!
Harder ETL Processing But still uses EXCHANGE
PARTITION Useful to end user queries Allows full
pruning capability
30
Watch out for
  • Partition exchange and table statistics1
  • Partition stats updated
  • but Global stats are NOT!
  • Affects queries accessing multiple partitions
  • Solution
  • Gather stats on staging table prior to EXCHANGE
  • Partition exchange
  • Gather stats on partitioned table using GLOBAL

Jonathan Lewis Cost-Based Oracle Fundamentals,
Chapter 2
31
Partitioning Feature Characteristic Reason Matrix
32
Questions ?
33
References Papers
  • Table Compression in Oracle 9iR2 A Performance
    Analysis
  • Table Compression in Oracle 9iR2 An Oracle White
    Paper
  • Scaling To Infinity, Partitioning In Oracle Data
    Warehouses, Tim Gorman
  • Decision Speed Table Compression In Action

34
References Online Presentation / Code
  • http//www.oramoss.demon.co.uk/presentations/stack
    itandpackit.ppt
  • http//www.oramoss.demon.co.uk/Code/mgmt_p_get_max
    _compression_order.prc
  • http//www.oramoss.demon.co.uk/Code/test_dml_perfo
    rmance_delete.sql
  • http//www.oramoss.demon.co.uk/Code/test_dml_perfo
    rmance_insert.sql
  • http//www.oramoss.demon.co.uk/Code/test_dml_perfo
    rmance_update.sql
  • http//www.oramoss.demon.co.uk/Code/test_block_siz
    e_compression.sql
  • http//www.oramoss.demon.co.uk/Code/test_column_le
    ngth_compression.sql
  • http//www.oramoss.demon.co.uk/Code/test_itl_compr
    ession.sql
  • http//www.oramoss.demon.co.uk/Code/test_ndv_compr
    ession.sql
  • http//www.oramoss.demon.co.uk/Code/test_num_cols_
    compression.sql
  • http//www.oramoss.demon.co.uk/Code/test_pctfree_c
    ompression.sql
Write a Comment
User Comments (0)
About PowerShow.com