Title: Designing Aggregations
1DesigningAggregations
2Performance Fundamentals - Aggregations
- Pre-calculated summaries of data
- Intersections of levels from each dimension
- Tradeoff between processing (disk space) and
query times
Number of Aggregations
3Aggregation Generalities
- Are created when a cube is processed
- Include all measures
- Cannot generally be designed for specific members
- Are built and maintained automatically
4Contents of an Aggregation
5Contents of an Aggregation
- Time 33 Members
- All (1)
- Year (2)
- Quarter (8)
- Month (22)
Total Number of Aggregations 4 Total Aggregated
Values 33
6Contents of an Aggregation
- Time 33 Total Members
- All (1)
- Year (2)
- Quarter (8)
- Month (22)
- State - 14 Total Members
- All (1)
- Country (3)
- Region (4)
- State (6)
Total Number of Aggregations 16 Total
Aggregated Values 462 (Theoretical)
7Data Storage Minimal Data Explosion
- Data Explosion
- Historical weakness of OLAP
- 100 Dense
- No heaters sold in Phoenix in July no storage
allocated - Intelligent Aggregation Design
- Pre-aggregate only a subset of the data
- Compression Algorithm
8Partial Aggregation
Highest level of aggregation
Show all sales for all products for all . . .
Most detailed aggregations
Fact Table
9Partial Aggregation
Fact Table
10Analysis Server Cube Storage
11MOLAP Storage Mode
- Details and Aggregations Stored in
Multidimensional Format - Fastest Storage Option for Queries
- Often the Most Efficient in Terms of Disk
Storage, Due to Compression
12ROLAP Storage Mode
- Details and Aggregations Stored in RDBMS
- Slowest Query Performance
- Most Often the Slowest to Process
- Analysis Server Can Create Indexed Views
- Useful for Large Data Sources
- Provides Real-Time OLAP Solution
13HOLAP Storage Mode
- Details Maintained in RDBMS
- Aggregations Created in Multidimensional Format
- Good Option where Disk Consumption Is a Concern
- Good Compromise if Details Are Accessed
Infrequently
14Cube Aggregations
- Full Aggregation Not Necessary
- Effects on Cube Size and Processing Time
- Cube size and processing times increase as
aggregations are added to a cube - Tools for Implementing Aggregations
- Storage Design Wizard
- Usage-Based Optimization Wizard
15Why Usage-Based Optimization?
- Limitations of Storage Design Wizard
- Driven by structural factors
- Does not take Into account user behavior
- Benefits of Usage Based Optimization
- Collects data about user queries
- Allows aggregations to be built based on usage
patterns
16Using the Query Log
- Populating the Usage Log
- The server logs one out of ten queries by default
- You can change the query sampling rate
- Accessing the Usage Log
- You can view reports in Analysis Manager
- You can open the file msmdqlog.mdb
- Modifying the Cube Structure
- If you update the cube structure, query log
entries may be invalid - You can clear or filter the contents of the query
log in Analysis Manager
17Partitioning
- Subsets of the data within a cube
- May come from different tables or data sources
- Role in data management
- Varying storage modes and aggregations
- Adding new and removing obsolete data
18Partitioning Strategy
MOLAP 15 Agg
Current Year
Current Year
HOLAP 20 Agg
Prior Year
Prior Year
History
History
ROLAP 25 Agg
19Partitioning Data Management Example
Month 1
Month 2
Prior 2 years of data in ROLAP storage
Month 24
Month 25
Month 25
Month 26
Most recent 3 years of data in MOLAP storage
Month 60
New partition for new month data
Month 61
20Partitioning Guidelines
- Benefits for processing
- Parallelism
- Particularly in an initial load
- Increased selectivity when querying
- Must set slice information for the system!
- Beware! Changing the aggregations for a
partition will not alter other partitions