Title: OLAP
1OLAP
2Where Does OLAP Fit In? (1)
- OLAP On-line analytical processing.
- OLAP is a characterization of applications, not a
database design technique. - Idea is to provide very fast response time in
order to facilitate iterative decision-making. - Analytical processing requires access to complex
aggregations (as opposed to record-level access).
3Where Does OLAP Fit In? (2)
- Information is conceptually viewed as cubes for
simplifying the way in which users access, view,
and analyze data. - Quantitative values are known as facts or
measures. - e.g., sales , units sold, etc.
- Descriptive categories are known as dimensions.
- e.g., geography, time, product, scenario (budget
or actual), etc. - Dimensions are often organized in hierarchies
that represent levels of detail in the data
(e.g., UPC, SKU, product subcategory, product
category, etc.).
4OLAP FASMI Test
- Fast Delivers information to the user at a
fairly constant rate. Most queries should be
delivered to the user in five seconds or less. - Analysis Performs basic numerical and
statistical analysis of the data, pre-defined by
an application developer or defined ad hoc by the
user. - Shared Implements the security requirements
necessary for sharing potentially confidential
data across a large user population. - Multi-dimensional The essential characteristic
of OLAP. - Information Accesses all the data and
information necessary and relevant for the
application, wherever it may reside and not
limited by volume. - ...from the OLAP Report by Pendse and Creeth.
5Need for Multidimensional Analysis
- A simple analysis
- How many units of product A did we sell in the
store in DHA, Lahore - Typically, decision support requires more complex
analyses - How much revenue did the new product X generate
during the last three months, broken down by
individual months, in the Southern Region, by
individual stores, broken down by the promotions,
compared to estimates, and compared to the
previous version of the the product?
6Kinds of Analyses
- Roll-ups to provide summaries and aggregates
along the hierarchies of the dimensions - Drill-downs from the top level to the lowest
along the hierarchies of the dimensions - Calculations involving facts and metrics
- Algebraic equations involving key performance
indicators - Moving averages and growth percentages
- Trend analyses using statistical methods
7(No Transcript)
8(No Transcript)
9OLAP?
- The name On-Line Analytical Processing was coined
in a paper by E.F. Codd in 1993 (Providing
On-Line Analytical Processing for User Analysts) - A definition
- OLAP is a category of software technology that
enables analysts, managers, and executives to
gain insight into data through fast, consistent,
interactive access in a a wide variety of
possible views of information that has been
transformed from raw data to reflect the real
dimensionality of the enterprise as understood by
the user
10OLAP Features
11Dimensional Analysis (1)
12Dimensional Analysis (2)
13Some Queries
- Display the total sales of all products for past
five years in all stores - Compare total sales for all stores, product by
product, between years 2000 and 1999. - Show comparison of sales by individual stores,
product by product, between years 2000 and 1999
only for those products with reduced sales. - Show the results of the previous queries, but
rotating the columns with rows
14Hypercubes
- Multi-dimension cubes
- Hard to visualize and display beyond three
dimensions - Multi-dimensional domain structure (MDS)
- Represents each dimension as a line showing the
values
15MDS
16Display of Hypercubes
17(No Transcript)
18(No Transcript)
19Drill-Down and Roll-Up
20Slice-and-Dice or Rotation
21OLAP Models/Implementations
- MOLAP OLAP implemented with a multi-dimensional
database. - ROLAP OLAP implemented with a relational
database. - HOLAP OLAP implemented with a hybrid of
multi-dimensional and relational database
technologies. - DOLAP OLAP implemented for desktop decision
support environments.
22ROLAP and MOLAP
23MOLAP Implementations
- OLAP has historically been implemented through
use of multi-dimensional databases (MDDs). - Dimensions are key business factors for analysis
- geographies (zip, state, region,...)
- products (item, product category, product
department,...) - dates (day, week, month, quarter, year,...)
- Very high performance via fast look-up into
cube data structure to retrieve pre-calculated
results. - Cube data structures allow pre-calculation of
aggregate results for each possible combination
of dimensional values. - Use of application programming interface (API)
for access via front-end tools.
24(No Transcript)
25MOLAP Implementations
- Need to consider both maintenance and storage
implications when designing strategy for when to
build cubes. - Maintenance Considerations Every data item
received into MDD must be aggregated into every
cube (assuming to-date summaries are
maintained). - Storage Considerations Although cubes get much
smaller (e.g., more dense) as dimensions get less
detailed (e.g., year vs. day), storage
implications for building hundreds of cubes can
be significant.
26MOLAP Implementations
- Typically outperform relational database
technology because all answers are pre-computed
into cubes (and overhead for accessing cubes is
very low). - Difficult to scale because of combinatorial
explosion in the number and size of cubes when
dimensions of significant cardinality are
required. - Beyond tens (sometimes small hundreds) of
thousands of entries in a single dimension will
break the MOLAP model because the pre-computed
cube model does not work well when the cubes are
very sparse in the population of individual
cells. - See www.olapreport.com/DataExplosion.htm
27Virtual Cubes
- Virtual cubes are used when there is a need to
join information from two dissimilar cubes that
share one or more common dimensions. - Similar to a relational view two (or more) cubes
are linked along common dimension(s). - Often used to save space by eliminating redundant
storage of information.
28Partitioned Cubes
- One logical cube of data can be spread across
multiple physical cubes on separate (or same)
servers. - The divide-and-conquer approach of partitioned
cubes helps to mitigate the scalability
limitations of a MOLAP environment. - Ideal cube partitioning is completely invisible
to end users.
29ROLAP Implementations
- Advances in database technologies and front-end
tools have begun to allow deployment of OLAP
using ANSI SQL RDBMS implementations. - ROLAP facilitates deployment of much larger
dimension tables than MOLAP implementations. - Front-end tools to facilitate GUI access to
multi-dimensional analysis capabilities. - Aggregate awareness allows exploitation of
pre-built summary tables for some front-end
tools. - Star schema designs are often used to facilitate
OLAP against relational databases.
30(No Transcript)
31Simplified Third Normal Form (Retail)
32Simplified Star Schema
33Simplified Star Schema
- A vastly simplified physical data model!
- Collapse dimensional hierarchies into a single
table for each dimension and create a single fact
table from the header and detail records - Fewer tables.
- Fewer joins to get results.
34Star Schema for High Performance
- Business question How many in raincoats did I
sell in the first week of January through stores
in Boston? - Assume
- 4 Billion rows in fact table.
- 20 different kinds (size, color, style) of
raincoats (product category) out of 50,000 UPCs
in store. - 8 stores out of 400 are in BOSTON SMSA.
- 2 years of POS history in DBMS.
35Star Schema for High Performance
- Simple (poor performance) approach to query
execution - 1. Join item table with filtering on raincoat
product category (very selective) to fact table. - 2. Join date table with filtering by week (next
most selective) to result table. - 3. Join store table with filtering on store to
result table from step 2. - 4. Aggregate.
36Star Schema for High Performance
- Advanced (better performance) approach to query
execution - 1. Cartesian product join between dimensional
tables. - Result is 20 x 8 x 7 1,120 rows.
- 2. Use composite index on itemstoreday into
fact table for very selective access. - Access less than 0.00000008 percent of data in
fact table! - Sophisticated cost-based optimizers will figure
this out.
37Forcing a Cartesian Product Join
- Add an addition join_value column in each
dimensional table. - Set join_value to same value in all rows of the
dimensional tables. - Add additional where clause predicates joining on
this column between dimensional tables. - NOTE This shouldn't be necessary with a smart
optimizer.
38Forcing a Cartesian Product Join
- Sample code
- select sum(sales.sales_amt)
- from d_sales_detail
- ,store
- ,item
- ,period
- where d_sales_detail.store_id store.store_id
- and d_sales_detail.item_id item.item_id
- and d_sales_detail.day_dt period.day_dt
- and period.day_dt between '23-NOV-2000' and
'24-DEC-2000' - and item.trade_style_cd 'BARBIE'
- and store.state_cd 'CA'
- and store.join_value period.join_value
- and store.join_value item.join_value
- and period.join_value item.join_value
39Star Schema for High Performance
- Problem What if I want to know raincoat sales
in first week of January regardless of store? - Answer Performance advantage of composite index
in traditional RDBMS is severely impaired! - B-tree indexing techniques do not allow for
flexibility in the use of dimensions for query
purposes. - Bit indexing (and variations thereof) often
allows much more generality in achieving high
performance from a star schema.
40Star Schema for High Performance
- Bottom Line
- It is not at all unusual to obtain an order of
magnitude (or more) in performance advantage
using a star schema with advanced indexing versus
a more traditional relational database
implementation. - Despite what vendors may tell you, star schemas
cannot be effectively implemented for all DSS
business applications and/or data models.
41ROLAP
- Relational OLAP often makes heavy use of summary
tables to provide near instantaneous access for
multi-dimensional queries. - Foundation is usually star schema or snowflake
database design. - Allows OLAP with much larger data sets than
multi-dimensional database (MDD) products using
cube structures (MOLAP).
42ROLAP
- Number of summary tables can get very large if
discipline is not enforced... - Assume a retail database with the following two
dimensions on the fact table... - Calendar Day, Week, Period, Quarter, Year, All
Days - Geography Store, Zone, District, Region, All
Stores
43ROLAP
Summary tables in a naive implementation require
all combinations of the dimensions at each
aggregation level...
30 summary tables! ... Add in item, SKU,
subcategory, category, and all items...now we are
up to 150 pre-aggregates!
44ROLAP
- Summary tables are more of a maintenance issue
than a storage issue in most production
implementations. - Notice that summary tables get much smaller as
dimensions get less detailed (e.g., year vs.
day). - Should plan for double the size of the
unsummarized data for ROLAP summaries in most
environments. - Every detail record that is received into
warehouse must aggregate into EVERY summary
table (assuming "to-date" summaries are
maintained).
45ROLAP
- Warning Do not assume that dimensions are
always simple hierarchies. - Example Items are not just category,
subcategory, SKU, and atomic item.... what about
trade styles or manufacturer? - Now we need summary tables along these lines as
well...another 120 summary tables! - Calendar vs. accounting period vs. billing cycle
can be even worse...
46ROLAP
- Many ROLAP products have devised ways to reduce
the number of summary tables - Ability to build summaries on-the-fly as demanded
by end-user applications. - Ability to aggregate efficiently from subset of
the summary tables. - Tools exist in some products to assist in DBAs in
selecting the "best aggregations to build. - HOLAP (Hybrid OLAP) tools allow co-existence of
pre-built cubes alongside relational OLAP
structures.
47Intelligent Aggregation Selection
- Maximum performance boost implies lots of disk
for every pre-calculation. - Minimum performance boost implies no disk with
zero pre-calculation. - Strategy is to use meta data to heuristically
determine optimum set of aggregates from which
all other aggregates can be derived.
48Aggregate Wizards
49Fact Table Aggregates
- Enhance performance on common queries at coarser
granularities. - Save space to permit storing more history than
possible with finer granularities. - Take advantage of need to store other facts (with
similar samples) at a particular granularity.
50Aggregate Advice
- Coarser granularity decreases potential
cardinality, but usually increases density (e.g.,
daily summary table is typically twice the size
of weekly summary table - not seven times). - Strongly consider omitting candidate aggregates
where expected cardinality is more than 10 that
of next finer granularity stored. - Keep the detail for drill down, even if you
deploy aggregates for performance.
51Bottom Line
- There are many implementation techniques for
delivery of an OLAP environment. - Must fully consider the performance, scalability,
complexity, and flexibility characteristics when
deciding between MOLAP, ROLAP, and HOLAP. - Understand your tools and RDBMS!
52MOLAP Vs. ROLAP
53Implementation Issues
- Data design and preparation
- Administration and performance
- OLAP platforms
- OLAP tools and products
54Data Design and Preparation
- Characteristics of data
- Stores and uses much less data compared to a DW
- Data is summarized. You will rarely find data at
the lowest levels of detail as in the DW - Data is more flexible for processing and analysis
partly because there is much less data to work
with - Every instance of the OLAP system in your
environment is customized for the purpose that
instance serves - OLAP data is generally customized
- Types and levels
- Static and dynamic summary data
- Permanent and transient detailed data
55Administration
- Administering and managing OLAP systems should be
handled with that of the DW environment - Some considerations
- Expectations on what data would be accessed and
how - Selection of the right business dimensions
- Selection of the right filters in loading data
from the DW - Choosing the aggregation, summarization, and
precalculation - Size of the multidimensional database
- Access and security privileges
- Backup and restore facilities
- Drill-through to the data warehouse
drill-through to another OLAP instance
56Performance
- OLAP takes most of the queries that normalling
would run against the DW - OLAP is designed for complex queries so it
should enhance overall query performance of the
DW environment - OLAP can precalculate and pre-aggregate data for
quick response
57OLAP Platforms
- Usually, the data warehouse and OLAP systems
reside on the same platform in the start. Later
when the data warehouse becmes large and OLAP is
a common task, OLAP system is moved to another
platform - A separate platform is needed when
- The size and usage of the DW consumes all
resources - Many departmental users desire OLAP capabilities
- The stability and performance of OLAP degrades
- OLAP tools require a different platform
configuration than the DW