OLAP - PowerPoint PPT Presentation

1 / 57
About This Presentation
Title:

OLAP

Description:

Where Does OLAP Fit In? ( 2) ... Calculations involving facts and metrics ... 4 Billion rows in fact table. ... – PowerPoint PPT presentation

Number of Views:334
Avg rating:3.0/5.0
Slides: 58
Provided by: asimk
Category:
Tags: olap

less

Transcript and Presenter's Notes

Title: OLAP


1
OLAP
  • CS 543 Data Warehousing

2
Where Does OLAP Fit In? (1)
  • OLAP On-line analytical processing.
  • OLAP is a characterization of applications, not a
    database design technique.
  • Idea is to provide very fast response time in
    order to facilitate iterative decision-making.
  • Analytical processing requires access to complex
    aggregations (as opposed to record-level access).

3
Where Does OLAP Fit In? (2)
  • Information is conceptually viewed as cubes for
    simplifying the way in which users access, view,
    and analyze data.
  • Quantitative values are known as facts or
    measures.
  • e.g., sales , units sold, etc.
  • Descriptive categories are known as dimensions.
  • e.g., geography, time, product, scenario (budget
    or actual), etc.
  • Dimensions are often organized in hierarchies
    that represent levels of detail in the data
    (e.g., UPC, SKU, product subcategory, product
    category, etc.).

4
OLAP FASMI Test
  • Fast Delivers information to the user at a
    fairly constant rate. Most queries should be
    delivered to the user in five seconds or less.
  • Analysis Performs basic numerical and
    statistical analysis of the data, pre-defined by
    an application developer or defined ad hoc by the
    user.
  • Shared Implements the security requirements
    necessary for sharing potentially confidential
    data across a large user population.
  • Multi-dimensional The essential characteristic
    of OLAP.
  • Information Accesses all the data and
    information necessary and relevant for the
    application, wherever it may reside and not
    limited by volume.
  • ...from the OLAP Report by Pendse and Creeth.

5
Need for Multidimensional Analysis
  • A simple analysis
  • How many units of product A did we sell in the
    store in DHA, Lahore
  • Typically, decision support requires more complex
    analyses
  • How much revenue did the new product X generate
    during the last three months, broken down by
    individual months, in the Southern Region, by
    individual stores, broken down by the promotions,
    compared to estimates, and compared to the
    previous version of the the product?

6
Kinds of Analyses
  • Roll-ups to provide summaries and aggregates
    along the hierarchies of the dimensions
  • Drill-downs from the top level to the lowest
    along the hierarchies of the dimensions
  • Calculations involving facts and metrics
  • Algebraic equations involving key performance
    indicators
  • Moving averages and growth percentages
  • Trend analyses using statistical methods

7
(No Transcript)
8
(No Transcript)
9
OLAP?
  • The name On-Line Analytical Processing was coined
    in a paper by E.F. Codd in 1993 (Providing
    On-Line Analytical Processing for User Analysts)
  • A definition
  • OLAP is a category of software technology that
    enables analysts, managers, and executives to
    gain insight into data through fast, consistent,
    interactive access in a a wide variety of
    possible views of information that has been
    transformed from raw data to reflect the real
    dimensionality of the enterprise as understood by
    the user

10
OLAP Features
11
Dimensional Analysis (1)
12
Dimensional Analysis (2)
13
Some Queries
  • Display the total sales of all products for past
    five years in all stores
  • Compare total sales for all stores, product by
    product, between years 2000 and 1999.
  • Show comparison of sales by individual stores,
    product by product, between years 2000 and 1999
    only for those products with reduced sales.
  • Show the results of the previous queries, but
    rotating the columns with rows

14
Hypercubes
  • Multi-dimension cubes
  • Hard to visualize and display beyond three
    dimensions
  • Multi-dimensional domain structure (MDS)
  • Represents each dimension as a line showing the
    values

15
MDS
16
Display of Hypercubes
17
(No Transcript)
18
(No Transcript)
19
Drill-Down and Roll-Up
20
Slice-and-Dice or Rotation
21
OLAP Models/Implementations
  • MOLAP OLAP implemented with a multi-dimensional
    database.
  • ROLAP OLAP implemented with a relational
    database.
  • HOLAP OLAP implemented with a hybrid of
    multi-dimensional and relational database
    technologies.
  • DOLAP OLAP implemented for desktop decision
    support environments.

22
ROLAP and MOLAP
23
MOLAP Implementations
  • OLAP has historically been implemented through
    use of multi-dimensional databases (MDDs).
  • Dimensions are key business factors for analysis
  • geographies (zip, state, region,...)
  • products (item, product category, product
    department,...)
  • dates (day, week, month, quarter, year,...)
  • Very high performance via fast look-up into
    cube data structure to retrieve pre-calculated
    results.
  • Cube data structures allow pre-calculation of
    aggregate results for each possible combination
    of dimensional values.
  • Use of application programming interface (API)
    for access via front-end tools.

24
(No Transcript)
25
MOLAP Implementations
  • Need to consider both maintenance and storage
    implications when designing strategy for when to
    build cubes.
  • Maintenance Considerations Every data item
    received into MDD must be aggregated into every
    cube (assuming to-date summaries are
    maintained).
  • Storage Considerations Although cubes get much
    smaller (e.g., more dense) as dimensions get less
    detailed (e.g., year vs. day), storage
    implications for building hundreds of cubes can
    be significant.

26
MOLAP Implementations
  • Typically outperform relational database
    technology because all answers are pre-computed
    into cubes (and overhead for accessing cubes is
    very low).
  • Difficult to scale because of combinatorial
    explosion in the number and size of cubes when
    dimensions of significant cardinality are
    required.
  • Beyond tens (sometimes small hundreds) of
    thousands of entries in a single dimension will
    break the MOLAP model because the pre-computed
    cube model does not work well when the cubes are
    very sparse in the population of individual
    cells.
  • See www.olapreport.com/DataExplosion.htm

27
Virtual Cubes
  • Virtual cubes are used when there is a need to
    join information from two dissimilar cubes that
    share one or more common dimensions.
  • Similar to a relational view two (or more) cubes
    are linked along common dimension(s).
  • Often used to save space by eliminating redundant
    storage of information.

28
Partitioned Cubes
  • One logical cube of data can be spread across
    multiple physical cubes on separate (or same)
    servers.
  • The divide-and-conquer approach of partitioned
    cubes helps to mitigate the scalability
    limitations of a MOLAP environment.
  • Ideal cube partitioning is completely invisible
    to end users.

29
ROLAP Implementations
  • Advances in database technologies and front-end
    tools have begun to allow deployment of OLAP
    using ANSI SQL RDBMS implementations.
  • ROLAP facilitates deployment of much larger
    dimension tables than MOLAP implementations.
  • Front-end tools to facilitate GUI access to
    multi-dimensional analysis capabilities.
  • Aggregate awareness allows exploitation of
    pre-built summary tables for some front-end
    tools.
  • Star schema designs are often used to facilitate
    OLAP against relational databases.

30
(No Transcript)
31
Simplified Third Normal Form (Retail)
32
Simplified Star Schema
33
Simplified Star Schema
  • A vastly simplified physical data model!
  • Collapse dimensional hierarchies into a single
    table for each dimension and create a single fact
    table from the header and detail records
  • Fewer tables.
  • Fewer joins to get results.

34
Star Schema for High Performance
  • Business question How many in raincoats did I
    sell in the first week of January through stores
    in Boston?
  • Assume
  • 4 Billion rows in fact table.
  • 20 different kinds (size, color, style) of
    raincoats (product category) out of 50,000 UPCs
    in store.
  • 8 stores out of 400 are in BOSTON SMSA.
  • 2 years of POS history in DBMS.

35
Star Schema for High Performance
  • Simple (poor performance) approach to query
    execution
  • 1. Join item table with filtering on raincoat
    product category (very selective) to fact table.
  • 2. Join date table with filtering by week (next
    most selective) to result table.
  • 3. Join store table with filtering on store to
    result table from step 2.
  • 4. Aggregate.

36
Star Schema for High Performance
  • Advanced (better performance) approach to query
    execution
  • 1. Cartesian product join between dimensional
    tables.
  • Result is 20 x 8 x 7 1,120 rows.
  • 2. Use composite index on itemstoreday into
    fact table for very selective access.
  • Access less than 0.00000008 percent of data in
    fact table!
  • Sophisticated cost-based optimizers will figure
    this out.

37
Forcing a Cartesian Product Join
  • Add an addition join_value column in each
    dimensional table.
  • Set join_value to same value in all rows of the
    dimensional tables.
  • Add additional where clause predicates joining on
    this column between dimensional tables.
  • NOTE This shouldn't be necessary with a smart
    optimizer.

38
Forcing a Cartesian Product Join
  • Sample code
  • select sum(sales.sales_amt)
  • from d_sales_detail
  • ,store
  • ,item
  • ,period
  • where d_sales_detail.store_id store.store_id
  • and d_sales_detail.item_id item.item_id
  • and d_sales_detail.day_dt period.day_dt
  • and period.day_dt between '23-NOV-2000' and
    '24-DEC-2000'
  • and item.trade_style_cd 'BARBIE'
  • and store.state_cd 'CA'
  • and store.join_value period.join_value
  • and store.join_value item.join_value
  • and period.join_value item.join_value

39
Star Schema for High Performance
  • Problem What if I want to know raincoat sales
    in first week of January regardless of store?
  • Answer Performance advantage of composite index
    in traditional RDBMS is severely impaired!
  • B-tree indexing techniques do not allow for
    flexibility in the use of dimensions for query
    purposes.
  • Bit indexing (and variations thereof) often
    allows much more generality in achieving high
    performance from a star schema.

40
Star Schema for High Performance
  • Bottom Line
  • It is not at all unusual to obtain an order of
    magnitude (or more) in performance advantage
    using a star schema with advanced indexing versus
    a more traditional relational database
    implementation.
  • Despite what vendors may tell you, star schemas
    cannot be effectively implemented for all DSS
    business applications and/or data models.

41
ROLAP
  • Relational OLAP often makes heavy use of summary
    tables to provide near instantaneous access for
    multi-dimensional queries.
  • Foundation is usually star schema or snowflake
    database design.
  • Allows OLAP with much larger data sets than
    multi-dimensional database (MDD) products using
    cube structures (MOLAP).

42
ROLAP
  • Number of summary tables can get very large if
    discipline is not enforced...
  • Assume a retail database with the following two
    dimensions on the fact table...
  • Calendar Day, Week, Period, Quarter, Year, All
    Days
  • Geography Store, Zone, District, Region, All
    Stores

43
ROLAP
Summary tables in a naive implementation require
all combinations of the dimensions at each
aggregation level...
30 summary tables! ... Add in item, SKU,
subcategory, category, and all items...now we are
up to 150 pre-aggregates!
44
ROLAP
  • Summary tables are more of a maintenance issue
    than a storage issue in most production
    implementations.
  • Notice that summary tables get much smaller as
    dimensions get less detailed (e.g., year vs.
    day).
  • Should plan for double the size of the
    unsummarized data for ROLAP summaries in most
    environments.
  • Every detail record that is received into
    warehouse must aggregate into EVERY summary
    table (assuming "to-date" summaries are
    maintained).

45
ROLAP
  • Warning Do not assume that dimensions are
    always simple hierarchies.
  • Example Items are not just category,
    subcategory, SKU, and atomic item.... what about
    trade styles or manufacturer?
  • Now we need summary tables along these lines as
    well...another 120 summary tables!
  • Calendar vs. accounting period vs. billing cycle
    can be even worse...

46
ROLAP
  • Many ROLAP products have devised ways to reduce
    the number of summary tables
  • Ability to build summaries on-the-fly as demanded
    by end-user applications.
  • Ability to aggregate efficiently from subset of
    the summary tables.
  • Tools exist in some products to assist in DBAs in
    selecting the "best aggregations to build.
  • HOLAP (Hybrid OLAP) tools allow co-existence of
    pre-built cubes alongside relational OLAP
    structures.

47
Intelligent Aggregation Selection
  • Maximum performance boost implies lots of disk
    for every pre-calculation.
  • Minimum performance boost implies no disk with
    zero pre-calculation.
  • Strategy is to use meta data to heuristically
    determine optimum set of aggregates from which
    all other aggregates can be derived.

48
Aggregate Wizards
49
Fact Table Aggregates
  • Enhance performance on common queries at coarser
    granularities.
  • Save space to permit storing more history than
    possible with finer granularities.
  • Take advantage of need to store other facts (with
    similar samples) at a particular granularity.

50
Aggregate Advice
  • Coarser granularity decreases potential
    cardinality, but usually increases density (e.g.,
    daily summary table is typically twice the size
    of weekly summary table - not seven times).
  • Strongly consider omitting candidate aggregates
    where expected cardinality is more than 10 that
    of next finer granularity stored.
  • Keep the detail for drill down, even if you
    deploy aggregates for performance.

51
Bottom Line
  • There are many implementation techniques for
    delivery of an OLAP environment.
  • Must fully consider the performance, scalability,
    complexity, and flexibility characteristics when
    deciding between MOLAP, ROLAP, and HOLAP.
  • Understand your tools and RDBMS!

52
MOLAP Vs. ROLAP
53
Implementation Issues
  • Data design and preparation
  • Administration and performance
  • OLAP platforms
  • OLAP tools and products

54
Data Design and Preparation
  • Characteristics of data
  • Stores and uses much less data compared to a DW
  • Data is summarized. You will rarely find data at
    the lowest levels of detail as in the DW
  • Data is more flexible for processing and analysis
    partly because there is much less data to work
    with
  • Every instance of the OLAP system in your
    environment is customized for the purpose that
    instance serves
  • OLAP data is generally customized
  • Types and levels
  • Static and dynamic summary data
  • Permanent and transient detailed data

55
Administration
  • Administering and managing OLAP systems should be
    handled with that of the DW environment
  • Some considerations
  • Expectations on what data would be accessed and
    how
  • Selection of the right business dimensions
  • Selection of the right filters in loading data
    from the DW
  • Choosing the aggregation, summarization, and
    precalculation
  • Size of the multidimensional database
  • Access and security privileges
  • Backup and restore facilities
  • Drill-through to the data warehouse
    drill-through to another OLAP instance

56
Performance
  • OLAP takes most of the queries that normalling
    would run against the DW
  • OLAP is designed for complex queries so it
    should enhance overall query performance of the
    DW environment
  • OLAP can precalculate and pre-aggregate data for
    quick response

57
OLAP Platforms
  • Usually, the data warehouse and OLAP systems
    reside on the same platform in the start. Later
    when the data warehouse becmes large and OLAP is
    a common task, OLAP system is moved to another
    platform
  • A separate platform is needed when
  • The size and usage of the DW consumes all
    resources
  • Many departmental users desire OLAP capabilities
  • The stability and performance of OLAP degrades
  • OLAP tools require a different platform
    configuration than the DW
Write a Comment
User Comments (0)
About PowerShow.com