OLAP - PowerPoint PPT Presentation

About This Presentation
Title:

OLAP

Description:

... the date of sale, the dealer who sold the car, the day, the time, and the price charged. ... dealer. car. date. Slicing and Dicing. Slice ... – PowerPoint PPT presentation

Number of Views:29
Avg rating:3.0/5.0
Slides: 30
Provided by: aaa23
Category:
Tags: olap

less

Transcript and Presenter's Notes

Title: OLAP


1
OLAP
2
Overview
  • Traditional database systems are tuned to many,
    small, simple queries.
  • Some new applications use fewer, more
    time-consuming, analytic queries.
  • New architectures have been developed to handle
    analytic queries efficiently.

3
The Data Warehouse
  • The most common form of data integration.
  • Copy sources into a single DB (warehouse) and try
    to keep it up-to-date.
  • Usual method periodic reconstruction of the
    warehouse, perhaps overnight.
  • Frequently essential for analytic queries.

4
OLTP
  • Most database operations involve On-Line
    Transaction Processing (OTLP).
  • Short, simple, frequent queries and/or
    modifications, each involving a small number of
    tuples.
  • Examples Answering queries from a Web interface,
    sales at cash registers, selling airline tickets.

5
OLAP
  • On-Line Analytical Processing (OLAP, or
    analytic) queries are, typically
  • Few, but complex queries --- may run for hours.
  • Queries do not depend on having an absolutely
    up-to-date database.

6
Common Architecture
  • Databases at store branches handle OLTP.
  • Local store databases copied to a central
    warehouse overnight.
  • Analysts use the warehouse for OLAP.

7
Star Schemas
  • A star schema is a common organization for data
    at a warehouse. It consists of
  • Fact table a very large accumulation of facts
    such as sales.
  • Often insert-only.
  • Dimension tables smaller, generally static
    information about the entities involved in the
    facts.

8
Example Star Schema
  • Suppose we want to record in a warehouse
    information about every car sale the serial
    number, the date of sale, the dealer who sold the
    car, the day, the time, and the price charged.
  • The fact table is a relation
  • Sales(serialNo, date, dealer, price)

9
Example -- Continued
  • The dimension tables include information about
    the autos, dealers, and days dimensions
  • Autos(serialNo, model, color)
  • Dealers(name, city, state)
  • Days(day, week, month, year)
  • (5, 27, 7, 2000)

Day dimension table probably not stored.
10
Visualization Star Schema
Dimension Table (Dealer)
Dimension Table (Autos)
Dimension Attrs.
Dependent Attrs.
Fact Table - Sales
Dimension Table (Day)
Dimension Table (etc.)
11
Dimensions and Dependent Attributes
  • Two classes of fact-table attributes
  • Dimension attributes the key of a dimension
    table.
  • Dependent attributes a value determined by the
    dimension attributes of the tuple.
  • price is the dependent attribute of our example
    Sales relation.
  • It is determined by the combination of dimension
    attributes.

12
Approaches to Building Warehouses
  1. ROLAP relational OLAP Tune a relational
    DBMS to support star schemas.
  2. MOLAP multidimensional OLAP Use a
    specialized DBMS with a model such as the data
    cube.

13
Data Cubes
  • Keys of dimension tables are the dimensions of a
    hypercube.
  • Example for the Sales data, the three dimensions
    are serialNo, date, and dealer.
  • Dependent attributes (e.g., price) appear at the
    points of the cube.

14
Visualization -- Data Cubes
car
price
dealer
date
15
Slicing and Dicing
  • Slice
  • focus on particular partitions along (one or
    more) dimension i.e., focusing on a particular
    slice of the cube
  • WHERE clause in SQL
  • Dice
  • partitions the cube into smaller subcubes and the
    points in each subcube are aggregated
  • GROUP BY clause in SQL

16
Slicing and Dicing in SQL
  • SELECT grouping-attributes and aggregations
  • FROM fact table joined with (zero or more)
    dimension tables
  • WHERE certain attributes are compared with
    constants / slicing /
  • GROUP BY grouping-attributes / dicing /

17
Slicing and Dicing Example
  • Suppose a particular car model, say Gobi, is
    not selling as well as anticipated. How to
    analyze?
  • Maybe its the color
  • Slice for Gobi. Dice for color.
  • SELECT color, SUM(price)
  • FROM Sales NATURAL JOIN Autos
  • WHERE model 'Gobi'
  • GROUP BY color
  • Doesnt show anything interesting.

18
Slicing and Dicing Example
  • Suppose the previous query doesn't tell us much
    each color produces about the same revenue.
  • Since the query does not dice for time, we only
    see the total over all time for each color.
  • We may thus issue a revised query that also
    partitions time by month.
  • SELECT color, month, SUM(price)
  • FROM (Sales NATURAL JOIN Autos) JOIN Days ON date
    day
  • WHERE model 'Gobi'
  • GROUP BY color, month

19
Slicing and Dicing Example
  • We might discover that red Gobis havnt sold well
    recently.
  • Does this problem exists at all dealers, or only
    some dealers have had low sales of red Gobis?
  • Lets dice on dealer dimension as well.
  • SELECT dealer, month, SUM(price)
  • FROM (Sales NATURAL JOIN Autos) JOIN Days ON date
    day
  • WHERE model 'Gobi' AND color 'red
  • GROUP BY month, dealer

20
Slicing and Dicing Example
  • At this point, we find that the sales per month
    for red Gobis are so small that we cannot observe
    any trends easily.
  • Thus, we decide that it was a mistake to dice by
    month. A better idea would be to partition only
    by years, and look at only the last two years
    (2006 and 2007).
  • SELECT dealer, year, SUM(price)
  • FROM (Sales NATURAL JOIN Autos) JOIN Days ON date
    day
  • WHERE model 'Gobi' AND color 'red' AND
  • (year 2001 OR year 2002)
  • GROUP BY year, dealer

21
Drill-Down and Roll-Up
  • Previous examples illustrate two common patterns
    in sequences of queries that slice-and-dice the
    data cube.
  • Drill-down is the process of partitioning more
    finely and/or focusing on specific values in
    certain dimensions.
  • Each of the example steps except the last is an
    instance of drill-down.
  • Roll-up is the process of partitioning more
    coarsely.
  • The last step, where we grouped by years instead
    of months to eliminate the effect of randomness
    in the data, is an example of roll-up.

22
Marginals
  • The data cube also includes aggregation
    (typically SUM) along the margins of the cube.
  • The marginals include aggregations over one
    dimension, two dimensions,

23
Visualization --- Data Cube w/Aggregation
car
SUM over all Days
price
dealer
date
24
Cube operator
  • CUBE(F) is an augmented table for fact table F
  • tuples (or points) added in CUBE(F)
  • have a value, denoted to each dimension
  • represents aggregation along that dimension

25
Example
  • Sales(serialNo, date, dealer, price)
  • Sales(model, color, date, dealer, val, cnt)

serialNo dimension not well suited for the cube
as summing the price over all dates, or over all
dealers, but keeping the serial number fixed has
no effect. We will replace the serial number by
model and color to which the serial number
connects. Also, we will have as independent
attributes, val for sum of prices and cnt for
number of cars sold.
26
Example CUBE(Sale)
  • ('Gobi', 'red', '2001-05-21', 'Friendly Fred',
    45000, 2)
  • On May 21, 2001, dealer Friendly Fred sold two
    red Gobis for a total of 45,000.
  • Hypothetical tuple that would be in both Sales
    and CUBE(Sales).
  • ('Gobi', , '2001-05-21', 'Friendly Fred',
    152000, 7)
  • On May 21, 2001, Friendly Fred sold seven Gobis
    of all colors, for a total price of 152,000.
  • Note that this tuple is in CUBE(Sales) but not in
    Sales.

27
Example CUBE(Sale)
  • ('Gobi', , '2001-05-21', , 2348000, 100)
  • On May 21, 2001, there were 100 Gobis sold by all
    the dealers, and the total price of those Gobis
    was 2,348,000.
  • ('Gobi', , , , 1339800000, 58000)
  • Over all time, dealers, and colors, 58,000 Gobis
    have been sold for a total price of
    1,339,800,000.
  • (, , , , 3521727000, 198000)
  • Total sales of all models in all colors, over all
    time at all dealers is 198,000 cars for a total
    price of 3,521,727,000.

28
CUBE Helps Answering Queries
  • SELECT color, AVG(price)
  • FROM Sales
  • WHERE model 'Gobi'
  • GROUP BY color
  • is answered by looking for all tuples of
    CUBE(Sales) with the form
  • ('Gobi', c, , , v, n)
  • where c is any specific color.
  • v and n will be the sum and number of sales of
    Gobis in that color.
  • The average price, is v/n.
  • The answer to the query is the set of (c v/n)
    pairs obtained from all
  • ('Gobi', c, , , v, n) tuples.

29
CUBE in SQL
  • SELECT model, color, date, dealer, SUM(val) AS v,
    SUM(cnt) AS n
  • FROM Sales
  • GROUP BY model, color, date, dealer
  • WITH CUBE
  • Suppose we materialize this into SalesCube.
  • Then the previous query is rewritten into
  • SELECT color, SUM(v)/SUM(n)
  • FROM SalesCube
  • WHERE model 'Gobi' AND
  • date IS NULL AND
  • dealer IS NULL
Write a Comment
User Comments (0)
About PowerShow.com