OLAP - PowerPoint PPT Presentation

1 / 22
About This Presentation
Title:

OLAP

Description:

... tables. current data only - current products, available flights ... Product class e.g. cheap. Time Dimension. Date e.g. 11 Dec 2002. Day of Week e.g. Wednesday ... – PowerPoint PPT presentation

Number of Views:71
Avg rating:3.0/5.0
Slides: 23
Provided by: wall163
Category:
Tags: olap

less

Transcript and Presenter's Notes

Title: OLAP


1
OLAP
  • On Line Analytic Processing

2
OLTP
  • On Line Transaction Processing
  • support for real-time processing of orders,
    bookings, sales
  • typically access to single rows of tables
  • current data only - current products, available
    flights
  • supports Operational level of organisational
    hierarchy

3
Analysis
  • Support for
  • tactical (e.g. stock ordering in view of forecast
    demand
  • strategic (e.g. where to open a new store)
  • Need to understand the data - Business
    Intelligence

4
Requirements
  • Operations
  • Summation
  • Statistical analysis - variation for standard
  • Ranking and percentages
  • Comparison over time - last year-to-date
  • Static data picture to avoid inconsistent
    reads during analysis production, and
    comparisons between analyses
  • Addition data - forecasting models, external
    data, structural data (location of tills, of
    products on shelves)

5
Analysis pre database
  • 50 of code used to create complex bulk reports
    on how the company was performing
  • inflexible, costly, untimely
  • raw data too large to retain, so aggregation
    required - restricts year-to year comparison to
    level of prior aggregation

6
OLAP/Data warehouse
  • Specialised tools, data structures to support
    analysis on line i.e. on demand
  • Data will be a snap-shot to ensure consistency
  • Hold base fact table - raw basic facts - credit
    card transaction, item sale, booking
  • multiple analytic dimensions -
  • time, product, customer, store

7
SuperMarket Basket data
  • Fact - a single line on a till receipt
  • basket no
  • date
  • customer no
  • product code
  • till no
  • quantity
  • price
  • Dimensions
  • Customer, Product, Time, Till/Store

8
Star Schema
9
Kinds of Attributes
  • Measures
  • continuously varying
  • interval,ratio scales
  • able to be summed, ranked etc
  • Dimensions
  • nominal scales - no ordering
  • may be classified in multiple ways
  • Date/Time
  • interval scale - i.e ordered only
  • can be treated as dimension and classified

10
Product dimension
  • Product e.g. size 9 Bata slippers
  • Product category - shoes
  • Product group - clothing
  • Size e.g. shoe9
  • Range e.g Bata
  • Product class e.g. cheap

11
Time Dimension
  • Date e.g. 11 Dec 2002
  • Day of Week e.g. Wednesday
  • Week e.g. 49
  • Month e.g. December
  • Qtr e.g.3
  • Year e.g. 2002
  • Promotion period e.g. Pre-Christmas Sales
  • Season e.g. Autumn

12
Snowflake schema
13
Query Processing
  • Consider a typical analytic query
  • How do sales of clothing vary by day of week in
    stores in the SW region?
  • Select product.name,dow.name,sum(qtyprice)
  • from sales, product, productCategory,
    productGroup,date,DOW
  • where ( the 5 join conditions)
  • and (productGroup.nameClothing)
  • group by DOW.name,product.name
  • order by DOW.name,product.name

14
Difficulties
  • Fact table is huge - must be compressed as much
    as possible by reducing field sizes etc
  • Since dimension data is smaller and more stable,
    OK to denormalise to reduce joins
  • date - add dow.id,month.id,year,fiscalYear,
  • Balance required
  • Denormalised dimensions result in Starflake
    schema

15
Aggregation
  • Simple case
  • sales - date,product,cust,store,value
  • How many aggregations are possible
  • Store(5) Product(10) Customer(30)

16
Aggregation operations
  • Cube - all possible aggregations
  • 8 for 3 dimensions
  • Roll-up - aggregate in order
  • e.g. Product,Store,Cust
  • 4 for 3 dimensions
  • Slice and Dice
  • Takes parts or slices of a cube of aggregations
  • Drill down
  • Given an aggregation, expand an aggregated
    dimension e.g. Expand clothing sales analysis by
    City

17
Data Mining
  • OLAP requires a priori assumptions about the
    categories of interest
  • But a useful category may be hidden - can it be
    discovered a posteriori ?
  • e.g. identify high risk motor insurance policies
    by attributes of policy - gender, age, type of
    vehicle, job, postcode
  • Rule induction (machine learning) methods can be
    used

18
  • Finding relationships
  • looking for deviations from normal behaviour -
    e.g. to identify fraudulent transactions in a
    credit card company
  • looking for deviations from average e.g.
    Non-random combinations of goods in a basket -
    classic example is beer and nappies
  • Requires heavy aggregations, statistical
    selection, rule induction

19
ETL
  • Extract - Transform - Load
  • OLAP databases are often said to be read only -
    but all need periodic updating with new data
    extracted from sources, validation, re-organising
    and load, whilst maintaining aggregations and
    index

20
  • Extract
  • new or changed data from the OLTP
  • changed product and structural data
  • external data
  • External data may be in legacy systems, remote
    databases, flat files

21
Transform
  • Filtering out bad transaction data
  • Validating against database

22
Load
  • Load new facts
  • Re-de-normalising if product dimensions change
  • Re-aggregation
Write a Comment
User Comments (0)
About PowerShow.com