Class 2 Notes - PowerPoint PPT Presentation

1 / 29
About This Presentation
Title:

Class 2 Notes

Description:

Medium and long-term decisions: capacity planning, store locations, new lines of business ... FROM Sales, Store, Time. WHERE Sales.StoreId = Store.StoreId. AND ... – PowerPoint PPT presentation

Number of Views:41
Avg rating:3.0/5.0
Slides: 30
Provided by: michae1481
Category:
Tags: class | notes

less

Transcript and Presenter's Notes

Title: Class 2 Notes


1
Class 2 Notes
  • Data Warehouse Technology and Management

2
Comparison of Processing Environments
  • Transaction processing
  • Uses operational databases
  • Short-term decisions fulfill orders, resolve
    complaints, provide staffing
  • Decision support processing
  • Uses integrated and summarized data
  • Medium and long-term decisions capacity
    planning, store locations, new lines of business

3
Data Warehouse Definition and Characteristics
  • A central repository for summarized and
    integrated data from operational databases and
    external data sources
  • Key Characteristics
  • Subject-oriented
  • Integrated
  • Time-variant
  • Nonvolatile

4
Data Comparison
5
Business Architectures and Applications
  • Data warehouse projects
  • Top-down architectures
  • Bottom-up architecture
  • Applications and data mining

6
Data Warehouse Projects
  • Large efforts with much coordination across
    departments
  • Enterprise data model
  • Important artifact of data warehouse project
  • Structure of data model
  • Meta data for data transformation
  • Top-down vs. bottom-up business architectures

7
Two Tier Architecture
8
Three Tier Architecture
9
Bottom-up Architecture
10
Applications
11
Data Cube Concepts and Operators
  • Basics
  • Dimension and measure details
  • Operators

12
Data Cube Basics
  • Multidimensional arrangement of data
  • Users think about decision support data as data
    cubes
  • Terminology
  • Dimension subject label for a row or column
  • Member value of dimension
  • Measure quantitative data stored in cells

13
Data Cube Example
14
Dimension and Measure Details
  • Dimensions
  • Hierarchies members can have sub members
  • Sparsity many cells do not have data
  • Measures
  • Derived measures
  • Multiple measures in cells

15
Slice Operator
  • Focus on a subset of dimensions
  • Set dimension to specific value 1/1/2003

16
Dice Operator
  • Focus on a subset of member values
  • Replace dimension with a subset of values
  • Dice operation often follows a slice operation

17
Other Operators
  • Operators for hierarchical dimensions
  • Drill-down add detail to a dimension
  • Roll-up remove detail from a dimension
  • Recalculate measure values
  • Pivot rearrange dimensions

18
Operator Summary
19
Relational Data Modeling
  • Large data warehouses use relational databases
    rather than storing data directly in cubes
  • Dimension table contains member values
  • Fact table contains measure values
  • 1-M relationships from dimension to fact tables
  • Grain most detailed measure values stored
  • Materialized view companion table to support
    fast retrieval

20
Star Schema Example
21
Materialized Views
  • Stored view
  • Periodically refreshed with source data
  • Usually contain summary data
  • Fast query response for summary data
  • Appropriate in query dominant environments

22
Materialized View Example
  • CREATE MATERIALIZED VIEW MV1
  • BUILD IMMEDIATE
  • REFRESH COMPLETE ON DEMAND
  • ENABLE QUERY REWRITE AS
  • SELECT StoreState, TimeYear,
  • SUM(SalesDollar) AS SUMDollar1
  • FROM Sales, Store, Time
  • WHERE Sales.StoreId Store.StoreId
  • AND Sales.TimeNo Time.TimeNo
  • AND TimeYear gt 2000
  • GROUP BY StoreState, TimeYear

23
Refresh Process Overview
24
Refreshing a Data Warehouse
  • Match decision support needs with reality of
    available data
  • Data sources
  • Internal and external
  • Availability constraints
  • Varying change rates
  • Complex process
  • Significant resources to manage refresh process

25
Data Source Classification
  • Cooperative
  • Logged
  • Queryable
  • Snapshot

26
Maintenance Workflow
27
Data Quality Problems
  • Multiple identifiers
  • Multiple field names
  • Different units
  • Missing values
  • Orphaned values
  • Multipurpose fields
  • Conflicting data
  • Different update times

28
ETL Tools
  • Extraction, Transformation, and Loading
  • Support software development and execution of the
    refresh process
  • Specification based
  • Eliminate custom coding
  • Third party and DBMS based tools

29
Determining the Refresh Frequency
  • Maximize net refresh benefit
  • Value of data timeliness
  • Cost of refresh
  • Satisfy data warehouse and source system
    constraints
Write a Comment
User Comments (0)
About PowerShow.com