An%20Introduction%20to%20Data%20Warehousing - PowerPoint PPT Presentation

About This Presentation
Title:

An%20Introduction%20to%20Data%20Warehousing

Description:

Is this medicine available in stock. What are the tests this patient has completed so far ... Are nothing but more responsive real time data warehouses ... – PowerPoint PPT presentation

Number of Views:67
Avg rating:3.0/5.0
Slides: 51
Provided by: piyush9
Category:

less

Transcript and Presenter's Notes

Title: An%20Introduction%20to%20Data%20Warehousing


1
An Introduction to Data Warehousing
1
2
Business Intelligence
  • Now,if the Estimates made before a Battle
    indicate Victory,it is because careful
    calculations show that your conditions are more
    favorable than those of your enemyif they
    indicate defeat ,it is because careful
    calculations show that the favorable conditions
    for a Battle are fewer.With more careful
    calculations one can win with less one cannot.
    How much chance of Victory has one who makes no
    calculations at all !!
  • --- Sun Tzu ,
    The Art of War
  • Business these days are ,war minus shooting.

    -Anonymous

3
Course Roadmap
  • Introduction to Datawarehousing
  • Difference between Operational System and
    DataWarehouse
  • Emergence of Decision Support Systems
  • DataWarehouse Theoretical Architecture
  • DataWarehouse Technical Architecture
  • DataWarehouse Bus Architecture
  • Data Modelling concepts
  • E-R Modelling for OLTP System
  • Dimensional Modelling for a Datawarehouse
  • Scheme generation for Datawarehouse
  • Star Scheme Design
  • Snowflake Scheme Design
  • Key aspects in designing the Dimensional Model
  • Granularity with respect to the Fact Table in
    the Schemas
  • Conformed Facts,Dimensions

4
Course Roadmap
  • Fact less Fact Tables,Aggregate Fact Tables
  • Out Trigger Entities in the Schemas
  • Types of Relationships to be maintained between
    Facts
  • and Dimensions
  • Dependencies while generating Physical Scheme
    for
  • a DataWarehouse
  • Case Study of design of DataWarehouse for an
    existing
  • ERmodel

5
Objectives
  • At the end of this session, you will know
  • What is Data Warehousing
  • The evolution of Data Warehousing
  • Need for Data Warehousing
  • OLTP Vs Warehouse Applications
  • Data marts Vs Data Warehouses
  • Operational Data Stores
  • Overview of Warehouse Architecture

6
Objectives
  • At the end of this lesson, you will know
  • Data Warehouse Architectures
  • Components of Data Warehousing Architecture
  • An overview of each of the components
  • Considerations for Data Warehouse Design
  • Common mistakes in Warehouse designs
  • An overview of Warehouse on the web

7
  • What is a DataWarehouse ?

8
What is a Data Warehouse ?
  • A data warehouse is a subject-oriented,
    integrated, nonvolatile, time-variant collection
    of data in support of management's decisions.
  • - WH Inmon

WH Inmon - Regarded As Father Of Data Warehousing
9
Subject-Oriented- Characteristics of a Data
Warehouse
Data Warehouse
Operational
Prospects
Leads
Customers
Products
Time
Quotes
Regions
Orders
Focus is on Subject Areas rather than Applications
10
Integrated - Characteristics of a Data Warehouse
Appl A - m,f Appl B - 1,0 Appl C - male,female

m,f
Appl A - balance dec fixed (13,2) Appl B -
balance pic 9(9)V99 Appl C - balance pic S9(7)V99
comp-3
balance dec fixed (13,2)
Appl A - bal-on-hand Appl B - current-balance Appl
C - cash-on-hand
Current balance
Appl A - date (julian) Appl B - date
(yymmdd) Appl C - date (absolute)
date (julian)
Integrated View Is The Essence Of A Data Warehouse
11
Non-volatile - Characteristics of a Data Warehouse

insert
change
insert
delete
load
read only access
replace
change
Data Warehouse Is Relatively Static In Nature
12
Time Variant - Characteristics of a Data Warehouse

Data Warehouse
Operational
  • Current Value data
  • time horizon 60-90 days
  • Snapshot data
  • time horizon 5-10 years
  • data warehouse stores historical data

Data Warehouse Typically Spans Across Time
13
Alternate Definitions
  • A collection of integrated, subject oriented
    databases designed to support the DSS function,
    where each unit of data is relevant to some
    moment of time
  • - Imhoff

14
Alternate Definitions
  • Data Warehouse is a repository of data summarized
    or aggregated in simplified form from
    operational systems. End user orientated data
    access and reporting tools let user get at the
    data for decision support - Babcock

15
Evolution of Data Warehousing
1960 - 1985 MIS Era
  • Unfriendly
  • Slow
  • Dependent on IS programmers
  • Inflexible
  • Analysis limited to defined reports

Focus on Reporting
16
Evolution of Data Warehousing
1985 - 1990 Querying Era
Queries that are formulated by the user on the
spur of the moment
  • Adhoc, unstructured access to corporate data
  • SQL as interface not scalable
  • Cannot handle complex analysis

Focus on Online Querying
17
Evolution of Data Warehousing
1990 - 20xx Analysis Era
  • Trend Analysis
  • What If ?
  • Cross Dimensional Comparisons
  • Statistical profiles
  • Automated pattern and rule discovery

Focus on Online Analysis
18
Need for Data Warehousing
  • Better business intelligence for end-users
  • Reduction in time to locate, access, and analyze
    information
  • Consolidation of disparate information sources
  • Strategic advantage over competitors
  • Faster time-to-market for products and services
  • Replacement of older, less-responsive decision
    support systems
  • Reduction in demand on IS to generate reports

19
Business Queries
Typical Business Queries
  • Which product generated maximum revenue over last
    two quarters in a chosen geographical region,
    city wise, relative to the previous version of
    product, compared with the plan
  • What percent of customer procures product A with
    B in a chosen region, broken down by city,
    season, and income group

20
OLTP Systems Vs Data Warehouse
Remember Between OLTP and Data Warehouse systems
users are different
data content is different,
data structures are different
hardware is different
Understanding The Differences Is The Key
21
OLTP Vs Warehouse
22
OLTP Vs Warehouse
23
OLTP Vs Warehouse
24
Capacity Planning
Processing Power
Time of day
Processing Load Peaks During the Beginning and
End of Day
25
Examples Of Some Applications
  • Target Marketing
  • Market Segmentation
  • Budgeting
  • Credit Rating Agencies
  • Financial Reporting and Consolidation
  • Market Basket Analysis - POS Analysis
  • Fraud Management
  • Profitability Management
  • Event tracking

26
Do we need a separate database ?
  • OLTP and data warehousing require two very
    differently configured systems
  • Isolation of Production System from Business
    Intelligence System
  • Significant and highly variable resource demands
    of the data warehouse
  • Cost of disk space no longer a concern
  • Production systems not designed for query
    processing

27
Data Marts
  • Enterprise wide data warehousing projects have a
    very large cycle time
  • Getting consensus between multiple parties may
    also be difficult
  • Departments may not be satisfied with priority
    accorded to them
  • Sometimes individual departmental needs may be
    strong enough to warrant a local implementation
  • Application/database distribution is also an
    important factor

28
Data Marts
  • Subject or Application Oriented Business View of
    Warehouse
  • Finance, Manufacturing, Sales etc.
  • Smaller amount of data used for Analytic
    Processing
  • Address a single business process

A Logical Subset of The Complete Data Warehouse
29
Data Warehouse and Data Mart
30
Data Warehouse and Data Mart
31
Warehouse or Mart First ?
32
Different kinds of Information Needs
  • Current
  • Recent
  • Historical

Is this medicine available in stock What are
the tests this patient has completed so far Has
the incidence of Tuberculosis increased in last 5
years in Southern region
OLTP
ODS
Data Warehouse
33
Operational Data Store - Definition
Can I see credit report from Accounts, Sales from
marketing and open order report from order entry
for this customer
Data from multiple sources is integrated for a
subject
  • A subject oriented, integrated,
  • volatile, current valued data store containing
    only corporate
  • detailed data

Data stored only for current period. Old Data is
either archived or moved to Data Warehouse
Identical queries may give different results at
different times. Supports analysis requiring
current data
34
Operational Data Store
  • Increasingly becoming integrated with the data
    warehouse
  • Are nothing but more responsive real time data
    warehouses
  • Data Mining has anyway forced Data Warehouses to
    store transactional level data

35
OLTP Vs ODS Vs DWH
36
OLTP Vs ODS Vs DWH
37
OLTP Vs ODS Vs DWH
38
Typical Data Warehouse Architecture
Multi-tiered Data Warehouse without ODS
39
Typical Data Warehouse Architecture
Multi-tiered Data Warehouse with ODS
40
Benefits of DWH
These capabilities empower the corporate...
  • To formulate effective business, marketing
  • and sales strategies.
  • To precisely target promotional activity.
  • To discover and penetrate new markets.
  • To successfully compete in the marketplace
  • from a position of informed strength.
  • To build predictive rather than retrospective
    models.

41
Warehouse Architecture - 1
EIS /DSS
Metadata
Query Tools
Select Extract Transform Integrate Maintain
Data Warehouse
OLAP/ROLAP
Web Browsers
Operational Systems/Data
Middleware/ API
Data Mining
Data Preparation
Enterprise Data Warehouse
42
Warehouse Architecture - 2
Single Department Data Mart
43
Warehouse Architecture - 3
Data Marts
EIS /DSS
Metadata
Query Tools
Select Extract Transform Integrate Maintain
Data Warehouse
OLAP/ROLAP
Web Browsers
Operational Systems/Data
Middleware/ API
Operational Data Store
Data Mining
Data Preparation
Multi-tiered Data Warehouse
44
Data Warehouse Architectures
  • There are three schools of thought about DW
    architectures
  • One supports Dimensional Modeling all through
    (Ralph Kimball)
  • Second supports ER for Data Warehouse and Star
    Schemas for Data Marts
  • Third supports ER model for DW (NCR)

45
Kimballs View
Multiple Data Marts With Conformed Dimensions
46
Inmons View
Data Warehouse (ER) Feeding Multiple Data Marts
(Star Schema)
47
Components of a Data Warehouse Architecture
  • Source Databases
  • Data extraction/transformation/load (ETL) tool
  • Data warehouse maintenance and administration
    tools
  • Data modeling tool or interface to external data
    models
  • Warehouse databases
  • End-user data access and analysis tools

48
Components of a Data Warehouse Architecture
Data Warehouse Is Not Just About Data... But
Tools Too
49
Source Databases - Characteristics
  • Legacy, relational, text or external sources
  • Designed for high-speed transaction processing
  • Real-time, current, volatile data
  • Fast response for larger numbers of concurrent
    users
  • Many short transactions
  • Update-intensive modifications by row
  • Inquiry-oriented access by keys
  • High integrity, security, recoverability
  • Source data is often inconsistent and poorly
    modeled

50
Data Cleaning Tools
  • To clean data at the source
  • Clean up source data in-place on the host
  • Business rule discovery tools which analyse the
    source data and write cleaning rules based on
    lexical analysis and AI techniques
  • Poorly integrated with data warehousing tools
  • ETL tools have limited yet adequate data
    cleansing functionality
Write a Comment
User Comments (0)
About PowerShow.com