Designing the data warehouse / data marts - PowerPoint PPT Presentation

1 / 45
About This Presentation
Title:

Designing the data warehouse / data marts

Description:

Designing the data warehouse / data marts Methodologies and Techniques Life cycle of the DW Oracle Warehouse Components Oracle Intelligence Tools Oracle Data Mart ... – PowerPoint PPT presentation

Number of Views:222
Avg rating:3.0/5.0
Slides: 46
Provided by: fad1
Category:

less

Transcript and Presenter's Notes

Title: Designing the data warehouse / data marts


1
Designing the data warehouse/ data marts
  • Methodologies and Techniques

2
Basic principles
3
Life cycle of the DW
First time load
Operational Databases
Refresh
Refresh
Purge or Archive
Refresh
4
Oracle Warehouse Components
Any Data
Any Access
Any Source
Relational tools
Relational /Multidimensional
Operational data
OLAP tools
Text, image
Spatial
Audio,video
External data
Web
Applications/ Web
5
Oracle Intelligence Tools
IS develops users Views
Business users
Analysts
Current
Tactical
Strategic
Oracle Reports
Oracle Discoverer
Oracle Express
6
Oracle Data Mart Suite
Data Modeling Oracle Data Mart Designer
OLTP Databases
Data Mart Database
OLTP Engines
Ware- housing Engines
Oracle8
SQLPLUS
Data Access Analysis Discoverer Oracle Reports
Data Management Oracle Enterprise Manager
Data Extraction Oracle Data Mart Builder
7
Big Bang ApproachAdvantages and Disadvantages
  • Advantages
  • warehouse built as part of major project (eg
    BPR)
  • Having a big picture of the data warehouse
    before starting the data warehousing project
  • Disadvantages
  • Involves a high risk, takes a longer time
  • Runs the risk of needing to change requirements
  • Costly and harder to get support for from users

8
Incremental Approach to Warehouse Development
  • Multiple iterations
  • Shorter implementations
  • Validation of each phase

9
Benefits of an Incremental Approach
  • Delivers a strategic data warehouse solution
    through incremental development efforts
  • Provides extensible, scalable architecture
  • Quickly provides business benefits and ensures a
    much earlier return of investment
  • Allows a data warehouse to be built based on a
    subject or application area at a time
  • Allows the construction of an integrated data
    mart environment

10
Data Mart
  • A subset of a data warehouse that supports the
    requirements of a particular department or
    business function.
  • Characteristics include
  • Do not normally contain detailed operational data
    unlike data warehouses.
  • May contain certain levels of aggregation

11
Dependent Data Mart
Flat Files
Operational Systems
Sales
Data Warehouse
Finance
Data Marts
External Data
12
Independent Data Mart
Operational Systems
Flat Files
Sales or Marketing
External Data
13
Reasons for Creating a Data Mart
  • To give users more flexible access to the data
    they need to analyse most often.
  • To provide data in a form that matches the
    collective view of a group of users
  • To improve end-user response time.
  • Potential users of a data mart are clearly
    defined and can be targeted for support

14
Reasons for Creating a Data Mart
  • To provide appropriately structured data as
    dictated by the requirements of the end-user
    access tools.
  • Building a data mart is simpler compared with
    establishing a corporate data warehouse.
  • The cost of implementing data marts is far less
    than that required to establish a data warehouse.

15
Data Marts Issues
  • Data mart functionality
  • Data mart size
  • Data mart load performance
  • Users access to data in multiple data marts
  • Data mart Internet / Intranet access
  • Data mart administration
  • Data mart installation

16
Example of DW tool OLAP
  • Rotate and drill down to successive levels of
    detail.
  • Create and examine calculated data interactively
    on large volumes of data.
  • Determine comparative or relative differences.
  • Perform exception and trend analysis.
  • Perform advanced analytical functions for example
    forecasting, modeling, and regression analysis

17
Original OLAP Rules
  • 1. Multidimensional conceptual view
  • 2. Transparency
  • 3. Accessibility
  • 4. Consistent reporting performance
  • 5. Client-server architecture

18
Original OLAP Rules
  • 6. Multiuser support
  • 7. Unrestricted cross-dimensional operations
  • 8. Intuitive data manipulation
  • 9. Flexible reporting
  • 10. Unlimited dimensions and aggregation levels

19
Relational Database Model
Attribute 1Name
Attribute 2Age
Attribute 3Gender
Attribute 4Emp No.
1001 1007 1010 1020
31 42 22 32
F M M F
Anderson Green Lee Ramos
Row 1 Row 2 Row 3 Row 4
The table above illustrates the employee relation.
20
Multidimensional Database Model
Store
Customer
Store
Time
Time
FINANCE
SALES
Product
GL_Line
  • The data is found at the intersection of
    dimensions.

21
Two dimensions
22
Three dimensions
23
Specialised Multidimensional tool
  • Benefits
  • Quick access to very large volumes of data
  • Extensive and comprehensive libraries of complex
    functions
  • analysis
  • Strong modeling and forecasting capabilities
  • Can access multidimensional and relational
    database structures
  • Caters for calculated fields
  • Disadvantages
  • Difficulty of changing model
  • Lack of support for very large volumes of data
  • May require significant processing power

24
MOLAP Server
  • The application layer stores data in a
    multidimensional structure
  • The presentation layer provides the
    multidimensional view

DSS client
MOLAP Engine
  • Efficient storage and processing
  • Complexity hidden from the user
  • Analysis using preaggregated summaries and
    precalculated measures

Application layer
Warehouse
25
ROLAP Server
  • The warehouse stores atomic data.
  • The application layer generates SQL for the
    three- dimensional view.
  • The presentation layer provides the
    multidimensional view.

DSS client
ROLAP engine
Application layer
Multiple SQL
Warehouseserver
26
MOLAP
MDDB
Query
Periodic load
Data
Express Server
Express user
Warehouse
27
ROLAP
Cache
Live fetch
Query
Data cache
Data
Express user
Express Server
Warehouse
Also Hybrid (HOLAP)
28
Choosing a Reporting Architecture
  • Business needs
  • Potential for growth
  • interface
  • enterprise architecture
  • Network architecture
  • Speed of access
  • Openness

Good
Query Performance
OK
Simple
Complex
Analysis
29
Data Acquisition
  • Identify, extract, transform, and transport
    source data
  • Consider internal and external data
  • Perform gap analysis between source data and
    target database objects
  • Plan move of data between sources and target
  • Define first-time load and refresh strategy
  • Define tool requirements
  • Build, test, and execute data acquisition modules

30
Modeling
  • Warehouses differ from operational structures
  • Analytical requirements
  • Subject orientation
  • Data must map to subject oriented information
  • Identify business subjects
  • Define relationships between subjects
  • Name the attributes of each subject
  • Modeling is iterative
  • Modeling tools are available

31
Modeling the Data Warehouse
1
  • Defining the business model
  • Creating the dimensional model
  • Modeling summaries
  • 4. Creating the physical model

2, 3
4
32
Identifying Business Rules
Product Type Monitor Status PC 15
inch New Server 17 inch Rebuilt 19
inch Custom None
Location Geographic proximity 0 - 1 miles 1 - 5
miles gt 5 miles
Store Store gt District gt Region
Time Month gt Quarter gt Year
33
Creating the Dimensional Model
  • Identify fact tables
  • Translate business measures into fact tables
  • Analyze source system information for additional
    measures
  • Identify base and derived measures
  • Document additivity of measures
  • Identify dimension tables
  • Link fact tables to the dimension tables
  • Create views for users


34
Dimension Tables
  • Dimension tables have the following
    characteristics
  • Contain textual information that represents the
    attributes of the business
  • Contain relatively static data
  • Are joined to a fact table through a foreign key
    reference

35
Fact Tables
  • Fact tables have the following characteristics
  • Contain numeric measures (metrics) of the
    business
  • May contain summarized (aggregated) data
  • May contain date-stamped data
  • Are typically additive
  • Have key value that is typically a concatenated
    key composed of the primary keys of the
    dimensions
  • Joined to dimension tables through foreign keys
    that reference primary keys in the dimension
    tables

36
Dimensional Model (Star Schema)
Fact table
Dimension tables
37
Star Schema Model
Product Table Product_id Product_desc
Store Table Store_id District_id ...
Sales Fact Table Product_id Store_id Item_id Day_i
d Sales_dollars Sales_units ...
  • Central fact table
  • Radiating dimensions
  • Denormalized model

Time Table Day_id Month_id Period_id Year_id
Item Table Item_id Item_desc ...
38
Star Schema Model
  • Easy for users to understand
  • Fast response to queries
  • Simple metadata
  • Supported by many front end tools
  • Less robust to change
  • Slower to build
  • Does not support history




39
Snowflake Schema Model
Product Table Product_id Product_desc
Store Table Store_id Store_desc District_id
District Table District_id District_desc
Sales Fact Table Item_id Store_id Sales_dollars Sa
les_units
Time Table Week_id Period_id Year_id
Dept Table Dept_id Dept_desc Mgr_id
Mgr Table Dept_id Mgr_id Mgr_name
Item Table Item_id Item_desc Dept_id
40
Snowflake Schema Model
  • Direct use by some tools
  • More flexible to change
  • Provides for speedier data loading
  • May become large and unmanageable
  • Degrades query performance
  • More complex metadata

41
Using Summary Data
Phase 3 Modeling summaries
  • Provides fast access to precomputed data
  • Reduces use of I/O, CPU, and memory
  • Is distilled from source systems and
    precalculated summaries
  • Usually exists in summary fact tables

42
Designing Summary Tables
  • Average
  • Maximum
  • Total
  • Percentage

Units Sales() Store
Product A Total Product B Total Product C Total
43
Summary Tables Example
SALES FACTS Sales Region Month 10,000 North Jan
99 12,000 South Feb 99 11,000 North Jan
99 15,000 West Mar 99 18,000 South Feb
99 20,000 North Jan 99 10,000 East Jan
99 2,000 West Mar 99
SALES BY MONTH/REGION Month Region Tot_Sales Jan
99 North 41,000 Jan 99 East 10,000 Feb
99 South 40,000 Mar 99 West 17,000
SALES BY MONTH Month Tot_Sales Jan 99 51,000 Feb
99 40,000 Mar 99 17,000
44
Summary Management in Oracle8i
Salessummary
Sales
Region
State
City
Time
Product
45
The Time Dimension
  • Time is critical to the data warehouse.
  • A consistent representation of time is required
    for extensibility.

Timedimension
Sales fact
How and where should it be stored?
Write a Comment
User Comments (0)
About PowerShow.com