Data Warehouse and Design: Team 3A presentation

About This Presentation

Transcript and Presenter's Notes

Title: Data Warehouse and Design: Team 3A

1
Data Warehouse and DesignTeam 3A

Shane Brown
Angela Daniels
Charl Welman

2
Outline of Team 3A

Beginning with Operational Data p.82-87
Data/Process Models and the Architected
Environment p.89-101
The Data Model and Iterative Development p.102
The Data Warehouse and Data Models p.89-101
Normalization/ Denormalization p. 102- 112

3
Beginning with Operational Data

Some Integration Issues
Data that is not encoded consistently
Field transformation
Legacy Data existing under many different formats
on many different DBMSs

SAVINGS
DDA
LOANS
TRUST
Figure 3.2 Data across the different applications
is severely not integrated
4
Beginning with Operational Data (cont.)

Three Types of Loads that are made into the Data
Warehouse from the Operational Environment
Archival Data
Data currently contained in the OE
Ongoing changes to the data warehouse environment
from the changes (updates) that have occurred in
the operational environment since the last
refresh.

5
Beginning with Operational Data (cont.)

Five common Techniques of Scanning at Point of
Refresh
Scanning data that has been time-stamped in the
OE
Limiting the data to be scanned is to scan a
delta file.
Scan a log file or an audit file created as a
by-product of transaction processing.
For managing the amount of data scanned is to
modify application code
Rubbing a before and an after image of the
operational file together.

6
Data/Process Models and the Architected
Environment

A Process Model Consists of
Functional decomposition
Context-level zero diagram
Data flow diagram
Structure chart
State transition diagram
HIPO chart
Pseudocode

7
The Data Model and Iterative Development

Why an Iterative Development is Important
The industry track record of success strongly
suggests it
The end user is unable to articulate requirements
until the first iteration is done
Management will not make a full commitment until
actual results are tangible and obvious
Visible results must be seen quickly
Development efforts
The Data Warehouse serves as a roadmap for each
of the development efforts.
Developer is confident that (s)he will intersect
his or her effort with the first development
effort

8
The Data Warehouse And Data Models
9
Corporate Data Models

Corporate Data Model focuses on and represents
only primitive data
Performance factors are added into the corporate
data model as the model is transported to the
existing system environment
Figure 3.8 page 92

10
Data Warehouse and Data Models
11
Corporate Data Model Changes

Data that is used purely in the operational
environment is removed
Key structures are enhanced with time elements
Derived data is added to the corporate data model
Data relationships in the operational environment
are turned in to artifacts in the data warehouse

12
Corporate Data Model Changes

Stability Analysis involves
Grouping attributes of data together based on
their propensity for change
See Figure 3.9 on pager 91

13
Data Warehouse Data Modeling

Three levels of Data Modeling
High-Level Modeling (ERD)
Midlevel Modeling (Data item set or DIS)
Low-Level Modeling (Physical Model)

14
High-Level Modeling

Features entities and relationships
The entities that are shown in the ERD level are
the highest level of abstraction

Represents an entity or a major subject
Represents a 1n relationship
Represents an m n relationship
Represents a 1 1 relationship
15
Midlevel Data Model

There are four constructs that make up the
midlevel data model
Primary grouping exist once for each major
subject area
Secondary grouping holds data attributes that can
exist multiple times
A Connector relates data from one group to
another
Type of data is indicated by a line leading to
the right of a group

16
Physical Data Model

Created from the midlevel data model by including
keys and physical characteristics

Last design step is to factor in performance
characteristic
granularity and partitioning of the data
Physical input and output

17
NORMALIZATION/ DENORMALIZATION
18

Output of data model produces number of tables
with moderate data
Performance suffers
I/O issues when all programs use many I/O
Small tables becomes costly to operate.
Rational approach is to merge smaller tables to
reduce I/O and so cost.

19
What strategy?

Merging tables is one
Creating arrays of data is another
have normalized data residing in different
physical locations and retrieval requires a
physical I/O
if data were placed in a single row in an
array then one I/O would do to retrieve data

20
When to do Arrays

Stable number of occurrences
Data is accessed in sequence
Data is created and/or updated in a statistically
well-behaved sequence
Occurs regularly in data warehouse because of the
time-based orientation of the data
Creating arrays by month is frequent

21
Deliberate Redundancy

In fully normalized table access is usually done
through the base table
Table is large and access becomes expensive
If data is widely used and it is stable( no
worries about updates) then denormalize data

22
Separation/Calculation of Data

Different parts of data is accessed more
frequentlyseparate the data (bank balance)
Normalize in two separate tables allows for
compact storage
Derived data can reduce I/O needed-store the
calculated data, so calculated only once and its
always available without more calculations

23
Creative Indexes

Called a creative profile
Occurs as operational data is moved to data
warehouse, requires little overhead to create
Does a profile on items of interest like largest
purchases, latest shipments
If management anticipates interest a creative
index should be built

24
Referential Integrity

Appears as artifacts of relationships in data
warehouse environment
can be managed independently, are efficient
to access, do not require update
Dont replicate referential integrity in data
warehouse environment

25
Snapshots

Each type of data warehouse centers around a
structure of data called a snapshot
Basic components
a key
a unit of time
primary data related to key only
secondary data captured as part of snapshot

26
Snapshots continued

Created as result of some event occurring
Two types of triggers
discrete activitysome business occurrence
time event-end of day, week, or month

27
Shanes Question

Discuss some integration issues and give
examples.

28
Angelas Questions

What are the four constructs that make up the
midlevel data model? Explain

29
Charls Question

When do you use arrays to reduce I/O?

Write a Comment

User Comments (0)

About PowerShow.com

Data Warehouse and Design: Team 3A PowerPoint PPT Presentation