Data Warehouse and Design: Team 3A - PowerPoint PPT Presentation

1 / 29
About This Presentation
Title:

Data Warehouse and Design: Team 3A

Description:

Beginning with Operational Data p.82-87. Data/Process Models and the Architected ... Rubbing a 'before' and an 'after' image of the operational file together. ... – PowerPoint PPT presentation

Number of Views:32
Avg rating:3.0/5.0
Slides: 30
Provided by: Cha148
Category:

less

Transcript and Presenter's Notes

Title: Data Warehouse and Design: Team 3A


1
Data Warehouse and DesignTeam 3A
  • Shane Brown
  • Angela Daniels
  • Charl Welman

2
Outline of Team 3A
  • Beginning with Operational Data p.82-87
  • Data/Process Models and the Architected
    Environment p.89-101
  • The Data Model and Iterative Development p.102
  • The Data Warehouse and Data Models p.89-101
  • Normalization/ Denormalization p. 102- 112

3
Beginning with Operational Data
  • Some Integration Issues
  • Data that is not encoded consistently
  • Field transformation
  • Legacy Data existing under many different formats
    on many different DBMSs

SAVINGS
DDA
LOANS
TRUST
Figure 3.2 Data across the different applications
is severely not integrated
4
Beginning with Operational Data (cont.)
  • Three Types of Loads that are made into the Data
    Warehouse from the Operational Environment
  • Archival Data
  • Data currently contained in the OE
  • Ongoing changes to the data warehouse environment
    from the changes (updates) that have occurred in
    the operational environment since the last
    refresh.

5
Beginning with Operational Data (cont.)
  • Five common Techniques of Scanning at Point of
    Refresh
  • Scanning data that has been time-stamped in the
    OE
  • Limiting the data to be scanned is to scan a
    delta file.
  • Scan a log file or an audit file created as a
    by-product of transaction processing.
  • For managing the amount of data scanned is to
    modify application code
  • Rubbing a before and an after image of the
    operational file together.

6
Data/Process Models and the Architected
Environment
  • A Process Model Consists of
  • Functional decomposition
  • Context-level zero diagram
  • Data flow diagram
  • Structure chart
  • State transition diagram
  • HIPO chart
  • Pseudocode

7
The Data Model and Iterative Development
  • Why an Iterative Development is Important
  • The industry track record of success strongly
    suggests it
  • The end user is unable to articulate requirements
    until the first iteration is done
  • Management will not make a full commitment until
    actual results are tangible and obvious
  • Visible results must be seen quickly
  • Development efforts
  • The Data Warehouse serves as a roadmap for each
    of the development efforts.
  • Developer is confident that (s)he will intersect
    his or her effort with the first development
    effort

8
The Data Warehouse And Data Models
9
Corporate Data Models
  • Corporate Data Model focuses on and represents
    only primitive data
  • Performance factors are added into the corporate
    data model as the model is transported to the
    existing system environment
  • Figure 3.8 page 92

10
Data Warehouse and Data Models
11
Corporate Data Model Changes
  • Data that is used purely in the operational
    environment is removed
  • Key structures are enhanced with time elements
  • Derived data is added to the corporate data model
  • Data relationships in the operational environment
    are turned in to artifacts in the data warehouse

12
Corporate Data Model Changes
  • Stability Analysis involves
  • Grouping attributes of data together based on
    their propensity for change
  • See Figure 3.9 on pager 91

13
Data Warehouse Data Modeling
  • Three levels of Data Modeling
  • High-Level Modeling (ERD)
  • Midlevel Modeling (Data item set or DIS)
  • Low-Level Modeling (Physical Model)

14
High-Level Modeling
  • Features entities and relationships
  • The entities that are shown in the ERD level are
    the highest level of abstraction

Represents an entity or a major subject
Represents a 1n relationship
Represents an m n relationship
Represents a 1 1 relationship
15
Midlevel Data Model
  • There are four constructs that make up the
    midlevel data model
  • Primary grouping exist once for each major
    subject area
  • Secondary grouping holds data attributes that can
    exist multiple times
  • A Connector relates data from one group to
    another
  • Type of data is indicated by a line leading to
    the right of a group

16
Physical Data Model
  • Created from the midlevel data model by including
    keys and physical characteristics
  • Last design step is to factor in performance
    characteristic
  • granularity and partitioning of the data
  • Physical input and output

17
NORMALIZATION/ DENORMALIZATION
18
  • Output of data model produces number of tables
    with moderate data
  • Performance suffers
  • I/O issues when all programs use many I/O
  • Small tables becomes costly to operate.
  • Rational approach is to merge smaller tables to
    reduce I/O and so cost.

19
What strategy?
  • Merging tables is one
  • Creating arrays of data is another
  • have normalized data residing in different
    physical locations and retrieval requires a
    physical I/O
  • if data were placed in a single row in an
    array then one I/O would do to retrieve data

20
When to do Arrays
  • Stable number of occurrences
  • Data is accessed in sequence
  • Data is created and/or updated in a statistically
    well-behaved sequence
  • Occurs regularly in data warehouse because of the
    time-based orientation of the data
  • Creating arrays by month is frequent

21
Deliberate Redundancy
  • In fully normalized table access is usually done
    through the base table
  • Table is large and access becomes expensive
  • If data is widely used and it is stable( no
    worries about updates) then denormalize data

22
Separation/Calculation of Data
  • Different parts of data is accessed more
    frequentlyseparate the data (bank balance)
  • Normalize in two separate tables allows for
    compact storage
  • Derived data can reduce I/O needed-store the
    calculated data, so calculated only once and its
    always available without more calculations

23
Creative Indexes
  • Called a creative profile
  • Occurs as operational data is moved to data
    warehouse, requires little overhead to create
  • Does a profile on items of interest like largest
    purchases, latest shipments
  • If management anticipates interest a creative
    index should be built

24
Referential Integrity
  • Appears as artifacts of relationships in data
    warehouse environment
  • can be managed independently, are efficient
    to access, do not require update
  • Dont replicate referential integrity in data
    warehouse environment

25
Snapshots
  • Each type of data warehouse centers around a
    structure of data called a snapshot
  • Basic components
  • a key
  • a unit of time
  • primary data related to key only
  • secondary data captured as part of snapshot

26
Snapshots continued
  • Created as result of some event occurring
  • Two types of triggers
  • discrete activitysome business occurrence
  • time event-end of day, week, or month

27
Shanes Question
  • Discuss some integration issues and give
    examples.

28
Angelas Questions
  • What are the four constructs that make up the
    midlevel data model? Explain

29
Charls Question
  • When do you use arrays to reduce I/O?
Write a Comment
User Comments (0)
About PowerShow.com