Chapter 5B: The Data Warehouse - PowerPoint PPT Presentation

1 / 35
About This Presentation
Title:

Chapter 5B: The Data Warehouse

Description:

... its data in a constrained fashion; the multidimensional DBMS allows unfettered access ... Directories. Libraries. Capturing and Managing Contextual ... – PowerPoint PPT presentation

Number of Views:58
Avg rating:3.0/5.0
Slides: 36
Provided by: tamy7
Category:

less

Transcript and Presenter's Notes

Title: Chapter 5B: The Data Warehouse


1
Chapter 5B The Data Warehouse Technology
  • Erin Bader
  • David Hancock
  • Tamya Stallings

2
Multidimensional DBMS and the Data Warehouse
3
Multidimensional Database Management Systems
Processing
  • Provide information system with the structure
    that allows an organization to have very flexible
    access to data
  • To slice dice data any number of ways
  • To dynamically explore the relationship between
    summary and detail data
  • Offer flexibility and control to the end user
  • Fit well in a DSS environment

4
Differences between Multidimensional DBMS data
warehouse
  • Data warehouse holds massive amounts of data the
    multidimensional DBMS holds at least an order of
    magnitude less data
  • The data warehouse is geared for a limited amount
    of flexible access the multidimensional DBMS is
    geared for very heavy and unpredictable access
    and analysis of data
  • The data warehouse contains data with a very
    lengthy time horizon from 5-10 years the
    multidimensional DBMS holds a much shorter time
    horizon of data

5
Differences between Multidimensional DBMS data
warehouse
  • The data warehouse allows analysts to access its
    data in a constrained fashion the
    multidimensional DBMS allows unfettered access
  • Instead of the data warehouse being housed in a
    multidimensional DBMS, the multidimensional DBMS
    and the data warehouse enjoy a complementary
    relationship

6
One Interesting feature of Data Warehouse MDBMS
  • Data warehouse can provide a basis for very
    detailed data that is normally not found in the
    multidimensional DBMS
  • Data warehouse very fine degree of detail
  • MDBMS lightly summarized and houses all but the
    most detailed level of data

7
Relational foundation for multidimensional DBMS
data marts
  • Strengths
  • Can support a lot of data
  • Can support dynamic joining of data
  • Has proven technology
  • Is capable of supporting general-purpose update
    processing
  • Is there is no known pattern of usage of data,
    then the relational structure is as good as any
    other

8
Relational foundation for multidimensional DBMS
data marts
  • Weaknesses
  • Has performance that is less than optimal
  • Cannot be purely optimized for access processing

9
Cube Foundation for MDBMS data marts
  • Strengths
  • Performance that is optimal for DSS processing
  • Can be optimized for very fast access of data
  • If pattern of access of data is known, then the
    structure the data can be optimized
  • Can easily be sliced and diced
  • Can be examined in many ways

10
Cube Foundation for MDBMS data marts
  • Weaknesses
  • Cannot handle nearly as much data as a standard
    relational format
  • Does not support general-purpose update
    processing
  • May take a long time to load
  • If access is desired on a path not supported by
    the design of the data, the structure is not
    flexible
  • Questionable support for dynamic joins of data

11
Flow of data from legacy system
12
Flow of data from current level detail
13
Unmanaged Spider Web
14
Tamyas Question
  • What are the differences between the
    multidimensional DBMS and the data warehouse?

15
Data Warehousing across Multiple Storage Media
  • Occurs when large amounts of data are present
  • 2 processing environments
  • DASD
  • Online/interactive processing
  • Tape or Mass Storage

16
Data Warehousing across Multiple Storage Media
cont.
  • Logically, the two environments combine to form a
    single data warehouse
  • Physically, the two are very different
  • The underlying technologies are not the same.
  • Mixing technologies in the data warehouse
    environment is normal and natural when done this
    way.

17
Data Warehousing across Multiple Storage Media
cont.
  • Another way technology can be split, but is not
    normal nor natural
  • The DASD portion is split over more than one
    technology
  • One part resides on one vendors technology
  • The other part resides on another vendors
    database technology

18
Caution
  • When the split is deliberate and part of a larger
    distributed data warehouse, the split is fine.
  • If the split occurs for political or historical
    reasons, it is not advised

19
Meta Data in the Data Warehouse Environment
  • The role of meta data in the data warehouse
    environment is very different from the role of
    meta data in the operational environment.
  • In the operational environment
  • Meta data is treated as documentation
  • Meta data is optional to the IT professional

20
Meta Data in the Data Warehouse Environment cont.
  • In the Data Warehouse environment
  • Meta data is the first thing the DSS analyst
    looks at in planning how to perform
    informational/analytical processing
  • Meta data is much more important here
  • Meta data is mandatory to the DSS analyst

21
Meta Data in the Data Warehouse Environment cont.
  • Data warehouse meta data is also used to manage
    the mapping between the operational environment
    and the data warehouse environment. Without the
    mapping, controlling the interface would be
    extremely difficult.

22
From the operational to data warehouse
environments
  • Conversion
  • Filtering
  • Summarization
  • Structural Changes
  • All these activities are documented in the meta
    data
  • This allows for the drill down process to occur
  • Must keep track of the changing structure over
    time

23
Context and Content
  • Although current information such as account
    balance or inventory on hand is important,
    keeping track of the information and allowing it
    to be viewed over a period of time can offer new
    insight

24
Contextual Information
  • The contents of data by themselves are quite
    inexplicable, but when context is added, together
    they become enlightening

25
Types of Contextual Information
  • Simple Contextual Information
  • Complex Contextual Information
  • External Contextual Information

26
Simple Contextual Information
  • Relates to the basic structure of data itself and
    was managed in the past by dictionaries,
    directories, etc
  • It includes
  • The structure of data
  • The encoding of data
  • The naming conventions used for data
  • The metrics describing data
  • How much data is there
  • How fast the data is growing
  • What sectors of data are growing
  • How the data is being used

27
Complex Contextual Information
  • Describes same data as simple contextual
    information, but from a different perspective
  • Addresses aspects such as
  • Product Definitions
  • Pricing
  • Packaging
  • It is some of the most useful, but elusive
    information there is to capture
  • It is elusive because it is taken for granted and
    in the background

28
External Contextual Information
  • Information outside the corporation that plays an
    important role in understanding information over
    time.
  • Examples include
  • Inflation
  • Financial trends
  • Taxation
  • Economic growth
  • External contextual information says nothing
    about the company but everything about the
    universe in which the company operates

29
Erins Question
  • What are the three types of contextual
    information and explain each one?

30
Capturing and Managing Contextual Information
  • Difficulties with Contextual Information
  • Contextual information changes to quickly
  • Difficult to put into the system due to the fast
    changes
  • Complex and external types are hard to capture
    because they are so unstructured

31
Capturing and Managing Contextual Information
  • Past types of managing contextual information
  • Dictionaries
  • Repositories
  • Directories
  • Libraries

32
Capturing and Managing Contextual Information
  • Problems with past types
  • Information management was focused on the
    developer and not the end user
  • Attempts were passive
  • Management was only limited to simple contextual
    information

33
Testing
  • The testing environment is where programmers test
    the new programs and change to existing programs

34
Testing
  • The test environment is different for every data
    warehouse
  • Reasons are
  • The data warehouse is to large and is hard to
    justify just one, much less two
  • Programs are not run in a repetitive manner

35
Davids Question
  • Why did past attempts at managing contextual
    information fail?
Write a Comment
User Comments (0)
About PowerShow.com