Data Warehouse: - PowerPoint PPT Presentation

1 / 53
About This Presentation
Title:

Data Warehouse:

Description:

Distinguish between data warehouses and data marts ... Proliferates unmanageable interfaces. 8/31/09. Data Warehousing. 28. A Practical Approach ... – PowerPoint PPT presentation

Number of Views:182
Avg rating:3.0/5.0
Slides: 54
Provided by: jongso
Category:

less

Transcript and Presenter's Notes

Title: Data Warehouse:


1
? 2 ?
  • ??? ????? ?? ??
  • Data Warehouse
  • The Building Blocks

2
Chapter Objectives
  • Review formal definition of a data warehouse
  • Discuss the defining features
  • Distinguish between data warehouses and data
    marts
  • Study each component or building block that makes
    up a data warehouse
  • Introduce metadata and highlight its significance

3
Data Warehouse
  • Information delivery system
  • Integrate and transform enterprise data into
    information
  • suitable for strategic decision making
  • Take all the historic data from the various
    operational systems
  • Combine this internal data with any relevant data
    from outside sources
  • Pull them together

4
Set up information delivery system
  • Need different components or building blocks
  • Arranged together in the most optimal way
  • Arranged in a suitable architecture

5
Bill Inmons Definition of DW
  • The father of Data Warehouse
  • A Data Warehouse is a subject oriented,
    integrated, nonvolatile, and time variant
    collection of data in support of managements
    decisions.

6
Sean Kelly
  • Another leading data warehouse practitioner
  • The data in the data warehouse is
  • ???? Separate
  • ?????? Available
  • ???? Integrated
  • ??? ???? Time stamped
  • ?? ??? Subject oriented
  • ????? Nonvolatile
  • ????? Accessible

7
DEFINING FEATURES
  • What about the nature of the data in the data
    warehouse?
  • How is this data different from the data in any
    operational system?
  • Why does it have to be different?
  • How is the data content in the data warehouse
    used?

8
DEFINING FEATURES
  • Some of Key Defining Features of the Data
    Warehouse
  • Subject-Oriented
  • Integrated Data
  • Time-Variant Data
  • Nonvolatile Data
  • ??? ????? Data Granularity

9
Subject-Oriented Data
  • Data is stored by subjects, not by applications
  • The subjects are critical for the enterprise
  • Sales, shipments and inventory for a
    manufacturing company
  • Figure 2-1
  • There is no application flavor
  • The data in a data warehouse cut across
    applications

10
(No Transcript)
11
Integrated Data
  • Need to pull together all the relevant data from
    the various systems
  • Data from internal operational systems
  • Data from outside sources
  • Before the data can be stored in a DW,
  • Remove the inconsistencies
  • Standardize the various data elements
  • Go through a process of transformation,
    consolidation, and integration of the source data

12
(No Transcript)
13
Standardization
  • Some of the items that would need
    standardization
  • Naming Conventions
  • Codes
  • Data attributes
  • Measurements

14
Time-Variant Data
  • For an operational system,
  • the stored data contains the current values
  • The data in the data warehouse is meant for
    analysis and decision making.
  • The use needs data not only about the current
    purchase, but on the past purchases.
  • A data warehouse has to contain historical data,
    not just current values.

15
Time-variant nature
  • The time-variant nature of the data
  • Allows for analysis of the past
  • Relates information to the present
  • Enables forecasts for the future

16
Nonvolatile Date
  • The data in the data warehouse is not intended to
    run the day-to-day business.
  • You do not update the data warehouse every time
    you process a single order.
  • Data from the operational systems are moved into
    the data warehouse at specific intervals.
  • Figure 2-3, not update

17
Data Granularity
  • The analysis begins at a high level and moves
    down to lower levels of detail
  • Start by looking at summary data
  • Look at the breakdown
  • Data granularity in a data warehouse refers to
    the level of detail
  • The lower the level of detail, the finer the data
    granularity
  • The lowest level of detail ? a lot of data in the
    data warehouse

18
(No Transcript)
19
DATA WAREHOUSES AND DATA MARTS
  • In 1998, Bill Inmon stated,
  • The single most important issue facing the IT
    manager this year is whether to build the data
    warehouse first or the data mart first.

20
DATA WAREHOUSES AND DATA MARTS
  • Before deciding to build a data warehouse, you
    need to ask
  • Top-down or bottom-up approach?
  • Enterprise-wide or department?
  • Which first data warehouse or data mart?
  • Build pilot or go with a full-fledged
    implementation?
  • Dependent or independent data marts?

21
(No Transcript)
22
How are They Different?
  • Figure 2-5
  • Two different basic approaches
  • Overall data warehouse feeding dependent data
    marts
  • Several departmental or local data marts
    combining into a data warehouse

23
(No Transcript)
24
Top-Down Approach Advantages
  • A truly corporate effort, an enterprise view of
    data
  • Inherently architected not a union of
    disparate data marts
  • Single, central storage of data about the content
  • Centralized rules and control
  • May see quick results if implemented with
    iterations

25
Top-Down Approach Disadvantages
  • Takes longer to build even with an iterative
    method
  • High exposure/risk to failure
  • Needs high level of cross-functional skills
  • High outlay without proof of concept

26
Bottom-Up Approach Advantages
  • Faster and easier implementation of manageable
    pieces
  • Favorable return on investment and proof of
    concept
  • Less risk of failure
  • Inherently incremental can schedule important
    data marts first
  • Allows project team to learn and grow

27
Bottom-Up Approach Disadvantages
  • Each data mart has its own narrow view of data
  • Permeates redundant data in every data mart
  • Perpetuates inconsistent and irreconcilable data
  • Proliferates unmanageable interfaces

28
A Practical Approachby Ralph Kimball
  • Plan and define requirements at the overall
    corporate level
  • Create a surrounding architecture for a complete
    warehouse
  • Conform and standardize the data content
  • Implement the data warehouse as a series of
    supermarts, one at a time
  • Supermarts are carefully architected data marts

29
An Enterprise Data Warehouse
  • A data mart is a logical subset of the complete
    data warehouse
  • A data warehouse is a conformed union of all data
    marts
  • Individual data marts are targeted to particular
    business groups
  • The collection of all the data marts form an
    integrated whole, called the enterprise data
    warehouse

30
OVERVIEW OF THE COMPONENTS
  • Architecture is the proper arrangement of the
    components
  • Build a data warehouse with software and hardware
    components
  • Arrange the building blocks for maximum benefit
  • May lay special emphasis on one component

31
Basic Components of a typical warehouse
  • Figure 2-6 building blocks or components
  • Source Data Component
  • Data Staging Component
  • Data Storage Component
  • Store and manage the data, keep track of the data
    by means of the metadata repository
  • Information Delivery Component
  • Metadata Component
  • Management and Control Component

32
(No Transcript)
33
Source Data Component
  • Production Data
  • Internal Data
  • Archived Data
  • External Data

34
Production Data
  • Data from the various operational systems
  • on different hardware platforms
  • by different database systems and operating
    systems
  • from many vertical applications
  • No conformance of data among the various
    operational systems
  • The significant and disturbing characteristic of
    production data is disparity
  • Standardize, transform, convert, and integrate
    the disparate data

35
Internal Data
  • Data from users private spreadsheets,
    documents, customer profiles, and sometimes even
    departmental database
  • Add additional complexity to the process of
    transforming and integrating the data
  • Determine strategies for collecting data from
    spreadsheets
  • Find ways of taking data from textual documents
  • Tie into departmental databases to gather
    pertinent data from these sources

36
Archived Data
  • Periodically take the old data and store it in
    archived files in an operational system
  • Many different methods of archiving
  • A separate archival database
  • Flat files on disk storage
  • Tape cartridges or microfilm and even off-site
  • A data warehouse keeps historical snapshots of
    data
  • Look into your archived data sets
  • Useful for discerning patterns and analyzing
    trends

37
External Data
  • Data from external sources for information that
    most executives use
  • Statistics relating to their industry produced by
    external agencies
  • Market share data of competitors
  • Standard values of financial indicators for their
    business
  • To spot industry trends and compare performance
    against other organizations
  • Usually, data from outside sources do not conform
    to your formats

38
Data Staging Component
  • Three major functions need to be performed for
    getting the data ready
  • extract the data
  • transform the data
  • and then load the data into the data warehouse
    storage
  • ETT
  • ??(Extraction)
  • ??(Transformation)
  • ??(Transportation)

39
Data Staging
  • Provide a place and an area with a set of
    functions to clean, change, combine, convert,
    deduplicate, and prepare source data for storage
    and use in the data warehouse

40
Data Extraction
  • Deal with numerous data sources
  • Tools for data extraction
  • Purchasing outside tools
  • Developing in-house programs
  • Extract the source data into
  • a group of flat files,
  • or a data-staging relational database,
  • or a combination of both

41
Data Transformation
  • Perform a number of individual tasks
  • Clean
  • Standardization
  • Combine
  • Purging and separating out
  • Sorting and merging
  • Assignment of surrogate keys
  • Results a collection of integrated data that is
    cleaned, standardized, and summarized

42
Data Loading
  • Two distinct groups of tasks
  • The initial loading of the data into the data
    warehouse
  • Refresh cycles
  • Extract the changes to the source data
  • Transform the data revisions
  • And feed the incremental data revisions on an
    ongoing basis
  • Figure 2-7

43
(No Transcript)
44
Data Storage Component
  • A separate repository
  • To keep large volume of historical data for
    analysis
  • To keep the data in structures suitable for
    analysis
  • The data warehouses are read-only data
    repositories
  • The data is stable and it represents snapshots at
    specified periods
  • The database in a data warehouse must be open
  • Must be open to different tools
  • RDBMSs or MDDBs

45
Information Delivery Component
  • Who are the users?
  • The novices, the casual users, the business
    analysts, and the power users
  • Different methods of information delivery
  • Ad hoc reports, complex queries, multidimensional
    analysis, statistical analysis, EIS feed,
    data-mining applications
  • Information delivery mechanism
  • Online, internet, intranet, e-mail

46
(No Transcript)
47
Metadata Component
  • The data about the data in the data warehouse
  • Similar to a data dictionary, but much more than
    a data dictionary
  • (Later, in a separate section)

48
Management and Control Component
  • Sit on top of all the other components
  • Coordinate the services and activities
  • Control the data transformation and the data
    transfer into the data warehouse storage
  • Moderate the information delivery to the users
  • Monitor the movements of data into the staging
    area and from there into the data warehouse
    storage
  • The metadata is the source of information for the
    management module

49
METADATA IN THE DATA WAREHOUSE
  • The Yellow Pages
  • A directory with data about the institutions
  • Types of Metadata
  • Operational Metadata
  • Extraction and Transformation Metadata
  • End-user Metadata

50
Operational Metadata
  • Contain all of next information about the
    operational data sources
  • Data for the data warehouse comes from several
    operational systems
  • The data elements have various field lengths and
    data types
  • You split records, combine parts of records from
    different source files, and deal with multiple
    coding schemes and field lengths

51
Extraction and Transformation Metadata
  • Contain data about the extraction of data from
    the source systems
  • the extraction frequencies
  • extraction methods,
  • and business rules for the data extractions

52
End-User Metadata
  • The navigational map
  • Enable the end-users to find information
  • Allow the end-users to use their own business
    terminology

53
Special Significance of Metadat
  • Act as the glue that connects all parts of the
    data warehouse
  • Provide information about the contents and
    structures to the developers
  • Open the door to the end-users and make the
    contents recognizable in their own terms
Write a Comment
User Comments (0)
About PowerShow.com