Data Warehouse Methodology - PowerPoint PPT Presentation

1 / 28
About This Presentation
Title:

Data Warehouse Methodology

Description:

Training, Acceptance. Deployment. Vision, Goals. Increment 1 Increment 2 Increment 3 ... JAD. Review existing reports, documentation. Observation. Questionnaires ... – PowerPoint PPT presentation

Number of Views:53
Avg rating:3.0/5.0
Slides: 29
Provided by: timoel4
Category:

less

Transcript and Presenter's Notes

Title: Data Warehouse Methodology


1
Data Warehouse Methodology
  • Part 2

2
Objectives
  • Describe an end to end data warehouse methodology
  • Describe two steps in the methodology that are
    very data warehouse specific
  • Requirements analysis
  • Data source analysis

3
Planning Stage
Business Needs
A Spiral Model
Architecture,Strategy
Vision, Goals
Increment 1 Increment 2 Increment 3
Design, Build, Test
Deployment
Training, Acceptance
4
Planning Stage
  • Prior to launching the project, we need to spend
    some time (say, 4 12 weeks, depending on the
    project scope) in order to establish staffing,
    mission, priorities and preliminary architecture
  • Staffing
  • We need to put together a team to execute the
    planning phase and to execute the first iteration
  • Mission
  • The project team must efine the role the data
    warehouse will play in the organization (galactic
    warehouse? Sales Warehouse? Manufacturing
    Warehouse?)

5
Planning Stage
  • Priorities
  • Perform preliminary (high level) requirements
    analysis to establish requirement priorities
    (what do we need to tackle first) and to
    establish an increment plan. This plan will
    change over time, but it important to establish
    and publish a project roadmap to ensure all
    stakeholders understand the direction of the
    project.
  • Architecture
  • Establish Preliminary Architecture based on
    mission, requirements, existing infrastructure
    and best practices.
  • We need to consider the three main aspects of
    architecture
  • Data
  • Technical, including software selection (RDBMS,
    ETL, Metadata, Access)
  • Infrastructure (Hardware)

6
Planning Stage
  • As we build the data warehouse in increments, we
    prove our architectural choices. We may need
    to change over time. This is OK it is
    impossible to make perfect choices we just
    need to make sound decisions based on available
    information.

7
Methodology Per Development Increment
Business Needs
Delivery
Storage
Engineering
Acquisition
Access
  • Define business requirements, with focus on
    information requirements
  • Define dimensional data model for the data mart
    facts and dimensions
  • Define data model for the DW
  • Identify authoritative source of required data
    from operational systems and analyze source data
  • Map source data, define transformation
    requirements
  • Design build and test extract, transformation and
    load mechanisms
  • Design and build end user application
  • Demonstrate and refine user application via
    prototyping techniques
  • Train and conduct user acceptance

8
Step 1 Define Requirements
  • Clearly Identify the business process to be
    supported the warehouse
  • Map the major steps in the process
  • Identify key points in the process where business
    intelligence is required (there are many
    different ways business intelligence might be
    used).
  • Identify key individuals

9
Step 1 Define Requirements
  • Perform detailed requirements analysis
  • Use fact finding techniques (review details from
    text)
  • Interview
  • JAD
  • Review existing reports, documentation
  • Observation
  • Questionnaires
  • Be prepared for JADs and interviews prepare the
    users get questions to them in advance.
    Structure your interviews and JADs.
  • Keep your commitments to users if you say you
    will follow up, make sure you do so.
  • Focus of requirements analysis
  • What information is required?
  • How must information be presented?

10
Step 1 Define Requirements
11
Step 1 Define Requirements
Identify Opportunity Step Selected
Data Requirements
12
Step 2 define dimensionaldata model
  • The next step is to map the data requirements to
    facts and dimensions
  • Facts things to be measured
  • Dimensions the ways in which you want to
    analyze measures
  • Example
  • Data requirement Total purchases by product,
    person and sales channel
  • Fact Order
  • Dimensions Customer, Channel, Product

What would we add if we wanted above information
over last 3 months?
13
Step 2 define dimensionaldata model
  • Once the requirements are mapped, we can
    consolidate and develop our dimensional data
    model
  • This step will include definition of the grain
    (level of detail) of data to be maintained in the
    data mart.
  • Remember, we may not keep detailed data in the
    data mart

14
Example Fact
15
Step 3 Define DW Data Model
  • Store detailed data required to feed the data
    mart
  • Data architecture may be normalized or
    dimensional this is one of the architectural
    choices to be made at the outset of the project.

16
Step 4 Source Data Analysis
17
Source Data Analysis
Business Drivers, Objectives, User Information
Requirements
Requirements Analysis
Source Data Analysis
Data Model Development
Conceptual
Gap Analysis
Design
Mapping
Physical
Construction
18
Step 4 Source Data Analysis
  • Ensure through understanding of relevant source
    data - learn as early as possible issues,
    gaps, etc.
  • Build relationship with source data steward
  • Gain consensus with customer on data to be used
    for Data Warehouse
  • Capture source metadata so it can be leveraged
  • Employ a structured approach to ensure efficiency
    and coverage

19
Step 4 Source Data Analysis
Source Level Analysis
File/Table Level Analysis
  • Field Level
  • Analysis
  • Definitions
  • Quality Analysis

Tracking/Administration
Source Metadata
Keep/Drop Analysis
Mapping
20
Step 4 Source Data Analysis
  • System-Level Analysis
  • Objective Gain an understanding of the source
    system, and its context.
  • Gather macro level information
  • Primary inputs and outputs (context of system)
  • Business functions supported
  • Major data entities
  • Business events managed by the system,

21
Step 4 Source Data Analysis
  • Table-Level Analysis
  • Objective capture table-level data required for
    sizing and data transformation.
  • Information gathered
  • Volume of data maintained in file
  • File growth pattern/seasonality
  • Unique identifier(s) (keys)
  • Frequency of update
  • Data retention period, archiving strategy
  • Known integrity problems
  • Technical information type of file, copybook
    names, dataset names, etc.

22
Step 4 Source Data Analysis
  • Objective Gain understanding of relevant source
    fields.
  • Two types of metadata captured
  • Definitional
  • business definitions not descriptions!!
  • business rules
  • transformation logic (summary fields, etc.)
  • Known integrity issues
  • Statistical
  • Character fields
  • unique values and frequency distributions
  • counts
  • populated
  • Numeric fields
  • ranges
  • min/max/mean
  • etc.

23
Step 4 Source Data Analysis
  • Defines which source data elements will be used
    to populate the data warehouse and why.
  • Needs be collaborative process with customer
  • Some elements retained for processing only may
    not have business value.
  • Confirm value hierarchy for like source data
    elements (determining best source for each data
    element)

24
Step 4 Source Data Analysis
Metadata Database
25
Tools
  • RDBMS for metadata repository
  • MS Access
  • Database
  • User interface
  • import/export of file layouts
  • import of unique values and statistics
  • reporting!!
  • Postal software for name and address quality
    assessment
  • ID Centric
  • Group 1

26
Tools
  • Statistics
  • Home grown
  • SAS
  • Other tools
  • Vality - for complex and very dirty source data

27
Step 4 Source Data Analysis
Obtain data samples for relevant tables
Identify individuals with source data knowledge,
schedule for interviews/workshops
Get layouts and documentation (electronic form!!)
Perform initial analysis, document
Perform statistical analysis, populate repository
Metrics (per source system) Elapsed time 1-6
weeks Effort 2-10 person weeks
Conduct analysis interviews/workshop, document,
follow-up
Conduct keep/drop session, document, follow-up
Done
28
Step 4 Source Data Analysis
  • Customer project manager
  • identify and commit resources, day to day
    decision making
  • Source experts/stewards (both business and
    technical)
  • Warehouse user team
  • Business systems analyst
  • overall responsibility for analysis
  • Data Architect/Analyst
  • gain understanding of source data in order to
    build target data model
  • ensure all required data is kept
  • Ensure users understand ramifications of
    keep/drop decisions
Write a Comment
User Comments (0)
About PowerShow.com