Title: Data Warehouse Methodology
1Data Warehouse Methodology
2Objectives
- Describe an end to end data warehouse methodology
- Describe two steps in the methodology that are
very data warehouse specific - Requirements analysis
- Data source analysis
3Planning Stage
Business Needs
A Spiral Model
Architecture,Strategy
Vision, Goals
Increment 1 Increment 2 Increment 3
Design, Build, Test
Deployment
Training, Acceptance
4Planning Stage
- Prior to launching the project, we need to spend
some time (say, 4 12 weeks, depending on the
project scope) in order to establish staffing,
mission, priorities and preliminary architecture - Staffing
- We need to put together a team to execute the
planning phase and to execute the first iteration - Mission
- The project team must efine the role the data
warehouse will play in the organization (galactic
warehouse? Sales Warehouse? Manufacturing
Warehouse?)
5Planning Stage
- Priorities
- Perform preliminary (high level) requirements
analysis to establish requirement priorities
(what do we need to tackle first) and to
establish an increment plan. This plan will
change over time, but it important to establish
and publish a project roadmap to ensure all
stakeholders understand the direction of the
project. - Architecture
- Establish Preliminary Architecture based on
mission, requirements, existing infrastructure
and best practices. - We need to consider the three main aspects of
architecture - Data
- Technical, including software selection (RDBMS,
ETL, Metadata, Access) - Infrastructure (Hardware)
6Planning Stage
- As we build the data warehouse in increments, we
prove our architectural choices. We may need
to change over time. This is OK it is
impossible to make perfect choices we just
need to make sound decisions based on available
information.
7Methodology Per Development Increment
Business Needs
Delivery
Storage
Engineering
Acquisition
Access
- Define business requirements, with focus on
information requirements - Define dimensional data model for the data mart
facts and dimensions - Define data model for the DW
- Identify authoritative source of required data
from operational systems and analyze source data - Map source data, define transformation
requirements - Design build and test extract, transformation and
load mechanisms - Design and build end user application
- Demonstrate and refine user application via
prototyping techniques - Train and conduct user acceptance
8Step 1 Define Requirements
- Clearly Identify the business process to be
supported the warehouse - Map the major steps in the process
- Identify key points in the process where business
intelligence is required (there are many
different ways business intelligence might be
used). - Identify key individuals
9Step 1 Define Requirements
- Perform detailed requirements analysis
- Use fact finding techniques (review details from
text) - Interview
- JAD
- Review existing reports, documentation
- Observation
- Questionnaires
- Be prepared for JADs and interviews prepare the
users get questions to them in advance.
Structure your interviews and JADs. - Keep your commitments to users if you say you
will follow up, make sure you do so. - Focus of requirements analysis
- What information is required?
- How must information be presented?
10Step 1 Define Requirements
11Step 1 Define Requirements
Identify Opportunity Step Selected
Data Requirements
12Step 2 define dimensionaldata model
- The next step is to map the data requirements to
facts and dimensions - Facts things to be measured
- Dimensions the ways in which you want to
analyze measures - Example
- Data requirement Total purchases by product,
person and sales channel - Fact Order
- Dimensions Customer, Channel, Product
What would we add if we wanted above information
over last 3 months?
13Step 2 define dimensionaldata model
- Once the requirements are mapped, we can
consolidate and develop our dimensional data
model - This step will include definition of the grain
(level of detail) of data to be maintained in the
data mart. - Remember, we may not keep detailed data in the
data mart
14Example Fact
15Step 3 Define DW Data Model
- Store detailed data required to feed the data
mart - Data architecture may be normalized or
dimensional this is one of the architectural
choices to be made at the outset of the project.
16Step 4 Source Data Analysis
17Source Data Analysis
Business Drivers, Objectives, User Information
Requirements
Requirements Analysis
Source Data Analysis
Data Model Development
Conceptual
Gap Analysis
Design
Mapping
Physical
Construction
18Step 4 Source Data Analysis
- Ensure through understanding of relevant source
data - learn as early as possible issues,
gaps, etc. - Build relationship with source data steward
- Gain consensus with customer on data to be used
for Data Warehouse - Capture source metadata so it can be leveraged
- Employ a structured approach to ensure efficiency
and coverage
19Step 4 Source Data Analysis
Source Level Analysis
File/Table Level Analysis
- Field Level
- Analysis
- Definitions
- Quality Analysis
Tracking/Administration
Source Metadata
Keep/Drop Analysis
Mapping
20Step 4 Source Data Analysis
- System-Level Analysis
- Objective Gain an understanding of the source
system, and its context. - Gather macro level information
- Primary inputs and outputs (context of system)
- Business functions supported
- Major data entities
- Business events managed by the system,
21Step 4 Source Data Analysis
- Table-Level Analysis
- Objective capture table-level data required for
sizing and data transformation. - Information gathered
- Volume of data maintained in file
- File growth pattern/seasonality
- Unique identifier(s) (keys)
- Frequency of update
- Data retention period, archiving strategy
- Known integrity problems
- Technical information type of file, copybook
names, dataset names, etc.
22Step 4 Source Data Analysis
- Objective Gain understanding of relevant source
fields. - Two types of metadata captured
- Definitional
- business definitions not descriptions!!
- business rules
- transformation logic (summary fields, etc.)
- Known integrity issues
- Statistical
- Character fields
- unique values and frequency distributions
- counts
- populated
- Numeric fields
- ranges
- min/max/mean
- etc.
23Step 4 Source Data Analysis
- Defines which source data elements will be used
to populate the data warehouse and why. - Needs be collaborative process with customer
- Some elements retained for processing only may
not have business value. - Confirm value hierarchy for like source data
elements (determining best source for each data
element)
24Step 4 Source Data Analysis
Metadata Database
25Tools
- RDBMS for metadata repository
- MS Access
- Database
- User interface
- import/export of file layouts
- import of unique values and statistics
- reporting!!
- Postal software for name and address quality
assessment - ID Centric
- Group 1
26Tools
- Statistics
- Home grown
- SAS
- Other tools
- Vality - for complex and very dirty source data
27Step 4 Source Data Analysis
Obtain data samples for relevant tables
Identify individuals with source data knowledge,
schedule for interviews/workshops
Get layouts and documentation (electronic form!!)
Perform initial analysis, document
Perform statistical analysis, populate repository
Metrics (per source system) Elapsed time 1-6
weeks Effort 2-10 person weeks
Conduct analysis interviews/workshop, document,
follow-up
Conduct keep/drop session, document, follow-up
Done
28Step 4 Source Data Analysis
- Customer project manager
- identify and commit resources, day to day
decision making - Source experts/stewards (both business and
technical) - Warehouse user team
- Business systems analyst
- overall responsibility for analysis
- Data Architect/Analyst
- gain understanding of source data in order to
build target data model - ensure all required data is kept
- Ensure users understand ramifications of
keep/drop decisions