Ron%20Forino - PowerPoint PPT Presentation

About This Presentation
Title:

Ron%20Forino

Description:

DMR Consulting Group ... retail databases will cost American consumers as much as $2.5 ... DMR Consulting Group Inc. 6. Tactics and The End Game 'We need ... – PowerPoint PPT presentation

Number of Views:73
Avg rating:3.0/5.0
Slides: 46
Provided by: ronfo
Learn more at: https://dama-ncr.org
Category:

less

Transcript and Presenter's Notes

Title: Ron%20Forino


1
Project Driven Data Quality Improvement
Ron Forino DAMA - Washington, DC September 1999
2
Examples
  • According to DM Review, one European company
    discovered through an audit that it was not
    invoicing 4 of its orders. With 2 billion in
    revenues, that meant 80 million went unpaid.
  • Electronic data audits show that the invalid data
    values in the typical customer database average
    around 15 - 20. Physical audits suggest that
    this number may be closer to 25 - 30.
  • In 1992, 96,000 IRS tax refund checks were
    returned undeliverable due to incorrect
    addresses.
  • This year, incorrect price data in retail
    databases will cost American consumers as much as
    2.5 billion in overcharges.
  • According to organizations like the Data
    Warehouse Institute, the Gartner Group and
    MetaGroup - Data Quality is one of the top 1-3
    success factors to Data Warehousing.
  • The average mid-sized company may have 30,000 -
    50,000 fields in files, tables, screens, reports,
    etc. Platinum Technology

3
Agenda
  • Definitions
  • What is Data Quality?
  • Tactics and the End Game
  • Building Blocks to Data Quality
  • Tactical Initiatives
  • Strategic Initiatives
  • Tactical Data Quality
  • Rule Disclosure
  • Data Quality Measurement, Analysis and
    Certification
  • Meta Data Creation
  • Validation
  • Quality Improvement

4
Definitions
5
Definitions
  • Data Transformation - Changing data values to a
    format consistent with integrity and business
    rules agreed to by data stakeholders.
  • Data Cleansing - Consolidation of redundant
    customer records. Term used to describe the
    process of merging and purging of customer
    lists in an effort to reduce duplicate or
    inaccurate customer records.
  • Data Quality Improvement - The process of
    improving data quality to the level desired to
    support the enterprise information demand.
  • Data Quality - definition to follow.

6
Data Quality Improvement Decision Tree
Task Process
Transform
Conform to Business Rule
Data Reengineering
Data Quality Improvement
Process Reengineering
Standardize Validate Match Dedupe Integrate Enrich
Match Dedupe
Data Cleansing
7
Tactics and The End Game
We need better data quality...
Enterprise Initiative
Select Project
Data Quality Assessment
Report Recommendations
Source System Clean-up Initiative
8
Tactics and The End Game
We need better data quality...
Enterprise Initiative
Data Warehouse
Select Project
Data Quality Assessment
Data Quality Assessment
Staging Specifications
Report
Report Recommendations
Source System Clean-up Initiative
Source System Clean-up Initiative
9
What is Good Data Quality?
10
How Can We Know Good Data Quality?
Is this Good Data Quality?
  • Column 1
  • 321453
  • 212392
  • 093255
  • 214421
  • .
  • .
  • .

What can we conclude?
11
What is Data Quality?
  • Information Quality f(Definition Data
    Presentation)
  • Definition
  • Defines Data
  • Domain Value Specification
  • Business Rules that Govern the Data
  • Information Architecture Quality
  • Data Content
  • Completeness
  • Validity/Reasonability
  • Data Presentation
  • Accessible
  • Timely
  • Non-ambiguous

12
Common Data Quality Problems
  • Data Content
  • Missing Data
  • Invalid Data
  • Data Outside Legal Domain
  • Illogical Combinations of Data
  • Structural
  • Record Key Integrity
  • Referential Integrity
  • Cardinality Integrity
  • Migration/Integration
  • Rationalization Anomalies
  • Duplicate or Lost Entities
  • Definitions and Standards
  • Ambiguous Business Rules
  • Multiple Formats for Same Data Elements
  • Different Meanings for the Same Code Value
  • Multiple Codes Values with the Same Meaning
  • Field Used for Unintended Data
  • Data in Filler
  • Y2K Violation

13
Building Blocks to Data Quality
14
Building Blocks of a Data Quality Program
Benefits Realization
Strategic
Defect Prevention
DQ Requirements
Quality Reengineering
Enterprise Cultural Shift
QC/Process Auditing
Data Stewardship
Tactical
Meta Data Creation
Quality Improvement
Validation
Analyze Certify
Measure
Rule Disclosure
15
Tactical Data Quality
16
Steps to Tactical Data Quality
Measure Quality
Meta Data Creation
Rule Disclosure
Analyze Certify
Validation
Quality Improvement
17
Rule Disclosure
18
Sources of Meta Data
  • Legacy Meta Data
  • Data Models, Process Models
  • Data Dictionary, Definitions, Aliases
  • Glossary of Terms
  • Transformation Meta Data
  • Data Mapping
  • Transformation Rules
  • Error Handling Rules
  • Access Meta Data
  • Data Directory
  • Data Definitions
  • The Subject Matter Expert
  • Database Directory
  • Domain Values, Range of Values
  • Run Books
  • Derived Data Calculations
  • Audit Statistics
  • Source Transformation

19
Acquiring good Meta Data is Essential
Meta Data can be gathered before, during or after
the Assessment
You can pay me now, or you can pay me later
20
  • Measuring
  • Data Quality
  • Techniques
  • Tools
  • Methods

21
How can Data Quality be Measured?
One accurate measurement is worth a thousand
expert opinions Grace Hopper, Admiral, US
Navy
  • Customer Complaints
  • User Interviews Feedback
  • Customer Satisfaction Survey
  • Data Quality Requirements Gathering
  • Data Quality Assessments

22
Measuring Data Quality - Tools
  • Analysis Tools
  • Specifically designed assessment tools
  • Quality Manager, Migration Architect
  • N A Trillium, Group-1, ID Centric, Finalist,
    etc.
  • Improvisations
  • SAS, Focus, SQL, other query tools
  • Other Necessary Tools
  • File Transfer
  • Data Conversion

23
Assessment Measurements
  • Level 1 Completeness
  • Nulls or Blanks
  • Misuse (or overuse) of Default Values
  • Level 2 Validity
  • Data Integrity Anomalies
  • Invalid Data based on Business Rule
  • Level 3 Structural Integrity
  • Primary Key Uniqueness
  • Key Structure (Cardinality, Referential
    Integrity, Alternate Keys)
  • Level 4 Business Rule Violations
  • Relationship between two or more fields
  • Calculations

Field Integrity Intuitive Integrity Rules
Business Rule Integrity Requiring Meta Data
24
  • Analyze
  • and Certify
  • Identifying Problems
  • Sizing up Problems
  • To Certify or Not to Certify

25
Template - field level
  • Value - the domain occurrence
  • Frequency - the number of occurrences within the
    data set
  • Percent - the of the whole set
  • 88 Info - the copybook definition for the value
  • Analysis - comments about our findings

26
Identifying Problems
1
2
3
Analysis (and Discovery) 1. Is the field
required? If so, blanks indicate an anomaly. 2.
Are the values ID206 and STANG allowed? (Is
this a problem with the data or the Meta
Data? 3.Some values occur in only 1.3 of the
records. Is this telling us there is a problem?
27
Data Quality Scoring
28
Example Poor Data Quality
29
Field Analysis
In a range of values, in the absence of domain
rules, investigate the first and last .2
Bell curve distribution
30
Management Reporting- Short Engagement
31
Management Reporting - Status
32
Management Reporting - Anomalies
33
Management Reporting - Productivity
34
Meta Data Creation
35
Example Data Quality Repository
36
Meta Data Supply Chain
Meta Data
Field Name
Data Inventory Meta Data
Work Groups
Data Requirements
37
Results Validation
38
Report Validation
SME validation an opportunity to improve Meta
Data 1. Supply a clear name for the field. 2. Is
there a good definition? 3. Make the business
rules public? 4. Will the SME initiate a data
cleansing initiative? 5. Does the SME recommend
edit or data transformation rules? 6. Are the
findings consistent with the SMEs expectations?
Report Sections Identification
1
2
3
Field Definition Rules
4
5
Score Explanation
Statistical Reports Analysis
6
39
Quality Improvement
40
Next Steps
41
Lessons Learned- Data Cleanup
100
Completeness
100
Accuracy
42
Summary
  • We made the distinction between
  • - Data Migration
  • - Data Quality
  • - Data Cleansing
  • We defined what good data quality is.
  • We discussed that there could be 10 or more
    processes that could take place in building a
    comprehensive data quality program for the
    enterprise.
  • - Tactical should precede the Strategic or be
    the 1st step of
  • There are 6 steps to an effective tactical data
    quality initiative
  • - Rule Disclosure
  • - Quality Measurement
  • - Analyze and Certify
  • - Meta Data Creation
  • - Validation
  • - Quality Improvement

43
Reference Material
  • The Demings Management Method (Total Quality
    Management), Mary Walton
  • Data Quality for the Information Age, Tom Redman
  • The Data Warehouse Challenge Taming Data Chaos,
    Michael Brackett
  • Improving Data Warehouse and Business Information
    Quality, Larry English
  • DM Review Magazine, Information Quality series by
    Larry English

44
  • Ron Forino
  • Director, Business Intelligence
  • DMR Consulting Group
  • (732)549-4100 X-8292
  • rforino_at_dmr.com
  • ronforino_at_aol.com

45
(No Transcript)
Write a Comment
User Comments (0)
About PowerShow.com