Title: Ron Forino
1Project Driven Data Quality Improvement
Ron Forino DAMA - Washington, DC September 1999
2Examples
- According to DM Review, one European company
discovered through an audit that it was not
invoicing 4% of its orders. With $2 billion in
revenues, that meant $80 million went unpaid.
- Electronic data audits show that the invalid data
values in the typical customer database average
around 15-20%. Physical audits suggest that
this number may be closer to 25-30%.
- In 1992, 96,000 IRS tax refund checks were
returned undeliverable due to incorrect
addresses.
- This year, incorrect price data in retail
databases will cost American consumers as much as
$2.5 billion in overcharges.
- According to organizations like the Data
Warehouse Institute, the Gartner Group and
Meta Group, Data Quality is one of the top 1-3
success factors to Data Warehousing.
- The average mid-sized company may have 30,000 -
50,000 fields in files, tables, screens, reports,
etc. (Platinum Technology)
3Agenda
- Definitions
- What is Data Quality?
- Tactics and the End Game
- Building Blocks to Data Quality
- Tactical Initiatives
- Strategic Initiatives
- Tactical Data Quality
- Rule Disclosure
- Data Quality Measurement, Analysis and
Certification
- Meta Data Creation
- Validation
- Quality Improvement
4Definitions
5Definitions
- Data Transformation - Changing data values to a
format consistent with integrity and business
rules agreed to by data stakeholders.
- Data Cleansing - Consolidation of redundant
customer records. Term used to describe the
process of merging and purging of customer
lists in an effort to reduce duplicate or
inaccurate customer records.
- Data Quality Improvement - The process of
improving data quality to the level desired to
support the enterprise information demand.
- Data Quality - definition to follow.
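To make the difference between transformation and cleansing concrete, here is a minimal Python sketch. The record layout (`name`, `zip`), the formatting rules, and the match key are all assumptions made for illustration; they are not taken from the presentation.

```python
def standardize(record):
    """Data transformation: put values into one agreed format.

    The rules here (collapse whitespace, upper-case names, 5-digit ZIP)
    are hypothetical stand-ins for rules agreed to by data stakeholders.
    """
    return {
        "name": " ".join(record["name"].split()).upper(),
        "zip": record["zip"].strip()[:5],
    }


def merge_purge(records):
    """Data cleansing: keep one record per match key (name + ZIP)."""
    merged = {}
    for rec in map(standardize, records):
        key = (rec["name"], rec["zip"])
        merged.setdefault(key, rec)  # first record seen for a key wins
    return list(merged.values())


customers = [
    {"name": "Ann  Lee", "zip": "20001-1234"},
    {"name": "ann lee", "zip": "20001"},
    {"name": "Bob Ray", "zip": "08837"},
]
deduped = merge_purge(customers)
print(len(customers), "records in,", len(deduped), "records out")
```

Real merge/purge tools use fuzzy matching rather than an exact key; the exact key is used here only to keep the sketch short.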
6Data Quality Improvement Decision Tree
(Diagram: decision tree with the headings Task and
Process. Labels shown: Transform; Conform to
Business Rule; Standardize, Validate, Match,
Dedupe, Integrate, Enrich; Match, Dedupe; Data
Reengineering; Data Quality Improvement; Process
Reengineering; Data Cleansing.)
7Tactics and The End Game
We need better data quality...
Enterprise Initiative
Select Project
Data Quality Assessment
Report Recommendations
Source System Clean-up Initiative
8Tactics and The End Game
We need better data quality...
Enterprise Initiative
Data Warehouse
Select Project
Data Quality Assessment
Data Quality Assessment
Staging Specifications
Report
Report Recommendations
Source System Clean-up Initiative
Source System Clean-up Initiative
9What is Good Data Quality?
10How Can We Know Good Data Quality?
Is this Good Data Quality?
- Column 1
- 321453
- 212392
- 093255
- 214421
- ...
What can we conclude?
11What is Data Quality?
- Information Quality = f(Definition + Data +
Presentation)
- Definition
- Defines Data
- Domain Value Specification
- Business Rules that Govern the Data
- Information Architecture Quality
- Data Content
- Completeness
- Validity/Reasonability
- Data Presentation
- Accessible
- Timely
- Non-ambiguous
12Common Data Quality Problems
- Data Content
- Missing Data
- Invalid Data
- Data Outside Legal Domain
- Illogical Combinations of Data
- Structural
- Record Key Integrity
- Referential Integrity
- Cardinality Integrity
- Migration/Integration
- Rationalization Anomalies
- Duplicate or Lost Entities
- Definitions and Standards
- Ambiguous Business Rules
- Multiple Formats for Same Data Elements
- Different Meanings for the Same Code Value
- Multiple Code Values with the Same Meaning
- Field Used for Unintended Data
- Data in Filler
- Y2K Violation
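Two of the structural problems listed above, record key integrity and referential integrity, lend themselves to simple mechanical checks. The sketch below is one possible way to express them in Python; the table and field names (`customers`, `orders`, `cust_id`) are hypothetical.

```python
from collections import Counter


def duplicate_keys(rows, key):
    """Record key integrity: return key values that occur more than once."""
    counts = Counter(row[key] for row in rows)
    return sorted(k for k, n in counts.items() if n > 1)


def orphans(child_rows, fk, parent_rows, pk):
    """Referential integrity: return child rows whose foreign key
    has no matching parent key."""
    parent_keys = {row[pk] for row in parent_rows}
    return [row for row in child_rows if row[fk] not in parent_keys]


customers = [{"cust_id": 1}, {"cust_id": 2}, {"cust_id": 2}]
orders = [
    {"order_id": 10, "cust_id": 1},
    {"order_id": 11, "cust_id": 7},
]

dup = duplicate_keys(customers, "cust_id")
orph = orphans(orders, "cust_id", customers, "cust_id")
print("duplicate keys:", dup)
print("orphan orders:", orph)
```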
13Building Blocks to Data Quality
14Building Blocks of a Data Quality Program
Benefits Realization
Strategic
Defect Prevention
DQ Requirements
Quality Reengineering
Enterprise Cultural Shift
QC/Process Auditing
Data Stewardship
Tactical
Meta Data Creation
Quality Improvement
Validation
Analyze and Certify
Measure
Rule Disclosure
15Tactical Data Quality
16Steps to Tactical Data Quality
1. Rule Disclosure
2. Measure Quality
3. Analyze and Certify
4. Meta Data Creation
5. Validation
6. Quality Improvement
17Rule Disclosure
18Sources of Meta Data
- Legacy Meta Data
- Data Models, Process Models
- Data Dictionary, Definitions, Aliases
- Glossary of Terms
- Transformation Meta Data
- Data Mapping
- Transformation Rules
- Error Handling Rules
- Access Meta Data
- Data Directory
- Data Definitions
- The Subject Matter Expert
- Database Directory
- Domain Values, Range of Values
- Run Books
- Derived Data Calculations
- Audit Statistics
- Source Transformation
19Acquiring good Meta Data is Essential
Meta Data can be gathered before, during or after
the Assessment
"You can pay me now, or you can pay me later"
20Measuring Data Quality
- Techniques
- Tools
- Methods
21How can Data Quality be Measured?
"One accurate measurement is worth a thousand
expert opinions" - Grace Hopper, Admiral, US Navy
- Customer Complaints
- User Interviews and Feedback
- Customer Satisfaction Survey
- Data Quality Requirements Gathering
- Data Quality Assessments
22Measuring Data Quality - Tools
- Analysis Tools
- Specifically designed assessment tools
- Quality Manager, Migration Architect
- N&A: Trillium, Group-1, ID Centric, Finalist,
etc.
- Improvisations
- SAS, Focus, SQL, other query tools
- Other Necessary Tools
- File Transfer
- Data Conversion
23Assessment Measurements
- Level 1: Completeness
- Nulls or Blanks
- Misuse (or overuse) of Default Values
- Level 2: Validity
- Data Integrity Anomalies
- Invalid Data based on Business Rule
- Level 3: Structural Integrity
- Primary Key Uniqueness
- Key Structure (Cardinality, Referential
Integrity, Alternate Keys)
- Level 4: Business Rule Violations
- Relationship between two or more fields
- Calculations
Field Integrity Intuitive Integrity Rules
Business Rule Integrity Requiring Meta Data
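The four assessment levels can be sketched as one pass over a data set that counts anomalies per level. This is an illustrative sketch only: the function name `assess`, its parameters, and the sample fields are assumptions, and Level 3 is reduced to primary key uniqueness.

```python
def assess(rows, required, domains, key, rules):
    """Count anomalies at each of the four assessment levels."""
    report = {"completeness": 0, "validity": 0,
              "structure": 0, "business_rules": 0}
    seen = set()
    for row in rows:
        # Level 1: nulls or blanks in required fields
        report["completeness"] += sum(
            1 for f in required if row.get(f) in (None, ""))
        # Level 2: populated values outside the allowed domain
        report["validity"] += sum(
            1 for f, allowed in domains.items()
            if row.get(f) not in (None, "") and row[f] not in allowed)
        # Level 3: primary key uniqueness
        if row[key] in seen:
            report["structure"] += 1
        seen.add(row[key])
        # Level 4: relationships between two or more fields
        report["business_rules"] += sum(
            1 for rule in rules if not rule(row))
    return report


rows = [
    {"id": 1, "state": "NJ", "qty": 2, "price": 5.0, "total": 10.0},
    {"id": 2, "state": "", "qty": 1, "price": 4.0, "total": 4.0},
    {"id": 2, "state": "ZZ", "qty": 3, "price": 2.0, "total": 7.0},
]
report = assess(
    rows,
    required=["state"],
    domains={"state": {"NJ", "DC", "NY"}},
    key="id",
    rules=[lambda r: r["qty"] * r["price"] == r["total"]],
)
print(report)
```

Levels 1 to 3 need little more than the field list and key definition, while the Level 4 rules have to be supplied from Meta Data or a subject matter expert.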
24Analyze and Certify
- Identifying Problems
- Sizing up Problems
- To Certify or Not to Certify
25Template - field level
- Value - the domain occurrence
- Frequency - the number of occurrences within the
data set
- Percent - the % of the whole set
- 88 Info - the copybook definition for the value
- Analysis - comments about our findings
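A field-level template of this kind can be produced with a simple frequency count. The sketch below assumes the field values are already loaded into a Python list and that the copybook 88-level names are available as a dictionary; both the function name and the sample data are hypothetical. The Analysis column is left to the analyst.

```python
from collections import Counter


def field_profile(values, copybook_88=None):
    """Build Value / Frequency / Percent / 88 Info rows for one field."""
    copybook_88 = copybook_88 or {}
    counts = Counter(values)
    total = len(values)
    return [
        {
            "value": value,
            "frequency": freq,
            "percent": round(100.0 * freq / total, 1),
            "info_88": copybook_88.get(value, ""),  # blank if undocumented
        }
        for value, freq in counts.most_common()
    ]


status = ["A", "A", "I", "A", "", "X", "A", "I"]
profile = field_profile(status, {"A": "ACTIVE", "I": "INACTIVE"})
for row in profile:
    print(row)
```

Values with an empty 88 Info column (here the blank and `X`) are the ones to raise with the subject matter expert.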
26Identifying Problems
Analysis (and Discovery)
1. Is the field required? If so, blanks indicate
an anomaly.
2. Are the values ID206 and STANG allowed? (Is
this a problem with the data or the Meta
Data?)
3. Some values occur in only 1.3% of the
records. Is this telling us there is a problem?
27Data Quality Scoring
28Example Poor Data Quality
29Field Analysis
In a range of values, in the absence of domain
rules, investigate the first and last .2%
Bell curve distribution
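One way to pull out the extremes of a range for inspection is to sort the values and take a small slice from each end. In this sketch the tail size is a parameter; the default of 0.002 reads the slide's ".2" as 0.2%, which is an assumption, and the sample ages are invented.

```python
import math


def tails(values, fraction=0.002):
    """Return the lowest and highest slices of a sorted range.

    `fraction` is the share of records taken from each end; at least
    one value is returned per end when the input is non-empty.
    """
    ordered = sorted(values)
    k = max(1, math.ceil(len(ordered) * fraction))
    return ordered[:k], ordered[-k:]


# 1,000 plausible ages plus four suspicious entries
ages = list(range(20, 70)) * 20 + [-1, 0, 199, 999]
low, high = tails(ages)
print("lowest values:", low)
print("highest values:", high)
```

Defaults and placeholder values (such as -1, 0 or 999) tend to collect at the ends of the distribution, which is why the tails are worth a look when no domain rules exist.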
30Management Reporting- Short Engagement
31Management Reporting - Status
32Management Reporting - Anomalies
33Management Reporting - Productivity
34Meta Data Creation
35Example Data Quality Repository
36Meta Data Supply Chain
Meta Data
Field Name
Data Inventory Meta Data
Work Groups
Data Requirements
37Results Validation
38Report Validation
SME validation: an opportunity to improve Meta
Data
1. Supply a clear name for the field.
2. Is there a good definition?
3. Make the business rules public?
4. Will the SME initiate a data cleansing
initiative?
5. Does the SME recommend edit or data
transformation rules?
6. Are the findings consistent with the SME's
expectations?
(Diagram: Report Sections, with callouts 1-6.
Labels shown: Identification; Field Definition
Rules; Score Explanation; Statistical Reports
Analysis.)
39Quality Improvement
40Next Steps
41Lessons Learned- Data Cleanup
100% Completeness
100% Accuracy
42Summary
- We made the distinction between
- - Data Migration
- - Data Quality
- - Data Cleansing
- We defined what good data quality is.
- We discussed that there could be 10 or more
processes that could take place in building a
comprehensive data quality program for the
enterprise.
- - Tactical should precede the Strategic or be
the 1st step of
- There are 6 steps to an effective tactical data
quality initiative
- - Rule Disclosure
- - Quality Measurement
- - Analyze and Certify
- - Meta Data Creation
- - Validation
- - Quality Improvement
43Reference Material
- The Deming Management Method (Total Quality
Management), Mary Walton
- Data Quality for the Information Age, Tom Redman
- The Data Warehouse Challenge: Taming Data Chaos,
Michael Brackett
- Improving Data Warehouse and Business Information
Quality, Larry English
- DM Review Magazine, Information Quality series by
Larry English
44- Ron Forino
- Director, Business Intelligence
- DMR Consulting Group
- (732)549-4100 X-8292
- rforino_at_dmr.com
- ronforino_at_aol.com