Title: Business Rules, Data Quality, and Information Compliance
1Business Rules, Data Quality, and Information
Compliance
- David Loshin
- Knowledge Integrity Incorporated
- loshin_at_knowledge-integrity.com
- (301) 754-6350
2Agenda
- Data Quality and Risk
- Information Compliance
- Information Compliance and Business Rules
3The Value of Information Quality
- Both internal and external factors affect how information is represented and used
- Traditional focus on structure instead of content has led to deficiencies in asserting validity of data
- There is growing recognition that responsibility for information quality lies with the business client
- Increased external pressures are beginning to influence the care we take in managing content
4Risks and Data Quality
Policy
Compliance
Value/Benefits
Poor Data Quality
5Types of Risk
- Compliance Risks
- System Development Risks
- Increased Operational Costs
- Lost Opportunities
- Being able to quantify impediments to achieving
business objectives due to poor data quality
should be seen as a critical part of business
risk management
6Managing Data Quality
- Data is a critical organizational asset
- Manage the quality of data with the same diligence that companies use in managing all other assets and resources (TDWI, Data Quality and the Bottom Line)
- Companies that manage their data as a strategic resource and invest in its quality are already pulling ahead in terms of reputation and profitability (PricewaterhouseCoopers, Global Data Management Survey 2001)
7Data Quality is Critical to Data Warehouse Success
- By 2005, more than 50 percent of projects will fail; Fortune 1000 companies will spend or lose more on operational inefficiencies in the back-office than on data warehousing or CRM (Ted Friedman, Gartner Group)
- According to the PricewaterhouseCoopers Global Data Management Survey, poor data quality meant that:
- Over 50% of respondents had incurred extra costs to prepare reconciliations
- A third had been forced to delay or scrap new systems
- Almost a third had failed to bill or collect receivables
8Fitness for Use
- Defect-free data is not a requirement
- Instead, target measured compliance with user expectations above agreed-to thresholds
- But how do we know when data quality is at an acceptable level?
- Fuzzy notions of good vs. bad data
- Different criteria for different users
- Data sets are used in ways they were never intended
- Data quality is contextual
9What Can Go Wrong?
- Data entry errors
- Absence of agreement as to business term meanings
- Mismatched syntax, formats and structures
- Unexpected changes in source systems
- Multiple interfaces to same back end
- Validity failures
- Data conversion errors
- Changes in use and perception of data
10The Real Problem
- There are no objective measures of data quality
- The only industry metrics are based on name deduplication or address standardization
- The scope of the business importance of information validation is widely underestimated!
11Understanding and Addressing the Problem
- Any situation where information must comply with business client expectations may be considered a data quality problem
- To effectively manage data quality, we must be able to:
- Determine data quality expectations
- Identify contextual metrics
- Assess levels of data quality
- Identify opportunities for improvement
- Eliminate sources of problems
- Measure continuous improvement against baseline
12The Knowledge Integrity Approach
- Introduce a methodology for expressing business client data quality expectations and measuring conformance with those expectations
- Provide a framework for defining data quality and business rules at a high level
- Address the most common data quality problems at the source instead of repetitive data correction
- Demonstrate the ability to transform the statement of rules into actualized processes to measure and report based on those defined rules
- Information Compliance
13Information Compliance
- The coordinated, measurable conformance of a collection of data instances with a set of explicitly defined data expectations expressed using a formal rule language.
- Can be used:
- To characterize fitness for use
- To motivate the definition of data expectations
- To determine validity of information within a set of defined constraints
- As the basis for business performance metrics and measurements of those metrics
- To enable rapid root cause analysis of information scrap and rework
14Information Compliance and Data Quality
- Effective measurement of compliance with expectations is a core component of a data quality strategy
- Defining information consumer expectations provides clarity as to data quality requirements
- Expectations must be tied to specific business impacts
- Well-defined requirements provide insight into objective metrics and key data quality indicators
15Information Compliance: Not Just for Data Quality!
- Traditional design processes
- Documentation/data dictionaries
- Business management
- Policies
- Standards
- Regulations
- Knowledge management
16Maturation of the Web
17Neo-Centralization
We are going to make scads of money!
Business Intelligence Data Warehouse
How do we reconcile between these different
systems?
18Regulatory Compliance
Your company's CEO
Watch this space
19Business Management
- Our corporate pricing strategy has been revised to reflect a more customer-friendly approach
- The first time a customer buys one of our products, he or she will be given an immediate 15% discount
- To encourage repeat sales, any of our current customers can buy additional quantities of any already purchased product at a 10% discount
- Our preferred customers receive a 25% discount on any purchase
20Business Management
- New customers are given a 15% discount on first purchase of a product
- Current customers are given a 10% discount on each additional product purchased
- Preferred customers are given a 25% discount on all products
21Policy, Example 1
- The following paragraphs have been taken from the Yahoo! Privacy policy, http://privacy.yahoo.com/privacy/us/
- Yahoo! does not rent, sell, or share personal information about you with other people or nonaffiliated companies except to provide products or services you've requested, when we have your permission, or under the following circumstances:
- We provide the information to trusted partners who work on behalf of or with Yahoo! under confidentiality agreements. These companies may use your personal information to help Yahoo! communicate with you about offers from Yahoo! and our marketing partners. However, these companies do not have any independent right to share this information.
- We have a parent's permission to share the information if the user is a child under age 13. Parents have the option of allowing Yahoo! to collect and use their child's information without consenting to Yahoo! sharing of this information with people and companies who may use this information for their own purposes.
23Policy, Example 2
- The following paragraph has been taken from the Earthlink Small Office DSL Terms and Conditions, http://www.earthlink.net/about/policies/smofficeterms/
- If you need to cancel your EarthLink Small Office DSL Service after installation, please send your written request to EarthLink Business Access Customer Service at fax (408) 881-3011. We require 30 days notice for service cancellation. To process your cancellation request, we require that you provide the following: (1) Written request submitted on company letterhead by your billing contact (2) Your customer or account number (3) Current phone number (4) Reason for canceling service.
25Data Dictionary Example 1
26Data Dictionary Example 2
27Standards, Example 1
28Standards, Example 1
29Standards, Example 2
30Legislation
From the Personal Responsibility and Work
Opportunity Reconciliation Act of 1996 (PRWORA)
(1) TRANSMISSION OF WAGE WITHHOLDING NOTICES TO EMPLOYERS. Within 2 business days after the date information regarding a newly hired employee is entered into the State Directory of New Hires, the State agency enforcing the employee's child support obligation shall transmit a notice to the employer of the employee directing the employer to withhold from the income of the employee an amount equal to the monthly (or other periodic) child support obligation (including any past due support obligation) of the employee, unless the employee's income is not subject to withholding pursuant to section 466(b)(3).
32Regulations
- Sarbanes-Oxley Act of 2002, which requires each annual report of an issuer to contain an "internal control report", which shall:
- state the responsibility of management for establishing and maintaining an adequate internal control structure and procedures for financial reporting; and
- contain an assessment, as of the end of the issuer's fiscal year, of the effectiveness of the internal control structure and procedures of the issuer for financial reporting.
33Information Compliance Activities
- Documentation of Policy and Linkage to Business Rules
- Assessment
- Monitoring
- ROI calculation
- Root cause analysis
- Continuous data quality improvement
- Knowledge Capture, Management, Transfer
34Benefits
- Standardization of metadata and widely-shared reference information across a collection of organizations
- Ability to capture and manage business logic as content
- Overall improved information quality and improved operational efficiency
- High-level description of information integration process
35More Benefits
- Abstraction of business rule specification from implementation provides for:
- Rapid application development
- Retargetability
- Reuse
- Continuous improvement
- Enhanced matrixed information coordination
36What is a Business Rule?
- a statement that defines or constrains some
aspect of a business by asserting control over
some behavior of that business
37Business Rules?
OK, now get this down. If it is a Monday, and it
is raining outside, and if there might be a red
Corvette parked on the roof of the garage, then
if the client's mood is OK, then we can charge
the double rate when the client's head is turned
to the right, and
Yeah, this is good that we are finally
documenting this business rule.
38Rules and Rules
- The value of business rules lies in the ability to describe assertions about a system using a formal framework that is both actionable and adaptable
- The popular perception of business rules leaves a wide gap between what is describable and what is actionable
- In other words, what people think of as being business rules are usually not business rules
39Rule-Based Validation
[Diagram labels: Validate; Event; System State; Rules Engine; Business Rules]
40The Value of Rules
- Documents business logic
- Automates business processes
- Middle ground of definition between technicians and business clients
- Rapid development
- Rapid adaptability to change
41Data Quality Business Rules
42Defining a Semantic Rule Hierarchy
- Let's use what we know about data to drive that definition framework
- Discuss how to transform those rules into operational code in different ways
- We can see ways in which the rules can be integrated into higher-level information compliance applications
43Granularity and the Semantic Hierarchy I
- Distinct values, perhaps bound to single instance
objects
44Granularity and the Semantic Hierarchy II
- Sets of values unbound to a specific attribute
45Data Values
- Range restrictions
- Format Restrictions
- Data domains
- Null values
46Null Values
- No Value
- Unavailable
- Not Applicable
- Not Classified
- Unknown
- Default
47Null Values: Some Examples
- Social Security Numbers
- 000000000, 999999999
- Names
- ?, ??, ???, ????, ?????, N/A, NA, NONE, None, UNKNOWN, Unknown, n/a, na, none, unknown
- Phone Numbers
- No phone number provided
- 000-000-0000
- 999-999-9999
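The placeholder values listed above can be caught with a simple lookup. The following is a minimal sketch in Python, not a definitive implementation: the field names (`ssn`, `name`, `phone`), the catalogue `NULL_REPRESENTATIONS`, and the function `is_represented_null` are all hypothetical names chosen for illustration, and the catalogue contains only the examples from this slide.

```python
# Hypothetical catalogue of placeholder strings, taken from the slide's examples.
# Comparison is case-insensitive, so "n/a", "N/A", "None", "NONE" all match.
NULL_REPRESENTATIONS = {
    "ssn": {"000000000", "999999999"},
    "name": {"N/A", "NA", "NONE", "UNKNOWN"},
    "phone": {"NO PHONE NUMBER PROVIDED", "000-000-0000", "999-999-9999"},
}


def is_represented_null(field: str, value: str) -> bool:
    """Return True when value is empty or a known placeholder for field."""
    text = value.strip()
    if text == "":
        return True
    # Names made up entirely of question marks ("?", "??", ...) are placeholders.
    if field == "name" and set(text) == {"?"}:
        return True
    return text.upper() in NULL_REPRESENTATIONS.get(field, set())


print(is_represented_null("name", "n/a"))        # True
print(is_represented_null("name", "Loshin"))     # False
```

A real catalogue would be built from profiling the actual data, since each source system tends to invent its own placeholders.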
48Data Domain Definition
- Enumerated Domains
- Define States as Alabama, Alaska, ...
- Implemented in SQL as creation of a temporary table and populating it with values
- Table-derived Domains
- Define validIDs as employee.id when employee.status = active
- Implemented in SQL as a subquery
- Constructive Domains
- Valid values are defined as a function of other values
49Granularity and the Semantic Hierarchy III
- All values bound to different instances of the
same attribute
50Data Attributes
- Absence of values
- Restriction of values
51Null Value Rules
- Nulls not Allowed
- Order.productID may not be null
- Null Representations
- Define NO_NUMBER as NOVALUE as no number provided
- Represented Nulls Allowed
52Domain Membership
- Assert that each value in all instances of a named attribute is taken from the specified data domain
- Payroll.id belongs to validIDs
- Implemented in SQL as a query extracting all records whose values are not in the named domain, represented by a subquery
- Data domains may be shared across an enterprise
53Granularity and the Semantic Hierarchy IV
- Single set of (attribute, value) pairs
54Data Records
- Completeness
- Exemption
- Consistency
55Record-Level Completeness
- Assert that a record is not complete unless every attribute in a named list is non-null
- Example: If (order.productClass = option) then incomplete without underlier, strikePrice, and expiration
- Implemented in SQL as extracting any record where the condition is true and any of the named attributes is null
56Record-Level Exemption
- Asserts that under the specified condition an attribute should not have a value
- Example: If (hcfa1500.otherInsurance = N) then exempt (hcfa1500.otherPolicyNumber, hcfa1500.otherGroupName)
- Implemented similarly to completeness
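Since exemption is the mirror image of completeness, one possible SQL check simply swaps `IS NULL` for `IS NOT NULL`. The sketch below assumes a hypothetical `hcfa1500` table with a made-up `claimID` key and invented sample rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema and rows; only the rule comes from the slide.
conn.executescript("""
    CREATE TABLE hcfa1500 (
        claimID INTEGER, otherInsurance TEXT,
        otherPolicyNumber TEXT, otherGroupName TEXT
    );
    INSERT INTO hcfa1500 VALUES
        (1, 'N', NULL,   NULL),
        (2, 'N', 'P-77', NULL),
        (3, 'Y', 'P-12', 'Acme'),
        (4, 'N', NULL,   'Acme');
""")

# Extract records where the condition holds but an exempt attribute has a value.
not_exempt = conn.execute("""
    SELECT claimID FROM hcfa1500
    WHERE otherInsurance = 'N'
      AND (otherPolicyNumber IS NOT NULL OR otherGroupName IS NOT NULL)
    ORDER BY claimID
""").fetchall()

print(not_exempt)  # [(2,), (4,)]
```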
57Record-Level Consistency
- Either a straightforward assertion or one guarded by a condition
- Example: If (employee.level = manager) then (employee.salary > 30000) and (employee.salary < 54000)
- Implemented by finding all records that meet the condition but not the assertion
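One way to find the records that meet the condition but not the assertion is to negate the assertion in the `WHERE` clause. The schema and rows below are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema and rows; only the rule comes from the slide.
conn.executescript("""
    CREATE TABLE employee (id INTEGER, level TEXT, salary INTEGER);
    INSERT INTO employee VALUES
        (1, 'manager', 45000),
        (2, 'manager', 30000),
        (3, 'manager', 60000),
        (4, 'staff',   20000);
""")

# Condition: level = 'manager'. Assertion: 30000 < salary < 54000.
# Note: a NULL salary makes the negated assertion evaluate to NULL, so such
# a record is not returned here; a null rule would have to catch it.
inconsistent = conn.execute("""
    SELECT id FROM employee
    WHERE level = 'manager'
      AND NOT (salary > 30000 AND salary < 54000)
    ORDER BY id
""").fetchall()

print(inconsistent)  # [(2,), (3,)]
```

Employee 2 is flagged because the bounds are strict: a salary of exactly 30000 does not satisfy `salary > 30000`.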
58Granularity and the Semantic Hierarchy V
- Relationship of sets of values bound to sets of
attributes
59Functional Dependency
- Asserts the dependence of one set of attribute values on a different set of attribute values
- Example: ZIPCode DEPEND ON Street, City, State
- Implemented by searching for pairs of records that share the determining attribute values but have different dependent values
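Rather than comparing record pairs directly, a grouped query can report each combination of determining values that maps to more than one dependent value. This is a sketch of that alternative formulation, with a hypothetical `address` table and made-up rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema and rows (the addresses are invented for the example).
conn.executescript("""
    CREATE TABLE address (street TEXT, city TEXT, state TEXT, zipcode TEXT);
    INSERT INTO address VALUES
        ('1 Main St', 'Springfield', 'IL', '62701'),
        ('1 Main St', 'Springfield', 'IL', '62702'),
        ('2 Oak Ave', 'Springfield', 'IL', '62701'),
        ('2 Oak Ave', 'Springfield', 'IL', '62701');
""")

# Rule: ZIPCode depends on (Street, City, State).
# A violation is any determining combination with more than one distinct ZIP.
fd_violations = conn.execute("""
    SELECT street, city, state
    FROM address
    GROUP BY street, city, state
    HAVING COUNT(DISTINCT zipcode) > 1
""").fetchall()

print(fd_violations)  # [('1 Main St', 'Springfield', 'IL')]
```

The two '2 Oak Ave' rows are duplicates but agree on the ZIP code, so they do not violate the dependency.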
60Uniqueness
- Asserts that across a set of records, there may not be two or more records sharing the same values in a set of named attributes
- Example: NAME, SSN are UNIQUE
- Implemented by searching for pairs of records that share the values for the set of named attributes
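A uniqueness check can be sketched the same way, grouping on the named attributes and keeping groups with more than one record. The `person` table and its rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema and rows; the SSN values are made up.
conn.executescript("""
    CREATE TABLE person (name TEXT, ssn TEXT);
    INSERT INTO person VALUES
        ('Ann', '123456789'),
        ('Bob', '234567890'),
        ('Ann', '123456789'),
        ('Ann', '345678901');
""")

# Rule: (NAME, SSN) are UNIQUE.
# Report each value combination that occurs in two or more records.
duplicates = conn.execute("""
    SELECT name, ssn, COUNT(*) AS occurrences
    FROM person
    GROUP BY name, ssn
    HAVING COUNT(*) > 1
""").fetchall()

print(duplicates)  # [('Ann', '123456789', 2)]
```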
61Granularity and the Semantic Hierarchy VI
- Multiple sets of (attribute, value) pairs
62Instance Classification Assertions
- Groups sets of objects together, then applies assertion to group
- Example: All test results with the same PSA test score value and the same prostate cancer risk and a risk factor greater than 10 must have a white blood count greater than 1000
- CLASSIFY BY <classification expression>, <condition> IMPLIES <consequent>
63Granularity and the Semantic Hierarchy VII
- Aggregation of a set of values
64Aggregate Assertion
- Aggregate functions are useful for making assertions
- Aggregates: AVG, COUNT, MAX, MIN, SUM
- An aggregate assertion makes some statement about compliance with respect to the result of an aggregate function
- Example: The number of distinct values in today's update must not be less than the number of distinct values in yesterday's update
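The aggregate example above could be tested with two `COUNT(DISTINCT ...)` subqueries. In this sketch the table names `today_update` and `yesterday_update`, the `value` column, and the sample rows are all assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical tables holding the two updates; rows are made up.
conn.executescript("""
    CREATE TABLE yesterday_update (value TEXT);
    CREATE TABLE today_update     (value TEXT);
    INSERT INTO yesterday_update VALUES ('A'), ('B'), ('C');
    INSERT INTO today_update     VALUES ('A'), ('A'), ('B');
""")

# Compute both aggregates in a single round trip.
today_n, yesterday_n = conn.execute("""
    SELECT (SELECT COUNT(DISTINCT value) FROM today_update),
           (SELECT COUNT(DISTINCT value) FROM yesterday_update)
""").fetchone()

# Assertion: today's distinct count must not be less than yesterday's.
compliant = today_n >= yesterday_n

print(today_n, yesterday_n, compliant)  # 2 3 False
```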
65Aggregate Dependence
- A dependence rule uses the result of an aggregate function to check for compliance against other data instances
- Example: Any order greater than 1000.00 is given a 10% discount
66Granularity and the Semantic Hierarchy VIII
- Assertional relationship that exists across
multiple instance sets
67Foreign Key Assertion
- This rule establishes a connective link between a set of attributes of one set of objects to a set of attributes in a different set of objects
- The assertion implies that all instances of the targeted attributes exist within one object from the source set of objects
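A foreign key assertion can be checked by looking for referring records with no matching source object. The sketch below is one possible reading, using hypothetical `customer` and `orders` tables and invented rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema and rows, invented for this sketch.
conn.executescript("""
    CREATE TABLE customer (customerID INTEGER);
    CREATE TABLE orders   (orderID INTEGER, customerID INTEGER);
    INSERT INTO customer VALUES (1), (2);
    INSERT INTO orders   VALUES (10, 1), (11, 3), (12, 2);
""")

# Assertion: every orders.customerID exists in customer.customerID.
# The LEFT JOIN keeps every order; unmatched ones have a NULL on the right.
orphans = conn.execute("""
    SELECT o.orderID, o.customerID
    FROM orders AS o
    LEFT JOIN customer AS c ON o.customerID = c.customerID
    WHERE c.customerID IS NULL
    ORDER BY o.orderID
""").fetchall()

print(orphans)  # [(11, 3)]
```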
68Projected Completeness
- For all objects related to each source object, if a condition evaluates to true then related object attribute lists must have values
- Example: All order line items must have a product code and a quantity
69Projected Exemption
- For all objects related to each source object, if a condition evaluates to true then related object attribute lists may not have values
- Example: For each order, if an item ordered comes in a single color, the color must be null
70Projected Consistency
- For all objects related to each source object, if a condition evaluates to true then the consequent must also evaluate to true
- Example: For all extensions associated with a PBX number, if the line is marked as active, then a router ID must be a valid router ID
71Did I Just Pull a Fast One?
- For all extensions associated with a PBX number,
if the line is marked as active, then a router ID
must be a valid router ID
- We should be able to combine assertions from
lower levels at higher levels, right?
- The rule above embeds a domain membership assertion: valid router ID
72Semantic Hierarchy Summary
- Values
- Sets of values
- Values bound to object attribute across a set of objects
- Values assigned to a single object's attributes
- Relationship of sets of attribute values across set of objects
- Values assigned to attributes of a set of objects
- Aggregation of a set of values
- Assertions that cross object set boundaries
73Rule-Based Validation
74Taking Action
?
Express rules formally
State rules at high level
Identify business rules in context
75Exploit Formality
- Formal representation has certain characteristics
that are desirable
- A constraint expression neatly splits an object collection into compliant and non-compliant sets
- Formal specification is implementation-independent
- Well-defined syntax is parsable
76Transforming Rules
- Validation scheme can be constructed by
operationalizing each formal rule statement
- Question: How is a rule operationalized?
- Answer: Provide a scheme for turning each formal rule statement into a corresponding executable statement in some target implementation framework
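To make the idea of operationalizing a rule concrete, here is a minimal sketch that turns one rule type from the deck (nulls not allowed) into an executable SQL statement. The class `NotNullRule` and its methods are hypothetical names, not part of any rule language described here, and SQL is just one possible target framework. The sketch interpolates identifiers directly into the SQL text, so it assumes rule definitions come from a trusted source.

```python
import sqlite3
from dataclasses import dataclass


@dataclass(frozen=True)
class NotNullRule:
    """Hypothetical formal representation of 'table.attribute may not be null'."""
    table: str
    attribute: str

    def statement(self) -> str:
        # The high-level rule statement, readable by business clients.
        return f"{self.table}.{self.attribute} may not be null"

    def to_sql(self) -> str:
        # The executable form: a query selecting the nonconformant records.
        # Identifiers are double-quoted because "Order" is a reserved word.
        return f'SELECT * FROM "{self.table}" WHERE "{self.attribute}" IS NULL'


conn = sqlite3.connect(":memory:")
# Hypothetical table and rows for the demonstration.
conn.executescript("""
    CREATE TABLE "Order" (orderID INTEGER, productID TEXT);
    INSERT INTO "Order" VALUES (1, 'P-1'), (2, NULL);
""")

rule = NotNullRule("Order", "productID")
nonconformant = conn.execute(rule.to_sql()).fetchall()

print(rule.statement())  # Order.productID may not be null
print(nonconformant)     # [(2, None)]
```

Because the rule object is independent of the query text, the same rule could in principle be retargeted to a different implementation framework by adding another translation method.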
77Distinguishing Noncompliant Objects
- By asserting a constraint relating to a set of objects within our semantic hierarchy, we effectively define a bisection of that set into two subsets:
- Conformant objects, or ones that do not violate the constraint
- Nonconformant objects, or ones that do violate the constraint
In practice, if we can operationalize the test of
the constraint, we can use it as a test to
identify and extract noncompliant objects
78Distinguishing Noncompliant Objects - Benefits
- Violating objects can be collected and grouped by violated rule
- Eases reconciliation
- Improves root cause analysis
- The rule statement itself can be used as an error message
- Provides high-level feedback
- Understandable by both technicians and business clients
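The bisection into conformant and nonconformant objects, with violations grouped by rule and the rule statement doubling as the error message, can be sketched in a few lines of Python. The function `partition`, the record layout, and the second rule (on quantity) are assumptions introduced for illustration.

```python
from collections import defaultdict

# Each rule maps its statement (used as the error message) to a test
# that returns True for a conformant record. The quantity rule is hypothetical.
rules = {
    "Order.productID may not be null": lambda r: r["productID"] is not None,
    "Order.quantity must be greater than 0": lambda r: r["quantity"] > 0,
}

# Made-up sample records.
records = [
    {"orderID": 1, "productID": "P-1", "quantity": 2},
    {"orderID": 2, "productID": None, "quantity": 1},
    {"orderID": 3, "productID": "P-3", "quantity": 0},
    {"orderID": 4, "productID": None, "quantity": 0},
]


def partition(records, rules):
    """Split records into conformant ones and violations grouped by rule."""
    conformant = []
    violations = defaultdict(list)
    for record in records:
        failed = [stmt for stmt, test in rules.items() if not test(record)]
        if not failed:
            conformant.append(record)
        for stmt in failed:
            violations[stmt].append(record)
    return conformant, dict(violations)


conformant, violations = partition(records, rules)
for statement, bad in violations.items():
    print(statement, "->", [r["orderID"] for r in bad])
```

A record that breaks several rules, such as order 4 here, appears under each rule it violates, which is what makes the grouping useful for root cause analysis.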
79Conclusion
- Data quality is a special case of a more general concept called Information Compliance
- Information Compliance introduces a methodology for capturing and formally expressing user information expectations and measuring conformance with those expectations
- Information Compliance can be implemented using a business rules approach
80Questions?
- If you have questions, please contact me
- David Loshin
- 301-754-6350
- loshin_at_knowledge-integrity.com