Title: Implementing Complex Data Integrity Rules Inside Your RDBMS
1Implementing Complex Data Integrity Rules Inside
Your RDBMS
- Michael Haynes
- Nationwide
- haynesm_at_nationwide.com
- michael_haynes_at_yahoo.com
2Goals
- Explanation of the Problem
- Definition of Possible Solutions
- Examination of Several Common Data Integrity
Concerns - Presentation of Specific Examples for Users of
the Oracle RDBMS
3Agenda
- Data Integrity and The Logical-to-Physical
Conversion - Handling Several Specific Data Integrity Issues
Within Your Database - Pros and Cons
- Summary
- QA
4Data Integrity and The Logical-to-Physical
Conversion
5Your Starting Point
- Business rules defined and documented.
- Logical model is complete.
- Now all you need is a database and an application!
6Conversion Issues
- Logical-to-Physical Conversion can include many
decision points. - Easy to lose some of the value which can be
derived from capturing business rules.
7An Introduction to Our Model
- Employee Management System including
- Basic Employee Information
- Time Tracking By Project
8How to Convert?
- One option is to convert everything one-to-one.
- This produces a theoretically optimized database.
- But it may not be optimized for your specific
requirements and operations.
9One Implementation Strategy
- Results in many tables.
- Like data in multiple different locations.
(START_DATE)
10One-to-One Conversions
- The larger the model, the more likely a
one-to-one conversion can become unwieldy. - So you probably make changes to your design.
11Making Changes
- You may, in fact, make a lot of changes.
- Less tables, all like data in a single column.
- Some new issues.
12Data Integrity Issues
- Whichever way you go, even in this simple case,
you have one or more data integrity issues to
resolve. - How can that be?
- There are some rules depicted in models that
aren't inherently supported by major RDBMSs.
13Handling Several Specific Data Integrity Issues
14The Arc
15Arc Definition
- An arc relationship can be described as follows
- "For a given row, table A is related to one of
table B or table C (or table D) but not more
than one." - So, one of our timesheet rows can be either
related to a manager row or an intern row or an
employee row or a contractor row! - This design is a result of our choosing to
implement the subtypes of the EMS_STAFF entity.
16Lack of Control for Arcs
- There are no inherent RDBMS controls supporting
arcs. Since all four columns are optional,
without other controls, a user could populate all
four columns for a single row or populate none of
them.
17Resolving the Arc Dilemma
- This solution is Oracle specific but concepts can
be applied to other technologies. - Set up a generic function and trigger to check
arcs. Created new data type for column list. - Was able to easily modify the behavior when
additional requirements made modification
necessary.
18Arc Management Components
- First we created the new datatype
- CREATE TYPE COLUMN_LIST AS VARRAY(10) OF
VARCHAR2(100) - You must GRANT EXECUTE on the datatype to the
users of your function. - Then we created the function
- Created owned by the same ID as the datatype.
- Returns TRUE or FALSE
- Then we were able to create triggers using the
function.
19The ARC_CHECK Function
- CREATE OR REPLACE FUNCTION ARC_CHECK
- (ITAB COLUMN_LIST,
- IFLAG CHAR 'M')
- RETURN BOOLEAN IS
- column_name VARCHAR2(30)
- c NUMBER
- ind NUMBER
- min_hits NUMBER
- BEGIN
- IF UPPER(iFlag) 'O' THEN
- min_hits 0
- ELSE
- min_hits 1
- END IF
- c 0
- FOR ind IN 1..iTab.Count
- LOOP
- IF iTab(ind) IS NOT NULL THEN
- c c 1
- END IF
- END LOOP
- IF (c lt min_hits)
- OR (c gt 1) THEN
- RETURN FALSE
- ELSE
- RETURN TRUE
- END IF
- END
- /
20A Sample Arc Validation Trigger
CREATE OR REPLACE TRIGGER TIMESHEET_ARC_TRIG
BEFORE INSERT OR UPDATE OF CON_ID, EMP_ID,
INT_ID, MGR_ID ON EMS_TIMESHEETS FOR EACH
ROW BEGIN IF NOT arc_check( column_list(new.c
on_id,new.emp_id,new.int_id,new.mgr_id) ,'M')
THEN raise_application_error( -20610,'1
and only 1 staff column must be populated on
EMS_TIMESHEET') END IF END /
- If COLUMN_LIST is owned by another ID, explicitly
reference that in the call above.
("OTHERID.COLUMN_LIST")
21Another Arc Validation Option
- Another option
- Check constraints could serve the same purpose.
- Easy for 2 columns
- CHECK (emp_id is NULL or mgr_id is NULL) and NOT
(emp_id is NULL and mgr_id is NULL) - Gets more convoluted as columns are added.
- CHECK
- (emp_id is NOT NULL and mgr_id is NULL and int_id
is NULL) OR - (emp_id is NULL and mgr_id is NOT NULL and int_id
is NULL) OR - (emp_id is NULL and mgr_id is NULL and int_id is
NOT NULL)
22Arc Validation in Other RDBMSs
- What about other technologies?
- CHECK constraint works in DB2 or SQL Server.
- TRIGGERS and FUNCTIONS are common components of
major RDBMSs but syntax differs. - Complex data type options vary from vendor to
vendor.
23Implementing the Supertype in Our Example
24Subtypes or Supertype?
- We saw the result of implementing subtypes of the
entity EMS_STAFF without implementing the
supertype. - What would have been the result if we had chosen
to implement just the supertype without the
subtypes?
25Supertype Implementation Issues
- What potential issues were created?
- Allowing only interns to have an academic
institution. - Requiring all employees to have a benefit plan
code (and not allowing contractors to have one.)
26Resolving Supertype Issues
- Check Constraints
- CHECK (ACADEMIC_INSTITUTION is NULL or
STF_TYPE'INT') - CHECK (
- (BENEFIT_PLAN_CODE is NULL
- AND STF_TYPE in
- ('CON','INT'))
- OR
- (BENEFIT_PLAN_CODE is not NULL
- AND STF_TYPE in
- ('EMP','MGR'))
- )
27Non-Modeled Rules
- Some rules are discussed during the modeling
process but don't show up on the diagram. - Perhaps a timesheet entry must have comments if
it is for more than 40 hours. - CHECK (HOURS lt 40 OR COMMENTS is NOT NULL)
28One Name at a Time!
- Name History
- A person can have multiple names over time but
should only have one name at any given time. - The implementation at right doesn't enforce this
rule. A person could have 2 names active at the
same time.
29Name History Options
- Ways to enforce "1 name at a time" rule.
- Trigger on EMS_PERSON_NAMES.
- Logic could be somewhat complicated.
- In Oracle this can lead to the "mutating table"
problem. - CURRENT_NAME column on EMS_STAFF, history in
EMS_PERSON_NAMES. - Easier to code enforcement logic.
- Similar data in two places.
- Requires some additional considerations.
30One Possible Name History Solution
- Store current name on EMS_STAFF.
- Store history on EMS_PERSON_NAMES.
- Do not allow date gaps.
- When changing CURRENT_NAME on EMS_STAFF, insert
prior name into EMS_PERSON_NAMES. - Set END_DATE on EMS_PERSON_NAMES to current
system date. - Set START_DATE on EMS_PERSON_NAMES to
- END_DATE of previous name OR
- START_DATE of the person if they have no previous
name.
31Design Changes for Name History
- Leads to two physical design changes
- CURRENT_NAME added to EMS_STAFF.
- END_DATE on EMS_PERSON_NAMES is now mandatory.
32Name History Trigger
- create or replace trigger NAME_UPDATE_TRIG
- after UPDATE of CURRENT_NAME on EMS_STAFF
- for each row
- DECLARE
- CURSOR check_old_names (iPerson number) IS
- select max(end_date) from ems_person_names
- where stf_id iPerson
- max_date date
- BEGIN
- OPEN check_old_names(new.id)
- FETCH check_old_names into max_date
- IF max_date is not NULL THEN
- insert into ems_person_names values
(pnm_seq.nextval, - old.current_name, new.id, max_date,
SYSDATE) - ELSE
- insert into ems_person_names values
(pnm_seq.nextval, - old.current_name, new.id,
new.start_date, SYSDATE) - END IF
- END
33Non-Modeled Rules Overview
- These non-modeled rules are important to document
in some manner. - If different people are doing the logical model
and physical database design, it's important to
make sure there is a common information
repository.
34Controlled Redundancy
- Sometimes the database design that is developed
introduces some form of data redundancy. - Redundancy, if not controlled, can lead to data
conflicts. - If you're going to introduce redundancy, you need
to make sure it's controlled!
35A Simple Redundancy Example
- PARTS table's primary key is comprised of the
three foreign key columns. - Why do this?
- Foreign key columns on child tables of IMS_PARTS.
- How to handle?
- CHECK
- (CODE IPS_IDIPT_IDIV_ID)
36Oracle Mutating Tables
- ORA-04091 "table EMS_TIMESHEETS is mutating,
trigger/function may not see it" - What does this mean?
- (From Oracle's Metalink site) "A mutating table
is a table that is currently being modified by an
update, delete, or insert statement. If
your trigger contains a select statement or an
update statement referencing the table it is
triggering off of you will receive the error." - When might you see this?
- Attempts to validate data across multiple rows in
a single table. - This error does not occur in DB2 or SQL Server.
DB2 uses what it calls TRANSITION tables to
handle referencing data from the triggering
table.
37Causing the Mutating Table Error
- Suppose you want to enforce that no person
record more than 168 hours of timesheets for a
single week?
- create or replace trigger hours_in_week_trig
- before insert or update on ems_timesheets
- for each row
- DECLARE
- CURSOR check_hours(iPerson NUMBER, iWk DATE) IS
- select sum(hours) from ems_timesheets
- where stf_idiPerson and week_start_dateiWk
- hours number
- BEGIN
- OPEN check_hours(new.stf_id,new.week_start_dat
e) - FETCH check_hours INTO hours
- IF (hours new.hours) gt 168 THEN
- raise_application_error
- (-20630,'Hours for employee 'new.stf_id
- ' in week 'new.week_start_date
- ' exceeds 168.')
- END IF
- END
38Oh, no! Your table is mutating!
- Your new trigger works fine on INSERTs.
- But the first time someone tries to do an UPDATE
39Mutating Table Resolution Options
- So how do you prevent these mutants from crashing
your application? - One option is to put this sort of code in your
application code, not in triggers. - This loses the benefits derived from storing
business rule validation code in the database. - Use workarounds within the database.
- Oracle suggests a package with temporary data
storage and AFTER triggers. - We have used views and INSTEAD OF triggers to
handle the processing.
40Views and INSTEAD OF Triggers
- Create a view of the table to which you want to
apply data integrity rules . - Create an INSTEAD OF trigger on the view. It
will perform validation checks and if the data is
valid it will perform the operations originally
requested. - INSTEAD OF triggers are very powerful but can be
complex to code.
41INSTEAD OF Trigger Example
- create or replace trigger
- hours_in_week_trig
- instead of insert or update
- on v_ems_timesheets
- for each row
- DECLARE
- CURSOR check_hours
- (iPerson NUMBER,
- iWeek DATE,
- iKey NUMBER) IS
- select sum(hours)
- from v_ems_timesheets
- where stf_id iPerson
- and week_start_date iWeek
- and id ! iKey
- hours number
- BEGIN
- OPEN check_hours
- (new.stf_id,
- IF (hours new.hours) gt 168 THEN
- raise_application_error(-20630,
- 'Hours for employee 'new.stf_id
- ' in week 'new.week_start_date
- ' exceeds 168.')
- ELSE
- IF INSERTING THEN
- insert into ems_timesheets values
- (new.id, new.week_start_date,
- new.hours, new.stf_id,
- new.prj_id, new.comments)
- ELSE
- update ems_timesheets set
- week_start_date
- new.week_start_date,
- hours new.hours,
- stf_id new.stf_id,
- prj_id new.prj_id,
- comments new.comments
42NO MORE MUTANTS!
43INSTEAD OF Trigger Warning
- IMPORTANT
- If you set up an INSTEAD OF trigger on a view to
enforce data integrity you must be certain to
ONLY grant privileges on the view, not on the
underlying table. - INSERTs, UPDATEs, or DELETEs against the table
EMS_TIMESHEETS will NOT be validated by this
trigger.
44Be Kind to Your Users
- Don't forget that you're writing application code
with which others will be interacting. - Write clear error messages.
- "Arc violation!" is not a good error message.
- "Arc rules not followed for columns ABC, DEF on
YOUR_TABLE." is better. - "One and only one of columns ABC, DEF may be
populated for any row in YOUR_TABLE." is even
more clear. - Remember to comment your code for future
maintainers. - And, of course, test your design THOROUGHLY
before rolling it out.
45Pros and Cons, Summary, and QA
46What's the Bad News?
- There's bad news?
- Up-Front Expenditures
- Increases database setup time.
- Need for new skill sets
- Has benefits, but is part of the up-front cost.
- Integrity check execution does take some time.
- But they have to be done somewhere or you take
the risk of invalid data existing in your
database. - Different methods of data integrity validation
can have significantly different performance.
47What's the Good News? (1)
- You can remove part of the risk of bad data.
- Can't remove the risk of incorrect data.
(Misspelled names, etc.) - CAN remove or substantially mitigate the risk of
data which is invalid per your business rules. - The upfront costs may be recouped, at least in
part, during your development process. - Code can be reused.
- Testing time may be decreased.
48More Good News!
- Long-Term Benefits
- Better data quality.
- Tangible Benefits (decreased costs for postage,
etc.) - Intangible Benefits (customer confidence, etc.)
- Easier maintenance. (If a business rule changes,
only one piece of code needs to be modified.)
49Summary (1)
- Data integrity is important!
- One study estimates costs to US business of 611
billion per year just from the impact of poor
quality customer data. - "Data Quality and the Bottom Line" (Eckerson)
- Storing integrity checking code in the database
- Minimizes the amount of code to be written
- Ensures that integrity checks are consistently
applied. - Simplifies future maintenance of the integrity
checks.
50Summary (2)
- In even a one-to-one logical to physical
conversion there may be business rules which are
not enforced by the database's core features. - Certain design decisions can bring about the need
for additional rule enforcement. - Database features such as triggers, functions,
and check constraints can help set up needed rule
enforcement.
51Summary (3)
- There are upfront costs but they are typically
worth it. - Make sure to write clear error messages.
52QA
- Time permitting, I'll take questions!
53References
- DB2 Technical Manuals by IBM
- Oracle Technical Manuals by Oracle
- MSDN Library by Microsoft
- "Data Quality and the Bottom Line" by Wayne W.
Eckerson - Oracle Metalink
- http//metalink.oracle.com/
- "Triggers and DB2 Version 6" by Craig S. Mullins
- http//www.craigsmullins.com/db2_triggers.htm
- "Using Check Constraints to Simulate Domains" by
Craig S. Mullins - http//www.craigsmullins.com/dbu_1201.htm
- "DB2 User-Defined Functions" by Andrew Pletch
- http//www.mcs.newpaltz.edu/pletch/db2_UserDefine
dFunctions.html - Multiple "How do I?" features
- http//www.easy-sql-server.com/
54THANK YOU!
- THANK YOU!
- More Questions/Comments?
- My E-mail
- haynesm_at_nationwide.com
- OR
- michael_haynes_at_yahoo.com