Implementing Complex Data Integrity Rules Inside Your RDBMS - PowerPoint PPT Presentation

1 / 54
About This Presentation
Title:

Implementing Complex Data Integrity Rules Inside Your RDBMS

Description:

'For a given row, table A is related to one of table B or table C (or table D... but is part of the up-front cost. Integrity check execution does take some ... – PowerPoint PPT presentation

Number of Views:206
Avg rating:3.0/5.0
Slides: 55
Provided by: Nati154
Category:

less

Transcript and Presenter's Notes

Title: Implementing Complex Data Integrity Rules Inside Your RDBMS


1
Implementing Complex Data Integrity Rules Inside
Your RDBMS
  • Michael Haynes
  • Nationwide
  • haynesm_at_nationwide.com
  • michael_haynes_at_yahoo.com

2
Goals
  • Explanation of the Problem
  • Definition of Possible Solutions
  • Examination of Several Common Data Integrity
    Concerns
  • Presentation of Specific Examples for Users of
    the Oracle RDBMS

3
Agenda
  • Data Integrity and The Logical-to-Physical
    Conversion
  • Handling Several Specific Data Integrity Issues
    Within Your Database
  • Pros and Cons
  • Summary
  • QA

4
Data Integrity and The Logical-to-Physical
Conversion
5
Your Starting Point
  • Business rules defined and documented.
  • Logical model is complete.
  • Now all you need is a database and an application!

6
Conversion Issues
  • Logical-to-Physical Conversion can include many
    decision points.
  • Easy to lose some of the value which can be
    derived from capturing business rules.

7
An Introduction to Our Model
  • Employee Management System including
  • Basic Employee Information
  • Time Tracking By Project

8
How to Convert?
  • One option is to convert everything one-to-one.
  • This produces a theoretically optimized database.
  • But it may not be optimized for your specific
    requirements and operations.

9
One Implementation Strategy
  • Results in many tables.
  • Like data in multiple different locations.
    (START_DATE)

10
One-to-One Conversions
  • The larger the model, the more likely a
    one-to-one conversion can become unwieldy.
  • So you probably make changes to your design.

11
Making Changes
  • You may, in fact, make a lot of changes.
  • Less tables, all like data in a single column.
  • Some new issues.

12
Data Integrity Issues
  • Whichever way you go, even in this simple case,
    you have one or more data integrity issues to
    resolve.
  • How can that be?
  • There are some rules depicted in models that
    aren't inherently supported by major RDBMSs.

13
Handling Several Specific Data Integrity Issues
14
The Arc
15
Arc Definition
  • An arc relationship can be described as follows
  • "For a given row, table A is related to one of
    table B or table C (or table D) but not more
    than one."
  • So, one of our timesheet rows can be either
    related to a manager row or an intern row or an
    employee row or a contractor row!
  • This design is a result of our choosing to
    implement the subtypes of the EMS_STAFF entity.

16
Lack of Control for Arcs
  • There are no inherent RDBMS controls supporting
    arcs. Since all four columns are optional,
    without other controls, a user could populate all
    four columns for a single row or populate none of
    them.

17
Resolving the Arc Dilemma
  • This solution is Oracle specific but concepts can
    be applied to other technologies.
  • Set up a generic function and trigger to check
    arcs. Created new data type for column list.
  • Was able to easily modify the behavior when
    additional requirements made modification
    necessary.

18
Arc Management Components
  • First we created the new datatype
  • CREATE TYPE COLUMN_LIST AS VARRAY(10) OF
    VARCHAR2(100)
  • You must GRANT EXECUTE on the datatype to the
    users of your function.
  • Then we created the function
  • Created owned by the same ID as the datatype.
  • Returns TRUE or FALSE
  • Then we were able to create triggers using the
    function.

19
The ARC_CHECK Function
  • CREATE OR REPLACE FUNCTION ARC_CHECK
  • (ITAB COLUMN_LIST,
  • IFLAG CHAR 'M')
  • RETURN BOOLEAN IS
  • column_name VARCHAR2(30)
  • c NUMBER
  • ind NUMBER
  • min_hits NUMBER
  • BEGIN
  • IF UPPER(iFlag) 'O' THEN
  • min_hits 0
  • ELSE
  • min_hits 1
  • END IF
  • c 0
  • FOR ind IN 1..iTab.Count
  • LOOP
  • IF iTab(ind) IS NOT NULL THEN
  • c c 1
  • END IF
  • END LOOP
  • IF (c lt min_hits)
  • OR (c gt 1) THEN
  • RETURN FALSE
  • ELSE
  • RETURN TRUE
  • END IF
  • END
  • /

20
A Sample Arc Validation Trigger
CREATE OR REPLACE TRIGGER TIMESHEET_ARC_TRIG
BEFORE INSERT OR UPDATE OF CON_ID, EMP_ID,
INT_ID, MGR_ID ON EMS_TIMESHEETS FOR EACH
ROW BEGIN IF NOT arc_check( column_list(new.c
on_id,new.emp_id,new.int_id,new.mgr_id) ,'M')
THEN raise_application_error( -20610,'1
and only 1 staff column must be populated on
EMS_TIMESHEET') END IF END /
  • If COLUMN_LIST is owned by another ID, explicitly
    reference that in the call above.
    ("OTHERID.COLUMN_LIST")

21
Another Arc Validation Option
  • Another option
  • Check constraints could serve the same purpose.
  • Easy for 2 columns
  • CHECK (emp_id is NULL or mgr_id is NULL) and NOT
    (emp_id is NULL and mgr_id is NULL)
  • Gets more convoluted as columns are added.
  • CHECK
  • (emp_id is NOT NULL and mgr_id is NULL and int_id
    is NULL) OR
  • (emp_id is NULL and mgr_id is NOT NULL and int_id
    is NULL) OR
  • (emp_id is NULL and mgr_id is NULL and int_id is
    NOT NULL)

22
Arc Validation in Other RDBMSs
  • What about other technologies?
  • CHECK constraint works in DB2 or SQL Server.
  • TRIGGERS and FUNCTIONS are common components of
    major RDBMSs but syntax differs.
  • Complex data type options vary from vendor to
    vendor.

23
Implementing the Supertype in Our Example
24
Subtypes or Supertype?
  • We saw the result of implementing subtypes of the
    entity EMS_STAFF without implementing the
    supertype.
  • What would have been the result if we had chosen
    to implement just the supertype without the
    subtypes?

25
Supertype Implementation Issues
  • What potential issues were created?
  • Allowing only interns to have an academic
    institution.
  • Requiring all employees to have a benefit plan
    code (and not allowing contractors to have one.)

26
Resolving Supertype Issues
  • Check Constraints
  • CHECK (ACADEMIC_INSTITUTION is NULL or
    STF_TYPE'INT')
  • CHECK (
  • (BENEFIT_PLAN_CODE is NULL
  • AND STF_TYPE in
  • ('CON','INT'))
  • OR
  • (BENEFIT_PLAN_CODE is not NULL
  • AND STF_TYPE in
  • ('EMP','MGR'))
  • )

27
Non-Modeled Rules
  • Some rules are discussed during the modeling
    process but don't show up on the diagram.
  • Perhaps a timesheet entry must have comments if
    it is for more than 40 hours.
  • CHECK (HOURS lt 40 OR COMMENTS is NOT NULL)

28
One Name at a Time!
  • Name History
  • A person can have multiple names over time but
    should only have one name at any given time.
  • The implementation at right doesn't enforce this
    rule. A person could have 2 names active at the
    same time.

29
Name History Options
  • Ways to enforce "1 name at a time" rule.
  • Trigger on EMS_PERSON_NAMES.
  • Logic could be somewhat complicated.
  • In Oracle this can lead to the "mutating table"
    problem.
  • CURRENT_NAME column on EMS_STAFF, history in
    EMS_PERSON_NAMES.
  • Easier to code enforcement logic.
  • Similar data in two places.
  • Requires some additional considerations.

30
One Possible Name History Solution
  • Store current name on EMS_STAFF.
  • Store history on EMS_PERSON_NAMES.
  • Do not allow date gaps.
  • When changing CURRENT_NAME on EMS_STAFF, insert
    prior name into EMS_PERSON_NAMES.
  • Set END_DATE on EMS_PERSON_NAMES to current
    system date.
  • Set START_DATE on EMS_PERSON_NAMES to
  • END_DATE of previous name OR
  • START_DATE of the person if they have no previous
    name.

31
Design Changes for Name History
  • Leads to two physical design changes
  • CURRENT_NAME added to EMS_STAFF.
  • END_DATE on EMS_PERSON_NAMES is now mandatory.

32
Name History Trigger
  • create or replace trigger NAME_UPDATE_TRIG
  • after UPDATE of CURRENT_NAME on EMS_STAFF
  • for each row
  • DECLARE
  • CURSOR check_old_names (iPerson number) IS
  • select max(end_date) from ems_person_names
  • where stf_id iPerson
  • max_date date
  • BEGIN
  • OPEN check_old_names(new.id)
  • FETCH check_old_names into max_date
  • IF max_date is not NULL THEN
  • insert into ems_person_names values
    (pnm_seq.nextval,
  • old.current_name, new.id, max_date,
    SYSDATE)
  • ELSE
  • insert into ems_person_names values
    (pnm_seq.nextval,
  • old.current_name, new.id,
    new.start_date, SYSDATE)
  • END IF
  • END

33
Non-Modeled Rules Overview
  • These non-modeled rules are important to document
    in some manner.
  • If different people are doing the logical model
    and physical database design, it's important to
    make sure there is a common information
    repository.

34
Controlled Redundancy
  • Sometimes the database design that is developed
    introduces some form of data redundancy.
  • Redundancy, if not controlled, can lead to data
    conflicts.
  • If you're going to introduce redundancy, you need
    to make sure it's controlled!

35
A Simple Redundancy Example
  • PARTS table's primary key is comprised of the
    three foreign key columns.
  • Why do this?
  • Foreign key columns on child tables of IMS_PARTS.
  • How to handle?
  • CHECK
  • (CODE IPS_IDIPT_IDIV_ID)

36
Oracle Mutating Tables
  • ORA-04091 "table EMS_TIMESHEETS is mutating,
    trigger/function may not see it"
  • What does this mean?
  • (From Oracle's Metalink site) "A mutating table
    is a table that is currently being modified by an
    update, delete, or insert statement. If
    your trigger contains a select statement or an
    update statement referencing the table it is
    triggering off of you will receive the error."
  • When might you see this?
  • Attempts to validate data across multiple rows in
    a single table.
  • This error does not occur in DB2 or SQL Server.
    DB2 uses what it calls TRANSITION tables to
    handle referencing data from the triggering
    table.

37
Causing the Mutating Table Error
  • Suppose you want to enforce that no person
    record more than 168 hours of timesheets for a
    single week?
  • create or replace trigger hours_in_week_trig
  • before insert or update on ems_timesheets
  • for each row
  • DECLARE
  • CURSOR check_hours(iPerson NUMBER, iWk DATE) IS
  • select sum(hours) from ems_timesheets
  • where stf_idiPerson and week_start_dateiWk
  • hours number
  • BEGIN
  • OPEN check_hours(new.stf_id,new.week_start_dat
    e)
  • FETCH check_hours INTO hours
  • IF (hours new.hours) gt 168 THEN
  • raise_application_error
  • (-20630,'Hours for employee 'new.stf_id
  • ' in week 'new.week_start_date
  • ' exceeds 168.')
  • END IF
  • END

38
Oh, no! Your table is mutating!
  • Your new trigger works fine on INSERTs.
  • But the first time someone tries to do an UPDATE

39
Mutating Table Resolution Options
  • So how do you prevent these mutants from crashing
    your application?
  • One option is to put this sort of code in your
    application code, not in triggers.
  • This loses the benefits derived from storing
    business rule validation code in the database.
  • Use workarounds within the database.
  • Oracle suggests a package with temporary data
    storage and AFTER triggers.
  • We have used views and INSTEAD OF triggers to
    handle the processing.

40
Views and INSTEAD OF Triggers
  • Create a view of the table to which you want to
    apply data integrity rules .
  • Create an INSTEAD OF trigger on the view. It
    will perform validation checks and if the data is
    valid it will perform the operations originally
    requested.
  • INSTEAD OF triggers are very powerful but can be
    complex to code.

41
INSTEAD OF Trigger Example
  • create or replace trigger
  • hours_in_week_trig
  • instead of insert or update
  • on v_ems_timesheets
  • for each row
  • DECLARE
  • CURSOR check_hours
  • (iPerson NUMBER,
  • iWeek DATE,
  • iKey NUMBER) IS
  • select sum(hours)
  • from v_ems_timesheets
  • where stf_id iPerson
  • and week_start_date iWeek
  • and id ! iKey
  • hours number
  • BEGIN
  • OPEN check_hours
  • (new.stf_id,
  • IF (hours new.hours) gt 168 THEN
  • raise_application_error(-20630,
  • 'Hours for employee 'new.stf_id
  • ' in week 'new.week_start_date
  • ' exceeds 168.')
  • ELSE
  • IF INSERTING THEN
  • insert into ems_timesheets values
  • (new.id, new.week_start_date,
  • new.hours, new.stf_id,
  • new.prj_id, new.comments)
  • ELSE
  • update ems_timesheets set
  • week_start_date
  • new.week_start_date,
  • hours new.hours,
  • stf_id new.stf_id,
  • prj_id new.prj_id,
  • comments new.comments

42
NO MORE MUTANTS!
43
INSTEAD OF Trigger Warning
  • IMPORTANT
  • If you set up an INSTEAD OF trigger on a view to
    enforce data integrity you must be certain to
    ONLY grant privileges on the view, not on the
    underlying table.
  • INSERTs, UPDATEs, or DELETEs against the table
    EMS_TIMESHEETS will NOT be validated by this
    trigger.

44
Be Kind to Your Users
  • Don't forget that you're writing application code
    with which others will be interacting.
  • Write clear error messages.
  • "Arc violation!" is not a good error message.
  • "Arc rules not followed for columns ABC, DEF on
    YOUR_TABLE." is better.
  • "One and only one of columns ABC, DEF may be
    populated for any row in YOUR_TABLE." is even
    more clear.
  • Remember to comment your code for future
    maintainers.
  • And, of course, test your design THOROUGHLY
    before rolling it out.

45
Pros and Cons, Summary, and QA
46
What's the Bad News?
  • There's bad news?
  • Up-Front Expenditures
  • Increases database setup time.
  • Need for new skill sets
  • Has benefits, but is part of the up-front cost.
  • Integrity check execution does take some time.
  • But they have to be done somewhere or you take
    the risk of invalid data existing in your
    database.
  • Different methods of data integrity validation
    can have significantly different performance.

47
What's the Good News? (1)
  • You can remove part of the risk of bad data.
  • Can't remove the risk of incorrect data.
    (Misspelled names, etc.)
  • CAN remove or substantially mitigate the risk of
    data which is invalid per your business rules.
  • The upfront costs may be recouped, at least in
    part, during your development process.
  • Code can be reused.
  • Testing time may be decreased.

48
More Good News!
  • Long-Term Benefits
  • Better data quality.
  • Tangible Benefits (decreased costs for postage,
    etc.)
  • Intangible Benefits (customer confidence, etc.)
  • Easier maintenance. (If a business rule changes,
    only one piece of code needs to be modified.)

49
Summary (1)
  • Data integrity is important!
  • One study estimates costs to US business of 611
    billion per year just from the impact of poor
    quality customer data.
  • "Data Quality and the Bottom Line" (Eckerson)
  • Storing integrity checking code in the database
  • Minimizes the amount of code to be written
  • Ensures that integrity checks are consistently
    applied.
  • Simplifies future maintenance of the integrity
    checks.

50
Summary (2)
  • In even a one-to-one logical to physical
    conversion there may be business rules which are
    not enforced by the database's core features.
  • Certain design decisions can bring about the need
    for additional rule enforcement.
  • Database features such as triggers, functions,
    and check constraints can help set up needed rule
    enforcement.

51
Summary (3)
  • There are upfront costs but they are typically
    worth it.
  • Make sure to write clear error messages.

52
QA
  • Time permitting, I'll take questions!

53
References
  • DB2 Technical Manuals by IBM
  • Oracle Technical Manuals by Oracle
  • MSDN Library by Microsoft
  • "Data Quality and the Bottom Line" by Wayne W.
    Eckerson
  • Oracle Metalink
  • http//metalink.oracle.com/
  • "Triggers and DB2 Version 6" by Craig S. Mullins
  • http//www.craigsmullins.com/db2_triggers.htm
  • "Using Check Constraints to Simulate Domains" by
    Craig S. Mullins
  • http//www.craigsmullins.com/dbu_1201.htm
  • "DB2 User-Defined Functions" by Andrew Pletch
  • http//www.mcs.newpaltz.edu/pletch/db2_UserDefine
    dFunctions.html
  • Multiple "How do I?" features
  • http//www.easy-sql-server.com/

54
THANK YOU!
  • THANK YOU!
  • More Questions/Comments?
  • My E-mail
  • haynesm_at_nationwide.com
  • OR
  • michael_haynes_at_yahoo.com
Write a Comment
User Comments (0)
About PowerShow.com