Selecting Records, Maintaining Uniqueness, and Minimizing Duplication in an Immunization Registry

1
Selecting Records, Maintaining Uniqueness, and
Minimizing Duplication in an Immunization Registry
  • Robert Rosofsky and Jonathan Mosley
  • Massachusetts Immunization Information System
  • Massachusetts Department of Public Health
  • 305 South Street, 5th Floor
  • Jamaica Plain, MA 02130
  • 617-983-6836, Fax 617-983-6926
  • Robert.Rosofsky_at_state.ma.us
  • April, 1999

2
MIIS Data Sources
  • Birth records, providers, local IISs, health
    networks
  • Quality and completeness of data from each source
    varies greatly

3
MIIS Identifier Fields
  • Mandatory Fields
      • First name
      • Last name
      • Date of birth
      • Sex
  • Preferred Fields
      • Middle name
      • Mother's maiden name
      • Mother's date of birth
      • Birth order
      • Birth facility

4
MIIS Data Processing Challenges
  • Aggregate data from multiple sources into a
    single history
  • Prevent duplicate records in the central database
  • Process large volumes of data on a daily basis
    with minimal manual intervention

5
Matching Example 1
                   Record 1    Record 2
First Name         Jasmine     Jasmine
Middle Name        D           Danyelle
Last Name          Hendrix     Hendrix
Date of Birth      01/01/95    01/01/95
Sex                F           F
Birth Order        0 (single value listed)
Mother's Maiden    Anderson (single value listed)
Mother's DOB       10/10/50    10/10/50
Birthplace Code    1234 (single value listed)

6
Matching Example 2
                   Record 1    Record 2
First Name         Jasmine     Jazmynn
Middle Name        Danyelle    D
Last Name          Hendrix     Hendricks
Date of Birth      01/01/95    01/01/95
Sex                F           F
Birth Order        0 (single value listed)
Mother's Maiden    Anderson    Abdnerson
Mother's DOB       10/10/50    10/10/50
Birthplace Code    1234 (single value listed)

7
Basic Approach
  • Prevention! Avoid duplicates in database
  • Database is searched prior to inserting new
    records
  • Compare received records to records already in
    the database
  • Matches? Merge the records
  • Doesn't match? Insert new record into database
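The search-before-insert flow above can be sketched as follows. This is a minimal illustration, not the MIIS implementation: the function names, the field names, the exact-match rule on the four mandatory fields, and the "fill blanks" merge policy are all assumptions made for the example.

```python
from typing import Callable, Optional


def upsert(database: list, incoming: dict,
           find_match: Callable[[list, dict], Optional[dict]]) -> str:
    """Search the database first; merge into a match, otherwise insert."""
    match = find_match(database, incoming)
    if match is not None:
        # Assumed merge policy: fill blanks in the stored record only.
        for field, value in incoming.items():
            if value and not match.get(field):
                match[field] = value
        return "merged"
    database.append(dict(incoming))
    return "inserted"


def exact_key_match(database: list, incoming: dict) -> Optional[dict]:
    """Hypothetical matcher: exact agreement on the mandatory fields."""
    keys = ("first", "last", "dob", "sex")
    for rec in database:
        if all(rec.get(k) == incoming.get(k) for k in keys):
            return rec
    return None


db = [{"first": "Jasmine", "last": "Hendrix", "dob": "19950101", "sex": "F"}]

r1 = upsert(db, {"first": "Jasmine", "middle": "Danyelle", "last": "Hendrix",
                 "dob": "19950101", "sex": "F"}, exact_key_match)
r2 = upsert(db, {"first": "Cindy", "last": "Castaneda",
                 "dob": "19970205", "sex": "F"}, exact_key_match)
print(r1, r2, len(db))
```

Passing the matcher in as a parameter keeps the flow separate from the matching rule, so a score-based matcher can replace the exact one without changing `upsert`.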

8
Assumptions & Concepts
  • Records can be electronically linked despite
    differences between them
  • The more information that two records have in
    common, the greater the likelihood the two
    records match
  • Degree of similarity between records can be
    expressed numerically

9
Using a Matching Score: Thresholds
  • Score runs from 0.0 to 1.0
  • Above the Duplicate Threshold: records are the same
  • Between the two thresholds: require manual resolution
  • Below the Unique Threshold: records are unique

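A minimal sketch of the three-way decision the two thresholds imply. The threshold values and the handling of scores that fall exactly on a threshold are assumptions; the slides do not state either.

```python
DUPLICATE_THRESHOLD = 0.90  # hypothetical value, not from the slides
UNIQUE_THRESHOLD = 0.60     # hypothetical value, not from the slides


def classify(score: float,
             duplicate_threshold: float = DUPLICATE_THRESHOLD,
             unique_threshold: float = UNIQUE_THRESHOLD) -> str:
    """Map a matching score in [0, 1] to one of three outcomes."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must lie between 0.0 and 1.0")
    if score >= duplicate_threshold:
        return "same"      # treat as the same person: merge
    if score <= unique_threshold:
        return "unique"    # treat as a new person: insert
    return "manual"        # in between: send to manual resolution


for s in (0.99, 0.80, 0.30):
    print(s, classify(s))
```

Widening the gap between the two thresholds sends more pairs to manual review and fewer to automatic decisions.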
10
Fundamental Principles
  • A person querying or supplying data to the
    database would provide all the identifying
    information that s/he knew about an individual
  • Computer makes a best guess as to which
    database record most closely matches the received
    record

11
Computing a Matching Score: Simple Proportion
                   Record 1        Record 2
First Name         Jasmine         Jazmynn
Middle Name        Danielle        D
Last Name          Hendrix         Hendricks
Date of Birth      01/01/95   =    01/01/95
Sex                F          =    F
Birth Order        0          =    0
Mother's Maiden    Anderson        Abdnerson
Mother's DOB       10/10/50   =    10/10/50
SCORE
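The score value for this slide is not preserved in the transcript. As an illustration only, the sketch below computes a simple proportion under the assumption that each field counts as a match only when the two values are exactly equal. The function and field names are hypothetical.

```python
def simple_proportion(rec1: dict, rec2: dict) -> float:
    """Fraction of shared fields whose values are exactly equal."""
    fields = [f for f in rec1 if f in rec2]
    if not fields:
        raise ValueError("no fields to compare")
    agree = sum(1 for f in fields if rec1[f] == rec2[f])
    return agree / len(fields)


record_1 = {"first": "Jasmine", "middle": "Danielle", "last": "Hendrix",
            "dob": "01/01/95", "sex": "F", "birth_order": "0",
            "maiden": "Anderson", "mother_dob": "10/10/50"}
record_2 = {"first": "Jazmynn", "middle": "D", "last": "Hendricks",
            "dob": "01/01/95", "sex": "F", "birth_order": "0",
            "maiden": "Abdnerson", "mother_dob": "10/10/50"}

score = simple_proportion(record_1, record_2)
print(score)  # 4 of the 8 fields agree exactly
```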

12
Computing a Matching Score: Weighted Proportions
  • Not all data elements are equally informative.
  • Fields with extensive variation are generally the
    most informative.
  • It is desirable to give these informative fields
    more weight when computing a score.

13
Matching Score: A Weighted Proportion
                   Record 1    Record 2     Weight   Match
First              Jasmine     Jazmynn      1.2
Middle             Danielle    D            0.6
Last               Hendrix     Hendricks    1.5
Birth Date         01/01/95    01/01/95     1.7      ✓
Sex                F           F            0.3      ✓
Birth Order        0           0            0.2      ✓
Mother's Maiden    Anderson    Abdnerson    1.0
Mother's DOB       10/10/50    10/10/50     1.5      ✓
WEIGHTED SCORE

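The weighted score value is likewise missing from the transcript. The sketch below uses the weights listed on the slide and assumes the score is the summed weight of exactly matching fields divided by the summed weight of all compared fields. That normalization and the all-or-nothing field comparison are assumptions.

```python
WEIGHTS = {"first": 1.2, "middle": 0.6, "last": 1.5, "dob": 1.7,
           "sex": 0.3, "birth_order": 0.2, "maiden": 1.0, "mother_dob": 1.5}


def weighted_proportion(rec1: dict, rec2: dict, weights: dict) -> float:
    """Weight of agreeing fields divided by weight of compared fields."""
    compared = [f for f in weights if f in rec1 and f in rec2]
    total = sum(weights[f] for f in compared)
    if total == 0:
        raise ValueError("no weighted fields to compare")
    agreed = sum(weights[f] for f in compared if rec1[f] == rec2[f])
    return agreed / total


record_1 = {"first": "Jasmine", "middle": "Danielle", "last": "Hendrix",
            "dob": "01/01/95", "sex": "F", "birth_order": "0",
            "maiden": "Anderson", "mother_dob": "10/10/50"}
record_2 = {"first": "Jazmynn", "middle": "D", "last": "Hendricks",
            "dob": "01/01/95", "sex": "F", "birth_order": "0",
            "maiden": "Abdnerson", "mother_dob": "10/10/50"}

weighted = weighted_proportion(record_1, record_2, WEIGHTS)
print(round(weighted, 4))  # (1.7 + 0.3 + 0.2 + 1.5) / 8.0
```

The listed weights sum to 8.0 over 8 fields, so the average weight is 1.0.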
14
Matching Score: Log Transformations
  • Only those fields present in both records are
    compared
  • Score is adjusted to reflect the number of fields
    used in its calculation
  • Adjustment is made with a log transformation,
    using the number of fields used in the comparison
    as the base

15
Log-Transformations
                      Base (n)
Score      3      4      5      6      7      8
0.50     0.37   0.50   0.57   0.61   0.64   0.67
0.60     0.54   0.63   0.68   0.71   0.74   0.75
0.70     0.68   0.74   0.78   0.80   0.82   0.83
0.80     0.80   0.84   0.86   0.88   0.89   0.89
0.90     0.90   0.92   0.93   0.94   0.95   0.95
1.00     1.00   1.00   1.00   1.00   1.00   1.00
Transformed score = 1 + log_n(score) = log_n(n × score)

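The tabulated values are consistent with a transformed score of 1 + log_n(score), where n is the number of fields compared. A short sketch that regenerates the table follows. The function name and the clamping of negative results to 0.0 are assumptions.

```python
import math


def log_transform(score: float, n: int) -> float:
    """Return 1 + log base n of score, clamped at 0.0 (assumed floor)."""
    if n < 2:
        raise ValueError("need at least two compared fields for a log base")
    if score <= 0.0:
        return 0.0
    return max(0.0, 1.0 + math.log(score, n))


bases = (3, 4, 5, 6, 7, 8)
print("Score  " + "  ".join(f"{n:>4d}" for n in bases))
for s in (0.5, 0.6, 0.7, 0.8, 0.9, 1.0):
    row = "  ".join(f"{log_transform(s, n):.2f}" for n in bases)
    print(f"{s:.2f}   {row}")
```

For a fixed raw score, a larger n yields a higher transformed score, so agreement measured over more fields is rewarded.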
16
Scoring Equations
  • Simple proportion score
  • Weighted proportion score
  • Log-transformed weighted proportion score,
    where n = number of nonblank comparisons
  • Ignores partial scoring of fields.
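The equations themselves did not survive in this transcript. One reconstruction consistent with the surrounding slides is given below. The symbols are assumptions introduced here: m_i is 1 when field i matches and 0 otherwise, w_i is the weight of field i, and n is the number of nonblank comparisons. Dividing the weighted score by the sum of the weights is also an assumption.

```latex
\begin{align*}
S_{\text{simple}}   &= \frac{1}{n} \sum_{i=1}^{n} m_i \\
S_{\text{weighted}} &= \frac{\sum_{i=1}^{n} w_i \, m_i}{\sum_{i=1}^{n} w_i} \\
S_{\text{log}}      &= 1 + \log_n S_{\text{weighted}}
                     = \log_n \left( n \, S_{\text{weighted}} \right)
\end{align*}
```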
17
Assigning a Field Score: Names
  • Due to the redundancy inherent in many names, it
    is undesirable that they be scored in an
    all-or-nothing manner.
  • Example: joxathan
  • The MIIS uses two methods to assign a partial
    score to name fields that are not identical:
      • Approximate Match Method
      • NYSIIS scoring

18
Name Strings: Approximate Match Method
  • Names differ because of random differences
    (e.g. typos, ignorance of true spelling)
  • Count the minimum number of character insertions,
    deletions and changes required to transform
    one name string into the other
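The minimum number of insertions, deletions and changes between two strings is the standard Levenshtein edit distance. A minimal dynamic-programming sketch follows. The function name is hypothetical, and lowercasing the names before comparison is an assumption.

```python
def edit_distance(a: str, b: str) -> int:
    """Minimum insertions, deletions and changes turning a into b."""
    prev = list(range(len(b) + 1))  # distances from "" to prefixes of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,                # delete ca
                curr[j - 1] + 1,            # insert cb
                prev[j - 1] + (ca != cb),   # change ca to cb, or keep
            ))
        prev = curr
    return prev[-1]


for n1, n2 in (("John", "Joan"), ("Hendrix", "Hendericks"),
               ("Smith", "Smith-Jones")):
    print(n1, n2, edit_distance(n1.lower(), n2.lower()))
```

Keeping only two rows of the table keeps memory proportional to the length of the second name.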

19
Assigning a Field Score: Names (Cont'd)
I - D - C = insertions - deletions - changes
Name 1        Name 2         I - D - C    Score
Jonathan      Jonathan       0 - 0 - 0    1.0
Hendrix       Hendericks     3 - 0 - 1    0.78
Rowsofskie    Rosofsky       0 - 2 - 1    0.85
Smith         Smith-Jones    6 - 0 - 0    0.67
McCarthur     Mac Arthur     1 - 1 - 0    0.90
John          Joan           0 - 0 - 1    0.79

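The slides do not state how the I - D - C counts become a score. One formula that reproduces all six listed scores takes n as the length of the longer name, p = 1 - (I + D + C) / n, and score = 1 + log_n(p). It is inferred from the table, not taken from the MIIS documentation, and the function name is hypothetical.

```python
import math


def approximate_match_score(name1: str, name2: str,
                            insertions: int, deletions: int,
                            changes: int) -> float:
    """Inferred score: 1 + log_n(1 - edits / n), n = longer name length."""
    n = max(len(name1), len(name2))
    edits = insertions + deletions + changes
    if edits == 0:
        return 1.0
    if edits >= n:
        return 0.0  # nothing in common; avoids log of zero
    return 1.0 + math.log(1.0 - edits / n, n)


rows = [
    ("Jonathan", "Jonathan", 0, 0, 0),
    ("Hendrix", "Hendericks", 3, 0, 1),
    ("Rowsofskie", "Rosofsky", 0, 2, 1),
    ("Smith", "Smith-Jones", 6, 0, 0),
    ("McCarthur", "Mac Arthur", 1, 1, 0),
    ("John", "Joan", 0, 0, 1),
]
for name1, name2, i, d, c in rows:
    print(name1, name2, round(approximate_match_score(name1, name2, i, d, c), 2))
```

The counts are passed in directly from the table, so the sketch checks only the scoring step and not how the edits were counted.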
20
Name Strings: NYSIIS Method
  • NYSIIS assumes that name fields differ because of
    common misspellings particular to the English
    language.
  • A code is generated for a name by assigning
    specified characters to each character or group
    of characters in a name.
  • These codes are then compared.

21
Example of NYSIIS Coding
Name          NYSIIS Code
Hendrix       handrac
Hinndricks    handrac
Henderix      handarac
Rosofsky      rasafsc
Rowsofskie    rasafsc
Knight        nat
Nite          nat
Uses a modified version of the original NYSIIS coding.

22
Names and NYSIIS (Cont'd)
  • A NYSIIS score is computed by multiplying an
    Approximate Match Score for the coded names by
    the ratio of the sum of the lengths of the NYSIIS
    codes to the sum of the lengths of the names.
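A small sketch of this product, using names and codes from the NYSIIS coding example. The function name is hypothetical. The Approximate Match Score of the two codes is passed in as a parameter, and it is 1.0 in these examples because the codes are identical.

```python
def nysiis_score(name1: str, name2: str, code1: str, code2: str,
                 approx_score_of_codes: float) -> float:
    """Approximate Match Score of the codes times the length ratio."""
    length_ratio = (len(code1) + len(code2)) / (len(name1) + len(name2))
    return approx_score_of_codes * length_ratio


pairs = [
    ("Knight", "Nite", "nat", "nat"),
    ("Hendrix", "Hinndricks", "handrac", "handrac"),
    ("Rosofsky", "Rowsofskie", "rasafsc", "rasafsc"),
]
for name1, name2, code1, code2 in pairs:
    # Identical codes, so the Approximate Match Score of the codes is 1.0.
    print(name1, name2, round(nysiis_score(name1, name2, code1, code2, 1.0), 3))
```

The length ratio lowers the score when the codes are much shorter than the names, so a phonetic match counts for less than an exact spelling match.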

23
Date Scoring
  • Dates are processed as strings of digits
    (YYYYMMDD)
  • A score is computed identically to the
    Approximate Matching Method used for name strings

24
MIIS Scoring Example 1
                   Record 1    Record 2
First Name         Jasmine     Jazmynn
Middle Name        Danielle    D
Last Name          Hendrix     Hendricks
Date of Birth      01/01/95    01/01/95
Sex                F           F
Birth Order        0           0
Mother's Maiden    Anderson    Abdnerson
Mother's DOB       10/10/50    10/10/50
MIIS Score         0.80

25
MIIS Scoring Example 2
                   Record 1      Record 2
First Name         Cindy         Cindy
Middle Name        Elizabeth     Elizabeth
Last Name          Castaneda     Castaneda
Date of Birth      2/5/97        2/5/97
Sex                F             F
Birth Order        0             0
Mother's Maiden    Dentremont    Cathy Dentrement
Mother's DOB       2/5/79        <blank>
MIIS Score         0.98

26
MIIS Scoring Example 3: Limited/Uninformative Data
                   Record 1      Record 2
First Name         Michaela      Michelle
Middle Name        Kay           Kelly
Last Name          Wronkowski    Wroblewski
Date of Birth      1/5/97        1/5/97
Sex                F             F
Birth Order        0             0
Mother's Maiden    Jones         Stockdale
Mother's DOB       5/8/63        11/27/68

27
MIIS Scoring Example 3 (Cont'd)
Fields Compared                                                Score
First, last, DOB                                               0.88
First, last, DOB, sex, birth order                             0.95
First, last, DOB, sex, birth order, Mom's DOB                  0.79
First, last, DOB, sex, birth order, MDOB, m. maiden            0.67
First, last, DOB, sex, birth ord., MDOB, m. maiden, middle     0.60
First, last, DOB, MDOB, mother's maiden                        0.55
First, last, DOB, MDOB, mother's maiden, middle                0.47

28
MIIS Example 4: Twins!!!
                   Record 1       Record 2
First Name         Bladamir       Gladamir
Middle Name
Last Name          Von Nostrum    Von Nostrum
Date of Birth      5/4/88         5/4/88
Sex                M              M
Birth Order        1              2
Mother's Maiden    Thomas         Thomas
Mother's DOB       9/17/60        9/17/60
MIIS Score         0.99

29
Database Candidate Records
  • A set of candidate records must be selected for
    comparison that
      • Is likely to contain the record being compared
      • Contains as few additional records as possible
  • All records are examined that have
      • The same date of birth
      • The same first character of the NYSIIS code for
        the last name
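A minimal sketch of this candidate selection, keyed on date of birth plus the first character of the last-name NYSIIS code. The field names are hypothetical, and each record is assumed to carry a precomputed code in `last_nysiis`. The codes shown come from the NYSIIS coding example, and the dates are illustrative.

```python
from collections import defaultdict


def block_key(record: dict) -> tuple:
    """Date of birth plus first character of the last-name NYSIIS code."""
    return (record["dob"], record["last_nysiis"][0])


def build_index(records: list) -> dict:
    """Group stored records by block key for fast candidate lookup."""
    index = defaultdict(list)
    for rec in records:
        index[block_key(rec)].append(rec)
    return index


def candidates(index: dict, incoming: dict) -> list:
    """Return only the stored records sharing the incoming block key."""
    return index.get(block_key(incoming), [])


stored = [
    {"last": "Hendrix", "last_nysiis": "handrac", "dob": "19950101"},
    {"last": "Henderix", "last_nysiis": "handarac", "dob": "19970205"},
    {"last": "Rosofsky", "last_nysiis": "rasafsc", "dob": "19950101"},
]
index = build_index(stored)
incoming = {"last": "Hinndricks", "last_nysiis": "handrac", "dob": "19950101"}
found = candidates(index, incoming)
print([rec["last"] for rec in found])
```

Only the candidates returned here would then be scored against the incoming record, which is where the performance gain comes from.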

30
Advantages of Selecting Candidate Records
  • Search strategy is apt to find a matching record
  • Uses data that is contained in each record in the
    database
  • Enhances performance as not all records are
    examined

31
Disadvantages of Selecting Candidate Records
  • Can miss a true match if either
      • Date of birth is incorrect, or
      • First letter(s) of the last name is not a
        phonetic variation of the first letter(s) of
        the true last name.
  • Apt to return a large set of candidate records.
  • Does not use alternate search strategies
    employing other database fields.

32
Advantages of MIIS Matching Procedures
  • Allows automation of record linking and
    deduplication.
  • Candidate records can be prioritized according to
    the likelihood that they match a given set of
    data elements.
  • Parameter driven and can be modified to
    accommodate the idiosyncrasies of each data
    source.

33
Disadvantages of MIIS Matching Procedures
  • Can make false matches when data is limited
  • Requires extensive computer processing resources

34
Lessons Learned
  • Procedures and decisions must be data-driven.
  • Deduplication will always involve manual
    resolution.
  • Procedures will need continual evaluation and
    monitoring.
  • Always err on the conservative side.