Title: Member Matching Algorithm
1Member Matching Algorithm
BASUG Meeting
Sandeep Gaudana
2Case Problem
BASUG Meeting
Member Matching Algorithm
3Things to consider
BASUG Meeting
- What is the end goal ?
- What level of Accuracy is needed?
- What is the impact of incorrect member matching?
- What level of reporting is required?
- What size of data we are dealing with?
- How much does performance matter?
Member Matching Algorithm
4THE Most Important Question !!!
Level of Accuracy at RISK ??
Member Matching Algorithm
5Approach
BASUG Meeting
Member Matching Algorithm
6Approach
BASUG Meeting
Member Matching Algorithm
7Approach (Contd..)
BASUG Meeting
Match Key 2 Email, Fname, Lname, Phone
Member Matching Algorithm
8Approach (Contd..)
BASUG Meeting
Match Key 3 Fname, Lname, Phone
Member Matching Algorithm
9Program Overview
BASUG Meeting
Step 1 Import Files
/Import Raw Membership File/ proc import out
members datafile
e\sgaudana\temp\basug\2009Q1\Members.txt"
dbmsdlm replace
delimiter'7C'x getnamesYES
datarow2 guessingrows500 run /Impor
t Raw Registration File/ proc import out
registrations datafile
"E\sgaudana\temp\basug\2009Q1\Registrations.txt"
dbmsdlm replace
delimiter'7C'x getnamesYES
datarow2 guessingrows500 run
Import Members Registrants files
Member Matching Algorithm
10Program Overview (Contd..)
BASUG Meeting
Step 2 Cleanup
data clean_registrations set registrations
row _n_ /Set the row number/
organization upcase(organization) email
upcase(email) fname upcase(fname) lname
upcase(lname) phone compress(phone,'-)(.
') run data clean_members set members
organization upcase(organization) email
upcase(email) fname upcase(fname) lname
upcase(lname) phone compress(phone,'-)(.
') run
Cleanup Members Registrants dataset
Member Matching Algorithm
11Program Overview (Contd..)
BASUG Meeting
Step 3 Create a Macro
global lookup_order 0 macro
member_lookup(match_key) let lookup_order
eval(lookup_order 1) /Count number of
records for a particular combination of
match_key / let group_keysysfunc(compress(
match_key, )) proc sql create
table sel_clean_members as select
group_key,count() as count from
clean_members group by group_key
quit
Group Members table by Match_key and Count
records
Member Matching Algorithm
12Program Overview (Contd..)
BASUG Meeting
Step 3 (contd..)
data match_member(keep match_key row order
match_ind count) length match_key 200.
match_ind 1. if 0 then set
sel_clean_members /Create Hash table from
Members table/ dcl hash members (dataset
'work.sel_clean_members')
members.definekey(match_key)
members.definedata('count')
members.definedone() /If Records from
Registrants table has already been found in
Members table then add them to a separate hash
table/ if sysfunc(exist(work.append_list))
then do dcl hash match (dataset
'work.append_list') match.definekey('row
') match.definedone() end
Hash Table from Members
Hash Table from Registrants that already Matched
with Members
Member Matching Algorithm
13Program Overview (Contd..)
BASUG Meeting
/Set Registration dataset/ do until (eof)
set clean_registrations end eof
call missing(order, match_ind, match_key)
if members.find() 0 if sysfunc(exist(work.app
end_list)) then do
and match.find() ne 0
end
then do match_key
"match_key" order
lookup_order 1 if count 1
then match_ind 'L' else
match_ind 'M' output
match_member end end
stop run proc append base
append_list data match_member run mend
member_lookup
Lookup from Members table check if record has
already been matched
Member Matching Algorithm
14Program Overview (Contd..)
BASUG Meeting
Step 4 Create Output Dataset
proc sort data append_list by
row run data out.SASChallenge_SG(keep fname
lname organization phone email member
match_ind match_key order) length fname 15
lname 15 organization 30 phone 10 email 50
member 1 match_ind 1 match_key 200
order 3 merge out_registrant append_list by
row match_key compress(upcase(match_key),"'
") if match_key '' then match_key 'NO
MATCH' if match_ind '' then match_ind
'N' if match_ind 'N' then Member 'N'
else Member 'Y' run
Member Matching Algorithm
15Match Report
BASUG Meeting
Match_Ind L Linked to only one record in
Member M Linked to
Multiple Records in Member
N No Match
Member Matching Algorithm
16BASUG Meeting
Questions ?
Member Matching Algorithm
17BASUG Meeting
Contact InformationSandeep GaudanaIngenix 14
Central Park Dr Hooksett, NH 03106sandeep.gauda
na_at_ingenix.comPhone 207.409.7219