Title: Join, Merge or Lookup: Expanding your toolkit
1Join, Merge or Lookup Expanding your toolkit
Sandeep Gaudana
((Thanks to Judy Loren!!)
2Sample Case
Join, Merge or Lookup Expanding Your Toolkit
3Join, Merge or Lookup Expanding Your Toolkit
4Topics Covered
- Pros and Cons of each technique
- Performance Testing Approach
Join, Merge or Lookup Expanding Your Toolkit
5Merge Technique
Join, Merge or Lookup Expanding Your Toolkit
6Technique 1 Merge
Join, Merge or Lookup Expanding Your Toolkit
7Technique 1 Merge
Join, Merge or Lookup Expanding Your Toolkit
8SQL Technique
Join, Merge or Lookup Expanding Your Toolkit
9Technique 2 Proc SQL
_method
Join, Merge or Lookup Expanding Your Toolkit
10Technique 2 Proc SQL
Join, Merge or Lookup Expanding Your Toolkit
11Format Technique
Join, Merge or Lookup Expanding Your Toolkit
12Technique 3 Proc Format
data key set libc.admissions
(rename(member_id start)) retain fmtname
'key' label put(admit_dt,yymmdd10.) hlo
' ' output if _n_ 1 then do start
'other' hlo 'o' label 'NO'
output end drop admit_dt run
proc format cntlinkey run
Join, Merge or Lookup Expanding Your Toolkit
13Technique 3 Proc Format
data result set liba.members where
put(member_id,key.) ne 'NO' admit_dt
input(put(member_id,key.),yymmdd10.) run
Join, Merge or Lookup Expanding Your Toolkit
14Technique 3 Proc Format
Join, Merge or Lookup Expanding Your Toolkit
15Indexing Technique
Join, Merge or Lookup Expanding Your Toolkit
16Technique 4 Index Lookup
proc datasets libliba modify members index
create member_id quit
Join, Merge or Lookup Expanding Your Toolkit
17Technique 4 Index Lookup
Join, Merge or Lookup Expanding Your Toolkit
18Hash Technique
Join, Merge or Lookup Expanding Your Toolkit
19Technique 5 Hash Objects
Join, Merge or Lookup Expanding Your Toolkit
20Technique 5 Hash Objects
Join, Merge or Lookup Expanding Your Toolkit
21Better Technique ?
Join, Merge or Lookup Expanding Your Toolkit
22Performance Testing Approach
- Most concerned about large file
- Compare different sizes of second file
- Test under different conditions
- Large File already sorted
- Large File already indexed
Join, Merge or Lookup Expanding Your Toolkit
23System Used for tests
- Microsoft Windows 2000 5.00.2195 Service Pack
4 - 2.40 TByte attached disk for permanent storage
- separate 273 GByte capacity attached disk for
work space - 2 Intel Xeon MP CPUs (3.00 GHz)
- 3.8 GB RAM
- Large dataset 5,010,996 obs and 86 variables
Join, Merge or Lookup Expanding Your Toolkit
24Performance Testing Table 1
Large Dataset ( 5 mn records) Not Sorted
Indexed for Key Indexing Only
Performance Testing Stopped at 50
Join, Merge or Lookup Expanding Your Toolkit
25Performance Testing Graph 1
Join, Merge or Lookup Expanding Your Toolkit
26Performance Testing Table 2
Large Dataset ( 5 mn records) Indexed
Join, Merge or Lookup Expanding Your Toolkit
27Performance Testing Graph 2
Join, Merge or Lookup Expanding Your Toolkit
28Guidelines
- What kind of join do you need?
- Existing sorts and indexes
- Frequency of access
- Number of variables in lookup
- Number of variables in match key
- Size of datasets (absolute and relative)
- System constraints
- Code support/maintenance
Join, Merge or Lookup Expanding Your Toolkit
29Performance Macro
performance_macro(detail,
dataset_to_sample,
program_path,
macro_path,
output_path,
log_path,
low_limit0,
high_limit95,
range5,
plot_graph 0,
replaceY
)
Join, Merge or Lookup Expanding Your Toolkit
30Performance Macro Sample Usage
/Program \\srv6\sug08\programs\SQL_Match_1_Var.
sas//Proc SQL technique/proc sql create
table sql_out as select a. from wh.members
a join
small_dataset b on a.contno b.contno quit
performance_macro(detail SQL_Large_sorted_Match
_One_Var,
dataset_to_sample wh.members_all,
program_path
\\srv6\sug08\programs\SQL_Match_1_Var.sas,
macro_path
\\srv6\sug08\macros,
output_path \\srv6\sug08\outputs,
log_path
\\srv6\a\sug08\logs,
low_limit 0,
high_limit 95,
range 5,
plot_graph 0,
replace Y)
Join, Merge or Lookup Expanding Your Toolkit
31Performance Macro Output
Join, Merge or Lookup Expanding Your Toolkit
32Performance Measurement LOGPARSE
Mike Raithel Programmatically Measure SAS
Application Performance on Any Computer
Platform with the New LOGPARSE SAS Macro.
SUGI 30 www2.sas.com/proceedings/sugi30/219-30.p
df
Join, Merge or Lookup Expanding Your Toolkit
33Conclusions
- Many techniques for joining exist
- Performance can vary widely
- Choice depends on many factors
- To be effective, have many tools
- More tools ? more fun
Join, Merge or Lookup Expanding Your Toolkit
34Questions ?
Join, Merge or Lookup Expanding Your Toolkit
35Contact InformationSandeep GaudanaIngenix 14
Central Park Dr Hooksett, NH 03106sandeep.gauda
na_at_ingenix.com
Join, Merge or Lookup Expanding Your Toolkit