Join, Merge or Lookup: Expanding your toolkit - PowerPoint PPT Presentation

1 / 35
About This Presentation
Title:

Join, Merge or Lookup: Expanding your toolkit

Description:

Pros and Cons of each technique. Performance Testing Approach. Results. User Guidelines ... Can take advantage of existing sortation. Disadvantages: ... – PowerPoint PPT presentation

Number of Views:56
Avg rating:3.0/5.0
Slides: 36
Provided by: sg174
Category:

less

Transcript and Presenter's Notes

Title: Join, Merge or Lookup: Expanding your toolkit


1
Join, Merge or Lookup Expanding your toolkit
Sandeep Gaudana
((Thanks to Judy Loren!!)
2
Sample Case
Join, Merge or Lookup Expanding Your Toolkit
3
Join, Merge or Lookup Expanding Your Toolkit
4
Topics Covered
  • Techniques of joining
  • Pros and Cons of each technique
  • Performance Testing Approach
  • Results
  • User Guidelines
  • Performance Macro

Join, Merge or Lookup Expanding Your Toolkit
5
Merge Technique
Join, Merge or Lookup Expanding Your Toolkit
6
Technique 1 Merge
Join, Merge or Lookup Expanding Your Toolkit
7
Technique 1 Merge
Join, Merge or Lookup Expanding Your Toolkit
8
SQL Technique
Join, Merge or Lookup Expanding Your Toolkit
9
Technique 2 Proc SQL
_method
Join, Merge or Lookup Expanding Your Toolkit
10
Technique 2 Proc SQL
Join, Merge or Lookup Expanding Your Toolkit
11
Format Technique
Join, Merge or Lookup Expanding Your Toolkit
12
Technique 3 Proc Format
data key set libc.admissions
(rename(member_id start)) retain fmtname
'key' label put(admit_dt,yymmdd10.) hlo
' ' output if _n_ 1 then do start
'other' hlo 'o' label 'NO'
output end drop admit_dt run
proc format cntlinkey run
Join, Merge or Lookup Expanding Your Toolkit
13
Technique 3 Proc Format
data result set liba.members where
put(member_id,key.) ne 'NO' admit_dt
input(put(member_id,key.),yymmdd10.) run
Join, Merge or Lookup Expanding Your Toolkit
14
Technique 3 Proc Format
Join, Merge or Lookup Expanding Your Toolkit
15
Indexing Technique
Join, Merge or Lookup Expanding Your Toolkit
16
Technique 4 Index Lookup
proc datasets libliba modify members index
create member_id quit
Join, Merge or Lookup Expanding Your Toolkit
17
Technique 4 Index Lookup
Join, Merge or Lookup Expanding Your Toolkit
18
Hash Technique
Join, Merge or Lookup Expanding Your Toolkit
19
Technique 5 Hash Objects
Join, Merge or Lookup Expanding Your Toolkit
20
Technique 5 Hash Objects
Join, Merge or Lookup Expanding Your Toolkit
21
Better Technique ?
Join, Merge or Lookup Expanding Your Toolkit
22
Performance Testing Approach
  • Most concerned about large file
  • Compare different sizes of second file
  • Test under different conditions
  • Large File already sorted
  • Large File already indexed
  • 1 variable key
  • Multi variable key
  • Automate Measurement

Join, Merge or Lookup Expanding Your Toolkit
23
System Used for tests
  • Microsoft Windows 2000 5.00.2195 Service Pack
    4
  • 2.40 TByte attached disk for permanent storage
  • separate 273 GByte capacity attached disk for
    work space
  • 2 Intel Xeon MP CPUs (3.00 GHz)
  • 3.8 GB RAM
  • Large dataset 5,010,996 obs and 86 variables

Join, Merge or Lookup Expanding Your Toolkit
24
Performance Testing Table 1
Large Dataset ( 5 mn records) Not Sorted
Indexed for Key Indexing Only
Performance Testing Stopped at 50
Join, Merge or Lookup Expanding Your Toolkit
25
Performance Testing Graph 1
Join, Merge or Lookup Expanding Your Toolkit
26
Performance Testing Table 2
Large Dataset ( 5 mn records) Indexed
Join, Merge or Lookup Expanding Your Toolkit
27
Performance Testing Graph 2
Join, Merge or Lookup Expanding Your Toolkit
28
Guidelines
  • What kind of join do you need?
  • Existing sorts and indexes
  • Frequency of access
  • Number of variables in lookup
  • Number of variables in match key
  • Size of datasets (absolute and relative)
  • System constraints
  • Code support/maintenance

Join, Merge or Lookup Expanding Your Toolkit
29
Performance Macro
performance_macro(detail,
dataset_to_sample,
program_path,
macro_path,

output_path,
log_path,
low_limit0,
high_limit95,
range5,
plot_graph 0,
replaceY
)
Join, Merge or Lookup Expanding Your Toolkit
30
Performance Macro Sample Usage
/Program \\srv6\sug08\programs\SQL_Match_1_Var.
sas//Proc SQL technique/proc sql create
table sql_out as select a. from wh.members
a join
small_dataset b on a.contno b.contno quit
performance_macro(detail SQL_Large_sorted_Match
_One_Var,
dataset_to_sample wh.members_all,
program_path
\\srv6\sug08\programs\SQL_Match_1_Var.sas,
macro_path
\\srv6\sug08\macros,
output_path \\srv6\sug08\outputs,
log_path
\\srv6\a\sug08\logs,
low_limit 0,
high_limit 95,
range 5,
plot_graph 0,
replace Y)
Join, Merge or Lookup Expanding Your Toolkit
31
Performance Macro Output
Join, Merge or Lookup Expanding Your Toolkit
32
Performance Measurement LOGPARSE
Mike Raithel Programmatically Measure SAS
Application Performance on Any Computer
Platform with the New LOGPARSE SAS Macro.
SUGI 30 www2.sas.com/proceedings/sugi30/219-30.p
df
Join, Merge or Lookup Expanding Your Toolkit
33
Conclusions
  • Many techniques for joining exist
  • Performance can vary widely
  • Choice depends on many factors
  • To be effective, have many tools
  • More tools ? more fun

Join, Merge or Lookup Expanding Your Toolkit
34
Questions ?
Join, Merge or Lookup Expanding Your Toolkit
35
Contact InformationSandeep GaudanaIngenix 14
Central Park Dr Hooksett, NH 03106sandeep.gauda
na_at_ingenix.com
Join, Merge or Lookup Expanding Your Toolkit
Write a Comment
User Comments (0)
About PowerShow.com