Title: Tyson Rottweiler 15MAR2000. Do we get a warning? Join
1SAS PROC SQL or Vanilla Flavor
- Cecilia Mauldin
- January 31 2007
2What is Special in SAS PROC SQL?
- Things you already know from data step
programming - Some differences between data step and proc sql
syntax - Order of execution
- How good is my code?
3The Data
- DATA dogs
- LENGTH name breed 20
- dob 10
- INPUT name breed dob
- CARDS
- Nala Airdale 11jan1990
- Nina AMBR 22apr1997
- Jadzia Pointer 15feb1999
- Rocco MMBNR 08may1980
- Bruja Rottweiler 15aug2000
- Chacal Rottweiler 29jan1990
- Chata Boxer UNK
- Tyson Rottweiler 15mar2000
-
- RUN
- DATA parents
- LENGTH parent name 20
- INPUT parent name
- CARDS
- Ina Nala
- Ina Chata
- Ina Bruja
- Ina Fidonga
- Chavo Rocco
- Cecilia Nina
- Cecilia Jadzia
-
- RUN
4What is Special in SAS PROC SQL?
- Things you already know from data step
programming - Some differences between data step and Proc SQL
syntax - Order of execution
- How good is my code?
5Things You Already Know from Data Step Programming
- Our old friends where, keep, drop and rename
- In data step
- How do we use them?
- Why do we use them?
- In SAS PROC SQL
- How do we use them?
- Why do we use them?
6Things You Already Know from Data Step Programming
DATA new_data (KEEP RENAME( )
DROP WHERE( )) SET
old_data (KEEP RENAME()
DROP WHERE( )) SAS data
steps RUN
It is written from the PDV when specified on an
output data set.
Is applied when the dataset OLD_DATA is read into
the PDV.
7Data Step Why Do We Use Our Old Friends?
DATA new_data (KEEP RENAME ( )
DROP WHERE( )) SET old_data
(KEEP RENAME() DROP
WHERE( )) SAS data steps RUN
No need to keep observations and/or variables
that are not used in the output data set.
No need to carry observations or variables that
are not going to be used in the PDV.
8Proc SQL How Do We Use Them?
- PROC SQL SQL_OPTIONS
- CREATE TABLE example (KEEP .
-
RENAME( .) -
DROP -
WHERE( .)) - AS SELECT
- FROM data_set (KEEP var1 var2.
- RENAME(var1
newvar1 -
var2 newvar2 -
.) - DROP
- WHERE(.))
- QUIT
9Why Do We Use Them?
- Typing convenience
- Right, left and full joins
10Typing Convenience
- Your dataset has 50 variables, you need to use 45
- Do you want to type 45 variables?
- Do you carry 5 variables that you dont need?
11Typing Convenience contd
SELECT var1, var2, , var45 FROM data_set
SELECT FROM data_set (DROPvar46 var50)
12Typing Convenience contd
First name Breed Nala
Airdale Nina AMBR Jadzia
Pointer Rocco MMBNR Bruja
Rottweiler Chacal
Rottweiler Chata Boxer Tyson
Rottweiler
SELECT name AS firstname,
breed FROM dogs
- SELECT
- FROM dogs (DROPdob
- RENAME(namefirstname))
13Joins
- Avoid warning messages when creating a data set
because two variables in two different datasets
have the same name - Limit the records that come in a join
14Joins contd
- Avoid WARNING messages when 2 variables have the
same name
CREATE TABLE final AS SELECT a., b. FROM
data_set1 a, data_set2 b WHERE a.var1b.var1
AND a.var2b.var2 AND a.var3b.var3
15Joins contd
- Avoid WARNING messages when 2 variables have the
same name
CREATE TABLE family AS SELECT a., b. FROM
dogs a, parent b WHERE a.nameb.name
WARNING Variable Name already exists on file
WORK.FAMILY. NOTE Table WORK.FAMILY created,
with 6 rows and 4 columns.
16Joins contd
- Avoid WARNING messages when 2 variables have the
same name
CREATE TABLE final (DROPVAR1B VAR2B VAR3B)
AS SELECT a., b. FROM data_set1 A,
data_set2 (RENAME(var1var1b
var2var2b
var3var3b)) B WHERE a.var1b.var1b AND
a.var2b.var2b AND a.var3b.var3b
17Joins contd
CREATE TABLE family (DROPnickname) AS
SELECT a., b. FROM dogs a, parents
(RENAME(namenickname)) b WHERE
a.nameb.nickname
Name breed DOB Parent Nala Airdale 11JAN1990 Ina N
ina AMBR 22APR2001 Cecilia Jadzia Pointer 15FEB199
9 Cecilia Rocco MMBNR 08MAY2001 Chavo Bruja Rottwe
iler 15AUG2000 Ina Chata Boxer .
Ina
18Joins contd
SELECT a., b. FROM dogs a LEFT JOIN parents
b ON a.nameb.name AND a.breed'Rottweiler
'
Do we get a warning?
Name Breed DOB Parent name Bruja Rottweiler 15AU
G2000 Ina Bruja Chacal Rottweiler 29JAN199
0 Chata Boxer . Jadzia Pointer 15FEB19
99 Nala Airdale 11JAN1990 Nina AMBR 22APR2001 Rocc
o MMBNR 08MAY2001 Tyson Rottweiler 15MAR2000
19Joins contd
CREATE TABLE final AS SELECT a., b. FROM
data_set1 (WHERE(var1keep)) a LEFT JOIN
data_set2 b ON a.var1B.var1b
AND a.var2B.var2b AND
a.var3B.var3b
20Joins contd
SELECT a., b. FROM dogs (WHERE(breed'Rottweile
r'))a LEFT JOIN parents b ON a.nameb.name
Name breed DOB Parent
name Bruja Rottweiler 15AUG2000 Ina
Bruja Chacal Rottweiler 29JAN1990 Tyson Rottweiler
15MAR2000
21What is special in SAS PROC SQL?
- Things you already know from data step programming
- Some differences between data step and SQL
syntax
- Order of execution
- How good is my code?
22Some Differences Between Data Step and SQL Syntax
- Reserved word
- Format modifiers
- Descending sorting
23Reserved Word
- CASE
- Why to use it
- Is there something similar in data step?
- How to use it
- What can be the problem
- How to avoid a problem
24CASE Why to Use It
- There is no IF or SELECT statements in SQL
- Is similar to the SELECT clause in data step
25CASE Why to Use It
DATA final SET parents SELECT (name)
WHEN (Nina') education Novice' WHEN
(Nala') education None' OTHERWISE
education There is hope END
26CASE How to Use It
- SELECT var1, var2,
- CASE vark
- WHEN (value1) THEN constant value
function - WHEN (value3) THEN other constant
function -
- ELSE last value function
- END AS newvar
- FROM data_set
-
27CASE How to Use It contd
Name Education Parent Nala None Ina
Chata There is hope Ina
Bruja There is hope Ina
Fidonga There is hope Ina
Rocco There is hope Chavo
Nina Novice Cecilia Jadzia There is
hope Cecilia
- SELECT
- name,
- CASE name
- WHEN ('Nala') THEN 'None'
- WHEN ('Nina') THEN 'Novice'
- ELSE 'There is
hope' - END AS education LENGTH 15 FORMAT 15.,
- parent
- FROM parent
-
28CASE How to Use It contd
SELECT name,breed, CASE WHEN INDEX
(breed,'MB') THEN PUT(breed,UKC.) ELSE
breed END AS lbreed label "Long Breed" FROM
dogs
Name Breed Long breed Bruja Rottweiler Rottwe
iler Chacal Rottweiler Rot
tweiler Chata Boxer Boxer
Jadzia Pointer Pointe
r Nala Airdale Airdale
Nina AMBR American
Mixed Breed Registry Rocco MMBNR Mexican
Mixed Breed Non-Registered Tyson Rottweiler Rottwe
iler
29CASE What Can Be the Problem
- DATA new
- LENGTH case 20
- SET parents
- IF parent 'Ina' THEN
case'None' - ELSE IF parent 'Chavo' THEN case'Spoiled'
- ELSE
case'Work' - RUN
- PROC PRINT DATAnew
- RUN
Obs Case Parent Name 1 None Ina Nala
2 None Ina Chata 3 None Ina Bruja
4 None Ina Fidonga 5 Spoiled Chavo Rocco
6 Work Cecilia Nina 7 Work Cecilia Jadzia
30CASE What Can Be the Problem contd
SELECT FROM new WHERE parent'Ina'
Case Parent Name None Ina Nala
None Ina Chata None Ina Bruja
None Ina Fidonga
31CASE What Can Be the Problem contd
-
- SELECT FROM new WHERE case'None'
select from new WHERE case'None'
22
76 ERROR 22-322 Syntax
error, expecting one of the following
a name, a quoted string, a numeric constant,
a datetime constant, a missing
value, (, , -, BTRIM,
CALCULATED, CASE, EXISTS, INPUT, LOWER,
NOT, PUT, SELECT, SUBSTRING, TRANSLATE,
UPPER, USER, WHEN, , .
ERROR 76-322 Syntax error, statement will be
ignored.
32CASE How to Avoid a Problem
- SELECT
- FROM new (RENAME(caseeducation))
- WHERE education'None'
33Some Differences Between Data Step and SQL Syntax
- Reserved word
- Format modifiers
- Descending sorting
34Format Modifiers
- What are format modifiers?
- Format modifiers in data step
- Format modifiers in proc SQL
35Format Modifiers in Data Step
DATA ndogs SET dogs idateINPUT(dob,date9.)
IF _ERROR_1 THEN PUT _ALL_ RUN
NOTE Invalid argument to function INPUT at line
196 column 9. nameChata breedBoxer dobUNK
idate. _ERROR_1 _N_3 NOTE Mathematical
operations could not be performed at the
following places. The results of the operations
have been set to missing values. Each
place is given by (Number of times) at
(Line)(Column). 1 at 1969
36Format Modifiers in Data Step contd
DATA ndogs SET dogs idateINPUT(dob, ??
date9.) IF _ERROR_1 THEN PUT _ALL_ RUN
NOTE There were 8 observations read from the
data set WORK.DOGS. NOTE The data set
WORK.NDOGS has 8 observations and 4 variables.
37Format Modifiers in PROC SQL
CREATE TABLE ndogs AS SELECT name,
dob, INPUT(dob, date9.) AS idate FORMAT mmddyy8.
FROM dogs
ERROR Invalid day value NOTE Table WORK.NDOGS
created, with 8 rows and 3 columns.
38Format Modifiers in PROC SQL contd
CREATE TABLE ndogs AS SELECT name, dob,
INPUT(dob,? date9.) AS idate FORMAT mddyy8. FROM
dogs
NOTE Table WORK.NDOGS created, with 8 rows and 3
columns.
39Some Differences Between Data Step and SQL Syntax
- Reserved word
- Format modifiers
- Descending sorting
40Descending Sorting
PROC SORT DATA ndogs BY DESCENDING
idate RUN
SELECT FROM ndogs ORDER BY idate descending
41Descending Sorting contd
- A little lagniappe (pronounced nanap)
SELECT name, dob, INPUT( dob,? date9.) AS
idate FORMAT date9. FROM dogs ORDER BY 3
DESC
Name DOB Idate Bruja 15Aug2000 15AUG2000 Tyson 15
Mar2000 15MAR2000 Jadzia 15Feb1999 15FEB1999 Nina
22APR1997 22APR1997 Chacal 29Jan1990 29JAN1990 Nal
a 11jan1990 11JAN1990 Rocco 08may1980 08MAY1980 Ch
ata UNK .
42What is Special in SAS PROC SQL?
- The program data vector (PDV)
- Some differences between data step and SQL syntax
- Order of execution
- How good is my code?
43The Order of Execution
- SELECT VAR1, VAR2,
- FROM data_set
- FULL LEFT RIGHT INNER
JOIN TABLE2 - ON condition 1
- AND condition 2
- WHERE condition k
- AND condition l
- GROUP BY VAR1, VAR.
- HAVING condition p
- AND condition q
- ORDER BY var1, var2
44The Order of Execution contd
- PROC SQL NUMBER
- SELECT a.breed,
- INPUT(a.dob,? date9.) AS ndob format date9.,
- COALESE(a.name, b.name) AS name, b.parent
- FROM dogs a FULL JOIN parents b
- ON a.nameb.name
Breed NDOB Name Parent 1 Rottweiler 15AUG2000
Bruja Ina 2
Rottweiler 29JAN1990 Chacal
3 Boxer Chata Ina
4 .
Fidonga Ina 5
Pointer 15FEB1999 Jadzia Cecilia 6
Airdale 11JAN1990 Nala Ina 7
AMBR 22APR1997 Nina Cecilia 8
MMBNR 08MAY1980 Rocco Chavo
45The Order of Execution contd
- SELECT a.breed,
- INPUT( a.dob, ? date9.) AS ndob FORMAT
date9., - COALESCE (a.name, b.name) AS name,
- b.parent
- FROM dogs a FULL JOIN parents b
- ON a.nameb.name
- AND breed'Rottweiler'
Breed NDOB Name Parent 1 Rottweiler 15AUG2000
Bruja Ina 2
Rottweiler 29JAN1990 Chacal
3 Boxer Chata Ina
4 .
Fidonga Ina 5
Pointer 15FEB1999 Jadzia Cecilia 6
Airdale 11JAN1990 Nala Ina 7
AMBR 22APR1997 Nina Cecilia 8
MMBNR 08MAY1980 Rocco Chavo
46The Order of Execution contd
- SELECT a.breed,b.parent,
- COALESCE ( a.name, b.name)
- FROM dogs a FULL JOIN parent b
- ON a.nameb.name
- WHERE breed'Rottweiler'
Breed Parent Rottweiler Ina
Bruja Rottweiler Chacal Rottweiler
Tyson
SELECT a.breed, b.parent, COALESCE (b.name,
a.name) FROM dogs a FULL JOIN parent b ON
a.breed'Rottweiler' WHERE a.nameb.name
Breed Parent Rottweiler Ina
Bruja
47The Order of Execution contd
SELECT a.breed, INPUT(a.dob, ? date9.) AS ndob
format date9., COALESCE(a.name, b.name) AS
name, b.parent FROM dogs a FULL JOIN parent
b ON a.nameb.name WHERE
breed'Rottweiler' HAVING ndob GT '01jan2000'd
Breed NDOB Name Parent Rottweiler 15AUG2000 Bruja
Ina Rottweiler 15MAR2000 Tyson
48How Good is My Code?
- Does the options used in data step make my code
faster? - Is speed important?
49How Good is My Code? contd
- Do data step options make my code faster?
data all merge dogs parent by
name do i1 to 100 do j1 to 100
output end end run
NOTE There were 8 observations read from the
data set WORK.DOGS. NOTE There were 7
observations read from the data set
WORK.PARENT. NOTE The data set WORK.ALL has
891000 observations and 6 variables.
50How Good is My Code? contd
- Does the statements used to control the size of
my input data set make my code faster?
proc sql stimer NOTE SQL statement used
real time 0.01 seconds cpu
time 0.02 seconds
51How Good is My Code? contd
- Does the statements used to control the size of
my input data set make my code faster?
SELECT FROM all WHERE breed'Rottweiler'
- NOTE SQL statement used
- real time 10.82 seconds
- cpu time 10.50 seconds
SELECT FROMall (WHERE(breed'Rottweiler'))
NOTE SQL statement used real time
10.85 seconds cpu time 10.61
seconds
52How Good is My Code? contd
- Does the statements used to control the size of
my input data set make my code faster?
SELECT DISTINCT name, breed, dob FROM all WHERE
breed'Rottweiler'
real time 4.90 seconds cpu time 4.35 seconds
SELECT DISTINCT name, breed, dob FROM all (
WHERE(breed'Rottweiler'))
real time 9.42 seconds cpu time 4.57 seconds
SELECT DISTINCT name, breed, dob FROM all
(WHERE(breed'Rottweiler') KEEPname breed dob)
- real time 5.89 seconds
- cpu time 4.30 seconds
53Conclusions
- This presentation showed examples on how and when
to use statements that we already know - KEEP
- RENAME
- DROP
- WHERE
- (things you already know)
54Conclusions
- Possible language problems (line mines)
- Reserved word (CASE)
- Format modifiers
- Descending order
- (things you think you already know)
55Conclusions contd
- Be careful with the order of the conditions
- The order of the conditions affect the result
56Conclusions contd
- When you use SQL, consider if you need all your
variables and observations - If you dont need them consider
- It there is a significant difference in computer
use - Are you a good typist?
57Conclusions contd
- You can have plain vanilla SAS PROC SQL or you
can SASsy it up with data step statements
58Acknowledgements
- PPD management
- Kim Sturgen and Bonnie Duncan
- Andy Barnett
- Darrel Edgley
- Neil Howard, Larry Sleeper and Craig Mauldin
- For their time, although if there are mistakes,
they are all mine I made modifications and
additions after they read the paper
59Contact Information
- Angelina Cecilia Mauldin
- Abraxis BioScience
- 4505 Emperor Blvd. 400
- Durham, NC 27703
- 919 433 8400
- cmauldin_at_abraxisbio.com