Tyson Rottweiler 15MAR2000. Do we get a warning? Join - PowerPoint PPT Presentation

1 / 59
About This Presentation
Title:

Tyson Rottweiler 15MAR2000. Do we get a warning? Join

Description:

Tyson Rottweiler 15MAR2000. Do we get a warning? Joins cont'd. Limit records ... FROM dogs (WHERE=(breed='Rottweiler'))a. LEFT JOIN parents b. ON a.name=b.name ... – PowerPoint PPT presentation

Number of Views:155
Avg rating:3.0/5.0
Slides: 60
Provided by: richwe
Learn more at: https://www.rtsug.org
Category:

less

Transcript and Presenter's Notes

Title: Tyson Rottweiler 15MAR2000. Do we get a warning? Join


1
SAS PROC SQL or Vanilla Flavor
  • Cecilia Mauldin
  • January 31 2007

2
What is Special in SAS PROC SQL?
  • Things you already know from data step
    programming
  • Some differences between data step and proc sql
    syntax
  • Order of execution
  • How good is my code?

3
The Data
  • DATA dogs
  • LENGTH name breed 20
  • dob 10
  • INPUT name breed dob
  • CARDS
  • Nala Airdale 11jan1990
  • Nina AMBR 22apr1997
  • Jadzia Pointer 15feb1999
  • Rocco MMBNR 08may1980
  • Bruja Rottweiler 15aug2000
  • Chacal Rottweiler 29jan1990
  • Chata Boxer UNK
  • Tyson Rottweiler 15mar2000
  • RUN
  • DATA parents
  • LENGTH parent name 20
  • INPUT parent name
  • CARDS
  • Ina Nala
  • Ina Chata
  • Ina Bruja
  • Ina Fidonga
  • Chavo Rocco
  • Cecilia Nina
  • Cecilia Jadzia
  • RUN

4
What is Special in SAS PROC SQL?
  • Things you already know from data step
    programming
  • Some differences between data step and Proc SQL
    syntax
  • Order of execution
  • How good is my code?

5
Things You Already Know from Data Step Programming
  • Our old friends where, keep, drop and rename
  • In data step
  • How do we use them?
  • Why do we use them?
  • In SAS PROC SQL
  • How do we use them?
  • Why do we use them?

6
Things You Already Know from Data Step Programming
DATA new_data (KEEP RENAME( )
DROP WHERE( )) SET
old_data (KEEP RENAME()
DROP WHERE( )) SAS data
steps RUN
It is written from the PDV when specified on an
output data set.
Is applied when the dataset OLD_DATA is read into
the PDV.
7
Data Step Why Do We Use Our Old Friends?
DATA new_data (KEEP RENAME ( )
DROP WHERE( )) SET old_data
(KEEP RENAME() DROP
WHERE( )) SAS data steps RUN
No need to keep observations and/or variables
that are not used in the output data set.
No need to carry observations or variables that
are not going to be used in the PDV.
8
Proc SQL How Do We Use Them?
  • PROC SQL SQL_OPTIONS
  • CREATE TABLE example (KEEP .

  • RENAME( .)

  • DROP

  • WHERE( .))
  • AS SELECT
  • FROM data_set (KEEP var1 var2.
  • RENAME(var1
    newvar1

  • var2 newvar2

  • .)
  • DROP
  • WHERE(.))
  • QUIT

9
Why Do We Use Them?
  • Typing convenience
  • Right, left and full joins

10
Typing Convenience
  • Your dataset has 50 variables, you need to use 45
  • Do you want to type 45 variables?
  • Do you carry 5 variables that you dont need?

11
Typing Convenience contd
SELECT var1, var2, , var45 FROM data_set
SELECT FROM data_set (DROPvar46 var50)
12
Typing Convenience contd
First name Breed Nala
Airdale Nina AMBR Jadzia
Pointer Rocco MMBNR Bruja
Rottweiler Chacal
Rottweiler Chata Boxer Tyson
Rottweiler
SELECT name AS firstname,
breed FROM dogs
  • SELECT
  • FROM dogs (DROPdob
  • RENAME(namefirstname))

13
Joins
  • Avoid warning messages when creating a data set
    because two variables in two different datasets
    have the same name
  • Limit the records that come in a join

14
Joins contd
  • Avoid WARNING messages when 2 variables have the
    same name

CREATE TABLE final AS SELECT a., b. FROM
data_set1 a, data_set2 b WHERE a.var1b.var1
AND a.var2b.var2 AND a.var3b.var3
15
Joins contd
  • Avoid WARNING messages when 2 variables have the
    same name

CREATE TABLE family AS SELECT a., b. FROM
dogs a, parent b WHERE a.nameb.name
WARNING Variable Name already exists on file
WORK.FAMILY. NOTE Table WORK.FAMILY created,
with 6 rows and 4 columns.
16
Joins contd
  • Avoid WARNING messages when 2 variables have the
    same name

CREATE TABLE final (DROPVAR1B VAR2B VAR3B)
AS SELECT a., b. FROM data_set1 A,
data_set2 (RENAME(var1var1b
var2var2b

var3var3b)) B WHERE a.var1b.var1b AND
a.var2b.var2b AND a.var3b.var3b
17
Joins contd
CREATE TABLE family (DROPnickname) AS
SELECT a., b. FROM dogs a, parents
(RENAME(namenickname)) b WHERE
a.nameb.nickname
Name breed DOB Parent Nala Airdale 11JAN1990 Ina N
ina AMBR 22APR2001 Cecilia Jadzia Pointer 15FEB199
9 Cecilia Rocco MMBNR 08MAY2001 Chavo Bruja Rottwe
iler 15AUG2000 Ina Chata Boxer .
Ina
18
Joins contd
SELECT a., b. FROM dogs a LEFT JOIN parents
b ON a.nameb.name AND a.breed'Rottweiler
'
Do we get a warning?
Name Breed DOB Parent name Bruja Rottweiler 15AU
G2000 Ina Bruja Chacal Rottweiler 29JAN199
0 Chata Boxer . Jadzia Pointer 15FEB19
99 Nala Airdale 11JAN1990 Nina AMBR 22APR2001 Rocc
o MMBNR 08MAY2001 Tyson Rottweiler 15MAR2000
19
Joins contd
  • Limit records

CREATE TABLE final AS SELECT a., b. FROM
data_set1 (WHERE(var1keep)) a LEFT JOIN
data_set2 b ON a.var1B.var1b
AND a.var2B.var2b AND
a.var3B.var3b
20
Joins contd
SELECT a., b. FROM dogs (WHERE(breed'Rottweile
r'))a LEFT JOIN parents b ON a.nameb.name
Name breed DOB Parent
name Bruja Rottweiler 15AUG2000 Ina
Bruja Chacal Rottweiler 29JAN1990 Tyson Rottweiler
15MAR2000
21
What is special in SAS PROC SQL?
  • Things you already know from data step programming
  • Some differences between data step and SQL
    syntax
  • Order of execution
  • How good is my code?

22
Some Differences Between Data Step and SQL Syntax
  • Reserved word
  • Format modifiers
  • Descending sorting

23
Reserved Word
  • CASE
  • Why to use it
  • Is there something similar in data step?
  • How to use it
  • What can be the problem
  • How to avoid a problem

24
CASE Why to Use It
  • There is no IF or SELECT statements in SQL
  • Is similar to the SELECT clause in data step

25
CASE Why to Use It

DATA final SET parents SELECT (name)
WHEN (Nina') education Novice' WHEN
(Nala') education None' OTHERWISE
education There is hope END
26
CASE How to Use It
  • SELECT var1, var2,
  • CASE vark
  • WHEN (value1) THEN constant value
    function
  • WHEN (value3) THEN other constant
    function
  • ELSE last value function
  • END AS newvar
  • FROM data_set

27
CASE How to Use It contd
Name Education Parent Nala None Ina
Chata There is hope Ina
Bruja There is hope Ina
Fidonga There is hope Ina
Rocco There is hope Chavo
Nina Novice Cecilia Jadzia There is
hope Cecilia
  • SELECT
  • name,
  • CASE name
  • WHEN ('Nala') THEN 'None'
  • WHEN ('Nina') THEN 'Novice'
  • ELSE 'There is
    hope'
  • END AS education LENGTH 15 FORMAT 15.,
  • parent
  • FROM parent

28
CASE How to Use It contd
SELECT name,breed, CASE WHEN INDEX
(breed,'MB') THEN PUT(breed,UKC.) ELSE

breed END AS lbreed label "Long Breed" FROM
dogs
Name Breed Long breed Bruja Rottweiler Rottwe
iler Chacal Rottweiler Rot
tweiler Chata Boxer Boxer
Jadzia Pointer Pointe
r Nala Airdale Airdale
Nina AMBR American
Mixed Breed Registry Rocco MMBNR Mexican
Mixed Breed Non-Registered Tyson Rottweiler Rottwe
iler
29
CASE What Can Be the Problem
  • DATA new
  • LENGTH case 20
  • SET parents
  • IF parent 'Ina' THEN
    case'None'
  • ELSE IF parent 'Chavo' THEN case'Spoiled'
  • ELSE
    case'Work'
  • RUN
  • PROC PRINT DATAnew
  • RUN

Obs Case Parent Name 1 None Ina Nala
2 None Ina Chata 3 None Ina Bruja
4 None Ina Fidonga 5 Spoiled Chavo Rocco
6 Work Cecilia Nina 7 Work Cecilia Jadzia
30
CASE What Can Be the Problem contd
SELECT FROM new WHERE parent'Ina'
Case Parent Name None Ina Nala
None Ina Chata None Ina Bruja
None Ina Fidonga
31
CASE What Can Be the Problem contd
  • SELECT FROM new WHERE case'None'

select from new WHERE case'None'
22
76 ERROR 22-322 Syntax
error, expecting one of the following
a name, a quoted string, a numeric constant,
a datetime constant, a missing
value, (, , -, BTRIM,
CALCULATED, CASE, EXISTS, INPUT, LOWER,
NOT, PUT, SELECT, SUBSTRING, TRANSLATE,
UPPER, USER, WHEN, , .
ERROR 76-322 Syntax error, statement will be
ignored.
32
CASE How to Avoid a Problem
  • SELECT
  • FROM new (RENAME(caseeducation))
  • WHERE education'None'

33
Some Differences Between Data Step and SQL Syntax
  • Reserved word
  • Format modifiers
  • Descending sorting

34
Format Modifiers
  • What are format modifiers?
  • Format modifiers in data step
  • Format modifiers in proc SQL

35
Format Modifiers in Data Step
DATA ndogs SET dogs idateINPUT(dob,date9.)
IF _ERROR_1 THEN PUT _ALL_ RUN
NOTE Invalid argument to function INPUT at line
196 column 9. nameChata breedBoxer dobUNK
idate. _ERROR_1 _N_3 NOTE Mathematical
operations could not be performed at the
following places. The results of the operations
have been set to missing values. Each
place is given by (Number of times) at
(Line)(Column). 1 at 1969
36
Format Modifiers in Data Step contd
DATA ndogs SET dogs idateINPUT(dob, ??
date9.) IF _ERROR_1 THEN PUT _ALL_ RUN
NOTE There were 8 observations read from the
data set WORK.DOGS. NOTE The data set
WORK.NDOGS has 8 observations and 4 variables.
37
Format Modifiers in PROC SQL
CREATE TABLE ndogs AS SELECT name,
dob, INPUT(dob, date9.) AS idate FORMAT mmddyy8.
FROM dogs
ERROR Invalid day value NOTE Table WORK.NDOGS
created, with 8 rows and 3 columns.
38
Format Modifiers in PROC SQL contd
CREATE TABLE ndogs AS SELECT name, dob,
INPUT(dob,? date9.) AS idate FORMAT mddyy8. FROM
dogs
NOTE Table WORK.NDOGS created, with 8 rows and 3
columns.
39
Some Differences Between Data Step and SQL Syntax
  • Reserved word
  • Format modifiers
  • Descending sorting

40
Descending Sorting
  • Data step and PROCS

PROC SORT DATA ndogs BY DESCENDING
idate RUN
  • PROC SQL

SELECT FROM ndogs ORDER BY idate descending
41
Descending Sorting contd
  • A little lagniappe (pronounced nanap)

SELECT name, dob, INPUT( dob,? date9.) AS
idate FORMAT date9. FROM dogs ORDER BY 3
DESC
Name DOB Idate Bruja 15Aug2000 15AUG2000 Tyson 15
Mar2000 15MAR2000 Jadzia 15Feb1999 15FEB1999 Nina
22APR1997 22APR1997 Chacal 29Jan1990 29JAN1990 Nal
a 11jan1990 11JAN1990 Rocco 08may1980 08MAY1980 Ch
ata UNK .
42
What is Special in SAS PROC SQL?
  • The program data vector (PDV)
  • Some differences between data step and SQL syntax
  • Order of execution
  • How good is my code?

43
The Order of Execution
  • SELECT VAR1, VAR2,
  • FROM data_set
  • FULL LEFT RIGHT INNER
    JOIN TABLE2
  • ON condition 1
  • AND condition 2
  • WHERE condition k
  • AND condition l
  • GROUP BY VAR1, VAR.
  • HAVING condition p
  • AND condition q
  • ORDER BY var1, var2

44
The Order of Execution contd
  • PROC SQL NUMBER
  • SELECT a.breed,
  • INPUT(a.dob,? date9.) AS ndob format date9.,
  • COALESE(a.name, b.name) AS name, b.parent
  • FROM dogs a FULL JOIN parents b
  • ON a.nameb.name

Breed NDOB Name Parent 1 Rottweiler 15AUG2000
Bruja Ina 2
Rottweiler 29JAN1990 Chacal
3 Boxer Chata Ina
4 .
Fidonga Ina 5
Pointer 15FEB1999 Jadzia Cecilia 6
Airdale 11JAN1990 Nala Ina 7
AMBR 22APR1997 Nina Cecilia 8
MMBNR 08MAY1980 Rocco Chavo
45
The Order of Execution contd
  • SELECT a.breed,
  • INPUT( a.dob, ? date9.) AS ndob FORMAT
    date9.,
  • COALESCE (a.name, b.name) AS name,
  • b.parent
  • FROM dogs a FULL JOIN parents b
  • ON a.nameb.name
  • AND breed'Rottweiler'

Breed NDOB Name Parent 1 Rottweiler 15AUG2000
Bruja Ina 2
Rottweiler 29JAN1990 Chacal
3 Boxer Chata Ina
4 .
Fidonga Ina 5
Pointer 15FEB1999 Jadzia Cecilia 6
Airdale 11JAN1990 Nala Ina 7
AMBR 22APR1997 Nina Cecilia 8
MMBNR 08MAY1980 Rocco Chavo
46
The Order of Execution contd
  • SELECT a.breed,b.parent,
  • COALESCE ( a.name, b.name)
  • FROM dogs a FULL JOIN parent b
  • ON a.nameb.name
  • WHERE breed'Rottweiler'

Breed Parent Rottweiler Ina
Bruja Rottweiler Chacal Rottweiler
Tyson
SELECT a.breed, b.parent, COALESCE (b.name,
a.name) FROM dogs a FULL JOIN parent b ON
a.breed'Rottweiler' WHERE a.nameb.name
Breed Parent Rottweiler Ina
Bruja
47
The Order of Execution contd
SELECT a.breed, INPUT(a.dob, ? date9.) AS ndob
format date9., COALESCE(a.name, b.name) AS
name, b.parent FROM dogs a FULL JOIN parent
b ON a.nameb.name WHERE
breed'Rottweiler' HAVING ndob GT '01jan2000'd
Breed NDOB Name Parent Rottweiler 15AUG2000 Bruja
Ina Rottweiler 15MAR2000 Tyson
48
How Good is My Code?
  • Does the options used in data step make my code
    faster?
  • Is speed important?

49
How Good is My Code? contd
  • Do data step options make my code faster?

data all merge dogs parent by
name do i1 to 100 do j1 to 100
output end end run
NOTE There were 8 observations read from the
data set WORK.DOGS. NOTE There were 7
observations read from the data set
WORK.PARENT. NOTE The data set WORK.ALL has
891000 observations and 6 variables.
50
How Good is My Code? contd
  • Does the statements used to control the size of
    my input data set make my code faster?

proc sql stimer NOTE SQL statement used
real time 0.01 seconds cpu
time 0.02 seconds
51
How Good is My Code? contd
  • Does the statements used to control the size of
    my input data set make my code faster?

SELECT FROM all WHERE breed'Rottweiler'
  • NOTE SQL statement used
  • real time 10.82 seconds
  • cpu time 10.50 seconds

SELECT FROMall (WHERE(breed'Rottweiler'))
NOTE SQL statement used real time
10.85 seconds cpu time 10.61
seconds
52
How Good is My Code? contd
  • Does the statements used to control the size of
    my input data set make my code faster?

SELECT DISTINCT name, breed, dob FROM all WHERE
breed'Rottweiler'
real time 4.90 seconds cpu time 4.35 seconds
SELECT DISTINCT name, breed, dob FROM all (
WHERE(breed'Rottweiler'))
real time 9.42 seconds cpu time 4.57 seconds
SELECT DISTINCT name, breed, dob FROM all
(WHERE(breed'Rottweiler') KEEPname breed dob)
  • real time 5.89 seconds
  • cpu time 4.30 seconds

53
Conclusions
  • This presentation showed examples on how and when
    to use statements that we already know
  • KEEP
  • RENAME
  • DROP
  • WHERE
  • (things you already know)

54
Conclusions
  • Possible language problems (line mines)
  • Reserved word (CASE)
  • Format modifiers
  • Descending order
  • (things you think you already know)

55
Conclusions contd
  • Be careful with the order of the conditions
  • The order of the conditions affect the result

56
Conclusions contd
  • When you use SQL, consider if you need all your
    variables and observations
  • If you dont need them consider
  • It there is a significant difference in computer
    use
  • Are you a good typist?

57
Conclusions contd
  • You can have plain vanilla SAS PROC SQL or you
    can SASsy it up with data step statements

58
Acknowledgements
  • PPD management
  • Kim Sturgen and Bonnie Duncan
  • Andy Barnett
  • Darrel Edgley
  • Neil Howard, Larry Sleeper and Craig Mauldin
  • For their time, although if there are mistakes,
    they are all mine I made modifications and
    additions after they read the paper

59
Contact Information
  • Angelina Cecilia Mauldin
  • Abraxis BioScience
  • 4505 Emperor Blvd. 400
  • Durham, NC 27703
  • 919 433 8400
  • cmauldin_at_abraxisbio.com
Write a Comment
User Comments (0)
About PowerShow.com