Title: Data Transformations
1Chapter 12
2Section 12.1
3Objectives
- Review the syntax of SAS functions.
4SAS Functions
- SAS provides a large library of functions for
manipulating data during DATA step execution. - A SAS function is often categorized by the type
of data manipulation performed - truncation
- character
- date and time
- mathematical
- trigonometric
- special
- sample statistics
- financial
- random number
- state and ZIP code
5Syntax for SAS Functions
- A SAS function is a routine that performs a
computation or system manipulation and returns a
value. Functions use arguments supplied by the
user or by the operating environment. - General form of a SAS function
function-name(argument-1,argument-2,,argument-n)
6Using SAS Functions
- You can use functions in executable DATA step
statements anywhere that an expression can appear.
data contrib set prog2.donate
Totalsum(Qtr1,Qtr2,Qtr3,Qtr4) if Total ge
50runproc print datacontrib noobsrun
7Using SAS Functions
- Partial PROC PRINT Output
-
- What if you want to sum Qtr1 through Qtr400,
instead of Qtr1 through Qtr4?
ID Qtr1 Qtr2 Qtr3 Qtr4
Total E00224 12 33 22 .
67 E00367 35 48 40 30
153 E00441 . 63 89 90
242 E00587 16 19 30 29
94 E00621 10 12 15 25 62
8SAS Variable Lists
- A SAS variable list is a shortcut method of
referring to a list of variable names. SAS
enables you to use the following variable lists - numbered range lists
- name range lists
- name prefix lists
- special SAS name lists
- These methods can be used in many places where
variable names are expected.
9SAS Variable Lists Numbered Range List
- Syntax x1-xn
- A numbered range list specifies all variables
from x1 to xn inclusive (including the variables
named). - You can begin with any number and end with any
number. You must follow the rules for
user-supplied variable names and the numbers must
be consecutive.
proc print dataprog2.donate var id
Qtr2-Qtr4 run
10SAS Variable Lists- Numbered Range List
- What would be the result of this program if Qtr3
were not in the data set?
proc print dataprog2.donate var id
Qtr2-Qtr4 run
11SAS Variable Lists- Numbered Range List
- What would be the result of this program if Qtr3
were not in the data set?
proc print dataprog2.donate var id
Qtr2-Qtr4 run
Because the variable Qtr3 is not in the data set,
you get an error in the log. The error message
indicates that the variable does not exist.
12SAS Variable Lists- Name Range List
- Syntax StartVarName--StopVarName
- A name range list specifies all variables ordered
as they are in the program data vector, from
StartVarName to StopVarName, inclusive. - ? There are two hyphens.
proc print datafakedata var id
Name--Salary run
13SAS Variable Lists- Name Range List
- What is the result of the following program?
- The output contains ID followed by Name, State,
and Salary.
proc print datafakedata var id
Name--Salary run
14SAS Variable Lists- Name Range List
- Syntax StartVarName-NUMERIC-StopVarName
- StartVarName-CHARACTER-StopVarName
- You can also use the keyword NUMERIC or CHARACTER
inside the hyphens to select all the variables of
that data type, inclusively.
proc print datafakedata var id
Name-character-JobCode run
15SAS Variable Lists- Name Range List
- What is the result of the following program?
- The output contains ID followed by Name, State,
and Jobcode. Salary is not displayed because it
is a numeric variable.
proc print datafakedata var id
Name-character-Jobcode run
16SAS Variable Lists- Name Prefix List
- Syntax PartVarName
- Providing part of the variable name followed by a
semicolon tells SAS that you want all the
variables that start with that string. - ? The case of the string does not matter.
data fakedata2 set fakedata keep id
S run
17SAS Variable Lists- Name Prefix List
- What is the result of the following program?
- The new data set contains ID followed by State
and Salary.
data fakedata2 set fakedata keep id
S run
18SAS Variable Lists- Special Prefix List
- Syntax _ALL_
- _NUMERIC_
- _CHARACTER_
- specifies either all variables, all numeric
variables, or all character variables that are
defined in the current DATA step. - ? The case of the keyword does not matter.
19SAS Variable Lists
- When you use a SAS variable list in a SAS
function, use the keyword OF in front of the
first variable name in the list. - If you omit the OF keyword, the function
subtracts Qtr4 from Qtr1.
data contrib set prog2.donate
Totalsum(of Qtr1-Qtr4) if Total ge 50run
20Section 12.2
- Manipulating Character Values
21Objectives
- Use SAS functions and operators to extract, edit,
and search character values.
22A Mailing Label Application
- The freqflyers data set contains information
about frequent flyers. - Use this data set to create another data set
suitable for mailing labels.
23A Mailing Label Application
- ID is a character variable. Its last digit
represents the gender (1 denotes female, 2
denotes male) of the frequent flyer. - prog2.freqflyers
ID Name Address1
Address2 F31351 Farr,Sue 15 Harvey Rd.
Macon,Bibb,GA,31298 F161 Cox,Kay B. 163
McNeil Pl. Kern,Pond,CA,93280 F212 Mason,Ron
442 Glen Ave. Miami,Dade,FL,33054 F25122
Ruth,G. H. 2491 Brady St. Munger,Bay,MI,48747
24A Mailing Label Application
- labels
- The first task is to create a title of Mr. or Ms.
based on the last digit of ID.
FullName Address1
Address2 Ms. Sue Farr 15 Harvey Rd.
Macon, GA 31298 Ms. Kay B. Cox 163 McNeil Pl.
Kern, CA 93280 Mr. Ron Mason 442 Glen Ave.
Miami, FL 33054 Mr. G. H. Ruth 2491 Brady
St. Munger, MI 48747
25The SUBSTR Function (Right Side)
- The SUBSTR function is used to extract or replace
characters. - This form of the SUBSTR function (right side of
the assignment statement) extracts characters.
NewVarSUBSTR(string,startlt,lengthgt)
26The SUBSTR Function - Examples
- If the length of the created variable is not
previously defined with a LENGTH statement, it is
the same as the length of the first argument to
SUBSTR.
String can be a character constant, variable or expression.
Start specifies the starting position.
Length specifies the number of characters to extract. If omitted, the substring consists of the remainder of the expression.
27The SUBSTR Function (Right Side)
- Extract two characters from Location and start at
position 11.
Statesubstr(Location,11,2)
28A Mailing Label Application
proc print dataprog2.freqflyers noobs var
IDrun
PROC PRINT Output
ID
F31351 F161
F212 F25122
- In what position does the last digit of ID occur?
- In some values, the last digit is in column 6 and
in others it is in column 4.
29The RIGHT Function
- The RIGHT function returns its argument
right-aligned. - Trailing blanks are moved to the start of the
value.
NewVarRIGHT(argument)
NewIDright(ID)
continued...
30The RIGHT Function
- An argument can be a character constant,
variable, or expression - If the length of the created variable is not
previously defined with a LENGTH statement, it is
the same as the length of the argument.
31The LEFT Function
- The LEFT function returns its argument
left-aligned. - Trailing blanks are moved to the end of the value.
NewVarLEFT(argument)
NewIDleft(ID)
32The LEFT Function
- An argument can be a character constant, variable
or expression - If the length of the created variable is not
previously defined with a LENGTH statement, it is
the same as the length of the argument.
33A Mailing Label Application
data labels set prog2.freqflyers if
substr(right(ID),6)'1' then Title'Ms.'
else if substr(right(ID),6)'2' then
Title'Mr.'runproc print datalabels noobs
var ID Titlerun
The result of the RIGHT function acts as the
first argument to the SUBSTR function.
34A Mailing Label Application
ID Title F31351
Ms. F161 Ms.
F212 Mr. F25122 Mr.
35A Mailing Label Application
- The next task is to separate the names of the
frequent flyers into two parts.
36The SCAN Function
- The SCAN function returns the nth word of a
character value. - It is used to extract words from a character
value when the relative order of words is known,
but their starting positions are not.
NewVarSCAN(string,nlt,delimitersgt)
37The SCAN Function
- When the SCAN function is used,
- the length of the created variable is 200 bytes
if it is not previously defined with a LENGTH
statement - delimiters before the first word have no effect
- any character or set of characters can serve as
delimiters - two or more contiguous delimiters are treated as
a single delimiter - a missing value is returned if there are fewer
than n words in string - if n is negative, the SCAN function selects the
word in the character string starting from the
end of string.
38The SCAN Function
- Extract the second word of Phrase.
Secondscan(Phrase,2,' ')
and
software and services
1
2
3
...
39The SCAN Function
- Extract the second word of Phrase.
Secondscan(Phrase,2,'')
services
software andservices
services
software andservices
2
1
40The SCAN Function
- data scan Text'(Thursday July 4, 1776)'
Var1scan(Text,1) Var2scan(Text,4)
Var3scan(Text,5) Var4scan(Text,2,',')
Var5scan(Text,2,',)') run
...
41The SCAN Function
4
1
2
3
- data scan Text'(Thursday July 4, 1776)'
Var1scan(Text,1) Var2scan(Text,4)
Var3scan(Text,5) Var4scan(Text,2,',')
Var5scan(Text,2,',)') run
Var1 200
Thursday
...
42The SCAN Function
4
1
2
3
data scan Text'(Thursday July 4, 1776)'
Var1scan(Text,1) Var2scan(Text,4)
Var3scan(Text,5) Var4scan(Text,2,',')
Var5scan(Text,2,',)') run
Var1 200
Var2 200
Thursday
1776
...
43The SCAN Function
4
1
2
3
- data scan Text'(Thursday July 4, 1776)'
Var1scan(Text,1) Var2scan(Text,4)
Var3scan(Text,5) Var4scan(Text,2,',')
Var5scan(Text,2,',)') run
Var1 200
Var2 200
Var3 200
Thursday
1776
missing
...
44The SCAN Function
1
2
- data scan Text'(Thursday July 4, 1776)'
Var1scan(Text,1) Var2scan(Text,4)
Var3scan(Text,5) Var4scan(Text,2,',')
Var5scan(Text,2,',)') run
Var1 200
Var2 200
Var3 200
Var4 200
Thursday
1776
1776)
...
45The SCAN Function
1
2
- data scan Text'(Thursday July 4, 1776)'
Var1scan(Text,1) Var2scan(Text,4)
Var3scan(Text,5) Var4scan(Text,2,',')
Var5scan(Text,2,',)') run
Thursday
1776
1776)
1776
46A Mailing Label Application
- data labels length FMName LName 10 set
prog2.freqflyers if substr(right(ID),6)'1'
then Title'Ms.' else if
substr(right(ID),6)'2' then Title'Mr.'
FMNamescan(Name,2,',') LNamescan(Name,1,','
)run
47A Mailing Label Application
proc print datalabels noobs var ID Name
Title FMName LNamerun
PROC PRINT Output
ID Name Title FMName
LName F31351 Farr,Sue Ms. Sue
Farr F161 Cox,Kay B. Ms. Kay B.
Cox F212 Mason,Ron Mr. Ron
Mason F25122 Ruth,G. H. Mr. G. H.
Ruth
The next task is to join the values of Title,
FMName,and LName into another variable.
48Concatenation Operator
- The concatenation operator joins character
strings. - Depending on the characters available on your
keyboard, the symbol to concatenate character
values can be two exclamation points (!!), two
vertical bars (), or two broken vertical bars
().
NewVarstring1 !! string2
49Concatenation Operator
- Combine FMName and LName to create FullName.
FullNameFMName !! LName
50The TRIM Function
- The TRIM function removes trailing blanks from
its argument. - If the argument is blank, the TRIM function
returns one blank. - The TRIMN function is similar but returns a null
string (zero blanks) if the argument is blank.
NewVarTRIM(argument1) !! argument2
51The TRIM Function
data trim length FMName LName 10
FMName'Sue' LName'Farr'
FullName1trim(FMName) FullName2trim(FMName)
!! LName FullName3trim(FMName) !! ' ' !!
LNamerun
FullName1 10
Sue
...
52The TRIM Function
data trim length FMName LName 10
FMName'Sue' LName'Farr'
FullName1trim(FMName) FullName2trim(FMName)
!! LName FullName3trim(FMName) !! ' ' !!
LNamerun
FullName1 10
FullName2 20
Sue
SueFarr
...
53The TRIM Function
data trim length FMName LName 10
FMName'Sue' LName'Farr'
FullName1trim(FMName) FullName2trim(FMName)
!! LName FullName3trim(FMName) !! ' ' !!
LNamerun
FullName1 10
FullName2 20
FullName3 21
Sue
SueFarr
Sue Farr
54The TRIM Function
- The TRIM function does not remove leading blanks
from a character argument. Use a combination of
the TRIM and LEFT functions to remove leading and
trailing blanks from a character argument. - If FMName contained leading blanks, the following
assignment statement would correctly concatenate
FMName and LName into FullName.
FullNametrim(left(FMName)) !! ' ' !! LName
55A Mailing Label Application
data labels(keepFullName Address1 Address2)
length FMName LName 10 set
prog2.freqflyers if substr(right(ID),6)'1'
then Title'Ms.' else
if substr(right(ID),6)'2' then
Title'Mr.'
FMNamescan(Name,2,',')
LNamescan(Name,1,',') FullNameTitle !! ' '
!! trim(FMName) !! ' ' !! LName
Address2scan(Address2,1,',') !! ', ' !!
scan(Address2,3,',') !! ' ' !!
scan(Address2,4,',')run
56A Mailing Label Application
proc print datalabels noobs var FullName
Address1 Address2run
PROC PRINT Output
FullName Address1
Address2 Ms. Sue Farr 15 Harvey Rd.
Macon, GA 31298 Ms. Kay B. Cox 163 McNeil Pl.
Kern, CA 93280 Mr. Ron Mason 442 Glen Ave.
Miami, FL 33054 Mr. G. H. Ruth 2491 Brady St.
Munger, MI 48747
57The CATX Function
- The CATX function concatenates character strings,
removes leading and trailing blanks, and inserts
separators.
CATX(separator, string-1, string-n)
58A Mailing Label Application
data labels(keepFullName Address1 Address2)
length FMName LName 10 set
prog2.freqflyers if substr(right(ID),6)'1'
then Title 'Ms.' else if
substr(right(ID),6)'2' then Title
'Mr.' FMName scan(Name,2,',') Lname
scan(Name,1,',') FullName catx('
',Title,FMName,LName) Address2 catx(' ',
scan(Address2,1,',') ',',
scan(Address2,3,','), scan(Address2,4,',')) r
un
59A Mailing Label Application
proc print datalabels noobs var FullName
Address1 Address2run
PROC PRINT Output
FullName Address1
Address2 Ms. Sue Farr 15 Harvey Rd.
Macon, GA 31298 Ms. Kay B. Cox 163 McNeil Pl.
Kern, CA 93280 Mr. Ron Mason 442 Glen Ave.
Miami, FL 33054 Mr. G. H. Ruth 2491 Brady St.
Munger, MI 48747
60Exercise
- This exercise reinforces the concepts discussed
previously.
61Exercises
- The MIT Admissions Office received a list of
students with perfect SAT scores. The file must
be in a format that Admissions can use. Use the
People data set to create a temporary SAS data
set named Separate that contains the variables
First, MI, and Last to perform the following
tasks - Create a First and MI variable that contains each
person's first name and middle initial. Do not
include in the Separate data set. - Use the Separate data set to create a temporary
data set called flname that contains the
variables NewName and CityState. NewName should
be a concatenation of each person's first name
and last name with one space between them. - Create a list report to view the results.
62Exercises Solution 1
libname prog2 'your-directory' data
separate(dropFMnames) length FMnames First
30 MI 2 Last 30 set prog2.people
FMnames left(scan(Name,2,',')) First
scan(FMnames,1,' ') MI left(scan(FMnames,2,'
')) Last scan(Name,1,',') run proc
print dataseparate var Name CityState First
MI Last run
63Exercises Solution 2
libname prog2 'your-directory' data
flname(keepNewName CityState) length FMname
First MI Last 30 set prog2.people Last
scan(Name,1,',') FMname
left(scan(Name,2,',')) First
scan(FMname,1,' ') MI scan(FMname,2,' ')
NewName trim(First) !! ' ' !! Last run proc
print dataflname var NewName CityState run
64Exercises - Output
The SAS System Obs
NewName CityState 1
LINDSAY DEAN WILMINGTON, NC 2
HELEN-ASHE FLORENTINO WASHINGTON, DC 3
JAN VAN ALLSBURG SHORT HILLS, NJ 4
STANLEY LAFF SPRINGFIELD, IL 5
GEORGE RIZEN CHICAGO, IL 6
MARC MITCHELL CHICAGO, IL 7
DOROTHY MILLS JOE, MT 8
JONATHAN WEBB MORRISVILLE, NC 9
MAYNARD KEENAN SEDONA, AZ 10
PHYLLIS LACK WALTHAM, MA 11
KERRY THOMPSON WINTER PARK, FL 12
DOROTHY COX TIMONIUM, MD 13
DONALD SEPTOFF BOSTON, MA 14
JANICE PHOENIX SOMERVILLE, NJ 15
MURRAY HUNEYCUTT DIME BOX, TX 16
SHERRY ERICKSON EL PASO, TX 17
CLIVE SCHNEIDER CAPE MAY, NJ 18
KIMBERLY PUTNAM DUNWOODY, GA 19
JENNIFER PITTMAN BENNINGTON, VT 20
STACY ROLEN CODY, WY
65Exercises
The MIT Admissions Office likes to review
applications based on merit and remove as much
identifiable material as possible. They choose to
label each application folder with an applicants'
initials only. Using the Separate data set that
you recently created, create a temporary data set
called init that contains only the variables Name
and Initials. The value of Initials should be a
concatenation of the first characters from each
person's first name, middle initial, and last
name with no delimiters separating the characters.
66Exercises
data init(dropFName MName LName FMNames)
length Initials 3 LName FMNames
FName MName 30 set prog2.People
LNamescan(Name,1,',') FMNamesscan(Name,2,','
) FNamescan(FMNames,1,' ')
MNamescan(FMNames,2,' ') / Put together just
the first letters. / Initialssubstr(FName,1,1
) !! substr(MName,1,1) !!
substr(LName,1,1) run proc print datainit
var Name CityState Initials run
67Exercises
The SAS System Obs Name
CityState
Initials 1 DEAN, LINDSAY A.
WILMINGTON, NC LAD 2 FLORENTINO,
HELEN-ASHE H. WASHINGTON, DC HHF 3
VAN ALLSBURG, JAN F. SHORT HILLS, NJ
JFV 4 LAFF, STANLEY X.
SPRINGFIELD, IL SXL 5 RIZEN, GEORGE Q.
CHICAGO, IL GQR 6
MITCHELL, MARC J. CHICAGO, IL
MJM 7 MILLS, DOROTHY E. JOE, MT
DEM 8 WEBB, JONATHAN W.
MORRISVILLE, NC JWW 9 KEENAN, MAYNARD
J. SEDONA, AZ MJK 10
LACK, PHYLLIS M. WALTHAM, MA
PML 11 THOMPSON, KERRY L. WINTER
PARK, FL KLT 12 COX, DOROTHY E.
TIMONIUM, MD DEC 13 SEPTOFF,
DONALD E. BOSTON, MA DES 14
PHOENIX, JANICE A. SOMERVILLE, NJ
JAP 15 HUNEYCUTT, MURRAY Y. DIME
BOX, TX MYH 16 ERICKSON, SHERRY A.
EL PASO, TX SAE 17 SCHNEIDER,
CLIVE J. CAPE MAY, NJ CJS 18
PUTNAM, KIMBERLY M. DUNWOODY, GA
KMP 19 PITTMAN, JENNIFER R.
BENNINGTON, VT JRP 20 ROLEN, STACY D.
CODY, WY SDR
68A Search Application
- The ffhistory data set contains information about
the history of each frequent flyer. - This history information consists of
- each membership level that the flyer attained
(Bronze, Silver, or Gold) - the year that the flier attained each level.
- Create a report that shows all frequent flyers
who attained Silver membership status and the
year each of them became Silver members.
69A Search Application
- ffhistory
- To determine who attained Silver membership
status, search the Status variable for the value
Silver.
Seat ID Status
Pref F31351 Silver 1998,Gold 2000
AISLE F161 Bronze 1999
WINDOW F212 Bronze 1992,silver 1995
WINDOW F25122 Bronze 1994,Gold 1996,Silver
1998 AISLE
70The FIND Function
- The FIND function searches for a specific
substring of characters within a character
string that you specify, and returns its
location. - The FIND function returns
- the starting position of the first occurrence of
value within target, if value is found - 0, if value is not found.
Position FIND(target,valuelt,modifiers,startposgt)
71The FIND Function
- A modifier can be the value of I or T. I
indicates that the search is case insensitive. T
indicates that the search ignores trailing
blanks. They can be combined. If they are
omitted, the search is case sensitive and
trailing blanks are taken into consideration. - The startpos is an integer that specifies the
position at which the search should start and the
direction of the search. - Positive values forward (right)
- Negative values backward (left)
- If omitted, the search starts at position 1 and
moves right.
72The FIND Function
- Determine whether Text contains the string
BULL'S-EYE.
Text"This target contains a BULL'S-EYE." Posfin
d(Text,"BULL'S-EYE")
73The FIND Function
data index Text'DELIMIT IT WITH BLANKS.'
Pos1find(Text,'IT') Pos2find(Text,' IT ')
Pos3find(Text,'it') Pos4find(Text,'it','I'
) run
Pos1 N 8
6
...
74The FIND Function
data index Text'DELIMIT IT WITH BLANKS.'
Pos1find(Text,'IT') Pos2find(Text,' IT ')
Pos3find(Text,'it') Pos4find(Text,'it','I'
) run
Pos2 N 8
Pos1 N 8
8
6
...
75The FIND Function
data index Text'DELIMIT IT WITH BLANKS.'
Pos1find(Text,'IT') Pos2find(Text,' IT ')
Pos3find(Text,'it') Pos4find(Text,'it','I'
) run
Pos3 N 8
Pos2 N 8
Pos1 N 8
0
8
6
...
76The FIND Function
data index Text'DELIMIT IT WITH BLANKS.'
Pos1find(Text,'IT') Pos2find(Text,' IT ')
Pos3find(Text,'it') Pos4find(Text,'it','I'
) run
77A Search Application
prog2.ffhistory
Seat ID Status
Pref F31351 Silver 1998,Gold 2000
AISLE F161 Bronze 1999
WINDOW F212 Bronze 1992,silver 1995
WINDOW F25122 Bronze 1994,Gold
1996,Silver 1998 AISLE
data silver set prog2.ffhistory if
find(Status,'silver','I') gt 0run
78A Search Application
proc print datasilver noobsrun
PROC PRINT Output
Seat
ID Status
Pref F31351 Silver 1998,Gold 2000
AISLE F212 Bronze 1992,silver 1995
WINDOW F25122 Bronze 1994,Gold 1996,Silver 1998
AISLE
79The INDEX Function
- The INDEX function searches a character argument
for the location of a specified character value
and returns its location. - The INDEX function returns
- the starting position of the first occurrence of
value within target, if value is found - 0, if value is not found.
PositionINDEX(target,value)
80The INDEX Function
- Target specifies the character expression to
search. - Value specifies the string of characters to
search for in the character expression. - The search for value is literal. Capitalization
and blanks are considered. - The INDEX function differs from the FIND function
- does not have a modifier
- does not have startpos functionality
81The INDEX Function
- Determine whether Text contains the string
BULL'S-EYE.
Text"This target contains a BULL'S-EYE." Posind
ex(Text,"BULL'S-EYE")
82The INDEX Function
data index Text'DELIMIT IT WITH BLANKS.'
Pos1index(Text,'IT') Pos2index(Text,' IT
') Pos3index(Text,'it')run
...
83The INDEX Function
data index Text'DELIMIT IT WITH BLANKS.'
Pos1index(Text,'IT') Pos2index(Text,' IT
') Pos3index(Text,'it')run
...
84The INDEX Function
data index Text'DELIMIT IT WITH BLANKS.'
Pos1index(Text,'IT') Pos2index(Text,' IT
') Pos3index(Text,'it')run
85The INDEX Function
data index2 length String 5
String'IT' Text'DELIMIT IT WITH BLANKS.'
Pos4index(Text,String) Pos5index(Text,trim
(String)) Pos6index(Text,' ' !! trim(String)
!! ' ')run
String 5
IT
...
86The INDEX Function
data index2 length String 5
String'IT' Text'DELIMIT IT WITH BLANKS.'
Pos4index(Text,String) Pos5index(Text,trim
(String)) Pos6index(Text,' ' !! trim(String)
!! ' ')run
String 5
Pos4 N 8
IT
0
...
87The INDEX Function
data index2 length String 5
String'IT' Text'DELIMIT IT WITH BLANKS.'
Pos4index(Text,String) Pos5index(Text,trim
(String)) Pos6index(Text,' ' !! trim(String)
!! ' ')run
String 5
Pos4 N 8
IT
0
...
88The INDEX Function
data index2 length String 5
String'IT' Text'DELIMIT IT WITH BLANKS.'
Pos4index(Text,String) Pos5index(Text,trim
(String)) Pos6index(Text,' ' !! trim(String)
!! ' ')run
String 5
Pos4 N 8
IT
0
89A Search Application
prog2.ffhistory
Seat ID Status
Pref F31351 Silver 1998,Gold 2000
AISLE F161 Bronze 1999
WINDOW F212 Bronze 1992,silver 1995
WINDOW F25122 Bronze 1994,Gold 1996,Silver 1998
AISLE
data silver set prog2.ffhistory if
index(Status,'Silver') gt 0run
90A Search Application
- Why was F212 not selected?
proc print datasilver noobsrun
PROC PRINT Output
Seat ID Status
Pref F31351 Silver 1998,Gold 2000
AISLE F25122 Bronze 1994,Gold 1996,Silver 1998
AISLE
91A Search Application
- Why was F212 not selected?
- For F212, silver was stored in lowercase.
You searched for Silver.
proc print datasilver noobsrun
PROC PRINT Output
Seat ID Status
Pref F31351 Silver 1998,Gold 2000
AISLE F25122 Bronze 1994,Gold 1996,Silver 1998
AISLE
92The UPCASE Function
- The UPCASE function
- converts all letters in its argument to uppercase
- has no effect on digits and special characters.
NewValUPCASE(argument)
93A Search Application
- data silver(dropLocation) length Year 4
set prog2.ffhistory Locationindex(upcase(Sta
tus),'SILVER') if Location gt 0
Yearsubstr(Status,Location7,4)runproc
print datasilver noobs var ID Status Year
SeatPrefrun
94A Search Application
PROC PRINT Output
Yearsubstr(Status,Location7,4)
Seat ID Status
Year Pref F31351 Silver 1998,Gold 2000
1998 AISLE F212 Bronze 1992,silver 1995
1995 WINDOW F25122 Bronze 1994,Gold
1996,Silver 1998 1998 AISLE
- Did the values of Status permanently change?
95The PROPCASE Function
- The PROPCASE function converts all words in an
argument to proper case, in which the first
letter is uppercase and the remaining letters are
lowercase. - Default delimiters for the PROPCASE function are
the blank, forward slash, hyphen, open
parenthesis, period, and tab characters.
NewValPROPCASE(argument lt,delimiter(s)gt)
96A Search Application
data silver(dropLocation) length Year 4
set prog2.ffhistory Statuspropcase(Status,'
,') Locationfind(Status,'Silver') if
Location gt 0 SeatPrefpropcase(SeatPref)
Yearsubstr(Status,Location7,4)runproc
print datasilver noobs var ID Status Year
SeatPrefrun
97A Search Application
PROC PRINT Output
Seat ID Status
Year Pref F31351 Silver 1998,Gold 2000
1998 Aisle F212 Bronze 1992,Silver 1995
1995 Window F25122 Bronze 1994,Gold
1996,Silver 1998 1998 Aisle
98The TRANWRD Function
- The TRANWRD function replaces or removes all
occurrences of a given word (or a pattern of
characters) within a character string. - The TRANWRD function does not remove trailing
blanks from target or replacement. - source source string that you want translated
- target string searched for in source
- replacement string that replaces the target
NewValTRANWRD(source,target,replacement)
99The TRANWRD Function
- If the length of the new variable is not
previously defined with a LENGTH statement, the
default length is 200 bytes. - Using the TRANWRD function to replace an existing
string with a longer string might cause
truncation of the resulting value if a LENGTH
statement is not used.
100The TRANWRD Function
- Replace the first word of Dessert.
Desserttranwrd(Dessert,'Pumpkin','Apple')
101A Search Application
data silver(dropLocation) length Year 4
set prog2.ffhistory Statustranwrd(Status,'si
lver','Silver') Locationindex(Status,'Silver'
) if Location gt 0 Yearsubstr(Status,Loca
tion7,4)runproc print datasilver noobs
var ID Status Year SeatPrefrun
102A Search Application
PROC PRINT Output
Seat ID Status
Year Pref F31351 Silver 1998,Gold 2000
1998 AISLE F212 Bronze 1992,Silver 1995
1995 WINDOW F25122 Bronze 1994,Gold
1996,Silver 1998 1998 AISLE
103The LOWCASE Function
- The LOWCASE function
- converts all letters in its argument to lowercase
- has no effect on digits and special characters.
NewValLOWCASE(argument)
104The SUBSTR Function (Left Side)
- The SUBSTR function is used to extract or replace
characters. - This form of the SUBSTR function (left side of
the assignment statement) replaces characters in
a character variable.
SUBSTR(string,startlt,lengthgt)value
105The SUBSTR Function (Left Side)
- string specifies a character variable
- start specifies a numeric expression that is
the beginning character position - length specifies a numeric expression that is
the length of the substring to be replaced - The length value cannot be larger than the
remaining length of string after start (including
trailing blanks). - If you omit length, SAS uses all the characters
on the right side of the assignment statement to
replace the values of string up to the limit.
106The SUBSTR Function (Left Side)
- Replace two characters from Location starting at
position 11.
substr(Location,11,2)'OH'
107The LOWCASE Function
data silver set silver substr(SeatPref,2)
lowcase(substr(SeatPref,2))run
...
108The LOWCASE Function
data silver set silver substr(SeatPref,2)
lowcase(substr(SeatPref,2))run
isle
...
109The LOWCASE Function
data silver set silver substr(SeatPref,2)
lowcase(substr(SeatPref,2))run
isle
isle
110A Search Application
proc print datasilver noobs var ID Year
SeatPrefrun
PROC PRINT Output
Seat ID
Year Pref F31351 1998
Aisle F212 1995 Window
F25122 1998 Aisle
111Exercise
- This exercise reinforces the concepts discussed
previously.
112Exercises
The Pizza Company tracks feedback from customers.
The person who recorded the data misspelled the
word received in the data set Complaint. Write a
program to correct this data mistake. Create a
new data set called Complaints2 that contains the
corrected data. Create a list report to view the
results.
113Exercises Solution
libname apex 'SAS-directory' data complaints2
set apex.complaint Complaint
tranwrd(Complaint, 'recieve', 'receive') run pro
c print datacomplaints2 run
114Exercises Output
115Exercises
- The SAS Snack Company wants to create categories
for snacks. Using the Snacks data set, create a
temporary SAS data set named Snack_new. - Create a variable called Category. The value will
depend on the product and should be one of the
following Chips, Pretzels, Pretzel Sticks,
Popcorn, Pork Rinds, Crackers. or Puffs. (Note
Pretzel Sticks should go to the Pretzel Category,
as well as other snacks with Pretzel in the
name.) - Create a variable called TotalSales that is the
product of the quantity sold and the price.
Format the value with dollar signs and two
decimal places. - Create a frequency report of Category to make
sure that all the rows are mapped correctly. - Create a list report to view the results.
116Exercises - Solution
data snack_new set snacks length Category
12 if index(lowcase(Product), 'chip') gt0
then Category "Chips" else if
index(lowcase(Product), 'pretzel') gt0 then
Category "Pretzels" else if
index(lowcase(Product), 'stick') gt0 then Category
"Sticks" else if index(lowcase(Product),
'popcorn') gt0 then Category "Popcorn" else
if index(lowcase(Product), 'pork rinds') gt0 then
Category "Pork Rinds" else if
index(lowcase(Product), 'cracker') gt0 or
index(lowcase(Product), 'saltine') gt0 then
Category "Crackers" else if
index(lowcase(Product), 'puffs') gt0 then Category
"Puffs" TotalSales QtySoldPrice
format TotalSales dollar20.2 run proc freq
datasnack_new tables category run
117Exercises - Output
The FREQ Procedure
Cumulative Cumulative
Category Frequency Percent Frequency
Percent ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ
ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ Chips
11242 31.43 11242 31.43
Crackers 5110 14.29
16352 45.71 Popcorn 4088
11.43 20440 57.14 Pork
Rinds 3066 8.57 23506
65.71 Pretzels 3066 8.57
26572 74.29 Puffs
3066 8.57 29638 82.86
Sticks 6132 17.14 35770
100.00
118Exercises
As part of the financial aid process, MIT helps
students identify scholarship opportunities from
external sources. Use the People data set to
create the temporary Prairie data set. Use the
appropriate function to search through values of
CityState to identify only those applicants from
Illinois.
119Exercises
data prairie set prog2.people if
index(CityState,' IL') gt 0 run proc print
dataPrairie run
The SAS System Obs
Name CityState 1 LAFF,
STANLEY X. SPRINGFIELD, IL 2 RIZEN,
GEORGE Q. CHICAGO, IL 3 MITCHELL,
MARC J. CHICAGO, IL
120Exercises
- In order to mail the scholarship information to
the proper applicants, MIT need the address
information in the proper format for a letter. - Use the variable Name from Prairie to create a
data set called mixedprairie that contains the
values of Name. - Convert Name from uppercase to mixed case.
121Exercises
data mixedprairie set prairie Name
propcase(Name) run proc print
datamixedprairie run
122Section 12.3
- Manipulating Numeric Values
123Objectives
- Use SAS functions to truncate numeric values.
- Use SAS functions to compute sample statistics of
numeric values.
124Truncation Functions
- Selected functions that truncate numeric values
include - ROUND function
- CEIL function
- FLOOR function
- INT function.
125The ROUND Function
- The ROUND function returns a value rounded to the
nearest round-off unit. - If round-off-unit is not provided, argument is
rounded to the nearest integer.
NewVarROUND(argumentlt,round-off-unitgt)
126The ROUND Function
data truncate NewVar1round(12.12)
NewVar2round(42.65,.1) NewVar3round(6.478,.0
1) NewVar4round(96.47,10) run
...
127The ROUND Function
data truncate NewVar1round(12.12)
NewVar2round(42.65,.1) NewVar3round(6.478,.0
1) NewVar4round(96.47,10) run
...
128The ROUND Function
data truncate NewVar1round(12.12)
NewVar2round(42.65,.1) NewVar3round(6.478,.0
1) NewVar4round(96.47,10) run
...
129The ROUND Function
data truncate NewVar1round(12.12)
NewVar2round(42.65,.1) NewVar3round(6.478,.0
1) NewVar4round(96.47,10) run
130The CEIL Function
- The CEIL function returns the smallest integer
greater than or equal to the argument.
NewVarCEIL(argument)
X
4.4
xceil(4.4)
x5
131The FLOOR Function
- The FLOOR function returns the greatest integer
less than or equal to the argument.
NewVarFLOOR(argument)
X
3.6
yfloor(3.6)
y3
132The INT Function
- The INT function returns the integer portion of
the argument.
NewVarINT(argument)
zint(3.9)
z3
133Truncation Functions
data truncate Var16.478
NewVar1ceil(Var1) NewVar2floor(Var1)
NewVar3int(Var1) run
...
134Truncation Functions
data truncate Var16.478
NewVar1ceil(Var1) NewVar2floor(Var1)
NewVar3int(Var1) run
...
135Truncation Functions
data truncate Var16.478
NewVar1ceil(Var1) NewVar2floor(Var1)
NewVar3int(Var1) run
136Truncation Functions
Use the same functions with a negative value for
the variable Var1. For values greater than
0, the FLOOR and INT functions return the same
value. For values less than 0, the CEIL and INT
functions return the same value.
data truncate Var1-6.478
NewVar1ceil(Var1) NewVar2floor(Var1)
NewVar3int(Var1) run
137Functions That Compute Statistics
- Selected functions that compute sample statistics
based on a group of values include the following - SUM function (total of values)
- MEAN function (average of values)
- MIN function (lowest value)
- MAX function (highest value)
- These functions
- accept multiple arguments in any order
- use the same algorithm as SAS statistical
procedures - ignore missing values.
138Functions That Compute Statistics
- The SUM function adds values together and ignores
missing values. - The MIN function returns the smallest non-missing
value. - The MAX function returns the largest value.
NewVarSUM(argument-1,argument-2,,argument-n)
MIN(argument-1, argument-2,, argument-n)
MAX(argument-1, argument-2,, argument-n)
139The SUM Function
data summary Var112 Var2. Var36
NewVarsum(Var1,Var2,Var3)run
...
140The SUM Function
data summary Var112 Var2. Var36
NewVarsum(Var1,Var2,Var3)run
What would be the value of NewVar if an
arithmetic operator were used instead of the SUM
function?
...
141The SUM Function
data summary Var112 Var2. Var36
NewVarsum(Var1,Var2,Var3)run
What would be the value of NewVar if an
arithmetic operator were used instead of the SUM
function? Missing
142The MEAN Function
- The MEAN function calculates the arithmetic mean
(average) of values and ignores missing values.
NewVarMEAN(argument-1,argument-2,,argument-n)
143The MEAN Function
data summary Var112 Var2. Var36
NewVarmean(Var1,Var2,Var3)run
144Exercise
- This exercise reinforces the concepts discussed
previously.
145Exercises
Final grades are coming. Use the data set Grade
to create a data set named Final. The Final data
set should contain a variable named Overall that
is the semester average grade. Calculate
Overall by averaging all the tests plus the
final. The final is weighted twice as much as any
of the other tests. (Count the final twice when
calculating Overall.) Round Overall to the
nearest integer.
146Exercises
data final set prog.grade
Overallround(mean(Test1,Test2,Test3,Final,Final))
run proc print datafinal run
Alternate Solution
data final set prog.grade
Overallround(mean(of Test1-Test3,Final,Final)) p
roc print datafinal run
147Exercises
The SAS System
SSN Course Test1 Test2 Test3
Final Overall 012-40-4928 BUS450 80
70 80 80 78 012-83-3816
BUS450 90 90 60 80
80 341-44-0781 MATH400 78 87 90
91 87 423-01-7721 BUS450 80
70 75 95 83 448-23-8111
MATH400 88 91 100 95
94 723-14-8422 HIST100 88 90 91
95 92 819-32-1294 HIST100 67
80 60 70 69 831-34-2411
MATH400 72 76 82 79
78 837-33-8374 HIST100 90 99 87
96 94 877-22-7731 MATH400 87
85 80 78 82 880-90-0783
HIST400 50 70 78 80
72 920-22-0209 MATH400 79 87 81
82 82 973-34-2119 BUS450 80
75 88 90 85 877-22-7731
SCI400 80 70 80 80
78 012-40-4928 FRENCH100 80 70 80
80 78 819-32-1294 FRENCH100 67
80 60 70 69 819-32-1294
BUS450 67 80 60 70
69 723-14-8422 SCI400 79 87 81
82 82 837-33-8374 SCI400 79
87 81 82 82
148Exercises
Final grades are coming. Modify the DATA step in
the previous exercise so that the value of
Overall is the average of the two highest test
scores and the final. (The lowest test should not
be used to calculate Overall.) As before, the
final exam should be counted twice. Round
Overall to the nearest integer.
149Exercises
data final(dropOverallTotal) set
prog2.grade OverallTotalsum(Test1,Test2,Test3
,Final,Final)- min(Test1,Test2,Test3)
Overallround(OverallTotal/4) run proc print
datafinal run
Alternate Solution
data final(dropOverallTotal) set
prog2.grade OverallTotalsum(of
Test1-Test3,Final,Final)-
min(of Test1-Test3) Overallround(OverallTotal
/4) proc print datafinal run
150Exercises
The SAS System Obs SSN Course
Test1 Test2 Test3 Final Overall 1
012-40-4928 BUS450 80 70 80 80
80 2 012-83-3816 BUS450 90 90
60 80 85 3 341-44-0781 MATH400
78 87 90 91 90 4
423-01-7721 BUS450 80 70 75 95
86 5 448-23-8111 MATH400 88 91
100 95 95 6 723-14-8422 HIST100
88 90 91 95 93 7
819-32-1294 HIST100 67 80 60 70
72 8 831-34-2411 MATH400 72 76
82 79 79 9 837-33-8374 HIST100
90 99 87 96 95 10
877-22-7731 MATH400 87 85 80 78
82 11 880-90-0783 HIST400 50 70
78 80 77 12 920-22-0209 MATH400
79 87 81 82 83 13
973-34-2119 BUS450 80 75 88 90
87 14 877-22-7731 SCI400 80 70
80 80 80 15 012-40-4928 FRENCH100
80 70 80 80 80 16
819-32-1294 FRENCH100 67 80 60 70
72 17 819-32-1294 BUS450 67 80
60 70 72 18 723-14-8422 SCI400
79 87 81 82 83 19
837-33-8374 SCI400 79 87 81 82
83
151Section 12.4
- Manipulating Numeric Values Based on Dates
152Objectives
- Review SAS functions used to create SAS date
values. - Review SAS functions used to extract information
from SAS date values. - Use SAS functions to determine the intervals
between two SAS date values.
153Creating SAS Date Values
- You can use the MDY or TODAY functions to create
SAS date values. - The MDY function creates a SAS date value from
month, day, and year values. - The TODAY function returns the current date as a
SAS date value.
NewDateMDY(month,day,year)
NewDateTODAY()
154Extracting Information
- You can use the MONTH, DAY, and YEAR functions
to extract information from SAS date values. - The MONTH function creates a numeric value (1-12)
that represents the month of a SAS date value.
NewMonthMONTH(SAS-date-value)
continued...
155Extracting Information
- The DAY function creates a numeric value (1-31)
that represents the day of a SAS date value. - The YEAR function creates a four-digit numeric
value that represents the year.
NewDayDAY(SAS-date-value)
NewYearYEAR(SAS-date-value)
continued...
156Calculating an Interval of Years
- The YRDIF function returns the number of years
between two SAS date values.
NewValYRDIF(sdate,edate,basis)
continued...
157Calculating an Interval of Years
'ACT/ACT' Actual number of days between dates in calculating the number of years. of days that fall in a 365 year/365 plus of days in a 366 year/366. ACTUAL is the alias.
'30/360' Each month is considered to have 30 days and each year 360 days regardless of the actual number. '360' is the alias.
'ACT/360' Actual number of days between dates in calculating the number of years regardless of the actual number of days in a year.
'ACT/365' Actual number of days between dates to calculate years. Number of days divided by 365 regardless of the actual number in a year.
158The YRDIF Function
- The variable DOB represents a person's date of
birth. Assume today's date is May 3, 2008, and
DOB is November 8, 1972. What is this person's
age? - The DATDIF function can be used to return the
number of days between two SAS date values. - Only two basis values are valid for the DATDIF
function - 'ACT/ACT' and '30/360')
MyValyrdif(DOB,'3may2008'd,'act/act')
159The YRDIF Function
- The variable DOB represents a person's date of
birth. Assume today's date is May 3, 2008, and
DOB is November 8, 1972. What is this person's
age?
MyValyrdif(DOB,'3may2008'd,'act/act')
...
160The YRDIF Function
- The variable DOB represents a person's date of
birth. Assume today's date is May 3, 2008, and
DOB is November 8, 1972. What is this person's
age? - How can you alter this program to
- compute each employee's age based on today's date
- truncate all of the decimal places without
rounding?
MyValyrdif(DOB,'3may2008'd,'act/act')
...
161The YRDIF Function
- How can you alter this program to
- compute each employee's age based on today's date
- truncate all of the decimal places without
rounding? - ? This code was run on July 13, 2008. Your values
will differ.
int(yrdif(DOB, today(),'act/act'))
162Exercise
- This exercise reinforces the concepts discussed
previously.
163Exercises
- In order to vote in most states, voters must be
18 years of age by the date of the election. - Use the Register data set to create two new data
sets called Voters and NonVoters. - Use the existing BirthMonth, Day, and BirthYear
to create a new variable called Birthday that
stores the SAS date value for each voter's birth
date. - Create a second new variable called Age that
stores the number of years between each voter's
birthday and today. - The values of Birthday should be displayed with
the DATE9. format. The value of Age should be
truncated to remove all decimals without
rounding. - Produce two listing reports with appropriate
titles.
164Exercises
data voters nonvoters keep StudentId Name
Birthday Age set prog.register
Birthdaymdy(BirthMonth,Day,BirthYear) /
The FLOOR function could be used in the
following assignment statement. /
Ageint(yrdif(Birthday,today(),'act/act'))
format Birthday date9. if Age gt 18 then
output voters else output nonvoters run titl
e 'Students who are over 18 and can vote' proc
print datavoters run title 'Students who are
not over 18 and cannot vote' proc print
datanonvoters run
165Exercises
Partial Listing of Voters
Students who are over 18 and can vote
Student Obs ID Name
Birthday Age 1 1155 Angel
Reisman 23JUN1987 19 2 1266
Melanie Michaels 17MAR1988 18 3
2055 Faith Sadowski 06FEB1988 18
4 2561 Dorothy Gilbert 16APR1988
18 5 2584 Patrice Ray
18AUG1988 18 6 2587 Jeremiah
Ashford 26SEP1987 19 7 2600
Alisha Gurman 21DEC1987 18 8
2606 Gustavo Spencer 16SEP1988 18
9 2681 Ryan Lin 03MAR1988
18 10 3213 Thomas Gladwell
09JUN1988 18 11 3250 Misty Orlowski
22OCT1987 19 12 3456 Ruby
Abdul 09JUN1988 18
? Your output will contain a different number of
rows.
166Exercises
Partial Listing of NonVoters
Students who are not over 18 and cannot
vote Student Obs ID Name
Birthday Age 1 1005
Chaz Richardson 21JUN1989 17 2
1154 Barbara Muir 04APR1990 16
3 1245 Leticia Ritter 27MAR1990
16 4 1257 Richard Calle
01SEP1989 17 5 1258 Ronnie Trimpin
03OCT1989 17 6 2001 Troy
Pruska 05AUG1989 17 7 2006
Annie Ritter 25FEB1990 16 8
2046 Derrick Ikiz 05MAR1989 17
9 2334 Jesse Liu 03AUG1991
15 10 2335 Taylor Lowet
07AUG1989 17 . . .
? Your output will contain a different number of
rows.
167Section 12.5
168Objectives
- Describe the automatic conversion of character
data into numeric data. - Explicitly convert character data into numeric
data. - Describe the automatic conversion of numeric data
into character data. - Explicitly convert numeric data into character
data.
169Data Conversion
- In many applications, you might need to convert
one data type to another. - You might need to read digits in character form
into a numeric value. - You might need to write a numeric value to a
character string.
170Data Conversion
- You can convert data types by using one of the
following methods - implicitly by enabling SAS to do it for you
- explicitly with these functions
- INPUT character-to-numeric conversion
- PUT numeric-to-character conversion
171Automatic Character-to-Numeric Conversion
- The prog2.salary1 data set contains a character
variable GrossPay. Compute a 10-percent bonus
for each employee. - What will happen when the character values of
GrossPay are used in an arithmetic expression?
172Automatic Character-to-Numeric Conversion
ID GrossPay 11
5 201-92-2498 52000
482-87-7945 32000 330-40-7172 49000
data bonuses set prog2.salary1
Bonus.10GrossPayrun
173Automatic Character-to-Numeric Conversion
2 data bonuses 3 set prog2.salary1 4
Bonus.10GrossPay 5 run NOTE
Character values have been converted
to numeric values at the places
given by (Line)(Column). 414 NOTE The
data set WORK.BONUSES has 3 observations
and 3 variables.
174Automatic Character-to-Numeric Conversion
- proc print databonuses noobs run
PROC PRINT Output
ID GrossPay Bonus 201-92-2498
52000 5200 482-87-7945 32000
3200 330-40-7172 49000 4900
175Automatic Character-to-Numeric Conversion
- SAS automatically converts a character value to a
numeric value when the character value is used in