An Introduction to SAS Character Functions including some new SAS 9 functions - PowerPoint PPT Presentation

1 / 42
About This Presentation
Title:

An Introduction to SAS Character Functions including some new SAS 9 functions

Description:

Use the LENGTH statement as the very first statement in the DATA ... Note: You can specify delimiters for indexw in a third argument. Changing Case. Data case; ... – PowerPoint PPT presentation

Number of Views:272
Avg rating:3.0/5.0
Slides: 43
Provided by: RonAl93
Category:

less

Transcript and Presenter's Notes

Title: An Introduction to SAS Character Functions including some new SAS 9 functions


1
An Introduction to SAS Character Functions
(including some new SAS 9 functions)
  • Ron Cody, Ed.D.

2
Some Functions We Will Discuss
  • LENGTH
  • SUBSTR
  • COMPBL
  • COMPRESS
  • VERIFY
  • INPUT
  • PUT
  • TRANWRD
  • SCAN
  • TRIM
  • UPCASE
  • LOWCASE
  • INDEX
  • INDEXC
  • INDEXW
  • SPEDIS
  • LENGTH

3
Some SAS 9 Functions
  • ANYDIGIT
  • ANYALPHA
  • NOTDIGIT
  • NOTALPHA
  • CATX and CATS
  • COMPARE
  • LENGTHC
  • LENGTHN
  • STRIP
  • COUNT
  • COUNTC
  • PROPCASE
  • FIND
  • FINDC

4
Character Storage Lengths
  • data chars1
  • length string 7
  • string 'abc'
  • length length(string)
  • storage_length lengthc(string)
  • display "" string ""
  • put storage_length /
  • length /
  • display
  • run

5
SAS Log
  • 11 data chars1
  • 12 length string 7
  • 13 string 'abc'
  • 14 storage_length lengthc(string)
  • 15 length length(string)
  • 16 display "" string ""
  • 17 put storage_length /
  • 18 length /
  • 19 display
  • 20 run
  • storage_length7
  • length3
  • displayabc

6
Moving the LENGTH Statement
  • data chars2
  • string 'abc'
  • length string 7
  • storage_length lengthc(string)
  • length length(string)
  • display "" string ""
  • put storage_length /
  • length /
  • display
  • run

7
SAS Log
  • 1 data chars2
  • 2 string 'abc'
  • 3 length string 7
  • WARNING Length of character variable string has
    already been set. Use the LENGTH statement as the
    very first statement in the DATA STEP to declare
    the length of a character variable.
  • 4 storage_length lengthc(string)
  • 5 length length(string)
  • 6 display "" string ""
  • 7 put storage_length /
  • 8 length /
  • 9 display
  • 10 run
  • storage_length3
  • length3
  • displayabc

8
The INPUT Function
  • data special
  • INPUT is a special function often used
  • for character to numeric conversion
  • length c_date 10 numeral 3
  • input c_date numeral
  • sas_date input(c_date,mmddyy10.)
  • number input(numeral,3.)
  • datalines
  • 11/12/1950 123
  • 9-15-2004 99

Listing of Data Set SPECIAL c_date numeral
sas_date number 11/12/1950 123 -3337
123 9-15-2004 99 16329 99
9
The PUT Function
  • data special
  • PUT is a special function often used
  • for numeric to character conversion
  • input sas_date number ss
  • c_date put(sas_date,date9.)
  • money put(number,dollar8.)
  • ss_char put(ss,ssn.)
  • datalines
  • 0 1234 123456789

Listing of Data Set SPECIAL sas_date number
ss c_date money ss_char 0
1234 123456789 01JAN1960 1,234 123-45-6789
10
Converting Multiple Blanks to a Single Blank
  • data multiple
  • input 1 _at_1 Name 20.
  • 2 _at_1 Address 30.
  • 3 _at_1 City 15.
  • _at_20 State 2.
  • _at_25 Zip 5.
  • name compbl(name)
  • address compbl(address)
  • city compbl(city)
  • datalines
  • Ron Cody
  • 89 Lazy Brook Rd.
  • Flemington NJ 08822
  • Bill Brown
  • 28 Cathy Street
  • North City NY 11518

Multiple Name Address Ron
Cody 89 Lazy Brook Rd. Bill Brown 28
Cathy Street City State Zip
Flemington NJ 08822 North City NY
11518
11
How to Remove Characters from a String
  • data phone
  • input phone 15.
  • phone1 compress(phone)
  • phone2 compress(phone,'(-) ')
  • datalines
  • (908)235-4490
  • (201) 555-77 99

Phone phone phone1
phone2 (908)235-4490 (908)235-4490
9082354490 (201) 555-77 99 (201)555-7799
2015557799
12
Another COMPRESS Example
  • data social
  • input ss_char 11.
  • ss input(compress(ss_char,'-'),9.)
  • easy_ss input(ss_char,comma11.)
  • datalines
  • 123-45-6789

ss 123456789 (numeric) ss_easy 123456789
(numeric)
13
Compress Function (SAS 9 changes)
  • COMPRESS (char_value lt, comp_stringgt
    lt,modifiersgt)
  • char_value is a SAS character value
  • comp_string is a character value containing the
    characters to remove from char_value.
  • modifiers add additional characters to the list
    of
  • characters to remove or modify the way the
    function works (see next slide).

14
Compress Function Modifiers (SAS 9)
  • Selected list of COMPRESS modifiers (upper- or
    lowercase)
  • a adds upper- and lowercase letters
  • d adds numerals (digits)
  • i ignores case
  • k keeps listed characters instead of removing
    them
  • s adds space (blank, tabs, lf, cr) to the list
  • p adds punctuation

15
Examples
For these examples, char "A C123XYZ" , phone
"(908) 777-1234"
16
Using the Compress Modifiers
  • data phone
  • input phone 15.
  • number compress(phone,,'kd')
  • datalines
  • (908)235-4490
  • (201) 555-77 99
  • Listing of Data Set PHONE
  • phone number
  • (908)235-4490 9082354490
  • (201) 555-77 99 2015557799

17
The VERIFY Function
  • data verify
  • input _at_1 id 3.
  • _at_5 answer 5.
  • position verify(answer,'abcde')
  • datalines
  • 001 acbed
  • 002 abxde
  • 003 12cce
  • 004 abc e

Verify id answer position 001 acbed
0 002 abxde 3 003 12cce
1 004 abc e 4
18
Watch Out for Trailing Blanks
  • data trailing
  • length string 10
  • string 'abc'
  • position verify(string,'abcde')
  • run

String 'abc ' Position 4 (the position
of the first trailing blank)
19
Watch Out for Trailing Blanks
  • data trailing
  • length string 10
  • string 'abc'
  • position
  • verify(trim(string),'abcde')
  • run

Position 0
20
Using VERIFY for Data Cleaning
  • data clean
  • input id
  • Valid ID's contain letters X,Y, or Z
  • and digits
  • if verify(trim(id),'XYZ0123456789') eq 0
  • then valid 'Yes'
  • else valid 'No'
  • datalines
  • 12X67YZ
  • 67WXYZ

Listing of Data Set CLEAN id
valid 12X67YZ Yes 67WXYZ No
21
Substring Example
  • data pieces_parts
  • input Id 9.
  • length State 2
  • state substr(Id,3,2)
  • Num input(substr(Id,5),4.)
  • datalines
  • XYNY123
  • XYNJ1234

Listing of Data Set PIECES_PARTS Id
State Num XYNY123 NY 123 XYNJ1234
NJ 1234
22
The SUBSTR Function on the Left-Hand Side of the
Equal Sign
  • data pressure
  • input sbp dbp _at__at_
  • length sbp_chk dbp_chk 4
  • sbp_chk put(sbp,3.)
  • dbp_chk put(dbp,3.)
  • if sbp gt 160 then
  • substr(sbp_chk,4,1) ''
  • if dbp gt 90 then
  • substr(dbp_chk,4,1) ''
  • datalines
  • 120 80 180 92 200 110

23
The SUBSTR Function on the Left-Hand Side of the
Equal Sign
  • Listing of Data Set PRESSURE
  • sbp dbp sbp_chk dbp_chk
  • 120 80 120 80
  • 180 92 180 92
  • 200 110 200 110

24
Parsing a String
  • data take_apart
  • input _at_1 Cost 10.
  • Integer input(scan(Cost,1,' /'),8.)
  • Num input(scan(Cost,2,' /'),8.)
  • Den input(scan(Cost,3,' /'),8.)
  • if missing(Num) then Amount Integer
  • else Amount Integer Num/Den
  • datalines
  • 1 3/4
  • 12 1/2
  • 123

Listing of Data Set TAKE_APART Cost Integer
Num Den Amount 1 3/4 1 3
4 1.75 12 1/2 12 1 2
12.50 123 123 . . 123.00
25
Using the SCAN Function to Extract a Last Name
  • data first_last
  • length last_name 15
  • input _at_1 name 20.
  • _at_22 phone 13.
  • extract the last name from name
  • last_name scan(name,-1,' ')
  • minus value scans from the right
  • datalines
  • Jeff W. Snoker (908)782-4382
  • Raymond Albert (732)235-4444
  • Alfred Edward Newman (800)123-4321
  • Steven J. Foster (201)567-9876
  • Jose Romerez (516)593-2377

26
Using the SCAN Function to Extract a Last Name
  • Names and Phone Numbers in Alphabetical Order (by
    Last Name)
  • Name Phone Number
  • Raymond Albert (732)235-4444
  • Steven J. Foster (201)567-9876
  • Alfred Edward Newman (800)123-4321
  • Jose Romerez (516)593-2377
  • Jeff W. Snoker (908)782-4382

27
Locating the Position of One String Within
Another String
  • data locate
  • input string 10.
  • first index(string,'xyz')
  • first_c indexc(string,'x','y','z')
  • /Equivalent indexc(string,'xyz') /
  • datalines
  • abczyx1xyz
  • 1234567890
  • abcx1y2z39
  • XYZabcxyz

string first first_c abczyx1xyz 8
4 1234567890 0 0 abcx1y2z39
0 4 XYZabcxyz 7 7
28
Locating the Position of One String Within
Another String
  • data locate
  • input string 10.
  • first find(string,'xyz','i')
  • first_c findc(string,'xyz','i')
  • / i means ignore case /
  • datalines
  • abczyx1xyz
  • 1234567890
  • abcx1y2z39
  • XYZabcxyz

string first first_c abczyx1xyz 8
4 1234567890 0 0 abcx1y2z39
0 4 XYZabcxyz 1 1
29
Locating One Word in a String Function INDEXW
  • data _null_
  • string 'anything goes any where'
  • index index(string,'any')
  • indexw indexw(string,'any')
  • put index indexw
  • run

index 1 indexw 15
Note You can specify delimiters for indexw in a
third argument
30
Changing Case
  • Data case
  • input name 15.
  • upper upcase(name)
  • lower lowcase(name)
  • proper propcase(name)
  • Datalines
  • gEOrge SMITH
  • The end

Listing of Data Set CASE name upper
lower proper gEOrge SMITH GEORGE
SMITH george smith George Smith The end
THE END the end The End
31
Substituting One Word for Another in a String
  • data convert
  • input _at_1 address 20.
  • Convert Street, Avenue and
  • Boulevard to their abbreviations
  • Address tranwrd(address,'Street','St.')
  • Address tranwrd(address,'Avenue','Ave.')
  • Address tranwrd(address,'Road','Rd.')
  • datalines
  • 89 Lazy Brook Road
  • 123 River Rd.
  • 12 Main Street

Listing of Data Set CONVERT Obs Address 1
89 Lazy Brook Rd. 2 123 River Rd. 3
12 Main St.
32
Spelling distance
  • data compare
  • length string1 string2 15
  • input string1 string2
  • points spedis(string1,string2)
  • datalines
  • same same
  • same sam
  • first xirst
  • last lasx
  • receipt reciept

Listing of Data Set COMPARE string1 string2
points same same 0 same
sam 8 first xirst 40 last
lasx 25 receipt reciept 7
33
The "ANY" Functions
  • data find_alpha_digit
  • input string 20.
  • first_alpha anyalpha(string)
  • first_digit anydigit(string)
  • datalines
  • no digits here
  • the 3 and 4
  • 123 456 789

Listing of Data Set FIND_ALPHA_DIGIT
first_ first_ string alpha
digit no digits here 1 0 the 3
and 4 1 5 123 456 789 0
1
34
The "NOT" FunctionsBeware of Trailing Blanks
  • length string 10
  • string '123'
  • position notdigit(string)
  • pos_trim notdigit(trim(string))
  • position 4 (position of first blank)
  • pos_trim 0

35
The "NOT" Functions
  • data data_cleaning
  • input string 20.
  • not_alpha notalpha(trim(string))
  • not_digit notdigit(trim(string))
  • datalines
  • abcdefg
  • 1234567
  • abc123
  • 1234abcd

Listing of Data Set DATA_CLEANING
not_ not_ string alpha digit abcdefg
0 1 1234567 1 0 abc123
4 1 1234abcd 1 5
36
Concatenation Functions
  • data join_up
  • length cats 6 catx 17
  • string1 'ABC '
  • string2 ' XYZ '
  • string3 '12345'
  • cats cats(string1,string2)
  • catx catx('',string1,string2,string3)
  • run

cats 'ABCXYZ' catx 'ABCXYZ12345
Without the length statement, cats and catx would
have a length of 200
37
Some LENGTH Functions
  • data how_long
  • one 'ABC '
  • miss ' ' / char missing value /
  • length_one length(one)
  • lengthn_one lengthn(one)
  • lengthc_one lengthc(one)
  • length_two length(miss)
  • lengthn_two lengthn(miss)
  • lengthc_two lengthc(miss)
  • run

3 3 6 1 0 1
38
The COMPARE Function
  • COMPARE(string1, string2 lt,'modifiers'gt)
  • I ignore case
  • L remove leading blanks
  • truncate the longer string to the length of the
    shorter string. The default is to pad the shorter
    string with blanks before a comparison.
  • (Note similar to the comparison operator)

If string1 and string2 are the same, COMPARE
returns a value of 0. If the arguments differ,
the sign of the result is negative if string1
precedes string2 in a sort sequence, and positive
if string1 follows string2 in a sort sequence
The magnitude of the result is equal to the
position of the leftmost character at which the
strings differ.
39
The COMPARE Function
  • data compare
  • input code _at__at_
  • value 'V30.450'
  • c1 compare(code,value)
  • c2 compare(code,value,'i')
  • c3 compare(trim(code),value,'i')
  • datalines
  • V30 V30.450 v30.4

Listing of Data Set COMPARE code value
c1 c2 c3 V30 V30.450 -4 -4
0 V30.450 V30.450 0 0 0 v30.4
V30.450 1 -6 0
40
The STRIP Function
  • data _null_
  • length concat 8
  • file print
  • one ' ABC '
  • two ' XYZ '
  • one_two '' one two ''
  • strip '' strip(one) strip(two)
    ''
  • concat cats('',one,two,'')
  • put one_two /
  • strip /
  • concat
  • run

one_two ABC XYZ stripABCXYZ concat
ABCXYZ
41
COUNT and COUNTC Functions
  • data Dracula / Get it Count Dracula /
  • input string 20.
  • count_abc count(string,'abc')
  • countc_abc countc(string,'abc')
  • count_abc_i count(string,'abc','i')
  • datalines
  • xxabcxABCxxbbbb
  • cbacba

Listing of Data Set DRACULA
count_ countc_ count_ string
abc abc abc_i xxabcxABCxxbbbb 1
7 2 cbacba 0
6 0
42
Contact Information
  • Author Ron Cody
  • You may download copies of the Powerpoint
    presentation from
  • www2.umdnj.edu/codyweb/biocomputing
Write a Comment
User Comments (0)
About PowerShow.com