Title: SAS Programming: Working With Variables
1. SAS Programming: Working With Variables
2. Data Step Manipulations
- New variables should be created during a DATA step
- Existing variables should be manipulated during a DATA step
3. Missing Values in SAS
- SAS uses a period (.) to represent missing numeric values in a SAS data set
- Different SAS procedures and functions treat missing values differently. Always be careful when your SAS data set contains missing values
4. Working With Numeric Variables
- SAS uses the standard arithmetic operators
- +, -, *, /, ** (exponentiation)
- Note on missing values: arithmetic operators propagate missing values.
- SAS has many built-in numeric functions
- round(variable, value): Rounds variable to the nearest unit given by value.
- sum(variable1, variable2, ...): Adds any number of variables and ignores missing values
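The difference between the + operator and the SUM function shows up as soon as one value is missing. The following is a minimal sketch; the data set name demo and the variable names are made up for illustration.

```sas
data demo;
    a = 12.347;
    b = .;                        /* a missing numeric value */
    total_plus = a + b;           /* missing: + propagates the missing value */
    total_sum  = sum(a, b);       /* 12.347: SUM ignores the missing argument */
    a_rounded  = round(a, 0.1);   /* 12.3: rounded to the nearest 0.1 */
    a_squared  = a ** 2;          /* exponentiation */
run;

proc print data=demo;
run;
```

When a + b is evaluated, SAS also writes a note to the log saying that missing values were generated.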
5. Acting on Selected Observations
- Working with selected observations (subsets of a SAS data set) is easy in SAS
- First, you must decide on a selection process. What is the distinguishing characteristic of the observations you want to work with?
6. Selecting Observations: IF-THEN Statements
- The IF-THEN statement is the most common way to select observations. Format:
- IF condition THEN action;
- condition is one or more comparisons. For any observation, condition is either true or false. If condition is true, SAS performs the action.
7. IF-THEN Statement Example
- Suppose INC is a variable representing annual household income and you want to create a dummy variable, DUM, based on income that takes value 1 when income is less than 10,000 and value 0 otherwise.
- IF INC < 10000 THEN DUM = 1;
- IF INC >= 10000 THEN DUM = 0;
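A small runnable version of this example, using made-up income values read in with DATALINES (the data set name income is an assumption). It also shows one thing to watch for: SAS treats a missing numeric value as smaller than any number, so a household with missing INC satisfies INC < 10000.

```sas
data income;
    input inc;
    if inc < 10000 then dum = 1;
    if inc >= 10000 then dum = 0;
    /* A missing INC compares as less than 10000, so reset DUM explicitly */
    if inc = . then dum = .;
    datalines;
8500
10000
25000
.
;
run;

proc print data=income;
run;
```

Without the third IF statement, the household with missing income would be coded DUM = 1.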
8. Using OBS in condition
- In a SAS data set, each record has an observation number, which is the number shown in the OBS column of printed output
- The observation number can be used in a condition, but in a DATA step you must refer to it using the automatic variable _n_
- Example: set the first 10 observations of INC equal to zero
- IF _n_ <= 10 THEN INC = 0;
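As a sketch of how _n_ behaves, the program below builds a hypothetical 15-observation data set (incomes) and then zeroes INC for the first 10 records. With a single SET statement, _n_ counts DATA step iterations and so matches the observation number.

```sas
/* Hypothetical input: 15 observations with INC = 1000, 2000, ... */
data incomes;
    do i = 1 to 15;
        inc = 1000 * i;
        output;
    end;
    drop i;
run;

data incomes_zeroed;
    set incomes;
    if _n_ <= 10 then inc = 0;   /* observations 1 through 10 */
run;

proc print data=incomes_zeroed;
run;
```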
9. Comparison Operators
- There are 6 comparison operators
- Can use either the symbol or mnemonic
- Symbol   Mnemonic   Meaning
- =        EQ         Equal to
- ^=       NE         Not equal to
- >        GT         Greater than
- <        LT         Less than
- >=       GE         Greater than or equal to
- <=       LE         Less than or equal to
10. Multiple Comparisons
- Can make more than one comparison in condition by using AND/OR
- AND (or &): All parts must be true for condition to be true
- OR (or |): At least one part must be true for condition to be true
- Be careful when using AND/OR
- Can use parentheses in condition
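One reason for care: SAS evaluates AND before OR, so the same comparisons can give different answers depending on where the parentheses go. A minimal sketch with one made-up record (the variables INC, SEX, and EDUC here are illustrative):

```sas
data flags;
    input inc sex educ;
    /* Without parentheses: read as inc < 10000 OR (sex = 1 AND educ = 4) */
    flag_a = (inc < 10000 or sex = 1 and educ = 4);
    /* With parentheses: (inc < 10000 OR sex = 1) AND educ = 4 */
    flag_b = ((inc < 10000 or sex = 1) and educ = 4);
    datalines;
8000 0 2
;
run;

proc print data=flags;
run;
```

For this record flag_a is 1 (the income comparison alone is true) while flag_b is 0 (EDUC is not 4).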
11. Selecting Observations for New SAS Data Sets
- Can use IF-THEN statements to create new SAS data sets
- Either delete or keep selected observations based on condition
12. Deleting Observations
- Format for IF-THEN:
- IF condition THEN DELETE;
- Example: removing observations with missing values. Suppose the variable INC is missing for some households and you want to drop these observations
- IF INC = . THEN DELETE;
13. Keeping Selected Observations
- A more straightforward way to create new SAS data sets is to keep only those observations that meet some condition. Format:
- IF condition;
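The two approaches can be compared side by side. This sketch uses a made-up data set (households) with one missing income; the data set names are assumptions for illustration.

```sas
data households;
    input inc;
    datalines;
12000
.
8000
;
run;

/* Deleting: drop the records where INC is missing */
data nonmissing;
    set households;
    if inc = . then delete;
run;

/* Keeping: a subsetting IF keeps only records that meet the condition */
data low_income;
    set households;
    if inc ne . and inc < 10000;
run;
```

Here nonmissing keeps two records and low_income keeps one (INC = 8000). The INC NE . check is needed because a missing value would otherwise pass INC < 10000.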
14. Example
- The file salary.dat contains data for 93 employees of a Chicago bank. The file contains the following variables:
- Y = Salary
- X = Years of education
- E = Months of previous work experience
- T = Number of months after 1/1/69 that the individual was hired
- First 61 observations are females, last 32 males
15. Example: Create Dummy for Males
/* Program to create dummy variables and new SAS data sets */
data salary;
    infile 's:\mysas\salary.dat';
    input y x e t;
    IF _n_ > 61 THEN G = 1;    /* observations 62-93 are males */
    IF _n_ <= 61 THEN G = 0;   /* observations 1-61 are females */
run;
16. Example: Create Data Set for Males
/* Make a new SAS data set composed of only records for males */
data males;          /* New SAS data set */
    set salary;      /* Created from salary */
    IF G = 1;
run;
17. Example: Create Data Set for Females
/* Make a new SAS data set composed of only records for females */
data females;        /* New SAS data set */
    set salary;      /* Created from salary */
    IF G = 0;
run;
18. Describing Data: Sample Statistics
- Format:
- PROC UNIVARIATE <option-list>;
- VAR variable-list;
- BY variable-list;
- FREQ variable;
- WEIGHT variable;
19. Selected Options
- DATA=SAS-data-set: Specify data set. If omitted, uses the most recent SAS data set
- FREQ: Generate frequency table
- NOPRINT: Suppress printed output
20. VAR Statement
- List of variables to calculate sample statistics for.
- If no variables are specified, sample statistics are generated for all numeric variables
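A minimal sketch of a PROC UNIVARIATE call that combines the DATA= and FREQ options with a VAR statement. The data set earnings and its values are made up for illustration.

```sas
data earnings;
    input earn age;
    datalines;
42000 35
38000 41
51000 29
45000 52
;
run;

/* DATA= names the data set, FREQ requests a frequency table,
   and VAR limits the analysis to EARN (AGE is skipped) */
proc univariate data=earnings freq;
    var earn;
run;
```

Dropping the VAR statement would produce statistics for both EARN and AGE.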
21. WEIGHT Statement
- Specifies a numeric variable in the SAS data set
whose values are used to weight each observation
22. BY Statement
- Can be used to obtain separate analyses on observations in groups defined by some value of a variable.
- Example: Suppose SEX=1 if individual is male, SEX=0 if individual is female; EARN=annual earnings.
- The following generates statistics on earnings for men and women:
- PROC UNIVARIATE;
- VAR EARN;
- BY SEX;
- RUN;
23. BY Statements and Sorting
- Before using a BY statement, the SAS data set must be sorted on the variable specified
- SAS puts the observations in order, based on the values of the variables specified in the BY statement.
- Use PROC SORT
24. PROC SORT
- FORMAT:
- PROC SORT <options>;
- BY <options> variables;
- Sort order: ascending. For descending, put DESCENDING before the variable name on the BY line
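Putting the sorting requirement together with a BY-group analysis might look like the sketch below. The data set people and its values are made up; OUT= is a standard PROC SORT option that writes the sorted records to a new data set instead of overwriting the input.

```sas
data people;
    input sex earn;
    datalines;
1 42000
0 38000
1 51000
0 45000
;
run;

/* Sort by SEX first so the BY statement below is valid */
proc sort data=people;
    by sex;
run;

proc univariate data=people;
    var earn;
    by sex;
run;

/* Descending order: DESCENDING goes before the variable it applies to */
proc sort data=people out=people_by_earn;
    by descending earn;
run;
```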
25. Describing Data: Frequencies
- FORMAT:
- PROC FREQ <options>;
- BY variables;
- TABLES requests </options>;
- WEIGHT variable;
26. One-Way Frequency Table
- SEX=1 (Male), SEX=0 (Female)
- EDUCATION=1 (Less than High School), 2 (High School), 3 (Some College), 4 (College grad.)
- EARN=Annual Earnings
- PROC FREQ;
- TABLES EDUCATION;
- RUN;
- PROC FREQ;
- TABLES EDUCATION;
- BY SEX;
- RUN;
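A runnable sketch of both requests, using a small made-up data set (survey) coded as described above. The data are sorted by SEX before the BY-group request.

```sas
data survey;
    input sex education;
    datalines;
1 1
1 4
0 2
0 3
1 2
0 4
;
run;

/* One-way frequency table of EDUCATION for everyone */
proc freq data=survey;
    tables education;
run;

/* Separate tables for men and women: sort on SEX first */
proc sort data=survey;
    by sex;
run;

proc freq data=survey;
    tables education;
    by sex;
run;
```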