SAS: ARRAY PROCESSING

Slides: 42
Provided by: elmj
Learn more at: http://people.musc.edu

1
SAS ARRAY PROCESSING
  • Jordan Elm

2
INTRODUCTION
  • Most mathematical and computer languages have
    some notation for repeating.
  • E.g., a matrix, a vector, a dimension, a table.
  • In a SAS data step, this structure is called an
    array.
  • An array is a group of variables defined in a data step.
  • Array elements don't need to be contiguous, the
    same length, or even related at all.
  • All elements of an array must be the same type:
    all character or all numeric.

3
Why do we need SAS arrays?
  • Use arrays to help read and analyze repetitive
    data with a minimum of coding.
  • An array and a loop can make the program smaller.
  • Examples of use:
  • Recoding variables (e.g., missing values coded as
    -999)
  • Applying the same computation to many variables
    (e.g., Fahrenheit to Celsius)
  • Computing new variables (e.g., continuous to
    binary)
  • Reshaping data (wide to long / long to wide)

4
Example: Applying the same computation to many
variables
  • For each record (row) there are 24 variables
    (temp1-temp24) with the temperatures for each
    hour of the day.
  • The temperatures are in Fahrenheit and need to be
    converted to Celsius.
  • Without arrays, each conversion is written out:
      data ...;
        input ...;
        celsius_temp1 = 5/9*(temp1 - 32);
        celsius_temp2 = 5/9*(temp2 - 32);
        . . .
        celsius_temp24 = 5/9*(temp24 - 32);
      run;

5
Define arrays
  • Define arrays and use a loop:
      data ...;
        input ...;
        array temperature_array {24} temp1-temp24;
        array celsius_array {24} celsius_temp1-celsius_temp24;
        do i = 1 to 24;
          celsius_array{i} = 5/9*(temperature_array{i} - 32);
        end;
      run;
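
A minimal, self-contained sketch of the same array-and-loop pattern. It assumes only three readings per row instead of 24, and the data set names temps and temps_c are hypothetical, chosen for illustration.

```sas
/* Hypothetical input: three Fahrenheit readings per row */
data temps;
  input temp1-temp3;
datalines;
32 212 98.6
-40 50 68
;
run;

data temps_c;
  set temps;
  array temperature_array {3} temp1-temp3;
  /* The ARRAY statement creates celsius_temp1-celsius_temp3 */
  array celsius_array {3} celsius_temp1-celsius_temp3;
  do i = 1 to dim(temperature_array);
    celsius_array{i} = 5/9*(temperature_array{i} - 32);
  end;
  drop i;  /* the loop index is not needed in the output */
run;

proc print data=temps_c;
run;
```

Using dim() as the upper bound instead of a literal count means the loop does not have to change if the array is later widened.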

6
Recoding Variables
  • Missing values are coded as -999 and need to be
    reset to the SAS missing value (.).
      data ...; set ...;
        array inc {12} faminc1 - faminc12;
        do i = 1 to 12;
          if inc{i} = -999 then inc{i} = .;
        end;
      run;
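
The recoding loop above can be tried on a small made-up data set. This is a sketch under assumptions: the data set names income_raw and income_clean and the three-variable layout are hypothetical, not from the presentation.

```sas
/* Hypothetical input: -999 marks a missing income */
data income_raw;
  input id faminc1-faminc3;
datalines;
1 52000 -999 54000
2 -999 -999 61000
;
run;

data income_clean;
  set income_raw;
  array inc {3} faminc1-faminc3;
  do i = 1 to dim(inc);
    /* Replace the sentinel value with a SAS numeric missing value */
    if inc{i} = -999 then inc{i} = .;
  end;
  drop i;
run;

proc print data=income_clean;
run;
```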

7
Of Note
  • While TEMP1 is equivalent to the first element,
    TEMP2 to the second etc., the variables do not
    need to be named consecutively.
  • The array would work just as well with
    non-consecutive variable names.
  • array sample_array {5} x a i r d;
  • In this example, the variable x is equivalent to
    the first element, a to the second etc.

8
BASIC ARRAY CONCEPTS
  • SAS arrays are another way to temporarily group
    and refer to SAS variables.
  • A SAS array is not a new data structure, the
    array name is not a variable, and arrays do not
    define additional variables.
  • Rather, a SAS array provides a different name to
    reference a group of variables.

9
BASIC ARRAY CONCEPTS
  • The ARRAY statement defines variables to be
    processed as a group.
  • The variables referenced by the array are called
    elements.
  • Once an array is defined, the array name and an
    index reference the elements of the array.
  • Since similar processing is generally completed
    on the array elements, references to the array
    are usually found within DO groups.

10
ARRAY STATEMENT
11
ARRAY STATEMENT
  • The statement used to define an array is the
    ARRAY statement.
  • array array-name {n} <$> <length> array-elements
    <(initial-values)>;
  • array-name: Any valid SAS name
  • n: Number of elements within the array
  • $: Indicates the elements within the array are
    character type variables
  • length: A common length for the array elements
  • array-elements: List of SAS variables to be part
    of the array
  • initial-values: Provides the initial values for
    each of the array elements

12
BASIC CONCEPTS (cont)
  • The ARRAY statement is a compiler statement
    within the data step.
  • Array references (such as inc{1}) cannot be used
    in compiler statements such as DROP or KEEP.
  • An array must be defined within the data step
    prior to being referenced or an error will occur.
  • Defining an array within one data step and
    referencing the array within another data step
    will cause errors.
  • An array must be defined within every data step
    where it will be referenced.

13
Array Statements Special Variables
  • When all numeric or all character variables in
    the data set are to be elements within the array,
    no need to list the individual variables as
    elements.
  • _NUMERIC_ - when all the numeric variables will
    be used as elements
  • _CHARACTER_ - when all the character variables
    will be used as elements
  • _ALL_ - when all variables in the data set will
    be used as elements and the variables are all the
    same type
  • array sample_array {5} _ALL_;

14
Number of Elements
  • N is the array subscript in the array definition
    and it refers to the number of elements within
    the array.
  • In an ARRAY statement, the subscript is a numeric
    constant or an asterisk (*).
  • In an array reference, the subscript may also be
    a variable whose value is a number or a numeric
    SAS expression.
  • The subscript must be enclosed within braces {},
    square brackets [], or parentheses ().
  • array sample_array {5} _ALL_;

15
Number of Elements
  • When the asterisk is used, it is not necessary to
    know how many elements are contained within the
    array. SAS will count the number of elements for
    you.
  • An example of using the asterisk is when one of
    the special variables defines the elements.
  • array allnums {*} _numeric_;
  • When it is necessary to know how many elements
    are in the array, the DIM function can be used to
    return the count of elements.
      do i = 1 to dim(allnums);
        allnums{i} = round(allnums{i}, .1);
      end;
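
A short sketch combining the asterisk subscript, _NUMERIC_, and DIM. The data set names measures and measures_rounded and the variables in them are hypothetical.

```sas
/* Hypothetical input: one character and two numeric variables */
data measures;
  input site $ weight height;
datalines;
A 70.26 1.754
B 82.91 1.688
;
run;

data measures_rounded;
  set measures;
  /* _numeric_ picks up the numeric variables known at this point:
     weight and height. The index i is first used after the ARRAY
     statement, so it is not part of the array. */
  array allnums {*} _numeric_;
  do i = 1 to dim(allnums);
    allnums{i} = round(allnums{i}, .1);
  end;
  drop i;
run;

proc print data=measures_rounded;
run;
```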

16
ARRAY REFERENCES
  • When an array is defined with the ARRAY statement
    SAS creates an array reference. The array
    reference is in the following form
  • array-name{n}
  • The value of n will be the element's position
    within the array.
  • For example, in the temperature array the
    temperature for 1:00 PM is in the variable
    TEMP13.
  • Therefore, the array reference will be
  • temperature_array{13}

17
ARRAY REFERENCES
  • The variable name and the array reference are
    interchangeable.
      Variable Name    Array Reference
      temp1            temperature_array{1}
      temp2            temperature_array{2}
      temp3            temperature_array{3}
  • An array reference may be used within the data
    step in almost any place other SAS variables may
    be used including as an argument to many SAS
    functions.

18
USING ARRAY INDEXES
  • The array index is the range of array elements.
  • In SAS, subscripts are 1-based by default where
    arrays in other languages may be 0-based.
  • When we set the array bounds with the subscript
    and only specify the number of elements within
    the array as our upper bound, the lower bound is
    by default 1.
  • There may be scenarios when we want the index to
    begin at a lower bound other than 1.
  • Say we only want the temperatures for the
    daytime, temperatures 6 through 18.
  • array temperature_array {6:18} temp6 - temp18;
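
As an illustration of a lower bound other than 1, the sketch below converts only the daytime readings. The data set names daytime and daytime_c and the sample values are made up; LBOUND and HBOUND return the lower and upper bounds of the array, so the loop does not hard-code 6 and 18.

```sas
/* Hypothetical input: hourly readings for hours 6 through 18 */
data daytime;
  input temp6-temp18;
datalines;
50 52 55 59 63 66 68 70 71 69 66 62 58
;
run;

data daytime_c;
  set daytime;
  array temperature_array {6:18} temp6-temp18;
  array celsius_array {6:18} celsius_temp6-celsius_temp18;
  /* The index runs over the declared bounds, so hour matches
     the numeric suffix of the variable name */
  do hour = lbound(temperature_array) to hbound(temperature_array);
    celsius_array{hour} = 5/9*(temperature_array{hour} - 32);
  end;
  drop hour;
run;
```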

19
Identify patterns across variables using arrays
  • The objective is to identify the number of
    missing values for each row.
  • Create a new variable named nmiss, which will be
    the number of missing values across variables
    faminc1 - faminc12
      data mspatterns;
        set recode_missing;
        array inc(12) faminc1-faminc12; /* existing vars */
        nmiss = 0;
        do i = 1 to 12;
          if inc(i) = . then nmiss = nmiss + 1;
        end;
      run;

20
(No Transcript)
21
Reshaping DATA from Wide to Long
      data wide;
        input famid faminc96 faminc97 faminc98;
      cards;
      1 40000 40500 41000
      2 45000 45400 45800
      3 75000 76000 77000
      ;
      run;

22
Reshaping wide to long: creating only one
variable using arrays
      data long;
        set wide;
        array Afaminc(96:98) faminc96 - faminc98;
        do year = 96 to 98;
          faminc = Afaminc(year);
          output;
        end;
        drop faminc96-faminc98;
      run;

23
LONG DATA
  • Obs famid year faminc
  • 1 1 96 40000
  • 2 1 97 40500
  • 3 1 98 41000
  • 4 2 96 45000
  • 5 2 97 45400
  • 6 2 98 45800
  • 7 3 96 75000
  • 8 3 97 76000
  • 9 3 98 77000
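
The presentation lists long-to-wide reshaping as a use of arrays but only shows wide to long. The following is one possible sketch of the reverse direction, not taken from the slides: it assumes the long data are sorted by famid and uses RETAIN with BY-group processing. The output name wide_again is hypothetical.

```sas
data long;
  input famid year faminc;
datalines;
1 96 40000
1 97 40500
1 98 41000
2 96 45000
2 97 45400
2 98 45800
3 96 75000
3 97 76000
3 98 77000
;
run;

data wide_again;
  set long;
  by famid;                      /* requires data sorted by famid */
  array Afaminc {96:98} faminc96-faminc98;
  retain faminc96-faminc98;      /* keep values across rows of a family */
  if first.famid then do i = 96 to 98;
    Afaminc{i} = .;              /* reset at the start of each family */
  end;
  Afaminc{year} = faminc;        /* year selects the target variable */
  if last.famid then output;     /* one row per family */
  drop year faminc i;
run;

proc print data=wide_again;
run;
```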

24
TEMPORARY ARRAYS
  • A temporary array is an array that only exists
    for the duration of the data step where it is
    defined.
  • Useful for storing constant values (for
    calculations).
  • No corresponding variables to identify the array
    elements.
  • The elements are defined by the keyword
    _TEMPORARY_.

25
TEMPORARY ARRAYS
  • array rate {6} _temporary_ (0.05 0.08 0.12 0.20
    0.27 0.35);
  • The asterisk subscript cannot be used when
    defining a temporary array and explicit array
    bounds must be specified for temporary arrays.

26
TEMPORARY ARRAY
  • For example, when a customer is delinquent in
    payment of their account balance, a penalty is
    applied. The amount of the penalty depends upon
    the number of months that the account is
    delinquent.
  • Without array processing:
      if month_delinquent eq 1 then
        balance = balance + (balance * 0.05);
      else if month_delinquent eq 2 then
        balance = balance + (balance * 0.08);
      else if month_delinquent eq 3 then
        balance = balance + (balance * 0.12);
      else if month_delinquent eq 4 then
        balance = balance + (balance * 0.20);
      else if month_delinquent eq 5 then
        balance = balance + (balance * 0.27);
      else if month_delinquent eq 6 then
        balance = balance + (balance * 0.35);

27
TEMPORARY ARRAY
  • Using a temporary array simplifies the code and
    improves performance.
      data ...; set ...;
        array rate {6} _temporary_ (0.05 0.08 0.12 0.20
                                    0.27 0.35);
        if month_delinquent ge 1 and month_delinquent le 6 then
          balance = balance + (balance * rate{month_delinquent});
      run;
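
A runnable sketch of the penalty lookup, assuming a small made-up accounts data set (the names accounts, accounts_penalized, and acct are hypothetical). The range check matters: a month_delinquent value outside 1 to 6 would otherwise produce an array subscript out of range error.

```sas
/* Hypothetical input: account id, months delinquent, balance */
data accounts;
  input acct month_delinquent balance;
datalines;
101 1 1000
102 3 1000
103 0 1000
;
run;

data accounts_penalized;
  set accounts;
  /* Lookup table of penalty rates; no variables are created */
  array rate {6} _temporary_ (0.05 0.08 0.12 0.20 0.27 0.35);
  if month_delinquent ge 1 and month_delinquent le 6 then
    balance = balance + (balance * rate{month_delinquent});
run;

proc print data=accounts_penalized;
run;
```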

28
TEMPORARY ARRAY
  • Setting initial values is not required on the
    ARRAY statement. The values within a temporary
    array may be set in another manner within the
    data step.
      array rateb {6} _temporary_;
      do i = 1 to 6;
        rateb{i} = i * 0.5;
      end;

29
Implicit Arrays
  • An implicit array has no subscript; DO OVER
    processes each element in turn.
      array clin q1b q2b q3b q4b q5b q6b q7b q8b q9b
                 q10b q11b q12b q13b;
      do over clin;
        if clin in (2,.) then clin = 0;
        else if clin gt 2 then clin = 1;
      end;
  • A complete data step:
      data baselinedemo;
        set baselinedemo;
        array zero male Indian Asian black white more
                   hisp rhand;
        do over zero;
          if zero = . then zero = 0;
        end;
      run;
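
For comparison, the same recoding can be written with an explicit array, which is the form used on the other slides. This is a sketch: the two input rows are made up so the step can run on its own, and the output name baselinedemo_recode is hypothetical.

```sas
/* Hypothetical input with missing indicator values */
data baselinedemo;
  input male Indian Asian black white more hisp rhand;
datalines;
1 . . . 1 . 0 1
0 . 1 . . . . .
;
run;

data baselinedemo_recode;
  set baselinedemo;
  /* Explicit array: {*} lets SAS count the elements */
  array zero {*} male Indian Asian black white more hisp rhand;
  do i = 1 to dim(zero);
    if zero{i} = . then zero{i} = 0;
  end;
  drop i;
run;

proc print data=baselinedemo_recode;
run;
```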

30
WHEN TO USE ARRAYS
  • It makes sense to use arrays when there are
    repetitive VARIABLES that are related WITHIN A
    SINGLE DATA STEP and the programmer needs to
    iterate through most of them.
  • The combination of arrays and DO loops in the
    data step lends power to programming.
  • The variables in the array do not need to be
    related or contiguous.

31
COMMON ERRORS AND MISUNDERSTANDINGS
  • INVALID INDEX RANGE
  • FUNCTION NAME AS AN ARRAY NAME
  • ARRAY REFERENCED IN MULTIPLE DATA STEPS, BUT
    DEFINED IN ONLY ONE

32
LIMITATIONS OF ARRAY STATEMENTS
  • Arrays can only be used within a DATA step (not a
    PROC). If you want to perform the same action in
    several data steps or procs, a macro approach may
    be easier.
  • SAS array references cannot be used:
  • as an input to a macro parameter
  • in a FORMAT, LABEL, DROP, KEEP, LENGTH, or OUTPUT
    statement
  • SAS arrays refer to variables or constants (not
    data sets or the value of a variable).

33
USE MACROS TO CREATE AN ARRAY
34
Using MACROS To Define ARRAYS Within a Single
Datastep
  • With an array:
      data ...;
        input ...;
        array temperature_array {24} temp1-temp24;
        array celsius_array {24} celsius_temp1-celsius_temp24;
        do i = 1 to 24;
          celsius_array{i} = 5/9*(temperature_array{i} - 32);
        end;
      run;
  • With a macro %DO loop, which generates one
    assignment statement per value of i:
      %macro w;
        data ...;
          input ...;
          %do i = 1 %to 24;
            celsius&i = 5/9*(temp&i - 32);
          %end;
        run;
      %mend;
      %w

35
CREATE A MACRO-DEFINED "ARRAY"
  • Using:
  • %LET x = array elements (variables)
  • %DO
  • %LET y = %SCAN(...)
  • MORE FLEXIBLE:
  • - Switching from data step to PROC
  • - Using previously defined macros

36
%LET and %SCAN
  • %LET: Creates a macro variable and assigns it a
    value.
  • %LET can be used inside or outside of a macro
    program. The value can be a string of words.
  • Syntax:
  • %LET macro-variable = <value>;
  • %SCAN: Searches for a word that is specified by
    its position in a string.

37
MACRO Arrays
  • Programs are reusable and easier to understand.
  • %byid is a macro I wrote which I use a lot.
      %let x = q1b q2b q3b q4b q5b q6b q7b q8b q9b q10b
               q11b q12b;
      %do i = 1 %to 12;
        %let y = %scan(&x, &i);
        %byid(&y);
      %end;
  • Don't forget to embed this code within %macro and
    %mend.
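
The definition of %byid is not given in the presentation, so the sketch below substitutes PROC FREQ as a stand-in to show the %LET / %DO / %SCAN pattern end to end. The macro name freq_each is hypothetical, and the example assumes the SASHELP.CLASS sample data set is available. COUNTW, called through %SYSFUNC, counts the words in the list so the loop bound is not hard-coded.

```sas
%macro freq_each(varlist);
  %local i y;
  /* Loop over the words in the list, one variable per pass */
  %do i = 1 %to %sysfunc(countw(&varlist));
    %let y = %scan(&varlist, &i);
    /* Stand-in for the author's %byid macro */
    proc freq data=sashelp.class;
      tables &y;
    run;
  %end;
%mend freq_each;

%freq_each(sex age)
```

Because the macro loop generates whole steps rather than statements inside one data step, the same list can drive a PROC, which a data step array cannot do.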

38
ONE DIMENSION ARRAYS
39
MULTI-DIMENSION ARRAYS
  • Multi-dimensional arrays may be created in two or
    more dimensions.
  • Conceptually, a two-dimensional array is a table
    with rows and columns (Although to SAS, it is
    still a group of variables)
  • Within the Program Data Vector the variable
    structure may be visualized as

40
MULTI-DIMENSION ARRAYS
  • The array statement to define this
    two-dimensional array will be
  • array sale_array {3, 12} sales1-sales12
    exp1-exp12 comm1-comm12;
  • The subscript gives the number of rows (1st
    dimension) and the number of columns (2nd
    dimension).
  • Must reference the element number for both
    dimensions.
  • The reference to the sixth element for the
    expense group in the sales array is
  • sale_array{2,6}, which refers to EXP6
  • Three and more dimensions can be defined as well.
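
A reduced sketch of a two-dimensional array, assuming three columns per group instead of twelve. The data set names sales and sales_totals and the total_* variables are hypothetical. Elements fill the array row by row, so row 1 holds the sales variables, row 2 the expenses, and row 3 the commissions.

```sas
/* Hypothetical input: three months of sales, expenses, commissions */
data sales;
  input sales1-sales3 exp1-exp3 comm1-comm3;
datalines;
100 200 300 40 50 60 10 20 30
;
run;

data sales_totals;
  set sales;
  array sale_array {3, 3} sales1-sales3 exp1-exp3 comm1-comm3;
  array row_total {3} total_sales total_exp total_comm;
  /* dim1 and dim2 return the size of the 1st and 2nd dimension */
  do r = 1 to dim1(sale_array);
    row_total{r} = 0;
    do c = 1 to dim2(sale_array);
      row_total{r} = row_total{r} + sale_array{r, c};
    end;
  end;
  drop r c;
run;

proc print data=sales_totals;
run;
```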

41
References
  • Steve First and Teresa Schudrowitz (Systems
    Seminar Consultants, Inc., Madison, WI). Arrays
    Made Easy: An Introduction to Arrays and Array
    Processing. Paper 242-30, SUGI 30.
    http://www2.sas.com/proceedings/sugi30/242-30.pdf
  • Introduction to SAS. UCLA Academic Technology
    Services, Statistical Consulting Group.
    http://www.ats.ucla.edu/stat/sas/notes2/
  • http://www.ats.ucla.edu/stat/sas/seminars/SAS_arrays/default_new.htm