It is most common to read data in with white

Slides: 28
Provided by: mickey9

1
Chapter 12
2
  • It is most common to read data in with white
    space between values and one record per line.
  • data new;
  • input x y $ z;
  • datalines;
  • 2 no 4
  • 8 yes 6
  • 2 no 6
  • ;
  • run;

3
Reading data from a file
  • data new;
  • infile 'e:\test.dat';
  • input x y $ z;
  • run;
  • The data is in a separate text file on drive E.

4
Comma-delimited data
  • data new;
  • infile 'e:\test.dat' dlm=',';
  • input x y $ z;
  • run;
  • The data
  • 2, no, 4
  • 8, yes, 6
  • 2, no, 6

5
Delimiter Sensitive Data(DSD)
  • data new;
  • infile 'e:\test.dat' dsd;
  • input x y $ z;
  • run;
  • The data
  • 2,, 4
  • 8, yes, 6
  • 2, no, 6
  • When there are 2 adjacent commas, the value
    between them is treated as missing.
  • When there are quotes around a value for a
    character variable, they are removed.
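The quote-removal rule is easier to see with a value that itself contains a comma. The following is a minimal sketch, not taken from the slides: the data lines are made up, and the `:$20.` informat is an assumption added so that the longer value is not truncated.

```sas
/* Hypothetical data: the quoted value contains a comma. */
data new;
   infile datalines dsd;     /* DSD: comma-delimited, quotes stripped  */
   input x y :$20. z;        /* :$20. lets y hold up to 20 characters  */
   datalines;
2,"no, not yet",4
8,,6
;
run;

proc print data=new;
run;
```

Under these assumptions, the comma inside the quotes is read as part of y rather than as a delimiter, and the two adjacent commas on the second line leave y missing.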

6
Informats
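  • We saw informats used with dates in Chapter 4.
  • Informats can be used to read a character value
    longer than the default eight characters.
  • They can also be used to read a value that
    contains a space, such as a first and last
    name, into a single variable.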

7
Eight-character limit
  • Suppose we run the program below. See the
    output on the next page.
  • data new;
  • input x y $ z;
  • datalines;
  • 4 tennessee 20
  • 14 georgia 3
  • 22 mississippi 2
  • ;
  • proc print;
  • run;

8
  • Notice that SAS cut off tennessee and mississippi
    when the 8-character limit was reached.

9
  • data new;
  • informat y $13.;
  • input x y z;
  • datalines;
  • 4 tennessee 20
  • 14 georgia 3
  • 22 mississippi 2
  • ;
  • proc print;
  • run;

10
  • data new;
  • input x y :$13. z;
  • datalines;
  • 4 tennessee 20
  • 14 georgia 3
  • 22 mississippi 2
  • ;
  • proc print;
  • run;

11
  • Suppose that you want to combine the first and
    last name into one single variable.
  • data new;
  • input age name & $30. gpa;
  • datalines;
  • 23 jennifer smith  3.1
  • 26 bob simpson  2.4
  • 33 greg davis  3.6
  • ;

12
  • One caution about this.
  • In the list of data, there are at least 2 spaces
    between where the name ends and the gpa begins.
  • If there were only 1 space between them, we
    would get the output on the following page.

13
  • This is all the output that was printed before
    SAS complained, because it was expecting a
    numeric value for X but found bob, which is
    clearly not numeric.

14
Column Input
  • data new;
  • input x 1-2 y 3-5 z 6-8;
  • datalines;
  • 12113455
  • 98832211
  • 87733101
  • ;

15
Formatted Input
  • data new;
  • input @1 x 2.
  •       @3 y 3.
  •       @6 z 3.;
  • datalines;
  • 12113455
  • 98832211
  • 87733101
  • ;

16
  • data new;
  • input @6 id 4.
  •       @32 average 4.;
  • datalines;
  • 960112103 jane smith           88.3
  • 960252886 george washington    78.7
  • 960312557 greg davis           92.1
  • ;

17
  • Sometimes one record takes up more than one
    line.
  • data new;
  • input #1 @1 ID          /* #1 means line 1 */
  •       @11 Name $30.
  •       #2 @11 GPA;       /* #2 means line 2 */
  • datalines;
  • 960112103 jane smith
  •           88.3
  • 960252886 george washington
  •           78.7
  • 960312557 greg davis
  •           92.1
  • ;

18
(No Transcript)
19
  • Variables can be read in any order.
  • Columns of data may be read more than once.
  • data new;
  • input x 1-3 y 2-5 z 1-9;
  • datalines;
  • 181822330
  • 001451923
  • ;
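The first point, that variables can be read in any order, can be sketched by reusing the data from the Column Input slide and listing z first. This is an illustrative variation, not code from the original slides.

```sas
/* Column input with the variables listed out of column order. */
data new;
   input z 6-8 x 1-2 y 3-5;   /* z is read first, from the last columns */
   datalines;
12113455
98832211
87733101
;
run;

proc print data=new;
run;
```

For the first record this reads z from columns 6-8 (455), x from columns 1-2 (12), and y from columns 3-5 (113). The variables appear in the data set in the order they are named in the input statement, so z comes first.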

20
Informat Lists
  • data course;
  • input @1 quiz1 2.
  •       @4 quiz2 2.
  •       @7 quiz3 2.
  •       @10 exam1 3.
  •       @14 exam2 3.;
  • datalines;
  • 10 14 15  88  93
  • 16 12 17  82  94
  • 19 20 13  81  90
  • ;

21
  • We can shorten the previous data step with
  • data course;
  • input @1 (quiz1-quiz3)(2. +1)
  •       @10 (exam1-exam2)(3. +1);
  • datalines;
  • 10 14 15  88  93
  • 16 12 17  82  94
  • 19 20 13  81  90
  • ;

22
  • Suppose we have
  • data course;
  • input @1 quiz1 2.
  •       @4 hwork1 2.
  •       @7 quiz2 2.
  •       @10 hwork2 2.
  •       @13 quiz3 2.;
  • datalines;
  • 10 14 15 11 12
  • 16 12 17 12 18
  • 19 20 13 14 20
  • ;
  • A more compact way of doing this is
  • data course;
  • input @1 (quiz1-quiz3)(2. +4)
  •       @4 (hwork1-hwork2)(2. +4);
  • datalines;
  • 10 14 15 11 12
  • 16 12 17 12 18
  • 19 20 13 14 20
  • ;

23
(No Transcript)
24
Holding the line
  • data new;
  • input x @@;
  • datalines;
  • 3 10 11 15
  • 22 21 4 6
  • ;

25
  • data new;
  • input x y @@;
  • datalines;
  • 3 10 11 15
  • 22 21 4 6
  • ;
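To see why the line needs to be held, it may help to compare the same data read with and without the trailing @@. This is a sketch for comparison only: the data set names `firstonly` and `held` are made up, and the step without @@ is not in the original slides.

```sas
/* Without @@: each pass through the data step starts a new line,  */
/* so only the first value on each line is read.                   */
data firstonly;
   input x;
   datalines;
3 10 11 15
22 21 4 6
;
run;

/* With @@: the line is held across passes, so every value is read. */
data held;
   input x @@;
   datalines;
3 10 11 15
22 21 4 6
;
run;

proc print data=firstonly; run;   /* 2 observations: 3 and 22 */
proc print data=held;      run;   /* 8 observations           */
```

With `input x y @@;` the same two lines would give 4 observations, each taking the next pair of values.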

26
Suppressing Error Messages
  • Suppose that you receive a dataset and the
    person that created it used NA for missing
    observations rather than a period.
  • Let's suppose we're reading in a column of
    people's ages. When SAS comes across this
    NA, it will complain.

27
  • data new;
  • input x ??;
  • datalines;
  • 33
  • 41
  • 55
  • 23
  • NA
  • 41
  • 22
  • ;
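The ?? modifier has a weaker relative, a single ?, and the difference shows up in the log. The sketch below uses shortened, made-up data and the hypothetical data set names `quiet` and `quieter`. It assumes the documented behavior of the two modifiers.

```sas
/* Single ?: the invalid-data note is suppressed, but the offending */
/* input line is still written to the log and _ERROR_ is set to 1.  */
data quiet;
   input x ?;
   datalines;
33
NA
41
;
run;

/* Double ??: the note and the input-line listing are suppressed,   */
/* and _ERROR_ is left at 0.                                        */
data quieter;
   input x ??;
   datalines;
33
NA
41
;
run;

proc print data=quieter;   /* x is missing (.) on the NA record */
run;
```

In both steps the NA record is kept, with x set to missing. Only the amount of log output differs.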