Data with semicolons

Provided by: mickey9
1
Chapter 13
2
Data with semicolons
  • If we try to read this data, SAS will complain: the first semicolon inside the data lines ends the DATALINES block.
  • data new;
  • input author $15. title $30.;
  • datalines;
  • Smith, Alan    Everything you want to know about ;
  • Davis, Greg    ?,!,etc.
  • ;
  • proc print;
  • run;

3
datalines4
  • data new;
  • input author $15. title $30.;
  • datalines4;
  • Smith, Alan    Everything you want to know about ;
  • Davis, Greg    ?,!,etc.
  • ;;;;
  • proc print;
  • run;

4
Reading from a text file
  • We saw in Chapter 12 how we could read data from a text file.
  • data new;
  • infile "C:\somefile.txt";
  • input x y z;
  • run;
  • proc print;
  • run;

5
  • We could also give the file a name (a fileref) first.
  • data new;
  • filename whatever "C:\somefile.txt";
  • infile whatever;
  • input x y z;
  • run;
  • proc print;
  • run;

6
Missover
  • Data:
  • Male 2 8
  • Female 3
  • Female 4 1
  • data new;
  • filename whatever "C:\somefile.txt";
  • infile whatever missover;
  • input x $ y z;
  • run;
  • proc print;
  • run;

7
Reading Multiple Files
  • Suppose we have 3 files: examscores1.txt, examscores2.txt and examscores3.txt.
  • data new;
  • filename whatever "C:\examscores*.txt";
  • infile whatever;
  • input x y z;
  • run;
  • Each file contains 3 columns of numbers.

8
  • data new;
  • filename whatever ("C:\file01.txt"
    "C:\data01.txt");
  • infile whatever;
  • input x y z;
  • run;
  • Again, all files must have the same structure.

9
Writing ASCII data to an external file
  • data new;
  • infile "C:\inputfile.txt";
  • file "C:\outputfile.txt";
  • input x y z;
  • total = x + y + z;
  • meanxy = (x + y)/2;
  • put x y z total meanxy;
  • run;

10
Writing CSV files
  • options missing=" ";
  • data new;
  • input x y z;
  • datalines;
  • 2 2 1
  • 1 0 8
  • ;
  • ods listing close;
  • ods csv file="C:\newfile.csv";
  • proc print data=new;
  • run;
  • ods csv close;
  • ods listing;

11
Permanent SAS data set
  • A permanent SAS data set can only be used by SAS. The authors state that there is a free SAS viewer program to view and print SAS data sets.
  • These data sets will usually use more storage than the original raw data. The biggest reason to use SAS data sets is speed.

12
  • If you plan to run many different analyses on a data set that will not be changing, it's a good idea to make a permanent data set.
  • SAS data sets can easily be transferred to other SAS users. You don't need to know the structure of the data, since the variables, labels and formats have already been defined.

13
Creating a SAS data set
  • libname mydata "C:\SASDATA";
  • data mydata.file2;
  • input x y z;
  • datalines;
  • 2 2 1
  • 1 0 8
  • ;
  • run;

14
The file that was just created.
15
Reading SAS data sets
  • libname abc "C:\SASDATA";
  • proc print data=abc.file2;
  • proc means data=abc.file2;
  • var x;
  • run;
  • In the procedures you must use the data= option. Otherwise, SAS will not know what data set you are referring to.

16
(No Transcript)
17
Proc Contents
  • This procedure will list information about the SAS data set, especially the variables and their types.
  • libname abc "C:\SASDATA";
  • proc contents data=abc.file2 varnum;
  • run;

18
(No Transcript)
19
SAS data sets with formats
  • If you have created a SAS data set that assigns user-defined formats to variables, you must make the format library permanent as well.
  • If you send someone the SAS data set, be sure to send them the format library too.

20
Creating the format library
  • libname mydata "C:\SASDATA";
  • options fmtsearch=(mydata);
  • proc format library=mydata;
  • value $ggroup 'm'='male'
  •               'f'='female';
  • run;
  • data mydata.file2;
  • input gender $ y z;
  • format gender $ggroup.;
  • datalines;
  • m 2 1
  • f 0 8
  • ;
  • run;

21
A file containing the formats.
22
Read SAS data sets with formats
  • libname newdata "C:\SASDATA";
  • options fmtsearch=(newdata);
  • proc print data=newdata.file2;
  • run;

23
Working with large data sets
  • The authors have suggestions for working more efficiently with large data sets.
  • On a PC, a data set with 500,000 observations and 50 variables might be considered large.
  • On a mainframe, a data set with millions of observations might be considered large.

24
Don't read files unnecessarily
  • Inefficient:
  • libname newdata "C:\SASDATA";
  • data mine;
  • set newdata.file01;
  • run;
  • proc print data=mine;
  • run;
  • Efficient:
  • libname newdata "C:\SASDATA";
  • proc print data=newdata.file01;
  • run;

25
DROP unnecessary variables
  • Inefficient:
  • data mine;
  • input week1-week52;
  • avgtemp = mean(of week1-week52);
  • run;
  • Efficient:
  • data mine;
  • input week1-week52;
  • avgtemp = mean(of week1-week52);
  • drop week1-week52;
  • run;

26
Use DROP (or KEEP) with SET
  • Inefficient:
  • data mine;
  • set old;
  • drop x1-x50 y z1-z20;
  • etc.
  • Efficient:
  • data mine;
  • set old (drop=x1-x50 y z1-z20);
  • etc.

27
Use CLASS instead of BY
  • Inefficient:
  • proc sort data=new;
  • by gender;
  • proc means data=new;
  • by gender;
  • var age gpa;
  • run;
  • Efficient:
  • proc means data=new;
  • class gender;
  • var age gpa;
  • run;

28
Use WHERE instead of IF
  • Inefficient:
  • data new;
  • set old;
  • if x ge 20;
  • run;
  • Efficient:
  • data new;
  • set old;
  • where x ge 20;
  • run;
  • Or:
  • data new;
  • set old (where=(x ge 20));
  • run;

29
Use WHERE in procedures
  • Inefficient:
  • data new;
  • set old;
  • where x ge 20;
  • run;
  • proc means data=new;
  • var age gpa;
  • run;
  • Efficient:
  • proc means data=old;
  • where x ge 20;
  • var age gpa;
  • run;
  • Or:
  • proc means
  • data=old(where=(x ge 20));
  • var age gpa;
  • run;

30
Use ELSE IF instead of multiple IFs
  • Inefficient:
  • data new;
  • input crsavg;
  • if crsavg ge 90 then LG="A";
  • if 80 le crsavg lt 90 then LG="B";
  • if 70 le crsavg lt 80 then LG="C";
  • if 60 le crsavg lt 70 then LG="D";
  • if crsavg lt 60 then LG="F";
  • run;

31
  • Efficient:
  • data new;
  • input crsavg;
  • if crsavg ge 90 then LG="A";
  • else if 80 le crsavg lt 90 then LG="B";
  • else if 70 le crsavg lt 80 then LG="C";
  • else if 60 le crsavg lt 70 then LG="D";
  • else if crsavg lt 60 then LG="F";
  • run;

32
  • When using multiple IFs, it's best to make the first IF the one that is most likely to be true.
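As an illustration of this ordering advice, here is a minimal sketch that reorders the grading logic. It assumes, purely for the example, that most course averages fall in the 70s; the data set name `ordered` and the sample values are hypothetical. With the ELSE IF chain, SAS stops testing as soon as one condition is true, so putting the most common case first means fewer comparisons per observation.

```sas
/* Assumption for illustration only: grades in the 70s are the most common */
data ordered;
  input crsavg;
  if 70 le crsavg lt 80 then LG = "C";        /* most likely case tested first */
  else if 80 le crsavg lt 90 then LG = "B";
  else if 60 le crsavg lt 70 then LG = "D";
  else if crsavg ge 90 then LG = "A";
  else if crsavg lt 60 then LG = "F";
  datalines;
75
82
68
93
51
;
run;

proc print data=ordered;
run;
```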

33
Save Summary Statistics
  • This will be useful if you plan to do further analysis on these results.
  • libname newdata "C:\SASDATA";
  • proc means data=newdata.file01;
  • class gender;
  • var age height weight;
  • output out=newdata.means mean=age height weight;
  • run;
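Once the summary statistics are saved, they can be read like any other SAS data set. The sketch below assumes the newdata.means data set created above. PROC MEANS adds the automatic variables _TYPE_ and _FREQ_ to its output data set; with a single CLASS variable, the _TYPE_ = 0 row holds the overall means and the _TYPE_ = 1 rows hold the means for each gender.

```sas
libname newdata "C:\SASDATA";

/* Print only the per-gender rows of the saved summary data set */
proc print data=newdata.means;
  where _type_ = 1;                        /* skip the overall (_TYPE_ = 0) row */
  var gender _freq_ age height weight;     /* _FREQ_ is the number of observations in each group */
run;
```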

34
Use only the first n obs to test code.
  • data test;
  • set hugedata(obs=10);
  • etc.
  • run;
  • Or:
  • proc print data=hugedata(obs=10);
  • run;

35
Adding a few obs to a large data set
  • proc append base=hugedata data=newfile;
  • run;