Title: Data with semicolons
Chapter 13
Data with semicolons
- If we try to read data lines that contain semicolons
- after a plain datalines statement, SAS will complain.
- data new;
- input author $15. title $30.;
- datalines;
- Smith, Alan    Everything you want to know about ;
- Davis, Greg    ?,!,etc.
- ;
- proc print;
- run;
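datalines4
- With datalines4, the data ends only at a line of
- four semicolons, so single semicolons are read as data.
- data new;
- input author $15. title $30.;
- datalines4;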
- Smith, Alan    Everything you want to know about ;
- Davis, Greg    ?,!,etc.
- ;;;;
- proc print;
- run;
Reading from a text file
- We saw in Chapter 12 how we could read data
- from a text file.
- data new;
- infile "C:\somefile.txt";
- input x y z;
- run;
- proc print;
- run;
- We could also give the file a name first.
- data new;
- filename whatever "C:\somefile.txt";
- infile whatever;
- input x y z;
- run;
- proc print;
- run;
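Missover
- Data
- Male 2 8
- Female 3
- Female 4 1
- The second line has no value for z. With missover,
- z is set to missing instead of being read from the next line.
- data new;
- filename whatever "C:\somefile.txt";
- infile whatever missover;
- input x $ y z;
- run;
- proc print;
- run;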
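- A minimal sketch of what MISSOVER changes, assuming the three data lines above are supplied inline with datalines rather than read from the external file. The data set names with_missover and without_missover are hypothetical.

```sas
/* Assumption: the slide's three data lines, typed inline. */
data with_missover;
  infile datalines missover;  /* short lines leave trailing variables missing */
  input x $ y z;
  datalines;
Male 2 8
Female 3
Female 4 1
;
run;

data without_missover;
  infile datalines;           /* default behavior: INPUT may go to the next line */
  input x $ y z;
  datalines;
Male 2 8
Female 3
Female 4 1
;
run;

proc print data=with_missover;
run;

proc print data=without_missover;
run;
```

- Without MISSOVER, INPUT moves to the next line when it runs out of values, so the third line would be consumed while looking for z on the second line. Comparing the two printouts and the log notes should make the difference visible.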
Reading Multiple Files
- Suppose we have 3 files: examscores1.txt,
- examscores2.txt and examscores3.txt.
- data new;
- filename whatever "C:\examscores*.txt";
- infile whatever;
- input x y z;
- run;
- Each file contains 3 columns of numbers.
- data new;
- filename whatever ("C:\file01.txt"
- "C:\data01.txt");
- infile whatever;
- input x y z;
- run;
- Again, all files must have the same structure.
Writing ASCII data to an external file
- data new;
- infile "C:\inputfile.txt";
- file "C:\outputfile.txt";
- input x y z;
- total = x + y + z;
- meanxy = (x + y)/2;
- put x y z total meanxy;
- run;
Writing CSV files
- options missing=" ";
- data new;
- input x y z;
- datalines;
- 2 2 1
- 1 0 8
- ;
- ods listing close;
- ods csv file="C:\newfile.csv";
- proc print data=new;
- run;
- ods csv close;
- ods listing;
Permanent SAS data set
- A permanent SAS data set can only be used by
- SAS. The authors state that there is a free
- SAS viewer program to view and print SAS
- data sets.
- These datasets will usually use more storage
- than the original raw data. The biggest reason
- to use SAS data sets is speed.
- If you plan to run many different analyses on
- a dataset that will not be changing, it's a good
- idea to make a permanent data set.
- SAS data sets can easily be transferred to
- other SAS users. You don't need to know the
- structure of the data since the variables, labels
- and formats have already been defined.
Creating a SAS data set
- libname mydata "C:\SASDATA";
- data mydata.file2;
- input x y z;
- datalines;
- 2 2 1
- 1 0 8
- ;
- run;
The file that was just created.
Reading SAS data sets
- libname abc "C:\SASDATA";
- proc print data=abc.file2;
- proc means data=abc.file2;
- var x;
- run;
- In the procedures you should use the data=
- option. Otherwise, SAS falls back to the most recently
- created data set, which may not be the one you mean.
Proc Contents
- This procedure lists information about the
- SAS dataset, especially the variables and
- their types.
- libname abc "C:\SASDATA";
- proc contents data=abc.file2 varnum;
- run;
SAS data sets with formats
- If you have created a SAS data set that assigns
- user-defined formats to variables, you must
- make the format library permanent as well.
- If you send someone the SAS data set, be sure
- to send them the format library.
Creating the format library
- libname mydata "C:\SASDATA";
- options fmtsearch=(mydata);
- proc format library=mydata;
- value $ggroup "m"="male"
- "f"="female";
- run;
- data mydata.file2;
- input gender $ y z;
- format gender $ggroup.;
- datalines;
- m 2 1
- f 0 8
- ;
- run;
A file containing the formats.
Read SAS data sets with formats
- libname newdata "C:\SASDATA";
- options fmtsearch=(newdata);
- proc print data=newdata.file2;
- run;
Working with large data sets
- The authors have suggestions for working more
- efficiently with large data sets.
- On a PC, a dataset with 500,000 observations
- and 50 variables might be considered large.
- On a mainframe, a dataset with millions of
- observations might be considered large.
Don't read files unnecessarily
- Inefficient
- libname newdata "C:\SASDATA";
- data mine;
- set newdata.file01;
- run;
- proc print data=mine;
- run;
- Efficient
- libname newdata "C:\SASDATA";
- proc print data=newdata.file01;
- run;
DROP unnecessary variables
- Inefficient
- data mine;
- input week1-week52;
- avgtemp = mean(of week1-week52);
- run;
- Efficient
- data mine;
- input week1-week52;
- avgtemp = mean(of week1-week52);
- drop week1-week52;
- run;
Use DROP (or KEEP) with SET
- Inefficient
- data mine;
- set old;
- drop x1-x50 y z1-z20;
- etc.
- Efficient
- data mine;
- set old (drop=x1-x50 y z1-z20);
- etc.
Use CLASS instead of BY
- Inefficient
- proc sort data=new;
- by gender;
- proc means data=new;
- by gender;
- var age gpa;
- run;
- Efficient
- proc means data=new;
- class gender;
- var age gpa;
- run;
Use WHERE instead of IF
- Inefficient
- data new;
- set old;
- if x ge 20;
- run;
- Efficient
- data new;
- set old;
- where x ge 20;
- run;
- Or
- data new;
- set old (where=(x ge 20));
- run;
Use WHERE in procedures
- Inefficient
- data new;
- set old;
- where x ge 20;
- run;
- proc means data=new;
- var age gpa;
- run;
- Efficient
- proc means data=old;
- where x ge 20;
- var age gpa;
- run;
- Or
- proc means
- data=old(where=(x ge 20));
- var age gpa;
- run;
Use ELSE IF instead of multiple IFs
- Inefficient
- data new;
- input crsavg;
- if crsavg ge 90 then LG="A";
- if 80 le crsavg lt 90 then LG="B";
- if 70 le crsavg lt 80 then LG="C";
- if 60 le crsavg lt 70 then LG="D";
- if crsavg lt 60 then LG="F";
- run;
- Efficient
- data new;
- input crsavg;
- if crsavg ge 90 then LG="A";
- else if 80 le crsavg lt 90 then LG="B";
- else if 70 le crsavg lt 80 then LG="C";
- else if 60 le crsavg lt 70 then LG="D";
- else if crsavg lt 60 then LG="F";
- run;
- In an IF / ELSE IF chain, it's best to test the condition
- that is most likely to be true first, because SAS skips
- the remaining ELSE branches once one condition is true.
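- As an illustration of that ordering advice, here is a sketch that assumes most course averages fall in the 70 to 79 range, so that test is moved to the front of the chain. The data set name grades and the four scores are hypothetical.

```sas
/* Assumption: most students score in the 70s, so test that range first. */
data grades;
  input crsavg;
  length LG $ 1;
  if 70 le crsavg lt 80 then LG = "C";       /* most likely case first */
  else if 80 le crsavg lt 90 then LG = "B";
  else if 60 le crsavg lt 70 then LG = "D";
  else if crsavg ge 90 then LG = "A";
  else if crsavg lt 60 then LG = "F";
  datalines;
75
82
71
95
;
run;

proc print data=grades;
run;
```

- The grades assigned are the same as with the original ordering because the ranges do not overlap. Only the number of comparisons per observation changes.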
Save Summary Statistics
- This will be useful if you plan to do further
- analysis on these results.
- libname newdata "C:\SASDATA";
- proc means data=newdata.file01;
- class gender;
- var age height weight;
- output out=newdata.means mean=age height weight;
- run;
Use only the first n obs to test code
- data test;
- set hugedata(obs=10);
- etc.
- run;
- OR
- proc print data=hugedata(obs=10);
- run;
Adding a few obs to a large dataset
- proc append base=hugedata data=newfile;
- run;
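- A small self-contained sketch of the append step. The two tiny data sets below are hypothetical stand-ins for hugedata and newfile, and they are assumed to share the same variables.

```sas
/* Hypothetical stand-in for the large base data set. */
data hugedata;
  input x y z;
  datalines;
1 2 3
4 5 6
;
run;

/* Hypothetical new observations with the same variables. */
data newfile;
  input x y z;
  datalines;
7 8 9
;
run;

/* Add the new observations to the end of the base data set. */
proc append base=hugedata data=newfile;
run;

proc print data=hugedata;
run;
```

- PROC APPEND adds the new observations to the end of the base data set without rereading the base, which is why it suits a large dataset better than a DATA step that SETs both.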