Point and Click SAS: Getting Started with SAS - PowerPoint PPT Presentation

1 / 82
About This Presentation
Title:

Point and Click SAS: Getting Started with SAS

Description:

Today I am going to walk you though the process of doing analyses using a ... Its major downfall is in data manipulation involving multiple tables. ... – PowerPoint PPT presentation

Number of Views:207
Avg rating:3.0/5.0
Slides: 83
Provided by: bal7
Category:

less

Transcript and Presenter's Notes

Title: Point and Click SAS: Getting Started with SAS


1
Point and Click SASGetting Started with SAS
Enterprise Guide 4.1
  • Raymond R. Balise, PhD
  • Stanford University
  • Department of Health Research and Policy

2
From Start to Finish
  • Today I am going to walk you though the process
    of doing analyses using a package called
    SAS/Enterprise guide. Along the way I will hit
    the high points in these areas
  • Importing
  • Cleaning
  • Visualizing
  • Analyzing
  • Reporting

3
Data Management and Analysis Choices
  • You have many choices for managing and analyzing
    data
  • Excel by Microsoft
  • R by R Foundation for Statistical Computing
  • Rcmdr by John Fox
  • S-Plus by Insightful
  • SPSS
  • SAS
  • SAS/Enterprise Guide

4
Excel
  • Excel affords a very nice interface to quickly
    allow simple tables of data.
  • It has some data validation tools.
  • It has nice visualization prototyping tools but
    there are MAJOR graphical bugs.
  • It can do some common analysis methods but it has
    next to no built-in diagnostic tools.
  • Its report writing abilities are pathetic.

5
  • Excels built-in analysis tools are hidden away
    and are extremely limited.

6
R
  • R has no obvious way to do anything.

7
R with Rcmdr
If you learn to download and run a library called
R commander, you can get a nice point-and-click
system to do many common statistics but important
ones for medicine (like survival analysis) are
missing.
8
Rs big brother (commercial software) called
S-plus has a very nice graphical user interface.
You can point and click your way to lots of
analyses and in some cases you can see the code
and use it to learn the language. Unfortunately,
the language is extremely unintuitive for things
like data management. Also, the code generated
by the GUI sometimes includes things that amount
to you clicked on a menu to do the task.
9
SPSS
SPSS has a brilliant graphical user interface for
graphics and analyses. Its major downfall is in
data manipulation involving multiple tables.
Also, when you get stuck, youll need to find an
SPSS expert, which is not as easy as with other
packages (within the medical school).
10
SAS
The latest version of SAS has a robust system for
point-and-click analyses, which they hid away on
a submenu.
11
SAS/Analyst
Once you have the analyst running, you get the
graphics and statistics menus you would expect.
The interface is primitive but it does allow you
to generate many analyses. But the format of the
output (especially the graphics) is substandard.
12
SAS/Enterprise Guide
13
SAS/Enterprise Guide
  • Getting EG
  • If you have a SAS 9.1.3 license though Stanford,
    you can Get EG for free by contacting software
    licensing
  • software.stanford.edu
  • Using EG
  • Project explorer shows a tree view of the
    different things in the project.
  • Project Designer shows flowcharts of the tasks
    you have done.
  • Task List gives you hyperlinks to common tasks
    like analyses and graphics.
  • Task Status shows what EG is working on.
  • Notice the pushpins. A pushed in pin keeps the
    window displayed. A sideways pin says the window
    can retract to the edge of the screen.

14
Orientation to EG
  • The model for EG is to have a flowchart beginning
    with data import, moving through data management,
    into analysis with visualization and ending at
    well formatted pages.
  • You can right click on the objects in the
    flowchart and set and reset properties until you
    are happy.

15
To Code or Not to Code
  • You dont have to memorize SAS code syntax
    anymore!
  • Enterprise Guide builds SAS code for you and
    passes it on to the SAS analysis and
    visualization engines.
  • If you want to learn the programming, you can see
    the code as it develops and if you already know
    some SAS, all the procedures are readily
    available to you. You can augment the code that
    EG writes or you can get an enhanced editor to
    write code from scratch.

16
Excel and Analysis Software
  • Everyone wants to import data from Excel. Use
    extreme caution with every package that can read
    in Excel files, even other Microsoft programs.
  • There is a common, but not commonly known, bug
    with how Windows processes Excel files. The
    fundamental issue is how it figures out if a
    column in Excel has character or numeric data.
    If it thinks that the column is character data,
    no problem, but what happens when it thinks a
    column has numbers, and part way down the column,
    it has some letters in a cell? Sometimes the
    cells with the characters are unceremoniously
    blanked out!
  • You can end up with missing data.

17
R
SAS
18
The Registry
  • Deep inside of windows is a repository of
    information on all the software on your computer.
    It is called the registry.
  • In the registry there is a key that tells
    applications which are talking to Excel how many
    rows to check, going down a column, to figure out
    if a column should be called character or
    numeric.
  • It is set by default to only look in the first 8
    rows!!!!! So if you have character data for the
    first time in a cell after the first 8 rows, it
    guesses incorrectly that you have only numeric
    data in the column and your character cells will
    be erased without warning on import.

19
You can fix this.
  • Make sure to follow these instructions carefully.
    If you tweak the wrong thing in the registry you
    can render your machine unable to reboot!
  • Click the Windows Start menu and choose Run
  • In the dialog type regedit and click ok
  • Open up the tree to this path
  • HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Jet\4.0\Engi
    nes\Excel
  • Double click TypeGuessRows
  • Type 0, that is zero not the letter o, in the
    DWORD editor and click ok
  • Microsoft ACCESS will silently change this
    setting!
  • So watch this setting if you use ACCESS.

20
After the TweakBack to EG
  • Even with that fix, the typical import menu
    choice can still be problematic in EG (and also
    in the other packages).
  • After you tweak the registry, you can copy and
    paste a tiny little program and use it to import
    your data correctly.
  • proc import out bugged datafile
    "C\blah\bugged.xls" replace
  • mixed YES
  • run
  • data bugged
  • set bugged
  • run

Import into a temporary file called bugged that
will disappear when you quit EG.
21
Importing
  • You want to add an import data icon to the
    flowchart.
  • You can go to the file menu and choose Import
    Data . but given the Excel problem, tread with
    extreme caution.
  • Instead, I recommend going to the file menu and
    choosing New gt Code and then pasting in the code
    from the last slide

22
Importing with EG
1) Paste in the code
23
(No Transcript)
24
Imported
  • There are many things to notice when you play
    (aka run) the code.
  • A view table opens.
  • The red letter A in the column heading means that
    the data imported as a column of character data.
    The fact that this is character data will prevent
    you from doing numeric statistics on this column.
  • The Project Explorer gets a log entry and a data
    object added to the tree. The Process Flow
    gets a dataset item. The selected dataset now
    shows up on the task/button bar.

25
Column Headings in Excel
  • You can have anything you want for the column
    headings in your Excel files but you will save
    yourself many headaches if you make the heading
    into a single word with no punctuation.
  • In Excel, change your column heading is dead
    into isDead. This convention will let you read
    the data into any popular analysis package
    without having to rename over and over again.

26
More Complicated Import
  • You can import worksheets with many columns and
    use named tabs in the Excel workbook. Add in
    another code object, then paste in

27
Storing Data with the Project
  • The datasets I have shown you so far are kept in
    a temporary store called work and when you quit
    EG, it will forget the data.
  • You can permanently store the data in a format EG
    knows. You do this by setting up a library
    using the Tools menu. This is just a pointer
    to or alias for a folder on your hard drive.

28
Use 8 or less letters
Click on local
SAS can read and write many files seamlessly. If
you specify an Engine other than BASE you can
access different types of files in the folder
(e.g., SPSS .sav files). You can even have a
library refer to an Excel workbook. That voodoo
lets you work directly inside of Excel files from
within SAS.
Push next if you are not accessing a database
29
Name it something!
  • Once you have the library, change the label on
    the flowchart. You can click on the icon in the
    process flow and then push the F2 key on your
    keyboard or click on the name in the project
    explorer and when it highlights, type over the
    name.
  • You can change the name on any object like this.

You can put any label you want into the flowchart
but you probably want to use the real library
name.
30
Libraries
  • Once the library is set, SAS is aware that there
    is a place that can hold data, but you still need
    to tell it to move the data to the permanent
    store.
  • Next is an example where the data gets stored in
    a permanent library on your hard drive.

31
Working with Data
  • You can import, then fix, problematic data and
    then save a permanent copy in a library.
  • You will be asked if you really want to change
    the data when you start to type the changes.

Double click the table icon to view the table
32
Keeping Data
  • I recommend moving the data to a permanent
    library before you make any changes. First make
    a library. Next load the data into the temporary
    work spot. Single click the data set, then make
    a query using the Filter and Query menu option
    on the Data menu.

33
Drag and drop the variables or double click them.
Push change to change the destination to be the
library.
Name your query.
34
Navigate to the BABIES library and assign a name
to the data set. Use a single word less than 31
letters with only letters in the name.
35
When you are done, push run.
36
Add in a note icon (from the File gt New menu) and
always document every change.
37
Save your project!
  • EG is very stable if it is patched up to date but
    I save often.

After first save
Before first save
38
Patch Software
  • The first release of EG 4.1 had major crashing
    issues.
  • You will want to get the SAS patches and EG
    patches from the web
  • ftp.sas.com/techsup/download/hotfix/e9_win_sbcs.ht
    ml
  • ftp.sas.com/techsup/download/hotfix/ent_guide41.ht
    ml

39
Returning to a Project
  • When you get back to a project the files that
    were imported into the temporary work location
    are gone and SAS will have forgotten about the
    library reference. The files in the permanent
    library are there. You just need to tell SAS to
    remember the library.
  • If you double click a view table immediately
    after you return, you will run into trouble.
    Just rerun the code.

40
Returning to a Project(2)
  • Replay the import code if you need the temporary
    files.
  • To replay the library creation, click on the
    library and use the play button or right click on
    the library and choose Run.

41
So far
  • That is the process for importing, cleaning and
    saving your data.
  • Next you will want to visualize and do statistics
    on the data.

42
For the Old Timers
  • If you were exposed to SAS prior to EG you were
    told a LOT about data steps (keyword DATA) and
    analysis procedures (keyword PROC). Everything
    you learned still works in a code object in the
    flowchart. The EG-created code for data step
    manipulation is now done mostly with SQL (what is
    called PROC SQL in SAS 9.1.3) and the procs are
    now nicely hidden away in the menus.

43
Coding
  • Should you choose to write some code, I have
    written macros to have SAS autocomplete with the
    syntax for the procs. Download the file here
  • To get this functionality, open a code object
    then go to the Code menu and choose Editor
    Macros and Macros
  • Click Import then select the file that you
    downloaded.
  • Then as you type in procedures, you can hit the
    tab key to have it complete your syntax.

www.stanford.edu/class/hrp223/2006/programming/mac
ros.kmf
44
Why would you code?
  • You need tools to do data validation and to fix
    systematic problems. You can manually change
    your dataset like you would in Excel by changing
    one cell at a time or you can learn to write
    basic data step code. The right way to clean
    data is with code.
  • You want to have an audit trail of every change
    to your data.
  • Check out HRP 223 if you need to do real data
    management with validation.
  • www.stanford.edu/class/hrp223/

45
Data Menu for Old Timers
EG does provide you with a data manipulation
menu. Data validation is limited and is buried
within the Filter and Query menu item!
Proc sql
Sort can also find and remove duplicates.
proc sql or proc append or data step
Proc sort
Formats are very important tools for changing
appearance of variables (e.g., how do dates and
dollars look).
Proc format
Proc transpose
Proc surveyselect
Proc rank
Proc standard
Want to know how two spreadsheets or datasets
differ? Use Compare.
Proc datasets /contents
Proc compare
46
The Describe Menu
  • You use this menu for summary descriptions of
    both character and numeric data.

Proc means
Proc tabulate
Proc print
Proc Means univariate graphics
Proc univariate
Proc freq and gplot
Proc tabulate
Proc freq
Proc freq
47
The Graph Menu
  • These plots are almost all done with proc gplot
    with 3D defaults. Dont ever use the fancy 3D
    effects unless you are modeling 3D data!
  • Right click on all the graphics objects in the
    flowchart, choose Properties and uncheck the 3D
    effects box.

48
Adding to the Graphics
  • A couple months back, SAS added in the ability to
    do very basic interactive exploratory graphics.
  • www.sas.com/apps/demosdownloads/setupcat.jsp?catS
    ASEnterpriseGuide

Once installed, you get an extra item on the
Graph menu.
49
You can easily add a second plot to the display.
50
(No Transcript)
51
(No Transcript)
52
Analyze Menu
  • The core statistics from SAS/BASE, SAS/STAT and
    SAS/QC modules of SAS are under the Analyze menu.
  • Some univariate statistics are mixed into the
    Describe menu.

53
Getting Analyses
  • These menus work by assigning variables to roles.
  • Add the sample Cars data set to a new project
  • File gt Open gt Data
  • Browse the data
  • Get descriptive statistics and graphics

54
Notice you can only use numbers here.
You can choose other statistics.
55
All objects in the flowchart have properties. If
you wanted a Microsoft Word document instead of
HTLM, you can change the output format easily by
tweaking the properties.
56
1) Click on results.
2) Check to override defaults.
3) Check output you want.
5) Rerun the analysis.
4) Click ok.
57
(No Transcript)
58
About those options
  • Go to the Tools gt Options menu item and tweak
    the setting. You can always reset them to
    default.

I quickly got tired of telling it to overwrite
the old output.
You can see and tweak the general output
appearance templates here.
59
About those options(2)
The default text was not useful.
Printing the name of the analysis procedure is
not useful.
60
Roles in Analyses
Notice cars is selected.
  • Say you want to compare the average weight for
    American vs. Japanese cars.
  • You need to tell the t-test the name of the
    variable that has the weight that will be used in
    comparing the averages and the grouping variable
    that says if a car is from the USA or Japan.

Notice it is saying it needs a grouping variable.
61
Notice the symbol. This can be character or
numeric data.
The analysis variable (weight in this case) has
to be numeric.
You can drag and drop the variables into the
boxes to fill the roles or click one and push the
arrow.
A t-test compares two means and the Country
variable has 3 levels (USA, Japan and other).
Happily the program complains. Ideally it would
advise doing an ANOVA. Oh well.
62
Building a Data Set
  • To subset the data down
  • Click on the dataset.
  • Choose Filter and Query from the Data menu.
  • Drag and drop in all the variables (or a subset
    if you prefer minimal datasets) into the
    Selected Data window/tab.
  • Click on the filter data tab and move over the
    country variable.

63
You can control click to pick more than one
choice at once.
64
Name the query and the new dataset.
65
Do the analysis on the new data.
  • Click on the new data set.
  • Click the T-test menu item.
  • Fill in the blanks.

Notice it is saying it needs an analysis variable.
66
ALWAYS plot your analyses!
As you are setting your analysis you can look at
the code to learn SAS coding.
67
This is the display format set to journal.
68
The rest
  • The rest of the analyses work as easily as the
    t-test.
  • Be sure to look at the data visualization options
    that go with every test.
  • There is a very nice graphical user interface for
    working with multiple tables accessible from the
    Filter and Query tool. See the Cow bonus
    material.

69
Learning More
  • The Little SAS Book for Enterprise Guide 4.1 by
    Slaugher and Delwiche is a nice friendly
    introduction.
  • There is a free tutorial at SAS
  • www.sas.com/apps/elearning/elearning_courses.jsp?c
    atFreeTutorials
  • There are fairly inexpensive additional online
    tutorials
  • www.sas.com/apps/elearning/elearning_courses.jsp?c
    atSASEnterpriseGuide
  • There are many course notes through SAS but
    they usually are written EG 3 not 4. Ask me
    before you spend the money.
  • The only beyond-the-basics book for EG 4 is
    Statistics Using SAS Enterprise Guide by James
    Davis
  • www.sas.com/apps/pubscat/bookdetails.jsp?catid1p
    c57255

70
Next?!!!
  • If you need pre-grant statistical support, come
    talk to us at SPCTRM!
  • clinicaltrials.stanford.edu
  • If you are interested in more education
    opportunities, visit
  • clinicaltrials.stanford.edu/education
  • You can sign up for our class mailing list form
    on that page.
  • Next quarter I am teaching data management with
    SAS. It will be using EG and will have a focus on
    programming.
  • www.stanford.edu/class/hrp223/
  • and I will be teaching first quarter graduate
    level statistics (hrp259) using EG.
  • Also, Lane and SPCTRM are collaborating to give a
    3 day short course on R/Rcmdr but it is very
    full.
  • lane.stanford.edu/services/workshops/laneclasses.h
    tml

71
Cows
  • The bonus level!

72
The Task
  • I have some cows whose milk is graded. I want to
    give scores of 80 or higher Pass and below 80,
    Fail. The cows were assigned IDs to anonymize
    them for the study but now that I have the
    scores, I want to use their names.

73
Loading the Data
  • I could make the files in Excel but just to be
    different, I did it with code.

74
Add in the Names to the Grades
  • 1) Click on the grades table.
  • 2) Use Filter and Query from the Data menu.

3) Click on Add Tables
4) Find the data on the local SAS server because
the data is already in the project in the
temporary work location.
75
The tables are related.
  • The ID variable in the two tables explains which
    name is for which grade. Push the tiny Join
    button and it notices the common variable name
    (id) and links the two tables.

This is an equijoin. Only records with a matching
ID in both tables will be in the final dataset.
76
Preliminary Look
  • Add the name and grade variables (and the ID if
    you like) to the Select Data tab and perhaps use
    the Sort Data tab to order by name. Then run the
    query.

77
Create the Pass/Fail Score from the Grade
  • Double click on the query to reopen it for
    editing. Then push the Computed Columns button
    and the New button and pick Recode a Column
    and pick grade.

78
  • Recoding categorical data is easy. There is a
    gotcha when working with continuous scale data.

1) Name the variable you are making.
3) Push Add to tell it how to categorize the
data.
2) Specify that the new variable is a character
string.
79
The gotcha is how will it handle scores that are
80.
Specify the value when the values are out of
range.
80
Remember to name your query and the new dataset.
81
Trouble with less-than-or-equal-to
  • The new column looks good except 80 went into the
    lower category. Open the Last Submitted Code
    to fix this.

82
Fixing the Range
  • While the code is mostly unintelligible, you can
    quickly see that it is using less-than-or-equal-to
    80 to call the sample Fail. Just tweak it to
    less-than.
Write a Comment
User Comments (0)
About PowerShow.com