SAS Training - PowerPoint PPT Presentation

About This Presentation
Title:

SAS Training

Description:

Title: Slide 1. Author: Shawn Tsai. Last modified by: Win7. Created Date: 3/20/2003 1:33:22 AM. Document presentation format: On-screen Show (4:3).

Slides: 119
Provided by: Shawn106
Transcript and Presenter's Notes

Title: SAS Training


1
SAS Training
  • Basic

2
Agenda
Introduction to SAS Software Program
Data Preparation and Tabulation
Test of Difference: T-test and ANOVA
Test of Association: Correlation and Regression Analysis



3
INTRODUCTION TO SAS SOFTWARE PROGRAM

4
SAS
  • From traditional statistical analysis of variance
    and predictive modeling to exact methods and
    statistical visualization techniques, SAS/STAT
    software is designed for both specialized and
    enterprise-wide analytical needs. SAS/STAT
    software provides a complete, comprehensive set
    of tools that can meet the data analysis needs of
    the entire organization. 

5
SAS Components
SAS Enterprise Guide
Graphical user interface application for some
common basic data analysis tasks.
SAS 9.2
Command-based application for a wide variety of
data analysis tasks.
6
SAS Enterprise Guide
  • To open the statistical software package SAS,
    go to the Start Menu >>> All Programs >>> SAS
    >>> SAS Enterprise Guide 4.3

7
SAS 9.2
  • To open the statistical software package SAS,
    go to the Start Menu >> All Programs >> SAS
    >> SAS 9.2 (English)

8
What Is SAS Enterprise Guide?
  • SAS Enterprise Guide is an easy-to-use Windows
    client application that provides these
    features:
  • access to much of the functionality of SAS
  • an intuitive, visual, customizable interface
  • transparent access to data
  • ready-to-use tasks for analysis and reporting
  • easy ways to export data and results to other
    applications
  • scripting and automation
  • a program editor with syntax completion and
    built-in function help

9
Explore the Main Windows
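[Screenshot of the main window with three numbered
areas: 1 Project Tree, 2 Workspace and Process Flow
windows, 3 Task List]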
10
Create a Project for This Tutorial
  • If SAS Enterprise Guide is not open, start it
    now. In the Welcome window, select New Project. 
  • If SAS Enterprise Guide is already open,
    select File >> New Project. If you already had a
    project open in SAS Enterprise Guide, you might
    be prompted to save the project. Select the
    appropriate response.
  • The new project opens with an empty Process Flow
    window.

11
1. The Project Tree
  • You can use the Project Tree window to manage the
    objects in your project. You can delete, rename,
    and reorder the items in the project. You can
    also run a process flow or schedule a process
    flow to run at a particular time. 

12
2. Workspace and Process Flow Windows
  • You can have one or more process flows in your
    project. When you create a new project, an empty
    Process Flow window opens. As you add data, run
    tasks, and generate output, an icon for each
    object is added to the process flow.
  • The process flow displays the objects in a
    project, any relationships that exist between the
    objects, and the order in which the objects will
    run when you run the process flow.

13
3. The Task List
  • You can use tasks to do everything from
    manipulating data, to running specific analytical
    procedures, to creating reports.
  • Many tasks are also available as wizards, which
    contain a limited number of options and can
    provide a quick and easy way to use some of the
    tasks.

14
Add SAS Data to the Project
  • You can add SAS data files and other types of
    files, including OLAP cubes, information maps,
    ODBC-compliant data, and files that are created
    by other software packages, such as Microsoft
    Word or Microsoft Excel.

15
  • SAS Enterprise Guide requires all data that it
    accesses to be in table format. A table is a
    rectangular arrangement of rows (also called
    observations) and columns (also called
    variables). 

Name Gender Age Weight
Jones M 48 128.6
Laverne M 58 158.3
Jaffe F . 115.5
Wilson M 28 170.1
16
  • A column's type is important because it affects
    how the column can be used in a SAS Enterprise
    Guide task. A column's type can be either
    character or numeric.
  • Character variables, such as Name and Gender in
    the preceding data set, can contain any values.
    Missing character values are represented by a
    blank.
  • Numeric variables, such as Age and Weight in the
    preceding data set, can contain only numeric
    values. Currency, date, and time data is stored
    as numeric variables. Missing numeric values are
    represented by a period.

Name Gender Age Weight
Jones M 48 128.6
Laverne M 58 158.3
Jaffe F . 115.5
Wilson M 28 170.1
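
The column types and missing-value rules above can be sketched outside SAS. This is only an illustration in Python, not how SAS Enterprise Guide stores data: the list of dictionaries, the use of None for the missing Age, and the helper column_type are all assumptions made for the example.

```python
# Hypothetical in-memory copy of the example table. None stands in for
# the SAS missing numeric value (shown as a period in the data grid).
rows = [
    {"Name": "Jones", "Gender": "M", "Age": 48, "Weight": 128.6},
    {"Name": "Laverne", "Gender": "M", "Age": 58, "Weight": 158.3},
    {"Name": "Jaffe", "Gender": "F", "Age": None, "Weight": 115.5},
    {"Name": "Wilson", "Gender": "M", "Age": 28, "Weight": 170.1},
]


def column_type(values):
    """Classify a column as 'numeric' or 'character', ignoring missing values."""
    present = [v for v in values if v is not None]
    if all(isinstance(v, (int, float)) for v in present):
        return "numeric"
    return "character"


# Type of every column, and the names of rows whose Age is missing.
types = {col: column_type([r[col] for r in rows]) for col in rows[0]}
missing_age = [r["Name"] for r in rows if r["Age"] is None]

print(types)
print(missing_age)
```
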
17
Local and Remote Data
  • When you open data in SAS Enterprise Guide, you
    must select whether you want to look for the data
    on your local computer, a SAS server, or in a SAS
    folder.

18
Local and Remote Data (Cont)
  • If you click My Computer, you can browse the
    directory structure of your computer. You can
    open any type of data file that SAS Enterprise
    Guide can read.
  • If you click Servers, you can look for your data
    on a server. A server can either be a local
    server if SAS software is installed on your own
    computer, or it can be a remote server if SAS
    software is installed on a different computer.

19
Open Data from Server
  • Within each server there are icons that you can
    select for Libraries and Files. Libraries are
    shortcut names for directory locations that SAS
    knows about. Some libraries are defined by SAS,
    and some are defined by SAS Enterprise Guide.
    Libraries contain only SAS data sets.
  • The Files folder on a server enables you to
    access data files in the directory structure on
    the computer where the SAS server is running. For
    example, if you wanted to open a Microsoft Excel
    file on a server that is defined in your
    repository, you would use the Files node to
    locate and open the file.

20
Open Data from SAS Folders
  • If you click SAS Folders, you can browse the
    list of SAS folders that you can access. SAS
    folders are defined in the SAS Metadata Server
    and can be used to provide a central location for
    your stored processes, information maps, and
    projects so that they can be shared with other
    SAS applications. SAS folders can also contain
    content that is not in the SAS Metadata Server,
    such as data files.

21
Add SAS Data from Your Local Computer
  • Select File >> Open >> Data. In the Open Data
    window, select My Computer.
  • Open the SAS Enterprise Guide samples directory
    and double-click Data. By default, the sample
    programs, projects, and data are located in
    C:\Program Files\SAS\EnterpriseGuide\4.3\Sample.
  • By default, all file types are displayed in the
    window. Files with the [icon] icon are SAS data
    sets. Press CTRL and select Orders.sd2 and
    Products.sas7bdat, and then click Open.

22
Add SAS Data from Your Local Computer (Cont)
  • Shortcuts to the Products and Orders tables are
    added to the project, and the data sets open in
    data grids.
  • By default, the tables open in read-only mode. In
    this mode, you can browse, resize column widths,
    hide and hold columns and rows, and copy columns
    and rows to a new table.
  • You cannot edit the data in the table unless you
    change to edit mode. Select Edit >> Remove
    Protect Data.

23
View the Properties of a Data Set
  • In the project tree, right-click Products and
    select Properties from the pop-up menu. The
    Properties for Products window opens. You can see
    information about general properties such as the
    physical location of the data and the date it was
    last modified.

24
View the Properties of a Data Set (Cont)
  • In the selection pane, click Columns. Here you
    can view a list of columns in your data and the
    column attributes.

25
Add Data from a SAS Library
  • Select File >> Open >> Data. In the Open Data
    window, select Servers.
  • Double-click Libraries, and then
    double-click SASHELP. As you can see, only SAS
    data sets are stored in libraries.
  • Scroll in the window and double-click
    the PRDSALE data set. A shortcut to the data is
    added to the project and the data opens in the
    data grid.

26
Save the Project
  • Select File >> Save Project As.
  • The Save window opens and prompts you to choose
    whether to save the project on your computer or
    on a server. Select My Computer.
  • In the Save window, select a location for the
    project. In the File name box, type your file
    name. Project files are saved with the
    extension .egp.
  • Click Save.

27
Data Preparation and Tabulation
28
Data Input
  • There are two main tasks for data input:
  • Manually Input Data
  • Import from an External File

29
Manually Input Data
  1. Create a SAS Library
  2. Create a SAS Data Set
  3. Input data

30
What is a SAS Data Library?
  • A SAS data library is a collection of one or more
    SAS files that are recognized by SAS and can be
    referenced and stored as a unit. Each file is a
    member of the library. SAS data libraries help to
    organize your work. For example, if a SAS program
    uses more than one SAS file, then you can keep
    all the files in the same library. Organizing
    files in libraries makes it easier to locate the
    files and reference them in a program.

31
Telling SAS Where the SAS Data Library Is Located
  • Directly specify the operating environment's
    physical name for the location of the SAS data
    library.
  • Assign a SAS libref (library reference), which is
    a SAS name that is temporarily associated with
    the physical location name of the SAS data
    library.

32
Using Librefs for Temporary and Permanent
Libraries
  • When you start a SAS session, SAS automatically
    assigns the libref WORK to a special SAS data
    library. Normally, the files in the WORK library
    are temporary files.
  • Files that are stored in any SAS data library
    other than the WORK library are usually permanent
    files; that is, they endure from one SAS session
    to the next. Store SAS files in a permanent
    library if you plan to use them in multiple SAS
    sessions.

33
Create a SAS Library
  • Tools >> Assign Project Library

34
Create a SAS Library Step 1
  • Specify name and server for the library

35
Create a SAS Library Step 2
  • Specify the engine for the library

36
Create a SAS Library Step 3
  • Specify options for the library

37
Create a SAS Library Step 4
  • Click Test Library to check that it is OK to
    create this library
  • Click Finish to create the library

38
Create a SAS Library
  • Check the created library in the Server List
  • When a libref is assigned to a SAS data library,
    you can use the libref throughout the SAS session
    to access the SAS files that are stored in that
    library or to create new files.

39
Create SAS Data Set
  • File >> New >> Data

40
Create SAS Data Set Step 1
  • Specify the name (TEST) and the location (DEMO)

41
Create SAS Data Set Step 2
  • Create columns and specify their properties

Name Gender Age Weight
Jones M 48 128.6
Laverne M 58 158.3
Jaffe F . 115.5
Wilson M 28 170.1
42
Input Data
43
Import from an External File
  • The Import Data wizard enables you to create SAS
    data sets from text, HTML, or PC-based database
    files (including Microsoft Excel, Microsoft
    Access, and other popular formats). When you use
    the Import Data wizard, you can specify import
    options for each file that you import.

44
Import Data
  • File >> Import Data

45
Import Data (Cont)
  • Desktop >> SAS Training >> Data Advising
    Survey.xls

46
Import Data (Cont)
  • Specify Data

47
Import Data (Cont)
  • Select Data Source

48
Import Data (Cont)
  • Define Field Attributes

49
Import Data (Cont)
  • Advanced Options

50
Import Data Result
51
Import SPSS file
52
Import SPSS file Step 1
  • Select an SPSS file to import

53
Import SPSS file Step 2
  • Specify a name for the imported table

54
Import SPSS file Result
55
Create Format
  • Tasks >> Data >> Create Format

56
Create Format (Cont)
  • Set Format Name: GENDER
  • Select Library: SASUSER
  • Select Format Type: Character

57
Define Formats
  • Click New Label and type a name of a label
  • Click New Range and select type of values and
    type a value according to the specified label
  • Repeat the steps
  • Click Run
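
Conceptually, a user-defined character format is a lookup from stored values to display labels. The sketch below shows that idea in Python rather than through the Create Format task. The labels "Male" and "Female" and the helper apply_format are hypothetical, and leaving unmatched values unchanged is an assumption of this sketch.

```python
# Hypothetical label/range pairs for a character format named GENDER.
GENDER_FORMAT = {"M": "Male", "F": "Female"}


def apply_format(value, fmt):
    """Return the label for a stored value; unmatched values pass through."""
    return fmt.get(value, value)


raw = ["M", "M", "F", "M"]
labelled = [apply_format(v, GENDER_FORMAT) for v in raw]
print(labelled)
```

The stored values stay as they are; only the displayed labels change, which is why the same format can be reused in any task that shows the column.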

58
Applying User-Defined Formats
  • Open a SAS Data Set
  • Unprotect the data: Edit >> Unprotect Data

59
Applying User-Defined Formats (Cont)
  • Right-click the column
  • Select Properties

60
Applying User-Defined Formats (Cont)
  • In the left pane, select Formats
  • In the Categories box, select User Defined
  • In the Formats box, select the desired format

61
Applying Formats in Tasks
  • Custom formats can be applied in the same places
    that formats defined in SAS can be used.

62
SAS Tasks
  • After you have data in your project, you can
    create reports and run analyses on the data.
  • To do this, you select a SAS task from the Task
    List or from the Tasks menu. Some tasks have
    wizards to guide you through the decisions that
    you need to make. Wizards are available from
    menus or from a link next to the related task in
    the Task List.

63
Using Tasks in SAS Enterprise Guide
  • The icon next to each variable represents the
    variable's type. Country is a character
    variable. Year is a numeric variable. Month is a
    numeric variable in date-and-time format. Actual
    and Predict are numeric variables in currency
    format.

64
One-Way Frequencies Task
  • We should create One-Way Frequencies (tables and
    graphs) to check our data set one last time
    before we intensively analyze the data.

65
One-Way Frequencies
  • Under Data, select Q1-Q19, Gender, Nation, Year,
    and Major for Analysis variables.

66
One-Way Frequencies
  • Under Plots, check Vertical for Bar chart.

67
One-Way Frequencies
  • Check the frequency tables and/or bar charts for
    any errors (e.g., typos). Make any necessary
    corrections.
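
The same check can be sketched outside SAS Enterprise Guide: a one-way frequency table is a count of each distinct value, and rare values often turn out to be typos. The responses below are made up for illustration, and treating a count of one as suspicious is an assumption of this sketch, not a rule from the task.

```python
from collections import Counter

# Hypothetical responses for a Gender column, including one typo ("Femle").
gender = ["Male", "Female", "Female", "Male", "Femle", "Female"]

freq = Counter(gender)
total = sum(freq.values())

# Print value, count, and percentage, most frequent first.
for value, count in freq.most_common():
    print(f"{value:<8}{count:>3}{100 * count / total:>8.1f}%")

# Values that occur only once are worth a second look.
suspects = [value for value, count in freq.items() if count == 1]
print(suspects)
```
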

68
Filter and Sort
  • Use Tasks >> Data >> Filter and Sort... or Sort
    data... to help you find the error(s).

69
Summary Statistics Task
  • The Summary Statistics task can be used to
    calculate summary statistics based on groups
    within the data. You can produce reports, graphs,
    and data sets as output.
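
A minimal sketch of what "summary statistics based on groups" means, written in Python with made-up data. The group labels and scores are hypothetical, and only n, mean, and standard deviation are computed here; the task itself offers many more statistics.

```python
from collections import defaultdict
from statistics import mean, stdev

# Hypothetical (group, score) pairs; the values do not come from the survey.
records = [("F", 4), ("F", 5), ("F", 3), ("M", 2), ("M", 4), ("M", 3)]

# Collect the scores of each group (the classification variable).
groups = defaultdict(list)
for group, score in records:
    groups[group].append(score)

# One row of statistics per group for the analysis variable.
summary = {
    g: {"n": len(v), "mean": mean(v), "std": stdev(v)}
    for g, v in groups.items()
}
print(summary)
```
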

70
Summary Statistics Task
  • The Summary Statistics task has both a wizard and
    the standard task dialog box that can be used to
    set up the results.


71
Summary Statistics: Task Roles
  • Use the wizard to assign variables to roles.

Compute statistics for each numeric variable in
the list.
Specify variables whose values define subgroups.
72
Summary Statistics: Statistics and Results
  • Choose statistics and results to include,
    including a report, graphics, and an output data
    set.

73
Summary Statistics: Advanced View
  • Opening the task in Advanced View enables
    additional options to further modify the output.

74
Summary Tables
  • The Summary Tables wizard or task can be used to
    generate a tabular summary report.

75
Summary Tables Wizard
  • The Summary Tables wizard enables you to select
    analysis variable(s) and statistics, assign
    classification variables to define rows and
    columns, and specify totals.

76
Summary Tables Wizard
77
Test of Difference: T-test, ANOVA, and Others
78
One-Sample t-Test
  • Tasks >> ANOVA >> t Test

79
  • Select One Sample.

80
  • Under Data, assign Q19 to the Analysis variable
    task role and Gender to the Group analysis by
    task role.

81
  • Under Analysis, set the null hypothesis value to
    H0 = 3.

82
T-Test Output
Since the p-value is less than 0.05, it can be
concluded that, on average, female students
consider themselves well prepared for advising
appointments (mean significantly higher than 3).
Since the p-value is less than 0.05, it can be
concluded that, on average, male students also
consider themselves well prepared for advising
appointments.
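
The statistic behind a one-sample t test is t = (mean - H0) / (s / sqrt(n)) with n - 1 degrees of freedom. The Python sketch below computes it separately for two hypothetical groups, mirroring the Group analysis by role. The scores are made up, and the p-value is not computed because the standard library has no t distribution; SAS reports it in the output.

```python
from math import sqrt
from statistics import mean, stdev


def one_sample_t(sample, mu0):
    """Return the t statistic and degrees of freedom for H0: mean = mu0."""
    n = len(sample)
    standard_error = stdev(sample) / sqrt(n)
    return (mean(sample) - mu0) / standard_error, n - 1


# Hypothetical Q19 scores, split by gender.
female = [4, 5, 3, 4, 5, 4, 3, 4]
male = [4, 4, 5, 3]

t_female, df_female = one_sample_t(female, 3)
t_male, df_male = one_sample_t(male, 3)
print(round(t_female, 3), df_female)
print(round(t_male, 3), df_male)
```

A positive t means the sample mean lies above the hypothesized value of 3, which is what supports the "higher than 3" reading once the p-value is small.
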
83
Two-Sample t-Test
  • Tasks >> ANOVA >> t Test

84
  • Select Two Sample.

85
  • Under Data, assign Q6 to the Analysis variable
    task role and Gender to the Classification
    variable task role.

86
  • Under Plots, check Summary plot, Confidence
    interval plot, and Normal quantile-quantile (Q-Q)
    plot.

87
T-Test Output
Equal variances are assumed, so the pooled method
is used. Since the p-value is greater than 0.05,
it cannot be concluded that there is a significant
difference in Advisor Satisfaction between male
and female students.
For the equality of variances test, the
probability is greater than 0.05, so there is no
evidence that the variances for the two groups,
female students and male students, are
different.
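
As a rough sketch of the two quantities discussed above, the Python code below computes the pooled-variance t statistic and a folded F ratio (larger variance over smaller variance) for two hypothetical groups. The scores are made up, and the p-values are left to the SAS output because the standard library has no t or F distribution.

```python
from math import sqrt
from statistics import mean, variance


def pooled_t(a, b):
    """Return the pooled-variance t statistic and its degrees of freedom."""
    na, nb = len(a), len(b)
    df = na + nb - 2
    # Pooled variance: the two sample variances weighted by their df.
    sp2 = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / df
    t = (mean(a) - mean(b)) / sqrt(sp2 * (1 / na + 1 / nb))
    return t, df


def folded_f(a, b):
    """Return the ratio of the larger sample variance to the smaller one."""
    va, vb = variance(a), variance(b)
    return max(va, vb) / min(va, vb)


# Hypothetical Q6 scores for the two groups.
female = [4, 5, 3, 4]
male = [3, 4, 2, 3]

t, df = pooled_t(female, male)
f_ratio = folded_f(female, male)
print(round(t, 3), df, f_ratio)
```
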
88
One-Way ANOVA
  • Tasks >> ANOVA >> One-Way ANOVA

89
  • Under Data, assign Q6 and Year to the task roles
    of Dependent variable and Independent variable,
    respectively.

90
  • Under Tests, click Levene's test

91
  • Under Means Comparison, check Bonferroni t test,
    Duncan's multiple-range test, and Scheffe's
    multiple comparison procedure for Post Hoc tests

92
  • Under Plots, check Means for Plots Types.
  • Then, click Run.

93
One-Way ANOVA results
Since the p-value is greater than 0.05, it can be
concluded that there is no significant
difference in average Advisor Satisfaction among
years of study. Therefore, there is no need to
check the post hoc tests.
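
The F value in a one-way ANOVA table is the between-group mean square divided by the within-group mean square. The sketch below computes it in Python for three hypothetical groups. The scores are made up, and the p-value is left to the SAS output.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return F, the between-groups df, and the within-groups df."""
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    k, n = len(groups), len(all_values)
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of the observations around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    f = (ss_between / (k - 1)) / (ss_within / (n - k))
    return f, k - 1, n - k


# Hypothetical Q6 scores for three years of study.
by_year = [[3, 4, 5], [4, 5, 6], [2, 3, 4]]

f_value, df_between, df_within = one_way_anova_f(by_year)
print(f_value, df_between, df_within)
```
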
94
Post Hoc Test: Bonferroni t Tests
95
Post Hoc Test: Scheffe's Test
96
ANOVA Means Plot of Q6 by Year
97
Test of Association: Correlation and Regression
Analysis
98
Data Exploration, Correlations, and Scatter Plots
  • Tasks >> Multivariate >> Correlations

99
  • With Data selected at the left, assign Q1, Q2,
    Q3, Q4, and Q5 to the task role of Analysis
    variables and Q6 to the role of Correlate with.

100
Correlation Types
101
  • In Results, check the box for Create a scatter
    plot for each correlation pair. Also, check the
    box at the right for Show correlations in
    decreasing order of magnitude and uncheck the box
    for Show statistics for each variable.

102
Correlation Analysis
  • Since p-values are less than 0.05, there are
    significant (positive) relationships between Q6
    (Overall satisfaction on Advisor) and Q1, Q2, Q3,
    Q4, Q5.
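
For reference, the Pearson correlation coefficient and the t statistic used to test it against zero can be sketched in Python as follows. The two score lists are hypothetical, and the p-value comes from a t distribution with n - 2 degrees of freedom, which is not computed here.

```python
from math import sqrt
from statistics import mean


def pearson_r(x, y):
    """Return the Pearson correlation coefficient of two equal-length lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical scores for one item (Q1) and overall satisfaction (Q6).
q1 = [1, 2, 3, 4, 5]
q6 = [2, 1, 4, 3, 5]

r = pearson_r(q1, q6)
n = len(q1)
# Test statistic for H0: correlation = 0, with n - 2 degrees of freedom.
t = r * sqrt((n - 2) / (1 - r ** 2))
print(round(r, 3), round(t, 3))
```

The sign of r gives the direction of the relationship; the p-value only says whether r differs from zero.
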

103
Linear Regression
  • Tasks >> Regression >> Linear Regression

104
  • Drag Q6 to the dependent variable task role and
    Q1, Q2, Q3, Q4, and Q5 to the explanatory
    variables task role.

105
Regression Model
  • Model selection method: Full model fitted (the
    default)

106
Regression Statistics
  • Under Details on estimates, check Standardized
    regression coefficients
  • Perform some Diagnostics

107
Regression Diagnostics
  • Unusual and Influential data (Outliers/Leverage)
  • Tests on Normality of Residuals
  • Tests on Nonconstant Error of Variance
    (Heteroscedasticity)
  • Tests on Correlations among Predictors
    (Multicollinearity)
  • Tests on Nonlinearity
  • Tests on Dependence of Residuals
    (Autocorrelation)
  • Model Specification

108
Diagnostics: Collinearity Analysis
  • This option requests a detailed analysis of
    collinearity among the regressors. This includes
    eigenvalues, condition indices, and decomposition
    of the variances of the estimates with respect to
    each eigenvalue.

109
Diagnostics: Collinearity Analysis
  • Check Tolerance (1/VIF) or Variance Inflation
    (VIF)
  • Some researchers use a lenient VIF cutoff of 5.0
    or even 10.0 to signal when multicollinearity
    is a problem. The researcher may wish to drop the
    variable with the highest VIF if
    multicollinearity is indicated and theory
    warrants.
  • The condition indices are the square roots of the
    ratio of the largest eigenvalue to each
    individual eigenvalue. The largest condition
    index is the condition number of the
    scaled X matrix. Belsley, Kuh, and Welsch (1980)
    suggest that, when this number is around 10, weak
    dependencies might be starting to affect the
    regression estimates. When this number is larger
    than 100, the estimates might have a fair amount
    of numerical error (although the statistical
    standard error almost always is much greater than
    the numerical error).
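
To make the tolerance and VIF relationship concrete: for predictor j, VIF = 1 / (1 - R²), where R² comes from regressing predictor j on the other predictors, and tolerance = 1 / VIF. The Python sketch below assumes only two predictors, so that R² reduces to their squared correlation. The data are hypothetical, and the condition indices are not computed because they need an eigenvalue routine.

```python
from math import sqrt
from statistics import mean


def pearson_r(x, y):
    """Return the Pearson correlation coefficient of two equal-length lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Two hypothetical predictors. With exactly two predictors, the R-square
# of one regressed on the other is their squared correlation.
x1 = [1, 2, 3, 4, 5]
x2 = [2, 1, 4, 3, 5]

r_squared = pearson_r(x1, x2) ** 2
tolerance = 1 - r_squared
vif = 1 / tolerance
print(round(tolerance, 3), round(vif, 3))
```

With these made-up values the VIF stays below the cutoffs of 5.0 and 10.0 mentioned above.
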

110
Diagnostics: Heteroscedasticity Test
  • This option tests that the first and second
    moments of the model are correctly specified.
  • Asymptotic covariance matrix. This option
    displays the estimated asymptotic covariance
    matrix of the estimates under the hypothesis of
    heteroscedasticity.

111
Diagnostics: Durbin-Watson Statistic
  • The Durbin-Watson statistic shows whether or not
    the errors have first-order autocorrelation.
    (This test is appropriate only for time series
    data.) The sample autocorrelation of the
    residuals is also produced.
  • The value of d ranges from 0 to 4. Values close
    to 0 indicate extreme positive autocorrelation,
    values close to 4 indicate extreme negative
    autocorrelation, and values close to 2 indicate
    no serial autocorrelation. As a rule of thumb, d
    should be between 1.5 and 2.5 to indicate
    independence of observations. Positive
    autocorrelation means standard errors of the b
    coefficients are too small. Negative
    autocorrelation means standard errors are too
    large.
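
The statistic itself is simple: d is the sum of squared differences between successive residuals divided by the sum of squared residuals. The Python sketch below uses two made-up residual series, assumed to be in time order, to show how d moves toward 4 or toward 0.

```python
def durbin_watson(residuals):
    """Return the Durbin-Watson d statistic for residuals in time order."""
    numerator = sum(
        (residuals[i] - residuals[i - 1]) ** 2
        for i in range(1, len(residuals))
    )
    denominator = sum(e ** 2 for e in residuals)
    return numerator / denominator


# Hypothetical residuals: signs flip every step (negative autocorrelation).
alternating = [1, -1, 1, -1, 1, -1]
# Hypothetical residuals: long runs of one sign (positive autocorrelation).
trending = [1, 1, 1, -1, -1, -1]

d_alternating = durbin_watson(alternating)
d_trending = durbin_watson(trending)
print(round(d_alternating, 3), round(d_trending, 3))
```

Both values fall outside the 1.5 to 2.5 rule of thumb, one on each side.
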

112
  • Under Plots, select Custom list of plots under
    Show plots for regression analysis. In the menu
    that appears, uncheck the box for Diagnostic
    plots and check the boxes for Histogram plot of
    the residual, Normal quantile plot of the
    residual, and Residual plots.

113
Regression Analysis
  • These are the F Value and p-value, respectively,
    testing the null hypothesis that the Model does
    not explain the variance of the response
    variable.

R-Square is the proportion of the total variance
of the response variable explained by the model.
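
R-Square and the model F value are linked: R² = 1 - SS_error / SS_total, and F = (R² / p) / ((1 - R²) / (n - p - 1)) for p explanatory variables. The Python sketch below assumes a single explanatory variable and made-up data to keep the arithmetic visible; the model in these slides uses five.

```python
from statistics import mean


def simple_regression(x, y):
    """Fit y = intercept + slope * x by least squares; also return R-square."""
    mx, my = mean(x), mean(y)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum(
        (a - mx) ** 2 for a in x
    )
    intercept = my - slope * mx
    fitted = [intercept + slope * a for a in x]
    ss_total = sum((b - my) ** 2 for b in y)
    ss_error = sum((b - f) ** 2 for b, f in zip(y, fitted))
    return slope, intercept, 1 - ss_error / ss_total


# Hypothetical explanatory variable (Q1) and response (Q6).
q1 = [1, 2, 3, 4, 5]
q6 = [2, 1, 4, 3, 5]

slope, intercept, r_square = simple_regression(q1, q6)
n, p = len(q1), 1
f_value = (r_square / p) / ((1 - r_square) / (n - p - 1))
print(round(slope, 3), round(intercept, 3), round(r_square, 3), round(f_value, 3))
```
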
114
Regression Analysis
  • These are the t Value and p-value, respectively,
    testing the null hypothesis that each
    coefficient is equal to 0.

115
Regression Diagnostics
  • Might suggest violation of normality of residuals
    assumption

116
Regression Diagnostics
  • Might suggest violation of normality of residuals
    assumption

117
Regression Diagnostics
118
  • Q&A