Title: Haas MFE SAS Workshop Lecture 3
1 Haas MFE SAS Workshop Lecture 3
- Peng Liu, http://faculty.haas.berkeley.edu/peliu/computing
Haas School of Business, Berkeley, MFE 2006
2 Commonly used PROCedures in Financial Economics
3 Basic Statistical Analysis
- Univariate statistics
  - PROC MEANS
  - PROC UNIVARIATE
  - PROC FREQ
- Bivariate and Multivariate Statistics
  - PROC CORR
  - PROC NPAR1WAY
  - PROC TTEST
4 Comparison of PROC MEANS and PROC UNIVARIATE
- PROC MEANS
  - DESCRIPTIVE STATISTICS
    - CLM CSS CV KURTOSIS LCLM MAX MEAN MIN N NMISS RANGE SKEWNESS STD STDERR SUM SUMWGT UCLM USS VAR
  - QUANTILE STATISTICS
    - MEDIAN|P50 Q1|P25 Q3|P75 P1 P5 P10 P90 P95 P99 QRANGE
  - HYPOTHESIS TESTING
    - PROBT T
- PROC UNIVARIATE
  - DESCRIPTIVE STATISTICS
    - CSS CV KURTOSIS MAX MEAN MIN MODE N NMISS RANGE SKEWNESS STD STDMEAN SUM SUMWGT USS VAR
  - QUANTILE STATISTICS
    - MEDIAN P1 P5 P10 P90 P95 P99 Q1 Q3 QRANGE
  - HYPOTHESIS TESTING
    - NORMAL PROBN MSIGN PROBM SIGNRANK PROBS T PROBT
  - ROBUST STATISTICS
    - GINI MAD QN SN STD_GINI STD_MAD STD_QN STD_QRANGE STD_SN
5 PROC MEANS
PROC MEANS DATA=mfe.loan;
  VAR appraisal ltv;
  CLASS state;
RUN;
PROC MEANS DATA=mfe.loan MAX MIN;
  VAR appraisal ltv;
  OUTPUT OUT=m MAX=maxvalue maxltv MIN=minvalue minltv;
RUN;
- By default, PROC MEANS prints the variable label, N, Mean, Std Dev, Minimum, and Maximum.
- MEDIAN, MIN, MAX, CLM, and ALPHA=0.05 are examples of options you can specify.
- You can request summary statistics for many variables at once.
- A CLASS statement produces summary statistics for each level of the grouping variable.
- You can suppress printed output with the NOPRINT option.
- You can save the results in a SAS data set of your own with the OUTPUT statement.
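The NOPRINT and OUTPUT options can be combined as in the following minimal sketch. It assumes a small made-up data set, work.toy_loan, whose variable names mirror mfe.loan; the values and the output names (loan_stats, mean_appraisal, and so on) are invented for illustration.

```sas
/* Hypothetical toy data: names mirror mfe.loan, values are invented */
DATA work.toy_loan;
  INPUT state $ appraisal ltv;
  DATALINES;
CA 500000 0.80
CA 650000 0.75
NY 400000 0.90
NY 450000 0.85
;
RUN;

/* NOPRINT suppresses the listing; OUTPUT OUT= saves the statistics */
PROC MEANS DATA=work.toy_loan NOPRINT;
  CLASS state;
  VAR appraisal ltv;
  OUTPUT OUT=work.loan_stats MEAN=mean_appraisal mean_ltv
                             MAX=max_appraisal max_ltv;
RUN;

/* Inspect the saved data set: _TYPE_=0 is the overall row,
   _TYPE_=1 rows hold the per-state statistics */
PROC PRINT DATA=work.loan_stats;
RUN;
```

The names after MEAN= and MAX= are matched to the VAR variables in order, so the first name in each list belongs to appraisal and the second to ltv.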
6 PROC UNIVARIATE
PROC UNIVARIATE DATA=mfe.loan;
  VAR ltv;
  ID id;
RUN;
PROC UNIVARIATE DATA=mfe.loan;
  VAR ltv;
  HISTOGRAM;
  QQPLOT / NORMAL;
RUN;
- Use VAR to specify which variables you want to analyze; otherwise the procedure analyzes all numeric variables.
- Use ID to identify extreme observations; without an ID statement, the observation number is used by default.
- Can plot histograms, quantile-quantile plots, etc.
- Can perform a two-sided t test, etc.
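As a sketch of the two-sided t test, the MU0= option on the PROC UNIVARIATE statement sets the null value for the location tests. The data set work.toy_ltv and the null value 0.8 are assumptions chosen only for illustration.

```sas
/* Hypothetical toy data; @@ reads several observations per line */
DATA work.toy_ltv;
  INPUT id ltv @@;
  DATALINES;
1 0.80 2 0.75 3 0.90 4 0.85 5 0.95 6 0.70
;
RUN;

/* MU0= sets the hypothesized mean; the "Tests for Location" table
   reports Student's t, the sign test, and the signed rank test */
PROC UNIVARIATE DATA=work.toy_ltv MU0=0.8;
  VAR ltv;
  ID id;
RUN;
```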
7 PROC FREQ
PROC FREQ DATA=mfe.loan;
  TABLE term;
RUN;
PROC FREQ DATA=mfe.loan;
  TABLE state state*term / NOCOL NOROW;
RUN;
- One-way vs. two-way frequency tables
- The / CHISQ or / BINOMIAL option can be used to test for equal proportions.
- One TABLE statement can produce more than one frequency table.
- You can suppress column and/or row percentages with the options / NOCOL NOROW.
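A minimal sketch of the CHISQ option on one-way and two-way tables follows. The data set work.toy_terms and its values are made up for illustration, and with so few observations SAS will warn that the chi-square approximation may not be valid.

```sas
/* Hypothetical toy data: state and loan term in years */
DATA work.toy_terms;
  INPUT state $ term @@;
  DATALINES;
CA 30 CA 30 CA 15 NY 30 NY 15 NY 15 CA 30 NY 30
;
RUN;

PROC FREQ DATA=work.toy_terms;
  /* One-way table: chi-square test of equal proportions across terms */
  TABLES term / CHISQ;
  /* Two-way table: chi-square test of association, percentages trimmed */
  TABLES state*term / CHISQ NOCOL NOROW;
RUN;
```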
8 PROC CORR
PROC CORR DATA=mfe.loan;
  VAR rate ltv fico_orig;
RUN;
PROC CORR DATA=mfe.loan COV SPEARMAN;
  VAR rate ltv fico_orig;
RUN;
- The CORR procedure computes Pearson correlation coefficients, three nonparametric measures of association (Spearman rank-order correlation, Kendall's tau-b, and Hoeffding's measure of dependence, D), and the probabilities associated with these statistics for numeric variables.
- The default is Pearson correlation.
- The COV option invokes the computation of covariances.
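All four measures of association can be requested in one call, as in this sketch. The data set work.toy_corr is hypothetical and its values are invented; Hoeffding's D needs at least five observations, so six are supplied.

```sas
/* Hypothetical toy data with the same variable names as the slide */
DATA work.toy_corr;
  INPUT rate ltv fico_orig;
  DATALINES;
5.5 0.70 760
6.0 0.75 720
6.5 0.80 700
6.0 0.85 690
7.0 0.90 650
7.5 0.95 620
;
RUN;

/* Request Pearson plus the three nonparametric measures and covariances */
PROC CORR DATA=work.toy_corr PEARSON SPEARMAN KENDALL HOEFFDING COV;
  VAR rate ltv fico_orig;
RUN;
```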
9 PROC TTEST
DATA;
  INPUT a b @@;
  DATALINES;
51 55 64 61 75 74 86 90
95 93 68 71 73 72 90 95
;
RUN;
PROC TTEST;
  PAIRED a*b;
RUN;
- A DATA statement without a name creates an automatically named data set (DATA1, DATA2, ...).
- @@ in INPUT lets SAS keep reading observations from the same data line.
- DATALINES is a SAS statement followed by lines of raw data.
- Data values are typed continuously, separated by blanks; you can break them across lines any way you like.
- The closing ; should stand by itself on its own line.
- A PROC step runs on the most recently created data set if the user does not specify a data set name.
- PAIRED a*b requests a paired t test.
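A paired t test is equivalent to a one-sample t test on the within-pair differences. The sketch below reuses the slide's data under an assumed data set name (work.pairs) and an assumed variable name (diff), so the two approaches can be compared side by side.

```sas
/* Same data as the slide; diff is the within-pair difference */
DATA work.pairs;
  INPUT a b @@;
  diff = a - b;
  DATALINES;
51 55 64 61 75 74 86 90
95 93 68 71 73 72 90 95
;
RUN;

/* One-sample t test of H0: mean(diff) = 0 */
PROC TTEST DATA=work.pairs H0=0;
  VAR diff;
RUN;

/* Paired form, for comparison with the one-sample result */
PROC TTEST DATA=work.pairs;
  PAIRED a*b;
RUN;
```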
10 PROC NPAR1WAY
PROC NPAR1WAY DATA=mfe.loan;
  CLASS state;
  VAR ltv;
RUN;
- Nonparametric tests for differences across a one-way classification.
- If the normality assumption does not hold, we may use nonparametric tests.
- PROC NPAR1WAY performs nonparametric tests for location and scale differences across a one-way classification, based on the following scores: Wilcoxon, Median, Van der Waerden, Savage, Siegel-Tukey, Ansari-Bradley, Klotz, and Mood scores.
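Specific score types can be requested as options on the PROC statement. This sketch asks only for the Wilcoxon and Median analyses; the data set work.toy_npar and its values are invented for illustration.

```sas
/* Hypothetical toy data: loan-to-value ratio by state */
DATA work.toy_npar;
  INPUT state $ ltv @@;
  DATALINES;
CA 0.80 CA 0.75 CA 0.70 CA 0.85
NY 0.90 NY 0.85 NY 0.95 NY 0.80
;
RUN;

/* WILCOXON and MEDIAN restrict the output to those two score types */
PROC NPAR1WAY DATA=work.toy_npar WILCOXON MEDIAN;
  CLASS state;
  VAR ltv;
RUN;
```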
11 Financial Econometrics using SAS
- Linear Models (OLS, GLS and their variants)
  - PROC REG
  - PROC GLM (Skip)
- Logistic Regression
  - PROC LOGISTIC
  - PROC GENMOD
- Hazard Regression (Cox P.H.)
  - PROC PHREG
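12 Linear Model Theory
- Data: $(y_i, x_i)$ with $x_i = (x_{i1}, x_{i2}, \dots, x_{ik})$ for $i = 1, \dots, n$, and $y_i \in \mathbb{R}$
- Model: $y_i = \beta_0 + \beta_1 x_{i1} + \dots + \beta_k x_{ik} + \varepsilon_i$ for $i = 1, \dots, n$
- For short: $y = X\beta + \varepsilon$
- where $y = (y_1, \dots, y_n)^T$, $X$ is the $n \times (k+1)$ matrix whose $i$-th row is $(1, x_{i1}, \dots, x_{ik})$, $\beta = (\beta_0, \dots, \beta_k)^T$, and $\varepsilon = (\varepsilon_1, \dots, \varepsilon_n)^T$
- Assumption: the $\varepsilon_i$ are i.i.d. normal, $N(0, \sigma^2)$
- Ordinary Least Squares estimation
- $\hat{\beta} = (X^T X)^{-1} X^T y$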
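The OLS formula on this slide follows from minimizing the residual sum of squares. A short derivation, assuming $X$ has full column rank so that $X^T X$ is invertible (an assumption not stated on the slide):

```latex
\begin{align*}
S(\beta) &= (y - X\beta)^T (y - X\beta) \\
\frac{\partial S}{\partial \beta} &= -2\, X^T (y - X\beta) = 0 \\
X^T X \hat{\beta} &= X^T y \\
\hat{\beta} &= (X^T X)^{-1} X^T y
\end{align*}
```

The third line is the system of normal equations; the last line solves it for $\hat{\beta}$.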
13 PROC REG
- PROC REG is a SAS procedure for simple or multiple linear regression models with continuous dependent variables.
- Part of SAS/STAT
- Model fitting (parameters, residuals, confidence limits, influence statistics, etc.)
- Model selection (forward, backward, stepwise, etc.)
- Hypothesis testing
- Model diagnostics
- Plotting
- Outputting estimates and statistics
14 PROC REG Examples
PROC REG DATA=mfe.loan;
  MODEL ltv = rate;
  PLOT ltv*rate;
QUIT;
MODEL ltv = rate fico_orig;
OLS: MODEL ltv = term rate fico_orig;
MODEL ltv = rate fico_orig term / SELECTION=F;
- Begin with PROC REG; end with QUIT.
- Multiple independent or dependent variables are separated by spaces.
- The label OLS: is optional; labels are useful when one PROC REG contains several MODEL statements.
- By default, a constant (intercept) is included.
- Use / options on the MODEL statement to request additional statistics or to specify a model selection method.
- PLOT creates a scatter plot of your regression data and automatically adds the regression line.
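The sketch below shows a labeled MODEL statement with a / option, plus an OUTPUT statement that saves fitted values and residuals. The data set work.toy_reg, its values, and the output names (reg_out, ltv_hat, resid) are assumptions made for illustration.

```sas
/* Hypothetical toy data with the same variable names as the slide */
DATA work.toy_reg;
  INPUT ltv rate fico_orig;
  DATALINES;
0.70 5.5 760
0.75 6.0 720
0.80 6.5 700
0.85 6.0 690
0.90 7.0 650
0.95 7.5 620
0.80 6.0 710
0.85 7.0 660
;
RUN;

PROC REG DATA=work.toy_reg;
  /* CLB adds confidence limits for the parameter estimates */
  OLS: MODEL ltv = rate fico_orig / CLB;
  /* Save predicted values (P=) and residuals (R=) to a data set */
  OUTPUT OUT=work.reg_out P=ltv_hat R=resid;
RUN;
QUIT;

PROC PRINT DATA=work.reg_out;
RUN;
```

RUN executes the statements submitted so far while PROC REG stays active; QUIT then ends the procedure.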
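15 Logistic Regression Theory
- Data: $(y_i, x_i)$ with $x_i = (x_{i1}, x_{i2}, \dots, x_{ik})$ for $i = 1, \dots, n$, where $y_i$ is a binary or ordinal response variable, e.g. $y_i \in \{0, 1\}$
- Model (binary case, logit link): $\log \frac{p_i}{1 - p_i} = \beta_0 + \beta_1 x_{i1} + \dots + \beta_k x_{ik}$, where $p_i = P(y_i = 1 \mid x_i)$
- Maximum likelihood estimate of $\beta$
- Assumption: binomial variation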
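For the binary case with a logit link, the maximum likelihood estimate can be written out as follows. This is the standard textbook form, given here as a sketch; the symbols $p_i$ and $\eta_i$ are notation introduced for this note and are not taken from the slide.

```latex
\begin{align*}
\eta_i &= \beta_0 + \beta_1 x_{i1} + \dots + \beta_k x_{ik}, \qquad
p_i = P(y_i = 1 \mid x_i) = \frac{e^{\eta_i}}{1 + e^{\eta_i}} \\
\ell(\beta) &= \sum_{i=1}^{n} \left[ y_i \log p_i + (1 - y_i) \log (1 - p_i) \right]
            = \sum_{i=1}^{n} \left[ y_i \eta_i - \log\left(1 + e^{\eta_i}\right) \right] \\
\hat{\beta} &= \arg\max_{\beta} \ell(\beta)
\end{align*}
```

There is no closed-form solution for $\hat{\beta}$, which is why the estimate is computed by iterative optimization.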
16 Logistic Regression SAS procedure
- SAS has several procedures that perform logistic regression, e.g. GENMOD, CATMOD and LOGISTIC.
- PROC LOGISTIC
  - Works for binary or ordinal response variables
  - Performs MLE using different optimization algorithms
  - Four model selection methods: forward, backward, stepwise, and score
  - Outputs statistics to a data set
  - Tests linear hypotheses about the parameters
17 PROC LOGISTIC Examples
PROC LOGISTIC DATA=mfe.loan;
  CLASS state edu;
  MODEL default = ltv age edu term rate state / LINK=LOGIT;
RUN;
- Begin with PROC LOGISTIC; end with RUN.
- The / LINK=LOGIT option can be omitted because logit is the default; other options are PROBIT, CLOGIT, and CLOGLOG.
- Use the CLASS statement to avoid creating dummy variables in a DATA step.
- / options can be used to request additional statistics or to specify a selection method.
- Use the TEST statement to test linear hypotheses about the parameters.
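A minimal sketch of the TEST statement follows. The data set work.toy_default and its values are invented, and the label both_zero is a hypothetical name; the DESCENDING option is added here so that the model describes the probability that default = 1.

```sas
/* Hypothetical toy data: default indicator, ltv, and rate */
DATA work.toy_default;
  INPUT default ltv rate;
  DATALINES;
0 0.70 5.5
0 0.75 6.0
1 0.80 6.5
0 0.85 6.0
1 0.90 7.0
0 0.80 7.0
1 0.95 6.5
1 0.75 7.5
0 0.90 5.5
1 0.85 7.5
;
RUN;

PROC LOGISTIC DATA=work.toy_default DESCENDING;
  MODEL default = ltv rate;
  /* Joint Wald test that both slope coefficients are zero */
  both_zero: TEST ltv = 0, rate = 0;
RUN;
```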
18 Survival Analysis Background 1
19 Survival Analysis Background 2
20 Cox Proportional Hazard Regression
21 PROC PHREG - Example
PROC PHREG DATA=mfe.loan;
  MODEL loanage*prepay(0) = age edu race rate ltv fico_orig state;
RUN;
- Use a WHERE statement to restrict the regression to the subsample you want.
- You can define or group variables inside PHREG, after the MODEL statement, using IF THEN ELSE.
- Handling tied data: / TIES=EXACT; another option is DISCRETE.
- To run PHREG separately for different groups, use a BY statement; the data must be sorted by the BY variable first.
- Use the CLASS statement to create dummy variables.
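The WHERE, IF THEN ELSE, and TIES= points can be combined as in this sketch. The data set work.toy_surv, its values, the cutoffs (600 and 0.8), and the derived variable high_ltv are all assumptions made for illustration; with so few observations the estimates themselves are not meaningful.

```sas
/* Hypothetical toy data: loan age in months, prepay = 0 marks censoring */
DATA work.toy_surv;
  INPUT loanage prepay rate ltv fico_orig;
  DATALINES;
12 1 7.5 0.70 720
12 1 6.0 0.85 700
18 0 6.0 0.85 680
24 1 7.0 0.75 650
24 1 6.5 0.90 710
30 0 5.5 0.90 590
36 1 6.5 0.80 700
36 0 6.0 0.95 640
48 1 6.0 0.80 730
60 0 5.5 0.75 690
;
RUN;

PROC PHREG DATA=work.toy_surv;
  /* Keep only loans with an origination FICO score of at least 600 */
  WHERE fico_orig >= 600;
  /* TIES=EXACT handles the tied loan ages in the data */
  MODEL loanage*prepay(0) = rate high_ltv / TIES=EXACT;
  /* Programming statements after MODEL define the grouped covariate */
  IF ltv > 0.8 THEN high_ltv = 1;
  ELSE high_ltv = 0;
RUN;
```

To fit the model separately by group, sort the data with PROC SORT and add a BY statement naming the same variable.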