Title: Regression Analysis Qualitative Dependent Variable
1 Regression Analysis: Qualitative Dependent Variable
- Muhammad Qaiser Shahbaz
- Department of Statistics
- GC University, Lahore
2 The Regression Analysis
- Regression analysis deals with predicting the mean value of the dependent variable from information on the independent variables.
- The nature of the dependent variable plays a very important role in regression analysis.
- The major types of dependent variable encountered in regression analysis are quantitative and qualitative.
- The estimation framework differs for the two types of dependent variable.
3 The Generalized Linear Models
- A class of linear models that contains several types of models as special cases.
- The Generalized Linear Model set up is given as
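The set up is commonly written in three parts. The notation below ($Y_i$, $\mu_i$, $\eta_i$, $g$, $\beta_j$, and $p$ independent variables) is assumed for illustration.

```latex
\begin{align*}
  % Random component: Y_i follows an exponential-family distribution
  E(Y_i) &= \mu_i \\
  % Systematic component: the linear predictor
  \eta_i &= \beta_0 + \beta_1 x_{i1} + \cdots + \beta_p x_{ip} \\
  % Link function: connects the mean to the linear predictor
  g(\mu_i) &= \eta_i
\end{align*}
```

With the identity link and a Normal response this reduces to ordinary linear regression; with the logit link and a Binomial response it gives logistic regression.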
4 Types of Qualitative Dependent Variable
- The following major types of qualitative variables are met in practice:
- Binary Variable
- Categorical without Order
- Categorical with Order
5 Regression with Qualitative Dependent Variable
6 Generalized Linear Models
- Models Estimation
- Iteratively Reweighted Least Squares Estimation
- Maximum Likelihood Estimation
- Model Diagnostics
- Outliers
- Residual Analysis
- Autocorrelations
- Heteroscedasticity
- Multicollinearity
- Leverage Values
- Influential Observations
- Model Validation
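Iteratively Reweighted Least Squares can be sketched for the binary logistic case as follows. This is a minimal illustration, assuming NumPy is available; the function name `irls_logistic` and the small data set are made up for the example.

```python
import numpy as np

def irls_logistic(X, y, n_iter=25, tol=1e-8):
    """Fit a binary logistic regression by IRLS.

    X must already contain a column of ones for the intercept.
    """
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        eta = X @ beta                      # linear predictor
        mu = 1.0 / (1.0 + np.exp(-eta))     # fitted probabilities
        w = mu * (1.0 - mu)                 # working weights
        z = eta + (y - mu) / w              # working response
        XtW = X.T * w                       # X'W, with W diagonal
        beta_new = np.linalg.solve(XtW @ X, XtW @ z)  # weighted least squares step
        converged = np.max(np.abs(beta_new - beta)) < tol
        beta = beta_new
        if converged:
            break
    return beta

# Made-up data: one explanatory variable, outcomes that are not perfectly separated.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0])
y = np.array([0.0, 0.0, 1.0, 0.0, 1.0, 0.0, 1.0, 1.0])
X = np.column_stack([np.ones_like(x), x])

beta = irls_logistic(X, y)
print(beta)  # intercept and slope on the logit scale
```

For the logit link, each IRLS step coincides with a Newton-Raphson step on the log-likelihood, so here the two estimation methods listed above lead to the same estimates.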
7 Regression with Binary Dependent Variable
- The dependent variable is binary.
- The distribution of the error is Binomial.
- Several models are available, depending on the link function; the two most popular are:
- The Binary Logistic Regression
- The Probit Regression
- The models are used to predict the probability of falling in the success category, given the information on the explanatory variables.
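The two link functions differ only in how the linear predictor is mapped to a probability. The sketch below uses only the Python standard library; the function names are made up for the example.

```python
import math

def logistic_probability(eta):
    """Inverse logit link: probability of success for linear predictor eta."""
    return 1.0 / (1.0 + math.exp(-eta))

def probit_probability(eta):
    """Inverse probit link: standard Normal CDF evaluated at eta."""
    return 0.5 * (1.0 + math.erf(eta / math.sqrt(2.0)))

# Compare the two mappings over a few values of the linear predictor.
for eta in (-2.0, -1.0, 0.0, 1.0, 2.0):
    print(eta, logistic_probability(eta), probit_probability(eta))
```

Both functions return 0.5 at a linear predictor of zero and approach 0 and 1 in the tails; the logistic curve has somewhat heavier tails than the probit curve.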
8 The Binary Logistic Regression
- The dependent variable is binary, for example recovery from a disease (Yes, No) or qualifying an entry test (Yes, No).
- The Yes category is generally referred to as the success category.
- Used to model the probability of falling in the success category, given the information on the independent variables.
- Can also be used to predict the logit of the success category.
- Commonly used in the medical sciences.
9 The Model for Binary Variable
- The Logistic Regression model, used to predict the probability that the dependent variable falls in the success category given the information on the explanatory variables, is given as
- The Logit model, used to predict the logit of the dependent variable, is given as
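In standard notation, the two forms can be written as below. The symbols are assumed here: $\pi(x)$ is the success probability, $\beta_0$ the intercept, and $\beta_j$ the coefficient of the jth explanatory variable $x_j$, $j = 1, \dots, p$.

```latex
\begin{align*}
  % Logistic Regression model: probability of the success category
  \pi(x) = P(Y = 1 \mid x)
    &= \frac{\exp(\beta_0 + \beta_1 x_1 + \cdots + \beta_p x_p)}
            {1 + \exp(\beta_0 + \beta_1 x_1 + \cdots + \beta_p x_p)} \\[4pt]
  % Logit model: log-odds of the success category
  \operatorname{logit}\,\pi(x) = \ln\frac{\pi(x)}{1 - \pi(x)}
    &= \beta_0 + \beta_1 x_1 + \cdots + \beta_p x_p
\end{align*}
```

The second line follows from the first by solving for the linear predictor, so the two forms describe the same model.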
10 Interpreting the Coefficients
- The coefficients in the Logistic Regression are interpreted in terms of the logit and the odds ratio.
- The coefficient $\beta_0$ is the logit of the dependent variable when all the independent variables have zero value. The quantity $e^{\beta_0}$ is the odds of the success category when all independent variables are zero.
- The coefficient $\beta_j$ is the partial effect of the jth independent variable on the logit of the dependent variable. The quantity $e^{\beta_j}$ is the odds ratio for a one-unit increase in the jth independent variable, with the other variables held fixed.
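The link between a coefficient and the odds ratio can be checked numerically. The coefficient values below are hypothetical and chosen only for illustration.

```python
import math

# Hypothetical coefficients of a one-variable logistic regression.
beta0 = -1.0   # intercept: logit when x = 0
beta1 = 0.5    # slope: change in logit per unit change in x

def logit(x):
    """Log-odds of the success category at x."""
    return beta0 + beta1 * x

def odds(x):
    """Odds of the success category at x."""
    return math.exp(logit(x))

# Odds at x = 0 equal exp(beta0).
print(odds(0.0), math.exp(beta0))

# The ratio of odds one unit apart equals exp(beta1), whatever the starting x.
odds_ratio = odds(3.0) / odds(2.0)
print(odds_ratio, math.exp(beta1))
```

The same ratio is obtained for any pair of x values one unit apart, which is why a single odds ratio summarizes the effect of each variable.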
11 Some Important Measures
- The Model Chi-Square measures the difference between the log-likelihoods of two models: the intercept-only model and the fitted model.
- The Pseudo R2 is used to judge the proportion of variation in the dependent variable explained by the model. Two types of R2 are available
12 Testing Adequacy and Significance of the Model
- Adequacy of the Logistic Regression can be tested by using the Deviance statistic, which measures the difference between the saturated model and the fitted model. An insignificant result indicates that the model is adequate.
- Significance of the model is tested by using the Model Chi-Square. The test checks the null hypothesis that all regression coefficients other than the intercept are zero. A significant result indicates that at least one coefficient is different from zero.
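Both statistics are simple functions of log-likelihoods. The sketch below computes them for ungrouped 0/1 data; the outcomes and the fitted probabilities are made up and stand in for the output of a fitted model.

```python
import math

def bernoulli_log_likelihood(y, p):
    """Log-likelihood of 0/1 outcomes y under success probabilities p."""
    return sum(math.log(pi) if yi == 1 else math.log(1.0 - pi)
               for yi, pi in zip(y, p))

# Made-up outcomes and made-up fitted probabilities, for illustration only.
y = [0, 0, 1, 0, 1, 1]
p_fitted = [0.1, 0.2, 0.6, 0.4, 0.7, 0.9]

# Intercept-only model: every observation gets the overall success proportion.
p_null = [sum(y) / len(y)] * len(y)

log_lik_fitted = bernoulli_log_likelihood(y, p_fitted)
log_lik_null = bernoulli_log_likelihood(y, p_null)

# Model chi-square: twice the gain in log-likelihood over the intercept-only model.
model_chi_square = 2.0 * (log_lik_fitted - log_lik_null)

# Deviance: for ungrouped 0/1 data the saturated model has log-likelihood 0,
# so the deviance reduces to -2 times the fitted log-likelihood.
deviance = -2.0 * log_lik_fitted

print(model_chi_square, deviance)
```

The model chi-square is referred to a chi-square distribution whose degrees of freedom equal the number of coefficients added to the intercept-only model.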
13 Measures for Model Diagnostics
- Individual Deviance
- Leverage Values
- Standardized Residuals
- Cook's Distance
14 Data Format for Logistic Regression
- Two formats of data are available for Logistic Regression.
- The Raw Format: the data are entered as collected, one row per observation.
- Covariate Class Format: the data are entered in grouped form, one row per combination of covariate values.
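Converting from the raw format to the covariate class format amounts to counting successes and trials within each distinct combination of covariate values. The records below are made up for the example, and `to_covariate_classes` is a hypothetical helper name.

```python
from collections import defaultdict

def to_covariate_classes(records):
    """Group raw (covariates, outcome) rows into covariate classes.

    Returns a dict mapping each covariate combination to (successes, trials).
    """
    counts = defaultdict(lambda: [0, 0])
    for covariates, outcome in records:
        counts[covariates][0] += outcome   # successes (outcome coded 1)
        counts[covariates][1] += 1         # trials
    return {cov: tuple(c) for cov, c in counts.items()}

# Raw format: one row per subject (made-up data).
raw = [
    (("male", "young"), 1),
    (("male", "young"), 0),
    (("male", "young"), 1),
    (("female", "young"), 0),
    (("female", "old"), 1),
    (("female", "old"), 1),
]

# Covariate class format: one row per distinct covariate combination.
grouped = to_covariate_classes(raw)
for covariates, (successes, trials) in grouped.items():
    print(covariates, successes, trials)
```

The grouped form is practical only when the covariates take a limited number of distinct values; with continuous covariates nearly every row forms its own class.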
15 The Raw Format Illustrated
16 The Covariate Class Illustrated
17 Example 1
- A study was performed to investigate new automobile purchases. Data on monthly income (in thousand US$), age of the old car, and purchase of a new car (1 = Yes) were collected and are given below
18 Running the Regression
19 Running the Regression
20 The Output
21 The Logistic Logit Model
22 Calculation of Estimated Probability
- The estimated Logistic Regression Model is
- The probability of purchasing a new car for a family with an income of US$ 65000 and an 8-year-old car is
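The calculation has the following shape. The fitted coefficients are not shown here, so the values used below are placeholders and not estimates from the example data; income is taken in thousands, following the data description.

```python
import math

def predicted_probability(b0, b_income, b_age, income, age):
    """Estimated probability of purchase from a fitted logistic model."""
    eta = b0 + b_income * income + b_age * age   # estimated logit
    return 1.0 / (1.0 + math.exp(-eta))          # back-transform to a probability

# Placeholder coefficients (hypothetical, NOT the fitted values of Example 1).
b0, b_income, b_age = -4.0, 0.05, 0.2

# Income of 65 (thousand) and an 8-year-old car.
p = predicted_probability(b0, b_income, b_age, income=65, age=8)
print(p)
```

Replacing the placeholders with the coefficients from the output gives the estimated probability for this family.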
23 The Diagnostics
24 The Diagnostics
25 Models for Categorical Data
- Two types of models are available, depending on the nature of the categorical variable.
- Unordered Categorical Variable
- The Multinomial Logistic Model
- The Discriminant Analysis
- Ordered Categorical Variable
- The Ordinal Logistic Model
26 The Multinomial Logistic Model
- The dependent variable is unordered categorical, for example preference for a TV brand.
- The distribution of the dependent variable is Multinomial.
- A base category is used in the model.
- Used to predict the probability of a specific category, given the information on the independent variables.
27 The Multinomial Logistic Model
- The Multinomial Logistic model is a collection of several models.
- If there are G categories in the dependent variable, then there are G - 1 logistic models, one for each category other than the base category.
- Each model can be used to predict the probability or logit of a given category on the basis of the information on the explanatory variables.
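The G - 1 logits can be turned into G category probabilities as sketched below. The function name and the numerical logits are made up for the example; the base category is placed last by assumption.

```python
import math

def multinomial_probabilities(logits_vs_base):
    """Convert G - 1 logits (each category versus the base) into G probabilities.

    The base category has logit 0 by construction and is returned last.
    """
    exps = [math.exp(eta) for eta in logits_vs_base]
    denom = 1.0 + sum(exps)                  # the 1 is exp(0) for the base category
    return [e / denom for e in exps] + [1.0 / denom]

# Hypothetical logits for a dependent variable with G = 3 categories.
probs = multinomial_probabilities([0.8, -0.3])
print(probs, sum(probs))
```

Because every category shares the same denominator, the probabilities always sum to one, and equal logits of zero give equal probabilities for all categories.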
28 The Multinomial Logistic Model
29 Interpreting the Regression Coefficients
- The model to predict the category logit is
- The coefficient $\beta_{gj}$ is the partial change in the logit of the gth category for a unit change in the jth independent variable.
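In standard notation, the category logit relative to the base category can be written as below. The symbols are assumed here: $\pi_g(x)$ is the probability of the gth category, category $G$ is taken as the base, and $\beta_{gj}$ is the coefficient of the jth independent variable in the model for the gth category.

```latex
\begin{equation*}
  % Logit of category g relative to the base category G
  \ln\frac{\pi_g(x)}{\pi_G(x)}
    = \beta_{g0} + \beta_{g1} x_1 + \cdots + \beta_{gp} x_p,
  \qquad g = 1, \dots, G - 1
\end{equation*}
```

Each of the G - 1 equations has its own set of coefficients, so the effect of a variable can differ from one category to another.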
30 Model Adequacy and Significance
- As in the Binary Logistic model, tests of significance and adequacy can also be conducted in the Multinomial Logistic Model.
- Adequacy of the model can be tested by using the Deviance statistic. An insignificant result indicates that the model is adequate.
- Significance of the model can be tested by using the Chi-Square statistic. A significant result indicates that the model is significant.
31 The R2 Measure
- The R2 for the Multinomial Logistic model is calculated by using the statistic
- Lp is the log-likelihood of the model with independent variables.
- L0 is the log-likelihood of the intercept-only model.
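One widely used statistic built from exactly these two quantities is McFadden's pseudo R2, 1 - Lp / L0. Treating it as the statistic meant here is an assumption, and the log-likelihood values below are made up.

```python
def mcfadden_r2(log_lik_model, log_lik_null):
    """McFadden's pseudo R2 from two log-likelihoods (both are negative numbers)."""
    return 1.0 - log_lik_model / log_lik_null

# Hypothetical log-likelihoods, for illustration only.
Lp = -45.0   # model with independent variables
L0 = -60.0   # intercept-only model

r2 = mcfadden_r2(Lp, L0)
print(r2)
```

The value is 0 when the independent variables add nothing to the intercept-only model and moves toward 1 as Lp approaches 0.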
32 Diagnostics
33 Example 2
- A study was conducted to see the effect of the length of alligators on their primary food choice. The data are given below
34 Running the Regression
35 Running the Regression - Statistics
36 Running the Regression - Save
37 The Output
38 The Output
39 Models to Predict Logit of a Category
40 Models to Predict Category Probability
41 Calculation of Probability
- The probabilities of preferring the various food types for an alligator of length 3.5 m are
42 The Discriminant Analysis
- Used as an alternative to the Multinomial Logistic Model.
- Its basic use is to develop the Linear Discriminant Functions that can be used to predict group membership.
- The role of discriminant analysis is opposite to that of one-way MANOVA. The fixed factor in one-way MANOVA becomes the dependent variable in Discriminant Analysis, and the dependent variables in MANOVA become the independent variables in Discriminant Analysis.
43 The Discriminant Analysis
- Discriminant analysis requires continuous independent variables. If the independent variables are categorical, the technique is not appropriate.
- Multivariate normality of the independent variables is also required to conduct tests of significance.
- Homogeneity of the covariance matrices is also required for efficient use.
44 The Linear Discriminant Function
- The function is used to predict group membership.
- The Standardized Canonical Coefficients are
45 Interpreting the Coefficients of Linear Discriminant Function
- The coefficients are like the coefficients of a regression function.
- The coefficients can be used to examine the role of a particular variable in discrimination.
- The larger the coefficient of a variable (in absolute value), the greater its role in discrimination for that particular group.
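A linear discriminant function for each group can be sketched from the group means and the pooled within-group covariance matrix. This is a minimal illustration, assuming NumPy, equal covariance matrices across groups, and prior probabilities taken from the group proportions; the data and function names are made up.

```python
import numpy as np

def linear_discriminant_functions(X, labels):
    """Return {group: (constant, coefficients)} using the pooled covariance matrix."""
    groups = np.unique(labels)
    n, p = X.shape
    means = {}
    pooled = np.zeros((p, p))
    for g in groups:
        Xg = X[labels == g]
        means[g] = Xg.mean(axis=0)
        centered = Xg - means[g]
        pooled += centered.T @ centered           # within-group sums of squares
    pooled /= (n - len(groups))                   # pooled within-group covariance

    functions = {}
    for g in groups:
        coef = np.linalg.solve(pooled, means[g])  # coefficients of the variables
        prior = np.mean(labels == g)              # prior from group proportion
        const = -0.5 * means[g] @ coef + np.log(prior)
        functions[g] = (const, coef)
    return functions

def classify(x, functions):
    """Assign x to the group whose discriminant function gives the largest score."""
    return max(functions, key=lambda g: functions[g][0] + functions[g][1] @ x)

# Made-up data: two groups measured on two continuous variables.
X = np.array([[1.0, 2.0], [2.0, 1.0], [2.0, 3.0], [3.0, 2.0],
              [6.0, 7.0], [7.0, 6.0], [7.0, 8.0], [8.0, 7.0]])
labels = np.array([0, 0, 0, 0, 1, 1, 1, 1])

functions = linear_discriminant_functions(X, labels)
print(classify(np.array([2.0, 2.0]), functions))
print(classify(np.array([7.0, 7.0]), functions))
```

An observation is assigned to the group with the largest score, and the coefficient vector of each function shows how strongly each variable pulls observations toward that group.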
46 Some Tests of Significance
- Box's M test for testing equality of covariance matrices across the groups. An insignificant result indicates that the pooled within-group covariance matrix can be used to form the discriminant function. A significant result indicates that separate covariance matrices should be used.
- Wilks' Lambda statistic for testing equality of mean vectors across the groups.
47 Example 3
- Data on Sepal Length, Sepal Width, Petal Length and Petal Width were collected for various types of irises. The complete data set has 150 observations. A part of the data is given below
48 Running the Analysis
49 Running the Analysis
50 Running the Analysis - Statistic
51 Running the Analysis - Classification
52 Running the Analysis - Save
53 The Output
54 The Output
55 The Output
56 The Output
57 The Output
58 The Output
59 The Linear Discriminant Functions
60 The Output
61 The Ordinal Logistic Model
- The dependent variable is ordered categorical.
- The distribution of the dependent variable is Multinomial.
- If there are k categories, then a total of k - 1 models are estimated.
- The models are used to predict the cumulative probability of a specific category, given the information on the explanatory variables.
62 The Ordinal Logistic Model
- Two models are widely used and are given
63 The Ordinal Logistic Model
- The models to predict the cumulative probability of a category are
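Cumulative probabilities can be converted into individual category probabilities by differencing. The sketch below assumes a cumulative logit model written as logit P(Y <= j) = threshold_j - eta, with a common linear predictor eta across categories; this parameterization and the numerical values are assumptions made for the example.

```python
import math

def cumulative_probabilities(thresholds, eta):
    """P(Y <= j) for j = 1, ..., k - 1 under a cumulative logit model."""
    return [1.0 / (1.0 + math.exp(-(t - eta))) for t in thresholds]

def ordinal_category_probabilities(thresholds, eta):
    """P(Y = j) for j = 1, ..., k, obtained by differencing cumulative probabilities."""
    cum = cumulative_probabilities(thresholds, eta) + [1.0]   # P(Y <= k) = 1
    return [cum[0]] + [cum[j] - cum[j - 1] for j in range(1, len(cum))]

# Hypothetical thresholds for k = 3 ordered categories (must be increasing).
thresholds = [-1.0, 1.0]

probs = ordinal_category_probabilities(thresholds, eta=0.0)
print(probs, sum(probs))
```

With k categories there are k - 1 thresholds and hence k - 1 cumulative models, and increasing eta shifts probability toward the higher categories.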
64 Tests of Significance in Ordinal Logistic
- Certain tests of significance can be carried out in Ordinal Logistic Regression.
- Significance of the model is tested by using the Chi-Square statistic.
- The assumption of parallel regressions can also be tested by using a Chi-Square statistic.
65 Example 4
- Data on credit card status were collected from 250 credit card holders. Information on Card Status (chist), Age (age), Duration of Card (dura) and Card Amount (camt) was collected. A part of the data is shown here
66 Running the Analysis
67 Running the Analysis - Options
68 Running the Analysis - Output
69 The Output
70 The Output
71 The Regression Models