Categorical Data Analysis Part II April 27 - PowerPoint PPT Presentation

Provided by: ccpUch

Transcript and Presenter's Notes

1
Categorical Data Analysis (Part II) April 27

Dingcai Cao
d-cao_at_uchicago.edu

2
Probability Distributions
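ANOVA or linear regression assumes a normal
distribution of the error: $Y = X\beta + \varepsilon$,
$\varepsilon \sim N(0, \sigma^2)$, where Y is a continuous
outcome variable, X is a matrix of independent variables,
$\beta$ is the vector of coefficients, and $\varepsilon$ is the
error with a normal distribution (mean zero and
variance $\sigma^2$). For a normal distribution with
mean $\mu$ and variance $\sigma^2$, the probability
density function is
$f(y) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(y-\mu)^2}{2\sigma^2}\right)$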
3
Normal Distribution
Probability density function
Cumulative distribution function
Bell Shape
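The density and cumulative distribution function of the normal distribution can be evaluated directly. The following is a minimal sketch using only the Python standard library; the function names `normal_pdf` and `normal_cdf` are illustrative choices, not part of the slides.

```python
import math

def normal_pdf(y, mu=0.0, sigma2=1.0):
    """Density of a normal distribution with mean mu and variance sigma2 at y."""
    return math.exp(-(y - mu) ** 2 / (2 * sigma2)) / math.sqrt(2 * math.pi * sigma2)

def normal_cdf(y, mu=0.0, sigma2=1.0):
    """P(Y <= y) for the same distribution, computed via the error function."""
    return 0.5 * (1 + math.erf((y - mu) / math.sqrt(2 * sigma2)))

# The density peaks at the mean and is symmetric around it (the bell shape).
print(normal_pdf(0.0), normal_pdf(-1.0), normal_pdf(1.0))
# Half of the probability lies below the mean.
print(normal_cdf(0.0))
```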
4
Probability Distributions For Categorical Data
Two key distributions:
Binomial distribution (for logistic regression)
Poisson distribution (for Poisson regression)
5
Binomial Distribution
Example: Assume 5% of the population is
green-eyed. If we pick 500 people randomly, how
likely is it that we get 30 or more green-eyed
people? The number of green-eyed people we pick
is a random variable Y which follows a binomial
distribution with N = 500 and $\pi$ = 0.05 (when
picking the people with replacement). We are
interested in the probability $P(Y \ge 30)$.
Probability Mass Function
Let $\pi$ denote the probability of success for a
given trial, and let Y denote the number of
successes out of the N trials. The probability of
outcome y for Y equals
$P(y) = \frac{N!}{y!\,(N-y)!}\,\pi^{y}(1-\pi)^{N-y}, \quad y = 0, 1, \ldots, N$
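The green-eyed example can be worked numerically by summing the binomial probability mass function from 30 to 500. This is a sketch using the standard library only; `binom_pmf` and `binom_tail` are hypothetical helper names introduced for illustration.

```python
import math

def binom_pmf(y, n, pi):
    """P(Y = y) when Y is binomial with n trials and success probability pi."""
    return math.comb(n, y) * pi ** y * (1 - pi) ** (n - y)

def binom_tail(k, n, pi):
    """P(Y >= k), obtained by summing the mass function from k to n."""
    return sum(binom_pmf(y, n, pi) for y in range(k, n + 1))

N, PI = 500, 0.05  # values taken from the example
p_30_or_more = binom_tail(30, N, PI)
print(f"P(Y >= 30) = {p_30_or_more:.4f}")
```

Summing the mass function directly is fine at this size; for much larger N a statistics library would be the more practical choice.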
6
Binomial Distribution
The binomial distribution also has a bell shape. In
fact, with large N, the binomial distribution is
close to a normal distribution (the normal
approximation is good when Np and N(1-p) are both
greater than 5, where p is the success probability).
The special case of the binomial distribution with
N = 1 is called the Bernoulli distribution.
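To see how close the normal approximation is, the exact binomial tail probability can be compared with the corresponding normal tail. The sketch below reuses N = 500 and success probability 0.05 from the green-eyed example. The continuity correction (evaluating the normal tail at k - 0.5) is an assumption added here and is not mentioned in the slides.

```python
import math

def binom_tail(k, n, p):
    """Exact P(Y >= k) for a binomial with n trials and success probability p."""
    return sum(math.comb(n, y) * p ** y * (1 - p) ** (n - y) for y in range(k, n + 1))

def normal_tail(k, n, p):
    """Normal approximation to P(Y >= k), with a continuity correction (assumption)."""
    mean = n * p
    sd = math.sqrt(n * p * (1 - p))
    z = (k - 0.5 - mean) / sd
    return 0.5 * (1 - math.erf(z / math.sqrt(2)))

n, p = 500, 0.05  # N*p = 25 and N*(1-p) = 475, both well above 5
exact = binom_tail(30, n, p)
approx = normal_tail(30, n, p)
print(f"exact = {exact:.4f}, normal approximation = {approx:.4f}")
```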
7
Binomial Distribution
Example: N = 500 and $\pi$ = 0.05
8
Poisson Distribution

In the binomial distribution, the number of trials N
is fixed. In some situations, N is random.
For instance, the number of car accidents on the
Dan Ryan Expressway (I-90/94) in a week is not fixed
in advance.
We are interested in the number of fatal
accidents on the Dan Ryan Expressway (I-90/94) in a
week.
The probability of each possible number of fatal
accidents can be described by a Poisson distribution.

Probability mass function
$P(y) = \frac{e^{-\lambda}\lambda^{y}}{y!}, \quad y = 0, 1, 2, \ldots$
where $\lambda$ is the mean number of fatal accidents in
a week.
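The Poisson probability mass function, exp(-mean) * mean^y / y!, is easy to tabulate. In the sketch below the weekly mean of 2 fatal accidents is a made-up value for illustration only; the slides give no figure. The name `lam` stands for the mean.

```python
import math

def poisson_pmf(y, lam):
    """P(Y = y) for a Poisson distribution with mean lam."""
    return math.exp(-lam) * lam ** y / math.factorial(y)

lam = 2.0  # hypothetical mean number of fatal accidents per week
for y in range(6):
    print(y, round(poisson_pmf(y, lam), 4))

# Unlike the binomial, y has no upper limit; the terms shrink quickly,
# so summing the first several dozen is already very close to 1.
total = sum(poisson_pmf(y, lam) for y in range(60))
print(total)
```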
9
Poisson Distribution
Probability mass function
Cumulative probability function
10
Likelihood Function
A probability mass function (for discrete events)
or probability density function (for continuous
measures) allows us to calculate probabilities
from the distribution parameters. In
experiments, we often observe the data and
need to estimate the parameters of the
distribution.
Likelihood Function
The probability of the observed data, expressed
as a function of the parameters, is called the
likelihood function ($\ell$).
Example: Suppose that in a binomial distribution N = 10
and the observed number of successes is y = 3. Then
the likelihood function of the data is
$\ell(\pi) = \frac{10!}{3!\,7!}\,\pi^{3}(1-\pi)^{7} = 120\,\pi^{3}(1-\pi)^{7}$
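For this example the likelihood is the binomial probability of 3 successes in 10 trials, viewed as a function of the success probability. A rough sketch of evaluating it on a grid of candidate values follows; the grid spacing of 0.01 is an arbitrary choice made here.

```python
import math

def binom_likelihood(pi, n=10, y=3):
    """Probability of observing y successes in n trials, as a function of pi."""
    return math.comb(n, y) * pi ** y * (1 - pi) ** (n - y)

# Evaluate the likelihood at candidate values 0.01, 0.02, ..., 0.99.
grid = [i / 100 for i in range(1, 100)]
best = max(grid, key=binom_likelihood)
print(best, binom_likelihood(best))
```

On this grid the largest likelihood occurs at the observed proportion of successes, 3/10.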
11
Maximum Likelihood Estimation
The maximum likelihood estimate of the parameter
is defined to be the parameter value for which
the probability of the observed data has the
greatest value.
[Figure: likelihood function, with the likelihood plotted against the parameter]
12
Binomial Distribution Maximum Likelihood
Estimation
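Probability Mass Function
$P(y) = \frac{N!}{y!\,(N-y)!}\,\pi^{y}(1-\pi)^{N-y}$
The probability mass function allows us to calculate
probabilities when $\pi$ is known. In experiments, however,
we run N trials and record y successes, and we need
to estimate $\pi$, the success probability.
Likelihood Function
$\ell(\pi) = \frac{N!}{y!\,(N-y)!}\,\pi^{y}(1-\pi)^{N-y}$
Log Likelihood Function
$L(\pi) = \log \ell(\pi) = \log\frac{N!}{y!\,(N-y)!} + y\log\pi + (N-y)\log(1-\pi)$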
13
Binomial Distribution Maximum Likelihood
Estimation
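Maximum Likelihood (ML)
Setting the derivative of the log likelihood with
respect to $\pi$ to zero, $\frac{y}{\pi} - \frac{N-y}{1-\pi} = 0$,
gives the ML estimate $\hat{\pi} = y/N$.
Logistic regression assumes a binomial
distribution and uses maximum likelihood
estimation.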
14
Poisson Distribution Maximum Likelihood
Estimation
Given a sample of n measured values $y_i$, we wish to
estimate the value of the parameter $\lambda$ of the
Poisson population from which the sample was
drawn.
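One way to carry out this estimation is sketched below, assuming the n values are independent draws from the same Poisson distribution. Here $\lambda$ denotes the Poisson mean, $\ell$ the likelihood, and $L$ the log likelihood; the independence assumption is not stated on the slide.

```latex
\begin{align*}
\ell(\lambda) &= \prod_{i=1}^{n} \frac{e^{-\lambda}\lambda^{y_i}}{y_i!} \\
L(\lambda) = \log \ell(\lambda)
  &= -n\lambda + \Big(\sum_{i=1}^{n} y_i\Big)\log\lambda - \sum_{i=1}^{n}\log(y_i!) \\
\frac{dL}{d\lambda} &= -n + \frac{1}{\lambda}\sum_{i=1}^{n} y_i = 0
  \quad\Longrightarrow\quad
  \hat{\lambda} = \frac{1}{n}\sum_{i=1}^{n} y_i = \bar{y}
\end{align*}
```

Under these assumptions the maximum likelihood estimate is the sample mean.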
15
Normal Distribution Maximum Likelihood Estimation
In-class exercise: Given a sample of n measured
values $y_i$, estimate the values of the parameters $\mu$
and $\sigma^2$ of the normal distribution.
16
Logistic Regression Model
Y: binary outcome variable (i.e., a categorical
variable with only two levels).
Examples: smoking status (yes/no), application status
(admitted/denied).
Assumption: Y has a binomial distribution.
X: explanatory variable.
$\pi(x)$: the probability of success when X takes value
x. $\pi(x)$ is the parameter of the binomial
distribution.
17
Logistic Regression Model
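Model
$\operatorname{logit}(\pi(x)) = \log\frac{\pi(x)}{1-\pi(x)} = \alpha + \beta x$
[Figure: $\pi(x)$ versus x and $\operatorname{logit}(\pi(x))$ versus x, with the median effective level (EL50) marked where $\pi(x)$ = 0.5]
$\beta$ = 0 indicates that Y is independent of x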
18
Logistic Regression Model
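Odds
$\frac{\pi(x)}{1-\pi(x)} = e^{\alpha + \beta x}$, where $\alpha$ and $\beta$ are the intercept and slope of the logit model
Odds ratio (for a one-unit increase in x)
$\frac{\pi(x+1)/(1-\pi(x+1))}{\pi(x)/(1-\pi(x))} = e^{\beta}$
[Figure: odds $\pi(x)/(1-\pi(x))$ versus x, with the value 1 marked]
$\beta$ is the change in the log odds with one unit
of change in x, i.e. the log odds ratio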
19
Logistic Regression Model
$\alpha$ is the log odds value when x = 0. $\beta$ is the
change in the log odds with one unit of change
in x (the log odds ratio). x = $-\alpha/\beta$ is the
median effective level (probability 0.5).
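These three interpretations can be checked numerically. The sketch below assumes the model logit(pi(x)) = alpha + beta * x, with alpha as the intercept and beta as the slope. The values -3.0 and 0.5 are made up for illustration and do not come from the slides.

```python
import math

def prob(x, alpha, beta):
    """pi(x) under the assumed model logit(pi(x)) = alpha + beta * x."""
    return 1.0 / (1.0 + math.exp(-(alpha + beta * x)))

def log_odds(p):
    """Log of the odds p / (1 - p)."""
    return math.log(p / (1 - p))

alpha, beta = -3.0, 0.5  # hypothetical coefficients, for illustration only

# The log odds at x = 0 recovers alpha.
at_zero = log_odds(prob(0, alpha, beta))
# Raising x by one unit changes the log odds by beta.
step = log_odds(prob(5, alpha, beta)) - log_odds(prob(4, alpha, beta))
# At x = -alpha / beta the success probability is 0.5.
el50 = -alpha / beta
print(at_zero, step, el50, prob(el50, alpha, beta))
```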
