Title: Review of basic statistics and introduction to Asymptotics
1. Review of basic statistics and introduction to Asymptotics
- Random variables
- Estimation
- Hypothesis testing and confidence intervals
2. Random Variables and their characteristics
3. Random Variable
- x is a random variable (RV) if its value comes from a random draw from some population, that is, the group of entities (such as people, companies, or school districts) that we are studying.
- A RV can be either discrete or continuous.
- If a RV can take on only some selected values, it is called a discrete random variable. For example, if we toss a coin, the outcome is a random draw from a population with only two outcomes, 0 (head) and 1 (tail), and the RV can take only the values 0 or 1.
- If a RV can take on any value in an interval on the real number line, it is called a continuous random variable. For example, the final score of a student in ECON 120C can be any number between 0 and 100.
- How do we characterize a random variable? By probability!
4. Probability
- In simple statistics, probability is just the proportion, or frequency, of the time that an outcome will occur in the long run, where the outcome is a value of the random variable. So probability is a relationship: outcome value → frequency.
- If there are 100 balls in an urn, 50 red and 50 black, and we draw 1 ball at a time with replacement, then if we repeat this 10,000 times the proportion of draws that are red will be roughly 0.5, i.e., probability(red) or P(red) = 0.5. (See the STATA sketch below.)
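A minimal STATA sketch of this urn experiment (the variable name red and the seed are only for illustration; uniform() follows the slide's own command, and newer versions use runiform()):

    * draw 10,000 balls with replacement from an urn that is half red, half black
    clear
    set seed 12345
    set obs 10000
    gen red = (uniform() < 0.5)   // 1 = red, 0 = black
    tabulate red                  // the share of 1s should be close to 0.5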
5. Probability distribution of a discrete RV
- For a discrete random variable, each value it can take has a probability. These probabilities sum to 1. So we have a relationship: RV values → probabilities.
- We can plot this relationship; the plot is called the probability distribution.
6. Probability distribution of a continuous RV: the probability density function (pdf)
- For a continuous random variable, we cannot list all the possible values it can take, since it can take an infinite number of values in an interval. Instead, the random variable is characterized by a probability density function. The area under the function between any two points is the probability that the random variable falls between those two points. A pdf is usually a bell-shaped curve; examples include the normal and t distributions. (See the figure and the STATA sketch below.)
[Figure: probability density function f(x); the shaded areas show P(x1 < x < x2), P(x < x1), and P(x > x2).]
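In STATA, the cumulative standard normal function normal() gives these areas for a standard normal variable; a small sketch (the cut-off values 1 and 2 are arbitrary):

    display normal(1)              // P(x < 1)
    display normal(2) - normal(1)  // P(1 < x < 2), the area between the two points
    display 1 - normal(2)          // P(x > 2)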
7. The most important pdf: the normal distribution
- Of particular interest because of the Central Limit Theorem.
- This distribution is sometimes called the Gaussian distribution in honor of Carl Friedrich Gauss, a famous mathematician.
- The probability density function for a variable that follows a normal distribution N(μ, σ²) is
  f(x) = (1 / (σ√(2π))) exp( −(x − μ)² / (2σ²) )
8. Standard Normal
- A normal random variable can be standardized by subtracting μ and dividing by σ, making it standard normal, N(0, 1), with pdf
  φ(z) = (1/√(2π)) exp(−z²/2)
- The standard normal pdf is symmetric around 0; its peak height is about 0.4, and most of the area lies between −3 and 3.
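A quick STATA check of the standardization (the mean 3 and standard deviation 2 are arbitrary illustration values):

    clear
    set obs 10000
    gen x = 3 + 2*invnorm(uniform())   // x is N(3, 4): mean 3, standard deviation 2
    gen z = (x - 3)/2                  // standardize: subtract the mean, divide by sigma
    summarize z                        // sample mean near 0, standard deviation near 1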
9. Histogram: an approximation to the pdf
- If we have sample data on a random variable that can take continuous real values, how do we know whether it comes from a normal distribution?
- The most straightforward way is to look at the histogram.
10. Histogram of a normal distribution
- Divide the horizontal real axis into intervals (bins) of equal length and, for each interval, plot the proportion of realized values that fall in it against the midpoint of the interval, making a rectangular bar.
- As the number of bins increases, the outline of the bars looks more and more like a pdf.
- Let's try this in STATA, as in the sketch below.
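A minimal sketch of the STATA exercise (the seed and bin counts are illustrative):

    clear
    set seed 1
    set obs 10000
    gen x = invnorm(uniform())      // 10,000 draws from N(0,1)
    histogram x, bin(10) normal     // overlay the normal pdf for comparison
    histogram x, bin(100) normal    // more bins: the outline gets closer to the pdf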
11. Mean and Variance
- The expected value (mean) of a random variable X, denoted μ_X or E(X), is the average of X weighted by probability.
- The conditional mean is the expected value of one random variable given that another random variable takes on a particular value.
- Var(X) is the expected value of the squared deviations from the mean.
- The variance of X is a measure of the dispersion of the distribution. The square root of Var(X) is the standard deviation of X.
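Written out for the discrete case, in the slide's notation:

    E(X) = μ_X = Σ_i x_i · P(X = x_i)
    Var(X) = E[(X − μ_X)²] = Σ_i (x_i − μ_X)² · P(X = x_i)
    sd(X) = √Var(X)

For example, for the coin toss above (0 and 1, each with probability 0.5): E(X) = 0.5, Var(X) = (0 − 0.5)²(0.5) + (1 − 0.5)²(0.5) = 0.25, and sd(X) = 0.5.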
12. Covariance and Correlation
- The covariance between X and Y, Cov(X, Y), is a measure of the association between the two random variables X and Y.
- The correlation, Corr(X, Y), scales the covariance by the standard deviations of X and Y so that it lies between −1 and 1.
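In symbols:

    Cov(X, Y) = E[(X − μ_X)(Y − μ_Y)]
    Corr(X, Y) = Cov(X, Y) / (sd(X) · sd(Y)),  with −1 ≤ Corr(X, Y) ≤ 1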
13. Properties of mean, variance, covariance and correlation
- E(a) = a, Var(a) = 0 for a constant a
- E(μ_X) = μ_X, i.e., E(E(X)) = E(X)
- E(aX + b) = aE(X) + b
- E(X + Y) = E(X) + E(Y)
- E(X − Y) = E(X) − E(Y)
- E(X − μ_X) = 0, i.e., E(X − E(X)) = 0
- E((aX)²) = a²E(X²)
- Var(X) = E(X²) − μ_X²
- Var(aX + b) = a²Var(X)
- Var(X + Y) = Var(X) + Var(Y) + 2Cov(X, Y)
- Var(X − Y) = Var(X) + Var(Y) − 2Cov(X, Y)
- Cov(X, Y) = E(XY) − μ_X μ_Y
- If X and Y are independent, then Var(X + Y) = Var(X) + Var(Y) and E(XY) = E(X)E(Y)
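A quick numerical check of the Var(X + Y) identity on simulated data (the rule generating y is arbitrary, chosen only so that x and y are correlated); the sample moments should satisfy the identity up to rounding:

    clear
    set obs 10000
    gen x = invnorm(uniform())
    gen y = 0.5*x + invnorm(uniform())   // y is correlated with x
    gen s = x + y
    correlate x y s, covariance          // check Var(s) = Var(x) + Var(y) + 2Cov(x,y)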
14. Estimation, Estimators and Their Properties
15. Population Parameters
- A population parameter is a fixed constant that describes a characteristic of the population, such as the mean or the variance.
- Think of it this way: a computer can generate an infinite number of random variables with mean 0 and variance 1, or God secretly created all men in the world equal, with the potential of scoring exactly 80 in ECON 120C. How do we find out these numbers?
16. Estimating with a random sample
- It is the population that we are interested in. However, when the population is large, we have no way to observe all of its elements to calculate the parameters. Usually we only have samples from the population.
- For a random variable Y, repeated draws from the same population can be labeled Y1, Y2, ..., Yn.
- If every combination of n sample points has an equal chance of being selected, this is a random sample.
- A random sample is a set of independent, identically distributed (i.i.d.) random variables, i.e., E(Y1) = E(Y2) = ... = E(Yn) and Var(Y1) = Var(Y2) = ... = Var(Yn).
17. Estimators and Sample Estimates
- The population parameters can be ESTIMATED (GUESSED) from the sample data.
- The theory of estimation is the basis for calculating the degree of precision with which the guess is right.
- An estimator is a rule, formula, algorithm, or recipe applied to sample data to estimate a population parameter.
- An estimate is the actual number the formula produces from the sample data.
18. Simplest example: estimating the population mean
- I know the computer is generating random variables with mean μ and variance σ². To find out what μ is, I ask the computer to give me n such random numbers and use the formula
  Ȳ = (Y1 + Y2 + ... + Yn) / n
  as an estimator for μ. (See the STATA sketch below.)
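A minimal STATA sketch of this estimator, assuming the computer's numbers are N(0,1) as in the earlier slides:

    clear
    set obs 100                  // ask the computer for n = 100 random numbers
    gen y = invnorm(uniform())
    mean y                       // the sample average: the estimate should be near 0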
19. Desired properties of estimators
- The sample average is a natural estimator of the population mean. You could use a simpler estimator; for example, you could always take any single r.v. value as the estimate of μ if you feel all the numbers from the computer are very close to each other.
- So which estimator is good? It should have three properties: (1) unbiasedness, (2) efficiency, (3) consistency.
20. Unbiasedness
- An estimator θ̂ of a population parameter θ is unbiased if E(θ̂) = θ, like shooting at a target.
- Ȳ is unbiased for μ since E(Ȳ) = (1/n)[E(Y1) + ... + E(Yn)] = μ, by the definition of a random sample.
- A single draw Y1 is also unbiased since E(Y1) = μ.
21. Efficiency
- An unbiased estimator θ̂ of a population parameter θ is efficient if Var(θ̂) is the smallest among all unbiased estimators of θ.
- For example, Var(Ȳ) = σ²/n, while Var(Y1) = σ², from the definition of an i.i.d. random sample, so Ȳ is more efficient than Y1.
22. Consistency
- This is called an asymptotic property in advanced mathematics.
- In studying a population, we can obtain random samples of different sizes. You can ask a computer to generate 10, 50, 100, or 1000 random numbers at a time from a normal distribution N(0,1), repeated 10,000 times. Or we can enroll 10, 50, 100, or 1000 students who have the inherent ability to score 80 to take ECON 120C for 10,000 quarters.
- We can apply an estimator to samples of different sizes. The question is: is it true that the larger the sample size, the better the estimate produced by the estimator?
23. Consistency of an estimator
- An estimator θ̂ is consistent for a population parameter θ if, for any ε > 0, P(|θ̂ − θ| > ε) → 0 as n → ∞.
- An estimator is also a random variable; if it is consistent, its distribution collapses to a single bar around the true parameter as n grows. We can see this from the histogram.
- Is Ȳ consistent for μ? Let's do an experiment in STATA (see the sketch below). Suppose we don't know the population mean of the random numbers generated by the command
  gen x = invnorm(uniform())
- Here P is the probability function, which we approximate by repeating the estimation 10,000 times, and n is the sample size, which we increase from 10 to 50, 100, and 1000.
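One way to run this experiment in STATA is with simulate; a sketch (the program name onedraw and the global macro n are only for illustration):

    capture program drop onedraw
    program define onedraw, rclass
        clear
        set obs $n                       // sample size, set below
        gen x = invnorm(uniform())
        summarize x
        return scalar xbar = r(mean)     // the sample average for this draw
    end

    global n = 10
    simulate xbar = r(xbar), reps(10000): onedraw
    histogram xbar                       // repeat with n = 50, 100, 1000:
                                         // the histogram collapses toward the true mean 0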
24. Unbiasedness and consistency
- An unbiased estimator is not necessarily consistent; it is consistent if, in addition, Var(θ̂) → 0 as n → ∞.
- A consistent estimator is not necessarily unbiased.
25. Central Limit Theorem (CLT)
- Basically, this theorem says that, under general conditions, the distribution of Ȳ is approximately normal when the sample size n is large, even if the Yi themselves are not normally distributed.
26. Central Limit Theorem
- Let Y1, Y2, ..., Yn be a sequence of independent random variables with the same mean μ and variance σ². Then the random variable
  Z = (Ȳ − μ) / (σ/√n)
  has an asymptotic standard normal distribution as n gets large.
- Let's do 10,000 experiments with sample sizes 10, 50, 100, and 1000 in STATA, as in the sketch below.
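The same simulate pattern as before works here; in this sketch the draws are deliberately non-normal (uniform on [0,1], so μ = 0.5 and σ² = 1/12), and the program name clt_draw is illustrative:

    capture program drop clt_draw
    program define clt_draw, rclass
        clear
        set obs $n
        gen x = uniform()                                    // not normally distributed
        summarize x
        return scalar z = (r(mean) - 0.5)/sqrt((1/12)/$n)    // standardized sample mean
    end

    global n = 10
    simulate z = r(z), reps(10000): clt_draw
    histogram z, normal     // already close to N(0,1); try n = 50, 100, 1000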
27. Hypothesis testing
28. Hypothesis
- As in any science, a good deal of economics is concerned with testing hypotheses, which are framed as yes/no questions. Examples: Is the computer generating random variables with mean 0? Are all men created equal, scoring 80 in ECON 120C?
29. Procedure of hypothesis testing
- 1) Collect the relevant sample data
- 2) Formulate the null and alternative hypotheses
- 3) Specify the test statistic and the appropriate distribution, and choose the rejection region
- 4) Compare the test statistic with the critical value of the rejection region
- 5) Reject or fail to reject the null hypothesis
- 6) State the conclusion
30. Hypothesis testing example: are all men created equal to score 80 in ECON 120C?
- Collect the final scores (S) from Spring 2004; the sample mean is 90 with a standard error of 1.5.
- Null hypothesis: E(S) = 80.
- The test statistic and its distribution are given by statistical theory (the CLT); the rejection region is chosen according to the confidence level (99%, 95%, or 90%).
- The sample mean, 90, is bigger than 80 + 1.96 × 1.5 ≈ 82.9 (equivalently, the t statistic (90 − 80)/1.5 ≈ 6.7 exceeds the 1.96 critical value), so reject the null.
- Conclusion: men are created smart enough to score much higher than 80 in ECON 120C. (A STATA sketch of this test follows below.)
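A STATA sketch of this test, assuming the final scores are stored in a variable named score (a hypothetical name):

    ttest score == 80                              // one-sample test of H0: E(S) = 80
    summarize score
    display (r(mean) - 80)/(r(sd)/sqrt(r(N)))      // the t statistic computed by hand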
31. Normal distribution
[Figure: normal density with the 5% rejection regions shaded in the tails.]
32. Application to regression
- The OLS estimator is an average
- The OLS estimator is consistent
- The OLS estimator is approximately normal when the sample size is large, no matter what the distribution of the error is