Review of basic statistics and introduction to Asymptotics - PowerPoint PPT Presentation

Slides: 33
Provided by: jun97
1
Review of basic statistics and introduction to
Asymptotics
  • Random variables
  • Estimation
  • Hypothesis testing and confidence interval

2
Random Variables and their characteristics
3
Random Variable
  • Random Variable (RV)
  • x is a random variable if its value comes from a
    random draw from some population, that is, the
    group of entities (such as people, companies, or
    school districts) that we are studying.
  • An RV can be either discrete or continuous.
  • If an RV can take on only a countable set of
    separate values, it is called a discrete random
    variable.
  • For example, if we toss a coin, the outcome is a
    random draw from the population {0 (head), 1 (tail)}.
    {0, 1} is how we describe a population that has
    only two outcomes, and the RV can take only the
    values 0 or 1.
  • If an RV can take on any value in an interval on
    the real number line, it is called a continuous
    random variable.
  • For example, the final score of a student in
    ECON 120C can be any number between 0 and 100.
  • How do we characterize a random variable? By
    probability!

4
Probability
  • In simple statistics, probability is just the
    proportion (relative frequency) of the time that an
    outcome occurs in the long run, where the
    outcome is the value of the random variable. So
    probability is a relationship: outcome
    value → frequency.
  • Suppose there are 100 balls in an urn, 50 red
    and 50 black, and we draw 1 ball at a time with
    replacement. If we repeat this 10,000 times, the
    proportion of drawn balls that are red will
    be roughly 50%, i.e., P(red) = 0.5 (written
    P(0) = 0.5 if red is coded as 0).
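
The urn experiment can be tried on a computer. What follows is only a minimal Python sketch, not part of the original slides; the function name `red_proportion` and the fixed seed are assumptions made for illustration.

```python
import random

def red_proportion(n_draws, n_red=50, n_black=50, seed=0):
    """Draw n_draws balls with replacement and return the share that are red."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    urn = ["red"] * n_red + ["black"] * n_black
    reds = sum(rng.choice(urn) == "red" for _ in range(n_draws))
    return reds / n_draws

proportion = red_proportion(10_000)
print(f"proportion of red draws: {proportion:.3f}")  # should be close to 0.5
```

With only 10 or 100 draws the proportion typically wanders further from 0.5, which is the "long run" idea in the bullet above.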

5
Probability distribution of discrete RV
  • For a discrete random variable, each value it can
    take has a probability. These probabilities
    sum to 1. So we have a relationship:
  • RV values → probability
  • We can plot this relationship, which is called the
    probability distribution.

6
Probability distribution of continuous RV
Probability density function (pdf)
  • For a continuous variable, we cannot list all the
    possible values that it can take, since it can
    take infinitely many values in an interval.
    Instead, the random variable is characterized by a
    probability density function. The area under the
    function between any two points is the
    probability that the random variable falls
    between these two points. Many common pdfs, such
    as the normal or t distribution, are bell-shaped
    curves.
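
To make "area under the pdf" concrete, here is a small Python sketch using the standard library's `statistics.NormalDist`. The N(70, 10²) score distribution and the cut points 60 and 80 are made-up numbers, not taken from the slides.

```python
from statistics import NormalDist

# Hypothetical example: scores distributed N(70, 10^2); values are illustrative only.
scores = NormalDist(mu=70, sigma=10)
x1, x2 = 60, 80

p_below = scores.cdf(x1)                      # P(x < x1): area to the left of x1
p_between = scores.cdf(x2) - scores.cdf(x1)   # P(x1 < x < x2): area between x1 and x2
p_above = 1 - scores.cdf(x2)                  # P(x > x2): area to the right of x2

print(f"P(x < {x1}) = {p_below:.4f}")
print(f"P({x1} < x < {x2}) = {p_between:.4f}")
print(f"P(x > {x2}) = {p_above:.4f}")
```

The three areas add up to 1, because the total area under any pdf is 1.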

[Figure: pdf f(x) plotted against x, with the areas P(x < x1), P(x1 < x < x2), and P(x > x2) marked]
7
The most important pdf: normal distribution
  • Of particular interest because of the Central Limit
    Theorem
  • This distribution is sometimes called the
    Gaussian distribution in honor of Carl Friedrich
    Gauss, a famous mathematician.
  • Probability density function for a variable that
    follows a normal distribution with mean µ and
    standard deviation σ:
    $f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)$

8
Standard Normal
  • A normal random variable can be standardized by
    subtracting µ and dividing by σ, which makes it
    standard normal, N(0, 1), with pdf
    $\phi(z) = \frac{1}{\sqrt{2\pi}} e^{-z^2/2}$

The standard normal pdf is symmetric around 0, its
peak height is about 0.4, and most of the area is
between -3 and 3.
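
These facts about the standard normal can be checked numerically. The sketch below is an illustration in Python (standard library only); the values µ = 70, σ = 10, and x = 85 are assumptions chosen for the example.

```python
from statistics import NormalDist

mu, sigma = 70.0, 10.0          # illustrative values, not from the slides
x = 85.0
z = (x - mu) / sigma            # standardize: subtract the mean, divide by the sd

std_normal = NormalDist(0, 1)
peak = std_normal.pdf(0)                           # height of the pdf at its center
area = std_normal.cdf(3) - std_normal.cdf(-3)      # area between -3 and 3

# Standardizing does not change probabilities: P(X < x) equals P(Z < z).
p_original = NormalDist(mu, sigma).cdf(x)
p_standard = std_normal.cdf(z)

print(f"z = {z}")
print(f"peak height = {peak:.4f}, area in [-3, 3] = {area:.4f}")
print(f"P(X < x) = {p_original:.4f}, P(Z < z) = {p_standard:.4f}")
```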
9
Histogram: an approximation to pdf
  • If we have sample data for a random variable
    that can take continuous real values, how do we
    know whether it is coming from a normal
    distribution?
  • The most straightforward way is to look at the
    histogram.

10
Histogram of normal distribution
  • Divide the horizontal real axis into intervals of
    equal length and, for each interval, plot the
    proportion of data values that fall in it against
    the midpoint of the interval, making a rectangular
    bar.
  • As the sample grows and the number of intervals
    (bins) increases, the outline of the group of bars
    looks more and more like a pdf.
  • Let's try this in STATA
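
For readers without STATA, the same exercise can be sketched in Python. This is a stand-in written for illustration, not the presenter's code; the helper `histogram`, the seed, the 20,000 draws, and the 16 bins are all assumptions.

```python
import random
from statistics import NormalDist

def histogram(data, low, high, n_bins):
    """Return (midpoint, proportion) pairs for equal-width bins on [low, high)."""
    width = (high - low) / n_bins
    counts = [0] * n_bins
    for value in data:
        if low <= value < high:
            index = min(int((value - low) / width), n_bins - 1)
            counts[index] += 1
    return [(low + (i + 0.5) * width, c / len(data)) for i, c in enumerate(counts)]

rng = random.Random(1)
draws = [rng.gauss(0, 1) for _ in range(20_000)]   # sample from N(0, 1)
low, high, n_bins = -4.0, 4.0, 16
bins = histogram(draws, low, high, n_bins)

width = (high - low) / n_bins
for mid, prop in bins:
    density = prop / width                          # bar height on the pdf scale
    bar = "#" * round(density * 100)
    print(f"{mid:6.2f}  hist={density:5.3f}  pdf={NormalDist().pdf(mid):5.3f}  {bar}")
```

Dividing each proportion by the bin width puts the bars on the same scale as the pdf, so the two printed columns can be compared directly.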

11
Mean and Variance
  • The expected value (mean) of a random variable X,
    denoted by $\mu_X$ or E(X), is the average of X
    weighted by probability.
  • The conditional mean is the expected value of
    one random variable given that another random
    variable takes on a particular value.
  • Var(X) is the expected value of the squared
    deviations from the mean:
    $\mathrm{Var}(X) = E[(X - \mu_X)^2]$
  • The variance of X is a measure of the
    dispersion of the distribution. The square root
    of Var(X) is the standard deviation of X.
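
As a worked example of "average weighted by probability", the sketch below computes the mean and variance of a fair six-sided die. The die is an assumed example, not from the slides; exact fractions are used to avoid rounding.

```python
from fractions import Fraction

# Fair six-sided die: each value has probability 1/6 (illustrative distribution).
dist = {x: Fraction(1, 6) for x in range(1, 7)}

mean = sum(x * p for x, p in dist.items())                    # E(X) = 7/2
variance = sum((x - mean) ** 2 * p for x, p in dist.items())  # Var(X) = 35/12
std_dev = float(variance) ** 0.5                              # square root of Var(X)

print(f"E(X) = {mean}, Var(X) = {variance}, sd(X) = {std_dev:.3f}")
```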

12
Covariance and Correlation
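  • Covariance between X and Y is a measure of the
    association between two random variables, X and Y:
    $\mathrm{Cov}(X,Y) = E[(X - \mu_X)(Y - \mu_Y)]$
  • Correlation, Corr(X,Y), scales covariance by the
    standard deviations of X and Y so that it lies
    between -1 and 1:
    $\mathrm{Corr}(X,Y) = \mathrm{Cov}(X,Y) / (\sigma_X \sigma_Y)$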
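
A short Python sketch of both quantities, computed from data with population-style formulas (dividing by n). The study-hours and score numbers are invented for illustration, and the helper names are not from the slides.

```python
from math import sqrt

def mean(values):
    return sum(values) / len(values)

def covariance(xs, ys):
    """Average product of deviations from the means (divides by n)."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)

def correlation(xs, ys):
    """Covariance scaled by the two standard deviations."""
    return covariance(xs, ys) / sqrt(covariance(xs, xs) * covariance(ys, ys))

hours = [1, 2, 3, 4, 5]            # made-up study hours
score = [52, 60, 61, 70, 77]       # made-up exam scores

print(f"Cov = {covariance(hours, score):.2f}")
print(f"Corr = {correlation(hours, score):.3f}")
```

Changing the units of `score` (say, to a 0 to 1 scale) changes the covariance but leaves the correlation unchanged, which is the point of the scaling.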

13
Properties of mean, variance, covariance and
correlation
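  • $E(a) = a$, $\mathrm{Var}(a) = 0$
  • $E(\mu_X) = \mu_X$, i.e. $E(E(X)) = E(X)$
  • $E(aX + b) = aE(X) + b$
  • $E(X + Y) = E(X) + E(Y)$
  • $E(X - Y) = E(X) - E(Y)$
  • $E(X - \mu_X) = 0$ or $E(X - E(X)) = 0$
  • $E((aX)^2) = a^2 E(X^2)$
  • $\mathrm{Var}(X) = E(X^2) - \mu_X^2$
  • $\mathrm{Var}(aX + b) = a^2 \mathrm{Var}(X)$
  • $\mathrm{Var}(X + Y) = \mathrm{Var}(X) + \mathrm{Var}(Y) + 2\mathrm{Cov}(X,Y)$
  • $\mathrm{Var}(X - Y) = \mathrm{Var}(X) + \mathrm{Var}(Y) - 2\mathrm{Cov}(X,Y)$
  • $\mathrm{Cov}(X,Y) = E(XY) - \mu_X \mu_Y$
  • If X and Y are independent, then
    $\mathrm{Var}(X + Y) = \mathrm{Var}(X) + \mathrm{Var}(Y)$ and $E(XY) = E(X)E(Y)$.
    The converse does not hold in general: zero
    covariance does not imply independence.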
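
Several of these rules can be verified numerically, since they also hold exactly for the empirical distribution of any data set. The Python sketch below is an illustration only; the simulated data, the constants a and b, and the helper names are assumptions.

```python
import random
from math import isclose

def mean(v):
    return sum(v) / len(v)

def var(v):
    """Variance with divisor n, i.e. E[(X - mean)^2] over the data."""
    m = mean(v)
    return sum((x - m) ** 2 for x in v) / len(v)

def cov(u, v):
    mu, mv = mean(u), mean(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v)) / len(u)

rng = random.Random(2)
x = [rng.gauss(1, 2) for _ in range(1000)]
y = [0.5 * xi + rng.gauss(0, 1) for xi in x]     # y is correlated with x
a, b = 3.0, -4.0

checks = {
    "Var(aX+b) = a^2 Var(X)": (var([a * xi + b for xi in x]), a ** 2 * var(x)),
    "Var(X+Y) = Var(X)+Var(Y)+2Cov": (var([p + q for p, q in zip(x, y)]),
                                      var(x) + var(y) + 2 * cov(x, y)),
    "Var(X) = E(X^2) - mean^2": (var(x), mean([xi ** 2 for xi in x]) - mean(x) ** 2),
    "Cov(X,Y) = E(XY) - mean_x mean_y": (cov(x, y),
                                         mean([p * q for p, q in zip(x, y)]) - mean(x) * mean(y)),
}
for name, (lhs, rhs) in checks.items():
    print(f"{name}: {lhs:.6f} vs {rhs:.6f}")
```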

14
Estimation, Estimators and Their Properties
15
Population Parameters
  • A population parameter is a fixed constant that
    describes a characteristic of the population, such
    as the mean or the variance.
  • Think of a computer that can generate an infinite
    number of random draws with mean 0 and
    variance 1, or imagine that God secretly created
    all men in the world equal, with the potential of
    scoring exactly 80 in ECON 120C. How do we find
    out these numbers?

16
Estimate with random sample
  • It is the population that we are interested in.
    However, when the population is large, we have no
    way to observe all the elements of it to
    calculate the parameters. Usually we have only
    samples of the population.
  • For a random variable Y, repeated draws from the
    same population can be labeled as Y1, Y2, . . . ,
    Yn
  • If every combination of n sample points has an
    equal chance of being selected, this is a random
    sample
  • A random sample is a set of independent,
    identically distributed (i.i.d.) random variables,
    so in particular E(Y1) = E(Y2) = ... = E(Yn) and
    Var(Y1) = Var(Y2) = ... = Var(Yn)

17
Estimator and Sample Estimates
  • The population parameters can be
    ESTIMATED (GUESSED) from the sample data.
  • The theory of estimation is the basis for
    calculating how precise the guess is.
  • An estimator is a rule, formula, algorithm or
    recipe applied to sample data to estimate a
    population parameter.
  • An estimate is the actual number the formula
    produces from the sample data.

18
Simplest example: population mean estimate
  • I know the computer is generating random
    variables with mean $\mu$ and variance $\sigma^2$. To
    find out what $\mu$ is, I ask the computer to give
    me n such random numbers, $Y_1, \dots, Y_n$, and use
    the formula $\bar{Y} = \frac{1}{n}\sum_{i=1}^{n} Y_i$
    as an estimator for $\mu$.
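
The distinction between estimator (the rule) and estimate (the number) can be sketched in Python. The "true" mean 5 and standard deviation 2 below are assumptions made so the example has something to recover; in practice they would be unknown.

```python
import random

def sample_mean(values):
    """The estimator: a rule that turns sample data into a guess for the mean."""
    return sum(values) / len(values)

rng = random.Random(3)
mu, sigma, n = 5.0, 2.0, 1000          # hypothetical population values and sample size
sample = [rng.gauss(mu, sigma) for _ in range(n)]

estimate = sample_mean(sample)         # the estimate: the number the rule produces
print(f"estimate of the population mean: {estimate:.3f}")
```

A different random sample would give a slightly different estimate, which is why the estimator itself is treated as a random variable.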

19
Desired properties of estimators
  • The sample average is a natural estimator for the
    population mean. You may have a simpler
    estimator: for example, you can always choose any
    single draw, say $Y_1$, as the estimate of $\mu$, if
    you feel all the numbers from the computer are very
    close to each other.
  • So which estimator is good? We look for three
    properties: (1) unbiasedness, (2) efficiency, (3)
    consistency

20
Unbiasedness
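  • An estimator $\hat{\theta}$ of a population parameter
    $\theta$ is unbiased if $E(\hat{\theta}) = \theta$, like
    shooting at a target and hitting the center on
    average
  • $Y_1$ is unbiased since $E(Y_1) = \mu$, according to
    the definition of random sample
  • $\bar{Y}$ is also unbiased since
    $E(\bar{Y}) = \frac{1}{n}\sum_{i=1}^{n} E(Y_i) = \mu$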

21
Efficiency
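  • An estimator $\hat{\theta}$ of a population parameter
    $\theta$ is efficient if $\mathrm{Var}(\hat{\theta})$ is the
    smallest among all unbiased estimators of $\theta$
  • For example, $\mathrm{Var}(\bar{Y}) = \sigma^2/n$
    while $\mathrm{Var}(Y_1) = \sigma^2$, from the definition of an
    i.i.d. random sample, so $\bar{Y}$ is more efficient
    than $Y_1$ whenever n > 1.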
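
Unbiasedness and efficiency can both be seen in a small simulation that compares two estimators of the mean: a single draw and the sample average. This Python sketch is illustrative only; the N(0, 1) population, n = 25, and 5,000 repetitions are assumptions.

```python
import random
from statistics import fmean, pvariance

rng = random.Random(4)
mu, sigma, n, reps = 0.0, 1.0, 25, 5000    # assumed population and design

first_draws, averages = [], []
for _ in range(reps):
    sample = [rng.gauss(mu, sigma) for _ in range(n)]
    first_draws.append(sample[0])          # estimator 1: a single draw
    averages.append(fmean(sample))         # estimator 2: the sample average

# Unbiasedness: both estimators average out close to mu over many samples.
print(f"mean of single draws:    {fmean(first_draws):.4f}")
print(f"mean of sample averages: {fmean(averages):.4f}")

# Efficiency: the sample average has much smaller variance (about sigma^2 / n).
print(f"variance of single draws:    {pvariance(first_draws):.4f}")
print(f"variance of sample averages: {pvariance(averages):.4f}")
```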

22
Consistency
  • Consistency is an asymptotic (large-sample)
    property.
  • In studying a population, we can obtain random
    samples of different sizes. You can ask a computer
    to generate 10, 50, 100 or 1000 random numbers at
    a time from the normal distribution N(0,1), 10,000
    times over. We can advertise to enroll 10, 50, 100
    or 1000 students who have an inherent ability to
    score 80 to take ECON 120C for 10,000 quarters.
  • We can apply an estimator to samples of different
    sizes. The question is: is it true that the larger
    the sample size, the better the estimate produced
    by the estimator?

23
Consistency of an estimator
  • An estimator $\hat{\theta}_n$ is consistent for a
    population parameter $\theta$ if, for every
    $\varepsilon > 0$,
    $\lim_{n \to \infty} P(|\hat{\theta}_n - \theta| < \varepsilon) = 1$
  • An estimator is also a random variable. If it is
    consistent, its distribution collapses to a single
    bar around the true parameter as n grows, which
    can be seen from the histogram.
  • Is $\bar{Y}$ consistent for $\mu$? Let's do an
    experiment in STATA
  • Suppose we don't know the population
    mean of the random numbers generated by the command
  • gen x=invnorm(uniform())
  • Is $Y_1$ consistent for $\mu$?

P denotes probability, which in the experiment
means repeating the estimation 10,000 times.
n is the sample size, which in the experiment
means increasing the sample size from 10 to 50,
100, and 1000.
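
The consistency experiment can also be sketched in Python as a stand-in for the STATA session. This is not the presenter's code: the tolerance `eps = 0.2`, the seed, and the reduced 2,000 repetitions (instead of 10,000, to keep the run short) are assumptions. The sample average is compared with a single draw.

```python
import random
from statistics import fmean

def share_within(estimates, target, eps):
    """Fraction of estimates closer than eps to the target: estimates P(|est - target| < eps)."""
    return sum(abs(e - target) < eps for e in estimates) / len(estimates)

rng = random.Random(5)
reps, eps = 2000, 0.2          # assumed repetitions and tolerance
results = {}
for n in (10, 50, 100, 1000):
    means, singles = [], []
    for _ in range(reps):
        sample = [rng.gauss(0, 1) for _ in range(n)]   # N(0, 1) draws, true mean 0
        means.append(fmean(sample))                    # sample average
        singles.append(sample[0])                      # single-draw estimator
    results[n] = (share_within(means, 0.0, eps), share_within(singles, 0.0, eps))
    print(f"n={n:5d}  average within eps: {results[n][0]:.3f}  "
          f"single draw within eps: {results[n][1]:.3f}")
```

For the sample average the share within the tolerance should climb toward 1 as n grows, while for the single draw it should stay roughly where it started, since a single draw does not use the extra data.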
24
Unbiasedness and consistency
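  • An unbiased estimator is not necessarily
    consistent. It is consistent if
    $\mathrm{Var}(\hat{\theta}) \to 0$ as $n \to \infty$. For
    example, a single draw $Y_1$ is unbiased for $\mu$, but
    its variance stays at $\sigma^2$ however large n is.
  • A consistent estimator is not necessarily
    unbiased. For example, $\bar{Y} + 1/n$ is biased for
    every n but still consistent.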

25
Central Limit Theorem (CLT)
  • Basically, this theorem says that under general
    conditions, the distribution of the sample average
    $\bar{Y}$ is approximately normal when the sample size n
    is large, even if the $Y_i$ themselves are not
    normally distributed.

26
Central Limit Theorem
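  • Let $Y_1, Y_2, \dots, Y_n$ be a sequence of
    independent, identically distributed random
    variables with mean $\mu$ and variance $\sigma^2$. Then
    the random variable
    $Z_n = \frac{\bar{Y} - \mu}{\sigma / \sqrt{n}}$
  • has an asymptotic standard normal
    distribution, that is, its distribution approaches
    N(0, 1) when n gets large. Let's do 10,000
    experiments with sample size 10, 50, 100, 1000 in
    STATA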
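
A Python stand-in for this experiment is sketched below, using clearly non-normal draws (exponential with mean 1 and variance 1). The choice of the exponential distribution, the seed, and the reduced 2,000 repetitions are assumptions made for illustration.

```python
import random
from math import sqrt
from statistics import fmean, pstdev

rng = random.Random(6)
mu, sigma = 1.0, 1.0           # mean and sd of an Exponential(1) draw
reps = 2000                    # fewer than the 10,000 on the slide, to keep the run short

def standardized_mean(n):
    """Draw n exponential values and return (sample mean - mu) / (sigma / sqrt(n))."""
    sample = [rng.expovariate(1.0) for _ in range(n)]
    return (fmean(sample) - mu) / (sigma / sqrt(n))

coverage = {}
for n in (10, 50, 100, 1000):
    z = [standardized_mean(n) for _ in range(reps)]
    # For a standard normal variable, about 95% of values fall within +/- 1.96.
    coverage[n] = sum(abs(v) < 1.96 for v in z) / reps
    print(f"n={n:5d}  mean={fmean(z):6.3f}  sd={pstdev(z):5.3f}  "
          f"share within 1.96: {coverage[n]:.3f}")
```

If the theorem applies, the standardized averages should have mean near 0, standard deviation near 1, and about 95% of them inside ±1.96 once n is large.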
27
Hypothesis testing
28
Hypothesis
  • As in any science, a good deal of economics is
    concerned with testing hypotheses, which are
    framed as yes/no questions. Examples: Is the
    computer generating random variables with mean 0?
    Are all men created equal to score 80 in ECON
    120C?

29
Procedure of hypothesis testing
  • 1) collect the relevant sample data
  • 2) formulate the null and alternative hypotheses
  • 3) specify the test statistic and the appropriate
    distribution, and choose the rejection region
  • 4) compare the test statistic with the critical
    value of the rejection region
  • 5) reject/fail to reject the null hypothesis
  • 6) state the conclusion

30
Hypothesis testing example: are all men created
equal to score 80 in ECON 120C?
  • Collect the final scores (S) from Spring 2004,
    and find the sample mean $\bar{S} = 90$
    and its standard error $se(\bar{S}) = 1.5$
  • Null hypothesis: $H_0: \mu = 80$
  • The test statistic and its distribution are given
    by statistical theory (CLT); the rejection region
    is set according to the confidence level (99%, 95%
    or 90%)
  • The test statistic is 90, bigger than
    80 + 1.96 × 1.5 = 82.94, so reject the null at the
    95% confidence level
  • Conclusion: men are created smart enough to score
    much higher than 80 in ECON 120C
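
The arithmetic of this test can be written out in Python. The sketch assumes, as the comparison with 80 + 1.96 × 1.5 suggests, that 1.5 is the standard error of the sample mean and that the test is two-sided at the 5% level; the variable names are made up for illustration.

```python
from statistics import NormalDist

sample_mean = 90.0     # sample mean of final scores, from the slide
null_mean = 80.0       # null hypothesis: population mean is 80
std_error = 1.5        # assumed to be the standard error of the sample mean
alpha = 0.05           # 5% significance, i.e. 95% confidence level

critical = NormalDist().inv_cdf(1 - alpha / 2)        # about 1.96 for a two-sided test
z = (sample_mean - null_mean) / std_error             # standardized test statistic
upper_bound = null_mean + critical * std_error        # upper edge of the non-rejection region
reject = abs(z) > critical
p_value = 2 * (1 - NormalDist().cdf(abs(z)))

print(f"critical value = {critical:.3f}, z = {z:.2f}")
print(f"reject H0 if the sample mean exceeds {upper_bound:.2f} (or falls equally far below 80)")
print(f"reject H0: {reject}, p-value = {p_value:.2g}")
```

Comparing z with 1.96 and comparing 90 with 80 + 1.96 × 1.5 are the same test written on two different scales.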

31
Normal distribution
[Figure: normal distribution curve with the 5% rejection region marked]
32
Application to regression
  • OLS estimator is an average
  • OLS estimator is consistent
  • OLS estimator is approximately normal when the
    sample size is large, no matter what the
    distribution of the error is
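
The consistency claim for OLS can be illustrated with a small simulation. This Python sketch is only an illustration under assumed values: a one-regressor model with true intercept 1 and slope 2, uniform (non-normal) errors, and 500 repetitions per sample size.

```python
import random
from statistics import fmean

def ols_slope(xs, ys):
    """OLS slope in a one-regressor model: sum of cross-deviations over sum of squared deviations."""
    x_bar, y_bar = fmean(xs), fmean(ys)
    num = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    den = sum((x - x_bar) ** 2 for x in xs)
    return num / den

rng = random.Random(7)
true_intercept, true_slope = 1.0, 2.0      # hypothetical population values
spread = {}
for n in (10, 100, 1000):
    slopes = []
    for _ in range(500):
        xs = [rng.uniform(0, 10) for _ in range(n)]
        # Errors are uniform on [-3, 3], so they are not normally distributed.
        ys = [true_intercept + true_slope * x + rng.uniform(-3, 3) for x in xs]
        slopes.append(ols_slope(xs, ys))
    worst = max(abs(s - true_slope) for s in slopes)
    spread[n] = (fmean(slopes), worst)
    print(f"n={n:5d}  average slope={spread[n][0]:.4f}  largest error={worst:.4f}")
```

The slope estimates should center on the true slope at every n and cluster more tightly around it as n grows.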