Title: Statistical inference
1 Statistical inference
General objective: To draw conclusions about the population of study
from observations (a sample) obtained from the population.
[Diagram relating Population and Sample through Probability theory and Statistical inference]
2- Methodologies
- Point estimation
- Interval estimation
- Hypothesis testing
3 Point estimation
General objective: To find approximations to the unknown values of one or
more parameters of the population.
Embedded parts: An approximate value is calculated from the obtained
observations. The obtained observations are randomly drawn from the population
⇒ The current approximate value is randomly drawn from the set of all
possible approximate values (values from all possible sets of observations)
⇒ The approximate value is an outcome of a stochastic point estimator.
⇒ This stochastic point estimator has a probability distribution induced by
the population of study and the method of sampling, a so-called sampling
distribution.
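The idea of a sampling distribution can be illustrated by simulation. The sketch below is only an illustration under assumed conditions: the exponential population with mean 2, the sample size 25 and the helper names `draw` and `sample_mean_draws` are hypothetical choices, not taken from the course material. Each replication draws a new sample and computes the sample mean, so the collected values approximate the sampling distribution of that estimator.

```python
import random
import statistics


def draw(rng):
    """Draw one value from a hypothetical population: exponential with mean 2."""
    return rng.expovariate(0.5)


def sample_mean_draws(sampler, n, replications, seed=0):
    """Return one sample mean per replication, each based on n fresh draws."""
    rng = random.Random(seed)
    return [
        statistics.fmean(sampler(rng) for _ in range(n))
        for _ in range(replications)
    ]


estimates = sample_mean_draws(draw, n=25, replications=5000)

# Location and dispersion of the simulated sampling distribution.
center = statistics.fmean(estimates)
spread = statistics.stdev(estimates)
print(f"mean of estimates: {center:.3f}, std of estimates: {spread:.3f}")
```

For this assumed population the theoretical values are a mean of 2 and a standard deviation of 2/sqrt(25) = 0.4, so the simulated values should be close to these.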
4- Properties of the point estimator can be investigated and compared
between different competitors using this sampling distribution.
- Unbiasedness
- Consistency
- Efficiency
- Sufficiency
- Completeness
- Different systematic methods can be derived to find point estimators,
sometimes fulfilling certain optimality criteria.
- Maximum Likelihood Method
- Method of Moments
- Least-Squares Method
5 Interval estimation
General objective: To find a numerical interval or set (ellipsoid,
hyper-ellipsoid) to which the unknown value of a parameter (one- or
multidimensional) belongs with a certain degree of confidence (security).
More common term: Confidence intervals (sets).
Construction of confidence intervals is done by using a so-called statistic
calculated from the observations (the sample). Like a point estimator, a
statistic has a sampling distribution depending on the unknown parameter
value. The sampling distribution is used to find an interval of parameter
values that, with a certain (high) probability, are consistent with the
observed value of the statistic.
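A minimal sketch of this construction, assuming a normal population with known standard deviation (an assumption made here only to keep the example short). The statistic is the sample mean, and its normal sampling distribution gives the interval limits. The function name `z_interval` and all numerical settings are hypothetical. The simulation then checks how often the interval covers the true mean.

```python
import random
import statistics
from statistics import NormalDist


def z_interval(sample, sigma, confidence=0.95):
    """Two-sided confidence interval for the mean, assuming sigma is known."""
    n = len(sample)
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # standard normal quantile
    centre = statistics.fmean(sample)
    half_width = z * sigma / n ** 0.5
    return centre - half_width, centre + half_width


# Hypothetical settings for the coverage check.
rng = random.Random(1)
true_mean, sigma, n = 10.0, 2.0, 30
replications = 2000

covered = 0
for _ in range(replications):
    sample = [rng.gauss(true_mean, sigma) for _ in range(n)]
    low, high = z_interval(sample, sigma)
    covered += low <= true_mean <= high

coverage = covered / replications
print(f"empirical coverage: {coverage:.3f}")
```

The empirical coverage should be close to the nominal 95 %, which is what "a certain degree of confidence" refers to.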
6- Hypothesis testing
- General objective
- To formulate and test statements about
- the values of one or more parameters of the population
- relationships between corresponding parameters of different populations
- other properties of the probability distribution(s) of the values of the
population(s)
- Methodology
- Investigation of the consistency between a certain statement and the
observed values in the sample (sometimes through a computed confidence
interval)
- Embedded methodology
- There are different alternatives for the test of one statement
⇒ Properties of different tests need to be investigated (power,
unbiasedness, consistency, efficiency, invariance, ...)
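As an illustration of investigating a property of a test, the sketch below estimates by simulation how often a two-sided z-test rejects a statement about the mean. The test itself, the normal population with known standard deviation, and the names `z_test_p_value` and `rejection_rate` are assumptions chosen for this example only. The rejection rate when the statement is true estimates the size of the test, and the rate when it is false estimates the power.

```python
import random
import statistics
from statistics import NormalDist


def z_test_p_value(sample, mu0, sigma):
    """Two-sided p-value for H0: mean = mu0, assuming sigma is known."""
    z = (statistics.fmean(sample) - mu0) / (sigma / len(sample) ** 0.5)
    return 2 * (1 - NormalDist().cdf(abs(z)))


def rejection_rate(true_mean, mu0=0.0, sigma=1.0, n=25, alpha=0.05,
                   replications=2000, seed=2):
    """Fraction of simulated samples for which H0 is rejected at level alpha."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(replications):
        sample = [rng.gauss(true_mean, sigma) for _ in range(n)]
        rejections += z_test_p_value(sample, mu0, sigma) < alpha
    return rejections / replications


size = rejection_rate(true_mean=0.0)   # H0 is true
power = rejection_rate(true_mean=0.5)  # H0 is false
print(f"estimated size: {size:.3f}, estimated power: {power:.3f}")
```

Competing tests of the same statement could be compared by repeating the simulation with each of them.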
7 Outline of the course
⇒ Preliminary work plan (course web page)
8- Teaching and examination
- Weekly meetings (1 or 2) consisting of lectures and problem seminars
- Lectures: Summaries of the topics covered in the corresponding week
- Problem seminars: Solutions to selected exercises presented on the
whiteboard by students and/or teacher
- Each week the students will be given a number of exercises to work with.
At the problem seminars students are expected to attempt to present
solutions on the whiteboard. Upon completing the course every student should
have presented at least one solution. In addition, each student should
submit written solutions to a number of exercises selected by the teacher
from week to week.
- The course ends with a written home exam.
9- Some practical points
- The teacher (Anders Nordgaard) works only part-time at the university and
is under normal circumstances present on Thursdays and Fridays (except for
scheduled classes and meetings, which are listed on the door sign)
- The easiest way to contact the teacher is by e-mail:
Anders.Nordgaard_at_liu.se
- E-mail is read on all working days of the week (including
Monday-Wednesday)
- Written solutions to exercises should be submitted either directly at a
meeting or electronically by e-mail
- The timetable for meetings will be decided successively during the course
and published on the course web page
- Lectures will by no means cover all necessary details of the course.
Reading the lecture notes alone will not be sufficient for success at the
final exam. The course book (or a textbook covering the same topics) is
necessary.
10- Some basic concepts from inference theory
- A population is the group of elements under study. In more detail, it is
the set of all possible values of a certain property/feature.
- The more stochastic description of a population is the following:
- Consider the case where one single element is drawn from the population.
Denote its value by X. X is then a random variable with a probability
distribution corresponding to how common each unique value is in the
population.
- A population may be
- finite: Contains a limited number of values, although duplicates are
possible
- countably infinite: Contains an infinite number of values, but the values
are enumerable.
- uncountably infinite: Contains an infinite number of non-enumerable values.
The values vary continuously.
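For a finite population, the stochastic description can be made concrete. In the sketch below the population values are invented for illustration. The probability distribution of a single randomly drawn element X is obtained from how common each unique value is.

```python
from collections import Counter
from fractions import Fraction

# Hypothetical finite population; duplicates are allowed.
population = [1, 1, 2, 3, 3, 3]

# Pr(X = value) equals the relative frequency of the value in the population.
counts = Counter(population)
pmf = {value: Fraction(count, len(population)) for value, count in counts.items()}

for value, prob in sorted(pmf.items()):
    print(f"Pr(X = {value}) = {prob}")
```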
11 A random sample is a set of n values drawn from the population:
x1, ..., xn
General case: Population is infinite (or drawing is with replacement). Each
value in the sample is an outcome of a particular random variable. The
random variables X1, ..., Xn are independent with identical probability
distribution.
Special case: Population is finite (and drawing is without replacement).
Each value in the sample is an outcome of a particular random variable, but
the random variables X1, ..., Xn are non-independent and with different
probability distributions. Detailed theory about this case may be found in
higher-level textbooks on survey sampling and will not be covered by this
course.
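The difference between the two cases can be seen in a very small example. The sketch below uses a made-up population of three values and a hypothetical helper `conditional_prob`. It enumerates all equally likely ordered pairs (X1, X2) and compares the conditional distribution of the second draw given the first, with and without replacement.

```python
from fractions import Fraction
from itertools import permutations, product

population = [1, 2, 3]  # hypothetical finite population


def conditional_prob(pairs, first, second):
    """Pr(X2 = second | X1 = first) when all listed ordered pairs are equally likely."""
    given_first = [pair for pair in pairs if pair[0] == first]
    hits = [pair for pair in given_first if pair[1] == second]
    return Fraction(len(hits), len(given_first))


with_replacement = list(product(population, repeat=2))   # 9 ordered pairs
without_replacement = list(permutations(population, 2))  # 6 ordered pairs

p_with = conditional_prob(with_replacement, 1, 1)
p_without_same = conditional_prob(without_replacement, 1, 1)
p_without_other = conditional_prob(without_replacement, 1, 2)

print(p_with, p_without_same, p_without_other)
```

With replacement, the first draw does not change the distribution of the second. Without replacement, the value already drawn can no longer occur, so the second draw depends on the first.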
12 The population is usually characterized by a parameter, generically
denoted by θ (very often one-dimensional, but multi-dimensional cases will
also be treated). In a particular population, the parameter is fixed.
Typical examples of population parameters are:
The mean μ
The variance σ²
The parameters are at the same time parameters of the corresponding
probability distribution, which may also have further, less physically
interpretable parameters. E.g.
N(μ, σ²): Normal distribution
B(n, p) or Bi(n, p): Binomial distribution
P(λ) or Po(λ): Poisson distribution
U(a, b): Uniform distribution
13 The sample can similarly be characterized by a statistic (one- or
multidimensional). In a particular sample the statistic is fixed, e.g. the
sample mean or the sample variance.
In the theoretical investigation of statistics and their properties we will
always consider a particular statistic as a random variable, as it is a
function of the observations of a sample, which in turn are outcomes of
individual random variables.
14 Generally we may denote a statistic by T.
Considering the statistic as a numerical value based on the values of the
sample, it can be written as a function of the sample values:
T = T(x1, ..., xn)
Considering the statistic as a random variable, we just replace the values in
the argument of T with their random variable counterparts:
T = T(X1, ..., Xn)
The so-induced probability distribution of T is called the sampling
distribution of the statistic.
A specific statistic is the order statistic
T = (X(1), ..., X(n))
where X(1) denotes the lowest value of the sample, X(2) the second lowest,
etc. Note that we need n dimensions to fully represent this statistic.
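A small sketch of statistics as functions of the sample values. The sample below is invented, and the sample mean is used only as one familiar example of a one-dimensional statistic. The order statistic returns all n values in sorted order, which is why it needs n dimensions.

```python
import statistics


def order_statistic(sample):
    """Return (x(1), ..., x(n)): the sample values sorted from lowest to highest."""
    return tuple(sorted(sample))


sample = [4.2, 1.5, 3.3, 1.5, 2.8]  # hypothetical observed values x1, ..., xn

t_mean = statistics.fmean(sample)  # one-dimensional statistic
t_order = order_statistic(sample)  # n-dimensional statistic

print(t_mean)
print(t_order)
```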
15- The likelihood function
- The probability distribution of each of the random variables in a sample
is characterized by
- the probability mass function Pr(X = x) if the random variable is discrete
(i.e. the population is finite or countably infinite)
- the probability density function f(x) if the random variable is continuous
(i.e. the population is uncountably infinite)
- Throughout the course (and the textbook) we will use the term probability
density function (abbreviated p.d.f.) for both functions, and it should be
obvious when this p.d.f. is in fact a probability mass function.
- We will sometimes also need the cumulative probability distribution
function
- F(x) = Pr(X ≤ x)
- which has the same definition no matter whether the random variable is
discrete or continuous. It is abbreviated c.d.f.
16 As the probability distribution will depend on the unknown parameter θ,
we will write f(x; θ) for the p.d.f. and F(x; θ) for the c.d.f.
The likelihood function obtained from a sample x = (x1, ..., xn) is defined
as
$$L(\theta; x) = \prod_{i=1}^{n} f(x_i; \theta)$$
i.e. the product of the p.d.f. evaluated at all sample values. Note that this
is considered to be a function of the parameter θ and not of the sample
values. Those are in a particular sample considered to be known. The
likelihood function is related to how probable the current sample is, and
with discrete random variables it is exactly the probability of the sample.
For analytical purposes it is often more convenient to work with the natural
logarithm of L, i.e.
$$l(\theta; x) = \ln L(\theta; x) = \sum_{i=1}^{n} \ln f(x_i; \theta)$$
As f(x; θ) > 0 for all possible sample values and the log transformation is
one-to-one, the two functions are equivalent from an information point of
view.
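The relation between the likelihood and its logarithm can be checked numerically. The sketch below assumes a Poisson population and an invented sample; the function names and the grid of candidate parameter values are hypothetical. It evaluates both functions over the grid and confirms that they single out the same parameter value, since the logarithm preserves the ordering.

```python
import math


def poisson_pmf(x, lam):
    """Probability mass function of a Poisson distribution with parameter lam."""
    return math.exp(-lam) * lam ** x / math.factorial(x)


def likelihood(lam, sample):
    """Product of the p.d.f. evaluated at all sample values."""
    return math.prod(poisson_pmf(x, lam) for x in sample)


def log_likelihood(lam, sample):
    """Sum of the log p.d.f. evaluated at all sample values."""
    return sum(math.log(poisson_pmf(x, lam)) for x in sample)


sample = [2, 0, 3, 1, 4]               # hypothetical observations
grid = [k / 10 for k in range(1, 61)]  # candidate parameter values 0.1, ..., 6.0

best_by_l = max(grid, key=lambda lam: likelihood(lam, sample))
best_by_log_l = max(grid, key=lambda lam: log_likelihood(lam, sample))

print(best_by_l, best_by_log_l)
```

Both functions are treated as functions of the parameter here; the sample is held fixed throughout.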
17 Point estimation
Assume a random sample x = (x1, ..., xn) from a population (distribution)
with probability density function (p.d.f.) f(x; θ).
We search for a function of the sample values that is a (good) approximation
to the unknown value of θ. Assume θ̂ = θ̂(x1, ..., xn) is such a function.
θ̂ is called the point estimate of θ.
18- The point estimator has a probability distribution induced by the
sampling (and the probability distribution of the sample values)
- Referred to as the sampling distribution.
- Measures of location and dispersion of the point estimator are of
particular interest, e.g. its expected value and its variance.
- By investigating such measures, and more generally the sampling
distribution, certain properties of a point estimator may be obtained.
19 Unbiasedness
The bias of a point estimator measures its mean deviation from θ.
Consistency