Title: Engineering Statistics
The University of Reading
School of Systems Engineering
Introduction
- the course should really be called probability and statistics; both are about DATA
- data: facts given, from which others may be inferred
- statistics is about analysing measured data
- the issues are how to collect the data, and how much to collect, so that valid conclusions can be drawn
- probability is about predicting what future values are likely to be, given a model for the possible values which can occur
- the model may come from previously measured values or from theoretical considerations
Relationship between probability and statistics
How can we measure data?
- measures fall into 2 categories:
- average value
- dispersion (or spread)
- both give us clues to the model for the data, but both can hide key features of a data set:
- there may not be enough samples to be truly representative
- they can be distorted by rogue or outlying values
- measures used for the average value are the mean, median and mode
- measures used for the dispersion are the range, variance and standard deviation
Definitions of average
- Mean
- the arithmetic average of all the data values
- Median
- the middle value of the data set, when the values are placed in order
- if there is an even number of values, then the median is half-way between the two middle values
- Mode
- the most commonly occurring value
- some data sets may have more than one mode
A sample set of data
- the speed (in mph) of cars passing a certain point within a 30 mph zone:
- 31, 29, 28, 24, 29, 35, 37, 27, 29, 30, 32, 30, 29, 20, 45, 27, 35, 34, 33, 29
- what is the mean?
- what is the median?
- what is the mode?
- do these figures change if the outliers are removed?
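The questions above can be answered with a minimal Python sketch using only the standard library `statistics` module. The slide does not say which readings count as outliers, so the sketch assumes they are the two extreme values, 20 and 45 mph.

```python
import statistics

# Car speeds (mph) recorded in a 30 mph zone, as listed above
speeds = [31, 29, 28, 24, 29, 35, 37, 27, 29, 30,
          32, 30, 29, 20, 45, 27, 35, 34, 33, 29]


def averages(data):
    """Return (mean, median, mode) of a list of numbers."""
    return (statistics.mean(data),
            statistics.median(data),
            statistics.mode(data))


print(averages(speeds))       # all 20 readings

# Assumption: the two extreme readings are the outliers
trimmed = [v for v in speeds if v not in (20, 45)]
print(averages(trimmed))      # 18 readings
```

With all 20 readings this gives a mean of 30.65 mph, a median of 29.5 mph and a mode of 29 mph. Dropping 20 and 45 lowers the mean to about 30.44 mph but leaves the median and mode unchanged, so for this data set the mean is the measure most affected by the outlying values.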
Data set
- it helps if the data is presented in another form
- this is a histogram, and it gives a much clearer picture of what is going on
- the graphing tool in a spreadsheet can give lots of different ways of looking at the same data
Questions to ask about this data set
- if we took another sample set, of the same number of cars passing the same point, would the same data be found?
- would the average values and spread be the same?
- are there enough samples to give a true picture of the data?
- could we use this data set to predict the speed of the next car?
- can we go from statistics to probability and back again?
Measures of spread
- Range
- the difference between the smallest and largest data values: $x_n - x_1$ (with the values sorted into ascending order)
- Inter-quartile range
- the quartiles are defined by dividing the ordered data points into 4 equal parts
- the inter-quartile range is the difference between the 1st and 3rd quartile values: $Q_3 - Q_1$
- Variance
- first we need to calculate the residuals: $r_i = x_i - \bar{x}$
- then the variance is given by $s^2 = \frac{1}{n-1}\sum_{i=1}^{n} r_i^2$
Measures of spread (continued)
- because the variance is measured in (units)$^2$, it is more common to use the standard deviation as a measure of spread
- the standard deviation is denoted by $s$ or $\sigma$, depending on which definition you use
Measures of spread (cont)
- $s$ is preferred as a definition of standard deviation, although it is not intuitively the most sensible
- the reasons are beyond the scope of an introductory course
- $\sigma$ is used when the standard deviation of the whole population is being calculated
- there are therefore two definitions of variance:
- divide by $n$ (giving $\sigma^2$) or by $n-1$ (giving $s^2$)
- if the number of samples is big enough, the calculated values of $s$ and $\sigma$ will be very similar
- a good reason for taking a large number of samples
- what are the inter-quartile range, variance and standard deviation of the data given before?
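A sketch answering the question above with the standard library. Quartile conventions vary between textbooks and spreadsheets; the sketch assumes the "inclusive" interpolation method of `statistics.quantiles` (Python 3.8+), so other tools may give slightly different quartiles.

```python
import statistics

speeds = [31, 29, 28, 24, 29, 35, 37, 27, 29, 30,
          32, 30, 29, 20, 45, 27, 35, 34, 33, 29]

data_range = max(speeds) - min(speeds)     # x_n - x_1 once the data are sorted

# Assumption: "inclusive" quartiles; other conventions exist
q1, q2, q3 = statistics.quantiles(speeds, n=4, method="inclusive")
iqr = q3 - q1

s2 = statistics.variance(speeds)           # divide by n - 1  -> s^2
sigma2 = statistics.pvariance(speeds)      # divide by n      -> sigma^2
s = statistics.stdev(speeds)
sigma = statistics.pstdev(speeds)

print(f"range = {data_range}, Q1 = {q1}, Q3 = {q3}, IQR = {iqr}")
print(f"s^2 = {s2:.3f}, s = {s:.3f}; sigma^2 = {sigma2:.3f}, sigma = {sigma:.3f}")
```

For this data set it gives a range of 25 mph, quartiles of 28.75 and 33.25 mph (an inter-quartile range of 4.5 mph), $s \approx 5.17$ mph and $\sigma \approx 5.04$ mph; with 20 samples the two definitions already differ by less than 3%.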
Box plots
- a good way of providing visual information about a data set is to use a box plot
- it can help you identify the centre, spread, departure from symmetry and any significant outliers
- the box encloses the inter-quartile range, with a line extending from each end of the box to the most extreme data point lying within 1.5 inter-quartile ranges of the first (or third) quartile
- data outside this range are plotted as individual points
[Figure: box plot labelled with the 1st quartile, 2nd quartile, 3rd quartile and outliers]
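A sketch of the box-plot rule described above, applied to the car-speed data. It assumes the same "inclusive" quartile convention of `statistics.quantiles`; `box_plot_summary` is a hypothetical helper name, and `k = 1.5` is the whisker reach in inter-quartile ranges.

```python
import statistics


def box_plot_summary(data, k=1.5):
    """Quartiles, whisker ends and outliers using the k * IQR rule."""
    q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence, high_fence = q1 - k * iqr, q3 + k * iqr
    inside = [x for x in data if low_fence <= x <= high_fence]
    return {
        "quartiles": (q1, q2, q3),
        # whiskers stop at the most extreme points still inside the fences
        "whiskers": (min(inside), max(inside)),
        # everything beyond the fences is drawn as an individual point
        "outliers": sorted(x for x in data if x < low_fence or x > high_fence),
    }


speeds = [31, 29, 28, 24, 29, 35, 37, 27, 29, 30,
          32, 30, 29, 20, 45, 27, 35, 34, 33, 29]
print(box_plot_summary(speeds))
```

For the car-speed data the fences fall at 22 and 40 mph, so the whiskers run from 24 to 37 mph and the readings of 20 and 45 mph are plotted as individual outliers.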
Models for data
- data values which we measure can fall into a variety of categories
- they can be discrete or continuous
- they can be random or dependent
- which categories do the following fit into?
- tossing a coin
- drawing cards from a pack
- height of men living in England
- width of leaves on an oak tree
- votes at a general election
- can you give an example of a data set which could be classified as continuous and dependent?
Samples and Populations
- in many situations it is impossible (or undesirable) to measure the full set of data, e.g.
- opinion polls before a general election
- width of leaves on a tree
- a sample must be taken from the whole population
- much of the science of statistics and probability is concerned with determining what an appropriate sample is:
- how many samples are needed, and how accurate do they need to be?
- what confidence do we have that our sample represents the true situation?
- can we predict future values given the sampled values?
Probability
- a probability is always a number lying between 0 and 1, and is denoted by $p(x)$
- $p(x) = 1$ means that an event is certain to happen
- $p(x) = 0$ means that an event is certain NOT to happen
- so a toss of a coin could be represented as
- $P(\text{heads}) = 0.5$ and $P(\text{tails}) = 0.5$
- or, more formally
- $p(x) = 0.5,\ x \in \{\text{heads}, \text{tails}\}$
- NOTE the use of $P$ for an individual event and $p$ for a function
Probability (cont)
- if we want to be pedantic, then there is a small possibility of the coin falling on its edge, so
- $p(x) = 0.4999,\ x \in \{\text{heads}, \text{tails}\}$
- $p(x) = 0.0002,\ x = \text{edge}$
- the sum of all the $p(x)$ values MUST be 1, as something must happen when a coin is tossed: it must land as heads, tails or on its edge
- how would we write the probabilities associated with rolling a die?
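Assuming a fair die, the answer is $p(x) = 1/6,\ x \in \{1, 2, 3, 4, 5, 6\}$. A short sketch using exact fractions confirms that these probabilities sum to exactly 1:

```python
from fractions import Fraction

# Assumption: a fair die, so every face is equally likely
die_pmf = {face: Fraction(1, 6) for face in range(1, 7)}

total = sum(die_pmf.values())
print(total)   # exact arithmetic, so this prints 1
```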
Rules of probability
- there are some rules associated with probabilities when the events are not dependent on each other
- if A and B are mutually exclusive (cannot both happen): $P(A \text{ or } B) = P(A \cup B) = P(A) + P(B)$
- if A and B are independent: $P(A \text{ and } B) = P(A \cap B) = P(A)P(B)$
- for example, rolling a die multiple times, or rolling two dice
- if the events are not mutually exclusive, then $P(A \cup B) = P(A) + P(B) - P(A \cap B)$
- how can this be extended to more than two possible events?
2-Dice
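The rules of probability above can be checked by brute force on two dice. This sketch assumes fair dice and enumerates all 36 equally likely outcomes; the events (a six on each die, totals of 7 and 2) are illustrative choices.

```python
from fractions import Fraction
from itertools import product

# Assumption: two fair dice, so all 36 ordered outcomes are equally likely
outcomes = list(product(range(1, 7), repeat=2))


def prob(event):
    """Probability of an event, given as a test on an outcome (die1, die2)."""
    return Fraction(sum(1 for o in outcomes if event(o)), len(outcomes))


def first_six(o):
    return o[0] == 6            # A: first die shows a 6


def second_six(o):
    return o[1] == 6            # B: second die shows a 6


def total_is(k):
    return lambda o: sum(o) == k


both_six = prob(lambda o: first_six(o) and second_six(o))

# Independent events: P(A and B) = P(A) P(B)
print(both_six, prob(first_six) * prob(second_six))          # 1/36 1/36

# Mutually exclusive events (totals of 7 and 2): P(C or D) = P(C) + P(D)
print(prob(lambda o: sum(o) in (7, 2)),
      prob(total_is(7)) + prob(total_is(2)))                  # 7/36 7/36

# Not mutually exclusive: P(A or B) = P(A) + P(B) - P(A and B)
print(prob(lambda o: first_six(o) or second_six(o)),
      prob(first_six) + prob(second_six) - both_six)          # 11/36 11/36
```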
Conditional probability
- if the events are dependent then these rules do not apply; the rules of conditional probability must be invoked
- $P(A \mid B) = P(A \cap B) / P(B)$
- for example, drawing a card from a pack
- generally, if events (or instances) are conditional on past events or other instances, then probability theory becomes much more complex
- we must revise the rules of permutations and combinations
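As a sketch of the card example (the slide names no specific events, so these are assumed): draw two cards without replacement and find the probability that the second is an ace given that the first was. Enumerating all ordered pairs and applying $P(A \mid B) = P(A \cap B)/P(B)$ gives 3/51 = 1/17, whereas the unconditional probability that the second card is an ace is 1/13, so the two events are dependent.

```python
from fractions import Fraction
from itertools import permutations

# A standard 52-card pack: 13 ranks x 4 suits (rank 1 = ace)
deck = [(rank, suit) for rank in range(1, 14) for suit in "CDHS"]

# Every ordered way of drawing two cards without replacement
draws = list(permutations(deck, 2))


def prob(event):
    """Probability of an event over all equally likely two-card draws."""
    return Fraction(sum(1 for d in draws if event(d)), len(draws))


def first_ace(d):
    return d[0][0] == 1          # B: first card is an ace


def second_ace(d):
    return d[1][0] == 1          # A: second card is an ace


p_a_and_b = prob(lambda d: first_ace(d) and second_ace(d))
p_b = prob(first_ace)

print(p_a_and_b / p_b)           # P(second ace | first ace) = 1/17
print(prob(second_ace))          # P(second ace) on its own = 1/13
```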
Permutations and Combinations
- multiplication rule
- if you are drawing one element from each of $k$ sets, with the sizes of the sets $n_1, n_2, n_3, \ldots, n_k$, then the number of different possible outcomes is $n_1 n_2 n_3 \cdots n_k$
- permutations rule
- if you are drawing $k$ elements from a set of $n$ elements and arranging the $k$ elements in a distinct order, the number of different possible results is $\frac{n!}{(n-k)!}$
Permutations and combinations (cont)
- partitions rule
- if you are partitioning the elements of a set of $n$ elements into $k$ groups, consisting of $n_1, n_2, n_3, \ldots, n_k$ elements respectively (with $n_1 + n_2 + \cdots + n_k = n$), the number of different results is $\frac{n!}{n_1!\,n_2!\cdots n_k!}$
- combinations rule
- if you are drawing $k$ elements from a set of $n$ elements, without regard to the order of the $k$ elements, the number of different possible results is $\binom{n}{k} = \frac{n!}{k!\,(n-k)!}$
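The four counting rules map directly onto Python's `math` module (`math.prod`, `math.perm` and `math.comb` need Python 3.8+). The set sizes below are illustrative, and `partitions` is a hypothetical helper, since the standard library has no multinomial function.

```python
import math

# Multiplication rule: one element from each of three sets of sizes 2, 3 and 4
print(math.prod([2, 3, 4]))          # 24

# Permutations rule: n!/(n-k)!, ordered selections of k = 3 from n = 10
print(math.perm(10, 3))              # 720


def partitions(*sizes):
    """Partitions rule: n!/(n1! n2! ... nk!) with n = sum of the group sizes."""
    result = math.factorial(sum(sizes))
    for size in sizes:
        result //= math.factorial(size)   # each division is exact
    return result


print(partitions(2, 3, 5))           # 10!/(2! 3! 5!) = 2520

# Combinations rule: n!/(k!(n-k)!), e.g. 5-card hands from a 52-card pack
print(math.comb(52, 5))              # 2598960
```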
Probability mass function (p.m.f.)
- if a random variable is discrete, its possible states and their associated probabilities can be modelled by a p.m.f.
- this is a mathematical expression (often shown in diagrammatic form) which covers all possibilities
- for example, a test of light-bulbs after 800 hours of use could show
- $p(x) = 0.8,\ x = \text{working}$; $p(x) = 0.2,\ x = \text{failed}$
[Figure: bar chart of p(x) for the two states]
Mean of a discrete random variable
- for a discrete random variable with p.m.f. $P(X = x)$, the mean of $X$, also called the expected value of $X$ or $E(X)$, is given by $E(X) = \mu = \sum_x x\,P(X = x)$
- for example, rolling a die or tossing a coin multiple times
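A sketch of $E(X) = \sum_x x\,P(X = x)$ for the examples on the slide, assuming a fair die and coding a coin toss as 1 for heads and 0 for tails. A single roll has expected value 7/2 = 3.5, so the average of many rolls settles close to 3.5, even though 3.5 can never be rolled.

```python
from fractions import Fraction


def expected_value(pmf):
    """E(X) = sum of x * P(X = x) over all values x in the p.m.f."""
    return sum(x * p for x, p in pmf.items())


die = {x: Fraction(1, 6) for x in range(1, 7)}       # assumed fair die
coin = {0: Fraction(1, 2), 1: Fraction(1, 2)}        # tails = 0, heads = 1

print(expected_value(die))    # 7/2
print(expected_value(coin))   # 1/2
```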
The Bernoulli probability model
- if a random variable can only take one of two values (which could be denoted by 0 and 1), then it is said to be a Bernoulli random variable, and any observation of the variable is said to be a Bernoulli trial
- examples might be tossing a coin, whether a component has failed, whether a road is open, etc.
- clearly, if $P(1) = p$ then $P(0) = 1 - p$
- this is usually written in the form $p(x) = p^x (1-p)^{1-x},\ x \in \{0, 1\}$
The binomial probability model
- this is used when there is a set of Bernoulli trials which are independent of each other
- for example, drug trials
- if a set of $n$ independent Bernoulli trials each has an identical probability of success, $p$, then the random variable $Y$, defined as the total number of successes over all the trials, is said to follow a binomial distribution with parameters $n$ and $p$
- this is written as $Y \sim B(n, p)$
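The binomial p.m.f. is $P(Y = y) = \binom{n}{y} p^y (1-p)^{n-y}$ for $y = 0, 1, \ldots, n$. The sketch below evaluates it; the drug-trial figures (10 patients, each with success probability 0.7) are hypothetical.

```python
import math


def binomial_pmf(y, n, p):
    """P(Y = y) for Y ~ B(n, p): C(n, y) p^y (1 - p)^(n - y)."""
    return math.comb(n, y) * p**y * (1 - p)**(n - y)


# Hypothetical drug trial: 10 patients, success probability 0.7 each
n, p = 10, 0.7
pmf = [binomial_pmf(y, n, p) for y in range(n + 1)]

print(round(pmf[7], 4))      # P(exactly 7 successes) = 0.2668
print(round(sum(pmf), 10))   # the probabilities sum to 1
```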
Cumulative distribution function of the binomial model
- it is often necessary to determine the probability of a random variable being less than or greater than a given value
- for example, predicting the number of faulty components in an individual batch
- this can be determined using a cumulative distribution function (c.d.f.)
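The binomial c.d.f. is the running sum of the p.m.f., $F(y) = P(Y \le y) = \sum_{i=0}^{y} \binom{n}{i} p^i (1-p)^{n-i}$, and "greater than" probabilities follow from $P(Y > y) = 1 - F(y)$. The batch size and fault rate in this sketch are hypothetical.

```python
import math


def binomial_cdf(y, n, p):
    """F(y) = P(Y <= y) for Y ~ B(n, p), by summing the p.m.f."""
    return sum(math.comb(n, i) * p**i * (1 - p)**(n - i) for i in range(y + 1))


# Hypothetical batch: 20 components, each faulty with probability 0.05
n, p = 20, 0.05
print(binomial_cdf(1, n, p))        # P(at most 1 faulty)
print(1 - binomial_cdf(2, n, p))    # P(more than 2 faulty)
```

With these figures, P(at most 1 faulty) is about 0.736 and P(more than 2 faulty) is about 0.075.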
Mean of a binomial distribution
- using the definition given before for the mean (or expected value) of a discrete random variable, the mean of a binomial distribution is $E(Y) = \sum_{y=0}^{n} y \binom{n}{y} p^y (1-p)^{n-y}$
- this looks complex, but can be shown to reduce to $E(Y) = np$
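The reduction to $np$ takes two steps: the identity $y\binom{n}{y} = n\binom{n-1}{y-1}$ (the $y = 0$ term is zero and drops out), then the binomial theorem, which says the remaining sum of $B(n-1, p)$ probabilities equals 1:

```latex
\begin{aligned}
E(Y) &= \sum_{y=0}^{n} y\binom{n}{y}p^{y}(1-p)^{n-y}
      = \sum_{y=1}^{n} n\binom{n-1}{y-1}p^{y}(1-p)^{n-y} \\
     &= np\sum_{j=0}^{n-1}\binom{n-1}{j}p^{j}(1-p)^{(n-1)-j}
        \qquad (j = y - 1) \\
     &= np\,\bigl(p + (1-p)\bigr)^{n-1} = np
\end{aligned}
```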
Probability density functions
- what happens when the variable is continuous as well as random?
- a probability model must be constructed by taking a sufficient number of samples to model the whole population
- this leads to the construction of a probability density function (p.d.f.)
- there are many possibilities for the p.d.f.
- a common model for random, continuous variables is called the normal p.d.f.
- this has the form of a bell-shaped curve and is scaled so that the total area underneath it is 1, i.e. $\int_{-\infty}^{\infty} p(x)\,dx = 1$
The normal p.d.f.
- is defined by $p(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)$
- where the parameter $\mu$ is the population mean and the parameter $\sigma$ is the population standard deviation
- note the use of $\sigma$ for the standard deviation: the definition implies that we know the characteristics of the whole population
The normal p.d.f. (plot)
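Probabilities associated with a normal p.d.f.
[Figure: p(x) plotted against x, with the points x1 and x2 marked on the x-axis]
- the probability that $X$ lies between $x_1$ and $x_2$ is the area under $p(x)$ between them: $P(x_1 \le X \le x_2) = \int_{x_1}^{x_2} p(x)\,dx$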
The standard normal distribution
- as the integrals associated with the normal distribution cannot be evaluated in closed form, it is usual to tabulate values associated with a standard normal distribution ($\mu = 0$, $\sigma = 1$)
- and to use the transform $z = \frac{x - \mu}{\sigma}$
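`statistics.NormalDist` (Python 3.8+) provides the standard normal c.d.f., so it can stand in for the printed tables. The sketch assumes a hypothetical $X \sim N(30, 5^2)$ and finds $P(25 \le X \le 40)$ by transforming both limits to $z$ values.

```python
from statistics import NormalDist


def standardise(x, mu, sigma):
    """The transform z = (x - mu) / sigma."""
    return (x - mu) / sigma


# Hypothetical example: X ~ N(mu = 30, sigma = 5); find P(25 <= X <= 40)
mu, sigma = 30, 5
z1, z2 = standardise(25, mu, sigma), standardise(40, mu, sigma)

std_normal = NormalDist(0, 1)   # the tabulated standard normal distribution
p = std_normal.cdf(z2) - std_normal.cdf(z1)
print(z1, z2, round(p, 4))
```

Here $z_1 = -1$ and $z_2 = 2$, giving a probability of about 0.8186.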
The mean of a continuous random variable
- for a continuous random variable $X$ with p.d.f. $p(x)$ over a specified range, the mean or expected value of $X$ is given by $E(X) = \int x\,p(x)\,dx$, taken over that range
- for the normal p.d.f., this can be shown to reduce to $\mu$, which is what we defined it to be in the first place!
The variance of a random variable
- if the variable $X$ is random and discrete, then the variance, $\mathrm{Var}(X) = \sigma^2$, is given by $\mathrm{Var}(X) = \sum_x (x - \mu)^2\, P(X = x)$
- for a binomial distribution, this reduces to $\mathrm{Var}(X) = npq$, where $q = 1 - p$
- if the variable $X$ is random and continuous, then the variance is given by $\mathrm{Var}(X) = \int (x - \mu)^2\, p(x)\,dx$
- in both cases, the standard deviation is $\sigma$, the square root of the variance
The Poisson distribution
- there are many situations where the individual probabilities of events occurring are unknown (assuming each event is a discrete, random variable)
- there are many situations where an event is rare: there are a large number of samples ($n$) and the probability of occurrence of the event ($p$) is low, e.g.
- failures of electronic components
- defects in manufacturing
- arrival of calls for a particular number at a telephone exchange
- number of accidents in a factory
- these situations can be dealt with by using the Poisson distribution
The Poisson distribution (cont)
- the Poisson distribution is given by $P(X = x) = \frac{\mu^x e^{-\mu}}{x!},\ x = 0, 1, 2, \ldots$
- where $\mu$ is the mean or expected value of $x$
- as well as its other uses, the Poisson distribution can provide an approximation to the true binomial distribution
- remember that, for the binomial distribution, $E(X) = np$
- hence, when $n$ is large and $p$ is small, $B(n, p) \approx \text{Poisson}(np)$ and $B(n, \mu/n) \approx \text{Poisson}(\mu)$
- a feature of the Poisson distribution is that its variance is equal to its mean, i.e. $V(X) = E(X) = \mu$
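A sketch comparing the two models for a hypothetical rare-event case (1000 components, failure probability 0.002, so $\mu = np = 2$), and checking numerically that the Poisson mean and variance are both $\mu$. The sums are truncated at 100 terms, an approximation whose omitted tail is negligible for $\mu = 2$.

```python
import math


def poisson_pmf(x, mu):
    """P(X = x) = mu^x e^(-mu) / x!  for x = 0, 1, 2, ..."""
    return mu**x * math.exp(-mu) / math.factorial(x)


def binomial_pmf(y, n, p):
    return math.comb(n, y) * p**y * (1 - p)**(n - y)


# Hypothetical rare-event case: many trials, small probability of the event
n, p = 1000, 0.002
mu = n * p                                   # E(X) = np = 2
for x in range(5):
    print(x, round(binomial_pmf(x, n, p), 4), round(poisson_pmf(x, mu), 4))

# Mean and variance of Poisson(mu), truncating the infinite sums at 100 terms
mean = sum(x * poisson_pmf(x, mu) for x in range(100))
var = sum((x - mean) ** 2 * poisson_pmf(x, mu) for x in range(100))
print(round(mean, 6), round(var, 6))         # both equal mu
```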
Taking random samples
- all our work on distributions so far has concerned entire populations
- most of the time we will only have a sample (or possibly several independent samples) of the whole population
- how do the parameters of the population ($\mu$, $\sigma^2$) relate to the parameters of a random sample ($\bar{x}$, $s^2$)?
- how much belief can we have that the results of a sample are a true reflection of the population as a whole?
Random samples
- if random samples ($X_1, X_2, \ldots, X_n$) are taken from a given population (mean $\mu$, variance $\sigma^2$), then the following statement can be made about their mean $\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$: $E(\bar{X}) = \mu$
- the variance of the sample mean follows the rule $\mathrm{Var}(\bar{X}) = \sigma^2 / n$
Central Limit Theorem
- hence, if a random sample of size $n$ is taken from a normal population with mean $\mu$ and variance $\sigma^2$, then the sample mean $\bar{X}$ is normally distributed with mean $\mu$ and variance $\sigma^2/n$
- this can be generalised to the central limit theorem (c.l.t.), which states that, however the original population is distributed (provided its variance is finite), for large $n$: $\bar{X} \approx N(\mu, \sigma^2/n)$
- a consequence of the c.l.t. is that the total of the samples $T_n = X_1 + X_2 + \cdots + X_n$ is also approximately normally distributed: $T_n \approx N(n\mu, n\sigma^2)$
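The c.l.t. can be seen in a quick simulation. The sketch assumes a population that is far from normal (uniform on [0, 1), with mean 1/2 and variance 1/12), draws 10 000 samples of size 30, and checks that the sample means have a mean close to $\mu$, a variance close to $\sigma^2/n$, and that about 95% of them lie within $1.96\sigma/\sqrt{n}$ of $\mu$, as a normal model predicts.

```python
import math
import random
import statistics

random.seed(1)                        # fixed seed so the run is repeatable

# Assumption: a decidedly non-normal population, uniform on [0, 1)
mu, sigma2 = 0.5, 1 / 12
n, trials = 30, 10_000

sample_means = [statistics.fmean(random.random() for _ in range(n))
                for _ in range(trials)]

se = math.sqrt(sigma2 / n)            # standard deviation of the sample mean
within = sum(abs(m - mu) < 1.96 * se for m in sample_means) / trials

print(statistics.fmean(sample_means), mu)              # close to mu
print(statistics.pvariance(sample_means), sigma2 / n)  # close to sigma^2 / n
print(within)                                          # close to 0.95
```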
Central Limit Theorem (cont)
- if $X$ is binomial $B(n, p)$, then the distribution of $X$ can be approximated by a normal model $N(np, npq)$
- where $q = 1 - p$; the approximation is useful when both $np$ and $nq$ are over 5
- if $X$ is Poisson($\mu$), then the distribution of $X$ can be approximated by a normal model $N(\mu, \mu)$
- provided $\mu$ is at least 30
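Consequence of the Central Limit Theorem
- suppose we have two independent populations which we need to compare
- we can say that $\bar{X}_1 \approx N(\mu_1, \sigma_1^2/n_1)$
- and $\bar{X}_2 \approx N(\mu_2, \sigma_2^2/n_2)$
- consequently $\bar{X}_1 - \bar{X}_2 \approx N\left(\mu_1 - \mu_2,\ \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}\right)$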
Confidence testing
- whenever a sample is taken, we want to know whether it provides a reasonable estimate of the population as a whole
- this can be thought of as a subjective test: do you think that a 1% or a 5% chance of being wrong is acceptable?
- a consequence of the central limit theorem is that we can use the standard normal tabulations to give a good idea of the confidence we can have in samples
Significance Tests
- rules are:
- if the population is $N(\mu, \sigma^2)$ or can be approximated by it, then the sampling distribution of the mean is $N(\mu, \sigma^2/n)$
- null hypothesis $H_0: \mu = \mu_0$
- alternative hypothesis $H_1: \mu > \mu_0$, or $\mu < \mu_0$, or $\mu \ne \mu_0$
- critical test parameter (significance level) $\alpha$:
- $\alpha = 5\%$: reasonable evidence
- $\alpha = 1\%$: strong evidence
- $\alpha = 0.5\%$: very strong evidence
- the hypotheses and the value of $\alpha$ must be decided before the test
Confidence intervals
- e.g. $H_1: \mu \ne \mu_0$ (a two-tailed test)
- the values of $z$ which correspond to the particular confidence level we require can be found from the table
- then we can determine whether our sample values fall within these limits
Significant values of $z_\alpha$
- these values (and any others, if needed) can be determined from the tabulated values of the standard normal distribution
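In place of the table, the critical values can be computed with `statistics.NormalDist.inv_cdf` (Python 3.8+). The sketch covers the three significance levels listed for significance tests; a one-tailed test puts all of $\alpha$ in one tail, while a two-tailed test ($\mu \ne \mu_0$) splits it equally between both tails.

```python
from statistics import NormalDist

std_normal = NormalDist()   # mean 0, standard deviation 1

for alpha in (0.05, 0.01, 0.005):
    one_tailed = std_normal.inv_cdf(1 - alpha)        # H1: mu > mu0 (negate for mu < mu0)
    two_tailed = std_normal.inv_cdf(1 - alpha / 2)    # H1: mu != mu0
    print(f"alpha = {alpha:<6} one-tailed z = {one_tailed:.3f}  "
          f"two-tailed z = {two_tailed:.3f}")
```

This reproduces the familiar values 1.645 and 1.960 for $\alpha = 5\%$, 2.326 and 2.576 for 1%, and 2.576 and 2.807 for 0.5%.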
Significance testing (cont)
- we need to compute the critical value of $z$ corresponding to the chosen level of significance: $\alpha \Rightarrow z_\alpha$
- then we need to compute the test statistic, e.g. for the mean $z = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}}$
- compare $z$ and $z_\alpha$
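A sketch of these steps applied to the car-speed data, testing $H_0: \mu = 30$ against $H_1: \mu > 30$ at $\alpha = 5\%$. The $z$ test needs the population standard deviation, which is not known here, so $\sigma = 5$ mph is a hypothetical assumed value.

```python
import math
import statistics
from statistics import NormalDist

speeds = [31, 29, 28, 24, 29, 35, 37, 27, 29, 30,
          32, 30, 29, 20, 45, 27, 35, 34, 33, 29]

mu0 = 30        # H0: the mean speed equals the 30 mph limit
sigma = 5       # hypothetical: population standard deviation assumed known
alpha = 0.05    # chosen before the test

n = len(speeds)
z = (statistics.fmean(speeds) - mu0) / (sigma / math.sqrt(n))   # test statistic
z_alpha = NormalDist().inv_cdf(1 - alpha)                       # one-tailed, H1: mu > mu0

print(f"z = {z:.3f}, z_alpha = {z_alpha:.3f}")
print("reject H0" if z > z_alpha else "do not reject H0")
```

This gives $z \approx 0.58$, well below $z_\alpha \approx 1.645$, so under these assumptions the sample gives no evidence that the mean speed exceeds the limit.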
The Student's t-distribution
- in many real cases the value of the population variance ($\sigma^2$) is unknown, so the sample variance ($s^2$) must be used as an estimate
- this means that we can no longer use the standard normal variate $z$ (as this needs $\sigma$ in its transform), but must define a new variate $t = \frac{\bar{x} - \mu}{s/\sqrt{n}}$
- this now contains two random variables ($\bar{x}$ and $s$), and the distribution of $t$ will depend on the number of samples
- these are known as Student's t-distributions, and they are indexed by a parameter called the degrees of freedom, $\nu$, where $\nu = n - 1$
- tables of $t$ for various values of $\nu$ and significance levels can be found in textbooks or by using a spreadsheet
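The same test on the car-speed data as a t-test, with the unknown $\sigma$ replaced by the sample standard deviation $s$. The standard library has no t-distribution functions, so the critical value $t_{0.05} = 1.729$ for $\nu = 19$ (one-tailed) is taken from standard t tables; if SciPy is available, `scipy.stats.t.ppf(0.95, 19)` computes it.

```python
import math
import statistics

speeds = [31, 29, 28, 24, 29, 35, 37, 27, 29, 30,
          32, 30, 29, 20, 45, 27, 35, 34, 33, 29]

mu0 = 30                                   # H0: mu = 30 mph, H1: mu > 30 mph
n = len(speeds)
nu = n - 1                                 # degrees of freedom
s = statistics.stdev(speeds)               # sample standard deviation (n - 1 divisor)
t = (statistics.fmean(speeds) - mu0) / (s / math.sqrt(n))

# Critical value for a one-tailed test at alpha = 5% with nu = 19,
# read from standard t tables
t_alpha = 1.729
print(f"nu = {nu}, t = {t:.3f}, t_alpha = {t_alpha}")
print("reject H0" if t > t_alpha else "do not reject H0")
```

Here $t \approx 0.56 < 1.729$, so again $H_0$ is not rejected.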
The chi-squared distribution
- it is useful to be able to predict the distribution of the sample variance ($s^2$)
- just as the central limit theorem allowed us to predict the distribution of sample means
- this is stated here (without proof) in the following way
- for samples of size $n$ from a normal population, the distribution of $s^2$ (the sample variance) follows the chi-squared ($\chi^2$) distribution such that $\frac{(n-1)s^2}{\sigma^2} \sim \chi^2_{n-1}$
- the chi-squared distributions are NOT bell-shaped and are best calculated using a spreadsheet
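A simulation sketch of the statement above, assuming a normal population with hypothetical parameters $\mu = 30$, $\sigma = 5$ and samples of size 20. The statistic $(n-1)s^2/\sigma^2$ should then follow $\chi^2_{19}$, whose mean is $\nu = 19$ and variance $2\nu = 38$; the median falling below the mean shows the right skew that stops the curve being bell-shaped.

```python
import random
import statistics

random.seed(2)                        # fixed seed so the run is repeatable

# Assumption: normal population with hypothetical parameters
mu, sigma = 30.0, 5.0
n, trials = 20, 5_000

# For each simulated sample, form the statistic (n - 1) s^2 / sigma^2
chi_sq = [(n - 1) * statistics.variance([random.gauss(mu, sigma) for _ in range(n)])
          / sigma**2
          for _ in range(trials)]

nu = n - 1
print(statistics.fmean(chi_sq), nu)            # chi-squared(nu) has mean nu
print(statistics.pvariance(chi_sq), 2 * nu)    # and variance 2 * nu
print(statistics.median(chi_sq))               # below the mean: skewed, not bell-shaped
```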
Statistics and Probability
- there is a lot more to this subject than we have time to cover, for example:
- how are outlying values dealt with?
- how do you decide which model fits your data best?
- sum up by returning to