Title: Statistics
1Statistics
- What Is Engineering?
- July 16, 2007
2Why Study Statistics?
Statistics: A mathematical science concerned with
data collection, presentation, analysis, and
interpretation.
Statistics can tell us about
Sports
Population
Economy
3Why Study Statistics?
Statistical analysis is also an integral part of
scientific research!
Are your experimental results believable?
Example: Breaking strength of welds
Control variable: velocity
Response variable: breaking strength
The data suggest a relationship between velocity and
breaking strength!
The relationship is not perfect: the measurements have random error.
(To make a weld, the operator stops a rotating
part by forcing it into a stationary part; the
resulting friction generates heat that produces a
hot-pressure weld.)
4Why Study Statistics?
Responses and measurements are variable!
Due to
Randomness (or individual differences) in
sampling the population.
Inability to perform measurements in exactly the
same way every time.
The goal of statistics is to find the model that best
describes a target population by taking sample
data.
Randomness is represented using probability.
5Probability
Experiment of chance: a phenomenon whose outcome
is uncertain.
Probabilities
Chances
Sample Space
Events
Probability Model
Probability of Events
Sample Space: The set of all possible outcomes.
Event: A set of outcomes (a subset of the sample
space). An event E occurs if any of its outcomes
occurs.
Probability: The likelihood that an experiment will
produce a certain outcome, or that a given event will occur.
6Probability
Consider a deck of playing cards
Set of 52 cards
Sample Space?
Event?
R: The card is red.
F: The card is a face card.
H: The card is a heart.
3: The card is a 3.
Probability?
P(R) = 26/52, P(F) = 12/52, P(H) = 13/52, P(3) = 4/52
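The card probabilities can be checked by listing the sample space and counting the outcomes in each event. The sketch below assumes a standard 52-card deck with equally likely cards; the names `deck` and `probability` are illustrative.

```python
from fractions import Fraction
from itertools import product

RANKS = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
SUITS = ["hearts", "diamonds", "clubs", "spades"]

# Sample space: every (rank, suit) pair, 52 equally likely outcomes.
deck = list(product(RANKS, SUITS))

def probability(event):
    """Probability of an event given as a true/false test on a (rank, suit) card."""
    favourable = sum(1 for card in deck if event(card))
    return Fraction(favourable, len(deck))

p_red = probability(lambda card: card[1] in ("hearts", "diamonds"))
p_face = probability(lambda card: card[0] in ("J", "Q", "K"))
p_heart = probability(lambda card: card[1] == "hearts")
p_three = probability(lambda card: card[0] == "3")

print(p_red, p_face, p_heart, p_three)
```

Fractions are printed in lowest terms, so 26/52 appears as 1/2.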
7Events and variables
Can be described as random or deterministic
The outcome of a random event cannot be predicted
The sum of two numbers on two rolled dice. The
time of emission of the ith particle from
radioactive material.
The outcome of a deterministic event can be
predicted
The measured length of a table to the nearest cm.
Motion of macroscopic objects (projectiles,
planets, spacecraft) as predicted by classical
mechanics.
8Extent of randomness
A variable can be more random or more
deterministic depending on the degree to which
you account for relevant parameters
Mostly deterministic
Only a small fraction of the outcome cannot be
accounted for.
Length of a table
- Temperature/humidity variation
- Measurement resolution
- Instrument/observer error
- Quantum-level intrinsic uncertainty
Mostly Random
Most of the outcome cannot be accounted for.
- Trajectory of a given molecule in a solution
9Random variables
Can be described as discrete or continuous
- A discrete variable has a countable number of
values.
Number of customers who enter a store before one
purchases a product.
- The values of a continuous variable cannot be
listed.
Distance between two oxygen molecules in a room.
Consider data collected for undergraduate
students
Is the height a discrete or continuous variable?
How could you measure height and shoe size to
make them continuous variables?
10Probability Distributions
If a random event is repeated many times, it will
produce a distribution of outcomes (statistical
regularity).
(Think about scores on an exam)
The distribution can be represented in two ways
- Frequency distribution function: represents the
distribution as the number of occurrences of each
outcome
- Probability distribution function: represents
the distribution as the percentage of occurrences
of each outcome
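As a rough illustration of the two representations, the sketch below repeats a random event (the sum of two rolled dice) many times and tabulates both the counts and the fractions. The function names, the seed, and the number of rolls are assumptions made for the example.

```python
import random
from collections import Counter

def roll_two_dice(n_rolls, seed=0):
    """Simulate n_rolls sums of two fair dice (seeded for repeatability)."""
    rng = random.Random(seed)
    return [rng.randint(1, 6) + rng.randint(1, 6) for _ in range(n_rolls)]

def frequency_distribution(outcomes):
    """Number of occurrences of each outcome."""
    return Counter(outcomes)

def probability_distribution(outcomes):
    """Fraction of occurrences of each outcome."""
    counts = Counter(outcomes)
    total = len(outcomes)
    return {value: count / total for value, count in sorted(counts.items())}

sums = roll_two_dice(10_000)
freq = frequency_distribution(sums)
prob = probability_distribution(sums)

for value in sorted(freq):
    print(f"{value:2d}  count={freq[value]:5d}  fraction={prob[value]:.3f}")
```

Both tables have the same shape; the second is the first divided by the total number of rolls.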
11Discrete Probability Distributions
Consider a discrete random variable, X
$f(x_i)$ is the probability distribution function
What is the range of values of $f(x_i)$?
Therefore, $\Pr(X = x_i) = f(x_i)$
12Discrete Probability Distributions
Properties of discrete probabilities
$0 \le f(x_i) \le 1$ for all $i$
$\sum_{i=1}^{k} f(x_i) = 1$ for $k$ possible discrete outcomes
where $f(x_i) = \Pr(X = x_i)$
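A discrete distribution must satisfy two conditions: every probability lies between 0 and 1, and the probabilities over all outcomes sum to 1. The sketch below checks both for the sum of two fair dice; `is_valid_pmf` is a hypothetical helper name, and the dice example is an assumption chosen for illustration.

```python
from fractions import Fraction
from itertools import product

def is_valid_pmf(pmf, tol=1e-12):
    """Check that all probabilities are in [0, 1] and that they sum to 1."""
    values = list(pmf.values())
    in_range = all(0 <= p <= 1 for p in values)
    sums_to_one = abs(sum(values) - 1) <= tol
    return in_range and sums_to_one

# Exact distribution of the sum of two fair dice: 36 equally likely pairs.
dice_pmf = {}
for a, b in product(range(1, 7), repeat=2):
    dice_pmf[a + b] = dice_pmf.get(a + b, 0) + Fraction(1, 36)

print(is_valid_pmf(dice_pmf))
print(is_valid_pmf({0: 0.5, 1: 0.6}))  # probabilities sum to 1.1
```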
13Discrete Probability Distributions
Example Waiting for a success
Consider an experiment in which we toss a coin
until heads turns up.
Outcomes: $\omega \in \{H, TH, TTH, TTTH, TTTTH, \ldots\}$
Let $X(\omega)$ be the number of tails before a head turns up.
For a fair coin, $\Pr(X = x) = (1/2)^{x+1}$ for $x = 0, 1, 2, \ldots$
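The waiting-for-a-success experiment can be sketched in code. The example below assumes a fair coin with independent tosses, so that x tails followed by one head has probability (1/2)^(x+1); the function names and the seed are illustrative.

```python
import random

def pmf_tails_before_head(x, p_head=0.5):
    """Pr(X = x): x tails followed by one head, for independent tosses."""
    return (1 - p_head) ** x * p_head

def simulate_tails_before_head(rng, p_head=0.5):
    """Toss until a head turns up and return the number of tails seen first."""
    tails = 0
    while rng.random() >= p_head:  # a tail occurs with probability 1 - p_head
        tails += 1
    return tails

rng = random.Random(1)
n_trials = 20_000
samples = [simulate_tails_before_head(rng) for _ in range(n_trials)]

for x in range(4):
    observed = samples.count(x) / n_trials
    print(x, pmf_tails_before_head(x), round(observed, 3))
```

The observed fractions should sit close to 1/2, 1/4, 1/8, and 1/16, with small differences due to sampling randomness.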
14Cumulative Discrete Probability Distributions
$F(x) = \Pr(X \le x) = \sum_{i=1}^{j} f(x_i)$
where $x_j$ is the largest discrete value of $X$ less
than or equal to $x$
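A cumulative discrete distribution adds up the probabilities of all values less than or equal to x. The sketch below does this for the number of tails before the first head, assuming a fair coin; `pmf` and `cdf` are illustrative names.

```python
import math

def pmf(x):
    """Pr(X = x) for the number of tails before the first head (fair coin assumed)."""
    return 0.5 ** (x + 1)

def cdf(x):
    """F(x) = Pr(X <= x): sum the pmf over every possible value up to x."""
    if x < 0:
        return 0.0
    x_j = math.floor(x)  # largest possible value of X that is <= x
    return sum(pmf(i) for i in range(x_j + 1))

for x in (0, 1, 2, 2.7, 10):
    print(x, cdf(x))
```

The function is a staircase: it is flat between integers (so F(2.7) equals F(2)) and jumps by f(x) at each possible value.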
15Cumulative Continuous Probability Distributions
For continuous variables, the events of interest
are intervals rather than isolated values.
Consider the waiting time for a bus that is equally
likely to arrive at any moment in the next ten minutes.
We are not interested in the probability that the bus will
arrive in exactly 3.451233 minutes, but rather in the
probability that the bus will arrive in the
subinterval (a, b) minutes.
[Figure: cumulative distribution function F(t) plotted against t (minutes), reaching F(t) = 1 at t = 10]
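For the bus example, a waiting time that is equally likely to fall anywhere in ten minutes has the cumulative distribution F(t) = t/10 between 0 and 10, and the probability of an interval is the difference of F at its endpoints. A minimal sketch, with illustrative function names:

```python
def bus_cdf(t, max_wait=10.0):
    """F(t) = Pr(T <= t) for a waiting time uniform on [0, max_wait] minutes."""
    if t <= 0:
        return 0.0
    if t >= max_wait:
        return 1.0
    return t / max_wait

def prob_between(a, b, max_wait=10.0):
    """Pr(a < T <= b), obtained as F(b) - F(a)."""
    return bus_cdf(b, max_wait) - bus_cdf(a, max_wait)

print(prob_between(2, 5))          # bus arrives between minute 2 and minute 5
print(prob_between(3.451233, 3.451233))  # a single instant has probability 0
```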
16Continuous Probability Density Function
c.d.f.: Gives the fraction of the total
probability that lies at or to the left of each x
p.d.f.: Gives the density (concentration) of
probability at each point x
In terms of the c.d.f.: $f(x) = \frac{dF(x)}{dx}$
When $F(x)$ is differentiable at $x$, and $\Delta x$ is
small, we can approximate $\Delta F$ by the differential
of $F$: $\Delta F = F(x + \Delta x) - F(x) \approx f(x)\,\Delta x$
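The relationship between the two functions can be checked numerically: for a small step, the change in the c.d.f. divided by the step should be close to the p.d.f. The sketch below uses an exponential distribution with rate 1 purely as an assumed example (it is not taken from the slides), because both of its functions have simple closed forms.

```python
import math

def exp_cdf(x):
    """c.d.f. of an exponential distribution with rate 1 (assumed example)."""
    return 1.0 - math.exp(-x) if x > 0 else 0.0

def exp_pdf(x):
    """p.d.f. of the same distribution."""
    return math.exp(-x) if x > 0 else 0.0

def density_from_cdf(cdf, x, dx=1e-5):
    """Approximate f(x) by the forward difference (F(x + dx) - F(x)) / dx."""
    return (cdf(x + dx) - cdf(x)) / dx

for x in (0.5, 1.0, 2.0):
    approx = density_from_cdf(exp_cdf, x)
    print(x, round(approx, 5), round(exp_pdf(x), 5))
```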
17Continuous Probability Distributions
Properties of the cumulative distribution
function
Properties of the probability density function
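The standard properties can be checked numerically. The sketch below assumes the usual list (the c.d.f. never decreases and runs from 0 to 1; the p.d.f. is never negative, has total area 1, and its area over an interval equals the probability of that interval), which may not match the exact list on the original slide. It uses the ten-minute bus wait as the example; the helper names are illustrative.

```python
def bus_pdf(t, max_wait=10.0):
    """p.d.f. of a waiting time uniform on [0, max_wait] minutes."""
    return 1.0 / max_wait if 0.0 <= t <= max_wait else 0.0

def bus_cdf(t, max_wait=10.0):
    """c.d.f. of the same waiting time."""
    return min(max(t / max_wait, 0.0), 1.0)

def integrate(f, a, b, n=10_000):
    """Midpoint-rule approximation of the integral of f from a to b."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

grid = [-5.0 + 0.1 * i for i in range(201)]  # t from -5 to 15
cdf_values = [bus_cdf(t) for t in grid]

is_nondecreasing = all(x <= y for x, y in zip(cdf_values, cdf_values[1:]))
pdf_nonnegative = all(bus_pdf(t) >= 0.0 for t in grid)
total_area = integrate(bus_pdf, -5.0, 15.0)
area_2_to_5 = integrate(bus_pdf, 2.0, 5.0)

print(is_nondecreasing, pdf_nonnegative)
print(round(total_area, 6), round(area_2_to_5, 6), bus_cdf(5) - bus_cdf(2))
```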
18Continuous Probability Distributions
Example: Gaussian (normal) distribution
Each member of the normal distribution family is
described by the mean ($\mu$) and variance ($\sigma^2$).
Standard normal curve: $\mu = 0$, $\sigma = 1$.
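A short sketch of the normal density, written out from its usual formula and compared with Python's standard-library `statistics.NormalDist`. The function name `normal_pdf` is illustrative; the mean and standard deviation are the only parameters.

```python
import math
from statistics import NormalDist

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Normal density: exp(-(x - mu)^2 / (2 sigma^2)) / (sigma sqrt(2 pi))."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

standard = NormalDist(mu=0.0, sigma=1.0)

# The hand-written formula and the library agree at a few sample points.
for x in (-2.0, 0.0, 1.0):
    print(x, round(normal_pdf(x), 6), round(standard.pdf(x), 6))

# Probability that a standard normal value lies within one sigma of the mean.
within_one_sigma = standard.cdf(1.0) - standard.cdf(-1.0)
print(round(within_one_sigma, 4))
```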
19Central Limit Theorem
As the sample size goes to infinity, the
distribution function of the standardized
variable (such as a sample mean or sum) approaches the
standard normal distribution function!
http://www.jhu.edu/virtlab/prob-distributions/
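The theorem can be illustrated with a small simulation. The sketch below assumes samples drawn from a uniform distribution on [0, 1] (mean 1/2, variance 1/12), standardizes each sample mean, and measures how often it falls within one unit of zero. The function name, seed, and trial count are assumptions made for the example.

```python
import random

def fraction_within_one(n, trials=5000, seed=0):
    """Fraction of standardized sample means with |z| <= 1, for samples of size n."""
    rng = random.Random(seed)
    mu = 0.5                       # mean of a uniform(0, 1) value
    sigma = (1.0 / 12.0) ** 0.5    # standard deviation of a uniform(0, 1) value
    standard_error = sigma / n ** 0.5
    hits = 0
    for _ in range(trials):
        sample_mean = sum(rng.random() for _ in range(n)) / n
        z = (sample_mean - mu) / standard_error
        if abs(z) <= 1.0:
            hits += 1
    return hits / trials

for n in (1, 2, 30):
    print(n, fraction_within_one(n))
```

For a single uniform value the fraction should be near 0.58; as n grows it should move toward the standard normal value of about 0.68.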
20Moments
In physics, a moment refers to a force
applied to a system at a distance from the axis
of rotation (as in a lever).
In mathematics, a moment is a measure of how
far the values of a distribution lie from a reference point, such as the origin or the mean.
The 1st moment about the origin
(mean): $\mu = E[X] = \sum_i x_i f(x_i)$
- Average value of x
The 2nd moment about the mean
(variance): $\sigma^2 = E[(X - \mu)^2] = \sum_i (x_i - \mu)^2 f(x_i)$
- A measure of the spread of the data
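For a discrete variable, the first moment about the origin is the probability-weighted sum of the values, and the second moment about the mean is the probability-weighted sum of squared deviations. The sketch below computes both for one fair six-sided die, an example assumed here for illustration.

```python
from fractions import Fraction

def mean(pmf):
    """First moment about the origin: sum of x * f(x)."""
    return sum(x * p for x, p in pmf.items())

def variance(pmf):
    """Second moment about the mean: sum of (x - mean)^2 * f(x)."""
    mu = mean(pmf)
    return sum((x - mu) ** 2 * p for x, p in pmf.items())

# One fair die: faces 1..6, each with probability 1/6.
die = {face: Fraction(1, 6) for face in range(1, 7)}

print(mean(die))      # 7/2
print(variance(die))  # 35/12
```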
21Moments
Other values in terms of the moments
Skewness
- lopsidedness of the distribution
- a symmetric distribution will have a skewness
of 0
- negative skewness: the longer tail of the distribution is on the
left
- positive skewness: the longer tail of the distribution is on the
right
Kurtosis
- Describes the shape of the distribution with
respect to the height and width of the curve
(peakedness)
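Skewness and kurtosis are commonly computed as the third and fourth central moments divided by powers of the variance. The sketch below uses that convention with population-style averages (dividing by the number of data points); other conventions exist, and the data sets are made up for illustration.

```python
def central_moment(data, k):
    """Average of (x - mean)^k over the data."""
    m = sum(data) / len(data)
    return sum((x - m) ** k for x in data) / len(data)

def skewness(data):
    """Third central moment divided by the variance to the power 3/2."""
    return central_moment(data, 3) / central_moment(data, 2) ** 1.5

def kurtosis(data):
    """Fourth central moment divided by the variance squared."""
    return central_moment(data, 4) / central_moment(data, 2) ** 2

symmetric = [1, 2, 3, 4, 5]
long_right_tail = [1, 1, 1, 2, 10]
long_left_tail = [-10, -2, -1, -1, -1]

print(skewness(symmetric), skewness(long_right_tail), skewness(long_left_tail))
print(kurtosis(symmetric))
```

Under this convention a normal distribution has a kurtosis of 3, so values are often reported relative to 3.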
22Standard Error
Standard Deviation
Variance is the average squared distance of the
data from the mean. Therefore, the standard
deviation, $\sigma = \sqrt{\sigma^2}$, measures the spread of data about the
mean in the same units as the data.
Standard Error
$SE = \frac{\sigma}{\sqrt{N}}$
where $N$ is the sample size
How do we reduce the size of our standard error?
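Taking the standard error of the mean as the standard deviation divided by the square root of N, the sketch below shows how it shrinks as the sample grows. When the population standard deviation is unknown, the sample standard deviation is used as an estimate, which is the assumption made in `standard_error`; the data values are made up.

```python
import math
import statistics

def standard_error_of_mean(sigma, n):
    """Standard error for a known standard deviation sigma and sample size n."""
    return sigma / math.sqrt(n)

def standard_error(data):
    """Estimate from data, using the sample standard deviation in place of sigma."""
    return statistics.stdev(data) / math.sqrt(len(data))

# Quadrupling the sample size halves the standard error.
print(standard_error_of_mean(10.0, 25))
print(standard_error_of_mean(10.0, 100))

measurements = [2, 4, 4, 4, 5, 5, 7, 9]
print(round(standard_error(measurements), 3))
```

Because of the square root, cutting the standard error in half takes four times as many measurements.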
23Independence
A measure of whether two variables are related.
Consider data collected for arrowhead breakage
Does the location of the fracture depend on the
cause of fracture? Or in other words, is the
location of fracture independent of the cause?
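Two events A and B are independent when P(A and B) = P(A) P(B). The arrowhead data table is not reproduced here, so the sketch below applies the same test to events from the deck-of-cards example instead; the function names are illustrative.

```python
from fractions import Fraction
from itertools import product

RANKS = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
SUITS = ["hearts", "diamonds", "clubs", "spades"]
deck = list(product(RANKS, SUITS))  # 52 equally likely cards

def prob(event):
    """Probability of an event given as a true/false test on a card."""
    return Fraction(sum(1 for card in deck if event(card)), len(deck))

def independent(a, b):
    """True when P(A and B) equals P(A) * P(B)."""
    return prob(lambda card: a(card) and b(card)) == prob(a) * prob(b)

def is_red(card):
    return card[1] in ("hearts", "diamonds")

def is_face(card):
    return card[0] in ("J", "Q", "K")

def is_heart(card):
    return card[1] == "hearts"

print(independent(is_red, is_face))   # colour tells nothing about rank
print(independent(is_red, is_heart))  # every heart is red
```

With observed data such as the arrowhead counts, the same comparison is made between the observed joint proportions and the product of the marginal proportions, allowing for sampling error.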