STAT131 Week7 L1a Exponential Distribution

Slides: 44

Provided by: AP39

1
STAT131 Week7 L1a Exponential Distribution

Anne Porter
alp_at_uow.edu.au

2
Lecture Outline

Review Poisson
Introduction to the Exponential
Context/assumptions
Probability
Centre
Spread
Calibration of the model
Goodness of fit

3
Poisson

Video Clip Blowhole

4
Poisson

The random variable of interest, X, is the number of events occurring in a fixed dimension of length t.

The events occur in time or along any other dimensional continuum.
In each infinitesimally small period of length $\delta$, the probability of an event is $P(\text{event}) = \lambda\delta$ for some value $\lambda$.
In any infinitesimally small period of length $\delta$, the probability of two or more events is zero, i.e. two events do not occur simultaneously.
The occurrences of events in any two non-overlapping periods are independent.
(Griffiths et al., 1998)

5
Exponential distribution

How else can we think about the count data we
examined as exemplifying the Poisson
distribution?

The time until the first event occurs and, because the exponential process has no memory of the previous event,
the time until the next event, or
the time between events.

6
Context or Problem - Exponential

Given a Poisson process observed from time $t_0$ with a rate of events $\lambda$, let Y be the time until the first event occurs. Then Y has an exponential distribution with parameter $\lambda$. The time between any two successive events also has an exponential distribution with parameter $\lambda$.
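
The link between the two views can be sketched by simulation: build event times by accumulating exponential gaps, then count events per fixed window. This is only an illustration; the rate of 0.05 events per second, the 30 second window and the function names are assumptions chosen for the example, not values from the lecture.

```python
import random

def simulate_event_times(rate, n_events, seed=0):
    """Build event times by accumulating exponential gaps with the given rate."""
    rng = random.Random(seed)
    t = 0.0
    times = []
    for _ in range(n_events):
        t += rng.expovariate(rate)  # gap between successive events
        times.append(t)
    return times

def counts_per_window(times, window):
    """Count events in each complete window of the given length."""
    n_windows = int(times[-1] // window)
    counts = [0] * n_windows
    for t in times:
        k = int(t // window)
        if k < n_windows:  # ignore the final, incomplete window
            counts[k] += 1
    return counts

rate = 0.05      # assumed events per second (illustrative only)
window = 30.0    # seconds per counting interval
times = simulate_event_times(rate, n_events=20000)
counts = counts_per_window(times, window)

mean_count = sum(counts) / len(counts)  # should be near rate * window
mean_gap = times[-1] / len(times)       # should be near 1 / rate
print(round(mean_count, 2), rate * window)
print(round(mean_gap, 1), 1 / rate)
```

If the gaps are exponential with rate $\lambda$, the mean count per window should sit close to $\lambda t$ and the mean gap close to $1/\lambda$.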

7
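Probability: Exponential

The probability that the random variable Y takes on a value between time $t_1$ and time $t_2$ is given by
$P(t_1 < Y < t_2) = \int_{t_1}^{t_2} \lambda e^{-\lambda y}\,dy = e^{-\lambda t_1} - e^{-\lambda t_2}$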

8
Centre: Exponential

The mean of the Exponential($\lambda$) random variable Y is given as
$E(Y) = 1/\lambda$

9
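Spread: Variance

The variance of the Exponential($\lambda$) random variable Y is $\mathrm{Var}(Y) = 1/\lambda^2$, so the standard deviation is $1/\lambda$.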
10
Calibration of the Exponential Model

To estimate $\lambda$ based on a sample we set the mean of the distribution equal to the sample mean,
that is $1/\lambda = \bar{y}$, so $\hat{\lambda} = 1/\bar{y}$

11
Problem

The random variable of interest is the time to the next car passing. The count of cars per 30 second interval is a homogeneous Poisson($\lambda$).
The data consist of a list of inter-car arrival times in seconds.
Develop a model to describe the data.

12
Suggesting a model

Context and assumptions
If we have a homogeneous Poisson process with a rate of $\lambda$ events per unit of time, then the time from $t_0$ to the first event is a random variable with an exponential distribution.

Exploratory data analysis to reveal
Centre, shape, spread, outliers; allows examination of theoretical assumptions
What plot might be useful?
13
Stem-and-leaf plot reveals an exponential distribution (12 bins)

Frequency   Stem   Leaf
    7.00     0     2223333
   34.00     0 .   5555556666666777778888899999999999
   18.00     1     000111122223333444
    9.00     1 .   555567889
   12.00     2     001222223344
    4.00     2 .   5678
    6.00     3     011123
    7.00     3 .   6777899
    3.00     4     133
    2.00     4 .   59
    2.00     5     22
    1.00     5 .   9
    7.00   Extremes   (64), (68), (72), (74), (86), (93)

Stem width: 10
Each leaf: 1 case(s)

Exponential shape
Centre Spread Outliers
14
Histogram suggests an exponential-shaped distribution (20 class intervals or bins)
A little jagged rather than smooth, so try fewer bins
15
Histogram suggests an exponential-shaped distribution (10 class intervals or bins)
16
How many bins should we use?

Let the data speak for themselves: experiment
Balance between too smooth and too jagged to reveal shape
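
Experimenting with bin widths can be sketched by merging adjacent bins. The frequencies below are read from the stem-and-leaf plot (12 bins of 5 seconds, with the 7 extreme cases above 60 seconds kept apart); the helper name `merge_bins` is an illustrative assumption.

```python
def merge_bins(freqs, factor):
    """Merge each run of `factor` adjacent equal-width bins into one wider bin."""
    return [sum(freqs[i:i + factor]) for i in range(0, len(freqs), factor)]

# Frequencies of the 12 five-second bins (0-4, 5-9, ..., 55-59) from the
# stem-and-leaf plot; the 7 "extreme" cases above 60 s are counted separately.
width5 = [7, 34, 18, 9, 12, 4, 6, 7, 3, 2, 2, 1]
extremes = 7

width10 = merge_bins(width5, 2)   # six bins of 10 seconds
width20 = merge_bins(width5, 4)   # three bins of 20 seconds

print(width10)             # [41, 27, 16, 13, 5, 3]
print(width20, extremes)   # [68, 29, 8] 7
```

Wider bins give a smoother shape at the cost of detail; the 20 second bins are coarse but still show the steady decay expected of an exponential shape.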

17
Exploratory statistics

TIME: inter-arrival time (secs)
Valid cases      112.0
Missing cases       .0    Percent missing    .0

Mean       21.0982    Std Err      1.8138
Min         2.0000    Skewness     1.6881
Median     13.5000    Variance   368.4858
Max        93.0000    S E Skew      .2284
5% Trim    18.9623    Std Dev     19.1960
Range      91.0000    Kurtosis     2.6822
IQR        21.5000    S E Kurt      .4531

18
Exploratory statistics: Compare Mean and Standard Deviation
Theoretically $E(Y) = 1/\lambda$ and $SD(Y) = 1/\lambda$
That is, the mean equals the standard deviation
In our sample the mean is 21.0982 and the standard deviation is 19.1960
These are close: the difference is about 9% of the mean
19
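Exploratory statistics: Calibration

What should we do to calibrate the model?
Set the mean of the distribution equal to the sample mean: $1/\lambda = \bar{y} = 21.0982$
This gives
Estimate $\hat{\lambda} = 1/21.0982 \approx 0.047$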
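
The calibration step can be checked with a few lines, using only the sample mean and standard deviation reported in the exploratory statistics. The variable names are illustrative.

```python
sample_mean = 21.0982   # seconds, from the exploratory statistics
sample_sd = 19.1960     # seconds

# Method of moments: set the model mean 1/lambda equal to the sample mean.
rate_hat = 1.0 / sample_mean

# For an exponential model the mean and standard deviation should agree,
# so their relative difference is a quick informal check.
rel_diff = (sample_mean - sample_sd) / sample_mean

print(round(rate_hat, 3))        # 0.047
print(round(100 * rel_diff, 1))  # 9.0 (percent of the mean)
```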
20
Probabilities

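To find the probability of the time being within a certain time interval we use
$P(t_1 < Y < t_2) = e^{-\lambda t_1} - e^{-\lambda t_2}$
where $\hat{\lambda} = 0.047$
$P(0 < Y < 20)$
$P(20 < Y < 40)$
$P(40 < Y < 60)$
$P(60 < Y < 80)$
$P(Y > 80)$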

21
Probabilities

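To find the probability of the time being within a certain time interval we use
$P(t_1 < Y < t_2) = e^{-\lambda t_1} - e^{-\lambda t_2}$
where $\hat{\lambda} = 0.047$
$P(0 < Y < 20) = e^{-0.047 \times 0} - e^{-0.047 \times 20}$
$= e^{0} - e^{-0.94}$
$= 1 - 0.3906 = 0.6094$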
22
Probabilities

$P(0 < Y < 20) = 1 - 0.3906 = 0.6094$
$P(20 < Y < 40) = e^{-0.047 \times 20} - e^{-0.047 \times 40}$
$= 0.3906 - 0.1526 = 0.2380$
23
Probabilities

$P(0 < Y < 20) = 1 - 0.3906 = 0.6094$
$P(20 < Y < 40) = 0.3906 - 0.1526 = 0.2380$
$P(40 < Y < 60) = 0.1526 - 0.0596 = 0.0930$
$P(60 < Y < 80) = 0.0595 - 0.0233 = 0.0362$
$P(Y > 80) = 1 - (0.6094 + 0.2380 + 0.0930 + 0.0362) = 0.0234$
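
The interval probabilities can be reproduced with a short sketch, assuming the calibrated rate 0.047. The function name `interval_prob` is illustrative. Because the code does not round intermediate values, results may differ from the slide figures by about one unit in the fourth decimal place.

```python
import math

def interval_prob(rate, t1, t2=math.inf):
    """P(t1 < Y < t2) for Y ~ Exponential(rate): exp(-rate*t1) - exp(-rate*t2)."""
    upper = 0.0 if math.isinf(t2) else math.exp(-rate * t2)
    return math.exp(-rate * t1) - upper

rate = 0.047
edges = [0, 20, 40, 60, 80, math.inf]
probs = [interval_prob(rate, a, b) for a, b in zip(edges, edges[1:])]

for a, b, p in zip(edges, edges[1:], probs):
    print(f"P({a} < Y < {b}) = {p:.4f}")
```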
24
Expected counts

To find the frequencies of inter-arrival times expected in each class interval

Multiply the probability of falling in an interval by the total number of inter-arrival times

freq expected $(0 < Y < 20) = 0.6094 \times 112 = 68.25$
25
Finding expected counts for cells

freq expected $(0 < Y < 20) = 0.6094 \times 112 = 68.25$
freq expected $(20 < Y < 40) = 0.2380 \times 112 = 26.66$
freq expected $(40 < Y < 60) = 0.0930 \times 112 = 10.42$
freq expected $(60 < Y < 80) = 0.0362 \times 112 = 4.05$
freq expected $(Y > 80) = 112 - (68.25 + 26.66 + 10.42 + 4.05) = 2.62$
As the expected counts in the last two intervals are each less than 5, amalgamate these cells to give freq expected $(Y > 60) = 6.67$
26
Finding expected counts for cells

freq expected $(0 < Y < 20) = 0.6094 \times 112 = 68.25$
freq expected $(20 < Y < 40) = 0.2380 \times 112 = 26.66$
freq expected $(40 < Y < 60) = 0.0930 \times 112 = 10.42$
freq expected $(Y > 60) = 6.67$
total $= 112.00$
As the expected counts in the last two intervals are each less than 5, we will amalgamate these cells to give freq expected $(Y > 60) = 112 - (68.25 + 26.66 + 10.42) = 6.67$
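
The expected counts and the amalgamation of small cells can be sketched as follows. The pooling rule (merge the last cell into its left neighbour until the last cell reaches 5) is one assumed way of automating the step described above, and the function names are illustrative. The inputs are the rounded probabilities from the slides, so the pooled value matches 6.67 only to within rounding.

```python
def expected_counts(probs, n):
    """Expected frequency in each class interval: probability times sample size."""
    return [p * n for p in probs]

def pool_small_tail(expected, minimum=5.0):
    """Merge trailing cells into their left neighbour until the last cell reaches `minimum`."""
    cells = list(expected)
    while len(cells) > 1 and cells[-1] < minimum:
        last = cells.pop()
        cells[-1] += last
    return cells

probs = [0.6094, 0.2380, 0.0930, 0.0362, 0.0234]  # rounded values from the slides
n = 112

expected = expected_counts(probs, n)
pooled = pool_small_tail(expected)

print([f"{e:.2f}" for e in expected])
print([f"{e:.2f}" for e in pooled])
```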

27
class interval     freq Observed    freq Expected    $(O - E)^2 / E$
(0 < Y < 20)            68              68.25            0.0009
(20 < Y < 40)           29              26.66            0.2054
(40 < Y < 60)            8              10.42            0.5620
Y > 60                   7               6.67            0.016
total                  112             112               0.7846
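
The goodness-of-fit statistic in the table can be recomputed as a check, a minimal sketch using the observed and expected counts given above. Summing unrounded contributions gives a total that differs from the tabulated one only in the fourth decimal place.

```python
observed = [68, 29, 8, 7]
expected = [68.25, 26.66, 10.42, 6.67]

# Contribution of each cell: (observed - expected)^2 / expected
contributions = [(o - e) ** 2 / e for o, e in zip(observed, expected)]
chi_square = sum(contributions)

print([f"{c:.4f}" for c in contributions])
print(f"{chi_square:.4f}")
```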
28
Decision

$\chi^2 = 0.7846 < g - p - 1$, i.e. $< 4 - 1 - 1 = 2$, so the data can be considered to fit the model
Informal: we have one parameter, $\lambda$ ($p = 1$), and $g = 4$ cells, so $d = g - p - 1 = 2$. If $\chi^2 > 2d$ there is evidence of lack of fit
BUT $0.7846 < 2d = 4$, so there is little evidence that the data do not fit the model

29
Decision

Formal: if $\chi^2 >$ tabulated value (5.991) with $\alpha = 0.05$ and $df = g - p - 1 = 2$, then there is evidence the data do not fit the model
As $0.7846 < 5.991$ there is little evidence of lack of fit between the Exponential(0.047) model and the data
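
The tabulated value 5.991 can be verified without tables in this particular case. With 2 degrees of freedom the chi-square upper tail has the closed form $P(X > x) = e^{-x/2}$, so the critical value is $-2\ln\alpha$. The sketch below relies on that special case only and does not generalise to other degrees of freedom; the function name is illustrative.

```python
import math

def chi2_critical_df2(alpha):
    """Upper-alpha critical value of chi-square with 2 degrees of freedom.

    With df = 2 the upper tail is P(X > x) = exp(-x / 2), so x = -2 ln(alpha).
    """
    return -2.0 * math.log(alpha)

g, p = 4, 1            # number of cells and of estimated parameters
df = g - p - 1         # equals 2, which the closed form above requires
chi_square = 0.7846    # statistic from the goodness-of-fit table

critical = chi2_critical_df2(0.05)
p_value = math.exp(-chi_square / 2)   # upper-tail probability for df = 2

print(round(critical, 3))   # 5.991
print(chi_square < critical)
```

The upper-tail probability `p_value` is large, which agrees with the conclusion of little evidence of lack of fit.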

30
Assess Fit: Observed compared to Expected
(Need to calculate probabilities and expected counts first. This has been done with 5 bins rather than the more appropriate 4, given that the expected count in the last cell is too small.)
Good fit likely
31
Simulation

Simulate many samples of the same size ($n = 112$)
Simulate them so as to have the same parameter(s)
In this case $\lambda = 0.047$
See if the data set is similar to those samples simulated and known to come from an Exponential(0.047) model
Same bins is sensible
If the data set is typical of those simulated it is likely that the data follow the Exponential(0.047) model
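
The lecture generates its simulated samples in SPSS. The same idea can be sketched with the Python standard library: draw many Exponential(0.047) samples of size 112, bin them in the same cells as the data, and see where the observed counts fall among the simulated ones. The function names, the fixed seed and the choice of 1000 samples are assumptions made for illustration.

```python
import random

def bin_counts(sample, edges):
    """Count values per cell; cell k starts at edges[k] and the last cell is open-ended."""
    counts = [0] * len(edges)
    for y in sample:
        k = sum(1 for e in edges[1:] if y >= e)  # index of the cell containing y
        counts[k] += 1
    return counts

def simulate_counts(rate, n, edges, n_samples, seed=0):
    """Bin `n_samples` simulated exponential samples of size `n` in the same cells."""
    rng = random.Random(seed)
    return [bin_counts([rng.expovariate(rate) for _ in range(n)], edges)
            for _ in range(n_samples)]

edges = [0, 20, 40, 60]      # cells 0-20, 20-40, 40-60 and >60 seconds
observed = [68, 29, 8, 7]    # observed counts from the goodness-of-fit table
sims = simulate_counts(rate=0.047, n=112, edges=edges, n_samples=1000)

for j, obs in enumerate(observed):
    column = sorted(s[j] for s in sims)
    low, high = column[25], column[974]   # rough central 95% of simulated counts
    print(f"cell {j}: observed {obs}, simulated range {low} to {high}")
```

If each observed count sits inside the range of simulated counts for its cell, the data set looks typical of samples known to come from the model.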

32
Simulate Exponential(0.047) ($n = 112$ as per original data set)
Simulated samples can be generated in SPSS
33
Simulate Exponential(0.047) ($n = 112$ as per original data set)
34
Simulate Exponential(0.047) ($n = 112$ as per original data set)
35
Simulate Exponential(0.047) ($n = 112$ as per original data set)
36
Simulate Exponential(0.047) ($n = 112$ as per original data set)
37
Simulate Exponential(0.047) ($n = 112$ as per original data set)
38
Simulate Exponential(0.047) ($n = 112$ as per original data set)
39
Simulation

If the data set is similar to those samples simulated and known to come from an Exponential($\lambda$) model then it is likely that the data set also comes from that same population

40
Quantile Plots
The data fit the model when the points fall on the straight line $y_{(i)} = q_i$, where the $q_i$ are the quantiles expected with the Exponential($\lambda$).
41
Quantile-quantile plots

Sort (enter) the data in ascending order
Determine $q_1, q_2, \ldots, q_n$ such that the data are divided into $n + 1$ areas
$F(q_1) = 1/(n+1) = 1/113$
$F(q_2) = 2/(n+1) = 2/113$ etc.
$F(q_i) = i/(n+1) = i/113$
Find $P(0 < Y < q_i)$ and solve for $q_i$ (see lecture notes)
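
Solving $F(q_i) = 1 - e^{-\lambda q_i} = i/(n+1)$ gives $q_i = -\ln(1 - i/(n+1))/\lambda$. A minimal sketch of that step, assuming the calibrated rate 0.047 and $n = 112$, follows. The function name is illustrative, and the sorted data values to plot against the $q_i$ are not reproduced here.

```python
import math

def exponential_quantiles(rate, n):
    """Solve F(q_i) = 1 - exp(-rate * q_i) = i / (n + 1) for i = 1..n."""
    return [-math.log(1 - i / (n + 1)) / rate for i in range(1, n + 1)]

q = exponential_quantiles(rate=0.047, n=112)

# Check that each q_i really satisfies F(q_i) = i / 113.
for i, qi in enumerate(q, start=1):
    assert abs((1 - math.exp(-0.047 * qi)) - i / 113) < 1e-9

print(len(q), round(q[0], 3), round(q[-1], 1))
```

Plotting the sorted observations $y_{(i)}$ against these $q_i$ gives the quantile-quantile plot; points close to the line $y_{(i)} = q_i$ suggest the model fits.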

42
What do we do if the data do not fit the model?

If the model does not fit, ask 'why not?' Could it be that one or more assumptions do not hold?
Look at changes over time: is the rate constant, or is there a lack of independence between time periods?
Examine which cells have the largest lack of fit
Look at theoretical relationships, e.g. mean = variance or mean = standard deviation, where they exist

43
Poisson link to the Exponential: next lecture