Title: STAT131 Week7 L1a Exponential Distribution
1 STAT131 Week 7 L1a: Exponential Distribution
- Anne Porter
- alp_at_uow.edu.au
2 Lecture Outline
- Review Poisson
- Introduction to the Exponential
- Context/assumptions
- Probability
- Centre
- Spread
- Calibration of the model
- Goodness of fit
3 Poisson
4 Poisson
The random variable of interest, X, is the number of events occurring in a fixed dimension of length t.
- The events occur in time or along any other dimensional continuum.
- In each infinitesimally small period of length δ the probability of an event is P(event) = λδ for some value λ.
- In any infinitesimally small period of length δ, the probability of two or more events is zero, i.e. two events do not occur simultaneously.
- The occurrence of events in any two non-overlapping periods is independent.
- (Griffiths et al, 1998)
5 Exponential distribution
- How else can we think about the count data we examined as exemplifying the Poisson distribution?
- The time until the first event occurs and, because the exponential process has no memory of the previous event,
- The time until the next event or
- The time between events.
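The "no memory" remark can be made precise. The short derivation below is a sketch that assumes the survival function $P(Y > t) = e^{-\lambda t}$ for an Exponential(λ) variable; s and t are arbitrary non-negative times introduced here for illustration.

```latex
\begin{align*}
P(Y > s + t \mid Y > s) &= \frac{P(Y > s + t)}{P(Y > s)} \\
                        &= \frac{e^{-\lambda (s + t)}}{e^{-\lambda s}}
                         = e^{-\lambda t} = P(Y > t)
\end{align*}
```

Having already waited s units without an event does not change the distribution of the remaining wait, which is why "time until the next event" and "time between events" share one model.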
6 Context or Problem: Exponential
- Given a Poisson process observed from time t0 with a rate of events λ, let Y be the time when the first event occurs. Then Y has an exponential distribution with parameter λ. The time between any two successive events also has an exponential distribution with parameter λ.
7 Probability: Exponential
The probability that the random variable Y takes on a value between time 1, $t_1$, and time 2, $t_2$, is given by
$$P(t_1 < Y < t_2) = e^{-\lambda t_1} - e^{-\lambda t_2}$$
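A minimal Python sketch of this interval probability, assuming the standard exponential form $P(t_1 < Y < t_2) = e^{-\lambda t_1} - e^{-\lambda t_2}$. The function name `exp_interval_prob` and the rate 0.5 are illustrative choices, not part of the lecture.

```python
import math

def exp_interval_prob(lam, t1, t2):
    """P(t1 < Y < t2) for Y ~ Exponential(lam), with 0 <= t1 <= t2."""
    if lam <= 0 or t1 < 0 or t2 < t1:
        raise ValueError("need lam > 0 and 0 <= t1 <= t2")
    return math.exp(-lam * t1) - math.exp(-lam * t2)

# Illustrative rate of 0.5 events per unit time (an assumption for this demo)
p = exp_interval_prob(0.5, 1.0, 3.0)
print(round(p, 4))
```

Passing `float("inf")` as `t2` gives the open-ended tail probability, since `math.exp` of negative infinity is 0.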
8 Centre: Exponential
- The mean of the Exponential(λ) random variable Y is given as
$$E(Y) = \frac{1}{\lambda}$$
9 Spread: Variance
$$\mathrm{Var}(Y) = \frac{1}{\lambda^2}, \qquad \mathrm{SD}(Y) = \frac{1}{\lambda}$$
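One way to see that an Exponential(λ) variable has both mean and standard deviation equal to 1/λ is to simulate a large sample. This is only a sketch: the rate 0.5, the seed, and the sample size are arbitrary choices for illustration.

```python
import random
import statistics

rng = random.Random(131)          # fixed seed so the run is repeatable
lam = 0.5                         # illustrative rate, not from the lecture data
sample = [rng.expovariate(lam) for _ in range(100_000)]

sample_mean = statistics.fmean(sample)
sample_sd = statistics.stdev(sample)

# Both values should be close to 1 / lam = 2
print(round(sample_mean, 2), round(sample_sd, 2))
```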
10 Calibration of the Exponential Model
- To estimate λ based on a sample we set the mean of the distribution equal to the sample mean, that is $1/\lambda = \bar{y}$, so $\hat{\lambda} = 1/\bar{y}$
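The calibration step can be sketched in a few lines of Python. The helper name `estimate_rate` and the toy inter-arrival times are hypothetical, chosen only to make the arithmetic easy to follow.

```python
import statistics

def estimate_rate(times):
    """Estimate lambda by setting 1/lambda equal to the sample mean."""
    mean = statistics.fmean(times)
    if mean <= 0:
        raise ValueError("sample mean must be positive")
    return 1.0 / mean

# Hypothetical inter-arrival times in seconds (made up for illustration)
toy_times = [4.0, 10.0, 16.0, 30.0]
lam_hat = estimate_rate(toy_times)   # sample mean is 15, so lam_hat = 1/15
print(round(lam_hat, 4))
```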
11 Problem
- The random variable of interest is the time to the next car passing. The count of cars per 30 second interval is a homogeneous Poisson(λ).
- The data consist of a list of inter-car arrival times in seconds.
- Develop a model to describe the data
12 Suggesting a model
- Context and assumptions
If we have a homogeneous Poisson process with a rate of events λ per unit of time then the time from t0 to the first event is a random variable with an exponential distribution.
- Exploratory data analysis to reveal
Centre, shape, spread, outliers, allow examination of theoretical assumptions
What plot might be useful?
13 Stem-and-leaf plot reveals an exponential distribution (12 bins)
Frequency   Stem   Leaf
  7.00      0      2223333
 34.00      0 .    5555556666666777778888899999999999
 18.00      1      000111122223333444
  9.00      1 .    555567889
 12.00      2      001222223344
  4.00      2 .    5678
  6.00      3      011123
  7.00      3 .    6777899
  3.00      4      133
  2.00      4 .    59
  2.00      5      22
  1.00      5 .    9
  7.00      Extremes (64), (68), (72), (74), (86), (93)
- Stem width: 10
- Each leaf: 1 case(s)
Exponential shape
Centre Spread Outliers
14 Histogram suggests an exponential-shaped distribution (20 class intervals or bins)
A little jagged rather than smooth, so try fewer bins
15 Histogram suggests an exponential-shaped distribution (10 class intervals or bins)
16 How many bins should we use?
- Let the data speak for themselves: experiment
- Balance between too smooth and too jagged to reveal shape
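Experimenting with the number of bins is easy to script. The sketch below counts observations in equal-width bins for several bin counts; the helper `bin_counts` is hypothetical, and the data are simulated stand-ins rather than the lecture's car data.

```python
import random

def bin_counts(data, n_bins, lo, hi):
    """Count observations in n_bins equal-width bins spanning [lo, hi]."""
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for x in data:
        if lo <= x <= hi:
            # A value equal to the top edge is placed in the last bin
            idx = min(int((x - lo) / width), n_bins - 1)
            counts[idx] += 1
    return counts

rng = random.Random(7)
# Simulated stand-in data, NOT the lecture's inter-arrival times
fake_times = [rng.expovariate(0.05) for _ in range(112)]
upper = max(fake_times)

for k in (20, 10, 5):
    print(k, bin_counts(fake_times, k, 0.0, upper))
```

Printing the counts side by side shows how many bins give a profile that is neither too jagged nor too smooth.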
17 Exploratory statistics
TIME: inter-arrival time (secs)
Valid cases 112.0   Missing cases 0.0   Percent missing 0.0
Mean      21.0982   Std Err     1.8138
Min        2.0000   Skewness    1.6881
Median    13.5000   Variance  368.4858
Max       93.0000   S E Skew    0.2284
5% Trim   18.9623   Std Dev    19.1960
Range     91.0000   Kurtosis    2.6822
IQR       21.5000   S E Kurt    0.4531
18 Exploratory statistics: Compare Mean and Standard deviation
Theoretically $E(Y) = 1/\lambda$ and $\mathrm{SD}(Y) = 1/\lambda$
That is, mean = standard deviation
In our sample the mean = 21.0982 and the standard deviation = 19.1960
These are close: about 9% difference as a percentage of the mean
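The 9% figure can be checked directly from the exploratory statistics. The variable names in this small sketch are illustrative.

```python
mean = 21.0982     # sample mean from the exploratory statistics
sd = 19.1960       # sample standard deviation from the same output

# Difference between mean and SD, expressed as a fraction of the mean
rel_diff = abs(mean - sd) / mean
print(f"{rel_diff:.1%}")
```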
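19 Exploratory statistics: Calibration
What should we do to calibrate the model? Set the mean of the distribution equal to the sample mean: $1/\lambda = 21.0982$
This gives
Estimate $\hat{\lambda} = 1/21.0982 \approx 0.047$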
20 Probabilities
- To find the probability of the time to be within a certain time interval we use $P(t_1 < Y < t_2) = e^{-\lambda t_1} - e^{-\lambda t_2}$ where $\lambda = 0.047$
- P(0 < Y < 20)
- P(20 < Y < 40)
- P(40 < Y < 60)
- P(60 < Y < 80)
- P(Y > 80)
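21 Probabilities
- To find the probability of the time to be within a certain time interval we use $P(t_1 < Y < t_2) = e^{-\lambda t_1} - e^{-\lambda t_2}$ where $\lambda = 0.047$
- $P(0 < Y < 20) = e^{-0.047 \times 0} - e^{-0.047 \times 20} = e^{0} - e^{-0.94} = 1 - 0.3906 = 0.6094$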
22 Probabilities
- P(0 < Y < 20) = 1 - 0.3906 = 0.6094
- $P(20 < Y < 40) = e^{-0.047 \times 20} - e^{-0.047 \times 40} = 0.3906 - 0.1526 = 0.2380$
23 Probabilities
- P(0 < Y < 20) = 1 - 0.3906 = 0.6094
- P(20 < Y < 40) = 0.3906 - 0.1526 = 0.2380
- P(40 < Y < 60) = 0.1526 - 0.0596 = 0.0930
- P(60 < Y < 80) = 0.0595 - 0.0233 = 0.0362
- P(Y > 80) = 1 - (0.6094 + 0.2380 + 0.0930 + 0.0362) = 0.0234
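The interval probabilities can be reproduced in one pass. This sketch assumes the rate 0.047 and the class boundaries 0, 20, 40, 60, 80 used on the slides; the variable names are illustrative.

```python
import math

lam = 0.047                      # calibrated rate used in the lecture
edges = [0, 20, 40, 60, 80]      # class boundaries in seconds

# P(a < Y < b) = exp(-lam*a) - exp(-lam*b) for each adjacent pair of edges
probs = [math.exp(-lam * a) - math.exp(-lam * b)
         for a, b in zip(edges, edges[1:])]
probs.append(math.exp(-lam * edges[-1]))   # open-ended tail P(Y > 80)

for p in probs:
    print(round(p, 4))
```

Working at full precision, the last two values can differ from the slide figures by one unit in the fourth decimal place, because the slides round at each intermediate step.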
24 Expected counts
- To find the frequencies of inter-arrival times expected in each class interval, multiply the probability of falling in an interval by the total number of inter-arrival times
0.6094 × 112 = 68.25
25 Finding expected counts for cells
- freq expected (0 < Y < 20) = 0.6094 × 112 = 68.25
- freq expected (20 < Y < 40) = 0.2380 × 112 = 26.66
- freq expected (40 < Y < 60) = 0.0930 × 112 = 10.42
- freq expected (60 < Y < 80) = 0.0362 × 112 = 4.05
- freq expected (Y > 80) = 112 - (68.25 + 26.66 + 10.42 + 4.05) = 2.62
As the expected numbers in the last two intervals are each less than 5, amalgamate these cells to have freq expected (Y > 60) = 4.05 + 2.62 = 6.67
26 Finding expected counts for cells
- freq expected (0 < Y < 20) = 0.6094 × 112 = 68.25
- freq expected (20 < Y < 40) = 0.2380 × 112 = 26.66
- freq expected (40 < Y < 60) = 0.0930 × 112 = 10.42
- freq expected (Y > 60) = 6.67
- total = 112.00
As the expected numbers in the last two intervals are each less than 5 we will amalgamate these cells to have freq expected (Y > 60) = 112 - (68.25 + 26.66 + 10.42) = 6.67
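The expected counts and the amalgamation rule can be sketched as follows. The helper `merge_small_tail` is a hypothetical name, and the threshold of 5 is the rule of thumb quoted on the slides.

```python
n = 112
# Interval probabilities as rounded on the slides:
# 0-20, 20-40, 40-60, 60-80 and over 80 seconds
probs = [0.6094, 0.2380, 0.0930, 0.0362, 0.0234]

expected = [p * n for p in probs]

def merge_small_tail(counts, minimum=5.0):
    """Fold trailing cells together until the last cell reaches `minimum`."""
    merged = list(counts)
    while len(merged) > 1 and merged[-1] < minimum:
        tail = merged.pop()
        merged[-1] += tail
    return merged

merged = merge_small_tail(expected)
print([round(e, 2) for e in merged])
```

The merged final cell may differ from the slide value in the second decimal place, since the slides add counts that were already rounded.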
27
class interval   freq observed   freq expected   (O - E)^2 / E
0 < Y < 20        68              68.25          0.0009
20 < Y < 40       29              26.66          0.2054
40 < Y < 60        8              10.42          0.5620
Y > 60             7               6.67          0.016
total            112             112             0.7846
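The goodness-of-fit total is the sum of (observed - expected)² / expected over the four cells. A minimal sketch using the observed and expected frequencies from the table, with illustrative variable names:

```python
observed = [68, 29, 8, 7]
expected = [68.25, 26.66, 10.42, 6.67]

# Contribution of each cell to the chi-square statistic
contributions = [(o - e) ** 2 / e for o, e in zip(observed, expected)]
chi_sq = sum(contributions)

print([round(c, 4) for c in contributions], round(chi_sq, 3))
```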
28 Decision
- 0.7846 < g - p - 1, i.e. < 4 - 1 - 1 = 2, so the data can be considered to fit the model
- Informal: where we have p = 1 parameter λ and g = 4 cells, d = g - p - 1 = 2. If χ² > 2d there is evidence of lack of fit
- BUT 0.7846 < 2 × 2 = 4 so there is little evidence that the data do not fit the model
29 Decision
- Formal: if χ² > tabulated value (5.991) with α = 0.05 and df = g - p - 1 = 2 then there is evidence the data do not fit the model
- As 0.7846 < 5.991 there is little evidence of lack of fit between the Exponential(0.047) model and the data
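The tabulated value 5.991 can be checked without tables in this particular case. For 2 degrees of freedom the chi-square upper-tail probability is exactly exp(-x/2), so the critical value is -2 ln α. The sketch below relies on that special case only and does not generalise to other degrees of freedom; the function names are hypothetical.

```python
import math

def chi2_df2_critical(alpha):
    """Upper-tail critical value of chi-square with 2 degrees of freedom.

    For df = 2 the survival function is exp(-x / 2), so it inverts exactly.
    """
    return -2.0 * math.log(alpha)

def chi2_df2_p_value(stat):
    """P(chi-square with 2 df exceeds stat)."""
    return math.exp(-stat / 2.0)

stat = 0.7846                       # total from the goodness-of-fit table
critical = chi2_df2_critical(0.05)
p_value = chi2_df2_p_value(stat)
lack_of_fit = stat > critical       # True would signal evidence of lack of fit

print(round(critical, 3), lack_of_fit)
```

For other degrees of freedom a statistics library or printed table would be needed instead.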
30 Assess Fit: Observed compared to Expected
(Need to calculate probabilities and expected counts first. This has been done with 5 bins rather than the more appropriate 4, given that the expected count in the last cell is too small.)
Good fit likely
31 Simulation
- Simulate many samples of the same size (n = 112)
- Simulate them so as to have the same parameter(s)
- In this case λ = 0.047
- See if the data set is similar to those samples simulated and known to come from an exponential model (0.047)
- Using the same bins is sensible
- If the data set is typical of those simulated it is likely that the data follow the exponential (0.047) model
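The lecture generates its simulated samples in SPSS. As an alternative sketch, the same idea in Python using the standard library's `random.expovariate`; the function name `simulate_binned`, the seed, and the choice of five repetitions are assumptions made for illustration.

```python
import random

def simulate_binned(lam, n, edges, rng):
    """Draw n Exponential(lam) times and count them in bins defined by edges.

    The final bin is open-ended: everything at or above edges[-1].
    """
    counts = [0] * len(edges)
    for _ in range(n):
        y = rng.expovariate(lam)
        idx = sum(1 for e in edges[1:] if y >= e)  # number of edges passed
        counts[idx] += 1
    return counts

rng = random.Random(2024)           # fixed seed so the run is repeatable
edges = [0, 20, 40, 60]             # same bins as the goodness-of-fit table
for _ in range(5):
    print(simulate_binned(0.047, 112, edges, rng))
```

Each printed row can be compared with the observed frequencies 68, 29, 8, 7 to judge whether the real data look typical of samples from the model.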
32 Simulate Exponential(0.047) (n = 112 as per original data set)
Simulated samples can be generated in SPSS
33-38 Simulate Exponential(0.047) (n = 112 as per original data set)
39 Simulation
- If the data set is similar to those samples simulated and known to come from an exponential model (λ) then it is likely that the data set also comes from that same population
40 Quantile Plots
The data fit the model when the points fall on the straight line $y_{(i)} = q_i$, where the $q_i$ are the quantiles expected with the Exponential(λ).
41 Quantile-quantile plots
- Sort (enter) the data in ascending order
- Determine q1, q2, ..., qn such that the data are divided into n + 1 areas
- F(q1) = 1/(n + 1) = 1/113
- F(q2) = 2/(n + 1) = 2/113 etc
- F(qi) = i/113
- Find P(0 < Y < qi) and solve for qi (See lecture notes)
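Solving for the quantiles has a closed form under the exponential model. Since P(0 < Y < q) = 1 - e^{-λq}, setting this equal to i/(n + 1) gives q_i = -ln(1 - i/(n + 1)) / λ. The sketch below assumes that form; the function name `exp_quantiles` and the toy data are hypothetical.

```python
import math

def exp_quantiles(lam, n):
    """Theoretical quantiles q_1..q_n with F(q_i) = i / (n + 1)."""
    return [-math.log(1.0 - i / (n + 1)) / lam for i in range(1, n + 1)]

q = exp_quantiles(0.047, 112)       # rate and sample size from the lecture
print(round(q[0], 3), round(q[-1], 3))

# Pair each quantile q_i with the sorted observation y_(i) to get plot points
toy_sorted = sorted([3.0, 12.0, 25.0])          # hypothetical data
pairs = list(zip(exp_quantiles(0.047, 3), toy_sorted))
```

Plotting the pairs and comparing them with the line y = q gives the quantile-quantile plot.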
42 What do we do if the data do not fit the model?
- If the model does not fit, ask 'why not?' Could it be that one or more assumptions do not hold?
- Look at changes over time, constant rate or lack of independence in time periods
- Examine which cells have the largest lack of fit
- Look at theoretical relationships, e.g. mean = variance or mean = standard deviation, where they exist
43 Poisson link to the Exponential: Next lecture
- No events between time zero and time t
- Is the same as
- The time of the first event is greater than t
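The equivalence of these two statements leads directly to the exponential probabilities. The derivation below is a sketch that assumes the number of events X in an interval of length t is Poisson with mean λt.

```latex
\begin{align*}
P(Y > t) &= P(X = 0 \text{ in } (0, t))
          = \frac{(\lambda t)^{0} e^{-\lambda t}}{0!}
          = e^{-\lambda t} \\
F(t) &= P(Y \le t) = 1 - e^{-\lambda t}
\end{align*}
```

Differences of this F at two times give the interval probabilities of the form $e^{-\lambda t_1} - e^{-\lambda t_2}$.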