Title: Selecting Input Probability Distribution
1 Selecting Input Probability Distributions
2 Introduction
- need to specify probability distributions of random inputs, e.g.
  - processing times at a specific machine
  - interarrival times of customers/pieces
  - demand sizes
- evaluate data sets (if available)
- failure to choose the correct distribution can affect the accuracy of the model's results!
3 Assessing Sample Independence
- correlation plot
- scatter diagram
4 Assessing Sample Independence
- important assumption: observations are supposed to be independent
- graphical techniques for informally assessing whether data are independent
  - correlation plot
  - scatter diagram
5 correlation plot
- graph of the sample correlations ρ̂j
- ρ̂j is an estimate of the true correlation ρj between two observations that are j observations apart in time
- if the observations X1, X2, …, Xn are independent, then ρj = 0 for j = 1, 2, …, n-1
- the estimates ρ̂j won't be exactly zero even if the Xi's are independent, since each ρ̂j is an observation of a random variable
- if the estimates differ from 0 by a significant amount, this is strong evidence that the Xi's are not independent
6 correlation plot (example)
7 correlation plot (example)
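The quantity plotted in the example slides can be sketched in a few lines; a minimal sketch (assuming NumPy is available) that computes the sample autocorrelations ρ̂j for simulated independent data:

```python
import numpy as np

def sample_autocorrelation(x, max_lag):
    """Estimate rho_j for j = 1..max_lag from observations x."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    xbar = x.mean()
    # shared denominator: sum of squared deviations from the mean
    denom = np.sum((x - xbar) ** 2)
    rho = []
    for j in range(1, max_lag + 1):
        num = np.sum((x[:n - j] - xbar) * (x[j:] - xbar))
        rho.append(num / denom)
    return np.array(rho)

rng = np.random.default_rng(42)
iid = rng.exponential(scale=2.0, size=1000)   # independent draws
rho = sample_autocorrelation(iid, max_lag=10)
# for independent data all estimates should lie close to zero
```

Plotting `rho` against the lag j gives the correlation plot; estimates far from zero would be evidence against independence.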
8 scatter diagram
- plot of the pairs (Xi, Xi+1)
- if the Xi's are independent, one would expect the points (Xi, Xi+1) to be scattered randomly throughout the first quadrant of the plane
- the nature of the scattering depends on the underlying distribution of the Xi's
- if the Xi's are positively (negatively) correlated, the points will tend to lie along a line with positive (negative) slope
9 scatter diagram (example)
10 scatter diagram (example 2)
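The pairs behind such a scatter diagram are easy to build; a minimal sketch (assuming NumPy) that forms the lag-1 pairs and, instead of plotting, summarizes their alignment with the sample correlation:

```python
import numpy as np

rng = np.random.default_rng(7)
x = rng.exponential(scale=1.5, size=500)

# lag-1 pairs (X_i, X_{i+1}); these are the points of the scatter diagram
pairs = np.column_stack((x[:-1], x[1:]))

# sample correlation of the pairs: near zero for independent data,
# clearly positive (negative) when the X_i are positively (negatively) correlated
r = np.corrcoef(pairs[:, 0], pairs[:, 1])[0, 1]
```

A scatter plot of `pairs` would show a random cloud in the first quadrant here; correlated data would concentrate along a sloped line.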
11 Specifying Distribution
- useful distributions
- use values directly
- define empirical distribution
- fit theoretical distribution
12 useful probability distributions
- parameters of continuous distributions
  - location parameter γ
    - determines the x-axis location
    - usually the midpoint (mean for the normal distribution) or lower endpoint
    - also called shift parameter
    - changes in γ shift the distribution left or right without changing it otherwise
  - scale parameter β
    - determines the scale (unit) of measurement
    - standard deviation σ for the normal distribution
    - changes in β compress or expand the distribution without altering its basic form
13 useful probability distributions
- parameters of continuous distributions
  - shape parameter α
    - determines the basic form or shape of a distribution within the general family of distributions of interest
    - a change in α generally alters a distribution's properties (e.g. skewness) more fundamentally than a change in location or scale
14 Approaches to specifying a distribution
- if data collection on an input random variable is possible
  - use the data values directly in the simulation (trace driven)
    - only reproduces what happened
    - seldom enough data to make all simulation runs
    - useful for model validation
  - define an empirical distribution
    - (for continuous data) at least any value between the minimum and maximum can be generated
    - no values outside the observed range can be generated
    - may have irregularities
  - fit a theoretical distribution
    - preferred method
    - easy to change
15 Specifying Distribution
- useful distributions
- use values directly
- define empirical distribution
- fit theoretical distribution
16 Uniform U(a,b)
- application
  - used as a first model for a quantity that is felt to vary randomly between a and b, but about which little else is known
17 exponential distribution exp(β)
- application
  - interarrival times of entities arriving at a system at a constant rate
  - time to failure of a piece of equipment
- parameters
  - scale parameter β > 0
18 gamma(k, β)
- application
  - time to complete some task (customer service, machine repair)
- parameters
  - shape parameter k > 0
  - scale parameter β > 0
19 weibull(k, β)
- application
  - time to complete some task, time to failure of a piece of equipment
  - used as a rough model in the absence of data
- parameters
  - shape parameter k > 0, scale parameter β > 0
20 normal N(μ, σ²)
- application
  - errors of various types
  - quantities that are the sum of a large number of other quantities
- parameters
  - location parameter -∞ < μ < ∞, scale parameter σ > 0
21 triangular(a, b, m)
- application
  - used as a rough model in the absence of data
- a, b, m are real numbers (a < m < b)
  - location parameter a
  - scale parameter b - a
  - shape parameter m
22 poisson(λ)
- application
  - number of events that occur in an interval of time when events are occurring at a constant rate λ
  - number of items demanded from inventory
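All of the families listed above are available in NumPy; a minimal sampling sketch (NumPy's parameter names differ slightly from the slides' symbols, as noted in the comments):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

u   = rng.uniform(low=2.0, high=5.0, size=n)                 # U(a, b)
ex  = rng.exponential(scale=3.0, size=n)                     # exp(beta): scale = beta
gam = rng.gamma(shape=2.0, scale=3.0, size=n)                # gamma(k, beta)
wei = 3.0 * rng.weibull(2.0, size=n)                         # weibull(k, beta): NumPy draws at scale 1
nor = rng.normal(loc=10.0, scale=2.0, size=n)                # N(mu, sigma^2): scale = sigma
tri = rng.triangular(left=1.0, mode=2.0, right=6.0, size=n)  # triangular(a, m, b)
poi = rng.poisson(lam=4.0, size=n)                           # poisson(lambda)
```

Sampled values such as these are what the simulation consumes once a distribution has been specified.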
23 Specifying Distribution
- useful distributions
- use values directly
- define empirical distribution
- fit theoretical distribution
24 Empirical Distributions
- use the observed data themselves to specify the distribution directly
- generate random variates from this empirical distribution (if no theoretical distribution can be fitted)
- define a continuous, piecewise-linear distribution function F
  - sort the Xj's into increasing order
  - X(i) denotes the ith smallest of the Xj's
25 Empirical Distribution (example)
- observations: X1 = 3, X2 = 8, X3 = 18, X4 = 10, X5 = 13, X6 = 6
- sorted observations: X(1) = 3, X(2) = 6, X(3) = 8, X(4) = 10, X(5) = 13, X(6) = 18
- distribution at the sorted points
  - F(X(i)) = (i-1)/(n-1)
  - F(X(1)) = F(3) = 0/5 = 0
  - F(X(2)) = F(6) = 1/5
  - F(X(3)) = F(8) = 2/5
  - etc.
- linear interpolation for X(i) ≤ X ≤ X(i+1)
  - F(X) = (i-1)/(n-1) + (X - X(i)) / ((n-1)(X(i+1) - X(i)))
- F(12) = ?
  - 12 lies in the interval X(4) ≤ 12 < X(5) (n = 6, i = 4)
  - F(12) = 3/5 + 2/(5·3) = 11/15 ≈ 0.73
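The piecewise-linear construction above translates directly into code; a minimal sketch implementing F(X) = (i-1)/(n-1) + (X - X(i))/((n-1)(X(i+1) - X(i))) and evaluating it on the example data:

```python
def empirical_cdf(data):
    """Return the continuous piecewise-linear empirical CDF of the data."""
    xs = sorted(data)
    n = len(xs)

    def F(x):
        if x <= xs[0]:
            return 0.0
        if x >= xs[-1]:
            return 1.0
        # find i such that X(i) <= x < X(i+1)  (1-based i)
        i = max(j + 1 for j in range(n) if xs[j] <= x)
        return (i - 1) / (n - 1) + (x - xs[i - 1]) / ((n - 1) * (xs[i] - xs[i - 1]))

    return F

F = empirical_cdf([3, 8, 18, 10, 13, 6])
# F(12) = 3/5 + 2/(5*3) = 11/15
```

Sampling from this distribution then amounts to inverting F at a U(0,1) variate.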
26 Empirical Distribution (example)
27 Specifying Distribution
- useful distributions
- use values directly
- define empirical distribution
- fit theoretical distribution
28 Necessary Steps for fitting a theoretical distribution
- hypothesize family
  - summary statistics
  - histogram
  - quantile summaries / box plots
- estimate parameters
- how representative is the fitted distribution?
  - chi-square goodness-of-fit test
  - Kolmogorov-Smirnov test
29 Hypothesizing families of distributions
- first step in selecting a particular input distribution
- decide upon the general family that appears to be appropriate
- prior knowledge might be helpful
  - service times should never be generated from a normal distribution (why? a normal random variable can take negative values, but a service time cannot)
- approaches
  - summary statistics
  - histograms
  - quantile summaries and box plots
30 Summary Statistics
- some distributions are characterized, at least partially, by functions of their true parameters
- sample estimates
  - estimate of the range
    - minimum X(1)
    - maximum X(n)
  - measures of central tendency
    - mean μ
    - median x0.5
31 Summary Statistics (cont.)
- sample estimates
  - measures of variability
    - variance σ²
    - coefficient of variation cv
  - measure of symmetry
    - skewness ν
32 Histograms
- graphical estimate of the density function corresponding to the distribution of the data
- density functions tend to have recognizable shapes in many cases
- a graphical estimate of the density should therefore provide a good clue to the distributions that might be tried as a model for the data
33 Histograms
- how to
  - break up the range of values into k disjoint adjacent intervals of the same width Δb = bj - bj-1: [b0, b1), [b1, b2), …, [bk-1, bk)
  - you might want to throw out a few extremely large or small Xi's to avoid an unwieldy-looking histogram
  - let hj be the proportion of the Xi's that fall in the jth interval [bj-1, bj)
  - hint: try several values of Δb and choose the smallest one that gives a reasonably smooth histogram
34 Histogram (example)
- create 1000 random variates N(0,1)
- create a histogram
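The two steps above can be sketched with NumPy alone (no plotting library is assumed); the proportions hj are recovered from the bin counts:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.standard_normal(1000)      # 1000 N(0,1) variates

# equal-width intervals [b0,b1), ..., [b_{k-1},b_k); h_j = proportion per interval
counts, edges = np.histogram(x, bins=20)
h = counts / counts.sum()
# the proportions h_j, drawn over the bin edges, approximate the N(0,1) density shape
```

Passing `x` to a plotting routine such as `matplotlib.pyplot.hist` would display the familiar bell shape.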
35 Quantile Summaries
- useful for determining whether the underlying probability density function is skewed to the right or to the left
- if F(x) is the distribution function of a continuous random variable
  - the q-quantile of F(x) is the number xq such that F(xq) = q
  - median x0.5
  - lower/upper quartiles x0.25 / x0.75
  - lower/upper octiles x0.125 / x0.875
36 Quantile Summaries
- Quantile   Depth            Sample Values    Midpoint
  Median     i = (n+1)/2      X(i)             X(i)
  Quartiles  j = (⌊i⌋+1)/2    X(j), X(n-j+1)   (X(j) + X(n-j+1))/2
  Octiles    k = (⌊j⌋+1)/2    X(k), X(n-k+1)   (X(k) + X(n-k+1))/2
  Extremes   1                X(1), X(n)       (X(1) + X(n))/2
- if the underlying distribution of the Xi's is symmetric, then the midpoints should be approximately equal
- if the underlying distribution is skewed to the right (left), then the midpoints should be increasing (decreasing)
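The table above can be sketched as code; a minimal sketch (assuming, as is conventional, that a depth ending in .5 averages the two adjacent order statistics), reusing the six observations from the empirical-distribution example:

```python
import math

def order_stat(xs, depth):
    """X_(depth) from sorted data; a half-integer depth averages two neighbors."""
    if depth == int(depth):
        return xs[int(depth) - 1]
    lo = xs[int(math.floor(depth)) - 1]
    return (lo + xs[int(math.floor(depth))]) / 2

def quantile_summary(data):
    xs = sorted(data)
    n = len(xs)
    i = (n + 1) / 2                  # median depth
    j = (math.floor(i) + 1) / 2      # quartile depth
    k = (math.floor(j) + 1) / 2      # octile depth
    midpoints = {}
    for name, d in [("median", i), ("quartiles", j), ("octiles", k), ("extremes", 1)]:
        midpoints[name] = (order_stat(xs, d) + order_stat(xs, n + 1 - d)) / 2
    return midpoints

m = quantile_summary([3, 8, 18, 10, 13, 6])
# midpoints 9.0, 9.5, 10.0, 10.5 increase, hinting at right skew
```

Increasing midpoints, as here, suggest a distribution skewed to the right, consistent with the rule stated above.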
37 Box Plots (example)
- graphical representation of the quantile summary
- fifty percent of the observations fall within the horizontal boundaries of the box [x0.25, x0.75]
38 Necessary Steps for fitting a theoretical distribution
- hypothesize family
  - summary statistics
  - histogram
  - quantile summaries / box plots
- estimate parameters
- how representative is the fitted distribution?
  - chi-square goodness-of-fit test
  - Kolmogorov-Smirnov test
39 Estimation of Parameters
- after one or more candidate families of distributions have been hypothesized, we must somehow specify the values of their parameters in order to have completely specified distributions for possible use in the simulation
- maximum-likelihood estimators (MLEs)
  - estimator: a numerical function of the data
  - unknown parameter θ
  - hypothesized density function fθ(x)
  - likelihood function L(θ) = fθ(X1) fθ(X2) ⋯ fθ(Xn)
  - the MLE is the value θ̂ that maximizes L(θ) over all permissible values of θ
40 Estimation of Parameters (example)
- exponential distribution with unknown scale parameter β (θ = β)
  - f(x) = (1/β) e^(-x/β) for x ≥ 0
- likelihood function L(β) = β^(-n) exp(-(X1 + X2 + … + Xn)/β)
- we seek the value of β that maximizes L(β) over all β > 0
- it is easier to work with its logarithm l(β) = ln L(β) = -n ln β - (1/β) Σ Xi
  - (maximize l(β) instead of L(β))
- to maximize: set the derivative dl/dβ = -n/β + (1/β²) Σ Xi equal to zero and solve for β
  - β̂ = (1/n) Σ Xi = X̄(n), the sample mean
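The closed-form result β̂ = X̄(n) can be checked numerically; a minimal sketch that computes the MLE from simulated data and verifies it is a local maximum of the log-likelihood:

```python
import math
import numpy as np

rng = np.random.default_rng(3)
x = rng.exponential(scale=2.5, size=5000)

# closed-form MLE for the exponential scale parameter: the sample mean
beta_hat = x.mean()

def log_likelihood(beta):
    # l(beta) = -n ln(beta) - (1/beta) * sum(x)
    return -len(x) * math.log(beta) - x.sum() / beta

# sanity check: l(beta) is no larger just to either side of beta_hat
eps = 1e-4
assert log_likelihood(beta_hat) >= log_likelihood(beta_hat - eps)
assert log_likelihood(beta_hat) >= log_likelihood(beta_hat + eps)
```

With 5000 observations the estimate lands close to the true scale 2.5 used to generate the data.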
41 Necessary Steps for fitting a theoretical distribution
- hypothesize family
  - summary statistics
  - histogram
  - quantile summaries / box plots
- estimate parameters
- how representative is the fitted distribution?
  - chi-square goodness-of-fit test
  - Kolmogorov-Smirnov test
42 Goodness-of-Fit Tests
- statistical hypothesis tests
- used to formally assess whether the observations X1, X2, …, Xn are independent samples from a particular distribution with distribution function F̂
- H0: the Xi's are IID random variables with distribution function F̂
- be careful: failure to reject H0 should not be interpreted as accepting H0 as being true
- we'll concentrate on two different tests
  - chi-square test
  - Kolmogorov-Smirnov test
43 Chi-Square Goodness-of-Fit Test
- a more formal comparison of a histogram with the fitted density or mass function
- how to
  - divide the range into k adjacent intervals [a0, a1), [a1, a2), …, [ak-1, ak)
  - how to choose the number and size of the intervals? → make them equiprobable
  - determine Nj, the number of Xi's in the jth interval [aj-1, aj)
  - compute pj, the expected proportion of the Xi's that would fall in the jth interval if we were sampling from the fitted distribution
  - determine the test statistic χ² = Σj (Nj - n·pj)² / (n·pj) and reject H0 if it is too large
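The steps above can be sketched for a fitted exponential distribution with equiprobable intervals; a minimal sketch (the scale 2.0, k = 10, and the use of the exponential quantile function F⁻¹(q) = -β ln(1-q) are illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(5)
x = rng.exponential(scale=2.0, size=1000)
n, k = len(x), 10

beta_hat = x.mean()                     # fitted exponential scale (MLE)

# equiprobable intervals: endpoints are quantiles of the fitted exp(beta_hat),
# using the inverse CDF F^-1(q) = -beta * ln(1 - q)
q = np.arange(1, k) / k
edges = np.concatenate(([0.0], -beta_hat * np.log(1 - q), [np.inf]))

N = np.histogram(x, bins=edges)[0]      # observed counts N_j
p = np.full(k, 1 / k)                   # expected proportions p_j (equiprobable)

chi2 = np.sum((N - n * p) ** 2 / (n * p))
# compare chi2 against a chi-square critical point (next slides) to decide on H0
```

With equiprobable intervals every expected count equals n/k, which is what makes this interval choice attractive in practice.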
44 Chi-Square Goodness-of-Fit Test (cont.)
- case 1: all parameters of the fitted distribution are known
  - if H0 is true, χ² converges in distribution (as n → ∞) to a chi-square distribution with k-1 degrees of freedom
  - for large n, a test with approximate level α is obtained by rejecting H0 if χ² > χ²(k-1, 1-α)
  - χ²(k-1, 1-α) is the upper 1-α critical point of a chi-square distribution with k-1 degrees of freedom
45 Chi-Square Goodness-of-Fit Test (cont.)
- case 2: m parameters had to be estimated to specify the fitted distribution
  - if H0 is true, then as n → ∞ the distribution function of χ² converges to a distribution function that lies between the chi-square distribution functions with k-1 and k-m-1 degrees of freedom
  - the upper 1-α critical point of the asymptotic distribution of χ² is therefore in general not known exactly
  - reject H0 if χ² > χ²(k-1, 1-α)
  - do not reject H0 if χ² < χ²(k-m-1, 1-α)
  - ambiguous situation if χ²(k-m-1, 1-α) ≤ χ² ≤ χ²(k-1, 1-α)
  - recommendation: reject H0 if χ² > χ²(k-m-1, 1-α) (conservative: errs toward rejecting a questionable fit)
46 Kolmogorov-Smirnov Goodness-of-Fit Test
- compares an empirical distribution function with the distribution function of the hypothesized distribution
- not necessary to group the data
- valid for any sample size n
- tends to be more powerful than chi-square tests
- but in its original form it is only valid if all parameters of the hypothesized distribution are known and the distribution is continuous
47 Kolmogorov-Smirnov Goodness-of-Fit Test (cont.)
- compute the test statistic
  - define the empirical distribution function Fn(x) = (number of Xi's ≤ x) / n
  - the test statistic Dn = sup_x |Fn(x) - F̂(x)| corresponds to the largest (vertical) distance between Fn(x) and the hypothesized distribution function F̂
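Because Fn(x) is a step function, the supremum is attained at one of the data points; a minimal sketch that evaluates both sides of each jump:

```python
def ks_statistic(data, F):
    """D_n = sup_x |F_n(x) - F(x)| for a continuous hypothesized CDF F."""
    xs = sorted(data)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        # the empirical CDF jumps at each X_(i): check both sides of the jump
        d = max(d, i / n - F(x), F(x) - (i - 1) / n)
    return d

# hypothesized U(0,1); for this sample D_n = 1 - F(0.3) = 0.7
d = ks_statistic([0.1, 0.2, 0.3], lambda x: x)
```

Checking i/n - F(X(i)) and F(X(i)) - (i-1)/n at every order statistic is what makes this computation exact rather than a grid approximation.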
48 Kolmogorov-Smirnov Goodness-of-Fit Test (cont.)
- case 1: all parameters of the hypothesized distribution function F̂ are known
  - the distribution of Dn does not depend on F̂ (if F̂ is continuous)
  - reject H0 if (√n + 0.12 + 0.11/√n) Dn > c(1-α)
  - the critical value c(1-α) (does not depend on n) is given in the following table

    1-α      0.85   0.90   0.95   0.975  0.99
    c(1-α)   1.138  1.224  1.358  1.480  1.628
49 Kolmogorov-Smirnov Goodness-of-Fit Test (cont.)
- case 2
  - the hypothesized distribution is N(μ, σ²) with both μ and σ² unknown (estimated by X̄(n) and S²(n)); estimated distribution function F̂ = N(X̄(n), S²(n))
  - Dn is calculated the same way as in case 1, but different critical points apply
  - reject H0 if (√n - 0.01 + 0.85/√n) Dn > c(1-α)
  - the critical value c(1-α) (does not depend on n) is given in the following table

    1-α      0.85   0.90   0.95   0.975  0.99
    c(1-α)   0.775  0.819  0.895  0.955  1.035
50 Kolmogorov-Smirnov Goodness-of-Fit Test (cont.)
- case 3
  - the hypothesized distribution is exponential (exp(β)) with β unknown (estimated by X̄(n))
  - estimated distribution function F̂(x) = 1 - e^(-x/X̄(n))
  - reject H0 if (Dn - 0.2/n)(√n + 0.26 + 0.5/√n) > c(1-α)
  - the critical value c(1-α) (does not depend on n) is given in the following table

    1-α      0.85   0.90   0.95   0.975  0.99
    c(1-α)   0.926  0.990  1.094  1.190  1.308
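The case-3 procedure can be sketched end to end; a minimal sketch (the sample size 200 and true scale 1.7 are illustrative) that fits β by MLE, computes Dn against the fitted CDF, and forms the adjusted statistic to compare with the table:

```python
import math
import numpy as np

rng = np.random.default_rng(9)
x = rng.exponential(scale=1.7, size=200)
n = len(x)

beta_hat = x.mean()                     # MLE of the exponential scale parameter

# D_n against the fitted CDF F(t) = 1 - exp(-t / beta_hat)
xs = np.sort(x)
i = np.arange(1, n + 1)
Fx = 1.0 - np.exp(-xs / beta_hat)
Dn = max(np.max(i / n - Fx), np.max(Fx - (i - 1) / n))

# adjusted statistic for the exponential case with estimated beta;
# reject H0 at level alpha if it exceeds c(1-alpha) from the table above
adj = (Dn - 0.2 / n) * (math.sqrt(n) + 0.26 + 0.5 / math.sqrt(n))
```

Since the data really are exponential here, `adj` will typically fall below the tabulated critical values, so H0 would not be rejected.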