1
Selecting Input Probability Distribution
2
Introduction
  • need to specify probability distributions of
    random inputs
  • processing times at a specific machine
  • interarrival times of customers/pieces
  • demand size
  • evaluate data sets (if available)
  • failure to choose the correct distribution can
    affect the accuracy of the model's results!

3
Assessing Sample Independence
  • correlation plot
  • scatter diagram

4
Assessing Sample Independence
  • important assumption
  • observations are supposed to be independent
  • graphical techniques for informally assessing
    whether data are independent
  • correlation plot
  • scatter diagram

5
correlation plot
  • graph of the sample correlations ρ̂j
  • ρ̂j is an estimate of the true correlation ρj between
    two observations that are j observations apart in time
  • if the observations X1, X2, ..., Xn are independent,
    then ρj = 0 for j = 1, 2, ..., n-1
  • the estimates ρ̂j won't be exactly zero even if the Xi's
    are independent, since each ρ̂j is itself an observation
    of a random variable
  • if the estimates differ from 0 by a significant amount,
    this is strong evidence that the Xi's are not independent
    (see the code sketch below)
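A minimal sketch (Python with NumPy/Matplotlib, not from the slides; the data array x is hypothetical) of how a correlation plot can be computed and drawn:

    import numpy as np
    import matplotlib.pyplot as plt

    def sample_autocorrelation(x, max_lag):
        """Estimate rho_j for j = 1..max_lag from observations x."""
        x = np.asarray(x, dtype=float)
        n = len(x)
        xbar = x.mean()
        denom = np.sum((x - xbar) ** 2)
        rho = []
        for j in range(1, max_lag + 1):
            num = np.sum((x[:n - j] - xbar) * (x[j:] - xbar))
            rho.append(num / denom)
        return np.array(rho)

    # example: 500 independent exponential observations -> rho_j should be near 0
    rng = np.random.default_rng(1)
    x = rng.exponential(scale=2.0, size=500)
    rho = sample_autocorrelation(x, max_lag=20)
    plt.bar(range(1, 21), rho)
    plt.xlabel("lag j"); plt.ylabel("estimated correlation")
    plt.show()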

6
correlation plot (example)
7
correlation plot (example)
8
scatter diagram
  • plot of the pairs (Xi, Xi+1) for i = 1, 2, ..., n-1
  • if the Xi's are independent, one would expect the
    points (Xi, Xi+1) to be scattered randomly
    throughout the first quadrant of the plane
  • the nature of the scattering depends on the underlying
    distribution of the Xi's
  • if the Xi's are positively (negatively) correlated,
    the points will tend to lie along a line with
    positive (negative) slope
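A corresponding sketch for the scatter diagram (again with a hypothetical sample x):

    import numpy as np
    import matplotlib.pyplot as plt

    rng = np.random.default_rng(2)
    x = rng.exponential(scale=2.0, size=500)   # hypothetical sample

    # plot (X_i, X_{i+1}); independent data should show no linear pattern
    plt.scatter(x[:-1], x[1:], s=10)
    plt.xlabel("X_i"); plt.ylabel("X_{i+1}")
    plt.show()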

9
scatter diagram (example)
10
scatter diagram (example 2)
11
Specifying Distribution
  • useful distributions
  • use values directly
  • define empirical distribution
  • fit theoretical distribution

12
useful probability distributions
  • parameters of continuous distributions
  • location parameter
  • specifies the position of the distribution on the x-axis
  • usually the midpoint (e.g., the mean μ of a normal
    distribution) or the lower endpoint
  • also called shift parameter
  • a change in the location parameter shifts the
    distribution left or right without changing it otherwise
  • scale parameter
  • determines the scale (unit) of measurement
  • e.g., the standard deviation σ of a normal distribution
  • a change in the scale parameter compresses or expands the
    distribution without altering its basic form

13
useful probability distributions
  • parameters of continuous distributions
  • shape parameter
  • determines the basic form or shape of a distribution
    within the general family of distributions of interest
  • a change in the shape parameter generally alters a
    distribution's properties (e.g., skewness) more
    fundamentally than a change in location or scale
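A small illustration (a sketch using scipy.stats, not part of the slides; the parameter values are arbitrary) of how location, scale, and shape enter a distribution object:

    from scipy import stats

    # gamma family: 'a' is the shape parameter, 'loc' the location, 'scale' the scale
    base      = stats.gamma(a=2.0, loc=0.0, scale=1.0)
    shifted   = stats.gamma(a=2.0, loc=5.0, scale=1.0)   # moved right, same form
    stretched = stats.gamma(a=2.0, loc=0.0, scale=3.0)   # same form, wider spread
    reshaped  = stats.gamma(a=0.5, loc=0.0, scale=1.0)   # fundamentally different shape

    print(base.mean(), shifted.mean(), stretched.mean(), reshaped.mean())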

14
Approaches to specify distribution
  • if data collection on an input random variable is
    possible
  • use the data values directly in the simulation
    (trace driven)
  • only reproduces what has happened
  • there is seldom enough data to make all desired
    simulation runs
  • useful for model validation
  • define an empirical distribution
  • (for continuous data) any value between the observed
    minimum and maximum can be generated
  • no values outside that range can be generated
  • may have irregularities
  • fit a theoretical distribution
  • preferred method
  • easy to change

15
Specifying Distribution
  • useful distributions
  • use values directly
  • define empirical distribution
  • fit theoretical distribution

16
Uniform U(a,b)
  • application
  • used as a first model for a quantity that is
    felt to be randomly varying between a and b about
    which little else is known

17
exponential distribution exp(β)
  • application
  • interarrival times of entities to a system that
    occur at a constant rate
  • time to failure of a piece of equipment
  • parameters
  • scale parameter β > 0

18
gamma(k, θ)
  • application
  • time to complete some task (customer service,
    machine repair)
  • parameters
  • shape parameter k > 0
  • scale parameter θ > 0

19
weibull(k, λ)
  • application
  • time to complete some task, time to failure of a
    piece of equipment
  • used as a rough model in the absence of data
  • parameters
  • shape parameter k > 0, scale parameter λ > 0

20
normal N(μ, σ²)
  • application
  • errors of various types
  • quantities that are the sum of a large number of
    other quantities (central limit theorem)
  • parameters
  • location parameter -∞ < μ < ∞, scale parameter σ > 0

21
triangular (a,b,m)
  • application
  • used as a rough model in the absence of data
  • a, b, m are real numbers with a < m < b
  • location parameter a
  • scale parameter b - a
  • shape parameter m

22
poisson(λ)
  • application
  • number of events that occur in an interval of
    time when the events occur at a constant rate
  • number of items demanded from inventory
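A hedged sketch of how variates from these families could be generated with NumPy (the parameter values below are illustrative only, not taken from the slides):

    import numpy as np

    rng = np.random.default_rng(3)

    u   = rng.uniform(low=2.0, high=5.0, size=1000)                  # U(a, b)
    e   = rng.exponential(scale=1.5, size=1000)                      # exp(beta), mean beta
    g   = rng.gamma(shape=2.0, scale=0.75, size=1000)                # gamma(k, theta)
    w   = rng.weibull(2.0, size=1000) * 1.5                          # Weibull, shape k=2, scale 1.5
    nrm = rng.normal(loc=10.0, scale=2.0, size=1000)                 # N(mu, sigma^2), sigma=2
    tri = rng.triangular(left=1.0, mode=3.0, right=7.0, size=1000)   # triangular(a, m, b)
    poi = rng.poisson(lam=4.0, size=1000)                            # Poisson(lambda)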

23
Specifying Distribution
  • useful distributions
  • use values directly
  • define empirical distribution
  • fit theoretical distribution

24
Empirical Distributions
  • use the observed data themselves to specify the
    distribution directly
  • generate random variates from this empirical
    distribution
  • (if no theoretical distribution can be fitted)
  • define a continuous piecewise-linear distribution
    function
  • sort the Xj's into increasing order
  • X(i) denotes the ith smallest of the Xj's

25
Empirical Distribution (example)
  • observations: X1 = 3, X2 = 8, X3 = 18,
    X4 = 10, X5 = 13, X6 = 6
  • sorted observations: X(1) = 3, X(2) = 6, X(3) = 8,
    X(4) = 10, X(5) = 13, X(6) = 18
  • distribution
  • F(X(i)) = (i-1)/(n-1)
  • F(X(1)) = F(3) = 0/5 = 0
  • F(X(2)) = F(6) = 1/5
  • F(X(3)) = F(8) = 2/5
  • etc.
  • for X(i) <= X < X(i+1):
  • F(X) = (i-1)/(n-1) + (X - X(i)) / ((n-1)(X(i+1) - X(i)))
  • F(12) = ?
  • 12 lies in the interval X(4) <= 12 < X(5)
  • (n = 6, i = 4)
  • F(12) = 3/5 + 2/(5·3) = 11/15 ≈ 0.73
    (see the code sketch below)
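A minimal sketch of this piecewise-linear empirical distribution function in Python (the helper name empirical_cdf is illustrative):

    import numpy as np

    def empirical_cdf(x, data):
        """Continuous piecewise-linear empirical CDF: F(X(i)) = (i-1)/(n-1)."""
        xs = np.sort(np.asarray(data, dtype=float))
        n = len(xs)
        # np.interp interpolates linearly between the knots (X(i), (i-1)/(n-1))
        return np.interp(x, xs, np.arange(n) / (n - 1))

    data = [3, 8, 18, 10, 13, 6]
    print(empirical_cdf(12, data))   # 0.7333..., matching the example above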

26
Empirical Distribution (example)
27
Specifying Distribution
  • useful distributions
  • use values directly
  • define empirical distribution
  • fit theoretical distribution

28
Necessary Steps for Fitting a Theoretical Distribution
  • hypothesize a family
  • summary statistics
  • histogram
  • quantile summaries / box plots
  • estimate parameters
  • how representative is the fitted distribution?
  • chi-square goodness-of-fit test
  • Kolmogorov-Smirnov test

29
Hypothesizing families of distributions
  • first step in selecting a particular input
    distribution
  • decide which general family appears to be
    appropriate
  • prior knowledge might be helpful
  • service times should never be generated from a
    normal distribution. Why? (a normal random variable
    can take negative values)
  • approaches
  • summary statistics
  • histograms
  • quantile summaries and box plots

30
Summary Statistics
  • some distributions are characterized, at least
    partially, by functions of their true parameters
  • sample estimates
  • estimate for the range
  • minimum X(1)
  • maximum X(n)
  • measures of central tendency
  • mean μ
  • median x0.5

31
Summary Statistics (cont.)
  • sample estimates
  • measures of variability
  • variance σ²
  • coefficient of variation cv
  • measure of symmetry
  • skewness ν
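A sketch of the corresponding sample estimates with NumPy/SciPy (the data array is hypothetical):

    import numpy as np
    from scipy import stats

    x = np.random.default_rng(4).gamma(shape=2.0, scale=1.0, size=200)  # hypothetical data

    summary = {
        "min":      x.min(),
        "max":      x.max(),
        "mean":     x.mean(),
        "median":   np.median(x),
        "variance": x.var(ddof=1),              # unbiased sample variance
        "cv":       x.std(ddof=1) / x.mean(),   # coefficient of variation
        "skewness": stats.skew(x),
    }
    for name, value in summary.items():
        print(f"{name:9s} {value:8.4f}")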

32
Histograms
  • graphical estimate of the density function
    corresponding to the distribution of the data
  • density functions tend to have recognizable
    shapes in many cases
  • a graphical estimate of the density should therefore
    provide a good clue to the distributions that might be
    tried as a model for the data

33
Histograms
  • how to
  • break up the range of values into k disjoint adjacent
    intervals of equal width:
  • [b0, b1), [b1, b2), ..., [bk-1, bk), with width
    Δb = bj - bj-1
  • you might want to throw out a few extremely large
    or small Xi's to avoid getting an
    unwieldy-looking histogram plot
  • let hj be the proportion of the Xi's that fall in the
    jth interval [bj-1, bj)
  • hint: try several values of Δb and choose the
    smallest one that gives a smooth histogram

34
Histogram (example)
  • create 1000 random variates from N(0,1)
  • create a histogram (see the sketch below)
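A minimal version of that histogram experiment:

    import numpy as np
    import matplotlib.pyplot as plt

    rng = np.random.default_rng(5)
    x = rng.normal(loc=0.0, scale=1.0, size=1000)   # 1000 N(0,1) variates

    # try several interval widths; density=True rescales bar heights to estimate f(x)
    plt.hist(x, bins=25, density=True, edgecolor="black")
    plt.xlabel("x"); plt.ylabel("estimated density")
    plt.show()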

35
Quantile Summaries
  • useful for determining whether the underlying
    probability density function is skewed to the
    right or to the left
  • if F(x) is the distribution function of a
    continuous random variable
  • the q-quantile of F(x) is the number xq such that
    F(xq) = q
  • median x0.5
  • lower/upper quartiles x0.25 / x0.75
  • lower/upper octiles x0.125 / x0.875

36
Quantile Summaries
  • quantile summary table:

      Quantile    Depth                Sample value(s)    Midpoint
      Median      i = (n+1)/2          X(i)               X(i)
      Quartiles   j = (floor(i)+1)/2   X(j), X(n-j+1)     (X(j) + X(n-j+1))/2
      Octiles     k = (floor(j)+1)/2   X(k), X(n-k+1)     (X(k) + X(n-k+1))/2
      Extremes    1                    X(1), X(n)         (X(1) + X(n))/2

  • if the underlying distribution of the Xi's is
    symmetric, then the midpoints should be
    approximately equal
  • if the underlying distribution is skewed to the
    right (left), then the midpoints should be
    increasing (decreasing)
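A sketch of how these midpoints could be computed (the helper name quantile_summary is illustrative; non-integer depths are handled by averaging the two adjacent order statistics, a common convention and an assumption here):

    import numpy as np

    def order_stat(xs, depth):
        """X(depth) for sorted xs; average neighbours if depth is not an integer."""
        lo = int(np.floor(depth)) - 1          # convert 1-based depth to 0-based index
        hi = int(np.ceil(depth)) - 1
        return 0.5 * (xs[lo] + xs[hi])

    def quantile_summary(data):
        xs = np.sort(np.asarray(data, dtype=float))
        n = len(xs)
        i = (n + 1) / 2
        j = (np.floor(i) + 1) / 2
        k = (np.floor(j) + 1) / 2
        for name, d in [("median", i), ("quartiles", j), ("octiles", k), ("extremes", 1)]:
            mid = 0.5 * (order_stat(xs, d) + order_stat(xs, n - d + 1))
            print(f"{name:10s} depth={d:5.1f}  midpoint={mid:8.3f}")

    # right-skewed data -> midpoints should increase from median to extremes
    quantile_summary(np.random.default_rng(6).exponential(scale=2.0, size=99))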

37
Box Plots (example)
  • graphical representation of the quantile summary
  • fifty percent of the observations fall within the
    horizontal boundaries of the box [x0.25, x0.75]

38
Necessary Steps for Fitting a Theoretical Distribution
  • hypothesize a family
  • summary statistics
  • histogram
  • quantile summaries / box plots
  • estimate parameters
  • how representative is the fitted distribution?
  • chi-square goodness-of-fit test
  • Kolmogorov-Smirnov test

39
Estimation of Parameters
  • after one or more candidate families of
    distributions have been hypothesized, we must
    somehow specify the values of their parameters in
    order to have completely specified distributions
    for possible use in the simulation
  • maximum likelihood estimators (MLEs)
  • estimator: a numerical function of the data
  • unknown parameter θ
  • hypothesized density function fθ(x)
  • likelihood function L(θ) = fθ(X1) · fθ(X2) · ... · fθ(Xn)
  • the MLE is the value of θ that maximizes L(θ) over
    all permissible values of θ

40
Estimation for Parameters (example)
  • exponential distribution with unknown scale
    parameter β (θ = β)
  • f(x) = (1/β) e^(-x/β) for x >= 0
  • likelihood function L(β) = β^(-n) e^(-(X1 + ... + Xn)/β)
  • we seek the value of β that maximizes L(β) over all
    β > 0
  • easier to work with its logarithm
    l(β) = ln L(β) = -n ln β - (1/β) Σ Xi
  • (maximize l(β) instead of L(β))
  • to maximize: set the derivative equal to zero and solve
    for β, giving β̂ = (1/n) Σ Xi = X̄(n) (see the sketch below)
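A quick numerical check of this MLE (a sketch; scipy.optimize.minimize_scalar is used only to confirm the closed-form result, it is not part of the slides):

    import numpy as np
    from scipy.optimize import minimize_scalar

    rng = np.random.default_rng(7)
    x = rng.exponential(scale=2.5, size=1000)      # data with true beta = 2.5

    # closed-form MLE: beta_hat = sample mean
    beta_closed = x.mean()

    # numerical maximization of l(beta) = -n ln(beta) - sum(x)/beta
    neg_log_lik = lambda beta: len(x) * np.log(beta) + x.sum() / beta
    beta_numeric = minimize_scalar(neg_log_lik, bounds=(1e-6, 100.0), method="bounded").x

    print(beta_closed, beta_numeric)               # both approximately 2.5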

41
Necessary Steps for Fitting a Theoretical Distribution
  • hypothesize a family
  • summary statistics
  • histogram
  • quantile summaries / box plots
  • estimate parameters
  • how representative is the fitted distribution?
  • chi-square goodness-of-fit test
  • Kolmogorov-Smirnov test

42
Goodness-of-Fit Tests
  • statistical hypothesis tests
  • used to assess formally whether the observations
    X1, X2, ..., Xn are independent samples from a
    particular distribution with distribution
    function F̂
  • H0: the Xi's are IID random variables with
    distribution function F̂
  • be careful: failure to reject H0 should not be
    interpreted as accepting H0 as being true
  • we'll concentrate on two different tests
  • chi-square test
  • Kolmogorov-Smirnov test

43
Chi-Square Goodness-of-Fit Test
  • a more formal comparison of a histogram with the
    fitted density or mass function
  • how to
  • divide the range into k adjacent intervals [a0, a1),
    [a1, a2), ..., [ak-1, ak)
  • how to choose the number and size of the intervals?
    → use equiprobable intervals if possible
  • determine Nj, the number of Xi's in the jth interval
    [aj-1, aj)
  • compute pj, the expected proportion of the Xi's that
    would fall in the jth interval if we were
    sampling from the fitted distribution
  • determine the test statistic
    χ² = Σ (Nj - n·pj)² / (n·pj), summed over j = 1, ..., k,
    and reject H0 if it is too large
    (see the code sketch below)
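A sketch of the equiprobable-interval chi-square computation for a fitted exponential distribution (data, interval count, and parameter values are illustrative):

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(8)
    x = rng.exponential(scale=2.0, size=500)       # hypothetical observations
    n, k = len(x), 20

    beta_hat = x.mean()                            # fitted exp(beta) via MLE
    fitted = stats.expon(scale=beta_hat)

    # equiprobable intervals: boundaries at fitted quantiles, so p_j = 1/k for all j
    interior = fitted.ppf(np.arange(1, k) / k)     # k-1 interior boundaries
    N = np.bincount(np.searchsorted(interior, x), minlength=k)
    p = np.full(k, 1.0 / k)

    chi2_stat = np.sum((N - n * p) ** 2 / (n * p))
    # since beta was estimated (m = 1), the appropriate critical point lies
    # between the chi-square critical points with k-2 and k-1 degrees of freedom
    print(chi2_stat, stats.chi2.ppf(0.95, df=k - 1), stats.chi2.ppf(0.95, df=k - 2))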

44
Chi-Square Goodness-of-Fit Test (cont.)
  • case 1: all parameters of the fitted distribution
    are known
  • if H0 is true, χ² converges in distribution (as
    n → ∞) to a chi-square distribution with k-1
    degrees of freedom
  • for large n, a test with approximate level α is
    obtained by rejecting H0 if χ² > χ²(k-1, 1-α)
  • χ²(k-1, 1-α) is the upper 1-α critical point of a
    chi-square distribution with k-1 degrees of freedom

45
Chi-Square Goodness-of-Fit Test (cont.)
  • case 2: m parameters had to be estimated to
    specify the fitted distribution
  • if H0 is true, then as n → ∞ the distribution
    function of χ² converges to a distribution
    function that lies between the chi-square
    distribution functions with k-1 and k-m-1 degrees
    of freedom
  • hence the upper 1-α critical point of the asymptotic
    distribution of χ² lies between χ²(k-m-1, 1-α) and
    χ²(k-1, 1-α) (and is in general not known)
  • reject H0 if χ² > χ²(k-1, 1-α)
  • do not reject H0 if χ² <= χ²(k-m-1, 1-α)
  • ambiguous situation if
    χ²(k-m-1, 1-α) < χ² <= χ²(k-1, 1-α)
  • recommendation: reject H0 if ... (conservative)

46
Kolmogorov-Smirnov Goodness-of-Fit Test
  • compares an empirical distribution function with
    the distribution function of the hypothesized
    distribution
  • not necessary to group the data
  • valid for any sample size n
  • tends to be more powerful than the chi-square test
  • but, in its original form, only valid if all parameters
    of the hypothesized distribution are known and the
    distribution is continuous

47
Kolmogorov-Smirnov Goodness-of-Fit Test (cont.)
  • compute the test statistic
  • define the empirical distribution function
    Fn(x) = (number of Xi <= x) / n
  • the test statistic Dn corresponds to the largest
    (vertical) distance between Fn(x) and the
    hypothesized distribution function F̂(x):
    Dn = sup over x of |Fn(x) - F̂(x)|
    (see the code sketch below)
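A sketch of computing Dn directly from the order statistics (the hypothesized distribution function F̂ is passed in as a callable; data and parameters are illustrative):

    import numpy as np
    from scipy import stats

    def ks_statistic(data, cdf):
        """Dn = sup_x |Fn(x) - F_hat(x)|, evaluated at the jump points of Fn."""
        xs = np.sort(np.asarray(data, dtype=float))
        n = len(xs)
        f = cdf(xs)
        d_plus  = np.max(np.arange(1, n + 1) / n - f)   # Fn just above each X(i)
        d_minus = np.max(f - np.arange(0, n) / n)       # Fn just below each X(i)
        return max(d_plus, d_minus)

    x = np.random.default_rng(9).normal(size=200)       # hypothetical data
    print(ks_statistic(x, stats.norm(loc=0.0, scale=1.0).cdf))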

48
Kolmogorov-Smirnov Goodness-of-Fit Test (cont.)
  • case 1: all parameters of the hypothesized
    distribution function are known
  • the distribution of Dn does not depend on F̂
    (if F̂ is continuous)
  • reject H0 if the adjusted test statistic exceeds c(1-α)
  • c(1-α) (does not depend on n) is given in the
    following table:

      1-α      0.85    0.90    0.95    0.975   0.99
      c(1-α)   1.138   1.224   1.358   1.480   1.628

49
Kolmogorov-Smirnov Goodness-of-Fit Test (cont.)
  • case 2
  • the hypothesized distribution is N(μ, σ²) with both μ
    and σ² unknown (estimated from the data), giving the
    estimated distribution function F̂
  • Dn is calculated in the same way as in case 1, but
    different critical points apply
  • reject H0 if the adjusted test statistic exceeds c(1-α)
  • c(1-α) (does not depend on n) is given in the
    following table:

      1-α      0.85    0.90    0.95    0.975   0.99
      c(1-α)   0.775   0.819   0.895   0.955   1.035

50
Kolmogorov-Smirnov Goodness-of-Fit Test (cont.)
  • case 3
  • the hypothesized distribution is exponential, exp(β),
    with β unknown (estimated by β̂ = X̄(n)), giving the
    estimated distribution function F̂
  • reject H0 if the adjusted test statistic exceeds c(1-α)
  • c(1-α) (does not depend on n) is given in the
    following table:

      1-α      0.85    0.90    0.95    0.975   0.99
      c(1-α)   0.926   0.990   1.094   1.190   1.308
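A hedged sketch of the three decision rules. The adjusted statistics below are the forms commonly tabulated together with exactly these critical values (e.g., in Law and Kelton's goodness-of-fit tables); treating them as the slides' missing formulas is an assumption:

    import numpy as np

    def ks_reject(dn, n, case, alpha=0.05):
        """Return True if H0 is rejected at level alpha for the given K-S case."""
        crit = {  # critical values c(1-alpha) from the tables above
            "known":       {0.15: 1.138, 0.10: 1.224, 0.05: 1.358, 0.025: 1.480, 0.01: 1.628},
            "normal":      {0.15: 0.775, 0.10: 0.819, 0.05: 0.895, 0.025: 0.955, 0.01: 1.035},
            "exponential": {0.15: 0.926, 0.10: 0.990, 0.05: 1.094, 0.025: 1.190, 0.01: 1.308},
        }
        rn = np.sqrt(n)
        if case == "known":          # case 1: all parameters known (assumed adjustment)
            stat = (rn + 0.12 + 0.11 / rn) * dn
        elif case == "normal":       # case 2: N(mu, sigma^2), both estimated (assumed adjustment)
            stat = (rn - 0.01 + 0.85 / rn) * dn
        else:                        # case 3: exp(beta), beta estimated (assumed adjustment)
            stat = (dn - 0.2 / n) * (rn + 0.26 + 0.5 / rn)
        return stat > crit[case][alpha]

    print(ks_reject(dn=0.08, n=200, case="known"))   # illustrative numbers only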