Selecting Input Probability Distributions - PowerPoint PPT Presentation

About This Presentation
Title:

Selecting Input Probability Distributions


Slides: 26
Provided by: indus74

Transcript and Presenter's Notes


1
Selecting Input Probability Distributions
2
Outline
  • Sources of randomness
  • Pitfalls in modeling simulation input data
  • Choosing a distribution when data are available
  • Hypothesizing families of distributions
  • Estimation of parameters
  • Determining how representative the fitted
    distributions are
  • Chi-square test
  • Choosing a distribution in the absence of data

3
Sources of Randomness
  • Almost all real-world systems contain one or more
    sources of randomness. The following are common
    sources of randomness in manufacturing systems:
  • Interarrival times of parts or raw materials,
  • Processing or assembly times of parts,
  • Times to failure of a machine,
  • Repair times for a machine, and
  • Setup times for a machine.
  • Failure to choose correct probability
    distributions may drastically affect a model's
    results.

4
Pitfalls in Modeling Simulation Input Data
  • 1) Replacing a distribution by its mean
  • Example: Assume an insurance company has a claims
    department of 3 employees, and each claim is
    processed by all three employees in turn.
  • Insurance claims arrive at the claims department
    every 10 minutes (inter-arrival time) for
    processing.
  • When a claim arrives, it takes 1 min. to transfer
    the claim to the first employee. If the first
    employee is not free, the claim waits on his
    desk. When the first employee becomes free, it
    takes 10 min to process the claim. When the first
    employee finishes working on the claim, the claim
    is transferred to the second employee for further
    processing. This transfer takes 1 min.
  • Once the second employee is available, it takes
    10 min to complete his portion of the process.
    When the second employee finishes, the claim is
    transferred to the third and final employee. This
    transfer takes 1 min.
  • Once the third employee is available, it takes 10
    min to perform his portion of the process. When
    the third employee finishes, the claim is
    complete and is transferred to the mailroom where
    it is sent to the customer with the approval or
    disapproval decision.

5
Pitfalls in Modeling Simulation Input Data
Simple graphical representation
Model Input data
Simulation model (Averages)
6
Pitfalls in Modeling Simulation Input Data
Simulation Output (Averages)
  • From the animation and the output:
  • Queues are not building,
  • Cycle time is not fluctuating,
  • There are no problems in the system.
  • Note: This output is similar to using a static
    tool like a spreadsheet or a process map.

7
Pitfalls in Modeling Simulation Input Data
Reality
  • In reality, the arrival of claims and the
    department's operations would never work in
    perfect rhythm; there is variability.
  • Variability occurs in everyday situations and in
    any business. This is where the power of
    simulation over other methods arises.
  • Variability and its effect on business operations
    and decision making will be demonstrated in the
    claims department simulation model.

8
Pitfalls in Modeling Simulation Input Data
The inter-arrival time, processing times, and
transfer times used previously in the example
were averages. Let us go back to reality:
variability!
The real model input (variability, i.e.
distributions)
9
Distributions Used in the Model
Mean = 1
Mean = 10
Mean = 10, s = 2 (s = standard deviation)
Min = 8, Mode = 10, Max = 12
Min = 8, Max = 12
10
Pitfalls in Modeling Simulation Input Data
Simulation model (Distribution)
Simulation Output (Distribution)
  • From the animation and the output:
  • Queues are building,
  • Cycle time is fluctuating,
  • There are significant problems in the system.
  • The output is not similar to the output based on
    averages.

11
Pitfalls in Modeling Simulation Input Data
12
Pitfalls in Modeling Simulation Input Data
Averages
Distributions (Variability)
It is evident that using only averages can
have a large impact on simulation output and on
the quality of decisions made with the simulation
results.
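
The contrast between the two runs can be reproduced with a small sketch. This is not the presenter's model. The function name simulate_claims, the pairing of distributions with inputs (exponential inter-arrival times with mean 10; normal(10, 2), triangular(8, 10, 12), and uniform(8, 12) processing times), the fixed 1-minute transfers, and first-come-first-served order are all assumptions made for illustration.

```python
import random
from statistics import mean

def simulate_claims(n_claims, interarrival, services, transfer=1.0, seed=1):
    """Return the cycle time (minutes) of each claim in a three-desk line.

    interarrival and each entry of services are functions that take a
    random.Random instance and return a time in minutes (assumed interface).
    """
    rng = random.Random(seed)
    free_at = [0.0] * len(services)      # when each employee is next free
    arrival = 0.0
    cycle_times = []
    for _ in range(n_claims):
        arrival += interarrival(rng)
        t = arrival
        for k, service in enumerate(services):
            t += transfer                # move the claim to desk k
            start = max(t, free_at[k])   # wait if the employee is busy
            t = start + service(rng)
            free_at[k] = t
        cycle_times.append(t - arrival)
    return cycle_times

# Run 1: every input replaced by its mean.
averages = simulate_claims(
    5000,
    interarrival=lambda rng: 10.0,
    services=[lambda rng: 10.0] * 3,
)

# Run 2: same means, but with variability (assumed distributions).
variable = simulate_claims(
    5000,
    interarrival=lambda rng: rng.expovariate(1 / 10),
    services=[
        lambda rng: max(0.0, rng.gauss(10, 2)),   # clamp rare negatives
        lambda rng: rng.triangular(8, 12, 10),    # (low, high, mode)
        lambda rng: rng.uniform(8, 12),
    ],
)

print("averages:      mean", mean(averages), "max", max(averages))
print("distributions: mean", round(mean(variable), 1),
      "max", round(max(variable), 1))
```

In the first run every claim takes exactly 33 minutes (three 1-minute transfers plus three 10-minute processing steps) and no queue forms. In the second run claims have to wait, so cycle times fluctuate and their mean rises above 33 minutes.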
13
Pitfalls in Modeling Simulation Input Data
(Cont'd)
  • 1) Replacing a distribution by its mean
  • 2) Selecting the wrong distribution
  • In the example, suppose that 200 claim
    processing times are available for the first
    process, but their underlying probability
    distribution is unknown. Using methods described
    later, the following distributions are fit to
    the observed data:
  • Normal, triangular, lognormal, beta, and Weibull.

14
Distributions Used for Process 1
15
Pitfalls in Modeling Simulation Input Data
(Cont'd)
  • Then, a simulation run of length 1600 hours is
    made using each of the five distributions. If
    the normal distribution is the best fit for the
    data, the following errors in cycle time are
    observed when using the other distributions:

It is evident that the choice of probability
distribution can have a large impact on
simulation output and on the quality of decisions
made with the simulation results.
16
Choosing a Distribution When Data are Available
  • There are three steps in determining what
    probability distribution best represents a set of
    data:
  • 1. Hypothesize families of distributions,
  • 2. Estimate parameters, and
  • 3. Determine how representative the fitted
    distributions are.

17
Hypothesizing Families of Distributions
  • The first step in selecting a particular input
    distribution is to decide what general families
    (e.g., exponential, normal) appear to be
    appropriate on the basis of their shapes.
  • Some general techniques used in hypothesizing
    families of distributions include using:
  • Prior knowledge
  • Summary statistics
  • Histograms

18
Use of Prior Knowledge
  • In some situations, prior knowledge about a
    certain random variable's role in the system can
    be used to select a distribution or at least to
    rule out some distributions. For example:
  • If customers arrive one at a time, at a constant
    rate, so that the numbers of customers arriving
    in disjoint time intervals are independent, the
    interarrival times are probably exponentially
    distributed.
  • Service times should (at least in principle) not
    be generated directly from a normal distribution,
    since a normal random variable can take negative
    values.
  • The proportion of defective items in a large
    batch should not be assumed to have a gamma
    distribution, since proportions must be between 0
    and 1 and gamma random variables have no upper
    bound.

19
Use of Summary Statistics
  • Summary statistics may be used in some situations
    to suggest an appropriate distribution. Some
    guidelines are:
  • For a symmetric continuous distribution (e.g.,
    normal), the mean is equal to the median.
  • If the coefficient of variation, cv (the standard
    deviation divided by the mean), is close to
    one, it suggests an exponential distribution.
  • Skewness is a measure of the symmetry of a
    distribution:
  • for symmetric distributions (e.g., normal),
    skewness = 0
  • if the distribution is skewed to the right,
    skewness > 0
  • if the distribution is skewed to the left,
    skewness < 0
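
These guidelines can be checked numerically. The following is a minimal sketch, assuming population (divide-by-n) moments; the helper name summary and the sample data are hypothetical and not part of the original slides.

```python
import random
from statistics import mean, median, pstdev

def summary(data):
    """Return mean, median, coefficient of variation, and skewness."""
    n = len(data)
    m = mean(data)
    s = pstdev(data)                      # population standard deviation
    skew = sum((x - m) ** 3 for x in data) / (n * s ** 3)
    return {"mean": m, "median": median(data), "cv": s / m, "skewness": skew}

# Symmetric data: mean equals median and skewness is 0.
print(summary([1, 2, 3, 4, 5]))

# Exponential data (mean 10): cv should be near 1 and skewness positive.
rng = random.Random(42)
exp_sample = [rng.expovariate(1 / 10) for _ in range(10000)]
exp_summary = summary(exp_sample)
print(exp_summary)
```

A cv near 1 together with clearly positive skewness would point toward the exponential family, while a mean close to the median and skewness near 0 would point toward a symmetric family such as the normal.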

20
Use of Summary Statistics (Cont'd)
  • For a discrete distribution, the Lexis ratio
    (the variance divided by the mean) plays an
    important role:
  • for Poisson, Lexis ratio = 1
  • for binomial, Lexis ratio < 1
  • for negative binomial, Lexis ratio > 1
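
As a rough illustration, the Lexis ratio can be estimated as the sample variance divided by the sample mean. The sketch below assumes that definition; the function name lexis_ratio and the simulated binomial counts (20 trials, success probability 0.3) are hypothetical.

```python
import random
from statistics import mean, variance

def lexis_ratio(data):
    """Estimate the Lexis ratio as sample variance / sample mean."""
    return variance(data) / mean(data)

# Simulate binomial(20, 0.3) counts as sums of Bernoulli trials.
rng = random.Random(5)
counts = [sum(rng.random() < 0.3 for _ in range(20)) for _ in range(5000)]

ratio = lexis_ratio(counts)
print(round(ratio, 3))   # theory for binomial: 1 - p = 0.7, i.e. below 1
```

An estimate well below 1 is consistent with a binomial model, an estimate near 1 with a Poisson model, and an estimate well above 1 with a negative binomial model.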

21
Estimation of Parameters
  • Once one or more candidate families of
    distributions have been hypothesized, the values
    of their parameters (i.e., shape, scale, or
    location) must be specified.
  • The most popular method for estimation of
    parameters is the method of maximum likelihood.
  • For a particular distribution, the method of
    maximum likelihood selects those values for the
    parameters that maximize the likelihood (or
    probability) of having obtained the observed data
    from the distribution.
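
For the exponential distribution with mean beta, the log-likelihood of observations x_1, ..., x_n is -n ln(beta) - (x_1 + ... + x_n) / beta, and it is maximized at the sample mean. The sketch below checks this numerically; the function names and the simulated data are illustrative assumptions.

```python
import math
import random
from statistics import mean

def exp_log_likelihood(beta, data):
    """Log-likelihood of data under an exponential with mean beta."""
    return -len(data) * math.log(beta) - sum(data) / beta

def exp_mle(data):
    """Maximum likelihood estimate of the exponential mean."""
    return mean(data)

rng = random.Random(11)
data = [rng.expovariate(1 / 10) for _ in range(1000)]   # true mean 10

beta_hat = exp_mle(data)
print("beta_hat =", round(beta_hat, 3))

# Moving away from beta_hat in either direction lowers the log-likelihood.
for factor in (0.8, 1.0, 1.2):
    print(factor, round(exp_log_likelihood(factor * beta_hat, data), 2))
```

For families without a closed-form estimator (for example, the Weibull), the same log-likelihood idea applies, but the maximum usually has to be found numerically.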

22
Determining How Representative the Fitted
Distributions Are
  • After determining one or more probability
    distributions that might fit the observed data,
    the quality of the fitted distributions must be
    evaluated using one or more heuristics.
  • Two heuristics used in determining the goodness
    of fit are:
  • The Chi-square test
  • The Kolmogorov-Smirnov test

23
Chi-square Test
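  • The chi-square test measures the error between a
    candidate distribution's density function and the
    histogram of the data.
  • The test statistic is
    \chi^2 = \sum_{j=1}^{k} \frac{(N_j - n p_j)^2}{n p_j}
  • where
  • k = number of intervals
  • N_j = number of observations in the jth interval
    [a_{j-1}, a_j)
  • n p_j = expected number of observations that would
    fall in the jth interval if we were sampling from
    the fitted distribution (n is the total number of
    observations and p_j is the fitted probability of
    the jth interval).
  • If \chi^2 > \chi^2_{k-1, 1-\alpha}, the upper 1-\alpha
    critical point of a chi-square distribution with
    k-1 degrees of freedom, the hypothesized
    distribution is rejected.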
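
A minimal sketch of the test for a fitted exponential distribution follows. The statistic is the sum over the k intervals of (observed count - expected count)^2 / expected count. The helper names, the simulated data, the choice of k = 10 equal-probability intervals, and the use of k - 1 degrees of freedom at the 0.05 level are assumptions made for illustration.

```python
import math
import random

def chi_square_statistic(counts, expected):
    """Sum of (N_j - n*p_j)^2 / (n*p_j) over all intervals."""
    return sum((n_j - e_j) ** 2 / e_j for n_j, e_j in zip(counts, expected))

def exponential_interval_counts(data, beta, k):
    """Count observations in k equal-probability intervals of an
    exponential distribution with mean beta."""
    counts = [0] * k
    for x in data:
        p = 1.0 - math.exp(-x / beta)    # fitted CDF at x
        j = min(int(p * k), k - 1)       # interval index 0..k-1
        counts[j] += 1
    return counts

rng = random.Random(7)
data = [rng.expovariate(1 / 10) for _ in range(200)]
beta_hat = sum(data) / len(data)         # fitted mean (maximum likelihood)

k = 10
counts = exponential_interval_counts(data, beta_hat, k)
expected = [len(data) / k] * k           # n * p_j with p_j = 1/k
stat = chi_square_statistic(counts, expected)

CRITICAL_9_DF_95 = 16.919                # chi-square, 9 d.f., alpha = 0.05
print(round(stat, 3), "reject" if stat > CRITICAL_9_DF_95 else "do not reject")
```

Equal-probability intervals keep every expected count at n / k, which avoids intervals with very small expected counts.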

24
Choosing a Distribution in the Absence of Data
  • In some situations it is not possible to collect
    data on the random variables of interest.
    Examples include:
  • A manufacturing system under study that does not
    currently exist, or
  • An existing system where the number of required
    probability distributions is large and the time
    available prohibits necessary data collection and
    analysis.

25
Choosing a Distribution in the Absence of Data
(Cont'd)
  • Two heuristic approaches for choosing a
    distribution in the absence of data involve:
  • Using a triangular distribution
  • Using a beta distribution
  • The first step in using either heuristic is to
    identify an interval [a, b] in which it is felt
    that X (for example, the time to perform a task)
    will lie with probability close to 1.
  • In order to obtain subjective estimates of a and
    b, experts are asked for their most optimistic
    and pessimistic estimates of the time to perform
    the task.

26
Choosing a Distribution in the Absence of Data
(Cont'd)
  • Using a triangular distribution: In addition to
    a and b (minimum and maximum values for time to
    perform a task), the experts are asked to specify
    the most likely time to perform the task, denoted
    by m.
  • The advantage of this approach is that it is
    simple, and it is usually possible to obtain
    estimates for a, b, and m.
  • The disadvantage of this approach is that it is
    not flexible and may lead to large errors.
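
Once a, m, and b have been elicited, sampling from the triangular distribution is straightforward. The sketch below uses hypothetical estimates (a = 8, m = 10, b = 15 minutes) and Python's standard random.triangular, whose argument order is (low, high, mode).

```python
import random
from statistics import mean

def triangular_mean(a, m, b):
    """Theoretical mean of a triangular(a, m, b) distribution."""
    return (a + m + b) / 3

# Hypothetical expert estimates, in minutes.
a, m, b = 8.0, 10.0, 15.0

rng = random.Random(3)
samples = [rng.triangular(a, b, m) for _ in range(10000)]   # (low, high, mode)

print("theoretical mean:", triangular_mean(a, m, b))
print("sample mean:     ", round(mean(samples), 2))
```

Because m is the mode rather than the mean, a skewed choice such as this one gives a mean (11 minutes) that differs from the most likely value (10 minutes).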