Selecting Input Probability Distributions - PowerPoint PPT Presentation

Slides: 26

Provided by: indus74

Transcript and Presenter's Notes

1
Selecting Input Probability Distributions
2
Outline

Sources of randomness
Pitfalls in modeling simulation input data
Choosing a distribution when data are available
Hypothesizing families of distributions
Estimation of parameters
Determining how representative the fitted
distributions are
Chi-square test
Choosing a distribution in the absence of data

3
Sources of Randomness

Almost all real-world systems contain one or more
sources of randomness. The following are common
sources of randomness in manufacturing systems:
Interarrival times of parts or raw materials,
Processing or assembly times of parts,
Times to failure of a machine,
Repair times for a machine, and
Setup times for a machine.
Failure to choose the correct probability
distributions may drastically affect a model's
results.

4
Pitfalls in Modeling Simulation Input Data

1) Replacing a distribution by its mean
Example: Assume an insurance company with a claims
department of 3 employees. Each claim is
processed by all three employees in sequence.
Insurance claims arrive at the claims department
every 10 minutes (inter-arrival time) for
processing.
When a claim arrives, it takes 1 min. to transfer
the claim to the first employee. If the first
employee is not free, the claim waits on his
desk. When the first employee becomes free, it
takes 10 min to process the claim. When the first
employee finishes working on the claim, the claim
is transferred to the second employee for further
processing. This transfer takes 1 min.
Once the second employee is available, it takes
10 min to complete his portion of the process.
When the second employee finishes, the claim is
transferred to the third and final employee. This
transfer takes 1 min.
Once the third employee is available, it takes 10
min to perform his portion of the process. When
the third employee finishes, the claim is
complete and is transferred to the mailroom where
it is sent to the customer with the approval or
disapproval decision.

5
Pitfalls in Modeling Simulation Input Data
Simple graphical representation
Model Input data
Simulation model (Averages)
Run the model
6
Pitfalls in Modeling Simulation Input Data
Simulation Output (Averages)

From the animation and the output:
Queues are not building,
Cycle time is not fluctuating,
No problems appear in the system.
Note: This output is similar to what a static
tool like a spreadsheet or a process map would give.

7
Pitfalls in Modeling Simulation Input Data
Reality

In reality, the arrival of the claims and
department operations would never work in perfect
rhythm; there is variability.
Variability occurs in everyday situations and in
any business. This is where the power of
simulation over other methods arises.
Variability and its effect on business operations
and decision making will be demonstrated in the
claims department simulation model.

8
Pitfalls in Modeling Simulation Input Data
The inter-arrival times, processing times, and
transfer times used previously in the example
were averages. Let us go back to reality:
variability.
The real model input (variability:
distributions)
Real distributions
9
Distributions Used in the Model
Mean = 1
Mean = 10
Mean = 10, σ = 2
Min = 8, Mode = 10, Max = 12
σ = Standard deviation
Min = 8, Max = 12
10
Pitfalls in Modeling Simulation Input Data
Simulation model (Distribution)
Run the model
Simulation Output (Distribution)

From the animation and the output:
Queues are building,
Cycle time is fluctuating,
There are significant problems in the system.
The output differs from the output based on
averages.

11
Pitfalls in Modeling Simulation Input Data
12
Pitfalls in Modeling Simulation Input Data
Averages
Distributions (Variability)
It is evident that using only the averages can
have a large impact on simulation output and on
the quality of decisions made with the simulation
results.
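
The contrast between the two runs can be reproduced with a minimal sketch of the three-employee claims line. The slides do not say which distribution belongs to which step, so the pairing below (exponential inter-arrival times, then normal, triangular, and uniform processing times, with transfers held constant at 1 minute) is an assumption, as are the function name mean_cycle_time and the run length of 2000 claims.

```python
import random

TRANSFER = 1.0  # minutes; held constant in both scenarios (assumption)

def mean_cycle_time(interarrival, services, n_claims=2000):
    """Average time from arrival to completion for a 3-employee line.

    interarrival: function returning the next inter-arrival time
    services: three functions, one processing-time sampler per employee
    """
    arrival = 0.0
    free_at = [0.0, 0.0, 0.0]  # time at which each employee is next free
    total = 0.0
    for _ in range(n_claims):
        arrival += interarrival()
        t = arrival
        for s in range(3):
            ready = t + TRANSFER            # claim reaches the desk
            start = max(ready, free_at[s])  # waits if the employee is busy
            t = start + services[s]()       # processing finishes
            free_at[s] = t
        total += t - arrival
    return total / n_claims

rng = random.Random(42)

# Scenario 1: every input replaced by its mean.
avg_only = mean_cycle_time(lambda: 10.0, [lambda: 10.0] * 3)

# Scenario 2: the same means, but with variability (assumed distributions).
with_variability = mean_cycle_time(
    lambda: rng.expovariate(1 / 10.0),
    [
        lambda: max(0.0, rng.gauss(10.0, 2.0)),   # normal, truncated at 0
        lambda: rng.triangular(8.0, 12.0, 10.0),  # low, high, mode
        lambda: rng.uniform(8.0, 12.0),
    ],
)

print(f"averages only:    {avg_only:.1f} min")
print(f"with variability: {with_variability:.1f} min")
```

With averages only, every claim takes exactly 3 transfers plus 3 processing steps and never waits. Once variability is introduced, the employees are still busy about 100% of the time on average, so any late arrival or slow claim creates a queue that the line cannot easily work off.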
13
Pitfalls in Modeling Simulation Input Data
(Contd)

1) Replacing a distribution by its mean
2) Selecting the wrong distribution
In the example, suppose that 200 claim
processing times are available for the first
process but their underlying probability
distribution is unknown. Using methods
described later, the following distributions
are fit to the observed data:
Normal, Triangular, Lognormal, Beta, and Weibull.

14
Distributions Used for Process 1
15
Pitfalls in Modeling Simulation Input Data
(Contd)

Then, a simulation run of length 1600 hours is
made using each of the five distributions. If
the normal distribution is the best fit for the
data, the following errors for cycle time are
observed when using other distributions

It is evident that the choice of probability
distribution can have a large impact on
simulation output and on the quality of decisions
made with the simulation results.
16
Choosing a Distribution When Data are Available

There are three steps in determining what
probability distribution best represents a set of
data:
1. Hypothesize families of distributions,
2. Estimate parameters, and
3. Determine how representative the fitted
distributions are.

17
Hypothesizing Families of Distributions

The first step in selecting a particular input
distribution is to decide what general families
(e.g., exponential, normal) appear to be
appropriate on the basis of their shapes.
Some general techniques used in hypothesizing
families of distributions include the use of:
Prior knowledge
Summary statistics
Histograms

18
Use of Prior Knowledge

In some situations, prior knowledge about a
certain random variable's role in the system can
be used to select a distribution or at least to
rule out some distributions. For example,
If customers arrive one at a time, at a constant
rate, so that the numbers of customers arriving
in disjoint time intervals are independent, the
interarrival times are probably exponentially
distributed.
Service times should (at least in principle) not
be generated directly from a normal distribution,
since a normal random variable can take negative
values and a service time cannot.
The proportion of defective items in a large
batch should not be assumed to have a gamma
distribution, since proportions must be between 0
and 1 and gamma random variables have no upper
bounds.
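
The concern about normally distributed service times can be quantified with a short sketch. The helper name prob_negative and the parameter values are illustrative assumptions, not taken from the slides.

```python
from statistics import NormalDist

def prob_negative(mean, sd):
    """P(X < 0) for X ~ Normal(mean, sd)."""
    return NormalDist(mu=mean, sigma=sd).cdf(0.0)

# Hypothetical service time with mean 10 and standard deviation 5.
p = prob_negative(10.0, 5.0)
print(f"P(service time < 0) = {p:.4f}")

# With a small standard deviation relative to the mean, the risk is tiny.
p_small = prob_negative(10.0, 2.0)
print(f"P(service time < 0) = {p_small:.2e}")
```

When the standard deviation is small relative to the mean, the probability of a negative value is negligible, which is why the rule is stated "in principle". When it is not, the normal distribution has to be truncated or replaced.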

19
Use of Summary Statistics

Summary statistics may be used in some situations
to suggest an appropriate distribution. Some
guidelines are:
For a symmetric continuous distribution (e.g.,
normal) the mean is equal to the median.
If the coefficient of variation, cv, is close to
one, it suggests an exponential distribution.
Skewness is a measure of the symmetry of a
distribution:
for symmetric distributions (e.g., normal),
skewness = 0
if the distribution is skewed to the right,
skewness > 0
if the distribution is skewed to the left,
skewness < 0
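
These guidelines can be checked on a data set with a few lines of code. The sketch below computes the mean, median, coefficient of variation, and sample skewness (third central moment divided by the 1.5 power of the second). The function name summary and the test data are assumptions for illustration.

```python
import random
import statistics

def summary(data):
    """Mean, median, coefficient of variation, and skewness of a sample."""
    n = len(data)
    mean = statistics.fmean(data)
    m2 = sum((x - mean) ** 2 for x in data) / n  # second central moment
    m3 = sum((x - mean) ** 3 for x in data) / n  # third central moment
    return {
        "mean": mean,
        "median": statistics.median(data),
        "cv": statistics.stdev(data) / mean,
        "skewness": m3 / m2 ** 1.5,
    }

# Symmetric data: mean equals median and skewness is 0.
symmetric = summary([1, 2, 3, 4, 5])

# Exponential data: cv should be close to 1 and skewness positive.
rng = random.Random(0)
exponential = summary([rng.expovariate(1 / 10.0) for _ in range(5000)])

print(symmetric)
print(exponential)
```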

20
Use of Summary Statistics (Contd)

For a discrete distribution, the lexis ratio
plays an important role:
for Poisson, lexis ratio = 1
for binomial, lexis ratio < 1
for negative binomial, lexis ratio > 1
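
The slide does not define the lexis ratio. The sketch below assumes the usual definition, the variance divided by the mean, estimated with the sample variance and sample mean. The function name lexis_ratio and the count data are hypothetical.

```python
import statistics

def lexis_ratio(counts):
    """Sample variance divided by sample mean (assumed definition)."""
    return statistics.variance(counts) / statistics.fmean(counts)

# Hypothetical counts, e.g. number of arrivals observed per hour.
steady = [2, 2, 2, 2]   # no spread at all: ratio 0, well below 1
bursty = [0, 4, 0, 4]   # spread exceeds the mean: ratio above 1

print(lexis_ratio(steady))
print(lexis_ratio(bursty))
```

A ratio well below 1 points toward a binomial family, a ratio near 1 toward Poisson, and a ratio well above 1 toward negative binomial.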

21
Estimation of Parameters

Once one or more candidate families of
distributions have been hypothesized, the values
of their parameters (i.e., shape, scale, or
location) must be specified.
The most popular method for estimation of
parameters is the method of maximum likelihood.
For a particular distribution, the method of
maximum likelihood selects those values for the
parameters that maximize the likelihood (or
probability) of having obtained the observed data
from the distribution.
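
As a concrete case, for an exponential distribution the maximum likelihood estimate of the mean is simply the sample mean. The sketch below generates sample data, computes that estimate, and confirms that the log-likelihood is lower at nearby parameter values. The function names and the generated data are assumptions for illustration.

```python
import math
import random

def exp_log_likelihood(data, mean):
    """Log-likelihood of the data under an exponential distribution
    with the given mean: sum of log((1/mean) * exp(-x/mean))."""
    return -len(data) * math.log(mean) - sum(data) / mean

def exp_mle(data):
    """Maximum likelihood estimate of the exponential mean."""
    return sum(data) / len(data)

rng = random.Random(1)
data = [rng.expovariate(1 / 10.0) for _ in range(200)]

beta_hat = exp_mle(data)
ll_at_mle = exp_log_likelihood(data, beta_hat)
ll_below = exp_log_likelihood(data, 0.9 * beta_hat)
ll_above = exp_log_likelihood(data, 1.1 * beta_hat)

print(f"estimated mean: {beta_hat:.2f}")
print(f"log-likelihood at 0.9x, 1.0x, 1.1x: "
      f"{ll_below:.2f}, {ll_at_mle:.2f}, {ll_above:.2f}")
```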

22
Determining How Representative the Fitted
Distributions Are

After determining one or more probability
distributions that might fit the observed data,
the quality of the fitted distributions must be
evaluated using one or more heuristics.
Two heuristics used in determining the goodness
of fit are:
The Chi-square test
The Kolmogorov-Smirnov test

23
Chi-square Test

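The chi-square test measures the error between a
candidate distribution's density function and the
histogram.
The test statistic is
$$\chi^2 = \sum_{j=1}^{k} \frac{(N_j - np_j)^2}{np_j}$$
where
k = Number of intervals
N_j = Number of observations in the jth interval
[a_{j-1}, a_j)
np_j = Expected number of observations that would
fall in the jth interval if we were sampling from
the fitted distribution (n is the total number of
observations and p_j is the probability that the
fitted distribution assigns to the jth interval).
If $\chi^2$ exceeds the critical value of the
chi-square distribution at the chosen significance
level, the hypothesized distribution is rejected.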
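
A minimal sketch of the test for an exponential fit follows. It sums (N_j - n p_j)^2 / (n p_j) over k intervals chosen so that each has the same probability under the fitted distribution. The function names, the generated data, the choice of k = 10, and the use of k - 1 = 9 degrees of freedom are all assumptions. The critical value 16.919 is the tabulated 0.95 quantile of the chi-square distribution with 9 degrees of freedom.

```python
import math
import random

def chi_square_statistic(data, cdf, edges):
    """Sum over intervals [a_{j-1}, a_j) of (N_j - n*p_j)^2 / (n*p_j)."""
    n = len(data)
    stat = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        observed = sum(1 for x in data if lo <= x < hi)  # N_j
        expected = n * (cdf(hi) - cdf(lo))               # n * p_j
        stat += (observed - expected) ** 2 / expected
    return stat

def exponential_cdf(mean):
    """CDF of an exponential distribution with the given mean."""
    return lambda x: (1.0 - math.exp(-x / mean)) if x > 0 else 0.0

def equiprobable_edges(mean, k):
    """Interval end points giving each of k intervals probability 1/k."""
    return [-mean * math.log(1 - j / k) for j in range(k)] + [math.inf]

rng = random.Random(3)
data = [rng.expovariate(1 / 10.0) for _ in range(200)]

fitted_mean = sum(data) / len(data)  # maximum likelihood estimate
k = 10
stat = chi_square_statistic(
    data, exponential_cdf(fitted_mean), equiprobable_edges(fitted_mean, k)
)

CRITICAL_9_DF_5_PERCENT = 16.919  # table value, 9 degrees of freedom
print(f"chi-square statistic: {stat:.3f}")
print("reject" if stat > CRITICAL_9_DF_5_PERCENT else "do not reject")
```

Because the mean is estimated from the same data, the appropriate number of degrees of freedom arguably lies between k - 2 and k - 1. Using k - 1 gives the larger critical value, so it rejects less readily.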

24
Choosing a Distribution in the Absence of Data

In some situations it is not possible to collect
data on the random variables of interest.
Examples include
A manufacturing system under study that does not
currently exist, or
An existing system where the number of required
probability distributions is large and the time
available prohibits necessary data collection and
analysis.

25
Choosing a Distribution in the Absence of Data
(Contd)

Two heuristic approaches for choosing a
distribution in the absence of data involve:
Using a triangular distribution
Using a beta distribution
The first step in using either heuristic is to
identify an interval [a, b] in which it is felt
that X (for example, the time to perform a task)
will lie with probability close to 1.
In order to obtain subjective estimates of a and
b, experts are asked for their most optimistic
and pessimistic estimates of the time to perform
the task.

26
Choosing a Distribution in the Absence of Data
(Contd)

Using a triangular distribution: In addition to
a and b (minimum and maximum values for time to
perform a task), the experts are asked to specify
the most likely time to perform the task, denoted
by m.
The advantage of this approach is that it is
simple and it is usually possible to obtain
estimates for a, b, and m.
The disadvantage of this approach is that it is
not flexible and may lead to large errors.
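
Once the experts have supplied a, b, and m, sampling from the resulting triangular distribution is straightforward. The sketch below uses the values 8, 10, and 12 purely as illustrative estimates. The helper triangular_mean is an assumed name, and it relies on the standard result that a triangular distribution has mean (a + b + m) / 3.

```python
import random
import statistics

def triangular_mean(a, b, m):
    """Mean of a triangular distribution with minimum a, maximum b, mode m."""
    return (a + b + m) / 3.0

# Illustrative expert estimates: optimistic, most likely, pessimistic.
a, m, b = 8.0, 10.0, 12.0

rng = random.Random(7)
# random.triangular takes (low, high, mode) in that order.
samples = [rng.triangular(a, b, m) for _ in range(10_000)]

print(f"theoretical mean: {triangular_mean(a, b, m):.2f}")
print(f"sample mean:      {statistics.fmean(samples):.2f}")
print(f"sample range:     {min(samples):.2f} to {max(samples):.2f}")
```

Every sample falls inside [a, b], which shows the lack of flexibility mentioned above: the density is forced to be piecewise linear, and values outside the experts' interval can never occur.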