Module 3: Characterizing Variability - PowerPoint PPT Presentation

Slides: 57
Provided by: jamesmc9

Transcript and Presenter's Notes


1
Module 3 Characterizing Variability
  • Probability

2
Outline
  • the probability framework
  • random variables
  • probability distributions and densities
  • expected values

3
Probability Framework
  • How would you design a framework to account for
    uncertainty?
  • values that can occur
  • groups of outcomes that can occur
  • frequency of occurrence
  • of values, and groups

4
Probability Framework
  • Experiment
  • situation leading to a value
  • could be an actual experiment, or a circumstance
    in which values are observed
  • e.g., gold recovery in a lab experiment
  • e.g., atmospheric concentration of phenol
  • e.g., product consumer preferences
  • an experiment has outcomes
  • Sample Space
  • the space of all possible outcomes from an
    experiment
  • e.g., {appealing, acceptable, unappealing}
  • e.g., temperature - Real line
  • denoted by S

5
Probability Framework
  • Events
  • are collections of outcomes
  • a single outcome is also an event
  • e.g., E = {chip is defective}
  • e.g., E = {car finish is metallic, car finish is
    matte}
  • an event is said to have occurred if at least one
    of the outcomes in the event has occurred
  • events are sets of outcomes, and we can talk
    about intersection and union of events
  • e.g., for $E_1$ = {car is red, car is green, car is
    blue} and $E_2$ = {car is not red}, $E_1 \cap E_2$ =
    {car is green, car is blue}

6
The situation being considered...
  • In considering an experiment with outcomes and
    events, we are trying to describe a physical
    system.
  • We will gain information about this system
    through observations.
  • is called a ...

7
Population
  • Broad definition -
  • all possible items or units possessing one or
    more common characteristics under specified
    experimental or observational conditions (Mason,
    Gunst and Hess)
  • in other words, all possible outcomes from a
    well-specified system
  • e.g., values from a process, where a process is a
    series of repeatable actions resulting in
    observable characteristics
  • See also Devore, page 3 and page 7
  • In identifying a population, we must be clear
    about what is being considered.

8
Set Operations
  • Since events are subsets, we can use standard set
    operations
  • union
  • union of two events is the set of outcomes
    occurring in either event
  • intersection
  • intersection of two events is the set of outcomes
    that occur in both events
  • complement
  • the complement of an event E is the set of outcomes
    in the sample space that are not in E - notation
    $\bar{E}$

9
Visualizing Events
  • Since events are subsets, and the sample space is
    a large set, we can use Venn diagrams to
    visualize events

[Figure: Venn diagram of the sample space S containing two overlapping events, E1 and E2.]
10
Mutually Exclusive Events
  • Two events are mutually exclusive if
    $E_1 \cap E_2 = \emptyset$
  • i.e., both events can't occur together

11
Examples
  • Temperature in a reactor
  • sample space $(-\infty, \infty)$
  • event $E_1$ - temperature below 350 C - $E_1 = \{T < 350\}$
  • event $E_2$ - temperature above 300 C - $E_2 = \{T > 300\}$
  • $E_1 \cap E_2 = \{300 < T < 350\}$
  • $E_1 \cup E_2 = (-\infty, \infty)$

Continuous Case
12
Examples
  • defects in samples of 5 from a chip foundry
  • sample space S = {nnnnn, dnnnn, ndnnn, nndnn,
    nnndn, nnnnd, ddnnn, nddnn, ..., ddddd}
  • event $E_1$ - one of the first two chips in the
    sample is defective and the rest are not - $E_1$ =
    {dnnnn, ndnnn}
  • event $E_2$ - at most one chip is defective - $E_2$ =
    {dnnnn, ndnnn, nndnn, nnndn, nnnnd, nnnnn}
  • $E_1 \cap E_2$ = {dnnnn, ndnnn}
  • event $E_3$ - no defective chips - $E_3$ = {nnnnn}
  • $E_3 \cap E_1 = \emptyset$ (mutually exclusive)

Discrete Case
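The chip example can be checked by enumerating the sample space directly. The following is a minimal sketch, assuming each outcome is written as a 5-character string of `n` (not defective) and `d` (defective); the variable names are illustrative and not part of the original slides.

```python
from itertools import product

# Sample space: every string of 5 chips, each n (not defective) or d (defective).
sample_space = {"".join(chips) for chips in product("nd", repeat=5)}

# E1: exactly one chip is defective, and it is one of the first two.
e1 = {s for s in sample_space if s.count("d") == 1 and "d" in s[:2]}
# E2: at most one chip is defective.
e2 = {s for s in sample_space if s.count("d") <= 1}
# E3: no defective chips.
e3 = {s for s in sample_space if s.count("d") == 0}

print(len(sample_space))   # number of outcomes in S
print(sorted(e1 & e2))     # intersection of E1 and E2
print(sorted(e1 | e3))     # union of E1 and E3
print(e3 & e1 == set())    # True when E3 and E1 are mutually exclusive
```

Python's set operators `&`, `|` and `-` correspond to intersection, union and set difference (complement relative to the sample space).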
13
Probability Framework
  • Probability
  • provides a measure on the space of all possible
    outcomes
  • indicates relative frequency, or likelihood, of a
    certain event occurring
  • must obey a few rules to be consistent
  • Axioms of Probability

14
Axioms of Probability
  • required for consistency
  • $P(S) = 1$ - the probability that something
    happens is always one: something always happens!
  • $0 \le P(E) \le 1$ - probability
    provides a relative frequency of occurrence, a
    fractional value that should lie between 0 and 1
  • if $E_1$ and $E_2$ are mutually exclusive,
    $P(E_1 \cup E_2) = P(E_1) + P(E_2)$
  • Recognizing that we have to be careful about
    double counting: importance of the concept of
    mutually exclusive

15
Additional Probability Facts
  • Probability of nothing happening: $P(\emptyset) = 0$
  • Probability of an event NOT happening:
    $P(\bar{E}) = 1 - P(E)$
  • where the overbar denotes complement
  • alternative symbol - $E'$ (prime)

16
Additional Probability Facts
  • General case - probability of a union of events
  • $P(E_1 \cup E_2) = P(E_1) + P(E_2) - P(E_1 \cap E_2)$

Need to avoid double counting when an outcome in
both events occurs.
Note that if the events are mutually exclusive,
their intersection is empty, $P(E_1 \cap E_2) = 0$,
and this term drops from the expression.
17
How can we determine probability functions?
  • by examining the sample space - how often can/do
    values occur?
  • definition of sample space - enumeration of
    values and outcomes
  • counting rules - permutations/combinations
  • physical observation
  • e.g., temperatures appear to occur in a pattern
    that follows a normal probability distribution

18
Probability functions for discrete problems
  • Equally Likely Outcomes
  • If we have N equally likely outcomes, then
    $P(\text{outcome}_i) = 1/N$
  • If we have an event consisting of several
    outcomes, i.e., E = {outcome1, outcome2, outcome3},
  • then $P(E) = 3/N$

19
Probability functions for discrete problems
  • More generally, if we have an event consisting of
    individual outcomes, then $P(E) = \dfrac{n(E)}{n(S)}$
  • where n(E) is the number of outcomes in E, and
    n(S) is the number of outcomes in the sample
    space S.
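The counting definition of probability, together with the complement and union rules stated earlier, can be tried out on a small sample space. This is a sketch only: the fair six-sided die and the two events are hypothetical choices made for illustration, and exact fractions are used to avoid rounding.

```python
from fractions import Fraction

sample_space = set(range(1, 7))  # hypothetical fair six-sided die

def prob(event):
    """P(E) = n(E) / n(S), valid only for equally likely outcomes."""
    return Fraction(len(event & sample_space), len(sample_space))

even = {2, 4, 6}
above_three = {4, 5, 6}

# General union rule: subtract the intersection to avoid double counting.
lhs = prob(even | above_three)
rhs = prob(even) + prob(above_three) - prob(even & above_three)
print(lhs, rhs)

# Complement rule: P(not E) = 1 - P(E).
print(prob(sample_space - even), 1 - prob(even))
```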

20
Multiplication Rule
  • for counting numbers of possible outcomes.
  • If we have two operations that are independent,
    then if the first operation can be performed $n_1$
    ways, and the second operation can be performed
    $n_2$ ways, then both operations can be performed
    $n_1 \times n_2$ ways.

21
Additional Counting Rules
  • for arrangements of n outcomes
  • Permutations
  • choosing r objects from a total of n when order
    is important: ${}_nP_r = \dfrac{n!}{(n - r)!}$
  • Combinations
  • choosing r objects from a total of n when order
    is not important: ${}_nC_r = \dfrac{n!}{r!\,(n - r)!}$

22
Example
  • Functional groups
  • suppose we have a set of 6 functional groups
  • F1, F2, F3, F4, F5, F6
  • what is the probability of obtaining F1-F2-F3-F4
    when we are considering strings of 4 functional
    groups?
  • order IS important here
  • number of outcomes in the sample space: n(S) =
    number of ways of choosing strings of 4 from the
    6 groups = ${}_6P_4 = 6!/2! = 360$
  • only one outcome in the event
  • $P(E) = 1/360$

Important consequences in computational
chemistry.
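The functional-group count can be confirmed by brute-force enumeration. The sketch below assumes, as the example does, that a string uses four distinct groups and that order matters; `math.perm` and `math.comb` require Python 3.8 or later.

```python
from itertools import permutations
from math import comb, perm

groups = ["F1", "F2", "F3", "F4", "F5", "F6"]

# Ordered strings of 4 distinct groups: the sample space for this example.
strings = list(permutations(groups, 4))
n_s = len(strings)

# The event contains the single string F1-F2-F3-F4.
target = ("F1", "F2", "F3", "F4")
n_e = strings.count(target)

print(n_s, perm(6, 4))       # enumeration agrees with the 6P4 formula
print(comb(6, 4))            # 6C4: the count if order did not matter
print(f"P(E) = {n_e}/{n_s}")
```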
23
Probability and Inter-relationships
  • between events
  • Conditional Probability
  • Independence
  • Bayes Theorem

24
Conditional Probability
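  • What is the likelihood of an event $E_1$ occurring,
    given that event $E_2$ has occurred?
  • $P(E_1 \mid E_2) = \dfrac{P(E_1 \cap E_2)}{P(E_2)}$
  • Validity check - if events $E_1$ and $E_2$ are mutually
    exclusive, $P(E_1 \cap E_2) = 0$, and
    $P(E_1 \mid E_2) = 0 / P(E_2) = 0$
  • if event $E_2$ has occurred, event $E_1$ can't occur, so
    the conditional probability is zero

The vertical bar in $P(E_1 \mid E_2)$ is read as "given".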
25
Example
  • Galvanneal Line
  • Outcomes with probabilities -
  • O1: thickness off-spec, fails tape test -- 0.04
  • O2: thickness acceptable, fails tape test -- 0.10
  • O3: thickness off-spec, passes tape test -- 0.03
  • O4: thickness acceptable, passes tape test --
    0.83
  • Events -
  • $E_1$ = {fails tape test}
  • $P(E_1) = P(O_1) + P(O_2) = 0.14$
  • $E_2$ = {fails thickness test}
  • $P(E_2) = P(O_1) + P(O_3) = 0.07$

26
Galvanizing Line - Photos
Steel sheet going through a molten zinc bath
27
Example
  • Conditional Probability
  • what is the probability that, given the zinc
    thickness is off-spec, the coil fails the tape
    test?
  • $E_1 \cap E_2$ = {thickness off-spec, fails tape test}
  • $P(E_1 \cap E_2) = 0.04$
  • $P(E_1 \mid E_2) = P(E_1 \cap E_2) / P(E_2) = 0.04 / 0.07 \approx 0.57$
  • point of discussion - is zinc coating thickness a
    reliable indicator of tape test failure?
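The conditional probability for the galvanneal line can be worked out from the four outcome probabilities. This is a minimal sketch; the dictionary layout, keyed by (thickness off-spec?, fails tape test?), is an illustrative choice rather than anything from the slides.

```python
# Outcome probabilities from the galvanneal example, keyed by
# (thickness off-spec?, fails tape test?).
outcomes = {
    (True, True): 0.04,    # O1
    (False, True): 0.10,   # O2
    (True, False): 0.03,   # O3
    (False, False): 0.83,  # O4
}

# E1: fails tape test. E2: thickness off-spec.
p_e1 = sum(p for (off_spec, fails), p in outcomes.items() if fails)
p_e2 = sum(p for (off_spec, fails), p in outcomes.items() if off_spec)
p_both = outcomes[(True, True)]

# P(E1 | E2) = P(E1 and E2) / P(E2)
p_e1_given_e2 = p_both / p_e2
print(round(p_e1, 2), round(p_e2, 2), round(p_e1_given_e2, 3))
```

Comparing the conditional value with the unconditional P(E1) is one way to approach the discussion point: an off-spec coil fails the tape test far more often than a coil chosen at random.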

28
Independent Events
  • Two events are independent if
    $P(E_1 \cap E_2) = P(E_1)\,P(E_2)$
  • intuitive interpretation
  • likelihood of one event occurring is not
    influenced by whether the other event has
    occurred
  • likelihood of both events occurring together is
    simply the product of the likelihood of each one
    occurring
  • Validity check - conditional probability for two
    independent events:
    $P(E_1 \mid E_2) = \dfrac{P(E_1)\,P(E_2)}{P(E_2)} = P(E_1)$
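The product rule gives a quick independence check. As a sketch, it is applied here to the galvanneal events from the earlier example (fails tape test, thickness off-spec), using the probabilities stated there; the tolerance value is an arbitrary choice for floating-point comparison.

```python
p_e1 = 0.14    # P(fails tape test)
p_e2 = 0.07    # P(thickness off-spec)
p_both = 0.04  # P(fails tape test and thickness off-spec)

# Independence would require P(E1 and E2) to equal P(E1) * P(E2).
product_of_marginals = p_e1 * p_e2
independent = abs(p_both - product_of_marginals) < 1e-9

print(round(product_of_marginals, 4), p_both, independent)
```

The joint probability is several times larger than the product of the marginals, so under these numbers the two events are not independent.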

29
Bayes Theorem
  • useful for situations in which we have incomplete
    probability knowledge
  • forms basis for statistical estimation
  • suppose we have two events, A and B
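  • from conditional probability,
    $P(A \mid B)\,P(B) = P(A \cap B) = P(B \mid A)\,P(A)$,
    so for $P(B) > 0$,
    $P(A \mid B) = \dfrac{P(B \mid A)\,P(A)}{P(B)}$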

30
Bayes Theorem
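  • we can generalize this to the case where we have
    some event B, and a range of mutually exclusive
    events $E_1, \ldots, E_n$ that cover the sample space
  • exhaustive set of events
  • now $P(B) = \sum_{i=1}^{n} P(B \mid E_i)\,P(E_i)$, so for $P(B) > 0$,
    $P(E_k \mid B) = \dfrac{P(B \mid E_k)\,P(E_k)}{\sum_{i=1}^{n} P(B \mid E_i)\,P(E_i)}$
  • in this case, we have obtained P(B) from
    knowledge of how B occurs with the other events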

31
Bayes Theorem - Example
  • Drug Testing
  • Drug testing - reliability of analytical
    procedure
  • Events - T -- positive test reading, D -- drug
    user
  • probability of true positive is 0.99 (correctly
    detects usage when individual is a drug user) --
    $P(T \mid D) = 0.99$
  • probability of true negative is 0.94 (correctly
    detects non-usage when individual is not a drug
    user) -- $P(T' \mid D') = 0.94$
  • suppose that 5% of population are drug users --
    $P(D) = 0.05$
  • if a positive reading is obtained, what is the
    probability that the individual is in fact a drug
    user? -- $P(D \mid T)$

The prime denotes complement.
32
Bayes Theorem - Example
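  • From Bayes Theorem,
    $P(D \mid T) = \dfrac{P(T \mid D)\,P(D)}{P(T \mid D)\,P(D) + P(T \mid D')\,P(D')}$
  • $P(T \mid D) = 0.99$, $P(T' \mid D') = 0.94$, $P(D) = 0.05$
  • from sum to unity for probabilities,
  • $P(D') = 1 - P(D) = 1 - 0.05 = 0.95$
  • $P(T \mid D') = 1 - P(T' \mid D') = 1 - 0.94 = 0.06$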

33
Bayes Theorem - Example
  • putting it all together,
    $P(D \mid T) = \dfrac{(0.99)(0.05)}{(0.99)(0.05) + (0.06)(0.95)} = \dfrac{0.0495}{0.1065} \approx 0.46$
  • with a positive detection rate of 99%, and a
    false positive rate of 6%, there is a 46% chance
    that an individual is a drug user given a
    positive reading, when 5% of the population are
    drug users

34
Bayes Theorem - Example
  • Policy implications
  • incidence of drug use fixed in the population -
    given
  • reliability of test depends significantly on true
    positive, false positive rate
  • e.g., how can we improve the reliability of the
    test by minimizing the false positive rate?

Underscores the importance of analytical
procedures
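The drug-testing calculation, and its sensitivity to the false positive rate, can be sketched in a few lines. The function name `posterior` and the alternative false positive rates in the loop are hypothetical, chosen only to illustrate the policy question; the baseline inputs are the ones stated in the example.

```python
def posterior(prior, true_positive_rate, false_positive_rate):
    """P(D | T) from Bayes' theorem for a two-way split: user / non-user."""
    # P(T) = P(T|D) P(D) + P(T|D') P(D')
    evidence = (true_positive_rate * prior
                + false_positive_rate * (1 - prior))
    return true_positive_rate * prior / evidence

# Values from the example: P(D) = 0.05, P(T|D) = 0.99, P(T|D') = 1 - 0.94.
baseline = posterior(0.05, 0.99, 1 - 0.94)
print(f"baseline P(D|T) = {baseline:.3f}")

# Hypothetical lower false positive rates, to see how the answer moves.
for fpr in (0.06, 0.03, 0.01):
    print(fpr, f"{posterior(0.05, 0.99, fpr):.3f}")
```

With the prevalence held fixed, lowering the false positive rate raises P(D|T) substantially, which is the point the policy slide is making.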
35
Random Variables and Probability Distributions
36
Random Variable
  • is a means of attaching a numerical value
    (label) to an outcome
  • in some instances, this occurs by definition -
    e.g., temperature is inherently numerical
  • e.g., defective = 0, functional = 1, giving a random
    variable that takes on the values of 0 and 1
  • why do we need this notion?
  • to allow us to express probability and outcomes
    in a mathematical setting

37
Types of Random Variables
  • reflect types of data
  • Discrete Random Variables
  • take on integer values - discrete set of values
  • Continuous Random Variables
  • take on values from a portion of the real line
  • continuum of values
  • implications for probability statements later

38
Random Variables - Notation
  • Standard Convention
  • Random variable denoted by capital -- X
  • Values assumed denoted by lower-case -- x

39
Discrete Random Variables
40
Discrete Random Variables
  • We have a probability function $P_X(x) = P(X = x)$
  • Example - sampling one chip from a batch of 30
    (10 of which are defective)
  • defective = 0, functional = 1
  • $P_X(0) = 10/30 = 1/3$, $P_X(1) = 20/30 = 2/3$

41
Cumulative Distribution Function
  • We can also define a Cumulative Distribution
    Function as follows
  • $F_X(x) = P(X \le x) = \sum_{x_i \le x} P_X(x_i)$
  • $F_X$ is the probability that we obtain an outcome
    less than or equal to a given number
  • $F_X$ is the accumulation of probabilities of
    outcomes less than or equal to the given number
  • more to come...

42
Probability Function - Example
  • Galvanneal Line
  • discrete random variable - attach score (number)
    to reflect outcomes - x = 0, 1, 2 -- acceptability
    score
  • O1: thickness off-spec, fails tape test - x = 0
  • O2: thickness acceptable, fails tape test - x = 1
  • O3: thickness off-spec, passes tape test - x = 1
  • O4: thickness acceptable, passes tape test - x = 2
  • interpretation - score reflects severity of
    situation in descending order
  • Probability Function
  • $P(X = 0) = 0.04$, $P(X = 1) = 0.13$, $P(X = 2) = 0.83$
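The probability function for the acceptability score, and the cumulative distribution function built from it, can be represented as follows. This is a sketch; storing the probability function as a dictionary and the name `cdf` are illustrative choices.

```python
# Probability function for the galvanneal acceptability score.
pmf = {0: 0.04, 1: 0.13, 2: 0.83}

def cdf(x):
    """F_X(x) = P(X <= x): accumulate probabilities of values up to x."""
    return sum(p for value, p in pmf.items() if value <= x)

# The CDF is a step function, so it is defined between the scores too.
for x in (-1, 0, 1, 1.5, 2):
    print(x, round(cdf(x), 2))
```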

43
Expected Value
  • What is the value of the random variable expected
    on average?
  • Reasoning
  • we have a probability function that indicates
    values occur $P_X(x)$ fraction of the time
  • if we had 1000 experiments, we would obtain
    an outcome of 1 in $P_X(1) \times 1000$ instances
  • we can carry this analysis for each outcome, and
    then take the average
  • we obtain
    $(0 \cdot P_X(0) \cdot 1000 + 1 \cdot P_X(1) \cdot 1000 + 2 \cdot P_X(2) \cdot 1000)/1000 = 0 \cdot P_X(0) + 1 \cdot P_X(1) + 2 \cdot P_X(2)$
  • leads to definition of expected value for a
    discrete r.v.

44
Expected Value
  • The expected value of a discrete random variable
    X is defined as
    $E(X) = \sum_{x} x\,P_X(x)$
  • The expected value is an important parameter that
    characterizes probability functions, and is given
    a symbol, $\mu = E(X)$

$\mu$ is the MEAN of the random variable X.
45
Example - Mean for Galvanneal Line
  • Using the probability function,
    $\mu = E(X) = 0(0.04) + 1(0.13) + 2(0.83) = 1.79$

46
Variance
  • is defined using the expected value
  • $\mathrm{Var}(X) = \sigma^2 = E[(X - \mu)^2] = \sum_{x} (x - \mu)^2\,P_X(x)$
  • what is the value of the squared deviation from
    the mean expected on average?
  • Note - reminiscent of sample variance, which in
    fact is the statistic that estimates the
    parameter $\sigma^2$

47
Standard Deviation
  • is the square root of the variance,
    $\sigma = \sqrt{\mathrm{Var}(X)}$

The mean, variance and standard deviation are
parameters summarizing a probability
distribution for a random variable.
48
Expected Values
  • In general, if we have a function of a random
    variable, we can take the expected value,
    $E[g(X)] = \sum_{x} g(x)\,P_X(x)$
  • Examples
  • mean - $g(X) = X$
  • variance - $g(X) = (X - \mu)^2$
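The mean, variance and standard deviation are all expected values of some function g of the random variable, so one helper covers all three. The sketch below uses the galvanneal probability function from the earlier example; the helper name `expect` is an illustrative choice.

```python
from math import sqrt

# Probability function for the galvanneal acceptability score.
pmf = {0: 0.04, 1: 0.13, 2: 0.83}

def expect(g):
    """E[g(X)]: sum of g(x) weighted by P(X = x)."""
    return sum(g(x) * p for x, p in pmf.items())

mean = expect(lambda x: x)                    # g(X) = X
variance = expect(lambda x: (x - mean) ** 2)  # g(X) = (X - mean)^2
std_dev = sqrt(variance)

print(round(mean, 4), round(variance, 4), round(std_dev, 4))
```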

49
Linearity of Expectation
  • The Expected Value operation is LINEAR
  • 1) Additivity: $E(X + Y) = E(X) + E(Y)$
  • 2) Scaling
  • $E(kX) = k\,E(X)$
  • where k is a constant
  • e.g., $E(X + 6) = E(X) + 6 = \mu_X + 6$
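Linearity can be verified numerically on small probability functions. In this sketch, X is the galvanneal score from the earlier example, while Y and the scaling constant 3 are hypothetical. The joint probabilities are built assuming X and Y are independent purely for convenience; additivity itself does not require independence.

```python
pmf_x = {0: 0.04, 1: 0.13, 2: 0.83}  # galvanneal score from the example
pmf_y = {0: 0.5, 1: 0.5}             # hypothetical second variable

def mean(pmf):
    """E(X) for a probability function stored as {value: probability}."""
    return sum(v * p for v, p in pmf.items())

# Joint probability function (independence assumed for illustration only).
joint = {(x, y): px * py
         for x, px in pmf_x.items()
         for y, py in pmf_y.items()}

e_sum = sum((x + y) * p for (x, y), p in joint.items())  # E(X + Y)
e_shift = sum((x + 6) * p for x, p in pmf_x.items())     # E(X + 6)
e_scale = sum((3 * x) * p for x, p in pmf_x.items())     # E(3X)

print(round(e_sum, 4), round(mean(pmf_x) + mean(pmf_y), 4))
print(round(e_shift, 4), round(mean(pmf_x) + 6, 4))
print(round(e_scale, 4), round(3 * mean(pmf_x), 4))
```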

50
Probability Distributions for Discrete R.V.s
  • Recall - we can determine probability functions
    by counting - enumeration given characteristics
    of physical situation - or based on empirical
    observations
  • specific types of problems occur frequently, and
    motivate the labeling and study of generic
    distributions
  • Binomial Distribution
  • Poisson Distribution

General Approach - build a library of standard
distributions.
51
Binomial Distribution
  • Suppose we are conducting a number of independent
    trials, each with only one of two possible values
  • each trial is referred to as a Bernoulli trial
  • note that each trial is independent
  • outcomes -- 0, 1 -- True/False -- Success/Fail --
    ...
  • in each trial, $P(1) = p$, and $P(0) = 1 - p$
  • if we have n trials, what is the probability that
    we obtain x outcomes of 1 (successes)?
  • in n trials, we have ${}_nC_x$ ways of having x
    successes
  • for each case of x successes, the probability is
    $p^x (1 - p)^{n - x}$

52
Binomial Distribution
  • Putting it all together, the probability of
    having x successes in n independent trials is
    $P(X = x) = {}_nC_x\,p^x (1 - p)^{n - x}$, for $x = 0, 1, \ldots, n$

Binomial Probability Distribution Function
53
Binomial Distribution
  • Mean: $E(X) = np$
  • Variance: $\mathrm{Var}(X) = np(1 - p)$
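The standard binomial results, mean np and variance np(1 - p), can be checked by summing over the probability function directly. This is a sketch; the values n = 10 and p = 0.3 are hypothetical and chosen only for illustration.

```python
from math import comb

def binomial_pmf(x, n, p):
    """P(X = x): number of arrangements times probability of each one."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

n, p = 10, 0.3  # hypothetical values for illustration
probs = [binomial_pmf(x, n, p) for x in range(n + 1)]

total = sum(probs)
mean = sum(x * q for x, q in enumerate(probs))
variance = sum((x - mean) ** 2 * q for x, q in enumerate(probs))

print(round(total, 6))
print(round(mean, 6), n * p)
print(round(variance, 6), round(n * p * (1 - p), 6))
```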

54
Using the Binomial Distribution
  • Sampling with Replacement -
  • Example -
  • On the microwave module line of a
    telecommunications equipment maker, the
    probability of a defective module is 0.21. From
    each batch, one module is selected and tested,
    and then returned to the batch. This procedure is
    repeated 5 times, so that we have 5 independent
    tests for defects. What is the probability of
    having
  • a) 1 defect in the five tests?
  • b) 3 defects in the five tests?
  • c) why is it important that the module be
    returned?

55
Binomial Example
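  • a) n = 5 (independent trials), x = 1 (success =
    defect identified - need to be clear on this!),
    p = 0.21:
    $P(X = 1) = {}_5C_1\,(0.21)^1 (0.79)^4 \approx 0.409$
  • b) n = 5 (independent trials), x = 3:
    $P(X = 3) = {}_5C_3\,(0.21)^3 (0.79)^2 \approx 0.058$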
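Parts a) and b) of the microwave module example can be evaluated with a short script. This is a sketch under the example's stated values (n = 5 tests, defect probability 0.21), with "success" meaning a defect is found; the function name is illustrative.

```python
from math import comb

def binomial_pmf(x, n, p):
    """Probability of exactly x successes in n independent trials."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

n, p = 5, 0.21  # five tests, probability of a defective module

p_one_defect = binomial_pmf(1, n, p)     # part a)
p_three_defects = binomial_pmf(3, n, p)  # part b)

print(f"a) P(X = 1) = {p_one_defect:.3f}")
print(f"b) P(X = 3) = {p_three_defects:.3f}")
```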

56
Binomial Example
  • c) why is it necessary to return the module to
    the batch before the next sample?
  • preserve independence
  • if module not returned, batch is one smaller, and
    there is potentially one fewer defect -
    underlying probability is influenced
  • Binomial distribution is appropriate in sampling
    situations when there is sampling with
    replacement
  • for sampling without replacement, we need to use
    the Hypergeometric distribution
  • if the lot size is large relative to the number
    of tests in the sample, binomial provides
    reasonable approximation
  • e.g., 10 sampling tests for lot of 1000
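The quality of the binomial approximation can be examined by comparing it with the hypergeometric probabilities for a sample of 10 from a lot of 1000. This is a sketch: the count of 210 defective modules in the lot is an assumption made here so that the defect fraction matches the 0.21 used in the earlier example.

```python
from math import comb

def hypergeom_pmf(x, lot_size, defectives, sample_size):
    """P(x defectives) when sampling WITHOUT replacement."""
    return (comb(defectives, x)
            * comb(lot_size - defectives, sample_size - x)
            / comb(lot_size, sample_size))

def binomial_pmf(x, n, p):
    """P(x defectives) when sampling WITH replacement."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

# Assumed lot: 1000 modules, 210 defective, sample of 10.
lot_size, defectives, sample_size = 1000, 210, 10
p = defectives / lot_size

hyper = [hypergeom_pmf(x, lot_size, defectives, sample_size)
         for x in range(sample_size + 1)]
binom = [binomial_pmf(x, sample_size, p) for x in range(sample_size + 1)]

# Largest pointwise difference between the two probability functions.
max_gap = max(abs(h - b) for h, b in zip(hyper, binom))
for x in range(4):
    print(x, round(hyper[x], 4), round(binom[x], 4))
print("largest difference:", round(max_gap, 4))
```

Shrinking `lot_size` while keeping the same defect fraction shows the gap widening, which is the situation where the hypergeometric distribution is needed.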