STAT 221 - PowerPoint PPT Presentation

Provided by: margaret1

Transcript and Presenter's Notes


1
STAT 221
  • Chapter 5 Part A
  • Discrete Probability Distributions

2
The Random Variable - X
  • In statistics, X is a symbol for the unknown
    numerical value that expresses one possible
    outcome of an experiment/event.
  • In an experiment, any one outcome value that is
    assigned to X can be considered a random variable
    because X can turn out to be (or assume) any
    value in the set of all possible outcomes (and
    that set is called the sample space).

3
Depending on the experimental situation, X may be
Discrete or Continuous
  • A random variable can be classified as being
    either discrete or continuous depending on the
    numerical values it can assume.
  • A discrete random variable is usually an integer
    value and may assume either a finite number of
    values or an infinite sequence of values.
  • A continuous random variable can be an integer or
    non-integer value and may assume any numerical
    value in an interval or collection of intervals.

4
Examples of experimental outcome values
  • Examples of discrete random variables
  • Number of babies born on a particular day
  • Number of customers arriving at a drive-thru in
    any particular hour
  • Number of questions answered correctly on a test.
  • Examples of continuous random variables
  • Waiting time between customer arrivals
  • Distance that a person can run
  • Weight of a person

5
This chapter discusses experiments whose outcomes
would be discrete random variables
  • A discrete random variable can have a finite
    number of values
  • Let x = number of questions answered correctly
    on a test. If the test has 100 questions, x can
    be any of 101 possible values (0, 1, 2, 3, 4,
    ..., 100); note the finite upper limit of 100.
  • Or a discrete random variable can have an
    infinite sequence of values
  • Let x = number of customers arriving in one day,
  • where x can take on the values 0, 1, 2, . . . ;
    note that there is no upper limit.

6
Probability distributions of random variables
  • x    f(x)
  • 0    .125 (1/8)
  • 1    .375 (3/8)
  • 2    .375 (3/8)
  • 3    .125 (1/8)
  • All possible values for x and their probabilities
    of occurrence can be plotted on a chart and we
    call that a probability distribution.
  • A probability distribution can actually be a
    chart, table, or function that expresses the
    probability of each possible outcome in an
    experiment.
  • Here is a probability distribution presented as a
    function f(x)
  • As you can see in this example, some x-values are
    more likely to occur than others.

7
A probability distribution
(Chart: probability distribution of X = number of girls out of 3 children, for X = 0, 1, 2, 3)
8
Developing Probability Distributions
  • Probabilities of each possible value of random
    variable x can be developed using any of the
    three methods
  • Relative frequency approach: look at past
    history of occurrences to project expected
    probabilities
  • Classical frequency approach: use reasoning to
    deduce the expected probability (Ex: a die has 6
    sides, therefore there is a 1-in-6 chance of
    getting each outcome).
  • Subjective frequency approach: make an educated
    guess.

9
Here is a probability distribution for rolling a
die developed using the classical approach
This is a uniform distribution: the probability
of occurrence of each possible value of x (1, 2,
3, 4, 5, or 6) is the same (uniform).
10
Here is a probability distribution developed
using the relative frequency approach
  • In this example, DiCarlo Motors used past history
    to project future probabilities of daily car
    sales.
  • They know that over the past 300 days, they sold
    no cars on 54 of those days, 117 days they sold 1
    car, 72 days they sold 2 cars, 42 days they sold
    3 cars, 12 days they sold 4, and 3 days they sold
    5.
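
The relative frequency approach on this slide can be sketched in a few lines of Python. The day counts come from the slide; the variable names are illustrative only.

```python
from fractions import Fraction

# Days (out of the past 300) on which x cars were sold, as given on the slide.
days_by_cars_sold = {0: 54, 1: 117, 2: 72, 3: 42, 4: 12, 5: 3}
total_days = sum(days_by_cars_sold.values())

# Relative frequency approach: f(x) = (days with x sales) / (total days).
f = {x: Fraction(days, total_days) for x, days in days_by_cars_sold.items()}

for x, prob in f.items():
    print(f"x = {x}: f(x) = {float(prob):.2f}")

# A valid probability distribution must sum to 1.
print("sum of f(x) =", sum(f.values()))
```

Using exact fractions avoids rounding noise when checking that the probabilities sum to 1.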

11
Basic requirements of a probability distribution
Σ f(x) = 1, where x assumes all possible
values
0 ≤ f(x) ≤ 1 for every individual value of x
12
Descriptive statistics for frequency distributions
  • It is possible to calculate a mean, a variance
    and standard deviation from a probability
    distribution.
  • The mean is called the expected value and it is a
    single measure of central tendency.
  • The variance and standard deviations are measures
    of dispersion.

13
Expected Value = mean of a random variable
  • The expected value of a random variable is the
    same as the mean: a single value that represents
    the average of all possible outcomes.
  • In chapter 3, we calculated a mean value for x by
    adding up all the actual outcomes (x-values) and
    dividing by the number of outcomes. That approach
    was appropriate when you have the actual observed
    values for x.
  • When you do not have actual (observed) outcomes
    but you only have a probability distribution for
    expected outcomes, you must calculate the mean
    value of x using this formula
  • E(x) = μ = Σ x f(x)

14
Example: Rolling a die. What is the Expected
Value?
  • We don't have to roll a die hundreds of times and
    actually observe the number of times we get a 1,
    2, 3, etc. to know that the probability of rolling
    each possible outcome (x) is 1-out-of-6.
  • But using the formula on the previous slide, we
    can calculate the average outcome / expected
    value
  • E(x) = μ = Σ x f(x)
  • = 1(1/6) + 2(1/6) + 3(1/6) + 4(1/6) + 5(1/6) + 6(1/6)
  • = 21/6 = 3.5
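
As a quick check of the die example, here is a minimal Python sketch of E(x) = Σ x f(x). The helper name expected_value is an illustrative choice, not something from the slides.

```python
from fractions import Fraction

# Classical approach: each face of a fair die has probability 1/6.
die = {x: Fraction(1, 6) for x in range(1, 7)}

def expected_value(dist):
    """Return E(x) = sum of x * f(x) over all possible values of x."""
    return sum(x * prob for x, prob in dist.items())

mu = expected_value(die)
print(mu, "=", float(mu))  # 7/2 = 3.5
```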

15
Calculating a variance and standard deviation
  • Recall that variance and standard deviations are
    measures of dispersion (spread-out-ness).
  • Again, in Chapter 3 we learned formulas for
    calculating the variance and standard deviation
    based on actual/observed values for x. When all
    we have is a probability distribution for expected
    x, we can still calculate a variance and standard
    deviation for x, but we must use this formula
  • Var(x) = σ² = Σ (x - μ)² f(x)

16
Example: JSL Appliances, the expected value or
mean
  • x    f(x)    xf(x)
  • 0    .40     .00
  • 1    .25     .25
  • 2    .20     .40
  • 3    .05     .15
  • 4    .10     .40
  • E(x) = 1.20

Here is the probability distribution for the
number of TVs they expect to sell each day.
The mean or the expected number of TV sets they
expect to sell in a day is 1.2
17
Example: JSL Appliances, the variance and std
deviation
  • x    x - μ    (x - μ)²    f(x)    (x - μ)² f(x)
  • 0    -1.2     1.44        .40     .576
  • 1    -0.2     0.04        .25     .010
  • 2    0.8      0.64        .20     .128
  • 3    1.8      3.24        .05     .162
  • 4    2.8      7.84        .10     .784
  • σ² = 1.660
  • The variance of daily sales is 1.66 TV sets
    squared.
  • The standard deviation of sales is 1.2884 TV
    sets.
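
The two JSL Appliances tables can be reproduced with a short Python sketch. The probabilities are the ones listed on the slides; the variable names are illustrative.

```python
import math

# Probability distribution for daily TV sales at JSL Appliances (from the slides).
jsl = {0: 0.40, 1: 0.25, 2: 0.20, 3: 0.05, 4: 0.10}

# Expected value: E(x) = sum of x * f(x).
mu = sum(x * prob for x, prob in jsl.items())

# Variance: Var(x) = sum of (x - mu)^2 * f(x).
variance = sum((x - mu) ** 2 * prob for x, prob in jsl.items())

# Standard deviation: square root of the variance.
std_dev = math.sqrt(variance)

print(f"E(x)   = {mu:.2f}")
print(f"Var(x) = {variance:.2f}")
print(f"SD(x)  = {std_dev:.4f}")
```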

18
Practice question: Calculate the mean, variance,
and SD (Excel worksheet: Computers)
  • According to a survey, 95% of subscribers to the
    Wall Street Journal Interactive Edition have a
    computer at home. For those households, the
    probability distribution for the number of
    computers is given (see spreadsheet).
  • A. What is the expected value of the number of
    computers per household?
  • B. What is the variance of the number of
    computers per household?
  • C. Make three intelligent statements about the
    number of computers owned by the Journal's
    subscribers.

19
Open the file DataSetsForCh5 and select the
worksheet Computers
20
To calculate the expected value for laptops,
first multiply the number of computers (x) by the
frequency f(x) for the first row: C4 = A4 * B4
21
Copy the formula in C4 down to C7 to multiply the
number of computers (x) by the frequency f(x) for
the other possible values of x.
22
Sum the column: C8 = SUM(C4:C7)
23
Copy the sum into cell C10 because that is the
expected value: C10 = C8
24
To calculate the variance, start by squaring the
first x-value: D4 = A4 * A4
25
Copy the formula in D4 down to D7 to square the
remaining x-values
26
Multiply squared-x by the frequency of x for the
first x-value: E4 = D4 * B4
27
Copy the formula in E4 down to E7 to multiply the
other squared x-values by their frequencies.
28
Sum the values in columns D and E: D8 =
SUM(D4:D7), E8 = SUM(E4:E7)
29
To calculate the variance, subtract the square of
the mean from the sum of column E: C11 = E8 -
(C10 * C10)
30
To calculate the standard deviation, take the
square root of the variance: C12 = SQRT(C11)
Resave the file.
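
The worksheet steps use the shortcut Var(x) = Σ x² f(x) - μ² rather than the Σ (x - μ)² f(x) form. The Computers worksheet values are not reproduced in this transcript, so the sketch below applies the same column-by-column steps to the JSL Appliances distribution from the earlier slides instead. The column comments map to the worksheet layout; the variable names are illustrative.

```python
import math

# Columns A and B: x and f(x). JSL Appliances values stand in for the
# Computers worksheet, whose data is not shown in this transcript.
x_values = [0, 1, 2, 3, 4]
f_values = [0.40, 0.25, 0.20, 0.05, 0.10]

# Column C: x * f(x); its sum is the expected value.
col_c = [x * f for x, f in zip(x_values, f_values)]
expected_value = sum(col_c)

# Column D: x squared. Column E: x squared times f(x).
col_d = [x * x for x in x_values]
col_e = [d * f for d, f in zip(col_d, f_values)]

# Variance = (sum of column E) - (mean squared); SD = square root of variance.
variance = sum(col_e) - expected_value ** 2
std_dev = math.sqrt(variance)

print(f"E(x) = {expected_value:.2f}, Var(x) = {variance:.2f}, SD = {std_dev:.4f}")
```

For this distribution the shortcut gives the same variance (1.66) as the deviation-based table, which is a useful cross-check on a spreadsheet.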
31
Draw some conclusions about the number of computers
owned by the Journal's subscribers.
  • The average Wall Street Journal reader has 1.42
    computers in their household.
  • The majority of households have 1 computer.
  • If we added and subtracted 2 standard deviations
    (.75) from the mean, we'd find that about 95% of
    households have roughly 0 to 3 computers.

32
Developing probability distributions for discrete
random variables
  • Experiments are conducted when performing an
    experimental study.
  • Experiments yield results: a dataset of x-values,
    each one considered to be a random value.
  • Some experiments have certain common
    characteristics that enable them to be classified as
    a binomial experiment.

33
The Binomial Experiment
  • Here are some examples of binomial experiments
  • Tossing a coin n times and seeing how many of
    those outcomes are heads.
  • A salesman calling on potential clients and
    seeing how many of those clients purchase his
    product.
  • Gathering a sample of newborn babies and seeing
    how many of them have a certain characteristic or
    gender.

34
Characteristics of a binomial experiment
  • The experiment must consist of a sequence
    repeating the same step n times (the step could
    be flipping a coin, making a sales call,
    determining if a baby has a certain
    characteristic or gender, etc.)
  • For each step, there are two possible outcomes,
    referred to as success or failure.
  • The probability of each outcome is the same at
    each step (so sampling must be done with
    replacement).
  • The steps are independent (the outcome at each
    step is not conditional on previous steps'
    outcomes).

35
Practice: Determine whether each of the given
experiments can be classified as a binomial
experiment
  • Surveying 1012 people and recording whether there
    is a "should not" response to this question: "Do
    you think the cloning of humans should or should
    not be allowed?"
  • Rolling a loaded die 50 times and finding the
    number of times that 5 occurs.
  • Determining whether each of 3000 heart pacemakers
    is acceptable or defective.
  • Spinning a roulette wheel 12 times and finding
    the number of times that the outcome is an odd
    number.

36
Answers
  • Yes
  • Yes
  • Yes
  • Yes

37
The random variable (x) in a binomial experiment
  • A binomial experiment can produce results (a data
    set of values for x) where x is the number of
    successes after n trials.
  • So if the experiment is "tossing a coin 10
    times," and one run of it gives this outcome
    HHHTTHTHTH, then x = 6 because we got heads
    (a success) 6 times.
  • We could just as easily have assigned tails to
    be the success.

38
Binomial experiments yield binomial distributions
  • A binomial experiment can produce results (a data
    set of values for x) that can be plotted into a
    frequency distribution that we refer to as a
    binomial frequency distribution.

39
The binomial frequency distribution
  • The binomial frequency distribution (using table
    format) would list all the possible outcomes of x
    and their probability of occurrence.
  • So, if our experiment is to toss a coin 10 times,
    our frequency distribution might look like this
  • # of times Heads    Probability
  • 0     .001
  • 1     .010
  • 2     .044
  • 3     .117
  • Etc.
  • 10    .001

40
But we don't have to actually perform n number of
trials to get an actual data set
  • In a binomial experiment, we don't actually have
    to have observed frequencies (that would be
    obtained if we engaged in n number of trials or
    have a sample size of n).
  • All we need to know is the probability of 1
    success. In this case, all we need to know is the
    probability of getting heads on one trial, which
    would be 50%.
  • We can use the binomial formula to calculate
    the probability of getting 0 heads, 1, 2, 3, etc.
    on 10 trials.

41
The binomial formula
f(x) = [n! / (x! (n - x)!)] × p^x × q^(n - x)
where n = number of trials
x = number of successes among n trials
p = probability of success in any one trial
q = probability of failure in any one trial (q = 1 - p)
42
Let's start with: what is the probability of
getting 3 heads (out of 10 tosses)?
f(3) = [10! / (3! 7!)] × (.5)^3 × (.5)^7
= [(10 × 9 × 8) / (3 × 2 × 1)] × .125 × .007813
= 120 × .000977
= .1172 or 11.72%
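
The 3-heads calculation can be checked with a small Python sketch of the binomial formula. The function name binomial_pmf is an illustrative choice; math.comb supplies the combinations count.

```python
from math import comb

def binomial_pmf(x, n, p):
    """Probability of exactly x successes in n independent trials."""
    q = 1 - p  # probability of failure on any one trial
    return comb(n, x) * p**x * q ** (n - x)

# 3 heads out of 10 tosses of a fair coin.
ways = comb(10, 3)                      # number of sequences with 3 heads
p_one_sequence = 0.5**3 * 0.5**7        # probability of one specific sequence
p_three_heads = binomial_pmf(3, 10, 0.5)

print(ways)                     # 120
print(f"{p_one_sequence:.6f}")  # 0.000977
print(f"{p_three_heads:.4f}")   # 0.1172
```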
43
A closer examination of this formula
  • Notice the combinations formula within the
    binomial probability formula?
  • That's because this part: (.5)^3 × (.5)^(10 - 3)
  • is the probability of getting one specific
    sequence of 3 heads and 7 tails (e.g.,
    HHHTTTTTTT)
  • This part: (10 × 9 × 8) / (3 × 2 × 1)
  • calculates the number of different combinations
    that result in 3 heads and 7 tails

44
What we have so far
  • X P(x)
  • 0 ?
  • 1 ?
  • 2 ?
  • 3 11.72%
  • 4 ?
  • 5 ?
  • 6 ?
  • 7 ?
  • 8 ?
  • 9 ?
  • 10 ?

The probability distribution for getting x heads
out of 10 trials.
45
Using Excel
  • The binomial probability formula is one of
    Excel's built-in formulas.
  • To calculate the probabilities of the other 10
    possible outcomes, let's use Excel.
  • Open the worksheet binomial and follow the steps
    on the next slides.

46
Open the file DataSetsForCH5 and click the
worksheet Coin Toss
47
1. Enter the n, p, and q values into cells: C3 =
10, C4 = .5, C5 = 1-C4
48
2. Position the cursor in B8. 3. To use Excel's
BINOMDIST( ) formula, click the fx button on the
formula bar to start the formula wizard.
49
4. In the category box, select Statistical,
5. In the select a function box, select
binomdist and click ok.
50
6. Enter the indicated value as function
arguments. Use absolute addressing where
appropriate. 7. Click ok.
51
8. See the formula result appear in cell B8. That
is the probability of getting 0 heads out of 10
trials.
52
9. Copy the formula in B8 down to B18 to
calculate the probabilities for the other 10
possible values for x.
53
10. Use the % button on the formatting toolbar
to format the values to percentage and use the
increase decimal places button to specify 2
decimal places. This table shows the probability
of each possible outcome of the experiment.
54
11. Sum the probabilities to make sure they sum
to 100%: B19 = SUM(B8:B18)
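
Outside Excel, the same table can be built with a short Python sketch. Here binomial_pmf is an illustrative helper that plays the role of BINOMDIST( ) with its cumulative argument set to FALSE.

```python
from math import comb

def binomial_pmf(x, n, p):
    """Probability of exactly x successes in n independent trials."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

n, p = 10, 0.5  # 10 coin tosses, probability of heads on each toss

# One row per possible number of heads, 0 through 10.
distribution = {x: binomial_pmf(x, n, p) for x in range(n + 1)}

for x, prob in distribution.items():
    print(f"{x:2d}  {prob:7.2%}")

# The probabilities of all possible outcomes should sum to 100%.
print(f"total  {sum(distribution.values()):.2%}")
```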
55
Obtaining a mean, variance, and standard
deviation from a binomial distribution
  • Recall that we could derive a mean, variance,
    and std. deviation from a (generic) probability
    distribution.
  • When the probability distribution is
    binomial, the formulas can be simplified to the
    following
  • Expected Value: E(x) = μ = np
  • Variance: Var(x) = σ² = np(1 - p)
  • Standard Deviation: σ = √(np(1 - p))

56
Calculate the expected value or mean: H3 = C3 * C4
57
Calculate the variance: H4 = C3 * C4 * C5
58
Calculate the standard deviation: H5 = SQRT(H4)
Resave the file.
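
As a sanity check, the simplified binomial formulas can be compared against the general Σ formulas from earlier in the chapter. This is a sketch for the coin-toss case (n = 10, p = .5); the variable names are illustrative.

```python
import math
from math import comb

n, p = 10, 0.5
q = 1 - p

# Simplified formulas for a binomial distribution.
mean = n * p
variance = n * p * q
std_dev = math.sqrt(variance)

# General formulas applied to the full binomial distribution, for comparison.
pmf = {x: comb(n, x) * p**x * q ** (n - x) for x in range(n + 1)}
general_mean = sum(x * f for x, f in pmf.items())
general_variance = sum((x - general_mean) ** 2 * f for x, f in pmf.items())

print(mean, variance, round(std_dev, 4))
print(general_mean, general_variance)
```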
59
Here is the frequency distribution. Notice its
symmetric shape, which a binomial distribution has
when p = .5. But as p gets larger (making q
smaller), the distribution becomes skewed to the
left.
60
Binomial probability practice 1
  • A multiple-choice test has six questions and each
    has 4 possible answers, one of which is correct.
  • A. If you completely guess at the answer to each
    question, what is the probability of getting
    (exactly) the first two wrong and the last four
    right (hint: use the multiplication rule)?
  • B. Begin with WWCCCC and make a complete list of
    all the different possible arrangements of two
    wrong and four right. What is the probability of
    getting each of these arrangements?
  • C. What is the probability of getting exactly two
    wrong and four right? (In other words, what is
    the probability of getting any of these
    arrangements?)

61
a. If you completely guess at the answer to each
question, what is the probability of getting the
first two wrong and the last four right (hint:
use the multiplication rule)?
  • P(C) = .25
  • P(W) = .75
  • P(WWCCCC) = .75 × .75 × .25 × .25 × .25 × .25
  • P(WWCCCC) = .00220

62
b. Begin with WWCCCC and make a complete list of
all the different possible arrangements of two
wrong and four right. Find the probability of
each arrangement.
WWCCCC  WCWCCC  WCCWCC  WCCCWC  WCCCCW
CWWCCC  CWCWCC  CWCCWC  CWCCCW  CCWWCC
CCWCWC  CCWCCW  CCCWWC  CCCWCW  CCCCWW
There are 15 ways to get 2 wrong and 4 right.
63
b. What is the probability of each of these
arrangements?
  • In each case, the probability is (.75)^2 × (.25)^4
  • Or .00220

64
c. What is the probability of getting any of
these arrangements?
n = 6, x = 2
p = .75 (the probability of guessing wrong)
q = .25 (the probability of guessing right)
f(2) = [6! / (2! 4!)] × (.75)^2 × (.25)^4 = 15 × .00220 = .0330
(.75)^2 × (.25)^4 is the probability of getting any specific sequence
of 2 Ws and 4 Cs
6! / (2! 4!) is the same as 15 (the number of ways of getting 2 Ws
and 4 Cs)
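
Parts a through c can be tied together in one Python sketch: list every arrangement, compute the probability of one arrangement, and multiply. It uses itertools.combinations to choose which two questions are wrong; the variable names are illustrative.

```python
from itertools import combinations
from math import comb

n_questions, n_wrong = 6, 2
p_wrong, p_correct = 0.75, 0.25

# Part b: every arrangement of 2 wrong (W) and 4 correct (C) answers.
arrangements = []
for wrong_positions in combinations(range(n_questions), n_wrong):
    arrangements.append(
        "".join("W" if i in wrong_positions else "C" for i in range(n_questions))
    )

# Parts a and b: each specific arrangement has the same probability.
p_each = p_wrong**n_wrong * p_correct ** (n_questions - n_wrong)

# Part c: probability of getting any one of the arrangements.
p_total = len(arrangements) * p_each

print(len(arrangements), arrangements)
print(f"{p_each:.5f}")   # 0.00220
print(f"{p_total:.4f}")  # 0.0330
```

Counting the listed arrangements gives the same 15 as the combinations formula 6! / (2! 4!).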
65
Binomial probability practice 2
  • If you run a red traffic light at an intersection
    equipped with a camera monitor, there is a .1
    probability that you will be given a ticket. If
    you run a red traffic light at this intersection
    five different times, what is the probability of
    getting at least one ticket?

66
The possible outcomes
  • The (6) possible outcomes are
  • 0 tickets
  • 1 ticket
  • 2 tickets
  • 3 tickets
  • 4 tickets
  • 5 tickets

(0 tickets is the "None" outcome; 1 through 5 tickets together make up "At least 1")
67
Remember how we solved it by finding the
probability of the complement?
  • If A is (getting at least one ticket)
  • Then not A is (getting 0 tickets)
  • Rather than find the P(A), find the P(not A) and
    subtract it from 1
  • P(0 tickets)
  • = (.9)^5 = (.9)(.9)(.9)(.9)(.9) = .59
  • 1 - .59 = .41
  • Or 41%

68
Now we know how to solve it this way
  • P(at least one)
  • = P(1) + P(2) + P(3) + P(4) + P(5)
  • Where p = .1 and q = .9
  • Refer to worksheet NumberOfTickets
  • Use Excel's BINOMDIST( ) to find the
    probabilities and sum them.
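
Both routes to P(at least one ticket) can be compared in a short Python sketch. The helper binomial_pmf is illustrative and stands in for BINOMDIST( ) with cumulative set to FALSE.

```python
from math import comb

def binomial_pmf(x, n, p):
    """Probability of exactly x successes in n independent trials."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

n, p = 5, 0.1  # 5 red lights run, .1 chance of a ticket each time

# Complement approach: 1 - P(0 tickets).
p_complement = 1 - binomial_pmf(0, n, p)

# Direct approach: P(1) + P(2) + P(3) + P(4) + P(5).
p_sum = sum(binomial_pmf(x, n, p) for x in range(1, n + 1))

print(f"{p_complement:.2f}")  # 0.41
print(f"{p_sum:.2f}")         # 0.41
```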

70
Binomial probability practice 3
  • Assume several samples of 6 AA flights were
    selected at random and the average on-time
    arrival rate of these flights was .723 (72.3%)
  • Based on this success rate of 72.3%, create a
    frequency distribution.
  • Find the probability that at most two American
    Airlines flights arrive on time.
  • Find the probability that at least one AA flight
    arrives on time.
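
One possible way to set up this practice problem in Python is sketched below, assuming n = 6 flights per sample and p = .723 as stated on the slide. The helper and variable names are illustrative, and this is not the worksheet solution from the original presentation.

```python
from math import comb

def binomial_pmf(x, n, p):
    """Probability of exactly x successes in n independent trials."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

n, p = 6, 0.723  # 6 flights, 72.3% on-time arrival rate

# Frequency distribution: probability of x on-time arrivals for x = 0..6.
distribution = {x: binomial_pmf(x, n, p) for x in range(n + 1)}
for x, prob in distribution.items():
    print(f"{x}  {prob:.4f}")

# At most two on time: P(0) + P(1) + P(2).
p_at_most_two = sum(distribution[x] for x in range(0, 3))

# At least one on time: 1 - P(0), using the complement.
p_at_least_one = 1 - distribution[0]

print(f"P(at most 2)  = {p_at_most_two:.4f}")
print(f"P(at least 1) = {p_at_least_one:.4f}")
```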
