Probability and Sampling Distributions

1
Chapter 4
  • Probability and Sampling Distributions

2
Random Variable
  • Definition: A random variable is a variable whose
    value is a numerical outcome of a random
    phenomenon.
  • The statistic calculated from a randomly chosen
    sample is an example of a random variable.
  • We don't know the exact outcome beforehand.
  • A statistic from a random sample will take
    different values if we take more samples from the
    same population.

3
Section 4.4
  • The Sampling Distribution of a Sample Mean

4
Introduction
  • A statistic from a random sample will take
    different values if we take more samples from the
    same population
  • The values of a statistic do not vary haphazardly
    from sample to sample but have a regular pattern
    in many samples
  • We already saw the sampling distribution
  • We're going to discuss an important sampling
    distribution: the sampling distribution of the
    sample mean, x-bar.

5
Example
  • Suppose that we are interested in the workout
    times of ISU students at the Recreation center.
  • Let's assume that µ is the average workout time
    of all ISU students.
  • To estimate µ, let's take a simple random sample
    of 100 students at ISU.
  • We will record each student's workout time (x).
  • Then we find the average workout time for the 100
    students.
  • The population mean µ is the parameter of
    interest.
  • The sample mean, x-bar, is the statistic (which is
    a random variable).
  • Use x-bar to estimate µ (this seems like a sensible
    thing to do).

6
Example
  • An SRS should be a fairly good representation of
    the population, so x-bar should be somewhere
    near µ.
  • x-bar from an SRS is an unbiased estimate of µ due
    to the randomization.
  • We don't expect x-bar to be exactly equal to µ.
  • There is variability in x-bar from sample to
    sample.
  • If we take another simple random sample (SRS) of
    100 students, then the x-bar will probably be
    different.
  • Why, then, can I use the results of one sample to
    estimate µ?

7
Statistical Estimation
  • If x-bar is rarely exactly right and varies from
    sample to sample, why is x-bar a reasonable
    estimate of the population mean µ?
  • Answer: if we keep on taking larger and larger
    samples, the statistic x-bar is guaranteed to get
    closer and closer to the parameter µ.
  • We have the comfort of knowing that if we can
    afford to keep on measuring more subjects,
    eventually we will estimate the mean amount of
    workout time for ISU students very accurately.

8
The Law of Large Numbers
  • Law of Large Numbers (LLN)
  • Draw independent observations at random from any
    population with finite mean µ.
  • As the number of observations drawn increases,
    the mean x-bar of the observed values gets closer
    and closer to the mean µ of the population.
  • If n is the sample size, then as n gets large,
    x-bar approaches µ (see the sketch below).
  • The Law of Large Numbers holds for any
    population, not just for special classes such as
    Normal distributions
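A minimal simulation sketch of the Law of Large Numbers (assuming Python with
NumPy; the values µ = 25 and σ = 7 simply echo the odor-threshold example
later in this chapter): the running mean of the observations settles toward µ
as n grows.

    import numpy as np

    rng = np.random.default_rng(0)
    mu, sigma = 25, 7                       # population mean and standard deviation
    draws = rng.normal(mu, sigma, size=10_000)

    # Running mean of the first n observations, for n = 1, 2, ..., 10000
    running_mean = np.cumsum(draws) / np.arange(1, draws.size + 1)

    for n in (10, 100, 1_000, 10_000):
        print(f"n = {n:>6}: x-bar = {running_mean[n - 1]:.3f}")
    # The printed means drift ever closer to mu = 25 as n increases.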

9
Example
  • Suppose we have a bowl with 21 small pieces of
    paper inside. Each paper is labeled with a number
    0-20. We will draw several random samples out of
    the bowl of size n and record the sample means,
    x-bar for each sample.
  • What is the population?
  • Since we know the values for each individual in
    the population (i.e., for each paper in the bowl),
    we can actually calculate the value of µ, the
    true population mean: µ = 10.
  • Draw a random sample of size n = 1.
  • Calculate x-bar for this sample.

10
Example
  • Draw a second random sample of size n = 5.
    Calculate x-bar for this sample.
  • Draw a third random sample of size n = 10.
    Calculate x-bar for this sample.
  • Draw a fourth random sample of size n = 15.
    Calculate x-bar for this sample.
  • Draw a fifth random sample of size n = 20.
    Calculate x-bar for this sample.
  • What can we conclude about the value of x-bar as
    the sample size increases? (See the simulation
    sketch below.)
  • THIS IS CALLED THE LAW OF LARGE NUMBERS.
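A quick sketch of the bowl experiment (assuming Python with NumPy): the bowl is
the integers 0 through 20, so µ = 10, and each draw is an SRS taken without
replacement.

    import numpy as np

    rng = np.random.default_rng(1)
    bowl = np.arange(21)          # papers labeled 0-20; the population mean is 10

    for n in (1, 5, 10, 15, 20):
        sample = rng.choice(bowl, size=n, replace=False)   # SRS of size n
        print(f"n = {n:>2}: x-bar = {sample.mean():.2f}")
    # Larger samples tend to give x-bar values closer to mu = 10.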

11
Another Example
  • Example: Suppose we know that the average height
    of all high school students in Iowa is 5.70
    feet.
  • We draw SRSs from the population and calculate
    the mean height x-bar for each sample.

(Figure: mean of the first n observations, plotted against n)
12
Example 4.21 From Book
  • Sulfur compounds such as dimethyl sulfide (DMS)
    are sometimes present in wine
  • DMS causes "off-odors" in wine, so winemakers
    want to know the odor threshold
  • What is the lowest concentration of DMS that the
    human nose can detect?
  • Different people have different thresholds, so we
    start by asking about the mean threshold µ in the
    population of all adults
  • µ is a parameter that describes this population

13
Example 4.21 From Text
  • To estimate µ, we present tasters with both
    natural wine and the same wine spiked with DMS at
    different concentrations to find the lowest
    concentration at which they can identify the
    spiked wine
  • The odor thresholds for 10 randomly chosen
    subjects (in micrograms/liter)
  • 28 40 28 33 20 31 29 27 17 21
  • The mean threshold for these subjects is 27.4
    micrograms per liter (computed in the sketch below)
  • x-bar is a statistic calculated from this sample
  • A statistic, such as the mean of a random sample
    of 10 adults, is a random variable.
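A minimal check of that sample mean (assuming Python with NumPy):

    import numpy as np

    thresholds = np.array([28, 40, 28, 33, 20, 31, 29, 27, 17, 21])  # micrograms/liter
    print(thresholds.mean())   # 27.4 -- the statistic x-bar for this sample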

14
Example
  • Suppose µ = 25 is the true value of the parameter
    we seek to estimate
  • The first subject had threshold 28, so the line
    starts there
  • The second point is the mean of the first two
    subjects
  • This process continues many, many times, and our
    line begins to settle around µ = 25

15
Example 4.21 From Book
The law of large numbers in action: as we take
more observations, the sample mean always
approaches the mean of the population
16
The Law of Large Numbers
  • The law of large numbers is the foundation of
    business enterprises such as casinos and
    insurance companies
  • The winnings (or losses) of a gambler on a few
    plays are uncertain -- that's why gambling is
    exciting(?)
  • But, the house plays tens of thousands of times
  • So the house, unlike individual gamblers, can
    count on the long-run regularity described by the
    Law of Large Numbers
  • The average winnings of the house on tens of
    thousands of plays will be very close to the mean
    of the distribution of winnings
  • Hence, the LLN guarantees the house a profit!

17
Thinking about the Law of Large Numbers
  • The Law of Large Numbers says broadly that the
    average results of many independent observations
    are stable and predictable
  • A grocery store deciding how many gallons of milk
    to stock and a fast-food restaurant deciding how
    many beef patties to prepare can predict demand
    even though their customers make independent
    decisions
  • The Law of Large Numbers says that the many
    individual decisions will produce a stable result

18
The Law of Small Numbers or Averages
  • The Law of Large Numbers describes the regular
    behavior of chance phenomena in the long run
  • Many people believe in an incorrect law of small
    numbers
  • We falsely expect even short sequences of random
    events to show the kind of average behavior that
    in fact appears only in the long run

19
The Law of Small Numbers or Averages
  • Example: Pretend you have an average free-throw
    success rate of 70%. One day on the free-throw
    line, you miss 8 shots in a row. Should you hit
    the next shot by the mythical "law of averages"?
  • No. The law of large numbers tells us that the
    long-run average will be close to 70%. Missing 8
    shots in a row simply means you are having a bad
    day. 8 shots is hardly the long run.
    Furthermore, the law of large numbers says
    nothing about the next event. It only tells us
    what will happen if we keep track of the long-run
    average.

20
The Hot Hand Debate
  • In some sports, if a player makes several
    consecutive good plays, like a few good golf
    shots in a row, they often claim to have the "hot
    hand," which generally implies that their next
    shot is likely to be a good one.
  • Studies suggest that runs of good or bad golf
    shots are no more frequent in golf than would be
    expected if each shot were independent of the
    player's previous shots
  • Players perform consistently, not in streaks
  • Our perception of hot or cold streaks simply
    shows that we don't perceive random behavior very
    well!

21
The Gambling Hot Hand
  • Gamblers often follow the hot-hand theory,
    betting that a lucky run will continue
  • At other times, however, they draw the opposite
    conclusion when confronted with a run of outcomes
  • If a coin gives 10 straight heads, some gamblers
    feel that it must now produce some extra tails to
    get back into the average of half heads and half
    tails
  • Not true! If the next 10,000 tosses give about
    50% tails, those 10 straight heads will be
    swamped by the later thousands of heads and
    tails.
  • No short run compensation is needed to get back
    to the average in the long run.

22
Need for Law of Large Numbers
  • Our inability to accurately distinguish random
    behavior from systematic influences points out
    the need for statistical inference to supplement
    exploratory analysis of data
  • Probability calculations can help verify that
    what we see in the data is more than a random
    pattern

23
How Large is a Large Number?
  • The Law of Large Numbers says that the actual
    mean outcome of many trials gets close to the
    distribution mean µ as more trials are made
  • It doesn't say how many trials are needed to
    guarantee a mean outcome close to µ
  • That depends on the variability of the random
    outcomes
  • The more variable the outcomes, the more trials
    are needed to ensure that the mean outcome x-bar
    is close to the distribution mean µ

24
More Laws of Large Numbers
  • The Law of Large Numbers is one of the central
    facts about probability
  • LLN explains why gambling, casinos, and insurance
    companies make money
  • LLN assures us that statistical estimation will
    be accurate if we can afford enough observations
  • The basic Law of Large Numbers applies to
    independent observations that all have the same
    distribution
  • Mathematicians have extended the law to many more
    general settings

25
What if Observations are not Independent?
  • You are in charge of a process that manufactures
    video screens for computer monitors
  • Your equipment measures the tension on the metal
    mesh that lies behind each screen and is critical
    to its image quality
  • You want to estimate the mean tension µ for the
    process by the average x-bar of the measurements
  • The tension measurements are not independent

26
AYK 4.82
  • Use the Law of Large Numbers applet on the
    textbook website

27
Sampling Distributions
  • The Law of Large Numbers assures us that if we
    measure enough subjects, the statistic x-bar will
    eventually get very close to the unknown
    parameter µ

28
Sampling Distributions
  • What if we dont have a large sample?
  • Take a large number of samples of the same size
    from the same population
  • Calculate the sample mean for each sample
  • Make a histogram of the sample means
  • The histogram of values of the statistic
    approximates the sampling distribution that we
    would see if we kept on sampling forever (see the
    sketch below)
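A small sketch of that recipe (assuming Python with NumPy, and a Normal
population with µ = 25 and σ = 7 purely for illustration): take many samples of
the same size, compute each sample mean, and summarize their distribution.

    import numpy as np

    rng = np.random.default_rng(2)
    mu, sigma, n, reps = 25, 7, 10, 5_000

    samples = rng.normal(mu, sigma, size=(reps, n))   # many samples of the same size
    xbars = samples.mean(axis=1)                      # one sample mean per sample

    print(xbars.mean())        # close to mu = 25
    print(xbars.std(ddof=0))   # close to sigma / sqrt(n) = 7 / sqrt(10), about 2.21
    # A histogram of xbars approximates the sampling distribution of x-bar.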

29
  • The idea of a sampling distribution is the
    foundation of statistical inference
  • The laws of probability can tell us about
    sampling distributions without the need to
    actually choose or simulate a large number of
    samples

30
Mean and Standard Deviation of a Sample Mean
  • Suppose that x-bar is the mean of an SRS of size n
    drawn from a large population with mean µ and
    standard deviation σ
  • The mean of the sampling distribution of x-bar is
    µ and its standard deviation is σ/√n
  • Notice: averages are less variable than
    individual observations!

31
Mean and Standard Deviation of a Sample Mean
  • The mean of the statistic x-bar is always the
    same as the mean µ of the population
  • The sampling distribution of x-bar is centered at
    µ
  • In repeated sampling, x-bar will sometimes fall
    above the true value of the parameter µ and
    sometimes below, but there is no systematic
    tendency to overestimate or underestimate the
    parameter
  • Because the mean of x-bar is equal to µ, we say
    that the statistic x-bar is an unbiased estimator
    of the parameter µ

32
Mean and Standard Deviation of a Sample Mean
  • An unbiased estimator is correct on the average
    in many samples
  • How close the estimator falls to the parameter in
    most samples is determined by the spread of the
    sampling distribution
  • If individual observations have standard
    deviation σ, then sample means x-bar from samples
    of size n have standard deviation σ/√n
  • Again, notice that averages are less variable
    than individual observations

33
Mean and Standard Deviation of a Sample Mean
  • Not only is the standard deviation of the
    distribution of x-bar smaller than the standard
    deviation of individual observations, but it gets
    smaller as we take larger samples
  • The results of large samples are less variable
    than the results of small samples
  • Remember, we divided by the square root of n

34
Mean and Standard Deviation of a Sample Mean
  • If n is large, the standard deviation of x-bar is
    small and almost all samples will give values of
    x-bar that lie very close to the true parameter µ
  • The sample mean from a large sample can be
    trusted to estimate the population mean
    accurately
  • Notice that the standard deviation of the sampling
    distribution gets smaller only at the rate √n
  • To cut the standard deviation of x-bar in half,
    we must take four times as many observations, not
    just twice as many (the square root of 4 is 2)

35
Example
  • Suppose we take samples of size 15 from a
    distribution with mean 25 and standard deviation
    7
  • For the distribution of x-bar:
  • The mean of x-bar is 25
  • The standard deviation of x-bar is
    7/√15 ≈ 1.80739 (see the check below)
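A one-line check of that standard deviation (assuming Python):

    import math

    sigma, n = 7, 15
    print(sigma / math.sqrt(n))   # 1.80739..., the standard deviation of x-bar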

36
What About Shape?
  • We have described the center and spread of the
    sampling distribution of a sample mean x-bar, but
    not its shape
  • The shape of the distribution of x-bar depends on
    the shape of the population distribution

37
Sampling Distribution of a Sample Mean
  • If a population has the N(µ, σ) distribution,
    then the sample mean x-bar of n independent
    observations has the N(µ, σ/√n)
    distribution

38
Example
  • Adults differ in the smallest amount of dimethyl
    sulfide they can detect in wine
  • Extensive studies have found that the DMS odor
    threshold of adults follows roughly a Normal
    distribution with mean µ = 25 micrograms per
    liter and standard deviation σ = 7 micrograms per
    liter

39
Example
  • Because the population distribution is Normal,
    the sampling distribution of x-bar is also Normal
  • If n = 10, what is the distribution of x-bar?
    (A sketch of the calculation follows.)
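Using the rule from the previous slides, x-bar has the N(25, 7/√10)
distribution; a quick check of that standard deviation (assuming Python):

    import math

    mu, sigma, n = 25, 7, 10
    print(sigma / math.sqrt(n))   # about 2.21, so x-bar ~ N(25, 2.21)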

40
What if the Population Distribution is not
Normal?
  • As the sample size increases, the distribution of
    x-bar changes shape
  • The distribution looks less like that of the
    population and more like a Normal distribution
  • When the sample is large enough, the distribution
    of x-bar is very close to Normal
  • This result is true no matter what the shape of
    the population distribution is, as long as the
    population has a finite standard deviation σ

41
Central Limit Theorem
  • Draw an SRS of size n from any population with
    mean µ and finite standard deviation σ
  • When n is large, the sampling distribution of the
    sample mean x-bar is approximately Normal
  • x-bar is approximately N(µ, σ/√n)
    (see the simulation sketch below)
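A small simulation sketch of the central limit theorem (assuming Python with
NumPy): the population here is Exponential(1), which is clearly non-Normal, yet
the means of samples of size 70 are approximately N(µ, σ/√n).

    import numpy as np

    rng = np.random.default_rng(3)
    mu = sigma = 1.0            # an Exponential(1) population has mean 1 and sd 1
    n, reps = 70, 10_000

    xbars = rng.exponential(mu, size=(reps, n)).mean(axis=1)

    print(xbars.mean())         # close to mu = 1
    print(xbars.std(ddof=0))    # close to sigma / sqrt(70), about 0.12
    # A histogram of xbars is roughly Normal even though the population is skewed.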

42
Central Limit Theorem
  • More general versions of the central limit
    theorem say that the distribution of a sum or
    average of many small random quantities is close
    to Normal
  • The central limit theorem suggests why the Normal
    distributions are common models for observed data

43
How Large a Sample is Needed?
  • Sample Size depends on whether the population
    distribution is close to Normal
  • We require more observations if the shape of the
    population distribution is far from Normal

44
Example
  • The time X that a technician requires to perform
    preventive maintenance on an air-conditioning
    unit is governed by the Exponential distribution
    (Figure 4.17(a)) with mean time µ = 1 hour and
    standard deviation σ = 1 hour
  • Your company operates 70 of these units
  • By the central limit theorem, the distribution of
    the mean time your company spends on preventive
    maintenance is approximately N(1, 1/√70) ≈ N(1, 0.12)

45
Example
  • What is the probability that the average
    maintenance time for your company's units exceeds
    50 minutes?
  • 50/60 ≈ 0.83 hour
  • So we want to know P(x-bar > 0.83)
  • Use the Normal distribution calculations we learned
    in Chapter 2! (A sketch of the calculation follows.)
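A sketch of that Normal calculation (assuming Python with SciPy), using the
approximate N(1, 1/√70) sampling distribution from the previous slide:

    import math
    from scipy.stats import norm

    mu, sigma, n = 1.0, 1.0, 70
    sd_xbar = sigma / math.sqrt(n)       # about 0.12 hour
    threshold = 50 / 60                  # 50 minutes, about 0.83 hour

    print((threshold - mu) / sd_xbar)                  # z-score, about -1.39
    print(norm.sf(threshold, loc=mu, scale=sd_xbar))   # P(x-bar > 0.83), about 0.92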

46
4.86 ACT scores
  • The scores of students on the ACT college
    entrance examination in a recent year had the
    Normal distribution with mean µ = 18.6 and
    standard deviation σ = 5.9

47
4.86 ACT scores
  • What is the probability that a single student
    randomly chosen from all those taking the test
    scores 21 or higher?
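A sketch of this single-student calculation (assuming Python with SciPy):

    from scipy.stats import norm

    mu, sigma = 18.6, 5.9
    print(norm.sf(21, loc=mu, scale=sigma))   # P(X >= 21), about 0.34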

48
4.86 ACT scores
  • About 34% of students (from this population)
    scored a 21 or higher on the ACT
  • The probability that a single student randomly
    chosen from this population would have a score of
    21 or higher is 0.34

49
4.86 ACT scores
  • Now take a SRS of 50 students who took the test.
    What are the mean and standard deviation of the
    sample mean score x-bar of these 50 students?
  • Mean = 18.6 (same as µ)
  • Standard deviation = σ/√50 = 5.9/√50 ≈ 0.8344

50
4.86 ACT scores
  • What is the probability that the mean score x-bar
    of these students is 21 or higher?
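A sketch of the sample-mean calculation (assuming Python with SciPy), using the
standard deviation σ/√50 from the previous slide:

    import math
    from scipy.stats import norm

    mu, sigma, n = 18.6, 5.9, 50
    sd_xbar = sigma / math.sqrt(n)               # about 0.8344
    print(norm.sf(21, loc=mu, scale=sd_xbar))    # P(x-bar >= 21), about 0.002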

51
4.86 ACT scores
  • About 0.2% of all random samples of size 50
    (from this population) would have a mean score
    x-bar of 21 or higher.
  • The probability of having a mean score x-bar of
    21 or higher from a sample of 50 students (from
    this population) is 0.002.

52
Section 4.4 Summary
  • When we want information about the population
    mean µ for some variable, we often take a SRS and
    use the sample mean x-bar to estimate the unknown
    parameter µ.

53
Section 4.4 Summary
  • The Law of Large Numbers states that the actually
    observed mean outcome x-bar must approach the
    mean µ of the population as the number of
    observations increases.

54
Section 4.4 Summary
  • The sampling distribution of x-bar describes how
    the statistic x-bar varies in all possible
    samples of the same size from the same population.

55
Section 4.4 Summary
  • The mean of the sampling distribution is µ, so
    that x-bar is an unbiased estimator of µ.

56
Section 4.4 Summary
  • The standard deviation of the sampling
    distribution of x-bar is sigma over the square
    root of n for a SRS of size n if the population
    has standard deviation sigma. That is, averages
    are less variable than individual observations.

57
Section 4.4 Summary
  • If the population has a Normal distribution, so
    does x-bar.

58
Section 4.4 Summary
  • The Central Limit Theorem states that for large n
    the sampling distribution of x-bar is
    approximately Normal for any population with
    finite standard deviation sigma. That is,
    averages are more Normal than individual
    observations. We can use the fact that x-bar has
    a known Normal distribution to calculate
    approximate probabilities for events involving
    x-bar.