Chapter 8 Fundamental Sampling Distributions and Data Distributions - PowerPoint PPT Presentation

Provided by: iis72
1
Chapter 8 Fundamental Sampling Distributions and
Data Distributions
  • Wen-Hsiang Lu
  • Department of Computer Science and Information
    Engineering,
  • National Cheng Kung University
  • 2007/05/24

2
8.1 Random Sampling
  • Outcome of a statistical experiment:
  • Numerical value: the total value of a pair of
    dice tossed
  • Descriptive representation: blood types in a
    blood test
  • Sampling from distributions or populations
  • Sample mean and sample variance
  • The use of high-speed computers enhances formal
    statistical inference with graphical
    techniques.

3
Random Sampling
  • Definition 8.1 A population consists of the
    totality of the observations with which we are
    concerned.
  • Finite size: 600 students classified according
    to blood type form a population of size 600.
  • Infinite size: e.g., measurements of atmospheric
    pressure; some populations are so large that they
    are treated as infinite.
  • Each observation in a population is a value of a
    random variable X having some probability
    distribution f(x).
  • If one is inspecting items coming off an assembly
    line for defects, then each observation in the
    population might be a value 0 or 1 of the
    binomial random variable X with probability
    distribution $b(x; 1, p) = p^x (1-p)^{1-x}$,
    $x = 0, 1$, where 0 indicates a nondefective
    item and 1 indicates a defective item.

4
Random Sampling
  • Sometimes, it is impossible or impractical to
    observe the entire set of observations that make
    up the population.
  • Definition 8.2 A sample is a subset of a
    population.
  • For inferences from the sample to the population
    to be valid, we must obtain representative
    samples.
  • Bias: selecting the most convenient members of
    the population leads to erroneous inferences.
  • Random sample: observations made independently
    and at random.

5
Random Sampling
  • Definition 8.3 Let $X_1, X_2, \ldots, X_n$ be n
    independent random variables, each having the
    same probability distribution f(x). We then
    define $X_1, X_2, \ldots, X_n$ to be a random
    sample of size n from the population f(x) and
    write its joint probability distribution as
    $f(x_1, x_2, \ldots, x_n) = f(x_1) f(x_2) \cdots f(x_n)$.
  • If we assume the population of battery lives to
    be normal, the possible values of any random
    variable $X_i$, $i = 1, 2, \ldots, 8$, in a random
    sample will be precisely the same as those in the
    original population, and hence $X_i$ has the same
    normal distribution as X.

6
8.2 Some Important Statistics
  • Definition 8.4 Any function of the random
    variables constituting a random sample is called
    a statistic.
  • Definition 8.5 If $X_1, X_2, \ldots, X_n$ represent a
    random sample of size n, then the sample mean is
    defined by the statistic
    $\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$.
  • Definition 8.6 If $X_1, X_2, \ldots, X_n$ represent a
    random sample of size n, then the sample variance
    is defined by the statistic
    $S^2 = \frac{1}{n-1}\sum_{i=1}^{n} (X_i - \bar{X})^2$.

7
Some Important Statistics
  • Example 8.1 A comparison of coffee prices at 4
    randomly selected grocery stores in San Diego
    showed increases from the previous month of 12,
    15, 17, and 20 cents for a 1-pound bag. Find the
    variance of this random sample of price
    increases.
  • Solution
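A minimal Python sketch of Definitions 8.5 and 8.6 applied to the Example 8.1 price increases. It assumes integer-valued data so that exact `Fraction` arithmetic can be used; the function names are illustrative, not from the slides.

```python
from fractions import Fraction

def sample_mean(xs):
    """Sample mean (Definition 8.5). Assumes integer observations for exact arithmetic."""
    return Fraction(sum(xs), len(xs))

def sample_variance(xs):
    """Sample variance (Definition 8.6): squared deviations summed, divided by n - 1."""
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two observations")
    xbar = sample_mean(xs)
    return sum((x - xbar) ** 2 for x in xs) / (n - 1)

increases = [12, 15, 17, 20]       # cents, Example 8.1
print(sample_mean(increases))      # 16
print(sample_variance(increases))  # 34/3
```

The deviations from the mean 16 are -4, -1, 1, 4, so the sum of squares is 34 and $s^2 = 34/3 \approx 11.33$ (cents squared).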

8
Some Important Statistics
  • Theorem 8.1 If $S^2$ is the variance of a random
    sample of size n, we may write
    $S^2 = \frac{1}{n(n-1)}\left[n\sum_{i=1}^{n} X_i^2 - \left(\sum_{i=1}^{n} X_i\right)^2\right]$.
  • Proof

9
Some Important Statistics
  • Definition 8.7 The sample standard deviation,
    denoted by S, is the positive square root of the
    sample variance.
  • Example 8.2 Find the variance of the data 3, 4,
    5, 6, 6, and 7, representing the number of trout
    caught by a random sample of 6 fishermen.
  • Solution
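A sketch of the Theorem 8.1 computing formula applied to the Example 8.2 trout counts, followed by the standard deviation of Definition 8.7. The helper name `variance_shortcut` is hypothetical, and integer data are assumed so the result stays an exact fraction.

```python
import math
from fractions import Fraction

def variance_shortcut(xs):
    """Sample variance via the computing formula of Theorem 8.1 (integer data assumed)."""
    n = len(xs)
    sum_x = sum(xs)
    sum_x2 = sum(x * x for x in xs)
    return Fraction(n * sum_x2 - sum_x ** 2, n * (n - 1))

trout = [3, 4, 5, 6, 6, 7]   # Example 8.2
s2 = variance_shortcut(trout)
s = math.sqrt(s2)            # Definition 8.7: positive square root of S^2
print(s2)                    # 13/6
print(round(s, 3))           # 1.472
```

Here $\sum x_i = 31$ and $\sum x_i^2 = 171$, so $s^2 = (6 \cdot 171 - 31^2)/(6 \cdot 5) = 65/30 = 13/6$.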

10
8.3 Data Displays and Graphical Methods
  • Motivation: use creative displays to extract
    information about the properties of a data set.
  • Stem-and-leaf plots give the viewer a look at
    the symmetry of the data.
  • Normal probability plots and quantile plots are
    used to check whether data follow a normal
    distribution.
  • Statistical analysis can be characterized as the
    process of drawing conclusions about system
    variability.
  • Statistics provide single measures, whereas a
    graphical display adds information in the form
    of a picture.

11
Box and Whisker Plot or Boxplot
  • A box-and-whisker plot encloses the interquartile
    range of the data in a box that has the median
    displayed within.
  • Interquartile range: the range between the 75th
    percentile (upper quartile) and the 25th
    percentile (lower quartile).
  • The boxplot gives the viewer information about
    outliers, which represent rare events.
  • Example 8.3 Nicotine content was measured in a
    random sample of 40 cigarettes. The data are
    displayed at right.
  • Mild outliers: 0.72, 0.85, and 2.55
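The quartiles, interquartile range, and outlier flags behind a boxplot can be sketched in Python as below. The slides do not state the outlier rule, so the common convention of 1.5 × IQR (mild) and 3 × IQR (extreme) beyond the quartiles is an assumption, as is the quartile method (`statistics.quantiles` with `method="exclusive"`). The data are hypothetical, since the Example 8.3 measurements appear only in a figure.

```python
import statistics

def boxplot_summary(data):
    """Quartiles, IQR, and flagged outliers.

    Assumes the 1.5*IQR (mild) / 3*IQR (extreme) fence convention.
    """
    q1, median, q3 = statistics.quantiles(data, n=4, method="exclusive")
    iqr = q3 - q1
    inner = (q1 - 1.5 * iqr, q3 + 1.5 * iqr)
    outer = (q1 - 3.0 * iqr, q3 + 3.0 * iqr)
    mild = [x for x in data
            if outer[0] <= x < inner[0] or inner[1] < x <= outer[1]]
    extreme = [x for x in data if x < outer[0] or x > outer[1]]
    return {"q1": q1, "median": median, "q3": q3, "iqr": iqr,
            "mild": mild, "extreme": extreme}

# Hypothetical data, not the Example 8.3 nicotine measurements.
sample = [1, 2, 3, 4, 5, 6, 7, 8, 9, 20, 100]
print(boxplot_summary(sample))
```

For this sample the quartiles are 3 and 9 (IQR 6), so 20 falls between the inner and outer fences (mild) while 100 lies beyond the outer fence (extreme).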

12
Box and Whisker Plot or Boxplot
13
Box and Whisker Plot or Boxplot
  • Example 8.4 Consider the following data,
    consisting of 30 samples measuring the thickness
    of paint can ears. Figure 8.2 depicts a box and
    whisker plot for this asymmetric set of data.

14
Quantile Plot
  • Quantile plot
  • Compare samples of data
  • Draw distinctions
  • Depict cumulative distribution function
  • Definition 8.8 A quantile of a sample, q(f), is
    a value for which a specified fraction f of the
    data values is less than or equal to q(f).
  • Sample median: q(0.5); 75th percentile: q(0.75);
    25th percentile: q(0.25)

15
Quantile Plot
  • In Figure 8.3, the quantile plot shows all
    observations.
  • Large clusters: slopes near zero
  • Sparse data: steeper slopes
  • E.g.,
  • Sparse data: 28-30
  • High density: 36-38
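A quantile plot pairs each ordered observation $y_{(i)}$ with a cumulative fraction $f_i$. The Python sketch below assumes the plotting position $f_i = (i - 3/8)/(n + 1/4)$, one common choice; the data are hypothetical. Clustered values produce nearly flat stretches and sparse values produce steep ones, matching the reading of Figure 8.3.

```python
def quantile_plot_points(data):
    """Pair each ordered observation y_(i) with a plotting fraction f_i.

    Assumes f_i = (i - 3/8) / (n + 1/4).
    """
    ys = sorted(data)
    n = len(ys)
    return [((i - 0.375) / (n + 0.25), y) for i, y in enumerate(ys, start=1)]

# Hypothetical measurements; f_i goes on the horizontal axis, y_(i) on the vertical.
points = quantile_plot_points([30, 28, 37, 36, 38, 29, 37, 36])
for f, y in points:
    print(f"{f:.3f}  {y}")
```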

16
Normal Quantile-Quantile Plot
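  • Approximation of the quantile of a normal
    distribution:
    $q_{\mu,\sigma}(f) = \mu + \sigma\{4.91[f^{0.14} - (1-f)^{0.14}]\}$
  • Definition 8.9 The normal quantile-quantile
    plot is a plot of $y_{(i)}$ (ordered observations)
    against $q_{0,1}(f_i)$, where $f_i = \frac{i - 3/8}{n + 1/4}$.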
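The normal quantile-quantile construction can be sketched with Python's standard library. The approximation coefficients (4.91 and 0.14) and the plotting position $f_i = (i - 3/8)/(n + 1/4)$ are the ones assumed in the definition above; `statistics.NormalDist().inv_cdf` is used only to show how close the approximation is to the exact standard normal quantile.

```python
from statistics import NormalDist

def approx_normal_quantile(f, mu=0.0, sigma=1.0):
    """q_{mu,sigma}(f) = mu + sigma * 4.91 * [f^0.14 - (1 - f)^0.14]."""
    return mu + sigma * 4.91 * (f ** 0.14 - (1 - f) ** 0.14)

def normal_qq_points(data):
    """Pairs (q_{0,1}(f_i), y_(i)) for a normal quantile-quantile plot."""
    ys = sorted(data)
    n = len(ys)
    fs = [(i - 0.375) / (n + 0.25) for i in range(1, n + 1)]
    return [(approx_normal_quantile(f), y) for f, y in zip(fs, ys)]

# Compare the approximation with the exact inverse CDF at a few fractions.
for f in (0.025, 0.5, 0.975):
    print(f, round(approx_normal_quantile(f), 3), round(NormalDist().inv_cdf(f), 3))
```

If the points from `normal_qq_points` lie close to a straight line, a normal model is plausible; the intercept and slope of that line then roughly estimate $\mu$ and $\sigma$.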

17
Normal Quantile-Quantile Plot
  • Construct a normal quantile-quantile plot and
    draw conclusions regarding whether or not it is
    reasonable to assume that the two samples are
    from the same $N(\mu, \sigma)$ distribution.
  • Solution
  • The plot is far from a straight line.
  • Station 1 reflects a few values in the lower tail
    of the distribution and several in the upper tail.
  • Conclusion: it is unlikely that the two samples
    come from the same normal distribution.

18
8.4 Sampling Distribution
  • Statistical inference is concerned with
    generalizations and predictions.
  • E.g., based on the opinions of several people
    interviewed on the street, one might conclude that
    in a forthcoming election 60% of the eligible
    voters in the city of Detroit favor a certain
    candidate.
  • Definition 8.10 The probability distribution of
    a statistic is called a sampling distribution.
  • E.g., the probability distribution of $\bar{X}$ is
    called the sampling distribution of the mean.
  • The sampling distribution of a statistic depends
    on the distribution of the population, the size of
    the samples, and the method of choosing the samples.

19
8.5 Sampling Distribution of Means
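  • Suppose that a random sample of n observations is
    taken from a normal population with mean $\mu$ and
    variance $\sigma^2$.
  • By the reproductive property of the normal
    distribution established in Theorem 7.11, $\bar{X}$
    has a normal distribution with mean
    $\mu_{\bar{X}} = \mu$ and variance $\sigma^2_{\bar{X}} = \sigma^2/n$.
  • Theorem 7.11 If $X_1, X_2, \ldots, X_n$ are independent
    random variables having normal distributions with
    means $\mu_1, \mu_2, \ldots, \mu_n$ and variances
    $\sigma_1^2, \sigma_2^2, \ldots, \sigma_n^2$, respectively,
    then the random variable
    $Y = a_1X_1 + a_2X_2 + \cdots + a_nX_n$ has a
    normal distribution with mean
    $\mu_Y = a_1\mu_1 + a_2\mu_2 + \cdots + a_n\mu_n$ and variance
    $\sigma_Y^2 = a_1^2\sigma_1^2 + a_2^2\sigma_2^2 + \cdots + a_n^2\sigma_n^2$.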
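The moment formulas of Theorem 7.11 can be sketched in a few lines of Python; setting every $a_i = 1/n$ with a common mean and variance recovers $\mu_{\bar{X}} = \mu$ and $\sigma^2_{\bar{X}} = \sigma^2/n$. The values n = 4, μ = 10, σ² = 9 are hypothetical, and the function name is illustrative.

```python
from fractions import Fraction

def linear_combination_moments(coeffs, means, variances):
    """Mean and variance of Y = sum(a_i * X_i) for independent X_i (Theorem 7.11)."""
    mean_y = sum(a * m for a, m in zip(coeffs, means))
    var_y = sum(a * a * v for a, v in zip(coeffs, variances))
    return mean_y, var_y

# X-bar is the special case a_i = 1/n with common mean mu and variance sigma^2.
n, mu, sigma2 = 4, 10, 9          # hypothetical values
coeffs = [Fraction(1, n)] * n
mean_xbar, var_xbar = linear_combination_moments(coeffs, [mu] * n, [sigma2] * n)
print(mean_xbar, var_xbar)        # 10 9/4
```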

20
Sampling Distribution of Means
  • Theorem 8.2 Central Limit Theorem: If $\bar{X}$ is the
    mean of a random sample of size n taken from a
    population with mean $\mu$ and finite variance $\sigma^2$,
    then the limiting form of the distribution of
    $Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}}$ as $n \to \infty$
    is the standard normal distribution n(z; 0, 1).
  • The normal approximation for $\bar{X}$ will generally
    be good if $n \ge 30$.
  • If $n < 30$, the approximation is good only if the
    population is not too different from a normal
    distribution.
  • If the population is known to be normal, the
    sampling distribution of $\bar{X}$ will follow a
    normal distribution exactly, no matter how small
    the size of the samples.
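A small simulation illustrates the Central Limit Theorem. The sketch below assumes a strongly skewed exponential population (for which $\mu = \sigma = 1/\text{rate}$), draws samples of size n = 30, and standardizes each sample mean; the standardized values should have mean near 0, standard deviation near 1, and roughly 95% of them inside (-1.96, 1.96). The population, sample size, and seed are illustrative choices.

```python
import math
import random
import statistics

def standardized_means(n, trials, rate=1.0, seed=0):
    """Simulate Z = (Xbar - mu) / (sigma / sqrt(n)) for an exponential population.

    An exponential population with this rate has mu = sigma = 1 / rate.
    """
    rng = random.Random(seed)
    mu = sigma = 1.0 / rate
    zs = []
    for _ in range(trials):
        xbar = statistics.fmean(rng.expovariate(rate) for _ in range(n))
        zs.append((xbar - mu) / (sigma / math.sqrt(n)))
    return zs

zs = standardized_means(n=30, trials=5000)
# Fraction of standardized means inside the central 95% region of N(0, 1).
inside = sum(-1.96 < z < 1.96 for z in zs) / len(zs)
print(round(statistics.fmean(zs), 3), round(statistics.stdev(zs), 3), round(inside, 3))
```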

21
Sampling Distribution of Means
  • Example 8.6 An electric firm manufactures light
    bulbs that have a mean life of 800 hours and
    a standard deviation of 40 hours. Find the
    probability that a random sample of 16 bulbs will
    have an average life of less than 775 hours.
  • Solution
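The Example 8.6 calculation can be sketched with `statistics.NormalDist`, treating $\bar{X}$ as normal with standard error $\sigma/\sqrt{n}$ as the theorems above describe.

```python
import math
from statistics import NormalDist

mu, sigma, n = 800, 40, 16    # hours, Example 8.6
se = sigma / math.sqrt(n)     # sigma_Xbar = 40 / 4 = 10
z = (775 - mu) / se           # standardized value
p = NormalDist().cdf(z)       # P(Xbar < 775) = P(Z < z)
print(z, round(p, 4))         # -2.5 0.0062
```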

22
Sampling Distribution of Means
  • Example 8.7 An engineer conjectures that the
    population mean diameter of a certain component
    part is 5.0 millimeters. An experiment is
    conducted in which 100 parts produced by the
    process are selected randomly and the diameter is
    measured on each. It is known that the population
    standard deviation is $\sigma = 0.1$ millimeter. The
    experiment indicates a sample average diameter of
    $\bar{x} = 5.027$ millimeters. Does this sample
    information appear to support or refute the
    engineer's conjecture?
  • Solution
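One way to quantify Example 8.7 is sketched below: compute $z$ for the observed mean and the probability, assuming $\mu = 5.0$, of a deviation from 5.0 at least as large as 0.027 in either direction. The two-sided framing is an assumption of this sketch.

```python
import math
from statistics import NormalDist

mu0, sigma, n, xbar = 5.0, 0.1, 100, 5.027   # Example 8.7
se = sigma / math.sqrt(n)                    # 0.01
z = (xbar - mu0) / se                        # about 2.7
# P(|Xbar - 5.0| >= 0.027) = 2 * P(Z >= |z|) if the conjecture mu = 5.0 holds.
p = 2 * (1 - NormalDist().cdf(abs(z)))
print(round(z, 2), round(p, 4))
```

A probability of roughly 0.007 means such a deviation would be rare if $\mu = 5.0$, so the sample casts doubt on the conjecture and suggests the mean exceeds 5.0.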

23
Sampling Distribution of the Difference Between
Two Averages
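  • Theorem 8.3 If independent samples of size $n_1$ and
    $n_2$ are drawn at random from two populations,
    discrete or continuous, with means $\mu_1$ and $\mu_2$ and
    variances $\sigma_1^2$ and $\sigma_2^2$, respectively, then the
    sampling distribution of the differences of
    means, $\bar{X}_1 - \bar{X}_2$, is approximately normally
    distributed with mean and variance given by
    $\mu_{\bar{X}_1 - \bar{X}_2} = \mu_1 - \mu_2$ and
    $\sigma^2_{\bar{X}_1 - \bar{X}_2} = \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}$.
    Hence
    $Z = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{\sigma_1^2/n_1 + \sigma_2^2/n_2}}$
    is approximately a standard normal variable.
  • Theorem 7.11 If $X_1, X_2, \ldots, X_n$ are independent
    random variables having normal distributions with
    means $\mu_1, \mu_2, \ldots, \mu_n$ and variances
    $\sigma_1^2, \sigma_2^2, \ldots, \sigma_n^2$, respectively,
    then the random variable
    $Y = a_1X_1 + a_2X_2 + \cdots + a_nX_n$ has a normal
    distribution with mean
    $\mu_Y = a_1\mu_1 + a_2\mu_2 + \cdots + a_n\mu_n$ and variance
    $\sigma_Y^2 = a_1^2\sigma_1^2 + a_2^2\sigma_2^2 + \cdots + a_n^2\sigma_n^2$.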

24
Sampling Distribution of the Difference Between
Two Averages
  • Example 8.8 Two independent experiments are
    being run in which two different types of paints
    are compared. Eighteen specimens are painted
    using type A and the drying time in hours is
    recorded on each. The same is done with type B.
    The population standard deviations are both known
    to be 1.0. Assuming that the mean drying time is
    equal for the two types of paint, find
  • Solution

25
Sampling Distribution of the Difference Between
Two Averages
  • Example 8.9 The television picture tubes of
    manufacturer A have a mean lifetime of 6.5 years
    and a standard deviation of 0.9 year, while those
    of manufacturer B have a mean lifetime of 6.0
    years and a standard deviation of 0.8 year. What
    is the probability that a random sample of 36
    tubes from manufacturer A will have a mean
    lifetime that is at least 1 year more than the
    mean lifetime of a sample of 49 tubes from
    manufacturer B?
  • Solution
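Example 8.9 can be sketched directly from Theorem 8.3: the difference of sample means has mean $\mu_A - \mu_B = 0.5$ and standard error $\sqrt{0.81/36 + 0.64/49}$. The helper name is hypothetical.

```python
import math
from statistics import NormalDist

def diff_of_means_prob_at_least(d, mu1, sd1, n1, mu2, sd2, n2):
    """P(Xbar1 - Xbar2 >= d) under the normal approximation of Theorem 8.3."""
    mean = mu1 - mu2
    se = math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)
    z = (d - mean) / se
    return z, 1 - NormalDist().cdf(z)

# Manufacturer A: mu = 6.5, sd = 0.9, n = 36; manufacturer B: mu = 6.0, sd = 0.8, n = 49.
z, p = diff_of_means_prob_at_least(1.0, 6.5, 0.9, 36, 6.0, 0.8, 49)
print(round(z, 2), round(p, 4))
```

The standard error is about 0.189, giving $z \approx 2.65$ and a probability of about 0.004.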

26
Sampling Distribution of S2
  • If a random sample of size n is taken from a
    normal population with mean $\mu$ and variance $\sigma^2$,
    and the sample variance is computed, we obtain a
    value of the statistic $S^2$.

Corollary If $X_1, X_2, \ldots, X_n$ are independent
random variables having identical normal
distributions with mean $\mu$ and variance $\sigma^2$, then
$Y = \sum_{i=1}^{n}\left(\frac{X_i - \mu}{\sigma}\right)^2$ has a chi-squared
distribution with $v = n$ degrees of freedom.
27
Sampling Distribution of S2
  • Theorem 8.4 If $S^2$ is the variance of a random
    sample of size n taken from a normal population
    having the variance $\sigma^2$, then the statistic
    $\chi^2 = \frac{(n-1)S^2}{\sigma^2} = \sum_{i=1}^{n}\frac{(X_i - \bar{X})^2}{\sigma^2}$
    has a chi-squared distribution with $v = n - 1$
    degrees of freedom.
  • It is customary to let $\chi^2_\alpha$ represent the $\chi^2$
    value above which we find an area of $\alpha$. This is
    illustrated by the shaded region in Figure 8.10.
  • Table A.5

28
(No Transcript)
29
Sampling Distribution of S2
  • Example 8.10 A manufacturer of car batteries
    guarantees that his batteries will last, on the
    average, 3 years with a standard deviation of 1
    year. If five of these batteries have lifetimes
    of 1.9, 2.4, 3.0, 3.5, and 4.2 years, is the
    manufacturer still convinced that his batteries
    have a standard deviation of 1 year? Assume that
    the battery lifetime follows a normal
    distribution.
  • Solution
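A sketch of the Example 8.10 check under Theorem 8.4. The critical values 0.484 and 11.143 are the standard chi-squared table entries $\chi^2_{0.975}$ and $\chi^2_{0.025}$ for $v = 4$; the closed-form CDF $1 - e^{-x/2}(1 + x/2)$, valid only for 4 degrees of freedom, is included to confirm those table values.

```python
import math
import statistics

lifetimes = [1.9, 2.4, 3.0, 3.5, 4.2]   # years, Example 8.10
sigma0 = 1.0                            # claimed population standard deviation
n = len(lifetimes)
s2 = statistics.variance(lifetimes)     # sample variance with n - 1 divisor
chi2 = (n - 1) * s2 / sigma0 ** 2       # Theorem 8.4 statistic, v = n - 1 = 4

# Chi-squared table values for v = 4: chi2_0.975 and chi2_0.025.
lower, upper = 0.484, 11.143

def chi2_cdf_df4(x):
    """Closed-form chi-squared CDF for exactly 4 degrees of freedom."""
    return 1 - math.exp(-x / 2) * (1 + x / 2)

print(round(s2, 3), round(chi2, 2), lower < chi2 < upper)
print(round(chi2_cdf_df4(lower), 3), round(chi2_cdf_df4(upper), 3))  # approximately 0.025 and 0.975
```

With $s^2 = 0.815$ the statistic is $\chi^2 = 3.26$, which lies inside the central 95% region, so this sketch gives no reason to doubt $\sigma = 1$.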

30
Degrees of Freedom As a Measure of Sample
Information
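  • Comparison
  • Theorem 7.12: $\sum_{i=1}^{n}\frac{(X_i - \mu)^2}{\sigma^2}$ has a $\chi^2$
    distribution with n degrees of freedom.
  • Theorem 8.4: $\sum_{i=1}^{n}\frac{(X_i - \bar{X})^2}{\sigma^2}$ has a $\chi^2$
    distribution with n - 1 degrees of freedom (when
    $\mu$ is not known, a degree of freedom is lost in
    the estimation of $\mu$, i.e., $\mu$ is replaced by $\bar{x}$).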
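The lost degree of freedom can be seen by simulation. Because a chi-squared variable's mean equals its degrees of freedom, averaging the two statistics over many normal samples should give values near n and n - 1. The sample size, number of trials, and seed below are illustrative.

```python
import random
import statistics

def average_chi_statistics(n, trials, mu=0.0, sigma=1.0, seed=1):
    """Average sum((X_i - mu)^2)/sigma^2 and sum((X_i - Xbar)^2)/sigma^2 over normal samples."""
    rng = random.Random(seed)
    known, estimated = [], []
    for _ in range(trials):
        xs = [rng.gauss(mu, sigma) for _ in range(n)]
        xbar = statistics.fmean(xs)
        known.append(sum((x - mu) ** 2 for x in xs) / sigma ** 2)
        estimated.append(sum((x - xbar) ** 2 for x in xs) / sigma ** 2)
    return statistics.fmean(known), statistics.fmean(estimated)

# Expect averages close to n = 5 (mu known) and n - 1 = 4 (mu estimated by x-bar).
known_avg, estimated_avg = average_chi_statistics(n=5, trials=20000)
print(round(known_avg, 2), round(estimated_avg, 2))
```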

31
t-Distribution
  • Central Limit Theorem (Theorem 8.2)
  • $\sigma$ might not be known.
  • Consider $T = \frac{\bar{X} - \mu}{S/\sqrt{n}}$.
  • In developing the sampling distribution of T, we
    shall assume that our random sample was selected
    from a normal population.

32
t-Distribution
  • Theorem 8.5 Let Z be a standard normal random
    variable and V a chi-squared random variable with
    v degrees of freedom. If Z and V are independent,
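    then the distribution of the random variable T,
    where $T = \frac{Z}{\sqrt{V/v}}$, is given by the density
    function
    $h(t) = \frac{\Gamma[(v+1)/2]}{\Gamma(v/2)\sqrt{\pi v}}\left(1 + \frac{t^2}{v}\right)^{-(v+1)/2}$, $-\infty < t < \infty$.
    This is known as the t-distribution with v degrees of
    freedom.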
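The Theorem 8.5 density translates directly into Python using `math.gamma`. The sketch pairs it with composite Simpson's rule (a numerical-integration choice made here, not something the slides use) to check that, with v = 10, the area between -2.228 and 2.228 is about 0.95.

```python
import math

def t_pdf(t, v):
    """Density of the t-distribution with v degrees of freedom (Theorem 8.5)."""
    c = math.gamma((v + 1) / 2) / (math.gamma(v / 2) * math.sqrt(math.pi * v))
    return c * (1 + t * t / v) ** (-(v + 1) / 2)

def simpson(f, a, b, steps=2000):
    """Composite Simpson's rule on [a, b]; steps must be even."""
    h = (b - a) / steps
    total = f(a) + f(b)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3

# With v = 10, the central area between -2.228 and 2.228 should be close to 0.95.
area = simpson(lambda t: t_pdf(t, 10), -2.228, 2.228)
print(round(area, 3))
```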

33
t-Distribution
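  • Corollary Let $X_1, X_2, \ldots, X_n$ be independent
    random variables that are all normal with mean $\mu$
    and standard deviation $\sigma$. Let
    $\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$ and
    $S^2 = \frac{1}{n-1}\sum_{i=1}^{n} (X_i - \bar{X})^2$.
    Then the random variable $T = \frac{\bar{X} - \mu}{S/\sqrt{n}}$
    has a t-distribution with $v = n - 1$ degrees of
    freedom.
  • Student t-distribution
  • The probability distribution of T was first
    published in 1908 in a paper by W. S. Gosset.
  • Gosset was employed by an Irish brewery that
    disallowed publication of research by its staff.
  • He therefore published his work secretly under
    the name "Student".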

34
t-Distribution
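  • T is similar to Z: both are symmetric about a
    mean of zero and bell-shaped.
  • Difference between T and Z: the variance of T is
    greater than 1 (it equals $v/(v-2)$ for $v > 2$)
    and depends on the sample size n.
  • T and Z become the same as $n \to \infty$.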

35
t-Distribution
  • The t-value with 10 degrees of freedom leaving an
    area of 0.025 to the right is $t = 2.228$.
  • The t-distribution is symmetric about 0, so
    $t_{1-\alpha} = -t_\alpha$.
  • Example 8.11 The t-value with $v = 14$ degrees of
    freedom that leaves an area of 0.025 to the left,
    and therefore an area of 0.975 to the right, is
    $t_{0.975} = -t_{0.025} = -2.145$.
  • Example 8.12 $P(-t_{0.025} < T < t_{0.05}) = 1 - 0.05 - 0.025 = 0.925$.
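Table values such as $t_{0.025} = 2.228$ (v = 10) and $t_{0.025} = 2.145$ (v = 14) can be reproduced numerically. This sketch integrates the t density from 0 with Simpson's rule, uses symmetry to get the upper-tail area, and finds $t_\alpha$ by bisection. The numerical approach and helper names are choices made for illustration, not the textbook's method.

```python
import math

def t_pdf(t, v):
    """Density of the t-distribution with v degrees of freedom."""
    c = math.gamma((v + 1) / 2) / (math.gamma(v / 2) * math.sqrt(math.pi * v))
    return c * (1 + t * t / v) ** (-(v + 1) / 2)

def upper_tail(t, v, steps=2000):
    """P(T > t) for t >= 0: by symmetry, 0.5 minus the density integrated over [0, t]."""
    h = t / steps
    total = t_pdf(0.0, v) + t_pdf(t, v)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, v)
    return 0.5 - total * h / 3

def t_critical(alpha, v, lo=0.0, hi=50.0):
    """Bisection for t_alpha with P(T > t_alpha) = alpha, assuming 0 < alpha < 0.5."""
    for _ in range(60):
        mid = (lo + hi) / 2
        if upper_tail(mid, v) > alpha:
            lo = mid   # tail still too large, so t_alpha lies further right
        else:
            hi = mid
    return (lo + hi) / 2

t10 = t_critical(0.025, 10)
t14 = t_critical(0.025, 14)
# By symmetry, t_0.975 = -t_0.025.
print(round(t10, 3), round(t14, 3), round(-t14, 3))
```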

36
t-Distribution
37
t-Distribution
38
t-Distribution
  • Example 8.13 Find k such that
    $P(k < T < -1.761) = 0.045$ for a random sample of
    size 15 selected from a normal distribution and
    $T = \frac{\bar{X} - \mu}{S/\sqrt{n}}$.
  • Solution
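With v = 15 - 1 = 14, -1.761 is $-t_{0.05}$, so an interval with area 0.045 needs a left tail of 0.005 at k, i.e., $k = -t_{0.005} = -2.977$ from the t-table. The sketch below checks that answer by integrating the t density numerically (Simpson's rule is an assumption of this sketch).

```python
import math

def t_pdf(t, v):
    """Density of the t-distribution with v degrees of freedom."""
    c = math.gamma((v + 1) / 2) / (math.gamma(v / 2) * math.sqrt(math.pi * v))
    return c * (1 + t * t / v) ** (-(v + 1) / 2)

def simpson(f, a, b, steps=2000):
    """Composite Simpson's rule on [a, b]; steps must be even."""
    h = (b - a) / steps
    total = f(a) + f(b)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3

v = 15 - 1                      # degrees of freedom for n = 15
k = -2.977                      # candidate answer: -t_0.005 with v = 14
prob = simpson(lambda t: t_pdf(t, v), k, -1.761)
print(round(prob, 3))           # should be close to the required 0.045
```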

39
t-Distribution
  • Exactly 95% of the values of a t-distribution
    with $v = n - 1$ degrees of freedom lie between
    $-t_{0.025}$ and $t_{0.025}$.
  • A t-value that falls below $-t_{0.025}$ or above
    $t_{0.025}$ would tend to make us believe that either
    a very rare event has taken place or our
    assumption about $\mu$ is in error.
  • Example 8.14 An engineer claims that the
    population mean yield of a process is 500 grams.
    To check this claim he samples 25 batches each
    month. If the computed t-value falls between
    $-t_{0.05}$ and $t_{0.05}$, he is satisfied with his claim.
    What conclusion should he draw from a sample that
    has a mean of $\bar{x}$ grams and a sample standard
    deviation of $s = 40$ grams? Assume the
    distribution of yields to be approximately
    normal.
  • Solution

40
t-Distribution
  • The t-distribution is used extensively in
    problems that deal with:
  • Inference about the population mean
  • Comparative samples (two sample means)
  • The use of the t-distribution for the statistic T
    requires that $X_1, X_2, \ldots, X_n$ be normal.

41
F-Distribution
  • The F-distribution finds enormous application in
    comparing sample variances.
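  • Theorem 8.6 Let U and V be two independent
    random variables having chi-squared distributions
    with $v_1$ and $v_2$ degrees of freedom, respectively.
    Then the distribution of the random variable
    $F = \frac{U/v_1}{V/v_2}$ is given by the density
    $h(f) = \frac{\Gamma[(v_1+v_2)/2]\,(v_1/v_2)^{v_1/2}}{\Gamma(v_1/2)\,\Gamma(v_2/2)} \cdot \frac{f^{v_1/2 - 1}}{(1 + v_1 f/v_2)^{(v_1+v_2)/2}}$
    for $f > 0$, and $h(f) = 0$ for $f \le 0$. This is
    known as the F-distribution with $v_1$ and $v_2$
    degrees of freedom.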

42
F-Distribution
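  • Theorem 8.7 Writing $f_\alpha(v_1, v_2)$ for $f_\alpha$ with $v_1$
    and $v_2$ degrees of freedom, we obtain
    $f_{1-\alpha}(v_1, v_2) = \frac{1}{f_\alpha(v_2, v_1)}$.
  • E.g., the f-value with 6 and 10 degrees of freedom,
    leaving an area of 0.95 to the right, is
    $f_{0.95}(6, 10) = \frac{1}{f_{0.05}(10, 6)} = \frac{1}{4.06} = 0.246$.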
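The Theorem 8.6 density and the Theorem 8.7 reciprocal relation can be checked numerically. The sketch assumes the F-table value $f_{0.05}(10, 6) = 4.06$ and integrates the density with Simpson's rule (a numerical choice made here); the area to the right of 4.06 under F(10, 6) should be about 0.05, and the area to the right of 1/4.06 under F(6, 10) about 0.95.

```python
import math

def f_pdf(x, v1, v2):
    """Density of the F-distribution with v1 and v2 degrees of freedom (Theorem 8.6)."""
    if x <= 0:
        return 0.0
    c = (math.gamma((v1 + v2) / 2) * (v1 / v2) ** (v1 / 2)
         / (math.gamma(v1 / 2) * math.gamma(v2 / 2)))
    return c * x ** (v1 / 2 - 1) / (1 + v1 * x / v2) ** ((v1 + v2) / 2)

def simpson(f, a, b, steps=4000):
    """Composite Simpson's rule on [a, b]; steps must be even."""
    h = (b - a) / steps
    total = f(a) + f(b)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3

f_05 = 4.06        # f_0.05(10, 6) from the F table
f_95 = 1 / f_05    # Theorem 8.7: f_0.95(6, 10) = 1 / f_0.05(10, 6)
right_of_f05 = 1 - simpson(lambda x: f_pdf(x, 10, 6), 0.0, f_05)
right_of_f95 = 1 - simpson(lambda x: f_pdf(x, 6, 10), 0.0, f_95)
print(round(f_95, 3), round(right_of_f05, 3), round(right_of_f95, 3))
```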

43
F-Distribution
(Figure: F-table excerpt; $f_{0.05}(10, 6) = 4.06$)
44
F-Distribution with Two Sample Variances
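  • Suppose that random samples of size $n_1$ and $n_2$ are
    selected from two normal populations with
    variances $\sigma_1^2$ and $\sigma_2^2$. Let
    $\chi_1^2 = \frac{(n_1 - 1)S_1^2}{\sigma_1^2}$ and
    $\chi_2^2 = \frac{(n_2 - 1)S_2^2}{\sigma_2^2}$
    be random variables having chi-squared distributions
    with $v_1 = n_1 - 1$ and $v_2 = n_2 - 1$ degrees of
    freedom. Using Theorem 8.6, we obtain the
    following result.
  • Theorem 8.8 If $S_1^2$ and $S_2^2$ are the variances of
    independent random samples of size $n_1$ and $n_2$
    taken from normal populations with variances $\sigma_1^2$
    and $\sigma_2^2$, respectively, then
    $F = \frac{S_1^2/\sigma_1^2}{S_2^2/\sigma_2^2} = \frac{\sigma_2^2 S_1^2}{\sigma_1^2 S_2^2}$
    has an F-distribution with $v_1 = n_1 - 1$ and $v_2 = n_2 - 1$
    degrees of freedom.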
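Theorem 8.8 can be illustrated by simulation: draw pairs of normal samples with different population variances, form $(S_1^2/\sigma_1^2)/(S_2^2/\sigma_2^2)$, and compare the average ratio with the F mean $v_2/(v_2 - 2)$, a known property valid for $v_2 > 2$. Sample sizes, variances, trial count, and seed are hypothetical choices.

```python
import random
import statistics

def sample_var(xs):
    """Sample variance with the n - 1 divisor (Definition 8.6)."""
    m = statistics.fmean(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

def simulate_variance_ratio(n1, n2, sigma1, sigma2, trials, seed=2):
    """Return (S1^2/sigma1^2) / (S2^2/sigma2^2) for many pairs of normal samples."""
    rng = random.Random(seed)
    ratios = []
    for _ in range(trials):
        s1_sq = sample_var([rng.gauss(0.0, sigma1) for _ in range(n1)])
        s2_sq = sample_var([rng.gauss(0.0, sigma2) for _ in range(n2)])
        ratios.append((s1_sq / sigma1 ** 2) / (s2_sq / sigma2 ** 2))
    return ratios

# Hypothetical settings: n1 = n2 = 11, so v1 = v2 = 10 and the F mean is 10 / 8 = 1.25.
ratios = simulate_variance_ratio(11, 11, sigma1=3.0, sigma2=0.5, trials=20000)
print(round(statistics.fmean(ratios), 3))
```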

45
F-Distribution
  • Suppose we wish to determine whether the
    population means are equal.
  • The normal distribution applies nicely for the
    two-sample situation.
  • However, what about three samples?
  • The F-distribution is called the variance ratio
    distribution.
  • Whether the sample averages could have occurred by
    chance depends on the variability within samples,
    as quantified by $S_A^2$, $S_B^2$, and $S_C^2$.
  • The notion of the important components of
    variability is best seen through some simple
    graphics.

46
Analysis of Variance with F-Distribution
  • Two key sources of variability
  • Variability within samples
  • Variability between samples
  • If the variability within samples is considerably
    larger than the variability between samples,
    there will be considerable overlap in the sample
    data and a signal that the data could all have
    come from a common distribution.
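The comparison of between-sample and within-sample variability is usually summarized by the one-way analysis-of-variance ratio: the between-sample mean square (k - 1 degrees of freedom) divided by the within-sample mean square (N - k degrees of freedom), which follows an F-distribution when all samples come from a common normal distribution. The sketch below computes that ratio for three hypothetical samples A, B, and C.

```python
import statistics

def one_way_anova_f(groups):
    """Between-sample mean square divided by within-sample mean square."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = statistics.fmean(x for g in groups for x in g)
    ss_between = sum(len(g) * (statistics.fmean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((x - statistics.fmean(g)) ** 2 for g in groups for x in g)
    ms_between = ss_between / (k - 1)        # k - 1 degrees of freedom
    ms_within = ss_within / (n_total - k)    # N - k degrees of freedom
    return ms_between / ms_within

# Hypothetical samples A, B, C.
f_ratio = one_way_anova_f([[1, 2, 3], [2, 3, 4], [3, 4, 5]])
print(f_ratio)  # 3.0
```

A ratio near or below 1 means the within-sample variability dominates, consistent with a common distribution; a large ratio points to real differences between the sample means.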

47
Exercise
  • 1, 14, 17, 29, 41, 51, 59