Module Four: Normal distribution and its applications to inter-laboratory testing - PowerPoint PPT Presentation

Slides: 31
Provided by: katij
Learn more at: http://www.cst.cmich.edu
Transcript and Presenter's Notes

1
  • Module Four: Normal distribution and its
    applications to inter-laboratory testing
  • When we conduct inter-laboratory testing, we
    often observe continuous variables.
  • For example, the amount of chloride in a water
    sample, the beta-carotene level in a blood
    sample, and blood pressure are all continuous
    variables.
  • When we construct a relative frequency histogram
    of such a variable, the distribution is very
    often bell-shaped: a few values are small, a
    few are large, and most are close to the
    average.
  • This type of distribution is called a NORMAL
    distribution.
  • For example, blood pressure, the beta-carotene
    level in a blood sample, and the amount of
    chloride in a water sample mostly follow normal
    curves.

2
A histogram with a superimposed normal curve for
the systolic blood pressure of 1900 individuals
  • The superimposed smooth curve is bell-shaped.
    Suppose the blood pressure follows a normal curve
    with mean 115 and s.d. 14.
  • We use the notation X ~ N(μ, σ).
  • For this case, X ~ N(115, 14).
  • An immediate question is: how can we detect
    whether the distribution indeed follows a normal
    curve?

μ = 115, σ = 14
Our interest may be to check whether the blood
pressure follows a normal distribution, to find
out what proportion of individuals have blood
pressure in the at-risk range (150 mmHg or
higher), or to identify extreme cases.
3
  • When and how do you use the normal distribution
    in real-world situations?
  • The normal curve describes the probability of
    occurrence in many real situations.
  • Most statistical techniques, including those
    used for analyzing inter-laboratory testing
    data, assume that the response variable
    approximately follows a normal curve.
  • These methods may not be valid if the response
    does not follow a normal distribution. It is
    therefore important to learn how to check
    whether a response variable follows a normal
    distribution. For this reason, we need to learn
    some basic properties of a normal distribution
    and how to compute its probabilities and
    percentiles.
  • In this module, we will discuss:
  • The use of the z-table and Minitab to compute
    probabilities and percentiles.
  • Techniques for checking whether a response
    variable follows a normal distribution.

4
  • The normal probability distribution provides a
    good model for describing data that have
    mound-shaped frequency distributions.
  • The Normal Probability Distribution
    f(x) = (1 / (σ √(2π))) e^(-(x - μ)² / (2σ²)),
    for -∞ < x < ∞
  • where e ≈ 2.718 and π ≈ 3.142; μ and σ (σ > 0)
    are the parameters that represent the
    population mean and standard deviation.
  • We will use the notation X ~ N(μ, σ). This
    means that X is distributed as normal with mean
    μ and standard deviation σ.
  • Some examples of normal random variables are
    X = adult height, X = scores on a national
    test, X = gas price, X = blood pressure.
  • NOTE: X = salary of individuals who are 40 years
    or older and not yet retired does not follow a
    normal curve. Its distribution is skewed to the
    right.

5
  • Properties of the Normal Distribution
  • This figure shows three such distributions with
    differing values of μ and σ.
  • The mean determines the center. In this case,
    μ1 < μ2 < μ3.
  • The standard deviation measures the variability.
    In this case, σ2 < σ1 < σ3.
  • Large values of σ reduce the height of the curve
    and increase the spread.
  • Small values of σ increase the height of the
    curve and reduce the spread.

[Figure: three normal curves centered at μ1, μ2, μ3,
with standard deviations σ1, σ2, σ3]
6
Some properties for X ~ N(μ, σ)
7
  • Example
  • Every year, universities recruit students using
    their SAT scores. Based on previous information,
    we know that SAT scores follow a normal curve
    with mean 1000 and standard deviation 180. In
    the past, CMU has admitted students with an SAT
    score of 1090 or higher.
  • Q1: What percentage of high school students can
    receive CMU admission?
  • Q2: Suppose CMU decides to raise the SAT
    admission limit so that only the top 20% of high
    school graduates are admitted. What should the
    new SAT admission limit be?
  • Q3: A student scored 1200 and claims he is in
    the top 10%. Is this claim correct?

8
Tabulated Areas of the Normal Probability
Distribution
  • How do you solve the SAT admission problem?
  • First, we need to rewrite the problem using
    notation we are familiar with.
  • Let X = SAT score. Then, from the given
    information, we know
  • X ~ N(1000, 180).
  • Q1 asks for P(X > 1090).
  • Q2 asks for a value of X, call it x0, the
    admission limit, such that
  • P(X > x0) = 0.2.
  • Q3 asks us to compare P(X > 1200) with 0.1.
  • How do we solve these problems?
  • The probability that a continuous random variable
    X assumes a value in the interval from a to b is
    the area under the probability density function
    between the points a and b.
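The "area under the density" idea can be checked numerically. The sketch below is only an illustration, not part of the original module: it uses Python's standard-library NormalDist for the SAT distribution, and the interval from 1000 to 1090 and the helper name area_under_pdf are choices made here for the example.

```python
from statistics import NormalDist

def area_under_pdf(dist, a, b, steps=10_000):
    """Approximate the area under dist.pdf between a and b (trapezoid rule)."""
    width = (b - a) / steps
    total = 0.5 * (dist.pdf(a) + dist.pdf(b))
    for i in range(1, steps):
        total += dist.pdf(a + i * width)
    return total * width

sat = NormalDist(mu=1000, sigma=180)   # X ~ N(1000, 180) from the SAT example
a, b = 1000, 1090                      # illustrative interval, chosen for this sketch

area = area_under_pdf(sat, a, b)       # area under the density curve
exact = sat.cdf(b) - sat.cdf(a)        # same probability from the cumulative function
print(round(area, 4), round(exact, 4))
```

Both numbers agree to several decimal places, which is the sense in which P(a < X < b) "is" the area under the curve.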

9
  • One can use software such as Minitab, or use a
    standardized Z-table.
  • The Standard Normal Random Variable
  • The standardized normal random variable Z is
    defined as
  • z = (x - μ)/σ, or equivalently, x = μ + zσ.
  • The standard normal probability distribution has
    a mean of zero and a standard deviation of 1,
    that is, Z ~ N(0, 1).
  • The area under the standard normal curve between
    the mean z = 0 and a specified positive value of
    z, say z0, is the probability P(0 < Z < z0).
  • Some books use this type of table; others use
    other types of tables.

10
Back to the SAT score problem
X ~ N(1000, 180); find P(X > 1090).
[Figure: normal curve with two axes. X (SAT score):
1000 and 1090. Z = (x - 1000)/180:
(1000 - 1000)/180 = 0 and (1090 - 1000)/180 = 0.5]
The idea is to transform X ~ N(μ, σ) to Z ~ N(0, 1)
using z = (x - μ)/σ:
P(X > 1090) = P(Z > (1090 - 1000)/180) = P(Z > 0.5).
Now the Z-table can be applied.
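As a cross-check on the standardization step, here is a minimal Python sketch (standard library only; the variable names are made up for this example). It computes P(X > 1090) once through Z and once directly, and also evaluates P(X > 1200) for the comparison with 0.1 that Q3 asks about.

```python
from statistics import NormalDist

mu, sigma = 1000, 180                  # SAT example: X ~ N(1000, 180)
standard = NormalDist()                # Z ~ N(0, 1)

z = (1090 - mu) / sigma                # standardized admission limit
p_via_z = 1 - standard.cdf(z)          # P(Z > 0.5)
p_direct = 1 - NormalDist(mu, sigma).cdf(1090)   # P(X > 1090) without standardizing

p_1200 = 1 - NormalDist(mu, sigma).cdf(1200)     # P(X > 1200), for Q3

print(z, round(p_via_z, 4), round(p_direct, 4), round(p_1200, 4))
```

The two routes give the same probability, roughly 0.31. P(X > 1200) comes out above 0.1, so a score of 1200 does not fall in the top 10% under this model.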
11
(No Transcript)
12
  • Example: Find P(0 < Z < 1.63).
  • Solution
  • Draw a normal curve and shade the area of
    interest.
  • Rewrite the question in a form to which the
    Z-table can be applied, that is, in the form
  • P(0 < Z < z0).
  • This example is already in this form, so using
    the Z-table we obtain P(0 < Z < 1.63) = 0.4484.
  • Some additional exercises:
  • Find P(Z < 1.96). Find P(-1.24 < Z < 0.68). Find
    P(Z > -1.64).
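One way to check table lookups like these is to mimic the table in code. The sketch below assumes a table that lists P(0 < Z < z0); the helper table_area is a hypothetical name introduced here, and the other probabilities are built from it using the symmetry of the curve and the fact that each half has area 0.5.

```python
from statistics import NormalDist

Z = NormalDist()   # standard normal: mean 0, s.d. 1

def table_area(z0):
    """P(0 < Z < z0) for z0 >= 0, the quantity listed in this style of Z-table."""
    return Z.cdf(z0) - 0.5

p_example = table_area(1.63)                      # P(0 < Z < 1.63)
p_less = 0.5 + table_area(1.96)                   # P(Z < 1.96)
p_between = table_area(1.24) + table_area(0.68)   # P(-1.24 < Z < 0.68), by symmetry
p_greater = 0.5 + table_area(1.64)                # P(Z > -1.64), by symmetry

for p in (p_example, p_less, p_between, p_greater):
    print(round(p, 4))
```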

13
  • Calculating Probabilities for a General Normal
    Random Variable, X
  • 1. Draw a normal curve for X and shade the area
    of interest.
  • 2. Transform X to Z.
  • - Standardize the interval of interest, writing
    it as the equivalent interval in terms of z.
  • - The probability of interest is the area that
    you find using the standard normal probability
    distribution.

14
Now, back to the SAT example. Do the following
exercises. The SAT score, X, follows a normal
distribution with mean 1000 and s.d. 180, that
is, X ~ N(1000, 180).
  • Find P(X < 800).
  • Find P(750 < X < 900).
  • Find P(1180 < X < 1360).
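The standardize-then-look-up procedure can be wrapped in a small function, which is handy for checking hand calculations on these exercises. This is a sketch using Python's standard library; normal_probability is a hypothetical helper written for this example, not something from Minitab or the module.

```python
from statistics import NormalDist

def normal_probability(mu, sigma, low=None, high=None):
    """P(low < X < high) for X ~ N(mu, sigma), computed through Z = (X - mu)/sigma."""
    Z = NormalDist()
    upper = 1.0 if high is None else Z.cdf((high - mu) / sigma)
    lower = 0.0 if low is None else Z.cdf((low - mu) / sigma)
    return upper - lower

p_a = normal_probability(1000, 180, high=800)             # P(X < 800)
p_b = normal_probability(1000, 180, low=750, high=900)    # P(750 < X < 900)
p_c = normal_probability(1000, 180, low=1180, high=1360)  # P(1180 < X < 1360)

print(round(p_a, 4), round(p_b, 4), round(p_c, 4))
```

The last interval standardizes to 1 < Z < 2, so its probability is the area between one and two standard deviations above the mean.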
15
  • How about the question of determining the SAT
    admission score for CMU so that the top 20% will
    receive admission from CMU?
  • Answer: X ~ N(1000, 180). The problem is to find
    the admission score x0 such that
  • P(X > x0) = 0.2.
  • In this problem we are looking for a score, not
    a probability, so we reverse the
    problem-solving procedure.
  • A similar technique is applied here:
  • Draw a normal curve and shade the area of
    interest.
  • Transform from X to Z.
  • Rewrite the problem in terms of Z.
  • Solve for the standardized value z0 by using
    the Z-table in reverse.
  • Transform z0 back to x0 by x0 = μ + σ(z0).

16
To solve for the admission score x0 such that
P(X > x0) = 0.2: draw the normal curve, shade the
area of interest, and transform to Z.
0.2 = P(X > x0) = P(Z > z0) implies
P(0 < Z < z0) = 0.5 - 0.2 = 0.3.
This is a form for which we can use the Z-table.
Looking inside the table, find the probability
closest to 0.3, which is 0.2995. By the Z-table,
0.2995 = P(0 < Z < 0.84). Therefore, z0 = 0.84,
which is the standardized admission limit.
Solving for x0, we have
x0 = μ + σ(z0) = 1000 + (180)(0.84) = 1151.2.
The CMU SAT admission limit will be about 1151.2.
(In actual application for setting the policy, we
can use 1150 as the new admission standard.)
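The reverse lookup can also be done in software. The following sketch uses the inverse cumulative function from Python's standard library in place of reading the Z-table backwards; the variable names are illustrative. Because the table value 0.84 is rounded, the software answer differs slightly from 1151.2.

```python
from statistics import NormalDist

mu, sigma = 1000, 180        # SAT example: X ~ N(1000, 180)
upper_tail = 0.20            # admit the top 20%

# P(Z > z0) = 0.20 is the same as P(Z < z0) = 0.80.
z0 = NormalDist().inv_cdf(1 - upper_tail)
x0 = mu + sigma * z0                 # transform back to the SAT scale
x0_table = mu + sigma * 0.84         # the same step with the rounded table value

print(round(z0, 4), round(x0, 1), round(x0_table, 1))
```

Either way the limit is close to 1150, which matches the policy value suggested for practical use.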
17
Hands-on activities
Q-a: For the SAT example, X ~ N(1000, 180),
suppose a university admits only the top 5%.
Find its admission limit.
Q-b: Find the 5th percentile of the SAT score.
Q-c: Find the Q3 SAT score (75th percentile).
18
  • Use Minitab to compute cumulative probabilities
    and percentiles for a normal distribution
  • Go to Calc, choose Probability Distributions,
    then select Normal.
  • In the dialog box, choose the quantity you are
    computing: Probability density, f(x);
    Cumulative probability, P(X < a) for any given
    a; or Inverse cumulative probability, the
    100p-th percentile x0 such that P(X < x0) = p.
  • Enter the mean and s.d. By default, it is
    N(0, 1).
  • To compute a cumulative probability, you need to
    provide the values of a, which may be created
    and recorded in a column, e.g., C3, or simply
    provided as a constant.
  • To compute an inverse cumulative probability,
    you need to provide the cumulative
    probabilities, which must be in (0, 1).
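For readers without Minitab, the same three quantities are available in Python's standard library. This is an analogue offered as a sketch, not a description of how Minitab computes them; the input lists stand in for a worksheet column and are made-up values.

```python
from statistics import NormalDist

dist = NormalDist(mu=1000, sigma=180)   # NormalDist() with no arguments is N(0, 1)

values = [800, 1000, 1090]              # hypothetical "column" of a values
density = [dist.pdf(a) for a in values]         # probability density f(a)
cumulative = [dist.cdf(a) for a in values]      # cumulative probability P(X < a)

probabilities = [0.10, 0.50, 0.90]      # inputs must lie strictly between 0 and 1
percentiles = [dist.inv_cdf(p) for p in probabilities]   # x0 with P(X < x0) = p

for a, f, c in zip(values, density, cumulative):
    print(a, round(f, 6), round(c, 4))
for p, x0 in zip(probabilities, percentiles):
    print(p, round(x0, 1))
```

The cumulative and inverse cumulative calculations undo each other: applying cdf to a percentile returns the probability that produced it.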

19
  • Methods for detecting departures of the
    distribution of a response variable from the
    normal distribution.
  • Consider the example of the blood pressure data.
    From the histogram and the normal curve
    superimposed on it using Minitab, we can see
    that the blood pressure, generally speaking,
    follows a normal curve. However, there seem to
    be a few unusually high blood pressures. The
    question is: how well does the blood pressure
    follow a normal curve?
  • The superimposed normal curve helps us quickly
    identify serious departures from normality.
    However, if the departure is not very serious,
    it is difficult to detect simply by observing
    the shape of a histogram.
  • We will discuss three ways of checking the
    normality of a response:
  • Superimposing a normal curve on the histogram,
  • Probability plot,
  • Numerical methods for testing the degree of
    departure from normality.

20
  • Superimposing a normal curve on a histogram of
    the blood pressure data of 1900 young adults
    between 15 and 20 years old

The normal curve indicates that there are a few
large blood pressure measurements. In fact, the
descriptive statistics show that the highest is
210, which is much more than 2 s.d. above the
average. This suggests that 210 is very rare. One
should check immediately whether it is a typo.
  • How to construct this plot using Minitab:
  • Go to Stat, choose Basic Statistics, then choose
    Display Descriptive Statistics.
  • Enter the variable and click on the Graphs
    option.
  • In the Graphs dialog, you have a variety of
    choices. One of them is Histogram with Normal
    Curve.
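To put a number on how unusual a reading of 210 would be, the sketch below standardizes it and computes the tail probability. It assumes the mean of 115 and s.d. of 14 quoted earlier for the systolic blood pressure curve apply to these 1900 cases; that is an assumption of this example rather than a value reported on this slide.

```python
from statistics import NormalDist

mu, sigma, n = 115, 14, 1900     # assumed values, taken from the earlier N(115, 14) curve
reading = 210                    # highest observed blood pressure

z = (reading - mu) / sigma               # number of s.d. above the mean
tail = 1 - NormalDist().cdf(z)           # P(Z > z) if the data were exactly normal
expected_count = n * tail                # expected number of such readings among n cases

print(round(z, 2), tail, expected_count)
```

Under these assumptions the reading is nearly seven standard deviations above the mean, so a normal model would essentially never produce it in a sample of this size. That supports checking the value for a recording error.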

21
  • 2. Normal Probability Plot: a two-dimensional
    plot.
  • The Y-axis is the estimated cumulative
    probabilities computed by (formula not
    transcribed).
  • The X-axis is the original data in ascending
    order.
  • Diagnosis

When the data follow a normal curve, the plotted
points should follow a straight line.
When the data are skewed to the right, the plot
would look like (figure not transcribed).
When the data are skewed to the left, the plot
would look like (figure not transcribed).
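The coordinates of a normal probability plot can be sketched in a few lines of Python. The formula for the estimated cumulative probabilities is not in this transcript, so the median-rank position (i - 0.3)/(n + 0.4) used below is an assumption; other plotting positions exist. The function names and the two data sets are synthetic illustrations, not the blood pressure data.

```python
import math
from statistics import NormalDist, mean

def probability_plot_points(data):
    """Return (value, estimated cumulative probability, normal score) per observation."""
    ordered = sorted(data)
    n = len(ordered)
    standard = NormalDist()
    points = []
    for i, x in enumerate(ordered, start=1):
        p = (i - 0.3) / (n + 0.4)      # ASSUMED plotting position, not from the slides
        points.append((x, p, standard.inv_cdf(p)))
    return points

def straightness(points):
    """Correlation between data values and normal scores; near 1 means a straight line."""
    xs = [x for x, _, _ in points]
    zs = [z for _, _, z in points]
    mx, mz = mean(xs), mean(zs)
    sxz = sum((x - mx) * (z - mz) for x, z in zip(xs, zs))
    sxx = sum((x - mx) ** 2 for x in xs)
    szz = sum((z - mz) ** 2 for z in zs)
    return sxz / math.sqrt(sxx * szz)

# Synthetic illustration data: an idealized bell-shaped sample and a right-skewed one.
grid = [NormalDist().inv_cdf((i - 0.5) / 20) for i in range(1, 21)]
bell_shaped = [115 + 14 * z for z in grid]
right_skewed = [math.exp(z) for z in grid]

r_bell = straightness(probability_plot_points(bell_shaped))
r_skewed = straightness(probability_plot_points(right_skewed))
print(round(r_bell, 3), round(r_skewed, 3))
```

On a probability-scaled axis, the points fall on a straight line exactly when the normal scores are linear in the data, so the correlation gives a rough numeric stand-in for "how straight" the plot looks. The skewed sample scores noticeably lower than the bell-shaped one.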
22
(No Transcript)
23
  • The normal probability plot indicates that the
    systolic blood pressure does not follow a normal
    curve. The pattern also shows that the
    distribution is somewhat skewed to the right.
  • 3. Test statistic for testing whether the blood
    pressure follows a normal curve.
  • Graphical methods are good at showing the
    pattern, and they give us a fairly clear picture
    that the data do not follow a normal curve.
    Numerically, there are methods that test such a
    hypothesis. The test statistic is given in the
    same graph as the normal probability plot.
  • The Anderson-Darling normality test is presented
    here. The AD value is 11.5, and the
    corresponding p-value is reported as .000.
  • Note: the p-value measures how consistent the
    data are with a normal distribution. The smaller
    the p-value, the stronger the evidence that the
    response variable does not follow a normal
    curve. A common cut-off point is 5%. In this
    case, the p-value is reported as .000, so it is
    clear that the distribution of systolic blood
    pressure is not normal.
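For intuition about what the AD value measures, here is a minimal sketch of the Anderson-Darling statistic for a normal distribution fitted with the sample mean and s.d. It computes only the statistic; it does not reproduce Minitab's p-value calculation, and the function name and the synthetic data are assumptions made for this example.

```python
import math
from statistics import NormalDist, mean, stdev

def anderson_darling_statistic(data):
    """A^2 = -n - (1/n) * sum (2i - 1) * [ln F(x_(i)) + ln(1 - F(x_(n+1-i)))]."""
    ordered = sorted(data)
    n = len(ordered)
    fitted = NormalDist(mean(ordered), stdev(ordered))   # normal curve fitted to the data
    total = 0.0
    for i, x in enumerate(ordered, start=1):
        lower = fitted.cdf(x)                   # F at the i-th smallest value
        upper = fitted.cdf(ordered[n - i])      # F at the i-th largest value
        total += (2 * i - 1) * (math.log(lower) + math.log(1 - upper))
    return -n - total / n

# Synthetic illustration data (not the blood pressure data from the slides).
grid = [NormalDist().inv_cdf((i - 0.5) / 20) for i in range(1, 21)]
bell_shaped = [115 + 14 * z for z in grid]
right_skewed = [math.exp(z) for z in grid]

ad_bell = anderson_darling_statistic(bell_shaped)
ad_skewed = anderson_darling_statistic(right_skewed)
print(round(ad_bell, 3), round(ad_skewed, 3))
```

Larger values indicate a poorer fit to the normal curve, which is why the skewed sample produces the larger statistic. Very extreme observations can push the fitted cumulative probability to exactly 0 or 1, in which case the logarithm fails; this sketch does not guard against that.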

24
  • How to construct a normal probability plot and
    carry out the Anderson-Darling normality test:
  • Go to Stat, choose Basic Statistics, then select
    Normality Test.
  • In the dialog, enter the variable name.
  • Reference Probabilities allows us to provide a
    column of cumulative probabilities so that the
    normal probability plot will show the percentile
    for each given cumulative probability.

25
  • Note: As we have observed, all three methods
    give similar results. Therefore, based on the
    1909 cases, the systolic blood pressure of young
    adults 15 to 20 years old does not follow a
    normal distribution.
  • Note: Once we find that the distribution is not
    normal, it is critical to carry out some further
    analysis:
  • Carefully check the data to see if there are any
    typos.
  • Examine the data using descriptive measures or
    other plots to identify extreme cases (details
    will be discussed in another module).
  • Hands-on Activity
  • Use the above three methods to check the
    distribution of the Diastolic Blood Pressure
    data.

26
  • Actions to deal with extreme cases
  • For observational studies (such as surveys):
  • The sample sizes are usually large, and it is
    often impossible to find the causes of the
    extreme data after the data are collected.
    Therefore, it is critical to collect background
    and environmental variables that may have a
    potential impact on the results.
  • For experimental studies, such as
    inter-laboratory testing:
  • It is important to look for possible causes of
    the extremes. The study is usually conducted
    under a controlled experimental environment, so
    it is more likely that we can find, or at least
    explain, the possible causes of the extremes.
  • Deletion of extremes vs. transformation to
    normal
  • One must be careful about deleting extremes,
    especially when we are not able to find any
    causes and the values are reasonable within the
    context of the study.
  • Such extremes may be an indication that the
    distribution of the response is skewed. In
    situations such as this, an appropriate approach
    is to transform the data to be closer to normal.

27
  • Method for transforming a variable to normal
  • When the data show a skewed distribution,
    statistical methods such as Analysis of Variance
    may not be valid. One approach is to make a
    mathematical transformation of the variable so
    that the transformed variable will be closer to
    normal.
  • Some tips for variable transformation:
  • If the variable Y is skewed to the right, then
    ln(Y), log10(Y), or √Y will be closer to
    normal. (If there are zeros, first add 0.5 to
    each data value.)
  • If the variable Y is skewed to the left, then
    ln(1/Y), log10(1/Y), or Y^a with a > 1 will be
    closer to normal.
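The tips for right-skewed data can be tried out with a short Python sketch. The helper names and the synthetic data below are assumptions made for illustration; the data are constructed so that the square root is exactly symmetric, and the moment-based skewness is used only as a rough indicator of symmetry, not as a normality test.

```python
import math
from statistics import NormalDist

def skewness(values):
    """Moment-based skewness: positive for right skew, negative for left skew."""
    n = len(values)
    m = sum(values) / n
    m2 = sum((v - m) ** 2 for v in values) / n
    m3 = sum((v - m) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

def right_skew_transforms(values):
    """Apply ln, log10 and square root, adding 0.5 first if any value is zero."""
    shift = 0.5 if any(v == 0 for v in values) else 0.0
    shifted = [v + shift for v in values]
    return {
        "ln": [math.log(v) for v in shifted],
        "log10": [math.log10(v) for v in shifted],
        "sqrt": [math.sqrt(v) for v in shifted],
    }

# Synthetic right-skewed data (squares of a symmetric sample), for illustration only.
grid = [NormalDist().inv_cdf((i - 0.5) / 20) for i in range(1, 21)]
raw = [(10 + 2 * z) ** 2 for z in grid]

transformed = right_skew_transforms(raw)
skew_raw = skewness(raw)
skew_by_transform = {name: skewness(vals) for name, vals in transformed.items()}
print(round(skew_raw, 3), {k: round(v, 3) for k, v in skew_by_transform.items()})
```

For this particular sample the square root removes the skew, while the logarithms overcorrect and leave the data skewed to the left. Which transformation works best depends on the data, so it is worth comparing several.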

28
  • An example of transformation
  • The lifetimes of 50 light bulbs are tested by
    leaving them on all the time until they burn
    out. The data are recorded in months. Here are
    the histograms and the normal probability tests
    of the raw data, the ln-transformed data, and
    the square-root-transformed data.

The raw data are skewed to the right. The ln
transformation does not work well. The
square-root transformation works well.
29
The normal probability plots and
Anderson-Darling tests for the lifetime data
As the normal probability plots and the normality
test results indicate, Sqrt(Y) is approximately
normal. The other two are not.
30
Hands-on Activity: Analyze the distribution of
the variable GR36-Lab-Mean-1 in the TAPPI
inter-laboratory testing study, and determine an
appropriate transformation to make the data
closer to a normal distribution.