Normal Distribution - PowerPoint PPT Presentation

Transcript and Presenter's Notes

Title: Normal Distribution


1
Normal Distribution
  • HED 489 Biostatistics

2
  • Odds are that you've come across the normal
    distribution below. You may know it by its more
    common name: the "bell-shaped curve" or Gaussian
    curve (knowing this is very impressive; use it in
    a conversation with your family today and they'll
    be impressed). In case you've never seen it before,
    that's the normal distribution on the right. We
    spend time understanding the normal distribution
    for two reasons
  • it forms the basis of probability, and
    probability forms the basis of parametric
    statistical tests
  • whether the distribution is normal is one of the
    determinants, perhaps the primary one, of which
    family of statistical tests you will apply.

3
Formation of The Normal Curve
  • Assume for the moment that the data in the slide
    to the right represent ages of a bunch of people.
    As you can see, young people are on the left and
    older people on the right. Most of the people are
    between 9 and 11, right? That's the big bunch in
    the middle.

4
           
  • Now, let's say that we draw a line connecting the
    top midpoint of each bar. Here's what that would
    look like:

5
  • See the nice straight lines you get? That's
    because, for purposes of this explanation, I've
    created the distribution such that those straight
    lines would result! Now, how about if we smooth
    the lines so they become a nice curve? That's
    what this next slide shows:

6
  • See how we have the outline of a bell-shaped
    curve? This is an approximation of what would
    happen with these data. The last thing that
    happens when we have a normal distribution is
    that the outline becomes filled in with data.
    That's what this next slide shows.

7
  • So, if we strip away the unessential information,
    we're left with the first slide you saw in this
    lesson. Remember what it looked like?

8
Normal Curve Properties
  • Symmetry: the mean bisects the distribution
    exactly, such that the two halves of the
    distribution form a mirror image of each other.
  • Unimodal: one mode.
  • Standardized characteristics: the standard
    deviation of a standardized normal distribution
    will always be 1.0, with a mean of 0.
  • Infinite: theoretically, the "tails" of the
    distribution never contact the x-axis. In other
    words, the tails are infinite.

9
Characteristics of a Normal Distribution
  • Bell-shaped: examine the shape of the curve.
  • Points of inflection: the points at which the curve
    changes from concave to convex.
  • Skewness: the ratio of skewness to its standard
    error.
  • < -2: left or negatively skewed; > 2: right or
    positively skewed.
  • Kurtosis: the ratio of kurtosis to its standard
    error.
  • < -2: tails shorter than expected; > 2: tails
    longer than expected.
  • Central Tendencies: the mean, median, and mode
    have the same point estimate.
  • Percentage of Scores: for example, 68% of scores
    fall symmetrically within 1 standard deviation,
    95% fall within 2, and 99% fall within 3
    standard deviations.

This is REALLY important to know. LOOK it over and
KNOW these percentages (a quick simulation check follows below).
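  • Here is a minimal simulation sketch of those percentages, assuming Python with
    NumPy is available (the code and library are an illustration, not part of the
    original lesson):

    import numpy as np

    # Draw a large sample from a standard normal distribution (mean 0, s.d. 1).
    rng = np.random.default_rng(0)
    scores = rng.normal(loc=0.0, scale=1.0, size=100_000)

    for k in (1, 2, 3):
        share = np.mean(np.abs(scores) <= k)  # fraction within +/- k standard deviations
        print(f"within +/-{k} s.d.: {share:.3f}")
    # The printed shares come out close to 0.68, 0.95, and 0.997.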
10
Standardized normal distribution, z-scores, and
probability
  • By standardizing normally distributed scores, one
    can better understand and compare scores.
  • Standardizing is a process through which scores
    are transformed into a common scale (in this
    case, z-scores).
  • Probabilities within the normal distribution
    can be represented by z-scores and vice versa.
  • The standardized normal distribution has a mean
    equal to 0 and a standard deviation equal to 1.

11
Moving Towards Probability
  • As you can see on the slide to the right, we can
    plot the location of various values by adding or
    subtracting the standard deviation (1.0) to or
    from the mean (0). See where positive and
    negative standard deviations fall on this
    distribution? Typically, we don't go beyond 3.0
    standard deviations, commonly abbreviated "s.d."
    We'll talk about why in just a bit.
  • Remember, if we were measuring age, our raw-data
    s.d. would be in increments of years. If we were
    measuring water consumption, our raw-data
    increment might be ounces. However, both sets of
    data can be transformed into standardized scores
    (z-scores), as the sketch below illustrates.
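  • A minimal sketch of that transformation, assuming Python with NumPy; the ages
    and ounces below are made-up illustrative values, not data from the slides:

    import numpy as np

    # Hypothetical raw data: ages in years, water consumption in ounces.
    ages = np.array([8, 9, 10, 10, 11, 12], dtype=float)
    ounces = np.array([40, 55, 64, 64, 72, 90], dtype=float)

    def to_z(x):
        """Standardize raw scores: z = (score - mean) / standard deviation."""
        return (x - x.mean()) / x.std(ddof=1)  # ddof=1 uses the sample s.d.

    print(to_z(ages))    # both variables now sit on the same z-score scale,
    print(to_z(ounces))  # with mean 0 and standard deviation 1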

12
  • If you collect the ages of, say, 10,000 people,
    and build a frequency distribution, it will
    contain all 10,000 people, right? That goes
    without saying, doesn't it? Well, the same thing
    can be said of the normal distribution. That is,
    all of the data, or 100% of it, for a particular
    variable (like age) will be contained within the
    distribution. For any normal distribution, no
    matter what the data are, 99.7% of the data will
    be contained in the space from -3 to +3 s.d. Do
    you see that in the slide on the right? See all
    the blue area between -3 and +3? That's where
    most of the data are. See how little blue there
    is to the left of -3 and to the right of +3?
    There are very few cases there. Combined, in
    fact, only 0.3% of the data are greater than +3 or
    less than -3.

13
  • So, if you had those 10,000 ages on slips of
    paper, and you selected one at random, it would
    fall between ±3.0 s.d. 99.7% of the time. That is
    to say, there is a probability of .997 of
    selecting an age at random that falls between
    ±3.0 s.d. Because the largest area under the
    curve falls between ±1.0 s.d., it stands to
    reason that most of the cases will be contained
    in this area, too. As you can see from the slide,
    about 68.2% of the cases (34.1% × 2) fall between
    ±1.0 s.d.
  • An additional 14% of the cases are contained
    between +1.0 and +2.0 s.d., and 14% more fall
    between -1.0 and -2.0 s.d. See that? In total,
    95.4% of the cases fall between ±2.0 s.d.
  • Because there's not much area under the curve
    between 2.0 and 3.0 s.d., only 2.2% of the cases
    will fall between +2.0 and +3.0 s.d., and between
    -2.0 and -3.0 s.d. Remember we said earlier that
    99.7% of the cases fall between ±3.0 s.d.? Well,
    this is how we get to 99.7%.
  • If you didn't follow that, go back and study it
    again until you understand it (the sketch below
    reproduces these areas).
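  • A minimal check of those areas, assuming Python with SciPy is available (not
    part of the original lesson); the exact values are what the slides round to
    34.1%, 14%, and 2.2%:

    from scipy.stats import norm  # the standard normal distribution: mean 0, s.d. 1

    # Area in each band on the positive half of the curve.
    for lo, hi in [(0, 1), (1, 2), (2, 3)]:
        area = norm.cdf(hi) - norm.cdf(lo)
        print(f"{lo} to {hi} s.d.: {area:.3f}")   # 0.341, 0.136, 0.021

    # Doubling each figure (the curve is symmetric) recovers the totals above.
    print(round(2 * (norm.cdf(1) - 0.5), 3))  # 0.683, about 68% within +/-1 s.d.
    print(round(2 * (norm.cdf(2) - 0.5), 3))  # 0.954, within +/-2 s.d.
    print(round(2 * (norm.cdf(3) - 0.5), 3))  # 0.997, within +/-3 s.d.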

14
Taking Another Step
  • This slide summarizes what you just learned. That
    is, about 68% of the cases fall between ±1.0
    s.d., about 95% between ±2.0 s.d., and about 99%
    fall between ±3.0 s.d. Now, pay close attention;
    it's going to get tricky. If about 95% of the
    cases fall between ±2.0 s.d., then about 5% will
    be less than -2.0 or greater than +2.0 s.d. Do
    you see that? If all, or 100%, of the cases fall
    somewhere along the curve, and you account for
    95% of them between ±2.0 s.d., then 5% are left.
    About 2.5% of the cases fall to the left of -2.0
    s.d. and another 2.5% fall to the right of +2.0
    s.d.
  • Still with me? OK, contemplate this: if you pick
    a value at random, there is a probability of .05
    that it will be less than -2.0 or greater than
    +2.0 s.d. Why? Because there's a probability of
    .95 that it will be between ±2.0 s.d.

15
Calculating the z-score
  • Calculating the z-score is relatively simple. 
    Just follow the formula.

16
  • You will need to know the mean and the standard
    deviation. Just punch in the score you need and
    you'll get the z-score. For example, if you
    scored 100 on the test, and the mean is 80 and
    the standard deviation is 10, the formula works
    out as follows:
  • z = (100 - 80) / 10 = 20 / 10 = 2
  • That is, your score was two standard deviations
    above the mean; your score was higher than 97.7%
    of scores (a quick check of this appears below).
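  • A minimal check of this example, assuming Python with SciPy (not part of the
    original lesson); the 97.7% is the rounded area below z = 2:

    from scipy.stats import norm

    score, mean, sd = 100, 80, 10   # values from the example above
    z = (score - mean) / sd         # (100 - 80) / 10 = 2.0
    print(z)                        # 2.0
    print(round(norm.cdf(z), 4))    # 0.9772 -> higher than about 97.7% of scores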

17
The Normal Distribution: What does the z-score
mean?
  • It's time for you to open your textbook to the
    inside back cover (to Table A).  Click slide and
    it will appear on-screen (it is also shown on the
    next slide in this program). This table shows the
    area under the curve from zero to any point along
    the curve, out to three decimal points, to 4.0
    s.d.
  • This table is also referred to as a table of
    "z-scores." See the "z" in the upper left cell?
    For a standardized normal distribution, a z-score
    is simply the number of standard deviations a
    value sits from the mean. More on that in a
    moment. First, I want to get you comfortable with
    this table.
  • Remember that about 34% of the cases fall between
    zero and 1 s.d.? In case you forgot, just look at
    the z-score table, and it will remind you. If we
    select a case at random from a normal
    distribution of data, the probability is about
    .34 that the value of that case will fall between
    zero and 1.0 s.d.

18
  • Look down the left column of Table A (the normal
    curve or z-score table), either in the book or on
    the slide at the top, until you reach the z-score
    of 1.0. Move your finger one column to the right;
    that's the one labeled ".00". The value in the
    cell where your finger is pointing is .3413,
    right? That's where I found "about" 34% (or .3413).

19
z = (score - mean of scores) / standard deviation
  • The above formula is used to calculate the
    z-score. What happens if you're interested in
    some s.d. other than 1.0, 2.0, or 3.0? Here's
    where the table really comes in handy! Say you're
    interested in the area under the curve (also
    known as "probability") for a s.d. (or z-score,
    as I want you to begin thinking of it) of 1.96.
    How do you get there? Well, run your finger
    down the first column until you get to 1.9, and
    then run across until you get to .06. What value
    did you find? .4750? If so, you did it just
    right!
  • Note that the z-score table just shows half of
    the curve, that is, from zero to the right.
    That's because the curve is symmetrical, so if
    you want to know the area, or probability, for
    the other half of the curve, just double the
    tabled value. For example, if you wanted to know
    the probability for ±1.96, multiply .4750 times 2.
    That would give you .9500, and you'd say that the
    probability of selecting a value at random that
    would fall between ±1.96 s.d. would be .95 (a
    quick check of this appears below).
  • Now, I want to give you some practice working
    with the normal curve so I know that you've
    become comfortable with it.
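  • The 1.96 lookup can also be reproduced in code; a minimal sketch, assuming
    Python with SciPy (not part of the original lesson):

    from scipy.stats import norm

    half = norm.cdf(1.96) - 0.5   # area from 0 to 1.96, which is what the z-table lists
    print(round(half, 4))         # 0.475
    print(round(2 * half, 4))     # 0.95 -> probability of falling between +/-1.96 s.d.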

20
Calculate the percent under the curve between the
mean and a z-score (or s.d.) of -1.39.
  • To do this, use the z-score table. Run your
    finger down the first column until you reach 1.3.
    Run your finger along the 1.3 row until you reach
    the .09 column. The figure in that cell is the
    area under the curve from 0.00 to 1.39 (or from 0
    to -1.39, since the curve is symmetrical). That
    number is .4177 (or 41.77%). This is also the
    probability of selecting a value at random and
    having it fall between 0.00 and 1.39 (or -1.39).

The probability of selecting a score between the
mean and -1.39 s.d. (a z-score of -1.39) is .4177 (see the check below).
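  • A minimal check, assuming Python with SciPy:

    from scipy.stats import norm

    # Area between the mean (0) and z = -1.39; by symmetry it equals the 0-to-+1.39 area.
    print(round(norm.cdf(1.39) - 0.5, 4))           # 0.4177
    print(round(norm.cdf(0) - norm.cdf(-1.39), 4))  # same value, computed directly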
21
Calculate the probability of selecting a value at
random greater than a z-score (or s.d.) of 2.01.
  • You should be able to find the area under the
    curve, also known as probability, for a value of
    2.01. Run down the first column until you get to
    2.0. Then across the row to the .01 column.
  • However, consider the curve at the lower right.
    See the figure .4772? That's the area under the
    curve from 0.00 to 2.00. What you want is the area
    beyond, or to the right of, 2.01. To get that, you
    have to subtract the probability for 0.00 to 2.01
    (.4778) from all of the area on the right side of
    the curve, that is, from 0.00 to infinity. What
    is this figure? Can't remember? (Remember that
    .50 of the area is to the right of the mean and
    .50 is to the left.) Focus on the right half:
    .50 - .4778 = .0222. The probability of selecting
    a value greater than a z-score of 2.01 is .0222.
    Look at the z-score table.
  • Remember, report probability, not percentage.
    Click to get table (a quick check appears below).
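  • The same answer in code, assuming Python with SciPy:

    from scipy.stats import norm

    # Probability of a z-score greater than 2.01 (the upper tail).
    print(round(norm.sf(2.01), 4))                 # 0.0222 (sf is 1 - cdf)
    print(round(0.5 - (norm.cdf(2.01) - 0.5), 4))  # same value via the table logic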

22
Calculate the probability of randomly selecting a
value that is greater than a z-score of 2.00 or
less than a z-score of -2.00.
  • This assignment requires you to consider both
    sides of the normal curve. What are the chances
    of selecting a z score beyond 2.0 or -2.0?
  • Like the last assignment, you need to calculate
    "what's remaining" under the curve beyond 0.00 to
    -2.00 and 0.00 to 2.00. To do this, determine the
    probability from 0.00 to 2.00 (that is, .4772;
    subtract this from .50 to get .0228), then double
    that value to get the probability of obtaining a
    z-score greater than 2.00 or less than -2.00: in
    this case, .0456.
  • Click to get table (a quick check appears below).
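  • A check of the two-tailed calculation, assuming Python with SciPy; note the
    small gap between the exact value and the table's rounded .0228 per tail:

    from scipy.stats import norm

    # Two-tailed probability: area beyond +2.00 plus area beyond -2.00.
    one_tail = norm.sf(2.00)        # about 0.0228 in each tail
    print(round(2 * one_tail, 4))   # 0.0455 (the table's rounded .0228 gives .0456)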

23
Assignment 1
  • Calculate the z-score equivalents of these
    systolic blood pressure values: 100, 120, 130,
    140, and 190, where the mean equals 126.2 and
    the standard deviation equals 18.8. Click here
    to enter an Excel file to do the work (a code
    sketch of the same calculation appears below).
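  • For readers without the Excel file, a plain-Python sketch of the same
    calculation (the code is an illustration, not part of the course materials):

    # Assignment 1 sketch: convert each systolic blood pressure value to a z-score.
    mean, sd = 126.2, 18.8
    pressures = [100, 120, 130, 140, 190]

    for x in pressures:
        z = (x - mean) / sd                  # z = (score - mean) / standard deviation
        print(f"systolic {x}: z = {z:.2f}")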

24
Assignment 2
  • Calculate the probability of selecting a blood
    pressure value at random greater than 160, where
    the mean is 126.20 and the standard deviation is
    18.80.
  • To determine this value, you first need to
    calculate the equivalent z-score, as you did in
    the prior assignment. Then, determine the area
    under the curve represented by that z-score.
    Finally, calculate the area beyond, or greater
    than, that value. The resulting value represents
    the probability. Click here for an Excel file to
    work on (a code sketch of these steps appears
    below).
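  • A sketch of those steps, assuming Python with SciPy (the Excel file remains
    the intended tool):

    from scipy.stats import norm

    # Assignment 2 sketch: probability of a blood pressure above 160.
    mean, sd = 126.20, 18.80
    z = (160 - mean) / sd          # the equivalent z-score (about 1.80)
    print(round(norm.sf(z), 4))    # area beyond that z-score, i.e. the probability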

25
Assignment 3
  • Calculate the probability of selecting a blood
    pressure value at random between 110 and 135,
    where the mean is 126.20 and the s.d. is 18.80.
    Click here for an Excel file to work on (a code
    sketch appears below).
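  • A sketch of this calculation, assuming Python with SciPy:

    from scipy.stats import norm

    # Assignment 3 sketch: probability of a blood pressure between 110 and 135.
    mean, sd = 126.20, 18.80
    z_low = (110 - mean) / sd
    z_high = (135 - mean) / sd
    print(round(norm.cdf(z_high) - norm.cdf(z_low), 4))  # area between the two z-scores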

26
  • We're about to make the transition from
    descriptive to inferential statistics. The heart
    of inferential statistics is "statistical
    significance." This is the probability that your
    calculated value was "real" or happened by
    random chance. Here's an example:

27
  • Let's say that you're working with a group of
    people to get their blood pressure down. One of
    the things you do is put them on an exercise
    program to reduce their weight and to strengthen
    their cardiovascular system. All things being
    equal, if your program is successful, their blood
    pressure should come down.
  • If you take individuals' blood pressure at the
    beginning of the study, calculate the mean, and
    take it again and average it again at the end of
    the study, the mean blood pressure at the end
    should be lower than at the beginning. However,
    how much does blood pressure have to be lowered
    to assert with some confidence that our program
    was successful? If the beginning group average
    was 128 and the ending was 124, is this
    difference large enough to claim programmatic
    success? What if the ending average was 120? 110?
    This is the issue central to statistical
    significance: is the difference between the two
    values so close to the mean of the normal
    distribution (zero) that the probability of
    selecting the value at random is too high to
    accept as "real"? Or is it so far from the mean
    that it's out in one of the tails of the
    distribution, where the chances of pulling it out
    of the hat of values at random are very small,
    and it is therefore more likely "real"?
  • Here are some ways that a difference in mean
    values before and after your high blood pressure
    program could occur by chance alone: for any
    particular group of people, their blood pressure
    might go down for reasons other than our exercise
    program. Maybe they had a high-salt diet when
    they started and cut down on their sodium intake.
    Maybe they were experiencing a lot of stress and
    they got it under control. Or maybe their blood
    pressure just went down unexpectedly.
  • Conversely, maybe your program actually worked.
    If so, you should be able to select another,
    similar group, conduct the same program, and find
    a similar difference between starting and ending
    blood pressure values. Not exactly the same, but
    the difference should fall in the same general
    area on the normal curve. If you conducted this
    program with 100 similar groups and found about
    the same difference, see how you'd be pretty
    confident that your program actually worked?
    Well, you can probably only run it once, so you
    need that difference value to fall a good long
    way from the mean in order to have confidence
    that your program worked. That's what statistical
    significance does for you.

28
  • Remember the five blood pressure values that you
    worked with for the last assignment. Pretend
    that you conducted your study five times, and
    those numbers from the assignment represented the
    five studies.
  • Many researchers use a statistical significance
    level of .05 as their critical level. This means
    that if a value is statistically-significant, it
    would occur by chance alone less than 5 times in
    100.
  • Another way of looking at the .05 critical value
    is that it has to fall into one of the two tails
    of the normal distribution, and not from that big
    bunch of scores in the middle. Why? Because there
    are lots of scores in the middle, and you have a
    very good chance of selecting one of them by
    chance alone. So, if the mean difference score is
    in the "bulge" of the distribution, the
    probability that it happened by chance alone is
    too great for you to assume your program worked.

29
  • So, on the normal curve, in order for a value to
    be statistically-significant at the 95% level or
    greater, you have to have a z-score of greater
    than 1.96 or less than -1.96. That is, the value
    has to come from the area to the left of -1.96 or
    from the area to the right of 1.96. This is the
    only way to ensure that the value you calculated
    would occur by random chance alone less than 5
    times in 100 (the sketch below shows where 1.96
    comes from).
  • If the value you calculate is
    statistically-significant at, let's say, the .05
    level, you write it like this: p < .05 ("<"
    stands for "less than"). If it's not
    statistically-significant, you write n.s. or ns
    (meaning that p > .05; ">" stands for "greater
    than").
  • The reason that we almost always consider both
    sides of the normal curve is that our calculated
    value might be greater or less than the mean. For
    the blood pressure example, our program may have
    failed so badly that we actually caused the
    average ending blood pressure to increase! We'll
    talk a bit more about one-tailed vs. two-tailed
    tests a little later.
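  • Where does the 1.96 cutoff come from? A minimal sketch, assuming Python with
    SciPy, that recovers it from the .05 two-tailed criterion:

    from scipy.stats import norm

    # Split the .05 criterion across the two tails (.025 in each) and find the z cutoff.
    alpha = 0.05
    print(round(norm.ppf(1 - alpha / 2), 2))  # 1.96 -> significant when z < -1.96 or z > 1.96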