Title: AP Stat Do Now
1 AP Stat - Do Now
- Look at activity 5-2 City Temperatures
2 Objectives
- Chapter 5: Describing Distributions Numerically
- Period 9: Example regarding nuns and quartiles (if time)
- How to use the frequency option on the calc.
- IQR and Kurtosis (if time)
- Standard deviation
NJCCCS 4.4.12.A.5
3 Using the Frequency Option on the Calc.
4 Let's check out the standard deviation worksheet
5 Nuns, happiness, longevity and quartiles
- Reading from pages 3-4 of Authentic Happiness by Martin Seligman, Ph.D.
- Learned helplessness
6 IQR and Kurtosis
7 Finding the Center: The Median
- When we think of a typical value, we usually look for the center of the distribution.
- For a unimodal, symmetric distribution, it's easy to find the center: it's just the center of symmetry.
8 Finding the Center: The Median (cont.)
- The median is the value with exactly half the data values below it and half above it.
- It is the middle data value (once the data values have been ordered) that divides the histogram into two equal areas.
- It has the same units as the data.
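A minimal sketch of finding the median by hand, assuming the common convention of averaging the two middle values when the count is even (the slide does not state that case). The temperatures are made up for illustration.

```python
def median(values):
    """Return the middle value of the data once it has been ordered."""
    ordered = sorted(values)      # the data must be sorted first
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:                # odd count: the single middle value
        return ordered[mid]
    # even count: average the two middle values (assumed convention)
    return (ordered[mid - 1] + ordered[mid]) / 2

# Hypothetical data, not from the activity
temps = [61, 58, 70, 64, 66]
print(median(temps))  # 64
```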
9 Spread: Home on the Range
- Always report a measure of spread along with a measure of center when describing a distribution numerically.
- The range of the data is the difference between the maximum and minimum values:
- Range = max - min
- A disadvantage of the range is that a single extreme value can make it very large and, thus, not representative of the data overall.
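A quick illustration of how one extreme value inflates the range. The scores below are hypothetical.

```python
def data_range(values):
    """Range = max - min."""
    return max(values) - min(values)

# Hypothetical scores, made up for illustration
scores = [72, 75, 78, 80, 83]
print(data_range(scores))          # 11
print(data_range(scores + [150]))  # 78, driven entirely by the one extreme value
```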
10 Spread: The Interquartile Range
- The interquartile range (IQR) lets us ignore extreme data values and concentrate on the middle of the data.
- To find the IQR, we first need to know what quartiles are.
11 Spread: The Interquartile Range (cont.)
- Quartiles divide the data into four equal sections.
- The lower quartile is the median of the half of the data below the median.
- The upper quartile is the median of the half of the data above the median.
- The difference between the quartiles is the IQR, so:
- IQR = upper quartile - lower quartile
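The quartile definition above can be sketched as follows. One assumption: when the number of values is odd, the middle value is left out of both halves (it is neither below nor above the median). Other conventions exist and can give slightly different quartiles. The data are hypothetical.

```python
def median(values):
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    return ordered[mid] if n % 2 else (ordered[mid - 1] + ordered[mid]) / 2

def quartiles(values):
    """Return (lower quartile, upper quartile) as medians of the two halves."""
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower_half = ordered[:half]           # values below the median
    upper_half = ordered[half + n % 2:]   # values above the median (skips the middle value if n is odd)
    return median(lower_half), median(upper_half)

def iqr(values):
    """IQR = upper quartile - lower quartile."""
    q1, q3 = quartiles(values)
    return q3 - q1

# Hypothetical data
data = [3, 5, 7, 8, 12, 13, 14, 18, 21]
print(quartiles(data))  # (6.0, 16.0)
print(iqr(data))        # 10.0
```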
12 Spread: The Interquartile Range (cont.)
- The lower and upper quartiles are the 25th and 75th percentiles of the data, so:
- The IQR contains the middle 50% of the values of the distribution, as shown in Figure 5.3 from the text.
13 The Five-Number Summary
- The five-number summary of a distribution reports its median, quartiles, and extremes (maximum and minimum).
- Example: The five-number summary for the ages at death for rock concert goers who died from being crushed is:
14 Rock Concert Deaths: Making Boxplots
- A boxplot is a graphical display of the five-number summary.
- Boxplots are particularly useful when comparing groups.
15 Constructing Boxplots
- Draw a single vertical axis spanning the range of
the data. Draw short horizontal lines at the
lower and upper quartiles and at the median. Then
connect them with vertical lines to form a box.
16 Constructing Boxplots (cont.)
- Erect "fences" around the main part of the data.
- The upper fence is 1.5 IQRs above the upper quartile.
- The lower fence is 1.5 IQRs below the lower quartile.
- Note: the fences only help with constructing the boxplot and should not appear in the final display.
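The fence arithmetic can be sketched as below, assuming quartiles are computed as medians of the lower and upper halves. The data are hypothetical.

```python
from statistics import median

def fences(values):
    """Return (lower fence, upper fence), each 1.5 IQRs beyond a quartile."""
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    q1 = median(ordered[:half])
    q3 = median(ordered[half + n % 2:])
    step = 1.5 * (q3 - q1)   # 1.5 IQRs
    return q1 - step, q3 + step

# Hypothetical data: Q1 = 7, Q3 = 18, IQR = 11
data = [3, 5, 7, 8, 12, 13, 14, 18, 21, 40]
print(fences(data))  # (-9.5, 34.5)
```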
17 Constructing Boxplots (cont.)
- Use the fences to grow "whiskers."
- Draw lines from the ends of the box up and down to the most extreme data values found within the fences.
- If a data value falls outside one of the fences, we do not connect it with a whisker.
18 Constructing Boxplots (cont.)
- Add the outliers by displaying any data values beyond the fences with special symbols.
- We often use a different symbol for "far outliers" that are farther than 3 IQRs from the quartiles (optional).
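Putting the whisker and outlier rules together, here is a rough sketch that finds where the whiskers end and which values get plotted as outliers or far outliers. It computes the pieces only and does not draw anything. The function name and the data are hypothetical, and quartiles are assumed to be medians of the two halves.

```python
from statistics import median

def boxplot_parts(values):
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    q1 = median(ordered[:half])
    q3 = median(ordered[half + n % 2:])
    iqr = q3 - q1
    lower_fence, upper_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr

    # Whiskers reach the most extreme values that are still within the fences
    inside = [v for v in ordered if lower_fence <= v <= upper_fence]
    # Anything beyond a fence is an outlier
    outliers = [v for v in ordered if v < lower_fence or v > upper_fence]
    # Far outliers are more than 3 IQRs from the quartiles
    far = [v for v in outliers if v < q1 - 3 * iqr or v > q3 + 3 * iqr]
    return {"whiskers": (inside[0], inside[-1]),
            "outliers": outliers,
            "far_outliers": far}

# Hypothetical data: Q1 = 13, Q3 = 20, IQR = 7, fences at 2.5 and 30.5
data = [10, 12, 13, 14, 15, 16, 17, 18, 20, 33, 45]
print(boxplot_parts(data))
# {'whiskers': (10, 20), 'outliers': [33, 45], 'far_outliers': [45]}
```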
19 Rock Concert Deaths: Making Boxplots (cont.)
- Compare the histogram and boxplot for rock concert deaths.
- How does each display represent the distribution?
20 Comparing Groups With Boxplots
- The following set of boxplots compares the effectiveness of various coffee containers.
- What does this graphical display tell you?
21 Summarizing Symmetric Distributions
- The median does a good job of identifying the center of a skewed distribution, but it is just a pointer to a middle value.
- The mean takes into account every single value, so no one is left out of the calculation.
- The mean is also used in many of the formulas that we will use later in the course.
- When we have symmetric data that is free from outliers, the mean is a good measure of center.
- We find the mean by adding up all of the data values and dividing by n, the number of data values we have.
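As a small sketch of that last step (the data values are made up):

```python
def mean(values):
    """Add up all of the data values and divide by n."""
    return sum(values) / len(values)

# Hypothetical data
data = [2, 4, 4, 6, 9]
print(mean(data))  # 5.0
```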
22 Mean or Median?
- Regardless of the shape of the distribution, the mean is the point at which a histogram of the data would balance.
23 Mean or Median? (cont.)
- In symmetric distributions, the mean and median are approximately the same in value, so either measure of center may be used.
- For skewed data, though, it's better to report the median than the mean as a measure of center.
24 Summarizing Symmetric Distributions (cont.)
- The distribution of pulse rates for 52 adults is generally symmetric, with a mean of 72.7 beats per minute (bpm) and a median of 73 bpm.
25 The Formula for Averaging
- The formula for the mean is given by:
- The formula says that to find the mean, we add up the numbers and divide by n.
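In symbols, "add up the numbers and divide by n" can be written as follows. The notation is an assumption here: y stands for a data value and y-bar for the mean.

```latex
\bar{y} = \frac{\sum y}{n}
```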
26 What About Spread? The Standard Deviation
- A more powerful measure of spread than the IQR is the standard deviation, which takes into account how far each data value is from the mean.
- A deviation is the distance that a data value is from the mean.
- Since adding all deviations together would total zero, we square each deviation and find an average of sorts for the deviations.
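A short check, with made-up data, that the raw deviations cancel out while the squared deviations do not:

```python
# Hypothetical data with mean 5
data = [2, 4, 4, 6, 9]
mean = sum(data) / len(data)

deviations = [y - mean for y in data]   # -3, -1, -1, 1, 4
print(sum(deviations))                  # 0.0, the deviations cancel

squared = [d ** 2 for d in deviations]  # 9, 1, 1, 1, 16
print(sum(squared))                     # 28.0
```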
27 What About Spread? The Standard Deviation (cont.)
- The variance, notated by s^2, is found by summing the squared deviations and (almost) averaging them:
- The variance will play a role later in our study, but it is problematic as a measure of spread: it is measured in squared units!
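Written out, with y for a data value and y-bar for the mean (assumed notation), the usual sample variance formula is below. The "almost" averaging refers to dividing by n - 1 rather than n.

```latex
s^2 = \frac{\sum (y - \bar{y})^2}{n - 1}
```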
28 What About Spread? The Standard Deviation (cont.)
- The standard deviation, s, is just the square
root of the variance and is measured in the same
units as the original data.
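A minimal sketch of both calculations, checked against Python's built-in `statistics` module. The function names and the data are made up for illustration, and the variance divides by n - 1.

```python
import math
import statistics

def sample_variance(values):
    """Sum the squared deviations from the mean and divide by n - 1."""
    n = len(values)
    mean = sum(values) / n
    return sum((y - mean) ** 2 for y in values) / (n - 1)

def sample_sd(values):
    """The standard deviation is the square root of the variance."""
    return math.sqrt(sample_variance(values))

# Hypothetical data: squared deviations sum to 28, and n - 1 = 4
data = [2, 4, 4, 6, 9]
print(sample_variance(data))   # 7.0 (squared units)
print(sample_sd(data))         # about 2.65 (original units)

# Cross-check with the standard library
print(math.isclose(sample_sd(data), statistics.stdev(data)))  # True
```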
29 Thinking About Variation
- Since Statistics is about variation, spread is an important fundamental concept of Statistics.
- Measures of spread help us talk about what we don't know.
- When the data values are tightly clustered around the center of the distribution, the IQR and standard deviation will be small.
- When the data values are scattered far from the center, the IQR and standard deviation will be large.
30 Shape, Center, and Spread
- When telling about a quantitative variable, always report the shape of its distribution, along with a center and a spread.
- If the shape is skewed, report the median and IQR.
- If the shape is symmetric, report the mean and standard deviation and possibly the median and IQR as well.
31 What About Outliers?
- If there are any clear outliers and you are reporting the mean and standard deviation, report them with the outliers present and with the outliers removed. The differences may be quite revealing.
- Note: The median and IQR are not likely to be affected by the outliers.
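To see this with numbers, here is a rough sketch that summarizes a made-up data set with and without one clear outlier. The `summarize` helper is hypothetical, and quartiles are assumed to be medians of the lower and upper halves.

```python
import statistics

def summarize(values):
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    q1 = statistics.median(ordered[:half])
    q3 = statistics.median(ordered[half + n % 2:])
    return {
        "mean": statistics.mean(ordered),
        "sd": statistics.stdev(ordered),
        "median": statistics.median(ordered),
        "iqr": q3 - q1,
    }

# Hypothetical data; 90 plays the role of a clear outlier
clean = [10, 12, 13, 14, 15, 16, 17, 18, 20]
with_outlier = clean + [90]

print(summarize(clean))         # mean 15, median 15, IQR 5.0
print(summarize(with_outlier))  # mean 22.5, median 15.5, IQR 5
```

In this example the mean jumps from 15 to 22.5 and the standard deviation grows several times over, while the median moves by only 0.5 and the IQR does not change.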
32 What Can Go Wrong?
- Don't forget to do a reality check: don't let technology do your thinking for you.
- Don't forget to sort the values before finding the median or percentiles.
- Don't compute numerical summaries of a categorical variable.
- Watch out for multiple modes: multiple modes might indicate multiple groups in your data.
33 What Can Go Wrong? (cont.)
- Be aware of slightly different methods: different statistics packages and calculators may give you different answers for the same data.
- Beware of outliers.
- Make a picture (make a picture, make a picture).
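On the point about slightly different methods: Python's standard `statistics.quantiles` function offers two interpolation methods, and neither has to agree with the "median of each half" rule. A small sketch on made-up data:

```python
import statistics

# Hypothetical data
data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

# Two built-in conventions for the quartile cut points
exclusive = statistics.quantiles(data, n=4, method="exclusive")
inclusive = statistics.quantiles(data, n=4, method="inclusive")

# "Median of each half" convention
ordered = sorted(data)
half = len(ordered) // 2
halves_q1 = statistics.median(ordered[:half])
halves_q3 = statistics.median(ordered[half + len(ordered) % 2:])

print("exclusive:", exclusive[0], exclusive[2])
print("inclusive:", inclusive[0], inclusive[2])
print("halves:   ", halves_q1, halves_q3)
```

All three agree on the median of this data set but report different quartiles, so it is worth knowing which rule your calculator or package uses.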
34 What Can Go Wrong? (cont.)
- Be careful when comparing groups that have very different spreads.
- Consider these side-by-side boxplots of cotinine levels:
35 Re-expressing to Equalize the Spread of Groups
- Here are the side-by-side boxplots of the log(cotinine) values:
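The cotinine data are not reproduced here, but the idea can be sketched with two made-up groups whose values differ by a constant factor of 100. Taking base-10 logs turns that factor into a shift, so the spreads become equal. The base of the log and the `iqr` helper are assumptions for illustration.

```python
import math
from statistics import median

def iqr(values):
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    return median(ordered[half + n % 2:]) - median(ordered[:half])

# Hypothetical groups, NOT the cotinine data
group_a = [1, 2, 3, 5, 8]
group_b = [100, 200, 300, 500, 800]

# Re-express each group on a log scale
log_a = [math.log10(v) for v in group_a]
log_b = [math.log10(v) for v in group_b]

print(iqr(group_a), iqr(group_b))  # 5.0 500.0, very different spreads
print(iqr(log_a), iqr(log_b))      # essentially equal after re-expression
```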
36 What have we learned?
- We can now summarize distributions of quantitative variables numerically.
- The 5-number summary displays the min, Q1, median, Q3, and max.
- Measures of center include the mean and median.
- Measures of spread include the range, IQR, and standard deviation.
- We know which measures to use for symmetric distributions and skewed distributions.
37 What have we learned? (cont.)
- We can also display distributions with boxplots.
- While histograms better show the shape of the distribution, boxplots reveal the center, middle 50%, and any outliers in the distribution.
- Boxplots are useful for comparing groups.
38 AP Stat - Homework
- P. 73-82: 3, 5, 7, 8, 9, 11, 12, 15, 16a-d, 19-21, 24, 25, 27, 29, 35
- Worth 10 points
- You can work with one other person and hand in one assignment (remember that you are responsible for all of the content, however)
- Due Thursday
- QUIZ FRIDAY (CHAPTER 5)