Title: Chapter 3 Descriptive Statistics
Chapter 3 - Descriptive Statistics
- Given precise enough measurement, even supposedly constant process conditions produce differing responses.
- For this reason we are not as interested in individual data values as we are in the pattern or distribution of the data as a whole.
- We'll use graphs to display the distribution and statistics to describe the distribution.
3.1 Graphing Quantitative Data
- Dot diagram
  - Each observation is a dot placed at a position corresponding to its numerical value.
- Stem-and-leaf plot
  - Made by using the last few digits of each data point to indicate where it falls
- Both displays require that each data point covers the same amount of space.
Example 3.1
- The government requires manufacturers to monitor the amount of radiation emitted through the closed door of a microwave. The following are radiation amounts emitted by 24 microwaves measured by one manufacturer.
Example 3.1
- It's easiest to first order your data.
Dot Diagram
[Figure: dot diagram; horizontal axis labeled .00, .10, .20, .30]
Stem-and-Leaf Plot
- Use the first digit after the decimal place as the stem and the second as the leaf
Example 3.2
- Other examples of splitting data into stem and leaf
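The stem/leaf split described above can be sketched in a few lines. This is a minimal illustration, assuming two-decimal values between 0 and 1; the function name `stem_and_leaf` and the readings are hypothetical and are not the Example 3.1 data.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Group two-decimal values in [0, 1) by stem (first decimal digit).

    The leaf is the second digit after the decimal point.
    """
    rows = defaultdict(list)
    for v in sorted(values):
        hundredths = round(v * 100)          # e.g. 0.18 -> 18
        stem, leaf = divmod(hundredths, 10)  # 18 -> stem 1, leaf 8
        rows[stem].append(leaf)
    return {stem: rows[stem] for stem in sorted(rows)}

# Hypothetical readings, chosen only to illustrate the split
readings = [0.02, 0.05, 0.08, 0.10, 0.15, 0.18, 0.20, 0.30]
plot = stem_and_leaf(readings)
for stem, leaves in plot.items():
    print(f".{stem} | {''.join(str(leaf) for leaf in leaves)}")
```

Printing each leaf as a single character keeps the requirement that every data point covers the same amount of space.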
Stem-and-Leaf Plot Vocabulary
- Back-to-back
  - Example: men's vs. women's heights
  - Recorded in inches
  - [Figure: back-to-back stem-and-leaf plot with columns Men and Women; the individual leaves are not recoverable from the extracted text]
- Split stems
  - There are multiple rows for each stem.
  - Bins are divided between leaf digits.
  - Women's heights:
    63 | 9
    64 | 1
    64 | 78
    65 | 011
    65 | 5
Graphing Quantitative Data
- Frequency tables
  - Break data into intervals of equal length and then tally the data points in each interval
  - The number of intervals we use varies
  - Every data point is included in exactly one interval.
- Histograms
  - Break data into intervals of equal length and then create a connected bar chart
  - Begin vertical axis at zero
  - Draw bars with equal width
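A frequency table along these lines can be sketched as follows. The helper name `frequency_table`, the starting point 80, and the interval width 10 are illustrative assumptions; the data are the ten battery lifetimes that appear in a later example of this chapter.

```python
def frequency_table(values, start, width, n_intervals):
    """Tally values into n_intervals half-open intervals [lo, hi) of equal width."""
    counts = [0] * n_intervals
    for v in values:
        k = int((v - start) // width)        # index of the interval holding v
        if not 0 <= k < n_intervals:
            raise ValueError(f"{v} falls outside the table")
        counts[k] += 1                       # each point lands in exactly one interval
    return [(start + k * width, start + (k + 1) * width, counts[k])
            for k in range(n_intervals)]

lifetimes = [100, 120, 80, 90, 95, 115, 120, 110, 105, 95]
table = frequency_table(lifetimes, start=80, width=10, n_intervals=5)
for lo, hi, count in table:
    # A row of equal-width marks gives a rough text histogram starting at zero
    print(f"[{lo}, {hi})  {count}  {'#' * count}")
```

Half-open intervals are one way to guarantee that a value sitting on a boundary is counted once and only once.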
Frequency Table
Histogram
[Figure: histogram; vertical axis labeled 2, 4, 6; horizontal axis labeled .03, .09, .15, .21, .27]
Histogram (different intervals)
[Figure: histogram; vertical axis labeled 2, 4, 6, 8; horizontal axis labeled .05, .15, .25]
BAD Histogram
[Figure: histogram; vertical axis labeled 2, 4, 6, 8; horizontal axis labeled .05, .15, .25]
Distributional Shapes
Bell-shaped
Right-skewed
Left-skewed
Bimodal (Multimodal)
Uniform (Unimodal)
Truncated
Bivariate Quantitative Data
- Scatterplot
  - Plot each data point as a dot located by the values of its response variable and its supervised variable
  - Look for patterns
- Run Chart
  - Plot data points in the order determined by the time of observation
  - Look for patterns
Example 3.3
- Scatterplot for ACT score and high school GPA for 12 students
The scatterplot of GPA against ACT score shows a fairly strong, positive, linear relationship.
Example 3.4
strong, positive, linear relationship
strong, negative, linear relationship
strong curved relationship
Example 3.4 (continued)
weak, positive, linear relationship
weak, negative, linear relationship
no relationship
Example 3.5
- Run Chart: Suppose that we plot the number of typing errors made per minute by a typist against time.
Pattern: steady increase until we hit minute 10, then there is a sharp drop followed by another steady increase. It was discovered that the typist was given a five-minute break after minute 10.
3.2 Quantiles
- The p quantile, denoted Q(p), is the number such that a fraction p of the distribution lies to the left of Q(p) and a fraction 1-p of the distribution lies to the right of Q(p).
[Figure: relative frequency plot; vertical axis (Relative Frequency) labeled 1/2; Q(1/3) marked; labels 2/3, 4/3, 2]
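Quantiles (Book)
- For empirical distributions, the above definition translates to:
- For an ordered data set $x_1 \le x_2 \le \dots \le x_n$
  - 1. For $i = 1, 2, \dots, n$, the $p = \frac{i - 0.5}{n}$ quantile of the data set is the $i$th smallest data point, $x_i$. That is, $Q\left(\frac{i - 0.5}{n}\right) = x_i$.
Quantiles (Book)
  - 2. For any $p$ not equal to $\frac{i - 0.5}{n}$ for some integer $i \le n$, such that $\frac{0.5}{n} < p < \frac{n - 0.5}{n}$, the $p$ quantile is obtained by linear interpolation between the values of $Q\left(\frac{i - 0.5}{n}\right)$ with corresponding $\frac{i - 0.5}{n}$ that bracket $p$.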
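Finding Quantiles
- General procedure for finding the p quantile of an empirical distribution
  - Order the data values $x_{(1)} \le x_{(2)} \le \dots \le x_{(n)}$
  - Set $i = np + 0.5$
  - If $i = 1, 2, \dots, n$ then $Q(p) = x_{(i)}$
  - otherwise, $Q(p) = x_{(\lfloor i \rfloor)} + (i - \lfloor i \rfloor)\left(x_{(\lfloor i \rfloor + 1)} - x_{(\lfloor i \rfloor)}\right)$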
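The procedure can be sketched as a small function. This is an illustration under the stated rule, with i = np + 0.5 and linear interpolation between the two neighboring ordered values when i is not an integer. The name `quantile`, the floating-point tolerance step, and the sample values are assumptions made for the sketch.

```python
import math

def quantile(data, p):
    """Return Q(p) using i = n*p + 0.5 with linear interpolation."""
    x = sorted(data)                      # x[0] is the smallest value
    n = len(x)
    i = n * p + 0.5
    if math.isclose(i, round(i)):         # absorb floating-point noise in n*p
        i = round(i)
    if not 1 <= i <= n:
        raise ValueError("p must lie between 0.5/n and (n - 0.5)/n")
    lo = math.floor(i)
    if i == lo:                           # i is an integer: take the ith smallest
        return x[lo - 1]
    # otherwise interpolate between the lo-th and (lo + 1)-th smallest values
    return x[lo - 1] + (i - lo) * (x[lo] - x[lo - 1])

sample = [30, 10, 40, 20]                 # illustrative values only
print(quantile(sample, 0.375))            # i = 2, the 2nd smallest value
print(quantile(sample, 0.5))              # i = 2.5, halfway between 2nd and 3rd
```

Note that this rule is not the default in every software package, so results from a library quantile routine may differ slightly.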
Example 3.5
- Ten batteries were tested to determine how long the batteries would last (hrs) under normal conditions. Below are the ten values that were obtained:
  - 100, 120, 80, 90, 95, 115, 120, 110, 105, 95
- Give the values for Q(.35) and Q(.42)
- Give the values for Q(.68) and Q(.90)
- Give the values for Q(.25), Q(.50) and Q(.75)
- Step 1: Order the data
  - 80, 90, 95, 95, 100, 105, 110, 115, 120, 120
Example 3.5a
- Give the values for Q(.35) and Q(.42)
Example 3.5b
- Give the values for Q(.68) and Q(.90)
Example 3.5c
- Give the values for Q(.25), Q(.50) and Q(.75)
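One way to work the three parts by hand is sketched below, assuming the rule i = np + 0.5 with n = 10 and linear interpolation between neighboring ordered values when i is not an integer. Here $x_{(i)}$ denotes the ith smallest battery lifetime.

```latex
\begin{align*}
Q(.35) &: \; i = 10(.35) + .5 = 4   && Q(.35) = x_{(4)} = 95 \\
Q(.42) &: \; i = 10(.42) + .5 = 4.7 && Q(.42) = 95 + .7\,(100 - 95) = 98.5 \\
Q(.68) &: \; i = 10(.68) + .5 = 7.3 && Q(.68) = 110 + .3\,(115 - 110) = 111.5 \\
Q(.90) &: \; i = 10(.90) + .5 = 9.5 && Q(.90) = 120 + .5\,(120 - 120) = 120 \\
Q(.25) &: \; i = 10(.25) + .5 = 3   && Q(.25) = x_{(3)} = 95 \\
Q(.50) &: \; i = 10(.50) + .5 = 5.5 && Q(.50) = 100 + .5\,(105 - 100) = 102.5 \\
Q(.75) &: \; i = 10(.75) + .5 = 8   && Q(.75) = x_{(8)} = 115
\end{align*}
```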
Quantile Terminology
- Special quantiles
  - Q(.25) = Q1, 1st quartile, lower quartile
  - Q(.5) = Q2, 2nd quartile, median
  - Q(.75) = Q3, 3rd quartile, upper quartile
- Special values associated with quartiles
  - Inter-quartile range (IQR) = Q3 - Q1
  - Upper fence = Q3 + 1.5 IQR
  - Lower fence = Q1 - 1.5 IQR
Boxplots
- Another tool used to illustrate the distribution
- Steps for making a boxplot
  - Order the observed data values
  - Find Q1, Q2, Q3, IQR, UF and LF
  - Draw a box that spans the IQR
  - Divide the box at the median (Q2)
  - Draw asterisks (or dots) for any data values less than the lower fence and any values greater than the upper fence
  - Draw a line from each side of the box to the smallest value greater than the LF and the largest value smaller than the UF
Example 3.6
- Draw a boxplot based on the 15 ordered values below
  - 75, 80, 80, 85, 90, 95, 95, 100, 105, 110, 110, 115, 120, 120, 125
- Find the necessary values
Example 3.6
- Q1 = 86.25, Q2 = 100, Q3 = 113.75, so
  - IQR = 113.75 - 86.25 = 27.5, and 1.5(IQR) = 41.25
  - UF = Q(.75) + 1.5(IQR) = 155
  - LF = Q(.25) - 1.5(IQR) = 45
- Identify all values outside the upper and lower fences: none
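The boxplot ingredients for these 15 values can be checked with a short sketch. It assumes the quantile rule i = np + 0.5 with linear interpolation, IQR = Q3 - Q1, and fences placed 1.5 IQR beyond the quartiles. The names `quantile` and `boxplot_summary` are hypothetical helpers, not functions from any library.

```python
import math

def quantile(data, p):
    """Q(p) with i = n*p + 0.5 and linear interpolation between neighbors."""
    x = sorted(data)
    i = len(x) * p + 0.5
    if math.isclose(i, round(i)):
        i = round(i)
    lo = math.floor(i)
    if i == lo:
        return x[lo - 1]
    return x[lo - 1] + (i - lo) * (x[lo] - x[lo - 1])

def boxplot_summary(data):
    """Collect the values needed to draw a boxplot by hand."""
    q1, q2, q3 = (quantile(data, p) for p in (0.25, 0.50, 0.75))
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    inside = [v for v in data if lower_fence <= v <= upper_fence]
    return {
        "Q1": q1, "Q2": q2, "Q3": q3, "IQR": iqr,
        "LF": lower_fence, "UF": upper_fence,
        "whiskers": (min(inside), max(inside)),   # ends of the two lines
        "outliers": [v for v in data if v < lower_fence or v > upper_fence],
    }

values = [75, 80, 80, 85, 90, 95, 95, 100, 105, 110, 110, 115, 120, 120, 125]
summary = boxplot_summary(values)
for name, value in summary.items():
    print(name, value)
```

With no values outside the fences, the whiskers simply run to the smallest and largest observations.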
Example 3.6
Q-Q Plots
- Q-Q plot = Quantile-Quantile Plot
- Used for two data sets
- We plot Q(p) for data set 1 (denoted Q1(p)) versus Q(p) for data set 2
- We will only deal with two data sets of the same size
- A straight line indicates that the two data sets have the same distributional shape.
Example 3.7
- Data Set 1: 1, 2, 3, 4, 5
- Data Set 2: 6, 7, 8, 9, 10
Example 3.8
- Data Set 1: 1, 5, 7, 8, 9, 10
- Data Set 2: -10, -9, -8, -7, -5, -1
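For two data sets of the same size, the ith smallest value of each is the same p quantile, so the Q-Q plot points are just the paired ordered values. The sketch below builds those pairs for Examples 3.7 and 3.8 and checks whether they fall on one straight line. The helper names `qq_pairs` and `on_straight_line` and the exact-collinearity test are illustrative assumptions; real data would be judged by eye rather than by an exact check.

```python
def qq_pairs(data1, data2):
    """Pair the ordered values of two equal-size data sets."""
    if len(data1) != len(data2):
        raise ValueError("this sketch only handles data sets of the same size")
    return list(zip(sorted(data1), sorted(data2)))

def on_straight_line(pairs, tol=1e-9):
    """True if every point lies on the line through the first and last points."""
    (x0, y0), (x1, y1) = pairs[0], pairs[-1]
    return all(abs((y - y0) * (x1 - x0) - (x - x0) * (y1 - y0)) <= tol
               for x, y in pairs)

pairs_37 = qq_pairs([1, 2, 3, 4, 5], [6, 7, 8, 9, 10])
pairs_38 = qq_pairs([1, 5, 7, 8, 9, 10], [-10, -9, -8, -7, -5, -1])

print(pairs_37, on_straight_line(pairs_37))   # same shape, shifted by 5
print(pairs_38, on_straight_line(pairs_38))   # mirror-image shapes
```

In Example 3.8 the second data set is the negative of the first, so its shape is the mirror image and the points bend away from a line.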
Normal Probability Plot
- A normal probability plot is a type of Q-Q plot that allows us to determine if the distribution of our data is bell-shaped (normal).
- A straight line is an indication that our data is normal/bell-shaped.
- An S-shaped line indicates that our data is skewed.
Table 3.10
- Table 3.10 (page 89) in the book gives some quantiles for a distribution that is known to be bell-shaped.
- The numbers in the body of the table give Q(p) for p given by the margins of the table.
- Q(.23) = -.74
  - Row .2 and Column .03 give the p = 0.23 quantile as -.74
- Q(.79) = .81
  - Row .7 and Column .09 give the p = 0.79 quantile as .81
Example 3.9
- Annual incomes (in thousands of dollars) for 8 families (in a common geographical location) are given below:
  - 23, 31, 43, 47, 51, 58, 67, 83
- Does this data appear to be from a bell-shaped distribution?
- Remember $p = \frac{i - 0.5}{n}$
Example 3.9
Bell-shaped distribution?
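The points for a normal probability plot of these incomes can be sketched as follows. The sketch assumes that the bell-shaped quantiles of Table 3.10 are standard normal quantiles, which agrees with the two table values quoted earlier, and computes them with Python's `statistics.NormalDist` instead of a table lookup. The function name `normal_plot_points` is hypothetical.

```python
from statistics import NormalDist

def normal_plot_points(data):
    """Pair the ith smallest value with the standard normal Q((i - 0.5)/n)."""
    x = sorted(data)
    n = len(x)
    std_normal = NormalDist()             # mean 0, standard deviation 1
    return [(x_i, std_normal.inv_cdf((i - 0.5) / n))
            for i, x_i in enumerate(x, start=1)]

incomes = [23, 31, 43, 47, 51, 58, 67, 83]
points = normal_plot_points(incomes)
for income, z in points:
    print(f"{income:>3}  {z:6.2f}")
```

Plotting the printed pairs and checking how close they sit to a straight line answers the question posed in the example.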
3.3 Numerical Measures
- Measures of location
  - Median
    - Same as Q(.5)
    - Not affected by skew or outliers (extreme observations)
  - Sample Mean
    - For data $x_1, x_2, \dots, x_n$, the mean is given as $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$
    - Strongly affected by skew or outliers
Example 3.10
- Data set 1: 2, 3, 5, 8, 12
  - Median = 5
  - Mean = (2 + 3 + 5 + 8 + 12)/5 = 6
- Data set 2: 2, 3, 5, 8, 102
  - Median = 5
  - Mean = 24
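The contrast between the two data sets can be reproduced with Python's standard `statistics` module. This is a small check of the arithmetic, with variable names chosen for the sketch.

```python
from statistics import mean, median

set1 = [2, 3, 5, 8, 12]
set2 = [2, 3, 5, 8, 102]    # same data with one extreme observation

for label, data in (("Data set 1", set1), ("Data set 2", set2)):
    print(label, "median =", median(data), "mean =", mean(data))
```

Replacing 12 with 102 leaves the median alone but quadruples the mean, which is the sensitivity to outliers described above.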
Measures of Spread
- IQR
  - Measures the spread of the middle half of the data
  - Not sensitive to skew or outliers
- Range
  - $R = \max(x_i) - \min(x_i)$
  - Highly sensitive to outliers
Measures of Spread
- Sample Variance
  - $s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2$
  - How much the data is spread from the sample mean, $\bar{x}$.
  - Sensitive to outliers or skew
- Sample Standard Deviation
  - $s = \sqrt{s^2}$
  - Sensitive to outliers or skew
Example 3.11
- Same data as Example 3.9: 23, 31, 43, 47, 51, 58, 67, 83
- Find the range
- Find the mean
- Find the variance
- Find the standard deviation
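The four requested summaries can be computed directly. The sketch assumes the usual sample definitions: the range is the largest value minus the smallest, and the variance divides the sum of squared deviations by n - 1. Variable names are illustrative.

```python
import math

incomes = [23, 31, 43, 47, 51, 58, 67, 83]
n = len(incomes)

data_range = max(incomes) - min(incomes)
x_bar = sum(incomes) / n
# Sample variance: squared deviations from the mean, divided by n - 1
s_squared = sum((x - x_bar) ** 2 for x in incomes) / (n - 1)
s = math.sqrt(s_squared)

print("range    =", data_range)
print("mean     =", x_bar)
print("variance =", round(s_squared, 2))
print("std dev  =", round(s, 2))
```

The standard library functions `statistics.variance` and `statistics.stdev` use the same n - 1 divisor and can serve as a cross-check.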
Understanding Standard Deviation
- Chebyshev's Theorem
  - For any data set and any number k larger than 1, a fraction of at least $1 - \frac{1}{k^2}$ of the data are within $ks$ of $\bar{x}$.
  - At least 3/4 of the data will be within 2 standard deviations of the mean, at least 8/9 of the data will be within 3 standard deviations of the mean, etc.
- Standard deviation acts as a ruler
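The guarantee can be compared with what actually happens in a data set. The sketch below assumes the bound 1 - 1/k² and uses the eight family incomes from Example 3.9 as a test case; the function names `chebyshev_bound` and `fraction_within` are hypothetical.

```python
from statistics import mean, stdev

def chebyshev_bound(k):
    """Minimum fraction of the data within k standard deviations of the mean."""
    if k <= 1:
        raise ValueError("k must be larger than 1")
    return 1 - 1 / k ** 2

def fraction_within(data, k):
    """Observed fraction of the data within k sample standard deviations."""
    x_bar, s = mean(data), stdev(data)
    return sum(abs(x - x_bar) <= k * s for x in data) / len(data)

incomes = [23, 31, 43, 47, 51, 58, 67, 83]
for k in (2, 3):
    print(k, chebyshev_bound(k), fraction_within(incomes, k))
```

The observed fraction is usually well above the bound, since the theorem has to hold for every possible data set.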
Statistics vs. Parameters
- Numerical summaries of sample data are called statistics.
- Numerical summaries of population data are called parameters.
  - Often represented by Greek letters
Plots of Summary Statistics
- Example 9 from the book
- Three different glues are tested with three
different types of wood (3x3 factorial study) and
the mean strength is calculated (based on 3
observations) for each combination.
Example 3.12
- A plot of the means categorized by glue and wood shows that pine and fir have similar gluing properties, with pine being stronger. The gluing properties of oak are much different (opposite trend).
[Figure: plot of mean strength by glue (white, cascamite, carpenter) with one line per wood (oak, pine, fir); vertical axis labeled 100, 150, 200, 250]
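3.4 Statistics for Qualitative Data
- The fraction of items in the sample with a particular characteristic is $\hat{p} = \frac{\text{number of items in the sample with the characteristic}}{\text{number of items in the sample}}$
- The sample mean occurrences per unit or item is $\hat{u} = \frac{\text{total number of occurrences}}{\text{total number of units or items}}$
- $\hat{u}$ is closer in meaning to $\bar{x}$ than to $\hat{p}$.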
Example 3.13
- A random sample of students from ISU is taken in which there end up being 210 freshmen, 171 sophomores, 182 juniors, and 115 seniors.
- Find $\hat{p}$ for each of the classifications.
Example 3.14
- When studying the number of towns reporting power outages caused by thunderstorms, it is found that there were 8 storms in which no outages were reported, 2 storms in which 3 outages were reported, and 5 storms in which 1 outage was reported. Find $\hat{u}$.
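Both qualitative summaries reduce to simple ratios. The sketch below assumes the sample fraction is the count in a class divided by the sample size, and the mean occurrences per unit is the total number of outages divided by the number of storms. The variable names `p_hat` and `u_hat` are chosen for the sketch.

```python
# Example 3.13: class counts in the sample
counts = {"freshmen": 210, "sophomores": 171, "juniors": 182, "seniors": 115}
n = sum(counts.values())
p_hat = {group: count / n for group, count in counts.items()}
for group, fraction in p_hat.items():
    print(f"{group:<11} {fraction:.3f}")

# Example 3.14: {outages reported in a storm: number of such storms}
storms = {0: 8, 3: 2, 1: 5}
total_outages = sum(outages * n_storms for outages, n_storms in storms.items())
total_storms = sum(storms.values())
u_hat = total_outages / total_storms
print(f"mean outages per storm = {u_hat:.3f}")
```

The second calculation is a weighted mean of the outage counts, which is why it behaves more like a sample mean than like a sample fraction.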
Plotting Qualitative Data
- Bar charts: same as histograms, but without intervals
- Segmented bar charts: each bar is divided between different levels of an additional variable
- Run chart: taken on categorical times
  - Example: daily
- Read more in section 3.4.2