Chapter 3 Descriptive Statistics

Provided by: karl252

1
Chapter 3 - Descriptive Statistics

Given precise enough measurement, even supposedly
constant process conditions produce differing
responses.
For this reason we are not as interested in
individual data values as we are in the pattern
or distribution of the data as a whole.
We'll use graphs to display the distribution and
statistics to describe the distribution.

2
3.1 Graphing Quantitative Data

Dot diagram
Each observation is a dot placed at a position
corresponding to its numerical value.
Stem-and-leaf plot
Made by using the last few digits of each data
point to indicate where it falls
Both displays require that each data point take up
the same amount of space.

3
Example 3.1

The government requires manufacturers to monitor
the amount of radiation emitted through the
closed door of a microwave. The following are
radiation amounts emitted by 24 microwaves
measured by one manufacturer.

4
Example 3.1

It's easiest to first order your data.

5
Dot Diagram
[Figure: dot diagram of the radiation data; axis marked .00, .10, .20, .30]
6
Stem-and-Leaf Plot

Use the first digit after the decimal place as
the stem and the second as the leaf
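A minimal Python sketch of the stem/leaf rule above, assuming every value lies between 0 and 1 and is recorded to two decimal places. The Example 3.1 readings are not reproduced in this transcript, so `sample` holds hypothetical values, and `stem_and_leaf` is an illustrative helper name, not something taken from the book.

```python
from collections import defaultdict


def stem_and_leaf(values):
    """Stem = first digit after the decimal point, leaf = second digit.
    Assumes every value lies in [0, 1) and is recorded to two decimals."""
    rows = defaultdict(list)
    for v in sorted(values):
        hundredths = round(v * 100)          # e.g. 0.15 -> 15
        stem, leaf = divmod(hundredths, 10)  # 15 -> stem 1, leaf 5
        rows[stem].append(leaf)
    lines = []
    for stem in range(min(rows), max(rows) + 1):   # keep empty stems visible
        leaves = "".join(str(leaf) for leaf in rows.get(stem, []))
        lines.append(f".{stem} | {leaves}")
    return lines


# Hypothetical readings (NOT the Example 3.1 data)
sample = [0.15, 0.09, 0.18, 0.10, 0.05, 0.12, 0.08, 0.05, 0.08, 0.10, 0.07, 0.02]
for line in stem_and_leaf(sample):
    print(line)
```

Empty stems are printed on purpose, so gaps in the data stay visible just as they would in a hand-drawn plot.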

7
Example 3.2

Other examples of splitting data into stem and
leaf

8
Stem-and-leaf Plot Vocabulary

Back-to-back
Example: men's vs. women's heights
Recorded in inches
[Figure: back-to-back stem-and-leaf plot of men's and women's heights sharing one column of stems]

Split Stems
There are multiple rows for each stem.
Each stem's leaves are split across its rows (for example, leaves 0-4 on one row and 5-9 on the next).
[Figure: split-stem plot of women's heights]

9
Graphing Quantitative Data

Frequency tables
Break data into intervals of equal length and
then tally the data points in each interval
The number of intervals we use varies
Every data point is included in exactly one
interval.
Histograms
Break data into intervals of equal length and
then create a connected bar chart
Begin vertical axis at zero
Draw bars with equal width
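The tallying step can be sketched in Python. This is only an illustration: the Example 3.1 data are not in this transcript, so it borrows the 15 values of Example 3.6, and the start of 70, the width of 10, and the convention that each interval includes its left endpoint but not its right are assumptions chosen so that every value lands in exactly one interval. `frequency_table` is a hypothetical helper name.

```python
def frequency_table(values, start, width, k):
    """Tally values into k equal-width intervals [start + j*width, start + (j+1)*width).
    Every value must fall in exactly one interval."""
    counts = [0] * k
    for v in values:
        j = int((v - start) // width)   # index of the interval containing v
        if not 0 <= j < k:
            raise ValueError(f"{v} lies outside the chosen intervals")
        counts[j] += 1
    return counts


# The 15 values from Example 3.6, intervals of width 10 starting at 70
data = [75, 80, 80, 85, 90, 95, 95, 100, 105, 110, 110, 115, 120, 120, 125]
counts = frequency_table(data, start=70, width=10, k=6)
for j, c in enumerate(counts):
    lo = 70 + 10 * j
    print(f"[{lo}, {lo + 10}): {'|' * c}  {c}")
```

The bar heights of a histogram are exactly these counts; changing `start` or `width` changes the picture, which is why the choice of intervals matters.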

10
Frequency Table

Data from example 3.1

11
Histogram

Data from example 3.1

[Figure: histogram of the radiation data; frequency axis up to 6, radiation axis marked .03, .09, .15, .21, .27]
12
Histogram (different intervals)
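[Figure: histogram of the same data using different intervals; frequency axis up to 8, radiation axis marked .05, .15, .25]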
13
BAD Histogram
[Figure: a poorly drawn histogram; frequency axis up to 8, radiation axis marked .05, .15, .25]
14
Distributional Shapes
Bell-shaped
Right-skewed
Left-skewed
Bimodal (Multimodal)
Uniform
Truncated
15
Bivariate Quantitative Data

Scatterplot
Plot each data point as a dot whose position is given by its
explanatory variable (horizontal axis) and response variable (vertical axis)
Look for patterns
Run Chart
Plot data points in order determined by the time
of observation
Look for patterns

16
Example 3.3

Scatterplot for ACT score and high school GPA for
12 students

The scatterplot of GPA against ACT score shows a
fairly strong, positive, linear relationship.
17
Example 3.4
strong, positive, linear relationship
strong, negative, linear relationship
strong curved relationship
18
Example 3.4 (continued)
weak, positive, linear relationship
weak, negative, linear relationship
no relationship
19
Example 3.5

Run Chart - Suppose that we plot the number of
typing errors made per minute by a typist against
time.

Pattern: a steady increase until minute 10,
then a sharp drop followed by another
steady increase. It was discovered that the
typist was given a five-minute break after minute
10.
20
3.2 Quantiles

The p quantile, denoted as Q(p), is the number
such that a fraction p of the distribution
lies to the left of Q(p) and a fraction 1-p
lies to the right of Q(p).

[Figure: a relative frequency curve with the quantile Q(1/3) marked]
21
Quantiles (Book)

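For empirical distributions, the above definition
translates to the following. For an ordered data set $x_1 \le x_2 \le \cdots \le x_n$:
1. For $i = 1, 2, \ldots, n$, the $\frac{i - .5}{n}$ quantile of
the data set is
the $i$th smallest data point, $x_i$. That is,
$Q\left(\frac{i - .5}{n}\right) = x_i$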

22
Quantiles (Book)

2. For any p not equal to $\frac{i - .5}{n}$ for some integer $i \le n$, such
that $\frac{.5}{n} < p < \frac{n - .5}{n}$, the p quantile is obtained by
linear
interpolation between the values of $\frac{i - .5}{n}$
(with corresponding $x_i$) that bracket p.

23
Finding Quantiles

General procedure for finding the p quantile of
an empirical distribution
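Order data values $x_{(1)} \le x_{(2)} \le \cdots \le x_{(n)}$
Set $i = np + 0.5$
If $i \in \{1, 2, \ldots, n\}$, then $Q(p) = x_{(i)}$;
otherwise, $Q(p) = x_{(\lfloor i \rfloor)} + (i - \lfloor i \rfloor)\left(x_{(\lfloor i \rfloor + 1)} - x_{(\lfloor i \rfloor)}\right)$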
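The procedure above translates almost line for line into Python. In this sketch `quantile` is a hypothetical helper name, and the small tolerance is an implementation detail (not part of the book's procedure) that keeps floating-point error in $np$ from hiding a whole-number value of $i$.

```python
import math


def quantile(data, p):
    """Q(p) following the procedure above: i = n*p + 0.5; return x_(i) when
    i is a whole number, otherwise interpolate between x_(floor i) and x_(floor i + 1)."""
    xs = sorted(data)                  # x_(1) <= x_(2) <= ... <= x_(n)
    n = len(xs)
    i = n * p + 0.5
    tol = 1e-9                         # absorbs floating-point error in n*p
    if i < 1 - tol or i > n + tol:
        raise ValueError("p must lie between 0.5/n and (n - 0.5)/n")
    k = round(i)
    if abs(i - k) < tol:               # i is (numerically) a whole number
        return xs[k - 1]               # Python lists are 0-indexed
    lo = math.floor(i)                 # x_(lo) and x_(lo + 1) bracket Q(p)
    return xs[lo - 1] + (i - lo) * (xs[lo] - xs[lo - 1])


batteries = [100, 120, 80, 90, 95, 115, 120, 110, 105, 95]   # Example 3.5 (hrs)
for p in (0.35, 0.42, 0.68, 0.90, 0.25, 0.50, 0.75):
    print(f"Q({p:.2f}) = {quantile(batteries, p):g}")
```

Applied to the battery lifetimes of Example 3.5 it gives Q(.35) = 95, Q(.42) = 98.5, Q(.68) = 111.5, Q(.90) = 120, Q(.25) = 95, Q(.50) = 102.5 and Q(.75) = 115. For instance, for p = .42, i = 10(.42) + .5 = 4.7, so Q(.42) = 95 + .7(100 - 95) = 98.5.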

24
Example 3.5

Ten batteries were tested to determine how long
the batteries would last (hrs) under normal
conditions. Below are the ten values that were
obtained
100, 120, 80, 90, 95, 115, 120, 110, 105, 95
Give values for Q(.35) and Q(.42)
Give the values for Q(.68) and Q(.90)
Give the values for Q(.25), Q(.50) and Q(.75)
Step1 Order the data
80,90,95,95,100,105,110,115,120,120

25
Example 3.5a

Give the values for Q(.35) and Q(.42)

26
Example 3.5b

Give the values for Q(0.68) and Q(0.90)

27
Example 3.5c

Give the values for Q(0.25),Q(.50) and Q(0.75)

28
Quantile Terminology

Special quantiles
Q(.25) = Q1, 1st quartile, lower quartile
Q(.5) = Q2, 2nd quartile, median
Q(.75) = Q3, 3rd quartile, upper quartile
Special values associated with quartiles
Inter-quartile range: IQR = Q3 - Q1
Upper fence: UF = Q3 + 1.5 IQR
Lower fence: LF = Q1 - 1.5 IQR

29
Boxplots

Another tool used to illustrate the distribution
Steps for making a boxplot
Order the observed data values
Find Q1, Q2, Q3, IQR, UF and LF
Draw a box that spans the IQR
Divide the box at the median (Q2)
Draw asterisks (or dots) for any data values less
than the lower fence and any values greater than
the upper fence
Draw a line from the sides of the box to the
smallest value greater than the LF and the
largest value smaller than the UF

30
Example 3.6

Draw a boxplot based on the 15 ordered values
below
75, 80, 80, 85, 90, 95, 95, 100, 105, 110, 110,
115, 120, 120, 125
Find the necessary values

31
Example 3.6

Q1 = 86.25, Q2 = 100, Q3 = 113.75, so
IQR = 113.75 - 86.25 = 27.5 and 1.5(IQR) = 41.25
UF = Q(.75) + 1.5(IQR) = 155
LF = Q(.25) - 1.5(IQR) = 45
Identify all values outside the upper and lower
fences: none
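The Example 3.6 numbers can be double-checked with a short sketch. It assumes the same $i = np + .5$ quantile rule, treats a value lying exactly on a fence as inside the fences (the slides do not say how ties are handled), and uses the hypothetical helper names `quantile` and `boxplot_summary`.

```python
import math


def quantile(xs, p):
    """Q(p) for sorted xs via i = n*p + 0.5 and linear interpolation.
    Adequate for p = .25, .5, .75, where n*p is exact in floating point."""
    i = len(xs) * p + 0.5
    lo = math.floor(i)
    frac = i - lo
    if frac == 0:
        return xs[lo - 1]                       # i is a whole number: Q(p) = x_(i)
    return xs[lo - 1] + frac * (xs[lo] - xs[lo - 1])


def boxplot_summary(data):
    """Quartiles, IQR, fences, whisker ends and outliers as defined on the slides."""
    xs = sorted(data)
    q1, q2, q3 = (quantile(xs, p) for p in (0.25, 0.5, 0.75))
    iqr = q3 - q1
    lf, uf = q1 - 1.5 * iqr, q3 + 1.5 * iqr     # lower and upper fences
    inside = [x for x in xs if lf <= x <= uf]   # values a whisker may reach
    return {
        "Q1": q1, "Q2": q2, "Q3": q3, "IQR": iqr, "LF": lf, "UF": uf,
        "whisker_low": min(inside), "whisker_high": max(inside),
        "outliers": [x for x in xs if x < lf or x > uf],
    }


data = [75, 80, 80, 85, 90, 95, 95, 100, 105, 110, 110, 115, 120, 120, 125]  # Example 3.6
summary = boxplot_summary(data)
print(summary)
```

It reports Q1 = 86.25, Q2 = 100, Q3 = 113.75, IQR = 27.5, fences at 45 and 155, whiskers ending at 75 and 125, and no outliers.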

32
Example 3.6
33
Q-Q Plots

Q-Q plot: Quantile-Quantile Plot
Used for two data sets
We plot Q(p) for data set 1 (denoted $Q_1(p)$)
versus Q(p) for data set 2 (denoted $Q_2(p)$)
We will only deal with two data sets of the same
size
Straight line indicates that the two data sets
have the same distributional shape.

34
Example 3.7

Data Set 1: 1, 2, 3, 4, 5
Data Set 2: 6, 7, 8, 9, 10

35
Example 3.8

Data set 1: 1, 5, 7, 8, 9, 10
Data set 2: -10, -9, -8, -7, -5, -1
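For two data sets of the same size n, $Q_1(p)$ and $Q_2(p)$ at $p = \frac{i - .5}{n}$ are simply the $i$th smallest values, so the Q-Q points just pair up the sorted data. The sketch below (hypothetical helper names `qq_pairs` and `on_a_line`) builds those pairs for Examples 3.7 and 3.8 and checks whether they fall exactly on one line; real data are only ever approximately straight, so the exact check suits toy examples like these only.

```python
def qq_pairs(data1, data2):
    """Q-Q plot points for two equal-size data sets: the i-th smallest value of
    set 1 paired with the i-th smallest value of set 2."""
    if len(data1) != len(data2):
        raise ValueError("this sketch only handles data sets of the same size")
    return list(zip(sorted(data1), sorted(data2)))


def on_a_line(points, tol=1e-9):
    """True if every point is collinear with the first two (cross-product test)."""
    (x0, y0), (x1, y1) = points[0], points[1]
    return all(abs((x1 - x0) * (y - y0) - (y1 - y0) * (x - x0)) <= tol
               for x, y in points)


ex37 = qq_pairs([1, 2, 3, 4, 5], [6, 7, 8, 9, 10])
ex38 = qq_pairs([1, 5, 7, 8, 9, 10], [-10, -9, -8, -7, -5, -1])
print(ex37, on_a_line(ex37))
print(ex38, on_a_line(ex38))
```

Example 3.7 gives the points (1, 6), (2, 7), ..., (5, 10), which lie on a straight line: the two sets have the same shape, one shifted by 5. Example 3.8 does not give a straight line; its second data set is the first one negated, so the two shapes are mirror images.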

36
Normal Probability Plot

A normal probability plot is a type of Q-Q plot
that allows us to determine if the distribution
of our data is bell-shaped (normal).
A roughly straight line indicates that our data are
normal/bell-shaped.
A curved (bowed) pattern indicates that our data are
skewed.

37
Table 3.10

Table 3.10 (page 89) in the book gives some
quantiles for a distribution that is known to be
bell-shaped.
The numbers in the body of the table give Q(p)
for p given by the margins of the table.
Q(.23) = -.74
Row .2 and Column .03 give the p = 0.23 quantile as
-.74
Q(.79) = .81
Row .7 and Column .09 give the p = 0.79 quantile as
.81
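Assuming Table 3.10 lists quantiles of the standard normal distribution (mean 0, standard deviation 1), which is consistent with the two values read off above, the same lookups can be done with the standard-library `statistics.NormalDist` (Python 3.8+) instead of the printed table.

```python
from statistics import NormalDist

z = NormalDist()          # standard normal: mean 0, standard deviation 1
for p in (0.23, 0.79):
    # inv_cdf(p) is the value with a fraction p of the distribution to its left
    print(f"Q({p}) = {z.inv_cdf(p):.2f}")
```

This prints Q(0.23) = -0.74 and Q(0.79) = 0.81, matching the table entries.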

38
Example 3.9

Annual incomes (in thousands of dollars) for 8
families (in a common geographical location) are
given below: 23, 31, 43, 47, 51, 58, 67, 83
Does this data appear to be from a bell-shaped
distribution?
Remember $p = \frac{i - .5}{n}$

39
Example 3.9
Bell-shaped distribution?
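A sketch of the coordinates for the Example 3.9 normal probability plot, assuming the usual construction: the $i$th smallest income is paired with the standard normal quantile at $p = \frac{i - .5}{n}$, here taken from `statistics.NormalDist` rather than Table 3.10. Plotting is left to any graphing tool; the loop only prints the pairs.

```python
from statistics import NormalDist

incomes = [23, 31, 43, 47, 51, 58, 67, 83]     # Example 3.9, thousands of dollars
n = len(incomes)
z = NormalDist()                                # standard normal (mean 0, sd 1)

ps = [(i - 0.5) / n for i in range(1, n + 1)]   # p = (i - .5)/n for i = 1..n
normal_q = [z.inv_cdf(p) for p in ps]           # standard normal quantiles

# Each plotted point pairs the i-th smallest income with the normal quantile at p_i
for x, q in zip(sorted(incomes), normal_q):
    print(f"normal Q(p) = {q:6.2f}   income = {x}")
```

If the printed pairs fall close to a straight line, a bell-shaped distribution is plausible.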
40
3.3 Numerical Measures

Measures of location
Median
Same as Q(.5)
Not affected by skew or outliers (extreme
observations)
Sample Mean
For data $x_1, x_2, \ldots, x_n$, the mean is
given as $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$
Strongly affected by skew or outliers

41
Example 3.10

Data set 1: 2, 3, 5, 8, 12
Median = 5
Mean = (2 + 3 + 5 + 8 + 12)/5 = 6
Data set 2: 2, 3, 5, 8, 102
Median = 5
Mean = (2 + 3 + 5 + 8 + 102)/5 = 24
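Python's standard `statistics` module reproduces Example 3.10 directly; the sketch below is just a convenient check of the arithmetic.

```python
from statistics import mean, median

set1 = [2, 3, 5, 8, 12]
set2 = [2, 3, 5, 8, 102]     # same data with the largest value made extreme
for name, data in (("Data set 1", set1), ("Data set 2", set2)):
    print(f"{name}: median = {median(data)}, mean = {mean(data)}")
```

Replacing 12 by 102 leaves the median at 5 but pulls the mean from 6 to 24, which is the sense in which the mean is strongly affected by outliers.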

42
Measures of Spread

IQR
Measures the spread of the middle half of the
data
Not sensitive to skew or outliers
Range (largest value minus smallest value)
Highly sensitive to outliers

43
Measures of Spread

Sample Variance
How much the data are spread around the sample mean,
$\bar{x}$: $s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2$
Sensitive to outliers or skew
Sample Standard Deviation: $s = \sqrt{s^2}$
Sensitive to outliers or skew

44
Example 3.11

Same data as Example 3.9: 23, 31, 43, 47,
51, 58, 67, 83
Find the range
Find the mean
Find the variance
Find the standard deviation
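One way to work Example 3.11, written as a sketch: the variance is computed from the $n - 1$ formula for $s^2$ so each step is visible, and `statistics.variance` and `statistics.stdev` (which also divide by $n - 1$) are printed alongside as a cross-check.

```python
import math
from statistics import mean, stdev, variance

incomes = [23, 31, 43, 47, 51, 58, 67, 83]      # Example 3.11 data, thousands of dollars
n = len(incomes)

data_range = max(incomes) - min(incomes)
xbar = mean(incomes)
# sample variance: squared deviations from the mean, divided by n - 1
s2 = sum((x - xbar) ** 2 for x in incomes) / (n - 1)
s = math.sqrt(s2)

print(f"range = {data_range}, mean = {xbar}")
print(f"s^2 = {s2:.2f} (statistics.variance: {variance(incomes):.2f})")
print(f"s   = {s:.2f} (statistics.stdev: {stdev(incomes):.2f})")
```

By hand: the range is 83 - 23 = 60, the mean is 403/8 = 50.375, the squared deviations sum to 2589.875, so $s^2 = 2589.875/7 \approx 369.98$ and $s \approx 19.23$.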

45
Understanding Standard Deviation

Chebyshev's Theorem
For any data set and any number k larger than 1,
a fraction of at least $1 - \frac{1}{k^2}$ of the
data are within $ks$ of $\bar{x}$.
At least 3/4 of the data will be within 2 standard
deviations of the mean, at least 8/9 of the data will be
within 3 standard deviations of the mean, etc.
Standard deviation acts as a ruler
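A quick numerical illustration of the bound, using the Example 3.9 incomes (the choice of data set and of k = 1.5, 2, 3 is arbitrary): the observed fraction within $k$ standard deviations of the mean should never fall below $1 - 1/k^2$.

```python
from statistics import mean, stdev

incomes = [23, 31, 43, 47, 51, 58, 67, 83]    # Example 3.9 data, thousands of dollars
xbar, s = mean(incomes), stdev(incomes)

results = []
for k in (1.5, 2, 3):
    # fraction of observations with |x - xbar| <= k*s
    within = sum(abs(x - xbar) <= k * s for x in incomes) / len(incomes)
    bound = 1 - 1 / k**2                       # Chebyshev's guaranteed minimum
    results.append((k, within, bound))
    print(f"k = {k}: observed {within:.3f}, Chebyshev guarantees at least {bound:.3f}")
```

The bound must hold for every data set, so for any particular data set the observed fractions are often well above it.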

46
Statistics vs. Parameters

Numerical summaries of sample data are called
statistics.
Numerical summaries of population data are called
parameters.
Often represented by Greek letters

47
Plots of Summary Statistics

Example 9 from the book
Three different glues are tested with three
different types of wood (3x3 factorial study) and
the mean strength is calculated (based on 3
observations) for each combination.

48
Example 3.12

A plot of the means categorized by glue and wood
shows that pine and fir have similar gluing
properties with pine being stronger. The gluing
properties of oak are much different (opposite
trend).

[Figure: mean strength (axis 100 to 250) for the white, carpenter and cascamite glues, with separate lines for oak, pine and fir]
49
3.4 Statistics for Qualitative Data

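The fraction of items in the sample with a
particular characteristic is $\hat{p} = \frac{\text{number of items with the characteristic}}{n}$
The sample mean occurrences per unit or item is $\hat{u} = \frac{\text{total number of occurrences}}{\text{total number of units}}$
$\hat{u}$ is closer in meaning to $\bar{x}$ than to $\hat{p}$.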

50
Example 3.13

A random sample of students from ISU is taken in
which there ends up being 210 freshmen, 171
sophomores, 182 juniors, and 115 seniors.
Find $\hat{p}$ for each of the classifications.
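The sample fractions for Example 3.13 are just counts divided by the sample size; the sketch below does the arithmetic in Python for convenience.

```python
counts = {"freshmen": 210, "sophomores": 171, "juniors": 182, "seniors": 115}
n = sum(counts.values())                      # total sample size
p_hat = {cls: c / n for cls, c in counts.items()}

for cls, p in p_hat.items():
    print(f"{cls:>10}: {counts[cls]}/{n} = {p:.3f}")
```

With n = 678 this gives about .310 for freshmen, .252 for sophomores, .268 for juniors and .170 for seniors; the four fractions sum to 1 because every student falls in exactly one class.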

51
Example 3.14

When studying the number of towns reporting power
outages caused by thunderstorms, it is found that
there were 8 storms in which no outages were
reported, 2 storms in which 3 outages were
reported, and 5 storms in which 1 outage was
reported. Find $\hat{u}$.
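For Example 3.14, a sketch that treats each storm as one unit and each town reporting an outage as one occurrence (the reading of the example assumed here):

```python
# (number of storms, towns reporting outages in each of those storms), Example 3.14
storms = [(8, 0), (2, 3), (5, 1)]

total_units = sum(count for count, _ in storms)                        # storms observed
total_occurrences = sum(count * outages for count, outages in storms)  # outage reports
u_hat = total_occurrences / total_units

print(f"u-hat = {total_occurrences}/{total_units} = {u_hat:.3f}")
```

That is, 8(0) + 2(3) + 5(1) = 11 reported outages over 8 + 2 + 5 = 15 storms, so $\hat{u} = 11/15 \approx 0.733$ outages per storm.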

52
Plotting Qualitative Data

Bar charts: same as histograms, but with one bar per
category instead of numerical intervals
Segmented bar charts: each bar is divided
among the levels of an additional
variable
Run charts: taken over categorical time periods
Example: daily
Read more in section 3.4.2
