Chapter One Data Collection - PowerPoint PPT Presentation

About This Presentation
Title:

Chapter One Data Collection


Slides: 347
Provided by: michae1237

Transcript and Presenter's Notes

Title: Chapter One Data Collection


1
Chapter One: Data Collection
  • 1.1
  • Introduction to the Practice of Statistics

2
Statistics is the science of data. A statistic
is a quantity derived from information collected
from the population of interest (e.g., the mean).
3
Variables are the characteristics of the
individuals within the population.
4
Variables are the characteristics of the
individuals within the population. Variables are
the quantities we are interested in measuring to
find out something about a population.
5
Qualitative or Categorical variables allow for
classification of individuals based on some
attribute or characteristic.
6
Qualitative or Categorical variables allow for
classification of individuals based on some
attribute or characteristic.
Quantitative variables provide numerical measures
of individuals. Arithmetic operations (means,
etc.) can be performed on these.
7
Two Types of Quantitative Variables
8
Two Types of Quantitative Variables
A discrete variable is a quantitative variable
that either has a finite number of possible
values or a countable number of possible values.
The term countable means the values result from
counting such as 0, 1, 2, 3, and so on.
9
Two Types of Quantitative Variables
A discrete variable is a quantitative variable
that either has a finite number of possible
values or a countable number of possible values.
The term countable means the values result from
counting such as 0, 1, 2, 3, and so on.
A continuous variable is a quantitative variable
that has an infinite number of possible values it
can take on and can be measured to any desired
level of accuracy.
10
The list of observations a set of variables
assume is called data.
11
The list of observations a set of variables
assume is called data. Qualitative data are
observations corresponding to a qualitative
variable.
12
The list of observations a set of variables
assume is called data. Qualitative data are
observations corresponding to a qualitative
variable. Quantitative data are observations
corresponding to a quantitative (numerical)
variable.
13
  • The list of observations a set of variables
    assume is called data.
  • Qualitative data are observations corresponding
    to a qualitative variable.
  • Quantitative data are observations corresponding
    to a quantitative (numerical) variable.
  • Discrete data are observations corresponding to
    a discrete variable.

14
  • The list of observations a set of variables
    assume is called data.
  • Qualitative data are observations corresponding
    to a qualitative variable.
  • Quantitative data are observations corresponding
    to a quantitative (numerical) variable.
  • Discrete data are observations corresponding to
    a discrete variable.
  • Continuous data are observations corresponding
    to a continuous variable.

15
Chapter One: Data Collection
  • 1.2, 1.3
  • Observational Studies
  • Random Sampling

16
A population is the entire group of people or
objects that we are interested in studying.
17
A population is the entire group of people or
objects that we are interested in studying. A
sample is simply a subcollection of individuals
from a population.
18
Two Types of Studies
19
  • Two Types of Studies
  • Observational Study

20
  • Two Types of Studies
  • Observational Study
  • Designed Experiment

21
An observational study does not attempt to
manipulate or apply a treatment to the
individuals in the sample.
22
An observational study does not attempt to
manipulate or apply a treatment to the
individuals in the sample. Observational studies
are useful for determining if there is a relation
(correlation) between two variables in a
population.
23
Sampling Techniques from Finite Populations
24
  • Sampling Techniques from Finite Populations
  • Simple Random Samples

25
  • Sampling Techniques from Finite Populations
  • Simple Random Samples
  • Stratified Samples

26
  • Sampling Techniques from Finite Populations
  • Simple Random Samples
  • Stratified Samples
  • Systematic Samples

27
  • Sampling Techniques from Finite Populations
  • Simple Random Samples
  • Stratified Samples
  • Systematic Samples
  • Cluster Samples

28
  • Sampling Techniques from Finite Populations
  • Simple Random Samples
  • Stratified Samples
  • Systematic Samples
  • Cluster Samples
  • Convenience Samples

29
Simple Random Samples
30
Simple Random Samples
N = number of individuals in the population.
31
Simple Random Samples
N = number of individuals in the population.
n = number of individuals selected in the sample.
32
Simple Random Samples
N = number of individuals in the population.
n = number of individuals selected in the sample.
If every possible sample of size n is equally
likely to be selected, the sample is a simple
random sample.
33
Steps for Obtaining a Simple Random Sample
34
Steps for Obtaining a Simple Random Sample
1) List all the individuals in the population of
interest.
35
Steps for Obtaining a Simple Random Sample
1) List all the individuals in the population of
interest.
2) Number the individuals from 1 to N.
36
Steps for Obtaining a Simple Random Sample
1) List all the individuals in the population of
interest.
2) Number the individuals from 1 to N.
3) Use a random number table, graphing
calculator, or statistical software to randomly
generate n numbers, where n is the desired sample
size.
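The three steps above can be sketched in Python. This is only an illustration: the function name simple_random_sample, the student list, and the seed are assumptions, and the sketch assumes sampling without replacement (n distinct numbers).

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw n individuals so that every subset of size n is equally likely."""
    rng = random.Random(seed)
    # Steps 1 and 2: the list is the population; positions 1..N are the numbers.
    numbers = rng.sample(range(1, len(population) + 1), n)
    # Step 3: map the n randomly generated numbers back to individuals.
    return [population[i - 1] for i in numbers]

# Hypothetical population of N = 6 students; draw a sample of size n = 3.
students = ["Ana", "Ben", "Cal", "Dee", "Eli", "Fay"]
sample = simple_random_sample(students, n=3, seed=1)
print(sample)
```

Fixing the seed only makes the illustration repeatable; in practice the seed would be left unset.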
37
Stratified Random Sample
38
Stratified Random Sample A stratified random
sample is one obtained by separating the
population into non-overlapping groups called
strata.
39
Stratified Random Sample A stratified random
sample is one obtained by separating the
population into non-overlapping groups called
strata. A simple random sample is then obtained
from each stratum.
40
Stratified Random Sample A stratified random
sample is one obtained by separating the
population into non-overlapping groups called
strata. A simple random sample is then obtained
from each stratum. Each stratum is relatively
homogeneous with respect to a certain variable.
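A minimal Python sketch of stratified sampling, assuming each individual carries a label that identifies its stratum. The names stratified_sample, stratum_of, and the class-year data are hypothetical, and taking the same number from each stratum is just one possible allocation.

```python
import random
from collections import defaultdict

def stratified_sample(individuals, stratum_of, n_per_stratum, seed=None):
    """Split the population into strata, then take a simple random sample from each."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for person in individuals:
        strata[stratum_of(person)].append(person)   # non-overlapping groups
    sample = []
    for label in sorted(strata):
        sample.extend(rng.sample(strata[label], n_per_stratum))
    return sample

# Hypothetical population: (name, class year) pairs with two strata.
people = [("Ana", "freshman"), ("Ben", "freshman"), ("Cal", "freshman"),
          ("Dee", "senior"), ("Eli", "senior"), ("Fay", "senior")]
chosen = stratified_sample(people, stratum_of=lambda p: p[1], n_per_stratum=2, seed=0)
print(chosen)
```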
41
A systematic sample is obtained by selecting
every kth individual from the population up to
the desired sample size n.
42
STEPS IN SYSTEMATIC SAMPLING, POPULATION SIZE
KNOWN
43
STEPS IN SYSTEMATIC SAMPLING, POPULATION SIZE
KNOWN
Step 1: Determine the population size, N.
44
STEPS IN SYSTEMATIC SAMPLING, POPULATION SIZE
KNOWN
Step 1: Determine the population size, N.
Step 2: Determine the sample size desired, n.
45
STEPS IN SYSTEMATIC SAMPLING, POPULATION SIZE
KNOWN
Step 1: Determine the population size, N.
Step 2: Determine the sample size desired, n.
Step 3: Compute N/n and round down to the
nearest integer. This value is k.
46
STEPS IN SYSTEMATIC SAMPLING, POPULATION SIZE
KNOWN
Step 1: Determine the population size, N.
Step 2: Determine the sample size desired, n.
Step 3: Compute N/n and round down to the
nearest integer. This value is k.
Step 4: Randomly select a number between 1 and k.
Call this number p.
47
STEPS IN SYSTEMATIC SAMPLING, POPULATION SIZE
KNOWN
Step 1: Determine the population size, N.
Step 2: Determine the sample size desired, n.
Step 3: Compute N/n and round down to the
nearest integer. This value is k.
Step 4: Randomly select a number between 1 and k.
Call this number p.
Step 5: Select every kth individual starting at
the pth individual. The sample will consist of
the following individuals:
p, p + k, p + 2k, ..., p + (n - 1)k
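The five steps can be followed line by line in a short Python sketch. The function name systematic_sample, the population of 100 ID numbers, and the seed are illustrative assumptions.

```python
import random

def systematic_sample(population, n, seed=None):
    """Select every kth individual, starting from a random position p."""
    rng = random.Random(seed)
    N = len(population)                  # Step 1: population size
    k = N // n                           # Steps 2 and 3: N/n rounded down
    p = rng.randint(1, k)                # Step 4: random start between 1 and k
    positions = [p + i * k for i in range(n)]       # Step 5: p, p + k, p + 2k, ...
    return [population[pos - 1] for pos in positions]

# Hypothetical population: individuals numbered 1 to 100, sample size 8 (k = 12).
ids = list(range(1, 101))
picked = systematic_sample(ids, n=8, seed=3)
print(picked)
```

Because k is rounded down, the last selected position p + (n - 1)k never exceeds N.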
48
A cluster sample is obtained by selecting all
individuals within a randomly selected collection
or group of individuals.
49
A convenience sample is one in which the
individuals in the sample are easily obtained.
Any studies that use this type of sampling
generally have results that are suspect. Results
should be looked upon with extreme skepticism.
50
Chapter One: Data Collection
  • 1.5
  • The Design of Experiments

51
A designed experiment is a controlled study in
which one or more treatments are applied to
experimental units.
52
A designed experiment is a controlled study in
which one or more treatments are applied to
experimental units. The experimenter then
observes the effect of varying these treatments
on a response variable.
53
The experimental unit (or subject) is a person,
object or some other well-defined item upon which
a treatment is applied.
54
The experimental unit (or subject) is a person,
object or some other well-defined item upon which
a treatment is applied. The treatment is a
condition applied to the experimental unit.
55
The experimental unit (or subject) is a person,
object or some other well-defined item upon which
a treatment is applied. The treatment is a
condition applied to the experimental unit. A
response variable is a quantitative or
qualitative variable that represents our variable
of interest.
56
The experimental unit (or subject) is a person,
object or some other well-defined item upon which
a treatment is applied. The treatment is a
condition applied to the experimental unit. A
response variable is a quantitative or
qualitative variable that represents our variable
of interest. A predictor variable is a variable
which somehow affects the response variable.
These include the experimental treatment.
57
Chapter Two: Organizing and Summarizing Data
  • 2.1
  • Organizing Qualitative Data

58
Suppose there are k categories listed from 1 to
k.
59
Suppose there are k categories listed from 1 to
k. We collect a sample of size n.
60
Suppose there are k categories listed from 1 to
k. We collect a sample of size n. n1 fall in
category 1.
61
Suppose there are k categories listed from 1 to
k. We collect a sample of size n. n1 fall in
category 1. n2 fall in category 2.
62
Suppose there are k categories listed from 1 to
k. We collect a sample of size n. n1 fall in
category 1. n2 fall in category 2. . . .
63
Suppose there are k categories listed from 1 to
k. We collect a sample of size n. n1 fall in
category 1. n2 fall in category 2. . . .
nk fall in category k.
64
The frequency of a category is the number of
times the data fall into that category.
65
The frequency of a category is the number of
times the data fall into that category. The
frequency of category j is nj for j = 1, ..., k.
66
The frequency of a category is the number of
times the data fall into that category. The
frequency of category j is nj for j = 1, ..., k.
Note: n = n1 + n2 + ... + nk.
67
A frequency distribution lists the number of
occurrences for each category of data.
68
A frequency distribution lists the number of
occurrences for each category of data. In other
words, the list of frequencies (n1, n2, ...,
nk) is the frequency distribution for the
categorical variable with categories 1 to k.
69
The relative frequency is the proportion or
percent of observations within a category and is
found using the formula
70
The relative frequency is the proportion or
percent of observations within a category and is
found using the formula
relative frequency = frequency / sum of all frequencies
71
The relative frequency is the proportion or
percent of observations within a category and is
found using the formula
relative frequency = frequency / sum of all frequencies
I.e., the relative frequency for category j is
rj = nj / n.
72
A relative frequency distribution lists the
relative frequency of each category of data.
73
A relative frequency distribution lists the
relative frequency of each category of data. In
other words, the list of relative
frequencies (r1, r2, ..., rk) is the
relative frequency distribution for the
categorical variable with categories 1 to k.
74
A relative frequency distribution lists the
relative frequency of each category of data. In
other words, the list of relative
frequencies (r1, r2, ..., rk) is the
relative frequency distribution for the
categorical variable with categories 1 to k.
Note: 1 = r1 + r2 + ... + rk.
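As a small illustration of the two distributions, the following Python sketch tallies a made-up sample of blood types. The data and variable names are assumptions chosen only for the example.

```python
from collections import Counter

# Hypothetical qualitative data: blood types of n = 8 individuals.
blood_types = ["A", "O", "B", "O", "A", "O", "AB", "O"]
n = len(blood_types)

# Frequency distribution: nj = number of observations in category j.
frequency = Counter(blood_types)

# Relative frequency distribution: rj = nj / n.
relative = {category: count / n for category, count in frequency.items()}

for category in sorted(frequency):
    print(category, frequency[category], relative[category])
```

The frequencies add up to n and the relative frequencies add up to 1, matching the two notes on the slides.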
75
A bar graph is constructed by
76
  • A bar graph is constructed by
  • Labeling each category of data on a horizontal
    axis.

77
  • A bar graph is constructed by
  • Labeling each category of data on a horizontal
    axis.
  • Marking the frequency or relative frequency of
    the category on the vertical axis.

78
  • A bar graph is constructed by
  • Labeling each category of data on a horizontal
    axis.
  • Marking the frequency or relative frequency of
    the category on the vertical axis.
  • Drawing a rectangle of equal width for each
    category, with height equal to the category's
    frequency or relative frequency.

79
Chapter Two: Organizing and Summarizing Data
  • 2.2
  • Organizing Quantitative Data I

80
Summarizing Quantitative Data
81
  • Summarizing Quantitative Data
  • Discrete Data

82
  • Summarizing Quantitative Data
  • Discrete Data
  • Recall that discrete data consist of a finite or
    countable (0, 1, 2, ...) number of numerical values.

83
  • Summarizing Quantitative Data
  • Discrete Data
  • Recall that discrete data consist of a finite or
    countable (0, 1, 2, ...) number of numerical
    values.
  • Continuous Data

84
  • Summarizing Quantitative Data
  • Discrete Data
  • Recall that discrete data consist of a finite or
    countable (0, 1, 2, ...) number of numerical
    values.
  • Continuous Data
  • Recall that continuous data are real numbers:
    an infinite number of possible values, measured
    to any degree of accuracy.

85
When summarizing quantitative data, we need to
create groups of numbers called classes.
86
When summarizing quantitative data, we need to
create groups of numbers called classes. We can
then construct frequency distributions and
relative frequency distributions using these
classes.
87
When summarizing quantitative data, we need to
create groups of numbers called classes. We can
then construct frequency distributions and
relative frequency distributions using these
classes. With discrete data we can use each
individual number as its own class.
88
A histogram is a graphical representation of the
frequencies in each class.
89
A histogram is a graphical representation of the
frequencies in each class. It is constructed by
drawing rectangles for each class of data
90
A histogram is a graphical representation of the
frequencies in each class. It is constructed by
drawing rectangles for each class of data
Frequency histogram the height is the frequency
of the class.
91
A histogram is a graphical representation of the
frequencies in each class. It is constructed by
drawing rectangles for each class of data
Frequency histogram the height is the frequency
of the class. Relative frequency histogram the
height is the relative frequency of the class.
92
Continuous data are summarized similarly to
discrete data.
93
Continuous data are summarized similarly to
discrete data. However, with continuous data we
need to create classes instead of using
individual numbers as classes.
94
Continuous data are summarized similarly to
discrete data. However, with continuous data we
need to create classes instead of using
individual numbers as classes. Classes for
continuous data are created by using
non-overlapping intervals of (usually) equal
width.
95
A rule of thumb is that we want approximately 5
to 20 classes.
96
A rule of thumb is that we want approximately 5
to 20 classes. For smaller datasets use fewer
classes and for larger datasets use more
classes.
97
Steps for Making a Frequency Distribution with
Continuous Data
98
Steps for Making a Frequency Distribution with
Continuous Data
Step 1: Determine the number of classes, C.
99
Steps for Making a Frequency Distribution with
Continuous Data
Step 1: Determine the number of classes, C.
Step 2: Calculate the range of the data, R.
100
Steps for Making a Frequency Distribution with
Continuous Data
Step 1: Determine the number of classes, C.
Step 2: Calculate the range of the data:
R = Largest Value - Smallest Value.
101
Steps for Making a Frequency Distribution with
Continuous Data
Step 1: Determine the number of classes, C.
Step 2: Calculate the range of the data:
R = Largest Value - Smallest Value.
Step 3: Let W = R / C (approximately). W is called
the class width.
102
Steps for Making a Frequency Distribution with
Continuous Data
Step 1: Determine the number of classes, C.
Step 2: Calculate the range of the data:
R = Largest Value - Smallest Value.
Step 3: Let W = R / C (approximately). W is called
the class width.
Step 4: Starting at a value equal to or slightly
less than the lowest value in the data, create C
classes of width W.
103
Steps for Making a Frequency Distribution with
Continuous Data
Step 1: Determine the number of classes, C.
Step 2: Calculate the range of the data:
R = Largest Value - Smallest Value.
Step 3: Let W = R / C (approximately). W is called
the class width.
Step 4: Starting at a value equal to or slightly
less than the lowest value in the data, create C
classes of width W.
Step 5: Tally the frequencies for each class.
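One way to carry out these steps is sketched below in Python. Several choices are assumptions rather than part of the slides: the classes start exactly at the smallest value, W is taken as exactly R / C, each class includes its lower boundary but not its upper one, and the largest value is counted in the last class. The sketch also assumes the data are not all equal. The waiting-time data are invented.

```python
def frequency_distribution(data, num_classes):
    """Return [((lower, upper), frequency), ...] for C classes of equal width."""
    low = min(data)
    data_range = max(data) - low          # Step 2: R = largest - smallest
    width = data_range / num_classes      # Step 3: W = R / C
    counts = [0] * num_classes
    for x in data:
        # Step 5: find the class of x; the maximum goes into the last class.
        index = min(int((x - low) / width), num_classes - 1)
        counts[index] += 1
    # Step 4: C classes of width W, starting at the smallest value.
    classes = [(low + i * width, low + (i + 1) * width) for i in range(num_classes)]
    return list(zip(classes, counts))

# Hypothetical waiting times in minutes, summarized with C = 5 classes.
waits = [2.0, 3.5, 4.1, 4.8, 5.0, 6.2, 7.7, 8.3, 9.9, 12.0]
table = frequency_distribution(waits, num_classes=5)
for (lower, upper), count in table:
    print(lower, upper, count)
```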
104
Histograms for continuous data are made exactly
as for discrete data.
105
Histograms for continuous data are made exactly
as for discrete data. We can make a frequency
histogram using the frequency distribution.
106
Histograms for continuous data are made exactly
as for discrete data. We can make a frequency
histogram using the frequency distribution. We
can make a relative frequency histogram using the
relative frequency distribution.
107
Stem-and-Leaf Plots
108
Stem-and-Leaf Plots Stem-and-Leaf Plots are
analogous to histograms but display more
numerical details of the data.
109
Stem-and-Leaf Plots Stem-and-Leaf Plots are
analogous to histograms but display more
numerical details of the data. A stem-and-leaf
plot is essentially a histogram turned on its
side.
110
Construction of a Stem-and-Leaf Plot
111
Construction of a Stem-and-Leaf Plot
Step 1: The stem is the leading digit(s). The leaf
is the rightmost digit. (The choice of the stem
depends upon the class width desired.)
112
Construction of a Stem-and-Leaf Plot
Step 1: The stem is the leading digit(s). The leaf
is the rightmost digit. (The choice of the stem
depends upon the class width desired.)
Step 2: Write the stems in a vertical column in
increasing order. Draw a vertical line to the
right of the stems.
113
Construction of a Stem-and-Leaf Plot
Step 1: The stem is the leading digit(s). The leaf
is the rightmost digit. (The choice of the stem
depends upon the class width desired.)
Step 2: Write the stems in a vertical column in
increasing order. Draw a vertical line to the
right of the stems.
Step 3: Write each leaf corresponding to the
stems to the right of the vertical line. The
leaves must be written in ascending order.
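A rough Python sketch of this construction, assuming non-negative whole-number data with the tens digit(s) as the stem and the ones digit as the leaf. The function name stem_and_leaf and the scores are made up, and stems with no leaves are simply omitted here.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Build the text lines of a stem-and-leaf plot for whole numbers."""
    plot = defaultdict(list)
    for v in sorted(values):              # sorting keeps leaves in ascending order
        stem, leaf = divmod(v, 10)        # Step 1: leading digit(s) and last digit
        plot[stem].append(leaf)
    # Steps 2 and 3: stems in increasing order, a vertical bar, then the leaves.
    return [f"{stem} | {' '.join(str(leaf) for leaf in plot[stem])}"
            for stem in sorted(plot)]

scores = [62, 75, 71, 88, 93, 79, 85, 71, 90]
lines = stem_and_leaf(scores)
print("\n".join(lines))
# 6 | 2
# 7 | 1 1 5 9
# 8 | 5 8
# 9 | 0 3
```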
114
Advantage of Stem-and-Leaf Diagrams over
Histograms
115
Advantage of Stem-and-Leaf Diagrams over
Histograms
Once a frequency distribution or histogram of
continuous data is created, the raw data is lost.
116
Advantage of Stem-and-Leaf Diagrams over
Histograms
Once a frequency distribution or histogram of
continuous data is created, the raw data is
lost. However, the raw data can be retrieved
from the stem-and-leaf plot.
117
Distribution Shapes
118
Distribution Shapes
119
Distribution Shapes
120
Distribution Shapes
121
Chapter Two: Organizing and Summarizing Data
  • 2.3
  • Organizing Quantitative Data II

122
A cumulative frequency table displays the
aggregate frequency of the category.
123
A cumulative frequency table displays the
aggregate frequency of the category. In other
words, it displays the total number of
observations less than or equal to the category.
124
A cumulative frequency table displays the
aggregate frequency of the category. In other
words, it displays the total number of
observations less than or equal to the category.
A cumulative relative frequency table displays
the aggregate proportion (or percent) of
observations less than or equal to the category.
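To illustrate, the running totals can be computed from an ordinary frequency distribution. The class labels and frequencies below are invented for the example.

```python
from itertools import accumulate

# Hypothetical frequency distribution with four classes.
classes = ["0-9", "10-19", "20-29", "30-39"]
frequencies = [4, 7, 6, 3]
n = sum(frequencies)

# Cumulative frequency: observations less than or equal to each class.
cumulative = list(accumulate(frequencies))

# Cumulative relative frequency: the same totals as proportions of n.
cumulative_relative = [c / n for c in cumulative]

for row in zip(classes, frequencies, cumulative, cumulative_relative):
    print(row)
```

The last cumulative frequency equals n, and the last cumulative relative frequency equals 1.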
125
Definitions
126
Definitions The lower class limit of a class is
the smallest value within the class.
127
Definitions The lower class limit of a class is
the smallest value within the class. The upper
class limit of a class is the largest value
within the class.
128
Definitions The lower class limit of a class is
the smallest value within the class. The upper
class limit of a class is the largest value
within the class. The class midpoint is found by
adding a class's lower class limit and upper
class limit and dividing the result by 2. That
is,
129
Definitions The lower class limit of a class is
the smallest value within the class. The upper
class limit of a class is the largest value
within the class. The class midpoint is found by
adding a class's lower class limit and upper
class limit and dividing the result by 2. That
is, class midpoint = (lower class limit + upper
class limit) / 2.
130
Frequency Polygon
131
Frequency Polygon Step 1 Mark each class
midpoint on a horizontal axis.
132
Frequency Polygon Step 1 Mark each class
midpoint on a horizontal axis. Step 2 Plot a
point above each class midpoint at a height equal
to the frequency of the class.
133
Frequency Polygon Step 1 Mark each class
midpoint on a horizontal axis. Step 2 Plot a
point above each class midpoint at a height equal
to the frequency of the class. Step 3 After the
points for each class are plotted, draw straight
lines between consecutive points.
134
Relative Frequency Polygon
135
Relative Frequency Polygon Step 1 Mark each
class midpoint on a horizontal axis.
136
Relative Frequency Polygon Step 1 Mark each
class midpoint on a horizontal axis. Step 2
Plot a point above each class midpoint at a
height equal to the relative frequency of the
class.
137
Relative Frequency Polygon Step 1 Mark each
class midpoint on a horizontal axis. Step 2
Plot a point above each class midpoint at a
height equal to the relative frequency of the
class. Step 3 Connect the dots.
138
Frequency ogive
139
Frequency ogive a graph that represents the
cumulative frequency or cumulative relative
frequency for the class.
140
Frequency ogive a graph that represents the
cumulative frequency or cumulative relative
frequency for the class. Step 1 Plot the upper
class limits on a horizontal axis.
141
Frequency ogive a graph that represents the
cumulative frequency or cumulative relative
frequency for the class. Step 1 Plot the upper
class limits on a horizontal axis. Step 2 Plot
the cumulative frequency above each upper class
limit.
142
Frequency ogive a graph that represents the
cumulative frequency or cumulative relative
frequency for the class. Step 1 Plot the upper
class limits on a horizontal axis. Step 2 Plot
the cumulative frequency above each upper class
limit. Step 3 Connect the dots.
143
Time Series Plots
144
Time Series Plots If the value of a variable is
measured at different points in time, the data is
referred to as time series data.
145
Time Series Plots If the value of a variable is
measured at different points in time, the data is
referred to as time series data.
A time series plot is obtained by plotting the
time in which a variable is measured on the
horizontal axis and the corresponding value of
the variable on the vertical axis. Lines are
then drawn connecting the points.
146
Chapter Three: Numerically Summarizing Data
  • 3.1
  • Measures of Central Tendency

147
Some Definitions
148
Some Definitions
  • A parameter is a descriptive measure of a
    population.

149
Some Definitions
  • A parameter is a descriptive measure of a
    population.
  • A statistic is a descriptive measure of a sample.

150
Some Definitions
  • A parameter is a descriptive measure of a
    population.
  • A statistic is a descriptive measure of a sample.
  • A statistic which is used to estimate a
    population parameter is called an estimator.

151
Some Definitions
  • A parameter is a descriptive measure of a
    population.
  • A statistic is a descriptive measure of a sample.
  • A statistic which is used to estimate a
    population parameter is called an estimator.
  • A statistic is an unbiased estimator of a
    parameter if it does not consistently over- or
    underestimate the parameter.

152
Measures of Centrality
153
Measures of Centrality A measure of centrality
is a measure of the center of the data.
154
Measures of Centrality A measure of centrality
is a measure of the center of the
data. Center can be defined in different ways.
155
  • Measures of Centrality
  • A measure of centrality is a measure of the
    center of the data.
  • Center can be defined in different ways.
  • (1) Arithmetic mean.

156
  • Measures of Centrality
  • A measure of centrality is a measure of the
    center of the data.
  • Center can be defined in different ways.
  • (1) Arithmetic mean.
  • (2) Median.

157
  • Measures of Centrality
  • A measure of centrality is a measure of the
    center of the data.
  • Center can be defined in different ways.
  • (1) Arithmetic mean.
  • (2) Median.
  • (3) Mode.

158
Arithmetic Mean
159
Arithmetic Mean The arithmetic mean of a
variable is computed by
160
  • Arithmetic Mean
  • The arithmetic mean of a variable is computed by
  • Sum of all the values of the variable in the data
    set.

161
  • Arithmetic Mean
  • The arithmetic mean of a variable is computed by
  • Sum of all the values of the variable in the data
    set.
  • Divide the sum of all the values by the number
    of values.
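The two steps amount to a sum followed by a division, as this short Python illustration shows. The data values are invented, and statistics.mean from the standard library is included only as a cross-check.

```python
from statistics import mean

values = [4, 8, 15, 16, 23, 42]      # hypothetical data set

total = sum(values)                  # step 1: sum of all the values (108)
by_hand = total / len(values)        # step 2: divide by the number of values

print(by_hand)                       # 18.0
print(mean(values))                  # same result from the standard library
```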

162
The population arithmetic mean, μ, is computed
using all the individuals in a population.
163
The population arithmetic mean, μ, is computed
using all the individuals in a population. The
population mean is a parameter.
164
The population arithmetic mean, μ, is computed
using all the individuals in a population. The
population mean is a parameter. We usually do not
know what its value is.
165
The population arithmetic mean, μ, is computed
using all the individuals in a population. The
population mean is a parameter. We usually do not
know what its value is.
The population mean is denoted by μ.
166
(No Transcript)
167
The sample arithmetic mean, x̄, is computed using
sample data.
168
The sample arithmetic mean, x̄, is computed using
sample data. The sample mean is denoted by x̄
(read "x-bar").
169
(No Transcript)
170
Median
171
Median The median M is computed by
172
  • Median
  • The median M is computed by
  • Arrange the data in order from smallest to
    largest.

173
  • Median
  • The median M is computed by
  • Arrange the data in order from smallest to
    largest.
  • Choose the value in the exact middle.

174
  • Median
  • The median M is computed by
  • Arrange the data in order from smallest to
    largest.
  • Choose the value in the exact middle.
  • Half the data is below the median

175
  • Median
  • The median M is computed by
  • Arrange the data in order from smallest to
    largest.
  • Choose the value in the exact middle.
  • Half the data is below the median
  • Half the data is above the median

176
Precise Steps for Calculating the Median
177
  • Precise Steps for Calculating the Median
  • Arrange the data in ascending order.

178
  • Precise Steps for Calculating the Median
  • Arrange the data in ascending order.
  • Determine the number of observations, n.

179
  • Precise Steps for Calculating the Median
  • Arrange the data in ascending order.
  • Determine the number of observations, n.
  • If n is an odd number, the median M is the value
    in the middle of the data: the value in position
    (n + 1) / 2.

180
  • Precise Steps for Calculating the Median
  • Arrange the data in ascending order.
  • Determine the number of observations, n.
  • If n is an odd number, the median M is the value
    in the middle of the data: the value in position
    (n + 1) / 2.
  • If n is an even number, the median M is the
    average of the two observations in the middle.

181
  • Precise Steps for Calculating the Median
  • Arrange the data in ascending order.
  • Determine the number of observations, n.
  • If n is an odd number, the median M is the value
    in the middle of the data: the value in position
    (n + 1) / 2.
  • If n is an even number, the median M is the
    average of the two observations in the middle.
  • I.e., the average of the value in the n / 2
    position and the value in the (n / 2) + 1
    position.
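The steps above translate directly into a few lines of Python. The function name median and the test values are illustrative; positions on the slides are counted from 1, so the code subtracts 1 to get list indices.

```python
def median(data):
    """Median M following the odd/even rule from the slides."""
    ordered = sorted(data)                # arrange the data in ascending order
    n = len(ordered)                      # number of observations
    if n % 2 == 1:
        # Odd n: the value in position (n + 1) / 2.
        return ordered[(n + 1) // 2 - 1]
    # Even n: average of the values in positions n / 2 and (n / 2) + 1.
    return (ordered[n // 2 - 1] + ordered[n // 2]) / 2

print(median([7, 1, 5]))        # 5
print(median([4, 1, 3, 2]))     # 2.5
```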

182
Mode
183
Mode The mode of a variable is the most frequent
observation of the variable that occurs in the
data set.
184
Mode The mode of a variable is the most frequent
observation of the variable that occurs in the
data set. If there is no observation that occurs
with the most frequency, we say the data has no
mode.
185
Mode The mode of a variable is the most frequent
observation of the variable that occurs in the
data set. If there is no observation that occurs
with the most frequency, we say the data has no
mode. Used most often with categorical data.
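A possible Python sketch for finding the mode. It rests on two assumptions that the slides do not spell out: when every distinct value occurs equally often the data are treated as having no mode, and when several values tie for the highest frequency all of them are reported. The function name modes and the data are hypothetical.

```python
from collections import Counter

def modes(data):
    """Return the most frequent observation(s); an empty list means no mode."""
    counts = Counter(data)
    top = max(counts.values())
    # Assumption: if all distinct values are equally frequent, there is no mode.
    if len(counts) > 1 and all(c == top for c in counts.values()):
        return []
    return [value for value, c in counts.items() if c == top]

print(modes(["red", "blue", "red", "green"]))   # ['red']
print(modes([1, 2, 3]))                         # []
```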
186
Comparison of Mean and Median
187
Comparison of Mean and Median The arithmetic
mean is sensitive to extreme (very large or
small) values in the data.
188
Comparison of Mean and Median The arithmetic
mean is sensitive to extreme (very large or
small) values in the data. The median is
resistant to extreme values.
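A quick numerical illustration of this difference, using invented salary figures (in thousands) and the standard library's statistics module:

```python
from statistics import mean, median

salaries = [30, 32, 35, 38, 40]        # hypothetical salaries, in thousands
with_outlier = salaries + [400]        # add one extreme value

print(mean(salaries), median(salaries))            # 35 35
print(mean(with_outlier), median(with_outlier))    # mean jumps, median is 36.5
```

One extreme value pulls the mean from 35 to almost 96, while the median only moves from 35 to 36.5.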
189
Use Median When
190
  • Use Median When
  • Data have unusually large or small values
    relative to the entire set of data.

191
  • Use Median When
  • Data have unusually large or small values
    relative to the entire set of data.
  • When the distribution of the data is skewed.

192
  • Use Median When
  • Data have unusually large or small values
    relative to the entire set of data.
  • When the distribution of the data is skewed.
  • The median gives a more accurate picture of the
    center of the data in these situations.

193
(No Transcript)
194
(No Transcript)
195
(No Transcript)
196
(No Transcript)
197
Chapter Three: Numerically Summarizing Data
  • 3.2
  • Measures of Dispersion

198
Measures of Dispersion
199
  • Measures of Dispersion
  • Range

200
  • Measures of Dispersion
  • Range
  • Variance

201
  • Measures of Dispersion
  • Range
  • Variance
  • Standard Deviation

202
Range
203
Range The range, R, of a variable is the
difference between the largest data value and the
smallest data value.
204
Range The range, R, of a variable is the
difference between the largest data value and the
smallest data value.
R = Largest Data Value - Smallest Data Value
205
Population Variance
206
Population Variance The population variance of
a variable is the sum of squared deviations about
the population mean divided by the number of
observations in the population, N.
207
Population Variance The population variance of
a variable is the sum of squared deviations about
the population mean divided by the number of
observations in the population, N. In other
words, it is the average squared deviation about
the mean.
208
Population Variance The N population values are
$x_1, x_2, \ldots, x_N$.
Step 1: Determine the population mean $\mu$.
Step 2: Determine the differences
$x_1 - \mu, x_2 - \mu, \ldots, x_N - \mu$.
Step 3: Square the differences
$(x_1 - \mu)^2, (x_2 - \mu)^2, \ldots, (x_N - \mu)^2$.
Step 4: Take the average
$((x_1 - \mu)^2 + (x_2 - \mu)^2 + \cdots + (x_N - \mu)^2) / N$.
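The four steps above can be sketched in Python. This is only an illustration: the function name `population_variance` and the data in `scores` are made up for the example and do not come from the slides.

```python
def population_variance(values):
    """Average squared deviation about the population mean (divide by N)."""
    n = len(values)
    if n == 0:
        raise ValueError("need at least one value")
    mu = sum(values) / n                        # Step 1: population mean
    differences = [x - mu for x in values]      # Step 2: differences from the mean
    squared = [d ** 2 for d in differences]     # Step 3: squared differences
    return sum(squared) / n                     # Step 4: take the average


scores = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical population, for illustration only
print(population_variance(scores))
```

Each line of the function corresponds to one step, so the intermediate lists can be printed to check a hand calculation.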
217
The population variance is symbolically
represented by $\sigma^2$ (lowercase Greek sigma,
squared).
218
Sample Variance The sample variance $s^2$ is
computed by determining the sum of squared
deviations about the sample mean and then
dividing this result by n - 1.
219
Sample Variance The n sample values are
$x_1, x_2, \ldots, x_n$.
Step 1: Determine the sample mean $\bar{x}$.
Step 2: Determine the differences
$x_1 - \bar{x}, x_2 - \bar{x}, \ldots, x_n - \bar{x}$.
Step 3: Square the differences
$(x_1 - \bar{x})^2, (x_2 - \bar{x})^2, \ldots, (x_n - \bar{x})^2$.
Step 4: Sum and divide by n - 1:
$((x_1 - \bar{x})^2 + (x_2 - \bar{x})^2 + \cdots + (x_n - \bar{x})^2) / (n - 1)$.
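As a rough sketch, the sample-variance steps look like this in Python. The name `sample_variance` and the data are assumptions chosen for illustration; the only change from a population calculation is the divisor n - 1.

```python
def sample_variance(values):
    """Sum of squared deviations about the sample mean, divided by n - 1."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values to divide by n - 1")
    x_bar = sum(values) / n                              # Step 1: sample mean
    differences = [x - x_bar for x in values]            # Step 2: differences
    squared = [d ** 2 for d in differences]              # Step 3: squared differences
    return sum(squared) / (n - 1)                        # Step 4: sum, divide by n - 1


sample = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical sample, for illustration only
print(sample_variance(sample))
```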
225
Note: Whenever a statistic consistently
overestimates or underestimates a parameter, it
is called biased. To obtain an unbiased estimate
of the population variance, we divide the sum of
the squared deviations about the mean by n - 1.
226
Population Standard Deviation The population
standard deviation is denoted by $\sigma$.
It is obtained by taking the square root of the
population variance, so that $\sigma = \sqrt{\sigma^2}$.
229
Sample Standard Deviation The sample standard
deviation is denoted by s.
It is obtained by taking the square root of the
sample variance.
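A minimal sketch of both standard deviations, assuming the variance definitions given earlier (divide by N for a population, by n - 1 for a sample). The function names and the data are illustrative only.

```python
import math


def population_sd(values):
    """Square root of the population variance (divisor N)."""
    n = len(values)
    mu = sum(values) / n
    variance = sum((x - mu) ** 2 for x in values) / n
    return math.sqrt(variance)


def sample_sd(values):
    """Square root of the sample variance (divisor n - 1)."""
    n = len(values)
    x_bar = sum(values) / n
    variance = sum((x - x_bar) ** 2 for x in values) / (n - 1)
    return math.sqrt(variance)


data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data, for illustration only
print(population_sd(data), sample_sd(data))
```

For the same data the sample value is slightly larger, because the same sum of squares is divided by a smaller number.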
232
The Empirical Rule If the population is
approximately bell-shaped, then we have the
following rules of thumb:
  • 68% of the data lies within 1 s.d. of the
    mean, $(\mu - \sigma, \mu + \sigma)$
  • 95% of the data lies within 2 s.d. of the
    mean, $(\mu - 2\sigma, \mu + 2\sigma)$
  • 99.7% of the data lies within 3 s.d. of the
    mean, $(\mu - 3\sigma, \mu + 3\sigma)$
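To make the three intervals concrete, here is a small sketch that computes them for a given mean and standard deviation. The function name and the example values (mean 100, s.d. 15) are assumptions for illustration, and the percentages are only rules of thumb for roughly bell-shaped data.

```python
def empirical_rule_intervals(mean, sd):
    """Return the 1, 2, and 3 s.d. intervals used by the Empirical Rule."""
    return {
        "68%": (mean - sd, mean + sd),
        "95%": (mean - 2 * sd, mean + 2 * sd),
        "99.7%": (mean - 3 * sd, mean + 3 * sd),
    }


# Hypothetical bell-shaped population with mean 100 and s.d. 15.
for share, (low, high) in empirical_rule_intervals(100, 15).items():
    print(f"about {share} of the data lies between {low} and {high}")
```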
242
Chapter 3 Numerically Summarizing Data
  • 3.3
  • Measures of Central Tendency and Dispersion from
    Grouped Data

243
Chapter 3 Numerically Summarizing Data
  • 3.4
  • Measures of Location

244
  • There are several ways of measuring the location
    of a data point.
  • Idea: We want to locate a data point relative
    to the other data.
  • z-scores
  • percentiles (median, quartiles)

248
The z-score represents the number of standard
deviations that a data value is from the mean.
It is obtained by subtracting the mean from the
data value and dividing this result by the
standard deviation. The z-score is unitless with
a mean of 0 and a standard deviation of 1.
251
Population z-score: $z = \frac{x - \mu}{\sigma}$
Sample z-score: $z = \frac{x - \bar{x}}{s}$
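The z-score calculation described above (subtract the mean, then divide by the standard deviation) can be sketched as follows. The function name and the numbers are illustrative; the same function serves the population and the sample versions, depending on which mean and standard deviation are passed in.

```python
def z_score(value, mean, sd):
    """Number of standard deviations that `value` lies from `mean`."""
    if sd <= 0:
        raise ValueError("standard deviation must be positive")
    return (value - mean) / sd


# Hypothetical example: a value of 130 when the mean is 100 and the s.d. is 15.
print(z_score(130, 100, 15))   # two standard deviations above the mean
print(z_score(85, 100, 15))    # one standard deviation below the mean
```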
255
Percentiles The kth percentile, denoted $P_k$, of
a dataset divides the lower k% of the data from
the upper (100 - k)% of the data. The median
divides the lower 50% of the data from the upper
50%.
258
Computing the kth Percentile, $P_k$
Step 1: Arrange the n data points in ascending
order.
Step 2: Let $i = \frac{k}{100} \cdot n$.
Step 3: (a) If i is not an integer, round up to
the next highest integer. $P_k$ is the ith value of
the data. (b) If i is an integer, then $P_k$ is the
mean of the ith and (i + 1)st data values.
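A sketch of this procedure in Python follows. It assumes the index in Step 2 is i = (k/100)·n, which is the choice that makes the 50th percentile agree with the usual median under the Step 3 rule; the function name and data are made up, and k is taken to be a whole number strictly between 0 and 100.

```python
import math


def percentile(data, k):
    """kth percentile using the round-up / average-of-neighbours rule."""
    if not 0 < k < 100:
        raise ValueError("k must be strictly between 0 and 100")
    values = sorted(data)                      # Step 1: ascending order
    n = len(values)
    if n == 0:
        raise ValueError("need at least one data point")
    if (k * n) % 100 == 0:                     # Step 3(b): i = k*n/100 is an integer
        i = k * n // 100
        # Mean of the ith and (i + 1)st values (1-based positions).
        return (values[i - 1] + values[i]) / 2
    i = math.ceil(k * n / 100)                 # Step 3(a): round up
    return values[i - 1]                       # ith value (1-based position)


data = [8, 3, 1, 6, 2, 7, 4, 5]  # hypothetical data, for illustration only
print(percentile(data, 25), percentile(data, 50), percentile(data, 75))
```

Other textbooks and software use slightly different index formulas, so results from a library routine may differ from this rule on small data sets.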
261
The most common percentiles are quartiles.
Quartiles divide data sets into fourths, or four
equal parts.
  • Q1: The 1st quartile divides the bottom 25% of
    the data from the top 75% (25th percentile).
  • Q2: The 2nd quartile divides the bottom 50% of
    the data from the top 50% (50th percentile, or
    median).
  • Q3: The 3rd quartile divides the bottom 75% of
    the data from the top 25% (75th percentile).
266
Checking for Outliers Using Quartiles
Step 1: Determine the first and third quartiles
of the data.
Step 2: Compute the interquartile range. The
interquartile range, or IQR, is the difference
between the third and first quartiles. That is,
IQR = Q3 - Q1
Step 3: Compute the fences that serve as cut-off
points for outliers.
Lower Fence = Q1 - 1.5(IQR)   Upper Fence = Q3 +
1.5(IQR)
Step 4: If a data value is less than the lower
fence or greater than the upper fence,
then it is considered an outlier.
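The fence calculation can be sketched as below. To keep the example independent of any particular quartile method, the quartiles are passed in as arguments; the function names, the data, and the quartile values Q1 = 3 and Q3 = 7 are assumptions for illustration.

```python
def outlier_fences(q1, q3):
    """Return (lower fence, upper fence) from the first and third quartiles."""
    iqr = q3 - q1                                  # Step 2: interquartile range
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr          # Step 3: the two fences


def find_outliers(data, q1, q3):
    """Step 4: values below the lower fence or above the upper fence."""
    lower, upper = outlier_fences(q1, q3)
    return [x for x in data if x < lower or x > upper]


data = [1, 2, 3, 4, 5, 6, 7, 8, 50]   # hypothetical data
print(outlier_fences(3, 7))
print(find_outliers(data, 3, 7))
```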
271
Chapter 3 Numerically Summarizing Data
  • Section 3.5
  • Five-Number Summary and Boxplots

272
The Five-Number Summary: MINIMUM, Q1, Median, Q3,
MAXIMUM
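As an illustration, the five values can be collected with a short Python sketch. The quartiles here use the percentile rule from Section 3.4 with the assumed index i = (k/100)·n; the helper names and the data are hypothetical, and other quartile conventions can give slightly different values.

```python
import math


def percentile(values_sorted, k):
    """kth percentile of already-sorted data, using the assumed index i = k*n/100."""
    n = len(values_sorted)
    if (k * n) % 100 == 0:
        i = k * n // 100
        return (values_sorted[i - 1] + values_sorted[i]) / 2
    return values_sorted[math.ceil(k * n / 100) - 1]


def five_number_summary(data):
    """Return (minimum, Q1, median, Q3, maximum)."""
    values = sorted(data)
    if not values:
        raise ValueError("need at least one data point")
    return (
        values[0],
        percentile(values, 25),
        percentile(values, 50),
        percentile(values, 75),
        values[-1],
    )


print(five_number_summary([8, 3, 1, 6, 2, 7, 4, 5]))  # hypothetical data
```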
278
A boxplot is a graphical representation of the
five-number summary.
279
Steps for Drawing a Boxplot
Step 1: Draw vertical lines at Q1, M, and Q3.
Enclose these vertical lines in a box.
Step 2: Label the lower and upper fences.
Step 3: Draw a line from Q1 to the smallest data
value that is larger than the lower fence. Draw
a line from Q3 to the largest data value that is
smaller than the upper fence.
Step 4: Any data values less than the lower
fence or greater than the upper fence are
outliers and are marked with an asterisk (*).
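Before drawing, it can help to compute the pieces of the boxplot numerically. The sketch below does that for given quartiles; the function name, the dictionary keys, and the data are assumptions for illustration. One small liberty: a value lying exactly on a fence is treated as a whisker candidate, since it is not an outlier under Step 4.

```python
def boxplot_parts(data, q1, median, q3):
    """Compute the box, fences, whisker ends, and outliers of a boxplot."""
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    # Values inside the fences decide where the whiskers end (Step 3).
    inside = [x for x in data if lower_fence <= x <= upper_fence]
    # Values outside the fences are the outliers, marked with * (Step 4).
    outliers = sorted(x for x in data if x < lower_fence or x > upper_fence)
    return {
        "box": (q1, median, q3),                    # Step 1
        "fences": (lower_fence, upper_fence),       # Step 2
        "whiskers": (min(inside), max(inside)),
        "outliers": outliers,
    }


data = [1, 2, 3, 4, 5, 6, 7, 8, 50]   # hypothetical data
print(boxplot_parts(data, q1=3, median=5, q3=7))
```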
284
Symmetric
285
Skewed Right
286
Skewed Left
287
Chapter 4 Describing the Relation Between Two
Variables
  • 4.1
  • Scatter Diagrams and Correlation

288
The response variable is the variable whose value
we want to explain, predict, or control. The
predictor variable is the variable which
explains, predicts, or controls the
response. Data for which two variables are
measured for each unit in the sample are called
bivariate data.
291
A scatter diagram shows the relationship between
two quantitative variables measured on the same
individual. Each individual in the data set is
represented by a point in the scatter diagram.
The predictor variable is plotted on the
horizontal axis. The response variable is plotted
on the vertical axis.
295
Two variables that are linearly related are said
to be positively associated if, when the values
of the predictor variable increase, the values of
the response variable also increase.
296
Two variables that are linearly related are said
to be negatively associated if, when the values
of the predictor variable increase, the values of
the response variable decrease.
297
The sample correlation coefficient is a measure
of the strength of linear relation between two
quantitative variables. We let r denote the
sample correlation coefficient.
  • r close to 1 indicates strong positive linear
    relation.
  • r close to 0 indicates little linear relation.
  • r close to -1 indicates strong negative linear
    relation.
302
Suppose we have bivariate data:
X        Y
$x_1$    $y_1$
$x_2$    $y_2$
$x_3$    $y_3$
.        .
$x_n$    $y_n$
n is the number of units sampled
$\bar{x}$ is the sample mean for X; $\bar{y}$ is the sample
mean for Y
$s_x$ is the sample s.d. for X; $s_y$ is the sample
s.d. for Y
310
Steps for Calculating r
Step 1: Calculate the sample mean $\bar{x}$ for
variable X and $\bar{y}$ for variable Y.
Step 2: Calculate the sample standard
deviation $s_x$ for variable X and $s_y$ for
variable Y.
Step 3: Calculate z-scores for all the data:
$(x_1 - \bar{x})/s_x, (x_2 - \bar{x})/s_x, \ldots, (x_n - \bar{x})/s_x$
$(y_1 - \bar{y})/s_y, (y_2 - \bar{y})/s_y, \ldots, (y_n - \bar{y})/s_y$

315
Steps for Calculating r
Step 4: Multiply the respective z-scores for X
and Y:
$\frac{x_1 - \bar{x}}{s_x} \times \frac{y_1 - \bar{y}}{s_y}, \; \frac{x_2 - \bar{x}}{s_x} \times \frac{y_2 - \bar{y}}{s_y}, \; \ldots, \; \frac{x_n - \bar{x}}{s_x} \times \frac{y_n - \bar{y}}{s_y}$
Step 5: Add these together and divide by (n - 1):
$r = \left( \frac{x_1 - \bar{x}}{s_x} \times \frac{y_1 - \bar{y}}{s_y} + \cdots + \frac{x_n - \bar{x}}{s_x} \times \frac{y_n - \bar{y}}{s_y} \right) / (n - 1)$
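Steps 1 through 5 can be followed directly in Python, as in the sketch below. The function names and the data are made up for illustration; the standard deviations use the n - 1 divisor, matching the sample definitions earlier in the deck.

```python
import math


def sample_mean(values):
    return sum(values) / len(values)


def sample_sd(values):
    """Sample standard deviation (divisor n - 1)."""
    x_bar = sample_mean(values)
    return math.sqrt(sum((x - x_bar) ** 2 for x in values) / (len(values) - 1))


def correlation(xs, ys):
    """Sample correlation coefficient r via products of z-scores."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equally long lists with at least two pairs")
    x_bar, y_bar = sample_mean(xs), sample_mean(ys)        # Step 1
    s_x, s_y = sample_sd(xs), sample_sd(ys)                # Step 2
    if s_x == 0 or s_y == 0:
        raise ValueError("r is undefined when a variable does not vary")
    z_x = [(x - x_bar) / s_x for x in xs]                  # Step 3
    z_y = [(y - y_bar) / s_y for y in ys]
    products = [a * b for a, b in zip(z_x, z_y)]           # Step 4
    return sum(products) / (n - 1)                         # Step 5


# Hypothetical bivariate data, for illustration only.
print(correlation([1, 2, 3, 4, 5], [2, 4, 5, 4, 5]))
```

Perfectly linear data should give r equal to 1 or -1, up to floating-point rounding, which makes a quick sanity check.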

326
Chapter 4 Describing the Relation Between Two
Variables
  • 4.2
  • Least-squares Regression

327
Recall that the equation for a line is given
by Y = mX + b, where m = slope of the line and
b = intercept of the line.