Title: Chapter 2 Describing, Exploring, and Comparing Data Sections 2.1-2.4
1Chapter 2--Describing, Exploring, and Comparing
Data (Sections 2.1-2.4)
2Overview
- Descriptive Statistics: summarize or describe the
important characteristics of a known set of
population data
- Inferential Statistics: use sample data to make
inferences (or generalizations) about a
population
3Important Characteristics of Data
- Center: A representative or average value that
indicates where the middle of the data set is
located
- Variation: A measure of the amount that the
values vary among themselves
- Distribution: The nature or shape of the
distribution of data (such as bell-shaped,
uniform, or skewed)
- Outliers: Sample values that lie very far away
from the vast majority of other sample values
- Time: Changing characteristics of the data over
time
4Section 2.2 Frequency Distributions
5Frequency Distribution
- Lists data values (either individually or by
groups of intervals), along with their
corresponding frequencies or counts.
6Lower Class Limits
- The smallest numbers that can actually belong to
different classes
7Upper Class Limits
- The largest numbers that can actually belong to
different classes
8Class Boundaries
- Numbers used to separate classes, without the
gaps created by class limits
9Class Width
- The difference between two consecutive lower
class limits or two consecutive lower class
boundaries
10Reasons To Construct Frequency Distributions
- Large data sets can be summarized
- Can gain some insight into the nature of the data
- Have a basis for constructing graphs
- Great overview
- Ability to see any strange or unusual data
- Quickly identifies any outliers
11Constructing a Frequency Distribution
- Decide on the number of classes (between 5 and 20
is a good start)
- Calculate the class width: (highest value - lowest
value) / number of classes, rounded up
- Starting point: choose a lower limit of the first
class
- Using the lower limit of the first class and the
class width, proceed to list the lower class
limits
- List the lower class limits in a vertical column
and proceed to enter the upper class limits
- Go through the data set putting a tally in the
appropriate class for each data value
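The construction steps can be sketched in Python. This is only a minimal sketch, assuming integer data; the function name `frequency_distribution` and the sample values are hypothetical and not from the slides. The width is taken as the next whole number above range / classes so that the highest value still falls in the last class.

```python
def frequency_distribution(data, num_classes):
    """Return (class_width, rows) where each row is (lower, upper, count).

    Assumes integer data; this is an illustrative sketch only.
    """
    lowest, highest = min(data), max(data)
    # Round the width up to the next whole number so the last class
    # always reaches the highest value.
    width = (highest - lowest) // num_classes + 1
    rows = []
    for i in range(num_classes):
        lower = lowest + i * width      # consecutive lower class limits
        upper = lower + width - 1       # upper limit closes the class
        count = sum(1 for x in data if lower <= x <= upper)  # the tally
        rows.append((lower, upper, count))
    return width, rows


# Hypothetical sample values, not taken from the slides.
sample = [12, 15, 21, 22, 25, 28, 30, 31, 35, 38, 40, 41, 44, 47, 49]
width, rows = frequency_distribution(sample, 5)
for lower, upper, count in rows:
    print(f"{lower}-{upper}: {count}")
```

Each upper limit is the lower limit plus the width minus 1, so a class covers exactly `width` whole-number values.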
12Example Determine Class Width
13Example Set Up Frequency Distribution Table
- Class width: 34
- Take the lowest number and add 33 to get the
upper limit of the first class
- Each class should span 34 units
- Count the lower limit of the class as the first
unit
- Build the remaining classes from there
14Relative Frequency
15Cumulative Frequency
- Write out new classes using the lower limit of
the next class (for example, "less than" that
limit)
- Add each class frequency to the total of the
previous classes
- Should end up with the total number of items at
the end
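Relative and cumulative frequencies can both be derived from the class frequencies. The sketch below assumes the usual definition of relative frequency (class frequency divided by the sum of all frequencies); the frequency values are hypothetical.

```python
from itertools import accumulate

# Hypothetical class frequencies, listed in class order.
frequencies = [2, 3, 4, 3, 3]
total = sum(frequencies)

# Relative frequency: each class count as a share of all values.
relative = [f / total for f in frequencies]

# Cumulative frequency: running total of the class counts.
cumulative = list(accumulate(frequencies))

for f, r, c in zip(frequencies, relative, cumulative):
    print(f"{f:>3} {r:>6.1%} {c:>3}")
```

The last cumulative frequency equals the total number of items, which is a quick check on the table.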
16Frequency Tables
17Section 2.3 Visualizing Data
18Visualizing Data
Depict the nature or shape of the data
distribution
19Histogram
A bar graph in which the horizontal scale
represents the classes of data values and the
vertical scale represents the frequencies.
20Relative Frequency Histogram
Has the same shape and horizontal scale as a
histogram, but the vertical scale is marked with
relative frequencies.
21Histogram and Relative Frequency Histogram
Figure 2-1
Figure 2-2
22Frequency Polygon
Uses line segments connected to points directly
above class midpoint values
23Ogive
A line graph that depicts cumulative frequencies
24Dot Plot
Consists of a graph in which each data value is
plotted as a point along a scale of values
25Stem-and-Leaf Plot
Represents data by separating each value into two
parts: the stem (such as the leftmost digit) and
the leaf (such as the rightmost digit)
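A stem-and-leaf plot can be sketched in a few lines. This sketch assumes non-negative whole numbers, with the last digit as the leaf and the remaining digits as the stem; the function name `stem_and_leaf` and the values are hypothetical.

```python
from collections import defaultdict


def stem_and_leaf(values):
    """Return {stem: [leaves]} for non-negative integers."""
    plot = defaultdict(list)
    for v in sorted(values):
        plot[v // 10].append(v % 10)  # stem = leading digits, leaf = last digit
    return dict(plot)


# Hypothetical values, not taken from the slides.
scores = [12, 15, 21, 22, 25, 38, 30]
plot = stem_and_leaf(scores)
for stem, leaves in plot.items():
    print(stem, "|", " ".join(str(leaf) for leaf in leaves))
```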
26 Pareto Chart
A bar graph for qualitative data, with the bars
arranged in order according to frequencies
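The ordering behind a Pareto chart can be sketched without any plotting library. This sketch assumes the bars are arranged from highest to lowest frequency; the category names are hypothetical.

```python
from collections import Counter

# Hypothetical qualitative data, not taken from the slides.
complaints = ["late"] * 5 + ["wrong item"] * 1 + ["damaged"] * 3

# most_common() returns (category, frequency) pairs, highest frequency first.
ordered = Counter(complaints).most_common()

for category, frequency in ordered:
    print(f"{category:<12} {'#' * frequency}")
```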
27Pie Chart
A graph depicting qualitative data as slices of a
pie
28 Scatter Diagram
A plot of paired (x,y) data with a horizontal
x-axis and a vertical y-axis
29Time-Series Graph
A graph of data that have been collected at
different points in time
30Losses of Napoleon's Army
31Section 2.4 Measures of Center
32Definitions
- Measure of Center: The value at the center or
middle of a data set
- Arithmetic Mean (Mean): The measure of center
obtained by adding the values and dividing the
total by the number of values.
- Median: The middle value when the original data
values are arranged in order of increasing (or
decreasing) magnitude.
- The median is not affected by an extreme value.
33Notation (Mean)
- Σ denotes the addition of a set of values
- x is the variable usually used to represent the
individual data values
- n represents the number of values in a sample
- N represents the number of values in a population
34Notation (Mean)
- µ is pronounced "mu" and denotes the mean of all
values in a population
35Finding the Median
- Odd Number of Values: the median is the number
located in the exact middle of the sorted list.
- Even Number of Values: the median is found by
computing the mean of the two middle numbers.
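The mean and both median cases can be sketched as follows. The function names and the sample values are hypothetical; the last example illustrates why an extreme value moves the mean but not the median.

```python
def mean(values):
    """Add the values and divide the total by the number of values."""
    return sum(values) / len(values)


def median(values):
    """Middle value of the sorted list (mean of the two middle values if even)."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                      # odd: exact middle
    return (ordered[mid - 1] + ordered[mid]) / 2  # even: mean of middle pair


# Hypothetical values with one extreme value.
with_extreme = [1, 2, 3, 1000]
print(mean(with_extreme), median(with_extreme))
```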
36Odd Number of Values
37Even Number of Values
38Mode (M)
- Value that occurs most frequently
- Mode may be
- Mode (one value)
- Bimodal (two values)
- Multimodal (more than two values)
- No Mode (no value repeats)
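The mode cases can be told apart by counting how many values share the highest frequency. This is a minimal sketch assuming a non-empty list; the function name `modes` is hypothetical.

```python
from collections import Counter


def modes(values):
    """Return the sorted list of modes; an empty list means no mode."""
    counts = Counter(values)
    highest = max(counts.values())
    if highest == 1:
        return []  # no value repeats, so there is no mode
    return sorted(v for v, c in counts.items() if c == highest)


print(modes([1, 2, 2, 3]))     # one mode
print(modes([1, 1, 2, 2, 3]))  # bimodal
print(modes([1, 2, 3]))        # no mode
```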
39Midrange
- The value midway between the highest and lowest
values in the original data set
- To compute: (highest value + lowest value) / 2
40Round-off Rule for Measures of Center
- Carry one more decimal place than is present in
the original set of values
- Example: 10.2, 12.3, 9.5, 5.2, 6.5
- Mean: 8.74
- Rules of Rounding
- Between 0 and 4, round down (9.83 becomes 9.8)
- Between 5 and 9, round up (9.86 becomes 9.9)
- Both are rounded to the tenths place
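The rounding rules can be sketched with Python's `decimal` module. The built-in `round()` is avoided here because it rounds ties to the nearest even digit, which differs from the rule on this slide. The helper name `round_half_up` is hypothetical.

```python
from decimal import Decimal, ROUND_HALF_UP


def round_half_up(value, places):
    """Round so that a next digit of 0-4 goes down and 5-9 goes up."""
    quantum = Decimal(10) ** -places  # places=1 gives 0.1, places=2 gives 0.01
    return Decimal(str(value)).quantize(quantum, rounding=ROUND_HALF_UP)


# The data values have one decimal place, so the mean carries two.
values = [Decimal(v) for v in ("10.2", "12.3", "9.5", "5.2", "6.5")]
rounded_mean = round_half_up(sum(values) / len(values), 2)

print(round_half_up("9.83", 1), round_half_up("9.86", 1), rounded_mean)
```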
41Mean From Frequency Distribution
- Assume that in each class, all sample values are
equal to the class midpoint
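Under that assumption, the mean is the frequency-weighted average of the class midpoints, that is, the sum of (frequency x midpoint) divided by the sum of the frequencies. A minimal sketch with a hypothetical frequency table:

```python
# Hypothetical frequency table: (lower limit, upper limit, frequency).
table = [(12, 19, 2), (20, 27, 3), (28, 35, 4), (36, 43, 3), (44, 51, 3)]

# Class midpoint: halfway between the lower and upper class limits.
midpoints = [(lower + upper) / 2 for lower, upper, _ in table]
frequencies = [f for _, _, f in table]

# Treat every value in a class as if it equals the class midpoint.
weighted_sum = sum(f * x for f, x in zip(frequencies, midpoints))
grouped_mean = weighted_sum / sum(frequencies)

print(round(grouped_mean, 1))
```

The result is an approximation of the mean of the original values, since the individual values inside each class are not used.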
42Best Measure of Center
43Definitions
- Symmetric: Data is symmetric if the left half of
its histogram is roughly a mirror image of its
right half.
- Skewed: Data is skewed if it is not symmetric
and if it extends more to one side than the
other.
44Skewness
45Recap
In this section we have discussed