Title: The Role of Statistics and the Data Analysis Process
Chapter 1
- The Role of Statistics and the Data Analysis Process
1.1 Three Reasons to Study Statistics
- Reason 1. Being Informed.
- You should be able to:
- Extract information from tables, charts, and graphs.
- Follow numerical arguments.
- Understand the basics of how data should be gathered, summarized, and analyzed.
- Examples of Being Informed
- An analysis of data from the University of Utah concluded that drivers engaged in cell phone conversations missed twice as many simulated signals as drivers who were not talking on the phone.
- An article in the Journal of the American Medical Association concluded that surgery patients at hospitals with a severe shortage of nurses had a 31% greater risk of dying while in the hospital.
- Based on interviews with 24,000 women in 10 different countries, the WHO found that the percentage of women who have been abused by a partner varied widely, from 15% in Japan to 71% in Ethiopia.
- Reason 2. Making Informed Judgments
- To make informed decisions, you must be able to take the following steps:
- Decide whether existing information is adequate or whether additional information is required.
- If necessary, collect more information in a reasonable and thoughtful way.
- Summarize the available data in a useful and informative manner.
- Analyze the available data.
- Draw conclusions, make decisions, and assess the risk of an incorrect decision.
- Examples of Making Informed Decisions
- Almost all industries, as well as government and nonprofit organizations, use market research tools, such as consumer surveys, that are designed to provide information about who uses their products or services.
- Modern science and its applied fields rely on statistical methods for analyzing data and deciding whether various conjectures are supported by observed data.
- In law, a class-action lawsuit can depend on a statistical analysis of whether one kind of injury or illness is more common in a particular group than in the general public.
- We also use the five steps to make everyday decisions. Should we go out for a sport that involves the risk of injury? If we choose a particular major, what are our chances of finding a job when we graduate?
- Reason 3. Evaluating Decisions That Affect Your Life
- Other people use statistical methods to make decisions that affect you. An understanding of statistical techniques will allow you to question and evaluate decisions that affect your well-being.
- Insurance companies use statistical techniques to set auto insurance rates.
- University financial aid offices collect data on family incomes and savings, and use the data to set criteria for deciding who receives financial aid.
- Medical researchers use statistical methods to make recommendations regarding the choice between surgical and nonsurgical treatment of diseases such as coronary heart disease and cancer.
1.2 The Nature and Role of Variability
- Variability is almost universal.
- Imagine an unrealistic situation: in a university, every student takes the same courses, spends exactly the same amount of money on textbooks, and has the same GPA.
- Populations with no variability almost never exist.
- We need to understand variability to be able to collect, analyze, and draw conclusions from data in a sensible way.
1.3 Statistics and Data Analysis
- Statistics is the science of collecting, analyzing, and drawing conclusions from data.
- Population: the entire collection of individuals or objects about which information is desired.
- Sample: a subset of the population, selected for study in some prescribed manner.
- Descriptive statistics includes methods for organizing and summarizing data.
- Inferential statistics involves generalizing from a sample to the population from which it was selected, and assessing the reliability of such generalizations.
The Data Analysis Process
- Understand the nature of the problem.
- Decide what to measure and how to measure it.
- Collect data with a carefully developed plan.
- Summarize the data and carry out a preliminary analysis.
- Apply the appropriate inferential statistical method for formal data analysis.
- Interpret the results.
- Example: A consumer group conducts crash tests of new model cars. To determine the severity of damage to 2003 Mazda 626s resulting from a 10-mph crash into a concrete wall, the research group tests six cars of this type and assesses the amount of damage. Describe the population and sample for this problem.
Population: All 2003 Mazda 626s.
Sample: The six 2003 Mazda 626s being tested.
- Example: The supervisors of a rural county are interested in the proportion of property owners who support the construction of a sewer system. Because it is too costly to contact all 7000 property owners, a survey of 500 owners (selected at random) is undertaken. Describe the population and sample for this problem.
Population: All 7000 property owners in the county.
Sample: The 500 property owners being surveyed.
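Selecting the 500 owners "at random" can be sketched in Python as a simple random sample drawn without replacement. This is a minimal sketch that assumes the owners can be listed and numbered; the ID numbers 1 to 7000 and the function name `simple_random_sample` are hypothetical stand-ins, not the county's actual procedure.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw n distinct members of population; every subset of size n is equally likely."""
    rng = random.Random(seed)         # a seed makes the draw reproducible
    return rng.sample(population, n)  # sampling without replacement

# Hypothetical ID numbers standing in for the county's 7000 property owners.
owners = list(range(1, 7001))
surveyed = simple_random_sample(owners, 500, seed=2003)

print(len(surveyed))       # 500 owners selected
print(len(set(surveyed)))  # 500: no owner is selected twice
```

Here the list `owners` plays the role of the population and `surveyed` is the sample; an actual survey would also need a complete, accurate list of owners to draw from.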
Example: A Proposed New Treatment for Alzheimer's Disease
- Doctors at Stanford Medical Center were interested in determining whether a new surgical approach to treating Alzheimer's disease results in improved memory functioning. (The surgical procedure involves implanting a thin tube, called a shunt.)
- Eleven patients had shunts implanted and were followed for a year, receiving quarterly tests of memory function.
- Another sample of Alzheimer's patients received standard care and was used as a comparison group.
- After analyzing the data from this study, the researchers concluded that the treated patients essentially held their own on the cognitive tests, while the patients in the comparison group steadily declined.
- In the example "A Proposed New Treatment for Alzheimer's Disease," what are the population and the sample?
- Do you think the sample is good enough to produce conclusive statistical evidence?
- The limitation of the study: the results come from a small sample. A larger, more sophisticated study is needed, and a new data analysis cycle begins.
- A much larger 18-month study was planned. The study was to include 256 patients at 25 medical centers around the country.
1.4 Types of Data and Some Simple Graphical Displays
- Definitions
- A variable is a characteristic whose value may change from one individual or object to another in a population. For example, if the population is the set of all students in our stats class, the brand of calculator owned by each student is a variable, and the distance to UHD from each student's home is also a variable.
- A data set consisting of observations on a single variable (attribute) is a univariate data set.
- A univariate data set is categorical (or qualitative) if the individual observations are categorical responses (e.g., the brand of calculator).
- A univariate data set is numerical (or quantitative) if each observation is a number (e.g., the distance to UHD).
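The categorical/numerical distinction can be mimicked with a rough type check. This sketch, with a hypothetical helper `data_type` and made-up observations, only looks at whether every observation is a Python number. It cannot recognize numbers that are really category labels (a zip code, for instance), so a person still has to decide what the observations mean.

```python
from numbers import Number

def data_type(observations):
    """Classify a univariate data set as 'numerical' or 'categorical'.

    Rough heuristic: the data set is treated as numerical only if every
    observation is a number (bool is excluded even though it subclasses int).
    """
    if all(isinstance(x, Number) and not isinstance(x, bool) for x in observations):
        return "numerical"
    return "categorical"

# Hypothetical observations for two variables from the stats-class example.
calculator_brands = ["TI", "Casio", "TI", "HP"]
distances_to_uhd = [3.2, 12.5, 0.8, 7.0]  # miles, made-up values

print(data_type(calculator_brands))  # categorical
print(data_type(distances_to_uhd))   # numerical
```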
- Discrete and Continuous Data
- Numerical data are discrete if the possible values are isolated points on the number line.
- Numerical data are continuous if the set of possible values forms an entire interval on the number line.
- Example 1: Airline Safety Violations
- The FAA monitors airlines and can take administrative actions for safety violations, which are classified as Security (S), Maintenance (M), Flight Operations (F), Hazardous Materials (H), or Other (O).
- Data for 20 administrative actions are given below:
S S M H M O S M S S
F S O M S M S M S M
- Classify the attribute as categorical or numerical.
Answer: categorical
An Example of Numerical Data
- Example 2: Revisiting Airline Safety Violations
- The following data present the number of violations and the average fine per violation for the period 1985-1998 for 10 major airlines:

Airline        No. of Violations   Average Fine per Violation ($)
Alaska                       258   5038.760
American West                257   3112.840
American                    1745   2693.410
Continental                  973   5755.390
Delta                       1280   3828.125
Northwest                   1097   2643.573
Southwest                    535   3925.234
TWA                          642   2803.738
United                      1110   2612.613
US Airways                   891   3479.237
Frequency Distributions
- A frequency distribution for categorical data is a table that displays the possible categories along with the associated frequencies and/or relative frequencies.
- The frequency for a particular category is the number of times the category appears in the data set.
- The relative frequency for a particular category is the fraction or proportion of the observations resulting in the category, that is, relative frequency = frequency / number of observations in the data set.
- If the table includes relative frequencies, it is sometimes referred to as a relative frequency distribution.
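Counting frequencies and dividing by the number of observations is a direct translation of these definitions. The sketch below applies it to the 20 administrative actions from the airline safety example, using Python's `collections.Counter`; the helper name `frequency_distribution` is hypothetical.

```python
from collections import Counter

def frequency_distribution(observations):
    """Return {category: (frequency, relative frequency)} in first-seen order."""
    counts = Counter(observations)   # frequency of each category
    n = len(observations)            # total number of observations
    return {category: (freq, freq / n) for category, freq in counts.items()}

# The 20 administrative actions from the airline safety example.
actions = "S S M H M O S M S S F S O M S M S M S M".split()

for category, (freq, rel) in frequency_distribution(actions).items():
    print(f"{category:>2}  {freq:>2}  {rel:.2f}")
# One line per category: S 9 0.45, M 7 0.35, H 1 0.05, O 2 0.10, F 1 0.05
```

The relative frequencies always add up to 1 (up to floating-point rounding), which is a quick check on a hand-built table.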
Frequency Distributions
- Example: To ensure safety, a motorcycle helmet should reach the bottom of the motorcyclist's ears, according to the standards set by the US Department of Transportation. Data were collected by observing 1700 motorcyclists nationwide at selected roadway locations. There were 731 riders who wore no helmet, 153 who wore a noncompliant helmet, and 816 who wore a compliant helmet. Determine the frequency distribution and relative frequency distribution. Use the codes:
- N: no helmet
- NH: noncompliant helmet
- CH: compliant helmet
- Frequency distribution for helmet use:

Helmet Use Category   Frequency   Relative Frequency
N                           731   731/1700 = 0.43
NH                          153   153/1700 = 0.09
CH                          816   816/1700 = 0.48
Total                      1700   1.00
Some Simple Graphical Displays: Bar Charts
- When to use a bar chart: categorical data.
- How to construct a bar chart:
- Draw a horizontal line, and write the category names or labels below the line at regularly spaced intervals.
- Draw a vertical line, and label the scale using either frequency or relative frequency.
- Place a rectangular bar above each category label. The height is determined by the category's frequency or relative frequency, and all bars should have the same width. Because the widths are equal, both the height and the area of each bar are proportional to the category's frequency and relative frequency.
- Construct a bar chart for the helmet data.
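A rough text version of the bar chart for the helmet data can be sketched in Python. To stay in plain text, the chart is drawn sideways: categories run down the page and bars extend to the right, with every bar one character row thick (the "same width" rule). Bar length is proportional to relative frequency; the scale of 40 characters for a relative frequency of 1.00 and the helper name `text_bar_chart` are arbitrary choices for illustration, not a replacement for a proper chart in Excel.

```python
def text_bar_chart(freqs, width=40):
    """Return the lines of a sideways text bar chart.

    Every bar is one row thick (equal widths); its length is proportional
    to the category's relative frequency, with `width` characters = 1.00.
    """
    total = sum(freqs.values())
    label_width = max(len(label) for label in freqs)
    lines = []
    for label, freq in freqs.items():
        rel = freq / total
        bar = "#" * round(rel * width)  # bar length tracks relative frequency
        lines.append(f"{label:<{label_width}} | {bar} {rel:.2f}")
    return lines

helmet = {"N": 731, "NH": 153, "CH": 816}
print("\n".join(text_bar_chart(helmet)))
```

Rounding to whole characters distorts small categories slightly, which is one reason graphical tools are preferred for real reports.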
Create a Bar Chart Using Excel
Excel generates the bar chart. You can use the Chart Layout options to add a title, add explanations, and make other modifications.
Dotplots for Numerical Data
- When to use a dotplot: small numerical data sets.
- How to construct a dotplot:
- Draw a horizontal line and mark it with an appropriate measurement scale.
- Locate each value in the data set along the measurement scale, and represent it by a dot. If there are two or more observations with the same value, stack the dots vertically.
- What to look for:
- A representative or typical value in the data set.
- The extent to which the data values spread out.
- The nature of the distribution of values along the number line.
- The presence of unusual values in the data set.
- Example: The Chronicle of Higher Education reported graduation rates for NCAA Division I schools. The rates reported are the percentage of full-time freshmen in fall 1993 who had earned a bachelor's degree by August 1999. Data from 20 schools in California and 19 schools in Texas are as follows:
- California
- Texas
- Construct (1) a dotplot of the graduation rates for both states together, and (2) separate dotplots of the graduation rates for California and Texas.
[Figure: Dotplot of graduation rates (California and Texas together)]
[Figure: Separate dotplots of graduation rates for Texas and California]