The Role of Statistics and the Data Analysis Process - PowerPoint PPT Presentation

Slides: 26
Provided by: xie52


1
Chapter 1
  • The Role of Statistics and the Data Analysis
    Process

2
1.1 Three Reasons to Study Statistics
  • Reason 1. Being Informed.
  • You should be able to:
  • Extract information from tables, charts, and
    graphs
  • Follow numerical arguments
  • Understand the basics of how data should be
    gathered, summarized, and analyzed.

3
1.1 Three Reasons to Study Statistics
  • Examples of Being Informed
  • An analysis of data from the University of Utah
    concluded that drivers engaged in cell phone
    conversations missed twice as many simulated
    signals as drivers who were not talking on the
    phone.
  • An article in the Journal of the American Medical
    Association concluded that surgery patients at
    hospitals with a severe shortage of nurses had a
    31% greater risk of dying while in the hospital.
  • Based on interviews with 24,000 women in 10
    different countries, the WHO found that the
    percentage of women who have been abused by a
    partner varied widely, from 15% in Japan to 71% in
    Ethiopia.

4
1.1 Three Reasons to Study Statistics
  • Reason 2. Making Informed Judgments
  • To make informed decisions, you must be able to
    take the following steps:
  • Decide whether existing information is adequate
    or whether additional information is required.
  • If necessary, collect more information in a
    reasonable and thoughtful way.
  • Summarize the available data in a useful and
    informative manner.
  • Analyze the available data.
  • Draw conclusions, make decisions, and assess the
    risk of an incorrect decision.

5
1.1 Three Reasons to Study Statistics
  • Examples of Making Informed Decisions
  • Almost all industries, as well as government and
    nonprofit organizations, use market research
    tools, such as consumer surveys, that are
    designed to provide information about who uses
    their products or services.
  • Modern science and its applied fields rely on
    statistical methods for analyzing data and
    deciding whether various conjectures are
    supported by observed data.
  • In law, class-action lawsuits can depend on a
    statistical analysis of whether one kind of
    injury or illness is more common in a particular
    group than in the general public.
  • We also use the five steps to make everyday
    decisions: Should we go out for a sport that
    involves the risk of injury? If we choose a
    particular major, what are our chances of finding
    a job when we graduate?

6
1.1 Three Reasons to Study Statistics
  • Reason 3. Evaluating Decisions That Affect Your
    Life. Other people use statistical methods to
    make decisions that affect you. An understanding
    of statistical techniques will allow you to
    question and evaluate decisions that affect your
    well-being.
  • Insurance companies use statistical techniques to
    set auto insurance rates.
  • University financial aid offices collect data on
    family incomes and savings, and use the data to
    set criteria for deciding who receives financial
    aid.
  • Medical researchers use statistical methods to
    make recommendations regarding the choice between
    surgical and nonsurgical treatment of such
    diseases as coronary heart disease and cancer.

7
1.2 The Nature and Role of Variability
  • Variability is almost universal.
  • Imagine an unrealistic situation: in a
    university, every student takes the same courses,
    spends exactly the same amount of money on
    textbooks, and has the same GPA.
  • Populations with no variability almost never
    exist.
  • We need to understand variability to be able to
    collect, analyze, and draw conclusions from data
    in a sensible way.

8
1.3 Statistics and Data Analysis
  • Statistics is the science of collecting,
    analyzing, and drawing conclusions from data.
  • The population: the entire collection of
    individuals or objects about which information is
    desired.
  • A sample: a subset of the population, selected
    for study in some prescribed manner.
  • Descriptive statistics includes methods for
    organizing and summarizing data.
  • Inferential statistics involves generalizing from
    a sample to the population from which it was
    selected, and assessing the reliability of such
    generalizations.

9
The Data Analysis Process
  • Understand the nature of the problem.
  • Decide what to measure and how to measure it.
  • Collect data with a carefully developed plan.
  • Summarize the data and start preliminary
    analysis.
  • Apply the appropriate inferential statistical
    method for formal data analysis.
  • Interpret the results.

10
1.3 Statistics and Data Analysis
  • Example: A consumer group conducts crash tests of
    new model cars. To determine the severity of
    damage to 2003 Mazda 626s resulting from a 10-mph
    crash into a concrete wall, the research group
    tests six cars of this type and assesses the
    amount of damage. Describe the population and
    sample for this problem.

Population: All 2003 Mazda 626s. Sample: The six
Mazda 626s being tested.
11
1.3 Statistics and Data Analysis
  • Example: The supervisors of a rural county are
    interested in the proportion of property owners
    who support the construction of a sewer system.
    Because it is too costly to contact all 7000
    property owners, a survey of 500 owners (selected
    at random) is undertaken. Describe the population
    and sample for this problem.

Population: All 7000 property owners in the
county. Sample: The 500 property owners being
surveyed.
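Selecting the 500 owners "at random" can be sketched with Python's standard `random` module. This is a minimal illustration that assumes simple random sampling without replacement; the owner IDs 1 to 7000 and the helper name `select_srs` are hypothetical, and the seed is fixed only so the draw is reproducible.

```python
import random

def select_srs(population, n, seed=None):
    """Draw a simple random sample of size n, without replacement."""
    rng = random.Random(seed)  # fixed seed -> reproducible draw
    return rng.sample(population, n)

# Hypothetical IDs standing in for the county's 7000 property owners
owners = list(range(1, 7001))
sample = select_srs(owners, 500, seed=1)
print(len(sample), len(set(sample)))  # 500 500: no owner is surveyed twice
```

Sampling without replacement matches the survey design: each owner can be contacted at most once, and every group of 500 owners is equally likely to be chosen.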
12
Example: A Proposed New Treatment for Alzheimer's Disease
  • Doctors at Stanford Medical Center were
    interested in determining whether a new surgical
    approach to treating Alzheimer's disease results
    in improved memory functioning. (The surgical
    procedure involves implanting a thin tube, called
    a shunt.)
  • Eleven patients had shunts implanted and were
    followed for a year, receiving quarterly tests
    of memory function.
  • Another sample of Alzheimer's patients received
    standard care and was used as a comparison
    group.
  • After analyzing the data from this study, the
    researchers concluded that the treated patients
    essentially held their own on the cognitive tests,
    while the patients in the comparison group
    steadily declined.

13
1.3 Statistics and Data Analysis
  • In the example "A Proposed New Treatment for
    Alzheimer's Disease," what are the population and
    the sample?
  • Do you think the sample is good enough to produce
    conclusive statistical evidence?
  • The limitation of the study: the result is based
    on a small sample. A larger, more sophisticated
    study is needed, and a new data analysis cycle
    begins.
  • A much larger 18-month study was planned. The
    study was to include 256 patients at 25 medical
    centers around the country.

14
1.4 Types of Data and Some Simple Graphical Displays
  • Definitions
  • A variable is a characteristic whose value may
    change from one individual or object to another
    in a population. For example, if the population is
    the set of all students in our stats class, the
    brand of calculator owned by each student is a
    variable, and so is the distance to UHD from each
    student's home.
  • A data set consisting of observations on a single
    variable (attribute) is a univariate data set.
  • A univariate data set is categorical (or
    qualitative) if the individual observations are
    categorical responses. (e.g. the brand of
    calculator)
  • A univariate data set is numerical (or
    quantitative) if each observation is a number.
    (e.g. the distance to UHD)
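The categorical/numerical distinction can be sketched as a check on the observations themselves. The function `data_type` and the calculator and distance values below are hypothetical. One caveat the code cannot see: numbers used purely as labels or codes are really categorical, so the check is only meaningful when the numbers are genuine measurements or counts.

```python
from numbers import Number

def data_type(observations):
    """Return 'numerical' if every observation is a number, else 'categorical'."""
    # bool is a subclass of int in Python, so exclude it explicitly
    if all(isinstance(x, Number) and not isinstance(x, bool) for x in observations):
        return "numerical"
    return "categorical"

# Hypothetical observations for a few students in the class
calculator_brands = ["TI", "Casio", "HP", "TI"]
miles_to_uhd = [3.2, 12.5, 7.0, 21.4]
print(data_type(calculator_brands))  # categorical
print(data_type(miles_to_uhd))       # numerical
```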

15
1.4 Types of Data and Some Simple Graphical Displays
  • Discrete and Continuous Data
  • Numerical data are discrete if the possible
    values are isolated points on the number line.
  • Numerical data are continuous if the set of
    possible values forms an entire interval on the
    number line.

16
1.4 Types of Data and Some Simple Graphical Displays
  • Example 1: Airline Safety Violations
  • The FAA monitors airlines and can take
    administrative actions for safety violations in
    these categories: Security (S), Maintenance (M),
    Flight Operations (F), Hazardous Materials (H), or
    Other (O).
  • Data for 20 administrative actions are given
    below.
  • S S M H M O S M S S
  • F S O M S M S M S M
  • Classify the attribute as categorical or
    numerical.

Answer: categorical
17
An Example of Numerical Data
  • Example 2: Revisiting Airline Safety Violations
  • The following data give the number of
    violations and the average fine per violation for
    the period 1985-1998 for 10 major airlines:

    Airline         No. of Violations   Average Fine per Violation ($)
    Alaska                        258   5038.760
    American West                 257   3112.840
    American                     1745   2693.410
    Continental                   973   5755.390
    Delta                        1280   3828.125
    Northwest                    1097   2643.573
    Southwest                     535   3925.234
    TWA                           642   2803.738
    United                       1110   2612.613
    US Airways                    891   3479.237

18
1.4 Types of Data and Some Simple Graphical Displays: Frequency Distributions
  • A frequency distribution for categorical data is a
    table that displays the possible categories along
    with the associated frequencies and/or relative
    frequencies.
  • The frequency for a particular category is the
    number of times the category appears in the data
    set.
  • The relative frequency for a particular category
    is the fraction or proportion of the observations
    resulting in the category.
  • If the table includes relative frequency, it is
    sometimes referred to as a relative frequency
    distribution.
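Applied to the 20 FAA administrative actions from the airline safety example, these definitions can be computed with `collections.Counter`. The helper name `frequency_distribution` is hypothetical; this sketch returns each category's frequency and relative frequency, most frequent first.

```python
from collections import Counter

def frequency_distribution(observations):
    """Map each category to (frequency, relative frequency), most frequent first."""
    counts = Counter(observations)
    n = len(observations)
    return {category: (freq, freq / n) for category, freq in counts.most_common()}

# The 20 FAA administrative actions from the airline safety example
actions = "S S M H M O S M S S F S O M S M S M S M".split()
for category, (freq, rel) in frequency_distribution(actions).items():
    print(f"{category}: frequency {freq:2d}, relative frequency {rel:.2f}")
```

For these data the frequencies are S 9, M 7, O 2, H 1, and F 1, which sum to 20, so the relative frequencies sum to 1.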

19
Frequency Distributions
  • Example: According to the standards set by the US
    Department of Transportation, a motorcycle helmet
    should reach the bottom of the motorcyclist's ears
    to ensure safety. Data were collected by observing
    1700 motorcyclists nationwide at selected roadway
    locations. There were 731 riders who wore no
    helmet, 153 who wore a noncompliant helmet, and
    816 who wore a compliant helmet. Determine the
    frequency distribution and relative frequency
    distribution. Use the codes:
  • N = no helmet, NH = noncompliant helmet, and
    CH = compliant helmet
  • Frequency distribution for helmet use
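Each relative frequency is the category's count divided by the 1700 riders observed. As a check, the counts sum to 731 + 153 + 816 = 1700, so the relative frequencies sum to 1.

```latex
\begin{array}{lrr}
\text{Helmet use} & \text{Frequency} & \text{Relative frequency} \\
\hline
\text{N}     & 731  & 731/1700 = 0.43 \\
\text{NH}    & 153  & 153/1700 = 0.09 \\
\text{CH}    & 816  & 816/1700 = 0.48 \\
\hline
\text{Total} & 1700 & 1.00
\end{array}
```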

20
Some Simple Graphical Displays: Bar Charts
  • When to use a bar chart: categorical data
  • How to construct:
  • Draw a horizontal line, and write the category
    names or labels below the line at regularly
    spaced intervals.
  • Draw a vertical line, and label the scale using
    either frequency or relative frequency.
  • Place a rectangular bar above each category
    label. The height is determined by the category's
    frequency or relative frequency, and all bars
    should have the same width. With the same width,
    both the height and the area of each bar are
    proportional to the frequency (or relative
    frequency).
  • Construct a bar chart for the helmet data.
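As a rough, text-only stand-in for the chart (the slides themselves use Excel), the hypothetical helper `text_bar_chart` below draws one equal-thickness bar per category, with length proportional to frequency. Because text runs left to right, this sketch lays the bars out horizontally, so the category labels sit to the left of the bars rather than below them.

```python
def text_bar_chart(frequencies, width=40):
    """Draw one horizontal text bar per category; every bar has the same
    thickness (one line), and its length is proportional to the frequency."""
    largest = max(frequencies.values())
    label_width = max(len(label) for label in frequencies)
    lines = []
    for label, freq in frequencies.items():
        bar = "#" * round(freq / largest * width)  # longest bar gets `width` characters
        lines.append(f"{label:<{label_width}} | {bar} {freq}")
    return "\n".join(lines)

# Helmet counts from the example: N = no helmet, NH = noncompliant, CH = compliant
helmet_use = {"N": 731, "NH": 153, "CH": 816}
print(text_bar_chart(helmet_use))
```

Passing relative frequencies (each count divided by 1700) instead of counts produces the same bar lengths, because every bar is rescaled by the same factor.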

21
Create a Bar Chart Using Excel
22
Excel generates the bar chart. You can use the
Chart Layout options to add a title, add
explanatory labels, and make other modifications.
23
1.4 Types of Data and Some Simple Graphical Displays: Dotplots for Numerical Data
  • When to use a dotplot: small numerical data sets
  • How to construct a dotplot:
  • Draw a horizontal line and mark it with an
    appropriate measurement scale.
  • Locate each value in the data set along the
    measurement scale, and represent it by a dot. If
    there are two or more observations with the same
    value, stack the dots vertically.
  • What to look for:
  • A representative or typical value in the data
    set.
  • The extent to which the data values spread out.
  • The nature of the distribution of values along
    the number line.
  • The presence of unusual values in the data set.
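The construction steps above can be sketched in text form. The helper `text_dotplot` is hypothetical: it maps each value onto a scale of `width` character columns and stacks an `o` for every value that lands in the same column. When the data range is small relative to `width`, only equal values stack, as the slide describes; when the range is large, values close enough to share a column also stack. The graduation-rate data are not reproduced in this transcript, so the sketch uses the airline violation counts from Example 2.

```python
from collections import Counter

def text_dotplot(values, width=50):
    """Text dotplot: each value becomes an 'o' above a horizontal scale;
    values landing in the same character column are stacked vertically."""
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1  # avoid dividing by zero when all values are equal
    # Map each value to a column 0..width-1 on a linear scale from lo to hi
    columns = Counter(round((v - lo) / span * (width - 1)) for v in values)
    rows = []
    for level in range(max(columns.values()), 0, -1):  # draw the top of each stack first
        row = "".join("o" if columns[c] >= level else " " for c in range(width))
        rows.append(row.rstrip())
    rows.append("-" * width)  # the measurement scale
    gap = max(width - len(str(lo)) - len(str(hi)), 1)
    rows.append(f"{lo}{' ' * gap}{hi}")  # label both ends of the scale
    return "\n".join(rows)

# Number of violations for the 10 airlines in Example 2
violations = [258, 257, 1745, 973, 1280, 1097, 535, 642, 1110, 891]
print(text_dotplot(violations))
```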

24
1.4 Types of Data and Some Simple Graphical Displays: Dotplots for Numerical Data
  • Example: The Chronicle of Higher Education
    reported graduation rates for NCAA Division I
    schools. The rates reported are the percentages
    of full-time freshmen in fall 1993 who had earned
    a bachelor's degree by August 1999. Data from 20
    schools in California and 19 schools in Texas
    are as follows:
  • California
  • Texas
  • Construct (1) a dotplot of the graduation rates
    for all 39 schools combined, and (2) separate
    dotplots of the graduation rates for California
    and Texas.

25
Dotplot of graduation rates (California and Texas together)
Separate dotplots of graduation rates for Texas and California