Chapter Overview - PowerPoint PPT Presentation

About This Presentation
Title:

Chapter Overview

Description:

Shape: unimodal, bimodal, symmetric, skewed (+ or ... When distributions are unimodal and symmetric, a Normal model may be useful ...

Slides: 39
Provided by: debb113

Transcript and Presenter's Notes

Title: Chapter Overview


1
Chapter Overview
  • Statistical Process
  • Know your question; plan your study or
    experiment
  • Identify the population and the parameter of interest
  • Plan your study (using historical data, you can't
    claim causation)
  • Or plan your experiment (control, random
    selection, random assignment to control,
    replication)
  • EAC
  • Evidence (collect data)
  • Analysis (graph & calculate statistics)
  • Conclusion (only infer to the population if SRS)
  • Analysis (A of EAC): present and describe sets
    of data
  • Graph
  • Qualitative data (attribute, or categorical,
    data; used to show relationships/proportions)
  • Circle graph
  • Bar graph
  • Pareto diagram (bar graph with cumulative line
    graph)
  • Quantitative data (uses graph to show dispersion)
  • Dot plot
  • Stem-and-leaf
  • Frequency chart
  • Conclusion (C of EAC): interpret findings so
    that we know what the data are telling us about
    the sampled population

2
Summary 2.1-2.2 Data Presentation
  • Categorical (Qualitative)
  • Circle graph
  • Bar graph
  • Bars spaced (x axis is categories not measure)
  • Pareto
  • Prioritized columns descending
  • Plots cumulative
  • Exceptions
  • Some quantitative data summarized into categories
  • Quantitative
  • Dot plot
  • Vertical or horizontal
  • Stem and leaf
  • State leaf unit
  • Large group subdivide into 2 or 5 subgroups
  • Histogram
  • Bar graph of frequency distr.
  • Bars touch (x axis is a scale)
  • Frequency distribution
  • Ungrouped
  • Grouped (f vs. midpoint)
  • Relative (% vs. class boundaries)
  • Cumulative
  • Cumulative relative (ogive; vs. upper limit of
    class)

Interpret Patterns
  • Overlapping distributions? Separate into back-to-backs (p. 47)
  • Normal: symmetrical, mirror image
  • Uniform: box-like
  • Bimodal: two peaks
  • Skewed left: negatively skewed, mean to left of median
  • Skewed right: positively skewed, mean to right of median
  • J-shaped: no tail on side with highest frequency
3
2.3 & 2.4 Measures Overview (no rounding until the
final answer; round to one more place than the data)
  • 2.3 Measures of Center
  • Mean
  • Of sample
  • Of population
  • Depth of median: (n + 1)/2
  • Median MD
  • Mode
  • Midrange MR = (hi + low)/2
  • Measures for grouped data
  • Mean
  • Weighted mean
  • Variance & standard deviation
  • 2.4 Measures of Dispersion
  • Range = hi - low
  • Deviation from mean
  • Abs. value of deviation
  • Mean abs. deviation
  • Sample variance
  • Sample standard deviation s
  • Square root of variance
  • (Used to estimate the pop. std. dev.; dividing by n underestimates, so divide by n - 1
    to reduce bias)
  • Population standard deviation
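
The measures listed above can be computed directly from their definitions. The following is a minimal sketch in Python; the data set is hypothetical and chosen only so the arithmetic is easy to check by hand.

```python
from math import sqrt
from statistics import mean, median, multimode

# Hypothetical sample, invented purely for illustration.
data = [3, 5, 5, 6, 8, 9, 13]

n = len(data)
x_bar = mean(data)                       # sample mean
depth = (n + 1) / 2                      # depth (position) of the median
med = median(data)                       # median
modes = multimode(data)                  # most frequent value(s)
midrange = (max(data) + min(data)) / 2   # (hi + low) / 2
data_range = max(data) - min(data)       # hi - low

deviations = [x - x_bar for x in data]   # deviation from the mean
mean_abs_dev = sum(abs(d) for d in deviations) / n
sample_var = sum(d * d for d in deviations) / (n - 1)   # divide by n - 1
sample_sd = sqrt(sample_var)                            # square root of variance
pop_sd = sqrt(sum(d * d for d in deviations) / n)       # treats data as a whole population

print(x_bar, med, modes, midrange, data_range)
print(round(mean_abs_dev, 2), sample_var, round(sample_sd, 2), round(pop_sd, 2))
```

Dividing by n - 1 makes the sample standard deviation slightly larger than the divide-by-n version, which is the bias correction the slide refers to.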

SECS
  • Shape: unimodal, bimodal, symmetric, skewed (+ or -)
  • Exceptions: 1.5 IQR rule, meaning values below Q1 - 1.5(IQR) or above Q3 + 1.5(IQR)
  • Center (measures of central tendency): mean, median, mode, midrange
  • Spread (measures of dispersion): range, interquartile range, variance, standard deviation
  • RESISTANT measures: mean or median? Variance or standard deviation?
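
A rough sketch of the 1.5 IQR check for exceptions (outliers) follows. The data are hypothetical, and the quartile convention (medians of the lower and upper halves, leaving the overall median out when n is odd) is an assumption; textbooks and calculators differ slightly on this.

```python
from statistics import median

def quartiles(values):
    """Return (Q1, Q3) as the medians of the lower and upper halves.

    Assumption: when n is odd, the overall median is left out of both halves.
    """
    ordered = sorted(values)
    half = len(ordered) // 2
    lower = ordered[:half]
    upper = ordered[-half:]
    return median(lower), median(upper)

def outlier_fences(values):
    """Return (low fence, high fence) using the 1.5 IQR rule."""
    q1, q3 = quartiles(values)
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

# Hypothetical data with one suspiciously large value.
data = [2, 4, 5, 6, 7, 8, 9, 10, 30]
low_fence, high_fence = outlier_fences(data)
outliers = [x for x in data if x < low_fence or x > high_fence]
print(low_fence, high_fence, outliers)
```

Because the fences are built from quartiles rather than from the mean and standard deviation, the one extreme value has little influence on where the fences fall.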
4
Remember Chapter 1
  • Only experiments can conclude causation
  • 3 Requirements for Exp.
  • Control
  • May have 2 groups being compared (comparative
    exp.)
  • Direct manipulation of independent variable
  • measure the effect on the dependent or response
    variable in a quantified way
  • Randomization
  • Statistical equivalence (subjects must be as
    similar as possible) RANDOM SELECTION
  • RANDOM ASSIGNMENT to control or treatment group
  • Control of extraneous variables
  • Replication
  • Repeat with large numbers so chance variation can be
    reduced and the effect of treatment more easily
    identified
  • Process
  • Purpose of research
  • Plan: define variables, population, sample, methods
  • Collect data
  • Analyze data

5
Additional Tips From Prep Workbooks
  • Outliers: the 1.5 IQR rule flags values below Q1 - 1.5(Q3 - Q1) or
    above Q3 + 1.5(Q3 - Q1)
  • Dot plots are ideal for discrete data
  • "Can you find ... from a histogram, cumulative relative frequency, etc."
    means: is it possible, even if it takes multiple
    steps?
  • "Which is best? Histogram, stem-and-leaf ..." means which
    one the answer can be pulled from the quickest
  • cut-point class mark for grouped histogram
  • A data set could be 5, 5, 5, 5, where the box plot would
    be just a mark at 5
  • When asked how a change affects measures, model
    with your own numbers if needed
  • Can use 99.7% = ±3 std. dev. (which is 6 sigma) to verify a
    reasonable std. dev.
  • Standardized normal curve (meaning the data's been
    translated to z-scores) always looks exactly the same
    (height & spread): center is always z = 0 and stand.
    dev. = 1 (always)
  • Normal curve is denser at center (greater
    area under the curve for 1 increment)
  • When using actual data (not z's), a small standard deviation means a shorter
    range, so a taller curve
  • A large standard deviation means a wider range, so a flatter curve
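
The point about standardizing can be checked numerically. In this sketch (hypothetical data; the helper name `z_scores` is made up for illustration), a narrow data set and a wide one with the same shape turn into the same z-scores, with center 0 and standard deviation 1.

```python
from statistics import mean, pstdev

def z_scores(values):
    """Translate raw data to z-scores: (x - mean) / standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)   # population form, so the z-scores have std. dev. exactly 1
    return [(x - mu) / sigma for x in values]

# Two hypothetical data sets: same center, very different spread.
narrow = [48, 49, 50, 51, 52]
wide = [30, 40, 50, 60, 70]

z_narrow = z_scores(narrow)
z_wide = z_scores(wide)

print([round(z, 3) for z in z_narrow])
print([round(z, 3) for z in z_wide])
```

On the raw scale the narrow set would give the taller curve and the wide set the flatter one; after standardizing, the difference disappears.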

6
Warm Up after chapter 2 Name_______________
  • When deciding if a variable is quantitative or
    qualitative, the most important criterion to
    consider is _____________________________.
  • When describing a distribution remember to
    discuss ______________________.
  • Describing the ____________ can be very
    meaningful for a histogram but may not be for a
    bar graph of _____________ data.
  • A study conducted on youth obesity collected data
    via a pedometer attached to each child.
    Researchers found some of the more obese children
    had the highest activity levels. How do you
    think this could be explained?
  • In algebra we use x to represent ____________
    which has ____________ value(s). In stats we
    use X to represent ______________which has
    _____________value(s).
  • In algebra x x __ ____________ in stats
    X X ____ _____________
  • Game of Greed. Roll a pair of dice. All stand.
    You can take the sum or remain standing and add
    the sum of the next roll. However, when the sum
    of 2 is rolled the round ends, and you score 0.
    Play 5 rounds and collect data.

7
(No Transcript)
8
Chapter 3 Descriptive Analysis & Presentation
of Bivariate Data
  • To be able to present bivariate data in tabular
    and graphic form
  • To become familiar with the ideas of descriptive
    presentation
  • To gain an understanding of the distinction
    between the basic purposes of correlation
    analysis and regression analysis

9
Sept. 8, Tues.: p. 140 3.11, 12, 15, 17, 18, 22
Wed.: Read p. 150, work p. 152 3.35, 36, 37 and 39
Thurs.: Section 3.3, work p. 165 56-62
Fri.: Quiz C3; hwk chapter exercises 3.70, 3.76 and 3.86
10
Vocabulary
  • Bivariate data
  • Sample statistics
  • 3 data type combinations for bivariate data
  • Contingency table
  • Cross-tabulation
  • Scatter-plot: 1st step to see if a linear
    relationship exists between 2 quantitative
    variables; the predicted value is on the y-axis
  • Independent variable
  • Dependent variable
  • Ordered pair
  • Input variable
  • Output variable
  • Least squares regression
  • Line of best fit
  • Linear correlation
  • Linear regression
  • Residuals plot
  • Correlation analysis
  • Positive correlation
  • Negative correlation

11
3.1 Bivariate Data
  • Bivariate Data Consists of the values of two
    different response variables that are obtained
    from the same population of interest.
  • Three combinations of variable types
  • Both variables are qualitative (_________ or __________)
    arranged on a
  • a. ____________ or ____________ table (These
    statistics may also be displayed in a
    side-by-side bar graph)
  • b. Example A survey was conducted
    to investigate the relationship between
    preferences for television, radio, or newspaper
    for national news, and gender. The results are
    given in the table below
  • This table may be extended to
    display the _____totals (or _______). The total
    of the marginal totals is the grand total
  • Note contingency tables often show percentages
    (relative frequencies). These percentages are
    based on the entire sample or on the subsample
    (row or column) classifications.
  • One variable qualitative (_______) and other
    quantitative (________)
  • Quantitative values viewed as separate samples
  • each set identified by levels of the qualitative
    variable
  • Statistics for comparison measures of central
    tendency, measures of variation, 5-number summary
  • Each is described using techniques from C2
    results displayed side by side for easy
    comparison (dot-plots or box plots)
  • Both variables are quantitative (both numerical)
  • a. Expressed as _____________ (__,__)
  • b. x _________variable, __________
    variable y _________variable, ___________
    variable
  • c. Present data pictorially on a
    _________diagram

12
Illustration
  • These sample statistics (numerical values
    describing sample results) can be shown in a
    (side-by-side) bar graph

[Side-by-side bar graph of the survey results; categories: TV, Radio, NP]
Percentages Based on Row (Column) Totals
  • The entries in a contingency table may also be
    expressed as percentages of the row (column)
    totals by dividing each row (column) entry by
    that row's (column's) total and multiplying by
    100. The entries in the contingency table below
    are expressed as percentages of the column totals.

Note: These statistics may also be displayed in a
side-by-side bar graph
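
The slide's table did not survive in this transcript, so the sketch below uses hypothetical counts to show the marginal totals and the row and column percentages described above.

```python
# Hypothetical counts, invented for illustration (rows: gender, columns: news source).
table = {
    "Male":   {"TV": 30, "Radio": 10, "NP": 20},
    "Female": {"TV": 20, "Radio": 15, "NP": 5},
}
columns = ["TV", "Radio", "NP"]

# Marginal totals and the grand total.
row_totals = {row: sum(cells.values()) for row, cells in table.items()}
col_totals = {c: sum(table[row][c] for row in table) for c in columns}
grand_total = sum(row_totals.values())

# Each entry as a percentage of its column total.
col_percent = {
    row: {c: 100 * table[row][c] / col_totals[c] for c in columns}
    for row in table
}
# Each entry as a percentage of its row total.
row_percent = {
    row: {c: 100 * table[row][c] / row_totals[row] for c in columns}
    for row in table
}

print(row_totals, col_totals, grand_total)
print(col_percent["Male"])
print(row_percent["Male"])
```

Column percentages add to 100 down each column, and row percentages add to 100 across each row, so it matters which base is stated when reading such a table.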
13
Example
  • Example: A random sample of households from
    three different parts of the country was obtained
    and their electric bill for June was recorded.
    The data are given in the table below.
  • The part of the country is a qualitative variable
    with three levels of response. The electric bill
    is a quantitative variable. The electric bills
    may be compared with numerical and graphical
    techniques.

[Figure: side-by-side dot plots of June electric bills for the Northeast, Midwest, and West; horizontal axis from 24.0 to 64.0]
  • The electric bills in the Northeast tend to be
    more spread out than those in the Midwest. The
    bills in the West tend to be higher than both
    those in the Northeast and Midwest.

14
Two Quantitative Variables
Scatter Diagram: The first tool used in
determining whether a _______ __________ exists
between 2 _______ _____________. Decide which
variable is to be _________. This variable will
be the __________ variable. It is a plot of all the
____________ of _________ data on a coordinate
axis system. The input variable __ is plotted on
the __________ axis, and the output variable __
is plotted on the _________ axis.
Note: Use scales so that the range of the
y-values is equal to or slightly less than the
range of the x-values. This creates a window
that is approximately square.
  • Example: In a study involving children's fear
    related to being hospitalized, the age and the
    score each child made on the Child Medical Fear
    Scale (CMFS) are given in the table below.

Construct a scatter diagram for this data
(age = input variable, CMFS = output variable).
How to construct a scatter diagram:
1. Find the range of the x values & the range of the y values.
2. Choose your increments for the x-axis &
   y-axis (they're not always the same).
3. Plot the points: ordered pairs (x, y).
4. Label both axes & give a title to the diagram.
Examples:
1. the number of hours studied for an exam versus the grade on the exam
2. the number of years a runner has been training versus his/her time for running a mile
3. the weight of a car versus its gas mileage
4. the age and price of a car
Determine whether the dependent variable in each
example will increase or decrease as the
independent variable increases.
What examples can you come up with on your own?
15
Take a look.
  • AGAINST ALL ODDS: Inside Statistics has three
    videos presenting the concepts of correlation and
    regression analysis for bivariate data. Program
    9 "Correlation" reinforces the concepts behind
    correlation with several excellent examples.
    Program 8 "Describing Relationships" and the
    first 15 minutes of Program 7 "Models for
    Growth" give additional insight into regression
    analysis plus excellent examples.
  • The Student Suite CD contains three video clips:
    Bivariate Data, Linear Correlation and
    Linear Regression.
  • Paper helicopter link:
    http://courses.ncssm.edu/math/Stat_inst01/intro.htm

16
3.2 Linear Correlation
  • Linear Correlation: measures the strength of a
    linear relationship between 2 variables
  • As x increases, no definite shift in y: no
    correlation
  • As x increases, a definite shift in y: correlation
  • __________ correlation: x increases, y increases
  • ___________ correlation: x increases, y decreases
  • If the ordered pairs follow a straight-line path:
    __________ correlation
  • If the points are patterned close to the line:
    _________
  • If the points are spread out, yet still look
    linear: _____________
  • Perfect positive correlation: all the points lie
    along a line with positive slope
  • Perfect negative correlation: all the points lie
    along a line with negative slope
  • If the points lie along a _________ or _________
    line: no correlation
  • If the points exhibit some other nonlinear
    pattern: no linear relationship, no correlation
  • Need some way to measure correlation

17
Measures of Correlation
Coefficient of Linear Correlation: r measures
the strength of the linear relationship between
two variables
  • Notes
  • r = +1: perfect positive correlation
  • r = -1: perfect negative correlation
Alternate Formula for r
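
The formulas on this slide did not survive in the transcript. For reference, the standard definition of the linear correlation coefficient and its usual sum-of-squares computing form are given here; that these match what the slide showed is an assumption.

```latex
r = \frac{\sum (x - \bar{x})(y - \bar{y})}{(n - 1)\, s_x s_y}
\qquad\text{or, equivalently,}\qquad
r = \frac{SS(xy)}{\sqrt{SS(x)\, SS(y)}}

\text{where}\quad
SS(x) = \sum x^2 - \frac{\left(\sum x\right)^2}{n},\quad
SS(y) = \sum y^2 - \frac{\left(\sum y\right)^2}{n},\quad
SS(xy) = \sum xy - \frac{\sum x \sum y}{n}
```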
18
Example
  • Example: The table below presents the weight (in
    thousands of pounds) x and the gasoline mileage
    (miles per gallon) y for ten different
    automobiles. Find the linear correlation
    coefficient.
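
The table of ten automobiles is missing from this transcript. As a sketch of the calculation only, the code below computes r with the sum-of-squares form on five hypothetical (weight, mileage) pairs; the numbers are invented and are not the slide's data.

```python
from math import sqrt

def linear_correlation(xs, ys):
    """Linear correlation coefficient r = SS(xy) / sqrt(SS(x) * SS(y))."""
    n = len(xs)
    ss_x = sum(x * x for x in xs) - sum(xs) ** 2 / n
    ss_y = sum(y * y for y in ys) - sum(ys) ** 2 / n
    ss_xy = sum(x * y for x, y in zip(xs, ys)) - sum(xs) * sum(ys) / n
    return ss_xy / sqrt(ss_x * ss_y)

# Hypothetical data, invented for illustration.
weight = [2.0, 2.5, 3.0, 3.5, 4.0]   # thousands of pounds
mpg = [34, 30, 27, 22, 20]           # miles per gallon

r = linear_correlation(weight, mpg)
print(round(r, 2))
```

Here SS(x) = 2.5, SS(y) = 131.2 and SS(xy) = -18, so r is close to -1: heavier cars in this made-up sample get lower mileage.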

19
Please Note
  • r is usually rounded to the nearest hundredth
  • r close to 0: little or no linear correlation
  • As the magnitude of r increases, towards -1 or
    +1, there is an increasingly stronger linear
    correlation between the two variables
  • Method of estimating r based on the scatter
    diagram: window should be approximately square.
    Useful for checking calculations.
  • Can you make any predictions about circumstances
    based on the various outputs? If age vs. price
    has r = -0.9, then ___
  • r only measures the strength of a linear
    relationship; a cause-and-effect relationship
    cannot be concluded from it
  • Before answering any questions concerning data in
    contingency tables, add all of the rows and
    columns. Be sure the sum of the row totals = the
    sum of the column totals = the grand total. Now
    you are ready to answer all questions easily.
  • Explanatory variable = independent = x
  • Response variable = dependent = y (what you're
    predicting)

20
3.2 Linear Correlation: Understanding the
Linear Correlation Coefficient
  • Estimating r, the linear correlation coefficient
  • 1. Draw as small a rectangle as possible that
    encompasses all of the data on the scatter
    diagram. (Diagram should cover a "square window":
    same length and width)
  • 2. Measure the width.
  • 3. Let k = the number of times the width fits
    along the length, or in other words
    length/width.
  • 4. r ≈ ±(1 - 1/k)
  • 5. Use +, if the rectangle is slanted positively
    or upward.
  • Use -, if the rectangle is slanted negatively
    or downward.
  • If there is a strong linear correlation between
    two variables, then one of the following
    situations may be true about the relationship
    between the two variables.
  • There is a direct cause-and-effect relationship
    between the two.
  • There is a reverse cause-and-effect relationship.
  • Their relationship may be caused by a third
    variable (called a ______ ________).
  • Their relationship may be caused by the
    interactions of several other variables.
  • The apparent relationship may be strictly a
    coincidence.
  • Remember that a strong correlation does not
    necessarily imply causation.
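
The rectangle method for estimating r (steps 1 to 5 above) reduces to one line of arithmetic. A minimal sketch, with a made-up function name and hypothetical measurements:

```python
def estimate_r(length, width, slanted_upward=True):
    """Rough estimate of r from the smallest rectangle around the data.

    k = length / width, and r is approximately +/-(1 - 1/k).
    """
    k = length / width
    magnitude = 1 - 1 / k
    return magnitude if slanted_upward else -magnitude

# Hypothetical measurements taken off a scatter diagram (any consistent unit).
print(estimate_r(10, 2.5))                        # long, thin rectangle slanted upward
print(estimate_r(10, 2.5, slanted_upward=False))  # same shape slanted downward
print(estimate_r(6, 6))                           # as wide as it is long
```

A rectangle four times as long as it is wide gives an estimate of 0.75 in magnitude, while a square-shaped cloud gives 0. The estimate is only a check on the calculated r, not a replacement for it.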

21
Text Problems
  • P. 140 3.11, 12, 15, 17, 18, 22
  • Read p. 150: lurking variables & causation
  • P. 152 3.35, 36, 37

22
Key Homework Problem Solutions for Small Group
Discussion & Correction
  • Problem 1
  • Problem 2
  • Problem 3
  • Problem 4
  • Write your personal reflections about the
    statistics you learned from doing the problem to
    turn in for your warm up activity.

23
3.3 Linear Regression
  • If a linear relationship exists between two
    variables, that is,
  • 1. its scatter diagram suggests a
    __________ _______________
  • 2. its calculated ____ value is not near
    _________
  • Linear regression will calculate an __________
    of a ________ based on the data. This line,
    also known as the ____________ of ___________
    _________, will fit through the data with the
    smallest possible amount of ________ between it
    and the actual data points. The regression line
    can be used for generalizing and _________ over
    the sampled ________ of x.
  • STATISTICAL FORM OF A LINEAR REGRESSION LINE:
    $\hat{y} = b_0 + b_1 x$ (format differs
    from algebra), where
  • $\hat{y}$ = _____________
  • b0 = _____________
  • b1 = _____________
  • x = ________________
  • Regression analysis finds the linear equation that
    best describes the relationship between 2 variables.
  • Least squares criterion: Find the constants b0
    and b1 such that the sum $\sum (y - \hat{y})^2$ is
    as small as possible
  • Some examples of various possible relationships

What would a scatter diagram look like to suggest
each relationship?
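
The least squares criterion can be made concrete with a short calculation. This sketch uses the standard closed-form results b1 = SS(xy)/SS(x) and b0 = ybar - b1(xbar); the data are hypothetical.

```python
def line_of_best_fit(xs, ys):
    """Return (b0, b1) for the least squares line y_hat = b0 + b1 * x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    ss_x = sum((x - x_bar) ** 2 for x in xs)
    ss_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = ss_xy / ss_x            # slope
    b0 = y_bar - b1 * x_bar      # y-intercept
    return b0, b1

def sum_squared_residuals(xs, ys, b0, b1):
    """The quantity the least squares criterion minimizes."""
    return sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))

# Hypothetical data, invented for illustration.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

b0, b1 = line_of_best_fit(xs, ys)
sse = sum_squared_residuals(xs, ys, b0, b1)
print(round(b0, 2), round(b1, 2), round(sse, 2))

# Any other line does worse on the same criterion.
print(sum_squared_residuals(xs, ys, b0 + 0.1, b1) > sse)
```

For these numbers the line is y_hat = 2.2 + 0.6x, and nudging either constant away from those values increases the sum of squared residuals.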
24
Illustration
  • Observed and predicted values of y

25
Example
  • Example: A recent article measured the job
    satisfaction of subjects with a 14-question
    survey. The data below represent the job
    satisfaction scores, y, and the salaries, x, for
    a sample of similar individuals.

1) Draw a scatter diagram for this data.
2) Find the equation of the line of best fit.
26
Line of Best Fit
27
Scatter Diagram
28
Sept. 29, Mon.: C3 work day, p. 174 practice test
Tues.: AP practice problems Quiz
Wed.: start test
Thurs.: finish C3 Test
Fri.: activity: Design of Experiments/Studies
Oct. 6, Mon.: AP practice problems
29
Please Note
  • Keep at least three extra decimal places while
    doing the calculations to ensure an accurate
    answer
  • When rounding off the calculated values of b0 and
    b1, always keep at least two significant digits
    in the final answer
  • The slope b1 represents the predicted change in y
    per unit increase in x
  • The y-intercept is the value of y where the line
    of best fit intersects the y-axis
  • One of the main purposes for obtaining a
    regression equation is for making predictions
  • For a given value of x, we can predict a value of
    $\hat{y}$.
  • The regression equation should be used to make
    predictions only about the population from which
    the sample was drawn
  • The regression equation should be used only to
    cover the sample domain on the input variable.
    You can estimate values outside the domain
    interval, but use caution and use values close to
    the domain interval.
  • Use current data. A sample taken in 1987 should
    not be used to make predictions in 1999.
  • Work p. 165 3.57, 58, 59, 60 and 61
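
The caution about staying inside the sample domain can be built into the prediction step. A small sketch, assuming a hypothetical line y_hat = 2.2 + 0.6x fitted on x values from 1 to 5; the function name and numbers are illustrative only.

```python
def predict(x, b0, b1, x_min, x_max):
    """Return (y_hat, inside_domain) for the line y_hat = b0 + b1 * x.

    inside_domain is False when x lies outside the sampled x values,
    i.e. when the prediction is an extrapolation.
    """
    y_hat = b0 + b1 * x
    inside_domain = x_min <= x <= x_max
    return y_hat, inside_domain

# Hypothetical regression line and sample domain.
b0, b1 = 2.2, 0.6
x_min, x_max = 1, 5

y_in, ok_in = predict(3, b0, b1, x_min, x_max)
y_out, ok_out = predict(12, b0, b1, x_min, x_max)
print(round(y_in, 2), ok_in)     # inside the sampled domain
print(round(y_out, 2), ok_out)   # extrapolation: treat with caution
```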

30
Chapter Practice
31
Supplemental Material
32
(No Transcript)
33
(No Transcript)
34
List Serv Comments
  • 1) What is plotted on a residuals plot? Is it the
    x-variable versus the residuals? If so, then in the
    BVD book, on p. 149, they plot "predicted vs.
    residuals." Is this another way to do it?
  • Either way is fine. Typically in AP Stat it
    is x-variable v. residuals, though there was an
    AP Exam question once that did it the other way.
    Predicted v. residuals is needed when doing
    multiple regression.
  • 2) For TI-84 users, what is the difference between
    4:LinReg(ax+b) versus 8:LinReg(a+bx), which is
    the one we use for stats? Both are found in the
    STAT CALC menu.
  • They're the same, but a+bx is more
    commonly used for stats.
  • 3) Any good, short and sweet ways to explain
    regression to the mean?
  • One thing you can do is divide a scatterplot
    of something like 'Son's Height v. Father's
    Height' into several vertical bands. Put a big X
    at about the mean y-value for each vertical band.
    The regression line roughly passes through those
    bands, and the line will be less steep than the
    line y = x.
  • One way to explain the reason for this
    is to see that the line y = x is approximately a
    symmetry line for the elliptical cloud of points.
    A segment perpendicular to y = x with endpoints on
    the 'boundary' of the cloud is approximately
    centered on the line y = x. But we are predicting
    vertically, not perpendicularly. Vertical
    segments with endpoints on the boundary of the
    cloud will be centered not on y = x, but on a line
    less steep. That's hard to explain without a
    picture, but maybe it makes some sense? I
    hope?
  • 4) Why can we assume that the least
    squares regression line must go through the point
    (x-bar, y-bar)? Why can we assume that the mean
    value of the x-variable must necessarily
    correspond to the mean value of the
    y-variable?
  • It's not that we assume this, it
    just happens to fall out of the algebra when you
    minimize the sum of squared residuals. David Bee
    could probably reproduce that algebra easily. It
    would take me a while. And I'd probably have to
    look it up anyway!

35
  • Lisa and Others,
  • Lisa asks:
  • 3) Any good, short and sweet ways to explain
    regression to the mean?
  • 4) Why can we assume that the
    least-squares regression line must go through the
    point (x-bar, y-bar)? Why can we assume that the
    mean value of the x-variable must necessarily
    correspond to the mean value of the
    y-variable?
  • Re L3: Here's a fairly short explanation that will
    probably seem long because it's being written out.
  • We know one way of writing the equation of the
    regression line is (y - ybar)/sy = r(x - xbar)/sx, or
    simply z_y = r z_x.
  • Consider 0 < r < 1 and consider an x higher than
    its mean xbar. Thus, z_x > 0, and so is r z_x, which
    means the corresponding y value is higher than its
    mean ybar. But since r < 1, it follows that z_y < z_x,
    and so the predicted y value is closer to ybar
    than x is to xbar. So, for an observation with a
    high x value, we predict y to be high, but,
    relative to the standard deviations, not as
    high as x, and so the predictions all appear to
    regress toward the mean.
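
The z_y = r z_x argument can be tried with numbers. In this sketch every value (means, standard deviations, r, and the father's height) is hypothetical and chosen only to make the arithmetic clean.

```python
def predicted_y(x, x_bar, s_x, y_bar, s_y, r):
    """Predict y from x using the regression line in z-score form: z_y = r * z_x."""
    z_x = (x - x_bar) / s_x
    z_y = r * z_x
    return y_bar + z_y * s_y

# Hypothetical summary statistics (heights in inches): fathers are x, sons are y.
x_bar, s_x = 68.0, 3.0
y_bar, s_y = 69.0, 3.0
r = 0.5

father = 74.0                                   # 2 standard deviations above xbar
son = predicted_y(father, x_bar, s_x, y_bar, s_y, r)

z_father = (father - x_bar) / s_x
z_son = (son - y_bar) / s_y
print(son, z_father, z_son)
```

The father is 2 standard deviations above his mean, while the predicted son is only 1 standard deviation above his: still tall, but closer to the mean.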

36
  • Re L4: At the AP Stat level, one justification
    would be as follows.
  • Prerequisite: The point-slope equation of a line.
  • Consider an equation of the form y = a + bx, where a and
    b are to be determined by the method of
    least-squares, but the calculations are all
    automated now. The process involves
    least-squares leading to normal equations, but
    consider all n x_i values and their
    corresponding y_i values. Thus, we would have
        yi = a + bxi
    Summing both sides of this equation gives
        SUM yi = an + b SUM xi
    Dividing by n gives
        ybar = a + b xbar       (1)
    Since the equation of the line is
        y = a + bx              (2)
    subtracting (1) from (2) gives
        y - ybar = b(x - xbar)
    which shows the line passes through the point (x-bar, y-bar).
  • Note: This is just a justification and not a
    proof. I think someone in the Forum in the
    past gave a good non-calculus-using proof, but
    I don't recall it.
  • HTH -- David Bee
  • PS: Note we didn't determine the value of b. However,
    if interested, since we have SUM yi = an + b
    SUM xi, we could get the second normal
    equation by multiplying yi = a + bxi through by
    xi and summing, giving SUM xiyi = a SUM xi +
    b SUM (xi)^2. Since we now have two equations
    in a and b, they could be solved for a and b,
    which CALC Choice 8 LinReg(a+bx) in effect does
    for us.
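
The two normal equations in the PS can be solved directly. The sketch below does so for a small hypothetical data set (solved by Cramer's rule, an implementation choice not taken from the post) and then checks that the fitted line passes through (x-bar, y-bar).

```python
def solve_normal_equations(xs, ys):
    """Solve  SUM y  = a*n     + b*SUM x
              SUM xy = a*SUM x + b*SUM x^2   for a and b."""
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    det = n * sum_x2 - sum_x ** 2          # determinant of the 2x2 system
    a = (sum_y * sum_x2 - sum_x * sum_xy) / det
    b = (n * sum_xy - sum_x * sum_y) / det
    return a, b

# Hypothetical data, invented for illustration.
xs = [0, 1, 2, 3]
ys = [1, 3, 4, 8]

a, b = solve_normal_equations(xs, ys)
x_bar = sum(xs) / len(xs)
y_bar = sum(ys) / len(ys)

print(round(a, 2), round(b, 2))
print(round(a + b * x_bar, 2), y_bar)   # line evaluated at xbar versus ybar
```

Here a = 0.7 and b = 2.2, and the line evaluated at x-bar = 1.5 gives 4.0, which is y-bar.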

37
  • I am writing to see if there is a quicker way to
    find the standard deviation of the residuals. I
    know the formula, but the only way I currently
    see how to do it is to 1) list data 2) find
    linreg 3) place predicted data on list 4) find
    resid on next list 5) find square of resid list
    on next list 6) find sum of squared resid, then
    quickly divide by n-2 and take the square root. Is
    there another way to do it on a TI-84?
  • Thanks in advance for your help.
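
Off the calculator, the steps listed in the question collapse into a few lines. This Python sketch (hypothetical data) computes the standard deviation of the residuals both the long way and with the identity SSE = SS(y) - b1 SS(xy), which avoids building a residual list. It is not a TI-84 procedure.

```python
from math import sqrt

def residual_std_dev(xs, ys):
    """Return the std. dev. of residuals, sqrt(SSE / (n - 2)), computed two ways."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    ss_x = sum((x - x_bar) ** 2 for x in xs)
    ss_y = sum((y - y_bar) ** 2 for y in ys)
    ss_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = ss_xy / ss_x
    b0 = y_bar - b1 * x_bar

    # Long way: build the residuals, square them, and add them up.
    sse_long = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
    # Shortcut identity: SSE = SS(y) - b1 * SS(xy), no residual list needed.
    sse_short = ss_y - b1 * ss_xy
    return sqrt(sse_long / (n - 2)), sqrt(sse_short / (n - 2))

# Hypothetical data, invented for illustration.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

s_long, s_short = residual_std_dev(xs, ys)
print(round(s_long, 4), round(s_short, 4))
```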

38
Phone book Activity
  • Chapter 3 Descriptive Analysis of Bivariate
    Data: phone books
  • Teams are used for this project. Each team is
    given a local phone book. Each team determines a
    question of interest for which the answer is a
    proportion and then determines a sampling method
    for answering it. Possible questions of interest
    include the proportion who list a name and no
    address, the proportion who use initials only,
    the proportion of last names that end in "son",
    or the proportion of last names for which only
    one household has that last name. Different
    sampling methods are appropriate for different
    questions. Sample size calculation must be part
    of the design. Each team reports its results to
    the class. Often, the same question is asked by
    more than one team, so the issue of variability
    in results and the relationship to margin of
    error can be discussed.
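
For the sample size step of the design, one common approach is the formula n = z^2 p(1 - p) / E^2 for estimating a proportion. The sketch below assumes a 95% confidence level (z = 1.96) and the conservative guess p = 0.5; both are assumptions, not part of the activity description.

```python
from math import ceil

def sample_size_for_proportion(margin_of_error, p_guess=0.5, z=1.96):
    """Smallest n with z * sqrt(p(1-p)/n) <= margin_of_error.

    Assumptions: simple random sampling, z = 1.96 for 95% confidence,
    and p_guess = 0.5 when nothing is known about the proportion.
    """
    n = z ** 2 * p_guess * (1 - p_guess) / margin_of_error ** 2
    return ceil(n)   # always round up to a whole number of entries

print(sample_size_for_proportion(0.05))               # margin of error of 5 points
print(sample_size_for_proportion(0.05, p_guess=0.1))  # smaller n if p is thought to be near 0.1
```

Teams asking the same question with different sample sizes can then compare how far apart their results are against the margin of error each design implies.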