1
Stat 112 Lecture 21 Notes
  • Model Building (Brief Discussion)
  • Chapter 9.1: One-Way Analysis of Variance.
  • Homework 6 is due Friday, Dec. 1st.
  • I will be e-mailing you tonight or tomorrow some
    comments on your project ideas.
  • I will have the quizzes graded by tomorrow's
    office hours (Wed., 1:30-2:30); otherwise, I will
    return them to you next Tuesday.

2
Model Building
  1. Among the potential explanatory variables, think
    about which explanatory variables address the
    question of interest.
  2. For each explanatory variable, investigate
    whether a transformation is needed for it either
    because of curvature or crunching.
  3. Consider adding polynomial terms for each
    variable if there is remaining curvature for the
    variable (use the procedure of adding higher-order
    terms as long as the highest-order term has
    p-value < 0.05).
  4. Consider interactions between the explanatory
    variables, adding the interaction if the p-value
    on the interaction term is < 0.05. (A sketch of
    steps 3-4 follows this list.)
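For reference, a minimal sketch of steps 3 and 4 in Python's statsmodels
(the course uses JMP; this is only an illustration, and the variables
y, x1, x2 and the data are made up):

    import numpy as np
    import pandas as pd
    import statsmodels.formula.api as smf

    # Made-up data with some curvature in x1.
    rng = np.random.default_rng(0)
    df = pd.DataFrame({"x1": rng.normal(size=100), "x2": rng.normal(size=100)})
    df["y"] = 1 + 2 * df["x1"] + 0.5 * df["x1"] ** 2 + rng.normal(size=100)

    # Step 3: add a quadratic term; keep it only if the highest-order
    # term's p-value in the output is < 0.05.
    quad = smf.ols("y ~ x1 + I(x1**2) + x2", data=df).fit()
    print(quad.summary())

    # Step 4: add an interaction; keep x1:x2 only if its p-value < 0.05.
    inter = smf.ols("y ~ x1 * x2", data=df).fit()
    print(inter.summary())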

3
Analysis of Variance
  • The goal of analysis of variance is to compare
    the means of several (many) groups.
  • Analysis of variance is regression with only
    categorical variables.
  • One-way analysis of variance: groups are defined
    by one categorical variable.
  • Two-way analysis of variance: groups are defined
    by two categorical variables.

4
Milgram's Obedience Experiments
  • Subjects recruited to take part in an experiment
    on memory and learning.
  • The subject is the teacher.

The subject conducted a paired-associate learning
task with the student. The subject was instructed
by the experimenter to administer a shock to the
student each time the student gave a wrong
response. Moreover, the subject was instructed to
move one level higher on the shock generator each
time the learner gave a wrong answer. The subject
was also instructed to announce the voltage level
before administering a shock.
5
Four Experimental Conditions
  1. Remote-Feedback condition: The student is placed
    in a room where he cannot be seen by the subject,
    nor can his voice be heard; his answers flash
    silently on a signal box. However, at 300 volts
    the laboratory walls resound as he pounds in
    protest. After 315 volts, no further answers
    appear, and the pounding ceases.
  2. Voice-Feedback condition: Same as the
    remote-feedback condition except that vocal
    protests were introduced that could be heard
    clearly through the walls of the laboratory.

6
  3. Proximity condition: Same as the voice-feedback
    condition except that the student was placed in
    the same room as the subject, a few feet from the
    subject. Thus, he was visible as well as audible.
  4. Touch-Proximity condition: Same as the proximity
    condition except that the student received a
    shock only when his hand rested on a shock plate.
    At the 150-volt level, the student demanded to be
    let free and refused to place his hand on the
    shock plate. The experimenter ordered the subject
    to force the victim's hand onto the plate.

7
Two Key Questions
  1. Is there any difference among the mean voltage
    levels of the four conditions?
  2. If there are differences, what conditions
    specifically are different?

8
Multiple Regression Model for Analysis of Variance
  • To answer these questions, we can fit a multiple
    regression model with voltage level as the
    response and one categorical explanatory variable
    (condition).
  • We obtain a sample from each level of the
    categorical variable (group) and are interested
    in estimating the population means of the groups
    based on these samples.
  • Assumptions of the multiple regression model for
    one-way analysis of variance:
  • Linearity: automatically satisfied.
  • Constant variance: check if the spread within each
    group is the same.
  • Normality: check if the distribution within each
    group is normally distributed.
  • Independence: the sample consists of independent
    observations.
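For reference outside JMP, a minimal sketch of this model in Python's
statsmodels; the data frame, condition labels, and voltage numbers are
made up for illustration, not the actual Milgram data:

    import numpy as np
    import pandas as pd
    import statsmodels.formula.api as smf

    rng = np.random.default_rng(1)
    conditions = ["Remote", "Voice", "Proximity", "Touch"]
    df = pd.DataFrame({
        "condition": np.repeat(conditions, 10),   # 10 subjects per condition
        "voltage": rng.normal(np.repeat([400., 370., 310., 270.], 10), 80),
    })

    # Regression with one categorical explanatory variable = one-way ANOVA.
    model = smf.ols("voltage ~ C(condition)", data=df).fit()
    print(model.summary())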

9
Comparing the Groups
  • The coefficient on Condition[Proximity] = -26.25
    means that the proximity condition is estimated to
    have a mean that is 26.25 less than the mean of
    the means of all the conditions.

  • Sample mean of the proximity group = (mean of the
    four condition means) - 26.25.

10
  • The Effect Test tests the null hypothesis that the
    means of all four conditions are the same versus
    the alternative hypothesis that at least two of
    the conditions have different means.
  • The p-value of the Effect Test is < 0.0001. Strong
    evidence that the population means are not the
    same for all four conditions.
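A minimal sketch of the same overall test (all four condition means
equal) using scipy rather than JMP; the four arrays of voltage levels
are simulated, not the actual data:

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(2)
    remote, voice, proximity, touch = (rng.normal(m, 80, size=10)
                                       for m in (400, 370, 310, 270))
    F, p = stats.f_oneway(remote, voice, proximity, touch)
    # Small p (< 0.05) -> evidence that at least two condition means differ.
    print(F, p)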

11
JMP for One-way ANOVA
  • One-way ANOVA can be carried out in JMP either
    using Fit Model with a categorical explanatory
    variable or Fit Y by X with the categorical
    variable as the explanatory variable.
  • After using the Fit Y by X command, click the red
    triangle next to Oneway Analysis, then Display
    Options, Boxplots to see side-by-side boxplots,
    and click Mean/ANOVA to see the means of the
    different groups and the test of whether all
    groups have the same means. This test of whether
    all groups have the same means has its p-value
    labeled Prob > F in the ANOVA table.

12
Prob > F: p-value for the test that all groups have
the same mean. Same as the p-value for the Effect
Test in the Fit Model output.
13
Two Key Questions
  • Is there any difference among the mean voltage
    levels of the four conditions?
  • Yes, there is strong evidence of a
    difference. p-value of Effect Test < 0.0001.
  • If there are differences, what conditions
    specifically are different?

14
Testing whether each of the groups is different
  • Naïve approach to deciding which groups have a
    mean that is different from the average of the
    means of all groups: do a t-test for each group
    and look for groups that have p-value < 0.05.
  • Problem: multiple comparisons.

15
(No Transcript)
16
Errors in Hypothesis Testing
                                State of World
Decision Based on Data          Null Hypothesis True     Alternative Hypothesis True
Accept Null Hypothesis          Correct Decision         Type II Error
Reject Null Hypothesis          Type I Error             Correct Decision

When we do one hypothesis test and reject the null
hypothesis if the p-value < 0.05, the probability
of making a Type I error when the null hypothesis
is true is 0.05. We protect against falsely
rejecting a null hypothesis by making the
probability of a Type I error small.
17
Multiple Comparisons Problem
  • Compound uncertainty: when doing more than one
    test, there is an increased chance of making a
    mistake.
  • If we do multiple hypothesis tests and use the
    rule of rejecting the null hypothesis in each
    test if the p-value is < 0.05, then if all the
    null hypotheses are true, the probability of
    falsely rejecting at least one null hypothesis is
    > 0.05.
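For a sense of scale (assuming, for illustration only, that the k tests
are independent, which the slide does not require): the chance of at
least one false rejection is 1 - (1 - 0.05)^k, so with k = 20 tests it
is 1 - 0.95^20 ≈ 0.64, far above 0.05.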

18
Multiple Comparisons Simulation
  • In multiplecomp.JMP, 20 groups are compared with
    sample sizes of ten for each group.
  • The observations for each group are simulated
    from a standard normal distribution. Thus, in
    fact, all 20 population group means are equal
    (each equals 0).
  • Number of pairs found to have significantly
    different means using a t-test at level 0.05:

[Table on slide: number of significant pairs found in each of five simulation iterations.]
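A minimal Python sketch of the simulation described above (the slide's
actual numbers come from multiplecomp.JMP):

    import numpy as np
    from itertools import combinations
    from scipy import stats

    rng = np.random.default_rng(0)
    # 20 groups of 10 observations each; every true population mean is 0.
    groups = [rng.standard_normal(10) for _ in range(20)]

    # Count pairs declared "significantly different" by a two-sample t-test
    # at the 0.05 level, even though no real differences exist.
    n_pairs = 20 * 19 // 2
    n_sig = sum(stats.ttest_ind(a, b).pvalue < 0.05
                for a, b in combinations(groups, 2))
    print(n_sig, "of", n_pairs, "pairs falsely declared different")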
19
Multiple Comparison Simulation
  • In multiplecomp.JMP, 20 groups are compared with
    sample sizes of ten for each group.
  • The observations for each group are simulated
    from a standard normal distribution. Thus, in
    fact, all 20 population group means are equal
    (each equals 0).
  • Number of groups found to have means different
    from the average using a t-test and rejecting if
    the p-value is < 0.05:

[Table on slide: number of such groups found in each of five simulation iterations.]
20
Individual vs. Familywise Error Rate
  • When several tests are considered simultaneously,
    they constitute a family of tests.
  • Individual Type I error rate: probability, for a
    single test, that the null hypothesis will be
    rejected assuming that the null hypothesis is
    true.
  • Familywise Type I error rate: probability, for a
    family of tests, that at least one null hypothesis
    will be rejected assuming that all of the null
    hypotheses are true.
  • When we consider a family of tests, we want to
    make the familywise error rate small, say 0.05,
    to protect against falsely rejecting a null
    hypothesis.

21
Bonferroni Method
  • General method for doing multiple comparisons for
    any family of k tests.
  • Denote the familywise Type I error rate we want by
    p, say p = 0.05.
  • Compute the p-value for each individual test:
    p1, ..., pk.
  • Reject the null hypothesis for the ith test if
    pi < p/k.
  • Guarantees that the familywise Type I error rate
    is at most p.
  • Why Bonferroni works: if we do k tests and all
    null hypotheses are true, then using Bonferroni
    with p = 0.05, we have probability 0.05/k of
    making a Type I error on each test and so expect
    to make k x (0.05/k) = 0.05 errors in total.
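A minimal sketch of the Bonferroni rule in Python; the p-values are made
up for illustration:

    p_values = [0.001, 0.020, 0.004, 0.300]   # one p-value per test, k = 4
    k = len(p_values)
    p_familywise = 0.05

    for i, pv in enumerate(p_values, start=1):
        # Bonferroni: reject the ith null hypothesis only if its p-value < p/k.
        print(f"test {i}: p = {pv:.3f}, reject? {pv < p_familywise / k}")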

22
Tukey's HSD
  • Tukey's HSD is a method that is specifically
    designed to control the familywise Type I error
    rate (at 0.05) for analysis of variance.
  • After Fit Model, click the red triangle next to
    the X variable and click LSMeans Tukey HSD.
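Outside JMP, the same Tukey HSD comparisons can be sketched with
statsmodels; the condition labels and voltage numbers below are made up,
not the actual Milgram data:

    import numpy as np
    import pandas as pd
    from statsmodels.stats.multicomp import pairwise_tukeyhsd

    rng = np.random.default_rng(4)
    df = pd.DataFrame({
        "condition": np.repeat(["Remote", "Voice", "Proximity", "Touch"], 10),
        "voltage": rng.normal(np.repeat([400., 370., 310., 270.], 10), 80),
    })
    # Each output row: pair of groups, mean difference, adjusted CI, reject?
    result = pairwise_tukeyhsd(df["voltage"], df["condition"], alpha=0.05)
    print(result)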

23
Comparisons shown in red are those for which the
null hypothesis that the two group means are the
same is rejected using the Tukey HSD procedure,
which controls the
familywise Type I error rate at 0.05. A
confidence interval for the difference in group
means that adjusts for multiple comparisons is
shown in the third and fourth lines.
24
Assumptions in one-way ANOVA
  • Assumptions needed for validity of one-way
    analysis of variance p-values and CIs:
  • Linearity: automatically satisfied.
  • Constant variance: spread within each group is
    the same.
  • Normality: distribution within each group is
    normally distributed.
  • Independence: sample consists of independent
    observations.

25
Rule of thumb for checking constant variance
  • Constant variance: look at the standard deviation
    of the different groups by using Fit Y by X and
    clicking Means and Std Dev.
  • Rule of thumb: check whether (highest group
    standard deviation)/(lowest group standard
    deviation) is greater than 2. If greater than 2,
    then constant variance is not reasonable and a
    transformation should be considered. If less than
    2, then constant variance is reasonable.
  • (Highest group standard deviation)/(lowest group
    standard deviation) = 131.874/63.640 = 2.07.
    Thus, constant variance is not reasonable for
    Milgram's data.
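A minimal Python sketch of this rule-of-thumb check; the group labels
and standard deviations below are made up to mimic unequal spread, not
computed from Milgram's data:

    import numpy as np
    import pandas as pd

    rng = np.random.default_rng(5)
    df = pd.DataFrame({
        "condition": np.repeat(["Remote", "Voice", "Proximity", "Touch"], 10),
        "voltage": rng.normal(np.repeat([400., 370., 310., 270.], 10),
                              np.repeat([60., 80., 100., 130.], 10)),
    })
    sds = df.groupby("condition")["voltage"].std()
    print(sds)
    # Ratio > 2 -> constant variance not reasonable; consider transforming Y.
    print("SD ratio:", sds.max() / sds.min())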

26
Transformations to correct for nonconstant
variance
  • If the standard deviation is highest for groups
    with high means, try transforming Y to log Y or
    √Y. If the standard deviation is highest for
    groups with low means, try transforming Y to Y².
  • Here the SD is particularly low for the group with
    the highest mean, so try transforming to Y². To
    make the transformation in JMP, right-click in a
    new column, click New Column, then right-click
    again in the created column, click Formula, and
    enter the appropriate formula for the
    transformation.
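Continuing the sketch from the previous slide (it reuses the made-up
data frame df built there), squaring the response and re-checking the
SD ratio:

    # Y -> Y^2 transformation, then recompute the rule-of-thumb ratio.
    df["voltage_sq"] = df["voltage"] ** 2
    sds_sq = df.groupby("condition")["voltage_sq"].std()
    print("SD ratio after squaring:", sds_sq.max() / sds_sq.min())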

27
Transformation of Milgram's Data to Squared
Voltage Level
  • Check of constant variance for the transformed
    data: (highest group standard deviation)/(lowest
    group standard deviation) = 1.63. The constant
    variance assumption is reasonable for voltage
    squared.
  • Analysis of variance tests are approximately
    valid for the voltage-squared data, so we
    reanalyze the data using voltage squared.

28
Analysis using Voltage Squared
Strong evidence that the group mean voltage
squared levels are not all the same.
Strong evidence that remote has higher mean
voltage squared level than proximity and
touch-proximity and that voice-feedback has
higher mean voltage squared level than
touch-proximity, taking into account the multiple
comparisons.
29
Rule of Thumb for Checking Normality in ANOVA
  • The normality assumption for ANOVA is that the
    distribution in each group is normal. This can be
    checked by looking at the boxplot, histogram, and
    normal quantile plot for each group.
  • If there are more than 30 observations in each
    group, then the normality assumption is not
    important: ANOVA p-values and CIs will still be
    approximately valid even for nonnormal data if
    there are more than 30 observations in each
    group.
  • If there are fewer than 30 observations per group,
    then we can check normality by clicking Analyze,
    Distribution, and then putting the Y variable in
    the Y, Columns box and the categorical variable
    denoting the group in the By box. We can then
    create normal quantile plots for each group and
    check that, for each group, the points in the
    normal quantile plot are within the confidence
    bands. If there is nonnormality, we can try a
    transformation such as log Y and see if the
    transformed data is approximately normally
    distributed in each group.
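A minimal sketch of the per-group normality check outside JMP, drawing
a normal quantile plot for each group; the data are simulated for
illustration:

    import numpy as np
    import pandas as pd
    import matplotlib.pyplot as plt
    from scipy import stats

    rng = np.random.default_rng(6)
    df = pd.DataFrame({
        "condition": np.repeat(["Remote", "Voice", "Proximity", "Touch"], 10),
        "voltage": rng.normal(np.repeat([400., 370., 310., 270.], 10), 80),
    })

    fig, axes = plt.subplots(1, 4, figsize=(12, 3), sharey=True)
    for ax, (name, grp) in zip(axes, df.groupby("condition")):
        # Points close to the line -> distribution roughly normal in that group.
        stats.probplot(grp["voltage"], dist="norm", plot=ax)
        ax.set_title(name)
    plt.tight_layout()
    plt.show()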

30
One-Way Analysis of Variance: Steps in the Analysis
  1. Check assumptions (constant variance, normality,
    independence). If constant variance is violated,
    try transformations.
  2. Use the effect test (commonly called the F-test)
    to test whether all group means are the same.
  3. If the effect test finds that at least two group
    means differ, use Tukey's HSD procedure to
    investigate which groups are different, taking
    into account the fact that multiple comparisons
    are being done.