Testing of hypothesis - PowerPoint PPT Presentation


1
Testing of hypothesis
Dr. L. Jeyaseelan
Dept. of Biostatistics
Christian Medical College
Vellore, India
2
Statistics
  • Inferential Statistics: hypothesis testing (comparison of means;
    comparison of proportions, i.e. incidences / prevalences)
  • Descriptive Statistics: summarize mean / proportion
    (incidence / prevalence)
3
Hypotheses
Research Question: Is there a (statistically)
significant difference between two groups with
respect to the outcome?
Null Hypothesis: There is no (statistically)
significant difference between two groups with
respect to the outcome.
Alternative Hypothesis: There is a (statistically)
significant difference between two groups with
respect to the outcome.
Two groups: two independent populations
Outcome: scores obtained
Intervention: educational training
4
P - Value
The probability of getting a result as extreme as or
more extreme than the one observed when the null
hypothesis is true. When our study results in a
probability of 0.01, we say that, if the null
hypothesis were true, a difference at least as large
as the one we found would arise by chance only
1 time in 100. It is therefore unlikely that
our results occurred by chance, and the difference
we found in the sample is probably due to the
teaching programme.
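To make the definition concrete, here is a minimal sketch of an exact permutation test in Python. It is not the method used in the presentation, and the exam scores are hypothetical values invented for illustration. It counts how many ways of splitting the pooled scores into two groups give a difference in means at least as extreme as the observed one, which is what the P-value measures when the null hypothesis (group labels do not matter) is true.

```python
from itertools import combinations

def permutation_p_value(group_a, group_b):
    """Exact two-sided permutation P-value for a difference in means."""
    pooled = group_a + group_b
    n_a, n_b = len(group_a), len(group_b)
    total = sum(pooled)

    def mean_diff(sum_a):
        # Difference in means, given the sum of the values assigned to group A.
        return sum_a / n_a - (total - sum_a) / n_b

    observed = abs(mean_diff(sum(group_a)))
    as_extreme = 0
    n_splits = 0
    # Under the null hypothesis every assignment of n_a values to group A
    # is equally likely, so we enumerate all of them.
    for idx in combinations(range(len(pooled)), n_a):
        n_splits += 1
        diff = abs(mean_diff(sum(pooled[i] for i in idx)))
        if diff >= observed - 1e-12:  # tolerance for floating-point ties
            as_extreme += 1
    return as_extreme / n_splits

# Hypothetical exam scores (NOT from the presentation).
trained = [78, 85, 90, 82]
control = [70, 75, 80, 72]
p_value = permutation_p_value(trained, control)
print(f"P = {p_value:.3f}")
```

With these made-up scores, 4 of the 70 possible splits are at least as extreme as the observed one, so P is about 0.057.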
5
Variation
  • Chance Variation
  • Effect Variation
  • The difference that we might find between the
    two groups' exam achievement in our sample might
    have occurred by chance, or it might have
    occurred due to the teaching programme.

6
P as a significance level
P < 0.05: result is statistically significant.
P > 0.05: result is not statistically significant.
These cutoffs are arbitrary and have no specific
importance.
7
COMPARISON OF MEANS
t - tests
A bit of history... W. S. Gosset (1908) first
published a t-test. He worked at the Guinness
Brewery in Dublin and published under the name
Student. The test was called Student's test (later
shortened to t-test).
8
Types of t-tests
  • One sample t-test
  • t-test for two independent (uncorrelated)
    samples
  • (i) Equal variance (ii) Unequal variance
  • t-test for two paired (correlated) samples

9
Comparison of two independent Means (Student's
t-test / unpaired t-test)
A t-test is used when we wish to compare two
means.
Type of data required
Independent Variable: one nominal variable with two
levels, e.g., (i) boy/girl students (ii) non-smoking/heavy
smoking mothers
Dependent Variable: continuous variable, e.g., (i) marks
obtained by the students in the annual exam (ii) birth
weight of children
10
Assumptions
  • The samples are random and independent of each
    other
  • The independent variable is categorical and
    contains only two levels
  • The distribution of dependent variable is normal.
    If the distribution is seriously skewed, the
    t-test may be invalid.
  • The variances are equal in both the groups

11
Example data
A study was conducted to compare the birth
weights of children born to 15 non-smoking mothers
with those of children born to 14 heavy smoking
mothers.
Non-smoking Mothers (n = 15)   Heavy smoking Mothers (n = 14)
3.99                           3.18
3.79                           2.84
3.60                           2.90
3.73                           3.27
3.21                           3.85
3.60                           3.52
4.08                           3.23
3.61                           2.76
3.83                           3.60
3.31                           3.75
4.13                           3.59
3.26                           3.63
3.54                           2.38
3.51                           2.34
2.71
12
Checking the Normality
14
Unequal Variances
Sometimes we wish to compare two groups of
observations where the assumption of normality is
reasonable, but the variability in the two groups
is markedly different.
  • Two questions arise
  • How different do the variances have to be before
    we should not use the two sample t-test?
  • What can we do if this happens?

15
Unequal Variances (Contd.)
  • Levene's test for equality of variances
  • Null Hypothesis: The variances are equal
  • Alternative Hypothesis: The variances are not
    equal

If Levene's test is not significant (P > 0.05):
report "equal variances assumed".
If Levene's test is significant (P < 0.05):
report "equal variances not assumed".
(2) Use a modified t-test in the presence of unequal
variances
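The slides do not say which modified t-test is meant. A common choice is Welch's t-test, which drops the pooled variance and adjusts the degrees of freedom. The sketch below applies it to the birth-weight example under that assumption, using only the Python standard library. The helper `t_two_sided_p` is a hypothetical name; it obtains the P-value by numerically integrating the t density and is an illustration, not a reference implementation.

```python
import math
from statistics import fmean, variance

def t_two_sided_p(t_stat, df, steps=2000):
    """Two-sided P-value of Student's t, by Simpson's rule on the density."""
    const = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    pdf = lambda x: const * (1 + x * x / df) ** (-(df + 1) / 2)
    upper = abs(t_stat)
    h = upper / steps
    area = pdf(0.0) + pdf(upper)
    for i in range(1, steps):
        area += (4 if i % 2 else 2) * pdf(i * h)
    # area * h / 3 is P(0 < T < |t|); the two tails are what remains.
    return max(0.0, 1 - 2 * area * h / 3)

def welch_t_test(a, b):
    """Welch's unequal-variance t-test: returns (t, df, two-sided P)."""
    va = variance(a) / len(a)  # squared standard error of mean(a)
    vb = variance(b) / len(b)
    t_stat = (fmean(a) - fmean(b)) / math.sqrt(va + vb)
    # Welch-Satterthwaite approximation to the degrees of freedom.
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t_stat, df, t_two_sided_p(t_stat, df)

non_smoking = [3.99, 3.79, 3.60, 3.73, 3.21, 3.60, 4.08, 3.61,
               3.83, 3.31, 4.13, 3.26, 3.54, 3.51, 2.71]
heavy_smoking = [3.18, 2.84, 2.90, 3.27, 3.85, 3.52, 3.23,
                 2.76, 3.60, 3.75, 3.59, 3.63, 2.38, 2.34]

t_stat, df, p_value = welch_t_test(non_smoking, heavy_smoking)
print(f"t = {t_stat:.2f}, df = {df:.1f}, P = {p_value:.3f}")
```

The degrees of freedom come out lower than the 27 of the equal-variance test, which is the price paid for not assuming equal variances.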
16
How to report the results?
                           Heavy smoking mothers (n = 14)   Non-smoking mothers (n = 15)   Diff in means (95% CI)   P-Value
                           Mean     SD                      Mean     SD
Birth weight of children   3.20     0.49                    3.60     0.37                   0.4 (0.06, 0.72)         0.022
If there were truly no difference, a difference in
mean birth weight this large between children born to
non-smoking and heavy smoking mothers would be
found by chance only about 2 times in 100.
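As a rough check on these reported figures, the following Python sketch runs the equal-variance (Student's) t-test on the birth-weight data from the example slide. It uses only the standard library; `t_two_sided_p` and `t_critical` are hypothetical helper names that integrate the t density numerically and search for the critical value by bisection, so treat it as an illustration rather than a reference implementation.

```python
import math
from statistics import fmean, variance

def t_two_sided_p(t_stat, df, steps=2000):
    """Two-sided P-value of Student's t, by Simpson's rule on the density."""
    const = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    pdf = lambda x: const * (1 + x * x / df) ** (-(df + 1) / 2)
    upper = abs(t_stat)
    h = upper / steps
    area = pdf(0.0) + pdf(upper)
    for i in range(1, steps):
        area += (4 if i % 2 else 2) * pdf(i * h)
    return max(0.0, 1 - 2 * area * h / 3)

def t_critical(df, alpha=0.05):
    """Value t* whose two-sided tail area is alpha, found by bisection."""
    lo, hi = 0.0, 50.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_two_sided_p(mid, df) > alpha:
            lo = mid  # tail area still too large: move right
        else:
            hi = mid
    return (lo + hi) / 2

def pooled_t_test(a, b):
    """Student's t-test: returns (difference, 95% CI, t, two-sided P)."""
    na, nb = len(a), len(b)
    df = na + nb - 2
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / df
    se = math.sqrt(pooled_var * (1 / na + 1 / nb))
    diff = fmean(a) - fmean(b)
    half_width = t_critical(df) * se
    t_stat = diff / se
    return diff, (diff - half_width, diff + half_width), t_stat, t_two_sided_p(t_stat, df)

non_smoking = [3.99, 3.79, 3.60, 3.73, 3.21, 3.60, 4.08, 3.61,
               3.83, 3.31, 4.13, 3.26, 3.54, 3.51, 2.71]
heavy_smoking = [3.18, 2.84, 2.90, 3.27, 3.85, 3.52, 3.23,
                 2.76, 3.60, 3.75, 3.59, 3.63, 2.38, 2.34]

diff, ci, t_stat, p_value = pooled_t_test(non_smoking, heavy_smoking)
print(f"diff = {diff:.2f}, 95% CI = ({ci[0]:.2f}, {ci[1]:.2f}), "
      f"t = {t_stat:.2f}, P = {p_value:.3f}")
```

The output should land close to the reported summary: a difference of about 0.4 with a 95% CI of roughly (0.06, 0.72) and a P-value near 0.02.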
17
The distribution of data
Normal data (SD < ½ mean): use t-test
Skewed / non-normal data (SD > ½ mean): use non-parametric
Mann-Whitney test / log-transformed t-test
Note: Applicable only for variables where
negative values are impossible (e.g.,
Rate of GFR change)
Ref: Altman DG, 1991
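The rule of thumb is easy to apply to published summary statistics. The sketch below is a hypothetical helper (the name `suggest_test` is not from the presentation) that encodes it in Python. The birth-weight figures are the group summaries reported in the presentation; the length-of-stay figures are the Hospital 1 summary quoted in its list of misuses.

```python
def suggest_test(mean, sd):
    """Apply the SD > mean / 2 rule of thumb to a non-negative variable."""
    if mean <= 0:
        # The rule is stated only for variables that cannot be negative.
        raise ValueError("rule of thumb needs a positive mean")
    if sd > mean / 2:
        return "Mann-Whitney test or log-transformed t-test"
    return "t-test"

# Birth weight, heavy smoking mothers: mean 3.20, SD 0.49.
print(suggest_test(3.20, 0.49))
# Length of stay in days, Hospital 1: mean 26, SD 17.
print(suggest_test(26, 17))
```

The idea behind the threshold is that a non-negative variable with an SD larger than half its mean cannot fit a normal curve without putting noticeable probability below zero. It is a screening heuristic, not a formal test of normality.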
18
Clinical Significance Vs Statistical Significance
A possible antipyretic is tested in patients with
the common cold. 500 receive the candidate
drug and 500 receive a placebo control. Temperatures
are measured 4 hours after dosing.
P value = 0.011
Statistical Significance? Yes. There is probably a
reduction in temperature.
Clinical Significance? No. Temperature only fell by
about 0.1 °C.
Because the sample size is so large, we are able
to detect a very small change in temperature.
19
Misuses of t-test
  • t-test for non-normal data.

                           Hospital 1           Hospital 2
                           Mean (SD)    n       Mean (SD)    n
Length of Stay (in days)   26 (17)      11      79 (57)      13
Heterogeneous data: SD > ½ (mean)
Correct method: non-parametric Mann-Whitney test
with median and range values
  • t-test for paired observations

            Before intervention (n = 12)   After intervention (n = 12)
            Mean     SD                    Mean     SD
BP Levels   142.0    30.5                  120.5    31.5
Correct method: paired t-test
20
Misuses of t-test (Contd.)
  • Multiple t-test
  • Comparison of length of stays between three
    hospitals

                           Hospital 1           Hospital 2           Hospital 3
                           Mean (SD)    n       Mean (SD)    n       Mean (SD)    n
Length of Stay (in days)   25 (5)       12      75 (20)      13      30 (10)      14
Hospital 1 vs Hospital 2: P-value?
Hospital 1 vs Hospital 3: P-value?
Hospital 2 vs Hospital 3: P-value?
The effective type I error rate for 3 comparisons
can be as high as 3 x 0.05 = 0.15.
Correct method: ANOVA with Bonferroni correction.
22
Two groups of paired Observations
  • Paired t-test
  • Same individuals are studied more than once in
    different circumstances
  • e.g., measurements made on the same people before
    and after an intervention
  • The outcome variable should be continuous
  • The differences between the pre- and post-intervention
    measurements should be normally distributed

23
A study was carried out to evaluate the effect of a
new diet on weight loss. The study population
consisted of 12 people who used the diet for 2
months; their weights before and after the diet
are given below.
Patient No.  Weight before diet (kg)  Weight after diet (kg)
1 75 70
2 60 54
3 68 58
4 98 93
5 83 78
6 89 84
7 65 60
8 78 77
9 95 90
10 80 76
11 100 94
12 108 100
The research question asks whether the diet makes
a difference.
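The paired t-test for this table can be sketched in a few lines of Python. The test works on the 12 within-person differences rather than on the two columns separately. The critical value 2.201 is the standard two-sided 5% point of the t distribution with 11 degrees of freedom; the variable names are illustrative.

```python
import math
from statistics import fmean, stdev

before = [75, 60, 68, 98, 83, 89, 65, 78, 95, 80, 100, 108]
after = [70, 54, 58, 93, 78, 84, 60, 77, 90, 76, 94, 100]

# One difference per person: positive values mean weight was lost.
diffs = [b - a for b, a in zip(before, after)]

mean_diff = fmean(diffs)
se = stdev(diffs) / math.sqrt(len(diffs))  # standard error of the mean difference
t_stat = mean_diff / se
df = len(diffs) - 1

T_CRIT_11_DF = 2.201  # two-sided 5% critical value for 11 degrees of freedom
significant = abs(t_stat) > T_CRIT_11_DF

print(f"mean loss = {mean_diff:.2f} kg, t = {t_stat:.2f}, df = {df}, "
      f"significant at 5%: {significant}")
```

Every participant lost weight, and the differences vary little from person to person, so the t statistic is far above the critical value.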
24
Paired t test output
25
t-test: to examine the difference between two
independent groups
Paired t-test: to examine the difference between pre
and post measures of the same group
How do we compare the means of more than two groups?
26
Example: Treatments A, B, C and D; Response: BP
level
How does the t-test concept work here?
A versus B, B versus C, A versus C, B versus D, A
versus D, C versus D
The overall error rate grows quickly with the
number of tests conducted: 1 - (1 - 0.05)^6 ≈ 0.26
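The arithmetic behind this inflation can be checked with a short Python sketch. It assumes the tests are independent, which is the assumption built into the formula 1 - (1 - alpha)^m; pairwise t-tests on shared groups are not strictly independent, so the figure is an approximation.

```python
from math import comb

def familywise_error_rate(n_tests, alpha=0.05):
    """Chance of at least one false positive across independent tests."""
    return 1 - (1 - alpha) ** n_tests

n_groups = 4                    # treatments A, B, C and D
n_pairs = comb(n_groups, 2)     # number of pairwise comparisons
fwer = familywise_error_rate(n_pairs)
print(n_pairs, round(fwer, 3))  # 6 0.265
```

With three groups the same formula gives about 0.14, slightly below the simple upper bound of 3 x 0.05 = 0.15.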
27
Instead of using a series of individual
comparisons we examine the differences among the
groups through an analysis that considers the
variation across all groups at once.
Analysis of Variance (ANOVA)
28
Why ANOVA and not ANOME? Although means are
compared, the comparisons are made using estimates
of variance. The ANOVA test statistic, the F
statistic, is actually a ratio of estimates of
variance.
29
Hypotheses
The main analysis is to determine whether the
population means are all equal. If there are K
means then the null hypothesis is
H0: μ1 = μ2 = ... = μK
The alternative hypothesis is given by
H1: at least one population mean differs from the others
30
Type of data required
31
Assumptions
  • The samples are random and independent of each
    other
  • The independent variable is categorical and
    contains more than two levels
  • The distribution of dependent variable is normal.
    If the distribution is seriously skewed, the
    ANOVA may be invalid.
  • The groups should have equal variances

32
Example data
A study was conducted to assess the Hb (haemoglobin)
levels of women of low, medium and high socio-economic
status
SL No   Low (n = 20)   Medium (n = 18)   High (n = 17)
1       8.10           8.40              12.70
2       8.00           11.10             11.80
3       6.90           10.80             13.10
4       11.40          11.00             12.30
5       10.70          12.20             10.90
6       10.20          8.70              12.60
7       8.90           12.30             13.20
8       9.90           11.50             14.20
9       6.80           11.60             11.80
10      9.10           12.90             12.40
11      9.20           12.00             12.70
12      7.40           10.90             13.40
13      10.70          11.70             14.30
14      11.40          11.00             13.80
15      7.70           12.20             15.00
16      6.10           11.20             14.20
17      11.00          10.70             9.20
18      11.10          9.90
19      7.90
20      10.60
33
Source of Variation
ANOVA separates the variation in all the data into
two parts: the variation between each group mean
and the overall mean for all the groups (the
between-group variability), and the variation between
each study participant and that participant's group
mean (the within-group variability). If the
between-group variability is much greater than
the within-group variability, there are likely to
be differences between the group means.
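This partition can be computed directly. The Python sketch below, written for illustration with the standard library only, splits the total sum of squares for the Hb example into its between-group and within-group parts and forms the F statistic as the ratio of the two mean squares. The function name `one_way_anova` is hypothetical, and no P-value is computed.

```python
from statistics import fmean

def one_way_anova(groups):
    """Return (SS between, SS within, df between, df within, F)."""
    all_values = [v for g in groups for v in g]
    grand_mean = fmean(all_values)
    # Between-group: each group mean against the overall mean.
    ss_between = sum(len(g) * (fmean(g) - grand_mean) ** 2 for g in groups)
    # Within-group: each participant against their own group mean.
    ss_within = sum((v - fmean(g)) ** 2 for g in groups for v in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return ss_between, ss_within, df_between, df_within, f_stat

low = [8.10, 8.00, 6.90, 11.40, 10.70, 10.20, 8.90, 9.90, 6.80, 9.10,
       9.20, 7.40, 10.70, 11.40, 7.70, 6.10, 11.00, 11.10, 7.90, 10.60]
medium = [8.40, 11.10, 10.80, 11.00, 12.20, 8.70, 12.30, 11.50, 11.60,
          12.90, 12.00, 10.90, 11.70, 11.00, 12.20, 11.20, 10.70, 9.90]
high = [12.70, 11.80, 13.10, 12.30, 10.90, 12.60, 13.20, 14.20, 11.80,
        12.40, 12.70, 13.40, 14.30, 13.80, 15.00, 14.20, 9.20]

ss_b, ss_w, df_b, df_w, f_stat = one_way_anova([low, medium, high])
print(f"SS between = {ss_b:.1f} (df {df_b}), "
      f"SS within = {ss_w:.1f} (df {df_w}), F = {f_stat:.1f}")
```

The resulting F is many times larger than the 5% critical value for 2 and 52 degrees of freedom (a little above 3), which points to a difference somewhere among the three group means.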
34
ANOVA data
Group 1
Group 2
Group 3
35
ANOVA output
36
Multiple Comparisons Procedure
ANOVA is a "group comparison" that determines whether a
statistically significant difference exists
somewhere among the groups studied. If a
significant difference is indicated, ANOVA is
usually followed by a "multiple comparison
procedure" that compares combinations of groups
to examine further any differences among them.
The most common multiple comparison procedure is
the "pairwise comparison", in which each group
mean is compared (two at a time) to all other
group means to determine which groups differ
significantly.
37
Bonferroni Test
Uses t-tests to perform pairwise comparisons
between group means, but controls the overall error
rate by setting the error rate for each test to the
experiment-wise error rate divided by the total
number of tests. The disadvantage of this procedure
is that the true overall level may be so much less
than the maximum value α that none of the individual
tests is likely to be rejected.
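A minimal Python sketch of the Bonferroni rule follows. The three raw P-values are hypothetical numbers chosen for illustration and are not results from the presentation; the function name is also illustrative.

```python
def bonferroni(p_values, alpha=0.05):
    """Return (per-test alpha, adjusted P-values, reject decisions)."""
    m = len(p_values)
    per_test_alpha = alpha / m  # experiment-wise rate split across m tests
    adjusted = [min(1.0, p * m) for p in p_values]
    reject = [p <= per_test_alpha for p in p_values]
    return per_test_alpha, adjusted, reject

# Hypothetical raw P-values for three pairwise comparisons (NOT real results).
labels = ["Group 1 vs 2", "Group 1 vs 3", "Group 2 vs 3"]
raw = [0.004, 0.0001, 0.03]

per_test_alpha, adjusted, reject = bonferroni(raw)
for label, p, p_adj, r in zip(labels, raw, adjusted, reject):
    print(f"{label}: raw P = {p}, adjusted P = {p_adj:.4f}, reject = {r}")
```

The third comparison would pass an unadjusted 0.05 threshold but fails the per-test threshold of about 0.0167, which illustrates the conservatism described above.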
38
Tukey's Method
Uses the studentized range statistic to make all of
the pairwise comparisons between groups. Sets the
experiment-wise error rate at the error rate for the
collection of all pairwise comparisons. This method
is applicable when
  1. The sizes of the samples from each group are
    equal.
  2. Pairwise comparisons of means are of primary
    interest, that is, null hypotheses of the form
    H0: μi = μj are to be considered.
39
Scheffé test
Performs simultaneous joint pairwise comparisons
for all possible pairwise combinations of means.
Uses the F sampling distribution. This method is
recommended when
  1. The sizes of the samples selected from the
    different populations are unequal.
  2. Comparisons other than simple pairwise
    comparisons between two means are of interest.
40
Analysis of Covariance (ANCOVA)
41
Analysis of covariance
  • ANCOVA is another ANOVA technique, which
    combines ANOVA with regression to measure the
    differences among group means
  • The advantages that ANCOVA has over other
    techniques are:
  • The ability to reduce the error variance in the
    outcome measure.
  • The ability to measure group differences after
    allowing for other differences between subjects.
  • In ANOVA, two sets of variables are involved in
    the analysis: the independent and the dependent
    variable. With ANCOVA, a third type of variable is
    included: the covariate, which is continuous.
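To illustrate what "allowing for other differences between subjects" means, here is a small Python sketch of covariate-adjusted group means. It assumes a common within-group slope (the homogeneity of regression that ANCOVA requires) and uses tiny hypothetical data invented for the example; it does not compute the ANCOVA F test, and the function name is illustrative.

```python
from statistics import fmean

def adjusted_means(groups):
    """groups maps a name to (covariate values, outcome values).

    Returns the pooled within-group slope and each group's outcome mean
    adjusted to the overall mean of the covariate.
    """
    grand_x = fmean([x for xs, _ in groups.values() for x in xs])
    sxy = sxx = 0.0
    for xs, ys in groups.values():
        mx, my = fmean(xs), fmean(ys)
        sxy += sum((x - mx) * (y - my) for x, y in zip(xs, ys))
        sxx += sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx  # common slope, pooled over the groups
    adjusted = {
        name: fmean(ys) - slope * (fmean(xs) - grand_x)
        for name, (xs, ys) in groups.items()
    }
    return slope, adjusted

# Hypothetical data (NOT from the presentation): age in years, cholesterol.
groups = {
    "A": ([30, 40, 50], [180, 200, 220]),
    "B": ([50, 60, 70], [220, 240, 260]),
}
slope, adjusted = adjusted_means(groups)
print(slope, adjusted)
```

In this made-up example the raw group means differ by 40 units, but group B is also 20 years older on average. Once both groups are compared at the same age, the adjusted means coincide, so the raw difference is explained entirely by the covariate.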

42
Assumptions
  1. The groups should be mutually exclusive.
  2. The variance of the groups should be equivalent.
  3. The dependent variable should be normally
    distributed.
  4. The covariate should be a continuous variable.
  5. The covariate and the dependent variable must
    show a linear relationship.
  6. The direction and strength of relationship
    between the covariate and dependent variable must
    be similar in each group (homogeneity of
    regression across groups).

43
Steps for the analysis
  • Check whether the dependent variable is normally
    distributed.
  • (Use rule of thumb)
  • sum chol
  • Test whether the variance of the dependent
    variable is similar across groups (Bartlett's
    test for equal variances)
  • oneway chol group, tabulate
  • Measure the correlation between cholesterol and
    age.
  • corr chol age
  • twoway (scatter chol age)

44
Steps for the analysis (Contd.)
  • Homogeneity of regression across groups is
    equivalent to testing interaction between the
    covariate and the independent variable.
  • anova chol group age age*group, contin(age)
  • If interaction is significant one could study the
    effect of age on cholesterol in each of the two
    groups separately.
  • If the interaction is not significant then the
    assumptions are met and it is appropriate to do
    ANCOVA.
  • anova chol group age, contin(age)

45
Summary
  • ANCOVA is an extension of ANOVA that allows us to
    remove additional sources of variation from the
    error term, thus enhancing the power of our
    analysis.
  • ANCOVA should be used only after careful
    consideration has been given to meeting the
    underlying assumptions.
  • It is especially important to check for
    homogeneity of regression, because if that
    assumption is violated, ANCOVA can lead to
    improper interpretations of results.

46
Example
  • In a survey to examine relationships between the
    nutrition and the health of women in the Middle West,
    the concentration of cholesterol in the blood
    serum was determined for 56 randomly selected
    subjects in Iowa and 130 in Nebraska
  • After controlling for age, do the two groups
    (Iowa, Nebraska) differ significantly in their
    cholesterol levels?

47
Dataset
49
ANOVA without adjusting for age
51
Testing Homogeneity of Variances across groups
52
Measuring the correlation between cholesterol and
age
53
Correlations between the dependent variable and
the covariate
54
Testing Homogeneity of regression across groups
55
Testing the homogeneity of regression across
groups
56
Model shows that the interaction term is not
significant (Assumption is met)
57
The interaction term is eliminated from the
model (Full Factorial model)
58
The ANCOVA results
59
Interpretation of the findings
  • After controlling for the covariate age, the two
    groups (Iowa and Nebraska) do not differ
    significantly in their cholesterol levels.
  • Note that the error variance was very high when
    age was not adjusted for in the model