Evaluating importance: An overview - PowerPoint PPT Presentation

Provided by: christine58

Transcript and Presenter's Notes

1
Evaluating importance: An overview
  • Size (magnitude) of effect (a.k.a. practical
    significance), e.g., d or another effect-size
    measure
  • Functional significance (a.k.a. clinical
    significance), e.g., social validity ratings
  • Cost-benefit ratio
  • Feasibility

2
Practical vs. statistical significance
  • Statistical significance (alpha level; p-value)
    reflects the probability that a difference at
    least as large as the one observed could have
    occurred by chance alone, i.e., if there were no
    real difference.
  • If the p-value for a difference between two
    groups is 0.05, a difference that large would be
    expected to occur by chance just 5 times out of
    100 (thus, it is likely to be a real difference).
  • If the p-value for the difference is 0.01, it
    would be expected to occur by chance just one
    time out of 100 (thus, we can be even more
    confident that the difference is real rather than
    random).

3
Practical significance
  • Reflects the magnitude, or size, of the
    difference, not the odds that it could have
    occurred by chance
  • Arguably much more important than statistical
    significance, especially for clinical questions
  • Measures of effect size (ES) quantify practical
    significance of a finding

4
Effect size
  • The degree to which the null hypothesis is false,
    e.g., not just that two groups differ
    significantly, but how much they differ (Cohen,
    1990)
  • Several measures of ES exist; use "whatever
    conveys the magnitude of the phenomenon of
    interest appropriate to the research context"
    (Cohen, 1990, p. 1310)
  • IQ and height example (Cohen, 1990)

5
The height-IQ correlation: Cohen's (1990) example
on statistical and practical significance
  • A study of 14,000 children ages 6-17 showed a
    highly significant (p < .001) correlation
    (r = .11) between height and IQ
  • What does this p indicate?
  • What's the magnitude of this correlation?
  • It accounts for about 1% of the variance
    (r² = .0121)
  • Based on an r this big, you'd expect that
    increasing a child's height by 4 feet would
    increase IQ by 30 points, and that increasing IQ
    by 233 points would increase height by 4 inches
    (as a correlation, the predicted relationship
    could work in either direction)
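
The "variance accounted for" figure on this slide is simply the square of the correlation. A minimal Python sketch of that arithmetic follows; the function name `variance_accounted_for` is hypothetical and chosen only for illustration.

```python
def variance_accounted_for(r: float) -> float:
    """Return the proportion of variance two variables share (r squared)."""
    return r ** 2


# Correlation between height and IQ from the example above.
r_height_iq = 0.11
share = variance_accounted_for(r_height_iq)
print(f"r = {r_height_iq:.2f} -> r^2 = {share:.4f} ({share:.1%} of the variance)")
```

Even a correlation that is "highly significant" in a sample of 14,000 can leave almost all of the variance unexplained.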

6
2 main types of ES measures
  • Variance accounted for: a squared metric
    reflecting the percentage of variance in the
    dependent variable explained by the independent
    variable (e.g., squared correlations, odds
    ratios, kappa statistics)
  • Standardized difference: scales measurements
    across studies into a single metric referenced
    to some standard deviation
  • d is the most common and the easiest
    conceptually; it is our focus today

7
Effect size
  • The APA (2001) Publication Manual mandates:
    ". . . it is almost always necessary to include
    some index of effect size or strength of
    relationship . . . provide the reader not
    only with information about statistical
    significance but also with enough information to
    assess the magnitude of the observed effect
    or relationship" (pp. 25-26).

8
  • APA guidelines (2001) mandate inclusion of ES
    information (not just p-value information) in
    all published reports
  • Until that happy day, if ES information is
    missing, readers must estimate ES for themselves
  • When group means and SDs are reported, you often
    can estimate effect size quickly and decide
    whether to keep reading or not

9
Finding, estimating and interpreting d in group
comparison studies
  • d = difference between the means of the two
    groups, divided by the standard deviation (SD)
  • Interpret as the size of the group difference in
    SD units
  • When the mean difference between treatment (tx)
    and control groups is 0.8 to 1 SD, practical
    significance has been defined as high

10
Estimating d
  • Find group means, subtract them, and divide by
    the standard deviation.
  • When SDs for the groups are identical, hooray.
    When not, arguments have been made for using the
    control group SD, or the average of the two SDs.
  • My preference is the second (the average of the
    two SDs), which is more conservative and strikes
    me as more appropriate when dealing with the
    large variability we see in many groups of
    patients with disorders
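
The two denominator choices described above can be sketched in Python. This is a minimal illustration rather than a published routine: the function name `cohens_d`, its `denominator` argument, and the example scores are all hypothetical.

```python
def cohens_d(mean_tx, sd_tx, mean_control, sd_control, denominator="average"):
    """Standardized mean difference between a treatment and a control group.

    denominator="average" divides by the simple average of the two SDs;
    denominator="control" divides by the control group SD alone.
    """
    if denominator == "average":
        sd = (sd_tx + sd_control) / 2
    elif denominator == "control":
        sd = sd_control
    else:
        raise ValueError("denominator must be 'average' or 'control'")
    return (mean_tx - mean_control) / sd


# Hypothetical scores, chosen only to show how the choice of SD changes d.
d_average = cohens_d(10.0, 6.0, 8.0, 4.0, denominator="average")  # 2 / 5
d_control = cohens_d(10.0, 6.0, 8.0, 4.0, denominator="control")  # 2 / 4
print(f"average SD: d = {d_average:.2f}; control SD: d = {d_control:.2f}")
```

In this made-up case the treated group is the more variable one, so averaging the SDs yields the smaller (more conservative) d.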

11
Exercise 1: Calculating effect size, given group
means and SDs
  • Data from Arnold et al. (2004) study comparing
    scores on SNAP composite test after four types of
    treatment for ADHD
  • (On the SNAP composite, lower scores are better)
  • Treatment group: Mean (SD)
  • Combined: 0.92 (0.50)
  • Medical management: 0.95 (0.51)
  • Behavioral: 1.34 (0.56)
  • Community care: 1.40 (0.54)

12
d demonstration, comparing SNAP performance in
Combined and Medical Mgt groups
  • Combined: 0.92 (0.50)
  • Medical management: 0.95 (0.51)
  • d = (0.92 - 0.95)/0.505 = -0.03/0.505 = -0.0594
  • Interpretation: The Combined group scored about
    6/100ths of a standard deviation better (lower)
    than the Medical Mgt group (an extremely tiny
    difference; these treatment approaches resulted
    in virtually the same outcomes on the SNAP
    measure)

13
d for Combined vs Community Care treatment groups
  • Combined: 0.92 (0.50)
  • Community care: 1.40 (0.54)
  • d = (0.92 - 1.40)/0.52 = -0.48/0.52 = -0.92
  • Interpretation: The Combined group scored nearly
    a whole standard deviation better than the
    Community care group; this is a large effect
    size. Combined treatment is substantially better
    than Community care.

14
d for Medical Mgt vs Behavioral treatment
  • Medical management: 0.95 (0.51)
  • Behavioral: 1.34 (0.56)
  • d = ?
  • Interpretation?

15
d for Medical Mgt vs Behavioral treatment
  • Medical management: 0.95 (0.51)
  • Behavioral: 1.34 (0.56)
  • d = (0.95 - 1.34)/0.535 = -0.39/0.535 = -0.72897,
    or about -0.73
  • Interpretation: The Medical Mgt group scored
    about 3/4 of an SD better than the Behavioral
    group. This is a solid effect size, suggesting
    that Medical Mgt treatment was substantially more
    effective than Behavioral treatment.

16
Exercise 1: Interpreting d in the happy cases
when it's reported
  • Treatment-difference effect sizes (Cohen's d)
    from Arnold et al., 2004 (Table II, p. 45)
  • Combined vs Medical Management: 0.06
  • Combined vs Behavioral: 0.79
  • Combined vs Community Care: 0.92
  • Medical Management vs Behavioral: 0.72
  • Medical Mgt vs Community Care: 0.85
  • Behavioral vs Community Care: 0.11
  • Note that our calculated ds match these in
    magnitude, to within rounding.
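
The comparison between hand-calculated and reported effect sizes can be checked for all six pairs at once. The sketch below assumes the averaged-SD denominator used in the worked examples; the helper name `d_average_sd` is hypothetical, and the way Arnold et al. computed their own values is not stated in these slides.

```python
from itertools import combinations

# Group means and SDs on the SNAP composite, as listed in Exercise 1.
groups = {
    "Combined": (0.92, 0.50),
    "Medical Management": (0.95, 0.51),
    "Behavioral": (1.34, 0.56),
    "Community Care": (1.40, 0.54),
}

# Effect sizes as listed on this slide (magnitudes only).
reported = {
    ("Combined", "Medical Management"): 0.06,
    ("Combined", "Behavioral"): 0.79,
    ("Combined", "Community Care"): 0.92,
    ("Medical Management", "Behavioral"): 0.72,
    ("Medical Management", "Community Care"): 0.85,
    ("Behavioral", "Community Care"): 0.11,
}


def d_average_sd(group_a, group_b):
    """d for two named groups, dividing by the average of their SDs."""
    (mean_a, sd_a), (mean_b, sd_b) = groups[group_a], groups[group_b]
    return (mean_a - mean_b) / ((sd_a + sd_b) / 2)


computed = {pair: d_average_sd(*pair) for pair in combinations(groups, 2)}
for pair, d in computed.items():
    print(f"{pair[0]} vs {pair[1]}: "
          f"computed |d| = {abs(d):.2f}, reported = {reported[pair]:.2f}")
```

Every computed d is negative because the first-named group has the lower (better) mean; the reported values are magnitudes, and each computed magnitude falls within about 0.01 of the reported one.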

17
On to theme 3: an overview of evaluating precision
  • Precision is reflected by the width of the
    confidence interval (CI) surrounding a given
    finding
  • Any given finding is acknowledged to be an
    estimate of the real or true finding
  • CI reflects the range of values that includes the
    real finding with a known probability
  • A finding with a narrower CI is more precise (and
    thus more clinically useful) than a finding with
    a broader CI

18
Evaluating precision (cont.)
  • CIs are calculated by adding and subtracting a
    multiple of the standard error (SE) for a
    finding/value (e.g., value ± 1.96 × SE to
    determine the 95% CI)
  • Standard error depends on sample size and
    reliability; larger samples and higher
    reliability will result in narrower CIs, all else
    being equal
  • Sackett et al. (2000) Appendix 1 shows how to
    calculate CIs by hand, and easy-to-use
    statistical programs (many free on the web)
    provide CIs when raw data are available.
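
The "value ± 1.96 × SE" rule and the effect of sample size can be sketched in a few lines of Python. The function names and the mean, SD, and sample sizes below are hypothetical; the sketch covers only the standard error of a mean (SD divided by the square root of n) and leaves out the reliability component mentioned above.

```python
import math


def se_of_mean(sd, n):
    """Standard error of a sample mean: SD / sqrt(n)."""
    return sd / math.sqrt(n)


def ci95(value, se):
    """95% CI as value plus or minus 1.96 standard errors."""
    half_width = 1.96 * se
    return value - half_width, value + half_width


# Hypothetical test with mean 100 and SD 15, measured in two sample sizes.
mean, sd = 100.0, 15.0
small = ci95(mean, se_of_mean(sd, 25))    # SE = 3.00
large = ci95(mean, se_of_mean(sd, 400))   # SE = 0.75
print(f"n = 25:  {small[0]:.2f} to {small[1]:.2f}")
print(f"n = 400: {large[0]:.2f} to {large[1]:.2f}")
```

Multiplying the sample size by 16 cuts the width of the interval to a quarter, all else being equal.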

19
Finding and interpreting evidence of precision
  • CIs for difference between means of 206 children
    receiving early TTP and 196 receiving late TTP
    for OME (Paradise et al., 2001)
  • Measure: Early; Late; 95% CI
  • PPVT: 92 (13); 92 (15); -2.8 to 2.8
  • NDW: 124 (32); 126 (30); -7.6 to 4.8
  • PCC-R: 85 (7); 86 (7); -2.1 to 0.7
  • CIs are narrow thanks to the large sample
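
A rough check of the PPVT interval is possible from the numbers on this slide. The sketch below assumes that the tabled values are group means with SDs in parentheses and uses a large-sample approximation (1.96 standard errors of the difference). The function name is hypothetical, and the original authors' exact method is not given here, so only approximate agreement should be expected.

```python
import math


def ci95_mean_difference(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Approximate 95% CI for the difference between two independent means."""
    diff = mean_a - mean_b
    se_diff = math.sqrt(sd_a ** 2 / n_a + sd_b ** 2 / n_b)
    return diff - 1.96 * se_diff, diff + 1.96 * se_diff


# PPVT row: early group 92 (13), n = 206; late group 92 (15), n = 196.
ppvt_low, ppvt_high = ci95_mean_difference(92, 13, 206, 92, 15, 196)
print(f"PPVT difference, approximate 95% CI: {ppvt_low:.2f} to {ppvt_high:.2f}")
```

The result is about -2.75 to 2.75, close to the -2.8 to 2.8 shown in the table; the other rows agree less exactly, as expected when working from rounded means and SDs.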

20
  • Contrast with risk estimates for low PCC-R from
    smaller samples of children with (n = 15) and
    without (n = 47) OME-associated hearing loss
    (Shriberg et al., 2000)
  • Estimated risk was 9.60 (i.e., children with
    hearing loss were 9.6 times more likely to have
    low PCC-R at age 3 than children without)
  • But the 95% confidence interval was 1.08-85.58,
    meaning that this increased risk was somewhere
    between almost none and a lot. Not very precise!
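
The lopsided look of this interval (9.60 sits much closer to 1.08 than to 85.58) is typical of ratio measures, whose CIs are commonly built on the log scale. The Python sketch below assumes that is how this interval was constructed, which the slide does not state, and works backward from the two bounds; the variable names are illustrative.

```python
import math

# 95% CI bounds as given on the slide.
lower, upper = 1.08, 85.58

# On the log scale a symmetric interval is centered on the estimate, so the
# midpoint of the logged bounds should recover the point estimate.
implied_estimate = math.exp((math.log(lower) + math.log(upper)) / 2)

# Half the log-scale width, divided by 1.96, gives the implied standard error.
implied_se_log = (math.log(upper) - math.log(lower)) / (2 * 1.96)

print(f"implied estimate: {implied_estimate:.2f}")
print(f"implied SE on the log scale: {implied_se_log:.2f}")
```

The implied estimate comes out near 9.6, consistent with the reported value, and the large log-scale standard error reflects the small samples.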

21
Predict precision
  • In one study, children with histories of OME
    (n = 10) had significantly lower scores on a
    competitive listening task than children without
    OME histories (n = 13)
  • OME+: -6.8 (2.8); OME-: -9.7 (2.6); p = .016
  • How could you quantify importance?
  • What would you predict about precision?
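
One possible way to approach both questions is to compute d and put an approximate confidence interval around it. The sketch below uses the averaged-SD denominator and a common large-sample approximation for the standard error of d. It assumes the tabled values are means with SDs in parentheses; the function name is hypothetical, and with groups this small the interval is only a rough guide.

```python
import math


def d_with_ci95(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Return d (averaged-SD denominator) with an approximate 95% CI."""
    d = (mean_a - mean_b) / ((sd_a + sd_b) / 2)
    # Large-sample approximation to the standard error of d.
    se_d = math.sqrt((n_a + n_b) / (n_a * n_b) + d ** 2 / (2 * (n_a + n_b)))
    return d, d - 1.96 * se_d, d + 1.96 * se_d


# Group with OME histories (n = 10) vs group without (n = 13).
d, low, high = d_with_ci95(-6.8, 2.8, 10, -9.7, 2.6, 13)
print(f"d = {d:.2f}, approximate 95% CI {low:.2f} to {high:.2f}")
```

The effect is large in magnitude (d is a little over 1), but the interval is wide, running from roughly 0.2 to nearly 2. That is the imprecision one would predict from samples of 10 and 13.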

22
When multiple studies of a question are
available: meta-analysis
  • Quantitative summary of effects across a number
    of studies addressing a particular question,
    usually in the form of a d (effect size)
    statistic
  • In EBP evidence reviews, the highest quality
    evidence comes from meta-analysis of studies with
    strong validity, precision, and importance
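
To make "quantitative summary of effects" concrete, here is a minimal sketch of one common pooling scheme: a fixed-effect, inverse-variance weighted average of d. It is not necessarily the method used in any study cited in this presentation, and the three studies, their d values, and their standard errors are entirely hypothetical.

```python
import math


def pooled_d_fixed_effect(effects, standard_errors):
    """Inverse-variance weighted mean of effect sizes and its standard error."""
    weights = [1 / se ** 2 for se in standard_errors]
    pooled = sum(w * d for w, d in zip(weights, effects)) / sum(weights)
    pooled_se = math.sqrt(1 / sum(weights))
    return pooled, pooled_se


# Hypothetical studies: more precise studies (smaller SE) get more weight.
study_d = [0.2, 0.5, 0.8]
study_se = [0.1, 0.2, 0.4]
pooled, pooled_se = pooled_d_fixed_effect(study_d, study_se)
print(f"pooled d = {pooled:.2f} (SE = {pooled_se:.3f})")
```

The pooled value lands closest to the most precise study, and its standard error is smaller than that of any single study, which is why a meta-analysis can be more precise than its parts.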

23
Evidence levels for evaluating quality of
treatment studies
  • Ia (best): Meta-analysis of >1 randomized
    controlled trial (RCT)
  • Ib: Well-designed randomized controlled study
  • IIa: Well-designed controlled study without
    randomization
  • IIb: Well-designed quasi-experimental study
  • III: Well-designed non-experimental studies,
    i.e., comparative, correlational, and case
    studies
  • IV (worst): Expert committee report, consensus
    conference, clinical experience of respected
    authorities

24
A meta-analysis of OME and speech and language
(Casby, 2001)
  • Casby (2001) summarized results of available
    studies of OME and children's language
  • For global language abilities, the effect size
    for comparing mean language scores from children
    with and without OME histories was d = -.07.
  • Interpretation and a graphic representation

26
A more informative graphic for meta-analyses
  • Shows d from each study as well as the
    associated 95% CI.

27
d and 95% CI boundaries for OME and vocabulary
comprehension (Casby, 2001)
  • [Figure: d with upper and lower 95% CI bounds
    plotted for each study (Paradise 00, Black 93,
    Lonigan 92, Roberts 91m, Roberts 91l, Teele 90,
    Lous 88, Teele 84) on an axis running from -2.5
    to 2.5, with the two directions labeled "Better
    with OME" and "Worse with OME"]
  • Overall d = .001

28
The need for meta-analyses in communication
disorders
  • Relatively few have been conducted, primarily
    because many studies in our literature either
    have not been conducted using procedures that
    would warrant their inclusion, or may have been
    conducted carefully but have not reported the
    information required
  • CONSORT (www.consort-statement.org) and STARD
    (Bossuyt et al., 2003) statements as one solution

29
Given that few meta-analyses are available
  • Rapidly identify best available evidence
    addressing the foreground question
  • Appraise it critically with respect to validity,
    precision, and importance
  • Use CAT format to summarize your appraisal in an
    organized, readily accessible (and update-able)
    way