Revising FDA's Statistical Guidance on Reporting Results from Studies Evaluating Diagnostic Tests
(FDA/Industry Statistics Workshop, September 28-29, 2006)

1
Revising FDA's Statistical Guidance on Reporting
Results from Studies Evaluating Diagnostic Tests
  • FDA/Industry Statistics Workshop
  • September 28-29, 2006
  • Kristen Meier, Ph.D.
  • Mathematical Statistician, Division of
    Biostatistics
  • Office of Surveillance and Biometrics
  • Center for Devices and Radiological Health, FDA

2
Outline
  • Background of guidance development
  • Overview of comments
  • STARD Initiative and definitions
  • Choice of comparative benchmark and implications
  • Agreement measures and their pitfalls
  • Bias
  • Estimating performance without a perfect
    reference standard - latest research
  • Reporting recommendations

3
Background
  • Motivated by CDC concerns with IVDs for sexually
    transmitted diseases
  • Joint meeting of four FDA device panels
    (2/11/98): Hematology/Pathology, Clinical
    Chemistry/Toxicology, Microbiology, and Immunology
  • Charge: provide recommendations on appropriate data
    collection, analysis, and resolution of
    discrepant results, using sound scientific and
    statistical analysis, to support indications for
    use of in vitro diagnostic devices when the new
    device is compared to another device, a
    recognized reference method or "gold standard",
    other procedures not commonly used, and/or
    clinical criteria for diagnosis

4
Statistical Guidance Developed
  • "Statistical Guidance on Reporting Results from
    Studies Evaluating Diagnostic Tests; Draft
    Guidance for Industry and FDA Reviewers"
  • issued Mar. 12, 2003 with a 90-day comment
    period
  • http://www.fda.gov/cdrh/osb/guidance/1428.html
  • for all diagnostic products, not just in vitro
    diagnostics
  • addresses only diagnostic devices with 2 possible
    outcomes (positive/negative)
  • does not address design and monitoring of
    clinical studies for diagnostic devices

5
Dichotomous Diagnostic Test Performance
  • Study Population cross-classified against TRUTH:

                     Truth+   Truth-
      New Test+      TP       FP
      New Test-      FN       TN

    (TP/TN = true positives/negatives; FP/FN = false positives/negatives)
  • estimate (see the sketch below):
  • sensitivity (sens) = Pr(Test+ | Truth+), estimated by 100 x TP/(TP + FN)
  • specificity (spec) = Pr(Test- | Truth-), estimated by 100 x TN/(FP + TN)
  • Perfect test: sens = spec = 100 (FP = FN = 0)
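
A worked sketch (not part of the original presentation) of the formulas above in Python; the function name sens_spec is invented for illustration:

```python
def sens_spec(tp, fp, fn, tn):
    """Sensitivity and specificity (in percent) from a 2x2 table that
    cross-classifies a new test against the benchmark (truth)."""
    sens = 100.0 * tp / (tp + fn)   # Pr(Test+ | Truth+)
    spec = 100.0 * tn / (fp + tn)   # Pr(Test- | Truth-)
    return sens, spec
```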

6
Example Data 220 Subjects
  • vs. TRUTH (unbiased estimates):

                     Truth+   Truth-
      New Test+      44       1
      New Test-      7        168
      total          51       169

    Sens = 86.3 (44/51), Spec = 99.4 (168/169)
  • vs. Imperfect Standard (biased estimates):

                     Std+   Std-
      New Test+      40     5
      New Test-      4      171
      total          44     176

    Sens = 90.9 (40/44), Spec = 97.2 (171/176)
  • Misclassification bias (see Begg 1987); both sets of estimates are
    reproduced in the sketch below
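
Using the hypothetical sens_spec sketch from slide 5, both sets of estimates can be reproduced:

```python
# Truth as benchmark: unbiased estimates
print(sens_spec(tp=44, fp=1, fn=7, tn=168))   # approx. (86.3, 99.4)
# Imperfect standard as benchmark: biased estimates
print(sens_spec(tp=40, fp=5, fn=4, tn=171))   # approx. (90.9, 97.2)
```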

7
Recalculation of Performance Using Discrepant
Resolution
  • STAGE 1: retest discordant results using a resolver test
  • STAGE 2: revise the 2x2 table based on the resolver result
  • vs. Imperfect Standard (resolver calls on discordant cells in
    parentheses):

                     Std+             Std-
      New Test+      40               5 (5 +, 0 -)
      New Test-      4 (1 +, 3 -)     171
      total          44               176

  • vs. resolver/imperfect standard:

                     +      -
      New Test+      45     0
      New Test-      1      174
      total          46     174

  • sens: 90.9 (40/44) → 97.8 (45/46)
  • spec: 97.2 (171/176) → 100 (174/174)
  • assumes concordant = correct (see the sketch below)
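
A sketch of the recalculation, again reusing the hypothetical sens_spec function; it makes the source of the bias explicit: only discordant cells are retested, and concordant cells are assumed correct:

```python
a, b, c, d = 40, 5, 4, 171    # new test vs. imperfect standard
b_pos, b_neg = 5, 0           # resolver calls on the 5 (New+, Std-) subjects
c_pos, c_neg = 1, 3           # resolver calls on the 4 (New-, Std+) subjects

# Revised 2x2 vs. "resolver/imperfect standard": discordant subjects are
# reassigned by the resolver; concordant subjects are never re-examined.
rev = dict(tp=a + b_pos, fp=b_neg, fn=c_pos, tn=d + c_neg)
print(sens_spec(**rev))       # approx. (97.8, 100.0), up from (90.9, 97.2)
```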

8
Topics for Guidance
  • Realization:
  • problems are much larger than discrepant
    resolution
  • the 2x2 table is an oversimplification, but still a
    useful starting point
  • Provide guidance:
  • What constitutes truth?
  • What to do if we don't know truth?
  • What name do we give performance measures when we
    don't have truth?
  • Describing study design: how were subjects,
    specimens, measurements, and labs collected/chosen?

9
Comments on Guidance
  • FDA received comments from 11 individuals/organizations
  • provide guidance on what constitutes a perfect
    standard
  • remove the perfect/imperfect standard concept and
    include and define the reference/non-reference
    standard concept (STARD)
    → reference and use STARD concepts
  • provide an approach for indeterminate, inconclusive,
    equivocal, etc. results
    → minimal recommendations
  • discuss methods for estimating sens and spec when
    a perfect reference standard is not used
    → cite new literature
  • include more discussion on bias, including
    verification bias
    → some discussion added, more references added
  • add glossary

10
STARD Initiative
  • STAndards for Reporting of Diagnostic Accuracy
    Initiative
  • effort by international working group to improve
    quality of reporting of studies of diagnostic
    accuracy
  • checklist of 25 items to include when reporting
    results
  • provide definitions for terminology
  • http://www.consort-statement.org/stardstatement.htm

11
STARD Definitions Adopted
  • Purpose of a qualitative diagnostic test is to
    determine whether a target condition is present
    or absent in a subject from the intended use
    population
  • Target condition (condition of interest): can
    refer to a particular disease, a disease stage,
    health status, or any other identifiable
    condition within a patient, such as staging a
    disease already known to be present, or a health
    condition that should prompt clinical action,
    such as the initiation, modification, or
    termination of treatment
  • Intended use population (target population):
    those subjects/patients for whom the test is
    intended to be used

12
Reference Standard (STARD)
  • Move away from notion of a fixed, theoretical
    Truth
  • considered to be the best available method for
    establishing the presence or absence of the
    target condition; it can be a single test or
    method, or a combination of methods and
    techniques, including clinical follow-up
  • dichotomous - divides the intended use population
    into condition present or absent
  • does not consider outcome of new test under
    evaluation

13
Reference Standard (FDA)
  • What constitutes the "best available
    method"/reference method?
  • opinion and practice within the medical,
    laboratory, and regulatory community
  • several possible methods could be considered
  • maybe no consensus reference standard exists
  • maybe a reference standard exists, but for a
    non-negligible part of the intended use population
    it is known to be in error
  • FDA ADVICE:
  • consult with FDA on the choice of reference standard
    before beginning your study
  • performance measures must be interpreted in
    context: report the reference standard along with the
    performance measures

14
Benchmarks for Assessing Diagnostic Performance
  • NEW: FDA recognizes 2 major categories of
    benchmarks
  • reference standard (STARD)
  • non-reference standard (a method or predicate
    other than a reference standard; 510(k)
    regulations)
  • OLD: "perfect standard" and "imperfect standard";
    "gold standard" concepts and terms deleted
  • Choice of comparative method determines which
    performance measures can be reported

15
Comparison with Benchmark
  • If a reference standard is available: use it
  • If a reference standard is available but
    impractical: use it to the extent possible
  • If a reference standard is not available or is
    unacceptable for your situation: consider
    constructing one
  • If a reference standard is not available and
    cannot be constructed: use a non-reference
    standard and report agreement

16
Naming Performance Measures Depends on
Benchmarks
  • Terminology is important: it helps ensure correct
    interpretation
  • Reference standard (STARD):
  • a lot of literature on studies of diagnostic
    accuracy (Pepe 2003, Zhou et al. 2002)
  • report sensitivity, specificity (and
    corresponding CIs), and predictive values of positive
    and negative results
  • Non-reference standard (due to 510(k)
    regulations):
  • report positive percent agreement and negative
    percent agreement
  • NEW: include corresponding CIs (consider score
    CIs; a sketch of one follows below)
  • interpret with care: many pitfalls!
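
The slide does not say which score interval to use; the Wilson score interval below is one common choice, sketched in Python as an assumption rather than a prescription from the guidance:

```python
import math

def score_ci(x, n, z=1.96):
    """Wilson score confidence interval for a binomial proportion x/n
    (one common 'score CI'; a sketch, not an FDA-prescribed method)."""
    p = x / n
    denom = 1.0 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = (z / denom) * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n))
    return center - half, center + half
```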

17
Agreement
  • Study Population cross-classified against a
    Non-Reference Standard:

                     Non-Ref Std+   Non-Ref Std-
      New Test+      a              b
      New Test-      c              d

  • Positive percent agreement (new vs. non-ref. std.) = 100 x a/(a + c)
  • Negative percent agreement (new vs. non-ref. std.) = 100 x d/(b + d)
  • Overall percent agreement = 100 x (a + d)/(a + b + c + d)
    (see the sketch below)
  • A perfect new test need not give PPA = 100 and NPA = 100
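
A minimal sketch of these agreement measures, reusing the hypothetical score_ci function from slide 16 to attach score CIs to PPA and NPA:

```python
def agreement(a, b, c, d):
    """Percent agreement measures for a new test (rows) vs. a
    non-reference standard (columns), per the slide's definitions."""
    ppa = 100.0 * a / (a + c)                  # positive percent agreement
    npa = 100.0 * d / (b + d)                  # negative percent agreement
    overall = 100.0 * (a + d) / (a + b + c + d)
    ppa_ci = tuple(100 * v for v in score_ci(a, a + c))
    npa_ci = tuple(100 * v for v in score_ci(d, b + d))
    return ppa, npa, overall, ppa_ci, npa_ci
```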

18
Pitfalls of Agreement
  • agreement as defined here is not symmetric:
    the calculation differs depending on which
    marginal total you use for the denominator
  • overall percent agreement is symmetric, but can
    be misleading (very different 2x2 data can give
    the same overall agreement)
  • agreement ≠ correct
  • overall agreement, PPA, and NPA can change
    (possibly a lot) depending on the prevalence
    (the relative frequency of the target condition in the
    intended use population), as the sketch below illustrates
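
A numerical illustration of the prevalence pitfall, with invented sensitivity/specificity values and the simplifying assumption that the new test and the non-reference standard err independently given the target condition:

```python
def expected_agreement(prev, se_new, sp_new, se_std, sp_std):
    """Expected PPA/NPA when a new test and a non-reference standard are
    conditionally independent given the condition (illustrative model)."""
    # Expected 2x2 cell probabilities: rows = new test, cols = non-ref std.
    a = prev * se_new * se_std + (1 - prev) * (1 - sp_new) * (1 - sp_std)
    b = prev * se_new * (1 - se_std) + (1 - prev) * (1 - sp_new) * sp_std
    c = prev * (1 - se_new) * se_std + (1 - prev) * sp_new * (1 - sp_std)
    d = prev * (1 - se_new) * (1 - se_std) + (1 - prev) * sp_new * sp_std
    return 100 * a / (a + c), 100 * d / (b + d)

# Same two tests at two prevalences:
# approx. PPA 31.3, NPA 94.3 at prev = 0.05; PPA 81.1, NPA 82.9 at prev = 0.50
for prev in (0.05, 0.50):
    print(prev, expected_agreement(prev, 0.90, 0.95, 0.85, 0.90))
```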

19
Overall Agreement Misleading
  • Data set 1 (vs. Non-Ref Standard):

                     Std+   Std-
      New Test+      40     1
      New Test-      19     512
      total          59     513

    PPA = 67.8 (40/59), NPA = 99.8 (512/513)
  • Data set 2 (vs. Non-Ref Standard):

                     Std+   Std-
      New Test+      40     19
      New Test-      1      512
      total          41     531

    PPA = 97.6 (40/41), NPA = 96.4 (512/531)
  • overall percent agreement = 96.5 ((40 + 512)/572) for both data sets,
    even though PPA and NPA differ sharply (both are run through the
    agreement sketch below)
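
Running both data sets through the hypothetical agreement sketch from slide 17 reproduces the numbers above:

```python
print(agreement(a=40, b=1,  c=19, d=512))  # PPA 67.8, NPA 99.8, overall 96.5
print(agreement(a=40, b=19, c=1,  d=512))  # PPA 97.6, NPA 96.4, overall 96.5
```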

20
Agreement ≠ Correct
  • Original data vs. Non-Reference Standard:

                     Non-Ref+   Non-Ref-
      New Test+      40         5
      New Test-      4          171

  • Stratify the data above by Reference Standard
    outcome:
  • Among Reference Standard +:

                     Non-Ref+   Non-Ref-
      New Test+      39         5
      New Test-      1          6

  • Among Reference Standard -:

                     Non-Ref+   Non-Ref-
      New Test+      1          0
      New Test-      3          165

  • the tests agree with each other and are both wrong for 6 + 1 = 7
    subjects (the 6 concordant negatives among Reference Standard +,
    plus the 1 concordant positive among Reference Standard -)

21
Bias
  • Unknown and non-quantified uncertainty
  • Often existence, size (magnitude), and direction
    of bias cannot be determined
  • Increasing overall number of subjects reduces
    statistical uncertainty (confidence interval
    widths) but may do nothing to reduce bias

22
Some Types of Bias
  • error in the reference standard
  • using the test under evaluation to establish the diagnosis
  • spectrum bias: the "right" subjects are not
    chosen
  • verification bias: only a non-representative
    subset of subjects is evaluated by the reference
    standard, and no statistical adjustments are made to the
    estimates
  • many other types of bias
  • See Begg (1987), Pepe (2003), Zhou et al. (2002)

23
Estimating Sens and Spec Without a Reference
Standard
  • Model-based approaches: latent class models and
    Bayesian models; see Pepe (2003) and Zhou et
    al. (2002)
  • Albert and Dodd (2004):
  • an incorrect model leads to biased sens and spec
    estimates
  • different models can fit the data equally well, yet
    produce very different estimates of sens and spec
  • FDA concerns and recommendations:
  • it is difficult to verify that the model and assumptions
    are correct
  • try a range of models and assumptions and report the
    range of results (a minimal latent class sketch follows below)
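
The slides cite the latent class literature without an implementation; below is a minimal EM sketch for a two-class latent class model with three or more binary tests, under the strong conditional independence assumption that Albert and Dodd (2004) caution about. All names and defaults are invented for illustration:

```python
import numpy as np

def latent_class_em(y, n_iter=500):
    """EM for a two-class latent class model. y: (N, J) 0/1 array of J
    binary tests assumed conditionally independent given the condition.
    Returns (prevalence, sensitivities, specificities). Sketch only:
    no convergence check, and results depend on starting values."""
    n_subj, n_test = y.shape
    prev = 0.5
    se = np.full(n_test, 0.8)   # starting above 0.5 avoids label switching
    sp = np.full(n_test, 0.8)
    for _ in range(n_iter):
        # E-step: posterior probability that each subject has the condition
        f_pos = prev * np.prod(se ** y * (1 - se) ** (1 - y), axis=1)
        f_neg = (1 - prev) * np.prod((1 - sp) ** y * sp ** (1 - y), axis=1)
        w = f_pos / (f_pos + f_neg)
        # M-step: weighted re-estimates of prevalence, sens, spec
        prev = w.mean()
        se = (w[:, None] * y).sum(axis=0) / w.sum()
        sp = ((1 - w)[:, None] * (1 - y)).sum(axis=0) / (1 - w).sum()
    return prev, se, sp
```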

24
Reference Standard Outcomes on a Subset
  • Albert and Dodd (2006, under review):
  • use information from verified and non-verified subjects
  • choosing between competing models is easier
  • explore the choice of subset (random, test-dependent)
  • Albert (2006, under review):
  • estimation via imputation
  • study design implications (Albert, 2006)
  • Kondratovich (2003; 2002-Mar-8 FDA Microbiology
    Devices Panel Meeting):
  • estimation via imputation (a sketch of one classical
    correction follows below)
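
The cited reports are not reproduced here; as a rough illustration of correcting for partial verification, below is a sketch in the style of the classical Begg and Greenes (1983) correction, which assumes verification depends only on the new test's result. This is an illustrative stand-in, not necessarily the method used in the cited work:

```python
def corrected_sens_spec(n_pos, n_neg, v_pos_d, v_pos_nd, v_neg_d, v_neg_nd):
    """Verification-bias-corrected sens/spec (percent) when only a subset
    is verified by the reference standard and verification depends only on
    the test result (a strong assumption). n_pos/n_neg: all subjects by
    test result; v_{pos,neg}_{d,nd}: verified subjects by test result and
    disease status."""
    p_d_pos = v_pos_d / (v_pos_d + v_pos_nd)   # P(D+ | T+) among verified
    p_d_neg = v_neg_d / (v_neg_d + v_neg_nd)   # P(D+ | T-) among verified
    p_pos = n_pos / (n_pos + n_neg)            # P(T+) from all subjects
    sens = p_d_pos * p_pos / (p_d_pos * p_pos + p_d_neg * (1 - p_pos))
    spec = ((1 - p_d_neg) * (1 - p_pos)
            / ((1 - p_d_pos) * p_pos + (1 - p_d_neg) * (1 - p_pos)))
    return 100 * sens, 100 * spec
```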

25
Practices to Avoid
  • using the terms "sensitivity" and "specificity" if a
    reference standard is not used
  • discarding equivocal results in data
    presentations and calculations
  • using data altered or updated by discrepant
    resolution
  • using the new test as part of the comparative
    benchmark

26
External validity
  • A study has high external validity if the study
    results are sufficiently reflective of the
    real-world performance of the device in the intended
    use population

27
External validity
  • FDA recommends:
  • include appropriate subjects and/or specimens
  • use final version of the device according to the
    final instructions for use
  • use several of these devices in your study
  • include multiple users with relevant training and
    range of expertise
  • cover a range of expected use and operating
    conditions

28
Reporting Recommendations
  • CRITICAL: sufficient detail is needed to be able to
    assess potential bias and external validity
  • just as important as computing CIs correctly,
    if not more so
  • see the guidance for specific recommendations

29
  • References
  • Albert, P. S. (2006). Imputation approaches for
    estimating diagnostic accuracy for multiple tests
    from partially verified designs. Technical
    Report 042, Biometric Research Branch, Division
    of Cancer Treatment and Diagnosis, National
    Cancer Institute (http://linus.nci.nih.gov/brb/TechReport.htm).
  • Albert, P. S. and Dodd, L. E. (2004). A cautionary
    note on the robustness of latent class models for
    estimating diagnostic error without a gold
    standard. Biometrics, 60, 427-435.
  • Albert, P. S. and Dodd, L. E. (2006). On
    estimating diagnostic accuracy with multiple
    raters and partial gold standard evaluation.
    Technical Report 041, Biometric Research Branch,
    Division of Cancer Treatment and Diagnosis,
    National Cancer Institute (http://linus.nci.nih.gov/brb/TechReport.htm).
  • Begg, C.B. (1987). Biases in the assessment of
    diagnostic tests. Statistics in Medicine, 6,
    411-423.
  • Bossuyt, P.M., Reitsma, J.B., Bruns, D.E.,
    Gatsonis, C.A., Glasziou, P.P., Irwig, L.M.,
    Lijmer, J.G., Moher, D., Rennie, D., and de Vet,
    H.C.W. (2003). Towards complete and accurate
    reporting of studies of diagnostic accuracy: the
    STARD initiative. Clinical Chemistry, 49(1), 1-6.
    (Also appears in Annals of Internal Medicine
    (2003), 138(1), W1-12, and in British Medical
    Journal (2003), 326(7379), 41-44.)

30
  • References (continued)
  • Bossuyt, P.M., Reitsma, J.B., Bruns, D.E.,
    Gatsonis, C.A., Glasziou, P.P., Irwig, L.M.,
    Moher, D., Rennie, D., de Vet, H.C.W., and Lijmer,
    J.G. (2003). The STARD statement for reporting
    studies of diagnostic accuracy: explanation and
    elaboration. Clinical Chemistry, 49(1), 7-18.
    (Also appears in Annals of Internal Medicine
    (2003), 138(1), W1-12, and in British Medical
    Journal (2003), 326(7379), 41-44.)
  • Lang, T. A. and Secic, M. (1997). How to
    Report Statistics in Medicine. Philadelphia:
    American College of Physicians.
  • Kondratovich, Marina (2003). Verification bias in
    the evaluation of diagnostic devices.
    Proceedings of the 2003 Joint Statistical
    Meetings, Biopharmaceutical Section, San
    Francisco, CA.
  • Pepe, M. S. (2003). The Statistical Evaluation of
    Medical Tests for Classification and Prediction.
    New York: Oxford University Press.
  • Zhou, X. H., Obuchowski, N. A., and McClish, D. K.
    (2002). Statistical Methods in Diagnostic
    Medicine. New York: John Wiley & Sons.
