Linking Registry Data: Technical and Legal Considerations

Provided by: ahrq

1
Linking Registry Data: Technical and Legal
Considerations
  • Sara Rosenbaum, JD
  • George Washington University
  • Alan F. Karr, PhD
  • National Institute of Statistical Sciences

2
Authors and Reviewers
  • Authors
  • Stephen E. Fienberg (lead), Carnegie Mellon
    University
  • Sara Rosenbaum (lead), George Washington
    University
  • Susan Adams (lead), Dartmouth College
  • Alan F. Karr, National Institute of Statistical
    Sciences
  • Bradley Malin, Vanderbilt University
  • Deven McGraw, The Center for Democracy and
    Technology
  • Maya A. Bernstein, Office of the Assistant
    Secretary for Planning and Evaluation, DHHS
  • Melissa M. Goldstein, George Washington
    University
  • Joy Pritts, Georgetown University
  • Reviewers
  • Julia Lane, National Science Foundation
  • Eric Peterson, Duke University Medical Center
  • Victoria Prescott, McBroom Consulting, LLC
  • Gerald Riley, Centers for Medicare & Medicaid
    Services
  • Marcy Wilder, Hogan & Hartson

3
Purpose
  • Increasingly, statistical methods are used to
    link data from multiple de-identified sources.
  • What is the risk of identifying patients by
    combining data from multiple registries?
  • What are the legal and ethical requirements on
    researchers to ensure patient privacy and
    confidentiality?

4
Paper Overview
  • INTRODUCTION
  • TECHNICAL ASPECTS OF DATA LINKAGE PROJECTS
  • - Linking records for research and improving
    public health
  • - What do Privacy, Disclosure, and
    Confidentiality mean?
  • - Linking records and probabilistic matching
  • - Procedural issues in linking datasets
  • LEGAL ASPECTS OF DATA LINKAGE PROJECTS
  • - Risks of identification
  • - The HIPAA Privacy Rule

5
Paper Overview (cont.)
  • D. RISK MITIGATION FOR DATA LINKAGE PROJECTS
  • - Methodology for mitigating the risk of
    re-identification
  • - Security practices, standards, and
    technologies
  • SUMMARY
  • SCENARIOS
  • I. Linking Clinical Registry Data with
    Insurance Claims Files
  • II. Planning for Data Linkage Projects

6
High-Level View: Linking records for research
and improving public health
  • The scientific value of a registry increases with
    the number of cases and the extent of the health
    information included.
  • There is an ethical obligation to protect patient
    interests when collecting, sharing, and studying
    person-specific biomedical information.
  • Thus, a tension exists between the broad goals of
    registries and regulations protecting
    individually identifiable information.
  • A large body of federal law applies to health
    information privacy.

7
Key Terms
  • Privacy: protection of people against unauthorized
    uses of personally identifiable information (PII),
    specifically protected health information (PHI)
  • Disclosure: attribution of information to the
    source of the data
  • Confidentiality: protection accorded to
    statistical data

8
Technical Aspects: Privacy, Disclosure and
Confidentiality
  • Privacy
  • As used in the HIPAA Privacy Rule, the term
    applies to protected health information (PHI).

9
Technical Aspects: Privacy, Disclosure and
Confidentiality (cont.)
  • Disclosure
  • Technical: the attribution of information to the
    source of the data.
  • Identity disclosure occurs when the data source
    becomes known from the data release itself.
  • Attribute disclosure occurs when the released
    data make it possible to infer the
    characteristics of an individual data source more
    accurately than would have otherwise been
    possible.
  • Inferential disclosure relates to the probability
    of identifying a particular attribute of a data
    source.
  • HIPAA: the release, transfer, provision of,
    access to, or divulging in any other manner of
    information outside of the entity holding the
    information.

10
Technical Aspects: Privacy, Disclosure and
Confidentiality (cont.)
  • Confidentiality
  • A quality or condition of protection accorded to
    statistical information as an obligation not to
    permit the transfer of that information to an
    unauthorized party.
  • A different notion of confidentiality relates to
    the ethical, legal, and professional obligation
    of those who receive information in the context
    of a clinical relationship to respect the privacy
    interests of their patients.

11
Technical Aspects: Linking records and
probabilistic matching
  • Techniques for record linkage
  • Unique identifiers
  • AI-like rule
  • Probabilistic approaches
  • The probabilistic approach is built on five key
    components:
  • 1. Define features that describe similarity
    between records.
  • 2. Place feature vectors into three classes:
    matches (M), non-matches (U), and possible
    matches (P).
  • 3. Perform record-pair classification by
    calculating the ratio P(Y | M) / P(Y | U) for
    each pair, where Y is a feature vector for the
    pair and P(Y | M) and P(Y | U) are the
    probabilities of observing that feature vector
    for a matched and a non-matched pair,
    respectively.
  • 4. Where no duplicate and/or non-duplicate record
    pairs are available, estimate conditional
    probabilities by using observed frequencies in
    the records to be linked.
  • 5. Blocking, or partitioning the databases based
    on some variable in both databases, improves
    efficiency.
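
The components above can be sketched in a few lines of Python. This is a minimal illustration, not the method used in the white paper: the feature names, the agreement probabilities (M_PROB, U_PROB), the thresholds, and the blocking variable zip3 are all hypothetical, and the features are assumed to be conditionally independent so that the ratio P(Y | M) / P(Y | U) factors into per-feature terms.

```python
import math
from collections import defaultdict

# Hypothetical per-feature agreement probabilities (assumptions):
# M_PROB[f] = P(feature f agrees | pair is a match)
# U_PROB[f] = P(feature f agrees | pair is a non-match)
M_PROB = {"surname": 0.95, "birth_year": 0.98, "sex": 0.99}
U_PROB = {"surname": 0.01, "birth_year": 0.02, "sex": 0.50}

# Hypothetical thresholds on the log2 likelihood ratio.
UPPER, LOWER = 6.0, 0.0


def compare(a, b):
    """Feature vector Y: 1 where the two records agree, 0 otherwise."""
    return {f: int(a[f] == b[f]) for f in M_PROB}


def log_ratio(y):
    """log2 of P(Y | M) / P(Y | U), assuming independent features."""
    total = 0.0
    for f, agrees in y.items():
        m, u = M_PROB[f], U_PROB[f]
        total += math.log2(m / u) if agrees else math.log2((1 - m) / (1 - u))
    return total


def classify(score):
    """Assign a pair to matches (M), non-matches (U), or possible matches (P)."""
    if score >= UPPER:
        return "M"
    if score <= LOWER:
        return "U"
    return "P"


def link(left, right, block_key):
    """Blocking: only compare pairs that share the value of block_key."""
    blocks = defaultdict(list)
    for rec in right:
        blocks[rec[block_key]].append(rec)
    results = []
    for a in left:
        for b in blocks.get(a[block_key], []):
            score = log_ratio(compare(a, b))
            results.append((a["id"], b["id"], classify(score)))
    return results


registry = [
    {"id": "r1", "surname": "lee", "birth_year": 1950, "sex": "F", "zip3": "021"},
    {"id": "r2", "surname": "khan", "birth_year": 1962, "sex": "M", "zip3": "100"},
]
claims = [
    {"id": "c1", "surname": "lee", "birth_year": 1950, "sex": "F", "zip3": "021"},
    {"id": "c2", "surname": "li", "birth_year": 1950, "sex": "F", "zip3": "021"},
    {"id": "c3", "surname": "khan", "birth_year": 1962, "sex": "M", "zip3": "945"},
]
results = link(registry, claims, "zip3")
```

Note that blocking trades accuracy for efficiency: r2 and c3 agree on every comparison feature but are never compared, because their blocking values differ.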

12
Technical Aspects: Procedural issues in linking
data sets
  • Neither "data" nor "link" can be defined
    unambiguously, and the relationship between
    datasets can vary.
  • Linking horizontally partitioned datasets carries
    little risk of re-identification, because in most
    cases there is no more information about a record
    in the combined dataset than was present in the
    individual datasets.
  • For vertically partitioned datasets, it is
    necessary to link individual subjects' records
    that are contained in two or more datasets. This
    process is risky because the combined dataset
    contains more information about each subject than
    either of the components.
  • Preferred approach: methods based on cryptography
    (complex and may involve a third party)
  • More common approach: remove identifiers and
    carry out statistical disclosure limitation prior
    to linkage (may introduce errors into the linked
    dataset that alter results of statistical
    analyses)
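
As a rough illustration of the cryptography-based idea, the sketch below links two vertically partitioned datasets on a keyed hash of a shared identifier, so that the party performing the join never sees the identifier itself. This is a simple keyed-hash pseudonymization, not a full secure multiparty protocol, and it is not taken from the white paper. The field names (mrn, diagnosis, total_paid) and the key handling are hypothetical.

```python
import hashlib
import hmac

# Hypothetical secret held by the two data custodians, not by the linker.
SECRET_KEY = b"example-key-held-by-data-custodians"


def pseudonym(identifier: str) -> str:
    """Keyed hash (HMAC-SHA256) of a normalized identifier."""
    normalized = identifier.strip().lower().encode("utf-8")
    return hmac.new(SECRET_KEY, normalized, hashlib.sha256).hexdigest()


def prepare(records, id_field):
    """Custodian step: replace the direct identifier with its pseudonym."""
    shared = []
    for rec in records:
        out = {k: v for k, v in rec.items() if k != id_field}
        out["pid"] = pseudonym(rec[id_field])
        shared.append(out)
    return shared


def link_on_pseudonym(left, right):
    """Third-party step: join the two partitions on the pseudonym only."""
    index = {rec["pid"]: rec for rec in right}
    linked = []
    for rec in left:
        match = index.get(rec["pid"])
        if match is not None:
            linked.append({**rec, **match})
    return linked


registry = [
    {"mrn": "A-100", "diagnosis": "AMI"},
    {"mrn": "A-200", "diagnosis": "CHF"},
]
claims = [
    {"mrn": "a-100 ", "total_paid": 1200},
    {"mrn": "A-300", "total_paid": 50},
]
linked = link_on_pseudonym(prepare(registry, "mrn"), prepare(claims, "mrn"))
```

The single linked record carries both the diagnosis and the payment amount, which is exactly why the combined dataset holds more information about each subject than either component. Exact-match hashing also fails on typographical variation beyond the simple normalization shown, so it does not replace probabilistic matching.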

13
Technical Aspects: Procedural issues in linking
data sets (cont.)
  • Many linkage techniques depend on the presence of
    attributes in both databases that are unique to
    individuals but do not lead to re-identification.
  • Linkage can reduce data quality.
  • No matter how linkage is performed, other issues
    should be addressed:
  • express comparable attributes in the same units
    of measure
  • reconcile conflicting values of attributes for
    each individual common to both databases
  • manage records that appear in only one database
    (most commonly they are dropped)
  • consider the effect of linkage on data quality
  • Some risks from data linkage cannot be removed.
    Strong consideration should be given to forms of
    data protection such as licensing and restricted
    access.
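
The housekeeping issues listed above might look like the following in practice. This is a sketch under stated assumptions: the record layout (a study ID mapped to a weight and its unit), the tolerance, and the rule of keeping the registry value when sources conflict are all hypothetical choices, not recommendations from the white paper.

```python
LB_PER_KG = 2.20462  # pounds per kilogram


def to_kg(value, unit):
    """Express a weight in kilograms regardless of the recorded unit."""
    if unit == "kg":
        return value
    if unit == "lb":
        return value / LB_PER_KG
    raise ValueError(f"unknown unit: {unit}")


def merge(registry, claims, tolerance_kg=0.5):
    """Link two dicts keyed by a shared study ID.

    Returns the linked rows and the IDs dropped because they appear
    in only one database.
    """
    linked, dropped = {}, []
    for sid in sorted(set(registry) | set(claims)):
        if sid not in registry or sid not in claims:
            dropped.append(sid)
            continue
        r_kg = to_kg(*registry[sid])
        c_kg = to_kg(*claims[sid])
        # Hypothetical reconciliation rule: keep the registry value and
        # flag the row when the two sources disagree beyond the tolerance.
        linked[sid] = {
            "weight_kg": round(r_kg, 1),
            "conflict": abs(r_kg - c_kg) > tolerance_kg,
        }
    return linked, dropped


registry = {"s1": (154.0, "lb"), "s2": (80.0, "kg"), "s4": (200.0, "lb")}
claims = {"s1": (70.0, "kg"), "s3": (65.0, "kg"), "s4": (80.0, "kg")}
linked, dropped = merge(registry, claims)
```

Returning the dropped IDs and the conflict flags, rather than discarding them silently, makes it possible to assess how much the linkage step itself has affected data quality.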

14
Risk Mitigation: Methodology for mitigating the
risk of re-identification
  • Basic methodology for statistical disclosure
    limitation
  • Disclosure-limiting masks are transformations
    of the data where there is a specific functional
    relationship between masked values and original
    data.
  • Masks can be categorized as suppressions (e.g.,
    cell suppression), recodings (e.g., collapsing
    rows or columns, or swapping), or samplings
    (e.g., releasing subsets).
  • The Risk-Utility tradeoff
  • Risk of disclosure is balanced with the utility
    of the released data.
  • Privacy-preserving data mining methodologies
  • Cryptographic approaches to privacy protection
  • Differential privacy focuses on algorithmic
    aspects of the problem with an emphasis on
    automation and scalability of a process for
    conferring anonymity
  • Limits the information a data user might learn
    beyond that known before exposure to the released
    statistics
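
Three of the techniques named above can be sketched on a small table of counts. The table, the suppression threshold, and the privacy parameter epsilon are hypothetical. The noise step uses the Laplace mechanism, a standard way to achieve differential privacy for a count query; it is shown here as one possible instance, not as the method the white paper prescribes.

```python
import random


def suppress_small_cells(table, threshold=5):
    """Suppression: withhold any cell whose count is below the threshold."""
    return {cell: (n if n >= threshold else None) for cell, n in table.items()}


def recode(table, mapping):
    """Recoding: collapse categories by summing their counts."""
    out = {}
    for cell, n in table.items():
        key = mapping.get(cell, cell)
        out[key] = out.get(key, 0) + n
    return out


def laplace_count(n, epsilon, rng):
    """Laplace mechanism for a count (sensitivity 1, noise scale 1/epsilon).

    The difference of two independent exponential draws with rate
    epsilon follows a Laplace distribution with scale 1/epsilon.
    """
    noise = rng.expovariate(epsilon) - rng.expovariate(epsilon)
    return n + noise


counts = {"age 40-49": 12, "age 50-59": 30, "age 90-94": 2, "age 95+": 1}

suppressed = suppress_small_cells(counts)
recoded = recode(counts, {"age 90-94": "age 90+", "age 95+": "age 90+"})

rng = random.Random(0)  # fixed seed so the sketch is repeatable
noisy = laplace_count(counts["age 50-59"], epsilon=1.0, rng=rng)
```

The risk-utility tradeoff shows up directly in the parameters: a higher suppression threshold or a smaller epsilon lowers disclosure risk but makes the released counts less useful. Suppressed cells can sometimes be recovered from published totals, so suppression in practice usually needs complementary suppression as well.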

15
Risk Mitigation: Security practices, standards
and technologies
  • Philosophies regarding the preservation of
    confidentiality associated with individual-level
    data
  • Restricted or limited information, with
    restrictions on the amount or format of the data
    released
  • Restricted or limited access, with restrictions
    on the access to the information itself.
  • Accountability
  • Ensure that researchers are accountable for the
    use of datasets (e.g., best practices, unique
    logins, user authentication, audit trails)
  • Registries as data enclaves
  • Research data centers where users can access
    and use data in a regulated environment
  • Layered restricted access to databases
  • A form of layered restrictions that combines the
    two approaches (restricted information and
    restricted access), with differing levels of
    access at different levels of detail in the data
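
A toy version of layered restricted access with an audit trail might look like this. The tier names, the fields visible at each tier, and the in-memory log are hypothetical; a real enclave would add authentication and persistent, tamper-resistant logging.

```python
from datetime import datetime, timezone

# Hypothetical tiers: each one exposes a different level of detail.
TIER_FIELDS = {
    "public": ["age_band", "region"],
    "licensed": ["age_band", "region", "diagnosis"],
    "enclave": ["age", "zip3", "diagnosis", "admit_date"],
}

AUDIT_LOG = []  # in-memory stand-in for a persistent audit trail


def fetch(user, tier, record):
    """Return only the fields allowed for the tier and log the attempt."""
    granted = tier in TIER_FIELDS
    AUDIT_LOG.append({
        "user": user,
        "tier": tier,
        "record_id": record["id"],
        "granted": granted,
        "at": datetime.now(timezone.utc).isoformat(),
    })
    if not granted:
        raise PermissionError(f"unknown access tier: {tier}")
    return {field: record[field] for field in TIER_FIELDS[tier]}


record = {
    "id": 7, "age": 67, "age_band": "65-69", "zip3": "021",
    "region": "Northeast", "diagnosis": "AMI", "admit_date": "2009-03-14",
}
public_view = fetch("alice", "public", record)
enclave_view = fetch("bob", "enclave", record)
```

Logging every attempt, including denied ones, is what supports the accountability goal: each access can be traced to a unique login.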

16
Legal Considerations
  • Critical starting point: the nature of the
    research undertaking
  • Health care operations?
  • HIPAA Privacy and Security Rules
  • Health care quality-related activities
  • Public health practice?
  • Research within the meaning of the Common Rule?
  • Creation of generalizable knowledge
  • Some combination of the three?

17
Do HIPAA Privacy and Security Rules Apply? And if
so, What are the Issues?
  • Are the data PHI, and is the source a covered
    entity? If so, then HIPAA privacy and security
    standards apply.
  • Is the data source a covered entity?
  • (ARRA expands coverage to include business
    associates)
  • De-identification and re-identification of data
  • Data use agreements for limited data sets
  • Security obligations for ePHI

18
Do the Data and Data Source Raise Other Legal
Obligations?
  • Do patients and the custodial institutions from
    whom the data are secured have other legal rights
    and interests that create legal obligations?
  • E.g., more stringent state privacy laws
  • Were confidentiality expectations created?
  • Institutional privacy expectations
  • Special federal or state standards applicable to
    substance abuse or mental illness information

19
Summary
  • This white paper describes technical and legal
    considerations for researchers interested in
    creating data linkage projects involving registry
    data, and presents typical linkage methods. It
    also discusses both the re-identification hazards
    created by data linkage projects and the
    statistical methods used to minimize the risk of
    re-identification.
  • Some limitations of this discussion are the
    exclusion of:
  • considerations about linking data from public and
    private sectors, where different ethical and
    legal restrictions may apply, and
  • detailed information about the risks involved
    with identifying the health care providers that
    collect and provide data.
  • Dataset linkage entails two risks: loss of
    reliable, confidential data management, and
    identification or re-identification of
    individuals and institutions. Recognized and
    developing statistical methods and secure
    computation may limit these risks and may allow
    registries linked to other datasets to deliver
    their potential public health benefits.