Linking Registry Data: Technical and Legal Considerations

Provided by: ahrq

1
Linking Registry Data: Technical and Legal
Considerations

Sara Rosenbaum, JD
George Washington University
Alan F. Karr, PhD
National Institute of Statistical Sciences

2
Authors and Reviewers

Authors
Stephen E. Fienberg (lead), Carnegie Mellon University
Sara Rosenbaum (lead), George Washington University
Susan Adams (lead), Dartmouth College
Alan F. Karr, National Institute of Statistical Sciences
Bradley Malin, Vanderbilt University
Deven McGraw, The Center for Democracy and Technology
Maya A. Bernstein, Office of the Assistant Secretary for Planning
and Evaluation, DHHS
Melissa M. Goldstein, George Washington University
Joy Pritts, Georgetown University

Reviewers
Julia Lane, National Science Foundation
Eric Peterson, Duke University Medical Center
Victoria Prescott, McBroom Consulting, LLC
Gerald Riley, Centers for Medicare & Medicaid Services
Marcy Wilder, Hogan & Hartson

3
Purpose

Increasingly, statistical methods are used to
link data from multiple de-identified sources.
What is the risk of identifying patients by
combining data from multiple registries?
What legal and ethical requirements must
researchers meet to ensure patient privacy and
confidentiality?

4
Paper Overview

A. INTRODUCTION
B. TECHNICAL ASPECTS OF DATA LINKAGE PROJECTS
- Linking records for research and improving
public health
- What do Privacy, Disclosure, and
Confidentiality mean?
- Linking records and probabilistic matching
- Procedural issues in linking datasets
C. LEGAL ASPECTS OF DATA LINKAGE PROJECTS
- Risks of identification
- The HIPAA Privacy Rule

5
Paper Overview (cont.)

D. RISK MITIGATION FOR DATA LINKAGE PROJECTS
- Methodology for mitigating the risk of
re-identification
- Security practices, standards, and
technologies
SUMMARY
SCENARIOS
I. Linking Clinical Registry Data with
Insurance Claims Files
II. Planning for Data Linkage Projects

6
High-Level View: Linking records for research
and improving public health

The scientific value of a registry increases with
the number of cases and the extent of the health
information included.
There is an ethical obligation to protect patient
interests when collecting, sharing, and studying
person-specific biomedical information.
Thus, a tension exists between the broad goals of
registries and regulations protecting
individually identifiable information.
A large body of federal law applies to health
information privacy.

7
Key Terms

Privacy: protection of individuals against
impermissible uses of personally identifiable
information (PII), specifically protected health
information (PHI)
Disclosure: attribution of information to the
source of the data
Confidentiality: protection accorded to
statistical data

8
Technical Aspects: Privacy, Disclosure, and
Confidentiality

Privacy
As used in the HIPAA Privacy Rule, the term
applies to protected health information (PHI).

9
Technical Aspects: Privacy, Disclosure, and
Confidentiality (cont.)

Disclosure
Technical: the attribution of information to the
source of the data.
- Identity disclosure occurs when the data source
becomes known from the data release itself.
- Attribute disclosure occurs when the released
data make it possible to infer the
characteristics of an individual data source more
accurately than would otherwise have been
possible.
- Inferential disclosure relates to the probability
of identifying a particular attribute of a data
source.
HIPAA: the release, transfer, provision of,
access to, or divulging in any other manner of
information outside of the entity holding the
information.

10
Technical Aspects: Privacy, Disclosure, and
Confidentiality (cont.)

Confidentiality
A quality or condition of protection accorded to
statistical information as an obligation not to
permit the transfer of that information to an
unauthorized party.
A different notion of confidentiality relates to
the ethical, legal, and professional obligation
of those who receive information in the context
of a clinical relationship to respect the privacy
interests of their patients.

11
Technical Aspects: Linking records and
probabilistic matching

Techniques for record linkage
- Unique identifiers
- AI-like rule
- Probabilistic approaches
The probabilistic approach is built on five key
components:
1. Define features that describe similarity
between records.
2. Place feature vectors into three classes: matches
(M), non-matches (U), and possible matches (P).
3. Perform record-pair classification by calculating
the ratio P(Y | M) / P(Y | U) for each pair,
where Y is a feature vector for the pair and
P(Y | M) and P(Y | U) are the probabilities of
observing that feature vector for a matched and
a non-matched pair, respectively.
4. Where no known duplicate and/or non-duplicate
record pairs are available, estimate the
conditional probabilities from observed
frequencies in the records to be linked.
5. Use blocking, or partitioning the databases
based on some variable present in both databases,
to improve efficiency.
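
The five components above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the method of any particular linkage package: the field names, the m- and u-probabilities, and the two thresholds are hypothetical, and the per-field agreement indicators are assumed to be conditionally independent so that the ratio P(Y | M) / P(Y | U) factors into per-field terms.

```python
import math
from collections import defaultdict

# Hypothetical probabilities that a field agrees for a true match (m)
# and for a non-match (u). In practice these are estimated from data.
M_PROB = {"surname": 0.95, "birth_year": 0.98, "zip": 0.90}
U_PROB = {"surname": 0.01, "birth_year": 0.05, "zip": 0.10}


def log_likelihood_ratio(rec_a, rec_b):
    """Return log2[P(Y | M) / P(Y | U)] for one record pair.

    Y is the vector of per-field agreement indicators; fields are
    assumed to agree independently (a simplifying assumption).
    """
    score = 0.0
    for field in M_PROB:
        m, u = M_PROB[field], U_PROB[field]
        if rec_a[field] == rec_b[field]:
            score += math.log2(m / u)
        else:
            score += math.log2((1 - m) / (1 - u))
    return score


def classify(score, upper=10.0, lower=0.0):
    """Assign a pair to M (match), U (non-match), or P (possible match)."""
    if score >= upper:
        return "M"
    if score <= lower:
        return "U"
    return "P"


def link(file_a, file_b, block_key="birth_year"):
    """Compare only pairs that share the blocking variable."""
    blocks = defaultdict(list)
    for rec in file_b:
        blocks[rec[block_key]].append(rec)
    results = []
    for a in file_a:
        for b in blocks.get(a[block_key], []):
            score = log_likelihood_ratio(a, b)
            results.append((a["id"], b["id"], classify(score)))
    return results
```

Calling link(registry, claims) on two lists of dictionaries with the keys id, surname, birth_year, and zip returns one (id, id, class) triple per compared pair. Blocking on birth_year means pairs with different birth years are never compared, which saves work but also means a typo in the blocking variable causes a missed match.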

12
Technical Aspects: Procedural issues in linking
data sets

Neither "data" nor "link" can be defined
unambiguously, and the relationship between
datasets can vary.
Linking horizontally partitioned datasets (the
same attributes for different subjects) carries
little risk of re-identification, because in most
cases there is no more information about a record
in the combined dataset than was present in the
individual datasets.
For vertically partitioned datasets (different
attributes for the same subjects), it is
necessary to link individual subjects' records
that are contained in two or more datasets. This
process is risky because the combined dataset
contains more information about each subject than
either of the components.
- Preferred approach: methods based on cryptography
(complex and may involve a third party)
- More common approach: remove identifiers and
carry out statistical disclosure limitation prior
to linkage (may introduce errors into the linked
dataset that alter results of statistical
analyses)
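
As a rough illustration of the cryptography-based idea, the sketch below links two vertically partitioned datasets on a keyed hash of the identifiers instead of the identifiers themselves. It is a deliberately simplified stand-in, not one of the protocols the paper has in mind: the field names and the shared secret key are hypothetical, matching is exact (no tolerance for typos), and the combined record still carries the added re-identification risk described above.

```python
import hashlib
import hmac


def linkage_token(secret_key: bytes, surname: str, birth_date: str) -> str:
    """Derive a keyed hash (HMAC-SHA256) from normalized identifiers."""
    normalized = f"{surname.strip().lower()}|{birth_date.strip()}"
    return hmac.new(secret_key, normalized.encode("utf-8"), hashlib.sha256).hexdigest()


def tokenize(records, secret_key):
    """Replace direct identifiers with a token; keep only analytic fields."""
    out = {}
    for rec in records:
        token = linkage_token(secret_key, rec["surname"], rec["birth_date"])
        out[token] = {
            k: v for k, v in rec.items() if k not in ("surname", "birth_date")
        }
    return out


def link_vertical(tokens_a, tokens_b):
    """Join the two tokenized datasets on tokens present in both."""
    shared = tokens_a.keys() & tokens_b.keys()
    return {t: {**tokens_a[t], **tokens_b[t]} for t in shared}
```

In this arrangement each data custodian would run tokenize locally with the same key and release only the tokenized output, so the party performing link_vertical never sees names or birth dates. Whoever holds the key can recompute tokens for known individuals, which is one reason a trusted third party is often involved.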

13
Technical Aspects: Procedural issues in linking
data sets (cont.)

Many linkage techniques depend on the presence of
attributes in both databases that are unique to
individuals but do not lead to re-identification.
Linkage can reduce data quality.
No matter how linkage is performed, other issues
should be addressed:
- express comparable attributes in the same units
of measure
- reconcile conflicting values of attributes for
each individual common to both databases
- decide how to manage records that appear in only
one database (most commonly they are dropped)
- consider the effect of linkage on data quality
Some risks from data linkage cannot be removed.
Strong consideration should be given to forms of
data protection such as licensing and restricted
access.
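
The housekeeping issues listed above can be made concrete with a small sketch. Everything specific here is an assumption for illustration: the weight attribute, the kg/lb units, the one-kilogram tolerance, and the choice to flag conflicts for review instead of resolving them automatically.

```python
LB_PER_KG = 2.20462


def harmonize_weight(rec):
    """Express weight in kilograms regardless of the source unit."""
    value, unit = rec["weight"], rec["weight_unit"]
    kg = value / LB_PER_KG if unit == "lb" else value
    return round(kg, 1)


def link_and_reconcile(db_a, db_b, tolerance_kg=1.0):
    """Link two dicts keyed by subject ID and report what happened.

    Returns (linked, conflicts, unmatched):
      linked    - ID -> agreed weight in kg (mean of the two sources)
      conflicts - IDs whose values disagree by more than the tolerance
      unmatched - IDs present in only one database (dropped from linked)
    """
    linked, conflicts, unmatched = {}, [], []
    for pid in sorted(db_a.keys() | db_b.keys()):
        if pid not in db_a or pid not in db_b:
            unmatched.append(pid)
            continue
        kg_a = harmonize_weight(db_a[pid])
        kg_b = harmonize_weight(db_b[pid])
        if abs(kg_a - kg_b) > tolerance_kg:
            conflicts.append(pid)  # flag for review, do not guess
        else:
            linked[pid] = round((kg_a + kg_b) / 2, 1)
    return linked, conflicts, unmatched
```

Returning the conflict and unmatched lists alongside the linked data keeps the effect of linkage on data quality visible: the analyst can see how many subjects were dropped and how many values needed reconciliation.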

14
Risk Mitigation: Methodology for mitigating the
risk of re-identification

Basic methodology for statistical disclosure
limitation
- Disclosure-limiting masks are transformations
of the data where there is a specific functional
relationship between masked values and original
data.
- Masks can be categorized as suppressions (e.g.,
cell suppression), re-codings (e.g., collapsing
rows or columns, or swapping), or samplings (e.g.,
releasing subsets).
The risk-utility tradeoff
- Risk of disclosure is balanced against the
utility of the released data.
Privacy-preserving data mining methodologies
- Cryptographic approaches to privacy protection
- Differential privacy focuses on algorithmic
aspects of the problem, with an emphasis on
automation and scalability of a process for
conferring anonymity.
- It limits the information a data user might learn
beyond that known before exposure to the released
statistics.
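
A minimal sketch of three of the ideas on this slide follows: a suppression mask, a re-coding mask, and noise addition in the style of differential privacy. The threshold of 5, the ten-year age bands, and the function names are assumptions chosen for illustration. The Laplace mechanism shown is the textbook construction for a counting query and is not drawn from the presentation; a production implementation would need more care (for example, around floating-point behavior).

```python
import random


def suppress_small_cells(table, threshold=5):
    """Cell suppression: hide any count below the threshold."""
    return {
        cell: (count if count >= threshold else None)
        for cell, count in table.items()
    }


def recode_age(age, width=10):
    """Re-coding: collapse an exact age into a band such as '40-49'."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


def laplace_noise(scale, rng):
    """Draw Laplace(0, scale) as the difference of two exponentials."""
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)


def noisy_count(true_count, epsilon, rng):
    """Laplace mechanism for a counting query (sensitivity 1).

    Smaller epsilon means more noise: lower disclosure risk,
    lower utility.
    """
    return true_count + laplace_noise(1 / epsilon, rng)
```

The epsilon parameter makes the risk-utility tradeoff explicit: with epsilon = 1 the noise has scale 1, so a released count of several hundred is barely affected, while a count of 2 or 3 is largely obscured.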

15
Risk Mitigation: Security practices, standards,
and technologies

Philosophies regarding the preservation of
confidentiality associated with individual-level
data
- Restricted or limited information, with
restrictions on the amount or format of the data
released
- Restricted or limited access, with restrictions
on the access to the information itself
Accountability
- Ensure that researchers are accountable for the
use of datasets (e.g., best practices, unique
logins, user authentication, audit trails)
Registries as data enclaves
- Research data centers where users can access
and use data in a regulated environment
Layered restricted access to databases
- A form of layered restrictions that combines the
two approaches, with differing levels of access at
different levels of detail in the data
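
One way to picture layered restricted access combined with an audit trail is the sketch below. The tier names, the fields visible at each tier, and the in-memory log are all hypothetical; a real registry would rely on authenticated sessions and tamper-resistant logging.

```python
from datetime import datetime, timezone

# Hypothetical tiers: each higher tier exposes more detailed fields.
ACCESS_TIERS = {
    "public": {"age_band", "region"},
    "licensed": {"age_band", "region", "diagnosis", "admission_year"},
    "enclave": {
        "age_band", "region", "diagnosis", "admission_year",
        "zip", "admission_date",
    },
}

audit_log = []  # one entry per request, tied to a unique user login


def fetch(records, user, tier):
    """Return only the fields the tier allows and record the request."""
    if tier not in ACCESS_TIERS:
        raise ValueError(f"unknown access tier: {tier}")
    allowed = ACCESS_TIERS[tier]
    audit_log.append({
        "user": user,
        "tier": tier,
        "n_records": len(records),
        "time": datetime.now(timezone.utc).isoformat(),
    })
    return [{k: v for k, v in rec.items() if k in allowed} for rec in records]
```

A request at the public tier sees only coarse fields, while the enclave tier sees ZIP code and admission date; every request, at any tier, leaves a log entry naming the user.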

16
Legal Considerations

Critical starting point: nature of the research
undertaking
Health care operations?
- HIPAA Privacy and Security Rules
- Health care quality related activities
Public health practice?
Research within the meaning of the Common Rule?
- creation of generalizable knowledge
Some combination of the three?

17
Do HIPAA Privacy and Security Rules Apply? And if
so, What are the Issues?

Are the data PHI, and is the source a covered
entity? If so, then HIPAA privacy and security
standards apply.
- Is the data source a covered entity? (ARRA
expands coverage to include business associates.)
- De-identification and re-identification of data
- Data use agreements for limited data sets
- Security obligations for ePHI

18
Do the Data and Data Source Raise Other Legal
Obligations?

Do patients and the custodial institutions from
whom the data are secured have other legal rights
and interests that create legal obligations?
E.g., more stringent state privacy laws
Were confidentiality expectations created?
Institutional privacy expectations
Special federal or state standards applicable to
substance abuse or mental illness information

19
Summary

This white paper describes technical and legal
considerations for researchers interested in
creating data linkage projects involving registry
data, and presents typical linkage methods. It
also discusses both the hazards for
re-identification created by data linkage
projects, and the statistical methods used to
minimize the risk of re-identification.
Some limitations of this discussion are the
exclusion of:
- considerations about linking data from public
and private sectors, where different ethical and
legal restrictions may apply, and
- detailed information about the risks involved
with identifying the health care providers that
collect and provide data.
Dataset linkage entails two risks: loss of
reliable confidential data management, and
identification or re-identification of
individuals and institutions. Recognized and
developing statistical methods and secure
computation may limit these risks and help
realize the public health benefits that
registries linked to other datasets have the
potential to contribute.