Title: HIPAA and its Implications on Epidemiological Research Using Large Databases
1. HIPAA and its Implications on Epidemiological Research Using Large Databases
- K. Arnold Chan, MD, ScD
- Harvard School of Public Health
- Channing Laboratory,
- Brigham and Women's Hospital
- and Harvard Medical School
2. Brief outline of this presentation
- Using large linked automated data for public health research
- Data development processes to ensure HIPAA compliance
- Examples
- Some thoughts
3. Two types of data for public health research
- Primary data
- Prospectively collected
- Well-designed data collection tool
- Informed consent
- Secondary data
- Data originally collected for other purposes
- May be proprietary
- Privacy and confidentiality (particularly important if there is no prior authorization)
- Different data systems
4. Large linked healthcare databases
- Health insurance claims data
- Medicaid
- Medicare
- Managed Care Organizations (MCO)
- Automated medical records
- Hospital / Clinic IT systems
- Availability of written records
- Need to contact patients/individuals?
5. Public health research within MCOs
- Harvard Community Health Plan (subsequently became Harvard Pilgrim Health Care)
- Kaiser Permanente (several states)
- Group Health Cooperative (Seattle area)
- Others
- HMO Research Network
- 10 MCOs across the U.S.
6. Public health research within MCOs
- Different types of MCOs
- Group model
- Staff model
- Different relationship with hospitals
- Implications for data access
- MCOs with research programs
- Separate research departments
- Full-time investigators and support staff
7. Data elements in the MCO data
- Demographic information
- Membership
- Start date, termination date, benefit plan, ...
- Office visits
- Type of visit, diagnosis(es), special procedures
- Special examinations
- Radiology, Laboratory examinations
- Hospitalizations
- Drug dispensings
- Linkable by a unique ID
8. HIPAA and Research with Databases
- Authorization from individual research subjects is not feasible
- Individual authorization may be waived by an Institutional Review Board or Privacy Board
- Minimal Risk
- Data reported in aggregate fashion
- No single-case report
- Minimum necessary principle
- De-identification
9. HIPAA and Research with Databases
- Single-MCO studies
- Investigators and research staff are MCO employees
- Multiple-MCO studies
- May involve transfer of data across MCOs or to a Data Center
- Other types of studies not covered in this presentation
- e.g. Generating a de-identified dataset for public or commercial use
10. HIPAA and data development
- Do not move individual-level data unless absolutely necessary
- Generate summary tables at each study site
- Combine the tables for the final report
- Smalley et al. Contraindicated use of cisapride: the impact of an FDA regulatory action. JAMA 2000;284:3036-9.
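A minimal Python sketch of the "summary tables at each site" approach. It assumes each site has already rolled its records up to hypothetical (age group, sex, exposed) strata; only the stratum counts leave the site, and the Data Center simply adds them. The function names and strata are illustrative, not taken from the cited study.

```python
from collections import Counter

def site_summary(records):
    """Count records per (age_group, sex, exposed) stratum at one study site."""
    return Counter(records)

def combine_summaries(summaries):
    """Data Center step: add the per-site stratum counts together."""
    total = Counter()
    for summary in summaries:
        total.update(summary)  # Counter.update adds counts rather than replacing them
    return total

# Hypothetical stratum-level records; only the resulting Counters are transmitted.
site_a = site_summary([("65-69", "M", True), ("65-69", "M", False), ("70-74", "F", True)])
site_b = site_summary([("65-69", "M", True), ("70-74", "F", False)])
combined = combine_summaries([site_a, site_b])
print(combined[("65-69", "M", True)])  # 2
```

Because addition of counts is order-independent, sites can submit their tables at different times and the final report is the same.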
12. HIPAA and data development
- Randomly generated Study ID replaces the true ID
- Crosswalk between the two stored at a secure location
- Destroy the crosswalk after successful data linkage and quality checks
- Implications for storage and back-up
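A minimal sketch of random Study ID assignment with a crosswalk, using Python's `secrets` module. The record layout and the `MRN...` identifiers are hypothetical; a real crosswalk would sit in a secured store, and clearing the in-memory dictionary here only stands in for destroying it.

```python
import secrets

def assign_study_ids(true_ids):
    """Map each distinct true ID to a random Study ID; the mapping is the crosswalk."""
    crosswalk, used = {}, set()
    for tid in true_ids:
        if tid in crosswalk:
            continue
        sid = secrets.token_hex(8)
        while sid in used:  # guard against an (extremely unlikely) collision
            sid = secrets.token_hex(8)
        used.add(sid)
        crosswalk[tid] = sid
    return crosswalk

def pseudonymize(rows, crosswalk):
    """Replace the true ID (first field) with its Study ID in every record."""
    return [(crosswalk[tid], *rest) for tid, *rest in rows]

# Hypothetical source records: (true ID, date, event type).
rows = [("MRN001", "2000-01-05", "visit"), ("MRN002", "2000-02-11", "rx"), ("MRN001", "2000-03-02", "rx")]
crosswalk = assign_study_ids(r[0] for r in rows)
study_rows = pseudonymize(rows, crosswalk)
# Stand-in for destroying the stored crosswalk once linkage and QC are done.
crosswalk.clear()
```

Random IDs, unlike hashes of the true ID, cannot be recomputed from the original identifiers once the crosswalk is gone.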
13. HIPAA and data development
- Roll up / transform variables
- Age → Age groups
- National Drug Code → Drug or group of drugs
- ICD-9 diagnosis code → Disease
- e.g. A man born on Dec 10, 1934 with diagnosis code xxx.yy received drug 55555-333-22
- → 65-70-year-old man with heart failure received digoxin
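A sketch of the roll-up step, assuming small hypothetical lookup tables (a real study would use curated NDC and ICD-9 groupings), a reference date of Jan 1, 2000, and 5-year age bands starting at multiples of 5 (so 65-69 rather than the slide's 65-70). The slide leaves the diagnosis code as xxx.yy; the sketch uses ICD-9-CM 428.0, a heart-failure code, purely for illustration.

```python
from datetime import date

# Hypothetical lookup tables, for illustration only.
NDC_TO_DRUG = {"55555-333-22": "Digoxin"}            # fictional NDC from the slide example
ICD9_CATEGORY_TO_DISEASE = {"428": "Heart failure"}  # ICD-9-CM category 428 = heart failure

def age_on(birth, ref):
    """Completed years of age on the reference date."""
    return ref.year - birth.year - ((ref.month, ref.day) < (birth.month, birth.day))

def age_group(age, width=5):
    """Roll an exact age up into a band such as '65-69' (band width is an assumption)."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"

def roll_up(birth, sex, icd9, ndc, ref):
    """Replace exact birth date, diagnosis code, and NDC with coarser categories."""
    return {
        "age_group": age_group(age_on(birth, ref)),
        "sex": sex,
        "disease": ICD9_CATEGORY_TO_DISEASE.get(icd9.split(".")[0], "Other"),
        "drug": NDC_TO_DRUG.get(ndc, "Other"),
    }

record = roll_up(date(1934, 12, 10), "M", "428.0", "55555-333-22", date(2000, 1, 1))
print(record)
```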
14. HIPAA and data development
- Preserve the temporal sequence of events
- but disguise the real dates
- e.g. Drug use during pregnancy study
- 29-year-old received 55555-333-22 on Nov 25, 1999 and delivered a baby on Dec 10, 1999
- →
- 26-30-year-old mother delivered in 1999, baby exposed to amoxicillin at -15 days
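A sketch of date disguising that keeps the sequence of events by expressing each exposure as days relative to delivery. Plain date subtraction places the Nov 25 dispensing 15 days before the Dec 10 delivery. The helper name, the output fields, and the 26-30-style age bands (chosen to match the slide) are assumptions.

```python
from datetime import date

def relative_day(event, anchor):
    """Days from the anchor event; negative means the event came first."""
    return (event - anchor).days

def disguise_pregnancy_record(mother_age, dispense_date, drug, delivery_date, band=5):
    """Hypothetical helper: keep age band, delivery year, drug, and relative timing only."""
    low = ((mother_age - 1) // band) * band + 1  # bands 21-25, 26-30, 31-35, ...
    return {
        "mother_age_group": f"{low}-{low + band - 1}",
        "delivery_year": delivery_date.year,
        "drug": drug,
        "exposure_day": relative_day(dispense_date, delivery_date),
    }

rec = disguise_pregnancy_record(29, date(1999, 11, 25), "Amoxicillin", date(1999, 12, 10))
print(rec["exposure_day"])  # -15
```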
15. HIPAA and data development
- Only extract information relevant to the study
- e.g. A study of osteoporosis does not require information on subjects' mental health status
- Comorbid conditions may be relevant
- Use proxy measures to describe the level of comorbidity
- Charlson Comorbidity Index (based on concomitant diagnoses)
- Chronic Disease Score (based on co-medications)
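A simplified sketch of a Charlson-style proxy measure: instead of shipping every diagnosis, a site can send one summary score. It uses a subset of the Charlson condition weights, assumes conditions have already been flagged from ICD-9 codes upstream, and ignores the usual hierarchy rules (e.g. metastatic tumor superseding any malignancy); the condition keys are hypothetical names.

```python
# Subset of Charlson condition weights; keys are hypothetical flag names.
CHARLSON_WEIGHTS = {
    "myocardial_infarction": 1,
    "congestive_heart_failure": 1,
    "dementia": 1,
    "diabetes_uncomplicated": 1,
    "hemiplegia": 2,
    "moderate_severe_renal_disease": 2,
    "any_malignancy": 2,
    "moderate_severe_liver_disease": 3,
    "metastatic_solid_tumor": 6,
    "aids": 6,
}

def charlson_score(conditions):
    """Sum weights of the distinct flagged conditions; unlisted conditions score 0."""
    return sum(CHARLSON_WEIGHTS.get(c, 0) for c in set(conditions))

print(charlson_score(["congestive_heart_failure", "diabetes_uncomplicated", "diabetes_uncomplicated"]))  # 2
```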
16. HIPAA and data development
- Geocoding
- Describe socioeconomic status of study subjects based on census tract data
- Send out (Study ID, address) to a geocoding firm
- (Study ID, X1, X2, X3) returned
- X1: education level
- X2: income level
- X3: race/ethnicity information
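A sketch of the geocoding round trip: only (Study ID, address) pairs go out, and the returned census-tract measures are merged back on Study ID while the address is dropped from the analysis file. The member file, field names, addresses, and returned values are all made up for illustration.

```python
# Hypothetical member file: Study ID -> address plus an analysis variable.
members = {
    "S01": {"address": "1 Main St, Springfield", "age_group": "40-44"},
    "S02": {"address": "9 Elm St, Springfield", "age_group": "65-69"},
}

def outbound_file(members):
    """Only (Study ID, address) pairs go to the geocoding firm."""
    return [(sid, m["address"]) for sid, m in members.items()]

def attach_census_measures(members, returned):
    """Merge returned (Study ID, X1, X2, X3) rows on Study ID and drop addresses."""
    measures = {sid: (x1, x2, x3) for sid, x1, x2, x3 in returned}
    analysis = {}
    for sid, m in members.items():
        x1, x2, x3 = measures.get(sid, (None, None, None))
        analysis[sid] = {
            "age_group": m["age_group"],
            "education": x1,       # X1: education level
            "income": x2,          # X2: income level
            "race_ethnicity": x3,  # X3: race/ethnicity information
        }
    return analysis

sent = outbound_file(members)
# Stand-in for the firm's reply; the values are made up.
returned = [("S01", "ed-3", "inc-2", "re-A"), ("S02", "ed-1", "inc-4", "re-B")]
analysis = attach_census_measures(members, returned)
```

The firm sees addresses but no clinical data; the research file sees area-level measures but no addresses.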
17. An example
- Finkelstein et al. Decreasing Antibiotic Use Among US Children: The Impact of Changing Diagnosis Patterns. Pediatrics 2003;112:620-7.
- Data elements involved
- Date of birth, gender
- Membership
- Drug dispensings
- Diagnoses in close proximity to antibiotic dispensings
- Data from nine MCOs
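A sketch of linking diagnoses to antibiotic dispensings "in close proximity", assuming a hypothetical window of plus or minus 3 days and simple (Study ID, date, value) tuples; the window actually used in the study is not stated here, and the function name is illustrative.

```python
from datetime import date, timedelta

def diagnoses_near(dispensings, diagnoses, window_days=3):
    """For each dispensing, list same-person diagnoses within +/- window_days."""
    window = timedelta(days=window_days)
    by_person = {}
    for sid, dx_date, dx in diagnoses:
        by_person.setdefault(sid, []).append((dx_date, dx))
    matched = []
    for sid, rx_date, drug in dispensings:
        near = [dx for dx_date, dx in by_person.get(sid, []) if abs(dx_date - rx_date) <= window]
        matched.append((sid, rx_date, drug, near))
    return matched

dispensings = [("S01", date(2001, 3, 5), "amoxicillin")]
diagnoses = [
    ("S01", date(2001, 3, 4), "otitis media"),  # 1 day before: inside the window
    ("S01", date(2001, 1, 20), "pharyngitis"),  # weeks earlier: outside the window
    ("S02", date(2001, 3, 5), "sinusitis"),     # different person: never matched
]
matched = diagnoses_near(dispensings, diagnoses)
```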
18. Finkelstein et al. Pediatric antibiotic use study
- Data development at each MCO
- Extract antibiotic use information
- Extract diagnoses of interest (infections)
- Use date of birth, gender, and membership data to calculate person-time of interest
- Refined, aggregate data forwarded to the Data Center
- Rate of antibiotic use
- Number of antibiotic dispensings per 1,000 person-years
- for each age-gender group
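A sketch of the person-time and rate calculation done at each site before aggregate data go to the Data Center. It splits each membership period at birthdays, uses single years of age as the age groups, a 365.25-day year, and assumes every dispensing falls inside a membership period; these choices, the record layout, and the function names are assumptions, not the study's published method.

```python
from collections import defaultdict
from datetime import date

def birthday(dob, age):
    """Date on which someone born on dob turns `age` (Feb 29 births use Feb 28)."""
    try:
        return dob.replace(year=dob.year + age)
    except ValueError:
        return dob.replace(year=dob.year + age, day=28)

def age_on(dob, d):
    """Completed years of age on date d, consistent with birthday()."""
    age = d.year - dob.year
    return age - 1 if d < birthday(dob, age) else age

def person_years_by_age(dob, start, end):
    """Split a membership period [start, end) into person-years by completed age."""
    out = defaultdict(float)
    cur, age = start, age_on(dob, start)
    while cur < end:
        seg_end = min(birthday(dob, age + 1), end)
        out[age] += (seg_end - cur).days / 365.25  # assumed year length
        cur, age = seg_end, age + 1
    return out

def rates_per_1000(members, dispensings):
    """members: (study_id, sex, dob, start, end); dispensings: (study_id, date)."""
    person_years, profile = defaultdict(float), {}
    for sid, sex, dob, start, end in members:
        profile[sid] = (sex, dob)
        for age, years in person_years_by_age(dob, start, end).items():
            person_years[(age, sex)] += years
    counts = defaultdict(int)
    for sid, when in dispensings:
        sex, dob = profile[sid]
        counts[(age_on(dob, when), sex)] += 1
    return {key: 1000 * counts[key] / py for key, py in person_years.items() if py > 0}

members = [("S01", "F", date(1998, 6, 1), date(2000, 1, 1), date(2001, 1, 1))]
dispensings = [("S01", date(2000, 3, 10))]
rates = rates_per_1000(members, dispensings)
```

Only the `rates` dictionary (or the underlying counts and person-years per stratum) would need to leave the site.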
19. HIPAA and data development
- Individual identification is needed for certain types of research
- Obtain medical records
- Contact patients to conduct interviews and/or request specimens
- Linkage with external data
- Cancer registry
- National Death Index
20. HIPAA and data development
- The process
- Data extraction, transformation, reduction, and de-identification carried out at each MCO
- Governed by state laws and local HIPAA-compliant Standard Operating Procedures
- Principle of Limited Dataset / Minimum necessary
- The goal
- Highly processed and de-identified data available
for concatenation across study sites and complex
analyses
21. k-anonymity and large datasets
- The goal
- A de-identified dataset with a defined level of individual anonymity
- A 43-year-old man with hypertension, diabetes, and anxiety, taking atenolol, rosiglitazone, and lorazepam
- vs.
- A man aged 40-45 taking a beta-blocker and a thiazolidinedione
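A minimal sketch of measuring k-anonymity: group records by their quasi-identifiers and take the size of the smallest group. Which fields count as quasi-identifiers is an assumption made for the example, and the records are invented; a dataset is k-anonymous when every combination of quasi-identifier values is shared by at least k records.

```python
from collections import Counter

def k_anonymity(records, quasi_identifiers):
    """Size of the smallest group of records sharing the same quasi-identifier values."""
    groups = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return min(groups.values())  # raises ValueError on an empty dataset

# Invented, already rolled-up records.
records = [
    {"sex": "M", "age_group": "40-45", "drug_class": "beta-blocker + thiazolidinedione"},
    {"sex": "M", "age_group": "40-45", "drug_class": "beta-blocker + thiazolidinedione"},
    {"sex": "M", "age_group": "40-45", "drug_class": "beta-blocker + thiazolidinedione"},
    {"sex": "F", "age_group": "40-45", "drug_class": "thiazolidinedione"},
]
k = k_anonymity(records, ["sex", "age_group", "drug_class"])
print(k)  # 1: the single woman is unique on these fields
```

Coarsening a field (for example, dropping `sex` or widening the age band) merges groups and raises k, at the cost of analytic detail.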
22. HIPAA, Data Storage and Access
- Implications for data backup plans
- Data need to be destroyed after the report is published
- Data only used to support pre-defined analyses
- Ancillary analyses are possible after IRB review and approval
23. Epidemiology studies using large databases
- In the old days ...
- Give me all the data, do what I say ...
- What if the investigator / reviewer wants to do THIS analysis?
- Use existing datasets to test new hypotheses
- Good research practice
- Define necessary data elements according to the research protocol
- Pre-defined analytic plan
24. Epidemiology studies using large databases
- Keys to protection of human subjects
- Competent, responsible investigators and staff
- IRB review and oversight
- Data development guidelines
- e.g. Good Epidemiology Practice
- Information technology
- Some reasonable rules/guidelines are better than no guidelines