Title: ALSPAC Record Linkage to External Databases
1ALSPAC Record Linkage to External Databases
- Andy Boyd
- ALSPAC, Social Medicine
- University of Bristol
2The data sources and processes involved
- The processes involved in linkage projects
- Overview of ALSPAC's existing data linkage projects
- National Pupil DB and Geographic linkage as examples
- Data Availability and Linkage Problems
3Processes involved in linkage projects
- Find the contact
- Ethics: informed consent and/or Section 60 support
- Data Security
- HM Revenue & Customs
- Creating a linkage data set
- Data QC checks
- Identifiers
- Formats and data normalisation
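A minimal sketch of what "formats and data normalisation" can look like for the identifiers used in a linkage data set. The function names and the accepted date formats are assumptions for illustration, not ALSPAC's actual rules.

```python
import re
from datetime import datetime


def normalise_name(name):
    """Upper-case a name, drop punctuation, turn hyphens into spaces.

    Note: this simple rule also drops accented letters, which a real
    cleaning step would need to handle more carefully.
    """
    no_punct = re.sub(r"[^A-Za-z\s-]", "", name)
    return " ".join(no_punct.replace("-", " ").upper().split())


def normalise_postcode(postcode):
    """Return the postcode as 'OUTWARD INWARD', or None if it is too short."""
    compact = re.sub(r"[^A-Za-z0-9]", "", postcode).upper()
    if len(compact) < 5:
        return None
    # The inward part of a UK postcode is always the last three characters.
    return compact[:-3] + " " + compact[-3:]


def normalise_dob(text):
    """Parse a date of birth in a few assumed formats and return ISO 8601."""
    for fmt in ("%d/%m/%Y", "%d-%m-%Y", "%Y-%m-%d"):
        try:
            return datetime.strptime(text.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    return None  # unparseable values are left for manual QC
```

Normalising both files the same way before the match means that differences in case, spacing or date layout do not show up as false non-matches.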
4Processes involved in linkage projects cont
- Who links the data?
- one of the two parties or an independent 3rd party
- Processing the data
- Anonymity vs sufficient data for research
- Ages in Months Years
- First Half of Postcode
- Recode unusual outcomes into wider categories
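The three anonymisation steps above can be sketched as follows. This is an illustration only: the helper names, the reading of the ages bullet as a conversion from months to whole years, and the `min_count` threshold are assumptions, not the rules ALSPAC applies.

```python
from collections import Counter


def age_in_years(age_months):
    """Coarsen an age in months to completed years."""
    return age_months // 12


def first_half_of_postcode(postcode):
    """Keep only the outward part of a postcode, e.g. 'BS8 2BN' -> 'BS8'."""
    compact = postcode.replace(" ", "").upper()
    return compact[:-3]  # the inward part is always the last three characters


def recode_rare(values, min_count, other="Other"):
    """Fold categories seen fewer than min_count times into a wider one."""
    counts = Counter(values)
    return [v if counts[v] >= min_count else other for v in values]
```

Each step trades detail for anonymity, so the threshold and the level of coarsening would need to be chosen per project.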
5Major External Databases
- Health related datasets
- Office for National Statistics (ONS) Tracing
- Cancer Registry and GRO
- NSTS (NHS Strategic Tracing Service)
- Electronic antenatal birth records
- PCT data (Exeter DB, My Quest)
- Non health Datasets
- National Pupil Database (DCSF, DIUS, UCAS)
- ALSPAC Schools Collection
- G.I.S Datasets (Geographic Information Systems)
- DWP
- Home Office: linkage currently being investigated
6National Pupil Database
- Maintained by Dept. for Children, Schools and Families (DCSF)
- Covers all state maintained schools in England
- Census: was annual, now at 3 time points
- Data at school and pupil level
- Key data include
- Exam results
- Attendance
- Pupil demographics (including address, ethnicity, Free School Meals, Special Educational Needs)
- School Characteristics (pupil numbers, staff-pupil ratios)
7NPD - How we did it
- 3rd party conducted the match: The Fischer Trust (independent charity)
- Provided data on the eligible cohort
- ALSPAC and DCSF provided the following linkage variables
- Surname, Forename, Familiar name
- Date of Birth, Gender
- Postcode, Previous Postcode, Postcode accuracy flag
- Current School (from ALSPAC data collection)
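To show how linkage variables like these can be used, here is a minimal sketch of a deterministic match. It is not the third party's actual method, which was not documented to us. The field names (`alspac_id`, `npd_id`, `familiar_name` and so on) and the order of the passes are assumptions.

```python
def match_key(record, name_field):
    """Build an exact-match key from surname, a name field, DOB and gender."""
    return (
        record["surname"].upper(),
        record[name_field].upper(),
        record["dob"],
        record["gender"],
    )


def link(cohort, npd):
    """Return {alspac_id: npd_id} for cohort members with one clear match."""
    index = {}
    for pupil in npd:
        index.setdefault(match_key(pupil, "forename"), []).append(pupil)

    links = {}
    for person in cohort:
        # Try the formal forename first, then the familiar name.
        for name_field in ("forename", "familiar_name"):
            if not person.get(name_field):
                continue
            candidates = index.get(match_key(person, name_field), [])
            if len(candidates) > 1:
                # Several pupils share the key: use current or previous
                # postcode to pick between them.
                postcodes = {person.get("postcode"),
                             person.get("previous_postcode")} - {None}
                candidates = [c for c in candidates
                              if c.get("postcode") in postcodes]
            if len(candidates) == 1:
                links[person["alspac_id"]] = candidates[0]["npd_id"]
                break
    return links
```

The match rate is then `len(links) / len(cohort)`. Cases left unlinked would go on to manual review or a fuzzier second stage.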
8NPD - Details
- ALSPAC Cohort covers three academic years
- We hold data on all YPs across these three years (approx. 600,000 cases a year)
- Figures based on eligible cohort
- 17,671 linked (86%)
- Majority of unlinked cases thought to be in
private education (will be in NPD from KS4)
9NPD - Advantages
- Covers all English state schools
- Good match rate for eligible cohort
- Regular updates
- Access to confidential variables
- PLUG workshops provide good opportunities to
discuss data and solutions to problems
10NPD - Problems
- Central ID QC issues (a few duplicates)
- Only applies to English state maintained schools until KS4, then re-link: extra costs, and bias until then
- Data collection method/standards vary from school to school
- Documentation (lack of)
- Size of raw data, time consuming to process
- Fixed time point census, doesn't record all school movements (especially annual census)
11G.I.S Data
- Spatial data held at many geographic levels
- Geographies range in scale from 0.1 metres to regional/national data
- Tied together via postcode or grid reference as central ID
- Key data include
- NSPD (was All Fields Postcode Directory) - geo linking database
- Deprivation and Socio Economic indices (IMD, Townsend, Acorn)
- Census data
12G.I.S - How we link cases to data
- Master file of Postcodes
- Postcodes linked to grid reference
- Grid references of various scales
- PCs/GridRef mapped to
- Electoral geographies
- Census geographies
- Ethics
- We don't generally identify residence at PC or equivalent level
[Figure: Ordnance Survey, The National Grid]
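A minimal sketch of the lookup chain on this slide: postcode to grid reference, grid reference coarsened to a chosen scale, then postcode mapped to electoral and census geographies. The two lookup tables stand in for the master postcode file. The coordinates and area codes in them are made up for illustration.

```python
# Hypothetical master file: postcode -> (easting, northing) in metres.
POSTCODE_GRID = {"BS8 2BN": (358123, 173456)}

# Hypothetical lookup: postcode -> electoral and census geography codes.
POSTCODE_GEOGRAPHY = {"BS8 2BN": {"ward": "W-001", "census_area": "C-001"}}


def coarsen_grid_ref(easting, northing, cell_size_m):
    """Snap a grid reference to the south-west corner of its grid cell."""
    return (easting // cell_size_m * cell_size_m,
            northing // cell_size_m * cell_size_m)


def link_case(postcode, cell_size_m=1000):
    """Return area-level geography for a postcode, without the postcode."""
    grid = POSTCODE_GRID.get(postcode)
    if grid is None:
        return None  # postcode missing from the master file
    linked = {"grid_cell": coarsen_grid_ref(grid[0], grid[1], cell_size_m)}
    linked.update(POSTCODE_GEOGRAPHY.get(postcode, {}))
    return linked
```

Returning only the coarsened cell and area codes reflects the ethics point above: the linked record carries no postcode-level location.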
13G.I.S - Details
- 50,000 ALSPAC address points, associated with a date range which can then be linked to ALSPAC data collection
- Linkage examples
- Indices of multiple deprivation
- Travel from home to school patterns
- Cancer rates and residential distance from power lines
[Figure: The geographic relation between household income and polluting factories (FoE 1999)]
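Because each address point carries a date range, the address that applies to a given data collection can be found by date. A minimal sketch, assuming the history is held as (start, end, address ID) tuples with `None` for a still-current address. That layout is hypothetical.

```python
from datetime import date


def address_at(history, when):
    """Return the address ID whose date range covers `when`, else None."""
    for start, end, address_id in history:
        if start <= when and (end is None or when <= end):
            return address_id
    return None  # the date falls in a gap in the address record
```

For example, with a history of `[(date(1991, 4, 1), date(1995, 8, 31), "ADDR-1"), (date(1996, 1, 1), None, "ADDR-2")]`, a questionnaire completed on 1 June 1993 resolves to `"ADDR-1"`. One completed in October 1995 falls in a gap and returns `None`.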
14G.I.S - Advantages
- Many data sets in public domain (or available through Athens)
- Many geographies are broad enough to not identify cohort members
- National picture (some exclude Scotland)
15G.I.S - Problems
- Shifting geographies across time points
- Royal Mail change postcodes
- Postcode not precise enough in some cases
- Postcode boundaries are not contiguous with other
geographic boundaries
16Accuracy issues with analysis at postcode level
Address level
Postcode level
19Data Availability Linkage Problems
- Cohort Data
- GIS Data
- GIS Ethics
20Linkage problems with the cohort data
- Missing data
- Especially problematic for the cases who didn't enrol in the original recruitment
- Partners
- 69 cases with no known birth outcome
- Gaps in the address data
- However
- ONS matched 99.7% of mothers, so we have their old and new NHS numbers and cleaned data (original recruitment cases only)
21Linkage problems we encounter
- Many of the early records are paper based or in varied formats
- Quality Control: ONS data returned to us with 37 incorrect ALSPAC IDs
- Unknown methods: no documentation from ONS or Fischer regarding the quality of the match
- Lack of uniqueness in the ID (either duplicates or multiple IDs per case)
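Both kinds of ID problem named above can be picked up with a simple QC check on returned data. A minimal sketch, assuming the returned file can be read as (case, linkage ID) pairs. The function name and layout are illustrative.

```python
from collections import defaultdict


def id_problems(rows):
    """Find IDs shared by several cases and cases holding several IDs.

    rows: iterable of (case, linkage_id) pairs.
    Returns (duplicates, multiples) as two dictionaries.
    """
    cases_per_id = defaultdict(set)
    ids_per_case = defaultdict(set)
    for case, linkage_id in rows:
        cases_per_id[linkage_id].add(case)
        ids_per_case[case].add(linkage_id)

    # The same ID attached to more than one case.
    duplicates = {i: sorted(c) for i, c in cases_per_id.items() if len(c) > 1}
    # More than one ID attached to the same case.
    multiples = {c: sorted(i) for c, i in ids_per_case.items() if len(i) > 1}
    return duplicates, multiples
```

Running a check like this before any merge keeps a non-unique ID from silently multiplying or dropping rows.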
22GIS Data Availability
- Collected as administrative resource
- Not yet cleaned, documented and presented to usual ALSPAC standards
- Initiatives under way to validate and fill gaps in record
- Schools GIS data in the main not processed
- Aim to build into standard ALSPAC resource
23GIS Ethics
- Postcode level or greater accuracy treated as a personal identifier
- Research proposals to use these data need ALSPAC Law and Ethics Approval
- Broader geographical data can be released in normal manner
- A two-stage process is used to collect and process precise data
24GIS Ethics
- Step 1: Postcodes (or full address) provided to researcher with unique collection ID with no other data attached
- Step 2: Researcher attaches their data and returns file to ALSPAC
- Step 3: ID converted to the appropriate collaborator ID, postcode data removed
- Step 4: Requested ALSPAC data added to the file and data sent to the researcher
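The four steps can be sketched as a small pipeline. The function names, dictionary layouts and the `distance_to_road_m` example variable are all hypothetical. The point is only that the postcode and the ALSPAC data never sit in the same file.

```python
def step1_export(addresses):
    """ALSPAC: release collection ID and postcode only."""
    return [{"collection_id": cid, "postcode": pc}
            for cid, pc in addresses.items()]


def step2_attach(rows, researcher_data):
    """Researcher: attach their own postcode-level data and return the file."""
    return [{**row, **researcher_data.get(row["postcode"], {})}
            for row in rows]


def step3_swap_id(rows, id_map):
    """ALSPAC: swap in the collaborator ID and drop the postcode."""
    out = []
    for row in rows:
        kept = {k: v for k, v in row.items()
                if k not in ("collection_id", "postcode")}
        kept["collaborator_id"] = id_map[row["collection_id"]]
        out.append(kept)
    return out


def step4_add_alspac(rows, alspac_data):
    """ALSPAC: add the requested study data before release."""
    return [{**row, **alspac_data.get(row["collaborator_id"], {})}
            for row in rows]
```

Because the ID changes at step 3, the researcher cannot join the final file back to the postcodes received at step 1.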
25Andy Boyd
- A.W.Boyd_at_Bristol.ac.uk