Title: CAPRISA Molecular Integration Database Project: Acute Infection
1CAPRISA Molecular Integration Database Project
Acute Infection
2Aims
- Develop integrated schema for sharing of information generated by all CAPRISA projects and cores relevant to molecular data
- sequence alignment, evolution, host genetic background, epitope mapping, clinical data
- cohort statistics, and epidemiological data linked but not likely to be as closely integrated
3CAPRISA Network
DURBAN CENTRE
SANBI
NHLS
UCT
IRENE
SUN
SHARON
4CAPRISA Network
Molecular Integration Database Web Site
SANBI
5Molecular Integration Database
- Sequence module
- Sequence generation/quality control, analysis
- Immunological data module
- separate laboratory records from which integratable data is transmitted to the MID
- Clinical data module
- separate laboratory records from which integratable data is transmitted to the MID
6Prototype Database Integration
7Molecular Integration Database Development
Sequence data
- Production Module
- Production of sequences
- Quality Control
- Pipeline implementation
- Analysis Module
- Phylogenetics Module
- Transmission to integration database
- Integration
- Browsing/Visualisation
8Data pipeline
- Production
- Data production record (local database)
- Laboratory experiment
- Specimen collection
- Pipeline applications (molecular data processing)
- Transmission
- Data transmission record
- Information selected for its utility in integration with molecular data
- Visualisation
- Analysis
- Relational database tables
- Mined according to analytical needs
9MID Data sources Sites and individuals
- Molecular
- Carolyn's group, UCT
- Carel, SUN
- Lynn, NICD
- Maria, NICD
- Sharon, Africa Laboratory, Durban
- Immunological
- Clive, NICD
- Clinical
- Irene, Lancet and KEH labs, Durban
- (Linked) Epidemiology, cohort statistics
10DATA FORMATS
- Numbers, letters, phrases, spreadsheets, text files
- Local database application captures data
- Clinical record, sequence, immunological etc.
- Data production record is stored locally
- Data transmission form is generated from the local database application
- Formats are discussed and frozen prior to data capture
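As a rough illustration of how a transmission form might be generated from locally stored production records, the sketch below writes only an agreed set of columns to tab-delimited text. The column names, the record contents and the function `build_transmission_form` are all hypothetical; the slides do not list the actual fields.

```python
import csv
import io

# Hypothetical frozen column layout; the real field list is not given in the slides.
FROZEN_COLUMNS = ["patient_id", "visit_number", "data_type", "value"]


def build_transmission_form(records):
    """Render local production records as tab-delimited text with a fixed header."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=FROZEN_COLUMNS,
        delimiter="\t",
        extrasaction="ignore",  # fields outside the frozen layout stay local
        lineterminator="\n",
    )
    writer.writeheader()
    for record in records:
        writer.writerow(record)
    return buffer.getvalue()


# Made-up example record; "technician_note" is a local-only field.
local_records = [
    {
        "patient_id": "P001",
        "visit_number": 1,
        "data_type": "cd4_count",
        "value": 512,
        "technician_note": "kept locally",
    },
]
form_text = build_transmission_form(local_records)
```

Because the column list is fixed in one place, the local application can hold extra fields without changing the layout of what is transmitted.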
11Data Transmission Forms
- Layout of data transmission form cannot change once the schema has been OK'd
- Schema development requires that we determine
- Data flow during production
- Data types and sources for each group
- Data types that can be cross linked
- Projected requirements for analysis
- Other TBD
12How will the data be sent to SANBI?
- Each data producing center will submit data over the internet to a password-protected user account
- Automated submission will be developed where possible; submission will be secure
- Data transmission form will be in tab delimited format
- Errors are the responsibility of the sender
- Basic error checking will be performed during the data check-in process
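The basic error checking at check-in could look something like the sketch below, which verifies the header and a few field-level rules of a tab-delimited form. The expected header, the specific checks and the function `check_in` are assumptions for illustration, not the project's actual rules.

```python
import csv
import io

# Hypothetical agreed header; the real layout is not given in the slides.
EXPECTED_HEADER = ["patient_id", "visit_number", "data_type", "value"]


def check_in(form_text):
    """Return a list of (line_number, message) problems found in a tab-delimited form."""
    problems = []
    reader = csv.reader(io.StringIO(form_text), delimiter="\t")
    header = next(reader, None)
    if header != EXPECTED_HEADER:
        # Without the agreed header the remaining rows cannot be interpreted.
        problems.append((1, "header does not match the agreed layout"))
        return problems
    for line_number, row in enumerate(reader, start=2):
        if len(row) != len(EXPECTED_HEADER):
            problems.append((line_number, "wrong number of fields"))
            continue
        fields = dict(zip(EXPECTED_HEADER, row))
        if not fields["patient_id"].strip():
            problems.append((line_number, "missing patient_id"))
        if not fields["visit_number"].isdigit():
            problems.append((line_number, "visit_number is not a whole number"))
    return problems
```

Returning line numbers with each message lets the problems be reported back to the sender, who remains responsible for correcting them.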
13Example of flow of data
Patient
Samples
UCT
Others
PCR
Immunology
SUN
SANBI
sequencing
Sequence owner accepts/discards sequences and
resubmits to database via Sequin
Sequence owner
14Draft schema for integrated database
15Sequencing data management
Sample Collection
Patient ID Visit Number
Tracking
Sample Processing
Patient ID Visit Number PCR Primer ID
orientation
Sample Sequencing
Patient ID Visit Number PCR Primer ID
orientation reaction ID
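One possible relational reading of the three stages above is sketched below with Python's built-in `sqlite3` module: each stage extends the key of the previous one. The table names, column names, column types and sample values are assumptions; the slide gives only the field names.

```python
import sqlite3

# Hypothetical schema: each stage's key extends the key of the stage before it.
SCHEMA = """
CREATE TABLE sample_collection (
    patient_id   TEXT NOT NULL,
    visit_number INTEGER NOT NULL,
    PRIMARY KEY (patient_id, visit_number)
);
CREATE TABLE sample_processing (
    patient_id    TEXT NOT NULL,
    visit_number  INTEGER NOT NULL,
    pcr_primer_id TEXT NOT NULL,
    orientation   TEXT NOT NULL,
    PRIMARY KEY (patient_id, visit_number, pcr_primer_id, orientation),
    FOREIGN KEY (patient_id, visit_number)
        REFERENCES sample_collection (patient_id, visit_number)
);
CREATE TABLE sample_sequencing (
    patient_id    TEXT NOT NULL,
    visit_number  INTEGER NOT NULL,
    pcr_primer_id TEXT NOT NULL,
    orientation   TEXT NOT NULL,
    reaction_id   TEXT NOT NULL,
    PRIMARY KEY (patient_id, visit_number, pcr_primer_id, orientation, reaction_id),
    FOREIGN KEY (patient_id, visit_number, pcr_primer_id, orientation)
        REFERENCES sample_processing (patient_id, visit_number, pcr_primer_id, orientation)
);
"""

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves enforcement off by default
conn.executescript(SCHEMA)

# Made-up rows tracking one sample from collection through sequencing.
conn.execute("INSERT INTO sample_collection VALUES ('P001', 1)")
conn.execute("INSERT INTO sample_processing VALUES ('P001', 1, 'PRIMER_A', 'F')")
conn.execute("INSERT INTO sample_sequencing VALUES ('P001', 1, 'PRIMER_A', 'F', 'R01')")

# Trace sequencing reactions back to the collected sample.
rows = conn.execute(
    """
    SELECT s.reaction_id
    FROM sample_sequencing AS s
    JOIN sample_collection AS c
      ON c.patient_id = s.patient_id AND c.visit_number = s.visit_number
    WHERE c.patient_id = 'P001'
    """
).fetchall()
```

With foreign keys enforced, a sequencing record cannot be stored unless the matching processing and collection records already exist, which is one way to support tracking.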
16Sequence submission
- Stand-alone tool for sequence submission to database
- Long sequences
- Sets of sequences
- segmented entries
- population
- phylogenetic
- mutation studies
- Editing and updating
- Complex annotation
- Validation functions for quality assurance.
17Issues finalised for sequence data
- Naming convention
- patient ID, visit number, PCR primer name, orientation, reaction ID
- Format
- FASTA format from SUN sequencing centre
- Quality Control
- MID Submission in GenBank format
- Sequin
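To make the naming convention concrete, the sketch below composes and parses a five-part sequence name and writes it as a FASTA header. The underscore separator, the field types, the 60-character line width and the helper names are assumptions; the slides specify only which fields make up the name and that sequences arrive in FASTA format.

```python
from typing import NamedTuple

SEPARATOR = "_"  # assumption: the slides list the fields but not the delimiter


class SequenceName(NamedTuple):
    patient_id: str
    visit_number: int
    primer_name: str
    orientation: str
    reaction_id: str


def format_name(name):
    """Join the five fields into a single sequence name."""
    parts = [name.patient_id, str(name.visit_number), name.primer_name,
             name.orientation, name.reaction_id]
    if any(not part or SEPARATOR in part for part in parts):
        raise ValueError("fields must be non-empty and must not contain the separator")
    return SEPARATOR.join(parts)


def parse_name(text):
    """Split a sequence name back into its five fields."""
    parts = text.split(SEPARATOR)
    if len(parts) != 5:
        raise ValueError(f"expected 5 fields, got {len(parts)}: {text!r}")
    patient_id, visit, primer, orientation, reaction = parts
    return SequenceName(patient_id, int(visit), primer, orientation, reaction)


def to_fasta(name, sequence, width=60):
    """Return a FASTA record whose header line is the formatted sequence name."""
    lines = [">" + format_name(name)]
    lines += [sequence[i:i + width] for i in range(0, len(sequence), width)]
    return "\n".join(lines) + "\n"
```

Rejecting fields that contain the separator keeps parsing unambiguous, so a name can always be traced back to its patient, visit, primer, orientation and reaction.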
21Storage and policies
- Data Storage and backup
- Onsite database storage at SANBI
- Secure access via user authentication
- Dedicated hard drives
- Dedicated backup tapes
- Data Management Policies
- HPTN trials policies under review