CAPRISA Molecular Integration Database Project: Acute Infection

Provided by: tltu5

1
CAPRISA Molecular Integration Database Project
Acute Infection
  • Planning and Design

2
Aims
  • Develop integrated schema for sharing of
    information generated by all CAPRISA projects and
    cores relevant to molecular data
  • sequence alignment, evolution, host genetic
    background, epitope mapping, clinical data
  • cohort statistics, and epidemiological data
    linked but not likely to be as closely integrated

3
CAPRISA Network
[Network diagram; labels: Durban Centre, SANBI, NHLS, UCT, SUN, Irene, Sharon]
4
CAPRISA Network
Molecular Integration Database Web Site
SANBI
5
Molecular Integration Database
  • Sequence module
  • Sequence generation/quality control, analysis
  • Immunological data module
  • separate laboratory records from which
    integrable data is transmitted to the MID
  • Clinical data module
  • separate laboratory records from which
    integrable data is transmitted to the MID

6
Prototype Database Integration
7
Molecular Integration Database Development
Sequence data
  • Production Module
  • Production of sequences
  • Quality Control
  • Pipeline implementation
  • Analysis Module
  • Phylogenetics Module
  • Transmission to integration database
  • Integration
  • Browsing/Visualisation

8
Data pipeline
  • Production
  • Data production record (local database)
  • Laboratory experiment
  • Specimen collection
  • Pipeline applications (molecular data processing)
  • Transmission
  • Data transmission record
  • Information selected for its utility in
    integration with molecular data
  • Visualisation
  • Analysis
  • Relational database tables
  • Mined according to analytical needs
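
As a rough illustration of the last step, "mining" relational tables could mean joining molecular and clinical records on the identifiers they share. The sketch below uses Python's built-in sqlite3 module; the table names, column names, and sample rows are invented for illustration and are not the MID schema.

```python
import sqlite3

# Hypothetical tables and made-up rows, for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sequence_record (patient_id TEXT, visit_number INTEGER, sequence_name TEXT);
CREATE TABLE clinical_record (patient_id TEXT, visit_number INTEGER, viral_load INTEGER);
INSERT INTO sequence_record VALUES ('P001', 2, 'P001_2_ENV1_F_R07');
INSERT INTO sequence_record VALUES ('P002', 1, 'P002_1_ENV1_F_R01');
INSERT INTO clinical_record VALUES ('P001', 2, 15000);
""")

def sequences_with_viral_load(conn):
    """Return (sequence_name, viral_load) for visits present in both tables."""
    query = """
        SELECT s.sequence_name, c.viral_load
        FROM sequence_record AS s
        JOIN clinical_record AS c
          ON c.patient_id = s.patient_id
         AND c.visit_number = s.visit_number
        ORDER BY s.sequence_name
    """
    return conn.execute(query).fetchall()

print(sequences_with_viral_load(conn))
```

Only the visit that has both a sequence and a clinical record appears in the result, which is the sense in which the data are "integrated" in this sketch.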

9
MID Data Sources: Sites and Individuals
  • Molecular
  • Carolyn's group, UCT
  • Carel, SUN
  • Lynn, NICD
  • Maria, NICD
  • Sharon, Africa Laboratory, Durban
  • Immunological
  • Clive, NICD
  • Clinical
  • Irene, Lancet and KEH labs, Durban
  • (Linked) Epidemiology, cohort statistics

10
DATA FORMATS
  • Numbers, letters, phrases, spreadsheets, text
    files
  • Local database application captures data
  • Clinical record, sequence, immunological etc.
  • Data production record is stored locally
  • Data transmission form is generated from the
    local database application
  • Formats are discussed and frozen prior to data
    capture
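
One way to picture the step from local database to transmission form is sketched below. It assumes a local SQLite table called production_record and a frozen column list; both are hypothetical, as are the sample values. Columns kept only for local use (here local_note) are left out of the form.

```python
import csv
import io
import sqlite3

# Hypothetical frozen layout; the real one is whatever the groups agree on.
FROZEN_COLUMNS = ["patient_id", "visit_number", "data_type", "value"]

def export_transmission_form(conn):
    """Render local production records as tab-delimited text in the frozen layout."""
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(FROZEN_COLUMNS)
    query = "SELECT {} FROM production_record ORDER BY patient_id, visit_number".format(
        ", ".join(FROZEN_COLUMNS)
    )
    for row in conn.execute(query):
        writer.writerow(row)
    return out.getvalue()

# Made-up local record, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE production_record "
    "(patient_id TEXT, visit_number INTEGER, data_type TEXT, value TEXT, local_note TEXT)"
)
conn.execute(
    "INSERT INTO production_record VALUES ('P001', 2, 'viral_load', '15000', 'rerun once')"
)
form = export_transmission_form(conn)
print(form)
```

Because the form is generated from a fixed column list rather than from whatever the local table happens to contain, the local application can change without altering the transmitted layout.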

11
Data Transmission Forms
  • Layout of the data transmission form cannot
    change once the schema has been approved
  • Schema development requires that we determine:
  • Data flow during production
  • Data types and sources for each group
  • Data types that can be cross linked
  • Projected requirements for analysis
  • Other TBD

12
How will the data be sent to SANBI?
  • Each data-producing centre will submit data via
    the internet to a username/password-protected account
  • Submission will be automated where possible;
    all submissions will be made securely
  • Data transmission form will be in tab-delimited
    format
  • Errors are the responsibility of the sender
  • Basic error checking will be performed during the
    data check-in process
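
A minimal sketch of what basic error checking at check-in might look like for a tab-delimited form. The expected header, the field names, and the specific checks are assumptions for illustration; the slides do not say which checks are planned.

```python
import csv
import io

# Hypothetical frozen header for the transmission form.
EXPECTED_HEADER = ["patient_id", "visit_number", "data_type", "value"]

def check_in(form_text):
    """Return (accepted_rows, errors) for one tab-delimited transmission form."""
    reader = csv.reader(io.StringIO(form_text), delimiter="\t")
    header = next(reader, None)
    if header != EXPECTED_HEADER:
        return [], ["header does not match the frozen layout"]
    accepted, errors = [], []
    for line_no, row in enumerate(reader, start=2):
        if len(row) != len(EXPECTED_HEADER):
            errors.append(f"line {line_no}: expected {len(EXPECTED_HEADER)} fields, got {len(row)}")
        elif not row[0].strip():
            errors.append(f"line {line_no}: missing patient_id")
        elif not row[1].isdigit():
            errors.append(f"line {line_no}: visit_number is not a whole number")
        else:
            accepted.append(dict(zip(EXPECTED_HEADER, row)))
    return accepted, errors

# Made-up form with one good row and two bad ones.
sample_form = (
    "patient_id\tvisit_number\tdata_type\tvalue\n"
    "P001\t2\tviral_load\t15000\n"
    "P002\tx\tviral_load\t900\n"
    "\t1\tcd4\t500\n"
)
accepted, errors = check_in(sample_form)
for message in errors:
    print(message)
```

Rejected rows are reported with their line numbers so that the sender, who is responsible for errors, can correct and resubmit them.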

13
Example of flow of data
[Flow diagram; labels: Patient, Samples, UCT, Others, PCR, Immunology, SUN, sequencing, SANBI, Sequence owner]
Sequence owner accepts/discards sequences and
resubmits to database via Sequin
14
Draft schema for integrated database
15
Sequencing data management
Sample Collection: Patient ID, Visit Number
Tracking
Sample Processing: Patient ID, Visit Number, PCR Primer ID, orientation
Sample Sequencing: Patient ID, Visit Number, PCR Primer ID, orientation, reaction ID
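
The way identifiers accumulate from collection to sequencing can be sketched with nested record types, each stage adding its own fields to those of the stage before. The class names, field types, and sample values below are illustrative assumptions, not part of the MID design.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class CollectedSample:
    patient_id: str
    visit_number: int

    def collection_key(self):
        """Identifiers shared by every record derived from this sample."""
        return (self.patient_id, self.visit_number)

@dataclass(frozen=True)
class ProcessedSample(CollectedSample):
    pcr_primer_id: str
    orientation: str  # coding (for example "F"/"R") is an assumption

@dataclass(frozen=True)
class SequencedSample(ProcessedSample):
    reaction_id: str

# Made-up values, for illustration only.
read = SequencedSample("P001", 2, "ENV1", "F", "R07")
print(read.collection_key())
```

Because every later record still carries Patient ID and Visit Number, a sequencing read can always be traced back to the sample collection it came from.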
16
Sequence submission
  • Stand-alone tool for sequence submission to
    database
  • Long sequences
  • Sets of sequences
  • segmented entries
  • population
  • phylogenetic
  • mutation studies
  • Editing and updating
  • Complex annotation
  • Validation functions for quality assurance.

17
Issues finalised for sequence data
  • Naming convention
  • patient ID, visit number, PCR primer name,
    orientation, reaction ID
  • Format
  • FASTA format from SUN sequencing centre
  • Quality Control
  • MID Submission in GenBank format
  • Sequin
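
A quality-control step could check that each FASTA record name follows the naming convention before submission. The sketch below assumes the five fields are joined by underscores and that orientation is coded F or R; the slides list the fields but not the separator or coding, so both are assumptions, and the sample records are made up.

```python
import re

# Assumed layout: patient_visit_primer_orientation_reaction, underscore-separated.
NAME_PATTERN = re.compile(
    r"^(?P<patient_id>[^_]+)_(?P<visit_number>\d+)_(?P<primer>[^_]+)"
    r"_(?P<orientation>[FR])_(?P<reaction_id>[^_]+)$"
)

def read_fasta(text):
    """Yield (name, sequence) pairs from FASTA-formatted text."""
    name, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            parts = line[1:].split()
            name, chunks = (parts[0] if parts else ""), []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)

def check_names(fasta_text):
    """Split records into those whose names fit the convention and those that do not."""
    good, bad = {}, []
    for name, sequence in read_fasta(fasta_text):
        match = NAME_PATTERN.match(name)
        if match:
            good[name] = (match.groupdict(), sequence)
        else:
            bad.append(name)
    return good, bad

sample_fasta = ">P001_2_ENV1_F_R07\nACGT\nTTGA\n>sample42\nACGT\n"
good, bad = check_names(sample_fasta)
print(sorted(good), bad)
```

Names that fail the check are returned rather than silently dropped, so the sequence owner can rename and resubmit them.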

21
Storage and policies
  • Data Storage and backup
  • Onsite database storage at SANBI
  • Secure access via user authentication
  • Dedicated hard drives
  • Dedicated backup tapes
  • Data Management Policies
  • HPTN trials policies under review