Title: GENAdb
GENAdb Genomics Array Database
GENAdb
CSIRO Plant Industry Genomics Array Database
Presented by Gavin Kennedy, CSIRO Mathematical and
Information Sciences
Overview of the Microarray Process
Identification of expressed genes
Analyzed data
Data points in the microarray process
Drivers for a microarray database
- Centralised storage and management for the details of experimental conditions
- Centralised storage and management for large volumes of result sets
- 1 million individual data points for a single slide
- Consistent access to stored experiment details and result sets
- Ability to perform structured queries across all the data
- Investigation of genes of interest across result sets
- Allow annotation at various stages of the microarray process
- Allow analytical tools to be used against the data and subsequent storage of analysis results
What GENA does
- Stores data related to a microarray experiment
- Data to describe the experiment
- Data to reproduce the experiment
- Stores results from a microarray experiment
- Results to support further investigation
- Results to support a publication
- Results for multiple analysis techniques
- Stores data and results to support enquiry and analysis from multiple perspectives
- Provides the interfaces to store, access and analyse the data and results
What GENA does not do
- Store BLAST results
- Analyse the result sets for you
- Limitations
- Time and complexity
- Focus on purpose
High level structure of the database
- Generic at the organism level
- Plants and fungi for Plant Industry
- Animals and bacteria for Livestock Industry
- Specific at the microarray technology level (so far)
- Three subsets of data tables
- Arrays
- Samples
- Results
High level structure of the database
- Structure consistent with other microarray databases
- RAD: RNA Abundance Database
- (U. Pennsylvania, www.cbil.upenn.edu/RAD2/)
- SMD: Stanford Microarray Database
- (genome-www.stanford.edu/MicroArray/SMD/)
- GeneX
- (NCGR, genex.ncgr.org)
- Compliance with developing standards
- MIAME: Minimum Information About a Microarray Experiment
- (MGED, www.mged.org)
Data points in the microarray process
Entity Relationship Diagram
[Figure: entity relationship diagram]
Entities: Source, Sample, Experiment, Slide, Scan, Array, Plate, Amplification, Spot, Result Set, Sequence
Relationships: Produces, Hybridised Onto, Used In, Groups, Scans As, Mapped by, Contains, Spotted From, Amplified From, Consists of, Recorded In, Identifies
Cardinality legend: Mandatory 1; 0 or 1; 1 to Many; Zero to Many
Gena Schema Source to Slide (Hybridisation)
Gena Schema Plate to Slide (Spotting)
Slide
Slide_ID
Experiment_ID
Array_ID
Date_Spotted
Date_Hybridised
Bio_Replicate_No
Tech_Replicate_No
Sample_X_ID
X_Labelling_Info
Sample_Y_ID
Y_Labelling_Info
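The Slide table above can be sketched as a table definition. This is a minimal sketch only: the column names come from the slide, but the column types, the primary key, the NOT NULL constraints and the sample row (including the Cy3/Cy5 labelling values) are assumptions, and SQLite is used here purely as a convenient stand-in for the Oracle database that GENAdb actually runs on.

```python
import sqlite3

# Column names are taken from the slide; types and constraints are assumed.
SLIDE_DDL = """
CREATE TABLE Slide (
    Slide_ID          INTEGER PRIMARY KEY,
    Experiment_ID     INTEGER NOT NULL,
    Array_ID          INTEGER NOT NULL,
    Date_Spotted      TEXT,
    Date_Hybridised   TEXT,
    Bio_Replicate_No  INTEGER,
    Tech_Replicate_No INTEGER,
    Sample_X_ID       INTEGER,
    X_Labelling_Info  TEXT,
    Sample_Y_ID       INTEGER,
    Y_Labelling_Info  TEXT
)
"""


def create_slide_table(conn):
    """Create the (assumed) Slide table on an open connection."""
    conn.execute(SLIDE_DDL)


conn = sqlite3.connect(":memory:")
create_slide_table(conn)

# Hypothetical row: one slide hybridised with two labelled samples.
conn.execute(
    "INSERT INTO Slide (Slide_ID, Experiment_ID, Array_ID, Bio_Replicate_No, "
    "Tech_Replicate_No, Sample_X_ID, X_Labelling_Info, Sample_Y_ID, "
    "Y_Labelling_Info) VALUES (1, 10, 100, 1, 1, 501, 'Cy3', 502, 'Cy5')"
)

# Read the column names back from the database catalogue.
slide_columns = [row[1] for row in conn.execute("PRAGMA table_info(Slide)")]
```

Each slide row carries two sample references (X and Y), which matches the two-channel hybridisation implied by the Ch1/Ch2 result columns.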
Gena Schema Slide to Results (Scanning and
Quantification)
Slide
Slide_ID
Experiment_ID
Array_ID
Date_Spotted
Date_Hybridised
Bio_Replicate_No
Tech_Replicate_No
Sample_X_ID
X_Labelling_Info
Sample_Y_ID
Y_Labelling_Info
Norm_Results_1
Norm_Results_2
Norm_Results_3
Primary_Results_ID
Scan_ID
Spot_ID
Ch1_Median
Ch1_Mean
Ch2_Median
Ch2_Mean
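As an illustration of how the per-spot result columns might be queried, the sketch below computes a log2 ratio of the two channel medians for every spot in one scan. The table name Primary_Results is an assumption inferred from the Primary_Results_ID column, the column types and sample values are invented for the example, and SQLite again stands in for Oracle.

```python
import math
import sqlite3

conn = sqlite3.connect(":memory:")

# Column names from the slide; table name and types are assumptions.
conn.execute(
    """
    CREATE TABLE Primary_Results (
        Primary_Results_ID INTEGER PRIMARY KEY,
        Scan_ID    INTEGER NOT NULL,
        Spot_ID    INTEGER NOT NULL,
        Ch1_Median REAL,
        Ch1_Mean   REAL,
        Ch2_Median REAL,
        Ch2_Mean   REAL
    )
    """
)

# Hypothetical intensities for two scans.
rows = [
    (1, 1, 11, 200.0, 210.0, 800.0, 790.0),
    (2, 1, 12, 400.0, 395.0, 400.0, 405.0),
    (3, 1, 13, 0.0, 2.0, 500.0, 510.0),   # zero signal, skipped below
    (4, 2, 11, 100.0, 105.0, 50.0, 55.0),
]
conn.executemany("INSERT INTO Primary_Results VALUES (?, ?, ?, ?, ?, ?, ?)", rows)


def log_ratios(conn, scan_id):
    """Return {Spot_ID: log2(Ch2_Median / Ch1_Median)} for one scan.

    Spots with a non-positive median in either channel are left out,
    because the log ratio is undefined for them.
    """
    cursor = conn.execute(
        "SELECT Spot_ID, Ch1_Median, Ch2_Median FROM Primary_Results "
        "WHERE Scan_ID = ? AND Ch1_Median > 0 AND Ch2_Median > 0 "
        "ORDER BY Spot_ID",
        (scan_id,),
    )
    return {spot: math.log2(ch2 / ch1) for spot, ch1, ch2 in cursor}


scan_1 = log_ratios(conn, 1)
scan_2 = log_ratios(conn, 2)
```

Because Spot_ID is shared across scans, the same query pattern supports following one gene of interest across several result sets.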
Gena Database Schema
Normalisation and Analysis
- Three sets of normalised data reflecting three normalisation methods
- 1st: Most popular
- 2nd: Second most popular
- 3rd: Flavour of the month
- Normalisation performed internal to the database
- Each new normalisation technique requires all result sets to be re-normalised
- Analysis can be carried out on any of the three normalised result sets
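The slides do not name the three normalisation methods, so the following is only a sketch of what one of them could look like. It uses global median centring of log2 ratios as a stand-in technique; the function name and the sample intensities are hypothetical, and GENAdb performs normalisation inside the database rather than in client code like this.

```python
import math
from statistics import median


def median_normalise(ch1, ch2):
    """Return log2(ch2 / ch1) ratios shifted so that their median is zero.

    ch1 and ch2 are equal-length sequences of positive channel
    intensities for the spots of one result set.
    """
    ratios = [math.log2(b / a) for a, b in zip(ch1, ch2)]
    centre = median(ratios)
    return [r - centre for r in ratios]


# Hypothetical intensities for three spots.
ch1 = [100.0, 200.0, 400.0]
ch2 = [200.0, 800.0, 3200.0]
normalised = median_normalise(ch1, ch2)
```

Since the shift depends on every spot in the result set, a change of technique means recomputing the stored values for all result sets, which is the cost noted in the list above.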
Communications Model
Implementation of Gena
- GENAdb runs as an Oracle database
- Oracle database hosted on a dedicated NT/2000 server managed by IT Support Group
- A separate (but connected) web server runs Apache JServer
- Java servlets used to
- Generate web pages
- Format, process and store data
- Oracle/Java combination makes Gena portable to Unix/Linux platforms
- R will run within the structure for statistical analysis tasks
Timeline
- 3 months
- Ready to load GPR files
- 6 months
- 90% of expected functionality implemented
- 9-12 months
- Refine existing processes
- Implement remaining processes
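To give a feel for the first milestone, here is a rough sketch of reading a GPR (GenePix Results) file before loading it. It assumes the usual tab-delimited ATF layout: an "ATF" line, a line giving the number of optional header records and the number of data columns, the header records, a row of column titles, then one row per spot. The function name and the sample content are hypothetical, and the actual GENAdb loader is written as Java servlets, not Python.

```python
import csv
import io


def read_gpr(text):
    """Parse GPR/ATF text into a list of dicts, one per spot row."""
    lines = text.splitlines()
    if not lines or not lines[0].startswith("ATF"):
        raise ValueError("not an ATF file")
    # Second line: <number of header records> <number of data columns>
    n_header = int(lines[1].split("\t")[0])
    # Skip the two fixed lines plus the optional header records.
    body = "\n".join(lines[2 + n_header:])
    reader = csv.DictReader(io.StringIO(body), delimiter="\t")
    return list(reader)


# Hypothetical, heavily shortened file content.
SAMPLE = "\n".join(
    [
        "ATF\t1.0",
        "2\t5",
        '"Type=GenePix Results 3"',
        '"Wavelengths=635\t532"',
        '"Block"\t"Column"\t"Row"\t"F635 Median"\t"F532 Median"',
        "1\t1\t1\t800\t200",
        "1\t2\t1\t400\t400",
    ]
)

spots = read_gpr(SAMPLE)
```

All values come back as strings; converting them to numbers and mapping them onto result columns would be a separate step.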
- Tell me more
- Gena mailing list
- gena_at_pi.csiro.au
- Gena development homepage
- www.pi.csiro.au/gena/
- Feedback is welcome
- Acknowledgements
- Chris Helliwell
- Iain Wilson
- IT Support Group
- Robert Power (CMIS)