Title: ArrayTrack Data management, analysis and interpretation tool for DNA microarray and beyond
1 ArrayTrack: Data management, analysis and interpretation tool for DNA microarray and beyond
- Weida Tong
- Director, Center for Toxicoinformatics, NCTR/FDA
- Weida.tong_at_fda.hhs.gov
2 ArrayTrack: A brief history over 5 years
Development Cycle
- AT version 1 (2001)
  - Filter array data management tool
- AT version 2 (2002): in-house microarray core facility
  - Customized two-color array data management, analysis and interpretation
  - Open to public (late 2003)
- AT version 3.1 (2004): VGDS
  - Affymetrix analysis capability enhanced
- AT version 3.2 (2005): MAQC
  - Tested on 7 commercial platforms (Affy, Agilent one- and two-color arrays, ABI, CodeLink, Illumina)
  - Integrated with other software (IPA, MetaCore, DrugMatrix, CEBS, SAS/JMP)
- AT version 4 (2006 to present)
  - CDISC/SEND standard
  - VGDS → VXDS
3 ArrayTrack: Client-Server Architecture
CLIENT
Analysis Tools
Public data (gene annotation, pathways)
Study data (Clinical and non-clinical data)
Microarray Proteomics Metabolomics
SERVER
CDISC/SEND
MIAME
NCBI, KEGG, GO
4 ArrayTrack: An Integrated Solution
Clinical and non-clinical data
Chemical data
ArrayTrack
5 ArrayTrack: Freely Available to Public
Web-access
Local installation
Number of unique users accessing the locally installed version of ArrayTrack
Number of unique users accessing the web version of ArrayTrack
6 ArrayTrack Website
http://www.fda.gov/nctr/science/centers/toxicoinformatics/ArrayTrack/
7 DNA Microarray
Key advantage: simultaneously measures tens of thousands of transcripts in a single experiment
Called array, chip or slide
Spot (DNA probe): oligo (25-80 mer) or cDNA
Principle: hybridization of known DNA probes on the chip with complementary DNA sequences from the sample
Substrate: glass, nylon or plastic
8 Application 1 - Mechanistic Study
Treated rats
Untreated rats
Comparing
Identify the genes affected in the treated condition, i.e., identify the differentially expressed genes (DEGs)
Mechanism
9 Data Format of DNA Microarray
[Figure: data matrix of genes (1 to N) by experiments (1 to m), used to assess differential expression]
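A minimal sketch of this data layout, assuming genes are stored as rows and experiments as columns (the orientation cannot be recovered from the slide). The gene IDs, experiment IDs and values are made up for illustration.

```python
# Hypothetical expression matrix: one row per gene, one column per experiment.
gene_ids = ["gene_1", "gene_2", "gene_3"]               # N = 3 genes (illustrative)
experiment_ids = ["exp_1", "exp_2", "exp_3", "exp_4"]   # m = 4 experiments (illustrative)

matrix = [
    [7.1, 7.3, 9.8, 9.6],
    [5.0, 5.2, 5.1, 4.9],
    [8.4, 8.1, 6.0, 6.2],
]

def profile(gene_id):
    """Return the expression profile of one gene across all experiments."""
    row = matrix[gene_ids.index(gene_id)]
    return dict(zip(experiment_ids, row))

def experiment_column(experiment_id):
    """Return the measurements of every gene in one experiment (one array)."""
    col = experiment_ids.index(experiment_id)
    return {gene: matrix[i][col] for i, gene in enumerate(gene_ids)}
```

Differential expression is then a per-gene (per-row) comparison between the columns of two groups of experiments.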
11 Complexity of Microarray Experiment: An Array of Options
Affymetrix (GeneChip, 25-mer, in-situ synthesis, one-color)
Agilent (60-mer, in-situ synthesis, two-color)
Applied Biosystems (60-mer, chemiluminescence)
Clontech (70-80-mer)
GE Healthcare (CodeLink, 30-mer, one-color)
Illumina (BeadArray, 70-mer)
MWG (50-mer)
NimbleGen (MAS in-situ synthesis)
Operon (Qiagen) (70-mer)
Customized long oligo or cDNA arrays
12 ArrayTrack for Microarray Data Management and Analysis
Hypothesis
Exp Design
Microarray Exp
Data management
Data analysis
Data interpretation
13 MicroarrayDB: Storing data associated with a microarray experiment
- Microarray database
- Handling both one- and two-channel data, including Affy data
- Only the CEL file is required for Affy data
- Supporting toxicogenomics research by storing tox parameters, e.g., dose schedule and treatment, sacrifice time
- MIAME-supportive, capturing the key data of a microarray experiment
- Will be MAGE-ML compliant to ensure inter-exchangeability between ArrayTrack and other public databases
Microarray DB
14 LIB Component: Containing functional information for microarray data interpretation
- Functional data
- Individual gene analysis
- Pathway-based analysis
- Gene Ontology based analysis
- Linking expression data to the traditional
toxicological data
Microarray DB
LIB
15 TOOL Component: Containing functionality for microarray data analysis
- Analysis tools
- Four normalization methods
- Mean/median scaling for affy data
- LOWESS for 2-color array
- Gene selection method
- T-test, permutation t-test,
- Filtering using fold changes, intensity, flag inf
- Volcano plot, p-value plot
- Data exploring (e.g., HCA, PCA)
- Many visualization tools (e.g., flexible scatter
plot, Bar chart viewer,
TOOL
Microarray DB
LIB
16 [Diagram: TOOL, Microarray DB and LIB components]
18 Supporting Eight Platforms
- Affy, Agilent, ABI, Combimatrix, Eppendorf, GE Healthcare, Illumina and customized arrays
- Affy data
- Probe data (.cel file)
- Probe-set data
Individual hyb import
Batch import
19 Comparing ArrayTrack-derived Gene Lists with those reported by the sponsor
- The gene lists presented in the submitted report
20 Normalization Methods
Four common normalization methods for converting Affy probe data to probe-set data: MAS5, dChip, RMA and PLIER
Five common normalization methods for other platforms, including LOWESS
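As an illustration of the simplest kind of normalization, here is a minimal sketch of median scaling, which rescales each array so that all arrays share the same median intensity. This is not ArrayTrack's implementation; the function name and the target value of 1000 are assumptions chosen for the example.

```python
from statistics import median

def median_scale(intensities, target=1000.0):
    """Rescale one array's intensities so that their median equals `target`.

    `target` is an arbitrary illustrative value, not an ArrayTrack default.
    """
    m = median(intensities)
    if m <= 0:
        raise ValueError("median intensity must be positive")
    factor = target / m
    return [x * factor for x in intensities]

# Two hypothetical arrays that differ only by overall brightness.
array_a = [200.0, 400.0, 800.0]
array_b = [100.0, 200.0, 400.0]
scaled_a = median_scale(array_a)
scaled_b = median_scale(array_b)
```

After scaling, the two arrays become directly comparable because the global intensity difference has been removed. Methods such as LOWESS go further and correct intensity-dependent bias, which a single scaling factor cannot do.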
21 Gene Selection
- Significant genes can be identified based on
  - T-test (with or without Bonferroni correction) and permutation t-test
  - False Discovery Rate (FDR) (e.g., Benjamini-Hochberg, p-value plot)
  - Volcano plot (considering both p and fold change)
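The Benjamini-Hochberg procedure named above can be sketched as follows. This is a generic textbook version of the step-up procedure, not ArrayTrack's code; the function name and the example p-values are made up for illustration.

```python
def benjamini_hochberg(p_values, fdr=0.05):
    """Return the indices of hypotheses rejected at the given FDR level.

    Step-up rule: sort the p-values, find the largest rank k (1-based)
    with p_(k) <= (k / n) * fdr, and reject the k smallest p-values.
    """
    n = len(p_values)
    order = sorted(range(n), key=lambda i: p_values[i])
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / n * fdr:
            cutoff_rank = rank
    return sorted(order[:cutoff_rank])

# Hypothetical per-gene p-values.
p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205]
selected = benjamini_hochberg(p, fdr=0.05)
```

With these eight p-values, a Bonferroni cutoff of 0.05 / 8 = 0.00625 would keep only the first gene, while the FDR procedure keeps the first two.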
22 Microarray Experiment Results
- Treated group
- Replicates
- Cancer cell lines
- Tumor tissues
Control group: a set of control samples
Fold change: up-regulated or down-regulated
P: statistical significance
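A minimal sketch of how fold change and p-value combine for one gene, assuming raw (non-log) positive intensities. The function names, the example intensities and the cutoffs (2-fold change, p < 0.05) are illustrative assumptions, not values taken from ArrayTrack.

```python
import math
from statistics import mean

def log2_fold_change(treated, control):
    """log2 of (mean treated / mean control); positive means up-regulated."""
    return math.log2(mean(treated) / mean(control))

def regulation(log2_fc):
    """Translate the sign of the log2 fold change into a direction label."""
    if log2_fc > 0:
        return "up-regulated"
    if log2_fc < 0:
        return "down-regulated"
    return "unchanged"

def passes_volcano_cutoffs(log2_fc, p_value, min_abs_log2_fc=1.0, max_p=0.05):
    """Volcano-style selection: require both a large change and a small p."""
    return abs(log2_fc) >= min_abs_log2_fc and p_value < max_p

# Hypothetical intensities of one gene in treated and control replicates.
treated = [400.0, 420.0, 380.0]
control = [100.0, 95.0, 105.0]
lfc = log2_fold_change(treated, control)
```

Here the treated mean is four times the control mean, so the log2 fold change is 2. The p-value is supplied separately, for example from a t-test.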
23 Gene Selection: T-test, Bonferroni adjustment and beyond
Two types of experiment and the error rate for the experiment:
Single testing (1 gene): $P < 0.05$, low error rate
Multiple testing (n genes): $P = 1 - (1 - P_i)^n$; if $P_i = 0.05$, high error rate, e.g., if $n = 10$ and $P_i = 0.05$, $P = 0.401$
Select a gene list based on
P value
Bonferroni criterion
Low sensitivity
Low power
False discovery rate (e.g., Benjamini-Hochberg, p-value plot)
Permutation t-test (e.g., SAM)
Volcano plot (combination of p and fold change)
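The multiple-testing arithmetic on this slide can be checked with a short sketch. It assumes the n tests are independent, which is what the formula 1 - (1 - Pi)^n implies; the function names are chosen for illustration.

```python
def familywise_error_rate(alpha_per_test, n_tests):
    """Probability of at least one false positive among n independent tests."""
    return 1.0 - (1.0 - alpha_per_test) ** n_tests

def bonferroni_threshold(alpha, n_tests):
    """Per-test cutoff that keeps the overall error rate at or below alpha."""
    return alpha / n_tests

fwer_10 = familywise_error_rate(0.05, 10)       # the slide's example: n = 10
cutoff_10k = bonferroni_threshold(0.05, 10000)  # a typical array-sized n
```

For n = 10 and a per-test rate of 0.05 the overall error rate is about 0.401, matching the slide. For 10,000 genes the Bonferroni cutoff drops to 0.000005 per gene, which is why the criterion has low sensitivity and low power on microarray data.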
24 Data Interpretation
- Pathway-based tools
- Ingenuity Pathways Analysis
- KEGG
- PathArt
GOFFA: Gene Ontology-based tool
Gene Annotation
25 Data Interpretation: Pathway-based analysis using KEGG Library
- KEGG: Kyoto Encyclopedia of Genes and Genomes (http://www.genome.jp/kegg/).
- It provides a free database of metabolic, regulatory and disease pathways; most of them are metabolic pathways
- ArrayTrack contains pathways for human (134), rat (116) and mouse (124)
- Click KEGG in GeneLib and the genes are reorganized based on their involved pathways
26 Data Interpretation: Pathway-based analysis using PathArt
- PathArt (Jubilant) is a pathway database that contains over 600 mammalian disease and signaling pathways.
- The pathways are collated through manual curation from literature and public domain databases.
- ArrayTrack contains PathArt pathways for human (276), rat (116) and mouse (77)
- Click PathArt in GeneLib and the genes are reorganized based on their involved pathways (see next slide)
27 Ingenuity Pathways Analysis (IPA)
Ingenuity Pathways Analysis
Conduct statistical analysis
Interrogate genes or proteins on omics scale
Elucidate functional pathways
Understand markers of efficacy and safety
- KEGG and PathArt provide canonical pathways
- IPA provides both canonical and de-novo pathways
28 Data Interpretation: Gene Ontology Analysis (GOFFA)
29 Data Interpretation: GO-based analysis using GOFFA
- GOFFA: Gene Ontology For Functional Analysis
- It is developed based on the Gene Ontology (GO) database
- Important for grouping the genes into functional classes
- GO: three ontologies
  - Molecular function: activities performed by individual gene products at the molecular level, such as catalytic activity, transporter activity, binding
  - Biological process: broad biological goals accomplished by ordered assemblies of molecular functions, such as cell growth, signal transduction, metabolism
  - Cellular component: the place in the cell where a gene product is found, such as nucleus, ribosome, proteasome
30 Data Interpretation: GO-based analysis using GOFFA
- Each ontology (e.g., molecular function) is presented as a hierarchical tree structure
- Each node is a GO term that contains several known genes
- Levels represent the specificity of terms
P-Path View
Genes in a specific GO term (node)
Tree view
Hierarchical tree
Genes are searched against GO
Fisher test
Number of genes
Searching panel
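One common way to apply a Fisher test to a GO term is as a one-sided enrichment test: given how many genes on the array belong to the term, is the term over-represented in the selected gene list? The sketch below assumes that setup. It is not GOFFA's code, and the function and parameter names are hypothetical.

```python
from math import comb

def fisher_enrichment_p(k, n, K, N):
    """One-sided Fisher exact p-value for over-representation.

    N: genes on the array (background)
    K: background genes annotated to the GO term
    n: genes in the selected list
    k: selected genes annotated to the GO term
    Returns P(X >= k) under the hypergeometric distribution.
    """
    upper = min(n, K)
    total = comb(N, n)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / total

# Toy example: 10 genes on the array, 4 in the term, 3 selected, 2 of them in the term.
p_toy = fisher_enrichment_p(k=2, n=3, K=4, N=10)
```

In the toy example, 40 of the 120 possible gene lists contain at least two term members, so the p-value is 1/3. A small p-value for a node suggests that its GO term is enriched in the gene list.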
32 Data Exploring
- Before gene selection: remove outliers
  - Mixed scatter plot
  - Principal component analysis (PCA)
- After gene selection: drill-down analysis
  - Bar chart
  - Hierarchical clustering analysis (HCA)
33 Data Exploring
Expression profile
PCA
HCA
35 Toxicogenomics Study
- Toxicology parameters: clinical pathology data (clinical chemistry, hematology), histopathology, liver weight
- Gene expression data
- Other omics data
37 Study Data Management and Analysis
- FDA eSubmission efforts
- Clinical data: Clinical Data Interchange Standards Consortium (CDISC)
- Non-clinical data: Standard for Exchange of Nonclinical Data (SEND)
- Subject, treatment, clinical pathology, histopathology,
- Conforming to SDTM used for CDISC/SEND
- Microarray data management and analysis are processed in the Array Domain, and the findings are available to correlate with data in the Study Domain
38 ArrayTrack Tutorial
41 Topic 1: Comparing two groups (e.g., treated vs. control groups)
- The array data are uploaded into ArrayTrack and normalized, and now what?
- You are going to learn how to determine the differentially expressed genes (DEGs) and make sense of them using the ArrayTrack analysis and library functions
Select a set of saved arrays
Biological interpretation
Divide the arrays into treated and control groups
Individual gene analysis
Pathway analysis
T-test
Determine differentially expressed genes (DEGs)
Gene Ontology Analysis
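The group-comparison step of this workflow can be sketched for a single gene as follows. The example uses an exact permutation test on Welch's t statistic, a swapped-in stand-in for the t-test step and not ArrayTrack's implementation; the function names and expression values are made up.

```python
from itertools import combinations
from math import sqrt
from statistics import mean, variance

def welch_t(a, b):
    """Welch's t statistic for two independent samples."""
    se = sqrt(variance(a) / len(a) + variance(b) / len(b))
    diff = mean(a) - mean(b)
    if se == 0:
        return 0.0 if diff == 0 else float("inf") * (1 if diff > 0 else -1)
    return diff / se

def permutation_p_value(treated, control):
    """Exact two-sided permutation p-value over all treated/control relabelings."""
    pooled = list(treated) + list(control)
    observed = abs(welch_t(treated, control))
    hits = total = 0
    for idx in combinations(range(len(pooled)), len(treated)):
        chosen = set(idx)
        a = [pooled[i] for i in idx]
        b = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        total += 1
        # Small tolerance so that ties with the observed statistic are counted.
        if abs(welch_t(a, b)) >= observed - 1e-12:
            hits += 1
    return hits / total

# Hypothetical log-scale expression values of one gene, six arrays per group.
treated = [9.1, 9.4, 8.9, 9.6, 9.2, 9.3]
control = [7.0, 7.2, 6.9, 7.1, 7.3, 6.8]
p_value = permutation_p_value(treated, control)
```

With six arrays per group there are 924 possible relabelings, so the smallest achievable two-sided p-value is 2/924. Repeating the test for every gene and then applying a multiple-testing correction yields the DEG list that feeds the gene, pathway and Gene Ontology analyses.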
42 Examination of Transcriptional Fingerprints of Primary Rat Hepatocytes Exposed to Cadmium Acetate
- Examine the effect of Cd treatment on rat hepatocytes at multiple doses and time points
- Affy chip RT-U34 (1030 genes)
- Only one dose and one time point are used: 2 mg and 12 hrs
- 12 hybridizations: 6 treated vs. 6 control
B
Hepatocytes
12 hrs later: 4 hybs for each animal, 12 hybs in total.
Control
C
D
Treated with Cd (2 mg)
Naming: Dose_Time_BioRep_TechRep (e.g., D0_T12_C_a, D0_T12_C_b)
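A small sketch of how names following the Dose_Time_BioRep_TechRep convention could be split into fields and grouped by dose. The helper names are hypothetical, and the treated name `D2_T12_C_a` is an assumed example of the pattern; only the two D0 names appear on the slide.

```python
from typing import NamedTuple

class Hyb(NamedTuple):
    dose: str      # e.g., "D0" for the control dose
    time: str      # e.g., "T12" for 12 hrs
    bio_rep: str   # biological replicate (animal), e.g., "C"
    tech_rep: str  # technical replicate, e.g., "a"

def parse_hyb_name(name):
    """Split a 'Dose_Time_BioRep_TechRep' hybridization name into its fields."""
    parts = name.split("_")
    if len(parts) != 4:
        raise ValueError(f"expected 4 underscore-separated fields, got {name!r}")
    return Hyb(*parts)

def group_by_dose(names):
    """Map each dose label to the list of hybridization names with that dose."""
    groups = {}
    for name in names:
        groups.setdefault(parse_hyb_name(name).dose, []).append(name)
    return groups

# "D2_T12_C_a" is an assumed treated-sample name, not taken from the slide.
groups = group_by_dose(["D0_T12_C_a", "D0_T12_C_b", "D2_T12_C_a"])
```

Grouping by the dose field gives the control set (D0) and the treated set needed for the two-group comparison.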