ArrayTrack: Data management, analysis and interpretation tool for DNA microarray and beyond (PowerPoint presentation)



1
ArrayTrack: Data management, analysis and
interpretation tool for DNA microarray and beyond
  • Weida Tong
  • Director, Center for Toxicoinformatics, NCTR/FDA
  • Weida.tong_at_fda.hhs.gov

2
ArrayTrack: A brief history of the past 5 years
Development Cycle
  • AT version 1 (2001)
  • Filter array data management tool
  • AT version 2 (2002): in-house microarray core
    facility
  • Customized two-color array data management,
    analysis and interpretation
  • Opened to the public (late 2003)
  • AT version 3.1 (2004): VGDS
  • Affymetrix analysis capability enhanced
  • AT version 3.2 (2005): MAQC
  • Tested on 7 commercial platforms (Affy, Agilent
    one- and two-color arrays, ABI, CodeLink,
    Illumina, ...)
  • Integrated with other software (IPA, MetaCore,
    DrugMatrix, CEBS, SAS/JMP, ...)
  • AT version 4 (2006 to present)
  • CDISC/SEND standard
  • VGDS → VXDS

3
ArrayTrack Client-Server Architecture
[Diagram labels: Client; analysis tools; pub data (gene annotation, pathways); study data (clinical and non-clinical data); microarray, proteomics, metabolomics; Server; CDISC/SEND; MIAME; NCBI, KEGG, GO]
4
ArrayTrack: An Integrated Solution
[Diagram labels: clinical and non-clinical data; chemical data; ArrayTrack]
5
ArrayTrack: Freely Available to the Public
  • Web access
  • Local installation
[Charts: number of unique users accessing the locally installed version of ArrayTrack; number of unique users accessing the web version of ArrayTrack]
6
ArrayTrack Website
http://www.fda.gov/nctr/science/centers/toxicoinformatics/ArrayTrack/
7
DNA Microarray
Key advantage: simultaneously measures the transcription
of tens of thousands of genes in a single experiment
Also called an array, chip or slide
Spot (DNA probe): oligonucleotide (25-80-mer) or cDNA
Principle: hybridization of known DNA probes on
the chip with complementary DNA sequences from the
sample
Substrate: glass, nylon or plastic
8
Application 1 - Mechanistic Study
[Diagram: treated rats vs. untreated rats → comparing → identify the genes affected in the treated condition (identification of differentially expressed genes, DEGs) → mechanism]
9
Data Format of DNA Microarray
[Figure: data matrix with genes 1 ... N along one axis and experiments 1 ... m along the other; each cell holds a differential expression value]
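The genes-by-experiments layout above can be sketched as a list of rows, one per gene, with one column per experiment. The gene names, experiment labels and values are hypothetical toy data, and each cell is assumed to hold a log2 ratio (treated vs. control); ArrayTrack's internal storage is not described in the slides.

```python
# Hypothetical toy data: N = 3 genes (rows) x m = 4 experiments (columns).
# Each cell is assumed to be a log2 expression ratio (treated vs. control).
GENES = ["GeneA", "GeneB", "GeneC"]
EXPERIMENTS = ["Exp1", "Exp2", "Exp3", "Exp4"]
MATRIX = [
    [1.2, 0.9, 1.1, 1.3],      # GeneA: up-regulated in every experiment
    [-0.8, -1.1, -0.9, -1.0],  # GeneB: down-regulated in every experiment
    [0.1, -0.2, 0.0, 0.1],     # GeneC: roughly unchanged
]


def gene_profile(gene):
    """Return one row: a gene's values across all m experiments."""
    return MATRIX[GENES.index(gene)]


def experiment_column(experiment):
    """Return one column: all N genes' values in a single experiment."""
    j = EXPERIMENTS.index(experiment)
    return [row[j] for row in MATRIX]


print(gene_profile("GeneB"))
print(experiment_column("Exp2"))
```

Row access gives a gene's expression profile; column access gives one hybridization, which is the view that normalization works on.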
10
(No Transcript)
11
Complexity of Microarray Experiments: An Array
of Options
  • Affymetrix (GeneChip, 25-mer, in-situ synthesis, one-color)
  • Agilent (60-mer, in-situ synthesis, two-color)
  • Applied Biosystems (60-mer, chemiluminescence)
  • Clontech (70-80-mer)
  • GE Healthcare (CodeLink, 30-mer, one-color)
  • Illumina (BeadArray, 70-mer)
  • MWG (50-mer)
  • NimbleGen (MAS in-situ synthesis)
  • Operon (Qiagen) (70-mer)
  • Customized long oligo or cDNA arrays
12
ArrayTrack for Microarray Data Management and
Analysis
[Workflow: hypothesis → experimental design → microarray experiment → data management → data analysis → data interpretation]
13
MicroarrayDB: Storing data associated with a
microarray experiment
  • Microarray database
  • Handling both one- and two-channel data,
    including Affy data
  • Only the CEL file is required for Affy data
  • Supporting toxicogenomics research by storing tox
    parameters, e.g., dose schedule and treatment,
    sacrifice time
  • MIAME-supportive, capturing the key data of a
    microarray experiment
  • Will be MAGE-ML compliant to ensure data
    exchangeability between ArrayTrack and other
    public databases

[Diagram: Microarray DB]
14
LIB Component: Containing functional
information for microarray data interpretation
  • Functional data
  • Individual gene analysis
  • Pathway-based analysis
  • Gene Ontology-based analysis
  • Linking expression data to traditional
    toxicological data

[Diagram: Microarray DB, LIB]
15
TOOL Component: Containing functionality for
microarray data analysis
  • Analysis tools
  • Four normalization methods
  • Mean/median scaling for Affy data
  • LOWESS for two-color arrays
  • Gene selection methods
  • T-test, permutation t-test
  • Filtering using fold change, intensity, flag information
  • Volcano plot, p-value plot
  • Data exploration (e.g., HCA, PCA)
  • Many visualization tools (e.g., flexible scatter
    plot, bar chart viewer, ...)
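Mean/median scaling, listed above for Affy data, multiplies every array by a single factor so that a chosen summary statistic (mean or median) takes the same value on all arrays. A minimal sketch, assuming raw (non-log) intensities and, for illustration only, using the first array's statistic as the default target; `scale_arrays` is a hypothetical name, not an ArrayTrack API.

```python
from statistics import mean, median


def scale_arrays(arrays, stat=median, target=None):
    """Global (mean/median) scaling of raw intensities.

    arrays: list of arrays, each a list of raw intensities for one chip.
    stat:   statistics.median or statistics.mean.
    target: value the statistic should take on every array; if None,
            the first array's statistic is used (an illustrative choice).
    """
    if target is None:
        target = stat(arrays[0])
    scaled = []
    for values in arrays:
        factor = target / stat(values)  # one scaling factor per array
        scaled.append([v * factor for v in values])
    return scaled


raw = [[100, 200, 300, 5000], [50, 100, 150, 2500]]
print(scale_arrays(raw))             # median scaling
print(scale_arrays(raw, stat=mean))  # mean scaling
```

Median scaling is less sensitive than mean scaling to a few very bright spots, such as the 5000 value in the toy data.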

[Diagram: TOOL, Microarray DB and LIB components]
16
[Diagram: TOOL, Microarray DB and LIB components]
17
(No Transcript)
18
Supporting Eight Platforms
  • Affy, Agilent, ABI, Combimatrix, Eppendorf, GE
    Healthcare, Illumina and customized arrays
  • Affy data
  • Probe data (.cel file)
  • Probe-set data

[Screenshots: individual hyb import; batch import]
19
Comparing ArrayTrack-derived Gene Lists with
those reported by the sponsor
  • The gene lists presented in the submitted
    report
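Comparing an ArrayTrack-derived gene list with a sponsor's list reduces to set operations. A minimal sketch, assuming each list is a collection of gene identifiers; the function name, the returned keys and the overlap percentage (shared genes relative to the sponsor's list) are illustrative choices, not ArrayTrack's reported metric.

```python
def compare_gene_lists(ours, sponsor):
    """Compare an ArrayTrack-derived gene list with a sponsor-reported one."""
    ours, sponsor = set(ours), set(sponsor)
    shared = ours & sponsor
    return {
        "shared": sorted(shared),
        "only_ours": sorted(ours - sponsor),
        "only_sponsor": sorted(sponsor - ours),
        # Share of the sponsor's list reproduced by the re-analysis.
        "pct_of_sponsor": 100.0 * len(shared) / len(sponsor) if sponsor else 0.0,
    }


result = compare_gene_lists(["GeneA", "GeneB", "GeneC"],
                            ["GeneA", "GeneB", "GeneD", "GeneE"])
print(result["shared"], result["pct_of_sponsor"])
```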

20
Normalization Methods
Four common normalization methods for converting
Affy probe data to probe-set data: MAS5, dChip,
RMA and PLIER
Five common normalization methods for other
platforms, including LOWESS
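LOWESS normalization for two-color arrays is commonly done on an MA plot: M = log2(R/G) is regressed on A = 0.5 * log2(R * G) with a locally weighted fit, and the fitted trend is subtracted from M. The sketch below is a simplified LOWESS (a single pass of tricube-weighted local linear regression, without the robustness iterations of full LOWESS); the `frac` default and the function names are assumptions, and ArrayTrack's exact settings are not given in the slides.

```python
import math


def _tricube(u):
    """Tricube weight: (1 - |u|^3)^3 inside the window, 0 outside."""
    u = abs(u)
    return (1 - u ** 3) ** 3 if u < 1 else 0.0


def lowess_fit(x, y, frac=0.5):
    """Single-pass local linear regression with tricube weights."""
    n = len(x)
    k = min(n, max(2, math.ceil(frac * n)))  # points in each local window
    fitted = []
    for xi in x:
        # Bandwidth: distance to the k-th nearest point (the point itself included).
        h = sorted(abs(xj - xi) for xj in x)[k - 1] or 1e-12
        w = [_tricube((xj - xi) / h) for xj in x]
        sw = sum(w)
        xbar = sum(wj * xj for wj, xj in zip(w, x)) / sw
        ybar = sum(wj * yj for wj, yj in zip(w, y)) / sw
        sxx = sum(wj * (xj - xbar) ** 2 for wj, xj in zip(w, x))
        sxy = sum(wj * (xj - xbar) * (yj - ybar) for wj, xj, yj in zip(w, x, y))
        slope = sxy / sxx if sxx > 0 else 0.0
        fitted.append(ybar + slope * (xi - xbar))
    return fitted


def lowess_normalize(red, green, frac=0.5):
    """Return M values with the intensity-dependent (dye) trend removed."""
    m = [math.log2(r / g) for r, g in zip(red, green)]
    a = [0.5 * math.log2(r * g) for r, g in zip(red, green)]
    trend = lowess_fit(a, m, frac)
    return [mi - ti for mi, ti in zip(m, trend)]


# Example: a dye bias that grows with intensity is removed.
A = [float(v) for v in range(8, 16)]
M = [0.2 * v - 2 for v in A]
red = [2 ** (a + m / 2) for a, m in zip(A, M)]
green = [2 ** (a - m / 2) for a, m in zip(A, M)]
print([round(v, 6) for v in lowess_normalize(red, green)])
```

The per-point sort makes this quadratic in the number of spots; it is meant to show the technique, not to process a full array efficiently.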
21
Gene Selection
  • Significant genes can be identified based on:
  • T-test (with or without Bonferroni correction)
    and permutation t-test
  • False Discovery Rate (FDR) (e.g., Benjamini-Hochberg,
    p-value plot)
  • Volcano Plot (considering both p and fold-change)
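A permutation t-test, one of the gene selection options above, compares a gene's observed t statistic with the t statistics obtained after randomly reassigning arrays to the treated and control groups. A minimal per-gene sketch; the choice of Welch's t statistic, the number of permutations and the fixed seed are assumptions, not ArrayTrack settings.

```python
import math
import random
from statistics import mean, variance


def welch_t(a, b):
    """Welch's two-sample t statistic (unequal variances)."""
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    diff = mean(a) - mean(b)
    if se == 0:
        return 0.0 if diff == 0 else math.copysign(math.inf, diff)
    return diff / se


def permutation_t_test(treated, control, n_perm=2000, seed=0):
    """Two-sided permutation p value for the t statistic of one gene."""
    rng = random.Random(seed)
    observed = abs(welch_t(treated, control))
    pooled = list(treated) + list(control)
    n_t = len(treated)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # random relabelling of arrays into two groups
        if abs(welch_t(pooled[:n_t], pooled[n_t:])) >= observed:
            hits += 1
    # The +1 terms count the observed labelling, so p is never exactly 0.
    return (hits + 1) / (n_perm + 1)


treated = [8.1, 8.4, 8.3, 8.6, 8.2, 8.5]
control = [6.0, 6.2, 5.9, 6.1, 6.3, 6.05]
print(permutation_t_test(treated, control))
```

With 6 vs. 6 arrays there are only 924 distinct splits, so the smallest attainable p value is limited by the design, not just by `n_perm`.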

22
Microarray Experiment Results
  • Treated group
  • Replicates
  • Cancer cell lines
  • Tumor tissues

Control group: a set of control samples
Fold change: up-regulated or down-regulated
P: statistical significance
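Fold change and P can be combined as in a volcano plot: a gene is called up- or down-regulated only if it passes both a fold-change cutoff and a significance cutoff. A minimal sketch, assuming raw (non-log) intensities per group; the 2-fold (|log2 FC| >= 1) and P < 0.05 cutoffs, the P values and the function names are illustrative assumptions.

```python
import math
from statistics import mean


def log2_fold_change(treated, control):
    """log2 of the ratio of group means (raw, non-log intensities assumed)."""
    return math.log2(mean(treated) / mean(control))


def volcano_select(stats, lfc_cutoff=1.0, p_cutoff=0.05):
    """stats: dict gene -> (log2 fold change, p value).

    Returns (up, down): genes passing both the fold-change and P cutoffs.
    """
    up, down = [], []
    for gene, (lfc, p) in stats.items():
        if p >= p_cutoff:
            continue  # not statistically significant
        if lfc >= lfc_cutoff:
            up.append(gene)
        elif lfc <= -lfc_cutoff:
            down.append(gene)
    return sorted(up), sorted(down)


stats = {
    "GeneA": (log2_fold_change([400, 420], [100, 105]), 0.001),  # 4-fold up
    "GeneB": (-1.5, 0.01),  # significant, about 2.8-fold down
    "GeneC": (3.0, 0.20),   # large change but not significant
    "GeneD": (0.2, 0.001),  # significant but small change
}
print(volcano_select(stats))
```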
23
Gene Selection: T-test, Bonferroni adjustment
and beyond
Two types of experiment and the error rate for the experiment:
  • Single testing (1 gene): $P < 0.05$, low error rate
  • Multiple testing ($n$ genes): $P = 1 - (1 - P_i)^n$; if
    $P_i = 0.05$, the error rate is high, e.g., if $n = 10$
    and $P_i = 0.05$, then $P = 0.401$
Select a gene list based on:
  • P value
  • Bonferroni criterion (low sensitivity, low power)
  • False discovery rate (e.g., Benjamini-Hochberg,
    p-value plot)
  • Permutation t-test (e.g., SAM)
  • Volcano plot (combination of p and fold change)
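The multiple-testing error rate and the Bonferroni and Benjamini-Hochberg criteria can be sketched in a few lines. This is a minimal illustration, not ArrayTrack's implementation: the Bonferroni criterion is taken as $p \le \alpha/n$, Benjamini-Hochberg as the standard step-up procedure, and the example P values are hypothetical.

```python
def family_wise_error(p_single, n):
    """Probability of at least one false positive among n independent tests."""
    return 1 - (1 - p_single) ** n


def bonferroni(pvalues, alpha=0.05):
    """Indices of genes that pass the Bonferroni criterion p <= alpha / n."""
    n = len(pvalues)
    return [i for i, p in enumerate(pvalues) if p <= alpha / n]


def benjamini_hochberg(pvalues, q=0.05):
    """Indices of genes selected by the Benjamini-Hochberg step-up procedure.

    Sort the P values, find the largest rank k with p_(k) <= k / n * q,
    and select the k genes with the smallest P values.
    """
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if pvalues[i] <= rank / n * q:
            k_max = rank
    return sorted(order[:k_max])


print(round(family_wise_error(0.05, 10), 3))
pvals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
print(bonferroni(pvals), benjamini_hochberg(pvals))
```

With these ten hypothetical P values, Bonferroni keeps one gene and Benjamini-Hochberg keeps two, a small-scale instance of the low-sensitivity point made for the Bonferroni criterion.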
24
Data Interpretation
  • Pathway-based tools
  • Ingenuity Pathways Analysis
  • KEGG
  • PathArt

GOFFA: Gene Ontology-based tool
Gene annotation
25
Data Interpretation- Pathway-based analysis
using KEGG Library
  • KEGG: Kyoto Encyclopedia of Genes and Genomes
    (http://www.genome.jp/kegg/)
  • It provides a free database of metabolic,
    regulatory and disease pathways; most of them are
    metabolic pathways
  • ArrayTrack contains pathways for human (134), rat
    (116) and mouse (124)
  • Click KEGG in GeneLib and the genes are
    reorganized by the pathways they are involved in
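Reorganizing a gene list by pathway amounts to inverting a gene-to-pathway annotation. A minimal sketch; the `annotation` dictionary stands in for the KEGG library loaded into ArrayTrack, and its gene names and pathway assignments are hypothetical.

```python
from collections import defaultdict


def group_by_pathway(gene_list, annotation):
    """Map each pathway to the genes from gene_list that it contains.

    annotation: dict gene -> list of pathway names (hypothetical input).
    Genes without any annotation are skipped.
    """
    groups = defaultdict(list)
    for gene in gene_list:
        for pathway in annotation.get(gene, []):
            groups[pathway].append(gene)
    # Largest pathways first; ties broken alphabetically.
    return dict(sorted(groups.items(), key=lambda kv: (-len(kv[1]), kv[0])))


annotation = {
    "GeneA": ["Glutathione metabolism", "Metabolism of xenobiotics by cytochrome P450"],
    "GeneB": ["Glutathione metabolism"],
    "GeneC": ["Cell cycle"],
}
result = group_by_pathway(["GeneA", "GeneB", "GeneC", "GeneD"], annotation)
for pathway, genes in result.items():
    print(pathway, genes)
```

A gene can appear under several pathways, which is why the grouping is a many-to-many mapping rather than a partition.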

26
Data Interpretation- Pathway-based analysis
using PathArt
  • PathArt (Jubilant) is a pathway database that
    contains over 600 mammalian disease and signaling
    pathways.
  • The pathways are collated through manual curation
    from literature and public domain databases.
  • ArrayTrack contains PathArt pathways for human
    (276), rat (116) and mouse (77)
  • Click PathArt in GeneLib and the genes are
    reorganized based on their involved pathways (see
    next slide)

27
Ingenuity Pathways Analysis (IPA)
  • Conduct statistical analysis
  • Interrogate genes or proteins on an omics scale
  • Elucidate functional pathways
  • Understand markers of efficacy and safety
  • KEGG and PathArt provide canonical pathways
  • IPA provides both canonical and de novo pathways

28
Data Interpretation- Gene Ontology Analysis
(GOFFA)
29
Data Interpretation- GO-based analysis using
GOFFA
  • GOFFA: Gene Ontology For Functional Analysis
  • It is built on the Gene Ontology (GO)
    database
  • Important for grouping genes into functional
    classes
  • GO: three ontologies
  • Molecular function: activities performed by
    individual gene products at the molecular level,
    such as catalytic activity, transporter activity,
    binding
  • Biological process: broad biological goals
    accomplished by ordered assemblies of molecular
    functions, such as cell growth, signal
    transduction, metabolism
  • Cellular component: the place in the cell where a
    gene product is found, such as nucleus, ribosome,
    proteasome

30
Data Interpretation- GO-based analysis using
GOFFA
  • Each ontology (e.g., mol. function) is presented
    as a hierarchical tree structure
  • Each node is a GO term that contains several
    known genes
  • Levels represent the specificity of terms

[Screenshot labels: P-Path view; genes in a specific GO term (node); tree view (hierarchical tree); genes are searched against GO; Fisher test; number of genes; searching panel]
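The Fisher test in GOFFA asks whether a GO term contains more of the selected genes than expected by chance. A minimal sketch of the one-sided Fisher's exact test (equivalently, a hypergeometric upper tail); whether GOFFA uses the one- or two-sided form is not stated in the slides, and the counts in the example are hypothetical.

```python
from math import comb


def go_term_enrichment_p(k, n, K, N):
    """One-sided Fisher's exact test for over-representation of a GO term.

    N: genes on the array (background)
    K: genes on the array annotated to the GO term
    n: genes in the selected list (e.g., DEGs)
    k: selected genes annotated to the GO term
    Returns P(X >= k) for X ~ Hypergeometric(N, K, n).
    """
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1))
    return tail / comb(N, n)


# Hypothetical counts: 1000 genes on the array, 20 in the GO term,
# 50 DEGs, 6 of which fall in the term (about 1 expected by chance).
print(go_term_enrichment_p(6, 50, 20, 1000))
```

Exact integer arithmetic from `math.comb` avoids the overflow and rounding issues of computing factorials in floating point.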
31
(No Transcript)
32
Data Exploration
  • Before gene selection: remove outliers using the
    mixed scatter plot and principal component analysis (PCA)
  • After gene selection: drill-down analysis using the
    bar chart and hierarchical clustering analysis (HCA)
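Hierarchical clustering analysis (HCA) of expression profiles is often run with 1 - Pearson correlation as the distance and average linkage. A minimal agglomerative sketch under those assumptions (ArrayTrack's distance and linkage options are not listed in the slides); it returns the merge sequence that a dendrogram would draw, and the profiles in the example are hypothetical.

```python
from statistics import mean


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length profiles."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def average_linkage_hca(profiles):
    """Agglomerative clustering with distance 1 - r and average linkage.

    profiles: dict name -> list of expression values.
    Returns merges as (cluster_a, cluster_b, distance); clusters are tuples of names.
    """
    names = list(profiles)
    dist = {(a, b): 1 - pearson(profiles[a], profiles[b])
            for a in names for b in names if a != b}
    clusters = [(name,) for name in names]
    merges = []
    while len(clusters) > 1:
        # Find the pair of clusters with the smallest average pairwise distance.
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = mean(dist[(a, b)] for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append((clusters[i], clusters[j], d))
        merged = clusters[i] + clusters[j]
        clusters = [c for idx, c in enumerate(clusters) if idx not in (i, j)] + [merged]
    return merges


profiles = {
    "GeneA": [1, 2, 3, 4],
    "GeneB": [2, 4, 6, 8.5],
    "GeneC": [4, 3, 2, 1],
    "GeneD": [8, 6, 4, 1.5],
}
merges = average_linkage_hca(profiles)
for a, b, d in merges:
    print(a, b, round(d, 3))
```

A correlation-based distance groups genes by the shape of their profiles rather than their absolute level, so GeneA and GeneB cluster together despite different magnitudes.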

33
Data Exploration
[Screenshots: expression profile; PCA; HCA]
34
(No Transcript)
35
Toxicogenomics Study
  • Toxicology parameters: clinical pathology data
    (clinical chemistry, hematology),
    histopathology, liver weight
  • Gene expression data
  • Other omics data

36
(No Transcript)
37
Study Data Management and Analysis
  • FDA eSubmission efforts
  • Clinical data: Clinical Data Interchange
    Standards Consortium (CDISC)
  • Non-clinical data: Standard for Exchange of
    Nonclinical Data (SEND)
  • Subject, treatment, clinical pathology,
    histopathology, ...
  • Conforming to the Study Data Tabulation Model
    (SDTM) used for CDISC/SEND
  • Microarray data are managed and analyzed in the
    Array Domain, and the findings can be correlated
    with data in the Study Domain
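Correlating Array Domain findings with Study Domain data can be as simple as computing, for each gene, the correlation between its expression across animals and a study endpoint. A minimal sketch using Pearson correlation; it assumes expression and endpoint values are already aligned by animal, and the endpoint (liver weight in arbitrary units), gene names and values are hypothetical. This is not ArrayTrack's Array/Study Domain implementation.

```python
from statistics import mean


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def rank_genes_by_endpoint(expression, endpoint):
    """Rank genes by |r| between expression and a per-animal endpoint.

    expression: dict gene -> values, one per animal (same order as endpoint).
    endpoint:   one study-data value per animal, e.g., liver weight.
    """
    scores = {gene: pearson(values, endpoint) for gene, values in expression.items()}
    return sorted(scores.items(), key=lambda kv: -abs(kv[1]))


liver_weight = [1.0, 2.0, 3.0, 4.0]  # hypothetical, arbitrary units
expression = {
    "GeneA": [2, 4, 6, 8],  # rises with liver weight
    "GeneB": [1, 3, 2, 4],  # partly follows it
    "GeneC": [8, 6, 4, 2],  # falls as liver weight rises
}
ranked = rank_genes_by_endpoint(expression, liver_weight)
for gene, r in ranked:
    print(gene, round(r, 2))
```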

38
ArrayTrack Tutorial
39
40
41
Topic 1: Comparing two groups (e.g., treated vs.
control groups)
  • The array data are uploaded into ArrayTrack and
    normalized; now what?
  • You will learn how to determine the
    differentially expressed genes (DEGs) and make
    sense of them using the ArrayTrack analysis and
    library functions

[Workflow: select a set of saved arrays → divide the arrays into treated and control groups → t-test → determine differentially expressed genes (DEGs) → biological interpretation (individual gene analysis, pathway analysis, Gene Ontology analysis)]
42
Examination of Transcriptional Fingerprints of
Primary Rat Hepatocytes Exposed to Cadmium Acetate
  • Examine the effect of Cd treatment on rat hepatocytes at
    multiple doses and time points
  • Affy chip RT-U34 (1030 genes)
  • Only one dose and one time point are used: 2 mg
    and 12 hrs
  • 12 hybridizations: 6 treated vs. 6 control

[Diagram: hepatocytes from animals B, C and D; control vs. treated with Cd (2 mg); 12 hrs later, 4 hybs for each animal, 12 hybs in total. Naming: Dose_Time_BioRep_TechRep, e.g., D0_T12_C_a, D0_T12_C_b]
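The Dose_Time_BioRep_TechRep naming convention makes it possible to assign hybridizations to groups programmatically. A minimal sketch; it assumes every name has exactly four underscore-separated fields with "D" and "T" prefixes, treats dose 0 as control, and uses a hypothetical treated name (D2_T12_C_a) because only control names appear in the example.

```python
from collections import namedtuple

Sample = namedtuple("Sample", ["dose", "time_h", "bio_rep", "tech_rep"])


def parse_sample_name(name):
    """Parse a Dose_Time_BioRep_TechRep name such as 'D0_T12_C_a'."""
    dose, time, bio_rep, tech_rep = name.split("_")
    if not (dose.startswith("D") and time.startswith("T")):
        raise ValueError(f"unexpected sample name: {name!r}")
    return Sample(dose=dose[1:], time_h=int(time[1:]), bio_rep=bio_rep, tech_rep=tech_rep)


def split_groups(names):
    """Split names into (control, treated), treating dose '0' as control."""
    control, treated = [], []
    for name in names:
        group = control if parse_sample_name(name).dose == "0" else treated
        group.append(name)
    return control, treated


print(parse_sample_name("D0_T12_C_a"))
print(split_groups(["D0_T12_C_a", "D0_T12_C_b", "D2_T12_C_a"]))
```

Keeping the dose as a string avoids guessing how non-zero doses are encoded in the real sample names.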
43
(No Transcript)