Title: Prof. Yike Guo
1Data Management and Mining in BioArray
Informatics
- Prof. Yike Guo
- Dept. of Computing, Imperial College, London
2Goal
- Understand the basic bioarray technology,
including microarray technology for gene
expression, protein chips, NMR spectroscopy and
other high-throughput devices
- Learn the basic analytical technology and its
applications to bioarray information
- Learn the processes for processing and
analysing bioarray data (e.g. gene expression
analysis)
3Lecture Overview
- Lecture One: BioArray Informatics Introduction
- Lecture Two: BioArray Technology
- Lecture Three: Analysis Technology (1): Data Normalisation and Transformation
- Lecture Four: Analysis Technology (2): Clustering and Classification
- Lecture Five: Analysis Technology (3): Multivariate Statistics
- Lecture Six: Analysis Applications (1): Gene Expression Analysis
- Lecture Seven: Analysis Applications (2): Integrative Analysis of BioArray Data
4BioArray Informatics: Integrative Analysis of
BioArray Data within the Biological Context
secondary structure, tertiary structure
polymorphism, patient records, epidemiology
expression patterns, physiology
sequences, alignments
receptors, signals, pathways
ATGCAAGTCCCT AAGATTGCATAA GCTCGCTCAGTT
linkage maps, cytogenetic maps, physical maps
5Functional -Omics Analysis
REAL WORLD
INPUTS: NOXIOUS AGENT/STRESSOR
OUTPUTS: BIOLOGICAL END-POINTS, PATHOLOGY
ALTERED PHYSIOLOGY AND METABOLISM
6A Dynamics in BioArray Informatics
Interactions
Environment
DNA
Protein
Growth rate
Expression
7A mathematical model
8BioArray Provides the Means for Revealing the
Interaction
Relations
1- gene homologs
2- gene encodes a protein
3- protein can regulate the expression of a gene
4- protein phosphorylates another protein
5- protein binds to another protein
6- protein lyses another protein
7- proteins can sometimes be receptors
8- receptors bind a ligand
9- receptors (if bound) activate other proteins
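The relation types listed on this slide can be stored as typed edges between biological entities. The following is a minimal sketch only: the entity names (geneA, protA, receptorR and so on) and the relation labels are hypothetical placeholders chosen for illustration, not real genes or a standard vocabulary.

```python
from collections import defaultdict

# Each relation is a (subject, relation_type, object) triple.
# All entity names are hypothetical placeholders, not real genes or proteins.
RELATIONS = [
    ("geneA", "homolog_of", "geneB"),
    ("geneA", "encodes", "protA"),
    ("protA", "regulates_expression_of", "geneC"),
    ("protA", "phosphorylates", "protD"),
    ("protD", "binds", "protE"),
    ("receptorR", "binds_ligand", "ligandL"),
    ("receptorR", "activates", "protA"),
]


def build_index(relations):
    """Index triples by subject so outgoing relations can be looked up quickly."""
    index = defaultdict(list)
    for subject, rel_type, obj in relations:
        index[subject].append((rel_type, obj))
    return index


def downstream(index, start):
    """Return every entity reachable from `start` by following relations."""
    seen, stack = set(), [start]
    while stack:
        node = stack.pop()
        for _, target in index.get(node, []):
            if target not in seen:
                seen.add(target)
                stack.append(target)
    return seen


index = build_index(RELATIONS)
print(sorted(downstream(index, "receptorR")))
```

Following the edges from a receptor gives a first, crude view of which genes and proteins an interaction may influence.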
9BioArray Quantitative Measurement of Biological
Concepts
experiment
ORF
- R/G ratios
- R, G values
- quality indicators
control
- Microarrays
- 1000 bp hybridization
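The R/G ratio for a spot is usually examined on a log scale so that induction and repression are symmetric around zero. A minimal sketch, assuming hypothetical spot intensities and an illustrative `floor` parameter (not from the slides) that clamps very low intensities:

```python
import math


def log_ratio(red, green, floor=1.0):
    """Return log2(R/G); intensities below `floor` are clamped to avoid log(0)."""
    r = max(red, floor)
    g = max(green, floor)
    return math.log2(r / g)


# Hypothetical spot intensities: (ORF id, red channel, green channel).
spots = [("orf1", 800.0, 200.0), ("orf2", 150.0, 600.0), ("orf3", 500.0, 500.0)]
for orf, r, g in spots:
    print(orf, round(log_ratio(r, g), 2))
```

A four-fold increase and a four-fold decrease map to +2 and -2 respectively, which is why log ratios are easier to compare than raw ratios.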
10Quantitative Analysis
Reproducibility: use confidence intervals to find
significant deviations
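One way to turn replicate measurements into a significance call is to build a confidence interval around the mean log ratio and check whether it excludes zero. The sketch below uses a normal approximation; this is an assumption made for brevity, and with only a few replicates a Student t interval would be more appropriate. The replicate values are hypothetical.

```python
from statistics import NormalDist, mean, stdev


def confidence_interval(values, level=0.95):
    """Normal-approximation confidence interval for the mean of replicate values."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two replicates")
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    half_width = z * stdev(values) / n ** 0.5
    centre = mean(values)
    return centre - half_width, centre + half_width


def is_significant(values, level=0.95):
    """A gene deviates significantly if its interval excludes 0 (no change)."""
    low, high = confidence_interval(values, level)
    return low > 0 or high < 0


# Hypothetical replicate log2 ratios for two genes.
changed = [1.9, 2.1, 2.0, 2.2]
unchanged = [-0.3, 0.4, 0.1, -0.2]
print(is_significant(changed), is_significant(unchanged))
```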
11BioArray Informatics: BioArray is the data,
everything else is Informatics
- Data Engineering
- Data Warehousing
- Data Integration
- Data Analysis
- Knowledge Discovery
- Discovery Integration
- Discovery Validation
- Knowledge Integration
- Knowledge Warehousing
12Data Warehousing
Data Sources
External Data Sources
Operational Data Sources
Data Warehousing
13Example - ArrayExpress
14Data Warehousing and Data Integration
15Data Schema in Warehousing: A Gene Expression
Example
Gene Expression Warehouse
OMIM
Enzyme
Protein
Disease
Affy Fragment
Known Gene
Sequence
Pathway
SNP
Metabolite
Sequence Cluster
KEGG
Genbank
NMR
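The slide names the entities in the warehouse but not how they are linked. As a rough illustration only, the sketch below assumes a few plausible links (a fragment maps to a known gene, a gene belongs to a pathway, expression values are recorded per fragment and sample). All table names, column names and rows are hypothetical.

```python
import sqlite3

# In-memory database with a hypothetical subset of the warehouse entities.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE known_gene (gene_id TEXT PRIMARY KEY, name TEXT);
CREATE TABLE affy_fragment (
    fragment_id TEXT PRIMARY KEY,
    gene_id TEXT REFERENCES known_gene(gene_id)
);
CREATE TABLE pathway_member (
    pathway TEXT,
    gene_id TEXT REFERENCES known_gene(gene_id)
);
CREATE TABLE expression (
    fragment_id TEXT REFERENCES affy_fragment(fragment_id),
    sample TEXT,
    value REAL
);
""")
conn.executemany("INSERT INTO known_gene VALUES (?, ?)",
                 [("g1", "geneA"), ("g2", "geneB")])
conn.executemany("INSERT INTO affy_fragment VALUES (?, ?)",
                 [("f1", "g1"), ("f2", "g2")])
conn.executemany("INSERT INTO pathway_member VALUES (?, ?)",
                 [("pathwayX", "g1")])
conn.executemany("INSERT INTO expression VALUES (?, ?, ?)",
                 [("f1", "s1", 350.0), ("f2", "s1", 120.0)])


def pathway_expression(conn, pathway, sample):
    """Return (gene name, expression value) for genes in a pathway in one sample."""
    query = """
        SELECT g.name, e.value
        FROM pathway_member p
        JOIN known_gene g ON g.gene_id = p.gene_id
        JOIN affy_fragment f ON f.gene_id = g.gene_id
        JOIN expression e ON e.fragment_id = f.fragment_id
        WHERE p.pathway = ? AND e.sample = ?
        ORDER BY g.name
    """
    return conn.execute(query, (pathway, sample)).fetchall()


print(pathway_expression(conn, "pathwayX", "s1"))
```

The point of the warehouse is that such joins across annotation sources and measurements become a single query.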
16A Workflow of Gene Expression Database
Data in analysis
Warehousing
User defined dataset
Data Reduction Queries
Comparisons between 2 samples
Comparisons between multiple samples
Set Fold Change (e.g., > 2X)
Set higher avg difference value (e.g., > 200)
A->P / P->A stringency (e.g., 80)
Output
Profile Report
Visualisation
Advanced Gene Expression Analysis
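The fold-change and average-difference thresholds in this workflow can be sketched as a simple filter over a two-sample comparison. The gene names and values are hypothetical, skipping non-positive values is an assumption made to keep the ratio defined, and the absent/present stringency filter is not covered here.

```python
def fold_change(a, b):
    """Symmetric fold change between two positive expression values (always >= 1)."""
    return max(a, b) / min(a, b)


def reduce_data(profiles, sample_a, sample_b, min_fold=2.0, min_avg_diff=200.0):
    """Keep genes whose fold change exceeds `min_fold` and whose higher
    average difference value exceeds `min_avg_diff`."""
    kept = []
    for gene, values in profiles.items():
        a, b = values[sample_a], values[sample_b]
        if min(a, b) <= 0:
            continue  # assumption: skip values for which a ratio is undefined
        if max(a, b) > min_avg_diff and fold_change(a, b) > min_fold:
            kept.append(gene)
    return sorted(kept)


# Hypothetical average difference values for three genes in two samples.
profiles = {
    "geneA": {"control": 150.0, "treated": 900.0},  # 6X, higher value 900: kept
    "geneB": {"control": 400.0, "treated": 500.0},  # 1.25X: dropped
    "geneC": {"control": 20.0, "treated": 90.0},    # 4.5X but below 200: dropped
}
print(reduce_data(profiles, "control", "treated"))
```

Requiring both thresholds removes large ratios that arise only from two values close to the noise level.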
17Queries, Queries..
- Query to the data
- Which genes are linked?
- Which genes are expressed similarly to my gene XYZ?
- Which genes are co-expressed in differing conditions?
- Classification (of tumors, diseased tissues, etc.): which patterns are characteristic for a certain class of samples, and which genes are involved?
- Functional classification of genes: are changes clustered in particular classes?
- Metabolic pathway information: is a certain pathway/route in a pathway affected?
- Disease information: clinical follow-up, correlation to expression patterns.
- Phenotype information for mutants: are there correlations between particular phenotypes and expression patterns?
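The query "which genes are expressed similarly to my gene XYZ?" can be answered, in its simplest form, by ranking genes by the correlation of their expression profiles with the query profile. The sketch below uses Pearson correlation as one possible similarity measure; the profiles and the 0.9 threshold are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length profiles (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)


def similar_genes(profiles, query, threshold=0.9):
    """Return (gene, correlation) pairs at or above `threshold`, best first."""
    target = profiles[query]
    hits = [(gene, pearson(target, profile))
            for gene, profile in profiles.items() if gene != query]
    return sorted((h for h in hits if h[1] >= threshold), key=lambda h: -h[1])


# Hypothetical expression profiles across four conditions.
profiles = {
    "XYZ": [1.0, 2.0, 3.0, 4.0],
    "geneA": [2.0, 4.0, 6.0, 8.5],   # rises with XYZ
    "geneB": [4.0, 3.0, 2.0, 1.0],   # anti-correlated
    "geneC": [1.0, 3.0, 2.0, 2.0],   # weakly related
}
print([gene for gene, _ in similar_genes(profiles, "XYZ")])
```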
18Gene Expression Data Analysis Work Flow
Data in analysis
Knowledge Deliverables
Interactive Analysis Procedures
Cluster by genes
Study outliers
Correlate clinical measurements
Literature analysis
Time course analysis
Defined subsets of genes
Classic drug targets
Known disease association
Cross species indices
(Examples, not exhaustive)
19(Un)fortunately, Scientists never think linearly
- Why are those genes co-expressed?
- What do their protein products do?
- What are the common regulatory motifs of a co-expressed gene set?
- Can we patent them?
- Do we know which metabolic pathway they are in? If not, can I synthesise one?
- Are there HTS results for any proteins in the pathway?
- Are there any compounds in the HTS library that hit selectively and consistently against those proteins?
- Which ones have good activity, availability and toxicity?
20Advanced Analysis
- Discovery Annotation and Validation
  - E.g. annotating a set of co-expressed genes with some conserved regulatory motifs
  - E.g. scoring a co-expression pattern with pathways
  - E.g. literature analysis to annotate biological semantics
- Integrative Analysis
  - E.g. Multi-modality Analysis
  - E.g. Cross Annotation of Discovered Patterns
- Modelling and Simulation
  - E.g. Pathway Synthesis
  - E.g. Virtual Cell Modelling
21Pathway Scoring
22Analysis of Gene Expression Data with Pathway
Scores
Our Approach
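The extracted slides do not describe the scoring method itself, so the following is not the lecture's approach. It is a generic sketch of one common way to score a co-expressed gene set against a pathway: a hypergeometric enrichment p-value for the overlap between the two sets. All gene identifiers are hypothetical.

```python
from math import comb


def enrichment_p_value(n_genes, n_pathway, n_cluster, n_overlap):
    """P(overlap >= n_overlap) when n_cluster genes are drawn at random from
    n_genes, of which n_pathway belong to the pathway (hypergeometric tail)."""
    total = comb(n_genes, n_cluster)
    upper = min(n_pathway, n_cluster)
    tail = sum(comb(n_pathway, k) * comb(n_genes - n_pathway, n_cluster - k)
               for k in range(n_overlap, upper + 1))
    return tail / total


def score_pathway(all_genes, pathway_genes, cluster_genes):
    """Enrichment p-value of a pathway within a co-expressed gene cluster."""
    universe = set(all_genes)
    pathway = set(pathway_genes) & universe
    cluster = set(cluster_genes) & universe
    overlap = pathway & cluster
    return enrichment_p_value(len(universe), len(pathway),
                              len(cluster), len(overlap))


# Hypothetical example: 10 genes on the array, a 3-gene pathway,
# and a 3-gene cluster that contains the whole pathway.
all_genes = [f"g{i}" for i in range(10)]
print(score_pathway(all_genes, ["g0", "g1", "g2"], ["g0", "g1", "g2"]))
```

A small p-value suggests that the cluster contains more pathway members than chance alone would explain; in practice the values would also need correcting for the number of pathways tested.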