Title: Bioinformatics Tools
1FINAL PROJECT - Key dates
9.1 Last day to decide on a project
18, 23, 24/1 Presenting a proposed project in small groups
A very short presentation (max 5 minutes):
- Title
- Background
- Main question
- Major tools you are planning to use to answer the questions
6.3 Final submission
2Gene Expression Analysis
3Gene Expression
protein
RNA
DNA
4Gene Expression
[Figure: mRNA transcripts with poly-A tails (AAAAAAA) of gene1, gene2 and gene3, each gene present in a different number of copies]
5Studying Gene Expression 1987-2010
Spotted microarray
One channel microarray
RNA-seq (Next Generation Sequencing)
6Applications
- Identify gene function
- - Similar expression can imply similar function
- Find tissue/developmental specific genes
- - Different expression in different cells/tissues
- Diagnostics and Therapy
- - Differences in gene expression can indicate a disease state
- - Genes which change expression in a disease can be good candidates for drug targets
7Classical Methods
- Different types of microarray technologies
- Spotted Microarray
- Two channel cDNA microarrays.
- DNA Chips
- One Channel microarrays (Affymetrix, Agilent)
8Microarray Experiment
http://www.bio.davidson.edu/Courses/genomics/chip/chip.html
9One channel DNA chips
- Each sequence is represented by a probe set colored with one fluorescent dye
- Target hybridizes to complementary probes only
- The fluorescence intensity is indicative of the
expression of the target sequence
10Expression Data Format
gene    cold    normal  hot
uch1    -2.0    0.0     0.924
gut2    0.398   0.402   -1.329
fip1    0.225   0.225   -2.151
msh1    0.676   0.685   -0.564
vma2    0.41    0.414   -1.285
meu26   0.353   0.286   -1.503
git8    0.47    0.47    -1.088
sec7b   0.39    0.395   -1.358
apn1    0.681   0.636   -0.555
wos2    0.902   0.904   -0.149
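A table like the one above can be read into one expression vector per gene. The sketch below is only an illustration: it assumes a whitespace-separated layout with a header row (the "gene" header label and the function name `parse_expression_table` are not from the slides), and it uses the first three rows of the slide's data.

```python
# Assumed layout: header row, then one row per gene, whitespace separated.
TABLE = """\
gene  cold   normal  hot
uch1  -2.0   0.0     0.924
gut2  0.398  0.402   -1.329
fip1  0.225  0.225   -2.151
"""


def parse_expression_table(text):
    """Return (conditions, profiles): condition names and gene -> list of values."""
    rows = [line.split() for line in text.strip().splitlines()]
    conditions = rows[0][1:]  # skip the "gene" column label
    profiles = {row[0]: [float(value) for value in row[1:]] for row in rows[1:]}
    return conditions, profiles


conditions, profiles = parse_expression_table(TABLE)
print(conditions)
print(profiles["gut2"])
```

Each gene then becomes a vector with one entry per experiment, which is the form the distance and clustering steps work on.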
11RNA-seq
12Gene Expression Analysis
- Unsupervised Methods
- - Hierarchical Clustering
- - Partition Methods (K-means)
- Supervised Methods
- - Analysis of variance
- - Discriminant analysis
- - Support Vector Machine (SVM)
13Clustering genes according to their expression
profiles
Experiments
Genes
14Clustering
- Clustering organizes things that are close into groups.
- - What does it mean for two genes to be close?
- - Once we know this, how do we define groups?
15What does it mean for two genes to be close?
We need a mathematical definition of distance
between the expression of two genes
Gene 1
Gene 2
Gene1 = (E_{11}, E_{12}, ..., E_{1N})
Gene2 = (E_{21}, E_{22}, ..., E_{2N})
For example, the distance between gene 1 and gene 2 can be the Euclidean distance:
$$d(\mathrm{Gene1}, \mathrm{Gene2}) = \sqrt{\sum_{i=1}^{N} (E_{1i} - E_{2i})^2}$$
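As a small worked illustration of the Euclidean distance, the sketch below compares three of the expression vectors from the Expression Data Format slide (cold, normal, hot). The function name `euclidean_distance` is chosen here for illustration.

```python
import math


def euclidean_distance(gene_a, gene_b):
    """Square root of the summed squared differences across the N experiments."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(gene_a, gene_b)))


# Expression vectors (cold, normal, hot) taken from the data format slide.
uch1 = [-2.0, 0.0, 0.924]
gut2 = [0.398, 0.402, -1.329]
vma2 = [0.41, 0.414, -1.285]

print(euclidean_distance(gut2, vma2))  # small: the two profiles are similar
print(euclidean_distance(uch1, gut2))  # large: the two profiles differ
```

Under this definition gut2 and vma2 are "close", while uch1 is far from both.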
16Once we know this, how do we define groups?
Hierarchical Clustering (Michael Eisen, 1998)
- Generate a tree based on similarity (similar to a phylogenetic tree)
- Each gene is a leaf on the tree
- Distances reflect similarity of expression
[Figure: Gene Cluster display; axes: Genes, Experiments]
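The tree-building idea can be sketched as a simple agglomerative procedure: start with every gene as its own leaf and repeatedly merge the two closest clusters. The sketch below assumes Euclidean distance and average linkage, which the slides do not specify; the function names are illustrative, and the data are a subset of the Expression Data Format slide.

```python
import math
from itertools import combinations

# Expression vectors (cold, normal, hot) from the data format slide.
PROFILES = {
    "uch1": [-2.0, 0.0, 0.924],
    "gut2": [0.398, 0.402, -1.329],
    "fip1": [0.225, 0.225, -2.151],
    "vma2": [0.41, 0.414, -1.285],
    "wos2": [0.902, 0.904, -0.149],
}


def euclidean(p, q):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def average_linkage(members_a, members_b, profiles):
    """Mean pairwise distance between the genes of two clusters (an assumed choice)."""
    dists = [euclidean(profiles[g], profiles[h]) for g in members_a for h in members_b]
    return sum(dists) / len(dists)


def hierarchical_clustering(profiles):
    """Return a nested tuple tree; each gene name is a leaf."""
    # Each cluster is (subtree, list of member genes).
    clusters = [(gene, [gene]) for gene in profiles]
    while len(clusters) > 1:
        # Pick the pair of clusters with the smallest average distance.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: average_linkage(clusters[p[0]][1], clusters[p[1]][1], profiles),
        )
        merged = ((clusters[i][0], clusters[j][0]), clusters[i][1] + clusters[j][1])
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return clusters[0][0]


def leaves(tree):
    """List the gene names at the leaves of a nested tuple tree."""
    if isinstance(tree, str):
        return [tree]
    left, right = tree
    return leaves(left) + leaves(right)


tree = hierarchical_clustering(PROFILES)
print(tree)
```

Genes that are merged early sit close together in the tree, which is what the heat map display orders its rows by.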
17Internal nodes represent different functional
groups (A, B, C, D, E)
genes
One gene may belong to more than one cluster
18Clusters can be presented by graphs
19What can we learn from clusters with similar gene
expression?
- Similar expression between genes
- - The genes have similar function
- - One gene controls the other in a pathway
- - All genes are controlled by a common regulatory gene
- Clusters can help identify regulatory motifs
- - Search for motifs in upstream promoter regions of all the genes in a cluster
20EXAMPLE - hnRNP A1 and SRp40
Genes with similar expression patterns tend to have common functions
hnRNP A1 and SRp40 have a similar gene expression
pattern in different tissues
21EXAMPLE - hnRNP A1 and SRp40
Genes with similar expression patterns tend to have common functions
hnRNP A1
SRp40
22Are they regulated by the same transcription factor?
1. Extract their promoter regions
2. Find a common motif in both sequences (MEME)
3. Identify the transcription factor related to the motif (http://jaspar.cgb.ki.se/)
[Figure: hnRNP A1 and SRp40 genes, each with its promoter and the common motif marked]
23Extract the promoters of the genes in the
cluster and find a common motif (using MEME)
gtGGATAACAATTTCACAAGTGTGTGAGCGGATAACAA
gtAAGGTGTGAGTTAGCTCACTCCCCTGTGATCTCTGTACATAG
gtACGTGCGAGGATGAGAACACAATGTGTGTGCTCGGTTTAGTCACC
gtTGTGACACAGTGCAAACGCGCCTGACGGAGTTCACA
gtAATTGTGAGTGTCTATAATCACGATCGATTTGGAATATCCATCACA
gtTGCAAAGGACGTCACGATTTGGGAGCTGGCGACCTGGGTCATG
gtTGTGATGTGTATCGAACCGTGTATTTATTTGAACCACATCGCAGGTGAGAGCCATCACAG
gtGAGTGTGTAAGCTGTGCCACGTTTATTCCATGTCACGAGTGT
gtTGTTATACACATCACTAGTGAAACGTGCTCCCACTCGCATGTGATTCGATTCACA
24Create a Multiple Sequence Alignment
GGATAACAATTTCACA
TGTGAGCGGATAACAA
TGTGAGTTAGCTCACT
TGTGATCTCTGTTACA
CGAGGATGAGAACACA
CTCGGTTTAGTTCACC
TGTGACACAGTGCAAA
CCTGACGGAGTTCACA
AGTGTCTATAATCACG
TGGAATATCCATCACA
TGCAAAGGACGTCACG
GGCGACCTGGGTCATG
TGTGATGTGTATCGAA
TTTGAACCACATCGCA
GGTGAGAGCCATCACA
TGTAAGCTGTGCCACG
TTTATTCCATGTCACG
TGTTATACACATCACT
CGTGCTCCCACTCGCA
TGTGATTCGATTCACA
25Generate a PSSM
Find the transcription factor which binds the motif
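One common way to turn the aligned sites into a PSSM is to count each base per column and convert the counts to log-odds scores. The sketch below uses the 20 aligned 16-mers from the alignment slide; the pseudocount of 1, the uniform background of 0.25 and the function names are assumptions made for illustration, not the settings used in the lecture or by MEME.

```python
import math

# The 20 aligned sites from the multiple sequence alignment slide.
SITES = [
    "GGATAACAATTTCACA", "TGTGAGCGGATAACAA", "TGTGAGTTAGCTCACT", "TGTGATCTCTGTTACA",
    "CGAGGATGAGAACACA", "CTCGGTTTAGTTCACC", "TGTGACACAGTGCAAA", "CCTGACGGAGTTCACA",
    "AGTGTCTATAATCACG", "TGGAATATCCATCACA", "TGCAAAGGACGTCACG", "GGCGACCTGGGTCATG",
    "TGTGATGTGTATCGAA", "TTTGAACCACATCGCA", "GGTGAGAGCCATCACA", "TGTAAGCTGTGCCACG",
    "TTTATTCCATGTCACG", "TGTTATACACATCACT", "CGTGCTCCCACTCGCA", "TGTGATTCGATTCACA",
]


def build_pssm(sites, pseudocount=1.0, background=0.25):
    """One dict per alignment column: base -> log2(frequency / background)."""
    pssm = []
    for pos in range(len(sites[0])):
        column = [site[pos] for site in sites]
        scores = {}
        for base in "ACGT":
            # Pseudocounts avoid log(0) for bases never seen in a column.
            freq = (column.count(base) + pseudocount) / (len(sites) + 4 * pseudocount)
            scores[base] = math.log2(freq / background)
        pssm.append(scores)
    return pssm


def score(pssm, sequence):
    """Sum the column scores of a sequence with the same length as the motif."""
    return sum(column[base] for column, base in zip(pssm, sequence))


def consensus(pssm):
    """Highest-scoring base at every position."""
    return "".join(max(column, key=column.get) for column in pssm)


pssm = build_pssm(SITES)
print(consensus(pssm))
print(score(pssm, SITES[0]))
```

A matrix like this can then be compared against a motif database such as JASPAR to look for a matching transcription factor.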
26How can we use microarrays for diagnostics?
27Gene-Expression Profiles in Hereditary Breast
Cancer
- Breast tumors studied
- BRCA1
- BRCA2
- sporadic tumors
- Log-ratio measurements of 3226 genes for each
tumor after initial data filtering
RESEARCH QUESTION: Can we distinguish BRCA1 from
BRCA2 cancers based solely on their gene
expression profiles?
28How can microarrays be used as a basis for
diagnostics?
5 Breast Cancer Patients
Patient 1 Patient 2 Patient 3 Patient 4 Patient 5
Gen1 - -
Gen2 - -
Gen3 - -
Gen4 - -
Gen5 - - -
29How can microarrays be used as a basis for
diagnostics?
BRCA1
BRCA2
Patient 1 Patient 2 Patient 4 Patient 3 Patient 5
Gen1 - -
Gen3 - -
Gen4 - -
Gen2 - -
Gen5 - - -
Informative Genes
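One simple way to look for informative genes is to score each gene by how well its expression separates the two tumor classes. The sketch below uses a signal-to-noise score (difference of class means divided by the sum of class standard deviations). This score is an assumption chosen for illustration and is not the method from the slides, and all gene names and values are made up, not data from the study.

```python
from statistics import mean, stdev

# HYPOTHETICAL log-ratios for illustration only (three tumors per class).
BRCA1 = {
    "geneA": [1.2, 0.9, 1.1],
    "geneB": [0.1, -0.2, 0.0],
    "geneC": [-0.9, -1.2, -1.0],
    "geneD": [0.5, -0.4, 0.1],
}
BRCA2 = {
    "geneA": [-0.8, -1.0, -1.1],
    "geneB": [0.0, 0.2, -0.1],
    "geneC": [0.7, 1.1, 0.9],
    "geneD": [0.3, -0.5, 0.2],
}


def signal_to_noise(values_a, values_b):
    """Difference of class means scaled by the within-class spread."""
    return (mean(values_a) - mean(values_b)) / (stdev(values_a) + stdev(values_b))


def rank_informative_genes(class_a, class_b):
    """Gene names sorted from most to least discriminating (by absolute score)."""
    scores = {g: signal_to_noise(class_a[g], class_b[g]) for g in class_a}
    return sorted(scores, key=lambda g: abs(scores[g]), reverse=True)


ranked = rank_informative_genes(BRCA1, BRCA2)
print(ranked)
```

Genes at the top of the ranking differ consistently between the two groups; genes at the bottom carry little information about the class.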
30Specific Examples
Cancer Research
Hundreds of genes that differentiate
between cancer tissues in different stages of the
tumor were found. The arrow shows an example of
tumor cells which were not detected correctly
by histological or other clinical parameters.
Ramaswamy et al., 2003, Nat Genet 33:49-54
31Supervised approaches for predicting gene
function based on microarray data
- An SVM would begin with a set of genes that have a
common function (red dots). In addition, a
separate set of genes that are known not to be
members of the functional class (blue dots) is
specified.
32- Using this training set, an SVM would learn to
differentiate between the members and
non-members of a given functional class based on
expression data.
- Having learned the expression features of the
class, the SVM could recognize new genes as
members or as non-members of the class based on
their expression data.
33Using SVMs to diagnose tumors based on
expression data
Each dot represents a vector of the expression
pattern taken from a microarray experiment, for
example the expression pattern of all genes from
a cancer patient.
34How do SVMs work with expression data?
In this example red dots can be primary tumors
and blue dots can be tumors from the metastasis
stage. The SVM is trained on data which was
classified based on histology.
After training the SVM we can use it to diagnose
the unknown tumor.
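The train-then-diagnose idea can be sketched with a minimal linear SVM trained by sub-gradient descent on the hinge loss. This is a simplified stand-in for a full SVM package, not the classifier used in the cited studies. The two-gene expression vectors, the labels (+1 for primary, -1 for metastasis) and the parameters `lam`, `lr` and `epochs` are all hypothetical values chosen for illustration.

```python
# HYPOTHETICAL training data: each sample is (expression of gene 1, gene 2).
SAMPLES = [
    [2.0, 0.5], [1.8, 0.3], [2.2, 0.7],   # labeled primary (+1) by histology
    [0.4, 1.9], [0.6, 2.1], [0.3, 1.7],   # labeled metastasis (-1) by histology
]
LABELS = [1, 1, 1, -1, -1, -1]


def train_linear_svm(samples, labels, lam=0.01, lr=0.1, epochs=200):
    """Minimize hinge loss plus an L2 penalty with plain sub-gradient steps."""
    w = [0.0] * len(samples[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            # The L2 penalty shrinks the weights a little on every step.
            w = [wi - lr * lam * wi for wi in w]
            if margin < 1:
                # Sample is inside the margin or misclassified: move toward it.
                w = [wi + lr * y * xi for wi, xi in zip(w, x)]
                b += lr * y
    return w, b


def predict(w, b, x):
    """+1 (primary) or -1 (metastasis), depending on the side of the hyperplane."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0 else -1


w, b = train_linear_svm(SAMPLES, LABELS)
unknown_tumor = [2.1, 0.4]  # hypothetical expression vector of a new sample
print(predict(w, b, unknown_tumor))
```

With real data each vector would hold thousands of genes rather than two, but the steps stay the same: train on tumors with a known classification, then classify the unknown one.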
35Gene Expression Databases and Resources on the Web
- GEO Gene Expression Omnibus
- - http://www.ncbi.nlm.nih.gov/geo/
- List of gene expression web resources
- - http://industry.ebi.ac.uk/alan/MicroArray/
- Another list with literature references
- - http://www.gene-chips.com/
- Cancer Gene Anatomy Project
- - http://cgap.nci.nih.gov/
- Stanford Microarray Database
- - http://genome-www.stanford.edu/microarray/