Comparative Genomics - PowerPoint PPT Presentation

About This Presentation
Title:

Comparative Genomics

Description:

Comparative Genomics & Annotation. The Foundation of Comparative Genomics ... Human GenomeGlenn A. Maston, Sara K. Evans, Michael R. Green Annual Review of ... – PowerPoint PPT presentation

Slides: 29
Provided by: stati3

Transcript and Presenter's Notes

Title: Comparative Genomics


1
Comparative Genomics & Annotation
  • The Foundation of Comparative Genomics
  • Non-Comparative Annotation
  • Three methodological tasks of CG Annotation: Protein Gene Finding, RNA Structure Prediction, Signal Finding
  • Challenges
  • Empirical Investigations: Genes, Signals, Functional Stories, Positive Selection
  • Open Questions
2
Hidden Markov Models in Bioinformatics
  • Definition
  • Three Key Algorithms
  • Summing over Unknown States
  • Most Probable Unknown States
  • Marginalizing Unknown States
  • Key Bioinformatic Applications
  • Pedigree Analysis
  • Profile HMM Alignment
  • Fast/Slowly Evolving States
  • Statistical Alignment

3
Further Examples
Isochores (Churchill, 1989, 1992)
L_p(C) = L_p(G) = 0.1, L_p(A) = L_p(T) = 0.4,
L_r(C) = L_r(G) = 0.4, L_r(A) = L_r(T) = 0.1
Likelihood Recursions
Likelihood Initialisations
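The recursion and initialisation formulas themselves did not survive in this transcript. As a stand-in, the following is a minimal sketch of the standard forward algorithm (summing over unknown states) applied to the two-state isochore example. The emission probabilities are the ones given on this slide. The reading of "p" and "r" as GC-poor and GC-rich, the transition probabilities (0.9 stay, 0.1 switch), the uniform start distribution and the function name `forward_likelihood` are assumptions made only for illustration.

```python
# Two hidden states: "p" (assumed GC-poor) and "r" (assumed GC-rich).
# Emission probabilities are taken from the slide.
EMIT = {
    "p": {"A": 0.4, "C": 0.1, "G": 0.1, "T": 0.4},
    "r": {"A": 0.1, "C": 0.4, "G": 0.4, "T": 0.1},
}
# Assumed values, not from the slide.
TRANS = {"p": {"p": 0.9, "r": 0.1}, "r": {"p": 0.1, "r": 0.9}}
START = {"p": 0.5, "r": 0.5}


def forward_likelihood(seq):
    """Return P(seq), summed over all hidden state paths."""
    # Initialisation: likelihood of the first base in each state.
    f = {s: START[s] * EMIT[s][seq[0]] for s in EMIT}
    # Recursion: extend every partial likelihood by one base.
    for base in seq[1:]:
        f = {
            s: EMIT[s][base] * sum(f[prev] * TRANS[prev][s] for prev in EMIT)
            for s in EMIT
        }
    # Termination: sum over the state at the last position.
    return sum(f.values())


print(forward_likelihood("GCGCGC"))
print(forward_likelihood("GCATAT"))
```

Under these assumed transitions, a sequence that stays GC-rich throughout gets a higher likelihood than one that forces a switch between states.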
Gene Finding (Burge and Karlin, 1996)
Simple Eukaryotic
Simple Prokaryotic
4
Further Examples
Secondary Structure Elements (Goldman, 1996)
       α      β      L
α    .909   .0005  .091
β    .005   .881   .184
L    .062   .086   .852
     .325   .212   .462
HMM for SSEs
Adding Evolution
SSE Prediction
Profile HMM Alignment (Krogh et al., 1994)
5
Grammars: Finite Set of Rules for Generating Strings
6
Simple String Generators
Non-Terminals (capital) --- Terminals (small)
i. Start with S
S --> aT | bS
T --> aS | bT | ε
One sentence (odd number of a's):
S --> aT --> aaS --> aabS --> aabaT --> aaba
ii. S --> aSa | bSb | aa | bb
One sentence (even-length palindromes):
S --> aSa --> abSba --> abaaba
7
Stochastic Grammars
The grammars above classify every string as
belonging to the language or not.
Every variable has a finite set of substitution
rules. Assigning a probability to the use of
each rule assigns probabilities to the
strings in the language.
If a string has a unique (1-1) derivation, its
probability is the product of the probabilities
of the applied rules.
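i. Start with S.
S --> (0.3)aT | (0.7)bS
T --> (0.2)aS | (0.4)bT | (0.2)ε
S --(0.3)--> aT --(0.2)--> aaS --(0.7)--> aabS --(0.3)--> aabaT --(0.2)--> aaba
Probability of this derivation: 0.3 × 0.2 × 0.7 × 0.3 × 0.2 = 0.00252
ii. S --> (0.3)aSa | (0.5)bSb | (0.1)aa | (0.1)bb
S --(0.3)--> aSa --(0.5)--> abSba --(0.1)--> abaaba
Probability of this derivation: 0.3 × 0.5 × 0.1 = 0.015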
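The product rule for derivation probabilities can be checked mechanically. The sketch below is illustrative only: it encodes the two stochastic grammars with the rule probabilities given on this slide, uses the empty string `""` for the empty production, and always rewrites the leftmost non-terminal. The names `GRAMMAR_I`, `GRAMMAR_II` and `derive` are hypothetical.

```python
# Rule probabilities as given on the slide, keyed by (left side, right side).
# "" stands for the empty string.
GRAMMAR_I = {
    ("S", "aT"): 0.3, ("S", "bS"): 0.7,
    ("T", "aS"): 0.2, ("T", "bT"): 0.4, ("T", ""): 0.2,
}
GRAMMAR_II = {
    ("S", "aSa"): 0.3, ("S", "bSb"): 0.5, ("S", "aa"): 0.1, ("S", "bb"): 0.1,
}


def derive(rules, steps, start="S"):
    """Apply `steps` in order to the leftmost non-terminal.

    Returns the resulting string and the product of the rule probabilities.
    """
    form = start
    probability = 1.0
    for lhs, rhs in steps:
        # Position of the leftmost non-terminal (capital letter), if any.
        idx = next((i for i, c in enumerate(form) if c.isupper()), None)
        if idx is None or form[idx] != lhs or (lhs, rhs) not in rules:
            raise ValueError(f"cannot apply {lhs} --> {rhs!r} to {form!r}")
        form = form[:idx] + rhs + form[idx + 1:]
        probability *= rules[(lhs, rhs)]
    return form, probability


steps_i = [("S", "aT"), ("T", "aS"), ("S", "bS"), ("S", "aT"), ("T", "")]
steps_ii = [("S", "aSa"), ("S", "bSb"), ("S", "aa")]
print(derive(GRAMMAR_I, steps_i))
print(derive(GRAMMAR_II, steps_ii))
```

The two step lists reproduce the derivations of aaba and abaaba shown on the slide.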
8
Finding Regulatory Signals in Genomes
The Computational Problem
  • Non-homologous/homologous sequences
  • Known/unknown signal
  • 1 common signal/complex signals/additional information
  • Combinations

Regulatory signals known from molecular biology
Different Kinds of Signals
  • Promoters
  • Enhancers
  • Splicing Signals

α-globins in humans
9
Weight Matrices & Sequence Logos
Wasserman and Sandelin (2004) Applied
Bioinformatics for the Identification of
Regulatory Elements. Nature Reviews Genetics
5.4.276
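As a rough illustration of the two ideas named on this slide, the sketch below builds a log-odds weight matrix from a set of aligned binding sites and computes the per-position information content that determines column heights in a sequence logo. It is a minimal sketch under assumptions: a uniform background, a simple pseudocount scheme, no small-sample correction, and hypothetical function names and example sites.

```python
import math

BASES = "ACGT"


def weight_matrix(sites, pseudocount=1.0):
    """Log-odds weight matrix (bits) from equal-length aligned sites.

    Assumes a uniform background of 0.25 per base.
    """
    matrix = []
    for pos in range(len(sites[0])):
        # Spread the pseudocount evenly over the four bases.
        counts = {b: pseudocount * 0.25 for b in BASES}
        for site in sites:
            counts[site[pos]] += 1
        total = sum(counts.values())
        matrix.append({b: math.log2(counts[b] / total / 0.25) for b in BASES})
    return matrix


def score(matrix, window):
    """Sum of log-odds scores of `window` under the weight matrix."""
    return sum(column[base] for column, base in zip(matrix, window))


def information_content(sites):
    """Bits per position: 2 + sum_b f_b * log2(f_b), uniform background."""
    heights = []
    for pos in range(len(sites[0])):
        column = [site[pos] for site in sites]
        bits = 2.0
        for b in BASES:
            f = column.count(b) / len(column)
            if f > 0:
                bits += f * math.log2(f)
        heights.append(bits)
    return heights


example_sites = ["TATAAT", "TATGAT", "TACAAT", "TATAAT"]  # made-up sites
pwm = weight_matrix(example_sites)
print(score(pwm, "TATAAT"), score(pwm, "GGGGGG"))
print(information_content(example_sites))
```

A fully conserved column carries 2 bits and a column with all four bases equally frequent carries 0 bits.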
10
Motifs in Biological Sequences
  • 1990 Lawrence & Reilly: An Expectation Maximisation (EM) Algorithm for the Identification and Characterization of Common Sites in Unaligned Biopolymer Sequences. Proteins 7.41-51.
  • 1992 Cardon and Stormo: Expectation Maximisation Algorithm for Identifying Protein-binding Sites with Variable Lengths from Unaligned DNA Fragments. J.Mol.Biol. 223.159-170.
  • 1993 Lawrence, ..., Liu: Detecting subtle sequence signals: a Gibbs sampling strategy for multiple alignment. Science 262, 208-214.

Θ = (θ_{1,A}, ..., θ_{w,T}): probability of different bases
in the window
A = (a_1, ..., a_K): positions of the windows
θ_0 = (θ_A, ..., θ_T): background frequencies of
nucleotides
Priors: A has a uniform prior. Θ_j
has a Dirichlet(N_0 α) prior, where α is the base frequency in
the genome and N_0 is the pseudocount.
11
The Gibbs Sampler
For i = 1, ..., d: draw x_i^(t+1) from the conditional
distribution π(· | x_{-i}^(t)) and leave the remaining
components unchanged, i.e. x_{-i}^(t+1) = x_{-i}^(t)
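The update rule above can be sketched on a toy problem. The example below is only an illustration: the target is a made-up joint distribution over two binary components, one sweep updates each component in turn from its conditional distribution given the other, and the names `JOINT`, `draw_component` and `gibbs` are hypothetical.

```python
import random
from collections import Counter

# Toy target distribution over (x1, x2), chosen only for illustration.
JOINT = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}


def draw_component(x, i, rng):
    """Draw x[i] from its conditional distribution given the other component."""
    weights = []
    for value in (0, 1):
        candidate = list(x)
        candidate[i] = value
        # The conditional is proportional to the joint with the rest fixed.
        weights.append(JOINT[tuple(candidate)])
    return rng.choices((0, 1), weights=weights)[0]


def gibbs(n_sweeps, seed=0):
    """Run the sampler and return the empirical frequency of each state."""
    rng = random.Random(seed)
    x = [0, 0]
    visits = Counter()
    for _ in range(n_sweeps):
        for i in range(len(x)):               # for i = 1, ..., d
            x[i] = draw_component(x, i, rng)  # other components unchanged
        visits[tuple(x)] += 1
    return {state: count / n_sweeps for state, count in visits.items()}


print(gibbs(20000))
```

With enough sweeps the empirical frequencies should approach the entries of `JOINT`, since the joint distribution is the stationary distribution of these conditional updates.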
12
The Gibbs sampler
Gibbs iteration
From Lawrence, C. et al. (1993) Detecting Subtle
Sequence Signals: A Gibbs Sampling Strategy for
Multiple Alignment. Science 262.208-
13
The Gibbs sampler example
From Lawrence, C. et al. (1993) Detecting Subtle
Sequence Signals: A Gibbs Sampling Strategy for
Multiple Alignment. Science 262.208-
14
Natural Extensions to Basic Model I
Modified from Liu
15
Natural Extensions to Basic Model II
16
Combining Signals and other Data
Modified from Liu
17
MEME - Multiple EM for Motif Elicitation
Motif nucleotide distribution: M_{p,q}, where p is the
position and q the nucleotide.
Background distribution: B_q. λ is the probability that Z_{i,j} = 1.
Find M, B, λ, Z that maximize Pr(X, Z | M, B, λ).
Expectation Maximization to find a local
maximum. Iteration t:
Expectation step: Z^(t) = E(Z | X, (M, B, λ)^(t))
Maximization step: find (M, B, λ)^(t+1) that
maximizes Pr(X, Z^(t) | (M, B, λ)^(t+1))
Bailey, T. L. and C. Elkan (1994). "Fitting a
mixture model by expectation maximization to
discover motifs in biopolymers." Proc Int Conf
Intell Syst Mol Biol 2 28-36.
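One iteration of the expectation and maximization steps can be sketched as follows. This is a simplified two-component mixture over fixed-width windows, not the MEME implementation. It assumes that X is the list of all width-w windows, that Z holds one value per window (the posterior probability that the window was generated by the motif), and that a small pseudocount keeps all probabilities positive. All function names are hypothetical.

```python
BASES = "ACGT"


def windows(seqs, w):
    """All overlapping width-w windows of all sequences (the data X)."""
    return [s[j:j + w] for s in seqs for j in range(len(s) - w + 1)]


def window_prob(window, columns):
    """Probability of a window under a list of per-position distributions."""
    p = 1.0
    for column, base in zip(columns, window):
        p *= column[base]
    return p


def em_iteration(X, M, B, lam, pseudo=0.1):
    """One EM iteration; returns Z and the updated (M, B, lam)."""
    w = len(M)
    # Expectation step: posterior probability that each window is a motif.
    Z = []
    for x in X:
        pm = lam * window_prob(x, M)
        pb = (1 - lam) * window_prob(x, [B] * w)
        Z.append(pm / (pm + pb))
    # Maximization step: re-estimate parameters from expected counts.
    new_lam = sum(Z) / len(Z)
    new_M = []
    for p in range(w):
        counts = {q: pseudo for q in BASES}
        for x, z in zip(X, Z):
            counts[x[p]] += z
        total = sum(counts.values())
        new_M.append({q: c / total for q, c in counts.items()})
    bg = {q: pseudo for q in BASES}
    for x, z in zip(X, Z):
        for base in x:
            bg[base] += 1 - z
    total = sum(bg.values())
    new_B = {q: c / total for q, c in bg.items()}
    return Z, new_M, new_B, new_lam


# Made-up sequences and a starting motif biased towards TATAA.
X = windows(["ACGTATAAGC", "GGTATAACTT"], 5)
M0 = [{q: (0.7 if q == c else 0.1) for q in BASES} for c in "TATAA"]
B0 = {q: 0.25 for q in BASES}
Z, M1, B1, lam1 = em_iteration(X, M0, B0, 0.1)
print(max(zip(Z, X)))
```

Repeating `em_iteration` until the parameters stop changing gives a local maximum, which is why the result depends on the starting motif.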
18
Phylogenetic Footprinting (homologous detection)
Blanchette and Tompa (2003) FootPrinter: a
program designed for phylogenetic footprinting.
NAR 31.13.3840-
19
(No Transcript)
20
Statistical Alignment and Footprinting.
Solution: Cartesian Product of HMMs
21
Structure does not stem from an evolutionary
model
  • The equilibrium annotation does not follow a Markov Chain
  • Each alignment from the Alignment HMM is annotated by the Structure HMM
  • No ideal way of simulating

Using the HMM at the alignment will give other
distributions on the leaves.
Using the HMM at the root will give other
distributions on the leaves.
22
(Homologous + Non-homologous) detection
Wang and Stormo (2003) Combining phylogenetic
data with co-regulated genes to identify
regulatory motifs. Bioinformatics 19.18.2369-80
23
Regulatory Signals in Humans
Transcription of protein-coding genes in eukaryotes
is done by RNA Polymerase II. 1850 DNA-binding
proteins in the human genome.
  • Transcription Start Site - TSS
  • Core Promoter - within 100 bp of TSS
  • Proximal Promoter Elements - 1kb TSS
  • Locus Control Region - LCR
  • Insulator
  • Silencer
  • Enhancer

Source: Transcriptional Regulatory Elements in
the Human Genome. Glenn A. Maston, Sara K. Evans,
Michael R. Green. Annual Review of Genomics and
Human Genetics. Volume 7, Sep 2006
24
Source: Transcriptional Regulatory Elements in
the Human Genome. Glenn A. Maston, Sara K. Evans,
Michael R. Green. Annual Review of Genomics and
Human Genetics. Volume 7, Sep 2006
25
α-globins
Multispecies Conserved Sequences - MCSs
  • Analyzed 238kb in 22 species
  • Found 24 MCSs
  • Programs used: GUMBY - VISTA - MULTIPIPMAKER - MULTILAGAN - CLUSTALW - DIALIGN - TRANSFAC 6.0 - TRES
  • Experimental knowledge of the region: hypersensitive sites (DHSs), DNA methylation

The region lies in a CG-rich, gene-rich
region close to the telomeres. It is not easy
to align CG-islands.
26
Promoters in α-globins
  • 94.273-114.273 vista illus.
  • 5 MCSs
  • Divergence relative to human
  1. Promoter MCSs - 11
  2. Regulatory MCSs - 4
  3. Intronic MCSs - 2
  4. Exonic MCSs - 4
  5. Unknown - 3

Source: Hughes et al. (2005) Annotation of
cis-regulatory elements by identification,
subclassification, and functional assessment of
multispecies conserved sequences. PNAS 2005 102
9830-9835
27
Regulatory Protein-DNA Complexes
28
Challenges
Open Problems