Catalogues, Homology & Molecular Evolution

Slides: 40
Provided by: Jotun

Transcript and Presenter's Notes


1
Catalogues, Homology & Molecular Evolution
(Or: the structural use of molecular evolution.)
1. Using evolution by listing its products as basic building blocks.
   Example: D. Baker and protein structure prediction.
2. Observing homology and hypothesizing similar properties.
   Example: most database searches, e.g. BLAST.
3. Observing the path of evolution.
2
Molecular Evolution & Structure
Fundamental Observation: The molecular evolution of a position in a molecule depends on its structure.
Consequence: Observing molecular evolution contains information about the structure.
3
Overview.
Three examples of the use of observing molecular evolution:
  Protein Secondary Structure
  RNA Secondary Structure
  Comparative Genome Annotation
Technicalities:
  Structure Description:
    Hidden Markov Models: Protein Secondary Structure/Gene Finding
    Context Free Grammars: RNA Structure, regulatory signals
  Molecular Evolution Description
The Generality of the Problem
4
Structure Dependent Molecular Evolution I: Protein Secondary Structure
NDAHIWFHWWYVKHGCDNDAHIWFHWWYVKHGCDVVHISA
         α       β       L
α      .909    .0005   .091
β      .005    .881    .184
L      .062    .086    .852

       .325    .212    .462
From Goldman et al. (1996) JMB.
5
Structure Dependent Molecular Evolution II: RNA Secondary Structure
From Durbin et al. (1998) Biological Sequence Analysis
Secondary structure: a set of paired positions.
A-U and C-G can base pair. Some other pairings can occur, and triple interactions exist.
Pseudoknot: non-nested pairing, i < j < k < l with pairs i-k and j-l.
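The pseudoknot condition above can be checked mechanically. The following is a minimal sketch, assuming a structure is given as a list of index pairs; the function name `is_pseudoknotted` is a hypothetical choice for illustration and not taken from the slides.

```python
def is_pseudoknotted(pairs):
    """Return True if two base pairs cross, i.e. i < j < k < l with i-k and j-l paired."""
    # Normalise each pair so the smaller index comes first, then sort by it.
    norm = sorted((min(a, b), max(a, b)) for a, b in pairs)
    for idx, (i, k) in enumerate(norm):
        for j, l in norm[idx + 1:]:
            # (j, l) opens at or after i; it crosses (i, k) if it opens
            # strictly inside (i, k) and closes strictly outside it.
            if i < j < k < l:
                return True
    return False


nested = [(0, 9), (1, 8), (2, 7)]   # a simple stem: no crossing pairs
crossing = [(0, 5), (3, 8)]         # i=0, j=3, k=5, l=8: a pseudoknot
print(is_pseudoknotted(nested), is_pseudoknotted(crossing))
```

The quadratic scan is adequate for small structures; it is meant only to make the definition concrete.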
6
Structure Dependent Molecular Evolution III: Genes
Base frequencies in non-coding sequence and at coding positions 1, 2, 3:
       non-coding   coding 1   coding 2   coding 3
A        .249         .245       .294       .171
C        .251         .247       .227       .279
G        .251         .362       .179       .293
T        .249         .146       .301       .256

          Coding   Non-coding
m          .225      .350
ts/tv      2.5       1.89
dN/dS      .14       about 1
Diffs      .15       .30
7
Goldman, Thorne & Jones: Structure & Evolution
1  A S D F G H J K L P
2  A S D F G H J K L P
3  D S D F G K J K L C
4  D S D F G K J K L C
HMM ?? x x x x x ?????????????? x x L x x x
8
Three Questions
1. What is the probability of the data?
2. What is the most probable hidden configuration?
3. What is the probability of a specific hidden state?
Training: Given a set of instances, find parameters making them probable if they were independent.
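The three questions correspond to the forward, Viterbi, and forward-backward calculations on an HMM. Below is a minimal sketch, assuming a two-state toy model; the state names, the two-letter alphabet (`h`, `p`) and all probabilities are hypothetical values chosen for illustration, not parameters from Goldman et al.

```python
def forward(obs, states, init, trans, emit):
    """f[t][s] = P(obs[0..t], hidden state at t is s)."""
    f = [{s: init[s] * emit[s][obs[0]] for s in states}]
    for x in obs[1:]:
        prev = f[-1]
        f.append({s: emit[s][x] * sum(prev[r] * trans[r][s] for r in states)
                  for s in states})
    return f


def backward(obs, states, trans, emit):
    """b[t][s] = P(obs[t+1..] | hidden state at t is s)."""
    b = [{s: 1.0 for s in states}]
    for x in reversed(obs[1:]):
        nxt = b[0]
        b.insert(0, {s: sum(trans[s][r] * emit[r][x] * nxt[r] for r in states)
                     for s in states})
    return b


def viterbi(obs, states, init, trans, emit):
    """Most probable hidden state path (question 2)."""
    v = {s: init[s] * emit[s][obs[0]] for s in states}
    pointers = []
    for x in obs[1:]:
        ptr, new_v = {}, {}
        for s in states:
            best = max(states, key=lambda r: v[r] * trans[r][s])
            ptr[s] = best
            new_v[s] = v[best] * trans[best][s] * emit[s][x]
        pointers.append(ptr)
        v = new_v
    path = [max(states, key=lambda s: v[s])]
    for ptr in reversed(pointers):
        path.append(ptr[path[-1]])
    return path[::-1]


def posterior(obs, states, init, trans, emit):
    """P(hidden state at t is s | obs) for every t (question 3)."""
    f = forward(obs, states, init, trans, emit)
    b = backward(obs, states, trans, emit)
    total = sum(f[-1].values())  # probability of the data (question 1)
    return [{s: f[t][s] * b[t][s] / total for s in states}
            for t in range(len(obs))]


# Hypothetical toy model: all numbers below are illustrative assumptions.
STATES = ("helix", "loop")
INIT = {"helix": 0.5, "loop": 0.5}
TRANS = {"helix": {"helix": 0.9, "loop": 0.1},
         "loop": {"helix": 0.2, "loop": 0.8}}
EMIT = {"helix": {"h": 0.7, "p": 0.3},
        "loop": {"h": 0.2, "p": 0.8}}

obs = "hhhppp"
likelihood = sum(forward(obs, STATES, INIT, TRANS, EMIT)[-1].values())
best_path = viterbi(obs, STATES, INIT, TRANS, EMIT)
marginals = posterior(obs, STATES, INIT, TRANS, EMIT)
```

Probabilities are multiplied directly here, which is fine for short sequences; longer sequences would need log-space or scaling to avoid underflow.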
9
Goldman-Thorne-Jones Application to 7 Xylanases
From Goldman et al.(1996) JMB.
10
A few small RNA Structures
[Figure: the possible structures on three nucleotides, four nucleotides, and so on; the pseudoknot is marked as forbidden.]
11
(No Transcript)
12
Simple String Generators
Non-terminals (capital); terminals (small)
i. Start with S:  S --> aT | bS,  T --> aS | bT | ε
   One sentence (odd number of a's): S --> aT --> aaS --> aabS --> aabaT --> aaba
ii. S --> aSa | bSb | aa | bb
   One sentence (even-length palindromes): S --> aSa --> abSba --> abaaba
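To see which strings the two grammars generate, one can enumerate all derivations up to a length bound. This is a minimal sketch: the function `generate` and the dictionary encoding of the rules are assumptions made for illustration, with the empty string standing in for ε.

```python
def generate(rules, start, max_len):
    """Enumerate all terminal strings of length <= max_len derivable from `start`.

    Assumes every rule except ε-rules adds at least one terminal, which holds
    for both grammars below and guarantees termination.
    """
    done, seen, stack = set(), {start}, [start]
    while stack:
        form = stack.pop()
        # Position of the leftmost non-terminal, if any.
        pos = next((k for k, c in enumerate(form) if c in rules), None)
        if pos is None:
            done.add(form)
            continue
        for rhs in rules[form[pos]]:
            new = form[:pos] + rhs + form[pos + 1:]
            terminals = sum(1 for c in new if c not in rules)
            if terminals <= max_len and new not in seen:
                seen.add(new)
                stack.append(new)
    return done


# Grammar i: strings over {a, b} with an odd number of a's ("" encodes ε).
ODD_A = {"S": ["aT", "bS"], "T": ["aS", "bT", ""]}
# Grammar ii: even-length palindromes over {a, b}.
PALINDROME = {"S": ["aSa", "bSb", "aa", "bb"]}

print(sorted(generate(ODD_A, "S", 3)))
print(sorted(generate(PALINDROME, "S", 4)))
```

Grammar i is regular (one non-terminal at the right end of every sentential form), while grammar ii needs the nested rule S --> aSa and is context-free but not regular.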
13
Secondary Structure Generators
S --> LS | L       (.869, .131)
F --> dFd | LS     (.788, .212)
L --> s | dFd      (.895, .105)
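A stochastic grammar like this one can be sampled to produce random secondary structures. The sketch below assumes that s denotes an unpaired position and that the two d's in dFd form a base pair, and writes them as `.`, `(` and `)` in dot-bracket style; that reading, the `RULES` encoding and the function name are assumptions for illustration.

```python
import random

# Rule probabilities as listed above; "(" and ")" stand for the paired d ... d,
# "." stands for s (assumed reading of the symbols).
RULES = {
    "S": [(0.869, ["L", "S"]), (0.131, ["L"])],
    "F": [(0.788, ["(", "F", ")"]), (0.212, ["L", "S"])],
    "L": [(0.895, ["."]), (0.105, ["(", "F", ")"])],
}


def sample_structure(rng):
    """Sample one structure by expanding the leftmost non-terminal until none remain."""
    stack, out = ["S"], []
    while stack:
        sym = stack.pop()
        if sym not in RULES:
            out.append(sym)          # terminal: emit it
            continue
        probs, options = zip(*RULES[sym])
        rhs = rng.choices(options, weights=probs)[0]
        stack.extend(reversed(rhs))  # leftmost symbol ends up on top of the stack
    return "".join(out)


rng = random.Random(0)
for _ in range(3):
    print(sample_structure(rng))
```

An explicit stack is used instead of recursion so that long stems do not hit Python's recursion limit. Because F --> LS always yields at least two positions, every sampled stem encloses a loop of at least two positions.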
14
Chomsky Linguistic Hierarchy
Source: Biological Sequence Analysis
W: nonterminal sign; a: any terminal sign; α, β, γ: strings, but β is not the null string; ε: the empty string.
i. Regular Grammars: W --> aW, W --> a
ii. Context-Free Grammars: W --> β
iii. Context-Sensitive Grammars: α1Wα2 --> α1βα2
iv. Unrestricted Grammars: α1Wα2 --> γ
The above listing is in increasing power of string generation. For instance, context-free grammars can generate every set of sequences that regular grammars can, plus some more.
15
SCFG Analogue to HMM calculations
HMM/Stochastic Regular Grammar
SCFG - Stochastic Context Free Grammars
[Figure: a non-terminal W covering positions i to j, split into WL and WR, on a sequence of positions 1 to L.]
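The SCFG counterpart of the HMM forward calculation is the inside algorithm, which sums over every way of splitting the span i..j of a non-terminal W into a left part WL and a right part WR. The following is a minimal sketch, assuming the grammar is in Chomsky normal form (rules W --> WL WR and W --> a); the toy grammar and its probabilities are hypothetical.

```python
from collections import defaultdict


def inside_probability(seq, binary_rules, terminal_rules, start="S"):
    """P(start derives seq) for a stochastic grammar in Chomsky normal form.

    binary_rules:   {(W, WL, WR): probability of W --> WL WR}
    terminal_rules: {(W, a): probability of W --> a}
    """
    n = len(seq)
    # inside[(i, j)][W] = P(W derives seq[i..j]), indices inclusive.
    inside = defaultdict(lambda: defaultdict(float))
    for i, ch in enumerate(seq):
        for (w, a), p in terminal_rules.items():
            if a == ch:
                inside[(i, i)][w] += p
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span - 1
            for k in range(i, j):  # WL covers i..k, WR covers k+1..j
                for (w, wl, wr), p in binary_rules.items():
                    left = inside[(i, k)].get(wl, 0.0)
                    right = inside[(k + 1, j)].get(wr, 0.0)
                    if left and right:
                        inside[(i, j)][w] += p * left * right
    return inside[(0, n - 1)].get(start, 0.0)


# Hypothetical toy grammar: S --> S S (0.3) | a (0.7).
BINARY = {("S", "S", "S"): 0.3}
TERMINAL = {("S", "a"): 0.7}
print(inside_probability("aaa", BINARY, TERMINAL))
```

Where the HMM forward recursion is linear in sequence length, this recursion is cubic, because every span and every split point within it is visited.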
16
(No Transcript)
17
From Knudsen & Hein (1999)
18
(No Transcript)
19
(No Transcript)
20
(No Transcript)
21
Comparative Gene Finding
Non-Comparative Gene Finding: gene characteristics, HMMs (e.g. GENSCAN)
Comparative Gene Finding:
  Structure & Homology: Procrustes (protein databases & gene finding)
  Alignment, then conserved exons, etc. (TwinScan, GLASS, Rosetta)
  Alignment with Gene Finding (Huson, Scharling, Blayo, ..)
  Structure & Evolution: Skou Pedersen, Irmtraud Meyer
22
Quality-of-Performance Measures
Burset & Guigo in Genomics 34.354-
23
The Total Gene Finding Problem
Data: Genomes, proteins & ESTs (Expressed Sequence Tags).
Tasks:
  Gene grammar / knowledge of genes.
  Genome sequence alignment.
  Genome - (protein/EST) alignment.
  Combined gene finding.
24
Gene and non-gene characteristics.
Gene characteristics:
i. dinucleotide, codon & dicodon characteristics
ii. regulatory regions
iii. start - splice - termination signals
iv. vague characteristics found by neural networks
v. gene evolution: replacement/silent substitutions < 1, few insertion-deletions, most of length 3k (a multiple of three)
Non-gene characteristics:
i. many repeats
ii. non-gene evolution: replacement/silent substitutions about 1, many insertion-deletions, also would-be frame-shifts
25
Gene Finding and Protein (HMM) Descriptors
Burge & Karlin, JMB 96
  • A. Assign gene characteristics to each nucleotide.
    Extract a legal prediction by dynamic
    programming.
  • B. Use an HMM to describe biological knowledge of
    gene structure.

26
GENSCAN: Gene Finding and Protein (HMM) Descriptors
Burge & Karlin, JMB 96
27
Molecular Evolution and Gene Finding: Two HMMs
AGTGGTACCATTTAATGCG.....
AGTGGTACTATTTAGTGCG.....
P_coding{ATG --> GTG} or P_non-coding{ATG --> GTG}
Simple Prokaryotic
Simple Eukaryotic
28
Molecular Evolution and Gene Finding
Meyer, Durbin and Goldman (Sanger Centre/Cambridge) are working on a highly similar approach.
W.-H. Li, T. Speed and others have similar methods.
Many have similar, but non-evolutionary, approaches.
29
Performance as sequence number grows.
Simulated Sequences related by a binary tree
analyzed using the true model.
Prob. that predicted gene is there.
Prob. that existing gene is predicted.
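The two probabilities named on this slide are commonly estimated at the nucleotide level from a true and a predicted annotation. This is a minimal sketch, assuming annotations are strings with one label per nucleotide (`C` for coding, anything else for non-coding); the label encoding and the function name `accuracy` are hypothetical.

```python
def accuracy(true_labels, predicted_labels, coding="C"):
    """Return (sensitivity, specificity) at the nucleotide level.

    sensitivity = TP / (TP + FN): probability that an existing coding
                  nucleotide is predicted.
    specificity = TP / (TP + FP): probability that a predicted coding
                  nucleotide is really there (the gene-finding usage of
                  the term, i.e. positive predictive value).
    A value is None when its denominator is zero.
    """
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label strings must have equal length")
    tp = fp = fn = 0
    for t, p in zip(true_labels, predicted_labels):
        if p == coding and t == coding:
            tp += 1
        elif p == coding:
            fp += 1
        elif t == coding:
            fn += 1
    sensitivity = tp / (tp + fn) if tp + fn else None
    specificity = tp / (tp + fp) if tp + fp else None
    return sensitivity, specificity


true_annotation = "NNCCCCNN"
prediction      = "NCCCNNNN"
print(accuracy(true_annotation, prediction))
```

In the example, two of the four true coding positions are predicted and two of the three predicted positions are truly coding.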
30
Sensitivity to selection and relatedness
31
Gene Finding & Protein Homology (Gelfand, Mironov & Pevzner, 1996)
Spliced Alignment:
1. Define set of potential exons in new genome.
2. Make exon ordering graph (EOG).
3. Align EOG to protein database.
[Figure: exon ordering graph aligned against a protein from the protein database, e.g. T Y G H L P aligned with T Y - - L P.]
32
Simultaneous Alignment & Gene Finding
Bafna & Huson, 2000; T. Scharling, 2001; Blayo, 2002.
Align by minimizing distance / maximizing similarity.
Align genes with structure: known/unknown.
33
Simultaneous Alignment & Gene Finding
Bafna & Huson, 2000; T. Scharling, 2001; Blayo, 2002.
Can only be done using similarity maximisation.
1-Type Similarity Recursion:
  S_{i,j} = \max\{ S_{i-1,j-1} + s_{i,j},\; S_{i-1,j} - g,\; S_{i,j-1} - g \}
  s_{i,j} = \log( P_{i,j} / (P_i P_j) )
Simple Model of Genome: Fast & Slowly Evolving Single Positions.
2-Type Similarity Recursion:
  (a) S^F_{i,j} = \max\{ S^F_{i-1,j-1} + s^F_{i,j},\; S^F_{i-1,j} - g,\; S^F_{i,j-1} - g,
                         S^S_{i-1,j-1} + s^S_{i,j} - c,\; S^S_{i-1,j} - g - c,\; S^S_{i,j-1} - g - c \}
  (b) S^S_{i,j} = \max\{ S^S_{i-1,j-1} + s^S_{i,j},\; S^S_{i-1,j} - g,\; S^S_{i,j-1} - g,
                         S^F_{i-1,j-1} + s^F_{i,j} - c,\; S^F_{i-1,j} - g - c,\; S^F_{i,j-1} - g - c \}
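The 1-type recursion is the standard global similarity alignment with a linear gap penalty g. Below is a minimal sketch of it; the match/mismatch scores are a hypothetical stand-in for the log-odds score s_{i,j} = log(P_{i,j}/(P_i P_j)), and the boundary conditions S_{i,0} = -i g and S_{0,j} = -j g are an assumption, since the slide does not state them.

```python
def similarity_score(x, y, score=None, gap=1):
    """Fill the table S[i][j] of the 1-type recursion and return S[len(x)][len(y)]."""
    if score is None:
        # Hypothetical stand-in for the log-odds score s_{i,j}.
        score = lambda a, b: 1 if a == b else -1
    n, m = len(x), len(y)
    S = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        S[i][0] = -i * gap          # x[0..i) aligned against gaps only
    for j in range(1, m + 1):
        S[0][j] = -j * gap          # y[0..j) aligned against gaps only
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            S[i][j] = max(
                S[i - 1][j - 1] + score(x[i - 1], y[j - 1]),  # align x_i with y_j
                S[i - 1][j] - gap,                            # x_i against a gap
                S[i][j - 1] - gap,                            # y_j against a gap
            )
    return S[n][m]


print(similarity_score("ACGT", "AGT"))  # three matches and one gap
```

The 2-type recursion would keep two such tables, one per position type, and pay the extra cost c whenever the alignment switches between them.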
34
Trivial Suboptimal Solutions
Suboptimal Solutions in Dynamical Programming/String Matching:
i. Solutions within ε of optimum.
ii. Label edges/nodes touched by SubOpt.

       50/60   42/50   32/40   27/30   17/20   22/10   19/0
  G
       40/50   32/40   22/30   17/20   22/10   17/2    27/10
  T
       30/40   22/30   12/25   22/17   12/7    22/12   32/20
  G
       20/32   12/27    2/17   12/7    22/12   32/22   42/30
  T
       10/27    2/17   10/12   20/17   30/32   40/32   50/40
  T
        0/19   10/12   20/22   30/27   40/37   50/42   60/50
            C       T       A       G       G       A
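One way to label the nodes touched by solutions within ε of the optimum is to combine a forward table (best score of a prefix alignment ending at a node) with a backward table (best score of a suffix alignment starting there). The sketch below assumes simple match/mismatch/gap scores and hypothetical function names; it illustrates the idea and does not reproduce the numbers in the table above.

```python
def prefix_scores(x, y, match=1, mismatch=-1, gap=1):
    """F[i][j] = best similarity score of aligning x[:i] with y[:j]."""
    n, m = len(x), len(y)
    F = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        F[i][0] = -i * gap
    for j in range(1, m + 1):
        F[0][j] = -j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if x[i - 1] == y[j - 1] else mismatch
            F[i][j] = max(F[i - 1][j - 1] + s, F[i - 1][j] - gap, F[i][j - 1] - gap)
    return F


def near_optimal_nodes(x, y, eps):
    """Nodes (i, j) lying on at least one alignment scoring within eps of the optimum."""
    n, m = len(x), len(y)
    F = prefix_scores(x, y)
    # Suffix scores are prefix scores of the reversed sequences.
    R = prefix_scores(x[::-1], y[::-1])
    optimum = F[n][m]
    return {(i, j)
            for i in range(n + 1) for j in range(m + 1)
            if F[i][j] + R[n - i][m - j] >= optimum - eps}


print(sorted(near_optimal_nodes("AC", "AC", 0)))
```

With eps = 0 only nodes on optimal alignments are returned; raising eps widens the band around them, which is exactly the kind of trivial suboptimality the slide describes.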
35
An Idea/Problem: Non-trivial Suboptimal Solutions
A problem motivated by alternative splicing: find non-trivial suboptimal solutions!
A trivial suboptimal solution is an optimal solution slightly worsened.
36
Alternative Splicing
37
Summary
General Problem: Entities observed in homologous variants.
  Describe its evolutionary process.
  Describe their evolutionary relationship.
  Infer hidden structure or combine observations optimally.
Two Applications: RNA structures. Gene finding.
Two Problems: No structure evolution. Alignment unproblematic.
Future: Alternative splicing. Viral genes.
38
Recommended Literature
Bafna, V. and Huson, D.H. (2000) The Conserved Exon Method for Gene Finding. ISMB 2000, pp. 3-12.
Batzoglou, S. et al. (2000) Human and Mouse Gene Structure: Comparative Analysis and Application to Exon Prediction. Genome Research 10.950-58.
Blayo, Rouze & Sagot (2002) Orphan Gene Finding - An exon assembly approach. J.Comp.Biol.
Delcher, A.L. et al. (1998) Alignment of Whole Genomes. Nuc.Ac.Res. 27.11.2369-76.
Graveley, B.R. (2001) Alternative Splicing: increasing diversity in the proteomic world. TIGS 17.2.100-
Guigo, R. et al. (2000) An Assessment of Gene Prediction Accuracy in Large DNA Sequences. Genome Research 10.1631-42.
Kan, Z. et al. (2001) Gene Structure Prediction and Alternative Splicing Using Genomically Aligned ESTs. Genome Research 11.889-900.
Korf, I. et al. (2001) Integrating genomic homology into gene structure prediction. Bioinformatics vol. 17, Suppl. 1, pages 140-148.
Pedersen, J.S. (2001) Progress Report: Comparative Gene Finding. Aarhus University.
Reese, M.G. et al. (2000) Genome Annotation Assessment in Drosophila melanogaster. Genome Research 10.483-501.
Scharling, T. (2001) Gene-identification using sequence comparison. Aarhus University.
Stein, L. (2001) Genome Annotation: From Sequence to Biology. Nature Reviews Genetics 2.493-
39
Acknowledgements
Comparative RNA Structure - Bjarne Knudsen, http://www.daimi.au.dk/compbio/pfold/
Comparative Gene Structure - Jakob Skou Pedersen