Title: Bioinformatics
Bioinformatics: Dynamic Programming
Dynamic Programming
- Alignment (local-global)
- RNA secondary structure prediction
- Transmembrane segments
- Hidden Markov Models
- Other applications
Alignment
- Global
- Local
- Special cases
Dynamic Programming
The two alignment cases
Global alignment: $F(i,0) = -id$, $F(0,j) = -jd$
Local alignment: $F(i,0) = 0$, $F(0,j) = 0$
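The following is a minimal Python sketch of the two cases. It assumes a linear gap penalty d and illustrative match/mismatch scores of +1/-1, which are not taken from the slides. Global and local alignment use the same recursion. They differ only in the initialisation above and in the local version's floor at 0.

```python
def align_score(x: str, y: str, d: float = 1.0, match: float = 1.0,
                mismatch: float = -1.0, local: bool = False) -> float:
    """Best alignment score of x and y with a linear gap penalty d.

    Global (Needleman-Wunsch): F(i,0) = -i*d, F(0,j) = -j*d, answer F(n,m).
    Local (Smith-Waterman):    F(i,0) = 0,    F(0,j) = 0, cells floored at 0,
                               answer = best cell anywhere in the matrix.
    The match/mismatch scores are illustrative assumptions.
    """
    n, m = len(x), len(y)
    F = [[0.0] * (m + 1) for _ in range(n + 1)]
    if not local:
        for i in range(1, n + 1):
            F[i][0] = -i * d
        for j in range(1, m + 1):
            F[0][j] = -j * d
    best = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if x[i - 1] == y[j - 1] else mismatch
            cell = max(F[i - 1][j - 1] + s,   # align x_i with y_j
                       F[i - 1][j] - d,       # x_i against a gap
                       F[i][j - 1] - d)       # y_j against a gap
            if local:
                cell = max(cell, 0.0)         # a local alignment may start afresh
                best = max(best, cell)
            F[i][j] = cell
    return best if local else F[n][m]
```

For example, `align_score("ACGT", "AGT")` returns 2.0. The best global alignment is ACGT against A-GT: three matches and one gap.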
Gap penalties
Simple (linear) gap penalty: a gap of length $g$ costs $\gamma(g) = -gd$
Compound gap penalty, e.g. affine: $\gamma(g) = -d - (g-1)e$
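Here is a small sketch that makes the difference between the two penalty shapes concrete. It assumes that the compound penalty is the usual affine form, with gap-open cost d and gap-extension cost e. The slides do not spell the formula out.

```python
def linear_gap(g: int, d: float) -> float:
    """Simple (linear) penalty: every gap position costs d."""
    return -g * d


def affine_gap(g: int, d: float, e: float) -> float:
    """Affine penalty (assumed form of the compound penalty): open d, extend e."""
    return 0.0 if g == 0 else -d - (g - 1) * e
```

Take d = 5 and e = 1. One gap of length 3 costs `affine_gap(3, 5, 1)` = -7, while three separate gaps of length 1 cost 3 × -5 = -15. The affine form therefore favours a single long gap.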
Example
Consider two sequences.
If the gap penalty is $d = 1$,
then the best global alignment will be
A A G T T A G C A G C A G T A T C G C A -
Global alignment
A A G T T A G C A G C A G T A T C G C A -
Local alignment
A G T T A G C A A G T A T C G C A
Other algorithms
- There are also special alignment cases (e.g. fitting alignment)
- That is, we want to find out whether a short sequence occurs within a longer one
- Suppose we want to detect whether the known promoter sequence occurs in the sequence of the lacI gene of E. coli.
- Suppose further that the gene segment has the sequence
- and that the promoter sequence is
(continued)
$F(i,0) = -id$, $F(0,j) = 0$, and the best score is $\max_j F(n,j)$, where $n$ is the length of the short sequence.
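Below is a sketch of this fitting variant. It assumes that the short sequence x (e.g. the promoter) is indexed by i and the long sequence y (the gene) by j. It uses the same illustrative +1/-1 scores and linear gap penalty d as a plain global alignment. Only the initialisation and the read-out change: the best score is taken from the last row, max_j F(n, j).

```python
def fitting_score(x: str, y: str, d: float = 1.0, match: float = 1.0,
                  mismatch: float = -1.0) -> tuple[float, int]:
    """Best score for fitting all of the short x somewhere inside the long y.

    F(i,0) = -i*d : every residue of x must be aligned
    F(0,j) = 0    : skipping a prefix of y is free
    max_j F(n,j)  : skipping a suffix of y is free as well
    Returns (score, end position of the match in y, 1-based).
    Scores are illustrative assumptions.
    """
    n, m = len(x), len(y)
    F = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        F[i][0] = -i * d
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if x[i - 1] == y[j - 1] else mismatch
            F[i][j] = max(F[i - 1][j - 1] + s, F[i - 1][j] - d, F[i][j - 1] - d)
    end = max(range(m + 1), key=lambda j: F[n][j])
    return F[n][end], end
```

As an example, `fitting_score("CATGAT", "GGGCATGATTTT")` returns `(6.0, 9)`. The long sequence here is made up for illustration.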
And the sequence of the candidate promoter is
C A T G A T
RNA secondary structure prediction
Nussinov
Transmembrane segments
The 3 basic questions for an HMM ...
- Evaluation
- Given the model, how do we compute the total probability of a sequence of symbols, $P(x \mid \theta)$?
- Decoding
- How do we find the most probable sequence of states (path) that the model passed through to produce the given sequence of symbols?
- Training
- How do we adjust the model parameters so that the total likelihood of the sequences is maximized: $\theta^{ML} = \arg\max_{\theta} P(x \mid \theta)$?
... and their answers
- Evaluation
- FORWARD algorithm: a dynamic programming algorithm that computes the total probability of the sequence, summed over all possible paths (state sequences) that could have generated it.
- Decoding
- VITERBI algorithm: a dynamic programming algorithm that uses recursion to compute the most probable state sequence for the given sequence and the given model. (Alternatively: N-best.)
- Training
- BAUM-WELCH algorithm (also known as FORWARD-BACKWARD): a special case of the EM (Expectation-Maximization) algorithm. It treats the data as having missing values (the unknown state path) and computes maximum-likelihood estimates (MLE) of the model parameters. (Alternatively: gradient descent.)
Forward algorithm
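Here is a minimal sketch of the Forward recursion. It uses the two-state example model that appears later in these slides: states 1 and 0, transitions 0.90/0.10, and A-rich emissions in state 1. The initial probabilities `START` (0.5 each) are an assumption, since the slides do not give them. No end state is modelled.

```python
# Two-state example model ("1" = A-rich, "0" = uniform).
STATES = ("1", "0")
START = {"1": 0.5, "0": 0.5}          # assumed; not given on the slides
TRANS = {"1": {"1": 0.90, "0": 0.10}, "0": {"1": 0.10, "0": 0.90}}
EMIT = {"1": {"A": 0.70, "T": 0.10, "G": 0.10, "C": 0.10},
        "0": {"A": 0.25, "T": 0.25, "G": 0.25, "C": 0.25}}


def forward(x, states=STATES, start=START, trans=TRANS, emit=EMIT):
    """Forward algorithm: P(x | model), summed over every possible state path.

    f_k(1) = start_k * e_k(x_1)
    f_l(i) = e_l(x_i) * sum_k f_k(i-1) * a_kl
    P(x)   = sum_k f_k(L)          (no explicit end state)
    Raw probabilities are used, so very long sequences will underflow.
    """
    f = [{k: start[k] * emit[k][x[0]] for k in states}]
    for sym in x[1:]:
        prev = f[-1]
        f.append({l: emit[l][sym] * sum(prev[k] * trans[k][l] for k in states)
                  for l in states})
    return sum(f[-1].values()), f
```

Because no end state is modelled, the probabilities of all sequences of a fixed length sum to 1. This makes a convenient sanity check.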
Viterbi algorithm
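Below is a sketch of Viterbi decoding for the same two-state example model. It works in log space and traces back through stored pointers. As before, the initial probabilities (0.5 each) are an assumption.

```python
import math

STATES = ("1", "0")
START = {"1": 0.5, "0": 0.5}          # assumed; not given on the slides
TRANS = {"1": {"1": 0.90, "0": 0.10}, "0": {"1": 0.10, "0": 0.90}}
EMIT = {"1": {"A": 0.70, "T": 0.10, "G": 0.10, "C": 0.10},
        "0": {"A": 0.25, "T": 0.25, "G": 0.25, "C": 0.25}}


def _log(p):
    return math.log(p) if p > 0 else float("-inf")


def viterbi(x, states=STATES, start=START, trans=TRANS, emit=EMIT):
    """Most probable state path for x and its log-probability.

    v_k(1) = log start_k + log e_k(x_1)
    v_l(i) = log e_l(x_i) + max_k [v_k(i-1) + log a_kl]   (pointer = argmax k)
    """
    v = [{k: _log(start[k]) + _log(emit[k][x[0]]) for k in states}]
    pointers = []
    for sym in x[1:]:
        prev, row, ptr = v[-1], {}, {}
        for l in states:
            best = max(states, key=lambda k: prev[k] + _log(trans[k][l]))
            row[l] = prev[best] + _log(trans[best][l]) + _log(emit[l][sym])
            ptr[l] = best
        v.append(row)
        pointers.append(ptr)
    last = max(states, key=lambda k: v[-1][k])   # best final state
    path = [last]
    for ptr in reversed(pointers):                # traceback
        path.append(ptr[path[-1]])
    return "".join(reversed(path)), v[-1][last]
```

For example, `viterbi("AAAA")[0]` gives `"1111"` and `viterbi("GCGC")[0]` gives `"0000"`.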
Forward decoding
Posterior decoding
Alternatively, we can compute the posterior probability, i.e. the probability that a particular nucleotide was emitted by a given state, $P(\pi_i = k \mid x)$.
This uses the Forward and Backward algorithms: $P(\pi_i = k \mid x) = f_k(i)\,b_k(i) / P(x)$.
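The following sketch computes $P(\pi_i = k \mid x)$ from a Forward and a Backward pass and then picks the most probable state at each position. It uses the two-state example model with assumed initial probabilities of 0.5. With no end state, the backward variables start at $b_k(L) = 1$.

```python
STATES = ("1", "0")
START = {"1": 0.5, "0": 0.5}          # assumed; not given on the slides
TRANS = {"1": {"1": 0.90, "0": 0.10}, "0": {"1": 0.10, "0": 0.90}}
EMIT = {"1": {"A": 0.70, "T": 0.10, "G": 0.10, "C": 0.10},
        "0": {"A": 0.25, "T": 0.25, "G": 0.25, "C": 0.25}}


def posterior(x, states=STATES, start=START, trans=TRANS, emit=EMIT):
    """For each position i return {k: P(pi_i = k | x)} = f_k(i) b_k(i) / P(x)."""
    # Forward pass: f[i][k]
    f = [{k: start[k] * emit[k][x[0]] for k in states}]
    for sym in x[1:]:
        f.append({l: emit[l][sym] * sum(f[-1][k] * trans[k][l] for k in states)
                  for l in states})
    # Backward pass: b[i][k] = sum_l a_kl e_l(x_{i+1}) b_l(i+1), with b_k(L) = 1
    b = [dict.fromkeys(states, 1.0)]
    for sym in reversed(x[1:]):
        b.insert(0, {k: sum(trans[k][l] * emit[l][sym] * b[0][l] for l in states)
                     for k in states})
    px = sum(f[-1].values())
    return [{k: f[i][k] * b[i][k] / px for k in states} for i in range(len(x))]


def posterior_decode(x):
    """Label each position with its individually most probable state."""
    return "".join(max(p, key=p.get) for p in posterior(x))
```

Each position is decoded independently. This is why the resulting label string is not guaranteed to respect the model's transitions.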
- Advantages
- useful when the alternative paths differ very little in their predicted probabilities.
- when a state has only a small probability and is never visited by the most probable path.
- Disadvantages
- it may predict a state sequence that is not valid for the model (i.e. one that contains a disallowed transition).
The algorithm in brief
- Compute A and E (the expected transition and emission counts)
- Compute the maximum-likelihood estimates (MLE) of the parameters
- Repeat until convergence
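Below is a compact sketch of the Baum-Welch algorithm described among the answers to the three HMM questions, applied to the two-state example model. It computes the expected counts A_kl and E_k(b) with Forward/Backward and normalises them into new maximum-likelihood estimates. It repeats until the log-likelihood stops improving. Simplifying assumptions: the initial probabilities are fixed at 0.5 and not re-estimated, no pseudocounts are added, and raw probabilities are used, so this only suits short sequences.

```python
import math

STATES = ("1", "0")
ALPHABET = "ATGC"
START = {"1": 0.5, "0": 0.5}          # assumed and kept fixed in this sketch
TRANS = {"1": {"1": 0.90, "0": 0.10}, "0": {"1": 0.10, "0": 0.90}}
EMIT = {"1": {"A": 0.70, "T": 0.10, "G": 0.10, "C": 0.10},
        "0": {"A": 0.25, "T": 0.25, "G": 0.25, "C": 0.25}}


def _forward_backward(x, states, start, trans, emit):
    f = [{k: start[k] * emit[k][x[0]] for k in states}]
    for sym in x[1:]:
        f.append({l: emit[l][sym] * sum(f[-1][k] * trans[k][l] for k in states)
                  for l in states})
    b = [dict.fromkeys(states, 1.0)]
    for sym in reversed(x[1:]):
        b.insert(0, {k: sum(trans[k][l] * emit[l][sym] * b[0][l] for l in states)
                     for k in states})
    return f, b, sum(f[-1].values())


def baum_welch(seqs, states=STATES, alphabet=ALPHABET, start=START,
               trans=TRANS, emit=EMIT, tol=1e-6, max_iter=100):
    """Re-estimate transitions/emissions until log P(data) stops improving."""
    history = []
    for _ in range(max_iter):
        # E-step: expected counts A_kl and E_k(b)
        A = {k: dict.fromkeys(states, 0.0) for k in states}
        E = {k: dict.fromkeys(alphabet, 0.0) for k in states}
        loglik = 0.0
        for x in seqs:
            f, b, px = _forward_backward(x, states, start, trans, emit)
            loglik += math.log(px)
            for i, sym in enumerate(x):
                for k in states:
                    E[k][sym] += f[i][k] * b[i][k] / px
            for i in range(len(x) - 1):
                for k in states:
                    for l in states:
                        A[k][l] += (f[i][k] * trans[k][l] * emit[l][x[i + 1]]
                                    * b[i + 1][l] / px)
        history.append(loglik)
        # M-step: normalise expected counts into new ML estimates
        trans = {k: {l: A[k][l] / sum(A[k].values()) for l in states} for k in states}
        emit = {k: {c: E[k][c] / sum(E[k].values()) for c in alphabet} for k in states}
        if len(history) > 1 and history[-1] - history[-2] < tol:
            break
    return trans, emit, history
```

As with any EM procedure, the recorded log-likelihoods in `history` should never decrease from one iteration to the next.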
An example...
(continued)...
Transition probabilities (from row state to column state):
State | 1 | 0
1 | 0.90 | 0.10
0 | 0.10 | 0.90
Emission probabilities:
State | A | T | G | C
1 | 0.70 | 0.10 | 0.10 | 0.10
0 | 0.25 | 0.25 | 0.25 | 0.25
(continued)...
Consider a DNA sequence generated by the model above (the state labels are shown under each row):
AAACAAGAATGCGCACACTACGCAAAAACAATTAGTCGCACTCACGATGAAACAAATTACCACGGTGAA
111111111100000000000001111111111100000000000000111111110000000000001
AACGAATAAACCTCAGAGGCCCAGCGTATATAAACAAGATAAAAACCTAGTCAGCACTCTGACCAGACG
111111111100000000000000000000011111111111111100000000000000000000000
AGCTCACGACTTGAGGATAAGAAAAAAACAACAGCTCACGACTTGAGGATAAGAAAAAAACA
00000000000000001111111111111100000000000000000011111111111111
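The example sequence is described as generated by the model above. The sketch below shows how a sequence and its state labels could be sampled from that model. `sample` and its `start_state` parameter are hypothetical: the slides give no initial distribution, so the first state is simply fixed. The output depends on the random seed.

```python
import random

TRANS = {"1": {"1": 0.90, "0": 0.10}, "0": {"1": 0.10, "0": 0.90}}
EMIT = {"1": {"A": 0.70, "T": 0.10, "G": 0.10, "C": 0.10},
        "0": {"A": 0.25, "T": 0.25, "G": 0.25, "C": 0.25}}


def sample(length, start_state="1", rng=None):
    """Generate (sequence, state labels) from the two-state example HMM.

    start_state is a hypothetical choice; the slides give no initial distribution.
    """
    rng = rng or random.Random()
    state, seq, labels = start_state, [], []
    for _ in range(length):
        symbols, weights = zip(*EMIT[state].items())
        seq.append(rng.choices(symbols, weights=weights)[0])   # emit a base
        labels.append(state)
        next_states, probs = zip(*TRANS[state].items())
        state = rng.choices(next_states, weights=probs)[0]     # move on
    return "".join(seq), "".join(labels)
```

With transition probabilities of 0.90 for staying in a state, sampled labels come in runs of about ten positions on average. This is the kind of block structure visible in the example.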
(continued)...
(continued)...
But what if the transition probabilities changed?
Transition probabilities (from row state to column state):
State | 1 | 0
1 | 0.98 | 0.02
0 | 0.03 | 0.97
Emission probabilities:
State | A | T | G | C
1 | 0.60 | 0.10 | 0.10 | 0.10
0 | 0.25 | 0.25 | 0.25 | 0.25
(continued)...
Posterior-Viterbi decoding
The allowed transitions are respected
Optimal Accuracy Posterior Decoding
A variant of Posterior-Viterbi that computes the path
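The following sketch decodes over a matrix of posterior probabilities while respecting the allowed transitions. With `mode="product"` it maximises the product of the posteriors along an allowed path, which is the Posterior-Viterbi idea. With `mode="sum"` it maximises their sum, which is our reading of the optimal-accuracy variant and should be treated as an assumption. The posterior values and the three-state topology (`i`, `M`, `o` for inside, membrane, outside) are made up for illustration.

```python
import math


def constrained_decode(post, states, allowed, mode="product"):
    """Best allowed path through posteriors post[i][k] = P(pi_i = k | x).

    mode="product": maximise prod_i P(pi_i | x)  (sum of logs)
    mode="sum":     maximise sum_i  P(pi_i | x)
    allowed: set of permitted (from_state, to_state) transitions.
    """
    def score(p):
        if mode == "product":
            return math.log(p) if p > 0 else float("-inf")
        return p

    v = [{k: score(post[0][k]) for k in states}]
    pointers = []
    for i in range(1, len(post)):
        row, ptr = {}, {}
        for l in states:
            cands = [k for k in states if (k, l) in allowed]
            if not cands:                       # state l is unreachable
                row[l], ptr[l] = float("-inf"), None
                continue
            best = max(cands, key=lambda k: v[-1][k])
            row[l], ptr[l] = v[-1][best] + score(post[i][l]), best
        v.append(row)
        pointers.append(ptr)
    last = max(states, key=lambda k: v[-1][k])
    path = [last]
    for ptr in reversed(pointers):              # traceback
        path.append(ptr[path[-1]])
    return list(reversed(path))
```

Take the posteriors `[{"i": 0.8, "M": 0.1, "o": 0.1}, {"i": 0.1, "M": 0.3, "o": 0.6}]` with a direct i→o move forbidden. Per-position decoding would give i then o, which the topology does not allow. Both modes instead return `["i", "M"]`.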
Comparison
Other applications
- Fold recognition
- Threading
- Domain recognition
Fold recognition
Threading
- Protein threading is the problem of aligning a
protein sequence whose structure we want to
elucidate (the target protein) with a protein
sequence whose structure is known (the template
protein) in such a way that mapping residues of
the target onto a template according to the
alignment affords an accurate model of the
backbone structure of the target.
Domain recognition
Transformational Grammars
"Colourless green ideas sleep furiously." (Chomsky)
A transformational grammar consists of a number of symbols and a number of rewriting rules (productions) of the form $\alpha \to \beta$, where $\alpha$ and $\beta$ are both strings of symbols, e.g. C → cN, C → E. There are two types of symbols:
- abstract nonterminal symbols
- terminal (observable) symbols
Production rules
- Regular grammars: only productions of the form $W \to aW$ or $W \to a$
- Context-free grammars: productions of the form $W \to \beta$. The left-hand side is just one nonterminal; the right-hand side is any string
- Context-sensitive grammars: productions of the form $\alpha_1 W \alpha_2 \to \alpha_1 \beta \alpha_2$
- Unrestricted grammars: any production of the form $\alpha_1 W \alpha_2 \to \gamma$
- $W$: any nonterminal
- $a$: any terminal
- $\alpha$, $\gamma$: any string of nonterminals and/or terminals, including the null string
- $\beta$: any string of nonterminals and/or terminals, not including the null string
Regular Expressions
[RK]-G-{EDRKHPCG}-[AGSCI]-[FY]-[LIVA]-x-[FYM]
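Equivalence
S → rW1 | kW1
W1 → gW2
W2 → [afilmnqstvwy]W3
W3 → [agsci]W4
W4 → fW5 | yW5
W5 → lW6 | iW6 | vW6 | aW6
W6 → [acdefghiklmnpqrstvwy]W7
W7 → f | y | m
Here [xyz]W is shorthand for the alternatives xW | yW | zW. W2 lists every residue except E, D, R, K, H, P, C and G.
[RK]-G-{EDRKHPCG}-[AGSCI]-[FY]-[LIVA]-x-[FYM]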
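The same pattern can be checked mechanically. The sketch below translates this simple PROSITE-style pattern into a Python regular expression. `prosite_to_regex` is a hypothetical helper that handles only the elements used here: [..] for "any of", {..} for "none of", x for any residue, and single residues. Repeat counts such as x(2) are not supported.

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Translate a simple PROSITE-style pattern into a Python regex string."""
    parts = []
    for element in pattern.split("-"):        # '-' only separates positions
        if element.startswith("{"):
            parts.append("[^" + element[1:-1] + "]")   # none of these residues
        elif element == "x":
            parts.append(".")                          # any residue
        else:
            parts.append(element)                      # [..] set or single residue
    return "".join(parts)


MOTIF = re.compile(prosite_to_regex("[RK]-G-{EDRKHPCG}-[AGSCI]-[FY]-[LIVA]-x-[FYM]"))
```

For example, `MOTIF.search("AAKGAAFLQFAA").group()` returns `"KGAAFLQF"`. The input sequence is made up for illustration. A regular expression like this is exactly the kind of pattern a regular grammar can generate.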
Stochastic Grammars
"The notion 'probability of a sentence' is an entirely useless one, under any known interpretation of this term." Noam Chomsky (famed linguist)
"Every time I fire a linguist, the performance of the recognizer improves." Fred Jelinek (former head of IBM speech recognition group)
HMMs and Regular grammars
Modeling (allowed) transitions explicitly:
B → L | F | E
L → L | F | E
F → L | F | E
In the notation of the grammars, these are the nonterminal symbols.
Modeling emissions explicitly (no probabilities here):
in state F: a, c, g, t
in state L: a, c, g, t
In the notation of the grammars, these are the terminal symbols.
All together
- Modelling each combination of state and transition explicitly:
- B → aL | cL | gL | tL | aF | cF | gF | tF | E
- L → aL | cL | gL | tL | aF | cF | gF | tF | E
- F → aL | cL | gL | tL | aF | cF | gF | tF | E
- $P(B \to aL) = P(L \mid B)\,P(a \mid L)$
- $P(L \to aF) = P(F \mid L)\,P(a \mid F)$
- These are the so-called rewriting rules
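Multiplying the rule probabilities along a derivation gives the same value as the HMM joint probability of the sequence and its state path. The sketch below computes that product. The transition and emission numbers are hypothetical, chosen only for illustration. B (begin) and E (end) emit nothing.

```python
# Hypothetical parameters for the B/L/F/E grammar (illustration only).
TRANS = {"B": {"L": 0.5, "F": 0.5},
         "L": {"L": 0.8, "F": 0.1, "E": 0.1},
         "F": {"L": 0.1, "F": 0.8, "E": 0.1}}
EMIT = {"L": dict.fromkeys("acgt", 0.25),
        "F": {"a": 0.4, "c": 0.1, "g": 0.1, "t": 0.4}}


def derivation_prob(symbols, states, trans=TRANS, emit=EMIT):
    """Probability of B -> x1 s1 -> x1 x2 s2 -> ... -> x E.

    Each rewriting step k -> a l has probability P(l | k) * P(a | l):
    a transition followed by an emission from the new state.
    """
    p, prev = 1.0, "B"
    for a, state in zip(symbols, states):
        p *= trans[prev][state] * emit[state][a]
        prev = state
    return p * trans[prev]["E"]               # final rewriting step to E
```

For example, consider the derivation B → aF → aaL → aacL → aactF → aactE, which uses the states F, L, L, F. `derivation_prob("aact", "FLLF")` evaluates to 0.5·0.4 × 0.1·0.25 × 0.8·0.25 × 0.1·0.4 × 0.1 = 4e-6.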
- That's all we need to define a stochastic regular grammar!
- Finite alphabet of terminal symbols
- (a, c, g, t)
- Finite set of nonterminal symbols
- (B, L, F, E)
- A set of rewriting rules
- (B → aF, L → cF, ...)
- Probabilities
- P(B → aL)
HMM | Stochastic regular grammar
Hidden states | Nonterminals
Transition matrix | Rewriting rules
Emission matrix | Terminals
Probabilities | Probabilities
Example: a possible regular grammar
N → aF | cF | gF | tF | aL | cL | gL | tL | E (probabilities 0.1, 0.1, 0.3, ...)
B → aF | cF | gF | tF | aL | cL | gL | tL | E (probabilities 0.2, 0.1, 0.2, ...)
C → aF | cF | gF | tF | aL | cL | gL | tL | E (probabilities 0.1, 0.3, 0.2, ...)
An example derivation from the above grammar is B ⇒ aF ⇒ aaL ⇒ aacL ⇒ aactF ⇒ aactE.
Finite state automata: Mealy, Moore
Limitations of regular grammars
- Regular language
- a b a a a b
- Palindrome language
- a a b b a a
- Copy language
- a a b a a b
Palindromic languages
- ΝΙΨΟΝ ΑΝΟΜΗΜΑΤΑ ΜΗ ΜΟΝΑΝ ΟΨΙΝ ("Wash your sins, not only your face")
- Doc, note: I dissent. A fast never prevents a fatness. I diet on cod.
- RNA secondary structure
- aggccuaaauagaucuag...
- ((()))...(((())))....
Context-free grammars
- In a context-free grammar, the left-hand side must contain one and only one nonterminal, while the right-hand side can be any combination of terminals and nonterminals
- S → aSa | bSb | aa | bb
- S ⇒ aSa ⇒ aaSaa ⇒ aabSbaa ⇒ aabaabaa
- Parsing is done with push-down automata
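The sketch below is a direct recogniser for the grammar S → aSa | bSb | aa | bb. Peeling matching outer symbols off the word mirrors the stack a push-down automaton would use. It also reproduces the derivation shown above. Both function names are hypothetical.

```python
def derives(w: str) -> bool:
    """True if S -> aSa | bSb | aa | bb generates w (even-length a/b palindromes)."""
    if w in ("aa", "bb"):
        return True
    # S -> aSa or S -> bSb: outer symbols must match, the middle must derive from S
    return len(w) > 2 and w[0] == w[-1] and w[0] in "ab" and derives(w[1:-1])


def derivation(w: str) -> list[str]:
    """Sentential forms S => ... => w, peeling one outer pair per step."""
    if not derives(w):
        raise ValueError(f"{w!r} is not generated by the grammar")
    steps, outer = ["S"], 0
    while len(w) - 2 * outer > 2:
        outer += 1
        steps.append(w[:outer] + "S" + w[len(w) - outer:])
    steps.append(w)
    return steps
```

`derivation("aabaabaa")` returns `["S", "aSa", "aaSaa", "aabSbaa", "aabaabaa"]`, the same steps as on the slide.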
Context-free grammars for RNA
Chomsky Normal Form
- W1 → W2W3 or W1 → a
- Every context-free grammar can be brought into this form
- Particularly useful for the algorithms
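The CYK recogniser below shows why the normal form helps. Every rule either splits a span in two or produces a single terminal, so the table can be filled bottom-up by span length. The example grammar is our own CNF rewrite of S → aSa | bSb | aa | bb, introducing nonterminals A, B, X, Y; it is not from the slides.

```python
def cyk(w, binary, terminal, start="S"):
    """CYK recognition for a grammar in Chomsky normal form.

    binary:   list of (A, B, C) for rules A -> B C
    terminal: list of (A, a)    for rules A -> a
    T[i][j] holds the nonterminals that derive w[i..j].
    """
    n = len(w)
    if n == 0:
        return False
    T = [[set() for _ in range(n)] for _ in range(n)]
    for i, a in enumerate(w):
        T[i][i] = {A for A, t in terminal if t == a}
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span - 1
            for k in range(i, j):                 # split point
                for A, B, C in binary:
                    if B in T[i][k] and C in T[k + 1][j]:
                        T[i][j].add(A)
    return start in T[0][n - 1]


# CNF version of S -> aSa | bSb | aa | bb (our own rewrite, for illustration)
BINARY = [("S", "A", "X"), ("S", "B", "Y"), ("S", "A", "A"), ("S", "B", "B"),
          ("X", "S", "A"), ("Y", "S", "B")]
TERMINAL = [("A", "a"), ("B", "b")]
```

`cyk("aabaabaa", BINARY, TERMINAL)` is `True`, and `cyk("aabb", BINARY, TERMINAL)` is `False`. The three nested loops over i, j and k give the cubic running time in the sequence length.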
Stochastic Context-free grammars (SCFGs)
- Each rule is assigned a probability
- Main advantage: the straightforward extension and refinement of the results (as, for example, going from a regular expression to an HMM)
- Example: we can allow (with different, and small, probabilities) non-Watson-Crick pairings such as G-U and C-A
The basic questions for an SCFG
- How do we find the best alignment of a sequence to a grammar? (alignment/parsing problem)
- How do we compute the probability of a sequence given a grammar? (scoring problem)
- How do we find the best parameters of a grammar, given some known examples? (training problem)
Their answers
- Cocke-Younger-Kasami (CYK) algorithm: the counterpart of Viterbi for HMMs
- Inside (Outside) algorithm: the counterpart of Forward (Backward)
- Inside-Outside algorithm: the counterpart of Baum-Welch (Forward-Backward)
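Below is a sketch of the Inside algorithm for an SCFG in Chomsky normal form. It has the same loop structure as CYK, but it sums rule probabilities over all parses instead of testing membership. Replacing the sum with a max would give the probabilistic CYK (Viterbi-style) score. The toy grammar S → SS (0.4) | a (0.6) is hypothetical.

```python
from collections import defaultdict


def inside(w, binary, terminal, start="S"):
    """P(start =>* w) for an SCFG in Chomsky normal form.

    binary:   list of (A, B, C, p) for rules A -> B C with probability p
    terminal: list of (A, a, p)    for rules A -> a   with probability p
    alpha[i][j][A] = P(A =>* w[i..j]).
    """
    n = len(w)
    alpha = [[defaultdict(float) for _ in range(n)] for _ in range(n)]
    for i, sym in enumerate(w):
        for A, t, p in terminal:
            if t == sym:
                alpha[i][i][A] += p
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span - 1
            for k in range(i, j):                 # sum over split points
                for A, B, C, p in binary:
                    alpha[i][j][A] += p * alpha[i][k][B] * alpha[k + 1][j][C]
    return alpha[0][n - 1][start]


BINARY = [("S", "S", "S", 0.4)]
TERMINAL = [("S", "a", 0.6)]
```

For "aa" there is one parse, with probability 0.4 × 0.6 × 0.6 = 0.144. For "aaa" there are two parses, so the result is 2 × 0.4 × 0.144 × 0.6 = 0.06912.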
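Correspondences
Goal | HMM | SCFG
Optimal alignment | Viterbi | CYK
$P(x \mid \theta)$ | Forward | Inside
EM algorithm | Baum-Welch | Inside-Outside
Memory complexity | $O(LM)$ | $O(L^2M)$
Time complexity | $O(LM^2)$ | $O(L^3M^3)$
($L$ = sequence length, $M$ = number of states or nonterminals)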
Other approaches
- Nussinov algorithm
- Maximizes the number of base pairs
- Zuker algorithm
- Minimizes a free-energy function (ΔG), which gives better results
- Both algorithms can be written in an equivalent SCFG form
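Here is a minimal Nussinov sketch. It fills N(i, j), the maximum number of base pairs in the subsequence i..j, from four cases: i unpaired, j unpaired, i paired with j, and a bifurcation at k. The set of allowed pairs (Watson-Crick plus G-U) and the minimum hairpin loop of 3 bases are assumptions. The traceback that recovers the actual structure is omitted.

```python
# Allowed pairs: Watson-Crick plus G-U wobble (an assumption for this sketch).
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def nussinov(seq: str, min_loop: int = 3) -> int:
    """Maximum number of base pairs; hairpin loops need >= min_loop bases."""
    n = len(seq)
    if n == 0:
        return 0
    N = [[0] * n for _ in range(n)]                    # N[i][j] for seq[i..j]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            best = max(N[i + 1][j], N[i][j - 1])       # i or j unpaired
            if (seq[i], seq[j]) in PAIRS:
                best = max(best, N[i + 1][j - 1] + 1)  # i pairs with j
            for k in range(i + 1, j):                  # bifurcation
                best = max(best, N[i][k] + N[k + 1][j])
            N[i][j] = best
    return N[0][n - 1]
```

For example, `nussinov("GGGAAAUCC")` returns 3: G-C, G-C and G-U enclose an AAA hairpin loop. Zuker's algorithm keeps the same dynamic-programming shape but replaces the +1 per pair with free-energy terms that it minimises.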
Special cases
Pseudoknots
Special modifications are required to incorporate them into an SCFG
Extensions
- HMM → profile HMM
- SCFG → Covariance Model (CM)
- Eddy and Durbin, 1994
What about proteins?
Variants
- Ranked Node Rewriting Grammar (RNRG)
- Multi-Tape S-Attributed Grammars (MTSAG)
Ranked Node Rewriting Grammar (RNRG)
Multi-Tape S-Attributed Grammars (MTSAG)
Results
- Prediction of Bacteriorhodopsin (1AP9)
- QAQITGRPEWIWLALGTALMGLGTLYFLVKGMGVSDPDAKKFYAITTLVPAIAFTMYLSMLLGYGLTMVPFGGEQNPIYWARYADWLFTTPLLLLDLALLVDAD
- .......TTHHHHHHHHHHHTTHHHHHHHHSS..S.HHHHHHHHHHHHTHHHHHHHHHHHHTT.....SSS.SSS....STTHHHHTTTHHHHTTTTSTTTT..
- .........MMMMMMMMMMMMMMMMMMMMMMMMMM......PMMPMMPPMMPPMMPPMMPMMPMMPMMP........PPMPPMPPMPPMPPMMPPMPPMPP...
- .........PMMPMMPMMPMMPMMPMMPPMMPMMP......PMMPMMPPMMPPMMPPMMPPMMPPMMPP........PPMPPMPPMPPMPPMPPMMPMMPP...
- QGTILALVGADGIMIGTGLVGALTKVYSYRFVWWAISTAAMLYILYVLFFGFTSKAESMRPEVASTFKVLRNVTVVLWSAYPVVWLIGSEGAGIVPLNIETLLF
- HHHHHHHHHHHHHHHHHHHHHHS..SSS.HHHHHHHHHHHHHHHHHHHTTTTTTT..TT.SHHHHTTHHHHHHHHHHHHHHHHHHTTTTSSSSSS.SHHHHHHH
- PPMPPMPPMPPMPPMMPMMPMMP.....PMMPMMPMMPMMPMMPPMMPPMPP..........PPMMPMMPMMPMMPMMPPMMPPMMP......PPMMPPMMPPM
- PMMPPMPPMPPMMPMMPMMPMMP.....PMMPMMPMMPMMPPMPPMMPPMMP..........PMMPMMPPMMPMMPPMMPPMPPMPP......MMMMMMMMMMM
- MVLDVSAKVGFGLILLRSRAIFGEAEAPEPSAGDGAAATS
- HHHHHHHTHHHHTTTT........................
- MPPMPPMMPMMPMMPP........................
- MMMMMMMMMMMMMMMM........................
Software
- INFERNAL
- http://infernal.wustl.edu/
- RNACAD
- http://www.cse.ucsc.edu/mpbrown/rnacad/
- CONUS
- http://www.genetics.wustl.edu/eddy/people/robin/conus/
- PKNOTS
- ftp://ftp.genetics.wustl.edu/pub/eddy/software/pknots.tar.gz
- mtsag2c
- http://bioweb.pasteur.fr/docs/doc-gensoft/mtsag2c/
- RNAUI
- http://www.uga.edu/RNA-Informatics/software/rnaui0_2.tar