Bioinformatics (ΒΙΟΠΛΗΡΟΦΟΡΙΚΗ ?? ???a) - PowerPoint PPT Presentation

Slides: 84
Provided by: Pant83
Tags: automata | multi | tape

Transcript and Presenter's Notes

1
Bioinformatics: Dynamic Programming
  • Pantelis Bagos

2
Dynamic Programming
  • Alignment (local-global)
  • RNA secondary structure prediction
  • Transmembrane segments
  • Hidden Markov Models
  • Other applications

3
Alignment
  • Global
  • Local
  • Special cases

4
Dynamic programming
5
Two cases of alignment
Global: F(i,0) = -id, F(0,j) = -jd
Local: F(i,0) = 0, F(0,j) = 0
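The two boundary conditions above can be sketched in code. This is a minimal Python sketch, not the lecture's own implementation: the scoring scheme (match +1, mismatch -1, linear gap penalty d = 1) and the function name `align_score` are assumptions made for illustration.

```python
def align_score(x, y, d=1, match=1, mismatch=-1, local=False):
    """Optimal alignment score of x and y with a linear gap penalty d.

    local=False: global alignment, F(i,0) = -i*d and F(0,j) = -j*d.
    local=True:  local alignment,  F(i,0) = F(0,j) = 0, cells floored at 0.
    The match/mismatch scores are illustrative assumptions.
    """
    n, m = len(x), len(y)
    F = [[0] * (m + 1) for _ in range(n + 1)]
    if not local:
        for i in range(1, n + 1):
            F[i][0] = -i * d
        for j in range(1, m + 1):
            F[0][j] = -j * d
    best = 0  # best cell seen so far (only used for local alignment)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if x[i - 1] == y[j - 1] else mismatch
            cell = max(F[i - 1][j - 1] + s,  # align x[i] with y[j]
                       F[i - 1][j] - d,      # gap in y
                       F[i][j - 1] - d)      # gap in x
            if local:
                cell = max(cell, 0)          # a local alignment may restart
                best = max(best, cell)
            F[i][j] = cell
    return best if local else F[n][m]
```

With these assumed scores, the only differences between the two modes are the first row and column, the floor at zero, and where the final score is read from.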
6
Gap penalties
Simple gap penalty
Composite gap penalty
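The formulas for the two penalties did not survive in the transcript. The sketch below assumes they correspond to the standard linear and affine forms; the parameter names `d` (gap open) and `e` (gap extend) and their default values are illustrative assumptions.

```python
def linear_gap(g, d=1):
    """Simple (linear) penalty: every gap position costs d."""
    return -g * d


def affine_gap(g, d=5, e=1):
    """Composite (affine) penalty: d to open a gap, e for each extension."""
    if g == 0:
        return 0
    return -d - (g - 1) * e
```

Under the affine form one long gap is cheaper than several short ones of the same total length, which is usually the motivation for the composite penalty.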
7
Example
Let there be two sequences
If the gap penalty is
d = 1
then the best global alignment is
A A G T T A G C A G C A G T A T C G C A -
8
Global alignment
A A G T T A G C A G C A G T A T C G C A -
9
Local alignment
A G T T A G C A A G T A T C G C A
10
Other algorithms
  • There are also special cases of alignment
    (e.g. fitting)
  • That is, we want to detect whether a short
    sequence occurs within a longer one
  • Suppose we want to detect whether the sequence
    of the lacI gene of E. coli contains the known
    promoter sequence. Suppose also that the gene
    segment has the sequence
  • and the promoter sequence is

11
continued
F(i,0) = -id, F(0,j) = 0.
12
and the sequence of the putative promoter is
C A T G A T
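A minimal Python sketch of the fitting case: the short sequence must be aligned in full, while leading and trailing parts of the long sequence are free. The scoring scheme (match +1, mismatch -1, d = 1), the function name `fitting_score`, and the long test sequence in the usage note are assumptions; only the promoter CATGAT comes from the slide.

```python
def fitting_score(short, long_seq, d=1, match=1, mismatch=-1):
    """Best score for fitting all of `short` somewhere inside `long_seq`.

    Rows index the short sequence (F(i,0) = -i*d), columns index the long
    one (F(0,j) = 0). Returns (score, end position in long_seq).
    """
    n, m = len(short), len(long_seq)
    F = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        F[i][0] = -i * d              # skipping short-sequence residues costs d
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if short[i - 1] == long_seq[j - 1] else mismatch
            F[i][j] = max(F[i - 1][j - 1] + s,
                          F[i - 1][j] - d,
                          F[i][j - 1] - d)
    # Trailing residues of the long sequence are free: take the best cell
    # in the last row instead of the bottom-right corner.
    end = max(range(m + 1), key=lambda j: F[n][j])
    return F[n][end], end
```

For example, `fitting_score("CATGAT", "GGGCATGATCCC")` uses a made-up long sequence that contains the promoter exactly and returns `(6, 9)`.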
13
RNA secondary structure prediction
14
Nussinov
15
(No Transcript)
16
Transmembrane segments
17
(No Transcript)
18
(No Transcript)
19
(No Transcript)
20
(No Transcript)
21
The 3 basic questions for an HMM ...
  • Evaluation
  • Given the model, how do we compute the total
    probability of a sequence of symbols,
    P(x|θ)?
  • Decoding
  • How do we find the most probable sequence of
    states (path) that the model passed through
    in order to produce the given sequence of
    symbols?
  • Training
  • How do we adjust the parameters of the
    model so that the total likelihood of the
    sequences is maximized?
  • θ_ML = argmax_θ P(x|θ)

22
... and their answers
  • Evaluation
  • The FORWARD algorithm, a dynamic programming
    algorithm that computes the total
    probability of the sequence without having
    to go through all possible paths (state
    sequences).
  • Decoding
  • The VITERBI algorithm, a dynamic programming
    algorithm that uses recursion to compute
    the most probable sequence of states for
    the given sequence and the given model.
    (Alternatively: N-best.)
  • Training
  • The BAUM-WELCH algorithm (also known as
    FORWARD-BACKWARD), a special case of the
    EM (Expectation-Maximization) algorithm,
    which treats the data as data with missing
    values and computes maximum likelihood
    estimates (M.L.E.) of the model parameters.
    (Alternatively: Gradient Descent.)
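
The evaluation step can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: a two-state model with illustrative transition and emission values, a uniform start distribution, no end state, and no scaling (so it is only suitable for short sequences).

```python
STATES = ["1", "0"]
START = {"1": 0.5, "0": 0.5}  # assumption: uniform start distribution
TRANS = {"1": {"1": 0.90, "0": 0.10},
         "0": {"1": 0.10, "0": 0.90}}
EMIT = {"1": {"A": 0.70, "T": 0.10, "G": 0.10, "C": 0.10},
        "0": {"A": 0.25, "T": 0.25, "G": 0.25, "C": 0.25}}


def forward(x, states=STATES, start=START, trans=TRANS, emit=EMIT):
    """Total probability P(x | model) of a non-empty sequence x.

    f[l] holds the probability of the prefix read so far, ending in
    state l; each step sums over the previous state instead of
    enumerating whole paths.
    """
    f = {k: start[k] * emit[k][x[0]] for k in states}
    for sym in x[1:]:
        f = {l: emit[l][sym] * sum(f[k] * trans[k][l] for k in states)
             for l in states}
    return sum(f.values())
```

The cost grows linearly with the sequence length and quadratically with the number of states, whereas enumerating paths grows exponentially with the length.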

23
Forward algorithm
24
(No Transcript)
25
(No Transcript)
26
Viterbi algorithm
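The slide content itself is not in the transcript, so what follows is a minimal Python sketch of the standard recursion, computed in log space. The two-state model values, the uniform start distribution, and the absence of an end state are assumptions for illustration.

```python
import math

STATES = ["1", "0"]
START = {"1": 0.5, "0": 0.5}  # assumption: uniform start distribution
TRANS = {"1": {"1": 0.90, "0": 0.10},
         "0": {"1": 0.10, "0": 0.90}}
EMIT = {"1": {"A": 0.70, "T": 0.10, "G": 0.10, "C": 0.10},
        "0": {"A": 0.25, "T": 0.25, "G": 0.25, "C": 0.25}}


def _log(p):
    return math.log(p) if p > 0 else float("-inf")


def viterbi(x, states=STATES, start=START, trans=TRANS, emit=EMIT):
    """Return (most probable state path, its log joint probability)."""
    v = {k: _log(start[k]) + _log(emit[k][x[0]]) for k in states}
    pointers = []                      # one back-pointer table per step
    for sym in x[1:]:
        ptr, new_v = {}, {}
        for l in states:
            best = max(states, key=lambda k: v[k] + _log(trans[k][l]))
            ptr[l] = best
            new_v[l] = v[best] + _log(trans[best][l]) + _log(emit[l][sym])
        pointers.append(ptr)
        v = new_v
    last = max(states, key=lambda k: v[k])
    path = [last]
    for ptr in reversed(pointers):     # trace back from the last position
        path.append(ptr[path[-1]])
    return "".join(reversed(path)), v[last]
```

Working with logarithms turns the products into sums and avoids numerical underflow on long sequences.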
27
Decoding: forward
28
Posterior (a posteriori) decoding
Alternatively, we can compute the probability
P(π_i = k | x), that is, the posterior probability
that the given nucleotide was emitted by a
particular state,
using the Forward and Backward variables
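A minimal Python sketch of this computation, combining forward and backward values position by position. The two-state model values and the uniform start distribution are assumptions, and no scaling is applied, so the sketch is meant for short sequences only.

```python
STATES = ["1", "0"]
START = {"1": 0.5, "0": 0.5}  # assumption: uniform start distribution
TRANS = {"1": {"1": 0.90, "0": 0.10},
         "0": {"1": 0.10, "0": 0.90}}
EMIT = {"1": {"A": 0.70, "T": 0.10, "G": 0.10, "C": 0.10},
        "0": {"A": 0.25, "T": 0.25, "G": 0.25, "C": 0.25}}


def posterior(x, states=STATES, start=START, trans=TRANS, emit=EMIT):
    """Return a list with P(state at position i = k | x) for every i and k."""
    # Forward pass: f[i][k] = P(x[0..i], state_i = k)
    f = [{k: start[k] * emit[k][x[0]] for k in states}]
    for sym in x[1:]:
        f.append({l: emit[l][sym] * sum(f[-1][k] * trans[k][l] for k in states)
                  for l in states})
    # Backward pass: b[i][k] = P(x[i+1..] | state_i = k)
    b = [{k: 1.0 for k in states}]
    for sym in reversed(x[1:]):
        b.insert(0, {k: sum(trans[k][l] * emit[l][sym] * b[0][l] for l in states)
                     for k in states})
    px = sum(f[-1].values())
    return [{k: f[i][k] * b[i][k] / px for k in states} for i in range(len(x))]


def posterior_decode(x):
    """Pick the most probable state independently at every position."""
    return "".join(max(p, key=p.get) for p in posterior(x))
```

Because each position is decided independently, the resulting label string is not guaranteed to be a path the model allows.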
29
  • Advantages
  • In cases where the alternative paths differ
    only slightly in their predicted
    probabilities.
  • When a state has a very small probability
    and the path with the maximum probability
    never visits it.
  • Disadvantages
  • It may predict a path that is not valid
    for the model (one with a disallowed
    transition).

30
The algorithm in brief
  • Compute ? and ?
  • Compute the M.L.E.
  • Repeat until convergence
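
The three steps can be sketched as a Baum-Welch loop. This is a minimal Python sketch under several assumptions: the two quantities in the first step are taken to be the expected transition counts (A) and emission counts (E), training uses a single sequence, the start distribution is fixed and uniform, and no scaling or pseudocounts are used.

```python
STATES = ["1", "0"]
ALPHABET = "ATGC"
START = {"1": 0.5, "0": 0.5}  # assumption: fixed, uniform start distribution


def forward_backward(x, trans, emit):
    """Unscaled forward and backward tables plus P(x)."""
    f = [{k: START[k] * emit[k][x[0]] for k in STATES}]
    for sym in x[1:]:
        f.append({l: emit[l][sym] * sum(f[-1][k] * trans[k][l] for k in STATES)
                  for l in STATES})
    b = [{k: 1.0 for k in STATES}]
    for sym in reversed(x[1:]):
        b.insert(0, {k: sum(trans[k][l] * emit[l][sym] * b[0][l] for l in STATES)
                     for k in STATES})
    return f, b, sum(f[-1].values())


def baum_welch_step(x, trans, emit):
    """One EM iteration: expected counts, then re-estimated parameters."""
    f, b, px = forward_backward(x, trans, emit)
    A = {k: {l: 0.0 for l in STATES} for k in STATES}    # expected transitions
    E = {k: {s: 0.0 for s in ALPHABET} for k in STATES}  # expected emissions
    for i, sym in enumerate(x):
        for k in STATES:
            E[k][sym] += f[i][k] * b[i][k] / px
            if i + 1 < len(x):
                for l in STATES:
                    A[k][l] += (f[i][k] * trans[k][l]
                                * emit[l][x[i + 1]] * b[i + 1][l]) / px
    new_trans = {k: {l: A[k][l] / sum(A[k].values()) for l in STATES}
                 for k in STATES}
    new_emit = {k: {s: E[k][s] / sum(E[k].values()) for s in ALPHABET}
                for k in STATES}
    return new_trans, new_emit, px     # px is P(x) under the OLD parameters


def baum_welch(x, trans, emit, tol=1e-9, max_iter=200):
    """Repeat the step until the likelihood stops improving."""
    prev = 0.0
    for _ in range(max_iter):
        trans, emit, px = baum_welch_step(x, trans, emit)
        if px - prev < tol * px:
            break
        prev = px
    return trans, emit
```

Each iteration cannot decrease the likelihood, but the loop may stop at a local maximum, so the result depends on the starting parameters.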

31
An example...
32
continued...
Transition probabilities
        1     0
  1   0.90  0.10
  0   0.10  0.90
Emission probabilities
        A     T     G     C
  1   0.70  0.10  0.10  0.10
  0   0.25  0.25  0.25  0.25
33
continued...
Consider a DNA sequence generated by the above
model
AAACAAGAATGCGCACACTACGCAAAAACAATTAGTCGCACTCACGATGAAACAAATTACCACGGTGAA
111111111100000000000001111111111100000000000000111111110000000000001

AACGAATAAACCTCAGAGGCCCAGCGTATATAAACAAGATAAAAACCTAGTCAGCACTCTGACCAGACG
111111111100000000000000000000011111111111111100000000000000000000000

AGCTCACGACTTGAGGATAAGAAAAAAACAACAGCTCACGACTTGAGGATAAGAAAAAAACA
00000000000000001111111111111100000000000000000011111111111111
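Generating a sequence together with its state labels can be sketched as follows. This is a minimal Python sketch: the parameter values are read from the flattened tables of the example (state 1 is A-rich, state 0 is uniform), while the starting state and the function name `sample` are assumptions.

```python
import random

STATES = ["1", "0"]
TRANS = {"1": {"1": 0.90, "0": 0.10},
         "0": {"1": 0.10, "0": 0.90}}
EMIT = {"1": {"A": 0.70, "T": 0.10, "G": 0.10, "C": 0.10},
        "0": {"A": 0.25, "T": 0.25, "G": 0.25, "C": 0.25}}


def sample(n, rng, start_state="1"):
    """Draw n symbols and the hidden state that emitted each of them."""
    state, seq, labels = start_state, [], []   # start state is an assumption
    for _ in range(n):
        labels.append(state)
        symbols = list(EMIT[state])
        weights = [EMIT[state][s] for s in symbols]
        seq.append(rng.choices(symbols, weights=weights)[0])
        # Move to the next state according to the transition row.
        state = rng.choices(STATES,
                            weights=[TRANS[state][t] for t in STATES])[0]
    return "".join(seq), "".join(labels)
```

Calling `sample(69, random.Random(0))` returns a sequence and a label string of the same length, in the same layout as the pairs listed above.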
34
continued...
35
continued...
But what if the transition probabilities changed?
Transition probabilities
        1     0
  1   0.98  0.02
  0   0.03  0.97
Emission probabilities
        A     T     G     C
  1   0.60  0.10  0.10  0.10
  0   0.25  0.25  0.25  0.25
36
continued...
37
Posterior-Viterbi decoding
The allowed transitions are defined
38
Optimal Accuracy Posterior Decoding
A variant of Posterior-Viterbi, which
computes the path
Overall
39
(No Transcript)
40
(No Transcript)
41
Other applications
  • Fold recognition
  • Threading
  • Domain recognition

42
Fold recognition
43
Threading
  • Protein threading is the problem of aligning a
    protein sequence whose structure we want to
    elucidate (the target protein) with a protein
    sequence whose structure is known (the template
    protein) in such a way that mapping residues of
    the target onto a template according to the
    alignment affords an accurate model of the
    backbone structure of the target.

44
Domain recognition
45
(No Transcript)
46
Transformational Grammars
"Colourless green ideas sleep furiously" (Chomsky)
47
(No Transcript)
48
A transformational grammar consists of a number
of symbols and a number of rewriting rules
(productions) of the form α → β, where α and β
are both strings of symbols, e.g. C → cN, C → E.
There are two types of symbols:
- abstract nonterminal symbols
- terminal (observable) symbols
49
Production rules
  • Regular grammars: only productions of the form W
    → aW or W → a
  • Context-free grammars: productions of the form W
    → β. Left: just one non-terminal; right: any
    string
  • Context-sensitive grammars: productions of the
    form α1 W α2 → α1 β α2
  • Unrestricted grammars: any production of the form
    α1 W α2 → γ
  • W: any nonterminal,
  • a: any terminal,
  • α, γ: any string of nonterminals and/or terminals,
    including the null string
  • β: any string of nonterminals and/or terminals,
    not including the null string

50
(No Transcript)
51
Regular Expressions
[RK]-G-{EDRKHPCG}-[AGSCI]-[FY]-[LIVA]-x-[FYM]
52
Equivalence

S → rW1 | kW1
W1 → gW2
W2 → [afilmnqrstvwy]W3
W3 → [agsci]W4
W4 → fW5 | yW5
W5 → lW6 | iW6 | vW6 | aW6
W6 → [acdefghiklmnpqrstvwy]W7
W7 → f | y | m
[RK]-G-{EDRKHPCG}-[AGSCI]-[FY]-[LIVA]-x-[FYM]
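The same motif can be checked with an ordinary regular expression. This minimal Python sketch assumes the pattern is in PROSITE-style notation, which is an assumption because the transcript drops the brackets: one of [..], none of {..}, and x for any residue. The names `MOTIF` and `has_motif` are illustrative.

```python
import re

# PROSITE-style reading (assumed): [RK]-G-{EDRKHPCG}-[AGSCI]-[FY]-[LIVA]-x-[FYM]
# translated element by element into Python regex syntax.
MOTIF = re.compile(r"[RK]G[^EDRKHPCG][AGSCI][FY][LIVA].[FYM]")


def has_motif(protein):
    """True if the upper-case protein sequence contains the motif."""
    return MOTIF.search(protein) is not None
```

A regular expression and a regular grammar accept the same kind of language, so either form can be used to scan a sequence; for instance `has_motif("RGAAFLAF")` is true, while `has_motif("RGEAFLAF")` is false because E is excluded at the third position.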
53
Stochastic Grammars?
"The notion 'probability of a sentence' is an
entirely useless one, under any known
interpretation of this term."
Noam Chomsky (famed linguist)
"Every time I fire a linguist, the performance
of the recognizer improves."
Fred Jelinek (former head of IBM speech
recognition group)
54
HMMs and Regular grammars
55
Modeling (allowed) transitions explicitly:
B → L | F | E
L → L | F | E
F → L | F | E
In the notation of the grammars, these are the
nonterminal symbols.
Modeling emissions explicitly (no probabilities
here):
in state F: a | c | g | t
in state L: a | c | g | t
In the notation of the grammars, these are the
terminal symbols.
56
All together
  • Together: modelling each combination of state and
    transition explicitly
  • B → aL | cL | gL | tL | aF | cF | gF | tF | E
  • L → aL | cL | gL | tL | aF | cF | gF | tF | E
  • F → aL | cL | gL | tL | aF | cF | gF | tF | E
  • P(B → aL) = P(L|B) P(a|L)
  • P(L → aF) = P(F|L) P(a|F)
  • These are the so-called rewriting rules

57
  • That's all we need to define a stochastic regular
    grammar!
  • Finite alphabet of terminal symbols
  • (a,c,g,t)
  • Finite set of nonterminal symbols
  • (B,L,F,E)
  • A set of rewriting rules
  • (B → aF, L → cF, ...)
  • Probabilities
  • P(B → aL)
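
The construction can be sketched by generating the rewriting rules directly from an HMM. This is a minimal Python sketch: the transition and emission values are made up for illustration, and each rule W → aV is given the probability P(V|W) P(a|V), which assumes that reading of the rule probabilities.

```python
# Illustrative (made-up) parameters for begin state B, states L and F,
# and end state E.
TRANS = {
    "B": {"L": 0.5, "F": 0.5, "E": 0.0},
    "L": {"L": 0.8, "F": 0.1, "E": 0.1},
    "F": {"L": 0.1, "F": 0.8, "E": 0.1},
}
EMIT = {
    "L": {"a": 0.25, "c": 0.25, "g": 0.25, "t": 0.25},
    "F": {"a": 0.40, "c": 0.10, "g": 0.10, "t": 0.40},
}


def hmm_to_rules(trans, emit):
    """Return {(lhs, rhs): probability} for a stochastic regular grammar."""
    rules = {}
    for w, row in trans.items():
        for v, p_trans in row.items():
            if v in emit:
                # W -> aV: move to state V and emit terminal a there.
                for a, p_emit in emit[v].items():
                    rules[(w, a + v)] = p_trans * p_emit
            else:
                # W -> E: move to the silent end state.
                rules[(w, v)] = p_trans
    return rules
```

For every nonterminal the rule probabilities sum to 1, because each transition row and each emission row already does.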

58
HMM                  Stochastic regular grammar
Hidden states        Non-terminals
Transition matrix    Rewriting rules
Emission matrix      Terminals
Probabilities        Probabilities
59
Example: a possible regular grammar
N → aF | cF | gF | tF | aL | cL | gL | tL | E
    0.1  0.1  0.3  ...
B → aF | cF | gF | tF | aL | cL | gL | tL | E
    0.2  0.1  0.2  ...
C → aF | cF | gF | tF | aL | cL | gL | tL | E
    0.1  0.3  0.2  ...
An example derivation from the above grammar is
B ⇒ aF ⇒ aaL ⇒ aacL ⇒ aactF ⇒ aactE
Finite State Automata: Mealy, Moore
60
Limitations of Regular Grammars
  • Regular language
  • a b a a a b
  • Palindrome language
  • a a b b a a
  • Copy language
  • a a b a a b

61
Palindrome Languages
  • ΝΙΨΟΝ ΑΝΟΜΗΜΑΤΑ ΜΗ ΜΟΝΑΝ ΟΨΙΝ.
  • Doc, note. I dissent. A fast never prevents a
    fatness. I diet on cod.
  • RNA secondary structure
  • aggccuaaauagaucuag...
  • ((()))...(((())))....

62
(No Transcript)
63
Context-free grammars
  • In a context-free grammar, the left-hand side
    must be exactly one non-terminal, while the
    right-hand side may be any combination of
    terminals and non-terminals
  • S → aSa | bSb | aa | bb
  • S ⇒ aSa ⇒ aaSaa ⇒ aabSbaa ⇒ aabaabaa
  • Parsing is done with push-down automata
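
Membership in the language of this grammar can be tested with a short recursive check. This is a minimal Python sketch that peels one outer pair per production. It stands in for the push-down automaton and is not an implementation of one, and the function name `derives` is an assumption.

```python
def derives(s):
    """True if s is generated by S -> aSa | bSb | aa | bb."""
    # Every production puts the same terminal at both ends.
    if len(s) < 2 or s[0] != s[-1] or s[0] not in "ab":
        return False
    if len(s) == 2:          # S -> aa | bb
        return True
    return derives(s[1:-1])  # S -> aSa | bSb
```

The derivation shown above ends in `aabaabaa`, which this check accepts; odd-length strings such as `aba` are rejected because the grammar only produces even-length palindromes.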

64
Context-free grammars for RNA
65
(No Transcript)
66
Chomsky Normal form
  • W1 → W2 W3 or W1 → a
  • Every grammar can be put into this form
  • Particularly useful for the algorithms
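
The reason the normal form helps can be seen in a CYK-style recognizer, which only has to consider binary splits. This is a minimal Python sketch: the example grammar is a hand-made normal-form version of the palindrome grammar S → aSa | bSb | aa | bb, and the helper nonterminals A, B, X, Y are assumptions introduced for the conversion.

```python
# Normal-form version of S -> aSa | bSb | aa | bb (helper symbols assumed):
#   S -> A X | B Y | A A | B B,  X -> S A,  Y -> S B,  A -> a,  B -> b
UNARY = [("A", "a"), ("B", "b")]
BINARY = [("S", ("A", "X")), ("S", ("B", "Y")),
          ("S", ("A", "A")), ("S", ("B", "B")),
          ("X", ("S", "A")), ("Y", ("S", "B"))]


def cyk(s, unary=UNARY, binary=BINARY, start="S"):
    """True if the grammar derives s (non-stochastic recognizer)."""
    n = len(s)
    if n == 0:
        return False
    # table[i][j] = set of nonterminals that derive s[i..j] (inclusive)
    table = [[set() for _ in range(n)] for _ in range(n)]
    for i, ch in enumerate(s):
        table[i][i] = {w for w, a in unary if a == ch}
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span - 1
            for k in range(i, j):              # split into s[i..k], s[k+1..j]
                for w, (y, z) in binary:
                    if y in table[i][k] and z in table[k + 1][j]:
                        table[i][j].add(w)
    return start in table[0][n - 1]
```

The three nested loops over i, j and k are the source of the cubic dependence on sequence length.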

67
Stochastic Context-free grammars (SCFGs)
  • Each rule is assigned a probability
  • Main advantage: the obvious extension and
    refinement of the results (as, for example,
    when going from a regular expression to an
    HMM)
  • Example: we can allow (with different, small
    probabilities) the mismatched pairings G-U,
    C-A

68
The basic questions for an SCFG
  • How do we obtain the best alignment of a
    sequence to a grammar (alignment-parsing
    problem)
  • Computing the probability of a sequence
    given a grammar (scoring problem)
  • Finding the best parameters of a grammar
    when known examples are available (training
    problem)

69
Their answers
  1. Cocke-Younger-Kasami (CYK) algorithm: the
    counterpart of Viterbi for HMMs
  2. Inside (outside) algorithm: the counterpart of
    Forward (Backward)
  3. Inside-Outside algorithm: the counterpart of
    Baum-Welch (Forward-Backward)
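
The scoring and parsing answers differ only in how alternatives are combined. This minimal Python sketch assumes a grammar in Chomsky normal form: it sums over all parses (Inside) or keeps the best one (CYK). The toy grammar S → S S (0.3) | a (0.7) and the function name `inside` are assumptions for illustration.

```python
# Toy stochastic grammar in normal form (made-up probabilities).
UNARY = [("S", "a", 0.7)]
BINARY = [("S", ("S", "S"), 0.3)]


def inside(s, unary=UNARY, binary=BINARY, start="S", use_max=False):
    """P(s | grammar) summed over parses, or the best parse if use_max."""
    n = len(s)
    if n == 0:
        return 0.0
    nts = {w for w, _, _ in unary}
    for w, (y, z), _ in binary:
        nts |= {w, y, z}
    # alpha[i][j][W] = probability that W derives s[i..j] (inclusive)
    alpha = [[{w: 0.0 for w in nts} for _ in range(n)] for _ in range(n)]
    for i, ch in enumerate(s):
        for w, a, p in unary:
            if a == ch:
                alpha[i][i][w] = p
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span - 1
            for k in range(i, j):
                for w, (y, z), p in binary:
                    term = p * alpha[i][k][y] * alpha[k + 1][j][z]
                    if use_max:
                        alpha[i][j][w] = max(alpha[i][j][w], term)
                    else:
                        alpha[i][j][w] += term
    return alpha[0][n - 1][start]
```

For the string `aaa` the toy grammar has two parse trees of probability 0.3 * 0.3 * 0.7^3 each, so the summed value is twice the value of the best parse.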

70
Correspondences
Goal                 HMM           SCFG
Optimal alignment    Viterbi       CYK
P(x|θ)               Forward       Inside
EM algorithm         Baum-Welch    Inside-Outside
Memory complexity    O(LM)         O(L^2 M)
Time complexity      O(L M^2)      O(L^3 M^3)
71
Other approaches
  • Nussinov algorithm
  • Maximizes the total number of base pairs
  • Zuker algorithm
  • Minimizes a free-energy function (ΔG), which
    performs better
  • Both algorithms can be written in an
    equivalent SCFG form
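
The base-pair maximization idea can be sketched as a small dynamic program. This is a minimal Python sketch of a Nussinov-style recursion: it counts only the canonical pairs a-u and g-c, the `min_loop` parameter is an added assumption (0 means adjacent bases may pair), and no traceback of the structure is performed.

```python
PAIRS = {("a", "u"), ("u", "a"), ("g", "c"), ("c", "g")}


def nussinov(seq, min_loop=0):
    """Maximum number of nested base pairs in a lower-case RNA sequence."""
    n = len(seq)
    if n == 0:
        return 0
    # N[i][j] = best number of pairs within seq[i..j] (inclusive)
    N = [[0] * n for _ in range(n)]
    for span in range(1, n):
        for i in range(n - span):
            j = i + span
            best = max(N[i + 1][j], N[i][j - 1])      # i or j left unpaired
            if (seq[i], seq[j]) in PAIRS and j - i > min_loop:
                inner = N[i + 1][j - 1] if i + 1 <= j - 1 else 0
                best = max(best, inner + 1)           # i pairs with j
            for k in range(i + 1, j):                 # bifurcation
                best = max(best, N[i][k] + N[k + 1][j])
            N[i][j] = best
    return N[0][n - 1]
```

For example, `nussinov("gggaaaccc")` is 3, corresponding to a hairpin in which the three g bases pair with the three c bases.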

72
Special cases
73
Pseudoknot cases
Special modifications are required to
incorporate them into an SCFG
74
Extensions
  • HMM → profile HMM
  • SCFG → Covariance Model (CM)
  • Eddy and Durbin, 1994

75
What about proteins?
76
Variants
  • Ranked Node Rewriting Grammar (RNRG)
  • Multi-Tape S-Attributed Grammars (MTSAG)

77
Ranked Node Rewriting Grammar (RNRG)
78
Ranked Node Rewriting Grammar (RNRG)
79
Multi-Tape S-Attributed Grammars (MTSAG)
80
(No Transcript)
81
(No Transcript)
82
Results
  • Prediction of Bacteriorhodopsin (1AP9)
  • QAQITGRPEWIWLALGTALMGLGTLYFLVKGMGVSDPDAKKFYAITTLVP
    AIAFTMYLSMLLGYGLTMVPFGGEQNPIYWARYADWLFTTPLLLLDLALL
    VDAD
  • .......TTHHHHHHHHHHHTTHHHHHHHHSS..S.HHHHHHHHHHHHTH
    HHHHHHHHHHHTT.....SSS.SSS....STTHHHHTTTHHHHTTTTSTT
    TT..
  • .........MMMMMMMMMMMMMMMMMMMMMMMMMM......PMMPMMPPM
    MPPMMPPMMPMMPMMPMMP........PPMPPMPPMPPMPPMMPPMPPMP
    P...
  • .........PMMPMMPMMPMMPMMPMMPPMMPMMP......PMMPMMPPM
    MPPMMPPMMPPMMPPMMPP........PPMPPMPPMPPMPPMPPMMPMMP
    P...
  • QGTILALVGADGIMIGTGLVGALTKVYSYRFVWWAISTAAMLYILYVLFF
    GFTSKAESMRPEVASTFKVLRNVTVVLWSAYPVVWLIGSEGAGIVPLNIE
    TLLF
  • HHHHHHHHHHHHHHHHHHHHHHS..SSS.HHHHHHHHHHHHHHHHHHHTT
    TTTTT..TT.SHHHHTTHHHHHHHHHHHHHHHHHHTTTTSSSSSS.SHHH
    HHHH
  • PPMPPMPPMPPMPPMMPMMPMMP.....PMMPMMPMMPMMPMMPPMMPPM
    PP..........PPMMPMMPMMPMMPMMPPMMPPMMP......PPMMPPM
    MPPM
  • PMMPPMPPMPPMMPMMPMMPMMP.....PMMPMMPMMPMMPPMPPMMPPM
    MP..........PMMPMMPPMMPMMPPMMPPMPPMPP......MMMMMMM
    MMMM
  • MVLDVSAKVGFGLILLRSRAIFGEAEAPEPSAGDGAAATS
  • HHHHHHHTHHHHTTTT........................
  • MPPMPPMMPMMPMMPP........................
  • MMMMMMMMMMMMMMMM........................

83
Software
  • INFERNAL
  • http://infernal.wustl.edu/
  • RNACAD
  • http://www.cse.ucsc.edu/mpbrown/rnacad/
  • CONUS
  • http://www.genetics.wustl.edu/eddy/people/robin/conus/
  • PKNOTS
  • ftp://ftp.genetics.wustl.edu/pub/eddy/software/pknots.tar.gz
  • mtsag2c
  • http://bioweb.pasteur.fr/docs/doc-gensoft/mtsag2c/
  • RNAUI
  • http://www.uga.edu/RNA-Informatics/software/rnaui0_2.tar