Title: Biological sequence analysis and information processing by artificial neural networks
1Biological sequence analysis and information processing by artificial neural networks
- Søren Brunak
- Center for Biological Sequence Analysis
- Technical University of Denmark
- brunak_at_cbs.dtu.dk
2Pairwise alignment
- gtcarp (Cyprinus carpio growth hormone, 210 aa) vs. gtchicken (Gallus gallus growth hormone, 216 aa)
- Scoring matrix: BLOSUM50, gap penalties: -12/-2
- 40.6% identity; global alignment score: 487

  carp     MA--RVLVLLSVVLVSLLVNQGRASDN-----QRLFNNAVIRVQHLHQLAAKMINDFEDSLLPEERRQLSKIFPLSFCNSD
  chicken  MAPGSWFSPLLIAVVTLGLPQEAAATFPAMPLSNLFANAVLRAQHLHLLAAETYKEFERTYIPEDQRYTNKNSQAAFCYSE

  carp     YIEAPAGKDETQKSSMLKLLRISFHLIESWEFPSQSLSGTVSNSLTVGNPNQLTEKLADLKMGISVLIQACLDGQPNMDDN
  chicken  TIPAPTGKDDAQQKSDMELLRFSLVLIQSWLTPVQYLSKVFTNNLVFGTSDRVFEKLKDLEEGIQALMRELEDRSPR---G

  ...
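The idea of a global alignment score can be sketched in a few lines. This is a minimal illustration only: it uses a simple match/mismatch score and a single linear gap cost (both hypothetical values) instead of the BLOSUM50 matrix and the -12/-2 gap penalties quoted on the slide, so it will not reproduce the score of 487. The function names are assumptions, and the identity measure below divides by the number of alignment columns, which is one of several conventions in use.

```python
def global_alignment_score(a, b, match=1, mismatch=-1, gap=-2):
    """Needleman-Wunsch score with a linear gap cost (illustrative scoring)."""
    # prev[j] is the best score for aligning the current prefix of a with b[:j]
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        cur = [i * gap]
        for j in range(1, len(b) + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            cur.append(max(prev[j - 1] + sub,   # align a[i-1] with b[j-1]
                           prev[j] + gap,       # gap in b
                           cur[j - 1] + gap))   # gap in a
        prev = cur
    return prev[-1]


def percent_identity(row_a, row_b):
    """Identical columns as a percentage of all columns of two aligned rows."""
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have the same length")
    same = sum(1 for x, y in zip(row_a, row_b) if x == y and x != "-")
    return 100.0 * same / len(row_a)
```

For example, percent_identity can be applied to the first few columns of the carp and chicken rows, where only the leading M and A are identical.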
11Biological neuron
13Diversity of interactions in a network enables complex calculations
- Similar in biological and artificial systems
- Excitatory (+) and inhibitory (-) relations between compute units
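Excitatory and inhibitory relations can be pictured as positive and negative weights on the inputs of a single compute unit. The sketch below is a plain threshold unit; the weights and the threshold are hypothetical values chosen for illustration, not taken from the slides.

```python
def unit_output(inputs, weights, threshold):
    """Fire (1) when the weighted input sum exceeds the threshold, else 0."""
    total = sum(x * w for x, w in zip(inputs, weights))
    return 1 if total > threshold else 0


# Two excitatory inputs (+1.0 each) and one inhibitory input (-2.0).
# These numbers are illustrative assumptions.
WEIGHTS = [1.0, 1.0, -2.0]
THRESHOLD = 1.5
```

With these values the unit fires only when both excitatory inputs are active and the inhibitory input is silent: unit_output([1, 1, 0], WEIGHTS, THRESHOLD) gives 1, while switching on the third input cancels the excitation.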
15Transfer of biological principles to neural network algorithms
- Non-linear relation between input and output
- Massively parallel information processing
- Data-driven construction of algorithms
- Ability to generalize to new data items
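A small worked example may help with the first point, the non-linear relation between input and output. The sketch below wires three sigmoid units into a two-layer network that computes XOR, a function that no single linear unit can represent. All weights are hand-set, hypothetical values; in practice they would be found from data rather than written down.

```python
import math


def sigmoid(z):
    """Smooth non-linear squashing function with output between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def xor_network(x1, x2):
    """Two hidden units feeding one output unit (hand-set weights)."""
    h_or = sigmoid(20 * x1 + 20 * x2 - 10)     # active if either input is on
    h_nand = sigmoid(-20 * x1 - 20 * x2 + 30)  # inactive only if both are on
    return sigmoid(20 * h_or + 20 * h_nand - 30)  # active if both hidden units are


# The two hidden units do not depend on each other, so they could be
# evaluated in parallel.
outputs = [round(xor_network(a, b)) for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]]
```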
19Simplest non-trivial classification problem
- CNHSYYP, HIETRRA, NWQSADY, NQYSEPR, WHITRCA, DYHSANY, ...
- Two categories: positives and negatives
- Data described by two features, e.g. charge, sidechain volume, molecular weight, number of atoms, ...
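Turning each peptide into two numbers could look like the sketch below, which picks charge and molecular weight from the features listed above. It rests on stated assumptions: net charge counts K and R as +1 and D and E as -1 (histidine and the termini are ignored), and the residue masses are approximate average values. The function names are hypothetical.

```python
# Approximate average residue masses in daltons (assumed reference values).
RESIDUE_MASS = {
    "G": 57.05, "A": 71.08, "S": 87.08, "P": 97.12, "V": 99.13,
    "T": 101.10, "C": 103.14, "L": 113.16, "I": 113.16, "N": 114.10,
    "D": 115.09, "Q": 128.13, "K": 128.17, "E": 129.12, "M": 131.19,
    "H": 137.14, "F": 147.18, "R": 156.19, "Y": 163.18, "W": 186.21,
}
WATER_MASS = 18.02  # one water is added for the free N- and C-termini


def net_charge(peptide):
    """Crude net charge: +1 for K/R, -1 for D/E, everything else 0."""
    return sum((r in "KR") - (r in "DE") for r in peptide)


def molecular_weight(peptide):
    """Sum of residue masses plus one water molecule."""
    return sum(RESIDUE_MASS[r] for r in peptide) + WATER_MASS


def features(peptide):
    """Describe a peptide by two features: (charge, molecular weight)."""
    return (net_charge(peptide), molecular_weight(peptide))


peptides = ["CNHSYYP", "HIETRRA", "NWQSADY", "NQYSEPR", "WHITRCA", "DYHSANY"]
points = [features(p) for p in peptides]
```

Each peptide becomes a point in a two-dimensional feature space, which is the setting in which positives and negatives are to be separated.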
20Features of phosphorylation sites
PKG: cGMP-dep. kinase
cdc2: Cyclin-dep. kinase 2
CK-II: Casein kinase 2
PKC
CaM-II: Ca/calmodulin-dep. kinase
24Homotypical cerebral cortex (from primate) - 6 layers
29DEMO
32Training and error reduction
negative
positive
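Training as error reduction can be sketched with a single sigmoid unit that is adjusted by gradient descent on the squared error between its output and the target (1 for positive, 0 for negative). Everything specific here is an assumption made for illustration: the four two-feature data points, the learning rate, the number of epochs, and the choice of squared error.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train(data, epochs=200, rate=0.5):
    """Fit one sigmoid unit; return its weights and the error per epoch."""
    w1 = w2 = bias = 0.0
    errors = []
    for _ in range(epochs):
        total = 0.0
        for (x1, x2), target in data:
            out = sigmoid(w1 * x1 + w2 * x2 + bias)
            err = out - target
            total += err * err
            # Gradient of the squared error through the sigmoid
            # (the constant factor 2 is absorbed into the rate).
            grad = err * out * (1.0 - out)
            w1 -= rate * grad * x1
            w2 -= rate * grad * x2
            bias -= rate * grad
        errors.append(total)
    return (w1, w2, bias), errors


# Hypothetical examples: (feature 1, feature 2) -> 1 positive, 0 negative.
DATA = [((1.0, 0.2), 1), ((0.9, 0.4), 1), ((0.1, 0.8), 0), ((0.2, 1.0), 0)]
(w1, w2, bias), errors = train(DATA)
predictions = [round(sigmoid(w1 * x1 + w2 * x2 + bias)) for (x1, x2), _ in DATA]
```

The list errors holds the summed squared error after each pass through the data, so plotting it gives an error curve over the course of training.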
33Transfer of biological principles to neural network algorithms
- Non-linear relation between input and output
- Massively parallel information processing
- Data-driven construction of algorithms
35Sparse encoding of amino acid sequence windows
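A sparse encoding of amino acid windows can be sketched as follows: every residue in a window becomes a block of 20 input values with a single 1 marking which amino acid it is. The alphabet order, the window size, and the function names are assumptions; an extra symbol for positions beyond the sequence ends is sometimes added but is left out here.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # 20 standard residues, arbitrary order


def encode_window(window):
    """Concatenate one 20-value block per residue, each with a single 1."""
    bits = []
    for residue in window:
        block = [0] * len(AMINO_ACIDS)
        # str.index raises ValueError for a letter outside the alphabet.
        block[AMINO_ACIDS.index(residue)] = 1
        bits.extend(block)
    return bits


def windows(sequence, size):
    """All overlapping windows of the given size, left to right."""
    return [sequence[i:i + size] for i in range(len(sequence) - size + 1)]


encoded = encode_window("CNHSYYP")  # 7 residues -> 7 * 20 = 140 inputs
```

A window of seven residues therefore gives 140 network inputs, of which exactly seven are set.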
36Sparse encoding of nucleotide sequence windows
- Nucleotides: 4-letter alphabet
- Normally no need for a fifth letter
- ACGTAGGCAATCTCAGACGTTTATC
- 1000 0100 0010 0001 1000 0010 0010 0100 1000 1000 0001 0100 0001
  0100 1000 0010 1000 0100 0010 0001 0001 0001 1000 0001 0100
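The bit string on this slide is consistent with the mapping A = 1000, C = 0100, G = 0010, T = 0001 applied letter by letter. That mapping is inferred from the example rather than stated in the transcript, and the function name below is hypothetical.

```python
# One 4-bit block per nucleotide, a single 1 per block (inferred mapping).
NUCLEOTIDE_CODE = {"A": "1000", "C": "0100", "G": "0010", "T": "0001"}


def encode_nucleotides(sequence):
    """Return the sparse (one-hot) bit string for a nucleotide sequence."""
    return "".join(NUCLEOTIDE_CODE[n] for n in sequence)


bits = encode_nucleotides("ACGTAGGCAATCTCAGACGTTTATC")  # 25 letters -> 100 bits
```

Since only four letters occur, each position needs four inputs and no fifth "unknown" unit.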