Title: Hidden Markov Model: An Introduction
1Hidden Markov Model: An Introduction
Mount p. 204 - 210
Spring 2008 Clark University
2Multiple sequence alignment to profile HMMs
Hidden Markov models (HMMs) are probabilistic models whose states
describe the probability of finding a particular amino acid residue
in each column of a multiple sequence alignment.
Just as a hammer is more refined than a blast, an HMM gives more
sensitive alignments than traditional techniques such as
progressive alignment.
3An HMM is constructed from an MSA. Example: five lipocalins
GTWYA (hs RBP)
GLWYA (mus RBP)
GRWYE (apoD)
GTWYE (E. coli)
GEWFS (MUP4)
4GTWYA GLWYA GRWYE GTWYE GEWFS
Prob.    1     2     3     4     5
p(G)    1.0    -     -     -     -
p(T)     -    0.4    -     -     -
p(L)     -    0.2    -     -     -
p(R)     -    0.2    -     -     -
p(E)     -    0.2    -     -    0.4
p(W)     -     -    1.0    -     -
p(Y)     -     -     -    0.8    -
p(F)     -     -     -    0.2    -
p(A)     -     -     -     -    0.4
p(S)     -     -     -     -    0.2
5GTWYA GLWYA GRWYE GTWYE GEWFS
P(GEWYE) = (1.0)(0.2)(1.0)(0.8)(0.4) = 0.064
log odds score = ln(1.0) + ln(0.2) + ln(1.0) + ln(0.8) + ln(0.4) = -2.75
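The calculation above can be reproduced in a few lines of Python. This is only a minimal sketch: the function names are made up for illustration, residue frequencies are taken straight from the five sequences, and no pseudocounts are applied, so any residue not seen in a column gets probability 0.

```python
import math
from collections import Counter

# The five lipocalin fragments from the alignment above.
msa = ["GTWYA", "GLWYA", "GRWYE", "GTWYE", "GEWFS"]

def column_probabilities(alignment):
    """Return one {residue: frequency} dict per alignment column."""
    n_seqs = len(alignment)
    return [
        {res: count / n_seqs for res, count in Counter(col).items()}
        for col in zip(*alignment)
    ]

def sequence_probability(seq, profile):
    """Multiply the per-column probabilities of the residues in seq."""
    p = 1.0
    for residue, column in zip(seq, profile):
        # Residues never seen in a column get probability 0 (no pseudocounts).
        p *= column.get(residue, 0.0)
    return p

profile = column_probabilities(msa)
p = sequence_probability("GEWYE", profile)
print(round(p, 3))            # 0.064
print(round(math.log(p), 2))  # -2.75
```

Summing natural logs instead of multiplying probabilities gives the same score and avoids numerical underflow for long sequences.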
6GTWYA GLWYA GRWYE GTWYE GEWFS
Position 1: G 1.0
Position 2: T 0.4, L 0.2, R 0.2, E 0.2
Position 3: W 1.0
Position 4: Y 0.8, F 0.2
Position 5: E 0.4, A 0.4, S 0.2
9Structure of a hidden Markov model (HMM)
Each position in the model has three kinds of state: a main (match) state, an insert state, and a delete state.
12
HBA_HUMAN   ...VGA--HAGEY
HBB_HUMAN   ...V----NVDEV
MYG_PHYCA   ...VEA--DVAGH
GLB3_CHITP  ...VKG------D
GLB5_PETMA  ...VYS--TYETS
LGB2_LUPLU  ...FNA--NIPKH
GLB1_GLYDI  ...IAGADNGAGV
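One way to see how main, insert and delete states relate to an alignment like this one is to label its columns. The sketch below assumes a common heuristic that the slides do not state: a column becomes a main state when at most half of the sequences have a gap there, and every other column is treated as an insertion. The function names and the 0.5 threshold are illustrative choices.

```python
# Aligned globin fragments from the slide above ('-' marks a gap).
alignment = {
    "HBA_HUMAN":  "VGA--HAGEY",
    "HBB_HUMAN":  "V----NVDEV",
    "MYG_PHYCA":  "VEA--DVAGH",
    "GLB3_CHITP": "VKG------D",
    "GLB5_PETMA": "VYS--TYETS",
    "LGB2_LUPLU": "FNA--NIPKH",
    "GLB1_GLYDI": "IAGADNGAGV",
}

def match_columns(seqs, max_gap_fraction=0.5):
    """Indices of columns where at most max_gap_fraction of sequences have a gap."""
    n = len(seqs)
    return [
        i for i, col in enumerate(zip(*seqs))
        if col.count("-") / n <= max_gap_fraction
    ]

def state_path(seq, match_cols):
    """Label one aligned sequence with M (main), I (insert) and D (delete)."""
    match_set = set(match_cols)
    path = []
    for i, ch in enumerate(seq):
        if i in match_set:
            path.append("M" if ch != "-" else "D")
        elif ch != "-":
            path.append("I")
        # A gap in an insert column uses no state at all.
    return "".join(path)

cols = match_columns(list(alignment.values()))
print(cols)  # [0, 1, 2, 5, 6, 7, 8, 9]
for name, seq in alignment.items():
    print(name, state_path(seq, cols))
```

Under this assumption the model has eight main states. HBB_HUMAN passes through two delete states (MDDMMMMM), and GLB1_GLYDI is the only sequence that uses an insert state (MMMIIMMMMM).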
13HMM algorithm
1. (Parameter Initialization) Initialize the HMM with a preliminary MSA (say, from CLUSTALW).
2. (Parameter Estimation) For each sequence, find the optimal (most likely) path among all possible paths through the model.
3. From these newly aligned sequences, generate a new HMM.
4. Repeat steps 2 and 3 until the parameters don't change significantly.
5. (Alignment) The trained model can provide the most likely path for each sequence.
6. (Search) This profile HMM can then be used to search for other similar sequences in a sequence database.
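Finding the most likely path through a model is commonly done with the Viterbi algorithm. The slides do not name the method, so treat the following as an illustrative sketch: it runs Viterbi in log space on a two-state toy HMM whose states, alphabet and probabilities are all invented for this example. It is not a full profile HMM, which would also need delete states and position-specific parameters.

```python
import math

def viterbi(obs, states, log_start, log_trans, log_emit):
    """Return (best_log_prob, best_path) for the observation sequence obs."""
    # best[s] = log-probability of the best path so far that ends in state s
    best = {s: log_start[s] + log_emit[s][obs[0]] for s in states}
    back = []  # one {state: best predecessor} dict per step after the first
    for symbol in obs[1:]:
        prev, best, pointers = best, {}, {}
        for s in states:
            r = max(states, key=lambda q: prev[q] + log_trans[q][s])
            best[s] = prev[r] + log_trans[r][s] + log_emit[s][symbol]
            pointers[s] = r
        back.append(pointers)
    # Trace back from the best final state to recover the path.
    last = max(states, key=best.get)
    path = [last]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return best[last], path[::-1]

def logs(table):
    """Convert a (possibly nested) dict of probabilities to log-probabilities."""
    return {k: logs(v) if isinstance(v, dict) else math.log(v)
            for k, v in table.items()}

# Hypothetical toy model: state "M" prefers G, state "I" prefers A.
states = ("M", "I")
start = logs({"M": 0.9, "I": 0.1})
trans = logs({"M": {"M": 0.8, "I": 0.2}, "I": {"M": 0.4, "I": 0.6}})
emit = logs({"M": {"G": 0.7, "A": 0.3}, "I": {"G": 0.2, "A": 0.8}})

log_p, path = viterbi("GGAA", states, start, trans, emit)
print("".join(path))  # MMII
```

In the training loop above, the paths found this way for every sequence would be used to re-estimate the transition and emission probabilities before the next iteration.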
15HMMER: biosequence analysis using profile hidden Markov models
16HMMER: build a hidden Markov model
Determining effective sequence number    ... done. [4]
Weighting sequences heuristically        ... done.
Constructing model architecture          ... done.
Converting counts to probabilities       ... done.
Setting model name, etc.                 ... done. [x]
Constructed a profile HMM (length 230)
Average score:     411.45 bits
Minimum score:     353.73 bits
Maximum score:     460.63 bits
Std. deviation:     52.58 bits
17HMMER: calibrate a hidden Markov model
HMM file:                 lipocalins.hmm
Length distribution mean: 325
Length distribution s.d.: 200
Number of samples:        5000
random seed:              1034351005
histogram(s) saved to:    [not saved]
POSIX threads:            2
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
HMM    : x
mu     : -123.894508
lambda :    0.179608
max    :  -79.334000
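Calibration fits an extreme value (Gumbel) distribution, with location mu and scale lambda, to the scores of random sequences. As a rough sketch of how such a fit can be used, the function below evaluates the Gumbel tail probability P(S >= x) = 1 - exp(-exp(-lambda (x - mu))). The function name is hypothetical. HMMER's reported E-values also depend on the database size and on details not shown in these slides, so this sketch is not expected to reproduce the numbers in the search output.

```python
import math

def evd_pvalue(score, mu, lam):
    """P(S >= score) under a Gumbel distribution with location mu and scale lam."""
    # Equivalent to 1 - exp(-exp(-lam * (score - mu))); expm1 keeps precision
    # when the tail probability is very small.
    return -math.expm1(-math.exp(-lam * (score - mu)))

# Parameters printed by the calibration run above.
mu, lam = -123.894508, 0.179608

for score in (-120.0, -100.0, -80.0):
    print(score, evd_pvalue(score, mu, lam))
```

The probability falls quickly as the score moves above mu, which is why a score far to the right of the random-sequence histogram is strong evidence of a real match.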
18HMMER: search an HMM against GenBank
Scores for complete sequences (score includes all domains):
Sequence                          Description       Score    E-value   N
--------                          -----------       -----    -------  ---
gi|20888903|ref|XP_129259.1|      (XM_129259) ret   461.1   1.9e-133   1
gi|132407|sp|P04916|RETB_RAT      Plasma retinol-   458.0   1.7e-132   1
gi|20548126|ref|XP_005907.5|      (XM_005907) sim   454.9   1.4e-131   1
gi|5803139|ref|NP_006735.1|       (NM_006744) ret   454.6   1.7e-131   1
gi|20141667|sp|P02753|RETB_HUMAN  Plasma retinol-   451.1   1.9e-130   1
. .
gi|16767588|ref|NP_463203.1|      (NC_003197) out   318.2    1.9e-90   1

gi|5803139|ref|NP_006735.1|: domain 1 of 1, from 1 to 195: score 454.6, E = 1.7e-131
                 *->mkwVMkLLLLaALagvfgaAErdAfsvgkCrvpsPPRGfrVkeNFDv
                    mkwV  LLLLaA      aAErd      Crv s    frVkeNFD
gi|5803139     1    MKWVWALLLLAA--W--AAAERD------CRVSS----FRVKENFDK 33

                 erylGtWYeIaKkDprFErGLllqdkItAeySleEhGsMsataeGrirVL
                  r  GtWY  aKkDp  E GL lqd I Ae S  E G Msata Gr r L
gi|5803139    34 ARFSGTWYAMAKKDP--E-GLFLQDNIVAEFSVDETGQMSATAKGRVRLL 80

                 eNkelcADkvGTvtqiEGeasevfLtadPaklklKyaGvaSflqpGfddy
                  N   cAD vGT t  E          dPak k Ky GvaSflq G dd
gi|5803139    81 NNWDVCADMVGTFTDTE----------DPAKFKMKYWGVASFLQKGNDDH 120
19HMMER: search an HMM against GenBank, match to a bacterial lipocalin
gi|16767588|ref|NP_463203.1|: domain 1 of 1, from 1 to 177: score 318.2, E = 1.9e-90
                 *->mkwVMkLLLLaALagvfgaAErdAfsvgkCrvpsPPRGfrVkeNFDv
                        M LL   A      a    Af v  C  p PP G  V  NFD
gi|1676758     1    ----MRLLPVVA------AVTA-AFLVVACSSPTPPKGVTVVNNFDA 36

                 erylGtWYeIaKkDprFErGLllqdkItAeySleEhGsMsataeGrirVL
                  rylGtWYeIa  D rFErGL      tA ySl           G i V
gi|1676758    37 KRYLGTWYEIARLDHRFERGL---EQVTATYSLRD--------DGGINVI 75

                 eNkelcADkvGTvtqiEGeasevfLtadPaklklKyaGvaSflqpGfddy
                  Nk    D        EG a     t  P    lK     Sf  p    y
gi|1676758    76 -NKGYNPDR-EMWQKTEGKA---YFTGSPNRAALKV----SFFGPFYGGY 116
20HMMER: search an HMM against GenBank
Scores for complete sequences (score includes all domains):
Sequence                          Description       Score    E-value   N
--------                          -----------       -----    -------  ---
gi|3041715|sp|P27485|RETB_PIG     Plasma retinol-   614.2   1.6e-179   1
gi|89271|pir||A39486              plasma retinol-   613.9   1.9e-179   1
gi|20888903|ref|XP_129259.1|      (XM_129259) ret   608.8   6.8e-178   1
gi|132407|sp|P04916|RETB_RAT      Plasma retinol-   608.0   1.1e-177   1
gi|20548126|ref|XP_005907.5|      (XM_005907) sim   607.3   1.9e-177   1
gi|20141667|sp|P02753|RETB_HUMAN  Plasma retinol-   605.3   7.2e-177   1
gi|5803139|ref|NP_006735.1|       (NM_006744) ret   600.2   2.6e-175   1

gi|5803139|ref|NP_006735.1|: domain 1 of 1, from 1 to 199: score 600.2, E = 2.6e-175
                 *->meWvWaLvLLaalGgasaERDCRvssFRvKEnFDKARFsGtWYAiAK
                    m WvWaL LLaa   a aERDCRvssFRvKEnFDKARFsGtWYA AK
gi|5803139     1    MKWVWALLLLAAW--AAAERDCRVSSFRVKENFDKARFSGTWYAMAK 45

                 KDPEGLFLqDnivAEFsvDEkGhmsAtAKGRvRLLnnWdvCADmvGtFtD
                 KDPEGLFLqDnivAEFsvDE G msAtAKGRvRLLnnWdvCADmvGtFtD
gi|5803139    46 KDPEGLFLQDNIVAEFSVDETGQMSATAKGRVRLLNNWDVCADMVGTFTD 95

                 tEDPAKFKmKYWGvAsFLqkGnDDHWiiDtDYdtfAvqYsCRLlnLDGtC
                 tEDPAKFKmKYWGvAsFLqkGnDDHWi DtDYdt AvqYsCRLlnLDGtC
gi|5803139    96 TEDPAKFKMKYWGVASFLQKGNDDHWIVDTDYDTYAVQYSCRLLNLDGTC 145