Lab7 - PowerPoint PPT Presentation

Slides: 18

Provided by: Kwangm1

Learn more at: http://darwin.informatics.indiana.edu

1
Lab7

QRNA, HMMER, PFAM

2
Sean Eddy's Lab

http://selab.janelia.org/software.html

3
HMMER
4
Introduction

HMMER2 is an implementation of profile hidden
Markov models for UNIX platforms (Linux, MacOS).
Its source code, executables, and user guide
can be downloaded from http://hmmer.janelia.org/
A typical HMMER experiment looks for known
domains in a query sequence by searching that
single sequence against a library of HMMs.
One such library is PFAM; you can also create
your own library using HMMER.

5
HMMER executables

hmmalign - Align sequences to an existing model.
hmmbuild - Build a model from a multiple sequence
alignment.
hmmcalibrate - Takes an HMM and empirically
determines parameters that are used to make
searches more sensitive, by calculating more
accurate expectation value scores (E-values).
hmmconvert - Convert a model file into different
formats, including a compact HMMER 2 binary
format, and best effort emulation of GCG
profiles.
hmmemit - Emit sequences probabilistically from a
profile HMM.
hmmfetch - Get a single model from an HMM
database.
hmmindex - Index an HMM database.
hmmpfam - Search an HMM database for matches to a
query sequence.
hmmsearch - Search a sequence database for
matches to an HMM.

6
Installation

Simple installation
Download the current version of HMMER,
hmmer-2.3.2.bin.intel-linux.tar.gz, from
http://hmmer.janelia.org/download
Unpack the software by typing tar xvf
hmmer-2.3.2.bin.intel-linux.tar.gz on the
command line. You will see a new directory,
hmmer-2.3.2.bin.intel-linux.
Enter the directory hmmer-2.3.2.bin.intel-linux.
You will see nine executables ready in the
subdirectory /binaries, and also nine files in
the subdirectory /tutorial.
Installation from source code
Download the current HMMER source code,
hmmer-2.3.2.tar.gz, from
http://hmmer.janelia.org/download
Create a new directory in your Linux account and
upload or move the software package to that
directory.
Unpack the software by typing tar xvf
hmmer-2.3.2.tar.gz on the command line.
Type cd hmmer-2.3.2 to enter the software
directory.
Type ./configure to configure the build for your
system.
Type make to build the executables.
Type make check to run the automated test
suite. (This is optional but recommended; all
the tests should pass.)
Type make install to install all executables.
By default, programs are installed in
/usr/local/bin/ and man pages in
/usr/local/man/man1.

7
hmmbuild: build a profile HMM from an alignment

hmmbuild [options] hmmfile alignfile
hmmbuild test.hmm test.aln
hmmbuild -h
hmmbuild reads a multiple sequence alignment file
alignfile, builds a new profile HMM, and saves
the HMM in hmmfile.
alignfile may be in ClustalW, GCG MSF, or SELEX
alignment format.
By default, the model is configured to find one
or more non-overlapping alignments to the
complete model.
To configure the model for a single global
alignment, use the -g option
To configure the model for multiple local
alignments, use the -f option
To configure the model for a single local
alignment (standard Smith/Waterman), use the -s
option.
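The mode options above can be summarized in a small Python sketch that assembles the hmmbuild command line. The mode labels ("default", "global", "multi_local", "single_local") and the function name are made up for this illustration; only the -g, -f and -s flags come from the slide.

```python
import shlex

# Alignment-mode flags as listed on this slide; None means no extra flag.
# The dictionary keys are hypothetical labels, not HMMER terminology.
HMMBUILD_MODE_FLAGS = {
    "default": None,       # one or more non-overlapping alignments to the complete model
    "global": "-g",        # single global alignment
    "multi_local": "-f",   # multiple local alignments
    "single_local": "-s",  # single local alignment (Smith/Waterman)
}


def hmmbuild_command(hmmfile, alignfile, mode="default"):
    """Return the hmmbuild argument list for the chosen alignment mode."""
    if mode not in HMMBUILD_MODE_FLAGS:
        raise ValueError(f"unknown mode: {mode!r}")
    args = ["hmmbuild"]
    flag = HMMBUILD_MODE_FLAGS[mode]
    if flag is not None:
        args.append(flag)
    args += [hmmfile, alignfile]
    return args


if __name__ == "__main__":
    # Print one command line per mode; nothing is executed.
    for mode in HMMBUILD_MODE_FLAGS:
        print(shlex.join(hmmbuild_command("test.hmm", "test.aln", mode)))
```

The sketch only prints the commands; pass the returned list to subprocess.run if HMMER is installed and you want to run them.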

8
hmmcalibrate: calibrate HMM search statistics

hmmcalibrate [options] hmmfile
hmmcalibrate test.hmm
hmmcalibrate -h
hmmcalibrate reads an HMM file from hmmfile,
scores a large number of synthesized random
sequences with it, fits an extreme value
distribution (EVD) to the histogram of those
scores, and re-saves hmmfile now including the
EVD parameters.
This step is optional, but it will increase the
sensitivity of your database search
hmmcalibrate may take several minutes (or longer)
to run. While it is running, a temporary file
called hmmfile.xxx is generated in your working
directory.
If you abort hmmcalibrate prematurely (ctrl-C,
for instance), your original hmmfile will be
untouched, and you should delete the hmmfile.xxx
temporary file.
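To make the EVD idea concrete, here is a minimal Python sketch that fits an extreme value (Gumbel) distribution to a set of scores and turns a score into a P-value and an E-value. It is an illustration only. The scores are simulated from a known Gumbel rather than produced by an HMM, the fit uses the method of moments (the slide does not say how hmmcalibrate fits, and it may well differ), and all function names and the parameter values mu = -20, lambda = 0.7 are invented for the example.

```python
import math
import random
import statistics

EULER_GAMMA = 0.5772156649015329


def sample_evd(mu, lam, n, rng):
    """Draw n scores from a Gumbel distribution with location mu and scale 1/lam."""
    scores = []
    while len(scores) < n:
        u = rng.random()
        if 0.0 < u < 1.0:  # avoid log(0)
            scores.append(mu - math.log(-math.log(u)) / lam)
    return scores


def fit_evd_moments(scores):
    """Method-of-moments estimate of (mu, lambda); an assumption, not HMMER's procedure."""
    mean = statistics.fmean(scores)
    sd = statistics.stdev(scores)
    lam = math.pi / (sd * math.sqrt(6.0))
    mu = mean - EULER_GAMMA / lam
    return mu, lam


def evd_pvalue(score, mu, lam):
    """P(S >= score) = 1 - exp(-exp(-lam * (score - mu)))."""
    return -math.expm1(-math.exp(-lam * (score - mu)))


def evalue(score, mu, lam, n_sequences):
    """Expected number of random sequences scoring at least this high."""
    return n_sequences * evd_pvalue(score, mu, lam)


if __name__ == "__main__":
    rng = random.Random(0)
    random_scores = sample_evd(mu=-20.0, lam=0.7, n=5000, rng=rng)
    mu_hat, lam_hat = fit_evd_moments(random_scores)
    print("fitted mu, lambda:", mu_hat, lam_hat)
    print("E-value of score 10 in a 100000-sequence database:",
          evalue(10.0, mu_hat, lam_hat, 100000))
```

The point of calibration is the last step: once mu and lambda are stored with the model, any raw score can be converted into an E-value that depends on the database size.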

9
hmmsearch: search a sequence database with a
profile HMM

hmmsearch [options] hmmfile seqfile
hmmsearch test.hmm query.faa > query.faa.domain
hmmsearch -h
hmmsearch reads an HMM from hmmfile and searches
seqfile for significantly similar sequence
matches.
hmmsearch may take minutes or even hours to run,
depending on the size of the sequence database.
It is a good idea to redirect the output to a
file.
The output consists of four sections:
a ranked list of the best scoring sequences,
a ranked list of the best scoring domains,
alignments for all the best scoring domains, and
a histogram of the scores.
A sequence score may be higher than a domain
score for the same sequence if there is more than
one domain in the sequence, because the sequence
score takes into account all the domains. All
sequences passing the -E and -T cutoffs are shown
in the first list; every domain found in those
sequences is then shown in the second list of
domain hits.
If desired, E-value and bit score thresholds may
also be applied to the domain list using the
--domE and --domT options.
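The two-stage filtering described above can be sketched in Python. This is not HMMER code: the Hit class, the function names and the example hits are hypothetical, and the default cutoff values in the sketch are placeholders rather than documented HMMER defaults.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    name: str      # name of the sequence the hit belongs to
    score: float   # bit score
    evalue: float  # E-value


def passes(hit, max_evalue, min_score):
    """A hit passes if its E-value is low enough and its bit score high enough."""
    return hit.evalue <= max_evalue and hit.score >= min_score


def filter_hits(seq_hits, dom_hits,
                seq_e=10.0, seq_t=float("-inf"),
                dom_e=float("inf"), dom_t=float("-inf")):
    """Keep sequences passing the sequence cutoffs, then the domains found in
    those sequences that also pass the domain cutoffs."""
    kept_seqs = [h for h in seq_hits if passes(h, seq_e, seq_t)]
    kept_names = {h.name for h in kept_seqs}
    kept_doms = [d for d in dom_hits
                 if d.name in kept_names and passes(d, dom_e, dom_t)]
    return kept_seqs, kept_doms


if __name__ == "__main__":
    # Made-up numbers: seqA has two domains, so its sequence score is
    # higher than either domain score.
    seq_hits = [Hit("seqA", 120.0, 1e-30), Hit("seqB", 5.0, 3.0), Hit("seqC", -10.0, 50.0)]
    dom_hits = [Hit("seqA", 70.0, 1e-18), Hit("seqA", 50.0, 1e-12), Hit("seqB", 5.0, 3.0)]
    seqs, doms = filter_hits(seq_hits, dom_hits, seq_e=1.0, dom_t=60.0)
    print([h.name for h in seqs], [(d.name, d.score) for d in doms])
```

Tightening only the domain cutoffs shortens the second list without changing which sequences appear in the first.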

10
PFAM
11
Pfam 23.0 (July 2008, 10340 families)

The Pfam database is a large collection of
protein families, each represented by multiple
sequence alignments and hidden Markov models
(HMMs).
Proteins are generally composed of one or more
functional regions, commonly termed domains.
Different combinations of domains give rise to
the diverse range of proteins found in nature.
The identification of domains that occur within
proteins can therefore provide insights into
their function.
There are two components to Pfam: Pfam-A and
Pfam-B.
Pfam-A entries are high quality, manually curated
families.
Although these Pfam-A entries cover a large
proportion of the sequences in the underlying
sequence database, in order to give a more
comprehensive coverage of known proteins we also
generate a supplement using the ADDA database.
These automatically generated entries are called
Pfam-B.
Although of lower quality, Pfam-B families can be
useful for identifying functionally conserved
regions when no Pfam-A entries are found.
Pfam also generates higher-level groupings of
related families, known as clans. A clan is a
collection of Pfam-A entries which are related by
similarity of sequence, structure or profile-HMM.
(see Pfam-C)

12
Sequence analysis with HMM

Go to
ftp://ftp.sanger.ac.uk/pub/databases/Pfam/releases/Pfam23.0/
to download the files Pfam_fs.gz and Pfam_ls.gz
Pfam_ls - All global (ls mode) Pfam-A HMMs in an
HMM library searchable with the hmmpfam program.
Pfam_fs - All local (fs mode) Pfam-A HMMs in an
HMM library searchable with the hmmpfam program.
Data location
/home/kwchoi/public_html/I529-09-lab/Lab7/Data/PFAM_data/
Copy the files to your working directory or make
a symbolic link.
To search test.faa for domains against the Pfam
HMM library (here the local-mode library
Pfam_fs), type
hmmpfam Pfam_fs test.faa > test.faa.pfam
The results are written to the output file
test.faa.pfam
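The link-then-search steps can also be scripted. The Python sketch below is one possible way to do it, not part of the lab material: link_library and run_hmmpfam are hypothetical helper names, and the search runs only if an hmmpfam executable is found on the PATH.

```python
import os
import shutil
import subprocess
import tempfile


def link_library(src, workdir):
    """Create a symbolic link to an HMM library inside workdir and return its path."""
    dest = os.path.join(workdir, os.path.basename(src))
    if not os.path.lexists(dest):
        os.symlink(os.path.abspath(src), dest)
    return dest


def run_hmmpfam(library, query, output):
    """Run 'hmmpfam library query > output'; return False if hmmpfam is not installed."""
    exe = shutil.which("hmmpfam")
    if exe is None:
        return False
    with open(output, "w") as out:
        subprocess.run([exe, library, query], stdout=out, check=True)
    return True


if __name__ == "__main__":
    # Demonstration with an empty stand-in file instead of the real Pfam_fs.
    data_dir = tempfile.mkdtemp()
    work_dir = tempfile.mkdtemp()
    stand_in = os.path.join(data_dir, "Pfam_fs")
    open(stand_in, "w").close()
    print("linked:", link_library(stand_in, work_dir))
```

In the lab setting, src would be the Pfam_fs file in the data location given above and workdir your own working directory.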

13
QRNA
14
QRNA

QRNA is a prototype structural noncoding RNA
genefinder for detecting novel structural
RNA genes.
It uses three probabilistic "pair-grammars":
a pair stochastic context free grammar modeling
alignments constrained by structural RNA
evolution,
a pair hidden Markov model modeling alignments
constrained by coding sequence evolution, and
a pair hidden Markov model modeling a null
hypothesis of position-independent evolution.
Given an input pairwise sequence alignment (e.g.
from a BLASTN comparison of two related genomes),
it classifies the alignment into the coding, RNA,
or null class according to the posterior
probability of each class.
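The classification step is an application of Bayes' rule over the three models. The Python sketch below shows the arithmetic under simple assumptions: each model supplies a log-likelihood for the alignment, priors are uniform unless given, and the example log-likelihood values are invented. QRNA's actual scoring is more involved than this.

```python
import math


def classify(log_likelihoods, priors=None):
    """Return (best_class, posteriors) from per-model natural-log likelihoods."""
    names = list(log_likelihoods)
    if priors is None:
        priors = {name: 1.0 / len(names) for name in names}
    # Work in log space so very negative log-likelihoods do not underflow.
    log_joint = {n: log_likelihoods[n] + math.log(priors[n]) for n in names}
    top = max(log_joint.values())
    log_norm = top + math.log(sum(math.exp(v - top) for v in log_joint.values()))
    posteriors = {n: math.exp(log_joint[n] - log_norm) for n in names}
    best = max(posteriors, key=posteriors.get)
    return best, posteriors


if __name__ == "__main__":
    # Hypothetical log-likelihoods of one alignment under the three models.
    best, post = classify({"RNA": -700.0, "coding": -705.0, "null": -710.0})
    print(best, post)
```

With equal priors, a model whose log-likelihood is 5 nats higher than the next best already receives a posterior above 0.99.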

15
Local Installation

The latest version of QRNA (qrna-2.0.3c.tar.gz)
can be downloaded from
ftp://selab.janelia.org/pub/software/qrna/
Configure QRNA and install
tar -xzvf qrna-2.0.3c.tar.gz
cd qrna-2.0.3c
cd squid
make
cd ../squid02
make
cd ../src
make
QRNA is installed in
/home/kwchoi/Installed/qrna-2.0.3c/

Data location
/home/kwchoi/public_html/I529-09-lab/Lab7/Data/QRNA_data/
blastn2qrnadepth.pl
/home/kwchoi/Installed/qrna-2.0.3c/src/scripts/blastn2qrnadepth.pl -g human HG_13_RNAs_gene.fa.MGSCv3.fragchrom.blast
HG_13_RNAs_gene.fa.MGSCv3.fragchrom.blast.E0.01.D1.q
HG_13_RNAs_gene.fa.MGSCv3.fragchrom.blast.E0.01.D1.q.gff
HG_13_RNAs_gene.fa.MGSCv3.fragchrom.blast.E0.01.D1.q.rep
Simple test
Set the running environment and run a simple
example
export QRNADB=/home/kwchoi/Installed/qrna-2.0.3c/lib
/home/kwchoi/Installed/qrna-2.0.3c/src/eqrna -a 5s_rRNA.q > 5s_rRNA.q.eqrna

17
QRNA demo

Option -C shuffles the columns of the pairwise
alignment while maintaining the gap and conserved
structure of the original alignment. Compare the
results with and without -C.
/home/kwchoi/Installed/qrna-2.0.3c/src/eqrna -a -C 5s_rRNA.q > 5s_rRNA.q.con_shuffle.eqrna
Example using the scanning version with a window.
Consider the file Scerevisiae_orf_v_other_yeasts.q,
which contains an alignment of an S. cerevisiae
ORF. The alignment has 514 nucleotides, and we
would like to score it with eqrna using a window
of 150 nucleotides, moving the window 50
nucleotides each time:
/home/kwchoi/Installed/qrna-2.0.3c/src/eqrna -w 150 -x 50 Scerevisiae_orf_v_other_yeasts.q > Scerevisiae_orf_v_other_yeasts.q.w150.x50.eqrna
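To see which stretches of the alignment a 150-nucleotide window moved in steps of 50 would cover, the following Python sketch enumerates the windows for a 514-nucleotide alignment. How eqrna itself treats the final, shorter window is not stated on the slide; the sketch assumes the last window is simply truncated at the end of the alignment, and the function name is made up.

```python
def sliding_windows(length, window, step):
    """Return 1-based inclusive (start, end) windows covering an alignment.

    Assumption for this sketch: the last window is truncated at the end of
    the alignment, and scanning stops once a window reaches that end.
    """
    if length <= 0 or window <= 0 or step <= 0:
        raise ValueError("length, window and step must be positive")
    windows = []
    start = 0
    while True:
        end = min(start + window, length)
        windows.append((start + 1, end))
        if end == length:
            break
        start += step
    return windows


if __name__ == "__main__":
    for start, end in sliding_windows(514, 150, 50):
        print(start, end)
```

Under that assumption the 514-nucleotide alignment is covered by nine overlapping windows, from 1-150 to 401-514.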
Example starting from a blastn output:
/home/kwchoi/Installed/qrna-2.0.3c/src/eqrna -w 50 -x 50 HG_13_RNAs_gene.fa.MGSCv3.fragchrom.blast.E0.01.D1.q > HG_13_RNAs_gene.fa.MGSCv3.fragchrom.blast.E0.01.D1.q.W150.X50.eqrna
The result files are as follows:
5s_rRNA.q.eqrna
5s_rRNA.q.con_shuffle.eqrna
Scerevisiae_orf_v_other_yeasts.q.w150.x50.eqrna