Lab7 - PowerPoint PPT Presentation

About This Presentation
Title:

Lab7

Description:

Arial Calibri Courier New Office Theme Lab7 Sean Eddy s Lab HMMER Introduction HMMER executables Installation Hmmbuild : build a ... – PowerPoint PPT presentation

Number of Views:46
Avg rating:3.0/5.0
Slides: 18
Provided by: Kwangm1
Category:
Tags: hmmer | lab7

less

Transcript and Presenter's Notes

Title: Lab7


1
Lab7
  • QRNA, HMMER, PFAM

2
Sean Eddys Lab
  • http//selab.janelia.org/software.html

3
HMMER
4
Introduction
  • HMMER2 is an implementation in UNIX (Linux,
    MacOS) platform of profile hidden Markov model,
    whose source code, executables, and user guide
    can be downloaded from http//hmmer.janelia.org/
  • The experiment of HMMER is to look for known
    domains in a query sequence by searching a single
    sequence again a library of HMMs.
  • One such library is PFAM, and you can also create
    your own library using HMMER

5
HMMER executables
  1. hmmalign - Align sequences to an existing model.
  2. hmmbuild - Build a model from a multiple sequence
    alignment.
  3. hmmcalibrate - Takes an HMM and empirically
    determines parameters that are used to make
    searches more sensitive, by calculating more
    accurate expectation value scores (E-values).
  4. hmmconvert - Convert a model file into different
    formats, including a compact HMMER 2 binary
    format, and best effort emulation of GCG
    profiles.
  5. hmmemit - Emit sequences probabilistically from a
    profile HMM.
  6. hmmfetch - Get a single model from an HMM
    database.
  7. hmmindex - Index an HMM database.
  8. hmmpfam - Search an HMM database for matches to a
    query sequence.
  9. hmmsearch - Search a sequence database for
    matches to an HMM.

6
Installation
  • Simple installation
  • Download the current version of HMMER
    hmmer-2.3.2.bin.intel-linux.tar.gz from
    http//hmmer.janelia.org/download
  • Unpack the software by typing tar xvf
    hmmer-2.3.2.bin.intel-linux.tar.gz in the
    command line. You will see a new directory
    hmmer-2.3.2.bin.intel-linux.
  • Enter the directory of hmmer-2.3.2.bin.intel-linux
    . You will see NINE executables ready in the
    subdirectory /binaries, and also nine files in
    the subdirectory /tutorial
  • Installation from source code
  • Download the current HMMER source code version
    hmmer-2.3.2.tar.gz from http//hmmer.janelia.org
    /download
  • Create a new directory in your Linux account and
    upload or move the software package to the
    directory
  • Unpack the software by typing tar xvf
    hmmer-2.3.2.tar.gz in the command line
  • Type cd hmmer-2.3.2 to enter the software
    directory
  • Type ./configure to configure for your system
    and build the programs
  • Type make to generate the executables
  • Type make check to run the automated test
    suite (This is optional but recommended, and all
    these tests should pass)
  • Please note that by default programs are in
    /usr/local/bin/ and man pages are in
    /usr/local/man/man1
  • Type make install to install all executables

7
Hmmbuild build a profile HMM from an aignment
  • hmmbuild options hmmfile alignfile
  • hmmbuild test.hmm test.aln
  • hmmbuild -h
  • hmmbuild reads a multiple sequence alignment file
    alignfile , builds a new profile HMM, and saves
    the HMM in hmmfile.
  • alignfile may be in ClustalW, GCG MSF, or SELEX
    alignment format.
  • By default, the model is configured to find one
    or more non-overlapping alignments to the
    complete model.
  • To configure the model for a single global
    alignment, use the -g option
  • To configure the model for multiple local
    alignments, use the -f option
  • To configure the model for a single local
    alignment (standard Smith/Waterman), use the -s
    option.

8
Hmmcalibrate calibrate HMM search statistics
  • hmmcalibrate options hmmfile
  • hmmcalibrate test.hmm
  • Hmmcalibrate -h
  • hmmcalibrate reads an HMM file from hmmfile,
    scores a large number of synthesized random
    sequences with it, fits an extreme value
    distribution (EVD) to the histogram of those
    scores, and re-saves hmmfile now including the
    EVD parameters.
  • This step is optional, but it will increase the
    sensitivity of your database search
  • hmmcalibrate may take several minutes (or longer)
    to run. While it is running, a temporary file
    called hmmfile.xxx is generated in your working
    directory.
  • If you abort hmmcalibrate prematurely (ctrl-C,
    for instance), your original hmmfile will be
    untouched, and you should delete the hmmfile.xxx
    temporary file.

9
Hmmsearch - search a sequence database with a
profile HMM
  • hmmsearch options hmmfile seqfile
  • hmmsearch test.hmm query.faa gt query.faa.domain
  • hmmsearch -h
  • hmmsearch reads an HMM from hmmfile and searches
    seqfile for significantly similar sequence
    matches.
  • hmmsearch may take minutes or even hours to run,
    depending on the size of the sequence database.
    It is a good idea to redirect the output to a
    file.
  • The output consists of four sections
  • a ranked list of the best scoring sequences,
  • a ranked list of the best scoring domains,
  • alignments for all the best scoring domains, and
  • a histogram of the scores.
  • A sequence score may be higher than a domain
    score for the same sequence if there is more than
    one domain in the sequence the sequence score
    takes into account all the domains. All sequences
    scoring above the -E and -T cutoffs are shown in
    the first list, then every domain found in this
    list is shown in the second list of domain hits.
    If desired, E-value and bit score thresholds may
    also be applied to the domain list using the
    -domE and -domT options.

10
PFAM
11
Pfam 23.0 (July 2008, 10340 families)
  • The Pfam database is a large collection of
    protein families, each represented by multiple
    sequence alignments and hidden Markov models
    (HMMs).
  • Proteins are generally composed of one or more
    functional regions, commonly termed domains.
    Different combinations of domains give rise to
    the diverse range of proteins found in nature.
  • The identification of domains that occur within
    proteins can therefore provide insights into
    their function.
  • There are two components to Pfam Pfam-A and
    Pfam-B.
  • Pfam-A entries are high quality, manually curated
    families.
  • Although these Pfam-A entries cover a large
    proportion of the sequences in the underlying
    sequence database, in order to give a more
    comprehensive coverage of known proteins we also
    generate a supplement using the ADDA database.
    These automatically generated entries are called
    Pfam-B.
  • Although of lower quality, Pfam-B families can be
    useful for identifying functionally conserved
    regions when no Pfam-A entries are found.
  • Pfam also generates higher-level groupings of
    related families, known as clans. A clan is a
    collection of Pfam-A entries which are related by
    similarity of sequence, structure or profile-HMM.
    (see Pfam-C)

12
Sequence analysis with HMM
  • ftp//ftp.sanger.ac.uk/pub/databases/Pfam/releases
    /Pfam23.0/ to download files Pfam_fs.gz and
    Pfam_ls.gz
  • Pfam_ls - All global (ls mode) Pfam-A HMMs in an
    HMM library searchable with the hmmpfam program.
  • Pfam_fs - All local (fs mode) Pfam-A HMMs in an
    HMM library searchable with the hmmpfam program.
  • Data location
  • /home/kwchoi/public_html/I529-09-lab/Lab7/Data/PFA
    M_data/
  • Copy to your working directory or make symbolic
    link
  • To search for domains in test.faa in the global
    sequence database, type
  • hmmpfam Pfam_fs test.faa gt test.faa.pfam
  • The results is logged into an output file
    test.faa.pfam

13
QRNA
14
QRNA
  • QRNA is a prototype structural noncoding RNA
    genefinder tools for detecting novel structural
    RNA genes.
  • It uses three probabilistic "pair-grammars"
  • a pair stochastic context free grammar modeling
    alignments constrained by structural RNA
    evolution,
  • a pair hidden Markov model modeling alignments
    constrained by coding sequence evolution, and
  • a pair hidden Markov model modeling a null
    hypothesis of position-independent evolution.
  • Given an input pairwise sequence alignment (e.g.
    from a BLASTN comparison of two related genomes),
    it classify the alignment into the coding, RNA,
    or null class according to the posterior
    probability of each class.

15
Local Installation
  • The latest version of QRNA (qrna-2.0.3c.tar.gz )
    can be download from ftp//selab.janelia.org/pub/s
    oftware/qrna/
  • Configure QRNA and install
  • tar -xvf qrna-2.0.3c.tar
  • cd qrna-2.0.3c
  • cd squid
  • make
  • cd ../squid02
  • make
  • cd ../src
  • make
  • QRNA is installed in
  • /home/kwchoi/Installed/qrna-2.0.3c/

16
  • Data location
  • /home/kwchoi/public_html/I529-09-lab/Lab7/Data/QRN
    A_data/
  • blastn2qrnadepth.pl
  • /home/kwchoi/Installed/qrna-2.0.3c/src/scripts/bla
    stn2qrnadepth.pl -g human HG_13_RNAs_gene.fa.MGSCv
    3.fragchrom.blast
  • HG_13_RNAs_gene.fa.MGSCv3.fragchrom.blast.E0.01.D1
    .q
  • HG_13_RNAs_gene.fa.MGSCv3.fragchrom.blast.E0.01.D1
    .q.gff
  • HG_13_RNAs_gene.fa.MGSCv3.fragchrom.blast.E0.01.D1
    .q.rep
  • Simple test
  • Set the running enveriment and run a simple
    example
  • export QRNADB/home/kwchoi/Installed/qrna-2.0.3c/l
    ib
  • /home/kwchoi/Installed/qrna-2.0.3c/src/eqrna -a
    5s_rRNA.q gt 5s_rRNA.q.eqrna

17
QRNA demo
  • Option -C shuffles the columns of the pairwise
    alignment while maintaining the gap and conserved
    structure of the original alignment. Compare the
    two results of using -C and without using -C
  • /home/kwchoi/Installed/qrna-2.0.3c/src/eqrna -a
    -C 5s_rRNA.q gt 5s_rRNA.q.con_shuffle.eqrna
  • Example using the scanning version with a window.
    Consider file Scerevisiae orf v other yeasts.q
    which contains an alignment of a S. cerevisiae
    ORF The alignment has 514 nucleotides, and we
    would like to score it with eqrna using a window
    of 150 nucleotides, and moving the window 50
    nucleotides each time
  • /home/kwchoi/Installed/qrna-2.0.3c/src/eqrna -w
    150 -x 50 Scerevisiae_orf_v_other_yeasts.q gt
    Scerevisiae_orf_v_other_yeasts.q.w150.x50.eqrna
  • example start with a blastn output
  • /home/kwchoi/Installed/qrna-2.0.3c/src/eqrn -w 50
    -x 50 HG_13_RNAs_gene.fa.MGSCv3.fragchrom.blast.E0
    .01.D1.q HG_13_RNAs_gene.fa.MGSCv3.fragchrom.blast
    .E0.01.D1.q.W150.X50.eqrna
  • The results files are shown as following
  • 5s_rRNA.q.eqrna
  • 5s_rRNA.q.con_shuffle.eqrna
  • Scerevisiae_orf_v_other_yeasts.q.w150.x50.eqrna
Write a Comment
User Comments (0)
About PowerShow.com