Why is pairwise sequence alignment different - PowerPoint PPT Presentation

Slides: 22
Provided by: bch7
1
Lecture 5
  • 1. Why is pairwise sequence alignment different for proteins and for nucleic acids? General protein introduction.
  • 2. Scoring systems and matrices for protein data.
  • 3. Wet experience for pairwise sequence alignment (for proteins, more options).
  • 4. Special BLAST pages.
  • 5. Why is multiple alignment better?
  • 6. Wet experience for MSA (for proteins).

2
BLASTP 2.2.6 [Apr-09-2003]
RID: 1068830459-16741-16367211346.BLASTQ3
Query: gi|33875338|gb|AAH00349.2| GBA protein [Homo sapiens] (359 letters)
Database: All non-redundant GenBank CDS translations+PDB+SwissProt+PIR+PRF; 1,541,613 sequences; 503,866,891 total letters
Taxonomy reports
Distribution of 87 Blast Hits on the Query Sequence
Graphic Representation
3
From the BlastP Page, Go To Taxonomy Report
Organism Report
[Figure: Organism Report screenshot, with labels for scientific name, common name, Blast (family) name, score and E-value]
  • TaxBLAST hits are sorted according to species
    containing the target sequence.
  • All hits of the same organism are listed
    together.
  • Within each species, TaxBLAST hits are sorted
    by score and E-value.
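
The grouping and sorting described above can be sketched in a few lines. This is only an illustration, not TaxBLAST's own code: the `Hit` record, its field names and the accessions are hypothetical, and ordering the organisms by their best hit is an assumption, since the slide only says that hits are grouped by species.

```python
from collections import defaultdict
from typing import NamedTuple


class Hit(NamedTuple):
    """Hypothetical record for one BLAST hit (field names are assumptions)."""
    accession: str
    organism: str
    score: float
    evalue: float


def organism_report(hits):
    """Group hits by organism, then sort each group by score and E-value."""
    groups = defaultdict(list)
    for hit in hits:
        groups[hit.organism].append(hit)
    for organism_hits in groups.values():
        # Highest score first; a lower E-value breaks ties.
        organism_hits.sort(key=lambda h: (-h.score, h.evalue))
    # Assumption: organisms are ordered by their best hit.
    return sorted(groups.items(),
                  key=lambda item: (-item[1][0].score, item[1][0].evalue))


if __name__ == "__main__":
    demo = [
        Hit("A1", "Homo sapiens", 700.0, 1e-200),
        Hit("B1", "Mus musculus", 650.0, 1e-180),
        Hit("A2", "Homo sapiens", 120.0, 2e-30),
        Hit("B2", "Mus musculus", 660.0, 1e-185),
    ]
    for organism, organism_hits in organism_report(demo):
        print(organism, [h.accession for h in organism_hits])
```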

4
PSI-BLAST - Position Specific Iterated BLAST
  • A fast heuristic method that searches the database iteratively with a profile. The profile built from the hits of one iteration is used as the query in the next iteration.
  • Advantages of PSI-BLAST
  • Identifies weak homologies (more distant relatives of a protein, not found directly by FASTA or BLAST).
  • An important tool for predicting biochemical function.

Information: http://www.ncbi.nlm.nih.gov/Education/BLASTinfo/psi1.html
5
BLAST vs PSI-BLAST
BLAST: DNA or protein. Use for close homologies.
PSI-BLAST: Proteins only. Finds distant homologies. Predicts biochemical activity and function.
http://www.rubic.rdg.ac.uk/andrew/bioinf.org/talks/LocalBlast/img0.htm
6
Outline of the PSI-BLAST Algorithm
First, ordinary BLAST is used to find close homologues. Rather than building a true multiple alignment, the close homologues are all aligned to the query sequence. A profile is constructed using a very simple empirical weighting scheme. Ignoring the positional variation of indels, the profile is then searched against the database.
7
What is a Conserved Position ?
A conserved position has a high frequency of any
single amino-acid type in the MSA column.
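
As a rough illustration of this definition, the sketch below reports the most frequent residue in each MSA column and its frequency. The toy alignment is invented, and counting gap characters ("-") in the denominator is an assumption rather than something the slide specifies.

```python
from collections import Counter


def column_conservation(msa):
    """Return (most frequent residue, its frequency) for each MSA column."""
    if len({len(row) for row in msa}) != 1:
        raise ValueError("all aligned sequences must have the same length")
    result = []
    for column in zip(*msa):
        residues = [r for r in column if r != "-"]
        if not residues:
            # Column made only of gaps: nothing is conserved here.
            result.append(("-", 0.0))
            continue
        residue, count = Counter(residues).most_common(1)[0]
        # Assumption: gaps count in the denominator, so gappy columns score lower.
        result.append((residue, count / len(column)))
    return result


if __name__ == "__main__":
    toy_msa = ["MKVLA", "MKILA", "MRVLG", "MKVLS"]  # invented example
    for position, (residue, freq) in enumerate(column_conservation(toy_msa), 1):
        print(position, residue, freq)
```

In the toy alignment, columns 1 and 4 are fully conserved (frequency 1.0), while column 5 is only weakly conserved.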
8
How Does PSI-BLAST Work?
  • 1. Run a gapped-BLAST search with the query sequence.
  • 2. Collect all output sequences aligned to the query with an E-value below a threshold (default is 0.005). Call the collection M.
  • 3. Construct a profile from M. The profile is a matrix (position specific score matrix - PSSM). The matrix has 20 rows, one per AA, and one column per query position.
  • 4. Iterate steps 1 to 3 with the profile as the query. The iterative search results in increased sensitivity and detection of weak homologies.
  • 5. Stop iterating when no new, significant sequences are found ("convergence").
  • Note: A highly conserved position receives a high count frequency, so it is more significant in the next iteration than weakly conserved positions (which receive low count frequencies).
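
The control flow of steps 1 to 5 can be sketched as a loop. This is a minimal sketch, not the real algorithm: `search` and `build_profile` are hypothetical callables standing in for the gapped-BLAST search and the PSSM construction, and the toy "database" below just pretends that a richer profile reaches more distant relatives.

```python
def iterate_until_convergence(search, build_profile, query,
                              threshold=0.005, max_iterations=10):
    """Repeat search -> collect M -> build profile until M stops growing."""
    included = set()
    current_query = query
    for iteration in range(1, max_iterations + 1):
        hits = search(current_query)                          # step 1
        significant = {sid for sid, evalue in hits.items()
                       if evalue < threshold}                 # step 2
        if significant <= included:                           # step 5
            return included, iteration
        included |= significant
        current_query = build_profile(query, included)        # steps 3 and 4
    return included, max_iterations


# Toy stand-ins (all hypothetical): each sequence has a "distance level".
TOY_LEVELS = {"seqA": 0, "seqB": 0, "seqC": 1, "seqD": 2, "seqE": 9}


def toy_search(query):
    """Pretend search: a profile reaches one level beyond its members."""
    if isinstance(query, str):
        reach = 0  # plain sequence query finds only close homologues
    else:
        reach = max(TOY_LEVELS[sid] for sid in query) + 1
    return {sid: (1e-30 if level <= reach else 1.0)
            for sid, level in TOY_LEVELS.items()}


def toy_profile(query, included):
    """Pretend profile: simply remember which sequences were included."""
    return frozenset(included)


if __name__ == "__main__":
    members, rounds = iterate_until_convergence(toy_search, toy_profile, "query")
    print(sorted(members), rounds)
```

In the toy run the distant relative seqD is only picked up after seqC has entered the profile, which is the effect the slide attributes to iteration; seqE stays out of reach.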

9
PSSM (Position Specific Scoring Matrix)
[Figure: PSSM table, columns labelled by alignment position: 1 2 3 4 5 6 7 8 9 10 11 12 13 ...]
  • Notes
  • The PSSM is generated by calculating position specific scores for each position in the alignment (conserved positions receive a high score, weakly conserved positions receive a low score).
  • The profile is produced internally but is not available on the NCBI server.
  • Only the first 15 positions of the profile are shown here for lack of space.

http://www.rubic.rdg.ac.uk/andrew/bioinf.org/talks/LocalBlast/img0.htm
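
To make "conserved positions receive a high score" concrete, here is a simplified log-odds PSSM built from column counts. It is a sketch under stated assumptions: a uniform background frequency and a flat pseudocount are used, whereas PSI-BLAST itself uses sequence weighting and more elaborate pseudocounts. The toy alignment is invented.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_pssm(aligned, pseudocount=1.0):
    """Return {amino acid: [log-odds score per column]} (20 rows)."""
    background = 1.0 / len(AMINO_ACIDS)  # assumption: uniform background
    length = len(aligned[0])
    pssm = {aa: [0.0] * length for aa in AMINO_ACIDS}
    for position, column in enumerate(zip(*aligned)):
        # Count only standard residues; gaps and unknowns are ignored.
        counts = Counter(r for r in column if r in pssm)
        total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
        for aa in AMINO_ACIDS:
            frequency = (counts[aa] + pseudocount) / total
            pssm[aa][position] = math.log2(frequency / background)
    return pssm


if __name__ == "__main__":
    toy_msa = ["MKVLA", "MKILA", "MRVLG", "MKVLS"]  # invented example
    pssm = build_pssm(toy_msa)
    for aa in "MKA":
        print(aa, [round(score, 2) for score in pssm[aa]])
```

The fully conserved M in column 1 gets a higher score than the weakly conserved A in column 5, and residues never seen in a column get a negative score.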
10
http://www.idi.ntnu.no/grupper/KS-grp/microarray/slides/drablos/Fold_recognition/sld004.htm
http://bioweb.pasteur.fr/seqanal/blast/intro-uk.html#psiblast
11
PSI-BLAST - Output
Hits that are better than the E-value threshold are listed first. These hits are used in forming the profile that will be used in the next PSI-BLAST iteration. Hits with E-values worse than the threshold, but still better than 10 (the default selected on the query page), are listed further down the page. Any of the sequences in the list of "Sequences with E-value worse than threshold" (> 0.005) can be manually added (by clicking) to the sequences used for generating the PSI-BLAST profile.
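
The two-part output list can be mimicked with a small helper. This is an illustrative sketch: the function name, the dictionary of hits and the `manually_added` argument (standing in for the check boxes on the results page) are all hypothetical.

```python
def partition_hits(hits, inclusion_threshold=0.005, expect=10.0,
                   manually_added=()):
    """Split reported hits into profile members and listed-only hits."""
    # Only hits better than the EXPECT cutoff are reported at all.
    reported = {sid: e for sid, e in hits.items() if e < expect}
    ranked = sorted(reported, key=reported.get)  # best E-value first
    in_profile = [sid for sid in ranked
                  if reported[sid] < inclusion_threshold
                  or sid in manually_added]
    listed_only = [sid for sid in ranked if sid not in in_profile]
    return in_profile, listed_only


if __name__ == "__main__":
    # Invented E-values for five hypothetical hits.
    demo = {"P1": 1e-50, "P2": 0.001, "P3": 0.2, "P4": 5.0, "P5": 50.0}
    print(partition_hits(demo))
    print(partition_hits(demo, manually_added={"P3"}))
```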
12
To run PSI-BLAST, start with the BLAST page:
http://www.ncbi.nlm.nih.gov/BLAST/
13
PSI-BLAST
14
Running PSI-BLAST
  • NOTES
  • Use the SwissProt database and the BLOSUM62 scoring matrix.
  • Default EXPECT value in BLAST is 10.
  • Default threshold value for PSI-BLAST is 0.005.
  • The user can see all BLASTP hits up to E-value 10, but only sequences with an E-value below the threshold (< 0.005) affect the profile.
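
The slides use the NCBI web page, but the same settings can also be expressed for the standalone `psiblast` program from NCBI BLAST+. The sketch below only builds the argument list; it assumes BLAST+ is installed and that a local protein database exists, and the file and database names are hypothetical. The standalone default inclusion threshold may differ from the 0.005 quoted here, so it is set explicitly; check `psiblast -help` for the options of your version.

```python
def build_psiblast_command(query_fasta, database, iterations=3,
                           inclusion_ethresh=0.005, expect=10.0,
                           matrix="BLOSUM62", pssm_out=None):
    """Build an argument list for the BLAST+ psiblast program."""
    command = [
        "psiblast",
        "-query", query_fasta,
        "-db", database,
        "-num_iterations", str(iterations),
        "-inclusion_ethresh", str(inclusion_ethresh),  # profile threshold
        "-evalue", str(expect),                        # reporting cutoff
        "-matrix", matrix,
    ]
    if pssm_out is not None:
        # Ask psiblast to write the final PSSM as a text file.
        command += ["-out_ascii_pssm", pssm_out]
    return command


if __name__ == "__main__":
    # Hypothetical file and database names, for illustration only.
    print(" ".join(build_psiblast_command("prosaposin.fasta", "swissprot",
                                          pssm_out="prosaposin.pssm")))
```

The resulting list can be passed to `subprocess.run(command, check=True)` on a machine where `psiblast` and the database are available.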

15
PSI-BLAST Output - First Run - Query Human
Prosaposin.
16
PSI-BLAST Output - First Run - Query Human
Prosaposin.
16 Sequences with E-value BETTER than threshold (< 0.005)
17
PSI-BLAST Output, Second Run, query profile
(up from 54)
18
PSI-BLAST Output, Second Run, query profile
Sequences with E-value BETTER than threshold
19
PSI-BLAST Output, Third Run, query 2nd
iteration profile
20
PSI-BLAST Tutorial
http://www.ncbi.nlm.nih.gov/Education/BLASTinfo/psi1.html
http://www.cmbi.kun.nl/bioinf/tools/psiblast.shtml

Help: http://npsa-pbil.ibcp.fr/cgi-bin/npsa_automat.pl?page/NPSAHLP/npsahlp_simsearchpsiblast.html
Other Servers for PSI-BLAST
http://xylian.igh.cnrs.fr/blast/psi_blast2.html
http://www.vge.ac.uk/blast/psiblast.html
http://www.cmbi.kun.nl/bioinf/tools/psiblast.shtml
21
http://www.ebi.ac.uk/fasta3/