Multiple Sequence Alignments & PSI-BLAST

Provided by: johnt90

1
Multiple Sequence Alignments & PSI-BLAST
  • June 4, 2009

2
  • Reading assignments
  • Xiong Chapters 5 & 6

3
Topics
  • Overview of MSA
  • MSA methods
  • Practical aspects
  • MSA to Profiles
  • PSI-BLAST

4
What are MSAs used for?
  • Identification of protein families
  • More sensitive identification of remote
    homologues
  • Identification of conserved, functionally
    important sites
  • Starting point for phylogenetic studies
  • Defining protein family sets for entire genomes
  • Prediction of protein secondary structure
  • Identification of regulatory regions

5
Overview of MSA
  • Alignment of 3 or more sequences to bring as many
    similar characters into register as possible
  • Hypothetical model of mutations (substitutions,
    insertions, and deletions)
  • The best alignment represents the most likely
    evolutionary scenario
  • The true alignment cannot be unambiguously established

6
MSA methods
  • Hierarchical
      • Most common and accurate
      • ClustalW is the most popular MSA program
      • ClustalW advanced development
      • Can propagate errors made early on
  • Non-hierarchical
      • Can align sequences of different lengths
      • May fail on larger sequence sets
      • T-Coffee

7
Overview of hierarchical method
  • Do a pairwise comparison of all sequences
  • Create a guide tree ordering the sequences from most
    to least similar
  • Align the 2 most similar sequences, then the next 2
    most similar
  • Add sequences progressively in decreasing order
    of similarity
  • Gaps, once added, are never removed
8
Clustal W
  • CLUSTAL = Cluster alignment
  • The underlying concept is that groups of
    sequences are phylogenetically related.
  • If they can be aligned then one can construct a
    tree.

Reference: Thompson et al. (1994) Nucleic Acids Res. 22, 4673-4680
9
Step 1-pairwise alignments
Compare each sequence with every other sequence and
calculate a distance matrix.

       A     B     C
  A    -
  B   .87    -
  C   .59   .60    -

Each number is the number of exact matches divided by
the sequence length (ignoring gaps). Thus, the higher
the number, the more closely related the two sequences are.
In this distance matrix, sequence A is 87% identical to
sequence B (a distance of 0.13).
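A minimal Python sketch of the Step 1 score, assuming each pair has already been aligned and that "sequence length (ignoring gaps)" means the number of columns in which neither sequence has a gap. The function name and the toy sequences are illustrative only and are not taken from ClustalW.

```python
def fraction_identity(aligned_a: str, aligned_b: str) -> float:
    """Exact matches divided by the number of ungapped columns."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    # Keep only columns where neither sequence has a gap character.
    columns = [(x, y) for x, y in zip(aligned_a, aligned_b)
               if x != "-" and y != "-"]
    if not columns:
        return 0.0
    matches = sum(1 for x, y in columns if x == y)
    return matches / len(columns)


# Hypothetical pairwise alignment: 7 ungapped columns, 6 of them identical.
identity = fraction_identity("MKV-LATG", "MKVALSTG")
distance = 1.0 - identity  # divergence, the complement of identity
print(f"identity={identity:.2f} distance={distance:.2f}")
```

Filling a matrix is then a matter of calling this function once per pair of sequences.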
10
Step 2-Create Guide Tree
Use the distance matrix to create a guide tree that
determines the order in which the sequences are aligned.
[Figure: guide tree for sequences A, B, and C. A and B join at 0.87 (0.13); C joins at 0.60 (0.40).]
Branch length is proportional to the estimated
divergence between A and B (0.13).
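The join order can be sketched in Python using the three values from the matrix in Step 1. This is a simplified stand-in, assuming plain average-linkage clustering on similarity; ClustalW itself builds its guide tree by neighbor joining. The function name `guide_tree_order` is hypothetical.

```python
def guide_tree_order(names, similarity):
    """Repeatedly merge the two most similar clusters (average linkage)."""
    clusters = [(name,) for name in names]
    merges = []

    def avg_similarity(c1, c2):
        # Mean pairwise similarity between members of the two clusters.
        total = sum(similarity[frozenset((x, y))] for x in c1 for y in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > 1:
        score, i, j = max(
            ((avg_similarity(c1, c2), i, j)
             for i, c1 in enumerate(clusters)
             for j, c2 in enumerate(clusters) if i < j),
            key=lambda t: t[0],
        )
        merges.append((clusters[i], clusters[j], score))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return merges


# Values from the Step 1 matrix.
similarity = {
    frozenset(("A", "B")): 0.87,
    frozenset(("A", "C")): 0.59,
    frozenset(("B", "C")): 0.60,
}
merges = guide_tree_order(["A", "B", "C"], similarity)
for left, right, score in merges:
    print(left, right, f"similarity={score:.3f}", f"distance={1 - score:.3f}")
```

A and B are joined first, then C is joined to the A/B cluster. Under this averaging assumption C joins at about 0.595, close to the 0.60 shown on the slide.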
11
Step 3-Progressive Alignment
First, align A and B. Then add sequence C to the
previous alignment. In closely related
sequences, gaps are penalized more heavily than
in more divergent sequences.
12
Amino acid weight matrices
  • Series of scoring matrices that one can use
    depending on the relatedness of the proteins
    aligned.
  • As the alignment proceeds in CLUSTALW, the AA
    weight matrices are switched to scoring matrices
    suited to more divergent sequences.
  • Length of the branch is used to determine which
    matrix to use and contributes to the alignment
    score.

13
Globin alignment
  • Starting with a group of 7 globin-related
    sequences from different species
  • Do pairwise alignments between all 7 sequences
  • Calculate the similarity between each pair; a higher
    score indicates greater similarity

14
  • Cluster the sequences by similarity to create a
    guide tree
  • Branch length is proportional to estimated
    divergence between the two sequences

18
Example of Sequence Alignment using Clustal W
* identity, : high similarity, . low similarity, -
gap in sequence
Amino acids are often color coded based on
physicochemical properties
19
Globin alignment
20
Guide tree phylogram
21
ClustalW programs
  • Locally, by itself
      • Download and install ClustalX on any platform
      • Graphic interface
  • Locally, as part of another package
      • BioEdit: free, runs on Windows
      • Geneious: basic version free (Mac, Windows)
  • Web servers
      • EBI ClustalW server: www.ebi.ac.uk/clustalw

22
Practical aspects
  • Identify and download sequences in the correct format
  • Sequences should meet the criteria for MSA
      • Closely related (E < 1)
      • Similar length and number of domains
      • Same domain order
  • If necessary, extract regions of similar length
  • Name them appropriately

23
Alignment viewers
  • Edit and prepare for publication
  • Different coloring schemes
  • Jalview -- Java based interactive viewer (free)

25
MSAs to Profiles
  • MSAs can be used to find remote homologs or
    remote members of a protein family
  • PSI-BLAST
      • Automated, available as part of NCBI BLAST
  • Hidden Markov Models (HMMs)
      • More sensitive
      • Less automated
      • Basis of the Pfam database

26
Why?
  • Database searches using a profile or
    position-specific scoring matrix (PSSM) are
    much more sensitive for detecting weak or distant
    relationships than database searches using a
    single sequence as the query
  • Information content is higher in a PSSM

27
Pairwise alignment
28
What is a PSSM?
Position Specific Scoring Matrix
29
MSAs to PSSM
POS   123456
Seq1  ATGTCG
Seq2  AAGACT
Seq3  TACTCA
Seq4  CGGAGG
Seq5  AACCTG
30
ATGTCG
AAGACT
TACTCA
CGGAGG
AACCTG
Convert the MSA to a raw frequency table
31
Normalize by dividing by overall frequencies
32
Convert the normalized values to log base 2 to obtain the
PSSM
33
Match AACTCG in the matrix
SUM = 1.0 + 1.0 + 0.8 + 1.0 + 1.38 + 1.15 = 6.33
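The three conversion steps (raw frequencies, normalization by overall frequencies, log base 2) can be reproduced with a short Python sketch on the five example sequences. It assumes no pseudocounts, so residues never seen at a position get a score of negative infinity; real profile tools handle unseen residues differently. The function names are illustrative.

```python
import math

msa = ["ATGTCG", "AAGACT", "TACTCA", "CGGAGG", "AACCTG"]


def build_pssm(msa, alphabet="ACGT"):
    n_seqs = len(msa)
    length = len(msa[0])
    # Overall frequency of each residue across the whole alignment.
    total = n_seqs * length
    overall = {r: sum(seq.count(r) for seq in msa) / total for r in alphabet}
    pssm = []
    for pos in range(length):
        column = [seq[pos] for seq in msa]
        scores = {}
        for r in alphabet:
            freq = column.count(r) / n_seqs      # raw frequency
            if freq > 0:
                # normalize by overall frequency, then take log base 2
                scores[r] = math.log2(freq / overall[r])
            else:
                scores[r] = float("-inf")        # residue never observed here
        pssm.append(scores)
    return pssm


def score(pssm, seq):
    """Sum the per-position scores of seq against the matrix."""
    return sum(column[r] for column, r in zip(pssm, seq))


pssm = build_pssm(msa)
print(round(score(pssm, "AACTCG"), 2))
```

With unrounded arithmetic the total for AACTCG comes out at about 6.31; the 6.33 on the slide reflects rounding of the intermediate values.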
34
PSI-BLAST
  • Position-Specific Iterated BLAST
  • What is it and how does it relate to MSA?
  • How is it related to BLAST?
  • What can I do with it?

35
Steps in PSI-BLAST
  • Single protein sequence compared to database
    using BLASTP
  • Construct a multiple alignment and profile (PSSM)
    from any significant local alignments
  • Profile or PSSM is compared to database, making
    local alignments
  • Estimate statistical significance of local
    alignments
  • Iterate an arbitrary number of times or until
    convergence (no new sequences added)
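The iteration logic can be illustrated with a toy Python sketch. It is not PSI-BLAST: it assumes ungapped DNA strings of equal length, a uniform background, additive pseudocounts, and a raw score cutoff in place of an E-value. The database contents, the threshold, and all function names are made up for illustration.

```python
import math

ALPHABET = "ACGT"


def build_pssm(seqs, pseudocount=1.0):
    """Log-odds profile with uniform background and additive pseudocounts."""
    denom = len(seqs) + pseudocount * len(ALPHABET)
    pssm = []
    for pos in range(len(seqs[0])):
        column = [s[pos] for s in seqs]
        pssm.append({
            r: math.log2(((column.count(r) + pseudocount) / denom) / 0.25)
            for r in ALPHABET
        })
    return pssm


def score(pssm, seq):
    return sum(column[r] for column, r in zip(pssm, seq))


def iterated_search(query, database, threshold, max_iterations=5):
    """Rebuild the profile from the current hits until the hit set is stable."""
    hits = {query}
    for iteration in range(1, max_iterations + 1):
        # In round 1 the profile comes from the query alone, standing in
        # for the initial single-sequence search.
        pssm = build_pssm(sorted(hits))
        new_hits = {query} | {s for s in database if score(pssm, s) >= threshold}
        if new_hits == hits:          # convergence: no new sequences added
            return hits, iteration
        hits = new_hits
    return hits, max_iterations


# Hypothetical database: three relatives of the query and one unrelated string.
database = ["ATGACG", "AAGACG", "AAGACT", "CCTGAA"]
hits, iterations = iterated_search("ATGTCG", database, threshold=2.0)
print(sorted(hits), iterations)
```

In this toy run the most divergent relative (AAGACT) falls below the cutoff in the first round and is picked up only once the profile has been rebuilt from the first-round hits, which is the effect the steps above describe.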

36
Practical uses of PSI-BLAST
  • Can create a PSSM using PSI-BLAST against one
    database, then use the same PSSM to search another
    database for a more sensitive search
  • Does not have to run to convergence to create a
    PSSM useful for finding remote homologues;
    usually 2 or 3 iterations are sufficient
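A hedged Python sketch of how the two-step workflow might look with the command-line `psiblast` program from NCBI BLAST+, which is an assumption, since the slides do not say which interface was used. The code only builds the argument lists and does not run anything. The file names and database names are hypothetical placeholders.

```python
import shlex


def build_pssm_command(query_fasta, db, pssm_out, iterations=3,
                       inclusion_evalue=0.005):
    """Run a few PSI-BLAST iterations against one database and save the PSSM."""
    return [
        "psiblast",
        "-query", query_fasta,
        "-db", db,
        "-num_iterations", str(iterations),
        "-inclusion_ethresh", str(inclusion_evalue),
        "-out_pssm", pssm_out,
    ]


def reuse_pssm_command(pssm_in, db, evalue=10):
    """Search a second database starting from the saved PSSM."""
    return [
        "psiblast",
        "-in_pssm", pssm_in,
        "-db", db,
        "-evalue", str(evalue),
    ]


# Hypothetical file and database names, for illustration only.
step1 = build_pssm_command("sma4.fasta", "refseq_protein", "sma4.pssm")
step2 = reuse_pssm_command("sma4.pssm", "refseq_chicken")
print(shlex.join(step1))
print(shlex.join(step2))
```

To execute the commands, each list could be passed to `subprocess.run` on a machine where BLAST+ and the databases are installed.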

37
Sma4 protein from C. elegans
  • Sma4 protein, 570 aa long
  • Protein domains

38
BLASTP against Refseq DB
. . .
39
PSI-BLAST iteration 1
  • Default threshold is E = 0.005

E = 0.023
40
PSI-BLAST iteration 2
E = 2e-29
41
PSI-BLAST iteration 3
E = 2e-39
. . .
42
Homologs?
  • 30-50% identical over short stretches

Sma4 protein
XP_001668763
43
Homologs?
  • Sma4 protein (570 aa)
  • XP_001668763 (531 aa)

44
Finding homologs in other species
  • Sma4 BLASTP against Refseq limited to Gallus
    gallus (chicken)
  • 17 hits with E-value < 10

45
Use PSSM to search other DB
  • PSSM from 3rd iteration with Sma4 in a PSI-BLAST
    search of REFSEQ limited to Gallus gallus
    (chicken)
  • 127 matches with E-value < 10

46
Computer lab
  • PSI-BLAST to find remote homologues
  • MSA of proteins for genotyping
  • MSA to determine homology