Sequence Based Analysis Tutorial - PowerPoint PPT Presentation


1
Sequence Based Analysis Tutorial
  • NIH Proteomics Workshop
  • Cecilia Arighi, Ph.D.
  • Protein Information Resource at
  • Georgetown University Medical Center

2
Retrieval, Sequence Search & Classification
Methods
  • Retrieve protein info by text / UID
  • Sequence Similarity Search
  • BLAST, FASTA, Dynamic Programming
  • Family Classification
  • Patterns, Profiles, Hidden Markov Models,
    Sequence Alignments, Neural Networks
  • Integrated Search and Classification System

3
Sequence Similarity Search (I)
  • Based on Pair-Wise Comparisons
  • Dynamic Programming Algorithms
  • Global Similarity: Needleman-Wunsch
  • Local Similarity: Smith-Waterman
  • Heuristic Algorithms
  • FASTA: Based on K-Tuples (2-Amino Acid)
  • BLAST: Triples of Conserved Amino Acids
  • Gapped-BLAST: Allow Gaps in Segment Pairs
  • PHI-BLAST: Pattern-Hit Initiated Search
  • PSI-BLAST: Position-Specific Iterated Search
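
The two dynamic programming methods named above can be sketched in a few lines. This is a minimal illustration, not the implementation used by any search service: the match, mismatch and gap values are arbitrary assumptions, and real protein searches would use a substitution matrix and usually affine gap penalties.

```python
def needleman_wunsch_score(a, b, match=2, mismatch=-1, gap=-2):
    """Global alignment score (Needleman-Wunsch) with a linear gap penalty."""
    prev = [j * gap for j in range(len(b) + 1)]  # row 0: all gaps
    for i in range(1, len(a) + 1):
        curr = [i * gap] + [0] * len(b)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr[j] = max(diag, prev[j] + gap, curr[j - 1] + gap)
        prev = curr
    return prev[-1]


def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score (Smith-Waterman) with a linear gap penalty."""
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # The floor at 0 lets a local alignment restart anywhere.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


print(smith_waterman_score("HEAGAWGHEE", "AWGH"))  # 8: "AWGH" matches exactly
print(needleman_wunsch_score("AWGH", "AWH"))       # 4: three matches, one gap
```

The only difference between the two functions is the boundary condition and the floor at zero, which is what turns a global comparison into a local one.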

4
Sequence Similarity Search (II)
  • Similarity Search Parameters
  • Scoring Matrices: Based on Conserved Amino Acid
    Substitution
  • Dayhoff Mutation Matrix, e.g., PAM250 (20%
    Identity)
  • Henikoff Matrix from Ungapped Alignments, e.g.,
    BLOSUM 62
  • Gap Penalty
  • Search Time Comparisons
  • Smith-Waterman: 10 Min
  • FASTA: 2 Min
  • BLAST: 20 Sec

5
Feature Representation
  • Features of Amino Acids: Physicochemical
    Properties, Context (Local & Global) Features,
    Evolutionary Features
  • Alternative Amino Acids: Classification of Amino
    Acids To Capture Different Features of Amino Acid
    Residues

6
Substitution Matrix
  • Likelihood of One Amino Acid Mutated into Another
    Over Evolutionary Time
  • Negative Score: Unlikely to Happen (e.g.,
    Gly/Trp, -7)
  • Positive Score: Conservative Substitution (e.g.,
    Lys/Arg, +3)
  • High Score for Identical Matches: Rare Amino
    Acids (e.g., Trp, Cys)
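
To make the scoring idea concrete, the sketch below sums substitution scores over an ungapped pairing of two equal-length segments. The Gly/Trp and Lys/Arg values are the ones quoted on this slide; the identity scores for Trp, Cys and Ala are illustrative placeholders in the style of PAM250 and should be replaced by a complete published matrix for real work. The function names are hypothetical.

```python
# Tiny excerpt of a substitution matrix, for illustration only.
# ("G", "W") and ("K", "R") come from the slide; the identity values
# below are placeholder assumptions, not an authoritative matrix.
SCORES = {
    ("G", "W"): -7,
    ("K", "R"): 3,
    ("W", "W"): 17,
    ("C", "C"): 12,
    ("A", "A"): 2,
}


def pair_score(x, y, matrix=SCORES):
    """Look up a symmetric substitution score; KeyError if the pair is absent."""
    if (x, y) in matrix:
        return matrix[(x, y)]
    return matrix[(y, x)]


def ungapped_score(s, t, matrix=SCORES):
    """Sum of substitution scores over two aligned, equal-length segments."""
    if len(s) != len(t):
        raise ValueError("segments must have the same length")
    return sum(pair_score(x, y, matrix) for x, y in zip(s, t))


print(ungapped_score("WKG", "WRW"))  # 17 + 3 + (-7) = 13
```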

7
Secondary Structure Features
  • α Helix: Patterns of Hydrophobic Residue
    Conservation Showing i, i+3, i+4, i+7 Pattern Are
    Highly Indicative of an α Helix (Amphipathic)
  • β Strands That Are Half Buried in the Protein
    Core Will Tend to Have Hydrophobic Residues at
    Positions i, i+2, i+4, i+6
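
The periodicity rules above can be checked mechanically. The sketch below scans a single sequence for positions where all offsets of a pattern are hydrophobic. Two assumptions to note: the hydrophobic residue set is one common choice among several, and the slide refers to conservation patterns across an alignment, whereas this simplified version looks at one sequence only.

```python
HYDROPHOBIC = set("AVLIMFWC")   # assumed residue set; definitions vary
HELIX_OFFSETS = (0, 3, 4, 7)    # i, i+3, i+4, i+7
STRAND_OFFSETS = (0, 2, 4, 6)   # i, i+2, i+4, i+6


def pattern_starts(seq, offsets):
    """Return 0-based start positions where every offset is hydrophobic."""
    span = max(offsets)
    return [
        i
        for i in range(len(seq) - span)
        if all(seq[i + k] in HYDROPHOBIC for k in offsets)
    ]


print(pattern_starts("LKKLLKKL", HELIX_OFFSETS))   # [0]
print(pattern_starts("VKVKVKV", STRAND_OFFSETS))   # [0]
```

A hit is only a hint: a short hydrophobic periodicity can occur by chance, so such patterns are normally combined with other evidence.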

8
BLAST
  • BLAST (Basic Local Alignment Search Tool)
  • Extremely fast
  • Robust
  • Most frequently used
  • It finds very short segment pairs (seeds)
    between the query and the database sequence
  • These seeds are then extended in both directions
    until the maximum possible score for extensions
    of this particular seed is reached
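
The seed-and-extend idea can be sketched as follows. This is a simplified illustration, not the BLAST implementation: seeds here are exact k-letter matches (BLAST also accepts similar words that score above a threshold), scoring is plain match/mismatch instead of a substitution matrix, and the `x_drop` cutoff value is an assumption.

```python
def find_seeds(query, subject, k=3):
    """Return (query_pos, subject_pos) pairs where a k-letter word matches exactly."""
    index = {}
    for i in range(len(query) - k + 1):
        index.setdefault(query[i:i + k], []).append(i)
    return [
        (qi, sj)
        for sj in range(len(subject) - k + 1)
        for qi in index.get(subject[sj:sj + k], [])
    ]


def extend_seed(query, subject, qi, sj, k=3, match=1, mismatch=-1, x_drop=2):
    """Extend a seed without gaps in both directions.

    Returns (score, query_start, query_end) for the best-scoring extension;
    extension in a direction stops once the running score falls more than
    x_drop below the best score seen in that direction.
    """
    def walk(q, s, step):
        score = best = best_len = n = 0
        while 0 <= q < len(query) and 0 <= s < len(subject):
            score += match if query[q] == subject[s] else mismatch
            n += 1
            if score > best:
                best, best_len = score, n
            elif best - score > x_drop:
                break
            q += step
            s += step
        return best, best_len

    right, right_len = walk(qi + k, sj + k, +1)
    left, left_len = walk(qi - 1, sj - 1, -1)
    return k * match + left + right, qi - left_len, qi + k + right_len


seeds = find_seeds("AAWGHEE", "PPWGHEK")
print(seeds)                                        # [(2, 2), (3, 3)]
print(extend_seed("AAWGHEE", "PPWGHEK", 2, 2))      # (4, 2, 6), i.e. "WGHE"
```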

9
BLAST Search
  • From BLAST Search Interface
  • Table-Format Result with BLAST Output and SSEARCH
    (Smith-Waterman) Pair-Wise Alignment

Link to NCBI taxonomy
Click to see alignment
Link to PIRSF report
Links to iProClass and UniProtKB reports
Click to see SSearch alignment
10
BLAST Result: Pairwise Alignment
BLAST Alignment
11
Classification
  • What is classification?
  • Why do we need protein classification?
  • Different levels of classification
  • Basis for functional protein classification
  • How to classify a protein of unknown function?

12
Classification Databases
  • Protein motif
  • Protein domain
  • 3-D structure
  • Whole-protein

13
Family Classification Methods
  • Based on Other Classification Information
  • Multiple Sequence Alignment (ClustalW)
  • ProSite Pattern Search
  • Profile Search
  • Hidden Markov Models (HMMs)
  • Domain (Pfam), Whole protein (PIRSF)
  • Neural Networks

14
How do you build a tree?
  • Pick sequences to align
  • Align them
  • Verify the alignment
  • Keep the parts that are aligned correctly
  • Build and evaluate a phylogenetic tree
  • Integrated Analysis
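
One simple way to approximate the "keep the parts that are aligned correctly" step is to drop alignment columns that are mostly gaps. The sketch below does only that; the 0.5 threshold is an assumption, and gap content is a rough proxy for alignment quality, not a substitute for inspecting the alignment.

```python
def trim_gappy_columns(alignment, max_gap_fraction=0.5):
    """Remove columns whose gap fraction exceeds max_gap_fraction.

    alignment is a list of equal-length strings using "-" for gaps.
    """
    if not alignment:
        return []
    if len({len(row) for row in alignment}) != 1:
        raise ValueError("all aligned sequences must have the same length")
    n = len(alignment)
    keep = [
        c
        for c in range(len(alignment[0]))
        if sum(row[c] == "-" for row in alignment) / n <= max_gap_fraction
    ]
    return ["".join(row[c] for c in keep) for row in alignment]


print(trim_gappy_columns(["MK-LV", "MKALV", "M--LV"]))
# ['MKLV', 'MKLV', 'M-LV']
```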

15
Multiple Sequence Alignment: CLUSTALW
Pairwise alignment: calculate distance matrix
(mean number of differences per residue)
Thompson et al., NAR 22, 4675 (1994).
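
The distance described above, the mean number of differences per residue, can be sketched as a small function. One simplification to note: ClustalW derives each distance from its own pairwise alignment, whereas this example assumes the sequences are already aligned to equal length and skips positions where either sequence has a gap.

```python
def p_distance(a, b):
    """Fraction of differing residues over positions aligned without gaps."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have the same length")
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no ungapped positions to compare")
    return sum(x != y for x, y in pairs) / len(pairs)


def distance_matrix(aligned):
    """Symmetric matrix of pairwise p-distances."""
    n = len(aligned)
    return [[p_distance(aligned[i], aligned[j]) for j in range(n)] for i in range(n)]


for row in distance_matrix(["MKLV", "MKIV", "MRIV"]):
    print(row)
# [0.0, 0.25, 0.5]
# [0.25, 0.0, 0.25]
# [0.5, 0.25, 0.0]
```

A matrix like this is the usual input for building a guide tree, which then sets the order in which sequences are added to the multiple alignment.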
16
PIR Multiple Alignment and Tree
  • From Text/Sequence Search Result or CLUSTAL W
    Alignment Interface

18
PIR Pattern Search
  • Signature Patterns for Functional Motifs
  • From Text/Sequence Search Result or Pattern
    Search Interface

A
P-[IV]-[WY]-x(3)-H-[MR]-V-x(3,4)-Q-x(1,2)-D-x(4,5)-G-A-N
B
Test sequence against PROSITE database
O05689
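
Signature patterns written in PROSITE-style syntax can be matched by translating them into regular expressions. The sketch below handles the common elements only (single residues, `[..]` alternatives, `{..}` exclusions, `x`, and repeat counts); anchors such as `<` and `>` are not supported. The example pattern and test sequence are toy values made up for illustration, not entries from the PROSITE database.

```python
import re

ELEMENT = re.compile(r"(\[[A-Z]+\]|\{[A-Z]+\}|[A-Zx])(?:\((\d+)(?:,(\d+))?\))?")


def prosite_to_regex(pattern):
    """Translate a simple PROSITE-style pattern into a Python regex string."""
    parts = []
    for element in pattern.rstrip(".").split("-"):
        m = ELEMENT.fullmatch(element)
        if m is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        residue, low, high = m.groups()
        if residue == "x":
            body = "."                           # any residue
        elif residue.startswith("{"):
            body = "[^" + residue[1:-1] + "]"    # any residue except these
        else:
            body = residue                       # literal or [..] class
        if low:
            body += "{" + low + ("," + high if high else "") + "}"
        parts.append(body)
    return "".join(parts)


regex = prosite_to_regex("C-x(2,4)-C-x(3)-[LIVMFYWC]")   # toy pattern
print(regex)                                   # C.{2,4}C.{3}[LIVMFYWC]
print(re.search(regex, "AACPPCAAAL").span())   # (2, 10)
```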
19
Pattern Search Result (I)
  1. One Query Pattern Against UniProtKB or UniRef100
    DBs

Display the query pattern
Indicate pattern sequence region(s)
Links to iProClass and UniProtKB reports
Link to NCBI taxonomy
Link to PIRSF report
20
Pattern Search Result (II)
  1. One Query Sequence Against PROSITE Pattern
    Database

21
Profile Method
  • Profile: A Table of Scores to Express Family
    Consensus Derived from Multiple Sequence
    Alignments
  • Num of Rows = Num of Aligned Positions
  • Each row contains a score for the alignment with
    each possible residue.
  • Profile Searching
  • Summation of Scores for Each Amino Acid Residue
    along Query Sequence
  • Higher Match Values at Conserved Positions
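
The profile idea above can be sketched as a position-specific table of log-odds scores built from an alignment, with a query scored by summing one table entry per position. The uniform background frequency and the pseudocount of 1 are simplifying assumptions, and gaps are ignored; production profiles (for example PROSITE profiles) are built with more elaborate weighting.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_profile(alignment, pseudocount=1.0):
    """One row per aligned position; each row maps residue -> log-odds score."""
    background = 1 / len(AMINO_ACIDS)   # assumed uniform background
    profile = []
    for column in zip(*alignment):
        counts = Counter(r for r in column if r in AMINO_ACIDS)
        total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
        profile.append({
            aa: math.log2((counts[aa] + pseudocount) / total / background)
            for aa in AMINO_ACIDS
        })
    return profile


def score_window(profile, segment):
    """Sum the per-position scores; unknown residues contribute 0."""
    return sum(row.get(aa, 0.0) for row, aa in zip(profile, segment))


def best_match(profile, query):
    """Return (score, start) of the best-scoring window along the query."""
    width = len(profile)
    return max(
        (score_window(profile, query[i:i + width]), i)
        for i in range(len(query) - width + 1)
    )


profile = build_profile(["CAH", "CAH", "CGH"])   # toy alignment
score, start = best_match(profile, "MMCAHMM")
print(start)   # 2: the window "CAH"
```

Fully conserved columns (C and H in the toy alignment) give the highest scores, which is the "higher match values at conserved positions" behaviour.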

22
Prosite PS50157 profile for Zinc finger C2H2
23
PIRSF scan
1
Shows PIRSF that the query belongs to
  • Search one query protein against all the
    full-length and domain HMMs for the fully
    curated PIRSFs using HMMER
  • The matched regions and statistics are
    displayed.

Statistical data for all domains
Statistical data per domain
Alignment with consensus sequence
24
Creation and Curation of PIRSFs
25
Integrated Bioinformatics System for Function and
Pathway Discovery
  • Data Integration
  • Associative Analysis

26
Analytical Pipeline
27
Integrated Bioinformatics System
  • Global Bioinformatics Analysis of 1000s of Genes
    and Proteins
  • Pathway Discovery, Target Identification

28
Lab Section
29
Rat eye lens phosphoproteomics in normal and
cataract (Kamei et al., Biol. Pharm. Bull., 2005)
Gel panels: Normal, Cataract (axes: (-) pI (+), Mw)
More phosphorylated spots in the cataract sample.
Digestion and MS from Spot 16 gave these peptides:
MDVTIQHPWFKR ALGPFYPSR CSLSADGMLTFSG YR LPSNVDQSALS
We want to identify the protein(s) that contain
these peptides
Use Peptide Search
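
Conceptually, a peptide search is an exact substring lookup of each peptide in every database sequence. The sketch below shows that idea on a toy in-memory database; the protein identifiers and sequences are hypothetical (the first one is simply two of the slide's peptides joined together), and it makes no claim about how the PIR Peptide Search service is implemented.

```python
def peptide_search(peptides, proteins):
    """Map protein id -> list of (peptide, 1-based start) for exact matches."""
    hits = {}
    for peptide in peptides:
        for protein_id, sequence in proteins.items():
            position = sequence.find(peptide)   # first occurrence only
            if position != -1:
                hits.setdefault(protein_id, []).append((peptide, position + 1))
    return hits


# Toy database with made-up identifiers and sequences.
toy_db = {
    "toy_protein_1": "MDVTIQHPWFKRALGPFYPSR",
    "toy_protein_2": "MKTAYIAKQR",
}

print(peptide_search(["MDVTIQHPWFKR", "ALGPFYPSR"], toy_db))
# {'toy_protein_1': [('MDVTIQHPWFKR', 1), ('ALGPFYPSR', 13)]}
```

A protein that contains several of the observed peptides is a stronger candidate than one matching a single short peptide.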
30
Peptide Search
31
Peptide Search Results
Species restricted search
Sorting arrows
Search in UniProtKB, 23 proteins
Links to iProClass and UniProtKB reports
Link to NCBI taxonomy
Link to PIRSF report
Matching peptide highlighted in the sequence
32
Batch Retrieval Results (I)
  • Retrieve multiple proteins from iProClass
    using a specific identifier or a combination of
    them
  • Provides a means to easily retrieve and analyze
    proteins when the identifiers come from different
    databases

33
Blast Similarity Search
What proteins are related to rat CRYAA?
  • Perform sequence similarity search

>P24623
http://pir.georgetown.edu/pirwww/search/blast.shtml
34
Blast Search Results
BLAST (partial) result for CRYAA_RAT in UniProtKB
database
35
Pairwise Alignment
36
PIR Text Search
(http://pir.georgetown.edu/search/textsearch.shtml)
UniProtKB Database and unique UniParc sequences
Let's search for human crystallins
PIR protein family classification database
37
  • Let's look for crystallins which have a 3D
    structure

Refine your search or start over
Display PDB ID
38
Domain Display allows simultaneous comparison of
Pfam domains present in multiple proteins
Share same domain architecture
Let's perform a multiple alignment on the
sequences containing PF00030
39
Multiple Alignment
40
Interactive Phylogenetic Tree and Alignment
Beta B1 and gamma crystallins share the same
domains and SCOP fold and show significant sequence
similarity, suggesting that they are related
41
Pattern Search (I)
Select P07320 and perform a pattern search
Search for proteins containing this pattern
(PS00225) in rat
42
Pattern Search Result
Beta and gamma Crystallins have multiple copies
of this pattern
43
  • PIRSF provides a single platform where all the
    previous analyses have been done by curators

Represents extent of manual curation
Validation tag
Link to PIRSF report
44
Taxonomic Distribution
Alpha-crystallin is exclusively found in
metazoans
45
PIRSF scan
46
PIRSF report (I): a single platform to study
proteins
47
PIRSF report (II)
Cross-links to other databases
http://www.geneontology.org/
48
alpha-Crystallin and Related Proteins
Alpha crystallin beta chain
HSPs
Alpha crystallin alpha chain