Title: Sequence Based Analysis Tutorial
1Sequence Based Analysis Tutorial
- NIH Proteomics Workshop
- Lai-Su L. Yeh, Ph.D.
- Protein Information Resource at
- Georgetown University Medical Center
2Retrieval, Sequence Search & Classification Methods
- Retrieve protein info by text / UID
- Sequence Similarity Search
- BLAST, FASTA, Dynamic Programming
- Family Classification
- Patterns, Profiles, Hidden Markov Models, Sequence Alignments, Neural Networks
- Integrated Search and Classification System
3Sequence Similarity Search (I)
- Based on Pair-Wise Comparisons
- Dynamic Programming Algorithms
- Global Similarity: Needleman-Wunsch
- Local Similarity: Smith-Waterman
- Heuristic Algorithms
- FASTA: Based on K-Tuples (2-Amino Acid)
- BLAST: Triples of Conserved Amino Acids
- Gapped-BLAST: Allow Gaps in Segment Pairs
- PHI-BLAST: Pattern-Hit Initiated Search
- PSI-BLAST: Position-Specific Iterated Search
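The difference between the two dynamic programming approaches listed above can be sketched in a few lines. This is a minimal illustration, not the implementation used by any PIR tool: the function names are made up for this example, and it assumes a flat match/mismatch score and a linear gap penalty instead of a real substitution matrix.

```python
def needleman_wunsch_score(a, b, match=2, mismatch=-1, gap=-2):
    """Global alignment score: both sequences are aligned end to end."""
    prev = [j * gap for j in range(len(b) + 1)]  # first row: leading gaps in a
    for i in range(1, len(a) + 1):
        curr = [i * gap] + [0] * len(b)          # first column: leading gaps in b
        for j in range(1, len(b) + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            curr[j] = max(prev[j - 1] + s, prev[j] + gap, curr[j - 1] + gap)
        prev = curr
    return prev[-1]


def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Local alignment score: best-scoring pair of subsequences."""
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            # The extra 0 lets an alignment restart anywhere (local behaviour).
            curr[j] = max(0, prev[j - 1] + s, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best
```

With these assumed scores, a short peptide embedded in a longer sequence keeps its full local score, while the global score is pulled down by the gaps needed to cover the flanking residues.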
4Sequence Similarity Search (II)
- Similarity Search Parameters
- Scoring Matrices: Based on Conserved Amino Acid Substitution
- Dayhoff Mutation Matrix, e.g., PAM250 (20% Identity)
- Henikoff Matrix from Ungapped Alignments, e.g., BLOSUM62
- Gap Penalty
- Search Time Comparisons
- Smith-Waterman: 10 Min
- FASTA: 2 Min
- BLAST: 20 Sec
5Feature Representation
- Features of Amino Acids: Physicochemical Properties, Context (Local & Global) Features, Evolutionary Features
- Alternative Amino Acids: Classification of Amino Acids To Capture Different Features of Amino Acid Residues
6Substitution Matrix
- Likelihood of One Amino Acid Mutated into Another Over Evolutionary Time
- Negative Score: Unlikely to Happen (e.g., Gly/Trp, -7)
- Positive Score: Conservative Substitution (e.g., Lys/Arg, 3)
- High Score for Identical Matches: Rare Amino Acids (e.g., Trp, Cys)
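The sign convention above follows from how substitution scores are commonly derived: as a log-odds ratio of how often a pair is observed aligned versus how often it would pair by chance. The sketch below shows that idea only. The frequencies in the usage note are invented for illustration and do not come from any published matrix, and `log_odds_score` is a hypothetical helper name.

```python
import math


def log_odds_score(pair_freq, freq_a, freq_b, scale=2.0):
    """Rounded, scaled log2 odds of an aligned pair.

    pair_freq: observed frequency of the ordered pair (a, b) in alignments
    freq_a, freq_b: background frequencies of the two residues
    scale: 2.0 gives half-bit units (an assumption made for this sketch)
    """
    expected = freq_a * freq_b  # chance of the pair if residues were independent
    return round(scale * math.log2(pair_freq / expected))
```

For example, `log_odds_score(0.02, 0.05, 0.05)` is positive because the pair is seen eight times more often than chance predicts, while `log_odds_score(0.0005, 0.05, 0.02)` is negative because the pair is seen half as often as chance predicts.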
7BLAST
- BLAST (Basic Local Alignment Search Tool)
- Extremely fast
- Robust
- Most frequently used
- It finds very short segment pairs (seeds) between the query and the database sequence
- These seeds are then extended in both directions until the maximum possible score for extensions of this particular seed is reached
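The seed-and-extend idea described above can be sketched as follows. This is a simplified illustration, not BLAST itself: it assumes exact word matches as seeds (real BLAST also accepts similar words scored with a substitution matrix), ungapped extension, a flat match/mismatch score, and a hypothetical `drop` cut-off that stops an extension once the running score has fallen that far below its best value.

```python
def find_seeds(query, subject, w=3):
    """Return (query_pos, subject_pos) for every shared exact word of length w."""
    index = {}
    for i in range(len(query) - w + 1):
        index.setdefault(query[i:i + w], []).append(i)
    seeds = []
    for j in range(len(subject) - w + 1):
        for i in index.get(subject[j:j + w], []):
            seeds.append((i, j))
    return seeds


def _extend(query, subject, qi, sj, step, match, mismatch, drop):
    """Walk in one direction; return (best score gain, residues kept)."""
    gain = best = 0
    kept = length = 0
    while 0 <= qi < len(query) and 0 <= sj < len(subject):
        gain += match if query[qi] == subject[sj] else mismatch
        length += 1
        if gain > best:
            best, kept = gain, length
        elif best - gain >= drop:
            break  # score fell too far below the best seen so far
        qi += step
        sj += step
    return best, kept


def extend_seed(query, subject, i, j, w=3, match=1, mismatch=-1, drop=2):
    """Extend a seed both ways; return (score, query_start, query_end)."""
    right, r_len = _extend(query, subject, i + w, j + w, 1, match, mismatch, drop)
    left, l_len = _extend(query, subject, i - 1, j - 1, -1, match, mismatch, drop)
    return w * match + right + left, i - l_len, i + w + r_len
```

Calling `find_seeds` first and then `extend_seed` on each hit gives one candidate segment pair per seed; the query span is returned as a half-open range so it can be used directly as a slice.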
8BLAST Search
- From BLAST Search Interface
- Table-Format Result with BLAST Output and SSEARCH
(Smith-Waterman) Pair-Wise Alignment
Link to NCBI taxonomy
Link to PIRSF report
Click to see alignment
Links to iProClass and UniProtKB reports
Click to see SSearch alignment
9Blast Result Pairwise Alignment
BLAST Alignment
10Classification
- What is classification?
- Why do we need protein classification?
- Different levels of classification
- Basis for functional protein classification
- How to classify a protein of unknown function?
11Classification Databases
- Protein motif
- Protein domain
- 3-D structure
- Whole-protein
12Family Classification Methods
- Based on Other Classification Information
- Multiple Sequence Alignment (ClustalW)
- ProSite Pattern Search
- Profile Search
- Hidden Markov Models (HMMs)
- Domain (Pfam), Whole protein (PIRSF)
- Neural Networks
13How do you build a tree?
- Pick sequences to align
- Align them
- Verify the alignment
- Keep the parts that are aligned correctly
- Build and evaluate a phylogenetic tree
- Integrated Analysis
14Multiple Sequence Alignment
- ClustalW
- Progressive Pairwise Approach
- Based on Exhaustive Pairwise Alignments
- Neighbor Joining
- Joining Order Corresponding to a Tree
- Alignment Varies
- Dependent on Joining Order
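To see why the joining order matters, it helps to compute one from pairwise distances. The sketch below is a simplification under stated assumptions: it uses greedy average-linkage clustering as a simpler stand-in for neighbor joining, and it measures distance as the fraction of differing positions between sequences that are already the same length, whereas ClustalW derives its distances from pairwise alignments. The function names are illustrative only.

```python
from itertools import combinations


def p_distance(a, b):
    """Fraction of differing positions between two equal-length sequences."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def joining_order(seqs):
    """Return a nested tuple showing the order in which sequences are joined.

    seqs: dict mapping sequence name -> sequence (all the same length).
    """
    # Each cluster is (tree so far, list of member names).
    clusters = [(name, [name]) for name in seqs]

    def dist(c1, c2):
        # Average distance over all member pairs of the two clusters.
        pairs = [(m, n) for m in c1[1] for n in c2[1]]
        return sum(p_distance(seqs[m], seqs[n]) for m, n in pairs) / len(pairs)

    while len(clusters) > 1:
        # Join the closest pair of clusters first.
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda ij: dist(clusters[ij[0]], clusters[ij[1]]))
        merged = ((clusters[i][0], clusters[j][0]),
                  clusters[i][1] + clusters[j][1])
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return clusters[0][0]
```

The innermost tuple is the pair aligned first; each enclosing level adds the next sequence or group, which is the order a progressive aligner would follow.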
15Multiple Alignment and Tree
- From Text/Sequence Search Result or ClustalW
Alignment Interface
17Motif Patterns (Regular Expressions)
- Signature Patterns for Functional Motifs
ProClass Motif Alignments
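Because PROSITE-style signature patterns are essentially regular expressions, they can be translated mechanically. The sketch below covers the basic syntax only (`x` wildcard, `[...]` allowed residues, `{...}` excluded residues, `(n)` and `(n,m)` repeats, `<` and `>` terminal anchors); it is an illustration and not the code behind the PIR pattern search, and both function names are made up for this example.

```python
import re


def prosite_to_regex(pattern):
    """Translate a PROSITE-style pattern into a Python regular expression."""
    regex = pattern.rstrip(".").replace("-", "")
    # Excluded residues first, so the braces do not clash with repeat counts.
    regex = regex.replace("{", "[^").replace("}", "]")
    regex = regex.replace("(", "{").replace(")", "}")
    regex = regex.replace("x", ".")  # lowercase x matches any residue
    regex = regex.replace("<", "^").replace(">", "$")
    return regex


def find_motif(pattern, sequence):
    """Return (start, matched text) for every hit, including overlapping ones."""
    # A lookahead with a capture group reports overlapping matches.
    regex = re.compile("(?=(" + prosite_to_regex(pattern) + "))")
    return [(m.start(), m.group(1)) for m in regex.finditer(sequence)]
```

For instance, the N-glycosylation pattern `N-{P}-[ST]-{P}.` becomes `N[^P][ST][^P]`, and scanning the made-up sequence `MKNGSAANPSA` reports a single hit at position 2 (0-based).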
18PIR Pattern Search
- From Text/Sequence Search Result or Pattern Search Interface
- One Query Sequence Against PROSITE Pattern Database
- One Query Pattern (PROSITE or User-Defined) Against Sequence DB
19Pattern Search Result (I)
- One Query Sequence Against PROSITE Pattern
Database
20Pattern Search Result (II)
- One Query Pattern Against Sequence Database
21Profile Method
- Profile: A Table of Scores to Express Family Consensus Derived from Multiple Sequence Alignments
- Num of Rows = Num of Aligned Positions
- Each row contains a score for the alignment with each possible residue.
- Profile Searching
- Summation of Scores for Each Amino Acid Residue along Query Sequence
- Higher Match Values at Conserved Positions
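The profile table and the summation step can be sketched directly from that description. This is a minimal illustration under simplifying assumptions: a uniform background frequency for all 20 amino acids, a fixed pseudocount, log-odds scores, and no gap handling in the query. The function names and the toy alignment in the usage note are invented for this example.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_profile(alignment, pseudocount=1.0):
    """One row per aligned position; each row maps residue -> log-odds score."""
    background = 1.0 / len(AMINO_ACIDS)  # assumption: uniform background
    profile = []
    for column in zip(*alignment):
        residues = [r for r in column if r in AMINO_ACIDS]  # ignore gap symbols
        total = len(residues) + pseudocount * len(AMINO_ACIDS)
        row = {
            aa: math.log2((residues.count(aa) + pseudocount) / total / background)
            for aa in AMINO_ACIDS
        }
        profile.append(row)
    return profile


def score_segment(profile, segment):
    """Sum the row scores for each residue of the segment."""
    return sum(row[aa] for row, aa in zip(profile, segment))


def best_window(profile, sequence):
    """Return (score, start) of the best ungapped window in the sequence."""
    n = len(profile)
    return max((score_segment(profile, sequence[i:i + n]), i)
               for i in range(len(sequence) - n + 1))
```

Built from the toy alignment `["GKST", "GKTT", "GRST", "GKSS"]`, the fully conserved first column scores G higher than the mostly conserved second column scores K, which is the "higher match values at conserved positions" behaviour named above.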
22PIRSF scan
Shows PIRSF that the query belongs to
- Search One Query Protein Against all the Full-length and Domain HMM models for the fully curated PIRSFs by HMMER
- The matched regions and statistics will be displayed.
Statistical data for all domains
Statistical data per domain
Alignment with consensus sequence
23Secondary Structure Features
- α Helix: Patterns of Hydrophobic Residue Conservation Showing i, i+3, i+4, i+7 Pattern Are Highly Indicative of an α Helix (Amphipathic)
- β Strands That Are Half Buried in the Protein Core Will Tend to Have Hydrophobic Residues at Positions i, i+2, i+4, i+6
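These periodicities are easy to scan for. The sketch below is a rough illustration only: it checks a single sequence, whereas the slide refers to conservation patterns across aligned sequences, and the set of residues treated as hydrophobic is an assumption chosen for this example.

```python
HYDROPHOBIC = set("AVILMFWC")  # assumption: one possible hydrophobic set

HELIX_OFFSETS = (0, 3, 4, 7)   # i, i+3, i+4, i+7
STRAND_OFFSETS = (0, 2, 4, 6)  # i, i+2, i+4, i+6


def matches_pattern(seq, start, offsets):
    """True if every offset from start lands on a hydrophobic residue."""
    return all(
        start + o < len(seq) and seq[start + o] in HYDROPHOBIC
        for o in offsets
    )


def scan(seq, offsets):
    """Return every start position where the periodic pattern is satisfied."""
    return [i for i in range(len(seq)) if matches_pattern(seq, i, offsets)]
```

A hit from `scan(seq, HELIX_OFFSETS)` is only a hint of an amphipathic helix, and a hit from `scan(seq, STRAND_OFFSETS)` only a hint of a half-buried strand; neither is a structure prediction on its own.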
243D Structure
Proteins share the same fold suggesting homology
Beta B1 Crystallin
Gamma Crystallin C
25Creation and Curation of PIRSFs
26Integrated Bioinformatics System for Function and
Pathway Discovery
- Data Integration
- Associative Analysis
27Analytical Pipeline
28Integrated Bioinformatics System
- Global Bioinformatics Analysis of 1000s of Genes and Proteins
- Pathway Discovery, Target Identification
29Lab Section
30Text Search
31Text Search Result (I)
Extend your search or start over
Choose columns to be displayed
Expand view
Pre-computed BLAST Results
Links to iProClass and UniProtKB reports
Link to NCBI taxonomy
Link to PIRSF report
32Text Search Result (III)
Number of Related Seq. at 3 different E-value
cut-offs
33Text Search Result (II)
Extend your search or start over
Choose columns to be displayed
Curated domain architecture with links to
Pfam database
Link to PIRSF report
Extent of family curation
34Peptide Search
35Peptide Search Results
36Batch Retrieval Results (I)
Retrieve more sequences
37Batch Retrieval Results (II)
38Blast Similarity Search
39Blast Search Results
40Blast / Related Sequences Results
41Blast Result Pairwise Alignment
BLAST Alignment
42Pairwise Alignment
43Multiple Alignment Interactive Phylogenetic Tree
and Alignment
44Phylogenetic Tree and Alignment View
45Pattern Search (I)
46Pattern Search (II)
47PIRSF scan
48PIRSF Report
49PIRSF Family Hierarchy
50Taxonomic Distribution Phylogenetic Pattern
51Rabbit Alpha Crystallin A Chain: An iProClass View of the entry
Pre-computed BLAST results
See protein synonyms
See IDs from different databases
52alpha-Crystallin and Related Proteins