Title: Protein Analysis and Modeling
- BFB Workshop
- Selected Methods in Bioinformatics
- April 2009
Protein Comparison
Protein Comparison (1D)
- Sequence-based alignment
- goal: align conserved residues together
- each pair of aligned residues obtains a score
- residues that frequently replace each other at the same position in related proteins are considered more similar and obtain better scores than dissimilar residues
- NCBI BLAST (Basic Local Alignment Search Tool)
- database search for similar sequences
- different specialized databases
- nucleotide or peptide sequences
- pairwise alignment
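The scoring idea above can be illustrated with a minimal local (Smith-Waterman) alignment, the kind of alignment BLAST finds heuristically rather than by full dynamic programming. This is a sketch only: the substitution scores are a hypothetical toy table, not a real matrix such as BLOSUM62, and the gap penalty of -3 is an arbitrary assumption.

```python
# Toy substitution table (hypothetical, NOT BLOSUM62): identical residues
# score +4, residues from the same physico-chemical group score +1,
# every other pair scores -1.
GROUPS = ["ILVM", "FWY", "KRH", "DE", "NQ", "ST"]


def sub_score(a: str, b: str) -> int:
    """Score one pair of aligned residues."""
    if a == b:
        return 4
    if any(a in group and b in group for group in GROUPS):
        return 1
    return -1


def smith_waterman(s: str, t: str, gap: int = -3) -> tuple[int, str, str]:
    """Best-scoring local alignment of s and t with a linear gap penalty."""
    rows, cols = len(s) + 1, len(t) + 1
    H = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            H[i][j] = max(
                0,  # local alignment: a negative-scoring prefix is dropped
                H[i - 1][j - 1] + sub_score(s[i - 1], t[j - 1]),  # align pair
                H[i - 1][j] + gap,  # gap in t
                H[i][j - 1] + gap,  # gap in s
            )
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)

    # Trace back from the best cell until a zero cell ends the local alignment.
    i, j = best_pos
    top, bottom = [], []
    while i > 0 and j > 0 and H[i][j] > 0:
        if H[i][j] == H[i - 1][j - 1] + sub_score(s[i - 1], t[j - 1]):
            top.append(s[i - 1])
            bottom.append(t[j - 1])
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:
            top.append(s[i - 1])
            bottom.append("-")
            i -= 1
        else:
            top.append("-")
            bottom.append(t[j - 1])
            j -= 1
    return best, "".join(reversed(top)), "".join(reversed(bottom))
```

For example, `smith_waterman("ACIE", "ACLE")` returns `(13, "ACIE", "ACLE")`, while aligning `"ACIE"` with `"ACKE"` scores only 11, because I/L is a conservative substitution in the toy table and I/K is not.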
Protein Comparison (3D)
- Structure-based alignment
- structure is more conserved than sequence → can find distantly related proteins
- identification of secondary structure elements
- residues in corresponding structure elements are aligned together
- residues that are aligned together have similar spatial positions
- NCBI VAST (Vector Alignment Search Tool)
- database search for similar 3D structures
- secondary structure elements are represented as vectors
- alignment of vectors in compared structures
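To make the vector idea concrete, here is a simplified sketch: each secondary structure element (SSE) is reduced to a direction vector and a midpoint computed from hypothetical start/end coordinates, and pairs of SSEs from two structures are compared by their relative angle and distance. This only illustrates the principle; VAST's actual representation and statistics are more elaborate, and the tolerances used here are arbitrary assumptions.

```python
import math

Point = tuple[float, float, float]


def sse_vector(start: Point, end: Point) -> tuple[Point, Point]:
    """Reduce a secondary structure element (SSE) to a direction vector
    (start -> end) and its midpoint."""
    direction = tuple(e - s for s, e in zip(start, end))
    midpoint = tuple((s + e) / 2 for s, e in zip(start, end))
    return direction, midpoint


def angle_deg(u: Point, v: Point) -> float:
    """Angle between two vectors in degrees (0 to 180)."""
    cos = sum(a * b for a, b in zip(u, v)) / (math.hypot(*u) * math.hypot(*v))
    return math.degrees(math.acos(max(-1.0, min(1.0, cos))))  # clamp rounding noise


def pair_geometry(sse_a, sse_b) -> tuple[float, float]:
    """Relative orientation (angle) and midpoint distance of two SSE vectors."""
    (dir_a, mid_a), (dir_b, mid_b) = sse_a, sse_b
    return angle_deg(dir_a, dir_b), math.dist(mid_a, mid_b)


def pairs_match(pair_1, pair_2, max_angle_diff: float = 30.0,
                max_dist_diff: float = 3.0) -> bool:
    """Treat two SSE pairs (one from each structure) as equivalent when their
    angles and distances agree within hypothetical tolerances."""
    (angle_1, dist_1), (angle_2, dist_2) = pair_1, pair_2
    return abs(angle_1 - angle_2) <= max_angle_diff and abs(dist_1 - dist_2) <= max_dist_diff
```

Because only relative angles and distances are compared, a translated copy of the same helix/strand pair matches, which is the property that lets vector-based comparison find similar arrangements in differently positioned structures.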
Search for Related Structures in NCBI
- Protein entry → related structures link
- Protein structure entry → structure summary page
Click on the sequence bar to retrieve related structures for the entire chain or for individual 3D domains
Related Structures in NCBI
Click on the sequence bar to view the structure-based sequence alignment
View 3D alignment
Colored residues correspond to aligned secondary structure elements; red: highly conserved residues
Conserved Domains
- are distinct functional units
- often coincide with 3D protein domains (but are not the same!)
- can help to elucidate the function of a protein
- contain highly conserved sequence patterns
- can be identified through multiple sequence alignment of related proteins
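How "highly conserved" columns show up in a multiple sequence alignment can be sketched with a simple per-column score: the fraction of sequences carrying the column's most frequent residue. The scoring rule, the 0.9 threshold and the toy alignment are illustrative assumptions; entropy-based scores are a common alternative.

```python
from collections import Counter


def column_conservation(msa: list[str]) -> list[float]:
    """For each alignment column, the fraction of sequences that carry the
    column's most frequent residue (gaps never count as conserved)."""
    if len({len(seq) for seq in msa}) != 1:
        raise ValueError("all aligned sequences must have the same length")
    n = len(msa)
    scores = []
    for column in zip(*msa):
        counts = Counter(residue for residue in column if residue != "-")
        scores.append(max(counts.values()) / n if counts else 0.0)
    return scores


def conserved_columns(msa: list[str], threshold: float = 0.9) -> list[int]:
    """0-based indices of columns whose conservation reaches the threshold."""
    return [i for i, score in enumerate(column_conservation(msa)) if score >= threshold]


# Hypothetical toy alignment of four related fragments.
msa = ["GKTLIT", "GKSLIT", "GRTLVT", "GKTLIS"]
print(conserved_columns(msa))  # [0, 3]
```

Here the invariant G and L columns stand out, while columns with conservative or occasional substitutions (K/R, I/V, T/S) score 0.75.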
Conserved Domains in NCBI
- The NCBI Conserved Domain Database (CDD) contains conserved domains
- Domains are derived from multiple sequence alignments of related proteins in different species
- Structure information is used (if available)
Conserved Domains in NCBI
- Related domains are hierarchically organized into families with common conserved residues and general function
- Child nodes represent more specific domain models and contain additional conserved residues compared to parent nodes
Sub-family hierarchy
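The parent/child relationship described above is essentially a tree in which each node inherits its parent's conserved residues and adds its own. A minimal data-structure sketch; the class, the node names and the residue positions are hypothetical and do not reflect CDD's internal format.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class DomainModel:
    """A node in a (hypothetical) domain family hierarchy."""
    name: str
    conserved: frozenset  # alignment positions conserved at this level
    parent: Optional["DomainModel"] = field(default=None, repr=False, compare=False)
    children: list = field(default_factory=list, repr=False, compare=False)

    def add_child(self, name: str, extra_positions: set) -> "DomainModel":
        """Create a more specific sub-family model: it inherits every conserved
        position of its parent and adds its own."""
        child = DomainModel(name, self.conserved | frozenset(extra_positions), parent=self)
        self.children.append(child)
        return child

    def lineage(self) -> list:
        """Names from the root of the hierarchy down to this node."""
        names, node = [], self
        while node is not None:
            names.append(node.name)
            node = node.parent
        return names[::-1]


root = DomainModel("superfamily", frozenset({3, 7}))
family = root.add_child("family", {12})
subfamily = family.add_child("subfamily", {15, 20})
print(sorted(subfamily.conserved))  # [3, 7, 12, 15, 20]
```

The `parent` and `children` links are excluded from `repr` and equality so that the back-references in the tree do not cause infinite recursion.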
Conserved Domains in NCBI
- Sequences in domain families are clustered based on their similarity
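The exact clustering procedure used for CDD families is not described here. As a generic illustration of clustering by similarity, the sketch below applies single-linkage clustering (via union-find) to pre-aligned sequences using percent identity. The identity definition, the 60% cutoff and the sequence names are assumptions for the example.

```python
from itertools import combinations


def percent_identity(a: str, b: str) -> float:
    """Identity over alignment columns where neither sequence has a gap."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    return 100.0 * sum(x == y for x, y in pairs) / len(pairs)


def cluster(seqs: dict[str, str], min_identity: float = 60.0) -> list[set[str]]:
    """Single-linkage clustering: any two sequences with at least
    `min_identity` percent identity end up in the same cluster."""
    parent = {name: name for name in seqs}

    def find(x: str) -> str:
        # Follow parent links to the cluster representative (with path halving).
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(seqs, 2):
        if percent_identity(seqs[a], seqs[b]) >= min_identity:
            parent[find(a)] = find(b)  # merge the two clusters

    groups: dict[str, set[str]] = {}
    for name in seqs:
        groups.setdefault(find(name), set()).add(name)
    return list(groups.values())


aligned = {"s1": "GKTLIT", "s2": "GKTLIS", "s3": "WWPQNE", "s4": "WWPQNQ"}
print(sorted(sorted(g) for g in cluster(aligned)))  # [['s1', 's2'], ['s3', 's4']]
```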
Detection of Conserved Domains
- Sequence comparison of the query protein against multiple alignments in CDD
- Search techniques
- Enter protein sequence or accession code in CD-Search
- Structure summary page
- Domains link for many Entrez search results
- BLAST results page
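Comparing a query "against multiple alignments" works by turning each alignment into a position-specific scoring matrix (PSSM); CD-Search does this with RPS-BLAST. The sketch below shows the underlying idea on a toy scale: it builds a log-odds profile from a small hypothetical alignment and slides it, without gaps, along a query. The uniform background frequencies, the pseudocount and the gapless scan are simplifying assumptions, not how RPS-BLAST computes its scores.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_pssm(msa: list[str], pseudocount: float = 1.0) -> list[dict[str, float]]:
    """Log-odds score (bits) for every residue type at every alignment column.
    Assumes a uniform background frequency of 1/20 and an additive pseudocount."""
    background = 1.0 / len(AMINO_ACIDS)
    pssm = []
    for column in zip(*msa):
        counts = Counter(r for r in column if r in AMINO_ACIDS)  # gaps ignored
        total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
        pssm.append({
            aa: math.log2((counts[aa] + pseudocount) / total / background)
            for aa in AMINO_ACIDS
        })
    return pssm


def best_window(pssm: list[dict[str, float]], query: str) -> tuple[int, float]:
    """Slide the profile along the query without gaps; return (start, score) of
    the best window, or (-1, -inf) if the query is shorter than the profile."""
    width = len(pssm)
    best = (-1, float("-inf"))
    for start in range(len(query) - width + 1):
        score = sum(pssm[k].get(query[start + k], 0.0) for k in range(width))
        if score > best[1]:
            best = (start, score)
    return best


# Hypothetical domain alignment and query sequence.
profile = build_pssm(["GKTLIT", "GKSLIT", "GRTLVT", "GKTLIS"])
print(best_window(profile, "MAAGKTLITQQ")[0])  # 3
```

Residues that are frequent in a column get positive scores and absent ones get negative scores, so the window matching the conserved pattern scores highest.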
Conserved Domain Search Results
Click to show all domain hits
Conserved features
Best-scoring domains
4 types of domain hits
Conserved Domain Search Results
- 4 types of domain hits
- Specific hits
- domain-specific e-value threshold
- high confidence that query protein belongs to the same family as the proteins used to identify the conserved domain
- Non-specific hits
- general e-value threshold
- Domain superfamily
- including specific and non-specific hits
- Multi-domains
- computationally detected
- likely to contain multiple single domains
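The threshold logic above can be written as a small decision function. All values here are hypothetical placeholders: the per-domain thresholds, the general threshold of 0.01, the domain names and the superfamily mapping are assumptions for illustration, not CD-Search's actual settings.

```python
from dataclasses import dataclass


@dataclass
class DomainHit:
    domain: str                # hypothetical domain model name
    evalue: float              # e-value of the query-vs-domain alignment
    specific_threshold: float  # hypothetical domain-specific e-value cutoff


def classify_hit(hit: DomainHit, general_threshold: float = 0.01) -> str:
    """Specific if the hit passes its domain-specific cutoff, non-specific if
    it only passes the general cutoff, otherwise not reported."""
    if hit.evalue <= hit.specific_threshold:
        return "specific"
    if hit.evalue <= general_threshold:
        return "non-specific"
    return "not reported"


def group_by_superfamily(hits: list[DomainHit], superfamily_of: dict[str, str],
                         general_threshold: float = 0.01) -> dict[str, list[str]]:
    """Collect reported hits (specific and non-specific) under their superfamily."""
    groups: dict[str, list[str]] = {}
    for hit in hits:
        if classify_hit(hit, general_threshold) != "not reported":
            groups.setdefault(superfamily_of[hit.domain], []).append(hit.domain)
    return groups


hits = [DomainHit("dom_A", 1e-30, 1e-20),
        DomainHit("dom_B", 1e-5, 1e-25),
        DomainHit("dom_C", 0.5, 1e-20)]
print([classify_hit(h) for h in hits])  # ['specific', 'non-specific', 'not reported']
```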
Conserved Domain Entry
Select individual domain hit
Search for proteins with similar domain architecture
Conserved Domain Entry
Text summary
Conserved features (binding sites, catalytic centers, pockets)
Conserved Domain Entry
Alignment of sequences used to derive the domain
Residues of conserved features
Query sequence embedded in the alignment
Homology Modeling
- Given: protein sequence
- Aim: model of the 3D structure of the target protein
- Approach: use homologous proteins as templates
[Figure: target sequence (...MPKYTLHYFPLMGRAELCRFVLAAHG...), template structure, and resulting model]
Homology Modeling
- Based on the observation that 3D structure is much more conserved than sequence
- Take the known structure of a protein with sequence similarity to the modeling target as a structural template
- Template and target proteins need not be evolutionarily related (comparative modeling)
- Generating topologically correct sequence alignments is the most important step in comparative modeling
4 Steps of Building a Model
- Template selection
- Target-template alignment
- Model construction
- Model refinement and assessment
Template Selection
- Homology searching: database search for homologous proteins
- Sequence similarity searching (BLAST, FASTA)
- Sequence identity is crucial for model reliability
- > 50% identity: high accuracy, RMSE ≈ 1 Å (Swiss-Model Automated Mode)
- 30–50% identity: medium accuracy, RMSE ≈ 1.5 Å (Swiss-Model Alignment Mode)
- 20–30% identity: twilight zone (Swiss-Model Project Mode)
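The identity bands above can be turned into a quick check of a target-template alignment. The identity definition (identical residues over gap-free columns) and the assignment of the exact boundary values are assumptions; the bands themselves are the ones listed on the slide.

```python
def percent_identity(target_aln: str, template_aln: str) -> float:
    """Identical residues as a percentage of the columns in which both aligned
    sequences have a residue (one common definition among several)."""
    pairs = [(a, b) for a, b in zip(target_aln, template_aln) if a != "-" and b != "-"]
    if not pairs:
        return 0.0
    return 100.0 * sum(a == b for a, b in pairs) / len(pairs)


def expected_accuracy(identity: float) -> str:
    """Map sequence identity (%) to the accuracy bands listed above.
    Which band an exact boundary value (50, 30, 20) falls into is an assumption."""
    if identity > 50:
        return "high"           # RMSE ~1 Å
    if identity >= 30:
        return "medium"         # RMSE ~1.5 Å
    if identity >= 20:
        return "twilight zone"
    return "below twilight zone"


identity = percent_identity("GKTLIT", "GKSLIT")  # hypothetical aligned pair
print(round(identity, 1), expected_accuracy(identity))  # 83.3 high
```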
Alignment
- Usually multiple template proteins
- Structure-based alignment
- superpose template structures
- align conserved motifs
- derive corresponding sequence alignment
- embed target sequence
[Figure: sequence alignment vs. structure alignment of the fragments gktlit nfsqehip and gktlisflyeqnfsqehip]
Most critical step!
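Once the target is embedded in the alignment, what the later steps need is a residue-by-residue correspondence between target and template. A minimal sketch, assuming the alignment is given as two gapped strings of equal length with "-" marking gaps; the function name is hypothetical.

```python
def residue_mapping(target_aln: str, template_aln: str) -> dict[int, int]:
    """Map 0-based target residue indices to template residue indices for
    every alignment column in which neither sequence has a gap."""
    if len(target_aln) != len(template_aln):
        raise ValueError("aligned sequences must have equal length")
    mapping = {}
    i = j = 0  # residue counters in target and template
    for a, b in zip(target_aln, template_aln):
        if a != "-" and b != "-":
            mapping[i] = j
        if a != "-":
            i += 1
        if b != "-":
            j += 1
    return mapping


print(residue_mapping("GK-TL", "GKATL"))  # {0: 0, 1: 1, 2: 3, 3: 4}
```

The template residue A has no partner in the target and is simply skipped; a misplaced gap shifts every later pair, which is why the alignment step matters so much.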
Model Construction
- Three sequential steps
- Core
- conserved regions
- gapless aligned blocks
- assignment of secondary structure elements
- Loops
- variable regions
- de novo modeling
- conformation databases
- Side chains
- energy minimization
- molecular dynamics
- rotamer databases
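The core step can be sketched as copying template coordinates onto the target through the alignment, leaving target residues that have no template partner as loop regions to be built separately. This is a simplified sketch: it copies a single CA coordinate per residue (real modeling tools transfer full backbone atoms), and the function names are hypothetical.

```python
from typing import Optional

Point = tuple[float, float, float]


def build_core(target_aln: str, template_aln: str,
               template_ca: list[Point]) -> list[Optional[Point]]:
    """Give each target residue the CA coordinate of the template residue it is
    aligned to; residues aligned to a template gap get None (loop residues)."""
    if len(target_aln) != len(template_aln):
        raise ValueError("aligned sequences must have equal length")
    model: list[Optional[Point]] = []
    j = 0  # index of the current template residue
    for a, b in zip(target_aln, template_aln):
        if a != "-":
            model.append(template_ca[j] if b != "-" else None)
        if b != "-":
            j += 1
    return model


def loop_segments(model: list[Optional[Point]]) -> list[tuple[int, int]]:
    """Runs of target residues without template coordinates, as inclusive
    (start, end) indices; these would be passed on to loop modeling."""
    segments, start = [], None
    for i, xyz in enumerate(model):
        if xyz is None and start is None:
            start = i
        elif xyz is not None and start is not None:
            segments.append((start, i - 1))
            start = None
    if start is not None:
        segments.append((start, len(model) - 1))
    return segments


# Hypothetical template CA trace and an alignment with a two-residue insertion.
ca = [(float(k), 0.0, 0.0) for k in range(6)]
model = build_core("GKTAALIT", "GKT--LIT", ca)
print(loop_segments(model))  # [(3, 4)]
```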
Model Refinement and Assessment
- Check for unfavorable local conformations
- Ramachandran plot
- bond angles/distances
- chirality
- model refinement by energy minimization
- Sequence-structure mapping
- map target sequence onto modeled structure
- score compatibility of sequence and structure by knowledge-based potentials or energy calculations
- Retrospective benchmarking
- comparison to experimental structure
- RMSE
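Two of the checks above rely on simple geometry: a Ramachandran plot shows each residue's backbone torsion angles phi (C(i-1), N, CA, C) and psi (N, CA, C, N(i+1)), and retrospective benchmarking reports the root-mean-square deviation between model and experimental coordinates. A minimal pure-Python sketch, assuming backbone atoms are given as (N, CA, C) coordinate triples and that the two structures have already been superposed and residue-matched (a real comparison would first superpose them, e.g. with the Kabsch algorithm).

```python
import math

Point = tuple[float, float, float]


def _sub(a: Point, b: Point) -> Point:
    return tuple(x - y for x, y in zip(a, b))


def _dot(a: Point, b: Point) -> float:
    return sum(x * y for x, y in zip(a, b))


def _cross(a: Point, b: Point) -> Point:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0: Point, p1: Point, p2: Point, p3: Point) -> float:
    """Torsion angle in degrees (-180 to 180) defined by four points."""
    b1, b2, b3 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n1, n2 = _cross(b1, b2), _cross(b2, b3)  # normals of the two planes
    x = _dot(n1, n2)
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    return math.degrees(math.atan2(y, x))


def phi_psi(backbone: list[tuple[Point, Point, Point]], k: int) -> tuple[float, float]:
    """phi/psi of interior residue k; backbone holds (N, CA, C) per residue."""
    n, ca, c = backbone[k]
    phi = dihedral(backbone[k - 1][2], n, ca, c)
    psi = dihedral(n, ca, c, backbone[k + 1][0])
    return phi, psi


def rmsd(model: list[Point], reference: list[Point]) -> float:
    """RMSD of two equally long, superposed, residue-matched coordinate lists."""
    if len(model) != len(reference) or not model:
        raise ValueError("need two non-empty coordinate lists of equal length")
    squared = sum(_dot(_sub(a, b), _sub(a, b)) for a, b in zip(model, reference))
    return math.sqrt(squared / len(model))


print(rmsd([(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)], [(1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]))  # 1.0
```

Plotting the (phi, psi) pairs of all interior residues gives the Ramachandran plot; residues falling outside the usual allowed regions are candidates for refinement.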
Use of Homology Modeling
- What can a homology model provide?
- study patterns of conservation
- spatial proximity of residues to known active sites
- surface exposure of residues
- ... and what not?
- atomic details of protein geometry
- exact loop and side chain conformations
- local shape
- protein flexibility
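Because a homology model is more reliable for backbone positions than for exact side-chain conformations, proximity to a known active site can reasonably be judged at the CA level. The sketch below lists residues whose CA atoms lie within a cutoff of any active-site CA; the 8 Å cutoff, the residue numbering and the coordinates are hypothetical.

```python
import math

Point = tuple[float, float, float]


def residues_near_site(ca_coords: dict[int, Point], site: set[int],
                       cutoff: float = 8.0) -> list[int]:
    """Residue numbers (outside the site) whose CA atom lies within `cutoff`
    angstroms of at least one CA atom of the active-site residues."""
    near = []
    for residue, xyz in ca_coords.items():
        if residue in site:
            continue
        if any(math.dist(xyz, ca_coords[s]) <= cutoff for s in site):
            near.append(residue)
    return sorted(near)


# Hypothetical CA coordinates (Å) from a model; residue 1 is the known active site.
ca = {1: (0.0, 0.0, 0.0), 2: (3.8, 0.0, 0.0), 3: (7.6, 0.0, 0.0), 4: (20.0, 0.0, 0.0)}
print(residues_near_site(ca, {1}))  # [2, 3]
```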