Title: The Genome Access Course: Protein Structure
1The Genome Access Course: Protein Structure
HSP 70 (1DKG, 1DKZ) and prefoldin (1FXK)
2Protein Structural Elements
- 2° Structural Elements
- α-Helix
- β-Sheet
- Globular regions
- Domains
- SH2
- Leucine Zipper
3Domains
- Discrete structural units
- Can infer boundaries from sequence analysis
- 25-500 residues long
- Most < 200 residues
- Domains of fewer than 50 residues are usually stabilized by disulfide (S-S) bonds or metal ions
4Lipoxygenase Domain
> 500 residues
5WW Domain
33 residues
6Domain Determination
- Internal duplications
- Detect with a dotplot
- Transmembrane segments
- Hydrophobic, 15-35 residues
- Segments easy to predict
- Topology and multiple segments harder to predict
- PHD, TMHMM, TMpred
- Low complexity segments
- Composition typically non-random
- Non-compact folds: coiled coils, rods, flexible domain linkers
- Complexity function (SEG)
- Small-pitch overlapping repeats (XNU)
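The dotplot idea from the list above (detecting internal duplications) can be sketched in a few lines. This is a minimal illustration rather than the method of any particular tool: the function names, the exact-match window of 3 residues, and the toy sequence are all assumptions made for the example.

```python
def dotplot(seq, window=3):
    """Return the set of (i, j) pairs, i < j, where the windows starting
    at i and j are identical. Only the upper triangle is kept because a
    self-comparison dotplot is symmetric."""
    n = len(seq) - window + 1
    hits = set()
    for i in range(n):
        for j in range(i + 1, n):
            if seq[i:i + window] == seq[j:j + window]:
                hits.add((i, j))
    return hits


def diagonal_counts(hits):
    """Count dots per off-diagonal offset (j - i). An internal repeat
    shows up as one heavily populated diagonal."""
    counts = {}
    for i, j in hits:
        counts[j - i] = counts.get(j - i, 0) + 1
    return counts


# Hypothetical toy sequence: the unit "MKTAYIAK" occurs twice,
# with the second copy starting 10 positions after the first.
toy = "MKTAYIAKGGMKTAYIAKPL"
offsets = diagonal_counts(dotplot(toy))
best_offset = max(offsets, key=offsets.get)
print(best_offset, offsets[best_offset])
```

Real dotplot programs usually score windows with a substitution matrix and a threshold instead of requiring exact identity, so diverged repeats remain visible.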
7Protein Sequence Databases
- GenPept
- Swiss-Prot
- TrEMBL
8Protein Domain Databases
- Pfam
- PROSITE
- BLOCKS
- PRINTS
- CDD
- ProDom
- SMART
- InterPro
9- HMM family profiles constructed by hand
- Structural data in alignments
- No hierarchy
- No specific compositional bias
- Good graphical output
10Pfam-A and Pfam-B
- Pfam-A (75)
- Curated, annotated families
- Pfam-B (30)
- Families derived automatically from ProDom
- Other
11- Protein fingerprint database (fingerprints are groups of conserved motifs that characterize a protein family)
- Regular grammar for describing profiles (e.g. [EDQ]-x-G-x-[DN]-A-x-x-[GALI])
- Profile search is sensitive, but low coverage (signaling)
- Pattern search has high false positive rate
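A pattern written in this kind of regular grammar maps directly onto a regular expression. The sketch below assumes the example pattern uses the usual PROSITE conventions (square brackets for allowed residues, x for any residue, curly braces for excluded residues, a count in parentheses for repeats); the function name and the test sequence are hypothetical, and terminal anchors are not handled.

```python
import re


def prosite_to_regex(pattern):
    """Translate a simple PROSITE-style pattern into a Python regex.
    Handles x, [ABC], {ABC}, and repeat counts such as x(2) or x(2,4)."""
    parts = []
    for element in pattern.split("-"):
        # Split the element into its core and an optional (n) or (n,m) count.
        m = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element)
        core, low, high = m.group(1), m.group(2), m.group(3)
        if core == "x":
            core = "."                        # any residue
        elif core.startswith("{") and core.endswith("}"):
            core = "[^" + core[1:-1] + "]"    # excluded residues
        if low and high:
            core += "{%s,%s}" % (low, high)
        elif low:
            core += "{%s}" % low
        parts.append(core)
    return "".join(parts)


# Assumed bracketed form of the pattern quoted on the slide.
regex = prosite_to_regex("[EDQ]-x-G-x-[DN]-A-x-x-[GALI]")

# Hypothetical sequence containing one occurrence of the pattern.
hit = re.search(regex, "MKEAGSDAPTGLL")
print(regex, hit.start(), hit.group())
```

Because a short pattern like this has few fixed positions, it can match unrelated sequences by chance, which is the false positive problem noted above.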
12- Highly conserved, ungapped MSAs
- Derived from PROSITE
13- Fingerprints are sets of ungapped weight matrices
- Hierarchical classification for important families
- Families, domains, and proteins
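One ungapped weight matrix of the kind a fingerprint is built from can be sketched as follows. A fingerprint would combine several such matrices, one per motif; only a single motif is shown here. The log-odds scoring against a uniform background, the pseudocount, the function names, and the toy motif instances are all assumptions for illustration, not the actual PRINTS procedure.

```python
import math

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"


def build_weight_matrix(motif_instances, pseudocount=1.0):
    """Build a log-odds matrix (one dict per column) from equal-length,
    ungapped motif instances, using a uniform background."""
    length = len(motif_instances[0])
    background = 1.0 / len(ALPHABET)
    matrix = []
    for col in range(length):
        column = [inst[col] for inst in motif_instances]
        total = len(column) + pseudocount * len(ALPHABET)
        matrix.append({
            aa: math.log2(((column.count(aa) + pseudocount) / total) / background)
            for aa in ALPHABET
        })
    return matrix


def best_hit(matrix, seq):
    """Slide the matrix along seq without gaps; return (score, position)
    of the best-scoring window. Non-standard residues are not handled."""
    width = len(matrix)
    scores = [
        (sum(matrix[k][seq[i + k]] for k in range(width)), i)
        for i in range(len(seq) - width + 1)
    ]
    return max(scores)


# Hypothetical aligned instances of one 5-residue motif.
instances = ["GDSGG", "GDSGS", "GDAGG", "GESGG"]
pwm = build_weight_matrix(instances)
score, position = best_hit(pwm, "MKVLGDSGGPLT")
print(position, round(score, 2))
```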
14- Conserved Domain Database (NCBI)
- Linked into other NCBI resources
- Includes Pfam and SMART domains (but does not
give the same answer)
15- Simple Modular Architecture Research Tool
- Collected by Ponting and Bork (641 HMMs)
- Focuses on
- Signaling Domains
- Extracellular domains
- Nuclear domains
- High quality, nice graphics
16Alignment of Representative Members
Profile-HMM built with HMMer 2.0
Search Protein DB
Description
Full alignment
17- Profiles automatically built from PSI-BLAST alignments of Swiss-Prot
- No annotation
- As with other automated DBs (Pfam-B, DOMO), useful for seeing if a region appears in different contexts
18- Pfam, SMART, ProDom, PRINTS, and PROSITE domains
- High quality annotation
19Comparison of Protein Family DBs
Pfam
SMART
CDD
PROSITE
SRS
20Protein Sequence Analysis
- Biochemical/biophysical properties
- Secondary Structure
- Super-secondary (signal peptides, domains, motifs)
- 3D prediction (Threading)
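As one concrete example of a biophysical property computed from sequence alone, the sketch below calculates average hydropathy with the Kyte-Doolittle scale, both for a whole sequence and in a sliding window. The window length of 9, the function names, and the toy sequence are assumptions chosen for illustration.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(seq):
    """Mean hydropathy over the whole sequence."""
    return sum(KD[aa] for aa in seq) / len(seq)


def hydropathy_profile(seq, window=9):
    """Mean hydropathy of every window of the given length."""
    return [
        gravy(seq[i:i + window])
        for i in range(len(seq) - window + 1)
    ]


# Hypothetical sequence: a hydrophobic stretch flanked by charged residues.
toy = "KKDEKK" + "LIVLLIVLL" + "KKDEKK"
profile = hydropathy_profile(toy)
peak = profile.index(max(profile))
print(peak, round(profile[peak], 2), round(gravy(toy), 2))
```

A sustained peak in such a profile is the kind of signal that hydrophobic-segment predictors build on, although the dedicated tools use more elaborate models.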
21Amphipathic Helix
Edge Strand
Buried Strand
24Viewing 3D Structures
- Cn3d
- Chime
- RasMol
- Protein Explorer
27Predicting Structure from Sequence
- A 100 amino acid protein has 3^200 backbone configurations
- Threading
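The size of that search space is easy to check with exact integer arithmetic. The sketch below assumes the figure is 3^200, read as three allowed states for each of two backbone torsion angles (phi and psi) in each of 100 residues; the sampling rate used for the time estimate is a purely hypothetical number.

```python
states_per_angle = 3      # assumption: three allowed states per torsion angle
angles_per_residue = 2    # assumption: phi and psi
residues = 100

configurations = states_per_angle ** (angles_per_residue * residues)
digits = len(str(configurations))

# Hypothetical rate: 10^13 configurations examined per second.
rate_per_second = 10 ** 13
seconds_per_year = 365 * 24 * 3600
years = configurations // (rate_per_second * seconds_per_year)

print(digits)              # number of decimal digits in 3^200
print(len(str(years)))     # number of decimal digits in the year count
```

Even at that rate, exhaustive enumeration is out of reach by many orders of magnitude, which is why prediction methods such as threading restrict the search to known folds.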
28Protein Structure Prediction is Quite a Challenge
29 3D Structure Prediction
- UCLA-DOE
- SwissMODEL
- CPHmodels
30Methods for Aligning Structures
- (Double) Dynamic Programming
- Distance Matrix
- Gibbs sampling
- Branch-and-bound searching
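The distance-matrix approach rests on the fact that a matrix of intramolecular distances does not change when a structure is rotated or translated, so two structures can be compared without first superimposing them. The sketch below illustrates only that property; the coordinates are hypothetical Cα positions, and the function names and the mean-absolute-difference measure are assumptions rather than the scoring used by any specific program.

```python
import math


def distance_matrix(coords):
    """All-against-all Euclidean distances between 3D points."""
    n = len(coords)
    return [[math.dist(coords[i], coords[j]) for j in range(n)] for i in range(n)]


def matrix_difference(a, b):
    """Mean absolute difference between two equally sized distance matrices."""
    n = len(a)
    total = sum(abs(a[i][j] - b[i][j]) for i in range(n) for j in range(n))
    return total / (n * n)


# Hypothetical C-alpha coordinates (in angstroms) for a four-residue fragment.
structure_a = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (3.8, 3.8, 0.0), (0.0, 3.8, 3.8)]

# The same fragment rotated 90 degrees about the z axis and then translated.
structure_b = [(-y + 10.0, x - 5.0, z + 2.0) for x, y, z in structure_a]

dm_a = distance_matrix(structure_a)
dm_b = distance_matrix(structure_b)
difference = matrix_difference(dm_a, dm_b)
print(round(difference, 6))
```

For two genuinely different proteins the matrices differ in size and residue correspondence, so practical methods search for similar sub-matrices instead of comparing whole matrices directly.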
31HMMSTR: Local Sequence-Structure Correlations
- Constructed using motif clustering
- No gaps or insertion states
- Non-linear, highly branching model
- Models local motifs common to all proteins