Title: Bioinformatics of Protein Structure
1Bioinformatics of Protein Structure
2Protein structures often characterized by
secondary structure content
- All a
- All b
- a/b
- ab
- There are tools available (for instance at
www.expasy.ch that will allow one to predict
secondary structure from sequence data
3(No Transcript)
4Sequence/structure
- All a-proteins begin to reveal sequence/structure
relationship - Coiled-coil proteins exhibit periodicity with
hydrophobic residues - Observe hydrophobic moments in membrane proteins
51/4 of all predicted proteins in a genome are
membrane proteins
6A different periodicity in b-structures
7Common structures found in b structures
- Barrels
- Propellers
- Greek key
- Jelly roll (Contains one Greek key)
- Helix
8Barrels anti-parallel sheets
9Anti-parallel structures exhibit every other
amino acid periodicity
10Propellers
- Variable number of propeller blades
http//info.bio.cmu.edu/courses/03231/ProtStruc/b-
props.htm
11Quaternary structure of neuraminidase
12Looking for active sites
13g-crystallin has two domains with identical
topology
- Protein evolution
- motif duplication and
- fusion
14Three sheet b-helix Toblerone
15Protein structures containing a and b
- Distinction between a/b and a b
- a/b - Mainly parallel beta sheets
(beta-alpha-beta units) - a b - Mainly antiparallel beta sheets
(segregated alpha and beta regions)
16a/b
17Interspersed a and b
18Generally, a tight hydrophobic core found in a/b
barrels
19How many folds are there?
Proteins have a common fold if they have the
same major secondary structures in the same
arrangement and with the same topological
connections.
To date we know 26,000 protein
structures Within this dataset, 945 folds are
recognized
http//scop.mrc-lmb.cam.ac.uk/scop/
20How many non-folds are there?
- http//www.scripps.edu/news/press/013102.html
- 30-40 of human genome encodes for unstructured
native proteins
21Transition to structural classifications
- Several useful databases link sequence analysis
and protein structure information - Since structure is more highly conserved than
sequence during evolution, structural alignment
algorithms and classifications enable more
distant evolutionary relatives to be identified. - CATH and SCOP are two databases that organize
protein structures, each containing 950-1400
protein superfamilies
22Structural Alignments
- Various algorithms allow structure vs. structure
comparisons - VAST, DALI
- CATH (http//www.biochem.ucl.ac.uk/bsm/cath/)
also has SSAP and GRATH (one computationally
intensive, one not) - Sequence similarity to structural families for
modeling often extracted using PSI-BLAST
(Gene3D)
23Comparison of sequence and structure alignments
1 Taylor WR, Orengo CA, 1989, Protein structure
alignment. J Mol Biol 2081-224 Mueller L,
2003, Protein structure alignment. Paper
presentations 27.51630h
24Multiple structural alignments
- CORA from CATH (where?)
- MultiProt - http//bioinfo3d.cs.tau.ac.il/MultiPro
t/ - DMAPS (pre-calculated) http//dmaps.sdsc.edu/
- CE-MC - http//cemc.sdsc.edu/
- Others?
25CATH
- http//cathwww.biochem.ucl.ac.uk/latest/
- Classification Scheme Class, Architecture,
Topology and Homology - Class secondary structure composition and
packing - Architecture orientation of secondary
structures in 3D, regardless of connectivity - Topology both orientation and connectivity of
secondary structure is accounted for - Homologous superfamily grouped based on whether
an evolutionary relationship exists (clustered at
different levels of sequence ID)
26CATH hierarchy
- Structural alignments
- To homologous super-
- Family, then sequence
- Alignments for sequence
- Family, and then domains.
-
27Protein structure predictions
- Identifying similar protein structures using only
amino acid sequence - Modeling an amino acid sequence onto a known
protein structure - Ab initio protein structure prediction
28Test sequence
- gtrsp2570
- MTLDGKTIAILIAPRGTEDVEYVRPKEALTQATVVTVSLEPGEAQTVNGD
LDPGATHRVDRTFADVSADAFDGLVIPGGTVGADKIRSSEEAVAFVRGFV
SAGKPVAAICHGPWALVEADVLKGREVTSYPSLATDIRNAGGRWVDREVV
VDSGLVTSRKPDDLDAFCAKMIEEFAEGVHDGQRRSA
29SCOP database
- Classification scheme Class, Fold, Superfamily,
and Family, - Class Type and organization of secondary
structure - Fold Share common core structure, same
secondary structure elements in the same
arrangement with the same topological connections - Superfamily share very common structure and
function - Family protein domains share a clear common
evolutionary origin as evidenced by sequence
identity or similar structure/function
30HMMs are useful at SCOP
- For instance, SCOP (http//scop.mrc-lmb.cam.ac.uk/
scop/) HMMs are derived from the PDB databank at
www.rcsb.org - Identify sequence signatures for specific domains
31Modeling protein structure based on homology
- SWISS-MODEL
- http//swissmodel.expasy.org/
- Using first approach mode, submit test sequence,
and use your email - PSI-Blast identifies the most similar sequence
with a protein structure, and SWISS-MODEL wraps
your input sequence around it - Note you can also specify which structure you
would like your sequence to wrap around
32Ab initio predictions
- Protein folding is a complex problem
33Ab initio attempts
- Based on Ramachandran plot probabilities
- Measure interatomic
- Interactions has
- worked for small proteins
- lt85 aa, which appear to
- Favor H-bonds and van
- Der Waal and ignore
- Electrostatic interactions