Title: BCB 444/544 - Introduction to Bioinformatics
Lecture 29: Protein Structure Basics & Prediction (29_Nov1)
Seminars in Bioinformatics/Genomics
- Mon Oct 30
- Madan Bhattacharyya (Agron, ISU): Isolation of Signaling Genes for Phytophthora Resistance in Soybean
- IG Faculty Seminar, 12:10 PM in 101 Ind Ed II
- Thurs Nov 2
- Peter Clote (ComS, Boston Univ): Some New Aspects of Protein-RNA Structure
- Baker Center Seminar, 2:10 PM in Howe Hall Auditorium
- Laura Dutca (Chem, ISU): Detailed analysis of E. coli primary r-Protein Interaction with 16S rRNA: Implications for RNP assembly
- BBMB Seminar, 4:10 PM in 1414 MBB
- Fri Nov 3
- Heather Greenlee (BMS, ISU): Decoding the Rod-Photoreceptor Differentiation Pathway
- BCB Faculty Seminar, 2:10 PM in W142 Lago
- Eric Henderson (GDCB, ISU): Putting the "Bio" in Bionanotechnology
- GDCB Seminar, 4:10 PM in 1414 MBB
Assignments & Reading This Week
- Chp 8 Proteomics
- More Machine Learning Algorithms
- Mon Oct 30: Chp 8.1 Proteomics - Introduction
- Chp 8.2 Protein 3D Structure
- Wed Nov 1: Chp 8.3 Protein Interaction Networks
- Chp 8.4 Measuring Proteins
- Thurs Nov 2: Lab 9 - Attendance & Seminar Feedback Form (immediately after seminar) required
- Peter Clote, 2:10 PM, Howe Hall Auditorium: Some New Aspects of Protein-RNA Structure
- Fri Nov 3: Machine Learning - more Algorithms
- Support Vector Machines (SVMs)
- Neural Networks (ANNs)
Assignments & Events
- Exam 2: Keys & Grades posted today; Exam returned today after class
- BCB 444/544 HW5: Posted online yesterday (sorry); Due by Noon, Mon Nov 6
- BCB 544 Only: Team Projects - Any questions?
- 544Extra2: Due Mon Nov 6
- See updated Schedule (Oct 30) posted online
Protein Structure & Function
- Protein Structure
- Amino acids characteristics
- Structural classes motifs
- Protein functional families
- Classification
- Databases
- Visualization
Protein Structure & Function
- Protein structure: primarily determined by sequence
- Protein function: primarily determined by structure
- (structure determines interactions with other molecules)
- Globular proteins
- have compact hydrophobic core & hydrophilic surface
- Membrane proteins
- have special hydrophobic domains
- often transmembrane (TM) helices
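Because TM helices are stretches of roughly 20 hydrophobic residues, candidate TM segments can be flagged with a sliding-window hydropathy scan. A minimal sketch, assuming the Kyte-Doolittle hydropathy scale with the 19-residue window and 1.6 average cutoff usually quoted for that method; `hydropathy_profile` and `tm_candidates` are illustrative names, not a published tool, and dedicated TM predictors are considerably more accurate.

```python
# Kyte-Doolittle hydropathy values (positive = hydrophobic)
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(seq: str, window: int = 19) -> list[float]:
    """Mean hydropathy of every full window; entry i covers seq[i:i + window]."""
    values = [KD[aa] for aa in seq.upper()]  # KeyError for non-standard residues
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]


def tm_candidates(seq: str, window: int = 19, cutoff: float = 1.6) -> list[int]:
    """0-based start indices of windows hydrophobic enough to be a possible TM helix."""
    return [i for i, h in enumerate(hydropathy_profile(seq, window)) if h > cutoff]


toy = "KKKKK" + "L" * 19 + "KKKKK"  # hypothetical: one hydrophobic stretch
print(tm_candidates(toy))
print(tm_candidates("KDE" * 10))    # soluble-looking toy sequence
```

Neighboring windows overlap, so in practice consecutive hits are merged into one predicted TM segment.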
Protein Structure & Function
- Protein Folding?
- Folded proteins are only marginally stable, because proteins must balance stability vs. function
- Intrinsically disordered: some domains of proteins (or even entire proteins) do not assume a stable "fold" until they are bound to their partner (protein, DNA, etc.)
- Predicting protein structure and function can be very difficult, but is increasingly important
4 Basic Levels of Protein Structure: primary, secondary, tertiary, quaternary
Amino Acids
- Each of the 20 amino acids has a different "R-group" side chain attached to Cα
Peptide bond is rigid and planar
Hydrophobic Amino Acids
Charged Amino Acids
Polar Amino Acids
Certain side-chain configurations are energetically favored (rotamers)
Ramachandran plot: "allowable" backbone φ (phi) and ψ (psi) angles
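The φ and ψ angles on a Ramachandran plot are backbone dihedral angles: φ is defined by atoms C(i-1), N(i), Cα(i), C(i) and ψ by N(i), Cα(i), C(i), N(i+1). A minimal sketch of the dihedral calculation in plain Python, assuming atom coordinates are already available as (x, y, z) tuples; `dihedral` is an illustrative name.

```python
import math

Vec = tuple[float, float, float]


def _sub(a: Vec, b: Vec) -> Vec:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a: Vec, b: Vec) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a: Vec, b: Vec) -> Vec:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0: Vec, p1: Vec, p2: Vec, p3: Vec) -> float:
    """Dihedral angle (degrees, between -180 and 180) about the p1-p2 bond."""
    b1, b2, b3 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n1, n2 = _cross(b1, b2), _cross(b2, b3)  # normals of the two planes
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))

# phi(i) = dihedral(C[i-1], N[i], CA[i], C[i])
# psi(i) = dihedral(N[i], CA[i], C[i], N[i+1])
```

Roughly speaking, residues in right-handed α-helices cluster near φ ≈ -60°, ψ ≈ -45°, while β-strand residues cluster near φ ≈ -120°, ψ ≈ +130°.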
Glycine is the smallest amino acid: its R group is an H atom
- Glycine residues increase backbone flexibility because they lack a bulky side chain
Proline is cyclic
- Proline residues reduce flexibility of the polypeptide chain
- Proline cis-trans isomerization is often a rate-limiting step in protein folding
- Recent work suggests it also regulates ligand binding in native proteins (Andreotti)
Cysteines can form disulfide bonds
- Disulfide bonds (covalent) stabilize 3-D structures
- In eukaryotes, disulfide bonds are found mainly in secreted proteins or extracellular domains
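In a solved structure, a disulfide appears as two cysteine Sγ atoms about 2.05 Å apart. A minimal sketch that flags candidate disulfides by Sγ-Sγ distance, assuming the Sγ coordinates have already been extracted (e.g., atoms named SG in a PDB file) and using a 2.5 Å cutoff chosen for illustration; `find_disulfides` and the coordinates below are hypothetical.

```python
import math
from itertools import combinations

Coord = tuple[float, float, float]


def find_disulfides(sg_atoms: dict[str, Coord],
                    cutoff: float = 2.5) -> list[tuple[str, str, float]]:
    """Return (cys_a, cys_b, distance) for every Sγ pair closer than `cutoff` Å."""
    bonds = []
    for (a, pa), (b, pb) in combinations(sg_atoms.items(), 2):
        d = math.dist(pa, pb)  # Euclidean distance
        if d < cutoff:
            bonds.append((a, b, round(d, 2)))
    return bonds


# Hypothetical Sγ coordinates keyed by residue label
sg = {
    "CYS 3": (10.0, 5.0, 3.0),
    "CYS 40": (11.2, 6.6, 3.3),
    "CYS 77": (25.0, 1.0, 8.0),
}
print(find_disulfides(sg))  # -> [('CYS 3', 'CYS 40', 2.02)]
```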
Globular proteins have a compact hydrophobic core
- Packing of hydrophobic side chains into the interior is the main driving force for folding
- Problem? The polypeptide backbone is highly polar (hydrophilic) due to the polar -NH and C=O in each peptide unit; these polar groups must be neutralized
- Solution? Form regular secondary structures, e.g., α-helix, β-sheet, stabilized by H-bonds
Exterior surface of globular proteins is generally hydrophilic
- Hydrophobic core formed by packed secondary structural elements provides a compact, stable core
- "Functional groups" of the protein are attached to this framework; the exterior has more flexible regions (loops) and polar/charged residues
- Hydrophobic "patches" on the protein surface are often involved in protein-protein interactions
Secondary Structural Elements
- α-Helix
- β-Sheets
- Loops
- Coils
α-Helix
- Most abundant 2° structure in proteins
- Average length ~10 aa (~15 Å, at a rise of 1.5 Å per residue)
- Length varies from 5 to 40 aa
- Alignment of H-bonds creates a dipole moment (partial positive charge at the N-terminal, NH end)
- Often at surface of core, with hydrophobic residues on the inner-facing side and hydrophilic residues on the other side
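The "hydrophobic on one side, hydrophilic on the other" pattern can be quantified as a hydrophobic moment: each residue contributes a vector whose length is its hydrophobicity, pointing at its angle on a helical wheel (about 100° per residue, since 360°/3.6 residues ≈ 100°). A minimal sketch, assuming a deliberately crude two-value scale (+1 for A, I, L, M, F, V, W; -1 otherwise) in place of a published hydrophobicity scale; `hydrophobic_moment` is an illustrative name.

```python
import math

HYDROPHOBIC = set("AILMFVW")  # simplified hydrophobic set (assumption)


def hydrophobic_moment(seq: str, angle_deg: float = 100.0) -> float:
    """Magnitude of the vector sum of per-residue hydrophobicities on a helical wheel."""
    sx = sy = 0.0
    for i, aa in enumerate(seq.upper()):
        h = 1.0 if aa in HYDROPHOBIC else -1.0  # crude +1/-1 scale (assumption)
        theta = math.radians(i * angle_deg)     # angular position on the wheel
        sx += h * math.cos(theta)
        sy += h * math.sin(theta)
    return math.hypot(sx, sy)


amphipathic = "LLKKLLKKLKKLLKKLLK"  # Leu on one face of the wheel, Lys on the other
uniform = "L" * 18                  # same residue all the way around
print(round(hydrophobic_moment(amphipathic), 2), round(hydrophobic_moment(uniform), 2))
# -> 11.52 0.0
```

Published analyses use graded scales (e.g., Eisenberg's consensus scale) and often divide by the number of residues to report a mean moment.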
α-Helix is stabilized by H-bonds between every 4th residue
(Figure colors: C black, O red, N blue)
R-groups are on outside of α-helix
Types of α-helices
- "Standard" α-helix: 3.6 residues per turn
- H-bonds between C=O of residue n and NH of residue n+4
- Helix ends are polar; almost always on surface of protein
- Other types of helices?
- n+5: π-helix
- n+3: 3₁₀-helix
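The standard α-helix numbers translate directly into arithmetic: with 3.6 residues per turn and a rise of about 1.5 Å per residue (pitch ≈ 5.4 Å), an n-residue helix is about 1.5n Å long, and its backbone H-bonds pair the C=O of residue i with the NH of residue i+4. A minimal sketch, assuming those idealized α-helix parameters; setting `offset` to 3 or 5 lists the 3₁₀ or π H-bond pattern, but the rise and residues-per-turn constants apply only to the α-helix. Function names are illustrative.

```python
RESIDUES_PER_TURN = 3.6   # idealized alpha-helix
RISE_PER_RESIDUE = 1.5    # Angstroms along the helix axis per residue


def helix_length(n_residues: int) -> float:
    """Approximate axial length (Å) of an ideal alpha-helix."""
    return n_residues * RISE_PER_RESIDUE


def helix_turns(n_residues: int) -> float:
    """Approximate number of turns in an ideal alpha-helix."""
    return n_residues / RESIDUES_PER_TURN


def backbone_hbonds(n_residues: int, offset: int = 4) -> list[tuple[int, int]]:
    """Pairs (i, i + offset): C=O of residue i H-bonds to N-H of residue i + offset.

    offset=4 is the alpha-helix pattern; 3 and 5 give the 3-10 and pi patterns.
    Residues are numbered from 1.
    """
    return [(i, i + offset) for i in range(1, n_residues - offset + 1)]


print(helix_length(10), round(helix_turns(10), 2))  # -> 15.0 2.78
print(backbone_hbonds(8))  # -> [(1, 5), (2, 6), (3, 7), (4, 8)]
```

The output also shows why helix ends are polar: with offset 4, the N-H groups of the first four residues and the C=O groups of the last four have no partner inside the helix.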
Certain amino acids are "preferred", others are rare in α-helices
- Ala, Glu, Leu, Met: good helix formers
- Pro, Gly, Tyr, Ser: poor
- Amino acid composition & distribution vary, depending on location of helix in 3-D structure
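These helix preferences were first quantified as propensity values; the classic Chou-Fasman method averages them over short windows. A simplified sketch, assuming the Chou-Fasman helix parameters (Pα) as they are commonly tabulated and only one rule (a 6-residue window whose mean Pα exceeds 1.03 favors helix); the full method adds nucleation and termination rules. `helix_windows` is an illustrative name.

```python
# Chou-Fasman helix propensities (P_alpha) as commonly tabulated; > 1 favors helix
P_ALPHA = {
    "E": 1.51, "M": 1.45, "A": 1.42, "L": 1.21, "K": 1.16,
    "F": 1.13, "Q": 1.11, "W": 1.08, "I": 1.08, "V": 1.06,
    "D": 1.01, "H": 1.00, "R": 0.98, "T": 0.83, "S": 0.77,
    "C": 0.70, "Y": 0.69, "N": 0.67, "P": 0.57, "G": 0.57,
}


def helix_windows(seq: str, window: int = 6, threshold: float = 1.03) -> list[bool]:
    """For each full window, True if its mean P_alpha exceeds `threshold`."""
    scores = [P_ALPHA[aa] for aa in seq.upper()]
    return [
        sum(scores[i:i + window]) / window > threshold
        for i in range(len(scores) - window + 1)
    ]


print(helix_windows("AEELLKGPGSGN"))
```

In this toy sequence the Glu/Ala/Leu/Lys-rich windows at the start score above the threshold, while windows dominated by Gly, Pro, Ser and Asn fall well below it.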
β-Strands & Sheets
- H-bonds form between 5-10 consecutive residues in one portion of the chain and another set of 5-10 residues farther down the chain
- Interacting regions may be adjacent (with a short loop between) or far apart
- β-sheets usually have all strands either parallel or antiparallel
Antiparallel β-sheet
Parallel β-sheet
Mixed β-sheets also occur
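Whether two paired strands run parallel or antiparallel can be read off the residue pairing itself: in a parallel pair the partner residue numbers increase together, while in an antiparallel pair one increases as the other decreases. A minimal sketch, assuming the list of paired residue numbers (for example, from a DSSP-style secondary structure assignment) is already available; `strand_orientation` is a hypothetical helper.

```python
def strand_orientation(pairs: list[tuple[int, int]]) -> str:
    """Classify a strand pair from (residue_in_strand_1, partner_in_strand_2) tuples."""
    if len(pairs) < 2:
        raise ValueError("need at least two paired residues")
    pairs = sorted(pairs)  # order by residue number in strand 1
    steps = [b2 - b1 for (_, b1), (_, b2) in zip(pairs, pairs[1:])]
    if all(s > 0 for s in steps):
        return "parallel"
    if all(s < 0 for s in steps):
        return "antiparallel"
    return "irregular"


print(strand_orientation([(3, 20), (4, 19), (5, 18)]))  # -> antiparallel (hairpin-like)
print(strand_orientation([(3, 40), (5, 42), (7, 44)]))  # -> parallel
```

A mixed sheet would simply contain strand pairs of both kinds.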
Loops
- Connect helices and sheets
- Vary in length and 3-D configurations
- Are located on surface of structure
- Are more "tolerant" of mutations
- Are more flexible and can adopt multiple conformations
- Tend to have charged and polar amino acids
- Are frequently components of active sites
- Some fall into distinct structural families
- (e.g., hairpin loops, reverse turns)
Coils
- Regions of 2° structure that are not helices, sheets, or recognizable turns
- Intrinsically disordered regions appear to play important functional roles
Globular proteins are built from recurring structural patterns
- Motif or supersecondary structure: combination of 2° structural elements
- Domain: combination of motifs
- Independently folding unit (foldon)
- Functional unit
A few common structural motifs
- Helix-turn-helix: e.g., DNA binding
- Helix-loop-helix: e.g., calcium binding
- β-hairpin: 2 adjacent antiparallel strands connected by a short loop
- Greek key: 4 adjacent antiparallel strands
- β-α-β: 2 parallel strands connected by a helix
H-T-H & H-L-H
β-hairpin
Greek key
Beta-alpha-beta
Simple motifs combine to form domains
Large polypeptide chains fold into several domains
6 main classes of protein structure
- 1) α domains
- Bundles of helices connected by loops
- 2) β domains
- Mainly antiparallel sheets, usually with 2 sheets forming a sandwich
- 3) α/β domains
- Mainly parallel sheets with intervening helices, also mixed sheets
- 4) α+β domains
- Mainly segregated helices and sheets
- 5) Multidomain (α & β)
- Containing domains from more than one class
- 6) Membrane & cell-surface proteins
α-domain structures: coiled-coils
α-domain structures: 4-helix bundles
All-α proteins: Globins
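Coiled-coil helices show a heptad repeat (positions a-g) in which positions a and d are usually hydrophobic and pack against the partner helix. A minimal sketch that tries all seven registers and reports the one whose a and d positions are most hydrophobic, assuming a simplified hydrophobic set; real coiled-coil predictors use position-specific scoring, and `ad_score` and `best_register` are illustrative names.

```python
HYDROPHOBIC = set("AILMFVW")  # simplified hydrophobic set (assumption)


def ad_score(seq: str, offset: int) -> float:
    """Fraction of heptad a/d positions occupied by hydrophobic residues.

    `offset` is the heptad position of seq[0]: 0 = a, 1 = b, ..., 6 = g.
    """
    seq = seq.upper()
    ad = [i for i in range(len(seq)) if (i + offset) % 7 in (0, 3)]  # a = 0, d = 3
    if not ad:
        return 0.0
    return sum(seq[i] in HYDROPHOBIC for i in ad) / len(ad)


def best_register(seq: str) -> tuple[int, float]:
    """Try all seven registers; return (offset, score) with the highest score."""
    scores = [(off, ad_score(seq, off)) for off in range(7)]
    return max(scores, key=lambda s: s[1])


print(best_register("LEKLEKK" * 3))  # -> (0, 1.0)
```

A high score over several heptads is suggestive of, not proof of, a coiled coil.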
β-domain structures
- Antiparallel β structures
- Functionally most diverse
- Includes
- Up-and-down sheets or barrels
- Propeller-like structures
- Jelly roll barrels (from Greek key motifs)
Up-and-down sheets and barrels
Up-and-down sheets can form propeller-like structures
Greek key motifs can form jelly roll barrels
α/β-domain structures
- 3 main classes
- TIM barrel: core of twisted parallel strands close together
- Rossmann fold: open twisted sheet surrounded by helices on both sides
- Leucine-rich motif: specific pattern of Leu residues; strands form a curved sheet with helices on the outside
TIM barrel & Rossmann fold
Leucine-rich motifs can form α/β horseshoes
Protein structure databases & visualization software
- PDB: Protein Data Bank
- http://www.rcsb.org/pdb/
- (RCSB) - several structure viewers
- MMDB: Molecular Modeling Database
- http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=Structure
- (NCBI Entrez) - Cn3D viewer
- MSD: Molecular Structure Database
- http://www.ebi.ac.uk/msd
- Especially good for interactions, binding sites
- Other visualization tools: PyMOL, Jmol
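Coordinate files from the PDB in the legacy fixed-column PDB format can be read without special software. A minimal sketch, assuming a local `.pdb` text file and the standard column layout of ATOM records (very large entries are distributed only as mmCIF, which this does not handle); `parse_atom_line`, `read_ca_atoms` and the file name in the comment are hypothetical. Biopython's `Bio.PDB` module handles these formats far more robustly.

```python
from typing import NamedTuple


class Atom(NamedTuple):
    name: str
    res_name: str
    chain: str
    res_seq: int
    x: float
    y: float
    z: float


def parse_atom_line(line: str) -> Atom:
    """Parse one ATOM/HETATM record using the fixed PDB columns (0-based slices)."""
    return Atom(
        name=line[12:16].strip(),      # columns 13-16
        res_name=line[17:20].strip(),  # columns 18-20
        chain=line[21],                # column 22
        res_seq=int(line[22:26]),      # columns 23-26
        x=float(line[30:38]),          # columns 31-38
        y=float(line[38:46]),          # columns 39-46
        z=float(line[46:54]),          # columns 47-54
    )


def read_ca_atoms(path: str) -> list[Atom]:
    """Return the alpha-carbon (CA) atoms of all ATOM records in a PDB file."""
    atoms = []
    with open(path) as fh:
        for line in fh:
            if line.startswith("ATOM") and line[12:16].strip() == "CA":
                atoms.append(parse_atom_line(line))
    return atoms

# Example (hypothetical file name): ca = read_ca_atoms("example.pdb")
```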
Protein structure classification
- SCOP: Structural Classification of Proteins
- Levels reflect both evolutionary and structural relationships
- CATH: Classification by Class, Architecture, Topology and Homology
- DALI/FSSP
- Fully automated structure alignments
For links, discussion & comparisons of these, see http://pdomains.rcsb.org/pdomains/index.php
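Automated structure comparison needs a numerical measure of similarity. One simple, superposition-free option is the distance-matrix RMSD (dRMSD): compare all intramolecular Cα-Cα distances of two equal-length structures. This sketch shows that generic idea, which is related in spirit to DALI's comparison of distance matrices but is not DALI's actual scoring function; it assumes the residue correspondence (alignment) is already known, and `drmsd` is an illustrative name.

```python
import math
from itertools import combinations

Coord = tuple[float, float, float]


def drmsd(a: list[Coord], b: list[Coord]) -> float:
    """Root-mean-square difference of all pairwise intramolecular distances."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two equal-length coordinate lists (>= 2 atoms)")
    diffs = [
        (math.dist(a[i], a[j]) - math.dist(b[i], b[j])) ** 2
        for i, j in combinations(range(len(a)), 2)
    ]
    return math.sqrt(sum(diffs) / len(diffs))


# Toy Cα traces with the ~3.8 Å spacing of consecutive Cα atoms
ca1 = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (3.8, 3.8, 0.0)]
ca2 = [(x + 10.0, y - 5.0, z + 2.0) for x, y, z in ca1]   # same shape, translated
ca3 = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]  # bend straightened out
print(round(drmsd(ca1, ca2), 6), round(drmsd(ca1, ca3), 2))  # -> 0.0 1.29
```

Because only internal distances are compared, no rotation or translation has to be fitted first; the trade-off is that dRMSD cannot distinguish a structure from its mirror image.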
Protein sequence databases
- UniProt (SwissProt, PIR, EBI)
- http://www.pir.uniprot.org
- NCBI Protein: http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=Protein
Protein sequence & structure analysis
- Diamond STING Millennium: many useful structure analysis tools, including Protein Dossier
- http://trantor.bioc.columbia.edu/SMS/
- SwissProt (UniProt)
- knowledgebase
- http://us.expasy.org/sprot
- InterPro
- sequence analysis tools
- http://www.ebi.ac.uk/interpro
Structural Genomics
- ~20,000 "traditional" genes in human genome
- (not including miRNAs, etc.)
- ~3,000 proteins in a typical cell
- > 3 million sequences in UniProt
- ~40,000 protein structures in the PDB
- Experimental determination of protein structure lags far behind sequence determination!
- Goal of Structural Genomics: determine structures of "all" protein folds in nature, using a combination of experimental structure determination methods (X-ray crystallography, NMR, mass spectrometry) & structure prediction
Structural Genomics Projects
- TargetDB: database of structural genomics targets, http://targetdb.pdb.org
Protein Structure Prediction?
Protein Folding
- "Major unsolved problem in molecular biology"
- In cells: spontaneous
- assisted by enzymes
- assisted by chaperones
- In vitro: many proteins fold spontaneously
- many do not!
Steps in Protein Folding
- 1) "Collapse": driving force is burial of hydrophobic aa's (fast: msecs)
- 2) Molten globule: helices & sheets form, but "loose" (slow: secs)
- 3) "Final" native folded state: compaction, some 2° structures rearranged
- Native state? Assumed to be lowest free energy
- may be an ensemble of structures
Protein Dynamics
- Protein in native state is NOT static!
- Function of many proteins depends on conformational changes, sometimes large, sometimes small
- Recall:
- Globular proteins are inherently "unstable"
- (most proteins have NOT evolved for maximum stability)
- Energy difference between native and denatured state is very small (5-15 kcal/mol)
- (this is equivalent to 1 or 2 H-bonds!)
- So: the assumption of prediction methods that the lowest free energy structure is "native" doesn't help a lot! There may be many "decoy structures" with very similar "energy" scores
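A quick Boltzmann calculation shows why a 5-15 kcal/mol margin still keeps almost all molecules folded, and why small energy gaps between competing structures matter: the population ratio of two states is exp(-ΔG/RT). A minimal sketch, assuming simple two-state (folded/unfolded) equilibrium at 25 °C and, for the decoy line, treating an energy gap as if it were a true free-energy difference (prediction scores generally are not); function names are illustrative.

```python
import math

R = 1.987e-3  # gas constant, kcal/(mol*K)


def fraction_unfolded(dg_unfold: float, temp_k: float = 298.15) -> float:
    """Two-state model: fraction unfolded given ΔG of unfolding (kcal/mol)."""
    k = math.exp(-dg_unfold / (R * temp_k))  # equilibrium constant [U]/[F]
    return k / (1.0 + k)


def population_ratio(delta_e: float, temp_k: float = 298.15) -> float:
    """Relative population of a state lying delta_e kcal/mol above another."""
    return math.exp(-delta_e / (R * temp_k))


for dg in (5.0, 10.0, 15.0):
    print(dg, f"{fraction_unfolded(dg):.1e}")
print(round(population_ratio(0.5), 2))  # a "decoy" only 0.5 kcal/mol higher
```

Even at the low end (5 kcal/mol) only about 2 in 10,000 molecules are unfolded at a time, yet a competing structure just 0.5 kcal/mol higher would still be populated at over 40% of the level of the lowest state.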
Protein Structure Prediction
- Structure is largely determined by sequence
- BUT
- Similar sequences can assume different structures
- Dissimilar sequences can assume similar structures
- Many proteins are multi-functional
- 2 Major Protein Folding Problems
- 1) Determination of folding pathway
- 2) Prediction of tertiary structure from sequence
- Both still largely unsolved problems
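"Similar" and "dissimilar" sequences are usually quantified as percent identity over a pairwise alignment. A minimal sketch, assuming the two sequences are already aligned (equal length, '-' for gaps) and counting identity over columns where neither sequence has a gap; other conventions divide by alignment length or by the shorter sequence instead. `percent_identity` is an illustrative name.

```python
def percent_identity(aln_a: str, aln_b: str) -> float:
    """Percent identical residues over ungapped columns of a pairwise alignment."""
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have equal length")
    cols = [(x, y) for x, y in zip(aln_a.upper(), aln_b.upper())
            if x != "-" and y != "-"]  # skip columns containing a gap
    if not cols:
        return 0.0
    return 100.0 * sum(x == y for x, y in cols) / len(cols)


print(percent_identity("AC-DE", "ACGDF"))  # -> 75.0
```

Below roughly 20-35% identity (the commonly cited "twilight zone"), sequence identity alone is not a reliable guide to whether two proteins share a fold, which is exactly the "dissimilar sequences, similar structures" case.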