Title: Introduction to Structural Biology
1. Introduction to Structural Biology
2. Lecture 1: Introduction to Structural Biology
Adva Yeheskel, Bioinformatics Unit, 001 Sherman Bldg., Faculty of Life Science, TAU
Tel: 03-640 6840
E-mail: suezadva_at_tauex.tau.ac.il
Bioinfo. Unit webpage: http://bioinfo.tau.ac.il/
3. Proteins Workshops Series
Semester | Lecture | Subject
1st semester | 1 | Introduction to structural biology
1st semester | 2 | Advanced Tools for Protein Structure Visualization
1st semester | 3 | ConSurf: Identification of Functional Regions in Proteins by Surface-Mapping of Phylogenetic Information
2nd semester | 4 | Structural Alignment of Macromolecules
2nd semester | 5 | Modeling of Protein Structures using Nest
2nd semester | 6 | Docking of Small Molecules using Accelrys' Discovery Studio
2nd semester | 7 | Searching for Homologous Proteins using BLAST
2nd semester | 8 | A Suite of Tools for Protein-Protein Docking
http://bioinfo.tau.ac.il
4. Protein Principles
- Proteins reflect millions of years of evolution.
- Most proteins belong to large evolutionary families.
- 3D structure is better conserved than sequence during evolution.
- Similarities between sequences or between structures may reveal information about shared biological functions of a protein family.
5. http://expasy.org/sprot/
6. http://www.ncbi.nlm.nih.gov/sites/entrez?db=protein
7. How can we determine the function of an uncharacterized protein sequence?
MGENDPPAVEAPFSFRSLFGLDDLKISPVAPDADAVAAQILSLLPLKFFP
IIVIGIIALILALAIGLGIHFDCSGKYRCRSSFKCIELIARCDGVSDCKD
GEDEYRCVRVGGQNAVLQVFTAASWKTMCSDDWKGHYANVACAQLGFPSY
VSSDNLRVSSLEGQFREEFVSIDHLLPDDKVTALHHSVYVREGCASGHVV
TLQCTACGHRRGYSSRIVGGNMSLLSQWPWQASLQFQGYHLCGGSVITPL
WIITAAHCVYDLYLPKSWTIQVGLVSLLDNPAPSHLVEKIVYHSKYKPKR
LGNDIALMKLAGPLTFNEMIQPVCLPNSEENFPDGKVCWTSGWGATEDGA
GDASPVLNHAAVPLISNKICNHRDVYGGIISPSMLCAGYLTGGVDSCQGD
SGGPLVCQERRLWKLVGATSFGIGCAEVNKPGVYTRVTSFLDWIHEQMER
DLKT
8. Tell me who your friends are and I will tell you who you are
- Find homologues
- Predict conserved domains
- Predict structure
- Other
9. Example: Human EGFR
http://www.uniprot.org/uniprot/P00533
10. Higher-Level Structures: Motifs and Domains
A motif is a simple combination of a few secondary structures that appears in several different proteins in nature. A collection of motifs forms a domain.
A domain is a more complex combination of secondary structures. It has a very specific function (for example, it contains an active site). A protein may contain more than one domain.
11. Grouping of Secondary Structure Elements: Super-secondary Structures or Motifs
- alpha-alpha
- beta-hairpin
- beta-alpha-beta
- beta-barrels
http://www.expasy.org/swissmod/course/text/chapter4.htm
12. http://www.expasy.ch/prosite/
PROSITE helps determine the function of an uncharacterized protein and the known protein family to which it belongs. A pattern describes a group of amino acids that constitutes a usually short but characteristic motif within a protein sequence.
For example, the pattern [AC]-x-V-x(4)-{ED}. is interpreted as: [Ala or Cys]-any-Val-any-any-any-any-{any but Glu or Asp}.
Note: Search by full text.
13. PROSITE Syntax
For example, the pattern [AC]-x-V-x(4)-{ED}. is interpreted as: [Ala or Cys]-any-Val-any-any-any-any-{any but Glu or Asp}.
- Amino acids are written in the standard one-letter code.
- 'x': any amino acid.
- '[ ]': residues allowed at the position.
- '{ }': residues forbidden at the position.
- '( )': repetition of a pattern element is indicated in parentheses; x(n) or x(n,m) gives the number or range of repetitions.
- '-': separates each pattern element.
- '<': indicates an N-terminal restriction of the pattern.
- '>': indicates a C-terminal restriction of the pattern.
- '.': the period ends the pattern.
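The syntax rules above map almost one-to-one onto regular expressions. The following is a minimal sketch of that mapping in Python, not the code PROSITE itself uses. The function name `prosite_to_regex` is hypothetical, and the sketch assumes well-formed patterns that use only the elements listed on this slide.

```python
import re

def prosite_to_regex(pattern: str) -> str:
    """Translate a PROSITE pattern into a Python regular expression.

    Illustrative helper (hypothetical name); it handles only the
    elements listed on the syntax slide.
    """
    # The trailing period only marks the end of the pattern.
    pattern = pattern.strip().rstrip(".")
    parts = []
    for element in pattern.split("-"):      # '-' separates pattern elements
        element = element.strip()
        prefix = suffix = ""
        if element.startswith("<"):         # N-terminal restriction
            prefix, element = "^", element[1:]
        if element.endswith(">"):           # C-terminal restriction
            suffix, element = "$", element[:-1]
        # Repetition: x(n) or x(n,m)
        repeat = ""
        m = re.fullmatch(r"(.+?)\((\d+)(?:,\s*(\d+))?\)", element)
        if m:
            element = m.group(1)
            if m.group(3) is None:
                repeat = "{%s}" % m.group(2)
            else:
                repeat = "{%s,%s}" % (m.group(2), m.group(3))
        if element.lower() == "x":          # any amino acid
            core = "."
        elif element.startswith("{") and element.endswith("}"):
            core = "[^" + element[1:-1] + "]"   # forbidden residues
        else:
            core = element                  # '[...]' or a single residue
        parts.append(prefix + core + repeat + suffix)
    return "".join(parts)

regex = prosite_to_regex("[AC]-x-V-x(4)-{ED}.")
print(regex)
print(re.search(regex, "GGAKVLLLLKGG"))
```

The resulting string can be passed to `re.search` or `re.finditer` to look for the motif in any one-letter protein sequence.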
14. PROSITE Patterns
- Consensus sequences and patterns are regular expressions that can be used like fingerprints.
E.g. PROSITE pattern PS00001 (N-glycosylation site): N-{P}-[ST]-{P}
MGENDPPAVEAPFSFRSLFGLDDLKISPVAPDADAVAAQILSLLPLKFFP
IIVIGIIALILALAIGLGIHFDCSGKYRCRSSFKCIELIARCDGVSDCKD
GEDEYRCVRVGGQNAVLQVFTAASWKTMCSDDWKGHYANVACAQLGFPSY
VSSDNLRVSSLEGQFREEFVSIDHLLPDDKVTALHHSVYVREGCASGHVV
TLQCTACGHRRGYSSRIVGGNMSLLSQWPWQASLQFQGYHLCGGSVITPL
WIITAAHCVYDLYLPKSWTIQVGLVSLLDNPAPSHLVEKIVYHSKYKPKR
LGNDIALMKLAGPLTFNEMIQPVCLPNSEENFPDGKVCWTSGWGATEDGA
GDASPVLNHAAVPLISNKICNHRDVYGGIISPSMLCAGYLTGGVDSCQGD
SGGPLVCQERRLWKLVGATSFGIGCAEVNKPGVYTRVTSFLDWIHEQMER
DLKT
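As an illustration of using a pattern like a fingerprint, the sketch below scans a sequence for the PS00001 N-glycosylation pattern, N-{P}-[ST]-{P}, written directly as a Python regular expression. The function name is hypothetical, the fragment is a short stretch copied from the example sequence above, and this is not the ScanProsite implementation.

```python
import re

# PS00001, N-{P}-[ST]-{P}, as a regular expression.
# The lookahead (?=...) consumes no characters, so overlapping
# sites are reported as well.
N_GLYCOSYLATION = re.compile(r"(?=(N[^P][ST][^P]))")

def find_n_glycosylation_sites(sequence):
    """Return (1-based position, matched residues) for each candidate site.

    Hypothetical helper for illustration only.
    """
    # Drop whitespace and line breaks, normalise to upper case.
    seq = "".join(sequence.split()).upper()
    return [(m.start() + 1, m.group(1)) for m in N_GLYCOSYLATION.finditer(seq)]

# A short fragment taken from the example sequence on this slide.
fragment = "RIVGGNMSLLSQWPWQASLQ"
for position, site in find_n_glycosylation_sites(fragment):
    print(position, site)
```

A pattern hit only marks a candidate site. Short patterns such as this one match frequently by chance, so hits should be read as hints, not as evidence that the site is actually glycosylated.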
15. http://www.uniprot.org/uniprot/P00533
16. PROSITE Scan on ExPASy
http://www.expasy.org/tools/scanprosite/
17. http://pfam.sanger.ac.uk/
18. http://www.uniprot.org/uniprot/P00533
20. http://www.ebi.ac.uk/interpro/
http://www.ebi.ac.uk/InterProScan/
21. http://www.uniprot.org/uniprot/P00533
22. Protein Structures
- Primary: amino acid sequence.
- Secondary: alpha helices, beta sheets, loops.
- Tertiary: arrangement of secondary elements in 3D space.
- Quaternary: packing of several polypeptide chains.
Given an amino acid sequence, we are interested in its secondary structures and how they are arranged in higher-level structures.
23. Protein Structures
- Proteins are fundamental components of all living cells.
- The critical feature of a protein is its ability to adopt the right shape for carrying out a particular function.
- Identifying a protein's shape (structure) is key to understanding its biological function and its role in health and disease, and to finding the right cure.
- Amino acid chains can fold in a variety of ways. Only one of these folds allows a protein to function properly.
24. How do Proteins Acquire the Correct Conformation?
- The primary amino acid sequence is crucial in determining the protein's final structure.
- In some cases, additional interactions may be required before a protein can attain its final conformation (for example, cofactors, one or more subunits).
- Proteins can change their shape and function depending on the environmental conditions in which they are found. The primary amino acid sequence does not change.
25. A Major Challenge of Bioinformatics
The challenge: understand the relationship between amino acid sequence and the 3D structure of proteins, that is, predict 3D structure from sequence.
Unfortunately, the relationship between sequence and structure is very complicated, and current tools perform this task poorly. The best performance (so far) is achieved by using sequence homology to a known 3D structure that was determined experimentally (by X-ray crystallography or NMR).
26. The Structural Prediction Problem
Given a protein sequence, compute its structure.
- Possible in principle.
- Astronomical, highly under-constrained search space.
- Biophysics complex and incomplete.
- Next to impossible in practice.
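To get a rough feel for "astronomical", here is a back-of-the-envelope count. It assumes, purely for illustration and not as a figure from this lecture, that each residue can independently adopt one of 3 backbone conformations, so a chain of n residues has 3^n conformations. Both function names are hypothetical.

```python
import math

def count_conformations(n_residues, states_per_residue=3):
    """Exact number of conformations under the independence assumption.

    states_per_residue=3 is an illustrative assumption, not a measured value.
    """
    return states_per_residue ** n_residues

def log10_conformations(n_residues, states_per_residue=3):
    """Order of magnitude (base-10 logarithm) of the conformation count."""
    return n_residues * math.log10(states_per_residue)

# A small 100-residue protein already gives a count on the order of 10^47.
print(f"100 residues: about 10^{log10_conformations(100):.0f} conformations")
```

Even with such a crude model, exhaustive enumeration is out of reach for chains of realistic length, which is why practical methods lean on similarity to known structures.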
27. How is the 3D structure determined?
1. Experimental methods (best approach)
   - X-ray crystallography: stable fold, good-quality crystals.
   - NMR: stable fold, not suitable for large molecules.
2. In-silico methods (partial solutions, based on similarity)
   - Sequence or profile alignment: uses similar sequences, limited use of 3D information.
   - Threading: needs 3D structure, combinatorial complexity.
   - Ab-initio structure prediction: not always successful.
http://www.idi.ntnu.no/grupper/KS-grp/microarray/slides/drablos/Fold_recognition/sld004.htm
28. PDB Content Growth
As of Wednesday, February 11, 2009
http://www.rcsb.org/pdb/
29. Solved structures of EGFR
Click here to open 2J5F in the PDB database
http://www.uniprot.org/uniprot/P00533
30. PDB - DataBank of Protein Structures
PDB tutorial: http://www.pdb.org/pdb/tutorials/tutorial.html
31. Solved structures of EGFR
Click here to open 2J5F in the PDBsum database
http://www.uniprot.org/uniprot/P00533
32. http://www.ebi.ac.uk/pdbsum/2J5F