Title: Postgenomic Era
1. Postgenomic Era
- Having learned how to sequence entire genomes, we want to move beyond just the sequence
- First major area: Proteomics
  - Study of all proteins within a cell/organism
- One clear difficulty
  - The proteome is tissue dependent
  - Every one of your cells has the identical genome, but the proteome of your cells varies by both time and space and can be environmentally dependent
2. Protein Function
- Protein functions are often divided into three subparts
- Biological Process
  - Why is the protein doing what it is doing; what is the goal? (e.g., lactose metabolism)
- Molecular Function
  - What does the protein do? (e.g., it's an enzyme that helps convert sugar A to sugar B)
- Cellular Component
  - Where is the protein expressed and active? (e.g., pancreatic cells)
- Can any of these be answered computationally?
3. Molecular Function
- In order to understand what a protein actually does, we need to know what it looks like
- About 20,000 proteins have experimentally determined structures
- As of June 2003, more than 1 million protein sequences had been determined
4. Protein Structure
- Protein structure can be thought of in multiple dimensions
- Primary Structure: protein sequence
  - Human C2 protein
- MGPLMVLFCLLFLYPGLADSAPSCPQNVNISGGTFTLSHGWAPGSLLTYS
CPQGLYPSPASRLCKSSGQWQTPGATRSLSKAVCKPVRCPAPVSFENGIY
TPRLGSYPVGGNVSFECEDGFILRGSPVRQCRPNGMWDGETAVCDNGAGH
CPNPGISLGAVRTGFRFGHGDKVRYRCSSNLVLTGSSERECQGNGVWSGT
EPICRQPYSYDFPEDVAPALGTSFSHMLGATNPTQKTKESLGRKIQIQRS
GHLNLYLLLDCSQSVSENDFLIFKESASLMVDRVRNQESACSRGLPVLTI
SLCLLPLLRTPLTAHLLQEVFSDYTHAM
5. Protein Structure
- Can be sequenced directly or derived from DNA sequence
- Molecular structure of proteins: amino acid chain
- Amino acid
  - R = the side chain; this is what makes one amino acid different from another
  - Glycine: R = a single hydrogen atom (H)
  - Alanine: R = CH3
  - Tryptophan: R = a double ring
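
The point that a protein sequence can be derived from DNA sequence can be sketched in a few lines. This is a minimal illustration, assuming the standard genetic code, a coding-strand DNA string that is already in frame, and no introns; the names `CODON_TABLE` and `translate` and the example DNA string are made up for this sketch.

```python
# Standard genetic code, with codons ordered by base T, C, A, G at each position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Build the 64-entry codon -> one-letter amino acid lookup ('*' marks a stop codon).
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate(dna):
    """Translate an in-frame coding sequence, stopping at the first stop codon."""
    protein = []
    for pos in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TABLE[dna[pos:pos + 3].upper()]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


# Hypothetical DNA fragment, chosen only to illustrate the lookup.
print(translate("ATGGGCCCTTAA"))
```

Real genes need more care (reading frame, strand, introns), so this only shows the lookup step itself.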
6. Protein Structure
[Figure: amino acid chain, with the peptide bond labeled]
- The side chains are what give specific properties to amino acids
  - Size
  - Electric charge
  - Polarity
  - Shape and rigidity
- These properties describe how amino acids can interact
  - Example: positively and negatively charged amino acids can interact as salt bridges
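
As a small illustration of one side-chain property, the sketch below labels residues by their usual charge near neutral pH and lists the positions that could act as salt-bridge partners. It assumes the common simplification that lysine (K) and arginine (R) are positive and aspartate (D) and glutamate (E) are negative; histidine is left neutral here. Whether two residues actually form a salt bridge depends on their distance in the folded structure, which this sketch does not model. The function names are hypothetical.

```python
POSITIVE = set("KR")  # lysine, arginine
NEGATIVE = set("DE")  # aspartate, glutamate


def charge_class(amino_acid):
    """Return 'positive', 'negative', or 'neutral' for a one-letter residue code."""
    if amino_acid in POSITIVE:
        return "positive"
    if amino_acid in NEGATIVE:
        return "negative"
    return "neutral"


def charged_positions(sequence):
    """Return (positive_indices, negative_indices) for a protein sequence."""
    positive = [i for i, aa in enumerate(sequence) if aa in POSITIVE]
    negative = [i for i, aa in enumerate(sequence) if aa in NEGATIVE]
    return positive, negative


def net_charge(sequence):
    """Count positive residues minus negative residues."""
    positive, negative = charged_positions(sequence)
    return len(positive) - len(negative)


print(charged_positions("MKDEAR"))
print(net_charge("MKDEAR"))
```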
7. Protein Structure
- Secondary Structure
  - Basic folding of primary sequence into common subparts
- Backbone of a protein is made up of the non-side chain atoms
- Peptide bond (C=O, N-H) is rigid and planar
- All flexibility in the backbone is due to the bonds from the alpha carbon to the neighboring C and N, since these bonds can rotate
8. Protein Structure
- Major Secondary Structures
  - Alpha helices: both rotatable backbone bonds have angles of about -60 degrees
  - Beta strands: the two rotatable backbone bonds have angles of about -130 and 135 degrees
  - Beta strands combine to form beta sheets
  - Beta turns: the U-turn between beta strands in a beta sheet
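
The angle values above can be turned into a rough classifier for a single residue. This is only a sketch: the reference angles are the ones quoted on the slide, the 30-degree tolerance is an arbitrary assumption, and the names `classify` and `angle_diff` are hypothetical.

```python
# Reference backbone angles (degrees) taken from the values quoted above.
REFERENCE = {
    "alpha helix": (-60, -60),
    "beta strand": (-130, 135),
}


def angle_diff(a, b):
    """Smallest absolute difference between two angles, in degrees."""
    d = abs(a - b) % 360
    return min(d, 360 - d)


def classify(angle1, angle2, tolerance=30):
    """Label a residue by its two backbone angles; tolerance is an assumed window."""
    for name, (ref1, ref2) in REFERENCE.items():
        if angle_diff(angle1, ref1) <= tolerance and angle_diff(angle2, ref2) <= tolerance:
            return name
    return "other"


print(classify(-60, -60))
print(classify(-120, 140))
print(classify(60, 60))
```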
9. Protein Structure
[Figure: alpha helix]
10. Protein Structure
[Figure: beta sheet, with a single beta strand labeled]
11. Protein Structure
- Tertiary Structure
  - Folding of subparts into the full three-dimensional protein
- Quaternary Structure
  - How multiple proteins interact to form larger molecules; the interaction may cause changes in tertiary structure
12. Graphical Display
Graphical methods for displaying 3D structure can be very complicated and difficult to understand without training
13. Structural Proteomics
- Goal: to discover the exact three-dimensional location of every atom within a protein
- Experimental approach
  - Purify large amounts of the protein
  - Crystallize the protein (for X-ray crystallography)
  - Use X-ray crystallography or nuclear magnetic resonance (NMR) spectroscopy to find locations of the atoms
  - Requires computational analysis of the X-ray/NMR results
  - This is difficult and time consuming
14. X-Ray Crystallography
15. Structural Proteomics
- Computational Prediction
  - Theoretically it should be possible to predict full protein structure from a sequence
  - Every protein sequence naturally folds into shape in a fraction of a second; therefore, simply knowing the protein sequence should provide all necessary information to predict the structure
  - In practice this is extraordinarily difficult
  - One approach is to take the atomic structure of the protein (the atom-by-atom diagram of the entire amino acid chain) and calculate the arrangement with the optimal (lowest) energy under Van der Waals forces
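
To give a feel for what an energy calculation over atoms looks like, here is a minimal sketch that sums a Lennard-Jones 12-6 potential over all atom pairs. The Lennard-Jones form is a common textbook stand-in for Van der Waals interactions and is an assumption here, not something the slides specify; `epsilon`, `sigma`, and the function names are hypothetical, and a real force field would add many other terms and skip bonded pairs.

```python
import itertools
import math


def lennard_jones(r, epsilon=1.0, sigma=1.0):
    """Lennard-Jones 12-6 energy for two atoms separated by distance r."""
    ratio6 = (sigma / r) ** 6
    return 4 * epsilon * (ratio6 ** 2 - ratio6)


def total_energy(coords, epsilon=1.0, sigma=1.0):
    """Sum the pair energy over every pair of atoms in a list of (x, y, z) tuples."""
    return sum(
        lennard_jones(math.dist(a, b), epsilon, sigma)
        for a, b in itertools.combinations(coords, 2)
    )


# Two atoms at the minimum-energy separation r = 2**(1/6) * sigma.
r_min = 2 ** (1 / 6)
print(total_energy([(0.0, 0.0, 0.0), (r_min, 0.0, 0.0)]))
```

A structure search would have to evaluate something like `total_energy` for an enormous number of candidate conformations, which is where the difficulty comes from.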
16. Structural Proteomics
- No computers that exist can currently do this
- One of the fastest computers in the world is being designed by IBM to solve this very problem (started in 1999)
  - Called Blue Gene
  - $100 million development costs
- We may still be decades or more away from this approach succeeding
17. Secondary Structure Prediction
- What can be done today?
- Secondary structure prediction
- Chou-Fasman algorithm
  - There are many others, but this is one of the most common ones
  - Each amino acid has been assigned a set of parameters
    - P(a), P(b), P(turn), f(i), f(i+1), f(i+2), f(i+3)
  - Parameter values were estimated from a large set of proteins with known structure
  - The Ps represent the propensity of a specific amino acid to be found in alpha helices, beta strands, and beta turns
  - The fs represent the frequency with which each amino acid is found at each of the four positions of a hairpin turn
18. Secondary Structure Prediction
[Figure: example of the parameter table]
19. Secondary Structure Prediction
- Algorithm
- Alpha helices
  - Find all regions where 4 out of 6 consecutive amino acids have P(a) > 100
  - Extend each region until 4 consecutive amino acids with P(a) < 100 are reached
  - Add up P(a) and P(b) for each region
  - If the region is longer than 5 amino acids and ΣP(a) > ΣP(b), it is assumed to represent an alpha helix
- Beta strand
  - Same exact approach, but ΣP(b) > ΣP(a) and average P(b) > 100
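
The helix rules above can be sketched as follows. This is an illustration under stated assumptions, not a reference implementation: the propensity tables hold values as they are commonly quoted for this method (multiplied by 100) and should be checked against a published table; overlapping 6-residue windows are merged before extension, which is a simplification chosen here; and all function names are hypothetical.

```python
# Propensities x 100 as commonly quoted; treat as illustrative values.
P_ALPHA = {"E": 151, "M": 145, "A": 142, "L": 121, "K": 116, "F": 113, "Q": 111,
           "W": 108, "I": 108, "V": 106, "D": 101, "H": 100, "R": 98, "T": 83,
           "S": 77, "C": 70, "Y": 69, "N": 67, "P": 57, "G": 57}
P_BETA = {"V": 170, "I": 160, "Y": 147, "F": 138, "W": 137, "L": 130, "C": 119,
          "T": 119, "Q": 110, "M": 105, "R": 93, "N": 89, "H": 87, "A": 83,
          "S": 75, "G": 75, "K": 74, "P": 55, "D": 54, "E": 37}


def nucleation_sites(seq, prop, window=6, needed=4, cutoff=100):
    """Start indices of windows where at least `needed` residues exceed the cutoff."""
    return [i for i in range(len(seq) - window + 1)
            if sum(prop[aa] > cutoff for aa in seq[i:i + window]) >= needed]


def merge_windows(starts, window=6):
    """Merge overlapping windows into half-open [start, end) regions (a simplification)."""
    regions = []
    for s in starts:
        if regions and s <= regions[-1][1]:
            regions[-1][1] = max(regions[-1][1], s + window)
        else:
            regions.append([s, s + window])
    return regions


def extend(seq, prop, start, end, run=4, cutoff=100):
    """Grow the region one residue at a time in each direction, stopping when the
    next `run` residues (fewer at a chain end) are all below the cutoff."""
    while end < len(seq) and not all(prop[aa] < cutoff for aa in seq[end:end + run]):
        end += 1
    while start > 0 and not all(prop[aa] < cutoff for aa in seq[max(0, start - run):start]):
        start -= 1
    return start, end


def predict_helices(seq):
    """Return half-open (start, end) regions that pass the helix test."""
    helices = []
    for start, end in merge_windows(nucleation_sites(seq, P_ALPHA)):
        start, end = extend(seq, P_ALPHA, start, end)
        region = seq[start:end]
        sum_a = sum(P_ALPHA[aa] for aa in region)
        sum_b = sum(P_BETA[aa] for aa in region)
        if len(region) > 5 and sum_a > sum_b:
            helices.append((start, end))
    return helices


# Made-up test sequence: a helix-friendly core flanked by helix breakers.
print(predict_helices("GPSGAEMLKAGPSG"))
```

The beta-strand pass would reuse the same functions with `P_BETA` as the driving table and the comparison reversed.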
20. Secondary Structure Prediction
- If helix and sheet predictions overlap, the one with the greater sum (ΣP(a) vs. ΣP(b)) is assumed to be correct
- Beta turns
  - For each amino acid, calculate the turn probability by multiplying f(i) for that amino acid, f(i+1) for the next amino acid, f(i+2) for the one after, and f(i+3) for the one after that
  - Assumed to be a hairpin turn if
    - Product is > 0.00075
    - The mean P(turn) for all four amino acids is > 100
    - ΣP(turn) > ΣP(a) and ΣP(b) for all 4 amino acids
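
The three turn conditions above can be checked for a four-residue window as sketched below. The parameter tables are placeholder numbers made up for this example, not published parameters, and cover only four residue types; the cutoff is passed in by the caller rather than fixed in the code. Function and table names are hypothetical.

```python
# Placeholder numbers for illustration only; NOT published parameters.
# F_TOY[aa] = (f(i), f(i+1), f(i+2), f(i+3)) for residue aa.
F_TOY = {
    "N": (0.16, 0.08, 0.19, 0.09),
    "P": (0.10, 0.30, 0.03, 0.07),
    "G": (0.10, 0.09, 0.19, 0.15),
    "A": (0.06, 0.08, 0.04, 0.06),
}
# P_TOY[aa] = (P(a), P(b), P(turn)) for residue aa.
P_TOY = {
    "N": (67, 89, 156),
    "P": (57, 55, 152),
    "G": (57, 75, 156),
    "A": (142, 83, 66),
}


def turn_product(four, f):
    """Multiply f(i), f(i+1), f(i+2), f(i+3) for four consecutive residues."""
    product = 1.0
    for offset, aa in enumerate(four):
        product *= f[aa][offset]
    return product


def is_hairpin_turn(seq, i, f, p, cutoff):
    """Apply the three turn conditions to the window seq[i:i+4]."""
    four = seq[i:i + 4]
    if len(four) < 4:
        return False
    sum_a = sum(p[aa][0] for aa in four)
    sum_b = sum(p[aa][1] for aa in four)
    sum_turn = sum(p[aa][2] for aa in four)
    return (
        turn_product(four, f) > cutoff
        and sum_turn / 4 > 100
        and sum_turn > sum_a
        and sum_turn > sum_b
    )


# Cutoff value as quoted in the conditions above.
print(is_hairpin_turn("AANPGGAA", 2, F_TOY, P_TOY, cutoff=0.00075))
print(is_hairpin_turn("AANPGGAA", 0, F_TOY, P_TOY, cutoff=0.00075))
```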
21. Secondary Structure Prediction
- Many other possible algorithms
  - Hidden Markov Models, neural networks, etc.
- Best methods tend to correctly identify about 75% of secondary structures
22. Tertiary Structure
- Much more difficult than secondary structure
- Important features
  - Hydrophobicity
  - Disulfide bonds
    - Cysteine residues cross-link to form a very strong, stabilizing bond
    - Cysteine residues are often more strongly conserved than other amino acids and can be extremely important for alignment
- Interacting amino acids can be fairly close or very far apart in the protein chain!
23. Tertiary Structure
- Three basic approaches
- Homology modeling
  - BLAST the protein for a similar sequence with known structure
  - If the protein sequences are more than 30% identical, align (or multiple align) and use the known structure as a template for estimating the structure of the unknown protein
  - Order of construction
    - Model core backbone
    - Model loops
      - Loops will likely vary more between homologous proteins than the core backbone
    - Model side chains
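
The identity check that gates homology modeling can be sketched as below. It assumes the two sequences are already aligned to the same length with `-` marking gaps, and it counts identity over columns where neither sequence has a gap; other denominators are also in use, so this convention is an assumption. The function names and example strings are hypothetical.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent of gap-free alignment columns where the two residues match."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    columns = [(x, y) for x, y in zip(aligned_a, aligned_b) if x != gap and y != gap]
    if not columns:
        return 0.0
    matches = sum(x == y for x, y in columns)
    return 100.0 * matches / len(columns)


def usable_as_template(aligned_a, aligned_b, threshold=30.0):
    """Apply the 'more than 30% identical' rule of thumb quoted above."""
    return percent_identity(aligned_a, aligned_b) > threshold


# Made-up aligned fragments for illustration.
print(percent_identity("MGPL-MV", "MGALQMV"))
print(usable_as_template("MGPL-MV", "MGALQMV"))
```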
24. Tertiary Structure
- How good is homology modeling?
  - When > 50% of the sequence is identical, results are usually excellent
  - Alignment error is one of the main causes of prediction mistakes
25. Tertiary Structure
- Second approach: Fold Recognition
  - Similar to a BLAST search, but where you're searching for secondary structure patterns (rather than the primary sequence) in a database
  - Essentially trying to align folds rather than amino acids
  - If you find proteins with similar secondary structure patterns, you assume the tertiary structures are similar
  - There are known cases of proteins with almost no similarity in primary sequence having extremely similar three-dimensional shapes
- One form of advanced fold recognition is known as Threading
  - Does a more thorough search than standard fold recognition; sort of an optimization algorithm for protein folding
26. Tertiary Structure
- Third approach: ab initio prediction
  - Predict structure from physical properties
  - Goal of IBM's Blue Gene project
  - Derive secondary structure from amino acid sequence (alpha helices, beta sheets, coils, etc.)
  - Fold these structures into tertiary structures based on physical properties
  - Currently these methods are restricted to the carbon atoms of the peptide backbone; side chains are still too difficult
  - Accuracy is relatively poor compared to other methods
  - Accuracy is measured as the difference in angstroms between predicted and experimentally determined locations of atoms
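
One common way to express that difference as a single number is the root-mean-square deviation (RMSD) over matched atoms. The sketch below assumes RMSD is the measure meant, that both structures list the same atoms in the same order, and that they have already been superimposed; the function name and coordinates are hypothetical.

```python
import math


def rmsd(predicted, experimental):
    """Root-mean-square deviation between two equal-length lists of (x, y, z) tuples."""
    if len(predicted) != len(experimental) or not predicted:
        raise ValueError("need two non-empty coordinate lists of equal length")
    total = 0.0
    for (x1, y1, z1), (x2, y2, z2) in zip(predicted, experimental):
        total += (x1 - x2) ** 2 + (y1 - y2) ** 2 + (z1 - z2) ** 2
    return math.sqrt(total / len(predicted))


# Made-up coordinates in angstroms: the second atom is off by 2 along y.
predicted = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
experimental = [(0.0, 0.0, 0.0), (1.0, 2.0, 0.0)]
print(rmsd(predicted, experimental))
```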
27. Protein Prediction Competition
- CASP: Critical Assessment of Structure Prediction
  - Held every two years
  - Contest for protein structure prediction
  - A group experimentally determines the structure of novel proteins
  - Sequences are published, but the structures are not
  - Outside groups try to predict the structures and submit their results
  - A big meeting/party is held to announce how everyone did
28. RNA Structure
- Similar idea for the most part
- Secondary structure consists of complementary base pairing between a single-stranded sequence and itself
- Tertiary structure has similar problems to those found with proteins
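
The idea of a single strand pairing with itself can be illustrated with a small dynamic program that finds the largest number of nested base pairs (a Nussinov-style recurrence). This technique is brought in here as an example and is not named in the slides. It assumes Watson-Crick pairs plus G-U wobble pairs, a minimum hairpin loop of 3 unpaired bases, and no energy model; the function name is hypothetical.

```python
# Allowed pairs: Watson-Crick plus G-U wobble (the wobble pairs are an assumption).
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def max_base_pairs(seq, min_loop=3):
    """Maximum number of nested base pairs in an RNA sequence."""
    n = len(seq)
    if n == 0:
        return 0
    # best[i][j] = most pairs that can form within seq[i..j] (inclusive).
    best = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            score = best[i][j - 1]  # case: base j stays unpaired
            for k in range(i, j - min_loop):  # case: base j pairs with base k
                if (seq[k], seq[j]) in PAIRS:
                    left = best[i][k - 1] if k > i else 0
                    score = max(score, left + 1 + best[k + 1][j - 1])
            best[i][j] = score
    return best[0][n - 1]


# Made-up hairpin: three G-C pairs closing a four-base loop.
print(max_base_pairs("GGGAAAUCCC"))
```

Counting pairs is a crude score; practical tools minimize an estimated free energy instead, but the self-pairing structure of the problem is the same.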