Postgenomic Era - PowerPoint PPT Presentation

Provided by: michaelsr
Transcript and Presenter's Notes

Title: Postgenomic Era


1
Postgenomic Era
  • Having learned how to sequence entire genomes, we
    want to move beyond just the sequence
  • First major area: Proteomics
  • Study of all proteins within a cell/organism
  • One Clear Difficulty
  • The proteome is tissue dependent
  • Every one of your cells has the identical genome,
    but the proteome of your cells varies by both
    time and space and can be environmentally
    dependent

2
Protein Function
  • Protein functions are often divided into three
    subparts
  • Biological Process
  • Why is the protein doing what it is doing; what is
    the goal? (e.g., lactose metabolism)
  • Molecular Function
  • What does the protein do? (e.g., it's an enzyme
    that helps convert sugar A to sugar B)
  • Cellular Component
  • Where is the protein expressed and active? (e.g.,
    pancreatic cells)
  • Can any of these be answered computationally?

3
Molecular Function
  • In order to understand what a protein actually
    does, we need to know what it looks like
  • About 20,000 proteins have experimentally
    determined structures
  • As of June 2003, more than 1 million protein
    sequences had been determined

4
Protein Structure
  • Protein structure can be thought of in multiple
    dimensions
  • Primary Structure: Protein sequence
  • Human C2 protein
  • MGPLMVLFCLLFLYPGLADSAPSCPQNVNISGGTFTLSHGWAPGSLLTYS
    CPQGLYPSPASRLCKSSGQWQTPGATRSLSKAVCKPVRCPAPVSFENGIY
    TPRLGSYPVGGNVSFECEDGFILRGSPVRQCRPNGMWDGETAVCDNGAGH
    CPNPGISLGAVRTGFRFGHGDKVRYRCSSNLVLTGSSERECQGNGVWSGT
    EPICRQPYSYDFPEDVAPALGTSFSHMLGATNPTQKTKESLGRKIQIQRS
    GHLNLYLLLDCSQSVSENDFLIFKESASLMVDRVRNQESACSRGLPVLTI
    SLCLLPLLRTPLTAHLLQEVFSDYTHAM

5
Protein Structure
  • Can be sequenced directly or derived from DNA
    sequence
  • Molecular structure of proteins = amino acid
    chain
  • Amino acid

R = the side chain; this is what makes one amino
acid different from another
Glycine: R = a single hydrogen atom (H)
Alanine: R = CH3
Tryptophan: R = a double ring
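Deriving a protein sequence from a DNA sequence, as mentioned on this slide, can be sketched in a few lines. This is only an illustration: it assumes the standard genetic code, a coding strand already in the correct reading frame, and a made-up DNA string. The names `CODON_TABLE` and `translate` are not from the presentation.

```python
# Standard genetic code, listed in T, C, A, G order at each codon position.
BASES = "TCAG"
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate(dna):
    """Translate coding DNA (frame 0) into one-letter amino acids.

    Translation stops at the first stop codon ("*" in the table).
    """
    protein = []
    for pos in range(0, len(dna) - 2, 3):
        residue = CODON_TABLE[dna[pos:pos + 3].upper()]
        if residue == "*":
            break
        protein.append(residue)
    return "".join(protein)


# Hypothetical DNA fragment: ATG GGC CCC CTG TAA
print(translate("ATGGGCCCCCTGTAA"))  # MGPL
```

Real genes add complications this sketch ignores, such as introns, alternative reading frames, and non-standard genetic codes.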
6
Protein Structure
Amino acid chain
Peptide Bond
  • The side chains are what give specific properties
    to amino acids
  • Size
  • Electric Charge
  • Polarity
  • Shape and rigidity
  • These properties describe how amino acids can
    interact
  • Example: positively and negatively charged amino
    acids can interact through salt bridges
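As a rough illustration of the charge property, the sketch below counts charged residues in a sequence and checks whether two residues carry opposite charges. It assumes the usual simplification that lysine (K) and arginine (R) are positive and aspartate (D) and glutamate (E) are negative at physiological pH; histidine is left out. The function names are hypothetical, and opposite charge alone does not guarantee a salt bridge, since the residues must also be close together in the folded protein.

```python
POSITIVE = set("KR")  # lysine, arginine
NEGATIVE = set("DE")  # aspartate, glutamate


def charge_counts(sequence):
    """Return (number of positive residues, number of negative residues)."""
    positive = sum(1 for aa in sequence if aa in POSITIVE)
    negative = sum(1 for aa in sequence if aa in NEGATIVE)
    return positive, negative


def opposite_charges(aa1, aa2):
    """True if the two residues carry opposite charges (a necessary
    condition for a salt bridge, not a sufficient one)."""
    return ((aa1 in POSITIVE and aa2 in NEGATIVE)
            or (aa1 in NEGATIVE and aa2 in POSITIVE))


print(charge_counts("MKRDE"))
print(opposite_charges("K", "D"))
```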

7
Protein Structure
  • Secondary Structure
  • Basic folding of primary sequence into common
    subparts
  • Backbone of a protein is made up of the non-side
    chain atoms
  • Peptide bond (C=O, NH) is rigid and planar
  • All flexibility in the backbone is due to the
    bonds from the alpha carbon to the neighboring C
    and N, since these bonds can rotate

8
Protein Structure
  • Major Secondary Structures
  • Alpha helices: both bonds have angles of roughly
    -60 degrees
  • Beta strands: the two bonds have angles of roughly
    -130 and 135 degrees
  • Beta strands combine to form beta sheets
  • Beta turns: the U-turn between beta strands in a
    beta sheet
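The angle values on this slide suggest a simple way to label a residue from its two backbone rotation angles (conventionally called phi and psi). The sketch below uses the slide's reference values; the 30 degree tolerance and the function name `classify_backbone` are assumptions made for illustration only.

```python
# Reference angles taken from the slide (degrees).
HELIX = (-60.0, -60.0)
STRAND = (-130.0, 135.0)


def classify_backbone(phi, psi, tolerance=30.0):
    """Label a residue 'helix', 'strand', or 'other' by how close its
    two backbone angles are to the reference values.

    The tolerance is an arbitrary illustrative choice.
    """
    def near(reference):
        return (abs(phi - reference[0]) <= tolerance
                and abs(psi - reference[1]) <= tolerance)

    if near(HELIX):
        return "helix"
    if near(STRAND):
        return "strand"
    return "other"


print(classify_backbone(-57.0, -47.0))
print(classify_backbone(-120.0, 130.0))
```

This ignores angle wrap-around at 180 degrees, which is harmless for the two reference regions used here but would matter for a general classifier.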

9
Protein Structure
Alpha Helix
10
Protein Structure
Beta Sheet
Beta Strand
11
Protein Structure
  • Tertiary Structure
  • Folding of subparts into the full three-dimensional
    protein
  • Quaternary structure
  • How multiple proteins interact to form larger
    molecules; the interaction may cause changes in
    tertiary structure

12
Graphical Display
Graphical methods for displaying 3D structure can
be very complicated and difficult to understand
without training
13
Structural Proteomics
  • Goal: To discover the exact three-dimensional
    location of every atom within a protein
  • Experimental approach
  • Purify large amounts of the protein
  • Crystallize the protein
  • Use X-ray crystallography or nuclear magnetic
    resonance (NMR) spectroscopy to find locations of
    the atoms
  • Requires computational analysis of the X-ray/NMR
    results
  • This is difficult and time consuming

14
X-Ray Crystallography
15
Structural Proteomics
  • Computational Prediction
  • Theoretically it should be possible to predict
    full protein structure from a sequence
  • Every protein sequence naturally folds into shape
    in a fraction of a second; therefore, simply
    knowing the protein sequence should provide all
    necessary information to predict the structure
  • In practice this is extraordinarily difficult
  • One approach is to take the atomic structure of
    the protein (the atom by atom diagram of the
    entire amino acid chain) and calculate the
    arrangement with the most favorable energy under
    forces between atoms such as Van der Waals forces
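To give a feel for the energy calculation, the sketch below scores a set of atom positions with the Lennard-Jones 12-6 potential, a common textbook model for Van der Waals interactions. The presentation does not say which energy function is meant, so this choice, the unit parameters `epsilon` and `sigma`, and the function names are all assumptions.

```python
import math
from itertools import combinations


def lennard_jones(r, epsilon=1.0, sigma=1.0):
    """Lennard-Jones 12-6 energy for two atoms at distance r.

    The energy is zero at r == sigma and reaches its minimum,
    -epsilon, at r == 2**(1/6) * sigma.
    """
    ratio6 = (sigma / r) ** 6
    return 4.0 * epsilon * (ratio6 * ratio6 - ratio6)


def total_energy(coords, epsilon=1.0, sigma=1.0):
    """Sum the pair energy over every pair of atoms."""
    return sum(
        lennard_jones(math.dist(a, b), epsilon, sigma)
        for a, b in combinations(coords, 2)
    )


# Three hypothetical atoms on a line, spaced at the minimum-energy distance.
spacing = 2 ** (1 / 6)
atoms = [(0.0, 0.0, 0.0), (spacing, 0.0, 0.0), (2 * spacing, 0.0, 0.0)]
print(total_energy(atoms))
```

The number of pairs grows with the square of the number of atoms, and a folding search must evaluate the energy for an enormous number of candidate shapes, which hints at why the slide calls the problem extraordinarily difficult.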

16
Structural Proteomics
  • No computers that exist can currently do this
  • One of the fastest computers in the world is
    being designed by IBM to solve this very problem
    (started in 1999)
  • Called Blue Gene
  • $100 million development costs
  • We may still be decades or more away from this
    approach succeeding

17
Secondary Structure Prediction
  • What can be done today?
  • Secondary structure prediction
  • Chou-Fasman algorithm
  • There are many others, but this is one of the
    most common ones
  • Each amino acid has been assigned a set of
    parameters
  • P(a), P(b), P(turn), f(i), f(i+1), f(i+2), f(i+3)
  • Parameter values were estimated from a large set
    of proteins with known structure
  • The P values represent the propensity of a specific
    amino acid to be found in alpha helices, beta
    strands, and beta turns
  • The f values represent the frequency with which each
    amino acid is found at each of the four positions
    of a hairpin turn

18
Secondary Structure Prediction
Example of the table
19
Secondary Structure Prediction
  • Algorithm
  • Alpha helices
  • Find all regions where 4 out of 6 consecutive
    amino acids have P(a) > 100
  • Extend each region until reaching 4 consecutive
    amino acids with P(a) < 100
  • Sum P(a) and P(b) over each region
  • If region is longer than 5 amino acids and ΣP(a)
    > ΣP(b), it is assumed to represent an alpha
    helix
  • Beta strand
  • Same exact approach, but ΣP(b) > ΣP(a) and
    average P(b) > 100
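The alpha helix steps on this slide can be sketched as follows. This is one reading of the slide, not a reference implementation of the published algorithm: the function takes per-residue P(a) and P(b) values as plain lists, so no parameter table is assumed, and the numbers in the usage example are made up. The name `predict_helices` and its keyword arguments are hypothetical.

```python
def predict_helices(pa, pb, cutoff=100, window=6, min_hits=4,
                    stop_run=4, min_len=6):
    """Return (start, end) index pairs, end exclusive, of predicted helices.

    pa and pb hold the P(a) and P(b) value of each residue in order.
    """
    n = len(pa)
    helices = []
    limit = 0  # never extend leftward into a region already examined
    i = 0
    while i + window <= n:
        # Nucleation: at least min_hits of the window residues favor helix.
        hits = sum(1 for value in pa[i:i + window] if value > cutoff)
        if hits < min_hits:
            i += 1
            continue
        start, end = i, i + window
        # Extend right until the next stop_run residues are all below cutoff.
        while end < n and not all(
                value < cutoff for value in pa[end:end + stop_run]):
            end += 1
        # Extend left under the same stopping rule.
        while start > limit and not all(
                value < cutoff
                for value in pa[max(limit, start - stop_run):start]):
            start -= 1
        # Accept if longer than 5 residues and the helix sum wins.
        if end - start >= min_len and sum(pa[start:end]) > sum(pb[start:end]):
            helices.append((start, end))
        limit = end
        i = end
    return helices


# Made-up propensities: a helix-favoring stretch flanked by weak residues.
toy_pa = [90, 90, 90, 90, 120, 120, 120, 120, 120, 120, 90, 90, 90, 90]
toy_pb = [80] * 14
print(predict_helices(toy_pa, toy_pb))  # [(2, 10)]
```

The beta strand search would reuse the same loop with the roles of P(a) and P(b) swapped, plus the extra check that the average P(b) over the region exceeds 100.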

20
Secondary Structure Prediction
  • If helix and sheet predictions overlap, the one
    with the greater sum, ΣP(a) vs. ΣP(b), is assumed
    to be correct
  • Beta turns
  • For each amino acid, calculate turn probability
    by multiplying f(i) by f(i+1) for the next amino
    acid, f(i+2) for the one after that, and f(i+3)
    for the one after that
  • Assumed to be a hairpin turn if
  • Product is > 0.000075
  • The mean P(turn) for all four amino acids is >
    100
  • ΣP(turn) > ΣP(a) and ΣP(b) for all 4 amino acids
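The three turn conditions on this slide can be combined in a short function. This is a sketch under assumptions: each residue's four positional frequencies are supplied as a tuple `(f(i), f(i+1), f(i+2), f(i+3))`, the product threshold is passed in by the caller rather than hard-coded, and the function name `predict_turns` is hypothetical.

```python
def predict_turns(f, p_turn, pa, pb, threshold):
    """Return the start index of every four-residue window predicted
    to be a hairpin turn.

    f[k] is a tuple of the four positional turn frequencies of residue k.
    p_turn, pa and pb hold each residue's P(turn), P(a) and P(b).
    """
    turns = []
    for i in range(len(f) - 3):
        # Each residue contributes the frequency for its position in the turn.
        product = f[i][0] * f[i + 1][1] * f[i + 2][2] * f[i + 3][3]
        sum_turn = sum(p_turn[i:i + 4])
        if (product > threshold
                and sum_turn / 4 > 100
                and sum_turn > sum(pa[i:i + 4])
                and sum_turn > sum(pb[i:i + 4])):
            turns.append(i)
    return turns
```

Keeping the threshold as an argument makes it easy to see how sensitive the prediction is to that single cutoff.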

21
Secondary Structure Prediction
  • Many other possible algorithms
  • Hidden Markov Models, Neural networks, etc.
  • Best methods tend to identify about 75% of
    secondary structures

22
Tertiary Structure
  • Much more difficult than secondary structure
  • Important features
  • Hydrophobicity
  • Disulfide bonds
  • Cysteine residues cross-link to form a very
    strong, stabilizing bond
  • Cysteine molecules are often more strongly
    conserved than other amino acids and can be
    extremely important for alignment
  • Interacting amino acids can be fairly close or
    very far apart in the protein chain!

23
Tertiary Structure
  • Three basic approaches
  • Homology modeling
  • BLAST protein for a similar sequence with known
    structure
  • If protein sequences are more than 30% identical,
    align (or multiple align) and use known structure
    as a template for estimating the structure of
    unknown protein
  • Order of construction
  • Model core backbone
  • Model loops
  • Loops will likely vary more between homologous
    proteins than the core backbone
  • Model side chains

24
Tertiary Structure
  • How good is homology modeling?
  • When > 50% of the sequence is identical, results
    are usually excellent
  • Alignment error is one of the main causes of
    prediction mistakes

25
Tertiary Structure
  • Second approach: Fold Recognition
  • Similar to a BLAST search, but where you're
    searching for secondary structure patterns
    (rather than the primary sequence) in a database
  • Essentially trying to align folds rather than
    amino acids
  • If you find proteins with similar secondary
    structure patterns, you assume the tertiary
    structures are similar
  • There are known cases of proteins with almost no
    similarity in primary sequence having extremely
    similar three-dimensional shapes
  • One form of advanced fold recognition is known as
    Threading
  • Does a more thorough search than standard fold
    recognition; sort of an optimization algorithm
    for protein folding

26
Tertiary Structure
  • Third approach: ab initio prediction
  • Predict structure from physical properties
  • Goal of IBM's Blue Gene project
  • Derive secondary structure from amino acid
    sequence (alpha helix, beta sheets, coils, etc.)
  • Fold these structures into tertiary structures
    based on physical properties
  • Currently these methods are restricted to the
    carbon atoms of the peptide backbone; side chains
    are still too difficult
  • Accuracy is relatively poor compared to other
    methods
  • Accuracy is measured as the difference in
    angstroms between predicted and experimentally
    determined locations of atoms
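One common way to turn this into a single number is the root-mean-square deviation (RMSD) over matched atoms. The presentation does not name a specific measure, so treating RMSD as the metric is an assumption. The sketch also assumes the two structures are already superimposed and list the same atoms in the same order; real evaluations first find the best rigid-body superposition.

```python
import math


def rmsd(predicted, observed):
    """Root-mean-square deviation between matched atom positions.

    Coordinates are (x, y, z) tuples in angstroms, so the result is
    also in angstroms.
    """
    if not predicted or len(predicted) != len(observed):
        raise ValueError("need two non-empty coordinate lists of equal length")
    total = sum(math.dist(p, q) ** 2 for p, q in zip(predicted, observed))
    return math.sqrt(total / len(predicted))


# Hypothetical two-atom structures offset by 1 angstrom along z.
print(rmsd([(0, 0, 0), (1, 0, 0)], [(0, 0, 1), (1, 0, 1)]))  # 1.0
```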

27
Protein Prediction Competition
  • CASP: Critical Assessment of Structure
    Prediction
  • Held every two years
  • Contest for protein structure prediction
  • A group experimentally determines the structure
    of novel proteins
  • Sequences are published, but the structures are
    not
  • Outside groups try to predict the structures and
    submit their results
  • A big meeting/party is held to announce how
    everyone did

28
RNA structure
  • Similar idea for the most part
  • Secondary structure consists of complementary
    base pairing between a single stranded sequence
    and itself
  • Tertiary structure poses problems similar to those
    found with proteins
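To make the idea of a sequence pairing with itself concrete, the sketch below counts the maximum number of nested base pairs a single RNA strand can form. It uses the Nussinov dynamic programming approach, which the presentation does not mention and which is brought in here purely as an illustration. It assumes Watson-Crick pairs plus G-U wobble pairs and a minimum hairpin loop of three unpaired bases; real predictors minimize free energy instead of counting pairs.

```python
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"),
         ("C", "G"), ("G", "U"), ("U", "G")}


def max_base_pairs(seq, min_loop=3):
    """Maximum number of nested base pairs in a single RNA strand."""
    n = len(seq)
    if n == 0:
        return 0
    # best[i][j] = most pairs that can form within seq[i..j], inclusive.
    best = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            score = best[i][j - 1]  # case 1: base j stays unpaired
            # case 2: base j pairs with some k, leaving a loop of at
            # least min_loop bases between them
            for k in range(i, j - min_loop):
                if (seq[k], seq[j]) in PAIRS:
                    left = best[i][k - 1] if k > i else 0
                    score = max(score, left + 1 + best[k + 1][j - 1])
            best[i][j] = score
    return best[0][n - 1]


# Hypothetical hairpin: three G-C pairs closing a four-base loop.
print(max_base_pairs("GGGAAAUCCC"))  # 3
```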