1
Bioinformatics: Applications
  • ZOO 4903
  • Fall 2006, MW 10:30-11:45
  • Sutton Hall, Room 312
  • Jonathan Wren
  • Proteomics II: Motifs and Domains

2
Lecture overview
  • What we've talked about so far
  • High-throughput detection of protein abundance
    and identification
  • Overview
  • Proteins have constituent components and
    functional structures
  • Proteins can be modified post-translationally

3
3 Kinds of Proteomics
  • Expression or Analytical Proteomics
  • 2 dimensional electrophoresis gels
  • Mass Spectrometry, Microsequencing
  • Functional or Interaction Proteomics
  • Protein domains and motifs
  • Post-translational modifications
  • Structural Proteomics
  • High throughput X-ray Crystallography/Modeling
  • High throughput NMR Spectroscopy/Modeling

4
How do proteins evolve?
  • Proteins couldn't have begun with the functions
    (or size) they have today

5
How do proteins evolve?
  • Proteins couldn't have begun with the functions
    (or size) they have today
  • Proteins can be broken down into constituent
    components

6
The Protein Parts List
7
40-60% of proteins in the human genome are of
unknown function
8
Protein structures
  • Primary, secondary, tertiary, quaternary

9
Methods of grouping proteins
  • Protein motif
  • Protein domain
  • 3-D structure
  • Whole-protein

10
Protein Domains
  • Group of residues with high contact density: the
    number of contacts within a domain is higher than
    the number of contacts between domains
  • A stable unit of protein structure that can fold
    autonomously
  • A rigid body linked to other domains by flexible
    linkers
  • A portion of the protein that can be
    active/stable on its own if you remove it from
    the rest of the protein
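
The contact-density definition above can be illustrated with a minimal sketch. The coordinates, the domain labels, and the 8 Å distance cutoff are all hypothetical values chosen for illustration; they do not come from the lecture.

```python
from itertools import combinations
import math

def count_contacts(coords, labels, cutoff=8.0):
    """Count residue pairs closer than `cutoff`, split into
    within-domain and between-domain contacts."""
    within = between = 0
    for i, j in combinations(range(len(coords)), 2):
        if math.dist(coords[i], coords[j]) <= cutoff:
            if labels[i] == labels[j]:
                within += 1
            else:
                between += 1
    return within, between

# Hypothetical toy data: two compact groups of four residues each,
# placed side by side so that only a few pairs touch across the gap.
coords = [(0, 0, 0), (4, 0, 0), (0, 4, 0), (4, 4, 0),
          (10, 0, 0), (14, 0, 0), (10, 4, 0), (14, 4, 0)]
labels = ["A"] * 4 + ["B"] * 4

within, between = count_contacts(coords, labels)
print(within, between)
```

A domain assignment is plausible under this definition when the within-domain count clearly exceeds the between-domain count, as it does for this toy layout.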

11
Protein Domains
  • The term "fold" is commonly used in the context
    of a 3D structure
  • Together, a group of proteins that share a
    particular domain is known as a family
  • Domains are often further qualified with respect
    to function
  • Zinc finger: binds DNA
  • Intracellular domain: soluble cytoplasmic
    protein
  • Extracellular domain: found on the outside of
    the cell membrane

12
Protein Domains
  • Domains can be 25 to 500 residues long; most are
    less than 200 residues
  • The average protein contains 2 or 3 domains
  • The total number of different types of domains
    is about 2000
  • The same or similar domains are found in
    different proteins. "Nature is a tinkerer and
    not an inventor" (Jacob, 1977).
  • Usually, each domain plays a specific role in the
    function of the protein
  • Generally, two sequences with over 30% identity
    are likely to have the same fold.
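
The identity rule of thumb above depends on how percent identity is computed. The sketch below uses one common convention (identical residues divided by the number of aligned columns in which neither sequence has a gap); other tools use different denominators, so this is an assumption rather than the definition used in the lecture. The two aligned sequences are made up.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent identity over aligned columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = matches = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == "-" or b == "-":
            continue  # skip gapped columns
        compared += 1
        matches += (a == b)
    return 100.0 * matches / compared if compared else 0.0

# Hypothetical pairwise alignment ('-' marks a gap)
seq_a = "MKT-AYIAKQ"
seq_b = "MRTGAYLA-Q"
print(percent_identity(seq_a, seq_b))
```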

13
Linkers
  • Domain linkers link the protein domains together
    and have been found to contain an amino acid
    signature that is distinct from the structurally
    compact domains
  • Average linker size: 8-9 amino acids
  • Linkers are flexible and more susceptible to
    protease attack

14
Divisibility of proteins by domain helps 3D
Structure Determination
  • X-ray crystallography
  • grow crystal
  • collect diffraction data
  • calculate electron density
  • trace chain
  • NMR spectroscopy
  • label protein
  • collect NMR spectra
  • assign spectra and NOEs
  • calculate structure using distance geometry

15
3D Structure
Proteins sharing the same fold, suggesting homology:
Beta B1 Crystallin
Gamma Crystallin C
16
Protein Domains
17
Motifs are built from Multiple Alignments
18
Motif detection via MSA
[Figure labels: alignment, tree, structure, functional site]
Lichtarge et al., JMB 1996; Lichtarge et al., JMB 1997; Lichtarge et al., PNAS 1996; Sowa et al., NSB 2001
19
Motifs are Functionally Relevant
[Figure panels: Trp1 domain of Hop, Dihydropteroate Synthase, Galectin CRD; cluster types shown: ligand binding site, ET clusters]
Structural Epitope: yellow = ligand, blue = residues within 5Å of the ligand
ET Clusters: yellow = ligand, red = largest cluster, other colors = trace residues
20
Domains/Motifs
  • These domains have conserved sequences
  • Often much more similar than their respective
    proteins
  • Exon splicing theory (Gilbert)
  • Exons correspond to folding domains which in
    turn serve as functional units
  • Unrelated proteins may share a single similar
    exon (e.g. ATPase or DNA binding function)

21
Sequence Motif regular expressions
  • The simplest method of defining short amino acid
    sequence motifs
  • Example: the nuclear receptor motif
  • C-x(2)-C-x-[DE]-x(5)-[HN]-[FY]-x(4)-C-x(2)-C-x(2)
    -F-F-x-R
  • [DE]: either D or E
  • x(5): five undefined positions
  • {FYW}: any non-aromatic amino acid
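
A pattern of this kind maps directly onto an ordinary regular expression. The sketch below assumes PROSITE-style syntax (square brackets for allowed residues, curly braces for excluded residues, x(n) for n undefined positions) and ignores the N- and C-terminal anchors that real PROSITE patterns may carry. The function name and the test sequences are hypothetical.

```python
import re

def prosite_to_regex(pattern):
    """Translate a PROSITE-style pattern into a Python regular expression."""
    parts = []
    for element in pattern.strip().rstrip(".").split("-"):
        # Split off an optional repeat count such as x(5) or x(2,4)
        m = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element)
        core, lo, hi = m.group(1), m.group(2), m.group(3)
        if core == "x":
            regex = "."                      # any residue
        elif core.startswith("[") and core.endswith("]"):
            regex = core                     # one of the listed residues
        elif core.startswith("{") and core.endswith("}"):
            regex = "[^" + core[1:-1] + "]"  # anything except the listed residues
        else:
            regex = re.escape(core)          # a single fixed residue
        if lo:
            regex += "{" + lo + ("," + hi if hi else "") + "}"
        parts.append(regex)
    return "".join(parts)

NUCLEAR_RECEPTOR = "C-x(2)-C-x-[DE]-x(5)-[HN]-[FY]-x(4)-C-x(2)-C-x(2)-F-F-x-R"
regex = prosite_to_regex(NUCLEAR_RECEPTOR)

# Hypothetical sequence built to satisfy the pattern, padded on both sides
core = "C" + "AA" + "C" + "A" + "D" + "AAAAA" + "H" + "F" + "AAAA" \
       + "C" + "AA" + "C" + "AA" + "FF" + "A" + "R"
print(regex)
print(bool(re.search(regex, "MK" + core + "LL")))
```

Because the translation is purely syntactic, any sequence that differs from the pattern at even one constrained position is not reported.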

22
Motif Patterns (Regular Expressions)
  • Signature Patterns for Functional Motifs

ProClass Motif Alignments
24
Simple domains
  • Common structural domains
  • Membrane spanning
  • Signal peptide
  • Coiled coil
  • Helix-turn-helix

25
DNA Binding domain: Zinc-Finger
26
Methyl-binding domains
MBD: methyl-CpG binding domain
27
Multidomain proteins
28
Protein domains
29
Protein domain abundance is skewed
30
Remember: Proteins Interact
31
Protein Interaction Domains
32
Proteins Assemble
33
Proteins localize
http://www.cs.ualberta.ca/bioinfo/PA/Sub/
34
Web servers that predict secondary structures
  • Predict Protein server
  • http://www.predictprotein.org/
  • TMpred (transmembrane prediction)
  • http://www.ch.embnet.org/software/TMPRED_form.html
  • COILS (coiled coil prediction)
  • http://www.ch.embnet.org/software/COILS_form.html
  • SignalP (signal peptides)
  • http://www.cbs.dtu.dk/services/SignalP/

35
Protein Domain Databases
  • Known protein domains have been collected in
    databases
  • Best database is PROSITE
  • The Dictionary of Protein Sites and Patterns
  • Maintained by Amos Bairoch, at the Univ. of
    Geneva, Switzerland
  • Contains a comprehensive list of documented
    protein domains constructed by expert molecular
    biologists
  • Alignments and patterns built by hand!
  • http://www.expasy.org/prosite/

36
PROSITE is based on Patterns
  • Each domain is defined by a simple pattern
  • Patterns can have alternate amino acids in each
    position and defined spaces, but no gaps
  • Pattern searching is by exact matching, so any
    new variant will not be found (can allow
    mismatches, but this weakens the algorithm)
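
The trade-off between exact matching and allowing mismatches can be made concrete with a small sketch. It represents a fixed-width pattern as a list of allowed-residue strings and slides it along the sequence; the pattern C-x-[DE]-H and the sequence are hypothetical, and this is not how any particular PROSITE search tool is implemented.

```python
def scan(sequence, positions, max_mismatches=0):
    """Return (start, mismatches) for every window that matches `positions`
    with at most `max_mismatches` mismatching positions.
    `positions` is a list of allowed-residue strings; "" means any residue."""
    hits = []
    width = len(positions)
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        mismatches = sum(
            1 for residue, allowed in zip(window, positions)
            if allowed and residue not in allowed
        )
        if mismatches <= max_mismatches:
            hits.append((start, mismatches))
    return hits

# Hypothetical pattern C-x-[DE]-H and a made-up sequence
pattern = ["C", "", "DE", "H"]
seq = "AACGDHKKCAQHTT"

exact_hits = scan(seq, pattern)                    # exact matching only
relaxed_hits = scan(seq, pattern, max_mismatches=1)
print(exact_hits, relaxed_hits)
```

Exact matching misses the variant CAQH; allowing one mismatch recovers it, but on real databases the same relaxation also admits many more chance hits.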

37
PIR Pattern Search
  • From Text/Sequence search result or pattern
    search interface
  • One Query Sequence Against PROSITE Pattern
    Database
  • One Query Pattern (PROSITE or User-Defined)
    Against Sequence DB

38
Pattern detection in sequence
39
Sequence search using pattern
40
PRINTS database
  • Most protein families are characterized not by
    one, but by several conserved motifs
  • Fingerprints are groups of conserved motifs
    excised from sequence alignments
  • Taken together, they provide diagnostic family
    signatures. They are the basis of the PRINTS
    database, and are stored in the form of aligned
    motifs
  • Input on protein families is done manually
  • True members match all elements of the
    fingerprint in order, subfamily members may match
    part of fingerprint
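
The "all elements in order" rule can be sketched as follows. Real PRINTS fingerprints are scored from aligned motif frequencies; here each motif is simplified to a regular expression, and the three motifs and the test sequences are hypothetical. The matching is greedy from left to right, which is a simplification.

```python
import re

def match_fingerprint(sequence, motifs):
    """Greedily match motifs left to right; return the indices of the
    motifs found in order and without overlap."""
    matched = []
    position = 0
    for index, motif in enumerate(motifs):
        hit = re.compile(motif).search(sequence, position)
        if hit:
            matched.append(index)
            position = hit.end()  # later motifs must lie downstream
    return matched

def classify(sequence, motifs):
    """Label a sequence by how much of the fingerprint it matches."""
    n = len(match_fingerprint(sequence, motifs))
    if n == len(motifs):
        return "full match"
    return "partial match" if n else "no match"

# Hypothetical three-motif fingerprint
fingerprint = ["GDSG", "W[ST]A", "C.{2}C"]

print(classify("MMGDSGKKWTALLCAACPP", fingerprint))  # all motifs, in order
print(classify("MMGDSGKKLLCAACPP", fingerprint))     # second motif missing
print(classify("MMKKLLPP", fingerprint))             # nothing found
```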

41
BLOCKS
  • The BLOCKS database uses an extension of the
    motif approach
  • block: an alignment of the motif sequences from
    a family of proteins
  • BLOCKS are used to produce the BLOSUM matrices
  • e.g. BLOSUM62 is derived from blocks in which
    sequences at least 62% identical are clustered
    together before substitutions are counted
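
How a block turns into substitution scores can be sketched with the log-odds calculation behind BLOSUM-style matrices: count residue pairs within each column, compare observed pair frequencies with those expected from the residue frequencies, and express the ratio in half-bits. This sketch deliberately omits the identity-based clustering step, uses a tiny hypothetical block, and only scores pairs that were actually observed, so its numbers are illustrative only.

```python
from collections import Counter
from itertools import combinations
import math

def log_odds_scores(block):
    """Half-bit log-odds scores from column-wise residue pairs of an
    ungapped block (a list of equal-length strings). No clustering."""
    pair_counts = Counter()
    for column in zip(*block):
        for a, b in combinations(column, 2):
            pair_counts[tuple(sorted((a, b)))] += 1
    total = sum(pair_counts.values())
    # Observed pair frequencies q_ij
    q = {pair: n / total for pair, n in pair_counts.items()}
    # Residue frequencies p_i = q_ii + (sum over j != i of q_ij) / 2
    p = Counter()
    for (a, b), freq in q.items():
        if a == b:
            p[a] += freq
        else:
            p[a] += freq / 2
            p[b] += freq / 2
    scores = {}
    for (a, b), freq in q.items():
        expected = p[a] * p[b] if a == b else 2 * p[a] * p[b]
        scores[(a, b)] = round(2 * math.log2(freq / expected))
    return scores

# Hypothetical block: four sequences, three columns
block = ["AST", "AST", "AST", "ATT"]
scores = log_odds_scores(block)
print(scores)
```

In this toy block the identities score positive and the single observed substitution (S with T) scores negative, which is the general shape of a BLOSUM matrix.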

42
Pfam
  • Pfam is a collection of alignments of protein
    domain sequences
  • Some families generated using HMMs, some created
    by hand
  • HMM: Hidden Markov Model, a statistical method
    increasingly used in gene and protein modelling
  • HMMs are rigorous algorithms which allow for
    varying gap scores

43
Integrating Pattern databases
  • InterPro - Integrated Documentation Resource of
    Protein Families, Domains and Functional Sites.
  • InterPro is a database of protein families,
    domains and functional sites in which
    identifiable features found in known proteins can
    be applied to unknown protein sequences.
  • The aim is to provide a one-stop-shop for protein
    family diagnostics

44
InterPro
  • Member Databases
  • Prosite (regular expressions and profiles)
  • Pfam, SMART, TIGRFAMs, PIRSF, PANTHER, Gene3D and
    SUPERFAMILY (hidden Markov Models - HMMs)
  • PRINTS (groups of aligned, un-weighted motifs)
  • ProDom (uses cluster analysis to group sequences)
  • Release 13.0 contains 13,147 entries and covers
    77.6% of UniProtKB (2,530,773 of 3,260,640 proteins)
  • Types of entries: Family, Domain, Repeat, PTM,
    Binding Site, Active Site
  • http://www.ebi.ac.uk/interpro/

45
Discovery of new Motifs
  • All of the tools discussed so far rely on a
    database of existing domains/motifs
  • How to discover new motifs
  • Start with a set of related proteins
  • Make a multiple alignment
  • Build a pattern or profile
  • You will need access to a fairly powerful UNIX
    computer to search databases with custom built
    profiles or HMMs.
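
The "alignment to pattern" step can be sketched in a few lines. The rule used here (one residue in a column gives a fixed position, up to three residues give a bracketed set, anything more variable becomes x) is an assumption chosen for illustration, and the four-sequence ungapped alignment is hypothetical. Real pattern building is done with far more sequences and expert review.

```python
def alignment_to_pattern(alignment, max_alternatives=3):
    """Derive a PROSITE-style pattern from an ungapped multiple alignment
    (a list of equal-length strings)."""
    elements = []
    for column in zip(*alignment):
        residues = sorted(set(column))
        if len(residues) == 1:
            elements.append(residues[0])                    # conserved
        elif len(residues) <= max_alternatives:
            elements.append("[" + "".join(residues) + "]")  # few alternatives
        else:
            elements.append("x")                            # variable
    # Merge consecutive variable positions into x(n)
    merged = []
    run = 0
    for element in elements + [None]:
        if element == "x":
            run += 1
            continue
        if run:
            merged.append("x" if run == 1 else f"x({run})")
            run = 0
        if element is not None:
            merged.append(element)
    return "-".join(merged)

# Hypothetical ungapped alignment of four related sequences
alignment = ["CAKCDH",
             "CGRCEH",
             "CLTCDH",
             "CSQCEH"]
pattern_text = alignment_to_pattern(alignment)
print(pattern_text)
```

A pattern derived this way fits the input sequences exactly, so it should be checked against known members and non-members before it is used for searching.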

46
Patterns in Unaligned Sequences
  • Sometimes sequences may share just a small common
    region
  • common signal peptide
  • new transcription factors
  • MEME (San Diego Supercomputing Facility)
  • http://meme.sdsc.edu/meme/meme-intro.html

47
Post-translational Modifications
Post-translational modification is the chemical
modification of a protein after its translation.
Translation is the process of synthesizing the
peptide chain of amino acids specified by the
nucleotide sequence on the mRNA.
48
The Central Dogma
  • Transcription
  • Translation

It is not necessary that the final product of
translation should be the final product of
protein synthesis.
49
Types of Post-translational modifications
  • Several types of PTMs have been characterized.
    Some of them:
  • Proteolytic cleavage
  • Glycosylation (N)
  • Methylation (D - E - K)
  • Phosphorylation (S - T - Y)
  • Sulfation (Y)
  • Acetylation (D - E - K)
  • Disulfide bond formation (C)
  • Carboxylation, Hydroxylation, Prenylation,
    Formylation, etc. (300 PTMs total)

50
Phosphorylation
Phosphorylation is the addition of a phosphate
(PO4) group to a protein or a small molecule
  • Phosphorylation and dephosphorylation are
    responsible for activating or deactivating many
    enzymes and receptors
  • Phosphorylation is catalyzed by various specific
    protein kinases, dephosphorylation by
    phosphatases
  • Can occur on Serine, Threonine, Tyrosine
  • >30% of all proteins are phosphorylated during
    their functional life cycle

51
Phosphorylation Sites
[Figure: phosphorylation sites pY, pT, and pS; chemical group labels: PO4 (x3), CH3]
52
Glycosylation
Glycosylation is the addition of a saccharide to a
protein or a lipid molecule
  • N-Linked Glycosylation
  • Amide nitrogen of Asparagine
  • O-Linked Glycosylation
  • Hydroxy oxygen of Serine and Threonine
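
Candidate N-linked sites are usually located with the consensus sequon N-{P}-[ST]-{P} (asparagine, then any residue except proline, then serine or threonine, then any residue except proline). The sequon is not stated on the slide; it is brought in here as background, and the test sequence is hypothetical. A match only marks a candidate site, since many sequons are never glycosylated.

```python
import re

# A lookahead is used so that overlapping sequons are all reported
SEQUON = re.compile(r"N(?=[^P][ST][^P])")

def n_glyc_candidates(sequence):
    """Return 1-based positions of Asn residues in an N-{P}-[ST]-{P} context."""
    return [m.start() + 1 for m in SEQUON.finditer(sequence)]

# Hypothetical sequence: N2 and N10/N11 sit in sequons, N6 is followed by proline
candidates = n_glyc_candidates("MNGSANPTQNNTSK")
print(candidates)
```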

53
PTMs have significant biological functions
  • Extend the range of possible functions that can
    be exhibited by a protein by introducing new
    chemical groups.
  • Alter the hydrophobicity of a protein (synthesis
    of membrane proteins).
  • Activating or inactivating an enzyme.
  • Energy metabolism
  • Oxidative phosphorylation in respiration
  • Photophosphorylation in photosynthesis
  • Signal transduction
  • Protein degradation
  • Blood coagulation
  • Immune system

54
PTMs affect protein behavior
Mann, Nat Biotech 2003
55
Post-translational modifications
56
PTMs can be characterized or predicted
  • Experimental methods
  • Crystallography
  • Mass Spectrometry
  • PTM Prediction tools
  • Auto-motif server
  • Sulfinator
  • NetPhos server
  • Predphospho server
  • eMOTIF
  • PROSITE

57
PTM detection
  • Pattern prediction (PROSITE)
  • Short or weak signal
  • Frequent hit producer
  • Best method is experimental
  • MS/MS detection
  • Most methods use "rules" joining pattern
    detection and knowledge to predict sites.
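
Why short patterns are "frequent hit producers" follows from simple arithmetic: the fewer and looser the constrained positions, the higher the chance that a random window matches. The sketch below assumes all 20 amino acids are equally likely, which real proteins are not, and uses a four-position pattern with 1, 19, 2 and 19 allowed residues (the shape of N-{P}-[ST]-{P}) as a hypothetical example.

```python
from fractions import Fraction

def chance_match_probability(allowed_counts, alphabet=20):
    """Probability that a random window matches, given the number of allowed
    residues at each pattern position (uniform composition assumed)."""
    p = Fraction(1)
    for count in allowed_counts:
        p *= Fraction(count, alphabet)
    return p

def expected_hits(allowed_counts, length, alphabet=20):
    """Expected number of chance matches in a random sequence of `length`."""
    windows = max(length - len(allowed_counts) + 1, 0)
    return float(chance_match_probability(allowed_counts, alphabet) * windows)

# Allowed residues per position for a four-position pattern: 1, 19, 2, 19
short_pattern = [1, 19, 2, 19]
print(chance_match_probability(short_pattern))
print(expected_hits(short_pattern, 500))
```

Under these assumptions a 500-residue random sequence is expected to contain a couple of matches by chance alone, which is why such predictions need experimental confirmation.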

58
ExPASy protein tools
http://www.expasy.org/tools/
  • ChloroP - Prediction of chloroplast transit
    peptides
  • LipoP - Prediction of lipoproteins and signal
    peptides in Gram negative bacteria
  • MITOPROT - Prediction of mitochondrial targeting
    sequences
  • SignalP - Prediction of signal peptide cleavage
    sites
  • NetAcet - Prediction of N-acetyltransferase A
    (NatA) substrates
  • NetOGlyc - Prediction of O-GalNAc (mucin type)
    glycosylation sites in mammalian proteins
  • NetNGlyc - Prediction of N-glycosylation sites in
    human proteins
  • YinOYang - O-beta-GlcNAc attachment sites in
    eukaryotic protein sequences
  • big-PI Predictor - GPI Modification Site
    Prediction
  • DGPI - Prediction of GPI-anchor and cleavage
    sites (Mirror site)
  • Myristoylator - Prediction of N-terminal
    myristoylation by neural networks
  • NetPhos - Prediction of Ser, Thr and Tyr
    phosphorylation sites in eukaryotic proteins
  • NMT - Prediction of N-terminal N-myristoylation
  • PrePS - Prenylation Prediction Suite
  • Sulfinator - Prediction of tyrosine sulfation
    sites
  • SUMOplot - Prediction of SUMO protein attachment
    sites
  • TermiNator - Prediction of N-terminal
    modification

59
PTM Databases
  • General PTM Databases
  • RESID
  • Unimod
  • Delta Mass
  • PTM Databases for Specific Proteins
  • Histone sequence database
  • Human Protein Reference Database
  • Plasma Proteome Database
  • Databases for Specific PTMs
  • Phospho.ELM: Phosphorylation
  • GlycoSuiteDB, SweetDB: Glycosylation

60
Limitations of current PTM databases
  • PTMs are mostly annotated in a static fashion,
    e.g., an amino acid is denoted as either modified
    or unmodified. In reality, some amino acids are
    modified under one condition, and return to their
    initial state when the condition changes.
  • The status of a specific amino acid site with
    respect to a modification is highly associated
    with biological functionality of the protein. But
    this association is often not annotated in the
    database.
  • Phosphorylation vs. signal transduction
  • Glycosylation vs. cell-cell interaction
  • Different PTMs on the same protein may be
    associated with each other. These associations
    are not annotated in the current databases either.

61
Summary
  • Many different protein signature databases exist
    (from small patterns to alignments to complex
    HMMs)
  • The quality of a database/server is best tested
    with a sequence you know very well
  • Positive controls - submit sequences for which
    you know the right answer
  • Negative controls - random or shuffled sequences
  • Many proteins function only after they have been
    further chemically modified
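
The negative-control idea can be sketched by shuffling a sequence, which keeps its amino acid composition but destroys its order, and comparing hit counts. The regular expression, the sequence, the number of shuffles, and the fixed random seed are all hypothetical choices for illustration; hits are counted without overlap.

```python
import random
import re

def shuffled_control_hits(sequence, regex, n_shuffles=1000, seed=0):
    """Return the hit count in the real sequence and the mean hit count
    over composition-preserving shuffles of it."""
    pattern = re.compile(regex)
    rng = random.Random(seed)  # fixed seed so the control is reproducible
    real = len(pattern.findall(sequence))
    residues = list(sequence)
    total = 0
    for _ in range(n_shuffles):
        rng.shuffle(residues)
        total += len(pattern.findall("".join(residues)))
    return real, total / n_shuffles

# Hypothetical sequence with three copies of a made-up motif C-x-[DE]-H
real_hits, mean_shuffled = shuffled_control_hits(
    "AACGDHKKCAQHTT" * 3, "C.[DE]H", n_shuffles=200
)
print(real_hits, mean_shuffled)
```

If the shuffled sequences produce about as many hits as the real one, the pattern is probably too weak to be informative for that sequence.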

62
For next time
  • Homework 6 due
  • Read Mount, Chapter 10