Protein Bioinformatics Course - PowerPoint PPT Presentation


Slides: 27
Provided by: Matthew610
Transcript and Presenter's Notes

Title: Protein Bioinformatics Course


1
Protein Bioinformatics Course
Matthew Betts & Rob Russell, AG Russell (Protein Evolution)
Course overview
  • Day 1 - Modularity
  • Day 2 - Interactions
  • Day 3 - Modularity & Interactions
  • Day 4 - Structure
  • Day 5 - Structure & Interactions
Daily schedule
  • 10:00-11:00 lecture
  • 11:00-12:00 work on exercises in pairs
  • 12:00-13:00 lunch
  • 13:00-15:30 work on exercises in pairs
  • 16:00-17:00 presentations by you
2
Protein Sequence Databases
3
Database Searching
  • Homologues: proteins with a common ancestor
  • Homology --> similar function
  • Sequence similarity --> homology
  • Find homologues using:
      • BLAST
      • Profile searching

4
(No Transcript)
5
Scores and E-values
How often would I expect to get a score > this one by chance alone?
How similar is my sequence to one in the database?
  • cf. random sequences
  • E = 1: one such match expected by chance
  • E < 0.01: significant
  • Depends on the database
      • size: larger is better
      • composition (random assumed)
  • Alignment
      • Substitution matrix
      • Gap penalties
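
The relationship between score, search space and E-value can be sketched with the standard Karlin-Altschul formula, E = K·m·n·exp(-λS), where m is the query length and n the total database length. This is a minimal sketch rather than what any particular BLAST version does: the values of λ and K below are illustrative assumptions (they depend on the substitution matrix and gap penalties), and the edge-effect corrections that real tools apply to m and n are ignored.

```python
import math

def bit_score(raw_score, lam, k):
    """Convert a raw alignment score S into a normalised bit score."""
    return (lam * raw_score - math.log(k)) / math.log(2)

def e_value(raw_score, query_len, db_len, lam, k):
    """Expected number of chance alignments scoring at least raw_score."""
    return k * query_len * db_len * math.exp(-lam * raw_score)

# Illustrative statistical parameters (assumed, not taken from the slides).
LAMBDA, K = 0.267, 0.041

# Same raw score and query, two hypothetical database sizes (in residues).
for db_len in (1e6, 1e8):
    e = e_value(raw_score=100, query_len=300, db_len=db_len, lam=LAMBDA, k=K)
    print(f"database residues = {db_len:.0e}  E = {e:.2e}")
```

For a fixed score the E-value scales linearly with the size of the search space, which is one concrete sense in which the result depends on the database.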

6
  • Homology comes in two main types
  • Orthology and Paralogy
  • What is the difference and why does this matter?

7
(No Transcript)
8
Different Fates
  • Orthologues
      • Both copies required (one in each species)
          • conservation of function (same gene)
          • adaptation to new environment
  • Paralogues
      • Both copies useful
          • conservation of function
      • One copy freed from selection
          • disabled
          • new function
      • Different parts of each free from selection
          • function split between them

9
  • Assignment of orthology / paralogy can be complicated by:
      • duplication preceding speciation
      • lineage-specific deletions of paralogues
      • complete genome duplications
      • many-to-one relationships
      • multi-domain proteins

10
Homology is usually found by sequence similarity,
but proteins with dissimilar sequences can still
be homologous.
Betts, Guigo, Agarwal, Russell, EMBO J 2001
11
Proteins are modular
Since the early 1970s it has been observed that
protein structures are divided into discrete
elements or domains that appear to fold, function
and evolve independently.
12
Given a sequence, what should you look for?
  • Functional domains (Pfam, SMART, COGs, CDD, etc.)
  • Intrinsic features
      • Signal peptides, transit peptides (SignalP)
      • Transmembrane segments (TMpred, etc.)
      • Coiled-coils (COILS server)
      • Low-complexity regions, disorder (e.g. SEG,
        DisEMBL)
  • Hints about structure?
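
As a rough illustration of what a low-complexity filter looks for, the sketch below flags sequence windows with low Shannon entropy. It is a simplified stand-in and not the SEG algorithm itself; the window length, the threshold and the toy sequence are all assumptions chosen for illustration.

```python
import math
from collections import Counter

def shannon_entropy(segment):
    """Shannon entropy (bits) of the residue composition of a segment."""
    counts = Counter(segment)
    total = len(segment)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())

def low_complexity_windows(seq, window=12, threshold=2.2):
    """Return (start, end) pairs, 0-based and end-exclusive, for every
    window whose compositional entropy is below the threshold."""
    hits = []
    for start in range(len(seq) - window + 1):
        if shannon_entropy(seq[start:start + window]) < threshold:
            hits.append((start, start + window))
    return hits

# Hypothetical toy sequence with a proline-rich stretch in the middle.
toy = "MKTAYIAKQRQISFVKSHFSRQ" + "PPPPPPSPPPPP" + "LEERLGLIEVQAPILSRVGDGT"
print(low_complexity_windows(toy))
```

Overlapping windows are reported individually here; a real tool would merge them into segments and refine the boundaries.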

13
Given a sequence, what should you look for?
SMART domain bubblegram for human fibroblast
growth factor (FGF) receptor 1 (type P11362 into
web site smart.embl.de)
14
Protein Modularity
  • discrete structural and functional units
  • found in different combinations in different
    proteins

Receptor-related tyrosine-kinase
Non-receptor tyrosine-kinases
consider separately in predictions
15
Finding Protein Domains
  • through partial matches to whole sequences
  • compare to databases of domains (Pfam, SMART,
    InterPro)
  • can be separated by:
      • low-complexity and disordered regions (SEG)
      • trans-membrane regions (TMAP)
      • coiled-coils (COILS)
Repeat searches using each domain separately
16
12 000 domain alignments make sequence searching
easier
WPP domain alignment
Alignments provide more information about a
protein family than a single sequence, and thus
allow for more sensitive searches. Domain
alignments also normally lack low-complexity or
disordered regions, and other domains, that can
make single-sequence searches confusing.
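
One way to see why an alignment carries more information than a single sequence is to turn it into a position-specific scoring profile. The sketch below is a deliberately minimal version under stated assumptions: the alignment is a hypothetical, ungapped toy example, the background is uniform, and the pseudocount scheme is the simplest possible. Real profile methods (e.g. PSI-BLAST or the HMMs behind Pfam) are considerably more elaborate.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def build_profile(alignment, pseudocount=1.0):
    """Per-column log-odds scores against a uniform background."""
    background = 1.0 / len(AMINO_ACIDS)
    profile = []
    for column in zip(*alignment):
        counts = Counter(column)
        total = len(column) + pseudocount * len(AMINO_ACIDS)
        profile.append({
            aa: math.log2(((counts[aa] + pseudocount) / total) / background)
            for aa in AMINO_ACIDS
        })
    return profile

def score(profile, segment):
    """Sum the column scores for an ungapped segment of profile length."""
    return sum(column[aa] for column, aa in zip(profile, segment))

# Hypothetical ungapped alignment of four family members.
toy_alignment = ["PPLPPR", "PPVPPR", "PALPAR", "PPLPSK"]
profile = build_profile(toy_alignment)

# The first candidate is not identical to any aligned sequence, but it
# combines residues seen at each position; the second is unrelated.
for candidate in ("PPLPAK", "GSTDEW"):
    print(candidate, round(score(profile, candidate), 2))
```

The profile rewards residues that are tolerated at each position across the family, so a new member can score well even if it differs from every individual sequence used to build it.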
17
Finding domains in a sequence
18
Cryptic domains at the border of sequence
detectability
Identified using more sensitive fold recognition
methods that use structure to help find weak
members of sequence families. If Pfam or SMART
or similar do not find a domain, and the region
is probably not disordered, then fold recognition
might help.
Gallego et al, Mol Sys Biol 2010
19
Domain peptide interactions
Recognition of ligands or targeting signals
Post-translational modifications
20
Linear motifs
Peptides interacting with a common domain often
show a common pattern or motif, usually 3-8 amino
acids long.
3BP1_MOUSE/528-537    APTMPPPLPP
PTN8_MOUSE/612-629    IPPPLPERTP
SOS1_HUMAN/1149-1157  VPPPVPPRRR
NCF1_HUMAN/359-390    SKPQPAVPPRPSA
PEXE_YEAST/85-94      MPPTLPHRDW
SH3-interacting motif: PxxP
perpetrator
victim
Puntervoll et al, NAR, 2003 www.elm.org
(Eukaryotic Linear Motif DB)
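
The PxxP pattern can be written as a regular expression and scanned against the peptides listed on this slide. This is a minimal sketch: the function name and the dictionary layout are illustrative choices, and a plain pattern match says nothing about whether a site is functional.

```python
import re

# "P..P" = proline, any two residues, proline. The lookahead lets
# overlapping occurrences be reported as separate matches.
PXXP = re.compile(r"(?=(P..P))")

# Peptides from the slide (names only; coordinates omitted).
peptides = {
    "3BP1_MOUSE": "APTMPPPLPP",
    "PTN8_MOUSE": "IPPPLPERTP",
    "SOS1_HUMAN": "VPPPVPPRRR",
    "NCF1_HUMAN": "SKPQPAVPPRPSA",
    "PEXE_YEAST": "MPPTLPHRDW",
}

def find_motif(seq, pattern=PXXP):
    """Return (1-based start, matched text) for every match, overlaps included."""
    return [(m.start() + 1, m.group(1)) for m in pattern.finditer(seq)]

for name, seq in peptides.items():
    print(name, find_motif(seq))
```

Because the pattern is so short it will also match many sequences by chance, so hits are usually worth checking against context such as predicted disorder before being taken seriously.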
21
Linear motifs versus domains
Domains: large globular segments of the proteome
that fold into discrete structures and belong to
sequence families.
Linear motifs: small, non-globular segments that
do not adopt a regular structure, and aren't
homologous to each other in the way domains are.
Motifs lie in the disordered part of the proteome.
22
Intrinsically unstructured or disordered proteins
or protein fragments
23
Disorder predictors (IUPred, RONN, DISOPRED, etc.)
24
Linear motif mediated interactions are everywhere
  • Include motifs for:
      • Targeting, e.g. KDEL
      • Modifications, e.g. phosphorylation
      • Signaling, e.g. SH3
  • About 200 are currently known, likely many more
    still to be discovered

Neduva & Russell, Curr. Opin. Biotech., 2006
25
Finding linear motifs in a sequence
26
www.russelllab.org/wiki