Title: Protein Bioinformatics Course
1. Protein Bioinformatics Course
Matthew Betts, Rob Russell
AG Russell (Protein Evolution)
Course overview
- Day 1 - Modularity
- Day 2 - Interactions
- Day 3 - Modularity & Interactions
- Day 4 - Structure
- Day 5 - Structure & Interactions
Daily schedule
- 1000-1100 lecture
- 1100-1200 work on exercises in pairs
- 1200-1300 lunch
- 1300-1530 work on exercises in pairs
- 1600-1700 presentations by you
2. Protein Sequence Databases
3. Database Searching
- Homologues: proteins with a common ancestor
- Homology --> similar function
- Sequence similarity --> homology
- Find homologues using
  - BLAST
  - Profile searching
5. Scores and E-values
How often would I expect to get a score greater than this one by chance alone?
How similar is my sequence to one in the database?
- cf. random sequences
- E = 1: one such match expected by chance
- E < 0.01: significant
- Depends on database
  - size (larger is better)
  - composition (random assumed)
- Alignment
  - Substitution matrix
  - Gap penalties
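To make the database-size dependence concrete, here is a minimal sketch that assumes the bit-score form of the Karlin-Altschul statistics used by BLAST, E = m * n * 2^(-S'), where m is the query length, n the total database length and S' the bit score. The function name and the example numbers are illustrative assumptions, and the edge-effect corrections that real BLAST applies to m and n are ignored.

```python
def expected_chance_hits(bit_score: float, query_len: int, db_len: int) -> float:
    """Expected number of chance matches scoring at least `bit_score`.

    Uses E = m * n * 2**(-S') with m = query_len and n = db_len
    (total residues in the database). No effective-length correction.
    """
    search_space = query_len * db_len
    return search_space * 2.0 ** (-bit_score)


if __name__ == "__main__":
    # Same alignment score, two hypothetical database sizes.
    for db_len in (10**7, 10**9):
        e = expected_chance_hits(bit_score=40.0, query_len=300, db_len=db_len)
        print(f"database residues = {db_len:>10d}  E = {e:.3g}")
```

For a fixed bit score, E scales linearly with the size of the search space, which is why the same hit can be significant against a small database and not against a large one.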
6. Homology comes in two main types: orthology and paralogy
- What is the difference and why does this matter?
8. Different Fates
- Orthologues
  - Both copies required (one in each species)
    - conservation of function (same gene)
    - adaptation to new environment
- Paralogues
  - Both copies useful
    - conservation of function
  - One copy freed from selection
    - disabled
    - new function
  - Different parts of each free from selection
    - function split between them
9. Assignment of orthology / paralogy can be complicated by
- duplication preceding speciation
- lineage-specific deletions of paralogues
- complete genome duplications
- many-to-one relationships
- multi-domain proteins
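A common first-pass heuristic for assigning orthologues, not taken from these slides, is the reciprocal best hit: two genes are called orthologues if each is the other's top-scoring match between the two genomes. The sketch below uses hypothetical gene names and bit scores to show how a many-to-one relationship defeats it.

```python
def best_hits(scores):
    """Map each query to its top-scoring subject.

    `scores` maps (query, subject) pairs to a similarity score (e.g. bit score).
    """
    best = {}
    for (query, subject), score in scores.items():
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Return sorted (a, b) pairs where a's best hit is b and b's best hit is a."""
    forward = best_hits(a_vs_b)
    reverse = best_hits(b_vs_a)
    return sorted((a, b) for a, b in forward.items() if reverse.get(b) == a)


if __name__ == "__main__":
    # Hypothetical scores: a2 only matches b2, but b2 matches a1 better.
    a_vs_b = {("a1", "b1"): 200, ("a1", "b2"): 150, ("a2", "b2"): 90}
    b_vs_a = {("b1", "a1"): 200, ("b2", "a1"): 150, ("b2", "a2"): 90}
    print(reciprocal_best_hits(a_vs_b, b_vs_a))
```

Here only a1/b1 is reported. The relationships involving a2 and b2 are dropped silently, which is the kind of case (duplications, deletions, many-to-one relationships) where tree-based methods are usually needed.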
10. Homology is usually found by sequence similarity, but proteins with dissimilar sequences can still be homologous
Betts, Guigo, Agarwal, Russell, EMBO J 2001
11. Proteins are modular
Since the early 1970s it has been observed that
protein structures are divided into discrete
elements or domains that appear to fold, function
and evolve independently.
12. Given a sequence, what should you look for?
- Functional domains (Pfam, SMART, COGs, CDD, etc.)
- Intrinsic features
  - Signal peptides, transit peptides (SignalP)
  - Transmembrane segments (TMpred, etc.)
  - Coiled-coils (COILS server)
  - Low-complexity regions, disorder (e.g. SEG, DisEMBL)
- Hints about structure?
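As an illustration of what an intrinsic-feature scan does, here is a minimal sliding-window hydropathy sketch using the Kyte-Doolittle scale. It is a stand-in for the idea only, not the method used by TMpred; the 19-residue window and the 1.6 cut-off follow the classic Kyte-Doolittle suggestion for membrane-spanning segments, and the test sequence is made up.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(seq, window=19):
    """Mean hydropathy of every window of length `window` (empty if seq is shorter).

    Raises KeyError on non-standard residue codes.
    """
    values = [KYTE_DOOLITTLE[aa] for aa in seq.upper()]
    if len(values) < window:
        return []
    total = sum(values[:window])
    profile = [total / window]
    for i in range(window, len(values)):
        total += values[i] - values[i - window]  # slide the window by one residue
        profile.append(total / window)
    return profile


def candidate_tm_windows(seq, window=19, threshold=1.6):
    """0-based start positions of windows whose mean hydropathy reaches `threshold`."""
    return [i for i, h in enumerate(hydropathy_profile(seq, window)) if h >= threshold]


if __name__ == "__main__":
    # Made-up sequence: polar flanks around a 19-residue hydrophobic stretch.
    toy = "DEKRSGNQ" + "LLIVALLIVAGLLIVALLI" + "KRDESGNQ"
    print(candidate_tm_windows(toy))
```

Dedicated predictors use trained models and topology rules, so this should be read as a picture of the principle rather than a usable predictor.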
13. Given a sequence, what should you look for?
SMART domain bubblegram for human fibroblast growth factor (FGF) receptor 1 (type P11362 into the web site smart.embl.de)
14. Protein Modularity
- discrete structural and functional units
- found in different combinations in different
proteins
Receptor-related tyrosine-kinase
Non-receptor tyrosine-kinases
Consider each domain separately in predictions
15. Finding Protein Domains
- through partial matches to whole sequences
- compare to databases of domains (Pfam, SMART, InterPro)
- can be separated by
  - low-complexity and disordered regions (SEG)
  - trans-membrane regions (TMAP)
  - coiled-coils (COILS)
Repeat searches using each domain separately
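To show how low-complexity regions can be located so that the remaining pieces are searched separately, here is a rough sketch based on Shannon entropy in a sliding window. This is a simplified stand-in, not the SEG algorithm; the window length, the entropy cut-off and the example sequence are illustrative assumptions.

```python
import math
from collections import Counter


def window_entropy(segment):
    """Shannon entropy (bits) of the residue composition of `segment`."""
    n = len(segment)
    return -sum((c / n) * math.log2(c / n) for c in Counter(segment).values())


def low_complexity_regions(seq, window=12, max_entropy=2.2):
    """Merged half-open (start, end) intervals covered by low-entropy windows."""
    regions = []
    for start in range(len(seq) - window + 1):
        if window_entropy(seq[start:start + window]) <= max_entropy:
            end = start + window
            if regions and start <= regions[-1][1]:
                # Overlaps or touches the previous region: extend it.
                regions[-1] = (regions[-1][0], end)
            else:
                regions.append((start, end))
    return regions


if __name__ == "__main__":
    # Made-up sequence with a poly-Q stretch in the middle.
    toy = "MKTAYIAKQRQISFVKSHFSRQ" + "Q" * 14 + "LEERLGLIEVQAPILSRVGDGT"
    for start, end in low_complexity_regions(toy):
        print(start, end, toy[start:end])
```

The segments on either side of each reported interval are the ones that would be submitted as separate searches.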
16. 12,000 domain alignments make sequence searching easier
WPP domain alignment
Alignments provide more information about a protein family and thus allow for more sensitive searches than a single sequence. Domain alignments also (normally) lack low-complexity or disordered regions and other domains that can make single-sequence searches confusing.
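One way to see why an alignment carries more information than a single sequence is to turn it into a position-specific scoring matrix (a simple profile). The sketch below assumes an ungapped toy alignment, a uniform background and add-one pseudocounts; the sequences are invented, and real profile tools (PSI-BLAST, HMMER) use considerably more careful weighting and gap handling.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_pssm(alignment, pseudocount=1.0):
    """Per-column log2-odds scores against a uniform background."""
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    background = 1.0 / len(AMINO_ACIDS)
    pssm = []
    for col in range(length):
        # Gap characters and other non-standard symbols are simply not counted.
        counts = Counter(seq[col] for seq in alignment if seq[col] in AMINO_ACIDS)
        total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
        pssm.append({
            aa: math.log2(((counts[aa] + pseudocount) / total) / background)
            for aa in AMINO_ACIDS
        })
    return pssm


def score(pssm, segment):
    """Sum of column scores for a segment of the same length as the profile."""
    return sum(column[aa] for column, aa in zip(pssm, segment))


def best_window(pssm, seq):
    """(score, 0-based start) of the best-scoring window in `seq`."""
    width = len(pssm)
    return max((score(pssm, seq[i:i + width]), i) for i in range(len(seq) - width + 1))


if __name__ == "__main__":
    toy_alignment = ["PPLPPK", "PALPAR", "PPVPPR", "PSLPEK"]  # invented
    pssm = build_pssm(toy_alignment)
    print(best_window(pssm, "AAAPPLPPRAAA"))
```

The query segment PPLPPR is not identical to any sequence in the toy alignment, yet it scores well because each column tolerates the substitutions the family has already shown.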
17. Finding domains in a sequence
18. Cryptic domains at the border of sequence detectability
Identified using more sensitive fold recognition
methods that use structure to help find weak
members of sequence families. If Pfam or SMART
or similar do not find a domain, and the region
is probably not disordered, then fold recognition
might help.
Gallego et al, Mol Sys Biol 2010
19. Domain-peptide interactions
Recognition of ligands or targeting signals
Post-translational modifications
20. Linear motifs
Peptides interacting with a common domain often show a common pattern or motif, usually 3-8 amino acids long.
3BP1_MOUSE/528-537    APTMPPPLPP
PTN8_MOUSE/612-629    IPPPLPERTP
SOS1_HUMAN/1149-1157  VPPPVPPRRR
NCF1_HUMAN/359-390    SKPQPAVPPRPSA
PEXE_YEAST/85-94      MPPTLPHRDW
SH3-interacting motif: PxxP
perpetrator
victim
Puntervoll et al, NAR, 2003 www.elm.org (Eukaryotic Linear Motif DB)
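The PxxP pattern above can be written as a regular expression and scanned against the listed peptides. This is a minimal sketch: the pattern is the bare PxxP core only, not the more specific expressions stored in ELM, and the reported positions are 1-based within each peptide rather than within the full protein.

```python
import re

# Peptides as listed on the slide.
PEPTIDES = {
    "3BP1_MOUSE": "APTMPPPLPP",
    "PTN8_MOUSE": "IPPPLPERTP",
    "SOS1_HUMAN": "VPPPVPPRRR",
    "NCF1_HUMAN": "SKPQPAVPPRPSA",
    "PEXE_YEAST": "MPPTLPHRDW",
}

# Lookahead so that overlapping occurrences are all reported.
PXXP = re.compile(r"(?=(P..P))")


def find_motif(seq, pattern=PXXP):
    """Return (1-based start, matched text) for every occurrence, overlaps included."""
    return [(m.start() + 1, m.group(1)) for m in pattern.finditer(seq)]


if __name__ == "__main__":
    for name, peptide in PEPTIDES.items():
        print(name, peptide, find_motif(peptide))
```

A pattern this short will also match many sequences by chance, so hits are only meaningful in context, for example in a disordered region of a protein known to interact with an SH3 domain.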
21. Linear motifs versus domains
Domains: large globular segments of the proteome that fold into discrete structures and belong to sequence families.
Linear motifs: small, non-globular segments that do not adopt a regular structure, and aren't homologous to each other in the way domains are. Motifs lie in the disordered part of the proteome.
22. Intrinsically unstructured or disordered proteins or protein fragments
23. Disorder predictors (IUPred, RONN, DISOPRED, etc.)
24. Linear motif mediated interactions are everywhere
- Include motifs for
  - Targeting, e.g. KDEL
  - Modifications, e.g. phosphorylation
  - Signaling, e.g. SH3
- About 200 are currently known, likely many more still to be discovered
Neduva & Russell, Curr. Opin. Biotech, 2006
25. Finding linear motifs in a sequence
26. www.russelllab.org/wiki