Title: Protein structure and alignment
1. Protein structure and alignment
- When you look at a protein sequence alignment, notice the blocks of higher and lower similarity (they are almost always there).
- Most of the time these are not simply stochastic variation; they represent regions under stronger or weaker purifying selection.
- These blocks range from rather short segments to rather long domains (often both occur in the same alignment).
- Longer blocks usually correspond to different protein domains (each of which can vary as a unit in selective pressure).
- Shorter blocks usually correspond to intra-domain structural features.
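One way to make the "blocks" idea concrete is to score each alignment column and then look for runs of high-scoring columns. The sketch below is only an illustration: the identity score, the threshold, the minimum run length, and the toy alignment are all assumptions made for the example, not something taken from the lecture.

```python
from collections import Counter

def column_identity(alignment):
    """Per-column score: fraction of sequences that share the most common residue.

    Gap characters ('-') are never counted as the shared residue.
    """
    scores = []
    for column in zip(*alignment):
        residues = [c for c in column if c != "-"]
        if not residues:
            scores.append(0.0)
            continue
        top_count = Counter(residues).most_common(1)[0][1]
        scores.append(top_count / len(column))
    return scores

def conserved_blocks(scores, threshold=0.8, min_len=3):
    """Return (start, end) column ranges, end exclusive, where every column
    scores at least `threshold` and the run is at least `min_len` long."""
    blocks, start = [], None
    # A -inf sentinel at the end closes a block that runs to the last column.
    for i, s in enumerate(scores + [float("-inf")]):
        if s >= threshold and start is None:
            start = i
        elif s < threshold and start is not None:
            if i - start >= min_len:
                blocks.append((start, i))
            start = None
    return blocks

# Hypothetical toy alignment (three made-up sequences of equal length).
toy = [
    "MKVLHPNIVAAGSTE",
    "MRVLHPNIVQ-GTSE",
    "MKALHPNIVSDGAQE",
]

scores = column_identity(toy)
print(conserved_blocks(scores, threshold=1.0, min_len=3))  # [(3, 9)]
```

On the toy data the only run of fully identical columns that is at least three long is columns 3 to 8 (the shared "LHPNIV"). Real analyses usually use substitution-matrix scores and many more sequences; plain identity is used here only to keep the sketch short.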
2. Kinases form a complex, diverse family
Example from a particular enzyme
[Figure; two groups in it are each labeled "(many subtypes)"]
3. CaM Kinase I (CaMKI) and CaM Kinase II (CaMKII) (CaM stands for calcium-calmodulin)
- Very similar in the kinase and calmodulin regulatory domains.
- CaMKI is monomeric, whereas CaMKII is a 10-12 subunit multimer.
- CaMKII most likely arose (a very long time ago), after the CaM kinase domain itself, by fusion of a multimer formation domain to the C-terminus.
4. CaM Kinase II structure
[Domain diagram, N-terminus to C-terminus: serine-threonine protein kinase, calmodulin regulation, multimer formation]
12 subunits with the catalytic domains facing out
5. insect, nematode, chordate CaMKII
6.
unc-43   --------------------MQLQQINSGAFSVVRRCVHKTTGLEFAAKIINTKKLSARD
rCaMKII  -------MATITCTRFTEEYQLFEELGKGAFSVVRRCVKVLAGQEYPAKIINTKKLSARD
hCaMKI   MLGAVEGPRWKQAEDIRDIYDFRDVLGTGAFSEVILAEDKRTQKLVAIKCIAKEALEGKE
rCaMKI   MPGAVEGPRWKQAEDIRDIYDFRDVLGTGAFSEVILAEDKRTQKLVAIKCIAKKALEGKE

unc-43   FQKLEREARICRKLQHPNIVRLHDSIQEESFHYLVFDLVTGGELFEDIVAREFYSEADAS
rCaMKII  HQKLEREARICRLLKHPNIVRLHDSISEEGHHYLIFDLVTGGELFEDIVAREYYSEADAS
hCaMKI   GS-MENEIAVLHKIKHPNIVALDDIYESGGHLYLIMQLVSGGELFDRIVEKGFYTERDAS
rCaMKI   GS-MENEIAVLHKIKHPNIVALDDIYESGGHLYLIMQLVSGGELFDRIVEKGFYTERDAS

unc-43   HCIQQILESIAYCHSNGIVHRDLKPENLLLASKAKGAAVKLADFGLAIEVN-DSEAWHGF
rCaMKII  HCIQQILEAVLHCHQMGVVHRDLKPENLLLASKLKGAAVKLADFGLAIEVEGEQQRWFGF
hCaMKI   RLIFQVLDAVKYLHDLGIVHRDLKPENLLYYSLDEDSKIMISDFGLSKMED-PGSVLSTA
rCaMKI   RLIFQVLDAVKYLHDLGIVHRDLKPENLLYYSLDEDSKIMISDFGLSKMED-PGSVLSTA

unc-43   AGTPGYLSPEVLKKDPYSKPVDIWACGVILYILLVGYPPFWDEDQHRLYAQIKAGAYDYP
rCaMKII  AGTPGYLSPEVLRKDPYGKPVDLWACGVILYILLVGYPPFWDEDQHRLYQQIKARAYDFP
hCaMKI   CGTPGYVAPEVLAQKPYSKAVDCWSIGVIAYILLCGYPPFYDENDAKLFEQILKAEYEFD
rCaMKI   CGTPGYVAPEVLAQKPYSKAVDCWSIGVIAYILLCGYPPFYDENDAKLFEQILKAEYEFD

unc-43   SPEWDTVTPEAKSLIDSMLTVNPKKRITADQALKVPWICNRERVASAIHRQDTVDCLKKF
rCaMKII  SPEWDTVTPEAKDLINKMLTINPSKRITAAEALKHPWISHRSTVASCMHRQETVDCLKKF
hCaMKI   SPYWDDISDSAKDFIRHLMEKDPEKRFTCEQALQHPWIAGDTALDKNIH-QSVSEQIKKN
rCaMKI   SPYWDDISDSAKDFIRHLMEKDPEKRFTCEQALQHPWIAGDTALDKNIH-QSVSEQIKKN

unc-43   NARRKLKGAILTTMIATRNLSSKRSYRLTLGAEKLVISMKNIEYWQVLLNKIFATYKIKM
rCaMKII  NARRKLKGAILTTMLATRNFSGG-----------------------------------KS
hCaMKI   FAKSKWKQAFNATAVVRHMR----------------------------------------
rCaMKI   FAKSKWKQAFNATAVVRHMR----------------------------------------

continued
7. continued (overlapped)
unc-43   SPEWDTVTPEAKSLIDSMLTVNPKKRITADQALKVPWICNRERVASAIHRQDTVDCLKKF
rCaMKII  SPEWDTVTPEAKDLINKMLTINPSKRITAAEALKHPWISHRSTVASCMHRQETVDCLKKF
hCaMKI   SPYWDDISDSAKDFIRHLMEKDPEKRFTCEQALQHPWIAGDTALDKNIH-QSVSEQIKKN
rCaMKI   SPYWDDISDSAKDFIRHLMEKDPEKRFTCEQALQHPWIAGDTALDKNIH-QSVSEQIKKN

unc-43   NARRKLKGAILTTMIATRNLSSKRSYRLTLGAEKLVISMKNIEYWQVLLNKIFATYKIKM
rCaMKII  NARRKLKGAILTTMLATRNFSGG-----------------------------------KS
hCaMKI   FAKSKWKQAFNATAVVRHMR----------------------------------------
rCaMKI   FAKSKWKQAFNATAVVRHMR----------------------------------------

unc-43   KQCRNLLNKKEQGPPSTIKESSESS-QTIDDNDSEKGGGQLKHENTVVRADGATGIVSSS
rCaMKII  G--G---NKKNDG----VKESSESTNTTIEDED---------------------------

unc-43   NSSTASKSSSTNLSAQKQDIVRVTQTLLDAISCKDFETYTRLCDTSMTCFEPEALGNLIE
rCaMKII  ------------TKVRKQEIIKVTEQLIEAISNGDFESYTKMCDPGMTAFEPEALGNLVE

unc-43   GIEFHRFYFD--GNRKNQ-VHTTMLNPNVHIIGEDAACVAYVKLTQFLDRNGEAHTRQSQ
rCaMKII  GLDFHRFYFENLWSRNSKPVHTTILNPHIHLMGDESACIAYIRITQYLDAGGIPRTAQSE

unc-43   ESRVWSKKQGRWVCVHVHRSTQPSTNTTVSEF
rCaMKII  ETRVWHRRDGKWQIVHFHRSGAPSVLPH----

(note both inter- and intra-domain differences in conservation)
Color key: blue - kinase, orange - calmodulin binding, green - multimerization
8. Protein structure basics
- Proteins consist mostly of α-helices, β-sheets, and turns.
- The α-helices and β-sheets typically form the framework of the protein.
- The turns and other atypical structures often play important binding and catalytic roles.
- The core of the protein is hydrophobic, whereas the surface is usually polar or charged.
- Most sharp turns (kinks) contain glycine or proline.
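The hydrophobic-core versus polar-surface point can be explored from sequence alone with a hydropathy profile. The sketch below is an illustration that goes beyond the slides: it assumes the Kyte-Doolittle hydropathy scale and an arbitrary window of 7 residues, and the function name `hydropathy_profile` is made up for the example. Stretches with a high average are candidates for buried segments; strongly negative stretches are more likely to be exposed.

```python
# Kyte-Doolittle hydropathy values (positive = hydrophobic).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

def hydropathy_profile(sequence, window=7):
    """Mean hydropathy in each sliding window along an ungapped sequence.

    Returns len(sequence) - window + 1 values (empty if the sequence is
    shorter than the window). Raises KeyError on non-standard residues.
    """
    values = [KYTE_DOOLITTLE[aa] for aa in sequence]
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]

# A stretch of rCaMKII taken from the alignment on the earlier slides.
segment = "VDLWACGVILYILLVGYPPFWDEDQHRLYQQIKARAYDFP"
profile = hydropathy_profile(segment)
most_hydrophobic = max(range(len(profile)), key=profile.__getitem__)
print(segment[most_hydrophobic:most_hydrophobic + 7])
```

A profile like this is only a rough guide. Whether a residue is actually buried has to be checked in the structure itself, for example with the sliced, color-coded view in the structure viewer.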
9. alpha helix
10. three-stranded antiparallel β-sheet
11. three-stranded antiparallel β-sheet, space filled
12. substrate binding cleft
arm swings away on calmodulin binding

rCaMKII  SPEWDTVTPEAKDLINKMLTINPSKRITAAEALKHPWISHRSTVASCMHRQETVDCLKKF
rCaMKI   SPYWDDISDSAKDFIRHLMEKDPEKRFTCEQALQHPWIAGDTALDKNIH-QSVSEQIKKN  297

rCaMKII  NARRKLKGAILTTMLATRN
rCaMKI   FAKSKWKQAFNATAVVRHM  316

gold and red - calmodulin binding arm and site, respectively
13. sliced half-way through the protein
red - charged, blue - polar, green - hydrophobic
15.
rCaMKII  HQKLEREARICRLLKHPNIVRLHDSISEEGHHYLIFDLVTGGELFEDIVAREYYSEADAS
rCaMKI   GS-MENEIAVLHKIKHPNIVALDDIYESGGHLYLIMQLVSGGELFDRIVEKGFYTERDAS  119

rCaMKII  HCIQQILEAVLHCHQMGVVHRDLKPENLLLASKLKGAAVKLADFGLAIEVEGEQQRWFGF
rCaMKI   RLIFQVLDAVKYLHDLGIVHRDLKPENLLYYSLDEDSKIMISDFGLSKMED-PGSVLSTA  178
16.
rCaMKII  HCIQQILEAVLHCHQMGVVHRDLKPENLLLASKLKGAAVKLADFGLAIEVEGEQQRWFGF
rCaMKI   RLIFQVLDAVKYLHDLGIVHRDLKPENLLYYSLDEDSKIMISDFGLSKMED-PGSVLSTA  178

rCaMKII  AGTPGYLSPEVLRKDPYGKPVDLWACGVILYILLVGYPPFWDEDQHRLYQQIKARAYDFP
rCaMKI   CGTPGYVAPEVLAQKPYSKAVDCWSIGVIAYILLCGYPPFYDENDAKLFEQILKAEYEFD  238
18. Measuring structural similarity
- Structural similarity can persist after sequence similarity has decayed to noise levels.
- More generally, how do you measure the degree of similarity between two structures?
- A commonly used measure is the root mean square deviation (RMSD) between the positions of matched backbone atoms.
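For N matched atoms with positions a_i and b_i, RMSD is the square root of the mean squared distance between a_i and b_i. The sketch below is a minimal illustration in plain Python. The function names and the toy coordinates are made up for the example, and it assumes the atom pairing has already been decided. It also shows why structures must be superposed first: a pure translation gives a large RMSD even for identical shapes. Only centering is done here; a full superposition also needs an optimal rotation (for example via the Kabsch algorithm), which is omitted.

```python
import math

def rmsd(coords_a, coords_b):
    """Root mean square deviation between two equal-length lists of (x, y, z)
    points, paired by index. No superposition is performed."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    total = 0.0
    for (ax, ay, az), (bx, by, bz) in zip(coords_a, coords_b):
        total += (ax - bx) ** 2 + (ay - by) ** 2 + (az - bz) ** 2
    return math.sqrt(total / len(coords_a))

def center(coords):
    """Translate the points so that their centroid sits at the origin."""
    n = len(coords)
    cx = sum(p[0] for p in coords) / n
    cy = sum(p[1] for p in coords) / n
    cz = sum(p[2] for p in coords) / n
    return [(x - cx, y - cy, z - cz) for x, y, z in coords]

# Toy "backbone" of three atoms, and the same shape shifted by (0, 3, 4).
a = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (3.0, 1.0, 0.0)]
b = [(x, y + 3.0, z + 4.0) for x, y, z in a]

print(rmsd(a, b))                  # 5.0: every atom is displaced by 5 units
print(rmsd(center(a), center(b)))  # essentially 0 once both are centered
```

In practice the coordinates would come from the backbone atoms (often just the alpha carbons) of the residues that a structural alignment has matched, and the reported RMSD applies only to those shared regions.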
19. No statistically significant sequence similarity
RMSD for shared regions: 3.5 Angstroms
20. Illustration of three points on a structure of poorly known function
- Gaps in alignments tend to be on surface loops.
- Areas of highest conservation tend to be at key sites (e.g. active sites of enzymes) and in core structural elements.
- BUT when positive selection acts, binding faces may tend to be the parts that vary.
21. MATH domain-containing genes: a mystery family in C. elegans
24. For Thursday: Download either the RasMol or the Cn3D structure viewer and a protein structure of your choice. Send in an image of some view of the protein with only the backbone showing and an alpha helix or beta sheet colored differently from the rest. (You'll have to read the documentation for your viewer to figure out how to do this.)
The final assignment will be posted by this evening. It is due on the last day of exam week.