Title: Immunological Bioinformatics: Prediction of epitopes in pathogens
1Immunological Bioinformatics: Prediction of
epitopes in pathogens
Ole Lund
2Data driven predictions
YMNGTMSQV GILGFVFTL ALWGFFPVV ILKEPVHGV ILGFVFTLT
LLFGYPVYV GLSPTVWLS WLSLLVPFV FLPSDFFPS CVGGLLTMV
FIAGNSAYE
List of peptides that have a given biological
feature
Mathematical model (neural network, hidden Markov
model)
Search databases for other biological sequences
with the same feature/property
>polymerase
MERIKELRDLMSQSRTREILTKTTVDHMAIIKKYTSGRQEKNPALRMKWMMAMKYPITAD
KRIMEMIPERNEQGQTLWSKTNDAGSDRVMVSPLAVTWWNRNGPTTSTVHYPKVYKTYFE
KVERLKHGTFGPVHFRNQVKIRRRVDINPGHADLSAKEAQDVIMEVVFPNEVGARILTSE
SQLTITKEKKEELQDCKIAPLMVAYMLERELVRKTRFLPVAGGTSSVYIEVLHLTQGTCW
EQMYTPGGEVRNDDVDQSLIIAARNIVRRATVSADPLASLLEMCHSTQIGGIRMVDILRQ
NPTEEQAVDICKAAMGLRISSSFSFGGFTFKRTNGSSVKKEEEVLTGNLQTLKIKVHEGY
EEFTMVGRRATAILRKATRRLIQLIVSGRDEQSIAEAIIVAMVFSQEDCMIKAVRGDLNF
...
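The data-driven workflow on this slide (peptide list, mathematical model, database search) can be sketched in a few lines. The slide names neural networks and hidden Markov models; the sketch below swaps in a much simpler model, a log-odds weight matrix with pseudocounts, purely for illustration. The function names, the uniform background and the pseudocount of 1 are assumptions, not part of the original material.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Peptides sharing a biological feature, copied from the slide.
PEPTIDES = [
    "YMNGTMSQV", "GILGFVFTL", "ALWGFFPVV", "ILKEPVHGV", "ILGFVFTLT",
    "LLFGYPVYV", "GLSPTVWLS", "WLSLLVPFV", "FLPSDFFPS", "CVGGLLTMV",
    "FIAGNSAYE",
]

def build_log_odds(peptides, pseudocount=1.0):
    """Return one {amino acid: log2 odds} column per peptide position.

    Assumption: uniform background frequency (1/20) and a flat pseudocount.
    """
    background = 1.0 / len(AMINO_ACIDS)
    total = len(peptides) + pseudocount * len(AMINO_ACIDS)
    matrix = []
    for pos in range(len(peptides[0])):
        counts = Counter(p[pos] for p in peptides)
        matrix.append({
            aa: math.log2(((counts[aa] + pseudocount) / total) / background)
            for aa in AMINO_ACIDS
        })
    return matrix

def score(matrix, peptide):
    """Sum the per-position log-odds of a peptide of matching length."""
    return sum(column[aa] for column, aa in zip(matrix, peptide))

def scan(matrix, protein, top=3):
    """Score every window of the protein and return the best-scoring ones."""
    k = len(matrix)
    hits = [(score(matrix, protein[i:i + k]), protein[i:i + k])
            for i in range(len(protein) - k + 1)]
    return sorted(hits, reverse=True)[:top]

matrix = build_log_odds(PEPTIDES)
# First residues of the polymerase sequence shown on the slide.
protein = "MERIKELRDLMSQSRTREILTKTTVDHMAIIKKYTSG"
for value, window in scan(matrix, protein):
    print(f"{window}  {value:.2f}")
```

With only 11 training peptides the matrix is dominated by the pseudocounts, so the ranking is illustrative rather than predictive.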
3Prediction algorithms
MHC binding data
Prediction algorithms
Genome scans
4Influenza A virus (A/Goose/Guangdong/1/96(H5N1))
Genome
>Segment 1
agcaaaagcaggtcaattatattcaatatggaaagaataaaagaactaagagatctaatg
tcgcagtcccgcactcgcgagatactaacaaaaaccactgtggatcatatggccataatc
aagaaatacacatcaggaagacaagagaagaaccctgctctcagaatgaaatggatgatg
gcaatgaaatatccaatcacagcagacaagagaataatggagatgattcctgaaaggaat
and 13350 other nucleotides on 8 segments
9mer peptides
Proteins
>polymerase
MERIKELRDLMSQSRTREILTKTTVDHMAIIKKYTSGRQEKNPALRMKWMMAMKYPITAD
KRIMEMIPERNEQGQTLWSKTNDAGSDRVMVSPLAVTWWNRNGPTTSTVHYPKVYKTYFE
KVERLKHGTFGPVHFRNQVKIRRRVDINPGHADLSAKEAQDVIMEVVFPNEVGARILTSE
SQLTITKEKKEELQDCKIAPLMVAYMLERELVRKTRFLPVAGGTSSVYIEVLHLTQGTCW
EQMYTPGGEVRNDDVDQSLIIAARNIVRRATVSADPLASLLEMCHSTQIGGIRMVDILRQ
NPTEEQAVDICKAAMGLRISSSFSFGGFTFKRTNGSSVKKEEEVLTGNLQTLKIKVHEGY
EEFTMVGRRATAILRKATRRLIQLIVSGRDEQSIAEAIIVAMVFSQEDCMIKAVRGDLNF
...
and 9 other proteins
MERIKELRD ERIKELRDL RIKELRDLM IKELRDLMS KELRDLMSQ
ELRDLMSQS LRDLMSQSR RDLMSQSRT DLMSQSRTR LMSQSRTRE
and 4376 other 9mers
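The step from proteins to 9mer peptides is a sliding window of length 9 moved one residue at a time. A minimal sketch, assuming the protein is held as a plain string (the function name `kmers` is hypothetical):

```python
def kmers(sequence, k=9):
    """Return all overlapping windows of length k, in order of position."""
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]

# First 18 residues of the polymerase sequence shown above.
prefix = "MERIKELRDLMSQSRTRE"
first_9mers = kmers(prefix)
print(" ".join(first_9mers))
```

A protein of length L yields L - 8 overlapping 9mers, so the 18-residue prefix gives the ten peptides listed on the slide.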
5Arms race between humans and microbes
Recognize
HLA molecules In Humans
Peptides from microbes
Escape
6Human MHC: 1000 variants distributed over 12
types
Peptide: up to 20^9 variants (about 5.1 x 10^11
possible 9mers)
Figure by Anne Mølgaard, peptide (KVDDTFYYV)
used as vaccine by Snyder et al. J Virol 78,
7052-60 (2004).
7HLA A and B diversity
Nielsen M, Lundegaard C, Blicher T, Lamberth K,
Harndahl M, Justesen S, Roder G, Peters B, Sette
A, Lund O, Buus S., NetMHCpan, a method for
quantitative predictions of peptide binding to
any HLA-A and -B locus protein of known sequence.
PLoS ONE. 2007; 2:e796.
8Binding affinity vs antigenicity
A quantitative analysis of the variables
affecting the repertoire of T cell specificities
recognized after vaccinia virus
infection. Assarsson E, Sidney J, Oseroff C,
Pasquetto V, Bui HH, Frahm N, Brander C, Peters
B, Grey H, Sette A. J Immunol. 2007 Jun 15;
178(12):7890-901.
9Prediction of MHC I epitopes
Major histocompatibility complex class I binding
predictions as a tool in epitope discovery.
Lundegaard C, Lund O, Buus S, Nielsen M.
Immunology. 2010 Jul; 130(3):309-18. Epub 2010 May
26. Review.
10Recent benchmark studies
- Class I
- Peters B, Bui HH, Frankild S et al. A community
resource benchmarking predictions of peptide
binding to MHC-I molecules. PLoS Comput Biol
2006; 2:e65.
- Lin HH, Ray S, Tongchusak S, Reinherz EL, Brusic
V. Evaluation of MHC class I peptide binding
prediction servers: applications for vaccine
research. BMC Immunol 2008; 9:8.
- Class II
- Wang P, Sidney J, Dow C, Mothe B, Sette A, Peters
B. A systematic assessment of MHC class II
peptide binding predictions and evaluation of a
consensus approach. PLoS Comput Biol 2008;
4:e1000048.
- Lin HH, Zhang GL, Tongchusak S, Reinherz EL,
Brusic V. Evaluation of MHC-II peptide binding
prediction servers: applications for vaccine
research. BMC Bioinformatics 2008; 9(Suppl.
12):S22.
- Zhang L, Udaka K, Mamitsuka H, Zhu S. Toward more
accurate pan-specific MHC-peptide binding
prediction: a review of current methods and
tools. Briefings in Bioinformatics (impact factor
7.33). 09/2011. DOI: 10.1093/bib/bbr060
11Validation of binding predictions
12Response diversity
Hoof, et al., JI, 2010
13TB epitope discovery strategy
Mtb H37Rv genome sequence
Selection of peptides predicted to bind to HLA
supertypes (NetCTL, protFun, SubCell): A2
(A*0201), A3 (A*0301), B7 (B*0702) (coverage
approx. 80% of the world population)
Synthesis of selected peptides
Measuring peptide/MHC binding affinity in vitro
Screening for peptide recognition in in vitro
CD8+ T cell assay in healthy PPD+ donors
Direct ex vivo determination of frequencies of
peptide/tetramer+ CD8+ T cells in TB patients
(Multi)functionality of peptide-responsive CD8+
T cells in TB patients
Tang ST et al Submitted.
Genome-Based In Silico Identification of New
Mycobacterium tuberculosis Antigens Activating
Polyfunctional CD8+ T Cells in Human
Tuberculosis. Tang ST, van Meijgaarden KE,
Caccamo N, Guggino G, Klein MR, van Weeren P,
Kazi F, Stryhn A, Zaigler A, Sahin U, Buus S,
Dieli F, Lund O, Ottenhoff TH. J Immunol. 2011
Jan 15; 186(2):1068-80. Epub 2010 Dec 17.
14TB
15TB
Selection   Positive   Total   Fraction
TBVAC              1       8       0.13
CD8                3      17       0.37
Best Pred          1      16       0.06
Cons               1      18       0.05
DOS/LAG            2      19       0.1
Bepi               0      18       0
Secret             6      19       0.31
Pred sec           4      15       0.26
All               18     130       0.13
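The fraction column is the number of positive peptides divided by the total tested in each selection category. A minimal sketch that recomputes it from the counts in the table (variable names are hypothetical):

```python
# (selection, positives, total) copied from the table above.
ROWS = [
    ("TBVAC", 1, 8), ("CD8", 3, 17), ("Best Pred", 1, 16), ("Cons", 1, 18),
    ("DOS/LAG", 2, 19), ("Bepi", 0, 18), ("Secret", 6, 19), ("Pred sec", 4, 15),
]

fractions = {name: pos / tot for name, pos, tot in ROWS}
total_pos = sum(pos for _, pos, _ in ROWS)
total_tot = sum(tot for _, _, tot in ROWS)

for name, frac in fractions.items():
    print(f"{name:10s} {frac:.2f}")
print(f"{'All':10s} {total_pos / total_tot:.2f}  ({total_pos}/{total_tot})")
```

The counts sum to the "All" row (18 of 130). Recomputed this way, most rows agree with the listed fractions to within rounding, but the CD8 row (3 of 17) comes out at about 0.18 rather than the listed 0.37.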
16Tetramer and cytokine staining of 10 cured TB
patients and 10 healthy controls
17The challenge of rational epitope selection
- We have more than 2500 MHC molecules
- We often have more than 500 different pathogenic
strains
- How to design a method to select a small pool of
peptides that will cover both the MHC
polymorphism and the pathogen diversity?
- No peptide will bind to all MHC molecules and few
(maybe even no) peptides will be present in all
pathogenic strains
18Vaccine discovery - HIV case story
- 10 HIV proteins
- > 2,000,000 different peptides exist within the
known HIV clades
- Patient diversity
- More than 2500 different MHC molecules
- The challenge
- Select 100 (0.005%) peptides with optimal genomic
and HLA coverage
19HIV Gag phylogenetic tree
Few peptides conserved between all viral strains
[Figure: phylogenetic tree of HIV Gag sequences
with branches labeled Clade A, Clade B, Clade C,
Clade D and Clade AE]
20Dodo Flavi viruses
21Predicted West Nile virus Epitopes
Mette Voldby Larsen
22Sequence identity vs. serotype
Solmaz Gabery
23Epitope identification
56 (1.5%) 9mers are conserved among all 15 Clade
A Gag sequences
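Counting conserved 9mers amounts to intersecting the 9mer sets of all sequences. A minimal sketch with three short made-up sequences standing in for the Gag alignment (the sequences and function names are hypothetical, and reporting the percentage relative to all distinct 9mers is an assumption about how the slide's figure was computed):

```python
def kmer_set(sequence, k=9):
    """Set of all overlapping k-mers in one sequence."""
    return {sequence[i:i + k] for i in range(len(sequence) - k + 1)}

def conserved_kmers(sequences, k=9):
    """Return (k-mers present in every sequence, all distinct k-mers)."""
    sets = [kmer_set(s, k) for s in sequences]
    return set.intersection(*sets), set.union(*sets)

# Toy stand-ins for viral protein variants (not real Gag sequences).
strains = ["MERIKELRDLMS", "MERIKELRDLMT", "AERIKELRDLMS"]
conserved, distinct = conserved_kmers(strains)
print(sorted(conserved))
print(f"{len(conserved)} of {len(distinct)} distinct 9mers "
      f"({100 * len(conserved) / len(distinct):.1f}%) are conserved")
```

A single substitution removes up to nine overlapping 9mers from the conserved set, which is why the conserved fraction drops so quickly as sequences diverge.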
24Polyvalent vaccines
- Select epitopes in a way so that they together
cover all strains.
[Figure: epitopes (marked X) placed on Strain 1
and Strain 2 in two panels, "Uneven coverage,
average coverage 2" and "Even coverage, average
coverage 2"]
25EpiSelect. Pathogen diversity
26Selected West Nile Virus Epitopes
Shown relative to NC001563/M12296
Mette Voldby Larsen
27Use of EpiSelect: CTL Epitopes with Maximum HIV-1
Coverage
Problem: The high mutation rate of HIV-1 makes it
difficult to identify CTL epitopes that are
conserved among all subtypes.
Possible solution: Choose a number of predicted and
experimentally identified epitopes that together
constitute a broad coverage of the HIV-1 strains
examined.
Data: 300 fully sequenced HIV-1 strains: A (A1 and
A2), B, C, D, and CRF01_AE.
Methods: Prediction of CTL epitopes restricted by
A1, A2, A3, A24, B7, B44, or B58. Select the
epitopes that give the broadest coverage. The
algorithm chooses epitopes found in as many
strains as possible, while up-prioritizing
epitopes from strains with few already-selected
epitopes.
Results: The final set consists of 180 epitopes. On
average, each strain is covered by 54 epitopes
(minimum 29).
Ongoing work by Annika Karlsson: The ability of
the chosen epitopes to elicit a CTL response will
be examined using PBMCs from HIV-1 infected
patients.
Annika Karlsson and Carina Perez
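The selection rule described in the Methods can be sketched as a greedy loop: at each step, pick the epitope whose strains are currently least covered. The weighting 1 / (beta + C_i), where C_i is the number of already-selected epitopes hitting strain i, is an assumption chosen to illustrate the idea; the slide does not state the exact scoring function. Epitope and strain names are made up.

```python
def episelect(presence, n_select, beta=1.0):
    """Greedy coverage-balancing selection (illustrative sketch only).

    presence: dict mapping epitope -> set of strains containing it.
    Returns the selected epitopes and the per-strain coverage counts.
    """
    coverage = {s: 0 for strains in presence.values() for s in strains}
    candidates = set(presence)
    selected = []
    while candidates and len(selected) < n_select:
        def gain(epitope):
            # Strains with few selected epitopes contribute more.
            return sum(1.0 / (beta + coverage[s]) for s in presence[epitope])
        # sorted() makes tie-breaking deterministic (alphabetical).
        best = max(sorted(candidates), key=gain)
        selected.append(best)
        candidates.remove(best)
        for s in presence[best]:
            coverage[s] += 1
    return selected, coverage

# Hypothetical toy data: four epitopes across four strains.
presence = {
    "E1": {"s1", "s2", "s3"},
    "E2": {"s1", "s2"},
    "E3": {"s4"},
    "E4": {"s3", "s4"},
}
selected, coverage = episelect(presence, n_select=2)
print(selected, coverage)
```

In the toy data the second pick is E4 rather than E2, because E4 reaches the still-uncovered strain s4 even though both epitopes occur in two strains.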
28HLA polymorphism - frequencies
Phenotype frequencies (%)
Supertypes       Caucasian  Black  Japanese  Chinese  Hispanic  Average
A2, A3, B7              83     86        88       88        86       86
A1, A24, B44           100     98       100      100        99       99
B27, B58, B62          100    100       100      100       100      100
Sette et al., Immunogenetics (1999) 50:201-212
29Karolinska Institute
Response of 31 HIV infected patients to 184
predicted HIV epitopes
Annika Karlsson
Carina Perez
Perez et al., JI, 2008
30All HIV responsive patients respond to at least
one of nine peptides
Perez et al., JI, 2008
31PopCover 2D searching
- > 2,000,000 different peptides exist within the
known HIV clades
- 227091 peptides with predicted binding affinity
stronger than 500 nM to any MHC molecule
- 5608 (tat), 20961 (nef), 31848 (gag), 42748 (pol),
125926 (env)
- No Gag peptides are found in all clades and 92%
of all Gag peptides are shared only between 0-5%
of all clades
- The challenge
- Select 64 (less than 0.001) peptides with
optimal genomic and HLA coverage
- tat(4), nef(15), gag(15), pol(15), env(15)
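The first reduction step above keeps only peptides predicted to bind at least one MHC molecule with an affinity stronger (numerically lower) than 500 nM. A minimal sketch of that filter follows; the peptide labels and IC50 values are invented for illustration and are not real predictions, and the data layout is an assumption.

```python
THRESHOLD_NM = 500.0

# Hypothetical predictions: peptide -> {allele: predicted IC50 in nM}.
# All numbers are made up for illustration.
predictions = {
    "peptide_1": {"A*0201": 22.0, "B*0702": 18000.0},
    "peptide_2": {"A*0201": 31000.0, "B*0702": 27000.0},
    "peptide_3": {"A*0201": 480.0, "B*0702": 9500.0},
}

def binders(predictions, threshold_nm=THRESHOLD_NM):
    """Keep peptides binding any allele below the threshold.

    Returns peptide -> sorted list of alleles it is predicted to bind.
    """
    kept = {}
    for peptide, per_allele in predictions.items():
        alleles = sorted(a for a, ic50 in per_allele.items()
                         if ic50 < threshold_nm)
        if alleles:
            kept[peptide] = alleles
    return kept

kept = binders(predictions)
print(kept)

# The per-protein counts on the slide add up to the stated total.
per_protein = {"tat": 5608, "nef": 20961, "gag": 31848,
               "pol": 42748, "env": 125926}
print(sum(per_protein.values()))
```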
32EpiSelect and PopCover
- EpiSelect
- The sum is over all genomes i. P_ji is 1 if
epitope j is present in genome i. C_i is the
number of times genome i has been targeted by the
already selected set of epitopes.
- PopCover
- The sum is over all genomes i and HLA alleles k.
R_jki is 1 if epitope j is present in genome i and
is presented by allele k. E_ki is the number of
times allele k has been targeted by epitopes in
genome i in the already selected set of epitopes.
f_k is the frequency of allele k in a given
population and g_i is the frequency of genome i.
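Using the symbols defined above, a PopCover-style score can be sketched as a sum over (allele, genome) pairs of R_jki * f_k * g_i, down-weighted by how often that pair has already been targeted. The denominator beta + E_ki, the value of beta, and all names and frequencies below are assumptions made for illustration; the exact functional form is not given in the text above.

```python
def popcover(hits, allele_freq, genome_freq, n_select, beta=0.01):
    """Greedy selection balancing HLA and genome coverage (sketch only).

    hits: dict epitope -> set of (allele, genome) pairs where R_jki = 1.
    """
    targeted = {}  # E_ki: times pair (allele k, genome i) has been covered
    candidates = set(hits)
    selected = []
    while candidates and len(selected) < n_select:
        def score(epitope):
            return sum(
                allele_freq[k] * genome_freq[i] / (beta + targeted.get((k, i), 0))
                for k, i in hits[epitope]
            )
        # sorted() makes tie-breaking deterministic (alphabetical).
        best = max(sorted(candidates), key=score)
        selected.append(best)
        candidates.remove(best)
        for pair in hits[best]:
            targeted[pair] = targeted.get(pair, 0) + 1
    return selected, targeted

# Hypothetical toy data: two alleles, two genomes, three peptides.
allele_freq = {"A2": 0.4, "B7": 0.2}
genome_freq = {"g1": 0.5, "g2": 0.5}
hits = {
    "P1": {("A2", "g1"), ("A2", "g2")},
    "P2": {("A2", "g1"), ("B7", "g1")},
    "P3": {("B7", "g2")},
}
selected, targeted = popcover(hits, allele_freq, genome_freq, n_select=2)
print(selected)
```

A small beta makes untargeted allele-genome pairs dominate the score, so the second pick goes to a peptide that reaches a pair not yet covered.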
33Experimental validation of HIV class II epitopes
Tat Nef Gag Pol Env
An average of 4.79 recognized peptides per patient
Marcus Buggert et al., In preparation
34Experimental validation
35Vaccine design. Polytope construction
[Figure: polytope construct running from NH2
(starting with M) to COOH, built from epitopes
joined by linkers; annotations: C-terminal
cleavage, cleavage within epitopes, new epitopes]
36Polytope starting configuration
Immunological Bioinformatics, The MIT press.
37Polytope optimal configuration
Immunological Bioinformatics, The MIT press.
38 Prediction servers at CBS
Web servers
CTL epitopes
www.cbs.dtu.dk/services/NetCTL
MHC binding
www.cbs.dtu.dk/services/NetMHC
www.cbs.dtu.dk/services/NetMHCII
www.cbs.dtu.dk/services/NetMHCpan
www.cbs.dtu.dk/services/NetMHCcons
www.cbs.dtu.dk/services/NetMHCIIpan
www.cbs.dtu.dk/services/HLArestrictor
MHC Motif viewer
www.cbs.dtu.dk/biotools/MHCMotifViewer/Home.html
Proteasome processing
www.cbs.dtu.dk/services/NetChop-3.0
B-cell epitopes
www.cbs.dtu.dk/services/BepiPred/
www.cbs.dtu.dk/services/DiscoTope
Plotting of epitopes relative to reference sequence
www.cbs.dtu.dk/services/EpiPlot-1.0
Analysis of human immunoglobulin VDJ recombination
www.cbs.dtu.dk/services/VDJsolver
Geno-pheno type association based mapping of
binding sites
www.cbs.dtu.dk/services/SigniSite
PhD/master course in Immunological
Bioinformatics, June, 2012
www.cbs.dtu.dk/courses/27685.imm
39Immune Epitope Database (IEDB)
Peters B, et al. Immunogenetics. 2005; 57:326-36;
PLoS Biol. 2005; 3:e91.
40Cross-reactivity
- Cross-reactivity is predictable (Pearson's r
0.35-0.6)
- Rule of thumb: each mutation halves the response
Frankild et al., PLoS ONE 3(3):e1831
Hoof et al., JI, 2010
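The rule of thumb can be written as expected relative response = 0.5^n, where n is the number of positions at which the variant differs from the original epitope. A minimal sketch, assuming equal-length peptides and treating every substitution alike (the function names are hypothetical, and the variant peptides are made up):

```python
def count_mutations(epitope, variant):
    """Number of differing positions between two equal-length peptides."""
    if len(epitope) != len(variant):
        raise ValueError("peptides must have the same length")
    return sum(a != b for a, b in zip(epitope, variant))

def expected_relative_response(epitope, variant):
    """Rule-of-thumb estimate: each mutation halves the response."""
    return 0.5 ** count_mutations(epitope, variant)

original = "GILGFVFTL"
for variant in ["GILGFVFTL", "GILGFVFTV", "GLLGFVFTV"]:
    print(variant, expected_relative_response(original, variant))
```

This ignores which position is mutated and how similar the substituted residues are, which is consistent with the modest correlations quoted above.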
41Pilot study of immunogenicity based on DrugBank
- www.drugbank.ca
- Records corresponding to 123 FDA-approved biotech
(protein/peptide) drugs were downloaded
- Sequences were compared to the human proteome
(sequences from Homo sapiens in NR, the
non-redundant database from NCBI) using BLAST
- Sequences found in DrugBank and NR need to be
manually validated/curated
42Types of proteins
- Human/Human protein sequence Identical proteins
- Modified/allelic human proteins
- Non human proteins
- Antibodies
- Non human
- Human-murine chimeric
- Humanized
- Human
- Who: allelic differences of VDJ genes
- How much: break tolerance
- Tolerance to own B cell receptors?
43Proposed application in assessment of protein
drugs
- Compare amino acid sequence of drug with the
human proteome
- Predict epitopes in regions that differ from the
human proteome
- Select representative HLA alleles
- Verify binding experimentally
- Assess predicted immunogenicity using blood from
treated patients/transgenic animals/naïve donors
- Compare with clinical findings of
immunogenicity/adverse effects/lack of effect
44Data acquired: Data on 33 approved therapeutic
proteins
- Immunogenicity
- Percent of recipients in a clinical study that
had detectable antibodies against the therapeutic
protein
- The primary source of immune response data was
the reviewed data presented in Meyler's Side
Effects of Drugs and from FDA labels.
Julie Serritslev, Jens Vindahl Kringelum, et al.,
in preparation
45Alleles representative of HLA-A, HLA-B and
HLA-DRB, HLA-DQB1 and four HLA-DPB1 super-types
Julie Serritslev, Jens Vindahl Kringelum, et al.,
in preparation
Nielsen et al., 2008
46 Prediction of epitopes
- MHC Class I and II binders can be predicted for
all known alleles (AROC 0.8-0.9)
- Binding correlates with likelihood of response
- No epitope gives a response in all individuals
- Cross-reactivity correlates with epitope
similarity
- B cell epitopes are hard to predict (AROC
0.6-0.7)
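AROC (area under the ROC curve) is the probability that a randomly chosen positive example scores higher than a randomly chosen negative one: 0.5 corresponds to random guessing and 1.0 to perfect separation. A minimal sketch computing it by pairwise comparison; the scores below are invented and the function name is hypothetical.

```python
def aroc(positive_scores, negative_scores):
    """Area under the ROC curve via pairwise comparison.

    Counts the fraction of (positive, negative) pairs in which the
    positive scores higher; ties count as half.
    """
    wins = sum(
        (p > n) + 0.5 * (p == n)
        for p in positive_scores
        for n in negative_scores
    )
    return wins / (len(positive_scores) * len(negative_scores))

# Made-up prediction scores for known binders and non-binders.
binder_scores = [0.9, 0.8, 0.4]
non_binder_scores = [0.7, 0.3, 0.2, 0.1]
value = aroc(binder_scores, non_binder_scores)
print(f"AROC = {value:.3f}")
```

The pairwise form is quadratic in the number of examples, which is fine for small sets; rank-based formulas give the same value more efficiently on large benchmarks.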