Title: Identifying structural templates using alignments of designed sequences
1Identifying structural templates using alignments
of designed sequences
- Stefan M. Larson
- Pande Group
- Biophysics Program
- December, 2002
- smlarson_at_stanford.edu
2Structure prediction: sequence space
[Figure: sets of placeholder amino-acid sequences illustrating sequence space]
3Multiple sequence alignments aid comparative
protein modeling
- 1 in 3 sequences are recognizably related to at
least one protein structure.
- A significant fraction of the remaining 2/3 have
solved structural homologues, but they are not
recognized through sequence similarity searching
techniques.
- Marti-Renom et al. (2000)
- Multiple sequence alignments greatly improve the
efficacy and accuracy of almost all phases of
comparative modeling.
- Venclovas (2001)
4Computational protein design
New sequence
Iterative refinement
Native structure
5Large scale sequence generation
6Reverse BLAST: finding templates for
comparative modeling
Larson SM, Garg A, Desjarlais JR, Pande VS.
(2003) Proteins: Structure, Function, and Genetics
7Experiment: Sequence quality
[Figure: placeholder designed sequences]
Design
BLAST
E < 0.01
8Results: Sequence quality
9Method: Reverse BLAST
Designed Sequences
Hypothetical Proteins
Structural Templates
[Figure: placeholder protein sequences]
BLAST
E < 0.01
[Figure: placeholder protein sequences]
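The reverse BLAST step on this slide can be sketched in Python. This is a minimal sketch under assumptions that are not in the slides: BLAST has already been run with hypothetical proteins as queries against a database of designed sequences, the results are in the standard 12-column tabular format (query ID in column 1, subject ID in column 2, E-value in column 11), and each designed sequence is named `<template>_<index>`. That naming scheme and the function names `template_of` and `assign_templates` are hypothetical, used only so that a hit can be traced back to its structural template.

```python
from collections import defaultdict

E_VALUE_CUTOFF = 0.01  # threshold shown on the slide (E < 0.01)


def template_of(designed_id):
    """Recover the template ID from a designed-sequence ID.

    Assumes the hypothetical naming scheme "<template>_<index>",
    for example "1abc_0042" -> "1abc".
    """
    return designed_id.rsplit("_", 1)[0]


def assign_templates(blast_tabular_lines, cutoff=E_VALUE_CUTOFF):
    """Map each query to the templates whose designed sequences it hits.

    Returns {query: [(template, best_evalue), ...]} with each list
    sorted from the most to the least significant hit.
    """
    best = defaultdict(dict)  # query -> {template: best E-value seen}
    for line in blast_tabular_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        query, subject, evalue = fields[0], fields[1], float(fields[10])
        if evalue >= cutoff:
            continue  # keep only hits with E < cutoff
        template = template_of(subject)
        if evalue < best[query].get(template, float("inf")):
            best[query][template] = evalue
    return {q: sorted(t.items(), key=lambda kv: kv[1]) for q, t in best.items()}
```

Keeping only the best E-value per template means that many designed sequences from the same template count as one candidate template rather than many separate hits.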
10Do the designed sequences help?
Correctly identified structural templates
fold-increase in number of templates
fold-increase in number of genes
total hits
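The slide does not say how the fold-increase was computed. One plausible reading, taken here as an assumption, is the ratio of a count obtained with the designed sequences to the same count obtained without them. The function name `fold_increase` and its arguments are illustrative only.

```python
def fold_increase(baseline_count, augmented_count):
    """Ratio of a count with designed sequences to the count without them.

    Assumed definition, not taken from the slides.
    """
    if baseline_count <= 0:
        raise ValueError("baseline count must be positive")
    return augmented_count / baseline_count
```

For example, `fold_increase(10, 25)` gives 2.5, meaning 2.5 times as many templates (or genes) as the baseline.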
11Remote homology detection
12Optimizing structural diversity
sequence entropy
prediction accuracy
prediction coverage
mean pairwise ID
mean native ID
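The diversity measures named on this slide can be sketched as follows. The definitions are assumptions, since the slides do not give formulas: sequence entropy is taken as the Shannon entropy (in bits) of each alignment column, averaged over columns, and sequence identity as the fraction of matching positions. The sketch also assumes all sequences have the same length with no gaps, which would hold for sequences designed onto a single fixed backbone.

```python
import math
from collections import Counter
from itertools import combinations


def column_entropy(column):
    """Shannon entropy (bits) of the residues in one alignment column."""
    counts = Counter(column)
    total = len(column)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def mean_sequence_entropy(seqs):
    """Average per-column entropy over an ungapped, equal-length alignment."""
    return sum(column_entropy(col) for col in zip(*seqs)) / len(seqs[0])


def identity(a, b):
    """Fraction of positions at which two equal-length sequences agree."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def mean_pairwise_identity(seqs):
    """Average identity over all unordered pairs of sequences."""
    pairs = list(combinations(seqs, 2))
    return sum(identity(a, b) for a, b in pairs) / len(pairs)


def mean_native_identity(seqs, native):
    """Average identity of each sequence to the native sequence."""
    return sum(identity(s, native) for s in seqs) / len(seqs)
```

Higher entropy together with lower mean pairwise identity indicates a more diverse set of sequences, while mean native identity tracks how far the set has drifted from the native sequence.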
13Future work
- Compare reverse BLAST to other remote homology
detection approaches (3D-PSSM, HMMER, etc.).
- Retrodict CASP targets, especially those which
were not successfully predicted by comparative
modeling.
- Increase the coverage and accuracy of the
designed sequence sets.
14Collaborators
- Stanford University
- Amit Garg
- Dr. Vijay Pande
- Harvard University
- Jeremy England
- Xencor, Inc.
- Dr. John Desjarlais