Title: Optimal sequencing for drug discovery in Ewing's sarcoma
Diana Negoescu, Peter Frazier, Warren B. Powell
Department of Operations Research and Financial Engineering, Princeton University
Jeffrey A. Toretsky, Sivanesan Dakshanamurthy
Georgetown University
Results Modified Free Wilson Model
Results for Free Wilson Model
Introduction
Ewing's sarcoma is a small round-cell tumor
typically arising in the bones, and rarely in
soft tissues, of children and adolescents. In the
US, 650-700 children and adolescents younger than
20 years of age are diagnosed with bone tumors
each year, of which approximately 200 are Ewing's
sarcomas (Ries et al. (1999)).
Method
Non-Informative Prior
Non-informative Prior
The Correlated Knowledge Gradient (CKG) (Frazier
et al. (2007))
Fig. 15 A sample path using a data set of 1000
compounds, when our initial belief has a high
uncertainty (non-informative prior). We plot the
opportunity cost after each measurement, defined
as the difference between the true best value and
the true value of the current best compound.
Fig. 18 Four sample paths using the
Non-informative prior. The best compound is found
after about 55 measurements.
- Bayesian approach
- Assume we have a budget of N measurements
- Assume measurements come from a multivariate
normal distribution
- Start with a belief on the values of the
compounds, given by a mean vector µ and a
covariance matrix Σ
- 1. Decide what to measure and make the
measurement
- 2. Update the mean vector µ and the covariance
matrix Σ
- Repeat steps 1 and 2 until all N measurements
have been made.
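The measure-and-update loop above can be sketched in a few lines. This is a minimal illustration, not the authors' code: it assumes a known measurement noise variance (`noise_var`), uses the standard conditioning formulas for a multivariate normal belief, and stands in a placeholder policy (`most_uncertain`) for the measurement decision. All function names and numbers are hypothetical.

```python
import numpy as np

def update_belief(mu, sigma, x, y, noise_var):
    """Step 2: condition the belief N(mu, sigma) on a noisy observation y
    of compound x. noise_var is the (assumed known) measurement variance."""
    col = sigma[:, x]                    # covariance of every compound with x
    denom = noise_var + sigma[x, x]
    mu_new = mu + (y - mu[x]) / denom * col
    sigma_new = sigma - np.outer(col, col) / denom
    return mu_new, sigma_new

def run_budget(mu, sigma, truth, noise_var, n_measurements, choose, rng):
    """Repeat steps 1 and 2 until the budget of N measurements is spent."""
    for _ in range(n_measurements):
        x = choose(mu, sigma)                                # step 1
        y = truth[x] + rng.normal(0.0, np.sqrt(noise_var))   # simulated measurement
        mu, sigma = update_belief(mu, sigma, x, y, noise_var)
    return mu, sigma

# Hypothetical belief over three compounds: 0 and 1 are correlated, 2 is not.
mu0 = np.zeros(3)
sigma0 = np.array([[2.0, 1.0, 0.0],
                   [1.0, 2.0, 0.0],
                   [0.0, 0.0, 2.0]])

# One measurement of compound 0 also shifts and sharpens the belief about 1.
mu1, sigma1 = update_belief(mu0, sigma0, x=0, y=3.0, noise_var=1.0)

def most_uncertain(mu, sigma):
    """Placeholder policy (NOT the CKG policy): measure the largest variance."""
    return int(np.argmax(np.diag(sigma)))

truth = np.array([3.0, 2.0, 1.0])        # hypothetical true compound values
mu_n, sigma_n = run_budget(mu0, sigma0, truth, 1.0, 5,
                           most_uncertain, np.random.default_rng(0))
```

After the single update, the variance of compound 1 drops even though it was not measured, while the uncorrelated compound 2 is unchanged.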
Fig. 1 Child with Ewing's sarcoma
Fig. 5 Sarcoma of the femur
The best compound is found after about 60
measurements.
Fig. 4 Ewing's sarcoma cells
Informative Prior
The 5-year survival rate is about 58%, but
children with metastatic disease at diagnosis
have a much worse prognosis, 18-30% (Shankar et
al. (2003)).
The measurement decision
If the chemists have an idea about the mean and
variance of the substituent contributions, use
these values as an informative prior.
Fig. 2 Five year survival rates
Make each decision so as to maximize the increase
in knowledge (the gradient) from measuring a
specific compound. Mathematically, this is
Let m be the mean of the substituent
contributions, mainMol the value of the
unsubstituted molecule, v the variance of the
substituent contributions. The initial belief is
It has been discovered recently that,
genetically, Ewing's sarcoma is the result of a
translocation between chromosomes 11 and 22,
which fuses the EWS gene of chromosome 22 to the
FLI1 gene of chromosome 11 (Owen et al. (2008)).
Fig. 16 Sample path using the informative prior.
Informative Prior
where S^n is the belief state after measurement n,
and x is a compound.
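For reference, the knowledge-gradient decision is usually written in the form below in the knowledge-gradient papers listed in the references. The symbol $\mu^n_{x'}$ (the mean of the belief about compound $x'$ after $n$ measurements) and the name $x^{KG,n}$ for the chosen compound are notation assumed here.

```latex
x^{KG,n} = \arg\max_{x} \;
  \mathbb{E}\left[ \max_{x'} \mu^{n+1}_{x'} - \max_{x'} \mu^{n}_{x'}
  \;\middle|\; S^n,\ x^n = x \right]
```

The quantity inside the arg max is the expected one-step improvement in the best estimated value if compound $x$ is measured next.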
Fig. 19 Four sample paths using the informative
prior. Best compound is usually found after about
50 measurements.
A medical research group at the Lombardi Cancer
Center at Georgetown University has selected a
chemical compound as a candidate for treating
Ewing's sarcoma. This chemical operates by
preventing two proteins, RNA Helicase and
EWS-FLI, from binding with each other, thus
stopping the spread of the disease. The research
group is now searching for derivatives of this
compound that could block binding with even
greater efficiency. However, synthesizing each
compound takes a few days, and there is a very
large number of molecules that could be tested.
The CKG policy chooses the molecule x that
maximizes ν^{CKG,n}, the amount by which the
solution is expected to improve. Fig. 9
illustrates this for the case of independent
measurements. In the
example, choice 4 has the current highest mean,
but choosing alternative 5 could improve what we
believe to be the best value. The shaded area
under the Gaussian curve is the probability that
choice 5 is better than the current best value,
and the knowledge gradient is the expected amount
by which the new best value will increase if we
choose compound 5.
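The independent-measurement case described above has a well-known closed form, which the sketch below evaluates. It is an illustration under stated assumptions: beliefs are independent normals, the measurement noise variance `noise_var` is known, and the five means and variances are made-up numbers chosen to mimic the situation in Fig. 9 (choice 4 has the highest mean, choice 5 is more uncertain).

```python
import math
import numpy as np

def kg_independent(mu, var, noise_var):
    """Knowledge-gradient value of measuring each alternative once,
    assuming independent normal beliefs N(mu[x], var[x])."""
    mu = np.asarray(mu, dtype=float)
    var = np.asarray(var, dtype=float)
    # Standard deviation of the change in the mean of x after one measurement.
    sigma_tilde = var / np.sqrt(var + noise_var)
    kg = np.zeros_like(mu)
    for x in range(len(mu)):
        if sigma_tilde[x] == 0.0:
            continue                      # nothing left to learn about x
        best_other = np.max(np.delete(mu, x))
        z = -abs(mu[x] - best_other) / sigma_tilde[x]
        pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
        cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
        kg[x] = sigma_tilde[x] * (z * cdf + pdf)
    return kg

# Hypothetical beliefs: choice 4 (index 3) is the current best,
# choice 5 (index 4) is slightly worse on average but very uncertain.
mu = [1.0, 1.5, 2.0, 3.0, 2.5]
var = [0.1, 0.1, 0.1, 0.1, 4.0]
kg = kg_independent(mu, var, noise_var=1.0)
next_choice = int(np.argmax(kg))          # index of the alternative to measure
```

With these numbers the policy prefers the uncertain alternative over the current best, because measuring it is the most likely way to change which choice looks best.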
Best compound found after 15 measurements.
where i is the number of substituents present in
compound x, j is the number of substituents
common to compounds x and x', and R is a noise
term simulating the error in the prior belief
about the value of the unsubstituted molecule.
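One way to build such an informative prior is sketched below. The exact formula is not reproduced in the text, so this sketch assumes a form consistent with the definitions: prior mean `mainMol + R + i * m` for each compound and prior covariance `j * v` for each pair. The function name, the indicator matrix `S`, and the example numbers are all hypothetical.

```python
import numpy as np

def informative_prior(S, m, v, main_mol, noise_sd=0.0, rng=None):
    """Assumed form of the initial belief (not confirmed by the source).
    S is a 0/1 matrix: S[x, k] = 1 if compound x contains substituent k."""
    S = np.asarray(S, dtype=float)
    i = S.sum(axis=1)              # number of substituents in each compound
    j = S @ S.T                    # number of substituents shared by each pair
    # R: one noise draw simulating error in the belief about mainMol.
    R = rng.normal(0.0, noise_sd) if rng is not None else 0.0
    mu0 = main_mol + R + m * i     # assumed prior mean
    sigma0 = v * j                 # assumed prior covariance
    return mu0, sigma0

# Three hypothetical compounds over four substituents
# (two positions with two candidate substituents each).
S = np.array([[1, 0, 1, 0],
              [1, 0, 0, 1],
              [0, 1, 0, 1]])
mu0, sigma0 = informative_prior(S, m=0.5, v=2.0, main_mol=1.0)
```

Compounds that share no substituent start out uncorrelated, so a measurement of one carries no information about the other under this prior.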
Fig. 17 The true values of the compounds chosen
in the sample path at each step.
Fig. 9 Illustration of KG for independent
measurements
Fig. 6 Lab equipment at the Lombardi
Comprehensive Cancer Center
Our problem: given the data we have available
thus far, and taking into account that molecules
with similar structures might have similar
properties, can we systematically tell which
compound to test next?
When updating our belief, we keep in mind that
measuring a compound teaches us something about
other compounds that share its substituents.
Conclusions and Future Work
Approach
How CKG works
Fig. 12 First 4 measurements in a sequence of 19
measurements made by the CKG algorithm under the
pure Free-Wilson model for the 36-compound data
set shown in Fig. 10. After each measurement, not
only does the variance of the measured compound
decrease, but so do the variances of the
compounds that share a substituent with it.
- Results so far indicate that the CKG algorithm
could be used to improve efficiency in drug
discovery for Ewing's sarcoma. This conclusion
assumes that the additive Free-Wilson model is
accurate.
- The current procedure requires enumerating all
possible compounds, limiting its application to
small molecules (< 1000 combinations).
- We are working on methods that can handle on
the order of 1000 parameters, making it possible
to handle molecules with millions of
combinations.
- Further research needs to consider more
realistic models than Free-Wilson.
Assessing the value of a molecule
- Two methods can be used
- The BIAcore method: detect optically whether the
target protein binds with the compound (Raghavan
& Bjorkman (1995)). This technique is accurate,
but is difficult to perform because the compounds
tend to aggregate when in the fluid.
- Protein displacement: combine the target protein
with the chemical compound to be tested, and then
mix with the secondary protein. Move the
secondary protein into a second container, and
see if any target protein has moved along with it
(Angelakou et al. (1999)). This technique is less
accurate than the BIAcore method.
Fig. 10 Compounds Representation
Fig. 7 BIAcore machine
References
Fig. 11 The molecule that generates the compounds
of Fig. 10
- Angelakou, A., Valsami, G., Macheras, P. &
Koupparis, M. (1999), A displacement approach
for competitive drug protein binding studies,
European Journal of Pharmaceutical Sciences 9(2),
123-130.
- Frazier, P., Powell, W.B. & Dayanik, S. (2009),
The knowledge-gradient policy for correlated
normal rewards, INFORMS Journal on Computing.
- Frazier, P., Powell, W.B. & Dayanik, S. (2008),
A knowledge-gradient policy for sequential
information collection, SIAM Journal on Control
and Optimization.
- Free, S. & Wilson, W. (1964), A mathematical
contribution to structure-activity studies,
J Med Chem 7, 395-399.
- Owen, L., Kowalewski, A. & Lessnick, S. (2008),
EWS/FLI Mediates Transcriptional Repression via
NKX2.2 during Oncogenic Transformation in
Ewing's Sarcoma, PLoS ONE.
- Raghavan, M. & Bjorkman, P. (1995), BIAcore: a
microchip-based system for analyzing the
formation of macromolecular complexes, Structure
3(4), 331-333.
- Shankar, A., Ashley, S., Craft, A. & Pinkerton,
C. (2003), Outcome after relapse in an
unselected cohort of children and adolescents
with Ewing's Sarcoma, Medical and Pediatric
Oncology 40(3), 141-147.
Modeling the relationship between the structure
and the value of a molecule
- Define
- a substituent to be an atom or group of atoms
substituted in place of a hydrogen atom on the
parent chain of a hydrocarbon. The molecule in
Fig. 8 has two positions, X and Y, at which
substituents can be attached
- a_i as the contribution of substituent i
- s_i as an indicator variable whose value is 1 if
substituent i is present and 0 otherwise
- µ as the biological activity value of the
unsubstituted parent structure.
Fig. 8 Molecule of disubstituted
N,N-dimethyl-α-bromophenethylamines
Fig. 13 Measurements 7-10 of the sequence
started in Fig. 12
Modified Free-Wilson Model
- Assumptions
- substituents do not have strictly additive
contributions
- contributions of any two different substituents
are independent.
- Model the value of a compound as
- $V = \sum_i a_i s_i + \mu + b$
- Model the covariance between compounds i and j as
- $\mathrm{Cov}(i,j) = \sum_l \mathrm{Var}(a_l) + \sigma_b^2 \, 1_{\{i=j\}}$,
where l is a counter over all substituents common
to compounds i and j.
Free-Wilson Model
- Assumptions
- each substituent has a strictly additive
contribution
- contributions of any two different substituents
are independent.
- Model the value of a compound as
- $V = \sum_i a_i s_i + \mu$
- Model the covariance between compounds i and j as
- $\mathrm{Cov}(i,j) = \sum_l \mathrm{Var}(a_l)$, where l is a counter
over all substituents common to compounds i and j.
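The two models can be sketched numerically as follows. This is a small illustration rather than the authors' implementation: the indicator matrix `S`, the contributions `a`, the variances `var_a`, and `sigma_b2` are made-up values, and the deviation term b is treated as a compound-specific term with variance `sigma_b2`, which is an assumption about the modified model.

```python
import numpy as np

def free_wilson_value(S, a, mu, b=None):
    """V = sum_i a_i s_i + mu, plus an optional per-compound term b
    (the modified model). S[x, i] = 1 if compound x has substituent i."""
    V = np.asarray(S, dtype=float) @ np.asarray(a, dtype=float) + mu
    return V if b is None else V + np.asarray(b, dtype=float)

def free_wilson_cov(S, var_a, sigma_b2=0.0):
    """Cov(i, j) = sum over shared substituents l of Var(a_l),
    plus sigma_b2 on the diagonal for the modified model."""
    S = np.asarray(S, dtype=float)
    var_a = np.asarray(var_a, dtype=float)
    cov = (S * var_a) @ S.T       # entry (i, j) sums Var(a_l) over shared l
    return cov + sigma_b2 * np.eye(S.shape[0])

# Three hypothetical compounds over four substituents.
S = np.array([[1, 0, 1, 0],
              [1, 0, 0, 1],
              [0, 1, 0, 1]])
a = [1.0, -0.5, 0.3, 0.8]         # hypothetical substituent contributions
var_a = [1.0, 2.0, 3.0, 4.0]      # hypothetical variances of a_l

values = free_wilson_value(S, a, mu=2.0)
cov_pure = free_wilson_cov(S, var_a)
cov_modified = free_wilson_cov(S, var_a, sigma_b2=0.5)
```

The off-diagonal entries are identical in both models; only each compound's own variance grows by `sigma_b2` in the modified version.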
Acknowledgements
The research was performed under the supervision
of Peter Frazier and Prof. Warren Powell at
Princeton University, and Professors Jeff
Toretsky and Sivanesan Dakshanamurthy at
Georgetown University. We also thank Dr. Andrew
Mulberg for providing the introduction.
Fig. 14 Measurements 11-14 of the sequence
started in Fig. 12
For further information
Please contact me at negoescu_at_princeton.edu. I
would be happy to share more on the current state
of my thesis research.