Example Poster - PowerPoint PPT Presentation

CLIP: The Candidate Ligand Identification Program
 
Nicholas Rhodes1, Peter Willett1, Alain Calvet2
and Christine Humblet2
  • Background
  • Recent improvements in combinatorial synthesis
    techniques have resulted in the availability of
    very large numbers of molecules for
    high-throughput screening (HTS) systems.
    Although HTS is very efficient in operation,
    considerations of cost-effectiveness mean that
    screening should be restricted as far as possible
    to molecules that have a reasonable probability
    of being active. This has led to much interest
    in methods for virtual screening, i.e., the
    ranking of a set of compounds in decreasing
    probability of activity so that biological
    testing can be restricted to a (hopefully) small
    fraction of the total number of molecules
    available for consideration.
  • One of the most important virtual screening
    techniques is ligand docking. This involves
    determining whether a molecule is complementary
    (in terms of its steric, hydrophobic and
    electrostatic characteristics) to the binding
    site of a protein for which the 3D structure is
    available (typically from X-ray crystallography).
    Several programs for ligand docking are now
    widely available and, although effective in
    operation, they can be quite slow, especially when
    an attempt is made to explore the conformational
    space of the potential ligands for the chosen
    target.
  • CLIP was designed to provide a fast alternative
    to docking methods and, specifically, to meet the
    following criteria:
  • It should be based on the 3D structures of
    ligands, rather than the 2D structures used in
    conventional similarity searching.
  • It should be able to utilise information about
    the binding site if a protein 3D structure is
    available.
  • It should be sufficiently fast in operation to
    permit the virtual screening of a million
    compounds in an overnight run.
  • The programs
  • CLIP takes as inputs modified MOL2 files that
    have been pre-classified to include information
    about donors, acceptors, electronegativity, etc.
    This classification is done once and for all by
    a Python script (CAP.py), using the
    classification scheme proposed by Pepperell et
    al. One of these inputs is the query template;
    the others are candidate molecules in a database.
    The query is then successively matched against
    each element of the database and the results
    sorted and presented.
  • Results
  • The actives clustered into 4 groups (UNITY RNN
    clustering) consisting of one group of 36 and
    three singletons. When each of the actives was
    used as a template, all of the structures from the
    major cluster retrieved a majority of the actives
    from the same cluster in the top 100; indeed, 11
    retrieved all 36 cluster members. As expected,
    the singletons retrieved only themselves, and in
    one case, one other active molecule, indicating
    that CLIP is highly discriminating.
  • The data are presented as cumulative recall plots
    in Figures 2-4 (right), which show:
  • the ideal situation (actives ranked 1-39)
  • the average (random) case, or one hit every
    5000/39 structures
  • the ranking obtained by docking of 3D structures
    into the HIV protease binding site using GOLD in
    command-line mode with default parameters. The
    cavity was centred on atom 242 (D25 OD1) by GOLD
    flood-fill with a radius of 15 Å.
  • the rankings obtained from CLIP and UNITY 2D
    searches for each of two templates
  • Using the templates 0154385 (6 nodes) and 0162034
    (3 nodes), CLIP ranked 12 and 61 molecules,
    respectively, with a coefficient of unity. Note
    that CLIP will not analyse any structures that
    have fewer nodes than the minimum clique size
    parameter. These analyses were performed with
    MINCLQ set to 3 and only compounds containing
    matching cliques were ranked, so for template
    0154386 only 1822 of a possible 4981 were ranked
    and only 1155 for template 0162034. Because CLIP
    ranked many structures with identical Simpson's
    coefficients, the ranks of tied structures were
    averaged, so the 12 top-ranked molecules for the
    first template (Simpson's coefficient of 1) all
    received an equal rank of 6.5. This technique
    results in some discontinuous jumps in the CLIP
    data series.
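The averaging of tied ranks described above can be illustrated with a short Python sketch. This is not CLIP's own code (CLIP is written in C); the function name `average_tied_ranks` and the example scores are assumptions made purely for illustration.

```python
def average_tied_ranks(scores):
    """Return 1-based ranks (highest score first); tied scores share their mean rank."""
    # Indices of the scores, ordered from highest to lowest score.
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    ranks = [0.0] * len(scores)
    start = 0
    while start < len(order):
        end = start
        # Extend the group while the next score equals the current one.
        while end + 1 < len(order) and scores[order[end + 1]] == scores[order[start]]:
            end += 1
        # Mean of the 1-based positions start+1 .. end+1.
        mean_rank = (start + 1 + end + 1) / 2
        for pos in range(start, end + 1):
            ranks[order[pos]] = mean_rank
        start = end + 1
    return ranks


# Hypothetical data: twelve structures tied at a coefficient of 1.0,
# followed by two structures with lower coefficients.
example_scores = [1.0] * 12 + [0.75, 0.5]
example_ranks = average_tied_ranks(example_scores)
```

With twelve structures tied at the top, each receives the mean of ranks 1 to 12, which is the 6.5 quoted in the text; the next structure is ranked 13, which is where the jump in the data series comes from.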
  • The same two molecules were also used as UNITY
    queries for a default 2D similarity search; here,
    by reducing the minimum similarity to zero,
    UNITY effectively ranked the entire data set.
    UNITY is probably marginally more effective than
    CLIP and, whilst GOLD is an improvement on random
    selection, it is considerably less effective at
    ranking actives in this data set. However, it
    should be noted that GOLD was designed to
    identify the binding modes of small numbers of
    molecules and not for this type of approach.
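For readers unfamiliar with cumulative recall plots, the following Python sketch shows one way such curves could be computed from a ranked list. It is an illustration only: the function names and the toy identifiers (`m1`, `m2`, ...) are hypothetical, and the poster does not say how its plots were generated.

```python
def cumulative_recall(ranked_ids, active_ids):
    """Entry k (0-based) is the number of actives found in the top k+1 structures."""
    actives = set(active_ids)
    found = 0
    curve = []
    for mol_id in ranked_ids:
        if mol_id in actives:
            found += 1
        curve.append(found)
    return curve


def ideal_curve(n_total, n_actives):
    """All actives ranked first: one new active per rank until none remain."""
    return [min(k, n_actives) for k in range(1, n_total + 1)]


def random_curve(n_total, n_actives):
    """Expected number of actives after k picks under random selection."""
    return [k * n_actives / n_total for k in range(1, n_total + 1)]


# Toy ranking with two actives (hypothetical identifiers).
toy_ranking = ["m3", "m1", "m7", "m2", "m5"]
toy_actives = ["m1", "m2"]
toy_curve = cumulative_recall(toy_ranking, toy_actives)
```

For the data set described here, `ideal_curve(5000, 39)` reaches 39 at rank 39, while `random_curve(5000, 39)` rises by 39/5000 per structure, i.e. one hit every 5000/39 structures on average.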

Figure 2
  • The experiments
  • Using a set of 5k candidate HIV protease
    inhibitors containing 39 known actives, we
    present performance comparisons of CLIP against:
  • UNITY 2D fingerprints
  • docking of 3D structures into the HIV protease
    binding site using GOLD (Jones et al.)
  • In the first of these, though performing a 3D
    match, CLIP is effectively acting as a similarity
    tool. Ranking was performed using a similarity
    metric based on Simpson's coefficient,
    a / min(b, c),
    where a is the clique size, b the candidate
    molecule size and c the template size, and where
    the sizes are the number of vertices in that
    graph or subgraph. To remove bias towards large
    molecules, the coefficient was normalised using a
    correction based on the differences between the
    template and candidate intra-node distances, the
    aim being to increase the similarity for cases
    where there was a high measure of agreement in
    the matched distances from the template graph and
    a candidate graph.
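Simpson's coefficient is conventionally defined as the size of the common part divided by the smaller of the two sizes being compared. Assuming that standard form, a minimal Python sketch using the poster's a, b and c is given below. The function name is hypothetical, and the distance-based correction mentioned above is deliberately left out because its exact form is not given.

```python
def simpson_coefficient(clique_size, candidate_size, template_size):
    """Simpson's (overlap) coefficient: a / min(b, c).

    clique_size    -- a, number of vertices in the matched clique
    candidate_size -- b, number of vertices in the candidate graph
    template_size  -- c, number of vertices in the template graph
    """
    smaller = min(candidate_size, template_size)
    if smaller <= 0:
        raise ValueError("graph sizes must be positive")
    return clique_size / smaller


# A 3-node match against a 3-node template gives a coefficient of 1
# regardless of how large the candidate is.
full_match = simpson_coefficient(3, 10, 3)
# The same 3-node match against a 6-node template gives 0.5.
half_match = simpson_coefficient(3, 10, 6)
```

Under this definition, any candidate containing a full match to a small template scores 1, which is consistent with many structures sharing the top coefficient.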

Figure 3
  • Theory
  • CLIP is based on mapping the 3D arrangement of
    pharmacophore features, typically donors and
    acceptors, in a target molecule against either:
  • corresponding donors and acceptors in other
    database structures, this being an example of 3D
    similarity searching; or
  • complementary acceptors and donors in a protein
    binding site, which we will refer to as
    complementary searching.
  • In both cases, the mapping is generated using a
    3D maximum common subgraph isomorphism algorithm,
    specifically the clique-detection algorithm of
    Bron and Kerbosch that has been used, both by us
    and by other workers, in several previous
    studies. This algorithm was chosen for two
    reasons: it has been shown to be both effective
    and efficient in operation (Brint and Willett;
    Gardiner, Artymiuk and Willett), and it is also
    fairly easy to implement, in contrast to several
    of the other algorithms for
    MCS detection that have been described in the
    graph-theoretic literature.
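The clique-detection approach can be sketched in Python as follows. This is a minimal illustration, not CLIP's C implementation: it uses the basic Bron-Kerbosch recursion without pivoting, and the data layout (lists of `(type, (x, y, z))` tuples), the `compatible` callback and the distance tolerance `tol` are all assumptions introduced for the example.

```python
from itertools import combinations
from math import dist


def correspondence_graph(template, candidate, compatible, tol=1.0):
    """Nodes are (i, j) pairs of compatible features; an edge joins two pairs
    when the template and candidate inter-feature distances agree within tol."""
    nodes = [(i, j)
             for i, (t_type, _) in enumerate(template)
             for j, (c_type, _) in enumerate(candidate)
             if compatible(t_type, c_type)]
    adj = {n: set() for n in nodes}
    for (i1, j1), (i2, j2) in combinations(nodes, 2):
        if i1 == i2 or j1 == j2:
            continue  # each feature may be mapped at most once
        d_t = dist(template[i1][1], template[i2][1])
        d_c = dist(candidate[j1][1], candidate[j2][1])
        if abs(d_t - d_c) <= tol:
            adj[(i1, j1)].add((i2, j2))
            adj[(i2, j2)].add((i1, j1))
    return adj


def bron_kerbosch(adj, r=frozenset(), p=None, x=None):
    """Yield every maximal clique of the graph given as an adjacency dict."""
    if p is None:
        p, x = set(adj), set()
    if not p and not x:
        yield r  # r cannot be extended, so it is maximal
        return
    for v in list(p):
        yield from bron_kerbosch(adj, r | {v}, p & adj[v], x & adj[v])
        p.remove(v)
        x.add(v)


def same_type(a, b):
    return a == b


# Hypothetical features: D = donor, A = acceptor, coordinates in angstroms.
template = [("D", (0, 0, 0)), ("A", (3, 0, 0)), ("D", (0, 4, 0))]
candidate = [("D", (1, 1, 1)), ("A", (4, 1, 1)), ("D", (1, 5, 1)),
             ("A", (10, 10, 10))]
graph = correspondence_graph(template, candidate, same_type)
best = max(bron_kerbosch(graph), key=len)
```

In this toy case the candidate is the template translated by (1, 1, 1) plus one distant acceptor, so the largest clique maps template features 0, 1 and 2 onto candidate features 0, 1 and 2.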
  • Test data
  • A subset (5000) of the in-house HIV protease
    database (SMILES) with activities was supplied;
    this subset (candidates) contained a total of 39
    actives, which were marked by renaming from
    xyz-0000 to xyz- to facilitate their
    identification by scripts processing result
    files. The SMILES were converted to 3D MOL2
    representations using CONCORD with default
    parameters. Nineteen molecules failed the
    conversion, none of them active. The resulting
    MOL2 file was
    then passed to the Python preprocessor, CAP.py,
    giving two files, both containing information on
    likely H-bond formers and one containing
    additional information on aromatic and
    hydrophobic moieties. The results described here
    were all obtained using the former, as aromatic
    and hydrophobic interactions in aspartyl protease
    binding sites were observed to be long-range and
    non-directional.
  • Templates were constructed from 3D structures
    with bound inhibitors and also from each of the
    actives. These were used for similarity searches
    against the whole subset.

Figure 4
  • Runtimes
  • With regard to timings, GOLD processed between
    1.25 and 3.77 structures per hour, depending on
    processor speed and machine load. Assuming two
    structures per hour, the total GOLD runtime was
    around 100 CPU days. CLIP will rank about
    250,000 structures per hour for a 3-node
    structure, and about 150,000 for a 6-node one;
    for the dataset in question, CLIP took just under
    two minutes for the 6-node structure and 72
    seconds for the smaller one (both well within the
    design criterion of one million compounds in an
    overnight run). The runtimes for UNITY 2D
    similarity searching are comparable to those for
    CLIP (about 3 minutes per search) but the modus
    operandi makes it difficult to time UNITY
    searches accurately.
  • It is difficult to compare CLIP against the SYBYL
    3D searches. There seems to be little difference
    between the two approaches in terms of
    effectiveness; there is, however, a substantial
    difference in terms of efficiency. Though
    difficult to time because of the way it operates,
    SYBYL takes around 4-5 minutes to search 4755
    compounds. CLIP is considerably faster, taking
    only 72 seconds for the same search. However, it
    is when taking into account the combinatorial
    problem of matching a larger template that the
    real advantage is seen. To search for all 3-point
    matches for a 6-entity template would take SYBYL
    approximately 80 to 100 minutes; CLIP takes
    around 150 seconds (2.5 minutes).
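The timings quoted in this section can be cross-checked with some simple arithmetic, shown here as a short Python sketch. The variable names are illustrative. Two points are assumptions rather than statements from the poster: that the GOLD estimate covers all 5000 structures, and that the 80 to 100 minute SYBYL estimate corresponds to one 4-5 minute search per 3-point subset of the 6-entity template.

```python
from math import comb

N_RANKED = 4981  # 5000 structures minus the 19 that failed 3D conversion

# GOLD at an assumed 2 structures per hour over 5000 structures.
gold_cpu_days = 5000 / 2 / 24

# CLIP at the quoted throughputs, converted to seconds for this data set.
clip_3node_seconds = N_RANKED / 250_000 * 3600
clip_6node_seconds = N_RANKED / 150_000 * 3600

# Design criterion: one million compounds at the slower (6-node) rate.
million_compound_hours = 1_000_000 / 150_000

# Assumed reading of the SYBYL estimate: one search per 3-point subset.
three_point_subsets = comb(6, 3)
sybyl_minutes = (three_point_subsets * 4, three_point_subsets * 5)
```

These values come out at roughly 104 CPU days for GOLD, about 72 and 120 seconds for the two CLIP searches, and under 7 hours for a million compounds, in line with the figures in the text. The 20 three-point subsets at 4-5 minutes each give 80 to 100 minutes.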
  • Conclusions
  • CLIP is capable of both similarity and
    complementary matches. In most cases, when doing
    a complementary match with a binding site, the
    sought-for positions of the entities are those of
    the bound ligand, so the problem reduces to one of
    taking their positions and inverting the
    donor/acceptor status, and is thus equivalent to a
    similarity search in 3D space. The program can,
    however, additionally be used when just the
    protein structure is available without a bound
    ligand. 
  • CLIP proved comparable in retrieval performance
    and speed with the fingerprint search, and
    outperformed the docking search (which is, after
    all, designed for more exhaustive exploration of
    a much smaller number of ligands) in both
    respects. CLIP is capable of ranking between
    150k and 250k structures per hour and thus
    provides a fast 3D alternative to traditional 2D
    screening methods.
  • Implementation
  • Written entirely in C, CLIP employs a
    user-supplied file of rules to determine whether
    or not two nodes are compatible and a match has
    been made. The current implementation supports 8
    types of node (donor, acceptor, donor-acceptor,
    electronegative, electropositive, ambivalent,
    hydrophobic and aromatic), thus there are 8 × 8
    (64) possible matches. In addition, the user
    specifies that one of four match modes (rule
    sets) is to be used:
  • SIMPLE: e.g. donors match with donors and
    donor-acceptors
  • IDENTITY: e.g. donors match only with donors
  • COMPLEMENTARY: e.g. donors match with acceptors
    and donor-acceptors
  • FUZZY: anything else the user might wish
  • The four match modes in CLIP are all equally
    fast. They have been implemented systematically
    rather than specifically: CLIP has not been
    programmed with rules relating DONORs, ACCEPTORs,
    etc., but can only apply the following predicate:
  • if (entity1) is compatible with (entity2) then
    return (result)
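A table-driven predicate of this kind might look like the Python sketch below. It is an illustration only: CLIP itself is written in C and reads its rules from a user-supplied file whose format is not described here. The rule sets shown contain just the example pairs named in the text, and the IDENTITY set is extended to all eight types as an assumption.

```python
from itertools import product

NODE_TYPES = ("donor", "acceptor", "donor-acceptor", "electronegative",
              "electropositive", "ambivalent", "hydrophobic", "aromatic")

# Eight node types give 8 x 8 = 64 ordered (entity1, entity2) pairs.
N_PAIRS = len(list(product(NODE_TYPES, repeat=2)))

# Hypothetical rule sets; a real rules file would cover every pair of interest.
RULES = {
    "IDENTITY": {(t, t) for t in NODE_TYPES},  # assumption: each type matches itself
    "SIMPLE": {("donor", "donor"), ("donor", "donor-acceptor")},
    "COMPLEMENTARY": {("donor", "acceptor"), ("donor", "donor-acceptor")},
}


def is_compatible(rules, entity1, entity2):
    """The single predicate: look the pair up in the active rule set."""
    return (entity1, entity2) in rules


simple_ok = is_compatible(RULES["SIMPLE"], "donor", "donor-acceptor")
complementary_ok = is_compatible(RULES["COMPLEMENTARY"], "donor", "acceptor")
identity_ok = is_compatible(RULES["IDENTITY"], "donor", "acceptor")
```

Because the matching code only ever performs a lookup, switching match mode changes the table and not the algorithm, which would be consistent with all four modes running at the same speed.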
  • CLIP has three levels of configuration:
    hard-coded defaults, configuration files and
    command-line parameters for running in script
    mode. For large datasets, writing to disk takes
    place at user-specified intervals (CHUNKSIZE).
    Results are summarised in the main output file
    and details of the cliques (the bulk of the
    output) are written to a separate file, and
    optionally gzip-compressed using the zlib library
    routines.
  • References
  • Brint, A.T. & Willett, P. "Algorithms for the
    identification of three-dimensional maximal
    common substructures." JCICS 27, 1987, 152-158
  • Bron, C. & Kerbosch, J. "Finding all cliques of
    an undirected graph." Communications of the ACM
    16, 1973, 575-577
  • Gardiner, E.J., Artymiuk, P.J. & Willett, P.
    "Clique-detection algorithms for matching
    three-dimensional molecular structures." JMGM 15,
    1998, 245-253
  • Jones, G., Willett, P., Glen, R.C., Leach, A.R. &
    Taylor, R. "Development and validation of a
    genetic algorithm for flexible docking." JMB
    267, 1997, 727-748
  • Pepperell, C.A., Poirrette, A.R., Willett, P. &
    Taylor, R. "Development of an atom-mapping
    procedure for similarity searching in databases
    of three-dimensional chemical structures."
    Pestic. Sci. 33, 1991, 97-111

1 Department of Information Studies, University
of Sheffield, Western Bank, Sheffield, S10 2TN.
2 Pfizer Global Research and Development, Ann
Arbor, MI 48105, USA.
Acknowledgements: This work was funded by
Parke-Davis and Pfizer. Computational facilities
were provided by the BBSRC.
Figure 1: Template created from 1hvi showing the 9
nodes of the inhibitor A77003 and their
interaction nodes in HIV protease.