Title: Shape and ROCS
1Shape and ROCS
- Bob Tolbert
- OpenEye Scientific Software, Inc.
- ACS - NYC
- September 10th, 2003
2Agenda
- Shape Theory
- Basic ROCS usage
- Using VIDA to visualize results
- Hands-on examples
- Using the Color Force Field
- Hands-on examples with color
3Gaussian Description of Shape
- Atoms are represented as Gaussians instead of hard spheres
- Easily integrable, analytic derivatives
- Product of 2 Gaussians is another Gaussian
- Easily calculate overlap between two atoms
- Easily calculate overlap between two collections of Gaussians
- Shape Tanimoto and Tversky as measures of similarity
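A minimal sketch of the overlap calculation, assuming each atom is a spherical Gaussian p*exp(-a*|r - R|^2). The amplitude `p` and exponent `a` used here are illustrative placeholders, not the parameters ROCS uses, and the collection overlap keeps only the pairwise (first-order) terms.

```python
import math

def gaussian_overlap(p_i, a_i, r_i, p_j, a_j, r_j):
    """Analytic overlap integral of two spherical Gaussians p*exp(-a*|r - R|^2)."""
    d2 = sum((x - y) ** 2 for x, y in zip(r_i, r_j))  # squared centre distance
    a_sum = a_i + a_j
    # The product of two Gaussians is another Gaussian, so the integral is closed-form.
    return p_i * p_j * math.exp(-a_i * a_j * d2 / a_sum) * (math.pi / a_sum) ** 1.5

def collection_overlap(mol_a, mol_b):
    """Sum of pairwise atom-atom overlaps (first-order terms only)."""
    return sum(
        gaussian_overlap(pa, aa, ra, pb, ab, rb)
        for (pa, aa, ra) in mol_a
        for (pb, ab, rb) in mol_b
    )

# Hypothetical two-atom "molecules": (amplitude, exponent, centre)
mol_a = [(1.0, 0.5, (0.0, 0.0, 0.0)), (1.0, 0.5, (1.5, 0.0, 0.0))]
mol_b = [(1.0, 0.5, (0.0, 1.0, 0.0)), (1.0, 0.5, (1.5, 1.0, 0.0))]

self_overlap = collection_overlap(mol_a, mol_a)
cross_overlap = collection_overlap(mol_a, mol_b)
```

Because the overlap is a smooth function of the atom positions, its derivatives are also available analytically, which is what makes gradient-based overlay optimization practical.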
4What is shape tanimoto?
- Shape-specific version of Tanimoto that uses 3D overlap instead of bits for what's in common.
- Formula
- Sensitive to large size differences between two structures
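The usual way to write shape Tanimoto, assuming O_{A,B} denotes the Gaussian overlap volume between molecules A and B and O_{A,A}, O_{B,B} are the self-overlaps (the symbols are chosen here for illustration):

```latex
T_{A,B} = \frac{O_{A,B}}{O_{A,A} + O_{B,B} - O_{A,B}}
```

Identical, perfectly overlaid molecules give T = 1. If a small molecule A sits entirely inside a much larger molecule B, then O_{A,B} is close to O_{A,A} and T is roughly O_{A,A} / O_{B,B}, which is small. That is the size sensitivity noted above.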
5What is shape Tversky?
- Tversky is an alternate similarity measure.
- Introduced for 2D similarity by John Bradshaw at Daylight
- Has weighting factor to deal with size differences.
- Useful for small structure vs large structure similarity
6Tversky
- Tversky is asymmetric, with α = 0.95 and β = 0.05
- ROCS reports 2 values, Tversky-d and Tversky-q, with the 0.95 weighting for query molecule and database molecule, respectively
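A small sketch of how the weighting changes the score, assuming the common form in which the overlap is divided by a weighted sum of the two self-overlaps with the weights summing to 1. The function names and the overlap values are hypothetical, chosen only to illustrate a small molecule embedded in a large one.

```python
def shape_tanimoto(o_ab, o_aa, o_bb):
    """Symmetric measure: overlap over the union of the two volumes."""
    return o_ab / (o_aa + o_bb - o_ab)

def shape_tversky(o_ab, o_aa, o_bb, alpha=0.95, beta=0.05):
    """Asymmetric measure: alpha weights A's self-overlap, beta weights B's.

    Assumes alpha + beta == 1, matching the 0.95 / 0.05 split on the slide.
    """
    return o_ab / (alpha * o_aa + beta * o_bb)

# Hypothetical overlaps: small molecule A (100) fully inside large molecule B (400).
o_aa, o_bb, o_ab = 100.0, 400.0, 100.0

tanimoto = shape_tanimoto(o_ab, o_aa, o_bb)        # penalized by the size mismatch
tversky_small = shape_tversky(o_ab, o_aa, o_bb)    # 0.95 weight on the small molecule
tversky_large = shape_tversky(o_ab, o_bb, o_aa)    # 0.95 weight on the large molecule
```

Putting the 0.95 weight on the small molecule gives a high score even though the Tanimoto is low. Swapping which molecule receives the 0.95 weight gives the second of the two values.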
7What is ROCS
- Rapid Overlay of Chemical Structures
- Small molecule vs. small molecule
- Uses
- Scaffold jumping
- HTS Rescue
- Patent busting
8ROCS Status
- Version 2.0 announced this week
- Completely rebuilt on top of OEChem
- Platforms include Linux, Linux-IA64, HPUX, MacOSX, AIX, Tru64(Alpha), Solaris, Win32 and Irix.
- PVM support for all platforms except Win32
9ROCS in actual use
- How many 'hits' has ROCS helped to identify?
- 15-20, 4 classes, 15-20, 1, 1, 1, 50-75
- How many Targets attempted?
- 5, 4, 3, 1, numerous
- What are typical hit rates?
- 1-2, 6-15, 5-10, 2-5
- How many hits have made it into chemistry?
- 0, 1-2, 3, 1, 1, 1
- How many HTS programs have been cancelled?
- >1?
10Minimum example
- ROCS requires only 2 inputs: a file containing shape query molecules and a file containing the database of interesting structures, e.g. corporate database, ACS-Screen, vendor databases, etc.
- rocs -dbase vendor.oeb.gz -query 6cox.mol2
11The query file
- Can be one or more structures, each with one or more conformers. By default, each conformer/structure is treated as a separate query
- File format can be any of SDF, MOL2, PDB, XYZ, MMOD, OEB
- Molecule(s) must already be 3D.
12The database file
- Normally a pre-generated multi-conformer file.
- Can be one of several formats: SDF, MOL2, PDB, XYZ, MMOD, OEB (OEBinary v2) or BIN (OEBinary v1)
- Contiguous conformers in a file like SDF or MOL2 will be combined into a single, multi-conformer molecule (by default)
- OpenEye's tool of choice for generating dbase files is OMEGA.
13Other default settings
- -besthits 500
- number of results to keep in hitlist
- -cutoff 0.0
- minimum score to even consider
- -rankby tanimoto
- alternates: tverskyd, tverskyq, scaledcolor, combo
- -prefix rocs
- text prefix for output file names
- -oformat sdf
- format of output structure file
14Initial orientations for optimization
- Default - Inertial Frame alignment
- Overlay COM of query and target
- Large MOI aligned, then second largest. Including 2-fold degeneracy of each yields 4 starting points.
- Extra 2 or 4 axes for top symmetry.
- Random
- -randomstarts N
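The four default starting points can be sketched as follows. This is an illustration only, not ROCS code: it assumes the coordinates have already been rotated onto the molecule's principal axes (finding those axes requires diagonalizing the inertia tensor, which is omitted here), and the function names are hypothetical. Aligning the largest and second-largest axes leaves each with a 2-fold sign ambiguity, and the third axis follows from keeping the frame right-handed, so 2 x 2 = 4 proper orientations result.

```python
def center_of_mass(coords, masses):
    """Mass-weighted average position."""
    total = sum(masses)
    return tuple(
        sum(m * c[k] for m, c in zip(masses, coords)) / total for k in range(3)
    )

# Identity plus 180-degree rotations about x, y and z.
# Each sign triple has determinant +1, so no mirror images are generated.
AXIS_FLIPS = [(1, 1, 1), (1, -1, -1), (-1, 1, -1), (-1, -1, 1)]

def inertial_starts(coords, masses):
    """Return the 4 starting orientations for coordinates given in the principal-axis frame."""
    com = center_of_mass(coords, masses)
    centered = [tuple(c[k] - com[k] for k in range(3)) for c in coords]
    return [
        [tuple(sign[k] * p[k] for k in range(3)) for p in centered]
        for sign in AXIS_FLIPS
    ]

# Hypothetical two-atom example with equal masses.
starts = inertial_starts([(1.0, 2.0, 3.0), (-1.0, -2.0, -3.0)], [1.0, 1.0])
```

Centering both query and target on their centers of mass before applying these flips corresponds to the COM overlay step listed above.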
15A couple of simple examples
- rocs -dbase spam.bin -query acetsali.sdf -prefix ACET -cutoff 0.5 -besthits 100 -outputquery
- rocs -dbase spam.bin -query aminopy.sdf -rankby tverskyd -cutoff 0.4 -maxhits 10 -outputquery
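For scripted runs, the same command lines can be assembled programmatically. The helper below is a hypothetical convenience wrapper, not part of ROCS; it only concatenates the flags shown on these slides and does not validate them.

```python
import shlex

def build_rocs_command(dbase, query, flags=(), **options):
    """Assemble an argument list for the rocs executable (hypothetical helper)."""
    argv = ["rocs", "-dbase", dbase, "-query", query]
    for name, value in options.items():   # options that take a value, e.g. cutoff=0.5
        argv += ["-" + name, str(value)]
    for flag in flags:                    # bare switches, e.g. outputquery
        argv.append("-" + flag)
    return argv

cmd = build_rocs_command(
    "spam.bin", "acetsali.sdf",
    flags=["outputquery"],
    prefix="ACET", cutoff=0.5, besthits=100,
)
print(shlex.join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` on a machine where `rocs` is installed and on the PATH.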
16Using VIDA to visualize results
- ROCS outputs (by default) two main files: a structure file and a report file.
- Structure file is SDF by default; scores are stored in SD tags and automatically loaded into VIDA's spreadsheet
- Report file is tab-delimited text; it can be loaded into any spreadsheet for analysis or parsed by splitting each line on tabs.
17Report File Format
- Tab-delimited
- Fields
- Name
- ShapeQuery
- Rank
- ShapeTanimoto
- Scaled Color
- ComboScore
- ColorScore
- SubTan
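A minimal sketch of reading the report file in a script. It assumes the columns appear in the order listed above and that a header row, if present, starts with `Name`; both are assumptions to check against an actual report. The sample row is made up purely for illustration.

```python
import csv
import io

# Column names as listed on the slide; the order is an assumption.
FIELDS = ["Name", "ShapeQuery", "Rank", "ShapeTanimoto",
          "Scaled Color", "ComboScore", "ColorScore", "SubTan"]

def read_report(handle):
    """Parse a tab-delimited report into a list of dicts keyed by field name."""
    rows = []
    for cells in csv.reader(handle, delimiter="\t"):
        if not cells or cells[0] == "Name":  # skip blank lines and a header row
            continue
        rows.append(dict(zip(FIELDS, cells)))
    return rows

# Made-up sample content standing in for a real report file.
sample = (
    "\t".join(FIELDS) + "\n"
    + "\t".join(["molA", "query1", "1", "0.81", "0.40", "1.21", "3.2", "0.5"]) + "\n"
)
rows = read_report(io.StringIO(sample))
best = max(rows, key=lambda r: float(r["ShapeTanimoto"]))
```

Values are kept as strings; convert with `float()` or `int()` where numeric comparison is needed. For a file on disk, pass `open(path, newline="")` instead of the `StringIO` object.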
18Hands-on examples
- Example molecules
- Several example molecules in data directory
- spam.bin is example multi-conformer database
- Documentation is available in HTML and PDF format.
19Color Force Field
- Use SMARTS to describe color atoms or groups of atoms
- Post-shape scoring
- Color gradients can be used as part of optimization process
20Color Force-field definition
TYPE donor
TYPE acceptor
TYPE cation
TYPE anion
TYPE rings
21Color Force-field
- Define patterns (SMARTS) that match the types
These definitions of donor and acceptor are the general definitions of Mills & Dean, JCAMD 10:607-622, 1996.
Donor: an electronegative atom with a proton (no S or C, see above).
PATTERN donor 7,8h,H
Acceptor: a lone pair on an electronegative atom (O or N; S was removed, see reference). Note, N in an amide or in an alkyl-aniline system is too conjugated to accept; however, anilinic NH2 is a potential acceptor.
PATTERN acceptor OD1(O-,,6,15,16)!(1,2,3)
PATTERN acceptor nH0,N,8!(nX3)((-, ,ee)-,,ee)! (NC)!(ND2,D3-a)!( 1,2,3)
PATTERN acceptor NX3((-,,ee)(-,,ee)-,,ee)!( 1,2,3)
22Color Force-field
- Define interactions between types
- Weight is strength of interaction, relative to shape gradients.
- Radius affects range of interaction
INTERACTION donor donor attractive gaussian weight=1.0 radius=1.0
INTERACTION acceptor acceptor attractive gaussian weight=1.0 radius=1.0
INTERACTION rings rings attractive gaussian weight=1.0 radius=1.0
23Example Force-field
- ROCS includes a very simple color force field (simple.cff)
- Also included is a more complete force field that includes Mills-Dean definitions of donors and acceptors as well as types for rings, anions and cations (MillsDean.cff)
24Using Color
- To just score shape hits
- rocs -dbase spam.bin -query aminopy.sdf -chemff MillsDean.cff
- To use color gradients
- rocs -dbase spam.bin -query aminopy.sdf -chemff MillsDean.cff -optchem
- To also rank by color
- rocs -dbase spam.bin -query aminopy.sdf -chemff MillsDean.cff -optchem -rankby scaledcolor
25Two extra scores with Color
- Scaledcolor
- Actual color score is the sum of each best color interaction. Scaled color divides the actual score by the self-color of the query, giving a score between 0 and 1.0
- Comboscore
- To use shape and color together for ranking, comboscore is the sum of shape Tanimoto and scaled color, giving a score between 0.0 and 2.0
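The two derived scores are simple arithmetic on values ROCS already computes. A sketch with hypothetical function names and made-up input numbers:

```python
def scaled_color(color_score, query_self_color):
    """Color score as a fraction of the query's color score against itself."""
    return color_score / query_self_color

def combo_score(shape_tanimoto, scaled):
    """Sum of shape Tanimoto (0 to 1) and scaled color (0 to 1)."""
    return shape_tanimoto + scaled

# Made-up values for illustration only.
scaled = scaled_color(color_score=3.0, query_self_color=6.0)
combo = combo_score(shape_tanimoto=0.8, scaled=scaled)
```

Since each term is bounded by 1.0, a hit that matches the query perfectly in both shape and color would reach the maximum combo score of 2.0.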
26Advanced Features
- -maxconfs
- retain more than one db molecule conformer
- -scdbase
- treat dbase as single conformer, separate molecules
- -report each, one, none
- -stats hits, best, all
- -nostructs
- don't write structure file at all
- -pvmconf, -pvmpass
27Roadmap I
- MOCS
- Multiple Overlay of Chemical Structures
- result is grid for query
- General grid query
- Pharmacophore pre-screening
- User-directed starting positions
28Roadmap II
- ElectroROCS
- Electrostatic tanimoto
- Electrostatic gradients
- Shape fingerprint pre-screening
- Torsion tweak with MMFF
- Query Optimization
- Database Optimization
29Acknowledgements
- Anthony Nicholls - OpenEye El Presidente
- author of Shape toolkit
- The OEChem hive-mind
- Geoff Skillman, Roger Sayle, and Matt Stahl
- Hewlett Packard
- computer loans for booth and seminar