Title: Daylight and Discovery
1. Daylight and Discovery
- How do I impress the boss when I get back?
2. What is Discovery?
- A constant fight against the hedgehogs!!
3. What have I learned this week?
- Above all, you have learned new languages that allow you to communicate chemical concepts to, and between, machines.
- These languages also allow you to communicate these concepts, via machines, to your colleagues.
- You have also learned about other descriptions of a molecular structure, such as fingerprints.
4. Language recap
- SMILES
- SMARTS
- SMIRKS
- (FINGERPRINTS)
5. SMILES
- SMILES contains the same information as might be found in an extended connection table.
- The primary reason SMILES is more useful than a connection table is that it is a linguistic construct, rather than a computer data structure.
- SMILES is a true language, albeit with a simple vocabulary (atom and bond symbols) and only a few grammar rules.
- SMILES can be canonicalised, i.e. there is a unique, universal name for each structure.
- SMILES representations of structure can in turn be used as words in the vocabulary of other languages designed for the storage and retrieval of chemical information, e.g. HTML, XML, or query languages such as SQL.
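To illustrate SMILES used as words inside another language, here is a minimal sketch using Python's built-in sqlite3 module, with a SMILES string as the primary key of a table. The table and column names are hypothetical, and the stored strings are assumed to be canonical already (canonicalisation itself needs a cheminformatics toolkit and is not shown). The second lookup shows why canonicalisation matters for exact-match retrieval.

```python
import sqlite3

# In-memory database; table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE compound (smiles TEXT PRIMARY KEY, name TEXT)")

# These strings are ASSUMED to be canonical SMILES produced elsewhere.
conn.executemany(
    "INSERT INTO compound (smiles, name) VALUES (?, ?)",
    [("CCO", "ethanol"), ("c1ccccc1", "benzene"), ("CC(=O)O", "acetic acid")],
)

def lookup(canonical_smiles: str):
    """Exact-match retrieval, using the SMILES string itself as the key."""
    row = conn.execute(
        "SELECT name FROM compound WHERE smiles = ?", (canonical_smiles,)
    ).fetchone()
    return row[0] if row else None

print(lookup("CCO"))  # ethanol
print(lookup("OCC"))  # None: same molecule, but not the stored canonical spelling
```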
6. SMILES syntax
atom <bond> atom <bond> atom ... etc.
atom: [<mass>symbol<chiral><hcount><sign<charge>><class>]
bond: <empty> | - | = | # | : | .
Common elements in the organic subset (B, C, N, O, P, S, F, Cl, Br, I), in their lowest common valence state(s), can be written without brackets. If bonds are omitted, they default to single or aromatic, as appropriate, for juxtaposed atoms.
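The "lowest common valence" rule for unbracketed atoms can be sketched as follows. This is an illustration, not Daylight's code: it assumes an unbracketed, non-aromatic organic-subset atom, takes the sum of its explicit bond orders as input, and uses the normal valences B 3; C 4; N 3, 5; O 2; P 3, 5; S 2, 4, 6; halogens 1. The implicit hydrogen count fills the atom up to the lowest of those valences that the bonds do not already exceed.

```python
# Normal valences of the organic subset, lowest first.
NORMAL_VALENCES = {
    "B": (3,), "C": (4,), "N": (3, 5), "O": (2,), "P": (3, 5),
    "S": (2, 4, 6), "F": (1,), "Cl": (1,), "Br": (1,), "I": (1,),
}

def implicit_hydrogens(symbol: str, bond_order_sum: int) -> int:
    """Implicit H count for an unbracketed, aliphatic organic-subset atom."""
    for valence in NORMAL_VALENCES[symbol]:
        if valence >= bond_order_sum:
            return valence - bond_order_sum
    return 0  # bonds already exceed every normal valence: no implicit H
```

For example, each carbon in `CC` has one explicit single bond and so three implicit hydrogens, while the sulfur in `CS(=O)(=O)C` has a bond-order sum of 6 and none.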
7. Example SMILES
8. SMARTS
- In the SMILES language, there are two fundamental types of symbols: atoms and bonds. Using these SMILES symbols, one can specify a molecule's graph (its "nodes" and "edges") and assign "labels" to the components of the graph (that is, say what type of atom each node represents, and what type of bond each edge represents).
- The same is true in SMARTS: one uses atomic and bond symbols to specify a graph. However, in SMARTS the labels for the graph's nodes and edges (its "atoms" and "bonds") are extended to include "logical operators" and special atomic and bond symbols; these allow SMARTS atoms and bonds to be more general. For example, the SMARTS atomic symbol [C,N] is an atom that can be aliphatic C or aliphatic N, and the SMARTS bond symbol "~" (tilde) matches any bond.
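A toy way to see what the logical operators add: treat each SMARTS atom primitive as a predicate over an atom, and combine predicates with OR, AND and NOT. The atom representation (a dict with hypothetical "symbol" and "aromatic" keys) and all helper names are assumptions for illustration. Real SMARTS also distinguishes the high-precedence "&" from the low-precedence ";", which this sketch does not model.

```python
from typing import Callable, Dict, Union

Atom = Dict[str, Union[str, bool]]      # hypothetical: {"symbol": "N", "aromatic": False}
Predicate = Callable[[Atom], bool]

def any_of(*preds: Predicate) -> Predicate:   # like SMARTS ',' (OR)
    return lambda atom: any(p(atom) for p in preds)

def all_of(*preds: Predicate) -> Predicate:   # like SMARTS '&' or ';' (AND)
    return lambda atom: all(p(atom) for p in preds)

def negate(pred: Predicate) -> Predicate:     # like SMARTS '!' (NOT)
    return lambda atom: not pred(atom)

def element(symbol: str) -> Predicate:        # like '#6' for carbon: either aromaticity
    return lambda atom: atom["symbol"] == symbol

def is_aromatic(atom: Atom) -> bool:          # like 'a'
    return bool(atom["aromatic"])

def aliphatic(symbol: str) -> Predicate:      # like 'C' or 'N': element AND NOT aromatic
    return all_of(element(symbol), negate(is_aromatic))

def any_bond(order: float) -> bool:           # like '~': every bond matches
    return True

c_or_n = any_of(aliphatic("C"), aliphatic("N"))   # [C,N]
not_carbon = negate(element("C"))                 # [!#6]
```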
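9. Example SMARTS
10. Useful SMARTS
- Heavy atom (any element other than C, N, O, F, P, S, Cl, Br, I): [!$([#6,#7,#8,#9,#15,#16,#17,#35,#53])]
- Rotatable bonds: [!$(*#*)&!D1]-&!@[!$(*#*)&!D1]
- Secondary amides: [NH1;D2]-&!@[#6X3]
- H-donors: [!#6;!H0]
- H-acceptors: [$([!#6;+0]);!$([F,Cl,Br,I]);!$([o,s,nX3]);!$([Nv5,Pv5,Sv4,Sv6])]
- Isolating carbons: [#6;!$(C(F)(F)F);!$(c(:[!c]):[!c]);!$([#6]=,#[!#6]);!$([#6;!+0])]
- Stereo atoms: [$([X4;!v6;!v5;H0,H1]),$([SX3]([#6])([#6])=O)]
- Stereo bonds: [CX3;!H2]=[CX3;!H2]
- Stereo allenes: [CX3H0]=C=[CX3;H0,H1]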
11. Rotatable bonds: [!$(*#*)&!D1]-&!@[!$(*#*)&!D1]
- An atom which is
  - NOT triply bonded to another atom
  - AND NOT 1-connected (i.e. not terminal)
- bonded by
  - a single bond
  - AND NOT a ring bond
- to the same type of atom
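The rule above can be sketched on a bare molecular graph. The input format (heavy atoms numbered from 0, bonds as `(i, j, order)` tuples with aromatic bonds given order 1.5) and the function names are assumptions. A bond is treated as a ring bond if its two ends stay connected when it is removed, and "1-connected" is taken as having exactly one explicit (heavy-atom) neighbour.

```python
from collections import defaultdict, deque

def _still_connected(adj, start, goal, removed):
    """BFS from start to goal, ignoring the single bond `removed`."""
    seen, queue = {start}, deque([start])
    while queue:
        atom = queue.popleft()
        if atom == goal:
            return True
        for nbr in adj[atom]:
            if frozenset((atom, nbr)) != removed and nbr not in seen:
                seen.add(nbr)
                queue.append(nbr)
    return False

def count_rotatable(bonds):
    """bonds: (i, j, order) tuples over heavy atoms; 1 = single, 1.5 = aromatic, 2, 3."""
    adj = defaultdict(set)
    in_triple = set()
    for i, j, order in bonds:
        adj[i].add(j)
        adj[j].add(i)
        if order == 3:
            in_triple.update((i, j))

    def qualifies(atom):                  # [!$(*#*)&!D1]
        return atom not in in_triple and len(adj[atom]) != 1

    count = 0
    for i, j, order in bonds:             # -&!@ : single AND not a ring bond
        if order == 1 and qualifies(i) and qualifies(j):
            if not _still_connected(adj, i, j, frozenset((i, j))):
                count += 1
    return count
```

For ethylbenzene (a six-membered aromatic ring plus a two-carbon chain) only the ring-to-CH2 bond qualifies: the CH2-CH3 bond ends in a terminal atom, and the ring bonds are excluded.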
12. Chemical Information Concepts in Discovery
- Matching
- Total
- Partial
- Similarity
- Qualitative
- Quantitative
- Both matching and similarity are matters of opinion, since both depend on the choice of descriptors.
13. Filtering
- Quite often you may wish to eliminate compounds which are inappropriate for some activity or test.
- E.g. delete any molecule from a list which contains a heavy metal, i.e. a non-common element:
  > CONTRIB/smarts_filter -v \
      '[!$([#6,#7,#8,#9,#15,#16,#17,#35,#53])]'
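The same filter can be approximated without a SMARTS matcher by scanning the element symbols in each SMILES string and rejecting any molecule that contains an element outside C, N, O, F, P, S, Cl, Br, I (atomic numbers 6, 7, 8, 9, 15, 16, 17, 35, 53). This swaps SMARTS matching for a regular-expression scan, so treat it as a sketch: the function names are hypothetical, hydrogen is also allowed (an assumption), and the parsing covers only the common organic-subset and bracket-atom cases.

```python
import re

# Elements permitted by the pattern's atomic numbers 6, 7, 8, 9, 15, 16, 17, 35, 53.
# Hydrogen is also permitted here, which is an assumption of this sketch.
ALLOWED = {"C", "N", "O", "F", "P", "S", "Cl", "Br", "I", "H"}

BRACKET_ATOM = re.compile(r"\[([^\]]+)\]")                    # e.g. [Na+], [nH], [13CH4]
BRACKET_SYMBOL = re.compile(r"^\d*([A-Z][a-z]?|[a-z]{1,2})")  # skip isotope, read symbol
ORGANIC_ATOM = re.compile(r"Cl|Br|[BCNOPSFI]|[bcnops]")       # unbracketed atoms

def elements_in(smiles: str) -> set:
    """Element symbols (aromatic forms capitalised) found in a SMILES string."""
    found = set()
    for body in BRACKET_ATOM.findall(smiles):
        match = BRACKET_SYMBOL.match(body)
        if match:
            found.add(match.group(1).capitalize())
    unbracketed = BRACKET_ATOM.sub("", smiles)
    found.update(sym.capitalize() for sym in ORGANIC_ATOM.findall(unbracketed))
    return found

def keep(smiles: str) -> bool:
    """True if every element is permitted, i.e. no atom would match the filter pattern."""
    return elements_in(smiles) <= ALLOWED
```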
14. Counting things
- Count matches to patterns defined in SMARTS
- Molecular formula
- H-donors
- H-acceptors
- Rotatable bonds
- Chiral centres
- Rings
- Fragments
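Once each element has been counted (for instance by matching one SMARTS atom pattern per element), the molecular formula is just a formatted count. This sketch assumes the per-element counts are already available and prints them in Hill order (C, then H, then the remaining elements alphabetically); that ordering is a common convention chosen here, not necessarily the format Daylight prints.

```python
from typing import Mapping

def hill_formula(counts: Mapping[str, int]) -> str:
    """Format per-element counts in Hill order: C, then H, then the rest alphabetically."""
    present = sorted(sym for sym, n in counts.items() if n > 0)
    if "C" in present:
        order = ["C"] + (["H"] if "H" in present else [])
        order += [sym for sym in present if sym not in ("C", "H")]
    else:
        order = present  # no carbon: strictly alphabetical, H included
    return "".join(sym if counts[sym] == 1 else f"{sym}{counts[sym]}" for sym in order)

print(hill_formula({"S": 1, "O": 3, "N": 4, "H": 22, "C": 13}))  # C13H22N4O3S
```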
15. Example
- Molecular formula C13H22N4O3S
- H-donors 2
- H-acceptors 6
- Rotatable bonds 8
- Chiral centres 1
- Rings 1
- Fragments 6
16. Estimating Measured Properties
- Any property which is an additive-constitutive property of a molecule can be calculated by:
  - counting the matches of the constituent patterns
  - looking up the weight for each pattern
  - summing the products of each count and its pattern weight
  - applying any correction factors
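The four steps amount to a single weighted sum. Writing $n_k$ for the number of matches of pattern $k$, $w_k$ for that pattern's weight and $c_j$ for the correction factors (symbols introduced here for illustration, assuming the corrections are additive), an estimated property $P$ is:

```latex
P \;=\; \sum_{k} n_k \, w_k \;+\; \sum_{j} c_j
```

Molecular weight is the special case in which the patterns are single elements, the weights are atomic weights and there are no corrections.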
17. Examples of properties to calculate
- Molecular Weight
- logP
- Parachor
- Molar Volume
- Molar Refractivity
- ...
18. Molecular weight: a simple example
- Molecular weight
- Molecular formula
- $\sum_i \mathrm{count}(\mathrm{atom}(i)) \times \mathrm{atomic\_weight}(\mathrm{atom}(i))$
- Accuracy depends on the accuracy of the atomic weights (IUPAC)
- C13H22N4O3S
  - 314.45 (average molecular weight)
  - 314.141235 (accurate mass, using the commonest isotope of each element)
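A minimal sketch of the sum for the example formula. The atomic weights are IUPAC conventional values and the isotopic masses are those of 12C, 1H, 14N, 16O and 32S; both tables are assumptions of this sketch rather than the values Daylight used, and the function name is hypothetical.

```python
# Assumed IUPAC conventional atomic weights (natural isotopic abundance).
AVERAGE_WEIGHT = {"C": 12.011, "H": 1.008, "N": 14.007, "O": 15.999, "S": 32.06}
# Assumed masses of the most abundant isotopes: 12C, 1H, 14N, 16O, 32S.
ISOTOPE_MASS = {"C": 12.0, "H": 1.00782503207, "N": 14.0030740048,
                "O": 15.99491461956, "S": 31.97207100}

def formula_mass(counts, weights):
    """Sum over elements of count(atom(i)) * weight(atom(i))."""
    return sum(n * weights[symbol] for symbol, n in counts.items())

formula = {"C": 13, "H": 22, "N": 4, "O": 3, "S": 1}   # C13H22N4O3S
average = formula_mass(formula, AVERAGE_WEIGHT)
accurate = formula_mass(formula, ISOTOPE_MASS)
print(f"{average:.2f} {accurate:.6f}")
```

With these tables the sketch gives about 314.40 and 314.1413. The accurate mass agrees with the figure above to within 0.0001, but the average is about 0.05 below the quoted 314.45, a gap too large to be rounding, so it is worth checking which atomic-weight table that figure rests on.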
19. CLOGP: a more complicated example
- Algorithmic definition of a fragment
  - Pattern: NOT an isolating carbon
- Match the pattern to find all the fragments
- Look up the fragment value(s) (if one exists) using the unique string(s) from the match
- Accumulate the values for fragments and non-fragments (isolating carbons)
- Correct for proximity
20. CLOGP example
- 2 Cl 1.880
- guanidyl -1.930
- 2 C 0.390
- 6 c 0.780
- 7 H 1.589
- Proximity -0.984
- Total 1.727
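The accumulation step can be sketched with the example's numbers, taking the guanidyl fragment and the proximity correction as negative contributions. That sign choice is an assumption made here: it is the reading under which the listed terms add up (to 1.725, within rounding of the quoted 1.727), and it fits the guanidyl group being polar. Fragment look-up and the proximity calculation itself are not shown; the labels are illustrative.

```python
# (label, contribution) pairs from the example; the negative signs on the
# guanidyl fragment and the proximity correction are ASSUMED.
contributions = [
    ("2 Cl (fragments)", 1.880),
    ("guanidyl (fragment)", -1.930),
    ("2 C (isolating carbons)", 0.390),
    ("6 c (isolating carbons)", 0.780),
    ("7 H", 1.589),
    ("proximity correction", -0.984),
]

def clogp_total(terms):
    """Accumulate fragment, isolating-carbon, hydrogen and correction terms."""
    return sum(value for _, value in terms)

total = clogp_total(contributions)
print(f"{total:.3f}")
```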
21. Estimating values for concepts
- Flexibility
  - Ratio of the number of rotatable bonds to the total number of bonds
- Rigidity
  - Molecular similarity between the original molecule and the molecules formed by breaking all rotatable bonds
- Difficulty of synthesis
  - Ratio of the number of potential chiral centres, weighted for rings, to the total number of heavy atoms in a molecule
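The flexibility and difficulty ratios can be sketched from the counts given for C13H22N4O3S (8 rotatable bonds, 1 chiral centre, 1 ring). Assumptions: the molecule is a single connected structure, bonds are counted between heavy atoms only (so bonds = heavy atoms - 1 + rings), and the unspecified ring weighting for chiral centres is taken as 1. The function names are hypothetical.

```python
def heavy_atom_count(formula):
    """Number of non-hydrogen atoms in a per-element count."""
    return sum(n for symbol, n in formula.items() if symbol != "H")

def bond_count(heavy_atoms, rings, components=1):
    """Heavy-atom bonds from the cyclomatic relation rings = bonds - atoms + components."""
    return heavy_atoms - components + rings

def flexibility(rotatable_bonds, total_bonds):
    return rotatable_bonds / total_bonds

def synthesis_difficulty(weighted_chiral_centres, heavy_atoms):
    # The ring weighting is not specified; pass an already-weighted count.
    return weighted_chiral_centres / heavy_atoms

formula = {"C": 13, "H": 22, "N": 4, "O": 3, "S": 1}
heavy = heavy_atom_count(formula)
bonds = bond_count(heavy, rings=1)
flex = flexibility(8, bonds)
diff = synthesis_difficulty(1, heavy)
print(f"{flex:.2f} {diff:.2f}")
```

Under these assumptions both results round to the flexibility (0.38) and difficulty (0.05) values on the first of the following Example slides, which suggests that slide describes the same molecule.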
22. Example
- Flexibility 0.38
- Rigidity 0.3819
- Difficulty of synthesis 0.05
23. Example
- Flexibility 0.38 (0.00)
- Rigidity 0.3819 (1.00)
- Difficulty of synthesis 0.05 (0.85)
- Figures in parentheses for morphine
24. Relationships between compounds
- Compound sets
- Molecular descriptors
- Fingerprints etc
- Similarity measures
- Tanimoto etc
- Clustering
- Jarvis-Patrick etc
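Similarity and clustering can be sketched on fingerprints represented as Python sets of "on" bit positions (an assumption standing in for fixed-length bit strings). The Tanimoto coefficient is the number of bits in common divided by the number of bits set in either fingerprint. In Jarvis-Patrick clustering, two structures join the same cluster when each is among the other's k nearest neighbours and they share at least k_min of those neighbours; clusters are the connected groups that result. The parameter names, the tie-breaking, and the choice to exclude a structure from its own neighbour list are assumptions; published variants differ on these details.

```python
from itertools import combinations
from typing import List, Set

def tanimoto(a: Set[int], b: Set[int]) -> float:
    """Bits in common divided by bits set in either fingerprint."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0  # two empty fingerprints: 0.0 by assumption

def nearest_neighbours(fps: List[Set[int]], k: int) -> List[Set[int]]:
    """For each fingerprint, the indices of its k most similar others (ties broken by index)."""
    result = []
    for i, fi in enumerate(fps):
        others = sorted((j for j in range(len(fps)) if j != i),
                        key=lambda j, fi=fi: (-tanimoto(fi, fps[j]), j))
        result.append(set(others[:k]))
    return result

def jarvis_patrick(fps: List[Set[int]], k: int, k_min: int) -> List[List[int]]:
    nn = nearest_neighbours(fps, k)
    parent = list(range(len(fps)))          # union-find over structure indices

    def find(x: int) -> int:
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i, j in combinations(range(len(fps)), 2):
        # Mutual k-nearest neighbours that also share at least k_min neighbours
        if j in nn[i] and i in nn[j] and len(nn[i] & nn[j]) >= k_min:
            parent[find(i)] = find(j)

    clusters = {}
    for i in range(len(fps)):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())

fps = [{1, 2, 3, 4}, {1, 2, 3, 5}, {1, 2, 4, 5}, {10, 11, 12}, {10, 11, 13}, {10, 12, 13}]
print(tanimoto(fps[0], fps[1]))             # 0.6
print(jarvis_patrick(fps, k=2, k_min=1))    # [[0, 1, 2], [3, 4, 5]]
```

Raising k_min to 2 in this toy set leaves every structure in its own cluster, which shows how strongly the result depends on the two parameters.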
25. Relationships between compounds
- Mixtures
- Molecular descriptors
- Modal Fingerprints etc
- Similarity measures
- Tanimoto etc
- Prototypes
- Family Resemblance
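One simple reading of a modal fingerprint for a mixture or compound set is the set of bits that are "on" in at least a given fraction of the members' fingerprints; comparing a structure with it (here by Tanimoto) then gives a family-resemblance score. The threshold parameter and function names are assumptions, and fingerprints are again represented as sets of bit positions.

```python
from collections import Counter
from typing import Iterable, Set

def modal_fingerprint(fps: Iterable[Set[int]], threshold: float = 0.5) -> Set[int]:
    """Bits set in at least `threshold` (a fraction) of the member fingerprints."""
    fps = list(fps)
    counts = Counter(bit for fp in fps for bit in fp)
    needed = threshold * len(fps)
    return {bit for bit, n in counts.items() if n >= needed}

def tanimoto(a: Set[int], b: Set[int]) -> float:
    union = len(a | b)
    return len(a & b) / union if union else 0.0

members = [{1, 2, 3}, {1, 2, 4}, {1, 5}]
modal = modal_fingerprint(members)          # bits 1 and 2 occur in at least half the members
resemblance = tanimoto({1, 2, 6}, modal)    # 2 bits in common, 3 in the union
```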
26. Relationships between compounds
- Reactions
- Molecular descriptors
- Fingerprints
- Rôles
- Schemes/pathways
- Similarity and clustering
27. Examples
- Creating a spreadsheet of properties.
- Non-standard fingerprinting and similarity.
28. Don't let the hedgehogs take over...