Daylight and Discovery - PowerPoint PPT Presentation

Slides: 30
Provided by: JohnBr171

1
Daylight and Discovery
  • How do I impress the boss when I get back?

2
What is Discovery?
  • A constant fight against the hedgehogs!!

3
What have I learned this week?
  • Above all you have learned new languages that
    allow you to communicate chemical concepts to,
    and between, machines.
  • These languages also allow you to communicate
    these concepts via machines to your colleagues.
  • You have also learned about other descriptions of
    a molecular structure, such as fingerprints.

4
Language recap
  • SMILES
  • SMARTS
  • SMIRKS
  • (FINGERPRINTS)

5
SMILES
  • SMILES contains the same information as might be
    found in an extended connection table.
  • The primary reason SMILES is more useful than a
    connection table is that it is a linguistic
    construct, rather than a computer data structure.
  • SMILES is a true language, albeit with a simple
    vocabulary (atom and bond symbols) and only a few
    grammar rules.
  • SMILES can be canonicalised, i.e. there is a
    unique, universal name for a structure.
  • SMILES representations of structure can in turn
    be used as words in the vocabulary of other
    languages designed for storage and retrieval of
    chemical information, e.g. HTML, XML or query
    languages such as SQL.

6
SMILES syntax
atom bond atom etc.
atom: '[' <mass> symbol <chiral> <hcount> <sign<charge>> <class> ']'
bond: <empty> | '-' | '=' | '#' | ':' | '.'
Common elements in the organic subset
(B, C, N, O, P, S, F, Cl, Br, I), in their lowest common
valence state(s), can be written without
brackets. If bonds are omitted, they default to
single or aromatic, as appropriate, for
juxtaposed atoms.
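
The grammar above can be illustrated with a small tokenizer. This is a minimal sketch, not Daylight's parser: the names `ORGANIC`, `TOKEN` and `tokenize` are made up for the example, and it also recognises branches and ring-closure digits, which are part of SMILES but not listed on the slide. Stereo bond symbols outside brackets are not handled.

```python
import re

# Organic-subset atoms that may be written without brackets.
# Two-letter symbols come first so "Cl" is not read as "C" followed by "l".
ORGANIC = r"Cl|Br|[BCNOPSFI]|[bcnops]"

TOKEN = re.compile(
    r"(?P<bracket>\[[^\]]+\])"         # bracket atom, e.g. [NH4+]
    + rf"|(?P<atom>{ORGANIC})"         # organic-subset atom
    + r"|(?P<bond>[-=#:.])"            # explicit bond, or '.' for no bond
    + r"|(?P<branch>[()])"             # branch open / close
    + r"|(?P<ring>%\d{2}|\d)"          # ring-closure label
)


def tokenize(smiles):
    """Split a SMILES string into (kind, text) tokens."""
    tokens, pos = [], 0
    while pos < len(smiles):
        m = TOKEN.match(smiles, pos)
        if m is None:
            raise ValueError(f"unexpected character {smiles[pos]!r} at {pos}")
        tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens


if __name__ == "__main__":
    for kind, text in tokenize("CC(=O)O"):   # acetic acid
        print(kind, text)
```

Bonds that are left out simply produce no token, which is where the "single or aromatic" default applies.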
7
Example SMILES
8
SMARTS
  • In the SMILES language, there are two fundamental
    types of symbols: atoms and bonds. Using these
    SMILES symbols, one can specify a molecule's
    graph (its "nodes" and "edges") and assign
    "labels" to the components of the graph (that is,
    say what type of atom each node represents, and
    what type of bond each edge represents).
  • The same is true in SMARTS: one uses atomic and
    bond symbols to specify a graph. However, in
    SMARTS the labels for the graph's nodes and edges
    (its "atoms" and "bonds") are extended to include
    "logical operators" and special atomic and bond
    symbols; these allow SMARTS atoms and bonds to be
    more general. For example, the SMARTS atomic
    symbol [C,N] is an atom that can be aliphatic C
    or aliphatic N; the SMARTS bond symbol '~'
    (tilde) matches any bond.

9
Example SMARTS
10
Useful SMARTS
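Heavy atom: [!$([#6,#7,#8,#9,#15,#16,#17,#35,#53])]
Rotatable bonds: [!$(*#*)&!D1]-&!@[!$(*#*)&!D1]
Secondary amides: [NH1D2]-!@[#6X3]
H-donors: [!#6;!H0]
H-acceptors: [$([!#6;+0]);!$([F,Cl,Br,I]);!$([o,s,nX3]);!$([Nv5,Pv5,Sv4,Sv6])]
Isolating carbons: 6!(C(F)(F)F)!(c(!c)!c)!(6,!6)!(6!0) [SMARTS punctuation lost in source]
Stereo atoms: (X4!v6!v5H0,H1),(SX3(6)(6)O) [SMARTS punctuation lost in source]
Stereo bonds: [CX3;!H2]=[CX3;!H2]
Stereo allenes: CX3H0CCX3H0,H1 [SMARTS punctuation lost in source]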
11
Rotatable bonds: [!$(*#*)&!D1]-&!@[!$(*#*)&!D1]
  • An atom which is
      • NOT triply bonded to another atom
      • AND NOT 1-connected (i.e. not terminal)
  • Bonded by
      • a single bond
      • AND NOT a ring bond
  • to the same type of atom
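
The three conditions can be checked without a SMARTS engine on a hand-built heavy-atom graph. The sketch below is only an illustration of the logic: the function name `rotatable_bonds` and the `(i, j, order)` bond-list format are assumptions of the example, and hydrogens are taken to be implicit so that the neighbour count plays the role of D.

```python
from collections import defaultdict


def rotatable_bonds(bonds):
    """Return the rotatable bonds from a list of (i, j, order) tuples.

    Atoms are arbitrary hashable labels; order is 1, 2 or 3.
    Only heavy atoms are expected (hydrogens implicit).
    """
    neighbours = defaultdict(set)
    triply_bonded = set()
    for i, j, order in bonds:
        neighbours[i].add(j)
        neighbours[j].add(i)
        if order == 3:
            triply_bonded.update((i, j))

    def in_ring(i, j):
        # The bond i-j is a ring bond if j can still be reached from i
        # by a path that does not use the bond itself.
        seen = {i}
        stack = [n for n in neighbours[i] if n != j]
        while stack:
            atom = stack.pop()
            if atom == j:
                return True
            if atom not in seen:
                seen.add(atom)
                stack.extend(neighbours[atom] - seen)
        return False

    def atom_ok(atom):
        # NOT triply bonded to another atom AND NOT 1-connected.
        return atom not in triply_bonded and len(neighbours[atom]) > 1

    return [
        (i, j)
        for i, j, order in bonds
        if order == 1 and atom_ok(i) and atom_ok(j) and not in_ring(i, j)
    ]


if __name__ == "__main__":
    butane = [(0, 1, 1), (1, 2, 1), (2, 3, 1)]
    print(rotatable_bonds(butane))   # only the central bond qualifies
```

For butane only the central bond passes, because both terminal bonds end on a 1-connected atom.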

12
Chemical Information Concepts in Discovery
  • Matching
      • Total
      • Partial
  • Similarity
      • Qualitative
      • Quantitative
  • Both matching and similarity are opinions, as they
    depend on descriptors.

13
Filtering
  • Quite often you may wish to eliminate compounds
    which are inappropriate for some activity or
    test.
  • E.g. delete from a list any molecule which
    contains a heavy metal, i.e. a non-common
    element:
  • > CONTRIB/smarts_filter -v \
    '[!$([#6,#7,#8,#9,#15,#16,#17,#35,#53])]'
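
The same idea can be sketched without the Daylight toolkit by treating each molecule as a list of heavy-atom atomic numbers. The names `COMMON`, `has_uncommon_element` and `keep_common` are hypothetical, and the input format is an assumption of the example. This is not how `smarts_filter` works internally.

```python
# Atomic numbers of the "common" elements in the slide's pattern:
# C, N, O, F, P, S, Cl, Br, I.
COMMON = {6, 7, 8, 9, 15, 16, 17, 35, 53}


def has_uncommon_element(atomic_numbers):
    """True if any atom falls outside the common-element set."""
    return any(z not in COMMON for z in atomic_numbers)


def keep_common(molecules):
    """Drop every molecule that contains a non-common element.

    molecules maps a name to a list of heavy-atom atomic numbers.
    """
    return {
        name: atoms
        for name, atoms in molecules.items()
        if not has_uncommon_element(atoms)
    }


if __name__ == "__main__":
    library = {
        "ethanol": [6, 6, 8],
        "cisplatin": [78, 17, 17, 7, 7],          # contains Pt (78)
        "tetraethyllead": [82] + [6] * 8,          # contains Pb (82)
    }
    print(sorted(keep_common(library)))
```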

14
Counting things
  • Count matches to patterns defined in SMARTS
      • Molecular formula
      • H-donors
      • H-acceptors
      • Rotatable bonds
      • Chiral centres
      • Rings
      • Fragments

15
Example
  • Molecular formula C13H22N4O3S
  • H-donors 2
  • H-acceptors 6
  • Rotatable bonds 8
  • Chiral centres 1
  • Rings 1
  • Fragments 6

16
Estimating Measured Properties
  • Any property which is an additive constitutive
    property of a molecule can be calculated by
      • counting the matches of the constituent patterns
      • looking up the weight for each pattern
      • summing the products of the counts and the
        individual pattern weights
      • applying any correction factors
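
The recipe above amounts to a weighted sum plus corrections. A minimal sketch, assuming the pattern counts have already been obtained; the function name and the weights in the usage example are purely illustrative and not real fragment values.

```python
def additive_property(counts, weights, corrections=()):
    """Sum count * weight over all patterns, then add correction terms.

    counts      maps pattern name -> number of matches in the molecule
    weights     maps pattern name -> contribution per match
    corrections is an iterable of additive correction factors
    """
    missing = set(counts) - set(weights)
    if missing:
        raise KeyError(f"no weight for pattern(s): {sorted(missing)}")
    total = sum(n * weights[pattern] for pattern, n in counts.items())
    return total + sum(corrections)


if __name__ == "__main__":
    # Hypothetical counts and weights, chosen only to show the arithmetic.
    counts = {"CH3": 2, "CH2": 1}
    weights = {"CH3": 0.5, "CH2": 0.25}
    print(additive_property(counts, weights, corrections=[0.5]))
```

Raising on a missing weight, rather than silently using zero, makes gaps in the lookup table visible.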

17
Examples of properties to calculate
  • Molecular Weight
  • logP
  • Parachor
  • Molar Volume
  • Molar Refractivity
  • .

18
Molecular weight: a simple example
  • Molecular weight
      • Molecular formula
      • Σ (count(atom(i)) × atomic_weight(atom(i)))
      • Accuracy depends on accuracy of atomic weights
        (IUPAC)
  • C13H22N4O3S
      • 314.45 (average molecular weight)
      • 314.141235 (accurate mass of commonest isotope)
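
The summation can be tried directly on the formula from the slide. This is a sketch under assumptions: the two small tables hold rounded atomic weights and isotope masses entered by hand for this example, and the parser handles only flat formulas without parentheses. Results depend on the table used, so small differences from the slide's figures are to be expected.

```python
import re

# Rounded standard atomic weights (assumed values for this sketch).
AVERAGE = {"C": 12.011, "H": 1.008, "N": 14.007, "O": 15.999, "S": 32.06}

# Rounded masses of the most abundant isotope of each element (assumed).
MONOISOTOPIC = {
    "C": 12.0,
    "H": 1.00782503,
    "N": 14.00307401,
    "O": 15.99491462,
    "S": 31.972071,
}


def parse_formula(formula):
    """Turn a flat formula such as 'C13H22N4O3S' into {element: count}."""
    counts = {}
    for symbol, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[symbol] = counts.get(symbol, 0) + int(number or 1)
    return counts


def mass(formula, table):
    """Sum count(atom) * weight(atom) over the elements of the formula."""
    return sum(n * table[el] for el, n in parse_formula(formula).items())


if __name__ == "__main__":
    formula = "C13H22N4O3S"
    print(f"average:      {mass(formula, AVERAGE):.3f}")
    print(f"monoisotopic: {mass(formula, MONOISOTOPIC):.4f}")
```

With these tables the average weight comes out near 314.40 and the monoisotopic mass near 314.141.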

19
CLOGP: a more complicated example
  • Algorithmic definition of fragment
      • Pattern: NOT an isolating carbon
  • Match the pattern to find all the fragments
  • Look up the fragment value(s) (if they exist)
    using the unique string(s) from the match.
  • Accumulate the values for fragments and
    non-fragments (isolating carbons).
  • Correct for proximity

20
CLOGP example
  • 2 Cl 1.880
  • guanidyl 1.930
  • 2 C 0.390
  • 6 c 0.780
  • 7 H 1.589
  • Proximity 0.984
  • Total 1.727

21
Estimating values for concepts
  • Flexibility
      • Ratio of number of rotatable bonds to total
        number of bonds
  • Rigidity
      • Molecular similarity between original molecule
        and molecules formed by breaking all rotatable
        bonds
  • Difficulty of synthesis
      • Ratio of number of potential chiral centres
        weighted for rings to total number of heavy atoms
        in a molecule
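
The flexibility ratio is simple enough to sketch directly. The 8 rotatable bonds come from the earlier counting example; the total of 21 bonds is an assumption made here (a connected molecule with 21 heavy atoms and one ring, such as C13H22N4O3S, has 21 bonds between heavy atoms) and is not stated in the slides.

```python
def flexibility(rotatable_bonds, total_bonds):
    """Ratio of rotatable bonds to all bonds (heavy-atom bonds assumed)."""
    if total_bonds <= 0:
        raise ValueError("total_bonds must be positive")
    if not 0 <= rotatable_bonds <= total_bonds:
        raise ValueError("rotatable_bonds must lie between 0 and total_bonds")
    return rotatable_bonds / total_bonds


if __name__ == "__main__":
    # 8 rotatable bonds out of an assumed 21 heavy-atom bonds.
    print(round(flexibility(8, 21), 2))
```

A molecule with no rotatable bonds scores 0.0 on this measure.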

22
Example
  • Flexibility 0.38
  • Rigidity 0.3819
  • Difficulty of synthesis 0.05

23
Example
  • Flexibility 0.38 (0.00)
  • Rigidity 0.3819 (1.00)
  • Difficulty of synthesis 0.05 (0.85)
  • Figures in parentheses are for morphine

24
Relationships between compounds
  • Compound sets
      • Molecular descriptors
          • Fingerprints etc
      • Similarity measures
          • Tanimoto etc
      • Clustering
          • Jarvis-Patrick etc
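
As a reminder of what the Tanimoto measure computes, here is a minimal sketch that represents each fingerprint as the set of its "on" bit positions. The function name and the set representation are choices made for this example; returning 0.0 for two empty fingerprints is a convention assumed here, not something the slides specify.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient: bits in common divided by bits set in either."""
    a, b = set(fp_a), set(fp_b)
    union = len(a | b)
    if union == 0:
        return 0.0          # assumed convention for two empty fingerprints
    return len(a & b) / union


if __name__ == "__main__":
    fp1 = {1, 2, 3, 4}
    fp2 = {3, 4, 5, 6}
    print(tanimoto(fp1, fp2))   # 2 shared bits out of 6 distinct bits
```

A clustering method such as Jarvis-Patrick would then work from the nearest-neighbour lists that these pairwise values produce.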

25
Relationships between compounds
  • Mixtures
      • Molecular descriptors
          • Modal Fingerprints etc
      • Similarity measures
          • Tanimoto etc
      • Prototypes
          • Family Resemblance
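
One common reading of a modal fingerprint is the set of bits that are on in at least a given fraction of a collection's members. The sketch below follows that reading as an assumption; the function name, the set-of-bits representation and the default threshold of 0.5 are all choices made for illustration.

```python
from collections import Counter


def modal_fingerprint(fingerprints, threshold=0.5):
    """Bits set in at least `threshold` of the given fingerprints.

    Each fingerprint is an iterable of 'on' bit positions.
    """
    members = [set(fp) for fp in fingerprints]
    if not members:
        return set()
    counts = Counter(bit for fp in members for bit in fp)
    return {bit for bit, n in counts.items() if n / len(members) >= threshold}


if __name__ == "__main__":
    mixture = [{1, 2, 3}, {2, 3, 4}, {3, 4, 5}]
    print(sorted(modal_fingerprint(mixture)))          # majority bits
    print(sorted(modal_fingerprint(mixture, 1.0)))     # bits shared by all
```

The result can be compared with any single fingerprint using the same similarity measures as for individual compounds.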

26
Relationships between compounds
  • Reactions
  • Molecular descriptors
  • Fingerprints
  • Rôles
  • Schemes/pathways
  • Similarity and clustering

27
Examples
  • Creating a spreadsheet of properties.
  • Non-standard fingerprinting and similarity.

28
Don't let the hedgehogs take over...