1
Chemoinformatics Theory
Yoon Soo Pyon ysp2_at_case.edu October 19th, 2007
2
Outline
  • Chemoinformatics-What is it?
  • Molecular descriptors and chemical spaces
  • Chemical spaces and molecular similarity
  • Molecular similarity, dissimilarity, diversity
  • Modification and Simplification of chemical
    spaces
  • Compound Classification and Selection
  • Similarity Searching
  • Machine Learning Methods
  • Library Design
  • Quantitative Structure Activity Relationship
    Analysis (QSAR)
  • Virtual Screening and compound filtering

3
Chemoinformatics-What is it?
  • Use of computer and informational techniques,
    applied to a range of problems in the field of
    chemistry.
  • These in silico techniques are used by
    pharmaceutical companies in the drug discovery
    process.

4
Chemoinformatics-What is it?
5
Chemoinformatics-What is it?
6
Molecular descriptors and chemical spaces
  • Chemical spaces are reference spaces onto which
    molecular data sets are projected and in which
    analysis or design is carried out.
  • The definition of a chemical space depends
    critically on the computational descriptors used
    to capture molecular structure and physical or
    chemical properties.

7
Molecular descriptors and chemical spaces
8
Molecular descriptors and chemical spaces
  • There are no generally preferred descriptor
    spaces.
  • Reference spaces must therefore be generated for
    each specific application on a case-by-case
    basis.

9
Chemical spaces and molecular similarity
  • Similar Property Principle: molecules having
    similar structures and properties should also
    exhibit similar activity (often, but not always,
    true).
  • Thus, molecules that are located close together
    in a chemical reference space are often
    considered to be functionally related.
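A minimal sketch of how "closeness in chemical space" can be measured, assuming each molecule is already represented as a vector of numeric descriptors. The compound names and descriptor values are hypothetical, and Euclidean distance is only one of several possible metrics.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two descriptor vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def nearest_neighbors(query, library, k=2):
    """Return the names of the k library compounds closest to the query vector."""
    ranked = sorted(library, key=lambda name: euclidean(query, library[name]))
    return ranked[:k]

# Hypothetical 2D descriptor space: (scaled molecular weight, scaled logP).
library = {
    "cpd_A": (0.10, 0.20),
    "cpd_B": (0.15, 0.25),
    "cpd_C": (0.90, 0.80),
    "cpd_D": (0.50, 0.50),
}
print(nearest_neighbors((0.12, 0.22), library))
```

Under the similar property principle, cpd_A and cpd_B would be the first guesses for compounds functionally related to the query, with the caveat above that the principle often, but not always, holds.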

10
Chemical spaces and molecular similarity
11
Molecular similarity, dissimilarity, and diversity
  • Diversity analysis
  • Select different compounds from a given
    population
  • Evenly populate a given chemical space with
    candidate molecules by selecting only compounds
    that are at least a predefined minimum distance
    away from one another.
  • Dissimilarity: the inverse of molecular
    similarity
  • Dissimilarity analysis has played a major role in
    the pharmaceutical industry.
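The minimum-distance rule for diversity selection can be sketched as a greedy pass that accepts a compound only if it lies at least `min_dist` away from every compound already accepted. This is an illustrative sphere-exclusion-style variant, not a specific published method; the threshold, the Euclidean metric and the processing order are assumptions, and the result depends on the input order.

```python
import math

def min_distance_selection(points, min_dist):
    """Greedily keep points that are >= min_dist from every point kept so far."""
    selected = []
    for p in points:
        if all(math.dist(p, q) >= min_dist for q in selected):
            selected.append(p)
    return selected

# Hypothetical compounds in a 2D descriptor space.
points = [(0.0, 0.0), (0.1, 0.0), (1.0, 0.0), (1.0, 1.0), (0.05, 0.95)]
print(min_distance_selection(points, 0.5))
```

The second compound is rejected because it lies only 0.1 units from the first; the remaining ones spread out over the space.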

12
Molecular similarity, dissimilarity, and diversity
  • Dissimilarity algorithm
  • Select a subset of k maximally dissimilar
    compounds
  • This is a non-trivial challenge because of the
    combinatorial nature of the problem
  • Alternative dissimilarity algorithm
  • 1) Decide on a desired size, n, of the final
    subset
  • 2) Select a seed compound and place it in the
    subset
  • 3) Calculate the dissimilarity between each of the
    other compounds and those in the subset
  • 4) Choose the next compound as the one most
    dissimilar to those in the subset
  • 5) If there are fewer than n compounds in the
    subset, repeat steps 3 and 4 until n is reached
  • Complexity varies as the square of n
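The iterative procedure above can be sketched as follows. The sketch assumes Euclidean distance as the dissimilarity measure and uses the MaxMin rule (dissimilarity to the subset = distance to the nearest subset member); the slide does not say which subset-dissimilarity definition is used, and other choices, such as the sum of distances, are also common.

```python
import math

def maxmin_select(points, n, seed_index=0):
    """Pick up to n mutually dissimilar compounds (MaxMin rule, Euclidean distance)."""
    selected = [seed_index]                         # step 2: seed compound
    # nearest[i] = distance from compound i to its closest subset member (step 3)
    nearest = [math.dist(p, points[seed_index]) for p in points]
    while len(selected) < min(n, len(points)):      # step 5: stop once n are chosen
        candidates = [i for i in range(len(points)) if i not in selected]
        # Step 4: the most dissimilar compound is the one farthest from its
        # nearest subset member.
        best = max(candidates, key=lambda i: nearest[i])
        selected.append(best)
        # Step 3 again, done incrementally: only the new member can lower a distance.
        for i, p in enumerate(points):
            nearest[i] = min(nearest[i], math.dist(p, points[best]))
    return selected

points = [(0, 0), (1, 0), (0, 1), (5, 5), (0.5, 0.5)]   # hypothetical descriptor vectors
print(maxmin_select(points, 3))
```

Starting from compound 0, the distant outlier (index 3) is picked next, followed by one of the two compounds at distance 1.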

13
Modification and Simplification of Chemical
Spaces
  • High-dimensional chemical spaces are often too
    complex for meaningful analysis.
  • Why?
  • 1) Major areas of a high-dimensional chemical
    space may not be populated and remain empty.
  • 2) Correlation effects between selected
    descriptors can dramatically distort the
    reference space.
  • Therefore:
  • 1) Design low-dimensional reference spaces
  • 2) Simplify high-dimensional spaces
  • 3) Reduce their dimensionality

14
Modification and Simplification of Chemical
Spaces (contd.)
  • Autoscaling (variance scaling)
  • Why? Descriptors with large value ranges would
    otherwise dominate those with smaller ranges.
  • Dimension reduction
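Autoscaling is commonly implemented as a z-score transformation: subtract each descriptor's mean and divide by its standard deviation, so every descriptor ends up with mean 0 and unit variance. A minimal sketch with hypothetical descriptor values; the population standard deviation is assumed here, although some tools use the sample standard deviation instead.

```python
import statistics

def autoscale(rows):
    """Scale each descriptor column to zero mean and unit (population) variance."""
    columns = list(zip(*rows))
    means = [statistics.fmean(c) for c in columns]
    stdevs = [statistics.pstdev(c) for c in columns]
    return [
        # Constant columns (stdev 0) carry no information and are mapped to 0.0.
        tuple((x - m) / s if s > 0 else 0.0 for x, m, s in zip(row, means, stdevs))
        for row in rows
    ]

# Hypothetical descriptors: molecular weight (hundreds of Da) and logP (single digits).
rows = [(200.0, 1.0), (300.0, 2.0), (400.0, 3.0)]
print(autoscale(rows))
```

After scaling, both columns span the same range, so molecular weight no longer dominates distance calculations merely because of its larger numeric values.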

15
Modification and Simplification of Chemical
Spaces (contd.) - Dimension reduction
  • Assumption: high-dimensional descriptor spaces
    have at least some intrinsic redundancy.
  • Two approaches
  • 1) Identify the descriptors that are most
    important for representing the original data set,
    and the relationships they form between objects,
    in a lower-dimensional representation
  • ex) multidimensional scaling (Agrafiotis
    et al., 2001)
  • 2) Generate new descriptors for lower-dimensional
    spaces by combining important contributors from
    the original ones
  • ex) Principal Component Analysis (PCA)
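PCA builds new descriptors as linear combinations of the original ones. The sketch below computes only the first principal component, using power iteration on the covariance matrix so that it runs with the standard library alone. For real work a linear-algebra library would normally be used (e.g., NumPy's `numpy.linalg.eigh` or `numpy.linalg.svd`). The data are hypothetical.

```python
import math
import random

def first_principal_component(rows, iterations=200, seed=0):
    """Return (unit loading vector, scores) for the first principal component.

    Power iteration on the sample covariance matrix of mean-centred data.
    Assumes the rows are not all identical (otherwise the covariance is zero).
    """
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centred = [[r[j] - means[j] for j in range(d)] for r in rows]
    cov = [[sum(c[i] * c[j] for c in centred) / (n - 1) for j in range(d)]
           for i in range(d)]
    rng = random.Random(seed)
    v = [rng.random() + 0.1 for _ in range(d)]        # positive random start vector
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Scores = projection of each centred compound onto the new axis (the new descriptor).
    scores = [sum(c[j] * v[j] for j in range(d)) for c in centred]
    return v, scores

# Hypothetical, strongly correlated descriptors (e.g., two size-related measures).
rows = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2), (4.0, 7.8)]
loadings, scores = first_principal_component(rows)
print(loadings, scores)
```

Because the two descriptors are almost perfectly correlated, the single score per compound captures nearly all of the variance of the original two-dimensional space.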

16
Modification and Simplification of Chemical
Spaces (contd.) - Simplification
  • Simplification of n-dimensional descriptor
    spaces
  • ex) Binary descriptor transformation
  • above mean → 1, below mean → 0
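The mean-threshold binary transformation can be sketched as follows. The slide does not say how a value exactly equal to the mean is treated; this sketch assumes it maps to 0. Descriptor values are hypothetical.

```python
def binarize_by_mean(rows):
    """Map each descriptor value to 1 if above its column mean, else 0."""
    columns = list(zip(*rows))
    means = [sum(c) / len(c) for c in columns]
    return [tuple(1 if x > m else 0 for x, m in zip(row, means)) for row in rows]

rows = [(120.0, 0.5), (250.0, 3.1), (410.0, 1.2)]  # hypothetical descriptor values
print(binarize_by_mean(rows))
```

With column means of 260 and 1.6, the three compounds become (0, 0), (0, 1) and (1, 0), i.e., a 2^n-cell binary space.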

17
Compound Classification and Selection- CLUSTER
ANALYSIS
  • The aim is to divide a group of objects into
    clusters such that objects within a cluster are
    similar, while objects in different clusters are
    dissimilar
  • There are many algorithms for doing this
  • Hierarchical methods seem to perform better than
    non-hierarchical ones
  • Sometimes called a distance-based approach to
    compound selection, because distance is measured
    between pairs of compounds

18
Compound Classification and Selection- CLUSTER
ANALYSIS
19
Compound Classification and Selection-
Hierarchical Clustering
  • The composition of each cluster depends on the
    cluster from which it was derived
  • Agglomerative methods start at the bottom and
    merge similar clusters (bottom-up)
  • Ward's method: clusters are formed so as to
    minimize the variance (i.e., the sum of the
    squared deviations from the cluster mean)
  • Others: the centroid method and the median method
  • Divisive hierarchical clustering starts with all
    compounds in a single cluster and partitions the
    data (top-down)
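A deliberately naive sketch of agglomerative clustering with Ward's criterion: at each step it merges the pair of clusters whose union causes the smallest increase in the within-cluster sum of squared deviations, which for clusters a and b equals n_a·n_b/(n_a+n_b) times the squared distance between their centroids. It recomputes everything at each step, so it is only suitable for small illustrations; library implementations (e.g., SciPy's `scipy.cluster.hierarchy.linkage` with `method="ward"`) are far more efficient. Points are hypothetical.

```python
def ward_clustering(points, n_clusters):
    """Merge clusters bottom-up using Ward's minimum-variance criterion."""
    clusters = [[p] for p in points]          # start: every compound is its own cluster

    def centroid(c):
        return [sum(x) / len(c) for x in zip(*c)]

    def merge_cost(a, b):
        # Increase in the total sum of squared deviations caused by merging a and b.
        ca, cb = centroid(a), centroid(b)
        sq = sum((x - y) ** 2 for x, y in zip(ca, cb))
        return len(a) * len(b) / (len(a) + len(b)) * sq

    while len(clusters) > n_clusters:
        pairs = [(i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))]
        i, j = min(pairs, key=lambda ij: merge_cost(clusters[ij[0]], clusters[ij[1]]))
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]                        # j > i, so index i is unaffected
    return clusters

points = [(0, 0), (0, 1), (5, 5), (5, 6), (10, 0)]   # hypothetical descriptor vectors
print(ward_clustering(points, 3))
```

Stopping at different values of `n_clusters` corresponds to cutting the hierarchy at different levels.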

20
Compound Classification and Selection-
Non-Hierarchical Clustering
  • Organize compounds into a predefined number of
    independent clusters.
  • Methods
  • Nearest neighbor: Jarvis-Patrick clustering
  • Relocation: k-means
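A minimal sketch of k-means (relocation) clustering in plain Python. It alternates between assigning each compound to its nearest centroid and moving each centroid to the mean of its members. Random initialisation from the data, Euclidean distance and the iteration cap are assumptions of this sketch; the points are hypothetical.

```python
import math
import random

def k_means(points, k, iterations=100, seed=0):
    """Relocation clustering: assign to nearest centroid, recompute centroids, repeat."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)          # initial centroids: k random compounds
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: index of the nearest centroid for every point.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        # Update step: move each centroid to the mean of its members
        # (an empty cluster keeps its previous centroid).
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(tuple(sum(x) / len(members) for x in zip(*members)))
            else:
                new_centroids.append(centroids[c])
        if new_centroids == centroids:         # converged: no centroid moved
            break
        centroids = new_centroids
    return labels, centroids

points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]  # hypothetical
labels, centroids = k_means(points, 2)
print(labels, centroids)
```

The number of clusters k must be chosen in advance, which matches the slide's description of non-hierarchical methods.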

21
Compound Classification and Selection-
Partitioning
  • Rather than comparing molecular positions,
    establish a coordinate or reference system in
    chemical space.
  • Compounds that populate the same partition are
    considered to be similar.
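One simple way to establish such a coordinate system is equal-width binning of every descriptor axis, so that each compound falls into exactly one cell. A minimal sketch; the number of bins and the descriptor ranges are hypothetical parameters, and real partitioning schemes may bin the axes differently.

```python
def cell_of(point, bins_per_axis, lower, upper):
    """Map a descriptor vector to a tuple of bin indices (its partition or cell)."""
    cell = []
    for x, lo, hi in zip(point, lower, upper):
        width = (hi - lo) / bins_per_axis
        index = int((x - lo) // width)
        cell.append(min(max(index, 0), bins_per_axis - 1))  # clamp values at the edges
    return tuple(cell)

def partition(points, bins_per_axis, lower, upper):
    """Group compound indices by the cell they fall into."""
    cells = {}
    for i, p in enumerate(points):
        cells.setdefault(cell_of(p, bins_per_axis, lower, upper), []).append(i)
    return cells

points = [(0.1, 0.1), (0.2, 0.15), (0.9, 0.9)]   # hypothetical scaled descriptors
print(partition(points, 2, (0.0, 0.0), (1.0, 1.0)))
```

Compounds 0 and 1 share cell (0, 0) and would therefore be treated as similar, without any pairwise distance being computed.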

22
Compound Classification and Selection-
Partitioning
  • Diversity-based selection: aims at generating a
    small, representative subset of a compound
    collection. It attempts to generate evenly
    populated partitions.
  • Activity-based selection: known active compounds
    are added to the source database prior to
    partitioning. Database compounds that map close
    to the known actives are then selected as
    candidates for testing to identify new hits.
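Both selection modes can be sketched on top of an equal-width grid. Two simplifications are assumed here: diversity-based selection takes the first compound encountered in each occupied cell, and "mapping close to known actives" is interpreted as falling into the same cell (neighboring cells could be included as well). All names and values are hypothetical.

```python
def cell_of(point, bins, lower=0.0, upper=1.0):
    """Equal-width binning of every descriptor into `bins` intervals."""
    width = (upper - lower) / bins
    return tuple(min(max(int((x - lower) // width), 0), bins - 1) for x in point)

def diversity_subset(library, bins):
    """Pick one representative (the first encountered) from each occupied cell."""
    chosen = {}
    for name, vec in library.items():
        chosen.setdefault(cell_of(vec, bins), name)
    return sorted(chosen.values())

def activity_based_candidates(library, actives, bins):
    """Library compounds that fall into a cell occupied by a known active."""
    active_cells = {cell_of(vec, bins) for vec in actives.values()}
    return sorted(n for n, vec in library.items() if cell_of(vec, bins) in active_cells)

library = {"L1": (0.1, 0.1), "L2": (0.2, 0.3), "L3": (0.8, 0.9), "L4": (0.6, 0.1)}
actives = {"A1": (0.15, 0.2)}  # hypothetical known active
print(diversity_subset(library, 2))
print(activity_based_candidates(library, actives, 2))
```

The diversity subset keeps one compound per occupied cell, while the activity-based list returns the compounds sharing a cell with the known active.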

23
Compound Classification and Selection-
Statistical Partitioning
  • Recursive partitioning: the most popular
    statistical partitioning method; a decision tree
    method
  • Divides data sets along decision trees formed by
    sequences of molecular descriptors.
  • ex) The compounds could be divided according to
    molecular weight.
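The core step of recursive partitioning is choosing the descriptor threshold that best separates actives from inactives. The sketch below shows that step for a single descriptor (molecular weight, as in the example above), scored with Gini impurity. Gini impurity is one common split criterion and is assumed here; the data are hypothetical.

```python
def gini(labels):
    """Gini impurity of a list of 0/1 activity labels: 1 - p^2 - (1-p)^2 = 2p(1-p)."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)

def best_split(values, labels):
    """Find the threshold on one descriptor that minimises weighted Gini impurity."""
    best = (float("inf"), None)
    candidates = sorted(set(values))
    for lo, hi in zip(candidates, candidates[1:]):
        t = (lo + hi) / 2                      # midpoint between adjacent values
        left = [y for x, y in zip(values, labels) if x <= t]
        right = [y for x, y in zip(values, labels) if x > t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if score < best[0]:
            best = (score, t)
    return best[1], best[0]

mw = [180, 220, 250, 410, 450, 500]   # hypothetical molecular weights
active = [0, 0, 0, 1, 1, 0]           # hypothetical HTS outcomes (1 = active)
print(best_split(mw, active))
```

A full recursive-partitioning procedure would apply this search to every descriptor, keep the best split, and recurse on the two resulting subsets until a stopping rule is met.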

24
Compound Classification and Selection-
Statistical Partitioning
  • Statistical partitioning methods such as
    recursive partitioning are also very attractive
    tools for the analysis of high-throughput
    screening (HTS) data sets.

25
Similarity Searching - Structural queries and
graphs
  • Detection of structural fragments or
    substructures is a simple but popular form of
    similarity searching.

26
Similarity Searching - Structural queries and
graphs
  • Contemporary substructure search methods are
    mostly based on dictionaries of predefined
    molecular fragments.
  • Queries can be transformed into a
    machine-readable format such as Simplified
    Molecular Input Line Entry System (SMILES)
    code.
  • SMILES encodes the 2D representation of a molecule
    as a linear string of ASCII characters (e.g.,
    ethanol is written CCO).

27
Similarity Searching - Structural queries and
graphs (SMILES)
28
Similarity Searching - Structural queries and
graphs
  • Subgraph isomorphism
  • Common substructures can also be determined by
    systematically mapping corresponding node
    positions in molecular graphs.
  • However, this is computationally expensive
  • Reduced graphs
  • Nodes do not represent atoms but features such as
    functionally important groups or whole ring
    systems.
  • This makes reduced graphs more suitable for
    node-matching procedures and similarity searching.
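A deliberately naive sketch of atom-by-atom subgraph matching by backtracking. Molecules are represented as hypothetical dict-based graphs carrying element labels only; bond orders, charges and aromaticity are ignored, which is a simplifying assumption. The exhaustive search illustrates why the method is computationally expensive: in the worst case the number of candidate mappings grows combinatorially. Production tools (e.g., algorithms of the VF2 family) add much stronger pruning.

```python
def find_substructure(query, target):
    """Return one mapping of query atoms onto target atoms, or None.

    Graphs are dicts: {"atoms": {node: element}, "bonds": set of frozenset({u, v})}.
    Every query atom must map to a distinct target atom with the same element,
    and every query bond must exist between the corresponding target atoms.
    """
    q_nodes = list(query["atoms"])

    def consistent(mapping, q, t):
        if query["atoms"][q] != target["atoms"][t] or t in mapping.values():
            return False
        for q_other, t_other in mapping.items():
            if (frozenset((q, q_other)) in query["bonds"]
                    and frozenset((t, t_other)) not in target["bonds"]):
                return False
        return True

    def extend(mapping):
        if len(mapping) == len(q_nodes):
            return dict(mapping)
        q = q_nodes[len(mapping)]
        for t in target["atoms"]:
            if consistent(mapping, q, t):
                mapping[q] = t
                result = extend(mapping)
                if result is not None:
                    return result
                del mapping[q]                 # backtrack
        return None

    return extend({})

# Heavy-atom graph of ethanol (CCO) and a C-O query fragment.
ethanol = {"atoms": {0: "C", 1: "C", 2: "O"}, "bonds": {frozenset({0, 1}), frozenset({1, 2})}}
c_o = {"atoms": {"a": "C", "b": "O"}, "bonds": {frozenset({"a", "b"})}}
print(find_substructure(c_o, ethanol))
```

The first attempt maps the query carbon to atom 0, fails because atom 0 is not bonded to the oxygen, and backtracks to atom 1.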

29
Similarity Searching - Structural queries and
graphs (Reduced graph)
30
Similarity Searching - Pharmacophore
  • A molecular framework that carries the essential
    features responsible for a drug's biological
    activity
  • The spatial arrangement of atoms or groups that
    is responsible for biological activity
  • Often used as 3D queries for database searching
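A simplified sketch of a 3D pharmacophore query: a conformation matches if it contains features of the required types whose pairwise distances agree with the query within a tolerance. The feature types, coordinates (in ångströms), distances and tolerance are all hypothetical, and real pharmacophore searches also handle conformational flexibility, which this sketch ignores.

```python
import itertools
import math

def matches_pharmacophore(features, query, tolerance=0.5):
    """Check whether one 3D conformation satisfies a pharmacophore query.

    features: list of (feature_type, (x, y, z)) found in the conformation.
    query: {"types": (t0, t1, ...), "distances": {(i, j): d_ij, ...}}
    """
    for combo in itertools.permutations(features, len(query["types"])):
        if any(f[0] != t for f, t in zip(combo, query["types"])):
            continue                            # feature types do not match
        if all(abs(math.dist(combo[i][1], combo[j][1]) - d) <= tolerance
               for (i, j), d in query["distances"].items()):
            return True
    return False

# Hypothetical conformation and three-point query.
features = [("donor", (0, 0, 0)), ("acceptor", (3, 0, 0)),
            ("aromatic", (0, 4, 0)), ("acceptor", (10, 0, 0))]
query = {"types": ("donor", "acceptor", "aromatic"),
         "distances": {(0, 1): 3.0, (0, 2): 4.0, (1, 2): 5.0}}
print(matches_pharmacophore(features, query))
```

The donor, the nearer acceptor and the aromatic centre form a 3-4-5 triangle that satisfies every distance constraint.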

31
Similarity Searching - Fingerprints
  • Fingerprints
  • Widely used similarity search tools
  • Consist of various descriptors that are encoded
    as bit strings
  • Bit strings of the query and database compounds
    are compared using a similarity metric such as
    the Tanimoto coefficient
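For binary fingerprints the Tanimoto coefficient is c / (a + b - c), where a and b are the numbers of bits set in each fingerprint and c is the number of bits set in both. A minimal sketch storing fingerprints as Python integers; the 8-bit fingerprints are hypothetical, and returning 0.0 for two empty fingerprints is an assumed convention.

```python
def tanimoto(fp_a: int, fp_b: int) -> float:
    """Tanimoto coefficient of two fingerprints stored as integers (bit strings)."""
    a = bin(fp_a).count("1")          # bits set in A
    b = bin(fp_b).count("1")          # bits set in B
    c = bin(fp_a & fp_b).count("1")   # bits set in both
    denom = a + b - c
    return c / denom if denom else 0.0   # two empty fingerprints -> 0.0 (convention)

query = 0b1011_0110   # hypothetical 8-bit fingerprints
hit   = 0b1011_0010
print(tanimoto(query, hit))
```

The query sets 5 bits, the hit 4, and they share 4, giving 4 / (5 + 4 - 4) = 0.8. The complement, 1 - T, is a common dissimilarity measure.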

32
Machine Learning Methods
  • Play an important role in chemoinformatics
  • For example, it is usually difficult to predict
    which types of descriptors are most suitable for
    a given search or classification task.
  • Therefore, machine learning techniques are often
    used to facilitate descriptor selection
  • They are also applied to generate complex
    predictive models by iterative processing of
    molecular learning sets
  • Genetic algorithms
  • Neural Networks
  • Self Organizing Maps (SOM)

33
Machine Learning Methods - Genetic algorithms
  • Different parameters or model solutions to a
    given problem are encoded in chromosomes and
    subjected to iterative random variation, thus
    generating a population.
  • The solutions provided by these chromosomes are
    evaluated by a fitness function that assigns high
    scores to desired results.
  • Chromosomes yielding the best intermediate
    solutions are subjected to mutation and crossover
    operations, which correspond to random genetic
    mutations and gene recombination events.
  • The resulting modified chromosomes represent the
    next generation, and the process continues until
    the results meet a satisfactory convergence
    criterion.
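A minimal sketch of the cycle above applied to descriptor selection: each chromosome is a bit string switching descriptors on or off. Truncation selection (keep the best half), one-point crossover, per-gene mutation and a fixed number of generations as the stopping rule are all assumptions of this sketch. The fitness function is a hypothetical toy; in practice it might be, for example, the cross-validated performance of a model built from the selected descriptors.

```python
import random

def genetic_algorithm(fitness, n_genes, pop_size=20, generations=50,
                      mutation_rate=0.05, seed=0):
    """Evolve bit-string chromosomes (descriptor on/off masks) to maximise fitness."""
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(n_genes)] for _ in range(pop_size)]
    for _ in range(generations):
        # Evaluate and rank; the best half survives as parents.
        population.sort(key=fitness, reverse=True)
        parents = population[: pop_size // 2]
        children = []
        while len(parents) + len(children) < pop_size:
            mother, father = rng.sample(parents, 2)
            cut = rng.randrange(1, n_genes)                    # one-point crossover
            child = mother[:cut] + father[cut:]
            child = [1 - g if rng.random() < mutation_rate else g for g in child]  # mutation
            children.append(child)
        population = parents + children                        # next generation
    return max(population, key=fitness)

# Toy fitness: reward descriptors 0, 2 and 5 (hypothetically "informative"),
# penalise every other selected descriptor.
INFORMATIVE = {0, 2, 5}
def toy_fitness(mask):
    return sum(1 if i in INFORMATIVE else -1 for i, g in enumerate(mask) if g)

best = genetic_algorithm(toy_fitness, n_genes=8)
print(best, toy_fitness(best))
```

Because the parents are carried over unchanged, the best fitness in the population never decreases from one generation to the next.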

34
Library Design
  • Diverse Library
  • Focused Library

35
Quantitative Structure Activity Relationship
Analysis (QSAR)
  • Goal: evaluation of the molecular features that
    determine biological activity, and prediction of
    compound potency as a function of structural
    modification
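The simplest QSAR model relates potency to a single descriptor through a straight line fitted by ordinary least squares. Practical QSAR models usually combine several descriptors. The sketch below uses one descriptor only to keep the arithmetic visible, and the logP and pIC50 values are hypothetical.

```python
def fit_linear_qsar(x, y):
    """Ordinary least-squares fit of y = intercept + slope * x (one descriptor)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Hypothetical training set: logP vs. potency (pIC50) for a congeneric series.
logp = [1.0, 2.0, 3.0, 4.0]
pic50 = [5.1, 5.9, 7.1, 7.9]
b0, b1 = fit_linear_qsar(logp, pic50)
print(f"pIC50 = {b0:.2f} + {b1:.2f} * logP")
print("predicted pIC50 at logP 2.5:", round(b0 + b1 * 2.5, 2))
```

Here the fit gives pIC50 = 4.10 + 0.96 * logP, so the model predicts the potency of a new analogue from its descriptor value alone.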

36
Virtual Screening and Compound Filtering
  • Virtual screening (VS): the process of
    computationally screening large databases for
    molecules having desired properties and
    biological activity.
  • A major application of VS techniques is the
    identification of novel active molecules in large
    compound databases.
  • A series of known active compounds is added as
    search templates to a source database; compounds
    identified by VS calculations as similar to these
    templates are then selected as candidate
    molecules for experimental evaluation.
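Template-based screening can be sketched by scoring every database compound against all known actives and keeping those above a threshold. This sketch assumes bit-string fingerprints, Tanimoto similarity, and "MAX" fusion (a compound's score is its best similarity to any template); the threshold and fingerprints are hypothetical.

```python
def tanimoto(a: int, b: int) -> float:
    """Tanimoto coefficient |A and B| / |A or B| for integer bit strings."""
    union = bin(a | b).count("1")
    return bin(a & b).count("1") / union if union else 0.0

def screen(database, templates, threshold=0.5):
    """Rank database compounds by their best Tanimoto similarity to any template."""
    scored = []
    for name, fp in database.items():
        score = max(tanimoto(fp, t) for t in templates)   # MAX fusion over templates
        if score >= threshold:
            scored.append((score, name))
    return [name for score, name in sorted(scored, reverse=True)]

templates = [0b1111_0000, 0b0000_1111]          # hypothetical known actives
database = {"X": 0b1110_0000, "Y": 0b0000_1111,
            "Z": 0b1000_0001, "W": 0b0011_1100}  # hypothetical library
print(screen(database, templates))
```

Y is identical to the second template and ranks first, X follows with 0.75, and Z and W fall below the threshold, so only X and Y would go forward to experimental evaluation.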

37
(No Transcript)
38
Virtual Screening and Compound Filtering- Filter
Functions
  • Filter functions are very popular tools for VS
  • They attempt to identify compounds with desired
    properties and discard the others.
  • Filters have been implemented for the analysis of
    diverse molecular properties, including chemical
    reactivity, toxicity, drug-like character, and
    absorption, distribution, metabolism, and
    excretion (ADME) parameters.
  • ex) aqueous solubility, passive absorption,
    blood-brain barrier penetration, metabolic
    stability, oral availability
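As an illustration of a drug-likeness filter function, the sketch below applies Lipinski's rule of five, a well-known filter that the slide does not name and that is used here only as an example. It assumes the properties (molecular weight in Da, logP, hydrogen-bond donor and acceptor counts) were already computed elsewhere. The library entries are hypothetical.

```python
def passes_rule_of_five(props, max_violations=1):
    """Lipinski-style filter: keep compounds with at most one rule violation.

    props: dict with "mw" (Da), "logp", "hbd" (H-bond donors), "hba" (H-bond acceptors).
    """
    violations = sum([
        props["mw"] > 500,
        props["logp"] > 5,
        props["hbd"] > 5,
        props["hba"] > 10,
    ])
    return violations <= max_violations

# Hypothetical precomputed property values for two library compounds.
library = {
    "cpd_1": {"mw": 350.4, "logp": 2.1, "hbd": 2, "hba": 5},
    "cpd_2": {"mw": 720.9, "logp": 6.3, "hbd": 4, "hba": 12},
}
kept = [name for name, p in library.items() if passes_rule_of_five(p)]
print(kept)
```

cpd_2 violates three of the four criteria and is discarded. Other filters (e.g., for reactivity or toxicity) follow the same keep-or-discard pattern with different rules.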

39
Virtual Screening and Compound Filtering- Filter
Functions
40
Thank You