Using visualization to find relationships in chemogenomic data

1
Using visualization to find relationships in
chemogenomic data
  • Brian Prather, Ph.D., Application Specialist,
    Spotfire
  • Third Virtual Conference on Genomics and
    Bioinformatics
  • 18 September 2003

2
Challenges in Chemogenomics
  • Conceptual challenges
      • Multidisciplinary project teams
      • Data from chemistry, biology, genomics, etc.
  • Data challenges
      • Disparate data sources, formats and identifiers
      • Multiple data types: gene expression, chemical
        structures, clinical chemistry, histopathology,
        receptor binding assays, etc.
      • Incomplete and missing data
  • Analysis challenges
      • Should the data be normalized?
      • Does it make sense to cluster the data?
      • How to relate chemical structures and gene
        expression data?
      • Identify correlations between chemical properties
        and biological function

3
Conceptual challenges
[Diagram: Multidisciplinary Drug Compound Candidate
Project Team. Labels: Chemistry, Biology, Genomics,
Toxicology, Information Technology, Shared Analysis,
Affymetrix, CodeLink, CAS, ISIS, ABase]
4
Conceptual challenges
  • Different types of scientists ask different
    questions
  • Chemistry
      • Which chemical compound should I make next?
      • What structural motifs are associated with
        desired activity?
      • Are certain compound properties associated with
        drug-like behavior?
      • Are specific classes of compounds affecting our
        disease target?
      • How do structural R-groups influence activity?
  • Biology/Genomics
      • Which genes are significantly changed by drug
        treatment?
      • What pathways are involved in drug response?
      • Are we identifying agonists/antagonists of our
        target?
      • What gene changes are associated with toxicity?
      • Are genes involved in drug metabolism induced or
        repressed?
  • Information Technology
      • How can the data be stored, retrieved, merged,
        and analyzed?

5
Common goals
  • Speed the drug development process
  • Focus research efforts on only the most promising
    drug compounds
  • Increase the number of successful candidate drugs
    in the pipeline
  • Discover and create drugs to help people who are
    suffering from diseases, conditions and ailments.

6
Data challenges: multiple sources
  • There are multiple kinds of data and multiple
    data sources
  • Chemistry
      • Databases, Data Marts
      • Specialized databases for primary and secondary
        screening
      • Chemical properties databases
      • Chemical structure databases
      • Instrumentation
  • Biology/Genomics
      • Databases, Data Marts
      • Internal and external web-based search engines
      • Specialized gene expression databases
      • Gene annotation databases
      • Instrumentation
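
As a rough illustration of combining chemistry and genomics sources, the sketch below joins two small record sets on a shared compound identifier and keeps compounds that appear in only one source. The identifiers, field names and values are all hypothetical; in practice each source's own identifiers would first have to be mapped onto a common key.

```python
# Hypothetical records from two sources; all IDs, fields and values are made up.
chemistry = {
    "CMPD-001": {"mol_weight": 404.5},
    "CMPD-002": {"mol_weight": 206.3},
}
genomics = {
    "CMPD-002": {"genes_upregulated": 12},
    "CMPD-003": {"genes_upregulated": 40},
}


def outer_join(left, right):
    """Merge two dicts of records keyed by compound ID, keeping every compound.

    Each merged record carries a 'sources' field listing where it was found,
    so compounds missing from one source stay visible instead of being dropped.
    """
    merged = {}
    for compound_id in sorted(set(left) | set(right)):
        record = {}
        sources = []
        if compound_id in left:
            record.update(left[compound_id])
            sources.append("left")
        if compound_id in right:
            record.update(right[compound_id])
            sources.append("right")
        record["sources"] = sources
        merged[compound_id] = record
    return merged


merged = outer_join(chemistry, genomics)
for compound_id, record in merged.items():
    print(compound_id, record)
```

An outer join is used here on the assumption that a compound missing from one source is itself useful information for the project team.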

7
Data challenges: incomplete data
  • Data sources are not always complete
  • Some compounds have not been subjected to
    toxicity studies
  • Some compounds have not been examined in tissue
    studies
  • Compounds might have been tested in different
    animals
  • Historical information might not exist for some
    compounds
  • Our analysis must take into account that the data
    might be very sparse.
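
One simple way to see how sparse such a table is, before analyzing it, is to compute the fraction of recorded values per compound and per assay. This is a minimal sketch with made-up compound names, assay names and values; `None` stands for a study that was never run.

```python
# Hypothetical compound-by-assay table; None marks a study that was never run.
results = {
    "compound_a": {"toxicity": 0.8, "tissue": None, "binding": 0.2},
    "compound_b": {"toxicity": None, "tissue": None, "binding": 0.5},
    "compound_c": {"toxicity": 0.1, "tissue": 0.4, "binding": 0.9},
}


def coverage_by_compound(table):
    """Fraction of assays with a recorded value, per compound."""
    return {
        compound: sum(v is not None for v in assays.values()) / len(assays)
        for compound, assays in table.items()
    }


def coverage_by_assay(table):
    """Fraction of compounds with a recorded value, per assay."""
    assay_names = sorted({name for assays in table.values() for name in assays})
    return {
        name: sum(assays.get(name) is not None for assays in table.values())
        / len(table)
        for name in assay_names
    }


compound_cov = coverage_by_compound(results)
assay_cov = coverage_by_assay(results)
print(compound_cov)
print(assay_cov)
```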

8
Analysis challenges: examples
  • Gene expression data normalization
      • Should we make experiments (entire arrays)
        comparable?
      • Should we make genes (within arrays) comparable?
      • How should missing values be handled?
  • Clustering
      • Does it make sense to cluster the data?
      • Which distance and similarity metrics should be
        used?
      • How should missing values be handled?
  • Chemical Structure analysis
      • Are the structures similar enough to allow a
        meaningful R-group analysis?
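
The two normalization questions above can be made concrete with a small sketch. It z-scores a made-up expression matrix either per gene (row) or per array (column) and leaves missing values as `None`. The choice of z-scores and of skipping missing values is an assumption for illustration, not a method stated in the presentation.

```python
from statistics import mean, pstdev

# Hypothetical expression matrix: rows are genes, columns are arrays.
# None marks a missing measurement. All values are made up.
expression = [
    [1.0, 2.0, 3.0],
    [2.0, None, 6.0],
    [4.0, 4.0, 8.0],
]


def zscore(values):
    """Z-score the observed values; missing values stay None."""
    observed = [v for v in values if v is not None]
    # With fewer than two observations, or no spread, centre everything at 0.
    if len(observed) < 2 or pstdev(observed) == 0:
        return [None if v is None else 0.0 for v in values]
    mu, sigma = mean(observed), pstdev(observed)
    return [None if v is None else (v - mu) / sigma for v in values]


def normalize_genes(matrix):
    """Make genes comparable: z-score each row."""
    return [zscore(row) for row in matrix]


def normalize_arrays(matrix):
    """Make arrays comparable: z-score each column."""
    columns = [zscore(list(col)) for col in zip(*matrix)]
    return [list(row) for row in zip(*columns)]


by_gene = normalize_genes(expression)
by_array = normalize_arrays(expression)
print(by_gene)
print(by_array)
```

The two results differ, which is the point of the question: the direction of normalization decides what "high" means in later plots and clusterings.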

9
Finding relationships between data
  • Example dataset
      • Derived from the Iconix DrugMatrix database
      • Summarizes 30 million data points
      • Data gathered during a liver toxicity study in
        rats
      • Each row of data represents a single compound,
        with each column containing a different measured
        variable
      • Includes gene expression in liver, clinical
        chemistry tests, histopathology, receptor binding
        assays
      • Structure database provides chemical structures
  • Example dataset
      • Results of ADME/Tox studies from a panel of
        assays
      • Each row is a compound, each column is an assay

10
Simple view: x vs. y, labeled markers
Label and color by single word group
Non-steroidal anti-inflammatory drugs (NSAIDs)
cluster below 0.15 for albumin assay (indicative
of kidney damage typically associated with
NSAIDs)
11
Linking structures to biology
Do those statins with more upregulated genes have
common structural motifs?
12
Exploring groups/therapeutic roles
Number of entries per single word group
PPARa and Statins are in the same Therapeutic
class
13
Three simple views
  • Investigate compound groupings based on two
    spatial variables, plus labels, sizing, and
    coloring
  • Interactively link chemical structures to 2D
    visualizations
  • Examine overlap between categorical variables
    with colored histograms
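
The counts behind a colored histogram of two categorical variables amount to a cross-tabulation. The sketch below shows that underlying table, not the visualization itself. Compound names and class labels are placeholders invented for illustration; only the group names come from the slides.

```python
from collections import Counter

# Hypothetical annotations; "class_1" and "class_2" are placeholder labels.
compounds = [
    {"name": "compound_a", "group": "statin", "therapeutic_class": "class_1"},
    {"name": "compound_b", "group": "statin", "therapeutic_class": "class_1"},
    {"name": "compound_c", "group": "PPARa", "therapeutic_class": "class_1"},
    {"name": "compound_d", "group": "NSAID", "therapeutic_class": "class_2"},
]


def cross_tabulate(records, row_key, col_key):
    """Count records for each (row_key, col_key) combination."""
    counts = Counter((r[row_key], r[col_key]) for r in records)
    table = {}
    for (row, col), n in sorted(counts.items()):
        table.setdefault(row, {})[col] = n
    return table


overlap = cross_tabulate(compounds, "therapeutic_class", "group")
for therapeutic_class, groups in overlap.items():
    print(therapeutic_class, groups)
```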

14
Explore categories of activity
Determine which studies have been performed on a
given compound, and the qualitative (or binned)
result
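
A binned view like this can be sketched by mapping each numeric result to a category and giving untested compounds their own label. The bin edges, labels and the "not tested" category below are assumptions; real cut-offs depend on the assay.

```python
from bisect import bisect_right

# Hypothetical bin edges and labels; real cut-offs depend on the assay.
EDGES = [0.3, 0.7]
LABELS = ["low", "medium", "high"]


def bin_result(value):
    """Map a numeric result to a qualitative label; None means no study run."""
    if value is None:
        return "not tested"
    # bisect_right places a value equal to an edge into the higher bin.
    return LABELS[bisect_right(EDGES, value)]


# Hypothetical results for one compound across three studies.
studies = {"toxicity": 0.1, "tissue": None, "binding": 0.7}
binned = {study: bin_result(value) for study, value in studies.items()}
print(binned)
```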
15
Explore categories of activity
Link several visualizations together to relate
numerical data, qualitative data, structures, etc.
16
Hierarchical clustering on gene subset
HC used to group compounds according to gene
expression patterns
Genes involved in cholesterol synthesis
downregulated by steroid receptor (orange category)
Single word categorical column
Genes involved in cholesterol synthesis
upregulated by Statins (light blue category)
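
The grouping step can be sketched as agglomerative clustering that stops at a fixed number of clusters. The presentation does not say which distance or linkage was used, so correlation distance and average linkage are assumptions here, and the expression profiles are made up.

```python
from itertools import combinations
from math import sqrt

# Hypothetical expression profiles (one list of gene values per compound).
profiles = {
    "compound_a": [1.0, 2.0, 3.0, 4.0],
    "compound_b": [2.0, 4.1, 6.0, 8.2],
    "compound_c": [4.0, 3.0, 2.1, 1.0],
    "compound_d": [8.0, 6.2, 4.0, 2.0],
}


def correlation_distance(x, y):
    """1 - Pearson correlation: similar profile shapes give a small distance."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return 1.0 - cov / (sx * sy)


def average_linkage(data, n_clusters):
    """Repeatedly merge the two closest clusters until n_clusters remain."""
    clusters = [[name] for name in data]

    def cluster_distance(c1, c2):
        # Average of all pairwise distances between members of the clusters.
        pairs = [(a, b) for a in c1 for b in c2]
        total = sum(correlation_distance(data[a], data[b]) for a, b in pairs)
        return total / len(pairs)

    while len(clusters) > n_clusters:
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: cluster_distance(clusters[ij[0]], clusters[ij[1]]),
        )
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so index i is unaffected
    return [sorted(c) for c in clusters]


groups = average_linkage(profiles, 2)
print(groups)
```

Correlation distance groups compounds by the shape of their expression profile rather than its magnitude, which is one common choice for expression data. This sketch assumes complete profiles with non-constant values.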
17
Explore structural motifs vs. expression
What are the compounds with structural
commonalities?
Modify structure and search for other compounds
with this substructure
18
Explore structural motifs vs. expression
Are common structural motifs related to activity?
Note that each compound might have been tested at
multiple time points and/or multiple doses.
19
View structures of outliers
Select two statins that did not cluster together
by gene expression and look for structural
differences
View structures associated with compounds that
formed their own cluster in gene expression
20
Integrate pathway and GO information
21
Link clustering and PCA results
Compare results of clustering and PCA
22
Statistical/structural analysis and visualization
  • Overview biological results, categorical data,
    missing values, etc.
  • Use clustering to group compounds according to
    gene expression levels. Perform substructure
    searches to identify structure-expression
    relationships.
  • Explore the relationship between structural
    differences and gene expression within subgroups
    of similar compounds
  • Link structures, gene expression data,
    hierarchical ontologies and pathways.
  • Compare results from different clustering
    methods, PCA analyses, profile matching metrics,
    etc.

23
Summary
  • There are multiple challenges in chemogenomics
      • Conceptual challenges
      • Data challenges
      • Analysis challenges
  • Visualization is a useful and effective tool for
    understanding chemogenomic data
      • It can link multiple data types (structures, gene
        expression data, biological assay results,
        pathways, etc.).
      • It can help scientists interpret and understand
        the results of statistical analyses; it is a
        complement (not a replacement!) to rigorous
        statistical analysis.
      • It lets the human eye do something that it does
        very well: notice patterns, outliers, unexpected
        trends and relationships, and assimilate huge
        amounts of information at a glance.