Title: Pfizer Project
Slide 1: Pfizer Project
Slide 2: Outline
- The problem
- Our approach
- Examples of RadViz usage in single point optimization
- Examples of RadViz usage in multipoint optimization
- Next steps
Slide 3: The Problem
Slide 4: Our Approach
Slide 5: Intro to RadViz
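RadViz, as it is commonly described, places one anchor per dimension evenly around a circle and draws each record at the weighted average of the anchor positions, using the record's normalized values as weights. A minimal sketch under that assumption follows; the function name `radviz_project` and the min-max normalization are illustrative choices, not details taken from the slides.

```python
import math


def radviz_project(rows):
    """Map each row of numeric values to an (x, y) point inside the unit circle."""
    n_dims = len(rows[0])
    # Min-max bounds per dimension, used to scale every value into [0, 1].
    lows = [min(r[j] for r in rows) for j in range(n_dims)]
    highs = [max(r[j] for r in rows) for j in range(n_dims)]
    # One anchor per dimension, evenly spaced on the unit circle.
    anchors = [
        (math.cos(2 * math.pi * j / n_dims), math.sin(2 * math.pi * j / n_dims))
        for j in range(n_dims)
    ]
    points = []
    for r in rows:
        weights = [
            (r[j] - lows[j]) / (highs[j] - lows[j]) if highs[j] > lows[j] else 0.0
            for j in range(n_dims)
        ]
        total = sum(weights)
        if total == 0:
            # A record with all-minimum values has no pull; place it at the center.
            points.append((0.0, 0.0))
            continue
        x = sum(w * a[0] for w, a in zip(weights, anchors)) / total
        y = sum(w * a[1] for w, a in zip(weights, anchors)) / total
        points.append((x, y))
    return points


# Toy data (not from the CDK-2 set): three records dominated by one dimension
# each, plus one balanced record that lands at the center.
toy_points = radviz_project([[1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 1, 1]])
```

Because a record is pulled toward the anchors where its values are high, the order of the anchors around the circle changes the picture, which is why the layout (random versus class discrimination) matters on the later slides.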
Slide 6: CDK-2 Dataset
- Original
  - 192 dimensions
  - 17080 compounds
- Single Objective
  - 357 active compounds
  - 1458 gray compounds
  - 15265 inactive compounds
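The class counts above add up to the 17080 compounds, and the percentages quoted on the following slides follow directly from them. A small sketch of that arithmetic (the helper name `class_percentages` is illustrative):

```python
def class_percentages(counts):
    """Return each class's share of the total, in percent, rounded to one decimal."""
    total = sum(counts.values())
    return {name: round(100.0 * n / total, 1) for name, n in counts.items()}


# Counts as given on the CDK-2 Dataset slide.
cdk2_counts = {"active": 357, "gray": 1458, "inactive": 15265}
cdk2_percentages = class_percentages(cdk2_counts)
```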
Slide 7: CDK-2 Dataset Random Layout
CDK-2 dataset: 192 dimensions, 17080 compounds
357 active (2.1%), 1458 gray (8.5%), 15265 inactive (89.4%)
Random Layout
Slide 8: CDK-2 Dataset Random Layout
CDK-2 dataset: 192 dimensions, 17080 compounds
357 active (2.1%), 1458 gray (8.5%), 15265 inactive (89.4%)
Random Layout
Pearson Correlations Removed: 33 dimensions, cutoff is 0.8
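The slide does not say how the correlated dimensions were chosen for removal. One plausible reading is a greedy pass that drops a dimension when its absolute Pearson correlation with an already kept dimension reaches the cutoff. The sketch below assumes exactly that; the names `pearson` and `prune_correlated`, and the keep-first tie-breaking, are assumptions.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column is treated as uncorrelated
    return cov / math.sqrt(var_x * var_y)


def prune_correlated(columns, cutoff=0.8):
    """Keep a column only if |r| with every previously kept column is below cutoff.

    Assumption: columns are visited in insertion order and the first member of
    a correlated pair is the one that survives.
    """
    kept = []
    for name, values in columns.items():
        if all(abs(pearson(values, columns[k])) < cutoff for k in kept):
            kept.append(name)
    return kept


# Toy descriptors (hypothetical names): "b" is nearly a multiple of "a".
toy_columns = {"a": [1, 2, 3, 4], "b": [2, 4, 6, 8.1], "c": [4, 1, 3, 2]}
kept_columns = prune_correlated(toy_columns, cutoff=0.8)
```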
Slide 9
CDK-2 dataset: 192 dimensions, 17080 compounds
357 active (2.1%), 1458 gray (8.5%), 15265 inactive (89.4%)
Class Discrimination Layout Algorithm: 24 dimensions per class
Selection: 3917 compounds
172 active (4.4%), 326 gray (8.3%), 3419 inactive (87.3%)
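The slides name the Class Discrimination Layout Algorithm but do not describe it. As a rough illustration of "N dimensions per class", the sketch below ranks dimensions for each class by a Welch t-statistic (class versus all other records) and keeps the top `k`. The scoring rule, the preference for dimensions that are higher inside the class, and the function names are all assumptions, not the authors' stated method.

```python
import math


def welch_t(a, b):
    """Welch t-statistic for two samples; 0.0 when it cannot be computed."""
    if len(a) < 2 or len(b) < 2:
        return 0.0
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    var_a = sum((x - mean_a) ** 2 for x in a) / (len(a) - 1)
    var_b = sum((x - mean_b) ** 2 for x in b) / (len(b) - 1)
    denom = math.sqrt(var_a / len(a) + var_b / len(b))
    return 0.0 if denom == 0 else (mean_a - mean_b) / denom


def top_dimensions_per_class(rows, labels, k):
    """For each class, return the indices of the k dimensions with the largest t."""
    n_dims = len(rows[0])
    result = {}
    for cls in sorted(set(labels)):
        inside = [r for r, lab in zip(rows, labels) if lab == cls]
        outside = [r for r, lab in zip(rows, labels) if lab != cls]
        scores = [
            (welch_t([r[j] for r in inside], [r[j] for r in outside]), j)
            for j in range(n_dims)
        ]
        # Largest t first; ties broken by dimension index for determinism.
        scores.sort(key=lambda s: (-s[0], s[1]))
        result[cls] = [j for _, j in scores[:k]]
    return result


# Toy data: dimension 0 is high for class "a", dimension 1 for class "b".
toy_rows = [[5, 0, 1], [6, 0, 1], [0, 5, 1], [0, 6, 1]]
toy_labels = ["a", "a", "b", "b"]
selected = top_dimensions_per_class(toy_rows, toy_labels, k=1)
```

In a RadViz layout, the dimensions selected for one class would then be placed next to each other on the circle so that records of that class are pulled toward the same arc.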
Slide 10
CDK-2 dataset: 192 dimensions, 17080 compounds
357 active (2.1%), 1458 gray (8.5%), 15265 inactive (89.4%)
Class Discrimination Layout Algorithm: 24 dimensions per class
Selection: 1018 compounds
95 active (9.3%), 111 gray (10.9%), 812 inactive (79.8%)
Slide 11
CDK-2 dataset: 192 dimensions, 17080 compounds
357 active (2.1%), 1458 gray (8.5%), 15265 inactive (89.4%)
Class Discrimination Layout Algorithm: 24 dimensions per class
Selection: 162 compounds
17 active (10.5%), 18 gray (11.1%), 127 inactive (78.4%)
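One way to read the three selections is as enrichment: the share of active compounds in each selection divided by the share in the full dataset. The sketch below uses only the counts shown on the slides; the term "enrichment" and the helper name are additions for illustration. By this measure the selections are enriched in actives by roughly 2, 4.5, and 5 times.

```python
def enrichment(sel_active, sel_total, all_active, all_total):
    """Ratio of the active fraction in a selection to that in the full dataset."""
    return (sel_active / sel_total) / (all_active / all_total)


# (active, total) for the selections of 3917, 1018, and 162 compounds.
selections = {
    "3917 compounds": (172, 3917),
    "1018 compounds": (95, 1018),
    "162 compounds": (17, 162),
}
enrichments = {
    name: enrichment(active, total, all_active=357, all_total=17080)
    for name, (active, total) in selections.items()
}
```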
Slide 13: K-Means 5 and Scaffold Entire Dataset
Slide 14: K-Means 10 and Scaffold Entire Dataset
Slide 15: K-Means 14 and Scaffold Entire Dataset
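The slides use K-Means with 5, 10, and 14 clusters but do not say which implementation or which descriptors were used. For reference, a minimal sketch of the standard Lloyd iteration (assign each point to its nearest centroid, then recompute centroids) is given below; the seeding strategy and iteration count are assumptions.

```python
import random


def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, iterations=20, seed=0):
    """Plain Lloyd's algorithm; returns (cluster index per point, centroids)."""
    rng = random.Random(seed)
    # Assumption: initial centroids are k distinct points drawn at random.
    centroids = [list(p) for p in rng.sample(points, k)]
    assignments = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: nearest centroid by squared Euclidean distance.
        assignments = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignments, centroids


# Toy data with two well-separated groups (not compound descriptors).
toy_points = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
toy_labels, toy_centroids = kmeans(toy_points, k=2)
```

The resulting cluster index per compound is what a RadViz display would use for coloring, as in the "Colored by K-Means 10 Clusters" slides.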
Slide 16: CDK-2 Medium Size Selection Subset
CDK-2 Selection: 1018 compounds
95 active (9.3%), 111 gray (10.9%), 812 inactive (79.8%)
Class Discrimination Layout Algorithm: 12 dimensions per class
Slide 17: CDK-2 Medium Size Selection Subset
CDK-2 Selection: 1018 compounds
95 active (9.3%), 111 gray (10.9%), 812 inactive (79.8%)
Class Discrimination Layout Algorithm: 12 dimensions per class
Multi-Column Flattening
Slide 18: CDK-2 Medium Size Selection Colored by K-Means 10 Clusters
CDK-2 Selection: 1018 compounds
95 active (9.3%), 111 gray (10.9%), 812 inactive (79.8%)
Class Discrimination Layout Algorithm: 12 dimensions per class
All Records Shown
Slide 19: CDK-2 Medium Size Selection Colored by K-Means 10 Clusters
CDK-2 Selection: 1018 compounds
95 active (9.3%), 111 gray (10.9%), 812 inactive (79.8%)
Class Discrimination Layout Algorithm: 12 dimensions per class
Active Records Shown
Slide 20: CDK-2 Medium Size Selection Colored by K-Means 10 Clusters
CDK-2 Selection: 1018 compounds
95 active (9.3%), 111 gray (10.9%), 812 inactive (79.8%)
Class Discrimination Layout Algorithm: 12 dimensions per class
Inactive Records Shown
Slide 21: K-Means 10 Cluster Distribution Entire Dataset
Slide 22: K-Means 10 and Scaffold CDK-2 Medium Size Selection Subset
Slide 23: CDK-2 Medium Size Selection Colored by Scaffold
CDK-2 Selection: 1018 compounds
95 active (9.3%), 111 gray (10.9%), 812 inactive (79.8%)
Class Discrimination Layout Algorithm: 12 dimensions per class
All Records Shown
Slide 24: CDK-2 Medium Size Selection Colored by Scaffold
CDK-2 Selection: 1018 compounds
95 active (9.3%), 111 gray (10.9%), 812 inactive (79.8%)
Class Discrimination Layout Algorithm: 12 dimensions per class
Active Records Shown
Slide 25: CDK-2 Medium Size Selection Colored by Scaffold
CDK-2 Selection: 1018 compounds
95 active (9.3%), 111 gray (10.9%), 812 inactive (79.8%)
Class Discrimination Layout Algorithm: 12 dimensions per class
Inactive Records Shown
Slide 26: Scaffold Distribution (0-14, 16-21) Entire Dataset
Scaffold 00: 12716 compounds
74 active, 1111 gray, 11531 inactive
Slide 27: Scaffold Distribution (1-14, 16-21) Entire Dataset
Slide 28: Multi-objective Dataset A
- Original
  - 190 dimensions
  - 2824 total compounds
- Preprocessed
  - Filter molecular weight < 500
  - Filter CLOGP < 5
  - 190 dimensions
  - 2505 total compounds
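The preprocessing step keeps only compounds with molecular weight below 500 and CLOGP below 5, which reduces the set from 2824 to 2505 compounds. A minimal sketch of such a filter follows; the record layout and the field names `mol_weight` and `clogp` are hypothetical, and the sample records are made up for illustration.

```python
def passes_filters(compound, max_mol_weight=500.0, max_clogp=5.0):
    """True when the compound is strictly below both thresholds from the slide."""
    return compound["mol_weight"] < max_mol_weight and compound["clogp"] < max_clogp


# Hypothetical records; the real descriptors are not given in the slides.
sample_compounds = [
    {"id": "c1", "mol_weight": 320.4, "clogp": 2.1},
    {"id": "c2", "mol_weight": 512.7, "clogp": 3.0},  # too heavy
    {"id": "c3", "mol_weight": 410.0, "clogp": 5.6},  # too lipophilic
    {"id": "c4", "mol_weight": 499.9, "clogp": 4.9},
]
kept_ids = [c["id"] for c in sample_compounds if passes_filters(c)]
```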
Selectivity: 141 highly selective (5.6%), 183 selective (7.3%), 212 moderate (8.5%), 1190 weak (47.5%), 779 very weak (31.1%)
Activity: 145 highly active (5.8%), 178 active (7.1%), 187 moderate (7.5%), 998 weak (39.8%), 997 very weak (39.8%)
Slide 29: Multiobjective Data A Random Layout Colored by Activity
Multiobjective Data A: 190 dimensions, 2505 compounds
145 highly active (5.8%), 178 active (7.1%), 187 moderate (7.5%), 998 weak (39.8%), 997 very weak (39.8%)
Random Layout
Pearson Correlations Removed: 52 dimensions, cutoff is 0.8
Slide 30: Multiobjective Data A Colored by Activity
Multiobjective Data A: 190 dimensions, 2505 compounds
145 highly active (5.8%), 178 active (7.1%), 187 moderate (7.5%), 998 weak (39.8%), 997 very weak (39.8%)
Class Discrimination Layout Algorithm (by Activity): 10 dimensions per class
Selection: 411 compounds
57 highly active (13.9%), 47 active (11.4%), 46 moderate (11.2%), 151 weak (36.7%), 110 very weak (26.8%)
Slide 31: Multiobjective Data A Selection Subset Colored by Selectivity
Data A Selection Subset: 411 compounds
50 highly selective (12.2%), 58 selective (14.1%), 60 moderate (14.6%), 195 weak (47.4%), 48 very weak (11.7%)
Class Discrimination Layout Algorithm (by Selectivity): 5 dimensions per class
Slide 32: Multiobjective Data A Selection Subset Colored by Activity
Data A Selection Subset: 411 compounds
57 highly active (13.9%), 47 active (11.4%), 46 moderate (11.2%), 151 weak (36.7%), 110 very weak (26.8%)
Class Discrimination Layout Algorithm (by Selectivity): 5 dimensions per class
Slide 33: Multiobjective Data A Selection Subset Colored by Selectivity
Data A Selection Subset: 411 compounds
50 highly selective (12.2%), 58 selective (14.1%), 60 moderate (14.6%), 195 weak (47.4%), 48 very weak (11.7%)
Class Discrimination Layout Algorithm (by Selectivity): 10 dimensions per class
Multi-Column Flattening
Slide 34: Multiobjective Data A Selection Subset Colored by Activity
Data A Selection Subset: 411 compounds
57 highly active (13.9%), 47 active (11.4%), 46 moderate (11.2%), 151 weak (36.7%), 110 very weak (26.8%)
Class Discrimination Layout Algorithm (by Selectivity): 10 dimensions per class
Multi-Column Flattening
Slide 35: Next Steps