PowerPoint Template

About This Presentation
Title:

PowerPoint Template

Description:

An Inter-Corporate Collaboration on Computer Curation of Intellectual Property & the Scientific Literature

Slides: 60
Provided by: cactusNc


Transcript and Presenter's Notes

Title: PowerPoint Template


1
IBM Research: An Inter-Corporate Collaboration
on Computer Curation of Intellectual Property &
the Scientific Literature
2
(No Transcript)
3
What we are trying to accomplish
the challenges of today's researchers
Applying text & image analysis technology
to better understand IP (patents) and the
scientific literature: computer curation of
the literature
Stephen Boyer, Ph.D., Sboyer_at_us.ibm.com, 408-858-5544
4
The Problem
All content and no discovery ?
5
What we are trying to accomplish
the challenges of today's researchers
The problem: Gain a better understanding of IP (patents) and the scientific literature.
The question: Can we use computers to read documents, identify critical entities, and perform meaningful associations that can help us with our work?
What we did:
1) Apply text analytics technology to analyze patents & the scientific literature (>30 M IP documents & Medline abstracts)
2) Apply image analytics to IP documents
3) Explore how these technologies can be applied to foreign documents (for example, Chinese & Japanese patents)
The value: Provide new insights into chemical & biomedical information (still a work in progress).
6
Collaborators
A collaborative work in progress
Corporate Sponsors
Other informal Collaborators partners
  • IBM Research
  • Novartis
  • Pfizer
  • Dupont
  • Lilly
  • Boehringer-Ingelheim
  • Roche / Genentech
  • AstraZeneca (AZ)
  • Bristol-Myers Squibb (BMS)
  • NIH
  • University of Texas
  • EMBL - EBI
  • University of Dundee
  • UC Davis
  • ChemAxon
  • CambridgeSoft
  • Dalhousie
  • Univ of New Mexico

7
Why this is important !
What are the differences between these two
molecules?
Chemistry: 1 carbon, 1 nitrogen, 1 double bond,
1 hydrogen
Business: 1.7B in revenue; an opportunity loss of
320M; a revenue gain of 320M
  • Bayer patented molecule
  • Annual sales of 320 million
  • Vardenafil (Levitra)
  • Late to market, found similar molecule and gained share
  • Pfizer patented molecule
  • Annual sales of >1.7 billion
  • Sildenafil (Viagra)
  • 1st to market, but didn't patent (cover) full
    chemical space

8
Example IP Challenge
the challenges of today's researchers
Additional Properties
Relationships
How do I find entities from the docs?
How do I find entity relationships?
New IP
Web, Scientific News
Worldwide Patents
Medline
How do I exploit other Information sources?
New Insights
9
Can you find the key molecules in an
unstructured text, for example a scientific
journal or patent?
Chemical nomenclature can be daunting
 a) (2P/4S)-4-4-Amino-5-(4-benzyloxy-phenyl)pyrro
lo2,3-dpyrimidin-7-yl-2-hydroxymethyl-pyrrolidi
ne-1-carboxylic acid tert-butyl ester prepared
analogously to Example 18 starting from
(2R/4S)-4-4-amino-5-(4-benzyloxy-phenyl)-pyrrolo
2,3-dpyrimidin-7-yl-pyrrolidine-1,2-dicarboxylic
acid 1-tert-butyl ester 2-ethyl ester (Example
20a). 1 H-NMR (CDCl3, ppm) 8.52 (s, 1H),
7.52-7.32 (m, 7H), 7.1 (d, 2H), 6.95 (d,1 H),
5.50 (m, 1H), 5.13 (s, 2H), 4.62-4.42 (m, 2H),
4.28 (m, 2H), 4.10 (m, 1H), 3.95-3.70 (m, 1H),
2.75 (m, 1H), 2.50 (m, 1H),1.49 (s, 9H).     b)
(2R/4S)-4-4-Amino-5-(4-benzyloxy-phenyl)-pyrrolo
2,3-dpyrimidin-7-yl-pyrrolidin-2-yl-methanol
0.100 g of (2R/4S)4-4-amino-5-(4-benzyloxy-phenyl
)-pyrrolo2,3-dpyrimidin-7-yl-pyrrolidine-1,2-di
carboxylic acid 1-tert-butyl ester is dissolved
in 4 ml of tetrahydrofuran 10 ml of 4M hydrogen
chloride in diethyl ether are added, and stirring
is carried out for 1 hour at room temperature.
The product is filtered off and dried under a
high vacuum. The dihydrochloride of the title
compound is obtained. 1 H-NMR (CD3 OD, ppm) 8.4
(s, 1H) 7.60 (s, 1H), 7.5-7.10 (m, 9H), 5.65 (m,
1H), 5.18 (s, 2H), 4.32 (m, 1H), 4.00-3.65 (m,
4H), 2.60 (m, 2H). EXAMPLE 24 (2R/4S)-4-(4-Amino-
5-phenyl-pyrrolo2,3-dpyrimidin-7-yl)-1-(2,2-dime
thyl-propionyl)-pyrrolidine-2-carboxylic acid
ethyl ester 0.130 g of (2R/4S)-4-(4-benzyloxycarbo
nylamino-5-phenyl-pyrrolo2,3-dpyrimidin-7-yl)-1-
(2,2-dimethyl-propionyl)-pyrrolidine-2-carboxylic
acid ethyl ester is dissolved in 8 ml of
methanol, and the solution is hydrogenated over
0.030 g of palladium-on-carbon (10) for 1 hour
at normal pressure. The catalyst is removed by
filtration, the filtrate is concentrated by
10
Identify the chemical names, then convert them
to structures: chemical names -> structures!
entity identification
 a) (2P/4S)-4-4-Amino-5-(4-benzyloxy-phenyl)pyrro
lo2,3-dpyrimidin-7-yl-2-hydroxymethyl-pyrrolidi
ne-1-carboxylic acid tert-butyl ester prepared
analogously to Example 18 starting from
(2R/4S)-4-4-amino-5-(4-benzyloxy-phenyl)-pyrrolo
2,3-dpyrimidin-7-yl-pyrrolidine-1,2-dicarboxylic
acid 1-tert-butyl ester 2-ethyl ester (Example
20a). 1 H-NMR (CDCl3, ppm) 8.52 (s, 1H),
7.52-7.32 (m, 7H), 7.1 (d, 2H), 6.95 (d,1 H),
5.50 (m, 1H), 5.13 (s, 2H), 4.62-4.42 (m, 2H),
4.28 (m, 2H), 4.10 (m, 1H), 3.95-3.70 (m, 1H),
2.75 (m, 1H), 2.50 (m, 1H),1.49 (s, 9H).     b)
(2R/4S)-4-4-Amino-5-(4-benzyloxy-phenyl)-pyrrolo
2,3-dpyrimidin-7-yl-pyrrolidin-2-yl-methanol
0.100 g of (2R/4S)4-4-amino-5-(4-benzyloxy-phenyl
)-pyrrolo2,3-dpyrimidin-7-yl-pyrrolidine-1,2-di
carboxylic acid 1-tert-butyl ester is dissolved
in 4 ml of tetrahydrofuran 10 ml of 4M hydrogen
chloride in diethyl ether are added, and stirring
is carried out for 1 hour at room temperature.
The product is filtered off and dried under a
high vacuum. The dihydrochloride of the title
compound is obtained. 1 H-NMR (CD3 OD, ppm) 8.4
(s, 1H) 7.60 (s, 1H), 7.5-7.10 (m, 9H), 5.65 (m,
1H), 5.18 (s, 2H), 4.32 (m, 1H), 4.00-3.65 (m,
4H), 2.60 (m, 2H). EXAMPLE 24 (2R/4S)-4-(4-Amino-
5-phenyl-pyrrolo2,3-dpyrimidin-7-yl)-1-(2,2-dime
thyl-propionyl)-pyrrolidine-2-carboxylic acid
ethyl ester 0.130 g of (2R/4S)-4-(4-benzyloxycarbo
nylamino-5-phenyl-pyrrolo2,3-dpyrimidin-7-yl)-1-
(2,2-dimethyl-propionyl)-pyrrolidine-2-carboxylic
acid ethyl ester is dissolved in 8 ml of
methanol, and the solution is hydrogenated over
0.030 g of palladium-on-carbon (10) for 1 hour
at normal pressure. The catalyst is removed by
filtration, the filtrate is concentrated by
What is this compound ??
11
Problem: I need to find information about Valium
Nomenclature issues
Valium (Trade Name)
CAS 439-14-5 (Chemical ID)

Diazepam (Generic Name)


Valium has > 149 names
ALBORAL, ALISEUM, ALUPRAM, AMIPROL, ANSIOLIN, ANSIOLISINA, APAURIN, APOZEPAM,
ASSIVAL, ATENSINE, ATILEN, BIALZEPAM, CALMOCITENE, CALMPOSE, CERCINE, CEREGULART,
CONDITION, DAP, DIACEPAN, DIAPAM, DIAZEMULS, DIAZEPAN, DIAZETARD, DIENPAX, DIPAM,
DIPEZONA, DOMALIUM, DUKSEN, DUXEN, E-PAM, ERIDAN, EVACALM, FAUSTAN, FREUDAL,
FRUSTAN, GIHITAN, HORIZON, KIATRIUM, LA-III, LEMBROL, LEVIUM, LIBERETAS,
METHYL DIAZEPINONE, MOROSAN, NEUROLYTRIL, NOAN, NSC-77518, PACITRAN, PARANTEN,
PAXATE, PAXEL, PLIDAN, QUETINIL, QUIATRIL, QUIEVITA, RELAMINAL, RELANIUM, RELAX,
RENBORIN, RO 5-2807, S.A. R.L., SAROMET, SEDAPAM, SEDIPAM, SEDUKSEN, SEDUXEN,
SERENACK, SERENAMIN, SERENZIN, SETONIL, SIBAZON, SONACON, STESOLID, STESOLIN,
TENSOPAM, TRANIMUL, TRANQDYN, TRANQUASE, TRANQUIRIT, TRANQUO-TABLINEN, UMBRIUM,
UNISEDIL, USEMPAX AP, VALEO, VALITRAN, VALRELEASE, VATRAN, VELIUM, VIVAL, VIVOL,
WY-3467
12
There are many different chemical names for Valium
entity identification

Valium
CAS 439-14-5
Diazepam


13
Problems of taxonomy name normalization
The scientist simply wants information about
valium
Choose keywords
Medline
In-house database
Chem. Abstracts
Patent database
DIAPAM
439-14-5 (Chemical ID)
Pereira notebook 23a
7-CHLORO-1-METHYL-5-PHENYL-2H-1,4-BENZODIAZEPIN-2-ONE
Sedapam
Multiple documents contain information about
Valium
7-CHLORO-1,3-DIHYDRO-1-METHYL-5-PHENYL-2H-1,4-BENZODIAZEPIN-2-ONE
Diazepam
14
Considerations for searching documents (or web
pages) for chemical substances
Name normalization is important
  • Chemicals have a wide variety of trivial and
    official names.
  • A text search for one name will miss chemicals that are named
    using one of the alternative names.
  • Synonym expansion is insufficient.
  • Searching by structure will find all such cases.

Source J Cooper / IBM
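As a rough illustration of name normalization, the sketch below maps a few of the Valium names shown on the earlier slides to one canonical identifier (the CAS number 439-14-5). The table, the function names, and the choice of the CAS number as the key are assumptions made for this example. A lookup table like this only covers names it already knows, which is the limitation the slide points out when it says synonym expansion is insufficient.

```python
# Hypothetical synonym table; the names come from the Valium slides,
# the structure of the table is an assumption for illustration only.
SYNONYMS = {
    "439-14-5": [
        "Valium",
        "Diazepam",
        "DIAPAM",
        "Sedapam",
        "7-CHLORO-1,3-DIHYDRO-1-METHYL-5-PHENYL-2H-1,4-BENZODIAZEPIN-2-ONE",
    ],
}


def build_index(synonyms):
    """Invert {canonical_id: [names]} into {lower-cased name: canonical_id}."""
    index = {}
    for canonical_id, names in synonyms.items():
        index[canonical_id.casefold()] = canonical_id
        for name in names:
            index[name.casefold()] = canonical_id
    return index


INDEX = build_index(SYNONYMS)


def normalize(name):
    """Return the canonical identifier for a known name, or None if unknown."""
    return INDEX.get(name.strip().casefold())


if __name__ == "__main__":
    for query in ["valium", "  SEDAPAM ", "439-14-5", "aspirin"]:
        print(repr(query), "->", normalize(query))
```

Any name missing from the table returns None, so a search built on this alone would still miss documents that use an unlisted name.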
15
Finding similar structures, not just text!
Find documents with similar structures
  • Further, we would like to find compounds which
    are supersets of the given structure.
  • For example toluene and methylnaphthalene

Text searches won't find documents with similar
structures
Source J Cooper / IBM
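The superset idea can be sketched as a subgraph search: a query structure matches a target if its atoms and bonds can be mapped into the target. The example below is a minimal backtracking matcher over hand-written heavy-atom graphs for toluene and 1-methylnaphthalene. It is not the method used in the presentation. The atom numbering is an assumption, and bond orders, aromaticity, and hydrogens are ignored to keep it short.

```python
def adjacency(n_atoms, bonds):
    """Build a neighbour set for every atom index."""
    adj = [set() for _ in range(n_atoms)]
    for a, b in bonds:
        adj[a].add(b)
        adj[b].add(a)
    return adj


def is_substructure(query, target):
    """True if the query graph can be embedded in the target graph.

    Each molecule is (atom_symbols, bonds). Only element symbols and
    connectivity are compared; bond orders are ignored (simplification).
    """
    q_atoms, q_bonds = query
    t_atoms, t_bonds = target
    q_adj = adjacency(len(q_atoms), q_bonds)
    t_adj = adjacency(len(t_atoms), t_bonds)
    mapping = {}  # query atom index -> target atom index

    def extend(q):
        if q == len(q_atoms):
            return True
        for t in range(len(t_atoms)):
            if t in mapping.values() or t_atoms[t] != q_atoms[q]:
                continue
            # Every already-mapped neighbour of q must be bonded to t.
            if all(mapping[n] in t_adj[t] for n in q_adj[q] if n in mapping):
                mapping[q] = t
                if extend(q + 1):
                    return True
                del mapping[q]
        return False

    return extend(0)


RING = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0)]
BENZENE = (["C"] * 6, RING)
TOLUENE = (["C"] * 7, RING + [(6, 0)])
# Second ring fused at atoms 4 and 5, methyl (atom 10) on atom 0.
METHYLNAPHTHALENE = (
    ["C"] * 11,
    RING + [(4, 6), (6, 7), (7, 8), (8, 9), (9, 5), (10, 0)],
)

if __name__ == "__main__":
    print(is_substructure(TOLUENE, METHYLNAPHTHALENE))
    print(is_substructure(METHYLNAPHTHALENE, TOLUENE))
```

A production system would use a cheminformatics toolkit with proper aromaticity handling; the point here is only that the match is made on the graph, which a text search cannot do.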
16
The Solution
The proposed solution
Applying text and image analytics to better
understand IP (patents) & the scientific
literature: computer curation of the
literature
17
Patents contain molecular data in multiple forms
  • Text, image, and manually created chemical complex
    work units (CWUs)

And as (Manually Created) Chemical Complex Work
Units (CWUs)
18
Text Analytics
Let's start with text analysis
The computer reads documents and attempts to
determine domain-specific entities, for
example chemical names, gene names, disease
names, etc.
19
Step 1: Identify the chemical entities
Step 2: Extract chemical names and load into
tables
Entity extraction
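Steps 1 and 2 can be pictured with a small dictionary-based tagger that scans text for known chemical names and emits table rows. This is a stand-in only: the deck later mentions HMM, CRF, and CFG methods, which this sketch does not implement. The name list, the row layout `(doc_id, start, end, text)`, and the function names are assumptions for illustration.

```python
import re


def build_pattern(names):
    """Compile one case-insensitive pattern for all names.

    Longer names come first so they win when two names start at the
    same position in the text.
    """
    ordered = sorted(names, key=len, reverse=True)
    return re.compile("|".join(re.escape(n) for n in ordered), re.IGNORECASE)


def extract_entities(doc_id, text, pattern):
    """Return rows of (doc_id, start offset, end offset, matched text)."""
    return [(doc_id, m.start(), m.end(), m.group(0)) for m in pattern.finditer(text)]


if __name__ == "__main__":
    # Hypothetical dictionary of desired entities.
    names = ["tetrahydrofuran", "hydrogen chloride", "diethyl ether", "methanol"]
    text = (
        "The ester is dissolved in 4 ml of tetrahydrofuran; 10 ml of 4M "
        "hydrogen chloride in diethyl ether are added."
    )
    for row in extract_entities("doc1", text, build_pattern(names)):
        print(row)
```

The rows could then be loaded into a database table keyed by document and offset.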
20
Step 3: Convert words to structures
Convert the chemicals into machine-readable
formats!
7-CHLORO-1-METHYL-5-PHENYL-2H-1,4-BENZODIAZEPIN-2-ONE
SMILES string: c1ccccc1
InChI: InChI=1/C6H6/c1-2-4-6-5-3-1/h1-6H
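To show why these formats count as machine readable, the sketch below splits an InChI string into its layers and counts atoms in the formula layer. It uses the benzene identifier from this slide, written in full as `InChI=1/C6H6/c1-2-4-6-5-3-1/h1-6H`. The function names are hypothetical. The parser is a simplification: it assumes each layer prefix letter occurs once and that the formula has no charges or multiple components.

```python
import re


def split_inchi(inchi):
    """Split an InChI string into version, formula, and prefixed layers."""
    prefix, _, body = inchi.partition("=")
    if prefix != "InChI" or not body:
        raise ValueError("not an InChI string")
    version, formula, *layers = body.split("/")
    return {
        "version": version,
        "formula": formula,
        # Each remaining layer starts with a one-letter prefix, e.g. 'c' or 'h'.
        "layers": {layer[0]: layer[1:] for layer in layers if layer},
    }


def formula_counts(formula):
    """Count atoms in a simple molecular formula such as 'C6H6'."""
    counts = {}
    for symbol, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[symbol] = counts.get(symbol, 0) + (int(number) if number else 1)
    return counts


if __name__ == "__main__":
    parsed = split_inchi("InChI=1/C6H6/c1-2-4-6-5-3-1/h1-6H")
    print(parsed)
    print(formula_counts(parsed["formula"]))
```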
21
Step 4: Automate the process
Scale up & automate the process
HealthCare Life Science Data warehouse
IBM Servers
Any text
Web Pages
Medline

Patents
Valium
Benzene
  • 11 Million patent documents
  • 18 Million Medline abstracts
  • 100 Million chemical structures
  • >12 Million unique

22
Summary of overall text analysis operations for
chemistry (HMM, CRF, CFG)
Overall process flow for text analysis
2D Structure
toluene
SMILES String
CC1=CC=CC=C1
methyl benzene
Dictionary of the English Language minus
the Dictionary of Desired Entities
  • Options to compute 300 properties per molecule

Blue Gene enabled
23
Summary of overall text analysis operations for
chemistry
Overall process flow for text analysis
SMILES String
2D Structure
toluene
CC1=CC=CC=C1
methyl benzene
Dictionary of the English Language minus
the Dictionary of Desired Entities
  • Options to compute 300 properties per molecule

Blue Gene enabled
24
Why use Blue Gene?
  • Find and compute the 3D structure of every
    molecule on every page of every patent (and
    Medline abs.)
  • Identify every protein (from our dictionary of
    >350K proteins) on every page of every patent
    (and Medline abs.)
  • Identify every disease (from our list of 14,500)
    on every page of every patent and map it to
    Medline MeSH codes
  • Identify the occurrence of every biomarker (from
    our dictionary of 485 biomarkers) on every page
    of every patent
  • ...your request goes here!
  • Equivalent to 240K simultaneous Google searches

Compute properties, find relationships,
Data warehouse
25
Examples
Chemicals derived from text analytics
26
Examples
Chemicals derived from text analytics
27
Examples of structures created via automated
chemical annotation
Chemicals derived from text analytics
28
Leading Causes of Annotator Problems
Typical problems encountered when dealing with
OCR text
  • Improper spacing within the chemical name
  • 2-_(Bicyclo_2.2._1_hept-5-en-2-ylamino)_-5-_2-_
    (4-chloro-3-methylphenoxy)_ethyl-l,_3-_thiazol-4_
    (5H)-one
  • Run-on lists
  • indane, 1,2,_3,4- tetrahydroquinoline,
    3,_4-dihydro-2H-1,_4-benzoxazine,
    1,5-naphthyridine, 1, 8- naphthyridine
  • Numbering of compounds
  • Comparative Example 3, 2-bromo-4- (1, 3-dioxo-1,
    3-dihydro-2H-isoindol-2-yl) butanoic acid
    4-(1,3-dioxo-1,3-dihydro-2H-isoindol-2-yl)
    butanoic acid
  • Formatting issues
  • 2-2-(bicyclo 2.2. 1 hept-5-en-2-ylamino)
    -4-oxo-4, 5-dihydro-1, 3-thiazol-5-yl -N-<BR>
    <BR> <BR> <BR> <BR> <BR> <BR> <BR>
    (4-metlioxyphenyl)-N-methylacetamide
  • Missing or incorrect parentheses
  • 5-(2-anilinoethyl)-2-(2-cyclohex-1-en-1-ylethyl)a
    mino-1,3-thiazol-4(5H)-one

using WO/2005/075471 as an example
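One of the problems listed above, improper spacing inside a chemical name, can be partly repaired with a couple of regular expressions. The sketch below is a minimal example with a hypothetical function name. It closes up spaces between numeric locants and around hyphens. The assumption is that the input is already known to be a single chemical name, since the same rules would wrongly merge an ordinary numbered list such as "compounds 1, 2".

```python
import re


def tighten_locants(name):
    """Remove stray spaces between locants and around hyphens in a name."""
    # "1, 8" -> "1,8" (only when a digit sits on both sides of the comma)
    name = re.sub(r"(?<=\d),\s+(?=\d)", ",", name)
    # "8- naphthyridine" -> "8-naphthyridine", "4- (1" -> "4-(1"
    name = re.sub(r"\s*-\s*", "-", name)
    return name


if __name__ == "__main__":
    for raw in [
        "1, 8- naphthyridine",
        "2-bromo-4- (1, 3-dioxo-1, 3-dihydro-2H-isoindol-2-yl) butanoic acid",
    ]:
        print(tighten_locants(raw))
```

Both inputs are taken from the examples on this slide. Character-level OCR errors such as "l" for "1" are out of scope for this cleanup.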
29
OCR Errors Compound Names
Searching full-text patents (WO, EP, US, FR, GB,
DE, JP) for the term Simvastatin yields 9030
patents (3666 INPADOC families).
30
OCR Errors Chemical Names
If you think that was bad... look at the IUPAC
names
WO2007096753 6(R)-2-(8'(S)-2",2"-dimethylbutyryloxy-2'(S),6'(R)-dimethyl- l',2',6',7,'8',8a'(R)-hexahydronapthyl-l'(S))-ethyl-4(R)-hydroxy -3,4-5,6-tetrahydro- 2H-pyran-2-one
WO2005095374 6(R)-2-8(5)-(2,2-dimethyl.butyyloxy)-2 (S), 6 (R)-dimethyl-1, 2, 6, 7, 8, 8a(R)-hexahydro-l (S)-napthylelhyl/-4(R)-hydroxy-3, 4, 5, 6-tetrahydro-2H-pyran-2 one
WO2005095374 6(R)-2-8(S)-(2, 2-dimethylbulyryloxy)-2 (S), 6 (R)-dimethyl-1, 2, 6, 7, 8, 8a(R)-hexabydro-l (S)-napthylethyl/-4(R)-hydroxy-3, 4, 5, 6-tetrahydro-2H-pyran-2 one
WO2003018570 6(R)-2-8(S)-(2,2 10 dimethylbutylyloxy)-2(S),6(R)-dimethyl-1,2, 6,7,8,8a(R) hexahydronaphthyl-l(S)ethyl-4(R)-hydroxy-3,4,5,6 tetra hydro-2H-pyrane-2-one
WO2003048149 6(R)-2-8(S)-(2,2- dimethylbutylyloxy)-2(S),6(R)-dimethyl-1,2,6,7,8,8a(R)- hexahydronaphthyl-l(S)ethyl-4(R)-hydroxy-3,4,5,6 20 tetrahydro-2H-pyrane-2-on
WO2003018570 6(R)-2-8(S)-(2,2-dimethylbutylyloxy)-2(S),6(R)-dimeth yl-1,2,6,7,8,8a(R)-hexahydronaphthyl-l(S) ethyl-hydrox y-3,4,5,6-tetrahydro-2H-pyrane-2-one
WO2005095374 6(R)-2-8(S)-(2,2-dimethylbutyrylaxy)-2 (S),6 (R)-dimethyAl, 2, 6, 7, 8, 8a(R)-hexahydro-l (S)-napthylJethyl)-4(R)-hydroxy-3, 4, 5, 6-tetrahydro-2H-pyran-2 one
WO2006072963 6(R)-28(S)-(2,2dimethylbutyryloxy)2(5),6(R).. dimethyI..</p><p>1,2,6,7,8,8a(R)-hexahydro-1 (S)-naphthylJethy1J-4(R)hydroxy3,4,5, 6 tetrahydro-2H-pyran-2-one
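A common way to tolerate OCR variants like those above is approximate string matching. The sketch below computes the Levenshtein edit distance and keeps tokens within a small distance of a search term. The threshold of 2 and the function names are assumptions, and this is not presented as the approach used in the project. The sample tokens are fragments taken from the names listed on this slide.

```python
def edit_distance(a, b):
    """Levenshtein distance: minimum insertions, deletions, substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


def fuzzy_hits(term, tokens, max_dist=2):
    """Return tokens whose distance to the term is at most max_dist."""
    term = term.lower()
    return [t for t in tokens if edit_distance(term, t.lower()) <= max_dist]


if __name__ == "__main__":
    tokens = [
        "dimethylbulyryloxy",
        "dimethylbutylyloxy",
        "dimethylbutyrylaxy",
        "hexahydro",
    ]
    print(fuzzy_hits("dimethylbutyryloxy", tokens))
```

Full IUPAC names with broken spacing would need tokenization and cleanup first, and a distance threshold that scales with name length.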
31
Transposed Characters
Some errors cannot have originated from a faulty
OCR process. Accidentally transposed characters
are another source of variation.
  • ehtyl: 1565 patents
  • mehtyl: 840 patents
  • compuond: 231 patents
  • relaese: 44 patents
  • formual: 1689 patents
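Transposition errors of this kind are easy to enumerate, which suggests one way to widen a search: generate every adjacent-character swap of a term and query for those as well. The sketch below is a minimal version of that idea. The function name is hypothetical, and the slides do not say whether the project used this technique.

```python
def transposition_variants(word):
    """Return all strings obtained by swapping two adjacent, different characters."""
    variants = set()
    for i in range(len(word) - 1):
        if word[i] != word[i + 1]:
            variants.add(word[:i] + word[i + 1] + word[i] + word[i + 2:])
    return variants


if __name__ == "__main__":
    for term in ["ethyl", "methyl", "compound", "release", "formula"]:
        print(term, sorted(transposition_variants(term)))
```

Each misspelling counted on this slide appears among the variants of its correctly spelled term.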
32
Chemical Name Annotation of US patents backfile
(1976-2005) & US patent applications (2002-2005)
Rule 112 Analysis
Preliminary results as of June 20, 2006:
  • 65,645,252 molecules identified (total)
  • 3,623,248 unique molecules
  • 1,830,575 molecules passing the Lipinski rules
  • 363,993 documents with possible 112 violations
  • 17,122 pre-grants from 2005 w/ possible 112 violations
All identified molecules were successfully
converted to SMILES strings
33
Analysis Results
Post-processing with Pipeline Pilot
Molecules: TOTAL 65,645,252; UNIQUE 3,623,248; DRUG¹ 1,830,575
¹ Passing Lipinski's Rule of 5: http://en.wikipedia.org/wiki/Lipinski's_Rule_of_Five
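The "DRUG" count is a filter on computed properties. The sketch below shows the usual statement of Lipinski's Rule of Five as a check on four precomputed values. Allowing at most one violation is the commonly quoted form and is an assumption here, since the slides do not say which variant the Pipeline Pilot protocol applied. The class, the function names, and the sample property values are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Properties:
    """Precomputed molecular properties (values supplied by the caller)."""
    mol_weight: float   # daltons
    logp: float         # octanol-water partition coefficient
    h_donors: int       # hydrogen-bond donors
    h_acceptors: int    # hydrogen-bond acceptors


def lipinski_violations(p):
    """Count how many of the four Rule-of-Five limits are exceeded."""
    return sum([
        p.mol_weight > 500,
        p.logp > 5,
        p.h_donors > 5,
        p.h_acceptors > 10,
    ])


def passes_rule_of_five(p, max_violations=1):
    """True if the molecule has at most max_violations violations."""
    return lipinski_violations(p) <= max_violations


if __name__ == "__main__":
    small = Properties(mol_weight=300.0, logp=2.5, h_donors=1, h_acceptors=4)
    large = Properties(mol_weight=650.0, logp=6.2, h_donors=2, h_acceptors=8)
    print(passes_rule_of_five(small), passes_rule_of_five(large))
```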
34
IBM's Research Collaboration on Computer
Curation
Automated Text & Image Analysis!
Annotation Factory ? Data Warehouse
Data
  • Annotators
  • Chemicals
  • Biomarkers
  • Genes
  • Proteins
  • Cell Lines
  • Cell Types
  • People
  • Institutions
  • Diseases
  • Symptoms
  • Other

Full-Text Chemical Structures
Journals
Attributes
Medline
Search
Patents
Entities
Edgar
Analysis
Relationships
Web
Co-occurrence, Lipinski Rules, Section 112, Trends,
Molecular Networks, Time lines
"UIMA"
Blue Gene
SciTegic Pipeline Pilot and other Partner Tools
35
What about processing image data ??
Image entity recognition
IBM pioneered a process for converting images of
chemical structures into Mol files (machine
readable representations of chemical structures)
We can also analyze the image content of patents
& journals
36
Seminal paper on converting chemical images into
MOL files
Optical Recognition of Chemical Structures
(OROCS)
37
Optical recognition of chemical structures
(OROCS) How it works
OC(CN1C2(C3CCCCC3)OC(C)CC1O)N(C)C4C2CC(Cl)CC4
38
Optimization of Image processing process
Extract the images From the page
Isolate the chemical images
Pre-processing of the images makes a
significant difference
SMILES String
39
This shows the selective extraction of image data
from within the patent
Individual images
40
Image Extracted from the page
Structure Generated from the image
SMILES String Generated from the image
Chemical derived from OCR of image data
Example results from OCR of chemical images

Source Dr John Kinney
41
Learning from the Exceptions
  • Radicals, polymers, organometallics
  • Name lookup table differences
  • formal
  • Structure conventions differ
  • e.g., CH3MgBr vs. CH3Mg.Br-
  • Ionization state/stereochemistry
  • Internal error corrections
  • Some names are incomplete and therefore ambiguous!

42
Differences of opinion
Often tagged as ambiguous
Where do the punctuation marks belong?
43
Structures from Images
  • Image-to-Structure software very effective on
    clean, crisp images
  • Like text, image quality in documents varies
    greatly!
  • Improper structure assignments are common

44
Structure Recognition Process
  • Clipped images from documents are used.
  • Processing of full-page images is slow and gives
    many errors.
  • OSRA (NIH) run to produce SDFile output
  • PipelinePilot Protocol used to analyze and filter
    resulting structure set.

45
Criteria for filtering invalid structures
  • Presence of non-element atoms, R, X, etc.
  • Inappropriate internal coordinates (bond length
    and angles) of the 2D representation.
  • Over-assigned stereochemistry can be corrected
    rather than removing the entire structure

46
Examples of common errors in translation
Example Structure / Error / Filter Rule
Error: Double bond interpreted as two single bonds
Filter rule: The minimum bond distance where neither atom is
hydrogen is required to be greater than 0.85 Å.
Error: Aromatic bond interpreted as exocyclic bond from
ring
Filter rule: The minimum bond angle from an exocyclic
terminal atom to the ring atoms is required to
be greater than 50°.
47
Examples of common errors in translation
Example Structure / Error / Filter Rule
Error: Atom found in center of single bond
Filter rule: The maximum bond angle of a carbon with exactly
two single bonds is required to be less than
155°.
Error: Single bond divided into two single bonds
Filter rule: The minimum bond angle which includes any
terminal atom is required to be greater than
10°.
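The filter rules on these two slides are simple geometric tests on the 2D coordinates of a recognized structure. The sketch below shows how a bond length and a bond angle might be computed and compared with the stated thresholds. It assumes the angle thresholds are in degrees and the coordinates are in ångströms. The function and constant names are hypothetical, and only two of the four rules are wired up as checks.

```python
import math

MIN_HEAVY_ATOM_BOND = 0.85        # Å, bonds where neither atom is hydrogen
MAX_TWO_BOND_CARBON_ANGLE = 155.0  # degrees, carbon with exactly two single bonds
MIN_TERMINAL_ANGLE = 10.0          # degrees, angles involving a terminal atom
MIN_EXOCYCLIC_RING_ANGLE = 50.0    # degrees, exocyclic terminal atom to ring atoms


def bond_length(p, q):
    """Euclidean distance between two 2D points."""
    return math.dist(p, q)


def bond_angle(a, center, b):
    """Angle a-center-b in degrees for 2D points."""
    v1 = (a[0] - center[0], a[1] - center[1])
    v2 = (b[0] - center[0], b[1] - center[1])
    cos = (v1[0] * v2[0] + v1[1] * v2[1]) / (math.hypot(*v1) * math.hypot(*v2))
    return math.degrees(math.acos(max(-1.0, min(1.0, cos))))


def heavy_bond_ok(p, q):
    """Reject bonds so short that a double bond was likely read as two singles."""
    return bond_length(p, q) > MIN_HEAVY_ATOM_BOND


def two_bond_carbon_ok(a, center, b):
    """Reject near-straight carbons, i.e. an atom placed mid-bond."""
    return bond_angle(a, center, b) < MAX_TWO_BOND_CARBON_ANGLE


if __name__ == "__main__":
    print(heavy_bond_ok((0.0, 0.0), (1.4, 0.0)))
    print(two_bond_carbon_ok((-1.0, 0.0), (0.0, 0.0), (1.0, 0.0)))
```

The remaining two thresholds would be applied the same way once terminal and ring atoms have been identified from the connectivity.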
48
Conversion Statistics
  • 20,081 patents with 487,537 clip files

35% clean
49
Combining Text and Image Structures
50
Image Processing Operations
PTO/ Data Processing Operations
Chem CWUs
Clip Images
CDX / MOL files
OSRA /Clide
SDF files
SDF files
Multi-step post processing Operations
Multi-step post processing Operations
Multi-step post processing Operations
51
Image Extracted from the page
Structure Generated from the image
SMILES String Generated from the image
Chemical derived from OCR of image data
Example results from OCR of chemical images

Source Dr John Kinney
52
Computer curation now involves multiple types of
analysis
combining technologies into workflow protocols
IBM Collaborator input
  • Analysis of text

Derived Meta data
  • Analysis of image

Output db to Collaborators
  • Analysis of XML files
  • Analysis of CWUs

Internal data
53
Multiple workflows for processing text & images
via different technologies
54
Computer Curation Process Overview
Services Hosted at IBM Almaden
User Applications
Annotation Factory
ChemVerse
Selected Internet Content
Pipeline Pilot
U.S. Patents (1976 -2009)
ChemVerse db
ChemVerse (Semantic Associations)
e Classifier Other Data Associations
View selected Documents Reports
BIW
U.S. Pre- Grants (All)
ADU
IP Database (e.g. DB2)
Database computed Meta Data
Data Sources
Parse Extract data
PCT EPO Apps
Cognos/DDQB/ Other Apps
Medline Abstracts (>18 M)
In-House Content
Computational Analytics
Annotator 1
Chem Search
Annotator 2
SIMPLE
ADU = Automated Data Update
55
What about additional meta data ?
Data association
How should we identify, extract, and associate
attributes?
Semantic associations using ChemVerse
56
ChemVerse: a tool for associating molecular
attributes from different sources
Attributes derived from different sources
IP Attributes
Spectral Attributes
Physical Attributes
  • Orange Book
  • Legal status
  • Assignee
  • Foreign filings
  • Expiration Date
  • NIST db
  • IR spectra
  • NMR,
  • Mass Spec, etc
  • Computational
  • MW,
  • MF
  • Bp
  • Mp, etc.

Screening Attributes
Molecular Entities have Various Attributes
(from different sources)
Drug Attributes
  • PubChem
  • Activity
  • Pharma data
  • Target data for SRA
  • Literature references
  • DrugBank
  • Activity
  • Pharma data
  • Protein Binding
  • half life

Toxicity Attributes
  • WOMBAT
  • Activity
  • Pharma data
  • Target data for SRA
  • Literature references
  • EPA databases
  • Toxicity studies
  • LD50
  • Literature references

57
ChemVerse semantically maps associations of
attributes from different sources
Semantic association of attributes
Drugbank
PubChem
FDA
Orange Book
Others
Database C (Tox)
Internet
Location
Data Source 1 Schema 1
Attributes
Output file list of attributes
Data Sources
Input list of SMILES
The Tank
Data Source 2 Schema 2
Attributes
Input list of Attributes
Output file list of SMILES
Attributes
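The two directions in this diagram (a list of SMILES in and attributes out, or attributes in and a list of SMILES out) can be sketched as a merge of per-source records keyed by structure. The example below is not a description of how ChemVerse itself works. The source names, the attribute names, and the sample records are made up for illustration, and it assumes every source uses the same canonical SMILES string for a given molecule.

```python
from collections import defaultdict


def merge_attributes(sources):
    """Merge {source: {smiles: {attribute: value}}} into {smiles: {source.attribute: value}}."""
    merged = defaultdict(dict)
    for source_name, records in sources.items():
        for smiles, attrs in records.items():
            for key, value in attrs.items():
                # Prefix with the source so equal attribute names do not collide.
                merged[smiles][f"{source_name}.{key}"] = value
    return dict(merged)


def attributes_for(merged, smiles_list):
    """Input: list of SMILES. Output: attributes known for each."""
    return {s: merged.get(s, {}) for s in smiles_list}


def smiles_with(merged, attribute):
    """Input: an attribute name. Output: sorted SMILES that carry it."""
    return sorted(s for s, attrs in merged.items() if attribute in attrs)


if __name__ == "__main__":
    # Hypothetical records from two hypothetical sources.
    sources = {
        "physical": {"c1ccccc1": {"MF": "C6H6"}},
        "ip": {
            "c1ccccc1": {"assignee": "ExampleCorp"},
            "CCO": {"assignee": "OtherCorp"},
        },
    }
    merged = merge_attributes(sources)
    print(attributes_for(merged, ["c1ccccc1"]))
    print(smiles_with(merged, "ip.assignee"))
```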
58
Who is participating?
Pfizer
Novartis
Bruce.a.Lefker, Christopher.Kibbey,
David.J.Walsh, Sarah.Blendermann, Bryn
Williams-Jones Jacquelyn Klug-McLeod Lee Harland
Robert Owen Marudai Balasubramanian
BMS
Therese.Vachon, Edgar.Jacoby, Peter.Ertl,
Peter.Gedeck, Fatma.Oezdemir-Zaech, John-w.Davies
, Jeremy.Jenkins, Allen.Cornett, Stefan
Wetzel Greg Landrum Richard Lewis A J Dambra
Cynthia.Yang, Charles.Hand, Michael.Rogers,
Ramesh.durvasula, Alice.goshorn,
Mark.Hermsmeier,
AstraZeneca
Sorel.Muresan, Christopher.Southan,
Niklas.Blomberg, Plamen.Petrov,
WIPO
Glenn.Macstravic, Chrstophe.Mazenc, Lustin
Diaconescu, Paul Halfpenny
IBM Almaden Research
Lilly
Ana Lelescu Linda Kato Su Yan Ashish
Sanghavi Ramachandran Prasad Qi He Timothy J
Bethea Yanbo Wu Meenakshi Nagarajan Christopher
Campbell
Stephen Boyer, Jeff Kreulen Ying Chen Tom
Griffin Alfredo Alba Scott Spangler Eric Louie
Brad Wade John Colino Isaac Cheng
Thompson Doman
Boehringer
Jasmin.Saric, Scott.Oloff, John.Hart,
Stephen.Boyer, John.Proudfoot, Markus.Kunze,
NIH
Marc Nicklaus Igor Filippov Marcus Sitzmann
EBI
Genentech / Roche
John Overington Christopher Steinbeck Dominique
Clark
Jeff Blaney, Slaton Lipscomb Keven Clark Jw
Feng Vickie Tsui Bin Qing Ben Sellers
Dupont
John.B.Kinney, Timothy.E.Mueller,
59
Research - It's a journey
Backup materials