Title: Strategies for Building the CAS databases
1Strategies for Building the CAS databases
- Matt Toussant and Paul Peters
2CAS strategy is to achieve leadership on three
fronts
World's Leading Chemistry Databases
To create the world's best digital environment to
search, retrieve, analyze and link chemical
information
3The CAS strategy drives specific actions in
Editorial
The world's best digital environment to search,
retrieve, analyze and link chemical information
But how do these strategies evolve?...
4World interchange on chemistry research
initiated CA
Radium
Electrons
Penicillin
1890
1925
5Technology aided CAS in organizing science
diversity
Superconductor
HIV infection
Fission
Moon
DNA code
1985
1930
6Scientists' multi-disciplinary relationships compel
CAS today
Nanotechnology
Expression profiles
Combinatorial Technology
PCR
The Internet
1990
2002
7CAS databases capture novel chemistry from
publications; this provides a useful body of
prior art.
8Prior art is complex
Taxol
Nanotubes
Herbicide resistant corn
Prosthetic leg
Combinatorial chemistry
Polymerase chain reaction
Teflon
9Chemical prior art monitoring is not getting
easier.
- Number of relevant chemical disclosures continues to grow annually
- Complexity of chemical disclosures is rising
- Inter-dependence and diversity of IP is challenging
Some data and examples will illustrate
10CAS structure collections encompass distinct
resources
- ChemCATs - >5,400,000 records, 773 commercial catalogs (660 suppliers)
- MARPAT - multi-millions of structure combinations from 480,000 Markush parents
- CASREACT - >6,300,000 reactions from 350,000 publications
- Registry - >44,500,000 substances identified since 1967
11Annual chemistry disclosures (publications) have
grown and likely will continue to grow
Annual CAS Records since 1907
Chemistry publications
12Annual chemistry disclosures (publications) have
grown and likely will continue to grow
Chart: Chemistry publications, projected growth 2003 to 2007
13Annual chemistry disclosures (publications) have
grown and likely will continue to grow
Chart: Chemistry publications, projected growth to 2007, with speculation
14Complexity rising - more substances, more
interrelated concepts
Index entries
15CAS organizes the world of chemistry
The world's leading chemistry databases
16CAS is committed to new content...
... the last five years
17Pre-1967
Patents
Calculated Properties
Citations
Sequences
E-Pre-prints
Experimental Properties
Pre-1967
Screening libraries
INPI
InfoChem
Catalog updates
18CAS database building efforts to organize
chemistry information
19Chemists have used many systems to describe and
identify compounds
11β-[18F]Fluoro-5α-dihydrotestosterone
LogD 1.2
Chemical Identification
m.p. 78.2°
8-Quinolinamine
NCX 4040
b.p. 57°
Physical Properties
Atomic Relationships
Nomenclature
20Organizing chemical information requires
systemization
- Systemization captures chemistry themes
- Themes (databases)
- reactions,
- structures,
- gene sequences,
- concepts, etc.
- Within a theme, rules define relationships of entities
- process
- identifiers
- categorization
21Another component of systemization is database
identifiers
Substance Identification
Sequences
Chemical Reactions
Physical Properties
Atomic Relationships
Nomenclature
chemical novelty
22CAS organizes substance identification categories
Substance Identification
Physical Properties
Atomic Relationships
Nomenclature
- Topological w/
- Stereochemistry
- Markush
- Connection Tables
- 18 million
- Sequences
- CAS Registry System
1. Predicted Properties: nearing 100 million values for >12 million compounds
2. Experimental Properties: 1 million mp, bp, etc.
- 1. Generic >3 million
- CAS Thesaurus
- 2. Exact >100 million
- Trade
- Semi-systematic
- CAS
23The Registry is a highly ordered collection of
molecule identifiers
very orderly
very disorderly
The orderliness continuum
24Underlying orderliness is a system for
minimizing ambiguity
25Reactions are built based on reactant and product
identity
CAS RN 325-76-0
CAS RN 435275-98-2
CAS RN 435276-00-9
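One concrete way an identifier can guard against ambiguity is a check digit. The final digit of a CAS RN is a check digit: the digits before it are weighted 1, 2, 3, ... from right to left, summed, and reduced modulo 10. The slides do not show this calculation, so the following is only a minimal sketch of that published rule; the function names are illustrative and not part of any CAS software.

```python
import re

# Format of a CAS RN: 2 to 7 digits, 2 digits, 1 check digit.
CAS_RN_FORMAT = re.compile(r"(\d{2,7})-(\d{2})-(\d)")


def cas_check_digit(body: str) -> int:
    """Return the check digit for the digits that precede it.

    Digits are weighted 1, 2, 3, ... starting from the rightmost digit;
    the weighted sum modulo 10 is the check digit.
    """
    return sum(weight * int(digit)
               for weight, digit in enumerate(reversed(body), start=1)) % 10


def is_valid_cas_rn(rn: str) -> bool:
    """True if rn has the CAS RN format and its check digit is consistent."""
    match = CAS_RN_FORMAT.fullmatch(rn)
    if match is None:
        return False
    body = match.group(1) + match.group(2)
    return cas_check_digit(body) == int(match.group(3))


# Two of the RNs shown on this slide, plus aniline (62-53-3).
for rn in ("435275-98-2", "435276-00-9", "62-53-3"):
    print(rn, is_valid_cas_rn(rn))
```

Applied to the three numbers on this slide, the check passes for 435275-98-2 and 435276-00-9 but not for 325-76-0 as printed here (the weighted sum of 32576 gives a check digit of 8), which suggests that number may have been mistyped or garbled somewhere along the way. Catching exactly that kind of slip is what the check digit is for.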
26The key elements are not just the systems
- CAS Database Building Staff
- Over 600 staff involved in database building efforts
- Advanced degrees held by about 300 analysts
- Original literature can be in any of 50 languages
- Experience at CAS averages more than 15 years
27CAS uses its processes, identifiers, and
categorization to logically capture and organize
information for chemists.
An example...
28Patents are dissected for novelty
WO 02 046157: 118 pages, 45 examples, 104 claims
29An abstract reflecting the breadth of the art is
created
30Patent family information is organized
31CAS indexing captures novel chemistry
- 23 New substances
- 153 Reactions and reaction schemes
- 27 single step reactions
- 42 Substances with new information
- 6 Subjects with novelty
- 3 Markush structures with 10 G groups and 106 definitions
Total of at least 227 new chemistry novelty
entities
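The total can be checked against the bullet counts. The sketch below assumes that the 27 single-step reactions are a subset of the 153 reactions and reaction schemes rather than a separate category; the slide does not say so explicitly, but the arithmetic only reaches 227 under that reading.

```python
# Counts taken from the bullets on this slide.
novelty_counts = {
    "new substances": 23,
    "reactions and reaction schemes": 153,  # assumed to include the 27 single-step reactions
    "substances with new information": 42,
    "subjects with novelty": 6,
    "Markush structures": 3,
}

total = sum(novelty_counts.values())
print(total)  # 227
```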
32Substances claimed are indexed and connection
tables registered
33
ST  methylpiperidinemethanol diastereomer prepn drug intermediate opioid receptor ligand prepn piperidinecarboxaldehyde asym addn alkylmetal
IT  Human
    (prepn. of α-methylpiperidine-3-methanol diastereomers and analogs as drug intermediates)
IT  Opioid receptors
    RL: BSU (Biological study, unclassified); BIOL (Biological study)
    (prepn. of α-methylpiperidine-3-methanol diastereomers and analogs as drug intermediates)
IT  Addition reaction
    (stereoselective prepn. of α-methylpiperidine-3-methanol diastereomers and analogs as drug intermediates)
IT  435275-92-6P 435275-93-7P 435275-94-8P 435275-95-9P 435275-96-0P 435275-97-1P 435275-98-2P 435275-99-3P 435276-03-2P
    RL: IMF (Industrial manufacture); RCT (Reactant); SPN (Synthetic preparation); PREP (Preparation); RACT (Reactant or reagent)
    (prepn. of α-methylpiperidine-3-methanol diastereomers and analogs as drug intermediates)
IT  435276-00-9P 435276-01-0P 435276-02-1P
    RL: IMF (Industrial manufacture); SPN (Synthetic preparation); PREP (Preparation)
    (prepn. of α-methylpiperidine-3-methanol diastereomers and analogs as drug intermediates)
IT  309746-85-8P 309746-87-0P 309746-90-5P 309746-92-7P
    RL: PAC (Pharmacological activity); SPN (Synthetic preparation); THU (Therapeutic use); BIOL (Biological study); PREP (Preparation); USES (Uses)
    (prepn. of α-methylpiperidine-3-methanol diastereomers and analogs as drug intermediates)
IT  62-53-3, Aniline, reactions 103-63-9, (2-Bromoethyl)benzene 501-53-1, Benzyloxycarbonyl chloride 83602-37-3 163343-71-3 377780-25-1
    RL: RCT (Reactant); RACT (Reactant or reagent)
    (prepn. of α-methylpiperidine-3-methanol diastereomers and analogs as drug intermediates)
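Index entries like the ones above can be mined for their CAS RNs. The following is a minimal sketch, not a CAS or STN tool: the pattern and the function name are illustrative assumptions, and it relies on the CA indexing convention that a "P" suffix on a Registry Number marks a preparation.

```python
import re

# A CAS RN (2-7 digits, 2 digits, 1 digit) with an optional "P" suffix.
# The lookarounds keep the pattern from matching inside longer tokens.
RN_PATTERN = re.compile(r"(?<![\d-])(\d{2,7}-\d{2}-\d)(P?)(?![\w-])")


def extract_registry_numbers(index_text):
    """Return (registry_number, has_p_suffix) pairs in order of appearance."""
    return [(rn, suffix == "P") for rn, suffix in RN_PATTERN.findall(index_text)]


# Short excerpt of the index terms shown on this slide.
excerpt = (
    "IT 435276-00-9P 435276-01-0P 435276-02-1P "
    "IT 62-53-3, Aniline, reactions 103-63-9, (2-Bromoethyl)benzene"
)

for rn, prepared in extract_registry_numbers(excerpt):
    print(rn, "preparation" if prepared else "no P suffix")
```

In this excerpt the three 4352xx numbers come back flagged as preparations, while aniline and (2-bromoethyl)benzene, which are indexed as reactants, come back without the flag.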
34Reactions are organized and information verified
by substance connection table
35Single-step reactions are built and multi-step
reactions are mapped.
36Generically described substances are indexed
37
G1 = (0-2) CH2
G2 = H / alkyl (SR (1-) G3) / alkyl (SR G4) / 8
G3 = aryl<EC (6-) C, RC (1-)> (SO)
G4 = Hy<EC (5-) A (1-4) Q (0-) N (0-) O (0-) S, AR (1-), BD (2-) D, RC (1-), RS (0-) E5 (0-) E6 (0-) E7> (SO)
G5 = alkyl (SO G6) / cycloalkyl (SO G7) / aryl<EC (6-) C, RC (1-)> (SO) / Hy<EC (5-) A (1-4) Q (0-) N (0-) O (0-) S, AR (1-), BD (2-) D, RC (1-), RS (0-) E5 (0-) E6 (0-) E7> (SO) / alkyl (SR (1-) G3) / alkyl (SR G4) / (SC CH2Ph / CH2CH2Ph)
G6 = cycloalkyl (SO) / R
G7 = alkyl (SO) / cycloalkyl (SO) / R
G8 = NH2 / 15 / OH
G9 = alkyl (SO G6) / cycloalkyl (SO G7) / aryl<EC (6-) C, RC (1-)> (SO) / Hy<EC (5-) A (1-4) Q (0-) N (0-) O (0-) S, AR (1-), BD (2-) D, RC (1-), RS (0-) E5 (0-) E6 (0-) E7> (SO) / alkyl (SR (1-) G3) / alkyl (SR G4) / (SC Me)
G10 = alkyl (SO G6) / cycloalkyl (SO G7) / aryl<EC (6-) C, RC (1-)> (SO) / Hy<EC (5-) A (1-4) Q (0-) N (0-) O (0-) S, AR (1-), BD (2-) D, RC (1-), RS (0-) E5 (0-) E6 (0-) E7> (SO) / alkyl (SR (1-) G3) / alkyl (SR G4) / (SC Ph)
38Derwent abstract and indexing for this document
AB WO 200246157 A UPAB 20020823
NOVELTY - Substituted piperidines are new.
DETAILED DESCRIPTION - Substituted piperidines of formulae (I), (II), (III) or (IV) are new.
n = 0 - 2
R = H, aralkyl or CO2R'
R' = alkyl, aryl or aralkyl
Z' = NHR'' or OH and
R'' = H, alkyl, aryl or aralkyl.
An INDEPENDENT CLAIM is included for the preparation of an enantiomerically enriched 3-(1-hydroxyalkyl)-substituted cyclic amine (A), comprising stereoselective addition of a nucleophilic alkyl or aryl to an enantiomer of a 3-substituted cyclic amine (carbonyl containing) with a chiral transition metal complex and a metal alkyl or metal aryl to produce an enantiomer of (A).
ACTIVITY - Analgesic; Antiaddictive; Auditory. No biological data available.
MECHANISM OF ACTION - Opioid receptor binder. No biological data available.
USE - (I) - (IV) are used for the treatment of numerous ailments, conditions and diseases (e.g. addiction and pain), psychological addictions, psychiatric disorders or neurological pathologies (e.g. tinnitus).
ADVANTAGE - (I) - (IV) possess analgesic properties free from respiratory depression and the potential for physical dependence associated with mu-opioid receptor ligands such as morphine or fentanyl.
Dwg.0/41
FS CPI
FA AB GI DCN
MC CPI B07-D05 B14-C01 B14-J01 B14-J02 B14-L01 B14-L06 B14-M01 B14-N02
39Implications for searching
- Searching for a class of compounds no longer
retrieves compounds just from the journal and
patent literature in CA
40Running the structure search
41Derwent code searching
Much more imprecise, almost like searching
Registry only at the screen level. Prepare for
lots of false drops. Good as a final check if
absolutely necessary
42CAplus retrieves hits from journals and patents
43Did we find answers from all substances?
44CASREACT can also be a unique source of CAS RNs
45Reaction from the INPI section of CASREACT is
available in CAplus, but without indexing
46What about the rest of the compounds?
Most of them are compounds registered for
Chemcats but not yet linked to their
Chemcats record
47Summary
- CAS indexing focuses on concepts and specific substances
- Substance registration now goes beyond the journal and patent literature
- Prior art can also be found in Chemcats and CASREACT