Strategies for Building the CAS databases - PowerPoint PPT Presentation

Slides: 48
Provided by: chemicalab

Transcript and Presenter's Notes

1
Strategies for Building the CAS databases
  • Matt Toussant and Paul Peters

2
CAS strategy is to achieve leadership on three
fronts
World's
Leading
Chemistry
Databases
To create the world's best digital environment to
search, retrieve, analyze and link chemical
information
3
The CAS strategy drives specific actions in
Editorial
The world's best digital environment to search,
retrieve, analyze and link chemical information
But how do these strategies evolve?
4
World interchange on chemistry research
initiated CA
Radium
Electrons
Penicillin
1890
1925
5
Technology aided CAS in organizing science
diversity
Superconductor
HIV infection
Fission
Moon
DNA code
1985
1930
6
Scientists' multidisciplinary relationships compel
CAS today
Nanotechnology
Expression profiles
Combinatorial Technology
PCR
The Internet
1990
2002
7
CAS databases capture novel chemistry from
publications; this provides a useful body of
prior art.
8
Prior art is complex
Taxol
Nanotubes
Herbicide-resistant corn
Prosthetic leg
Combinatorial chemistry
Polymerase chain reaction
Teflon
9
Chemical prior art monitoring is not getting
easier.
  • Number of relevant chemical disclosures continues
    to grow annually
  • Complexity of chemical disclosures is rising
  • Interdependence and diversity of IP are
    challenging

Some data and examples will illustrate.
10
CAS structure collections encompass distinct
resources
  • ChemCATs: >5,400,000 records, 773 commercial
    catalogs (660 suppliers)
  • MARPAT: multi-millions of structure combinations
    from 480,000 Markush parents
  • CASREACT: >6,300,000 reactions from 350,000
    publications
  • Registry: >44,500,000 substances identified
    since 1967

11
Annual chemistry disclosures (publications) have
grown and likely will continue to grow
Annual CAS Records since 1907
Chemistry publications
12
Annual chemistry disclosures (publications) have
grown and likely will continue to grow
Projected Growth 2003
Chemistry publications
2007
13
Annual chemistry disclosures (publications) have
grown and likely will continue to grow
Projected Growth 2007
Speculation
Chemistry publications
2007
14
Complexity rising - more substances, more
interrelated concepts
Index entries
15
CAS organizes the world of chemistry
The world's leading chemistry databases
16
CAS is committed to new content...
...the last five years
17
Pre-1967
Patents
Calculated Properties
Citations
Sequences
E-Pre-prints
Experimental Properties
Pre-1967
Screening libraries
INPI
InfoChem
Catalog updates
18
CAS database building efforts to organize
chemistry information
19
Chemists have used many systems to describe and
identify compounds
11β-[18F]Fluoro-5α-dihydrotestosterone
LogD 1.2
Chemical Identification
m.p. 78.2°
8-Quinolinamine
NCX 4040
b.p. 57°
Physical Properties
Atomic Relationships
Nomenclature
20
Organizing chemical information requires
systemization
  • Systemization captures chemistry themes
  • Themes (databases): reactions, structures,
    gene sequences, concepts, etc.
  • Within a theme, rules define relationships of
    entities: process, identifiers, categorization

21
Another component of systemization is database
identifiers
Substance Identification
Sequences
Chemical Reactions
Physical Properties
Atomic Relationships
Nomenclature
chemical novelty
22
CAS organizes substance identification categories
Substance Identification
Physical Properties
Atomic Relationships
Nomenclature
  • Topological w/ Stereochemistry
  • Markush
  • Connection Tables
  • 18 million
  • Sequences
  • CAS Registry System

1. Predicted Properties: nearing 100 million
   values for >12 million compounds
2. Experimental Properties: 1 million mp, bp, etc.
  • 1. Generic: >3 million
  • CAS Thesaurus
  • 2. Exact: >100 million
  • Trade
  • Semi-systematic
  • CAS

23
The Registry is a highly ordered collection of
molecule identifiers
very orderly
very disorderly
The orderliness continuum
24
Underlying orderliness is a system for
minimizing ambiguity
25
Reactions are built based on reactant and product
identity
CAS RN 325-76-0
CAS RN 435275-98-2
CAS RN 435276-00-9
26
The key elements are not just the systems
  • CAS Database Building Staff
  • Over 600 staff involved in database building
    efforts
  • Advanced degrees held by about 300 analysts
  • Original literature can be in any of 50
    languages
  • Experience at CAS averages more than 15 years

27
CAS uses its processes, identifiers, and
categorization to logically capture and organize
information for chemists.
An example...
28
Patents are dissected for novelty
WO 02 046157: 118 pages, 45 examples, 104 claims
29
An abstract reflecting the breadth of the art is
created
30
Patent family information is organized
31
CAS indexing captures novel chemistry
  • 23 New substances
  • 153 Reactions and reaction schemes
  • 27 single-step reactions
  • 42 Substances with new information
  • 6 Subjects with novelty
  • 3 Markush structures with 10 Gps 6 106
    definitions

Total of at least 227 new chemistry novelty
entities
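The stated total can be checked against the bullet counts. The short sketch below sums five of the six figures; it assumes (the slide does not say so explicitly) that the 27 single-step reactions are already included in the 153 reactions and reaction schemes, since that is the reading under which the counts add up to 227. The dictionary keys are illustrative labels, not CAS terminology.

```python
# Counts as listed on the slide. Treating the 27 single-step reactions as a
# subset of the 153 reactions and reaction schemes is an assumption; they are
# left out of the sum for that reason.
novelty_counts = {
    "new substances": 23,
    "reactions and reaction schemes": 153,
    "substances with new information": 42,
    "subjects with novelty": 6,
    "Markush structures": 3,
}

total = sum(novelty_counts.values())
print(total)  # 227
```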
32
Substances claimed are indexed and connection
tables registered
33
ST methylpiperidinemethanol diastereomer prepn;
   drug intermediate opioid receptor ligand prepn;
   piperidinecarboxaldehyde asym addn alkylmetal
IT Human
   (prepn. of α-methylpiperidine-3-methanol
   diastereomers and analogs as drug intermediates)
IT Opioid receptors
   RL: BSU (Biological study, unclassified);
   BIOL (Biological study)
   (prepn. of α-methylpiperidine-3-methanol
   diastereomers and analogs as drug intermediates)
IT Addition reaction
   (stereoselective prepn. of
   α-methylpiperidine-3-methanol diastereomers and
   analogs as drug intermediates)
IT 435275-92-6P 435275-93-7P 435275-94-8P
   435275-95-9P 435275-96-0P 435275-97-1P
   435275-98-2P 435275-99-3P 435276-03-2P
   RL: IMF (Industrial manufacture); RCT (Reactant);
   SPN (Synthetic preparation); PREP (Preparation);
   RACT (Reactant or reagent)
   (prepn. of α-methylpiperidine-3-methanol
   diastereomers and analogs as drug intermediates)
IT 435276-00-9P 435276-01-0P 435276-02-1P
   RL: IMF (Industrial manufacture);
   SPN (Synthetic preparation); PREP (Preparation)
   (prepn. of α-methylpiperidine-3-methanol
   diastereomers and analogs as drug intermediates)
IT 309746-85-8P 309746-87-0P 309746-90-5P
   309746-92-7P
   RL: PAC (Pharmacological activity);
   SPN (Synthetic preparation); THU (Therapeutic use);
   BIOL (Biological study); PREP (Preparation);
   USES (Uses)
   (prepn. of α-methylpiperidine-3-methanol
   diastereomers and analogs as drug intermediates)
IT 62-53-3, Aniline, reactions
   103-63-9, (2-Bromoethyl)benzene
   501-53-1, Benzyloxycarbonyl chloride
   83602-37-3 163343-71-3 377780-25-1
   RL: RCT (Reactant); RACT (Reactant or reagent)
   (prepn. of α-methylpiperidine-3-methanol
   diastereomers and analogs as drug intermediates)
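The CAS Registry Numbers in an index record like this one can be pulled out and sanity-checked mechanically. The sketch below is a minimal illustration, not a CAS tool: it finds RN-shaped tokens with a regular expression and verifies each with the published CAS check-digit rule (digits before the check digit are weighted 1, 2, 3, ... from right to left, and the sum modulo 10 must equal the last digit). The function names, the `excerpt` string, and the reading of a trailing "P" as a preparation flag are assumptions made for this example.

```python
import re

# A CAS RN has 2-7 digits, then 2 digits, then 1 check digit.
# The optional trailing "P" is captured separately; reading it as a
# "preparation" flag is an assumption based on the record shown here.
CAS_RN = re.compile(r"\b(\d{2,7})-(\d{2})-(\d)(P?)(?![\w-])")


def check_digit_ok(rn: str) -> bool:
    """Return True if the last digit of a CAS RN matches its checksum."""
    digits = rn.replace("-", "")
    body, check = digits[:-1], int(digits[-1])
    # Weight the body digits 1, 2, 3, ... from right to left.
    total = sum(w * int(d) for w, d in enumerate(reversed(body), start=1))
    return total % 10 == check


def extract_rns(text: str) -> list[tuple[str, bool]]:
    """Return (registry number, has_P_suffix) pairs found in text."""
    return [
        (f"{m.group(1)}-{m.group(2)}-{m.group(3)}", m.group(4) == "P")
        for m in CAS_RN.finditer(text)
    ]


# Hypothetical excerpt assembled from lines of the record above.
excerpt = (
    "IT 435276-00-9P 435276-01-0P 435276-02-1P RL IMF "
    "(Industrial manufacture) SPN (Synthetic preparation) "
    "IT 62-53-3, Aniline, reactions 103-63-9, (2-Bromoethyl)benzene"
)

for rn, has_p in extract_rns(excerpt):
    print(rn, "P" if has_p else "-", check_digit_ok(rn))
```

A checksum pass only shows that a number is well formed; it does not confirm that the number is assigned to a substance in the Registry.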
34
Reactions are organized and information verified
by substance connection table
35
Single-step reactions are built and multi-step
reactions are mapped.
36
Generically described substances are indexed
37
G1 (0-2) CH2
G2 H / alkyl (SR (1-) G3) / alkyl (SR G4) / 8
G3 aryl<EC (6-) C, RC (1-)> (SO)
G4 Hy<EC (5-) A (1-4) Q (0-) N (0-) O (0-) S, AR
   (1-), BD (2-) D, RC (1-), RS (0-) E5
   (0-) E6 (0-) E7> (SO)
G5 alkyl (SO G6) / cycloalkyl (SO G7) /
   aryl<EC (6-) C, RC (1-)> (SO) /
   Hy<EC (5-) A (1-4) Q (0-) N (0-) O (0-) S, AR
   (1-), BD (2-) D, RC (1-), RS (0-) E5
   (0-) E6 (0-) E7> (SO) /
   alkyl (SR (1-) G3) / alkyl (SR G4) /
   (SC CH2Ph / CH2CH2Ph)
G6 cycloalkyl (SO) / R
G7 alkyl (SO) / cycloalkyl (SO) / R
G8 NH2 / 15 / OH
G9 alkyl (SO G6) / cycloalkyl (SO G7) /
   aryl<EC (6-) C, RC (1-)> (SO) /
   Hy<EC (5-) A (1-4) Q (0-) N (0-) O (0-) S, AR
   (1-), BD (2-) D, RC (1-), RS (0-) E5
   (0-) E6 (0-) E7> (SO) /
   alkyl (SR (1-) G3) / alkyl (SR G4) / (SC Me)
G10 alkyl (SO G6) / cycloalkyl (SO G7) /
   aryl<EC (6-) C, RC (1-)> (SO) /
   Hy<EC (5-) A (1-4) Q (0-) N (0-) O (0-) S, AR
   (1-), BD (2-) D, RC (1-), RS (0-) E5
   (0-) E6 (0-) E7> (SO) /
   alkyl (SR (1-) G3) / alkyl (SR G4) / (SC Ph)
38
Derwent abstract and indexing for this document
AB WO 200246157 A UPAB: 20020823
NOVELTY - Substituted piperidines are new.
DETAILED DESCRIPTION - Substituted piperidines of
formulae (I), (II), (III) or (IV) are new.
n = 0 - 2; R = H, aralkyl or CO2R';
R' = alkyl, aryl or aralkyl; Z' = NHR'' or OH;
and R'' = H, alkyl, aryl or aralkyl.
An INDEPENDENT CLAIM is included for the preparation
of an enantiomerically enriched
3-(1-hydroxyalkyl)-substituted cyclic amine (A),
comprising stereoselective addition of a
nucleophilic alkyl or aryl to an enantiomer
of a 3-substituted cyclic amine (carbonyl
containing) with a chiral transition metal
complex and a metal alkyl or metal aryl to
produce an enantiomer of (A).
ACTIVITY - Analgesic; Antiaddictive; Auditory.
No biological data available.
MECHANISM OF ACTION - Opioid receptor binder.
No biological data available.
USE - (I) - (IV) are used for the treatment of
numerous ailments, conditions and diseases
(e.g. addiction and pain), psychological
addictions, psychiatric disorders or neurological
pathologies (e.g. tinnitus).
ADVANTAGE - (I) - (IV) possess analgesic
properties free from respiratory depression
and the potential for physical dependence
associated with mu-opioid receptor ligands such
as morphine or fentanyl.
Dwg.0/41
FS CPI
FA AB; GI; DCN
MC CPI: B07-D05; B14-C01; B14-J01; B14-J02;
   B14-L01; B14-L06; B14-M01; B14-N02
39
Implications for searching
  • Searching for a class of compounds no longer
    retrieves compounds just from the journal and
    patent literature in CA

40
Running the structure search
41
Derwent code searching
Much less precise, almost like searching
Registry only at the screen level. Prepare for
lots of false drops. Good as a final check if
absolutely necessary.
42
CAplus retrieves hits from journals and patents
43
Did we find answers from all substances?
44
CASREACT can also be a unique source of CAS RNs
45
Reaction from the INPI section of CASREACT is
available in CAplus, but without indexing
46
What about the rest of the compounds?
Most of them are compounds registered for
Chemcats but not yet linked to their
Chemcats record
47
Summary
  • CAS indexing focuses on concepts and specific
    substances
  • Substance registration now goes beyond the
    journal and patent literature
  • Prior art can also be found in Chemcats and
    CASREACT