Title: Dr. Douglas B. Lenat
1 Computers versus Common Sense
- CYC: Software that partially understands
- The impact of that on the AIM dream
- Dr. Douglas B. Lenat
- 7718 Wood Hollow Drive, Austin, TX 78731
- Email: Lenat_at_cyc.com
- Phone: 512-342-4001
2 ELIZA (DOCTOR)
- 1965: Joe Weizenbaum, MIT
- Carl Rogers advocated reflection
- Patient: "I swear a lot."
- ELIZA: "How do you feel about the fact that you swear a lot?"
- Counted a lot on anthropomorphism
- Counted on patient-to-doctor respect/awe
- Patient: "My dog's mother died recently."
- ELIZA: "Tell me more about your mother."
Veneer of intelligence, not intelligence
3 ALICE (40 years post-Eliza)
- JUDGE11 [9:26:0] I WILL ASK YOU SOME SIMPLE QUESTIONS AND GIVE SIMPLE COMMANDS
- PROGRAM [9:26:0] Do you think your plan will succeed?
- JUDGE11 [9:26:16] WHAT IS THE COLOR OF A BLUE CAR
- PROGRAM [9:26:16] WHAT COLOR IS a blue car
- JUDGE11 [9:26:35] WHERE IS SUE'S NOSE WHEN SUE IS IN HER HOUSE
- PROGRAM [9:26:35] Where it belongs. Try searching the Web.
6 2009: Software is still Brittle
- Is the Space Needle taller than the Eiffel Tower?
- How old was Martin van Buren when he was elected President of the U.S.?
7 Natural Language Understanding requires having lots of knowledge
- 1. The pen is in the box.
- The box is in the pen.
- 2. The police watched the demonstrators because they feared violence.
- The police watched the demonstrators because they advocated violence.
- 3. Mary and Sue are sisters.
- Mary and Sue are mothers.
- 4. Every American has a mother.
- Every American has a president.
- 5. John saw his brother skiing on TV. The fool didn't have a coat on!
- John saw his brother skiing on TV. The fool didn't recognize him!
8
- 7. "include all the re-do CABG procedures utilizing ITA and SVG in 1991."
- "And" usually does mean "and". But in this query, "and" really must mean "or". Medical knowledge, not grammar, disambiguates this: a single CABG will not have both an ITA and a SVG.
- 8. "that the tumor cells are stopping dividing or dying"
- Do they mean "stopping dividing or stopping dying"? Of course not, but in 16 of 30 randomly selected syntactically similar constructions from www.clinicaltrials.gov, the coordination (i.e., the wider scope of the modifier, in this case the word "stopping") was the intended meaning. In each case, only one choice makes sense (is consistent with medical knowledge and common sense).
- 9. "Adult patients who underwent MAZE III with or without Mitral Valve Repair or Replacements."
- Is the second half of that query just a waste of space? Discourse pragmatics says no, the physician must have had some reason for saying that. Medical knowledge provides a plausible interpretation: "Adult patients who underwent MAZE III with no concomitant procedures other than Mitral Valve Repair or Replacements".
9 Okay, so let's tell the computer the same sorts of things that human beings know about cars, colors, heights, movies, time, driving to a place, etc.,
and all the other stuff that everybody knows.
- The basic idea
- Get the computer to understand, not just store,
information. Then it can reason to answer your
queries.
2 July 2005
10 MicrowaveOven is a type of Kitchen-Appliance
Dishwasher is a type of Kitchen-Appliance
11 You can't use X if it alorxes Y but lacks any Y
Rthagide-disjaks is a type of Kitchen-Appliance
Gracinimumples is a type of Kitchen-Appliance
Rthagide-disjaks alorxes Vorawnistz.
Gracinimumples alorxes Vorawnistz and Buzqa.
Buzqa is a Thwarn and supplied through Epluns.
12 etc., and all the other stuff that everybody knows.
Eventually, after writing millions of these rules, the system knows as much about pipes, liquids, water, electricity, microwave ovens, dishwashers, cars, colors, movies, heights, etc. as you and I do.
Ultimately, there is just 1 interpretation of that model, and it corresponds to the real world.
Long before that, incrementally, the system gains competence and trustworthiness.
13 Cyc is
- The typical bird has 1 beak, 1 heart, lots of feathers, ...
- Hearts are internal organs; feathers are external protrusions
- Most vehicles are steered by an awake, sane, adult, human
- Tangible objects can't be in 2 (disjoint) places at once
- Badly injuring a child is much worse than killing a dog
- Causes temporally precede (i.e., start before) their effects
- A stabbing requires 2 cotemporal and proximate actors
- etc.
14 Cyc is
- Each of these represented in formal logic
- Info. about a set of hundreds of thousands of terms
- Language-independent: ChineseWordForWritingPen
15 Cyc is
16 What Needs to be Shared?
- bits/bytes/streams/network
- alphabet, special characters,
- words, morphological variants,
- syntactic meta-level markups (HTML)
- semantic meta-level markups (SGML, XML)
- content (logical representation of doc/page/...)
- context (common sense, recent utterances, and n dimensions of metadata: time, space, level of granularity, the source's purpose, etc.)
Sem. Web
17 How formalized knowledge helps search
(ForAll ?P
  (ForAll ?C
    (implies
      (and (isa ?P Person)
           (children ?P ?C))
      (loves ?P ?C))))
When you become happy, you smile. You become happy when someone you love accomplishes a milestone. Taking one's first step is a milestone. Parents love their children.
find information by inference (KB)
- Caption: "A man helping his daughter take her first step"
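The chain of reasoning on this slide can be sketched as a tiny forward-chaining loop. This is only an illustration in Python, not Cyc's inference engine: the predicate names, the tuple encoding of facts, and the idea that the search is for someone smiling are all assumptions made for the example.

```python
# Minimal forward-chaining sketch (hypothetical encoding, not Cyc's API).
# Facts are tuples: (predicate, arg1[, arg2]).

def parents_love_children(facts):
    # (children ?P ?C) implies (loves ?P ?C)
    return {("loves", f[1], f[2]) for f in facts if f[0] == "children"}

def first_step_is_milestone(facts):
    # Taking one's first step is a milestone.
    return {("accomplishesMilestone", f[1]) for f in facts if f[0] == "takesFirstStep"}

def loved_one_milestone_makes_happy(facts):
    # You become happy when someone you love accomplishes a milestone.
    achievers = {f[1] for f in facts if f[0] == "accomplishesMilestone"}
    return {("happy", f[1]) for f in facts if f[0] == "loves" and f[2] in achievers}

def happy_people_smile(facts):
    # When you become happy, you smile.
    return {("smiling", f[1]) for f in facts if f[0] == "happy"}

RULES = [parents_love_children, first_step_is_milestone,
         loved_one_milestone_makes_happy, happy_people_smile]

def forward_chain(facts, rules):
    """Apply every rule repeatedly until no new fact is derived."""
    derived = set(facts)
    changed = True
    while changed:
        changed = False
        for rule in rules:
            for new_fact in rule(derived):
                if new_fact not in derived:
                    derived.add(new_fact)
                    changed = True
    return derived

# Facts a parser might extract from the caption (assumed encoding).
caption_facts = {("children", "Man1", "Girl1"), ("takesFirstStep", "Girl1")}
closure = forward_chain(caption_facts, RULES)
print(("smiling", "Man1") in closure)
```

The caption never mentions smiling or happiness; the match comes only from the four background rules, which is the point of the slide.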
18 How formalized knowledge helps search
- Query: "Show me pictures of strong and adventurous people"
- Caption: "A man climbing a rock face"
find information by inference (KB)
19 How formalized knowledge helps search
- Query: "Government buildings damaged in terrorist events in Beirut between 1990 and 2001"
- Document: "1993 pipe bombing of France's embassy in Lebanon."
Text Document
find information by inference (KB)
20 How can our programs be intelligent, not merely have the veneer of it?
- ANSWER: By having a large corpus of knowledge, spanning the gamut from specific domain-dependent all the way up to general common sense.
- The computer needs to be able to apply the knowledge, not just store some English gloss
- Represent it formally (predicate calculus), and apply logic
- Represent it numerically, and apply mathematics/statistics
- And after all that: be compelling to the human deciding
21 One Good Explanation is worth 20 points of IQ
- Magic tricks
- "How do they do that?!" -> "How was I ever fooled by that?!"
- Efficacy of punishment vs. reward
- "Punishment is more effective, and the statistics back me up"
- Clinical decision-making (by doctors and by patients)
- "Because 0.814" versus "Because <plausible causal rationale>"
- Organ donation in European countries
- Why is it so often 15/85 or 85/15?
- Answer: Because when you apply for a driver's license, in some countries you have to check a box to opt in; in others, you have to check a box to opt out; and in the U.S. and most European countries at least, 85% of the people don't know what they should do, even though it's an emotional, serious choice, and end up just leaving it unchecked.
- And after all that: be compelling to the human deciding
22 Reflection Framing Effect
Philadelphia is preparing for a Legionnaires' Disease outbreak expected to kill 600 people today. Two alternative programs to combat the disease have been proposed. The consequences of each program are as follows:
Framing 1:
- If Program A is adopted, 200 people will be saved. (72%)
- If Program B is adopted, there is a 1/3 chance that all 600 will be saved, and a 2/3 chance that no lives will be saved. (28%)
Framing 2:
- If Program A is adopted, 400 people will die. (22%)
- If Program B is adopted, there is a 2/3 chance that 600 will die, and a 1/3 chance that no one will die. (78%)
For more information, see Kahneman, D. and Tversky, A. (1984). Choices, values, and frames. American Psychologist, 39, 341-350.
23 Conjunction Fallacy
- A health survey was conducted in a representative sample of adult males in Chicago of all ages and occupations. Mr. F was included in the sample. He was selected by random chance from the list of participants.
- Please rank the following statements in terms of which is most likely to be true of Mr. F. (1 = more likely to be true, 6 = least likely)
- ____ Mr. F smokes more than 1 cigarette per day on average.
- ____ Mr. F has had one or more heart attacks. [A]
- ____ Mr. F had a flu shot this year.
- ____ Mr. F eats red meat at least once per week.
- ____ Mr. F has had one or more heart attacks and he is over 55 years old. [A and B]
- ____ Mr. F never flosses his teeth.
58% rated "A and B" more likely than "A"
For more information, see Tversky, A. and Kahneman, D. (1983). Extensional vs. intuitive reasoning: The conjunction fallacy in probability judgment. Psych. Rev. 90, 293-315.
24 Why there is a need for meta-logical elements (rationale and POV) to convince decision-makers
- Early hominids: pre-rational decision-makers
- Later hominids: usually rational
- Even later hominids: almost always rational
YOU ARE HERE
25
- A 67-year-old woman suffering from ICM with elevated bilirubin, history of diabetes, body mass index of 39.5, NYHA function class III, mitral valve regurgitation grade (MVRG) of 2, and no aortic valve regurgitation (AVR) is assigned to CABG surgery. RF+Cyc is consulted and the RF (random forest statistical reasoning) component, having been trained on a large database, identifies CABG alone as the most likely treatment option, citing an odds ratio of 2.6 over the next most favorable treatment, CABG+MVA. As rationale, the Cyc (AI) component observes that the low MVRG is atypical of MVA, which is a surgical procedure typically reserved for patients with severe mitral regurgitation, and thus the simpler CABG procedure is preferred. However, an intraoperative transesophageal echocardiogram (TEE) suggests MVRG is 3. Based on this, the surgical team overrides the initial diagnosis without consultation, opting instead for CABG+MVA. The patient dies 3 days later from complications due to surgery.
- In this setting, RF+Cyc, if consulted, could have alerted the heart team to additional data that might have swayed their decision, thus potentially saving a life. RF+Cyc would have noted that while an MVRG of 3 is consistent with CABG+MVA, the odds favoring CABG only marginally decrease from 2.6:1 to 1.7:1 when MVRG is upstaged for this patient from 2 to 3, and that surgery under CABG alone offers a 20% increase in median survival compared to CABG+MVA. RF+Cyc could further argue that intraoperative MVRG can falsely appear to be upstaged due to altered hemodynamics in anesthetized patients. A Cyc-assisted semantic search of the recent literature reveals that transthoracic echocardiograms (TTE) more reliably reflect the degree of mitral regurgitation than TEE. That (plus co-morbidities) argues for just CABG.
26 4 Pitfalls of Semantic Technology
- Ignorance-based: a small theory size (terms, instances, rules)
- Static KB (massively tuned, optimized, cached ahead of time)
- Simple assertions (SAT constraints, propositional calculus, Horn clause logic, Description Logic, first order logic)
- 1 global context (no contradictions, tiny domain, simplified world)
27 Applying Cyc
- Cyc is a power source, not a single application.
- Like oil, electricity, telephony, computers, Cyc can spawn and sustain a knowledge utility industry.
- It can cost-effectively underlie almost all apps.
- (Provide a common-sense layer to reduce brittleness when faced with unexpected inputs/situations)
- To apply Cyc, we extend its ontology, its KB, and possibly its suite of specialized reasoning modules
28 The Analyst's Knowledge Base
[Architecture diagram; recoverable labels follow]
- CT Analyst queries: "What sequences of events could lead to the destruction of Hoover Dam?" "Were there any attacks on targets of symbolic value to Muslims since 1987 on a Christian holy day?"
- Domain Experts; Others/GOTS
- Scenario Generation (Scenario Generator); Explanation Generation (Explanation Generator); Query Formulation (Query Formulator)
- Cycorp Tools for Ontology-Building, -Browsing, -Editing, Fact/Rule Entry
- Analysis and Collaboration Components
- AKB: General Knowledge; Terrorism Knowledge
- OWL; Relational DB projection of the AKB
29 A more recent example
- What major US cities are particularly vulnerable to an anthrax attack?
- The answer is logically implied by data dispersed through several sources
30 What major US cities are particularly vulnerable to an anthrax attack?
- "major US city" -> ?C is a U.S. City with >1M population
- "particularly vulnerable to an anthrax attack" ->
- the current ambient temperature at ?C is above freezing, and
- ?C has more than 100 people for each hospital bed, and
- the number of anthrax host animals near ?C exceeds 100k
31
state | name              | type  | county     | state_fips | primary_lat | primary_long | elevation | population | status
------+-------------------+-------+------------+------------+-------------+--------------+-----------+------------+-------------------
TX    | Dallas            | ppl   | Dallas     | 48         | 32.78333    | -96.8        | 463       | 1022830    | BGN 1978 1959
MN    | Hennepin County   | civil | Hennepin   | 27         | 45.01667    | -93.45       | 0         | 1032431    |
CA    | Sacramento County | civil | Sacramento | 6          | 38.46667    | -121.31667   | 0         | 1041219    |
AZ    | Phoenix           | ppl   | Maricopa   | 4          | 33.44833    | -112.07333   | 1072      | 1048949    | BGN 1931 1900 1897
The Geographic Names Information System (GNIS) DB
maintained by the US Geological Survey (USGS).
32 So how do we explain to our system that
- row 1 of that table is about the city of Dallas, TX
- the population field of that table contains the number of inhabitants of the city that that row is about
- here is exactly how to access tuples of that database
- that access will be fast, accurate, recent, complete
33
- the population field of that table contains the number of inhabitants of the city that that row is about
- We provide the field encodings and decodings, some of which correspond to explicit fields like population, two-letter state codes, etc.
(fieldDecoding Usgs-Gnis-LS ?x
  (TheFieldCalled population)
  (numberOfInhabitants (TheReferentOfTheRow Usgs-Gnis) ?x))
34
- how to access tuples of that database
- We provide all the information needed for a JDBC connection script
- We assert, in the context (MappingMtFn Usgs-KS), all of these:
(passwordForSKS Usgs-KS "geografy")
(portNumberForSKS Usgs-KS 4032)
(serverOfSKS Usgs-KS "sksi.cyc.com")
(sqlProgramForSKS Usgs-KS PostgreSQL)
(structuredKnowledgeSourceName Usgs-KS "usgs")
(subProtocolForSKS Usgs-KS "postgresql")
(userNameForSKS "sksi")
35
- that access will be fast, accurate, recent, complete
- We provide meta-level assertions about the database, about each table of the database, about the completeness etc. of various kinds of data in the DB, etc.
- We assert, in the context (MappingMtFn Usgs-KS):
(schemaCompleteExtentKnownForValueTypeInArg
  Usgs-Gnis-LS USCity numberOfInhabitants 1)
36
- that access will be fast, accurate, recent, complete
- We provide meta-level assertions about the database, about each table of the database, about the completeness etc. of various kinds of data in the DB, etc.
- We assert, in the context (MappingMtFn Usgs-KS):
(resultSetCardinality Usgs-Gnis-PS
  (TheSet (PhysicalFieldFn Usgs-Gnis-PS "state"))
  TheEmptySet
  60.0)
(resultSetCardinality Usgs-Gnis-PS
  (TheSet (PhysicalFieldFn Usgs-Gnis-PS "primary_long")
          (PhysicalFieldFn Usgs-Gnis-PS "primary_lat")
          (PhysicalFieldFn Usgs-Gnis-PS "name"))
  (TheSet (PhysicalFieldFn Usgs-Gnis-PS "county")
          (PhysicalFieldFn Usgs-Gnis-PS "state"))
  530.36)
37 What major US cities are particularly vulnerable to an anthrax attack?
- "major US city" -> U.S. City with >1M population
- "particularly vulnerable to an anthrax attack" ->
- the current ambient temperature at ?C is above freezing, and
- ?C has more than 100 people for each hospital bed, and
- the number of anthrax host animals near ?C exceeds 100k
Cyc knows that pullets are chickens, so don't add those two numbers together!
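The three criteria, and the pullet/chicken caveat, can be sketched in a few lines of Python. Everything below is illustrative: the `City` record, the `SUBTYPE_OF` table, and the sample numbers are made-up assumptions, and the sketch assumes that a reported chicken count already includes pullets.

```python
from dataclasses import dataclass

# Hypothetical subtype knowledge: a pullet is a kind of chicken.
SUBTYPE_OF = {"pullet": "chicken"}

@dataclass
class City:
    name: str
    population: int
    temperature_c: float
    hospital_beds: int
    host_animals: dict  # animal kind -> count, possibly merged from several sources

def total_host_animals(counts):
    """Sum the counts, skipping a kind whose supertype is also reported,
    so that pullets are not counted twice (once as pullets, once as chickens)."""
    return sum(n for kind, n in counts.items()
               if SUBTYPE_OF.get(kind) not in counts)

def is_particularly_vulnerable(city):
    """The slide's three conditions, applied to a 'major' city (>1M population)."""
    return (city.population > 1_000_000
            and city.temperature_c > 0                       # above freezing
            and city.population / city.hospital_beds > 100   # people per hospital bed
            and total_host_animals(city.host_animals) > 100_000)

# Toy records with invented values, for illustration only.
city_a = City("CityA", 1_200_000, 15.0, 8_000,
              {"chicken": 90_000, "pullet": 30_000, "cattle": 20_000})
city_b = City("CityB", 1_200_000, 15.0, 8_000,
              {"chicken": 70_000, "pullet": 40_000})

print(is_particularly_vulnerable(city_a), is_particularly_vulnerable(city_b))
```

A naive sum would put CityB at 110,000 host animals and flag it; with the subtype knowledge the total is 70,000 and it is not flagged.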
43 Even simple queries often require 1-4 reasoning steps
"In what countries bordering Pakistan are there members of the ANVC?"
- Each answer that CAE finds for this generally involves a 1-4-step (not 0-step) argument (reasoning chain)
- E.g., for the answer "India", the justification is:
- According to the web site "Inside Terrorism", the ANVC's headquarters has been in Garo Hills, India from the beginning of January, 1996 through today.
- If an organization's HQ is in place x, then there are members of that organization in place x.
- If someone is in place x, they are in every super-region of x.
- India borders Pakistan.
Don't include Prior Tacit Knowledge
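The justification above can be replayed as a short program. The three facts come from the slide; the dictionaries and function names are assumptions made for this Python sketch, not CAE's actual interface.

```python
# Facts from the slide, in an assumed (illustrative) encoding.
HEADQUARTERS = {"ANVC": "Garo Hills"}
SUPER_REGION = {"Garo Hills": "India"}           # place -> immediate super-region
BORDERS = {frozenset(("India", "Pakistan"))}     # symmetric relation

def places_with_members(org):
    """Return (place, justification steps) pairs: the HQ, then each super-region."""
    place = HEADQUARTERS[org]
    steps = [f"{org}'s HQ is in {place}, so there are members of {org} in {place}"]
    found = [(place, list(steps))]
    while place in SUPER_REGION:
        parent = SUPER_REGION[place]
        steps.append(f"{place} is in {parent}, so those members are also in {parent}")
        place = parent
        found.append((place, list(steps)))
    return found

def members_in_places_bordering(org, neighbor):
    """Answer the query and keep the reasoning chain for each answer."""
    answers = []
    for place, steps in places_with_members(org):
        if frozenset((place, neighbor)) in BORDERS:
            answers.append((place, steps + [f"{place} borders {neighbor}"]))
    return answers

for place, chain in members_in_places_bordering("ANVC", "Pakistan"):
    print(place)
    for step in chain:
        print("  -", step)
```

The answer "India" comes with a three-step chain, none of which is stated directly in any single source fact.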
44 The Cyc Knowledge Base
- Represented in
- First Order Logic
- Higher Order Logic
- Context Logic
- Micro-theories
Cyc contains: 15,000 Predicates; 500,000 Concepts; 5,200,000 Assertions
These numbers are not a good way to really get a
handle on the Cyc KB
General Knowledge about Various Domains
Specific data, facts, and observations
45 The Cyc Knowledge Base
Cyc contains: 15,000 Predicates; 500,000 Concepts; 5,200,000 Assertions
These numbers are not a good way to really get a handle on the Cyc KB
Is any seagull also a moose? If Cyc knows 10,000 kinds of animals, it should be able to answer 100,000,000 queries like that.
- Option 1: Add those 100M assertions to the KB
- Option 2: Add 50M disjointWith assertions instead
- Option 3: Add about 10k Linnaean taxonomy assertions to the KB, plus one extra assertion: (isa BiologicalTaxon SiblingDisjointCollectionType). If taxons A and B are not explicitly known (via those 10k assertions) to be in a subset/superset relationship, then assume that they are disjoint.
A few hundred such SiblingDisjoint assertions take the place of over 6 billion disjointness ones,
which in turn take the place of 100 trillion ones like this: (not (isa Cher Moose))
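Option 3 amounts to a default rule over the taxonomy: two taxa are disjoint unless one is known to subsume the other. A minimal Python sketch, using a made-up four-link fragment of a taxonomy (the `GENLS` table and function names are assumptions for illustration, not Cyc's implementation):

```python
# Hypothetical taxonomy fragment: child -> parent ("genls" links).
GENLS = {
    "Seagull": "Bird",
    "Bird": "Animal",
    "Moose": "Mammal",
    "Mammal": "Animal",
}

def ancestors(taxon):
    """Return the taxon itself plus every supertype reachable via GENLS."""
    seen = {taxon}
    while taxon in GENLS:
        taxon = GENLS[taxon]
        seen.add(taxon)
    return seen

def disjoint(a, b):
    """Sibling-disjoint default: disjoint unless one taxon subsumes the other."""
    return a not in ancestors(b) and b not in ancestors(a)

print(disjoint("Seagull", "Moose"))  # no seagull is a moose
print(disjoint("Seagull", "Bird"))   # subset/superset, so not disjoint
```

The taxonomy stores only one link per taxon, yet `disjoint` can answer the question for every pair, which is why a small number of assertions can stand in for millions of explicit disjointness facts.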
46 There is no one correct monolithic ontology.
E.g., Cyc's 5M axioms are divided into thousands of contexts by granularity, topic, culture, geospatial place, time, ...
There is a correct monolithic reasoning mechanism, but it is so deadly slow that we never call on it unless we have to.
E.g., the Cyc inference engine is a community of 1000 agents that attack every problem and, recursively, every subproblem (subgoal). One of these 1000 is a general theorem prover; the others have special-purpose data structures/algorithms to handle the most important, most common cases, very fast.
47 What factors argue <for/against> the conclusion that <ETA> <performed> <the March 2004 Madrid attacks>?
48 Building Cyc qua Engineering Task
[Timeline diagram; recoverable labels follow]
- Timeline: 1984, 2004, today
- Phases: codify (enter each piece of knowledge, by hand); learning via natural language; learning by discovery
- CYC: 900 person-years, 23 realtime years, 90 million
50 Temporal Relations
37 Relations Between Temporal Things
- temporalBoundsContain
- temporalBoundsIdentical
- startsDuring
- overlapsStart
- startingPoint
- simultaneousWith
- after
- temporalBoundsIntersect
- temporallyIntersects
- startsAfterStartingOf
- endsAfterEndingOf
- startingDate
- temporallyContains
- temporallyCooriginating
51 Temporal Relations
Ariel Sharon was in Jerusalem during 2005 with
granularity calendar-week
Condoleezza Rice made a ten-day trip to
Jerusalem in February of 2005
52
- Rather than struggling to reason in natural language sentences, use logic as the representation language.
- Most knowledge is default: reason by argumentation
- Rather than striving in vain for a single fast inference engine, use a suite of 1000 heuristic modules that each handles a class of commonly-occurring problems very fast. EL/HL split
- Some of these HL modules act as tacticians (meta-reasoners) to guide the reasoning; a few are strategists (meta-meta-reasoners)
- Bridging the knowledge gap: do the intermediate theories.
- Probabilities / certainty factors are useful (risk: overdependence)
- Rather than striving in vain for a monolithic consistent KB, divide the KB up into many locally-consistent contexts
53 Each assertion should be situated in a context, in a region of context-space
- We identified 12 dimensions of mt-space
- We developed a vocabulary of predicates and terms to describe points and regions along each of those 12 dimensions, and
- We have been situating assertions more and more precisely, and we have been working out calculi for inferring contexts
- E.g., if P is true in C1, and P => Q is true in C2, in what context C3 can Q be validly concluded?
- Anthropacity
- Time
- GeoLocation
- TypeOfPlace
- TypeOfTime
- Culture
- Sophistication/Security
- Topic
- Granularity
- Modality/Disposition /Epistemology
- Argument-Preference
- Justification
54 Mathematical Factoring of Context-space Dimensions
There are at least 900,000 doctors.
This inference depends on the time, space,
and respective granularities of the contexts.
LehighCountyInFebruary1985Context Dick
Thornburgh is governor and Ronald Reagan is
president.
Dick Thornburgh is governor and there
are at least 900,000 doctors.
55 Time Indices and Granularities
Doug is talking, at 14:00-15:00, on 4 May 2009.
56 Time Indices and Granularities
Doug is talking, at 14:00 to 15:00, on 4 May 2009, with temporal granularity 1 calendar minute
P: Doug is talking.
Calendar Minutes
t: that two-hour interval; t': a continuous 15-min. sub-interval
[Timeline: Past ... Future, with both intervals marked]
So: Talking during each 15-minute interval? Yes.
Talking during each 2-second interval? Unknown.
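One way to read the slide: an assertion that holds over an interval "with granularity g" can be relied on for any sub-interval at least as long as g, and says nothing about shorter ones. A rough Python sketch of that reading follows; the dictionary layout and the function name are assumptions, and calendar alignment of the minutes is ignored for simplicity.

```python
from datetime import datetime, timedelta

def holds_during(assertion, sub_start, sub_length):
    """Return "yes" if the assertion licenses the sub-interval, else "unknown".

    Simplification: a sub-interval is licensed when it lies inside the asserted
    interval and is at least as long as the assertion's granularity.
    """
    sub_end = sub_start + sub_length
    inside = assertion["start"] <= sub_start and sub_end <= assertion["end"]
    if inside and sub_length >= assertion["granularity"]:
        return "yes"
    return "unknown"

# "Doug is talking, 14:00 to 15:00 on 4 May 2009, granularity 1 calendar minute"
talking = {
    "start": datetime(2009, 5, 4, 14, 0),
    "end": datetime(2009, 5, 4, 15, 0),
    "granularity": timedelta(minutes=1),
}

quarter_past = datetime(2009, 5, 4, 14, 15)
print(holds_during(talking, quarter_past, timedelta(minutes=15)))  # 15-minute interval
print(holds_during(talking, quarter_past, timedelta(seconds=2)))   # 2-second interval
```

The answer for the 2-second interval is "unknown" rather than "no": the speaker may well pause for breath, and the assertion at one-minute granularity is simply silent about it.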
57 Relations Between an Event and its Participants
Over 400 more.
58 "In" in Our Geospatial Ontology
- We started in 1984 with just one binary predicate, in.
- in(X,Y) means the inner object X is spatially located in the region defined by the outer object Y.
- If I just tell you in(X,Y), and you aren't told what X and Y are, then you (and Cyc) can't answer questions like these:
- From the outside of Y, can I see any part of X?
- If I turn Y over and shake it, will X fall out?
- Is there room to put more things in Y?
- Is X actually a part of Y?
- Such failures led to our introducing new, more precise, more specialized versions of in. By now there are over 75 such predicates, organized in a graphical taxonomy.
59 Propositional Attitudes: Relations Between Agents and Propositions
- goals
- intends
- desires
- hopes
- expects
- believes
- opinesThat
- knowsThat
- remembersThat
- perceivesThat
- seesThat
- fearsThat
Most of these are modal; assertions using them go beyond 1st-order logic
60 Handcrafted Cyc KB
- Represented in
- First Order Logic
- Higher Order Logic
- Context Logic
- Microtheories
Cyc contains: 15,000 Predicates; 500,000 Concepts; 5,200,000 Assertions
The pump has been primed. Use it as an inductive bias to power more automatic knowledge acquisition
Real World Domain Knowledge
Specific cases, facts, details,
61 AKA by Shallow Fishing
Automated Knowledge Acquisition
Query: (foundingDate AbuSayyaf ?X)
Search Strings:
- "Abu Sayyaf was founded in ___"
- "Al Harakat Islamiya, established in ___"
- "ASG was established in ___"
Found text: "Abu Sayyaf was founded in the early 1990s"
Parse: (foundingDate AbuSayyaf (EarlyPartFn (DecadeFn 199)))
62 AKA by Shallow Fishing
Automated Knowledge Acquisition
Query: (height EiffelTower ?x)
Search Strings:
- "The height of the Eiffel Tower is ___"
- "The Eiffel Tower is ___ tall"
Found text: "The height of the Eiffel Tower is 36 feet"; "The height of the Eiffel Tower is 984 feet"
Parse: (height EiffelTower (Foot 36)); (height EiffelTower (Foot 984))
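The fishing step can be approximated with regular expressions: each fill-in-the-blank search string becomes a pattern, and each match becomes a candidate assertion. This Python sketch is only an assumption about how such a step might look; the function names are invented, it handles only whole numbers of feet, and it does not decide which candidate is right.

```python
import re

def fish_heights(templates, snippets):
    """Turn each '___' template into a regex and collect the numbers (in feet) it catches."""
    candidates = []
    for template in templates:
        before, after = (re.escape(part.strip()) for part in template.split("___"))
        pattern = re.compile(before + r"\s+(\d[\d,]*)\s+feet\s*" + after, re.IGNORECASE)
        for text in snippets:
            for match in pattern.finditer(text):
                candidates.append(int(match.group(1).replace(",", "")))
    return candidates

def to_cycl(feet):
    """Render a candidate as a CycL-style assertion string, following the slide."""
    return f"(height EiffelTower (Foot {feet}))"

templates = ["The height of the Eiffel Tower is ___", "The Eiffel Tower is ___ tall"]
snippets = [
    "The height of the Eiffel Tower is 36 feet",
    "The height of the Eiffel Tower is 984 feet",
]
found = fish_heights(templates, snippets)
for feet in found:
    print(to_cycl(feet))
```

Both candidates are parsed, including the implausible 36 feet; weeding out the wrong one is a separate step that needs background knowledge or corroboration.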
63 WWW.CYC.COM
67 Recent/Future AKB Directions
CYC
- Make it comprehensive (13 -> 100); apply it to other domains
- Make it easier for SMEs to enter/vet/modify info.
- Improve the automatic acquisition (parsing / fishing from unstructured texts; SKSI to structured sources, incl. SPARQL)
- Make it easier for end users to pose questions
- Automatically select (a small superset of) the relevant fragments
- Use semantic constraints (argIsa, disjointness, domain knowledge) to combine the relevant fragments into a meaningful logical query
- Make justifications more terse and more compelling
- Speed up inference (in general and for AKB entry and AKB query-answering)
- Graceful degradation: halfway between QA and Google, falling back on semantic search of automatically tagged documents (tagged with Cyc terms)
68 Developing a Cyc App.
- Extend Cyc's KB
- Augment its ontology
- New assertions involving those new terms
- New Heuristic Level modules
- Identify the need(s) for them
- Design, build, and debug them
- New interface modules
- For manual entry; for SKSI mapping; for end users
- Domain-specific interfaces (e.g., sketching military unit movements; drawing chemical formulae; etc.)
69
OpenCyc: Open Source release of most of the Cyc Ontology, Simple Relns., Inference Engine
ResearchCyc: Almost all of Cyc (for free for R&D purposes)
70 The Ontology
Pre-existing general medical knowledge framework: prior to the CCF project, Cyc's KB had 184 specializations of MedicalCareEvent
MedicalCareEvent Ablation Ligation
CoronaryArteryBypassGraft Biopsy-SurgicalProcedure
TrephiningSomeone Prostatectomy RoboticSurgery
OutpatientSurgery InpatientSurgery
LiposuctionSurgery RemovalOfUniqueBodyPart
Appendectomy
Tonsillectomy GumSurgery SurgicalTreatment
TransplantSurgery HeartTransplantSurgery
GeneralSurgery MajorSurgery OpenHeartSurgery
RootCanalSurgery VaccinationEvent
BoosterVaccinationEvent AnthraxMilitaryVaccination
Script MedicalTesting
71 The Ontology
Pre-existing general medical knowledge framework: prior to the CCF project, Cyc's KB had 350 specializations of AilmentCondition
AttentionDeficitDisorder Glaucoma SpinalStenosis
SleepDeprivation Ache-AilmentCondition Migraine
Hemorrhaging-TheCondition Jaundice
ParasiticAilment BacillaryAngiomatosis
Cryptosporidiosis Rickettsiosis
EpidemicTyphus-NAmerica ArthropodInfestation
ExternalArthropodInfestation InternalArthropodInfestation
Trichinosis Schistosomiasis Ascariasis
BladderFlukeInfestation
Atherosclerosis MultiplePersonalityDisorder
Adenomyosis Scabies AmyotrophicLateralSclerosis
Scoliosis Hypoglycemia TemproMandibularJointSyndrome
AcetylcholinePoisoning CadmiumPoisoning
CarbonMonoxidePoisoning FoodborneBotulism
InhalationalBotulism WoundBotulism InfantBotulism
Endometriosis Neuralgia Sciatica Diverticulitis
Gout MacularDegeneration
72 The Ontology
Pre-existing general medical knowledge framework: prior to the CCF project, Cyc's KB had 200 specializations of Bacterium
StreptococcusPneumoniae StreptococcusPyogenes
Bacillaceae-Family Bacillus-Genus
BacillusCereus-Species Monotrichous
Bacterium-Monotrichous Peritrichous
Bacterium-Peritrichous Amphitrichous
Bacterium-Amphitrichous Tenericutes-Division
Mollicutes-Class Anaeroplasmataceae-Family
Asteroplasma-Genus Acholeplasmatales-Order
Acholeplasmataceae-Family Acholeplasma-Genus
Phytoplasma-Genus Eperythrozoon-Genus
Mycoplasmatales-Order Mycoplasmataceae-Family
Mycoplasma-Genus MycoplasmaPneumoniae-Species
Spirillales-Order Vibrionaceae-Family
Vibrio-Genus VibrioCholerae-Species
73 The Ontology
Hundreds of pre-existing relevant relationships
Medical domain specific relations: infectionCausedByOrganism infectingPathogen patientTreated deviceTypeTreatsConditionType causeOfDeathTypeOfType formOfDisease ailmentTypeAffects ailmentEpidemicType ailmentAcquiredBy ailmentTypicallyAcquiredBy indicatedDrug mortalityRiskForCondition survivalRate riskOfInfectionFromTypeToType
General Role Predicates: objectActedOn eventOccursAt dateOfEvent objectPlaced objectRemoved deviceUsed
74 The Ontology
- Methodology
- Establish bridging (translation) rules
- Define rules that allow users to associate patients, dates, locations, etc. with the various events; e.g., define patientTreated as a relationship between a medical event and a patient.
- Define rules that allow users to easily express complicated logical conditions; e.g., the defining rules for PrimarySurgery, isolatedProcedureOfType, concomitantProcedures, etc.
- Define concise vocabulary for constructions that are complicated or difficult to express; e.g., aortic valve replacement is represented as a single non-atomic term. This allows the user to specify this very common procedure with a single fragment instead of three distinct fragments in the CCF ontology (which in turn came about due to there not being an explicit functional term composition construct in the CCF representation).
75 Typical Query for outcomes study
- The examples in this presentation were short, simple, Medical English queries; the ones being focused on while building the application, and now that it is actually being used at CCF, are much larger ones, e.g.:
- IDENTIFY PATIENT POPULATION
- FIND all native aortic valve replacements performed at CCF between January 1, 2000 and December 31, 2004 with a pre-operative diagnosis, as determined by echocardiogram, of moderately severe or severe aortic stenosis and moderate to severe left ventricular impairment.
- INCLUDE operations in which concomitant primary CABG or concomitant mitral or tricuspid valve repair was performed.
- EXCLUDE all patients with any prior valve repair or replacement, or with concomitant pulmonary valve repair, or with concomitant mitral, tricuspid, or pulmonary valve replacement, or with aortic regurgitation greater than moderate degree.
76 Researchers and clinicians sometimes ask the same queries
- Are there cases in the last decade where
patients had pericardial aortic valves inserted
in the reverse position, to serve as mitral valve
replacements, and how often in such cases did
endocarditis or tricuspid valve infection
develop, and how long after the procedure?
77 Applying (i.e., Using) Cyc
- Get a large set of use-cases (CCF task: the last 900 queries)
- Arrange them into maximally mutually-dissimilar classes
- Manually represent a couple from each of those buckets
- Reveals most of the necessary new predicates (and interfaces)
- Now go through each of the use-cases, trolling for new domain-specific terms to add to the ontology
- Can be done manually, but we are beginning to rely more on semi-automatic methods where the system itself helps with that process
- As appropriate, lexify the terms and/or align them to existing standards
- Run exemplars from each bucket (i.e., to completion)
- Tracer bullets to reveal necessary new rules, reasoning modules (and interfaces)
- Replace the largest bucket by 2-4 specializations; recur (i.e., repeat the preceding 3 steps, and this one, again) until there is no new gain
78 Applying (i.e., Using) Cyc
- Test the system on previously-unseen use-cases (or at least ones which were not among those previously selected from their bucket)
- Have users try to use the system, and watch them (their results, of course, but also, to the extent possible, their time-feature trajectory)
- Which features did they rarely or never use (to good effect)?
- Which features did they make heavy use of?
- Independent of this, ask them for their feedback and suggestions
- Try to identify classes of users, which will translate into classes of documentation and training materials/regimes/interface specifics
- All along, identify what elements of the ontology (if any) are proprietary, and assimilate everything else into future versions of OpenCyc and ResearchCyc
80
(implies
  (and
    (cCFhasLeftAtriumDiameter ?EVT ?D)
    (greaterThan ?D ((Centi Meter) 3.8))
    (patientTreated ?EVT ?PAT)
    (patientSex ?PAT FemaleHuman)
    (rdf-type ?EVT ?TYPE)
    (genls ?TYPE CCF-Evaluation))
  (isa ?EVT EvaluationThatIndicates-LeftAtrialEnlargement))
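Read as a procedure, the rule checks four things about an evaluation event. The Python sketch below mirrors those conditions; the record layout, the `CCF-Echocardiogram` subtype, and the sample values are hypothetical and exist only to exercise the logic.

```python
# Hypothetical type hierarchy: child -> parent (stand-in for genls).
GENLS = {"CCF-Echocardiogram": "CCF-Evaluation"}

def is_specialization(type_name, ancestor):
    """True if type_name equals ancestor or reaches it by following GENLS links."""
    while type_name is not None:
        if type_name == ancestor:
            return True
        type_name = GENLS.get(type_name)
    return False

def indicates_left_atrial_enlargement(event, patients):
    """Mirror of the rule: diameter > 3.8 cm, female patient, event type under CCF-Evaluation."""
    diameter = event.get("left_atrium_diameter_cm")
    patient = patients.get(event.get("patient"), {})
    return (diameter is not None
            and diameter > 3.8
            and patient.get("sex") == "FemaleHuman"
            and is_specialization(event.get("type"), "CCF-Evaluation"))

# Invented sample records.
patients = {"P1": {"sex": "FemaleHuman"}, "P2": {"sex": "MaleHuman"}}
evt1 = {"type": "CCF-Echocardiogram", "patient": "P1", "left_atrium_diameter_cm": 4.1}
evt2 = {"type": "CCF-Echocardiogram", "patient": "P2", "left_atrium_diameter_cm": 4.1}
evt3 = {"type": "CCF-Echocardiogram", "patient": "P1", "left_atrium_diameter_cm": 3.5}

print([indicates_left_atrial_enlargement(e, patients) for e in (evt1, evt2, evt3)])
```

Only the first event satisfies every condition; the rule as written says nothing about male patients, so the second event is simply not classified by it.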
81 1784 pieces of pre-existing (prior to this project) Cyc KB knowledge used while handling a typical query.
E.g., inferred disjointness constraints:
(disjointWith PericardialWindow-SurgicalProcedure MedicalPatient)
Justification (we are counting each of these assertions in the total):
(genls PericardialWindow-SurgicalProcedure PericardialProcedure-Surgical) in UniversalVocabularyMt
(genls PericardialProcedure-Surgical CardiacProcedure-Surgical) in UniversalVocabularyMt
(genls CardiacProcedure-Surgical SurgicalProcedure) in UniversalVocabularyMt
(genls SurgicalProcedure MedicalCareEvent) in BaseKB
(genls MedicalCareEvent PhysicalSituation) in BaseKB
(genls PhysicalSituation Situation-Localized) in UniversalVocabularyMt
(genls Situation-Localized Situation) in UniversalVocabularyMt
(disjointWith SpatialThing-NonSituational Situation) in BaseKB
(genls EnduringThing-Localized SpatialThing-NonSituational) in UniversalVocabularyMt
(genls Agent-NonGeographical EnduringThing-Localized) in UniversalVocabularyMt
(genls EmbodiedAgent Agent-NonGeographical) in UniversalVocabularyMt
(genls PerceptualAgent-Embodied EmbodiedAgent) in UniversalVocabularyMt
(genls Animal PerceptualAgent-Embodied) in UniversalVocabularyMt
(genls MedicalPatient Animal) in UniversalVocabularyMt
82 Ideas for NLM Grand Challenges
- Comprehensive Ontology of Medicine
- Ties to terminological standards (Snomed, ICD), lexical ones (WordNet), conceptual ones (Cyc)
- Knowledge about/involving the concepts
- Contextualized for time, source, level of detail, ...
- Sample sub-project: multicultural Engl.-Engl. translation
- English-to-English translation
- Using the above ontology of medicine, and models of discourse, models of classes of users (by age, occupation, etc.), models of individual users (built up over time and stored HIPAA-securely)
- Translate articles, web pages, medicine bottle labels, etc. into comprehensible form for that user
- In some cases this means literally writing more text, expanding its length, or paring it down (eliminating prior knowledge)
- In less clear cases (where the user might or might not already know some piece of information), the best way to expand the original text might be to add footnotes containing the borderline information, and to pare down the original text by relegating borderline material to footnote form
- The translations needn't just be static: they can sync with the user's calendars, cell phones, computers, etc., to provide reminders, proactively send them relevant news articles or new warnings, and so on
- Automated Clinical/Biomedical Discovery
- Hypothesis formation, Experiment design, Data gathering, Analysis, New terms/hypotheses