1 Supporting Creativity in Science: Cooperative Knowledge Acquisition and Knowledge Refinement Systems
Derek Sleeman
Department of Computing Science, The University, ABERDEEN AB24 3FX
Tel: +44 (0)1224 272296
Email: d.sleeman_at_abdn.ac.uk
WWW: http://www.csd.abdn.ac.uk
Acknowledgements: EPSRC support for the AKT Consortium
Students: Eugenio Alberdi, David Corsar, Andy Aiken, Mark Winter
2 OVERVIEW of TALK
- I Context: Advanced Knowledge Technologies (AKT) Consortium
- II Co-operative Knowledge Acquisition and Knowledge Refinement Systems
- III ReTAX system
- IV The REFINER System
- Questions / Discussion
3 I AKT's CHALLENGES
Knowledge Acquisition
Knowledge Maintenance
Knowledge Modelling
Life Cycle, Integration Issues Testbeds
Knowledge Reuse
Knowledge Publishing
Knowledge Retrieval
4 II Co-operative KA and Knowledge Refinement Systems
- Knowledge-based systems inevitably require a sizeable amount of domain knowledge. This can be acquired from:
  - domain experts (KA)
  - detailed examples (using ML techniques), etc.
- However, for complex tasks these KBs are inevitably:
  - incomplete, when further knowledge acquisition is needed
  - inconsistent, when the KB needs to be refined
- Also, it is likely that background knowledge will be incomplete, thus requiring an expert to act as an oracle.
- Hence the need for Co-operative (Problem Solving) Knowledge Acquisition and Knowledge Refinement Systems
5 II Co-operative KA and Knowledge Refinement Systems
- KRUST (Classical KB, Classification) (Susan Craw)
- STALKER (Efficient Truth Maintenance based system, Classification) (Leo Carbonara)
- REFINER / Refiner / R5 (Case-base, Classification) (Sunil Sharma, Mark Winter, Andy Aiken)
- RETAX (Revision of Taxonomies) (Eugenio Alberdi, David Corsar)
- CRIMSON (Refinement of Constraints) (Mark Winter)
- TIGON (Time Series Data/Causal Model, Diagnosis) (Fraser Mitchell)
- SALT (Rules and Constraints, Propose and Revise) (Piero Leo)
References: see WWW http://www.csd.abdn.ac.uk
6 II Co-operative KA and Knowledge Refinement Systems
- KRUST: Wine Adviser
- STALKER, REFINER: Attendance at Medical Clinics; Stock control
- CRIMSON/ConRef: Stock control
- RETAX: Botanical Taxonomies
- TIGON: Turbines (Fault Detection and Diagnosis)
- SALT: Elevators/Lifts
References: see WWW http://www.csd.abdn.ac.uk
7 III RETAX
- The heuristics in RETAX are based on a study to determine how botanists reacted to rogue item(s).
- There are 2 (principal) rules which determine whether a taxonomy is well formed:
  - each child node must be more specialized than its parent
  - each of a node's siblings must be unique.
- RETAX was used to replicate the revision of a major botanical taxonomy done manually in Aberdeen's Botany dept in the 90s.
- References:
  - Middleton & Wilcox (1990) Edinburgh Journal of Botany: revision of taxonomy for Pernettya / Gaultheria
  - Alberdi & Sleeman (1997) AI Journal, p257-279.
  - Alberdi, Sleeman & Korpi (1999) Cognitive Science Journal
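The two well-formedness rules above can be sketched in a few lines of Python. This is only an illustration, not RETAX's implementation: the dictionary encoding, the function names, and the reading of "more specialized" as "a subset on every descriptor and a strict subset on at least one" are all assumptions made for the example.

```python
from itertools import combinations

# Hypothetical encoding: node name -> parent name and descriptor value sets.
taxonomy = {
    "vehicle": {"parent": None,
                "desc": {"wheels": set(range(2, 9)), "motor": {"yes", "no"}}},
    "cycle": {"parent": "vehicle",
              "desc": {"wheels": {2, 3}, "motor": {"yes", "no"}}},
    "bicycle": {"parent": "cycle",
                "desc": {"wheels": {2}, "motor": {"no"}}},
    "motor-cycle": {"parent": "cycle",
                    "desc": {"wheels": {2}, "motor": {"yes"}}},
}

def is_more_specialized(child, parent):
    """Assumed reading of rule 1: subset everywhere, strict subset somewhere."""
    within = all(child[k] <= parent[k] for k in parent)
    strictly = any(child[k] < parent[k] for k in parent)
    return within and strictly

def violations(tax):
    """Return a list of (rule, node, other_node) tuples; empty if well formed."""
    problems = []
    # Rule 1: each child node must be more specialized than its parent.
    for name, node in tax.items():
        parent = node["parent"]
        if parent is not None and not is_more_specialized(
                node["desc"], tax[parent]["desc"]):
            problems.append(("not-specialized", name, parent))
    # Rule 2: siblings (nodes sharing a parent) must be unique.
    for a, b in combinations(sorted(tax), 2):
        if tax[a]["parent"] == tax[b]["parent"] and tax[a]["desc"] == tax[b]["desc"]:
            problems.append(("duplicate-siblings", a, b))
    return problems

print(violations(taxonomy))  # no violations in this small example
```

Adding a second child of "cycle" with exactly the same descriptors as "motor-cycle" would be reported under rule 2, which corresponds to the "merge the 2 nodes" case.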
8
Descriptor types:
- Label: string (ANY)
- Wheels: integer-range (2 8)
- Size: ordered-set 4 (low medium large high)
- Motor: ordered-set 2 (yes no)
- Engine-Power: integer-range (0 20)
- Parent: string (ANY)
- Depth: integer-range (0 3)
Label        Wheels  Size                     Motor     Engine-Power  Parent     Depth
vehicle      2-8     (low medium large high)  (yes no)  0-20          root       0
train        6-8     (medium large)           (yes)     15-20         vehicle    1
car          3-6     (low medium high)        (yes)     2-10          vehicle    1
cycle        2-3     (low)                    (yes no)  0-3           vehicle    1
lorry        4-8     (medium high large)      (yes)     5-20          vehicle    1
9 (table continued)
sports-car   4       (low)                    (yes)     5-10          car        2
salon-car    4       (medium)                 (yes)     3-5           car        2
bicycle      2       (low)                    (no)      0             cycle      2
motor-cycle  2       (low)                    (yes)     1-3           cycle      2
large-lorry  4-8     (large)                  (yes)     6-20          lorry      2
small-van    4       (medium)                 (yes)     5-10          lorry      2
smaller-van  4       (medium)                 (yes)     6             small-van  3
10
Vehicle
  Train
  Car
    Sports Car
    Salon Car
  Cycle
    Bicycle
    Motorbike
  Lorry
    Large Lorry
    Small Van
      Smaller Van
11 RETAX
- Let's refer to a new object/node as N, the existing hierarchy/tree as T, and the potential parent node as P. Then possible operations are:
  - Is T well formed? (If not, report nodes which violate the rules.)
    - E.g., if sibling nodes N1 and N2 are equal, then merge the 2 nodes.
  - Is N already in T?
  - Assuming T is well-formed, to which parent node, P, can N be attached without causing T to be rearranged or N modified? (Answer could be none.)
  - What changes have to be made to N to make it a legal child of node P?
  - What changes have to be made to T so that N can be a child of P?
  - Combinations of the last 2 operations
12 ReTAX
- Ericaceae
  - Arctostaphylos, Arbutus, Pernettya, Leucothoe, Gaultheria, Agauria, Andromeda
    - A. uva-ursi, A. unedo, P. tasmanica, G. oppositifolia, G. rupestris, G. antipoda, A. polifolia
13 ReTAX
- Historical: In Bentham & Hooker's (1876) classification, the main differences detected between the Pernettya and Gaultheria genera were the type of fruit and the succulence of the calyx.
  - G Bentham & JD Hooker (1876). Genera Plantarum, Vol II, Part 2. (Publ: Reeves & Co, London)
- Subsequent botanical investigations in the 20th Century challenged this analysis, but did not suggest any further distinguishing features for the 2 genera; hence the 2 genera were combined (Middleton & Wilcox, 1990).
14 ReTAX
- Simulation (Simplified)
- The descriptions of several species of the Pernettya and Gaultheria genera were replaced by others with revised features (descriptors), which affect the definitions of the parent nodes (P and G).
- When the parent nodes (Pernettya and Gaultheria) are found to be the same, the system checks a set of other features (a further facility of ReTAX) to see if they are distinctive; when no differences are found, the 2 nodes (P and G) are collapsed.
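The collapse step described above might look roughly like the following. The function names and the feature values are placeholders invented for the sketch (they are not real botanical descriptors), and the way ReTAX actually stores its additional features is not given in the slides.

```python
def differing(a, b, features):
    """Return the features on which two node descriptions disagree."""
    return [f for f in features if a.get(f) != b.get(f)]

def try_collapse(name_a, a, name_b, b, primary, extra):
    """Merge two sibling nodes only if neither the primary nor the
    additional features distinguish them; otherwise return None."""
    if differing(a, b, primary) or differing(a, b, extra):
        return None
    return {"name": f"{name_a}/{name_b}", "features": dict(a)}

# Placeholder values only; these are NOT real botanical data.
pernettya = {"fruit": "v1", "calyx": "v1", "leaf": "v2"}
gaultheria = {"fruit": "v1", "calyx": "v1", "leaf": "v2"}

merged = try_collapse("Pernettya", pernettya, "Gaultheria", gaultheria,
                      primary=["fruit", "calyx"], extra=["leaf"])
print(merged["name"])  # Pernettya/Gaultheria
```

If any additional feature had differed, `try_collapse` would return `None` and the two nodes would stay separate.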
15 RETAX Current / Future activities
- Use with other experts to help them formulate / refine taxonomies (e.g. other aspects of botany, microbiology)
- Use RETAX, or a variant, to formulate / refine ontologies (e.g. medical terminologies). This has resulted in the Protégé RepairTAB, which detects inconsistencies in OWL ontologies and gives advice about removing them. (Lam, Sleeman, Pan & Vasconcelos (2008) Journal of Data Semantics)
16 IV REFINER System
- The Refiner algorithm
- Sample dataset
- Interaction with experts
- Current / future work
17 The Sample Dataset
Case  Age  DBP  Associated Disease  Category
1     50   90   D1                  A
2     56   90   D2                  A
3     52   101  D3                  A
4     50   95   D3                  B
5     56   97   D3                  B
6     -    89   D5                  A
7     52   97   D3                  A
18 The Refiner Algorithm
- Each case is assigned to a category
- Category descriptions are inferred from the case values
- When a case matches a category to which it was not assigned by the expert, this is an inconsistency
- While inconsistencies exist:
  - A selection of disambiguation strategies is suggested
  - The user chooses a strategy to be performed
  - The list of inconsistencies is re-evaluated
- The refined dataset is now consistent
19 Generating Descriptions
- Generalise each field:
  - Numeric: range from lowest to highest
  - String: set of all unique items
  - Taxon: nearest common parent
  - Boolean: set of all unique items from the set {true, false, any}
- Combine to get category description
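The generalisation step can be sketched for the numeric and string field types using the sample dataset from the earlier slide. The function names are made up for this example, missing values are simply skipped (an assumption), and taxon and boolean fields are left out.

```python
# Sample dataset from the slides; None marks the missing Age in case 6.
cases = [
    {"id": 1, "age": 50, "dbp": 90, "disease": "D1", "category": "A"},
    {"id": 2, "age": 56, "dbp": 90, "disease": "D2", "category": "A"},
    {"id": 3, "age": 52, "dbp": 101, "disease": "D3", "category": "A"},
    {"id": 4, "age": 50, "dbp": 95, "disease": "D3", "category": "B"},
    {"id": 5, "age": 56, "dbp": 97, "disease": "D3", "category": "B"},
    {"id": 6, "age": None, "dbp": 89, "disease": "D5", "category": "A"},
    {"id": 7, "age": 52, "dbp": 97, "disease": "D3", "category": "A"},
]

def generalise(values):
    """Numeric -> (lowest, highest); string -> set of unique items."""
    present = [v for v in values if v is not None]  # assumption: skip missing
    if not present:
        return None
    if all(isinstance(v, (int, float)) for v in present):
        return (min(present), max(present))
    return set(present)

def describe(cases, fields):
    """Combine the generalised fields into one description per category."""
    by_category = {}
    for case in cases:
        by_category.setdefault(case["category"], []).append(case)
    return {
        cat: {f: generalise([c[f] for c in members]) for f in fields}
        for cat, members in by_category.items()
    }

descriptions = describe(cases, ["age", "dbp", "disease"])
print(descriptions["B"])
```

For category A this gives Age 50 to 56, DBP 89 to 101 and the disease set {D1, D2, D3, D5}; for B it gives Age 50 to 56, DBP 95 to 97 and {D3}.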
20 Category Descriptions
Category  Age    DBP     Disease
A         50-56  89-101  All
B         50-56  95-97   D3
- There are inconsistencies:
  - Cases 4 and 5 match A
  - Case 7 matches B
- We need to remove the overlap
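The inconsistencies listed above can be reproduced with a short check: a case is inconsistent if it matches the description of a category other than the one the expert assigned. The sketch below hard-codes the two category descriptions from the table. The function names and the treatment of a missing value as matching anything are assumptions of this example.

```python
cases = [
    {"id": 1, "age": 50, "dbp": 90, "disease": "D1", "category": "A"},
    {"id": 2, "age": 56, "dbp": 90, "disease": "D2", "category": "A"},
    {"id": 3, "age": 52, "dbp": 101, "disease": "D3", "category": "A"},
    {"id": 4, "age": 50, "dbp": 95, "disease": "D3", "category": "B"},
    {"id": 5, "age": 56, "dbp": 97, "disease": "D3", "category": "B"},
    {"id": 6, "age": None, "dbp": 89, "disease": "D5", "category": "A"},
    {"id": 7, "age": 52, "dbp": 97, "disease": "D3", "category": "A"},
]

# Ranges are (low, high) tuples; string fields are sets of allowed values.
descriptions = {
    "A": {"age": (50, 56), "dbp": (89, 101), "disease": {"D1", "D2", "D3", "D5"}},
    "B": {"age": (50, 56), "dbp": (95, 97), "disease": {"D3"}},
}

def value_matches(value, constraint):
    if value is None:          # assumption: a missing value never rules a case out
        return True
    if isinstance(constraint, tuple):
        low, high = constraint
        return low <= value <= high
    return value in constraint

def inconsistencies(cases, descriptions):
    """Return (case id, category) pairs where a case matches a category
    other than the one it was assigned to."""
    found = []
    for case in cases:
        for category, desc in descriptions.items():
            if category == case["category"]:
                continue
            if all(value_matches(case[f], c) for f, c in desc.items()):
                found.append((case["id"], category))
    return found

print(inconsistencies(cases, descriptions))  # [(4, 'A'), (5, 'A'), (7, 'B')]
```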
21 Disambiguation Strategies
- Change values for certain cases
- Remove values from a category (e.g., create a disjunction)
- Reclassify a case
- Make a case match an additional category
- Shelve a problem case
- Add a new field
22 Refiner
[Figure: diagram of categories C1, C2 and C3]
23 Strategies for this problem
- Change value of DBP in case 7 to 90
- Change value of DBP in case 5 to 95
- Reclassify case 7 to category B
- Add case 7 to category B
- Shelve case 7
- Change value of Disease in cases 3 and 7 to D3
- Reclassify cases 4 and 5 to category A
- Add cases 4 and 5 to category A
- Shelve cases 4 and 5
- Add a new field
24 Strategy Ordering
- Typically, many strategies are suggested
- We need heuristics to order them:
  - Ordered by number of times suggested: prefer strategies which are suggested many times
  - Ordered by number of cases affected: prefer strategies which affect fewer cases
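One way the two ordering heuristics could be combined is a single sort key: most often suggested first, with ties broken by fewest cases affected. The slides do not say how REFINER combines them, so that priority, the `Strategy` class, and the counts below are all assumptions for illustration.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Strategy:
    description: str
    times_suggested: int   # how often the strategy was proposed
    cases_affected: int    # how many cases it would change

def order_strategies(strategies):
    """Most-suggested first; among equals, fewer affected cases first."""
    return sorted(strategies,
                  key=lambda s: (-s.times_suggested, s.cases_affected))

# Hypothetical counts, chosen only to show the ordering.
suggested = [
    Strategy("Reclassify cases 4 and 5 to category A", 1, 2),
    Strategy("Change value of DBP in case 5 to 95", 2, 1),
    Strategy("Shelve cases 4 and 5", 2, 2),
]

for s in order_strategies(suggested):
    print(s.description)
```

With these counts the single-case DBP change is listed first, then shelving cases 4 and 5, then the reclassification.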
25 The Refiner Main Screen
26 Scalability
- Measured the time taken to perform validation on randomly-generated datasets with varying numbers of cases, fields and categories
- For most datasets, time taken is under 1 second
27 Use of REFINER by Experts
- Refiner has been used with various experts, including:
  - Pain Control Expert (Anaesthesiology)
  - Child psychologist
  - High Dependency Unit (HDU) Physician
- KCAP-2003 paper (Aiken & Sleeman)
28 Pain Control
- Pre-existing Access dataset on epidural patients
- Many cases, lots of fields / descriptors
- Refiner imported the data (almost) perfectly
- Expert categorised cases based on the length of the epidural (in days)
- REFINER took only a few seconds to create category descriptions and validate
- But ...
29 Pain Control
- Hundreds of inconsistencies found
- Hundreds of strategies suggested
  - Almost all of which were "change value"
- Why did it not work better?
  - Subjective nature of the subject domain
  - Categories were contiguous
30 Child Psychology
- The session was a series of anecdotes and outlines of specific cases
- Three types of cases were identified:
  - Severely autistic
  - Mildly autistic
  - Difficulties with language development
31 Child Psychology
- The expert stated that autistic children usually had the following characteristics:
  - Problems with language and verbal communication
  - Problems with social interaction
  - Obsessive behaviour
- These characteristics were abstracted by the knowledge engineers and subsequently confirmed with the expert
- The expert showed no inclination to use REFINER, but a case set was created by the knowledge engineers
32 HDU
- Task posed by domain expert: when to move high dependency unit (HDU) patients to a general ward, or to the intensive care unit (ICU), or leave them in the HDU.
- Used Refiner with three datasets, one for each condition (cardiac, neuro and respiratory)
- Expert did not use the system but did dictate the descriptors and the sets of cases to the knowledge engineers, who typed this information into REFINER.
- Refiner found 2 of the datasets were consistent; in the third it identified inconsistencies
33 Inconsistent Dataset
Case  HR   RR  AVPU  Sat O2  Cat.
1     105  27  1     94      Higher
2     120  35  2     88      Higher
3     140  45  3     80      Higher
4     105  28  1     94      Same
5     90   22  1     95      Same
6     80   18  1     96      Lower
7     70   15  1     98      Lower
34 Category Descriptions
Category  HR       RR     AVPU  Sat O2
higher    105-140  27-45  1-3   80-94
same      90-105   22-28  1     94-95
lower     70-80    15-18  1     96-98
- There are inconsistencies:
  - Case 1 matches Category SAME
  - Case 4 matches Category HIGHER
- We need to remove the overlap
- Refiner suggested lower and upper danger zones for each field
35 Future Work: Use with Domain Experts
- Make the system's GUI more intuitive (some changes already made)
- Ask the expert to come along to the session with a document which summarizes the main features of the dataset they wish to discuss. (In the session, ask them to highlight the principal concepts.)
- For each domain expert contacted, record an AVI session of a simple but related domain (e.g. simple childhood diseases before approaching a paediatrician) (demo)
36 Current Work (ICU domain)
- Developed a system which is statistically based, so given a case description it returns the likelihood of that case belonging to one of the predefined categories (R5, Andy Aiken)
- Acquired a data set of patients' physiological parameters from an ICU DB, and had clinicians assign patients, on a day-by-day / hour-by-hour basis, to a 5-point severity score. (Developed in conjunction with Glasgow Royal Infirmary)
- Using R5 with the above data set to assign new patient reports to a severity class. (Practically important, as the descriptors include clinical interventions, which standard scales don't.)
- Identify and analyse (explain) anomalous / unusual cases (segments of cases)
37 VI Dimensional Analysis ??
- Outline issue
- Pointer to TR
- Pointer to WWW systems / sources
38 Questions/Comments
39 V (Causal) Explanations for Anomalous Medical cases
- Discuss ICU context
- Experiment to detect anomalous cases / sections of cases
- Outline a typical investigation
40 V Seeking to Explain an Anomalous Observation
- EXPECTED: An injection of X will cause the heart (Organ, O) to increase its contraction rate within T seconds.
- SUPPOSE that does not happen; then here are some of the investigations which might be performed:
  - Is the injection being given effectively?
  - IF so, then check whether the drug X is being transported to Organ O:
    - Is the transport path physically / bio-chemically blocked?
    - Is the transport mechanism inhibited or slowed down?
  - IF the drug is actually arriving at Organ O and the concentration is OK, then investigate:
    - Is the drug mechanism within the organ being blocked?
    - Is the organ for some reason unable to respond in the usual way (e.g. weakened heart muscle)?