Title: Interpreting Loosely Encoded Questions
1 Interpreting Loosely Encoded Questions
- James Fan and Bruce Porter
- University of Texas at Austin
Full support for this research was provided by
Vulcan Inc. as part of Project Halo
2 Problem
[Diagram: an end user poses an English question, which is turned into a question encoding posed against the KB.]
3 Task
- Context: end users pose questions to knowledge-based question-answering systems without intimate knowledge of the structure of the knowledge base.
- Task: translate end users' encodings so that they align with the KB.
4 Input and output
- Naïve encoding: a question encoded without regard for the structure of the knowledge base. Naïve encodings are often literal translations of the original English expressions, i.e. the form of questions we should expect from end users.
- Correct encoding: a question encoding that aligns with the structure of the knowledge base.
5 Loose speak
- Loose speak: the part of an encoding that fails to align with the knowledge base.
- Not meant to be pejorative: "loose" refers to the imprecise way that people form English expressions.
6 Project Halo phase I
- Three systems for Advanced Placement chemistry (Barker et al. 2004; Angele et al. 2003).
- A chemistry KB is built.
- The best KB answers enough questions to score a 3.
- Knowledge engineers encode 160 English test questions (10 man-weeks).
7 Project Halo phase II
- Develop a knowledge-acquisition tool that will enable domain experts in the sciences to independently formulate and debug high-quality, reusable knowledge modules.
- Develop a knowledge-based question-answering system that allows an untrained end user to pose questions and problems to those underlying knowledge modules.
8 Examples (continued)
- When dilute nitric acid was added to a solution of one of the following chemicals, a gas was evolved. This gas turned a drop of limewater, Ca(OH)2, cloudy, due to the formation of a white precipitate. The chemical was
- (a) household ammonia, NH3
- (b) baking soda, NaHCO3
- (c) table salt, NaCl
- (d) epsom salt, MgSO4·7H2O
- (e) bleach, 5% NaOCl
9 Examples (continued)
- Which of the following aqueous solutions has the lowest conductivity?
- (a) 0.1 M CuSO4
- (b) 0.1 M KOH
- (c) 0.1 M BaCl2
- (d) 0.1 M HF
- (e) 0.1 M HNO3
10 Burden of interpreting loose speak
- Without interpreting loose speak, question encodings will not yield the right answers.
- Without intimate knowledge of the knowledge-base structure, no loose speak can be interpreted.
- Obtaining such intimate knowledge of the knowledge base and interpreting loose speak is a heavy burden.
11 Previous approaches
- Restrict expressiveness
- keywords
- question templates, such as "what happens to ___ during ___?"
- Unsuitable for questions such as the previous examples.
- Educate users
- But different KBs require different education.
- Unsuitable for untrained end users.
12 Project goal
- To improve knowledge-based question-answering systems by automating the interpretation of loose speak to produce correct encodings of questions.
- Input: a naïve encoding of a question.
- Output: an encoding of the input question that conveys the intended semantics of the input and does not contain loose speak.
13 Project Goal
[Diagram: the end user's English question is encoded as a question encoding; a loose-speak interpreter sits between the encoding and the KB.]
14 Study 1: types of loose speak
- Purpose: since a naïve encoding may differ from a correct encoding in many ways, we need to discover the types of loose speak.
- Methodology: compare naïvely encoded questions with the correct encodings.
- Data: two sets of questions from Project Halo (150 questions in total).
15 Types of loose speak
16 Types and frequencies
17 Algorithm
- Overview: reuse the knowledge in the KB being queried.
- Made of a test and a repair function.
- Test: check whether an input contains loose speak, based on constraint violations and the knowledge in the KB.
- Repair: find a list of interpretations via spread activation on the KB, using the input as anchor points.
18 Example
- Question: Hydrolysis of NaCH3COO yields?
- a strong acid and a strong base
- a weak acid and a weak base
- a strong acid and a weak base
- a weak acid and a strong base
- none of the above
19 Example (continued): test
- There is no constraint violation, because the domain of raw-material is Event, the range of raw-material is Tangible-Entity, Hydrolysis is an Event, and NaCH3COO is a Tangible-Entity.
- However, the test detects loose speak, because no super- or subclass of Hydrolysis has as raw-material a super- or subclass of NaCH3COO in the KB.
[Diagram: naïve encoding: Hydrolysis -raw-material-> NaCH3COO; Hydrolysis -result-> ? -intensity-> ?]
20 Example (continued): repair
- Breadth-first search starting from Hydrolysis.
- Spread activation terminates when it finds a super- or subclass of NaCH3COO.
[Halo KB diagram: from Hydrolysis, the edges time -> Time-Interval, site -> Place, and raw-material -> Chemical are explored; Chemical -has-basic-structural-unit-> Chemical-Entity reaches a superclass of NaCH3COO.]
21 Example (continued)
[Diagram: repaired encoding: Hydrolysis -raw-material-> Chemical -has-basic-structural-unit-> NaCH3COO; Hydrolysis -result-> ? -intensity-> ?]
22 Study 2: interpreter performance
- Data
- 50 multiple-choice questions from AP chemistry practice tests.
- Distinct from the data used in the frequency study.
- Users
- 3 users with different backgrounds in knowledge engineering and chemistry.
- Given a brief 3-page tutorial on encoding questions, not a complete tutorial on using the KB.
- Measurements
- precision and recall.
23 Experimental results
24 Discussion and analysis
- Loose speak is very common: on average, 91.3% of the encodings by the users contain loose speak.
- None of the encodings that contain loose speak would be correctly answered by our knowledge base.
- The loose speak interpreter works well in our test: precision 95%, recall near 90%.
25 Related work
- Metonymy
- Based on a set of rules (Weischedel & Sondheimer 1983; Grosz et al. 198; Lytinen, Burridge & Kirtner 1992; Fass 1997).
- Based on KB search (Browse 1978; Markert & Hahn 1997; Harabagiu 1998).
- KB search in knowledge acquisition (Davis 1979; Kim & Gil 1999; Blythe 2001).
26 Summary
- Defined loose speak as the part of a question encoding that misaligns with existing knowledge-base structures.
- Preliminary evaluation shows that loose speak is common.
- The interpreter can detect and interpret most occurrences of loose speak correctly in our test.
27 Future work
- Expand the investigation of loose speak into
other aspects of knowledge base interaction, such
as knowledge acquisition.
28 Why doesn't traversal order matter (most of the time)?
- Interpretation of an edge does not affect other edges, because the interpreter does not alter the original head and tail and does not depend on the interpretation of other edges.
- Exceptions:
- the overly-generic-concept type of loose speak: use backtracking.
- queries: process them last, and process in the direction of the edges in the queries.
29 Example (continued)
30 Why didn't you use a more sophisticated search?
- Deeper search is not better: a very deep search will return encodings that are not closely related to the input, and therefore less likely to convey its intended meaning.
- If only shallow search is needed, then brute force is sufficient.
31 Isn't everything related to something in the taxonomy? So most search results must be useless.
- The semantic relations in the searches do include the subclasses relation, but they do not include the superclasses relation.
- If both superclasses and subclasses were included, then any concept could be reached from any other by climbing up and down the taxonomy, and a large number of spurious interpretations might be returned.
32 Precision and recall definition
- Measurements (Jurafsky & Martin 2000)
- Precision = # of correct answers given by the system / # of answers given by the system.
- Recall = # of correct answers given by the system / total # of possible correct answers.
- The # of correct answers given by the system is the # of question encodings interpreted correctly.
- The # of answers given by the system is the # of question encodings for which the interpreter detects loose speak and finds an interpretation.
- The total # of possible correct answers is the number of all question encodings that contain loose speak.
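As a concrete sketch of these definitions (with hypothetical outcome labels, not the actual Halo data), precision and recall can be computed from per-question outcomes:

```python
# One outcome label per question encoding (hypothetical data):
#   "tp" - contained loose speak, interpreted correctly
#   "fp" - contained no loose speak, but was interpreted anyway
#   "fn" - contained loose speak, but not interpreted correctly
#   "tn" - contained no loose speak, and was left alone
outcomes = ["tp", "tp", "tp", "fn", "fp", "tn"]

tp = outcomes.count("tp")
fp = outcomes.count("fp")
fn = outcomes.count("fn")

# Precision: correct interpretations / all interpretations given.
precision = tp / (tp + fp)
# Recall: correct interpretations / all encodings containing loose speak.
recall = tp / (tp + fn)

print(precision, recall)  # 0.75 0.75
```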
33 Experiment details
- tp: inputs contain LS, and they are interpreted correctly.
- fp: inputs don't contain LS, but they are interpreted.
- tn: inputs don't contain LS, and they are not interpreted.
- fn: inputs contain LS, but they are not interpreted.
- Special cases
- If an input has syntax mistakes (such as a missing paren, or using a set filler instead of a single instance), fixed versions are used.
- If an input causes the interpreter to crash, it counts as no interpretation found (hence fn), no matter what the cause of the crash is (could be the KB or a really bad encoding).
- If the interpretation solves the LS in an input correctly, it counts as a true positive even if the result isn't the perfect encoding for the question.
- If the input has LS and the interpretation is incorrect or only partially correct, then it counts as fn.
34 Test and repair: test
- Constraint violation
- If the edge violates structural constraints, then it must contain loose speak (because correct encodings are consistent with the structure of the knowledge base).
- Returns many true positives and false negatives.
- Resemblance test
- If the input does not resemble any existing knowledge, then it may contain loose speak, because studies have shown that one frequently repeats similar versions of general theories (Clark et al. 2000).
- Returns many false positives and true negatives.
35 Test and repair: test (continued)
- Constraint violation: implemented as a test for domain and range violations of the relation in an edge.
- Resemblance test
- If the edge represents a query, then it passes the test only if the KB can compute one or more fillers for the tail.
- Otherwise, it passes if the KB contains an edge (Head_kb, relation, Tail_kb) such that Head_kb subsumes or is subsumed by Head_q, and Tail_kb subsumes or is subsumed by Tail_q.
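The two tests can be sketched roughly as follows. The taxonomy, relation signatures, and KB edges below are invented toy stand-ins for illustration, not the Halo KB.

```python
# Toy taxonomy: concept -> direct superclass (None at the root).
PARENT = {
    "Hydrolysis": "Event",
    "NaCH3COO": "Chemical-Entity",
    "Chemical-Entity": "Tangible-Entity",
    "Chemical": "Tangible-Entity",
    "Event": None,
    "Tangible-Entity": None,
}
# Relation signatures: relation -> (domain, range).
SIGNATURE = {"raw-material": ("Event", "Tangible-Entity")}
# Known KB edges as (head, relation, tail) triples.
KB_EDGES = [("Hydrolysis", "raw-material", "Chemical")]

def ancestors(c):
    """c plus all of its superclasses, walking up the taxonomy."""
    out = []
    while c is not None:
        out.append(c)
        c = PARENT.get(c)
    return out

def subsumes_either_way(a, b):
    return a in ancestors(b) or b in ancestors(a)

def violates_constraint(head, relation, tail):
    """Domain/range violation test: the head must fall under the
    relation's domain and the tail under its range."""
    domain, range_ = SIGNATURE[relation]
    return domain not in ancestors(head) or range_ not in ancestors(tail)

def resembles_kb(head, relation, tail):
    """Resemblance test: does some KB edge with this relation connect
    concepts that subsume or are subsumed by head and tail?"""
    return any(r == relation
               and subsumes_either_way(h, head)
               and subsumes_either_way(t, tail)
               for h, r, t in KB_EDGES)

edge = ("Hydrolysis", "raw-material", "NaCH3COO")
print(violates_constraint(*edge))  # False: the types are fine
print(resembles_kb(*edge))         # False: no matching KB edge, so loose speak
```

This reproduces the behavior described on slide 19: the naïve edge passes the constraint test but fails the resemblance test.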
36 Test and repair: repair
- Given an edge (C1, relation, C2), repair is implemented as two breadth-first procedures.
- search_head: start at C1, traverse all semantic relations, and stop when a suitable instance C3 is found. C3 is suitable if (C3, relation, C2) does not contain loose speak. The successful search path is returned.
- search_tail: a similar search starting from C2.
37 Example (continued): interpreting loose speak
- The domain of intensity is Thing, and the range is Intensity-Value. Because the result of Hydrolysis is a Chemical, which is a Thing, it passes the constraint-violation test.
- However, because the query about intensity does not return any value, it fails the resemblance test.
[Diagram: Hydrolysis -raw-material-> Chemical -has-basic-structural-unit-> NaCH3COO; Hydrolysis -result-> ? -intensity-> ?]
38 Example (continued): interpreting loose speak
- search_head finds that the Base-Role played by the resulting Chemical has an intensity value.
- It's a role type of loose speak.
[Halo KB diagram: Chemical -has-basic-structural-unit-> Chemical-Entity; Chemical -plays-> Base-Role -intensity-> Intensity-Value]
39 Example (continued): interpreting loose speak
[Diagram: final encoding: Hydrolysis -raw-material-> Chemical -has-basic-structural-unit-> NaCH3COO; Hydrolysis -result-> ? -plays-> ? -intensity-> ?]