Title: A Proposal for an Automatic Loose-Speak Interpreter
1. A Proposal for an Automatic Loose-Speak Interpreter
2. Outline
- What is loose-speak?
- Thesis
- Frequency study
- LIPS
- Case study: noun compound interpreter
- Future work
- Related work
3. Knowledge-Base (KB) Interaction
- Knowledge acquisition (KA): knowledge engineers adding assertions to a KB
- Question answering (QA): users posing queries to a KB
4. The Difficulties in KB Interaction
- One cause of KB interaction difficulties: KB misalignments
5. KB Misalignment Example
(Diagram: assertions/queries are encoded and sent to the KB; the input "HCl is greenish-yellow" must align with a KB in which HCl has-part H and Cl-.)
6. KB Misalignments Are Common
- Misalignments are not limited to our KB.
- The arbitrary nature of knowledge representation (KR) makes misalignment unavoidable.
- The misalignment problem gets worse when SMEs encode knowledge.
7. KB Misalignments Are Difficult to Fix
- Aligning encodings with a KB requires intimate knowledge of the KB.
- Example: in the Halo Project, a team of KEs spent two weeks encoding 150 questions, and aligning the encodings was a significant part of that effort.
8. Naïve Encodings
- Encodings made without regard for the KB being interacted with.
- Pros: straightforward, faithful, and literal.
- Cons: often misaligned with KBs.
9. Correct Encodings
- Encodings that convey the meaning of the input and are compatible with the KB.
- Pros: suitable for KB reasoning.
- Cons: unintuitive; require extensive knowledge of the KB's idiosyncrasies.
10. Loose Speak
- The KB interaction that results in a discrepancy between a naïve encoding and a correct encoding of the same input.
11. Loose Speak Example 1
- Input assertion:
  - "The civil defense office has reported a clash between policemen and demonstrators."
12. LS Example 1 (continued)
- Naïve encoding (diagram)
- Discrepancies:
  - metonymy
  - role
  - aggregate
- Correct encoding (diagram)
13. Loose Speak Example 2
- Input query:
  - "What is the equilibrium constant of the following reaction, given that H2C2O4 is a diprotic acid with K1 = 5.36×10^-2 and K2 = 4.3×10^-5? H2C2O4 + 2H2O ⇌ 2H3O+ + C2O4^2-"
14. LS Example 2 (continued)
- Naïve encoding (diagram)
- Discrepancies:
  - too-generic concepts
  - aggregate
- Correct encoding (diagram)
15. Outline
- What is loose-speak?
- Thesis
- Frequency study
- LIPS
- Case study: noun compound interpreter
- Future work
- Related work
16. Thesis
- It is possible to have the best of both naïve encodings and correct encodings by allowing users to speak loosely, then automatically interpreting the naïve encodings into correct encodings.
17. Challenges in Interpreting LS
- Countless occurrences of KB misalignments
- Each occurrence could constitute a different type
- Challenge: how to interpret so many different types of KB misalignments?
18. Four Hypotheses
- Frequency hypothesis: LS concentrates on a few types.
- Bootstrapping hypothesis: LS can usually be interpreted using ONLY the knowledge from the KBs interacted with; no new LS knowledge is needed.
- Prior knowledge hypothesis: LS occurs when an input is unrelated to any prior knowledge.
- Related knowledge hypothesis: LS can be interpreted by searching the space around the input.
19. Outline
- What is loose-speak?
- Thesis
- Frequency study
- LIPS
- Case study: noun compound interpreter
- Future work
- Related work
20. Evaluation of Frequency Hypothesis
- Goal: find common LS types and estimate their frequencies.
- Data:
  - Corpus study: 100 randomly chosen sentences from each of 3 corpora (Brown, MUC, and Alberts).
  - Halo questions: two sets of AP test questions (200 questions) in the form of English sentences.
21. Methodology
- Encode the sentences literally, without regard for the idiosyncrasies of our KB.
- Encode the sentences according to the idiosyncrasies of our KB.
- Compare the two encodings and record any misalignments.
22. LS Types
- Metonymy: an attribute used for the thing itself
- Causal factor: causes used for results
- Too-generic concepts: generic concepts used for specific ones
- Roles: things that are in the context of events
- Aggregate: an individual used for a set
- Spatial relations: spatial relations used between objects instead of locations
- Temporal relations: temporal relations used between events instead of times
- Noun compounds: sequences of nouns whose semantic relations are not explicitly specified
- Metaphors: one thing figuratively used to refer to another based on similarity
23. Metonymy
- An attribute (metonym) is used in place of the thing itself (referent).
- Example: "Pearl Harbor caused the US to declare war against Japan."
- Commonly used metonymic relations (Lakoff and Johnson 1980):
  - PART-FOR-WHOLE
  - PRODUCER-FOR-PRODUCT
  - OBJECT-FOR-USER
  - CONTROLLER-FOR-CONTROLLED
  - INSTITUTION-FOR-PEOPLE-RESPONSIBLE
  - PLACE-FOR-INSTITUTION
  - PLACE-FOR-EVENT
24. Causal Factor
- Results in a causal chain are described by their causes.
- Example: "What is the result of mixing NaOH and HCl?"
- Not viewed as metonymy because the causal relation is excluded from Lakoff and Johnson's list.
25. Frequency Study Results
26. Analysis
- The frequency hypothesis is validated: LS does concentrate on a few types.
- The frequencies of different types of LS vary across domains.
27. Outline
- What is loose-speak?
- Thesis
- Frequency study
- LIPS
- Case study: noun compound interpreter
- Future work
- Related work
28. Loose-Speak Interpreter Requirements
- Automaticity: intrude upon the KB interaction as little as possible.
- Coverage: cover all the important LS types, and be easily extended to handle additional types should they occur.
29. Loose-Speak InterPretation System (LIPS) Overview
- (Diagram) The traditional KB interaction model.
- (Diagram) The LIPS KB interaction model.
30. Outline
- What is loose-speak?
- Thesis
- Frequency study
- LIPS
- Case study: noun compound interpreter
- Future work
- Related work
31. Noun Compound Interpretation
- Noun compound: a sequence of nouns composed of a head noun and one (or more) modifiers, such as "concrete floor".
- Noun compound interpretation: find a sequence of semantic relations that links the nouns in a compound.
- Example: "animal virus" → (Diagram: a Virus is the agent of an Invade event whose object is a Cell, the basic structural unit of an Animal.)
- Noun compounds in KA: a new concept is given as a noun compound, and the knowledge about its constituent nouns is often skeletal.
32. Bootstrapping Hypothesis in Noun Compound Interpretation
- Bootstrapping hypothesis:
  - supported if noun compounds can be interpreted without much knowledge about the constituent nouns
  - Example: "concrete floor" interpreted without knowing that concrete is made of sand, gravel, and cement
33. Ablations
- The impact of each level of the ontology is measured through a series of ablations.
- When a level is ablated, the concepts on that level and all their axioms are deleted from the KB.
- (Diagram) Ablation of level 1 on a sample taxonomy; a minimal code sketch follows below.
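To make the ablation procedure concrete, here is a minimal sketch under two stated assumptions: the taxonomy is a simple parent-pointer tree, and the children of a deleted concept are re-attached to their nearest surviving ancestor (the slides do not specify this detail). All concept names and axiom strings are illustrative.

```python
# Toy ontology: concept -> (parent, local axioms). Illustrative names only.
ONTOLOGY = {
    "Thing":    (None,     ["axiom-about-Thing"]),
    "Entity":   ("Thing",  ["axiom-about-Entity"]),
    "Event":    ("Thing",  ["axiom-about-Event"]),
    "Chemical": ("Entity", ["axiom-about-Chemical"]),
}

def depth(concept):
    """Level of a concept = its depth below the root (root = level 0)."""
    d, parent = 0, ONTOLOGY[concept][0]
    while parent is not None:
        d, parent = d + 1, ONTOLOGY[parent][0]
    return d

def ablate(level):
    """Delete every concept at `level`, together with its local axioms,
    re-parenting orphaned children to the nearest surviving ancestor
    (an assumption; the slides only say the concepts are deleted)."""
    doomed = {c for c in ONTOLOGY if depth(c) == level}
    survivors = {}
    for concept, (parent, axioms) in ONTOLOGY.items():
        if concept in doomed:
            continue
        while parent in doomed:          # skip over deleted ancestors
            parent = ONTOLOGY[parent][0]
        survivors[concept] = (parent, axioms)
    return survivors

print(sorted(ablate(1)))  # ['Chemical', 'Thing'] -- level-1 concepts are gone
```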
34. Related Knowledge Hypothesis in Noun Compound Interpretation
- Related knowledge hypothesis:
  - validated if a good percentage of noun compounds are interpreted correctly by searching the space around the constituent nouns
35. Noun Compound Interpreter Algorithm
- Given a noun compound (C1, C2):
  - a breadth-first search starting from C1 stops when C2 or any super/subclass of C2 is found
  - a breadth-first search starting from C2 stops when C1 or any super/subclass of C1 is found
- Example: "animal virus" (a minimal code sketch follows below)
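A minimal sketch of this bidirectional search, assuming a toy KB stored as labeled edges; the relation names, the taxonomy fragment, and the helper functions are illustrative stand-ins, not the actual KB API. It reproduces the "animal virus" chain from slide 31.

```python
from collections import deque

# Toy KB: concept -> list of (relation, concept) edges, loosely following
# the "animal virus" example (Virus -agent-of-> Invade -object-> Cell ...).
KB_EDGES = {
    "Virus":  [("agent-of", "Invade")],
    "Invade": [("object", "Cell")],
    "Cell":   [("is-basic-structural-unit-of", "Animal")],
}
SUBCLASSES = {"Animal": {"Vertebrate"}}    # hypothetical taxonomy fragment
SUPERCLASSES = {"Vertebrate": {"Animal"}}

def related(concept, target):
    """True if concept is the target or one of its super/subclasses."""
    return (concept == target
            or concept in SUBCLASSES.get(target, ())
            or concept in SUPERCLASSES.get(target, ()))

def bfs(start, target):
    """Breadth-first search from `start`; returns the relation path that
    reaches `target` (or a super/subclass of it), or None."""
    queue, visited = deque([(start, [])]), {start}
    while queue:
        concept, path = queue.popleft()
        if path and related(concept, target):
            return path
        for relation, nxt in KB_EDGES.get(concept, ()):
            if nxt not in visited:
                visited.add(nxt)
                queue.append((nxt, path + [(concept, relation, nxt)]))
    return None

def interpret(c1, c2):
    """Search from C1 toward C2, then from C2 toward C1, as on the slide."""
    return bfs(c1, c2) or bfs(c2, c1)

print(interpret("Animal", "Virus"))
# [('Virus', 'agent-of', 'Invade'), ('Invade', 'object', 'Cell'),
#  ('Cell', 'is-basic-structural-unit-of', 'Animal')]
```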
36. KBs
- Three domains:
  - biology textbook
  - small engine repair manual
  - Sparcstation manual
- The KBs share a top-level ontology, but have few other concepts in common.
37. Measurements
- P: precision; R: recall
- Csystem: number of correct answers by the system, i.e. the number of correctly interpreted noun compounds
- Asystem: number of answers by the system, i.e. the number of interpreted noun compounds
- Cpossible: total number of possible correct answers, i.e. the number of noun compounds tested
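In symbols, these counts give the standard precision and recall definitions:

```latex
P = \frac{C_{\mathrm{system}}}{A_{\mathrm{system}}},
\qquad
R = \frac{C_{\mathrm{system}}}{C_{\mathrm{possible}}}
```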
38. Measurements (continued)
- If R ↑ and P ↑, then:
  - Csystem ↑, because Cpossible remains the same
  - Asystem may move either way (↑ or ↓)
  - Cpossible − Asystem, the number of uninterpreted inputs, moves opposite to Asystem
39. Results
40. Results (continued)
- Without ablations, precisions are 93.8%, 85.2%, and 84.5%; recalls are 93.8%, 74.5%, and 73.2%.
- Ablating level 1 causes a big drop in both precision and recall.
- Ablating level 2 causes a gap between precision and recall, which indicates that no interpretations are found for many noun compounds.
- As lower levels are ablated, the impact diminishes.
41. Analysis
- The related knowledge hypothesis is validated because a good percentage of noun compounds are interpreted correctly.
- Why are the top levels of the ontology the most important? Two possible reasons:
  - because they contain more knowledge?
  - because their knowledge is more important?
42. Axiom-per-Level Count
- Use axiom counts as a measurement of the amount of knowledge.
- Count only the local axioms.
43. Axiom-per-Level Count Results
44. Analysis (2)
- Therefore the top levels' knowledge is more important than the lower levels' knowledge.
- The bootstrapping hypothesis is validated because only the top levels are needed for the noun compound interpretation task.
45. Outline
- What is loose-speak?
- Thesis
- Frequency study
- LIPS
- Case study: noun compound interpreter
- Future work
- Related work
46. Future Work
- Extend LIPS
- Evaluate LIPS
47. Extending LIPS
- Implement interpreters for other types of LS.
- Based on our hypotheses, other types of LS can be detected and interpreted using similar KB-search methods.
48. Extending LIPS (continued)
- Example: given the assertion "HCl is greenish-yellow"
- Naïve encoding (diagram)
- LS detection (a minimal sketch follows below):
  - HCl doesn't have any knowledge of color, and
  - the domain of the color relation does not apply to HCl
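A minimal sketch of these two detection checks, assuming the KB exposes per-concept assertions, declared relation domains, and isa links; every identifier below is hypothetical, not the actual LIPS code.

```python
# Hypothetical KB fragments for the "HCl is greenish-yellow" example.
ASSERTIONS = {"HCl": {"has-part": ["H", "Cl-"]}}  # HCl has no color knowledge
RELATION_DOMAIN = {"color": "Chemical"}           # color is declared on Chemical
ISA = {"HCl": {"Molecule"}}                       # HCl is not itself a Chemical here

def has_prior_knowledge(concept, relation):
    """Check 1: does the KB already say anything about this relation?"""
    return relation in ASSERTIONS.get(concept, {})

def domain_applies(relation, concept):
    """Check 2: does the relation's declared domain cover the concept?"""
    return RELATION_DOMAIN.get(relation) in ISA.get(concept, set())

def is_loose_speak(concept, relation):
    """Flag LS when both checks fail, as on the slide."""
    return (not has_prior_knowledge(concept, relation)
            and not domain_applies(relation, concept))

print(is_loose_speak("HCl", "color"))  # True -> hand off to interpretation
```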
49. Extending LIPS (continued)
(Diagram: in the KB, HCl has-part H and Cl-; HCl is-basic-structural-unit-of Chemical; and Chemical has a color relation to Color.)
50. LIPS Evaluation Environments
- Incorporate LIPS into SHAKEN:
  - a knowledge acquisition system
  - used in the RKF project
  - to be used in the Halo II and CALO projects
- Incorporate LIPS into a controlled-language question answering system (Clark and Robinson 2003):
  - a question answering system
  - proposed for Halo II
51. New Applications
- Halo II:
  - follow-on to the Halo project
  - three science domains
  - aims to enable domain experts to encode knowledge modules and untrained end users to pose questions and problems to those knowledge modules
- EPCA:
  - office procedure domain
  - naïve users interacting with a KB
52. Outline
- What is loose-speak?
- Thesis
- Frequency study
- LIPS
- Case study: noun compound interpreter
- Future work
- Related work
53. Ontology Merging
- Combine several ontologies into one standard ontology (Niles and Pease 2001; McGuinness et al. 2000; Noy and Musen 1999; Chalupsky 2000).
- Similar to LS:
  - concerned with resolving representational differences
- Different from LS because:
  - the objects being replaced are different
  - the need for automation is different
54. Semantic Interpretation
- Map the representation of utterances in natural language into a formal representation of meaning (Ratnaparkhi 1997; Yarowsky 1992; Gardent and Webber 1998).
- Similar to LS:
  - resolves the ambiguities in natural language
- Different from LS because:
  - the emphasis is on structural, semantic, and scope ambiguities
55. Noun Compounds
- Noun compounds in NLP: classify noun compounds into different categories.
- Similar to LS:
  - tries to disambiguate the underlying semantic relation in noun compounds
- Different in interpretation:
  - a semantic category vs. a sequence of semantic relations, e.g. "animal virus"
56. Noun Compounds (continued)
- Rule-based approaches (Rosemary 1984):
  - Example: if the modifier is a material, then it is a made-of category, as in "marble statue".
- Machine-learning-based classification approaches (Lauer 1994; Barker 1998)
- KB-search-based approaches (Vanderwende 1994)
57. LIPS and Noun Compounds
- Rule-based approaches are unsuitable because:
  - they are not as flexible
  - they require additional knowledge, which may erase the KA gains from automatically interpreting LS
- Machine-learning-based classification approaches are unsuitable for LIPS because of:
  - the lack of training examples
  - the abundance of knowledge
  - the need for a sequence of semantic relations, rather than a semantic category, as the end goal
58. Metonymy Interpretation
- Rule-based approaches (Weischedel and Sondheimer 1983; Grosz et al. 1987; Lytinen et al. 1992; Fass 1997):
  - meta-rules on how to interpret metonymy
- Pros:
  - easy to implement
- Cons:
  - can only handle a fixed number of metonymy types
  - may need different rules for different domains
59. Metonymy Interpretation (continued)
- KB-search-based approaches (Browse 1978; Markert and Hahn 1997; Harabagiu 1998):
  - rely on searches in general-purpose knowledge bases
- Pros:
  - no restriction on the types of metonymy that can be interpreted
  - no changes needed for different domains
- Cons:
  - requires a large KB
60. LIPS and Previous Metonymy Approaches
- LIPS is similar to the KB-search-based approaches.
- Difference: LIPS uses the prior knowledge hypothesis instead of type violation to detect metonymy.
- Example (a minimal sketch follows below):
  - KB: Car has-part Engine; Engine has-part Carburetor
  - Query: the carburetor part of a car?
  - Other metonymy systems: no metonymy found, because there is no type violation
  - LIPS: metonymy found, because no prior knowledge of a carburetor part of a car exists
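A minimal sketch of the contrast, using the Car/Engine/Carburetor KB from the slide; the detection and search functions are illustrative stand-ins, not the actual LIPS or metonymy-system implementations.

```python
from collections import deque

# KB from the slide: Car has-part Engine, Engine has-part Carburetor.
EDGES = {"Car": [("has-part", "Engine")],
         "Engine": [("has-part", "Carburetor")]}

def directly_asserted(concept, relation, filler):
    """Prior-knowledge check: is the triple asserted as-is?"""
    return (relation, filler) in EDGES.get(concept, ())

def search_path(start, goal):
    """Breadth-first search for any relation path from start to goal."""
    queue, seen = deque([(start, [])]), {start}
    while queue:
        concept, path = queue.popleft()
        if concept == goal:
            return path
        for relation, nxt in EDGES.get(concept, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, path + [relation]))
    return None

# Query: "the carburetor part of a car?"
# A type-violation detector sees Carburetor as a legal has-part filler,
# raises no violation, and so never notices the metonymy. LIPS instead
# notes that no prior knowledge matches the query and searches around it.
if not directly_asserted("Car", "has-part", "Carburetor"):
    print(search_path("Car", "Carburetor"))  # ['has-part', 'has-part']
```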
61. Conclusion
- Defined loose-speak
- Studied the frequencies of different types of LS
- Proposed an automatic loose-speak interpreter
- Analyzed how the noun compound interpreter works
- Proposed to further develop and evaluate the automatic loose-speak interpreter