Hybrid Systems for Information Extraction and Question Answering - PowerPoint PPT Presentation

About This Presentation

Title:

Hybrid Systems for Information Extraction and Question Answering

Description:

Hybrid Systems for Information Extraction and Question Answering Presented By Rani Qumsiyeh – PowerPoint PPT presentation

Slides: 26

Provided by: Simi157

Transcript and Presenter's Notes

Title: Hybrid Systems for Information Extraction and Question Answering

1
Hybrid Systems for Information Extraction and
Question Answering

Presented By
Rani Qumsiyeh

2
What is Question Answering?

Being able to retrieve the exact piece of
information the user is looking for rather than a
set of relevant documents.
Example question: Who was the president of the US in 2004?
Answer: George W. Bush

3
What is Summarization?

"Text summarization can be regarded as the most interesting and promising Natural Language Understanding task computational linguists are currently faced with." (Rodolfo Delmonte)
Summarization means taking a large piece of text
and extracting the most important ideas out of
it.
The story of the 3 little pigs
Once upon a time there were three little pigs
who lived happily in the countryside. But in the
same place lived a wicked wolf who fed precisely
on plump and tender pigs. The little pigs
therefore decided to build a small house each, to
protect themselves from the wolf. The oldest one,
Jimmy who was wise, worked hard and built his
house with solid bricks and cement. The other
two, Timmy and Tommy, who were lazy settled the
matter hastily and built their houses with straw
and pieces of wood. The lazy pigs spent their
days playing and singing a song that said, "Who
is afraid of the big bad wolf?" And one day, lo
and behold, the wolf appeared suddenly behind
their backs. "Help! Help!", shouted the pigs and
started running as fast as they could to escape
the terrible wolf. He was already licking his
lips thinking of such an inviting and tasty meal.
The little pigs eventually managed to reach their
small house and shut themselves in, barring the
door. They started mocking the wolf from the
window singing the same song, "Who is afraid of
the big bad wolf?" In the meantime the wolf was
thinking of a way of getting into the house. He
began to observe the house very carefully and
noticed it was not very solid. He huffed and
puffed a couple of times and the house fell down
completely. Frightened out of their wits, the two
little pigs ran at breakneck speed towards their
brother's house. "Fast, brother, open the door!
The wolf is chasing us!" They got in just in time
and pulled the bolt. Within seconds the wolf was
arriving, determined not to give up his meal.
Convinced that he could also blow the little
brick house down, he filled his lungs with air
and huffed and puffed a few times. There was
nothing he could do. The house didn't move an
inch. In the end he was so exhausted that he fell
to the ground. The three little pigs felt safe
inside the solid brick house. Grateful to their
brother, the two lazy pigs promised him that from
that day on they too would work hard.

4
Could this be automated?

When understanding a text, a human reader or listener draws on encyclopedic knowledge only parsimoniously.
To do this automatically, the system should simulate actual human behavior, in that access to extralinguistic knowledge is triggered by contextual factors independently present in the text and detected by the system itself.
The simplest approach is to use a Bag of Words (BOW).
For question answering, out of the first n
documents retrieved, extract the words in the
question along with a certain number of
neighboring words.
For summarization, extract all sentences with
title keywords in them.
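
The two Bag of Words baselines above can be sketched roughly as follows. This is only a minimal illustration: the function names, the tiny stopword list, and the default values of n and window are assumptions made for the example, not details from the presentation.

```python
import re

# Assumed, deliberately tiny stopword list (not from the slides).
STOPWORDS = {"the", "of", "a", "an", "in", "is", "was", "who", "what"}


def tokenize(text):
    """Lowercase a string and split it into word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def content_words(text):
    """Return the set of non-stopword tokens in a string."""
    return {t for t in tokenize(text) if t not in STOPWORDS}


def answer_windows(question, documents, n=3, window=2):
    """For the first n documents, return every question-word hit
    together with `window` neighboring words on each side."""
    query = content_words(question)
    snippets = []
    for doc in documents[:n]:
        tokens = tokenize(doc)
        for i, tok in enumerate(tokens):
            if tok in query:
                start = max(0, i - window)
                snippets.append(" ".join(tokens[start:i + window + 1]))
    return snippets


def title_keyword_summary(title, text):
    """Keep every sentence that shares at least one content word with the title."""
    keywords = content_words(title)
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    return [s for s in sentences if keywords & content_words(s)]
```

Neither function looks at word order or grammatical roles, which is why such baselines are cheap but easily misled.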

5
What is the Problem?

The problem the researchers are trying to tackle is taken from P. Bosch's contribution to Text Understanding in LILOG, a book edited by Herzog and Rollinger.
The task is to identify in a text the "inferentially unstable" concepts, which are to be kept distinct from the "inferentially stable" ones. The latter should be analyzed solely on the basis of linguistic description, while the former should tap extralinguistic knowledge of the world.
We identify this task, tout court, with contextual reasoning, i.e. performing inferential processes on the basis of linguistic information while keeping the contribution of external knowledge under control, in order to achieve understanding of a text.

6
Example of the Problem

More information from query:
  Bill surprised Hillary with his answer.
  The word "his" refers to Bill; hence the answer is Bill's.
Same Head Problem:
  The president of Russia visited the president of China.
  Who visited the president?
Reversible Arguments Problem:
  What do frogs eat?
  What eats frogs?
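
To see why the reversible arguments case defeats a purely word-based approach, the sketch below reduces both frog questions to bags of words. The stopword list and the crude suffix-stripping stemmer are assumptions for the example, and the role-labeled dictionaries are written by hand rather than produced by any parser.

```python
# Assumed stopword list, just large enough for the two example questions.
STOPWORDS = {"what", "do", "does", "the", "a"}


def naive_stem(word):
    """Strip a trailing 's' as a crude stand-in for a real stemmer."""
    return word[:-1] if word.endswith("s") and len(word) > 3 else word


def bag_of_words(question):
    """Reduce a question to an unordered set of stemmed content words."""
    words = question.lower().strip("?").split()
    return {naive_stem(w) for w in words if w not in STOPWORDS}


q1 = "What do frogs eat?"
q2 = "What eats frogs?"

# Both questions collapse to the same bag, so a BOW system cannot tell them apart.
same_bag = bag_of_words(q1) == bag_of_words(q2)

# Hand-written predicate-argument forms (an illustration, not parser output):
# once roles are attached, the two questions are clearly different.
roles_q1 = {"pred": "eat", "agent": "frogs", "theme": "?"}
roles_q2 = {"pred": "eat", "agent": "?", "theme": "frogs"}
```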

7
The solution, A Hybrid System

Symbolic processing is defined as those computations that are performed at the word level or at a more abstract level.
Statistical natural-language processing uses
stochastic, probabilistic and statistical methods
to resolve some of the ambiguities of text.
Syntactic processing deals with certain aspects
of meaning that can be determined only from the
underlying structure and not simply from the
linear string of words.
Semantic analysis involves extracting
context-independent aspects of a sentence's
meaning.
In order to act and think like a human, a system needs both symbolic and statistical processing.

8
GETARUNS (General Text And Reference UNderstander)

Works in the following way:
Performs semantic analysis on the basis of syntactic parsing.
Performs Anaphora Resolution.
Builds a quasi-logical form with flat indexed Augmented Dependency Structures (Discourse Model).
Uses a centering algorithm to identify the topics, or discourse centers, which are weighted on the basis of a relevance score.
This logical form can then be used to identify the best sentence candidates to answer queries or provide appropriate information.

9
The parser

Rule-based deterministic parser.
Uses lookahead and a Well-Formed Substring Table to reduce backtracking.
It also uses Finite State Automata for tag disambiguation.
It is based on top-down, depth-first search.
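
As a rough illustration of how a Well-Formed Substring Table cuts down repeated work in top-down, depth-first parsing, here is a toy recognizer. The grammar, lexicon, and function names are invented for the example. Unlike the parser described above, it is neither deterministic nor lookahead-driven; it only shows the table idea.

```python
# Toy grammar and lexicon (assumptions for this sketch, not GETARUNS rules).
GRAMMAR = {
    "S": [["NP", "VP"]],
    "NP": [["Name"], ["Det", "N"]],
    "VP": [["V", "PP"], ["V", "NP"], ["V"]],
    "PP": [["P", "NP"]],
}
LEXICON = {"john": "Name", "went": "V", "into": "P", "a": "Det", "restaurant": "N"}


def parse_spans(tokens, goal="S"):
    """Top-down, depth-first recognition with a well-formed substring table.

    The table maps (category, start) to the set of end positions at which
    that category can finish, so no constituent is analyzed twice.
    """
    table = {}

    def spans(cat, start):
        key = (cat, start)
        if key in table:  # already analyzed: reuse instead of re-parsing
            return table[key]
        ends = set()
        if cat in GRAMMAR:
            for rhs in GRAMMAR[cat]:
                positions = {start}
                for sym in rhs:
                    positions = {e for p in positions for e in spans(sym, p)}
                ends |= positions
        elif start < len(tokens) and LEXICON.get(tokens[start]) == cat:
            ends.add(start + 1)
        table[key] = ends
        return ends

    spans(goal, 0)
    return table


def recognize(sentence):
    """Return True if the whole sentence can be analyzed as an S."""
    tokens = sentence.lower().split()
    return len(tokens) in parse_spans(tokens)[("S", 0)]
```

In this grammar the three VP alternatives all begin with V, so the verb at a given position is analyzed once and then looked up in the table for the remaining alternatives.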

10
Example of the F-Structure produced by the Parser

John went into a restaurant
index: f1
pred: go
lex_form: np/subj/agent/human, object,
          pp/obl/locat/to, in, into/object, place
voice: active  mood: ind  tense: past
cat: result
subj/agent: index: sn4
  cat: human
  pred: 'John'
  gen: mas  num: sing  pers: 3
  spec: def: '0'
  tab_ref: ref, -pro, -ana, -class
obl/locat: index: sn5
  cat: place
  pred: restaurant
  num: sing  pers: 3  spec: def: -
  tab_ref: ref, -pro, -ana, class
qmark: q1
aspect: achiev_tr

11
Building the Discourse Model

A set of entities and the relations between them, as specified in a discourse.
Discourse Entities can be used as Discourse Referents.
Entities and relations in a Discourse Model can be interpreted as representations of the cognitive objects of a mental model.
Representation inspired by Situation Semantics.
Implemented as Prolog facts.

12
DM and infons

Any piece of information is added to the DM as an
infon.
Infon(Index,
Relation(Property),
List of Arguments - with Semantic Roles,
Polarity - 1 affirmative, 0 negation,
Temporal Location Index,
Spatial Location Index)
An infon consists of a relation name, its
arguments, a polarity (yes/no), and a couple of
indexes anchoring the relation to a
spatio-temporal location.
Example: meet, (arg1: john, arg2: mary), yes, 22-sept-2008, venice
Each infon has a unique identifier and can be
referred to by other infons.
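
The infon layout above could be mirrored in Python roughly as follows. The system itself stores infons as Prolog facts, so the class names, field names, the identifiers infon1 and infon2, and the second "say" infon are all assumptions introduced for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Infon:
    index: str                              # unique identifier
    relation: str                           # relation (or property) name
    arguments: Tuple[Tuple[str, str], ...]  # (semantic role, filler) pairs
    polarity: int                           # 1 affirmative, 0 negation
    time_index: Optional[str] = None        # temporal location index
    space_index: Optional[str] = None       # spatial location index


class DiscourseModel:
    """A store of infons keyed by their unique index."""

    def __init__(self) -> None:
        self.infons: Dict[str, Infon] = {}

    def add(self, infon: Infon) -> None:
        if infon.index in self.infons:
            raise ValueError(f"duplicate infon index: {infon.index}")
        self.infons[infon.index] = infon

    def referenced_by(self, index: str) -> List[Infon]:
        """Return the infons that occur as argument fillers of the given infon."""
        args = self.infons[index].arguments
        return [self.infons[f] for _, f in args if f in self.infons]


dm = DiscourseModel()
# The meeting example from the slide, with a hypothetical identifier.
dm.add(Infon("infon1", "meet", (("arg1", "john"), ("arg2", "mary")),
             1, "22-sept-2008", "venice"))
# A hypothetical complex infon that takes the first one as an argument.
dm.add(Infon("infon2", "say", (("arg1", "bill"), ("arg2", "infon1")), 1))
```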

13
Kinds of Infons

Full infons
  Situations: sit/6
  Facts: fact/6
  Complex infons have other sit/fact as argument
Simplified infons
  Entities: ind/2, set/2, class/2
  Cardinalities: card/3
  Membership: in/3
  Spatio-temporal rels: includes/2, during/2,

14
Entities, Cardinalities, Membership

Entities are represented in the DM without any
commitment about their existence in reality.
Individual entities (John): ind(infon1, id5).
Extensional plural entities (his kids): set(infon2, id6).
Intensional plural entities (lions): class(, id7).
Cardinality (only for sets; four kids): card(, id6, 4).
Membership (between individuals and sets; one of them): in(, id5, id6).
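
A minimal Python stand-in for these simplified infons might look like the sketch below. The system stores them as Prolog facts; here they are plain tuples. The indices infon3, infon4, and infon5 are placeholders, since the slide leaves those slots empty, and the cardinality 4 follows the "four kids" wording. The helper names are likewise assumptions.

```python
# (functor, infon index, entity id, ...) tuples standing in for Prolog facts.
FACTS = [
    ("ind", "infon1", "id5"),        # individual entity: John
    ("set", "infon2", "id6"),        # extensional plural: his kids
    ("class", "infon3", "id7"),      # intensional plural: lions (placeholder index)
    ("card", "infon4", "id6", 4),    # four kids (placeholder index)
    ("in", "infon5", "id5", "id6"),  # membership of id5 in id6 (placeholder index)
]


def entity_kind(facts, entity_id):
    """Return 'ind', 'set' or 'class' for an entity id, or None if unknown."""
    for fact in facts:
        if fact[0] in ("ind", "set", "class") and fact[2] == entity_id:
            return fact[0]
    return None


def cardinality(facts, set_id):
    """Return the recorded cardinality of a set, or None if there is none."""
    for fact in facts:
        if fact[0] == "card" and fact[2] == set_id:
            return fact[3]
    return None


def members(facts, set_id):
    """Return the ids of all entities recorded as members of the set."""
    return [f[2] for f in facts if f[0] == "in" and f[3] == set_id]
```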

15
Anaphora Resolution

Anaphora is an instance of an expression referring to another.
Anaphora Resolution means identifying which expression (the antecedent) an anaphoric expression refers to.

16
Two Types of Anaphora

Noun/Noun Phrase (i.e. Nominal)
  He doesn't like this book. Show him a more interesting one.
    "One" refers to the book.
  If you want a typewriter, they will provide you with one.
    "One" refers to the typewriter.
  Slang disappears quickly, especially the juvenile sort.
    "Sort" refers to slang.
  Nominal substitutes also include some indefinite pronouns, such as all, both, some, any, enough, several, none, many, much, (a) few, (a) little, the other, others, another, either, neither, etc.
  E.g. Can you get me some nails? I need some.
    "Some" refers to nails.
Pronoun/Pronoun Phrase (i.e. Pronominal)
  The Prime Minister of New Zealand visited us yesterday. The visit was the first time she had come to New York since 1998.
    "She" refers to the Prime Minister.
    "Us" refers to the people of New York.
  The monkey took the banana and ate it.
    "It" refers to the banana.

17
How does it work?

Computed by a Module of Discourse Anaphora (MDA).
The MDA decides, on the basis of the semantic categories attached to predicates and their arguments, whether to bind a pronoun to the locally available antecedent or to the discourse-level one.
It creates a list of candidates, or possible arguments of discourse, which includes all external pronouns and referential expressions.
The algorithm produces a Weighted List of Candidate Arguments of Discourse (WLCAD).

18
Ontology Behind Anaphora Resolution

On first occurrence of a referring expression:
  It is asserted as an INDividual if it is a definite or indefinite expression.
  It is asserted as a CLASS if it is quantified or has no determiner.
  We have LOCs for main locations, both spatial and temporal.
  Whenever there is cardinality determined by a digit, the referring expression is asserted as a SET.
On second occurrence of the same nominal head:
  The semantic index is recovered from the history list.
  In case it is definite or indefinite with a predicative role and no attributes or modifiers, nothing is done.
  In case it has a different number (singular) and the one present in the DM is a set or a class, nothing happens.
  In case it has attributes and modifiers which are different and the one present in the DM has none, nothing happens.
  In case it is a quantified expression with no cardinality, and the one present in the DM is a set or a class, again nothing happens.
  Otherwise a new entity is asserted in the DM.
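
The first-occurrence rules can be sketched as a small classifier. The word lists, the function name, and the order in which the rules are tried (digit first, then determiner, then quantifier) are assumptions for this example. The sketch only handles common-noun phrases given as plain strings, and ignores proper names and LOCs.

```python
# Assumed word lists; the presentation does not enumerate them.
ARTICLES = {"the", "a", "an"}  # definite and indefinite determiners
QUANTIFIERS = {"all", "every", "each", "some", "many", "most", "no"}


def classify_first_occurrence(noun_phrase):
    """Return 'SET', 'IND' or 'CLASS' for the first mention of a noun phrase."""
    tokens = noun_phrase.lower().split()
    if not tokens:
        raise ValueError("empty noun phrase")
    # Cardinality given by a digit: assert a SET (tried first, by assumption).
    if any(tok.isdigit() for tok in tokens):
        return "SET"
    # Definite or indefinite expression: assert an INDividual.
    if tokens[0] in ARTICLES:
        return "IND"
    # Quantified, or no determiner at all: assert a CLASS.
    return "CLASS"
```

For instance, "a restaurant" comes out as IND, "lions" and "all lions" as CLASS, and "the 3 little pigs" as SET.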