Hybrid Systems for Information Extraction and Question Answering - PowerPoint PPT Presentation




Hybrid Systems for Information Extraction and Question Answering Presented By Rani Qumsiyeh – PowerPoint PPT presentation

Slides: 26
Provided by: Simi157



Hybrid Systems for Information Extraction and
Question Answering
  • Presented By
  • Rani Qumsiyeh

What is Question Answering?
  • Being able to retrieve the exact piece of
    information the user is looking for rather than a
    set of relevant documents.
  • Who was the president of the US in 2004?
  • George W. Bush

What is Summarization?
  • "Text summarization can be regarded as the most
    interesting and promising Natural Language
    Understanding task computational linguists are
    currently faced with." (Rodolfo Delmonte)
  • Summarization means taking a large piece of text
    and extracting the most important ideas out of it.
  • The story of the 3 little pigs
  • Once upon a time there were three little pigs
    who lived happily in the countryside. But in the
    same place lived a wicked wolf who fed precisely
    on plump and tender pigs. The little pigs
    therefore decided to build a small house each, to
    protect themselves from the wolf. The oldest one,
    Jimmy, who was wise, worked hard and built his
    house with solid bricks and cement. The other
    two, Timmy and Tommy, who were lazy, settled the
    matter hastily and built their houses with straw
    and pieces of wood. The lazy pigs spent their
    days playing and singing a song that said, "Who
    is afraid of the big bad wolf?" And one day, lo
    and behold, the wolf appeared suddenly behind
    their backs. "Help! Help!", shouted the pigs and
    started running as fast as they could to escape
    the terrible wolf. He was already licking his
    lips thinking of such an inviting and tasty meal.
    The little pigs eventually managed to reach their
    small house and shut themselves in, barring the
    door. They started mocking the wolf from the
    window singing the same song, "Who is afraid of
    the big bad wolf?" In the meantime the wolf was
    thinking of a way of getting into the house. He
    began to observe the house very carefully and
    noticed it was not very solid. He huffed and
    puffed a couple of times and the house fell down
    completely. Frightened out of their wits, the two
    little pigs ran at breakneck speed towards their
    brother's house. "Fast, brother, open the door!
    The wolf is chasing us!" They got in just in time
    and pulled the bolt. Within seconds the wolf was
    arriving, determined not to give up his meal.
    Convinced that he could also blow the little
    brick house down, he filled his lungs with air
    and huffed and puffed a few times. There was
    nothing he could do. The house didn't move an
    inch. In the end he was so exhausted that he fell
    to the ground. The three little pigs felt safe
    inside the solid brick house. Grateful to their
    brother, the two lazy pigs promised him that from
    that day on they too would work hard.

Could this be automated?
  • When understanding a text, a human reader or
    listener makes use of their encyclopedic
    knowledge.
  • To do this automatically, the system should
    simulate actual human behavior, in that access
    to extra-linguistic knowledge is triggered by
    contextual factors independently present in the
    text and detected by the system itself.
  • The simplest approach is to use a bag of words:
  • For question answering, from the first n
    documents retrieved, extract the words of the
    question along with a fixed number of
    neighboring words.
  • For summarization, extract all sentences with
    title keywords in them.
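The two bag-of-words baselines in the bullets above can be sketched as follows. This is a minimal illustration, not the presenter's code: the regex tokenizer, the small stopword list, the default values of `n` and `window`, and the function names `qa_windows` and `title_keyword_summary` are all assumptions made for the example.

```python
import re

# Illustrative stopword list (an assumption, not from the presentation).
STOPWORDS = {"the", "a", "an", "of", "in", "was", "is", "who", "what", "do", "does"}


def tokens(text: str) -> list[str]:
    """Lowercase word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def keywords(text: str) -> set[str]:
    """Content words: tokens that are not stopwords."""
    return {t for t in tokens(text) if t not in STOPWORDS}


def qa_windows(question: str, documents: list[str], n: int = 5, window: int = 3) -> list[str]:
    """QA baseline: in the first n documents, return each question keyword
    together with `window` neighboring tokens on each side."""
    qwords = keywords(question)
    snippets = []
    for doc in documents[:n]:
        toks = tokens(doc)
        for i, tok in enumerate(toks):
            if tok in qwords:
                snippets.append(" ".join(toks[max(0, i - window): i + window + 1]))
    return snippets


def title_keyword_summary(title: str, text: str) -> list[str]:
    """Summarization baseline: keep every sentence containing a title keyword."""
    title_words = keywords(title)
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    return [s for s in sentences if title_words & set(tokens(s))]
```

For example, summarizing "Once upon a time there were three little pigs. He huffed and puffed. The little pigs were safe." under the title "The story of the 3 little pigs" keeps the first and third sentences and drops the middle one, which shares no word with the title. Neither baseline looks at syntax or reference, which is exactly the weakness the following slides address.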

What is the Problem?
  • The problem the researchers are trying to tackle
    is taken from P. Bosch's contribution to Herzog
    and Rollinger (eds.), Text Understanding in
  • Identifying in a text "inferentially unstable"
    concepts which are to be kept distinct from
    "inferentially stable" ones. The latter should be
    analyzed solely on the basis of linguistic
    description, while the former should draw on
    extra-linguistic knowledge of the world.
  • We identify text understanding tout court with
    contextual reasoning, i.e. performing inferential
    processes on the basis of linguistic information
    while keeping under control the contribution of
    external knowledge in order to achieve
    understanding of a text.

Example of the Problem
  • More information from query
  • "Bill surprised Hillary with his answer."
  • The word "his" refers to Bill; hence, the answer
    is Bill's.
  • Same Head Problem
  • The president of Russia visited the president of
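  • Who visited the president? (Both noun phrases
    share the head "president", so word matching
    alone cannot tell the visitor from the visited.)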
  • Reversible Arguments Problem
  • What do frogs eat?
  • What eats frogs?
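A short sketch of why the reversible-arguments problem defeats a bag-of-words model: after dropping function words and a trailing "s" (a crude normalization assumed for illustration), both questions collapse to the same set, while even a toy predicate-argument analysis keeps them apart. The `toy_roles` rule is a hypothetical heuristic that only handles these two question shapes; it is not GETARUNS's parser.

```python
import re

FUNCTION_WORDS = {"what", "do", "does"}


def bag_of_words(question: str) -> frozenset[str]:
    """Content words with a crude normalization: drop a trailing 's'."""
    words = re.findall(r"[a-z]+", question.lower())
    return frozenset(w[:-1] if w.endswith("s") else w
                     for w in words if w not in FUNCTION_WORDS)


def toy_roles(question: str) -> tuple[str, str, str]:
    """Return (subject, predicate, object) with '?' marking the questioned slot.

    Hypothetical heuristic for two question shapes only:
    'What do X V?' questions the object; 'What Vs X?' questions the subject.
    """
    words = re.findall(r"[a-z]+", question.lower())
    if words[1] in {"do", "does"}:
        return (words[2], words[3], "?")
    return ("?", words[1].rstrip("s"), words[2])
```

Here `bag_of_words` returns `{"frog", "eat"}` for both questions, whereas `toy_roles` yields `("frogs", "eat", "?")` for "What do frogs eat?" and `("?", "eat", "frogs")` for "What eats frogs?". Only the structured representation can tell the two information needs apart.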

The Solution: A Hybrid System
  • Symbolic processing is defined as those
    computations that are performed at the same or
    more abstract level than the word level.
  • Statistical natural-language processing uses
    stochastic, probabilistic and statistical methods
    to resolve some of the ambiguities of text.
  • Syntactic processing deals with certain aspects
    of meaning that can be determined only from the
    underlying structure and not simply from the
    linear string of words.
  • Semantic analysis involves extracting
    context-independent aspects of a sentence's
    meaning.
  • In order to act and think like a human, a system
    needs both symbolic and statistical processing.

GETARUNS (General Text And Reference UNderstander)
  • It works as follows:
  • Performs semantic analysis on the basis of
    syntactic parsing.
  • Performs Anaphora Resolution.
  • Builds a quasi-logical form with flat, indexed
    Augmented Dependency Structures (the Discourse
    Model).
  • Uses a centering algorithm to identify the
    topics, or discourse centers, which are weighted
    on the basis of a relevance score.
  • This logical form can then be used to identify
    the best candidate sentences for answering
    queries or providing appropriate information.
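The idea of weighting discourse centers by relevance and then using them to pick answer sentences can be illustrated with a toy scorer. Everything concrete here is an assumption: entity mentions arrive already extracted with their grammatical role, the role weights (subject > object > other) are invented, and a sentence's score is simply the sum of the relevance of the entities it mentions. It is a stand-in for the idea, not a reimplementation of GETARUNS's centering algorithm.

```python
from collections import defaultdict

# Hypothetical salience weights per grammatical role.
ROLE_WEIGHT = {"subj": 3, "obj": 2}
OTHER_WEIGHT = 1


def topic_scores(mentions):
    """mentions: (sentence_id, entity, role) triples -> {entity: relevance score}."""
    scores = defaultdict(int)
    for _sid, entity, role in mentions:
        scores[entity] += ROLE_WEIGHT.get(role, OTHER_WEIGHT)
    return dict(scores)


def best_sentences(mentions, k=1):
    """Rank sentence ids by the summed relevance of the entities they mention."""
    scores = topic_scores(mentions)
    entities_in = defaultdict(set)
    for sid, entity, _role in mentions:
        entities_in[sid].add(entity)
    ranked = sorted(entities_in,
                    key=lambda sid: (-sum(scores[e] for e in entities_in[sid]), sid))
    return ranked[:k]
```

With hand-annotated mentions from the three-little-pigs story (pigs as subject of sentences 1 and 3 and object of sentence 2, the wolf as subject of sentences 2 and 4, the house in sentences 3 and 4), "pigs" becomes the top discourse center and sentences 2 and 3, which mention it together with other salient entities, rank highest.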

The parser
  • Rule-based deterministic parser.
  • Uses a lookahead and a Well-Formed Substring
    Table to reduce backtracking.
  • It also uses Finite State Automata for tag
    disambiguation.
  • It is based on a top-down, depth-first search
    strategy.
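To make the Well-Formed Substring Table concrete, here is a minimal top-down, depth-first recognizer in which the table memoizes, for each (category, start position), the set of positions where that category can end. The grammar, lexicon, and function names are hypothetical, and this sketch leaves out the lookahead, determinism, and tag-disambiguation automata of the real parser.

```python
from functools import lru_cache

# Toy grammar and lexicon (hypothetical, for illustration only).
GRAMMAR = {
    "S": [["NP", "VP"]],
    "NP": [["Det", "N"], ["PropN"]],
    "VP": [["V", "PP"], ["V", "NP"]],
    "PP": [["P", "NP"]],
}
LEXICON = {"John": "PropN", "a": "Det", "restaurant": "N", "went": "V", "into": "P"}


def recognize(sentence: list[str]) -> bool:
    words = tuple(sentence)

    # The Well-Formed Substring Table: (category, start) -> end positions.
    # lru_cache stores each entry once, so a constituent found while trying
    # one rule is reused, not re-parsed, when another rule needs it.
    @lru_cache(maxsize=None)
    def ends(cat: str, start: int) -> frozenset[int]:
        if cat not in GRAMMAR:  # pre-terminal: consume one word
            if start < len(words) and LEXICON.get(words[start]) == cat:
                return frozenset({start + 1})
            return frozenset()
        found = set()
        for rhs in GRAMMAR[cat]:  # top-down, depth-first over the rules
            positions = {start}
            for symbol in rhs:
                positions = {end for p in positions for end in ends(symbol, p)}
            found |= positions
        return frozenset(found)

    return len(words) in ends("S", 0)
```

`recognize("John went into a restaurant".split())` returns `True`. When the first VP rule (V PP) is tried, the V spanning "went" is stored in the table, and the second rule (V NP) reuses that entry instead of re-analyzing the word: this reuse is what reduces backtracking. Like any naive top-down parser, this design would loop on left-recursive rules, so the toy grammar avoids them.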

Example of the F-Structure produced by the Parser
  • Input: John went into a restaurant
  • index: f1
  • pred: go
  • lex_form: [np/subj/agent/[human, object],
    pp/obl/locat/[to, in, into]/[object, place]]
  • voice: active, mood: ind, tense: past
  • cat: result
  • subj/agent: index: sn4
      cat: [human]
      pred: 'John'
      gen: mas, num: sing, pers: 3
      tab_ref: [+ref, -pro, -ana, -class]
  • obl/locat: index: sn5
      cat: [place]
      pred: restaurant
      num: sing, pers: 3, spec: def: -
      tab_ref: [+ref, -pro, -ana, +class]
  • qmark: q1
  • aspect: achiev_tr
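One way to see how an F-structure such as the one for "John went into a restaurant" feeds semantic analysis is to encode it as nested dictionaries and read off a flat predicate-argument structure. The keys mirror the attribute names on the slide, but the dictionary encoding and the `to_predicate_args` helper are hypothetical illustrations; GETARUNS itself represents these structures in Prolog.

```python
# Hypothetical dictionary encoding of the F-structure for
# "John went into a restaurant" (only some attributes kept).
F_STRUCTURE = {
    "index": "f1", "pred": "go",
    "voice": "active", "mood": "ind", "tense": "past", "aspect": "achiev_tr",
    "subj/agent": {"index": "sn4", "cat": ["human"], "pred": "John", "num": "sing"},
    "obl/locat": {"index": "sn5", "cat": ["place"], "pred": "restaurant", "num": "sing"},
}


def to_predicate_args(fs: dict) -> tuple[str, list[tuple[str, str, str]], str]:
    """Flatten an F-structure into (predicate, [(function/role, index, head)], tense).

    Grammatical functions are the attributes whose values are nested F-structures.
    """
    args = [(function, sub["index"], sub["pred"])
            for function, sub in fs.items() if isinstance(sub, dict)]
    return fs["pred"], args, fs["tense"]
```

Applied to the structure above, this yields `go` with an agent `sn4` ("John") and a locative `sn5` ("restaurant") in the past tense: the semantic indexes `sn4` and `sn5` are what later steps can bind to discourse entities.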

Building the Discourse Model
  • A set of entities and relations between them, as
    specified in a discourse.
  • Discourse Entities can be used as Discourse
    Referents.
  • Entities and relations in a Discourse Model can be
    interpreted as representations of the cognitive
    objects of a mental model.
  • The representation is inspired by Situation
    Semantics.
  • Implemented as Prolog facts.

DM and infons
  • Any piece of information is added to the DM as an
    infon:
    infon(Index,
          Relation (Property),
          List of Arguments (with Semantic Roles),
          Polarity (1 = affirmative, 0 = negation),
          Temporal Location Index,
          Spatial Location Index)
  • An infon consists of a relation name, its
    arguments, a polarity (yes/no), and a pair of
    indexes anchoring the relation to a
    spatio-temporal location.
  • Example: meet, (arg1: john, arg2: mary), yes,
    22-sept-2008, venice
  • Each infon has a unique identifier and can be
    referred to by other infons.
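The infon tuple maps naturally onto a small record type. The sketch below stores the six components and renders an infon as a six-argument Prolog-style term (fact/6 or sit/6), echoing the remark that the DM is implemented as Prolog facts. The field names, the `to_prolog` method, and the exact term layout are assumptions; GETARUNS's own terms may be arranged differently. A second, complex infon refers to the first through its identifier.

```python
import re
from dataclasses import dataclass

_PLAIN_ATOM = re.compile(r"[a-z][a-zA-Z0-9_]*")


def atom(value) -> str:
    """Render a value as a Prolog number or atom, quoting atoms when needed."""
    if isinstance(value, int):
        return str(value)
    text = str(value)
    if _PLAIN_ATOM.fullmatch(text):
        return text
    return "'" + text.replace("'", "''") + "'"


@dataclass(frozen=True)
class Infon:
    index: str                          # unique identifier, e.g. "infon1"
    relation: str                       # relation (property) name
    args: tuple[tuple[str, str], ...]   # (semantic role, argument) pairs
    polarity: int                       # 1 = affirmative, 0 = negation
    time: str                           # temporal location index
    place: str                          # spatial location index
    kind: str = "fact"                  # "fact" or "sit"

    def to_prolog(self) -> str:
        """Render as a six-argument Prolog-style term, e.g. fact/6."""
        args = ", ".join(f"{role}:{atom(arg)}" for role, arg in self.args)
        return (f"{self.kind}({self.index}, {atom(self.relation)}, [{args}], "
                f"{self.polarity}, {atom(self.time)}, {atom(self.place)})")
```

Usage, following the meeting example:

```python
meet = Infon("infon1", "meet", (("arg1", "john"), ("arg2", "mary")), 1, "22-sept-2008", "venice")
said = Infon("infon2", "say", (("agent", "bill"), ("prop", meet.index)), 1,
             "22-sept-2008", "venice", kind="sit")
print(meet.to_prolog())  # fact(infon1, meet, [arg1:john, arg2:mary], 1, '22-sept-2008', venice)
```

The date is quoted because `22-sept-2008` is not a plain Prolog atom; unquoted, Prolog would read it as an arithmetic expression.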

Kinds of Infons
  • Full infons
      • Situations: sit/6
      • Facts: fact/6
      • Complex infons have other sit/fact infons as
        arguments
  • Simplified infons
      • Entities: ind/2, set/2, class/2
      • Cardinalities: card/3
      • Membership: in/3
      • Spatio-temporal relations: includes/2, during/2,

Entities, Cardinalities, Membership
  • Entities are represented in the DM without any
    commitment about their existence in reality.
  • Individual entities (John): ind(infon1, id5).
  • Extensional plural entities (his kids):
    set(infon2, id6).
  • Intensional plural entities (lions): class(,
  • Cardinality (only for sets: four kids):
    card(, id6, 5).
  • Membership (between individuals and sets: one of
    in(, id5, id6).
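A toy discourse model can store these simplified infons as tuples and enforce the constraint that cardinality is asserted only for sets. The `DiscourseModel` class, its method names, the generated infon identifiers, and the example (Mary and her three cats) are hypothetical and chosen only for illustration.

```python
from itertools import count
from typing import Optional

ENTITY_KINDS = {"ind", "set", "class"}


class DiscourseModel:
    """Toy store of simplified infons: ind/2, set/2, class/2, card/3, in/3."""

    def __init__(self) -> None:
        self.facts: list[tuple] = []
        self._ids = count(1)

    def _new_infon(self) -> str:
        return f"infon{next(self._ids)}"

    def add_entity(self, kind: str, entity_id: str) -> None:
        if kind not in ENTITY_KINDS:
            raise ValueError(f"unknown entity kind: {kind}")
        self.facts.append((kind, self._new_infon(), entity_id))

    def kind_of(self, entity_id: str) -> Optional[str]:
        for fact in self.facts:
            if fact[0] in ENTITY_KINDS and fact[2] == entity_id:
                return fact[0]
        return None

    def add_cardinality(self, set_id: str, n: int) -> None:
        # Cardinality is restricted to sets.
        if self.kind_of(set_id) != "set":
            raise ValueError("cardinality is only asserted for sets")
        self.facts.append(("card", self._new_infon(), set_id, n))

    def add_membership(self, member_id: str, set_id: str) -> None:
        self.facts.append(("in", self._new_infon(), member_id, set_id))

    def members(self, set_id: str) -> list[str]:
        return [f[2] for f in self.facts if f[0] == "in" and f[3] == set_id]
```

For "Mary has three cats. One of them is black.", one would add `ind` for Mary (`id1`), `set` for the cats (`id2`) with cardinality 3, then `ind` for "one of them" (`id3`) plus `in(id3, id2)`. Attempting `add_cardinality("id1", 2)` raises `ValueError`, because Mary is an individual, not a set.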

Anaphora Resolution
  • Anaphora is an instance of an expression
    referring to another.
  • Anaphora Resolution means identifying which
    expression an anaphor refers to.

Two Types of Anaphora
  • Noun/Noun Phrase (i.e. Nominal)
  • He doesn't like this book. Show him a more
    interesting one.
  • One refers to the book.
  • If you want a typewriter, they will provide you
    with one.
  • One refers to the typewriter.
  • Slang disappears quickly, especially the juvenile
    sort.
  • "Sort" refers to slang.
  • Nominal substitutes also include some indefinite
    pronouns, such as all, both, some, any, enough,
    several, none, many, much, (a) few, (a) little,
    the other, others, another, either, neither, etc.
  • Can you get me some nails? I need some.
  • Some refers to nails
  • Pronoun/Pronoun Phrase (i.e. Pronominal)
  • The Prime Minister of New Zealand visited us
    yesterday. The visit was the first time she had
    come to New York since 1998.
  • She refers to the Prime Minister.
  • Us refers to the people of New York.
  • The monkey took the banana and ate it.
  • it refers to the banana.

How does it work?
  • Computed by a Module of Discourse Anaphora (MDA).
  • Decides on the basis of semantic categories
    attached to predicates and arguments of
    predicates whether to bind a pronoun to the
    locally available antecedent or to the discourse
    level one.
  • Creates a list of candidates, or possible
    arguments of discourse, which includes all
    external pronouns and referential expressions.
    The algorithm creates a Weighted List of
    Candidate Arguments of Discourse (WLCAD).
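The decision procedure above (filter candidates using the semantic categories attached to predicates and arguments, then weight the survivors) can be sketched as follows, using the monkey-and-banana example. The role weights, the distance discount, the selectional-restriction table, and all names here are invented for illustration; they are not the MDA's actual parameters.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    category: str   # semantic category, e.g. "animal", "food"
    number: str     # "sing" or "plur"
    role: str       # grammatical role in its clause: "subj", "obj", "obl", ...
    distance: int   # sentences back from the pronoun (0 = same sentence)


# Hypothetical salience weights and selectional restrictions.
ROLE_WEIGHT = {"subj": 3.0, "obj": 2.0, "obl": 1.0}
DEFAULT_WEIGHT = 0.5
RESTRICTIONS = {("eat", "obj"): {"food"}, ("eat", "subj"): {"human", "animal"}}


def weighted_candidates(candidates, pronoun_number, predicate, slot):
    """Rank antecedents for a pronoun filling `slot` of `predicate`.

    Candidates that disagree in number, or whose semantic category violates
    the predicate's restriction on that slot, are discarded; the rest are
    weighted by grammatical role and discounted by distance.
    """
    allowed = RESTRICTIONS.get((predicate, slot))
    ranked = []
    for c in candidates:
        if c.number != pronoun_number:
            continue
        if allowed is not None and c.category not in allowed:
            continue
        score = ROLE_WEIGHT.get(c.role, DEFAULT_WEIGHT) / (1 + c.distance)
        ranked.append((score, c.name))
    ranked.sort(reverse=True)
    return ranked
```

For "The monkey took the banana and ate it.", salience alone would prefer the subject "monkey" (score 3.0 over 2.0), but because "it" fills the object slot of "eat", which here only admits food, the weighted list reduces to "banana". With a predicate that has no restriction in the table, both candidates survive and the subject ranks first.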

Ontology Behind Anaphora Resolution
  • On first occurrence of a referring expression
  • it is asserted as an INDividual if it is a
    definite or indefinite expression
  • it is asserted as a CLASS if it is quantified or
    has no determiner
  • We have LOCs for main locations, both spatial and
    temporal
  • Whenever there is cardinality determined by a
    digit, the referring expression is asserted as a
    SET
  • On second occurrence of the same nominal head
  • The semantic index is recovered from the history
  • If it is definite or indefinite, with a
    predicative role and no attributes or modifiers,
    nothing is done
  • If it differs in number (it is singular while
    the one present in the DM is a set or a class),
    nothing happens
  • If it has different attributes and modifiers and
    the one present in the DM has none, nothing
    happens
  • If it is a quantified expression with no
    cardinality, and the one present in the DM is a
    set or a class, again nothing happens.
  • Otherwise, a new entity is asserted in the DM.
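The first-occurrence rules can be read as a small decision function. The `NounPhrase` fields and the precedence among the rules (main location, then digit cardinality, then quantifier or bare noun, then definite/indefinite) are assumptions, since the slide does not state an order. The second-occurrence checks are not modeled, because the slide does not say how the recovered semantic index is used when nothing happens.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class NounPhrase:
    head: str
    determiner: Optional[str]               # "def", "indef", "quant", or None (bare noun)
    digit_cardinality: Optional[int] = None # e.g. 4 for "4 kids"
    main_location: bool = False             # main spatial or temporal location


def first_occurrence_type(np: NounPhrase) -> str:
    """Entity type asserted in the DM on first mention (hypothetical rule order)."""
    if np.main_location:
        return "loc"
    if np.digit_cardinality is not None:
        return "set"
    if np.determiner in (None, "quant"):     # quantified or no determiner
        return "class"
    if np.determiner in ("def", "indef"):    # definite or indefinite
        return "ind"
    raise ValueError(f"unexpected determiner: {np.determiner!r}")
```

So "a restaurant" is asserted as an individual, bare "lions" as a class, "the 4 kids" as a set (the digit rule is checked before the determiner), and a main location such as a city as a LOC.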

GETARUN as a QA system
  • Uses a bag-of-words query to search Google.
  • It builds the Discourse Model for the first five
  • It looks for the answer using the Discourse
    Model.
  • It retrieves the snippet with the right answer.
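The four QA steps can be wired together as a skeleton. This is a sketch under stated assumptions: `search`, `build_discourse_model`, and `match_score` are hypothetical callables supplied by the caller (no real search API is called), and the stopword list used to form the bag-of-words query is illustrative.

```python
import re
from typing import Callable, Sequence

# Illustrative stopword list (an assumption).
STOPWORDS = {"who", "what", "was", "is", "the", "of", "a", "an", "in", "do", "does"}


def bag_of_words_query(question: str) -> str:
    """Step 1 helper: keep the content words of the question."""
    words = re.findall(r"[A-Za-z0-9]+", question)
    return " ".join(w for w in words if w.lower() not in STOPWORDS)


def answer(question: str,
           search: Callable[[str], Sequence[str]],
           build_discourse_model: Callable[[str], object],
           match_score: Callable[[str, object], float],
           top_k: int = 5) -> str:
    """Return the retrieved snippet whose discourse model best matches the question."""
    snippets = list(search(bag_of_words_query(question)))[:top_k]  # 1. BoW search
    if not snippets:
        raise LookupError("search returned no results")
    models = [build_discourse_model(s) for s in snippets]           # 2. one DM per result
    scores = [match_score(question, dm) for dm in models]           # 3. look for the answer
    best = max(range(len(snippets)), key=scores.__getitem__)        # 4. pick the snippet
    return snippets[best]
```

Injecting the components keeps the control flow visible and lets the word-overlap stand-ins used in a quick test be swapped for a real discourse-model builder and matcher. For "Who was the president of the US in 2004?", the query sent to `search` is `"president US 2004"`.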

Examples of QA
  • No other system out there that does text
  • 74% F-measure for Anaphora Resolution.
  • Is very effective in retrieving the gist of the
    text.
  • Can answer natural language questions.
  • Introduces very important algorithms to the NLP
    field.
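For reference, the F-measure quoted for anaphora resolution is, in its usual balanced form, the harmonic mean of precision P and recall R. The slides do not state which variant or counting scheme was used, so these are the standard definitions, not necessarily the evaluation that was actually run:

```latex
P = \frac{\text{correct resolutions}}{\text{resolutions proposed by the system}}, \qquad
R = \frac{\text{correct resolutions}}{\text{anaphors in the gold standard}}, \qquad
F_1 = \frac{2PR}{P + R}
```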

  • Very slow when dealing with large texts.
  • When summarizing, it only manages to retain 73%
    of the important text.
  • No real data to test on.
  • If data is lost, can we really use such a system?
  • Achieves only 63% accuracy in question answering.
  • Cannot answer WHO questions.

Future Work
  • Consider more than two sentences ahead of the
    current one being processed.
  • Find a way to deal with all types of questions.
  • This work is currently in progress; no
    publication yet.
  • Try to increase accuracy, especially in the
    summarization aspect of the system.
  • Consider categories of questions to further pin
    down the answer.