Semantics and Information Extraction

1
Semantics and Information Extraction
  • Douglas E. Appelt
  • Artificial Intelligence Center
  • SRI International

2
What is Semantics?
  • Theory of the relationship between formal aspects
    of language and objects and facts in the world.

3
Traditional Approach in NLP (and linguistics)
  • Define a well-behaved logical language
  • Intensional logic
  • Dynamic predicate logic
  • Discourse Representation Structures
  • Define a semantics for the logical language
    (using model theory)
  • Devise rules for translating natural language
    structures into the logical language that
    preserve truth conditions.
  • Apply principles of compositionality to build
    larger structures from smaller ones.

4
Successes and Failures
  • Successes:
  • Database query applications (e.g., ATIS systems)
  • Dialog systems with a narrow domain of application
    (e.g., TRAINS)
  • Failures:
  • Extracting information from large corpora
  • Real syntax too complex
  • Coverage too weak for large corpora

5
Semantics and Information Extraction
  • General requirements of a semantic theory for
    information extraction
  • ACE as a specific approach to semantics for
    information extraction
  • Examine specific issues
  • Basic ontology
  • Coreference
  • Generic/Specific
  • Metonymy
  • Relations and Events

6
Information Extraction: A Pragmatic Approach
  • Let application requirements drive semantic
    analysis
  • Identify the types of entities that are relevant
    to a particular task
  • Identify the range of facts that one is
    interested in for those entities
  • Ignore everything else

7
MUC and Scenario Templates
  • Define a set of interesting entities
  • Persons, organizations, locations
  • Define a complex scenario involving interesting
    events and relations over entities
  • Example: management succession (persons,
    companies, positions, reasons for succession)
  • This collection of entities and relations is
    called a scenario template.

8
Problems with Scenario Templates
  • Encouraged development of highly domain specific
    ontologies, rule systems, heuristics, etc.
  • Most of the effort expended on building a
    scenario template system was not directly
    applicable to a different scenario template.

9
Addressing the Problem
  • Address a large number of smaller, more focused
    scenario templates (Event-99)
  • Develop a more systematic ground-up approach to
    semantics by focusing on elementary entities,
    relations, and events (ACE)

10
The ACE Program
  • Automated Content Extraction
  • Develop core information extraction technology by
    focusing on extracting specific semantic entities
    and relations over a very wide range of texts.
  • Corpora: newswire and broadcast transcripts, but
    a broad range of topics and genres.
  • Third-person reports
  • Interviews
  • Editorials
  • Topics: foreign relations, significant events,
    human interest, sports, weather
  • Discourage highly domain- and genre-dependent
    solutions

11
Components of a Semantic Model
  • Entities - Individuals in the world that are
    mentioned in a text
  • Simple entities: singular objects
  • Collective entities: sets of objects of the same
    type where the set is explicitly mentioned in the
    text
  • Attributes - Timeless unary properties of
    entities (e.g. Name)
  • Temporal points and intervals
  • Relations - Properties that hold of two entities
    over a time interval
  • Events - A particular kind of relation among
    entities implying a change in relation state at
    the end of the time interval.
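
The components listed above can be sketched as plain data structures. This is only an illustrative sketch, not the ACE data format: the class and field names, the type labels such as "PERSON" and "GPE", and the use of optional integers for interval endpoints are all assumptions made for this example.

```python
from dataclasses import dataclass, field
from typing import Optional, Tuple


@dataclass(frozen=True)
class Entity:
    entity_id: str
    entity_type: str             # hypothetical label, e.g. "PERSON" or "GPE"
    collective: bool = False     # True for an explicitly mentioned set
    name: Optional[str] = None   # a timeless unary attribute


@dataclass(frozen=True)
class Interval:
    start: Optional[int] = None  # None means "not specified"
    end: Optional[int] = None


@dataclass(frozen=True)
class Relation:
    """A property holding of two entities over a time interval."""
    relation_type: str
    arg1: Entity
    arg2: Entity
    interval: Interval = field(default_factory=Interval)


@dataclass(frozen=True)
class Event:
    """A relation among entities that changes some relation state."""
    event_type: str
    participants: Tuple[Entity, ...]
    interval: Interval = field(default_factory=Interval)


clinton = Entity("e1", "PERSON", name="Clinton")
tel_aviv = Entity("e2", "GPE", name="Tel Aviv")
arrival = Event("Movement", (clinton, tel_aviv))
located = Relation("AT", clinton, tel_aviv)
```

Here the arrival event is what would bring the AT relation into being; the sketch leaves that link implicit.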

12
Semantic Analysis: Relating Language to the Model
  • Linguistic Mention
  • A particular linguistic phrase
  • Denotes a particular entity, relation, or event
  • A noun phrase, name, or possessive pronoun
  • A verb, nominalization, compound nominal, or
    other linguistic construct relating other
    linguistic mentions
  • Linguistic Entity
  • Equivalence class of mentions with same meaning
  • Coreferring noun phrases
  • Relations and events derived from different
    mentions, but conveying the same meaning

13
Language and World Model
[Diagram labels: Linguistic Mention, Denotes, Denotes, Linguistic Entity]

14
NLP Tasks in an Extraction System

15
The Basic Semantic Tasks of an IE System
  • Recognition of linguistic entities
  • Classification of linguistic entities into
    semantic types
  • Identification of coreference equivalence classes
    of linguistic entities
  • Identifying the actual individuals that are
    mentioned in an article
  • Associating linguistic entities with predefined
    individuals (e.g. a database, or knowledge base)
  • Forming equivalence classes of linguistic
    entities from different documents.
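
Grouping mentions into coreference equivalence classes can be sketched with a union-find structure. This is a minimal illustration, assuming that some upstream component has already decided which pairs of mentions corefer; the class name and methods are hypothetical, and mentions are plain strings here, whereas a real system would key them on document spans.

```python
class CoreferenceClasses:
    """Union-find over mentions; each set is one linguistic entity."""

    def __init__(self):
        self._parent = {}

    def find(self, mention):
        self._parent.setdefault(mention, mention)
        root = mention
        while self._parent[root] != root:
            root = self._parent[root]
        # Path compression: point every visited mention at the root.
        while self._parent[mention] != root:
            self._parent[mention], mention = root, self._parent[mention]
        return root

    def link(self, a, b):
        """Record that mentions a and b corefer."""
        root_a, root_b = self.find(a), self.find(b)
        if root_a != root_b:
            self._parent[root_b] = root_a

    def classes(self):
        groups = {}
        for mention in list(self._parent):
            groups.setdefault(self.find(mention), []).append(mention)
        return list(groups.values())


cc = CoreferenceClasses()
cc.link("A Chilean exile", "her")
cc.link("A Chilean exile", "The woman")
cc.link("Gen. Augusto Pinochet", "him")
cc.link("Gen. Augusto Pinochet", "Pinochet")
```

The same structure would serve for cross-document classes, given a decision procedure for linking entities from different documents.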

16
Choosing an Ontology for IE Semantics
  • Ordinary native speakers should be able to
    annotate text with minimal training.
  • People should have well-developed intuitions
    about type classification
  • Is a museum an organization or facility? (A
    FOG?)
  • People should have well-developed intuitions
    about entity coreference
  • Peace in the Middle East
  • Entities should be extensional, not abstract,
    generic, counterfactual, or fictional

17
The ACE Ontology and Annotation Standards
  • Documents available online
  • http://www.ldc.upenn.edu/Projects/ACE/
  • Entity standards
  • Relations standards
  • Proposed event standards still under development

18
The ACE Ontology
  • Persons: a natural kind, and hence self-evident
  • Organizations: should have some persistent
    existence that transcends a mere set of
    individuals
  • Locations: geographic places with no associated
    governments
  • Facilities: objects from the domain of civil
    engineering
  • Geopolitical Entities (GPEs): geographic places
    with associated governments

19
Why GPEs?
  • An ontological problem: certain entities have
    attributes of physical objects in some contexts,
    organizations in some contexts, and collections
    of people in others
  • Sometimes it is difficult or impossible to
    determine which aspect is intended
  • It appears that in some contexts, the same phrase
    plays different roles in different clauses

20
Aspects of GPEs
  • Physical: San Francisco has a mild climate.
  • Organization: The United States is seeking a
    solution to the North Korean problem.
  • Population: France makes a lot of good wine.

21
Metonymy
  • Metonymy occurs when a speaker uses a mention to
    refer in a systematic way to an entity with a
    different name or type than the one mentioned.
  • Metonymy is a property of mentions.
  • A literal mention uses the name or type of the
    entity it refers to.
  • A metonymic mention departs from this in some way.
  • A single entity can have both literal and
    metonymic mentions.
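
One way to read the definitions above is as a comparison between what a mention literally uses (a name or a type) and the entity it refers to. The sketch below encodes that reading. It is an interpretation made for illustration, not the ACE annotation procedure: the class names, field names, and the type labels "GPE", "FACILITY", and "ORGANIZATION" assigned to the examples are assumptions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Entity:
    name: Optional[str]
    entity_type: str


@dataclass(frozen=True)
class Mention:
    text: str
    used_name: Optional[str]  # proper name used by the mention, if any
    used_type: str            # type literally suggested by the mention
    referent: Entity          # entity the speaker actually refers to


def classify_mention(mention: Mention) -> str:
    """Label a mention as literal, name metonymy, or type metonymy."""
    if mention.used_name is not None and mention.used_name != mention.referent.name:
        return "name metonymy"
    if mention.used_type != mention.referent.entity_type:
        return "type metonymy"
    return "literal"


china = Entity("China", "GPE")
employer = Entity(None, "ORGANIZATION")  # the business John works for

beijing = Mention("Beijing", "Beijing", "GPE", china)
china_literal = Mention("China", "China", "GPE", china)
restaurant = Mention("the restaurant on the corner", None, "FACILITY", employer)
```

Because the label is computed per mention, the single entity `china` ends up with one literal and one metonymic mention, as the last bullet above allows.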

22
Examples
  • Name metonymy:
  • Beijing announced a new policy toward North
    Korea.
  • Baltimore hit a home run in the ninth inning.
  • SRI was severely damaged in the 1989 earthquake.
  • Type metonymy:
  • John works for the restaurant on the corner.

23
Problem Cases: literal and metonymic mentions, both not types of interest
  • John bought a Picasso.
  • It set him back 1 million.
  • He is his favorite artist.

24
Role Ambiguity: Why isn't it just metonymy?
  • Iraq attacked Kuwait
  • Was the attack on the physical territory?
  • Was the attack on the government?
  • Was the attack on the people of Kuwait?
  • The answer is yes.

25
Multiple Roles
  • Iraq disputed its border with Kuwait
  • Governments dispute things
  • Physical real estate has borders

26
Role Classification and the Sparse Data Problem
  • Role determination through predicate-argument
    constraints
  • China announced a new policy regarding North
    Korea.
  • ACE Corpus: about 20K words in training corpus
  • GPE-PER: 84 configurations
  • GPE-LOC: 432 configurations
  • GPE-ORG: 504 configurations
  • GPE-GPE: 789 configurations
  • Only 131 configurations have more than 2
    instances in the corpus (about 7%)
  • Many of those involve weakly constrained
    predicates (have, be, of, etc.)
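
The "about 7" figure can be checked with a little arithmetic. The sketch below assumes that the 131 frequent configurations are counted against the sum of the four configuration counts listed above; the slide does not state the denominator explicitly.

```python
configurations = {
    "GPE-PER": 84,
    "GPE-LOC": 432,
    "GPE-ORG": 504,
    "GPE-GPE": 789,
}

total = sum(configurations.values())
frequent = 131  # configurations with more than 2 instances
share = frequent / total

print(f"{frequent} of {total} configurations = {share:.1%}")
```

Under that assumption the share comes to roughly 7 percent, which matches the slide and shows how few configurations recur often enough to support learning predicate-argument constraints.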

27
Generic vs Specific
  • The assumed application is building a database
    using extracted information
  • Databases typically represent concrete entities
  • Specificity is a critical attribute of linguistic
    entities.
  • Specificity is a property of the entity, not the
    mention
  • John is looking for a Java programmer.
  • He must have three years of experience.
  • Problem: assessment of specificity is a nuanced
    distinction subject to substantial
    inter-annotator disagreement

28
Types of Linguistic Mentions
  • Name mentions: the mention uses a proper name to
    refer to the entity
  • Nominal mentions: the mention is a noun phrase
    whose head is a common noun
  • Pronominal mentions: the mention is a headless
    noun phrase, a noun phrase whose head is a
    pronoun, or a possessive pronoun

29
Entity and Mention Example
COLOGNE, Germany (AP) - A Chilean exile
has filed a complaint against former Chilean
dictator Gen. Augusto Pinochet accusing him of
responsibility for her arrest and torture in
Chile in 1973, prosecutors said Tuesday. The
woman, a Chilean who has since gained German
citizenship, accused Pinochet of depriving
her of personal liberty and causing bodily harm
during her arrest and torture.
[Legend: Person, Organization, Geopolitical Entity]

30
Relations
  • Relations hold between two entities over a time
    interval.
  • Relations may be timeless, or the temporal
    interval may be unspecified.
  • Relations have inertia, i.e., they don't change
    unless a relevant event happens.
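
Inertia can be illustrated by replaying events over a set of relations: a relation that holds keeps holding until an event removes it. This is a minimal sketch under stated assumptions. The function name, the tuple encoding of relations, the integer time points, and the starting location "Washington" are all hypothetical choices made for the example.

```python
def relations_at(initial, events, time):
    """Return the relations that hold at `time`.

    `events` is a list of (when, added, removed) tuples. Relations persist
    unchanged (inertia) until an event at or before `time` alters them.
    """
    state = set(initial)
    for when, added, removed in sorted(events, key=lambda event: event[0]):
        if when > time:
            break
        state -= set(removed)
        state |= set(added)
    return state


in_washington = ("AT.located", "Clinton", "Washington")  # hypothetical start
in_tel_aviv = ("AT.located", "Clinton", "Tel Aviv")

# An arrival event at hypothetical time 5 ends one relation and starts another.
events = [(5, [in_tel_aviv], [in_washington])]

before = relations_at([in_washington], events, time=3)
after = relations_at([in_washington], events, time=7)
```

Before the arrival only the first relation holds; afterwards only the second does, with no change at any time point where no event occurs.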

31
Explicit and Implicit Relations
  • Many relations are true in the world. Reasonable
    knowledge bases used by extraction systems will
    include many of these relations. Semantic
    analysis requires focusing on certain ones that
    are directly motivated by the text.
  • Example:
  • Baltimore is in Maryland, which is in the United
    States.
  • Baltimore, MD
  • The text mentions Baltimore and the United
    States. Is there a relation between Baltimore
    and the United States?

32
Another Example
  • Prime Minister Tony Blair attempted to convince
    the British Parliament of the necessity of
    intervening in Iraq.
  • Is there a role relation specifying Tony Blair as
    prime minister of Britain?
  • A test: a relation is implicit in the text if the
    text provides convincing evidence that the
    relation actually holds.

33
Explicit Relations
  • Explicit relations are expressed by certain
    surface linguistic forms
  • Copular predication - Clinton was the president.
  • Prepositional Phrase - The CEO of Microsoft
  • Prenominal modification - The American envoy
  • Possessive - Microsoft's chief scientist
  • SVO relations - Clinton arrived in Tel Aviv
  • Nominalizations - Annan's visit to Baghdad
  • Apposition - Tony Blair, Britain's prime minister
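
A few of the surface forms above can be approximated with toy patterns. The sketch below is deliberately naive and only meant to show the idea on isolated phrases like the slide's examples; the regular expressions, group names, and function name are assumptions, and a real extraction system would rely on parsing rather than patterns of this kind.

```python
import re

# Toy patterns for three of the surface forms; each captures a role word
# and the entity it is related to.
PATTERNS = {
    "prepositional phrase": re.compile(
        r"\b[Tt]he (?P<role>\w+) of (?P<entity>[A-Z]\w+)"
    ),
    "possessive": re.compile(
        r"\b(?P<entity>[A-Z]\w+)'s (?P<role>[a-z]+(?: [a-z]+)*)"
    ),
    "copular predication": re.compile(
        r"\b(?P<entity>[A-Z]\w+) (?:is|was) the (?P<role>[a-z]+)"
    ),
}


def find_explicit_relations(text):
    """Return (surface form, captured groups) for every pattern match."""
    hits = []
    for form, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            hits.append((form, match.groupdict()))
    return hits


examples = [
    "The CEO of Microsoft",
    "Microsoft's chief scientist",
    "Clinton was the president.",
]
results = [find_explicit_relations(example) for example in examples]
```

On running text the possessive pattern would over-capture lowercase words after the role, which is one reason such patterns are only a starting point.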

34
Types of ACE Relations
  • ROLE - relates a person to an organization or a
    geopolitical entity
  • Subtypes: member, owner, affiliate, client,
    citizen
  • PART - generalized containment
  • Subtypes: subsidiary, physical part-of, set
    membership
  • AT - permanent and transient locations
  • Subtypes: located, based-in, residence
  • SOC - social relations among persons
  • Subtypes: parent, sibling, spouse, grandparent,
    associate
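
The relation types and subtypes above form a small taxonomy that can be written down as a lookup table. The sketch below is illustrative only: the table layout, the entity type labels, and the function name are assumptions, and argument constraints are included only for ROLE and SOC because those are the only ones the slide states.

```python
ACE_RELATIONS = {
    "ROLE": {"member", "owner", "affiliate", "client", "citizen"},
    "PART": {"subsidiary", "physical part-of", "set membership"},
    "AT": {"located", "based-in", "residence"},
    "SOC": {"parent", "sibling", "spouse", "grandparent", "associate"},
}

# Allowed (first argument, second argument) entity types, where stated.
ARGUMENT_TYPES = {
    "ROLE": ({"PERSON"}, {"ORGANIZATION", "GPE"}),
    "SOC": ({"PERSON"}, {"PERSON"}),
}


def is_well_formed(relation_type, subtype, arg1_type, arg2_type):
    """Check a relation against the taxonomy and any stated constraints."""
    if subtype not in ACE_RELATIONS.get(relation_type, set()):
        return False
    if relation_type in ARGUMENT_TYPES:
        allowed_first, allowed_second = ARGUMENT_TYPES[relation_type]
        return arg1_type in allowed_first and arg2_type in allowed_second
    return True


citizen_ok = is_well_formed("ROLE", "citizen", "PERSON", "GPE")
citizen_reversed = is_well_formed("ROLE", "citizen", "GPE", "PERSON")
```

A check like this is useful mainly for catching annotation or extraction output that falls outside the agreed inventory.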

35
Event Types (preliminary)
  • Movement: travel, visit, move, arrive, depart
  • Transfer: give, take, steal, buy, sell
  • Creation/Discovery: birth, make, discover, learn,
    invent
  • Destruction: die, destroy, wound, kill, damage

36
Problem: Collective and Distributive Reference
John ... Bill ... they ...
There are at least three distinct entities in
this text. We need a way to relate the John and Bill
entities to the collective mention, "they".

37
Solution: Relations
John ... Bill ... they ...
PartOf.part(e(John), e(they))
PartOf.part(e(Bill), e(they))
Three of the men
PartOf.part(e(three), e(the men))

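
The PartOf relations above can be stored as simple tuples and queried to recover the members of a collective entity. This is a minimal sketch; the tuple encoding and the helper name are assumptions made for illustration, with the relation and entity labels copied from the slide's notation.

```python
# Each relation is (relation name, part entity, whole entity).
relations = [
    ("PartOf.part", "e(John)", "e(they)"),
    ("PartOf.part", "e(Bill)", "e(they)"),
    ("PartOf.part", "e(three)", "e(the men)"),
]


def parts_of(whole, relations):
    """Return the entities related to `whole` by PartOf.part, sorted."""
    return sorted(
        part
        for name, part, container in relations
        if name == "PartOf.part" and container == whole
    )


they_members = parts_of("e(they)", relations)
men_subsets = parts_of("e(the men)", relations)
```

Keeping the individuals and the collective as separate entities, linked by relations, avoids forcing "they" into the same coreference class as either John or Bill.
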
38
Summary
  • Motivation for a semantic theory is a practical
    one driven by database filling needs
  • Pick a limited ontology of core concepts, and
    build out, motivated by application needs
  • Address a broad spectrum of semantic problems,
    but from a limited ontology that simplifies data
    annotation issues.