Multi-Perspective Question Answering

Provided by: janyce8

1
Multi-Perspective Question Answering
  • ARDA NRRC Summer 2002 Workshop

2
Participants
  • Janyce Wiebe
  • Eric Breck
  • Chris Buckley
  • Claire Cardie
  • Paul Davis
  • Bruce Fraser
  • Diane Litman
  • David Pierce
  • Ellen Riloff
  • Theresa Wilson

3
Problem
  • Finding and organizing opinions in the world
    press and other text

4
Our Work Will Support
  • Finding a range of opinions expressed on a
    particular topic, event, or issue
  • Clustering opinions and their sources
  • Attitude (positive, negative, uncertain)
  • Basis for opinion (supporting beliefs,
    experiences)
  • Expressive style (sarcastic, vehement, neutral)
  • Building perspective profiles of individuals and
    groups over many documents and topics

5
Task: Annotation
  • Manual annotation scheme for linguistic
    expressions of opinions
  • "It is heresy," said Cao. "The Shouters claim
    they are bigger than Jesus."
  • Nested sources: (writer, Cao), (writer, Cao, Shouters),
    (writer, Cao), (writer, Cao)

6
Task: Annotation
The Foreign Ministry said Thursday that it was
"surprised, to put it mildly" by the U.S. State
Department's criticism of Russia's human rights
record and objected in particular to the "odious"
section on Chechnya.

7
Task: Conceptualization
  • Various ways perspective is manifested in
    language
  • Implications for higher-level tasks

8
Task: Automate Manual Annotations
  • Machine learning
  • Identification of opinionated phrases, sources of
    opinions, ...

9
Task: Organizing Perspective Segments
  • Unsupervised clustering
  • Text features + features from the annotation
    scheme + higher-level features

10
Solution Architecture
  • Annotation Architecture: Annotation Tool
  • Learning Architecture: Learning Algorithms,
    Trained Taggers
  • Application Architecture: Perspective Tagging,
    Document Retrieval, Segment Clustering,
    Question, Other Taggers

11
Evaluation
  • Exploratory manual clustering
  • Evaluation of automatic annotations against
    manual annotations
  • End-user evaluation of how well the system groups
    text segments into clusters of similar opinions
    about a given topic
  • Development of other end-user evaluation tasks

12
Example
The Annual Human Rights Report of the US State
Department has been strongly criticized and
condemned by many countries. Though the report
has been made public for 10 days, its contents,
which are inaccurate and lacking good will,
continue to be commented on by the world media.
Many countries in Asia, Europe, Africa, and
Latin America have rejected the content of the US
Human Rights Report, calling it a brazen
distortion of the situation, a wrongful and
illegitimate move, and an interference in the
internal affairs of other countries. Recently,
the Information Office of the Chinese People's
Congress released a report on human rights in the
United States in 2001, criticizing violations of
human rights there. The report quoting data from
the Christian Science Monitor, points out that
the murder rate in the United States is 5.5 per
100,000 people. In the United States, torture and
pressure to confess crime is common. Many people
have been sentenced to death for crime they did
not commit as a result of an unjust legal system.
More than 12 million children are living below
the poverty line. According to the report, one
American woman is beaten every 15 seconds.
Evidence show that human rights violations in the
United States have been ignored for many years.
13
Example
14
Example
neg-attitude
15
Support the following
  • Describe the collective perspective w.r.t. an
    issue/object presented in an individual article or
    across a set of articles
  • Describe the perspective of a particular
    writer/individual/government/news service w.r.t. an
    issue/object in an individual article or across a
    set of articles
  • Create a perspective profile for agents, groups,
    news sources, etc.

16
Outline
  • Annotation: Wiebe, Wilson
  • Conceptualization: Davis
  • Architecture: Pierce
  • End-user evaluation: Buckley

17
Annotation
  • Find opinions, evaluations, emotions,
    speculations (private states) expressed in
    language

18
Annotation
  • Explicit mentions of private states and speech
    events
  • The United States fears a spill-over from the
    anti-terrorist campaign
  • Expressive subjective elements
  • The part of the US human rights report about
    China is full of absurdities and fabrications.

19
Annotation
  • Nested sources

"The US fears a spill-over," said Xirao-Nima, a
professor of foreign affairs at the Central
University for Nationalities.

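The nesting can be illustrated with a small data structure. This is a minimal Python sketch, not the workshop's representation: the `Attribution` class is hypothetical, and it assumes each speech event or private state records only the agent it belongs to and the event that reports it. Under that assumption, the fear in the sentence above gets the nested source (writer, Xirao-Nima, US).

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Attribution:
    """Hypothetical helper: one speech event or private state and its reporter."""
    agent: str
    parent: Optional["Attribution"] = None

    def nested_source(self) -> Tuple[str, ...]:
        # Walk up the reporting chain, then reverse so the writer comes first.
        chain = []
        node: Optional[Attribution] = self
        while node is not None:
            chain.append(node.agent)
            node = node.parent
        return tuple(reversed(chain))


writer = Attribution("writer")
said = Attribution("Xirao-Nima", parent=writer)  # "... said Xirao-Nima"
fears = Attribution("US", parent=said)            # "The US fears a spill-over"

print(fears.nested_source())  # ('writer', 'Xirao-Nima', 'US')
```

Keeping a parent pointer rather than a copied list means an agent's chain is always derived from the event that reports it, so nesting depth is unbounded.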
20
Annotation
  • Whether opinions or other private states are
    expressed in speech
  • Type of private state (negative evaluation,
    positive evaluation, ...)
  • Object of positive or negative evaluation
  • Strengths of expressive elements and private
    states

21
Example
  • "It is heresy," said Cao. "The Shouters claim
    they are bigger than Jesus."

22
Example
The Foreign Ministry said Thursday that it was
"surprised, to put it mildly" by the U.S. State
Department's criticism of Russia's human rights
record and objected in particular to the "odious"
section on Chechnya.

23
Accomplishments
  • Fairly mature annotation scheme and instructions
  • Representation supporting manual annotation using
    GATE (Sheffield)
  • Annotation corpus
  • Significant training of 3 annotators
  • Participants understand the annotation scheme

24
Sample GATE Annotation

25
Conceptualization
  • Ideology, emotions, and opinions are reflected in
    language
  • Language gives us a means to track and assess
    perspective
  • Goal: create a document to support workshop
    annotation and experiments and to extend to
    future applications

26
Conceptualization Part I: Theoretical Background
  • Types of perspective: attitudes (subjectivity),
    spatial, temporal, sociological, etc.
  • Focuses on subjectivity expressed linguistically
    (e.g., opinions: "criticized an unfair election";
    emotions: "applauded the election"; speculations:
    "probably will be elected")

27
Conceptualization: Subjectivity, Theoretical
Background (continued)
  • Sources have attitudes about objects (writer,
    criticizes, election)
  • An ontology of attitudes leading to different
    types of private states (distinctions can range
    from identification, to positive and negative, to
    more fine-grained ones: reliability, source,
    assessment, necessity, etc.)
  • This theoretical background informs the
    annotation strategy, experiments, and extensions

28
Conceptualization Part II: Looking to Higher
Levels (Larger Segments)
  • Subjectivity beyond the immediate occurrence of
    the segment
  • sentence and paragraph level
  • document level
  • discourse and topic level

29
Conceptualization Part III: Looking Forward
(Applications)
  • Track perspective over time (identify changes)
  • Identify ideology (subjective expressions taken
    as a unit may approximate ideology)
  • Cluster agents with similar ideologies (similar
    expressions of opinions may help group those on
    the same side)
  • Infer ideology from limited expressions of
    perspective (some subjectivity for a source may
    suggest opinions on other topics)

30
Architecture Overview
  • Solution architecture includes:
  • Application Architecture: supports the high-level
    QA task
  • Annotation Architecture: supports document
    annotation
  • Learning Architecture: supports development of
    low- and mid-level system components via machine
    learning

31
Solution Architecture
Annotation Architecture → annotated documents →
Learning Architecture → automatic annotators →
Application Architecture

32
Solution Architecture
  • Annotation Architecture: Annotation Tool
  • Learning Architecture: Learning Algorithms,
    Trained Taggers
  • Application Architecture: Perspective Tagging,
    Document Retrieval, Document Clustering,
    Question, Other Taggers

33
Application Architecture
  • Components: Multi-perspective Classifiers,
    Document Clustering, Documents, Annotation
    Database, GATE NE, CASS, Feature Generators

34
Annotation Components
  • GATE's ANNIE or MITRE Alembic: tokenization,
    sentence finding, part-of-speech tagging, name
    finding, coreference resolution
  • CASS partial parser
  • SMART IR engine
  • Feature Generators

35
Learning Architecture
  • Components: Evaluation, Training Data, Weka
    Learners, Annotation Database, GATE NE, CASS,
    Feature Generators

36
Learning Tasks
  • Identify subjective phrases
  • Identify nested sources
  • Discriminate Facts and Views
  • Classify Opinion Strength

37
Learning Features
  • Name recognition
  • Syntactic features
  • Lists of words
  • Contextual features
  • Density
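As one illustration of a density feature, the sketch below computes the fraction of tokens near a given position that appear on a word list. The clue list, window size, and `clue_density` name are hypothetical placeholders; the slides do not specify how density was defined for the learners.

```python
from typing import Iterable, List, Set

# Hypothetical clue list built from example words in these slides;
# the workshop's actual word lists are not given here.
SUBJECTIVE_CLUES: Set[str] = {"heresy", "odious", "absurdities", "fabrications", "brazen"}


def clue_density(tokens: List[str], index: int, window: int = 5,
                 clues: Iterable[str] = SUBJECTIVE_CLUES) -> float:
    """Fraction of tokens within `window` positions of `index` (inclusive) that are clues."""
    clue_set = {c.lower() for c in clues}
    lo = max(0, index - window)
    hi = min(len(tokens), index + window + 1)
    span = tokens[lo:hi]
    if not span:
        return 0.0
    hits = sum(1 for t in span if t.lower() in clue_set)
    return hits / len(span)


tokens = "the part of the report is full of absurdities and fabrications".split()
print(clue_density(tokens, 8, window=2))  # window: full of absurdities and fabrications
```

A density value like this could sit alongside the word-list and contextual features as one more numeric attribute per phrase.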

38
Annotation Architecture
  • Components: Topic Documents, GATE Annotation
    Tool, Human Annotators, GATE XML, MPQA Database

39
Annotation Tool (GATE)
  • Move headers and original markup to the standoff
    annotation database
  • Initialize document annotations: initial sources
    and speech events
  • Verify human annotations: check id existence,
    check attribute consistency
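The two verification checks can be sketched as follows. The attribute vocabulary (`strength` in low/medium/high/extreme), the comma-separated source format with `w` for the writer (borrowed from the `w,foreign-ministry` value in the GATE XML example), and the `verify` name are all assumptions for illustration, not the tool's actual rules.

```python
from typing import Dict, List, Set

# Assumed attribute vocabulary (hypothetical); adjust to the real annotation instructions.
ALLOWED_VALUES: Dict[str, Set[str]] = {"strength": {"low", "medium", "high", "extreme"}}


def verify(annotations: List[Dict[str, str]], known_ids: Set[str]) -> List[str]:
    """Return human-readable problems found in one document's annotations."""
    problems = []
    for i, ann in enumerate(annotations):
        # Check id existence: every agent in a nested source must be defined.
        # "w" is assumed to denote the writer, which is always defined.
        for agent in ann.get("source", "").split(","):
            agent = agent.strip()
            if agent and agent != "w" and agent not in known_ids:
                problems.append(f"annotation {i}: unknown source id '{agent}'")
        # Check attribute consistency against the assumed vocabulary.
        for attr, allowed in ALLOWED_VALUES.items():
            if attr in ann and ann[attr] not in allowed:
                problems.append(f"annotation {i}: bad {attr} '{ann[attr]}'")
    return problems


print(verify([{"strength": "huge", "source": "w,state-dept"}], {"foreign-ministry"}))
```

Returning a list of messages instead of raising on the first error lets an annotator fix every problem in a document in one pass.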

40
Data Formats
  • GATE XML Format: standoff, structured
  • MPQA Annotation Format: standoff, flat
  • Machine Learning Formats (e.g., ARFF)

41
GATE XML Format
  <Annotation Type="expressive-subjectivity" StartNode="215" EndNode="228">
    <Feature>
      <Name>strength</Name>
      <Value>low</Value>
    </Feature>
    <Feature>
      <Name>source</Name>
      <Value>w,foreign-ministry</Value>
    </Feature>
  </Annotation>

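A GATE annotation element like the one above can be flattened into a Python dict with the standard-library `xml.etree.ElementTree` module. This is a sketch that handles a single `<Annotation>` element rather than a full GATE document, and it treats `StartNode`/`EndNode` directly as character offsets, matching how this example lines up with the 215,228 span in the MPQA format. `read_gate_annotation` is a hypothetical name.

```python
import xml.etree.ElementTree as ET

GATE_SNIPPET = """
<Annotation Type="expressive-subjectivity" StartNode="215" EndNode="228">
  <Feature><Name>strength</Name><Value>low</Value></Feature>
  <Feature><Name>source</Name><Value>w,foreign-ministry</Value></Feature>
</Annotation>
"""


def read_gate_annotation(xml_text: str) -> dict:
    """Flatten one GATE <Annotation> element into type, offsets, and features."""
    elem = ET.fromstring(xml_text.strip())
    # Each <Feature> holds one Name/Value pair.
    features = {f.findtext("Name"): f.findtext("Value") for f in elem.findall("Feature")}
    return {
        "type": elem.get("Type"),
        "start": int(elem.get("StartNode")),
        "end": int(elem.get("EndNode")),
        "features": features,
    }


ann = read_gate_annotation(GATE_SNIPPET)
print(ann["type"], ann["start"], ann["end"], ann["features"]["source"])
# expressive-subjectivity 215 228 w,foreign-ministry
```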
42
MPQA Annotation Format
  id  span     type    name        content
  42  215,228  string  MPQA-agent  id='foreign-ministry'

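Because the MPQA format is flat, one line can be split into its five columns directly. A sketch, assuming whitespace-separated columns where only the last (content) column may contain spaces, and assuming agent content uses `key='value'` pairs as in the example; `parse_mpqa_line` and `MPQAAnnotation` are hypothetical names.

```python
import re
from typing import Dict, NamedTuple, Tuple

# Assumed content syntax for agent-style annotations: key='value' pairs.
ATTR_RE = re.compile(r"([\w-]+)='([^']*)'")


class MPQAAnnotation(NamedTuple):
    ann_id: int
    span: Tuple[int, int]
    data_type: str
    name: str
    content: str  # raw content column, kept as free text

    def attributes(self) -> Dict[str, str]:
        """Extract key='value' pairs from the content column, if any."""
        return dict(ATTR_RE.findall(self.content))


def parse_mpqa_line(line: str) -> MPQAAnnotation:
    """Split one standoff line into id, span, type, name, and content."""
    parts = line.split(None, 4)  # at most 5 fields; content keeps its spaces
    ann_id, span, data_type, name = parts[:4]
    content = parts[4] if len(parts) > 4 else ""
    start, end = (int(x) for x in span.split(","))
    return MPQAAnnotation(int(ann_id), (start, end), data_type, name, content)


agent = parse_mpqa_line("42 215,228 string MPQA-agent id='foreign-ministry'")
print(agent.span, agent.attributes())
```

Keeping the content column raw means the same parser also reads free-text lines such as the meta-annotations, where `attributes()` simply returns an empty dict.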
43
End-User Evaluation: Goal
  • Establish framework for evaluating tasks that
    would be of direct interest to analyst users
  • Do an example evaluation

44
Manual Clustering
  • Human exploratory effort
  • MPQA participants manually cluster documents from
    1-2 topics
  • Analyze the basis for each cluster

45
User Task: Topic
  • U1: User states a topic of interest and interacts
    with the IR system
  • S1: System retrieves a set of relevant documents
    along with their perspective annotations

46
Example: Topic
  • U1: 2002 election in Zimbabwe
  • S1: System returns:
  • 03.47.06-11142 Mugabe confident of victory in
  • 04.33.07-17094 Mugabe victory leaves West in
  • 05.22.13-11526 Mugabe says he is wide awake
  • 06.21.57-1967 Mugabe predicts victory
  • 06.37.20-8125 Major deployment of troops
  • 06.47.23-22498 Zambia hails results

47
User Task: Question
  • U2: User states a particular perspective question
    on the topic.
  • The question should identify the source type
    (e.g., governments, individuals, writers) of
    interest and, for now, be a yes/no (or pro/con)
    question.

48
Example: Question
  • Give range of perspective: national
    governments, groups of governments
  • Was the election process fair, valid, and free of
    voter intimidation?

49
User Task: Question Response
  • S2: System clusters documents based on the
    question, text, and annotations
  • Goal: group together documents with the same
    answer and perspective (including expressive
    content).
  • For now, the system does not attempt to label
    each group with specific answers.
  • Target a small number of clusters (4?)

50
Example: Question Response
  • Cluster 1 <keywords>: 07.20.20-11694, 08.12.40-1611,
    08.15.19-23507, 09.35.06-27851, 13.10.41-18948
  • Cluster 2 <keywords>: 12.08.27-27397, 13.44.36-19236,
    04.33.07-17094, 05.22.13-11526
  • Cluster 3 <keywords>: 06.47.23-22498, 06.51.18-1222,
    06.56.31-3120, 07.16.31-13271

51
User Task: Cluster Features
  • U3: User states constraints on clustered
    documents or segments.
  • These might be geographic, date, ideological,
    political, or religious.
  • S3: System shows subclusters or highlighted
    documents

52
Example: Cluster by Features
  • U3: Highlight governments by region
  • S3: System shows docs with African governments'
    opinions in red, North American in blue, European
    in green, Asian in purple; docs are multicolored
    if they have more than one source
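The highlighting rule can be sketched as a small function. The region labels and the `doc_color` helper are hypothetical, and the sketch reads "more than one source" as sources from more than one highlighted region, so a document quoting two African governments still comes out red.

```python
from typing import Dict, Iterable

# Color scheme from the example; region labels are assumed to come from
# source metadata attached to each opinion annotation.
REGION_COLORS: Dict[str, str] = {
    "Africa": "red",
    "North America": "blue",
    "Europe": "green",
    "Asia": "purple",
}


def doc_color(source_regions: Iterable[str]) -> str:
    """Color a document by the regions of the governments whose opinions it contains."""
    colors = {REGION_COLORS[r] for r in source_regions if r in REGION_COLORS}
    if not colors:
        return "none"          # hypothetical sentinel: no highlighted source
    if len(colors) > 1:
        return "multicolored"  # sources from more than one highlighted region
    return colors.pop()


print(doc_color(["Europe", "Asia"]))
```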

53
User Task: Results
  • U4: User gets an impression (visual or statistical)
    of whether constraints match clusters.
  • Easy visualization of exceptions

54
Example: Results
  • User sees that
  • red docs (African) are mostly in one cluster,
  • blue and green (NA and EU) are in another, and
  • purple docs are scattered across both clusters.

55
Document Collection
  • Large collection of 270,000 foreign news
    documents from June 2001 to May 2002
  • Almost all FBIS documents, with a small number of
    other relevant docs
  • From the MITRE MiTAP system

56
Document Collection Features
  • English language
  • 60% FBIS translated
  • 40% source English
  • 20% TV/radio
  • 5% identified as editorials

57
  From day Tue Jan 22 20:13:06 2002
  Received: from smtpsrv1.mitre.org
  From: FBIS@fbis.org
  Date: 21 Jan 2002 00:00:00 (EST)
  Subject: Vietnam Calls for Broader
  <?xml version="1.0"?>
  <!DOCTYPE document
  <document media_file="sep20020122000034n.html"
    media_type="text" scribe="Rough'n'Ready v1.1"
    title="Vietnam Calls for Broader Environmental
    Protection at ASEM Conference in Beijing"
    document_time="2002-01-21"
    create_time="2002-01-22T20:13:05"
    source="Worldwide Open-source News"
    description="Hanoi Voice of Vietnam News "
    reference="SEP20020122000034 Hanoi Voice of
    Vietnam News WWW-Text in Vietnamese 21 Jan 02">
  <region>East Asia</region>
  <region>China</region>
  <subregion>Southeast Asia</subregion>
  <subregion>China</subregion>
  <country>Vietnam</country>
  <country>China</country>
  <section section_id="1">
  <topics><topic>ENVIRONMENT</topic><topic>HEALTH</topic></topics>
  <turn>
  Vietnamese Minister of Science, Technology and
  Environment Chu Tuan Nha told the first
  Asia-Europe Meeting ASEM Environment Ministers'
  Meeting ASEM EnMM in Beijing recently that
  Vietnam always values environmental protection,
  including prevention of pollution or degradation,
  bio-diversity protection, and improvement of the
  environment in industrial zones and in both urban
  and rural areas.

58
Sample Pure Text
  • Vietnamese Minister of Science, Technology and
    Environment Chu Tuan Nha told the first
    Asia-Europe Meeting ASEM Environment Ministers'
    Meeting ASEM EnMM in Beijing recently that
    Vietnam always values environmental protection,
    including prevention of pollution or degradation,
    bio-diversity protection, and improvement of the
    environment in industrial zones and in both urban
    and rural areas.

59
Sample Meta-annotation
  • 1 0,0 string meta_media_file sep20020122000034n.html
  • 2 0,0 string meta_media_type text
  • 3 0,0 string meta_scribe Rough'n'Ready v1.1
  • 4 0,0 string meta_title Vietnam Calls for Broader
    Environmental Protection at ASEM Conference in
    Beijing
  • 5 0,0 string meta_document_time 2002-01-21
  • 6 0,0 string meta_create_time 2002-01-22T20:13:05
  • 7 0,0 string meta_source Worldwide Open-source
    News
  • 8 0,0 string meta_description Hanoi Voice of
    Vietnam News
  • 9 0,0 string meta_reference SEP20020122000034
    Hanoi Voice of Vietnam News WWW-Text in
    Vietnamese 21 Jan 02
  • 10 0,0 string meta_region East Asia
  • 11 0,0 string meta_region China
  • 12 0,0 string meta_subregion Southeast Asia
  • 13 0,0 string meta_subregion China
  • 14 0,0 string meta_country Vietnam
  • 15 0,0 string meta_country China
  • 16 0,0 string meta_topic ENVIRONMENT
  • 17 0,0 string meta_topic HEALTH
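The meta-annotations above map the header's `<document>` attributes and its region, subregion, country, and topic elements onto zero-width (0,0) standoff lines. The sketch below reproduces that mapping on a trimmed, well-formed excerpt of the header; the `meta_annotations` name and the ordering rule are assumptions, and the attribute order relies on `xml.etree.ElementTree` preserving it (Python 3.8+).

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

# Trimmed, well-formed excerpt of the sample header (not the full file).
HEADER = """<document media_file="sep20020122000034n.html" media_type="text"
  scribe="Rough'n'Ready v1.1" document_time="2002-01-21">
  <region>East Asia</region><region>China</region>
  <country>Vietnam</country><country>China</country>
  <topics><topic>ENVIRONMENT</topic><topic>HEALTH</topic></topics>
</document>"""


def meta_annotations(xml_text: str,
                     tags: Tuple[str, ...] = ("region", "subregion", "country", "topic")) -> List[str]:
    """Emit document-level metadata as zero-width (0,0) standoff lines, numbered from 1."""
    root = ET.fromstring(xml_text)
    pairs = [(f"meta_{k}", v.strip()) for k, v in root.attrib.items()]
    for tag in tags:
        # iter() searches all descendants, so <topic> inside <topics> is found too.
        pairs += [(f"meta_{tag}", (e.text or "").strip()) for e in root.iter(tag)]
    return [f"{i} 0,0 string {name} {value}" for i, (name, value) in enumerate(pairs, 1)]


for line in meta_annotations(HEADER):
    print(line)
```

Giving metadata a 0,0 span keeps it in the same flat file as the text annotations without attaching it to any character range.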

60
Topics
  • About 12 topic statements
  • Each a clause or sentence
  • 25-50 known relevant docs per topic, with manual
    perspective annotations
  • 1-5 questions per topic

61
Questions
  • Type of perspective: range of perspective,
    strongly felt perspective, identify all perspectives
  • Issue: direct information, opinion evidence
  • Constraints (pinpoint discrepancies)

62
Evaluation on Topic/Question
  • Artificially construct a 75-doc retrieved set
  • Include the known (25-50) relevant docs
  • Add top retrieved docs from SMART
  • System automatically annotates the set
  • System clusters based on the annotations
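The construction of the evaluation set can be sketched as a merge of two lists. The SMART ranking is assumed to arrive as an already ordered list of document IDs; `build_eval_set` is a hypothetical helper, not part of SMART.

```python
from typing import List, Sequence


def build_eval_set(known_relevant: Sequence[str], smart_ranking: Sequence[str],
                   size: int = 75) -> List[str]:
    """Known relevant docs first, then top-ranked retrieved docs until `size` is reached."""
    chosen = list(dict.fromkeys(known_relevant))  # dedupe while keeping order
    seen = set(chosen)
    for doc in smart_ranking:
        if len(chosen) >= size:
            break
        if doc not in seen:  # skip retrieved docs already in the relevant set
            chosen.append(doc)
            seen.add(doc)
    return chosen


rel = [f"r{i}" for i in range(30)]
ranking = ["r0", "r5"] + [f"d{i}" for i in range(100)]
print(len(build_eval_set(rel, ranking)))
```

Filling from the top of the ranking means the added documents are the ones most likely to look relevant, which makes the clustering task harder than padding with random documents.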

63
Evaluation (cont)
  • Evaluate homogeneity of clusters. Compare with:
  • Base Case 1: Cluster docs into the same number of
    clusters without any annotations
  • Base Case 2: Cluster docs into the same number of
    clusters based on manual annotations
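The slides do not name a homogeneity measure; cluster purity is one common choice and is swapped in here as an assumption. The sketch assumes each document has a gold label (for example, its pro/con answer from the manual annotations), and the same `purity` function could score the system's clusters and both base cases.

```python
from collections import Counter
from typing import Dict, Hashable


def purity(assignment: Dict[str, int], gold: Dict[str, Hashable]) -> float:
    """Fraction of documents that carry their cluster's majority gold label."""
    by_cluster: Dict[int, Counter] = {}
    for doc, cluster in assignment.items():
        by_cluster.setdefault(cluster, Counter())[gold[doc]] += 1
    # Count, for each cluster, how many docs share its most common label.
    majority_total = sum(c.most_common(1)[0][1] for c in by_cluster.values())
    return majority_total / len(assignment) if assignment else 0.0


clusters = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1}
answers = {"a": "pro", "b": "pro", "c": "con", "d": "con", "e": "con"}
print(purity(clusters, answers))
```

Purity rises trivially as the number of clusters grows, which is one reason to compare against base cases that use the same number of clusters.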

64
Evaluation Within Workshop
  • Evaluation through S2 only
  • No constraints or subclusters
  • Yes/No (Pro/Con) questions only

65
Current Status
  • Document collection prepared, indexed
  • 8 topics (more coming)
  • 16 questions total
  • 10-40 relevant docs per topic (more coming)