Analysis of Uncertain Data in Text Documents - PowerPoint PPT Presentation

Description:

We will use public data about Iranian nanotechnology in the system ... perform the same tasks using off-the-shelf tools. Unclassified//For Official Use Only ...

Slides: 12
Provided by: DTO83
Learn more at: http://www.cs.cmu.edu
Transcript and Presenter's Notes

1
Analysis of Uncertain Data in Text Documents
PAINT
  • Carnegie Mellon University and DYNAMiX Technologies
    • PI: Jaime G. Carbonell / jgc_at_cs.cmu.edu / (412) 268-7279
    • Co-PI: Eugene Fink / e.fink_at_cs.cmu.edu / (412) 268-6593
  • HNC and Fair Isaac
    • Co-PI: Dayne Freitag / daynefreitag_at_fairisaac.com / (858) 369-8191
    • Co-PI: Richard Rohwer / richardrohwer_at_fairisaac.com / (858) 369-8318

2
Proposed functionality
We will integrate the text-extraction system
developed by HNC / Fair Isaac with the
uncertainty-analysis system developed by CMU /
DYNAMiX. The integrated system will support the
following main capabilities.
  • Extraction of relevant facts, relations, and
    causal links from natural-language text documents
  • Automated intent inferences and identification of
    surprising developments based on uncertain data
  • Evaluation of given hypotheses
  • Proactive information gathering
  • Application to the analysis of Iranian
    nanotechnology plans and capabilities

We will also build an external API for future
integration with other PAINT systems, and
evaluate its effectiveness by implementing an
optional loose integration with the
predictive-analysis system developed by Berkeley
/ LLC.
3
HNC / Fair Isaac REALISM System
Diagram elements:
  • WEB
  • TEXT DOCUMENTS
  • Data acquisition: real-time IR
  • Genre detection
  • Unstructured text archive by genre: academic,
    newswire, blog, ...
  • Information extraction (entities and relations)
  • IE models
  • Basic IE model learning (background)
  • Abstract IE model learning (background)
  • Extracted facts and entities (structured tables)
  • Extracted relations and causal links (structured
    rules)
  • Knowledge base: entities, relations, implication
    pool
  • Legend: background / model-learning data paths;
    real-time / modeling data paths
4
HNC / Fair Isaac REALISM System
  • Output
    • Large structured tables of relevant facts and
      entities, which include uncertainty
    • Inference-rule representation of relations and
      causal links, also including uncertainty
  • Input
    • Requirements and filters for the information
      extraction
    • Natural-language documents
    • World-wide web
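
The slide does not specify how the structured tables encode uncertainty. As a minimal sketch, assuming each extracted fact carries a single probability in [0, 1] and that a topic filter is a list of keywords, one row of such a table and a filter over it might look like the following. The class name, field names, `filter_facts`, and all sample rows are hypothetical illustrations, not part of REALISM.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UncertainFact:
    """One row of a hypothetical fact table (not the actual REALISM schema)."""
    subject: str
    relation: str
    value: str
    probability: float  # assumed uncertainty measure in [0, 1]
    source_genre: str   # e.g. "academic", "newswire", "blog"


def filter_facts(facts, topic_terms, min_probability=0.0):
    """Keep facts that mention any topic term and meet the probability threshold."""
    terms = [t.lower() for t in topic_terms]
    selected = []
    for fact in facts:
        text = f"{fact.subject} {fact.relation} {fact.value}".lower()
        if fact.probability >= min_probability and any(t in text for t in terms):
            selected.append(fact)
    return selected


# Made-up rows, for illustration only.
facts = [
    UncertainFact("Lab A", "publishes_on", "carbon nanotubes", 0.9, "academic"),
    UncertainFact("Lab A", "collaborates_with", "Lab B", 0.6, "newswire"),
    UncertainFact("Company C", "imports", "nanotube reactors", 0.3, "blog"),
]
relevant = filter_facts(facts, ["nanotube"], min_probability=0.5)
```

Here `relevant` keeps only the "Lab A publishes_on carbon nanotubes" row, because the other nanotube-related row falls below the 0.5 threshold.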

5
CMU / DYNAMiX RAPID System
Diagram elements:
  • Manual entry, selection, and editing of
    knowledge
  • Prioritized plans for proactive data collection
  • Learned inference rules
  • RAPID Inference Engine
  • RAPID Proactive Planner
  • Critical uncertainties
  • Inferred facts
  • Evaluation of hypotheses
  • Query matches
6
CMU / DYNAMiX RAPID System
  • Output
    • Inferences from uncertain data
    • New learned inference rules
    • Exact and approximate matches for given queries
    • Hypothesis assessment
    • Proactive plans for collecting additional data
  • Input
    • Reality interpretation tables, which represent
      uncertain facts
    • Uncertain inference rules
    • Queries for specific relevant data
    • Analysts' hypotheses
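
The slides do not describe how RAPID combines uncertain facts with uncertain rules. Purely as an illustration of the idea, the sketch below forward-chains over probabilistic rules under a strong simplifying assumption: premises are treated as independent, so a derived probability is the rule probability times the product of the premise probabilities, and the highest-probability derivation of each statement is kept. The function name, the rule format, and the sample statements are all hypothetical; this is not the RAPID Inference Engine.

```python
def forward_chain(facts, rules, max_rounds=10):
    """Derive new beliefs from uncertain facts and uncertain rules.

    facts: dict mapping statement -> probability
    rules: list of (premises, conclusion, rule_probability) tuples
    Assumption (not from the slides): premises are independent, so
    P(conclusion) = rule_probability * product of premise probabilities.
    """
    beliefs = dict(facts)
    for _ in range(max_rounds):
        changed = False
        for premises, conclusion, rule_prob in rules:
            if all(p in beliefs for p in premises):
                prob = rule_prob
                for p in premises:
                    prob *= beliefs[p]
                # Keep the strongest derivation found so far.
                if prob > beliefs.get(conclusion, 0.0) + 1e-12:
                    beliefs[conclusion] = prob
                    changed = True
        if not changed:
            break
    return beliefs


# Made-up statements and probabilities, for illustration only.
facts = {"imports_equipment": 0.8, "trains_staff": 0.5}
rules = [
    (("imports_equipment", "trains_staff"), "builds_facility", 0.9),
    (("builds_facility",), "plans_production", 0.5),
]
beliefs = forward_chain(facts, rules)
hypothesis_score = beliefs.get("plans_production", 0.0)
```

Under these assumptions, assessing a hypothesis reduces to looking up its derived probability; a statement that no rule chain reaches scores 0.0.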

7
Integrated system
Diagram elements:
  • Manual entry, selection, and editing of
    knowledge
  • Information requests
  • Topic filters
  • TEXT DOCUMENTS
  • WEB
  • REALISM (HNC / Fair Isaac)
  • RAPID (CMU / DYNAMiX)
  • Structured facts and entities
  • Structured relations and causal links
  • Learned inference rules
  • Inferred facts
  • Evaluation of hypotheses
  • Query matches
  • Plans for proactive data collection
  • External API
  • OTHER PAINT SYSTEMS
  • Testing with Berkeley / LLC System
8
Empirical evaluation
Data: We will use public data about Iranian
nanotechnology in the system evaluation. When the
PAINT challenge-problem data about Iran becomes
available, we will combine it with the public
data.
  • Component evaluation: we will measure the
    following performance factors
    • Accuracy and completeness of text extraction
    • Accuracy of hypothesis evaluation
    • Effectiveness of data-collection plans
    • Speed of each system component
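
The slides do not name specific metrics. One common reading, assumed here, is that extraction accuracy corresponds to precision (the share of extracted facts that are correct) and completeness corresponds to recall (the share of reference facts that were extracted). The sketch below computes both against a hand-labeled reference set; the function name and the sample facts are hypothetical.

```python
def extraction_scores(extracted, gold):
    """Return (precision, recall, f1) of extracted facts against a reference set."""
    extracted, gold = set(extracted), set(gold)
    true_positives = len(extracted & gold)
    precision = true_positives / len(extracted) if extracted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Made-up fact identifiers, for illustration only.
gold = {"f1", "f2", "f3", "f4", "f5"}
extracted = {"f1", "f2", "f3", "x9"}
precision, recall, f1 = extraction_scores(extracted, gold)
```

With three of four extracted facts correct and three of five reference facts found, precision is 0.75 and recall is 0.6.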

9
Empirical evaluation
Evaluation of the integrated system: We will
compare the productivity of subjects using the
developed system with that of subjects who
perform the same tasks using off-the-shelf
tools.
  • Specific tasks
    • Find data relevant to given hypotheses
    • Evaluate the validity of these hypotheses
    • Identify critical uncertainties and propose a
      plan for collecting additional relevant data
  • Performance measurements
    • Number of tasks completed during the experiment
    • Accuracy of hypothesis evaluation
    • Effectiveness of proactive data-collection plans

Experimental group: Use of REALISM / RAPID
Control group: Use of standard tools
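
The slides do not say how the two groups will be compared statistically. As one possible approach, assuming the measurement is the number of tasks each subject completes and that the groups are small, an exact one-sided permutation test on the difference in group means could be sketched as follows. The function name and the sample counts are hypothetical, and the choice of test is an assumption rather than the authors' stated method.

```python
from itertools import combinations
from statistics import mean


def permutation_p_value(experimental, control):
    """Exact one-sided permutation test on the difference in group means.

    Returns (observed_difference, p_value), where p_value is the fraction of
    all relabelings whose difference is at least as large as the observed one.
    """
    observed = mean(experimental) - mean(control)
    pooled = list(experimental) + list(control)
    n_exp = len(experimental)
    n_ctl = len(pooled) - n_exp
    total = sum(pooled)
    relabelings = 0
    at_least_as_extreme = 0
    for indices in combinations(range(len(pooled)), n_exp):
        group_sum = sum(pooled[i] for i in indices)
        diff = group_sum / n_exp - (total - group_sum) / n_ctl
        relabelings += 1
        if diff >= observed - 1e-12:
            at_least_as_extreme += 1
    return observed, at_least_as_extreme / relabelings


# Made-up task counts per subject, for illustration only.
experimental_tasks = [7, 8, 9]
control_tasks = [4, 5, 6]
observed_diff, p_value = permutation_p_value(experimental_tasks, control_tasks)
```

Enumerating every relabeling is only practical for small groups; with larger groups a random sample of relabelings would be the usual substitute.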
10
Empirical evaluation
Component utility: We will also evaluate the
utility of REALISM and RAPID by comparing the
productivity of subjects under the following
three conditions:
  • Use of the integrated system
  • Use of REALISM without RAPID
  • Use of RAPID without REALISM
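
A simple way to read the three conditions, assumed here rather than stated on the slide, is as an ablation: the gain attributable to a component is the mean productivity with the integrated system minus the mean productivity when that component is absent. The condition keys, the function name, and the scores below are hypothetical.

```python
from statistics import mean


def component_utility(scores_by_condition):
    """Compare mean productivity across the three conditions.

    Expects keys "integrated", "realism_only", and "rapid_only", each mapping
    to a list of per-subject productivity scores.
    """
    means = {name: mean(scores) for name, scores in scores_by_condition.items()}
    return {
        "means": means,
        # Integrated vs. REALISM alone isolates what RAPID adds.
        "gain_from_adding_rapid": means["integrated"] - means["realism_only"],
        # Integrated vs. RAPID alone isolates what REALISM adds.
        "gain_from_adding_realism": means["integrated"] - means["rapid_only"],
    }


# Made-up productivity scores per subject, for illustration only.
scores = {
    "integrated": [9, 8, 10],
    "realism_only": [6, 7, 5],
    "rapid_only": [4, 5, 6],
}
utility = component_utility(scores)
```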

11
Schedule