Multimodal Information Access and Synthesis Research Review - PowerPoint PPT Presentation

About This Presentation
Title:

Multimodal Information Access and Synthesis Research Review

Description:

Multimodal Information Access and Synthesis Research Review. Dan Roth, Department of Computer Science, University of Illinois at Urbana-Champaign. PowerPoint PPT presentation.

Slides: 39
Provided by: danr7

Transcript and Presenter's Notes



1
Multimodal Information Access and Synthesis
Research Review
  • Dan Roth
  • Department of Computer Science
  • University of Illinois at Urbana-Champaign

2
MIAS Mission
  • Most of the data today is unstructured:
      • books, newspaper articles, journal publications,
        reports, images, and audio and video streams.
  • How can we deal with the huge amount of unstructured
    data as if it were organized in a database with a
    known schema?
      • How can we locate, organize, access, and analyze
        unstructured data?
  • MIAS Mission: develop the theories, algorithms, and
    tools that allow analysts to
      • access a variety of data formats and models,
      • integrate them with existing resources, and
      • transform raw data into useful and understandable
        information.

3
Task Perspective
  • In the next decade, some people will need to:
      • monitor a multimodal stream of interesting
        events, entities, and threats;
      • formulate and evaluate hypotheses with respect to
        them.
  • It is impossible to touch even a fraction of the
    available data.
  • This requires interaction, at the appropriate level of
    semantic abstraction, with a system that can:
      • synthesize, summarize, and interpret vast amounts
        of multimodal information;
      • integrate observed data with multimodal domain
        models and background information in multiple
        formats;
      • propose hypotheses and help verify them.

4
Information Access Management
  • The key to allowing access to information is to:
      • match methods to evoke an information need, and
      • transform information from one form to another to
        make it more accessible given the abilities
        available to a user.
  • The solution lies in understanding the meaning of
    information and of people's interaction with it.
  • Current tools tend to ignore the meaning of
    information and operate on surface phenomena
    (particular words, image segments, and so on).
  • Doing so complicates all stages of the
    information use process.
  • The key is matching the demands of interacting
    with information to the abilities available to a
    user at the time.
  • This will allow access to information under
    unusual but valuable circumstances (e.g., when
    some modalities temporarily cannot be used).

5
Scenario
  • Consider an intelligence analyst researching a
    problem:
      • Iranian nuclear program: generate a list of
        Iranian nuclear scientists, their affiliations,
        specialties, biographies, photos, and notable
        recent activities.
  • Current technologies have solved the problem of
    collecting and storing huge amounts of
    information; it is reasonable to assume
    that the information she is after exists.
  • However, multiple barriers stand in the way of
    successful analysis, synthesis, and decision
    support, posing significant research challenges.
  • Medical treatment: What is known about it? Who
    are the experts? What do users say about it? What
    side effects have been reported?
  • Disease outbreaks (say, Ebola or SARS): What is
    known about them? Who are the experts? What is the
    evidence for an outbreak? What side effects have
    been reported, where, and when?
  • Food protection, water safety, societal
    infrastructures, and development projects

6
Multimodal Information Access and Synthesis
[Diagram: online data sources feeding multimodal analysis and synthesis]
Online data sources: Web pages, news articles, specific Web
sites, text repositories, relational databases, surveillance
videos, ...
Example inputs (text documents, images, relational data): an
entity graph in which Saeed Zakeri attended U. of Tehran and
visited Elkhan Factory; relational record: loc Northern Iran,
name Elkhan Factory, topics fertilizer, enrichment.
Processing:
  • Focused multimodal data retrieval
  • Infer metadata and semantic entities (semantic
    categories, temporal categories, subjectivity/opinions)
  • Semantic disambiguation and integration across
    multiple sources and modalities
  • Discover trends of relations between semantic
    entities
  • Meaning-based transformation of data for
    presentation and analysis support
  • Support information analysis, knowledge
    discovery, and monitoring
Analysis and synthesis goals:
  • Discovering unusual events, entities, trends,
    threats, and associations
  • Tracking of events, entities, and associations
  • Rapid retrieval of all multimodal information
    about a particular entity
  • Mapping to and augmenting institutional resources
  • Efficient search, querying, question answering,
    and browsing


7
MIAS Processes
Tools: text processing and analysis, semantic analysis,
information extraction, information integration, machine
learning, knowledge discovery, integrating text and images
  • Focused data retrieval and integration
      • Identify and collect relevant data from multiple
        sources
  • Semantic data enrichment: real-world entities and
    relations among them
      • Infer semantics from unstructured data and
        images
      • Identify real-world entities and relations among
        them
      • Extract attribute and relation features into a
        common framework (generalized graphs)
      • Relate them to existing institutional resources
        for information integration
  • Trend analysis
      • Tracking of events, social content, entities, and
        topics
  • Knowledge discovery, and hypothesis generation and
    verification
      • Construct the rich semantic structure and hidden
        networks of entity linkages
  • Multifaceted output
      • Information extraction
      • Allow semantics-based navigation and search across
        disparate data modalities
      • KR: multi-view representation of the information
        as input to visualization tools
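A toy typed entity-relation graph can illustrate the "common framework (generalized graphs)" item. This is a minimal sketch under stated assumptions, not the MIAS implementation. The `EntityGraph` class, its methods, the entity types, and the "text" provenance tags are all hypothetical. The facts themselves (Saeed Zakeri, U. of Tehran, Elkhan Factory, its location and topics) come from the pipeline diagram.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

# Hypothetical edge record: (head entity, relation, tail entity, source modality).
Edge = Tuple[str, str, str, str]


class EntityGraph:
    """Toy 'generalized graph': typed entities with attributes plus labeled,
    provenance-tagged relations between them (illustrative only)."""

    def __init__(self) -> None:
        self.types: Dict[str, str] = {}
        self.attrs: Dict[str, Dict[str, object]] = defaultdict(dict)
        self.edges: Set[Edge] = set()

    def add_entity(self, name: str, etype: str, **attrs: object) -> None:
        self.types[name] = etype
        self.attrs[name].update(attrs)

    def add_relation(self, head: str, rel: str, tail: str, source: str) -> None:
        # Refuse edges that mention entities which were never identified.
        for node in (head, tail):
            if node not in self.types:
                raise KeyError(f"unknown entity: {node}")
        self.edges.add((head, rel, tail, source))

    def profile(self, name: str) -> Dict[str, object]:
        """Everything the graph holds about one entity, across all sources.
        Direction is '->' for outgoing edges and '<-' for incoming ones."""
        related: List[Tuple[str, str, str, str]] = sorted(
            (rel, "->" if head == name else "<-", tail if head == name else head, src)
            for head, rel, tail, src in self.edges
            if name in (head, tail)
        )
        return {"type": self.types[name], "attrs": dict(self.attrs[name]), "relations": related}


# Facts from the pipeline diagram; entity types and 'text' provenance are assumptions.
g = EntityGraph()
g.add_entity("Saeed Zakeri", "PERSON")
g.add_entity("U. of Tehran", "ORGANIZATION")
g.add_entity("Elkhan Factory", "FACILITY", loc="Northern Iran", topics=["fertilizer", "enrichment"])
g.add_relation("Saeed Zakeri", "attended", "U. of Tehran", "text")
g.add_relation("Saeed Zakeri", "visited", "Elkhan Factory", "text")

print(g.profile("Elkhan Factory"))
```

`profile` covers the "rapid retrieval of all multimodal information about a particular entity" goal for a single node. A real system would first need the entity identification and disambiguation steps listed above, so that facts from different sources can safely share a node.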

8
Integrated Mission: Research and Education
  • Develop diverse human resources to enhance the
    scientific research, educational, and
    governmental workforce in MIAS
  • Educational and Outreach Initiatives
      • Target students from small research programs and
        minority-serving institutions
      • Expose them to the national labs
      • Open opportunities for bigger impact
  • A comprehensive education program designed to
    increase participation in the study and practice
    of MIAS topics, which will:
      • provide substantive training for a new generation
        of experts in the field;
      • serve as a tool for recruiting an experienced
        group of undergraduates into graduate study in
        one of the broad fields of information science;
      • be an intellectual community center, where
        participants at all levels of expertise come
        together in an enriched environment of
        collaboration.

9
Data Science Summer Institute at UIUC
1st DSSI: May 2007 (huge success); 2nd DSSI: May 2008
  • Intensive course in the math of data sciences
      • Probability and Statistics
      • Linear Algebra
      • Data Structures and Algorithms
      • Optimization
      • Learning, Clustering
  • Research projects led by co-PIs and grad students
  • Topics
      • Virtual Web, Focused Crawling
      • Relations and Entities
      • Text and Images

8-week course; 27 students; faculty from UIUC,
Kansas State, UTSA, UTEP
Advanced MIAS-related tutorials
Speaker series
10
Data Science Summer Institute at UIUC
Advanced MIAS Related Tutorials
11
Data Science Summer Institute at UIUC
12
Our Team
  • Leading researchers in intelligent information
    access and analysis, and its foundations:
      • Machine Learning
      • Databases, Data Integration and Knowledge
        Discovery
      • Information Retrieval
      • Natural Language Processing
      • Machine Vision
      • Knowledge Representation and Reasoning
  • A large number of affiliates/consultants covering
    all areas of interest to the MIAS center.

13
Kevin C. Chang: Data Integration and Retrieval
  • Deep Web
      • MetaQuerier: large-scale integration over the
        deep Web
      • Pioneered the holistic integration paradigm
  • Widely published at SIGMOD, VLDB, ICDE
  • System building and demos at CIDR, SIGMOD, VLDB
  • PC member/editor/organizer for SIGMOD, ICDE, WWW, a
    SIGKDD special issue, and the WIRI'06 and IIWeb'06
    workshops
  • Awards: NSF CAREER, IBM Faculty, NCSA Faculty,
    VLDB'00 Best Paper Selection

14
AnHai Doan: Data Integration
  • Data integration
  • Matching Schemas, Ontologies, Entities
  • Integrating databases and text
  • ACM Doctoral Dissertation Award 2004
  • Sloan Fellowship 2006
  • Edited special issues on Data Integration
  • Co-chaired workshops on data integration, Web
    technologies, machine learning
  • Co-writing a book titled Data Integration

Monitoring People and Events
15
David Forsyth: Computer Vision and Learning
  • Linking Text and Images
      • Labeling images via (a lot of) caption text
  • Leading computer vision researcher
      • Over 110 papers on vision, graphics, and learning
        applications
  • Program Chair, CVPR 2000; General Chair, CVPR
    2006; regular PC member at all major vision
    conferences
  • IEEE Technical Achievement Award, 2006
  • Lead author of a widely adopted main textbook

16
German supermodel Claudia Schiffer gave birth to
a baby boy by Caesarian section January 30, 2003,
her spokeswoman said. The baby is the first child
for both Schiffer, 32, and her husband, British
film producer Matthew Vaughn, who was at her side
for the birth. Schiffer is seen on the German
television show 'Bet It...?!' ('Wetten
Dass...?!') in Braunschweig, on January 26, 2002.
(Alexandra Winkler/Reuters)

British director Sam Mendes and his partner
actress Kate Winslet arrive at the London
premiere of 'The Road to Perdition', September
18, 2002. The film stars Tom Hanks as a Chicago
hit man who has a separate family life and
co-stars Paul Newman and Jude Law. REUTERS/Dan
Chung

US President George W. Bush (L) makes remarks
while Secretary of State Colin Powell (R) listens
before signing the US Leadership Against HIV/AIDS,
Tuberculosis and Malaria Act of 2003 at
the Department of State in Washington, DC. The
five-year plan is designed to help prevent and
treat AIDS, especially in more than a dozen
African and Caribbean nations (AFP/Luke Frazza)
17
(No Transcript)
18
Jiawei Han: Knowledge Discovery
  • Pattern analysis and knowledge discovery from
    massive data
  • Research focus: data streams, frequent patterns,
    sequential patterns, graph patterns, and their
    applications
  • Privacy-preserving data analysis
  • Developed many popular data mining algorithms,
    e.g., FP-growth, PrefixSpan, gSpan, Star-Cubing,
    CrossMine, RankingCube, and CrossClus
  • Over 300 research papers published in conferences
    and journals
  • Editor-in-Chief, ACM Transactions on Knowledge
    Discovery from Data
  • Textbook, Data Mining: Concepts and Techniques,
    adopted worldwide

19
Cinda Heeren: Knowledge Discovery, Education
  • MIAS Summer School Director
  • Mathematical Foundation of Data Science Discrete
  • UIUC CS Department Director of Diversity Programs
  • Lecturer for Discrete Math, Data Structures
    courses at UIUC
  • One of the leading teachers and educational
    leaders at UIUC.
  • Research in algorithmic data analysis and
    databases
  • Speaker and regular presenter at conferences for
    young women, including:
      • GAMES and WYSE summer camps
      • Expanding Your Horizons careers conference
      • Grace Hopper Celebration of Women in Computing
        2004: panel on best practices for recruiting
        women into undergraduate programs in CS
      • SIGCSE 2006: workshop "How to host your own
        Small Regional Celebration of Women in Computing"

Summer School
20
ChengXiang Zhai: Information Retrieval and Text
Analysis
  • Probabilistic Paradigms for Information Retrieval
  • Personalized/Context-Dependent Search
  • Relation Identification and Data Integration
  • Leading expert in information retrieval and
    search technologies
  • Recipient of the 2004 Presidential Early Career
    Award for Scientists and Engineers (PECASE)
  • Main architect and key contributor of the Lemur
    Toolkit (used by many research groups and IR
    companies around the world)
  • ACM SIGIR'04 Best Paper Award
  • Selected service: Program Chair of ACM
    CIKM 2004, HLT/NAACL'06, and SIGIR'09

21
Dan Roth: Machine Learning, NLP, Inference
  • Semantic Analysis and Data Enrichment
  • Entity and Relation Identification and
    Integration
  • Textual Entailment
  • Machine Learning Methods for NLP and IE
  • Leading researcher in machine learning, NLP, AI
  • Developed a popular machine learning system and
    machine-learning-based NLP tools used in industry
    and in NLP classes
  • Program Chair, ACL'03 and CoNLL'02; regular senior
    PC member in all major machine learning, NLP, and
    AI conferences
  • Associate Editor, Journal of Artificial
    Intelligence Research and Machine Learning
  • Multiple paper awards

22
Machine Learning, NLP, Reasoning and Optimization
  • Foundations
      • Learning Theory: algorithmic and representational
        issues; high dimensions; dimensionality reduction
      • Learning protocols: how to minimize interaction
        (supervision); how to map domain/task information
        to supervision; semi-supervised learning; active
        learning; ranking; adaptation
      • Constrained Conditional Models: global decisions
        in which several local decisions play a role but
        there are mutual dependencies on their outcomes;
        NLP inference as constrained optimization
  • Natural Language Processing
      • Semantic Parsing
      • Question Answering
      • Semantic Entailment
  • Intelligent Information Access
      • Information Extraction
      • Named Entities and Relations
      • Matching entity mentions within and across
        documents and databases
  • Software
      • Many NLP and IE tools that are being used in
        research labs and industry
      • Basic tools development: SNoW, FEX, shallow
        parser, POS tagger, semantic parser, NER
      • Learning-Based Programming

23
Textual Entailment
Phrasal verb paraphrasing [Connor and Roth '07]
  • Given
      • Q: Who acquired Overture?
  • Determine
      • A: Eyeing the huge market potential, currently
        led by Google, Yahoo took over search company
        Overture Services Inc. last year.

Entity matching [Li et al., AAAI'04, NAACL'04]
Semantic Role Labeling
Inference for Entailment [AAAI'05, TE'07]
Is it true that...? (Textual Entailment)
Eyeing the huge market potential, currently led
by Google, Yahoo took over search company
Overture Services Inc. last year
?
Yahoo acquired Overture
Overture is a search company
Google is a search company
Google owns Overture
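Trying the most naive alternative shows why the components listed above (paraphrasing, entity matching, semantic role labeling, inference) are needed. The sketch below scores each candidate hypothesis by the fraction of its words that also appear in the text. The helpers `tokens` and `overlap` are hypothetical. This is a lexical baseline, not the entailment method cited on the slide.

```python
import re
from typing import Set


def tokens(s: str) -> Set[str]:
    # Lowercased alphabetic word types; punctuation and digits are dropped.
    return set(re.findall(r"[a-z]+", s.lower()))


def overlap(text: str, hypothesis: str) -> float:
    """Fraction of hypothesis word types that also occur in the text.
    A naive lexical baseline, NOT an entailment model."""
    h = tokens(hypothesis)
    return len(h & tokens(text)) / len(h)


TEXT = ("Eyeing the huge market potential, currently led by Google, "
        "Yahoo took over search company Overture Services Inc. last year")

for hyp in ["Yahoo acquired Overture", "Overture is a search company",
            "Google is a search company", "Google owns Overture"]:
    print(f"{overlap(TEXT, hyp):.2f}  {hyp}")
```

The baseline gives 0.67 both to "Yahoo acquired Overture", which the text supports because "took over" paraphrases "acquired", and to "Google owns Overture", which the text does not support. Word overlap cannot separate the two without paraphrase knowledge and inference.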
24
Constrained Conditional Models
[Equation not transcribed; annotated "Subject to constraints" and "(Soft) constraints component"]
How to solve (for the best assignment)? This is an
Integer Linear Program; solving it with ILP packages
gives an exact solution. Search techniques are
also possible.
How to train? How to decompose the global objective
function? Should we incorporate constraints in
the learning process?
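A hedged illustration of constrained inference follows. A common way to write the CCM objective scores an assignment y for input x as w·φ(x, y) minus a penalty ρ_k for each violated constraint C_k, and inference picks the highest-scoring y.

The sketch below swaps the ILP solver mentioned on the slide for brute-force enumeration. Enumeration is exact but feasible only for tiny problems. Everything in the example is invented for illustration:

* the two-entity, one-relation setup;
* the local scores;
* the penalty ρ = 1.0;
* the binary violation indicator, used in place of a general distance to constraint satisfaction.

```python
from itertools import product
from typing import Callable, Dict, List, Optional, Tuple

Assignment = Dict[str, str]

# Hypothetical local classifier scores for two entities and the relation between them.
LOCAL_SCORES: Dict[str, Dict[str, float]] = {
    "e1": {"PER": 0.6, "ORG": 0.3, "LOC": 0.1},
    "e2": {"PER": 0.1, "ORG": 0.4, "LOC": 0.5},
    "r": {"works_for": 0.7, "lives_in": 0.2, "none": 0.1},
}


def works_for_ok(y: Assignment) -> bool:
    # works_for(e1, e2) requires a PERSON working for an ORGANIZATION.
    return y["r"] != "works_for" or (y["e1"] == "PER" and y["e2"] == "ORG")


def lives_in_ok(y: Assignment) -> bool:
    # lives_in(e1, e2) requires a PERSON living in a LOCATION.
    return y["r"] != "lives_in" or (y["e1"] == "PER" and y["e2"] == "LOC")


# (constraint, penalty rho): soft constraints, each violation costs rho.
CONSTRAINTS: List[Tuple[Callable[[Assignment], bool], float]] = [
    (works_for_ok, 1.0),
    (lives_in_ok, 1.0),
]


def ccm_inference(scores: Dict[str, Dict[str, float]],
                  constraints: List[Tuple[Callable[[Assignment], bool], float]]
                  ) -> Tuple[Optional[Assignment], float]:
    """Exhaustive argmax of (sum of local scores) - (sum of penalties of violated constraints)."""
    variables = list(scores)
    best, best_value = None, float("-inf")
    for labels in product(*(scores[v] for v in variables)):
        y = dict(zip(variables, labels))
        value = sum(scores[v][y[v]] for v in variables)
        value -= sum(rho for ok, rho in constraints if not ok(y))
        if value > best_value:
            best, best_value = y, value
    return best, best_value


print(ccm_inference(LOCAL_SCORES, []))           # local decisions only
print(ccm_inference(LOCAL_SCORES, CONSTRAINTS))  # with the (soft) constraints
```

Without constraints, the best assignment is PER, LOC, works_for, which is internally inconsistent. With the penalties, the best assignment becomes PER, ORG, works_for. Enumeration grows exponentially with the number of variables, which is why the slide points to ILP packages or search for realistic problem sizes.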
25
Semantic Categories
  • Information Access and Extraction requires the
    identification of semantic categories in text.

Query: AIDS treatment

Federal health officials are recommending
aggressive use of a newly approved drug that
protects people infected with the AIDS virus
against a form of pneumonia that is the No. 1
killer of AIDS victims. (AP890616-0048,
TIPSTER Vol. 1)
Relevant documents may mention specific types of
treatments for AIDS.

Hemophiliacs lack a protein, called factor VIII,
that is essential for making blood clots. As a
result, they frequently suffer internal bleeding
and must receive infusions of clotting protein
derived from human blood. During the early 1980s,
these treatments were often tainted with the AIDS
virus. (AP890118-0146, TIPSTER Vol. 1)
Many irrelevant documents mention AIDS and
treatments for other diseases.

  • There is a need to identify whether a phrase
    represents the name of an organization, a person,
    a disease, a medicine, etc.
  • A narrow version of the problem is called named
    entity recognition (NER).

26
Adaptation of Named Entity Recognition
  • Entities are inherently ambiguous (e.g., JFK can
    be either a location or a person, depending on the
    context).
  • They can appear in various forms and can be nested.
  • Using lists is not sufficient:
      • new entities are always being introduced.

[Chart comparing seen and new named entities (labels: "NE seen", "New NE seen")]
  • A lot of machine learning work shows significant
    overfitting.
  • Key difficulties: adaptation to
      • new domains/corpora,
      • slightly new definitions of an entity,
      • new languages,
      • new types of entities.
  • How can we reduce the resources needed to produce a
    semantic categorization for a new domain, a new
    language, or a new type of entity?

27
NER Tools
Screenshot from a CCG demo: http://L2R.cs.uiuc.edu/cogcomp
  • Work in progress
      • Unsupervised discovery of entities in other
        languages
      • Quick adaptation to new entity types and new
        domains

28
Extracting Relations
  • Information Access and Extraction requires the
    identification of relations between concepts in
    text.
  • Relations expressed within a single sentence or
    paragraph
  • Relations uncovered by processing large
    quantities of text (over time)
  • There is a need to identify concepts (e.g.,
    entities) and relations that hold between them in
    a given sentence.
  • Closed set of relations, e.g.:
      • A causes B
      • A works for B
      • A prevents B
      • A lives in B
  • Open-ended set of relations
      • Every predicate can be a relation
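A closed relation set invites a pattern-based first attempt. The minimal sketch below extracts a triple only when a sentence literally has the form "A <trigger> B". The trigger phrases, the `extract_relations` helper, and the example sentences are all hypothetical. The second call shows how quickly this breaks when the same relation is phrased differently, which is the variability in expressing relations that surface patterns cannot abstract over.

```python
import re
from typing import List, Tuple

# Hypothetical trigger phrases for the closed relation set above.
TRIGGERS = {
    "works for": "works_for",
    "lives in": "lives_in",
    "causes": "causes",
    "prevents": "prevents",
}


def extract_relations(sentence: str) -> List[Tuple[str, str, str]]:
    """Return (A, relation, B) triples for sentences of the exact form 'A <trigger> B'."""
    text = sentence.strip().rstrip(".")
    triples = []
    for phrase, relation in TRIGGERS.items():
        # Non-greedy left argument, trigger surrounded by spaces, rest is B.
        match = re.fullmatch(rf"(.+?) {phrase} (.+)", text)
        if match:
            triples.append((match.group(1), relation, match.group(2)))
    return triples


print(extract_relations("John Smith works for Acme Corp."))
print(extract_relations("Acme Corp. hired John Smith."))  # same fact, no match
```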

29
Extracting Relations via Semantic Analysis
Screenshot from a CCG demo: http://L2R.cs.uiuc.edu/cogcomp
  • Semantic parsing reveals several relations in
    the sentence along with their arguments.
  • This level of analysis, however, cannot abstract
    over the inherent variability in expressing the
    relations.
  • The relations Kill and Explode can each be expressed
    in many different ways.

30
Information Extraction with Background Knowledge (Constraints) [ACL'07]
  • Input: Lars Ole Andersen . Program analysis and
    specialization for the C Programming language.
    PhD thesis. DIKU , University of Copenhagen, May
    1994 .
  • Prediction result of a trained HMM (one segment per line):
      • Lars Ole Andersen . Program analysis and
      • specialization for the
      • C
      • Programming language
      • . PhD thesis .
      • DIKU , University of Copenhagen , May
      • 1994 .

Labels: AUTHOR, TITLE, EDITOR, BOOKTITLE,
TECH-REPORT, INSTITUTION, DATE
Violates lots of constraints!
31
Information Extraction with (Background Knowledge) Constraints [ACL'07]
  • Learn simple models.
  • Add constraints to improve model expressivity
    and get correct results!
      • AUTHOR: Lars Ole Andersen .
      • TITLE: Program analysis and
        specialization for the C Programming language .
      • TECH-REPORT: PhD thesis .
      • INSTITUTION: DIKU , University of Copenhagen ,
      • DATE: May, 1994 .
  • If constraints are incorporated into semi-supervised
    training, better results mean better feedback!
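The corrected output satisfies a constraint such as "each field is a single contiguous segment and appears at most once". This is an assumed example of the kind of background knowledge meant, not a quotation of the ACL'07 constraint set. The sketch counts violations of that one constraint. The gold labels follow the corrected segmentation above. The noisy prediction is invented, because the per-token labels of the slide's HMM output did not survive transcription. `contiguity_violations` is a hypothetical helper name.

```python
from itertools import groupby
from typing import List, Tuple

# Gold segmentation from the corrected output: (field label, whitespace-tokenized text).
FIELDS: List[Tuple[str, str]] = [
    ("AUTHOR", "Lars Ole Andersen ."),
    ("TITLE", "Program analysis and specialization for the C Programming language ."),
    ("TECH-REPORT", "PhD thesis ."),
    ("INSTITUTION", "DIKU , University of Copenhagen ,"),
    ("DATE", "May, 1994 ."),
]


def contiguity_violations(labels: List[str]) -> int:
    """Assumed constraint: each field is one contiguous run of tokens and
    appears at most once. A label split into k runs adds k - 1 violations."""
    runs = [label for label, _ in groupby(labels)]  # collapse consecutive repeats
    return len(runs) - len(set(runs))


tokens = [tok for _, text in FIELDS for tok in text.split()]
gold = [label for label, text in FIELDS for _ in text.split()]

# A made-up noisy prediction (NOT the slide's actual HMM output).
noisy = list(gold)
noisy[tokens.index("C")] = "BOOKTITLE"  # splits TITLE into two runs
noisy[tokens.index("DIKU")] = "AUTHOR"  # AUTHOR reappears later in the citation

print(contiguity_violations(gold), contiguity_violations(noisy))
```

Counts like these could serve as the (soft) constraint penalty in a constrained inference step. A decoder would then prefer labelings with fewer violations even when the HMM's local scores favor a fragmented segmentation.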

32
Why is it difficult?
Meaning
Language
33
The Reference Problem
The same problem exists with other types of
entities.

Document 1: The Justice Department has officially
ended its inquiry into the assassinations of John
F. Kennedy and Martin Luther King Jr., finding
"no persuasive evidence" to support conspiracy
theories, according to department documents. The
House Assassinations Committee concluded in 1978
that Kennedy was "probably" assassinated as the
result of a conspiracy involving a second gunman,
a finding that broke from the Warren Commission's
belief that Lee Harvey Oswald acted alone in
Dallas on Nov. 22, 1963.

Document 2: In 1953, Massachusetts Sen. John F. Kennedy
married Jacqueline Lee Bouvier in Newport, R.I. In 1960,
Democratic presidential candidate John F. Kennedy
confronted the issue of his Roman Catholic faith
by telling a Protestant group in Houston, "I do
not speak for my church on public matters, and
the church does not speak for me."

Document 3: David Kennedy was born in Leicester, England in
1959. Kennedy co-edited The New Poetry
(Bloodaxe Books 1993), and is the author of New
Relations: The Refashioning of British Poetry
1980-1994 (Seren 1996).
34
Entity/Concept Identification in Text
  • Goal: given names in text documents and their
    semantic types, identify the real-world entities they
    represent. This requires:
      • a similarity measure between names (entity-type
        dependent);
      • a way to group different-looking strings into one
        group;
      • a context-sensitive way to distinguish between
        identical/similar strings that represent
        different entities.
  • A generative model [Li, Morie, Roth, NAACL'04]
  • A discriminative approach [Li, Morie, Roth, AAAI'04]
  • Summary: AI Magazine Special Issue on Semantic
    Integration, 2005
  • Goal: semantic integration of text, databases, and
    institutional resources
      • Map concepts identified in text to entries in
        databases.
      • Construct/augment databases from textual
        information.
      • Aid discovery in text using existing knowledge
        bases.
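The three ingredients above can be sketched for PERSON names: a type-dependent compatibility rule, a grouping step, and a signal for when string evidence runs out and context is needed. This is a minimal illustration, not the generative (NAACL'04) or discriminative (AAAI'04) models cited above. The matching rule, the helper names (`compatible`, `group_mentions`), and the mention list are hypothetical, with the mentions loosely modeled on the Kennedy documents.

```python
from typing import List, Tuple


def _tokens(name: str) -> List[str]:
    # Lowercase and treat periods as separators: "J. F. Kennedy" -> ["j", "f", "kennedy"].
    return name.lower().replace(".", " ").split()


def compatible(a: str, b: str) -> bool:
    """Hypothetical PERSON-name rule: surnames (last token) must match, and
    position-aligned given names must agree, an initial matching any name it starts."""
    ta, tb = _tokens(a), _tokens(b)
    if ta[-1] != tb[-1]:
        return False
    return all(x.startswith(y) or y.startswith(x) for x, y in zip(ta[:-1], tb[:-1]))


def group_mentions(mentions: List[str]) -> Tuple[List[List[str]], List[str]]:
    """Greedy complete-link grouping: a mention joins a group only if it is
    compatible with every member. Mentions that fit several groups are returned
    as ambiguous, because string evidence alone cannot resolve them."""
    groups: List[List[str]] = []
    ambiguous: List[str] = []
    for m in mentions:
        homes = [g for g in groups if all(compatible(m, other) for other in g)]
        if not homes:
            groups.append([m])
        elif len(homes) == 1:
            homes[0].append(m)
        else:
            ambiguous.append(m)
    return groups, ambiguous


groups, ambiguous = group_mentions(
    ["John F. Kennedy", "J. F. Kennedy", "David Kennedy", "Kennedy", "John Kennedy"]
)
print(groups)
print(ambiguous)
```

The grouping produces two groups: {John F. Kennedy, J. F. Kennedy, John Kennedy} and {David Kennedy}. The bare "Kennedy" is flagged as ambiguous, which is exactly the case in Documents 1 and 3 where only context can decide. The greedy pass is also order-dependent, one reason the slide's approaches use learned models instead.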

35
Demo
Screenshot from a CCG demo: http://L2R.cs.uiuc.edu/cogcomp
More work on this problem: scaling up; integration with
DBs; temporal integration/inference
Related entities; context
36
MIAS Processes
  • Focused data retrieval and integration
      • Identify and collect relevant data from multiple
        sources
  • Semantic data enrichment: real-world entities and
    relations among them
      • Infer semantics from unstructured data and
        images
      • Identify real-world entities and relations among
        them
      • Extract attribute and relation features into a
        common framework (generalized graphs)
      • Relate them to existing institutional resources
        for information integration
  • Trend analysis
      • Tracking of events, social content, entities, and
        topics
  • Knowledge discovery, and hypothesis generation and
    verification
      • Construct the rich semantic structure and hidden
        networks of entity linkages
  • Multifaceted output
      • Information extraction
      • Allow semantics-based navigation and search across
        disparate data modalities
      • KR: multi-view representation of the information
        as input to visualization tools

Tools: text processing and analysis, semantic analysis,
information extraction, information integration, machine
learning, data mining, integrating text and images
37
MIAS - DSSI
  • Thank You
