Structured Text Retrieval Models

1
Structured Text Retrieval Models
2
Str. Text Retrieval
  • Text Retrieval retrieves documents based on index
    terms.
  • Observation: documents have implicit structure.
  • Regular text retrieval and indexing strategies
    lose the information available within the
    structure.
  • Retrieval based on structure is therefore
    desirable.
  • e.g. all documents with "George Bush" in the
    caption of a photo.
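
The caption example can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the document collection, the element names ("body", "caption"), and both function names are hypothetical and do not come from any of the models listed in these slides.

```python
# Hypothetical collection: each document is a list of (element, text) pairs.
docs = {
    "d1": [("body", "George Bush spoke in Texas."),
           ("caption", "A crowd in Austin")],
    "d2": [("body", "The summit opened today."),
           ("caption", "George Bush at the summit")],
}

def flat_search(docs, phrase):
    """Match the phrase anywhere in a document, ignoring structure."""
    return sorted(doc_id for doc_id, parts in docs.items()
                  if any(phrase in text for _, text in parts))

def structured_search(docs, phrase, element):
    """Match the phrase only inside the named structural element."""
    return sorted(doc_id for doc_id, parts in docs.items()
                  if any(name == element and phrase in text
                         for name, text in parts))

print(flat_search(docs, "George Bush"))
print(structured_search(docs, "George Bush", "caption"))
```

The flat search cannot tell the two documents apart, while the structured search keeps only the one whose caption contains the phrase.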

3
Models for Str. Text Retrieval
  • PAT Expressions
  • Overlapped Lists
  • Proximal Nodes
  • List of References
  • Tree-based
  • Query Languages (SFQL,CCL)

4
Proximal Nodes
  • By Gonzalo Navarro and Ricardo Baeza-Yates
  • Based on hierarchical structure of documents
  • Structure computation is static, and all
    structural elements are defined as nodes.
  • Model attempts to define operators on these nodes
    based on their definition and content.
  • Only nodes from a single hierarchy are returned
    as results.

5
Proximal Nodes
[Figure: example hierarchy with a Document node, two Chapter nodes, and three Section nodes]
6
Proximal Nodes
  • Nodes are structural in nature, e.g. Chapter,
    Section, etc.
  • Each node has a defined segment (a contiguous
    part of the text).
  • Operators are defined with respect to this
    model: structure operators and text operators.
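
A minimal sketch of a structural node and its segment, assuming character offsets are used to delimit segments. The class, its field names, and the sample text are hypothetical; the slides do not specify how the original model represents nodes.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    name: str    # structural type, e.g. "Chapter" or "Section"
    start: int   # offset of the first character of the segment
    end: int     # offset one past the last character of the segment
    children: list = field(default_factory=list)

    def segment(self, text):
        """Return the contiguous part of the text covered by this node."""
        return text[self.start:self.end]

    def contains(self, other):
        """True if the other node's segment lies inside this node's segment."""
        return self.start <= other.start and other.end <= self.end

text = "Intro text. Body text."
sec1 = Node("Section", 0, 11)
sec2 = Node("Section", 12, 22)
chapter = Node("Chapter", 0, 22, [sec1, sec2])

print(sec2.segment(text))
print(chapter.contains(sec1))
```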

7
Proximal Nodes
  • Structure Operators
      • Name
      • Inclusion
      • Positional Inclusion
      • Distance operators
      • Child/Parent operators
      • Set Manipulation operators
  • Text Operators
      • Match
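
Three of these operators (name, match, and inclusion) can be sketched as follows. This is a simplified illustration, not the operators as defined by Navarro and Baeza-Yates: the flat node list, the function names, and the sample text are all assumptions.

```python
from collections import namedtuple

# Hypothetical node record: structural name plus segment offsets [start, end).
Node = namedtuple("Node", "name start end")

TEXT = "Storms hit the coast. Calm seas returned."
NODES = [
    Node("Chapter", 0, 41),
    Node("Section", 0, 21),
    Node("Section", 22, 41),
]

def by_name(nodes, name):
    """Name operator: select the nodes of one structural type."""
    return [n for n in nodes if n.name == name]

def match(text, word):
    """Text operator: return the (start, end) span of each occurrence."""
    spans, pos = [], text.find(word)
    while pos != -1:
        spans.append((pos, pos + len(word)))
        pos = text.find(word, pos + 1)
    return spans

def including(nodes, spans):
    """Inclusion operator: keep nodes whose segment covers at least one span."""
    return [n for n in nodes
            if any(n.start <= s and e <= n.end for s, e in spans)]

# Sections that contain the word "seas".
print(including(by_name(NODES, "Section"), match(TEXT, "seas")))
```

Composing the operators this way mirrors the idea of combining structure operators with a text operator in a single query.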

8
Retrieval on Evidence
  • By Mounia Lalmas
  • Based on documents made up of objects.
  • Objects are modeled as independent entities and
    can be in different media, language or locations.
  • Document indexing carries a degree of uncertainty
    about whether the index term actually represents
    the object.
  • This uncertainty must be captured to get better
    results.
  • The model uses the Dempster-Shafer theory of
    evidence to do so.
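
For readers unfamiliar with the Dempster-Shafer theory, the sketch below shows its standard building blocks: mass functions, Dempster's rule of combination, and the belief function. It is a generic illustration; the slides do not say how the retrieval model applies the theory, and the index terms and mass values are made up.

```python
from itertools import product

def combine(m1, m2):
    """Dempster's rule of combination.

    Each mass function maps a frozenset of hypotheses to a mass,
    with the masses summing to 1.
    """
    combined, conflict = {}, 0.0
    for (a, wa), (b, wb) in product(m1.items(), m2.items()):
        common = a & b
        if common:
            combined[common] = combined.get(common, 0.0) + wa * wb
        else:
            conflict += wa * wb   # mass assigned to contradictory evidence
    if conflict >= 1.0:
        raise ValueError("sources are in total conflict")
    # Renormalise so that the remaining masses sum to 1.
    return {h: w / (1.0 - conflict) for h, w in combined.items()}

def belief(m, hypothesis):
    """Total mass committed to subsets of the hypothesis."""
    return sum(w for h, w in m.items() if h <= hypothesis)

WINE = frozenset({"wine"})
EITHER = frozenset({"wine", "beer"})   # mass here expresses ignorance

m1 = {WINE: 0.6, EITHER: 0.4}   # hypothetical evidence from one source
m2 = {WINE: 0.5, EITHER: 0.5}   # hypothetical evidence from another

m = combine(m1, m2)
print(m, belief(m, WINE))
```

Mass left on the whole set {"wine", "beer"} is how the theory represents uncommitted evidence, which is what distinguishes it from assigning plain probabilities to index terms.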

9
Retrieval on Evidence
  • Model takes into consideration disparity between
    indexing vocabularies.
  • It aggregates the indexing vocabularies and also
    the uncertainty.
  • For objects o ∈ O and types t ∈ T, the function
    type is defined as type: O → ℘(T), i.e. each
    object is assigned a set of types.
  • Aggregation is defined over objects: a composite
    object's types include all the types of the
    objects it contains.
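
The aggregation of types over composite objects can be sketched as a recursive union. The dictionary layout, the key names "types" and "parts", and the type names are hypothetical choices made for this illustration.

```python
def types_of(obj):
    """Return the set of types of an object (a subset of T).

    A composite object carries its own types plus the types of
    every object it contains, at any depth.
    """
    result = set(obj.get("types", ()))
    for part in obj.get("parts", ()):
        result |= types_of(part)
    return result

caption = {"types": {"english_text"}}
photo = {"types": {"image"}}
figure = {"parts": [caption, photo]}
section = {"types": {"french_text"}, "parts": [figure]}

print(types_of(figure))
print(types_of(section))
```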

10
Retrieval on Evidence
  • Indexing vocabulary is defined over a
    proposition space, e.g. Wine (english, text),
    Blue (colour, feature).
  • The sentence space defines which index terms from
    the same proposition space can be used together.
  • Semantics across indexing vocabularies are
    maintained using the notion of worlds.

11
Retrieval on Evidence
  • Each type t has S_t, W_t, v_t, π_t
  • S_t is the sentence space for the type
  • W_t is the set of possible worlds associated
    with S_t
  • v_t maps W_t × P_t to {true, false}, where P_t
    is the proposition space of the type
  • π_t maps W_t × S_t to {true, false}
  • Logical relations between sentences, such as
    equivalence, are built around the notion of
    their semantics agreeing in all or most worlds.
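
A small sketch of equivalence over possible worlds, assuming each world is just the set of propositions true in it. The worlds, the proposition names, and all function names are hypothetical, and the "most worlds" case is approximated here by a simple agreement fraction, which is an assumption rather than the model's definition.

```python
# Hypothetical worlds: each world is the set of propositions true in it.
WORLDS = [
    frozenset(),
    frozenset({"wine"}),
    frozenset({"red"}),
    frozenset({"wine", "red"}),
]

def prop(name):
    """Sentence that is true in exactly the worlds containing the proposition."""
    return lambda world: name in world

def both(s1, s2):
    """Conjunction of two sentences."""
    return lambda world: s1(world) and s2(world)

def agreement(s1, s2, worlds):
    """Fraction of worlds in which the two sentences have the same truth value."""
    return sum(s1(w) == s2(w) for w in worlds) / len(worlds)

def equivalent(s1, s2, worlds):
    """Sentences are equivalent when they agree in every world."""
    return agreement(s1, s2, worlds) == 1.0

wine, red = prop("wine"), prop("red")

print(equivalent(both(wine, red), both(red, wine), WORLDS))
print(agreement(wine, both(wine, red), WORLDS))
```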

12
Retrieval on Evidence
  • However, the uncertainty of the representation
    remains.
  • This is represented by the weighting function
    based on the Dempster-Shafer model.
  • These objects and their syntactic and semantic
    models are aggregated for the objects which
    contain them. E.g. a section containing sentences
    indexed by terms a, b, c, d, ... will be
    equivalent to sentences over the worlds that also
    imply a, b, c, d.

13
Comparisons
  • Proximal Nodes is based on structured documents.
    It presents the matter clearly and provides
    approaches towards building a software
    architecture. It presents findings of conducted
    experiments.
  • The Evidence paper tries to model heterogeneous
    documents, made up of different media, languages,
    etc. Overall the model is complex, and no results
    are given on its implementation or performance.