Structured Text Retrieval Models

Slides: 14

Provided by: A83480

1
Structured Text Retrieval Models
2
Str. Text Retrieval

Text retrieval retrieves documents based on index terms.
Observation: documents have implicit structure.
Regular text retrieval and indexing strategies lose the information available within the structure.
Retrieval based on structure is therefore desired.
E.g., all documents having "George Bush" in the caption of a photo.
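
The contrast between flat and structure-aware retrieval can be sketched as follows. This is only an illustration: the toy collection, the element names (`title`, `photo_caption`) and both function names are assumptions, not part of the slides.

```python
# Hypothetical toy collection: each document is a list of (element, text) pairs.
docs = {
    "d1": [("title", "White House briefing"),
           ("photo_caption", "George Bush at the podium")],
    "d2": [("title", "George Bush visits Texas"),
           ("photo_caption", "Crowd outside the venue")],
}

def flat_search(phrase):
    """Ignore structure: match the phrase anywhere in the document."""
    p = phrase.lower()
    return sorted(d for d, parts in docs.items()
                  if any(p in text.lower() for _, text in parts))

def structured_search(phrase, element):
    """Use structure: match the phrase only inside the named element."""
    p = phrase.lower()
    return sorted(d for d, parts in docs.items()
                  if any(el == element and p in text.lower()
                         for el, text in parts))
```

Here the flat search returns both documents for "George Bush", while the structured search restricted to `photo_caption` returns only `d1`.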

3
Models for Str. Text Retrieval

4
Proximal Nodes

By Gonzalo Navarro and Ricardo Baeza-Yates.
Based on the hierarchical structure of documents.
Structure computation is static, and all structural elements are defined as nodes.
The model attempts to define operators on these nodes based on their definition and content.
Only nodes from one particular hierarchy are returned as results.
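
A minimal sketch of operators over a node hierarchy, under stated assumptions: the `Node` class and the operator names `of_kind`, `containing` and `inside` are hypothetical and do not reproduce the query syntax of the Proximal Nodes paper. They only illustrate selecting nodes by structural definition, by content, and by inclusion.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    kind: str            # structural type, e.g. "chapter" or "section"
    text: str = ""       # text owned directly by this node
    children: list = field(default_factory=list)

    def walk(self):
        """Yield this node and all of its descendants."""
        yield self
        for child in self.children:
            yield from child.walk()

    def full_text(self):
        """Text of this node together with all of its descendants."""
        return " ".join(n.text for n in self.walk() if n.text)

def of_kind(root, kind):
    """Select nodes by their structural definition."""
    return [n for n in root.walk() if n.kind == kind]

def containing(nodes, word):
    """Keep nodes whose content mentions the word."""
    return [n for n in nodes if word.lower() in n.full_text().lower().split()]

def inside(nodes, ancestors):
    """Keep nodes that are proper descendants of some ancestor node."""
    return [n for n in nodes
            if any(n is not a and any(n is d for d in a.walk())
                   for a in ancestors)]

# Illustrative document: two chapters with sections.
doc = Node("document", children=[
    Node("chapter", "Indexing", [Node("section", "inverted files"),
                                 Node("section", "suffix arrays")]),
    Node("chapter", "Querying", [Node("section", "boolean queries")]),
])

# Sections inside chapters that mention "indexing".
hits = inside(of_kind(doc, "section"),
              containing(of_kind(doc, "chapter"), "indexing"))
```

The result contains only section nodes, which mirrors the restriction that answers come from a single set of structural elements.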

5
Proximal Nodes
[Figure: example document hierarchy with a Document root, two Chapter nodes, and three Section nodes]
6
Proximal Nodes

7
Proximal Nodes

8
Retrieval on Evidence

By Mounia Lalmas.
Based on documents made up of objects.
Objects are modeled as independent entities and can be in different media, languages or locations.
Document indexing: there is a degree of uncertainty as to whether the index term actually represents the object.
This uncertainty must be captured to get better results.
The model uses the Dempster-Shafer theory of evidence.
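
As a rough illustration of how Dempster-Shafer theory expresses uncertainty, the sketch below computes belief and plausibility from a mass function. It shows the generic theory only; the example terms and mass values are invented, and the slides do not say how the paper assigns masses to index terms.

```python
def belief(mass, hypothesis):
    """Sum the mass of every focal set fully contained in the hypothesis."""
    return sum(m for focal, m in mass.items() if focal <= hypothesis)

def plausibility(mass, hypothesis):
    """Sum the mass of every focal set that overlaps the hypothesis."""
    return sum(m for focal, m in mass.items() if focal & hypothesis)

# Hypothetical evidence about which term describes an object:
# 0.6 committed to "wine", 0.4 left uncommitted over the whole frame.
frame = frozenset({"wine", "beer"})
mass = {frozenset({"wine"}): 0.6, frame: 0.4}

wine = frozenset({"wine"})
beer = frozenset({"beer"})
```

Belief is the evidence that definitely supports a hypothesis, and plausibility is the evidence that does not contradict it, so the uncommitted mass widens the gap between the two.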

9
Retrieval on Evidence

The model takes into consideration the disparity between indexing vocabularies.
It aggregates the indexing vocabulary and also the uncertainty.
For an object o ∈ O and a type t ∈ T, the function type is defined as type: O → ℘(T).
Aggregation is defined over objects, and composite object types contain all the types of the contained objects.
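
The statement that a composite object carries all the types of its contained objects can be sketched as a recursive union. The object names, type names and the two dictionaries below are assumptions made for illustration.

```python
def types_of(obj, base_types, parts):
    """Return the set of types of obj: its own plus those of all contained objects."""
    result = set(base_types.get(obj, ()))
    for part in parts.get(obj, ()):
        result |= types_of(part, base_types, parts)
    return frozenset(result)

# Hypothetical objects: a section made of a paragraph and an image.
base_types = {
    "para1": {"english_text"},
    "img1": {"colour_feature"},
}
parts = {"section1": ["para1", "img1"]}

section_types = types_of("section1", base_types, parts)
```

In this sketch each object maps to a set of types, so `section1` ends up with both `english_text` and `colour_feature`.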

10
Retrieval on Evidence

The indexing vocabulary is defined over a proposition space, e.g. Wine (english, text), Blue (colour, feature).
The sentence space defines that indexes in the same proposition space can be used together.
The semantics of the indexing vocabulary is maintained using the notion of worlds.

11
Retrieval on Evidence

Each type t has S_t, W_t, v_t, π_t.
S_t is the sentence space for the type.
W_t is the set of possible worlds associated with S_t.
v_t: W_t × P_t → {true, false}, where P_t is the proposition space.
π_t: W_t × S_t → {true, false}.
Logical implication and equivalence between sentences are built around the notion of their semantics being equivalent in all or most worlds.
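
A small sketch of comparing two sentences by their truth values across worlds. It assumes worlds are truth assignments over a handful of propositions and sentences are Python predicates over a world; both the representation and the function names are illustrative, not the paper's formalism.

```python
from itertools import product

def all_worlds(propositions):
    """Enumerate every truth assignment over the given propositions."""
    for values in product([True, False], repeat=len(propositions)):
        yield dict(zip(propositions, values))

def agreement(s1, s2, worlds):
    """Fraction of worlds in which two sentences have the same truth value."""
    worlds = list(worlds)
    same = sum(1 for w in worlds if s1(w) == s2(w))
    return same / len(worlds)

props = ["wine", "red"]
red_wine = lambda w: w["wine"] and w["red"]
red_wine_alt = lambda w: not (not w["wine"] or not w["red"])  # same meaning
just_wine = lambda w: w["wine"]

full = agreement(red_wine, red_wine_alt, all_worlds(props))
partial = agreement(red_wine, just_wine, all_worlds(props))
```

An agreement of 1.0 corresponds to equivalence in all worlds; a threshold below 1.0 would be one possible reading of "most worlds".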

12
Retrieval on Evidence

However, the uncertainty of the representation remains.
This is represented by a weighting function based on the Dempster-Shafer model.
These objects and their syntactic and semantic models are aggregated for the objects which contain them. E.g., a section containing sentences indexed by terms a, b, c, d, ... will be equivalent to sentences over the worlds that also imply a, b, c, d.
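
One standard way to merge two bodies of evidence in Dempster-Shafer theory is Dempster's rule of combination, sketched below. The slides do not state the exact aggregation formula the paper uses, so treat this as an assumption; the mass values are invented.

```python
def combine(m1, m2):
    """Combine two mass functions with Dempster's rule of combination."""
    combined = {}
    conflict = 0.0
    for a, x in m1.items():
        for b, y in m2.items():
            inter = a & b
            if inter:
                # Agreeing evidence goes to the intersection of the focal sets.
                combined[inter] = combined.get(inter, 0.0) + x * y
            else:
                # Contradictory evidence is set aside and normalised away.
                conflict += x * y
    if conflict >= 1.0:
        raise ValueError("total conflict: sources cannot be combined")
    return {k: v / (1.0 - conflict) for k, v in combined.items()}

# Hypothetical evidence from two contained objects over terms a and b.
frame = frozenset({"a", "b"})
m1 = {frozenset({"a"}): 0.5, frame: 0.5}
m2 = {frozenset({"b"}): 0.5, frame: 0.5}

merged = combine(m1, m2)
```

With these inputs a quarter of the joint mass is conflicting, and the remaining mass is renormalised so that the combined masses again sum to 1.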

13
Comparisons

Proximal Nodes is based on structured documents. The paper presents the matter clearly and provides approaches towards building a software architecture. It also presents findings from conducted experiments.
The Evidence paper tries to model heterogeneous documents, made up of different media, languages, etc. Overall the model is complex, and no results are given on its implementation or performance.
