Dynamic Element Retrieval in a Structured Environment

1
Dynamic Element Retrieval in a Structured
Environment
  • Crouch, Carolyn J.
  • University of Minnesota Duluth, MN
  • October 1, 2006

2
Key Problems
  • Retrieval of elements at desired level of
    granularity
  • Assigning a rank order to each element that
    reflects its perceived relevance to the query

3
Retrieval Environment
  • Vector Space Model
  • INEX Environment
  • Flexible Retrieval

4
Vector Space Model
  • Document Indexing
  • Term Weighting
  • Similarity Coefficients

5
INEX: Initiative for the Evaluation of XML
Retrieval
  • INEX provides an environment for experiments in
    structured retrieval
  • Traditionally contains two types of topics: CO
    (content-only) and CAS (content-and-structure)
  • Both INEX 2004 and 2005 utilize an evaluation
    measure known as inex-eval
  • Recall (the proportion of relevant information
    that is retrieved) and Precision (the proportion
    of retrieved items that are relevant)
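
The two definitions above can be illustrated with a small Python sketch. It assumes binary relevance judgments, which is a simplification: inex-eval itself works from graded assessments, so this shows only the basic ratios. The function name `precision_recall` and the element ids are hypothetical.

```python
def precision_recall(ranked, relevant, k):
    """Precision and recall over the top-k items of a ranked list.

    ranked   -- list of retrieved item ids, best first
    relevant -- set of ids judged relevant (binary judgments, an assumption)
    """
    top_k = ranked[:k]
    hits = sum(1 for item in top_k if item in relevant)
    precision = hits / len(top_k) if top_k else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical ranked output and relevance judgments.
ranked = ["e1", "e2", "e3", "e4", "e5"]
relevant = {"e1", "e4", "e9"}
p, r = precision_recall(ranked, relevant, 5)
```

Here two of the five retrieved items are relevant, and two of the three relevant items are retrieved.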

6
Flexible Retrieval System
  • System processes XML documents
  • Smart format (Salton's Magic Automatic Retriever
    of Text)
  • Lnu-ltu term weighting

7
A Method for Flexible Retrieval
  • Input to Flexible Retrieval
  • Construction of the Document Tree
  • Ranking of Elements
  • Output of Flexible Retrieval

8
Input to Flexible Retrieval
  • Preorder traversal
  • Ranked terminal leaf nodes (paragraphs)
  • Generate document tree (schema and paragraphs)
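
A preorder traversal of a document tree can be sketched as follows. This is an illustration only, not the system's parser: the tag names (`article`, `sec`, `p`), the sample document, and the helper name `preorder` are assumptions, and the xpath-like strings are built by hand.

```python
import xml.etree.ElementTree as ET


def preorder(elem, path):
    """Yield (path, element) pairs, visiting each parent before its children."""
    yield path, elem
    seen = {}  # per-tag sibling counter, used to build positional paths
    for child in elem:
        seen[child.tag] = seen.get(child.tag, 0) + 1
        yield from preorder(child, f"{path}/{child.tag}[{seen[child.tag]}]")


# Hypothetical document with two sections and three paragraphs.
doc = ET.fromstring(
    "<article><sec><p>xml retrieval</p><p>vector space</p></sec>"
    "<sec><p>term weighting</p></sec></article>"
)
nodes = list(preorder(doc, "/article[1]"))
paths = [p for p, _ in nodes]
# Terminal (leaf) nodes are the elements without children, here the paragraphs.
leaves = [p for p, e in nodes if len(e) == 0]
```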

9
Document Tree
10
Construction of the Document Tree
  • Schema determines the document tree
  • Calculate Lnu-ltu term weights

11
Ranking of Elements
  • Address ranking issues with Lnu-ltu term
    weighting
  • Length and normalization issues
  • Pivot and slope

12
Simple structured document
13
Lnu (weight of element vector formula)
\[
\mathrm{Lnu} = \frac{\dfrac{1 + \log(\text{term frequency})}{1 + \log(\text{average term frequency})}}{(1 - \text{slope}) + \text{slope} \times \dfrac{\text{number unique terms}}{\text{pivot}}}
\]
14
Ltu (weighting of query terms formula)
\[
\mathrm{Ltu} = \frac{\bigl(1 + \log(\text{term frequency})\bigr) \times \log\left(\dfrac{N}{n_k}\right)}{(1 - \text{slope}) + \text{slope} \times \dfrac{\text{number unique terms}}{\text{pivot}}}
\]
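
The query-term weight can be sketched the same way. The function name `ltu_weight` and its parameter names are assumptions, as is the natural logarithm. N is taken here to be the number of elements in the collection being searched and n_k the number of those elements that contain the term, which is the usual reading of an inverse-document-frequency factor.

```python
import math


def ltu_weight(tf, n_total, n_k, n_unique, slope, pivot):
    """Ltu weight of one query term.

    (1 + log(tf)) * log(N / n_k), divided by
    (1 - slope) + slope * (n_unique / pivot).
    """
    if tf <= 0 or n_k <= 0:
        return 0.0  # assumption: absent or unseen terms get weight 0
    numerator = (1 + math.log(tf)) * math.log(n_total / n_k)
    denominator = (1 - slope) + slope * (n_unique / pivot)
    return numerator / denominator


# A term found in every element carries no weight; a rare term carries more.
common = ltu_weight(tf=1, n_total=1000, n_k=1000, n_unique=5, slope=0.2, pivot=5)
rare = ltu_weight(tf=1, n_total=1000, n_k=10, n_unique=5, slope=0.2, pivot=5)
```
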
15
Overview of flexible retrieval
1. Parse to extract leaf nodes from the original XML documents
2. Index leaf nodes and queries using Smart
3. Perform Smart retrieval to get highly correlated leaf nodes
16
Overview of flexible retrieval (cont.)
4. For each document containing a retrieved leaf node
   a. Get its document schema
   b. Generate vector representations for inner nodes (elements)
5. For each term in the query
   a. Get its inverted file entry and corresponding xpaths
   b. Find nk at all levels
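
Steps 4b and 5b can be pictured with a toy in-memory sketch. It assumes that an inner node's term frequencies are the sum of those of the leaf nodes beneath it, and it counts nk only over the vectors it is given, whereas the system described works from the inverted file for the whole collection. The helper names `inner_node_tf` and `nk_by_level` and the sample paths are hypothetical.

```python
from collections import Counter


def inner_node_tf(leaf_tf):
    """Sum leaf term frequencies into every node on the path to the root."""
    vectors = {}
    for path, tf in leaf_tf.items():
        parts = path.strip("/").split("/")
        for depth in range(1, len(parts) + 1):
            node = "/" + "/".join(parts[:depth])
            vectors.setdefault(node, Counter()).update(tf)
    return vectors


def nk_by_level(vectors, term):
    """Count, per tree level, the nodes whose vector contains the term."""
    counts = Counter()
    for path, tf in vectors.items():
        if tf[term] > 0:
            counts[path.count("/")] += 1  # level 1 is the document root
    return counts


# Hypothetical leaf (paragraph) term frequencies for one document.
leaf_tf = {
    "/article[1]/sec[1]/p[1]": Counter({"xml": 2, "retrieval": 1}),
    "/article[1]/sec[1]/p[2]": Counter({"xml": 1}),
    "/article[1]/sec[2]/p[1]": Counter({"retrieval": 3}),
}
vectors = inner_node_tf(leaf_tf)
xml_nk = nk_by_level(vectors, "xml")
```
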
17
Output of Flexible Retrieval
  • Equivalent to all-element index

18
Experiments in flexible retrieval
  • Factors of interest
  • Experiments and results

19
Factors of interest
  • Slope and pivot during Lnu-ltu term weighting
  • The value of n (number of paragraphs)

20
Experiments and Results
  • Attendant file sizes (dictionary, inverted index,
    element vectors) reduced by 60%, 50% and 50%
    respectively
  • 30-40% less storage than all-element index
  • Is dynamic element retrieval cost-effective?

21
Conclusion
  • Similar work (Grabs and Schek)
  • Exhaustivity dependent
  • Progress in specificity

22
Researchers
  • Grabs and Schek (similar work to flexible
    retrieval)
  • Gövert et al. (term weights are multiplied by a
    collection-dependent augmentation factor as they
    are propagated up the document tree)
  • Mass et al. (maintain separate indices for
    elements at different levels of granularity;
    solves issues of distorted statistics)

23
Overview of flexible retrieval (cont.)
6. Correlate element vectors at each level with query
7. Return ranked list of elements
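
Steps 6 and 7 can be sketched as an inner product between already weighted vectors followed by a sort. This assumes that length normalization is carried by the term weights themselves, so no further normalization is applied, and it ranks all elements in one list rather than level by level. The function name `rank_elements` and the sample weights are hypothetical.

```python
def rank_elements(element_vectors, query_vector):
    """Score each element by inner product with the query, best first."""
    scores = []
    for path, weights in element_vectors.items():
        score = sum(w * query_vector.get(term, 0.0) for term, w in weights.items())
        if score > 0:  # elements sharing no term with the query are dropped
            scores.append((path, score))
    return sorted(scores, key=lambda item: item[1], reverse=True)


# Hypothetical weighted element vectors and query vector.
element_vectors = {
    "/article[1]": {"xml": 0.4, "retrieval": 0.5},
    "/article[1]/sec[1]": {"xml": 0.9, "retrieval": 0.3},
    "/article[1]/sec[2]": {"weighting": 0.7},
}
query_vector = {"xml": 1.0, "retrieval": 0.5}
ranking = rank_elements(element_vectors, query_vector)
```
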
24
Table I
               INEX 2004           INEX 2005
articles          12,107              16,440
sections          69,577              94,421
subsections       77,397             104,746
paragraphs     1,029,747           1,378,202
elements       1,188,828           1,593,809
CO topics      40 (34 assessed)    40 (29 assessed)
25
Table II. Comparison of All-Element and Flexible
Retrieval under Inex-Eval (Generalized)

Precision at Rank
                 2004                     2005
Rank    All Element   Flexible   All Element   Flexible
1          0.3897      0.3971       0.4224      0.4224
5          0.3088      0.2882       0.3241      0.3413
10         0.2735      0.2669       0.2991      0.2991
20         0.2529      0.2390       0.2841      0.2939
25         0.2456      0.2379       0.2669      0.2800
50         0.2000      0.1972       0.2364      0.2366
100        0.1523      0.1501       0.1921      0.1920
500        0.0697      0.0697       0.0943      0.0949
1500       0.0353      0.0362       0.0472      0.0483
26
Table II (cont.)

Precision at Various Points of Recall
                   2004                     2005
Recall    All Element   Flexible   All Element   Flexible
0.01         0.3395      0.3348       0.3562      0.3693
0.25         0.0971      0.0951       0.1131      0.1165
0.50         0.0257      0.0283       0.0385      0.0404
0.75         0.0017      0.0017       0.0097      0.0095
1.00         0.0013      0.0013       0.0015      0.0015
avg prec     0.0625      0.0620       0.0739      0.0750

27
Table III. Comparison of All-Element and Flexible
Retrieval under Inex-Eval (Strict)

Precision at Rank
                 2004                     2005
Rank    All Element   Flexible   All Element   Flexible
1          0.2000      0.2000       0.1481      0.1481
5          0.1440      0.1200       0.0667      0.0741
10         0.1240      0.1200       0.0852      0.0778
20         0.1120      0.1020       0.0815      0.0815
25         0.1024      0.0992       0.0800      0.0830
50         0.0898      0.0832       0.0689      0.0681
100        0.0628      0.0608       0.0511      0.0500
500        0.0268      0.0259       0.0219      0.0217
1500       0.0141      0.0143       0.0096      0.0097

28
Table III (cont.)

Precision at Various Points of Recall
                   2004                     2005
Recall    All Element   Flexible   All Element   Flexible
0.01         0.2134      0.2115       0.1521      0.1535
0.25         0.1006      0.1070       0.0540      0.0515
0.50         0.0411      0.0394       0.0156      0.0191
0.75         0.0166      0.0159       0.0103      0.0104
1.00         0.0042      0.0044       0.0046      0.0048
avg prec     0.0586      0.0577       0.0318      0.0335