COSC 6341 Information Retrieval Project Presentation - PowerPoint PPT Presentation

Slides: 45
Provided by: www12
Learn more at: https://www2.cs.uh.edu
Transcript and Presenter's Notes

1
COSC 6341 Information Retrieval Project
Presentation
2
Dissemination of Collection Wide Information in a
Distributed Information Retrieval
System (Original Authors: C.L. Viles and J.C.
French) by Parmav Ghosh
3
What is Collection Wide Information (CWI)?
4
(No Transcript)
5
(No Transcript)
6
(No Transcript)
7
What is Dissemination?
8
Dissemination
  • Spreading
  • Making known
  • Distribution

9
(No Transcript)
10
(No Transcript)
11
The Distributed Archive Model
12
(No Transcript)
13
Examples of Distributed Archive Models
  • A distributed technical report archive in a
    university
  • Documentation for a large distributed project
    like NASA's Mission to Planet Earth

14
Some Terms
  • s = the number of sites the distributed archive is
    composed of
  • D_ij = the j-th document at site i
  • C_il = the ordered collection of documents
    physically stored at site i (site i's local collection)

15
  • C_ig = the collection of documents that has been
    used to generate site i's version of the CWI; this
    version is also called G_i
  • G_i = f(C_ig)

16
(No Transcript)
17
Dissemination of CWI
  • N = the total number of documents in the
    collection
  • df_k = the document frequency of the k-th term
18
  • N_i = the total number of documents at the i-th
    site
  • df_ik = the document frequency of the k-th
    term at the i-th site
  • s = the total number of sites in the archive
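
To make the relationship between the per-site and collection-wide quantities concrete, here is a minimal Python sketch. It assumes that every document is stored at exactly one site, so that N is the sum of the N_i and df_k is the sum of the df_ik. The function names and the representation of a document as a list of terms are illustrative assumptions, not part of the original slides.

```python
from collections import Counter


def site_statistics(docs):
    """Return (N_i, df_i) for one site; docs is a list of term lists (assumed format)."""
    df = Counter()
    for terms in docs:
        df.update(set(terms))  # count each term at most once per document
    return len(docs), df


def collection_statistics(sites):
    """Combine per-site statistics into collection-wide N and df_k.

    Assumes each document lives at exactly one site, so the sums do not
    double-count anything.
    """
    total_n = 0
    total_df = Counter()
    for docs in sites:
        n_i, df_i = site_statistics(docs)
        total_n += n_i
        total_df += df_i
    return total_n, total_df
```

Calling `collection_statistics` on a list of sites returns the pair (N, df), where `df[term]` plays the role of df_k.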
19
Dissemination Model
20
  • Prefix(d, C_il) = the first fraction d of C_il
  • d = the degree of dissemination of the CWI
  • d varies from 0 to 1

21
(No Transcript)
22
  • At any point in time, site i knows about all of
    its own documents plus Prefix(d, C_jl) for all j ≠ i
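
The dissemination model can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the prefix length is taken as the floor of d times the collection size (the slides do not say how fractions are rounded), sites are indexed from 0, and the names `prefix` and `known_documents` are hypothetical.

```python
def prefix(d, collection):
    """First fraction d of an ordered collection (assumed: floor of d * size)."""
    if not 0.0 <= d <= 1.0:
        raise ValueError("d must lie in [0, 1]")
    return collection[: int(d * len(collection))]


def known_documents(i, d, local_collections):
    """Documents site i knows about: all of its own plus a prefix from every other site."""
    known = list(local_collections[i])
    for j, other in enumerate(local_collections):
        if j != i:
            known.extend(prefix(d, other))
    return known
```

With d = 0 a site sees only its own holdings, and with d = 1 it sees every document in the archive; intermediate values give a partial view.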

23
Special Cases
  • When d = 0, no dissemination occurs and G_i is
    derived from local holdings only.
  • When 0 < d < 1, G_i is derived partly from local
    holdings and partly from documents elsewhere.
  • When d = 1, complete dissemination occurs.

24
(No Transcript)
25
Document Allocation
26
(No Transcript)
27
Assumptions
  • Documents relevant to the same query are relevant to
    each other
  • Each query is assigned to a random home site
    QHome(Q)

28
  • Documents are assigned to sites based on three
    pieces of information:
  • Relevance information
  • QHome()
  • An affinity probability a

29
  • If document D is relevant to query Q, then D is
    assigned to QHome(Q) with probability a, and is
    assigned at random across all sites with
    probability (1 - a).
  • The actual probability that D is assigned to QHome(Q) is
    a + (1 - a)/s, because the random assignment can also
    select QHome(Q).

30
Allocation Pseudo Code
D = getNextDocFromStream()
if (relevantQueryForDoc(D) and Bernoulli(a))
    Q = findRelevantQuery(D)
    assignedSite = QHome(Q)
else
    assignedSite = Equilikely(1, numSites)
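
A runnable Python rendering of the pseudocode might look like the sketch below. The dictionaries `relevant_query` (document to query) and `q_home` (query to home site) are assumed stand-ins for relevantQueryForDoc, findRelevantQuery, and QHome; Bernoulli(a) is approximated with `rng.random() < a` and Equilikely(1, numSites) with `rng.randint(1, num_sites)`.

```python
import random


def allocate(doc, relevant_query, q_home, a, num_sites, rng):
    """Return the site (1..num_sites) that doc is assigned to.

    relevant_query and q_home are hypothetical lookup tables; a is the
    affinity probability; rng is a random.Random instance.
    """
    query = relevant_query.get(doc)  # None if doc is relevant to no query
    if query is not None and rng.random() < a:  # Bernoulli(a)
        return q_home[query]
    return rng.randint(1, num_sites)  # Equilikely(1, numSites), both ends inclusive
```

Passing the random generator explicitly keeps the allocation reproducible, for example `allocate("D1", {"D1": "Q1"}, {"Q1": 2}, 0.5, 5, random.Random(0))`.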
31
Special Cases
  • When a = 0, documents are randomly allocated across
    all sites.
  • When a = 1, documents relevant to the same query
    are co-located.

32
Probability that a document D will be assigned to
each site, given 5 sites, D relevant to Q, and
QHome(Q) = 2, for affinities of 0.0 and 0.5
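
The probabilities for this example follow directly from the allocation rule: every site receives (1 - a)/s from the random branch, and the home site receives an additional a. A small Python sketch, with the hypothetical function name `assignment_probabilities`, computes them for any a, s, and home site.

```python
def assignment_probabilities(a, num_sites, home_site):
    """Probability that a relevant document lands on each site (sites numbered from 1)."""
    base = (1 - a) / num_sites  # share from the uniformly random branch
    return {
        site: base + (a if site == home_site else 0.0)
        for site in range(1, num_sites + 1)
    }
```

For 5 sites and QHome(Q) = 2, an affinity of 0.0 gives 0.2 for every site, while an affinity of 0.5 gives 0.6 for site 2 and 0.1 for each of the other four sites.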
33
Iso-knowledge lines
34
Using the equation discussed earlier and
knowledge of the size of the local collection, we
can determine the total proportion (k_i) of
documents represented by C_ig
35
Let c_i = N_i / N be the fraction of all the
documents held at site i; then
36
When local collections are all of the same size,
c_i = 1 / s and the global k is defined by
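
One reconstruction consistent with the definitions above is sketched here. It assumes that site i knows all N_i of its own documents plus exactly a fraction d of the N - N_i documents held elsewhere (ignoring rounding of prefix lengths), and that no document is stored at more than one site. The expressions are derived from those assumptions rather than quoted from the original slides.

```latex
\begin{align*}
|C_{ig}| &= N_i + d\,(N - N_i) \\
k_i &= \frac{|C_{ig}|}{N} = c_i + d\,(1 - c_i), \qquad c_i = \frac{N_i}{N} \\
k &= \frac{1}{s} + d\left(1 - \frac{1}{s}\right) = \frac{1 + d\,(s - 1)}{s}
  \quad \text{when } c_i = \frac{1}{s} \text{ for all } i
\end{align*}
```

For example, with s = 5 equally sized sites and d = 0.5, each site's CWI would represent k = (1 + 0.5 · 4)/5 = 0.6 of the whole collection.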
37
(No Transcript)
38
(No Transcript)
39
One interesting result case
40
(No Transcript)
41
References
  • Primary: Dissemination of Collection Wide Information in a
    Distributed Information Retrieval System
  • Secondary: Maintaining Retrieval Effectiveness in
    Distributed, Dynamic Information Retrieval
    Systems
  • Secondary: Ensuring Retrieval Effectiveness in Distributed
    Digital Libraries

42
Q & A
43
Feedback
44
Thank you