Title: COSC 6341 Information Retrieval Project Presentation
1 COSC 6341 Information Retrieval Project Presentation
2 Dissemination of Collection Wide Information in a Distributed Information Retrieval System (original authors: C.L. Viles, J.C. French), by Parmav Ghosh
3 What is Collection Wide Information (CWI)?
7 What is Dissemination?
8 Dissemination
- Spreading
- Making known
- Distribution
11 The Distributed Archive Model
13 Examples of Distributed Archive Models
- A distributed technical report archive in a university
- Documentation for a large distributed project like NASA's Mission to Planet Earth
14 Some Terms
- s: the number of sites the distributed archive is composed of
- Dij: the j-th document at site i
- Cil: the ordered collection of documents physically stored at site i (site i's local collection)
15
- Cig: the collection of documents that has been used to generate site i's version of the CWI; this version is also called Gi
- Gi = f(Cig)
17 Dissemination of CWI
- N: the total number of documents in the collection
- dfk: the document frequency for the k-th term
18
- Ni: the total number of documents at the i-th site
- dfik: the document frequency for the k-th term at the i-th site
- s: the total number of sites in the archive
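To make the notation concrete, here is a minimal Python sketch of how the collection-wide statistics could be assembled from per-site statistics. It assumes the documents are partitioned across the sites, so that N is the sum of the Ni and dfk is the sum of the dfik. The function names, the toy documents, and the log(N / dfk) form of idf are illustrative assumptions, not taken from the slides.

```python
import math
from collections import Counter


def site_statistics(docs):
    """Return (Ni, dfi) for one site: document count and per-term document frequency."""
    df = Counter()
    for doc in docs:
        # Count each term once per document, however often it occurs.
        df.update(set(doc.lower().split()))
    return len(docs), df


def merge_statistics(per_site):
    """Combine per-site (Ni, dfi) pairs into global (N, df), assuming disjoint sites."""
    total_docs = 0
    total_df = Counter()
    for n_i, df_i in per_site:
        total_docs += n_i
        total_df.update(df_i)
    return total_docs, total_df


def idf(term, n_docs, df):
    """Assumed idf form log(N / dfk); the slides do not specify a weighting."""
    return math.log(n_docs / df[term])


# Hypothetical two-site archive.
site_docs = [
    ["distributed retrieval systems", "retrieval of technical reports"],
    ["technical reports archive"],
]
N, df = merge_statistics([site_statistics(d) for d in site_docs])
```

A site that only sees part of the archive would call `merge_statistics` on fewer inputs and so obtain different values of N and dfk, which is why the amount of disseminated information matters.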
19 Dissemination Model
20
- Prefix(d, Cil): the leading fraction d of the ordered collection Cil
- d: the degree of dissemination of CWI
- d varies from 0 to 1
22
- At any point in time, site i knows about all of its own documents plus Prefix(d, Cjl) for all j ≠ i
23 Special Cases
- When d = 0, no dissemination occurs and Gi is derived from local holdings.
- When 0 < d < 1, Gi is derived partly from local holdings and partly from documents elsewhere.
- When d = 1, complete dissemination occurs.
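The dissemination model and its special cases can be sketched in a few lines of Python. This is an illustration under stated assumptions: collections are plain lists, `prefix` and `known_collection` are hypothetical names, and rounding the prefix length down is a choice the slides do not specify.

```python
import math


def prefix(d, collection):
    """Return the leading fraction d of an ordered collection."""
    if not 0.0 <= d <= 1.0:
        raise ValueError("d must lie in [0, 1]")
    # Rounding down is an assumption; the slides do not say how fractions are rounded.
    return collection[: math.floor(d * len(collection))]


def known_collection(i, d, local_collections):
    """Documents site i knows: all of its own plus Prefix(d, Cjl) for every j != i."""
    known = list(local_collections[i])
    for j, c_jl in enumerate(local_collections):
        if j != i:
            known.extend(prefix(d, c_jl))
    return known


# Hypothetical archive of three sites with four documents each.
sites = [[f"D{i}{j}" for j in range(4)] for i in range(3)]
```

With d = 0 the result is just the local collection, with d = 1 it is the whole archive, and intermediate values add the first part of every other site's collection.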
25 Document Allocation
27 Assumptions
- Documents relevant to the same query are relevant to each other
- Each query Q is assigned to a random home site, QHome(Q)
28
- Documents are assigned to sites based on three pieces of information:
  - Relevance information
  - QHome()
  - An affinity probability a
29
- If document D is relevant to query Q, then D is assigned to QHome(Q) with probability a, and is assigned at random across all sites with probability (1 - a).
- Because the random assignment can also land on QHome(Q), the actual probability that D is placed at QHome(Q) is a + (1 - a)/s
30 Allocation Pseudo Code
    D = getNextDocFromStream()
    if relevantQueryForDoc(D) and Bernoulli(a):
        Q = findRelevantQuery(D)
        assignedSite = QHome(Q)
    else:
        assignedSite = Equilikely(1, numSites)
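The allocation step can also be written as a small runnable Python sketch. It is an approximation of the pseudocode, not the authors' implementation: `allocate`, the `q_home` table, and the seeded generator are hypothetical, Bernoulli(a) is modeled as `rng.random() < a`, and Equilikely(1, numSites) as `rng.randint(1, num_sites)`.

```python
import random


def allocate(relevant_query, q_home, a, num_sites, rng):
    """Pick a site (1..num_sites) for one document.

    relevant_query is a query the document is relevant to, or None.
    """
    # Bernoulli(a): succeeds with probability a.
    if relevant_query is not None and rng.random() < a:
        return q_home[relevant_query]
    # Equilikely(1, numSites): uniform over all sites, home site included.
    return rng.randint(1, num_sites)


rng = random.Random(0)           # fixed seed, for repeatability only
q_home = {"Q": 2}                # hypothetical home-site table
trials = 20000
hits = sum(allocate("Q", q_home, 0.5, 5, rng) == 2 for _ in range(trials))
home_fraction = hits / trials    # should be close to a + (1 - a)/s = 0.6
```

Simulating many documents for one query gives an empirical check on the placement probability: the fraction landing on the home site should approach a + (1 - a)/s.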
31 Special Cases
- When a = 0, documents are randomly allocated across all sites.
- When a = 1, documents relevant to the same query are co-located.
32
Probability that a document D will be assigned to a site, given 5 sites, D relevant to Q, and QHome(Q) = 2, for affinities of 0.0 and 0.5
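The chart for this slide is not in the transcript, but the probabilities it describes follow from the allocation rule: the home site receives a + (1 - a)/s and every other site receives (1 - a)/s. The sketch below computes them exactly with fractions; `site_probabilities` is a hypothetical helper name.

```python
from fractions import Fraction


def site_probabilities(a, num_sites, home):
    """Exact placement probability per site (1-based) for a document relevant to Q."""
    a = Fraction(a)
    uniform = (1 - a) / num_sites
    return {
        site: uniform + (a if site == home else 0)
        for site in range(1, num_sites + 1)
    }


p_00 = site_probabilities(Fraction(0), 5, home=2)
p_05 = site_probabilities(Fraction(1, 2), 5, home=2)
```

For an affinity of 0.0 every site has probability 1/5. For an affinity of 0.5, site 2 has probability 3/5 and each of the other four sites has probability 1/10.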
33 Iso-knowledge lines
34
Using the equation discussed earlier and knowledge of the size of the local collection, we can determine the total proportion (ki) of documents represented by Cig
35
Let ci = Ni / N be the fraction of all the documents held at site i, then
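The equation from this slide did not survive in the transcript. A form consistent with the dissemination model (site i knows all Ni of its own documents plus a fraction d of the N - Ni documents held elsewhere, ignoring rounding of prefix lengths) would be the following. It is a reconstruction under that assumption, not a quotation from the slide.

```latex
k_i = \frac{N_i + d\,(N - N_i)}{N} = c_i + d\,(1 - c_i), \qquad c_i = \frac{N_i}{N}
```

This form reproduces the special cases of the model: d = 0 gives k_i = c_i (local holdings only) and d = 1 gives k_i = 1 (complete dissemination).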
36
When local collections are all of the same size, ci = 1 / s and the global k is defined by
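The slide's formula is missing from the transcript. Assuming each site knows its own 1/s share of the archive plus a fraction d of everything else, the global k would be k = 1/s + d(1 - 1/s). The Python sketch below evaluates that assumed form and inverts it to find the d that keeps k constant as s changes, which is one reading of an iso-knowledge line. The function names are illustrative.

```python
def global_knowledge(d, s):
    """Assumed k for s equal-sized sites: own share 1/s plus fraction d of the rest."""
    return 1.0 / s + d * (1.0 - 1.0 / s)


def dissemination_for(k, s):
    """Invert the assumed formula: the d that gives knowledge k with s sites."""
    if s == 1:
        raise ValueError("with a single site, k is always 1")
    d = (k * s - 1.0) / (s - 1.0)
    if not 0.0 <= d <= 1.0:
        raise ValueError("k is not reachable for this number of sites")
    return d


# Points (s, d) along the hypothetical iso-knowledge line k = 0.5.
iso_line = [(s, dissemination_for(0.5, s)) for s in (2, 4, 8, 16)]
```

Under this assumed form, holding k fixed while adding sites requires a larger d, because each site's own share 1/s shrinks.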
39 One interesting result case
41 References
- Primary
  - Dissemination of Collection Wide Information in a Distributed Information Retrieval System
- Secondary
  - Maintaining Retrieval Effectiveness in Distributed, Dynamic Information Retrieval Systems
  - Ensuring Retrieval Effectiveness in Distributed Digital Libraries
42 Q & A
43 Feedback
44 Thank you