Generalized Vector Space Model presentation

Transcript and Presenter's Notes

Title: Generalized Vector Space Model

1
Generalized Vector Space Model

2
An example of independent vectors

V_1 = (1, 0, 0), V_2 = (0, 1, 0), V_3 = (0, 0, 1).
V_1 · V_2 = 0 + 0 + 0 = 0.
V_i · V_j = 0 for i ≠ j.
Each element represents a keyword.
Different keywords are treated as totally
different items. This is not reasonable since
sometimes they are related.

Definition: Given the set {k_1, k_2, ..., k_t} of index
terms in a collection, as before, let w_{i,j} be the
weight associated with the term-document pair
[k_i, d_j]. If the w_{i,j} weights are all binary, then
all possible patterns of term co-occurrence
(inside documents) can be represented by a set of
2^t minterms given by min_1 = (0, 0, ..., 0),
min_2 = (1, 0, ..., 0), ..., min_{2^t} = (1, 1, ..., 1).
Let g_i(min_j) return the weight {0, 1} of the
index term k_i in the minterm min_j. (g_i(d_j) is
defined similarly.)
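
As a small illustration of the definition, the following Python sketch enumerates the 2^t minterms for t = 3 and implements g_i as a simple lookup. The names minterms and g are chosen here for illustration and do not come from the slides.

```python
from itertools import product

def minterms(t):
    """Return all 2**t binary co-occurrence patterns for t index terms.

    The first term varies fastest, so the list starts with
    (0, 0, ..., 0), (1, 0, ..., 0) and ends with (1, 1, ..., 1).
    """
    return [tuple(reversed(bits)) for bits in product((0, 1), repeat=t)]

def g(i, pattern):
    """Weight (0 or 1) of index term k_i (1-based) in a minterm or a binary document."""
    return pattern[i - 1]

ms = minterms(3)
for number, m in enumerate(ms, start=1):
    print(f"min_{number} = {m}")
```

A binary document vector can be passed to g in the same way as a minterm, which is what "g_i(d_j) is defined similarly" amounts to in this sketch.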

5
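The new vector k_i is defined as
k_i = (Σ_r c_{i,r} V_r) / sqrt(Σ_r c_{i,r}^2)   (1.1)
c_{i,r} = Σ w_{i,j}, summed over the documents d_j
with g_l(d_j) = g_l(min_r) for all l   (1.2)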
7
An example of the Generalized Vector Space Model

8
Generalized Vector Space Model

Independent vectors
V_1 = (1, 0, 0, 0, 0, 0), V_2 = (0, 1, 0, 0, 0, 0),
V_3 = (0, 0, 1, 0, 0, 0), V_4 = (0, 0, 0, 1, 0, 0),
V_5 = (0, 0, 0, 0, 1, 0), V_6 = (0, 0, 0, 0, 0, 1).
V_i represents minterm min_i.
Each pair V_i, V_j with i ≠ j is orthogonal
(dot product = 0).
The four keywords k_1, k_2, k_3, and k_4 are
represented by a combination of the independent
vectors.

9
Generalized Vector Space Model

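The four keywords k_1, k_2, k_3, and k_4 are
represented by a combination of the independent
vectors.
k_1 = (c_{1,1} V_1 + c_{1,2} V_2 + c_{1,3} V_3
       + c_{1,4} V_4 + c_{1,5} V_5 + c_{1,6} V_6) / C
where
c_{1,1} = w_{1,1} + w_{1,2} + w_{1,8} = 2 + 5 + 1 = 8
(D1, D2, and D8 have minterm min_1),
c_{1,2} = w_{1,3} + w_{1,9} = 1 + 2 = 3
(D3 and D9 have minterm min_2),
c_{1,3} = w_{1,4} + w_{1,6} + w_{1,12} = 0 + 0 + 0 = 0
(D4, D6, and D12 have minterm min_3),
c_{1,4} = w_{1,5} + w_{1,10} = 0 + 0 = 0,
c_{1,5} = w_{1,7} = 0,
c_{1,6} = w_{1,11} = 1.
C = (c_{1,1}^2 + c_{1,2}^2 + c_{1,3}^2 + c_{1,4}^2
     + c_{1,5}^2 + c_{1,6}^2)^0.5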
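
The arithmetic on this slide can be checked with a few lines of Python. This is only a sketch: it assumes that the first correlation factor is the sum 2 + 5 + 1 of the three weights w_{1,1}, w_{1,2}, and w_{1,8}, and the helper name normalize is hypothetical.

```python
import math

# Correlation factors c_{1,1} .. c_{1,6} for keyword k_1.
# Assumption: c_{1,1} is the sum of the three weights 2, 5, and 1.
c1 = [2 + 5 + 1, 1 + 2, 0 + 0 + 0, 0 + 0, 0, 1]

def normalize(c):
    """Divide the factors by C = sqrt(sum of squares) to get a unit vector."""
    norm = math.sqrt(sum(x * x for x in c))
    return [x / norm for x in c]

k1 = normalize(c1)
print([round(x, 3) for x in k1])
```

Because the V_i are the standard basis vectors, the list k1 is directly the coordinate vector of k_1 in the six-dimensional minterm space.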

10
Generalized Vector Space Model

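k_2 = (c_{2,1} V_1 + c_{2,2} V_2 + c_{2,3} V_3
       + c_{2,4} V_4 + c_{2,5} V_5 + c_{2,6} V_6) / C
where
c_{2,1} = w_{2,1} + w_{2,2} + w_{2,8} = 1 + 1 + 1 = 3
(D1, D2, and D8 have minterm min_1),
c_{2,2} = w_{2,3} + w_{2,9} = 1 + 1 = 2
(D3 and D9 have minterm min_2),
c_{2,3} = w_{2,4} + w_{2,6} + w_{2,12} = 0 + 0 + 0 = 0
(D4, D6, and D12 have minterm min_3),
c_{2,4} = w_{2,5} + w_{2,10} = 1 + 2 = 3,
c_{2,5} = w_{2,7} = 0,
c_{2,6} = w_{2,11} = 0.
C = (c_{2,1}^2 + c_{2,2}^2 + c_{2,3}^2 + c_{2,4}^2
     + c_{2,5}^2 + c_{2,6}^2)^0.5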

11
Generalized Vector Space Model

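k_3 = (c_{3,1} V_1 + c_{3,2} V_2 + c_{3,3} V_3
       + c_{3,4} V_4 + c_{3,5} V_5 + c_{3,6} V_6) / C
where
c_{3,1} = w_{3,1} + w_{3,2} + w_{3,8} = 0
(D1, D2, and D8 have minterm min_1),
c_{3,2} = w_{3,3} + w_{3,9} = 1 + 1 = 2
(D3 and D9 have minterm min_2),
c_{3,3} = w_{3,4} + w_{3,6} + w_{3,12} = 2 + 1 + 2 = 5
(D4, D6, and D12 have minterm min_3),
c_{3,4} = w_{3,5} + w_{3,10} = 1 + 2 = 3,
c_{3,5} = w_{3,7} = 1,
c_{3,6} = w_{3,11} = 2.
C = (c_{3,1}^2 + c_{3,2}^2 + c_{3,3}^2 + c_{3,4}^2
     + c_{3,5}^2 + c_{3,6}^2)^0.5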

12
Generalized Vector Space Model

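k_4 = (c_{4,1} V_1 + c_{4,2} V_2 + c_{4,3} V_3
       + c_{4,4} V_4 + c_{4,5} V_5 + c_{4,6} V_6) / C
where
c_{4,1} = w_{4,1} + w_{4,2} + w_{4,8} = 0
(D1, D2, and D8 have minterm min_1),
c_{4,2} = w_{4,3} + w_{4,9} = 1 + 1 = 2
(D3 and D9 have minterm min_2),
c_{4,3} = w_{4,4} + w_{4,6} + w_{4,12} = 2 + 1 + 1 = 4
(D4, D6, and D12 have minterm min_3),
c_{4,4} = w_{4,5} + w_{4,10} = 2 + 2 = 4,
c_{4,5} = w_{4,7} = 0,
c_{4,6} = w_{4,11} = 0.
C = (c_{4,1}^2 + c_{4,2}^2 + c_{4,3}^2 + c_{4,4}^2
     + c_{4,5}^2 + c_{4,6}^2)^0.5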
Each keyword vector k_i is thus converted from a
vector of length 4 into a vector of length 6.
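
To see what this conversion buys, the sketch below builds all four keyword vectors from the correlation factors on slides 9 to 12 and takes their pairwise dot products. Unlike the independent vectors V_i, the keyword vectors are generally not orthogonal, so a dot product can be read as a measure of how related two keywords are. The values 8 and 3 for the first factors of k_1 and k_2 assume the sums 2 + 5 + 1 and 1 + 1 + 1; the names C_FACTORS, unit, and dot are illustrative only.

```python
import math
from itertools import combinations

# Correlation factors c_{i,1} .. c_{i,6} per keyword (slides 9 to 12).
# Assumption: the first entries of k1 and k2 are 2 + 5 + 1 and 1 + 1 + 1.
C_FACTORS = {
    "k1": [8, 3, 0, 0, 0, 1],
    "k2": [3, 2, 0, 3, 0, 0],
    "k3": [0, 2, 5, 3, 1, 2],
    "k4": [0, 2, 4, 4, 0, 0],
}

def unit(v):
    """Scale v to unit length (division by C on the slides)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def dot(a, b):
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

K = {name: unit(c) for name, c in C_FACTORS.items()}

# Pairwise keyword correlations in the 6-dimensional minterm space.
for a, b in combinations(K, 2):
    print(f"{a} . {b} = {dot(K[a], K[b]):.3f}")
```

In the classic vector space model every one of these off-diagonal products would be 0, which is exactly the independence assumption that slide 2 calls unreasonable.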

13
Extended Boolean Model

Disadvantages of Boolean Model
No term weight is used.
Counterexample: query q = Kx AND Ky.
A document containing just one term, e.g. Kx,
is considered as irrelevant as a document
containing none of these terms.
The size of the output might be too large or too
small.
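
The counterexample can be made concrete with a tiny sketch of strict Boolean matching. The function name boolean_and and the term labels are hypothetical and serve only to illustrate the point.

```python
def boolean_and(doc_terms, query_terms):
    """Strict Boolean AND: 1 if every query term occurs in the document, else 0."""
    return 1 if set(query_terms) <= set(doc_terms) else 0

query = ["kx", "ky"]
print(boolean_and({"kx", "ky"}, query))  # both terms present
print(boolean_and({"kx"}, query))        # only one term present
print(boolean_and(set(), query))         # neither term present
```

The second and third documents receive the same score, so the model cannot rank a partial match above a document that matches nothing.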

14
Extended Boolean Model

15
Extended Boolean Model

16
Fig. Extended Boolean logic considering the space
composed of two terms kx and ky only.

17
Extended Boolean Model

18
Extended Boolean Model

19
Extend the idea to m terms

20
Properties

The p-norm as defined above enjoys a couple of
interesting properties. First, when p = 1 it can
be verified that
sim(q_or, d_j) = sim(q_and, d_j) = (x_1 + ... + x_m) / m
Second, when p = ∞ it can be verified that
sim(q_or, d_j) = max(x_i)
sim(q_and, d_j) = min(x_i)
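
Both properties can be checked numerically. The sketch below assumes the standard p-norm definitions of the extended Boolean model, sim(q_or, d) = ((x_1^p + ... + x_m^p) / m)^(1/p) and sim(q_and, d) = 1 - (((1 - x_1)^p + ... + (1 - x_m)^p) / m)^(1/p); the function names sim_or and sim_and are illustrative.

```python
import math

def sim_or(x, p):
    """p-norm similarity for a disjunctive query over term weights x in [0, 1]."""
    if math.isinf(p):
        return max(x)
    return (sum(xi ** p for xi in x) / len(x)) ** (1.0 / p)

def sim_and(x, p):
    """p-norm similarity for a conjunctive query over term weights x in [0, 1]."""
    if math.isinf(p):
        return min(x)
    return 1.0 - (sum((1.0 - xi) ** p for xi in x) / len(x)) ** (1.0 / p)

x = [0.2, 0.9, 0.5]
for p in (1, 2, 10, 200, math.inf):
    print(p, round(sim_or(x, p), 4), round(sim_and(x, p), 4))
```

With p = 1 both functions return the plain average of the weights, and as p grows the values drift toward max(x_i) and min(x_i), so p tunes how strictly the query behaves like a Boolean one.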

21
Example

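For instance, consider the query
q = (k_1 AND k_2) OR k_3.
The similarity sim(q, d_j) between a document d_j
and this query is then computed as
sim(q, d_j) = ((A^p + x_3^p) / 2)^(1/p), where
A = 1 - (((1 - x_1)^p + (1 - x_2)^p) / 2)^(1/p).
Any Boolean query can be expressed as a numerical
formula.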
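
A nested query is evaluated from the inside out, feeding the score of each sub-expression into the enclosing operator as if it were a term weight. The sketch below assumes the example query is (k_1 AND k_2) OR k_3 and uses the standard p-norm formulas; the names p_or, p_and, and sim_example are hypothetical.

```python
def p_or(xs, p):
    """p-norm OR over a list of weights or sub-expression scores."""
    return (sum(x ** p for x in xs) / len(xs)) ** (1.0 / p)

def p_and(xs, p):
    """p-norm AND over a list of weights or sub-expression scores."""
    return 1.0 - (sum((1.0 - x) ** p for x in xs) / len(xs)) ** (1.0 / p)

def sim_example(x1, x2, x3, p=2):
    """Score for the assumed query (k_1 AND k_2) OR k_3."""
    inner = p_and([x1, x2], p)   # innermost group first
    return p_or([inner, x3], p)  # then the enclosing OR

print(sim_example(0.4, 0.8, 0.2, p=1))
print(sim_example(1.0, 1.0, 0.0, p=2))
```

A NOT operator would fit the same scheme by replacing a weight x with 1 - x before it is passed on, which is the usual convention and an assumption here.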

22
Exercise

1. Give the numerical formula for the extended
Boolean model of the query
q = (k1 or k2 or k3) and (not k4 or k5). (Assume
that there are 5 terms in total.)
2. Assume that the document is represented by the
vector (0.8, 0.1, 0.0, 0.0, 1.0).
What is sim(q, d) for the extended Boolean model?
Also try more exercises with other Boolean
formulas.
