Title: Ranking and Preference in Computer Science: Models and Semantics

1
Ranking and Preference in Computer Science
Models and Semantics

Kevin Chen-Chuan Chang

2
Announcement: Plan for Next Week

Wednesday
Tutorials: if we can find good ones willing to speak
Or, continue lecture on ranking for processing
Friday: The CS Industry Affiliate Conference
Guest seminar: http://www.cs.uiuc.edu/events/idcsa/panel.php?panel=InformationIntegration
Friday April 29th, 10:15-11:15am, 2405 SC
Recording will be online by Saturday afternoon

3
Midterm Survey: Generally positive

8 responses: 7 on-campus, 1 online

4
Comments

Most like the course content, structure, and components
Tutorials may be too open-ended and intimidating
Lectures sometimes too slow
SGP good to write, but too much to read
2nd part seemed too much work

5
It's rewarding to hear this anonymously!

So far I have learned a lot in this course. This
course has an excellent staff who are very
supportive. Thank you!

6
Ranking: Ordering according to the degree of some fuzzy notion

Similarity (or dissimilarity)
Relevance
Preference

[Figure: a query Q and the resulting ranking]

7
Similarity: Are they similar?

Two images

8
Similarity: Are they similar?

Two images

9
So, similarity is not a Boolean notion: it is a relative ranking

10
Similarity: Are they similar?

Two strings

Virginia
Vermont
11
Ranking by similarity
12
Similarity-based ranking: by a distance function (or dissimilarity)

[Figure: query Q and objects Oi, ranked by distance d(Q, Oi)]

13
The space: Defined by the objects and their distances

Object representation: Vector or not?
Distance function: Metric or not?

14
Vector space: What is a vector space?

(S, d) is a vector space if:
Each object in S is a k-dimensional vector
The distance d(x, y) between any x and y is a metric

15
Vector space distance functions: The Lp distance functions

The general form: $d_p(x, y) = \left( \sum_{i=1}^{k} |x_i - y_i|^p \right)^{1/p}$
AKA p-norm distance, Minkowski distance
Does this look familiar?
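As a minimal sketch of the general Lp form, assuming objects are plain Python tuples of numbers (the name `minkowski_distance` is illustrative and not from the slides), setting p = 1 or p = 2 gives the Manhattan and Euclidean special cases:

```python
def minkowski_distance(x, y, p):
    """L_p distance between two equal-length vectors, for p >= 1."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same dimension")
    # Sum the p-th powers of the per-dimension gaps, then take the p-th root.
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1.0 / p)


x, y = (0.0, 0.0), (3.0, 4.0)
manhattan = minkowski_distance(x, y, 1)  # |3| + |4|
euclidean = minkowski_distance(x, y, 2)  # sqrt(3^2 + 4^2)
```

Note that the function only ever compares corresponding dimensions, which is the property the later slides question for non-vector objects such as strings.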

16
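Vector space distance functions: L1, the Manhattan distance

Let p = 1 in Lp: $d_1(x, y) = \sum_{i=1}^{k} |x_i - y_i|$
Manhattan or block distance

[Figure: points (x1, x2) and (y1, y2) joined by an axis-parallel path]
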
17
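Vector space distance functions: L2, the Euclidean distance

Let p = 2 in Lp: $d_2(x, y) = \sqrt{\sum_{i=1}^{k} (x_i - y_i)^2}$
The shortest (straight-line) distance

[Figure: points (x1, x2) and (y1, y2) joined by a straight line]
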
18
Vector space distance functions: The cosine measure
$\cos\theta = \frac{x \cdot y}{\|x\| \, \|y\|}$
[Figure: vectors x and y separated by angle θ]
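A small sketch of the cosine measure, assuming vectors are equal-length sequences of numbers. Returning 0.0 for a zero vector is a convention chosen here for illustration, not something the slides specify:

```python
import math


def cosine_similarity(x, y):
    """Cosine of the angle between vectors x and y."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    if norm_x == 0 or norm_y == 0:
        return 0.0  # assumption: a zero vector is similar to nothing
    return dot / (norm_x * norm_y)
```

Unlike the Lp distances, a larger value means more similar, and scaling either vector leaves the result unchanged.
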
19
Sounds abstract? That's actually how Web search engines (like Google) work
Vector space modeling, or the TF-IDF model
Cosine measure
Q = "apple computer", represented as Q = (x1, ..., xk)
D, represented as D = (y1, ..., yk)
Sim(Q, D) = cosine of the angle between Q and D

20
How to evaluate vector-space queries? Consider the Lp measure

Consider L2 as the ranking function
Given object Q, find Oi in order of increasing d(Q, Oi)
How to evaluate this query? What index structure?
As nearest-neighbor queries
Using multidimensional or spatial indexes, e.g., the R-tree [Guttman, 1984]
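For reference, the query semantics can be sketched as a brute-force baseline. This is a linear scan, not an R-tree: it computes d(Q, Oi) for every object, which is exactly the work a spatial index tries to avoid. The function name and the tuple representation are assumptions for illustration:

```python
import heapq
import math


def rank_by_distance(query, objects, k):
    """Return the k objects nearest to `query` under L2, nearest first."""
    return heapq.nsmallest(k, objects, key=lambda o: math.dist(query, o))


nearest = rank_by_distance((0, 0), [(3, 4), (1, 1), (0, 2)], k=2)
```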

21
How to evaluate vector-space queries? Consider the cosine measure

Sim(Q, D) = $\sum_{i=1}^{k} x_i y_i$ for length-normalized vectors
How to evaluate this query? What index structure?
Simple computation: multiply and sum up
Inverted index to find documents with non-zero weights for query terms
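The "multiply and sum up" evaluation over an inverted index can be sketched as follows. The document weights are made-up, already length-normalized numbers, and `build_inverted_index` and `score` are hypothetical helper names:

```python
from collections import defaultdict


def build_inverted_index(docs):
    """docs: {doc_id: {term: weight}} -> {term: [(doc_id, weight), ...]}"""
    index = defaultdict(list)
    for doc_id, weights in docs.items():
        for term, w in weights.items():
            if w != 0:
                index[term].append((doc_id, w))
    return index


def score(query, index):
    """query: {term: weight}. Sum query_weight * doc_weight per document."""
    scores = defaultdict(float)
    for term, qw in query.items():
        # Only documents with a non-zero weight for this term are touched.
        for doc_id, dw in index.get(term, []):
            scores[doc_id] += qw * dw
    return dict(scores)


docs = {
    "d1": {"apple": 0.8, "computer": 0.6},
    "d2": {"apple": 1.0},
    "d3": {"banana": 1.0},
}
index = build_inverted_index(docs)
scores = score({"apple": 0.6, "computer": 0.8}, index)
ranking = sorted(scores, key=scores.get, reverse=True)
```

Document d3 shares no term with the query, so it is never visited and never scored.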

22
Is vector space always possible?

Can you always express objects as k-dimensional vectors, so that the distance function compares only corresponding dimensions?
Counterexamples?

23
How about comparing two strings? Is it natural to consider them in a vector space?

Two strings

Virginia
Vermont
24
Metric space: What is a metric space?

Set S of objects
Global distance function d (the metric)
For all points x, y, z in S:
Positiveness: $d(x, y) \ge 0$
Symmetry: $d(x, y) = d(y, x)$
Reflexivity: $d(x, x) = 0$
Triangle inequality: $d(x, z) \le d(x, y) + d(y, z)$
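The four properties can be spot-checked on a finite sample of objects. A minimal sketch, with the caveat that passing on a sample does not prove a function is a metric, while a single failure does refute it. The name `violates_metric_axioms` and the tolerance are assumptions:

```python
from itertools import product


def violates_metric_axioms(points, d, tol=1e-9):
    """Return the name of the first property that fails on `points`, or None."""
    for x, y in product(points, repeat=2):
        if d(x, y) < -tol:
            return "positiveness"
        if abs(d(x, y) - d(y, x)) > tol:
            return "symmetry"
    for x in points:
        if abs(d(x, x)) > tol:
            return "reflexivity"
    for x, y, z in product(points, repeat=3):
        if d(x, z) > d(x, y) + d(y, z) + tol:
            return "triangle inequality"
    return None


sample = [0, 1, 3]
absolute = violates_metric_axioms(sample, lambda a, b: abs(a - b))
squared = violates_metric_axioms(sample, lambda a, b: (a - b) ** 2)
```

On this sample the absolute difference passes every check, while the squared difference fails the triangle inequality (9 > 1 + 4).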

25
Vector space is a special case of metric space: e.g., consider L2

Let p = 2 in Lp
The shortest (straight-line) distance

[Figure: points (x1, x2) and (y1, y2) joined by a straight line]

26
Another example: Edit distance

The smallest number of edit operations (insertions, deletions, and substitutions) required to transform one string into another
Virginia → Verginia → Verminia → Vermonia → Vermonta → Vermont
http://urchin.earth.li/twic/edit-distance.html
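One common way to compute this is dynamic programming over prefixes of the two strings. A sketch assuming unit cost for each insertion, deletion, and substitution (the slides do not specify an algorithm):

```python
def edit_distance(s, t):
    """Levenshtein distance with unit-cost insert, delete, and substitute."""
    # prev[j] holds the distance between the processed prefix of s and t[:j].
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, start=1):
        curr = [i]
        for j, ct in enumerate(t, start=1):
            curr.append(min(
                prev[j] + 1,               # delete cs
                curr[j - 1] + 1,           # insert ct
                prev[j - 1] + (cs != ct),  # substitute, or free if equal
            ))
        prev = curr
    return prev[-1]


d_forward = edit_distance("Virginia", "Vermont")
d_backward = edit_distance("Vermont", "Virginia")
```

Both directions give 5 here, matching the chain of four substitutions and one deletion shown above.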

27
Is edit distance a metric?

Can you show that it is symmetric?
Such that d(Virginia, Vermont) = d(Vermont, Virginia)?
Virginia → Verginia → Verminia → Vermonia → Vermonta → Vermont
Check other properties

28
How to evaluate metric-space ranking queries? [Chávez et al., 2001]

Can we still use R-tree?
What property of metric space can we leverage to
prune the search space for finding near objects?

29
Metric-space indexing

What is the range of u?
How does this help in focusing our search?
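A hedged sketch of how the triangle inequality can focus the search: given a single index object (called a pivot here, a term the slide does not use) with precomputed distances to every object, |d(Q, pivot) - d(pivot, u)| is a lower bound on d(Q, u), so any u whose bound exceeds the search radius can be skipped without computing d(Q, u). The 1-D data and the function name are illustrative only:

```python
def range_search(query, objects, pivot, d, radius):
    """Return (objects within `radius` of `query`, number of d(query, u) calls)."""
    # In a real index these distances are precomputed at build time.
    pivot_dist = {u: d(pivot, u) for u in objects}
    dq = d(query, pivot)
    results, computed = [], 0
    for u in objects:
        lower_bound = abs(dq - pivot_dist[u])  # from the triangle inequality
        if lower_bound > radius:
            continue  # u cannot be within radius, so skip the distance call
        computed += 1
        if d(query, u) <= radius:
            results.append(u)
    return results, computed


results, computed = range_search(
    query=5, objects=[1, 4, 6, 10, 20], pivot=0,
    d=lambda a, b: abs(a - b), radius=1.5,
)
```

Here only two of the five objects need a real distance computation. The upper bound d(Q, pivot) + d(pivot, u) completes the range of possible values for d(Q, u).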

[Figure: query Q, an Index object, and an object u, with distance labels 2, 3, 5, and 6]

30
Relevance-based ranking for text retrieval

What does being relevant mean?
Many different ways of modeling relevance
Similarity
How similar is D to Q?
Probability
How likely is D relevant to Q?
Inference
How likely can D infer Q?

31
Similarity-based relevance: We just talked about this, vector-space modeling [Salton et al., 1975]
Vector space modeling, or the TF-IDF model

TF-IDF for term weights in vectors
TF: term frequency (in this document)
the more term occurrences in this doc, the better
IDF: inverse document frequency (in entire DB)
the fewer documents contain this term, the better
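A minimal sketch of TF-IDF weighting, assuming raw counts for TF and log(N / n_t) for IDF. Many other variants exist and the slides do not say which one is intended. The toy documents are invented for illustration:

```python
import math
from collections import Counter


def tf_idf_vectors(docs):
    """docs: {doc_id: list of terms} -> {doc_id: {term: tf * idf}}."""
    n_docs = len(docs)
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter(t for terms in docs.values() for t in set(terms))
    vectors = {}
    for doc_id, terms in docs.items():
        tf = Counter(terms)
        vectors[doc_id] = {
            t: tf[t] * math.log(n_docs / doc_freq[t]) for t in tf
        }
    return vectors


docs = {
    "d1": ["apple", "computer", "computer"],
    "d2": ["computer", "science"],
    "d3": ["computer", "apple", "pie"],
}
vectors = tf_idf_vectors(docs)
```

In this toy collection "computer" appears in every document, so its weight is zero everywhere regardless of how often it occurs, while the rarer "apple" and "pie" carry the ranking.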

Cosine measure
Q = "apple computer", represented as Q = (x1, ..., xk)
D, represented as D = (y1, ..., yk)
Sim(Q, D) = cosine of the angle between Q and D

32
Probabilistic relevance

View: Probability of relevance
The probabilistic ranking principle [Robertson, 1977]:
"If a retrieval system's response to each request is a ranking of the documents in the collection in order of decreasing probability of usefulness to the user who submitted the request, where the probabilities are estimated as accurately as possible on the basis of whatever data has been made available to the system for this purpose, then the overall effectiveness of the system to its users will be the best that is obtainable on the basis of that data."
Initial idea proposed in [Maron and Kuhns, 1960]; many models followed.

33
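Probabilistic models (e.g., [Croft and Harper, 1979])

Estimate and rank by $P(R \mid Q, D)$, or equivalently by the odds $P(R \mid Q, D) / P(NR \mid Q, D)$
I.e., rank by $\sum_{t_i \in Q \cap D} \log \frac{p_i (1 - q_i)}{q_i (1 - p_i)}$, where $p_i = P(t_i \mid R)$ and $q_i = P(t_i \mid NR)$
Assume
$p_i$ is the same for all query terms
$q_i = n_i / N$, where $n_i$ is the number of docs containing term $t_i$ and N is DB size
(i.e., as if all docs were non-relevant)
Similar to using IDF
Intuition: e.g., "apple computer" in a computer DB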
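To see why these assumptions behave like IDF, here is a small sketch of a per-term weight, assuming the standard binary-independence form log[p(1 - q) / (q(1 - p))] with q = n_i / N. The default p = 0.5 and the document counts are hypothetical values chosen for illustration:

```python
import math


def term_weight(n_i, n_docs, p=0.5):
    """log[p(1 - q) / (q(1 - p))] with q = n_i / n_docs; needs 0 < n_i < n_docs."""
    q = n_i / n_docs
    return math.log(p * (1 - q) / (q * (1 - p)))


# Hypothetical counts in a 1000-document computer DB.
w_apple = term_weight(10, 1000)      # rare term
w_computer = term_weight(900, 1000)  # term found in most documents
```

With p fixed, the weight depends only on how many documents contain the term: the rare "apple" gets a large positive weight, while "computer" gets a negative one and contributes little to distinguishing documents.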

34
This is how we derive the ranking function

To rank by $P(R \mid Q, D)$
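One standard route to such a ranking function is sketched below. It assumes the binary independence model, which may differ in detail from the derivation originally shown: each document is a binary vector $(d_1, \ldots, d_k)$ over the query terms, terms occur independently, $p_i = P(d_i = 1 \mid R)$, and $q_i = P(d_i = 1 \mid NR)$.

```latex
\begin{align*}
\frac{P(R \mid Q, D)}{P(NR \mid Q, D)}
  &= \frac{P(D \mid R, Q)}{P(D \mid NR, Q)} \cdot \frac{P(R \mid Q)}{P(NR \mid Q)}
  && \text{(Bayes' rule; the second factor is the same for every } D\text{)} \\
\frac{P(D \mid R, Q)}{P(D \mid NR, Q)}
  &= \prod_{i} \frac{p_i^{d_i} (1 - p_i)^{1 - d_i}}{q_i^{d_i} (1 - q_i)^{1 - d_i}}
  && \text{(term independence, } d_i \in \{0, 1\}\text{)} \\
  &= \prod_{i} \left( \frac{p_i (1 - q_i)}{q_i (1 - p_i)} \right)^{d_i}
     \cdot \prod_{i} \frac{1 - p_i}{1 - q_i}
  && \text{(the second product does not depend on } D\text{)} \\
\text{so rank by}\quad
  & \sum_{i \,:\, d_i = 1} \log \frac{p_i (1 - q_i)}{q_i (1 - p_i)}
\end{align*}
```

If $p_i = p$ for all query terms and $q_i = n_i / N$, each term's weight becomes $\log \frac{p}{1 - p} + \log \frac{N - n_i}{n_i}$, a constant plus an IDF-like quantity.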

35
Inference-based relevance

Motivation
Is there any objective way of defining
relevance?
Hint from a logic view of database querying:
retrieve all objects O s.t. O → Q
E.g., O = (john, cs, 3.5) → gpa > 3.0 AND dept = cs
What about: Retrieve D iff we can prove D → Q?
Challenges: Uncertainty in inference? [van Rijsbergen, 1986]
Representation of documents and queries
Quantify the uncertainty of inference: P(D → Q) = P(Q | D)

36
Inference network [Turtle and Croft, 1990]

Given doc as evidence, prove that info need is satisfied
Inference based on Bayesian belief networks

[Figure: inference network. Doc Network: doc nodes d1, d2, ..., dn (doc dn observed); doc representation nodes t1, t2, ..., tn; doc concept nodes r1, r2, r3, ..., rk. Query Network: query concept nodes c1, c2, ..., cm; query representation nodes q1, q2; Q, the query or information need]

37
Using and constructing the network