Title: Retrieval by Content / Retrieval by Authority
1. Retrieval by Content / Retrieval by Authority
- Artificial Intelligence
- CMSC 25000
- February 5, 2008
2. Roadmap
- Problem
- Matching Topics and Documents
- Challenge I: Beyond literal matching
- Expansion Strategies
- Challenge II: Authoritative sources
- Hubs & Authorities
- PageRank
3. Roadmap
- Problem
- Matching Topics and Documents
- Methods
- Classic Vector Space Model
- Challenge I: Beyond literal matching
- Expansion Strategies
- Challenge II: Authoritative sources
- PageRank
- Hubs & Authorities
4. Matching Topics and Documents
- Two main perspectives
- Pre-defined, fixed, finite topics
- Text Classification
- Arbitrary topics, typically defined by a statement of information need (aka query)
- Information Retrieval
5. Vector Space Information Retrieval
- Task
- Document collection
- Query specifies information need: free text
- Relevance judgments: 0/1 for all docs
- Word evidence: bag of words
- No ordering information
6. Vector Space Model
[Figure: two documents and a query as vectors in a term space with axes tv, program, computer]
- Two documents: "computer program", "tv program"
- Query "computer program" matches the 1st doc exactly (distance 2 vs. 0); query "educational program" matches both equally (distance 1)
7. Vector Space Model
- Represent documents and queries as
- Vectors of term-based features
- Features tied to occurrence of terms in collection
- Solution 1: binary features: t = 1 if term present, 0 otherwise
- Similarity: number of terms in common
- Dot product (see sketch below)
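A minimal sketch of the binary-feature dot product in Python; the vocabulary construction and the two example documents (taken from the earlier slide) are illustrative, not part of any particular IR system:

    # Binary bag-of-words matching: similarity = number of shared terms (dot product).
    def binary_vector(text, vocabulary):
        words = set(text.lower().split())
        return [1 if term in words else 0 for term in vocabulary]

    def dot_product(u, v):
        return sum(a * b for a, b in zip(u, v))

    docs = ["computer program", "tv program"]
    query = "computer program"

    vocabulary = sorted({w for d in docs + [query] for w in d.lower().split()})
    doc_vectors = [binary_vector(d, vocabulary) for d in docs]
    query_vector = binary_vector(query, vocabulary)

    for d, v in zip(docs, doc_vectors):
        print(d, "->", dot_product(query_vector, v))   # "computer program" scores 2, "tv program" scores 1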
8. Question
9. Vector Space Model II
- Problem: not all terms are equally interesting
- E.g. "the" vs. "dog" vs. "Levow"
- Solution: replace binary term features with weights
- Document collection: term-by-document matrix
- View as vector in multidimensional space
- Nearby vectors are related
- Normalize for vector length
10. Vector Similarity Computation
- Similarity: dot product
- Normalization
- Normalize weights in advance
- Normalize post-hoc (see the cosine sketch below)
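A sketch of the dot product with length normalization, i.e. cosine similarity; the example vectors are arbitrary placeholders:

    import math

    def cosine_similarity(u, v):
        # Dot product normalized by vector lengths; 1.0 = same direction, 0.0 = no shared terms.
        dot = sum(a * b for a, b in zip(u, v))
        norm_u = math.sqrt(sum(a * a for a in u))
        norm_v = math.sqrt(sum(b * b for b in v))
        if norm_u == 0 or norm_v == 0:
            return 0.0
        return dot / (norm_u * norm_v)

    print(cosine_similarity([1, 1, 0], [2, 3, 0]))  # high: same terms, different document lengths
    print(cosine_similarity([1, 1, 0], [0, 0, 5]))  # 0.0: no terms in common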
11. Term Weighting
- Aboutness
- To what degree is this term what the document is about?
- Within-document measure
- Term frequency (tf): occurrences of term t in doc j
- Specificity
- How surprised are you to see this term?
- Collection frequency
- Inverse document frequency (idf) (see sketch below)
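A minimal tf-idf weighting sketch, assuming the common formulation w = tf * log(N/df); the exact formula on the original slide is not shown, and the example documents are illustrative:

    import math
    from collections import Counter

    def tf_idf_weights(documents):
        """Return one {term: weight} dict per document, weight = tf * log(N / df)."""
        n_docs = len(documents)
        tokenized = [doc.lower().split() for doc in documents]
        # Document frequency: number of documents containing each term.
        df = Counter(term for tokens in tokenized for term in set(tokens))
        weighted = []
        for tokens in tokenized:
            tf = Counter(tokens)
            weighted.append({t: tf[t] * math.log(n_docs / df[t]) for t in tf})
        return weighted

    docs = ["the dog barks", "the dog sleeps", "the cat sleeps"]
    for w in tf_idf_weights(docs):
        print(w)   # "the" gets weight 0 (appears in every doc); rarer terms get higher weights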
12. Term Selection & Formation
- Selection
- Some terms are truly useless
- Too frequent, no content
- E.g. "the", "a", "and", ...
- Stop words: ignore such terms altogether
- Creation
- Too many surface forms for the same concepts
- E.g. inflections of words: verb conjugations, plurals
- Stem terms: treat all forms as the same underlying form
13. Key Issue
- All approaches operate on term matching
- If a synonym, rather than the original term, is used, the approach fails
- Develop more robust techniques
- Match concept rather than term
- Expansion approaches
- Add in related terms to enhance matching
- Mapping techniques
- Associate terms to concepts
- Aspect models, stemming
14. Expansion Techniques
- Can apply to query or document
- Thesaurus expansion
- Use a linguistic resource (thesaurus, WordNet) to add synonyms/related terms
- Feedback expansion
- Add terms that should have appeared
- User interaction
- Direct or relevance feedback
- Automatic: pseudo-relevance feedback
15. Query Refinement
- Typical queries are very short, ambiguous
- "cat": animal / Unix command
- Add more terms to disambiguate, improve matching
- Relevance feedback
- Retrieve with the original query
- Present results
- Ask user to tag relevant/non-relevant
- Push the query toward relevant vectors, away from non-relevant ones
- Rocchio expansion formula: q' = α·q + (β/r)·Σ(relevant docs) − (γ/s)·Σ(non-relevant docs)
- Typical weights (β, γ) = (0.75, 0.25); r = number of relevant docs, s = number of non-relevant docs (see sketch below)
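A sketch of Rocchio expansion with the weights above (alpha = 1, beta = 0.75, gamma = 0.25); the document vectors are plain Python lists purely for illustration:

    def rocchio(query, relevant, non_relevant, alpha=1.0, beta=0.75, gamma=0.25):
        """Move the query vector toward relevant docs and away from non-relevant ones."""
        def centroid(vectors):
            if not vectors:
                return [0.0] * len(query)
            return [sum(col) / len(vectors) for col in zip(*vectors)]
        rel_c = centroid(relevant)
        nonrel_c = centroid(non_relevant)
        return [alpha * q + beta * r - gamma * s
                for q, r, s in zip(query, rel_c, nonrel_c)]

    q = [1.0, 0.0, 0.0]
    relevant = [[1.0, 1.0, 0.0], [1.0, 0.5, 0.0]]
    non_relevant = [[0.0, 0.0, 1.0]]
    print(rocchio(q, relevant, non_relevant))  # weight added on term 2, subtracted on term 3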
16. Compression Techniques
- Reduce surface term variation to concepts
- Stemming
- Map inflectional variants to a root
- E.g. see, sees, seen, saw -> see
- Crucial for highly inflected languages (Czech, Arabic)
- Aspect models
- Matrix representations typically very sparse
- Reduce dimensionality to a small number of key aspects
- Map contextually similar terms together
- Latent semantic analysis (see sketch below)
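A rough sketch of the dimensionality-reduction idea behind latent semantic analysis, using numpy's SVD on a tiny term-by-document matrix; the matrix values and term labels are illustrative only:

    import numpy as np

    # Term-by-document matrix (rows = terms, columns = documents).
    A = np.array([
        [1, 1, 0, 0],   # "computer"
        [1, 0, 1, 0],   # "program"
        [0, 1, 1, 0],   # "software"
        [0, 0, 0, 1],   # "tv"
    ], dtype=float)

    # Truncated SVD: keep only the k largest singular values (the "key aspects").
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    k = 2
    A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

    print(np.round(A_k, 2))  # terms/documents now compared in a k-dimensional "concept" space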
17. Authoritative Sources
- Based on the vector space alone, what would you expect to get searching for "search engine"?
- Would you expect to get Google?
18. Issue
- Text isn't always the best indicator of content
- Example: "search engine"
- Text search -> reviews of search engines
- The term doesn't appear on search engine pages
- The term probably appears on many pages that point to many search engines
19. Hubs & Authorities
- Not all sites are created equal
- Finding better sites
- Question: What defines a good site?
- Authoritative
- Not just content, but connections!
- One that many other sites think is good
- Authority: a site that is pointed to by many other sites
20. Conferring Authority
- Authorities rarely link to each other
- Competition
- Hubs
- Relevant sites that point to prominent sites on a topic
- Often not prominent themselves
- Professional or amateur
- Good hubs point to good authorities
21. Computing HITS
- Finding Hubs and Authorities
- Two steps
- Sampling
- Find potential authorities
- Weight-propagation
- Iteratively estimate best hubs and authorities
22. Sampling
- Identify potential hubs and authorities
- Connected subsections of web
- Select root set with standard text query
- Construct base set
- All nodes pointed to by root set
- All nodes that point to root set
- Drop within-domain links
- 1000-5000 pages
23. Weight-propagation
- Weights
- Authority weight x
- Hub weight y
- All weights are relative
- Updating
- Converges
- Pages with high x are good authorities; pages with high y are good hubs
24. Weight Propagation
- Create adjacency matrix A
- A[i,j] = 1 if i links to j, otherwise 0
- Create vectors x (authority) and y (hub) of corresponding values
- Update: x <- A^T y, y <- A x (then normalize)
- Converges to the principal eigenvector (see sketch below)
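A sketch of the weight-propagation step on an adjacency matrix, assuming the standard HITS updates x <- A^T y (authority) and y <- A x (hub), normalized each round; the link graph is a toy example:

    import numpy as np

    def hits(A, iterations=50):
        """A[i, j] = 1 if page i links to page j. Returns (authority, hub) weights."""
        n = A.shape[0]
        x = np.ones(n)   # authority weights
        y = np.ones(n)   # hub weights
        for _ in range(iterations):
            x = A.T @ y            # good authorities are pointed to by good hubs
            y = A @ x              # good hubs point to good authorities
            x /= np.linalg.norm(x)
            y /= np.linalg.norm(y)
        return x, y

    # Pages 0 and 1 are hubs pointing at pages 2 and 3.
    A = np.array([[0, 0, 1, 1],
                  [0, 0, 1, 1],
                  [0, 0, 0, 0],
                  [0, 0, 0, 0]], dtype=float)
    authority, hub = hits(A)
    print(np.round(authority, 2), np.round(hub, 2))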
25. Google's PageRank
- Identifies authorities
- Important pages are those pointed to by many other pages
- Better pointers, higher rank
- Ranks search results
- PR(A) = (1 - d) + d * sum over pages t pointing to A of PR(t)/C(t)
- t: a page pointing to A; C(t): number of outbound links of t; d: damping factor
- Actual ranking on a logarithmic scale
- Iterate (see sketch below)
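A minimal iterative sketch of the formula above; the link structure and iteration count are illustrative, and no convergence test is included:

    def pagerank(links, d=0.85, iterations=50):
        """links[p] = list of pages that p points to. Returns {page: rank}."""
        pages = list(links)
        rank = {p: 1.0 for p in pages}
        for _ in range(iterations):
            new_rank = {}
            for page in pages:
                # Sum contributions from every page t that links to this page.
                incoming = sum(rank[t] / len(links[t])
                               for t in pages if page in links[t])
                new_rank[page] = (1 - d) + d * incoming
            rank = new_rank
        return rank

    links = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
    print(pagerank(links))   # C accumulates the highest rank in this toy graph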
26. Contrasts
- Internal links
- Large sites carry more weight (if well-designed)
- Hubs & Authorities ignores site-internal links
- Outbound links explicitly penalized
- Lots of tweaks.
27. Web Search
- Search by content
- Vector space model
- Word-based representation
- Aboutness and Surprise
- Enhancing matches
- Simple learning model
- Search by structure
- Authorities identified by link structure of web
- Hubs confer authority
28. Learning: Perceptrons
- Artificial Intelligence
- CMSC 25000
- February 5, 2008
29. Agenda
- Neural Networks
- Biological analogy
- Perceptrons: single-layer networks
- Perceptron training
- Perceptron convergence theorem
- Perceptron limitations
- Conclusions
30. Neurons: The Concept
[Figure: a neuron, with dendrites, axon, nucleus, and cell body labeled]
- Neurons receive inputs from other neurons (via synapses)
- When input exceeds a threshold, the neuron fires
- Sends output along its axon to other neurons
- Brain: ~10^11 neurons, ~10^16 synapses
31. Artificial Neural Nets
- Simulated neuron
- Node connected to other nodes via links
- Links play the role of axon + synapse
- Links associated with a weight (like a synapse)
- Multiplied by the output of the node
- Node combines input via an activation function
- E.g. sum of weighted inputs passed through a threshold
- Simpler than real neuronal processes
32. Artificial Neural Net
[Figure: inputs x, each multiplied by a weight w, summed, and passed through a threshold]
33. Perceptrons
- Single neuron-like element
- Binary inputs
- Binary outputs
- Weighted sum of inputs > threshold
34. Perceptron Structure
[Figure: inputs x1..xn with weights w1..wn feeding a single output y; a constant input x0 = 1 with weight w0 compensates for the threshold]
35. Perceptron Convergence Procedure
- Straightforward training procedure
- Learns linearly separable functions
- Until the perceptron yields correct output for all training examples:
- If the perceptron is correct, do nothing
- If the perceptron is wrong:
- If it incorrectly says yes, subtract the input vector from the weight vector
- Otherwise, add the input vector to the weight vector (see sketch below)
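A sketch of this training procedure in Python, following the add/subtract rule above; the threshold is folded into a bias weight via a constant x0 = 1 component, and the pass limit is an illustrative safeguard:

    def perceptron_train(samples, max_passes=100):
        """samples: list of (input_vector, desired_output) with desired 0 or 1.
        Each input vector should include a constant 1 component for the bias."""
        w = [0.0] * len(samples[0][0])
        for _ in range(max_passes):
            errors = 0
            for x, desired in samples:
                predicted = 1 if sum(wi * xi for wi, xi in zip(w, x)) > 0 else 0
                if predicted == desired:
                    continue                      # correct: do nothing
                errors += 1
                if predicted == 1:                # incorrectly said yes: subtract input
                    w = [wi - xi for wi, xi in zip(w, x)]
                else:                             # incorrectly said no: add input
                    w = [wi + xi for wi, xi in zip(w, x)]
            if errors == 0:
                return w                          # converged
        return w

    # LOGICAL-OR example from the next slide: x3 = 1 is the bias input.
    samples = [([0, 0, 1], 0), ([0, 1, 1], 1), ([1, 0, 1], 1), ([1, 1, 1], 1)]
    print(perceptron_train(samples))   # (1, 1, 0), matching the trace on the next slide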
36. Perceptron Convergence Example
- LOGICAL-OR (x3 is a constant bias input)

  Sample  x1  x2  x3  Desired output
  1       0   0   1   0
  2       0   1   1   1
  3       1   0   1   1
  4       1   1   1   1

- Initial w = (0 0 0); after S2, w = w + s2 = (0 1 1)
- Pass 2: S1: w = w - s1 = (0 1 0); S3: w = w + s3 = (1 1 1)
- Pass 3: S1: w = w - s1 = (1 1 0)
37. Perceptron Convergence Theorem
- If there exists a weight vector v such that v·x > 0 for all positive examples x, perceptron training will find one
- Sketch: assume v·x >= delta for all positive examples x
- |w|^2 increases by at most |x|^2 in each iteration (updates occur only when w·x <= 0), so after k updates |w|^2 <= k·max|x|^2
- v·w increases by at least delta per update, while v·w / (|v||w|) <= 1
- Converges in k <= O(max|x|^2 / delta^2) steps
38. Perceptron Learning
- Perceptrons learn linear decision boundaries
- E.g. a line separating two classes in the (x1, x2) plane
- But not XOR:
  x1 = -1, x2 = -1: need w1·x1 + w2·x2 < 0
  x1 =  1, x2 = -1: need w1·x1 + w2·x2 > 0  => implies w1 > 0
  x1 = -1, x2 =  1: need w1·x1 + w2·x2 > 0  => implies w2 > 0
  x1 =  1, x2 =  1: then w1·x1 + w2·x2 > 0, but XOR should be false => contradiction
39. Perceptron Example
- Digit recognition
- Assume a display of 8 lightable bars
- Inputs: bars on/off; threshold unit
- 65 steps to recognize '8'
40. Perceptron Summary
- Motivated by neuron activation
- Simple training procedure
- Guaranteed to converge
- IF linearly separable
41. Neural Nets
- Multi-layer perceptrons
- Inputs: real-valued
- Intermediate "hidden" nodes
- Output(s): one (or more) discrete-valued outputs
[Figure: inputs X1..X4 feeding a layer of hidden nodes, feeding outputs Y1 and Y2]
42. Neural Nets
- Pro: more general than perceptrons
- Not restricted to linear discriminants
- Multiple outputs: one classification each
- Con: no simple, guaranteed training procedure
- Use a greedy, hill-climbing procedure to train
- Gradient descent, backpropagation
43. Solving the XOR Problem
[Figure: network topology with 2 hidden nodes (o1, o2) and 1 output (y); inputs x1 and x2; each unit also receives a constant -1 input weighted by w01, w02, or w03]
- Desired behavior:

  x1  x2  o1  o2  y
  0   0   0   0   0
  0   1   0   1   1
  1   0   0   1   1
  1   1   1   1   0

- Weights: w11 = w12 = 1, w21 = w22 = 1, w01 = 3/2, w02 = 1/2, w03 = 1/2, w13 = -1, w23 = 1
- o1 acts as an AND unit, o2 as an OR unit; y fires when o2 is on and o1 is off, i.e. XOR (see check below)
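A quick check, in Python, that the weights listed above implement XOR, assuming step-threshold units with the bias weights applied to a constant -1 input as in the figure:

    def step(z):
        return 1 if z > 0 else 0

    def xor_net(x1, x2):
        # Hidden node o1 (an AND unit) and o2 (an OR unit), then output y.
        w11 = w12 = w21 = w22 = 1.0
        w01, w02, w03 = 1.5, 0.5, 0.5     # bias weights, each on a constant -1 input
        w13, w23 = -1.0, 1.0
        o1 = step(w11 * x1 + w21 * x2 - w01)
        o2 = step(w12 * x1 + w22 * x2 - w02)
        return step(w13 * o1 + w23 * o2 - w03)

    for x1 in (0, 1):
        for x2 in (0, 1):
            print(x1, x2, "->", xor_net(x1, x2))   # 0, 1, 1, 0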
44. Neural Net Applications
- Speech recognition
- Handwriting recognition
- NETtalk: letter-to-sound rules
- ALVINN: autonomous driving
45. ALVINN
- Driving as a neural network
- Inputs
- Image pixel intensities
- I.e. lane lines
- 5 Hidden nodes
- Outputs
- Steering actions
- E.g. turn left/right, and how far
- Training
- Observe human behavior: sample images and steering
46. Backpropagation
- Greedy, hill-climbing procedure
- Weights are the parameters to change
- Original hill-climbing changes one parameter per step
- Slow
- If the function is smooth, change all parameters per step
- Gradient descent
- Backpropagation: computes the current output, then works backward to correct the error
47. Producing a Smooth Function
- Key problem
- A pure step threshold is discontinuous
- Not differentiable
- Solution
- Sigmoid (squashed 's' function): the logistic function s(z) = 1 / (1 + e^(-z)) (see sketch below)
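A small sketch of the logistic sigmoid and its derivative, which reappears in the gradient computation a few slides below:

    import math

    def sigmoid(z):
        # Smooth, differentiable replacement for the hard step threshold.
        return 1.0 / (1.0 + math.exp(-z))

    def sigmoid_derivative(z):
        # ds/dz = s(z) * (1 - s(z))
        s = sigmoid(z)
        return s * (1.0 - s)

    print(sigmoid(0.0), sigmoid(5.0), sigmoid(-5.0))   # 0.5, ~0.99, ~0.007
    print(sigmoid_derivative(0.0))                     # 0.25, the maximum slope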
48. Neural Net Training
- Goal
- Determine how to change weights to get correct output
- Large change in weight to produce large reduction in error
- Approach
- Compute actual output o
- Compare to desired output d
- Determine the effect of each weight w on the error (d - o)
- Adjust weights
49. Neural Net Example
- x^i: i-th sample input vector; w: weight vector; y^i: desired output for the i-th sample
- Sum-of-squares error over training samples: E = sum_i (y^i - o(x^i, w))^2
- Full expression of the output in terms of the inputs and weights
(From MIT 6.034 notes, Lozano-Perez)
50. Gradient Descent
- Error: sum-of-squares error of the inputs with the current weights
- Compute the rate of change of the error w.r.t. each weight
- Which weights have the greatest effect on the error?
- Effectively, partial derivatives of the error w.r.t. the weights
- In turn, these depend on other weights => chain rule
51. Gradient Descent
- E = G(w): error as a function of the weights
- Find the rate of change of the error: dG/dw
- Follow the steepest rate of change
- Change weights s.t. the error is minimized (see sketch below)
[Figure: error surface E = G(w) with descent steps from w0 toward w1; local minima possible]
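A minimal one-dimensional illustration of the idea; the error function G and the step size are illustrative, not from the slides:

    def gradient_descent(grad, w0, rate=0.1, steps=100):
        """Repeatedly step against the gradient of the error."""
        w = w0
        for _ in range(steps):
            w -= rate * grad(w)
        return w

    # Illustrative error function G(w) = (w - 3)^2, with gradient dG/dw = 2(w - 3).
    grad = lambda w: 2.0 * (w - 3.0)
    print(gradient_descent(grad, w0=0.0))   # converges near 3.0, the minimum of G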
52. Gradient of Error
- Note: derivative of the sigmoid: ds(z1)/dz1 = s(z1)(1 - s(z1))
(From MIT 6.034 notes, Lozano-Perez)
53. From Effect to Update
- Gradient computation
- How each weight contributes to performance
- To train
- Need to determine how to CHANGE a weight based on its contribution to performance
- Need to determine how MUCH change to make per iteration
- Rate parameter r
- Large enough to learn quickly
- Small enough to reach, but not overshoot, target values
54. Backpropagation Procedure
[Figure: nodes i -> j -> k in successive layers]
- Pick rate parameter r
- Until performance is good enough:
- Do a forward computation to calculate the output
- Compute beta in the output node: beta_z = d - o_z
- Compute beta in all other nodes: beta_j = sum_k w_(j->k) * o_k * (1 - o_k) * beta_k
- Compute the change for all weights: delta_w_(i->j) = r * o_i * o_j * (1 - o_j) * beta_j (see sketch below)
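A compact sketch of this procedure for a 2-2-1 network of sigmoid units, following the beta-style updates above; the network shape, rate, epoch count, and XOR training data are illustrative assumptions, not from the slides:

    import math, random

    def sigmoid(z):
        return 1.0 / (1.0 + math.exp(-z))

    def train_xor(rate=0.5, epochs=10000, seed=0):
        random.seed(seed)
        # Weights: 2 hidden nodes (2 inputs + bias each), 1 output node (2 hidden + bias).
        w_hidden = [[random.uniform(-1, 1) for _ in range(3)] for _ in range(2)]
        w_out = [random.uniform(-1, 1) for _ in range(3)]
        data = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 0)]
        for _ in range(epochs):
            for x, d in data:
                xi = x + [1.0]                                   # append bias input
                h = [sigmoid(sum(w * v for w, v in zip(wj, xi))) for wj in w_hidden]
                hi = h + [1.0]
                o = sigmoid(sum(w * v for w, v in zip(w_out, hi)))
                beta_o = d - o                                   # beta at the output node
                beta_h = [w_out[j] * o * (1 - o) * beta_o for j in range(2)]
                # delta_w(i->j) = rate * o_i * o_j * (1 - o_j) * beta_j
                for j in range(3):
                    w_out[j] += rate * hi[j] * o * (1 - o) * beta_o
                for j in range(2):
                    for i in range(3):
                        w_hidden[j][i] += rate * xi[i] * h[j] * (1 - h[j]) * beta_h[j]
        return w_hidden, w_out

    w_hidden, w_out = train_xor()
    for x, d in [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 0)]:
        xi = x + [1.0]
        h = [sigmoid(sum(w * v for w, v in zip(wj, xi))) for wj in w_hidden] + [1.0]
        o = sigmoid(sum(w * v for w, v in zip(w_out, h)))
        print(x, d, round(o, 2))   # outputs should approach the 0/1 targets (XOR can hit local minima)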
55. Backprop Example
Forward prop: compute z_i and y_i given x_k, w_l
56. Backpropagation Observations
- Procedure is (relatively) efficient
- All computations are local
- Use inputs and outputs of current node
- What is good enough?
- Rarely reach target (0 or 1) outputs
- Typically, train until within 0.1 of target
57. Neural Net Summary
- Training
- Backpropagation procedure
- Gradient descent strategy (usual problems)
- Prediction
- Compute outputs based on input vector and weights
- Pros: very general, fast prediction
- Cons: training can be VERY slow (1000s of epochs), overfitting
58. Training Strategies
- Online training
- Update weights after each sample
- Offline (batch) training
- Compute error over all samples
- Then update weights
- Online training is noisy
- Sensitive to individual instances
- However, may escape local minima
59. Training Strategy
- To avoid overfitting
- Split data into training, validation, and test sets
- Also, avoid excess weights (fewer weights than samples)
- Initialize with small random weights
- Small changes have noticeable effect
- Use offline training
- Train until error on the validation set reaches a minimum
- Evaluate on the test set
- No more weight changes
60. Classification
- Neural networks are best suited to classification tasks
- Single output -> binary classifier
- Multiple outputs -> multiway classification
- Applied successfully to learning pronunciation
- Sigmoid pushes outputs toward binary classification
- Not good for regression
61. Neural Net Example
- NETtalk: letter-to-sound by neural net
- Inputs
- Need context to pronounce a letter
- 7-letter window: predict the sound of the middle letter
- 29 possible characters: alphabet + space, comma, period
- 7 x 29 = 203 inputs
- 80 hidden nodes
- Output: generate 60 phones
- Output nodes map to 26 units: 21 articulatory, 5 stress/syllable
- Vector quantization of acoustic space
62. Neural Net Example: NETtalk
- Learning to talk
- 5 iterations over 1024 training words: word boundaries/stress emerge
- 10 iterations: intelligible
- 400 new test words: 80% correct
- Not as good as DecTalk, but automatic
63. Neural Net Conclusions
- Simulation based on neurons in the brain
- Perceptrons (single neuron)
- Guaranteed to find a linear discriminant
- IF one exists -> problem: XOR
- Neural nets (multi-layer perceptrons)
- Very general
- Backpropagation training procedure
- Gradient descent: local minima, overfitting issues