Title: Hubs and Authorities
1. Hubs and Authorities; Learning Perceptrons
- Artificial Intelligence
- CMSC 25000
- February 3, 2004
2. Roadmap
- Problem
- Matching Topics and Documents
- Methods
- Classic Vector Space Model
- Challenge I: Beyond literal matching
- Expansion Strategies
- Challenge II: Authoritative sources
- Hubs & Authorities
- PageRank
3. Authoritative Sources
- Based on the vector space model alone, what would you expect to get searching for "search engine"?
- Would you expect to get Google?
4. Issue
- Text isn't always the best indicator of content
- Example: "search engine"
- A text search -> reviews of search engines
- The term doesn't appear on the search engines' own pages
- The term probably appears on many pages that point to many search engines
5. Hubs & Authorities
- Not all sites are created equal
- Finding better sites
- Question: What defines a good site? An authoritative one
- Not just content, but connections!
- A site that many other sites think is good
- A site that is pointed to by many other sites is an authority
6. Conferring Authority
- Authorities rarely link to each other (competition)
- Hubs: relevant sites that point to prominent sites on a topic
- Often not prominent themselves; professional or amateur
- Good hubs point to good authorities
7. Computing HITS
- Finding hubs and authorities
- Two steps:
- Sampling: find potential authorities
- Weight-propagation: iteratively estimate the best hubs and authorities
8. Sampling
- Identify potential hubs and authorities
- A connected subsection of the web
- Select a root set with a standard text query
- Construct the base set (see the sketch below):
- All nodes pointed to by the root set
- All nodes that point to the root set
- Drop within-domain links
- Typically 1000-5000 pages
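The base-set expansion can be sketched as a small graph walk. The out_links/in_links lookups below are hypothetical placeholders for whatever link index the crawler provides; they are not names from the slides, and the cap Kleinberg applies to in-linking pages is omitted.

```python
# A minimal sketch of base-set construction from a root set.
def build_base_set(root_set, out_links, in_links):
    base = set(root_set)
    for page in root_set:
        base.update(out_links(page))   # all nodes pointed to by the root set
        base.update(in_links(page))    # all nodes that point to the root set
    return base
```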
9. Weight-propagation
- Weights:
- Authority weight x(p)
- Hub weight y(p)
- All weights are relative (normalized each round)
- Updating: x(p) <- sum of y(q) over pages q that link to p; y(p) <- sum of x(q) over pages q that p links to (see the sketch below)
- The iteration converges
- Pages with high x are good authorities; pages with high y are good hubs
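A minimal sketch of the weight-propagation loop, assuming the base set is given as a dict mapping each page to the pages it links to (the representation and iteration count are assumptions; the update rule is the HITS rule described above):

```python
import math

def hits(graph, iterations=50):
    pages = set(graph) | {q for targets in graph.values() for q in targets}
    auth = {p: 1.0 for p in pages}   # x: authority weights
    hub = {p: 1.0 for p in pages}    # y: hub weights
    for _ in range(iterations):
        # Authority update: sum the hub weights of pages that point to p
        new_auth = {p: 0.0 for p in pages}
        for q, targets in graph.items():
            for p in targets:
                new_auth[p] += hub[q]
        # Hub update: sum the (new) authority weights of pages that p points to
        new_hub = {q: sum(new_auth[p] for p in graph.get(q, ())) for q in pages}
        # Normalize so the weights stay relative
        a_norm = math.sqrt(sum(v * v for v in new_auth.values())) or 1.0
        h_norm = math.sqrt(sum(v * v for v in new_hub.values())) or 1.0
        auth = {p: v / a_norm for p, v in new_auth.items()}
        hub = {p: v / h_norm for p, v in new_hub.items()}
    return auth, hub
```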
10. Google's PageRank
- Identifies authorities
- Important pages are those pointed to by many other pages
- Better pointers, higher rank
- Ranks search results
- PR(A) = (1 - d) + d * (PR(t1)/C(t1) + ... + PR(tn)/C(tn))
- t: a page pointing to A; C(t): number of outbound links of t; d: damping factor
- Actual ranking is on a logarithmic scale
- Iterate until the ranks converge (see the sketch below)
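A minimal sketch of the PageRank iteration using the formula above, again assuming a dict-of-links graph representation and a fixed number of iterations:

```python
def pagerank(graph, d=0.85, iterations=50):
    pages = set(graph) | {q for targets in graph.values() for q in targets}
    pr = {p: 1.0 for p in pages}
    out_count = {p: len(graph.get(p, [])) for p in pages}   # C(t)
    for _ in range(iterations):
        new_pr = {p: 1.0 - d for p in pages}                # (1 - d) base rank
        for t, targets in graph.items():
            if out_count[t] == 0:
                continue
            share = pr[t] / out_count[t]                     # PR(t) / C(t)
            for a in targets:
                new_pr[a] += d * share
        pr = new_pr
    return pr
```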
11. Contrasts
- Internal links: large sites carry more weight (if well-designed)
- H&A ignores site-internal links
- Outbound links explicitly penalized
- Lots of tweaks
12. Web Search
- Search by content
- Vector space model
- Word-based representation
- "Aboutness" and "surprise"
- Enhancing matches
- Simple learning model
- Search by structure
- Authorities identified by the link structure of the web
- Hubs confer authority
13. Efficient Implementation: K-D Trees
- Divide instances into sets based on features
- Binary branching: e.g. > value
- A split path of depth d reaches 2^d leaves, so for n instances the depth is d = O(log n)
- To split cases into sets (see the sketch below):
- If there is one element in the set, stop
- Otherwise pick a feature to split on
- Find the average position of the two middle objects on that dimension
- Split the remaining objects based on that average position
- Recursively split the subsets
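A minimal sketch of this construction, assuming instances are (feature_vector, label) pairs. The slides do not say how to pick the split feature, so this version simply cycles through the dimensions by depth; that choice is an assumption.

```python
def build_kd_tree(points, depth=0):
    # points: list of (feature_vector, label) pairs
    if len(points) <= 1:
        return {"leaf": points}                       # one element: stop
    dim = depth % len(points[0][0])                   # feature to split on
    points = sorted(points, key=lambda p: p[0][dim])
    mid = len(points) // 2
    # average position of the two middle objects on that dimension
    split = (points[mid - 1][0][dim] + points[mid][0][dim]) / 2.0
    left = [p for p in points if p[0][dim] <= split]
    right = [p for p in points if p[0][dim] > split]
    if not left or not right:                         # all values equal: stop
        return {"leaf": points}
    return {"dim": dim, "split": split,
            "left": build_kd_tree(left, depth + 1),
            "right": build_kd_tree(right, depth + 1)}
```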
14. K-D Trees: Classification
[Figure: example k-d tree classification; yes/no tests at internal nodes lead to leaves labeled Good or Poor.]
15. Efficient Implementation: Parallel Hardware
- Classification cost: O(n) distance computations
- Constant time with O(n) processors
- Cost of finding the closest: compute pairwise minimums, successively (see the sketch below)
- O(log n) time
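A sketch of the successive pairwise-minimum idea: each round compares disjoint pairs and keeps the smaller value, halving the candidates, so with one processor per pair the reduction takes O(log n) rounds. This serial version only simulates the rounds.

```python
def tournament_min(values):
    candidates = list(values)
    while len(candidates) > 1:
        next_round = []
        for i in range(0, len(candidates) - 1, 2):
            next_round.append(min(candidates[i], candidates[i + 1]))  # one pair per processor
        if len(candidates) % 2:            # an odd element advances unchallenged
            next_round.append(candidates[-1])
        candidates = next_round
    return candidates[0]
```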
16. Nearest Neighbor Summary
17. Nearest Neighbor: Issues
- Prediction can be expensive if there are many features
- Affected by classification and feature noise
- One entry can change the prediction
- Definition of the distance metric
- How to combine different features
- Different types and ranges of values
- Sensitive to feature selection
18. Nearest Neighbor: Analysis
- Problem: ambiguous labeling, training noise
- Solution: K-nearest neighbors (see the sketch below)
- Not just the single nearest instance
- Compare to the K nearest neighbors
- Label according to the majority of the K
- What should K be?
- Often 3; K can also be tuned by training
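A minimal sketch of K-nearest-neighbor classification with a majority vote, assuming numeric feature vectors and unweighted Euclidean distance (the metric choice is discussed on the next slide):

```python
from collections import Counter
import math

def knn_classify(training, query, k=3):
    # training: list of (feature_vector, label) pairs
    def distance(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    nearest = sorted(training, key=lambda item: distance(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]    # label of the majority of the K
```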
19. Nearest Neighbor: Analysis
- Issue
- What is a good distance metric?
- How should features be combined?
- Strategy
- (Typically weighted) Euclidean distance
- Feature scaling: normalization
- Good starting point: (feature - feature_mean) / feature_standard_deviation
- Rescales all values, centered on 0 with standard deviation 1 (see the sketch below)
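A small sketch of that normalization applied per feature, so every dimension contributes on a comparable scale before distances are computed:

```python
import statistics

def normalize_features(vectors):
    # vectors: list of equal-length numeric feature vectors
    columns = list(zip(*vectors))
    means = [statistics.mean(col) for col in columns]
    stds = [statistics.pstdev(col) or 1.0 for col in columns]  # guard constant features
    return [[(x - m) / s for x, m, s in zip(vec, means, stds)] for vec in vectors]
```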
20. Nearest Neighbor: Analysis
- Issue
- What features should we use?
- E.g. credit rating: many possible features
- Tax bracket, debt burden, retirement savings, etc.
- Nearest neighbor uses ALL of them
- Irrelevant feature(s) can mislead
- A fundamental problem with nearest neighbor
21. Nearest Neighbor: Advantages
- Fast training
- Just record the feature vector / output value pairs
- Can model a wide variety of functions
- Complex decision boundaries
- Weak inductive bias
- Very generally applicable
22. Summary
- Machine learning
- Acquire a function from input features to an output value
- Based on prior training instances
- Supervised vs Unsupervised learning
- Classification and Regression
- Inductive bias
- Representation of function to learn
- Complexity, Generalization, Validation
23. Summary: Nearest Neighbor
- Nearest neighbor
- Training: record input vectors and output values
- Prediction: closest training instance to the new data
- Efficient implementations
- Pros: fast training, very general, little bias
- Cons: distance metric (scaling), sensitivity to noise and extraneous features
24. Learning Perceptrons
- Artificial Intelligence
- CMSC 25000
- February 3, 2003
25. Agenda
- Neural Networks
- Biological analogy
- Perceptrons: single-layer networks
- Perceptron training
- Perceptron convergence theorem
- Perceptron limitations
- Conclusions
26. Neurons: The Concept
[Figure: neuron diagram labeling the dendrites, axon, nucleus, and cell body.]
Neurons receive inputs from other neurons (via synapses). When the input exceeds a threshold, the neuron fires, sending output along its axon to other neurons. The brain has roughly 10^11 neurons and 10^16 synapses.
27. Artificial Neural Nets
- Simulated neuron
- Node connected to other nodes via links
- Links play the role of the axon/synapse connection
- Each link has an associated weight (like a synapse)
- The weight is multiplied by the output of the source node
- A node combines its inputs via an activation function
- E.g. the sum of weighted inputs passed through a threshold (see the sketch below)
- Simpler than real neuronal processes
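A minimal sketch of the simulated neuron just described: weighted inputs are summed and passed through a hard threshold. The 0/1 outputs and default threshold value are assumptions for illustration.

```python
def neuron_output(inputs, weights, threshold=0.0):
    # inputs and weights are parallel lists of numbers
    total = sum(w * x for w, x in zip(weights, inputs))   # combine weighted inputs
    return 1 if total > threshold else 0                  # fire only above threshold
```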
28. Artificial Neural Net
[Figure: artificial neuron diagram; inputs x, each multiplied by a weight w, feed a Sum unit followed by a Threshold.]
29. Perceptrons
- Single neuron-like element
- Binary inputs
- Binary outputs
- Output 1 if the weighted sum of inputs > threshold
30. Perceptron Structure
[Figure: perceptron with inputs x1, x2, x3, ..., xn and weights w1, ..., wn feeding output y; an extra input x0 = 1 with weight w0 compensates for the threshold.]
31. Perceptron Convergence Procedure
- Straightforward training procedure
- Learns linearly separable functions
- Until the perceptron yields the correct output for all examples:
- If the perceptron is correct, do nothing
- If the perceptron is wrong:
- If it incorrectly says yes, subtract the input vector from the weight vector
- Otherwise, add the input vector to the weight vector
32. Perceptron Convergence Example
- LOGICAL-OR (x3 = 1 for every sample: the always-on bias input)
- Sample | x1 | x2 | x3 | Desired output
  1      | 0  | 0  | 1  | 0
  2      | 0  | 1  | 1  | 1
  3      | 1  | 0  | 1  | 1
  4      | 1  | 1  | 1  | 1
- Initial w = (0 0 0); after S2, w = w + s2 = (0 1 1)
- Pass 2: S1: w = w - s1 = (0 1 0); S3: w = w + s3 = (1 1 1)
- Pass 3: S1: w = w - s1 = (1 1 0)
- The sketch below reproduces this trace
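A minimal sketch of the convergence procedure run on the LOGICAL-OR samples above (third component is the always-on bias input); the perceptron here outputs 1 when the weighted sum exceeds 0, and the run ends with w = (1, 1, 0), matching the trace.

```python
def train_perceptron(samples, passes=10):
    # samples: list of (input_vector, desired_output) pairs
    w = [0.0] * len(samples[0][0])
    for _ in range(passes):
        converged = True
        for x, desired in samples:
            output = 1 if sum(wi * xi for wi, xi in zip(w, x)) > 0 else 0
            if output == desired:
                continue                 # correct: do nothing
            converged = False
            if output == 1:              # incorrectly said yes: subtract the input
                w = [wi - xi for wi, xi in zip(w, x)]
            else:                        # incorrectly said no: add the input
                w = [wi + xi for wi, xi in zip(w, x)]
        if converged:
            break
    return w

or_samples = [((0, 0, 1), 0), ((0, 1, 1), 1), ((1, 0, 1), 1), ((1, 1, 1), 1)]
print(train_perceptron(or_samples))      # -> [1.0, 1.0, 0.0], matching the trace
```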
33. Perceptron Convergence Theorem
- If there exists a weight vector W that correctly classifies all the training examples, perceptron training will find such a vector
- Proof sketch:
- Assume a unit vector v and margin delta > 0 with v . x >= delta for all positive examples x
- Each mistaken update adds (or subtracts) an input x, so v . w grows by at least delta per update: after k updates, v . w >= k * delta
- |w|^2 increases by at most |x|^2 in each iteration, so |w|^2 <= k * max|x|^2
- Since v . w / |w| <= 1, we get k * delta <= |w| <= sqrt(k) * max|x|
- Converges in k <= O((max|x| / delta)^2) steps
34. Perceptron Learning
- Perceptrons learn linear decision boundaries
- E.g. a line in the (x1, x2) plane separating the 0s from the 1s
- But not XOR (inputs coded as -1 / 1, threshold 0):
- x1 = -1, x2 = -1: need w1*x1 + w2*x2 < 0
- x1 = 1, x2 = -1: need w1*x1 + w2*x2 > 0; together with the first case, this implies w1 > 0
- x1 = -1, x2 = 1: need w1*x1 + w2*x2 > 0; together with the first case, this implies w2 > 0
- x1 = 1, x2 = 1: then w1*x1 + w2*x2 > 0, but XOR says this case should be false
35. Perceptron Example
- Digit recognition
- Assume a display with 8 lightable bars
- Inputs: on/off; threshold unit
- 65 steps to recognize "8"
36. Perceptron Summary
- Motivated by neuron activation
- Simple training procedure
- Guaranteed to converge
- IF linearly separable