Title: Fast Random Walk with Restart and Its Applications
1Fast Random Walk with Restart and Its Applications
- Hanghang Tong, Christos Faloutsos and Jia-Yu (Tim) Pan
ICDM 2006, Dec. 18-22, Hong Kong
2Motivating Questions
- Q: How to measure relevance?
- A: Random walk with restart
- Q: How to do it efficiently?
- A: This talk tries to answer!
3Random walk with restart
4Random walk with restart
Node 4
Node   1     2     3     4     5     6     7     8     9     10    11    12
Score  0.13  0.10  0.13  0.22  0.13  0.05  0.05  0.08  0.04  0.03  0.04  0.02
Nearby nodes, higher scores
Ranking vector
More red, more relevant
5Automatic Image Caption
Sea
Sun
Sky
Wave
?
A: RWR! [Pan, KDD 2004]
6Region
Image
Test Image
Keyword
7Region
Image
Test Image
Grass, Forest, Cat, Tiger
Sea
Sun
Sky
Wave
Cat
Forest
Tiger
Grass
Keyword
8Neighborhood Formation
Q: What is the most related conference to ICDM?
A: RWR! [Sun, ICDM 2005]
Conference
Author
9NF example
10Center-Piece Subgraph (CePS)
Q: ?
Original Graph (black: query nodes)
CePS
A: RWR! [Tong, KDD 2006]
11CePS Example
12Other Applications
- Content-based Image Retrieval [He]
- Personalized PageRank [Jeh, Widom, Haveliwala]
- Anomaly Detection (for node, link) [Sun]
- Link Prediction [Getoor, Jensen]
- Semi-supervised Learning [Zhu, Zhou]
13Roadmap
- Background
- RWR Definitions
- RWR Algorithms
- Basic Idea
- FastRWR
- Pre-Compute Stage
- On-Line Stage
- Experimental Results
- Conclusion
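14Computing RWR
$\vec{r}_i = c\,\tilde{W}\,\vec{r}_i + (1-c)\,\vec{e}_i$
- $\vec{r}_i$: ranking vector (n x 1)
- $\tilde{W}$: adjacency matrix (n x n)
- $\vec{e}_i$: starting vector (n x 1)
- $1-c$: restart probability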
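A short derivation may help connect this slide to the matrix inverse that the later slides focus on. It assumes the standard RWR fixed-point equation with a normalized adjacency matrix $\tilde{W}$ and restart probability $1-c$; the symbol $Q$ for $I - c\tilde{W}$ is notation assumed here.

```latex
\begin{align*}
\vec{r}_i &= c\,\tilde{W}\,\vec{r}_i + (1-c)\,\vec{e}_i \\
(I - c\,\tilde{W})\,\vec{r}_i &= (1-c)\,\vec{e}_i \\
\vec{r}_i &= (1-c)\,Q^{-1}\,\vec{e}_i, \qquad Q = I - c\,\tilde{W}
\end{align*}
```

Under these assumptions, the ranking vector for query i is the i-th column of $Q^{-1}$ scaled by $1-c$.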
15Beyond RWR
Maxwell Equation for Web! [Chakrabarti]
P-PageRank [Haveliwala]
Semi-supervised Learning [Zhou, Zhu]
RL in CBIR [He]
PageRank [Haveliwala]
RWR [Pan, Sun]
Fast RWR Finds the Root Solution!
16Q: Given query i, how to solve it?
17OnTheFly
No pre-computation / light storage
Slow on-line response
O(mE)
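A minimal sketch of the on-the-fly approach, assuming it means iterating the RWR update until convergence (roughly m iterations, each touching every edge). The function name, the column normalization, the value c = 0.9, and the 4-node path graph are illustrative assumptions, not taken from the talk.

```python
import numpy as np

def rwr_on_the_fly(W, i, c=0.9, tol=1e-10, max_iter=1000):
    """Iterate r <- c * W r + (1 - c) * e_i until the update is tiny."""
    n = W.shape[0]
    e = np.zeros(n)
    e[i] = 1.0                      # starting vector for query node i
    r = e.copy()
    for _ in range(max_iter):
        r_next = c * (W @ r) + (1.0 - c) * e
        if np.abs(r_next - r).sum() < tol:
            return r_next
        r = r_next
    return r

# Hypothetical toy graph: a path 0 - 1 - 2 - 3
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
W = A / A.sum(axis=0)               # column-normalize (assumed normalization)
r = rwr_on_the_fly(W, i=0)
print(r)
```

Nothing is stored between queries, which is why storage is light but every query pays for the full iteration.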
18PreCompute
[Figure: example graph with nodes 1-12 and matrix R] [Haveliwala]
19PreCompute
Fast on-line response
Heavy pre-computation/storage cost
O(n^3) pre-computation, O(n^2) storage
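A minimal sketch of the pre-compute approach, assuming it amounts to inverting the full n x n matrix I - cW once and then reading off one column per query. The function names, c = 0.9, and the toy graph are hypothetical.

```python
import numpy as np

def precompute_q_inverse(W, c=0.9):
    """One-off dense inversion: cubic time and quadratic storage in n."""
    n = W.shape[0]
    return np.linalg.inv(np.eye(n) - c * W)

def rwr_query(Q_inv, i, c=0.9):
    """On-line step: scale the i-th column of the stored inverse."""
    return (1.0 - c) * Q_inv[:, i]

# Hypothetical toy graph: a path 0 - 1 - 2 - 3
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
W = A / A.sum(axis=0)               # column-normalize (assumed normalization)
Q_inv = precompute_q_inverse(W)
r = rwr_query(Q_inv, i=0)
print(r)
```

The query itself is a single column lookup, but the stored inverse is dense even when the graph is sparse.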
20Q: How to Balance?
On-line
Off-line
22Basic Idea
Find Community
Combine
Fix the remaining
23Pre-computational stage
- Q: Efficiently compute and store $Q^{-1}$
- A: A few small, instead of ONE BIG, matrix inversions
24On-Line Query Stage
- Q: Efficiently recover one column of $Q^{-1}$
- A: A few, instead of MANY, matrix-vector multiplications
26Pre-compute Stage
- P1: B_Lin Decomposition
- P1.1: partition
- P1.2: low-rank approximation
- P2: Q matrices
- P2.1: computing (for each partition)
- P2.2: computing (for concept space)
27P1.1 partition
[Figure: example graph with nodes 1-12, split into partitions]
Within-partition links
Cross-partition links
28P1.1 block-diagonal
[Figure: example graph with nodes 1-12]
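A minimal sketch of the block-diagonal step, assuming partition labels are already available (how the partition is found is not shown here). Within-partition entries go to W1 and cross-partition entries to W2; the function name and the two-triangle toy graph are hypothetical.

```python
import numpy as np

def split_by_partition(W, labels):
    """Split W into within-partition (W1) and cross-partition (W2) parts."""
    labels = np.asarray(labels)
    same = labels[:, None] == labels[None, :]   # True where both ends share a partition
    W1 = np.where(same, W, 0.0)                 # block-diagonal after reordering by label
    W2 = W - W1                                 # the remaining "bridge" links
    return W1, W2

# Hypothetical toy graph: triangles {0,1,2} and {3,4,5} joined by edge 2-3
A = np.zeros((6, 6))
for u, v in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    A[u, v] = A[v, u] = 1.0
W = A / A.sum(axis=0)                           # column-normalize (assumed)
W1, W2 = split_by_partition(W, labels=[0, 0, 0, 1, 1, 1])
print(np.count_nonzero(W2))
```

With nodes ordered by partition, W1 is block-diagonal, so each block can be handled on its own.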
29P1.2 LRA for W2
[Figure: example graph with nodes 1-12]
S << W2
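A minimal sketch of a low-rank approximation of the cross-partition matrix W2 as U S V with a small t x t matrix S. Truncated SVD is used here as one possible choice; the talk does not say which factorization is used, and the function name and toy matrix are hypothetical.

```python
import numpy as np

def low_rank_approx(W2, t):
    """Rank-t approximation W2 ~= U @ S @ V via truncated SVD (assumed method)."""
    U_full, s, Vt = np.linalg.svd(W2, full_matrices=False)
    U = U_full[:, :t]          # n x t
    S = np.diag(s[:t])         # t x t, much smaller than W2
    V = Vt[:t, :]              # t x n
    return U, S, V

# Hypothetical cross-partition matrix: a single bridge between nodes 2 and 3
W2 = np.zeros((6, 6))
W2[2, 3] = W2[3, 2] = 0.25
U, S, V = low_rank_approx(W2, t=2)
print(S.shape, np.allclose(U @ S @ V, W2))
```

Here W2 has rank 2, so t = 2 reproduces it exactly; on a real graph t trades accuracy against the size of S.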
30(No Transcript)
31P2.1 Computing (for each partition)
32Comparing $Q_1^{-1}$ and $Q^{-1}$
- Computing Time
- 100,000 nodes, 100 partitions
- Computing $Q_1^{-1}$ is 10,000x faster!
- Storage Cost
- 100x saving!
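The figures on this slide appear consistent with a simple cost model. A back-of-the-envelope sketch, assuming dense inversion costs on the order of n^3 operations, dense storage takes n^2 entries, and the partitions are equal-sized (the slide states none of these explicitly):

```python
n = 100_000          # nodes
k = 100              # partitions
block = n // k       # 1,000 nodes per partition (assumed equal-sized)

full_inversion_cost = n ** 3              # one big inversion
blockwise_inversion_cost = k * block ** 3 # k small inversions

full_storage = n ** 2                     # one dense n x n inverse
blockwise_storage = k * block ** 2        # k dense blocks

speedup = full_inversion_cost // blockwise_inversion_cost
saving = full_storage // blockwise_storage
print(speedup, saving)
```

Under this model the speedup is k^2 and the storage saving is k, so both grow with the number of partitions.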
33Q: How to fix the green portions?
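34P2.2 Computing (for concept space)
$\tilde{\Lambda} = (S^{-1} - c\,V Q_1^{-1} U)^{-1}$, where $Q_1^{-1}$ is the per-partition inverse from P2.1 and $U S V$ is the low-rank approximation from P1.2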
35We have
Communities
Bridges
Sherman-Morrison (SM) Lemma says
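The statement of the lemma did not survive extraction. A hedged reconstruction, assuming the Sherman-Morrison(-Woodbury) identity is applied with $Q_1 = I - c\tilde{W}_1$ for the within-partition (community) links and $\tilde{W}_2 \approx U S V$ for the cross-partition (bridge) links; the symbol names are assumptions:

```latex
\begin{align*}
Q &= I - c\,\tilde{W}_1 - c\,\tilde{W}_2 \;\approx\; Q_1 - c\,U S V \\
Q^{-1} &\approx Q_1^{-1} + c\,Q_1^{-1} U \tilde{\Lambda} V Q_1^{-1},
\qquad \tilde{\Lambda} = \left(S^{-1} - c\,V Q_1^{-1} U\right)^{-1}
\end{align*}
```

In this form only the block-diagonal $Q_1$ and a small t x t matrix need to be inverted, which matches the "a few small, instead of one big" idea from the earlier slides.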
37On-Line Stage
?
Query
Result
Pre-Computation
38On-Line Query Stage
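A minimal end-to-end sketch of how the pre-computed pieces could be combined at query time, assuming the Sherman-Morrison form r_i = (1-c)(Q1^{-1} e_i + c Q1^{-1} U Lam V Q1^{-1} e_i). The function names, the truncated SVD, the given partition labels, dense arrays, and the toy graph are all illustrative assumptions rather than the authors' implementation.

```python
import numpy as np

def blin_precompute(W, labels, t, c=0.9):
    """Pre-compute stage: block-wise inverses plus a small t x t matrix."""
    labels = np.asarray(labels)
    n = W.shape[0]
    same = labels[:, None] == labels[None, :]
    W1 = np.where(same, W, 0.0)              # within-partition links
    W2 = W - W1                              # cross-partition links
    Q1_inv = np.zeros((n, n))
    for lab in np.unique(labels):            # a few small inversions
        idx = np.flatnonzero(labels == lab)
        block = np.eye(len(idx)) - c * W1[np.ix_(idx, idx)]
        Q1_inv[np.ix_(idx, idx)] = np.linalg.inv(block)
    # Low-rank approximation W2 ~= U S V; t must not exceed rank(W2)
    U_full, s, Vt = np.linalg.svd(W2, full_matrices=False)
    U, S, V = U_full[:, :t], np.diag(s[:t]), Vt[:t, :]
    Lam = np.linalg.inv(np.linalg.inv(S) - c * V @ Q1_inv @ U)
    return Q1_inv, U, Lam, V

def blin_query(pre, i, c=0.9):
    """On-line stage: a few matrix-vector products, no iteration."""
    Q1_inv, U, Lam, V = pre
    q = Q1_inv[:, i]                         # Q1^{-1} e_i
    return (1.0 - c) * (q + c * Q1_inv @ (U @ (Lam @ (V @ q))))

# Hypothetical toy graph: triangles {0,1,2} and {3,4,5} joined by edge 2-3
A = np.zeros((6, 6))
for u, v in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    A[u, v] = A[v, u] = 1.0
W = A / A.sum(axis=0)                        # column-normalize (assumed)
pre = blin_precompute(W, labels=[0, 0, 0, 1, 1, 1], t=2)
r = blin_query(pre, i=0)
print(r)
```

In this toy case the cross-partition matrix has rank 2, so t = 2 makes the result match the exact solution; with a smaller t on a real graph the answer would only be approximate.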
39(No Transcript)
41Experimental Setup
- Dataset
- DBLP/authorship
- Author-Paper
- 315k nodes
- 1,800k edges
- Approx. Quality: Relative Accuracy
- Application: Center-Piece Subgraph
42Query Time vs. Pre-Compute Time
[Plot: Log Query Time vs. Log Pre-compute Time]
- Quality: 90%
- On-line
- Up to 150x speedup
- Pre-computation
- Two orders of magnitude saving
43Query Time vs. Pre-Storage
[Plot: Log Query Time vs. Log Storage]
- Quality: 90%
- On-line
- Up to 150x speedup
- Pre-storage
- Three orders of magnitude saving
45Conclusion
- FastRWR
- Reasonable quality preservation (90%)
- 150x speed-up in query time
- Orders-of-magnitude saving in pre-computation and storage
- More in the paper
- Variants of FastRWR and theoretical justification
- Implementation details
- normalization, low-rank approximation, sparse
- More experiments
- Other datasets, other applications
46Q&A
- Thank you!
- htong_at_cs.cmu.edu
- www.cs.cmu.edu/htong