Title: Heat Diffusion Model and its Applications
1 Heat Diffusion Model and its Applications
- Haixuan Yang
- Term Presentation
- Dec 2, 2005
2 Outline
- Introduction
- Heat Diffusion Model
- Heat Diffusion Classifiers
- Heat Diffusion Ranking
- Predictive Random Graph Ranking
- Experiments
- Conclusions and Future Work
3 Introduction - heat diffusion
- Heat diffusion is a physical phenomenon.
- In a medium, heat always flows from a position with high temperature to a position with low temperature.
- The heat kernel describes the amount of heat that one point receives from another point.
- The way that heat diffuses varies when the underlying geometry varies.
4 Introduction - related work
- Kondor & Lafferty (NIPS 2002)
  - Construct a diffusion kernel on a graph
  - Handle discrete attributes
  - Apply to a large margin classifier
  - Achieve good performance in accuracy on 5 data sets from UCI
- Lafferty & Lebanon (JMLR 2005)
  - Construct a diffusion kernel on a special manifold
  - Handle continuous attributes
  - Restrict to text classification
  - Apply to SVM
  - Achieve good performance in accuracy on WebKB and Reuters
- Belkin & Niyogi (Neural Computation 2003)
  - Reduce dimension by heat kernel and local distance
- Tenenbaum et al. (Science 2000)
  - Reduce dimension by local distance
5 Introduction - the ideas adopted
- Similarity between heat diffusion and density.
  - Heat diffuses in the same way as a Gaussian density in the ideal case when the manifold is Euclidean space.
  - The way heat diffuses on a manifold can be understood as a generalization of the Gaussian density from Euclidean space to the manifold.
- Local information is relatively accurate in a nonlinear manifold.
  - Learn local information by k nearest neighbors.
[Figure labels: the direct distance may not be accurate; the distance along the curve may measure it better.]
6 Introduction - different ideas
- Unknown manifold in most cases.
- Unknown solution for the known manifold.
- The explicit form of the approximation to the heat kernel in (Lafferty & Lebanon, JMLR 2005) is a rare case.
- Establish the heat diffusion equation directly on a graph that is either the K nearest neighbor graph or the link graph.
- The K nearest neighbor graph or the link graph is considered as an approximation to the unknown manifold.
- Always have an explicit form in any case.
- Form a classifier by the solution directly in the application of classification.
- Apply the heat kernel for ranking on the Web pages.
7 Heat Diffusion Model - Notations
- G(V, E): a given directed graph, where
  - V = {1, 2, ..., n},
  - E = {(i, j) : there is an edge from i to j}.
- f_i(t): the heat at node i at time t.
- RH(i, j, t, Δt): the amount of heat that, at time t, node i receives from its antecedent j during a period of Δt.
- DH(i, t, Δt): the amount of heat that, at time t, node i diffuses to its subsequent nodes.
8 Heat Diffusion Model - assumptions
- RH(i, j, t, Δt) is proportional to the time period Δt.
- RH(i, j, t, Δt) is proportional to the heat at node j.
- RH(i, j, t, Δt) is zero if there is no link from j to i.
- DH(i, t, Δt) is proportional to the time period Δt.
- DH(i, t, Δt) is proportional to the heat at node i.
- RH(i, j, t, Δt) is proportional to its outdegree.
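9 Heat Diffusion Model - solution
- The difference between f_i(t+Δt) and f_i(t) can be expressed as [equation not preserved]
- It can be expressed in matrix form [equation not preserved]
- where we let [definition not preserved] for simplicity.
- Letting Δt tend to zero, the above equation becomes [equation not preserved]
- In particular, we have [equation not preserved]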
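The following is only a sketch of how the assumptions on slide 8 can lead to a closed-form solution. The coefficients $c_{ij}$ and $c_i$ are hypothetical symbols introduced here for illustration, not the presenter's notation, and how they depend on the outdegree is deliberately left unspecified.

```latex
% Assumed form of the heat exchanged during [t, t + \Delta t]:
%   RH(i,j,t,\Delta t) = c_{ij} f_j(t) \Delta t, with c_{ij} = 0 if there is no link from j to i
%   DH(i,t,\Delta t)   = c_i f_i(t) \Delta t
\begin{align*}
f_i(t+\Delta t) - f_i(t) &= -c_i f_i(t)\,\Delta t + \sum_{j:(j,i)\in E} c_{ij} f_j(t)\,\Delta t \\
f(t+\Delta t) - f(t) &= H f(t)\,\Delta t, \qquad H_{ii} = -c_i,\quad H_{ij} = c_{ij}\ (i \neq j) \\
\frac{d f(t)}{dt} &= H f(t) \qquad (\Delta t \to 0) \\
f(t) &= e^{tH} f(0), \qquad f(1) = e^{H} f(0)
\end{align*}
```

If a common diffusion rate γ (again an assumed symbol) is factored out of the coefficients, the same steps give $f(t) = e^{\gamma t H} f(0)$.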
10 Heat Diffusion Model - weighted graph
- For weighted graphs, the difference between f_i(t+Δt) and f_i(t) can be expressed as [equation not preserved]
- The solution is expressed as [equation not preserved]
11 Heat Diffusion Classifiers - Illustration
NHDC: Non-propagating Heat Diffusion Classifier
PHDC: Propagating Heat Diffusion Classifier
[Figure labels: the first heat diffusion; the second heat diffusion.]
12 Heat Diffusion Classifiers - Illustration
13 Heat Diffusion Classifiers - Illustration
14 Heat Diffusion Classifiers - Illustration
Heat received from class A: 0.018; heat received from class B: 0.016
Heat received from class A: 0.002; heat received from class B: 0.08
15 Heat Diffusion Classifiers - algorithm - Step 1
- Construct neighborhood graph
  - Define graph G over all data points, both in the training data set and in the test data set.
  - Add an edge from j to i if j is one of the K nearest neighbors of i.
  - Set edge weight w(i, j) = d(i, j) if j is one of the K nearest neighbors of i, where d(i, j) is the Euclidean distance between point i and point j.
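Step 1 can be sketched in a few lines of Python. This is a minimal illustration rather than the presenter's implementation: the function name `knn_graph`, the dictionary layout, and the toy points are all assumptions made here.

```python
import math

def knn_graph(points, k):
    """Return {i: {j: w(i, j)}}, with an edge j -> i for each of i's k nearest neighbors j."""
    n = len(points)
    weights = {}
    for i in range(n):
        # Euclidean distance from point i to every other point.
        dists = sorted((math.dist(points[i], points[j]), j) for j in range(n) if j != i)
        # Keep the k closest points; the edge weight is the distance itself.
        weights[i] = {j: d for d, j in dists[:k]}
    return weights

# Toy data: training and test points are pooled into one list.
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0), (5.0, 5.0)]
graph = knn_graph(points, k=2)
print(graph[0])  # {1: 1.0, 2: 2.0}
```

The brute-force neighbor search is quadratic in the number of points, which is acceptable for data sets of the size used in the experiments but would need an index structure for much larger ones.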
16 Heat Diffusion Classifiers - algorithm - Step 2
- Compute the Heat Kernel
  - Compute H for NHDC using [equation not preserved]
  - Compute the heat kernel for PHDC using the equation [equation not preserved]
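A rough sketch of Step 2 follows. The entries of H used here are an assumption: an edge j -> i contributes exp(-w(i, j)^2 / β), chosen so that the Parzen window and KNN special cases discussed later fall out, and each diagonal entry is minus the total rate leaving that node. The PHDC kernel is taken to be the matrix exponential of γH, computed with a plain Taylor series; `heat_matrix`, `mat_exp`, and the toy weights are hypothetical names and values.

```python
import math

def heat_matrix(weights, n, inv_beta):
    """Assumed form: H[i][j] = exp(-w(i, j)**2 / beta) for each edge j -> i,
    and each diagonal entry is minus the total heat rate leaving that node."""
    H = [[0.0] * n for _ in range(n)]
    for i, nbrs in weights.items():
        for j, w in nbrs.items():
            rate = math.exp(-inv_beta * w * w)
            H[i][j] += rate   # node i receives heat from node j
            H[j][j] -= rate   # node j loses the same amount
    return H

def mat_mul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def mat_exp(A, terms=30):
    """Truncated Taylor series I + A + A^2/2! + ... (adequate for small matrices)."""
    n = len(A)
    result = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms):
        term = [[x / k for x in row] for row in mat_mul(term, A)]
        result = [[r + t for r, t in zip(r_row, t_row)] for r_row, t_row in zip(result, term)]
    return result

# weights[i][j] = w(i, j) for each neighbor j of i (toy values).
weights = {0: {1: 1.0}, 1: {0: 1.0, 2: 2.0}, 2: {1: 2.0}}
gamma, inv_beta = 0.1, 0.5
H = heat_matrix(weights, n=3, inv_beta=inv_beta)
kernel = mat_exp([[gamma * h for h in row] for row in H])
column_sums = [sum(col) for col in zip(*kernel)]
```

Under this assumed H every column sums to zero, so every column of the kernel sums to one: heat is moved between nodes but never created or destroyed.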
17 Heat Diffusion Classifiers - algorithm - Step 3
- Compute the Heat Distribution
  - For each class c,
    - Set f(0): nodes labeled by class c have an initial unit heat at time 0; all other nodes have no heat at time 0.
    - Compute the heat distribution.
      - In PHDC, use equation [equation not preserved] to compute the heat distribution.
      - In NHDC, use equation f(t) = H f(0).
18 Heat Diffusion Classifiers - algorithm - Step 4
- Classify the nodes
  - The last step gives the heat distribution for each class k. Then each node in the test data set is assigned to the class from which it receives the most heat.
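Steps 3 and 4 can be sketched as follows, assuming a kernel matrix has already been computed (H for NHDC, or the propagated kernel for PHDC). The function name `classify_by_heat` and the convention that unlabeled test nodes carry `None` are assumptions of this sketch; the toy kernel rows reuse the heat values from the illustration on slide 14.

```python
def classify_by_heat(kernel, labels):
    """labels[j] is a class name for training nodes and None for test nodes."""
    n = len(labels)
    classes = sorted({c for c in labels if c is not None})
    heat = {}
    for c in classes:
        # f(0): unit heat on nodes of class c, no heat elsewhere.
        f0 = [1.0 if labels[j] == c else 0.0 for j in range(n)]
        # Heat distribution for this class: kernel applied to f(0).
        heat[c] = [sum(kernel[i][j] * f0[j] for j in range(n)) for i in range(n)]
    # Each test node goes to the class from which it receives the most heat.
    return {i: max(classes, key=lambda c: heat[c][i]) for i in range(n) if labels[i] is None}

labels = ["A", "B", None, None]
kernel = [
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.018, 0.016, 0.9, 0.0],   # test node 2: more heat from class A
    [0.002, 0.08, 0.0, 0.9],    # test node 3: more heat from class B
]
predictions = classify_by_heat(kernel, labels)
print(predictions)  # {2: 'A', 3: 'B'}
```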
19 Heat Diffusion Classifiers - Connections with other models
- The Parzen window approach (when the window function takes the normal form) is a special case of the NHDC.
- It is a non-parametric method for probability density estimation.
For each class k:
The class-conditional density for class k: [equation not preserved]
Using Bayes' rule: [equation not preserved]
Assign x to the class whose value is maximal.
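For reference, a minimal Parzen window classifier with a Gaussian window might look like the sketch below. It assumes a single shared bandwidth (written as `inv_beta`, a name chosen here) and class priors estimated by class frequency, in which case the prior cancels the 1/n_k factor of the density estimate and the class score reduces to a plain sum of window values.

```python
import math

def parzen_classify(x, train, inv_beta):
    """train is a list of (point, label) pairs; returns (predicted label, scores)."""
    scores = {}
    for point, label in train:
        sq_dist = sum((a - b) ** 2 for a, b in zip(x, point))
        # Gaussian window value; the shared normalizing constant is omitted
        # because it does not change which class has the largest score.
        scores[label] = scores.get(label, 0.0) + math.exp(-inv_beta * sq_dist)
    return max(scores, key=scores.get), scores

train = [((0.0, 0.0), "A"), ((0.0, 1.0), "A"), ((5.0, 5.0), "B")]
label, scores = parzen_classify((0.2, 0.2), train, inv_beta=1.0)
print(label)  # A
```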
20 Heat Diffusion Classifiers - Connections with other models
- The Parzen window approach (when the window function takes the normal form) is a special case of the NHDC.
- In our model, let K = n - 1; then the graph constructed in Step 1 will be a complete graph. The matrix H will be [equation not preserved]
Using the heat equation f(t) = H f(0)
Heat that x_p receives from the data points in class k: [equation not preserved]
21 Heat Diffusion Classifiers - Connections with other models
- KNN is a special case of the NHDC.
- KNN
  - For each test data point, assign it to the class that has the maximal number among its K nearest neighbors.
22 Heat Diffusion Classifiers - Connections with other models
- KNN is a special case of the NHDC.
- In our model, let β tend to infinity; then the matrix H becomes [equation not preserved]
Using the heat equation f(t) = H f(0)
The number of cases in class q among its K nearest neighbors.
Heat that x_p receives from the data points in class k: [equation not preserved]
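The KNN limit can be checked numerically. The sketch below assumes that each of the K nearest neighbors contributes exp(-d^2 / β) units of heat to the test point (an assumption about the form of H), so that with 1/β = 0 every neighbor contributes exactly one unit and the heat per class is the KNN vote count. The function name and the toy distances are hypothetical.

```python
import math

def heat_from_classes(neighbors, inv_beta):
    """neighbors: (distance, label) pairs for the K nearest neighbors of one test point."""
    heat = {}
    for d, label in neighbors:
        heat[label] = heat.get(label, 0.0) + math.exp(-inv_beta * d * d)
    return heat

neighbors = [(0.5, "A"), (1.2, "B"), (2.0, "B")]
finite_beta = heat_from_classes(neighbors, inv_beta=1.0)    # distances matter
infinite_beta = heat_from_classes(neighbors, inv_beta=0.0)  # pure vote count
print(infinite_beta)  # {'A': 1.0, 'B': 2.0}
```

In this toy case the finite-β classifier prefers class A because its single neighbor is much closer, while the β-to-infinity limit prefers class B by majority vote, which is exactly the KNN decision.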
23 Heat Diffusion Classifiers - Connections with other models
- PHDC can approximate NHDC.
- If γ is small, then $e^{\gamma H} \approx I + \gamma H$.
- Since the identity matrix has no effect on the heat distribution, PHDC and NHDC have similar classification accuracy when γ is small.
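The first-order approximation can be checked numerically. This sketch assumes the PHDC kernel is the matrix exponential of γH and uses an arbitrary 3x3 matrix H made up for the test; `mat_exp` is a plain truncated Taylor series, not a library routine.

```python
def mat_mul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def mat_exp(A, terms=30):
    """Truncated Taylor series I + A + A^2/2! + ..."""
    n = len(A)
    result = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms):
        term = [[x / k for x in row] for row in mat_mul(term, A)]
        result = [[r + t for r, t in zip(r_row, t_row)] for r_row, t_row in zip(result, term)]
    return result

def max_gap(H, gamma):
    """Largest entrywise difference between exp(gamma * H) and I + gamma * H."""
    n = len(H)
    exact = mat_exp([[gamma * h for h in row] for row in H])
    return max(abs(exact[i][j] - (float(i == j) + gamma * H[i][j]))
               for i in range(n) for j in range(n))

# Hypothetical heat matrix whose columns sum to zero.
H = [[-1.0, 0.5, 0.0], [1.0, -1.0, 1.0], [0.0, 0.5, -1.0]]
small_gap = max_gap(H, 0.01)
large_gap = max_gap(H, 0.5)
```

The gap shrinks roughly with the square of γ, so for small γ the propagated kernel differs from I + γH only in terms that barely affect which class delivers the most heat.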
24 Heat Diffusion Classifiers - Connections with other models
[Diagram: PHDC approximates NHDC when γ is small; NHDC becomes KNN when β is infinity; NHDC becomes PWA (the Parzen window approach) when K = n - 1.]
25 Heat Diffusion Ranking - motivation
- The Web pages are considered to be drawn from an unknown manifold.
- The link structure forms a directed graph, which is considered as an approximation to the unknown manifold.
- The heat kernel established on the Web graph is considered as the representation of the relationship between Web pages.
- When there are more paths from page j to page i, i will receive more heat from j.
- When the path length from j to i is shorter, i will receive more heat from j.
26 Heat Diffusion Ranking - algorithm
- Let V be the set of the Web pages. If there is a link from j to i, we say there is an edge (j, i).
- The graph is a static graph.
  - Compute the matrix H
  - Compute [equation not preserved] or [equation not preserved]
  - The element in row i and column j is the amount of heat that i can receive from j from time 0 to 1, and is used to measure the similarity from j to i.
- If the graph is a random graph, which is generated by the first stage of the Predictive Random Graph Ranking, then
  - Compute the matrix R
  - Compute [equation not preserved] or [equation not preserved]
The algorithm is called DiffusionRank.
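A heat-diffusion ranking on a static link graph could be sketched as below. Several choices are assumptions of this sketch and not taken from the slides: each page spreads its heat evenly over its out-links (so H[i][j] = 1/outdegree(j) for a link j -> i, and H[j][j] = -1 when j has out-links), the kernel is approximated by the discrete product (I + (γ/N) H)^N, and pages are scored by the heat they hold after diffusing a uniform initial distribution.

```python
def mat_mul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def diffusion_matrix(links, n):
    """links: (j, i) pairs meaning page j links to page i."""
    out_degree = [0] * n
    for j, _ in links:
        out_degree[j] += 1
    H = [[0.0] * n for _ in range(n)]
    for j, i in links:
        H[i][j] += 1.0 / out_degree[j]   # heat flowing from j to i
    for j in range(n):
        if out_degree[j] > 0:
            H[j][j] -= 1.0               # heat leaving j
    return H

def discrete_kernel(H, gamma, steps):
    """(I + (gamma / steps) * H) ** steps, a discrete approximation of exp(gamma * H)."""
    n = len(H)
    step = [[float(i == j) + gamma / steps * H[i][j] for j in range(n)] for i in range(n)]
    kernel = [[float(i == j) for j in range(n)] for i in range(n)]
    for _ in range(steps):
        kernel = mat_mul(kernel, step)
    return kernel

links = [(0, 1), (0, 2), (1, 2), (2, 0)]
n = 3
kernel = discrete_kernel(diffusion_matrix(links, n), gamma=1.0, steps=20)
# Heat held by each page after diffusing a uniform initial distribution.
scores = [sum(kernel[i][j] / n for j in range(n)) for i in range(n)]
ranking = sorted(range(n), key=lambda i: -scores[i])
```

Entry `kernel[i][j]` plays the role described on the slide: the heat that page i receives from page j between time 0 and time 1. Because the kernel is not symmetric, the similarity from j to i generally differs from the similarity from i to j.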
27 Heat Diffusion Ranking - advantages
- Its solution has two forms, both of which are closed form.
- Its solution is not symmetric, which better models the relative nature of similarity.
- It can be naturally employed to detect group-group relations.
- It can be used for anti-manipulation.
28 Predictive Random Graph Ranking - motivation
- To improve the accuracy of DiffusionRank, we need to model the Web graph accurately: as a random graph.
  - The web is dynamic
  - The observer is partial
  - Links are different
- The random graph model can also improve other ranking algorithms, and hence is called the predictive random graph ranking framework.
29 Predictive Random Graph Ranking - framework
- Random Graph Generation Stage
  - Engages the temporal, spatial and local link information to construct a random graph.
- Random Graph Ranking Stage
  - Takes the random graph output and then calculates the ranking result based on a candidate ranking algorithm.
30 Predictive Random Graph Ranking - first stage
- The web is dynamic
  - Predict the early Web structure as a random graph: Temporal Web Prediction Model
- The observer is partial
  - Different Web graphs G_i(V_i, E_i) are obtained by N different observers (or crawlers).
  - A random graph RG(V, P) is constructed by [equation not preserved], where n(i, j) is the number of the graphs in which the link (i, j) appears.
- Links are different
  - As an example, a random graph RG(V, P) can be constructed by [equation not preserved], where j is the k(i, j)-th out-link from i.
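For the partial-observer case, one natural reading is that the probability of a link is the fraction of observers that saw it, P(i, j) = n(i, j) / N. That formula is an assumption of this sketch, as are the function name and the toy crawls.

```python
from collections import Counter

def link_probabilities(observed_edge_sets):
    """Estimate P(i, j) as the fraction of observers whose graph contains link (i, j)."""
    n_observers = len(observed_edge_sets)
    # n(i, j): number of observed graphs in which the link (i, j) appears.
    counts = Counter(edge for edges in observed_edge_sets for edge in set(edges))
    return {edge: count / n_observers for edge, count in counts.items()}

# Three hypothetical crawls of the same small site.
crawls = [
    {(0, 1), (1, 2)},
    {(0, 1)},
    {(0, 1), (2, 0)},
]
P = link_probabilities(crawls)
```

Links seen by every crawler get probability 1, while links seen by only one of three crawlers get probability 1/3; the ranking stage can then run on these weighted links instead of a single observed graph.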
31 Predictive Random Graph Ranking - Temporal Web Prediction Model
- From the viewpoint of a crawler, the web is dynamic, and there are many dangling nodes (pages that either have no out-link or have no known out-link).
- Classify dangling nodes
  - Dangling nodes of class 1 (DNC1): those that have been found but have not been visited.
  - Dangling nodes of class 2 (DNC2): those that have been tried but not visited successfully.
  - Dangling nodes of class 3 (DNC3): those that have been visited successfully but from which no out-link is found.
32 Predictive Random Graph Ranking - Temporal Web Prediction Model
- Suppose that all the nodes V can be partitioned into three subsets C, D1, and D2.
  - C denotes the set of all non-dangling nodes (that have been crawled successfully and have at least one out-link).
  - D1 denotes the set of all dangling nodes of class 3.
  - D2 denotes the set of all dangling nodes of class 1.
- For each node v in V, the real in-degree of v is not known.
33 Predictive Random Graph Ranking - Temporal Web Prediction Model
- We predict the real in-degree of v by the number of found links from C to v.
  - Assumption: the number of found links from C to v is proportional to the real number of links from V to v.
- The difference between the real in-degree and the predicted in-degree is distributed uniformly to the nodes in D2.
34 Predictive Random Graph Ranking - Temporal Web Prediction Model
[Matrix diagram with three labeled blocks:]
- From D2 to V: models the missing information from unvisited nodes to nodes in V.
- From C to V: models the known link information as in Page (1998).
- From D1 to V: models the user's behavior as in Kamvar (2003) when facing dangling nodes of class 3.
n: the number of nodes in V; m: the number of nodes in C; m1: the number of nodes in D1.
35 Predictive Random Graph Ranking - second stage
- On a random graph RG(V, P)
  - DiffusionRank
36 Predictive Random Graph Ranking - second stage
- On a random graph RG(V, P)
  - PageRank
  - Common Neighbor
  - Jaccard's Coefficient
  - SimRank
37 Experiments - Heat Diffusion Classifiers
- 2 artificial data sets and 6 data sets from UCI
  - Spiral-100
  - Spiral-1000
- Compare with the Parzen window approach (the window function takes the normal form) and KNN.
- The result is the average of the ten-fold cross validation.
38 Experiments - Heat Diffusion Classifiers
- Experimental Setup
  - Experimental Environments
    - Hardware: Nix Dual Intel Xeon 2.2GHz
    - OS: Linux Kernel 2.4.18-27smp (RedHat 7.3)
    - Developing tool: C
  - Data Description
    - In Credit-g, the 13 discrete variables are ignored since we only consider the continuous variables.
Dataset      Cases  Classes  Variables
Spiral-100     100        2          3
Spiral-1000   1000        2          3
Credit-g      1000        2          7
Diabetes       768        2          8
Glass          214        6          9
Iris           150        3          4
Sonar          208        2         60
Vehicle        846        4         18
39 Experiments - Heat Diffusion Classifiers
Algorithm    NHDC  NHDC  PHDC  PHDC  PHDC  KNN  PWA
Parameter    K     1/β   K     1/β   γ     K    1/β
Spiral-100   8     150   8     150   0.01  7    100
Spiral-1000  5     100   5     150   0.10  7    250
Credit-g     13    0     11    0     0.02  31   50
Diabetes     33    50    34    150   0.05  34   300
Glass        40    1750  38    1500  0.27  3    7500
Iris         15    0     13    50    0.47  7    350
Sonar        24    1650  24    1200  0.41  3    1150
Vehicle      8     350   10    600   0.11  10   650
40 Experiments - Heat Diffusion Classifiers
Algorithm NHDC PHDC KNN PWA
Spiral-100 84 84 67 83
Spiral-1000 99.6 99.8 99.3 99.7
Credit-g 76.1 76.06 75.59 72.35
Diabetes 76.3 76.22 75.78 74.96
Glass 72.99 73.12 70.64 71.56
Iris 97.36 97.79 97.36 97.07
Sonar 88.75 89.07 82.86 88.28
Vehicle 72.90 72.93 71.41 72.45
41 Experiments - Predictive Random Graph Ranking
- Data
  - Synthetic Web Graph
    - Follows a power law
  - Real Web Graph
    - Within cuhk.edu.hk
Synthetic Web graph:
t     1     2     3     4     5     6     7     8     9     10    11
V(t)  1000  1100  1200  1300  1400  1500  1600  1700  1800  1900  2000
T(t)  1764  1778  1837  1920  1927  1936  1952  1954  1964  1994  2000
Real Web graph:
t     1      2       3       4       5       6       7       8       9       10      11
V(t)  7712   78662   109383  160019  252522  301707  373579  411724  444974  471684  502610
T(t)  18542  120970  157196  234701  355720  404728  476961  515534  549162  576139  607170
42 Experiments - Predictive Random Graph Ranking
- Methodology
  - For each algorithm A, we have two versions denoted by A and PreA.
    - A: the original version
    - PreA: the version with the Temporal Web Prediction Model
  - For each data series and for each algorithm A, we obtain 22 ranking results:
    - A_1, A_2, ..., A_11
    - PreA_1, PreA_2, ..., PreA_11
  - Compare the early results with the final result A_11.
    - Value Difference
    - Order Difference
43 Experiments - Predictive Random Graph Ranking
- Set Up
  - For PageRank and PrePageRank,
    - α = 0.85,
    - g is the uniform distribution
  - For DiffusionRank and PreDiffusionRank
    - Use the discrete diffusion kernel
    - σ = 1, N = 20
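As a point of reference for the PageRank settings, the sketch below runs a standard power iteration with damping factor 0.85 and a uniform jump distribution g. It operates on a plain link list rather than a random graph, and sending the rank of dangling pages to g is a common convention assumed here, not something stated on the slide.

```python
def pagerank(links, n, alpha=0.85, iterations=100):
    """links: (j, i) pairs meaning page j links to page i."""
    out = [[] for _ in range(n)]
    for j, i in links:
        out[j].append(i)
    g = [1.0 / n] * n            # uniform jump distribution
    rank = g[:]
    for _ in range(iterations):
        new = [(1 - alpha) * g_i for g_i in g]
        for j in range(n):
            if out[j]:
                # Split page j's damped rank evenly over its out-links.
                share = alpha * rank[j] / len(out[j])
                for i in out[j]:
                    new[i] += share
            else:
                # Dangling page: redistribute its rank according to g.
                for i in range(n):
                    new[i] += alpha * rank[j] * g[i]
        rank = new
    return rank

links = [(0, 1), (0, 2), (1, 2), (2, 0)]
rank = pagerank(links, n=3)
```

Page 2 collects links from both other pages and ends up with the highest score, while page 1, which is reached only through half of page 0's rank, ends up lowest.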
44 Experiments - PageRank - synthetic data
45 Experiments - PageRank - real data
46 Experiments - DiffusionRank - synthetic data
47 Experiments - DiffusionRank - real data
48 Conclusions
- Both NHDC and PHDC outperform KNN and the Parzen window approach in accuracy on these 8 datasets.
- PHDC outperforms NHDC in accuracy on these 8 datasets.
- DiffusionRank is another candidate ranking algorithm.
- The Temporal Web Prediction Model is effective in PageRank and DiffusionRank.
- The Predictive Random Graph Ranking framework extends the scope of some original ranking techniques.
49 Future Work
- Approximate the manifold more accurately.
- Apply the non-symmetric heat kernel to SVM.
- Further investigate partial observers and weighted links.
50 Q & A