Title: Measuring and Extracting Proximity in Networks
1 Measuring and Extracting Proximity in Networks
- By Yehuda Koren, Stephen C. North and Chris
Volinsky
- Rahul Sehgal
2 Introduction
- Network
- Information hidden in a network
- What is Proximity?
- Why do we need proximity in a network?
- Methods for measuring and extracting proximities
in a network
- CFEC (cycle-free effective conductance) and how
to compute the cycle-free escape probability
- Extracting proximity through proximity graphs
obtained by CFEC
- Working with large networks
- Experiments
- Conclusion
- Questions
3 Network
- Collection of nodes
- Collection of edges (links) between the nodes
- These nodes and edges help in deciding the
proximity in a given network.
4 Hidden information in a network
[Figure: example network with nodes A, B, C, D, E, F]
5 Hidden information in a network
- If two people speak on the phone to many common
friends, the probability is high that they will
talk to each other in the future, or perhaps that
they already communicate through some other
medium such as email.
[Figure: example network with nodes A, B, C, D, E, F]
6 Hidden information in a network
- If two nodes are connected to a common node, they
may have a strong relationship or no relationship
at all.
- For example, in a telephone network many people
call the same service center, yet two of those
callers may not know each other at all.
[Figure: example network with nodes A, B, C, D, E, F]
7 What is proximity?
- Proximity is a measure of distance or closeness
between different objects.
- It helps in finding hidden relationships
between the objects.
- It captures similarities and helps in
clustering objects or nodes.
8 Need for proximity
- It measures potential information exchange
between two non-linked objects through
intermediaries.
- It can measure the extent to which two nodes
belong together.
- It helps in estimating the likelihood that a link
will exist in the future.
- In social network settings, proximity helps to
predict or track the propagation of an idea,
product or disease.
- It helps in discovering unexpected communities in
a network.
9 Various Methods
- Graph-theoretic distance
- It is the length of the shortest path connecting
two nodes, measured either by the number of hops
or by the sum of the edge weights along the path.
- Limitations
- Proximity decays as nodes become farther apart.
- Information may be lost due to friction or noise
at a particular node.
- This method does not account for proximity that
arises via multiple paths.
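To make the last limitation concrete, here is a minimal sketch of graph-theoretic distance using Dijkstra's algorithm. The function name `shortest_path_length` and the two toy graphs are hypothetical illustrations, not taken from the slides. Both graphs give the same distance between s and t, although the second one connects them through three parallel paths.

```python
import heapq

def shortest_path_length(graph, s, t):
    """Return the length of the shortest s-t path, or None if t is unreachable.

    graph maps each node to a dict {neighbor: edge_length}; lengths must be
    non-negative.
    """
    dist = {s: 0.0}
    heap = [(0.0, s)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == t:
            return d
        if d > dist.get(u, float("inf")):
            continue  # stale queue entry, a shorter route was already found
        for v, length in graph[u].items():
            nd = d + length
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return None

# Hypothetical toy graphs: one s-t path versus three parallel s-t paths.
single = {"s": {"a": 1}, "a": {"s": 1, "t": 1}, "t": {"a": 1}}
triple = {
    "s": {"a": 1, "b": 1, "c": 1},
    "a": {"s": 1, "t": 1},
    "b": {"s": 1, "t": 1},
    "c": {"s": 1, "t": 1},
    "t": {"a": 1, "b": 1, "c": 1},
}
print(shortest_path_length(single, "s", "t"))
print(shortest_path_length(triple, "s", "t"))
```

The distance is identical in both cases, which is exactly why a shortest-path measure cannot reward additional alternative paths.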
10 Various Methods contd
- Network Flow (maximal network flow)
- A limited capacity is assigned to each edge,
depending on the weight of that edge; we then
compute the maximal number of units that can be
simultaneously delivered from node s to node t.
- It prefers high-weight edges and captures the
premise that an increasing number of alternative
paths increases the proximity.
- For example, consider the adjacent figure.
[Figure: example graphs with nodes A, B, a1 and b1]
11 Various Methods contd
- Network Flow (maximal network flow)
- Limitations
- It disregards the length of the path.
- The maximal s-t flow (the flow from node s to
node t) in a graph equals the minimal s-t cut,
that is, the minimal total edge capacity we need
to remove to disconnect s from t. The measure is
therefore determined by a bottleneck and is not
robust.
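The two limitations can be checked on small examples. The sketch below uses the Edmonds-Karp algorithm, which is one standard way to compute a maximal flow; the slides do not say which algorithm is meant, and the function name `max_flow` and the example graphs are assumptions made for illustration.

```python
from collections import deque

def max_flow(capacity, s, t):
    """Edmonds-Karp maximal flow; capacity maps (u, v) -> capacity of edge u->v."""
    residual = {}
    adj = {}
    for (u, v), c in capacity.items():
        residual[(u, v)] = residual.get((u, v), 0) + c
        residual.setdefault((v, u), 0)  # reverse edge for undoing flow
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    flow = 0
    while True:
        # Breadth-first search for a shortest augmenting path.
        parent = {s: None}
        queue = deque([s])
        while queue and t not in parent:
            u = queue.popleft()
            for v in adj.get(u, ()):
                if v not in parent and residual[(u, v)] > 0:
                    parent[v] = u
                    queue.append(v)
        if t not in parent:
            return flow
        # Find the bottleneck along the path, then push that much flow.
        bottleneck = float("inf")
        v = t
        while parent[v] is not None:
            bottleneck = min(bottleneck, residual[(parent[v], v)])
            v = parent[v]
        v = t
        while parent[v] is not None:
            u = parent[v]
            residual[(u, v)] -= bottleneck
            residual[(v, u)] += bottleneck
            v = u
        flow += bottleneck

# Hypothetical examples: a direct edge, a three-hop path, and two parallel paths.
short_path = {("s", "t"): 1}
long_path = {("s", "a"): 1, ("a", "b"): 1, ("b", "t"): 1}
two_paths = {("s", "a"): 1, ("a", "t"): 1, ("s", "b"): 1, ("b", "t"): 1}
print(max_flow(short_path, "s", "t"), max_flow(long_path, "s", "t"))
print(max_flow(two_paths, "s", "t"))
```

The direct edge and the three-hop path carry the same flow, so path length has no effect, while the second path in `two_paths` raises the flow as expected.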
12 Various Methods contd
- Effective Conductance (EC)
- The network is modeled as an electric circuit by
treating the edges as resistors whose conductance
is the given edge weight.
- We fix the voltage of the starting node (s) at 1
and of the end node (t) at 0, and then solve the
linear equations for the voltage and current
on each edge.
- It accounts for both path length and the number
of alternative paths.
- It avoids dependence on a single shortest path
and on bottlenecks.
- It has a monotonicity property: in an electrical
resistor network, increasing the conductance of
any resistor or adding resistors can only
increase the conductance between any two nodes.
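The voltage computation described above can be sketched in a few lines of plain Python. This is an illustrative sketch under stated assumptions: the graph is undirected and connected, the function name `effective_conductance` is hypothetical, and a small Gauss-Jordan elimination stands in for whatever linear solver one would use in practice.

```python
def effective_conductance(edges, s, t):
    """Effective conductance between s and t; edges maps (u, v) -> conductance."""
    # Build a symmetric adjacency map: w[u][v] is the conductance of edge u-v.
    w = {}
    for (u, v), c in edges.items():
        w.setdefault(u, {})[v] = c
        w.setdefault(v, {})[u] = c
    inner = [node for node in w if node not in (s, t)]
    index = {node: k for k, node in enumerate(inner)}
    n = len(inner)
    # Kirchhoff's current law at each inner node i:
    #   sum_j w_ij * (V_i - V_j) = 0, with V_s = 1 and V_t = 0.
    A = [[0.0] * (n + 1) for _ in range(n)]  # augmented matrix [A | b]
    for i in inner:
        r = index[i]
        for j, c in w[i].items():
            A[r][r] += c
            if j == s:
                A[r][n] += c  # V_s = 1 moves to the right-hand side
            elif j != t:
                A[r][index[j]] -= c
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(n):
            if r != col:
                f = A[r][col] / A[col][col]
                for k in range(col, n + 1):
                    A[r][k] -= f * A[col][k]
    volt = {i: A[index[i]][n] / A[index[i]][index[i]] for i in inner}
    volt[s], volt[t] = 1.0, 0.0
    # Current leaving s equals the effective conductance for a unit voltage.
    return sum(c * (1.0 - volt[j]) for j, c in w[s].items())

# Hypothetical unit-weight examples.
series = {("s", "a"): 1.0, ("a", "t"): 1.0}
parallel = {("s", "a"): 1.0, ("a", "t"): 1.0, ("s", "b"): 1.0, ("b", "t"): 1.0}
print(effective_conductance(series, "s", "t"))
print(effective_conductance(parallel, "s", "t"))
```

Two unit resistors in series give a lower conductance than a single one would, and adding a second parallel path raises it, which illustrates both the path-length sensitivity and the monotonicity property.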
13 Various Methods contd
- Limitations of Effective Conductance
- Monotonicity has its limitations; consider the
following example.
[Figure: two networks containing the nodes s, a1 and t that have the same EC]
14 Various Methods contd
- Sink-augmented effective conductance
- Each node in the network is connected to a
universal sink, which is held at voltage zero.
- This universal sink competes with the node t.
- The universal sink taxes each node by absorbing a
portion of its outgoing current. Consequently
every node has degree greater than 1, so the
problem with degree-one nodes no longer arises.
15 Various Methods contd
- Limitations of sink-augmented effective
conductance
- It requires specifying how much current the sink
absorbs at each node. Understanding how such a
parameter influences the proximity is a
difficult task.
- It destroys monotonicity. Whenever a new node is
added to the network, it has a direct link to the
sink but not to t. This strengthens the sink in
its competition with node t, and thus the
proximity between s and t decreases.
- So we look for a solution that overcomes the
limitations of the above methods.
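16 Random Walk
- Definitions
- A random walk is a sequence of transitions from
one state to another in which each transition
depends only on the current state, not on the
previous ones. A transition may also lead back
to the same state.
[Figure: example network with nodes s, a1, a2, a3, a4 and t]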
17 Random Walk
- In network proximity, a random walk starts at
node s and tries to reach the end node t; there
is no bound on the number of steps it may take.
While traversing the network, the walk might
return to the starting node s.
[Figure: random walk between s and t]
18 Random Walk
- Explanation
- During a random walk in a network, we might go
back to s without ever going to t.
- As the diagram shows, we might return to s via
the path s-a1-a2-a3-s without going to t.
- This is a cyclic path that leads us back to the
starting point s.
19 Random Walk
[Figure: example network with nodes s, a1, a2, a3, a4, a5 and t]
20 What is our goal?
- We have to improve the effective conductance
measure and avoid cyclic paths.
21 CFEC (Cycle-Free Effective Conductance)
- It builds on the random walk interpretation.
- DEFINITION: The cycle-free escape probability
from s to t is the probability that a random walk
originating at s will reach t without visiting
any node more than once.
- In a random walk, the probability of a transition
from node i to node j is $p_{ij} = w_{ij} / \deg_i$
- $p_{ij}$: probability of transition from node i
to node j
- $w_{ij}$: weight of the edge from i to j
- $\deg_i$: degree of node i
22 CFEC (Cycle-Free Effective Conductance)
- The probability that a random walk starting at
$v_1$ will follow the path $P = v_1 - v_2 - \dots - v_r$
is given by
$\mathrm{Prob}(P) = \prod_{i=1}^{r-1} \frac{w_{v_i v_{i+1}}}{\deg_{v_i}}$
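As a small illustration of the path probability, the sketch below multiplies the transition probabilities (edge weight divided by degree) along a given path. It assumes that the degree of a node is the sum of the weights of its incident edges, so that the transition probabilities out of each node sum to 1. The names `weighted_degrees` and `path_probability` and the toy graph are hypothetical.

```python
def weighted_degrees(weights):
    """Degree of each node, taken here as the sum of its incident edge weights."""
    return {u: sum(nbrs.values()) for u, nbrs in weights.items()}

def path_probability(weights, path):
    """Probability that a random walk starting at path[0] follows `path`.

    weights maps each node to a dict {neighbor: edge_weight}.
    """
    deg = weighted_degrees(weights)
    prob = 1.0
    for u, v in zip(path, path[1:]):
        prob *= weights[u][v] / deg[u]  # transition probability from u to v
    return prob

# Hypothetical toy graph: s - a - t, with an extra degree-1 node b attached to a.
graph = {
    "s": {"a": 1.0},
    "a": {"s": 1.0, "t": 1.0, "b": 1.0},
    "b": {"a": 1.0},
    "t": {"a": 1.0},
}
print(path_probability(graph, ["s", "a", "t"]))
```

From s the walk must step to a, and from a it picks t with probability one third, so the extra neighbor b lowers the probability of the path s-a-t.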
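23 Features of CFEC
- We have the following equality, where
$\mathrm{Prob}(P)$ is the probability that a random
walk follows path $P$:
$P_{\text{cf.esc}}(s \to t) = \sum_{P \in R} \mathrm{Prob}(P)$
- Multiplying by the degree:
$\mathrm{CFEC}(s,t) = \deg_s \cdot P_{\text{cf.esc}}(s \to t) = \deg_s \sum_{P \in R} \mathrm{Prob}(P)$
- R: set of simple paths from s to t; simple paths
are those that never visit the same node twice
- CFEC discourages long paths, as the probability
of following a path decays exponentially with its
length: each step from node i to node j multiplies
the path probability by $w_{ij} / \deg_i \le 1$
- It supports a proximity measure over multiple
paths.
- Degree-1 nodes dilute the significance of a path
from s to t. So when we cut the main graph into a
subgraph, we preserve the original degrees of
the nodes.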
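The features listed above can be checked by brute force on small graphs. The following sketch enumerates every simple path from s to t with a depth-first search and sums the path probabilities; it is only an illustration for tiny graphs (the enumeration is exponential in general), and the names `cycle_free_escape_probability` and `cfec` are hypothetical. Degrees are assumed to be sums of incident edge weights.

```python
def cycle_free_escape_probability(weights, s, t):
    """Sum of the probabilities of all simple paths from s to t.

    weights maps each node to a dict {neighbor: edge_weight}.
    """
    deg = {u: sum(nbrs.values()) for u, nbrs in weights.items()}

    def explore(u, visited, prob):
        # prob is the probability of the simple path walked so far.
        if u == t:
            return prob
        total = 0.0
        for v, w in weights[u].items():
            if v not in visited:
                total += explore(v, visited | {v}, prob * w / deg[u])
        return total

    return explore(s, {s}, 1.0)

def cfec(weights, s, t):
    """Cycle-free effective conductance: degree of s times the escape probability."""
    deg_s = sum(weights[s].values())
    return deg_s * cycle_free_escape_probability(weights, s, t)

# Hypothetical toy graphs: a plain path s - a - t, and the same path with a
# degree-1 node b hanging off a.
plain = {"s": {"a": 1.0}, "a": {"s": 1.0, "t": 1.0}, "t": {"a": 1.0}}
with_leaf = {
    "s": {"a": 1.0},
    "a": {"s": 1.0, "t": 1.0, "b": 1.0},
    "b": {"a": 1.0},
    "t": {"a": 1.0},
}
print(cfec(plain, "s", "t"), cfec(with_leaf, "s", "t"))
```

Attaching the degree-1 node lowers the value from one half to one third, which is the dilution effect described in the last bullet.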
24 Explanation of CFEC features
- CFEC discourages long paths
[Figure: nodes s, c1, c2, ..., ck and t; label: proximity decreases]
25 Computing Cycle-Free Escape Probability
- Restrict the sum over the simple paths in R to
the K most probable simple paths.
- Edge weights are transformed into edge lengths,
establishing a 1-1 correspondence between path
probability and path length.
- The path probability is then given as a function
of the path length.
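The slides do not show the exact transformation, so the sketch below assumes one simple choice: the length of the step from i to j is the negative logarithm of its transition probability (edge weight divided by the degree of i). Under that assumption lengths add up exactly when probabilities multiply, so the shortest simple paths are the most probable ones. The best-first search used here is a plain illustration, not necessarily the procedure used by the authors, and the names `k_most_probable_paths` and `cfec_top_k` are hypothetical.

```python
import heapq
import math

def k_most_probable_paths(weights, s, t, k):
    """Return up to k simple s-t paths with their probabilities, most probable first.

    weights maps each node to a dict {neighbor: positive edge_weight}.
    """
    deg = {u: sum(nbrs.values()) for u, nbrs in weights.items()}
    heap = [(0.0, [s])]  # (path length so far, partial simple path)
    found = []
    while heap and len(found) < k:
        length, path = heapq.heappop(heap)
        u = path[-1]
        if u == t:
            # Lengths are non-negative, so complete paths pop in order of length.
            found.append((path, math.exp(-length)))
            continue
        for v, w in weights[u].items():
            if v not in path:
                # Assumed step length: -log(w_uv / deg_u) >= 0.
                heapq.heappush(heap, (length - math.log(w / deg[u]), path + [v]))
    return found

def cfec_top_k(weights, s, t, k):
    """Approximate CFEC by summing only the k most probable simple paths."""
    deg_s = sum(weights[s].values())
    return deg_s * sum(p for _, p in k_most_probable_paths(weights, s, t, k))

# Hypothetical toy graph: a two-hop path s-a-t and a three-hop path s-b-c-t.
graph = {
    "s": {"a": 1.0, "b": 1.0},
    "a": {"s": 1.0, "t": 1.0},
    "b": {"s": 1.0, "c": 1.0},
    "c": {"b": 1.0, "t": 1.0},
    "t": {"a": 1.0, "c": 1.0},
}
for path, prob in k_most_probable_paths(graph, "s", "t", 2):
    print(path, prob)
```

Increasing k can only add non-negative terms, so the truncated sum approaches the full value from below.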
26 Extracting Proximity Through Proximity Graphs
- Extract a subgraph $H$ that maximizes the ratio
between the proximity captured by $H$ and the
size of $H$, tuned by a constant $\alpha$
- $H$: subgraph
- $\alpha$: constant
- Find the subset of R that is required for the
above problem.
- The set of simple paths R is sorted in ascending
order of path weight.
27 Extracting Proximity Through Proximity Graphs
- Optimize the given ratio using a branch-and-bound
path-merging algorithm
28 Branch and Bound path merging algorithm
29 Working with large networks
- Growing candidate graphs via s, t neighborhoods
- To produce the candidate graph, we find a
subgraph containing the highest-weight paths
originating at either s or t.
- With edge weights transformed into edge lengths,
the problem becomes: find a subgraph containing
the shortest paths originating at either s
or t. Our objective is to expand the
neighborhoods of s and t. This is done with
Dijkstra's algorithm for computing shortest paths
on graphs with non-negative edge lengths.
30 Working with large networks contd.
- Determining neighborhood size
- We consider only paths that are no longer than
$L - \log(\varepsilon)$; longer paths are not
useful. Here L is the length of the shortest s-t
path.
- In practice, $(L - \log(\varepsilon))/2$
neighborhoods might be too large.
31 Working with large networks contd.
- Pruning the neighborhood
- We prune the neighborhood in such a way that
$\mathrm{dist}(s,i) + \mathrm{dist}(t,i) > \beta$,
where s is our starting node and t is the
terminating node.
- Then we exclude from the neighborhood any node i
for which
$\mathrm{dist}(s,i) + \mathrm{dist}(i,t) > L - \log(\varepsilon)$.
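The idea behind growing and pruning neighborhoods can be sketched as follows: run Dijkstra's algorithm from s and from t, stop expanding beyond a length bound, and keep only the nodes whose combined distance to both endpoints stays within that bound, since no simple s-t path through any other node can be short enough. This sketch simplifies the scheme on the slides by growing both neighborhoods to the full bound, and it assumes an undirected graph. The parameter `bound` stands for the threshold on path length, and the names `bounded_dijkstra` and `candidate_nodes` are hypothetical.

```python
import heapq

def bounded_dijkstra(lengths, source, radius):
    """Shortest distances from source to every node within `radius`.

    lengths maps each node to a dict {neighbor: non-negative edge_length}.
    """
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale queue entry
        for v, length in lengths[u].items():
            nd = d + length
            if nd <= radius and nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist

def candidate_nodes(lengths, s, t, bound):
    """Nodes i with dist(s, i) + dist(i, t) <= bound (undirected graph assumed)."""
    from_s = bounded_dijkstra(lengths, s, bound)
    from_t = bounded_dijkstra(lengths, t, bound)
    return {i for i in from_s if i in from_t and from_s[i] + from_t[i] <= bound}

# Hypothetical toy graph: a short route s-a-t, a longer route s-b-c-t,
# and a far-away node d attached to c.
lengths = {
    "s": {"a": 1.0, "b": 1.0},
    "a": {"s": 1.0, "t": 1.0},
    "b": {"s": 1.0, "c": 1.0},
    "c": {"b": 1.0, "t": 1.0, "d": 5.0},
    "d": {"c": 5.0},
    "t": {"a": 1.0, "c": 1.0},
}
print(sorted(candidate_nodes(lengths, "s", "t", 2.0)))
print(sorted(candidate_nodes(lengths, "s", "t", 3.0)))
```

With a bound equal to the shortest s-t distance only the shortest route survives; loosening the bound by one unit admits the longer route, while the distant node d stays excluded.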
32 Experiments
- Online movie database IMDB.
- Co-authorship graph.
- Telecommunication graph.
33 Experiments
34 Telecommunication Graph
- Extract the candidate graph by growing
neighborhoods from the two nodes of interest.
- 2000 random pairs of telephone numbers were used
to calculate the CFEC value.
- For 1808 (about 90%) of these pairs, paths
between them were found. The others have no known
path between them, possibly because these numbers
were not in frequent use.
- We use the values 1, 5, 10 and 50 for the
constant $\alpha$ of the extraction ratio.
35 Telecommunication Graph (contd)
36 Conclusion
- CFEC proximity allows us to readily compute
proximity graphs, which are small portions of the
network that aim to capture the related
proximity value.
- An analyst studying proximity in a graph has to
focus on the most relevant part of the graph.
- Proximity graphs are an extension of connection
graphs and are capable of presenting compact
relationships between objects of a network.
- Their advantages include the ability to deduce
relationships between more than two endpoints,
the flexibility to handle edge direction, and the
fact that they are obtained by solving an
intuitively tunable optimization problem.