Measuring and Extracting Proximity in Networks - PowerPoint PPT Presentation

About This Presentation

Slides: 40

Provided by: stude76

Learn more at: https://www.cs.kent.edu

Transcript and Presenter's Notes

1
Measuring and Extracting Proximity in Networks

By Yehuda Koren, Stephen C. North and Chris Volinsky
- Rahul Sehgal

2
Introduction

Network
Information hidden in a network
What is Proximity?
Why do we need proximity in a network?
Methods for measuring and extracting proximities
in a network.
CFEC (cycle-free effective conductance) and how
to compute the cycle-free escape probability
Extracting proximity through proximity graphs
obtained by CFEC
Working with large networks
Experiments
Conclusion
Questions

3
Network

A collection of nodes
A collection of edges (links) between the nodes
These edges determine the proximity between
nodes in the network.

4
Hidden information in a network
[Figure: example network with nodes A, B, C, D, E and F]
5
Hidden information in a network
[Figure: example network with nodes A, B, C, D, E and F]

If two people speak on the phone to many common
friends, the probability is high that they will
talk to each other in the future, or that they
already communicate through some other medium,
such as email.
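The common-friends intuition above can be made concrete by counting shared neighbors. This is a minimal sketch of that simple heuristic, not the method proposed in these slides; the call graph and the function name `common_neighbors` are hypothetical.

```python
# Hypothetical call graph: each person maps to the set of people they call.
def common_neighbors(adj: dict[str, set[str]], u: str, v: str) -> int:
    """Number of nodes adjacent to both u and v."""
    return len(adj[u] & adj[v])

calls = {
    "A": {"C", "D", "E"},
    "B": {"C", "D", "E"},
    "C": {"A", "B"},
    "D": {"A", "B"},
    "E": {"A", "B", "F"},
    "F": {"E"},
}
print(common_neighbors(calls, "A", "B"))  # 3 shared friends
print(common_neighbors(calls, "A", "F"))  # 1 shared friend
```

A and B share three friends, so this heuristic ranks them as much closer than A and F.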
6
Hidden information in a network
[Figure: example network with nodes A, B, C, D, E and F]

If two nodes are connected to a common node, they
may have a strong relationship, or they may have
no relationship at all.
For example, in a telephone network many people
call the same service center, yet two people who
both call it may not know each other at all.
7
What is proximity?

Proximity is a measure of closeness (the inverse
of distance) between objects.
It can reveal hidden relationships between
objects.
It captures similarity and helps in clustering
objects or nodes.

8
Need for proximity

It measures the potential information exchange
between two non-linked objects through
intermediaries.
It can measure the extent to which two nodes
belong together.
It helps in estimating the likelihood that a link
will exist in the future.
In social network settings, proximity helps to
predict or track the propagation of an idea,
product or disease.
It helps in discovering unexpected communities in
a network.

9
Various Methods

Graph-theoretic distance
It is the length of the shortest path connecting
two nodes, measured either by the number of hops
or by the sum of the edge weights.
It captures the intuition that proximity decays
as nodes become farther apart, since information
may be lost to friction or noise at each
intermediate node.
Limitations
It ignores the fact that proximity can be
strengthened by multiple paths between two nodes.
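A minimal sketch of hop-count distance via breadth-first search makes the multiple-path limitation visible: one path or three parallel paths between s and t give the same distance. The helper names `hop_distance` and `undirected` and the toy graphs are hypothetical.

```python
from collections import deque

def hop_distance(adj, s, t):
    """Breadth-first search: number of hops on a shortest s-t path, or None."""
    dist = {s: 0}
    queue = deque([s])
    while queue:
        u = queue.popleft()
        if u == t:
            return dist[u]
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return None  # t is unreachable from s

def undirected(edges):
    """Adjacency lists for an undirected, unweighted edge list."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, []).append(v)
        adj.setdefault(v, []).append(u)
    return adj

one_path = undirected([("s", "a1"), ("a1", "t")])
three_paths = undirected([("s", f"a{i}") for i in range(1, 4)]
                         + [(f"a{i}", "t") for i in range(1, 4)])
print(hop_distance(one_path, "s", "t"), hop_distance(three_paths, "s", "t"))  # 2 2
```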

10
Various Methods contd

Network Flow (maximal network flow)
Each edge is assigned a limited capacity that
depends on its weight; proximity is the maximal
number of units that can be delivered
simultaneously from node s to node t.
It prefers high-weight edges and captures the
premise that an increasing number of alternative
paths increases the proximity.
For example, consider the adjacent figure.

[Figure: two example graphs over nodes A, B, a1 and b1]
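The maximal s-t flow can be computed with the Edmonds-Karp algorithm (augmenting along breadth-first shortest paths), sketched below. This standard algorithm is swapped in for illustration; the slides do not say which max-flow algorithm is used, and `max_flow`, `undirected` and the toy graphs are hypothetical. Capacities equal edge weights, and each undirected edge can carry flow in either direction.

```python
from collections import deque

def max_flow(capacity, s, t):
    """Edmonds-Karp: augment along shortest (BFS) paths in the residual graph."""
    residual = {}
    for (u, v), c in capacity.items():
        residual.setdefault(u, {}).setdefault(v, 0)
        residual.setdefault(v, {}).setdefault(u, 0)
        residual[u][v] += c
    flow = 0
    while True:
        # BFS for an augmenting path with positive residual capacity
        parent = {s: None}
        queue = deque([s])
        while queue and t not in parent:
            u = queue.popleft()
            for v, c in residual[u].items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if t not in parent:
            return flow
        # Bottleneck capacity along the path that was found
        bottleneck, v = float("inf"), t
        while parent[v] is not None:
            bottleneck = min(bottleneck, residual[parent[v]][v])
            v = parent[v]
        # Push flow: shrink forward residuals, grow reverse residuals
        v = t
        while parent[v] is not None:
            u = parent[v]
            residual[u][v] -= bottleneck
            residual[v][u] += bottleneck
            v = u
        flow += bottleneck

def undirected(edges):
    """Capacity map in which each undirected edge (u, v, w) carries w both ways."""
    cap = {}
    for u, v, w in edges:
        cap[(u, v)] = cap.get((u, v), 0) + w
        cap[(v, u)] = cap.get((v, u), 0) + w
    return cap

one_path = undirected([("s", "a1", 1), ("a1", "t", 1)])
two_paths = undirected([("s", "a1", 1), ("a1", "t", 1), ("s", "b1", 1), ("b1", "t", 1)])
direct = undirected([("s", "t", 1)])
chain = undirected([("s", "c1", 1), ("c1", "c2", 1), ("c2", "c3", 1), ("c3", "t", 1)])
print(max_flow(one_path, "s", "t"), max_flow(two_paths, "s", "t"))  # 1 2
print(max_flow(direct, "s", "t"), max_flow(chain, "s", "t"))        # 1 1
```

Adding a parallel path doubles the flow, while a four-edge chain scores the same as a direct edge, which is the insensitivity to path length listed among the limitations on the next slide.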
11
Various Methods contd

Network Flow (maximal network flow)
Limitations
It disregards the length of the paths.
By the max-flow min-cut theorem, the maximal s-t
flow (the flow from node s to node t) equals the
minimal s-t cut, that is, the minimal total edge
capacity we need to remove to disconnect s from t.
Proximity is therefore dictated by bottleneck
edges, which makes the measure not robust.

12
Various Methods contd

Effective Conductance (EC)
Models the network as an electric circuit by
treating the edges as resistors whose conductance
is the given edge weight.
We fix the voltage of the starting node s at 1
and of the end node t at 0, and then solve the
linear (Kirchhoff) equations for the voltage at
each node and the current on each edge. The
effective conductance is the total current that
flows from s to t.
It accounts for both path length and the number
of alternative paths.
It avoids dependence on a single shortest path
and on bottlenecks.
It has a monotonicity property: in an electrical
resistor network, increasing the conductance of
any resistor or adding resistors never decreases
the effective conductance between any two nodes.
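A minimal sketch of the EC computation just described, solving Kirchhoff's equations through the graph Laplacian with NumPy. The function name, integer node labels and toy graphs are hypothetical, and the sketch assumes every node is connected to s or t (otherwise the reduced linear system is singular).

```python
import numpy as np

def effective_conductance(n, edges, s, t):
    """Hold V[s] = 1 and V[t] = 0, solve Kirchhoff's current law at the other
    nodes, and return the total current leaving s (= current delivered to t)."""
    W = np.zeros((n, n))
    for u, v, w in edges:            # undirected edges, conductance = weight
        W[u, v] += w
        W[v, u] += w
    L = np.diag(W.sum(axis=1)) - W   # graph Laplacian
    V = np.zeros(n)
    V[s] = 1.0
    free = [i for i in range(n) if i not in (s, t)]
    if free:
        # L_ff V_f = -L_fs * V_s   (V_t = 0 contributes nothing)
        V[free] = np.linalg.solve(L[np.ix_(free, free)], -L[free, s])
    return float(L[s] @ V)

# Nodes: 0 = s, 1 = a1, 2 = t, 3 = an extra node
one_path = [(0, 1, 1.0), (1, 2, 1.0)]
print(effective_conductance(3, one_path, 0, 2))                                # 0.5
print(effective_conductance(4, one_path + [(0, 3, 1.0), (3, 2, 1.0)], 0, 2))  # 1.0
print(effective_conductance(4, one_path + [(0, 3, 1.0)], 0, 2))               # 0.5
```

The second call adds a parallel path and doubles EC, as monotonicity predicts. The third attaches a degree-one node to s and leaves EC unchanged, since no current can flow through a dead end.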

13
Various Methods contd

Limitations of Effective Conductance
Monotonicity has its limitations; consider the
following example, in which two different graphs
have the same EC.

[Figure: two graphs over nodes s, a1 and t that have the same EC]
14
Various Methods contd

Sink augmented effective conductance
Each node in the network is connected to a
universal sink, which is held at voltage zero.
This universal sink competes with node t.
The universal sink taxes each node by absorbing a
portion of its outgoing current. Consequently,
every node has degree greater than 1 (each one is
also linked to the sink), so the restriction
caused by degree-one nodes no longer exists.

15
Various Methods contd

Limitations of sink augmented effective
conductance
It requires choosing how much current the sink
absorbs at each node, and understanding how such
a parameter influences the proximity is difficult.
It destroys monotonicity: whenever a new node is
added to the network, it has a direct link to the
sink but not to t. This strengthens the sink,
which competes with node t, so the proximity
between s and t decreases.
So, we look for a solution that overcomes the
limitations of the above methods.
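A minimal sketch of sink-augmented EC, assuming (for illustration only; the slides do not give the rule) that each node's link to the grounded sink has conductance `alpha` times its weighted degree. The name `alpha`, the function name and the toy graphs are hypothetical; proximity is read as the current that reaches t.

```python
import numpy as np

def sink_augmented_current(n, edges, s, t, alpha):
    """V[s] = 1, V[t] = 0, universal sink at 0; node i leaks to the sink through
    an assumed conductance alpha * deg(i). Returns the current arriving at t."""
    W = np.zeros((n, n))
    for u, v, w in edges:
        W[u, v] += w
        W[v, u] += w
    deg = W.sum(axis=1)
    # The sink sits at voltage 0, so it only adds alpha*deg(i) to node i's diagonal
    L = np.diag(deg + alpha * deg) - W
    V = np.zeros(n)
    V[s] = 1.0
    free = [i for i in range(n) if i not in (s, t)]
    if free:
        V[free] = np.linalg.solve(L[np.ix_(free, free)], -L[free, s])
    return float(W[t] @ V)  # sum over neighbours j of w(j, t) * (V[j] - V[t])

path = [(0, 1, 1.0), (1, 2, 1.0)]     # s = 0, a1 = 1, t = 2
with_new_node = path + [(1, 3, 1.0)]  # a new node 3 attached to a1
for alpha in (0.0, 1.0):
    print(alpha,
          round(sink_augmented_current(3, path, 0, 2, alpha), 4),
          round(sink_augmented_current(4, with_new_node, 0, 2, alpha), 4))
```

Without the sink (`alpha = 0`) the new node changes nothing (0.5 in both columns). With the sink, the current reaching t drops from 0.25 to about 0.18 once the new node is added, which is the loss of monotonicity described above.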

16
Random Walk

Definitions
A random walk moves from one state to another,
where each transition depends only on the current
state, not on the previous states.
A transition may also lead back to the same state.

[Figure: nodes s, a1, a2, a3, a4 and t]
17
Random Walk

In network proximity, a random walk starting at
node s keeps making attempts, possibly infinitely
many steps, until it reaches the end node t.
While traversing the network, the walk may return
to the starting node s.

[Figure: nodes s and t]
18
Random Walk

Explanation
During a random walk in a network, we may return
to s without reaching t.
As the diagram shows, we may return to s via the
path s-a1-a2-a3-s without going to t.
This cyclic path leads us back to the starting
point s.

19
Random Walk

Diagram

[Figure: random walk over nodes s, a1, a2, a3, a4, a5 and t]
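The random-walk view can be checked by simulation. The sketch below estimates the escape probability, the chance that a walk from s reaches t before returning to s, with each step choosing a neighbor with probability proportional to the edge weight. For ordinary effective conductance a standard identity gives EC(s,t) = deg(s) times this escape probability. Note that the escaping walks counted here may still contain cycles (for example around a1-a2-a3) before reaching t. Function names, the seed and the toy graphs are hypothetical.

```python
import random

def escape_probability(adj, s, t, trials=50_000, seed=0):
    """Monte Carlo estimate of Pr[walk from s hits t before returning to s].
    adj maps node -> {neighbour: weight}; each step picks a neighbour with
    probability weight / weighted degree."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        u = s
        while True:
            nbrs = list(adj[u])
            u = rng.choices(nbrs, weights=[adj[u][v] for v in nbrs])[0]
            if u == t:
                hits += 1
                break
            if u == s:
                break
    return hits / trials

def undirected(edges):
    """Unit-weight adjacency map for an undirected edge list."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, {})[v] = 1.0
        adj.setdefault(v, {})[u] = 1.0
    return adj

path = undirected([("s", "a1"), ("a1", "t")])
cycle = undirected([("s", "a1"), ("a1", "a2"), ("a2", "a3"), ("a3", "s"), ("a1", "t")])
print(escape_probability(path, "s", "t"))   # close to 1/2, so EC = 1 * 1/2
print(escape_probability(cycle, "s", "t"))  # close to 2/7, so EC = 2 * 2/7 = 4/7
```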
20
What is our goal?

We want to improve the effective conductance
measure so that it avoids cyclic paths.

21
CFEC (Cycle-Free Effective Conductance)

It builds on the random-walk interpretation.
Definition: The cycle-free escape probability
from s to t is the probability that a random walk
originating at s will reach t without visiting
any node more than once.
In a random walk, the probability of a transition
from node i to node j is
$P(i \to j) = \frac{w_{ij}}{\deg(i)}$,
where $w_{ij}$ is the weight of the edge from i to
j and $\deg(i) = \sum_k w_{ik}$ is the (weighted)
degree of node i.

22
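CFEC (Cycle-Free Effective Conductance)

The probability that a random walk starting at v1
will follow the path P = v1-v2-...-vr is the
product of the transition probabilities along P:
$\Pr(P) = \prod_{i=1}^{r-1} \frac{w_{v_i v_{i+1}}}{\deg(v_i)}$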
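The path-probability formula translates directly into code. A minimal sketch, with a hypothetical weighted graph (each node maps to `{neighbour: weight}`) and a hypothetical function name:

```python
from math import prod

def path_probability(adj, path):
    """Pr(P) = product over consecutive (v_i, v_{i+1}) of w(v_i, v_{i+1}) / deg(v_i),
    where deg(v) is the weighted degree of v."""
    return prod(adj[u][v] / sum(adj[u].values()) for u, v in zip(path, path[1:]))

adj = {  # cycle s-a1-a2-a3-s plus the edge a1-t, unit weights
    "s": {"a1": 1.0, "a3": 1.0},
    "a1": {"s": 1.0, "a2": 1.0, "t": 1.0},
    "a2": {"a1": 1.0, "a3": 1.0},
    "a3": {"a2": 1.0, "s": 1.0},
    "t": {"a1": 1.0},
}
print(path_probability(adj, ["s", "a1", "t"]))              # 1/2 * 1/3
print(path_probability(adj, ["s", "a3", "a2", "a1", "t"]))  # 1/2 * 1/2 * 1/2 * 1/3
```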

23
Features of CFEC

We have the following equalities, where R is the
set of simple paths from s to t (simple paths are
those that never visit the same node twice):
$\Pr_{\text{cf-esc}}(s \to t) = \sum_{P \in R} \Pr(P)$
Multiplying by the degree of s:
$\mathrm{CFEC}(s,t) = \deg(s) \sum_{P \in R} \Pr(P)$
CFEC discourages long paths, as the probability of
following a path decays exponentially with its
length: each step multiplies it by a factor
$w_{v_i v_{i+1}} / \deg(v_i) \le 1$.
It supports proximity through multiple paths.
Degree-1 nodes dilute the significance of paths
from s to t. So, when we cut a subgraph out of the
main graph, we preserve the original degrees of
the nodes.
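A minimal sketch of CFEC on a small graph: enumerate every simple s-t path by depth-first search and apply CFEC(s,t) = deg(s) times the sum of Pr(P). Exhaustive enumeration is exponential in general, so this is only an illustration; the graph and the function name are hypothetical.

```python
def cfec(adj, s, t):
    """deg(s) times the sum of Pr(P) over all simple s-t paths P.
    The walk stops at t, and no node is visited twice."""
    deg = {u: sum(nbrs.values()) for u, nbrs in adj.items()}
    total = 0.0

    def dfs(u, prob, visited):
        nonlocal total
        if u == t:
            total += prob
            return
        for v, w in adj[u].items():
            if v not in visited:
                dfs(v, prob * w / deg[u], visited | {v})

    dfs(s, 1.0, {s})
    return deg[s] * total

adj = {  # cycle s-a1-a2-a3-s plus the edge a1-t, unit weights
    "s": {"a1": 1.0, "a3": 1.0},
    "a1": {"s": 1.0, "a2": 1.0, "t": 1.0},
    "a2": {"a1": 1.0, "a3": 1.0},
    "a3": {"a2": 1.0, "s": 1.0},
    "t": {"a1": 1.0},
}
print(cfec(adj, "s", "t"))  # 2 * (1/6 + 1/24) = 5/12
```

For this graph the ordinary EC is 4/7 (about 0.571), while CFEC is 5/12 (about 0.417): walks that loop around the cycle before escaping no longer count.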

24
Explanation of CFEC features

CFEC discourages long paths

[Figure: s and t joined by a chain of nodes c1, c2, ..., ck; proximity decreases as the chain grows]
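As a worked instance of this decay, assume unit edge weights and that the chain is the only connection between s and t, so deg(s) = 1 and each c_i has degree 2:

```latex
\mathrm{CFEC}(s,t) = \deg(s)\,\Pr(s \to c_1 \to \cdots \to c_k \to t)
  = 1 \cdot \frac{1}{1} \cdot \underbrace{\frac{1}{2} \cdots \frac{1}{2}}_{k\ \text{factors}}
  = 2^{-k}
```

Under these assumptions, each extra node in the chain halves the proximity.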
25
Computing Cycle Free Escape Probability

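Restrict the sum
$\mathrm{CFEC}(s,t) = \deg(s) \sum_{P \in R} \Pr(P)$
to the K most probable simple paths.
Edge weights are transformed into edge lengths,
$\ell(i,j) = -\log \frac{w_{ij}}{\deg(i)} = \log \deg(i) - \log w_{ij}$,
establishing a one-to-one correspondence between
path probability and path length.
Path probability is then given as
$\Pr(P) = e^{-\ell(P)}$, where $\ell(P)$ is the sum
of the edge lengths along P, so the K most
probable paths are the K shortest simple paths.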
26
Extracting Proximity Through Proximity Graphs

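Extract a subgraph H of G, with at most b nodes,
that maximizes the ratio
$\frac{\mathrm{CFEC}_H(s,t)}{\mathrm{CFEC}_G(s,t)}$,
where H is the subgraph and b is a constant (the
node budget).
Find the subset of R required for the above
problem.
The set of simple paths R is sorted in ascending
order of path length (most probable first).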
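The slides optimize this objective with branch-and-bound path merging. As a much simpler stand-in (not the authors' algorithm), the sketch below merges paths greedily in sorted order while the node count stays within the budget b. The summed probability of merged paths is only a lower bound on the subgraph's cycle-free escape probability, because merged paths can combine into further simple paths. The function name and inputs are hypothetical.

```python
def greedy_path_merge(paths, budget):
    """paths: (probability, node list) pairs, most probable first.
    Merge each path whose nodes keep the union within `budget` nodes.
    Returns the merged node set and the summed probability of merged paths."""
    nodes, captured = set(), 0.0
    for prob, path in paths:
        merged = nodes | set(path)
        if len(merged) <= budget:
            nodes, captured = merged, captured + prob
    return nodes, captured

paths = [(1 / 6, ["s", "a1", "t"]), (1 / 24, ["s", "a3", "a2", "a1", "t"])]
print(greedy_path_merge(paths, 3))  # keeps only s, a1, t
print(greedy_path_merge(paths, 5))  # keeps both paths
```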

27
Extracting Proximity Through Proximity Graphs

The above objective is optimized using a
branch-and-bound path-merging algorithm.

28
Branch and Bound path merging algorithm
29
Working with large networks

Growing candidate graphs via s, t neighborhoods
Producing the candidate graph means finding a
subgraph containing the highest-weight paths
originating at either s or t.
With edge weights transformed into lengths, the
problem becomes: find a subgraph containing the
shortest paths originating at either s or t. Our
objective is to expand the neighborhoods of s and
t. This is done with Dijkstra's algorithm for
computing shortest paths on graphs with
non-negative edge lengths.
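A minimal sketch of neighborhood growing with Dijkstra's algorithm: nodes are settled in order of distance from the source, and growth stops once the next node lies beyond a chosen radius. The `radius` parameter, the function name and the toy lengths are hypothetical; in the CFEC setting the lengths would be the transformed edge lengths.

```python
import heapq

def dijkstra(lengths, source, radius=float("inf")):
    """Distances from source over non-negative edge lengths, settling only
    nodes whose distance is at most `radius` (neighborhood growing)."""
    dist = {source: 0.0}
    settled = set()
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if u in settled:
            continue  # stale queue entry
        if d > radius:
            break     # every remaining node is farther than the radius
        settled.add(u)
        for v, length in lengths[u].items():
            nd = d + length
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return {u: dist[u] for u in settled}

lengths = {  # undirected chain s -1- a -2- b -1- c
    "s": {"a": 1},
    "a": {"s": 1, "b": 2},
    "b": {"a": 2, "c": 1},
    "c": {"b": 1},
}
print(dijkstra(lengths, "s", radius=3))  # s, a and b are within distance 3
```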

30
Working with large networks contd.

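Determining neighborhood size
We consider only paths whose length is at most
$L - \log \epsilon$, where L is the length of the
shortest s-t path and $0 < \epsilon < 1$ is a small
tolerance. Longer paths are not useful: since a
path of length $\ell$ has probability $e^{-\ell}$,
their probability is less than $\epsilon$ times
that of the shortest path.
In practice, neighborhoods of radius
$(L - \log \epsilon)/2$ around s and t might be
too large.

31
Working with large networks contd.

Pruning the neighborhood
We prune the neighborhood with a condition that
compares dist(s,i) and dist(t,i) against a
threshold β, where s is our starting node and t
is the terminating node.
Then, we exclude from the neighborhood any node i
for which $\mathrm{dist}(s,i) + \mathrm{dist}(i,t) > L - \log \epsilon$.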
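A minimal sketch of the second pruning rule: keep node i only if an s-t path through i can be short enough, dist(s,i) + dist(i,t) <= L - log(eps). Here `eps` is an assumed name for the threshold symbol, the distance maps are assumed to come from Dijkstra runs from s and toward t, and nodes missing from a map are treated as infinitely far. The function name and data are hypothetical.

```python
import math

def prune_candidates(dist_s, dist_t, L, eps):
    """Nodes i with dist_s[i] + dist_t[i] <= L - log(eps), for 0 < eps < 1."""
    bound = L - math.log(eps)
    nodes = dist_s.keys() | dist_t.keys()
    return {i for i in nodes
            if dist_s.get(i, math.inf) + dist_t.get(i, math.inf) <= bound}

dist_s = {"s": 0.0, "a": 1.0, "t": 2.0, "x": 0.5}
dist_t = {"t": 0.0, "a": 1.0, "s": 2.0, "x": 3.0}
print(sorted(prune_candidates(dist_s, dist_t, L=2.0, eps=0.5)))  # ['a', 's', 't']
```

With L = 2 and eps = 0.5 the bound is about 2.69, so x, which lies only on detours of length 3.5, is dropped.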

32
Experiments

Online movie database IMDB.
Co-authorship graph.
Telecommunication graph.

33
Experiments
34
Telecommunication Graph

Extract the candidate graph by growing
neighborhoods from the two nodes of interest.
2000 random pairs of telephone numbers were used
to calculate the CFEC value.
For 1808 (90%) of these pairs, paths between them
were found. The others had no known path between
them, possibly because these numbers were not in
frequent use.
The parameter takes the values 1, 5, 10 and 50.

35
Telecommunication Graph (contd)
36
Conclusion

CFEC proximity allows us to readily compute
proximity graphs, which are small portions of the
network that aim to capture the proximity value
between the nodes of interest.
An analyst studying proximity in a graph can thus
focus on the most relevant part of the graph.
Proximity graphs extend connection subgraphs and
can present compact relationships between objects
of a network.
Further advantages are the ability to deduce
relationships among more than two endpoints, the
flexibility to handle edge direction, and the fact
that they are obtained by solving an intuitively
tunable optimization problem.