Network centrality

Slides: 88

Provided by: LAD101


1
Lecture 5: Network centrality
Slides are modified from Lada Adamic
2
Measures and Metrics

Knowing the structure of a network, we can
calculate various useful quantities or measures
that capture particular features of the network
topology.
Most such measures originate in social
network analysis
So far,
Degree distribution, Average path length, Density
Centrality
Degree, Eigenvector, Katz, PageRank, Hubs,
Closeness, Betweenness, ...
Several other graph metrics
Clustering coefficient, Assortativity,
Modularity, ...

3
Characterizing networks: Who is most central?
4
network centrality

Which nodes are most central?
Definition of central varies by context/purpose
Local measure
degree
Relative to rest of network
closeness, betweenness, eigenvector (Bonacich
power centrality), Katz, PageRank, ...
How evenly is centrality distributed among nodes?
Centralization, hubs and authorities, ...

5
centrality: who's important based on their
network position
In each of the following networks, X has higher
centrality than Y according to a particular
measure
indegree
outdegree
betweenness
closeness
6
Outline

Degree centrality
Centralization
Betweenness centrality
Closeness centrality
Eigenvector centrality
Bonacich power centrality
Katz centrality
PageRank
Hubs and Authorities

7
degree centrality (undirected)
He who has many friends is most important.

When is the number of connections the best
centrality measure?
people who will do favors for you
people you can talk to (influence set,
information access, ...)
influence of an article in terms of citations
(using in-degree)

8
degree: normalized degree centrality
divide by the max. possible, i.e. (N-1)
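The normalization above can be sketched in a few lines of Python. The toy graph and the function name `normalized_degree` are hypothetical, chosen only for illustration.

```python
# Toy undirected graph as an adjacency list (hypothetical example data).
graph = {
    "A": {"B", "C", "D"},
    "B": {"A"},
    "C": {"A", "D"},
    "D": {"A", "C"},
}

def normalized_degree(adj):
    """Degree of each node divided by the maximum possible degree, N - 1."""
    n = len(adj)
    return {node: len(neighbors) / (n - 1) for node, neighbors in adj.items()}

centrality = normalized_degree(graph)
```

A node connected to every other node gets a score of 1, regardless of network size.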
9
Prestige in directed social networks

when prestige may be the right word
admiration
influence
gift-giving
trust
directionality especially important in instances
where ties may not be reciprocated (e.g. dining
partners choice network)
when prestige may not be the right word
gives advice to (can reverse direction)
gives orders to (can reverse direction)
lends money to (can reverse direction)
dislikes
distrusts

10
Extensions of undirected degree centrality -
prestige

degree centrality
indegree centrality
a paper that is cited by many others has high
prestige
a person nominated by many others for a reward
has high prestige

11
centralization: how equal are the nodes?
How much variation is there in the centrality
scores among the nodes?
Freeman's general formula for centralization
(can use other metrics, e.g. Gini coefficient
or standard deviation):
$C_D = \frac{\sum_{i=1}^{N} [C_D(n^*) - C_D(i)]}{(N-1)(N-2)}$
where $C_D(n^*)$ is the maximum value in the network
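A minimal sketch of degree centralization, assuming Freeman's formula in its usual form for raw degree: the sum of differences between the maximum degree and each node's degree, divided by (N-1)(N-2). The two toy graphs and the function name are hypothetical.

```python
def degree_centralization(adj):
    """Freeman degree centralization for an undirected graph (assumed form)."""
    n = len(adj)
    degrees = [len(neighbors) for neighbors in adj.values()]
    max_degree = max(degrees)
    # Sum of gaps to the most central node, scaled by the largest possible sum,
    # which is reached by a star graph.
    return sum(max_degree - d for d in degrees) / ((n - 1) * (n - 2))

# A star: one hub connected to four leaves.
star = {
    "hub": {"a", "b", "c", "d"},
    "a": {"hub"}, "b": {"hub"}, "c": {"hub"}, "d": {"hub"},
}
# A path of five nodes.
path = {
    "A": {"B"}, "B": {"A", "C"}, "C": {"B", "D"}, "D": {"C", "E"}, "E": {"D"},
}

star_score = degree_centralization(star)
path_score = degree_centralization(path)
```

Under this definition the star is maximally centralized, while the five-node path scores 2/12.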
12
degree centralization examples
C_D = 0.167
C_D = 1.0
C_D = 0.167
13
degree centralization examples
example: financial trading networks
high centralization: one node trading with many
others
low centralization: trades are more evenly
distributed
14
when degree isn't everything
In what ways does degree fail to capture
centrality in the following graphs?

ability to broker between groups
likelihood that information originating anywhere
in the network reaches you

15
Outline

Degree centrality
Centralization
Betweenness centrality
Closeness centrality
Eigenvector centrality
Bonacich power centrality
Katz centrality
PageRank
Hubs and Authorities

16
betweenness: another centrality measure

intuition: how many pairs of individuals would
have to go through you in order to reach one
another in the minimum number of hops?
who has higher betweenness, X or Y?

X
Y
17
betweenness on toy networks

non-normalized version

[Figure: path graph A - B - C - D - E]

A lies between no two other vertices
B lies between A and 3 other vertices C, D, and
E
C lies between 4 pairs of vertices
(A,D),(A,E),(B,D),(B,E)
note that there are no alternate paths for these
pairs to take, so C gets full credit
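The counts above can be checked with a small brute-force sketch. It assumes the toy network is the path A - B - C - D - E (which is what the stated counts imply) and uses the non-normalized definition: for each unordered pair, each intermediate vertex gets the fraction of shortest paths it lies on. The function names are hypothetical.

```python
from collections import deque
from itertools import combinations

def bfs_counts(adj, source):
    """Return (dist, sigma): hop distance and number of shortest paths from source."""
    dist = {source: 0}
    sigma = {source: 1}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                sigma[v] = 0
                queue.append(v)
            if dist[v] == dist[u] + 1:
                sigma[v] += sigma[u]
    return dist, sigma

def betweenness(adj):
    """Non-normalized betweenness for an undirected, unweighted graph."""
    info = {s: bfs_counts(adj, s) for s in adj}
    score = {v: 0.0 for v in adj}
    for j, k in combinations(adj, 2):
        dist_j, sigma_j = info[j]
        if k not in dist_j:
            continue  # j and k are not connected
        for i in adj:
            if i in (j, k) or i not in dist_j:
                continue
            dist_i, sigma_i = info[i]
            # i is on a shortest j-k path iff the distances add up exactly.
            if k in dist_i and dist_j[i] + dist_i[k] == dist_j[k]:
                score[i] += sigma_j[i] * sigma_i[k] / sigma_j[k]
    return score

path = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C", "E"], "E": ["D"]}
scores = betweenness(path)
```

This brute-force approach is only meant for toy networks; larger graphs usually call for a more efficient algorithm.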

18
betweenness centrality: definition
betweenness of vertex i = sum over pairs (j, k) of
(paths between j and k that pass through i) / (all paths between j and k):
$C_B(i) = \sum_{j<k} g_{jk}(i) / g_{jk}$
where $g_{jk}$ is the number of geodesics connecting
j and k, and $g_{jk}(i)$ is the number of those that actor i is on.
Usually normalized by the
number of pairs of vertices excluding the vertex
itself:
$C'_B(i) = C_B(i) / [(N-1)(N-2)/2]$
directed graph: (N-1)(N-2)
19
betweenness on toy networks

non-normalized version

20
betweenness on toy networks

non-normalized version

broker
21
example
Nodes are sized by degree, and colored by
betweenness.
Can you spot nodes with high betweenness but
relatively low degree?
What about high degree but relatively low
betweenness?
22
betweenness on toy networks

non-normalized version

why do C and D each have betweenness 1?
They are both on shortest paths for pairs (A,E)
and (B,E), and so must share credit:
½ + ½ = 1
Can you figure out why B has betweenness 3.5
while E has betweenness 0.5?

C
A
E
B
D
23
Alternative betweenness computations

Slight variations in geodesic path computations
inclusion of self in the computations
Flow betweenness
Based on the idea of maximum flow
edge-independent path selection affects the
results
May not include geodesic paths
Random-walk betweenness
Based on the idea of random walks
Usually yields ranking similar to geodesic
betweenness
Many other alternative definitions exist based on
diffusion, transmission or flow along network
edges

24
Extending betweenness centrality to directed
networks

We now consider the fraction of all directed
paths between any two vertices that pass through
a node

betweenness of vertex i = sum over ordered pairs (j, k) of
(paths between j and k that pass through i) / (all paths between j and k):
$C_B(i) = \sum_{j \neq k} g_{jk}(i) / g_{jk}$

The only modification: when normalizing, we have
(N-1)(N-2) instead of (N-1)(N-2)/2, because we
have twice as many ordered pairs as unordered
pairs

25
Directed geodesics

A node does not necessarily lie on a geodesic
from j to k if it lies on a geodesic from k to j

j
k
26
Outline

Degree centrality
Centralization
Betweenness centrality
Closeness centrality
Eigenvector centrality
Bonacich power centrality
Katz centrality
PageRank
Hubs and Authorities

27
closeness: another centrality measure

What if it's not so important to have many direct
friends?
Or be between others?
But one still wants to be in the middle of
things,
not too far from the center

28
closeness centrality: definition
Closeness is based on the length of the average
shortest path between a vertex and all other vertices
in the graph
Closeness centrality:
$C_C(i) = [\sum_{j=1}^{N} d(i,j)]^{-1}$
depends on inverse distance to other vertices
Normalized closeness centrality:
$C'_C(i) = C_C(i) \cdot (N-1)$
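A minimal sketch of closeness, assuming the common definition in which closeness is the inverse of the summed shortest-path distances to all other vertices, and the normalized version multiplies by N-1. It also assumes a connected, unweighted, undirected graph; the toy path and function names are hypothetical.

```python
from collections import deque

def hop_distances(adj, source):
    """Breadth-first search: hop distance from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def closeness(adj, node, normalized=True):
    """Inverse of the total distance to all other nodes (graph assumed connected)."""
    total = sum(hop_distances(adj, node).values())
    raw = 1.0 / total
    return raw * (len(adj) - 1) if normalized else raw

path = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C", "E"], "E": ["D"]}
scores = {node: closeness(path, node) for node in path}
```

On this five-node path the middle node is closest to everyone else and the endpoints are farthest.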
29
closeness centrality toy example
A
B
C
E
D
30
closeness centrality more toy examples
31
how closely do degree and betweenness correspond
to closeness?

degree
number of connections
denoted by size
closeness
length of shortest path to all others
denoted by color

32
Closeness centrality

Values tend to span a rather small dynamic range
typical distance increases logarithmically with
network size
In a typical network the closeness centrality C
might span a factor of five or less
It is difficult to distinguish between central
and less central vertices
a small change in network might considerably
affect the centrality order
Alternative computations exist but they have
their own problems

33
Influence range

The influence range of i is the set of vertices
that are reachable from node i

34
Extensions of undirected closeness centrality

closeness centrality usually implies
all paths should lead to you
paths should lead from you to everywhere else
usually consider only vertices from which the
node i in question can be reached

35
Outline

Degree centrality
Centralization
Betweenness centrality
Closeness centrality
Eigenvector centrality
Bonacich power centrality
Katz centrality
PageRank
Hubs and Authorities
Applications to Information Retrieval
LexRank

36
Eigenvalues and eigenvectors

Eigenvalues and eigenvectors have their origins
in physics, in particular in problems where
motion is involved, although their uses extend
from solutions to stress and strain problems to
differential equations and quantum mechanics.
Eigenvectors are vectors that point in directions
where there is no rotation. Eigenvalues are the
change in length of the eigenvector from the
original length.
The basic equation in eigenvalue problems is
$Ax = \lambda x$ (E.01)

Slides from Fred K. Duennebier
37
Eigenvalues and eigenvectors

In words, this deceptively simple equation says
that for the square matrix A, there is a vector x
such that the product Ax equals a SCALAR, λ,
multiplied by x.
The multiplication of vector x by a scalar
constant is the same as stretching or shrinking
the coordinates by a constant value.
The vector x is called an eigenvector and the
scalar λ is called an eigenvalue.

38
Do all matrices have real eigenvalues? No, they
must be square and the determinant of A - λI must
equal zero. This is easy to show:
$Ax = \lambda x$ (E.01)
$(A - \lambda I)x = 0$ (E.02)
This can only be true for nonzero x if
$\det(A - \lambda I) = |A - \lambda I| = 0$ (E.03)
Are eigenvectors unique? No, if x is an
eigenvector, then βx is also an eigenvector, and
λ is still its eigenvalue:
$A(\beta x) = \beta A x = \beta \lambda x = \lambda (\beta x)$ (E.04)
39
How do you calculate eigenvectors and
eigenvalues? Expand equation (E.03),
det(A - λI) = |A - λI| = 0, for a 2x2 matrix:
$\begin{vmatrix} a_{11}-\lambda & a_{12} \\ a_{21} & a_{22}-\lambda \end{vmatrix} = (a_{11}-\lambda)(a_{22}-\lambda) - a_{12}a_{21} = 0$ (E.05)
For a 2-dimensional problem such as this, the
equation above is a simple quadratic equation
with two solutions for λ. In fact, there is
generally one eigenvalue for each dimension, but
some may be zero, and some complex.
40
The solution to E.05 is
(E.06)
(E.07)
This characteristic equation does not involve
x, and the resulting values of λ can be used to
solve for x. Consider the following example:
Eqn. E.07 doesn't work here because
a11 a22 - a12 a21 = 0, so we use E.06
41
We see that one solution to this equation is λ = 0,
and dividing both sides of the above equation by
λ yields λ = 5. Thus we have our two eigenvalues,
and the eigenvectors for the first eigenvalue,
λ = 0, are:
These equations are multiples of x = -2y, so the
smallest whole number values that fit are x = 2,
y = -1
42
For the other eigenvalue, λ = 5:

This example is rather special: A^-1 does not
exist, the two rows of A - λI are dependent, and
thus one of the eigenvalues is zero. (Zero is a
legitimate eigenvalue!)
EXAMPLE: A more common case is A = [1.05 .05;
.05 1], used in the strain exercise. Find the
eigenvectors and eigenvalues for this A, and then
calculate [V,D] = eig(A).
The procedure is:
Compute the determinant of A - λI
Find the roots of the polynomial given by
|A - λI| = 0
Solve the system of equations (A - λI)x = 0
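The three-step procedure can be sketched for the 2x2 case in plain Python. This is an illustration under stated assumptions, not the exercise's own solution: it reads the example matrix as rows (1.05, .05) and (.05, 1), handles only real eigenvalues, and requires the off-diagonal entry a12 to be nonzero. The function name `eig_2x2` is hypothetical.

```python
import math

def eig_2x2(a11, a12, a21, a22):
    """Eigenpairs of a real 2x2 matrix with real eigenvalues and a12 != 0."""
    if a12 == 0:
        raise ValueError("this sketch assumes a12 != 0")
    # Step 1: det(A - lambda*I) = lambda^2 - trace*lambda + det.
    trace = a11 + a22
    det = a11 * a22 - a12 * a21
    # Step 2: roots of the characteristic polynomial (quadratic formula).
    disc = trace * trace - 4.0 * det
    if disc < 0:
        raise ValueError("complex eigenvalues are not handled in this sketch")
    root = math.sqrt(disc)
    pairs = []
    for lam in ((trace + root) / 2.0, (trace - root) / 2.0):
        # Step 3: solve (A - lambda*I) x = 0 using the first row,
        # (a11 - lam) * x + a12 * y = 0, which (x, y) = (a12, lam - a11) satisfies.
        x, y = a12, lam - a11
        length = math.hypot(x, y)
        pairs.append((lam, (x / length, y / length)))
    return pairs

pairs = eig_2x2(1.05, 0.05, 0.05, 1.0)
```

Each returned pair holds an eigenvalue and a unit-length eigenvector; any nonzero multiple of that vector is an equally valid eigenvector.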

43
What good are such things? Consider the matrix
What is A^100? We can get A^100 by multiplying
matrices many, many times.
Or we could find the eigenvalues of A and obtain
A^100 very quickly using eigenvalues.
44
For now, I'll just note that there are two
eigenvectors for A.
The eigenvectors are x1 = [.6 .4] and x2 = [1
-1], and the eigenvalues are λ1 = 1 and
λ2 = 0.5. Note that, if we multiply x1 by A, we
get x1. If we multiply x1 by A again, we STILL
get x1. Thus x1 doesn't change as we multiply
it by A^n.
45
What about x2? When we multiply A by x2, we get
x2/2, and if we multiply x2 by A^2, we get x2/4.
This number gets very small fast. Note that
when A is squared the eigenvectors stay the same,
but the eigenvalues are squared! Back to our
original problem: we note that for A^100, the
eigenvectors will be the same, and the eigenvalues
are λ1 = 1 and λ2 = (0.5)^100, which is effectively
zero. Each eigenvector is multiplied by its
eigenvalue whenever A is applied.
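The shortcut can be checked numerically. The matrix itself did not survive in this transcript, so the sketch assumes A = [[0.8, 0.3], [0.2, 0.7]], which is the matrix consistent with the stated eigenvectors [.6 .4] and [1 -1] and eigenvalues 1 and 0.5. It compares 100 repeated multiplications against V D^100 V^-1.

```python
def matmul(X, Y):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

# Assumed matrix, reconstructed from the eigenpairs stated on the slide.
A = [[0.8, 0.3], [0.2, 0.7]]

# Slow way: multiply A by itself 100 times.
brute = [[1.0, 0.0], [0.0, 1.0]]
for _ in range(100):
    brute = matmul(brute, A)

# Fast way: A^100 = V * D^100 * V^-1, with eigenvectors as the columns of V.
V = [[0.6, 1.0], [0.4, -1.0]]
V_inv = [[1.0, 1.0], [0.4, -0.6]]
D100 = [[1.0 ** 100, 0.0], [0.0, 0.5 ** 100]]
fast = matmul(matmul(V, D100), V_inv)
```

Because 0.5^100 is effectively zero, both columns of A^100 end up equal to the first eigenvector, [.6 .4].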
46
Outline

Degree centrality
Centralization
Betweenness centrality
Closeness centrality
Eigenvector centrality
Bonacich power centrality
Katz centrality
PageRank
Hubs and Authorities
Applications to Information Retrieval
LexRank

47
Eigenvector Centrality

48
Eigenvector Centrality

49
Eigenvector Centrality

Can be calculated for directed graphs as well
We need to decide between incoming or outgoing
edges
A has no incoming edges, hence a centrality of 0
B has only an incoming edge from A
hence its centrality is also 0
Only vertices that are in a strongly connected
component of two or more vertices or the
out-component of such a component have non-zero
centrality

50
Katz centrality

51
Katz Centrality b

The magnitude of b reflects the radius of power
Small values of b weight local structure
Larger values weight global structure
If b > 0, ego has higher centrality when tied to
people who are central
If b < 0, then ego has higher centrality when
tied to people who are not central
With b = 0, you get degree centrality
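The effect of the sign of b can be explored with a small sketch. The slide's formula did not survive extraction, so this assumes a Bonacich-style fixed point in which each node's score is the sum over its neighbors of (alpha + b * neighbor's score), with the scale alpha set to 1. The iteration only converges when |b| is below the reciprocal of the largest adjacency eigenvalue. The function name and toy path are hypothetical.

```python
def power_centrality(adj, beta, alpha=1.0, iterations=200):
    """Iterate c_i = sum over neighbors j of (alpha + beta * c_j) (assumed form)."""
    c = {v: 0.0 for v in adj}
    for _ in range(iterations):
        c = {v: sum(alpha + beta * c[u] for u in adj[v]) for v in adj}
    return c

# Five-node path A - B - C - D - E.
path = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C", "E"], "E": ["D"]}

zero = power_centrality(path, 0.0)        # reduces to plain degree
positive = power_centrality(path, 0.25)   # rewards central neighbors
negative = power_centrality(path, -0.25)  # rewards non-central neighbors
```

In this toy path, a negative b leaves the middle node C with a lower score than its neighbors B and D, because C is tied to comparatively central nodes while B and D are each tied to a peripheral endpoint.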

52
Katz Centrality examples
b = .25
b = -.25
Why does the middle node have lower centrality
than its neighbors when b is negative?
53
PageRank: bringing order to the web

It's in the links
links to URLs can be interpreted as endorsements
or recommendations
the more links a URL receives, the more likely it
is to be a good/entertaining/provocative/authoritative/interesting
information source
but not all link sources are created equal
a link from a respected information source
a link from a page created by a spammer

an important page, e.g. slashdot
Many webpages scattered across the web
if a web page is slashdotted, it gains attention
54
PageRank

55
Ranking pages by tracking a drunk

A random walker following edges in a network for
a very long time will spend a proportion of time
at each node, which can be used as a measure
of importance

56
Trapping a drunk

Problem with pure random walk metric
Drunk can be trapped and end up going in circles

57
Ingenuity of the PageRank algorithm

Allow drunk to teleport with some probability
e.g. random websurfer follows links for a while,
but with some probability teleports to a random
page
(goes to a bookmarked page or uses a search engine
to start anew)

58
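PageRank algorithm

$PR(p_i) = \frac{1-d}{N} + d \sum_{p_j \in M(p_i)} \frac{PR(p_j)}{L(p_j)}$
where p1, p2, ..., pN are the pages under
consideration,
M(pi) is the set of pages that link to pi,
L(pj) is the number of outbound links on page pj,
N is the total number of pages, and
d is the damping factor, the probability of following
a link; 1 - d is the random jumping probability
(d = 0.85 for Google)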
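A minimal power-iteration sketch of PageRank, assuming the widely used form in which each page receives (1 - d)/N plus d times the sum of PR(pj)/L(pj) over the pages pj in M(pi). Here d = 0.85 is treated as the probability of following a link, so 1 - d is the teleportation probability. Spreading the rank of pages with no outbound links evenly over all pages is an added assumption, and the toy web is hypothetical.

```python
def pagerank(links, d=0.85, iterations=100):
    """links maps each page to the list of pages it links to."""
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Teleportation share, identical for every page.
        new_rank = {p: (1.0 - d) / n for p in pages}
        for p, outlinks in links.items():
            if outlinks:
                # Split this page's rank evenly across its L(p) outbound links.
                share = d * rank[p] / len(outlinks)
                for q in outlinks:
                    new_rank[q] += share
            else:
                # Assumption: a page without outbound links jumps uniformly.
                for q in pages:
                    new_rank[q] += d * rank[p] / n
        rank = new_rank
    return rank

web = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = pagerank(web)
```

Page "c" collects links from three pages and ends up with the highest score, while "d", which nobody links to, keeps only the teleportation share.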

59
Exercise PageRank

What happens to the relative PageRank scores of
the nodes as you increase the teleportation
probability?
Can you construct a network such that a node with
low indegree has the highest PageRank?

GUESS PageRank demo
http://projects.si.umich.edu/netlearn/GUESS/pagerank.html
60
example: probable location of random walker after
1 step
20% teleportation probability
61
example location probability after 10 steps
62
Matrix-based centrality measures

                       with constant term   without constant term
Divide by out-degree   PageRank             Degree centrality
No division            Katz centrality      Eigenvector centrality
63
Outline

Degree centrality
Centralization
Betweenness centrality
Closeness centrality
Eigenvector centrality
Bonacich power centrality
Katz centrality
PageRank
Hubs and Authorities
Applications to Information Retrieval
LexRank

64
Hubs and Authorities

In directed networks, vertices that point to
important resources should also get a high
centrality
e.g. review articles, web indexes
recursive definition

authorities are nodes that are linked to by good
hubs
hubs are nodes that link to good authorities
65
Hyperlink-Induced Topic Search

HITS algorithm
start with a set of pages matching a query
expand the set by following forward and back
links
take transition matrix E, where the (i,j)th entry
E_ij = 1/n_i
if i links to j, and n_i is the number of links
from i
then one can compute the authority scores a and
hub scores h through an iterative approach
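The iterative approach can be sketched as follows. The update rules are not given in the slide text, so this assumes the usual alternating scheme a = E^T h followed by h = E a, with both vectors rescaled to unit length after each round. It uses the out-degree-normalized E described above. The toy link structure and function name are hypothetical.

```python
import math

def hits(links, iterations=50):
    """links maps each node to the list of nodes it points to."""
    nodes = list(links)
    # E_ij = 1/n_i for every link i -> j, where n_i is the out-degree of i.
    weight = {i: (1.0 / len(out) if out else 0.0) for i, out in links.items()}
    hub = {v: 1.0 for v in nodes}
    auth = {v: 1.0 for v in nodes}
    for _ in range(iterations):
        # Authority update, a = E^T h: collect hub score over incoming links.
        auth = {v: 0.0 for v in nodes}
        for i, out in links.items():
            for j in out:
                auth[j] += weight[i] * hub[i]
        # Hub update, h = E a: collect authority score over outgoing links.
        hub = {i: sum(weight[i] * auth[j] for j in links[i]) for i in nodes}
        # Rescale both vectors so the scores stay bounded.
        a_norm = math.sqrt(sum(x * x for x in auth.values())) or 1.0
        h_norm = math.sqrt(sum(x * x for x in hub.values())) or 1.0
        auth = {v: x / a_norm for v, x in auth.items()}
        hub = {v: x / h_norm for v, x in hub.items()}
    return auth, hub

links = {
    "index1": ["paperA", "paperB"],
    "index2": ["paperA", "paperB", "paperC"],
    "paperA": [], "paperB": [], "paperC": [],
}
authority, hub_score = hits(links)
```

Pages that only point outward (the two index pages) get zero authority, and pages that nobody cites from get zero hub score. Note that many presentations of HITS use the unnormalized adjacency matrix instead of E.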

66
Outline

Degree centrality
Centralization
Betweenness centrality
Closeness centrality
Eigenvector centrality
Bonacich power centrality
Katz centrality
PageRank
Hubs and Authorities
Applications to Information Retrieval
LexRank

67
Applications to Information Retrieval

Can we use the notion of centrality to pick the
best summary sentence?
Can we use the subgraph of query results to infer
something about the query?
Can we use a graph of word translations to expand
dictionaries? disambiguate word meanings?
How might one use the HITS algorithm for document
summarization?
Consider a bipartite graph of sentences and words

68
Centrality in summarization

Extractive summarization
pick k sentences that are most representative of
a collection of n sentences
Motivation
capture the most central words in a document or
cluster
Centroid score [Radev et al. 2000, 2004a]
Alternative methods for computing centrality?

69
Sample multidocument cluster
(DUC cluster d1003t)
1 (d1s1) Iraqi Vice President Taha Yassin Ramadan announced today, Sunday, that Iraq refuses to back down from its decision to stop cooperating with disarmament inspectors before its demands are met.
2 (d2s1) Iraqi Vice president Taha Yassin Ramadan announced today, Thursday, that Iraq rejects cooperating with the United Nations except on the issue of lifting the blockade imposed upon it since the year 1990.
3 (d2s2) Ramadan told reporters in Baghdad that "Iraq cannot deal positively with whoever represents the Security Council unless there was a clear stance on the issue of lifting the blockade off of it.
4 (d2s3) Baghdad had decided late last October to completely cease cooperating with the inspectors of the United Nations Special Commission (UNSCOM), in charge of disarming Iraq's weapons, and whose work became very limited since the fifth of August, and announced it will not resume its cooperation with the Commission even if it were subjected to a military operation.
5 (d3s1) The Russian Foreign Minister, Igor Ivanov, warned today, Wednesday against using force against Iraq, which will destroy, according to him, seven years of difficult diplomatic work and will complicate the regional situation in the area.
6 (d3s2) Ivanov contended that carrying out air strikes against Iraq, who refuses to cooperate with the United Nations inspectors, will end the tremendous work achieved by the international group during the past seven years and will complicate the situation in the region.''
7 (d3s3) Nevertheless, Ivanov stressed that Baghdad must resume working with the Special Commission in charge of disarming the Iraqi weapons of mass destruction (UNSCOM).
8 (d4s1) The Special Representative of the United Nations Secretary-General in Baghdad, Prakash Shah, announced today, Wednesday, after meeting with the Iraqi Deputy Prime Minister Tariq Aziz, that Iraq refuses to back down from its decision to cut off cooperation with the disarmament inspectors.
9 (d5s1) British Prime Minister Tony Blair said today, Sunday, that the crisis between the international community and Iraq did not end'' and that Britain is still ready, prepared, and able to strike Iraq.''
10 (d5s2) In a gathering with the press held at the Prime Minister's office, Blair contended that the crisis with Iraq will not end until Iraq has absolutely and unconditionally respected its commitments'' towards the United Nations.
11 (d5s3) A spokesman for Tony Blair had indicated that the British Prime Minister gave permission to British Air Force Tornado planes stationed in Kuwait to join the aerial bombardment against Iraq.
70
Cosine between sentences

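Let s1 and s2 be two sentences.
Let x and y be their representations in an
n-dimensional vector space
The cosine between them is then computed based on the
inner product of the two:
$\cos(x, y) = \frac{\sum_{i=1}^{n} x_i y_i}{\sqrt{\sum_{i=1}^{n} x_i^2} \sqrt{\sum_{i=1}^{n} y_i^2}}$

The cosine ranges from 0 to 1 (the vector
components are non-negative term weights).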
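A minimal sketch of the cosine between two sentences. It assumes each sentence is represented by raw word counts after lowercasing and whitespace splitting; the weighting actually used for the slides' similarity matrix is not stated here and may differ (for example, tf-idf weights). The function name is hypothetical.

```python
import math
from collections import Counter

def cosine(s1, s2):
    """Cosine between two sentences represented as raw word-count vectors."""
    x = Counter(s1.lower().split())
    y = Counter(s2.lower().split())
    # Inner product over the words the two sentences share.
    dot = sum(x[w] * y[w] for w in x if w in y)
    norm_x = math.sqrt(sum(c * c for c in x.values()))
    norm_y = math.sqrt(sum(c * c for c in y.values()))
    if norm_x == 0 or norm_y == 0:
        return 0.0  # an empty sentence is treated as similar to nothing
    return dot / (norm_x * norm_y)

same = cosine("Iraq refuses to back down", "Iraq refuses to back down")
none = cosine("Iraq refuses", "Britain is ready")
partial = cosine("Iraq refuses to cooperate", "Iraq refuses to back down")
```

Since word counts are never negative, the result always falls between 0 (no shared words) and 1 (identical word distributions).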

71
LexRank (Cosine centrality)
1 2 3 4 5 6 7 8 9 10 11
1 1.00 0.45 0.02 0.17 0.03 0.22 0.03 0.28 0.06 0.06 0.00
2 0.45 1.00 0.16 0.27 0.03 0.19 0.03 0.21 0.03 0.15 0.00
3 0.02 0.16 1.00 0.03 0.00 0.01 0.03 0.04 0.00 0.01 0.00
4 0.17 0.27 0.03 1.00 0.01 0.16 0.28 0.17 0.00 0.09 0.01
5 0.03 0.03 0.00 0.01 1.00 0.29 0.05 0.15 0.20 0.04 0.18
6 0.22 0.19 0.01 0.16 0.29 1.00 0.05 0.29 0.04 0.20 0.03
7 0.03 0.03 0.03 0.28 0.05 0.05 1.00 0.06 0.00 0.00 0.01
8 0.28 0.21 0.04 0.17 0.15 0.29 0.06 1.00 0.25 0.20 0.17
9 0.06 0.03 0.00 0.00 0.20 0.04 0.00 0.25 1.00 0.26 0.38
10 0.06 0.15 0.01 0.09 0.04 0.20 0.00 0.20 0.26 1.00 0.12
11 0.00 0.00 0.00 0.01 0.18 0.03 0.01 0.17 0.38 0.12 1.00
72
Lexical centrality (t = 0.3)
73
Lexical centrality (t = 0.2)
74
Lexical centrality (t = 0.1)
Sentences vote for the most central sentence
75
LexRank

T1 ... Tn are pages that link to A,
c(Ti) is the outdegree of page Ti, and
N is the total number of pages.
d is the damping factor, or the probability
that we jump to a far-away node during the
random walk.
It accounts for disconnected components or
periodic graphs.
When d = 1, we have a strict uniform
distribution. When d = 0, the method is not
guaranteed to converge to a unique solution.
Typical value for d is between 0.1 and 0.2 (Brin
and Page, 1998).

Günes Erkan and Dragomir R. Radev, LexRank:
Graph-based Lexical Centrality as Salience in
Text Summarization
76
lab: LexRank demo

how does the summary change as you
increase the cosine similarity threshold for an
edge (how similar two sentences have to be)?
increase the salience threshold (minimum degree
of a node)?

http://tangra.si.umich.edu/demos/lexrank/
77
Content similarity distributions for web pages
(DMOZ) and scientific articles (PNAS)
Menczer, Filippo (2004) The evolution of document
networks.
78
what is that good for?

How could you take advantage of the fact that
pages that are similar in content tend to link to
one another?

79
What can networks of query results tell us about
the query?

If query results are highly interlinked, is this
a narrow or broad query?
How could you use query connection graphs to
predict whether a query will be reformulated?

Jure Leskovec, Susan Dumais: Web Projections:
Learning from Contextual Subgraphs of the Web
80
How can bipartite citation graphs be used to find
related articles?

co-citation: both A and B are cited by many other
papers (C, D, E, ...)

B
A
C
D
E

bibliographic coupling: both A and B cite
many of the same articles (F, G, H, ...)
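Both measures reduce to simple counts over a citation table. The sketch below uses a hypothetical citation dictionary that mirrors the letters on the slide: C, D, and E cite both A and B, while A and B share some of their references.

```python
# Hypothetical citation data: paper -> set of papers it cites.
cites = {
    "C": {"A", "B"},
    "D": {"A", "B"},
    "E": {"A", "B"},
    "A": {"F", "G", "H"},
    "B": {"F", "G"},
}

def cocitation(cites, a, b):
    """Number of papers that cite both a and b."""
    return sum(1 for refs in cites.values() if a in refs and b in refs)

def bibliographic_coupling(cites, a, b):
    """Number of references that a and b have in common."""
    return len(cites.get(a, set()) & cites.get(b, set()))

cocited = cocitation(cites, "A", "B")
coupled = bibliographic_coupling(cites, "A", "B")
```

Co-citation can grow over time as new papers cite both A and B, whereas bibliographic coupling is fixed once both papers are published.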

81
which of these pairs is more proximate?

according to cycle-free effective conductance:
the probability that you reach the other node
before cycling back on yourself, while doing a
random walk.

82
Proximity as cycle free effective conductance

Measuring and Extracting Proximity in Networks by
Yehuda Koren, Stephen C. North, Chris Volinsky,
KDD 2006
demo: http://public.research.att.com/volinsky/cgi-bin/prox/prox.pl

83
Using network algorithms (specifically proximity)
to improve movie recommendations can pay off
Source undetermined
84
final IR application: machine translation

not all pairwise translations are available
e.g. between rare languages
in some applications, e.g. image search, a word
may have multiple meanings
"spring" is an example in English
But in other languages, the word may be
unambiguous.
automated translation could be the key

85
final IR application: machine translation

if we combine all known word pairs, can we
construct additional dictionaries between rare
languages?

source: Reiter et al., Lexical Translation with
Application to Image Search on the Web
86
Automatic translation network structure

Two words are more likely to have the same meaning if
there are multiple indirect paths of length 2
through other languages

[Figure: translation graph linking words for "spring" across languages: Arabic, printemps (French), udaherri (Basque), koanga (Maori), primavera (Spanish), spring (English), Russian; numeric edge and node labels not recoverable]

87
summary