Title: Network centrality
1Lecture 5 Network centrality
Slides are modified from Lada Adamic
2Measures and Metrics
- Knowing the structure of a network, we can calculate various useful quantities or measures that capture particular features of the network topology.
- The basis of most such measures is social network analysis.
- So far,
- Degree distribution, Average path length, Density
- Centrality
- Degree, Eigenvector, Katz, PageRank, Hubs, Closeness, Betweenness, ...
- Several other graph metrics
- Clustering coefficient, Assortativity, Modularity, ...
3Characterizing networks: Who is most central?
4network centrality
- Which nodes are most central?
- Definition of central varies by context/purpose
- Local measure
- degree
- Relative to rest of network
- closeness, betweenness, eigenvector (Bonacich power centrality), Katz, PageRank, ...
- How evenly is centrality distributed among nodes?
- Centralization, hubs and authorities, ...
5centrality: who's important based on their network position
In each of the following networks, X has higher
centrality than Y according to a particular
measure
indegree
outdegree
betweenness
closeness
6Outline
- Degree centrality
- Centralization
- Betweenness centrality
- Closeness centrality
- Eigenvector centrality
- Bonacich power centrality
- Katz centrality
- PageRank
- Hubs and Authorities
7degree centrality (undirected)
He who has many friends is most important.
- When is the number of connections the best centrality measure?
- people who will do favors for you
- people you can talk to (influence set, information access, ...)
- influence of an article in terms of citations (using in-degree)
8degree: normalized degree centrality
divide the degree by the max. possible, i.e. (N-1)
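The normalization above can be sketched in a few lines of Python. This is a minimal illustration, not part of the original slides: the function name `degree_centrality` and the star-shaped toy graph are hypothetical, and the graph is assumed to be simple and undirected.

```python
def degree_centrality(edges, normalized=True):
    """Degree centrality for an undirected graph given as (u, v) edge pairs."""
    neighbors = {}
    for u, v in edges:
        neighbors.setdefault(u, set()).add(v)
        neighbors.setdefault(v, set()).add(u)
    n = len(neighbors)
    # Normalizing divides by the largest possible degree, N - 1.
    scale = (n - 1) if normalized and n > 1 else 1
    return {node: len(nbrs) / scale for node, nbrs in neighbors.items()}

# Hypothetical toy graph: hub "X" connected to three leaves.
star = [("X", "a"), ("X", "b"), ("X", "c")]
scores = degree_centrality(star)
```

In this toy star the hub is tied to every other node, so its normalized score is the maximum possible, while each leaf has one tie out of three possible.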
9Prestige in directed social networks
- when prestige may be the right word
- admiration
- influence
- gift-giving
- trust
- directionality especially important in instances where ties may not be reciprocated (e.g. dining partners choice network)
- when prestige may not be the right word
- gives advice to (can reverse direction)
- gives orders to (can reverse direction)
- lends money to (can reverse direction)
- dislikes
- distrusts
10Extensions of undirected degree centrality - prestige
- degree centrality
- indegree centrality
- a paper that is cited by many others has high prestige
- a person nominated by many others for a reward has high prestige
11centralization: how equal are the nodes?
How much variation is there in the centrality scores among the nodes?
Freeman's general formula for centralization (can use other metrics, e.g. Gini coefficient or standard deviation):
$$C_D = \frac{\sum_{i=1}^{N} \left[ C_D(n^*) - C_D(i) \right]}{(N-1)(N-2)}$$
where C_D(n^*) is the maximum value in the network
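As a rough sketch of how degree centralization could be computed, assuming Freeman's formula in its usual degree form (the sum of gaps between the maximum degree and each node's degree, divided by (N-1)(N-2)). The function name and the two toy graphs are hypothetical, and the graph is assumed to be simple, undirected, and to have at least three nodes.

```python
def degree_centralization(edges):
    """Freeman degree centralization of a simple undirected graph (N >= 3)."""
    degree = {}
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
    n = len(degree)
    max_degree = max(degree.values())
    # Sum of the gaps between the most central node and every node.
    total_gap = sum(max_degree - d for d in degree.values())
    return total_gap / ((n - 1) * (n - 2))

# Hypothetical examples: a five-node star and a five-node line.
star = [("hub", leaf) for leaf in "abcd"]
line = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "e")]
star_cd = degree_centralization(star)
line_cd = degree_centralization(line)
```

Under these assumptions the star is maximally centralized (value 1.0), while the line gives 2/12, about 0.167.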
12degree centralization examples
C_D = 0.167
C_D = 1.0
C_D = 0.167
13degree centralization examples
example: financial trading networks
high centralization: one node trading with many others
low centralization: trades are more evenly distributed
14when degree isn't everything
In what ways does degree fail to capture
centrality in the following graphs?
- ability to broker between groups
- likelihood that information originating anywhere
in the network reaches you
16betweenness: another centrality measure
- intuition: how many pairs of individuals would have to go through you in order to reach one another in the minimum number of hops?
- who has higher betweenness, X or Y?
17betweenness on toy networks
[Figure: line network A - B - C - D - E]
- A lies between no two other vertices
- B lies between A and 3 other vertices: C, D, and E
- C lies between 4 pairs of vertices: (A,D), (A,E), (B,D), (B,E)
- note that there are no alternate paths for these pairs to take, so C gets full credit
18betweenness centrality: definition
$$C_B(i) = \sum_{j<k} \frac{g_{jk}(i)}{g_{jk}}$$
where C_B(i) is the betweenness of vertex i, g_jk is the number of geodesics connecting j and k, and g_jk(i) is the number of those geodesics that actor i is on.
Usually normalized by the number of pairs of vertices excluding the vertex itself:
$$C'_B(i) = \frac{C_B(i)}{(N-1)(N-2)/2}$$
directed graph: (N-1)(N-2)
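The definition can be tried out with a short Python sketch. This one uses Brandes-style accumulation over breadth-first searches, which is one common way to count geodesics and not necessarily how the lecture computed its values; the function name and the five-node line graph are assumptions chosen to match the toy example of a line A - B - C - D - E.

```python
from collections import deque

def betweenness(adj):
    """Unnormalized geodesic betweenness for an undirected, unweighted graph.

    adj maps each node to a list of its neighbours.
    """
    score = {v: 0.0 for v in adj}
    for s in adj:
        # BFS from s: count shortest paths (sigma) and record predecessors.
        dist = {s: 0}
        sigma = {v: 0 for v in adj}
        sigma[s] = 1
        preds = {v: [] for v in adj}
        order = []
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Walk back from the farthest nodes, sharing credit among geodesics.
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    # Every unordered pair (j, k) was visited from both endpoints.
    return {v: b / 2 for v, b in score.items()}

line = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C", "E"], "E": ["D"]}
scores = betweenness(line)
n = len(line)
normalized = {v: b / ((n - 1) * (n - 2) / 2) for v, b in scores.items()}
```

On the line graph this gives A a score of 0, B a score of 3, and C a score of 4, in agreement with the pair counts listed for the toy network.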
19betweenness on toy networks
20betweenness on toy networks
broker
21example
Nodes are sized by degree, and colored by
betweenness.
Can you spot nodes with high betweenness but
relatively low degree?
What about high degree but relatively low
betweenness?
22betweenness on toy networks
- why do C and D each have betweenness 1?
- They are both on shortest paths for pairs (A,E) and (B,E), and so must share credit
- 1/2 + 1/2 = 1
- Can you figure out why B has betweenness 3.5 while E has betweenness 0.5?
[Figure: toy network with nodes A, B, C, D, E]
23Alternative betweenness computations
- Slight variations in geodesic path computations
- inclusion of self in the computations
- Flow betweenness
- Based on the idea of maximum flow
- edge-independent path selection affects the results
- May not include geodesic paths
- Random-walk betweenness
- Based on the idea of random walks
- Usually yields ranking similar to geodesic betweenness
- Many other alternative definitions exist based on diffusion, transmission or flow along network edges
24Extending betweenness centrality to directed
networks
- We now consider the fraction of all directed paths between any two vertices that pass through a node
$$C_B(i) = \sum_{j \neq k} \frac{g_{jk}(i)}{g_{jk}}$$
where C_B(i) is the betweenness of vertex i, g_jk(i) is the number of paths between j and k that pass through i, and g_jk is the number of all paths between j and k.
- Only modification: when normalizing, we have (N-1)(N-2) instead of (N-1)(N-2)/2, because we have twice as many ordered pairs as unordered pairs
25Directed geodesics
- A node does not necessarily lie on a geodesic
from j to k if it lies on a geodesic from k to j
27closeness another centrality measure
- What if it's not so important to have many direct friends?
- Or be between others
- But one still wants to be in the middle of things, not too far from the center
28closeness centrality definition
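Closeness is based on the length of the average shortest path between a vertex and all vertices in the graph
Closeness Centrality:
$$C_c(i) = \left[ \sum_{j=1}^{N} d(i,j) \right]^{-1}$$
where d(i,j) is the length of the shortest path between i and j
depends on inverse distance to other vertices
Normalized Closeness Centrality:
$$C'_c(i) = C_c(i) \cdot (N-1)$$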
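A minimal Python sketch of closeness, assuming the usual definition (the inverse of the summed shortest-path distances, multiplied by N-1 when normalized) and a connected, undirected, unweighted graph. The function name and the five-node line graph are hypothetical.

```python
from collections import deque

def closeness(adj, normalized=True):
    """Closeness centrality via BFS; adj maps node -> list of neighbours."""
    n = len(adj)
    result = {}
    for s in adj:
        # Breadth-first search gives shortest-path lengths from s.
        dist = {s: 0}
        queue = deque([s])
        while queue:
            v = queue.popleft()
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
        total = sum(dist.values())
        c = 1 / total if total > 0 else 0.0
        result[s] = c * (n - 1) if normalized else c
    return result

line = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C", "E"], "E": ["D"]}
scores = closeness(line)
```

In the line graph the middle node C is at total distance 6 from the others and the end node A at total distance 10, so their normalized scores are 4/6 and 4/10. The sketch does not handle disconnected graphs, where some distances are undefined.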
29closeness centrality toy example
[Figure: toy network with nodes A, B, C, D, E]
30closeness centrality more toy examples
31how closely do degree and betweenness correspond
to closeness?
- degree
- number of connections
- denoted by size
- closeness
- length of shortest path to all others
- denoted by color
32Closeness centrality
- Values tend to span a rather small dynamic range
- typical distance increases logarithmically with network size
- In a typical network the closeness centrality C might span a factor of five or less
- It is difficult to distinguish between central and less central vertices
- a small change in network might considerably affect the centrality order
- Alternative computations exist but they have their own problems
33Influence range
- The influence range of i is the set of vertices that are reachable from the node i
34Extensions of undirected closeness centrality
- closeness centrality usually implies
- all paths should lead to you
- paths should lead from you to everywhere else
- usually consider only vertices from which the
node i in question can be reached
35Outline
- Degree centrality
- Centralization
- Betweenness centrality
- Closeness centrality
- Eigenvector centrality
- Bonacich power centrality
- Katz centrality
- PageRank
- Hubs and Authorities
- Applications to Information Retrieval
- LexRank
36Eigenvalues and eigenvectors
- Eigenvalues and eigenvectors have their origins in physics, in particular in problems where motion is involved, although their uses extend from solutions to stress and strain problems to differential equations and quantum mechanics.
- Eigenvectors are vectors that point in directions where there is no rotation. Eigenvalues are the change in length of the eigenvector from the original length.
- The basic equation in eigenvalue problems is
$$A x = \lambda x$$
Slides from Fred K. Duennebier
37Eigenvalues and eigenvectors
- In words, this deceptively simple equation says that for the square matrix A, there is a vector x such that the product Ax equals a SCALAR, λ, multiplied by x.
- The multiplication of vector x by a scalar constant is the same as stretching or shrinking the coordinates by a constant value.
- The vector x is called an eigenvector and the scalar λ is called an eigenvalue.
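38
$$A x = \lambda x$$ (E.01)
Do all matrices have real eigenvalues? No. The matrix must be square, and an eigenvalue λ must make the determinant of A - λI equal zero (the roots may be complex). This is easy to show:
$$(A - \lambda I) x = 0$$ (E.02)
For a nonzero x, this can only be true if
$$\det(A - \lambda I) = |A - \lambda I| = 0$$ (E.03)
Are eigenvectors unique? No, if x is an eigenvector, then βx (for any nonzero scalar β) is also an eigenvector with the same eigenvalue λ:
$$A(\beta x) = \beta A x = \beta \lambda x = \lambda (\beta x)$$ (E.04)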
39How do you calculate eigenvectors and eigenvalues? Expand equation (E.03), det(A - λI) = |A - λI| = 0, for a 2x2 matrix:
$$\begin{vmatrix} a_{11} - \lambda & a_{12} \\ a_{21} & a_{22} - \lambda \end{vmatrix} = (a_{11} - \lambda)(a_{22} - \lambda) - a_{12} a_{21} = 0$$ (E.05)
For a 2-dimensional problem such as this, the equation above is a simple quadratic equation with two solutions for λ. In fact, there is generally one eigenvalue for each dimension, but some may be zero, and some complex.
40The solution to E.05 is
$$\lambda^2 - (a_{11} + a_{22}) \lambda + (a_{11} a_{22} - a_{12} a_{21}) = 0$$ (E.06)
(E.07)
This characteristic equation does not involve x, and the resulting values of λ can be used to solve for x. Consider the following example:
Eqn. E.07 doesn't work here because a11a22 - a12a21 = 0, so we use E.06
41We see that one solution to this equation is λ = 0, and dividing both sides of the above equation by λ yields λ = 5. Thus we have our two eigenvalues, and the eigenvectors for the first eigenvalue, λ = 0, are
These equations are multiples of x = -2y, so the smallest whole number values that fit are x = 2, y = -1
42For the other eigenvalue, λ = 5
- This example is rather special: A^{-1} does not exist, the two rows of A - λI are dependent and thus one of the eigenvalues is zero. (Zero is a legitimate eigenvalue!)
- EXAMPLE: A more common case is A = [1.05 .05; .05 1] used in the strain exercise. Find the eigenvectors and eigenvalues for this A, and then calculate [V,D] = eig(A).
- The procedure is
- Compute the determinant of A - λI
- Find the roots of the polynomial given by |A - λI| = 0
- Solve the system of equations (A - λI)x = 0
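The three-step procedure can be sketched for the 2x2 case in plain Python. This is an illustrative sketch under stated assumptions: the helper name `eig_2x2` is hypothetical, only real eigenvalues are handled, and the eigenvectors are returned unnormalized, so they may differ by a scale factor from what a numerical routine such as `eig` reports.

```python
import math

def eig_2x2(a11, a12, a21, a22):
    """Eigenvalue/eigenvector pairs of a real 2x2 matrix with real eigenvalues."""
    # Steps 1-2: det(A - lam*I) = lam^2 - trace*lam + det = 0, solved directly.
    trace = a11 + a22
    det = a11 * a22 - a12 * a21
    disc = trace * trace - 4 * det
    if disc < 0:
        raise ValueError("complex eigenvalues are not handled in this sketch")
    root = math.sqrt(disc)
    pairs = []
    for lam in ((trace + root) / 2, (trace - root) / 2):
        # Step 3: solve (A - lam*I) x = 0 from whichever row is not all zero.
        if abs(a12) > 1e-12 or abs(a11 - lam) > 1e-12:
            vec = (a12, lam - a11)
        elif abs(a21) > 1e-12 or abs(a22 - lam) > 1e-12:
            vec = (lam - a22, a21)
        else:
            vec = (1.0, 0.0)  # A - lam*I is the zero matrix; any vector works
        pairs.append((lam, vec))
    return pairs

# The strain-exercise matrix from the example above.
pairs = eig_2x2(1.05, 0.05, 0.05, 1.0)
```

Each returned pair can be checked by multiplying the matrix by the vector and comparing with the eigenvalue times the vector.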
43What good are such things? Consider the matrix A = [.8 .3; .2 .7]
What is A^100? We can get A^100 by multiplying matrices many many times
Or we could find the eigenvalues of A and obtain A^100 very quickly using eigenvalues.
44For now, I'll just note that there are two eigenvectors for A
The eigenvectors are x1 = [.6 .4] and x2 = [1 -1], and the eigenvalues are λ1 = 1 and λ2 = 0.5. Note that, if we multiply x1 by A, we get x1. If we multiply x1 by A again, we STILL get x1. Thus x1 doesn't change as we multiply it by A^n.
45What about x2? When we multiply A by x2, we get x2/2, and if we multiply x2 by A^2, we get x2/4. This number gets very small fast. Note that when A is squared the eigenvectors stay the same, but the eigenvalues are squared! Back to our original problem: we note that for A^100, the eigenvectors will be the same, and the eigenvalues are λ1 = 1 and λ2 = (0.5)^100, which is effectively zero. Each eigenvector is multiplied by its eigenvalue whenever A is applied.
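The two routes to the 100th power of A can be compared in a short Python sketch. The matrix entries below are an assumption: they are the values implied by the quoted eigenpairs (eigenvector [.6 .4] with eigenvalue 1 and eigenvector [1 -1] with eigenvalue 0.5). The helper names are hypothetical.

```python
def matmul(p, q):
    """Product of two 2x2 matrices stored as nested lists."""
    return [[sum(p[i][k] * q[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

# Assumed matrix, reconstructed from the eigenpairs quoted in the notes.
A = [[0.8, 0.3], [0.2, 0.7]]

# Slow way: multiply by A one hundred times.
power = [[1.0, 0.0], [0.0, 1.0]]
for _ in range(100):
    power = matmul(power, A)

def power_via_eigen(n):
    """A^n built from the eigenpairs: each eigenvector is scaled by lam^n."""
    x1, x2 = (0.6, 0.4), (1.0, -1.0)
    lam1, lam2 = 1.0, 0.5
    # Unit vectors in the eigenvector basis: e1 = x1 + 0.4*x2, e2 = x1 - 0.6*x2.
    coeffs = [(1.0, 0.4), (1.0, -0.6)]
    cols = [[c1 * lam1 ** n * x1[i] + c2 * lam2 ** n * x2[i] for i in range(2)]
            for c1, c2 in coeffs]
    # cols[j] is column j of A^n; return it in row-major form.
    return [[cols[j][i] for j in range(2)] for i in range(2)]

fast = power_via_eigen(100)
```

Because the second eigenvalue raised to the 100th power is effectively zero, both columns of the result collapse onto the first eigenvector, [.6 .4].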
47Eigenvector Centrality
48Eigenvector Centrality
49Eigenvector Centrality
- Can be calculated for directed graphs as well
- We need to decide between incoming or outgoing edges
- A has no incoming edges, hence a centrality of 0
- B has only an incoming edge from A
- hence its centrality is also 0
- Only vertices that are in a strongly connected component of two or more vertices or the out-component of such a component have non-zero centrality
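Eigenvector centrality is commonly computed by power iteration: repeatedly replace each node's score with the sum of its neighbours' scores and rescale. The sketch below assumes that approach and an undirected, connected, non-bipartite graph (plain power iteration can oscillate on bipartite or periodic graphs). The function name and the toy graph, a triangle A-B-C with a pendant node D attached to C, are hypothetical. For a directed graph, one would pass for each node the list of nodes pointing to it.

```python
def eigenvector_centrality(adj, iterations=200):
    """Power iteration; adj maps node -> list of neighbours (or in-neighbours)."""
    x = {v: 1.0 for v in adj}
    for _ in range(iterations):
        # Each node's new score is the sum of its neighbours' current scores.
        new = {v: sum(x[u] for u in adj[v]) for v in adj}
        norm = max(new.values())
        if norm == 0:
            return new  # no node receives any score
        x = {v: val / norm for v, val in new.items()}
    return x

graph = {"A": ["B", "C"], "B": ["A", "C"], "C": ["A", "B", "D"], "D": ["C"]}
scores = eigenvector_centrality(graph)
```

Scores are scaled so the most central node has value 1. In this toy graph C ranks first, A and B tie, and the pendant node D ranks last.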
50Katz centrality
51Katz Centrality: β
- The magnitude of β reflects the radius of power
- Small values of β weight local structure
- Larger values weight global structure
- If β > 0, ego has higher centrality when tied to people who are central
- If β < 0, then ego has higher centrality when tied to people who are not central
- With β = 0, you get degree centrality
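The effect of the parameter can be explored with a small Python sketch. The formula is not given in these notes, so the sketch assumes Bonacich's recursive form with the scaling constant set to 1: each node's score is the sum over its neighbours of (1 + beta times the neighbour's score). The function name and toy graph are hypothetical, and the fixed-point iteration only converges when the magnitude of beta is smaller than the reciprocal of the largest eigenvalue of the adjacency matrix.

```python
def power_centrality(adj, beta, iterations=500):
    """Fixed-point iteration for c = A(1 + beta*c); adj maps node -> neighbours."""
    c = {v: 0.0 for v in adj}
    for _ in range(iterations):
        c = {v: sum(1.0 + beta * c[u] for u in adj[v]) for v in adj}
    return c

# Hypothetical toy graph: triangle A-B-C with a pendant node D attached to C.
graph = {"A": ["B", "C"], "B": ["A", "C"], "C": ["A", "B", "D"], "D": ["C"]}
as_degree = power_centrality(graph, 0.0)
positive = power_centrality(graph, 0.25)
negative = power_centrality(graph, -0.25)
```

With beta set to 0 the result reduces to plain degree, matching the last bullet above. Comparing `positive` and `negative` shows how the sign of beta rewards or penalizes ties to well-connected neighbours.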
52Katz Centrality examples
β = .25
β = -.25
Why does the middle node have lower centrality
than its neighbors when b is negative?
53PageRank bringing order to the web
- It's in the links
- links to URLs can be interpreted as endorsements or recommendations
- the more links a URL receives, the more likely it is to be a good/entertaining/provocative/authoritative/interesting information source
- but not all link sources are created equal
- a link from a respected information source
- a link from a page created by a spammer
an important page, e.g. slashdot
Many webpages scattered across the web
if a web page is slashdotted, it gains attention
54PageRank
55Ranking pages by tracking a drunk
- A random walker following edges in a network for a very long time will spend a proportion of time at each node, which can be used as a measure of importance
56Trapping a drunk
- Problem with pure random walk metric
- Drunk can be trapped and end up going in circles
57Ingenuity of the PageRank algorithm
- Allow drunk to teleport with some probability
- e.g. random websurfer follows links for a while, but with some probability teleports to a random page
- bookmarked page or uses a search engine to start anew
58PageRank algorithm
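$$PR(p_i) = \frac{1-d}{N} + d \sum_{p_j \in M(p_i)} \frac{PR(p_j)}{L(p_j)}$$
- where p1, p2, ..., pN are the pages under consideration,
- M(pi) is the set of pages that link to pi,
- L(pj) is the number of outbound links on page pj, and
- N is the total number of pages.
- d is the damping factor, the probability of following a link; 1 - d is the random jumping probability (d = 0.85 for Google)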
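A minimal power-iteration sketch of PageRank in Python, assuming the standard formulation in which d = 0.85 is the probability of following a link and 1 - d is the probability of teleporting to a page chosen uniformly at random. The function name, the four-page toy web, and the choice to spread the rank of pages without outbound links evenly over all pages are assumptions for illustration.

```python
def pagerank(links, d=0.85, iterations=100):
    """links maps each page to the list of pages it links to (all must be keys)."""
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Teleportation gives every page a baseline share.
        new = {p: (1.0 - d) / n for p in pages}
        for p in pages:
            targets = links[p]
            if targets:
                share = d * rank[p] / len(targets)
                for q in targets:
                    new[q] += share
            else:
                # Dangling page: spread its rank evenly (an assumed convention).
                for q in pages:
                    new[q] += d * rank[p] / n
        rank = new
    return rank

# Hypothetical toy web: D has no incoming links, C is linked to by A, B, and D.
web = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
ranks = pagerank(web)
```

The scores sum to 1. Page D, which nothing links to, keeps only the teleportation baseline (1 - d)/N, and lowering d (more teleportation) pulls all scores toward the uniform value 1/N.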
59Exercise PageRank
- What happens to the relative PageRank scores of the nodes as you increase the teleportation probability?
- Can you construct a network such that a node with low indegree has the highest PageRank?
GUESS PageRank demo
http://projects.si.umich.edu/netlearn/GUESS/pagerank.html
60example: probable location of random walker after 1 step
20% teleportation probability
61example location probability after 10 steps
62Matrix-based Centrality measures
                       | with constant term | without constant term
Divide by out-degree   | PageRank           | Degree centrality
No division            | Katz centrality    | Eigenvector centrality
64Hubs and Authorities
- In directed networks, vertices that point to important resources should also get a high centrality
- e.g. review articles, web indexes
- recursive definition
authorities are nodes that are linked to by good hubs
hubs are nodes that link to good authorities
65Hyperlink-Induced Topic Search
- HITS algorithm
- start with a set of pages matching a query
- expand the set by following forward and back links
- take transition matrix E, where the i,jth entry Eij = 1/ni
- where i links to j, and ni is the number of links from i
- then one can compute the authority scores a, and hub scores h through an iterative approach
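The iterative approach can be sketched in Python as follows. Note the assumption: this sketch uses the plain 0/1 link structure, as in the commonly described form of HITS, and not the 1/ni-weighted matrix E mentioned above. The function name and the four-page toy graph are hypothetical.

```python
def hits(links, iterations=100):
    """links maps each page to the list of pages it links to (all must be keys)."""
    nodes = list(links)
    hub = {v: 1.0 for v in nodes}
    auth = {v: 1.0 for v in nodes}
    for _ in range(iterations):
        # Authority update: sum the hub scores of pages linking in.
        auth = {v: 0.0 for v in nodes}
        for u in nodes:
            for v in links[u]:
                auth[v] += hub[u]
        # Hub update: sum the authority scores of pages linked to.
        hub = {u: sum(auth[v] for v in links[u]) for u in nodes}
        # Rescale so the largest score of each kind is 1.
        a_norm = max(auth.values()) or 1.0
        h_norm = max(hub.values()) or 1.0
        auth = {v: a / a_norm for v, a in auth.items()}
        hub = {v: h / h_norm for v, h in hub.items()}
    return hub, auth

# Hypothetical toy graph: h1 links to a and b, h2 links only to a.
links = {"h1": ["a", "b"], "h2": ["a"], "a": [], "b": []}
hub, auth = hits(links)
```

Page a, pointed to by both hubs, ends up as the top authority, and h1, which points to both authorities, ends up as the top hub.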
67Applications to Information Retrieval
- Can we use the notion of centrality to pick the best summary sentence?
- Can we use the subgraph of query results to infer something about the query?
- Can we use a graph of word translations to expand dictionaries? disambiguate word meanings?
- How might one use the HITS algorithm for document summarization?
- Consider a bipartite graph of sentences and words
68Centrality in summarization
- Extractive summarization
- pick k sentences that are most representative of a collection of n sentences
- Motivation
- capture the most central words in a document or cluster
- Centroid score (Radev et al. 2000, 2004a)
- Alternative methods for computing centrality?
69Sample multidocument cluster
(DUC cluster d1003t)
1 (d1s1) Iraqi Vice President Taha Yassin Ramadan announced today, Sunday, that Iraq refuses to back down from its decision to stop cooperating with disarmament inspectors before its demands are met.
2 (d2s1) Iraqi Vice President Taha Yassin Ramadan announced today, Thursday, that Iraq rejects cooperating with the United Nations except on the issue of lifting the blockade imposed upon it since the year 1990.
3 (d2s2) Ramadan told reporters in Baghdad that "Iraq cannot deal positively with whoever represents the Security Council unless there was a clear stance on the issue of lifting the blockade off of it."
4 (d2s3) Baghdad had decided late last October to completely cease cooperating with the inspectors of the United Nations Special Commission (UNSCOM), in charge of disarming Iraq's weapons, and whose work became very limited since the fifth of August, and announced it will not resume its cooperation with the Commission even if it were subjected to a military operation.
5 (d3s1) The Russian Foreign Minister, Igor Ivanov, warned today, Wednesday against using force against Iraq, which will destroy, according to him, seven years of difficult diplomatic work and will complicate the regional situation in the area.
6 (d3s2) Ivanov contended that carrying out air strikes against Iraq, who refuses to cooperate with the United Nations inspectors, "will end the tremendous work achieved by the international group during the past seven years and will complicate the situation in the region."
7 (d3s3) Nevertheless, Ivanov stressed that Baghdad must resume working with the Special Commission in charge of disarming the Iraqi weapons of mass destruction (UNSCOM).
8 (d4s1) The Special Representative of the United Nations Secretary-General in Baghdad, Prakash Shah, announced today, Wednesday, after meeting with the Iraqi Deputy Prime Minister Tariq Aziz, that Iraq refuses to back down from its decision to cut off cooperation with the disarmament inspectors.
9 (d5s1) British Prime Minister Tony Blair said today, Sunday, that the crisis between the international community and Iraq "did not end" and that Britain is still "ready, prepared, and able to strike Iraq."
10 (d5s2) In a gathering with the press held at the Prime Minister's office, Blair contended that the crisis with Iraq "will not end until Iraq has absolutely and unconditionally respected its commitments" towards the United Nations.
11 (d5s3) A spokesman for Tony Blair had indicated that the British Prime Minister gave permission to British Air Force Tornado planes stationed in Kuwait to join the aerial bombardment against Iraq.
70Cosine between sentences
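- Let s1 and s2 be two sentences.
- Let x and y be their representations in an n-dimensional vector space
- The cosine between them is then computed based on the inner product of the two:
$$\cos(x, y) = \frac{\sum_{i=1}^{n} x_i y_i}{\sqrt{\sum_{i=1}^{n} x_i^2} \sqrt{\sum_{i=1}^{n} y_i^2}}$$
- The cosine ranges from 0 to 1 (for non-negative term weights).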
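A small Python sketch of the cosine between two sentences. As a simplifying assumption, each sentence is represented by raw word counts after lowercasing and splitting on whitespace; a real summarizer would typically use weighted terms (for example tf-idf) and better tokenization. The function name is hypothetical.

```python
import math
from collections import Counter

def cosine(s1, s2):
    """Cosine similarity of two sentences using raw word counts."""
    x = Counter(s1.lower().split())
    y = Counter(s2.lower().split())
    # Inner product over the words of the first sentence (missing words count 0).
    dot = sum(x[w] * y[w] for w in x)
    norm = (math.sqrt(sum(c * c for c in x.values()))
            * math.sqrt(sum(c * c for c in y.values())))
    return dot / norm if norm else 0.0

similarity = cosine("Iraq refuses cooperation", "Iraq resumes cooperation")
```

Two of the three words are shared here, so the cosine is 2/3. Identical sentences give 1 and sentences with no words in common give 0.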
71LexRank (Cosine centrality)
       1    2    3    4    5    6    7    8    9   10   11
 1 1.00 0.45 0.02 0.17 0.03 0.22 0.03 0.28 0.06 0.06 0.00
 2 0.45 1.00 0.16 0.27 0.03 0.19 0.03 0.21 0.03 0.15 0.00
 3 0.02 0.16 1.00 0.03 0.00 0.01 0.03 0.04 0.00 0.01 0.00
 4 0.17 0.27 0.03 1.00 0.01 0.16 0.28 0.17 0.00 0.09 0.01
 5 0.03 0.03 0.00 0.01 1.00 0.29 0.05 0.15 0.20 0.04 0.18
 6 0.22 0.19 0.01 0.16 0.29 1.00 0.05 0.29 0.04 0.20 0.03
 7 0.03 0.03 0.03 0.28 0.05 0.05 1.00 0.06 0.00 0.00 0.01
 8 0.28 0.21 0.04 0.17 0.15 0.29 0.06 1.00 0.25 0.20 0.17
 9 0.06 0.03 0.00 0.00 0.20 0.04 0.00 0.25 1.00 0.26 0.38
10 0.06 0.15 0.01 0.09 0.04 0.20 0.00 0.20 0.26 1.00 0.12
11 0.00 0.00 0.00 0.01 0.18 0.03 0.01 0.17 0.38 0.12 1.00
72Lexical centrality (t = 0.3)
73Lexical centrality (t = 0.2)
74Lexical centrality (t = 0.1)
Sentences vote for the most central sentence
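The voting idea can be sketched in Python using the cosine matrix from the LexRank slide: keep an edge between two sentences when their cosine exceeds the threshold t, then count each sentence's edges. The function name is hypothetical, and two choices here are assumptions: the comparison is strict (greater than t) and a sentence's similarity to itself is not counted.

```python
# Cosine similarity matrix for the 11 sentences, copied from the LexRank slide.
SIM = [
    [1.00, 0.45, 0.02, 0.17, 0.03, 0.22, 0.03, 0.28, 0.06, 0.06, 0.00],
    [0.45, 1.00, 0.16, 0.27, 0.03, 0.19, 0.03, 0.21, 0.03, 0.15, 0.00],
    [0.02, 0.16, 1.00, 0.03, 0.00, 0.01, 0.03, 0.04, 0.00, 0.01, 0.00],
    [0.17, 0.27, 0.03, 1.00, 0.01, 0.16, 0.28, 0.17, 0.00, 0.09, 0.01],
    [0.03, 0.03, 0.00, 0.01, 1.00, 0.29, 0.05, 0.15, 0.20, 0.04, 0.18],
    [0.22, 0.19, 0.01, 0.16, 0.29, 1.00, 0.05, 0.29, 0.04, 0.20, 0.03],
    [0.03, 0.03, 0.03, 0.28, 0.05, 0.05, 1.00, 0.06, 0.00, 0.00, 0.01],
    [0.28, 0.21, 0.04, 0.17, 0.15, 0.29, 0.06, 1.00, 0.25, 0.20, 0.17],
    [0.06, 0.03, 0.00, 0.00, 0.20, 0.04, 0.00, 0.25, 1.00, 0.26, 0.38],
    [0.06, 0.15, 0.01, 0.09, 0.04, 0.20, 0.00, 0.20, 0.26, 1.00, 0.12],
    [0.00, 0.00, 0.00, 0.01, 0.18, 0.03, 0.01, 0.17, 0.38, 0.12, 1.00],
]

def threshold_degrees(sim, t):
    """Degree of each sentence in the graph keeping edges with similarity > t."""
    n = len(sim)
    return [sum(1 for j in range(n) if j != i and sim[i][j] > t)
            for i in range(n)]

degrees_03 = threshold_degrees(SIM, 0.3)
degrees_01 = threshold_degrees(SIM, 0.1)
# 1-based number of the sentence with the most votes at t = 0.1.
most_central = degrees_01.index(max(degrees_01)) + 1
```

Raising the threshold thins the graph quickly: at t = 0.3 only a couple of sentence pairs remain connected, while at t = 0.1 most sentences have several neighbours.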
75LexRank
$$p(A) = \frac{d}{N} + (1-d) \sum_{i=1}^{n} \frac{p(T_i)}{c(T_i)}$$
- T1 ... Tn are pages that link to A,
- c(Ti) is the outdegree of page Ti, and
- N is the total number of pages.
- d is the damping factor, or the probability that we jump to a far-away node during the random walk.
- It accounts for disconnected components or periodic graphs.
- When d = 1, we have a strict uniform distribution. When d = 0, the method is not guaranteed to converge to a unique solution.
- Typical value for d is between 0.1 and 0.2 (Brin and Page, 1998).
Günes Erkan and Dragomir R. Radev, LexRank: Graph-based Lexical Centrality as Salience in Text Summarization
76lab Lexrank demo
- how does the summary change as you
- increase the cosine similarity threshold for an edge
- how similar two sentences have to be?
- increase the salience threshold (minimum degree of a node)
http://tangra.si.umich.edu/demos/lexrank/
77Content similarity distributions for web pages
(DMOZ) and scientific articles (PNAS)
Menczer, Filippo (2004) The evolution of document
networks.
78what is that good for?
- How could you take advantage of the fact that
pages that are similar in content tend to link to
one another?
79What can networks of query results tell us about
the query?
- If query results are highly interlinked, is this a narrow or broad query?
- How could you use query connection graphs to predict whether a query will be reformulated?
Jure Leskovec, Susan Dumais: Web Projections: Learning from Contextual Subgraphs of the Web
80How can bipartite citation graphs be used to find
related articles?
- co-citation: both A and B are cited by many other papers (C, D, E, ...)
[Figure: papers C, D, and E each cite both A and B]
- bibliographic coupling: both A and B cite many of the same articles (F, G, H, ...)
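Both relatedness measures reduce to simple set counts over a citation list, as in this Python sketch. The function names and the toy citation data are hypothetical; `cites` maps each paper to the set of papers it cites.

```python
def cocitation(cites, a, b):
    """Number of papers that cite both a and b."""
    return sum(1 for refs in cites.values() if a in refs and b in refs)

def bibliographic_coupling(cites, a, b):
    """Number of articles cited by both a and b."""
    return len(set(cites.get(a, ())) & set(cites.get(b, ())))

# Hypothetical data: C, D, and E cite both A and B; A and B share two references.
cites = {
    "C": {"A", "B"},
    "D": {"A", "B"},
    "E": {"A", "B"},
    "A": {"F", "G", "H"},
    "B": {"F", "G"},
}
cocited = cocitation(cites, "A", "B")
coupled = bibliographic_coupling(cites, "A", "B")
```

Here A and B are co-cited by three papers and share two references. Higher counts on either measure suggest the two articles are related.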
81which of these pairs is more proximate
- according to cycle free effective conductance
- the probability that you reach the other node
before cycling back on yourself, while doing a
random walk.
82Proximity as cycle free effective conductance
- Measuring and Extracting Proximity in Networks by Yehuda Koren, Stephen C. North, Chris Volinsky, KDD 2006
- demo: http://public.research.att.com/volinsky/cgi-bin/prox/prox.pl
83Using network algorithms (specifically proximity)
to improve movie recommendations can pay off
Source undetermined
84final IR application machine translation
- not all pairwise translations are available
- e.g. between rare languages
- in some applications, e.g. image search, a word may have multiple meanings
- spring is an example in English
- But in other languages, the word may be unambiguous.
- automated translation could be the key
85final IR application machine translation
- if we combine all known word pairs, can we
construct additional dictionaries between rare
languages?
source: Reiter et al., Lexical Translation with Application to Image Search on the Web
86Automatic translation network structure
- Two words more likely to have same meaning if
there are multiple indirect paths of length 2
through other languages
[Figure: translation graph connecting spring (English), printemps (French), primavera (Spanish), udaherri (Basque), koanga (Maori), and words in Arabic and Russian, with edges labeled 1, 2, or 3]
87summary
- the web can be studied as a network
- this is useful for retrieving relevant content
- network concepts can be used in other IR tasks
- summarization
- query prediction
- machine translation