1
Lecture 5: Network centrality
Slides are modified from Lada Adamic
2
Measures and Metrics
  • Knowing the structure of a network, we can
    calculate various useful quantities or measures
    that capture particular features of the network
    topology.
  • Most such measures originate in social network
    analysis
  • So far:
  • Degree distribution, Average path length, Density
  • Centrality:
  • Degree, Eigenvector, Katz, PageRank, Hubs,
    Closeness, Betweenness, ...
  • Several other graph metrics:
  • Clustering coefficient, Assortativity,
    Modularity, ...

3
Characterizing networks: Who is most central?
4
network centrality
  • Which nodes are most central?
  • Definition of central varies by context/purpose
  • Local measure
  • degree
  • Relative to rest of network
  • closeness, betweenness, eigenvector (Bonacich
    power centrality), Katz, PageRank, ...
  • How evenly is centrality distributed among nodes?
  • Centralization, hubs and authorities, ...

5
centrality: who's important based on their
network position
In each of the following networks, X has higher
centrality than Y according to a particular
measure
[Figure: four example networks, labeled indegree, outdegree, betweenness, and closeness]
6
Outline
  • Degree centrality
  • Centralization
  • Betweenness centrality
  • Closeness centrality
  • Eigenvector centrality
  • Bonacich power centrality
  • Katz centrality
  • PageRank
  • Hubs and Authorities

7
degree centrality (undirected)
He who has many friends is most important.
  • When is the number of connections the best
    centrality measure?
  • people who will do favors for you
  • people you can talk to (influence set,
    information access, ...)
  • influence of an article in terms of citations
    (using in-degree)

8
degree: normalized degree centrality
divide the degree by the maximum possible, i.e. (N-1)
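The normalization above can be sketched in a few lines of Python. This is only an illustration: the function name `degree_centrality` and the five-node star graph are made up for the example and do not come from the slides.

```python
def degree_centrality(adj):
    """Map each node to (degree, degree / (N - 1)) for an undirected graph."""
    n = len(adj)
    return {v: (len(nbrs), len(nbrs) / (n - 1)) for v, nbrs in adj.items()}

# Hypothetical 5-node star: hub "h" is tied to every other node.
star = {
    "h": {"a", "b", "c", "d"},
    "a": {"h"}, "b": {"h"}, "c": {"h"}, "d": {"h"},
}
scores = degree_centrality(star)
print(scores["h"])  # hub: raw degree 4, normalized 4 / (5 - 1)
print(scores["a"])  # leaf: raw degree 1, normalized 1 / (5 - 1)
```

Dividing by N-1 makes scores comparable across networks of different sizes, since a node tied to everyone always scores 1.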
9
Prestige in directed social networks
  • when prestige may be the right word
  • admiration
  • influence
  • gift-giving
  • trust
  • directionality especially important in instances
    where ties may not be reciprocated (e.g. dining
    partners choice network)
  • when prestige may not be the right word
  • gives advice to (can reverse direction)
  • gives orders to (can reverse direction)
  • lends money to (can reverse direction)
  • dislikes
  • distrusts

10
Extensions of undirected degree centrality -
prestige
  • degree centrality
  • indegree centrality
  • a paper that is cited by many others has high
    prestige
  • a person nominated by many others for a reward
    has high prestige


11
centralization: how equal are the nodes?
How much variation is there in the centrality
scores among the nodes?
Freeman's general formula for centralization
(can use other metrics, e.g. Gini coefficient
or standard deviation):
$C_D = \frac{\sum_{i=1}^{N} \left[ C_D(n^*) - C_D(i) \right]}{(N-1)(N-2)}$
where $C_D(n^*)$ is the maximum value in the network
12
degree centralization examples
[Figure: three example networks, with $C_D = 0.167$, $C_D = 1.0$, and $C_D = 0.167$]
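A minimal sketch of degree centralization, assuming Freeman's formula in its usual form for raw degree: sum the gaps between the largest degree and every node's degree, then divide by (N-1)(N-2). The two test graphs (a 5-node star and a 5-node line) are assumptions chosen because they give 1.0 and about 0.167; the slide's own figures were not recovered.

```python
def degree_centralization(adj):
    """Freeman degree centralization for an undirected graph with N >= 3 nodes."""
    n = len(adj)
    degrees = [len(nbrs) for nbrs in adj.values()]
    top = max(degrees)  # the maximum degree in the network
    return sum(top - d for d in degrees) / ((n - 1) * (n - 2))

# Hypothetical examples: a star and a line, each on five nodes.
star = {"h": {"a", "b", "c", "d"}, "a": {"h"}, "b": {"h"}, "c": {"h"}, "d": {"h"}}
line = {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c", "e"}, "e": {"d"}}
star_score = degree_centralization(star)
line_score = degree_centralization(line)
print(star_score, round(line_score, 3))
```

The star is the most unequal arrangement possible, which is why the denominator (N-1)(N-2) scales it to exactly 1.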
13
degree centralization examples
example: financial trading networks
high centralization: one node trading with many
others
low centralization: trades are more evenly
distributed
14
when degree isn't everything
In what ways does degree fail to capture
centrality in the following graphs?
  • ability to broker between groups
  • likelihood that information originating anywhere
    in the network reaches you

16
betweenness: another centrality measure
  • intuition: how many pairs of individuals would
    have to go through you in order to reach one
    another in the minimum number of hops?
  • who has higher betweenness, X or Y?

[Figure: example network with nodes X and Y marked]
17
betweenness on toy networks
  • non-normalized version

[Figure: five vertices connected in a line: A, B, C, D, E]
  • A lies between no two other vertices
  • B lies between A and 3 other vertices: C, D, and
    E
  • C lies between 4 pairs of vertices:
    (A,D), (A,E), (B,D), (B,E)
  • note that there are no alternate paths for these
    pairs to take, so C gets full credit

18
betweenness centrality: definition
$C_B(i) = \sum_{j<k} g_{jk}(i) / g_{jk}$
where $C_B(i)$ is the betweenness of vertex i,
$g_{jk}$ is the number of geodesics connecting
j and k, and $g_{jk}(i)$ is the number of those
geodesics that actor i is on.
Usually normalized by the number of pairs of
vertices excluding the vertex itself:
$C_B'(i) = C_B(i) / [(N-1)(N-2)/2]$
For a directed graph, normalize by (N-1)(N-2).
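The definition can be turned into code fairly directly. The sketch below is one possible implementation, not the lecture's: it counts shortest paths with a breadth-first search and uses the fact that the number of j-k geodesics through i is (paths from j to i) times (paths from i to k) whenever i lies at the right distances. The helper names and the five-vertex line graph are assumptions for illustration.

```python
from collections import deque
from itertools import combinations

def bfs_counts(adj, source):
    """Return (dist, sigma): hop distance and number of shortest paths from source."""
    dist = {source: 0}
    sigma = {source: 1}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:             # first time w is reached
                dist[w] = dist[u] + 1
                sigma[w] = 0
                queue.append(w)
            if dist[w] == dist[u] + 1:    # u precedes w on a shortest path
                sigma[w] += sigma[u]
    return dist, sigma

def betweenness(adj):
    """Non-normalized betweenness for an undirected, unweighted graph."""
    info = {v: bfs_counts(adj, v) for v in adj}
    score = {v: 0.0 for v in adj}
    for j, k in combinations(adj, 2):     # each unordered pair once
        dist_j, sigma_j = info[j]
        if k not in dist_j:
            continue                      # j and k are not connected
        for i in adj:
            if i in (j, k) or i not in dist_j:
                continue
            dist_i, sigma_i = info[i]
            if k in dist_i and dist_j[i] + dist_i[k] == dist_j[k]:
                # geodesics j..k through i = (paths j..i) * (paths i..k)
                score[i] += sigma_j[i] * sigma_i[k] / sigma_j[k]
    return score

# Hypothetical line graph A - B - C - D - E.
line = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C", "E"], "E": ["D"]}
line_scores = betweenness(line)
n = len(line)
normalized = {v: s / ((n - 1) * (n - 2) / 2) for v, s in line_scores.items()}
print(line_scores)
```

This pairwise version is easy to check against the formula but slow on large graphs; production code would normally use an accumulation scheme such as Brandes' algorithm instead.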
19
betweenness on toy networks
  • non-normalized version

20
betweenness on toy networks
  • non-normalized version

broker
21
example
Nodes are sized by degree, and colored by
betweenness.
Can you spot nodes with high betweenness but
relatively low degree?
What about high degree but relatively low
betweenness?
22
betweenness on toy networks
  • non-normalized version
  • why do C and D each have betweenness 1?
  • They are both on shortest paths for pairs (A,E)
    and (B,E), and so must share credit:
  • ½ + ½ = 1
  • Can you figure out why B has betweenness 3.5
    while E has betweenness 0.5?

[Figure: toy network with vertices A, B, C, D, E]
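One way to work the question through, assuming the pictured network has the edges A-B, B-C, B-D, C-E, and D-E. That edge list is a reconstruction (the figure was not recovered), chosen because it reproduces every score quoted on the slide. Under that assumption the pair (C,D) has two geodesics, C-B-D and C-E-D, so B and E each receive half credit for it, while B lies on every geodesic leaving A:

```latex
\begin{align*}
C_B(B) &= \underbrace{1}_{(A,C)} + \underbrace{1}_{(A,D)} + \underbrace{1}_{(A,E)}
          + \underbrace{\tfrac{1}{2}}_{(C,D)} = 3.5 \\
C_B(E) &= \underbrace{\tfrac{1}{2}}_{(C,D)} = 0.5
\end{align*}
```

The pair (A,E) counts fully for B because both of its geodesics, A-B-C-E and A-B-D-E, pass through B.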
23
Alternative betweenness computations
  • Slight variations in geodesic path computations
  • inclusion of self in the computations
  • Flow betweenness
  • Based on the idea of maximum flow
  • edge-independent path selection affects the
    results
  • May not include geodesic paths
  • Random-walk betweenness
  • Based on the idea of random walks
  • Usually yields ranking similar to geodesic
    betweenness
  • Many other alternative definitions exist based on
    diffusion, transmission or flow along network
    edges

24
Extending betweenness centrality to directed
networks
  • We now consider the fraction of all directed
    paths between any two vertices that pass through
    a node

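$C_B(i) = \sum_{j \neq k} g_{jk}(i) / g_{jk}$
where $C_B(i)$ is the betweenness of vertex i,
$g_{jk}(i)$ counts the directed geodesics from j
to k that pass through i, and $g_{jk}$ counts all
directed geodesics from j to k
  • Only modification: when normalizing, we have
    (N-1)(N-2) instead of (N-1)(N-2)/2, because we
    have twice as many ordered pairs as unordered
    pairs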

25
Directed geodesics
  • A node does not necessarily lie on a geodesic
    from j to k if it lies on a geodesic from k to j

[Figure: directed network with vertices j and k marked]
27
closeness: another centrality measure
  • What if it's not so important to have many direct
    friends?
  • Or to be between others?
  • But one still wants to be in the middle of
    things,
  • not too far from the center

28
closeness centrality: definition
Closeness is based on the length of the average
shortest path between a vertex and all other
vertices in the graph
Closeness Centrality:
$C_c(i) = \left[ \sum_{j=1}^{N} d(i,j) \right]^{-1}$
depends on inverse distance to other vertices
Normalized Closeness Centrality:
$C_c'(i) = C_c(i) \cdot (N-1)$
29
closeness centrality: toy example
[Figure: toy network with vertices A, B, C, D, E]
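A small sketch of closeness on a toy graph. It assumes the usual definition (the inverse of the summed shortest-path distances, multiplied by N-1 when normalized) and assumes, for illustration only, that the toy network is the five-vertex line A - B - C - D - E. The function names are hypothetical, and the code expects a connected graph.

```python
from collections import deque

def hop_distances(adj, source):
    """Breadth-first search: hop distance from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist

def closeness(adj, normalized=True):
    """Inverse of the total distance to all other nodes (connected graphs only)."""
    n = len(adj)
    scores = {}
    for v in adj:
        total = sum(hop_distances(adj, v).values())
        raw = 1.0 / total if total > 0 else 0.0
        scores[v] = raw * (n - 1) if normalized else raw
    return scores

# Assumed toy graph: a line of five vertices.
line = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C", "E"], "E": ["D"]}
close = closeness(line)
print(close)
```

For the end vertex A the distances are 1, 2, 3, and 4, so the normalized score is 4 / 10; the middle vertex C scores 4 / 6.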
30
closeness centrality: more toy examples
31
how closely do degree and betweenness correspond
to closeness?
  • degree
  • number of connections
  • denoted by size
  • closeness
  • length of shortest path to all others
  • denoted by color

32
Closeness centrality
  • Values tend to span a rather small dynamic range
  • typical distance increases logarithmically with
    network size
  • In a typical network the closeness centrality C
    might span a factor of five or less
  • It is difficult to distinguish between central
    and less central vertices
  • a small change in network might considerably
    affect the centrality order
  • Alternative computations exist but they have
    their own problems

33
Influence range
  • The influence range of i is the set of vertices
    that are reachable from the node i

34
Extensions of undirected closeness centrality
  • closeness centrality usually implies
  • all paths should lead to you
  • paths should lead from you to everywhere else
  • usually consider only vertices from which the
    node i in question can be reached


35
Outline
  • Degree centrality
  • Centralization
  • Betweenness centrality
  • Closeness centrality
  • Eigenvector centrality
  • Bonacich power centrality
  • Katz centrality
  • PageRank
  • Hubs and Authorities
  • Applications to Information Retrieval
  • LexRank

36
Eigenvalues and eigenvectors
  • Eigenvalues and eigenvectors have their origins
    in physics, in particular in problems where
    motion is involved, although their uses extend
    from solutions to stress and strain problems to
    differential equations and quantum mechanics.
  • Eigenvectors are vectors that point in directions
    where there is no rotation. Eigenvalues are the
    change in length of the eigenvector from the
    original length.
  • The basic equation in eigenvalue problems is
    $Ax = \lambda x$

Slides from Fred K. Duennebier
37
Eigenvalues and eigenvectors
  • In words, this deceptively simple equation says
    that for the square matrix A, there is a vector x
    such that the product Ax equals a SCALAR,
    $\lambda$, multiplied by x.
  • The multiplication of vector x by a scalar
    constant is the same as stretching or shrinking
    the coordinates by a constant value.
  • The vector x is called an eigenvector and the
    scalar $\lambda$ is called an eigenvalue.

38
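$Ax = \lambda x$  (E.01)
Do all matrices have real eigenvalues? No, they
must be square and the determinant of
$A - \lambda I$ must equal zero. This is easy to show:
$(A - \lambda I)x = 0$  (E.02)
This can only be true for a nonzero x if
$\det(A - \lambda I) = |A - \lambda I| = 0$  (E.03)
Are eigenvectors unique? No, if x is an
eigenvector, then $\beta x$ is also an eigenvector
for any nonzero scalar $\beta$, and $\lambda$ is
still its eigenvalue:
$A(\beta x) = \beta Ax = \beta \lambda x = \lambda (\beta x)$  (E.04)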
39
How do you calculate eigenvectors and
eigenvalues? Expand equation (E.03),
$\det(A - \lambda I) = |A - \lambda I| = 0$, for a 2x2 matrix:
$(a_{11} - \lambda)(a_{22} - \lambda) - a_{12} a_{21} = 0$  (E.05)
For a 2-dimensional problem such as this, the
equation above is a simple quadratic equation
with two solutions for $\lambda$. In fact, there is
generally one eigenvalue for each dimension, but
some may be zero, and some complex.
40
The solution to (E.05) is
(E.06)
(E.07)
This characteristic equation does not involve
x, and the resulting values of $\lambda$ can be used to
solve for x. Consider the following example:
Eqn. (E.07) doesn't work here because
$a_{11}a_{22} - a_{12}a_{21} = 0$, so we use (E.06).
41
We see that one solution to this equation is
$\lambda = 0$, and dividing both sides of the above
equation by $\lambda$ yields $\lambda = 5$. Thus we have our
two eigenvalues, and the eigenvectors for the
first eigenvalue, $\lambda = 0$, are:
These equations are multiples of x = -2y, so the
smallest whole number values that fit are x = 2,
y = -1
42
For the other eigenvalue, $\lambda = 5$:
  • This example is rather special: $A^{-1}$ does not
    exist, the two rows of $A - \lambda I$ are dependent,
    and thus one of the eigenvalues is zero. (Zero is
    a legitimate eigenvalue!)
  • EXAMPLE: A more common case is A = [1.05 .05;
    .05 1], used in the strain exercise. Find the
    eigenvectors and eigenvalues for this A, and then
    calculate [V,D] = eig(A).
  • The procedure is:
  • Compute the determinant of $A - \lambda I$
  • Find the roots of the polynomial given by
    $|A - \lambda I| = 0$
  • Solve the system of equations $(A - \lambda I)x = 0$
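The three-step procedure can be sketched for the 2x2 case in plain Python. This is an illustrative sketch rather than the notes' own method: the function name `eig_2x2` is made up, the quadratic is written in its trace and determinant form, and only real eigenvalues are handled. It is applied to the strain-exercise matrix A = [1.05 .05; .05 1] quoted above.

```python
import math

def eig_2x2(a11, a12, a21, a22):
    """Return [(eigenvalue, eigenvector), ...] for a 2x2 matrix with real eigenvalues."""
    # Steps 1 and 2: det(A - lam I) = lam^2 - trace * lam + det = 0
    trace = a11 + a22
    det = a11 * a22 - a12 * a21
    disc = trace * trace - 4.0 * det          # discriminant of the quadratic
    if disc < 0:
        raise ValueError("complex eigenvalues are not handled in this sketch")
    root = math.sqrt(disc)
    pairs = []
    for lam in ((trace + root) / 2.0, (trace - root) / 2.0):
        # Step 3: solve (A - lam I) x = 0
        vec = (a12, lam - a11)                # solves row 1 of the system
        if abs(vec[0]) + abs(vec[1]) < 1e-12:
            vec = (lam - a22, a21)            # row 1 vanished, use row 2
        if abs(vec[0]) + abs(vec[1]) < 1e-12:
            vec = (1.0, 0.0)                  # A = lam * I: any vector works
        pairs.append((lam, vec))
    return pairs

pairs = eig_2x2(1.05, 0.05, 0.05, 1.0)
for lam, vec in pairs:
    print(lam, vec)
```

Eigenvectors are only defined up to a scale factor, so a library routine such as MATLAB's eig will typically return the same directions rescaled to unit length.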

43
What good are such things? Consider the matrix A.
What is $A^{100}$? We can get $A^{100}$ by multiplying
matrices many, many times.
Or we could find the eigenvalues of A and obtain
$A^{100}$ very quickly using eigenvalues.
44
For now, just note that there are two
eigenvectors for A.
The eigenvectors are x1 = [.6 .4] and x2 = [1
-1], and the eigenvalues are $\lambda_1 = 1$ and
$\lambda_2 = 0.5$. Note that, if we multiply x1 by A, we
get x1. If we multiply x1 by A again, we STILL
get x1. Thus x1 doesn't change as we multiply
it by $A^n$.
45
What about x2? When we multiply A by x2, we get
x2/2, and if we multiply x2 by $A^2$, we get x2/4.
This number gets very small fast. Note that
when A is squared the eigenvectors stay the same,
but the eigenvalues are squared! Back to our
original problem: we note that for $A^{100}$, the
eigenvectors will be the same, and the eigenvalues
are $\lambda_1 = 1$ and $\lambda_2 = (0.5)^{100}$, which is
effectively zero. Each eigenvector is multiplied
by its eigenvalue whenever A is applied.
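The shortcut can be checked numerically. The matrix itself did not survive in this transcript, so the sketch below rebuilds it from the stated eigenvectors and eigenvalues via A = V D V^-1, which gives [[0.8, 0.3], [0.2, 0.7]]; treat that matrix as a reconstruction. The helper names are hypothetical.

```python
def matmul(p, q):
    """Product of two 2x2 matrices stored as nested lists."""
    return [[sum(p[i][k] * q[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

V = [[0.6, 1.0], [0.4, -1.0]]       # columns are the eigenvectors x1 and x2
V_INV = [[1.0, 1.0], [0.4, -0.6]]   # inverse of V (det V = -1)

def power_from_eigen(n):
    """A^n = V diag(1^n, 0.5^n) V^-1: only the eigenvalues are raised to n."""
    d_n = [[1.0 ** n, 0.0], [0.0, 0.5 ** n]]
    return matmul(matmul(V, d_n), V_INV)

def power_by_multiplication(a, n):
    """A^n by n successive matrix products."""
    result = [[1.0, 0.0], [0.0, 1.0]]
    for _ in range(n):
        result = matmul(result, a)
    return result

A = power_from_eigen(1)             # rebuilds A itself: about [[0.8, 0.3], [0.2, 0.7]]
fast = power_from_eigen(100)
slow = power_by_multiplication(A, 100)
print(fast)
```

Because the second eigenvalue has shrunk to essentially zero, every column of $A^{100}$ ends up as a copy of x1 = [.6 .4].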
47
Eigenvector Centrality
49
Eigenvector Centrality
  • Can be calculated for directed graphs as well
  • We need to decide between incoming or outgoing
    edges
  • A has no incoming edges, hence a centrality of 0
  • B has only an incoming edge from A
  • hence its centrality is also 0
  • Only vertices that are in a strongly connected
    component of two or more vertices or the
    out-component of such a component have non-zero
    centrality
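The zero-centrality effect can be reproduced with a short power iteration. The sketch assumes the usual definition, in which a node's score is proportional to the sum of the scores of the nodes pointing to it (so incoming edges are used). The edge list is hypothetical: A and B form a chain feeding into a strongly connected group C, D, E.

```python
def eigenvector_centrality_in(edges, iterations=200):
    """Power iteration where each node collects the scores of nodes that point to it."""
    nodes = sorted({v for edge in edges for v in edge})
    x = {v: 1.0 for v in nodes}
    for _ in range(iterations):
        new = {v: 0.0 for v in nodes}
        for src, dst in edges:
            new[dst] += x[src]            # dst inherits score from src
        total = sum(new.values())
        x = {v: s / total for v, s in new.items()}   # rescale so scores sum to 1
    return x

# Hypothetical directed graph: A -> B -> C, with C, D, E strongly connected.
edges = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "C"), ("D", "E"), ("E", "C")]
centrality = eigenvector_centrality_in(edges)
print(centrality)
```

A drops to zero after one step because nothing points to it, and B follows one step later because its only source is A. The strongly connected group keeps positive scores. The group here contains cycles of length 2 and 3, which keeps the iteration from oscillating.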

50
Katz centrality
51
Katz Centrality: the parameter b
  • The magnitude of b reflects the radius of power
  • Small values of b weight local structure
  • Larger values weight global structure
  • If b > 0, ego has higher centrality when tied to
    people who are central
  • If b < 0, then ego has higher centrality when
    tied to people who are not central
  • With b = 0, you get degree centrality
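The slide's formula was not recovered, so the following sketch rests on an assumption: it uses the Bonacich-style form in which each node's score is the sum, over its neighbors j, of (1 + b * c_j). That form matches the bullets above, since b = 0 reduces to plain degree. The function name and the five-node line graph are hypothetical, and the iteration only converges when |b| is smaller than one over the largest eigenvalue of the adjacency matrix.

```python
def power_centrality(adj, b, iterations=500):
    """Iterate c_i = sum over neighbors j of (1 + b * c_j) until it settles."""
    c = {v: 0.0 for v in adj}
    for _ in range(iterations):
        c = {v: sum(1.0 + b * c[w] for w in adj[v]) for v in adj}
    return c

# Hypothetical line graph A - B - C - D - E.
line = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C", "E"], "E": ["D"]}
as_degree = power_centrality(line, 0.0)
positive = power_centrality(line, 0.25)
negative = power_centrality(line, -0.25)
print(as_degree)
print(positive["B"], positive["C"])
print(negative["B"], negative["C"])
```

On this graph the middle node C scores above its neighbors when b is positive and below them when b is negative. C's neighbors are well connected, which counts against it under a negative b, while B and D are each partly tied to a weak end node.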

52
Katz Centrality examples
b = .25
b = -.25
Why does the middle node have lower centrality
than its neighbors when b is negative?
53
PageRank: bringing order to the web
  • It's in the links:
  • links to URLs can be interpreted as endorsements
    or recommendations
  • the more links a URL receives, the more likely it
    is to be a good/entertaining/provocative/
    authoritative/interesting information source
  • but not all link sources are created equal
  • a link from a respected information source
  • a link from a page created by a spammer

an important page, e.g. slashdot
Many webpages scattered across the web
if a web page is slashdotted, it gains attention
54
PageRank
55
Ranking pages by tracking a drunk
  • A random walker following edges in a network for
    a very long time will spend a proportion of time
    at each node, which can be used as a measure
    of importance

56
Trapping a drunk
  • Problem with pure random walk metric
  • Drunk can be trapped and end up going in circles

57
Ingenuity of the PageRank algorithm
  • Allow drunk to teleport with some probability
  • e.g. random websurfer follows links for a while,
    but with some probability teleports to a random
    page
  • bookmarked page or uses a search engine to start
    anew

58
PageRank algorithm
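$PR(p_i) = \frac{1-d}{N} + d \sum_{p_j \in M(p_i)} \frac{PR(p_j)}{L(p_j)}$
  • where p1, p2, ..., pN are the pages under
    consideration,
  • M(pi) is the set of pages that link to pi,
  • L(pj) is the number of outbound links on page pj,
    and
  • N is the total number of pages.
  • d is the damping factor, i.e. the probability of
    following a link (d = 0.85 for Google); the
    walker jumps to a random page with probability
    1 - d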
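A minimal power-iteration sketch of the update described above. It assumes the common form in which each page receives (1 - d) / N from teleportation plus d times the rank of each in-linking page divided by that page's number of outbound links, with d = 0.85 as the link-following probability. The function name, the four-page example, and the even spreading of rank from pages without outbound links are all assumptions.

```python
def pagerank(links, d=0.85, iterations=100):
    """links maps every page to the list of pages it links to."""
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new = {p: (1.0 - d) / n for p in pages}      # teleportation share
        for p, outs in links.items():
            if outs:
                share = d * rank[p] / len(outs)      # split rank over outbound links
                for q in outs:
                    new[q] += share
            else:
                # Assumption: a page with no outbound links spreads its rank evenly.
                for q in pages:
                    new[q] += d * rank[p] / n
        rank = new
    return rank

# Hypothetical four-page web; every page appears as a key.
links = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = pagerank(links)
print(ranks)
```

Page d has no incoming links, so it keeps only the teleportation share (1 - d) / N, while c, which collects links from three pages, comes out on top.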

59
Exercise: PageRank
  • What happens to the relative PageRank scores of
    the nodes as you increase the teleportation
    probability?
  • Can you construct a network such that a node with
    low indegree has the highest PageRank?

GUESS PageRank demo:
http://projects.si.umich.edu/netlearn/GUESS/pagerank.html
60
example: probable location of random walker after
1 step
20% teleportation probability
61
example: location probability after 10 steps
62
Matrix-based Centrality measures
                        with constant term    without constant term
Divide by out-degree    PageRank              Degree centrality
No division             Katz centrality       Eigenvector centrality
64
Hubs and Authorities
  • In directed networks, vertices that point to
    important resources should also get a high
    centrality
  • e.g. review articles, web indexes
  • recursive definition

authorities are nodes that are linked to by good
hubs
hubs are nodes that link to good authorities
65
Hyperlink-Induced Topic Search
  • HITS algorithm
  • start with a set of pages matching a query
  • expand the set by following forward and back
    links
  • take transition matrix E, where the (i,j)th entry
    is E_ij = 1/n_i
  • where i links to j, and n_i is the number of links
    from i
  • then one can compute the authority scores a, and
    hub scores h through an iterative approach
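The iterative approach might look like the sketch below. The slide's update equations were not recovered, so this assumes the standard HITS alternation (authority scores from the hubs pointing in, hub scores from the authorities pointed to) applied to the row-normalized matrix E described above, with a rescaling after each round. The function name and the four-page example are hypothetical, and each page's link list is assumed to contain no duplicates.

```python
def hits(links, iterations=100):
    """Return (hub, authority) scores; links maps a page to the pages it links to."""
    pages = sorted(set(links) | {q for outs in links.values() for q in outs})
    # E[i][j] = 1 / n_i whenever page i links to page j
    weight = {(i, j): 1.0 / len(outs) for i, outs in links.items() for j in outs}
    hub = {p: 1.0 for p in pages}
    auth = {p: 1.0 for p in pages}
    for _ in range(iterations):
        auth = {p: 0.0 for p in pages}
        for (i, j), w in weight.items():
            auth[j] += w * hub[i]         # a = E^T h
        new_hub = {p: 0.0 for p in pages}
        for (i, j), w in weight.items():
            new_hub[i] += w * auth[j]     # h = E a
        a_total = sum(auth.values())
        h_total = sum(new_hub.values())
        auth = {p: s / a_total for p, s in auth.items()}
        hub = {p: s / h_total for p, s in new_hub.items()}
    return hub, auth

# Hypothetical example: two hub pages pointing at two authority pages.
links = {"h1": ["a1", "a2"], "h2": ["a1"]}
hub, auth = hits(links)
print(auth)
print(hub)
```

With row normalization a hub that links only to the strongest authority (h2 here) can outrank a hub that spreads its links more widely; using the raw 0/1 adjacency matrix instead would weight hubs by how many good authorities they reach.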

67
Applications to Information Retrieval
  • Can we use the notion of centrality to pick the
    best summary sentence?
  • Can we use the subgraph of query results to infer
    something about the query?
  • Can we use a graph of word translations to expand
    dictionaries? disambiguate word meanings?
  • How might one use the HITS algorithm for document
    summarization?
  • Consider a bipartite graph of sentences and words

68
Centrality in summarization
  • Extractive summarization
  • pick k sentences that are most representative of
    a collection of n sentences
  • Motivation
  • capture the most central words in a document or
    cluster
  • Centroid score (Radev et al. 2000, 2004a)
  • Alternative methods for computing centrality?

69
Sample multidocument cluster
(DUC cluster d1003t)
1 (d1s1) Iraqi Vice President Taha Yassin Ramadan announced today, Sunday, that Iraq refuses to back down from its decision to stop cooperating with disarmament inspectors before its demands are met.
2 (d2s1) Iraqi Vice President Taha Yassin Ramadan announced today, Thursday, that Iraq rejects cooperating with the United Nations except on the issue of lifting the blockade imposed upon it since the year 1990.
3 (d2s2) Ramadan told reporters in Baghdad that "Iraq cannot deal positively with whoever represents the Security Council unless there was a clear stance on the issue of lifting the blockade off of it."
4 (d2s3) Baghdad had decided late last October to completely cease cooperating with the inspectors of the United Nations Special Commission (UNSCOM), in charge of disarming Iraq's weapons, and whose work became very limited since the fifth of August, and announced it will not resume its cooperation with the Commission even if it were subjected to a military operation.
5 (d3s1) The Russian Foreign Minister, Igor Ivanov, warned today, Wednesday, against using force against Iraq, which will destroy, according to him, seven years of difficult diplomatic work and will complicate the regional situation in the area.
6 (d3s2) Ivanov contended that carrying out air strikes against Iraq, who refuses to cooperate with the United Nations inspectors, "will end the tremendous work achieved by the international group during the past seven years and will complicate the situation in the region."
7 (d3s3) Nevertheless, Ivanov stressed that Baghdad must resume working with the Special Commission in charge of disarming the Iraqi weapons of mass destruction (UNSCOM).
8 (d4s1) The Special Representative of the United Nations Secretary-General in Baghdad, Prakash Shah, announced today, Wednesday, after meeting with the Iraqi Deputy Prime Minister Tariq Aziz, that Iraq refuses to back down from its decision to cut off cooperation with the disarmament inspectors.
9 (d5s1) British Prime Minister Tony Blair said today, Sunday, that the crisis between the international community and Iraq "did not end" and that Britain is still "ready, prepared, and able to strike Iraq."
10 (d5s2) In a gathering with the press held at the Prime Minister's office, Blair contended that the crisis with Iraq "will not end until Iraq has absolutely and unconditionally respected its commitments" towards the United Nations.
11 (d5s3) A spokesman for Tony Blair had indicated that the British Prime Minister gave permission to British Air Force Tornado planes stationed in Kuwait to join the aerial bombardment against Iraq.
70
Cosine between sentences
  • Let s1 and s2 be two sentences.
  • Let x and y be their representations in an
    n-dimensional vector space
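  • The cosine between them is then computed based on
    the inner product of the two:
    $\cos(x, y) = \frac{\sum_{i=1}^{n} x_i y_i}{\sqrt{\sum_{i=1}^{n} x_i^2} \sqrt{\sum_{i=1}^{n} y_i^2}}$
  • The cosine ranges from 0 to 1 (for vectors with
    non-negative components, such as term weights).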
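As a rough illustration, the cosine can be computed from simple bag-of-words vectors. The sketch below uses raw lowercase word counts split on whitespace, which is a simplifying assumption (LexRank itself works with tf-idf weighted vectors), and the function name and sample strings are made up for the example.

```python
import math
from collections import Counter

def cosine(s1, s2):
    """Cosine between two sentences represented as raw word-count vectors."""
    x = Counter(s1.lower().split())
    y = Counter(s2.lower().split())
    dot = sum(x[w] * y[w] for w in x if w in y)          # inner product
    norm = (math.sqrt(sum(c * c for c in x.values()))
            * math.sqrt(sum(c * c for c in y.values())))
    return dot / norm if norm else 0.0

same = cosine("Iraq refuses inspectors", "iraq refuses inspectors")
partial = cosine("iraq refuses inspectors", "iraq refuses cooperation")
disjoint = cosine("iraq refuses inspectors", "blair said today")
print(same, partial, disjoint)
```

Two of the three words are shared in the second pair, so its cosine is 2 / (sqrt(3) * sqrt(3)) = 2/3; sentences with no words in common score 0.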

71
LexRank (Cosine centrality)
      1    2    3    4    5    6    7    8    9   10   11
 1 1.00 0.45 0.02 0.17 0.03 0.22 0.03 0.28 0.06 0.06 0.00
 2 0.45 1.00 0.16 0.27 0.03 0.19 0.03 0.21 0.03 0.15 0.00
 3 0.02 0.16 1.00 0.03 0.00 0.01 0.03 0.04 0.00 0.01 0.00
 4 0.17 0.27 0.03 1.00 0.01 0.16 0.28 0.17 0.00 0.09 0.01
 5 0.03 0.03 0.00 0.01 1.00 0.29 0.05 0.15 0.20 0.04 0.18
 6 0.22 0.19 0.01 0.16 0.29 1.00 0.05 0.29 0.04 0.20 0.03
 7 0.03 0.03 0.03 0.28 0.05 0.05 1.00 0.06 0.00 0.00 0.01
 8 0.28 0.21 0.04 0.17 0.15 0.29 0.06 1.00 0.25 0.20 0.17
 9 0.06 0.03 0.00 0.00 0.20 0.04 0.00 0.25 1.00 0.26 0.38
10 0.06 0.15 0.01 0.09 0.04 0.20 0.00 0.20 0.26 1.00 0.12
11 0.00 0.00 0.00 0.01 0.18 0.03 0.01 0.17 0.38 0.12 1.00
72
Lexical centrality (t = 0.3)
73
Lexical centrality (t = 0.2)
74
Lexical centrality (t = 0.1)
Sentences vote for the most central sentence
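The voting idea can be sketched as thresholded degree on the cosine matrix from the LexRank (Cosine centrality) slide. Two details are assumptions rather than something the slides specify: an edge is drawn only when the cosine is strictly greater than the threshold t, and a sentence's similarity with itself is not counted. The function name is hypothetical.

```python
# Cosine similarities between the 11 sentences, copied from the matrix slide.
COSINE = [
    [1.00, 0.45, 0.02, 0.17, 0.03, 0.22, 0.03, 0.28, 0.06, 0.06, 0.00],
    [0.45, 1.00, 0.16, 0.27, 0.03, 0.19, 0.03, 0.21, 0.03, 0.15, 0.00],
    [0.02, 0.16, 1.00, 0.03, 0.00, 0.01, 0.03, 0.04, 0.00, 0.01, 0.00],
    [0.17, 0.27, 0.03, 1.00, 0.01, 0.16, 0.28, 0.17, 0.00, 0.09, 0.01],
    [0.03, 0.03, 0.00, 0.01, 1.00, 0.29, 0.05, 0.15, 0.20, 0.04, 0.18],
    [0.22, 0.19, 0.01, 0.16, 0.29, 1.00, 0.05, 0.29, 0.04, 0.20, 0.03],
    [0.03, 0.03, 0.03, 0.28, 0.05, 0.05, 1.00, 0.06, 0.00, 0.00, 0.01],
    [0.28, 0.21, 0.04, 0.17, 0.15, 0.29, 0.06, 1.00, 0.25, 0.20, 0.17],
    [0.06, 0.03, 0.00, 0.00, 0.20, 0.04, 0.00, 0.25, 1.00, 0.26, 0.38],
    [0.06, 0.15, 0.01, 0.09, 0.04, 0.20, 0.00, 0.20, 0.26, 1.00, 0.12],
    [0.00, 0.00, 0.00, 0.01, 0.18, 0.03, 0.01, 0.17, 0.38, 0.12, 1.00],
]

def degree_by_threshold(matrix, t):
    """Sentence number -> count of other sentences with cosine strictly above t."""
    n = len(matrix)
    return {i + 1: sum(1 for j in range(n) if j != i and matrix[i][j] > t)
            for i in range(n)}

degree_03 = degree_by_threshold(COSINE, 0.3)
degree_02 = degree_by_threshold(COSINE, 0.2)
most_central = max(degree_02, key=degree_02.get)
print(degree_03)
print(degree_02, most_central)
```

At t = 0.3 only two pairs remain linked (sentences 1-2 and 9-11), so the graph says little; lowering the threshold to 0.2 lets sentence 8 collect the most votes.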
75
LexRank
$p(A) = \frac{1-d}{N} + d \sum_{i=1}^{n} \frac{p(T_i)}{c(T_i)}$
  • T1, ..., Tn are pages that link to A,
  • c(Ti) is the outdegree of page Ti, and
  • N is the total number of pages.
  • d is the damping factor: the walker follows a
    link with probability d and jumps to a far-away
    node with probability 1 - d during the random
    walk.
  • The jump accounts for disconnected components or
    periodic graphs.
  • When d = 0, we have a strict uniform
    distribution. When d = 1, the method is not
    guaranteed to converge to a unique solution.
  • Typical value for the jump probability 1 - d is
    between 0.1 and 0.2 (Brin and Page, 1998).

Günes Erkan and Dragomir R. Radev, LexRank:
Graph-based Lexical Centrality as Salience in
Text Summarization
76
lab: LexRank demo
  • how does the summary change as you
  • increase the cosine similarity threshold for an
    edge
  • (how similar do two sentences have to be?)
  • increase the salience threshold (minimum degree
    of a node)

http://tangra.si.umich.edu/demos/lexrank/
77
Content similarity distributions for web pages
(DMOZ) and scientific articles (PNAS)
Menczer, Filippo (2004) The evolution of document
networks.
78
what is that good for?
  • How could you take advantage of the fact that
    pages that are similar in content tend to link to
    one another?

79
What can networks of query results tell us about
the query?
  • If query results are highly interlinked, is this
    a narrow or broad query?
  • How could you use query connection graphs to
    predict whether a query will be reformulated?

Jure Leskovec, Susan Dumais, Web Projections:
Learning from Contextual Subgraphs of the Web
80
How can bipartite citation graphs be used to find
related articles?
  • co-citation: both A and B are cited by many other
    papers (C, D, E, ...)

[Figure: papers C, D, and E each citing both A and B]
  • bibliographic coupling: both A and B cite
    many of the same articles (F, G, H, ...)
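Both counts are easy to compute from a citation list. The sketch below is illustrative only: the function names and the small citation dictionary (C, D, E citing A and B; A and B citing F, G, H) are assumptions that mirror the letters used above.

```python
from itertools import combinations

def cocitation(cites):
    """For each pair of papers, count how many papers cite both of them."""
    counts = {}
    for refs in cites.values():
        for a, b in combinations(sorted(set(refs)), 2):
            counts[(a, b)] = counts.get((a, b), 0) + 1
    return counts

def bibliographic_coupling(cites):
    """For each pair of citing papers, count the references they share."""
    counts = {}
    for p, q in combinations(sorted(cites), 2):
        shared = len(set(cites[p]) & set(cites[q]))
        if shared:
            counts[(p, q)] = shared
    return counts

# Hypothetical citation data: paper -> list of papers it cites.
cites = {
    "C": ["A", "B"], "D": ["A", "B"], "E": ["A", "B"],
    "A": ["F", "G", "H"], "B": ["F", "G", "H"],
}
co = cocitation(cites)
coupling = bibliographic_coupling(cites)
print(co[("A", "B")], coupling[("A", "B")])
```

Co-citation depends on later papers and can keep growing after publication, whereas bibliographic coupling is fixed by the two papers' own reference lists.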

81
which of these pairs is more proximate
  • according to cycle free effective conductance
  • the probability that you reach the other node
    before cycling back on yourself, while doing a
    random walk.

82
Proximity as cycle free effective conductance
  • Measuring and Extracting Proximity in Networks by
    Yehuda Koren, Stephen C. North, Chris Volinsky,
    KDD 2006
  • demo: http://public.research.att.com/volinsky/cgi-bin/prox/prox.pl

83
Using network algorithms (specifically proximity)
to improve movie recommendations can pay off
Source undetermined
84
final IR application: machine translation
  • not all pairwise translations are available
  • e.g. between rare languages
  • in some applications, e.g. image search, a word
    may have multiple meanings
  • "spring" is an example in English
  • But in other languages, the word may be
    unambiguous.
  • automated translation could be the key

85
final IR application: machine translation
  • if we combine all known word pairs, can we
    construct additional dictionaries between rare
    languages?

source: Reiter et al., Lexical Translation with
Application to Image Search on the Web
86
Automatic translation network structure
  • Two words more likely to have same meaning if
    there are multiple indirect paths of length 2
    through other languages

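[Figure: translation graph linking printemps (French), udaherri (Basque), koanga (Maori), primavera (Spanish), spring (English), and an Arabic and a Russian word whose characters were lost in extraction; edges carry the numeric labels 1, 2, or 3]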
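The length-2 path heuristic can be sketched by counting shared neighbors in a translation graph. Everything in the example is hypothetical: the function names, the node labels, and above all the translation pairs, which are invented for illustration and are not read off the slide's figure.

```python
def build_graph(pairs):
    """Undirected translation graph: word -> set of words it is paired with."""
    graph = {}
    for a, b in pairs:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph

def two_step_paths(graph, a, b):
    """Number of distinct intermediate words joining a and b by a path of length 2."""
    return len((graph.get(a, set()) & graph.get(b, set())) - {a, b})

# Hypothetical dictionary entries (not taken from the slide's figure).
pairs = [
    ("spring (English)", "printemps (French)"),
    ("spring (English)", "primavera (Spanish)"),
    ("udaherri (Basque)", "printemps (French)"),
    ("udaherri (Basque)", "primavera (Spanish)"),
    ("koanga (Maori)", "spring (English)"),
]
graph = build_graph(pairs)
support = two_step_paths(graph, "spring (English)", "udaherri (Basque)")
weak = two_step_paths(graph, "koanga (Maori)", "udaherri (Basque)")
print(support, weak)
```

Two independent pivots make the English-Basque pairing more believable than a pair with no shared intermediate word; an ambiguous pivot such as "spring" is the reason a single path is weak evidence.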

87
summary
  • the web can be studied as a network
  • this is useful for retrieving relevant content
  • network concepts can be used in other IR tasks
  • summarization
  • query prediction
  • machine translation