Title: Fast Counting of triangles in large networks: Algorithms and laws
Charalampos (Babis) Tsourakakis, School of Computer Science, Carnegie Mellon University, http://www.cs.cmu.edu/ctsourak
- RPI Theory Seminar, 24 November 2008
2. Counting Triangles
- Given an undirected, simple graph G(V,E), a triangle is a set of 3 vertices such that any two of them are connected by an edge of the graph.
- Related problems:
- a) Decide whether a graph is triangle-free.
- b) Count the total number of triangles Δ(G).
- c) Count the number of triangles Δ(v) that each vertex v participates in.
- d) List the triangles that each vertex v participates in.
Our focus: problems (b) and (c).
3. Why is triangle counting important?
- Social Network Analysis: "friends of friends are friends" [WF94]
- Web Spam Detection [BPCG08]
- Hidden Thematic Structure of the Web [EM02]
- Motif Detection, e.g., in biological networks [YPSB05]
- These are a few indicative reasons, from the graph mining perspective.
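4. Why is triangle counting important?
- Furthermore, two frequently used metrics are:
- Clustering Coefficient: $C = \frac{1}{n}\sum_{v \in V} C(v)$, where $C(v) = \frac{\Delta(v)}{t(v)}$ and $t(v) = \binom{d_v}{2}$ is the number of triples at node v (paths of length 2 centered at v, with $d_v$ the degree of v)
- Transitivity Ratio: $T = \frac{3\Delta(G)}{\sum_{v \in V} t(v)}$, where the denominator is the total number of connected triples in G
[Figure: a triple at node v and a triangle]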
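Both metrics can be computed from per-vertex triangle counts Δ(v) and triple counts t(v). The plain-Python sketch below assumes the graph is stored as a dictionary mapping each vertex to the set of its neighbours (a representation chosen for illustration, not one from the talk), and assumes vertices with fewer than two neighbours contribute C(v) = 0, which is only one of several conventions in use.

```python
from itertools import combinations


def triangles_per_vertex(adj):
    """Δ(v): for each vertex, count pairs of neighbours that are themselves adjacent."""
    return {v: sum(1 for a, b in combinations(adj[v], 2) if b in adj[a])
            for v in adj}


def triples_per_vertex(adj):
    """t(v) = d_v (d_v - 1) / 2: paths of length 2 centred at v."""
    return {v: len(adj[v]) * (len(adj[v]) - 1) // 2 for v in adj}


def clustering_coefficient(adj):
    """C = (1/n) Σ_v Δ(v) / t(v); vertices with t(v) = 0 contribute 0 (assumed convention)."""
    tri, trip = triangles_per_vertex(adj), triples_per_vertex(adj)
    return sum(tri[v] / trip[v] if trip[v] else 0.0 for v in adj) / len(adj)


def transitivity_ratio(adj):
    """T = 3 Δ(G) / Σ_v t(v), using Σ_v Δ(v) = 3 Δ(G)."""
    tri, trip = triangles_per_vertex(adj), triples_per_vertex(adj)
    return sum(tri.values()) / sum(trip.values())


# Toy graph: triangle 0-1-2 plus a pendant vertex 3 attached to 0.
adj = {0: {1, 2, 3}, 1: {0, 2}, 2: {0, 1}, 3: {0}}
print(clustering_coefficient(adj), transitivity_ratio(adj))
```

For this toy graph C = 7/12 while T = 3/5, a reminder that the two metrics weight vertices differently.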
5. Outline
- Related Work
- Proposed Method
- Experiments
- Triangle-related Laws
- Triangles in Kronecker Graphs
- Future Work and Open Problems
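6. Counting methods
- Fast: time complexity $O(n^{2.37})$, space complexity $O(n^2)$
- Low space: time complexity $O(n^3)$, space complexity $O(m)$
Sparse graphs:
- Fast: time complexity $O(m^{0.7}n^{1.2} + n^{2+o(1)})$, space complexity $\Theta(n^2)$ (eventually)
- Low space: time complexity e.g. O( n ), space complexity $\Theta(m)$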
8. Outline of the Proposed Method
- EigenTriangle theorem
- EigenTriangleLocal theorem
- EigenTriangle algorithm
- EigenTriangleLocal algorithm
- Efficiency and complexity
- Power law degree distributions
- Gershgorin discs
- Real world network spectra
9. Theorem (EigenTriangle)
- Theorem: The number of triangles Δ(G) in an undirected, simple graph G(V,E) is given by
$\Delta(G) = \frac{1}{6}\sum_{i=1}^{n}\lambda_i^3$,
where $\lambda_1, \dots, \lambda_n$ are the eigenvalues of the adjacency matrix of graph G.
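The identity can be checked numerically on a small graph. This NumPy sketch compares (1/6) Σ λ_i³ with a brute-force count over all vertex triples; the random test graph (30 vertices, edge probability 0.3, fixed seed) is purely hypothetical.

```python
from itertools import combinations

import numpy as np


def brute_force_triangles(A):
    """Check every vertex triple; O(n^3), fine only for toy graphs."""
    n = A.shape[0]
    return sum(1 for i, j, k in combinations(range(n), 3)
               if A[i, j] and A[j, k] and A[i, k])


def eigen_triangle_exact(A):
    """Δ(G) = (1/6) Σ_i λ_i^3 over all eigenvalues of the symmetric adjacency matrix."""
    lam = np.linalg.eigvalsh(A)
    return float(np.sum(lam ** 3) / 6.0)


# Hypothetical random test graph: each edge present with probability 0.3.
rng = np.random.default_rng(0)
n = 30
upper = np.triu(rng.random((n, n)) < 0.3, k=1)
A = (upper | upper.T).astype(float)

print(brute_force_triangles(A), round(eigen_triangle_exact(A)))
```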
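10. Proof
- Call A the adjacency matrix of the graph and consider the i-th diagonal element of $A^3$, $(A^3)_{ii}$. It counts the closed walks of length 3 that start and end at vertex i, so it equals twice the number of triangles vertex i participates in: each triangle {i, j, k} is traversed both as i-j-k-i and as i-k-j-i. Hence $\mathrm{tr}(A^3) = 6\Delta(G)$, because each triangle is counted 6 times (once per participating vertex, in each of the two orientations i-j-k and i-k-j).
- Furthermore, if $Ax = \lambda x$, then $\lambda^3$ is an eigenvalue of $A^3$, since $A^3x = AA(Ax) = \lambda AAx = \lambda^2 Ax = \lambda^3 x$. Because A is real and symmetric, it is diagonalizable, so the eigenvalues of $A^3$ are exactly $\lambda_1^3, \dots, \lambda_n^3$; vice versa, if $\mu$ is an eigenvalue of $A^3$, then its real cube root $\mu^{1/3}$ is an eigenvalue of A.
- Since the trace equals the sum of the eigenvalues, $6\Delta(G) = \mathrm{tr}(A^3) = \sum_{i=1}^{n}\lambda_i^3$.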
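The first step of the proof, that $(A^3)_{ii}$ counts each triangle at vertex i twice, can be seen on a toy graph with a plain-Python matrix product. The example graph (K4 minus the edge {2, 3}, which has the triangles {0,1,2} and {0,1,3}) is chosen only for illustration.

```python
def matmul(X, Y):
    """Plain-Python product of two square matrices given as lists of lists."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def closed_3_walks(A):
    """(A^3)_ii: closed walks of length 3 from vertex i, i.e. 2 Δ(i)."""
    A3 = matmul(matmul(A, A), A)
    return [A3[i][i] for i in range(len(A))]


# Toy graph: K4 minus the edge {2, 3}.
A = [[0, 1, 1, 1],
     [1, 0, 1, 1],
     [1, 1, 0, 0],
     [1, 1, 0, 0]]
diag = closed_3_walks(A)
per_vertex = [d // 2 for d in diag]   # Δ(i) = (A^3)_ii / 2
total = sum(diag) // 6                # Δ(G) = tr(A^3) / 6
print(diag, per_vertex, total)
```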
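11. Theorem (EigenTriangleLocal)
- Theorem: The number of triangles Δ(i) that vertex i participates in is equal to
$\Delta(i) = \frac{1}{2}\sum_{j=1}^{n}\lambda_j^3 u_{ij}^2$,
where $u_{ij}$ is the i-th entry of the j-th (unit) eigenvector $u_j$, the eigenvector of $\lambda_j$.
- Proof sketch: follows from the previous theorem and the fact that A is symmetric, therefore diagonalizable as $A = U\Lambda U^T$ with orthonormal eigenvectors, and also $A^3 = U\Lambda^3 U^T$, so that $2\Delta(i) = (A^3)_{ii} = \sum_{j}\lambda_j^3 u_{ij}^2$.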
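A NumPy sketch of the local formula, assuming a dense `numpy.linalg.eigh` decomposition for illustration. The optional `k` keeps only the k eigenpairs of largest |λ|, mirroring the low-rank idea of the talk; k = 5 below is an arbitrary, hypothetical choice, and with all eigenpairs the result matches $(A^3)_{ii}/2$ exactly.

```python
import numpy as np


def eigen_triangle_local(A, k=None):
    """Δ(i) ≈ (1/2) Σ_{j in top k} λ_j^3 u_ij^2, keeping the k eigenpairs of largest |λ|.

    With k = None (all eigenpairs) this is exactly the EigenTriangleLocal identity."""
    lam, U = np.linalg.eigh(A)          # columns of U are orthonormal eigenvectors
    order = np.argsort(-np.abs(lam))    # largest magnitude first
    if k is not None:
        order = order[:k]
    return 0.5 * (U[:, order] ** 2) @ (lam[order] ** 3)


# Hypothetical random test graph.
rng = np.random.default_rng(1)
n = 40
upper = np.triu(rng.random((n, n)) < 0.2, k=1)
A = (upper | upper.T).astype(float)

exact = np.diag(A @ A @ A) / 2          # Δ(i) from the diagonal of A^3
full = eigen_triangle_local(A)          # all eigenpairs
approx = eigen_triangle_local(A, k=5)   # rank-5 estimate
```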
12. EigenTriangle Algorithm
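The algorithm appears only as a figure on this slide. The sketch below follows the idea described in the talk (use the few eigenvalues of largest magnitude and stop early), but the exact stopping rule, adding λ_i until $|\lambda_i^3| \le \mathrm{tol}\cdot|\sum_{j\le i}\lambda_j^3|$, and the default `tol` are assumptions. A dense `numpy.linalg.eigvalsh` call stands in for the Lanczos method, so this shows the logic, not the speed.

```python
import numpy as np


def eigen_triangle(A, tol=0.05):
    """Estimate Δ(G) ≈ (1/6) Σ_{j<=i} λ_j^3 over the i eigenvalues of largest |λ|.

    Assumed stopping rule: stop after adding λ_i once |λ_i^3| <= tol * |Σ_{j<=i} λ_j^3|.
    A full dense eigendecomposition stands in for Lanczos."""
    lam = np.linalg.eigvalsh(A)
    lam = lam[np.argsort(-np.abs(lam))]   # largest |λ| first
    total = lam[0] ** 3
    rank = 1
    while rank < len(lam):
        cube = lam[rank] ** 3
        total += cube
        rank += 1
        if abs(cube) <= tol * abs(total):
            break
    return total / 6.0, rank


# Hypothetical random test graph.
rng = np.random.default_rng(2)
n = 50
upper = np.triu(rng.random((n, n)) < 0.2, k=1)
A = (upper | upper.T).astype(float)

estimate, rank = eigen_triangle(A)
exact = np.trace(A @ A @ A) / 6
print(estimate, rank, exact)
```

A uniform random graph like this one has no power-law head in its spectrum, so the early-stopped estimate may be noticeably off; the approach is meant for skewed real-world spectra, as the following slides argue.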
13. EigenTriangleLocal Algorithm
Why are these two algorithms efficient?
14. Skewed Degree Distributions
- Skewed degree distributions are ubiquitous in nature! They have been termed the "signature of human activity" [FKP02], but they also appear in all other kinds of networks, e.g., biological ones. See [N05], [M04] for generative models of power-law distributions.
- They are typically referred to as power laws (even if we sometimes abuse the strict definition of a power law, i.e., $p(x) \propto x^{-\alpha}$).
15. Examples of power laws
- Newman [N05] demonstrated how often power laws appear, using many different types of data, ranging from word frequencies to the populations of cities.
[Figure: distribution of city populations. Many cities have a small population; few cities have a huge population.]
16. Gershgorin's Discs
- Theorem: Let B be an arbitrary n×n matrix. Then the eigenvalues λ of B are located in the union of the n discs
$|\lambda - b_{ii}| \le \sum_{j \ne i}|b_{ij}|, \quad i = 1, \dots, n.$
- For a proof see Demmel [D97], p. 82.
17. Gershgorin Discs
- Bounds on the airports network (observe how loose they are).
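For an adjacency matrix every diagonal entry is 0 and the off-diagonal row sum is the degree $d_i$, so all discs are centred at 0 and the theorem only gives $|\lambda| \le d_{\max}$. The sketch below computes the discs for a star graph $K_{1,3}$, a hypothetical example chosen because its spectrum is known ($\pm\sqrt{3}$ and 0, 0): the bound is 3 while the largest eigenvalue is about 1.73.

```python
import math


def gershgorin_discs(B):
    """(centre, radius) per row: centre b_ii, radius Σ_{j≠i} |b_ij|."""
    return [(row[i], sum(abs(x) for j, x in enumerate(row) if j != i))
            for i, row in enumerate(B)]


# Star K_{1,3}: vertex 0 joined to vertices 1, 2, 3.
star = [[0, 1, 1, 1],
        [1, 0, 0, 0],
        [1, 0, 0, 0],
        [1, 0, 0, 0]]
discs = gershgorin_discs(star)
bound = max(abs(c) + r for c, r in discs)   # every eigenvalue satisfies |λ| <= bound
true_lambda_max = math.sqrt(3)              # known spectrum of K_{1,3}: ±√3, 0, 0
print(discs, bound, true_lambda_max)
```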
18. Typical real-world spectra
[Figure: spectra of the Airports and Political blogs networks]
19. Top Eigenvalues
- Zooming in on the top eigenvalues and plotting rank vs. eigenvalue in log-log scale reveals that the top eigenvalues follow a power law [FFF99].
- Some years later, Mihail and Papadimitriou [MP02] and Chung, Lu and Vu [CLV03] proved this fact for random graph models with power-law degree distributions.
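One simple way to quantify "follows a power law" on such a plot is a least-squares line through the points (log rank, log eigenvalue). The sketch below is a plain-Python illustration using synthetic values $\lambda_r = 100\, r^{-0.5}$, not data from the talk; real eigenvalue data would be noisier, and more careful power-law fitting methods exist.

```python
import math


def loglog_slope(top_eigenvalues):
    """Least-squares slope of log(λ_r) against log(r), r = 1, 2, ...

    top_eigenvalues must be positive and sorted in decreasing order."""
    xs = [math.log(r) for r in range(1, len(top_eigenvalues) + 1)]
    ys = [math.log(v) for v in top_eigenvalues]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den


# Synthetic values (not real data): λ_r = 100 * r^(-0.5) for the top 20 ranks.
synthetic = [100 * r ** -0.5 for r in range(1, 21)]
slope = loglog_slope(synthetic)
print(slope)
```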
20. Our idea
- Simple and clear: use a low-rank approximation of $A^3$ to estimate its diagonal elements and its trace.
- It also suggests a way of thinking: take advantage of special properties (e.g., power laws) to reduce the complexity of certain computational tasks in real-world networks.
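21. Summing up: why does it work?
- The first main reason is the almost-symmetry of the spectrum around 0 for the bulk of the eigenvalues, i.e., all except the top ones.
- Cubes strongly amplify this phenomenon: the cubes of the bulk eigenvalues nearly cancel in pairs, while the few large top eigenvalues dominate the sum $\sum_i \lambda_i^3$.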
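A small worked example may make the cancellation concrete. The spectrum below is hypothetical, chosen only so that it sums to 0 (as any adjacency spectrum must, since $\mathrm{tr}(A) = 0$) and so that its cubes sum to a multiple of 6; it is not taken from any network in the talk.

```latex
\begin{aligned}
\lambda &= (6,\ 2,\ -2,\ -1,\ -1,\ -1,\ -1,\ -1,\ -1), \qquad \textstyle\sum_i \lambda_i = 0 \\
\textstyle\sum_i \lambda_i^3 &= 216 + 8 - 8 - 6 = 210
  \;\Rightarrow\; \Delta(G) = 210/6 = 35 \\
\text{rank-1 estimate: } \lambda_1^3/6 &= 216/6 = 36
  \quad (\text{relative error } 1/35 \approx 2.9\%)
\end{aligned}
```

The pair ±2 cancels exactly in the cube sum, and the six bulk eigenvalues −1 contribute only −6 against 216 from the top one.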
22. Complexity Analysis
- The main computational bottleneck, which determines the complexity, is the Lanczos method.
- Lanczos runs in linear time with respect to the number of non-zero entries of the matrix, i.e., the edges, assuming that we compute a small constant number of eigenvalues.
- Convergence of Lanczos is fast due to the eigenvalue power law (see Kaniel-Paige theory [GL89]).
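A bare-bones Lanczos iteration, to show where the per-step cost comes from: each step needs one matrix-vector product `A @ q`, which costs O(m) when A is stored sparsely. This is an illustrative sketch, not the implementation used in the experiments; it assumes a dense NumPy matrix for simplicity and adds full reorthogonalization for numerical stability, which makes each step more expensive than the textbook three-term recurrence.

```python
import numpy as np


def lanczos_top_eigenvalues(A, steps, k=1, seed=0):
    """Approximate the k eigenvalues of largest magnitude of a symmetric matrix A.

    Runs `steps` Lanczos iterations and returns the matching Ritz values.
    Each iteration needs one product A @ q, i.e. O(nnz(A)) work for a sparse A;
    the full reorthogonalisation loop (added for numerical safety) costs extra."""
    n = A.shape[0]
    rng = np.random.default_rng(seed)
    q = rng.standard_normal(n)
    Q = [q / np.linalg.norm(q)]
    alphas, betas = [], []
    for j in range(steps):
        w = A @ Q[j]
        alpha = Q[j] @ w
        alphas.append(alpha)
        w = w - alpha * Q[j]
        if j > 0:
            w = w - betas[-1] * Q[j - 1]
        for qi in Q:                      # full reorthogonalisation
            w = w - (qi @ w) * qi
        beta = np.linalg.norm(w)
        if j == steps - 1 or beta < 1e-10:
            break                         # done, or Krylov space exhausted
        betas.append(beta)
        Q.append(w / beta)
    # Tridiagonal T: alphas on the diagonal, betas on both off-diagonals.
    T = np.diag(alphas) + np.diag(betas, 1) + np.diag(betas, -1)
    ritz = np.linalg.eigvalsh(T)
    return ritz[np.argsort(-np.abs(ritz))][:k]


# Hypothetical random test graph.
rng = np.random.default_rng(3)
n = 60
upper = np.triu(rng.random((n, n)) < 0.15, k=1)
A = (upper | upper.T).astype(float)
approx = lanczos_top_eigenvalues(A, steps=20, k=1)
```

With as many steps as vertices the Ritz values coincide with the eigenvalues (up to rounding); the point of the method is that far fewer steps usually suffice for the few top eigenvalues.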
24. Datasets
25. Competitor: Node Iterator
- The Node Iterator algorithm considers one node at a time, looks at its neighbors, and checks how many pairs among them are connected.
- Complexity: O(n )
- We report the results as the speedup that the EigenTriangle algorithm gives compared to the running time of the Node Iterator.
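A plain-Python sketch of the Node Iterator as described on this slide, assuming the graph is given as a dictionary from each vertex to the set of its neighbours. It performs $\sum_v \binom{d_v}{2}$ neighbour-pair checks, which is why it slows down on graphs with high-degree hubs.

```python
from itertools import combinations


def node_iterator(adj):
    """Count triangles by checking, for every node, each pair of its neighbours.

    adj maps a vertex to the set of its neighbours. Every triangle is found once
    from each of its three corners, hence the final division by 3."""
    found = 0
    for v, nbrs in adj.items():
        for a, b in combinations(nbrs, 2):
            if b in adj[a]:
                found += 1
    return found // 3


def complete_graph(n):
    """K_n as an adjacency dictionary (helper for the examples)."""
    return {v: {u for u in range(n) if u != v} for v in range(n)}


print(node_iterator(complete_graph(4)))   # K4 has C(4, 3) triangles
```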
26. Results: Eigenvalues vs. Speedup
27. Results: Edges vs. Speedup
28. Main points
- Some interesting facts about the two scatterplots:
- The mean approximation rank required for at least 95% accuracy is 6.2.
- Speedups are between 33.7x and 1159x.
- The mean speedup is 250x.
- Notice that the speedup increases as the size of the network grows.
29. Zooming in
[Figure: zooming in on this point]
30. Evaluating the Local Counting Method
- Pearson's correlation coefficient ρ
- Relative Reconstruction Error (RRE)
Political Blogs: RRE = $7 \times 10^{-4}$, ρ = 99.97%
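Pearson's ρ between exact and estimated per-vertex counts can be computed as below (plain Python). The per-vertex numbers are hypothetical, invented only to exercise the function; they are not results from the talk. The slide does not define RRE, so it is not sketched here.

```python
import math


def pearson(x, y):
    """Pearson's correlation coefficient ρ between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


# Hypothetical per-vertex counts: exact Δ(i) vs. a low-rank estimate Δ'(i).
exact = [2, 2, 1, 1, 0]
estimated = [2.1, 1.9, 1.0, 0.9, 0.05]
rho = pearson(exact, estimated)
print(rho)
```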
31. Eigenvalues vs. ρ for three networks
Observe how a low rank already gives almost optimal results. This holds for surprisingly many real-world networks.
33. Triangle Participation Law
- Plots the number of triangles Δ (x-axis) vs. the count of vertices that participate in Δ triangles.
[Figure panels: (a) EPINIONS, who-trusts-whom network; (b) ASN, social network; (c) HEP_TH, collaboration network]
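The quantity plotted in this law is a histogram of per-vertex triangle counts. A plain-Python sketch, assuming a dictionary-of-neighbour-sets graph representation and a toy graph chosen for illustration; in practice the resulting pairs (Δ, count) would be plotted on log-log axes.

```python
from collections import Counter
from itertools import combinations


def triangles_per_vertex(adj):
    """Δ(v) for every vertex of an undirected simple graph {vertex: set(neighbours)}."""
    return {v: sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
            for v, nbrs in adj.items()}


def triangle_participation(adj):
    """Map each triangle count Δ to the number of vertices in exactly Δ triangles."""
    return dict(sorted(Counter(triangles_per_vertex(adj).values()).items()))


# Toy graph: K4 minus the edge {2, 3}.
adj = {0: {1, 2, 3}, 1: {0, 2, 3}, 2: {0, 1}, 3: {0, 1}}
print(triangle_participation(adj))
```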
34. Degree-Triangle Law
- Plots the degree $d_i$ (x-axis) vs. the mean number of triangles that nodes with degree $d_i$ participate in.
[Figure panels: Epinions, ASN]
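The points of this plot are averages of Δ(v) grouped by degree. A plain-Python sketch under the same assumed dictionary-of-neighbour-sets representation, with a toy graph chosen for illustration.

```python
from collections import defaultdict
from itertools import combinations


def degree_triangle(adj):
    """Map each degree d to the mean Δ(v) over vertices v of degree d."""
    total = defaultdict(int)
    count = defaultdict(int)
    for v, nbrs in adj.items():
        tri = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
        total[len(nbrs)] += tri
        count[len(nbrs)] += 1
    return {d: total[d] / count[d] for d in sorted(count)}


# Toy graph: triangle 0-1-2 plus a pendant vertex 3 attached to 0.
adj = {0: {1, 2, 3}, 1: {0, 2}, 2: {0, 1}, 3: {0}}
print(degree_triangle(adj))
```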
36. Kronecker Graphs
- This model was introduced in [LCKF05]. It is based on the simple operation of the Kronecker product and generates graphs that mimic real-world networks.
- Deterministic Kronecker Graphs: Kronecker product of the adjacency matrix at the current step k with the (typically small) initiator adjacency matrix.
- Stochastic Kronecker Graphs: Kronecker product of the matrix at the current step k with the initiator matrix, whose entries are probabilities. For more details see [LF07].
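Both variants can be sketched with `numpy.kron`. In this illustration the convention $A^{[0]} = A$, $A^{[k]} = A^{[k-1]} \otimes A$ is an assumption made for consistency with the theorem below, and the stochastic version samples only the strict upper triangle and mirrors it (symmetric, no self-loops), a simplification that may differ from [LF07]. The 2×2 probability initiator is hypothetical.

```python
import numpy as np


def kronecker_power(A, k):
    """A^[k] with the convention A^[0] = A and A^[k] = A^[k-1] ⊗ A."""
    B = np.array(A, dtype=float)
    for _ in range(k):
        B = np.kron(B, A)
    return B


def stochastic_kronecker(P, k, seed=0):
    """Sample one undirected graph whose edge probabilities are the entries of P^[k].

    Only the strict upper triangle is sampled and then mirrored, so the result is
    symmetric with no self-loops (a simplification chosen for this sketch)."""
    probs = kronecker_power(P, k)
    rng = np.random.default_rng(seed)
    upper = np.triu(rng.random(probs.shape) < probs, k=1)
    return (upper | upper.T).astype(int)


# Deterministic: initiator = triangle K3.
K3 = np.array([[0, 1, 1], [1, 0, 1], [1, 1, 0]])
B = kronecker_power(K3, 1)        # 9 x 9 adjacency matrix

# Stochastic: hypothetical 2 x 2 initiator of probabilities.
P = np.array([[0.9, 0.5], [0.5, 0.1]])
G = stochastic_kronecker(P, 2)    # 8 x 8 adjacency matrix
```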
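37. Triangles in Kronecker Graphs
- Some notation first: A is the n×n initiator adjacency matrix of the undirected, simple graph $G_A$.
- $B = A^{[k]}$ is the k-th Kronecker product, with $A^{[0]} = A$ and $A^{[k]} = A^{[k-1]} \otimes A$.
- $\lambda = (\lambda_1, \dots, \lambda_n)$ are the eigenvalues of A.
- $\Delta(G_A)$, $\Delta(G_B)$ are the numbers of triangles of $G_A$, $G_B$.
- Theorem (KroneckerTRC): $\Delta(G_B) = 6^k\,\Delta(G_A)^{k+1}$.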
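38. Proof
- We use induction on the number of recursion steps k. For k = 0 the theorem trivially holds, since B = A.
- Assume now that KroneckerTRC holds for some $k = r \ge 0$. Call $C = A^{[r]}$, $D = A^{[r+1]} = C \otimes A$, and let $\mu_i$, $i = 1, \dots, s$, be the eigenvalues of C. By the assumption and the EigenTriangle theorem, $\frac{1}{6}\sum_{i=1}^{s}\mu_i^3 = \Delta(G_C) = 6^r\,\Delta(G_A)^{r+1}$.
- The eigenvalues of D are given by the Kronecker product $\mu \otimes \lambda$, i.e., all products $\mu_i\lambda_j$. By the EigenTriangle theorem, the number of triangles in D is given by
39. Proof
$\Delta(G_D) = \frac{1}{6}\sum_{i=1}^{s}\sum_{j=1}^{n}(\mu_i\lambda_j)^3 = \frac{1}{6}\Big(\sum_{i}\mu_i^3\Big)\Big(\sum_{j}\lambda_j^3\Big) = \frac{1}{6}\cdot 6^{r+1}\Delta(G_A)^{r+1}\cdot 6\,\Delta(G_A) = 6^{r+1}\,\Delta(G_A)^{r+2}$.
Therefore KroneckerTRC holds for all $k \ge 0$. Q.E.D.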
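The theorem can be checked numerically for small k. This NumPy sketch counts triangles exactly as $\mathrm{tr}(B^3)/6$ and compares with $6^k\,\Delta(G_A)^{k+1}$, assuming the convention $A^{[0]} = A$ used in the proof; the initiator (K4 minus one edge, with 2 triangles) is a hypothetical choice.

```python
import numpy as np


def triangles(A):
    """Exact Δ(G) = tr(A^3) / 6."""
    return int(round(np.trace(A @ A @ A) / 6))


def kronecker_power(A, k):
    """A^[0] = A, A^[k] = A^[k-1] ⊗ A (same convention as in the theorem)."""
    B = np.array(A, dtype=float)
    for _ in range(k):
        B = np.kron(B, A)
    return B


# Initiator: K4 minus the edge {2, 3}, which has Δ(G_A) = 2 triangles.
A = np.array([[0, 1, 1, 1],
              [1, 0, 1, 1],
              [1, 1, 0, 0],
              [1, 1, 0, 0]], dtype=float)
for k in range(3):
    B = kronecker_power(A, k)
    print(k, B.shape[0], triangles(B), 6 ** k * triangles(A) ** (k + 1))
```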
41. Theoretical Challenge I: Spectra of real-world networks
- Can we prove things about the distribution of the eigenvalues, adopting a random graph model such as the expected degree model G(w) [CLV03]?
- An analog to Wigner's semicircle law for random Erdős-Rényi graphs (see Füredi-Komlós [FK81]).
[Figure: spectrum over 100,000 iterations; source: S07]
42. Theoretical Challenge I: Spectra of real-world networks
Empirically, the rest of the spectrum has a triangular-like distribution [FDBV01].
Can we prove something about this empirical observation?
43. Theoretical Challenge II: Eigenvectors of real-world networks
- Things are even worse than in the case of spectra: very little is known about the eigenvectors. Related work: see [P08] for random graphs.
44. Theoretical Challenge III: Degree-Triangle Law
- Prove the pattern we saw using the expected degree random graph model G(w) (see [S04]).
- Conjecture: the relationship we observed probably appears for some values of the slope of the degree distribution. Further experiments recently showed that for some graphs this pattern does not hold.
45. Experimental Challenge I: Compare with Streaming Methods
- Streaming or semi-streaming methods perform one or O(1) passes over the graph [YKS02], [BFLSS06], [BPCG08]. Common underlying idea: sophisticated sampling methods.
- Implement and compare.
46. Practical Challenge I: Triangles in Large-Scale Graph Mining
- Many gigabyte- and petabyte-sized graphs.
- How to handle these graphs?
- Hadoop
- EigenTriangle algorithms are based just on simple matrix-vector multiplications.
- These are easy to parallelize on all sorts of architectures (distributed memory, shared memory). See [DHV93] for the details.
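Why matrix-vector products parallelize well can be seen in a map/reduce-shaped sketch: split the edge list across workers, let each compute a partial $y = Ax$, then sum the partial vectors. This is a sequential plain-Python stand-in, not Hadoop's API; the function names and the round-robin edge split are hypothetical choices for illustration.

```python
from collections import defaultdict


def partial_matvec(edge_chunk, x):
    """One worker's share of y = A x, with A given implicitly by undirected edges."""
    y = defaultdict(float)
    for i, j in edge_chunk:
        y[i] += x[j]   # a_ij = 1 contributes x_j to y_i
        y[j] += x[i]   # symmetric entry a_ji = 1
    return y


def parallel_matvec(edges, x, workers=3):
    """Split the edges across workers (run sequentially here), then sum the partial results."""
    chunks = [edges[w::workers] for w in range(workers)]
    partials = [partial_matvec(chunk, x) for chunk in chunks]   # "map" step
    y = [0.0] * len(x)
    for part in partials:                                       # "reduce" step
        for i, value in part.items():
            y[i] += value
    return y


# Toy graph: K4 minus the edge {2, 3}.
edges = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3)]
y = parallel_matvec(edges, [1.0, 2.0, 3.0, 4.0])
print(y)
```

Because each partial result depends only on its own chunk of edges, the chunks could be processed on separate machines; only the final summation needs communication.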
47. PEGASUS: Peta-Graph Mining from the Triangle perspective
- On-going work with U Kang and Christos Faloutsos, in collaboration with Yahoo! Research.
- Among others: implement the EigenTriangle algorithms in Hadoop and compare them to other methods.
- Find outliers with respect to triangles in graphs with many billions of edges.
Soon. Stay tuned!
48. Curious about
49. Acknowledgements
- Christos Faloutsos
- Yiannis Koutis
For the helpful discussions
50. Acknowledgements
For the PEGASUS logo
52. References
- [WF94] Wasserman, Faust. Social Network Analysis: Methods and Applications (Structural Analysis in the Social Sciences).
- [EM02] Eckmann, Moses. Curvature of co-links uncovers hidden thematic layers in the World Wide Web.
- [BPCG08] Becchetti, Boldi, Castillo, Gionis. Efficient Semi-Streaming Algorithms for Local Triangle Counting in Massive Graphs.
- [FKP02] Fabrikant, Koutsoupias, Papadimitriou. Heuristically Optimized Trade-offs: A New Paradigm for Power Laws in the Internet.
- [N05] Newman. Power laws, Pareto distributions and Zipf's law.
- [M04] Mitzenmacher. A brief history of generative models for power law and lognormal distributions.
- [FK81] Füredi, Komlós. Eigenvalues of random symmetric matrices.
53. References
- [S04] Danilo Sergi. Random graph model with power-law distributed triangle subgraphs.
- [D97] Demmel. Applied Numerical Linear Algebra.
- [LCKF05] Leskovec, Chakrabarti, Kleinberg, Faloutsos. Realistic, Mathematically Tractable Graph Generation and Evolution, using Kronecker Multiplication.
- [LF07] Leskovec, Faloutsos. Scalable Modeling of Real Graphs using Kronecker Multiplication.
- [FFF99] Faloutsos, Faloutsos, Faloutsos. On power-law relationships of the Internet topology.
- [MP02] Mihail, Papadimitriou. On the Eigenvalue Power Law.
- [CLV03] Chung, Lu, Vu. Spectra of Random Graphs with given expected degrees.
54. References
- [YKS02] Bar-Yossef, Kumar, Sivakumar. Reductions in streaming algorithms, with an application to counting triangles in graphs.
- [GL89] Golub, Van Loan. Matrix Computations.
- [BFLSS06] Buriol, Frahling, Leonardi, Marchetti-Spaccamela, Sohler. Counting triangles in data streams.
- [DHV93] Demmel, Heath, van der Vorst. Parallel Numerical Linear Algebra.
- [YPSB05] Ye, Peyser, Spencer, Bader. Commensurate distances and similar motifs in genetic congruence and protein interaction networks in yeast.
- [P08] Pradipta Mitra. Entrywise Bounds for Eigenvectors of Random Graphs.
- [FDBV01] Farkas, Derényi, Barabási, Vicsek. Spectra of "real-world" graphs: Beyond the semicircle law.
- [S07] Spielman. Spectral Graph Theory and its Applications class (Yale), http://www.cs.yale.edu/homes/spielman/eigs/
55. References
- [F08] Faloutsos. Multimedia Databases and Data Mining class (CMU), http://www.cs.cmu.edu/christos/courses/826.S08
- For more references, see also the paper http://www.cs.cmu.edu/ctsourak/tsourICDM08.pdf