Fast Counting of Triangles in Large Networks: Algorithms and Laws



1
Fast Counting of triangles in large networks
Algorithms and laws
Charalampos (Babis) Tsourakakis
School of Computer Science, Carnegie Mellon University
http://www.cs.cmu.edu/~ctsourak
  • RPI Theory Seminar, 24 November 2008

2
Counting Triangles
  • Given an undirected, simple graph G(V,E), a
    triangle is a set of 3 vertices such that any two
    of them are connected by an edge of the graph.
  • Related Problems
  • a) Decide if a graph is triangle-free.
  • b) Count the total number of triangles d(G).
  • c) Count the number of triangles d(v) that
    each vertex v participates in.
  • d) List the triangles that each vertex v
    participates in.

Our focus
3
Why is triangle counting important?
  • Social Network Analysis: friends of friends are
    friends [WF94]
  • Web Spam Detection [BPCG08]
  • Hidden Thematic Structure of the Web [EM02]
  • Motif Detection, e.g. in biological networks
    [YPSB05]
  • (A few indicative reasons, from the graph mining
    perspective.)

4
Why is triangle counting important?
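  • Furthermore, two often used metrics are:
  • Clustering Coefficient:
    $C(G) = \frac{1}{n} \sum_{v \in V} C(v)$,
    where $n = |V|$ and $C(v) = d(v) / \binom{\deg(v)}{2}$ is the
    fraction of triples at node v that close into triangles.
  • Transitivity Ratio:
    $T(G) = \frac{3\, d(G)}{\text{number of connected triples}}$,
    where a triple at node v is a path of length two centered at v.

[Figure: a triple at node v and a triangle.]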
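
The two metrics above can be illustrated on a toy graph. The sketch below is only an illustration: the adjacency-dictionary representation, the function names, and the choice to let vertices of degree below 2 contribute 0 to the clustering coefficient are assumptions, not taken from the slides.

```python
from itertools import combinations

def triangles_per_vertex(adj):
    """adj maps each vertex to the set of its neighbors (undirected, simple)."""
    return {
        v: sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
        for v, nbrs in adj.items()
    }

def clustering_coefficient(adj):
    """Average over vertices of (triangles at v) / (triples at v)."""
    tri = triangles_per_vertex(adj)
    total = 0.0
    for v, nbrs in adj.items():
        triples = len(nbrs) * (len(nbrs) - 1) // 2
        if triples > 0:  # assumption: degree < 2 contributes 0
            total += tri[v] / triples
    return total / len(adj)

def transitivity_ratio(adj):
    """3 * (number of triangles) / (number of connected triples)."""
    tri = triangles_per_vertex(adj)
    triples = sum(len(n) * (len(n) - 1) // 2 for n in adj.values())
    total_triangles = sum(tri.values()) // 3  # each triangle seen from 3 vertices
    return 3 * total_triangles / triples if triples else 0.0

# Toy graph: triangle 0-1-2 with a pendant vertex 3 attached to vertex 2.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
print(clustering_coefficient(adj))
print(transitivity_ratio(adj))
```

On this toy graph the two metrics differ, because the transitivity ratio weights every triple equally while the clustering coefficient weights every vertex equally.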
5
Outline
  • Related Work
  • Proposed Method
  • Experiments
  • Triangle-related Laws
  • Triangles in Kronecker Graphs
  • Future Work and Open Problems

6
Counting methods
  • Dense graphs

                      Fast         Low space
    Time complexity   O(n^2.37)    O(n^3)
    Space complexity  O(n^2)       O(m)

  • Sparse graphs

                      Fast                           Low space
    Time complexity   O(m^0.7 n^1.2 + n^(2+o(1)))    e.g. O(n d_max^2)
    Space complexity  Θ(n^2) (eventually)            Θ(m)

8
Outline of the Proposed Method
  • EigenTriangle theorem
  • EigenTriangleLocal theorem
  • EigenTriangle algorithm
  • EigenTriangleLocal algorithm
  • Efficiency Complexity
  • Power law degree distributions
  • Gershgorin discs
  • Real world network spectra

9
Theorem EigenTriangle
  • Theorem
  • The number of triangles d(G) in an undirected,
    simple graph G(V,E) is given by
    $d(G) = \frac{1}{6} \sum_{i=1}^{n} \lambda_i^3$,
  • where $\lambda_1, \dots, \lambda_n$ are the eigenvalues of the adjacency
    matrix of graph G.

10
Proof
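  • Call A the adjacency matrix of the graph.
    Consider the i-th diagonal element of $A^3$, $(A^3)_{ii}$.
    This element counts the closed walks of length 3
    through vertex i, so it equals twice the number of
    triangles vertex i participates in (each triangle
    i-j-k is also counted as i-k-j). So the trace is $6\,d(G)$
    because each triangle is counted 6 times (3
    participating vertices, each counting it in both
    directions). Furthermore, if $Ax = \lambda x$, then $\lambda^3$
    is an eigenvalue of $A^3$ (*), and vice versa: if $\lambda$
    is an eigenvalue of $A^3$, then $\lambda^{1/3}$ is an
    eigenvalue of A. Hence $\mathrm{trace}(A^3) = \sum_i \lambda_i^3 = 6\,d(G)$.
    (*) $A^3 x = AAAx = AA\lambda x = \lambda AAx = \lambda A \lambda x = \lambda^2 Ax = \lambda^3 x$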
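
The identity used in the proof can be checked numerically. The sketch below assumes NumPy and a small hand-made example graph (two triangles sharing an edge). It uses a dense eigensolver, which is fine for a toy case but is not the method the talk proposes for large graphs.

```python
import numpy as np

# Example graph (an assumption for illustration): triangles 0-1-2 and 1-2-3.
edges = [(0, 1), (0, 2), (1, 2), (1, 3), (2, 3)]
n = 4
A = np.zeros((n, n))
for i, j in edges:
    A[i, j] = A[j, i] = 1.0

A3 = A @ A @ A
eigenvalues = np.linalg.eigvalsh(A)  # eigenvalues of the symmetric matrix A

triangles_from_trace = np.trace(A3) / 6
triangles_from_spectrum = np.sum(eigenvalues ** 3) / 6
print(triangles_from_trace, triangles_from_spectrum)
```

Both quantities agree with the two triangles of the example graph, up to floating-point error in the spectral version.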

11
Theorem EigenTriangleLocal
  • Theorem
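  • The number of triangles d(i) that vertex i
    participates in is equal to
    $d(i) = \frac{1}{2} \sum_{j=1}^{n} \lambda_j^3 u_{ij}^2$,
    where $u_{ij}$ is the i-th entry of the j-th eigenvector.
  • Proof Sketch: Follows from the previous theorem
    and the fact that A is symmetric, therefore
    diagonalizable as $A = U \Lambda U^T$ with orthonormal
    eigenvectors, and also $A^3 = U \Lambda^3 U^T$.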
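
A small numerical check of the local formula, again assuming NumPy and a hand-made example graph. Here `U[i, j]` holds the i-th entry of the j-th eigenvector, which is the convention `numpy.linalg.eigh` uses for its columns.

```python
import numpy as np

# Example graph (an assumption for illustration): triangles 0-1-2 and 1-2-3.
edges = [(0, 1), (0, 2), (1, 2), (1, 3), (2, 3)]
n = 4
A = np.zeros((n, n))
for i, j in edges:
    A[i, j] = A[j, i] = 1.0

lam, U = np.linalg.eigh(A)  # columns of U are orthonormal eigenvectors

# Spectral formula: d(i) = 1/2 * sum_j lam_j^3 * U[i, j]^2
local_from_spectrum = (U ** 2) @ (lam ** 3) / 2

# Direct formula: half of the i-th diagonal entry of A^3
local_from_diagonal = np.diag(A @ A @ A) / 2
print(local_from_spectrum, local_from_diagonal)
```

Vertices 1 and 2 lie on both triangles, so their count is 2, while vertices 0 and 3 each have a count of 1.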

12
EigenTriangle Algorithm
13
EigenTriangleLocal Algorithm
Why are these two algorithms efficient?
14
Skewed Degree Distributions
  • Skewed degree distributions are ubiquitous in nature!
    They have been termed the signature of human
    activity [FKP02], but they appear in all other
    kinds of networks as well, e.g. biological ones. See
    [N05], [M04] for generative models of power law
    distributions.
  • Typically referred to as power laws (even if
    sometimes we abuse the strict definition of a
    power law, i.e. $p(x) \propto x^{-\alpha}$).

15
Examples of power laws
  • Newman [N05] demonstrated how often power laws
    appear using many different types of networks,
    ranging from word frequencies to population
    of cities.

Many cities have a small population
Few cities have a huge population
16
Gershgorin's Discs
  • Theorem: Let B be an arbitrary n×n matrix. Then the
    eigenvalues λ of B are located in the union of
    the n discs
    $|\lambda - b_{ii}| \le \sum_{j \ne i} |b_{ij}|$, for i = 1, ..., n.
  • For a proof see Demmel [D97], p. 82.
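
The disc bound is easy to check numerically. The sketch below assumes NumPy, and the matrix entries are arbitrary values chosen only for illustration.

```python
import numpy as np

# Arbitrary example matrix (values are an assumption for illustration).
B = np.array([[4.0, 1.0, 0.5],
              [0.2, -3.0, 0.3],
              [1.0, 1.0, 10.0]])

centers = np.diag(B)                                  # disc centers b_ii
radii = np.sum(np.abs(B), axis=1) - np.abs(centers)   # sum over j != i of |b_ij|
eigs = np.linalg.eigvals(B)

# Every eigenvalue must fall inside at least one disc.
in_some_disc = [bool(np.any(np.abs(lam - centers) <= radii + 1e-12))
                for lam in eigs]
print(list(zip(centers, radii)), in_some_disc)
```

For an adjacency matrix of a simple graph every center is 0 and the radius of disc i is the degree of vertex i, so the theorem only gives |λ| ≤ d_max.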

17
Gershgorin Discs
  • Bounds on the airports network (observe how
    loose they are)

18
Typical real world spectra
[Figure: eigenvalue spectra of the Airports and Political blogs networks.]
19
Top Eigenvalues
  • Zooming in on the top eigenvalues and plotting the
    rank vs. the eigenvalue in log-log scale reveals
    that the top eigenvalues follow a power law
    [FFF99].
  • Some years later, Mihail and Papadimitriou [MP02]
    and Chung, Lu and Vu [CLV03] proved this fact.

20
Our idea
  • Simple and clear: use a low-rank approximation of
    $A^3$ to estimate the diagonal elements and the
    trace.
  • It also suggests a way of thinking: take advantage
    of special properties (e.g. power laws) to reduce
    the complexity of certain computational tasks in
    real-world networks.
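
The low-rank idea above can be sketched as follows. This is a minimal illustration, not the authors' implementation. The function name `eigen_triangle`, the example graph, and the use of a dense eigensolver in place of Lanczos are all assumptions made to keep the sketch short.

```python
import numpy as np

def eigen_triangle(A, k):
    """Estimate triangle counts from the k eigenpairs of largest magnitude.

    Returns (estimated total d(G), array of estimated local counts d(i)).
    """
    # Assumption: a dense symmetric eigensolver stands in for Lanczos here;
    # on a large sparse graph only the top k eigenpairs would be computed.
    lam, U = np.linalg.eigh(A)
    top = np.argsort(-np.abs(lam))[:k]
    lam_k, U_k = lam[top], U[:, top]
    total = np.sum(lam_k ** 3) / 6.0
    local = (U_k ** 2) @ (lam_k ** 3) / 2.0
    return total, local

# Hypothetical example graph: a 6-clique (vertices 0..5) plus 10 extra
# vertices attached only to vertex 0, giving a skewed degree sequence.
n = 16
A = np.zeros((n, n))
for i in range(6):
    for j in range(i + 1, 6):
        A[i, j] = A[j, i] = 1.0
for leaf in range(6, n):
    A[0, leaf] = A[leaf, 0] = 1.0

exact_total, exact_local = eigen_triangle(A, n)  # full rank gives exact counts
for k in (1, 2, 3):
    approx_total, _ = eigen_triangle(A, k)
    print(k, approx_total, exact_total)
```

With k equal to n the estimate is exact by the EigenTriangle theorems. Smaller k trades accuracy for work, and how quickly the estimate converges depends on the spectrum of the graph at hand.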

21
Summing up: Why does it work?
  • The first main reason is the near symmetry of the
    spectrum around 0 for the bulk of the eigenvalues,
    except the top ones: the cubes of the bulk
    eigenvalues roughly cancel in the sum.
  • Cubing strongly amplifies this phenomenon!

22
Complexity Analysis
  • The main computational bottleneck that determines the
    complexity is the Lanczos method.
  • Lanczos runs in linear time with respect to the
    non-zero entries of the matrix, i.e. the edges,
    assuming that we compute a small, constant number of
    eigenvalues.
  • Convergence of Lanczos is fast due to the
    eigenvalue power law (see Kaniel-Paige theory
    [GL89]).
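
To make the linear-time claim concrete, the sketch below shows that one matrix-vector product with the adjacency matrix costs one pass over the edge list. It then uses plain power iteration, which is a simpler stand-in for Lanczos (an assumption of this sketch, not the method of the talk). Both are built from the same matrix-vector products. The function names and the example graph are illustrative.

```python
import math

def matvec(edges, x):
    """y = A x for an undirected graph given as an edge list; O(m) work."""
    y = [0.0] * len(x)
    for i, j in edges:
        y[i] += x[j]
        y[j] += x[i]
    return y

def top_eigenvalue(edges, n, iterations=200):
    """Power iteration: repeated matvec plus normalization."""
    x = [1.0 + 0.01 * i for i in range(n)]  # arbitrary positive start vector
    for _ in range(iterations):
        y = matvec(edges, x)
        norm = math.sqrt(sum(v * v for v in y))
        x = [v / norm for v in y]
    # Rayleigh quotient of the final unit vector
    return sum(a * b for a, b in zip(x, matvec(edges, x)))

# Example graph (an assumption): the complete graph K4, whose top eigenvalue is 3.
k4_edges = [(i, j) for i in range(4) for j in range(i + 1, 4)]
lam_max = top_eigenvalue(k4_edges, 4)
print(lam_max)
```

Power iteration only recovers the dominant eigenvalue and can converge slowly when the top eigenvalues are close in magnitude. Lanczos extracts several eigenpairs from the same sequence of products.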

24
Datasets

25
Competitor: Node Iterator
  • The Node Iterator algorithm considers one node at
    a time, looks at its neighbors and checks how
    many pairs of them are connected to each other.
  • Complexity: $O(n\, d_{max}^2)$, where $d_{max}$ is the maximum degree.
  • We report the results as the speedup that the
    EigenTriangle algorithm gives compared to the
    running time of the Node Iterator.
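
A minimal sketch of the Node Iterator baseline described above, assuming the graph is stored as a dictionary of neighbor sets. The representation and the function names are choices made for this illustration.

```python
from itertools import combinations

def build_adjacency(edges):
    """Undirected, simple graph as a dict: vertex -> set of neighbors."""
    adj = {}
    for i, j in edges:
        adj.setdefault(i, set()).add(j)
        adj.setdefault(j, set()).add(i)
    return adj

def node_iterator(adj):
    """Return (total triangles, dict of per-vertex triangle counts)."""
    per_vertex = {}
    for v, nbrs in adj.items():
        # Check every pair of neighbors of v: at most d_max^2 / 2 pairs per node.
        per_vertex[v] = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
    total = sum(per_vertex.values()) // 3  # each triangle is found at 3 vertices
    return total, per_vertex

# Example graph (an assumption): triangles 0-1-2 and 1-2-3 sharing edge 1-2.
adj = build_adjacency([(0, 1), (0, 2), (1, 2), (1, 3), (2, 3)])
total, per_vertex = node_iterator(adj)
print(total, per_vertex)
```

The pair check inside the loop is what makes the cost grow with the square of the degree, which hurts on graphs with a few very high-degree nodes.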

26
Results: Eigenvalues vs. Speedup
27
Results: Edges vs. Speedup
28
Main points
  • Some interesting facts about the two scatterplots:
  • The mean rank of the approximation required for at
    least 95% accuracy is 6.2.
  • Speedups are between 33.7x and 1159x.
  • The mean speedup is 250x.
  • Notice the increasing speedup as the size of the
    network grows.

29
Zooming in
Zooming in on this point
30
Evaluating the Local Counting Method
  • Pearson's correlation coefficient ρ
  • Relative Reconstruction Error (RRE)

Political Blogs: RRE = 7·10^-4, ρ = 99.97%
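
The two evaluation measures can be sketched as below. Pearson's ρ is the standard definition. The transcript does not preserve the RRE formula, so the definition used here (Euclidean norm of the error divided by the norm of the exact vector) is an assumption and may differ from the one on the original slide. The sample vectors are made up for illustration.

```python
import math

def pearson(x, y):
    """Pearson's correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def relative_reconstruction_error(exact, estimate):
    """ASSUMED definition: ||exact - estimate||_2 / ||exact||_2."""
    err = sum((a - b) ** 2 for a, b in zip(exact, estimate))
    ref = sum(a * a for a in exact)
    return math.sqrt(err / ref)

# Made-up local triangle counts: exact values and a noisy estimate.
exact = [1.0, 2.0, 2.0, 1.0]
estimate = [1.1, 1.9, 2.1, 0.9]
rho = pearson(exact, estimate)
rre = relative_reconstruction_error(exact, estimate)
print(rho, rre)
```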
31
Eigenvalues vs. ρ for three networks
Observe how a low rank gives almost
optimal results. This holds for surprisingly
many real world networks.

33
Triangle Participation Law
  • Plots the number of triangles d (x-axis) vs. the
    count of vertices that participate in d triangles.

[Figure, three panels: (a) EPINIONS, who-trusts-whom network; (b) ASN, social network; (c) HEP_TH, collaboration network.]
34
Degree Triangle Law
  • Plots the degree d_i (x-axis) vs. the mean number
    of triangles that nodes with degree d_i
    participate in.

[Figure, two panels: Epinions and ASN.]
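
The data behind both law plots can be computed from per-vertex triangle counts. The sketch below assumes those counts are already available (for example from an exact counter) and only shows the aggregation. The function name and the toy inputs are illustrative.

```python
from collections import Counter, defaultdict

def triangle_laws(adj, triangles):
    """adj: vertex -> set of neighbors; triangles: vertex -> triangle count.

    Returns (participation, degree_triangle):
      participation[d]     = number of vertices participating in d triangles
      degree_triangle[deg] = mean triangle count of vertices with that degree
    """
    participation = Counter(triangles.values())
    by_degree = defaultdict(list)
    for v, nbrs in adj.items():
        by_degree[len(nbrs)].append(triangles[v])
    degree_triangle = {deg: sum(t) / len(t) for deg, t in by_degree.items()}
    return participation, degree_triangle

# Toy inputs (an assumption): triangles 0-1-2 and 1-2-3 sharing edge 1-2.
adj = {0: {1, 2}, 1: {0, 2, 3}, 2: {0, 1, 3}, 3: {1, 2}}
triangles = {0: 1, 1: 2, 2: 2, 3: 1}
participation, degree_triangle = triangle_laws(adj, triangles)
print(dict(participation), degree_triangle)
```

Plotting both dictionaries on log-log axes would give the two kinds of plots described on these slides.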

36
Kronecker Graphs
  • This model was introduced in [LCKF05]. It is
    based on the simple operation of the Kronecker
    product to generate graphs that mimic real world
    networks.
  • Deterministic Kronecker Graphs: Kronecker product
    of the adjacency matrix at the current step k
    with the initiator adjacency matrix (typically
    small).
  • Stochastic Kronecker Graphs: Kronecker product of
    the matrix at the current step k with the
    initiator matrix. The initiator matrix contains
    probabilities. For more details see [LF07].

37
Triangles in Kronecker Graphs
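  • Some notation first. A: n×n initiator adjacency
    matrix of the undirected, simple graph $G_A$
  • $B = A^{[k]}$: the k-th Kronecker product of A
  • $\lambda = (\lambda_1, \dots, \lambda_n)$: the eigenvalues of A
  • $d(G_A)$, $d(G_B)$: the number of triangles of $G_A$, $G_B$
  • Theorem KroneckerTRC:
    $d(G_B) = 6^{k-1}\, d(G_A)^k$

38
Proof
  • We use induction on the number of recursion steps
    k. For k = 1 the theorem trivially holds, since B = A.
  • Assume now that KroneckerTRC holds for
    some $r \ge 1$. Call $C = A^{[r]}$, $D = A^{[r+1]}$ and let
    $\mu_i$, $i = 1, \dots, s$, be the eigenvalues of C. By the
    assumption and the EigenTriangle theorem,
    $\sum_{i=1}^{s} \mu_i^3 = 6\, d(G_C) = 6^{r}\, d(G_A)^{r}$.
  • The eigenvalues of D are given by the
    Kronecker product $\lambda \otimes \mu$, i.e. all products
    $\lambda_i \mu_j$. By the EigenTriangle
    theorem, the number of triangles in D is given by
    $d(G_D) = \frac{1}{6} \sum_{i=1}^{n} \sum_{j=1}^{s} (\lambda_i \mu_j)^3 = \frac{1}{6} \Big(\sum_{i=1}^{n} \lambda_i^3\Big) \Big(\sum_{j=1}^{s} \mu_j^3\Big) = \frac{1}{6} \cdot 6\, d(G_A) \cdot 6^{r}\, d(G_A)^{r} = 6^{r}\, d(G_A)^{r+1}$.

39
Proof
Therefore KroneckerTRC holds for all
$k \ge 1$. Q.E.D.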
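
The closed form can be checked on a small deterministic Kronecker graph. The sketch assumes NumPy, reads the k-th Kronecker product as k factors of A (so k = 1 gives A itself), and uses an initiator graph picked only for illustration. It checks the formula 6^(k-1) · d(G_A)^k against a direct count.

```python
import numpy as np

def count_triangles(M):
    """Exact triangle count of a simple undirected graph: trace(M^3) / 6."""
    return int(np.trace(M @ M @ M)) // 6

def kronecker_power(A, k):
    """k-fold Kronecker product of A with itself (k = 1 returns a copy of A)."""
    B = A.copy()
    for _ in range(k - 1):
        B = np.kron(B, A)
    return B

# Initiator (an assumption): 4 vertices, triangles 0-1-2 and 1-2-3.
A = np.zeros((4, 4), dtype=np.int64)
for i, j in [(0, 1), (0, 2), (1, 2), (1, 3), (2, 3)]:
    A[i, j] = A[j, i] = 1

d_A = count_triangles(A)
results = {}
for k in (1, 2, 3):
    B = kronecker_power(A, k)
    results[k] = (count_triangles(B), 6 ** (k - 1) * d_A ** k)
print(results)
```

Integer arithmetic keeps the comparison exact. Since the diagonal of A is zero, every Kronecker power also has a zero diagonal, so the result is still a simple graph.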

41
Theoretical Challenge I: Spectra of real world
networks
  • Can we prove things about the distribution of the
    eigenvalues, adopting a random graph model such
    as the expected degree model G(w) [CLV03]?
  • An analog to Wigner's semicircle law for random
    Erdős-Rényi graphs (see Füredi-Komlós [FK81])

[Figure: spectrum over 100000 iterations [S07].]
42
Theoretical Challenge I: Spectra of real world
networks

Empirically, the rest of the spectrum follows a
triangular-like distribution [FDBV01].
Can we prove something about this empirical
observation?


43
Theoretical Challenge II: Eigenvectors of real
world networks
  • Things are even worse than in the case of spectra.
    Very little is known about the eigenvectors.
    Related work: see [P08] for random graphs.

44
Theoretical Challenge III: Degree Triangle Law
  • Prove the pattern we saw using the expected degree
    random graph model G(w) (see [S04]).
  • Conjecture:
    The relationship we observed probably appears
    only for some values of the slope of the degree
    distribution. Further experiments recently
    showed that for some graphs this pattern does not
    hold.

45
Experimental Challenge I: Compare with Streaming
Methods
  • Streaming or semi-streaming methods perform one
    or O(1) passes over the graph [YKS02], [BFLSS06],
    [BPCG08]. Common underlying idea: sophisticated
    sampling methods.
  • Implement and compare.

46
Practical Challenge I: Triangles in Large Scale
Graph Mining
  • Many gigabyte- and petabyte-sized graphs.
  • How to handle these graphs?
  • HADOOP
  • EigenTriangle algorithms are based just on simple
    matrix-vector multiplications.
  • Easy to parallelize on all sorts of
    architectures
    (distributed memory, shared memory). See
    [DHV93] for the details.

47
PEGASUS: Peta-Graph Mining from the Triangle
perspective
  • On-going work with U Kang and Christos Faloutsos
    in collaboration with Yahoo! Research.
  • Among others: implement EigenTriangle algorithms
    in HADOOP and compare to other methods.
  • Find outliers with respect to triangles in graphs
    with many billions of edges.

Soon. Stay tuned!
48
Curious about
49
Acknowledgements
  • Christos Faloutsos
  • Yiannis Koutis

For the helpful discussions
50
Acknowledgements
  • Maria Tsiarli

For the PEGASUS logo
52
References
  • [WF94] Wasserman, Faust: Social Network
    Analysis: Methods and Applications (Structural
    Analysis in the Social Sciences)
  • [EM02] Eckmann, Moses: Curvature of co-links
    uncovers hidden thematic layers in the World Wide
    Web
  • [BPCG08] Becchetti, Boldi, Castillo, Gionis:
    Efficient Semi-Streaming Algorithms for Local
    Triangle Counting in Massive Graphs
  • [FKP02] Fabrikant, Koutsoupias, Papadimitriou:
    Heuristically Optimized Trade-offs: A New
    Paradigm for Power Laws in the Internet
  • [N05] Newman: Power laws, Pareto distributions
    and Zipf's law
  • [M04] Mitzenmacher: A brief history of
    generative models for power law and lognormal
    distributions
  • [FK81] Füredi, Komlós: Eigenvalues of random
    symmetric matrices

53
References
  • [S04] Danilo Sergi: Random graph model with
    power-law distributed triangle subgraphs
  • [D97] Demmel: Applied Numerical Linear Algebra
  • [LCKF05] Leskovec, Chakrabarti, Kleinberg,
    Faloutsos: Realistic, Mathematically Tractable
    Graph Generation and Evolution using Kronecker
    Multiplication
  • [LF07] Leskovec, Faloutsos: Scalable Modeling of
    Real Graphs using Kronecker Multiplication
  • [FFF99] Faloutsos, Faloutsos, Faloutsos: On
    power-law relationships of the Internet topology
  • [MP02] Mihail, Papadimitriou: On the Eigenvalue
    Power Law
  • [CLV03] Chung, Lu, Vu: Spectra of Random Graphs
    with given expected degrees

54
References
  • [YKS02] Bar-Yossef, Kumar, Sivakumar: Reductions
    in streaming algorithms, with an application to
    counting triangles in graphs
  • [GL89] Golub, Van Loan: Matrix Computations
  • [BFLSS06] Buriol, Frahling, Leonardi, Spaccamela,
    Sohler: Counting triangles in data streams
  • [DHV93] Demmel, Heath, van der Vorst: Parallel
    Numerical Linear Algebra
  • [YPSB05] Ye, Peyser, Spencer, Bader:
    Commensurate distances and similar motifs in
    genetic congruence and protein interaction
    networks in yeast
  • [P08] Pradipta Mitra: Entrywise Bounds for
    Eigenvectors of Random Graphs
  • [FDBV01] Farkas, Derenyi, Barabasi, Vicsek:
    Spectra of "real-world" graphs: Beyond the
    semi-circle law
  • [S07] Spielman's Spectral Graph Theory and its
    Applications class (Yale),
    http://www.cs.yale.edu/homes/spielman/eigs/

55
References
  • [F08] Faloutsos: Multimedia Databases and Data
    Mining class (CMU),
    http://www.cs.cmu.edu/~christos/courses/826.S08
  • For more references, see also the paper:
    http://www.cs.cmu.edu/~ctsourak/tsourICDM08.pdf