Title: Social Networks and Graph Mining
1. Social Networks and Graph Mining
- Christos Faloutsos
- CMU - MLD
2. Outline
- Problem definition / Motivation
- Graphs and power laws
- Virus propagation
- eBay fraud detection
- Conclusions
3. Motivation
- Data mining: find patterns (rules, outliers)
- Problem 1: What do real graphs look like?
- Problem 2: How do viruses propagate?
- Problem 3: How can we spot fraudsters on eBay?
4. Problem 1: joint work with
- Dr. Deepayan Chakrabarti (CMU/Yahoo R.L.)
5. Graphs: why should we care?
[Figure: Internet map (lumeta.com); food web (Martinez 91); protein interactions (genomebiology.com); friendship network (Moody 01)]
6. Graphs: why should we care?
- network of companies: board-of-directors members
- viral marketing
- web-log (blog) news propagation
- computer network security: email/IP traffic and anomaly detection
- ...
7. Problem 1: network and graph mining
- What does the Internet look like?
- What does the web look like?
- What constitutes a normal social network?
- What is normal/abnormal?
- Which patterns/laws hold?
8. Graph mining
9. Laws and patterns
- Are real graphs random?
- A: NO!!
  - Diameter
  - in- and out-degree distributions
  - other (surprising) patterns
10. Solution
- Power law in the degree distribution [SIGCOMM99]
[Figure: degree distribution of Internet domains; labeled points: att.com, ibm.com]
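The power-law claim can be checked on any edge list with a few lines of code. The sketch below (standard-library Python; the function names are hypothetical) counts how many nodes have each degree and fits a straight line to log(count) versus log(degree). A power law shows up as a roughly straight line with negative slope. Least squares on log-log axes is only a quick diagnostic; maximum-likelihood estimators are more reliable for the exponent itself.

```python
import math
from collections import Counter

def degree_frequency(edges):
    """Return {degree: number of nodes with that degree} for an undirected edge list."""
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    return Counter(degree.values())

def fit_loglog_slope(freq):
    """Least-squares slope of log(count) vs. log(degree); negative for a power law."""
    points = [(math.log(d), math.log(c)) for d, c in freq.items() if d > 0 and c > 0]
    if len(points) < 2:
        raise ValueError("need at least two distinct degree values")
    mean_x = sum(x for x, _ in points) / len(points)
    mean_y = sum(y for _, y in points) / len(points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    return sxy / sxx

# Star graph: one hub with 5 leaves, so 1 node has degree 5 and 5 nodes have degree 1.
star = [(0, i) for i in range(1, 6)]
print(degree_frequency(star))                               # Counter({1: 5, 5: 1})
print(round(fit_loglog_slope(degree_frequency(star)), 6))   # -1.0
```

On a real graph (for example an Internet-domain edge list) one would plot the same (degree, count) pairs and inspect the straight-line region before trusting the fitted slope.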
11. But
- Q1: How about graphs from other domains?
- Q2: How about temporal evolution?
12. The Peer-to-Peer Topology [Jovanovic]
- Frequency versus degree
- Number of adjacent peers follows a power law
13. More power laws
- citation counts (citeseer.nj.nec.com, 6/2001)
[Plot: log(count) vs. log(citations); labeled point: Ullman]
14. Swedish sex-web
- Nodes: people (females, males); links: sexual relationships
- 4781 Swedes, ages 18-74; 59% response rate
- [Liljeros et al., Nature 2001]
- Source: Albert Laszlo Barabasi, http://www.nd.edu/networks/Publication%20Categories/04%20Talks/2005-norway-3hours.ppt
15. More power laws
- web hit counts [w/ A. Montgomery]
[Plot: web site traffic, log(count) vs. log(in-degree); line labeled Zipf; labeled point: ebay]
16. epinions.com
- who-trusts-whom [Richardson & Domingos, KDD 2001]
[Plot: count vs. (out-)degree; labeled point: a user who trusts 2000 people]
17. But
- Q1: How about graphs from other domains?
- Q2: How about temporal evolution?
18. Time evolution
- with Jure Leskovec (CMU/MLD)
- and Jon Kleinberg (Cornell; sabbatical at CMU)
20. Evolution of the Diameter
- Prior work on power-law graphs hints at a slowly growing diameter:
  - diameter: O(log N)
  - diameter: O(log log N)
- What is happening in real data?
- Diameter shrinks over time
  - As the network grows, the distances between nodes slowly decrease
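Measuring the diameter of each snapshot requires shortest-path distances between all pairs of nodes. A minimal sketch, assuming an undirected, unweighted edge list small enough for one breadth-first search per node (O(N·E) time). It reports the exact diameter and, as a more outlier-robust alternative, the distance within which a fraction q of connected pairs lie (often called the effective diameter). Whether the plots in these slides use exactly this definition is an assumption here, and the function names are hypothetical.

```python
import math
from collections import defaultdict, deque

def bfs_distances(adj, source):
    """Hop distances from source to every reachable node (breadth-first search)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def pairwise_distances(edges):
    """Sorted shortest-path lengths over all ordered pairs of connected, distinct nodes."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    dists = []
    for s in list(adj):
        dists.extend(d for t, d in bfs_distances(adj, s).items() if t != s)
    return sorted(dists)

def diameter(edges):
    """Longest shortest path (0 for an empty graph)."""
    d = pairwise_distances(edges)
    return d[-1] if d else 0

def effective_diameter(edges, q=0.9):
    """Smallest distance within which a fraction q of connected pairs lie."""
    d = pairwise_distances(edges)
    if not d:
        return 0
    return d[max(0, math.ceil(q * len(d)) - 1)]

path = [(0, 1), (1, 2), (2, 3)]
print(diameter(path), effective_diameter(path))   # 3 3
```

For the time-evolution plots one would call these once per snapshot (per year or per day) and plot the values against time or against the number of nodes.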
21. Diameter: ArXiv citation graph
- Citations among physics papers
- 1992-2003
- One graph per year
[Plot: diameter vs. time (years)]
22. Diameter: Autonomous Systems
- Graph of the Internet
- One graph per day
- 1997-2000
[Plot: diameter vs. number of nodes]
23. Diameter: Affiliation Network
- Graph of collaborations in physics: authors linked to papers
- 10 years of data
[Plot: diameter vs. time (years)]
24. Diameter: Patents
- Patent citation network
- 25 years of data
[Plot: diameter vs. time (years)]
26. Temporal Evolution of the Graphs
- $N(t)$: nodes at time $t$
- $E(t)$: edges at time $t$
- Suppose that
  - $N(t+1) = 2\,N(t)$
- Q: what is your guess for
  - $E(t+1) \stackrel{?}{=} 2\,E(t)$
- A: over-doubled!
  - But obeying the Densification Power Law
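The Densification Power Law makes the answer quantitative. Writing the law as $E(t) = c\,N(t)^{a}$ (the constant $c$ and the exponent symbol $a$ are notation introduced here), doubling the number of nodes multiplies the number of edges by $2^{a}$, and the average degree grows whenever $a > 1$:

```latex
\begin{align*}
E(t) &= c\,N(t)^{a} \\
\frac{E(t+1)}{E(t)} &= \left(\frac{N(t+1)}{N(t)}\right)^{a} = 2^{a} \\
\frac{E(t)}{N(t)} &= c\,N(t)^{a-1} \\
a = 1.69 &\;\Rightarrow\; 2^{1.69} \approx 3.23
\end{align*}
```

With the exponent 1.69 reported for physics citations, edges grow by a factor of about 3.2 when nodes double, i.e. "over-doubled"; only $a = 1$ gives plain doubling.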
29. Densification: Physics Citations
- Citations among physics papers
- 2003: 29,555 papers, 352,807 citations
[Plot: $E(t)$ vs. $N(t)$, log-log; fitted slope 1.69, shown against reference slopes 1 (tree) and 2 (clique)]
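The two reference slopes are the extremes for a simple undirected graph: a connected graph on $N$ nodes has at least the $N-1$ edges of a tree, and at most the $\binom{N}{2}$ edges of a clique. On log-log axes, for large $N$:

```latex
\begin{align*}
\text{tree:}\quad E &= N - 1 \;\Rightarrow\; \log E \approx \log N \quad (\text{slope } 1) \\
\text{clique:}\quad E &= \binom{N}{2} = \frac{N(N-1)}{2} \;\Rightarrow\; \log E \approx 2\log N - \log 2 \quad (\text{slope } 2)
\end{align*}
```

A fitted slope of 1.69 therefore sits well inside this range: denser than a tree, far sparser than a clique.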
31. Densification: Patent Citations
- Citations among patents granted
- 1999: 2.9 million nodes, 16.5 million edges
- Each year is a datapoint
[Plot: $E(t)$ vs. $N(t)$, log-log; fitted slope 1.66]
32. Densification: Autonomous Systems
- Graph of the Internet
- 2000: 6,000 nodes, 26,000 edges
- One graph per day
[Plot: $E(t)$ vs. $N(t)$, log-log; fitted slope 1.18]
33. Densification: Affiliation Network
- Authors linked to their publications
- 2002: 60,000 nodes (20,000 authors, 38,000 papers), 133,000 edges
[Plot: $E(t)$ vs. $N(t)$, log-log; fitted slope 1.15]
34. Outline
- Problem definition / Motivation
- Graphs and power laws
- Virus propagation
- eBay fraud detection
- Conclusions
35. Virus propagation
- How do viruses/rumors propagate?
- Will a flu-like virus linger, or will it become
extinct soon?
36. The model: SIS
- Flu-like: Susceptible-Infected-Susceptible
- Virus strength: $s = \beta/\delta$
[Figure: example graph with nodes N, N1, N2, N3, each marked Healthy or Infected]
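The SIS dynamics are easy to simulate. The sketch below assumes a discrete-time, synchronous version: at each step every infected node recovers (becomes susceptible again) with probability $\delta$, and each infected neighbor independently infects a susceptible node with probability $\beta$. These update rules and all names are illustrative assumptions, not the exact model behind the experiments in this talk.

```python
import random
from collections import defaultdict

def simulate_sis(edges, beta, delta, initial_infected, steps, seed=0):
    """Discrete-time SIS sketch; returns the number of infected nodes at each step."""
    rng = random.Random(seed)
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    infected = set(initial_infected)
    history = [len(infected)]
    for _ in range(steps):
        next_infected = set()
        for node in adj:
            if node in infected:
                # Infected node stays infected unless it recovers (prob. delta).
                if rng.random() >= delta:
                    next_infected.add(node)
            else:
                # Each of the k infected neighbors attacks independently with prob. beta.
                k = sum(1 for nbr in adj[node] if nbr in infected)
                if k and rng.random() < 1 - (1 - beta) ** k:
                    next_infected.add(node)
        infected = next_infected
        history.append(len(infected))
    return history

# Example: path 0-1-2-3, starting with node 0 infected.
path = [(0, 1), (1, 2), (2, 3)]
print(simulate_sis(path, beta=0.2, delta=0.5, initial_infected={0}, steps=10))
```

A single run is noisy; averaging the returned counts over many seeds gives a smoother picture of whether the infection lingers or dies out for a given $\beta/\delta$.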
37. Epidemic threshold $\tau$
- of a graph: the value of $\tau$ such that
  - if strength $s = \beta/\delta < \tau$,
  - an epidemic cannot happen
- Thus,
  - given a graph,
  - compute its epidemic threshold
38. Epidemic threshold $\tau$
- What should $\tau$ depend on?
- avg. degree? and/or highest degree?
- and/or variance of degree?
- and/or third moment of degree?
- and/or diameter?
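40. Epidemic threshold
- Theorem: we have no epidemic if
  $\beta/\delta < \tau = 1/\lambda_{1,A}$
  - $\beta$: attack probability
  - $\delta$: recovery probability
  - $\tau$: epidemic threshold
  - $\lambda_{1,A}$: largest eigenvalue of the adjacency matrix $A$
- Proof: [Wang03]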
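Because the theorem reduces the threshold to one eigenvalue, computing $\tau$ for a given graph amounts to finding $\lambda_{1,A}$. A minimal sketch using power iteration on $A + I$: shifting by the identity leaves the eigenvectors unchanged and keeps the iteration from oscillating on bipartite graphs such as stars. It assumes a simple undirected graph given as an edge list; the function names and the example $\beta$, $\delta$ values are hypothetical.

```python
import math
from collections import defaultdict

def largest_eigenvalue(edges, max_iters=1000, tol=1e-12):
    """Approximate lambda_1 of the adjacency matrix of a simple undirected graph."""
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:                      # ignore self-loops
            adj[u].add(v)
            adj[v].add(u)
    nodes = list(adj)
    if not nodes:
        raise ValueError("graph has no edges")
    x = {n: 1.0 for n in nodes}         # positive start vector
    for _ in range(max_iters):
        # y = (A + I) x, then normalize to unit length
        y = {n: x[n] + sum(x[m] for m in adj[n]) for n in nodes}
        norm = math.sqrt(sum(val * val for val in y.values()))
        y = {n: val / norm for n, val in y.items()}
        converged = max(abs(y[n] - x[n]) for n in nodes) < tol
        x = y
        if converged:
            break
    # Rayleigh quotient x^T A x (x has unit length) estimates lambda_1
    return sum(x[n] * sum(x[m] for m in adj[n]) for n in nodes)

def epidemic_threshold(edges):
    """tau = 1 / lambda_1, per the theorem above."""
    return 1.0 / largest_eigenvalue(edges)

star = [(0, i) for i in range(1, 5)]   # hub with 4 leaves: lambda_1 = 2
beta, delta = 0.1, 0.3                 # hypothetical attack / recovery probabilities
print(round(epidemic_threshold(star), 6))       # 0.5
print(beta / delta < epidemic_threshold(star))  # True: below threshold, no epidemic
```

Power iteration only needs neighbor sums, so the same idea scales to large sparse graphs where forming the full matrix would be impractical.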
41. Experiments (Oregon)
[Plot: simulation results for three settings: $\beta/\delta > \tau$ (above threshold), $\beta/\delta = \tau$ (at the threshold), $\beta/\delta < \tau$ (below threshold)]
42. Outline
- Problem definition / Motivation
- Graphs and power laws
- Virus propagation
- eBay fraud detection
- Conclusions
43. eBay fraud detection
- joint work with Polo Chau (CMU)
44. eBay fraud detection: NetProbe
45. Conclusions
- Graphs pose fascinating problems
- Self-similarity/fractals and power laws work when textbook methods fail!
- Need ML/AI, Stat, NA, DB (Gb/Tb), Systems (Networks), sociology, ...
46. Contact info
- christos_at_cs.cmu.edu
- www.cs.cmu.edu/christos
THANK YOU!