Social Networks and Graph Mining - PowerPoint PPT Presentation

About This Presentation
Title:

Social Networks and Graph Mining

Description:

Social Networks and Graph Mining. Christos Faloutsos. CMU - MLD ... Problem#2: How do viruses propagate? Problem#3: How to spot fraudsters in e-bay? MLD-AB '07 ...

Slides: 46
Provided by: christosf
Learn more at: http://www.cs.cmu.edu

Transcript and Presenter's Notes

Title: Social Networks and Graph Mining


1
Social Networks and Graph Mining
  • Christos Faloutsos
  • CMU - MLD

2
Outline
  • Problem definition / Motivation
  • Graphs and power laws
  • Virus propagation
  • e-bay fraud detection
  • Conclusions

3
Motivation
  • Data mining: find patterns (rules, outliers)
  • Problem #1: What do real graphs look like?
  • Problem #2: How do viruses propagate?
  • Problem #3: How to spot fraudsters in e-bay?

4
Problem #1: Joint work with
  • Dr. Deepayan Chakrabarti (CMU/Yahoo R.L.)

5
Graphs - why should we care?
  • Internet Map [lumeta.com]
  • Food Web [Martinez '91]
  • Protein Interactions [genomebiology.com]
  • Friendship Network [Moody '01]

6
Graphs - why should we care?
  • network of companies and board-of-directors members
  • viral marketing
  • web-log (blog) news propagation
  • computer network security: email/IP traffic and
    anomaly detection
  • ....

7
Problem 1 - network and graph mining
  • What does the Internet look like?
  • What does the web look like?
  • What constitutes a normal social network?
  • What is normal/abnormal?
  • which patterns/laws hold?

8
Graph mining
  • Are real graphs random?

9
Laws and patterns
  • NO!!
  • Diameter
  • in- and out- degree distributions
  • other (surprising) patterns

10
Solution
  • Power law in the degree distribution [SIGCOMM99]

[Figure: degree plot for internet domains; att.com and ibm.com are labeled]

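One rough way to check for a power law in a degree distribution is to count how many nodes have each degree and fit a straight line to the points on log-log axes. The sketch below does that with the standard library only. The toy edge list and the helper names `degree_counts` and `loglog_slope` are illustrative assumptions, not part of the talk, and a least-squares line on log-log axes is only a quick diagnostic, not a rigorous fit.

```python
import math
from collections import Counter

def degree_counts(edges):
    """Return {degree: number of nodes with that degree} for an undirected edge list."""
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return Counter(deg.values())

def loglog_slope(points):
    """Least-squares slope of log(y) against log(x) for (x, y) pairs with x, y > 0."""
    xs = [math.log(x) for x, _ in points]
    ys = [math.log(y) for _, y in points]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sxx

# Toy graph (hypothetical data): one hub, many low-degree nodes.
edges = [("hub", f"n{i}") for i in range(8)] + [("n0", "n1")]
counts = degree_counts(edges)
slope = loglog_slope(sorted(counts.items()))
print(dict(counts), slope)
```

A negative slope means that high-degree nodes are rare and low-degree nodes are common, which is the skewed shape the slide refers to.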
11
But
  • Q1: How about graphs from other domains?
  • Q2: How about temporal evolution?

12
The Peer-to-Peer Topology
[Jovanovic]
  • Frequency versus degree
  • Number of adjacent peers follows a power-law

13
More power laws
  • citation counts (citeseer.nj.nec.com 6/2001)

[Figure: log(count) versus log(citations); Ullman is labeled]

14
Swedish sex-web
Nodes: people (females, males). Links: sexual relationships.
[Albert Laszlo Barabasi, http://www.nd.edu/~networks/Publication%20Categories/04%20Talks/2005-norway-3hours.ppt]
4781 Swedes, ages 18-74; 59% response rate.
[Liljeros et al., Nature 2001]

15
More power laws
  • web hit counts [w/ A. Montgomery]

[Figure: Web Site Traffic; log(count) versus log(in-degree); labels: Zipf, ebay]

16
epinions.com
  • who-trusts-whom [Richardson & Domingos, KDD 2001]

[Figure: count versus (out) degree; the user who trusts 2000 people is labeled]

17
But
  • Q1: How about graphs from other domains?
  • Q2: How about temporal evolution?

18
Time evolution
  • with Jure Leskovec (CMU/MLD)
  • and Jon Kleinberg (Cornell, on sabbatical at CMU)

19
Evolution of the Diameter
  • Prior work on Power Law graphs hints at slowly
    growing diameter
  • diameter ~ O(log N)
  • diameter ~ O(log log N)
  • What is happening in real data?

20
Evolution of the Diameter
  • Prior work on Power Law graphs hints at slowly
    growing diameter
  • diameter ~ O(log N)
  • diameter ~ O(log log N)
  • What is happening in real data?
  • Diameter shrinks over time
  • As the network grows the distances between nodes
    slowly decrease
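To make "diameter" concrete, here is a minimal sketch that computes the exact diameter (the longest shortest path) of a small undirected graph with breadth-first search, and shows that adding edges can shrink it. The toy graphs and function names are assumptions for illustration; the measurements in the talk are on large real graphs and may use a smoothed "effective" diameter rather than this exact definition.

```python
from collections import deque

def bfs_distances(adj, source):
    """Hop distances from source to every node reachable from it."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def diameter(edges):
    """Largest shortest-path distance between reachable pairs of nodes.

    Assumes an undirected graph; unreachable pairs are ignored.
    """
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return max(max(bfs_distances(adj, s).values()) for s in adj)

# Hypothetical toy graphs: a path, then the same nodes with two extra edges.
path = [(0, 1), (1, 2), (2, 3), (3, 4)]
denser = path + [(0, 3), (1, 4)]
print(diameter(path), diameter(denser))
```

Running one BFS per node costs O(N * E), which is fine for a toy graph but too slow for graphs with millions of nodes, where sampling or approximation is typically used.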

21
Diameter: ArXiv citation graph
  • Citations among physics papers
  • 1992-2003
  • One graph per year

[Figure: diameter versus time (years)]

22
Diameter: Autonomous Systems
  • Graph of Internet
  • One graph per day
  • 1997-2000

[Figure: diameter versus number of nodes]

23
Diameter: Affiliation Network
  • Graph of collaborations in physics: authors
    linked to papers
  • 10 years of data

[Figure: diameter versus time (years)]

24
Diameter: Patents
  • Patent citation network
  • 25 years of data

[Figure: diameter versus time (years)]

25
Temporal Evolution of the Graphs
  • N(t): nodes at time t
  • E(t): edges at time t
  • Suppose that
  • N(t+1) = 2 * N(t)
  • Q: what is your guess for
  • E(t+1) =? 2 * E(t)

26
Temporal Evolution of the Graphs
  • N(t): nodes at time t
  • E(t): edges at time t
  • Suppose that
  • N(t+1) = 2 * N(t)
  • Q: what is your guess for
  • E(t+1) =? 2 * E(t)
  • A: over-doubled!
  • But obeying the Densification Power Law
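The Densification Power Law is usually stated as E(t) proportional to N(t)^a, with an exponent a between 1 and 2. Under that assumption, a can be estimated as the slope of log E against log N over a series of snapshots. The sketch below fits that slope on synthetic snapshots; the data are made up (generated with a = 1.5) and the function name `densification_exponent` is hypothetical.

```python
import math

def densification_exponent(snapshots):
    """Least-squares slope of log E against log N over (N, E) snapshots."""
    xs = [math.log(n) for n, _ in snapshots]
    ys = [math.log(e) for _, e in snapshots]
    k = len(xs)
    mx, my = sum(xs) / k, sum(ys) / k
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den

# Synthetic snapshots generated with exponent 1.5 (not the talk's data).
snapshots = [(n, n ** 1.5) for n in (100, 200, 400, 800)]
a = densification_exponent(snapshots)
growth = 2 ** a  # factor by which E grows when N doubles
print(a, growth)
```

Whenever a > 1, doubling the number of nodes multiplies the number of edges by 2^a > 2, which is the "over-doubled" answer on the slide.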

27
Densification: Physics Citations
  • Citations among physics papers
  • 2003:
  • 29,555 papers, 352,807 citations

[Figure: log-log plot of E(t) versus N(t); slope: ??]

28
Densification: Physics Citations
  • Citations among physics papers
  • 2003:
  • 29,555 papers, 352,807 citations

[Figure: log-log plot of E(t) versus N(t); slope 1.69]

29
Densification: Physics Citations
  • Citations among physics papers
  • 2003:
  • 29,555 papers, 352,807 citations

[Figure: log-log plot of E(t) versus N(t); slope 1.69; reference slope 1: tree]

30
Densification: Physics Citations
  • Citations among physics papers
  • 2003:
  • 29,555 papers, 352,807 citations

[Figure: log-log plot of E(t) versus N(t); slope 1.69; reference slope 2: clique]

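The two reference slopes, 1 for a tree and 2 for a clique, follow from simple edge counts. As a worked check (standard graph facts, not taken from the slides), the measured exponent 1.69 sits between the two extremes:

```latex
\begin{align*}
\text{tree:}\quad   & E = N - 1
    && \Rightarrow\ \log E \approx 1 \cdot \log N \\
\text{clique:}\quad & E = \frac{N(N-1)}{2}
    && \Rightarrow\ \log E \approx 2 \log N - \log 2 \\
\text{physics citations:}\quad & E(t) \propto N(t)^{1.69}
    && \Rightarrow\ 1 < 1.69 < 2
\end{align*}
```
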
31
Densification: Patent Citations
  • Citations among patents granted
  • 1999:
  • 2.9 million nodes
  • 16.5 million edges
  • Each year is a datapoint

[Figure: log-log plot of E(t) versus N(t); slope 1.66]

32
Densification: Autonomous Systems
  • Graph of Internet
  • 2000:
  • 6,000 nodes
  • 26,000 edges
  • One graph per day

[Figure: log-log plot of E(t) versus N(t); slope 1.18]

33
Densification: Affiliation Network
  • Authors linked to their publications
  • 2002:
  • 60,000 nodes
  • 20,000 authors
  • 38,000 papers
  • 133,000 edges

[Figure: log-log plot of E(t) versus N(t); slope 1.15]

34
Outline
  • Problem definition / Motivation
  • Graphs and power laws
  • Virus propagation
  • e-bay fraud detection
  • Conclusions

35
Virus propagation
  • How do viruses/rumors propagate?
  • Will a flu-like virus linger, or will it become
    extinct soon?

36
The model: SIS
  • 'Flu'-like: Susceptible-Infected-Susceptible
  • Virus 'strength' s = β/δ

[Figure: node N with neighbors N1, N2, N3; nodes are labeled Healthy or Infected]

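A discrete-time simulation is one way to get a feel for the SIS model. In the sketch below, each infected neighbor attacks a healthy node with probability `beta` per step, and each infected node recovers (becoming susceptible again) with probability `delta`. The synchronous update rule, the ring graph, and the function names are assumptions made for illustration, not details given in the talk.

```python
import random

def sis_step(adj, infected, beta, delta, rng):
    """One synchronous SIS step; returns the new set of infected nodes."""
    new_infected = set()
    for node in adj:
        if node in infected:
            # An infected node recovers with probability delta.
            if rng.random() >= delta:
                new_infected.add(node)
        else:
            # Each infected neighbor independently attacks with probability beta.
            for nb in adj[node]:
                if nb in infected and rng.random() < beta:
                    new_infected.add(node)
                    break
    return new_infected

def simulate(adj, seeds, beta, delta, steps, seed=0):
    """Return the number of infected nodes at each step, starting from seeds."""
    rng = random.Random(seed)
    infected = set(seeds)
    history = [len(infected)]
    for _ in range(steps):
        infected = sis_step(adj, infected, beta, delta, rng)
        history.append(len(infected))
    return history

# Toy ring of 20 nodes (hypothetical graph).
n = 20
adj = {i: [(i - 1) % n, (i + 1) % n] for i in range(n)}
print(simulate(adj, seeds=[0], beta=0.1, delta=0.9, steps=50))
```

Varying `beta` and `delta` while keeping the graph fixed shows the two regimes the slides ask about: the infection either dies out quickly or lingers.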
37
Epidemic threshold τ
  • of a graph: the value of τ, such that
  • if strength s = β/δ < τ
  • an epidemic cannot happen
  • Thus,
  • given a graph
  • compute its epidemic threshold

38
Epidemic threshold τ
  • What should τ depend on?
  • avg. degree? and/or highest degree?
  • and/or variance of degree?
  • and/or third moment of degree?
  • and/or diameter?

39
Epidemic threshold
  • Theorem: We have no epidemic, if

β/δ < τ = 1/λ_{1,A}
40
Epidemic threshold
  • Theorem: We have no epidemic, if

β/δ < τ = 1/λ_{1,A}
  • β: attack prob.
  • δ: recovery prob.
  • τ: epidemic threshold
  • λ_{1,A}: largest eigenvalue of adj. matrix A
Proof: [Wang03]
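The threshold in the theorem can be computed numerically once the largest eigenvalue of the adjacency matrix is known. The sketch below estimates that eigenvalue with power iteration for an undirected, connected graph given as an adjacency dictionary, then compares the attack/recovery ratio against 1 over the eigenvalue. The function names, the star graph, and the identity shift (used only to keep power iteration from oscillating on bipartite graphs) are illustrative assumptions, not the talk's implementation.

```python
import math

def largest_eigenvalue(adj, iterations=1000):
    """Estimate the largest adjacency eigenvalue by power iteration on A + I.

    Assumes an undirected, connected graph. The identity shift is
    subtracted from the result before returning.
    """
    nodes = list(adj)
    norm0 = math.sqrt(len(nodes))
    x = {u: 1.0 / norm0 for u in nodes}  # unit-norm starting vector
    lam = 0.0
    for _ in range(iterations):
        y = {u: x[u] + sum(x[v] for v in adj[u]) for u in nodes}  # (A + I) x
        lam = sum(x[u] * y[u] for u in nodes)  # Rayleigh quotient for A + I
        norm = math.sqrt(sum(val * val for val in y.values()))
        x = {u: y[u] / norm for u in nodes}
    return lam - 1.0

def epidemic_threshold(adj):
    """Threshold from the theorem: 1 / (largest eigenvalue of A)."""
    return 1.0 / largest_eigenvalue(adj)

def no_epidemic(beta, delta, adj):
    """True when the strength beta/delta is below the threshold."""
    return beta / delta < epidemic_threshold(adj)

# Hypothetical star graph: one hub connected to four leaves.
star = {"hub": ["a", "b", "c", "d"],
        "a": ["hub"], "b": ["hub"], "c": ["hub"], "d": ["hub"]}
tau = epidemic_threshold(star)
print(tau, no_epidemic(0.1, 0.5, star), no_epidemic(0.4, 0.5, star))
```

For large sparse graphs, a sparse eigenvalue solver from a numerical library would normally replace this hand-written loop.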
41
Experiments (Oregon)
  • β/δ > τ (above threshold)
  • β/δ = τ (at the threshold)
  • β/δ < τ (below threshold)

42
Outline
  • Problem definition / Motivation
  • Graphs and power laws
  • Virus propagation
  • e-bay fraud detection
  • Conclusions

43
E-bay Fraud detection
w/ Polo Chau, CMU
44
E-bay Fraud detection - NetProbe
45
Conclusions
  • Graphs pose fascinating problems
  • self-similarity/fractals and power laws work,
    when textbook methods fail!
  • Need ML/AI, Stat, NA, DB (Gb/Tb), Systems
    (Networks), sociology,

46
Contact info
  • christos@cs.cmu.edu
  • www.cs.cmu.edu/~christos

THANK YOU!