1
Graph Mining in Social Network Analysis
  • Student: Dušan Ristić
  • Professor: Veljko Milutinović
2
Graphs
  • A graph G = (V, E) is a set of vertices V and a
    (possibly empty) set E of pairs of vertices
    e1 = (v1, v2), where e1 ∈ E and v1, v2 ∈ V.
  • Edges may carry weights or labels and may be
    directed
  • Nodes may contain additional labels
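
The definition above can be sketched as a small data structure. This is a minimal illustration only: the class name `Graph`, its fields, and the sample vertices are assumptions made for this example and do not come from the slides.

```python
from dataclasses import dataclass, field


@dataclass
class Graph:
    """Labeled, optionally weighted graph stored as adjacency maps (illustrative)."""
    directed: bool = False
    labels: dict = field(default_factory=dict)  # vertex -> set of labels
    adj: dict = field(default_factory=dict)     # vertex -> {neighbor: weight}

    def add_vertex(self, v, labels=()):
        # Register the vertex and merge any labels given for it.
        self.labels.setdefault(v, set()).update(labels)
        self.adj.setdefault(v, {})

    def add_edge(self, u, v, weight=1.0):
        # Endpoints are created on demand; undirected edges are stored both ways.
        self.add_vertex(u)
        self.add_vertex(v)
        self.adj[u][v] = weight
        if not self.directed:
            self.adj[v][u] = weight

    def edges(self):
        """Return the edge set E as (u, v, weight) triples, one per edge."""
        seen, out = set(), []
        for u, nbrs in self.adj.items():
            for v, w in nbrs.items():
                key = (u, v) if self.directed else frozenset((u, v))
                if key not in seen:
                    seen.add(key)
                    out.append((u, v, w))
        return out


# Hypothetical sample graph: three vertices, two edges, one weighted.
g = Graph()
g.add_vertex("v1", {"a"})
g.add_vertex("v2", {"b", "c"})
g.add_edge("v1", "v2")
g.add_edge("v2", "v3", weight=2.0)
```

An adjacency map keeps neighbor lookups cheap, which matters for the neighborhood-based measures on the later slides.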

3
Motivation
  • Many domains produce data that can be
    intuitively represented by graphs.
  • We would like to discover interesting facts and
    information about these graphs.
  • Real world graph datasets are too large for
    humans to make any sense of.
  • We would like to discover patterns that have
    structural information present.

4
Application Domains
  • Web structure graphs
  • Social networks
  • Protein interaction networks
  • Chemical compounds
  • Program flow
  • Transportation networks

5
Social Network Analysis
  • Network (graph)
  • Nodes: things (people, places, etc.)
  • Edges: relationships (friends, visited, etc.)

6
Existing solutions
  • Existing graph pattern mining algorithms
      • FSG
      • gSpan
  • Problems
      • Fuzzy patterns
      • Too many frequent subgraphs generated for large
        graphs

7
Proximity Pattern
  • Overcomes the two problems of existing
    frequent-subgraph algorithms
  • A subset of labels that repeats frequently in
    tightly connected subgraphs

8
Example 1
Proximity pattern: {a, b, c}
9
Definitions
  • Definition 1 (Support): The support sup(I) of an
    itemset I ⊆ L (the label set) is the number of
    transactions in the data set that contain I.
  • Definition 2 (Downward Closure): For a frequent
    itemset, all of its subsets are frequent; thus,
    for an infrequent itemset, all of its supersets
    must be infrequent.
  • Definition 3 (Embedding and Mapping): Given a
    label subset I, a subset of vertices π is called
    an embedding of I if I ⊆ L(π). A mapping φ between
    I and the vertices in π is a function φ: I → π.

{v1, v2, v3} is an embedding of {l1, l2, l5}
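
The three definitions can be tried out in a few lines of Python. This is a sketch under stated assumptions: the label set of a vertex subset is taken to be the union of its vertices' labels, a mapping is required to send each label to a vertex that carries it, and the transactions and the label assignment of v1..v4 are invented for illustration (the slide's figure is not part of the transcript).

```python
from itertools import combinations


def support(itemset, transactions):
    """Definition 1: number of transactions that contain every item of itemset."""
    items = set(itemset)
    return sum(1 for t in transactions if items <= set(t))


def labels_of(vertices, vertex_labels):
    """Label set of a vertex subset: union of its vertices' labels (assumed reading)."""
    covered = set()
    for v in vertices:
        covered |= vertex_labels[v]
    return covered


def is_embedding(vertices, itemset, vertex_labels):
    """Definition 3: the vertex subset embeds the itemset if its labels cover it."""
    return set(itemset) <= labels_of(vertices, vertex_labels)


def find_mapping(itemset, vertices, vertex_labels):
    """Map each label to one vertex carrying it, or return None if a label is missing."""
    mapping = {}
    for label in itemset:
        carriers = [v for v in sorted(vertices) if label in vertex_labels[v]]
        if not carriers:
            return None
        mapping[label] = carriers[0]
    return mapping


# Hypothetical transactions for Definitions 1 and 2.
transactions = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "d"}]
sup_ab = support({"a", "b"}, transactions)
sup_abc = support({"a", "b", "c"}, transactions)

# Definition 2: every proper subset is at least as frequent as its superset.
closure_holds = all(
    support(sub, transactions) >= sup_abc
    for k in range(1, 3)
    for sub in combinations(["a", "b", "c"], k)
)

# Hypothetical label assignment for the embedding example.
vertex_labels = {"v1": {"l1"}, "v2": {"l2", "l3"}, "v3": {"l5"}, "v4": {"l4"}}
vertex_subset = {"v1", "v2", "v3"}
itemset = {"l1", "l2", "l5"}
mapping = find_mapping(itemset, vertex_subset, vertex_labels)
```

With this data, the vertex subset {v1, v2, v3} passes `is_embedding` for {l1, l2, l5}, while {v1, v4} does not.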
10
Neighbor Association Model
  • Find all the embeddings π1, π2, ..., πm of an
    itemset I in the graph
  • For each embedding π, measure its strength f(π)
  • Aggregate the strengths of the embeddings and take
    F(I) as the support of I
  • Create an overlapping graph with embeddings as
    nodes

Figure: overlapping graph
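
The steps above can be sketched with a brute-force enumeration on a toy graph. Several things here are assumptions rather than the method from the slides: the transcript does not define the strength function, so `1 / len(embedding)` is a hypothetical placeholder; embeddings are limited to at most |I| vertices; two embeddings are treated as overlapping when they share a vertex; and the aggregation is a naive sum that ignores the overlap.

```python
from itertools import combinations


def embeddings(itemset, vertex_labels):
    """Brute force: every vertex subset of size <= |I| whose labels cover I."""
    items = set(itemset)
    found = []
    for k in range(1, len(items) + 1):
        for subset in combinations(sorted(vertex_labels), k):
            covered = set().union(*(vertex_labels[v] for v in subset))
            if items <= covered:
                found.append(frozenset(subset))
    return found


def overlap_graph(embs):
    """Nodes are embedding indices; an edge joins two embeddings sharing a vertex."""
    return {
        (i, j)
        for i, j in combinations(range(len(embs)), 2)
        if embs[i] & embs[j]
    }


def aggregate(embs, strength):
    """Naive aggregate: sum of the strengths (overlap is not corrected for)."""
    return sum(strength(e) for e in embs)


# Hypothetical toy data: three labeled vertices.
vertex_labels = {"v1": {"a"}, "v2": {"b"}, "v3": {"a", "b"}}
embs = embeddings({"a", "b"}, vertex_labels)
overlaps = overlap_graph(embs)

# Placeholder strength: smaller embeddings count as stronger (an assumption).
total = aggregate(embs, lambda e: 1.0 / len(e))
```

Even this tiny graph yields four embeddings with five overlapping pairs, which hints at why a plain sum over-counts and why the overlapping graph is needed.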
11
Information Propagation Model
  • G0 → G1 → . . . → Gn
  • G0: starting graph
  • Gn: stable graph
  • G: stable graph approximation
  • The probability of observing L(u) and l is
    P(L ∧ l) = P(L|l) P(l), where
      • P(l) is the probability of l in u's neighbors
      • P(L|l) is the probability that l is
        successfully propagated to u
  • For multiple labels l1, l2, . . . , lm:
    P(L ∧ l1, l2, . . . , lm) = P(L|l1) . . .
    P(L|lm) P(l1) . . . P(lm)
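
The multi-label formula is a product of per-label terms, which a short function makes concrete. The probability values below are invented for illustration, and the function and parameter names are assumptions for this sketch.

```python
import math


def joint_probability(p_propagate, p_present):
    """Product over labels of P(L|l) * P(l), treating labels as independent.

    p_propagate[l] is the probability that label l is propagated to u;
    p_present[l] is the probability of l among u's neighbors.
    """
    return math.prod(p_propagate[l] * p_present[l] for l in p_present)


# Hypothetical probabilities for two labels.
p_present = {"a": 0.5, "b": 0.25}
p_propagate = {"a": 0.8, "b": 0.4}
joint = joint_probability(p_propagate, p_present)  # (0.8 * 0.5) * (0.4 * 0.25)
```

Because every factor is at most 1, adding a label can never increase the joint probability.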

12
Nearest Probabilistic Association
  • Definition 4: A_u(l) = P(L(u)|l) = e^(-α·d), where
    l is a label present in a vertex v closest to u,
    d is the distance from v to u (1 for an
    unweighted graph), and α is the decay constant
    (α > 0).
  • Probabilistic support of I
  • If I = {l1, l2, . . . , lm} and J = {l1, l2, . . . ,
    lm, lm+1, . . . , ln}, then, since I ⊆ J,
    sup(I) ≥ sup(J)
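
Definition 4 can be computed with a shortest-path search from each vertex. The sketch below rests on assumptions: a label carried by u itself has distance 0, an unreachable label has association 0, and, because the slide's support formula is missing from the transcript, the support is taken to be the average over all vertices of the product of per-label associations. The function names and the three-vertex path graph are illustrative.

```python
import heapq
import math


def nearest_label_distances(adj, vertex_labels, source):
    """Dijkstra from source: {label: distance to the closest vertex carrying it}."""
    dist = {source: 0.0}
    best = {}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, math.inf):
            continue  # stale heap entry
        for label in vertex_labels.get(u, ()):
            best.setdefault(label, d)  # first settled carrier is the nearest one
        for v, w in adj.get(u, {}).items():
            nd = d + w
            if nd < dist.get(v, math.inf):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return best


def npa(adj, vertex_labels, u, label, alpha=1.0):
    """A_u(l) = exp(-alpha * d); 0 if no vertex carrying the label is reachable."""
    d = nearest_label_distances(adj, vertex_labels, u).get(label)
    return 0.0 if d is None else math.exp(-alpha * d)


def probabilistic_support(adj, vertex_labels, itemset, alpha=1.0):
    """Assumed form: mean over vertices of the product of per-label associations."""
    total = 0.0
    for u in adj:
        near = nearest_label_distances(adj, vertex_labels, u)
        total += math.prod(
            math.exp(-alpha * near[l]) if l in near else 0.0 for l in itemset
        )
    return total / len(adj)


# Hypothetical unweighted path v1 - v2 - v3 (every edge has length 1).
adj = {"v1": {"v2": 1.0}, "v2": {"v1": 1.0, "v3": 1.0}, "v3": {"v2": 1.0}}
vertex_labels = {"v1": {"a"}, "v2": {"b"}, "v3": set()}

a_v3 = npa(adj, vertex_labels, "v3", "a")  # label a is two hops away from v3
sup_a = probabilistic_support(adj, vertex_labels, {"a"})
sup_ab = probabilistic_support(adj, vertex_labels, {"a", "b"})
```

Under these assumptions each factor lies in [0, 1], so enlarging the itemset can only lower the support, which matches the downward closure property.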

13
(No Transcript)
14
Normalized Probabilistic Association (Improved NPA)
  • Normalized Probabilistic Association of label l
    at vertex u
  • NMPA can break ties when
      • two vertices have the same number of neighbors
      • two vertices have a different number of
        neighbors but the same number of neighbors
        having label l
  • The update rule of Algorithm 1 is changed
  • The normalizing factor gives more association
    strength to labels that are contained by many
    neighbors.
  • The complexity is the same as for NPA; however,
    propagation decays faster due to normalization.

15
Conclusion
  • As social networks grow in popularity, graph
    data mining will be needed more than ever
  • Probabilistic itemset mining discovers a new kind
    of pattern

16
Literature
  • http://cs.ucsb.edu/~xyan/papers/sigmod10-association.pdf
  • A. Khan, X. Yan, and K.-L. Wu. Towards proximity
    pattern mining in large graphs. In SIGMOD, 2010.
  • C. C. Aggarwal, Y. Li, J. Wang, and J. Wang.
    Frequent pattern mining with uncertain data. In
    KDD, 2009.
  • F. Verhein. Frequent Pattern Growth (FP-Growth)
    Algorithm: An Introduction. Presentation.
  • J. Han, J. Pei, and Y. Yin. Mining frequent
    patterns without candidate generation. In SIGMOD,
    2000.

17
Questions?
  • Dušan Ristić, 3020/2014
  • E-mail: rd143020m_at_etf.rs