Link Prediction Using Supervised Learning - PowerPoint PPT Presentation

1
Link Prediction Using Supervised Learning
  • Presented by Mohammad Hasan
  • Other members: Vineet Chaoji, Saeed Salem, Dr. Mohammed Zaki
  • Rensselaer Polytechnic Institute
2
Presentation Outline
  • Motivation of the work
  • Problem Definition
  • Dataset Preparation
  • Results
  • Discussion on the results
  • Acknowledgements

3
Motivation
  • Predicting future associations in a social
    network
  • Finding missing members from hidden groups
  • Modeling association patterns over time
  • Finding the forces that drive social associations

4
Problem Definition
  • Link Prediction in a social group
  • u and v are two members
  • u and v are not currently associated
  • What is the likelihood that they will associate
    in the near future?

5
Dataset
  • Coauthorship Data
    • A true social network of scientists from a
      specific subject area
    • The network is formed by
      • Institutional affiliations
      • Skill and interest overlaps
      • Potential for mutual benefit
    • It is known to exhibit
      • The small-world phenomenon
      • A power-law distribution
    • Availability
      • Publicly available
      • Very small risk of privacy invasion
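Checking properties such as the power-law distribution on a concrete dataset starts with building the coauthorship graph and tabulating node degrees. The following is a minimal Python sketch, assuming each paper is given as a plain list of author names; the function names are hypothetical and not from the presentation. On real data, a power law would show up as a roughly straight line when this histogram is plotted on log-log axes.

```python
from collections import Counter
from itertools import combinations


def build_coauthor_graph(papers):
    """Build an undirected coauthorship graph.

    papers: iterable of author-name lists, one list per paper (assumed format).
    Returns a dict mapping each author to the set of their coauthors.
    """
    graph = {}
    for authors in papers:
        unique = sorted(set(authors))
        for a in unique:
            graph.setdefault(a, set())  # single-author papers still add a node
        for a, b in combinations(unique, 2):
            graph[a].add(b)
            graph[b].add(a)
    return graph


def degree_histogram(graph):
    """Map each degree k to the number of authors with exactly k coauthors."""
    return Counter(len(coauthors) for coauthors in graph.values())


papers = [["A", "B"], ["A", "C"], ["A", "D"], ["B", "C"], ["E"]]
g = build_coauthor_graph(papers)
print(sorted(degree_histogram(g).items()))  # [(0, 1), (1, 1), (2, 2), (3, 1)]
```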

6
Our Datasets
  • BIOBASE Dataset
    • Scientific publication data in the area of life
      sciences
    • Experiments use the years 1998-2002
    • Not publicly available
  • DBLP Dataset
    • Scientific publication data in computer science
    • Experiments use the years 1985-2002
    • Publicly available

7
Experimental Steps
  • Prepare Labeled Dataset for Supervised Learning
    • A binary classification problem: will form a
      link / will not form a link
  • Compute feature values for the instance vectors
  • Use classification algorithms to predict links
  • Evaluate and justify
    • Features
    • Classification algorithms

8
Preparing Labeled Datasets
  • Finding Training Instances
    • Define the observation (training) years
    • Randomly select a pair of scientists, u and v,
      using only these years' data
    • The pair becomes a training instance if u and v
      did not associate (coauthor) in these years
  • Assigning Class Labels
    • Define test years that chronologically follow
      the training years
    • If u and v associate in the test years, the
      class label is 1
    • Otherwise, the label is -1
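The labeling procedure above can be sketched in Python as follows. This is a minimal illustration, assuming papers arrive as (year, author list) tuples and that "association" means coauthoring at least one paper; the function names and the fixed random seed are hypothetical choices, not taken from the presentation.

```python
import random
from itertools import combinations


def coauthor_pairs(papers, years):
    """Set of unordered (a, b) author pairs, a < b, who coauthored in `years`.

    papers: list of (year, [author, ...]) tuples (assumed input format).
    """
    pairs = set()
    for year, authors in papers:
        if year in years:
            pairs.update(combinations(sorted(set(authors)), 2))
    return pairs


def make_labeled_pairs(papers, train_years, test_years, n_samples, seed=0):
    """Sample (u, v, label) instances.

    A pair qualifies only if both authors appear in the training years and
    did not coauthor then; the label is 1 if they coauthor in the test
    years, otherwise -1.
    """
    train_links = coauthor_pairs(papers, train_years)
    test_links = coauthor_pairs(papers, test_years)
    train_authors = sorted(
        {a for y, authors in papers if y in train_years for a in authors}
    )
    candidates = [p for p in combinations(train_authors, 2) if p not in train_links]
    rng = random.Random(seed)
    sample = rng.sample(candidates, min(n_samples, len(candidates)))
    return [(u, v, 1 if (u, v) in test_links else -1) for u, v in sample]


papers = [(1998, ["A", "B"]), (1999, ["B", "C"]), (2000, ["D"]),
          (2001, ["A", "C"]), (2002, ["C", "D"])]
instances = make_labeled_pairs(papers, range(1998, 2001), range(2001, 2003), 10)
print(sorted(instances))
# [('A', 'C', 1), ('A', 'D', -1), ('B', 'D', -1), ('C', 'D', 1)]
```

Enumerating every candidate pair is quadratic in the number of authors, so for collections the size of DBLP one would more likely draw random author pairs directly and reject those that coauthored in the training years.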

9
Finding Features
  • We are predicting links in a social graph
  • Links are graph edges
  • Social graphs exhibit the small-world phenomenon,
    the popular "six degrees of separation"
  • So, naturally, a smaller shortest-path distance
    between u and v suggests a higher chance of future
    association
  • Hence, graph distance is a good feature
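The shortest-distance feature can be computed with a breadth-first search over the training-years graph. A minimal Python sketch, assuming the graph is a dict from each author to the set of their coauthors; the function name is hypothetical. Because training instances are pairs that did not associate in the training years, the distance for any valid instance is at least 2.

```python
from collections import deque


def shortest_distance(graph, u, v):
    """Hop distance between u and v found by breadth-first search.

    graph: dict mapping each node to an iterable of its neighbors
    (assumed to be the training-years coauthorship graph).
    Returns float("inf") when v is unreachable from u.
    """
    if u == v:
        return 0
    seen = {u}
    frontier = deque([(u, 0)])
    while frontier:
        node, dist = frontier.popleft()
        for nbr in graph.get(node, ()):
            if nbr == v:
                return dist + 1
            if nbr not in seen:
                seen.add(nbr)
                frontier.append((nbr, dist + 1))
    return float("inf")


graph = {"A": {"B"}, "B": {"A", "C"}, "C": {"B", "D"}, "D": {"C"}, "E": set()}
print(shortest_distance(graph, "A", "D"))  # 3
print(shortest_distance(graph, "A", "E"))  # inf
```

Disconnected pairs yield infinity, which many classifiers cannot consume directly; mapping it to a large finite value is one possible workaround, not something the presentation specifies.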

10
Finding Features (2)
  • Those who are more connected are more likely to
    make new associations.
  • More connected people have
    • Many publications
    • Many neighbors
  • The number of publications or the number of direct
    neighbors can be a feature. As expected, these two
    features are highly correlated.
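Both counts are properties of a single author, so a pair-level feature needs an aggregation rule. The Python sketch below assumes the two authors' values are summed; that rule, the input format (one author-name list per training-years paper), and the function names are all assumptions for illustration.

```python
from collections import Counter


def author_stats(papers):
    """Per-author paper counts and coauthor sets from training-years papers.

    papers: list of author-name lists, one per paper (assumed format).
    """
    paper_count = Counter()
    neighbors = {}
    for authors in papers:
        unique = set(authors)
        for a in unique:
            paper_count[a] += 1
            neighbors.setdefault(a, set()).update(unique - {a})
    return paper_count, neighbors


def pair_count_features(stats, u, v):
    """(sum of paper counts, sum of neighbor counts) for the pair u, v."""
    paper_count, neighbors = stats
    return (paper_count[u] + paper_count[v],
            len(neighbors.get(u, ())) + len(neighbors.get(v, ())))


stats = author_stats([["A", "B"], ["A", "C"], ["A", "B"], ["D"]])
print(pair_count_features(stats, "A", "D"))  # (4, 2)
print(pair_count_features(stats, "B", "C"))  # (3, 2)
```

Computing the statistics once and reusing them for every sampled pair avoids rescanning the papers per instance.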

11
Finding Features (3)
  • People associate for mutual benefit.
  • That requires identical or complementary skills
    and interests.
  • For the identical case, overlap of interests is a
    good feature.
  • In our datasets, complementary skills did not turn
    out to be a good feature.
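The presentation does not say how interests are represented. One simple proxy, assumed here, is the set of words in the titles of an author's training-years papers, with the keyword-match feature being the size of the overlap between two such sets. The tokenizer, the stopword list, and the function names below are illustrative assumptions.

```python
import re

# Illustrative stopword list; a real one would be much longer.
STOPWORDS = {"a", "an", "and", "for", "in", "of", "on", "the", "to", "with", "using"}


def author_keywords(titles):
    """Lowercase alphabetic words from paper titles, minus stopwords."""
    words = set()
    for title in titles:
        words.update(w for w in re.findall(r"[a-z]+", title.lower())
                     if w not in STOPWORDS)
    return words


def keyword_match_count(titles_u, titles_v):
    """Number of distinct keywords shared by two authors."""
    return len(author_keywords(titles_u) & author_keywords(titles_v))


u_titles = ["Link Prediction Using Supervised Learning"]
v_titles = ["Supervised Learning for Graphs", "Mining Social Networks"]
print(keyword_match_count(u_titles, v_titles))  # 2 (supervised, learning)
```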

12
Feature Categories
13
Feature Evaluation
  • Class density distribution approach
  • Visualization approach
  • ROC approach
  • Single feature classification performance
  • Accuracy
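The slides do not describe how single-feature classification accuracy is measured. One simple possibility, sketched here as an assumption, is to threshold a single feature and report the best accuracy over all candidate thresholds, trying both directions since some features (such as shortest distance) are inversely related to the label. The function name is hypothetical.

```python
def best_threshold_accuracy(values, labels):
    """Best accuracy of a one-feature threshold rule.

    Rule A predicts 1 when value >= t; rule B predicts 1 when value < t.
    Rule B is correct exactly where rule A is wrong, so its accuracy is
    (n - correct_A) / n. labels use 1 / -1 as in the labeled dataset.
    """
    n = len(values)
    best = 0.0
    for t in sorted(set(values)):
        correct_a = sum(1 for x, y in zip(values, labels) if (x >= t) == (y == 1))
        best = max(best, correct_a / n, (n - correct_a) / n)
    return best


# Shortest-distance values: small distances belong to the positive class.
print(best_threshold_accuracy([2, 2, 3, 4, 5, 6], [1, 1, 1, -1, -1, -1]))  # 1.0
```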

14
Class density distributions of features
[Figure panels: Keyword-match, Shortest-distance, Neighbor-count, Second shortest distance]
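A class density plot compares how a feature's values are distributed within each class; the less the two distributions overlap, the more useful the feature. A minimal Python sketch of the underlying computation, assuming a fixed bin width (a hypothetical parameter) and 1 / -1 labels:

```python
import math
from collections import Counter


def class_density(values, labels, bin_width=1.0):
    """Per-class normalized histogram of one feature.

    Returns {label: {bin_start: fraction_of_that_class}}.
    """
    density = {}
    for label in sorted(set(labels)):
        xs = [x for x, y in zip(values, labels) if y == label]
        counts = Counter(math.floor(x / bin_width) * bin_width for x in xs)
        density[label] = {b: c / len(xs) for b, c in sorted(counts.items())}
    return density


d = class_density([2, 2, 3, 4, 5, 6], [1, 1, 1, -1, -1, -1])
print(d[1])   # fractions of class 1 falling in bins starting at 2.0 and 3.0
print(d[-1])  # fractions of class -1 falling in bins starting at 4.0, 5.0, 6.0
```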
15
Computing feature ROC by sliding a bar
[Figure: distributions of class 1 and class -1 instances over the feature values, with a sliding threshold bar; annotation: "Precision increases, Recall decreases"]
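Sliding the bar over a feature's values and recording the true-positive and false-positive rates at each position traces that feature's ROC curve. A minimal Python sketch, assuming larger values indicate the positive class (negate features like shortest distance first) and that both classes are present; the function names are hypothetical.

```python
def roc_points(values, labels):
    """(FPR, TPR) pairs from sliding a threshold bar over one feature.

    Instances with value >= threshold are predicted positive (label 1).
    Assumes labels contain at least one 1 and at least one -1.
    """
    pos = sum(1 for y in labels if y == 1)
    neg = len(labels) - pos
    points = [(0.0, 0.0)]
    for t in sorted(set(values), reverse=True):
        tp = sum(1 for x, y in zip(values, labels) if x >= t and y == 1)
        fp = sum(1 for x, y in zip(values, labels) if x >= t and y != 1)
        points.append((fp / neg, tp / pos))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


pts = roc_points([0.9, 0.8, 0.7, 0.6], [1, 1, -1, 1])
print(pts)
print(auc(pts))
```

Moving the bar toward higher values predicts fewer positives, so recall (the TPR) can only fall, while precision typically rises, which matches the annotation on the slide.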