Title: A Study of social influence in diffusion of innovation over Facebook
1A Study of social influence in diffusion of
innovation over Facebook
- Shaomei Wu
- sw475_at_cornell.edu
- Information Science
- Cornell University
- Information Science Breakfast, Dec 5, 2008
2Diffusion of Innovation
- "Diffusion is the process in which an innovation is communicated through certain channels over time among the members of a social system." (Everett M. Rogers)
- Innovation: Friendship Quiz, a Facebook application
- Communicated: invitations among Facebook friends
- Time: September 25, 2008 to now
- Social system: Facebook
Rogers, Everett M. (2003). Diffusion of Innovations, 5th ed. New York, NY: Free Press, pp. 5-6.
3Basic Diffusion Models
Threshold Model
Cascade Model
?
Statistically Equivalent
David Kempe, Jon Kleinberg, Eva Tardos.
Maximizing the Spread of Influence through a
Social Network. KDD, 2003
4Cascade Model
- Each recommendation succeeds with a certain probability.
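[Figure: a network of individuals a through l. Each recommendation edge is labeled with its success probability p_uv (p_ab, p_ac, p_ad, p_ae, p_af, p_ag, p_gk, p_gl, p_di, p_dj). Legend: non-adopter, adopter, social link, recommendation.]
Question: how to estimate p_uv?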
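The cascade process on this slide can be sketched as a small simulation. This is a minimal illustration, not the talk's implementation: the edge directions are assumed from the label order (p_ab read as a recommendation from a to b), and every probability value below is made up for the example.

```python
import random

def simulate_cascade(p, seeds, rng):
    """Run one independent-cascade simulation.

    p maps each directed recommendation edge (u, v) to its success
    probability p_uv; seeds are the initial adopters.
    """
    adopters = set(seeds)
    frontier = list(seeds)
    while frontier:
        next_frontier = []
        for u in frontier:
            for (src, v), p_uv in p.items():
                # Each new adopter gets exactly one attempt per neighbor.
                if src == u and v not in adopters and rng.random() < p_uv:
                    adopters.add(v)
                    next_frontier.append(v)
        frontier = next_frontier
    return adopters

# Hypothetical probabilities on the edges named in the figure.
p = {("a", "b"): 0.5, ("a", "c"): 0.2, ("a", "d"): 0.9, ("a", "e"): 0.1,
     ("a", "f"): 0.1, ("a", "g"): 0.8, ("g", "k"): 0.3, ("g", "l"): 0.3,
     ("d", "i"): 0.4, ("d", "j"): 0.4}

rng = random.Random(0)
runs = [simulate_cascade(p, {"a"}, rng) for _ in range(2000)]
avg_size = sum(len(r) for r in runs) / len(runs)
print(f"average cascade size from seed a: {avg_size:.2f}")
```

Averaging over many runs estimates the expected spread for a given set of p_uv values, which is why the estimate of each p_uv matters for any prediction made with this model.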
5Question: how to estimate p_uv?
- Current practice
- Constant [1]
- Based on ONLY network structure (e.g., in/out-degree) [2]
Do individuals and the social relationship among them matter?
[1] Jure Leskovec, Mary McGlohon, Christos Faloutsos, Natalie Glance, Matthew Hurst. Cascading Behavior in Large Blog Graphs. SDM 2007.
[2] Jure Leskovec, Lada Adamic, Bernardo Huberman. The Dynamics of Viral Marketing. ACM Conference on Electronic Commerce (EC) 2006.
6Theories from Empirical Diffusion Research
- Opinion leaders have greater exposure to mass media than their followers, are more cosmopolite, have greater social participation, have higher socioeconomic status, and are more innovative (Rogers 2003, pp. 316-318).
- Heterophily between participants on certain attributes (i.e., education and socioeconomic status) is important in determining the efficiency of diffusion, despite the fact that more effective communication occurs when two or more individuals are homophilous (Rogers 2003, p. 19).
7This project is to
- Model p_uv for the cascade model
- Identify the most influential factors in determining p_uv
- Predict the success of contagion
- Exploit Facebook data
- A real-world, ongoing diffusion instance
- Rich and (most of the time) trustworthy profile information of individuals and their social connections/activities
- Precisely timestamped diffusion process, a complete log of events
8Status
- Launched Sep 25, 2008.
- Data currently in use runs through Nov 25, 2008.
- 216 adopters,
- 375 individuals,
- 737 edges between 266 pairs of people,
- 90 successful infections
- 178 failed infections
- Network Evolution (in the first month after
release)
10Predict the success of invitation with SVM
- A binary classifier
- Each invitation is either successful or failed.
- Features
- Individual features
- Pair features (homophily/heterophily)
11Individual Features
- Number of events attended/invited
- Number of photos tagged
- Number of wall posts
- Number of networks
- Number of groups participated in
- Number of notes
- Religion
- Political view
- Gender
- Age
- Cultural background
- Relationship status
- Work info
- Education info
Feature categories: Social Activeness, Innovativeness, Socioeconomics, Education
12Pair-wise Features
- Age difference
- Same gender?
- Same political view?
- Same religion?
- Same cultural background?
- Number of shared networks
- Number of photos both tagged in
- Number of groups both participated in
- Number of events both attended
- Same education level?
- Same high school?
- Same college?
- Same workplace?
- Same current city?
Feature categories: Biological traits, Belief, Socioeconomics, Proximity
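As a rough illustration of how such pair-wise features might be derived, the sketch below compares two profile records. The dictionary layout and key names (`age`, `gender`, `networks`, and so on) are hypothetical and are not the Facebook API or the talk's actual schema; only a subset of the listed features is shown.

```python
def pair_features(sender, receiver):
    """Build homophily/heterophily features for one invitation."""
    def same(key):
        # 1 if both profiles list the same non-missing value, else 0.
        a, b = sender.get(key), receiver.get(key)
        return int(a is not None and a == b)

    def shared(key):
        # Size of the overlap between two collections of ids.
        return len(set(sender.get(key, ())) & set(receiver.get(key, ())))

    return {
        "age_difference": abs(sender["age"] - receiver["age"]),
        "same_gender": same("gender"),
        "same_political_view": same("political_view"),
        "same_religion": same("religion"),
        "same_current_city": same("current_city"),
        "common_network_count": shared("networks"),
        "common_group_count": shared("groups"),
        "common_event_count": shared("events"),
    }

# Made-up profiles for illustration only.
sender = {"age": 24, "gender": "female", "political_view": "liberal",
          "current_city": "Ithaca", "networks": {"Cornell"},
          "groups": {"g1", "g2", "g3"}, "events": {"e1"}}
receiver = {"age": 27, "gender": "male", "political_view": "liberal",
            "current_city": "Ithaca", "networks": {"Cornell", "NYC"},
            "groups": {"g2", "g3"}, "events": set()}
print(pair_features(sender, receiver))
```

Treating a missing attribute as "not the same" is one possible design choice; a separate "unknown" indicator would be another.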
13Each invitation is a training example - machine
learning.
Training Data
All numerical features are normalized across examples.
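The slide does not say which normalization was applied. As one common possibility, the sketch below uses min-max scaling, which rescales each numerical feature column to [0, 1] across all examples; the sample rows are invented.

```python
def min_max_normalize(rows):
    """Rescale each numerical column of `rows` to the range [0, 1]."""
    columns = list(zip(*rows))
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]
    normalized = []
    for row in rows:
        normalized.append([
            # A constant column carries no information; map it to 0.
            (x - lo) / (hi - lo) if hi > lo else 0.0
            for x, lo, hi in zip(row, lows, highs)
        ])
    return normalized

# Hypothetical examples: [wall post count, friend count, network count].
rows = [[12, 340, 3], [0, 85, 3], [4, 120, 3]]
for row in min_max_normalize(rows):
    print([round(x, 3) for x in row])
```

Putting features on a shared scale keeps large-valued counts (such as friend counts) from dominating smaller ones when a margin-based classifier like an SVM is trained.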
14AdaBoost (with DecisionStump): a popular way to do feature selection
- Selected Features
- sender wall post count
- sender group count
- sender network count
- receiver age
- receiver group count
- sender-receiver common group count
- Performance (10-fold cross-validation)
- Accuracy: 83.6%
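To show why boosting decision stumps doubles as feature selection, here is a from-scratch sketch of AdaBoost in which each weak learner thresholds a single feature. It is not the toolkit used in the talk, and the toy data is invented so that only feature 0 separates the classes.

```python
import math

def best_stump(X, y, w):
    """Return (error, feature, threshold, polarity) of the best weighted stump."""
    best = None
    for j in range(len(X[0])):
        for thr in sorted({row[j] for row in X}):
            for polarity in (1, -1):
                # The stump predicts `polarity` when x[j] > thr, else -polarity.
                err = sum(wi for row, yi, wi in zip(X, y, w)
                          if (polarity if row[j] > thr else -polarity) != yi)
                if best is None or err < best[0]:
                    best = (err, j, thr, polarity)
    return best

def adaboost(X, y, rounds):
    """Train on labels in {-1, +1}; return [(alpha, feature, thr, polarity)]."""
    n = len(X)
    w = [1.0 / n] * n
    model = []
    for _ in range(rounds):
        err, j, thr, pol = best_stump(X, y, w)
        err = min(max(err, 1e-10), 1 - 1e-10)  # keep the log finite
        alpha = 0.5 * math.log((1 - err) / err)
        model.append((alpha, j, thr, pol))
        # Up-weight the examples this stump got wrong, then renormalize.
        w = [wi * math.exp(-alpha * yi * (pol if row[j] > thr else -pol))
             for row, yi, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return model

def predict(model, row):
    """Sign of the alpha-weighted vote of all stumps."""
    score = sum(a * (pol if row[j] > thr else -pol) for a, j, thr, pol in model)
    return 1 if score >= 0 else -1

def selected_features(model):
    """Indices of features used by at least one stump."""
    return sorted({j for _, j, _, _ in model})

# Toy data: the label depends only on feature 0; features 1 and 2 are noise.
X = [[0.1, 0.9, 0.3], [0.2, 0.1, 0.8], [0.3, 0.5, 0.5],
     [0.7, 0.2, 0.4], [0.8, 0.8, 0.6], [0.9, 0.4, 0.2]]
y = [-1, -1, -1, 1, 1, 1]
model = adaboost(X, y, rounds=3)
print("selected feature indices:", selected_features(model))
```

Because every stump looks at exactly one feature, the set of features that appear in the trained ensemble can be read off directly as the selected subset.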
15SVM performance
- SVM-light (10-fold cross-validation)
16Weights from SVM
17Result
- SVM-light performance
- 209 records split into 5 folds: 4 for training, 1 for testing.
- Performance on the testing set
- Accuracy: 71.43% (30 correct, 12 incorrect, 42 total)
- Precision/recall: 55.56%/38.46%
- Feature weights distribution
Top weighted features:
- 8, sender_events_invited
- 4, sender_friend_count
- 11, sender_gender
- 35, receiver_is_It's Complicated
- 5, sender_wall_post_count
- 9, sender_note_count
- 27, sender_is_In a Relationship
So the story may be: when a sender who has been invited to more Facebook events, has more friends, has written more Facebook notes (blog entries), is female, has fewer wall posts, and is in a relationship tries to infect a person whose relationship status is "It's Complicated", the infection is more likely to succeed than in other cases.
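The fold size and the accuracy and precision/recall figures on this slide can be cross-checked with a few lines of arithmetic. The confusion counts below (5 true positives, 4 false positives, 8 false negatives, 25 true negatives) are not given in the talk; they are inferred here as counts that match the reported 30 correct, 12 incorrect, and the two percentages.

```python
def fold_sizes(n_records, n_folds):
    """Sizes of the folds when records are split as evenly as possible."""
    base, extra = divmod(n_records, n_folds)
    return [base + 1 if i < extra else base for i in range(n_folds)]

def metrics(tp, fp, fn, tn):
    """Accuracy, precision and recall (in percent) from confusion counts."""
    total = tp + fp + fn + tn
    return {
        "accuracy": 100 * (tp + tn) / total,
        "precision": 100 * tp / (tp + fp),
        "recall": 100 * tp / (tp + fn),
    }

print(fold_sizes(209, 5))  # [42, 42, 42, 42, 41]

# Inferred confusion counts (an assumption, see the note above).
result = metrics(tp=5, fp=4, fn=8, tn=25)
print({name: round(value, 2) for name, value in result.items()})
# {'accuracy': 71.43, 'precision': 55.56, 'recall': 38.46}
```

Under these counts only 13 of the 42 test invitations were successful, so a classifier that always predicts "failed" would already reach about 69% accuracy, which is useful context for the 71.43% figure.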
18SVM with features selected by AdaBoost
19Background
- Diffusion of Innovation
- Question
- How does it work in large online social networks?
- What are the key factors in determining the success of infection?
- Can we predict the propagation path?
20Hypothesis
- Social influence depends on 5 dimensions of similarity
- Geographical distance: current location (country/state/city), current school, current major, year of class, current workplace, current courses enrolled
- Background similarity: sex, sexual preference, dating interest, relationship interest, relationship status, birthday, political view, religious view, hometown address, previous school, previous workplace
- Social similarity: number of mutual networks they belong to, number of mutual friends
- Interest similarity: activities, favorite books, favorite music, favorite movies, favorite TV shows, favorite quotes
- Social status distance: difference in number of friends, difference in wall post counts, difference in counts of messages sent and received, difference in counts of notes
21Project Description
- Objectives
- Identify the key factors for social influence
- Predict the occurrence of adoption based on the key factors
- Friendship Quiz
- A Facebook application we developed
- Enables users to make quizzes and send them to their friends (take a peek!)
- We track the spread of the application.
22Highlights
- A real-world diffusion of innovation
- Rich and (most of the time) trustworthy profile information of individuals and their social connections/activities
- Precisely timestamped diffusion process, a complete log of events
- Ongoing diffusion process
23Backup: Threshold Model