Title: CS 599: Social Media Analysis
1Information Diffusion in Social Media
- Kristina Lerman
- University of Southern California
2Information diffusion on Twitter follower graph
3Diffusion on networks
- The spread of diseases, ideas, and behaviors on a
network can be described as a contagion process
in which an active node (infected, informed, or
adopting) activates its inactive neighbors with
some probability, creating a cascade on the network
- How large do cascades become?
- What determines their growth?
4Diffusion models
- Complex response: infection requires multiple
exposures.
- Non-monotonic exposure response
[Figure: exposure response functions for complex
contagion and for the threshold model. x-axis:
number of infected neighbors; y-axis: infection
probability (maximum 1); the threshold model panel
carries the label f_i k_i.]
5Epidemic diffusion model
- Infected nodes propagate contagion to susceptible
neighbors with probability m (transmissibility or
virality of contagion)
[Figure: exposure response function. x-axis: number
of infected neighbors; y-axis: infection
probability (maximum 1).]
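The exposure response of this epidemic model can be sketched in a few lines. The sketch assumes that each infected neighbor transmits independently with the same probability m, so a node with k infected neighbors escapes infection only if all k attempts fail. The function name `exposure_response` is illustrative and does not come from the slides.

```python
def exposure_response(k, m):
    """Probability that a susceptible node with k infected neighbors
    becomes infected, assuming k independent transmission attempts,
    each succeeding with probability m (the transmissibility)."""
    if k < 0 or not 0.0 <= m <= 1.0:
        raise ValueError("k must be >= 0 and m must lie in [0, 1]")
    # The node stays susceptible only if every attempt fails.
    return 1.0 - (1.0 - m) ** k


# Example: the response grows monotonically with the number of exposures.
curve = [exposure_response(k, 0.05) for k in range(6)]
```

Under this assumption the curve rises with every additional infected neighbor, which is the behavior later slides compare against observed data.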
6Epidemic threshold
- Epidemic threshold t
- For m < t, localized cascades (epidemic dies out)
- For m > t, global cascades
- Epidemic threshold depends on topology only: it
is determined by the largest eigenvalue of the
adjacency matrix of the network
- True for any network
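As a rough illustration of the eigenvalue claim, the sketch below estimates the largest adjacency eigenvalue of a small undirected graph by power iteration and takes its inverse as the threshold. The relation t = 1 / lambda_max is the commonly cited form of this result and is an assumption here, since the slide only says the threshold is determined by the largest eigenvalue. The function names and the adjacency-dict input format are illustrative.

```python
import math


def largest_eigenvalue(adj, iterations=1000, tol=1e-12):
    """Largest adjacency eigenvalue of a connected undirected graph.

    adj maps each node to a list of its neighbors. Power iteration is run
    on (A + I); the shift avoids oscillation on bipartite graphs.
    """
    nodes = list(adj)
    x = {v: 1.0 for v in nodes}
    lam = 0.0
    for _ in range(iterations):
        # One multiplication by (A + I), followed by normalization.
        y = {v: x[v] + sum(x[u] for u in adj[v]) for v in nodes}
        norm = math.sqrt(sum(val * val for val in y.values()))
        y = {v: val / norm for v, val in y.items()}
        # Rayleigh quotient y^T A y of the unit vector y.
        new_lam = sum(y[v] * sum(y[u] for u in adj[v]) for v in nodes)
        converged = abs(new_lam - lam) < tol
        x, lam = y, new_lam
        if converged:
            break
    return lam


def epidemic_threshold(adj):
    """Assumed relation: threshold = 1 / (largest adjacency eigenvalue)."""
    return 1.0 / largest_eigenvalue(adj)


# Example: a star with four leaves has largest eigenvalue sqrt(4) = 2.
star = {"c": ["a", "b", "d", "e"], "a": ["c"], "b": ["c"], "d": ["c"], "e": ["c"]}
t_star = epidemic_threshold(star)
```

Denser or more hub-dominated graphs have a larger leading eigenvalue and therefore a lower threshold in this picture.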
7Differences in the Mechanics of Information
Diffusion across Topics: Idioms, Political
Hashtags, and Complex Contagion on Twitter
- Daniel M. Romero, Brendan Meeder, and Jon Kleinberg
- Presentation by Aswin Rajkumar
8Motivation and Contribution
- Information diffusion and topics
- E.g., controversial political topics have high
information diffusion.
- Scientific study of the variation in diffusion
mechanics across topics.
- Contribution of the paper
- Empirical analysis of real-world data
- Observation that the mechanics of spread can be
characterized by two variables, stickiness and
persistence.
- Confirmation of sociological theories from the
offline world (diffusion of innovations)
9The Study: How?
- Twitter dataset: a snapshot covering a large
number of tweets over a period of several months
(Aug 2009 to Jan 2010)
- 3 billion messages from over 60 million users
- Hashtag tokens: the top 500 hashtags
- @-mention network: Y is in X's neighbor set if
X mentions Y at least t times, with t = 3
- Why? It shows X's attention to Y.
10The Study: What?
- Adoption and spread of hashtags (diffusion)
- Topics: Politics, Celebrity, Music, Movies,
Games, Idioms, Sports, and Technology
- Stickiness: the probability that a piece of
information will pass from a person who knows or
mentions it to another person who is exposed to
it.
- Persistence and complex contagion, a principle
from sociology.
- Persistence: the relative extent to which
repeated exposures to a hashtag continue to have
significant marginal effects on adoption (rate of
decay).
11Complex Contagion
"Complex contagion refers to the phenomenon in
social networks in which multiple sources of
exposure to an innovation are required before an
individual adopts the change of behavior."
(Source: Wikipedia)
12P(k): Stickiness and Persistence
13Analysis: Stickiness and Persistence
- Take the top 500 hashtags
- Classify them into 8 topics or categories
- Construct p(k) curves for each hashtag and
average them separately within each category
- Compare the shapes
- Political hashtags: high stickiness and
persistence
- Twitter idioms: high stickiness, low persistence
Example hashtags by category:
- Games: mw2, mafiawars
- Movies: lost, newmoon
- Celebrity: mj, brazilwantsjb
- Music: pandora, thisiswar
- Politics: obama, hcr
- Sports: cricket, nhl
- Technology: photoshop, digg
14Twitter Idioms
cantlivewithout
musicmonday
iloveitwhen
followfriday
15Analysis: Subgraph Structure
- Interconnections among early adopters
- Subgraphs for political hashtags: high
in-degree, large number of triangles.
- Tie strength: strong vs. weak.
(Image credit: Bridge-talent.com)
16Exposure Curve - Definitions
- k-exposed: a user is k-exposed to a hashtag h if
they have not used h but are connected to k other
users who have used h in the past.
- What is the probability that a k-exposed user u
will use hashtag h in the future?
1) Ordinal time estimate: the probability that a
k-exposed user u uses hashtag h before becoming
(k+1)-exposed.
P(k) = I(k) / E(k)
E(k) = number of k-exposed users
I(k) = number of k-exposed users who used h
before becoming (k+1)-exposed
2) Snapshot estimate: similar, but based on time.
E(k) = number of users k-exposed at time t1
I(k) = number of users k-exposed at t1 who used h
before t2
P(k) = I(k) / E(k) -> exposure curve
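The ordinal time estimate can be sketched as a simple counting procedure. The input format is an assumption made for illustration: each user is summarized by the exposure level they had reached when they adopted the hashtag (or when observation ended) and a flag saying whether they adopted. The name `exposure_curve` is hypothetical and this is not the authors' code.

```python
from collections import Counter


def exposure_curve(users, max_k):
    """Ordinal-time estimate P(k) = I(k) / E(k).

    users: iterable of (k_reached, adopted) pairs, where k_reached is the
    number of adopting neighbors the user had at adoption time (if adopted)
    or at the end of observation (if not).
    Returns a dict {k: P(k)} for every k in 1..max_k with E(k) > 0.
    """
    exposed = Counter()   # E(k): users who were k-exposed at some point
    infected = Counter()  # I(k): users who adopted while k-exposed
    for k_reached, adopted in users:
        # A user who reached exposure level k_reached passed through 1..k_reached.
        for k in range(1, min(k_reached, max_k) + 1):
            exposed[k] += 1
        if adopted and 1 <= k_reached <= max_k:
            infected[k_reached] += 1
    return {k: infected[k] / exposed[k]
            for k in range(1, max_k + 1) if exposed[k]}


# Toy data (hypothetical): two adopters and two non-adopters.
toy_users = [(1, True), (2, True), (3, False), (2, False)]
toy_curve = exposure_curve(toy_users, max_k=3)
```

In the toy data, four users were 1-exposed and one adopted at that level, three were 2-exposed and one adopted, and one was 3-exposed without adopting.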
17Comparison Parameters
- Persistence parameter: F(P) = A(P) / R(P)
A(P) = area under the P(k) curve
R(P) = area of the rectangle of length K and
height max(P(k))
- Curve comparisons:
"increases rapidly and falls" vs. "increases
slowly and saturates"
"increases slowly and saturates" vs. "rapid
increase"
- Stickiness parameter: M(P) = max(P(k))
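A minimal sketch of the two comparison parameters, assuming the exposure curve is given as a list of P(k) values for k = 1..K with unit spacing and that the area under the curve is computed by connecting consecutive points with straight lines (trapezoid rule). The function name is illustrative.

```python
def persistence_and_stickiness(p):
    """Return (F, M) for an exposure curve p = [P(1), ..., P(K)].

    M = max P(k) is the stickiness parameter.
    F = A / R is the persistence parameter, with A the trapezoid-rule area
    under the curve and R the area of a rectangle of length K and height M.
    """
    if len(p) < 2:
        raise ValueError("need at least two points on the curve")
    K = len(p)
    stickiness = max(p)                                       # M(P)
    area = sum((p[i] + p[i + 1]) / 2 for i in range(K - 1))   # A(P)
    rectangle = K * stickiness                                # R(P)
    persistence = area / rectangle if rectangle > 0 else 0.0  # F(P)
    return persistence, stickiness


# Hypothetical curves: one peaks early and falls, one keeps rising.
peaked = [0.02, 0.01, 0.005, 0.002]
rising = [0.005, 0.01, 0.015, 0.02]
f_peaked, m_peaked = persistence_and_stickiness(peaked)
f_rising, m_rising = persistence_and_stickiness(rising)
```

Both hypothetical curves have the same stickiness, but the curve that falls off after its peak has the lower persistence.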
18Plots
F(P) = A(P) / R(P) -> persistence parameter
M(P) = max(P(k)) -> stickiness parameter
19Improvements and Related Work
- The @-mention network is not very representative.
Also, attention should be from Y to X.
- Considers only average persistence. Median and
variance should be analyzed too.
- Other types of networks, e.g., blogs: Gruhl,
Guha, Liben-Nowell, Tomkins, "Information
Diffusion through Blogspace".
- Influence on online behavior, e.g., games: Woo,
Kang, Kim, "The Contagion of Malicious Behaviors
in Online Games".
- Network structure is dynamic in real life: Baños,
Borge-Holthoefer, Wang, Moreno, González-Bailón,
"Diffusion Dynamics with Changing Network
Composition".
20Conclusion
- Hashtags of different topics exhibit different
mechanics of spread. Politically controversial
hashtags have the highest diffusion.
- Information diffusion depends on the probability
of users adopting a hashtag after repeated
exposure to it. It depends on the magnitude of the
probabilities as well as the rate of decay.
- Confirms the sociological theory of complex
contagion.
- Higher in-degree and stronger ties result in
better spread.
21Questions?
22What Stops Social Epidemics? (Ver Steeg et al.)
- Why do information cascades in social media
- Grow quickly initially
- But remain much smaller than predicted by
epidemic models?
- Information cascades differ from viral contagion
- Response to repeated exposure is important on
Digg (and Twitter)
- Drastically alters predictions about size of
epidemics
23Social news
- Users submit or vote for (are infected by) news
stories
- Social network
- Users follow friends to see:
- Stories friends submit
- Stories friends vote for
- Trending stories
- Digg promotes most popular stories to its Top
News page
24How large are cascades in social media?
Cascade size: number of people who share a message
(with a URL)
- Twitter: 70K URLs, 700K users, 36M edges
- Digg: 3.5K URLs, 258K users, 1.7M edges
Most cascades reach less than 1% of total network size!
25Why are these cascades so small?
Standard model of epidemic growth (heterogeneous
mean field theory, SIR model, same degree
distribution as Digg)
[Figure: model prediction plotted against
transmissibility m. Annotations: "Most cascades
fall in this range"; "Transmissibility of almost
all Digg stories falls within the width of this
line?!"]
26Maybe graph structure is responsible?
[Figure: x-axis: transmissibility m, with the
epidemic threshold marked. Curves: mean field
prediction (same degree dist.); simulated cascades
on a random graph with the same degree dist.;
simulated cascades on the observed Digg graph.]
- Clustering reduces epidemic threshold and
cascade size,
- but not enough!
27What about the spreading mechanism?
[Figure: network of infected and not-infected
nodes, with one node marked "?".]
28Are repeat exposures a big effect?
Yes, more than half of the users are exposed to
the same information more than once!
29How do people respond to repeated exposure?
[Figure: exposure response]
Not much. We have similar results for Twitter
(also noted by Romero et al., WWW 2011).
30Big consequences for cascade growth
- Most people are exposed to a story more than once
- Repeated exposures have little effect
- Growth of epidemics is severely curtailed
(especially compared to the Independent Cascade
Model)
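The contrast drawn in these bullets can be explored with a small simulation. The sketch below runs an independent-cascade process in which every newly infected node gets one transmission attempt per susceptible neighbor, and, as a deliberately extreme assumption, a variant in which only a node's first exposure can infect it and repeats have no effect at all. The function name, the `repeat_exposures` flag, and the toy graph are illustrative and not taken from the study.

```python
import random
from collections import deque


def simulate_cascade(adj, seed, m, repeat_exposures=True, rng=None):
    """Return the final cascade size starting from one seed node.

    adj: dict mapping node -> list of neighbors.
    m: transmissibility, the success probability of a single attempt.
    repeat_exposures=True  -> independent cascade: every infected neighbor
                              gets its own attempt on a susceptible node.
    repeat_exposures=False -> assumed extreme case: only the first exposure
                              can infect; later exposures are ignored.
    """
    rng = rng or random.Random()
    infected = {seed}
    exposed = set()  # susceptible nodes that have been exposed at least once
    queue = deque([seed])
    while queue:
        node = queue.popleft()
        for nb in adj[node]:
            if nb in infected:
                continue
            if not repeat_exposures and nb in exposed:
                continue  # repeated exposure has no effect in this variant
            exposed.add(nb)
            if rng.random() < m:
                infected.add(nb)
                queue.append(nb)
    return len(infected)


# Toy graph (hypothetical): a 30-node clique, where repeat exposures abound.
clique = {i: [j for j in range(30) if j != i] for i in range(30)}
rng = random.Random(0)
runs = 200
mean_icm = sum(simulate_cascade(clique, 0, 0.3, True, rng) for _ in range(runs)) / runs
mean_first_only = sum(simulate_cascade(clique, 0, 0.3, False, rng) for _ in range(runs)) / runs
```

On a densely connected toy graph the first-exposure-only variant produces much smaller cascades on average, which is the qualitative effect the slide describes.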
31Weak response to repeated exposures suppresses outbreaks
[Figure: x-axis: transmissibility m. Legend: actual
Digg cascades; result of simulations that take the
effect of repeat exposure into account.
Annotation: epidemic threshold unchanged.]
32How Limited Visibility and Divided Attention
Constrain Social Contagion (Hodas & Lerman, 2012)
- Questions
- How do people respond to exposures to information
by friends on social media?
- What role does content play in information
diffusion?
- Findings
- Users have finite ability to process information
- Most recently received messages are retweeted;
the rest are overlooked
- Highly connected users (hubs) are far less likely
to retweet any message they receive than poorly
connected people
- Reduced susceptibility of hubs to infections
explains why cascades are small
33Mechanics of information diffusion
A user must see an item and find it interesting
before he/she can spread it (e.g., by retweeting
it, voting for it, or liking it)
[Diagram labels: Cognitive, Tastes, Content,
Interface, Retweet]
34Cognitive factors: Position bias
- People pay more attention to items at the top of
the screen or a list of items
- This limits how far down the list or page the
user navigates
Payne, The Art of Asking Questions (1951)
Counts & Fisher, ICWSM'11
Buscher et al., CHI'09
35Measuring position bias
- Amazon Mechanical Turk experiments
- Users were asked to recommend science stories
- We controlled the order in which stories were
presented to users
Position bias: stories at top list positions
received more recommendations
Lerman & Hogg (2014), "Leveraging position bias
to improve peer recommendation," PLoS ONE.
36Position bias creates limited attention
[Figure: post visibility]
A new post appears at the top of the user's screen
A post near the top is most likely to be seen
37Position bias creates limited attention
Some time later, newer posts appear at the top
The post is less likely to be seen
38Position bias and number of friends
[Figure panels: many friends vs. few friends]
Some time later, newer posts appear at the top
The post is less likely to be seen
A post of the same age is even less visible to a
highly connected user
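The argument on this slide can be made concrete with a toy model. Everything quantitative below is an assumption for illustration only: new posts are taken to arrive at a rate proportional to the number of friends, each arrival pushes an existing post one position down, and the chance of seeing a post is taken to decay exponentially with its position. The function name and the parameters `posts_per_friend` and `decay` are hypothetical and are not fitted to any data from the study.

```python
import math


def visibility(age, n_friends, posts_per_friend=1.0, decay=0.1):
    """Toy probability that a post of the given age is still seen.

    age: time since the post arrived (arbitrary units).
    n_friends: number of friends the user follows.
    posts_per_friend: assumed posting rate of each friend per unit time.
    decay: assumed rate at which attention drops with list position.
    """
    # Expected number of newer posts stacked above this one.
    position = posts_per_friend * n_friends * age
    # Assumed position bias: attention decays exponentially with position.
    return math.exp(-decay * position)


# Same post age, different numbers of friends (hypothetical values).
few = visibility(age=1.0, n_friends=10)
many = visibility(age=1.0, n_friends=1000)
```

With these assumed parameters a post of the same age is far less visible to the user with many friends, matching the qualitative point of the slide.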
39Friends are a source of distraction
Users with more friends are more active
Users with more friends are distracted by more
content
[Plots: x-axis: number of friends, n_f]
- Limited attention makes hubs less susceptible to
infection
40Users retweet most recent messages
[Figure: time response function for high
connectivity users and low connectivity users]
- Users retweet newest messages (at the top of
their screen)
- Hubs are much less likely to retweet an older
message
41Does content matter?
[Diagram labels: visibility, virality, probability
to tweet a message]
42Do viral messages spread farther?
[Plot axis: ln(virality)]
Viral messages can reach many or few people
43How do people respond to multiple exposures?
[Plot: exposure response vs. number of tweeting
friends]
- Is this evidence for complex contagion?
44Complex contagion: an artifact of heterogeneity
[Figure panels: low connectivity users; high
connectivity users]
- Breaking down exposure response by different
sub-populations, separated according to the number
of friends they follow, reveals a simple, monotonic
response
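How pooling can create an apparently non-monotonic response is easiest to see with numbers. The sketch below uses entirely hypothetical counts and probabilities for two sub-populations: each group's response rises monotonically with the number of exposures, but highly connected users respond more weakly and dominate the population at high exposure counts. The function name and all values are illustrative and not taken from the study.

```python
def aggregate_response(groups):
    """Pool exposure-response curves across sub-populations.

    groups: list of (counts, probs) pairs, where counts[k] is the number of
    users in the group observed at exposure level k and probs[k] is that
    group's response probability at level k.
    Returns the pooled response probability at each exposure level.
    """
    n_levels = len(groups[0][0])
    pooled = []
    for k in range(n_levels):
        exposed = sum(counts[k] for counts, _ in groups)
        adopters = sum(counts[k] * probs[k] for counts, probs in groups)
        pooled.append(adopters / exposed)
    return pooled


# Hypothetical data for exposure levels k = 1..4.
low_connectivity = ([1000, 300, 50, 5], [0.02, 0.04, 0.05, 0.06])
high_connectivity = ([100, 300, 500, 800], [0.002, 0.004, 0.005, 0.006])
pooled_curve = aggregate_response([low_connectivity, high_connectivity])
```

In this example the pooled curve rises from the first to the second exposure and then falls, even though neither group's own response ever decreases.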
45Summary
- A meme is not a virus
- Information spread ≠ disease spread
- Big consequences for modeling information spread
in social media
- Highly connected people (hubs) act as firewalls
to information spread
- They have a hard time finding messages in their
stream
- People have a finite capacity to process
information: the more messages they receive, the
less likely they are to respond to any given one
- Information overload actually reduces the size of
information cascades