Extracting Relevant - PowerPoint PPT Presentation

1
Extracting Relevant & Trustworthy Information
from Microblogs
  • Joint work with Bimal Viswanath, Farshad Kooti,
    Saptarshi Ghosh, Naveen Sharma, Niloy Ganguly,
    and Fabricio Benevenuto
  • MPI-SWS, Germany; IIT Kharagpur, India; UFOP,
    Brazil

2
My research: The big picture
  • Three fundamental trends & challenges in the social
    Web
  • 1. User-generated content sharing
  • Can we protect the privacy of users sharing personal
    data?
  • 2. Word-of-mouth based content exchange
  • Can we understand & leverage word-of-mouth
    better?
  • 3. Crowd-sourcing content rating and ranking
  • Can we find trustworthy & relevant content
    sources?

3
Twitter microblogging site
  • An important source for real-time Web content
  • 500 million active users posting 400 million
    tweets daily
  • Quality of tweets / content varies widely
  • Anyone can post tweets
  • Celebrities, politicians, news media, academics,
    spammers
  • Challenge: Finding relevant & trustworthy content
  • Trustworthy: Thwart spammers and their spam
  • Relevance: Identify authoritative experts on
    specific topics

4
Thwarting Spammers in Twitter (WWW 2012)
Part 1
5
Background: How spammers operate
  • Twitter spammers try to gain lots of followers
  • To promote spam directly
  • To gain influence in the network
  • Search engines rank tweets based on how
    influential the user is
  • Most metrics depend on the user's network
    connectivity
  • More followers help a user gain influence

This incentivizes spammers to acquire links to gain
influence
6
Acquiring followers via link farming
  • Unrelated users exchange links with each other
  • To gain more influence based on network
    connectivity

[Figure: users Alice, Bob, Charlie, and David exchange follow links with each other]
Influence based on connectivity is improved
7
To thwart spammers
  • We need to
  • 1. Understand link farming activity in Twitter
  • 2. Combat link farming activity in Twitter
  • Prior work: Focused on detecting spammers
  • Via their characteristics, e.g., follower-to-following
    ratios
  • Rat-race between spammers and spam fighters
  • We focus on the spammer support network

8
Identifying spammers
  • Used the Twitter network gathered in a previous study
    [ICWSM'10]
  • Data collected in August 2009
  • 54M nodes, 1.9B links, 1.7B Tweets
  • Identified accounts suspended by Twitter
  • Account could be suspended for various reasons
  • Found suspended users that posted blacklisted
    URLs
  • Includes 41,352 such spammers
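The selection rule on this slide (suspended accounts that also posted blacklisted URLs) can be sketched as a simple filter. This is only an illustration: the function name, the input shapes, and matching on the URL's host name are assumptions, not details given in the slides.

```python
from urllib.parse import urlparse


def find_spammers(suspended_ids, urls_by_user, blacklisted_hosts):
    """Return suspended users who posted at least one blacklisted URL.

    suspended_ids: set of user ids suspended by the platform (hypothetical input).
    urls_by_user: dict mapping user id -> list of URLs found in that user's tweets.
    blacklisted_hosts: set of lower-case host names treated as spam (assumed format).
    """
    spammers = set()
    for user_id in suspended_ids:
        for url in urls_by_user.get(user_id, []):
            host = (urlparse(url).hostname or "").lower()
            if host in blacklisted_hosts:
                spammers.add(user_id)
                break  # one blacklisted URL is enough
    return spammers
```

Requiring both conditions reflects the slide's point that an account can be suspended for reasons other than spam, so suspension alone is not treated as proof.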

9
Spammers farm links at large scale
  • Spam-targets: 27% of all users are followed by at
    least one of 40,000 spammers!
  • Spam-followers: 82% of all spammers' followers have
    been targeted
  • Spammers have more followers than random users
  • Avg. follower count: spammers 234, random
    users 36

10
Who responds to links from spammers?
  • A small number of followers respond most of the
    time

We call these users link farmers
Top 100k users account for 60% of all links to
spammers
Top 100k followers exhibit high reciprocation of
0.8 on avg.
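A statistic such as "the top 100k users account for 60% of all links to spammers" can be computed from a follow-edge list as sketched below. The edge format `(follower, followed)` and the function name are assumptions made for this example.

```python
from collections import Counter


def top_k_share(follow_edges, spammers, k):
    """Fraction of all links to spammers that come from the k most active followers.

    follow_edges: iterable of (follower, followed) pairs (assumed format).
    spammers: set of known spammer ids.
    """
    links_per_follower = Counter(
        follower for follower, followed in follow_edges if followed in spammers
    )
    total = sum(links_per_follower.values())
    if total == 0:
        return 0.0
    top = sum(count for _, count in links_per_follower.most_common(k))
    return top / total
```

A value close to 1 for a small `k` means that a few accounts supply most of the links spammers receive, which is the pattern the slide describes.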
11
Are link farmers real users or spammers?
  • To find out if they are spammers or real users,
    we
  • 1. Checked if they were suspended by Twitter
  • 76% of users not suspended; 235 of them verified by
    Twitter
  • 2. Manually verified 100 random users
  • 86% of users are real, with legitimate links in their
    tweets
  • 3. Analyzed their profiles
  • More active in updating their profiles than
    random users

12
Are link farmers lay or popular users?
  • Conventional wisdom
  • Lay users more likely to follow back due to
    social etiquette
  • Popular users might be more conservative in
    following others

Probability increases with user popularity
Link farmers are popular users with lots of
followers
13
Are link farmers lay or popular users?
  • Top 5 link farmers according to Pagerank
  • 1. Barack Obama: Obama 2012 campaign staff
  • 2. Britney Spears
  • 3. NPR Politics: Political coverage and
    conversation
  • 4. UK Prime Minister: PM's office
  • 5. JetBlue Airways

Link farmers include legitimate, popular users &
organizations
14
What possibly motivates link farmers?
  • One explanation
  • Link farmers have similar incentives as spammers
  • They seek to amass social capital & influence in
    the network
  • Link farmers rank among the top 5% of influential
    Twitter users
  • In terms of various metrics like Pagerank &
    Followerrank

15
Combating link farming
  • Key challenge
  • Real, popular and active users are involved in
    link farming
  • Detecting and suspending spammers alone will not
    help
  • Insight
  • Discourage users from following others carelessly
  • Penalize users following anyone found to be bad
  • Lower the influence scores of users following
    spammers

Incentivizes users to be more careful about who
they link to
16
Collusionrank
  • Borrows ideas from spam defense strategies for
    the Web [WWW'05]
  • Low Collusionrank score for a user indicates
  • heavy linking to spammers or spam-followers
  • Requires a seed set of known spammers
  • Twitter operator periodically identifies and
    updates spammers

17
Collusionrank
Algorithm
  • 1. Negatively bias the initial scores towards
    the set of spammers
  • 2. In Pagerank style, iteratively penalize users
    who follow spammers or who follow spam-followers
Collusionrank is based on the scores of a user's
followings, because a user is penalized based on
whom they follow
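A minimal sketch of the two steps, assuming a Pagerank-style damped iteration in which each followed account passes its (negative) score back to its followers, split evenly among them. The damping factor `alpha`, the fixed iteration count, the even split, and all names are assumptions for illustration; the slides do not specify them.

```python
def collusionrank(following, spammers, alpha=0.85, iterations=50):
    """Penalize users who follow spammers, directly or through spam-followers.

    following: dict mapping user -> set of users that user follows.
    spammers: seed set of known spammers.
    Returns a dict user -> score; scores are <= 0, lower means more penalized.
    """
    nodes = set(following)
    for targets in following.values():
        nodes.update(targets)

    # Number of followers of each node, used to split a node's penalty.
    follower_count = {n: 0 for n in nodes}
    for targets in following.values():
        for t in targets:
            follower_count[t] += 1

    # Step 1: negative bias concentrated on the seed spammers.
    seeds = spammers & nodes
    bias = {n: (-1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}

    # Step 2: a user's score depends on the scores of the accounts they follow.
    score = dict(bias)
    for _ in range(iterations):
        updated = {}
        for n in nodes:
            inherited = sum(
                score[t] / follower_count[t] for t in following.get(n, ())
            )
            updated[n] = alpha * inherited + (1 - alpha) * bias[n]
        score = updated
    return score
```

In this sketch a user who follows nobody suspicious keeps a score of zero, while anyone on a follow path leading to a seed spammer accumulates a negative score that shrinks with distance.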
18
Evaluating Collusionrank
  • Goal
  • To penalize spammers and spam-followers
  • Should not penalize users who are not following
    spammers
  • Used a small subset of 600 spammers as the seed set
  • Compare ranks between
  • Pagerank
  • Pagerank + Collusionrank
  • Measures influence after accounting for link
    farming activity

19
Effect of Collusionrank on spammers
40% of spammers appear in the top 20% according to
Pagerank
Most of the spammers get pushed to the last 10% of
positions based on Collusionrank
20
Effect on link farmers
98% of the link farmers get pushed to the last 10% of
positions based on Collusionrank
87% of link farmers are in the top 2% of users according to
Pagerank
21
Effect on normal users
  • Focus on top 100,000 users according to Pagerank
  • Analyze the percentile difference in ranks
    between
  • Pagerank (P) and Pagerank + Collusionrank (PC)
  • Percentile difference = ((PC - P) / N) × 100

Only 20% of users get demoted heavily
Heavily demoted users follow many more spammers
than others
Collusionrank selectively filters out spammers
and spam-followers
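As a small worked example of the percentile difference, reading the formula as (PC - P) / N × 100, where P and PC are a user's rank positions under the two schemes and N is assumed to be the total number of ranked users. The function name and the convention that rank 1 is the most influential user are assumptions.

```python
def percentile_difference(rank_pagerank, rank_combined, n_users):
    """Change in rank position, expressed as a percentage of all ranked users.

    With rank 1 as the top position, a positive value means the user was
    demoted once Collusionrank was taken into account.
    """
    return (rank_combined - rank_pagerank) / n_users * 100
```

For instance, a user who moves from position 100 to position 5,100 among 100,000 users is demoted by 5 percentiles.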
22
Summary: Thwarting spammers
  • Spammers infiltrate the Twitter network by
    farming links
  • Link farming helps them gain influence to promote
    spam
  • Search involves ranking users based on
    connectivity influence
  • Analyzed link farming in Twitter by studying
    spammers
  • Top link farmers are real, active and popular
    users
  • Proposed an algorithm, Collusionrank, to limit link
    farming
  • Incentivizes users to be careful about who they
    connect with

23
Finding Topic Experts in Twitter (WOSN 2012,
SIGIR 2012)
Part 2
24
Topic experts in Twitter
  • Twitter is now an important source of current
    news
  • 500 million users post 400 million tweets daily
  • Quality of tweets posted by different users varies
    widely
  • News, pointless babble, conversational tweets,
    spam, ...
  • Challenge: To find topic experts
  • Sources of authoritative information on specific
    topics

25
Identifying topic experts in Twitter
  • Existing approaches
  • Research studies: [Pal, WSDM'11], [Weng, WSDM'10]
  • Application systems: Twitter Who-To-Follow,
    Wefollow, ...
  • Existing approaches primarily rely on information
    provided by the user herself
  • Bio, contents of tweets, network features, e.g.,
    followers
  • We rely on wisdom of the Twitter crowd
  • How do others describe a user?

26
Twitter Lists
  • A feature to organize tweets received from the
    people whom a user is following
  • Create a List, add a name & description, add
    Twitter users to the List
  • List meta-data offers cues for who-is-who
  • Tweets from all listed users will be available as
    a separate List stream

27
(No Transcript)
28
Mining Lists to infer expertise
  • Collect Lists containing a given user U
  • Identify U's topics from List meta-data
  • Basic NLP techniques
  • Extract nouns and adjectives
  • Extracted words collected to obtain a topic
    document for user
  • movies tv hollywood stars entertainment
  • celebrity hollywood
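The steps above can be sketched as follows. The slide extracts nouns and adjectives with basic NLP techniques; to stay self-contained, this sketch swaps in plain tokenization with a small stop-word list instead of part-of-speech tagging. The stop-word list, the function name, and the input format are assumptions.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption, not the authors' list).
STOPWORDS = {"the", "a", "an", "and", "of", "my", "to", "in", "for", "on", "who", "i"}


def topic_document(list_metadata):
    """Build a bag-of-words topic document for one user.

    list_metadata: iterable of (name, description) pairs, one per List
    that contains the user.
    Returns a Counter mapping word -> number of occurrences.
    """
    words = Counter()
    for name, description in list_metadata:
        text = f"{name} {description}".lower()
        for token in re.findall(r"[a-z]+", text):
            if len(token) > 1 and token not in STOPWORDS:
                words[token] += 1
    return words
```

Words that many independent List creators use for the same user end up with high counts, which is what makes the resulting document a crowd-sourced description of that user.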

29
Lists vs. other features
Profile bio: (not transcribed)
Most common words from tweets: Fallon, happy, love, fun, video, song, game,
hope, fjoln, fallonmono
Most common words from Lists: celeb, funny, humor, music, movies, laugh,
comics, television, entertainers
30
Dataset
  • Collected Lists of 55 million Twitter users who
    joined before or in 2009
  • Our analysis infers topics for 1.3 million users
    who are included in 10 or more Lists

31
Evaluating inference quality
  • Quality metrics
  • Is the inference accurate?
  • Is the inference informative?
  • Evaluation of popular users
  • Celebrities, News media sources, US Senators
  • Using user feedback

32
Popular users set 1: Celebrities
  • The inferred attributes accurately capture
  • Biographical information
  • Topics of expertise
  • Popular perception about the user

Biographical Tags | Topics of Expertise | Popular Perception
government, president, USA, democrat | politics, government | celebs, leader, famous, current events
sports, cyclist, athlete | tdf, triathlon, cancer | celebs, influential, famous, inspiration
33
Popular users set 2: News media sources
  • The inferred attributes indicate
  • Primary topics of the media source
  • Perceived political bias (Verified using ADA
    scores)

Media | Biographical Tags | Topics of Expertise | Popular Perception
CNN | media, journalist, bloggers | politics, sports, tech, weather, current | influential outlets
The Nation | media, journalist, magazines, blogs | politics, government | progressive, liberal
Townhall.com | media, bloggers, commentary, journalists | politics | conservative, republican
GuardianFilm | journalists, reviews | movies, cinema, actors, theatre, hollywood | film critics
34
Popular users set 3: US Senators
  • Out of the 100 US senators, 84 have Twitter
    accounts
  • The inferred attributes correctly infer
  • Their political party
  • The state represented by them
  • Their gender
  • "Female" or "Women" for all 15 female senators
  • Their political ideology
  • progressive/liberal/conservative/tea-party
  • The senate committees to which they belong

35
Popular users set 3: US Senators
Senator | Biographical Tags | Senate Committees | Perception
Chuck Grassley | politics, senator, republican, iowa, gop | health, food, agriculture | conservative
Claire McCaskill | politics, democrats, missouri, women | tech, security, power, health, commerce | progressive, liberal
Jim Inhofe | politics, congress, oklahoma, republican | army, energy, climate, foreign | conservative
John Kerry | politics, senate, democrats, boston | health, climate, tech | progressive
36
User feedback
Evaluation | Accurate | Informative
Total evaluations | 345 | 342
Response: Yes | 274 | 277
Response: No | 18 | 20
Can't tell | 53 | 45
  • Ignoring "can't tell" responses,
  • Accuracy: 94%
  • Informative: 93%
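The two percentages follow from the table by dropping the "can't tell" responses and dividing the "yes" count by the remaining responses. A short check, with a hypothetical helper name:

```python
def yes_rate(yes, no):
    """Share of 'yes' answers among decided (yes or no) responses."""
    return yes / (yes + no)


accuracy = yes_rate(274, 18)      # 274 / 292
informative = yes_rate(277, 20)   # 277 / 297

print(f"Accuracy: {accuracy:.1%}")
print(f"Informative: {informative:.1%}")
```

Rounded to whole percentages, these give the 94% and 93% reported on the slide.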

37
Evaluating inference coverage
  • What fraction of Twitter can our method of
    inference be applied to?

A large fraction of popular Twitter users are
covered
38
Evaluating inference coverage
  • We could also infer attributes of less popular
    users
  • 6% of users with follower ranks between 1 and 10
    million
  • They are often experts on niche topics

User | Twitter bio | Followers | Listed | Inferred Attributes
spacespin | news on robotic space exploration | 56 | 11 | science, space exploration, nasa, astronomy, planets
laithm | Al-jazeera network battle cameraman | 201 | 16 | jounalists, photographer, al-jazeera, media
HumphreysLab | Stem Cell, Regenrative Biology of Kidney | 119 | 17 | science, stem cell, genetics, cancer, physicians, biotech, nephrologist
39
Cognos
  • Search system for topic experts in Twitter
  • Given a query (topic)
  • Identify experts on the topic using Lists
  • Rank identified experts

40
Ranking experts
  • Used a ranking scheme solely based on Lists
  • Two components of ranking user U w.r.t. query Q
  • Relevance of user to query: cover density
    ranking between topic document T_U of user and Q
  • Popularity of user: number of Lists including
    the user

Score = Topic relevance(T_U, Q) × log(#Lists including U)
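A rough sketch of this ranking, assuming the two components are multiplied. Cover density ranking is not reproduced here; a much simpler stand-in (the fraction of query terms present in the user's topic document) is used for relevance. All names and input shapes are assumptions.

```python
import math


def relevance(topic_doc, query):
    """Stand-in relevance: fraction of query terms found in the topic document."""
    terms = query.lower().split()
    if not terms:
        return 0.0
    return sum(1 for t in terms if topic_doc.get(t, 0) > 0) / len(terms)


def rank_experts(candidates, query):
    """Rank users by relevance(T_U, Q) * log(number of Lists including U).

    candidates: dict mapping user -> (topic_doc, n_lists), where topic_doc
    maps word -> count and n_lists is the number of Lists including the user.
    Users with a score of zero are left out of the result.
    """
    scored = []
    for user, (topic_doc, n_lists) in candidates.items():
        if n_lists < 1:
            continue  # log is undefined for users in no List
        score = relevance(topic_doc, query) * math.log(n_lists)
        if score > 0:
            scored.append((score, user))
    scored.sort(reverse=True)
    return [user for _, user in scored]
```

The logarithm dampens popularity, so a heavily listed account with no topical match does not outrank a niche expert whose Lists actually mention the query terms.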
41
Cognos results for "stem cell"
42
Evaluation of Cognos
  • System deployed and evaluated in-the-wild
  • Evaluators were students & researchers from the
    three home institutes of the authors

43
Cognos vs. Twitter Who-To-Follow
44
Cognos vs. Twitter Who-To-Follow
  • Considering 27 distinct queries asked at least
    twice
  • Judgment by majority voting
  • Cognos judged better on 12 queries
  • Computer science, Linux, Mac, Apple, iPad,
    Internet, Windows phone, photography, political
    journalist, ...
  • Twitter Who-To-Follow judged better on 11 queries
  • Music, Sachin Tendulkar, Angelina Jolie, Harry
    Potter, Metallica, cloud computing, IIT
    Kharagpur, ...
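The per-query judgment by majority voting can be sketched as below. The vote labels and the explicit "tie" outcome are assumptions; the slide only gives the per-system counts.

```python
from collections import Counter


def majority_winner(votes):
    """Return the label with the most votes, or "tie" if no label leads.

    votes: list of labels, one per evaluation of the same query,
    e.g. "cognos" or "twitter" (hypothetical labels).
    """
    ranked = Counter(votes).most_common()
    if not ranked:
        return "tie"
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return "tie"
    return ranked[0][0]
```

Restricting the comparison to queries asked at least twice, as the slide does, ensures each judgment rests on more than a single evaluator's opinion.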

45
Results for query "music"
46
Summary: Finding topic experts in Twitter
  • Developed and deployed Cognos
  • Uses Lists to infer topics of expertise and rank
    users
  • Competes favorably with Twitter Who-To-Follow
  • Lists are vital in searching for topic experts in
    Twitter
  • Future work
  • Make the inference methodology robust against
    List spam
  • Key insight: Unlike follow-links, experts do not
    List non-expert users

47
Twitter microblogging site
  • An important source for real-time Web content
  • 500 million active users posting 400 million
    tweets daily
  • Quality of tweets / content varies widely
  • Anyone can post tweets
  • Celebrities, politicians, news media, academics,
    spammers
  • Challenge: Finding relevant & trustworthy content
  • Trustworthy: Thwart spammers and their spam
  • Relevance: Identify authoritative experts on
    specific topics

48
Higher-level takeaway
  • Links mean different things in different
    real-world social networks
  • In fact, every social network offers different
    types of links
  • They are backed by different social interactions
  • Many links are implicit
  • Important to differentiate and leverage
    domain-specific usage of social links

49
Thank You
  • You can try Cognos at
  • http://twitter-app.mpi-sws.org/whom-to-follow/
  • http://twitter-app.mpi-sws.org/who-is-who/