Title: Information Search in Social Networks
1 Information Search in Social Networks
Joint work with Tom Crecelius, Mouna Kacimi, Sebastian Michel, Thomas Neumann, Josiane Parreira, Marc Spaniol, Gerhard Weikum
2 Social Tagging Networks
- Definition: a social tagging network is a website where people
  - publish & tag information
  - review & rate information
  - publish their interests
  - maintain a network of friends
  - interact with friends
3 Some Statistics
- Flickr (as of Nov 2008)
  - 3 billion photos, 3 million new photos per day
- Facebook (as of Nov 2008)
  - 10 billion photos, 30 million new photos per day
  - 120 million active users
  - 150,000 new users per day
- Myspace (as of Apr 2007)
  - 135 million users (6th largest country on Earth)
  - 2 billion images (150,000 req/s), millions added daily
  - 25 million songs
  - 60 TB of videos
- StudiVZ.net (as of Nov 2008)
  - 11 million users
  - 300 million images, 1 million added daily
Huge volume of highly dynamic data
4 Showcase: librarything.com
5 librarything.com: Social Interaction
6 librarything.com: Tag Clouds
7 librarything.com: Search
Search results are independent of the querying user (and of the social context)
8 librarything.com: Search
Search is automatically expanded with similar tags (synonyms)
9 librarything.com: Recommendations
Recommendations depend on the user and the tags (but not on the social context)
10 librarything.com: Recommendations
Explanation for the recommendation
11-12 librarything.com: Explanations
13 Outline
- Search in Social Tagging Networks
  - Graph Model
  - Different Information Needs
- Effective Query Scoring
- Efficient Query Evaluation
- Summary & Further Challenges
14-15 Querying Social Tagging Networks
16 Information Need 1: Globally Popular
Example query: harry potter
The most frequently tagged items are best; tags by all users are equally important
17-18 Information Need 2: Similar Users
Example query: travel
Tags by users with similar tags/items ("brothers in spirit") are more important
19-20 Information Need 3: Trusted Friends
Example query: probability
Tags by closely related and well-known users are more important
21 Towards Social-Aware Social Search
- Search results may depend on
  - global popularity of items
  - spiritual context of the querying user (users with similar books and/or tags)
  - social context of the querying user (known and trusted friends)
22 Outline
- Search in Social Tagging Networks
- Effective Query Scoring
  - Quantifying Friendship Strengths
  - User-specific Scoring Functions
  - Experimental Evaluation
- Efficient Query Evaluation
- Summary & Further Challenges
23 Notation
- U: set of users
- T: set of tags
- I: set of items
- tags(u): tags used by user u
- items(u): items tagged by user u
- items(t): items tagged with tag t by at least one user
- df(t): number of items tagged with tag t
- tf_u(i,t): number of times user u tagged item i with tag t
- tf(i,t): number of times item i was tagged with tag t
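To make the notation concrete, here is a minimal Python sketch that derives each quantity from raw tagging actions. It assumes each action is stored as a (user, item, tag) triple. The toy data and the function name `build_statistics` are hypothetical.

```python
from collections import defaultdict

# Hypothetical toy data: one (user, item, tag) triple per tagging action.
TRIPLES = [
    ("alice", "hp1", "wizard"),
    ("alice", "hp1", "magic"),
    ("bob", "hp1", "wizard"),
    ("bob", "lotr", "magic"),
    ("carol", "lotr", "travel"),
]


def build_statistics(triples):
    """Derive tags(u), items(u), items(t), df(t), tf_u(i,t) and tf(i,t)."""
    tags_of_user = defaultdict(set)    # tags(u)
    items_of_user = defaultdict(set)   # items(u)
    items_of_tag = defaultdict(set)    # items(t)
    tf_user = defaultdict(int)         # tf_u(i,t), keyed by (u, i, t)
    tf = defaultdict(int)              # tf(i,t), keyed by (i, t)
    for u, i, t in triples:
        tags_of_user[u].add(t)
        items_of_user[u].add(i)
        items_of_tag[t].add(i)
        tf_user[(u, i, t)] += 1
        tf[(i, t)] += 1
    df = {t: len(items) for t, items in items_of_tag.items()}  # df(t) = |items(t)|
    return tags_of_user, items_of_user, items_of_tag, df, tf_user, tf


tags_of_user, items_of_user, items_of_tag, df, tf_user, tf = build_statistics(TRIPLES)
print(df["wizard"], tf[("hp1", "wizard")])  # 1 2
```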
24 Quantifying Friendship Strengths
- Global friendship strength
- Spiritual friendship strength
- Social friendship strength
- Integrated friendship strength
25 Spiritual Friendship Strength
$P_{spirit}(u,u')$: overlap in the interests of u and u'
- Several alternatives
  - based on overlap of tag usage
    [Figure: tag sets of u and u', e.g. harrypotter, wizard, deathlyhallows, philosopherstone]
  - based on overlap of tagged items
  - based on overlap of behavior (tagging, searching, rating, ...)
- For all u, u':
  - $P_{spirit}(u,u') \ge 0$
  - normalization such that $\sum_{u' \in U} P_{spirit}(u,u') = 1$
(tags(u): tags used by user u; items(u): items tagged by user u)
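The slide leaves the concrete overlap measure open. As one possible instantiation, the sketch below uses the Jaccard coefficient of the two users' tag sets. This is an assumption, not necessarily the talk's choice. It then normalizes the values so that they are non-negative and sum to 1 over all other users. `spiritual_strength` and the toy tag sets are hypothetical.

```python
def jaccard(a, b):
    """Overlap of two sets as |a & b| / |a | b| (0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def spiritual_strength(u, tags_of_user):
    """Hypothetical P_spirit(u, .): Jaccard overlap of tag sets, normalized to sum 1.

    Jaccard is only one possible overlap measure; the slide leaves the choice open.
    """
    raw = {v: jaccard(tags_of_user[u], tags_of_user[v]) for v in tags_of_user if v != u}
    total = sum(raw.values())
    if total == 0:
        return {v: 0.0 for v in raw}  # no overlap at all: nothing to normalize
    return {v: s / total for v, s in raw.items()}


tags_of_user = {
    "u": {"harrypotter", "wizard", "deathlyhallows"},
    "v": {"harrypotter", "wizard", "philosopherstone"},
    "x": {"wizard"},
    "w": {"travel"},
}
p = spiritual_strength("u", tags_of_user)
print(p)
```

Here v shares two of four distinct tags with u (0.5) and x one of three (1/3). After normalization they receive 0.6 and 0.4, and w receives 0.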
26 Graph-Based Friendship Strength
$P_{social}(u,u')$: based on the distance of u and u' in the user network
[Figure: example user network around u with users u1 to u7, and the resulting values of P_social(u, ·)]
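Graph distance can be turned into a strength in many ways. The sketch below makes a hypothetical choice: a user at hop distance d gets weight 1/d, with distances computed by breadth-first search and weights normalized to sum to 1. Unreachable users receive no weight. The decay function, the toy graph and the function names are assumptions.

```python
from collections import deque


def bfs_distances(graph, source):
    """Hop distance from source to every reachable user (unweighted BFS)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in graph.get(node, ()):
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


def social_strength(u, graph):
    """Hypothetical P_social(u, .): weight 1/distance, normalized to sum to 1."""
    dist = bfs_distances(graph, u)
    raw = {v: 1.0 / d for v, d in dist.items() if v != u}
    total = sum(raw.values())
    return {v: w / total for v, w in raw.items()} if total else {}


# Hypothetical undirected friendship graph as adjacency lists.
graph = {
    "u": ["u1", "u2"],
    "u1": ["u", "u3"],
    "u2": ["u"],
    "u3": ["u1"],
}
p = social_strength("u", graph)
print(p)
```

Direct friends u1 and u2 (distance 1) each get 0.4. The friend-of-a-friend u3 (distance 2) gets 0.2.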
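27 Integrated Friendship Strength
- Query-dependent mixture of
  - spiritual friendship strength
  - social friendship strength
  - background model (global)
$$P_{int}(u,u') = \alpha \cdot P_{social}(u,u') + \beta \cdot P_{spirit}(u,u') + (1 - \alpha - \beta) \cdot P_{global}(u,u')$$
with $0 \le \alpha, \beta \le 1$ and $\alpha + \beta \le 1$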
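A sketch of the mixture P_int(u,u') = α·P_social(u,u') + β·P_spirit(u,u') + (1 − α − β)·P_global(u,u'). It assumes that α weights the social and β the spiritual strength, as in the parameters of the user study. It also assumes that the global background model is uniform over all users. Both are assumptions, and `integrated_strength` is a hypothetical name. If P_social and P_spirit each sum to 1, the mixture does too.

```python
def integrated_strength(p_social, p_spirit, users, alpha, beta):
    """P_int(u, .) as a convex mixture of social, spiritual and global strengths.

    Assumption: the global background model is uniform, P_global(u, v) = 1/|users|.
    """
    if not (0 <= alpha <= 1 and 0 <= beta <= 1 and alpha + beta <= 1):
        raise ValueError("need 0 <= alpha, beta <= 1 and alpha + beta <= 1")
    p_global = 1.0 / len(users)
    return {
        v: alpha * p_social.get(v, 0.0)
        + beta * p_spirit.get(v, 0.0)
        + (1 - alpha - beta) * p_global
        for v in users
    }


users = ["v1", "v2", "v3", "v4"]
p_int = integrated_strength(
    p_social={"v1": 0.5, "v2": 0.5},
    p_spirit={"v2": 0.25, "v3": 0.75},
    users=users,
    alpha=0.2,
    beta=0.5,
)
print(p_int)
```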
28 Excursion: Scoring in Text Retrieval
General scoring framework, combining for each query tag t:
- the importance of t in the collection (the less frequent, the better)
- the importance of t for item i (the more frequent, the better)
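One classic instantiation of this framework is tf·idf. Here tf(i,t) captures the importance of t for item i, and idf(t) = log(|I| / df(t)) rewards tags that are rare in the collection. The sketch assumes exactly this pair of functions and a plain sum over query tags; the functions used in the talk may differ. `tfidf_score` and the toy counts are hypothetical.

```python
import math


def tfidf_score(item, query, tf, df, num_items):
    """Score = sum over query tags t of tf(i,t) * log(|I| / df(t))."""
    score = 0.0
    for t in query:
        if df.get(t, 0) == 0:
            continue  # tag unknown in the collection: contributes nothing
        idf = math.log(num_items / df[t])
        score += tf.get((item, t), 0) * idf
    return score


tf = {("hp1", "wizard"): 3, ("hp1", "magic"): 1, ("lotr", "magic"): 4}
df = {"wizard": 1, "magic": 2}
print(tfidf_score("hp1", ["wizard", "magic"], tf, df, num_items=4))
print(tfidf_score("lotr", ["wizard", "magic"], tf, df, num_items=4))
```

The rare tag wizard (df = 1) contributes more per occurrence than the more common tag magic (df = 2), so hp1 outranks lotr.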
29 Towards a User-specific Score
SIGIR 2008
30 Including Tag Expansion
- Problem: users use different tags for similar things → poor recall (missing relevant results)
- Example: MPI, MPII, MPI-INF, MPI-CS, Max-Planck-Institut, D5, AG5, DBIS, MMCI, UdS, Saarland University, ...
- Solution:
  1. Define a notion of similar tags
  2. Expand queries with similar tags
  3. Modify the scoring function for expanded queries
31 Heuristics for Finding Similar Tags
- Co-occurrence heuristics
  - tags t1 and t2 are similar if they (almost) always occur together
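A co-occurrence heuristic can be sketched at the item level. This sketch assumes that two tags occur together when they are attached to the same item, and measures how often that happens with the Jaccard overlap of items(t1) and items(t2). A value close to 1 means the tags (almost) always appear together. The measure, the toy data and `cooccurrence_similarity` are hypothetical. Each tag is assumed to be attached to at least one item.

```python
from itertools import combinations


def cooccurrence_similarity(items_of_tag):
    """Hypothetical sim(t1, t2): Jaccard overlap of the item sets of two tags."""
    sim = {}
    for t1, t2 in combinations(sorted(items_of_tag), 2):
        a, b = items_of_tag[t1], items_of_tag[t2]
        value = len(a & b) / len(a | b)
        if value > 0:
            sim[(t1, t2)] = sim[(t2, t1)] = value  # store both directions
    return sim


items_of_tag = {
    "mpi": {"p1", "p2", "p3", "p4"},
    "mpii": {"p1", "p2", "p3"},
    "yakuza": {"b1"},
}
sim = cooccurrence_similarity(items_of_tag)
print(sim[("mpi", "mpii")])  # 0.75
```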
32 Scoring Expanded Queries
- Naive approach
  - for query tag t, add all similar tags t' with sim(t,t') > δ to the query
But: transportation disaster is expanded by train, car, bus, plane;
international crime is expanded by mafia, camorra, yakuza
→ result quality drops due to topic drift
Better: auto-tuning incremental expansion. For query tag t, consider only the expansion with the highest combined score per item
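The contrast between naive expansion and keeping only the best expansion per item can be sketched as follows. The sketch assumes an expansion's contribution is its similarity times its score. It also assumes that the original tag counts with similarity 1. The similarity values, scores and function names are hypothetical.

```python
def naive_expansion_score(item, t, sim, scores, delta):
    """Naive: add every t' with sim(t, t') > delta and sum the weighted scores."""
    expanded = {t: 1.0}
    expanded.update({t2: s for (t1, t2), s in sim.items() if t1 == t and s > delta})
    return sum(w * scores.get((item, tag), 0.0) for tag, w in expanded.items())


def max_expansion_score(item, t, sim, scores):
    """Per item, keep only the single expansion with the highest combined score."""
    best = scores.get((item, t), 0.0)
    for (t1, t2), s in sim.items():
        if t1 == t:
            best = max(best, s * scores.get((item, t2), 0.0))
    return best


# Hypothetical similarities of "transportation" to vehicle tags.
sim = {("transportation", v): 0.5 for v in ["train", "car", "bus", "plane"]}
# "catalog" carries many weakly related tags; "report" carries the query tag itself.
scores = {("catalog", v): 1.0 for v in ["train", "car", "bus", "plane"]}
scores[("report", "transportation")] = 1.2

items = ["catalog", "report"]
naive = {i: naive_expansion_score(i, "transportation", sim, scores, delta=0.3) for i in items}
best = {i: max_expansion_score(i, "transportation", sim, scores) for i in items}
print(naive)  # {'catalog': 2.0, 'report': 1.2}
print(best)   # {'catalog': 0.5, 'report': 1.2}
```

Under the naive sum, an item that matches many expansion tags overtakes an item that carries the query tag itself. Taking the per-item maximum prevents this kind of drift.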
33 Experimental Evaluation: Effectiveness
- Systematic evaluation of result quality is difficult
- Three possible setups
  - Manual queries & human assessments
  - Queries & assessments derived from external information (e.g., DMOZ categories)
  - Automated assessments from the context of the user
    - items tagged by friends
    - items tagged in the future
34 Prototype (VLDB/SIGIR 2008 demo)
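35 Preliminary User Study
- LibraryThing user study (Data Engineering Bulletin, June 2008)
- 6 librarything users with reasonably large library and friend sets
- Overall 49 queries like mystery magic, wizard, yakuza
- Crawled (part of) librarything: 1.3 million books, 15 million tags, 12,000 users, 18,000 friends
- Measured NDCG@10

NDCG@10 for combinations of α (social, rows) and β (spiritual, columns); "-" marks combinations with α + β > 1:

| α \ β | 0.0   | 0.2   | 0.5   | 0.8   | 1.0   |
|-------|-------|-------|-------|-------|-------|
| 0.0   | 0.546 | 0.572 | 0.568 | 0.565 | 0.565 |
| 0.2   | 0.564 | 0.572 | 0.579 | 0.581 | -     |
| 0.5   | 0.539 | 0.552 | 0.559 | -     | -     |
| 0.8   | 0.515 | 0.546 | -     | -     | -     |
| 1.0   | 0.465 | -     | -     | -     | -     |

- Result quality is generally very high (NDCG@10 between 0.465 and 0.581)
- The combination of spiritual and social friends is best (0.581 at α = 0.2, β = 0.8)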
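For reference, NDCG@10 compares the discounted gain of the returned ranking with that of an ideal ranking. The sketch below uses the common variant with linear gains and a log2(rank + 1) discount. The study may have used a different gain or discount variant, and the function names are hypothetical.

```python
import math


def dcg(gains):
    """Discounted cumulative gain: gain at rank r (1-based) divided by log2(r + 1)."""
    return sum(g / math.log2(rank + 1) for rank, g in enumerate(gains, start=1))


def ndcg_at_k(ranked_gains, all_gains, k=10):
    """NDCG@k: DCG of the top-k results divided by the DCG of an ideal ranking.

    ranked_gains: relevance gains of the results in the order returned
    all_gains:    gains of all judged items, used to build the ideal ranking
    """
    ideal = dcg(sorted(all_gains, reverse=True)[:k])
    return dcg(ranked_gains[:k]) / ideal if ideal > 0 else 0.0


# One relevant item at rank 1, one at rank 3, out of two relevant items overall.
print(round(ndcg_at_k([1, 0, 1], [1, 1, 0]), 3))
```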
36 Outline
- Search in Social Tagging Networks
- Effective Query Scoring
- Efficient Query Evaluation
  - Threshold Algorithms
  - ContextMerge
  - Experimental Evaluation
- Summary & Further Challenges
37 Algorithmic Overview
- Input: query q = {t_1, ..., t_n} for user u, with parameters α, β
- Output: the k items with the highest scores
- Goals
  - avoid computing all results
  - minimize disk I/O and CPU load
  - utilize precomputed information on disk
[Figure: example query harry potter]
38 Excursion: Threshold Algorithms for Text IR
- Input
  - query q = {t_1, ..., t_n}
  - lists L(t_p) with pairs ⟨i, score(i,t_p)⟩, sorted by score(i,t_p) in descending order
- Output: the k items with the highest aggregated score
- Family of Threshold Algorithms
  - scan lists in parallel
  - maintain partial candidate results with score bounds
  - terminate as soon as the top-k results are stable
39-44 Example: Top-1 for a 2-term query (NRA)

| L1     | L2     |
|--------|--------|
| A 0.9  | D 1.0  |
| G 0.3  | E 0.7  |
| H 0.3  | F 0.7  |
| I 0.25 | B 0.65 |
| J 0.2  | C 0.6  |
| K 0.2  | A 0.3  |
| D 0.15 | G 0.2  |

[Figure: both lists are scanned in parallel, tracking the current top-1 item, its min-k score, and the candidate set; min-k is 0.9 after reading A from L1 and 1.0 after reading D from L2]
- Later steps: no more new candidates are considered
- Finally: the algorithm safely terminates
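The example can be replayed with a minimal NRA (No Random Access) sketch. It assumes the aggregate score is the sum of the list scores and that the lists are read round-robin, one entry each per round. NRA also permits other access schedules. Each seen item carries a lower bound (the sum of its scores seen so far) and an upper bound (that sum plus the last score read in every list where it has not appeared yet). `nra_top_k` is a hypothetical name.

```python
def nra_top_k(lists, k):
    """NRA top-k over (item, score) lists sorted by score in descending order.

    Returns the top-k items with their lower-bound (worstscore) values.
    """
    m = len(lists)
    pos = [0] * m                                        # next position per list
    high = [lst[0][1] if lst else 0.0 for lst in lists]  # bound for unread entries
    seen = {}                                            # item -> {list index: score}
    while True:
        progressed = False
        for j, lst in enumerate(lists):                  # one round of sorted accesses
            if pos[j] < len(lst):
                item, score = lst[pos[j]]
                pos[j] += 1
                high[j] = score
                seen.setdefault(item, {})[j] = score
                progressed = True
            else:
                high[j] = 0.0                            # exhausted list adds nothing
        worst = {i: sum(s.values()) for i, s in seen.items()}
        best = {i: worst[i] + sum(high[j] for j in range(m) if j not in s)
                for i, s in seen.items()}
        top = sorted(worst, key=worst.get, reverse=True)[:k]
        if not progressed:                               # all lists fully read
            return [(i, worst[i]) for i in top]
        if len(top) == k:
            min_k = worst[top[-1]]
            others_safe = all(best[i] <= min_k for i in seen if i not in top)
            unseen_safe = sum(high) <= min_k             # items not seen anywhere yet
            if others_safe and unseen_safe:
                return [(i, worst[i]) for i in top]


list1 = [("A", 0.9), ("G", 0.3), ("H", 0.3), ("I", 0.25), ("J", 0.2), ("K", 0.2), ("D", 0.15)]
list2 = [("D", 1.0), ("E", 0.7), ("F", 0.7), ("B", 0.65), ("C", 0.6), ("A", 0.3), ("G", 0.2)]
print(nra_top_k([list1, list2], k=1))
```

Applied to the lists above, this sketch stops before reading the last entries. It returns A with 0.9 + 0.3 = 1.2 as the top-1 item, because D's upper bound can no longer exceed that value.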
45 Can we reuse this here?
[Figure: precomputed per-user score lists for the tags harry and travel]
The number of lists to precompute would explode! (tags × users × parameter space)
46 Revisiting the Social Frequency
Compute sf_u(i,t) on the fly from tf(i,t), the friends of u, and their tagged documents
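One way to see why this works is a small sketch built on assumptions that the slides do not state explicitly. First, assume sf_u(i,t) = Σ_v P_int(u,v)·tf_v(i,t). Second, assume the global background model is uniform. Then the background term collapses to γ·tf(i,t) with γ = (1 − α − β)/|U|, which is user-independent and readable from a global per-tag list. Only the remaining friend-weighted part, with weights F(u,v) = α·P_social(u,v) + β·P_spirit(u,v), depends on u. All names and values below are hypothetical.

```python
def social_frequency_direct(p_int, tf_user, item, tag):
    """sf_u(i,t) = sum over all users v of P_int(u, v) * tf_v(i,t) (assumed definition)."""
    return sum(w * tf_user.get((v, item, tag), 0) for v, w in p_int.items())


def social_frequency_split(friend_weights, gamma, tf, tf_user, item, tag):
    """Same value, split into a user-independent part gamma * tf(i,t)
    and a user-specific part over friends only."""
    user_independent = gamma * tf.get((item, tag), 0)
    user_specific = sum(
        w * tf_user.get((v, item, tag), 0) for v, w in friend_weights.items()
    )
    return user_independent + user_specific


users = ["v1", "v2", "v3"]
alpha, beta = 0.2, 0.5
gamma = (1 - alpha - beta) / len(users)   # uniform background weight per user
friend_weights = {"v1": 0.4, "v2": 0.3}   # toy alpha*P_social + beta*P_spirit (sum 0.7)
p_int = {v: friend_weights.get(v, 0.0) + gamma for v in users}
tf_user = {("v1", "hp", "harry"): 2, ("v2", "hp", "harry"): 1, ("v3", "hp", "harry"): 5}
tf = {("hp", "harry"): 8}                 # tf(i,t) = sum of tf_v(i,t) over all users
direct = social_frequency_direct(p_int, tf_user, "hp", "harry")
split = social_frequency_split(friend_weights, gamma, tf, tf_user, "hp", "harry")
print(direct, split)
```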
47 Top-k in Social Networks: ContextMerge
- Precomputed lists
  - ITEMS(t): pairs ⟨i, tf(i,t)⟩, sorted by tf(i,t) in descending order (these already exist in systems)
  - USERITEMS(u,t): pairs ⟨i, tf_u(i,t)⟩, unsorted
  - FRIENDS(u): pairs ⟨u', F(u,u')⟩, sorted by F(u,u') in descending order
[Figure: ITEMS(harry) with tf values 47, 32, 26; USERITEMS(u', harry); FRIENDS(u) with weights 0.12, 0.10, 0.085]
48 ContextMerge
- Adapted Threshold Algorithm for query (u, t)
- Scan ITEMS(t) and FRIENDS(u) in parallel
  - pick the best list
  - if ITEMS(t): read the next entry
  - if FRIENDS(u): read USERITEMS(u',t) for the next friend u'
- Maintain candidates with bounds for their min and max score, and the current results
[Figure: ITEMS(harry) with tf values 47, 32, 26; FRIENDS(u) with weights 0.12, 0.10, 0.085]
49-50 ContextMerge (continued)
[Figure: the scan of ITEMS(harry) and FRIENDS(u) continues; each candidate's social frequency is split into a user-independent part (e.g., 47 from ITEMS(harry)) and a user-specific part from the friends' USERITEMS lists, where the contribution of friends not yet read is bounded (e.g., ≤ 0.88·U)]
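A hypothetical sketch of such a scan for a single-tag query (u, t). It assumes the score γ·tf(i,t) + Σ_v F(u,v)·tf_v(i,t), a user-independent part plus a friend-weighted part. It also assumes a simple "best list" heuristic: read the next friend while its weight exceeds γ, otherwise read ITEMS(t). Both are assumptions rather than the published algorithm. The bounds rely on tf_v(i,t) ≤ tf(i,t). The tf values 47, 32, 26 and friend weights 0.12, 0.10, 0.085 come from the figure. The item names, per-friend counts and γ = 0.01 are made up.

```python
def context_merge(items_list, friends_list, user_items, gamma, k):
    """Hypothetical ContextMerge sketch for a single-tag query (u, t).

    items_list:   ITEMS(t) as (item, tf(i,t)) pairs, sorted by tf descending
    friends_list: FRIENDS(u) as (friend, F(u, friend)) pairs, sorted descending
    user_items:   friend -> {item: tf_friend(i,t)}, standing in for USERITEMS
    Returns the top-k items with their lower score bounds.
    """
    tf_known = {}      # item -> tf(i,t), once its ITEMS(t) entry has been read
    friend_part = {}   # item -> sum of F(u,v) * tf_v(i,t) over friends read so far
    user_tf_sum = {}   # item -> sum of tf_v(i,t) over friends read (<= tf(i,t))
    pos_items = pos_friends = 0
    while True:
        more_items = pos_items < len(items_list)
        more_friends = pos_friends < len(friends_list)
        # Assumed heuristic for "pick the best list".
        if more_friends and (not more_items or friends_list[pos_friends][1] > gamma):
            friend, weight = friends_list[pos_friends]
            pos_friends += 1
            for item, tf_v in user_items.get(friend, {}).items():
                friend_part[item] = friend_part.get(item, 0.0) + weight * tf_v
                user_tf_sum[item] = user_tf_sum.get(item, 0) + tf_v
        elif more_items:
            item, tf = items_list[pos_items]
            pos_items += 1
            tf_known[item] = tf
        # Unread ITEMS entries have tf <= high; unread friends weigh `remaining`.
        high = items_list[pos_items][1] if pos_items < len(items_list) else 0
        remaining = sum(w for _, w in friends_list[pos_friends:])
        candidates = set(tf_known) | set(friend_part)
        lower, upper = {}, {}
        for i in candidates:
            tf_lb = tf_known.get(i, user_tf_sum.get(i, 0))
            tf_ub = tf_known.get(i, high)
            lower[i] = gamma * tf_lb + friend_part.get(i, 0.0)
            upper[i] = gamma * tf_ub + friend_part.get(i, 0.0) + remaining * tf_ub
        top = sorted(candidates, key=lambda i: (-lower[i], i))[:k]
        exhausted = pos_items >= len(items_list) and pos_friends >= len(friends_list)
        if exhausted or len(top) == k:
            min_k = lower[top[-1]] if top else 0.0
            others_safe = all(upper[i] <= min_k for i in candidates if i not in top)
            unseen_safe = (gamma + remaining) * high <= min_k
            if exhausted or (others_safe and unseen_safe):
                return [(i, lower[i]) for i in top]


items_harry = [("i1", 47), ("i2", 32), ("i3", 26)]
friends_u = [("f1", 0.12), ("f2", 0.10), ("f3", 0.085)]
user_items = {"f1": {"i2": 10}, "f2": {"i2": 5, "i3": 3}, "f3": {"i1": 1}}
print(context_merge(items_harry, friends_u, user_items, gamma=0.01, k=1))
print(context_merge(items_harry, friends_u, user_items, gamma=0.01, k=2))
```

For k = 1 the sketch stops after reading only the three friend lists. It has not yet read a single ITEMS(harry) entry, because i2's lower bound already exceeds every other upper bound.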
51 Experimental Evaluation: Efficiency
- Testbed: 3 large crawls of real social networks
  - Flickr: 10 million pictures, 50,000 users
  - Del.icio.us: 175,000 bookmarks, 12,000 users
  - Librarything: 6.5 million books, 10,000 users
- Queries
  - 150 frequent tag pairs
  - for each query, pick a user with enough results & friends
- Abstract cost measure (≈ disk load)
- Baseline: full merge sort
52 Experimental Evaluation: Efficiency
[Figure: efficiency results plotted over α]
53 Outline
- Search in Social Tagging Networks
- Effective Query Scoring
- Efficient Query Evaluation
- Summary & Further Challenges
54 Summary
- Need for social-aware social search, supporting global, social, and spiritual information needs
- Social scoring
  - integrating global, collection, and social context
  - including dynamic tag expansion
- ContextMerge: scalable implementation
55 Further Challenges
- Meaningful common benchmark
- Incremental maintenance for high dynamics
- Extend to ratings, user weights, item weights, ...
- Extend to non-tags (like image features)
- Automatic query parameterization
- Meaningful explanations of results
- Exploit dynamics (hot topics, evolving groups, ...)
Social-Aware Search & Recommendations at planet scale
56 Thank you.