Information Search in Social Networks (Informationssuche in sozialen Netzen) - PowerPoint PPT Presentation

Title:

Information Search in Social Networks (Informationssuche in sozialen Netzen)

Description:

Informationssuche in sozialen Netzen Ralf Schenkel Joint work with Tom Crecelius, Mouna Kacimi, Sebastian Michel, Thomas Neumann, Josiane Parreira, Marc Spaniol ... – PowerPoint PPT presentation

Slides: 57

Provided by: RalfSc1

Transcript and Presenter's Notes

Title: Information Search in Social Networks (Informationssuche in sozialen Netzen)

1
Information Search in Social Networks (Informationssuche in sozialen Netzen)

Ralf Schenkel

Joint work with Tom Crecelius, Mouna Kacimi,
Sebastian Michel, Thomas Neumann, Josiane
Parreira, Marc Spaniol, Gerhard Weikum
2
Social Tagging Networks

Definition: Social Tagging Network
Website where people
publish and tag information
review and rate information
publish their interests
maintain a network of friends
interact with friends

3
Some Statistics

Flickr (as of Nov 2008)
3 billion photos, 3 million new photos per day

Facebook (as of Nov 2008)
10 billion photos, 30 million new photos per
day
120 million active users
150,000 new users per day

Myspace (as of Apr 2007)
135 million users (6th largest country on Earth)
2 billion images (150,000 req/s), millions added
daily
25 million songs
60TB videos

StudiVZ.net (as of Nov 2008)
11 million users
300 million images, 1 million added daily

Huge volume of highly dynamic data
4
Showcase: librarything.com
5
librarything.com: Social Interaction
6
librarything.com: Tag Clouds
7
librarything.com: Search
Search results are independent of the querying user (and the social context)
8
librarything.com: Search
Search is automatically expanded with similar tags (synonyms)
9
librarything.com: Recommendations
Recommendations depend on user and tags (but not on social context)
10
librarything.com: Recommendations
Explanation for the recommendation
11
librarything.com: Explanations
12
librarything.com: Explanations
13
Outline

Search in Social Tagging Networks
Graph Model
Different Information Needs
Effective Query Scoring
Efficient Query Evaluation
Summary & Further Challenges

14
Querying Social Tagging Networks
15
Querying Social Tagging Networks
16
Information Need 1: Globally Popular
Example query: harry potter
Most frequently tagged items are best
Tags by all users are equally important
17
Information Need 2: Similar Users
Example query: travel
18
Information Need 2: Similar Users
Example query: travel
Tags by users with similar tags/items (brothers in spirit) are more important
19
Information Need 3: Trusted Friends
Example query: probability
20
Information Need 3: Trusted Friends
Example query: probability
Tags by closely related and well-known users are more important
21
Towards Social-Aware Social Search

Search results may depend on
Global popularity of items
Spiritual context of the querying user (users with similar books and/or tags)
Social context of the querying user (known and trusted friends)

22
Outline

Search in Social Tagging Networks
Effective Query Scoring
Quantifying Friendship Strengths
User-specific Scoring Functions
Experimental Evaluation
Efficient Query Evaluation
Summary & Further Challenges

23
Notation

U: set of users
T: set of tags
I: set of items
tags(u): tags used by user u
items(u): items tagged by user u
items(t): items tagged with tag t by at least one user
df(t): number of items tagged with tag t
tf_u(i,t): number of times user u tagged item i with tag t
tf(i,t): number of times item i was tagged with tag t
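The notation above can be made concrete with a small sketch. The tagging events, user names, and item names below are hypothetical toy data, and the dictionary layout is only one possible way to hold these quantities.

```python
from collections import Counter, defaultdict

# Hypothetical toy data: one (user, item, tag) triple per tagging event.
events = [
    ("ann", "book1", "wizard"),
    ("ann", "book1", "harrypotter"),
    ("bob", "book1", "wizard"),
    ("bob", "book2", "travel"),
    ("cat", "book3", "wizard"),
]

tags = defaultdict(set)      # tags(u): tags used by user u
items = defaultdict(set)     # items(u): items tagged by user u
items_t = defaultdict(set)   # items(t): items tagged with t by at least one user
tf_u = Counter()             # tf_u(i,t), keyed by (u, i, t)
tf = Counter()               # tf(i,t), keyed by (i, t)

for u, i, t in events:
    tags[u].add(t)
    items[u].add(i)
    items_t[t].add(i)
    tf_u[(u, i, t)] += 1
    tf[(i, t)] += 1


def df(t):
    """df(t): number of items tagged with tag t (0 for unknown tags)."""
    return len(items_t.get(t, ()))
```

With this data, df("wizard") counts two items (book1 and book3), while tf for book1 and "wizard" is 2 because two different users applied that tag.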

24
Quantifying Friendship Strengths

Global friendship strength

Spiritual friendship strength

Social friendship strength

Integrated friendship strength

25
Spiritual Friendship Strength
Overlap in interests of u and u'

Several alternatives:
based on overlap of tag usage

[Figure: users u and u' with the tags harrypotter, wizard, deathlyhallows, philosopherstone]

based on overlap of tagged items

overlap of behavior (tagging, searching, rating, ...)

For all
Pspirit(u,u)0
normalization such that

tags(u): tags used by user u; items(u): items tagged by user u
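The exact formula on this slide did not survive in the transcript. As a rough illustration of the "overlap of tag usage" alternative, the following sketch assumes a Jaccard overlap of tag sets, normalized so that the strengths for one user sum to 1. Both the Jaccard measure and this normalization are assumptions, not necessarily what the authors used.

```python
def spiritual_strengths(u, tags):
    """Return {u': strength} for all other users, based on tag-set overlap.

    Assumption: Jaccard overlap, normalized so the values sum to 1.
    """
    raw = {}
    for v, tags_v in tags.items():
        if v == u:
            continue  # a user is not compared with themselves
        union = tags[u] | tags_v
        raw[v] = len(tags[u] & tags_v) / len(union) if union else 0.0
    total = sum(raw.values())
    return {v: (s / total if total else 0.0) for v, s in raw.items()}


# Hypothetical tag sets, reusing the tag names from the slide.
example_tags = {
    "u": {"harrypotter", "wizard", "deathlyhallows"},
    "v": {"harrypotter", "wizard", "philosopherstone"},
    "w": {"wizard", "travel"},
}
strengths = spiritual_strengths("u", example_tags)
```

The same function works for the "overlap of tagged items" alternative if item sets are passed instead of tag sets.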
26
Graph-Based Friendship Strength
Distance of u and u' in the user network
[Figure: example user network with users u1 to u7 and a table of Psocial( , u') values for u2 to u7]
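The slide only states that the social strength depends on the distance of two users in the friendship graph. One way to sketch this, assuming hop distance found by breadth-first search and a hypothetical exponential decay per hop (the decay factor and the normalization are illustrative choices, not taken from the slides):

```python
from collections import deque


def hop_distances(friends, source):
    """Breadth-first search: hop distance from source to every reachable user."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        x = queue.popleft()
        for y in friends.get(x, ()):
            if y not in dist:
                dist[y] = dist[x] + 1
                queue.append(y)
    return dist


def social_strengths(friends, u, decay=0.5):
    """Assumed model: strength proportional to decay**distance, summing to 1."""
    dist = hop_distances(friends, u)
    raw = {v: decay ** d for v, d in dist.items() if v != u}
    total = sum(raw.values())
    return {v: s / total for v, s in raw.items()} if total else {}


# Hypothetical friendship graph as adjacency lists.
network = {"u1": ["u2", "u3"], "u2": ["u1", "u4"], "u3": ["u1"], "u4": ["u2"]}
social = social_strengths(network, "u1")
```

Direct friends (u2, u3) end up with a higher strength than the friend of a friend (u4), and unreachable users get no entry at all.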
27
Integrated Friendship Strength

Query-dependent mixture of
spiritual friendship strength
social friendship strength
background model (global)
(0 ≤ α, β ≤ 1; α + β ≤ 1)

Pint(u,u')
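The mixture formula itself is missing from the transcript. A plausible reading, given the constraints on the two weights and the later results table (α labelled social, β labelled spiritual), is a linear combination in which the global background model receives the remaining weight 1 - α - β. The sketch below encodes that reading; treat it as an assumption.

```python
def integrated_strength(p_social, p_spirit, p_global, alpha, beta):
    """Assumed linear mixture of the three friendship strengths.

    alpha weights the social part, beta the spiritual part, and the
    remainder (1 - alpha - beta) goes to the global background model.
    """
    if not (0 <= alpha <= 1 and 0 <= beta <= 1 and alpha + beta <= 1):
        raise ValueError("need 0 <= alpha, beta <= 1 and alpha + beta <= 1")
    return alpha * p_social + beta * p_spirit + (1 - alpha - beta) * p_global


# Hypothetical strengths for one pair of users.
mixed = integrated_strength(p_social=0.4, p_spirit=0.6, p_global=0.1,
                            alpha=0.2, beta=0.8)
```

Setting alpha = beta = 0 falls back to the global model, which corresponds to the "globally popular" information need.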
28
Excursion: Scoring in Text Retrieval
General scoring framework
Importance of t in the collection (the less frequent, the better)
Importance of t for item i (the more frequent, the better)
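The two factors above correspond to the familiar tf-idf pattern. A minimal sketch, assuming a logarithmic collection weight and a raw frequency count (the slide's actual formula is not in the transcript, and real systems often use dampened or BM25-style variants):

```python
import math


def tfidf_score(item, query_tags, tf, df, num_items):
    """Sum over query tags of (collection importance) * (item importance)."""
    score = 0.0
    for t in query_tags:
        if df.get(t, 0) == 0:
            continue  # tag never used: contributes nothing
        idf = math.log(1 + num_items / df[t])   # the less frequent, the better
        score += idf * tf.get((item, t), 0)     # the more frequent, the better
    return score


# Hypothetical statistics: "rare" is used on 2 of 100 items, "wizard" on 50.
tf_counts = {("book1", "wizard"): 2, ("book1", "rare"): 2}
df_counts = {"wizard": 50, "rare": 2}
common = tfidf_score("book1", ["wizard"], tf_counts, df_counts, 100)
rare = tfidf_score("book1", ["rare"], tf_counts, df_counts, 100)
```

With equal tag frequencies on the item, the rarer tag yields the higher score.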
29
Towards a User-specific Score
SIGIR 2008
30
Including Tag Expansion

Problem: Users use different tags for similar things
→ poor recall (missing relevant results)

Example: MPI, MPII, MPI-INF, MPI-CS, Max-Planck-Institut, D5, AG5, DBIS, MMCI, UdS, Saarland University, ...
Solution:
1. Define notion of similar tags
2. Expand queries with similar tags
3. Modify scoring function for expanded queries
31
Heuristics for finding similar tags

Co-occurrence heuristics:
Tags t1 and t2 are similar if they occur (almost) always together
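One way to turn the co-occurrence heuristic into a number is the Dice coefficient over the sets of items carrying each tag. The choice of Dice is an assumption for illustration; the slide does not say which measure was used.

```python
def cooccurrence_similarity(t1, t2, items_of_tag):
    """Dice coefficient of the item sets of two tags (1.0 = always together)."""
    a = items_of_tag.get(t1, set())
    b = items_of_tag.get(t2, set())
    if not a or not b:
        return 0.0
    return 2 * len(a & b) / (len(a) + len(b))


# Hypothetical item sets per tag.
items_of_tag = {"mpi": {1, 2, 3, 4}, "mpii": {1, 2, 3}, "travel": {9}}
sim_close = cooccurrence_similarity("mpi", "mpii", items_of_tag)
sim_far = cooccurrence_similarity("mpi", "travel", items_of_tag)
```

Tags that nearly always appear on the same items score close to 1, unrelated tags score 0.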

32
Scoring Expanded Queries

Naive approach:
For query tag t, add similar tags t' with sim(t,t') > δ to the query

But:
transportation disaster expanded by train, car, bus, plane
international crime expanded by mafia, camorra, yakuza
Result quality drops due to topic drift
Better: auto-tuning incremental expansion
For query tag t, consider only the expansion with the highest combined score per item
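The difference between the naive approach and the incremental expansion can be sketched as follows: instead of summing the contributions of all similar tags, each query tag contributes only its single best expansion per item. The combination sim(t,t') times the per-tag score, the threshold value, and all names below are assumptions for illustration.

```python
def expanded_score(item, query_tags, base_score, similar, delta=0.5):
    """Per query tag, keep only the best-scoring expansion for this item."""
    total = 0.0
    for t in query_tags:
        best = base_score.get((item, t), 0.0)        # the tag itself, sim = 1
        for t2, sim in similar.get(t, {}).items():
            if sim > delta:                          # only sufficiently similar tags
                best = max(best, sim * base_score.get((item, t2), 0.0))
        total += best
    return total


# Hypothetical similarities and per-tag scores for one item.
similar = {"crime": {"mafia": 0.8, "yakuza": 0.7, "weather": 0.1}}
base_score = {("b1", "mafia"): 1.0, ("b1", "yakuza"): 1.0, ("b1", "weather"): 5.0}
score_b1 = expanded_score("b1", ["crime"], base_score, similar)
```

An item tagged with many expansions of the same query tag is not rewarded repeatedly, which limits topic drift.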
33
Experimental Evaluation: Effectiveness

Systematic evaluation of result quality is difficult
Three possible setups:
Manual queries and human assessments
Queries and assessments derived from external info (e.g., DMOZ categories)
Automated assessments from context of user:
Items tagged by friends
Items tagged in the future
34
Prototype: VLDB/SIGIR 2008 demo
35
Preliminary User Study

LibraryThing user study (Data Engineering Bulletin, June 2008)
6 librarything users with reasonably large library and friend sets
Overall 49 queries like mystery magic, wizard, yakuza
Crawled (part of) librarything: 1.3 million books, 15 million tags, 12,000 users, 18,000 friends
Measured NDCG@10

NDCG@10 for α (social, rows) and β (spiritual, columns):

α \ β   0.0     0.2     0.5     0.8     1.0
0.0     0.546   0.572   0.568   0.565   0.565
0.2     0.564   0.572   0.579   0.581   -
0.5     0.539   0.552   0.559   -       -
0.8     0.515   0.546   -       -       -
1.0     0.465   -       -       -       -

Result quality generally very high
Combination of spiritual and social friends is
best
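For readers unfamiliar with the NDCG@10 measure used in this study, here is a minimal sketch. It assumes graded relevance values in ranking order, the linear-gain variant of DCG, and an ideal ranking obtained by reordering the same assessed results. The study may have used a different gain function.

```python
import math


def dcg(relevances, k):
    """Discounted cumulative gain of the first k results (linear gain)."""
    return sum(rel / math.log2(rank + 2)
               for rank, rel in enumerate(relevances[:k]))


def ndcg(relevances, k=10):
    """DCG normalized by the DCG of the best possible ordering."""
    ideal = dcg(sorted(relevances, reverse=True), k)
    return dcg(relevances, k) / ideal if ideal > 0 else 0.0


# Hypothetical relevance grades for two rankings of the same four results.
good_ranking = ndcg([3, 2, 1, 0])
bad_ranking = ndcg([0, 1, 2, 3])
```

A ranking that lists the most relevant results first reaches 1.0, and any other order scores lower.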

36
Outline

Search in Social Tagging Networks
Effective Query Scoring
Efficient Query Evaluation
Threshold Algorithms
ContextMerge
Experimental Evaluation
Summary & Further Challenges

37
Algorithmic Overview

Input: query q = {t1, ..., tn} for user u; α, β
Output: k items with highest scores
Goals:
Avoid computing all results
Minimize disk I/O and CPU load
Utilize precomputed information on disk

harry potter
..
38
Excursion: Threshold Algorithms for Text IR

Input:
query q = {t1, ..., tn}
lists L(tp) with pairs <i, score(i,tp)>, sorted by score(i,tp) in descending order
Output: k items with highest aggregated score
Family of Threshold Algorithms:
scan lists in parallel
maintain partial candidate results with score bounds
terminate as soon as top-k results are stable

39
Example: Top-1 for 2-term query (NRA)
L1: A 0.9, G 0.3, H 0.3, I 0.25, J 0.2, K 0.2, D 0.15
L2: D 1.0, E 0.7, F 0.7, B 0.65, C 0.6, A 0.3, G 0.2
[Figure: top-1 item, min-k, and candidates; no values shown yet]
40
Example: Top-1 for 2-term query (NRA)
[Figure: lists L1 and L2 as on slide 39; top-1 item, min-k, and candidates; value shown: 0.9]
41
Example: Top-1 for 2-term query (NRA)
[Figure: lists L1 and L2 as on slide 39; top-1 item, min-k, and candidates; values shown: 0.9, 1.0]
42
Example: Top-1 for 2-term query (NRA)
[Figure: lists L1 and L2 as on slide 39; top-1 item, min-k, and candidates; value shown: 1.0]
43
Example: Top-1 for 2-term query (NRA)
[Figure: lists L1 and L2 as on slide 39; top-1 item, min-k, and candidates; value shown: 1.0]
No more new candidates considered
44
Example: Top-1 for 2-term query (NRA)
[Figure: lists L1 and L2 as on slide 39; top-1 item, min-k, and candidates; values shown: 1.0, 1.3]
Algorithm safely terminates
45
Can we reuse this here?
[Figure: precomputed score lists for the tags harry and travel, with example scores 0.87, 0.95, 0.82, 0.85, 0.69, 0.51]
Number of lists to precompute would explode! (tags × users × parameter space)
46
Revisiting the Social Frequency
Compute sf_u(i,t) on the fly from tf(i,t), the friends of u, and their tagged documents
47
Top-K in Social Networks: ContextMerge

Precomputed lists:
ITEMS(t): pairs <i, tf(i,t)>, sorted by tf(i,t) in descending order
USERITEMS(u,t): pairs <i, tf_u(i,t)>, unsorted
FRIENDS(u): pairs <u', F(u,u')>, sorted by F(u,u') in descending order
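To see how the three lists fit together, here is an exhaustive baseline that computes the social frequency for every item and then sorts. It is not ContextMerge itself, which avoids reading the lists completely. It assumes that sf_u(i,t) is the sum of a user-independent part (a global weight times tf(i,t)) and a user-specific part (friendship strength times each friend's tf). The global weight and all data values are hypothetical.

```python
from collections import defaultdict


def social_frequencies(items_t, friends_u, useritems_t, global_weight):
    """Exhaustively compute an assumed sf_u(i,t) for one user u and one tag t.

    items_t:      ITEMS(t) as a list of (item, tf) pairs
    friends_u:    FRIENDS(u) as a list of (friend, strength) pairs
    useritems_t:  {friend: USERITEMS(friend, t) as a list of (item, tf_friend)}
    """
    sf = defaultdict(float)
    for item, tf in items_t:                    # user-independent part
        sf[item] += global_weight * tf
    for friend, strength in friends_u:          # user-specific part
        for item, tf_friend in useritems_t.get(friend, []):
            sf[item] += strength * tf_friend
    return dict(sf)


def top_k(sf, k):
    """Items with the k highest social frequencies."""
    return sorted(sf.items(), key=lambda kv: kv[1], reverse=True)[:k]


# Hypothetical lists for the tag "harry" and one querying user.
items_harry = [("a", 47), ("b", 32), ("c", 26)]
friends = [("f1", 0.12), ("f2", 0.10), ("f3", 0.085)]
useritems_harry = {"f1": [("c", 1)], "f2": [("c", 1), ("b", 1)]}
sf = social_frequencies(items_harry, friends, useritems_harry, global_weight=0.001)
best = top_k(sf, 1)
```

In this toy setting the globally least popular item c wins because two close friends tagged it, which is the effect the user-specific part is meant to capture.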

[Figure: example lists ITEMS(harry) with values 47, 32, 26, annotated "already exist in systems"; USERITEMS( , harry); FRIENDS( ) with values 0.12, 0.10, 0.085]

48
ContextMerge

Adapted Threshold Algorithm for query (u,t):
Scan ITEMS(t) and FRIENDS(u) in parallel, pick best list
If ITEMS(t): read next entry
If FRIENDS(u): read USERITEMS(u',t) for next friend u'
Maintain candidates with bounds for min and max score and current results

[Figure: ITEMS(harry) with values 47, 32, 26; FRIENDS( ) with values 0.12, 0.10, 0.085]

49
ContextMerge

Adapted Threshold Algorithm for query (u,t), steps as on slide 48

[Figure: ITEMS(harry) with values 47, 32, 26; FRIENDS( ) with values 0.12, 0.10, 0.085; annotations for the user-independent part of sf (47) and the user-specific part of sf (47, 0.12, U)]

50
ContextMerge

Adapted Threshold Algorithm for query (u,t), steps as on slide 48

[Figure: ITEMS(harry) with values 47, 32, 26; FRIENDS( ) with values 0.12, 0.10, 0.085; bounds shown for the user-independent part of sf (47) and the user-specific part of sf (47, 0.12, 0.88U, U)]
51
Experimental Evaluation: Efficiency