Title: Exploring Folksonomy for Personalized Search
Sungjick Lee, University of Seoul
Personalized Search with Folksonomy
- Folksonomy
  - as category names
  - as keywords
  - as link structure
- Personalized search
  - Associations between the users and the web pages, modeled with the Vector Space Model (VSM)
    - The interest vector of each user
    - The topic vector of each page
A Personalized Search Framework
All pages are ranked by two models, and the two rankings are then aggregated:
- Ranking pages by the topic matching model (user ↔ page)
  - The topic vector of the web page p_i
  - The interest vector of the user u_j
  - The topic similarity between p_i and u_j
- Ranking pages by the text matching model (query ↔ page)
- Ranking aggregation
A Personalized Search Framework
In practice
- Pages are ranked by the topic matching model (user ↔ page) and by the text matching model (query ↔ page), as above.
- Only the top 100 pages are passed to ranking aggregation.
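The slides name a ranking-aggregation step but not its formula. A minimal sketch, assuming the two rankings are combined as a weighted sum of rank positions (a weighted Borda-fuse style aggregation) and assuming the top 100 candidates come from the text matching model; the function `aggregate_rankings` and the weight `gamma` are hypothetical.

```python
from typing import Dict, List


def aggregate_rankings(text_ranked: List[str],
                       topic_scores: Dict[str, float],
                       gamma: float = 0.5,
                       top_n: int = 100) -> List[str]:
    """Re-rank the top_n text-matching results (hypothetical helper).

    Assumption: combined = gamma * text_rank + (1 - gamma) * topic_rank,
    where ranks are 1-based positions and a lower combined value is better.
    """
    candidates = text_ranked[:top_n]
    # Positions assigned by the text matching model (query <-> page).
    text_rank = {page: pos for pos, page in enumerate(candidates, start=1)}
    # Positions of the same candidates under the topic matching model (user <-> page);
    # sorted() is stable, so equal topic scores keep their text order.
    by_topic = sorted(candidates, key=lambda p: topic_scores.get(p, 0.0), reverse=True)
    topic_rank = {page: pos for pos, page in enumerate(by_topic, start=1)}
    combined = {p: gamma * text_rank[p] + (1 - gamma) * topic_rank[p]
                for p in candidates}
    # Break ties in the combined value by the original text rank.
    return sorted(candidates, key=lambda p: (combined[p], text_rank[p]))
```

For example, `aggregate_rankings(["p1", "p2", "p3"], {"p1": 0.1, "p2": 0.9, "p3": 0.5})` returns `["p2", "p1", "p3"]`: p2 is second by text but first by topic. Setting `gamma` to 1.0 reproduces the text ranking, and 0.0 the topic ranking.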
A Personalized Search Framework
In practice
Ranking pages by the text matching model (query ↔ page)
- Two state-of-the-art text retrieval models:
  - BM25
  - Language Model for IR (LMIR)
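The slides name BM25 and LMIR without giving formulas or parameters. A minimal sketch of both scorers over tokenized pages, assuming the common Okapi BM25 form with the usual defaults k1 = 1.2 and b = 0.75, and assuming Dirichlet smoothing (mu = 2000) for the language model; the smoothing method, all parameter values, and the function names are assumptions.

```python
import math
from collections import Counter
from typing import List


def bm25(query: List[str], doc: List[str], corpus: List[List[str]],
         k1: float = 1.2, b: float = 0.75) -> float:
    """Okapi BM25 score of `doc` for `query`; k1 and b are common defaults."""
    n_docs = len(corpus)
    avgdl = sum(len(d) for d in corpus) / n_docs
    doc_sets = [set(d) for d in corpus]
    tf = Counter(doc)
    score = 0.0
    for term in query:
        df = sum(1 for d in doc_sets if term in d)
        if df == 0 or tf[term] == 0:
            continue
        # IDF with +1 inside the log so that it stays positive for frequent terms.
        idf = math.log((n_docs - df + 0.5) / (df + 0.5) + 1.0)
        # Term-frequency saturation (k1) and document-length normalization (b).
        norm = tf[term] + k1 * (1 - b + b * len(doc) / avgdl)
        score += idf * tf[term] * (k1 + 1) / norm
    return score


def lmir_dirichlet(query: List[str], doc: List[str], corpus: List[List[str]],
                   mu: float = 2000.0) -> float:
    """Query log-likelihood under a Dirichlet-smoothed document language model."""
    coll = Counter(t for d in corpus for t in d)
    coll_len = sum(coll.values())
    tf = Counter(doc)
    score = 0.0
    for term in query:
        p_coll = coll[term] / coll_len
        if p_coll == 0.0:
            continue  # term unseen in the collection: skip to avoid log(0)
        score += math.log((tf[term] + mu * p_coll) / (len(doc) + mu))
    return score
```

Both functions score one page against one query; a text-matching ranking is obtained by sorting the pages by score, highest first.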
A Personalized Search Framework
In practice
Ranking pages by the topic matching model (user ↔ page)
- The topic vector of the web page p_i
- The interest vector of the user u_j
- The topic similarity between p_i and u_j
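The topic matching model compares a page's topic vector with a user's interest vector. A minimal sketch, assuming cosine similarity (the usual choice in a VSM) over sparse vectors keyed by topic; the names `cosine` and `rank_by_topic` are hypothetical.

```python
import math
from typing import Dict, List

# Sparse vector over the topic space: topic -> weight.
Vector = Dict[str, float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two sparse vectors (0.0 if either is all zeros)."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return dot / (na * nb)


def rank_by_topic(interest: Vector, page_topics: Dict[str, Vector]) -> List[str]:
    """Order pages by the similarity of their topic vector to the user's interest vector."""
    return sorted(page_topics,
                  key=lambda p: cosine(page_topics[p], interest),
                  reverse=True)
```

A user interested mostly in "car" and somewhat in "book" would see a car page first, a book page second, and an unrelated page last.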
Ranking pages by the topic matching model (1/6)
- Estimating the initial topic vectors and initial interest vectors
  - From folksonomy (using TFIDF / BM25)
  - From taxonomy (ODP categories as topics)
- Interest and topic adjusting via the bipartite collaborative link structure
Ranking pages by the topic matching model (2/6)
- Estimating the initial topic vectors and initial interest vectors: from folksonomy (using TFIDF / BM25)
  - Documents: the social annotations of each user, and the social annotations of each page
  - Terms: the social annotations themselves, which span the topic space
  - Weighting the terms with TFIDF or BM25 gives the interest vector of each user and the topic vector of each page
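Before TFIDF or BM25 can be applied, the folksonomy has to be turned into one "document" per user and one per page. A minimal sketch, assuming the folksonomy is available as hypothetical (user, tag, page) triples, one per annotation; the function `build_documents` is also hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical representation of a folksonomy: (user, tag, page) triples,
# one per annotation that a user gives to a page.
Triple = Tuple[str, str, str]


def build_documents(triples: Iterable[Triple]
                    ) -> Tuple[Dict[str, Counter], Dict[str, Counter], List[str]]:
    """Return per-user tag counts, per-page tag counts, and the sorted topic space."""
    user_docs: Dict[str, Counter] = defaultdict(Counter)
    page_docs: Dict[str, Counter] = defaultdict(Counter)
    for user, tag, page in triples:
        user_docs[user][tag] += 1  # the tags this user has given
        page_docs[page][tag] += 1  # the tags this page has received
    # The topic space is the set of distinct tags (the "terms").
    topic_space = sorted(set().union(*user_docs.values()))
    return dict(user_docs), dict(page_docs), topic_space
```

The same tag counts then feed either weighting scheme, so switching between TFIDF and BM25 only changes the weighting step.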
Ranking pages by the topic matching model (3/6)
- Estimating the initial topic vectors and initial interest vectors: from folksonomy (using TFIDF / BM25)
  - Example: each user's social annotations form one document, and the annotations are the terms; the topic space is (Book, Car, Girl).

| User A | User B | User C |
|---|---|---|
| Car | Girl | Car |
| Book | Girl | |
| Girl | Girl | |

TFIDF weights for User A:

| | Book | Car | Girl |
|---|---|---|---|
| TF | 1 | 1 | 1 |
| IDF | log(3) | log(3/2) | log(3/2) |
| TFIDF | log(3) | log(3/2) | log(3/2) |

The interest vector of User A:

$$\mathbf{r}_A = \left(\log 3,\ \log\tfrac{3}{2},\ \log\tfrac{3}{2}\right)$$
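The worked example can be checked in a few lines. This sketch assumes one reading of the flattened example (User A: Car, Book, Girl; User B: Girl three times; User C: Car), which reproduces the IDF values log(3) for Book and log(3/2) for Car and Girl; it uses raw counts as TF and the natural logarithm, since the slide does not state a log base. The function `tfidf_vectors` is hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List


def tfidf_vectors(docs: Dict[str, List[str]]) -> Dict[str, Dict[str, float]]:
    """TF-IDF weight of every tag in every document (one document per user)."""
    n_docs = len(docs)
    # Document frequency: in how many users' documents each tag occurs.
    df = Counter(tag for tags in docs.values() for tag in set(tags))
    vectors = {}
    for name, tags in docs.items():
        tf = Counter(tags)  # raw counts, as on the slide
        vectors[name] = {tag: tf[tag] * math.log(n_docs / df[tag]) for tag in tf}
    return vectors


# Assumed reading of the slide's example annotations.
example = {"A": ["car", "book", "girl"], "B": ["girl", "girl", "girl"], "C": ["car"]}
r_A = tfidf_vectors(example)["A"]
```

Here `r_A` maps book to log(3) and car and girl to log(3/2), matching the slide; changing the log base rescales every weight by the same factor, so cosine similarities are unaffected.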
Ranking pages by the topic matching model (4/6)
- Estimating the initial topic vectors and initial interest vectors (cont.): from taxonomy (ODP categories as topics)
  - The ODP categories span the topic space.
  - The descriptions of all the web pages under a category form the term vector of that category.
  - The cosine similarity between each category's term vector and the social annotations owned by a page (or by a user) gives the topic vector of each page (or the interest vector of each user).
Ranking pages by the topic matching model (5/6)
- Estimating the initial topic vectors and initial interest vectors (cont.): from taxonomy (ODP categories as topics)
  - Example with three categories: the cosine similarity between the term vector of category k (k = 1, 2, 3) and the social annotations owned by user A gives the kth component of A's interest vector.

$$r_{A,k} = \cos\left(\mathbf{c}_k, \mathbf{a}_A\right), \quad k = 1, 2, 3$$

$$\mathbf{r}_A = \left(r_{A,1},\ r_{A,2},\ r_{A,3}\right)$$

where c_k is the term vector of category k and a_A is the vector of social annotations owned by user A.
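A minimal sketch of the taxonomy-based estimate, assuming each category's term vector is the raw term counts of its pooled page descriptions (the slides do not say how the terms are weighted) and that annotations are compared by cosine similarity. The function `taxonomy_vector` and the category names `cat1` to `cat3` are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-count vectors (0.0 if either is empty)."""
    dot = sum(w * b[t] for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def taxonomy_vector(annotations: List[str],
                    category_texts: Dict[str, List[str]]) -> Dict[str, float]:
    """One component per category: similarity of the owner's annotations to it."""
    owner = Counter(annotations)
    return {cat: cosine(owner, Counter(words)) for cat, words in category_texts.items()}


# Hypothetical categories, each given by the pooled words of its page descriptions.
categories = {
    "cat1": ["car", "engine", "car"],
    "cat2": ["book", "novel"],
    "cat3": ["girl", "painting"],
}
r_A = taxonomy_vector(["car", "book"], categories)
```

The same function serves pages: passing a page's annotations instead of a user's gives that page's topic vector.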
Ranking pages by the topic matching model (6/6)
- Interest and topic adjusting via the bipartite collaborative link structure

Notation:
- W: the adjacency matrix, whose rows represent the users and whose columns represent the web pages; W_{i,j} is the number of annotations that user u_i gives to page p_j
- r_{i,j}: the jth normalized interest of the ith user
- t_{i,j}: the jth normalized topic of the ith web page
- α: the weight of the initially estimated user interest
- β: the weight of the initially estimated web page topic

Adjusting:
- User interest adjusting by related web pages: the initial interest vectors of each user become the adjusted interest vectors of each user
- Web page topic adjusting by related users: the initial topic vectors of each page become the adjusted topic vectors of each page
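The slides define W, r, t, α and β but not the update rule. A minimal sketch, assuming one plausible iterative form: each user's interest vector is α times its initial value plus (1 − α) times the W-weighted average of the topic vectors of the pages that user annotated, and symmetrically for pages with β. The update rule, the iteration count, and all function names are assumptions, not the authors' formulas.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def row_normalize(m: Matrix) -> Matrix:
    """Scale each row to sum to 1; all-zero rows stay zero."""
    out = []
    for row in m:
        s = sum(row)
        out.append([x / s for x in row] if s > 0 else [0.0] * len(row))
    return out


def matmul(a: Matrix, b: Matrix) -> Matrix:
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def blend(weight: float, initial: Matrix, propagated: Matrix) -> Matrix:
    """weight * initial + (1 - weight) * propagated, element-wise."""
    return [[weight * x0 + (1 - weight) * x for x0, x in zip(r0, r)]
            for r0, r in zip(initial, propagated)]


def adjust(W: Matrix, R0: Matrix, T0: Matrix, alpha: float = 0.5,
           beta: float = 0.5, iterations: int = 20) -> Tuple[Matrix, Matrix]:
    """Hypothetical iterative adjusting on the user-page bipartite graph.

    W[i][j]: number of annotations user i gives to page j.
    R0[i]: initial interest vector of user i; T0[j]: initial topic vector of page j.
    """
    W_users = row_normalize(W)                           # user -> pages weights
    W_pages = row_normalize([list(c) for c in zip(*W)])  # page -> users weights
    R0n, T0n = row_normalize(R0), row_normalize(T0)
    R, T = R0n, T0n
    for _ in range(iterations):
        # Users borrow topics from the pages they annotated; pages borrow
        # interests from the users who annotated them (both from the previous R, T).
        R_next = blend(alpha, R0n, matmul(W_users, T))
        T_next = blend(beta, T0n, matmul(W_pages, R))
        R, T = row_normalize(R_next), row_normalize(T_next)
    return R, T
```

With α = β = 1 the initial vectors are returned unchanged; smaller values let the link structure pull a user's interests toward the topics of the pages that user annotates.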
Data Set
- Folksonomy
  - Two heterogeneous data sets
  - From each data set, three test beds according to the number of bookmarks owned by the users:
    - gt500: all users who own more than 500 bookmarks
    - 80-100: 100 randomly selected users who own 80-100 bookmarks
    - 5-10: 100 randomly selected users who own 5-10 bookmarks
| Data set | Web pages | Annotations | Users |
|---|---|---|---|
| Del.icio.us | 90,300 | 65,080 | 9,813 |
| Dogear | 179,835 | 47,993 | 5,192 |
| Test bed | Users | Max. tags | Min. tags | Avg. tags | Max. pages | Min. pages | Avg. pages |
|---|---|---|---|---|---|---|---|
| DEL.gt500 | 31 | 1133 | 74 | 464.42 | 1790 | 506 | 727.55 |
| DEL.80-100 | 100 | 456 | 2 | 107.51 | 100 | 80 | 88.43 |
| DEL.5-10 | 100 | 64 | 1 | 18.53 | 10 | 5 | 7.44 |
| DOG.gt500 | 92 | 2147 | 42 | 543.87 | 4578 | 500 | 999.04 |
| DOG.80-100 | 85 | 295 | 9 | 126.96 | 100 | 80 | 89.32 |
| DOG.5-10 | 100 | 41 | 2 | 16.11 | 10 | 5 | 6.99 |
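The test-bed rules above can be stated as a small filter. A minimal sketch, assuming per-user bookmark counts are available as a dictionary; the function `build_test_beds`, the fixed seed, and the choice to keep every qualifying user when fewer than 100 qualify are all assumptions.

```python
import random
from typing import Dict, Iterable, List


def build_test_beds(bookmark_counts: Dict[str, int], sample_size: int = 100,
                    seed: int = 0) -> Dict[str, List[str]]:
    """Split users into the gt500, 80-100 and 5-10 test beds by bookmark count."""
    rng = random.Random(seed)

    def sample(users: Iterable[str]) -> List[str]:
        pool = sorted(users)  # fixed order so the seed gives reproducible samples
        # Assumption: if fewer than sample_size users qualify, keep all of them.
        return pool if len(pool) <= sample_size else sorted(rng.sample(pool, sample_size))

    return {
        "gt500": sorted(u for u, n in bookmark_counts.items() if n > 500),
        "80-100": sample(u for u, n in bookmark_counts.items() if 80 <= n <= 100),
        "5-10": sample(u for u, n in bookmark_counts.items() if 5 <= n <= 10),
    }
```

Users whose counts fall between the ranges (for example 50 bookmarks) belong to no test bed.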
Evaluation metric
- Mean Average Precision (MAP)
  - For each user, the mean of the average precision over that user's queries
- Mean Mean Average Precision (MMAP)
  - The mean of the per-user MAP values
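A minimal sketch of the two metrics, assuming the standard (untruncated) definition of average precision, in which the precision at each relevant position is summed and divided by the number of relevant pages; the `Run` type and the function names are hypothetical.

```python
from statistics import mean
from typing import Dict, List, Set, Tuple

Run = Tuple[List[str], Set[str]]  # (ranked page ids, relevant page ids) for one query


def average_precision(ranked: List[str], relevant: Set[str]) -> float:
    if not relevant:
        return 0.0
    hits, total = 0, 0.0
    for pos, page in enumerate(ranked, start=1):
        if page in relevant:
            hits += 1
            total += hits / pos  # precision at each relevant position
    return total / len(relevant)


def mean_average_precision(runs: List[Run]) -> float:
    """MAP for one user: mean AP over that user's queries."""
    return mean(average_precision(r, rel) for r, rel in runs)


def mean_map(per_user: Dict[str, List[Run]]) -> float:
    """MMAP: mean of the per-user MAP values."""
    return mean(mean_average_precision(runs) for runs in per_user.values())
```

For a ranking (a, b, c) with relevant pages {a, c}, the precisions at the relevant positions are 1/1 and 2/3, so AP = 5/6. Averaging first per user and then across users gives every user equal weight in MMAP, however many queries each user has.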
Performance