Collective Collaborative Tagging System PowerPoint PPT Presentation


Title: Collective Collaborative Tagging System


1
Collective Collaborative Tagging System
  • Jong Y. Choi, Joshua Rosen, Siddharth Maini,
    Marlon E. Pierce, and Geoffrey C. Fox
  • Community Grids Laboratory
  • Indiana University

2
People-Powered Knowledge
  • Delicious example

[Screenshot labels: Bookmark, Tags, Social Networks, People-generated]
3
People-Powered Knowledge
  • Collaborative Tagging
  • Online bookmarking with annotations
  • Create social networks
  • Utilize the power of people's knowledge
  • Pros and cons
  • High-quality classification by using human
    intelligence
  • But a lack of control or authority

4
Motivations
  • Distributed and fragmented knowledge
  • → Need a unified data set
  • → More accurate and richer information
  • No flexibility in choosing different information
    retrieval (IR) algorithms
  • → Need a playground to experiment with
    various IR techniques
  • → Help to discover hidden knowledge

5
Proposed System
Collective Collaborative Tagging (CCT) System
[Diagram: CCT System. Components: Data Importer (RDF, RSS, Atom, HTML), Data Coordinator, Repository, User Service. Labels: Distributed Tagging Data; Populate Bookmarks/tags; Query with various options; Search Result (SOAP, REST, ...)]
6
Development Plan and Progress
  • 1st - Service and algorithm development
  • Identify services and algorithms
  • 2nd - Interface development
  • Web 2.0 style interface
  • REST, SOAP, ...
  • 3rd - Export/import service development
  • Merging distributed data sets
  • Export data to build mash-up sites
  • So far, we are mainly in the 1st stage, with some
    experiments in the 2nd stage

7
Prototype
[Screenshot labels: Different Data Sources, Various IR algorithms, Flexible Options, Result Comparison]
8
Service Types and Algorithms
Service         | Description                                                                 | Algorithm                                   | Type
Searching       | Given input tags, returning the most relevant X (X: URLs, tags, or users)   | Latent Semantic Indexing (LSI), FolkRank    | I
Recommendation  | Indirect input tags, returning undiscovered X                               |                                             | II
Clustering      | Community discovery: finding a group or a community with similar interests  | K-Means, Deterministic Annealing Clustering | III
Trend detection | Analyze tagging activities as time series and detect abnormalities          | Time Series Analysis                        | IV
9
Data Models (I)
  • Vector-space model (bag-of-words model)
  • Assume n URLs and q tags
  • A URL can be represented by a q-dimensional vector,
    d_i = (t_1, t_2, ..., t_q)
  • A total data set can be represented by n-by-q
    matrix
  • Pairwise Dissimilarity Matrix
  • n-by-n symmetric matrix
  • Distance (Euclidean, Manhattan, ...)
  • Angles, cosine, sine, ...
  • O(n^2) complexity
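
The vector-space model and the pairwise dissimilarity matrix can be sketched in a few lines. This is a minimal illustration, not the CCT implementation: the `bookmarks` data, the use of raw tag counts as vector entries, and all function names are assumptions made for the example.

```python
import math

# Hypothetical toy data: each URL is annotated with a bag of tags.
bookmarks = {
    "url_a": ["python", "tutorial", "python"],
    "url_b": ["python", "reference"],
    "url_c": ["cooking", "recipe"],
}

# Fixed tag vocabulary (q tags) and URL order (n URLs).
tags = sorted({t for ts in bookmarks.values() for t in ts})
urls = sorted(bookmarks)

def to_vector(url_tags):
    """Return the q-dimensional tag-count vector for one URL."""
    return [url_tags.count(t) for t in tags]

# n-by-q data matrix: one row per URL, one column per tag.
matrix = [to_vector(bookmarks[u]) for u in urls]

def euclidean(x, y):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def cosine_dissimilarity(x, y):
    """1 - cos(angle); 0 for parallel vectors, 1 for orthogonal ones."""
    dot = sum(a * b for a, b in zip(x, y))
    norm = math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y))
    return 1.0 - dot / norm if norm else 1.0

def pairwise(rows, dist):
    """Build the n-by-n symmetric dissimilarity matrix (O(n^2) distance calls)."""
    n = len(rows)
    d = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d[i][j] = d[j][i] = dist(rows[i], rows[j])
    return d

D_cos = pairwise(matrix, cosine_dissimilarity)
D_euc = pairwise(matrix, euclidean)
```

In this toy data, `url_a` and `url_b` share the tag `python` and so end up closer to each other than either is to `url_c`, which shares no tags with them.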

10
Data Models (II)
  • Graph model
  • Building a graph with nodes and edges
  • Edges indicate relationships
  • Becoming complex networks (tag graph)
  • Dissimilarity
  • Related with path distance
  • Finding paths is important (shortest path
    problem)
  • Naive approach: O(n^3) complexity

(Source: MSI-CIEC)
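
One well-known all-pairs shortest-path method with O(n^3) cost in the number of nodes is the Floyd-Warshall algorithm. The slides do not say which algorithm the authors had in mind, so the sketch below is only an illustration of path-based dissimilarity; the tag graph, the edge weights, and the function name are hypothetical.

```python
INF = float("inf")

def floyd_warshall(nodes, edges):
    """All-pairs shortest path lengths for an undirected weighted graph."""
    # Start with 0 on the diagonal and "unreachable" everywhere else.
    dist = {u: {v: (0.0 if u == v else INF) for v in nodes} for u in nodes}
    for u, v, w in edges:
        dist[u][v] = min(dist[u][v], w)
        dist[v][u] = min(dist[v][u], w)
    # Three nested loops over the nodes give the O(n^3) cost.
    for k in nodes:
        for i in nodes:
            for j in nodes:
                via_k = dist[i][k] + dist[k][j]
                if via_k < dist[i][j]:
                    dist[i][j] = via_k
    return dist

# Hypothetical tag graph: nodes are tags, edges link related tags.
nodes = ["python", "tutorial", "web", "cooking"]
edges = [
    ("python", "tutorial", 1.0),
    ("tutorial", "web", 1.0),
    ("python", "web", 3.0),
]
dist = floyd_warshall(nodes, edges)
```

Here the path `python` to `web` through `tutorial` (length 2.0) beats the direct edge (length 3.0), and `cooking` stays at infinite distance because it is not connected to anything.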
11
Searching
  • Latent Semantic Indexing
  • Using vector-space model, find the URLs most
    similar to the user's query tags
  • Dimension reduction from high q to low d (q >> d)
  • Removing noisy terms, extracting latent concepts
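
For reference, one common way to write LSI uses a rank-d truncated singular value decomposition of the n-by-q URL-by-tag matrix. The symbols A, U_d, Σ_d, V_d and the query vector t are notation introduced here, not taken from the slides, and other scaling conventions for the projected vectors exist.

```latex
A \approx A_d = U_d \Sigma_d V_d^{\top}, \qquad
U_d \in \mathbb{R}^{n \times d},\;
\Sigma_d \in \mathbb{R}^{d \times d},\;
V_d \in \mathbb{R}^{q \times d},\; d \ll q

\hat{d}_i = V_d^{\top} d_i, \qquad
\hat{t} = V_d^{\top} t, \qquad
\operatorname{sim}(t, d_i) =
  \frac{\hat{t}^{\top} \hat{d}_i}{\lVert \hat{t} \rVert \, \lVert \hat{d}_i \rVert}
```

Each URL vector d_i and the query tag vector t are projected into the same d-dimensional latent space, and URLs are ranked by cosine similarity there.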

[Figure: precision vs. recall curves with an ideal line; legend entries: 2 terms, 4 terms, 8 terms, 20 dim. reduction, None]
12
Clustering
  • Discover the group structures of URLs
  • Non-parametric learning algorithm
  • Non-trivial optimization problem
  • Should avoid local minima/maxima solutions
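
The local-minimum problem is easy to reproduce with K-Means (Lloyd's algorithm). The sketch below is a minimal illustration on four hypothetical 2-D points; the helper names and the two hand-picked initializations are assumptions chosen to make the effect visible.

```python
def sq_dist(p, c):
    return sum((a - b) ** 2 for a, b in zip(p, c))

def mean(cluster):
    return tuple(sum(coord) / len(cluster) for coord in zip(*cluster))

def kmeans(points, centers, iters=100):
    """Lloyd's algorithm started from the given initial centers."""
    for _ in range(iters):
        # Assignment step: each point goes to its nearest center.
        clusters = [[] for _ in centers]
        for p in points:
            idx = min(range(len(centers)), key=lambda k: sq_dist(p, centers[k]))
            clusters[idx].append(p)
        # Update step: move each center to the mean of its cluster.
        new_centers = [mean(c) if c else centers[k] for k, c in enumerate(clusters)]
        if new_centers == centers:
            break  # converged: assignments can no longer change
        centers = new_centers
    cost = sum(min(sq_dist(p, c) for c in centers) for p in points)
    return centers, cost

# Four points at the corners of a wide rectangle.
points = [(0, 0), (0, 1), (4, 0), (4, 1)]

# Starting left/right finds the two natural clusters.
good_centers, good_cost = kmeans(points, [(0, 0), (4, 1)])

# Starting bottom/top converges immediately to a worse, locally optimal split.
bad_centers, bad_cost = kmeans(points, [(2, 0), (2, 1)])
```

Both runs converge, but the second one stops at a solution whose total squared distance is sixteen times larger, which is why the result of K-Means depends on its initialization.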

13
Deterministic Annealing Clustering
  • Deterministically avoid local minima
  • Trace the global solution by changing the level of
    energy
  • Analogy to physical annealing process (High → Low)
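
A minimal sketch of the annealing idea, assuming a simplified fixed-k variant: points are softly assigned to centers with weights proportional to exp(-||x - y_k||^2 / T), centers are updated as weighted means, and the temperature T is lowered geometrically. The function name `da_cluster`, every parameter value, the small jitter used to let coincident centers separate, and the four toy points are assumptions for illustration, not details from the slides.

```python
import math

def sq_dist(p, c):
    return sum((a - b) ** 2 for a, b in zip(p, c))

def da_cluster(points, k, t_start=20.0, t_min=0.01, cooling=0.8,
               inner=50, eps=1e-3):
    """Simplified deterministic annealing clustering with k centers."""
    dim = len(points[0])
    data_mean = [sum(p[j] for p in points) / len(points) for j in range(dim)]
    # At high temperature every center sits at the data mean.
    centers = [list(data_mean) for _ in range(k)]
    t = t_start
    while t > t_min:
        # Tiny deterministic jitter (same shift on every coordinate) so that
        # coincident centers can split once the temperature allows it.
        for i, c in enumerate(centers):
            shift = eps * (i - (k - 1) / 2)
            for j in range(dim):
                c[j] += shift
        for _ in range(inner):
            sums = [[0.0] * dim for _ in range(k)]
            mass = [0.0] * k
            for p in points:
                d = [sq_dist(p, c) for c in centers]
                d_min = min(d)
                # Subtracting d_min keeps the largest weight at exactly 1.
                w = [math.exp(-(dk - d_min) / t) for dk in d]
                z = sum(w)
                for i in range(k):
                    r = w[i] / z  # soft assignment of p to center i
                    mass[i] += r
                    for j in range(dim):
                        sums[i][j] += r * p[j]
            for i in range(k):
                if mass[i] > 0.0:
                    centers[i] = [s / mass[i] for s in sums[i]]
        t *= cooling  # cool down: High -> Low
    return centers

# Four hypothetical points at the corners of a wide rectangle.
points = [(0, 0), (0, 1), (4, 0), (4, 1)]
centers = da_cluster(points, k=2)
cost = sum(min(sq_dist(p, c) for c in centers) for p in points)
```

On this toy data the two centers stay merged while T is high, separate along the long axis of the rectangle as T drops, and end at the left and right pairs of points, without any hand-picked starting positions.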

14
More Machine Learning Algorithms
  • Classification
  • To respond more quickly to users' requests
  • Train on users' input and answer
    questions based on the training results
  • Artificial Neural Network, Support Vector
    Machine, ...
  • Trend Detection
  • Can be used for prediction/forecasting
  • Time-series analysis of tagging activities
  • Markov chain model, Fourier transform, ...
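
As a rough illustration of detecting abnormal tagging activity in a time series, the sketch below flags days whose tag count deviates strongly from the preceding window. It uses a simple rolling z-score rule, which is not one of the models named on the slide; the function name, the window length, the threshold, the `min_sigma` floor, and the daily counts are all hypothetical.

```python
import statistics

def flag_bursts(counts, window=7, threshold=3.0, min_sigma=1.0):
    """Return indices whose count is far from the mean of the previous window."""
    flagged = []
    for i in range(window, len(counts)):
        history = counts[i - window:i]
        mu = statistics.mean(history)
        # Floor the spread so a flat history does not cause division by zero.
        sigma = max(statistics.pstdev(history), min_sigma)
        if abs(counts[i] - mu) / sigma > threshold:
            flagged.append(i)
    return flagged

# Hypothetical number of times one tag was used per day.
daily_counts = [5, 6, 4, 5, 7, 5, 6, 5, 40, 6, 5]
bursts = flag_bursts(daily_counts)
```

Only the day with 40 uses is flagged; the days after it are not, because the spike itself widens the spread of the window they are compared against.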

15
Conclusion
  • The goal of our Collective Collaborative Tagging
    (CCT) system
  • Utilize various data sets
  • Provide various information retrieval (IR)
    algorithms
  • Help to utilize people-powered knowledge
  • Currently various models and algorithms are
    being investigated
  • Service interfaces and import/export function
    will be added soon

16
Thank you!! Questions?
jychoi_at_cs.indiana.edu
17
Vector-space Vs. Graph
               | Vector-space Model                                          | Graph Model
Representation | q-dimensional vector; q-by-n matrix                         | G(V, E); V: URLs, tags, users
Dissimilarity  | Distances, cosine, ...; O(N^2) complexity                   | Paths, hops, connectivity, ...; O(N^3) complexity
Algorithm      | Latent Semantic Indexing; dimension reduction schemes; PCA  | PageRank, FolkRank, ...; pairwise clustering; MDS
18
Pairwise Dissimilarity
  • Pairwise clustering
  • Input from vector-based model vs. graph model
  • How to avoid local minima/maxima? (e.g., K-Means)

Vector-space model
Graph model