See Also: Auto Generated Recommendations

1
See Also: Auto Generated Recommendations
  • Mislav Cimperak
  • Marija Tkalec
  • Sinisa Jovcic
  • Faculty of Humanities and Social Sciences
  • Ivana Lucica 3, Zagreb, Croatia

INFuture 2009: Digital Resources and Knowledge Sharing
2
Introduction
  • reliable source of information
  • accessible to everyone around the world
  • most up-to-date online encyclopedia
  • disadvantages

3
See Also
  • list of articles similar or related to the
    current article
  • urges users to continue browsing and reading
    articles on the page itself
  • user created list

4
Thesis
  • users writing on similar topics create connections
    to the same articles
  • by comparing two articles' connections we can
    conclude how similar the two articles are
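
The thesis above can be illustrated with a minimal sketch. It assumes that "connections" means an article's outgoing links and that overlap is measured with the Jaccard index; the slides do not name a measure, so both are assumptions. The link sets below are hypothetical and are not taken from the study.

```python
def link_overlap(links_a, links_b):
    """Jaccard overlap of two articles' outgoing-link sets (0.0 to 1.0)."""
    a, b = set(links_a), set(links_b)
    if not a and not b:
        return 0.0  # two articles without links share nothing measurable
    return len(a & b) / len(a | b)


# Hypothetical link sets, for illustration only.
gnome_links = {"GUI", "Linux", "Unix", "GNU General Public License"}
kde_links = {"GUI", "Linux", "Unix", "Windows"}

# 3 shared targets out of 5 distinct targets
print(link_overlap(gnome_links, kde_links))
```

The more link targets two articles share relative to all the targets they use, the closer the score is to 1.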

5
Goal
  • creation of an automatic recommendation system
    for the See also section based on soft
    clustering of documents
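
One way to picture soft clustering for a See also section is sketched below. The slides do not describe the clustering algorithm that was used, so this seed-based grouping, the function names `soft_clusters` and `recommend`, and the toy data are all assumptions made for illustration. The point it shows is that a document may belong to several clusters at once.

```python
def soft_clusters(docs, similarity, threshold=0.5):
    """Group documents into overlapping clusters.

    Each document seeds one candidate cluster holding every document whose
    similarity to the seed is at least `threshold`. Duplicate and singleton
    clusters are dropped, and a document may end up in several clusters.
    """
    clusters = []
    seen = set()
    for seed in docs:
        members = frozenset(
            d for d in docs if d == seed or similarity(seed, d) >= threshold
        )
        if len(members) > 1 and members not in seen:
            seen.add(members)
            clusters.append(sorted(members))
    return clusters


def recommend(article, clusters):
    """See-also candidates: every article sharing a cluster with `article`."""
    related = set()
    for cluster in clusters:
        if article in cluster:
            related.update(cluster)
    related.discard(article)
    return sorted(related)


# Toy data (hypothetical): each article is described by a set of link targets.
toy_links = {
    "A": {"x", "y"},
    "B": {"x", "y", "z"},
    "C": {"z", "w"},
    "D": {"q"},
}


def toy_similarity(a, b):
    """Jaccard overlap of the toy link sets."""
    la, lb = toy_links[a], toy_links[b]
    return len(la & lb) / len(la | lb)


clusters = soft_clusters(list(toy_links), toy_similarity, threshold=0.25)
print(clusters)
print(recommend("C", clusters))
```

In this toy run article B appears in every cluster, while D, which shares no links with the others, receives no recommendations.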

6
[Diagram: the articles GNOME, Xfce and KDE]
7
[Diagram: GNOME, Xfce and KDE shown together with GNU General Public License, BSD license, Apache License, MIT license, GUI, Linux, Unix, Windows and Mac OS]
8
[Diagram: GNOME, Xfce, KDE and Fedora shown together with GNU General Public License, BSD license, Apache License, MIT license, GUI, Linux, Unix, Windows and Mac OS]
9
Research
  • 5,012 articles
  • 509 clusters
  • evaluation: generated clusters compared against
    human-created connections
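
A comparison against human-created connections could be scored as in the sketch below. The slides do not say which metric was used; precision and recall are assumed here as a common choice, and the example sets are hypothetical.

```python
def precision_recall(suggested, human):
    """Score generated See also suggestions against human-created links.

    precision: share of suggestions that humans also linked
    recall:    share of human links that the system suggested
    """
    suggested, human = set(suggested), set(human)
    hits = len(suggested & human)
    precision = hits / len(suggested) if suggested else 0.0
    recall = hits / len(human) if human else 0.0
    return precision, recall


# Hypothetical example: 2 of 3 suggestions match, 2 of 4 human links are found.
generated = {"KDE", "Xfce", "Fedora"}
human_made = {"KDE", "Xfce", "GUI", "Linux"}
precision, recall = precision_recall(generated, human_made)
print(precision, recall)
```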

10
Research
  • tokens as vector features
  • document similarity threshold 0.5
  • connections within Wikipedia treated as separate
    tokens with extra weight when comparing the
    articles
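
The feature setup above might look like the following sketch. Cosine similarity and a link weight of 2.0 are assumptions: the slides give only the 0.5 threshold and say that links receive "extra weight", without naming the measure or the weight. The token and link lists are hypothetical.

```python
import math
from collections import Counter

LINK_WEIGHT = 2.0  # assumed value; the slides only mention "extra weight"


def features(text_tokens, link_targets, link_weight=LINK_WEIGHT):
    """Build a sparse vector: plain tokens count 1, link tokens count more."""
    vec = Counter()
    for tok in text_tokens:
        vec["tok:" + tok.lower()] += 1.0
    for target in link_targets:
        # Links are kept as separate tokens so they never merge with words.
        vec["link:" + target] += link_weight
    return vec


def cosine(u, v):
    """Cosine similarity of two sparse vectors (0.0 if either is empty)."""
    dot = sum(w * v[k] for k, w in u.items() if k in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def similar(u, v, threshold=0.5):
    """Apply the document similarity threshold from the slide."""
    return cosine(u, v) >= threshold


# Hypothetical articles: shared links dominate the score.
a = features(["desktop", "environment", "linux"], ["GUI", "Linux"])
b = features(["desktop", "environment", "unix"], ["GUI", "Linux"])
c = features(["bird", "species"], ["Ornithology"])
print(cosine(a, b), similar(a, b))
print(cosine(a, c), similar(a, c))
```

Prefixing link tokens keeps a link to the article "Linux" distinct from the plain word "linux", which is what lets the two be weighted differently.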

11
Research
  • clusters in three categories
  • clusters with no real value
  • partially relevant clusters
  • well-formed clusters

12
Clusters with no real value
  • generated clusters not usable
  • subjects in completely different thematic areas
  • clusters which contain too many articles
  • St. Peter, Saint-John Perse, General Staff of
    Armed Forces of the Republic of Croatia, French
    Guiana, Marine mammals
  • Eurasian Avars, Psychology, birds

13
Partially relevant clusters
  • some articles within this kind of cluster are
    thematically related
  • the remaining articles do not share the same
    subject or belong to the same or a similar
    area
  • Croatian Football Team, Parliamentary elections,
    Orthography, Presidential Elections, Croatian
    Academy of Science and Arts

14
Well-formed clusters
  • articles connected to the same subject
  • Olympic Games in Tokyo, London, Barcelona,
    Atlanta, Athens, Beijing, Summer Olympic Games
  • football teams
  • Airbus airplanes

15
Observations
  • Wikipedia users more often create connections to
    more general and more obvious terms

16
Conclusion
  • the procedure cannot be regarded as being
    successful enough for an unsupervised
    implementation on articles in Croatian Wikipedia
  • most likely the algorithm would be more
    successful in a strictly supervised encyclopedia