Automatic Detection of Tags for Political Blogs

Title: Dynamics of the Upper Airway Author: George Tetlow Last modified by: nisa Created Date: 8/18/2006 12:47:09 PM Document presentation format – PowerPoint PPT presentation

Automatic Detection of Tags for Political Blogs
  • Khairun-nisa Hassanali and Vasileios
    Hatzivassiloglou
  • nisa_at_hlt.utdallas.edu, vh_at_hlt.utdallas.edu
  • The University of Texas at Dallas
  • 2681 posts from Daily Kos and 571 posts from Red
    State
  • Compared detected tags with the original tags of
    each blog post
  • Manually evaluated relevance of tags on a small
    portion of the test set
  • Tags for political blogs are automatically
    detected
  • Tags are representative of topics
  • Significant topics are automatically identified
    using SVM and other NLP techniques
  • Many blogs tag their posts
  • Tags are representative of the topics discussed
  • Training data was collected from Daily Kos and
    Red State
  • 100,000 posts from Daily Kos (2003-2010)
  • 70,000 posts from Red State (2007-2010)
  • A total of 787,780 tags
  • Used Joachims' SVM-light
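
The training setup listed above can be illustrated with a minimal sketch of how one example might be written as an SVM-light input line. The line layout (a target value followed by `index:value` pairs in increasing index order) is SVM-light's input format. Everything else is an assumption for illustration only: the word-count features, the tiny vocabulary, and the helper name `to_svmlight_line` are hypothetical and are not the authors' feature set.

```python
from collections import Counter


def to_svmlight_line(label, tokens, vocab):
    """Format one example as an SVM-light input line.

    label  -- +1 or -1 (hypothetical: positive or negative example)
    tokens -- list of lowercase word tokens
    vocab  -- dict mapping word -> 1-based feature index (hypothetical)
    """
    # Count only the words that have a feature index.
    counts = Counter(t for t in tokens if t in vocab)
    # SVM-light expects feature indices in increasing order.
    pairs = sorted((vocab[w], c) for w, c in counts.items())
    features = " ".join(f"{i}:{c}" for i, c in pairs)
    return f"{label:+d} {features}".rstrip()


# Toy vocabulary and post, invented for this example.
vocab = {"senate": 1, "obama": 2, "healthcare": 3}
line = to_svmlight_line(+1, "obama urges senate vote obama".split(), vocab)
print(line)  # +1 1:1 2:2
```

Words outside the vocabulary are dropped here; a real feature extractor would likely handle them differently.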

                  Precision  Recall  F-Score
Single Word SVM       27.30   60.30    37.60
Stemming              26.10   59.50    36.30
Proper Nouns          36.50   56.80    44.40
Named Entities        48.40   49.10    48.70
All Combined          21.10   65.00    31.90
Manual Scoring        67.00   75.00    70.80
  • More than 22.6 million Americans maintain web
    sites with regularly updated commentary (blogs),
    of which at least 38,500 are specifically
    dedicated to politics

Fig. 3 Results on Daily Kos
                  Precision  Recall  F-Score
Single Word SVM       19.00   30.00    23.30
Stemming              22.00   30.20    25.50
Proper Nouns          46.30   54.00    49.90
Named Entities        60.10   41.50    49.10
All Combined          20.30   65.70    31.00
Manual Scoring        47.00   62.00    53.50
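
The poster does not spell out how the three scores are computed. The sketch below assumes the usual set-based definitions and the balanced F-score (harmonic mean of precision and recall), which agrees with the reported rows: a precision of 27.30 and a recall of 60.30 give roughly 37.6. The function names and the example tags are hypothetical.

```python
def f_score(precision, recall):
    """Balanced F-score: harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def precision_recall_f(predicted_tags, original_tags):
    """Score detected tags against a post's original tags."""
    predicted, original = set(predicted_tags), set(original_tags)
    correct = len(predicted & original)
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(original) if original else 0.0
    return precision, recall, f_score(precision, recall)


# Invented tags for a single post.
p, r, f = precision_recall_f(
    ["obama", "senate", "healthcare", "vote"],
    ["obama", "healthcare", "congress"],
)
print(p, round(r, 3), round(f, 3))  # 0.5 0.667 0.571
```

Scores in the tables are percentages, so the same `f_score` function applies to them directly, for example `f_score(27.30, 60.30)`.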

Training of SVM classifiers
  • Given multiple texts from two or more
    blogs/political sources, answer the following
    questions
  • On which subjects do the texts, taken as a whole
    across each source, agree or disagree?
  • How similar are the sources' positions?
  • What makes them agree or disagree?
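
One rough way to approach the similarity question above is to compare how often each source uses each tag. The sketch below uses cosine similarity over tag counts; this measure is an assumption chosen for illustration and is not named in the poster. It captures only overlap in topics, not agreement in stance. The tag counts and function names are hypothetical.

```python
import math
from collections import Counter


def cosine_similarity(tags_a, tags_b):
    """Cosine similarity between two tag-count profiles (Counter objects)."""
    shared = set(tags_a) & set(tags_b)
    dot = sum(tags_a[t] * tags_b[t] for t in shared)
    norm_a = math.sqrt(sum(v * v for v in tags_a.values()))
    norm_b = math.sqrt(sum(v * v for v in tags_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def most_divergent_tags(tags_a, tags_b, top_n=2):
    """Tags with the largest absolute difference in counts."""
    all_tags = set(tags_a) | set(tags_b)
    return sorted(all_tags, key=lambda t: (-abs(tags_a[t] - tags_b[t]), t))[:top_n]


# Invented tag counts for two sources.
source_a = Counter({"healthcare": 4, "iraq": 1, "taxes": 1})
source_b = Counter({"healthcare": 1, "iraq": 1, "taxes": 5})
print(round(cosine_similarity(source_a, source_b), 3))
print(most_divergent_tags(source_a, source_b))
```

The second function gives a first approximation of which topics account for the difference between two sources.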

Fig. 4 Results on Red State
Detection of Tags
Fig. 1 Tag Detection using Support Vector Machines
  • A tool for automatically tagging political
    blog posts was introduced.
  • Political blogs differ from other blogs as they
    often revolve around named entities (politicians,
    organizations and places). Therefore, tagging of
    political blog posts benefits from using basic
    named entity recognition to improve tagging.
  • Tag identification using a hybrid approach
    (statistical and grammatical) yields better
    results
  • Sood et al. report a precision/recall of
    13.11/22.83, whereas Wang and Davidson report a
    precision/recall of 45.25/23.24. Our recall is
    higher, perhaps because of the domain.
  • Use the same SVM-based approach with new features
    based on grammatical knowledge
  • Proper Nouns are frequently topics
  • Place a higher weight on proper and common nouns
  • Identifying entities referred to by different names
  • Barack Obama, Obama and Barack Hussein Obama
    refer to the same person
  • Difficult to associate an attitude with a
    specific topic/subject
  • Many clues are implicit and appear to require
    deep semantic analysis
  • Tags can serve as a basis for bringing together
    posts about the same topic
  • Compiling a profile for each political entity:
    what it talks about and what its position is
  • Organizing groups of sources according to
    perspective
  • A political profile is a summary of a political
    entity's (politician, political group) stance on
    different issues
  • Extract the top-scoring topics along with the
    entity's sentiments (attitudes towards each topic)
    and select representative sentences that voice
    sentiments towards these topics
  • Aggregate information across texts according to
    specific criteria (poster, source, time) and
    quantitatively compare signatures and identify
    which topics are responsible for the differences
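
The name-variation problem mentioned in the list above (Barack Obama, Obama and Barack Hussein Obama) can be illustrated with a deliberately simple heuristic: map each mention to the longest mention that contains all of its words. This is only a sketch under that assumption and is not the co-reference method used by the authors; it would wrongly merge people who share a surname. The function name `group_aliases` is hypothetical.

```python
def group_aliases(mentions):
    """Map each mention to the longest mention containing all of its words."""
    # Longest names first, so the fullest form becomes the canonical one.
    ordered = sorted(set(mentions), key=lambda m: (-len(m.split()), m))
    canonical = {}
    for mention in ordered:
        words = set(mention.lower().split())
        for name in ordered:
            if words <= set(name.lower().split()):
                canonical[mention] = name
                break
    return canonical


aliases = group_aliases(
    ["Barack Obama", "Obama", "Barack Hussein Obama", "Harry Reid"]
)
for mention, name in sorted(aliases.items()):
    print(f"{mention} -> {name}")
```

Every mention matches at least itself, so the returned mapping always covers all input mentions.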

Extraction of Tag Nouns
Extraction of Tag Entities using Named Entity Recognition and Co-reference Resolution
Fig. 2 Tag Detection using Grammatical Techniques
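
The idea of placing a higher weight on proper and common nouns can be sketched as a simple scoring pass over part-of-speech-tagged tokens. The sketch assumes the tokens have already been tagged with Penn Treebank labels by some tagger; the weights in `POS_WEIGHTS`, the choice to ignore non-nouns entirely, and the function name are hypothetical and are not taken from the poster.

```python
from collections import defaultdict

# Hypothetical weights: proper nouns count more than common nouns.
POS_WEIGHTS = {"NNP": 3.0, "NNPS": 3.0, "NN": 2.0, "NNS": 2.0}


def rank_tag_candidates(tagged_tokens, top_n=3):
    """Rank words as tag candidates by summed part-of-speech weight."""
    scores = defaultdict(float)
    for word, pos in tagged_tokens:
        weight = POS_WEIGHTS.get(pos)
        if weight is not None:  # skip everything that is not a noun
            scores[word.lower()] += weight
    # Highest score first; ties broken alphabetically.
    return sorted(scores, key=lambda w: (-scores[w], w))[:top_n]


# Invented, pre-tagged sentence fragments.
tagged = [
    ("Obama", "NNP"), ("signed", "VBD"), ("the", "DT"),
    ("healthcare", "NN"), ("bill", "NN"),
    ("Obama", "NNP"), ("said", "VBD"),
]
print(rank_tag_candidates(tagged))  # ['obama', 'bill', 'healthcare']
```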