Building Topic/Trend Detection System based on Slow Intelligence

1
Building Topic/Trend Detection System based on
Slow Intelligence
  • Chia-Chun Shih & Ting-Chun Peng
  • Institute for Information Industry
  • Taipei, Taiwan

Presented at DMS10 special session on Slow
Intelligence Systems
2
Agenda
  • Introduction
  • Topic/Trend Detection System
  • Topic/Trend Detection System with Slow
    Intelligence
  • Conclusion

3
Introduction
4
Introduction
[Charts: Facebook Users, Twitter Posts, Blog Posts]
  • Social media is prevalent
  • Social media reflects the real world
  • An experiment from HP Social Computing Lab shows
    that Twitter-rate time series can accurately
    predict box-office movie sales, with an adjusted
    R² of 0.973 (amazing!!)
  • There is an emerging market for Social Media
    Monitoring Services, e.g., Nielsen Buzzmetrics,
    Radian6

5
Introduction (cont'd)
  • Topic Detection and Tracking (TDT)
  • Initiated by DARPA in 1996
  • Goal: discover the topical structure in
    unsegmented streams of news reporting as it
    appears across multiple media
  • Tasks:
  • Topic Detection
  • Topic Tracking
  • First Story Detection
  • Story Segmentation
  • Link Detection

6
Introduction (cont'd)
  • Slow Intelligence provides a software development
    framework that lets systems with insufficient
    computing resources adapt gradually to their
    environment and cope with its complexities

[Figure: Slow Intelligence System. Labels: Environment,
Knowledge-based Controller, Problem, Solution, and four
numbered components: Enumerator, Adaptor, Eliminator,
Concentrator]
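
The four components named in the figure can be read as one pass of a search loop. The following is only a minimal sketch under that reading: the component names come from the slide, while the function `slow_intelligence_cycle`, its parameters, and the order of the steps are assumptions made for illustration.

```python
from typing import Callable, List


def slow_intelligence_cycle(
    enumerate_candidates: Callable[[], List[float]],
    adapt: Callable[[float], float],
    score: Callable[[float], float],
    keep: int,
) -> float:
    """Run one enumerate -> adapt -> eliminate -> concentrate pass (hypothetical API)."""
    candidates = enumerate_candidates()        # Enumerator: list the alternatives
    adapted = [adapt(c) for c in candidates]   # Adaptor: adjust each one to the environment
    ranked = sorted(adapted, key=score, reverse=True)
    survivors = ranked[:keep]                  # Eliminator: drop the weak alternatives
    return survivors[0]                        # Concentrator: commit to the best survivor


# Toy problem: find a value close to a target of 7.
best = slow_intelligence_cycle(
    enumerate_candidates=lambda: [0, 5, 10],
    adapt=lambda x: x + 1,
    score=lambda x: -abs(x - 7),
    keep=2,
)
```

In a real system the pass would be repeated, with the survivors of one pass seeding the next enumeration.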
7
Introduction (cont'd)
  • In this paper, we propose a design for an online
    topic/trend detection system for Social Media
    that takes advantage of Slow Intelligence.
  • We identify four complexities of designing online
    topic/trend detection systems, along with the
    corresponding Slow Intelligence solutions.

8
Topic/Trend Detection System
9
Topic/Trend Detection System
  • Objective
  • Detect current hot topics and predict future hot
    topics based on data collected from Social Media
  • Three components
  • Crawler Extractor: collect data and extract
    information from Social Media
  • Topic Extractor: detect hot topics from a set of
    text documents
  • Trend Detector: detect trends (future hot topics)
    based on currently available data

[Figure: Social Media → Crawler Extractor → Topic
Extractor → Trend Detector. The Topic Extractor outputs
current hot topics; the Trend Detector outputs future hot
topics]
10
Topic/Trend Detection System (cont'd)
  • Crawler Extractor
  • Information Extractor: extract articles and
    metadata (title, author, content, etc.) from
    semi-structured web content

[Figure: Crawler Extractor data flow. Labels: Social
Media, Users' Keywords of Interest, Web Crawler, HTML
documents, Information Extractor, Text documents, Web
data DB, Topic Extractor]
11
Topic/Trend Detection System (cont'd)
  • Topic Extractor (input: Web data DB; output:
    current topics)
  • Topic Word Extraction: apply the TF-IDF scheme to
    generate the top-N topic words for each document
  • Topic Word Clustering: apply a clustering
    algorithm to group topic words into topic groups.
    The topic groups are treated as topics
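
The topic word extraction step can be sketched in a few lines. This is a minimal illustration, assuming documents are already tokenized and using one common TF-IDF variant (relative term frequency times the natural log of inverse document frequency). The function name `top_n_topic_words` is hypothetical, and the slides do not say which weighting variant or clustering algorithm the system uses.

```python
import math
from collections import Counter
from typing import Dict, List


def top_n_topic_words(docs: List[List[str]], n: int) -> List[List[str]]:
    """Return the n highest TF-IDF words for each tokenized document."""
    num_docs = len(docs)
    # Document frequency: the number of documents each word appears in.
    doc_freq = Counter(word for doc in docs for word in set(doc))
    results = []
    for doc in docs:
        counts = Counter(doc)
        scores: Dict[str, float] = {
            word: (count / len(doc)) * math.log(num_docs / doc_freq[word])
            for word, count in counts.items()
        }
        # Highest score first; ties are broken alphabetically for a stable order.
        ranked = sorted(scores, key=lambda w: (-scores[w], w))
        results.append(ranked[:n])
    return results


docs = [
    ["movie", "ticket", "movie", "news"],
    ["election", "vote", "news"],
    ["movie", "review", "news"],
]
topic_words = top_n_topic_words(docs, 2)
```

Words that occur in every document (here "news") score zero and never reach the top of the list, which is the property that makes TF-IDF useful for picking topic words.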

  • Hot Topic Extraction: apply aging theory to find
    the current hot topics
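
The slides do not give the aging formulation, so the following is only a simplified sketch in the spirit of aging theory: each topic carries an energy value that decays every period and is fed by newly published documents, and a topic is hot while its energy stays above a threshold. The function `hot_topics`, the decay factor, the threshold, and the update rule are all assumptions, not the authors' model.

```python
from typing import Dict, List


def hot_topics(
    counts_per_period: List[Dict[str, int]],
    decay: float = 0.5,
    threshold: float = 5.0,
) -> List[str]:
    """Return topics whose energy is at or above the threshold after the last period."""
    energy: Dict[str, float] = {}
    for counts in counts_per_period:
        # Existing topics age: their energy decays every period.
        energy = {topic: value * decay for topic, value in energy.items()}
        # New documents feed the topics they mention.
        for topic, count in counts.items():
            energy[topic] = energy.get(topic, 0.0) + count
    return sorted(topic for topic, value in energy.items() if value >= threshold)


periods = [{"election": 8, "movie": 2}, {"movie": 4}, {"movie": 6}]
current_hot = hot_topics(periods)
```

In this toy run "election" starts strong but receives no new documents, so its energy falls to 2.0, while "movie" keeps being fed and ends at 8.5.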
12
Topic/Trend Detection System (cont'd)
  • Trend Detector
  • For now, the Trend Estimation Algorithm is
    treated as a black box; choosing it is left to
    Slow Intelligence once it is brought into the
    system

13
Topic/Trend Detection System with Slow
Intelligence
14
T/TD System with Slow Intelligence
  • Four complexities of designing online topic/trend
    detection systems
  • 1. (Crawler Extractor) With a limited amount of
    computing resources, it is infeasible to collect
    all web data. The system needs data collection
    strategies that concentrate its limited resources
    on collecting the important web data.
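
One way to picture "concentrating limited resources" is a greedy selection under a crawl budget. This is a hedged sketch only: the function `select_sources`, the relevance scores, the per-site costs, and the greedy rule are all hypothetical and are not described in the slides.

```python
from typing import Dict, List


def select_sources(
    relevance: Dict[str, float], cost: Dict[str, int], budget: int
) -> List[str]:
    """Greedily pick sites with the best relevance per unit of crawl cost."""
    chosen: List[str] = []
    remaining = budget
    # Visit sites in order of decreasing relevance-to-cost ratio.
    for site in sorted(relevance, key=lambda s: relevance[s] / cost[s], reverse=True):
        if cost[site] <= remaining:
            chosen.append(site)
            remaining -= cost[site]
    return chosen


# Hypothetical numbers: relevant documents seen so far and pages to fetch per visit.
relevance = {"forum_a": 90.0, "blog_b": 40.0, "news_c": 60.0}
cost = {"forum_a": 30, "blog_b": 10, "news_c": 40}
selected = select_sources(relevance, cost, budget=50)
```

With a budget of 50 fetches, the two sites with the best yield per fetch are crawled and the expensive, lower-yield site is skipped.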
15
T/TD System with Slow Intelligence (cont'd)
  • 2. (Trend Detector) Many computation methods are
    available for estimating trends. Once parameter
    settings are taken into account, there are too
    many combinations to choose from. Furthermore,
    the Internet is a changing environment, so the
    current best solution may not perform well in the
    future. The system needs to find the best
    solution among many alternatives automatically
    (or at least quasi-automatically) in a changing
    environment.
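
Selecting among alternatives in a changing environment can be sketched as re-scoring every candidate on the most recent data and keeping the one with the lowest error. The three forecasters below and the function `select_forecaster` are illustrative assumptions; the slides do not name the trend estimation methods the system considers.

```python
from typing import Callable, Dict, List

Forecaster = Callable[[List[float]], float]


def last_value(history: List[float]) -> float:
    return history[-1]


def mean_of_3(history: List[float]) -> float:
    # Needs at least three past observations.
    return sum(history[-3:]) / 3


def linear_step(history: List[float]) -> float:
    # Extrapolate the most recent change one step ahead.
    return history[-1] + (history[-1] - history[-2])


CANDIDATES: Dict[str, Forecaster] = {
    "last_value": last_value,
    "mean_of_3": mean_of_3,
    "linear_step": linear_step,
}


def select_forecaster(
    series: List[float], candidates: Dict[str, Forecaster], eval_window: int
) -> str:
    """Return the candidate with the lowest one-step absolute error on recent data."""
    start = len(series) - eval_window
    errors: Dict[str, float] = {}
    for name, forecast in candidates.items():
        errors[name] = sum(
            abs(forecast(series[:t]) - series[t]) for t in range(start, len(series))
        )
    return min(errors, key=lambda name: errors[name])


rising = select_forecaster([1, 2, 3, 4, 5, 6], CANDIDATES, eval_window=3)
oscillating = select_forecaster([4, 6, 4, 6, 4, 6], CANDIDATES, eval_window=3)
```

The same candidate set yields a different winner for the steadily rising series than for the oscillating one, which is why the selection has to be repeated as the environment changes.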
16
T/TD System with Slow Intelligence (cont'd)
  • 3. (Crawler Extractor) The crawler needs to
    revisit websites at hourly or daily intervals to
    collect up-to-date data. Each site has a
    different amount of data to update and a
    different policy for restricting frequent access,
    neither of which is known beforehand. The system
    needs to find a feasible data collection schedule
    based on past experience.
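
A schedule that learns from past experience can be as simple as adjusting each site's revisit interval after every visit. The sketch below is an assumption for illustration: the function `next_interval`, the multipliers, and the rule itself are hypothetical, and only the hourly-to-daily range comes from the slide.

```python
def next_interval(
    current_hours: float,
    new_items: int,
    was_blocked: bool,
    min_hours: float = 1.0,
    max_hours: float = 24.0,
) -> float:
    """Propose the next revisit interval for one site from the last visit's outcome."""
    if was_blocked:
        proposed = current_hours * 2      # back off: the site restricts frequent access
    elif new_items == 0:
        proposed = current_hours * 1.5    # nothing new: visit less often
    else:
        proposed = current_hours / 2      # fresh data: visit more often
    # Keep the interval between hourly and daily.
    return max(min_hours, min(max_hours, proposed))
```

For example, a site visited every 4 hours that returned new posts would next be visited after 2 hours, while a site that blocked the crawler at a 16-hour interval would be pushed back to the 24-hour cap.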
17
T/TD System with Slow Intelligence (cont'd)
  • 4. (Crawler Extractor) Any change in a web page
    may break the Extractors. When many websites are
    monitored, the Extractors need an automatic
    repair mechanism. The repair mechanism needs to
    detect Extractor errors, find alternatives, and
    choose the best alternative to fix the broken
    Extractors.
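
The detect, find alternatives, choose best sequence can be sketched with regular-expression extraction rules. Everything here is hypothetical: the slides do not say how Extractors are represented, and the functions `extract_title` and `repair_rule` and the sample patterns exist only to illustrate the idea.

```python
import re
from typing import List, Optional


def extract_title(html: str, pattern: str) -> Optional[str]:
    """Apply one extraction rule; return None when it finds nothing usable."""
    match = re.search(pattern, html, flags=re.DOTALL)
    if match and match.group(1).strip():
        return match.group(1).strip()
    return None


def repair_rule(samples: List[str], current: str, alternatives: List[str]) -> str:
    """Keep the current rule if it still works, else pick the best alternative."""

    def hits(pattern: str) -> int:
        return sum(1 for page in samples if extract_title(page, pattern) is not None)

    if hits(current) == len(samples):     # error detection: the rule still covers every page
        return current
    return max(alternatives, key=hits)    # choose the alternative that covers the most pages


# Pages after a hypothetical site redesign that renamed the title element.
pages = [
    '<div class="post"><h2 class="headline">First post</h2></div>',
    '<div class="post"><h2 class="headline">Second post</h2></div>',
]
old_rule = r'<h1 class="title">(.*?)</h1>'
candidates = [r"<h1>(.*?)</h1>", r'<h2 class="headline">(.*?)</h2>']
repaired = repair_rule(pages, old_rule, candidates)
```

A production Extractor would likely validate more than the presence of a match (for example, field length or date format), but the control flow stays the same.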
18
T/TD System with Slow Intelligence (cont'd)
  • 1. SIS to help restrict the range of data
    collection

[Figure labels: Knowledge of data, Knowledge of algorithm]
19
T/TD System with Slow Intelligence (cont'd)
  • 2. SIS to help select and adapt trend detection
    algorithms

20
T/TD System with Slow Intelligence (cont'd)
  • 3. SIS to help schedule the Crawler

21
T/TD System with Slow Intelligence (cont'd)
  • 4. SIS to help adapt Extractors

22
Conclusion
23
Conclusion
  • An online trend detection system requires careful
    resource allocation and automatic algorithm
    adaptation to process huge volumes of
    heterogeneous data.
  • This research adopts Slow Intelligence, which
    provides a framework for systems with
    insufficient computing resources to gradually
    adapt to their environments, to respond to these
    challenges.
  • Four Slow Intelligence subsystems are proposed,
    and each subsystem targets a challenge in
    designing online topic/trend detection systems.

24
If you have any questions, please e-mail us
  • chiachun_at_iii.org.tw (Chia-Chun Shih)
  • markpeng_at_iii.org.tw (Ting-Chun Peng)