Automating Keyphrase Extraction with Multi-Objective Genetic Algorithms (MOGA)

1
Automating Keyphrase Extraction with
Multi-Objective Genetic Algorithms (MOGA)
  • Jia-Long Wu
  • Alice M. Agogino
  • Berkeley Expert System Laboratory
  • U.C. Berkeley

2
Outline
  • Role of Keyphrases
  • Phrase Extraction Algorithms
  • Phrase Extraction with Multi-Objective Genetic
    Algorithm
  • Experiment and Results
  • Results Evaluation
  • Conclusion
  • Future Research

3
Role of Keyphrases
  • Concept representations
  • Document indexing
  • Enhance document retrieval / Browsing
  • Query formulation assistance
  • Document surrogates

4
Vision of Unified Language System
[Figure: Unified Language System for Engineering Design. Labels: Context Mapping Mechanism, Semantic Network]

5
Keyphrase Extraction Algorithms
  • Heuristic, Syntactic, Machine Learning
  • Requires prior training
  • Heuristic cut-off thresholds in number of phrases
  • Focuses on single document
  • Redundancy when aggregated for the whole document
    collection

6
Keyphrase Extraction with MOGA
  • Phrase extraction as an optimization problem
  • Candidate phrases generation
  • Optimize phrase selection with MOGA
  • Model Genetic Operators

[Figure: Crossover. Parents and offspring, shown as phenotype and genotype]

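The slide frames phrase selection as a genotype operated on by crossover. A minimal sketch of that idea follows, assuming a binary encoding in which bit i selects candidate phrase i, and assuming one-point crossover; the slides do not state which crossover variant was used. The phrase list and the names `decode` and `one_point_crossover` are hypothetical.

```python
import random

# Hypothetical candidate phrases; the slides do not list the real ones.
CANDIDATES = ["design theory", "genetic algorithm", "product family",
              "decision making", "concept selection"]


def decode(genotype, candidates=CANDIDATES):
    """Map a bit-string genotype to its phenotype: the selected phrases."""
    return [phrase for bit, phrase in zip(genotype, candidates) if bit == 1]


def one_point_crossover(parent_a, parent_b, rng=random):
    """Swap the tails of two equal-length genotypes at a random cut point."""
    if len(parent_a) != len(parent_b):
        raise ValueError("parents must have the same length")
    cut = rng.randint(1, len(parent_a) - 1)  # cut strictly inside the string
    child_a = parent_a[:cut] + parent_b[cut:]
    child_b = parent_b[:cut] + parent_a[cut:]
    return child_a, child_b


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the run is repeatable
    offspring = one_point_crossover([1, 0, 0, 1, 0], [0, 1, 1, 0, 1], rng)
    for child in offspring:
        print(child, decode(child))
```

Each offspring inherits the head of one parent and the tail of the other, so every bit position across the two children still carries exactly the two parental values.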
7
Keyphrase Extraction with MOGA
  • Optimize phrase selection with MOGA (cont.)
  • Model Genetic Operators (cont.)
  • Evaluation fitness functions
  • Minimize clustering measure / dispersion
    (Bookstein 98)
  • Minimize number of phrases
  • Non-Dominated Sorting Genetic Algorithm (NSGA-II)
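With two objectives to minimize (dispersion and number of phrases), solutions are compared by Pareto dominance. The sketch below shows only that comparison and a naive extraction of the first non-dominated front. It is not a full NSGA-II, which also ranks the later fronts and uses crowding distance. The objective values are made up for illustration, and the clustering measure itself is not implemented here.

```python
def dominates(a, b):
    """True if objective vector a Pareto-dominates b (all objectives minimized)."""
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def non_dominated(points):
    """Return the points that no other point dominates (the first front)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


if __name__ == "__main__":
    # Illustrative (dispersion, number_of_phrases) pairs, not data from the talk.
    population = [(0.9, 10), (0.7, 20), (0.8, 25), (0.5, 40), (0.7, 30)]
    print(non_dominated(population))
```

In this toy population, (0.8, 25) and (0.7, 30) drop out because (0.7, 20) is at least as good on both objectives and strictly better on one.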

[Figure: Mutation of a binary genotype. Bit values shown: 1 0 0 1 0 1 1 0 1 0]

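Mutation on a binary genotype can be sketched as independent bit flips. This assumes bit-flip mutation with a per-bit rate; the slides give neither the operator details nor the rate, so `rate` and the function name `bit_flip_mutation` are hypothetical.

```python
import random


def bit_flip_mutation(genotype, rate, rng=random):
    """Return a copy of the genotype with each bit flipped with probability `rate`."""
    return [1 - bit if rng.random() < rate else bit for bit in genotype]


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the run is repeatable
    parent = [1, 0, 0, 1, 0]
    print(parent, "->", bit_flip_mutation(parent, rate=0.2, rng=rng))
```

A flipped bit adds a phrase to the selection or drops one, which lets the search reach phrase sets that crossover alone cannot produce from the current population.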
8
Experiment and Results
  • Data set
      • 34 papers from Design Theory and Methodology
        Conference 01
  • Candidate phrases
      • 5000 noun phrases extracted
  • Genetic Algorithm Parameters
      • Population size: 100
      • Converges at 5000 generations
      • 5 hours on a 1.8 GHz Xeon CPU

9
Experiment and Results
Pareto plot of Dispersion versus Number of Phrases
10
Experiment and Results
Histogram of the number of optimal solutions in which a
keyphrase appears
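The counts behind such a histogram can be sketched as below, assuming each Pareto-optimal solution is available as a list of selected phrases. The example solutions and the name `phrase_frequencies` are hypothetical.

```python
from collections import Counter


def phrase_frequencies(solutions):
    """Count in how many solutions each phrase appears."""
    counts = Counter()
    for phrases in solutions:
        counts.update(set(phrases))  # set(): count a phrase once per solution
    return counts


if __name__ == "__main__":
    # Illustrative Pareto-optimal phrase sets, not results from the talk.
    pareto_solutions = [
        ["design theory"],
        ["design theory", "product family"],
        ["design theory", "product family", "decision making"],
    ]
    for phrase, n in phrase_frequencies(pareto_solutions).most_common():
        print(f"{n}  {phrase}")
```

Phrases that appear in most optimal solutions survive even when the phrase budget is small, which is one way to read a high count in the histogram.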
11
Evaluation
12
Evaluation
  • 6 domain experts participated in the evaluation.
  • Core phrases vs. Non-core phrases.
  • Less than 10 are deemed irrelevant.
  • Significant deviation between evaluators.

13
Conclusion
  • Keyphrase extraction can be successfully
    formulated as a multi-objective global
    optimization problem.
  • Reasonably good keyphrases can be extracted
    without prior training or domain knowledge.
  • Trade-off information between objectives such as
    number of phrases vs. average quality of phrases
    can be gained from Pareto solutions.
  • A solution can be chosen based on user needs
    and trade-off information.

14
Future Research
  • Test on a larger text collection.
  • Use extracted keyphrases in an IR system as a
    browsing and query-expansion tool, and compare
    against a full-text search IR system.
  • Evaluate with more raters and a 1-5 scale.
  • Build domain thesauri with extracted keyphrases
    and semantic discovery algorithms (e.g. Latent
    Semantic Analysis).

15
Metathesaurus in Digital Library
16
Thank you!
  • Comments? Questions?
  • jialong_at_me.berkeley.edu
  • aagogino_at_me.berkeley.edu

17
Mode Analysis of Scaled Evaluation