Parallel Clustering of English Verbs into Levin Classes

Provided by: secondt
Learn more at: http://secondthought.org

Transcript and Presenter's Notes

Title: Parallel Clustering of English Verbs into Levin Classes


1
Parallel Clustering of English Verbs into Levin Classes
  • 6.338/18.337 Final Project
  • Melanie Goetz, Andrew Hogue
  • May 13, 2004

2
Background
  • Levin [1993] hand-classified verbs
  • 3086 verbs into 264 classes (with overlaps)
  • Utilized verb arguments and alternations
  • E.g. "the glass broke" or "broke the glass"
  • Classes correlated with semantic meaning of verbs

3
Our Approach
  • Automatically classify verbs
  • Build graph G with a node for each word and an edge
    between words that appear in the same sentence
  • First, build bipartite graph with verbs and
    prepositions
  • Extend with subject nouns, object nouns
  • Use spectral partitioning to divide verbs into
    classes
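The graph-building step above can be sketched in a few lines of Python. This is only an illustration, not the authors' code: the function name `build_cooccurrence_graph`, the toy sentences, and the use of co-occurrence counts as edge weights are all assumptions made for the example. The input is restricted to one verb and one preposition per sentence, so the result is a bipartite verb/preposition graph like the one described as the first step.

```python
from collections import defaultdict
from itertools import combinations

def build_cooccurrence_graph(sentences):
    """Return {word: {neighbor: weight}}; weight counts shared sentences (an assumption)."""
    graph = defaultdict(lambda: defaultdict(int))
    for words in sentences:
        # Every distinct pair of words in a sentence gets (or strengthens) an edge.
        for u, v in combinations(sorted(set(words)), 2):
            graph[u][v] += 1
            graph[v][u] += 1
    return graph

# Hypothetical, already-filtered sentences: one verb and one preposition each,
# so edges only ever connect a verb to a preposition (a bipartite graph).
sentences = [
    ["break", "with"],
    ["shatter", "with"],
    ["walk", "to"],
    ["run", "to"],
    ["run", "with"],
]
graph = build_cooccurrence_graph(sentences)
```

Extending the graph with subject and object nouns would, under the same assumptions, only mean passing longer word lists per sentence.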

4
Our Approach
5
Our Approach
6
Parallel Implementation
  • Three components
  • Extract meaningful words from parsed corpus
  • Merge per-processor sparse matrices without
    bringing data to front end
  • Run parallel spectral partitioning on full graph

7
Parsing
  • Embarrassingly parallel
  • Wall Street Journal corpus of 99 documents
  • Each processor separately extracts tree from
    corpus and relevant words from tree
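A rough sketch of why this step parallelizes so easily: each document is processed with no reference to any other. The example assumes the parses are stored as Penn Treebank style bracketed strings, where leaves look like `(TAG word)`; the slides do not state the file format. The helper `extract_words` and the two toy documents are hypothetical, and a thread pool merely stands in for the separate processors.

```python
import re
from concurrent.futures import ThreadPoolExecutor

# Assumed format: leaf nodes in a bracketed parse look like "(TAG word)".
LEAF = re.compile(r"\(([^\s()]+) ([^\s()]+)\)")

def extract_words(tree_text):
    """Return the verbs and prepositions found in one bracketed parse."""
    verbs, preps = [], []
    for tag, word in LEAF.findall(tree_text):
        if tag.startswith("VB"):      # VB, VBD, VBZ, ... are verb tags
            verbs.append(word.lower())
        elif tag == "IN":             # IN marks prepositions
            preps.append(word.lower())
    return verbs, preps

# Hypothetical miniature corpus: one bracketed sentence per document.
documents = [
    "(S (NP (DT The) (NN glass)) (VP (VBD broke) (PP (IN on) (NP (DT the) (NN floor)))))",
    "(S (NP (NNP Janet)) (VP (VBD walked) (PP (IN to) (NP (DT the) (NN store)))))",
]

# No document depends on another, so the work can be mapped over workers freely.
with ThreadPoolExecutor(max_workers=2) as pool:
    results = list(pool.map(extract_words, documents))
```

Picking out subject and object nouns would need the tree structure rather than just the leaves, which this sketch leaves out.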

8
Indexing
  • Need to combine matrices from separate processors
    into one indexing scheme
  • Bringing data to front end is inefficient
  • Solution: share vocabulary lists between
    processes
  • Allows each process to use the same index for
    each word
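One way the shared-vocabulary idea might look in code is sketched below. It is an assumption-laden illustration rather than the project's implementation: the processes are simulated by two dictionaries, and `build_shared_index` and `reindex` are hypothetical helpers. The point it shows is that once every process sorts the same merged vocabulary, all of them derive identical indices, so their local matrices can be added entry by entry.

```python
from collections import Counter

def build_shared_index(local_vocabs):
    """Union every process's vocabulary and number the words deterministically."""
    shared = sorted(set().union(*local_vocabs))
    return {word: i for i, word in enumerate(shared)}

def reindex(local_counts, index):
    """Rewrite one process's (word, word) -> count entries as (row, col) -> count."""
    return Counter({(index[u], index[v]): c for (u, v), c in local_counts.items()})

# Hypothetical per-process co-occurrence counts, keyed by word pairs.
proc0 = {("break", "with"): 2, ("walk", "to"): 1}
proc1 = {("run", "to"): 3, ("break", "with"): 1}

# Only the (small) vocabulary lists are exchanged between processes.
vocabs = [{w for pair in p for w in pair} for p in (proc0, proc1)]
index = build_shared_index(vocabs)

# With identical indices everywhere, the local sparse matrices simply add.
total = reindex(proc0, index) + reindex(proc1, index)
```

Sorting the merged vocabulary is what makes the numbering reproducible on every process without a central coordinator assigning indices.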

9
Indexing
10
Indexing
11
Partitioning
  • Based on specpart.m from Meshpart toolkit
  • Serial version uses Cholesky decomposition
  • Our parallel version uses eigs() function as we
    only need a few eigenvalues
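To illustrate why a "few eigenvalues" solver is enough, here is a minimal spectral bisection sketch. It is not specpart.m and not the authors' parallel code: it uses SciPy's `eigsh` as a stand-in for MATLAB's `eigs()`, and the function `spectral_bisect`, the shift value, and the toy graph are assumptions for the example. Only the two smallest eigenpairs of the graph Laplacian are computed, and the sign of the second eigenvector (the Fiedler vector) splits the nodes into two groups.

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import eigsh

def spectral_bisect(adjacency):
    """Split a connected graph in two using the sign of the Fiedler vector."""
    adjacency = sp.csc_matrix(adjacency, dtype=float)
    degrees = np.asarray(adjacency.sum(axis=1)).ravel()
    laplacian = (sp.diags(degrees) - adjacency).tocsc()
    # Shift-invert around a small negative sigma finds the eigenvalues
    # nearest zero; the shift keeps the factored matrix nonsingular.
    values, vectors = eigsh(laplacian, k=2, sigma=-1e-3, which="LM")
    fiedler = vectors[:, np.argsort(values)[1]]
    return fiedler >= 0

# Hypothetical toy graph: triangles (0,1,2) and (3,4,5) joined by edge 2-3.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
rows, cols = zip(*edges)
n = 6
half = sp.coo_matrix((np.ones(len(edges)), (rows, cols)), shape=(n, n))
side = spectral_bisect(half + half.T)  # symmetrize, then bisect
```

On the toy graph the two triangles end up on opposite sides. Which side is labeled `True` is arbitrary, because an eigenvector's overall sign is not determined.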

12
(No Transcript)
13
Results
  • Clustered 3317 sentences from Wall Street Journal
    corpus
  • 2827 unique words
  • Included subjects, verbs, objects, prepositions

14
Results - Parsing
15
Results - Indexing
16
Results - Partitioning
17
Results - Clustering
18
Future Work
  • Parse other corpora (Project Gutenberg)
  • Restrict word types to verb/preposition or
    subject/verb/object
  • Other ways to use eigenvectors for partitioning
    into more than 2 parts