Parallel Clustering of English Verbs into Levin Classes

Provided by: secondt

Learn more at: http://secondthought.org

1
Parallel Clustering of English Verbs into Levin Classes

6.338/18.337 Final Project
Melanie Goetz, Andrew Hogue
May 13, 2004

2
Background

Levin [1993] hand-classified verbs
3086 verbs into 264 classes (with overlaps)
Used verb arguments and alternations
E.g. "the glass broke" or "broke the glass"
Classes correlate with the semantic meaning of verbs

3
Our Approach

Automatically classify verbs
Build graph G with a node for each word and an edge between words that appear in the same sentence
First, build a bipartite graph of verbs and prepositions
Extend with subject nouns and object nouns
Use spectral partitioning to divide verbs into classes
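
The graph construction above can be sketched in a few lines of Python. This is only an illustration, not the authors' code: the function name build_cooccurrence_graph, the toy sentences, and the use of co-occurrence counts as edge weights are all assumptions (the slide only says an edge exists when two words share a sentence).

```python
from collections import defaultdict
from itertools import combinations

def build_cooccurrence_graph(sentences):
    """Return {word: {neighbor: weight}}; weight counts shared sentences."""
    graph = defaultdict(lambda: defaultdict(int))
    for words in sentences:
        unique = sorted(set(words))
        for w in unique:
            graph[w]  # touch the key so isolated words still get a node
        for a, b in combinations(unique, 2):
            graph[a][b] += 1
            graph[b][a] += 1
    return {w: dict(nbrs) for w, nbrs in graph.items()}

# Hypothetical input: each sentence already reduced to one verb and one
# preposition, which makes the resulting graph bipartite.
sentences = [
    ["break", "into"],
    ["break", "with"],
    ["cut", "into"],
]
graph = build_cooccurrence_graph(sentences)
```

Here "break" and "cut" are never linked directly; they are related only through the shared preposition "into", which is the kind of structure a partitioner can exploit.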

4
Our Approach
5
Our Approach
6
Parallel Implementation

Three components
Extract meaningful words from parsed corpus
Merge per-processor sparse matrices without bringing data to the front end
Run parallel spectral partitioning on full graph

7
Parsing

Embarrassingly parallel
Wall Street Journal corpus of 99 documents
Each processor separately extracts parse trees from the corpus, then the relevant words from each tree
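
A minimal sketch of the per-document extraction step, assuming the parsed corpus uses Penn Treebank style bracketed trees with tags such as VBD and IN (the slides do not state the format). The helper names leaves and relevant_words and the two sample trees are hypothetical.

```python
import re

def leaves(tree_text):
    """Return (tag, word) pairs for the leaves of a bracketed parse tree."""
    # A leaf looks like "(TAG word)" with no nested parentheses inside.
    return re.findall(r"\(([^\s()]+)\s+([^\s()]+)\)", tree_text)

def relevant_words(tree_text):
    """Keep verbs (tags starting with VB) and prepositions (tag IN)."""
    return [word.lower() for tag, word in leaves(tree_text)
            if tag.startswith("VB") or tag == "IN"]

# Hypothetical parsed sentences, one per "document".
trees = [
    "(S (NP (DT The) (NN glass)) (VP (VBD broke) (PP (IN into) (NP (NNS pieces)))))",
    "(S (NP (NNP John)) (VP (VBD cut) (PP (IN with) (NP (DT a) (NN knife)))))",
]

# Each tree is processed independently, so this map could be handed to a
# process pool without any communication between workers.
per_document = list(map(relevant_words, trees))
```

The sketch keeps inflected forms such as "broke"; a real pipeline would probably map them to a base form before building the graph.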

8
Indexing

Need to combine matrices from separate processors into one indexing scheme
Bringing them to the front end is inefficient
Solution: share vocabulary lists between processes
Allows each process to use the same index for each word
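
The shared-vocabulary idea can be illustrated as follows. This is a serial sketch under stated assumptions: two in-memory "processes" stand in for real ones, sparse matrices are plain dictionaries keyed by (row, column), and the names global_index and remap are hypothetical.

```python
def global_index(vocab_lists):
    """Build the same deterministic word -> index map on every process."""
    merged = sorted(set().union(*vocab_lists))
    return {word: i for i, word in enumerate(merged)}

def remap(local_vocab, local_entries, index):
    """Translate {(local_i, local_j): count} into global coordinates."""
    return {(index[local_vocab[i]], index[local_vocab[j]]): count
            for (i, j), count in local_entries.items()}

# Hypothetical per-process data: a local vocabulary and a local sparse matrix.
vocab_a = ["break", "into"]
entries_a = {(0, 1): 2}
vocab_b = ["into", "cut", "break"]
entries_b = {(1, 0): 1, (2, 0): 3}

index = global_index([vocab_a, vocab_b])  # {"break": 0, "cut": 1, "into": 2}

# Once every process uses the same index, the matrices can simply be added.
merged = {}
for vocab, entries in [(vocab_a, entries_a), (vocab_b, entries_b)]:
    for key, count in remap(vocab, entries, index).items():
        merged[key] = merged.get(key, 0) + count
```

Only the vocabulary lists need to be exchanged; in a parallel setting the final loop would be replaced by a distributed sparse-matrix addition, so the matrix entries never pass through the front end.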

9
Indexing
10
Indexing
11
Partitioning

Based on specpart.m from Meshpart toolkit
Serial version uses Cholesky decomposition
Our parallel version uses the eigs() function, as we only need a few eigenvalues
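
For readers unfamiliar with spectral partitioning, here is a small, self-contained sketch of a single bisection step. It is not specpart.m and does not call eigs(): as a stand-in it uses shifted power iteration on the graph Laplacian L = D - A to approximate the eigenvector of the second-smallest eigenvalue (the Fiedler vector), then splits the nodes at the median entry. The function names, the example graph, and the median split are assumptions made for illustration.

```python
def fiedler_vector(adj, iterations=500):
    """Approximate the Laplacian eigenvector of the 2nd-smallest eigenvalue.

    adj: list of neighbor lists for an undirected, connected graph.
    """
    n = len(adj)
    # 2 * max degree bounds the Laplacian eigenvalues from above, so
    # (shift*I - L) has non-negative eigenvalues in reversed order.
    shift = 2.0 * max(len(nbrs) for nbrs in adj)
    x = [float(i) for i in range(n)]  # deterministic start vector
    for _ in range(iterations):
        mean = sum(x) / n
        x = [v - mean for v in x]  # project out the constant eigenvector
        # y = (shift*I - L) x, where (L x)_i = deg_i * x_i - sum of neighbors
        y = [(shift - len(adj[i])) * x[i] + sum(x[j] for j in adj[i])
             for i in range(n)]
        norm = sum(v * v for v in y) ** 0.5
        x = [v / norm for v in y]
    return x

def bisect(adj):
    """Split node indices into two sets at the median Fiedler entry."""
    x = fiedler_vector(adj)
    cut = sorted(x)[len(x) // 2]
    left = {i for i, v in enumerate(x) if v < cut}
    right = set(range(len(x))) - left
    return left, right

# Hypothetical graph: two triangles (0,1,2) and (3,4,5) joined by edge 2-3.
adj = [[1, 2], [0, 2], [0, 1, 3], [2, 4, 5], [3, 5], [3, 4]]
parts = bisect(adj)
```

On this graph the split separates the two triangles, cutting only the single bridging edge. Which triangle ends up in which set depends on the sign of the computed vector, which is arbitrary.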

12
(No Transcript)
13
Results

Clustered 3317 sentences from the Wall Street Journal corpus
2827 unique words
Included subjects, verbs, objects, prepositions

14
Results - Parsing
15
Results - Indexing
16
Results - Partitioning
17
Results - Clustering
18
Future Work

Parse other corpora (Project Gutenberg)
Restrict word types to verb/preposition or subject/verb/object
Other ways to use eigenvectors for partitioning into more than 2 parts
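
As one possible reading of the last item (not something the slides specify), several eigenvectors can be combined by grouping nodes according to the sign pattern of their entries, which yields up to 2^k parts from k vectors. The function name sign_pattern_parts and the vector values below are made up for illustration.

```python
def sign_pattern_parts(vectors):
    """Group node indices by the sign pattern of their entries across vectors."""
    parts = {}
    for node in range(len(vectors[0])):
        key = tuple(v[node] >= 0 for v in vectors)
        parts.setdefault(key, []).append(node)
    return list(parts.values())

# Hypothetical eigenvector entries for four nodes.
v2 = [0.5, 0.5, -0.5, -0.5]
v3 = [0.5, -0.5, 0.5, -0.5]

two_way = sign_pattern_parts([v2])       # one vector: at most 2 parts
four_way = sign_pattern_parts([v2, v3])  # two vectors: at most 4 parts
```

Recursive bisection, applying a two-way split again to each part, is another common alternative.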
