Title: Parallel Tools for Natural Language Processing
- Mark Brigham
- Melanie Goetz
- Andrew Hogue
6.338 / 18.337 - March 16, 2004
2 Sentence Parsing
- Consider the sentence:
  - John ate the cookie on the table
- We want to:
  - Tag the sentence with parts of speech
  - Group the words by phrase
3 Context Free Grammars
S → NP VP
NP → Det N
NP → NP PP
VP → VP PP
VP → V NP
N → cookie
N → table
Det → the
V → ate
- Recursive set of rules
- Defines what syntactic structure can be applied to a phrase or word
- Top-level rule S defines the sentence
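As a rough illustration, the rules above could be written down as plain Python data and used to tag each word with its part of speech. The names `BINARY_RULES`, `LEXICON`, and `tag` are hypothetical, and three entries are assumptions not shown on the slide: `PP -> P NP`, `P -> on`, and treating "John" directly as an NP. They are added only so the example sentence can be covered.

```python
# Phrase rules as (parent, left child, right child).
BINARY_RULES = [
    ("S", "NP", "VP"),
    ("NP", "Det", "N"),
    ("NP", "NP", "PP"),
    ("VP", "VP", "PP"),
    ("VP", "V", "NP"),
    ("PP", "P", "NP"),  # assumption: not listed on the slide
]

# Lexical rules: word -> set of categories it can take.
LEXICON = {
    "cookie": {"N"},
    "table": {"N"},
    "the": {"Det"},
    "ate": {"V"},
    "on": {"P"},      # assumption: not listed on the slide
    "John": {"NP"},   # assumption: proper noun treated as a full NP
}


def tag(sentence):
    """Pair every word with the sorted list of categories the lexicon allows."""
    return [(word, sorted(LEXICON.get(word, set()))) for word in sentence.split()]


if __name__ == "__main__":
    for word, categories in tag("John ate the cookie on the table"):
        print(word, categories)
```

Words missing from the lexicon come back with an empty category list, which makes gaps in the grammar easy to spot.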
4 Context Free Grammars
- Applying a CFG to a sentence creates a parse tree for that sentence
5 Context Free Grammars
Top-down parse
6 Context Free Grammars
Bottom-up parse
Parallelizable!
7 Ambiguity
More than one parse for a single sentence!
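To make the ambiguity concrete, the sketch below counts how many distinct parse trees a small grammar assigns to a sentence. It is an illustration under assumptions, not the authors' code: the names `count_parses`, `BINARY_RULES`, and `LEXICON` are hypothetical, and the rules `PP -> P NP`, `P -> on`, and `NP -> John` are added to the slide grammar so that the example sentence is covered. Under those assumptions, "on the table" can attach either to "the cookie" or to "ate the cookie", which gives two parses.

```python
from collections import Counter, defaultdict

# (left child, right child) -> parents; PP -> P NP is an assumption.
BINARY_RULES = {
    ("NP", "VP"): {"S"},
    ("Det", "N"): {"NP"},
    ("NP", "PP"): {"NP"},
    ("VP", "PP"): {"VP"},
    ("V", "NP"): {"VP"},
    ("P", "NP"): {"PP"},
}
# "on" and "John" are assumptions added to cover the example sentence.
LEXICON = {
    "cookie": {"N"}, "table": {"N"}, "the": {"Det"},
    "ate": {"V"}, "on": {"P"}, "John": {"NP"},
}


def count_parses(words, root="S"):
    """Count the parse trees rooted at `root` that span the whole sentence."""
    n = len(words)
    counts = defaultdict(Counter)  # counts[(i, j)][label] = trees over words[i:j]
    for i, word in enumerate(words):
        for label in LEXICON.get(word, ()):
            counts[(i, i + 1)][label] += 1
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length
            for k in range(i + 1, j):  # split point between the two children
                for (left, right), parents in BINARY_RULES.items():
                    ways = counts[(i, k)][left] * counts[(k, j)][right]
                    if ways:
                        for parent in parents:
                            counts[(i, j)][parent] += ways
    return counts[(0, n)][root]


if __name__ == "__main__":
    print(count_parses("John ate the cookie on the table".split()))  # 2
    print(count_parses("John ate the cookie".split()))               # 1
```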
8 Parallelization
- Bottom-up rule application appropriate for parallel processing
- Ambiguous parses also parallelizable
- Long, complex sentences may be most interesting
  - Proust?
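One way to picture the parallelism is to group phrases by length: in a bottom-up parse, every phrase of a given length depends only on shorter phrases, so all phrases of that length can be processed at the same time. The sketch below is a hypothetical illustration (`fill_cell`, `parallel_cky`, and the added rules `PP -> P NP`, `P -> on`, `NP -> John` are assumptions, not the authors' implementation). It uses a thread pool only to show the dependency structure. In CPython, threads will not speed up this CPU-bound work, so a real implementation would need processes or another runtime.

```python
from concurrent.futures import ThreadPoolExecutor

# (left child, right child) -> parents; PP -> P NP is an assumption.
BINARY_RULES = {
    ("NP", "VP"): {"S"},
    ("Det", "N"): {"NP"},
    ("NP", "PP"): {"NP"},
    ("VP", "PP"): {"VP"},
    ("V", "NP"): {"VP"},
    ("P", "NP"): {"PP"},
}
# "on" and "John" are assumptions added to cover the example sentence.
LEXICON = {
    "cookie": {"N"}, "table": {"N"}, "the": {"Det"},
    "ate": {"V"}, "on": {"P"}, "John": {"NP"},
}


def fill_cell(chart, i, j):
    """Compute the labels for words[i:j]; only reads cells for shorter spans."""
    labels = set()
    for k in range(i + 1, j):
        for left in chart.get((i, k), ()):
            for right in chart.get((k, j), ()):
                labels |= BINARY_RULES.get((left, right), set())
    return (i, j), labels


def parallel_cky(words, workers=4):
    """Fill the chart one span length at a time, cells of equal length in parallel."""
    n = len(words)
    chart = {(i, i + 1): set(LEXICON.get(w, ())) for i, w in enumerate(words)}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for length in range(2, n + 1):
            spans = [(i, i + length) for i in range(n - length + 1)]
            # Workers only read the chart; results are written after all finish.
            results = list(pool.map(lambda span: fill_cell(chart, *span), spans))
            chart.update(results)
    return chart


if __name__ == "__main__":
    words = "John ate the cookie on the table".split()
    print("S" in parallel_cky(words)[(0, len(words))])  # True
```

Writing results back only after each length is complete acts as a barrier between lengths, so no worker ever reads a cell that another worker is still computing.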
9 Chart Parsing
- Create a matrix where entries correspond to words/phrases
- If there is a valid CFG parse of a phrase i,j, add it to that matrix cell
- A cell i,j may only depend on other cells m,n where m < i and n < j.
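The chart idea can be sketched as a small CKY-style recognizer. This is a minimal illustration under assumptions rather than the presenters' code: the names `cky`, `BINARY_RULES`, and `LEXICON` are hypothetical, the rules `PP -> P NP`, `P -> on`, and `NP -> John` are added so the example sentence is covered, and the indexing convention (cell `(i, j)` holds the labels for `words[i:j]`, filled shortest span first) is one common choice that may differ from the one on the slide.

```python
from collections import defaultdict

# (left child, right child) -> parents; PP -> P NP is an assumption.
BINARY_RULES = {
    ("NP", "VP"): {"S"},
    ("Det", "N"): {"NP"},
    ("NP", "PP"): {"NP"},
    ("VP", "PP"): {"VP"},
    ("V", "NP"): {"VP"},
    ("P", "NP"): {"PP"},
}
# "on" and "John" are assumptions added to cover the example sentence.
LEXICON = {
    "cookie": {"N"}, "table": {"N"}, "the": {"Det"},
    "ate": {"V"}, "on": {"P"}, "John": {"NP"},
}


def cky(words):
    """Return a chart mapping (i, j) to the set of labels that span words[i:j]."""
    n = len(words)
    chart = defaultdict(set)
    for i, word in enumerate(words):          # single words first
        chart[(i, i + 1)] |= LEXICON.get(word, set())
    for length in range(2, n + 1):            # then longer and longer phrases
        for i in range(n - length + 1):
            j = i + length
            for k in range(i + 1, j):         # every way to split the phrase in two
                for left in chart[(i, k)]:
                    for right in chart[(k, j)]:
                        chart[(i, j)] |= BINARY_RULES.get((left, right), set())
    return chart


if __name__ == "__main__":
    words = "John ate the cookie on the table".split()
    chart = cky(words)
    print(sorted(chart[(0, len(words))]))  # ['S']
    print(sorted(chart[(2, 7)]))           # ['NP']  "the cookie on the table"
```

A sentence is accepted when the top-level label S appears in the cell that spans every word.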
10 John ate the cookie on the table
John
ate
the
cookie
on
the
table
28 Other Tools
- Considering parallelizing other NLP tools
- Word-stemming: multiple finite state automata applied to a single word in parallel
- Automated part-of-speech recognition on large corpora
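As a loose sketch of the word-stemming idea, each suffix rule below is a regular expression (which corresponds to a finite state automaton), and all rules are applied to the same word concurrently. Everything here is hypothetical: the rule set `SUFFIX_RULES`, the helper names, and the tie-break of keeping the shortest candidate are illustrative assumptions, not a real stemmer and not the authors' design. Threads are used only to show the structure. CPython threads will not speed up this kind of work.

```python
import re
from concurrent.futures import ThreadPoolExecutor

# Hypothetical suffix rules: (pattern, replacement). Not a complete stemmer.
SUFFIX_RULES = [
    (re.compile(r"(.+)ies$"), r"\1y"),
    (re.compile(r"(.+)ing$"), r"\1"),
    (re.compile(r"(.+)ed$"), r"\1"),
    (re.compile(r"(.+[^s])s$"), r"\1"),
]


def apply_rule(rule, word):
    """Run one automaton on the word; return the stem, or None if it rejects."""
    pattern, replacement = rule
    if pattern.match(word):
        return pattern.sub(replacement, word)
    return None


def stem(word, workers=4):
    """Apply every rule to the same word concurrently and keep the shortest stem."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        candidates = list(pool.map(lambda rule: apply_rule(rule, word), SUFFIX_RULES))
    matches = [c for c in candidates if c is not None]
    return min(matches, key=len) if matches else word


if __name__ == "__main__":
    for w in ["tables", "parsing", "parsed", "ate"]:
        print(w, "->", stem(w))
```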