Declarative Learning Models for Natural Language Processing

1
Declarative Learning Models for Natural Language Processing
  • Aria Haghighi
  • 12/08/2006

2
Overview
  • Need Quick NLP System Deployment
  • New languages
  • New domains
  • Typical User
  • Sophisticated engineer
  • Little statistical expertise
  • No time to label data!

3
Overview
Annotated Data
Unlabeled Data
Prototype List

4
Sequence Modeling Tasks
Information Extraction: Classified Ads
Newly remodeled 2 Bdrms/1 Bath, spacious upper
unit, located in Hilltop Mall area. Walking
distance to shopping, public transportation,
schools and park. Paid water and garbage. No dogs
allowed.
Prototype List

5
Sequence Modeling Tasks
English POS
Newly remodeled 2 Bdrms/1 Bath, spacious upper
unit, located in Hilltop Mall area. Walking
distance to shopping, public transportation,
schools and park. Paid water and garbage. No dogs
allowed.
Prototype List

6
Generalizing Prototypes
a witness reported
said
the
president
  • Tie each word to its most similar prototype

7
Generalizing Prototypes
reported ∧ VBD
suffix-2=ed ∧ VBD
sim=said ∧ VBD
Weights
  • reported ∧ VBD: 0.35
  • suffix-2=ed ∧ VBD: 0.23
  • sim=said ∧ VBD: 0.35

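The slide's three weights can be combined into a single score for tagging "reported" as VBD. The sketch below assumes a log-linear model in which the unnormalized score is the sum of the weights of the features that fire; the slides do not spell out the model, so the `score` function and the `WEIGHTS` table layout are illustrative assumptions.

```python
# Weights copied from the slide; the (feature, tag) dictionary layout is an assumption.
WEIGHTS = {
    ("reported", "VBD"): 0.35,
    ("suffix-2=ed", "VBD"): 0.23,
    ("sim=said", "VBD"): 0.35,
}


def score(active_features, tag, weights=WEIGHTS):
    """Sum the weights of the (feature, tag) pairs that fire; unseen pairs add 0."""
    return sum(weights.get((feature, tag), 0.0) for feature in active_features)


# Features active on the token "reported" in the slide's example.
features = ["reported", "suffix-2=ed", "sim=said"]
vbd_score = score(features, "VBD")  # all three weights apply
nn_score = score(features, "NN")    # no weights listed for NN, so the score is 0
```

Under this assumption the word identity, the suffix, and the similarity to the prototype "said" each contribute evidence for VBD independently.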
8
English POS Experiments
  • Data
  • 193K tokens (about 8K sentences) of the WSJ portion of the Penn Treebank
  • Features (Smith & Eisner 05)
  • Trigram tagger
  • Word type, suffixes up to length 3, contains hyphen, contains digit, initial capitalization
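The word-level features listed above could be extracted roughly as follows. This is a minimal sketch: the function name `word_features` and the feature-string formats are hypothetical, not taken from the talk.

```python
def word_features(word, max_suffix=3):
    """Return the slide's word-level features as strings (naming is illustrative)."""
    feats = [f"word={word}"]
    # Suffixes up to length 3, skipping suffixes that would cover the whole word.
    for n in range(1, max_suffix + 1):
        if len(word) > n:
            feats.append(f"suffix-{n}={word[-n:]}")
    if "-" in word:
        feats.append("has-hyphen")
    if any(ch.isdigit() for ch in word):
        feats.append("has-digit")
    if word[:1].isupper():
        feats.append("init-cap")
    return feats


reported_feats = word_features("reported")
mixed_feats = word_features("Mid-2006")
```

For "reported" this yields the word identity plus the suffixes "d", "ed", and "ted"; for "Mid-2006" the hyphen, digit, and capitalization indicators also fire.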

9
English POS Experiments
BASE
  • Fully Unsupervised
  • Random initialization
  • Greedy label remapping
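A fully unsupervised tagger produces anonymous cluster ids, so they must be mapped to gold tags before accuracy can be computed. The slide does not define "greedy label remapping"; the sketch below assumes one common variant, a one-to-one mapping built by repeatedly taking the most frequent unassigned (cluster, tag) pair. The function names are hypothetical.

```python
from collections import Counter


def greedy_remap(induced, gold):
    """Greedily build a one-to-one map from induced cluster ids to gold tags."""
    counts = Counter(zip(induced, gold))
    mapping, used_tags = {}, set()
    # Visit pairs from most to least frequent; ties are broken by the pair itself.
    for (cluster, tag), _ in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
        if cluster not in mapping and tag not in used_tags:
            mapping[cluster] = tag
            used_tags.add(tag)
    return mapping


def accuracy(induced, gold, mapping):
    """Fraction of tokens whose remapped cluster equals the gold tag."""
    hits = sum(mapping.get(c) == g for c, g in zip(induced, gold))
    return hits / len(gold)


induced = [0, 0, 1, 1, 1, 2]
gold = ["DT", "DT", "NN", "NN", "VBD", "VBD"]
mapping = greedy_remap(induced, gold)
acc = accuracy(induced, gold, mapping)
```

In this toy example cluster 1 is mapped to NN, so the one VBD token it contains counts as an error.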

10
English POS Experiments
  • Prototype List
  • 3 prototypes per tag
  • Automatically extracted by frequency
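Extracting prototypes by frequency can be sketched as taking the most frequent words observed with each tag in some tagged sample. The slide gives no further detail, so the lowercasing, the function name `extract_prototypes`, and the toy data are assumptions.

```python
from collections import Counter, defaultdict


def extract_prototypes(tagged_tokens, per_tag=3):
    """Return the per_tag most frequent word types for every tag."""
    counts = defaultdict(Counter)
    for word, tag in tagged_tokens:
        counts[tag][word.lower()] += 1
    return {tag: [w for w, _ in c.most_common(per_tag)] for tag, c in counts.items()}


# Toy tagged sample, invented for illustration only.
tagged = (
    [("was", "VBD")] * 4
    + [("said", "VBD")] * 3
    + [("reported", "VBD")] * 2
    + [("rose", "VBD")]
    + [("the", "DT")] * 2
    + [("a", "DT")]
)
prototypes = extract_prototypes(tagged)
```

A tag with fewer than three distinct words, such as DT in the toy sample, simply gets a shorter list.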

11
English POS Distributional Similarity
  • Judge a word by the company it keeps
  • the president said a downturn is near
  • Collect context counts from 40M words of WSJ
  • Similarity (Schuetze 93)
  • SVD dimensionality reduction
  • cos(θ) similarity measure
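The context-count and cosine steps can be illustrated on a toy corpus. This sketch deliberately skips the SVD dimensionality reduction and compares raw count vectors; the one-word left/right context window, the function names, and the sentences other than the slide's own example are assumptions.

```python
import math
from collections import Counter, defaultdict


def context_vectors(sentences):
    """Count the immediate left and right neighbors of every word."""
    vectors = defaultdict(Counter)
    for sent in sentences:
        padded = ["<s>"] + sent + ["</s>"]
        for i in range(1, len(padded) - 1):
            vectors[padded[i]][("L", padded[i - 1])] += 1
            vectors[padded[i]][("R", padded[i + 1])] += 1
    return vectors


def cosine(u, v):
    """Cosine of the angle between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm = math.sqrt(sum(x * x for x in u.values())) * math.sqrt(
        sum(x * x for x in v.values())
    )
    return dot / norm if norm else 0.0


sentences = [
    "the president said a downturn is near".split(),
    "the witness reported a downturn".split(),
    "the president reported a recovery".split(),
]
vecs = context_vectors(sentences)
sim_said_reported = cosine(vecs["said"], vecs["reported"])
sim_said_the = cosine(vecs["said"], vecs["the"])
```

Because "said" and "reported" share the neighbors "president" and "a", they come out far more similar to each other than "said" is to "the".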

12
English POS Experiments
  • Add similarity features
  • Top five most similar prototypes that exceed threshold
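Turning similarities into features might look like the sketch below: each word receives a `sim=<prototype>` feature for at most five prototypes whose similarity exceeds a threshold. The threshold value is not given on the slide, so `0.5` is a placeholder, and the toy similarity table stands in for real distributional similarities.

```python
def similarity_features(word, prototypes, sim, top_k=5, threshold=0.5):
    """Return sim=<prototype> features for the top_k prototypes above threshold."""
    scored = [(sim(word, p), p) for p in prototypes]
    above = [(s, p) for s, p in scored if s > threshold]
    # Highest similarity first; ties are broken alphabetically for determinism.
    above.sort(key=lambda sp: (-sp[0], sp[1]))
    return [f"sim={p}" for _, p in above[:top_k]]


# Hypothetical similarities between "reported" and seven prototype words.
toy_sim = {
    "said": 0.9, "was": 0.7, "rose": 0.6, "had": 0.55,
    "is": 0.52, "fell": 0.51, "the": 0.1,
}
feats = similarity_features("reported", list(toy_sim), lambda w, p: toy_sim[p])
```

Six prototypes clear the placeholder threshold here, but only the five most similar are kept.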

PROTOSIM
  • 67.8% accuracy on non-prototype words

13
English POS Transition Counts

14
Classified Ads Experiments
  • Data
  • 100 ads (about 119K tokens) from Grenager et al. 05
  • Features
  • Trigram tagger
  • Word type

15
Classified Ads Experiments
BASE
  • Fully Unsupervised
  • Random initialization
  • Greedy label remapping

16
Classified Ads Experiments
  • Prototype List
  • 3 prototypes per tag
  • 33 words in total
  • Automatically extracted by frequency