Title: Unsupervised Learning for Natural Language Processing
1. Unsupervised Learning for Natural Language Processing
- Dan Klein
- Computer Science Division
- University of California, Berkeley
2. Learning Language
[Figure: supervised NLP vs. unsupervised NLP]
3. Unsupervised NLP
- Goal: induce linguistic structure not annotated in the data
- Problem characteristics
  - Complex linguistic phenomena
  - Rich, interacting, combinatorial structures
  - Lots of data
- Solution characteristics
  - Incremental / hierarchical learning
  - Careful choice of what to model
  - Careful choice of what not to model
4. Outline
- Unsupervised Grammar Refinement
- Unsupervised Coreference Resolution
- Unsupervised Translation Mining
5. Syntactic Analysis
Hurricane Emily howled toward Mexico's Caribbean coast on Sunday, packing 135 mph winds and torrential rain and causing panic in Cancun, where frightened tourists squeezed into musty shelters.
6. Treebank PCFGs
Charniak 96
- Use PCFGs for broad coverage parsing
- Can take a grammar right off the trees (doesn't work well; see the sketch below)
ROOT → S       1
S → NP VP .    1
NP → PRP       1
VP → VBD ADJP  1
...
Model F1
Baseline 72.0
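To make "reading a grammar off the trees" concrete, here is a minimal sketch (illustrative code, not Charniak's system); trees are nested (label, children) tuples, an assumed toy representation:

```python
from collections import Counter, defaultdict

# Count each local rule (parent -> children) in a treebank, then
# normalize the counts per left-hand side to get rule probabilities.
def count_rules(tree, counts):
    label, children = tree
    rhs = tuple(c if isinstance(c, str) else c[0] for c in children)
    counts[label][rhs] += 1
    for child in children:
        if not isinstance(child, str):      # recurse into nonterminals
            count_rules(child, counts)

def treebank_pcfg(trees):
    counts = defaultdict(Counter)
    for tree in trees:
        count_rules(tree, counts)
    return {(lhs, rhs): n / sum(rules.values())
            for lhs, rules in counts.items() for rhs, n in rules.items()}

# A single tree yields probability-1 rules like S -> NP VP . , as above.
tree = ("ROOT", [("S", [("NP", [("PRP", ["He"])]),
                        ("VP", [("VBD", ["slept"])]),
                        (".", ["."])])])
print(treebank_pcfg([tree]))
```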
7. Conditional Independence?
- Not every NP expansion can fill every NP slot
- A grammar with symbols like NP won't be context-free
- Statistically, conditional independence is too strong
8. Grammar Refinement
- Refining symbols improves statistical fit
- Parent annotation Johnson 98
- Head lexicalization Collins 99, Charniak 00
- Automatic clustering Petrov and Klein 06
11. Parses and Derivations
Derivations
[Figure: one parse tree shown with several distinct derivations; each node carries a latent subsymbol index such as NP-1 or VP-2]
- Parses (T) now have multiple derivations (t)
12. Training Objectives
Matsuzaki et al. 05, Prescher 05
- One option: maximum likelihood using EM
  - Want derivation parameters which maximize parse likelihood (see the objective below)
- Other options possible
  - Variational inference Liang et al. 07
  - Conditional likelihood Petrov and Klein 08
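In symbols (standard latent-variable notation, assumed here rather than taken from the slides): with observed treebank parses T_i and derivations t that refine them, EM seeks

```latex
\theta^{*} = \arg\max_{\theta} \prod_{i} P(T_i \mid \theta)
           = \arg\max_{\theta} \prod_{i} \sum_{t \,:\, T(t) = T_i} P(t \mid \theta)
```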
13. Learning Latent Grammars
- Brackets are known
- Base categories are known
- Only induce subsymbols
Just like Forward-Backward for HMMs.
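To make the analogy concrete, here is a minimal forward-backward E-step for an HMM with split states (an illustrative NumPy sketch, not the authors' code); latent-grammar training replaces it with inside-outside over trees but collects expected counts the same way:

```python
import numpy as np

def forward_backward(init, trans, emit, obs):
    """init[s]: start probs; trans[s, t]: transition probs over split
    states; emit[s, w]: emission probs; obs: list of word ids.
    Returns per-position posteriors over states (the E-step counts)."""
    n, S = len(obs), len(init)
    alpha = np.zeros((n, S))
    beta = np.zeros((n, S))
    alpha[0] = init * emit[:, obs[0]]
    for i in range(1, n):                      # forward pass
        alpha[i] = (alpha[i - 1] @ trans) * emit[:, obs[i]]
    beta[-1] = 1.0
    for i in range(n - 2, -1, -1):             # backward pass
        beta[i] = trans @ (emit[:, obs[i + 1]] * beta[i + 1])
    gamma = alpha * beta                       # state posteriors
    return gamma / gamma.sum(axis=1, keepdims=True)
```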
14. Refinement of the DT tag
[Figure: learned subcategories of the DT tag]
16. Hierarchical Refinement
[Figure: the DT subcategories obtained by repeated binary splitting]
17. Grammar Ontogeny
[Figure: a sequence of grammars grown from an X-Bar grammar G0 to the final grammar G]
18. Hierarchical Estimation Results
Model                  F1
Flat Training          87.3
Hierarchical Training  88.4
19. Refinement of the , tag
- Splitting all categories equally is wasteful
20. Adaptive Splitting
- Want to split complex categories more
- Idea: split everything, roll back bad splits (sketched below)
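A sketch of the roll-back criterion, under the assumption (consistent with the published split-merge method) that the loss of a merge is estimated from inside/outside scores at each occurrence; the names and the weighted-average approximation are illustrative:

```python
import numpy as np

def merge_loss(in1, in2, out1, out2, w1, w2):
    """Approximate log-likelihood loss from merging subsymbols X-1, X-2.

    in*/out*: inside and outside scores at each occurrence of X in the
    training trees; w1, w2: the subsymbols' relative frequencies
    (w1 + w2 = 1)."""
    split_scores = in1 * out1 + in2 * out2         # likelihood, split kept
    merged_inside = w1 * in1 + w2 * in2            # averaged inside score
    merged_scores = merged_inside * (out1 + out2)  # likelihood if merged
    return np.sum(np.log(split_scores) - np.log(merged_scores))

# Rank candidate merges by merge_loss and roll back the cheapest ones
# (cf. the 50% merging result on the next slide).
```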
21. Adaptive Splitting Results
Model             F1
Previous          88.4
With 50% Merging  89.5
22. Number of Phrasal Subcategories
[Chart: subcategory counts per phrasal category; NP, VP, and PP receive the most subcategories, while NAC and X receive few]
25. Number of Lexical Subcategories
[Chart: subcategory counts per part-of-speech tag; NNP, JJ, NNS, and NN receive the most subcategories, while POS, TO, and , receive few]
27. Learned Lexical Clusters
- Proper Nouns (NNP)
- Personal pronouns (PRP)
NNP-14 Oct. Nov. Sept.
NNP-12 John Robert James
NNP-2 J. E. L.
NNP-1 Bush Noriega Peters
NNP-15 New San Wall
NNP-3 York Francisco Street
PRP-0 It He I
PRP-1 it he they
PRP-2 it them him
28. Learned Lexical Clusters
- Relative adverbs (RBR)
- Cardinal Numbers (CD)
RBR-0 further lower higher
RBR-1 more less More
RBR-2 earlier Earlier later
CD-7 one two Three
CD-4 1989 1990 1988
CD-11 million billion trillion
CD-0 1 50 100
CD-3 1 30 31
CD-9 78 58 34
29. Incremental Learning
[Figure: the grammar is refined incrementally from the X-Bar grammar G0 to the final grammar G]
30. Coarse-to-Fine Pruning
Charniak 98, Charniak and Johnson 05, Petrov and Klein 07
- Consider the span 5 to 12:
  coarse:          QP NP VP
  split in two:    QP1 QP2 NP1 NP2 VP1 VP2
  split in four:   QP1 QP2 QP3 QP4 NP1 NP2 NP3 NP4 VP1 VP2 VP3 VP4
  split in eight:  ...
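A minimal sketch of the pruning step (the threshold value and names are assumed, not the authors'): parse once with the coarse grammar, then let the refined grammar consider only subsymbols of coarse symbols whose span posterior survives:

```python
PRUNE_THRESHOLD = 1e-4  # assumed value

def allowed_refined_symbols(coarse_posteriors, projections):
    """coarse_posteriors: {(i, j, coarse_symbol): posterior probability}
    projections: {coarse_symbol: list of its refined subsymbols}
    Returns, per span, the refined symbols the fine pass may build."""
    allowed = {}
    for (i, j, sym), post in coarse_posteriors.items():
        if post >= PRUNE_THRESHOLD:              # keep likely spans only
            allowed.setdefault((i, j), []).extend(projections[sym])
    return allowed
```

For the span 5 to 12 above, a low coarse posterior for QP would prune QP1 through QP8 in a single step.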
31. Bracket Posteriors
32. State Drift (DT tag)
33. Projected Grammars
Petrov and Klein 07
[Figure: coarser grammars are obtained by projecting the refined grammar G back toward the X-Bar grammar G0]
34. Coarse-to-Fine Parsing
Petrov and Klein 07
[Figure: parsing proceeds through the sequence of projected grammars, from G0 up to G]
35. Final Results (Accuracy)
                                          ≤ 40 words F1   all F1
ENG  Charniak & Johnson 05 (generative)   90.1            89.6
ENG  Split / Merge                        90.6            90.1
GER  Dubey 05                             76.3            -
GER  Split / Merge                        80.8            80.1
CHN  Chiang et al. 02                     80.0            76.6
CHN  Split / Merge                        86.3            83.4
36. Nonparametric PCFGs
Liang, Petrov, Jordan, Klein 07
37. Unstructured Phone Models
Petrov, Pauls, Klein 07
[Figure: a standard hand-structured phone model vs. automatically learned splits]
Model           Error
HMM Baseline    25.1
5 Split Rounds  21.4
38. Summary
- Latent-variable grammar refinement
- Automatically learns good grammar splits
- Gives state-of-the-art parsing accuracy
- Admits very efficient parsing algorithms
- More applications beyond parsing!
39. Outline
- Unsupervised Grammar Refinement
- Unsupervised Coreference Resolution
- Unsupervised Translation Mining
40. Unsupervised Coreference
Haghighi and Klein 07
The Weir Group, whose headquarters is in the U.S., is a large specialized corporation. This power plant, which will be situated in Jiangsu, has a large generation capacity.
41. Generative Mention Models
Li et al 04, Haghighi and Klein 07
[Figure: a document's mentions generated by underlying entities]
42. Generative Mention Models
[Figure: at inference time, each mention is assigned to an entity]
43. Finite Mixture Model
[Figure: plate diagram with a global entity distribution and mention parameters for each of K entities; each mention has an entity assignment Z and a head word W]
Example: P(W | Z = Weir Group) = { Weir Group: 0.4, whose: 0.2, ... }
Z1 = Weir Group    W1 = Weir Group
Z2 = Weir Group    W2 = whose
Z3 = Weir HQ       W3 = headqrts
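A minimal collapsed-Gibbs sketch of such a finite mixture (hyperparameters and representation are assumed, not the paper's): each mention's head word W is drawn from the word distribution of its entity Z, and each Z is resampled given all other assignments:

```python
import random
from collections import Counter

K, ALPHA, BETA, V = 5, 1.0, 0.1, 1000   # entities, priors, vocab size

def gibbs_pass(words, z, word_counts, sizes):
    """words: head-word ids; z: current entity of each mention;
    word_counts[k]: Counter of words in entity k; sizes[k]: its size."""
    for i, w in enumerate(words):
        word_counts[z[i]][w] -= 1; sizes[z[i]] -= 1    # remove mention i
        scores = [(sizes[k] + ALPHA) *                 # entity prior
                  (word_counts[k][w] + BETA) / (sizes[k] + BETA * V)
                  for k in range(K)]                   # word likelihood
        r, k = random.uniform(0, sum(scores)), 0
        while r > scores[k] and k < K - 1:             # sample an entity
            r -= scores[k]; k += 1
        z[i] = k
        word_counts[k][w] += 1; sizes[k] += 1          # add mention back
```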
46. Infinite Mixture Model
[Figure: the same model with the finite entity distribution replaced by a Dirichlet process, so the number of entities is unbounded]
Z1 = Weir Group    W1 = Weir Group
Z2 = Weir Group    W2 = whose
Z3 = Weir HQ       W3 = headqrts
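The infinite version replaces the fixed K with a Chinese-restaurant-process prior. A sketch (GAMMA is an assumed concentration parameter) showing the prior only; the full model multiplies this by the mention likelihood before sampling:

```python
import random

GAMMA = 1.0  # assumed concentration parameter

def crp_prior_assign(sizes):
    """sizes: current entity sizes. Joins entity k with probability
    proportional to sizes[k]; returns len(sizes) to open a new entity."""
    weights = sizes + [GAMMA]
    r = random.uniform(0, sum(weights))
    for k, w in enumerate(weights):
        if r < w:
            return k
        r -= w
    return len(sizes)  # floating-point fallback
```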
47. Infinite Mixture Model
[Chart: MUC F1 for the infinite mixture model; its predictions on the running Weir Group example]
48. Enriching the Mention Model
[Figure: the basic mention model, Z → W]
Example: P(W | Z = Weir Group) = { Weir Group: 0.4, whose: 0.2, ... }
49. Enriching the Mention Model
[Figure: separate generation paths for non-pronouns and pronouns; pronouns are generated through entity properties]
- Type: PERS, LOC, ORG, MISC
- Number: Sing, Plural
- Gender: M, F, N
50. Enriching the Mention Model
[Figure: for pronouns, entity parameters generate type T, gender G, and number N, which generate the pronoun W through pronoun parameters]
51. Enriching the Mention Model
[Figure: the combined model, with a direct Z → W path for non-pronouns and a Z → T, G, N → W path for pronouns]
52. Enriching the Mention Model
- Mention type: Proper, Pronoun, Nominal
[Figure: the mention type M selects between the non-pronoun and pronoun generation paths]
53. Enriching the Mention Model
[Figure: the enriched mention model unrolled across a sequence of mentions]
55. Pronoun Model
[Chart: MUC F1 with the pronoun-enriched model; its predictions on the running Weir Group example]
56. Salience Model
[Figure: each entity carries an activation (e.g., entity 1: 1.0, entity 2: 0.0), tracked per mention position L and discretized into a salience value S that conditions the mention type M]
- Salience values: TOP, HIGH, MED, LOW, NONE
- Mention type: Proper, Pronoun, Nominal
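A sketch of the salience mechanism (the decay factor and bucket boundaries are assumed for illustration): mentioning an entity pushes its activation to the top while the others decay, and the activation is discretized into the salience buckets above:

```python
DECAY = 0.5  # assumed decay factor

def update_activations(activations, mentioned_entity):
    """Decay every entity's activation, then reset the mentioned one."""
    for entity in activations:
        activations[entity] *= DECAY
    activations[mentioned_entity] = 1.0

def salience_bucket(activation):
    """Discretize an activation into the slide's salience values."""
    if activation >= 1.0: return "TOP"
    if activation >= 0.5: return "HIGH"
    if activation >= 0.1: return "MED"
    if activation > 0.0:  return "LOW"
    return "NONE"
```

Pronouns then prefer entities in high-salience buckets, matching the PRONOUN/TOP step in the worked example on the next slide.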
57. Salience Model
[Figure: a worked example over three mentions]
Step  Entity  Salience  Mention type  Activations after (entity 1, entity 2)
0     -       -         -             (0.0, 0.0)
1     1       NONE      PROPER        (1.0, 0.0)
2     2       NONE      PROPER        (0.5, 1.0)
3     2       TOP       PRONOUN       -
58. Salience Model
[Figure: the full model unrolled across mentions, with per-mention variables L, Z, S, T, G, N, M, W]
59. Salience Model
60. Salience Model
[Chart: MUC F1 with the salience model; its predictions on the running Weir Group example]
61. Global Coreference Resolution
62. Global Entity Model
[Figure: entities are shared across the corpus; plate over N documents]
65. HDP Model
[Chart: MUC F1 with the HDP model; its predictions on the running Weir Group example]
66. Global Entity Resolution
67. Experiments
- MUC-6 English NWIRE (all mentions)
  - 53.6 F1  Cardie and Wagstaff 99   (unsupervised)
  - 70.3 F1  Unsup. Entity-Mention    (unsupervised)
  - 73.4 F1  McCallum and Wellner 04  (supervised)
  - 81.3 F1  Luo et al 04             (supervised)
- MUC score
68. Summary
- Fully generative unsupervised coref model
- Basic model of pronoun structure
- Sequential model of local attentional state
- HDP global coreference model ties documents
- Competitive with supervised results
- Many features not exploited
- Still lots of room to improve!
69. Outline
- Unsupervised Grammar Refinement
- Unsupervised Coreference Resolution
- Unsupervised Translation Mining
70. Standard MT Approach
Source Text
Target Text
- Trained using parallel sentences
- May not always be available
- Need (lots of) sentences
71. MT from Monotext
Source Text
Target Text
- Translation without parallel text?
- Need (lots of) sentences
Fung 95, Koehn and Knight 02, Haghighi and Klein 08
72. Task: Lexicon Induction
[Figure: a source text and a target text; the task is to match source words to target translations such as nombre]
73. Data Representation
[Figure: occurrences of the word "state" in the source text]
What are we generating?
74. Data Representation
[Figure: "state" in the source text and "estado" in the target text]
What are we generating?
75. Canonical Correlation Analysis
[Figure: words as points in a source feature space and a target feature space]
76. Canonical Correlation Analysis
[Figure: PCA projects each space independently, ignoring the matching]
77. Canonical Correlation Analysis
[Figure: CCA finds directions in the two spaces that are maximally correlated for matched pairs]
78. Canonical Correlation Analysis
Bach and Jordan 06
[Figure: matched points from both spaces are mapped into a shared canonical space]
79. Canonical Correlation Analysis
Bach and Jordan 06
[Figure: in the canonical space, a matched source-target pair maps to nearby points]
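A plain-NumPy CCA sketch (illustrative, not the paper's implementation): given feature matrices X and Y whose rows are the seed-matched source and target words, find projections that maximize the correlation of matched pairs, via whitening and an SVD:

```python
import numpy as np

def cca(X, Y, k=10, reg=1e-3):
    """Rows of X and Y are matched pairs. Returns projections A, B
    mapping words into the shared canonical space via X_new @ A."""
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    n = X.shape[0]
    Cxx = X.T @ X / n + reg * np.eye(X.shape[1])   # regularized covariances
    Cyy = Y.T @ Y / n + reg * np.eye(Y.shape[1])
    Cxy = X.T @ Y / n
    Wx = np.linalg.inv(np.linalg.cholesky(Cxx))    # whitening transforms
    Wy = np.linalg.inv(np.linalg.cholesky(Cyy))
    U, s, Vt = np.linalg.svd(Wx @ Cxy @ Wy.T)      # canonical correlations in s
    A = Wx.T @ U[:, :k]                            # source projection
    B = Wy.T @ Vt[:k].T                            # target projection
    return A, B
```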
80. Generative Model
[Figure: a partial matching between source words (estado, nombre, politica, mundo) and target words (state, world, name, nation); matched pairs are generated jointly from the canonical space]
83. Learning: EM?
- E-Step: obtain posterior over the matching
- M-Step: maximize CCA parameters
84. Learning: EM?
[Figure: posterior weights on candidate matching edges, e.g. 0.30, 0.15, 0.10; the exact posterior over complete matchings is intractable, motivating the hard EM on the next slide]
85. Inference: Hard EM
- Hard E-Step: find the best matching
- M-Step: solve CCA
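A sketch of the hard-EM loop (illustrative; reuses the cca() helper from the earlier sketch, and scipy's Hungarian-algorithm solver for the best matching):

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def hard_em(X, Y, seed_pairs, iters=10, k=10):
    """X, Y: feature matrices for all source/target words; seed_pairs:
    initial (source_index, target_index) translation pairs."""
    pairs = list(seed_pairs)
    for _ in range(iters):
        # M-step: fit CCA on the currently matched pairs
        A, B = cca(X[[i for i, _ in pairs]], Y[[j for _, j in pairs]], k=k)
        # Hard E-step: best one-to-one matching under canonical-space
        # cosine similarity
        Xc, Yc = X @ A, Y @ B
        Xc /= np.linalg.norm(Xc, axis=1, keepdims=True)
        Yc /= np.linalg.norm(Yc, axis=1, keepdims=True)
        rows, cols = linear_sum_assignment(-(Xc @ Yc.T))
        pairs = list(zip(rows, cols))
    return pairs
```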
86. Experimental Setup
- Data: 2K most frequent nouns, texts from Wikipedia
- Seed: 100 translation pairs
- Evaluation: precision and recall against a lexicon obtained from Wiktionary
- Report p0.33, precision at recall 0.33
87. Feature Experiments
[Chart: results on 4k EN-ES Wikipedia articles]
88. Feature Experiments
- MCCA: only orthographic features
[Chart: 4k EN-ES Wikipedia articles]
89. Feature Experiments
- MCCA: only context features
[Chart: precision; 4k EN-ES Wikipedia articles]
90. Feature Experiments
- MCCA: orthographic and context features
[Chart: 4k EN-ES Wikipedia articles]
91. Feature Experiments
[Charts: precision vs. recall curves for the feature configurations]
93. Corpus Variation
[Chart: 93.8 on 100k EN-ES Europarl sentences]
94. Corpus Variation
[Chart: 4k EN-ES Wikipedia articles]
95. Corpus Variation
[Chart: precision 92, 89, 68 across corpora; 100k English and Spanish Gigaword]
96. Seed Lexicon Source
- Automatic seed
- Edit distance seed Koehn and Knight 02
[Chart: 92 on 4k EN-ES Wikipedia articles]
97. Analysis
98. Analysis
Top Non-Cognates
99. Analysis
Interesting Mistakes
100. Language Variation
102. Analysis
103. Summary
- Learned bilingual lexicon from monotext
- Matching CCA model
- Possible even from unaligned corpora
- Possible for non-related languages
- High-precision, but much left to do!
104. Conclusion
- Three cases of unsupervised learning of non-trivial linguistic structure for NLP problems
  - Incremental structure learning
  - Careful control of structured training
  - Targeted modeling choices
- In some cases, unsupervised systems are competitive with supervised systems (or better!)
- Much more left to do!
105. Thank you!
107. Outline
- Latent-Variable Grammar Learning
- Unsupervised Coreference Resolution
- Unsupervised Translation Mining
- Other Unsupervised Work
108. Agreement-Based Learning
109. Weakly Supervised Learning
Newly remodeled 2 Bdrms/1 Bath, spacious upper unit, located in Hilltop Mall area. Walking distance to shopping, public transportation, schools and park. Paid water and garbage. No dogs allowed.
Prototype Lists
English POS:
  NN: president    IN: of
  VBD: said        NNS: shares
  CC: and          TO: to
  NNP: Mr.         PUNC: .
  JJ: new          CD: million
  DET: the         VBP: are
Information Extraction:
  FEATURE: kitchen, laundry
  LOCATION: near, close
  TERMS: paid, utilities
  SIZE: large, feet
  RESTRICT: cat, smoking
110. Language Evolution