Title: Kernel Methods (Cont.): Case Study on Coreference Resolution
1. Kernel Methods (Cont.): Case Study on Coreference Resolution
- Heng Ji
- hengji_at_cs.qc.cuny.edu
- Oct 2, 2009
2. Assignment 1 Analysis
- How much do linguistically intensive features help?
- Any problems with the scoring?
3. Outline
- Computing the distance between hyperplanes
- Correction from low dimension to high dimension (my apologies, I confused dimensions with features in the last class)
- Theoretical support
- More about kernel methods
- Coreference resolution
4. Linear Support Vector Machines
- Primal form
- Minimize (1/2)‖w‖² + C Σᵢ ξᵢ
- Subject to yᵢ(w·xᵢ + b) ≥ 1 − ξᵢ and ξᵢ ≥ 0, for i = 1, …, m
- To solve this, transform the primal into the dual; see the next slide
5. Linear Support Vector Machines
- Dual form
- Maximize Σᵢ αᵢ − (1/2) Σᵢ Σⱼ αᵢαⱼ yᵢyⱼ (xᵢ·xⱼ), with sums over i, j = 1, …, m
- Subject to 0 ≤ αᵢ ≤ C and Σᵢ αᵢyᵢ = 0
- w can be recovered by w = Σᵢ αᵢyᵢxᵢ
6. Characteristics of the Solution
- Many of the αᵢ are zero
- w is a linear combination of a small number of data points
- The xᵢ with non-zero αᵢ are called support vectors (SV)
- The decision boundary is determined only by the SV
- For testing with a new data point z
- Compute f(z) = Σᵢ αᵢyᵢ(xᵢ·z) + b and classify z as class 1 if the sum is positive, and class 2 otherwise
- Note: w need not be formed explicitly
7. SVM Model
- SVM model
- Bias b, a list of support vectors, and their coefficients αᵢ
- Where is w, and how does C affect the model?
- Why don't we compute w explicitly?
8. Nonlinear Support Vector Machines
- What if the decision boundary is not linear?
9. Nonlinear Support Vector Machines
- Transform data into higher dimensional space
10. Nonlinear Support Vector Machines
- A naive way
- Transform the data into a higher-dimensional space
- Compute a linear boundary function in the new feature space
- The boundary function becomes nonlinear in the original feature space → very time-consuming, though
- SVM kernel trick
- Does all of this without explicitly transforming the data into the higher-dimensional space
11. Nonlinear Support Vector Machines
12. Basic Math: Dot Product
13. An Example for φ(·) and K(·,·)
- Suppose φ(·) is given as follows
- An inner product in the feature space is
- So, if we define the kernel function as follows, there is no need to carry out φ(·) explicitly
- This use of a kernel function to avoid carrying out φ(·) explicitly is known as the kernel trick
(figure: mapping from the original feature space to the new feature space)
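The concrete φ and K on the slide were images; the snippet below uses the textbook degree-2 instance of the same idea (this particular φ is an assumption, not necessarily the one on the slide) and checks numerically that the kernel equals the inner product in feature space:

```python
# Sketch of the kernel trick with the standard degree-2 mapping:
# phi(x) = (x1^2, sqrt(2)*x1*x2, x2^2), so phi(x).phi(y) = (x.y)^2 = K(x, y).
import numpy as np

def phi(x):
    # Explicit map from 2-D input space into the 3-D feature space.
    return np.array([x[0] ** 2, np.sqrt(2) * x[0] * x[1], x[1] ** 2])

def K(x, y):
    # The same quantity computed entirely in the original 2-D space.
    return np.dot(x, y) ** 2

x = np.array([1.0, 2.0])
y = np.array([3.0, 4.0])
# Both sides equal (1*3 + 2*4)^2 = 121.
assert np.isclose(np.dot(phi(x), phi(y)), K(x, y))
```

The point of the trick: the right-hand side never constructs the 3-D vectors, yet gives the same number.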
14. Kernel Functions
- In practical use of SVM, the user specifies the kernel function; the transformation φ(·) is not explicitly stated
- Another view: the kernel function, being an inner product, is really a similarity measure between the objects
15. Examples of Kernels
- Assume we measure two features, e.g. head and dependency path dp, and we use the mapping
- Consider the function
- We can verify that
16. Polynomial and Gaussian Kernels
- K(x, y) = (x·y + 1)^p is called the polynomial kernel of degree p
- In general, using the kernel trick provides huge computational savings over explicit mapping!
- Another commonly used kernel is the Gaussian, K(x, y) = exp(−‖x − y‖²/(2σ²)) (it maps to a space whose dimensionality equals the number of training cases)
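Both kernels are one-liners in practice. A library-free sketch (the degree p and width σ defaults are illustrative assumptions, and (x·y + 1)^p is one standard form of the polynomial kernel):

```python
# Sketch: the two kernels named on the slide, as plain numpy functions.
import numpy as np

def polynomial_kernel(x, y, p=2):
    # K(x, y) = (x . y + 1)^p  -- polynomial kernel of degree p.
    return (np.dot(x, y) + 1.0) ** p

def gaussian_kernel(x, y, sigma=1.0):
    # K(x, y) = exp(-||x - y||^2 / (2 sigma^2))  -- Gaussian (RBF) kernel.
    return np.exp(-np.sum((x - y) ** 2) / (2.0 * sigma ** 2))

x = np.array([1.0, 0.0])
y = np.array([0.0, 1.0])
print(polynomial_kernel(x, y))  # (0 + 1)^2 = 1.0
print(gaussian_kernel(x, x))    # a point is maximally similar to itself: 1.0
```

Note how the Gaussian kernel behaves as a similarity measure: it is 1 at x = y and decays with distance, matching the "kernel as similarity" view from slide 14.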
17. Support Vector Machines
- Three main ideas
- Define what an optimal hyperplane is (in a way that can be identified in a computationally efficient way): maximize the margin
- Extend the above definition to non-linearly separable problems: have a penalty term for misclassifications
- Map data to a high-dimensional space where it is easier to classify with linear decision surfaces: reformulate the problem so that the data is mapped implicitly to this space
18. Modification Due to Kernel Function
- Change all inner products to kernel functions
- For training:
- Original: maximize Σᵢ αᵢ − (1/2) Σᵢ Σⱼ αᵢαⱼ yᵢyⱼ (xᵢ·xⱼ)
- With kernel function: maximize Σᵢ αᵢ − (1/2) Σᵢ Σⱼ αᵢαⱼ yᵢyⱼ K(xᵢ, xⱼ)
- (in both cases subject to 0 ≤ αᵢ ≤ C and Σᵢ αᵢyᵢ = 0, with sums over i, j = 1, …, m)
19. Modification Due to Kernel Function
- For testing, the new data point z is classified as class 1 if f ≥ 0, and as class 2 if f < 0
- Original: f(z) = Σᵢ αᵢyᵢ(xᵢ·z) + b
- With kernel function: f(z) = Σᵢ αᵢyᵢ K(xᵢ, z) + b
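Test-time classification with a kernel is a short loop over the support vectors. A sketch (the support set, α values, and bias below are made up for illustration; symbols follow the slides):

```python
# Sketch: f(z) = sum_i alpha_i * y_i * K(x_i, z) + b, then threshold at 0.
import numpy as np

def decision(z, support_X, support_y, alphas, b, kernel):
    # Sum runs over support vectors only; all other alphas are zero.
    return sum(a * yi * kernel(xi, z)
               for a, yi, xi in zip(alphas, support_y, support_X)) + b

def classify(z, support_X, support_y, alphas, b, kernel):
    # Class 1 if f(z) >= 0, class 2 otherwise.
    return 1 if decision(z, support_X, support_y, alphas, b, kernel) >= 0 else 2

# Toy support set (hypothetical values, for illustration only).
support_X = [np.array([1.0, 1.0]), np.array([-1.0, -1.0])]
support_y = [1, -1]
alphas = [0.5, 0.5]
b = 0.0
linear = np.dot  # swapping in a Gaussian kernel changes nothing else

print(classify(np.array([2.0, 2.0]), support_X, support_y, alphas, b, linear))   # class 1
print(classify(np.array([-2.0, -2.0]), support_X, support_y, alphas, b, linear)) # class 2
```

The kernel is passed in as a function, which makes the "change all inner products to kernel functions" modification a one-argument swap.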
20. The Kernel Trick
- φ(xᵢ)·φ(xⱼ) means: map the data into the new space, then take the inner product of the new vectors
- We can find a function such that K(xᵢ, xⱼ) = φ(xᵢ)·φ(xⱼ), i.e., the image of the inner product of the data is the inner product of the images of the data
- Then we do not need to explicitly map the data into the high-dimensional space to solve the optimization problem (for training)
21. Terminology: Mention
The American Medical Association voted yesterday to install the heir apparent as its president-elect, rejecting a strong, upstart challenge by a District doctor who argued that the nation's largest physicians' group needs stronger ethics and new leadership. In electing Thomas R. Reardon, an Oregon general practitioner who had been the chairman of its board, members signified they did not hold him responsible for a costly gaffe last year, when the group agreed to endorse a line of Sunbeam Corp. health care products. Reardon had become chairman
Mention types: NAME, NOMINAL, PRONOUN
22. Terminology: Entity
- Entity: physical object (overloaded: set of mentions)
- Coreference resolution: from mentions to entities
The American Medical Association voted yesterday to install the heir apparent as its president-elect, rejecting a strong, upstart challenge by a District doctor who argued that the nation's largest physicians' group needs stronger ethics and new leadership. In electing Thomas R. Reardon, an Oregon general practitioner who had been the chairman of its board, members signified they did not hold him responsible for a costly gaffe last year, when the group agreed to endorse a line of Sunbeam Corp. health care products. Reardon had become chairman
Entities: AMA; Reardon
23. Coreference Resolution: From Mentions to Entities
- Input
- "A large population is waiting for Bill Gates and Microsoft to release MS two thousand, the millennium virus fixer. Alan Simpson, the senior scientist, has been involved for years in the creation of global communications networks. He says Americans must realize the threat posed by the so-called millennium bug is very real and very serious."
- Output
24. Binary Classification for Coreference
Candidate antecedents for the mention "the senior scientist":
- population: Corefer(Mi, Mj)? No
- Bill Gates: Corefer(Mi, Mj)? No
- Microsoft: Corefer(Mi, Mj)? No
- Alan Simpson: Corefer(Mi, Mj)? Yes
A large population is waiting for Bill Gates and Microsoft to release MS two thousand, the millennium virus fixer. Alan Simpson, the senior scientist, has been involved for years in the creation of global communications networks.
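The pairwise setup on this slide can be sketched as code. The data format below (mention text plus a gold entity id) is an assumption for illustration; the entity assignments follow the slide's Yes/No decisions:

```python
# Sketch: turn gold coreference chains into binary training pairs --
# each mention pair (Mi, Mj) becomes one instance, labelled Yes when
# both mentions belong to the same entity.
from itertools import combinations

# Mentions from the slide's example, tagged with a hypothetical entity id.
mentions = [
    ("A large population",   "e1"),
    ("Bill Gates",           "e2"),
    ("Microsoft",            "e3"),
    ("Alan Simpson",         "e4"),
    ("the senior scientist", "e4"),  # appositive of Alan Simpson
]

pairs = [((mi, mj), ei == ej)
         for (mi, ei), (mj, ej) in combinations(mentions, 2)]

for (mi, mj), label in pairs:
    print(f"Corefer({mi!r}, {mj!r})? {'Yes' if label else 'No'}")
```

With five mentions this yields ten instances, only one of them positive, which previews the class-imbalance issue that mention-pair models face.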
25. Model Training
- Solution 1: mention-pair model
- Solution 2: entity-mention model
26. Basic Features
- Basic feature groups
- Lexical: exact match, partial match, acronym, edit distance
- Syntactic: apposition, POS tags
- Distance: word distance, sentence distance
- Count: how many times a phrase is seen in the document
- Mention information: spelling, level, type
- Pronoun attributes: gender, number, possessiveness, reflexivity
- Conjunction features
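The lexical group is easy to make concrete. A sketch (hypothetical helper functions, not the lecture's actual feature extractor) computing exact match, partial match, acronym, and edit distance for a mention pair:

```python
# Sketch: "Lexical" basic features for a mention pair.
def edit_distance(a, b):
    # Classic dynamic-programming Levenshtein distance.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,            # deletion
                           cur[j - 1] + 1,         # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]

def lexical_features(m1, m2):
    a, b = m1.lower(), m2.lower()
    return {
        "exact_match":   a == b,
        "partial_match": a in b or b in a,
        # Acronym test: initial letters of one mention's words spell the other.
        "acronym": (m1 == "".join(w[0] for w in m2.split()).upper()
                    or m2 == "".join(w[0] for w in m1.split()).upper()),
        "edit_distance": edit_distance(a, b),
    }

feats = lexical_features("AMA", "American Medical Association")
print(feats)  # the acronym feature fires for this pair
```

Real systems would bin the edit distance and combine these with the other groups (the "conjunction features" bullet), but the shape of the featurization is the same.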
27. More Advanced Features
- Entity-level
- Gender
- Number
- Syntactic features
- Definiteness
- Same-NP test
- Functional tag
- Hobbs distance
- C-command/Governing category
- Dependency structure
28. Entity-level Features
- Gender
- 5 values: UNK, MALE, FEM, NEUT, CONF
- A set of rules for gender propagation
- (e.g., (ent.MALE, m.FEM) → ent.CONF)
- CONF is sticky
- Number
- 4 values: UNK, SING, PL, CONF
- Rules to set values
- Features: (ent_gen=x, ment_gen=y)
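The propagation rules can be sketched as a small state machine. Only the (MALE, FEM) → CONF rule and the stickiness of CONF are given on the slide; the remaining cases below are plausible assumptions that follow the same pattern:

```python
# Sketch: gender propagation for an entity as mentions are added.
UNK, MALE, FEM, NEUT, CONF = "UNK", "MALE", "FEM", "NEUT", "CONF"

def propagate_gender(entity_gender, mention_gender):
    if entity_gender == CONF:      # CONF is sticky: once conflicted, always conflicted.
        return CONF
    if entity_gender == UNK:       # first informative mention sets the value (assumed rule).
        return mention_gender
    if mention_gender in (UNK, entity_gender):
        return entity_gender       # uninformative or agreeing mention: no change (assumed rule).
    return CONF                    # disagreement, e.g. (ent.MALE, m.FEM) -> ent.CONF

assert propagate_gender(MALE, FEM) == CONF   # the rule from the slide
assert propagate_gender(CONF, MALE) == CONF  # CONF is sticky
assert propagate_gender(UNK, FEM) == FEM
```

The same machinery, with SING/PL in place of the gender values, gives the 4-valued number feature.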