Kernel Methods (Cont.): Case Study on Coreference Resolution


1
Kernel Methods (Cont.): Case Study on Coreference Resolution
  • Heng Ji
  • hengji_at_cs.qc.cuny.edu
  • Oct 2, 2009

2
Assignment 1 Analysis
  • How much do linguistically intensive features help?
  • Any problems with the scoring?

3
Outline
  • Compute the distance between hyperplanes
  • Correction: mapping from low dimension to
    high dimension (my apologies, I confused
    dimensions with features in the last class)
  • Theoretical Support
  • More about Kernel methods
  • Coreference Resolution

4
Linear Support Vector Machines
  • Primal form
  • Minimize
  • Subject to
  • To solve this, transform the primal to the dual
    (see the next slide)
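
For reference, the primal is usually written as below. This is the standard soft-margin form; which variant the slide displayed is an assumption, as are the symbols (m training points, slack variables \xi_i, penalty C). Dropping the \xi_i terms gives the hard-margin form.

```latex
\begin{aligned}
\min_{w,\,b,\,\xi}\quad & \tfrac{1}{2}\lVert w\rVert^{2} + C\sum_{i=1}^{m}\xi_i \\
\text{subject to}\quad & y_i\,(w^{\top}x_i + b) \ge 1 - \xi_i,\qquad \xi_i \ge 0,\qquad i = 1,\dots,m
\end{aligned}
```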

5
Linear Support Vector Machines
  • Dual form
  • w can be recovered by $w = \sum_{i} \alpha_i y_i x_i$
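
For reference, the dual of a linear SVM is usually written as below. This is the standard soft-margin dual and is an assumption about what the slide showed; with a hard margin the upper bound C on \alpha_i disappears.

```latex
\begin{aligned}
\max_{\alpha}\quad & \sum_{i=1}^{m}\alpha_i
  - \tfrac{1}{2}\sum_{i=1}^{m}\sum_{j=1}^{m}\alpha_i\alpha_j\,y_i y_j\,x_i^{\top}x_j \\
\text{subject to}\quad & 0 \le \alpha_i \le C,\qquad \sum_{i=1}^{m}\alpha_i y_i = 0
\end{aligned}
```

The data enter this problem only through the inner products x_i^T x_j.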

6
Characteristics of the Solution
  • Many of the $\alpha_i$ are zero
  • w is a linear combination of a small number of
    data points
  • $x_i$ with non-zero $\alpha_i$ are called support vectors
    (SV)
  • The decision boundary is determined only by the
    SV
  • For testing with a new data point z
  • Compute $w^{\top}z + b = \sum_{i} \alpha_i y_i (x_i^{\top}z) + b$ and
    classify z as class 1 if the sum is positive, and
    class 2 otherwise
  • Note: w need not be formed explicitly
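
The test-time rule above can be sketched in a few lines of Python. This is a minimal illustration, not a trained model: the support vectors, labels, coefficients, and bias are made-up toy values, and the function names are hypothetical.

```python
# Sketch: classify a new point z from the support vectors alone,
# without ever forming w. All numbers are toy values for illustration.

def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def decision_value(z, support_vectors, labels, alphas, b):
    """Return sum_i alpha_i * y_i * (x_i . z) + b over the support vectors."""
    return sum(a * y * dot(x, z)
               for x, y, a in zip(support_vectors, labels, alphas)) + b

def classify(z, support_vectors, labels, alphas, b):
    """Class 1 if the sum is positive, class 2 otherwise."""
    return 1 if decision_value(z, support_vectors, labels, alphas, b) > 0 else 2

# Toy model: two support vectors on either side of the line x1 = 0.
svs = [(1.0, 0.0), (-1.0, 0.0)]
ys = [+1, -1]
alphas = [0.5, 0.5]
b = 0.0

print(classify((2.0, 3.0), svs, ys, alphas, b))
print(classify((-0.5, 1.0), svs, ys, alphas, b))
```

Only the points with non-zero coefficients appear in the sum, which is why storing the support vectors is enough.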

7
SVM model
  • SVM model
  • Bias b, a list of support vectors and their
    coefficients $\alpha$
  • Where is ?, and how does C affect the model?
  • Why don't we compute w explicitly?

8
Nonlinear Support Vector Machines
  • What if decision boundary is not linear?

9
Nonlinear Support Vector Machines
  • Transform data into higher dimensional space

10
Nonlinear Support Vector Machines
  • A naive way
  • Transform data into higher dimensional space
  • Compute a linear boundary function in the new
    feature space
  • The boundary function becomes nonlinear in the
    original feature space -> very time consuming,
    though.
  • SVM kernel trick
  • Does all of this without explicitly transforming
    data into higher dimensional space

11
Nonlinear Support Vector Machines
  • Dual form

12
Basic Math: Dot Product

13
An Example for f(.) and K(.,.)
  • Suppose f(.) is given as follows
  • An inner product in the feature space is
  • So, if we define the kernel function as follows,
    there is no need to carry out f(.) explicitly
  • This use of kernel function to avoid carrying out
    f(.) explicitly is known as the kernel trick

[Figure: f(.) maps points from the original feature space to the new feature space]
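
A small numeric check makes the kernel trick concrete. The particular mapping below is the common textbook degree-2 example for 2-D inputs and is an assumption here, since the slide's own formulas are not part of this transcript; the function names f and kernel are chosen to mirror f(.) and K(.,.).

```python
import math

def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def f(x):
    """Explicit degree-2 feature map for a 2-D input (assumed example)."""
    x1, x2 = x
    r2 = math.sqrt(2.0)
    return (1.0, r2 * x1, r2 * x2, x1 * x1, x2 * x2, r2 * x1 * x2)

def kernel(x, y):
    """K(x, y) = (1 + x . y)^2, computed in the original 2-D space."""
    return (1.0 + dot(x, y)) ** 2

x, y = (1.0, 2.0), (3.0, -1.0)
explicit = dot(f(x), f(y))   # inner product in the 6-D feature space
implicit = kernel(x, y)      # same quantity, no mapping carried out

print(explicit, implicit)
```

The two values agree up to floating-point rounding, so the 6-D vectors never need to be built.
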
14
Kernel Functions
  • In practical use of SVM, the user specifies the
    kernel function; the transformation f(.) is not
    explicitly stated
  • Another view: a kernel function, being an inner
    product, is really a similarity measure between
    the objects

15
Examples of Kernels
  • Assume we measure two features, e.g. head and
    dependency path dp and we use the mapping
  • Consider the function
  • We can verify that

16
Polynomial and Gaussian Kernels
  • $K(x, z) = (x \cdot z + 1)^p$ is called the polynomial kernel of degree p.
  • In general, using the kernel trick provides huge
    computational savings over explicit mapping!
  • Another commonly used kernel is the Gaussian
    (maps to a space whose number of dimensions
    equals the number of training cases)
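
Both kernels are short to write down in code. The sketch below uses the usual textbook forms, (x . z + 1)^p for the polynomial kernel and exp(-||x - z||^2 / (2 sigma^2)) for the Gaussian; the exact parameterization, including the width parameter sigma, is an assumption.

```python
import math

def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def polynomial_kernel(x, z, p=2):
    """Polynomial kernel of degree p: (x . z + 1)^p."""
    return (dot(x, z) + 1.0) ** p

def gaussian_kernel(x, z, sigma=1.0):
    """Gaussian kernel: exp(-||x - z||^2 / (2 * sigma^2))."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))

x, z = (1.0, 2.0), (3.0, -1.0)
print(polynomial_kernel(x, z, p=2))
print(gaussian_kernel(x, z, sigma=1.0))
```

Either function costs time proportional to the input dimension, regardless of how large the implicit feature space is.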

17
Support Vector Machines
  • Three main ideas
  • Define what an optimal hyperplane is (in a way
    that can be identified in a computationally
    efficient way): maximize margin
  • Extend the above definition for non-linearly
    separable problems: have a penalty term for
    misclassifications
  • Map data to high dimensional space where it is
    easier to classify with linear decision surfaces:
    reformulate problem so that data is mapped
    implicitly to this space

18
Modification Due to Kernel Function
  • Change all inner products to kernel functions
  • For training,

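Original: the dual objective contains the inner products $x_i^{\top}x_j$
With kernel function: each $x_i^{\top}x_j$ is replaced by $K(x_i, x_j)$
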
19
Modification Due to Kernel Function
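  • For testing, the new data z is classified as
    class 1 if f ≥ 0, and as class 2 if f < 0

Original: f is computed from the inner products of z with the support vectors
With kernel function: each inner product is replaced by a kernel evaluation $K(x_i, z)$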
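
In symbols, the two versions of f are commonly written as follows. The notation (coefficients \alpha_i, labels y_i, bias b, m training points) is the standard one and is assumed here rather than taken from the slide.

```latex
\begin{aligned}
\text{Original:}\quad & f = w^{\top}z + b = \sum_{i=1}^{m}\alpha_i y_i\,x_i^{\top}z + b \\
\text{With kernel function:}\quad & f = \sum_{i=1}^{m}\alpha_i y_i\,K(x_i, z) + b
\end{aligned}
```
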
20
The Kernel Trick
  • $\phi(x_i) \cdot \phi(x_j)$ means: map data into new space,
    then take the inner product of the new vectors
  • We can find a function K such that $K(x_i \cdot x_j) =
    \phi(x_i) \cdot \phi(x_j)$, i.e., the image of the inner
    product of the data is the inner product of the
    images of the data
  • Then, we do not need to explicitly map the data
    into the high-dimensional space to solve the
    optimization problem (for training)
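
For training, this means the optimizer only needs the matrix of pairwise kernel values. A minimal sketch, assuming a degree-2 polynomial kernel and three made-up points; gram_matrix is a hypothetical helper name.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def quadratic_kernel(x, z):
    """Degree-2 polynomial kernel (1 + x . z)^2, used here as an example."""
    return (1.0 + dot(x, z)) ** 2

def gram_matrix(points, kernel):
    """K[i][j] = kernel(x_i, x_j); this replaces every x_i . x_j in training."""
    return [[kernel(xi, xj) for xj in points] for xi in points]

points = [(0.0, 1.0), (1.0, 1.0), (2.0, -1.0)]
K = gram_matrix(points, quadratic_kernel)
for row in K:
    print(row)
```

The matrix is symmetric and has one row per training point, whatever the dimension of the implicit feature space.
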

21
Terminology: Mention
The American Medical Association voted yesterday
to install the heir apparent as its
president-elect, rejecting a strong, upstart
challenge by a District doctor who argued that
the nation's largest physicians group needs
stronger ethics and new leadership. In electing
Thomas R. Reardon, an Oregon general practitioner
who had been the chairman of its board, members
signified they did not hold him responsible for a
costly gaffe last year, when the group agreed to
endorse a line of Sunbeam Corp. health care
products. Reardon had become chairman
Mention: NAME, NOMINAL, PRONOUN

22
Terminology: Entity
Entity: physical object (overloaded: set of mentions)
Coreference Resolution: From Mentions To Entities
The American Medical Association voted yesterday
to install the heir apparent as its
president-elect, rejecting a strong, upstart
challenge by a District doctor who argued that
the nation's largest physicians group needs
stronger ethics and new leadership. In electing
Thomas R. Reardon, an Oregon general practitioner
who had been the chairman of its board, members
signified they did not hold him responsible for a
costly gaffe last year, when the group agreed to
endorse a line of Sunbeam Corp. health care
products. Reardon had become chairman
Entities: AMA, Reardon

23
Coreference Resolution From Mentions to Entities
  • Input
  • A large population is waiting for Bill Gates
    and Microsoft to release MS two thousand, the
    millennium virus fixer. Alan Simpson, the senior
    scientist, has been involved for years in the
    creation of global communications networks. He
    says Americans must realize the threat posed by
    the so-called millennium bug is very real and
    very serious.
  • Output

24
Binary Classification for Coreference
[Figure: candidate mention pairs drawn from population, Bill Gates, Microsoft, Alan Simpson, and the senior scientist; each pair carries a label Corefer(Mi, Mj), three shown as No and one as Yes]
A large population is waiting for Bill Gates and
Microsoft to release MS two thousand, the
millennium virus fixer. Alan Simpson, the senior
scientist, has been involved for years in the
creation of global communications networks.
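
Casting coreference as binary classification means turning a document's mentions into labeled pairs. The sketch below does this for the mentions named on this slide. The gold entity assignment, in which only Alan Simpson and the senior scientist share an entity, is an assumption for illustration, and mention_pairs is a hypothetical helper.

```python
from itertools import combinations

# Mentions in document order (a subset of the passage above).
mentions = ["population", "Bill Gates", "Microsoft",
            "Alan Simpson", "the senior scientist"]

# Assumed gold entity ids: mentions with the same id corefer.
entity_of = {"population": 0, "Bill Gates": 1, "Microsoft": 2,
             "Alan Simpson": 3, "the senior scientist": 3}

def mention_pairs(mentions, entity_of):
    """Yield (m_i, m_j, label) for every pair with i < j in document order."""
    for mi, mj in combinations(mentions, 2):
        yield mi, mj, entity_of[mi] == entity_of[mj]

pairs = list(mention_pairs(mentions, entity_of))
positives = [(mi, mj) for mi, mj, label in pairs if label]

for mi, mj, label in pairs:
    print(f"Corefer({mi!r}, {mj!r}) = {'Yes' if label else 'No'}")
```

Each labeled pair becomes one training instance for the classifier; negatives heavily outnumber positives even in this tiny example.
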
25
Model Training
  • Start Model
  • Solution 1: mention-pair model
  • Solution 2: entity-mention model

26
Basic Features
  • Basic Feature Groups
  • - Lexical: exact match, partial match, acronym,
    edit distance
  • - Syntactic: apposition, POS tags
  • - Distance: word distance, sentence distance
  • - Count: how many times a phrase is seen in the
    document
  • - Mention information: spelling, level, type
  • - Pronoun attributes: gender, number,
    possessiveness, reflexivity
  • Conjunction features
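
A few of the lexical and distance features can be computed as sketched below. The mention representation (a dict with hypothetical keys 'text' and 'sent') and the exact definitions of partial match and acronym are assumptions; a real system would define them more carefully.

```python
def edit_distance(a, b):
    """Levenshtein distance between strings a and b (dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # delete ca
                           cur[j - 1] + 1,              # insert cb
                           prev[j - 1] + (ca != cb)))   # substitute
        prev = cur
    return prev[-1]

def is_acronym(short, long):
    """True if `short` equals the capitalized initials of the words in `long`."""
    initials = "".join(w[0] for w in long.split() if w[0].isupper())
    return short.replace(".", "") == initials

def pair_features(m_i, m_j):
    """Features for a mention pair; each mention is {'text': str, 'sent': int}."""
    a, b = m_i["text"], m_j["text"]
    words_a, words_b = set(a.lower().split()), set(b.lower().split())
    return {
        "exact_match": a.lower() == b.lower(),
        "partial_match": bool(words_a & words_b),   # any shared word
        "acronym": is_acronym(a, b) or is_acronym(b, a),
        "edit_dist": edit_distance(a.lower(), b.lower()),
        "sent_dist": abs(m_i["sent"] - m_j["sent"]),
    }

m1 = {"text": "American Medical Association", "sent": 0}
m2 = {"text": "AMA", "sent": 1}
feats = pair_features(m1, m2)
print(feats)
```

Conjunction features can then be formed by pairing these values, for example combining the acronym flag with the sentence distance.
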

27
More advanced Features
  • Entity-level
  • Gender
  • Number
  • Syntactic features
  • Definiteness
  • Same-NP test
  • Functional tag
  • Hobbs distance
  • C-command/Governing category
  • Dependency structure

28
Entity-level Features
  • Gender
  • 5 values: UNK, MALE, FEM, NEUT, CONF
  • Set of rules for gender propagation
  • (e.g., (ent.MALE, m.FEM) -> ent.CONF)
  • CONF is sticky
  • Number
  • 4 values: UNK, SING, PL, CONF
  • Rules to set values
  • Features: (ent_genx, ment_geny)
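
The propagation rules can be sketched as a small update function. Only two rules come from the slide: (ent.MALE, m.FEM) -> ent.CONF, and CONF is sticky. The remaining cases (UNK adopts the known value, agreeing values are kept, any other disagreement gives CONF) are assumptions, as are the function names.

```python
UNK, MALE, FEM, NEUT, CONF = "UNK", "MALE", "FEM", "NEUT", "CONF"

def propagate_gender(ent, mention):
    """Update an entity's gender after adding a mention with the given gender."""
    if ent == CONF:
        return CONF                 # CONF is sticky (from the slide)
    if mention == UNK:
        return ent                  # assumed: an unknown mention changes nothing
    if ent == UNK:
        return mention              # assumed: entity adopts the first known value
    return ent if ent == mention else CONF   # disagreement, e.g. MALE vs FEM

def entity_gender(mention_genders):
    """Fold the rule over an entity's mentions, starting from UNK."""
    ent = UNK
    for g in mention_genders:
        ent = propagate_gender(ent, g)
    return ent

print(entity_gender([UNK, MALE, UNK, MALE]))
print(entity_gender([MALE, FEM, MALE]))
```

The same update would work for number by swapping in the values UNK, SING, PL, and CONF.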