Chunking - PowerPoint PPT Presentation

Provided by: imsUnist

1
Chunking

2
Chunking
  • is an efficient and robust method for
    identifying short phrases in text (chunks)
  • Chunks are non-overlapping spans of text,
    containing
  • a head word (noun, proper name, adjective, ...)
  • adjacent modifiers and function words
    (adjective, determiner, preposition, ...)

3
Example
  • I begin with an intuition: when I read
    a sentence, I read it a chunk at a time
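
A minimal sketch of how noun chunks could be picked out of this sentence, assuming the words are already POS-tagged. The tag names, the hand-assigned tags, and the function `noun_chunks` are illustrative assumptions, not part of the presentation; the rule (optional function words and modifiers followed by a head noun) is a simplification of the chunk definition on the previous slide.

```python
# Hand-assigned, simplified POS tags (an assumption for illustration only).
TAGGED = [
    ("I", "PRON"), ("begin", "VERB"), ("with", "PREP"), ("an", "DET"),
    ("intuition", "NOUN"), ("when", "CONJ"), ("I", "PRON"), ("read", "VERB"),
    ("a", "DET"), ("sentence", "NOUN"), ("I", "PRON"), ("read", "VERB"),
    ("it", "PRON"), ("a", "DET"), ("chunk", "NOUN"), ("at", "PREP"),
    ("a", "DET"), ("time", "NOUN"),
]

MODIFIER_TAGS = {"PREP", "DET", "ADJ"}  # function words and modifiers
HEAD_TAGS = {"NOUN"}                    # head words of noun chunks


def noun_chunks(tagged):
    """Return non-overlapping chunks: optional modifiers followed by a head noun."""
    chunks = []
    start = None  # index where the current run of modifiers began
    for i, (_, tag) in enumerate(tagged):
        if tag in MODIFIER_TAGS:
            if start is None:
                start = i
        elif tag in HEAD_TAGS:
            begin = i if start is None else start
            chunks.append(" ".join(word for word, _ in tagged[begin:i + 1]))
            start = None
        else:
            start = None  # any other tag interrupts a pending chunk
    return chunks


print(noun_chunks(TAGGED))
# ['with an intuition', 'a sentence', 'a chunk', 'at a time']
```

The scan is a single left-to-right pass, so the running time is linear in the number of tokens.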

4
Motivation
  • Locate information
  • Extract Noun Phrases → Indexing
  • Extract NPs and Verbs → Information Extraction
  • Ignore information
  • Acquisition of subcat-information
  • gave NP, gave up NP in NP, gave NP up,
    gave NP NP, gave NP to NP

5
Definition of Chunks (Abney93)
  • major head: a content word not between a function
    word f and the word selected by f
  • root: highest node with major head as semantic
    head
  • chunk: maximal string containing a major head,
    dominated by root, not contained in other chunk

6
Problem: empty categories
  • The underlying grammar must assume
  • null determiners
  • Ø poor people forms a noun chunk
  • and empty nouns.
  • the poor Ø forms a noun chunk

7
Problem: Center-Embedding
  • Function word and semantic head may be separated
    by other noun chunks.
  • der/den mehrere Milliarden Euro hohen
    Schäden (the damage amounting to several billion euros)
  • in John's house
  • Base noun chunks may be ungrammatical.
  • die im Alter nachlassenden Kräfte
    (the strength diminishing with age)

8
Problems with Coordination
  • by Abney's definition
  • if chunks may be
    multi-headed
  • if conjunctions
    are excluded from
    chunks

9
System Overview

10
Base Noun Chunks vs. Full Noun Chunks
  • Base Noun Chunk: maximal string dominated by root
    containing noun as major head (Abney 1993)
  • Full Noun Chunk: part of NP between determiner
    and (first) head noun (Schmid and Schulte im
    Walde 2000)
  • includes names
  • the discoverer Christopher Columbus
  • but not coordinated NPs
  • parts of Scotland and Northern Ireland
  • and not appositions
  • Christopher Columbus, the famous discoverer,

11
Negative Definition of Full Noun Chunks
  • NP/PP stripped of
    - adverbials at the front and
    - PPs and relative clauses at the back
    (Brants, 1999)
  • coordinations and appositions
  • ? parts ? of Scotland and Northern Ireland
  • pre- and postnominal genitives
  • Marias Version der Geschichte
    (Maria's version of the story)
  • measure phrases: 20 Dollar Strafe
    (a 20-dollar fine)


12
Disambiguation in the recognition of full noun chunks
  • POS ambiguities are resolved by the POS tagger.
  • PP attachment ambiguities are kept underspecified
    (with respect to the positive and negative
    definition of full noun chunk).
  • All other ambiguities are resolved using the
    longest-match criterion (Abney, 1993).
  • Chunks should be as long as possible.
  • But approaches exist that deal with them explicitly,
    by underspecification, or by non-monotonicity.
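
The longest-match criterion can be illustrated with a small sketch. The pattern inventory `PATTERNS`, the tag names, and the function `longest_match_chunks` are assumptions made for illustration; they are not the grammar used in the work cited above. At each position the scanner tries every pattern and commits to the longest one that matches, which keeps the procedure deterministic.

```python
# Hypothetical chunk patterns over POS tags (not the grammar from the slides).
PATTERNS = [
    ("NOUN",),
    ("DET", "NOUN"),
    ("DET", "NOUN", "NOUN"),
    ("DET", "ADJ", "NOUN"),
]


def longest_match_chunks(tags):
    """Scan left to right; at each position take the longest matching pattern."""
    spans = []
    i = 0
    while i < len(tags):
        best = 0  # length of the longest pattern matching at position i
        for pattern in PATTERNS:
            n = len(pattern)
            if n > best and tuple(tags[i:i + n]) == pattern:
                best = n
        if best:
            spans.append((i, i + best))  # half-open span [start, end)
            i += best                    # chunks never overlap
        else:
            i += 1                       # no chunk starts here
    return spans


tags = ["DET", "NOUN", "NOUN", "VERB", "DET", "ADJ", "NOUN"]
print(longest_match_chunks(tags))
# [(0, 3), (4, 7)]
```

At position 0 both `DET NOUN` and `DET NOUN NOUN` match; the longer span wins, so no search over alternative analyses is needed.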

13
Recognizing Full Noun Chunks: explicit
representation of ambiguities
  • used in previous work on full noun chunking
    (Brants99, Schmid and Schulte im Walde00,
    Kermes and Evert02)
  • drawback: requires search
  • Parser is not deterministic any longer.
  • Linear complexity is lost.

14
Recognizing Full Noun Chunks: dealing with
ambiguities by non-monotonic cascades
  • A method retaining determinism and linear
    complexity (Schiehlen02)
  • recognize base noun chunks that could form the
    beginning, middle or end of a full noun chunk
  • discard those noun chunks (monotonicity lost!)
  • re-apply original noun chunk transducer

15
Recognizing Recursive NPs by Non-Monotonic
Cascades

16
Recognizing Recursive NPs by Underspecification
  • die/0 Ende/1 der/2 Woche/3 geplanten/4 Treffen/5
    (the meetings planned for the end of the week)
  • Underspecified Representation (à la
    Spranger05)
  • < NP, 0,1,2,3,4,5, 1,3,5, 1,3,5 >
  • Disambiguation algorithm
  • Problem: needs preprocessing of set of NPs

17
Case Checking

18
3 Approaches to Agreement Checking in FS Parsers
  • add agreement info to POS tags and compile the
    grammar out (drawback: explosion of the
    transition table)
  • postpone agreement check until after chunk
    recognition (Abney, 1997)
  • interleave agreement checking with chunking
    (Neumann et al., 2000), problems with
    subcategorizing multi-words
  • um Gottes willen (for God's sake)
  • um takes accusative, um ... willen takes genitive!

19
Online Agreement Checking
  • errors avoided
  • genitives (case mismatch)
  • in John's house
  • conjunction attachment (case mismatch)
  • das Leben von Schauspielern und Zirkusleuten
    the life(nom/acc) of actors and circus people(dat)
  • adjacent NPs (adjective declension)
  • diese beiden ähnliche Erfolge
  • those two(weak) similar(strong) successes
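
A minimal sketch of how a case check could block the wrong conjunction attachment in the example above. The mini-lexicon `LEXICON` and the functions `chunk_cases` and `can_conjoin` are assumptions for illustration and do not reproduce the parser described in the slides. Each word form carries the set of cases it can express; a chunk keeps only the cases shared by all its words, and two chunks may be conjoined only if they still share a case.

```python
ALL_CASES = {"nom", "acc", "dat", "gen"}

# Hypothetical mini-lexicon: word form -> cases the form is compatible with.
LEXICON = {
    "das": {"nom", "acc"},
    "Leben": {"nom", "acc", "dat"},
    "Schauspielern": {"dat"},
    "Zirkusleuten": {"dat"},
}


def chunk_cases(words):
    """Cases on which all words of a chunk agree (empty set means no agreement)."""
    cases = set(ALL_CASES)
    for word in words:
        cases &= LEXICON[word]
    return cases


def can_conjoin(left_chunk, right_chunk):
    """Two chunks can be coordinated only if they share at least one case."""
    return bool(chunk_cases(left_chunk) & chunk_cases(right_chunk))


print(chunk_cases(["das", "Leben"]))                    # nom and acc remain
print(can_conjoin(["Schauspielern"], ["Zirkusleuten"]))  # True
print(can_conjoin(["das", "Leben"], ["Zirkusleuten"]))   # False
```

With this check, "und Zirkusleuten" attaches to "Schauspielern" (both dative) rather than to "das Leben" (nominative or accusative).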

20
Online Agreement Checking
  • Some grammar errors become visible only with
    agreement checking.
  • N coordination is missing.
  • die nachlassenden Kräfte
    the diminishing strength
  • die Verletzungen und nachlassenden Kräfte
    the injuries and diminishing strength
    no noun chunk!

21
Experiment by Schiehlen02
  • Writing a finite-state grammar is worth the
    effort. The FS method performs better than the
    statistical method.
  • Noun chunker is not very good at determining POS
    tags.
  • Online agreement checking improves performance.
  • Shortest match is better than longest match for
    conjunction attachment.

22
Case Checking: Readings

23
Underspecification: Context Variables
  • Udo/0 1aNPnom,1bNPakk
  • kennt/1
  • eine/2
  • nette/4
  • Frau/5 1aNPakk,1bNPnom
  • aus/6
  • Rio/7 ADJ 1A5
  • ADJ 1A1

24
Literature
  • Schiehlen02. Experiments in German Noun
    Chunking. COLING 2002, Taipei, August 27th, 2002
  • Schiehlen03. A Cascaded Finite-State Parser for
    German. EACL 2003, Budapest, April 17th, 2003
  • Spranger05. Combining deterministic processing
    with ambiguity awareness: the case of German
    quantifying noun groups. PhD Thesis. University
    of Stuttgart.