Transcript and Presenter's Notes

Title: Chunk Parsing


1
Chunk Parsing
2
Chunk Parsing
  • Also called chunking, light parsing, or partial
    parsing.
  • Method: assign some additional structure to the
    input, beyond tagging
  • Used when full parsing is not feasible or not
    desirable.
  • Because of the expense of full parsing, often
    treated as a stop-gap solution.

3
Chunk Parsing
  • No rich hierarchy, as in parsing.
  • Usually one layer above tagging.
  • The process:
  • Tokenize
  • Tag
  • Chunk
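The three-stage pipeline above can be sketched in a few lines. This is a minimal illustration, not the presentation's own code: the tiny tag dictionary and the single DT-NN chunk rule are invented assumptions.

```python
# Minimal sketch of the tokenize -> tag -> chunk pipeline.
# TAGS and the DT+NN rule are illustrative assumptions.

TAGS = {"the": "DT", "cow": "NN", "barn": "NN", "in": "IN", "ate": "VBD"}

def tokenize(text):
    return text.lower().split()

def tag(tokens):
    # fall back to NN for unknown words (an assumption)
    return [(t, TAGS.get(t, "NN")) for t in tokens]

def chunk_np(tagged):
    """Group each DT NN sequence into a single NP chunk; pass
    everything else through unchanged (one layer above tagging)."""
    chunks, i = [], 0
    while i < len(tagged):
        if tagged[i][1] == "DT" and i + 1 < len(tagged) and tagged[i + 1][1] == "NN":
            chunks.append(("NP", tagged[i:i + 2]))
            i += 2
        else:
            chunks.append(tagged[i])
            i += 1
    return chunks

print(chunk_np(tag(tokenize("The cow in the barn ate"))))
```

Note how the output stays flat: NPs are grouped, but nothing is nested inside anything else.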

4
Chunk Parsing
  • Like tokenizing and tagging in a few respects:
  • Can skip over material in the input
  • Often finite-state (or finite-state like) methods
    are used (applied over tags)
  • Often application specific (i.e., the chunks
    tagged have uses for particular applications)

5
Chunk Parsing
  • Chief motivations: to find data or to ignore
    data
  • Example from Bird and Loper: find the argument
    structures for the verb give.
  • Can discover significant grammatical structures
    before developing a grammar
  • gave NP
  • gave up NP in NP
  • gave NP up
  • gave NP help
  • gave NP to NP
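Collecting frames like those above can be sketched as a simple scan over pre-chunked sentences. The input format (each NP collapsed to a single `NP` token) and the example sentences are assumptions for illustration, not data from the presentation.

```python
# Sketch: collect surface argument frames for "gave" from
# chunk-tagged sentences, in the spirit of the Bird and Loper example.

def frame(chunked_tokens, verb="gave"):
    """Return the verb plus everything following it, or None."""
    if verb not in chunked_tokens:
        return None
    i = chunked_tokens.index(verb)
    return " ".join([verb] + chunked_tokens[i + 1:])

sentences = [
    ["she", "gave", "NP"],
    ["he", "gave", "up", "NP", "in", "NP"],
    ["they", "gave", "NP", "up"],
]
print(sorted({frame(s) for s in sentences}))
```

Running this over a corpus would surface the distinct frames ("gave NP", "gave NP up", "gave up NP in NP", ...) before any grammar is written.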

6
Chunk Parsing
  • Like parsing, except:
  • It is not exhaustive, and doesn't pretend to be.
  • Structures and data can be skipped when not
    convenient or not desired
  • Structures of fixed depth are produced
  • Nested structures are typical in parsing
  • [S [NP [NP The cow] [PP in [NP the barn]]] ate]
  • Not in chunking
  • [NP The cow] in [NP the barn] ate

7
Chunk Parsing
  • Finds contiguous, non-overlapping spans of
    related text, and groups them into chunks.
  • Because contiguity is given, finite state methods
    can be adapted to chunking
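Because chunks are contiguous, a regular expression over the POS-tag sequence is enough to find them. The `word/TAG` encoding and the particular NP pattern below are assumptions for illustration:

```python
import re

# Finite-state-style NP chunking over a tag stream.
# NP = optional determiner, optional adjectives, one or more nouns.
NP = re.compile(r"((?:\S+/DT )?(?:\S+/JJ )*(?:\S+/NN[SP]* ?)+)")

tagged = "the/DT cow/NN in/IN the/DT barn/NN ate/VBD"

# Material that matches no pattern (in/IN, ate/VBD) is simply skipped.
chunks = [m.strip() for m in NP.findall(tagged + " ")]
print(chunks)
```

The preposition and the verb fall outside every match, which is exactly the "skip over material in the input" behavior described above.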

8
Longest Match
  • Abney (1995) discusses the longest-match
    heuristic
  • One automaton for each phrasal category
  • Start the automata at position i (where i = 0
    initially)
  • Winner is the automaton with the longest match

9
Longest Match
  • He took chunks from the PTB
  • NP → D N
  • NP → D Adj N
  • VP → V
  • Encoded each rule as an automaton
  • Stored the longest matching pattern (the winner)
  • If no match for a given word, skipped it (in
    other words, didn't chunk it)
  • Results: precision .92, recall .88
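A minimal sketch of the longest-match procedure, using the three rules from the slide. Matching lists of tag symbols stands in for running real automata; the tag names (D, Adj, N, V, P) follow the slide, and everything else is an assumption:

```python
# Longest-match chunking in the style of Abney (1995):
# try every rule at position i, keep the longest match,
# otherwise leave the word unchunked and move on.

RULES = [
    ("NP", ["D", "Adj", "N"]),
    ("NP", ["D", "N"]),
    ("VP", ["V"]),
]

def chunk(tags):
    i, out = 0, []
    while i < len(tags):
        best = None
        for label, rhs in RULES:
            if tags[i:i + len(rhs)] == rhs:
                if best is None or len(rhs) > len(best[1]):
                    best = (label, rhs)  # winner so far
        if best:
            out.append((best[0], tags[i:i + len(best[1])]))
            i += len(best[1])
        else:
            out.append(tags[i])  # no match: skip (don't chunk it)
            i += 1
    return out

print(chunk(["D", "Adj", "N", "V", "P", "D", "N"]))
```

At position 0 both NP rules could fire; the three-symbol rule wins because it is longer.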

10
An Application
  • Data-Driven Linguistics Ontology Development (NSF
    BCE-0411348)
  • One focus: locate linguistically annotated
    (read: tagged) text and extract linguistically
    relevant terms from the text
  • Attempt to discover meaning of the terms
  • Intended to build out content of the ontology
    (GOLD)
  • Focus on Interlinear Glossed Text (IGT)

11
An Application
  • Interlinear Glossed Text (IGT), some examples
  • (1) Afisi a-na-ph-a nsomba
  • hyenas SP-PST-kill-ASP fish
  • 'The hyenas killed the fish.'
    (Baker 1988:254)

12
An Application
  • More examples
  • (4) a. yerexa-n p'at'uhan-e bats-ets
  • child-NOM window-ACC open-AOR.3SG
  • 'The child opened the window.'
    (Megerdoomian ??)

13
An Application
(4) a. yerexa-n p'at'uhan-e bats-ets
       child-NOM window-ACC open-AOR.3SG
       'The child opened the window.' (Megerdoomian ??)
  • Problem: How do we discover the meaning of the
    linguistically salient terms, such as NOM, ACC,
    AOR, and 3SG?
  • Perhaps we can discover the meanings by examining
    the contexts in which they occur.
  • POS can be a context.
  • Problem: POS tags are rarely used in IGT
  • How do you assign POS tags to a language you know
    nothing about?
  • IGT gives us aligned text for free!

14
An Application
(4) a. yerexa-n p'at'uhan-e bats-ets
       child-NOM window-ACC open-AOR.3SG
       'The child opened the window.' (Megerdoomian ??)
        DT  NN    VBP    DT  NN
  • IGT gives us aligned text for free!
  • POS-tag the English translation
  • Align it with the glosses and language data
  • That helps. We now know that NOM and ACC attach
    to nouns, not verbs (nominal inflections)
  • And AOR and 3SG attach to verbs (verbal
    inflections)
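The projection step can be sketched as follows. The alignment pairs, the tags, and the data layout are all assumptions for illustration; in practice the alignment comes from the IGT itself:

```python
# Sketch of tag projection in IGT: POS-tag the English translation,
# then carry tags to the gloss line via word alignment.

english = [("the", "DT"), ("child", "NN"), ("opened", "VBD"),
           ("the", "DT"), ("window", "NN")]
gloss = ["child-NOM", "window-ACC", "open-AOR.3SG"]

# alignment: gloss index -> english index (assumed given by the IGT)
alignment = {0: 1, 1: 4, 2: 2}

projected = [(gloss[g], english[e][1]) for g, e in alignment.items()]
print(projected)
```

After projection, NOM and ACC are seen on NN hosts and AOR.3SG on a VBD host, which is the nominal/verbal split described above.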

15
An Application
(4) a. yerexa-n p'at'uhan-e bats-ets
       child-NOM window-ACC open-AOR.3SG
       'The child opened the window.' (Megerdoomian ??)
        DT  NN    VBP    DT  NN
  • In the LaPolla example, we know that NOM does not
    attach to nouns but to verbs. It must be some
    other kind of NOM.

16
An Application
(4) a. yerexa-n p'at'uhan-e bats-ets
       child-NOM window-ACC open-AOR.3SG
       'The child opened the window.' (Megerdoomian ??)
        DT  NN    VBP    DT  NN
  • How we tagged:
  • Globally applied the most frequent tags ("stupid
    tagger")
  • Repaired tags where context dictated a change
    (e.g., race retagged VB when preceded by TO)
  • Technique similar to Brill (1995)
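The two-step tagger can be sketched directly. The frequency counts and the single repair rule are invented assumptions built around the slide's "TO ... race" example:

```python
from collections import Counter

# Step 1: assign each word its globally most frequent tag ("stupid
# tagger"). Step 2: repair tags where context dictates a change,
# in the spirit of Brill (1995). Counts are illustrative only.

FREQ = {"race": Counter({"NN": 9, "VB": 3}), "to": Counter({"TO": 12})}

def stupid_tag(words):
    return [(w, FREQ[w].most_common(1)[0][0]) for w in words]

def repair(tagged):
    out = list(tagged)
    for i in range(1, len(out)):
        # contextual repair rule: NN immediately after TO becomes VB
        if out[i - 1][1] == "TO" and out[i][1] == "NN":
            out[i] = (out[i][0], "VB")
    return out

print(repair(stupid_tag(["to", "race"])))
```

"race" starts out NN (its most frequent tag) and is repaired to VB by the contextual rule.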

17
An Application
(4) a. yerexa-n p'at'uhan-e bats-ets
       child-NOM window-ACC open-AOR.3SG
       'The child opened the window.' (Megerdoomian ??)
        DT  NN    VBP    DT  NN
  • But can we get more information about NOM, ACC,
    etc.?
  • Can chunking tell us something more about these
    terms?
  • Yes!

18
An Application
(4) a. yerexa-n p'at'uhan-e bats-ets
       child-NOM window-ACC open-AOR.3SG
       'The child opened the window.' (Megerdoomian ??)
        DT  NN    VBP    DT  NN
  • Chunk phrases, mainly NPs
  • Since the relationship (in simple sentences)
    between NPs and verbs tells us something about
    the verb's arguments (Bird and Loper 2005)
  • We can tap this information to discover more
    about the linguistic tags

19
An Application
(4) a. yerexa-n p'at'uhan-e bats-ets
       child-NOM window-ACC open-AOR.3SG
       'The child opened the window.' (Megerdoomian ??)
        DT  NN    VBP    DT  NN
       [NP The child] [VP opened] [NP the window]
  • Apply Abney's (1995) longest-match heuristic to
    get as many chunks as possible (especially NPs)
  • Leverage English canonical SVO (NVN) order to
    identify simple argument structures
  • Use these to discover more information about the
    terms
  • Thus
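The SVO step above can be sketched as a template match over the chunk sequence. The chunk tuples mirror the slide's example; the labels and data layout are assumptions:

```python
# Sketch: in a chunked English translation, canonical SVO order
# means an NP-VP-NP sequence identifies subject, verb, and object.

def svo(chunks):
    labels = [c[0] for c in chunks]
    if labels == ["NP", "VP", "NP"]:
        return {"subject": chunks[0][1], "verb": chunks[1][1],
                "object": chunks[2][1]}
    return None  # sentence too complex for the simple template

chunks = [("NP", "the child"), ("VP", "opened"), ("NP", "the window")]
print(svo(chunks))
```

With the alignment from the earlier slides, the subject NP carries NOM and the object NP carries ACC, which sets up the conclusions on the next slide.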

20
An Application
(4) a. yerexa-n p'at'uhan-e bats-ets
       child-NOM window-ACC open-AOR.3SG
       'The child opened the window.' (Megerdoomian ??)
        DT  NN    VBP    DT  NN
       [NP The child] [VP opened] [NP the window]
  • We know that
  • NOM attaches to subject NPs; it may be a case
    marker indicating subject
  • ACC attaches to object NPs; it may be a case
    marker indicating object

21
An Application
  • What we do next: look at co-occurrence relations
    (clustering) of
  • Terms with terms
  • Host categories with terms
  • To determine more information about the terms
  • Done by building feature vectors of the various
    linguistic grammatical terms ("grams"),
    representing their contexts
  • And measuring relative distances between these
    vectors (in particular, for terms we know)
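The vector comparison can be sketched with cosine similarity. The feature names and counts below are invented purely to illustrate the clustering step; they are not data from the project:

```python
import math

# Each gram gets a context feature vector (e.g., which host
# categories and chunk positions it co-occurs with); grams are
# then compared by cosine similarity.

vectors = {
    "NOM": {"on-noun": 9, "on-verb": 1, "subject-NP": 8},
    "ACC": {"on-noun": 8, "on-verb": 0, "object-NP": 7},
    "3SG": {"on-noun": 0, "on-verb": 9},
}

def cosine(a, b):
    keys = set(a) | set(b)
    dot = sum(a.get(k, 0) * b.get(k, 0) for k in keys)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb)

print(cosine(vectors["NOM"], vectors["ACC"]))  # larger: both nominal
print(cosine(vectors["NOM"], vectors["3SG"]))  # smaller: nominal vs verbal
```

A gram of unknown meaning that lands close to a known gram's vector inherits a hypothesis about its meaning.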

22
Linguistic Gram Space