Finite State Parsing - PowerPoint PPT Presentation

Slides: 28
Provided by: Gues198
Transcript and Presenter's Notes

Title: Finite State Parsing


1
Finite State Parsing & Information Extraction
  • CMSC 35100
  • Intro to NLP
  • January 10, 2006

2
Roadmap
  • Motivation
  • Limitations & Advantages
  • Example: FASTUS
  • Finite state cascades
  • Other applications

3
Why NOT Finite State?
  • Fundamental representational limitations
  • Finite state systems can't handle arbitrary recursion
  • Unsupported phenomena: center embedding, etc.
  • Regular languages are fundamentally a strict subset of the
    context-free languages

4
Why Finite State?
  • Significant computational advantages
  • FAST!!!!
  • 10 mins vs 36 hours for 100 messages
  • Can compile rules, even CFGs, to transducers
  • Approximate CFGs, overgenerate in specific ways
  • Toolkits
  • Minimal representational limitations
  • Most recursion is actually bounded
  • Human memory practically limits depth of
    recursion
  • Unroll finite number of recursions
  • Sufficient, simple representation for many tasks
  • Information extraction, speech recognition
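One way to picture "unroll finite number of recursions": a center-embedded pattern of the form N^k V^k is not regular for unbounded k, but once the depth is capped it can be expanded into an ordinary regular expression. The sketch below is only an illustration; the function name `unroll`, the tag symbols N and V, and the depth limit of 3 are assumptions, not something taken from the slides.

```python
import re

def unroll(max_depth: int) -> str:
    """Build a regex for N^k V^k with 1 <= k <= max_depth."""
    # Depth 1 is just "N V". Each further level wraps an optional
    # inner copy: N (inner)? V. No recursion remains in the result.
    pattern = "N V"
    for _ in range(max_depth - 1):
        pattern = f"N (?:{pattern} )?V"
    return pattern

# Hypothetical cap: at most three levels of embedding.
bounded = re.compile(unroll(3))

def accepts(tags: str) -> bool:
    """True if the space-separated tag string is within the depth limit."""
    return bounded.fullmatch(tags) is not None
```

With this cap, `accepts("N N N V V V")` holds while a fourth level of embedding is rejected, which is the kind of over-restriction the slide argues is acceptable in practice.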

5
FASTUS & MUC
  • MUC: Message Understanding Conference
  • DARPA shared-task evaluation
  • Task: information extraction
  • Essentially form-filling
  • Only 10% of info relevant, no nuance
  • Joint ventures, terrorist incidents
  • Original system: deep syntax, KR, semantics
  • High precision: best in task
  • SLOW!!!! 36 hours for 100 messages

6
MUC Example
Bridgestone Sports Co. said Friday it has set up
a joint venture in Taiwan with a local concern
and a Japanese trading house to produce golf
clubs to be shipped to Japan. The joint
venture, Bridgestone Sports Taiwan Co.,
capitalized at 20 million new Taiwan dollars,
will start production in January 1990 with
production of 20,000 iron and metal wood clubs
a month.

TIE-UP-1
  Relationship: TIE-UP
  Entities: Bridgestone Sports Co., a local concern, a Japanese trading house
  Joint Venture Company: Bridgestone Sports Taiwan Co.
  Activity: ACTIVITY-1
  Amount: NT$20000000

A-1
  Activity: PRODUCTION
  Company: Bridgestone Sports Taiwan Co.
  Product: iron and metal wood clubs
  Start Date: DURING January 1990
7
Finite-State Cascade
  • Cascade of FSTs
  • Separates stages of processing
  • Initially smaller units, linguistically based
  • Later larger units, domain-specific information
  • Complex words: multi-words, proper names
  • Basic phrases: noun groups, verb groups, particles
  • Complex phrases: complex NG, VG
  • Domain events: application info
  • Merging structures: co-ref, related info
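The staged design above can be sketched as a list of functions, each consuming the previous stage's output. This is a toy illustration, not the FASTUS implementation: the two stages, the tiny `MULTIWORDS` lexicon, the tag names, and the `run_cascade` helper are all assumptions made for the example.

```python
from typing import Callable, List, Tuple

Token = Tuple[str, str]                      # (text, label)
Stage = Callable[[List[Token]], List[Token]]

# Hypothetical multiword lexicon: word pair -> label of the merged token.
MULTIWORDS = {("joint", "venture"): "N", ("set", "up"): "V"}

def complex_words(tokens: List[Token]) -> List[Token]:
    """Stage 1: merge known two-word expressions into one token."""
    out, i = [], 0
    while i < len(tokens):
        pair = tuple(t[0] for t in tokens[i:i + 2])
        if pair in MULTIWORDS:
            out.append((" ".join(pair), MULTIWORDS[pair]))
            i += 2
        else:
            out.append(tokens[i])
            i += 1
    return out

def basic_phrases(tokens: List[Token]) -> List[Token]:
    """Stage 2: group Det? Adj* N+ runs into a noun group (NG)."""
    out, i = [], 0
    while i < len(tokens):
        j = i
        if tokens[j][1] == "Det":
            j += 1
        while j < len(tokens) and tokens[j][1] == "Adj":
            j += 1
        k = j
        while k < len(tokens) and tokens[k][1] == "N":
            k += 1
        if k > j:                            # at least one head noun found
            out.append((" ".join(t[0] for t in tokens[i:k]), "NG"))
            i = k
        else:                                # no noun group starts here
            out.append(tokens[i])
            i += 1
    return out

def run_cascade(tokens: List[Token], stages: List[Stage]) -> List[Token]:
    """Feed the output of each stage into the next."""
    for stage in stages:
        tokens = stage(tokens)
    return tokens

sentence = [("it", "Pron"), ("set", "V"), ("up", "Part"),
            ("a", "Det"), ("joint", "Adj"), ("venture", "N")]
result = run_cascade(sentence, [complex_words, basic_phrases])
```

Because each stage only sees the labels produced by the stage before it, later stages can stay small and domain specific.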

8
Complex Words
  • Identifies multiwords
  • E.g. set up, trading house, joint venture
  • Company names, people, locations, etc.
  • Fixed expressions recognized with microgrammars
  • Subsequent stages can also distinguish
  • E.g. preceding appositive

9
MUC Example: Basic Phrases
Bridgestone Sports Co. said Friday it has set up
a joint venture in Taiwan with a local concern
and a Japanese trading house to produce golf
clubs to be shipped to Japan.

Company name: Bridgestone Sports Co.
Verb Group: said
Noun Group: Friday
Noun Group: it
Verb Group: has set up
Noun Group: a joint venture
Preposition: in
Location: Taiwan
Preposition: with
Noun Group: a local concern
Conjunction: and
Noun Group: a Japanese trading house
Verb Group: to produce
Noun Group: golf clubs
Verb Group: to be shipped
Preposition: to
Location: Japan
10
Noun Group Extraction
  • Noun Group: head noun + premodifiers

NG        -> Pronoun | Time-NP | Date-NP
           | (DETP) (Adjs) HdNns
           | DETP Ving HdNns
           | DETP-CP (and HdNns)
DETP      -> DETP-CP | DETP-INCP
DETP-CP   -> ({Adv-pre-num | another | {Det | Pro-Poss} ({Adv-pre-num | only (other)})}) Number
           | Q | Q-er | (the) Q-est | another
           | Det-cp | DetQ | Pro-Poss-cp
DETP-INCP -> {{Det | Pro-Poss} only | a | an | Det-incomp | Pro-Poss-incomp} (other)
           | (DETP-CP) other
11
Noun Group Extraction
  • Adjs -> AdjP ({, | (,) Conj} {AdjP | Vparticiple})*
  • AdjP -> Ordinal | ({Q-er | Q-est}) {Adj | Vparticiple}+
    | N[sing, !Time-NP] (-) Vparticiple
    | Number (-) {month | day | year} (-) old
  • HdNns -> HdNn (and HdNn)
  • HdNn -> PropN | {PreNs | PropN PreNs} N[!Time-NP]
    | PropN CommonN[!Time-NP]
  • PreNs -> PreN (and PreN2)
  • PreN -> (Adj -) Common-Sing-N
  • PreN2 -> PreN | Ordinal | Adj-noun-like
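Since none of these rules refers to itself, each nonterminal can be substituted into the rules that use it until only terminals remain, which leaves a plain regular expression. The sketch below does this for a much reduced, hypothetical fragment of the noun-group grammar; the `RULES` table, the tag names, and the helper names are assumptions for illustration and do not reproduce the full grammar above.

```python
import re

# Toy fragment: each right-hand side is a regex over space-terminated
# tags, with <Name> marking a reference to another rule.
RULES = {
    "AdjP":  r"(?:Ordinal |Adj )",
    "Adjs":  r"(?:<AdjP>(?:Conj <AdjP>)*)",
    "HdNns": r"(?:N (?:Conj N )?)",
    "NG":    r"(?:Pronoun |(?:Det )?(?:<Adjs>)?<HdNns>)",
}

def expand(symbol: str) -> str:
    """Replace rule references by their bodies, bottom-up.

    Terminates only because no rule refers to itself, directly or
    indirectly.
    """
    return re.sub(r"<(\w+)>", lambda m: expand(m.group(1)), RULES[symbol])

NG_PATTERN = re.compile(expand("NG"))

def is_noun_group(tags: list) -> bool:
    """True if the whole tag sequence forms one noun group."""
    return NG_PATTERN.fullmatch("".join(t + " " for t in tags)) is not None
```

For instance, `is_noun_group(["Det", "Adj", "Conj", "Adj", "N"])` is accepted, while `["Det", "Adj"]` is rejected because the head noun is missing.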

12
Noun Group Extraction: AdjP FSA
[Figure: automaton with states 0-3 and arcs labeled AdjP, Vparticiple, ",", "and", and ε]
13
Noun Group Extraction: Adj FSA
[Figure: automaton with states 0-9 and arcs labeled Ordinal, Q-er, Q-est, Adj, Vparticiple, Nsing!TimeNP, "-", month/day/year, "old", and ε]
14
Complex Phrases
  • Build up from basic noun and verb groups
  • Attach appositives
  • Construct measure phrases
  • Attach prepositional phrases
  • Conjoin noun phrases
  • Combine syntactic variants, modalities with
    common meaning
  • Identify domain entities and events

15
Domain Events
  • Ordered list of complex phrases
  • Drops out all other elements -> robustness
  • Transitions driven by headword and phrase type
  • E.g. company-NounGroup, Formed-PassiveVerbGroup
  • <Company> <Set-up> <Joint-Venture> with <Company>
  • <Produce> <Product>
  • Map to particular extracted units
  • E.g. Entities in set-up, ProductionProduct Type
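A pattern such as the company / set-up / joint-venture one can be read as a small automaton whose transitions fire on (head class, phrase type) pairs and which simply skips anything else. The following is a minimal sketch under that reading; the phrase tuples, the head-class labels, and the `match_tie_up` function are hypothetical and are not the FASTUS pattern language.

```python
from typing import List, Optional, Tuple

Phrase = Tuple[str, str, str]   # (text, head class, phrase type)

# Each step: (head class, phrase type, does it fill the Entities slot?)
PATTERN = [
    ("company", "NG", True),
    ("set-up", "VG", False),
    ("joint-venture", "NG", False),
    ("with", "P", False),
    ("company", "NG", True),
]

def match_tie_up(phrases: List[Phrase]) -> Optional[dict]:
    """Advance through PATTERN, ignoring phrases that do not fit."""
    entities, state = [], 0
    for text, head, ptype in phrases:
        if state == len(PATTERN):
            break
        want_head, want_type, is_entity = PATTERN[state]
        if (head, ptype) == (want_head, want_type):
            if is_entity:
                entities.append(text)
            state += 1
        # Non-matching phrases are dropped: this is the robustness step.
    if state < len(PATTERN):
        return None
    return {"Relationship": "TIE-UP", "Entities": entities}

# Simplified phrase list for part of the Bridgestone sentence.
phrases = [
    ("Bridgestone Sports Co.", "company", "NG"),
    ("has set up", "set-up", "VG"),
    ("a joint venture", "joint-venture", "NG"),
    ("in", "in", "P"),
    ("Taiwan", "location", "NG"),
    ("with", "with", "P"),
    ("a local concern", "company", "NG"),
]
tie_up = match_tie_up(phrases)
```

Here "in Taiwan" is skipped without breaking the match, and the two company noun groups end up in the Entities slot.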

16
Multi-layer Cascades
  • Finesse the recursion problem
  • Automata construction expands rules -> automata
  • AdjPs are duplicated, but no self-reference
  • AdjPs and NPs in conjunction independent
  • One level identifies base, non-recursive NGs
  • Next levels combine with
  • Measure phrases, prepositional phrases,
    conjunction
  • Limits depth of possible recursive constructs

17
More Complete FST Parsing
  • Roche 1996, 1997, etc.
  • Construct syntactic dictionary
  • S: N thinks that S; S: N kept N
  • N: John; N: Peter; N: the book
  • Convert entries to finite-state transducers
  • [S a thinks that b S] ->
    (S [N a N] <V thinks V> that [S b S] S)
  • [N John N] -> (N John N)

18
Transducer Dictionary
19
Transducer Dictionary
20
Full Transducer Dictionary
21
Transducers -> Parser
  • Transducer dictionary: union of transducers
  • T_dic = ∪_i T_i
  • Parser: repeated application of transducers
  • Repeat until output = input
  • Transduction causes no change
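The "repeat until nothing changes" loop can be sketched with regex rewrites standing in for transducers. This is a simplification: the rewrites are applied one after another rather than as a true union, and the three entries in `T_DIC`, their bracket notation, and the function names are assumptions chosen to mirror the dictionary entries shown earlier.

```python
import re

# Each entry rewrites an unanalysed [X ... X] span into an analysed
# (X ... X) span, possibly exposing new [X ... X] spans inside it.
T_DIC = [
    (re.compile(r"\[S (.+?) thinks that (.+) S\]"),
     r"(S [N \1 N] <V thinks V> that [S \2 S] S)"),
    (re.compile(r"\[S (.+?) kept (.+?) S\]"),
     r"(S [N \1 N] <V kept V> [N \2 N] S)"),
    (re.compile(r"\[N (John|Peter|the book) N\]"),
     r"(N \1 N)"),
]

def apply_dictionary(text: str) -> str:
    """One pass of every rewrite in the dictionary."""
    for pattern, replacement in T_DIC:
        text = pattern.sub(replacement, text)
    return text

def parse(sentence: str) -> str:
    """Apply the dictionary until the output equals the input."""
    current = f"[S {sentence} S]"
    while True:
        rewritten = apply_dictionary(current)
        if rewritten == current:        # fixed point: parsing is done
            return current
        current = rewritten

tree = parse("John thinks that Peter kept the book")
```

Any `[X ... X]` span left in the result marks material that no dictionary entry could analyse.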

22
Finite-State Extensions
  • Finite-State Approaches to:
  • Tree Adjoining Grammars
  • Machine translation
  • Multimodal analysis and interpretation

23
Probabilistic CFGs
24
Handling Syntactic Ambiguity
  • Natural language syntax
  • Varied, has DEGREES of acceptability
  • Ambiguous
  • Probability framework for preferences
  • Augment original context-free rules: PCFG
  • Add probabilities to transitions

NP -> N            0.2
NP -> Det N        0.65
NP -> Det Adj N    0.10
NP -> NP PP        0.05
VP -> V            0.45
VP -> V NP         0.45
VP -> V NP PP      0.10
S  -> NP VP        0.85
S  -> S conj S     0.15
PP -> P NP         1.0
25
PCFGs
  • Learning probabilities
  • Strategy 1: write (manual) CFG, then use treebank
    (collection of parse trees) to find probabilities
  • Parsing with PCFGs
  • Rank parse trees based on probability
  • Provides graceful degradation
  • Can get some parse even for unusual constructions
    - low value
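Finding rule probabilities from a treebank is usually done by relative frequency: count how often each rule is used and divide by the count of its left-hand side. The sketch below assumes trees encoded as nested tuples and a two-tree toy treebank; both the encoding and the data are made up for illustration.

```python
from collections import Counter

def rules(tree):
    """Yield (lhs, rhs) for every phrasal node of a nested-tuple tree."""
    label, *children = tree
    if children and isinstance(children[0], tuple):
        yield label, tuple(child[0] for child in children)
        for child in children:
            yield from rules(child)
    # Preterminals such as ("N", "man") are skipped: lexical rules
    # are left out of this sketch.

def estimate(treebank):
    """Relative-frequency estimate: count(A -> rhs) / count(A)."""
    rule_counts = Counter(r for tree in treebank for r in rules(tree))
    lhs_counts = Counter()
    for (lhs, _), count in rule_counts.items():
        lhs_counts[lhs] += count
    return {rule: count / lhs_counts[rule[0]]
            for rule, count in rule_counts.items()}

# Hypothetical two-sentence treebank.
toy_treebank = [
    ("S", ("NP", ("N", "I")),
          ("VP", ("V", "saw"), ("NP", ("Det", "the"), ("N", "man")))),
    ("S", ("NP", ("Det", "the"), ("N", "man")),
          ("VP", ("V", "slept"))),
]
probs = estimate(toy_treebank)
```

In this toy treebank NP -> Det N is used twice and NP -> N once, so they receive 2/3 and 1/3, and the probabilities for each left-hand side sum to 1 by construction.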

26
Parse Ambiguity
  • Two parse trees

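PP attached to the NP:
(S (NP (N I))
   (VP (V saw)
       (NP (NP (Det the) (N man))
           (PP (P with) (NP (Det the) (N duck))))))

PP attached to the VP:
(S (NP (N I))
   (VP (V saw)
       (NP (Det the) (N man))
       (PP (P with) (NP (Det the) (N duck)))))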
27
Parse Probabilities
  • T(ree),S(entence),n(ode),R(ule)
  • T1 = 0.85 × 0.2 × 0.1 × 0.65 × 1 × 0.65 ≈ 0.007
  • T2 = 0.85 × 0.2 × 0.45 × 0.05 × 0.65 × 1 × 0.65 ≈ 0.0016
  • Select T1
  • Best systems achieve 92-93% accuracy
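The tree probabilities are products of the probabilities of the rules used in each tree. A small sketch follows. The `PCFG` table pairs the rule probabilities from the grammar slide with rules in the only way that makes each left-hand side sum to 1 and agrees with the factors listed for T1 and T2; that pairing is an inference, and the tuple tree encoding and the `tree_prob` name are assumptions. Lexical probabilities are ignored, as in the slide's products.

```python
from math import prod

# Assumed rule -> probability pairing (inferred, see lead-in).
PCFG = {
    ("S", ("NP", "VP")): 0.85,
    ("S", ("S", "conj", "S")): 0.15,
    ("NP", ("N",)): 0.2,
    ("NP", ("Det", "N")): 0.65,
    ("NP", ("Det", "Adj", "N")): 0.10,
    ("NP", ("NP", "PP")): 0.05,
    ("VP", ("V",)): 0.45,
    ("VP", ("V", "NP")): 0.45,
    ("VP", ("V", "NP", "PP")): 0.10,
    ("PP", ("P", "NP")): 1.0,
}

def tree_prob(tree) -> float:
    """Product of the probabilities of all phrasal rules in the tree."""
    label, *children = tree
    if not children or isinstance(children[0], str):
        return 1.0                      # preterminal: word ignored
    rhs = tuple(child[0] for child in children)
    return PCFG[(label, rhs)] * prod(tree_prob(c) for c in children)

the_man = ("NP", ("Det", "the"), ("N", "man"))
with_the_duck = ("PP", ("P", "with"), ("NP", ("Det", "the"), ("N", "duck")))

# T1: PP attached to the VP. T2: PP attached to the object NP.
T1 = ("S", ("NP", ("N", "I")), ("VP", ("V", "saw"), the_man, with_the_duck))
T2 = ("S", ("NP", ("N", "I")), ("VP", ("V", "saw"), ("NP", the_man, with_the_duck)))

p1, p2 = tree_prob(T1), tree_prob(T2)
best = "T1" if p1 > p2 else "T2"
```

Multiplying out the listed factors gives about 0.0072 for T1 and about 0.0016 for T2, so the VP-attachment reading T1 is preferred.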