1
Constraint Based Hindi Parser
LTRC, IIIT Hyderabad
2
Introduction
  • A broad-coverage parser is crucial for
    • Indian language to Indian language (IL-IL) MT
      systems, information extraction (IE),
      co-reference resolution, etc.

3
Why Dependency?
  • Phrase structures
    • Intrinsically presume word order
    • Context Free Grammar (CFG) is not well suited
      for free-word-order languages (Shieber, 1985)
    • Particularly ill suited to Indian languages
  • Dependency structures
    • Give flexibility
    • Common structures
    • With appropriate labels, closer to semantics

4
Computational Paninian Grammar (CPG)
  • Based on Panini's grammar (500 BC)
  • Inspired by an inflectionally rich language
    (Sanskrit)
  • A dependency based analysis

5
Computational Paninian Grammar (The Basic Framework)
  • Treats a sentence as a set of modifier-modified
    relations
  • A sentence has a primary modified, or root
    (generally a verb)
  • Gives us the framework to identify these
    relations
  • Relations between noun constituents and the verb
    are called karaka
  • karakas are syntactico-semantic in nature
  • Syntactic cues help us identify the karakas

6
karta and karma karakas
  • The boy opened the lock
    • k1: karta (the boy)
    • k2: karma (the lock)
  • karta and karma usually correspond to agent and
    theme
    • But not always
  • karakas are direct participants in the activity
    denoted by the verb

Dependency tree:
  open -k1-> boy
  open -k2-> lock
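One way to make the modifier-modified view concrete is to store an analysis as labeled arcs and check that it forms a tree: a single root (generally the verb) and exactly one head for every other word. The sketch below is an illustrative Python encoding, not the parser's own data structure. `Arc` and `is_well_formed` are hypothetical names.

```python
from __future__ import annotations

from typing import NamedTuple


class Arc(NamedTuple):
    """A labeled modifier-modified relation: head (modified) -> dependent (modifier)."""
    head: str
    label: str
    dependent: str


def is_well_formed(words: list[str], arcs: list[Arc]) -> bool:
    """True if the arcs form a tree: one root, one head per other word, no cycles."""
    heads: dict[str, list[str]] = {w: [] for w in words}
    for arc in arcs:
        if arc.head not in heads or arc.dependent not in heads:
            return False  # arc mentions a word outside the sentence
        heads[arc.dependent].append(arc.head)
    if any(len(hs) > 1 for hs in heads.values()):
        return False  # some word has more than one head
    roots = [w for w, hs in heads.items() if not hs]
    if len(roots) != 1:
        return False  # need exactly one root (generally the verb)
    head_of = {w: hs[0] for w, hs in heads.items() if hs}
    for start in words:  # walk up from every word; revisiting a word means a cycle
        seen: set[str] = set()
        w = start
        while w in head_of:
            if w in seen:
                return False
            seen.add(w)
            w = head_of[w]
    return True
```

For the example above, `is_well_formed(["boy", "open", "lock"], [Arc("open", "k1", "boy"), Arc("open", "k2", "lock")])` returns True. Dropping the k2 arc leaves two headless words, so the call returns False.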
7
Basic karaka relations
  • karta: agent/doer/force
    • Relation label k1
  • karma: object/patient
    • Relation label k2
  • karana: instrument
    • Relation label k3
  • sampradaan: beneficiary
    • Relation label k4
  • apaadaan: source
    • Relation label k5
  • adhikarana: location in place/time/other
    • Relation labels k7p/k7t/k7
  • For the complete list of dependency relations,
    see Begum et al. (2008)

8
Basic karaka relations
raama phala khaataa hai (Ram eats fruit)
9
Basic karaka relations
raama chaaku se saiv kaatataa hai (Ram cuts the
apple with a knife)
10
Basic karaka relations
raama ne mohana ko pustaka dii (Ram gave a book
to Mohan)
11
Why Paninian Labels
  • Other choices for labels could be
    • Grammatical relations
      • Subject, object, etc.
      • Behavioral tests (Mohanan, 1994)
    • Thematic roles
      • Agent, patient, etc.
      • No concrete cues
      • Difficult to extract them automatically
  • Karakas can be computationally exploited
    • Syntactically grounded, semantically loaded
    • Give a level of interface between syntax and
      semantics

12
Levels of Language Analysis
  • Morphological analysis (Morph Info.)
  • Analysis in local context (POS tagging)
  • Sentence analysis (Chunking, Parsing)
  • Semantic analysis (Word sense disambiguation,
    etc.)
  • Discourse processing (Anaphora resolution,
    Informational Structure, etc.)

13
Example
  • rAma ne mohana ko puswaka xI (Ram gave a book
    to Mohan)

14
Example Parsed Output
Dependency tree:
  xI (give) -k1-> rAma
  xI (give) -k2-> puswaka (book)
  xI (give) -k4-> mohana
15
Parser
  • Two-stage strategy
    • Appropriate constraints formed
  • Stage I (intra-clausal relations)
    • Dependency relations marked
    • Relations such as k1, k2, k3, etc. for each verb
  • Stage II (inter-clausal relations and conjunct
    relations)
    • Conjuncts, relative clauses, kriya mula, etc.

16
Demand Frame for Verb
  • A demand frame or karaka frame for a verb
    indicates the demands the verb makes
  • It depends on the verb and its tense, aspect and
    modality (TAM) label.
  • A mapping is specified between karaka relations
    and vibhaktis (post-positions, suffixes).

17
Karaka Frame
  • It specifies which karakas are mandatory or
    optional for the verb and which vibhaktis
    (post-positions) each of them takes
  • Each verb belongs to a specific verb class
  • Each class has a basic karaka frame
  • Each TAM specifies a transformation rule
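A karaka frame can be pictured as a small table of demands. The sketch below is a hypothetical Python encoding of the lookup from verb class to basic frame. `Demand`, `KarakaFrame`, `VERB_CLASS`, `BASIC_FRAMES` and the class name `give_class` are illustrative assumptions. The rows for xe (give) assume that the basic frame matches the yA-transformed frame on a later slide, except that k1 takes no post-position (0), as in rAma mohana ko puswaka xewA hE.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Demand:
    karaka: str                 # e.g. "k1"
    mandatory: bool             # True = mandatory, False = optional/desirable
    vibhaktis: frozenset[str]   # acceptable post-positions; "0" = none


@dataclass(frozen=True)
class KarakaFrame:
    demands: tuple[Demand, ...]

    def demand(self, karaka: str) -> Demand | None:
        """Return the row for a karaka, or None if the frame does not demand it."""
        return next((d for d in self.demands if d.karaka == karaka), None)


# Hypothetical lexicon: each verb belongs to a class, and each class has a basic frame.
VERB_CLASS = {"xe": "give_class"}
BASIC_FRAMES = {
    "give_class": KarakaFrame((
        Demand("k1", True, frozenset({"0"})),
        Demand("k2", True, frozenset({"0", "ko"})),
        Demand("k3", False, frozenset({"se"})),
        Demand("k4", False, frozenset({"ko"})),
    )),
}


def basic_frame(verb_root: str) -> KarakaFrame:
    """Look up the basic (untransformed) frame through the verb's class."""
    return BASIC_FRAMES[VERB_CLASS[verb_root]]
```

Keeping frames per class rather than per verb means that a new verb only needs a class entry. The TAM-specific transformation is then applied on top of the returned frame.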

18
Example
  • rAma mohana ko puswaka xewA hE (Ram gives a
    book to Mohan)

Parsed dependency tree:
  xewA hE (give is) -k1-> rAma
  xewA hE (give is) -k2-> puswaka (book)
  xewA hE (give is) -k4-> mohana
19
Transformations
  • Based on the TAM of the verb
    • rAma ne mohana ko KilOnA xiyA
      (Ram gave a toy to Mohan)
    • rAma ko mohana ko KilOnA xenA padZA
      (Ram had to give a toy to Mohan)
  • The appropriate transformation is applied to
    the basic frame
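A TAM transformation rule can be sketched as a function that rewrites selected rows of the basic frame. The two rules below are assumptions read off the examples above: with the yA TAM, k1 takes ne (rAma ne ... xiyA), and with nA_padZA, k1 takes ko (rAma ko ... xenA padZA). The dict encoding and the names `BASIC_XE`, `TAM_RULES` and `transform` are illustrative and do not reflect the parser's own format.

```python
from __future__ import annotations

# Hypothetical basic frame for xe (give): karaka -> (necessity, allowed vibhaktis).
# "m" = mandatory, "d" = desirable/optional, "0" = no post-position.
BASIC_XE = {
    "k1": ("m", {"0"}),
    "k2": ("m", {"0", "ko"}),
    "k3": ("d", {"se"}),
    "k4": ("d", {"ko"}),
}

# Hypothetical transformation rules: TAM label -> {karaka: new vibhaktis}
TAM_RULES = {
    "yA": {"k1": {"ne"}},
    "nA_padZA": {"k1": {"ko"}},
}


def transform(frame: dict, tam: str) -> dict:
    """Return a new frame with the TAM's vibhakti changes applied (input is not modified)."""
    changes = TAM_RULES.get(tam, {})  # unknown TAM: frame is copied unchanged
    return {
        karaka: (necessity, set(changes.get(karaka, vibhaktis)))
        for karaka, (necessity, vibhaktis) in frame.items()
    }
```

`transform(BASIC_XE, "yA")` gives k1 the vibhakti ne and leaves k2, k3 and k4 unchanged. This agrees with the transformed frame for xe.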

20
Example
  • rAma ne mohana ko puswaka xI (Ram gave a book
    to Mohan)

21
Karaka Frame xe (give)
22
Transformation Rule yA (TAM)
23
Karaka Frame
yA TAM
rAma ne mohana ko KilOnA xiyA
Transformed frame for xe after applying the yA
transformation
arc-label  necessity  vibhakti  lextype  src-pos  arc-dir
---------  ---------  --------  -------  -------  -------
k1         m          ne        n        l        c
k2         m          0/ko      n        l        c
k3         d          se        n        l        c
k4         d          ko        n        l        c
(necessity: m = mandatory, d = desirable/optional)
Change from the basic frame: k1 vibhakti 0 -> ne
24
Parsed Output
Dependency tree:
  xI (give) -k1-> rAma
  xI (give) -k2-> puswaka (book)
  xI (give) -k4-> mohana
25
Other frames
  • Adjectives

26
Steps in Parsing
SENTENCE
  -> Morph, POS tagging, chunking
  -> Identify demand groups
  -> Load frames and transform
  -> Find candidates
  -> Apply constraints and solve
  -> Final parse
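The first two steps of this flow can be sketched in a few lines. Everything here is a toy stand-in and is not the parser's code. `chunk` attaches a following post-position (ne, ko, se) to the preceding word instead of running morphological analysis, POS tagging and chunking. `VERB_TAMS` is a hypothetical one-entry lexicon that maps an inflected verb to its root and TAM. The resulting (root, TAM) pairs are what the frame loading and transformation step would consume.

```python
from __future__ import annotations

from dataclasses import dataclass

POSTPOSITIONS = {"ne", "ko", "se"}      # toy list, not exhaustive
VERB_TAMS = {"xiyA": ("xe", "yA")}      # hypothetical: inflected verb -> (root, TAM)


@dataclass
class Chunk:
    head: str
    vibhakti: str = "0"                 # "0" = no post-position


def chunk(tokens: list[str]) -> list[Chunk]:
    """Toy chunker: attach each post-position to the preceding word."""
    chunks: list[Chunk] = []
    for tok in tokens:
        if tok in POSTPOSITIONS and chunks:
            chunks[-1].vibhakti = tok
        else:
            chunks.append(Chunk(tok))
    return chunks


def demand_groups(chunks: list[Chunk]) -> list[tuple[int, str, str]]:
    """Return (chunk index, verb root, TAM) for every chunk headed by a known verb."""
    return [(i, *VERB_TAMS[c.head]) for i, c in enumerate(chunks) if c.head in VERB_TAMS]
```

For rAma ne mohana ko KilOnA xiyA, the chunker produces four chunks, and xiyA is identified as the only demand group, with root xe and TAM yA.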
27
Example
  • rAma ne mohana ko KilOnA xiyA (Ram gave a toy
    to Mohan)

28
Identify the Demand Group, Load and Transform the
Demand Frame (DF)
  • xiyA
    • Only verb
  • Transformed frame
    • Use yA TAM info.

arc-label  necessity  vibhakti  lextype  src-pos  arc-dir
---------  ---------  --------  -------  -------  -------
k1         m          ne        n        l        c
k2         m          0/ko      n        l        c
k3         d          se        n        l        c
k4         d          ko        n        l        c
29
Candidates
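  • Candidate arcs over rAma ne mohana ko KilOnA
    xiyA _ROOT_:
    • xiyA -k1-> rAma
    • xiyA -k2-> mohana
    • xiyA -k2-> KilOnA
    • xiyA -k4-> mohana
    • _ROOT_ -main-> xiyA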
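Finding candidates amounts to matching each chunk's vibhakti against each row of the transformed frame. The sketch below assumes that chunks are already (head, vibhakti) pairs, and it ignores the lextype and src-pos columns. `FRAME_XE_YA`, `CHUNKS` and `candidates` are illustrative names. The function also adds the `main` arc from _ROOT_ to the verb, as in the candidate set above.

```python
from __future__ import annotations

# Transformed frame for xe under the yA TAM: karaka -> allowed vibhaktis ("0" = none)
FRAME_XE_YA = {"k1": {"ne"}, "k2": {"0", "ko"}, "k3": {"se"}, "k4": {"ko"}}

# rAma ne mohana ko KilOnA xiyA as (head, vibhakti) chunks
CHUNKS = [("rAma", "ne"), ("mohana", "ko"), ("KilOnA", "0"), ("xiyA", "0")]


def candidates(chunks, verb_index, frame):
    """List candidate arcs (head, label, dependent) licensed by vibhakti alone."""
    verb = chunks[verb_index][0]
    arcs = [("_ROOT_", "main", verb)]
    for i, (head, vibhakti) in enumerate(chunks):
        if i == verb_index:
            continue
        for karaka, allowed in frame.items():
            if vibhakti in allowed:
                arcs.append((verb, karaka, head))
    return arcs
```

For this sentence, the function yields five arcs: k1 to rAma, k2 to both mohana and KilOnA, k4 to mohana, and main to xiyA. mohana ko receives two candidate labels, and the constraints decide between them.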
30
Constraints
  • C1 For each of the mandatory demands in a demand
    frame for each demand group, there should be
    exactly one outgoing edge labeled by the demand
    from the demand group.
  • C2 For each of the optional demands in a demand
    frame for each demand group, there should be at
    most one outgoing edge labeled by the demand from
    the demand group.
  • C3 There should be exactly one incoming arc into
    each source group.
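To make C1 to C3 concrete, the sketch below checks a proposed set of arcs against them and reports each violation. It is an illustrative checker, not the parser's code. The demand lists hard-code the example rAma ne mohana ko KilOnA xiyA. Treating `main` as a mandatory demand of _ROOT_ follows the constraint equations given later for the complex sentence.

```python
from __future__ import annotations

from collections import Counter

# Demands of _ROOT_ and of xiyA (xe + yA TAM); source groups that must be attached
MANDATORY = {"_ROOT_": ["main"], "xiyA": ["k1", "k2"]}
OPTIONAL = {"xiyA": ["k3", "k4"]}
SOURCES = ["rAma", "mohana", "KilOnA", "xiyA"]


def violations(parse: set[tuple[str, str, str]]) -> list[str]:
    """Return one message per violated constraint; an empty list means a valid parse."""
    out_edges = Counter((head, label) for head, label, _ in parse)
    in_edges = Counter(dep for _, _, dep in parse)
    problems = []
    for group, demands in MANDATORY.items():      # C1: exactly one arc
        for k in demands:
            if out_edges[(group, k)] != 1:
                problems.append(f"C1: {group} has {out_edges[(group, k)]} '{k}' arcs")
    for group, demands in OPTIONAL.items():       # C2: at most one arc
        for k in demands:
            if out_edges[(group, k)] > 1:
                problems.append(f"C2: {group} has {out_edges[(group, k)]} '{k}' arcs")
    for src in SOURCES:                           # C3: exactly one incoming arc
        if in_edges[src] != 1:
            problems.append(f"C3: {src} has {in_edges[src]} incoming arcs")
    return problems
```

The parse {main to xiyA, k1 to rAma, k2 to KilOnA, k4 to mohana} yields an empty list. A set that labels both mohana and KilOnA as k2 and omits rAma violates C1 twice and C3 once.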

31
Constraints
  • A parse of a sentence is obtained by satisfying
    all the above constraints
  • Ambiguous sentences have multiple parses
  • Ill-formed sentences have no parse.

32
Parse - I
  • Selected arcs over rAma ne mohana ko KilOnA
    xiyA _ROOT_:
    • xiyA -k1-> rAma
    • xiyA -k2-> KilOnA
    • xiyA -k4-> mohana
    • _ROOT_ -main-> xiyA
33
Parse - I
Dependency tree:
  _ROOT_ -main-> xiyA
  xiyA -k1-> rAma
  xiyA -k2-> KilOnA
  xiyA -k4-> mohana
34
Integer Programming Constraints
  • $x_{ikj}$ represents a possible arc from word
    group i to word group j with karaka label k
  • It takes the value 1 if the solution has that
    arc and 0 otherwise; no other values are allowed.
  • The constraint rules are formulated as
    constraint equations.

35
Constraint Equations
  • C1: For each demand group i, for each of its
    mandatory demands k, the following equality
    must hold:
    $M_{ik}: \sum_{j} x_{ikj} = 1$
  • C2: For each demand group i, for each of its
    optional or desirable demands k, the following
    inequality must hold:
    $O_{ik}: \sum_{j} x_{ikj} \le 1$
  • C3: For each source group j, the following
    equality must hold:
    $S_{j}: \sum_{i,k} x_{ikj} = 1$
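The equations can be tried out without an integer programming package by enumerating every 0/1 assignment to the variables. This is only feasible for a handful of arcs. The brute-force search is a stand-in for the IP solver and is not the parser's implementation. The variables are keyed (i, k, j) as in $x_{ikj}$, and the data are the candidate arcs of rAma ne mohana ko KilOnA xiyA.

```python
from __future__ import annotations

from itertools import product

# Variables x[(i, k, j)]: arc from demand group i to source group j with label k
VARS = [
    ("_ROOT_", "main", "xiyA"),
    ("xiyA", "k1", "rAma"),
    ("xiyA", "k2", "mohana"),
    ("xiyA", "k4", "mohana"),
    ("xiyA", "k2", "KilOnA"),
]
MANDATORY = [("_ROOT_", "main"), ("xiyA", "k1"), ("xiyA", "k2")]  # M_ik
OPTIONAL = [("xiyA", "k3"), ("xiyA", "k4")]                        # O_ik
SOURCES = ["xiyA", "rAma", "mohana", "KilOnA"]                     # S_j


def feasible(x: dict) -> bool:
    for i, k in MANDATORY:   # C1: sum over j of x_ikj = 1
        if sum(v for (a, b, _), v in x.items() if (a, b) == (i, k)) != 1:
            return False
    for i, k in OPTIONAL:    # C2: sum over j of x_ikj <= 1
        if sum(v for (a, b, _), v in x.items() if (a, b) == (i, k)) > 1:
            return False
    for j in SOURCES:        # C3: sum over i, k of x_ikj = 1
        if sum(v for (_, _, c), v in x.items() if c == j) != 1:
            return False
    return True


def solve() -> list[dict]:
    """Enumerate every 0/1 assignment and keep the feasible ones."""
    solutions = []
    for values in product((0, 1), repeat=len(VARS)):
        x = dict(zip(VARS, values))
        if feasible(x):
            solutions.append(x)
    return solutions
```

Here `solve()` returns exactly one assignment, which selects main, k1 (rAma), k2 (KilOnA) and k4 (mohana). An empty list would signal an ill-formed sentence, and several assignments would signal an ambiguous one.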

36
Multiple Frames
  • If a verb has more than one karaka frame
    • Call the integer programming package for each
      frame
  • If the sentence has more than one demand group
    (e.g., multiple verbs) with multiple demand
    frames
    • Call the integer programming package for each
      combination of such frames
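Calling the solver once per combination of frames is a Cartesian product over the frame lists of the demand groups. In the sketch, `solve` is a hypothetical parameter standing for whatever routine runs the integer program for one fixed choice of frames. It is not a specific package's API.

```python
from __future__ import annotations

from itertools import product
from typing import Callable


def parse_all_combinations(
    frames_per_group: dict[str, list],
    solve: Callable[[dict], list],
) -> list[tuple[dict, object]]:
    """Call `solve` once per combination of frames (one frame per demand group).

    Returns (frame assignment, parse) pairs for every parse any call produced.
    """
    groups = list(frames_per_group)
    results = []
    for choice in product(*(frames_per_group[g] for g in groups)):
        assignment = dict(zip(groups, choice))
        for parse in solve(assignment):
            results.append((assignment, parse))
    return results
```

With two frames for xiyA and two for khaakara, `solve` is called four times. The number of calls grows multiplicatively with every demand group that has several frames.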

37
Other frames
  • Common karaka frame
    • Attached to each karaka frame
    • Preference given to the main frame if there
      are clashes
  • Fallback karaka frame
    • Used when the required karaka frame is missing
    • Graceful degradation
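Both attaching the common frame and falling back when no frame exists can be expressed as dictionary merges. The frame contents below are hypothetical illustrations, including the vibhaktis meM and para and the all-optional fallback. The names `COMMON_FRAME`, `FALLBACK_FRAME`, `MAIN_FRAMES` and `effective_frame` are likewise hypothetical.

```python
from __future__ import annotations

# Frames as dicts: karaka -> (necessity, allowed vibhaktis). Contents are illustrative.
COMMON_FRAME = {"k7t": ("d", {"0"}), "k7p": ("d", {"meM", "para"})}
FALLBACK_FRAME = {"k1": ("d", {"0", "ne", "ko"}), "k2": ("d", {"0", "ko"})}
MAIN_FRAMES = {
    "xe": {"k1": ("m", {"ne"}), "k4": ("d", {"ko"}), "k7p": ("d", {"meM"})},
}


def effective_frame(verb_root: str) -> dict:
    """Main frame (or the fallback if the verb has none) plus the common frame.

    The main frame is merged last, so its entries win on a clash.
    """
    main = MAIN_FRAMES.get(verb_root, FALLBACK_FRAME)  # graceful degradation
    return {**COMMON_FRAME, **main}
```

Because the main frame is merged last, its k7p entry replaces the common one for xe. An unknown verb still receives a permissive frame instead of getting no parse at all.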

38
Stage I Types being handled
  • Simple verbs
  • Non-finite verbs
    • wA_huA
    • wA_hI
    • nA
    • kara
    • 0_rahe, etc.
  • Copula
  • Genitive

39
Example (Complex Sentence)
  • rAma ne  phala  khaakara      mohana ko  KilOnA  xiyA
    Ram ERG  fruit  having eaten  Mohan DAT  toy     gave
  • Having eaten the fruit, Ram gave the toy to
    Mohan

40
Candidates
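  • Candidate arcs (variables) over rAma ne phala
    khaakara mohana ko KilOnA xiyA _ROOT_:
    • x1: xiyA -k1-> rAma
    • x2: xiyA -k2-> KilOnA
    • x3: xiyA -k2-> mohana
    • x4: xiyA -k2-> phala
    • x5: xiyA -k4-> mohana
    • x6: khaakara -k2-> phala
    • x7: xiyA -vmod-> khaakara
    • x8: _ROOT_ -main-> xiyA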
41
Constraint Equations
  • Verb xe
    • Mandatory demands (C1)
      • k1: $x_1 = 1$
      • k2: $x_2 + x_3 + x_4 = 1$
    • Optional demands (C2)
      • k4: $x_5 \le 1$
  • Verb khaa
    • Mandatory demands (C1)
      • k2: $x_6 = 1$
      • vmod: $x_7 = 1$
  • _ROOT_
    • C1
      • main: $x_8 = 1$

42
Constraint Equations (contd.)
  • Incoming arcs into source groups (C3)
    • rAma: $x_1 = 1$
    • phala: $x_4 + x_6 = 1$
    • khaa: $x_7 = 1$
    • mohana: $x_3 + x_5 = 1$
    • KilOnA: $x_2 = 1$
    • xe: $x_8 = 1$
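The twelve rows on the last two slides can be checked mechanically. The sketch copies them as (variables, relation, right-hand side) triples and enumerates all 256 assignments to x1 through x8. This is a brute-force stand-in for the integer programming package, and the encoding is an illustrative assumption.

```python
from __future__ import annotations

from itertools import product

# Each row: (variables summed, relation, right-hand side), copied from the slides
EQUATIONS = [
    (["x1"], "==", 1),              # xe, k1 (C1)
    (["x2", "x3", "x4"], "==", 1),  # xe, k2 (C1)
    (["x5"], "<=", 1),              # xe, k4 (C2)
    (["x6"], "==", 1),              # khaa, k2 (C1)
    (["x7"], "==", 1),              # khaa, vmod (C1)
    (["x8"], "==", 1),              # _ROOT_, main (C1)
    (["x1"], "==", 1),              # rAma (C3)
    (["x4", "x6"], "==", 1),        # phala (C3)
    (["x7"], "==", 1),              # khaa (C3)
    (["x3", "x5"], "==", 1),        # mohana (C3)
    (["x2"], "==", 1),              # KilOnA (C3)
    (["x8"], "==", 1),              # xe (C3)
]
NAMES = [f"x{n}" for n in range(1, 9)]


def holds(x: dict, row) -> bool:
    """Evaluate one equality or inequality under the assignment x."""
    names, rel, rhs = row
    total = sum(x[n] for n in names)
    return total == rhs if rel == "==" else total <= rhs


def solutions() -> list[dict]:
    """All 0/1 assignments to x1..x8 that satisfy every row."""
    found = []
    for values in product((0, 1), repeat=len(NAMES)):
        x = dict(zip(NAMES, values))
        if all(holds(x, row) for row in EQUATIONS):
            found.append(x)
    return found
```

The only solution sets x1, x2, x5, x6, x7 and x8 to 1. In other words, rAma is k1, KilOnA is k2 and mohana is k4 of xiyA; phala is k2 of khaakara; khaakara is vmod of xiyA; and xiyA is main. This agrees with the solution graph.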

43
Solution Graph
  _ROOT_ -main-> xiyA
  xiyA -k1-> rAma
  xiyA -k2-> KilOnA
  xiyA -k4-> mohana
  xiyA -vmod-> khaakara
  khaakara -k2-> phala
44
References
  • Akshar Bharati and Rajeev Sangal. 1993. Parsing
    free word order languages in the Paninian
    framework. In Proceedings of the Annual Meeting
    of the Association for Computational Linguistics
    (ACL-93). Association for Computational
    Linguistics, New Jersey, USA.
  • Akshar Bharati, Rajeev Sangal, and T. Papi Reddy.
    2002. A constraint based parser using integer
    programming. In Proceedings of ICON-2002:
    International Conference on Natural Language
    Processing.
  • Rafiya Begum, Samar Husain, Arun Dhwaj, Dipti
    Misra Sharma, Lakshmi Bai, and Rajeev Sangal.
    2008. Dependency annotation scheme for Indian
    languages. In Proceedings of the Third
    International Joint Conference on Natural
    Language Processing (IJCNLP). Hyderabad, India.
  • S. M. Shieber. 1985. Evidence against the
    context-freeness of natural language. Linguistics
    and Philosophy, 8:333-343.
  • Tara Mohanan. 1994. Arguments in Hindi. CSLI
    Publications.

45
  • THANKS!!