1
A Corpus-based Technique for Grammar Development
  • Philippe Blache, Marie-Laure Guénot, Tristan van
    Rullen

Laboratoire Parole et Langage, CNRS, Université
de Provence, France
Corpus Linguistics, SPROLAC Workshop, Lancaster,
March 26th, 2003
2
Outline
  • Overview of the step-by-step grammar development
    process
  • The formalism of Property Grammars
  • Resources, tools and their use
  • Some details on parsing tools
      • Deep parser
      • Shallow parser
      • Multiplexer
  • Conclusions and perspectives

3
Step-by-step grammar development
  • A fully constraint-based approach
  • Broad-coverage grammars
  • Several parsing tools
      • For development
      • For evaluation

4
Overview of the development process
  • Versioning stage, on parts of a grammar (completion):
    a tagged corpus is parsed by the non-deterministic
    deep parser with Property Grammar versions 1 to n,
    giving parse results R1 to Rn; the analysis of these
    results (interpretation of syntactic phenomena)
    feeds the next version
  • Large tests stage, on whole grammar releases
    (releasing): different large tagged corpora are
    parsed by the shallow parser with Property Grammar
    versions n and n+1, giving parse results Rn and Rn+1
  • Multiplexer: multiplexed output and statistics about
    the results; the modifications are interpreted with
    the indications of these statistics
5
The formalism of Property Grammars
  • Totally constraint-based
  • Properties (constraints): relations between
    categories of the same level

9
The formalism of Property Grammars
Example: the [Det] most requested touristic flights [N]
  • Totally constraint-based
  • Properties (constraints): relations between
    categories of the same level

10
The formalism of Property Grammars
Example: the most requested touristic [A] flights [N]
  • Totally constraint-based
  • Properties (constraints): relations between
    categories of the same level

13
The formalism of Property Grammars
  • Totally constraint-based
  • Properties (constraints): relations between
    categories of the same level
  • No explicit mention of constituency
  • The set of properties describing a category
    forms a graph
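
The idea that the properties describing a category form a graph can be sketched as follows. This is only an illustration: the property kinds ("linearity", "requirement") and the noun-phrase properties are assumptions chosen for the example, since the slides do not list the actual property types or grammar.

```python
from collections import defaultdict
from typing import NamedTuple


class Property(NamedTuple):
    kind: str    # illustrative property kind, not taken from the slides
    source: str  # category the relation starts from
    target: str  # category the relation points to


# Hypothetical properties describing a noun phrase (NP).
NP_PROPERTIES = [
    Property("linearity", "Det", "N"),    # Det precedes N
    Property("linearity", "A", "N"),      # A precedes N
    Property("requirement", "Det", "N"),  # a Det requires an N
]


def property_graph(properties):
    """Group properties by source category: nodes are categories,
    edges are labelled with the kind of property."""
    graph = defaultdict(list)
    for prop in properties:
        graph[prop.source].append((prop.kind, prop.target))
    return dict(graph)


graph = property_graph(NP_PROPERTIES)
```

Here the nodes are categories of the same level (Det, A, N) and no constituency rule is written anywhere: the description of the NP is just this set of labelled edges.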

14
Parsing with Property Grammars
  • Parsing is constraint satisfaction
  • Characterization (parsing result): the state of the
    constraint system, i.e. the set of satisfied and
    violated constraints
      • Identification of a set of categories
      • Identification of its relevant properties
        (evaluated)
      • Building a characterization graph
  • Whatever the input
      • Unrestricted texts, spoken language corpora
  • All constraints are at the same level, and are
    independent
      • Separate evaluation is possible
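
A characterization can be sketched as the result of evaluating each constraint separately over a set of categories. The code below is a minimal sketch under assumptions: the three constraint builders (`precedes`, `unique`, `requires`) and the NP constraints are hypothetical and are not the authors' grammar or parser.

```python
from typing import Callable, NamedTuple, Sequence


class Constraint(NamedTuple):
    name: str
    holds: Callable[[Sequence[str]], bool]


def precedes(a, b):
    """Every a occurs before every b (true if either is absent)."""
    def check(cats):
        a_pos = [i for i, c in enumerate(cats) if c == a]
        b_pos = [i for i, c in enumerate(cats) if c == b]
        return all(i < j for i in a_pos for j in b_pos)
    return Constraint(f"{a} < {b}", check)


def unique(a):
    """Category a occurs at most once."""
    return Constraint(f"unique({a})", lambda cats: cats.count(a) <= 1)


def requires(a, b):
    """If a is present, b must be present too."""
    return Constraint(f"{a} => {b}", lambda cats: a not in cats or b in cats)


# Hypothetical constraints for a noun phrase.
NP_CONSTRAINTS = [
    precedes("Det", "N"),
    precedes("A", "N"),
    unique("Det"),
    requires("Det", "N"),
]


def characterize(cats, constraints):
    """Evaluate each constraint independently; never reject the input."""
    result = {"satisfied": [], "violated": []}
    for constraint in constraints:
        key = "satisfied" if constraint.holds(cats) else "violated"
        result[key].append(constraint.name)
    return result


well_formed = characterize(["Det", "Adv", "A", "A", "N"], NP_CONSTRAINTS)
ill_formed = characterize(["N", "Det", "Det"], NP_CONSTRAINTS)
```

Both inputs receive a characterization. The ill-formed one is not rejected: it simply ends up with a non-empty list of violated constraints, which is what makes unrestricted or spoken input tractable in this setting.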

16
Parsing with Property Grammars
  • Syntactic description is based only on constraint
    satisfaction
      • no derivation relation
      • no need for a grammar to be complete, coherent
        or consistent to be evaluated
      • possible representation of partial information
        and partial structures
  • Hence flexibility

17
Resources: different corpora
  • A French treebank
      • 6,500 tagged and disambiguated sentences from a
        corpus of 13,000 journalistic sentences
  • Large corpora (160,000,000 words)
      • French newspapers
      • Novels
      • Oral transcriptions

18
Parsing Tools
  • Non-deterministic deep parser
      • syntactic phenomena identification
      • grammar completion experimentation
  • Deterministic shallow parser
      • systematic evaluation on unrestricted data
      • robustness and efficiency
  • Multiplexer
      • statistics about results

21
Development Tool: Deep Parser
  • Non-deterministic
  • Descriptive point of view
      • identification, within the corpus, of various
        occurrences of a construction (e.g. coordination)
      • accurate empirical linguistic description
      • tests with the Deep Parser to observe the
        evolution of the results (quality and quantity)
      • integration of the results into the grammar

22
Development Tool: Deep Parser
  • Grammar versioning
      • correction of the grammar
      • tests with the Deep Parser to observe the
        evolution of the results (quality and quantity)

23
Development Tool: Deep Parser
  • Set of properties
      • isolation, within the grammar, of a set of
        properties
      • observation of its own behaviour and its impact
        on the Deep Parser
      • modification of this set of properties and/or
        its semantics
      • tests with the Deep Parser to observe the
        evolution of the results (quality and quantity)

24
Deep parsing outputs
Two deep parses, for two grammar versions, of the
sentence "so we'll ask you too hum why it's the
best"
25
Development Tool: Shallow Parser
  • Deterministic
      • heuristics to control the parse
  • Classic left-corner parsing
  • Dynamic constraint-satisfaction algorithm
  • Test of the efficiency of the grammar over large
    corpora

26
Shallow parsing outputs
(P) (NP)La celebration (PP)de
(NP)le(AP)dixième (NP)anniversaire
(PP)de (NP)la mort (PP)de (NP)Max
Pol Fouchet (VP)va commencer
(P) (NP)La celebration (PP)de (NP)le
(AP)dixième (NP)anniversaire
(PP)de (NP)la mort
(PP)de (NP)Max Pol Fouchet (VP)va
commencer
Two shallow parses, for two different grammar
releases, of the sentence "the celebration of
the tenth anniversary of Max Pol Fouchet's death
will begin"
27
Evaluation Tool: Multiplexer
  • Parameterised automatic evaluation strategy
  • Comparison of common phrase boundaries, with
    statistics
      • width, nature, count
  • No need for a treebank to compare parses
  • With a treebank, the multiplexer becomes an
    evaluation device
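
The comparison of common phrase boundaries can be sketched as below. This is an assumption-laden illustration, not the actual multiplexer: the representation of a parse as a set of `(category, start, end)` spans, the function `compare_parses` and the toy spans are all hypothetical.

```python
def compare_parses(parse_a, parse_b):
    """Compare two parses of the same tokens, each given as a set of
    (category, start, end) phrase spans, category by category."""
    stats = {}
    categories = sorted({cat for cat, _, _ in parse_a | parse_b})
    for cat in categories:
        in_a = {span for span in parse_a if span[0] == cat}
        in_b = {span for span in parse_b if span[0] == cat}
        shared = in_a & in_b  # phrases with identical boundaries
        widths = [end - start for _, start, end in shared]
        stats[cat] = {
            "count_a": len(in_a),
            "count_b": len(in_b),
            "common": len(shared),
            "different_ratio": 1 - len(shared) / len(in_a | in_b),
            "mean_common_width": sum(widths) / len(widths) if widths else None,
        }
    return stats


# Toy outputs of two grammar releases on the same seven tokens.
release_n = {("NP", 0, 2), ("PP", 2, 5), ("NP", 3, 5), ("VP", 5, 7)}
release_n1 = {("NP", 0, 2), ("PP", 2, 4), ("NP", 3, 5), ("VP", 5, 7)}

stats = compare_parses(release_n, release_n1)
```

No reference annotation is involved: the two releases are only compared with each other. Passing treebank spans as one of the two arguments would turn the same comparison into an evaluation against a gold standard.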

28
Some multiplexer statistics
This evaluation shows two grammars leaving NPs
unchanged, and giving 25% of different VPs and
15% of different PPs
29
Conclusions and perspectives
  • Equivalent and contradictory constraints are
    specified
  • The Property Grammar paradigm is simplified and
    enriched by such information
  • Taggers and parsers can still be improved and
    evaluated
  • A French evaluation project is being prepared
  • The results of the current development process
    lead to the programming of a context-dependent
    granular parser