Capturing linguistic interaction in a grammar - PowerPoint PPT Presentation

1
Capturing linguistic interaction in a grammar
  • A method for empirically evaluating the grammar
    of a parsed corpus

Sean Wallis, Survey of English Usage, University
College London, s.wallis_at_ucl.ac.uk
2
Capturing linguistic interaction...
  • Parsed corpus linguistics
  • Empirical evaluation of grammar
  • Experiments
  • Attributive AJPs
  • Preverbal AVPs
  • Embedded postmodifying clauses
  • Conclusions
  • Comparing grammars or corpora
  • Potential applications

3
Parsed corpus linguistics
  • Several million-word parsed corpora exist
  • Each sentence analysed in the form of a tree
  • different languages have been analysed
  • limited amount of spontaneous speech data
  • Commitment to a particular grammar required
  • different schemes have been applied
  • problems: computational completeness, manual
    consistency
  • Tools support linguistic research in corpora

4
Parsed corpus linguistics
  • An example tree from ICE-GB (spoken)

[Figure: example tree, S1A-006 23]
5
Parsed corpus linguistics
  • Three kinds of evidence may be obtained from a
    parsed corpus
  • Frequency evidence of a particular known rule,
    structure or linguistic event
  • Coverage evidence of new rules, etc.
  • Interaction evidence of the relationship between
    rules, structures and events
  • This evidence is necessarily framed within a
    particular grammatical scheme
  • So how might we evaluate this grammar?

6
Empirical evaluation of grammar
  • Many theories, frameworks and grammars
  • no agreed evaluation method exists
  • linguistics is divided into competing camps
  • status of parsed corpora suspect
  • Possible method: retrievability of events
  • circularity: you get out what you put in
  • redundancy: improvement by mere addition
  • atomic: based on single events, not pattern
  • specificity: based on particular phenomena
  • New method: retrievability of event sequences

7
Experiment 1: attributive AJPs
  • Adjectives before a noun in English
  • Simple idea: plot the frequency of NPs with at
    least n = 0, 1, 2, 3 attributive AJPs (adjective phrases)
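
The counting step behind this plot can be sketched as follows. This is a minimal illustration rather than the author's tooling: the per-NP counts are invented, and `at_least_counts` is a hypothetical helper name.

```python
import math
from collections import Counter

def at_least_counts(ajp_counts, max_n):
    """Return F(n) = number of NPs with at least n attributive AJPs, for n = 0..max_n."""
    exact = Counter(ajp_counts)
    return [sum(c for k, c in exact.items() if k >= n) for n in range(max_n + 1)]

# Hypothetical data: number of attributive AJPs in each of ten NPs (not corpus figures).
np_ajp_counts = [0, 0, 0, 0, 0, 1, 1, 1, 2, 3]

freq = at_least_counts(np_ajp_counts, 3)
# On a log scale, equal steps between successive values would mean a constant ratio.
log_freq = [math.log10(f) for f in freq]
print(freq)
print([round(v, 3) for v in log_freq])
```

Plotting `freq` gives the raw-frequency curve and `log_freq` the log-frequency curve; both are cumulative ("at least n"), so they can only fall or stay level as n grows.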

8
Experiment 1: attributive AJPs
  • [Chart: raw frequency and log frequency of NPs with
    at least n attributive AJPs]
  • NB: the log-frequency plot is not a straight line
9
Experiment 1: analysis of results
  • If the log-frequency line is straight:
  • exponential fall in frequency (constant
    probability)
  • no interaction between decisions (cf. coin
    tossing)
  • Sequential probability analysis:
  • calculate probability of adding each AJP
  • error bars (binomial)
  • probability falls
  • second < first
  • third < second
  • fourth < second
  • decisions interact
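
The sequential probability step can be sketched like this. The slide says only "error bars (binomial)"; the Wilson score interval used here is one common binomial interval and is an assumption, as are the frequencies, which are invented for illustration.

```python
import math

def wilson_interval(p, n, z=1.96):
    """Wilson score interval for an observed proportion p out of n cases."""
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = (z / denom) * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n))
    return centre - half, centre + half

def sequential_probabilities(freq):
    """Return (p(n), base) pairs where p(n) = F(n) / F(n-1) and base = F(n-1)."""
    return [(freq[n] / freq[n - 1], freq[n - 1]) for n in range(1, len(freq))]

# Hypothetical "at least n" frequencies F(0)..F(3), not the ICE-GB figures.
freq = [10000, 2000, 200, 10]

results = []
for n, (p, base) in enumerate(sequential_probabilities(freq), start=1):
    lo, hi = wilson_interval(p, base)
    results.append((n, p, lo, hi))
    print(f"p({n}) = {p:.4f}  [{lo:.4f}, {hi:.4f}]")
```

If the decisions were independent, as in coin tossing, every p(n) would be the same within its error bars; a p(n) that falls outside the interval of its predecessor suggests the decisions interact.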

10
Experiment 1: analysis of results
  • [Chart: probability of adding each successive AJP]
11
Experiment 1: analysis of results
  • Sequential probability analysis:
  • probability falls
  • decisions interact
  • fit to a power law
  • y = m·x^k
  • find m and k

[Chart: probability, with fitted curve y = 0.1931x^-1.2793]
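
A power-law fit of this kind can be sketched as a straight-line fit on log-log axes. This is an assumption about the procedure, not a description of the author's spreadsheet: the points below are synthetic, generated from the curve quoted on the slide, and `fit_power_law` is a hypothetical helper. R² is computed on the log-transformed values.

```python
import math

def fit_power_law(xs, ys):
    """Least-squares fit of y = m * x**k on log-log axes; returns (m, k, r_squared)."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    n = len(xs)
    mean_x, mean_y = sum(lx) / n, sum(ly) / n
    sxx = sum((a - mean_x) ** 2 for a in lx)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(lx, ly))
    syy = sum((b - mean_y) ** 2 for b in ly)
    k = sxy / sxx                          # slope of the log-log line
    m = math.exp(mean_y - k * mean_x)      # intercept, transformed back
    r_squared = (sxy * sxy) / (sxx * syy)  # squared correlation in log-log space
    return m, k, r_squared

# Synthetic points generated from the curve quoted on the slide (not the corpus data).
xs = [1, 2, 3, 4]
ys = [0.1931 * x ** -1.2793 for x in xs]

m, k, r2 = fit_power_law(xs, ys)
print(m, k, r2)
```

Because the synthetic points lie exactly on the curve, the fit should recover the quoted coefficients; with real observations, R² indicates how closely the probabilities follow a power law.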
12
Experiment 1: explanations?
  • Feedback loop: for each successive AJP, it is
    more difficult to add a further AJP
  • Explanation 1: semantic constraints
  • tend to say tall green ship
  • do not tend to say tall short ship or green tall
    ship
  • Explanation 2: communicative economy
  • once a speaker has said tall green ship, they tend
    to say only ship
  • Further investigation required
  • General principle:
  • significant change (usually, fall) in probability
    is evidence of an interaction along a grammatical
    axis

13
Experiments 2, 3: variations
  • Restrict head: common and proper nouns
  • Common nouns: similar results
  • Proper nouns and adjectives are often treated as
    compounds (Northern England vs. lower Loire)
  • Ignore grammar: adjective + noun strings
  • Some misclassifications / miscounting (noise)
  • she was beautiful, people said; tall very
    green ship
  • Similar results
  • slightly weaker (third < second not significant
    at p = 0.01)
  • Insufficient evidence for grammar
  • null hypothesis: simple lexical adjacency
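
The grammar-free variant can be sketched as a scan over word-class tags. This is an illustrative assumption: the tags "ADJ", "N" and "OTHER" are made up and are not the ICE-GB tagset, and `adjective_run_lengths` is a hypothetical helper.

```python
def adjective_run_lengths(tagged_tokens):
    """For each noun, count the unbroken run of adjectives immediately before it."""
    runs = []
    run = 0
    for _word, tag in tagged_tokens:
        if tag == "ADJ":
            run += 1
        elif tag == "N":
            runs.append(run)
            run = 0
        else:
            run = 0  # any other word class breaks the run
    return runs

# Hypothetical tagging of two fragments taken from the slide's examples.
tokens = [
    ("the", "OTHER"), ("tall", "ADJ"), ("green", "ADJ"), ("ship", "N"),
    ("she", "OTHER"), ("was", "OTHER"), ("beautiful", "ADJ"), ("people", "N"),
    ("said", "OTHER"),
]
runs = adjective_run_lengths(tokens)
print(runs)
```

The second count is noise of the kind the slide mentions: "beautiful" is adjacent to "people" but does not modify it, which a purely lexical scan cannot tell.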

14
Experiment 4: preverbal AVPs
  • Consider adverb phrases before a verb
  • Results very different
  • Probability does not fall significantly between
    first and second AVP
  • Probability does fall between second and third
    AVP
  • Possible constraints
  • (weak) communicative
  • not (strong) semantic
  • Further investigation needed
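
Whether a fall between two successive probabilities is significant can be checked in several ways, and the slides do not say which test was used. The sketch below uses a pooled two-proportion z-test as a stand-in, with invented counts; it treats the two proportions as independent samples, which is a simplification.

```python
import math

def two_proportion_z(x1, n1, x2, n2):
    """z statistic for the difference between proportions x1/n1 and x2/n2 (pooled variance)."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (p1 - p2) / se

# Hypothetical counts: 10000 cases, 500 with >= 1 AVP, 23 of those with >= 2 AVPs.
z = two_proportion_z(500, 10000, 23, 500)
significant = abs(z) > 2.576  # two-tailed critical value for p = 0.01
print(round(z, 3), significant)
```

With these invented counts the probability drops from 0.05 to 0.046, a difference too small to be significant at this sample size, which is the pattern the slide reports between the first and second AVP.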

15
Experiment 4: preverbal AVPs
  • Not power law: R² < 0.24
  • [Chart: probability of adding each successive AVP]
16
Experiment 5: embedded clauses
  • Another way to specify nouns in English
  • add clause after noun to explicate it
  • the ship that was tall and green
  • the ship in the port
  • may be embedded
  • the ship in the port with the ancient
    lighthouse
  • or successively postmodified
  • the ship in the port with a very old mast
  • Compare successive embedding and sequential
    postmodifying clauses
  • Axis: embedding depth / sequence length

17
Experiment 5: method
  • Extract examples with FTFs (Fuzzy Tree Fragments)
  • at least n levels of embedded postmodification

18
Experiment 5: method
  • [Diagrams: FTFs for n = 0, 1, 2 (etc.) levels of
    embedded postmodification]
19
Experiment 5: method
  • problems
  • multiple matching cases (use ICECUP IV to
    classify)
  • overlapping cases (subtract extra case)
  • co-ordination of clauses or NPs (use alternative
    patterns)
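
The idea of counting cases with at least n levels of embedded postmodification can be sketched on toy trees. This is not how FTFs or ICECUP work internally: the (label, children) representation, the "POSTMOD" label and both function names are hypothetical, and the sketch ignores the matching problems listed above.

```python
def max_postmod_depth(node):
    """Greatest number of nested POSTMOD nodes on any path down from this node."""
    label, children = node
    deepest_below = max((max_postmod_depth(c) for c in children), default=0)
    return deepest_below + (1 if label == "POSTMOD" else 0)

def count_at_least(trees, max_n):
    """F(n) = number of trees with at least n levels of embedded postmodification."""
    depths = [max_postmod_depth(t) for t in trees]
    return [sum(1 for d in depths if d >= n) for n in range(max_n + 1)]

# Hypothetical toy trees as (label, children) pairs; the labels are illustrative only.
ship = ("NP", [])                                    # the ship
ship_in_port = ("NP", [("POSTMOD", [("NP", [])])])   # the ship in the port
ship_embedded = (                                    # ... in the port with the ancient lighthouse
    "NP", [("POSTMOD", [("NP", [("POSTMOD", [("NP", [])])])])]
)
ship_sequential = (                                  # ... in the port / with a very old mast
    "NP", [("POSTMOD", [("NP", [])]), ("POSTMOD", [("NP", [])])]
)

counts = count_at_least([ship, ship_in_port, ship_embedded], 2)
print(counts)
```

Two sibling postmodifiers, as in `ship_sequential`, have depth 1 rather than 2, which is what separates the sequential axis from the embedding axis.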
20
Experiment 5: analysis of results
  • Probability of adding a further embedded clause
    falls with each level
  • second < first
  • sequential < embedding
  • Embedding only:
  • third < first
  • insufficient data for third < second
  • Conclusion:
  • Interaction along embedding and sequential axes

21
Experiment 5: analysis of results
  • [Chart: probability by level, embedded vs.
    sequential]
22
Experiment 5: analysis of results
  • Fitting to f = m·x^k
  • k < 0: fall (f = m/x^|k|)
  • |k| is high: steep
  • Conclusion:
  • Both match power law: R² > 0.99

[Chart: probability, with fitted curves]
embedded: y = 0.0539x^-1.2206
sequential: y = 0.0523x^-1.6516
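
To see what the two fitted curves imply, they can be evaluated side by side. The coefficients are those quoted on the slide; reading x = 1, 2, 3 as successive levels is an assumption, and `power_law` is a hypothetical helper.

```python
def power_law(m, k):
    """Return the function x -> m * x**k."""
    return lambda x: m * x ** k

# Coefficients as quoted on the slide; x = 1, 2, 3 is assumed to index successive levels.
embedded = power_law(0.0539, -1.2206)
sequential = power_law(0.0523, -1.6516)

for x in (1, 2, 3):
    print(x, round(embedded(x), 4), round(sequential(x), 4))
```

The sequential curve starts slightly lower and has the larger |k|, so it falls more steeply than the embedded curve at every level.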
23
Experiment 5: explanations?
  • Lexical adjacency?
  • No: 87 of 2-level cases have at least one VP, NP
    or clause between upper and lower heads
  • Misclassified cases of embedding?
  • No: very few (5) semantically ambiguous cases
  • Language production constraints?
  • Possibly; could also be communicative economy
  • contrast spontaneous speech with other modes
  • Positive proof of recursive tree grammar
  • Established from parsed corpus
  • cf. negative proof (NLP parsing problems)

24
Conclusions
  • A new method for evaluating interactions along
    grammatical axes
  • General purpose, robust, structural
  • More abstract than linguistic choice
    experiments
  • Depends on a concept of grammatical distance
    along an axis, based on the chosen grammar
  • Method has philosophical implications
  • Grammar viewed as structure of linguistic choices
  • Linguistics as an evaluable observational science
  • Signature (trace) of language production
    decisions
  • A unification of theoretical and corpus
    linguistics?

25
Comparing grammars or corpora
  • Can we reliably retrieve known interaction
    patterns with different grammars?
  • Do these patterns differ across corpora?
  • Benefits over individual event retrieval
  • non-circular: generalisation across local syntax
  • not subject to redundancy: arbitrary terms make
    trends more difficult to retrieve
  • not atomic: based on patterns of interaction
  • general: patterns may have multiple explanations
  • Supplements retrieval of events

26
Potential applications
  • Corpus linguistics
  • Optimising existing grammar
  • e.g. co-ordination, compound nouns
  • Theoretical linguistics
  • Comparing different grammars, same language
  • Comparing different languages or periods
  • Psycholinguistics
  • Search for evidence of language production
    constraints in spontaneous speech corpora
  • speech and language therapy
  • language acquisition and development

27
Links and further reading
  • Survey of English Usage
  • www.ucl.ac.uk/english-usage
  • Corpora and grammar
  • .../projects/ice-gb
  • Full paper
  • .../staff/sean/resources/analysing-grammatical-interaction.pdf
  • Sequential analysis spreadsheet (Excel)
  • .../staff/sean/resources/interaction-trends.xls