Capturing linguistic interaction in a grammar

Transcript and Presenter's Notes

1
Capturing linguistic interaction in a grammar

A method for empirically evaluating the grammar of a parsed corpus

Sean Wallis
Survey of English Usage
University College London
s.wallis_at_ucl.ac.uk
2
Capturing linguistic interaction...

Parsed corpus linguistics
Empirical evaluation of grammar
Experiments
  Attributive AJPs (adjective phrases)
  Preverbal AVPs (adverb phrases)
  Embedded postmodifying clauses
Conclusions
Comparing grammars or corpora
Potential applications

3
Parsed corpus linguistics

Several million-word parsed corpora exist
Each sentence analysed in the form of a tree
  different languages have been analysed
  limited amount of spontaneous speech data
Commitment to a particular grammar required
  different schemes have been applied
  problems: computational completeness, manual consistency
Tools support linguistic research in corpora

4
Parsed corpus linguistics

An example tree from ICE-GB (spoken)

S1A-006 23
5
Parsed corpus linguistics

Three kinds of evidence may be obtained from a parsed corpus
  Frequency evidence of a particular known rule, structure or linguistic event
  Coverage evidence of new rules, etc.
  Interaction evidence of the relationship between rules, structures and events
This evidence is necessarily framed within a particular grammatical scheme
So how might we evaluate this grammar?

6
Empirical evaluation of grammar

Many theories, frameworks and grammars
  no agreed evaluation method exists
  linguistics is divided into competing camps
  status of parsed corpora suspect
Possible method: retrievability of events
  circularity: you get out what you put in
  redundancy: improvement by mere addition
  atomic: based on single events, not pattern
  specificity: based on particular phenomena
New method: retrievability of event sequences

8
Experiment 1: attributive AJPs

Adjectives before a noun in English
Simple idea: plot the frequency of NPs with at least n = 0, 1, 2, 3 attributive AJPs

[Figure: raw frequency and log frequency. NB: the log-frequency plot is not a straight line.]
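
The plotting idea on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the author's procedure: the counts are invented (they are not ICE-GB figures), and the helper names `at_least_counts` and `log_frequencies` are hypothetical. It converts "exactly n AJPs" counts into "at least n" frequencies and takes their logs, which is what a log-frequency plot would show.

```python
import math

# Hypothetical counts of NPs with EXACTLY n attributive AJPs (n = 0, 1, 2, 3).
# These numbers are invented for illustration; they are not ICE-GB figures.
exact_counts = [8000, 1600, 300, 40]


def at_least_counts(exact):
    """Turn 'exactly n' counts into 'at least n' counts by summing from the right."""
    totals = []
    running = 0
    for count in reversed(exact):
        running += count
        totals.append(running)
    return list(reversed(totals))


def log_frequencies(freqs):
    """Natural log of each frequency, as plotted on a log-frequency axis."""
    return [math.log(f) for f in freqs]


at_least = at_least_counts(exact_counts)
logs = log_frequencies(at_least)

for n, (f, lf) in enumerate(zip(at_least, logs)):
    print(f"n >= {n}: frequency = {f:5d}, log frequency = {lf:.3f}")
```

If the successive differences between the log frequencies were equal, the points would lie on a straight line.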
10
Experiment 1: analysis of results

If the log-frequency line is straight
  exponential fall in frequency (constant probability)
  no interaction between decisions (cf. coin tossing)
Sequential probability analysis
  calculate probability of adding each AJP
  error bars (binomial)
  probability falls
    second < first
    third < second
    fourth < second
  decisions interact

[Figure: probability of adding each AJP, with error bars]
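
A rough sketch of the sequential probability calculation follows. Under a constant probability p the frequencies would satisfy F(n) = F(0)·p^n, so the ratio F(n)/F(n-1) would stay flat; the sketch computes that ratio at each step. Two things here are assumptions rather than details from the slides: the frequencies are invented, and the error bars use the Wilson score interval as one common choice of binomial interval (the slide only says "binomial").

```python
import math


def wilson_interval(p, n, z=1.96):
    """Wilson score interval for an observed proportion p out of n cases."""
    if n <= 0:
        raise ValueError("n must be positive")
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    spread = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - spread, centre + spread


def sequential_probabilities(at_least):
    """p(n) = F(n) / F(n-1): chance of adding an nth AJP given n-1 are present."""
    results = []
    for n in range(1, len(at_least)):
        base = at_least[n - 1]
        p = at_least[n] / base
        lower, upper = wilson_interval(p, base)
        results.append((n, p, lower, upper))
    return results


# Hypothetical 'at least n' frequencies for n = 0..3 (invented, not corpus data).
at_least = [9940, 1940, 340, 40]
for n, p, lower, upper in sequential_probabilities(at_least):
    print(f"p({n}) = {p:.4f}  interval = [{lower:.4f}, {upper:.4f}]")
```

A fall from one step to the next would then be judged against these intervals, for instance by checking whether they overlap.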
11
Experiment 1: analysis of results

If the log-frequency line is straight
  exponential fall in frequency (constant probability)
  no interaction between decisions (cf. coin tossing)
Sequential probability analysis
  calculate probability of adding each AJP
  error bars (binomial)
  probability falls
  decisions interact
  fit to a power law
    $y = m x^{k}$
    find m and k

[Figure: probability, with fitted curve $y = 0.1931\,x^{-1.2793}$]
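
One way to fit $y = m x^{k}$ is ordinary least squares on log-log axes, since $\log y = \log m + k \log x$ is a straight line. The sketch below assumes that approach; the slides do not say how the fit was obtained, and `fit_power_law` is a hypothetical helper. The input points are generated from the curve printed on the slide purely to check that the fit recovers its parameters. They are not the observed probabilities.

```python
import math


def fit_power_law(xs, ys):
    """Least-squares fit of y = m * x**k on log-log axes; returns (m, k)."""
    log_x = [math.log(x) for x in xs]
    log_y = [math.log(y) for y in ys]
    n = len(xs)
    mean_x = sum(log_x) / n
    mean_y = sum(log_y) / n
    sxx = sum((lx - mean_x) ** 2 for lx in log_x)
    sxy = sum((lx - mean_x) * (ly - mean_y) for lx, ly in zip(log_x, log_y))
    k = sxy / sxx                       # slope of the log-log line
    m = math.exp(mean_y - k * mean_x)   # intercept, mapped back from log space
    return m, k


# Synthetic points lying exactly on the slide's curve (not corpus observations).
xs = [1, 2, 3, 4]
ys = [0.1931 * x ** -1.2793 for x in xs]

m, k = fit_power_law(xs, ys)
print(f"m = {m:.4f}, k = {k:.4f}")
```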
12
Experiment 1: explanations?

Feedback loop: for each successive AJP, it is more difficult to add a further AJP
Explanation 1: semantic constraints
  tend to say "tall green ship"
  do not tend to say "tall short ship" or "green tall ship"
Explanation 2: communicative economy
  once speaker said "tall green ship", tends to only say "ship"
Further investigation required
General principle
  significant change (usually, fall) in probability is evidence of an interaction along grammatical axis

13
Experiments 2, 3: variations

Restrict head: common and proper nouns
  Common nouns: similar results
  Proper nouns and adjectives are often treated as compounds ("Northern England" vs. "lower Loire")
Ignore grammar: adjective + noun strings
  Some misclassifications / miscounting (noise)
    "she was beautiful, people said"; "tall very green ship"
  Similar results
    slightly weaker (third < second ns at p = 0.01)
  Insufficient evidence for grammar
    null hypothesis: simple lexical adjacency

15
Experiment 4: preverbal AVPs

Consider adverb phrases before a verb
Results very different
  Probability does not fall significantly between first and second AVP
  Probability does fall between second and third AVP
Possible constraints
  (weak) communicative
  not (strong) semantic
Further investigation needed
Not power law: $R^2 < 0.24$

[Figure: probability of adding each AVP]
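
The $R^2$ figure measures how well a power law describes the points. A minimal sketch, assuming $R^2$ is taken from the straight-line fit on log-log axes (the slides do not state how it was computed): for a simple linear regression it equals the squared correlation between $\log x$ and $\log y$. The function name and the sample values are hypothetical.

```python
import math


def power_law_r_squared(xs, ys):
    """R^2 of a straight-line fit to (log x, log y), i.e. of a power-law fit.

    Requires at least two distinct x values and non-constant y values.
    """
    log_x = [math.log(x) for x in xs]
    log_y = [math.log(y) for y in ys]
    n = len(xs)
    mean_x = sum(log_x) / n
    mean_y = sum(log_y) / n
    sxx = sum((lx - mean_x) ** 2 for lx in log_x)
    syy = sum((ly - mean_y) ** 2 for ly in log_y)
    sxy = sum((lx - mean_x) * (ly - mean_y) for lx, ly in zip(log_x, log_y))
    return (sxy * sxy) / (sxx * syy)


# Invented probabilities that rise and then fall, so no power law fits well.
irregular = power_law_r_squared([1, 2, 3], [0.05, 0.06, 0.02])
# Points lying exactly on a power law, for comparison.
exact = power_law_r_squared([1, 2, 3], [0.2 * x ** -1.5 for x in (1, 2, 3)])
print(f"irregular: R^2 = {irregular:.3f}; exact power law: R^2 = {exact:.3f}")
```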
16
Experiment 5: embedded clauses

Another way to specify nouns in English
  add clause after noun to explicate it
    "the ship that was tall and green"
    "the ship in the port"
  may be embedded
    "the ship in the port with the ancient lighthouse"
  or successively postmodified
    "the ship in the port with a very old mast"
Compare successive embedding and sequential postmodifying clauses
  Axis: embedding depth / sequence length

19
Experiment 5: method

Extract examples with FTFs (Fuzzy Tree Fragments)
  at least n levels of embedded postmodification

[Figure: FTF patterns for n = 0, 1, 2 (etc.)]

  problems
    multiple matching cases (use ICECUP IV to classify)
    overlapping cases (subtract extra case)
    co-ordination of clauses or NPs (use alternative patterns)
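
To make the two axes concrete, here is a toy sketch that counts noun phrases with at least n levels of embedded postmodification, and at least n postmodifiers in sequence. It does not use FTFs or ICECUP: the `NounPhrase` and `Postmodifier` classes are a hypothetical, much simplified stand-in for a parsed tree, and the three sample phrases are hand-built from the examples on the earlier slide. It also ignores the matching problems listed above.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Postmodifier:
    """A postmodifying clause or phrase; it may itself contain a noun phrase."""
    text: str
    noun_phrase: Optional["NounPhrase"] = None


@dataclass
class NounPhrase:
    head: str
    postmodifiers: List[Postmodifier] = field(default_factory=list)


def embedding_depth(phrase):
    """Levels of postmodification nested inside one another below this head."""
    depths = []
    for pm in phrase.postmodifiers:
        inner = embedding_depth(pm.noun_phrase) if pm.noun_phrase else 0
        depths.append(1 + inner)
    return max(depths, default=0)


def sequence_length(phrase):
    """Number of postmodifiers attached in sequence to the same head."""
    return len(phrase.postmodifiers)


def count_at_least(phrases, n, measure):
    """How many noun phrases score at least n on the given measure."""
    return sum(1 for phrase in phrases if measure(phrase) >= n)


# "the ship in the port with the ancient lighthouse" (embedded reading)
embedded = NounPhrase("ship", [
    Postmodifier("in the port", NounPhrase("port", [
        Postmodifier("with the ancient lighthouse", NounPhrase("lighthouse")),
    ])),
])
# "the ship in the port with a very old mast" (sequential reading)
sequential = NounPhrase("ship", [
    Postmodifier("in the port", NounPhrase("port")),
    Postmodifier("with a very old mast", NounPhrase("mast")),
])
bare = NounPhrase("ship")

sample = [embedded, sequential, bare]
for n in range(3):
    print(n, count_at_least(sample, n, embedding_depth),
          count_at_least(sample, n, sequence_length))
```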
21
Experiment 5: analysis of results

Probability of adding a further embedded clause falls with each level
  second < first
  sequential < embedding
Embedding only
  third < first
  insufficient data for third < second
Conclusion
  Interaction along embedding and sequential axes

[Figure: probability for embedded and sequential postmodification]
22
Experiment 5: analysis of results

Probability of adding a further embedded clause falls with each level
  second < first
  sequential < embedding
Fitting to $f = m x^{k}$
  $k < 0$: fall ($f = m / x^{|k|}$)
  $|k|$ is high: steep
Conclusion
  Both match power law: $R^2 > 0.99$

[Figure: fitted curves. Embedded: $y = 0.0539\,x^{-1.2206}$; sequential: $y = 0.0523\,x^{-1.6516}$]
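
As a quick arithmetic check of what the two fitted curves imply, the sketch below evaluates both at the first few values of x. The parameters are those printed on the slide; treating x as the level or sequence position, counted from 1, is an assumption, and `power_law` is a hypothetical helper.

```python
def power_law(m, k):
    """Return the function f(x) = m * x**k for fixed parameters m and k."""
    def f(x):
        return m * x ** k
    return f


# Parameters as printed on the slide.
embedded = power_law(0.0539, -1.2206)
sequential = power_law(0.0523, -1.6516)

# Assumption: x counts levels (or sequence positions) starting at 1.
for x in (1, 2, 3):
    e, s = embedded(x), sequential(x)
    print(f"x = {x}: embedded = {e:.4f}, sequential = {s:.4f}, ratio = {s / e:.3f}")
```

Because the sequential exponent is larger in magnitude, the ratio of sequential to embedded shrinks as x grows, which matches "sequential < embedding" on the slide.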
23
Experiment 5: explanations?

Lexical adjacency?
  No: 87 of 2-level cases have at least one VP, NP or clause between upper and lower heads
Misclassified cases of embedding?
  No: very few (5) semantically ambiguous cases
Language production constraints?
  Possibly, could also be communicative economy
  contrast spontaneous speech with other modes
Positive proof of recursive tree grammar
  Established from parsed corpus
  cf. negative proof (NLP parsing problems)

24
Conclusions

A new method for evaluating interactions along grammatical axes
  General purpose, robust, structural
  More abstract than linguistic choice experiments
  Depends on a concept of grammatical distance along an axis, based on the chosen grammar
Method has philosophical implications
  Grammar viewed as structure of linguistic choices
  Linguistics as an evaluable observational science
  Signature (trace) of language production decisions
  A unification of theoretical and corpus linguistics?

25
Comparing grammars or corpora

Can we reliably retrieve known interaction patterns with different grammars?
  Do these patterns differ across corpora?
Benefits over individual event retrieval
  non-circular: generalisation across local syntax
  not subject to redundancy: arbitrary terms make trends more difficult to retrieve
  not atomic: based on patterns of interaction
  general: patterns may have multiple explanations
Supplements retrieval of events

26
Potential applications

Corpus linguistics
  Optimising existing grammar
    e.g. co-ordination, compound nouns
Theoretical linguistics
  Comparing different grammars, same language
  Comparing different languages or periods
Psycholinguistics
  Search for evidence of language production constraints in spontaneous speech corpora
    speech and language therapy
    language acquisition and development

27
Links and further reading

Survey of English Usage
  www.ucl.ac.uk/english-usage
Corpora and grammar
  .../projects/ice-gb
Full paper
  .../staff/sean/resources/analysing-grammatical-interaction.pdf
Sequential analysis spreadsheet (Excel)
  .../staff/sean/resources/interaction-trends.xls
