1
Automated Suggestions for Miscollocations
  • Anne Li-E Liu
  • David Wible
  • Nai-lung Tsao

2
Overview
  • Introduction
  • Methodology
  • Experimental Results
  • Conclusion

3
Introduction
  • Our study focuses on how to find suggestions for
    miscollocations automatically.
  • In this paper, only verb-noun collocations and
    miscollocations are considered.

4
Introduction
  • Howarth's (1998) investigation of collocations
    found in L1 and L2 writers' writing.
  • Granger's analysis of adverb-adjective
    collocations (1998).
  • Liu's (2002) lexical semantic analysis of the
    verb-noun miscollocations in the English Taiwan
    Learner Corpus.

5
Introduction
  • Projects using learner corpora to analyze and
    categorize learner errors:
  • NICT JLE (Japanese Learner English) Corpus
  • The Chinese Learner English Corpus (CLEC)
  • English Taiwan Learner Corpus (or TLC) (Wible et
    al., 2003).

6
An example
1. solve
2. pose
3. tackle
4. grapple
5. alleviate
6. overcome
7. exacerbate
8. compound
9. beset
10. resolve
  • She tries to improve her students' problems.

7
Method
  • Three features of collocate candidates are used:
  • 1. Word association strength
  • 2. Semantic similarity
  • 3. Intercollocability (Cowie and Howarth,
    1996)

8
Resource
  • 84 VN miscollocations in TLC (Liu, 2002).
  • Training data: 42; testing data: 42.
  • Two knowledge resources: BNC, WordNet.
  • Two human evaluators.

9
Word Association Strength
  • Mutual Information (Church et al., 1991)
  • Two purposes:
  • All suggested correct collocations have to be
    identified as collocations.
  • The higher the word association strength, the more
    likely the candidate is to be a correct substitute
    for the wrong collocate.
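
As a minimal sketch of the word association feature, the pointwise form of mutual information for a verb-noun pair can be computed from corpus counts as log2(P(v, n) / (P(v) P(n))). The counts below are invented for illustration and are not BNC figures; the function name is hypothetical, and the slides do not say which variant or thresholds the authors used.

```python
import math

def mutual_information(pair_count, verb_count, noun_count, total):
    """Pointwise mutual information (log base 2) of a verb-noun pair."""
    if pair_count == 0:
        return float("-inf")  # the pair never co-occurs
    p_pair = pair_count / total
    p_verb = verb_count / total
    p_noun = noun_count / total
    return math.log2(p_pair / (p_verb * p_noun))

# Invented counts, for illustration only (not taken from any corpus).
mi = mutual_information(pair_count=32, verb_count=64, noun_count=128, total=1024)
print(mi)
```

A higher value means the verb and noun co-occur more often than chance would predict, which is the sense in which a candidate counts as a stronger collocate.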

10
Semantic Similarity
  • A semantic relation holds between a miscollocate
    and its correct counterpart (Gitsaki et al.,
    2000; Liu, 2002).
  • The synsets of WordNet are treated as nodes in a
    graph, and graph-theoretic distance is measured.
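
The graph-distance idea can be sketched with a breadth-first search over a small hand-made graph. The graph below is a hypothetical toy, not WordNet's actual structure, and the mapping from distance to a similarity score, 1 / (1 + distance), is an assumption made only for illustration; the slides do not state which mapping was used.

```python
from collections import deque

def graph_distance(graph, start, goal):
    """Number of edges on the shortest path from start to goal, or None."""
    if start == goal:
        return 0
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, dist = queue.popleft()
        for neighbour in graph.get(node, ()):
            if neighbour == goal:
                return dist + 1
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, dist + 1))
    return None  # no path between the two nodes

def similarity(distance):
    """Assumed mapping from distance to a score in (0, 1]; None maps to 0."""
    return 0.0 if distance is None else 1.0 / (1.0 + distance)

# Hypothetical undirected toy graph; node names stand in for synsets.
toy_graph = {
    "say": ["communicate"],
    "tell": ["communicate"],
    "communicate": ["say", "tell", "act"],
    "think": ["act"],
    "act": ["communicate", "think"],
}

d_tell = graph_distance(toy_graph, "say", "tell")
d_think = graph_distance(toy_graph, "say", "think")
print(d_tell, similarity(d_tell))
print(d_think, similarity(d_think))
```

In this toy graph the candidate closer to the wrong verb receives the higher similarity score.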

[Figure: example phrases "say a story", "tell a story", "think of a story"]

11
Semantic Similarity

12
Intercollocability
  • Cowie and Howarth (1996) propose that certain
    collocations form clusters on the basis of the
    shared meaning.

[Figure: cluster of collocations: convey point, get across the message, communicate concern, convey feeling, express concern]

13
Intercollocability
  • Collocations in a cluster show a certain degree
    of intercollocability.

[Figure: the verbs express and communicate and the nouns condolences, concern and feeling, e.g. "express one's concern"; a "?" appears in the diagram]

14
Intercollocability
  • She tries to improve her students' problems.

[Figure: starting point "improve problem". "problem": 86 verb collocates; "improve": 52 noun collocates. Nodes shown: resolve, reduce (verbs); situation, matter, way (nouns).]

15
Intercollocability
[Figure: nouns situation, matter, problem, way, quality, efficiency, effectiveness; the nouns situation, matter, problem, way are linked to the verbs resolve and reduce]
  • The cluster is partially created and the link
    between improve, resolve and reduce is developed
    by virtue of the overlapping noun collocates.

16
Intercollocability
  • Quantify intercollocability
  • The number of shared collocates

17
[Figure: nouns situation, matter, problem, way, quality, efficiency, effectiveness; the nouns situation, matter, problem, way are linked to the verbs resolve and reduce]
  • Shared collocates (resolve, improve): 3
  • Shared collocates (reduce, improve): 3
  • The more shared collocates a verb has with the
    wrong verb, the more likely this verb is a good
    candidate.
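
Counting shared collocates amounts to intersecting two sets of noun collocates. The sketch below uses toy sets assembled from the nouns named on the slides; the exact membership of each set is an assumption, chosen so that the counts match the value of 3 reported above, and the function names are hypothetical.

```python
def shared_collocates(collocates_a, collocates_b):
    """Number of collocates that two words have in common."""
    return len(set(collocates_a) & set(collocates_b))

def rank_by_shared_collocates(wrong_verb, candidates, table):
    """Sort candidate verbs by how many noun collocates they share with wrong_verb."""
    return sorted(
        candidates,
        key=lambda verb: shared_collocates(table[wrong_verb], table[verb]),
        reverse=True,
    )

# Toy noun-collocate sets (assumed for illustration, not corpus output).
noun_collocates = {
    "improve": {"situation", "matter", "way", "quality", "efficiency", "effectiveness"},
    "resolve": {"situation", "matter", "problem", "way"},
    "reduce": {"situation", "matter", "problem", "way"},
    "pose": {"problem"},
}

n_resolve = shared_collocates(noun_collocates["improve"], noun_collocates["resolve"])
n_reduce = shared_collocates(noun_collocates["improve"], noun_collocates["reduce"])
ranking = rank_by_shared_collocates("improve", ["pose", "resolve"], noun_collocates)
print(n_resolve, n_reduce, ranking)
```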

18
Integrate the 3 features
  • The probabilistic model

19
Training
  • Probability distribution of word association
    strength
  • MI value to 5 levels
  • (<1.5, 1.5-3.0, 3.0-4.5, 4.5-6, >6)
  • P(MI level)
  • P(MI level | Sc)
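
A minimal sketch of this training step: MI values are mapped to the five levels, and the two distributions are estimated by relative frequency. It assumes that Sc denotes the event that a candidate is a correct suggestion and that a value exactly on a boundary falls into the upper level; neither is stated on the slides. The training samples are invented.

```python
from bisect import bisect_right
from collections import Counter

# Level boundaries from the slide: <1.5, 1.5-3.0, 3.0-4.5, 4.5-6, >6.
MI_BOUNDS = [1.5, 3.0, 4.5, 6.0]

def mi_level(mi):
    """Map an MI value to a level 0..4 (assumption: a boundary value goes up)."""
    return bisect_right(MI_BOUNDS, mi)

def level_distributions(samples):
    """Estimate P(level) and P(level | Sc) from non-empty (mi, is_correct) pairs."""
    all_counts, correct_counts = Counter(), Counter()
    for mi, is_correct in samples:
        level = mi_level(mi)
        all_counts[level] += 1
        if is_correct:
            correct_counts[level] += 1
    n_all = sum(all_counts.values())
    n_correct = sum(correct_counts.values())
    p_level = {lv: all_counts[lv] / n_all for lv in range(5)}
    p_level_given_sc = {
        lv: (correct_counts[lv] / n_correct if n_correct else 0.0)
        for lv in range(5)
    }
    return p_level, p_level_given_sc

# Invented training samples: (MI value, whether the candidate was correct).
samples = [(0.5, False), (1.0, False), (2.0, False),
           (2.5, True), (5.0, True), (7.0, True)]
p_level, p_level_given_sc = level_distributions(samples)
print(p_level)
print(p_level_given_sc)
```

The same binning and counting would apply to the semantic similarity and shared-collocate scores, with their own level boundaries.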

20
Training
  • Probability distribution of semantic similarity
  • Similarity score to 5 levels
  • (0.0-0.2, 0.2-0.4, 0.4-0.6, 0.6-0.8 and 0.8-1.0)
  • P(SS level)
  • P(SS level | Sc)

21
Training
  • Probability distribution of intercollocability
  • Normalized shared collocates number to 5 levels
  • (0.0-0.2, 0.2-0.4, 0.4-0.6, 0.6-0.8 and 0.8-1.0)
  • P(SC level)
  • P(SC level | Sc)
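
The slides do not reproduce the formula that combines these distributions. One plausible reading, offered here only as an assumption, is a naive Bayes style score that treats the features as independent: P(Sc) multiplied by P(level | Sc) / P(level) for each feature. The probability tables, the prior and the candidate names below are invented for illustration.

```python
def combined_score(levels, p_level, p_level_given_sc, p_sc):
    """Assumed naive Bayes style score: P(Sc) * prod(P(level | Sc) / P(level))."""
    score = p_sc
    for feature, level in levels.items():
        denominator = p_level[feature][level]
        if denominator == 0.0:
            return 0.0  # level never seen in training
        score *= p_level_given_sc[feature][level] / denominator
    return score

# Invented probability tables, indexed by level 0..4 for each feature.
p_level = {name: [0.2] * 5 for name in ("MI", "SS", "SC")}
p_level_given_sc = {
    "MI": [0.05, 0.10, 0.20, 0.30, 0.35],
    "SS": [0.05, 0.10, 0.15, 0.30, 0.40],
    "SC": [0.10, 0.15, 0.20, 0.25, 0.30],
}
p_sc = 0.1  # invented prior probability that a candidate is correct

# Hypothetical candidates with invented feature levels.
candidates = {
    "candidate_a": {"MI": 4, "SS": 3, "SC": 2},
    "candidate_b": {"MI": 1, "SS": 2, "SC": 2},
}
scores = {
    name: combined_score(levels, p_level, p_level_given_sc, p_sc)
    for name, levels in candidates.items()
}
ranked = sorted(scores, key=scores.get, reverse=True)
print(ranked, scores)
```

Passing only a subset of features in `levels` gives the score for a model that uses that subset.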

22
Experiments
  • Different combinations of the three features.

Models  Feature(s) considered
M1      MI (Mutual Information)
M2      SS (Semantic Similarity)
M3      SC (Shared Collocates)
M4      MI + SS
M5      MI + SC
M6      SS + SC
M7      MI + SS + SC
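
The seven models are all non-empty subsets of the three features. A short sketch that enumerates them in the same order as the table, assuming the labels M1 to M7 are assigned by subset size and then by feature order:

```python
from itertools import combinations

FEATURES = ("MI", "SS", "SC")

# Enumerate every non-empty feature subset, smallest subsets first.
models = {}
index = 1
for size in range(1, len(FEATURES) + 1):
    for combo in combinations(FEATURES, size):
        models[f"M{index}"] = combo
        index += 1

for name, combo in models.items():
    print(name, " + ".join(combo))
```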
23
Results
K-Best  M1 (MI)  M2 (SS)  M3 (SC)  M4 (MI+SS)  M5 (MI+SC)  M6 (SS+SC)  M7 (MI+SS+SC)
1       16.67    40.48    22.62    48.81       29.76       55.95       53.75
2       36.90    53.45    38.10    60.71       44.05       63.10       67.86
3       47.62    64.29    50.00    71.43       59.52       77.38       78.57
4       52.38    67.86    63.10    77.38       72.62       80.95       82.14
5       64.29    75.00    72.62    83.33       78.57       83.33       85.71
6       65.48    77.38    75.00    85.71       83.33       84.52       88.10
7       67.86    77.38    77.38    86.90       86.90       86.90       89.29
8       70.24    80.95    82.14    86.90       89.29       88.10       91.67
9       72.62    83.33    85.71    88.10       92.86       90.48       92.86
10      76.19    86.90    88.10    88.10       94.05       90.48       94.05
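
The slides do not define how the figures in this table were computed. One common reading of a K-best evaluation, assumed here, is the percentage of test items for which at least one acceptable suggestion appears among the top K candidates. The rankings below are the M2 lists shown on the following slides; the sets of acceptable verbs are invented stand-ins for the human evaluators' judgments.

```python
def k_best_hit_rate(rankings, acceptable, k):
    """Percentage of items with at least one acceptable suggestion in the top k."""
    hits = 0
    for item, ranking in rankings.items():
        if any(verb in acceptable[item] for verb in ranking[:k]):
            hits += 1
    return 100.0 * hits / len(rankings)

# M2 rankings as listed on the slides for three miscollocations.
rankings = {
    "get knowledge": ["aim", "generate", "draw", "obtain", "develop"],
    "reach purpose": ["achieve", "teach", "explain", "account", "trade"],
    "pay time": ["devote", "spend", "expend", "spare", "invest"],
}

# Invented sets of acceptable verbs, for illustration only.
acceptable = {
    "get knowledge": {"acquire", "obtain", "gain"},
    "reach purpose": {"achieve", "fulfill"},
    "pay time": {"spend", "devote"},
}

rate_at_1 = k_best_hit_rate(rankings, acceptable, k=1)
rate_at_4 = k_best_hit_rate(rankings, acceptable, k=4)
print(round(rate_at_1, 2), round(rate_at_4, 2))
```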
24
Results (cont.)
The K-Best suggestions for "get knowledge".
K-Best  M2        M6        M7
1       aim       obtain    acquire
2       generate  share     share
3       draw      develop   obtain
4       obtain    generate  develop
5       develop   acquire   gain
25
The K-Best suggestions for "reach purpose".
K-Best  M2        M6        M7
1       achieve   achieve   achieve
2       teach     account   account
3       explain   trade     trade
4       account   treat     fulfill
5       trade     allocate  serve
26
The K-Best suggestions for "pay time".
K-Best  M2        M6        M7
1       devote    spend     spend
2       spend     invest    waste
3       expend    devote    devote
4       spare     date      invest
5       invest    waste     date
27
Conclusion
  • A probabilistic model integrates the three
    features.
  • Early experimental results show the potential
    of this research.

28
Future work
  • Applying such mechanisms to other types of
    miscollocations.
  • Miscollocation detection will be one of the main
    points of this research.
  • A larger number of miscollocations should be
    included in order to verify our approach.

29
  • Thank you!
  • Q & A
  • Anne Li-E Liu lel29_at_cam.ac.uk
  • David Wible wible45_at_yahoo.com
  • Nai-Lung Tsao beaktsao_at_gmail.com