Title: Anne Li-E Liu
1Automated Suggestions for Miscollocations
- Anne Li-E Liu
- David Wible
- Nai-lung Tsao
2Overview
- Introduction
- Methodology
- Experimental Results
- Conclusion
3Introduction
- Our study focuses on how to find suggestions for miscollocations automatically.
- In this paper, only verb-noun collocations and miscollocations are considered.
4Introduction
- Howarth's (1998) investigation of collocations found in L1 and L2 writers' writing.
- Granger's (1998) analysis of adverb-adjective collocations.
- Liu's (2002) lexical semantic analysis of the verb-noun miscollocations in the English Taiwan Learner Corpus.
5Introduction
- Projects using learner corpora to analyze and categorize learner errors:
- NICT JLE (Japanese Learner English) Corpus
- The Chinese Learner English Corpus (CLEC)
- English Taiwan Learner Corpus (or TLC) (Wible et al., 2003).
6An example
1. solve
2. pose
3. tackle
4. grapple
5. alleviate
6. overcome
7. exacerbate
8. compound
9. beset
10. resolve
- She tries to improve her students' problems.
7Method
- Three features of collocate candidates are used:
- 1. Word association strength
- 2. Semantic similarity
- 3. Intercollocability (Cowie and Howarth, 1996)
8Resource
- 84 VN miscollocations in TLC (Liu, 2002).
- Training data: 42; testing data: 42
- Two knowledge resources: BNC, WordNet
- Two human evaluators.
9Word Association Strength
- Mutual Information (Church et al., 1991)
- Two purposes:
- All suggested correct collocations have to be identified as collocations.
- The higher the word association strength, the more likely it is to be a correct substitute for the wrong collocate.
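As a rough illustration of word association strength, the sketch below computes pointwise mutual information for verb-noun pairs. The pair counts are invented toy numbers, not BNC figures, and the function name `pmi` is hypothetical; the slides do not say how the counts were extracted.

```python
import math
from collections import Counter

def pmi(pair_counts, verb_counts, noun_counts, verb, noun):
    """Pointwise mutual information (base 2) of a verb-noun pair."""
    total_pairs = sum(pair_counts.values())
    joint = pair_counts[(verb, noun)] / total_pairs
    if joint == 0:
        return float("-inf")  # the pair was never observed
    p_verb = verb_counts[verb] / total_pairs
    p_noun = noun_counts[noun] / total_pairs
    return math.log2(joint / (p_verb * p_noun))

# Toy verb-noun pair counts, invented for illustration only.
pair_counts = Counter({
    ("solve", "problem"): 8,
    ("improve", "problem"): 1,
    ("improve", "quality"): 6,
    ("solve", "puzzle"): 1,
})

# Marginal counts derived from the pair counts.
verb_counts, noun_counts = Counter(), Counter()
for (v, n), c in pair_counts.items():
    verb_counts[v] += c
    noun_counts[n] += c

solve_score = pmi(pair_counts, verb_counts, noun_counts, "solve", "problem")
improve_score = pmi(pair_counts, verb_counts, noun_counts, "improve", "problem")
```

With these toy counts, "solve problem" receives a higher score than "improve problem", which is the behaviour the second purpose above relies on.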
10Semantic Similarity
- A semantic relation holds between a miscollocate and its correct counterpart (Gitsaki et al., 2000; Liu, 2002).
- Treat the synsets of WordNet as nodes in a graph and measure the graph-theoretic distance between them.
[Figure: say a story; tell a story; think of a story]
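The graph-theoretic distance idea can be sketched with a breadth-first search over a small graph. The graph below is a hypothetical stand-in with word-level nodes, not real WordNet synsets, and the mapping from distance to a score in [0, 1] (here 1 / (1 + distance)) is an assumption, since the slides do not give the exact formula.

```python
from collections import deque

def shortest_path_length(graph, start, goal):
    """Number of edges on the shortest path, or None if unreachable."""
    if start == goal:
        return 0
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, dist = queue.popleft()
        for neighbour in graph.get(node, ()):
            if neighbour == goal:
                return dist + 1
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, dist + 1))
    return None

def path_similarity(graph, a, b):
    """Assumed score: 1 / (1 + distance), 0.0 when no path exists."""
    dist = shortest_path_length(graph, a, b)
    return 0.0 if dist is None else 1.0 / (1 + dist)

# Hypothetical undirected graph; edges are invented for illustration.
edges = [
    ("say", "express"), ("tell", "express"), ("express", "communicate"),
    ("communicate", "act"), ("act", "create"), ("create", "conceive"),
    ("conceive", "think_of"),
]
graph = {}
for a, b in edges:
    graph.setdefault(a, set()).add(b)
    graph.setdefault(b, set()).add(a)

sim_tell = path_similarity(graph, "say", "tell")
sim_think = path_similarity(graph, "say", "think_of")
```

In this toy graph "tell" sits closer to "say" than "think_of" does, so it would be preferred as a substitute in "say a story".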
11Semantic Similarity
12Intercollocability
- Cowie and Howarth (1996) propose that certain collocations form clusters on the basis of shared meaning.
[Figure: cluster of collocations: convey (the) point, get across the message, communicate concern, convey feeling, express concern]
13Intercollocability
- Collocations in a cluster show a certain degree of intercollocability.
[Figure: express / communicate + one's concern / feeling; "condolences" is marked with "?"]
14Intercollocability
- She tries to improve her students' problems.
- Starting point: improve problem.
- problem: 86 verb collocates
- improve: 52 noun collocates
[Figure: problem links to its verb collocates resolve and reduce; resolve, reduce and improve link to the nouns situation, matter and way]
15Intercollocability
[Figure: noun collocates situation, matter, problem, way, quality, efficiency, effectiveness; resolve and reduce are linked to situation, matter, problem, way]
- The cluster is partially created and the link
between improve, resolve and reduce is developed
by virtue of the overlapping noun collocates.
16Intercollocability
- Quantify intercollocability
- The number of shared collocates
17Intercollocability
[Figure: noun collocates situation, matter, problem, way, quality, efficiency, effectiveness; resolve and reduce are linked to situation, matter, problem, way]
- shared collocates (resolve, improve) = 3
- shared collocates (reduce, improve) = 3
- The more shared collocates a verb has with the wrong verb, the more likely this verb is a good candidate.
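Counting shared collocates amounts to a set intersection. The sketch below uses the noun sets as read off the slide figure (an interpretation of the diagram, not data confirmed by the authors); the entry for "pose" and the function names are invented for illustration.

```python
# Noun collocates per verb. The first three sets are an assumed reading
# of the slide figure; "pose" is an invented contrast case.
noun_collocates = {
    "improve": {"situation", "matter", "way",
                "quality", "efficiency", "effectiveness"},
    "resolve": {"situation", "matter", "problem", "way"},
    "reduce": {"situation", "matter", "problem", "way"},
    "pose": {"problem", "question", "threat"},
}

def shared_collocates(verb_a, verb_b, table):
    """Number of noun collocates the two verbs have in common."""
    return len(table[verb_a] & table[verb_b])

def rank_candidates(wrong_verb, candidates, table):
    """Sort candidates by shared collocates with the wrong verb, best first."""
    return sorted(
        candidates,
        key=lambda verb: shared_collocates(verb, wrong_verb, table),
        reverse=True,
    )

ranking = rank_candidates("improve", ["pose", "resolve", "reduce"], noun_collocates)
```

Here resolve and reduce each share three nouns with improve, matching the counts on the slide, while the invented "pose" shares none and is ranked last.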
18Integrate the 3 features
19Training
- Probability distribution of word association strength
- MI value to 5 levels
- (<1.5, 1.5-3.0, 3.0-4.5, 4.5-6, >6)
- P(MI level)
- P(MI level | Sc)
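Mapping MI values to five levels and estimating P(MI level) from training data might look like the sketch below. The boundary handling is an assumption (each interval is treated as closed on the left, so 1.5 falls in the second level and 6.0 in the top level), and the function names are hypothetical.

```python
from bisect import bisect_right
from collections import Counter

# Level boundaries taken from the slide.
MI_BOUNDARIES = [1.5, 3.0, 4.5, 6.0]

def mi_level(mi_value):
    """Return a level index 0..4 for an MI value (left-closed intervals)."""
    return bisect_right(MI_BOUNDARIES, mi_value)

def level_distribution(mi_values, n_levels=5):
    """Relative frequency of each level among the given MI values."""
    if not mi_values:
        raise ValueError("need at least one MI value")
    counts = Counter(mi_level(v) for v in mi_values)
    return [counts[level] / len(mi_values) for level in range(n_levels)]

# Toy MI values, invented for illustration.
distribution = level_distribution([0.2, 2.0, 2.5, 7.0])
```

Running the same function only over the MI values of candidates judged correct would give an estimate of P(MI level | Sc).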
20Training
- Probability distribution of semantic similarity
- Similarity score to 5 levels
- (0.0-0.2, 0.2-0.4, 0.4-0.6, 0.6-0.8 and 0.8-1.0)
- P(SS level)
- P(SS level | Sc)
21Training
- Probability distribution of intercollocability
- Normalized shared collocates number to 5 levels
- (0.0-0.2, 0.2-0.4, 0.4-0.6, 0.6-0.8 and 0.8-1.0)
- P(SC level)
- P(SC level | Sc)
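One plausible way to combine the three level distributions is a naive Bayes style score, P(Sc) multiplied by the product of P(level | Sc) and divided by the product of P(level). This is an assumption: the slides name a probabilistic model but do not show its formula. All probabilities below are invented toy values and `suggestion_score` is a hypothetical name.

```python
from math import prod

def suggestion_score(levels, prior, likelihoods, marginals):
    """Assumed naive Bayes score for a candidate.

    levels      -- feature name -> level index of this candidate
    prior       -- P(Sc), probability that a candidate is correct
    likelihoods -- feature name -> list of P(level | Sc)
    marginals   -- feature name -> list of P(level)
    """
    numerator = prior * prod(likelihoods[f][lv] for f, lv in levels.items())
    denominator = prod(marginals[f][lv] for f, lv in levels.items())
    return numerator / denominator if denominator else 0.0

# Toy distributions over the 5 levels, invented for illustration.
given_correct = [0.05, 0.15, 0.20, 0.30, 0.30]
overall = [0.30, 0.30, 0.20, 0.10, 0.10]
likelihoods = {"MI": given_correct, "SS": given_correct, "SC": given_correct}
marginals = {"MI": overall, "SS": overall, "SC": overall}

strong = suggestion_score({"MI": 4, "SS": 3, "SC": 4}, 0.02, likelihoods, marginals)
weak = suggestion_score({"MI": 0, "SS": 1, "SC": 0}, 0.02, likelihoods, marginals)
```

Passing only a subset of features in `levels` gives the score for a smaller feature combination. Because the independence assumption is only approximate, the value is best read as a ranking score rather than a calibrated probability.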
22Experiments
- Different combinations of the three features.
Model  Feature(s) considered
M1     MI (Mutual Information)
M2     SS (Semantic Similarity)
M3     SC (Shared Collocates)
M4     MI + SS
M5     MI + SC
M6     SS + SC
M7     MI + SS + SC
23Results
K-Best  M1 (MI)  M2 (SS)  M3 (SC)  M4 (MI+SS)  M5 (MI+SC)  M6 (SS+SC)  M7 (MI+SS+SC)
1       16.67    40.48    22.62    48.81       29.76       55.95       53.75
2       36.90    53.45    38.10    60.71       44.05       63.10       67.86
3       47.62    64.29    50.00    71.43       59.52       77.38       78.57
4       52.38    67.86    63.10    77.38       72.62       80.95       82.14
5       64.29    75.00    72.62    83.33       78.57       83.33       85.71
6       65.48    77.38    75.00    85.71       83.33       84.52       88.10
7       67.86    77.38    77.38    86.90       86.90       86.90       89.29
8       70.24    80.95    82.14    86.90       89.29       88.10       91.67
9       72.62    83.33    85.71    88.10       92.86       90.48       92.86
10      76.19    86.90    88.10    88.10       94.05       90.48       94.05
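A K-Best figure of this kind can be read as the percentage of miscollocations for which at least one acceptable suggestion appears among the top K candidates. That reading is an assumption, as the slides do not define the measure. In the sketch below, the ranked lists are hypothetical and the sets of acceptable verbs are invented; they are not the human evaluators' judgments.

```python
def k_best_hit_rate(ranked_suggestions, acceptable, k):
    """Percentage of items with an acceptable suggestion in the top k."""
    hits = sum(
        1
        for item, ranked in ranked_suggestions.items()
        if any(verb in acceptable[item] for verb in ranked[:k])
    )
    return 100.0 * hits / len(ranked_suggestions)

# Hypothetical ranked suggestions for three miscollocations.
ranked_suggestions = {
    "get knowledge": ["aim", "generate", "draw", "obtain", "develop"],
    "reach purpose": ["achieve", "teach", "explain", "account", "trade"],
    "pay time": ["devote", "spend", "expend", "spare", "invest"],
}

# Invented sets of acceptable verbs, for illustration only.
acceptable = {
    "get knowledge": {"acquire", "obtain", "gain"},
    "reach purpose": {"achieve", "fulfill", "serve"},
    "pay time": {"spend", "devote"},
}

rate_at_1 = k_best_hit_rate(ranked_suggestions, acceptable, 1)
rate_at_4 = k_best_hit_rate(ranked_suggestions, acceptable, 4)
```

The rate can only stay the same or rise as K grows, which matches the shape of every column in the table.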
24Results (cont.)
The K-Best suggestions for "get knowledge".
K-Best M2 M6 M7
1 aim obtain acquire
2 generate share share
3 draw develop obtain
4 obtain generate develop
5 develop acquire gain
25The K-Best suggestions for "reach purpose".
K-Best M2 M6 M7
1 achieve achieve achieve
2 teach account account
3 explain trade trade
4 account treat fulfill
5 trade allocate serve
26The K-Best suggestions for "pay time".
K-Best M2 M6 M7
1 devote spend spend
2 spend invest waste
3 expend devote devote
4 spare date invest
5 invest waste date
27Conclusion
- A probabilistic model to integrate features.
- Early experimental results show the potential of this approach.
28Future work
- Applying such mechanisms to other types of miscollocations.
- Miscollocation detection will be one of the main points of this research.
- A larger number of miscollocations should be included in order to verify our approach.
29Thank you!
- Q & A
- Anne Li-E Liu lel29_at_cam.ac.uk
- David Wible wible45_at_yahoo.com
- Nai-Lung Tsao beaktsao_at_gmail.com