Title: Word clusters and the idiomaticity of learner English
1Word clusters and the idiomaticity
of learner English
- xujiajin_at_bfsu.edu.cn
- Beijing Foreign Studies University
2Major points
- Defining word clusters
- Hypothesis and methodology
- DIY step by step
- Data display and interpretation
3An alternative view of language
- Some linguists see language as an edifice
assembled from a large stock of ready-made
templates or building blocks.
4Some Chinese examples
- "Have you had your dinner?"
- "Where are you going?"
- "That's it."
- "That is to say", "kind of"
- "Then", "so"
- Such lexical phrases are stored and uttered in one
spurt at a time, and are highly frequent in use.
5Word clusters
Defining clusters
- A word cluster is a group of words which follow
each other in a text (Scott 2004). In our case,
we take a frequency-driven approach to clusters.
6Aliases of cluster
Defining clusters
- They are similar to phrases in most pedagogic
grammars, but go by many confusingly different
names in the literature: formulaic sequences,
lexical bundles, clusters, chunks, multi-word
expressions, recurrent word combinations,
prefabs, n-grams, etc.
7Word list multi-word list
Defining clusters
- How is a chunk calculated?
- Based on frequency
- Lexical chunks are generated in the same way as a
word list, but with multi-word units as entries.
8How is a cluster calculated (e.g. 3-word)?
Defining clusters
- The idea of respect comes from the concept that
everyone, including yourself, has self-worth, and
therefore should be treated with dignity. Say,
for example, that you're having a discussion with
your boyfriend or girlfriend and your opinions
are different. While you may disagree with each
other, each of you still has a right to your own
feelings.
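The 3-word window that moves across the passage above can be sketched in a few lines of Python. This is only an illustration, not the tool used in the study: the function name `word_clusters`, the lowercasing, and the choice to let clusters run across punctuation are all assumptions made for the sketch.

```python
import re

def word_clusters(text, n=3):
    """Return every run of n consecutive words, in order of appearance."""
    # Assumption: lowercase everything and keep apostrophes/hyphens inside
    # words, so "you're" and "self-worth" each count as one word.
    words = re.findall(r"[a-z]+(?:['-][a-z]+)*", text.lower())
    # Slide an n-word window across the text, one word at a time.
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]

sample = ("The idea of respect comes from the concept that everyone, "
          "including yourself, has self-worth, and therefore should be "
          "treated with dignity.")

# Print the first few 3-word clusters of the sample sentence.
for cluster in word_clusters(sample)[:4]:
    print(cluster)
```

Each step of the window yields one candidate cluster; frequency counting over many texts then decides which candidates are kept.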
16Cluster and idiomaticity
Hypothesis and method
- When a word cluster occurs very frequently, this
may indicate that the cluster is formulaic and
thus able to enhance idiomaticity and fluency
(Wray 2002).
17Cluster and idiomaticity
Hypothesis and method
- Studies in pattern grammar (Hunston 1996), the
lexical approach (Lewis 1993) to language
teaching, etc. show that native speakers use many
more chunks in their language production than L2
learners do. Altenberg (1998) reported that 96%
of native speakers' language follows some
prefabricated patterns.
18Cluster and idiomaticity
Hypothesis and method
- It is, therefore, believed that the more
proficient an L2 learner is, the more formulaic
language he or she uses. Such formulaicity makes
the learner's language more idiomatic and fluent.
19Measuring the idiomaticity of learner language
Hypothesis and method
- Ideally, it would be best to extract a set of
clusters used in native speakers' language, and
to use that set of clusters to measure the
idiomaticity of L2 learners' language.
20Measuring the idiomaticity of learner language
Hypothesis and method
- However, this is often difficult, as there may be
thousands of such clusters (Pawley & Syder 1983),
and clusters are often content-related (Schmitt &
Carter 2004). It is not easy to find native
speakers' language data on the same topics.
21Measuring the idiomaticity of learner language
Hypothesis and method
- If NS data cannot be found, could proficient L2
learners' language data serve the same purpose?
- Word clusters from proficient L2 learners'
language as the measure of learner language
idiomaticity.
22Data and Methodology
- 90 essays written by university students, with
scores assigned by expert human raters
- The 30 best compositions were chosen from the 90
- The most frequent 3- and 4-word clusters were
extracted from the 30 best compositions
23Methodology
- Each of the remaining 60 texts was searched to
count the clusters found in the 30 best essays
- Statistical analysis was then conducted to see
whether the number of clusters contained in each
of the 60 compositions correlates with its essay
score.
24Flow chart
[Flow chart: 90 essays -> 30 Best / 60 Others;
30 Best -> Ngram list -> Clusters;
Clusters + 60 Others -> PatCount search (file-based concordancing)
-> No. of clusters in each of the 60 texts -> Correlation analysis]
25DIY steps
- Cluster/Ngram extraction
- 1. Choose the 30 best students' files
- 2. Compute 3- and 4-word clusters
- 3. Save cluster results
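As a rough stand-in for the cluster extraction step, the sketch below builds a frequency-ranked list of 3- and 4-word clusters from a handful of texts in plain Python. The names `tokenize` and `frequent_clusters`, the `min_freq` cut-off, and the two toy sentences are assumptions for illustration; the slides do not specify a threshold.

```python
from collections import Counter
import re

def tokenize(text):
    # Assumption: lowercase words, keeping internal apostrophes and hyphens.
    return re.findall(r"[a-z]+(?:['-][a-z]+)*", text.lower())

def frequent_clusters(texts, sizes=(3, 4), min_freq=2):
    """Return (cluster, frequency) pairs, most frequent first."""
    counts = Counter()
    for text in texts:
        words = tokenize(text)
        for n in sizes:
            # Count every n-word window in this text.
            counts.update(" ".join(words[i:i + n])
                          for i in range(len(words) - n + 1))
    # Keep only clusters that recur at least min_freq times overall.
    return [(c, f) for c, f in counts.most_common() if f >= min_freq]

# Toy sentences standing in for the 30 best essays (invented for the sketch).
best_essays = [
    "On the other hand, it is not easy to find a job.",
    "It is not easy to learn, but on the other hand it is useful.",
]

for cluster, freq in frequent_clusters(best_essays)[:5]:
    print(freq, cluster)
```

Saving the result (step 3) would then amount to writing one cluster per line to a text file.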
26DIY steps
- Counting clusters
- 4. PatCount is used to search the 60 texts for
the 3- and 4-word clusters.
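For readers without PatCount, the counting step can be approximated as follows. This is not a description of how PatCount works internally; it is a minimal token-based sketch, and `count_cluster_hits` is a hypothetical helper name. It assumes the clusters are lowercase and space-separated, and it counts overlapping clusters separately (so "on the other" and "on the other hand" would both score on the same stretch of text).

```python
import re

def tokenize(text):
    # Assumption: same tokenisation as used when the clusters were extracted.
    return re.findall(r"[a-z]+(?:['-][a-z]+)*", text.lower())

def count_cluster_hits(text, clusters):
    """Count how many times any of the listed clusters occurs in text."""
    words = tokenize(text)
    targets = {tuple(c.split()) for c in clusters}
    sizes = {len(t) for t in targets}
    hits = 0
    for n in sizes:
        # Check every n-word window against the target clusters.
        for i in range(len(words) - n + 1):
            if tuple(words[i:i + n]) in targets:
                hits += 1
    return hits

clusters = ["on the other hand", "it is not"]
essay = "It is not cheap. On the other hand, it is not bad."
print(count_cluster_hits(essay, clusters))
```

Running this over each of the 60 texts gives one cluster count per essay, which is the figure that enters the correlation analysis.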
27DIY steps
- Correlation analysis
- 5. The cluster counts for each text are
correlated with the rater-assigned scores.
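The slides do not say which correlation coefficient was computed. Assuming Pearson's r, the calculation can be sketched as below; the function name `pearson_r` and the five pairs of numbers are invented for illustration and are not data from the study.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation between two equally long lists of numbers."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length (at least 2 items)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of cross-products and sums of squared deviations.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)

# Made-up toy values: cluster counts and rater scores for five essays.
toy_cluster_counts = [2, 5, 3, 8, 6]
toy_scores = [60, 72, 65, 88, 80]
print(round(pearson_r(toy_cluster_counts, toy_scores), 3))
```

A coefficient close to 1 would support the hypothesis that essays containing more of the clusters tend to receive higher scores; a statistics package would additionally report a significance level.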
28Data display
29Correlation coefficient
[Chart: correlation coefficients, axis from 0 to 1]
30Summary
- Content
- Notion of cluster
- Methodology
- The integration of corpus techniques and
statistical tests.
31Some reflections
- In-depth analysis of the results or the
underlying rationale
- Functional grouping of clusters
- Use of clusters across different proficiency
levels
- Tag-sequence/POS-gram/colligation
32Thank you!
33Assignment
- The correlation between tag sequences/POS-grams and
idiomaticity