Title: Learner corpora, error analysis
1Learner corpora, error analysis
2Learner corpora
3- Granger, S. 2003. Error-tagged Learner Corpora
and CALL: A Promising Synergy. CALICO Journal,
20(3), 465-480.
- Pravec, N. 2002. Survey of learner corpora. ICAME
Journal No. 26, 81-114.
- Tono, Y. 2003. Learner corpora: design,
development and applications. CL2003 workshop
paper. 800-809.
4What is a Learner Corpus?
- a corpus, or computer textual database, of the
language produced by foreign language learners
- G. Leech (1998) Introduction to Granger (ed.)
Learner English on Computer. London: Addison
Wesley Longman
5learner corpora
- Computer learner corpora are electronic
collections of authentic FL/SL textual data
assembled according to explicit design criteria
for a particular SLA/FLT purpose. They are
encoded in a standardised and homogeneous way and
documented as to their origin
6- There is nothing new in the idea of collecting
learner data. Both FLT and SLA researchers have
been collecting learner output for descriptive
and/or theory-building purposes since the
disciplines emerged. In view of this, it is
justified to ask what added value, if any, can be
gained from using learner corpus data.
- (Granger 2004: 123f.)
7- a new resource for second language acquisition
(SLA) and foreign language teaching (FLT)
specialists.
- especially useful when annotated with the help of
a standardized system of error tags.
8- Learner language differs from native language
both quantitatively and qualitatively.
- It displays very different frequencies of words,
phrases and structures, with some items overused
and others significantly underused.
- It is also characterized by a high rate of
misuse, i.e. orthographic, lexical, and
grammatical errors.
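The overuse/underuse comparison can be illustrated with a small frequency check. This is only a minimal sketch: the two token lists are toy data invented for illustration (not real corpus data), the names `rel_freq` and `usage` are hypothetical, and normalizing to occurrences per 1,000 tokens is an assumption made so that samples of different sizes stay comparable.

```python
from collections import Counter


def rel_freq(tokens, per=1000):
    """Return the frequency of each word per `per` tokens."""
    counts = Counter(tokens)
    total = len(tokens)
    return {word: count * per / total for word, count in counts.items()}


# Toy samples, invented for illustration only.
learner = "i think that it is very very important i think".split()
native = "it is arguably important and it seems that it matters".split()

learner_freq = rel_freq(learner)
native_freq = rel_freq(native)


def usage(word):
    """Label a word as overused, underused, or similar in the learner sample."""
    l = learner_freq.get(word, 0.0)
    n = native_freq.get(word, 0.0)
    if l > n:
        return "overused"
    if l < n:
        return "underused"
    return "similar"


for word in ("think", "it", "important"):
    print(word, usage(word))
```

A real study would not rely on a raw comparison like this one; it would normally add a significance test before calling a difference overuse or underuse.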
9- "The area of linguistic enquiry known as learner
corpus research ... has created an important
link between the two previously disparate fields
of corpus linguistics and foreign/second language
research. Using the main principles, tools and
methods from corpus linguistics, it aims to
provide improved descriptions of learner language
which can be used for a wide range of purposes in
foreign/second language acquisition research and
also to improve foreign language teaching."
(Granger 2002, 4)
10Features of Learner Corpora
- Storing the data in a machine-readable format
- Computational analysis can be exploited
- Annotations make corpora even more valuable
- Finding patterns vs. idiosyncrasies
- Can be shared with other researchers
- Can be used for research and for teaching/learning.
- Standard reference
11- Hence, it is vital to consult corpora of learner
data, as well as corpora of data that serve as
input to learners, before we can get a full
picture of what learners know and how they come
to know it.
- Alan Juffs (2001) in SSLA, p. 312
12What FLT fields will benefit from computer
learner corpus (CLC) research?
- curriculum design: use of CLC for selecting and
sequencing what needs to be taught
- materials design: use of CLC to improve FLT
tools (traditional grammars and dictionaries,
web-based materials and CALL programs)
- classroom methodology: use of CLC for data-driven
learning (DDL) and learning-based exercises
- Granger
13Mark-up and annotation
14Annotated Learner Data
- Mark-up: header info, sentence boundaries, etc.
- Useful for choosing a proper set (portion) of
files
- Annotation: POS tagging, error tagging, etc.
15Error analysis
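16CA (contrastive analysis), EA (error analysis),
CIA (contrastive interlanguage analysis),
CEA (computer-aided error analysis)
17ERROR: CA, EA; CIA, CEA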
18Traditional EA suffers from a number of
limitations
- Limitation 1: EA is based on heterogeneous
learner data
- Limitation 2: EA categories are fuzzy
- Limitation 3: EA cannot cater for phenomena such
as avoidance
- Limitation 4: EA is restricted to what the
learner cannot do
- Limitation 5: EA gives a static picture of L2
learning.
- (Dagneaux et al. 1998: 164)
19Criticism of EA
- Once a very popular enterprise, error analysis
(EA) is now out of favor with most SLA/FLT
circles. It has gone down in history as a fuzzy,
unscientific, and unreliable way of approaching
learner language.
20mistake vs. error
- mistake: performance
- error: competence; systematic, would occur in
similar context
21- errors are an integral part of interlanguage and
are just as worthy of analysis as any other IL
aspect.
- an important key to a better understanding of
the process underlying L2-learning. Ringbom
(1987, p. 69)
- can still serve as a useful tool and is still
undertaken. Ellis (1994, p. 20)
- In particular, a detailed description of learner
errors cannot but contribute to one essential FLT
aim: that of helping learners to achieve a high
level of accuracy in the language.
22goals of learner corpus research
- descriptively: find and classify errors, find
patterns in learner language
- theoretically: find out about learners' hypotheses
(interlanguage)
- improve teaching material
23learner corpora: what can you do with them?
- two main types of studies
- error analysis (EA)
- qualitative and quantitative
- contrastive interlanguage analysis (CIA)
- qualitative and quantitative
- know your data: comparability, ways of counting,
different distributions
24what is an error?
- "A linguistic form, ... which, in the same
context would in all likelihood not be produced
by the learner's native speaker counterparts."
(Lennon 1991, 182)
25- structural errors (breaking of a rule),
- non-structural errors,
- deviations from some kind of norm ('breaches of
code')
- quantitative differences (overuse, underuse)
26Steps in EA
- 1. collection of samples of learner language
- 2. identification of errors
- 3. description of errors
- 4. explanation of errors
- Ellis 1994
27error tags
- classification
- formal kind of error (insertion, deletion, ...)
- exponent of error (word, phrase, ...)
- hypothesis about reason (interference with L1,
principle X not understood, ...)
- linguistic level (morphology, syntax, ...)
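The four classification dimensions listed on this slide can be pictured as the fields of one record per error. The following is a minimal sketch, not a published tagset: the class `ErrorTag`, its field names, the helper `select`, and the three sample tags are all hypothetical and chosen only to mirror the bullet points.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ErrorTag:
    """One error annotation, with one field per classification dimension."""
    formal_kind: str      # e.g. "insertion", "deletion"
    exponent: str         # e.g. "word", "phrase"
    level: str            # e.g. "morphology", "syntax"
    hypothesis: str = ""  # optional guess at the cause, e.g. "L1 interference"


# Sample annotations, invented for illustration only.
tags = [
    ErrorTag("deletion", "word", "syntax", "L1 interference"),
    ErrorTag("insertion", "word", "syntax"),
    ErrorTag("deletion", "phrase", "morphology"),
]


def select(tags, **criteria):
    """Return the tags whose fields match every given criterion."""
    return [
        tag for tag in tags
        if all(getattr(tag, field) == value for field, value in criteria.items())
    ]


print(len(select(tags, formal_kind="deletion")))
print(len(select(tags, level="syntax", exponent="word")))
```

Keeping the hypothesis about the cause in a separate, optional field keeps the descriptive part of a tag apart from the interpretative part, so the descriptive fields can be queried on their own.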
28development of error tagsets
- the development of error tags is difficult
because
- there is often more than one possible target
hypothesis
- the level of granularity of the tagset depends on
the research question
- generic tagsets vs. specific tagsets
29Error tagging principles (1)
30Principles in error tagging
- 1. informative but manageable: it should be
detailed enough to provide useful information on
learner errors, but not so detailed that it
becomes unmanageable for the annotator
31- 2. reusable: the categories should be general
enough to be used for a variety of languages
32- 3. flexible: it should allow for addition or
deletion of tags at the annotation stage and for
quick and versatile retrieval at the
post-annotation stage
33- 4. consistent: to ensure maximum consistency
between the annotators, detailed descriptions of
the error categories and error tagging principles
should be included in an error tagging manual.
34two major descriptive error taxonomies
- (a) one based on linguistic categories (general
ones such as morphology, lexis, and grammar and
more specific ones such as auxiliaries, passives,
and prepositions) and
- (b) the other focusing on the way surface
structures have been altered by learners (e.g.,
omission, addition, misformation, and
misordering).
35three levels of annotation
- error domain,
- error category, and
- word category.
- These three levels are descriptive rather than
interpretative.
36- The error domain is the most general level: it
specifies whether the error is formal (i.e.
orthographic), grammatical, lexical, and so
forth. Each error domain is subdivided into a
variable number of error categories.
37Error Domains and Categories (1)
38Error Domains and Categories (2)
39Error Domains and Categories (3)
40- The lexical domain <L> groups all lexical errors
due to
- 1. insufficient knowledge of the conceptual
(i.e., denotative) meaning of words <SIG>
- 2. violations of the co-occurrence patterns of
words. This category covers a wide spectrum from
restricted collocations to idioms <FIG>
- 3. violations of the grammatical complementation
(i.e., valency) patterns of words. This category
covers the valency of verbs <CPV>, nouns <CPN>,
adjectives <CPA> and adverbs <CPD>
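The domain/category hierarchy described for the lexical domain can be held in a small lookup table. This is a minimal sketch: the codes L, SIG, FIG, CPV, CPN, CPA and CPD come from the slide, while the dictionary layout, the names `TAGSET` and `is_valid`, and the sample annotations (including the deliberately invalid code XYZ) are assumptions made for illustration.

```python
from collections import Counter

# Lexical domain only; other domains would be added as further keys.
TAGSET = {
    "L": {
        "SIG": "conceptual (denotative) meaning of words",
        "FIG": "co-occurrence patterns, from collocations to idioms",
        "CPV": "complementation (valency) of verbs",
        "CPN": "complementation (valency) of nouns",
        "CPA": "complementation (valency) of adjectives",
        "CPD": "complementation (valency) of adverbs",
    }
}


def is_valid(domain, category):
    """Check that a category code belongs to the given error domain."""
    return category in TAGSET.get(domain, {})


# Sample (domain, category) pairs, invented for illustration only.
annotations = [("L", "SIG"), ("L", "CPV"), ("L", "SIG"), ("L", "XYZ")]

# Count only the annotations that use a known code.
counts = Counter(cat for dom, cat in annotations if is_valid(dom, cat))
print(counts.most_common())
```

Validating codes against one shared table is one simple way to support consistency between annotators, since a mistyped tag is caught rather than counted as a new category.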
43CLEC
44Error tagging principles (2)
45- 1. Errors are grouped into 11 categories: word
form (fm), verb phrase (vp), noun phrase (np),
pronoun (pr), adjective phrase (aj), adverb (ad),
prepositional phrase (pp), conjunction (cj), word
(wd), collocation (cc), and syntax (sn). Each
category is divided into numbered types, e.g. cc1,
cc2, cc3, and so on.
46- 2. The number of types differs from category to
category (vp and np, for example, have 9 each);
there are 61 error types in all.
47- 3. Each tag follows the error, e.g. In the past,
people are vp6, 4- kind to each other. Here vp6
combines the category (vp) and the type (6); in
4-, the hyphen marks the position of the error and
4 is the number of preceding words relevant to it.
The error in this example is the word are.
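The tag in the example sentence can be pulled apart mechanically. This is a minimal sketch that assumes the layout visible in the example only: a two-letter category, a type number, a comma, and a position field with an optional number on either side of a hyphen. The regular expression, the function names `parse_tags` and `strip_tags`, and the field names `left` and `right` are hypothetical and do not reproduce any official CLEC tool.

```python
import re

# Two-letter category, type number, then "<left>-<right>" with optional numbers.
TAG = re.compile(r"\b([a-z]{2})(\d+),\s*(\d*)-(\d*)")


def parse_tags(text):
    """Return one dict per error tag found in the text."""
    found = []
    for category, err_type, left, right in TAG.findall(text):
        found.append({
            "category": category,
            "type": int(err_type),
            "left": int(left) if left else None,    # number before the hyphen
            "right": int(right) if right else None,  # number after the hyphen
        })
    return found


def strip_tags(text):
    """Remove the tags to recover the learner's original wording."""
    return TAG.sub("", text).replace("  ", " ")


sentence = "In the past, people are vp6, 4- kind to each other"
print(parse_tags(sentence))
print(strip_tags(sentence))
```

Separating the tag into fields makes it possible to count errors by category or by type, while `strip_tags` gives back the untagged text for ordinary concordancing.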
48Error tags
Positions of error
Types of error
49- 4. A type can be subdivided where finer
distinctions are needed, e.g. sn8 into sn81, sn82,
and so on.
63- Your further tagging or your own tagging
64CA to CIA
65CA, CIA
- CA: OL <> OL, SL <> TL
- DIAGNOSTIC, PREDICTIVE, TRANSFER
- CIA: NL <> IL, IL <> IL
66L2 Corpus (English)
L1 Corpus (Chinese)
L1 Corpus (English)
67LOCNESS
CLEC/ SWECCL