Title: Spoken multimedia corpora for pedagogical purposes
1Spoken multimedia corpora for pedagogical purposes
Birmingham Corpus Linguistics Conference 2007
- Sabine Braun (University of Surrey)
- Pascual Pérez-Paredes (Universidad de Murcia)
- Ylva Berglund (Oxford University)
2Introduction
- The usefulness of corpora in language pedagogy is widely recognised.
- But there is a need for pedagogically relevant corpora, reflected e.g. in initiatives to create 'ad-hoc' corpora in pedagogical contexts.
- The creation of pedagogically relevant corpora raises challenges for corpus design.
- Past and current initiatives have largely focussed on written corpora, but spoken discourse is becoming more important in pedagogical contexts.
- The creation of pedagogically relevant spoken corpora raises additional challenges for corpus design.
3The challenges (1)
- Corpora contain textual records of discourse; their interpretation requires (re-)contextualisation.
- Learners may have difficulties analysing corpus data; they require pedagogical mediation.
- Pedagogical corpus uses differ from linguistic description; this requires e.g. pedagogically motivated query options.
- Corpora need to be integrated with curricula; this requires e.g. complementarity of content and effective delivery.
- CORPUS DESIGN: traditional reference corpora (content, size, data format, transcription, annotation, query)
- CORPUS EXPLOITATION: Data-Driven Learning (focus on non-linear reading: concordances and co-texts)
These do not fully support pedagogical requirements.
4The challenges (2)
- Spoken discourse is more dependent on shared physical contexts.
- It is adjusted to aural and online perception (e.g. chunking).
- It is affected by limitations of processing capacity (false starts, repair).
- It is marked by accents.
- It is multimodal.
- CORPUS DESIGN: traditionally, representation in written format
- CORPUS EXPLOITATION: work with text-only data and e.g. conversational markup
Again, this does not fully support pedagogical requirements.
5Requirements
- Format: multimedia, to retain the multimodal character of spoken language
- Content: complementary with curriculum topics, more coherence than in traditional corpora
- Pedagogically motivated transcription, annotation and alignment (transcript-video)
- Combination of query methods: text-based exploration and application of corpus techniques
- Pedagogical enrichment of corpora with complementary resources (e.g. exercises, explanations)
- Effective delivery of corpora and additional resources to learners/teachers
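The transcript-video alignment requirement can be illustrated with a minimal Python sketch. It assumes each transcript section carries a start and end time in seconds. The `Section` layout, the `section_at` helper and the sample data are hypothetical and are not taken from ELISA or SACODEYL.

```python
from bisect import bisect_right
from dataclasses import dataclass


@dataclass(frozen=True)
class Section:
    """One transcript section aligned to a stretch of video (hypothetical layout)."""
    start: float  # seconds from the start of the clip
    end: float    # seconds; the section covers start <= t < end
    topic: str
    text: str


def section_at(sections, seconds):
    """Return the section playing at `seconds`, or None if no section covers it.

    Assumes `sections` is sorted by start time and sections do not overlap.
    """
    starts = [s.start for s in sections]
    i = bisect_right(starts, seconds) - 1  # last section starting at or before `seconds`
    if i >= 0 and seconds < sections[i].end:
        return sections[i]
    return None


# Invented sample data, for illustration only.
interview = [
    Section(0.0, 12.5, "Holidays", "Last summer we went to the coast."),
    Section(12.5, 30.0, "Plans for the future", "I would like to study medicine."),
]
```

With alignment data of this kind, a player could highlight the transcript section for the current video position, or jump to the video position of a section found through a query.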
6Corpus creation (1)
- Examples: ELISA and SACODEYL
- Interview format
- Video clips with transcript
- ELISA
- Professional English
- Accounts of professional life
- Different varieties
- SACODEYL
- 7 European languages
- Youth language corpora
- Speakers aged 13-15 and 16-18
- Communicatively relevant topics, e.g. in SACODEYL topics outlined in the Common European Framework
- Elicitation process: briefing informants and prompting them during the interview, ensuring naturally flowing discourse
7Corpus creation (1)
Example of topics in SACODEYL
| Topic | Interview questions | Age | CEF | Gramm. functions |
|---|---|---|---|---|
| Holidays | Where did you spend your last holidays? | 13-15, 16-18 | A2: can describe past activities, personal experiences | Past tense |
| | What are your plans for the next holidays? | 13-15 | B1: can describe dreams, hopes and ambitions | Future, Conditional, Modal verbs |
| Plans for the future | What are your plans for your career? | 16-18 | B1: can explain/give reasons for my plans, intentions and actions | Future |
| | On what grounds do you decide? | 16-18 | B2: can speculate about causes, consequences, hypothetical situations | Conditional, Modal verbs |
8Corpus creation (2)
CONTINUUM: RAW, ORTHOGRAPHIC TRANSCRIPTION to ANNOTATED CORPORA
Transcription
TEI-compliant corpora
Markup
Pedagogic annotation
XML files
9Corpus creation (2)
Transcription
SACODEYL TRANSCRIPTOR
TEI-compliant corpora
Markup
SACODEYL ANNOTATOR
Pedagogic annotation
XML files
10Corpus creation (3)
SACODEYL TRANSCRIPTOR
11Corpus creation (2)
- Language: ES
- Media file name: ES02.avi
- Participants:
- Person "Chico": name (blank), role Entrevistado (interviewee), sex Hombre (male), age 16, description (blank)
- Person "E": name Andrés Mercader Rodríguez, role Entrevistador (interviewer), sex Hombre (male), age 32, description (blank)
- METADATA
- Title: La Unión Europea une a los ciudadanos
- Date Recording: 2006-11-05
- Date Transcription: 2007-02-02
- Locale: I.E.S. Floridablanca, Murcia, España
- Principal Investigator: Pascual Perez-Paredes
- Researcher: Pascual Perez-Paredes
- Transcriber: Encarnación Tornero Valero
- Editor
- Authority: SACODEYL Project
- ID
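As a rough illustration of how a header like this might be read by a program, the Python sketch below parses a small XML fragment with the standard library. The element and attribute names (`METADATA`, `Language`, `MediaFileName`, `person`, `id`) are assumptions modelled on the field labels above. The actual SACODEYL schema may differ, and `read_header` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Hypothetical header: tag and attribute names are assumptions, not the real schema.
SAMPLE_HEADER = """
<METADATA>
  <Language>ES</Language>
  <MediaFileName>ES02.avi</MediaFileName>
  <Participants>
    <person id="Chico">
      <role>Entrevistado</role><sex>Hombre</sex><age>16</age>
    </person>
    <person id="E">
      <name>Andrés Mercader Rodríguez</name>
      <role>Entrevistador</role><sex>Hombre</sex><age>32</age>
    </person>
  </Participants>
</METADATA>
"""


def read_header(xml_text):
    """Return language, media file and participants from a header string."""
    root = ET.fromstring(xml_text.strip())
    participants = [
        {
            "id": person.get("id"),
            "name": person.findtext("name", default=""),  # blank if absent
            "role": person.findtext("role", default=""),
            "age": int(person.findtext("age")),
        }
        for person in root.iter("person")
    ]
    return {
        "language": root.findtext("Language"),
        "media": root.findtext("MediaFileName"),
        "participants": participants,
    }


header = read_header(SAMPLE_HEADER)
```

Keeping participant data in a structured header of this kind is what would allow a query interface to filter interviews by, for example, speaker age group.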
18Corpus query
- Query options will support text- and corpus-based exploration and include e.g.:
- Easy access to entire interviews
- A topic index supporting the analysis of similar sections across interviews ("topic concordances")
- Other indices based on the annotation categories
- Ready-made data (e.g. frequency lists of each interview, selective concordances)
- A concordancer for extended/advanced search adapted to pedagogical requirements
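The query options listed above can be sketched in a few lines of Python: a frequency list, a keyword-in-context (KWIC) concordance, and a topic index that groups sections from different interviews. This is a minimal illustration only, not the SACODEYL query tool. The function names, the simple `\w+` tokenisation and the sample sections are all assumptions.

```python
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lower-case the text and split it into word tokens (a deliberately naive rule)."""
    return re.findall(r"\w+", text.lower())


def frequency_list(text, n=10):
    """Return the n most frequent tokens as (token, count) pairs."""
    return Counter(tokenize(text)).most_common(n)


def kwic(text, keyword, width=3):
    """Return (left context, keyword, right context) for each hit of `keyword`."""
    tokens = tokenize(text)
    target = keyword.lower()
    hits = []
    for i, token in enumerate(tokens):
        if token == target:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            hits.append((left, token, right))
    return hits


def topic_index(sections):
    """Group (interview_id, topic, text) sections by topic."""
    index = defaultdict(list)
    for interview_id, topic, text in sections:
        index[topic].append((interview_id, text))
    return dict(index)


# Invented sample sections, for illustration only.
SECTIONS = [
    ("int01", "Holidays", "Last summer we went to the coast and we went swimming."),
    ("int02", "Holidays", "We went to the mountains last year."),
    ("int01", "Plans for the future", "I would like to study medicine."),
]

holidays = topic_index(SECTIONS)["Holidays"]
```

Running `kwic` over every section returned for one topic would give a simple "topic concordance", i.e. concordance lines drawn only from thematically similar sections across interviews.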
20Pedagogical enrichment
- The corpora will be enriched with prototypical learning activities.
- These will focus on one interview section, one interview as a whole, or sections across interviews.
- They will include e.g.:
- linguistic and cultural explanations and exercises (form-focussed as well as communication-oriented),
- (listening) comprehension and production tasks,
- explorative tasks (concordance-based as well as interview-based).
- Use of the authoring tool Telos Language Partner to create learning packages with ranges of activities.
25Corpus delivery
- Effective delivery as a further prerequisite for integration into the curriculum
- In SACODEYL, use of the Moodle learning platform, giving access to:
- Corpora (query interfaces)
- Resources created in the project (different types of learning activities)
- Resources created by future corpus users
26Summary
- Method outlined is transferable to other pedagogical contexts, topics, languages
- Method helps to use corpora more efficiently in pedagogical contexts: from sporadically used resource to systematic exploitation
- Corpus creation complies with standards to facilitate reuse of corpora for other contexts (research)
- ?
27Contact
- Sabine Braun s.braun_at_surrey.ac.uk
- Pascual Pérez-Paredes sacodeyl_at_um.es
- Ylva Berglund ylva.berglund_at_oucs.ox.ac.uk
- And visit our poster session
- As well as our websites
- www.um.es/sacodeyl
- www.corpora4learning.net/elisa