Title: Using A Corpus of Spontaneous Speech
Structure of the Class
- Introduction
- Some basic questions
- What is a corpus?
- What are corpora for?
- What are corpora especially good for?
- What is coding?
- An unusual corpus: a designed corpus of spontaneous task-oriented speech
- Some exercises using a coded corpus to learn about spontaneous speech
- Do people speak in whole sentences?
- Do people take turns speaking?
- What are the units of dialogue?
- When do people look at one another?
- When are people disfluent?
1.a. Some basic questions
- What is a corpus?
- A corpus is a collection of language materials produced by real people engaged in real activities, varying in:
- Genre
- Single genre (Canadian Hansard, Wall Street Journal)
- Sampling across genres (British National Corpus)
- Modality of communication
- Written (novels of Jane Austen)
- Spoken (Switchboard)
- Purpose of communication
- Chat (Switchboard, Phone Home)
- Task completion (MIT restaurant and hotel finder)
1.a. Some basic questions
- What are corpora for?
- Discovering how language is used: norming
- Lexicography: words and contexts for dictionaries
- Pedagogy: getting students used to the language as they will hear or read it
- Sociolinguistics and historical linguistics
- Neurolinguistics
- Speech and language technologies: to train computational systems for
- Automatic summarization or indexing of text
- Automatic comprehension of text
- Automatic recognition of speech
- Automatic production of speech
- Human-computer dialogue
- Conducting large-scale experiments: comparison of groups or contexts
1.a. Some basic questions
- What are corpora especially good for?
- Studying spontaneous production
- Studying big samples
- Studying very rare phenomena
- Allowing use of powerful statistics instead of weak observations
- Sharing data and saving time
1.a. Some basic questions
- What is coding?
- Labelling parts of a corpus in a well-formed system of analysis
- Examples
- Linguistic analyses
- Part of speech (noun, verb, adjective, etc.)
- Segment of text (paragraph, document, title, quotation)
- Source (speaker)
- Conditions of production (telephone, face-to-face)
- Simultaneous factors: noise, direction of speaker gaze, gesture
1.a. Some basic questions
- What is coding? contd.
- N.B. Coding is a stringent test of linguistic theories for
- consistency,
- coverage,
- interpretability
- Coding tools: software designed to
- help linguists and psychologists code corpora
- allow only legitimate choices of codes
- Raw coding can be horrible to read,
- so coding is now used as an instruction to display linguistic material in a particular way.
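As a rough illustration of the points above, the sketch below attaches codes to words, rejects any code outside the scheme (as a coding tool would), and turns the raw codes into a more readable display. The code sets, the `CodedWord` class, and the `display` function are all hypothetical; they are not taken from any real coding tool.

```python
from dataclasses import dataclass

# Hypothetical code sets, for illustration only; a real scheme defines its own.
POS_CODES = {"noun", "verb", "adjective", "other"}
CONDITION_CODES = {"telephone", "face-to-face"}


@dataclass(frozen=True)
class CodedWord:
    """One word of a transcript together with the codes attached to it."""
    text: str
    pos: str        # part of speech
    speaker: str    # source
    condition: str  # conditions of production

    def __post_init__(self):
        # Mimic a coding tool: refuse any code outside the agreed scheme.
        if self.pos not in POS_CODES:
            raise ValueError(f"illegitimate part-of-speech code: {self.pos!r}")
        if self.condition not in CONDITION_CODES:
            raise ValueError(f"illegitimate condition code: {self.condition!r}")


def display(words):
    """Use the codes as a display instruction: show each word as word/pos."""
    return " ".join(f"{w.text}/{w.pos}" for w in words)


words = [
    CodedWord("go", "verb", "giver", "face-to-face"),
    CodedWord("left", "other", "giver", "face-to-face"),
]
print(display(words))  # go/verb left/other
```

Trying to create `CodedWord("map", "nown", "giver", "telephone")` raises a `ValueError`, which is the "allow only legitimate choices" behaviour in miniature.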
1.b. An unusual corpus: a designed corpus of spontaneous task-oriented speech
- Name: The HCRC Map Task Corpus
- Authors: Anne Anderson, Matthew Aylett, Ellen Gurman Bard, Matthew Bull, Jean Carletta, Gwyneth Doherty-Sneddon, Simon Garrod, Amy Isard, Stephen Isard, Jacqueline Kowtko, Robin Lickley, David McKelvie, Jim Miller, Alison Newlands, Cathy Sotillo, Paul Taylor, Henry Thompson
- Type: unscripted dialogue
- Task: route communication
- One speaker has a route pre-printed on a map
- The other has a similar map without a route
- Neither can see the other's map
- Either can say anything
- No hand gestures allowed
- Both know that the maps differ, but not how or where
- Size: 128 dialogues
- Speakers: 64 undergraduates at Glasgow University in Scotland
- Design: orderly division of examples among conditions which may be compared
CORPUS DESIGN, contd.
- /t/-deletion: east lake
- glottalization: slate mountain
- /d/-deletion: gold mine
- nasal assimilation: green bay
2. Some exercises using a coded corpus to learn about spontaneous speech
- This is a list of exercises. You do not have to finish them all today. The website for the corpus can be accessed from any web browser, so you can examine it at any time.
- Getting started
- Do people speak in whole sentences?
- Do people take turns when they speak?
- What are the units of dialogue?
- When do people look at one another?
- When are people disfluent?
2. Some exercises using a coded corpus to learn about spontaneous speech
- Getting started
- Go to the introduction page for the corpus:
- http://www.hcrc.ed.ac.uk/amyi/maptask/demo_instructions.html
- Read it. You will need to know how to select a particular one of the 128 dialogues to look at more closely.
- Note that the speakers are Scottish.
- You may find some of their words or expressions strange if you are used to English or American versions of English.
- Use the maps to help you decipher the landmark names.
2. Some exercises using a coded corpus to learn about spontaneous speech
- Introduction: getting started, contd.
- Follow the link to the demo page:
- http://www.hcrc.ed.ac.uk/amyi/maptask/demo.html
- This page will allow you to look at transcriptions of dialogues and learn something about them.
- Choose a dialogue to look at. The best ones to use are in quad 3, e.g., q3ec1 or q3nc4. You can choose either an e dialogue (where speakers can see each other) or an n dialogue (where they can't). You can choose any dialogue from 1-8.
- Enter the code number of a dialogue in the Dialogue Name box.
- Hit the Process button to make the dialogue appear.
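The dialogue names follow a regular pattern, which a small sketch can make explicit. It assumes the shape seen in q3ec1 and q3nc4: `q`, a quad number, `e` or `n`, a literal `c`, and a dialogue number from 1 to 8. The meaning of the `c` is not given here, so the sketch matches it without interpreting it. The function names are hypothetical.

```python
import re

# Assumed pattern: 'q', quad number, 'e' or 'n', literal 'c', dialogue number 1-8.
# What the 'c' stands for is not stated here, so it is matched but not interpreted.
NAME_PATTERN = re.compile(r"q(\d+)([en])c([1-8])")


def parse_dialogue_name(name):
    """Split a dialogue name such as 'q3ec1' into its parts."""
    match = NAME_PATTERN.fullmatch(name)
    if match is None:
        raise ValueError(f"not a recognised dialogue name: {name!r}")
    quad, condition, number = match.groups()
    return {
        "quad": int(quad),
        "can_see_each_other": condition == "e",
        "dialogue": int(number),
    }


def swap_condition(name):
    """Switch e <-> n while keeping the quad and dialogue numbers."""
    parsed = parse_dialogue_name(name)
    letter = "n" if parsed["can_see_each_other"] else "e"
    return f"q{parsed['quad']}{letter}c{parsed['dialogue']}"


print(parse_dialogue_name("q3ec1"))
print(swap_condition("q3ec4"))  # q3nc4
```

`swap_condition` is handy whenever you want the dialogue with the same quad and dialogue numbers but the other visibility condition.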
2. Some exercises using a coded corpus to learn about spontaneous speech
- Introduction: getting started, contd.
- The dialogue is divided into turns, stretches of speech by an individual speaker.
- At the top of the page are buttons labelled giver map and follower map. Press each of these to see the maps that the participants used. Can you spot the differences?
2. Some exercises using a coded corpus to learn about spontaneous speech
- Do people speak in whole sentences?
- Your first exercise is to examine the contents of the turns on the first page of your dialogue and answer the question above.
- If you are not sure that you have enough information to answer, what should you do?
2. Some exercises using a coded corpus to learn about spontaneous speech
- Do people take turns when they speak?
- Use the back button to return to the demo page. If that doesn't work, pull down the GO menu and select the demo page.
- Keep the same dialogue but now click on overlap under Display Highlight. Remember to click on Process.
- What you will see next are the same turns but with an indication of when people were speaking at the same time. Did your speakers often speak at the same time? Scroll through the dialogue to see.
- Take another dialogue and check the overlaps again. Do the speakers differ from the ones you first looked at? (Hint: if you keep the same quad number, for example q3, and dialogue number, for example 4, but just change e to n, you will find a different pair of speakers dealing with the same map.)
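If turns carried start and end times, the overlap highlighting could be reproduced along the following lines. This is a minimal sketch with invented timings; the `Turn` class and `overlaps` function are hypothetical and say nothing about how the demo page itself computes overlap.

```python
from dataclasses import dataclass


@dataclass
class Turn:
    speaker: str
    start: float  # seconds from the start of the recording (invented timings)
    end: float


def overlaps(turns):
    """Return (earlier turn, later turn, seconds of simultaneous speech) triples."""
    ordered = sorted(turns, key=lambda t: t.start)
    found = []
    for i, first in enumerate(ordered):
        for second in ordered[i + 1:]:
            if second.start >= first.end:
                break  # later turns start even later, so none can overlap `first`
            if second.speaker != first.speaker:
                shared = min(first.end, second.end) - second.start
                found.append((first, second, shared))
    return found


turns = [
    Turn("giver", 0.0, 2.5),
    Turn("follower", 2.0, 3.0),  # starts before the giver has finished
    Turn("giver", 3.5, 5.0),
]
for first, second, shared in overlaps(turns):
    print(f"{second.speaker} overlaps {first.speaker} for {shared:.1f} s")
```

Comparing the number and total length of overlaps across two dialogues gives a simple way to answer whether one pair of speakers overlaps more than another.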
2. Some exercises using a coded corpus to learn about spontaneous speech
- What are the units of dialogue?
- Language is often organized in hierarchies: sentences are composed of phrases, and phrases are composed of words.
- For larger units of text, we have chapters, sections or paragraphs. What are the units of dialogue? The Map Task coding posits 3 levels. We will discuss 2 of them here.
- To find out what they are, start at the introduction page:
- http://www.hcrc.ed.ac.uk/amyi/maptask/demo_instructions.html
2. Some exercises using a coded corpus to learn about spontaneous speech
- What are the units of dialogue? Contd.
- What is a Dialogue Move?
- Follow the link to Dialogue Moves to answer.
- Hint: Assume that conversation is a game in which one kind of move must be followed by another, a question by an answer, for example. If we classify utterances by their functions in terms of getting and giving information, which kinds of utterances demand to be followed by other particular kinds?
- Look at the list of Initiating Moves. Do you agree that they need to be followed by certain kinds of responses?
- Look at the list of Response Moves. Do any of these seem to you to be good replies to any Initiating Moves?
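The idea that an initiating move calls for a response can be checked mechanically on a coded sequence. The sketch below is only an illustration: the move labels are assumed from the published Map Task move scheme and should be verified against the Dialogue Moves page, and the function `unanswered_initiations` is hypothetical. It deliberately does not say which response suits which initiation.

```python
# Labels assumed from the published Map Task move scheme; verify on the Dialogue Moves page.
INITIATING = {"instruct", "explain", "check", "align", "query-yn", "query-w"}
RESPONSE = {"acknowledge", "reply-y", "reply-n", "reply-w", "clarify"}


def unanswered_initiations(moves):
    """Return indices of initiating moves with no response before the next initiation."""
    unanswered = []
    pending = None  # index of the most recent initiating move still awaiting a response
    for index, label in enumerate(moves):
        if label in INITIATING:
            if pending is not None:
                unanswered.append(pending)
            pending = index
        elif label in RESPONSE:
            pending = None
    if pending is not None:
        unanswered.append(pending)
    return unanswered


moves = ["query-yn", "reply-y", "instruct", "check", "reply-y", "acknowledge"]
print(unanswered_initiations(moves))  # [2]
```

Here the instruct at index 2 is flagged because a check intervenes before any response. In real dialogue that is often a nested exchange rather than a failure to respond, so a flat check like this is a starting point for looking, not a verdict.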
2. Some exercises using a coded corpus to learn about spontaneous speech
- What are the units of dialogue? Contd.
- What is a Dialogue Game?
- Follow the link to Dialogue Games to answer.
- Now go back to the Demo page
- http://www.hcrc.ed.ac.uk/amyi/maptask/demo.html
- to examine a dialogue coded for Games and Moves.
- Choose a dialogue by typing its number into the Dialogue Name box.
- Ask for Dialogue Games as the Display option and overlap as the Highlight option.
- Does the grouping of moves into games make sense to you?
- Look at the overlaps. When do your speakers most often overlap their speech: early or late in Games?
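The "early or late in Games" question can be turned into a count once moves are grouped into games. The following sketch assumes a hypothetical representation, a list of games where each game is a list of (move label, highlighted) pairs, and uses invented data. The flag could equally mark any other highlighted property of a move.

```python
from collections import Counter


def early_or_late(games):
    """Count highlighted moves falling in the first or second half of their game.

    `games` is a list of games; each game is a list of (move_label, highlighted)
    pairs in spoken order. A single-move game counts as "early".
    """
    counts = Counter()
    for game in games:
        for position, (_label, highlighted) in enumerate(game):
            if highlighted:
                half = "early" if position < len(game) / 2 else "late"
                counts[half] += 1
    return counts


# Invented data: True marks a move that overlaps the other speaker's speech.
# Move labels are assumed, not taken from a real coded dialogue.
games = [
    [("instruct", False), ("check", False), ("reply-y", True), ("acknowledge", True)],
    [("query-yn", True), ("reply-n", False)],
]
result = early_or_late(games)
print(result["early"], result["late"])  # 1 2
```

Splitting each game at its midpoint is a crude choice; dividing into thirds, or measuring by time rather than by move count, would be reasonable alternatives.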
2. Some exercises using a coded corpus to learn about spontaneous speech
- When do people look at one another?
- Visual communication is important. But when does it occur?
- Go back to the Demo page
- http://www.hcrc.ed.ac.uk/amyi/maptask/demo.html
- to examine a dialogue coded for Games and Moves. You must choose a dialogue coded for the direction of the speakers' gaze. Try q3. And let's start with one where they can actually see each other, q3ec4, for example.
- Choose Dialogue Games as the Display option and gaze as the Highlight option.
- Look for the color code at the top of the page.
- Do your speakers look at each other simultaneously? When do they most often overlap their speech: early or late in Games?
- To see whether the speakers were just looking up at random, or whether they expected to see something, examine the dialogue using the same map, but with a barrier between speakers, e.g., q3nc4 to compare with q3ec4.
2. Some exercises using a coded corpus to learn about spontaneous speech
- When are people disfluent?
- A disfluency is an error in speaking which the speaker corrects. How disfluent are fluent speakers of a language?
- Go to the introduction page
- http://www.hcrc.ed.ac.uk/amyi/maptask/demo_instructions.html
- Click on Disfluency to learn about how disfluencies are classified.
- Go back to the Demo page to choose a dialogue.
- http://www.hcrc.ed.ac.uk/amyi/maptask/demo.html
- Ask for Dialogue Games as the Display option and disfluency as the Highlight option.
- Look at the disfluency codings. When do your speakers most often become disfluent: early or late in Games?