Title: Interaction of the verbal, prosodic, and visual components in language understanding
2. INTERACTION OF THE VERBAL, PROSODIC, AND VISUAL COMPONENTS in language understanding
Jaroslavl, November 22, 2012
- Andrej A. Kibrik
- (Institute of Linguistics RAN and Lomonosov Moscow State University)
- aakibrik_at_gmail.com
3. The mainstream linguistic approach
- Language consists of hierarchically organized segmental units, such as phonemes, morphemes, words, phrases, and sentences
- Linguistic form is thus equated with verbal form
4. However
- Apart from sound, there are other channels (or components) of communication, in the first place through vision (body language: gesture, facial expression, gaze, posture, etc.)
- Also, there are prosodic, that is, non-verbal (non-segmental), aspects to sound
- Imagine prosody-free talk
- or, vice versa, talk behind a wall
5. Communication channels
- The verbal component, prosody, and body language all count as distinct communication (or information) channels
- They all cooperate in getting the message from speaker to addressee
- This is what is sometimes called the multimodal approach
- Cf. ???????????? 1963: How does the non-verbal text interact with the verbal text?
6. Multimodality
- A multimodal approach assumes that the message is spread across all the modes of communication. If this is so, then each mode is a partial bearer of the overall meaning of the message. (Kress 2002)
- Any use of language is inescapably multimodal (Scollon 2006)
- Unimpaired communication is, of course, inherently multimodal, with the speech content being modified by prosody and delivered in parallel with facial expression, gesture, posture, and a range of other nonverbal communication methods. (Alm 2006)
- Within biology, experimental psychology, and cognitive neuroscience, a separate rapidly growing literature has clarified that multisensory perception and integration cannot be predicted by studying the senses in isolation. (Cohen and Oviatt 2006)
7. What is the contribution of different channels?
- Traditional approach of mainstream linguistics: the verbal channel is so central that prosody and the visual channel are at best downgraded as paralinguistics
- Applied psychology
- It is often stated that (figures go back to Mehrabian 1971):
- body language conveys 55% of information
- prosody conveys 38% of information
- the verbal component conveys 7% of information
- Words may be what men use when all else fails (???????? 2002: 6)
- Who is right?
8. Relative contribution of three communication channels?
- DISCOURSE
- Vocal channels: verbal channel, prosodic channel
- Visual channel
9. Experimental design
- Isolate the three communication channels
- Present a sample discourse in all possible variants (2^3 = 8)
- Present each of the eight variants to a group of subjects
- Assess the degree of understanding in each case
- Such assessment may lead to estimates of the contributions of communication channels
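The eight variants follow from switching each of the three channels on or off. A minimal sketch of that enumeration, assuming the channel labels below (the names `CHANNELS` and `all_conditions` are illustrative, not taken from the studies):

```python
from itertools import combinations

# Channel labels are illustrative; the slides name the channels but not identifiers.
CHANNELS = ("verbal", "prosodic", "visual")


def all_conditions(channels=CHANNELS):
    """Return every subset of channels, from no channel at all to all three."""
    conditions = []
    for size in range(len(channels) + 1):
        for subset in combinations(channels, size):
            conditions.append(subset)
    return conditions


if __name__ == "__main__":
    for condition in all_conditions():
        label = " + ".join(condition) if condition else "context only"
        print(label)
```

The empty subset corresponds to a group that sees only the context, and the full subset to the original material.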
10. Studies in this line of research
- Èlbert 2006, term paper
- Èlbert 2007, diploma thesis
- Reinterpreted and refined in Kibrik and Èlbert 2008
- Molchanova 2008, term paper
- Molchanova 2009, term paper
- Molchanova 2010, diploma thesis
- Reinterpreted and refined in Kibrik 2011
11. Èlbert 2007, Kibrik and Èlbert 2008
- Russian TV series Tajny sledstvija ("Mysteries of the investigation")
- Experimental excerpt: 3 min. 20 sec.
- Preceded by an 8-minute context (that starts from the beginning of the series)
- The excerpt consists entirely of a conversation, to ensure that we are testing the understanding of discourse rather than of the film in general
- The two vocal channels have been separated
- Verbal: running subtitles
- Prosodic: superimposed filter creating the "behind a wall" effect
- Participants
- 99 participants, divided into 8 groups
- Native speakers of Russian
- Each group comprised 10 to 17 participants
12. Eight experimental groups
- Group 0: only the context excerpt
- Groups 1 (one communication channel)
- Verbal: subtitles, temporally aligned
- Prosodic: filtered sound
- Visual: video
- Groups 2 (two communication channels)
- Verbal + prosodic: original sound
- Verbal + visual: subtitles and video
- Prosodic + visual: filtered sound and video
- Group 3: original material
13. Group 3: original material
14. Verbal + visual
15. Visual + prosodic
16. Procedure
- The context and the experimental excerpts were shown to a group of subjects on a large screen
- Each subject was instructed to watch the context and the experimental excerpt and then answer a set of questions concerned with the experimental excerpt alone
- The questionnaire was constructed in accordance with the received principles of test tasks (Panchenko 2000)
- 23 multiple-choice questions in the questionnaire
- A subject was supposed to choose only one answer out of four listed variants
- What does Tamara Stepanovna offer Masha before the beginning of the conversation?
- a. to take off her coat
- b. to have a cup of tea
- c. to have a seat (marked as the correct answer)
- d. to have a drink
- Percentage of correct answers is used as an assessment of a subject's degree of understanding
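The scoring step can be sketched as follows. This is a minimal illustration, assuming answers are recorded as option letters; the function names `percent_correct` and `group_mean` are hypothetical and the studies' actual tooling is not described in the slides.

```python
def percent_correct(answers, answer_key):
    """Percentage of a subject's answers that match the answer key."""
    if len(answers) != len(answer_key):
        raise ValueError("answers and answer_key must have the same length")
    if not answer_key:
        raise ValueError("answer_key must not be empty")
    correct = sum(1 for given, expected in zip(answers, answer_key) if given == expected)
    return 100.0 * correct / len(answer_key)


def group_mean(scores):
    """Mean score of one experimental group (assumed aggregation: a plain average)."""
    if not scores:
        raise ValueError("scores must not be empty")
    return sum(scores) / len(scores)
```

A group's score is then the average of its members' percentages, which is what allows the groups (and hence the channel combinations) to be compared.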
17. Results
- All three channels are substantially informative
- Verbal > visual > prosodic
- Integration of visual and prosodic channels is difficult
18. Molchanova 2010
- Contribution of information channels in understanding spoken discourse: methodological aspects
- The following aspects of the prior study have been changed (improved):
- Stimulus material
- Prosodic channel
- Verbal channel
- Questionnaire
- Interviewing procedure
19. Stimulus material: discourse type
- Shortcomings of movies
- Plot facilitates guessing
- Possible familiarity with the movie
- Quasi-natural behavior of actors
- Solution: natural dialogue
- Shared activity
- Figure-guessing game
- Can be filmed by one camera
- ??? 3 ??????.avi, 019 057
- Remaining problems
- Hard to remember the sequence of events
- Many events are similar
20. Stimulus material: speakers
- Shortcomings of the prior studies
- Same-sex speakers → indistinguishable in the prosody-only version
- Solutions
- Different sexes: F0 range is different
- Additional features
- Acquainted
- Not close friends
21. Prosodic channel
- Shortcomings of the prosodic material as used in previous studies
- Èlbert 2007: noisy sound
- Molchanova 2009: unnatural, electronic sound
- Solution
- Loudness is decreased radically at all frequencies except for the speaker's average F0
- This has led to the "behind the wall" (or "behind the glass") effect
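One way to picture this kind of filter is as a frequency-domain gain that leaves a narrow band around the speaker's average F0 untouched and attenuates everything else. The sketch below is an illustration only, not the procedure used in the study. The function name `behind_wall_filter`, the band half-width, and the attenuation factor are all assumptions, and a naive DFT is used to keep the example dependency-free.

```python
import cmath
import math


def dft(signal):
    """Naive discrete Fourier transform (O(n^2); fine for short illustrative signals)."""
    n = len(signal)
    return [
        sum(signal[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def idft(spectrum):
    """Inverse of dft(); returns the real part of each reconstructed sample."""
    n = len(spectrum)
    return [
        (sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n).real
        for t in range(n)
    ]


def behind_wall_filter(signal, sample_rate, f0_hz, half_width_hz=50.0, attenuation=0.05):
    """Attenuate every frequency except a band around f0_hz.

    half_width_hz and attenuation are hypothetical values chosen for illustration.
    """
    n = len(signal)
    filtered = []
    for k, coefficient in enumerate(dft(signal)):
        # Bins k and n - k carry the same frequency, so they get the same gain
        # and the filtered signal stays real-valued.
        frequency = min(k, n - k) * sample_rate / n
        gain = 1.0 if abs(frequency - f0_hz) <= half_width_hz else attenuation
        filtered.append(coefficient * gain)
    return idft(filtered)
```

With the band centred on a speaker's average F0, the pitch contour stays audible while the segmental detail that lives at higher frequencies is strongly damped.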
22. Visual + prosodic
23. Verbal channel
- Shortcomings of subtitles
- Hard to read without punctuation
- Especially at the rate of speech
- And especially in the verbal + visual condition
- Solution: spoken prosody-free signal
- Each word in the transcript is replaced by an individually pronounced word
- All words thus elicited are glued together in the right order
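The gluing step amounts to concatenating per-word recordings in transcript order. A minimal sketch, assuming each recording is available as a list of audio samples keyed by word; the name `glue_words` and the optional pause are hypothetical, and the slides do not say how the recordings were actually assembled.

```python
def glue_words(transcript, recordings, pause_samples=0):
    """Concatenate individually recorded words in transcript order.

    transcript: list of words as they occur in the dialogue.
    recordings: dict mapping each word to its list of audio samples.
    pause_samples: optional number of silent samples after each word (an assumption).
    """
    signal = []
    for word in transcript:
        if word not in recordings:
            raise KeyError(f"no recording for word: {word!r}")
        signal.extend(recordings[word])
        signal.extend([0.0] * pause_samples)
    return signal
```

Because every word is recorded in isolation, the result carries the lexical content without the original intonation contour.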
24. Visual + verbal
25. Verbal channel
- Remaining problem
- Unnatural input
- No reduction
- No intonation
- etc.
26. Questionnaire
- Shortcomings of prior studies
- Èlbert 2007: the gap between Group 0 (38.3%) and Group 3 (87.4%) is insufficient
- Solution
- Testing stage
- Identify trivial questions (high Group 0)
- Identify unfortunate questions (low Group 3)
- 30 → 17 questions
- Group 0: 24.7% correct answers
- Group 3: 91.2% correct answers
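The screening of questions can be sketched as a filter over per-question success rates in the two extreme groups. The cut-off values below are hypothetical, since the slides do not state which thresholds were used, and `screen_questions` is an illustrative name.

```python
def screen_questions(group0_rates, group3_rates, max_group0=50.0, min_group3=70.0):
    """Keep questions that are neither trivial nor unfortunate.

    group0_rates / group3_rates map a question id to the percentage of correct
    answers in Group 0 (context only) and Group 3 (original material).
    Both cut-offs are assumptions made for this sketch.
    """
    kept = []
    for question_id, group0_rate in group0_rates.items():
        trivial = group0_rate > max_group0  # answerable without the excerpt
        unfortunate = group3_rates[question_id] < min_group3  # missed even with all channels
        if not (trivial or unfortunate):
            kept.append(question_id)
    return kept
```

Dropping both kinds of question widens the gap between Group 0 and Group 3, which leaves more room for the intermediate conditions to differ from one another.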
27. Interviewing procedure
- Shortcomings of prior studies
- Participants of various age and life experience
- Multiple participants may affect each other's performance
- Need for a large room, loudspeakers, and a big screen
- Solutions
- Control for age, gender, geographical origin, social status
- Remote implementation
- Stimulus materials at Youtube.com
- Questionnaire at Google Docs
- All participants are in similar conditions
- Comfortable, adjustable conditions
- No need for audio and video control in large rooms
28. Kibrik and Èlbert 2008 vs. Molchanova 2010
- General picture is remarkably similar
- All three channels are substantially informative
- Verbal > visual > prosodic
- Visual + prosodic dip is even sharper
- Cleaner results
- Two channels are much better than one channel
- Verbal and visual channels integrate well
29. Normalized contribution of three channels
- Suppose the three channels are independent
- Sum up all percentages of individual channel contributions and normalize to 100
- Identify normalized contribution
30. Normalized contribution of three channels
                            Kibrik and Èlbert 2008    Molchanova 2010
Summed percentages          72 + 51 + 62 = 185        59 + 46 + 49 = 154
Normalized contributions
  Verbal                    72 / 1.85 = 39            59 / 1.54 = 38
  Prosodic                  51 / 1.85 = 28            46 / 1.54 = 30
  Visual                    62 / 1.85 = 33            49 / 1.54 = 32
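The normalization in the table above is each channel's single-channel percentage divided by the sum of the three and scaled to 100. A minimal sketch using the single-channel percentages from the table; the function and constant names are illustrative.

```python
def normalize_contributions(percentages):
    """Rescale single-channel percentages so that they sum to 100."""
    total = sum(percentages.values())
    if total == 0:
        raise ValueError("percentages must not sum to zero")
    return {channel: 100.0 * value / total for channel, value in percentages.items()}


# Single-channel percentages of correct answers, as given in the table.
KIBRIK_ELBERT_2008 = {"verbal": 72, "prosodic": 51, "visual": 62}
MOLCHANOVA_2010 = {"verbal": 59, "prosodic": 46, "visual": 49}

if __name__ == "__main__":
    for name, data in (("Kibrik and Èlbert 2008", KIBRIK_ELBERT_2008),
                       ("Molchanova 2010", MOLCHANOVA_2010)):
        normalized = normalize_contributions(data)
        print(name, {channel: round(value, 1) for channel, value in normalized.items()})
```

The rescaling is meaningful only under the independence assumption stated on the preceding slide; the computed values agree with the slide's whole-number figures up to rounding.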
31. Gender differences
- Molchanova 2010: gender advantages
- Percentages of correct answers
Condition            Men     Women   Advantage
Verbal only          59.1    69.9    Women 10.7
Visual + prosodic    66.1    51.6    Men 14.5
32. Conclusions
- All communication channels are highly significant
- → the traditional linguistic viewpoint is erroneous
- The verbal channel is the leading one
- → the viewpoint popular in applied psychology is erroneous
- Information from the prosodic and the visual channels is primarily used through integration with the verbal channel
- Very similar results have been attained in different studies, in spite of very different methodological details
33. Further questions
- Auditory or graphic presentation of the verbal-alone channel?
- Optimal discourse type?
- Other suggestions on this approach?
34. Thanks for your attention
[Diagram labels: language; verbal channel; prosodic channel; visual channel]