Title: Automatic Assessment of Spoken Modern Standard Arabic
- NAACL
- Boulder, Colorado
- 5 June 2009
- Pearson Knowledge Technologies
- Palo Alto, California
- Jian Cheng
- Jared Bernstein
- Ulrike Pado
- Masa Suzuki
Outline
- 1. Pearson Knowledge Technologies
  - How Versant tests operate
- 2. Versant Arabic Test (development)
- 3. Validation evidence
- 4. Predictive accuracy
Pearson Knowledge Technologies (PKT)
- KAT and Ordinate are now PKT
  - KAT: LSA, Essay Scoring, Write-to-Learn, PTE, etc.
  - Ordinate: Versant, ORF for NCES, VersaReader, PTE, etc.
- PKT is part of Pearson
  - Pearson: FT, Economist, Penguin, Longman, PsychCorp, etc.
- PKT is in Boulder, Colorado and Palo Alto, California.
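Test delivery
Figure: test delivery architecture. Components shown: delivery interface, communication network, scoring system, database (tests, prompts, responses), speech input, and score report. Languages shown: English, Arabic, Dutch, Spanish. Locations shown: California and "Anywhere".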
How Versant tests operate
Figure: example item prompt ("The train's been delayed by one hour"). Components shown: test delivery server, Versant database, scoring.
Versant Arabic Test
- Purpose at DLI
  - 1,000 students at DLI need predictive speaking tests
- Requirements
  - Accurate test of Arabic listening and speaking
  - Convenient to use at DLI and worldwide (ILR testing is costly)
  - Suitable for repeated formative testing
  - High peak capacity for mass screening
Construct Comparison
- OPI construct: oral proficiency as manifested in an Oral Proficiency Interview; compatible with communicative competence as reflected in the functional level and/or complexity of content accurately produced.
- Versant construct: facility in spoken language, that is, the ability to understand spoken language and speak appropriately in response at a conversational pace on everyday topics.
Versant Arabic Test
Test Structure
- Part A: Reading
- Part B: Repeat 1
- Part C: Short Answers
- Part D: Sentence Builds
- Part E: Repeat 2
- Part F: Passage Retelling
Versant Scoring
How Versants are developed (1)
Figure: development and validation flow. Components shown: native test developers, test spec, item text, recorded items, Arabic natives, Arabic learners, native scribes (transcripts), native judges (scale scores, scale estimates), Ordinate system, Versant scores, internal validation (criteria), and external validation of the Versant Arabic Test against concurrent ILR interviews (ILR scores).
Arabic Challenges: Voweling
- kutubu al-waladi: "the books of the boy"
- kataba al-waladu: "wrote the boy (subj.)", i.e. "the boy wrote"
- The short vowels that would disambiguate these are not normally written
- Vowels carry phonetic information
- Vowels carry grammatical information
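To make the ambiguity concrete, here is a minimal Python sketch (an illustration, not part of the Versant system) that removes short-vowel diacritics from fully voweled spellings of kutubu and kataba. The name `strip_vowels` is hypothetical, and the assumption that the Unicode range U+064B to U+0652 covers the relevant marks is a simplification: it handles the common tashkeel but not every Arabic diacritic. Once the vowels are removed, the two different words become the identical string kaf-teh-beh, which is what ordinary MSA text shows.

```python
import re

# Short-vowel and related diacritics (tashkeel), U+064B..U+0652:
# fathatan, dammatan, kasratan, fatha, damma, kasra, shadda, sukun.
# Assumption: this range suffices for the illustration; it is not exhaustive.
TASHKEEL = re.compile("[\u064B-\u0652]")


def strip_vowels(text: str) -> str:
    """Drop short-vowel diacritics, as in ordinary unvoweled MSA writing."""
    return TASHKEEL.sub("", text)


# Fully voweled spellings, written with escapes so the marks are unambiguous.
kutubu = "\u0643\u064F\u062A\u064F\u0628\u064F"  # kaf+damma, teh+damma, beh+damma: "books"
kataba = "\u0643\u064E\u062A\u064E\u0628\u064E"  # kaf+fatha, teh+fatha, beh+fatha: "(he) wrote"

print(kutubu == kataba)                              # False: the vowels differ
print(strip_vowels(kutubu) == strip_vowels(kataba))  # True: both reduce to kaf-teh-beh
```

The same collapse happens to al-waladi and al-waladu, whose only difference is the final case vowel, so neither the meaning nor the grammatical role can be read off unvoweled text.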
Complex Morphology
- li + ziyaarat + naa (one written word): "for visit of us", i.e. "for our visit"
- Complicates lexicon lookup and frequency estimates
- Short Arabic items are harder than English items with the same number of words
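The frequency-estimate problem can be sketched in Python with a deliberately tiny, hypothetical clitic splitter working on the slide's Latin transliteration. The clitic lists, the function `segment`, and the sample tokens are all illustrative assumptions, not the system's actual morphological analysis. Counting surface forms spreads one lexeme across several "words", while counting stems groups them together.

```python
from collections import Counter

# Hypothetical, deliberately tiny clitic inventories in the slide's Latin
# transliteration. A real system would use a full morphological analyzer and
# lexicon; greedy matching like this would wrongly split many ordinary words.
PROCLITICS = ("li",)       # li- "for"
ENCLITICS = ("naa", "hu")  # -naa "our/us", -hu "his/him"


def segment(word: str) -> tuple[str, str, str]:
    """Split a word into (proclitic, stem, enclitic); empty string if absent."""
    pre = next((p for p in PROCLITICS if word.startswith(p)), "")
    word = word[len(pre):]
    post = next((e for e in ENCLITICS if word.endswith(e)), "")
    stem = word[: len(word) - len(post)]
    return pre, stem, post


# Four surface forms built on one stem (case vowels omitted, as on the slide).
tokens = ["liziyaaratnaa", "ziyaaratnaa", "liziyaarat", "ziyaarathu"]
surface_counts = Counter(tokens)
stem_counts = Counter(segment(t)[1] for t in tokens)

print(len(surface_counts), "surface types")  # 4 surface types
print(stem_counts)                           # Counter({'ziyaarat': 4})
```

A word-level lexicon would need four entries, each with a low count; after segmentation the stem ziyaarat is seen four times.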
Development and Run-time Processes
- Compilation of expectations and the run-time flow
Training Data Sources
Prompt Voices and Training Samples

Prompt voices (F = female, M = male)
Country     Voices
Egypt       F, M
Iraq        F, M
Jordan      M
Morocco     F
Lebanon     M
Palestine   F, M
Syria       F, M

Native data
Egypt   Syria   Iraq   Palestine   Other   Total
484     281     179    187         517     1648

Learner data
DLI    Non-DLI   Total
1120   552       1672
Validation Criteria
- Reliability
  - Scores are consistent
- Validity
  - Native and non-native speakers should be clearly distinct
  - MSA and dialect speakers should be distinct (since we're testing MSA)
  - Machine scores should predict human scores
Reliability
Score              Split-half reliability (N = 134)   Test-retest reliability (N = 100)
Overall            0.98                               0.97
Sentence Mastery   0.97                               0.96
Vocabulary         0.89                               0.82
Fluency            0.97                               0.96
Pronunciation      0.96                               0.94
Native vs. Non-Native Scores
Natives by Country
Educated vs. Uneducated Speakers
Figure: cumulative density of the Arabic Overall Score for educated and uneducated speakers.
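A cumulative density plot shows, for each score x, the fraction of a group scoring at or below x; a curve lying further to the right means a higher-scoring group, and curves that barely overlap indicate clearly distinct groups. The following is a minimal Python sketch of the empirical cumulative distribution behind such a plot. The function `ecdf` and both score lists are hypothetical and are not data from the deck.

```python
from bisect import bisect_right
from typing import Callable


def ecdf(scores: list[float]) -> Callable[[float], float]:
    """Empirical cumulative distribution: F(x) = share of scores <= x."""
    data = sorted(scores)
    n = len(data)
    return lambda x: bisect_right(data, x) / n


# Hypothetical Overall scores for two small groups (not data from the deck).
group_a = ecdf([55, 62, 70, 74, 80])
group_b = ecdf([30, 38, 45, 52, 60])
for x in (40, 60, 80):
    print(x, group_a(x), group_b(x))
```

Plotting F(x) for each group over the score range gives one curve per group, as in the figure.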
Machine-Human Comparison
Score              Machine-human correlation (N = 134)
Overall            0.97
Sentence Mastery   0.97
Vocabulary         0.96
Fluency            0.84
Pronunciation      0.83
How Versants Compare to OPIs
Figure: ILR OPI score (logits) vs. Versant Arabic Overall Score; N = 118, r = 0.87.
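The deck reports correlations only as r and N. One standard way to see how precisely a correlation is estimated from a sample of that size (not something the deck itself reports) is an approximate 95% confidence interval via the Fisher z-transform. The following Python sketch applies it to the OPI comparison. The function name `fisher_ci` is hypothetical, and the interval assumes roughly bivariate-normal scores.

```python
import math


def fisher_ci(r: float, n: int, z_crit: float = 1.96) -> tuple[float, float]:
    """Approximate 95% confidence interval for a Pearson correlation from n pairs."""
    z = math.atanh(r)            # Fisher z-transform of r
    se = 1 / math.sqrt(n - 3)    # approximate standard error of z
    return math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)


low, high = fisher_ci(0.87, 118)
print(f"r = 0.87, N = 118: 95% CI about {low:.2f} to {high:.2f}")
```

For r = 0.87 with N = 118, this gives roughly 0.82 to 0.91. The same function applies to the other reported pairs of r and N.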
Spanish and English: Versant vs. Human Scores
Figure panels: Spanish (N = 37, r = 0.92); English (N = 151, r = 0.86).
Summary
- Versant Arabic Test (VAT) is in operation
- Based on a large and varied body of transcribed spoken material
- VAT is available on demand
- Returns consistent, accurate scores that reflect real-time skills in MSA
- VAT can triage or screen candidates for OPI testing
Thanks to Waheed Samy, Naima Bousofara Omar, Eli Andrews, Mohamed Al-Saffar, Nazir Kikhia, Rula Kikhia, and Linda Istanbulli for item development and data collection/transcription in Arabic, and to Andy Freeman for providing diacritic markings.