Linguistic Resources needed by Nuance

Description:

Linguistic Resources needed by Nuance. Jan Odijk, 060528, Cocosda/Write Workshop. Overview: Nuance History, Nuance Technologies, Nuance Language Coverage, Which Languages are ...

Slides: 13
Provided by: coc131
Learn more at: http://www.cocosda.org

Transcript and Presenter's Notes

1
Linguistic Resources needed by Nuance
  • Jan Odijk 060528 Cocosda/Write Workshop

2
Overview
  • Nuance History
  • Nuance Technologies
  • Nuance Language Coverage
  • Which Languages are needed
  • Which data are needed
  • Advantages

3
Nuance History
  • ScanSoft (Digital Imaging) acquired:
    • Lernout & Hauspie speech divisions (2001)
    • Philips Speech Processing embedded and network
      divisions (2002)
    • Telelogue (2003)
    • LocusDialog (2003)
    • SpeechWorks (2004)
    • Talks (2004)
    • ART (2005)
    • Phonetic Systems (2005)
    • Rhetorical (2005)
    • MedRemote (2005)
    • Nuance (2005), company renamed Nuance
    • Dictaphone (2006)

4
Nuance Technologies
  • Digital Imaging
  • Speech Technologies
    • Text-to-Speech (TTS)
    • Automatic Speech Recognition (ASR)
    • Dictation
    • Speaker Verification
    • Audiomining
  • Speech Applications/Solutions
    • Automated Attendant Systems
    • Directory Assistance Systems
    • Dictation end-user application
    • Multimodal applications

5
Nuance Technologies
  • Platforms
    • Server
    • DeskTop
    • Embedded
    • Automotive
    • Mobile Phones
  • Domains
    • Horizontal
    • Vertical
    • Medical
    • Legal
    • Navigation
    • ....

6
Nuance Language Coverage
  • Broad language coverage
    • OCR supports 114 languages
    • DeskTop Dictation in 8 languages
    • TTS > 23 languages
    • Telephony ASR > 40 languages
    • Embedded ASR > 11 languages
  • Broad language coverage necessary
    • Most business customers are operating
      internationally
    • Want a single provider of language and speech
      technologies

7
Nuance Language Coverage
  • Language Coverage must be further broadened!
  • Data are needed for that, but ...
  • Costs are high
  • No single company can afford the investments

8
Which Languages?
  • Priority 1
    • Arabic, Chinese (Mandarin, Cantonese), Danish,
      Dutch, English (UK), English (US), Farsi,
      Finnish, French, French (Canadian), German,
      Hindi, Indonesian, Italian, Malaysian, Pilipino
      (Tagalog), Polish, Portuguese, Portuguese
      (Brazil), Russian, Spanish, Spanish (American),
      Swedish, Thai, Turkish, Vietnamese,...
  • Priority 2
    • Bulgarian, Croatian, Czech, Estonian, Greek,
      Gujarati, Hebrew, Hungarian, Icelandic, Japanese,
      Kannada, Kazakh, Khmer, Latvian, Lithuanian,
      Macedonian, Malayalam, Marathi, Norwegian,
      Punjabi, Romanian, Serbian, Sesotho, Sinhalese,
      Slovak, Slovenian, Swahili, Tamil, Telugu,
      Ukrainian, Urdu, Uzbek, Xhosa, Zulu,...

9
Which Data?
  • There's no data like more data
  • but...
  • Given time and cost constraints, a minimal set is
    needed to develop technologies/applications for
    new languages

10
Which Data?
  • Network ASR: SpeechDat family
    • SpeechDat-II, Orientel, SALA (I and II), LILA
  • Embedded ASR
    • Automotive: SpeechDat-Car
    • Consumer Apps: SPEECON
  • Pronunciation and Grammatical Lexicons: LC-STAR
  • TTS synthesis: TC-STAR
  • See
    • http://www.speechdat.org
    • http://www.tc-star.org
    • http://www.lc-star.com

11
Which Data?
  • Desktop Office data
  • Large Text Corpora (> 300 million tokens plain
    text)
  • news
  • business / finance
  • traffic messages, weather messages
  • e-mail
  • SMS
  • ...

12
Advantages
  • Research can be done in your own language
  • Part of the costs can be recovered by licensing
    data via ELRA to companies
  • Companies can develop technologies/applications
    for your languages
  • Contributes to securing the position of your
    language in the Internet era
  • Ask your government for funding and support
  • Some good examples:
    • STEVIN Programme (Netherlands/Flanders)
    • UPC databases for Catalan (Asunción Moreno)