Natural Language Processing - PowerPoint PPT Presentation

Provided by: dUmn
Transcript and Presenter's Notes

Title: Natural Language Processing


1
Natural Language Processing
  • Application Tasks
  • Application front end
  • E.g., database query, telephone menu systems
  • Translation
  • Text-based, speech-based
  • Some issues in speech processing
  • Word recognition, meaning understanding
  • Vocabulary size, recognition rate, speaker
    dependence/independence, continuous/discrete
    speech

2
Verbmobil
  • German research project on natural language
    translation
  • Video
  • See link on http://verbmobil.dfki.de/overview-us.html
  • Reports
  • http://verbmobil.dfki.de/cgi-bin/verbmobil/htbin/doc-access.cgi
  • Book
  • Wahlster, W. (2000). Verbmobil: Foundations of
    Speech-to-Speech Translation. Springer-Verlag,
    Berlin.

3
Verbmobil - 1
  • Translation task
  • Speech input in source language
  • Translation to an internal representation
  • Speech output in target language
  • Source/target languages
  • English, German, Japanese
  • Speaker independent, and adaptive
  • Improves recognition results after a few words
    have been spoken by an individual
  • Dialog understanding
  • Tracks content of sentences from sentence to
    sentence
  • Uses this information (or knowledge) to assist
    in context sensitive translation

4
Verbmobil - 2
  • Speech
  • Low quality input signal (e.g., mobile phone)
  • Designed to deal with speech disfluencies and
    repairs
  • Changes mid-word or mid-sentence
  • Ums, ers, and cases where short words are left
    out in rapid speech
  • Domains
  • Appointment scheduling
  • Trip planning (10,000 word vocabulary)
  • Remote PC maintenance (35,000 word vocabulary)

5
Performance
  • A translation is "approximately correct" if it
    preserves the intention of the speaker and the
    main information of his utterance (p. 1491,
    Wahlster, 2001)
  • 43,180 Verbmobil translations evaluated by human
    evaluators
  • 80% correct translations
  • 80% correct gist
  • 90% success rate for dialog tasks

6
System Architecture
  • Multi-blackboard
  • A blackboard is a global data object, allowing
    communication between components of a system
    through a common data format
  • E.g., a database
  • May allow for remote process access to data
  • 198 blackboards
  • Multi-engine
  • An engine is a software module, probably in this
    case a separate communicating process in the
    system
  • 69 modules; modules typically subscribe to more
    than one blackboard
  • Word-Hypothesis Graph (WHG)
  • A graph structure representing the current
    hypothesis about the translation
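
The blackboard idea on this slide can be illustrated with a small publish/subscribe sketch. This is not Verbmobil's implementation; the class names (Blackboard, Module), the board names, and the posted items are all hypothetical and chosen only to show one module subscribing to more than one blackboard.

```python
from typing import Any, Callable, List, Tuple


class Blackboard:
    """A named shared data store that notifies subscribers on every post."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.entries: List[Any] = []
        self._subscribers: List[Callable[[str, Any], None]] = []

    def subscribe(self, callback: Callable[[str, Any], None]) -> None:
        self._subscribers.append(callback)

    def post(self, item: Any) -> None:
        # Store the item, then tell every subscriber which board it came from.
        self.entries.append(item)
        for callback in list(self._subscribers):
            callback(self.name, item)


class Module:
    """A processing engine that records what it receives (hypothetical)."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.seen: List[Tuple[str, Any]] = []

    def listen_to(self, *boards: Blackboard) -> None:
        for board in boards:
            board.subscribe(self.receive)

    def receive(self, board_name: str, item: Any) -> None:
        self.seen.append((board_name, item))


def demo_blackboards():
    audio = Blackboard("audio")
    words = Blackboard("word_hypotheses")
    recognizer = Module("recognizer")
    prosody = Module("prosody")
    recognizer.listen_to(audio)        # subscribes to one blackboard
    prosody.listen_to(audio, words)    # subscribes to two blackboards
    audio.post("frame-0")
    words.post("wir")
    return recognizer.seen, prosody.seen
```

Modules never call each other directly here; they only share data through the boards, which is the property the slide attributes to a blackboard architecture.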

7
Data Representations
  • Various components of the system (e.g., parser
    output) share common data representations
  • VIT (Verbmobil Interface Term): central data format
  • E.g., see Figure 7
  • Processing module outputs specified with
    confidence values
  • Selection modules choose most promising result
  • Formalisms for under-specification
  • There is ambiguity in language, and the system
    needs to add constraints to reduce ambiguity
  • Constraints: linguistic, discourse, domain
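
A selection module that picks the most promising result from confidence-scored outputs might look roughly like the sketch below. The Hypothesis record, the module names, and the scores are assumptions for illustration; the slides do not specify how Verbmobil compares confidence values across modules.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Hypothesis:
    module: str        # which processing module produced the result (hypothetical names)
    text: str          # the proposed output
    confidence: float  # assumed to be comparable across modules, in [0, 1]


def select_best(hypotheses: Iterable[Hypothesis]) -> Hypothesis:
    """Return the hypothesis with the highest confidence value."""
    candidates = list(hypotheses)
    if not candidates:
        raise ValueError("no hypotheses to select from")
    return max(candidates, key=lambda h: h.confidence)


candidates = [
    Hypothesis("statistical", "we still have time", 0.62),
    Hypothesis("case_based", "we have another appointment", 0.81),
    Hypothesis("dialog_act", "time is available", 0.40),
]
best = select_best(candidates)
```

In practice, confidence values from different modules would need to be calibrated against each other before a plain maximum is meaningful; this sketch simply assumes they already are.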

8
Data Corpus
  • Translation process is strongly influenced by
    data collected from actual people speaking
  • Multi-channel recordings made of 3,200
    spontaneous dialogs
  • Comprising 79,562 conversational turns, 1,658
    different speakers, and 21.5 GB of data
  • Processing
  • Transcribed and annotated at 15 levels of
    structure (Figure 5); offline, human transcription
  • Statistical properties extracted from all data,
    including transcriptions; offline, automated
  • Some hand-crafted knowledge sources
  • E.g., language grammars (DCGs)

9
Usage of Prosody
  • Prosody
  • includes duration, pitch, energy, and pause
    features
  • Prosody differences in one language can
    correspond to lexical or syntactic differences in
    another language
  • "wir haben noch"
  • in English, can be
  • "we still have" or "we have another"
  • depending on whether "noch" is stressed.
  • In Verbmobil
  • Algorithms detect prosody features including
    stress, and interpret meaning of words
    accordingly
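
As a toy illustration of stress-dependent translation, the lookup below chooses between the two English readings of "wir haben noch". The slide does not say which reading goes with stress; the mapping used here (stressed "noch" gives "another") is an assumption, and the function name is hypothetical.

```python
# Assumed mapping, not stated in the slides:
# stressed "noch" -> "we have another", unstressed -> "we still have".
NOCH_READINGS = {
    True: "we have another",
    False: "we still have",
}


def translate_wir_haben_noch(noch_stressed: bool) -> str:
    """Pick the English reading of 'wir haben noch' from a stress flag."""
    return NOCH_READINGS[noch_stressed]
```

A real system would derive the stress flag from the duration, pitch, energy, and pause features listed above rather than receive it as a ready-made boolean.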

10
Another Use of Prosody
  • During parsing, the clause boundary marks that
    are inserted into the Word Hypothesis Graph by
    the prosody module play the role of punctuation
    marks in written language (p. 1489)
  • Important
  • In speech processing programs that deal with
    continuous speech, we don't have punctuation,
    and often no separation between words
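
To show how boundary marks can stand in for punctuation, here is a minimal sketch that splits a flat token stream into clauses. The "<B>" marker and the function name are hypothetical; in Verbmobil the marks live in the Word Hypothesis Graph rather than in a flat list.

```python
from typing import List

BOUNDARY = "<B>"  # hypothetical clause-boundary mark inserted by a prosody module


def split_clauses(tokens: List[str]) -> List[List[str]]:
    """Group tokens into clauses, treating BOUNDARY like a punctuation mark."""
    clauses: List[List[str]] = []
    current: List[str] = []
    for token in tokens:
        if token == BOUNDARY:
            if current:  # ignore empty clauses from repeated marks
                clauses.append(current)
                current = []
        else:
            current.append(token)
    if current:
        clauses.append(current)
    return clauses
```

A parser can then process each clause separately instead of one long unpunctuated word sequence.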

11
Statistical Translation Module
  • Input
  • single best sentence hypothesis from speech
    recognizer
  • Prosodic information about phrase boundaries
    utilized
  • Output
  • Sequence of words in the target language, with
    a confidence measure

12
Case-Based Translation
  • Substring-based
  • Searches sentence-aligned translation database
    for substrings matching current hypothesis
    translation
  • These substrings form units of processing
  • Translation templates
  • 30,000 templates, learned from sentence-aligned
    corpus
  • Date, time, naming expressions recognized by
    definite clause grammars (DCGs)
  • A search to explore cross-product graph of WHG
    (word hypothesis graph), subphrase tags, and
    template graph
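
The substring-based idea can be sketched as a greedy longest-match lookup. This is a deliberate simplification: it assumes a hypothetical table of already-aligned fragments, works on a single word sequence rather than a word hypothesis graph, and uses greedy matching in place of the graph search described above. The fragment entries are invented for illustration.

```python
from typing import Dict, List, Tuple

Fragment = Tuple[str, ...]


def translate_by_substrings(
    words: List[str],
    fragments: Dict[Fragment, str],
    max_len: int = 4,
) -> List[str]:
    """Cover the input left to right with the longest known fragments."""
    output: List[str] = []
    i = 0
    while i < len(words):
        # Try the longest candidate substring first, then shorter ones.
        for n in range(min(max_len, len(words) - i), 0, -1):
            key = tuple(words[i:i + n])
            if key in fragments:
                output.append(fragments[key])
                i += n
                break
        else:
            # No fragment matched: keep the source word, marked as untranslated.
            output.append("<" + words[i] + ">")
            i += 1
    return output


# Hypothetical aligned fragments; a real database would be learned from a corpus.
FRAGMENTS: Dict[Fragment, str] = {
    ("guten", "tag"): "hello",
    ("wir", "haben"): "we have",
    ("zeit",): "time",
}
```

Calling translate_by_substrings(["guten", "tag", "wir", "haben", "zeit"], FRAGMENTS) yields the matched fragments in order, and any word without a match is passed through in angle brackets.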

13
Questions - 1
  • What aspects of Verbmobil are
  • A) empiricist oriented?
  • B) rationalist oriented?

14
Questions - 2
  • Verbmobil is specific to the following domains:
  • Appointment scheduling
  • Trip planning (10,000 word vocabulary)
  • Remote PC maintenance (35,000 word vocabulary)
  • Why?