CHARM - PowerPoint PPT Presentation

Slides: 12
Provided by: MikeR2
Transcript and Presenter's Notes

Title: CHARM


1
CHARM
  • Lecture 1
  • Outline of the Problem

2
The Problem 1
The Maltese Alphabet (letter name in parentheses)
  A a (a)      B b (be)     Ċ ċ (ċe)     D d (de)     E e (e)
  F f (ef)     Ġ ġ (ġe)     G g (ge)     Għ għ (ajn)  H h (akka)
  Ħ ħ (ħe)     I i (i)      Ie ie (ie)   J j (je)     K k (ke)
  L l (elle)   M m (emme)   N n (enne)   O o (o)      P p (pe)
  Q q (qe)     R r (erre)   S s (esse)   T t (te)     U u (u)
  V v (ve)     W w (we)     X x (exxe)   Ż ż (że)     Z z (zej)
We will refer to ordinary characters that could
yield special Maltese characters (for example, h
standing in for ħ) as "charms".
3
The Problem 2
  • from KullHadd
  • FIL-KRIZI li ghandna fit-turizmu fil-gzejjer
    taghna l-aghar li qed jintlaqtu huma l-lukandi
    tal tliet stilel. L-ahhar studju li sar
    mid-Deloitte ghall-Assocjazzjoni Maltija
    tal-Lukandi u Ristoranti jghidilna kif in-nuqqas
    tal turisti u z-zieda fl-ispejjez ghal dawn
    il-lukandi fissru li ghamlu telf tal 19.8
    fir-rata tal qliegh taghhom u fosthom kien hemm
    min salva biss anki fl-aqwa tas-sajf permezz tal
    l-istudenti. L-istess studju juri li 70
    tas-sidien tal dawn il-lukandi jibzghu li se
    jkomplu jbatu min-nuqqas tal turisti u se
    jkollhom hafna kmamar vojta fix-xhur li gejjin.

4
The Problem 3
  • Is there some way in which we can recover the
    special Maltese characters automatically? If so:
  • What is the underlying algorithmic model?
  • What knowledge must the programme bring to bear?
  • What resources are needed to build the knowledge
    base?

5
Noisy Channel Model for Sentence Translation
(Brown et al. 1990)
[Diagram from Jurafsky & Martin; surviving labels: target sentence, source sentence, sentence]
6
Algorithmic Model
  • Noisy channel model is domain independent.
  • Brown applied it to the domain of translation
    from source language to target language.
  • We can use it for the domain of words.
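
For reference, the standard noisy channel decision rule can be written out with s a candidate source word and t the observed target word. The final line is not stated on the slides: it rests on the assumption that the channel only strips diacritics deterministically, so that P(t | s) is 1 for every s in the set S of words that could yield t, and 0 otherwise.

```latex
\begin{align*}
\hat{s} &= \operatorname*{arg\,max}_{s} P(s \mid t) \\
        &= \operatorname*{arg\,max}_{s} \frac{P(t \mid s)\, P(s)}{P(t)} \\
        &= \operatorname*{arg\,max}_{s} P(t \mid s)\, P(s)
           && \text{($P(t)$ does not depend on $s$)} \\
        &= \operatorname*{arg\,max}_{s \in S} P(s)
           && \text{(assuming $P(t \mid s) = 1$ for $s \in S$, else $0$)}
\end{align*}
```

Under that assumption the word-level problem reduces to ranking the candidates in S by their prior probability alone.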

7
Noisy Channel at Word Level
[Diagram: KullHadd source -> NOISY CHANNEL -> KullHadd target]
8
Main Algorithm: Four Steps
  • See target word t
  • Generate the set S of all possible source words
    for that word.
  • Pick the most probable source word s in S
  • Output s

9
Step 1: See Target Word
  • Preprocessing
      • noise
      • case
      • punctuation
      • hyphen
  • Tokenisation
      • words
      • numbers
      • other
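
The tokenisation step above could be sketched as follows. This is a minimal illustration, not the lecture's implementation: the pattern `TOKEN_RE` and the function `tokenise` are hypothetical names, and keeping hyphenated forms such as "fir-rata" as a single word token is an assumption. Noise removal and case handling are left out.

```python
import re

# Hypothetical token pattern (an assumption, not taken from the slides):
#   word   - letters, optionally joined by internal hyphens or apostrophes
#   number - digits, optionally with '.' or ',' separators
#   other  - any other single non-space character (punctuation etc.)
TOKEN_RE = re.compile(
    r"(?P<word>[^\W\d_]+(?:[-'][^\W\d_]+)*)"
    r"|(?P<number>\d+(?:[.,]\d+)*)"
    r"|(?P<other>\S)"
)


def tokenise(text):
    """Return (kind, token) pairs; kind is 'word', 'number' or 'other'."""
    return [(match.lastgroup, match.group()) for match in TOKEN_RE.finditer(text)]
```

Only tokens of kind 'word' would then be passed on to candidate generation; numbers and other tokens can be copied to the output unchanged.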

10
Step 2
  • Generate S
  • If t contains charms, generate
  • S = { s : for all 0 <= i < len(t),
    s[i] = t[i] or s[i] = m(t[i]) }
    (m(c) denotes the Maltese character that
    charm c could yield)
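
A minimal sketch of this candidate generation, assuming a per-character charm map. The table `CHARM_MAP` and the function `generate_candidates` are hypothetical names; the map itself is inferred from the Maltese alphabet (c, g, h, z and their capitals) rather than given on the slides.

```python
from itertools import product

# Assumed charm map m: ordinary character -> special Maltese character.
# Inferred from the alphabet; not taken from the slides.
CHARM_MAP = {
    "c": "ċ", "g": "ġ", "h": "ħ", "z": "ż",
    "C": "Ċ", "G": "Ġ", "H": "Ħ", "Z": "Ż",
}


def generate_candidates(t):
    """Return the set S of source words that could have yielded target word t."""
    # Each position offers either the character itself or, for a charm,
    # the character itself plus its mapped Maltese character.
    options = [(ch, CHARM_MAP[ch]) if ch in CHARM_MAP else (ch,) for ch in t]
    # The Cartesian product over positions enumerates every combination.
    return {"".join(chars) for chars in product(*options)}
```

A word with k charms produces 2^k candidates, so "hafna" gives two candidates and "ghall" gives four. Most of these are not real words, which is why the next step has to rank them.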

11
Step 3
  • Pick the most probable source word s in S
  • return argmax(P(s)) for s in S
  • This is covered in lecture 2
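
Pending the details in lecture 2, one simple way to realise argmax(P(s)) is to estimate P(s) from word counts in a corpus of correctly spelled Maltese. The sketch below assumes such a unigram estimate; the function `pick_most_probable` is a hypothetical name, and this is not necessarily the model the lecture goes on to use.

```python
from collections import Counter


def pick_most_probable(candidates, counts):
    """Return the candidate with the highest relative frequency in counts.

    counts is a Counter of word frequencies (assumed to come from a corpus
    with the special characters intact). Unseen words get probability 0.
    """
    total = sum(counts.values())

    def p(s):
        # Counter returns 0 for words it has never seen.
        return counts[s] / total if total else 0.0

    # Sorting first makes ties deterministic: max() keeps the first maximum.
    return max(sorted(candidates), key=p)
```

With invented toy counts such as `Counter({"ħafna": 12})`, the candidates {"hafna", "ħafna"} resolve to "ħafna". When no candidate has been seen, the tie falls to the candidate that sorts first, which is the unchanged ASCII spelling because the ASCII letters precede the special characters in code point order.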