Introduction to the Language Technologies Institute - PowerPoint PPT Presentation

Slides: 23
Provided by: jaimeca
Learn more at: http://www.cs.cmu.edu

Transcript and Presenter's Notes



1
Introduction to the Language
Technologies Institute
  • Fall 2008
  • Jaime Carbonell
  • jgc_at_cs.cmu.edu

2
School of Computer Science at Carnegie Mellon
University
  • Computer Science Department (theory, systems)
  • Robotics Institute (space, industry, medical)
  • Language Technologies Institute (MT, speech, IR)
  • Human-Computer Interaction Inst. (Ergonomics)
  • Institute for Software Research Int. (SE)
  • Machine Learning Department (ML theory)
  • Entertainment Technologies (Animation, graphics)

3
Language Technologies Institute
  • Founded in 1986 as the Center for Machine
    Translation (CMT).
  • Became the Language Technologies Institute in 1996,
    unifying CMT and the Computational Linguistics program.
  • Current size: 197 FTEs
  • 27 Faculty (including joint appointments)
  • 25 Staff
  • 125 Graduate Students (90 PhD, 40 MLT)
  • 10 Visiting Scholars

4
LTI Bill of Rights
  • Get the right information
  • To the right people
  • At the right time
  • On the right medium
  • In the right language
  • At the right level of detail

5
Slogan Challenges
  • right information: IR, filtering, TC
  • right people: routing, personalization
  • right time: anticipatory analysis
  • right medium: text, speech, video
  • right language: translation, bio
  • right detail: summarization, expansion

6
on the Right Medium
  • Speech Recognition
    • SPHINX (Reddy, Rudnicky, Rosenfeld, …)
    • JANUS (Waibel, Schultz, …)
  • Speech Synthesis
    • Festival (Black, Lenzo)
  • Handwriting and Gesture Recognition
    • ISL (Waibel, J. Yang)
  • Multimedia Integration (CSD)
    • Informedia (Wactlar, Hauptmann, …)

7
in the Right Language
  • High-Accuracy Interlingual MT
    • KANT (Nyberg, Mitamura)
  • Parallel Corpus-Trainable MT
    • Statistical MT (Lafferty, Vogel)
    • Example-Based MT (Brown, Carbonell)
    • AVENUE Instructible MT (Levin, Lavie, Carbonell)
    • Multi-Engine MT (Lavie, Frederking)
  • Speech-to-speech MT
    • JANUS/DIPLOMAT/AVENUE (Waibel, Frederking, Levin,
      Schultz, Vogel, Lafferty, Black, …)

8
We Also Engage In
  • Tutoring Systems (Eskenazi, Callan)
  • Linguistic Analysis (Levin, Mitamura)
  • Dialog Systems (Rudnicky, Waibel, …)
  • Computational Biology
    • Protein structure/function (Carbonell, Langmead)
    • DNA sequences/motifs (Yang, Xing, Rosenfeld)
  • Complex System Design (Nyberg, Callan)
  • Machine Learning (Carbonell, Lafferty, Yang,
    Rosenfeld, Xing, Cohen, …)
  • Question Answering (Nyberg, Mitamura, …)

9
How we do it at LTI
  • Data-driven methods
    • Statistical learning
    • Corpora-based
    • Examples
      • Statistical MT
      • Example-based MT
      • Text categorization
      • Novelty detection
      • Translingual IR
  • Knowledge-based methods
    • Symbolic learning
    • Linguistic analysis
    • Knowledge representation
    • Examples
      • Interlingual MT
      • Parsing and generation
      • Discourse modeling
      • Language tutoring

10
MMR Ranking vs Standard IR
[Figure: MMR ranking vs. standard IR ranking of documents around a query; λ controls the spiral curl.]
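Maximal Marginal Relevance (MMR), as commonly formulated, re-ranks greedily: each step picks the unselected document D that maximizes λ·Sim(D, Q) - (1 - λ)·max Sim(D, D') over the documents D' already chosen. The minimal sketch below assumes bag-of-words cosine similarity for both terms; `mmr_rank`, `cosine`, and the toy documents are hypothetical. With λ = 1 it reduces to standard relevance ranking, while a smaller λ pushes near-duplicates down the list.

```python
import math
from collections import Counter
from typing import List, Optional


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def mmr_rank(query: str, docs: List[str], lam: float = 0.5,
             k: Optional[int] = None) -> List[int]:
    """Greedy MMR re-ranking (hypothetical helper); returns indices in selection order.

    lam = 1.0 reduces to pure relevance ranking (standard IR);
    smaller lam penalizes similarity to already-selected documents.
    """
    q = Counter(query.lower().split())
    vecs = [Counter(d.lower().split()) for d in docs]
    remaining = list(range(len(docs)))
    selected: List[int] = []
    limit = len(docs) if k is None else k

    def mmr_score(i: int) -> float:
        relevance = cosine(vecs[i], q)
        # Redundancy: highest similarity to anything already picked (0 if none yet).
        redundancy = max((cosine(vecs[i], vecs[j]) for j in selected), default=0.0)
        return lam * relevance - (1.0 - lam) * redundancy

    while remaining and len(selected) < limit:
        best = max(remaining, key=mmr_score)
        selected.append(best)
        remaining.remove(best)
    return selected


if __name__ == "__main__":
    docs = [
        "machine translation of speech",
        "machine translation of speech systems",
        "protein structure prediction",
    ]
    print(mmr_rank("machine translation", docs, lam=1.0))  # [0, 1, 2]
    print(mmr_rank("machine translation", docs, lam=0.5))  # [0, 2, 1]
```

With λ = 0.5 the near-duplicate second document drops below the unrelated third one, which is the diversity effect the figure contrasts with standard IR.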
11

Adaptive Filtering over a Document Stream

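[Figure: adaptive filtering along a time axis, with training documents (past) followed by test documents for Topics 1, 2 and 3. Each current document is classified as on-topic or not; legend: unlabeled documents, on-topic documents, off-topic documents, RF.]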
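The slide does not say which learner drives the filter. As a stand-in, the sketch below uses a Rocchio-style topic profile: it is initialized from past training documents, each incoming document is delivered if its cosine similarity to the profile clears a threshold, and feedback on delivered documents (assuming RF denotes relevance feedback) moves the profile toward on-topic and away from off-topic text. `AdaptiveFilter`, `filter_stream`, the threshold, and the β/γ weights are all hypothetical.

```python
import math
from collections import Counter
from typing import Callable, Iterable, List


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-weight vectors."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class AdaptiveFilter:
    """Rocchio-style profile for one topic (hypothetical stand-in learner)."""

    def __init__(self, training_docs: Iterable[str], threshold: float = 0.3,
                 beta: float = 0.75, gamma: float = 0.25) -> None:
        self.profile: Counter = Counter()
        for doc in training_docs:  # past, on-topic training documents
            self.profile.update(doc.lower().split())
        self.threshold = threshold
        self.beta = beta    # weight for positive feedback
        self.gamma = gamma  # weight for negative feedback

    def is_on_topic(self, doc: str) -> bool:
        return cosine(Counter(doc.lower().split()), self.profile) >= self.threshold

    def feedback(self, doc: str, relevant: bool) -> None:
        """Move the profile toward (relevant) or away from (off-topic) the document."""
        weight = self.beta if relevant else -self.gamma
        for term, count in Counter(doc.lower().split()).items():
            # Clamp at zero so off-topic feedback never creates negative term weights.
            self.profile[term] = max(0.0, self.profile[term] + weight * count)


def filter_stream(flt: AdaptiveFilter, stream: Iterable[str],
                  judge: Callable[[str], bool]) -> List[str]:
    """Process documents in time order; feedback arrives only for delivered documents."""
    delivered = []
    for doc in stream:
        if flt.is_on_topic(doc):
            delivered.append(doc)
            flt.feedback(doc, judge(doc))
    return delivered


if __name__ == "__main__":
    flt = AdaptiveFilter(["protein structure prediction"])
    stream = ["protein structure folding", "speech recognition systems", "protein prices"]
    print(filter_stream(flt, stream, lambda d: "structure" in d.split()))
    # ['protein structure folding', 'protein prices']
```

Restricting feedback to delivered documents mirrors the usual filtering setting, where a user can only judge what the system actually showed them.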
12
(No Transcript)
13
Types of Machine Translation
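[Figure: MT pyramid from Source (Arabic) to Target (English). Top: Interlingua. Next level: Semantic Analysis (source side) and Sentence Planning (target side). Middle: Transfer Rules. Lower level: Syntactic Parsing (source side) and Text Generation (target side). Base: Direct (SMT, EBMT).]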
14
EBMT Example
English: I would like to meet her.
Mapudungun: Ayükefun trawüael fey engu.

English: The tallest man is my father.
Mapudungun: Chi doy fütra chi wentru fey ta inche ñi chaw.

English: I would like to meet the tallest man
Mapudungun (new): Ayükefun trawüael Chi doy fütra chi wentru
Mapudungun (correct): Ayüken ñi trawüael chi doy fütra wentruengu.
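The "(new)" output in this example is what naive fragment recombination produces. The toy sketch below reproduces it by greedily matching the longest known source fragment and concatenating the aligned Mapudungun fragments. The fragment alignments in `FRAGMENTS` are hypothetical, inferred from the two example pairs; a real EBMT system would also have to learn alignments and adjust morphology, which is exactly where the "(new)" and "(correct)" outputs differ.

```python
from typing import Dict, List, Tuple

# Hypothetical fragment alignments, inferred from the two example pairs on the slide.
FRAGMENTS: Dict[Tuple[str, ...], str] = {
    ("i", "would", "like", "to", "meet"): "Ayükefun trawüael",
    ("her",): "fey engu",
    ("the", "tallest", "man"): "Chi doy fütra chi wentru",
    ("is", "my", "father"): "fey ta inche ñi chaw",
}


def tokenize(sentence: str) -> List[str]:
    return sentence.lower().strip(".").split()


def ebmt_translate(sentence: str) -> str:
    """Greedy longest-match recombination of stored target fragments."""
    tokens = tokenize(sentence)
    longest = max(len(k) for k in FRAGMENTS)
    output: List[str] = []
    i = 0
    while i < len(tokens):
        for n in range(min(longest, len(tokens) - i), 0, -1):
            chunk = tuple(tokens[i:i + n])
            if chunk in FRAGMENTS:
                output.append(FRAGMENTS[chunk])
                i += n
                break
        else:
            output.append(tokens[i])  # no matching fragment: pass the word through
            i += 1
    return " ".join(output)


if __name__ == "__main__":
    print(ebmt_translate("I would like to meet the tallest man"))
    # Ayükefun trawüael Chi doy fütra chi wentru
```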


15
Ambiguity Makes MT Hard
  • Word senses for 'line' (52 senses in the Random House
    English-Japanese Dictionary)
  • Power line: densen
  • Subway line: chikatetsu
  • (Be) on line: onrain
  • (Be) on the line: denwachuu
  • Line up: narabu
  • Line one's pockets: kanemochi ni naru
  • Line one's jacket: uwagi o nijuu ni suru
  • Actor's line: serifu
  • Get a line on: joho o eru

16
CONTEXT: More is Better
  • The line for the new play extended for 3
    blocks.
  • The line for the new play was changed by the
    scriptwriter.
  • The line for the new play got tangled with the
    other props.
  • The line for the new play better protected the
    quarterback.
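One simple way to let context pick a sense is overlap-based, simplified-Lesk-style disambiguation: score each sense by how many of its signature words appear in the sentence. This is a swapped-in illustration rather than an LTI method, and the sense signatures in `SENSES` are hand-written and hypothetical.

```python
import re
from typing import Dict, Set

# Hypothetical sense "signatures": hand-picked context words for four senses of "line".
SENSES: Dict[str, Set[str]] = {
    "queue of people": {"extended", "blocks", "waiting", "tickets"},
    "line of a script": {"scriptwriter", "changed", "actor", "dialogue"},
    "rope or cord": {"tangled", "props", "rope", "tied"},
    "offensive line (football)": {"protected", "quarterback", "blockers", "football"},
}


def disambiguate(sentence: str) -> str:
    """Pick the sense whose signature overlaps the sentence's words the most."""
    context = set(re.findall(r"[a-z0-9]+", sentence.lower()))
    # max() keeps the first sense on ties, so a context with no overlap gets an arbitrary pick.
    return max(SENSES, key=lambda sense: len(SENSES[sense] & context))


if __name__ == "__main__":
    for s in [
        "The line for the new play extended for 3 blocks.",
        "The line for the new play was changed by the scriptwriter.",
        "The line for the new play got tangled with the other props.",
        "The line for the new play better protected the quarterback.",
    ]:
        print(disambiguate(s))
```

Given only "The line for the new play", every sense scores zero and the choice is arbitrary; the extra words at the end of each sentence are what decide it.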

17
(Borrowed from Judith Klein-Seetharaman)
PROTEINS: Sequence → Structure → Function
Primary Sequence
MNGTEGPNFY VPFSNKTGVV RSPFEAPQYY LAEPWQFSML
AAYMFLLIML GFPINFLTLY VTVQHKKLRT PLNYILLNLA
VADLFMVFGG FTTTLYTSLH GYFVFGPTGC NLEGFFATLG
GEIALWSLVV LAIERYVVVC KPMSNFRFGE NHAIMGVAFT
WVMALACAAP PLVGWSRYIP EGMQCSCGID YYTPHEETNN
ESFVIYMFVV HFIIPLIVIF FCYGQLVFTV KEAAAQQQES
ATTQKAEKEV TRMVIIMVIA FLICWLPYAG VAFYIFTHQG
SDFGPIFMTI PAFFAKTSAV YNPVIYIMMN KQFRNCMVTT
LCCGKNPLGD DEASTTVSKT ETSQVAPA
[Figure: Folding → 3D Structure → Complex function within network of proteins (Normal)]
18
PROTEINS: Sequence → Structure → Function
Primary Sequence
MNGTEGPNFY VPFSNKTGVV RSPFEAPQYY LAEPWQFSML
AAYMFLLIML GFPINFLTLY VTVQHKKLRT PLNYILLNLA
VADLFMVFGG FTTTLYTSLH GYFVFGPTGC NLEGFFATLG
GEIALWSLVV LAIERYVVVC KPMSNFRFGE NHAIMGVAFT
WVMALACAAP PLVGWSRYIP EGMQCSCGID YYTPHEETNN
ESFVIYMFVV HFIIPLIVIF FCYGQLVFTV KEAAAQQQES
ATTQKAEKEV TRMVIIMVIA FLICWLPYAG VAFYIFTHQG
SDFGPIFMTI PAFFAKTSAV YNPVIYIMMN KQFRNCMVTT
LCCGKNPLGD DEASTTVSKT ETSQVAPA
[Figure: Folding → 3D Structure → Complex function within network of proteins]
19
Predicting Protein Structures
  • Protein structure is a key determinant of protein
    function
  • Crystallography, which resolves protein structures
    experimentally in vitro, is very expensive, and NMR
    can only resolve very small proteins
  • There is a large gap between the number of known
    protein sequences and resolved structures
  • 3,023,461 sequences vs. 36,247 resolved
    structures (1.2%)
  • Therefore we need to predict structures in silico
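The 1.2% figure is the fraction of known sequences that have a resolved structure; a quick check using only the slide's two counts:

```python
sequences = 3_023_461  # known protein sequences (figure from the slide)
structures = 36_247    # experimentally resolved structures (figure from the slide)

coverage_pct = structures / sequences * 100
print(f"{coverage_pct:.1f}%")                                            # 1.2%
print(f"{sequences / structures:.0f} sequences per resolved structure")  # 83
```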

20
Linked Segmentation CRF
  • Nodes: secondary structure elements and/or simple
    folds
  • Edges: local interactions and long-range
    inter-chain and intra-chain interactions
  • The L-SCRF conditional probability of y given x is
    defined as:
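The slide's L-SCRF formula is not reproduced in this transcript. For orientation only, the sketch below computes the conditional probability for the simplest CRF, a linear-chain model, P(y | x) = exp(score(x, y)) / Z(x), where score sums node and edge potentials and Z(x) comes from the forward algorithm (checked against brute-force enumeration). The L-SCRF replaces single positions with segments and adds long-range edges, so this is an analogy, not its definition; the random potentials and the label count are hypothetical.

```python
import itertools
import math
import random
from typing import List, Sequence


def log_sum_exp(values: Sequence[float]) -> float:
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def sequence_score(y: Sequence[int], emit: List[List[float]], trans: List[List[float]]) -> float:
    """Unnormalized log-score of labels y: node potentials plus edge potentials.

    emit[t][k] stands for weighted node features already evaluated on the input x;
    trans[j][k] stands for weighted edge features between adjacent labels j -> k.
    """
    score = emit[0][y[0]]
    for t in range(1, len(y)):
        score += trans[y[t - 1]][y[t]] + emit[t][y[t]]
    return score


def log_partition(emit: List[List[float]], trans: List[List[float]]) -> float:
    """log Z(x) via the forward algorithm: O(T * K^2) instead of O(K^T)."""
    num_labels = len(emit[0])
    alpha = list(emit[0])
    for t in range(1, len(emit)):
        alpha = [
            emit[t][k] + log_sum_exp([alpha[j] + trans[j][k] for j in range(num_labels)])
            for k in range(num_labels)
        ]
    return log_sum_exp(alpha)


def crf_probability(y: Sequence[int], emit: List[List[float]], trans: List[List[float]]) -> float:
    """P(y | x) = exp(score(x, y)) / Z(x)."""
    return math.exp(sequence_score(y, emit, trans) - log_partition(emit, trans))


if __name__ == "__main__":
    rng = random.Random(0)
    T, K = 4, 3  # sequence length; number of labels (hypothetical, e.g. helix/strand/coil)
    emit = [[rng.uniform(-1, 1) for _ in range(K)] for _ in range(T)]
    trans = [[rng.uniform(-1, 1) for _ in range(K)] for _ in range(K)]
    all_y = list(itertools.product(range(K), repeat=T))
    brute = log_sum_exp([sequence_score(y, emit, trans) for y in all_y])
    print(abs(brute - log_partition(emit, trans)) < 1e-9)                 # True
    print(round(sum(crf_probability(y, emit, trans) for y in all_y), 6))  # 1.0
```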

21
Discriminative Semi-Markov Model for Parallel
Right-Handed β-Helix Prediction
  • Structures
    • A regular supersecondary structure with an
      elongated helix whose successive rungs are
      composed of beta-strands
    • Conserved T2 turn
  • Computational importance
    • Long-range interactions
  • Biological importance
    • Functions such as bacterial infection of
      plants, binding the O-antigen, antifreeze, …
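A semi-Markov model labels whole segments rather than single positions, so a segment's score can depend on its full extent, for example an entire strand or turn. The minimal sketch below is a generic semi-Markov Viterbi decoder: it finds the highest-scoring split of positions 0..n-1 into labeled segments of length at most `max_len`. The labels, the toy sequence, and both scoring functions are hypothetical placeholders, not the discriminative model on this slide, which would learn its segment features from data.

```python
from typing import Callable, Dict, List, Optional, Sequence, Tuple

Segment = Tuple[int, int, str]  # (start, end_exclusive, label)


def semi_markov_viterbi(
    n: int,
    labels: Sequence[str],
    max_len: int,
    segment_score: Callable[[int, int, str], float],
    transition: Callable[[str, str], float],
) -> Tuple[float, List[Segment]]:
    """Best segmentation of positions [0, n) into labeled segments of length <= max_len."""
    # best[(end, y)] = (score, start, prev_label) for the best partial segmentation
    # of [0, end) whose last segment has label y and begins at `start`.
    best: Dict[Tuple[int, str], Tuple[float, int, Optional[str]]] = {}
    for end in range(1, n + 1):
        for y in labels:
            candidates = []
            for length in range(1, min(max_len, end) + 1):
                start = end - length
                seg = segment_score(start, end, y)
                if start == 0:
                    candidates.append((seg, start, None))
                else:
                    for prev in labels:
                        total = best[(start, prev)][0] + transition(prev, y) + seg
                        candidates.append((total, start, prev))
            best[(end, y)] = max(candidates, key=lambda c: c[0])

    # Backtrack from the best label for the final segment.
    label: Optional[str] = max(labels, key=lambda y: best[(n, y)][0])
    score = best[(n, label)][0]
    segments: List[Segment] = []
    end = n
    while label is not None:
        _, start, prev = best[(end, label)]
        segments.append((start, end, label))
        end, label = start, prev
    return score, segments[::-1]


if __name__ == "__main__":
    seq = "VVVGGVVV"  # hypothetical toy residues

    def score(start: int, end: int, y: str) -> float:
        # Hypothetical: "S" (strand) prefers V, "T" (turn) prefers G; -0.5 per segment.
        good = "V" if y == "S" else "G"
        return sum(1.0 if c == good else -2.0 for c in seq[start:end]) - 0.5

    def trans(prev: str, y: str) -> float:
        return -1.0 if prev == y else 0.0  # discourage splitting a run into same-label pieces

    print(semi_markov_viterbi(len(seq), ["S", "T"], 4, score, trans))
    # (6.5, [(0, 3, 'S'), (3, 5, 'T'), (5, 8, 'S')])
```

Because scores attach to whole segments, features such as segment length or a conserved turn pattern can be expressed directly, which a per-position model cannot do.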

22
Some LTI Accomplishments
  • First large-scale web spider (LYCOS)
  • First speech-to-speech MT (JANUS)
  • First high-accuracy text MT (KANT)
  • First minority-language MT (DIPLOMAT)
  • First high-accuracy translingual IR
  • First multidocument summarizer (MMR)