Course Overview - PowerPoint PPT Presentation

Title:

Course Overview

Description:

Title: PowerPoint Presentation Last modified by: Department of Computing Science Created Date: 1/1/1601 12:00:00 AM Document presentation format – PowerPoint PPT presentation

Slides: 25
Provided by: acuk


1
Course Overview
  • What is AI?
  • What are the Major Challenges?
  • What are the Main Techniques?
  • Where are we failing, and why?
  • Step back and look at the Science
  • Step back and look at the History of AI
  • What are the Major Schools of Thought?
  • What of the Future?
  • What are we trying to do? How far have we got?
  • Natural language (text and speech) (continued)
  • Robotics
  • Computer vision
  • Problem solving
  • Learning
  • Board games
  • Applied areas: video games, healthcare, ...
  • What has been achieved, and not achieved, and
    why is it hard?

2
Language Technology
[Figure: language technology diagram. Labels: Interlingua (open1, open2); Meaning ("Hard!"); Text and Speech, each shown twice; "Cheater's shortcut".]
3
Modern Machine Translation
  • Prevalent approach uses statistics, following an
    idea by Warren Weaver (conceived as early as
    1947)
  • View translation as a form of decoding: Dutch is
    just coded English (or the other way round)
  • i.e. look at the problem from the computer's
    point of view
  • Deciphering coded text, which replaces each
    English word with a coded word
  • Suppose you have a large coded text, and an
    even larger corpus of plain English
  • You guess the plain version of a coded word by
    comparing the frequency of that word in the
    coded text with the frequencies of the words in
    the corpus
  • E.g., the most frequent English word is "the", so
    the most frequent word in the coded text may be
    code for "the". (Just a guess!)
  • Check whether this combination of guesses gives
    proper English text; change where necessary
  • Can try with Google: "wound will cure/heal",
    "served his sorrow/sentence"
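
The frequency-matching guess described on this slide can be sketched in a few lines of Python. This is only an illustration: the coded text, the tiny corpus, and the function names are invented here, and pairing words purely by frequency rank is the crudest possible first guess, which would then need the checking step mentioned above.

```python
from collections import Counter

def guess_codebook(coded_text, english_corpus):
    """Pair coded words with English words by frequency rank (a first guess only)."""
    coded_ranked = [w for w, _ in Counter(coded_text.split()).most_common()]
    english_ranked = [w for w, _ in Counter(english_corpus.split()).most_common()]
    return dict(zip(coded_ranked, english_ranked))

def decode(coded_text, codebook):
    """Replace each coded word by its guessed English word ('?' if no guess)."""
    return " ".join(codebook.get(w, "?") for w in coded_text.split())

# Invented toy data: "xx" is the most frequent coded word,
# "the" is the most frequent word in the English corpus.
corpus = "the cat saw the dog and the dog saw the bird"
coded = "xx qq zz xx pp"

codebook = guess_codebook(coded, corpus)
print(codebook["xx"])
print(decode(coded, codebook))
```

With so little data only the guess for the most frequent word is trustworthy; the remaining pairings depend on ties and would have to be corrected by checking whether the decoded text reads as proper English.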

4
Modern Machine Translation
  • But of course, Dutch is not just coded English.
    (For example, the right translation for "open"
    may depend on the words surrounding "open".)
  • How do we find out how sentences in the two
    languages are related?
  • To get a good starting point, Machine Translation
    uses huge bilingual corpora (usually based on
    human translation)
  • Example: the Canadian Hansard corpus (bilingual
    French/English parliamentary proceedings); also
    Hong Kong
  • (But I'll use Dutch as an example)

5
Modern Machine Translation
  • Here we will not explain the statistical
    techniques used
  • Just observe: guess how expressions line up
    across the two languages
  • Based on pure pattern matching. No knowledge of
    Dutch or English is required
  • NB: in a statistical translation program, it is
    no longer easy to see understanding followed by
    generation

6
Modern Machine Translation
  • Perform various preparatory operations (e.g.,
    match corresponding sentences with each other)
  • Hypothesise ways of matching smaller expressions
    with each other. Example 1:
  • E: that Blair responded
  • N: dat Blair antwoordde
  • E: whether Kennedy responded
  • N: of Kennedy antwoordde

7
Modern Machine Translation
  • Here is a more interesting example, involving
    differences in word order between the two
    languages
  • Example 2:
  • E: that Blair responded to the question
  • N: dat Blair op de vraag antwoordde
  • E: whether Kennedy responded to the challenge
  • N: of Kennedy op de uitdaging antwoordde
  • Need offsets in the translation model
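
One way to "guess how expressions line up" from these four sentence pairs is to count which words keep appearing together. The sketch below uses the Dice coefficient for that; this is a choice made here for illustration, not necessarily the technique the course has in mind, and the function names are invented. It also shows why offsets are needed: the same word pair lines up at different relative positions in Example 1 and Example 2.

```python
from collections import Counter
from itertools import product

# The four sentence pairs from Examples 1 and 2 (E = English, N = Dutch).
pairs = [
    ("that Blair responded", "dat Blair antwoordde"),
    ("whether Kennedy responded", "of Kennedy antwoordde"),
    ("that Blair responded to the question", "dat Blair op de vraag antwoordde"),
    ("whether Kennedy responded to the challenge",
     "of Kennedy op de uitdaging antwoordde"),
]

def dice_scores(pairs):
    """Score each (English, Dutch) word pair by how consistently they co-occur."""
    e_count, n_count, joint = Counter(), Counter(), Counter()
    for e_sent, n_sent in pairs:
        e_words, n_words = set(e_sent.split()), set(n_sent.split())
        e_count.update(e_words)
        n_count.update(n_words)
        joint.update(product(e_words, n_words))
    return {(e, n): 2 * c / (e_count[e] + n_count[n]) for (e, n), c in joint.items()}

def best_match(e_word, scores):
    """Dutch word with the highest co-occurrence score for an English word."""
    candidates = {n: s for (e, n), s in scores.items() if e == e_word}
    return max(candidates, key=candidates.get)

def offset(e_word, n_word, e_sent, n_sent):
    """How many positions the Dutch word is shifted relative to the English word."""
    return n_sent.split().index(n_word) - e_sent.split().index(e_word)

scores = dice_scores(pairs)
print(best_match("responded", scores))
print(offset("responded", "antwoordde", *pairs[0]))
print(offset("responded", "antwoordde", *pairs[2]))
```

No knowledge of Dutch or English goes into this. With only four pairs, several words remain ambiguous (for instance "that" co-occurs equally often with "dat" and "Blair"), which is one reason real systems need huge corpora.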

8
Models, Unigrams, Bigrams, Trigrams
  • Need a translation model and a language model
  • Translation model tells us likely translations
    (roughly)
  • Language model tells us how good those sentences
    are in the target language
  • Language model: ideally we would like to know
    how common any sentence is; we will settle for
    pairs of words (bigrams)
  • Translation model: often use something quite
    crude, like word by word, and correct positions
    with offsets
  • A good language model can save a bad translation
    model
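
The last point can be illustrated with a small sketch. Everything in it is an assumption made for the example: the word-by-word dictionary, the three-sentence English corpus, the add-one smoothing, and the decision to treat every reordering of the translated words as equally likely (a stand-in for a proper offset model). The bigram language model alone then has to pick the best word order.

```python
import math
from collections import Counter
from itertools import permutations

# Hypothetical word-by-word dictionary (the crude "translation model").
dictionary = {"dat": "that", "Blair": "Blair", "op": "to", "de": "the",
              "vraag": "question", "antwoordde": "responded"}

# Tiny invented English corpus used to train the bigram language model.
corpus = ["whether Kennedy responded to the challenge",
          "that Blair responded",
          "she listened to the question"]

def train_bigrams(sentences):
    """Count bigrams, with <s> and </s> marking sentence start and end."""
    context, bigram, vocab = Counter(), Counter(), {"</s>"}
    for s in sentences:
        words = ["<s>"] + s.split() + ["</s>"]
        vocab.update(words[1:])
        for a, b in zip(words, words[1:]):
            context[a] += 1
            bigram[a, b] += 1
    return context, bigram, len(vocab)

def log_prob(sentence, model):
    """Bigram log-probability; add-one smoothing keeps unseen pairs non-zero."""
    context, bigram, v = model
    words = ["<s>"] + sentence.split() + ["</s>"]
    return sum(math.log((bigram[a, b] + 1) / (context[a] + v))
               for a, b in zip(words, words[1:]))

def translate(dutch, model):
    """Translate word by word, then let the language model choose the order."""
    words = [dictionary[w] for w in dutch.split()]
    candidates = (" ".join(p) for p in permutations(words))
    return max(candidates, key=lambda c: log_prob(c, model))

model = train_bigrams(corpus)
word_by_word = " ".join(dictionary[w] for w in "dat Blair op de vraag antwoordde".split())
best = translate("dat Blair op de vraag antwoordde", model)
print(word_by_word)
print(best)
```

Trying all permutations is only feasible for very short sentences; it is used here to keep the sketch small, not as a suggestion for how real systems search.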

9
How far has Machine Translation advanced?
  • National Institute of Standards and Technology
    (NIST)
  • Regular competitions between MT systems
  • (Source: K. Knight, Statistical MT Tutorial,
    Aberdeen 2005)

10
winner 2002
  • insistent Wednesday may recurred her trips to
    Libya tomorrow for flying
  • Cairo 6-4 (AFP) an official announced today in
    the Egyptian lines company for flying Tuesday is
    a company insistent for flying may resumed a
    consideration of a day wednesday tomorrow her
    trips to Libya of security council decision trace
    international the imposed ban comment

winner 2003
Egyptair Has Tomorrow to Resume its flights to
Libya Cairo 4-6 (AFP) Said an official at the
Egyptian Aviation Company today that the company
egyptair may resume as of tomorrow, Wednesday its
flights to Libya after the International Security
Council resolution to the suspension of the
embargo imposed on Libya.
11
Conclusion on Statistical MT
  • This approach to MT relies on massive parallel
    corpora; these are not yet available for all
    language pairs
  • The MT system does not understand the content
    of the sentences
  • Perhaps progress using statistical methods will
    flatten in future
  • but they are starting to be combined with
    higher-level information

12
Practical Machine Translation
  • Types of translation:
  • Rough translation: could perhaps be post-edited
    by a monolingual human (cheaper)
  • Restricted source translation: subject and form
    restricted, e.g. weather forecast
  • Pre-edited translation: human pre-edits, e.g.
    Caterpillar English; can improve the original too
  • Literary

13
Summing up Modern Translation
  • Deep vs. Shallow?
  • Deep - comprehensive knowledge of the world.
  • Shallow - no knowledge.
  • So far, shallow approaches more successful.
  • Deep can be better on a particular domain if a
    lot of expert effort is put into building models
  • Shallow approach is much easier
  • Similar story in other areas of AI
  • Each of these programs on its own is highly
    specialised (i.e., limited)

14
On the other hand
  • Humans don't always get it right either!
  • French hotel: "Please leave your values at the
    front desk."
  • Athens hotel: "We expect our visitors to complain
    daily at the office between the hours of 9 and 11
    a.m."
  • Tokyo hotel room: "The flattening of underwear is
    the job of the chambermaid - get it done, turn
    her on."
  • Hong Kong tailor shop: "Order your summer suit
    now. Because of big rush we execute customers in
    strict rotation."
  • Men's room at Mexican golf course/resort:
    "Guests are requested not to wash their balls in
    the hand basins."
  • Budapest elevator: "due to out of order we
    regret that you are unbearable"
  • Bangkok dry cleaner's: "Drop Trousers Here for
    Best Results"
  • Tokyo hotel room: "Please take advantage of our
    chambermaids."
  • They do understand, but they may make the wrong
    choices in the target language

15
Speech Recognition
  • Signal processing to recognise features
  • Coarticulation: model how each sound (phone)
    depends on its neighbours
  • Dialect: different possible pronunciations
  • To recognise isolated words: use a unigram
    language model again
  • Continuous speech: use a bigram or trigram model
  • Try:
  • "eat I scream" vs. "eat ice cream"
  • "eat a banana" vs. "eat a bandana"
16
Speech Recognition
  • Humans are remarkably good because of high level
    knowledge
  • Computers:
  • No background noise, single speaker, vocabulary
    of a few thousand words: >99%
  • In general, with good acoustics: 60-80%
  • On a noisy phone: terrible

17
Natural Language Generation (NLG)
  • Natural Language Generation is better than having
    people write texts when
  • There are many potential documents to be written,
    differing according to the context (user,
    situation, language)
  • There are some general principles behind document
    design

18
Example: Noun Phrase design
  • A noun phrase can convey an arbitrary amount of
    information
  • Someone vs.
  • a designer vs.
  • an old designer vs.
  • an old designer with red hair
  • How much information should we pack into a
    given Noun Phrase?
  • This is normally considered part of the
    aggregation task.
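
The range from "a designer" to "an old designer with red hair" can be made concrete with a small sketch. The function names and the `budget` parameter are invented for this illustration; a real generator would decide how much to pack in from readability, focus and ordering, not from a fixed number.

```python
def article(word):
    """Pick 'a' or 'an' from the first letter (a rough spelling-based rule)."""
    return "an" if word[0].lower() in "aeiou" else "a"

def noun_phrase(head, premodifiers=(), postmodifiers=(), budget=0):
    """Build an indefinite noun phrase using at most `budget` modifiers."""
    pre = list(premodifiers)[:budget]
    post = list(postmodifiers)[:max(0, budget - len(pre))]
    words = pre + [head]
    phrase = " ".join([article(words[0])] + words)
    return " ".join([phrase] + post)

for budget in range(3):
    print(noun_phrase("designer", ["old"], ["with red hair"], budget))
```

Raising the budget packs more information into the same noun phrase; the hard part, which the sketch leaves out, is deciding when that stops being an improvement.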

19
Some Issues to Consider
  • Preferred ordering within the text (e.g. most
    important first)
  • Readability of the Noun Phrase,
  • Flow of focus,
  • Successful use of pronouns and abbreviated
    references

20
Example Content
  • (NB: we assume that words, basic syntax, etc. have
    been chosen)
  • This T-shirt was made by James Sportler.
  • Sportler is a famous British designer.
  • He drives an ancient pink Jaguar.
  • He works in London with Thomas Wendsop.
  • Wendsop won the first prize in the FWJG awards.
  • Can/should we add more to the Noun Phrase?

21
One possible addition
  • This T-shirt was made by James Sportler, who
    works in London with Thomas Wendsop.
  • Sportler is a famous British designer. He drives
    an ancient pink Jaguar.
  • Wendsop won the first prize in the FWJG awards.
  • Facts about Wendsop are now separated from one
    another (focus).
  • Wendsop now has greater prominence in the text
    (ordering)

22
Another possible addition
  • This T-shirt was made by James Sportler, a famous
    British designer who works in London with Thomas
    Wendsop, who won the first prize in the FWJG
    awards.
  • Sportler drives an ancient pink Jaguar.
  • The Noun Phrase is now very complex (readability)
  • "He" no longer seems to work in the second
    sentence (pronouns)

23
Another possible addition
  • This T-shirt was made by James Sportler, a famous
    British designer.
  • He drives an ancient pink Jaguar.
  • He works in London with Thomas Wendsop.
  • Wendsop won the first prize in the FWJG awards.
  • Possibly the best solution, but is this better
    than the original text?

24
Why is Natural Language Generation hard?
  • Natural Language Generation involves making many
    choices, e.g. which content to include, what
    order to say it in, what words and syntactic
    constructions to use.
  • Linguistics does not provide us with a
    ready-made, precise theory about how to make such
    choices to produce coherent text
  • The choices to be made interact with one another
    in complex ways
  • Many results of choices (e.g. text length) are
    only visible at the end of the process
  • There doesn't seem to be any simple and reliable
    way to order the choices