1
Computer Vision and Language Research Group
  • Andrew Roberts

Unsupervised Grammar Inference for Natural
Language Syntax Discovery
8th October 2003
2
Outline
  • Overview of Grammar Inference
  • Why is it so hard?
  • Unsupervised learning
  • Why bother?
  • What has been achieved so far
  • My approach
  • How do you evaluate?

3
Grammars
  • A grammar describes a language in terms of its
    constituents (lexicon) and the relationships
    between them.
  • E.g., grammar for sheep
  • baa
  • > baa
  • > baaaaaa
  • > etc.
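The sheep grammar above can be sketched as a regular expression; a minimal illustration, assuming the language is "b" followed by two or more "a"s (the exact grammar is not spelled out on the slide):

```python
import re

# Hypothetical encoding of the sheep grammar: "b" followed by
# two or more "a"s, i.e. baa, baaaa, baaaaaa, ...
SHEEP = re.compile(r"^ba{2,}$")

for s in ["baa", "baaaaaa", "ba", "baaba"]:
    print(s, bool(SHEEP.match(s)))
```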

4
Grammar Inference
  • Given a language, describe the grammar!
  • Both positive and negative examples aid
    inference
  • baaaa ?
  • baaba ?
  • baa ?
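How labelled examples constrain inference can be sketched as filtering candidate hypotheses: each positive string must be accepted and each negative string rejected. (The candidate regexes here are illustrative assumptions, not the talk's method.)

```python
import re

# Labelled examples: positives must be accepted, negatives rejected.
positives = ["baa", "baaaa"]
negatives = ["baaba"]

# Hypothetical candidate hypotheses an inference procedure
# might entertain for the sheep language.
candidates = [r"^ba+$", r"^ba{2,}$", r"^(ba+)+$"]

def consistent(pattern):
    rx = re.compile(pattern)
    return (all(rx.match(p) for p in positives)
            and not any(rx.match(n) for n in negatives))

# The negative example "baaba" rules out the over-general
# hypothesis ^(ba+)+$, which a positive-only learner would keep.
print([c for c in candidates if consistent(c)])
```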

5
Unsupervised Grammar Inference
  • Extract grammar from raw text that represents the
    target language.
  • Unsupervised means that no explicit knowledge
    can be built in; it must all be learned.
  • No negative examples.
  • Very difficult - Gold (1967) showed that it is
    impossible!

6
Why bother?
  • Various applications
  • speech processing
  • gene analysis
  • information retrieval
  • cryptology
  • psychology (innateness debate)
  • etc.

7
Why do I bother?
  • The cat sat on the mat
  • El Coronel no tiene quien le escriba
  • ?????????????

8
Why do I bother?
  • English has been well researched: lots of
    language resources are available.
  • Other widely used languages (Arabic, Mandarin,
    even Spanish) are not so lucky: resources are
    disappointingly sparse.

9
Commercial Break
  • LaTeX tutorial tomorrow, 3-4, ALL. Continues for
    the next two Thursdays.
  • "Tutorial of the month" ????
  • - What LaTeX Tutorial Magazine
  • "That'll learn yer"
  • - Mr T. Martin

10
GI Achievements (so far)
  • Regular languages (with or without negative
    examples) can generally be learned to high
    accuracy.
  • Context-free languages (supervised) can be
    acquired reasonably well (80% accuracy).
  • Context-free (unsupervised): still very far to go
    (50-60%).

11
Problems - Ambiguity
  • "One morning I shot an elephant in my pyjamas.
    How he got into my pyjamas I don't know."
    - Groucho Marx
  • There will be many ambiguous instances within raw
    text.
  • Need semantic information to overcome this issue.

12
Alignment Based Learning (Menno van Zaanen)
  • Principle of substitutability
  • I like to have a drink
  • Give me a drink
  • Give me help on databases
  • Bootstraps input to give
  • ((the cat) (sat (on (the mat))))
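The substitutability principle above can be sketched as sentence alignment: parts that differ between otherwise-aligned sentences are hypothesised to be interchangeable constituents. This is only a toy illustration of the idea, not van Zaanen's actual algorithm, and `hypothesise_constituents` is a made-up helper name.

```python
import difflib

# Toy sketch of substitutability: align two sentences word by word;
# the unequal spans are candidate constituents of the same type.
def hypothesise_constituents(s1, s2):
    w1, w2 = s1.split(), s2.split()
    sm = difflib.SequenceMatcher(a=w1, b=w2)
    spans = []
    for tag, i1, i2, j1, j2 in sm.get_opcodes():
        if tag != "equal":
            spans.append((w1[i1:i2], w2[j1:j2]))
    return spans

# The slide's example: "a drink" and "help on databases" align
# against the shared context "give me ...".
print(hypothesise_constituents("give me a drink",
                               "give me help on databases"))
```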

13
My Approach
  • Three stages
  • Acquiring POS tags by clustering words based on
    similar distributions.
  • Acquiring phrases using a variant of ABL.
  • Determining similar phrases, also by clustering
    on context.
  • Merging these all together to get a grammar.
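The first stage, clustering words by their distributions, can be sketched as comparing words by the contexts they occur in. This is an illustrative toy only; the talk does not specify the representation or similarity measure used.

```python
from collections import Counter, defaultdict
from math import sqrt

# Toy distributional sketch: represent each word by counts of its
# immediate left/right neighbours, then compare words by cosine
# similarity. Words with similar contexts cluster into a POS-like class.
corpus = "the cat sat on the mat the dog sat on the rug".split()

contexts = defaultdict(Counter)
for i, w in enumerate(corpus):
    if i > 0:
        contexts[w]["L:" + corpus[i - 1]] += 1
    if i < len(corpus) - 1:
        contexts[w]["R:" + corpus[i + 1]] += 1

def cosine(u, v):
    dot = sum(u[k] * v[k] for k in u)
    norm = lambda c: sqrt(sum(x * x for x in c.values()))
    return dot / (norm(u) * norm(v))

# "cat" and "dog" share the context "the _ sat", so they are far
# more similar to each other than either is to "on".
print(cosine(contexts["cat"], contexts["dog"]))
print(cosine(contexts["cat"], contexts["on"]))
```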

14
Problems - Evaluating
  • Never a single correct grammar.
  • E.g., sheep language baaaa!
  • baa
  • baa
  • Grammars are equivalent, but not identical.
  • How can we compensate for this when evaluating GI
    accuracy?
  • More in Roberts (2003).
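The "equivalent but not identical" point above can be sketched as a bounded weak-equivalence check: two different grammar descriptions define the same language if they accept exactly the same strings, compared up to some length. (The two regexes below are illustrative stand-ins for two different sheep grammars.)

```python
import itertools
import re

# Two syntactically different descriptions of the same sheep language.
g1 = re.compile(r"^ba{2,}$")
g2 = re.compile(r"^baa(a)*$")

def language(rx, alphabet="ba", max_len=8):
    # Enumerate all strings over the alphabet up to max_len and keep
    # those the grammar accepts.
    return {"".join(t) for n in range(1, max_len + 1)
            for t in itertools.product(alphabet, repeat=n)
            if rx.match("".join(t))}

# The descriptions differ, but the languages coincide (up to the bound).
print(language(g1) == language(g2))
```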

15
Thank you!
  • Thanks for your attention
  • Any questions?