Foundations of statistical natural language processing

Provided by: ahme96

1
Introduction
  • Chapter 1
  • Foundations of statistical natural language
    processing

2
NLP and Statistical Approach
  • Why are many people adopting a statistical
    approach to natural language processing?
  • How should one go about it?
  • We will begin with a discussion of some
    philosophical themes and leading ideas

3
Approaches to language
  • Between 1960 and 1985 most of linguistics,
    psychology, artificial intelligence and NLP was
    dominated by a rationalist approach
  • Rationalist view: a significant part of the
    knowledge in the human mind is not derived from
    the senses but is fixed in advance, presumably by
    genetic inheritance

4
Rationalist approach
  • Dominated the field due to widespread acceptance
    of arguments by Noam Chomsky
  • Argument: the problem of the poverty of the
    stimulus
  • It is difficult to see how children can learn
    something as complex as natural language from
    the limited input they receive
  • Questions?

5
Empiricist Approach
  • Also begins by assuming that some cognitive
    abilities are present in the brain
  • The difference between the two approaches is
    therefore one of degree
  • The mind does not begin with detailed sets of
    principles/procedures for the various components
    of language, such as morphological structure,
    case marking etc.
  • A baby's brain begins with general operations of
    association, pattern recognition, and
    generalization

6
  • The empiricist approach to NLP suggests that
  • We can learn complicated and extensive
    language structures by specifying an appropriate
    general language model
  • and then applying statistical, pattern
    recognition and machine learning methods to a
    large amount of language use

7
SNLP
  • In practice, people cannot work from observing a
    large amount of language use in its real-world
    context
  • Instead, texts are simply used
  • A body of text is called a corpus (pl. corpora)
  • The empiricist, corpus-based approach is seen in
    the American structuralists (Zellig Harris)
  • A language's structure can be discovered
    automatically from a corpus

8
  • Chomskyan linguistics seeks to describe the
    language module of the human mind (I-language),
    for which texts (E-language) provide only
    indirect evidence
  • Empiricist approaches describe E-language as it
    actually occurs
  • Chomsky distinguishes between
  • Linguistic competence
  • Linguistic performance

9
  • Chomskyan linguistics depends on categorical
    principles
  • Sentences either do or do not satisfy them
  • The same holds for American structuralism
  • Focus on categorical judgments about rare types
    of sentences
  • Our approach, Statistical NLP, draws on the work
    of Shannon
  • Assign probabilities to linguistic events to
    decide which sentences are usual and which are
    unusual
  • The interest is in the associations and
    preferences that occur in the totality of
    language use
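
To make the idea of assigning probabilities to linguistic events concrete, here is a minimal sketch in Python. It is not taken from the slides: the toy corpus, the bigram model, the add-one smoothing and the function names are all assumptions chosen for illustration. It scores a sentence by the product of smoothed bigram probabilities estimated from the corpus, so a word order seen in the corpus scores higher than a scrambled one.

```python
import math
from collections import Counter

# Toy corpus: hypothetical data, for illustration only.
corpus = [
    "the dog sleeps",
    "the cat sleeps",
    "the dog barks",
]

def tokenize(sentence):
    """Lowercase, split on whitespace, add sentence boundary markers."""
    return ["<s>"] + sentence.lower().split() + ["</s>"]

context_counts = Counter()  # how often each word occurs as a left context
bigram_counts = Counter()   # how often each adjacent word pair occurs
vocab = set()

for line in corpus:
    tokens = tokenize(line)
    vocab.update(tokens)
    context_counts.update(tokens[:-1])
    bigram_counts.update(zip(tokens, tokens[1:]))

def log_prob(sentence):
    """Log probability of a sentence under an add-one smoothed bigram model."""
    tokens = tokenize(sentence)
    v = len(vocab)
    total = 0.0
    for prev, cur in zip(tokens, tokens[1:]):
        # Add-one smoothing keeps unseen bigrams from getting probability zero.
        p = (bigram_counts[(prev, cur)] + 1) / (context_counts[prev] + v)
        total += math.log(p)
    return total

usual = log_prob("the dog sleeps")
unusual = log_prob("sleeps dog the")
print(usual > unusual)
```

The model gives a graded score rather than a yes/no judgment, which is the contrast with categorical principles drawn on this slide. A realistic model would need a far larger corpus and a better smoothing method.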

10
Scientific Content
  • Questions that linguistics should answer
  • What kinds of things do people say?
  • What do these things say/ask/request about the
    world?
  • Key point: how knowledge of language is acquired
    by humans, and how they actually understand and
    generate sentences in real time

11
  • Competence grammar
  • Said to underlie the language
  • In the generative approach, it resides in the
    speaker's head
  • It suggests that there is a set of sentences
    (the grammatical sentences) and other strings
    that are ungrammatical
  • The concept of grammaticality
  • Judged on whether a sentence is structurally
    well formed
  • Not on whether it is something people would say,
    or on whether it is semantically anomalous
  • e.g. "Colorless green ideas sleep furiously" is
    grammatical although semantically anomalous

12
  • Syntactic grammaticality is a binary choice
  • Native speakers normally produce grammatical
    sentences
  • Two points
  • A binary choice is plausible for simple
    sentences, but for complex ones it may be
    far-fetched
  • Non-native speakers often say something that is
    grammatical but somehow odd
  • e.g. "In addition to this, she insisted that
    women were regarded as a different existence from
    men unfairly"

13
Non-categorical phenomena in language
  • A categorical view of language may be sufficient
    for many purposes, but it has its limitations
  • Frequency-based analysis is required
  • To see non-categorical phenomena, change in the
    language should be studied
  • e.g. while (noun, meaning "time"): Take a while
  • while (complementizer): While you were out
  • After analyzing frequency, the category should
    be reanalyzed
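
As a rough illustration of frequency-based analysis, the Python sketch below counts how often a word appears under each category in a small hand-tagged sample. The tokens, the tag names (N for noun, C for complementizer, and so on) and the function name are hypothetical and not from the slides; real work would use a large tagged corpus, ideally from several time periods.

```python
from collections import Counter

# Hypothetical hand-tagged tokens, for illustration only.
tagged_tokens = [
    ("take", "V"), ("a", "DET"), ("while", "N"),
    ("while", "C"), ("you", "PRO"), ("were", "V"), ("out", "ADV"),
    ("while", "C"), ("we", "PRO"), ("waited", "V"),
]

def category_distribution(word, tokens):
    """Relative frequency of each tag assigned to `word` in `tokens`."""
    counts = Counter(tag for w, tag in tokens if w == word)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {tag: n / total for tag, n in counts.items()}

dist = category_distribution("while", tagged_tokens)
print(dist)
```

Comparing such distributions across corpora from different periods is one way a gradual shift between categories could become visible, which a purely categorical description would miss.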

14
  • near: adjective or preposition?
  • Adjective: We will review that decision in the
    near future
  • Preposition: He lives near the station
  • Both at once: We live nearer the water than you
    thought
  • Grammatically, adjectives and nouns do not take
    a direct object; they need a preposition
  • e.g. convenient for people
  • The comparative form (nearer) is typical of
    adjectives/adverbs
  • Blending and language change
  • kind of, sort of
  • e.g. We are kind of hungry

15
Summing up
  • Few attempts have been made to use statistical
    NLP to explain complex linguistic phenomena
  • This new way of looking at language may be able
    to account for things such as non-categorical
    phenomena and language change
  • Supporting argument
  • Human cognition is probabilistic, so language
    must be too