Title: Grammaticality


1
Grammaticality
  • Vasileios Hatzivassiloglou
  • University of Texas at Dallas

2
Linguistics
  • Linguistics as a discipline offers mostly
    observations and descriptions, which do not allow
    for the decoding of sentences or the generation
    of new ones
  • Linguistics has evolved towards a more empirical
    foundation and offers a number of theories
    (syntactic, semantic, and beyond)
  • However, these theories are rarely directly
    applicable to a computer

3
Evolution of NLP
  • Originally, natural language processing
    approaches were based on intuitions and limited
    examples
  • Knowledge about the language was encoded in
    computer programs in the form of rules

4
Problems with rule-based NLP
  • Cost of building up the knowledge base
  • Scaling up to realistic settings (thousands of
    words in the vocabulary)
  • Predicting and accommodating interactions between
    different words

5
Rationalism vs. empiricism
  • For extremely complex systems, the rational
    approach (collect and model knowledge beforehand)
    fails
  • Empiricism replaces the models of knowledge built
    by the developer with models built from data
  • Such models need not be accessible/understandable
    by their designer

6
Empiricism in NLP
  • (Re)Introduced in the early 1990s
  • Currently the dominant approach in NLP (as well
    as artificial intelligence)
  • Based on large collections of data (text)
  • Statistical / probabilistic models provide the
    framework; they leave parameters unspecified,
    which are then filled in from the data using
    machine learning methods
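
As a minimal sketch of "parameters filled from the data", the snippet below estimates the parameters of a bigram model by relative frequency, which is the maximum likelihood estimate. The toy corpus, the `<s>` / `</s>` boundary markers, and the function name `estimate_bigram_model` are assumptions made for illustration; they are not part of the slides.

```python
from collections import Counter


def estimate_bigram_model(sentences):
    """Estimate P(w2 | w1) by relative frequency (maximum likelihood)."""
    context_counts = Counter()
    bigram_counts = Counter()
    for sentence in sentences:
        # Hypothetical boundary markers so that sentence starts and ends are modeled too.
        tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
        for w1, w2 in zip(tokens, tokens[1:]):
            context_counts[w1] += 1
            bigram_counts[(w1, w2)] += 1
    # The model form (a table of conditional probabilities) is fixed in advance;
    # only the numbers in the table come from the data.
    return {
        pair: count / context_counts[pair[0]]
        for pair, count in bigram_counts.items()
    }


# Toy corpus, invented for this example.
toy_corpus = ["the pin was pulled", "the bin is open", "the bin is full"]
bigram_probs = estimate_bigram_model(toy_corpus)
print(bigram_probs[("the", "bin")], bigram_probs[("the", "pin")])
```

With a realistic corpus the same counting scheme applies, but unseen word pairs would need smoothing, which this sketch leaves out.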

7
Topics we will cover
  • Background mathematical concepts: probability,
    probabilistic models, entropy, parameter
    estimation, hypothesis testing
  • Statistical properties of letters, words, and
    word combinations
  • Large text collections and issues coming up when
    working with them and trying to construct them

8
Topics we will cover
  • Sequential probabilistic models of text and their
    estimation, including maximum likelihood
    estimation, n-gram models and Markov Chains
  • Semantic analysis of text: word sense
    disambiguation, lexical knowledge and its
    automatic acquisition

9
Topics we will cover
  • Hidden Markov Models, Expectation-Maximization,
    and their application to speech understanding
    and OCR
  • Text alignment and machine translation
  • Probabilistic text generation

10
Topics we will cover
  • Selected topics beyond syntax and semantics
    (e.g., discourse analysis, summarization,
    question answering)
  • Text categorization, filtering, and clustering
  • Text mining

11
Grammar
  • A grammar is a device for determining the
    validity of a sequence of words
  • This definition raises many questions:
  • What is a device?
  • What is valid?
  • What is a word?
  • What is a sequence of words?

12
Elaborating on grammar
  • The device can be any decision mechanism (e.g.,
    rules, a probabilistic model); it can also be
    generative
  • For now, we call words the basic units of
    communication
  • We look at sequences that are relatively complete
    in meaning and relatively independent from nearby
    sequences (phrases and sentences)
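
To make the idea of a grammar as a decision device concrete, here is a small sketch in which two different mechanisms, one rule-based and one probabilistic, expose the same interface: a sequence of words goes in, a validity decision comes out. The licensed word pairs, the probabilities, the threshold, and all function names are hypothetical and chosen only for illustration.

```python
from typing import Callable, Sequence

# A grammar, in this sketch, is anything that maps a word sequence to a decision.
Grammar = Callable[[Sequence[str]], bool]


def make_rule_grammar(allowed_pairs) -> Grammar:
    """Rule-based device: valid if every adjacent word pair is licensed by a rule."""
    def is_valid(words):
        padded = ["<s>", *words, "</s>"]
        return all(pair in allowed_pairs for pair in zip(padded, padded[1:]))
    return is_valid


def make_probabilistic_grammar(pair_probs, threshold) -> Grammar:
    """Probabilistic device: valid if the product of pair probabilities reaches a threshold."""
    def is_valid(words):
        padded = ["<s>", *words, "</s>"]
        score = 1.0
        for pair in zip(padded, padded[1:]):
            score *= pair_probs.get(pair, 0.0)  # unseen pairs get probability 0
        return score >= threshold
    return is_valid


# Hypothetical toy knowledge, not taken from the slides.
toy_probs = {
    ("<s>", "the"): 1.0,
    ("the", "bin"): 0.5,
    ("bin", "is"): 1.0,
    ("is", "open"): 0.5,
    ("open", "</s>"): 1.0,
}
rule_grammar = make_rule_grammar(set(toy_probs))
prob_grammar = make_probabilistic_grammar(toy_probs, threshold=0.1)

for words in (["the", "bin", "is", "open"], ["bin", "the", "open", "is"]):
    print(words, rule_grammar(words), prob_grammar(words))
```

Both devices answer the same question, so code that consumes a `Grammar` does not need to know which mechanism sits behind it.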

13
Grammaticality
  • Validity according to the grammar is called
    grammaticality
  • In principle, it is the overall acceptability of
    a string of words (grammaticality can be violated
    at different levels)
  • In modern linguistics, grammaticality is usually
    considered at the syntactic level
  • Is grammaticality well-defined?

14
Generative linguistics
  • Dominant non-computational linguistic theory
  • Distinguishes between competence level (innate
    ability of language, filled in by examples after
    birth) and performance level (language as
    expressed)

15
Testing grammaticality
  • One way is to ask proficient speakers of the
    language (usually native speakers) to validate
    strings and see whether they agree
  • Who did Sam think said John saw him?
  • What did Sally whisper that she had read?

16
Grammaticality as a binary decision
  • Linguistic theory assumes a categorical view of
    grammaticality: a string is grammatical or it
    isn't
  • But often the violation of grammaticality is
    subtle
  • The boys read Mary's stories about each other.
  • Women were regarded as a different existence from
    men unfairly.

17
Discrete vs continuous grammaticality
  • Categorical decisions simplify the process
  • Language is based on discrete symbols
  • These symbols are transmitted in continuous form
    and are discretized (writing, speech)
  • Example: /p/ versus /b/ sounds (pin vs. bin)

18
Imperfect decisions
  • An advantage of a continuous model of
    grammaticality is the possibility to distinguish
    between more or less grammatical strings
  • Another advantage is the option to revise earlier
    decisions
  • the pin/bin is open
  • the pin/bin was pulled
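
The pin/bin example can be sketched as follows: the first decision is kept as a probability distribution rather than a hard choice, so later words can overturn it. All numbers below are invented for illustration, and the names `acoustic_probs`, `evidence_probs`, and `revise` are assumptions, not part of the slides.

```python
def revise(acoustic_probs, evidence_probs, later_word):
    """Combine a soft early decision with later evidence and renormalize."""
    scores = {
        word: prob * evidence_probs.get((word, later_word), 0.0)
        for word, prob in acoustic_probs.items()
    }
    total = sum(scores.values())
    if total == 0.0:
        # No usable evidence: keep the earlier soft decision unchanged.
        return dict(acoustic_probs)
    return {word: score / total for word, score in scores.items()}


# Hypothetical soft decision after hearing an ambiguous /p/ or /b/ sound.
acoustic_probs = {"pin": 0.55, "bin": 0.45}

# Hypothetical compatibility of each candidate with a word heard later.
evidence_probs = {
    ("pin", "open"): 0.10,
    ("bin", "open"): 0.60,
    ("pin", "pulled"): 0.70,
    ("bin", "pulled"): 0.05,
}

initial_guess = max(acoustic_probs, key=acoustic_probs.get)
revised_open = revise(acoustic_probs, evidence_probs, "open")
revised_pulled = revise(acoustic_probs, evidence_probs, "pulled")

print("initial guess:", initial_guess)
print("after 'open':", max(revised_open, key=revised_open.get))
print("after 'pulled':", max(revised_pulled, key=revised_pulled.get))
```

A categorical model would have committed to one word at the first step and could not recover; keeping the scores continuous is what makes the revision possible.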

19
Syntactic categorization of words
  • Syntactic models assign categories to words based
    on what syntactic function they carry out
  • These categories are called parts-of-speech
  • Common ones include:
  • nouns, verbs, adjectives, adverbs (open classes)
  • prepositions, determiners, pronouns,
    complementizers, particles (closed classes)
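
One simple way to represent this categorization is a lexicon that maps each word to a part of speech, plus the open/closed grouping listed above. The sketch below is a hand-built toy under strong assumptions: the lexicon entries are invented, each word gets exactly one category, and real ambiguity (a word belonging to several categories) is ignored.

```python
OPEN_CLASSES = {"noun", "verb", "adjective", "adverb"}
CLOSED_CLASSES = {"preposition", "determiner", "pronoun", "complementizer", "particle"}

# Hypothetical toy lexicon: one category per word, no ambiguity handling.
TOY_LEXICON = {
    "the": "determiner",
    "bin": "noun",
    "pin": "noun",
    "is": "verb",
    "open": "adjective",
    "she": "pronoun",
    "about": "preposition",
}


def class_type(category):
    """Report whether a part of speech belongs to an open or a closed class."""
    if category in OPEN_CLASSES:
        return "open"
    if category in CLOSED_CLASSES:
        return "closed"
    return "unknown"


def tag_words(words):
    """Look up each word and return (word, category, open/closed) triples."""
    tagged = []
    for word in words:
        category = TOY_LEXICON.get(word.lower(), "unknown")
        tagged.append((word, category, class_type(category)))
    return tagged


for triple in tag_words(["The", "bin", "is", "open"]):
    print(triple)
```

A lookup table like this cannot decide between categories for ambiguous words; that requires using the surrounding context.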

20
Reading
  • Chapter 1 up to and including Section 1.2.1