Tools and Algorithms for NLP - PowerPoint PPT Presentation

Provided by: guype1
1
Tools and Algorithms for NLP
  • Bertrand Gaiffe and Guy Perrier

2
1 - Summary notions of linguistics and
computational linguistics
  • Generalities about natural languages
  • Formal languages and natural languages
  • Computational linguistics and Natural Language
    Processing

3
1.1 - Generalities about natural languages
  • Human language: the ability to represent and
    communicate knowledge

4
1.1 - Generalities about natural languages
  • Human language operates in a society by means
    of a system of signs, a natural language, used
    to produce utterances.
  • The basic signs of a natural language are its
    words. A word is a pair of a phonological form
    (the "signifiant") and a meaningful content
    (the "signifié") (Ferdinand de Saussure, Cours
    de linguistique générale, 1916).
  • From sounds to meaning, every natural language
    presents different levels, which are autonomous
    while interacting closely. Every level gives rise
    to a linguistic field.

5
1.1 - Generalities about natural languages
  • Phonetics deals with the physical aspects of
    producing and perceiving the sounds (phones) of
    a natural language.
  • Phonology studies how sounds contribute to
    building words via abstract units, phonemes, and
    how phonemes are realized in utterances modulo
    prosody.
  • Morphology concerns the way in which elementary
    signs, morphemes, combine to build words.

6
1.1 - Generalities about natural languages
  • Syntax concerns the combination of words to build
    sentences. There are two main views of syntax:
    phrase structure grammars take the constituent as
    the basic concept, whereas dependency grammars
    take the dependency between words as the basic
    concept.
  • Semantics concerns the meaning of linguistic
    utterances independently of their context of use.
    Logic is a usual framework for representing the
    meaning of utterances.
  • Pragmatics concerns the meaning of linguistic
    utterances relative to their context of use
    (discourse, reference resolution, communicative
    structure, dialogue).
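
As an illustration of the two views of syntax mentioned above, here is a minimal Python sketch that encodes one sentence both ways. The sentence, the category labels (S, NP, VP, ...) and the relation names (subj, det) are assumptions chosen for the example, not taken from the slides.

```python
# Two hypothetical encodings of the sentence "the cat sleeps".

# Phrase structure view: nested constituents, each a (label, children...) tuple.
phrase_structure = (
    "S",
    ("NP", ("Det", "the"), ("N", "cat")),
    ("VP", ("V", "sleeps")),
)

# Dependency view: (head, relation, dependent) triples between words.
dependencies = [
    ("sleeps", "subj", "cat"),
    ("cat", "det", "the"),
]


def leaves(tree):
    """Return the words of a constituent tree, left to right."""
    _label, *children = tree
    if len(children) == 1 and isinstance(children[0], str):
        return [children[0]]  # pre-terminal node: (category, word)
    words = []
    for child in children:
        words.extend(leaves(child))
    return words


def dependents(head, deps):
    """Return the words that depend directly on `head`."""
    return [dep for h, _rel, dep in deps if h == head]
```

Both structures describe the same words: `leaves(phrase_structure)` recovers the sentence from the constituents, while `dependents("sleeps", dependencies)` lists the words governed by the verb.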

7
1.1 - Generalities about natural languages
  • The grammar of a natural language is a system of
    categories and rules governing the phonological,
    morphological, syntactic, semantic and pragmatic
    levels of the language.
  • The lexicon of a natural language is the set of
    all its words with their linguistic properties. A
    lexeme is an element of the lexicon and it
    contains phonological, morphological, syntactic
    and semantic information related to a word of the
    language.
  • The grammar and the lexicon are complementary:
    both participate in the characterisation of the
    language.
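
One possible way to picture a lexicon entry is as a record with one field per linguistic level. The sketch below is only an assumption about how such a lexicon could be stored: the class name `Lexeme`, its field names, and the sample entries are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Lexeme:
    """A hypothetical lexicon entry with one field per linguistic level."""
    lemma: str
    phonology: str                                   # e.g. a phonemic transcription
    morphology: dict = field(default_factory=dict)   # e.g. inflected forms
    syntax: dict = field(default_factory=dict)       # e.g. category, arguments
    semantics: str = ""                              # e.g. a gloss or a logical form


# The lexicon maps a lemma to all the lexemes that share it.
LEXICON = {}


def add(lexeme):
    LEXICON.setdefault(lexeme.lemma, []).append(lexeme)


def lookup(lemma):
    """Return every lexeme recorded for `lemma` (empty list if unknown)."""
    return LEXICON.get(lemma, [])


# Illustrative entries only: "walk" as a verb and as a noun.
add(Lexeme("walk", "/wɔːk/", {"3sg": "walks", "past": "walked"},
           {"cat": "V", "args": ["subj"]}, "move on foot"))
add(Lexeme("walk", "/wɔːk/", {"plural": "walks"},
           {"cat": "N"}, "a journey on foot"))
```

Here `lookup("walk")` returns two lexemes, one per syntactic category, which shows how a single form can correspond to several entries.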

8
1.2 - Formal languages and natural languages
  • A formal language L over a finite alphabet Σ of
    symbols is a subset of the free monoid Σ* of
    words built from elements of Σ.
  • The class of languages defined over Σ is equipped
    with operations: intersection, union
    (disjunction), concatenation, complementation,
    Kleene closure...
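
The operations listed above can be tried out on small finite languages represented as Python sets of strings. This is a sketch under one assumption: since Σ* is infinite, complementation and Kleene closure are only computed up to a length bound `max_len`. The function names are chosen for the example.

```python
from itertools import product

SIGMA = {"a", "b"}  # example alphabet


def words_up_to(alphabet, max_len):
    """All words over `alphabet` of length <= max_len (a finite slice of Sigma*)."""
    result = set()
    for n in range(max_len + 1):
        for letters in product(sorted(alphabet), repeat=n):
            result.add("".join(letters))
    return result


def concatenate(l1, l2):
    """Concatenation: every word of l1 followed by every word of l2."""
    return {u + v for u in l1 for v in l2}


def kleene_closure(lang, max_len):
    """Words of lang* of length <= max_len."""
    closure = {""}
    frontier = {""}
    while frontier:
        longer = {w for w in concatenate(frontier, lang) if len(w) <= max_len}
        frontier = longer - closure  # keep only newly discovered words
        closure |= frontier
    return closure


def complement(lang, alphabet, max_len):
    """Words of length <= max_len that are NOT in lang."""
    return words_up_to(alphabet, max_len) - lang


L1 = {"a", "ab"}
L2 = {"b", "ab"}
union = L1 | L2          # set union
intersection = L1 & L2   # set intersection
```

Union and intersection come for free from Python sets; only concatenation, closure and complementation need dedicated functions.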

9
1.2 - Formal languages and natural languages
  • If L is infinite, it is important to have a
    computation procedure for recognizing L, that is
    for deciding if any string from Σ* belongs to L.
    If such a procedure exists, L is said to be
    recursive.
  • If there exists only a computation procedure for
    enumerating L, L is said to be recursively
    enumerable.
  • A formal grammar is a concise definition of a
    formal language in the form of initial data
    and a procedure for generating the words of the
    language from the initial data.
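
The three notions above (recognition, enumeration, generation by a grammar) can be sketched on the language L = { aⁿbⁿ : n ≥ 0 }. This language and the grammar S → aSb | ε are assumptions picked for the example; they do not appear in the slides.

```python
from itertools import islice


def recognize(word):
    """Decision procedure: does `word` belong to L = { a^n b^n : n >= 0 }?"""
    n = len(word) // 2
    return word == "a" * n + "b" * n


def enumerate_language():
    """Enumeration procedure: yield the words of L one by one, forever."""
    n = 0
    while True:
        yield "a" * n + "b" * n
        n += 1


# A grammar for L: initial data (the axiom S and the rules) plus a
# generation procedure (rewrite S until none is left).
RULES = {"S": ["aSb", ""]}


def generate(max_steps):
    """Words derivable from S using at most `max_steps` rule applications."""
    forms, words = {"S"}, set()
    for _ in range(max_steps):
        next_forms = set()
        for form in forms:
            i = form.find("S")  # every pending form contains exactly one S
            for rhs in RULES["S"]:
                new = form[:i] + rhs + form[i + 1:]
                if "S" in new:
                    next_forms.add(new)   # derivation still in progress
                else:
                    words.add(new)        # terminal word reached
        forms = next_forms
    return words
```

Because `recognize` always terminates with a yes/no answer, this particular L is recursive. `enumerate_language` on its own would only show that L is recursively enumerable; `islice(enumerate_language(), 3)` takes its first three words.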

10
1.2 - Formal languages and natural languages
  • In a formal language, symbols are concatenated to
    build the words of the language in a potentially
    infinite way. In a natural language, words are
    concatenated to build the utterances of the
    language in a potentially infinite way too.
  • In a formal language, words are double-sided
    objects: (form, meaning) pairs. In a natural
    language, utterances are also double-sided
    objects: they are (sound, meaning) pairs.
  • Chomsky's results on the formalization of
    grammars for natural languages (1956) played a
    great part in the development of the theory of
    formal languages.

11
1.2 - Formal languages and natural languages
  • Ambiguity is excluded from formal languages
    whereas it has an important place in natural
    languages: an utterance is ambiguous if there
    are multiple alternative linguistic structures
    that can be built from it. According to the
    source of this multiplicity, we distinguish
    lexical, phonological, syntactic and semantic
    ambiguity.
  • Formal languages are frozen whereas natural
    languages evolve. Linked to this property, the
    border between acceptable and unacceptable
    linguistic utterances is fuzzy and shifting.
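
Syntactic ambiguity can be made concrete by counting the structures a grammar assigns to one utterance. The sketch below uses a small toy grammar in Chomsky normal form and a CKY-style dynamic program; the grammar, the lexicon and the example sentence are assumptions made for the illustration.

```python
from functools import lru_cache

# Toy grammar in Chomsky normal form: (left, right) -> parent categories.
BINARY = {
    ("NP", "VP"): ["S"],
    ("V", "NP"): ["VP"],
    ("VP", "PP"): ["VP"],   # the PP attaches to the verb phrase
    ("Det", "N"): ["NP"],
    ("NP", "PP"): ["NP"],   # the PP attaches to the noun phrase
    ("P", "NP"): ["PP"],
}

# Toy lexicon: word -> categories.
LEXICAL = {
    "I": ["NP"], "saw": ["V"], "the": ["Det"],
    "man": ["N"], "telescope": ["N"], "with": ["P"],
}


def count_parses(words, symbol="S"):
    """Count the distinct parse trees of `words` rooted in `symbol`."""
    words = tuple(words)

    @lru_cache(maxsize=None)
    def count(i, j, sym):
        # Number of trees with root `sym` covering words[i:j].
        if j - i == 1:
            return 1 if sym in LEXICAL.get(words[i], ()) else 0
        total = 0
        for k in range(i + 1, j):  # every split point
            for (left, right), parents in BINARY.items():
                if sym in parents:
                    total += count(i, k, left) * count(k, j, right)
        return total

    return count(0, len(words), symbol)


ambiguous = "I saw the man with the telescope".split()
unambiguous = "I saw the man".split()
```

With this grammar, `count_parses(ambiguous)` finds two structures (the prepositional phrase modifies either the seeing or the man), whereas `count_parses(unambiguous)` finds one.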

12
1.3 - Computational Linguistics and Natural
Language Processing
  • Computational Linguistics (CL) and Natural
    Language Processing (NLP) are the application of
    mathematics and computer science to linguistics.
    The former is science-oriented whereas the latter
    is application-oriented.
  • They are driven by two paradigms:
  • The symbolic paradigm aims at modelling
    pre-existent linguistic knowledge with symbolic
    systems.
  • The stochastic paradigm aims at extracting
    linguistic information from corpora with
    stochastic methods.
  • Natural language processing is structured
    according to two directions: analysis, from the
    utterances to their pragmatic interpretation, and
    generation, from pragmatic goals to the utterances
    representing their linguistic realization.
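
As a small illustration of the stochastic paradigm, the sketch below estimates bigram probabilities from a corpus by relative frequency. The three-sentence corpus, the start marker `<s>` and the function name are assumptions; a real corpus would be far larger and would need smoothing.

```python
from collections import Counter

# Toy corpus, already tokenized (illustrative data only).
CORPUS = [
    ["the", "cat", "sleeps"],
    ["the", "dog", "sleeps"],
    ["the", "cat", "eats"],
]


def bigram_model(sentences):
    """Estimate P(current | previous) by relative frequency."""
    context_counts, bigram_counts = Counter(), Counter()
    for sentence in sentences:
        tokens = ["<s>"] + sentence  # <s> marks the sentence start
        for prev, cur in zip(tokens, tokens[1:]):
            context_counts[prev] += 1          # times `prev` is followed by a word
            bigram_counts[(prev, cur)] += 1
    return {
        pair: count / context_counts[pair[0]]
        for pair, count in bigram_counts.items()
    }


MODEL = bigram_model(CORPUS)
```

No linguistic rule is written by hand here: the preference for "cat" over "dog" after "the" is extracted from the data alone, which contrasts with the hand-built grammars of the symbolic paradigm.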