Topics in computational morphology and phonology - PowerPoint PPT Presentation

1
Topics in computational morphology and phonology
  • John Goldsmith
  • LSA Institute 2003

2
Basics
  • Focus of the course is on algorithmic learning of
    natural language.
  • This is both very concrete and very abstract.
  • We'll get our hands dirty (with data) and our
    minds clean (with theory).
  • We're really interested in systems that work from
    real data and derive real linguistic analyses.

3
Changes in our views
  • Naturally, this activity affects our view of what
    the ideal linguistic analysis is: our view of
    linguistics does not emerge unscathed and
    unchanged.

4
Basics
  • Everything that's important is accessible through
    the Internet:
  • The syllabus for this course
  • The software that we'll be exploring:
  • Linguistica: automatic learning of morphology
  • Phonological complexity: learning of phonological
    complexity (tactics?)
  • Word Breaker
  • Dynamic Computational network

5
Software
  • It's important to download the software
    (especially Linguistica) and to run it, and see
    what it does and what it is telling us. Try
    running it on different corpora and different
    languages.

6
Evolution of some ideas
  • Late 1980s: trying to develop a view of
    phonological theory in which we could employ a
    quantitative notion of well-formedness, along
    with a simplified set of rule-changes, plus the
    over-arching principle:
  • A rule applies iff its application improves the
    well-formedness of the representation it applies
    to.

7
  • Connection to Well-formedness Condition of early
    autosegmental phonology (Goldsmith 1976)

8
  • But how to come up with a general
    characterization (especially a language-particular
    one) of well-formedness?
  • Interest in neural nets, in which a natural
    notion of the complexity of a state exists, and in
    which a network can be understood to relax
    into the least complex state consistent with the
    input.

9
  • This idea will come back later.
  • At the same time, phonology (as much else in
    linguistics) seemed to show the hallmark of
    competition: complexities arise when (otherwise
    simple) generalizations conflict with each other,
    and the language has to decide which wins. This
    is a very natural notion in the context of neural
    network calculation, but not from the point of
    view of serial rule application.

10
  • In addition, neural networks deal with
    competition by using numbers rather than ordering
    or ranking. This feels like a big change.

11
MS Shock
  • Late 1990s
  • Lack of interest in linguistic theory
  • Interesting tools: HMMs, decision trees, etc.
  • Is a linguist someone who cares about everything
    that involves language? (Jakobson: "Linguista sum;
    linguistici nihil a me alienum puto" -- "I am a
    linguist; I consider nothing linguistic alien to me.")

12
Language Identification
  • Why Language ID (LID)? Practical applications.
    Text / speech.
  • We typically define the challenge as choosing one
    language from within a small universe of
    languages (e.g., French, German, Spanish, Dutch,
    English): customers of MS Word?
  • First guess: dead-ringers, sounds that uniquely
    identify their language.

13
Dead-ringers
  • Problems:
  • Fewer dead-ringers with written text (compared
    with sound): à? è? â?
  • Dead-ringers don't work: we often have
    borrowings from the "wrong" language. The biggest
    case is names (persons, places, companies,
    etc.), but others exist.
  • Not a good enough strategy. Why?

14
What's not good enough?
  • We demand of ourselves to quantify our success.
    In particular, we want 98% correct identification
    from a universe of 8 languages within 5 words.

15
Language ID
  • The dead-ringer approach is inadequate because it
    overlooks the enormous amount of information that
    is present.
  • But what is that information?
  • It lies in the frequencies of the letters and the
    frequencies of the letter combinations.
  • But then

16
  • How do we combine all of the information we have
    regarding all of the frequencies of the letters
    and letter combinations in a single test sentence
    (5 word sequence, e.g.)?
  • There is a way.

17
Probability theory
  • is exactly what we need.
  • Each language provides a set of probabilities for
    letters (etc.). Each language then analyzes the
    test string and tells us how good an example of
    that language the string is.

18
  • A string comes from a language L (out of our set)
    if language L assigns the highest probability to
    that string (compared to the probabilities
    assigned by the other languages).
  • All we need to do is collect the relevant
    information about letters (combinations) from a
    decent sample of each language.
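The slides above can be sketched directly in code: train a letter-frequency model per language, then pick the language whose model assigns the test string the highest probability. This is a minimal illustrative sketch (the one-sentence training samples are toy stand-ins for the "decent sample of each language"):

```python
from collections import Counter
from math import log

def train(sample):
    """Estimate letter probabilities from a sample of one language."""
    counts = Counter(sample)
    total = sum(counts.values())
    return {ch: n / total for ch, n in counts.items()}

def log_prob(text, model, floor=1e-6):
    """Log probability of a string under a letter-unigram model.
    Unseen letters get a small floor probability instead of zero."""
    return sum(log(model.get(ch, floor)) for ch in text)

def identify(text, models):
    """Pick the language whose model assigns the highest probability."""
    return max(models, key=lambda lang: log_prob(text, models[lang]))

# Toy samples standing in for real training corpora.
models = {
    "english": train("the quick brown fox jumps over the lazy dog"),
    "spanish": train("el perro corre por el parque y el gato duerme alli"),
}
print(identify("the dog and the fox", models))  # english
```

Working in log space turns the product of letter probabilities into a sum, which avoids numerical underflow on longer strings; a real system would use letter combinations (bigrams, trigrams) as well, as the slides suggest.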

19
Impact
  • The problem is solved using numerical methods: a
    probabilistic model.
  • The solution requires more computation (but
    simple computation) than traditional linguistic
    analyses: we pretty much need to have a computer.
  • The computer is definitely necessary in order to
    collect the frequencies.

20
But!
  • Once the program is written to collect
    frequencies, it will work perfectly well for any
    language. It's a rough Language-Data Acquisition
    Device for learning a very specific, particular
    linguistic task.

21
How particular is this task?
  • Eventually it occurred to me that we were asking
    a very linguistic question, even if we were
    focusing on standard orthography.
  • Suppose we had easy access to phonological
    transcriptions.
  • We'd be asking, for a given utterance, how good
    is it as an utterance in English? French? German?
    How well-formed is it?

22
  • And we can test different models (theories) and
    see instantaneously how good they are relative to
    each other.

23
Next Event
  • Word-breaking (word-segmentation) in Asian
    languages
  • Carl de Marcken's dissertation
  • Word breaking in English, Chinese
  • Using Minimum Description Length analysis

24
Unsupervised learning
  • There's a practical side to this
  • And a theoretical side.
  • Work to date:
  • Morphology learning: work on European languages,
    morphologically rich languages.
  • Phonology
  • Syntax

25
General themes: What (the heck) are we doing?
  • Mediationalist / distributionalist views of
    language (Huck and Goldsmith 1995)
  • This work is strictly distributionalist, but it
    does not need to be (cf. work on MT, machine
    translation).

26
Schools of explanation
  • Historical
  • Psychological
  • Social
  • Algorithmic
  • Computational linguistics is the embodiment of
    the algorithmic point of view.

27
Strengths of computational approaches to
linguistics
  • (a) They allow for testing of ideas against data.
  • (b) They allow for better exploration of data
    you get your hands dirty (and your mind clean).
  • (c) They suggest models to linguists.

28
  • Contrast between the automatic-learning and the
    hand-crafted, rule-based computational linguists:
    the great debate of the 1990s in computational
    linguistics.

29
Machine learning
  • The study of the extraction of regularities from
    raw data

30
Linguistic theory...
  • The strongest requirement that could be placed on
    the relation between a theory of linguistic
    structure and particular grammars is that the
    theory must provide a practical and mechanical
    method for actually constructing the grammar,
    given a corpus of utterances. Let us say that
    such a theory provides us with a discovery
    procedure.

31
(Diagram: corpus → grammar)
32
  • A weaker requirement would be that the theory
    must provide a practical and mechanical method
    for determining whether or not a grammar proposed
    for a given corpus is, in fact, the best grammar
    of the language from which the corpus is drawn (a
    decision procedure).

33
(Diagram: corpus + grammar → yes/no)
34
  • An even weaker requirement would be that given a
    corpus and given two proposed grammars G1 and G2,
    the theory must tell us which is the better
    grammar....an evaluation procedure.

35
(Diagram: corpus + G1 + G2 → "G1" or "G2")
36
  • The point of view adopted here is that it is
    unreasonable to demand of linguistic theory that
    it provide anything more than a practical
    evaluation procedure for grammars. That is, we
    adopt the weakest of the three positions
    described above...

37
  • I think that it is very questionable that this
    goal is attainable in any interesting way, and I
    suspect that any attempt to meet it will lead
    into a maze of more and more elaborate and
    complex analytic procedures that will fail to
    provide answers for many important questions
    about the nature of linguistic structure. I
    believe that by lowering our

38
  • sights to the more modest goal of developing an
    evaluation procedure for grammars we can focus
    attention more clearly on truly crucial
    problems...The correctness of this judgment can
    only be determined by the actual development and
    comparison of theories of these various sorts.

39
  • Notice, however, that the weakest of these three
    requirements is still strong enough to guarantee
    significance for a theory that meets it. There
    are few areas of science in which one would
    seriously consider the possibility of developing
    a general, practical, mechanical method for
    choosing among several theories, each compatible
    with the available data.
  • Noam Chomsky, Syntactic Structures (1957)

40
  • That's precisely one of the concerns of modern
    machine learning.
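In probabilistic terms, an evaluation procedure can be made concrete: given a corpus and two candidate models, compare the probability each assigns to that corpus. A minimal sketch, where the two "grammars" are hypothetical letter distributions standing in for real grammars:

```python
from math import log

def corpus_log_prob(corpus, grammar, floor=1e-6):
    """Log probability a 'grammar' (here, a letter distribution)
    assigns to the corpus; unseen letters get a small floor."""
    return sum(log(grammar.get(ch, floor)) for ch in corpus)

def evaluate(corpus, g1, g2):
    """Evaluation procedure: name the better-fitting grammar."""
    if corpus_log_prob(corpus, g1) >= corpus_log_prob(corpus, g2):
        return "G1"
    return "G2"

# Two hypothetical candidate grammars over a two-letter alphabet.
g1 = {"a": 0.7, "b": 0.3}   # claims 'a' is much more frequent
g2 = {"a": 0.5, "b": 0.5}   # claims the letters are equally frequent
print(evaluate("aabaaabaaa", g1, g2))  # G1
```

This is exactly the "G1 or G2" box of slide 35: the corpus and both grammars go in, and a mechanical comparison of probabilities comes out.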

41
Paradigms of machine learning (ML) today
  • Minimum Description Length
  • Neural networks
  • Support vector machines
  • Decision trees

42
  • Machine learning provides a way (perhaps the only
    way) to test the claim that an information-rich
    UG (Universal Grammar) is necessary to account
    for the acquisition of human language.
  • Such a view carries with it the claim that UG is
    learned on an evolutionary scale by an
    uncharacterized species-level learning mechanism:
    not a very plausible idea, but not impossible.
    Something like that has indeed happened for,
    e.g., vision; but then again, languages vary more
    than vision systems do, and mammals had longer
    than Homo sapiens.

43
Probabilistic approaches
  • A better name would be QTE: the quantitative
    theory of evidence.
  • Probability theory is not fuzzy.
  • More importantly, it is not in any fashion
    inimical to structurally-based theories (a
    wrong-headed notion that many linguists have
    picked up).
  • Probabilistic models (or their supporters)
    sometimes appear to be skeptical about structure,
    but only because probabilistic modelers want to
    wring every last bit of quantitative result out
    of a simpler model before investing themselves in
    a more complex model, one which may require a
    great deal of mathematical work to get up and
    running.

44
Contrast between generative and probabilistic
approach
  • Principal difference:
  • Generative grammar: given an alphabet, specify
    explicitly those combinations (representations,
    strings) that are in and those that are out.
  • Probabilistic model: given an alphabet, specify a
    distribution over all possible combinations.
    (Terms: distribution, support.)

45
  • Working with real people using language, it's
    often of little interest to distinguish between
    the In and the Out.
  • For example,

46
Information theory
  • Information theory's usefulness, and its central
    tool, the notion of information content:
  • the negative log probability (a positive
    quantity), or
  • the average negative log probability (entropy).

47
Math?
  • The only math you'll need to feel comfortable
    with is logarithms.
  • Mainly:
  • -1 × log(x), where x is a number between 0 and
    1.
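A quick illustration of -1 × log(x) in plain Python (base-2 logarithms, so the units are bits; the probabilities are just example values):

```python
from math import log2

# Information content of an event: -log2(p).
# Rarer events (smaller p) carry more bits of information.
for p in (0.5, 0.25, 0.125):
    print(p, "->", -log2(p), "bits")  # 1.0, 2.0, 3.0 bits

# Average information content (entropy) of a distribution.
dist = {"a": 0.5, "b": 0.25, "c": 0.25}
entropy = sum(-p * log2(p) for p in dist.values())
print("entropy:", entropy, "bits")  # 1.5 bits
```

Since x is a probability (between 0 and 1), log(x) is negative or zero, which is why the -1 factor makes information content a positive quantity.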

48
Basic maxim of intelligence in the Universe
  • A general goal of quantitative modeling, not just
    for linguistics, but for intelligence in the
    universe:
  • Minimize the complexity of the perceived
    universe.
  • This maxim leads to the framework of Minimum
    Description Length (MDL).

49
Minimum Description Length
  • You must simultaneously maximize the ability of
    your theory to accurately describe (model) the
    data, and
  • Minimize the complexity of the theory (so it
    won't overfit the data, so you won't always be
    looking at the same data over and over and
    over. Sound familiar?)
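The two-part trade-off can be made concrete as a cost in bits: total description length = cost of stating the theory + cost of encoding the data using the theory. A minimal sketch with letter distributions as stand-in "theories" (the flat per-parameter model cost is an assumption for illustration, not the actual MDL formula used in Linguistica):

```python
from math import log2

def data_cost(corpus, model, floor=1e-6):
    """Bits needed to encode the corpus with the model's probabilities."""
    return sum(-log2(model.get(ch, floor)) for ch in corpus)

def model_cost(model, bits_per_param=16.0):
    """Crude cost of stating the model itself: a flat charge per parameter."""
    return bits_per_param * len(model)

def description_length(corpus, model):
    """MDL score: model cost plus data cost; lower is better."""
    return model_cost(model) + data_cost(corpus, model)

corpus = "aaabbaaa"
fitted = {"a": 0.75, "b": 0.25}   # matches the corpus frequencies
uniform = {"a": 0.5, "b": 0.5}
print(description_length(corpus, fitted) < description_length(corpus, uniform))  # True
```

A model that fits the data well lowers the data-cost term, but every extra parameter raises the model-cost term, so a theory that merely memorizes its corpus is penalized: this is how MDL guards against overfitting.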