Title: Martin Kay
1Ling 138/238
Introduction to
Computational Linguistics
- Martin Kay
- Stanford University
2- 30 Introduction
- Oct 1 Complexity String search
- 6 Knuth-Morris-Pratt Boyer Moore
- 8 Suffix Trees
- 13 Tagging Alignment
- 15
- 20 Chomsky Hierarchy Regular Expressions
- 22
- 27 Finite-state automata
- 29
3- Nov 3 Morphology
- 5
- 10 Context-free grammar
- 12
- 17 Unification, HPSG, LFG
- 19
- 24 Machine Translation
- 26
- Dec 1 Summary Wrap-up
- 3
4Linguistics 138/238
- Martin Kay
- KAY_at_csli.stanford.edu
- 740 3043
- Margaret Jacks 124
- Office hours TuTh 4.15-5.45 p.m.
5Prerequisites and Expectations
- No prerequisites
- Classroom participation
- Occasional readings
- Learn Prolog
- Laboratory sessions
- Homework Problems
- Project
6Project
- Learn something new about language
- Significant programming
- Group work
- Modifying or amplifying existing code
- An HMM-based tagger
- A searcher for tagged text
- Implementation of suffix trees
- Morphological analysis
- Named-entity recognition
7Intellectual Relations
- Relation to
- Linguistics
- Psychology
- Artificial Intelligence
- Computer Science
Abstract
Process
8Computational Linguistics as Science
Computing as Inspiration
9Ideas from Computing
- Search
- Divide and Conquer
- Guides and Oracles
- Nondeterminism
- Dynamic Programming
- Scheduling, agendas
- Compilation
- Unification
- Automata Theory
- Co-routining and parallelism
- Top-down vs. bottom-up
- Complexity
10Ideas from Computing
- Search
- Nondeterminism
- Dynamic Programming
11A Maze
Search Nondeterminism Dynamic Programming
Keep your right hand on the wall
12A Maze
Search Nondeterminism Dynamic Programming
13Nondeterminism
Search Nondeterminism Dynamic Programming
- A process is nondeterministic if there are points in it when a choice must be made, but the information necessary to make the choice is not available.
- Solution: Pick one of the alternatives. If it does not work out, come back and pick another one.
- Note: the information required to make the choice was available after all!
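The "pick one, and come back if it fails" strategy can be sketched as a backtracking search through a small maze. This is a minimal illustration, not code from the course: the grid `MAZE`, the function name `solve`, and the order in which directions are tried are all assumptions made for the example.

```python
# Hypothetical maze: S = start, E = exit, # = wall, . = open floor.
MAZE = [
    "S.#",
    ".##",
    "..E",
]

def solve(maze, pos, visited=None):
    """Return a list of (row, col) cells from pos to the exit, or None."""
    if visited is None:
        visited = set()
    r, c = pos
    # Off the grid, into a wall, or back to a cell already explored: dead end.
    if not (0 <= r < len(maze) and 0 <= c < len(maze[0])):
        return None
    if maze[r][c] == "#" or pos in visited:
        return None
    if maze[r][c] == "E":
        return [pos]
    visited.add(pos)
    # The choice point: no information says which direction is right,
    # so try each in turn and come back here if one does not work out.
    for dr, dc in ((0, 1), (1, 0), (0, -1), (-1, 0)):
        rest = solve(maze, (r + dr, c + dc), visited)
        if rest is not None:
            return [pos] + rest
    return None

path = solve(MAZE, (0, 0))
```

In this grid the first choice (moving right) leads to a dead end, so the search returns to the start and goes down instead.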
14Dynamic Programming
Search Nondeterminism Dynamic Programming
[Figure: road map linking Paris, Chalons, Metz, Strasbourg, Dijon, and Mulhouse, with distances on the edges (192, 266, 161, 458, 619, 288, 234, 115, 620, 344, 276)]
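Assuming the map is meant to illustrate finding a shortest route, the dynamic-programming idea is that the best distance from any intermediate city to the destination is computed once and reused. The sketch below uses an invented network: the node names and distances in `ROADS` are hypothetical, not the figures from the slide, and the network is assumed to be acyclic so that the recursion terminates.

```python
from functools import lru_cache

# Hypothetical directed, acyclic road network (invented names and distances).
ROADS = {
    "A": {"B": 4, "C": 2},
    "B": {"D": 5},
    "C": {"B": 1, "D": 8},
    "D": {},
}

@lru_cache(maxsize=None)
def shortest(source, target):
    """Length of the shortest route from source to target (inf if none)."""
    if source == target:
        return 0
    best = float("inf")
    for nxt, dist in ROADS[source].items():
        # The sub-result shortest(nxt, target) is cached, so each city
        # is solved only once however many routes pass through it.
        best = min(best, dist + shortest(nxt, target))
    return best

distance = shortest("A", "D")
```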
15The CKY Chart
Search Nondeterminism Dynamic Programming
[Chart for "people like the French drink"; cell labels in extraction order: people np np np s s s like prep pp p p v vp vp the det np np French adj n n n drink n vp]
Context-free: all phrases with the same coverage and category enter into larger phrases as a single item.
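A CKY-style chart can be sketched as a table from spans to sets of categories. Because each cell is a set, a category found in several ways over the same span is stored once and enters larger phrases as a single item. The grammar and lexicon below are assumptions chosen to fit the example sentence; they are not taken from the slide.

```python
from itertools import product

# Hypothetical lexicon and binary rules for the example sentence.
LEXICON = {
    "people": {"NP"},
    "like": {"V", "P"},
    "the": {"Det"},
    "French": {"Adj", "N"},
    "drink": {"N", "VP"},
}
RULES = {
    ("NP", "VP"): {"S"},
    ("V", "NP"): {"VP"},
    ("P", "NP"): {"PP"},
    ("NP", "PP"): {"NP"},
    ("Det", "N"): {"NP"},
    ("Adj", "N"): {"N"},
}

def cky_chart(words):
    """Map each span (start, end) to the set of categories covering it."""
    n = len(words)
    chart = {}
    for i, word in enumerate(words):
        chart[(i, i + 1)] = set(LEXICON.get(word, ()))
    # Build longer spans from shorter ones, bottom-up.
    for width in range(2, n + 1):
        for start in range(n - width + 1):
            end = start + width
            cell = set()
            for mid in range(start + 1, end):
                pairs = product(chart[(start, mid)], chart[(mid, end)])
                for left, right in pairs:
                    cell |= RULES.get((left, right), set())
            chart[(start, end)] = cell
    return chart

chart = cky_chart("people like the French drink".split())
```

With this grammar the whole sentence is an S in two ways ("people [like the French drink]" and "[people like the French] drink"), yet the top cell holds a single S.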
16Ideas from Computing
17Unification
Unification
- Attribute Report 1 Report 2 Combined Report
- eyes blue blue blue
- hair black or brown brown or red brown
- accent Italian Italian
- wife see below see below see below
- children Ahmed Angela Rebecca Angela Ahmed, Angela, Rebecca
- age middle 48 Middle
- Wife
- eyes brown brown
- weight 247 lbs 112 Kg 247 lbs
- disposition surly surly
18Unification
Unification
- Attribute Report 1 Report 2 Combined Report
- eyes blue blue blue
- hair black or brown brown or red brown
- accent Italian Italian
- wife see below see below see below
- children Ahmed Angela Rebecca Angela Ahmed, Angela, Rebecca
- age middle 48 Middle
- Wife
- eyes brown grey FAIL
- weight 247 lbs 112 Kg 247 lbs
- disposition surly surly
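The two reports can be combined mechanically. The sketch below is one possible encoding, not the course's: attribute values are sets of alternatives (so "black or brown" is a two-element set), nested records are dictionaries, and the name `unify` is an assumption. Only some attributes are modelled, and which report supplies the accent is a guess.

```python
def unify(a, b):
    """Return the combination of a and b, or None if they conflict."""
    if isinstance(a, dict) and isinstance(b, dict):
        result = dict(a)
        for key, b_val in b.items():
            if key in result:
                merged = unify(result[key], b_val)
                if merged is None:
                    return None  # a conflict anywhere fails the whole record
                result[key] = merged
            else:
                result[key] = b_val  # information present in only one report
        return result
    if isinstance(a, frozenset) and isinstance(b, frozenset):
        common = a & b  # keep only the alternatives both reports allow
        return common if common else None
    return None

# Hypothetical encoding of part of the two reports.
report1 = {
    "eyes": frozenset({"blue"}),
    "hair": frozenset({"black", "brown"}),
    "accent": frozenset({"Italian"}),
    "wife": {"eyes": frozenset({"brown"})},
}
report2 = {
    "eyes": frozenset({"blue"}),
    "hair": frozenset({"brown", "red"}),
    "wife": {"disposition": frozenset({"surly"})},
}
combined = unify(report1, report2)

# Second scenario: the reports disagree about the wife's eyes.
report2_grey = dict(report2, wife={"eyes": frozenset({"grey"})})
failed = unify(report1, report2_grey)
```

Here `combined` narrows the hair colour to brown and merges the two partial descriptions of the wife, while `failed` is `None` because brown and grey share no alternative.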
19English Agreement
Unification
- The dog sleeps
- The dogs sleep
- The dog slept
- The dogs slept
- The sheep sleeps
- The sheep sleep
- The sheep slept
- The sheep that was in the barn slept
- The sheep that were in the barn slept
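Agreement can be treated the same way: a noun and a verb agree if their number features unify. In this sketch "sheep" and "slept" leave number unspecified, so they combine with anything. The feature name `num`, the tables `NOUNS` and `VERBS`, and the functions are assumptions for illustration, and treating "sleep" as simply plural ignores person.

```python
def unify_flat(a, b):
    """Merge two flat feature dictionaries, or return None on a clash."""
    merged = dict(a)
    for feature, value in b.items():
        if feature in merged and merged[feature] != value:
            return None
        merged[feature] = value
    return merged

# Hypothetical lexical entries; an empty dictionary means "unspecified".
NOUNS = {"dog": {"num": "sg"}, "dogs": {"num": "pl"}, "sheep": {}}
VERBS = {"sleeps": {"num": "sg"}, "sleep": {"num": "pl"}, "slept": {}}

def agrees(noun, verb):
    """True if the noun's and the verb's features are compatible."""
    return unify_flat(NOUNS[noun], VERBS[verb]) is not None
```

Unifying "sheep" with "sleeps" also yields `{"num": "sg"}`, which is how the verb can fix the number of a noun that does not mark it.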
20German Case
Unification
- Der Junge sah den Lehrer
- Den Lehrer sah der Junge
- Das Mädchen sah der Junge
- Der Junge sah das Mädchen
- Die Lehrerin sah den Lehrer
- Die Lehrerin sah das Mädchen
21Ideas from Computing
22Finite-State Methods in Language Processing
Finite-State Methods
- The application of a branch of mathematics
- The regular branch of automata theory
- to a branch of computational linguistics in which what is crucial is (or can be reduced to)
- Properties of string sets and string relations with
- A notion of bounded dependency
23Applications
Finite-State Methods
- Finite Languages
- Dictionaries
- Compression
- Phenomena involving bounded dependency
- Morphology
- Spelling
- Hyphenation
- Tokenization
- Morphological Analysis
- Phonology
- Approximations to phenomena involving mostly bounded dependency
- Syntax
- Phenomena that can be translated into the realm of strings with bounded dependency
- Syntax
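For the first application, a finite language such as a dictionary can be stored as a deterministic finite-state automaton in which words with a common prefix share states. The sketch below builds such an automaton as a trie; the function names and the word list are assumptions, and no minimization of shared suffixes is attempted.

```python
def build_trie(words):
    """Build a trie-shaped automaton: (transition table, final states)."""
    transitions = [{}]  # transitions[state] maps a character to a state
    final = set()
    for word in words:
        state = 0  # state 0 is the start state
        for ch in word:
            if ch not in transitions[state]:
                transitions.append({})
                transitions[state][ch] = len(transitions) - 1
            state = transitions[state][ch]
        final.add(state)
    return transitions, final

def accepts(automaton, word):
    """True if the automaton ends in a final state after reading word."""
    transitions, final = automaton
    state = 0
    for ch in word:
        state = transitions[state].get(ch)
        if state is None:
            return False  # no transition: the word is not in the language
    return state in final

# Hypothetical word list.
lexicon = build_trie(["dog", "dogs", "sleep", "sleeps", "slept"])
```

Lookup time depends only on the length of the word, not on the size of the dictionary.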
24Ideas from Computing
25The Chomsky Hierarchy
Complexity
Grammar | Language | Automaton
Type 0 | Recursively Enumerable Sets | Turing Machines
Context-sensitive | Context-sensitive | Nondeterministic linear space bound Turing Machines
Context-free | Context-free | Nondeterministic push-down automata
LR(k) | Deterministic Context-free | Deterministic push-down automata
Regular Expressions, Left (Right) Linear | Regular Sets | Finite-state automata
26Computation and Psychology
27Computational Linguistics as Engineering
Computing as Power
28Tools for Linguists
- TLF, OED
- Corpus Linguistics
- Field Notes
- Grammar Testing
29Translation
- MT, Translator's Tools
- Alignment, Dictionaries, Term Banks
- Normalization and Tuning
30Other Applications
- Writer's Tools
- Spelling
- Dictionary, Thesaurus
- Grammar
- Natural Language Interfaces
- Information Storage and Retrieval
31CL and AI
- Text, Meaning, and Interpretation
Linguistics
???
Text
Interpretation
Meaning