Title: Introduction to Computational Linguistics
- Marie-Catherine de Marneffe
What is Computational Linguistics?
- Getting computers to perform useful tasks involving human languages, whether for:
  - Enabling human-machine communication
  - Improving human-human communication
  - Doing stuff with language objects
- Examples
  - Machine Translation
  - Automatic Question Answering
  - Speech Recognition
  - Text-to-Speech Synthesis
  - Text Understanding
Some brief demos
- Machine Translation
  - http://translate.google.com/translate_t
  - http://babelfish.altavista.com/
- Text-to-Speech
  - http://www-306.ibm.com/software/pervasive/tech/demos/tts.shtml
- Question Answering
  - http://www.powerset.com/
I. Syntax and Parsing
Syntax
- Why should we care?
  - Grammar checkers
  - Question answering
  - Information extraction
  - Machine translation
Parsing
- Parsing is the process of taking a string and a grammar and returning a parse tree for that string
  - Example string: a flight left
Phrase structure rules
- S → NP VP
- NP → Det N
- VP → Verb
- Det → a
- N → flight
- Verb → left
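The rules above can be written down as data. The following minimal Python sketch is not from the original slides: it assumes a dictionary named `GRAMMAR` that maps each non-terminal to its right-hand sides, and a hypothetical helper `leftmost_derivation` that keeps rewriting the leftmost non-terminal until only words remain.

```python
# Toy grammar from the slide: non-terminal -> list of right-hand sides.
GRAMMAR = {
    "S": [["NP", "VP"]],
    "NP": [["Det", "N"]],
    "VP": [["Verb"]],
    "Det": [["a"]],
    "N": [["flight"]],
    "Verb": [["left"]],
}


def leftmost_derivation(grammar, start="S"):
    """Rewrite the leftmost non-terminal (using its first rule) until only words remain."""
    form = [start]
    steps = [list(form)]
    while any(sym in grammar for sym in form):
        # Position of the leftmost non-terminal in the current sentential form.
        i = next(k for k, sym in enumerate(form) if sym in grammar)
        # Replace it by a right-hand side, ignoring the symbols around it.
        form = form[:i] + grammar[form[i]][0] + form[i + 1:]
        steps.append(list(form))
    return steps


for step in leftmost_derivation(GRAMMAR):
    print(" ".join(step))
```

The last line printed is `a flight left`. Each step rewrites one non-terminal without looking at its neighbours, which is exactly the sense of "context-free" in a CFG.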
Context-Free Grammars (CFG)
- Capture constituency and ordering
- Constituency
  - How words group into units and how the various kinds of units behave
- Ordering
  - What rules govern the ordering of words and bigger units in the language
Context?
- The notion of context in CFGs has nothing to do with the ordinary meaning of the word "context" in language
- All it really means is that the non-terminal on the left-hand side of a rule is out there all by itself (free of context)
  - A → B C
  - Means that I can rewrite an A as a B followed by a C, regardless of the context in which A is found
Parsing
- Parsing: assigning correct trees to input strings
- Correct tree: a tree that covers all and only the elements of the input and has an S at the top
- For now: enumerate all possible trees
- A further task: disambiguation
  - Means choosing the correct tree from among all the possible trees
Parsing involves search
- As with everything of interest, parsing involves a search, which involves making choices
- We'll look at some basic methods to give you an idea of the problem
Top-Down Parsing
- Since we're trying to find trees rooted with an S (sentences), start with the rules that give us an S.
- Then work your way down from there to the words.
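A top-down parser can be sketched as a recursive-descent search. The Python below is an illustrative sketch, assuming the toy grammar from the "Phrase structure rules" slide and parse trees represented as nested tuples; the function names are made up for this example. It expands from S downward, tries rules in order and right-hand-side symbols left to right, and keeps only trees that cover the whole input.

```python
# Toy grammar from the "Phrase structure rules" slide.
GRAMMAR = {
    "S": [["NP", "VP"]],
    "NP": [["Det", "N"]],
    "VP": [["Verb"]],
    "Det": [["a"]],
    "N": [["flight"]],
    "Verb": [["left"]],
}


def parse(grammar, symbol, words, pos):
    """Yield (tree, end) for every way `symbol` can cover words[pos:end]."""
    if symbol not in grammar:
        # Terminal: it must match the next input word.
        if pos < len(words) and words[pos] == symbol:
            yield symbol, pos + 1
        return
    # Non-terminal: expand it top-down with each of its rules in turn.
    for rhs in grammar[symbol]:
        for children, end in parse_sequence(grammar, rhs, words, pos):
            yield (symbol, *children), end


def parse_sequence(grammar, symbols, words, pos):
    """Match a right-hand side left to right, yielding (children, end)."""
    if not symbols:
        yield (), pos
        return
    for first, mid in parse(grammar, symbols[0], words, pos):
        for rest, end in parse_sequence(grammar, symbols[1:], words, mid):
            yield (first, *rest), end


def top_down_parses(grammar, words, start="S"):
    # Keep only trees rooted in S that cover all (and only) the input words.
    return [tree for tree, end in parse(grammar, start, words, 0) if end == len(words)]


print(top_down_parses(GRAMMAR, "a flight left".split()))
```

This naive version would recurse forever on a left-recursive rule such as NP → NP PP, one of the rules that appear later in these slides.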
Top-Down Space
[Figure: the top-down search space, growing downward from S]
Bottom-Up Parsing
- Of course, we also want trees that cover the input words. So start with trees that link up with the words in the right way.
- Then work your way up from there.
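Bottom-up parsing can be sketched as an exhaustive search over sequences of partial trees: start from the words and repeatedly replace any span whose labels match a rule's right-hand side with a new node for the rule's left-hand side. This minimal Python sketch reuses the toy grammar; it assumes words never share a name with a non-terminal, and `bottom_up_parses` is a hypothetical name. It explores every possible reduction breadth-first rather than making the committed choices a practical shift-reduce parser would.

```python
from collections import deque

# Toy grammar from the "Phrase structure rules" slide.
GRAMMAR = {
    "S": [["NP", "VP"]],
    "NP": [["Det", "N"]],
    "VP": [["Verb"]],
    "Det": [["a"]],
    "N": [["flight"]],
    "Verb": [["left"]],
}
# Flatten the grammar into (lhs, rhs) pairs for matching.
RULES = [(lhs, tuple(rhs)) for lhs, rhss in GRAMMAR.items() for rhs in rhss]


def label(node):
    # A word is its own label; a subtree (tuple) is labelled by its root.
    return node if isinstance(node, str) else node[0]


def bottom_up_parses(rules, words, start="S"):
    """Exhaustive bottom-up search, starting from the words themselves."""
    results = []
    seen = set()
    agenda = deque([tuple(words)])
    while agenda:
        state = agenda.popleft()
        # A complete parse: a single tree rooted in S covering everything.
        if len(state) == 1 and label(state[0]) == start and state[0] not in results:
            results.append(state[0])
        for lhs, rhs in rules:
            n = len(rhs)
            for i in range(len(state) - n + 1):
                # Reduce a span whose labels match the rule's right-hand side.
                if tuple(label(x) for x in state[i:i + n]) == rhs:
                    new_state = state[:i] + ((lhs, *state[i:i + n]),) + state[i + n:]
                    if new_state not in seen:  # different orders can reach the same state
                        seen.add(new_state)
                        agenda.append(new_state)
    return results


print(bottom_up_parses(RULES, "a flight left".split()))
```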
Bottom-Up Space
[Figure: the bottom-up search space, growing upward from the words]
Control
- We need to keep track of the search space and have a strategy to make choices:
  - Which node to try to expand next
  - Which grammar rule to use to expand a node
Top-Down, Depth-First, Left-to-Right Search
[Figures: step-by-step example of this search, over three slides]
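The two control choices (which node to expand next, which rule to use) can be made explicit with a stack of search states. A sketch in Python, assuming the toy grammar again; `dfs_top_down` is a hypothetical name, and it returns leftmost derivations rather than trees to keep each state small. The leftmost pending symbol is always expanded next, rules are tried in the order listed, and untried alternatives stay on the stack for backtracking.

```python
# Toy grammar from the "Phrase structure rules" slide.
GRAMMAR = {
    "S": [["NP", "VP"]],
    "NP": [["Det", "N"]],
    "VP": [["Verb"]],
    "Det": [["a"]],
    "N": [["flight"]],
    "Verb": [["left"]],
}


def dfs_top_down(grammar, words, start="S"):
    """Top-down, depth-first, left-to-right search with an explicit stack.

    A search state is (pending, pos, rules_used): the symbols still to be
    expanded or matched (leftmost first), the next input position, and the
    rules applied so far (a leftmost derivation).
    """
    stack = [((start,), 0, ())]
    derivations = []
    while stack:
        pending, pos, rules_used = stack.pop()
        if not pending:
            # Success only if every input word has been consumed.
            if pos == len(words):
                derivations.append(rules_used)
            continue
        first, rest = pending[0], pending[1:]
        if first in grammar:
            # Which node? Always the leftmost pending symbol.
            # Which rule? Push alternatives in reverse so the first rule
            # is popped (tried) first; the others wait for backtracking.
            for rhs in reversed(grammar[first]):
                stack.append((tuple(rhs) + rest, pos, rules_used + ((first, tuple(rhs)),)))
        elif pos < len(words) and words[pos] == first:
            # A terminal that matches the next word: consume it.
            stack.append((rest, pos + 1, rules_used))
        # Otherwise the terminal does not match and this branch is dropped.
    return derivations


for rule in dfs_top_down(GRAMMAR, "a flight left".split())[0]:
    print(rule)
```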
Avoiding repeated work
- Parsing is hard, and slow. It's wasteful to redo stuff over and over and over.
- More efficient algorithm:
  - Dynamic programming parsing: CKY (Cocke-Kasami-Younger)
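The CKY idea can be sketched as filling a chart over spans, shortest first, so that each sub-result is computed once and reused. The Python below is illustrative only: CKY needs a grammar in Chomsky normal form, so it assumes a hypothetical CNF grammar (`BINARY_RULES`, `LEXICON`) written for the sentence on the "Ambiguity" slide, and each chart cell stores the number of trees per category instead of the trees themselves.

```python
from collections import defaultdict

# Hypothetical CNF grammar: binary rules A -> B C, stored as (A, B, C) ...
BINARY_RULES = [
    ("S", "NP", "VP"),
    ("VP", "V", "NP"),
    ("VP", "VP", "PP"),
    ("NP", "Det", "N"),
    ("NP", "NP", "PP"),
    ("PP", "P", "NP"),
]
# ... and lexical rules A -> word, stored as word -> categories.
LEXICON = {
    "Bond": ["NP"], "shot": ["V"], "the": ["Det"], "a": ["Det"],
    "spy": ["N"], "pistol": ["N"], "with": ["P"],
}


def cky_count(words, lexicon, binary_rules, start="S"):
    """Return how many distinct trees rooted in `start` cover `words`."""
    n = len(words)
    # chart[i][j][A] = number of trees rooted in A that span words[i:j]
    chart = [[defaultdict(int) for _ in range(n + 1)] for _ in range(n + 1)]
    for i, word in enumerate(words):
        for cat in lexicon.get(word, []):
            chart[i][i + 1][cat] += 1
    # Longer spans are built from shorter ones already in the chart.
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length
            for k in range(i + 1, j):  # split point
                for a, b, c in binary_rules:
                    if b in chart[i][k] and c in chart[k][j]:
                        chart[i][j][a] += chart[i][k][b] * chart[k][j][c]
    return chart[0][n][start]


print(cky_count("Bond shot the spy with a pistol".split(), LEXICON, BINARY_RULES))
```

For "Bond shot the spy with a pistol" this prints 2: one tree attaches "with a pistol" through VP → VP PP, the other through NP → NP PP.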
Ambiguity
- Bond shot the spy with a pistol.
One possible structure
[Figure: parse tree]
Another possible structure
[Figure: parse tree]
Lots of ambiguity
- VP → VP PP
- NP → NP PP
- Show me the meals on flight 286 from SF to Denver.
  - 14 parses!
Lots of ambiguity
- Church and Patil (1982)
  - The number of parses for such sentences grows at the rate of the number of parenthesizations of arithmetic expressions
  - Which grow with the Catalan numbers
- PPs: parses
  - 1 PP: 2 parses
  - 2 PPs: 5 parses
  - 3 PPs: 14 parses
  - 4 PPs: 42 parses
  - 5 PPs: 132 parses
  - 6 PPs: 429 parses
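The Catalan numbers have a closed form, C(n) = (2n choose n) / (n + 1). A small Python sketch, assuming (as the first rows of the table indicate: 1 → 2, 2 → 5, 3 → 14) that a sentence with k PPs has C(k + 1) parses:

```python
from math import comb


def catalan(n):
    """n-th Catalan number: C(n) = (2n choose n) / (n + 1)."""
    return comb(2 * n, n) // (n + 1)


# Number of parses for k PPs, assuming k PPs -> C(k + 1) parses.
for pps in range(1, 7):
    print(pps, catalan(pps + 1))
```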
How to disambiguate parses?
- Probabilistic methods
  - Augment the grammar rules with probabilities, computed on treebanks
  - Modify the parser to keep only the most probable parses
  - At the end, return the most probable parse
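A probabilistic (Viterbi-style) version of CKY keeps, for each span and category, only the most probable subtree, then returns the best tree for the whole sentence. The Python below is a sketch under loud assumptions: the grammar is a hypothetical PCFG in Chomsky normal form, and all of its probabilities are invented for illustration rather than estimated from a treebank.

```python
# Hypothetical PCFG in CNF. The probabilities are made up; for each
# non-terminal, its rule probabilities (binary + lexical) sum to 1.
BINARY_RULES = [  # (A, B, C, P(A -> B C))
    ("S", "NP", "VP", 1.0),
    ("VP", "V", "NP", 0.7),
    ("VP", "VP", "PP", 0.3),
    ("NP", "Det", "N", 0.5),
    ("NP", "NP", "PP", 0.2),
    ("PP", "P", "NP", 1.0),
]
LEXICON = {  # word -> [(A, P(A -> word))]
    "Bond": [("NP", 0.3)], "shot": [("V", 1.0)],
    "the": [("Det", 0.5)], "a": [("Det", 0.5)],
    "spy": [("N", 0.5)], "pistol": [("N", 0.5)], "with": [("P", 1.0)],
}


def viterbi_cky(words, lexicon, binary_rules, start="S"):
    """Return (probability, tree) of the most probable parse, or None."""
    n = len(words)
    # best[i][j][A] = (prob, tree) of the best A-rooted tree over words[i:j];
    # keeping only this entry is what discards the less probable parses.
    best = [[{} for _ in range(n + 1)] for _ in range(n + 1)]
    for i, word in enumerate(words):
        for cat, p in lexicon.get(word, []):
            if p > best[i][i + 1].get(cat, (0.0, None))[0]:
                best[i][i + 1][cat] = (p, (cat, word))
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length
            cell = best[i][j]
            for k in range(i + 1, j):
                for a, b, c, p in binary_rules:
                    if b in best[i][k] and c in best[k][j]:
                        p_b, tree_b = best[i][k][b]
                        p_c, tree_c = best[k][j][c]
                        prob = p * p_b * p_c
                        if prob > cell.get(a, (0.0, None))[0]:
                            cell[a] = (prob, (a, tree_b, tree_c))
    return best[0][n].get(start)


prob, tree = viterbi_cky("Bond shot the spy with a pistol".split(), LEXICON, BINARY_RULES)
print(prob)
print(tree)
```

With these made-up numbers, attaching "with a pistol" to the VP (probability 0.000984375) beats attaching it to "the spy" (0.00065625); different probabilities could reverse the choice.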
A statistical scientific revolution
- Computational Linguistics before 1990
  - Hand-built parsers, hand-built dialogue systems
  - High-precision, low-coverage methods
- Computational Linguistics after 1995
  - Automatically trained parsers, unsupervised clustering, statistical machine translation
  - High-coverage, low-precision methods
- LOGIC vs NGRAM (Gazdar, 1996)
Ambiguity
- "One morning I shot an elephant in my pajamas. How he got into my pajamas I don't know."
  - Groucho Marx
II. Text Understanding
The textual inference task
- On the assumption that a piece of text T is true, does this imply the truth of the hypothesis H?
- T: Sydney was the host city of the 2000 Olympics.
- H: The Olympics have been held in Sydney.
- T: Wal-Mart defended itself in court today against claims that its female employees were kept out of jobs in management because they are women.
- H: Wal-Mart was sued for sexual discrimination.
- PASCAL RTE Challenge (Dagan et al., 2005)
- US government AQUAINT program
The contradiction detection task
- Given two sentences, are they contradictory to one another?
- T: Sources in the intelligence community revealed that Abu Zubaydah was a low-level al-Qaeda operative handling minor logistics.
- H: Abu Zubaydah was a high-ranking member of al-Qaeda.
- T: UN Secretary General Kofi Annan has expressed deep concern over Saturday's Israeli commando raid deep inside Lebanon, calling it a truce violation.
- H: Israel insisted it had not breached the ceasefire.
Why is it useful?
- Several applications need automatic entailment and contradiction detection
  - e.g., determining similarities and differences in people's positions
- "I think that it is the right idea. We can make sure that drivers who are illegal come out of the shadows." (Barack Obama)
- "I will not support driver's licenses for undocumented people." (Hillary Clinton)
Approaches to text understanding
[Figure labels: "Very precise, but poor recall"; "Shallow, but robust"; "Graph-matching approaches"]
How the system works
[Figure: overview of the system]
- Antonyms are only predictive of contradiction when they:
  - modify the same entity
  - occur in contexts of the same polarity
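The antonym condition can be read as a simple rule. The following is a hypothetical Python sketch, assuming each sentence has already been reduced to a claim with an entity, a predicate, and a polarity flag, and using a tiny hand-made antonym list; neither the `Claim` type nor the list comes from the system described on these slides, and extracting such claims from real text is the hard part left out here.

```python
from typing import NamedTuple


class Claim(NamedTuple):
    """Hypothetical reduced form of a sentence."""
    entity: str      # the entity the predicate modifies
    predicate: str   # e.g. an adjective such as "low-level"
    negated: bool    # polarity of the context


# Hypothetical antonym pairs; a real system would use a lexical resource.
ANTONYMS = {frozenset({"low-level", "high-ranking"}), frozenset({"minor", "major"})}


def antonym_contradiction(t: Claim, h: Claim) -> bool:
    """Antonyms signal contradiction only if they modify the same entity
    and occur in contexts of the same polarity."""
    are_antonyms = frozenset({t.predicate, h.predicate}) in ANTONYMS
    return are_antonyms and t.entity == h.entity and t.negated == h.negated


t = Claim("Abu Zubaydah", "low-level", False)
h = Claim("Abu Zubaydah", "high-ranking", False)
print(antonym_contradiction(t, h))
```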
How to develop and evaluate such a system?
- Development sets (training sets, when we learn something)
- Test sets that contain the correct answers
- Two measures are used: precision and recall
  - Precision (exactness) = items correctly retrieved / items retrieved
  - Recall (completeness) = items correctly retrieved / correct items in total
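Both measures can be computed directly from sets of items. A minimal Python sketch, where the items are hypothetical IDs of T/H pairs:

```python
def precision_recall(predicted, gold):
    """Precision and recall of a set of predicted items against gold items."""
    correct = len(predicted & gold)  # items correctly retrieved
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    return precision, recall


# Hypothetical pair IDs: those the system flagged as contradictions,
# and those annotated as contradictions in the test set.
flagged = {1, 2, 3, 4}
annotated = {2, 3, 5}
print(precision_recall(flagged, annotated))  # (0.5, 0.6666666666666666)
```

Flagging more pairs tends to raise recall at the cost of precision, which is why both numbers are reported.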
Why people get into this field
- Passion about understanding how human language works
- Passion about finding ways to use the power of computers to help the processing of natural language
(Kevin Knight)