Part II. Statistical NLP - PowerPoint PPT Presentation

Advanced Artificial Intelligence Part II. Statistical NLP Introduction and Grammar Models Wolfram Burgard, Luc De Raedt, Bernhard Nebel, Kristian Kersting – PowerPoint PPT presentation

1
Advanced Artificial Intelligence
  • Part II. Statistical NLP

Introduction and Grammar Models Wolfram Burgard,
Luc De Raedt, Bernhard Nebel, Kristian Kersting
Some slides taken from Helmut Schmid, Rada
Mihalcea, Bonnie Dorr, Leila Kosseim, Peter
Flach and others
2
Topic
  • Statistical Natural Language Processing
  • Applies
  • Machine Learning / Statistics to
  • Learning: the ability to improve one's behaviour
    at a specific task over time; involves the
    analysis of data (statistics)
  • Natural Language Processing
  • Following parts of the book
  • Statistical NLP (Manning and Schuetze), MIT
    Press, 1999.

3
Contents
  • Motivation
  • Zipf's law
  • Some natural language processing tasks
  • Non-probabilistic NLP models
  • Regular grammars and finite state automata
  • Context-Free Grammars
  • Definite Clause Grammars
  • Motivation for statistical NLP
  • Overview of the rest of this part

4
Rationalism versus Empiricism
  • Rationalist
  • Noam Chomsky - innate language structures
  • AI: hand coding NLP
  • Dominant view 1960-1985
  • Cf. e.g. Steven Pinker's The Language Instinct
    (popular science book)
  • Empiricist
  • Ability to learn is innate
  • AI: language is learned from corpora
  • Dominant 1920-1960 and becoming increasingly
    important

5
Rationalism versus Empiricism
  • Noam Chomsky
  • "But it must be recognized that the notion of
    'probability of a sentence' is an entirely
    useless one, under any known interpretation of
    this term."
  • Fred Jelinek (IBM 1988)
  • "Every time a linguist leaves the room the
    recognition rate goes up."
  • (Alternative: "Every time I fire a linguist the
    recognizer improves.")

6
This course
  • Empiricist approach
  • Focus will be on probabilistic models for
    learning of natural language
  • No time to treat natural language in depth!
  • (though this would be quite useful and
    interesting)
  • Deserves a full course by itself
  • Covered in more depth in Logic, Language and
    Learning (SS 05, prob. SS 06)

7
Ambiguity
8
NLP and Statistics
  • Statistical Disambiguation
  • Define a probability model for the data
  • Compute the probability of each alternative
  • Choose the most likely alternative

9
NLP and Statistics
Statistical methods deal with uncertainty. They
predict the future behaviour of a system based on
the behaviour observed in the past. → Statistical
methods require training data. The data in
statistical NLP are the corpora.
10
Corpora
  • Corpus: text collection for linguistic purposes
  • Tokens: How many words are contained in Tom
    Sawyer? → 71,370
  • Types: How many different words are contained in
    T.S.? → 8,018
  • Hapax legomena: words appearing only once
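
The three notions above can be counted directly. The following is a minimal sketch in Python; the tokenizer (lowercasing and keeping runs of letters and apostrophes) and the sample sentence are assumptions made for illustration, so the counts for Tom Sawyer quoted on the slide depend on the tokenization the original authors used.

```python
from collections import Counter
import re

def corpus_stats(text):
    """Return (number of tokens, number of types, sorted hapax legomena)."""
    # Crude tokenizer (an assumption): lowercase, keep runs of letters/apostrophes.
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(tokens)                      # type -> frequency
    hapax = sorted(w for w, c in counts.items() if c == 1)
    return len(tokens), len(counts), hapax

# Toy sample, not taken from any corpus.
sample = "the cat saw the dog and the dog saw a bird"
n_tokens, n_types, hapax = corpus_stats(sample)
print(n_tokens, n_types, hapax)  # 11 7 ['a', 'and', 'bird', 'cat']
```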

11
Word Counts
word  freq    word  freq
the   3332    in    906
and   2972    that  877
a     1775    he    877
to    1725    I     783
of    1440    his   772
was   1161    you   686
it    1027    Tom   679
→ The most frequent words are function words
12
Word Counts
f        n_f
1        3993
2        1292
3        664
4        410
5        243
6        199
7        172
8        131
9        82
10       91
11-50    540
51-100   99
> 100    102
How many words appear f times?
About half of the words occur just once. About
half of the text consists of the 100 most
common words.
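
The first claim can be checked against the table itself. The sketch below copies the frequency-of-frequencies values from the slide; the variable names are illustrative. The class counts add up to the 8,018 types reported for Tom Sawyer, and the words occurring once make up just under half of them.

```python
# Frequency-of-frequencies table copied from the slide (Tom Sawyer).
# Keys are frequency classes f; values are the number of word types n_f.
n_f = {"1": 3993, "2": 1292, "3": 664, "4": 410, "5": 243,
       "6": 199, "7": 172, "8": 131, "9": 82, "10": 91,
       "11-50": 540, "51-100": 99, ">100": 102}

total_types = sum(n_f.values())        # all distinct words
hapax_share = n_f["1"] / total_types   # share of words that occur exactly once

print(total_types)            # 8018
print(round(hapax_share, 3))  # 0.498
```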
13
Word Counts (Brown corpus)
14
Word Counts (Brown corpus)
15
Zipf's Law
word   f     r    f·r      word        f   r     f·r
the    3332  1    3332     turned      51  200   10200
and    2972  2    5944     you'll      30  300   9000
a      1775  3    5325     name        21  400   8400
he     877   10   8770     comes       16  500   8000
but    410   20   8400     group       13  600   7800
be     294   30   8820     lead        11  700   7700
there  222   40   8880     friends     10  800   8000
one    172   50   8600     begin       9   900   8100
about  158   60   9480     family      8   1000  8000
more   138   70   9660     brushed     4   2000  8000
never  124   80   9920     sins        2   3000  6000
Oh     116   90   10440    Could       2   4000  8000
two    104   100  10400    Applausive  1   8000  8000
Zipf's Law: f ∝ 1/r (f · r = const), where f is the
frequency of a word and r its rank
Minimize effort
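
To see how roughly constant the product is, one can recompute f · r for a few rows. This is a small sketch using a selection of (word, f, r) triples taken from the table; the choice of rows is arbitrary. The frequencies span more than three orders of magnitude, while the products stay within a factor of about three of each other.

```python
# (word, frequency f, rank r) for a selection of rows from the table above.
rows = [("the", 3332, 1), ("and", 2972, 2), ("he", 877, 10), ("be", 294, 30),
        ("one", 172, 50), ("two", 104, 100), ("turned", 51, 200),
        ("comes", 16, 500), ("family", 8, 1000), ("brushed", 4, 2000),
        ("Applausive", 1, 8000)]

products = {word: f * r for word, f, r in rows}
spread = max(products.values()) / min(products.values())

for word, value in products.items():
    print(f"{word:<12}{value}")
print(round(spread, 2))  # ratio between the largest and smallest product
```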
16
Conclusions
  • Overview of some probabilistic and machine
    learning methods for NLP
  • Also very relevant to bioinformatics!
  • Analogy between parsing
  • A sentence
  • A biological string (DNA, protein, mRNA, ...)

17
Language and sequences
  • Natural language processing
  • Is concerned with the analysis of sequences of
    words / sentences
  • Construction of language models
  • Two types of models
  • Non-probabilistic
  • Probabilistic

18
Key NLP Problem: Ambiguity
  • Human language is highly ambiguous at all levels
  • acoustic level: "recognize speech" vs. "wreck a
    nice beach"
  • morphological level: "saw": to see (past), saw
    (noun), to saw (present, inf)
  • syntactic level: "I saw the man on the hill with a
    telescope"
  • semantic level: "One book has to be read by every
    student"

19
Language Model
  • A formal model about language
  • Two types
  • Non-probabilistic
  • Allows one to compute whether a certain sequence
    (sentence or part thereof) is possible
  • Often grammar based
  • Probabilistic
  • Allows one to compute the probability of a
    certain sequence
  • Often extends grammars with probabilities

20
Example of bad language model
21
A bad language model
22
A bad language model
23
A good language model
  • Non-Probabilistic
  • "I swear to tell the truth" is possible
  • "I swerve to smell de soup" is impossible
  • Probabilistic
  • P(I swear to tell the truth) ≈ .0001
  • P(I swerve to smell de soup) ≈ 0

24
Why language models?
  • Consider a Shannon Game
  • Predicting the next word in the sequence
  • Statistical natural language ...
  • The cat is thrown out of the ...
  • The large green ...
  • Sue swallowed the large green ...
  • Model at the sentence level
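
One simple way to play the Shannon game is to count which word follows which in a training text and predict the most frequent successor. The sketch below is a bigram predictor under stated assumptions: the three training sentences are invented, whitespace tokenization is used, and only the last word of the prefix is consulted. It is not the model used in the course.

```python
from collections import Counter, defaultdict

def train_bigrams(sentences):
    """Count, for every word, which words follow it in the training sentences."""
    following = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.lower().split()
        for prev, nxt in zip(words, words[1:]):
            following[prev][nxt] += 1
    return following

def predict_next(following, prefix):
    """Return the most frequent successor of the last word and its relative frequency."""
    last = prefix.lower().split()[-1]
    counts = following.get(last)
    if not counts:                       # unseen word: no prediction possible
        return None, 0.0
    word, count = counts.most_common(1)[0]
    return word, count / sum(counts.values())

# Invented toy corpus, for illustration only.
toy_corpus = [
    "statistical natural language processing",
    "natural language processing is fun",
    "natural language models",
]
model = train_bigrams(toy_corpus)
print(predict_next(model, "Statistical natural language"))
```

With this toy corpus, "language" is followed twice by "processing" and once by "models", so "processing" is predicted with relative frequency 2/3.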

25
Applications
  • Spelling correction
  • Mobile phone texting
  • Speech recognition
  • Handwriting recognition
  • Disabled users

26
Spelling errors
  • They are leaving in about fifteen minuets to go
    to her house.
  • The study was conducted mainly be John Black.
  • Hopefully, all with continue smoothly in my
    absence.
  • Can they lave him my messages?
  • I need to notified the bank of.
  • He is trying to fine out.

27
Handwriting recognition
  • Assume a note is given to a bank teller, which
    the teller reads as "I have a gub." (cf. Woody
    Allen)
  • NLP to the rescue ...
  • "gub" is not a word
  • "gun", "gum", "Gus", and "gull" are words, but "gun" has a
    higher probability in the context of a bank
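
The reasoning on this slide can be sketched as: collect known words close to the unknown one, then pick the candidate that is most probable in context. In the code below the context probabilities are invented numbers, and "close" is simplified to "differs in exactly one letter", which is an assumption; under that simplification "gull" drops out because it has a different length.

```python
def one_substitution_apart(a, b):
    """True if a and b have equal length and differ in exactly one position."""
    return len(a) == len(b) and sum(x != y for x, y in zip(a, b)) == 1

# Hypothetical probabilities of each word occurring in a note handed to a
# bank teller. The numbers are invented for illustration only.
context_prob = {"gun": 0.6, "gum": 0.2, "gus": 0.1, "gull": 0.1}

def correct(word, context_prob):
    """Return word if known, else the most probable known word one letter away."""
    if word in context_prob:
        return word
    candidates = [w for w in context_prob if one_substitution_apart(word, w)]
    return max(candidates, key=context_prob.get) if candidates else word

print(correct("gub", context_prob))  # gun
```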

28
For Spell Checkers
  • Collect list of commonly substituted words
  • piece/peace, whether/weather, their/there ...
  • Example: "On Tuesday, the whether ..." → "On
    Tuesday, the weather ..."

29
Another dimension in language models
  • Do we mainly want to infer (probabilities of)
    legal sentences / sequences?
  • So far
  • Or do we want to infer properties of these
    sentences?
  • E.g., parse tree, part-of-speech tagging
  • Needed for understanding NL
  • Let's look at some tasks

30
Sequence Tagging
  • Part-of-speech tagging
  • He drives with his bike
  • N V PR PN N (noun, verb,
    preposition, pronoun, noun)
  • Text extraction
  • The job is that of a programmer
  • X X X X X X JobType
  • The seminar is taking place from 15.00 to 16.00
  • X X X X X X Start End

31
Sequence Tagging
  • Predicting the secondary structure of proteins,
    mRNA, ...
  • X = A,F,A,R,L,M,M,A,...
  • Y = he,he,st,st,st,he,st,he,...

32
Parsing
  • Given a sentence, find its parse tree
  • Important step in understanding NL

33
Parsing
  • In bioinformatics, parsing allows one to predict
    (elements of) structure from sequence

34
Language models based on Grammars
  • Grammar Types
  • Regular grammars and Finite State Automata
  • Context-Free Grammars
  • Definite Clause Grammars
  • A particular type of Unification Based Grammar
    (Prolog)
  • Distinguish lexicon from grammar
  • Lexicon (dictionary) contains information about
    words, e.g.
  • word - possible tags (and possibly additional
    information)
  • flies - V(erb) - N(oun)
  • Grammar encodes rules

35
Grammars and parsing
  • Syntactic level: best understood and formalized
  • Derivation of grammatical structure:
    parsing (more than just recognition)
  • Result of parsing: mostly a parse tree showing the
    constituents of a sentence, e.g. verb or noun
    phrases
  • Syntax is usually specified in terms of a grammar
    consisting of grammar rules

36
Regular Grammars and Finite State Automata
  • Lexical information - which words are?
  • Det(erminer)
  • N(oun)
  • Vi (intransitive verb) - no argument
  • Pn (pronoun)
  • Vt (transitive verb) - takes an argument
  • Adj (adjective)
  • Now accept
  • The cat slept
  • Det N Vi
  • As regular grammar
  • S -> Det S1 (Det: terminal)
  • S1 -> N S2
  • S2 -> Vi
  • Lexicon
  • The - Det
  • Cat - N
  • Slept - Vi

37
Finite State Automaton
  • Sentences
  • John smiles - Pn Vi
  • The cat disappeared - Det N Vi
  • These new shoes hurt - Det Adj N Vi
  • John liked the old cat - Pn Vt Det Adj N
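
A finite state automaton for these four sentences can be written as a transition table over tags. The automaton drawn on the slide is not part of the transcript, so the states and transitions below are an assumption chosen to accept exactly the tag patterns listed; the lexicon covers only the words in the examples.

```python
# Lexicon: word -> tag (only the words used in the example sentences).
LEXICON = {"john": "Pn", "smiles": "Vi", "the": "Det", "cat": "N",
           "disappeared": "Vi", "these": "Det", "new": "Adj", "shoes": "N",
           "hurt": "Vi", "liked": "Vt", "old": "Adj"}

# Hypothetical automaton: (state, tag) -> next state.
TRANSITIONS = {
    ("start", "Pn"): "subject", ("start", "Det"): "det",
    ("det", "Adj"): "det", ("det", "N"): "subject",
    ("subject", "Vi"): "end", ("subject", "Vt"): "object",
    ("object", "Pn"): "end", ("object", "Det"): "obj_det",
    ("obj_det", "Adj"): "obj_det", ("obj_det", "N"): "end",
}
ACCEPTING = {"end"}

def accepts(sentence):
    """Run the automaton over the tag sequence of the sentence."""
    state = "start"
    for word in sentence.lower().split():
        tag = LEXICON.get(word)                 # None for unknown words
        state = TRANSITIONS.get((state, tag))   # None if no transition exists
        if state is None:
            return False
    return state in ACCEPTING

for s in ["John smiles", "The cat disappeared",
          "These new shoes hurt", "John liked the old cat"]:
    print(s, accepts(s))
```

The automaton answers only yes or no; it does not say how likely a sentence is.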

38
Phrase structure
[Figure: parse tree for "the dog chased a cat into the garden"]
[S [NP [D the] [N dog]] [VP [V chased] [NP [D a] [N cat]] [PP [P into] [NP [D the] [N garden]]]]]
39
Notation
  • S: sentence
  • D or Det: determiner (e.g., articles)
  • N: noun
  • V: verb
  • P: preposition
  • NP: noun phrase
  • VP: verb phrase
  • PP: prepositional phrase

40
Context Free Grammar
S -> NP VP
NP -> D N
VP -> V NP
VP -> V NP PP
PP -> P NP
D -> the
D -> a
N -> dog
N -> cat
N -> garden
V -> chased
V -> saw
P -> into
Terminals: lexicon
41
Phrase structure
  • Formalism of context-free grammars
  • Nonterminal symbols: S, NP, VP, ...
  • Terminal symbols: dog, cat, saw, the, ...
  • Recursion
  • The girl thought the dog chased the cat

VP -> V S
N -> girl
V -> thought
42
Top-down parsing
  • S -> NP VP
  • S -> Det N VP
  • S -> The N VP
  • S -> The dog VP
  • S -> The dog V NP
  • S -> The dog chased NP
  • S -> The dog chased Det N
  • S -> The dog chased the N
  • S -> The dog chased the cat
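
The derivation above always expands the leftmost symbol and matches words as soon as they appear. A minimal sketch of such a top-down recognizer with backtracking follows; it uses the context-free grammar from the earlier slide (with D for Det), and the function names are illustrative. It terminates only because this grammar has no left-recursive rules.

```python
# Grammar from the "Context Free Grammar" slide: nonterminal -> list of right-hand sides.
GRAMMAR = {
    "S": [["NP", "VP"]],
    "NP": [["D", "N"]],
    "VP": [["V", "NP"], ["V", "NP", "PP"]],
    "PP": [["P", "NP"]],
    "D": [["the"], ["a"]],
    "N": [["dog"], ["cat"], ["garden"]],
    "V": [["chased"], ["saw"]],
    "P": [["into"]],
}

def expand(symbols, words):
    """Yield True once for each way `symbols` derives exactly `words`."""
    if not symbols:
        if not words:               # everything derived and all words consumed
            yield True
        return
    first, rest = symbols[0], symbols[1:]
    if first in GRAMMAR:            # nonterminal: try every rule in turn
        for rhs in GRAMMAR[first]:
            yield from expand(rhs + rest, words)
    elif words and words[0] == first:   # terminal: must match the next word
        yield from expand(rest, words[1:])

def recognizes(sentence):
    return any(expand(["S"], sentence.lower().split()))

print(recognizes("The dog chased the cat"))                # True
print(recognizes("The dog chased a cat into the garden"))  # True
print(recognizes("Dog the chased cat"))                    # False
```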

43
Context-free grammar
S --> NP, VP.
NP --> PN.            (proper noun)
NP --> Art, Adj, N.
NP --> Art, N.
VP --> VI.            (intransitive verb)
VP --> VT, NP.        (transitive verb)
Art --> the.
Adj --> lazy.
Adj --> rapid.
PN --> achilles.
N --> turtle.
VI --> sleeps.
VT --> beats.
44
Parse tree
45
Definite Clause Grammars: Non-terminals may have
arguments
S --> NP(N), VP(N).
NP(N) --> Art(N), N(N).
VP(N) --> VI(N).
Art(singular) --> a.
Art(singular) --> the.
Art(plural) --> the.
N(singular) --> turtle.
N(plural) --> turtles.
VI(singular) --> sleeps.
VI(plural) --> sleep.
Number Agreement
46
DCGs
  • Non-terminals may have arguments
  • Variables (start with capital)
  • E.g. Number, Any
  • Constants (start with lower case)
  • E.g. singular, plural
  • Structured terms (start with lower case, and take
    arguments themselves)
  • E.g. vp(V,NP)
  • Parsing needs to be adapted
  • Using unification

47
Unification in a nutshell (cf. AI course)
  • Substitutions
  • E.g. Num / singular
  • T / vp(V,NP)
  • Applying substitution
  • Simultaneously replace variables by corresponding
    terms
  • S(Num) {Num / singular} = S(singular)

48
Unification
  • Take two non-terminals with arguments and compute
    (most general) substitution that makes them
    identical, e.g.,
  • Art(singular) and Art(Num)
  • Gives Num / singular
  • Art(singular) and Art(plural)
  • Fails
  • Art(Num1) and Art(Num2)
  • Num1 / Num2
  • PN(Num, accusative) and PN(singular, Case)
  • Num/singular, Case/accusative
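
The four cases above can be reproduced with a small unification routine. This is a sketch under stated assumptions: variables are strings starting with a capital letter, constants are lowercase strings, and a non-terminal with arguments is a tuple whose first element is its (lowercased) name, so Art(Num) becomes ("art", "Num"). The occurs check is omitted for brevity.

```python
def is_var(term):
    """Variables are strings that start with a capital letter."""
    return isinstance(term, str) and term[:1].isupper()

def walk(term, subst):
    """Follow variable bindings until an unbound variable or a non-variable."""
    while is_var(term) and term in subst:
        term = subst[term]
    return term

def unify(a, b, subst=None):
    """Return a most general substitution (dict) making a and b equal, or None."""
    subst = {} if subst is None else subst
    a, b = walk(a, subst), walk(b, subst)
    if a == b:
        return subst
    if is_var(a):
        return {**subst, a: b}
    if is_var(b):
        return {**subst, b: a}
    if isinstance(a, tuple) and isinstance(b, tuple) and len(a) == len(b):
        for x, y in zip(a, b):          # unify name and arguments pairwise
            subst = unify(x, y, subst)
            if subst is None:
                return None
        return subst
    return None                         # two different constants or shapes

print(unify(("art", "singular"), ("art", "Num")))      # {'Num': 'singular'}
print(unify(("art", "singular"), ("art", "plural")))   # None
print(unify(("art", "Num1"), ("art", "Num2")))         # {'Num1': 'Num2'}
print(unify(("pn", "Num", "accusative"), ("pn", "singular", "Case")))
```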

49
Parsing with DCGs
  • Now require successful unification at each step
  • S -> NP(N), VP(N)
  • S -> Art(N), N(N), VP(N) {N / singular}
  • S -> a N(singular), VP(singular)
  • S -> a turtle VP(singular)
  • S -> a turtle sleeps
  • S -> a turtle sleep (fails)

50
Case Marking
PN(singular,nominative) --> he | she
PN(singular,accusative) --> him | her
PN(plural,nominative) --> they
PN(plural,accusative) --> them
S --> NP(Number,nominative), VP(Number)
VP(Number) --> V(Number), NP(Any,accusative)
NP(Number,Case) --> PN(Number,Case)
NP(Number,Any) --> Det, N(Number)
He sees her. She sees him. They see her. But
not: Them see he.
51
DCGs
  • Are strictly more expressive than CFGs
  • Can represent, for instance, the language a^n b^n c^n
  • S(N) -> A(N), B(N), C(N)
  • A(0) -> ε
  • B(0) -> ε
  • C(0) -> ε
  • A(s(N)) -> A(N), a
  • B(s(N)) -> B(N), b
  • C(s(N)) -> C(N), c
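
The shared argument N is what forces the three blocks to have the same length, which a context-free grammar cannot enforce. The sketch below mirrors the rules in Python rather than running them as a DCG: the successor term s(N) is replaced by an ordinary integer, and the terminals are assumed to be the letters a, b and c.

```python
def block(symbol, n):
    """Mirror A(0) -> empty and A(s(N)) -> A(N), a: n copies of one terminal."""
    return "" if n == 0 else block(symbol, n - 1) + symbol

def generate(n):
    """Mirror S(N) -> A(N), B(N), C(N): all three blocks share the same N."""
    return block("a", n) + block("b", n) + block("c", n)

def recognizes(string):
    """Accept exactly the strings generated for some N."""
    n, remainder = divmod(len(string), 3)   # the only N the length allows
    return remainder == 0 and string == generate(n)

print(generate(2))              # aabbcc
print(recognizes("aaabbbccc"))  # True
print(recognizes("aabbc"))      # False
```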

52
Probabilistic Models
  • Traditional grammar models are very rigid,
  • essentially a yes / no decision
  • Probabilistic grammars
  • Define a probability model for the data
  • Compute the probability of each alternative
  • Choose the most likely alternative
  • Illustrate on
  • Shannon Game
  • Spelling correction
  • Parsing

53
Some probabilistic models
  • N-grams
  • Predicting the next word
  • Artificial intelligence and machine ...
  • Statistical natural language ...
  • Probabilistic
  • Regular (Markov Models)
  • Hidden Markov Models
  • Conditional Random Fields
  • Context-free grammars
  • (Stochastic) Definite Clause Grammars

54
Illustration
  • Wall Street Journal Corpus
  • 3 000 000 words
  • Correct parse tree for sentences known
  • Constructed by hand
  • Can be used to derive stochastic context free
    grammars
  • SCFGs assign probabilities to parse trees
  • Compute the most probable parse tree
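
In a stochastic context-free grammar the probability of a parse tree is the product of the probabilities of the rules used in it. The sketch below illustrates this with invented rule probabilities (in practice they would be estimated from a treebank). The rule NP -> NP PP is an assumption added here, not taken from the slides, so that the example sentence has two competing parses; lexical probabilities are ignored.

```python
from math import prod

# Hypothetical rule probabilities; for each left-hand side they sum to 1.
# NP -> NP PP is an added rule (assumption) that creates a PP-attachment ambiguity.
RULE_PROB = {
    ("S", ("NP", "VP")): 1.0,
    ("NP", ("D", "N")): 0.8,
    ("NP", ("NP", "PP")): 0.2,
    ("VP", ("V", "NP")): 0.6,
    ("VP", ("V", "NP", "PP")): 0.4,
    ("PP", ("P", "NP")): 1.0,
}

def tree_prob(tree):
    """Probability of a parse tree: product over all rules used in it."""
    label, children = tree[0], tree[1:]
    if all(isinstance(child, str) for child in children):
        return 1.0                      # word level: lexical probabilities ignored
    rule = (label, tuple(child[0] for child in children))
    return RULE_PROB[rule] * prod(tree_prob(child) for child in children)

def np(det, noun):
    return ("NP", ("D", det), ("N", noun))

pp = ("PP", ("P", "into"), np("the", "garden"))
# "the dog chased a cat into the garden", PP attached to the verb phrase ...
vp_attach = ("S", np("the", "dog"), ("VP", ("V", "chased"), np("a", "cat"), pp))
# ... or PP attached to the noun phrase "a cat".
np_attach = ("S", np("the", "dog"), ("VP", ("V", "chased"), ("NP", np("a", "cat"), pp)))

best = max([vp_attach, np_attach], key=tree_prob)
print(tree_prob(vp_attach), tree_prob(np_attach))
print(best is vp_attach)  # True
```

With these made-up numbers the verb-phrase attachment is the more probable parse; different rule probabilities could reverse the choice.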

56
Sequences are omnipresent
  • Therefore the techniques we will see also apply
    to
  • Bioinformatics
  • DNA, proteins, mRNA, ... can all be represented as
    strings
  • Robotics
  • Sequences of actions, states, ...

57
Rest of the Course
  • Limitations of traditional grammar models motivate
    probabilistic extensions
  • Regular grammars and Finite State Automata
  • All use principles of Part I on Graphical Models
  • Markov Models using n-grams
  • (Hidden) Markov Models
  • Conditional Random Fields
  • As an example of using undirected graphical
    models
  • Probabilistic Context Free Grammars
  • Probabilistic Definite Clause Grammars