COMP 791A: Statistical Language Processing - PowerPoint PPT Presentation

Description:

Title: COMP 790: Statistical Language Processing
Last modified by: Leila Kosseim
Created Date: 12/7/1999 2:57:41 AM
Document presentation format: PowerPoint PPT presentation

Slides: 54
Provided by: umiacsUmd3

1
COMP 791A Statistical Language Processing
Linguistic Essentials Chap. 3
2
Levels of study of NLP
  • Lexical
  • Possible words in a given language
  • rose vs. ?gellapou
  • Phonetics & phonology
  • How words are related to sounds
  • rose --> /roz/
  • Parts-of-speech & Morphology
  • How words are constructed from basic meaning
    units (morphemes)
  • friend + ly --> friendly, friend + s -->
    friends
  • rose + ly --> ?rosely, woman + s --> ?womans
  • Phrase Structure and Syntax
  • How words can be ordered to form correct
    sentences
  • ?Red the is rose / adj det verb noun
  • The rose is red / det noun verb adj

3
Levels of study of NLP (cont)
  • Semantics
  • What words mean (lexical semantics, word sense
    disambiguation)
  • chair --> furniture / person
  • How word meanings are combined into the meaning
    of sentences.
  • The chair is broken.
  • The chair is sick.
  • Pragmatics
  • How language conventions affect the literal
    meaning (interpretation)
  • Do you have the time?
  • Do you have the children?
  • Discourse
  • How surrounding sentences affect interpretation
  • The chair's leg is broken. He went skiing last
    week-end.
  • The chair's leg is broken. Someone placed a 500kg
    package on it.
  • World-Knowledge
  • How general knowledge about the world affects
    interpretation
  • The prof sent the student to see the chair
    because he was fed up with his behavior.
  • The prof sent the student to see the chair
    because he wanted to see him.
  • The prof sent the student to see the chair
    because he was talking in class.

4
Levels of study of NLP
  • Lexical
  • Phonetics & phonology
  • Parts-of-speech & Morphology
  • Phrase Structure and Syntax
  • Semantics
  • Pragmatics
  • Discourse
  • World-Knowledge

5
Parts of Speech and Morphology
  • Parts of Speech (POS)
  • also called word, lexical, syntactic or grammatical
    categories, tags or classes
  • Ex: nouns, verbs, adjectives, prepositions, ...
  • Morphology
  • study and description of word formation in a
    language
  • modification of a root form (stem) by affixes
  • affixes: prefixes, suffixes, infixes, circumfixes
  • plus exceptions: thief --> thieves, but chief -->
    chiefs
  • Word categories are systematically related by
    morphological processes
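As a rough illustration of "root + affix, plus exceptions", the sketch below inflects English nouns for number. The suffix rules and the `EXCEPTIONS` table are simplified assumptions made for this example. They are not a complete account of English plural morphology. The point is that "thief" has to be listed, while "chief" falls through to the regular rule.

```python
# Hypothetical exception lexicon: irregular plurals the suffix rules would get wrong.
EXCEPTIONS = {"thief": "thieves", "woman": "women", "child": "children"}


def pluralize(noun: str) -> str:
    """Inflect a singular English noun for number using a few suffix rules."""
    if noun in EXCEPTIONS:
        return EXCEPTIONS[noun]          # a listed irregular form wins
    if noun.endswith(("s", "x", "z", "ch", "sh")):
        return noun + "es"               # box -> boxes, church -> churches
    if noun.endswith("y") and noun[-2:-1] not in "aeiou":
        return noun[:-1] + "ies"         # city -> cities (but day -> days)
    return noun + "s"                    # friend -> friends, chief -> chiefs


for word in ["friend", "chief", "thief", "woman", "box", "city", "day"]:
    print(word, "->", pluralize(word))
```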

6
Morphological processes
  • Inflection
  • indicates case, gender, number, tense, person,
    mood, or voice
  • does not change the word's grammatical class or
    meaning significantly
  • car --> cars
  • talk --> talking
  • Derivation
  • creation of a new word
  • may have a different meaning and/or grammatical
    class
  • infect --> disinfect
  • grateful --> ungrateful
  • wide (adjective) --> widely (adverb)
  • teach (verb) --> teacher (noun)
  • Compounding
  • merging 2 or more words into a single one
  • sometimes written as separate words
  • but pronounced as a single word and denotes a
    single concept
  • so merits an entry in the lexicon

7
Classes of POS
  • Open (lexical) class
  • things, actions, events, ...
  • ex. cat, John, eat
  • new words can be added easily
  • nouns, verbs, adjectives, adverbs
  • some languages do not have all these categories
  • Closed (functional) class
  • generally function/grammatical words
  • ex. the, in, and, for
  • relatively fixed membership
  • prepositions, determiners, pronouns,
    conjunctions, particles, numerals, auxiliary verbs

8
Main POS
  • Open class
  • Noun: refers to entities like people, places,
    things or ideas.
  • Adjective: describes the properties of nouns or
    pronouns.
  • Verb: describes actions, activities and states.
  • Adverb: describes a verb, an adjective or
    another adverb.
  • Closed class
  • Pronoun: a word that takes the place of a noun or
    another expression.
  • Determiner: describes the particular reference
    of a noun.
  • Preposition: expresses spatial or time
    relationships.

9
Nouns (open)
  • Entities like people, places, things or ideas
  • ex: dog, tree, Mary, idea
  • Typical inflections
  • number (singular, plural),
  • gender (masculine, feminine, neuter),
  • case (nominative, genitive, accusative, dative)
  • Sub-categories
  • proper nouns (John)
  • adverbial nouns (today, home)

10
Verbs (open)
  • Actions, activities, and states
  • The men work in the field.
  • The men are working in the field.
  • The men are in the field.
  • Typical inflections
  • tense: present, past, future
  • other inflections: number, person
  • aspect: progressive, perfective
  • voice: active, passive
  • Sub-categories
  • auxiliaries (considered closed-class words)
  • ex: be, do, will
  • modal verbs (considered closed-class words)
  • ex: can, should, could
  • main verbs

11
Main verbs
  • Transitive
  • requires a direct object (found with questions
    what? or whom?)
  • ?The child broke.
  • The child broke a glass.
  • Intransitive
  • does not require a direct object.
  • The train arrived.
  • Some verbs can be both transitive and
    intransitive
  • The ship sailed the seas. (transitive)
  • The ship sails at noon. (intransitive)
  • I met my friend at the airport. (transitive)
  • The delegates met yesterday. (intransitive)

12
Adjectives (open)
  • Properties and attributes
  • long road
  • rainy day
  • attractive hat
  • Typical inflections
  • number, gender, case
  • Sub-categories
  • comparative (richer)
  • superlative (richest)

13
Adverbs (open)
  • words that modify a verb, an adjective, another
    adverb or other constituents to expand their meaning
  • You must set up the copy now.
  • Mary walks gracefully.
  • Sometimes I take a walk in the woods.
  • Jack usually leaves the house at seven.
  • I have always admired her.
  • sub-categories
  • locative (here)
  • degree (very)
  • manner (slowly)
  • temporal (late, yesterday (noun?))

14
Closed class categories
  • Determiners
  • words that make specific the denotation of a
    noun phrase
  • article: the hat, a hat
  • demonstrative: this hat, that hat
  • possessive: John's hat, my hat, her book
  • wh-determiner: which hat, whose hat
  • quantifier: some hat, every hat
  • Prepositions
  • words that show the relationship between certain
    words in a sentence
  • The accident occurred under the bridge.
  • by, to, at, ...
  • Conjunctions
  • words used to join other words or groups of words
  • or, when, but, and, ...
  • Auxiliary & modal verbs
  • be, do, can, may, should, ...

15
Closed class categories (cont)
  • Particles
  • words that are added to main verbs to construct
    different verbs
  • checkout / check out, makeup / make up
  • Ex:
  • She made up a story.
  • She made it up.
  • particles vs. prepositions
  • she <ran up> a bill / she <ran> <up> a hill
  • Numerals
  • one, third

16
Closed class categories (cont)
  • Pronouns
  • a word that replaces a noun or even another
    sentence
  • ex: she, ourselves, mine, that
  • subcategories
  • Personal
  • You are very nice.
  • Possessive
  • Mine is nicer.
  • Interrogative: used to ask questions (who?,
    what?, which?)
  • Who is that girl?
  • Demonstrative: points out definite persons,
    places or things (this, these, that)
  • This is my book.
  • He said he was busy, but that was a lie.
  • Relative: introduces a clause and attaches it to
    the noun it modifies (who, which, that)
  • She is the girl who won the race.
  • ...

17
Other parts of speech
  • Interjections
  • Ouch!
  • Negatives
  • no, not
  • Politeness markers
  • Hello, bye
  • Existential
  • There are 3 students sleeping.

18
Summary
  • Open class
  • nouns: cat, spirit
  • verbs: eat, cook
  • adjectives: slow, large
  • adverbs: slowly
  • Closed class
  • prepositions: on, under, at
  • determiners: a, the, some
  • pronouns: she, who, I, other
  • conjunctions: and, but, or
  • auxiliary verbs: can, may, should
  • particles: up, on, off
  • numerals: one, two, first

19
The substitution test
  • Basic test to determine if 2 words belong to the
    same POS class
  • The sad / intelligent / green / fat one is in the corner.
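The substitution test depends on a speaker's judgment. A crude automatic approximation compares the contexts in which words are observed. The sketch below treats two words as candidates for the same class if they were seen in at least one identical (left word, right word) frame. The toy corpus and the function names `frames` and `substitutable` are assumptions made for illustration. A single shared frame is weak evidence on real data.

```python
from collections import defaultdict


def frames(sentences):
    """Map each word to the set of (left, right) neighbour pairs it occurs between."""
    ctx = defaultdict(set)
    for sent in sentences:
        toks = ["<s>"] + sent.lower().split() + ["</s>"]
        for i in range(1, len(toks) - 1):
            ctx[toks[i]].add((toks[i - 1], toks[i + 1]))
    return ctx


def substitutable(w1, w2, ctx):
    """True if both words were observed in at least one identical frame."""
    return bool(ctx.get(w1, set()) & ctx.get(w2, set()))


corpus = [
    "The sad one is in the corner",
    "The green one is in the corner",
    "The fat one is in the corner",
    "The cat is in the corner",
]
ctx = frames(corpus)
print(substitutable("sad", "green", ctx))   # True: both seen in (the, one)
print(substitutable("sad", "cat", ctx))     # False: cat only seen in (the, is)
```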

20
POS Tagging
  • Automatically assign POS tags to words in a text.
  • Children/NOUN eat/VERB sweet/ADJECTIVE candy/NOUN
  • The/ARTICLE children/NOUN ate/VERB the/ARTICLE
    cake/NOUN
  • The/ARTICLE news/NOUN has/AUXILIARY been/MAIN
    VERB quite/ADVERB sad/ADJECTIVE in/PREPOSITION
    fact/NOUN ./PERIOD
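A common starting point for automatic tagging is the most-frequent-tag baseline. Each word receives the tag it carried most often in tagged training text. The slides do not present this method; it is added here only as an illustration. The training data is just the two example sentences above. The default tag `NOUN` for unseen words is an assumption.

```python
from collections import Counter, defaultdict


def train(tagged_sentences):
    """Learn each word's most frequent tag from 'word/TAG' strings."""
    counts = defaultdict(Counter)
    for sent in tagged_sentences:
        for token in sent.split():
            word, tag = token.rsplit("/", 1)   # split on the last '/' only
            counts[word.lower()][tag] += 1
    return {w: c.most_common(1)[0][0] for w, c in counts.items()}


def tag(sentence, model, default="NOUN"):
    """Tag each word with its most frequent training tag; unseen words get `default`."""
    return [(w, model.get(w.lower(), default)) for w in sentence.split()]


training = [
    "Children/NOUN eat/VERB sweet/ADJECTIVE candy/NOUN",
    "The/ARTICLE children/NOUN ate/VERB the/ARTICLE cake/NOUN",
]
model = train(training)
print(tag("The children eat the cake", model))
```

This baseline ignores context, so it always gives "candy" the same tag. It therefore cannot handle the ambiguities discussed a few slides later.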

21
Why do POS Tagging?
  • 1st step towards NLU
  • easier than full NLU (results > 95% accuracy)
  • Useful for
  • speech recognition / synthesis (better accuracy)
  • how to recognize/pronounce a word
  • CONtent/noun vs. conTENT/adj
  • stemming in IR
  • which morphological affixes the word can take
  • adjective - ly --> noun (friendly - ly --> friend)
  • Indexing in IR
  • pick out nouns which may be more important than
    other words in indexing documents

22
Tag Sets
  • A tag indicates the various conventional parts of
    speech.
  • Different Tag Sets have been used
  • Ex. Brown Tag Set, Penn Treebank Tag Set
  • Tag examples
  • NP Proper noun
  • NN Singular noun
  • AT Article
  • DET Determiner
  • More on this later

23
Penn Treebank tag Set
Tag | Description | Examples
CC | conjunction, coordinating | and, but, either, et, for, less, minus, neither, nor, or, plus, so, therefore
CD | numeral, cardinal | mid-1890, nine-thirty, forty-two, one-tenth, ten million, 0.5, one
DT | determiner | all, an, another, any, both, del, each, either, every, half, la, many, much
IN | preposition or subordinating conjunction | astride, among, upon, whether, out, inside, pro, despite, on, by, throughout
JJ | adjective or numeral, ordinal | third, ill-mannered, pre-war, regrettable, oiled, calamitous, first
JJR | adjective, comparative | bleaker, braver, breezier, briefer, brighter, brisker, broader, bumper
NN | noun, common, singular or mass | common-carrier, cabbage, knuckle-duster, Casino, afghan, shed
NNP | noun, proper, singular | Motown, Venneboerger, Czestochwa, Ranzer, Conchita, Trumplane
NNS | noun, common, plural | undergraduates, scotches, bric-a-brac, products, bodyguards, facets
PRP | pronoun, personal | hers, herself, him, himself, it, itself, me, myself, one, oneself, ours
RB | adverb | occasionally, unabatingly, maddeningly, adventurously, professedly
RP | particle | aboard, about, across, along, apart, around, aside, at, away, back
TO | "to" as preposition or infinitive marker | to
VB | verb, base form | ask, assemble, assess, assign, assume, atone, attention, avoid, bake
VBD | verb, past tense | dipped, pleaded, swiped, wore, soaked, tidied, convened, halted
VBG | verb, present participle or gerund | telegraphing, stirring, focusing, angering, judging, stalling, lactating
VBN | verb, past participle | imitated, dilapidated, aerosolized, chaired, languished, panelized, used
VBP | verb, present tense, not 3rd person singular | predominate, wrap, resort, sue, twist, spill, cure, lengthen, brush
VBZ | verb, present tense, 3rd person singular | bases, reconstructs, marks, mixes, displeases, seals, carps, weaves
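Fine-grained Penn Treebank tags are often collapsed into the coarse classes used on the earlier slides. The sketch below maps a tag to a coarse class using its longest known prefix. The `COARSE` table is a hypothetical grouping that covers only the tags in the table above. It is a simplification; for example, IN also covers subordinating conjunctions.

```python
# Hypothetical grouping of the Penn Treebank tags listed above into coarse classes.
COARSE = {
    "NN": "noun", "VB": "verb", "JJ": "adjective", "RB": "adverb",
    "PRP": "pronoun", "DT": "determiner", "IN": "preposition",
    "CC": "conjunction", "CD": "numeral", "RP": "particle", "TO": "to",
}


def coarse_tag(penn_tag: str) -> str:
    """Map a fine-grained tag (e.g. VBZ, NNS) to a coarse class via its longest known prefix."""
    for prefix in sorted(COARSE, key=len, reverse=True):
        if penn_tag.startswith(prefix):
            return COARSE[prefix]
    return "other"


print([coarse_tag(t) for t in ["NNS", "VBZ", "JJR", "PRP", "TO"]])
```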

24
Ambiguities in POS tagging
  • Children eat sweet candy / noun.
  • Too much boiling will candy / verb the
    molasses.
  • Fruit flies / ? like / ? a banana.
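If context is ignored, the tagger's search space is the Cartesian product of each word's possible tags. The sketch below enumerates that space for "Fruit flies like a banana". The mini-lexicon is a hypothetical assumption. Only two of the four sequences match sensible readings: flies/NOUN like/VERB, and flies/VERB like/PREPOSITION. Choosing between them requires context.

```python
from itertools import product

# Hypothetical mini-lexicon: every tag each word may take, out of context.
LEXICON = {
    "fruit": ["NOUN"],
    "flies": ["NOUN", "VERB"],
    "like": ["VERB", "PREPOSITION"],
    "a": ["ARTICLE"],
    "banana": ["NOUN"],
}


def candidate_taggings(sentence):
    """Every tag sequence the lexicon allows (the Cartesian product of word tag sets)."""
    words = sentence.lower().split()
    return [list(zip(words, tags)) for tags in product(*(LEXICON[w] for w in words))]


for seq in candidate_taggings("Fruit flies like a banana"):
    print(" ".join(f"{w}/{t}" for w, t in seq))
```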

25
Levels of study of NLP
  • Lexical
  • Phonetics & phonology
  • Parts-of-speech & Morphology
  • Phrase Structure and Syntax
  • Semantics
  • Pragmatics
  • Discourse
  • World-Knowledge

26
Syntax or Phrase Structure
  • Syntax
  • study of the regularities and constraints of word
    order and phrase structure
  • the book is red vs. ?red book is the
  • Grammar
  • expresses the relations among the constituents of
    a sentence

27
Constituents
  • also called syntactic structures
  • Main Constituents
  • S: sentence, ex: The boy is happy.
  • NP: noun phrase, ex: the little boy, Sam Smith,
    I, the boy from Montreal
  • VP: verb phrase, ex: eat an apple, sing,
    leave Boston in the morning
  • PP: prepositional phrase, ex: in the morning,
    about my ticket
  • AdjP: adjective phrase, ex: really funny, rather
    clear, very large
  • AdvP: adverb phrase, ex: slowly, really slowly

28
Sentence Moods/Types
  • Declarative
  • Mary eats.
  • S --> NP VP
  • Imperative
  • Eat!
  • S --> VP
  • Yes-No Question
  • Did Mary eat?
  • S --> Aux NP VP
  • Wh-Question
  • When did Mary eat?
  • S --> WH-pro Aux NP VP

29
Noun Phrases
  • NP --> pre-modifiers head post-modifiers
  • head: central noun in the NP
  • the little boy, the boy from Montreal
  • pre-modifiers
  • determiners, cardinals, ordinals, quantifiers
  • the boy, two boys, first boy, several boys
  • AdjP
  • funny boy, really funny boy
  • post-modifiers
  • PP
  • flights from Montreal
  • non-finite clause
  • gerundive (-ing)
  • flights arriving from Montreal
  • -ed
  • dinner served on board, jewels stolen from the
    queen
  • infinitive form
  • flight to arrive from Montreal

30
Verb Phrases
  • VP --> head-verb complements adjuncts
  • Some VPs
  • Verb: eat.
  • Verb NP: leave Montreal.
  • Verb NP PP: leave Montreal in the morning.
  • Verb PP: leave in the morning.
  • Verb S: think I would like the fish.
  • Verb VP: want to leave.
  • want to leave Montreal.
  • want to leave Montreal in the morning.
  • want to want to leave Montreal in the morning.

31
Subcategorisation frames
  • Some verbs can take complements that others
    cannot
  • I want to fly. / ?I find to fly.
  • Verbs are subcategorized according to the
    complements they can take --> subcategorisation
    frames
  • traditionally transitive vs intransitive
  • nowadays up to 100 subcategories / frames
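A subcategorisation lexicon can be modelled as a mapping from each verb to the set of complement frames it accepts. In the sketch below, a frame is a space-separated string of complement categories, and "" means "no complement". The `SUBCAT` entries and the `licenses` helper are hypothetical illustrations. They are not taken from any real subcategorisation resource.

```python
# Hypothetical subcategorisation lexicon: complement frames each verb accepts.
# A frame is a space-separated string of complement categories; "" means no complement.
SUBCAT = {
    "want": {"NP", "VPinf"},         # want a ticket / want to fly
    "find": {"NP", "NP AdjP"},       # find a flight / find it cheap
    "arrive": {""},                  # the train arrived
    "meet": {"", "NP"},              # the delegates met / I met my friend
}


def licenses(verb, frame):
    """True if the lexicon lists `frame` among the complement frames of `verb`."""
    return frame in SUBCAT.get(verb, set())


print(licenses("want", "VPinf"))   # True:  I want to fly.
print(licenses("find", "VPinf"))   # False: ?I find to fly.
```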

32
Prepositional phrases
  • PP --> Preposition NP
  • from Japan
  • inside my blue bag

33
Adjective Phrases
  • AdjP --> Adj Modifiers
  • tall
  • very tall
  • taller than Mary

34
Adverb Phrases
  • AdvP --> Adv Modifiers
  • affirmatively
  • very graciously
  • rather secretively

35
Context Free Grammars
  • set of non-terminal symbols
  • constituents & parts-of-speech
  • S, NP, VP, PP, Det, N, V, ...
  • set of terminal symbols
  • lexicon of words & punctuation
  • cat, mouse, nurses, eat, ...
  • a non-terminal designated as the starting symbol
  • sentence S
  • a set of re-write rules
  • having a single non-terminal on the LHS and one
    or more terminals or non-terminals on the RHS
  • S --> NP VP
  • NP --> Pro | PN | Det Nominal

36
A simple context-free grammar
The Grammar
  • S --> NP VP
  • NP --> AT NNS
  • NP --> AT NN
  • NP --> NP PP
  • VP --> VP PP
  • VP --> VBD
  • VP --> VBD NP
  • PP --> IN NP
The Lexicon
  • NNS --> children
  • NNS --> students
  • NNS --> mountains
  • VBD --> slept
  • VBD --> ate
  • VBD --> saw
  • AT --> the
  • IN --> in
  • IN --> of
  • NN --> cake
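The slides do not specify a parsing algorithm. One standard choice for a grammar like this is CKY (Cocke-Kasami-Younger) chart parsing, which applies because every rule here is binary except VP --> VBD. The sketch below counts the distinct S trees the grammar assigns to a sentence. It assumes that each word has a single lexicon entry, which holds for this lexicon. The names `BINARY`, `UNARY`, `LEXICON` and `count_parses` are illustrative.

```python
from collections import Counter

# The grammar and lexicon from the slide above. All rules are binary except VP --> VBD.
BINARY = [("S", "NP", "VP"), ("NP", "AT", "NNS"), ("NP", "AT", "NN"),
          ("NP", "NP", "PP"), ("VP", "VP", "PP"), ("VP", "VBD", "NP"),
          ("PP", "IN", "NP")]
UNARY = [("VP", "VBD")]  # no unary chains, so one pass per cell is enough
LEXICON = {"children": "NNS", "students": "NNS", "mountains": "NNS",
           "slept": "VBD", "ate": "VBD", "saw": "VBD",
           "the": "AT", "in": "IN", "of": "IN", "cake": "NN"}


def count_parses(sentence):
    """CKY chart parsing: number of distinct parse trees rooted in S."""
    words = sentence.lower().split()
    n = len(words)
    if n == 0 or any(w not in LEXICON for w in words):
        return 0
    # chart[i][j][X] = number of ways category X can span words[i:j]
    chart = [[Counter() for _ in range(n + 1)] for _ in range(n + 1)]
    for i, w in enumerate(words):
        cell = chart[i][i + 1]
        cell[LEXICON[w]] += 1
        for lhs, rhs in UNARY:
            cell[lhs] += cell[rhs]
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length
            cell = chart[i][j]
            for k in range(i + 1, j):
                for lhs, left, right in BINARY:
                    cell[lhs] += chart[i][k][left] * chart[k][j][right]
            for lhs, rhs in UNARY:
                cell[lhs] += cell[rhs]
    return chart[0][n]["S"]


for s in ["the children slept",
          "the children ate the cake in the mountains",
          "the children ate cake"]:
    print(s, "->", count_parses(s))
```

The second sentence gets two trees because the PP can attach either to "the cake" (NP --> NP PP) or to the verb phrase (VP --> VP PP). This is the attachment ambiguity discussed later. The third sentence gets none because this grammar requires a determiner before NN.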
37
A parse tree
  • a tree representation of the application of the
    grammar to a specific sentence.

38
Stochastic Grammars
  • Grammars obtained by adding probabilities to
    algebraic (i.e., non-probabilistic) grammars.
  • 1 S --> NP VP
  • 0.4 NP --> AT NNS
  • 0.4 NP --> AT NN
  • 0.2 NP --> NP PP
  • 0.1 VP --> VP PP
  • 0.1 VP --> VBD
  • 0.8 VP --> VBD NP
  • 1 PP --> IN NP
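In a stochastic grammar, the probabilities of all expansions of one non-terminal sum to 1. A tree's probability is the product of the probabilities of the rules it uses. The sketch below checks the first property for the rules above and computes the second. The slide gives no lexical probabilities, so the sketch assumes P(word | preterminal) = 1. The nested-tuple tree format and the function names are illustrative choices.

```python
from collections import defaultdict
from math import isclose, prod

# Rule probabilities from the slide above, keyed by (LHS, RHS).
RULES = {
    ("S", ("NP", "VP")): 1.0,
    ("NP", ("AT", "NNS")): 0.4,
    ("NP", ("AT", "NN")): 0.4,
    ("NP", ("NP", "PP")): 0.2,
    ("VP", ("VP", "PP")): 0.1,
    ("VP", ("VBD",)): 0.1,
    ("VP", ("VBD", "NP")): 0.8,
    ("PP", ("IN", "NP")): 1.0,
}


def is_proper(rules):
    """Check that, for every LHS, the probabilities of its expansions sum to 1."""
    totals = defaultdict(float)
    for (lhs, _rhs), p in rules.items():
        totals[lhs] += p
    return all(isclose(total, 1.0) for total in totals.values())


def tree_prob(tree):
    """Probability of a tree = product of the probabilities of the rules it uses.

    A tree is a tuple (label, child, ...); a preterminal such as ("AT", "the")
    has one string child and is assumed to contribute probability 1.
    """
    label, *children = tree
    if len(children) == 1 and isinstance(children[0], str):
        return 1.0
    rhs = tuple(child[0] for child in children)
    return RULES[(label, rhs)] * prod(tree_prob(child) for child in children)


slept = ("S", ("NP", ("AT", "the"), ("NNS", "children")), ("VP", ("VBD", "slept")))
ate = ("S", ("NP", ("AT", "the"), ("NNS", "children")),
       ("VP", ("VBD", "ate"), ("NP", ("AT", "the"), ("NN", "cake"))))
print(is_proper(RULES), tree_prob(slept), tree_prob(ate))
```

For "the children slept" the product is 1 × 0.4 × 0.1 = 0.04. For "the children ate the cake" it is 1 × 0.4 × 0.8 × 0.4 = 0.128. Floating-point printing may show small rounding noise.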

39
Syntactic Dependencies
  • Local dependency
  • dependency between two words expressed within the
    same syntactic rule.
  • The 3/plural books/plural.
  • n-gram models capture this very well.
  • Non-local dependency
  • two words can be syntactically dependent even
    though they occur far apart in a sentence
  • Ex: subject-verb agreement
  • The children who found a wallet on the street
    yesterday while walking their dog were given a
    reward.
  • challenge for certain statistical NLP approaches
    (ex. n-grams) that model only local dependencies.
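The contrast between local and non-local dependencies can be made concrete with a bigram model. It estimates P(word | previous word) by maximum likelihood, so it only ever sees the immediately preceding word. In the sketch below, the two-sentence training corpus and the helper `ngram_history` are illustrative assumptions. The long sentence is the agreement example above, with punctuation removed.

```python
from collections import Counter


def bigram_model(sentences):
    """Return an MLE estimate of P(word | previous word) from whitespace-tokenised text."""
    history, pairs = Counter(), Counter()
    for s in sentences:
        toks = ["<s>"] + s.lower().split()
        history.update(toks[:-1])                  # words that have a successor
        pairs.update(zip(toks[:-1], toks[1:]))

    def prob(prev, word):
        return pairs[(prev, word)] / history[prev] if history[prev] else 0.0

    return prob


def ngram_history(sentence, target, n=2):
    """The n-1 words an n-gram model conditions `target` on (first occurrence)."""
    toks = sentence.lower().split()
    i = toks.index(target)
    return toks[max(0, i - (n - 1)):i]


p = bigram_model(["the books are here", "the book is here"])
print(p("books", "are"), p("books", "is"))    # local agreement is captured: 1.0 0.0

long_s = ("the children who found a wallet on the street yesterday "
          "while walking their dog were given a reward")
print(ngram_history(long_s, "were"))          # ['dog']: the subject is out of reach
print(ngram_history(long_s, "were", n=4))     # ['walking', 'their', 'dog']
```

Even a 4-gram model conditions "were" on "walking their dog" and never sees "children".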

40
Difficulties in parsing
  • Attachment ambiguity
  • The children ate the cake with a spoon.
  • The children ate (the cake with a spoon).??
  • The children (ate with a spoon).??

41
Other difficulties
  • NP bracketing
  • plastic cat food can cover
  • --> ? (plastic cat) (food can) cover
  • --> ? plastic (cat food can) cover
  • --> ? (plastic cat food) (can cover)
  • Conjunctions and appositives
  • Maddy, my dog, and Samy
  • --> ?(Maddy, my dog), and (Samy)
  • --> ?(Maddy), (my dog), and (Samy)
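The NP bracketing problem grows quickly with length. The number of full binary bracketings of n words is the Catalan number C(n-1), giving 1, 1, 2, 5, 14, ... for 1 to 5 words. A five-noun compound such as "plastic cat food can cover" therefore has 14 candidate structures. The sketch below enumerates them. `bracketings` is an illustrative helper, and it produces structures only; it makes no claim about which structure is plausible.

```python
from functools import lru_cache


def bracketings(words):
    """All full binary bracketings of a word sequence, as parenthesised strings."""
    words = tuple(words)

    @lru_cache(maxsize=None)
    def build(i, j):
        if j - i == 1:
            return (words[i],)
        results = []
        for k in range(i + 1, j):               # split into words[i:k] and words[k:j]
            for left in build(i, k):
                for right in build(k, j):
                    results.append(f"({left} {right})")
        return tuple(results)

    return list(build(0, len(words)))


analyses = bracketings("plastic cat food can cover".split())
print(len(analyses))                                        # 14
print("((plastic cat) ((food can) cover))" in analyses)     # True
```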

42
Another Ambiguity: Garden-Path Sentences
  • well-studied class of syntactic ambiguity
  • sentence is re-analysed when the last word is
    encountered
  • humans have difficulty analysing such sentences
  • Example
  • The horse raced past the barn fell.
  • (the horse that was raced past the barn) fell.

43
Garden Path Wrong Parse
  • [S [NP The horse] [VP raced [PP past the barn]]] fell
  • dt: determiner
  • n: noun
  • v: verb
  • p: preposition
  • S: sentence
  • NP: noun phrase
  • VP: verb phrase
  • PP: prepositional phrase

44
Garden Path Right Parse
  • [S [NP The horse [PAP raced [PP past the barn]]]
    [VP fell]]
  • dt: determiner
  • n: noun
  • v: verb
  • p: preposition
  • S: sentence
  • NP: noun phrase
  • VP: verb phrase
  • PP: prepositional phrase
  • PAP: passive phrase

45
Levels of study of NLP
  • Lexical
  • Phonetics & phonology
  • Parts-of-speech & Morphology
  • Phrase Structure and Syntax
  • Semantics
  • Pragmatics
  • Discourse
  • World-Knowledge

46
Semantics
  • the study of the meaning of words, constructions,
    and utterances
  • can be divided into two parts
  • lexical semantics
  • meaning of words
  • compositional semantics
  • Meaning of sentences and discourse
  • the meaning of the whole often differs from the
    meaning of the parts.

47
Lexical Semantics
  • Meaning of individual words
  • I went to the bank of Montreal and deposited $50.
  • I went to the bank of the river and dangled my
    feet.
  • Word Sense Disambiguation
  • Determining which sense of a word is used in a
    specific sentence
  • Semantic relations between words
  • hypernymy, hyponymy, synonymy, antonymy,
    meronymy, holonymy, polysemy, homonymy and
    homophony.
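The slides do not prescribe a word sense disambiguation method. A classic illustration is the simplified Lesk algorithm, which picks the sense whose dictionary gloss shares the most content words with the sentence. The sketch below applies it to the two "bank" sentences above. The two glosses, the stopword list, and the crude `stem` function are invented assumptions, so the sketch should not be read as a real sense inventory.

```python
# Hypothetical two-sense inventory for "bank"; these glosses are invented for the sketch.
SENSES = {
    "bank/finance": "an institution where people deposit money and take out loans",
    "bank/river": "the sloping land beside a river or stream",
}
STOPWORDS = {"a", "an", "and", "the", "of", "to", "or", "i", "my", "where", "out"}


def stem(word):
    """Very crude suffix stripping (an assumption of this sketch): deposited -> deposit."""
    for suffix in ("ed", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word


def content_words(text):
    """Lower-cased, punctuation-stripped, stemmed words minus stopwords."""
    return {stem(w.strip(".,")) for w in text.lower().split()} - STOPWORDS


def simplified_lesk(sentence, senses=SENSES):
    """Pick the sense whose gloss shares the most content words with the sentence."""
    context = content_words(sentence)
    return max(senses, key=lambda s: len(context & content_words(senses[s])))


print(simplified_lesk("I went to the bank of Montreal and deposited $50."))
print(simplified_lesk("I went to the bank of the river and dangled my feet."))
```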

48
Meaning of sentences
  • The cat eats the mouse. = The mouse is eaten by
    the cat.
  • Goal
  • build a representation of the meaning of the
    sentence
  • attach semantic roles to constituents
  • Some characteristics of a sentence that influence
    semantic interpretation
  • Type: declarative, interrogative, imperative,
    exclamatory
  • Polarity: positive, negative
  • Tense: past, present, future
  • Voice: active, passive
  • Some semantic roles (different from syntactic
    roles)
  • Agent: the doer of a volitional act
  • Patient: the thing that is affected by an act
  • Recipient: the receiver of an object
  • Instrument: the instrument used to perform an act.
  • Time: the time the act is performed.
  • Location: the location of an act or object.
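The active/passive pair at the top of this slide has different syntactic subjects but identical semantic roles. The sketch below assumes the clause has already been analysed into subject, verb lemma, object and voice. It maps such a clause onto a role frame, so the two sentences yield the same frame. `role_frame` is a hypothetical helper. It assumes an agentive transitive verb, and for passives it assumes that `obj` is the noun phrase of the by-phrase.

```python
def role_frame(subject, verb, obj, voice="active"):
    """Map a pre-analysed transitive clause to a predicate plus Agent/Patient roles.

    Hypothetical helper: assumes an agentive verb, and for passives that `obj`
    is the NP of the by-phrase.
    """
    if voice == "active":
        return {"predicate": verb, "Agent": subject, "Patient": obj}
    if voice == "passive":
        return {"predicate": verb, "Agent": obj, "Patient": subject}
    raise ValueError(f"unknown voice: {voice!r}")


active = role_frame("the cat", "eat", "the mouse")
passive = role_frame("the mouse", "eat", "the cat", voice="passive")
print(active == passive)   # True: different syntactic subjects, same semantic roles
```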

49
Semantic Roles
  • Ex:
  • [John]AGENT hit [Peter]PATIENT with [a ball]INSTRUMENT.
  • Ex:
  • I ate spaghetti with
    [meatballs]INGREDIENT_OF_SPAGHETTI
  • I ate spaghetti with [salad]SIDE_DISH_OF_SPAGHETTI
  • I ate spaghetti with [a fork]INSTRUMENT
  • I ate spaghetti with [a friend]ACCOMPANIER_OF_EATING
  • Important for machine translation
  • [I]AGENT/PERSON_LACKING_SOMEONE miss
    [you]PATIENT/PERSON_MISSED
  • ?[Je]PATIENT/PERSON_MISSED
    [te]AGENT/PERSON_LACKING_SOMEONE manque.
  • [Tu]PATIENT/PERSON_MISSED
    [me]AGENT/PERSON_LACKING_SOMEONE manques.

50
Levels of study of NLP
  • Lexical
  • Phonetics & phonology
  • Parts-of-speech & Morphology
  • Phrase Structure and Syntax
  • Semantics
  • Pragmatics
  • Discourse
  • World-Knowledge

51
Pragmatics
  • goes beyond the study of the meaning of a
    sentence
  • tries to explain what the speaker is really
    expressing
  • understanding how people use language socially
    (ex. figures of speech, speech acts, discourse
    analysis, ...)
  • Ex: Could you spare some change?

52
Discourse Analysis
  • In logic: A ∧ B ∧ C ≡ C ∧ B ∧ A
  • Not in NL
  • John visited Paris. He bought Mary some
    expensive cologne. Then he flew home. He went
    to Kmart. He bought some underwear.
  • John visited Paris. Then he flew home. He went to
    Kmart. He bought Mary some expensive cologne. He
    bought some underwear.
  • NL Text must be coherent
  • ? Bill went to see his mother. The trunk is what
    makes the bonsai, it gives it both its grace and
    power.

53
Using world knowledge
  • Using our general knowledge of the world to
    interpret a sentence/discourse
  • Ex: A man was killed yesterday because a jealous
    husband returned home earlier than usual.
  • Ex: The Silence of the Lambs