Title: LING 388: Language and Computers
1. LING 388: Language and Computers
- Sandiway Fong
- Lecture 25 11/22
2. Administrivia
- No Lecture Thursday
- Thanksgiving
- Homework 5
- handed out last time
- due next Tuesday 29th
- (after Thanksgiving)
3. Last Time
- solving language puzzles and Artificial Intelligence (AI)
- e-rater (from ETS Technologies)
- scores essays
- based on a vector of linguistic features
- claimed high agreement (98%) with human raters
- not real understanding, but is real understanding necessary?
- save manpower and money? machine-assisted rating
4. Today's Topics
- Internet search and language
- stemming
- compounding
5. Search
- information retrieval
- do we search exactly on what is typed?
- or can we do better?
- possibilities
- use WordNet
- use stemming
6. Search
- possibilities
- use WordNet
- example
- large house
- spacious house
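As a rough illustration of the WordNet idea, the sketch below expands each query word with synonyms so that a query for "large house" also matches "spacious house". The SYNONYMS table is a hand-made stand-in (an assumption for this example), not WordNet data or its API, and expand_query and matches are hypothetical helper names.

```python
# Hypothetical hand-made synonym table standing in for WordNet synsets.
SYNONYMS = {
    "large": {"big", "spacious", "roomy"},
    "house": {"home"},
}

def expand_query(query):
    """Map each query word to the set containing it and its listed synonyms."""
    return [{word} | SYNONYMS.get(word, set()) for word in query.lower().split()]

def matches(document, expanded):
    """True if the document contains at least one alternative for every query word."""
    tokens = set(document.lower().split())
    return all(tokens & alternatives for alternatives in expanded)

expanded = expand_query("large house")
print(matches("a spacious house near campus", expanded))  # True
print(matches("a large garden", expanded))                # False
```

Expansion of this kind tends to raise recall, since more documents match, at some risk to precision when a synonym is used in a different sense.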
7. Search
- possibilities
- use word stemming
- example
- symmetrical
- symmetric
- symmetry
8. Search
- search is a compromise between precision and recall
- you can typically boost one at the expense of the other
- precision
- of the answers/hits returned, what proportion is relevant?
- recall
- what proportion of the truly relevant answers is returned?
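The two definitions above can be written down directly. The sketch below is a minimal illustration; the document ids and the two hit sets are invented for the example and do not come from any real search engine.

```python
def precision_recall(returned, relevant):
    """Return (precision, recall) for the returned hits and the truly relevant set."""
    returned, relevant = set(returned), set(relevant)
    true_positives = returned & relevant
    precision = len(true_positives) / len(returned) if returned else 0.0
    recall = len(true_positives) / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical document ids: four documents are truly relevant.
relevant = {"d1", "d2", "d3", "d4"}
exact_hits = {"d1", "d2"}                              # strict matching: few hits, all good
broad_hits = {"d1", "d2", "d3", "d7", "d8", "d9"}      # looser matching: more hits, more noise

print(precision_recall(exact_hits, relevant))  # (1.0, 0.5)
print(precision_recall(broad_hits, relevant))  # (0.5, 0.75)
```

The looser search finds more of the relevant documents (higher recall) but half of what it returns is irrelevant (lower precision), which is the trade-off described above.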
9. Morphology
- Inflectional Morphology
- basically no change in category
- φ-features (person, number, gender)
- Examples: movies, blonde, actress
- Irregular examples
- appendices (from appendix), geese (from goose)
- case
- Examples: he/him, who/whom
- comparatives and superlatives
- Examples: happier/happiest
- tense
- Examples: drive/drives/drove (-ed)/driven
10. Morphology
- Derivational Morphology
- basically category changing
- nominalization
- Examples: formalization, informant, informer, refusal, lossage
- deadjectivals
- Examples: weaken, happiness, simplify, formalize, slowly, calm
- deverbals
- Examples: see nominalization, readable, employee
- denominals
- Examples: formal, bridge, ski, cowardly, useful
11. Morphology and Semantics
- Morphemes: units of meaning
- suffixation
- Examples:
- x employ y
- employee picks out y
- employer picks out x
- x read y
- readable picks out y
- prefixation
- Examples:
- undo, redo, un-redo, encode, defrost, asymmetric, malformed, ill-formed, pro-Chomsky
12. Stemming
- normalization procedure
- inflectional morphology
- cities → city, improves/improved → improve
- derivational morphology
- transformation/transformational → transform
- criterion
- preserve meaning (word senses)
- organization → organ would violate this (different word sense)
- primary application
- information retrieval (IR)
- efficacy questioned: Harman (1991)
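The normalization above can be sketched as ordered suffix rewriting. The rules below are illustrative assumptions tuned only to the slide's examples (the "ed" and "es" rules would mangle many other words); they are not a real stemmer. The last call shows how an over-eager extra rule produces the meaning-changing "organ".

```python
# Illustrative (suffix, replacement) rules, tried in order; not a real stemmer.
RULES = [
    ("ational", ""),  # transformational -> transform
    ("ation", ""),    # transformation -> transform
    ("ies", "y"),     # cities -> city
    ("ed", "e"),      # improved -> improve (crude: wrong for e.g. "walked")
    ("es", "e"),      # improves -> improve
]

def stem(word, rules=RULES):
    """Apply the first rule whose suffix matches and leaves a stem of 3+ letters."""
    for suffix, replacement in rules:
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)] + replacement
    return word

for w in ["cities", "improves", "improved", "transformation", "transformational"]:
    print(w, "->", stem(w))

# Adding a hypothetical "ization" rule over-stems and changes the word sense:
print(stem("organization", [("ization", "")] + RULES))  # organ
```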
13. Stemming and Search
- up until very recently ...
- Word Variations (Stemming)
- To provide the most accurate results, Google does not use "stemming" or support "wildcard" searches.
- In other words, Google searches for exactly the words that you enter in the search box.
- Searching for "book" or "book*" will not yield "books" or "bookstore". If in doubt, try both forms: "airline" and "airlines," for instance.
14. Stemming and Search
- Google is more successful than other search engines in part because it returns better, i.e. more relevant, information
- its algorithm (a trade secret) is called PageRank
- general idea: how many people link to you?
- exact details are unavailable
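The "how many people link to you?" idea can be illustrated with the simplified, publicly described form of PageRank: repeatedly pass each page's score along its outgoing links, with a damping factor. This is only a sketch under stated assumptions; the tiny link graph, the damping value 0.85, and the iteration count are chosen for the example, and nothing here reflects Google's undisclosed production ranking.

```python
def pagerank(links, damping=0.85, iterations=50):
    """links maps each page to the list of pages it links to."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page keeps a small baseline score.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # Split this page's score evenly among the pages it links to.
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # A page with no outgoing links spreads its score over all pages.
                share = damping * rank[page] / n
                for p in pages:
                    new_rank[p] += share
        rank = new_rank
    return rank

# Hypothetical four-page web: three pages link to "c", and "c" links to "a".
links = {"a": ["c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = pagerank(links)
print(max(ranks, key=ranks.get))  # c
```

Page "c" comes out on top because it has the most incoming links, and "a" scores well because the highly ranked "c" links to it.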
15. Stemming and Search
- SEO (Search Engine Optimization)
- is a topic of considerable commercial interest
- goal
- how to get your webpage listed higher by PageRank
- techniques
- e.g. by writing keyword-rich text in your page
- e.g. by listing morphological variants of keywords
- Google does not use stemming everywhere
- selective use only
- and it does not reveal its algorithm, to prevent people optimizing their pages
16. Stemming
- IR-centric view
- applies to open-class lexical items only
- stop-word list: the, below, being, does
- exclude determiners, prepositions, auxiliary verbs
- not full morphology
- prefixes generally excluded
- (not meaning preserving)
- Examples: asymmetric, undo, encoding
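A stop-word list is applied before stemming, so closed-class words never reach the index. The sketch below uses only the four sample words from the slide; real lists are much longer, and index_terms is a hypothetical helper name.

```python
# The slide's four sample stop words; a real list would be much longer.
STOP_WORDS = {"the", "below", "being", "does"}

def index_terms(text):
    """Lowercase, split on whitespace, and drop stop words."""
    return [w for w in text.lower().split() if w not in STOP_WORDS]

print(index_terms("Does the symmetry hold below the line"))
# ['symmetry', 'hold', 'line']
```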
17. Stemming Methods
- use a dictionary (look-up)
- OK for English, not for languages with more productive morphology, e.g. Japanese, Turkish
- write rules, e.g. Porter Algorithm (Porter, 1980)
- Example
- once -ing is removed, if the stem ends in a doubled consonant (not l, s or z), remove the last character
- hopping → hop
- hissing → hiss
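The doubled-consonant rule can be sketched in isolation. This covers only the -ing case and treats just a, e, i, o, u as vowels, which is a simplification of Porter's definitions; it is not the full algorithm, and strip_ing is a hypothetical name.

```python
VOWELS = set("aeiou")  # simplification: Porter also treats some y's as vowels

def strip_ing(word):
    """Remove -ing, then undouble a final doubled consonant unless it is l, s or z."""
    if not word.endswith("ing"):
        return word
    stem = word[:-3]
    # Porter only strips -ing when the remaining stem contains a vowel ("sing" stays).
    if not any(ch in VOWELS for ch in stem):
        return word
    if (len(stem) >= 2 and stem[-1] == stem[-2]
            and stem[-1] not in VOWELS and stem[-1] not in "lsz"):
        stem = stem[:-1]
    return stem

for w in ["hopping", "hissing", "falling", "sing"]:
    print(w, "->", strip_ing(w))
# hopping -> hop
# hissing -> hiss
# falling -> fall
# sing -> sing
```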
18. Stemming Methods
- dictionary approach not enough
- Example (Porter, 1991)
- routed → route/rout
- At Waterloo, Napoleon's forces were routed
- The cars were routed off the highway
- notes
- here, the (inflected) verb form is ambiguous
- preceding word (context) does not disambiguate
19. Stemming Errors
- Understemming: failure to merge
- Example
- adhere/adhesion
- Overstemming: incorrect merge
- Example
- probe/probable
- Claim: -able irregular suffix, root probare (Lat.)
- Mis-stemming: removing a non-suffix (Porter, 1991)
- Example
- reply → rep
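Given a table of stems and a hand-made table saying which words belong to the same concept, understemming and overstemming pairs can be found mechanically. Both tables below are assumptions written to mirror the slide's examples; the stems are not the output of any particular stemmer.

```python
from itertools import combinations

def stemming_errors(stems, concepts):
    """Return (understemmed pairs, overstemmed pairs) for the given word tables."""
    under, over = [], []
    for w1, w2 in combinations(sorted(stems), 2):
        same_stem = stems[w1] == stems[w2]
        same_concept = concepts[w1] == concepts[w2]
        if same_concept and not same_stem:
            under.append((w1, w2))   # should have merged, did not
        if same_stem and not same_concept:
            over.append((w1, w2))    # merged, should not have
    return under, over

# Hypothetical stemmer output and hand-assigned concept labels.
stems = {"adhere": "adher", "adhesion": "adhes", "probe": "prob", "probable": "prob"}
concepts = {"adhere": "ADHERE", "adhesion": "ADHERE",
            "probe": "PROBE", "probable": "PROBABLE"}

under, over = stemming_errors(stems, concepts)
print(under)  # [('adhere', 'adhesion')]
print(over)   # [('probable', 'probe')]
```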
20. Stemming Interaction
- interacts with noun compounding
- example
- operating systems
- negative polarity items
- for IR, compounds need to be identified first
- want to index by concept (compounds)
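One way to identify compounds before normalizing is to match token sequences against a compound lexicon and normalize only the head (the last word), so that "operating" is never stemmed on its own. The sketch below assumes a tiny hypothetical lexicon and a deliberately crude plural stripper; both are illustrations, not a real IR pipeline.

```python
# Hypothetical compound lexicon, stored with singular heads.
COMPOUNDS = {("operating", "system"), ("negative", "polarity", "item")}

def singular(word):
    """Crude plural stripping, used only for this illustration."""
    return word[:-1] if word.endswith("s") and not word.endswith("ss") else word

def index_units(text):
    """Greedy longest-match grouping of known compounds into single index units."""
    tokens = text.lower().split()
    max_len = max(len(c) for c in COMPOUNDS)
    units, i = [], 0
    while i < len(tokens):
        for n in range(max_len, 1, -1):
            window = tokens[i:i + n]
            if len(window) < n:
                continue
            # Normalize only the head; non-head words are left untouched.
            candidate = tuple(window[:-1] + [singular(window[-1])])
            if candidate in COMPOUNDS:
                units.append("_".join(candidate))
                i += n
                break
        else:
            # No compound starts here: index the single word.
            units.append(singular(tokens[i]))
            i += 1
    return units

print(index_units("operating systems handle negative polarity items"))
# ['operating_system', 'handle', 'negative_polarity_item']
```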
21. Noun-Noun Compounding
- examples
- operating system (OS)
- negative polarity item (NPI)
- often abbreviated
22. Noun-Noun Compounding Semantics
- productive
- examples
- tea leaf
- teabag
- teabreak
- tea garden
- tea service
- teapot
23. Noun-Noun Compounding Semantics
- multiple semantic relationships between elements of the compound possible
- example (Keene & Costello, 1997)
- pencil bed
- a narrow bed
- a container for pencils
- a bed shaped like a pencil
- disambiguating context
- The pencil bed is in the bedroom upstairs
- The pencil bed is in the middle of the exam hall
- He moved the pencil bed last week
24. Noun-Noun Compounding Semantics
- meaning sometimes unpredictable or hard to guess at
- cf. idioms (kick the bucket, grind sesame...)
- example (made-up)
- cousin chair
25. Noun-Noun Compounding Idioms
- non-compositional semantics
- examples
- bootleg
- marshmallow
26. Noun-Noun Compounding Semantics
- novel compounds sometimes force the introduction of other compounds/words
- example
- mountain bike (invented in the 1970s)
- road bike
- hybrid
27. Noun-Noun Compounding
- choice of words sometimes arbitrary?
- example
- soccer mom
- soccer mother
- Driven by ambiguity reduction?
- mother of soccer
- mom of soccer vs. caregiver
28. Noun-Noun Compounding
- compositionality
- example
- school girl
- girl who goes to school
- girl school
- school for girls
- [DP [NP girl] [D 's] [NP school]]
- syntax intervenes
29. Noun-Noun Compounding
- Language-particular
- examples
- house museum (Russian)
- bookstore (English)
- book-adj store (Russian)
- van driver (English)
- genitive construction for compounds headed by deverbal nouns? (Russian)
30. Noun-Noun Compounding
- V-N Compounds
- examples
- pickpocket (V-N)
- scarecrow (V-N)
- scofflaw (V-N)
- not right-headed, cf. blackboard
- not productive
31. Noun-Noun Compounding Conceptual Categories
- (Costello & Keene, 1996) More compounds headed by artifacts
- compound formation affected by conceptual categories (WordNet)
- artifacts more polysemous
- examples
- elephant gun
- gun used for shooting elephants
- gun used by elephants
- cherry tree
- sub-type relationship only
32. Noun-Noun Compounding Syntax
- How are compounds formed?
- e.g. relative clause deletion
- example
- girl who goes to school → school girl
- evidence against this?
- compounding is acquired before relative clause formation (Hoeksema, 1985)
33. Noun-Noun Compounding Syntax
- Morphological Island Constraint (Botha, 1980)
- compound-internal morphology changes not possible
- examples
- bus stop
- buses stop
- operating system
- operation system
- algorithms course
- but
- frozen foods section cf. frozen food section
34. Noun-Noun Compounding Headedness
- In English, the head of a compound is always to the right
- structural ambiguity
- (putting aside word sense considerations)
- example
- computer furniture design
- [computer furniture] design
- computer [furniture design]
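The structural readings of a compound can be enumerated by trying every split point and bracketing each side recursively. This is a minimal sketch that ignores word senses, as the slide does; bracketings is a hypothetical helper name.

```python
def bracketings(words):
    """Return every binary bracketing of a list of words as a string."""
    if len(words) == 1:
        return [words[0]]
    results = []
    for split in range(1, len(words)):
        for left in bracketings(words[:split]):
            for right in bracketings(words[split:]):
                results.append(f"[{left} {right}]")
    return results

for parse in bracketings(["computer", "furniture", "design"]):
    print(parse)
# [computer [furniture design]]
# [[computer furniture] design]
```

The first reading is furniture design done by or for computers; the second is the design of computer furniture.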
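35. Noun-Noun Compounding
- structural ambiguity
- compounds can be very long
- Judiciary plea bargain settlement account audit (Gazdar, 1985)
- How many ways ambiguous?
- (N-1)! orders in which adjacent elements can be merged pairwise
- N is number of words
- this overcounts from N = 4 on: the distinct binary bracketings number the Catalan number C(N-1), e.g. 5 rather than 6 for N = 4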
36. Noun-Noun Compounding
- Example
- 1 2 3 4
- [1 2] 3 4
- [[1 2] 3] 4
- [1 2] [3 4]
- 1 [2 3] 4
- [1 [2 3]] 4
- 1 [[2 3] 4]
- 1 2 [3 4]
- [1 2] [3 4]
- 1 [2 [3 4]]
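Counting merge orders and counting distinct structures give different answers once N reaches 4, because grouping 1 2 first and 3 4 second yields the same structure as the reverse order. The sketch below compares the two counts; the function names are hypothetical, and the Catalan-number formula is the standard closed form for binary bracketings.

```python
from math import comb, factorial

def merge_orders(n):
    """Number of orders in which n-1 pairwise merges of adjacent elements can be done."""
    return factorial(n - 1)

def distinct_bracketings(n):
    """Catalan number C(n-1): distinct binary bracketings of n words."""
    k = n - 1
    return comb(2 * k, k) // (k + 1)

for n in range(2, 8):
    print(n, merge_orders(n), distinct_bracketings(n))
# 2 1 1
# 3 2 2
# 4 6 5
# 5 24 14
# 6 120 42
# 7 720 132
```

For a six-word compound such as "Judiciary plea bargain settlement account audit", that is 120 merge orders but 42 distinct structures.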
37. Noun-Noun Compounding
- structural ambiguity not present in all languages
- example (Turkish indefinites)
- signaled morphologically by the possessive marker (POSS)
- N N N N-POSS (right-branching)
- Turk Language Organization-POSS
- N N-POSS N-POSS (left-branching)
- Language Organization-POSS Dictionary-POSS
- both left and right branching possible
- Turk Language Organization-POSS Dictionary-POSS
38. Back to Search and Meaning
- keyword-based search is ok...
- bigger goal
- Question-Answering (QA)
- hot research topic
- need semantics
- example (Google)
- how did Sadat's assassin die?
- keyword-based search is not enough
- need some idea of semantic roles
- i.e. who did what to whom?
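A toy contrast shows why keywords alone cannot answer "who did what to whom". Two sentences with opposite meanings reduce to the same keyword set, while a role-based representation keeps them apart. The sentences and the (predicate, agent, patient) triples are hand-written for this example; no semantic role labeler is being run.

```python
STOP_WORDS = {"the"}

def keywords(sentence):
    """Order-free keyword set, as a simple keyword index would see it."""
    return frozenset(w for w in sentence.lower().split() if w not in STOP_WORDS)

s1 = "the assassin killed Sadat"
s2 = "Sadat killed the assassin"
print(keywords(s1) == keywords(s2))  # True: keywords cannot tell them apart

# Hand-annotated (predicate, agent, patient) triples; hypothetical annotations.
roles1 = ("kill", "assassin", "Sadat")
roles2 = ("kill", "Sadat", "assassin")
print(roles1 == roles2)  # False: semantic roles distinguish the two
```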
39. Back to Search and Meaning
- example (Google)
- how did Sadat's assassin die?
40. Back to Search and Meaning
- example (Google)
- how did Sadat's assassin die?
Consider two examples. In 1945, the
twenty-seven-year-old Anwar al-Sadat and his
friends decided to assassinate the on-and-off
prime minister of Egypt, Nahhas Pasha. Nahhas
had been one of Egypt's most popular nationalist
politicians, but the younger nationalists thought
him too pro-British. Listen to Sadat describe
the decision to kill him