LING 388: Language and Computers - PowerPoint PPT Presentation

About This Presentation
Title:

LING 388: Language and Computers


Slides: 41
Provided by: sandiw

Transcript and Presenter's Notes

Title: LING 388: Language and Computers


1
LING 388 Language and Computers
  • Sandiway Fong
  • Lecture 25 11/22

2
Administrivia
  • No Lecture Thursday
  • Thanksgiving
  • Homework 5
  • handed out last time
  • due next Tuesday 29th
  • (after Thanksgiving)

3
Last Time
  • solving language puzzles and Artificial
    Intelligence (AI)
  • e-rater (from ETS Technologies)
  • scores essays
  • based on a vector of linguistic features
  • claimed high agreement (98%) with human raters
  • not real understanding, but is real
    understanding necessary?
  • save manpower and money? machine-assisted rating

4
Today's Topics
  • Internet search and language
  • stemming
  • compounding

5
Search
  • information retrieval
  • do we search exactly on what is typed?
  • or can we do better?
  • possibilities
  • use WordNet
  • use stemming

6
Search
  • possibilities
  • use WordNet
  • example
  • large house
  • spacious house

7
Search
  • possibilities
  • use word stemming
  • example
  • symmetrical
  • symmetric
  • symmetry

8
Search
  • search is a compromise between precision and
    recall
  • you can typically boost one at the expense of the
    other
  • precision
  • of the answers/hits returned, what is the
    proportion that is relevant?
  • recall
  • what proportion of the true relevant answers are
    returned?
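The two definitions above can be made concrete with a minimal Python sketch. The document IDs and the function name `precision_recall` are made up for illustration; only the two ratios come from the slide.

```python
def precision_recall(returned, relevant):
    """Return (precision, recall) for returned hits vs. truly relevant documents."""
    returned, relevant = set(returned), set(relevant)
    true_positives = returned & relevant
    # precision: share of the returned hits that are relevant
    precision = len(true_positives) / len(returned) if returned else 0.0
    # recall: share of the relevant documents that were returned
    recall = len(true_positives) / len(relevant) if relevant else 0.0
    return precision, recall

hits = ["d1", "d2", "d3", "d4"]   # hypothetical search results
gold = ["d2", "d4", "d5"]         # hypothetical truly relevant documents
p, r = precision_recall(hits, gold)
print(p, r)
```

Returning more hits can only keep recall the same or raise it, but it tends to lower precision, which is the compromise the slide describes.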

9
Morphology
  • Inflectional Morphology
  • basically no change in category
  • φ-features (person, number, gender)
  • Examples: movies, blonde, actress
  • Irregular examples:
  • appendices (from appendix), geese (from goose)
  • case
  • Examples: he/him, who/whom
  • comparatives and superlatives
  • Examples: happier/happiest
  • tense
  • Examples: drive/drives/drove (-ed)/driven

10
Morphology
  • Derivational Morphology
  • basically category changing
  • nominalization
  • Examples: formalization, informant, informer,
    refusal, lossage
  • deadjectivals
  • Examples: weaken, happiness, simplify, formalize,
    slowly, calm
  • deverbals
  • Examples: see nominalization; readable, employee
  • denominals
  • Examples: formal, bridge, ski, cowardly, useful

11
Morphology and Semantics
  • Morphemes: units of meaning
  • suffixation
  • Examples
  • x employ y
  • employee picks out y
  • employer picks out x
  • x read y
  • readable picks out y
  • prefixation
  • Examples
  • undo, redo, un-redo, encode, defrost, asymmetric,
    malformed, ill-formed, pro-Chomsky

12
Stemming
  • normalization procedure
  • inflectional morphology
  • cities → city, improves/improved → improve
  • derivational morphology
  • transformation/transformational → transform
  • criterion
  • preserve meaning (word senses)
  • counterexample: organization → organ
  • primary application
  • information retrieval (IR)
  • efficacy questioned: Harman (1991)

13
Stemming and Search
  • up until very recently ...
  • Word Variations (Stemming)
  • To provide the most accurate results, Google does
    not use "stemming" or support "wildcard"
    searches.
  • In other words, Google searches for exactly the
    words that you enter in the search box.
  • Searching for "book" or "book*" will not yield
    "books" or "bookstore". If in doubt, try both
    forms: "airline" and "airlines," for instance.

14
Stemming and Search
  • Google is more successful than other search
    engines in part because it returns better, i.e.
    more relevant, information
  • its algorithm (a trade secret) is called PageRank
  • general idea: how many people link to you?
  • exact details are unavailable
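The "how many people link to you?" idea can be sketched as a simplified, textbook-style link-analysis iteration. This is not Google's production algorithm. The damping value 0.85, the toy link graph, and the function name `pagerank` are all assumptions made for illustration.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified link-based ranking; links maps each page to the pages it links to."""
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}          # start with equal scores
    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # a page shares its score equally among the pages it links to
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # a page with no outgoing links spreads its score over all pages
                share = damping * rank[page] / n
                for t in pages:
                    new_rank[t] += share
        rank = new_rank
    return rank

# hypothetical four-page web: a, b and d all link to c; c links back to a
toy_web = {"a": ["c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = pagerank(toy_web)
for page, score in sorted(ranks.items(), key=lambda kv: -kv[1]):
    print(page, round(score, 3))
```

In this toy graph the page with the most incoming links ends up with the highest score, and a page linked from a high-scoring page inherits part of that score.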

15
Stemming and Search
  • SEO (Search Engine Optimization)
  • is a topic of considerable commercial interest
  • goal
  • How to get your webpage listed higher by PageRank
  • techniques
  • e.g. by writing keyword-rich text in your page
  • e.g. by listing morphological variants of
    keywords
  • Google does not use stemming everywhere
  • selective use only
  • and it does not reveal its algorithm to prevent
    people optimizing their pages

16
Stemming
  • IR-centric view
  • Applies to open-class lexical items only
  • stop-word list: the, below, being, does
  • exclude determiners, prepositions, auxiliary
    verbs
  • not full morphology
  • prefixes generally excluded
  • (not meaning preserving)
  • Examples: asymmetric, undo, encoding
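A minimal sketch of stop-word filtering before indexing, assuming a whitespace tokenizer with crude punctuation stripping. The four stop words are the ones listed on the slide; a real list would be much longer, and the function name `index_terms` is hypothetical.

```python
# The four stop words named on the slide; real stop-word lists are far longer.
STOP_WORDS = {"the", "below", "being", "does"}

def index_terms(text):
    """Lowercase the text, strip surrounding punctuation, and drop stop words."""
    tokens = [t.strip(".,;:!?\"'").lower() for t in text.split()]
    return [t for t in tokens if t and t not in STOP_WORDS]

terms = index_terms("The cat does like being below the table.")
print(terms)
```

Only the remaining open-class items would then be passed on to a stemmer.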

17
Stemming Methods
  • use a dictionary (look-up)
  • OK for English, not for languages with more
    productive morphology, e.g. Japanese, Turkish
  • write rules, e.g. Porter Algorithm (Porter, 1980)
  • Example
  • Ends in doubled consonant (not l, s or z):
    remove last character
  • hopping → hop
  • hissing → hiss
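The doubled-consonant rule can be sketched in a few lines of Python. This is only a fragment in the spirit of the Porter algorithm, not the full stemmer. It assumes the rule applies after -ing is stripped from a stem that contains a vowel, and the function names are made up for illustration.

```python
VOWELS = set("aeiou")

def undouble(stem):
    """If the stem ends in a doubled consonant other than l, s or z, drop the last letter."""
    if (len(stem) >= 2
            and stem[-1] == stem[-2]
            and stem[-1] not in VOWELS
            and stem[-1] not in "lsz"):
        return stem[:-1]
    return stem

def strip_ing(word):
    """Remove -ing when the remaining stem contains a vowel, then apply undouble."""
    stem = word[:-3]
    if word.endswith("ing") and any(ch in VOWELS for ch in stem):
        return undouble(stem)
    return word

for w in ["hopping", "hissing", "falling", "sing"]:
    print(w, "->", strip_ing(w))
```

The exclusion of l, s and z is what keeps "hissing" from being reduced to "his", and the vowel check leaves "sing" untouched.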

18
Stemming Methods
  • dictionary approach not enough
  • Example (Porter, 1991)
  • routed → route/rout
  • At Waterloo, Napoleon's forces were routed
  • The cars were routed off the highway
  • notes
  • here, the (inflected) verb form is ambiguous
  • preceding word (context) does not disambiguate

19
Stemming Errors
  • Understemming: failure to merge
  • Example
  • adhere/adhesion
  • Overstemming: incorrect merge
  • Example
  • probe/probable
  • Claim: -able irregular suffix, root probare
    (Lat.)
  • Mis-stemming: removing a non-suffix (Porter,
    1991)
  • Example
  • reply → rep

20
Stemming Interaction
  • interacts with noun compounding
  • example
  • operating systems
  • negative polarity items
  • for IR, compounds need to be identified first
  • want to index by concept (compounds)

21
Noun-Noun Compounding
  • examples
  • operating system (OS)
  • negative polarity item (NPI)
  • often abbreviated

22
Noun-Noun Compounding Semantics
  • productive
  • examples
  • tea leaf
  • teabag
  • teabreak
  • tea garden
  • tea service
  • teapot

23
Noun-Noun Compounding Semantics
  • multiple semantic relationships between elements
    of the compound possible
  • example (Keane & Costello, 1997)
  • pencil bed
  • a narrow bed
  • a container for pencils
  • a bed shaped like a pencil
  • disambiguating context
  • The pencil bed is in the bedroom upstairs
  • The pencil bed is in the middle of the exam hall
  • He moved the pencil bed last week

24
Noun-Noun Compounding Semantics
  • meaning sometimes unpredictable or hard to guess
    at
  • cf. idioms (kick the bucket, grind sesame...)
  • example (made-up)
  • cousin chair

25
Noun-Noun Compounding Idioms
  • non-compositional semantics
  • examples
  • bootleg
  • marshmallow

26
Noun-Noun Compounding Semantics
  • novel compounds sometimes force the introduction
    of other compounds/words
  • example
  • mountain bike (invented in the 1970s)
  • road bike
  • hybrid

27
Noun-Noun Compounding
  • choice of words sometimes arbitrary?
  • example
  • soccer mom
  • soccer mother
  • Driven by ambiguity reduction?
  • mother of soccer
  • mom of soccer vs. caregiver

28
Noun-Noun Compounding
  • compositionality
  • example
  • school girl
  • girl who goes to school
  • girl school
  • school for girls
  • [DP [NP girl] [D 's] [NP school]] (girl's school)
  • syntax intervenes

29
Noun-Noun Compounding
  • Language-particular
  • examples
  • house museum (Russian)
  • bookstore (English)
  • book-adj store (Russian)
  • van driver (English)
  • genitive construction for compounds headed by
    deverbal nouns? (Russian)

30
Noun-Noun Compounding
  • V-N Compounds
  • examples
  • pickpocket (V-N)
  • scarecrow (V-N)
  • scofflaw (V-N)
  • not right-headed, cf. blackboard
  • not productive

31
Noun-Noun Compounding Conceptual Categories
  • (Costello & Keane, 1996) More compounds headed by
    artifacts
  • compound formation affected by conceptual
    categories (WordNet)
  • artifacts more polysemous
  • examples
  • elephant gun
  • gun used for shooting elephants
  • gun used by elephants
  • cherry tree
  • sub-type relationship only

32
Noun-Noun Compounding Syntax
  • How are compounds formed?
  • e.g. relative clause deletion
  • example
  • girl who goes to school
  • →
  • school girl
  • evidence against this?
  • compounding is acquired before relative clause
    formation (Hoeksema, 1985)

33
Noun-Noun Compounding Syntax
  • Morphological Island Constraint (Botha, 1980)
  • compound-internal morphology changes not possible
  • examples
  • bus stop
  • *buses stop
  • operating system
  • *operation system
  • algorithms course
  • but
  • frozen foods section cf. frozen food section

34
Noun-Noun Compounding Headedness
  • In English, the head of compound is always to the
    right
  • structural ambiguity
  • (putting aside word sense considerations)
  • example
  • computer furniture design
  • [[computer furniture] design]
  • [computer [furniture design]]

35
Noun-Noun Compounding
  • structural ambiguity
  • compounds can be very long
  • Judiciary plea bargain settlement account audit
    (Gazdar, 1985)
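  • How many ways ambiguous?
  • (N-1)! orders of merging adjacent elements (an
    upper bound, since different orders can give the
    same structure)
  • distinct binary bracketings: the Catalan number
    C(N-1), e.g. 2 for N = 3 and 5 for N = 4
  • N is number of words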
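A minimal Python sketch that enumerates the binary bracketings of a compound and compares the count with (N-1)!. It assumes every adjacent grouping is structurally possible and ignores word senses. The function names are made up for illustration.

```python
from math import comb, factorial

def bracketings(words):
    """Return all binary bracketings of a word sequence as nested tuples."""
    if len(words) == 1:
        return [words[0]]
    trees = []
    # choose where the top-level split falls, then bracket each side recursively
    for split in range(1, len(words)):
        for left in bracketings(words[:split]):
            for right in bracketings(words[split:]):
                trees.append((left, right))
    return trees

def catalan(n):
    """n-th Catalan number, the number of binary trees with n + 1 leaves."""
    return comb(2 * n, n) // (n + 1)

compound = ["computer", "furniture", "design"]
for tree in bracketings(compound):
    print(tree)

long_compound = "judiciary plea bargain settlement account audit".split()
n = len(long_compound)
print(len(bracketings(long_compound)), catalan(n - 1), factorial(n - 1))
```

For three words both counts are 2. For longer compounds the number of merge orders exceeds the number of distinct trees, because different orders can build the same tree, for example [[1 2] [3 4]].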

36
Noun-Noun Compounding
  • Example
  • 1 2 3 4
  • [1 2] 3 4
  • [[1 2] 3] 4
  • [1 2] [3 4]
  • 1 [2 3] 4
  • [1 [2 3]] 4
  • 1 [[2 3] 4]
  • 1 2 [3 4]
  • 1 [2 [3 4]]
  • [1 2] [3 4]

37
Noun-Noun Compounding
  • structural ambiguity not present in all languages
  • example (Turkish indefinites)
  • signaled morphologically by the possessive marker
    (POSS)
  • N N N N-POSS (right-branching)
  • Turk Language Organization-POSS
  • N N-POSS N-POSS (left-branching)
  • Language Organization- POSS Dictionary-POSS
  • both left and right branching possible
  • Turk Language Organization-POSS
    Dictionary-POSS

38
Back to Search and Meaning
  • keyword-based search is ok...
  • bigger goal
  • Question-Answering (QA)
  • hot research topic
  • need semantics
  • example (Google)
  • how did Sadat's assassin die?
  • keyword-based search is not enough
  • need some idea of semantic roles
  • i.e. who did what to whom?

39
Back to Search and Meaning
  • example (Google)
  • how did Sadat's assassin die?

40
Back to Search and Meaning
  • example (Google)
  • how did Sadat's assassin die?

Consider two examples. In 1945, the
twenty-seven-year-old Anwar al-Sadat and his
friends decided to assassinate the on-and-off
prime minister of Egypt, Nahhas Pasha. Nahhas
had been one of Egypt's most popular nationalist
politicians, but the younger nationalists thought
him too pro-British. Listen to Sadat describe
the decision to kill him